Natural Language Processing Systems and Applications

N.B.: If you are a student enrolled in this course, please refer to the course page on Canvas. This page is purely for archival purposes.


Days: Tuesdays and Thursdays
Time: 1:30-3:20 P.M.
Classroom: RAI 116

 

Instructor: Ryan Georgi
  • Office: GUG 418-D
  • Office Hours: Thu 12:30–2:00 or by appointment

Teaching Assistant: David Inman
  • Office: GUG 407
  • Office Hours: TBA

Contact: Use Canvas Inbox

 

Course description

This course examines building coherent systems to handle practical applications. Particular topics vary. This quarter we will focus on automatic summarization.

Course Resources

Textbook: There is no required textbook for this course. However, you may find the following reference texts useful.

Some good survey texts in this area are:

A number of published research articles will also provide background for the course. The articles are linked from the syllabus below, and the full citations are found in the reading list.

Historical proceedings from Summarization shared tasks are also available.

Prerequisites:

  • LING 570, 571, 572
  • CSE 373 (Data Structures) or equivalent
  • Math 394 (Probability), MIT EdX 6.041, or equivalent
  • Formal grammars, languages, and automata
  • Programming in one or more of Java, Python, C/C++, or Perl
  • Linux/Unix commands

Grading

  • 60%: Deliverable Code
  • 20%: Project Reports
  • 10%: Project Presentations
  • 10%: Class/Group Participation & Peer Evaluation

NOTE: While participation is set at 10%, this accounts only for the overhead of working in a team. Groupmates are still individually responsible for completing their portion of the group’s project and will receive grades individually.

Code Repository Posting Policy

For the duration of the class, you are required to keep your code in a private repository. Because the project is potentially CV-worthy, you may make your repository public after the quarter is over, provided that you follow the public code posting policy.

Course Mechanics

Additional detailed information on grading, collaboration, incompletes, etc.


Tentative schedule, subject to change without notice.

 

Week 1: March 27, 29
  • Topics: Summarization: Intro; Course Structure; Overview
  • Readings: J&M 23.3-; [Sparck Jones, 2007]
  • Assignment: Deliverable #1 out; due April 7, 23:00
  • Slides/Recordings: 1 – Intro; 2 – Overview

Week 2: April 3, 5
  • Topics: Evaluation; Exemplar Systems
  • Readings: [Nenkova et al, 2007]; [Radev et al., 2000]; [Erkan & Radev, 2004]
  • Assignment: Deliverable #2 out; due April 23
  • Slides/Recordings: 3 – Evaluation; 4 – Content Selection

Week 3: April 10, 12
  • Topics: Content selection by Classification, HMMs, and Discourse
  • Readings: [Conroy et al. 2004]; [Conroy et al, 2001]; [Hong & Nenkova, 2014]; [Louis et al., 2010]
  • Slides/Recordings: 5 – Content Selection – Supervised and Discourse; 6 – Discourse

Week 4: April 17, 19
  • Topics: Discourse, Topic Orientation & Information Ordering
  • Readings: Otterbacher et al, 2005; Schilder et al, 2008; Barzilay et al, 2002
  • Assignment: Deliverable #3 out; due May 14
  • Slides/Recordings: 7 – Optimization & Information Ordering; 8 – Information Ordering

Week 5: April 24, 26
  • Deliverable #2 Presentations

Week 6: May 1, 3
  • Topics: Information Ordering
  • Readings: Bollegala et al., 2012; Barzilay and Lapata, 2005, 2008; Barzilay and Lee, 2004; Conroy et al, 2006
  • Slides/Recordings: 11 – IO – Experts & Entities; 12 – IO – Entities

Week 7: May 8, 10
  • Topics: Content Realization
  • Readings: Zajic et al., 2007; Vanderwende et al, 2007; Nenkova, 2008; Wang et al., 2013; Siddarthan et al., 2011
  • Assignment: Deliverable #4 out; due May 28
  • Slides/Recordings: 13 – Content Realization; 14 – Content Realization 2

Week 8: May 15, 17
  • Deliverable #3 Presentations

Week 9: May 22, 24
  • Topics: Alternate Views of Summarization
  • Readings: Liu et al, 2015; Hu and Liu, 2004; Lerman & McDonald, 2009; Maskey & Hirschberg, 2006
  • Slides/Recordings: 17 – Compression; 18 – Alternate Views

Week 10: May 29, 31
  • Final presentations