571 HW #1

  1. Goals
  2. Background
  3. Parsing
  4. Programming
  5. Files

1. Goals

Through this assignment you will:

  • Explore the basics of automatic parsing.
  • Begin to gain some familiarity with the Natural Language Toolkit (NLTK).
  • Gain some experience with the cluster and Condor.


2. Background

Please review the class slides and readings in the textbook on context-free grammars. Also, see Section 8.3 of the NLTK Book for examples of grammars and configuration of the included parsers. We’ll get to the later parts of that chapter soon.


3. Parsing

Create a program to parse the test sentences based on the provided grammar and analyze the results. Specifically, your program should:

  • Load the grammar
  • Build a parser for the grammar using nltk.parse.EarleyChartParser
  • Read in the example sentences
  • For each example sentence, output to a file:
    • The sentence itself
    • The simple bracketed structure parse(s)
    • The number of parses for that sentence
  • Finally, print the average number of parses per sentence obtained by the grammar.
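
The steps above can be sketched in Python roughly as follows. This is a minimal sketch, not a reference solution: the function names (parse_sentences, write_report), the whitespace tokenization, the choice to count a sentence with uncovered words as having zero parses, and the output labels are all assumptions. The required layout is whatever toy_output.txt shows.

```python
import nltk


def parse_sentences(grammar_text, sentences):
    """Return a list of (sentence, [one-line bracketed parses]) pairs."""
    grammar = nltk.CFG.fromstring(grammar_text)
    parser = nltk.parse.EarleyChartParser(grammar)
    results = []
    for sentence in sentences:
        # Assumption: tokens are separated by whitespace.
        tokens = sentence.split()
        try:
            trees = list(parser.parse(tokens))
        except ValueError:
            # NLTK raises ValueError when a token is not covered by the
            # grammar; this sketch treats that as zero parses.
            trees = []
        # str(tree) may span several lines; collapse it onto one line.
        parses = [" ".join(str(tree).split()) for tree in trees]
        results.append((sentence, parses))
    return results


def write_report(results, out):
    """Write per-sentence results and the average; return the average."""
    total = 0
    for sentence, parses in results:
        out.write(sentence + "\n")
        for parse in parses:
            out.write(parse + "\n")
        # Hypothetical label; match toy_output.txt in a real submission.
        out.write("Number of parses: %d\n\n" % len(parses))
        total += len(parses)
    average = total / len(results) if results else 0.0
    out.write("Average parses per sentence: %.3f\n" % average)
    return average
```

In a full program, grammar_text would be the contents of the grammar file and out would be the opened output file.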


4. Programming

Create a program named hw1_parse.sh to perform the parsing as described above, invoked as:

hw1_parse.sh <grammar_file> <test_sentence_file> <output_file>

where

  • <grammar_file> is the name of the file holding the grammar rules in the NLTK .cfg format
  • <test_sentence_file> is the name of the file holding the set of sentences to parse, one sentence per line
  • <output_file> is the name of the output file for your system
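
If the shell script hands its three arguments to a Python program, the argument handling might look like the sketch below. The helper names (build_arg_parser, read_sentences) and the UTF-8 encoding are assumptions made for illustration. The assignment only fixes the name and argument order of hw1_parse.sh itself.

```python
import argparse


def build_arg_parser():
    """Build a parser for the three positional arguments, in order."""
    parser = argparse.ArgumentParser(
        description="Parse test sentences with a context-free grammar.")
    parser.add_argument("grammar_file",
                        help="grammar rules in the NLTK .cfg format")
    parser.add_argument("test_sentence_file",
                        help="sentences to parse, one per line")
    parser.add_argument("output_file",
                        help="file to write the results to")
    return parser


def read_sentences(path):
    """Read one sentence per line, skipping blank lines."""
    # Assumption: the sentence file is UTF-8 text.
    with open(path, encoding="utf-8") as handle:
        return [line.strip() for line in handle if line.strip()]
```

The shell script would then forward its arguments unchanged, and the program would call build_arg_parser().parse_args() to read them.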


5. Files

In the dropbox:

You will find the following files:

  • toy.cfg
  • toy_sentences.txt
  • toy_output.txt

These files contain a toy grammar, some toy sentences, and the expected output format described above.

You will also find:

  • sentences.txt
  • grammar.cfg

These two files are the test data. Run your parser on them to generate the hw1_parse.out file.

Files to Submit:

  • hw1.tar.gz, containing:
    • hw1_parse.sh
      • The shell script described above
    • Your source code/binaries invoked by the shell script.
    • hw1_parse.out
      • The output file described above, as run using grammar.cfg and sentences.txt
