Language and Computation

Spring 2014, LING 227 01 / 627 01 / PSYC 327 01 (S14), Yale University

Lecturer: Tamás Biró

Online Course Information

Lectures: Tu and Th 4:00-5:15 in WLH 113.

Discussion sections with Jen Runds: Mo 1:30-2:20 and 2:30-3:20 in WLH 205.

Course description

Syllabus

Information for undergrads choosing to write a term paper

Reading list

Final exam: Friday, 05/02 at 9 am. Location: WLH 116
Makeup exam: Monday, 05/05 at 7 pm. Location: WLH 208

Lectures

Lecture 1: January 14 (introduction)

Lecture 2: January 16 (language as computation)

Lecture 3: January 21 (regular languages, regular expressions)

Lecture 4: January 23 (FSA)

Lecture 5: January 28 (morphology, FST)

Lecture 6: January 30 (reading pseudocode)

Lecture 7: February 04 (finite state phonology, edit distance, text classification)

Lecture 8: February 06 (machine learning, evaluation metrics, basics of smoothing)

Lecture 9: February 11 (probability)

Notes in lieu of lecture 10: February 13 (frequencies and probabilities)

Lecture 11: February 18 (Markov Models)

Lecture 12: February 20 (Viterbi, Forward, Forward-Backward)

Lecture 13: February 25 (formal languages)

Lecture 14: February 27 (formal languages)

Lecture 15: March 04 (cancelled)

Lecture 16: March 06 (Chomsky hierarchy, pumping lemma)

Lecture 17: March 25 (CFG parsing)

Lecture 18: March 27 (PCFG parsing)

Lecture 19: April 01 (computational phonology)

Lecture 20: April 03 (Finite-state phonology)

Lecture 21: April 08 (Optimality Theory)

Lecture 22: April 10 (Semantics)

Lecture 23: April 15 (Computational semantics)

Lecture 24: April 17 (Discourse)

Lecture 25: April 22 (Information Extraction)

Lecture 26: April 24 (Machine Translation)

Assignments

Problem set 1 (due: Feb. 04)

Problem set 2 (due: Feb. 18)

    >>>  Minimal Edit Distance is a distance metric (remarks on your solutions of Problem set 2)

Problem set 3 (due: March 04)

Midterm (due: March 27)

    >>>  Evaluation sheet

Problem set 4 (due: April 10)

    >>>  Remarks on Assignment 4

Problem set 5 (due: April 28)

Readings

(Most of them are password protected.)

Daniel Jurafsky and James H. Martin (2008). Speech and Language Processing: An Introduction to Natural Language Processing, Computational Linguistics, and Speech Recognition, Second Edition.
  Book website (maintained by the authors).
  Download Chapter 1 from authors’ website.
  Download Chapter 1 (password protected).

Python stuff:

Magnus Lie Hetland (2008). Beginning Python, 2nd edition.
  Author’s website
  Springer website
  Source code on the author’s website
  Download chapter 1 (password protected).

Steven Bird, Edward Loper and Ewan Klein (2009). Natural Language Processing with Python. O’Reilly Media Inc.
Recommended readings to come.

Jacob Perkins (2010). Python Text Processing with NLTK 2.0 Cookbook. Packt Publishing.
Chapter 3 on creating corpora.
Chapter 7 on text classification.

Text classification:

Cavnar, William B., and John M. Trenkle (1994). N-gram-based text categorization.
Proceedings of the 1994 Symposium on Document Analysis and Information Retrieval, pp. 161–174.
Download.

Damashek, Marc (1995). Gauging similarity with n-grams: Language-independent categorization of text.
Science 267:843–848. Download.

Resources

TDS IPA-console

Source code of the .py examples in Hetland: Beginning Python, 2nd ed.

The Natural Language Toolkit for Python website.