CSU22062 – Natural Language Processing

Module Code: CSU22062
Module Name: Natural Language Processing
ECTS Weighting: 5 ECTS
Semester taught: Semester 2
Module Coordinator/s: Martin Emms

Module Learning Outcomes

On successful completion of this module, students will be able to:

  • LO1 understand and work with implementations of Finite State Automata and regular languages, appreciating both their strengths and weaknesses, and the areas of language processing to which they might be applied.
  • LO2 understand and work with implementations of context-free grammars and parsers, including stack-based and chart parsers.
  • LO3 understand and work with implementations of probabilistic methods in language processing, such as statistical parsers, the use of Hidden Markov Models for speech recognition, or statistical machine translation.
  • LO4 understand the uses to which Feature Structures may be put in grammars of natural languages.
  • LO5 understand some aspects of recursive computations on grammatical structures to serve semantic ends.

Module Content

  1. Regular languages
    i. notion of finite state automaton and transducer and areas of application
    ii. properties and limitations of finite state methods – centre-embedding
    iii. C++ implementation of finite state automata
  2. Context Free languages
    i. illustration of applications to natural language and potential limitations – crossed dependencies
    ii. bottom-up and top-down stack-based parsers, including backtracking; chart-based parsers; properties of these parsers and their implementation in C++
    iii. long-distance dependencies and slash-grammars

  3. Feature structures
    i. untyped and typed feature structures with their associated unification algorithms and areas of possible application in language description
    ii. C++ implementation via the LilFes library

  4. Brief intro to Probabilistic Methods in NLP; the topic varies from year to year, examples being the use of Hidden Markov Models in speech recognition, or statistical machine translation

  5. Brief intro to recursive computation of semantic values from grammatical structures

Teaching and Learning Methods

There is a mixture of lectures, lab sessions and tutorials. Most frequently there will
be 2 lectures and one lab session per week, but there will be occasions where one or
more of the time-tabled lecture sessions will actually be a lab session or a tutorial.

There will be many exercises in the online materials, all of which students
will be encouraged to attempt; a subset of these will be set as assignments and
graded. Suggested answers to all of the exercises will be provided some time after
each exercise has first been made available.

Assessment Details


Assessment Component | Brief Description | Learning Outcomes Addressed | % of total | Week set | Week Due
Examination          | In Person Exam    | LO1 – LO5                   | 60         | –        | –
Coursework 1         | FSAs              | LO1                         | 10         | 2        | –
Coursework 2         | Writing CFGs      | LO2                         | 8          | 4        | –
Coursework 3         | Parsers           | LO2                         | 14         | 6        | –
Coursework 4         | Semantics         | LO5                         | 8          | 10       | –

The breakdown of Coursework into individual assignments summarises the previous year; it should be treated as an indicator of what will happen this year rather than an exact schedule.

Reassessment Details

In Person Exam

Contact Hours and Indicative Student Workload

Contact Hours (scheduled hours per student over full module), broken down by: 33 hours
  Lecture: 22 hours
  Laboratory: 11 hours
  Tutorial or seminar: 0 hours
  Other: 0 hours
Independent study (outside scheduled contact hours), broken down by: 69 hours
  Preparation for classes and review of material (including preparation for examination, if applicable): 33 hours
  Completion of assessments (including examination, if applicable): 36 hours
Total Hours: 102 hours

Recommended Reading List

Speech and Language Processing. D. Jurafsky and J. H. Martin.
Statistical Machine Translation. Philipp Koehn.

Module Pre-requisites

Prerequisite modules: NA

Other/alternative non-module prerequisites: NA

Module Co-requisites

None

Module Website

www.scss.tcd.ie/Martin.Emms/2062