Module Code | CSU44062 |

Module Name | Advanced Computational Linguistics |

ECTS Weighting [1] | 5 ECTS |

Semester Taught | Semester 1 |

Module Coordinator/s | Dr Martin Emms |

## Module Learning Outcomes

When students have successfully completed this module they should be able to:

- Understand in general what a probabilistic model is, the distinction between so-called visible and hidden variables, and the distinctive nature of models where each datum is a sequence of varying length, rather then a fixed-size set of features;
- Understand the general idea of unsupervised training as way to set model parameters concerning hidden variables from evidence only on visible variables;
- Describe the Expectation Maximisation (EM) as a general unsupervised technique, make an implementation & identify the crucial steps in proofs of its convergence and property of increasing data likelihood;
- Describe specific instances of this via the models and training algorithms in Machine Translation and Speech Recognition including being able to explain the details of how seemingly infeasibly costly calculations can in fact be feasibly done;
- Describe the further case of the models proposed for hidden ‘topics’ in a document collection and the further modifications to learning to handle these;
- The aim is to give a grounding in so-called unsupervised machine learning techniques, vital to many language-processing technologies. Whilst most time will be directed to these contexts, the techniques themselves are used much more widely in data mining and machine vision for example, and students will gain some insight into this also.

## Module Content

Course is about how to have machines learn from data, with emphasis on techniques which can be applied in language-related areas such as Machine Translation, Speech Recognition and Topic Modeling, though the techniques are applicable much more widely than that. The big emphasis is on **unsupervised machine learning**, which is one trait distinguishing this from other modules concerning machine learning. Another trait is the concern with data items of **arbitrary length**.

Below an enumeration of topic covered:

- Probability basics on collections of variables with discrete outcomes (what word, what topic etc) in particular joint, marginal, and conditional probabilities; the chain rule;
- Idea of machine learning as parameter estimation to maximise likelihood of training data. Case of supervised parameter estimations, with all model variables visible in data; illustrations/proofs that intuitive relative frequency approaches are maximum likelihood estimators. Case of unsupervised parameter estimation, with some model variables hidden in data. Details of how EM algorithm achieves seemingly impossible feat of finding likelihood maximising estimate in this case, with hidden variables summed over;
- Statistical Machine Translation: general (source|target) x target formulation and learning from corpus of sentence pairs; idea of ‘hidden’ alignment variables between sentence pairs; the so-called IBM alignment models; brute-force EM for learning alignment models; efficient exact algorithms avoiding the exponential cost of brute-force EM;
- Generalisation to so-called ‘Phrase-based SMT’. Dealing with the algorithmic challenges of ‘decoding’ ie. finding a best solution;
- Speech Recognition: general Hidden Markov Model (O|S) x S formulation where O is observable speech, and S is hidden state sequence. Brute-force EM for learning HMM parameters from corpus of observed speech; exploration of how the efficient Baum-Welch algorithm achieves seemingly impossible feat of avoiding the exponential cost of brute-force EM;
- Topic Modelling: a technique for assisting the navigation of huge document collections by seeing them as involving hidden or latent ‘topic’ variables; how this can be used to recover hidden relationships between documents; techniques to learn parameters of these models;
- In each case, alongside the explanation of the algorithms, there will be practical work, either developing instances of them, or deploying existing implementations and running them on data sets to concretely see their properties.

## Teaching and Learning Methods

There is a mixture of lectures, tutorials and lab sessions. Most frequently there will be a 3 lectures per week, but there will be occasions where 1 or more of the timetabled lecture sessions will actually be a lab-session or a tutorial. This may happen in anticipation of the setting of a course work assignment.

In some cases there will be alternate forms of an assignment, one form more mathematical requiring ‘pen-and-pencil’ calculations, the other more implementational requiring the implementation of some (or part of some) algorithm.

There will be many further exercises in online materials, all of which students will be encouraged to attempt. To all of the exercises suggested answers will be provided some time after the exercise has been first made available

## Assessment Details

Assessment Component | Brief Description | Learning Outcomes Addressed | % of Total | Week Set | Week Due |

Examination | Take Home Exam | LO1, LO2, LO3, LO4, LO5, LO6 | 70% | N/A | N/A |

Coursework 3 | IBM Model Training | LO1, LO2, LO3, LO4 | 12.5% | Week 9 | Week 10 |

Coursework 2 | EM in a Coin-tossing Scenario | LO1, LO2, LO3, | 9.5% | Week 5 | Week 6 |

Coursework 1 | Probabilistic Models and Inference | LO1, LO6 | 8% | Week 2 | Week 3 |

## Reassessment Details

Take-Home Exam (100%).

## Contact Hours and Indicative Student Workload

Contact Hours (scheduled hours per student over full module), broken down by: | 33 hours |

Lecture | 22 hours |

Laboratory | 6 hours |

Tutorial or seminar | 5 hours |

Other | 0 hours |

Independent Study (outside scheduled contact hours), broken down by: | 69 hours |

Preparation for classes and review of material (including preparation for examination, if applicable) | 33 hours |

Completion of assessments (including examination, if applicable) | 36 hours |

Total Hours | 102 hours |

## Recommended Reading List

Online notes will be provided. Sometimes these will direct attention to particular chapters from the following books, as well as possible online sources:

- Jurafsky and Martin, ‘Speech and Language Processing’.

- Russel and Norvig, ‘Artificial Intelligence: A Modern Approach’.

- Phillip Koehn, ’Statistical Machine Translation’ (Associated site: www.statmt.org/book)

- Kevin Murphy, ‘Machine Learning: A Probabilistic Perspective’.

- Witten and Frank, ‘Data Mining Practical Machine Learning Tools and Techniques’.

- Michael Collins, EM (http://www.cs.columbia.edu/~mcollins/6864/slides/em1.4up.pdf)

- Do & Batzoglou, EM (http://ai.stanford.edu/~chuongdo/papers/em_tutorial.pdf)

- Steyvers and Griffiths (2006), ‘Probabilistic Topic Models’. (http://cocosci.berkeley.edu/tom/papers/SteyversGriffiths.pdf)

## Module Pre-requisites

**Prerequisite modules:** N/A

**Other/alternative non-module prerequisites:** The module is self-contained. To be able to do some variants of assignments, ability to program is required. As noted above, there will be an alternative non-programming more maths-based version of any assignment involving programming.

## Module Co-requisites

N/A

## Module Website

For lecture materials the location will be www.scss.tcd.ie/Martin.Emms/4062

For handling assignments use will be made of a Blackboard module.