Module Code | CS7IS3 |
Module Name | Information Retrieval and Web Search |
ECTS Weighting [1] | 5 ECTS |
Semester Taught | Semester 1 |
Module Coordinator/s | Assistant Professor Yvette Graham |
Module Learning Outcomes
On successful completion of this module, students should be able to:
- Explain the process of content indexing in information retrieval, including stop word removal, conflation (stemming, string comparison), and the language dependency of these methods;
- Demonstrate an understanding of the importance and application of data structures in efficient information retrieval, in particular inverted file structures;
- Demonstrate knowledge of the theoretical basis and operation of standard algorithms for ranked information retrieval, including term-weighting and ranking models such as tf-idf weighting, the vector-space model, the probabilistic model and language modelling;
- Describe the process of relevance feedback for improved ranking in information retrieval, and apply standard relevance feedback algorithms;
- Understand the importance of evaluation in the development of search engines, and the use of standard evaluation metrics, such as precision and recall, and of test collections in measuring the effectiveness of information retrieval systems, both in terms of the system’s performance and of user satisfaction with the system;
- Appreciate the application and operation of search engines in diverse environments e.g. web search, audio-visual search, context-aware and mobile search, patent search, search in microblogs etc.;
- Begin to combine technologies relevant to search systems in novel ways to synthesise new information retrieval applications.
Module Content
Specific topics addressed in this module include:
- Introduction to Web Search;
- Boolean Retrieval;
- Text Processing: Stopword Removal, Stemming, Spelling Correction;
- Index Construction and Compression;
- Probabilistic Information Retrieval;
- Computing Scores for Ranking: BM25, Vector Space Model, PageRank;
- Classification: Naïve Bayes, kNN, decision boundaries;
- Evaluation: Precision, Recall, F-score, NDCG;
- Link Analysis;
- Web Crawling;
- Question Answering;
- Personalisation.
Teaching and Learning Methods
Lectures, labs and self-directed study.
Assessment Details
Assessment Component | Brief Description | Learning Outcomes Addressed | % of Total | Week Set | Week Due |
Coursework | Major Assignments | | 100% | N/A | N/A |
Reassessment Details
Assignment (100%).
Contact Hours and Indicative Student Workload
Contact Hours (scheduled hours per student over full module), broken down by: | 22 hours |
Lecture & Labs | 22 hours |
Independent Study (outside scheduled contact hours), broken down by: | 103 hours |
Self-directed study and completion of assignments | 103 hours |
Total Hours | 125 hours |
Recommended Reading List
- Christopher D. Manning, Prabhakar Raghavan, Hinrich Schütze: 2008, Introduction to Information Retrieval, 1st edition, Cambridge University Press, 506 pages, ISBN 978-0521865715. https://nlp.stanford.edu/IR-book/
- Ricardo Baeza-Yates, Berthier Ribeiro-Neto: 2010, Modern Information Retrieval: The Concepts and Technology Behind Search, 2nd edition, Addison-Wesley, ISBN 978-0321416919.
- Bruce Croft, Don Metzler, Trevor Strohman: 2015, Search Engines: Information Retrieval in Practice.
Module Pre-requisites
Prerequisite modules: N/A
Other/alternative non-module prerequisites: N/A
Module Co-requisites
N/A