CS7IS3 – Information Retrieval and Web Search

Module Code: CS7IS3
Module Name: Information Retrieval and Web Search
ECTS Weighting: 5 ECTS
Semester Taught: Semester 1
Module Coordinator/s: Assistant Professor Yvette Graham

Module Learning Outcomes

On successful completion of this module, students should be able to:

  1. Explain the process of content indexing in information retrieval including stop word removal, conflation (stemming, string-comparison), and the language dependency of these methods;
  2. Demonstrate an understanding of the importance and application of data structures in efficient information retrieval, in particular inverted file structures;
  3. Have knowledge of the theoretical basis and operation of standard algorithms for ranked information retrieval, including term weighting and ranking models, e.g. tf-idf weighting, the vector-space model, the probabilistic model and language modelling;
  4. Describe the process of relevance feedback for improved ranking in information retrieval, and apply standard relevance feedback algorithms;
  5. Understand the importance of evaluation in the development of search engines, and the application of standard evaluation metrics (such as precision and recall) and test collections in measuring the effectiveness of information retrieval systems, in terms of both the system's performance and user satisfaction with the system;
  6. Appreciate the application and operation of search engines in diverse environments, e.g. web search, audio-visual search, context-aware and mobile search, patent search, and search in microblogs;
  7. Begin to combine technologies relevant to search systems in novel ways to synthesise new information retrieval applications.

Module Content

Specific topics addressed in this module include:

  • Introduction to Web Search;
  • Boolean Retrieval;
  • Text Processing: Stopword Removal, Stemming, Spelling Correction;
  • Index Construction and Compression;
  • Probabilistic Information Retrieval;
  • Computing Scores for Ranking: BM25, Vector Space Model, PageRank;
  • Classification: Naïve Bayes, kNN, decision boundaries;
  • Evaluation: Precision, Recall, F-score, NDCG;
  • Link Analysis;
  • Web Crawling;
  • Question Answering;
  • Personalisation.

Teaching and Learning Methods

Lectures, labs and self-directed study.

Assessment Details

Assessment Component | Brief Description | Learning Outcomes Addressed | % of Total | Week Set | Week Due
Coursework | Major Assignments | | 100% | N/A | N/A

Reassessment Details

Assignment (100%).

Contact Hours and Indicative Student Workload

Contact Hours (scheduled hours per student over full module): 22 hours, broken down by:
  • Lectures & Labs: 22 hours
Independent Study (outside scheduled contact hours): 103 hours, broken down by:
  • Self-directed study and completion of assignments: 103 hours
Total Hours: 125 hours

Recommended Reading List

  • Christopher D. Manning, Prabhakar Raghavan, Hinrich Schütze. Introduction to Information Retrieval (2008), 1st edition, Cambridge University Press, 506 pp., ISBN 978-0521865715. https://nlp.stanford.edu/IR-book/
  • Ricardo Baeza-Yates, Berthier Ribeiro-Neto. Modern Information Retrieval: The Concepts and Technology Behind Search (2010), 2nd edition, Addison Wesley, ISBN 978-0321416919.
  • Bruce Croft, Don Metzler, Trevor Strohman. Search Engines: Information Retrieval in Practice (2015).

Module Pre-requisites

Prerequisite modules: N/A

Other/alternative non-module prerequisites: N/A

Module Co-requisites

N/A

Module Website

Blackboard