CS7DS1 – Data Analytics

Module Code: CS7DS1
Module Name: Data Analytics
ECTS Weighting: 10 ECTS
Semester Taught: Semester 1 & 2
Module Coordinator/s: Dr. Bahman Honari

Module Learning Outcomes

On successful completion of this module, students will be able to:

  1. Identify, compare and select appropriate analysis and modelling techniques for a range of applications;
  2. Deploy and document an appropriate set of self-selected analysis techniques in response to defined problem areas;
  3. Demonstrate use of appropriate statistical packages (in either R or Python) to perform the analysis and to present and interpret its results effectively.

Module Content

  • Overview of the field;
  • Review of Probability Theory;
  • Introduction to Monte Carlo Methods and Simulation;
  • Review of Hypothesis Testing;
  • Analysis of Categorical Data;
  • Concepts of Information Theory: Entropy, Mutual Information, Conditional Entropy, and Information Gain;
  • Using CHAID in Classification Trees;
  • Using the Gini Index in Classification Trees;
  • Detailed Discussion of Classification and Regression Trees;
  • Overfitting and techniques to avoid it (Cross-Validation, Bagging, Boosting, Random Forest, etc.);
  • RuleFit Procedure and Model Evaluation;
  • Handling Imbalanced Datasets;
  • Concept of Similarity and Distance;
  • Distance Measures for Various Data Types;
  • Hierarchical Cluster Analysis;
  • Principal Component Analysis;
  • Concepts of Data Missingness and Its Mechanisms;
  • Methods of Missing Data Imputation (MDI);
  • Using the MICE package in R for MDI;
  • Introduction to Bayesian Statistics;
  • Examples of applications of Bayesian Statistics (Gibbs Sampling, etc.).

Teaching and Learning Methods

Lectures and laboratories.

Assessment Details

Assessment Component | Brief Description | Learning Outcomes Addressed | % of Total | Week Set | Week Due
Examination | Written Real-Time Examination (2 hours) | All | 30% | End of Semester 2 | N/A
Project | End of Year Project | All | 70% | Week 10 (Semester 1) | End of Semester 2

Reassessment Details

Examination (2 hours, 100%).

Contact Hours and Indicative Student Workload

Contact Hours (scheduled hours per student over full module), broken down by: 54 hours
  Lecture: 44 hours
  Laboratory: 10 hours
Independent Study (outside scheduled contact hours), broken down by: 60 hours
  Preparation for classes and review of material (including preparation for examination, if applicable): 30 hours
  Completion of assessments (including examination, if applicable): 30 hours
Total Hours: 114 hours

Recommended Reading List

  • Peter Bruce and Andrew Bruce, Practical Statistics for Data Scientists, O’Reilly, 2017.
  • Xin-She Yang, Introduction to Algorithms for Data Mining and Machine Learning, Academic Press, 2019.
  • Alan Agresti, An Introduction to Categorical Data Analysis, John Wiley and Sons, 2019.
  • Michael Greenacre and Raul Primicerio, Multivariate Analysis of Ecological Data, Fundación BBVA, 2013.
  • Max Kuhn and Kjell Johnson, Applied Predictive Modeling, Springer, 2013.
  • Trevor Hastie, Robert Tibshirani, and Jerome Friedman, The Elements of Statistical Learning, Springer, 2021.
  • Pratap Dangeti, Statistics for Machine Learning, Packt, 2017.
  • Gururajan Govindan, Shubhangi Hora, and Konstantin Palagachev, The Data Analysis Workshop, Packt, 2020.
  • Stef van Buuren, Flexible Imputation of Missing Data, CRC Press, 2018.
  • William M. Bolstad and James M. Curran, Introduction to Bayesian Statistics, Wiley, 2017.

Module Pre-requisites

Prerequisite modules: N/A

Other/alternative non-module prerequisites: A course on Multivariate Analysis covering principal components, multiple regression, clustering techniques, and logistic regression. A good working knowledge of R is also required.

Module Co-requisites

N/A

Module Website

Blackboard