STU44003 – Data Analytics

Module Code: STU44003
Module Name: Data Analytics
ECTS Weighting: 10 ECTS
Semester taught: Semester 1
Module Coordinator/s: Dr. Bahman Honari

Module Learning Outcomes

On successful completion of this module, students will be able to:

LO1. Identify, compare and select appropriate analysis and modelling techniques for a range of applications.

LO2. Deploy and document an appropriate set of self-selected analysis techniques in response to the defined problem areas.

LO3. Demonstrate use of appropriate statistical packages (in either R or Python) to perform the analysis and to present and interpret its results effectively.

Module Content

  • Overview of the field
  • Concepts of Information Theory: Entropy, Mutual Information, Conditional Entropy, and Information Gain
  • Using CHAID in Classification Trees
  • Using the Gini Index in Classification Trees
  • Detailed Discussion of Classification and Regression Trees
  • Overfitting and techniques to avoid it (Cross Validation, Bagging, Boosting, Random Forest, …)
  • Rule Fit Procedure and Model Evaluation
  • Handling Unbalanced Datasets
  • Concept of Similarity and Distance
  • Distance Measures for Various Data Types
  • Hierarchical Cluster Analysis
  • Principal Component Analysis
  • Concepts of Data Missingness and Its Mechanism
  • Methods of Missing Data Imputation (MDI)
  • Using the MICE package in R for MDI
  • Introduction to Monte Carlo Methods and Simulation
  • Introduction to Bayesian Statistics
  • Examples of applications of Bayesian Statistics (Gibbs Sampling, …)
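
For orientation, the entropy, information gain, and Gini index measures named in the list above can be sketched in a few lines of Python. This is a minimal illustration and not module material: the function names and the toy labels are assumptions made for the example, and only the standard library is used.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy, in bits, of a non-empty sequence of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def gini(labels):
    """Gini impurity of a non-empty sequence of class labels."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def information_gain(labels, groups):
    """Entropy of `labels` minus the size-weighted entropy of `groups`.

    `groups` is a list of label lists that together partition `labels`,
    as produced by one candidate split in a classification tree.
    """
    n = len(labels)
    weighted = sum(len(g) / n * entropy(g) for g in groups)
    return entropy(labels) - weighted


# Hypothetical toy data: 5 "yes" and 5 "no" labels, and one candidate split.
parent = ["yes"] * 5 + ["no"] * 5
split = [["yes"] * 4 + ["no"], ["yes"] + ["no"] * 4]

print("entropy:", entropy(parent))
print("gini:", gini(parent))
print("information gain:", information_gain(parent, split))
```

A tree-growing procedure would evaluate many candidate splits this way and keep the one with the largest information gain (or the largest reduction in Gini impurity).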

Teaching and Learning Methods

4 lectures and 1 lab per week.

Assessment Details

Assessment Component: Assignment
Brief Description: Assignment – Practical application of techniques using an actual data set
Learning Outcomes Addressed: LO1, LO2, LO3
% of total: 30
Week set: Week 8 of term
Week Due: End of Semester

Assessment Component: Exam
Brief Description: Real-time Exam (2 hours)
Learning Outcomes Addressed: LO1, LO2, LO3
% of total: 70
Week set / Week Due: Exam week

Reassessment Details

Assignment (30%) and Final Exam (70%). Individual assignment 100%

Contact Hours and Indicative Student Workload

Contact Hours (scheduled hours per student over full module): 54 hours, broken down by:
  Lecture: 44 hours
  Laboratory: 10 hours
Independent study (outside scheduled contact hours): 40 hours, broken down by:
  Preparation for classes and review of material (including preparation for examination, if applicable): 30 hours
  Completion of assessments (including examination, if applicable): 10 hours
Total Hours: 94 hours

Recommended Reading List

  • Peter Bruce and Andrew Bruce, Practical Statistics for Data Scientists, O’Reilly, 2017
  • Xin-She Yang, Introduction to Algorithms for Data Mining and Machine Learning, Academic Press, 2019
  • Alan Agresti, An Introduction to Categorical Data Analysis, John Wiley and Sons, 2019
  • Michael Greenacre and Raul Primicerio, Multivariate Analysis of Ecological Data, Fundación BBVA, 2013
  • Max Kuhn and Kjell Johnson, Applied Predictive Modeling, Springer, 2013
  • Trevor Hastie, Robert Tibshirani, and Jerome Friedman, The Elements of Statistical Learning, Springer, 2021
  • Pratap Dangeti, Statistics for Machine Learning, Packt, 2017
  • Gururajan Govindan, Shubhangi Hora, and Konstantin Palagachev, The Data Analysis Workshop, Packt, 2020
  • Stef van Buuren, Flexible Imputation of Missing Data, CRC Press, 2018
  • William M. Bolstad, James M. Curran, Introduction to Bayesian Statistics, Wiley, 2017

Module Pre-requisites

Prerequisite modules: NA

Other/alternative non-module prerequisites: NA

Module Co-requisites

None

Module Website

Blackboard