Module Code | CS7DS1 |
Module Name | Data Analytics |
ECTS Weighting | 10 ECTS |
Semester Taught | Semesters 1 & 2 |
Module Coordinator/s | Dr. Bahman Honari |
Module Learning Outcomes
On successful completion of this module, students will be able to:
- Identify, compare and select appropriate analysis and modelling techniques for a range of applications;
- Deploy and document an appropriate set of self-selected analysis techniques in response to the defined problem areas;
- Demonstrate use of appropriate statistical packages (in either R or Python) to perform the analysis and to present and interpret the results effectively.
Module Content
- Overview of the field;
- Review of Probability Theory;
- Introduction to Monte Carlo Methods and Simulation;
- Review of Hypothesis Testing;
- Analysis of Categorical Data;
- Concepts of Information Theory: Entropy, Mutual Information, Conditional Entropy, and Information Gain;
- Using CHAID in Classification Trees;
- Using the Gini Index in Classification Trees;
- Detailed Discussion of Classification and Regression Trees;
- Overfitting and techniques to avoid it (Cross-Validation, Bagging, Boosting, Random Forest, etc.);
- RuleFit Procedure and Model Evaluation;
- Handling Unbalanced Datasets;
- Concept of Similarity and Distance;
- Distance Measures for Various Data Types;
- Hierarchical Cluster Analysis;
- Principal Component Analysis;
- Concepts of Data Missingness and Its Mechanisms;
- Methods of Missing Data Imputation (MDI);
- Using the MICE package in R for MDI;
- Introduction to Bayesian Statistics;
- Examples of applications of Bayesian Statistics (Gibbs Sampling, etc.).
Teaching and Learning Methods
Lectures and laboratories.
Assessment Details
Assessment Component | Brief Description | Learning Outcomes Addressed | % of Total | Week Set | Week Due |
In-class/Online Test | 2-hour test | All | 20% | Week 12 Semester 1 | N/A |
In-class/Online Test | 2-hour test | All | 20% | Week 12 Semester 2 | N/A |
Project | End-of-year project | All | 60% | Week 9 Semester 1 | End of Semester 2 |
Reassessment Details
Examination (2 hours, 100%).
Contact Hours and Indicative Student Workload
Contact Hours (scheduled hours per student over full module), broken down by: | 54 hours |
Lecture | 44 hours |
Laboratory | 10 hours |
Independent Study (outside scheduled contact hours), broken down by: | 60 hours |
Preparation for classes and review of material (including preparation for examination, if applicable) | 30 hours |
Completion of assessments (including examination, if applicable) | 30 hours |
Total Hours | 114 hours |
Recommended Reading List
- Peter Bruce and Andrew Bruce, Practical Statistics for Data Scientists, O’Reilly, 2017.
- Xin-She Yang, Introduction to Algorithms for Data Mining and Machine Learning, Academic Press, 2019.
- Alan Agresti, An Introduction to Categorical Data Analysis, John Wiley and Sons, 2019.
- Michael Greenacre and Raul Primicerio, Multivariate Analysis of Ecological Data, Fundación BBVA, 2013.
- Max Kuhn and Kjell Johnson, Applied Predictive Modeling, Springer, 2013.
- Trevor Hastie, Robert Tibshirani, and Jerome Friedman, The Elements of Statistical Learning, Springer, 2021.
- Pratap Dangeti, Statistics for Machine Learning, Packt, 2017.
- Gururajan Govindan, Shubhangi Hora, and Konstantin Palagachev, The Data Analysis Workshop, Packt, 2020.
- Stef van Buuren, Flexible Imputation of Missing Data, CRC Press, 2018.
- William M. Bolstad and James M. Curran, Introduction to Bayesian Statistics, Wiley, 2017.
Module Pre-requisites
Prerequisite modules: N/A
Other/alternative non-module prerequisites: A course on Multivariate Analysis covering principal components, multiple regression, clustering techniques, and logistic regression. A good working knowledge of R is also required.
Module Co-requisites
N/A