|Module Name||Data Analytics|
|ECTS Weighting||10 ECTS|
|Semester Taught||Semester 1 & 2|
|Module Coordinator/s||Dr. Bahman Honari|
Module Learning Outcomes
On successful completion of this module, students will be able to:
- Identify, compare and select appropriate analysis and modelling techniques for a range of applications;
- Deploy and document an appropriate set of self-selected analysis techniques in response to the defined problem areas;
- Demonstrate use of appropriate statistical packages (in either R or Python) to perform the analysis and to present and interpret its results effectively.
Module Content
- Overview of the field;
- Concepts of Information Theory: Entropy, Mutual Information, Conditional Entropy, and Information Gain;
- Using CHAID in Classification Trees;
- Using the Gini Index in Classification Trees;
- Detailed Discussion of Classification and Regression Trees;
- Overfitting and the corresponding techniques to avoid it (Cross Validation, Bagging, Boosting, Random Forest, etc.);
- RuleFit Procedure and Model Evaluation;
- Handling Unbalanced Datasets;
- Concept of Similarity and Distance;
- Distance Measures for Various Data Types;
- Hierarchical Cluster Analysis;
- Principal Component Analysis;
- Concepts of Data Missingness and Its Mechanisms;
- Methods of Missing Data Imputation (MDI);
- Using the mice package in R for MDI;
- Introduction to Monte Carlo Methods and Simulation;
- Introduction to Bayesian Statistics;
- Examples of applications of Bayesian Statistics (Gibbs Sampling, etc.).
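As an informal illustration of the information-theoretic quantities listed above (entropy, information gain, and the Gini index as used in classification trees), here is a minimal Python sketch. It is not taken from the module materials: the function names and the toy data are hypothetical, and only the standard library is used.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a sequence of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def gini(labels):
    """Gini impurity of a sequence of class labels."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def information_gain(labels, feature):
    """Entropy reduction obtained by splitting labels on a categorical feature."""
    n = len(labels)
    groups = {}
    for y, x in zip(labels, feature):
        groups.setdefault(x, []).append(y)
    # Conditional entropy: weighted average entropy of each branch.
    conditional = sum(len(g) / n * entropy(g) for g in groups.values())
    return entropy(labels) - conditional


# Hypothetical toy data: one feature separates the classes perfectly,
# the other carries no information about the class.
labels = ["yes", "yes", "no", "no"]
informative = ["a", "a", "b", "b"]
uninformative = ["a", "b", "a", "b"]

print(entropy(labels))
print(gini(labels))
print(information_gain(labels, informative))
print(information_gain(labels, uninformative))
```

A tree-growing algorithm would typically evaluate such a criterion for every candidate split and choose the split with the largest impurity reduction.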
Teaching and Learning Methods
Lectures and laboratories.
|Assessment Component||Brief Description||Learning Outcomes Addressed||% of Total||Week Set||Week Due|
|Examination||Written Real-Time Examination (2 hours)||All||40%||The End of Semester 2||N/A|
|Project||End of Year Project||All||60%||Week 8 (Semester 1)||The End of Semester 2|
Examination (2 hours, 100%).
Contact Hours and Indicative Student Workload
|Contact Hours (scheduled hours per student over full module), broken down by:||54 hours|
|Independent Study (outside scheduled contact hours), broken down by:||60 hours|
|Preparation for classes and review of material (including preparation for examination, if applicable)||30 hours|
|Completion of assessments (including examination, if applicable)||30 hours|
|Total Hours||114 hours|
Recommended Reading List
- Peter Bruce and Andrew Bruce, Practical Statistics for Data Scientists, O’Reilly, 2017.
- Xin-She Yang, Introduction to Algorithms for Data Mining and Machine Learning, Academic Press, 2019.
- Alan Agresti, An Introduction to Categorical Data Analysis, John Wiley and Sons, 2019.
- Michael Greenacre and Raul Primicerio, Multivariate Analysis of Ecological Data, Fundación BBVA, 2013.
- Max Kuhn and Kjell Johnson, Applied Predictive Modeling, Springer, 2013.
- Trevor Hastie, Robert Tibshirani, and Jerome Friedman, The Elements of Statistical Learning, Springer, 2021.
- Pratap Dangeti, Statistics for Machine Learning, Packt, 2017.
- Gururajan Govindan, Shubhangi Hora, and Konstantin Palagachev, The Data Analysis Workshop, Packt, 2020.
- Stef van Buuren, Flexible Imputation of Missing Data, CRC Press, 2018.
- William M. Bolstad and James M. Curran, Introduction to Bayesian Statistics, Wiley, 2017.
Prerequisite modules: N/A
Other/alternative non-module prerequisites: A course on Multivariate Analysis covering principal components, multiple regression, clustering techniques, and logistic regression. A good working knowledge of R is also required.