Module Code | STU44003 |
Module Name | Data Analytics |
ECTS Weighting | 10 ECTS |
Semester taught | Semesters 1 & 2 |
Module Coordinator/s | Dr. Bahman Honari |
Module Learning Outcomes
On successful completion of this module, students will be able to:
LO1. Identify, compare and select appropriate analysis and modelling techniques for a range of applications.
LO2. Deploy and document an appropriate set of self-selected analysis techniques in response to defined problem areas.
LO3. Use appropriate statistical packages (in R or Python) to perform the analysis and to present and interpret the results effectively.
Module Content
- Overview of the field
- Review of Probability Theory
- Introduction to Monte Carlo Methods and Simulation
- Review of Hypothesis Testing
- Analysis of Categorical Data
- Concepts from Information Theory: Entropy, Mutual Information, Conditional Entropy, and Information Gain
- Using CHAID in Classification Trees
- Using the Gini Index in Classification Trees
- Detailed Discussion of Classification and Regression Trees (CART)
- Overfitting and Techniques to Avoid It (Cross-Validation, Bagging, Boosting, Random Forests, …)
- The RuleFit Procedure and Model Evaluation
- Handling Imbalanced Datasets
- Concept of Similarity and Distance
- Distance Measures for Various Data Types
- Hierarchical Cluster Analysis
- Principal Component Analysis
- Concepts of Missing Data and Missingness Mechanisms
- Methods of Missing Data Imputation (MDI)
- Using the MICE Package in R for MDI
- Introduction to Bayesian Statistics
- Examples of Applications of Bayesian Statistics (Gibbs Sampling, …)
Teaching and Learning Methods
Lectures and lab sessions.
Assessment Details
Assessment Component | Brief Description | Learning Outcomes Addressed | % of total | Week set | Week Due |
In-class/Online Test | 2-hour test | All | 20% | Week 12 Semester 1 | N/A |
In-class/Online Test | 2-hour test | All | 20% | Week 12 Semester 2 | N/A |
Project | End-of-year project | All | 60% | Week 9 Semester 1 | End of Semester 2 |
Reassessment Details
Examination (2 hours, 100%)
Contact Hours and Indicative Student Workload
Contact Hours (scheduled hours per student over the full module), broken down by: | 54 hours |
Lecture | 44 hours |
Laboratory | 10 hours |
Independent study (outside scheduled contact hours), broken down by: | 40 hours |
Preparation for classes and review of material (including preparation for examination, if applicable) | 30 hours |
Completion of assessments (including examination, if applicable) | 10 hours |
Total Hours | 94 hours |
Recommended Reading List
- Peter Bruce and Andrew Bruce, Practical Statistics for Data Scientists, O’Reilly, 2017
- Xin-She Yang, Introduction to Algorithms for Data Mining and Machine Learning, Academic Press, 2019
- Alan Agresti, An Introduction to Categorical Data Analysis, John Wiley and Sons, 2019
- Michael Greenacre and Raúl Primicerio, Multivariate Analysis of Ecological Data, Fundación BBVA, 2013
- Max Kuhn and Kjell Johnson, Applied Predictive Modeling, Springer, 2013
- Trevor Hastie, Robert Tibshirani, and Jerome Friedman, The Elements of Statistical Learning, Springer, 2021
- Pratap Dangeti, Statistics for Machine Learning, Packt, 2017
- Gururajan Govindan, Shubhangi Hora, and Konstantin Palagachev, The Data Analysis Workshop, Packt, 2020
- Stef van Buuren, Flexible Imputation of Missing Data, CRC Press, 2018
- William M. Bolstad and James M. Curran, Introduction to Bayesian Statistics, Wiley, 2017
Module Pre-requisites
Prerequisite modules: This is a year 4 module.
Other/alternative non-module prerequisites: NA
Module Co-requisites
None