CS7DS1 – Data Analytics – Teaching and Learning

Module Code	CS7DS1
Module Name	Data Analytics
ECTS Weighting	10 ECTS
Semester Taught	Semester 1 & 2
Module Coordinator/s	Profs. Alessio Benavoli (Semester 1) and Athanasios Georgiadis (Semester 2)

Module Learning Outcomes

On successful completion of this module, students will be able to:

Identify, compare and select appropriate analysis and modelling techniques for a range of applications;
Deploy and document an appropriate set of self-selected analysis techniques in response to the defined problem areas;
Demonstrate use of appropriate statistical packages (in either R or Python) to perform the analysis and to present and interpret its results effectively.

Module Content

Overview of the Field;
Review of Probability Theory;
Introduction to Monte Carlo Methods and Simulation;
Review of Hypothesis Testing;
Analysis of Categorical Data;
Concepts of Information Theory: Entropy, Mutual Information, Conditional Entropy, and Information Gain;
Using CHAID in Classification Trees;
Using the Gini Index in Classification Trees;
Detailed Discussion of Classification and Regression Trees;
Overfitting and techniques to avoid it (Cross-Validation, Bagging, Boosting, Random Forest, etc.);
Rule Fit Procedure and Model Evaluation;
Handling Imbalanced Datasets;
Concept of Similarity and Distance;
Distance Measures for Various Data Types;
Hierarchical Cluster Analysis;
Principal Component Analysis;
Concepts of Data Missingness and Its Mechanisms;
Methods of Missing Data Imputation (MDI);
Using the MICE package in R for MDI;
Nonparametric Methods;
Introduction to Bayesian Statistics;
Examples of Applications of Bayesian Statistics (Gibbs Sampling, etc.).

Teaching and Learning Methods

Lectures and laboratories.

Assessment Details

Assessment Component	Brief Description	Learning Outcomes Addressed	% of Total	Week Set	Week Due
Coursework	Semester 1	All	20%
Coursework	Semester 2	All	20%
Examination	In-person (2 hours)	All	60%		Exam session in Semester 2

Reassessment Details

Examination (2 hours, 100%).

Contact Hours and Indicative Student Workload

Contact Hours (scheduled hours per student over full module), broken down by:	54 hours
Lecture	44 hours
Laboratory	10 hours
Independent Study (outside scheduled contact hours), broken down by:	60 hours
Preparation for classes and review of material (including preparation for examination, if applicable)	30 hours
Completion of assessments (including examination, if applicable)	30 hours
Total Hours	114 hours

Module Pre-requisites

Prerequisite modules: N/A

Other/alternative non-module prerequisites: A course on Multivariate Analysis covering principal components, multiple regression, clustering techniques, and logistic regression. A good working knowledge of R is also required.

Module Co-requisites

N/A

Module Website

Blackboard