ST420-15 Statistical Learning and Big Data
Introductory description
This module runs in Term 2 and is available for students on a course where it is a listed option (subject to restrictions*) and as an Unusual Option to students who have completed the prerequisite modules.
Prerequisites:
Statistics Undergraduate students: ST218 Mathematical Statistics A, ST219 Mathematical Statistics B and ST221 Linear Statistical Modelling.*
MSc in Statistics students: ST903 Statistical Methods and ST952 Introduction to Statistical Practice.*
Master’s in Financial Mathematics students: MA907 Simulation and Machine Learning.
External Undergraduate students: ST220 Introduction to Mathematical Statistics and ST221 Linear Statistical Modelling.*
Module aims
This module introduces students to applications of Statistics in challenging modern data analysis contexts and provides the theoretical underpinnings needed to apply these methods.
Outline syllabus
This is an indicative outline only, intended to show the sort of topics that may be covered. Actual sessions held may differ.
Statistical Learning – an introduction to statistical learning theory, using simple ML methods to illustrate the various ideas:
From overfitting to apparently complex methods which can work well, such as VC dimension and shattering sets.
PAC bounds. Loss functions. Risk (in the learning theoretic sense) and posterior expected risk. Generalisation error.
Supervised, unsupervised and semi-supervised learning.
The use of distinct training, test and validation sets, particularly in the context of prediction problems.
The Bootstrap revisited. Bags of Little Bootstraps. Bootstrap aggregation. Boosting.
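As a small illustration of the "distinct training, test and validation sets" item above, the following Python sketch partitions observation indices at random. The function name `split_indices`, the 60/20/20 proportions and the seed are assumptions chosen for illustration only and are not taken from the module materials.

```python
import random


def split_indices(n, frac_train=0.6, frac_val=0.2, seed=0):
    """Shuffle the indices 0..n-1 and partition them into three disjoint lists.

    The proportions are illustrative assumptions; whatever is left after the
    training and validation parts becomes the test part.
    """
    rng = random.Random(seed)      # local generator, so the split is reproducible
    idx = list(range(n))
    rng.shuffle(idx)
    n_train = int(frac_train * n)
    n_val = int(frac_val * n)
    train = idx[:n_train]                   # used to fit candidate models
    val = idx[n_train:n_train + n_val]      # used to choose between models
    test = idx[n_train + n_val:]            # held back for the final error estimate
    return train, val, test


train, val, test = split_indices(100)
```

Keeping the test indices untouched until the very end is what allows the final error estimate to stand in for generalisation error; reusing them for model selection would bias that estimate downwards.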
Big Data and Big Model – issues and (partial) solutions:
The “curse of dimensionality”. Multiple testing; voodoo correlations, false-discovery rate and family-wise error rate. Corrections: Bonferroni, Benjamini-Hochberg.
Sparsity and Regularisation. Variable selection; regression. Spike and slab priors. Ridge Regression. The Lasso. The Dantzig Selector.
Concentration of measure and related inferential issues.
MCMC in high dimensions – preconditioned Crank-Nicolson; MALA, HMC. Preconditioning. Rates of convergence.
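As a rough illustration of the multiple-testing corrections named in the syllabus, the following Python sketch compares the Bonferroni correction (which controls the family-wise error rate) with the Benjamini-Hochberg step-up procedure (which controls the false-discovery rate under independence). The function names, the p-values and the level alpha = 0.05 are assumptions made for illustration and are not taken from the module materials.

```python
def bonferroni(pvalues, alpha=0.05):
    """Reject hypothesis i when p_i <= alpha / m, where m is the number of tests."""
    m = len(pvalues)
    return [p <= alpha / m for p in pvalues]


def benjamini_hochberg(pvalues, alpha=0.05):
    """Benjamini-Hochberg step-up procedure.

    Find the largest rank k with p_(k) <= k * alpha / m, then reject the
    k hypotheses with the smallest p-values.
    """
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])  # indices by increasing p-value
    k_max = 0
    for rank, i in enumerate(order, start=1):
        if pvalues[i] <= rank * alpha / m:
            k_max = rank
    reject = [False] * m
    for i in order[:k_max]:
        reject[i] = True
    return reject


# Hypothetical p-values, for illustration only.
pvals = [0.001, 0.008, 0.039, 0.041, 0.042, 0.06, 0.074, 0.205, 0.212, 0.216]
n_bonferroni = sum(bonferroni(pvals))    # 1 rejection
n_bh = sum(benjamini_hochberg(pvals))    # 2 rejections
```

On these ten p-values Bonferroni rejects only the smallest one, while Benjamini-Hochberg also rejects the second: the usual trade of a weaker error guarantee for greater power.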
Learning outcomes
By the end of the module, students should be able to:
 Explain, critically discuss and apply fundamental concepts and analytic tools in Statistical Learning;
 Analyse and discuss issues and fundamental tools in the analysis of Big Data and Big Models;
 Implement and assess methods for prediction based on partitioning data;
 Apply fundamental tools based on sparsity, regularisation and the control of error rates to analyse large data sets.
Indicative reading list
View reading list on Talis Aspire
Subject specific skills
TBC
Transferable skills
TBC
Study time
Type  Required  Optional 

Lectures  30 sessions of 1 hour (20%)  2 sessions of 1 hour 
Private study  90 hours (60%)  
Assessment  30 hours (20%)  
Total  150 hours 
Private study description
Weekly revision of lecture notes and materials, wider reading, practice exercises and preparing for examination.
Costs
No further costs have been identified for this module.
You do not need to pass all assessment components to pass the module.
Students can register for this module without taking any assessment.
Assessment group D
Weighting  Study time  

Assignment 1  10%  15 hours 
Due Term 2 Week 6. 

Assignment 2  10%  15 hours 
Due Term 2 Week 9. 

Online Examination  80%  
The examination paper will contain four questions; your grade will be calculated from the marks of your best THREE answers.
Platforms: Moodle

Assessment group R
Weighting  Study time  

Online Examination  Resit  100%  
The examination paper will contain four questions; your grade will be calculated from the marks of your best THREE answers.
Platforms: Moodle

Feedback on assessment
Solutions and cohort level feedback will be provided for the examination. Individual scripts are retained for external examiners and will not be returned.
Courses
This module is Optional for:
 Year 1 of TMAAG1PE Master of Advanced Study in Mathematical Sciences
 Year 1 of TMAAG1P0 Postgraduate Taught Mathematics
 Year 1 of TMAAG1PC Postgraduate Taught Mathematics (Diploma plus MSc)
 Year 1 of TSTAG4P1 Postgraduate Taught Statistics

USTAG300 Undergraduate Master of Mathematics, Operational Research, Statistics and Economics
 Year 3 of G300 Mathematics, Operational Research, Statistics and Economics
 Year 4 of G300 Mathematics, Operational Research, Statistics and Economics
This module is Option list A for:
 Year 4 of USTAG300 Undergraduate Master of Mathematics, Operational Research, Statistics and Economics
 Year 5 of USTAG301 Undergraduate Master of Mathematics, Operational Research, Statistics and Economics (with Intercalated

USTAG1G3 Undergraduate Mathematics and Statistics (BSc MMathStat)
 Year 3 of G1G3 Mathematics and Statistics (BSc MMathStat)
 Year 4 of G1G3 Mathematics and Statistics (BSc MMathStat)
 Year 4 of USTAG1G4 Undergraduate Mathematics and Statistics (BSc MMathStat) (with Intercalated Year)
This module is Option list B for:
 Year 4 of USTAG304 Undergraduate Data Science (MSci)
This module is Option list D for:

USTAG300 Undergraduate Master of Mathematics, Operational Research, Statistics and Economics
 Year 4 of G30C Master of Maths, Op.Res, Stats & Economics (Operational Research and Statistics Stream)
This module is Option list E for:
 Year 4 of USTAG300 Undergraduate Master of Mathematics, Operational Research, Statistics and Economics
 Year 5 of USTAG301 Undergraduate Master of Mathematics, Operational Research, Statistics and Economics (with Intercalated