ST420-15 Statistical Learning and Big Data
Introductory description
This module introduces students to applications of Statistics in challenging modern data analysis contexts and provides the theoretical underpinnings needed to apply these methods.
This module is available for students on a course where it is a listed option and as an Unusual Option to students who have completed the prerequisite modules.
Pre-requisites
Statistics students:
- ST218 Mathematical Statistics A; or,
- ST228 Mathematical Methods for Statistics and Probability and ST229 Probability for Mathematical Statistics.
- ST231 Linear Statistical Modelling with R.
Non-statistics students:
- ST121 Statistical Laboratory and ST232/ST233 Introduction to Mathematical Statistics.
- ST240 Linear Statistical Modelling or ST351 Linear Statistical Modelling (For Finalists)
MSc students:
- ST961 Statistical Methods and Practice and ST962 Advanced Topics in Statistics and Probability; or,
- MA907 Simulation and Machine Learning.
Module aims
This module aims to provide an introduction to statistical learning theory, using machine learning methods to illustrate the various concepts.
Outline syllabus
This outline is indicative only and shows the sort of topics that may be covered. Actual sessions held may differ.
The module will examine a variety of areas such as those illustrated below.
- From over-fitting to apparently complex methods which can work well, such as VC dimension and shattering sets.
- PAC bounds. Loss functions. Risk (in the learning theoretic sense) and posterior expected risk. Generalisation error.
- Supervised, unsupervised and semi-supervised learning.
- The use of distinct training, test and validation sets, particularly in the context of prediction problems.
- The Bootstrap revisited. Bags of Little Bootstraps. Bootstrap aggregation. Boosting.
- Big Data and Big Model – issues and (partial) solutions:
- The “curse of dimensionality”. Multiple testing; voodoo correlations, false-discovery rate and family-wise error rate.
- Corrections: Bonferroni, Benjamini-Hochberg.
- Sparsity and Regularisation. Variable selection; regression. Spike and slab priors. Ridge Regression. The Lasso. The Dantzig Selector.
- Concentration of measure and related inferential issues.
- MCMC in high dimensions – preconditioned Crank-Nicolson; MALA, HMC. Preconditioning. Rates of convergence.
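As an illustration of the multiple-testing corrections listed above, the following is a minimal sketch (not taken from the module materials) of the Bonferroni and Benjamini-Hochberg procedures in plain Python. The function names and the p-values are hypothetical and chosen only for illustration; the level `alpha = 0.05` is likewise an assumption.

```python
def bonferroni(p_values, alpha=0.05):
    """Reject H_i when p_i <= alpha / m; this controls the family-wise error rate."""
    m = len(p_values)
    return [p <= alpha / m for p in p_values]


def benjamini_hochberg(p_values, alpha=0.05):
    """Step-up procedure: find the largest rank k with p_(k) <= k * alpha / m
    and reject the hypotheses with the k smallest p-values. This controls the
    false-discovery rate when the tests are independent."""
    m = len(p_values)
    # Indices of the p-values, sorted from smallest to largest p-value.
    order = sorted(range(m), key=lambda i: p_values[i])
    k_max = 0
    for rank, i in enumerate(order, start=1):
        if p_values[i] <= rank * alpha / m:
            k_max = rank
    reject = [False] * m
    for i in order[:k_max]:
        reject[i] = True
    return reject


# Hypothetical p-values, for illustration only.
p_values = [0.001, 0.008, 0.039, 0.041, 0.042, 0.06, 0.074, 0.205]
bonf = bonferroni(p_values)
bh = benjamini_hochberg(p_values)
print("Bonferroni rejections:", sum(bonf))
print("Benjamini-Hochberg rejections:", sum(bh))
```

On these values the Benjamini-Hochberg procedure rejects at least as many hypotheses as Bonferroni, which reflects the weaker error criterion it controls.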
Learning outcomes
By the end of the module, students should be able to:
- Explain, critically discuss and apply fundamental concepts and analytic tools in Statistical Learning;
- Analyse and discuss issues and fundamental tools in the analysis of Big Data and Big Models;
- Implement and assess methods for prediction based on partitioning data;
- Apply fundamental tools based on sparsity, regularisation and the control of error rates to analyse large data sets.
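To make the outcome on prediction based on partitioning data concrete, here is a minimal sketch, assuming simulated data and a one-dimensional ridge regression without intercept. The helper names (`split_indices`, `fit_ridge_1d`, `mse`, `pick`), the 60/20/20 split and the penalty grid are all assumptions made for this illustration and are not prescribed by the module.

```python
import random


def split_indices(n, frac_train=0.6, frac_val=0.2, seed=0):
    """Shuffle 0..n-1 and cut into disjoint train / validation / test index lists."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    n_train = round(frac_train * n)
    n_val = round(frac_val * n)
    return idx[:n_train], idx[n_train:n_train + n_val], idx[n_train + n_val:]


def fit_ridge_1d(x, y, lam):
    """Minimise sum (y_i - b * x_i)^2 + lam * b^2 over b (closed form)."""
    return sum(a * b for a, b in zip(x, y)) / (sum(a * a for a in x) + lam)


def mse(x, y, beta):
    """Mean squared prediction error of the fitted slope on (x, y)."""
    return sum((b - beta * a) ** 2 for a, b in zip(x, y)) / len(x)


def pick(values, idx):
    """Select the entries of `values` at the given indices."""
    return [values[i] for i in idx]


# Simulated data (hypothetical): y = 2x + Gaussian noise.
rng = random.Random(1)
xs = [rng.uniform(-1, 1) for _ in range(200)]
ys = [2.0 * x + rng.gauss(0, 0.5) for x in xs]

train, val, test = split_indices(len(xs))

# Fit on the training set for each candidate penalty.
candidates = [0.0, 0.1, 1.0, 10.0, 100.0]
fits = {lam: fit_ridge_1d(pick(xs, train), pick(ys, train), lam) for lam in candidates}

# Choose the penalty using the validation set only.
best_lam = min(candidates, key=lambda lam: mse(pick(xs, val), pick(ys, val), fits[lam]))

# The test set is used once, to estimate the generalisation error.
test_error = mse(pick(xs, test), pick(ys, test), fits[best_lam])
print("chosen penalty:", best_lam, "test MSE:", round(test_error, 3))
```

Keeping the test set out of both fitting and penalty selection is the design choice that matters here: the validation error of the chosen model is optimistically biased, while the test error is not.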
Indicative reading list
View reading list on Talis Aspire
Subject specific skills
- Evaluate, select and apply appropriate mathematical and/or probabilistic techniques.
- Demonstrate knowledge of and facility with formal probability concepts, both explicitly and by applying them to the solution of problems.
- Create structured and coherent arguments, communicating them in written form.
- Construct logical mathematical arguments with clear identification of assumptions and conclusions.
- Reason critically, carefully, and logically and derive (prove) mathematical results.
Transferable skills
- Problem solving: Use rational and logical reasoning to deduce appropriate and well-reasoned conclusions. Retain an open mind, optimistic of finding solutions, thinking laterally and creatively to look beyond the obvious. Know how to learn from failure.
- Self awareness: Reflect on learning, seeking feedback on and evaluating personal practices, strengths and opportunities for personal growth.
- Communication: Present arguments, knowledge and ideas, in a range of formats.
- Professionalism: Prepared to operate autonomously. Aware of how to be efficient and resilient. Manage priorities and time. Self-motivated, setting and achieving goals, prioritising tasks.
Study time
Type | Required | Optional |
---|---|---|
Lectures | 30 sessions of 1 hour (20%) | 2 sessions of 1 hour |
Private study | 90 hours (60%) | |
Assessment | 30 hours (20%) | |
Total | 150 hours | |
Private study description
Weekly revision of lecture notes and materials, wider reading, practice exercises and preparing for examination.
Costs
No further costs have been identified for this module.
You do not need to pass all assessment components to pass the module.
Students can register for this module without taking any assessment.
Assessment group D5
Component | Weighting | Study time | Eligible for self-certification |
---|---|---|---|
Assignment 1 | 10% | 15 hours | No |
Assignment 2 | 10% | 15 hours | No |
In-person Examination | 80% | | No |
Assignment 1: The assignment will contain a number of questions for which solutions and / or written responses will be required.
Assignment 2: The deadline for the assignment can be found in the Statistics Assessment Handbook (http://warwick.ac.uk/STassessmenthandbook).
In-person Examination: The examination paper will contain four questions, of which the best marks of THREE questions will be used to calculate your grade.
Assessment group R5
Component | Weighting | Study time | Eligible for self-certification |
---|---|---|---|
In-person Examination - Resit | 100% | | No |
In-person Examination - Resit: The examination paper will contain four questions, of which the best marks of THREE questions will be used to calculate your grade.
Feedback on assessment
Solutions and cohort level feedback will be provided for the examination. Individual scripts are retained for external examiners and will not be returned.
Courses
This module is Optional for:
- Year 1 of TIBS-N3G1 Postgraduate Taught Financial Mathematics
- TSTA-G4P1 Postgraduate Taught Statistics
  - Year 1 of G4P1 Statistics (Taught)
  - Year 1 of G40B Statistics with Data Science (Taught)
  - Year 1 of G40C Statistics with Finance (Taught)
  - Year 1 of G40A Statistics with Probability (Taught)
- Year 4 of USTA-G304 Undergraduate Data Science (MSci)
- Year 5 of USTA-G305 Undergraduate Data Science (MSci) (with Intercalated Year)
- USTA-G300 Undergraduate Master of Mathematics, Operational Research, Statistics and Economics
  - Year 3 of G30A Master of Maths, Op.Res, Stats & Economics (Actuarial and Financial Mathematics Stream)
  - Year 3 of G30J Master of Maths, Op.Res, Stats & Economics (Data Analysis Stream)
  - Year 3 of G30B Master of Maths, Op.Res, Stats & Economics (Econometrics and Mathematical Economics Stream)
  - Year 3 of G30C Master of Maths, Op.Res, Stats & Economics (Operational Research and Statistics Stream)
  - Year 3 of G30D Master of Maths, Op.Res, Stats & Economics (Statistics with Mathematics Stream)
  - Year 3 of G300 Mathematics, Operational Research, Statistics and Economics
  - Year 4 of G30A Master of Maths, Op.Res, Stats & Economics (Actuarial and Financial Mathematics Stream)
  - Year 4 of G30J Master of Maths, Op.Res, Stats & Economics (Data Analysis Stream)
  - Year 4 of G30B Master of Maths, Op.Res, Stats & Economics (Econometrics and Mathematical Economics Stream)
  - Year 4 of G30C Master of Maths, Op.Res, Stats & Economics (Operational Research and Statistics Stream)
  - Year 4 of G30D Master of Maths, Op.Res, Stats & Economics (Statistics with Mathematics Stream)
  - Year 4 of G300 Mathematics, Operational Research, Statistics and Economics
- USTA-G301 Undergraduate Master of Mathematics, Operational Research, Statistics and Economics (with Intercalated
  - Year 4 of G301 BSc Master of Mathematics, Operational Research, Statistics and Economics (with Intercalated Year)
  - Year 4 of G30E Master of Maths, Op.Res, Stats & Economics (Actuarial and Financial Mathematics Stream) Int
  - Year 4 of G30K Master of Maths, Op.Res, Stats & Economics (Data Analysis Stream) Int
  - Year 4 of G30F Master of Maths, Op.Res, Stats & Economics (Econometrics and Mathematical Economics Stream) Int
  - Year 4 of G30G Master of Maths, Op.Res, Stats & Economics (Operational Research and Statistics Stream) Int
  - Year 4 of G30H Master of Maths, Op.Res, Stats & Economics (Statistics with Mathematics Stream)
  - Year 5 of G301 BSc Master of Mathematics, Operational Research, Statistics and Economics (with Intercalated Year)
  - Year 5 of G30E Master of Maths, Op.Res, Stats & Economics (Actuarial and Financial Mathematics Stream) Int
  - Year 5 of G30K Master of Maths, Op.Res, Stats & Economics (Data Analysis Stream) Int
  - Year 5 of G30F Master of Maths, Op.Res, Stats & Economics (Econometrics and Mathematical Economics Stream) Int
  - Year 5 of G30G Master of Maths, Op.Res, Stats & Economics (Operational Research and Statistics Stream) Int
  - Year 5 of G30H Master of Maths, Op.Res, Stats & Economics (Statistics with Mathematics Stream)
- USTA-G1G3 Undergraduate Mathematics and Statistics (BSc MMathStat)
  - Year 3 of G1G3 Mathematics and Statistics (BSc MMathStat)
  - Year 4 of G1G3 Mathematics and Statistics (BSc MMathStat)