ST963-15 Theory of Data Science
Introductory description
This module provides a graduate-level exploration of the analysis of large datasets, giving students key experience of the work of modern data scientists. In the modern world it is very easy to generate large datasets. Capturing and exploiting the important information contained within such datasets poses a number of statistical challenges. It may not even be clear how much useful information the data contains.
This module is not available to undergraduate students or as an unusual option.
Module aims
This module aims to:
- Develop students’ understanding of the theoretical foundations of modern machine learning algorithms.
- Introduce key techniques for both supervised and unsupervised learning.
- Provide relevant background to prepare students for further advanced modules in machine learning and for a career in data science.
Outline syllabus
This is an indicative module outline only, intended to show the sort of topics that may be covered. Actual sessions held may differ.
The module will cover the theory underlying many contemporary data analysis problems and algorithms.
- Fundamentals of statistical learning: Supervised vs unsupervised, regression vs classification, over- vs under-fitting, curse of dimensionality, training vs testing, validation and cross-validation, regularisation.
- Supervised learning methods. An indicative list is: logistic regression, k-nearest neighbours, support vector machines, Gaussian processes, artificial neural networks including transformers; exact details may vary from year to year.
- Unsupervised learning methods. An indicative list is: clustering, dimension reduction such as principal components analysis, modern generative modelling techniques such as variational autoencoders and denoising diffusions; exact details may vary from year to year.
Learning outcomes
By the end of the module, students should be able to:
- Evaluate data analysis problems and select appropriate supervised and unsupervised learning algorithms.
- Critically evaluate the strengths and weaknesses of machine learning algorithms from a theoretical perspective.
- State and prove key results on the performance of some advanced learning methods.
- Interpret the output of various algorithms when applied to data sets.
Indicative reading list
- Drori (2022). The science of deep learning. CUP.
- Goodfellow et al. (2016). Deep learning. MIT Press.
- Hastie et al. (2009). The elements of statistical learning. Springer.
- Murphy (2020). Probabilistic machine learning: An introduction. MIT Press.
- Murphy (2023). Probabilistic machine learning: Advanced topics. MIT Press.
- Prince (2024). Understanding deep learning. MIT Press.
Interdisciplinary
Data science is an interdisciplinary science that draws together teams from across the sciences to develop new theoretical frameworks. Furthermore, data scientists work with teams across a huge number of disciplines and without boundaries. This module will allow students to see how data scientists work with non-subject experts to explore the boundaries of the modern data revolution.
Subject specific skills
- Demonstrate facility with rigorous data science methods.
- Evaluate, select and apply appropriate techniques to a variety of situations.
- Demonstrate knowledge of and facility with data science concepts, both explicitly and by applying them to the solution of problems.
- Create structured and coherent arguments, communicating them in written form.
- Construct logical mathematical arguments with clear identification of assumptions and conclusions.
- Reason critically, carefully, and logically.
Transferable skills
- Problem solving: Use rational and logical reasoning to deduce appropriate and well-reasoned conclusions. Retain an open mind, optimistic of finding solutions, thinking laterally and creatively to look beyond the obvious. Know how to learn from failure.
- Self awareness: Reflect on learning, seeking feedback on and evaluating personal practices, strengths and opportunities for personal growth.
- Communication: Present arguments, knowledge and ideas, in a range of formats.
- Professionalism: Prepared to operate autonomously. Aware of how to be efficient and resilient. Manage priorities and time. Self-motivated, setting and achieving goals, prioritising tasks.
Study time
Type | Required | Optional |
---|---|---|
Lectures | 30 sessions of 1 hour (20%) | 2 sessions of 1 hour |
Tutorials | (0%) | 5 sessions of 1 hour |
Private study | 118 hours (79%) | |
Assessment | 2 hours (1%) | |
Total | 150 hours | |
Private study description
Weekly revision of lecture notes and materials, wider reading, practice exercises and preparing for examination.
Costs
No further costs have been identified for this module.
You do not need to pass all assessment components to pass the module.
Assessment group B
Assessment component | Weighting | Study time | Eligible for self-certification |
---|---|---|---|
In-person Examination | 100% | 2 hours | No |
The examination paper will include questions drawn from the module.
Reassessment component is the same.
Feedback on assessment
Cohort level feedback will be provided for the examination.
Courses
This module is Core for:
- Year 1 of TSTA-G4P1 Postgraduate Taught Statistics
This module is Optional for:
- Year 1 of TSTA-G4P1 Postgraduate Taught Statistics
This module is Option list B for:
- TSTA-G4P1 Postgraduate Taught Statistics
  - Year 1 of G40C Statistics with Finance (Taught)
  - Year 1 of G40A Statistics with Probability (Taught)