ST963-15 Theory of Data Science

Department
Statistics
Level
Taught Postgraduate Level
Module leader
Andi Wang
Credit value
15
Module duration
10 weeks
Assessment
100% exam
Study location
University of Warwick main campus, Coventry

Introductory description

This module provides a graduate-level exploration of the analysis of large datasets, giving students key experience of the work of modern data scientists. In the modern world it is very easy to generate large datasets. Capturing and exploiting the important information contained within such datasets poses a number of statistical challenges. It may not even be clear how much useful information the data contains.

This module is not available to undergraduate students or as an unusual option.

Module aims

This module aims to:

  1. Develop students’ understanding of the theoretical foundations of modern machine learning algorithms.
  2. Introduce key techniques for both supervised and unsupervised learning.
  3. Provide relevant background to prepare students for further advanced modules in machine learning and for a career in data science.

Outline syllabus

This is an indicative module outline only, intended to show the sort of topics that may be covered. Actual sessions held may differ.

The module will cover the theory underlying many contemporary data analysis problems and algorithms.

  1. Fundamentals of statistical learning: Supervised vs unsupervised, regression vs classification, over- vs under-fitting, curse of dimensionality, training vs testing, validation and cross-validation, regularisation.

  2. Supervised learning methods. An indicative list is: logistic regression, k-nearest neighbours, support vector machines, Gaussian processes, artificial neural networks including transformers; exact details may vary from year to year.

  3. Unsupervised learning methods. An indicative list is: clustering, dimension reduction such as principal components analysis, modern generative modelling techniques such as variational autoencoders and denoising diffusions; exact details may vary from year to year.
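
As an illustration of the validation, cross-validation and regularisation topics in item 1, the following is a minimal sketch rather than module material. It uses k-fold cross-validation to choose a ridge penalty for a one-dimensional regression without an intercept. The synthetic data, the candidate penalties and the helper names (`k_fold_indices`, `fit_ridge_1d`, `cv_error`) are all assumptions made for this example.

```python
import random


def k_fold_indices(n, k):
    """Split indices 0..n-1 into k contiguous folds of near-equal size."""
    folds = []
    start = 0
    for i in range(k):
        size = n // k + (1 if i < n % k else 0)
        folds.append(list(range(start, start + size)))
        start += size
    return folds


def fit_ridge_1d(xs, ys, lam):
    """Closed-form ridge slope for the model y ~ w * x (no intercept)."""
    sxy = sum(x * y for x, y in zip(xs, ys))
    sxx = sum(x * x for x in xs)
    return sxy / (sxx + lam)


def cv_error(xs, ys, lam, k=5):
    """Mean squared validation error, averaged over the k folds."""
    n = len(xs)
    total = 0.0
    for fold in k_fold_indices(n, k):
        held_out = set(fold)
        train_x = [xs[i] for i in range(n) if i not in held_out]
        train_y = [ys[i] for i in range(n) if i not in held_out]
        w = fit_ridge_1d(train_x, train_y, lam)
        total += sum((ys[i] - w * xs[i]) ** 2 for i in fold) / len(fold)
    return total / k


# Synthetic data (an assumption for illustration): y = 2x + Gaussian noise.
rng = random.Random(0)
xs = [rng.uniform(-1, 1) for _ in range(40)]
ys = [2.0 * x + rng.gauss(0, 0.3) for x in xs]

# Hypothetical grid of candidate penalties; pick the one with the lowest CV error.
lambdas = [0.0, 0.1, 1.0, 10.0, 100.0]
errors = {lam: cv_error(xs, ys, lam) for lam in lambdas}
best_lam = min(errors, key=errors.get)
```

A very large penalty shrinks the fitted slope towards zero and under-fits, which shows up as a higher cross-validated error than a small penalty gives on data of this kind.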

Learning outcomes

By the end of the module, students should be able to:

  • Evaluate data analysis problems and select appropriate supervised and unsupervised learning algorithms.
  • Critically evaluate the strengths and weaknesses of machine learning algorithms from a theoretical perspective.
  • State and prove key results on the performance of some advanced learning methods.
  • Interpret the output of various algorithms when applied to data sets.

Indicative reading list

  • Drori (2022). The science of deep learning. CUP.
  • Goodfellow et al. (2016). Deep learning. MIT Press.
  • Hastie et al. (2009). The elements of statistical learning. Springer.
  • Murphy (2020). Probabilistic machine learning: An introduction. MIT Press.
  • Murphy (2023). Probabilistic machine learning: Advanced topics. MIT Press.
  • Prince (2024). Understanding deep learning. MIT Press.

Interdisciplinary

Data science is an interdisciplinary science that draws together teams in the sciences to develop new theoretical frameworks. Furthermore, data scientists work with teams across a huge number of disciplines and without boundaries. This module will allow students to see how data scientists work with non-subject experts to explore the boundaries of the modern data revolution.

Subject specific skills

  • Demonstrate facility with rigorous data science methods.

  • Evaluate, select and apply appropriate techniques to a variety of situations.

  • Demonstrate knowledge of and facility with data science concepts, both explicitly and by applying them to the solution of problems.

  • Create structured and coherent arguments, communicating them in written form.

  • Construct logical mathematical arguments with clear identification of assumptions and conclusions.

  • Reason critically, carefully, and logically.

Transferable skills

  • Problem solving: Use rational and logical reasoning to deduce appropriate and well-reasoned conclusions. Retain an open mind, optimistic of finding solutions, thinking laterally and creatively to look beyond the obvious. Know how to learn from failure.

  • Self-awareness: Reflect on learning, seeking feedback on and evaluating personal practices, strengths and opportunities for personal growth.

  • Communication: Present arguments, knowledge and ideas, in a range of formats.

  • Professionalism: Prepared to operate autonomously. Aware of how to be efficient and resilient. Manage priorities and time. Self-motivated, setting and achieving goals, prioritising tasks.

Study time

Type            Required                       Optional
Lectures        30 sessions of 1 hour (20%)    2 sessions of 1 hour
Tutorials       (0%)                           5 sessions of 1 hour
Private study   118 hours (79%)
Assessment      2 hours (1%)
Total           150 hours

Private study description

Weekly revision of lecture notes and materials, wider reading, practice exercises and preparing for examination.

Costs

No further costs have been identified for this module.

You do not need to pass all assessment components to pass the module.

Assessment group B

Assessment component     Weighting    Study time    Eligible for self-certification
In-person Examination    100%         2 hours       No

The examination paper will include questions drawn from the module.

  • Answerbook Pink (12 page)

Reassessment component is the same

Feedback on assessment

Cohort level feedback will be provided for the examination.

Courses

This module is Core for:

  • Year 1 of TSTA-G4P1 Postgraduate Taught Statistics

This module is Optional for:

  • Year 1 of TSTA-G4P1 Postgraduate Taught Statistics

This module is Option list B for:

  • TSTA-G4P1 Postgraduate Taught Statistics
    • Year 1 of G40C Statistics with Finance (Taught)
    • Year 1 of G40A Statistics with Probability (Taught)