ST221-12 Linear Statistical Modelling
Introductory description
This module runs partly in term 2 and partly in term 3. It is available for students on a course where it is a listed option and as an Unusual Option to students who have completed the prerequisite modules. It is strongly recommended for any students intending to do substantial data analysis.
Students wishing to pursue the integrated Masters MMORSE are expected to take ST221 in Year 2. Data Science students will find it highly relevant to their third-year project. ST221 may form part of the criteria for allocating places on ST modules with capped numbers, such as ST340 Programming for Data Science and ST344 Professional Practice of Data Analysis.
Prerequisites for Statistics students: ST115 Introduction to Probability, ST218 Mathematical Statistics A and ST219 Mathematical Statistics B (taken concurrently).
Prerequisites for Non-Statistics students: ST111/ST112 Probability A & B and ST220 Introduction to Mathematical Statistics. Basic knowledge of R, such as that covered in ST104 Statistical Laboratory I, will be useful.
Coursework results from this module may be used in part to determine eligibility for exemption from the computer-based assessment components of the Institute and Faculty of Actuaries modules CS1, CS2, CM1 and CM2. (Independent application to the IFoA may be required.)
Module aims
To introduce the ideas and methods of statistical modelling and statistical model exploration. To introduce students to the application of R software and its use as a tool for statistical modelling, specifically for working with linear models in a variety of different scenarios.
Outline syllabus
This is an indicative outline only, showing the sort of topics that may be covered. Actual sessions held may differ.
 Introduction to the R software. Some useful methods of examining large data sets. The use of R to obtain important summary features in different data structures.
 A review of simple linear regression. Distributions of estimators and residuals.
 An introduction to multiple regression. Estimators for these models. How the study of residuals can inform and refine model choice. How to use R to check the plausibility of such a statistical model and how to use diagnostic plots in combination with the theory of model refinement.
 An introduction to polynomial regression and various ANOVA models. The coding and interpretation of these models using R.
 An introduction to linear models for time series and generalized linear models for frequency data.
Learning outcomes
By the end of the module, students should be able to:
 Make use of the language R to explore data sets with appropriate graphs and summary statistics.
 Make use of R to fit appropriate linear models to data sets.
 Understand how various linear models can be proposed, estimated, diagnostically checked, compared and criticised.
Indicative reading list
View reading list on Talis Aspire
Subject specific skills
TBC
Transferable skills
TBC
Study time
Type               Required                       Optional
Lectures           30 sessions of 1 hour (88%)    2 sessions of 1 hour
Practical classes  4 sessions of 1 hour (12%)
Total              34 hours
Private study description
Weekly revision of lecture notes and materials, wider reading and practice exercises, working on problem sets and preparing for examination.
Costs
No further costs have been identified for this module.
You do not need to pass all assessment components to pass the module.
Students can register for this module without taking any assessment.
Assessment group D4
Weighting  Study time  

Assignment 1  10%  12 hours 
You will use the R program to carry out calculations and fit models on provided data sets in response to a set of questions. You will present, discuss and evaluate the results. 

Assignment 2  20%  24 hours 
You will use the R program to carry out calculations and fit models on provided data sets in response to a set of questions. You will present, discuss and evaluate the results. 

In-person Examination  70%
The examination paper will contain four questions, of which the best three marks will be used to calculate your grade.

Assessment group R2
Weighting  Study time  

In-person Examination - Resit  100%
The examination paper will contain four questions, of which the best three marks will be used to calculate your grade.

Feedback on assessment
Reports will be marked and feedback returned to students within 20 working days.
Solutions and cohort level feedback will be provided for the examination.
Post-requisite modules
If you pass this module, you can take:
 ST346-15 Generalised Linear Models for Regression and Classification
 ST404-15 Applied Statistical Modelling
Courses
This module is Optional for:
 Year 2 of USTA-G305 Undergraduate Data Science (MSci) (with Intercalated Year)
This module is Option list A for:
 Year 2 of USTA-G300 Undergraduate Master of Mathematics, Operational Research, Statistics and Economics