ST411-15 Dynamic Stochastic Control
Introductory description
A well-known example of stochastic control ideas in practice is AlphaGo, an autonomous agent trained to play the board game Go. Its goal is to learn a policy that maximises its chance of winning by choosing good moves in each game state. Through iterative learning, AlphaGo refines its decision-making, illustrating how stochastic control theory is applied in reinforcement learning.
This module is available for students on a course where it is a listed option and as an Unusual Option to students who have completed the prerequisite modules.
Prerequisites: ST318 Probability Theory and ST333 Applied Stochastic Processes.
Module aims
This module focuses on stochastic control in applied probability, including its applications in mathematical finance and reinforcement learning. It aims to prepare students for careers in business, industry, or government, while also exploring cutting-edge research in the field.
Outline syllabus
This is an indicative module outline only, intended to show the sort of topics that may be covered. Actual sessions held may differ.
This module will cover the following areas:
- Markov Decision Processes (Markov chains, dynamic programming for controlled Markov chains and optimal stopping)
- Reinforcement Learning (temporal difference, Q-learning method, actor-critic method, policy gradient)
- Continuous-Time Deterministic Optimal Control (calculus of variations, dynamic programming and Hamilton-Jacobi equations)
- Continuous-Time Stochastic Optimal Control (controlled diffusion processes, dynamic programming and Hamilton-Jacobi-Bellman equations)
- Continuous-Time Reinforcement Learning (entropy-regularized exploratory formulation)
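To give a flavour of the first topic, dynamic programming for a controlled Markov chain can be sketched as value iteration on a small discounted problem. This is a minimal illustration, not material taken from the module: the two-state chain, the reward table, the discount factor `gamma` and the function name `value_iteration` are all assumptions made for the example.

```python
import numpy as np


def value_iteration(P, R, gamma=0.9, tol=1e-10, max_iter=10_000):
    """Value iteration for a finite, discounted Markov decision process.

    P[a, s, t] is the probability of moving from state s to state t
    under action a; R[s, a] is the expected one-step reward.
    Returns the (approximate) optimal value function and a greedy policy.
    """
    n_states = P.shape[1]
    V = np.zeros(n_states)
    for _ in range(max_iter):
        # Bellman optimality update: Q[s, a] = R[s, a] + gamma * E[V(next state)]
        Q = R + gamma * np.einsum("ast,t->sa", P, V)
        V_new = Q.max(axis=1)
        converged = np.max(np.abs(V_new - V)) < tol
        V = V_new
        if converged:
            break
    # Greedy policy with respect to the final value function
    Q = R + gamma * np.einsum("ast,t->sa", P, V)
    return V, Q.argmax(axis=1)


# Hypothetical example: two states, two actions (0 = stay, 1 = switch).
P = np.array([
    [[1.0, 0.0], [0.0, 1.0]],  # action 0: remain in the current state
    [[0.0, 1.0], [1.0, 0.0]],  # action 1: move to the other state
])
R = np.array([
    [0.0, 0.0],  # state 0: no reward for either action
    [1.0, 0.0],  # state 1: reward 1 for staying, 0 for switching
])

V, policy = value_iteration(P, R)
print("value function:", V)
print("greedy policy:", policy)
```

In this toy problem the controller should move to state 1 and then stay there, so the greedy policy switches in state 0 and stays in state 1.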
Learning outcomes
By the end of the module, students should be able to:
- Identify and deal with stochastic control and optimal stopping problems.
- Solve simple Hamilton-Jacobi-Bellman equations both explicitly and numerically.
- Apply the above techniques to mathematical finance and reinforcement learning domains.
- Select and apply appropriate reinforcement learning techniques and their basic algorithms to solve problems.
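For orientation, a Hamilton-Jacobi-Bellman equation of the kind referred to in these outcomes can be written out for a one-dimensional controlled diffusion on a finite horizon. The notation here (drift `b`, volatility `\sigma`, running reward `f`, terminal reward `g`, action set `A`, horizon `T`) is assumed for illustration and is not taken from the module materials; the supremum in the second line is over admissible control processes.

```latex
\begin{aligned}
dX_s &= b(X_s, a_s)\,ds + \sigma(X_s, a_s)\,dW_s, \qquad X_t = x,\\
v(t,x) &= \sup_{a}\, \mathbb{E}\!\left[\int_t^T f(X_s, a_s)\,ds + g(X_T)\right],\\
0 &= \partial_t v(t,x) + \sup_{a \in A}\Big\{ b(x,a)\,\partial_x v(t,x)
      + \tfrac{1}{2}\,\sigma^2(x,a)\,\partial_{xx} v(t,x) + f(x,a) \Big\},\\
v(T,x) &= g(x).
\end{aligned}
```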
Indicative reading list
Reading lists can be found in Talis
Specific reading list for the module
Subject specific skills
- Evaluate, select and apply appropriate mathematical and/or probabilistic techniques.
- Demonstrate knowledge of and facility with formal probability concepts, both explicitly and by applying them to the solution of problems.
- Create structured and coherent arguments and communicate them in written form.
- Construct logical mathematical arguments with clear identification of assumptions and conclusions.
- Reason critically, carefully, and logically and derive (prove) mathematical results.
Transferable skills
- Problem solving: Use rational and logical reasoning to deduce appropriate and well-reasoned conclusions. Retain an open mind, optimistic of finding solutions, thinking laterally and creatively to look beyond the obvious. Know how to learn from failure.
- Self-awareness: Reflect on learning, seeking feedback on and evaluating personal practices, strengths and opportunities for personal growth.
- Communication: Present arguments, knowledge and ideas, in a range of formats.
- Professionalism: Prepared to operate autonomously. Aware of how to be efficient and resilient. Manage priorities and time. Self-motivated, setting and achieving goals, prioritising tasks.
Study time
| Type | Required | Optional |
|---|---|---|
| Lectures | 30 sessions of 1 hour (100%) | 2 sessions of 1 hour |
| Total | 30 hours | |
Private study description
Weekly revision of lecture notes and materials, wider reading, practice exercises and preparation for the examination.
Costs
No further costs have been identified for this module.
You must pass all assessment components to pass the module.
Students can register for this module without taking any assessment.
Assessment group B4
| Assessment component | Weighting | Study time | Eligible for self-certification |
|---|---|---|---|
| Centrally-timetabled examination (On-campus) | 100% | 20 hours | No |

The examination paper will contain four questions, of which the best marks of THREE questions will be used to calculate your grade.

Reassessment component is the same.
Feedback on assessment
Solutions and cohort level feedback will be provided for the examination.
Courses
This module is Optional for:
- Year 1 of TMAA-G1P0 Postgraduate Taught Mathematics
- TMAA-G1PC Postgraduate Taught Mathematics (Diploma plus MSc)
  - Year 1 of G1PC Mathematics (Diploma plus MSc)
  - Year 2 of G1PC Mathematics (Diploma plus MSc)
- Year 1 of TSTA-G4P1 Postgraduate Taught Statistics
- Year 4 of USTA-G300 Undergraduate Master of Mathematics,Operational Research,Statistics and Economics
This module is Option list A for:
- Year 1 of TSTA-G4P1 Postgraduate Taught Statistics
- Year 4 of USTA-G1G3 Undergraduate Mathematics and Statistics (BSc MMathStat)
This module is Option list B for:
- TSTA-G4P1 Postgraduate Taught Statistics
  - Year 1 of G40B Statistics with Data Science (Taught)
  - Year 1 of G40C Statistics with Finance (Taught)
- Year 4 of USTA-G304 Undergraduate Data Science (MSci)
This module is Option list C for:
- UMAA-G105 Undergraduate Master of Mathematics (with Intercalated Year)
  - Year 4 of G105 Mathematics (MMath) with Intercalated Year
  - Year 5 of G105 Mathematics (MMath) with Intercalated Year
- UMAA-G103 Undergraduate Mathematics (MMath)
  - Year 3 of G103 Mathematics (MMath)
  - Year 4 of G103 Mathematics (MMath)
- Year 4 of UMAA-G107 Undergraduate Mathematics (MMath) with Study Abroad
This module is Option list F for:
- Year 4 of USTA-G300 Undergraduate Master of Mathematics,Operational Research,Statistics and Economics