Throughout the 2021-22 academic year, we will be prioritising face-to-face teaching as part of a blended learning approach that builds on the lessons learned over the course of the Coronavirus pandemic. Teaching will vary between online and on-campus delivery through the year, and you should read the academic department's guidance for details of how this will work for a particular module. You can find out more about the University’s overall response to Coronavirus at: https://warwick.ac.uk/coronavirus.

WM9A6-15 Machine Learning and Data Engineering

Department
WMG
Level
Taught Postgraduate Level
Module leader
Michael Mortenson
Credit value
15
Module duration
2 weeks
Assessment
Multiple
Study locations
  • University of Warwick main campus, Coventry Primary
  • Distance or Online Delivery
Introductory description

The practical application of data science and artificial intelligence systems requires the ability to process, engineer and manage the flow of data, and to select and implement learning algorithms. This module, using the industry-standard Python language, aims to provide students with the skills and competencies needed to implement efficient and reliable code, and to employ best practices in data management, algorithm development and machine learning.

Module aims

This module aims to introduce students to many of the advanced statistical and data engineering techniques made possible by innovations in computing and modern processing power. These include:

  • clustering
  • dimension reduction
  • regression
  • classification
  • feature engineering
  • natural language processing
  • high performance computing
  • analysis of algorithms and computational complexity.
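Clustering, the first technique listed above, can be illustrated with k-means. The following is a minimal sketch of Lloyd's algorithm in plain Python (standard library only, Python 3.8+ for math.dist); it is not necessarily the approach taught in the module, and the function names and the choice of the first k points as initial centroids are assumptions made to keep the example short and deterministic.

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]


def nearest_centroid(point: Point, centroids: Sequence[Point]) -> int:
    """Return the index of the centroid closest to `point` (Euclidean distance)."""
    return min(range(len(centroids)), key=lambda i: math.dist(point, centroids[i]))


def kmeans(points: Sequence[Point], k: int, max_iter: int = 100) -> Tuple[List[Point], List[int]]:
    """Lloyd's algorithm for k-means clustering (illustrative sketch).

    Assumption: the first k points are used as initial centroids, which keeps
    the example deterministic but is not a recommended initialisation strategy.
    """
    if not 0 < k <= len(points):
        raise ValueError("k must be between 1 and the number of points")
    centroids: List[Point] = [tuple(p) for p in points[:k]]
    labels: List[int] = []
    for _ in range(max_iter):
        # Assignment step: label each point with its nearest centroid.
        new_labels = [nearest_centroid(p, centroids) for p in points]
        if new_labels == labels:
            break  # converged: no point changed cluster
        labels = new_labels
        # Update step: move each centroid to the mean of its member points.
        for i in range(k):
            members = [p for p, label in zip(points, labels) if label == i]
            if members:  # an empty cluster keeps its previous centroid
                centroids[i] = tuple(sum(axis) / len(members) for axis in zip(*members))
    return centroids, labels


centroids, labels = kmeans([(0, 0), (0, 1), (10, 10), (10, 11)], k=2)
print(labels)     # [0, 0, 1, 1]
print(centroids)  # [(0.0, 0.5), (10.0, 10.5)]
```

In practice a library implementation such as scikit-learn's KMeans, which uses k-means++ initialisation by default, would normally be preferred over a hand-written loop.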
Outline syllabus

This is an indicative outline only, giving a sense of the topics that may be covered. Actual sessions may differ.

Pandas, Dask, Spark and data management: Data cleaning; Data validation; Joining and merging datasets; Feature engineering; Automation.
Computational complexity and analysis of algorithms: Big O notation; Compilation; Vectorisation; Distributed processing; Best practices for programming.
Natural language processing: Working with text data; NLP; Topic models and decomposition.
Clustering and Dimension Reduction: Clustering; Dimension reduction.
Supervised Learning: Regression; Classification; Ensembles.
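As an illustration of the Big O notation topic in the outline above, the two functions below (hypothetical names, standard library only) solve the same problem, checking a sequence for duplicates, with different complexity. The nested loop makes up to n(n-1)/2 comparisons, which is O(n^2); the set-based version makes a single pass and is O(n) on average, because set membership tests take constant time on average.

```python
from typing import Hashable, Sequence


def has_duplicates_quadratic(items: Sequence[Hashable]) -> bool:
    """O(n^2): compare every pair of items, up to n(n-1)/2 comparisons."""
    n = len(items)
    for i in range(n):
        for j in range(i + 1, n):
            if items[i] == items[j]:
                return True
    return False


def has_duplicates_linear(items: Sequence[Hashable]) -> bool:
    """O(n) on average: one pass, remembering seen items in a hash set."""
    seen = set()
    for item in items:
        if item in seen:
            return True
        seen.add(item)
    return False


data = [3, 1, 4, 1, 5]
print(has_duplicates_quadratic(data), has_duplicates_linear(data))  # True True
```

For a million distinct items the quadratic version needs roughly 5 × 10^11 comparisons, while the linear version needs about a million set operations, at the cost of extra memory for the set.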

Learning outcomes

By the end of the module, students should be able to:

  • Develop original, non-trivial Python applications and algorithms.
  • Implement robust and efficient data pipelines to extract and transform data from a variety of sources.
  • Evaluate and optimise data engineering algorithms for better computational performance.
  • Automate advanced machine learning techniques and critically evaluate the results.
  • Implement and optimise machine learning algorithms for statistical and computational performance.
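The data-pipeline outcome above (extracting and transforming data from a variety of sources) can be sketched with the standard library alone. The example below is a hypothetical, minimal sketch rather than course material: the inline CSV and JSON strings stand in for two sources, and the function names, field names and validation rule (a positive numeric amount) are assumptions. It extracts, validates, performs an inner join and aggregates.

```python
import csv
import io
import json
from typing import Any, Dict, List, Tuple

# Hypothetical sample data standing in for two different sources.
ORDERS_CSV = """order_id,customer_id,amount
1,c1,19.99
2,c2,
3,c1,5.01
4,c9,12.00
"""
CUSTOMERS_JSON = '[{"id": "c1", "region": "UK"}, {"id": "c2", "region": "EU"}]'


def extract_orders(text: str) -> List[Dict[str, str]]:
    """Extract: parse CSV text into one dict per row."""
    return list(csv.DictReader(io.StringIO(text)))


def extract_customers(text: str) -> Dict[str, Dict[str, Any]]:
    """Extract: parse JSON and index customers by id for constant-time lookups."""
    return {customer["id"]: customer for customer in json.loads(text)}


def validate(rows: List[Dict[str, str]]) -> Tuple[List[Dict[str, Any]], List[Dict[str, str]]]:
    """Validate: keep rows whose amount parses as a positive number; return rejects separately."""
    valid: List[Dict[str, Any]] = []
    rejected: List[Dict[str, str]] = []
    for row in rows:
        try:
            amount = float(row["amount"])
        except (TypeError, ValueError):  # missing, empty or non-numeric amount
            rejected.append(row)
            continue
        if amount > 0:
            valid.append({**row, "amount": amount})
        else:
            rejected.append(row)
    return valid, rejected


def join_and_aggregate(orders: List[Dict[str, Any]],
                       customers: Dict[str, Dict[str, Any]]) -> Dict[str, float]:
    """Transform: inner-join orders to customers, then total the amount per region."""
    totals: Dict[str, float] = {}
    for order in orders:
        customer = customers.get(order["customer_id"])
        if customer is None:
            continue  # inner join: drop orders with no matching customer
        region = customer["region"]
        totals[region] = totals.get(region, 0.0) + order["amount"]
    return {region: round(total, 2) for region, total in totals.items()}


valid, rejected = validate(extract_orders(ORDERS_CSV))
totals = join_and_aggregate(valid, extract_customers(CUSTOMERS_JSON))
print(totals)                                 # {'UK': 25.0}
print([row["order_id"] for row in rejected])  # ['2']
```

Keeping rejected rows rather than silently dropping them makes validation failures auditable. With pandas, the join step would typically be a DataFrame.merge call instead of a dictionary lookup.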
Indicative reading list

View reading list on Talis Aspire

Interdisciplinary

A mixture of technology/computing topics, statistics/machine learning, and business topics

International

Topics are of high international demand

Subject specific skills

Programming, databases, data engineering, clustering, dimension reduction, regression, classification, ensemble modelling, computational complexity, cloud computing, IT architecture

Transferable skills

Programming, data analysis, teamwork, critical analysis, IT architecture

Study time

Type                             Required
Lectures                         10 sessions of 1 hour 30 minutes (10%)
Practical classes                14 sessions of 1 hour 30 minutes (14%)
Online learning (independent)    10 sessions of 1 hour (7%)
Assessment                       104 hours (69%)
Total                            150 hours
Private study description

No private study requirements defined for this module.

Costs

No further costs have been identified for this module.

You do not need to pass all assessment components to pass the module.

Assessment group A
Component                                         Weighting   Study time
Data Engineering and Machine Learning Pipeline    20%         14 hours

Creating a data engineering/machine learning pipeline. Comprises the application/pipeline code and a short (300-word) description.

Post Module Assignment                            80%         90 hours

An essay on applications and best practices in data engineering and a programmed implementation of a data pipeline

Assessment group R
Component                                         Weighting   Study time
Post Module Assignment                            100%

An essay on applications and best practices in data engineering and a programmed implementation of a data pipeline

Feedback on assessment

Verbal feedback for the in-module element. Written feedback and annotated scripts for the post-module element.

There is currently no information about the courses for which this module is core or optional.