Introductory description

This module introduces the fundamental principles of natural language processing.

Module aims

The module aims to give students a fundamental understanding of automated methods for processing linguistic data in textual form (natural language processing) from different sources (newswire, web, social media, academic publications), and of the associated challenges. It also gives students the skills to analyse textual data and familiarises them with state-of-the-art tools and applications.

Outline syllabus

This is an indicative outline of the topics that may be covered. Actual sessions held may differ.

The module will address core methodologies in natural language processing and related tools and will proceed to examine current applications. The syllabus may cover:

Regular expressions, word tokenisation, stemming, sentence segmentation
N-grams and language models
Part-of-Speech Tagging
Hidden Markov Models and Maximum Entropy Models
Semantics: Lexical Semantics, Distributional Semantics, Word Sense Disambiguation and Vector Space Models
Text classification
Sentiment analysis
Information Extraction: Named Entity Recognition, Relation Extraction
Syntactic Parsing
Semantic Parsing
Question Answering and Summarisation
Recommender systems
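
As an illustration of the first two syllabus items, the following is a minimal sketch in Python (standard library only) of regular-expression tokenisation, naive sentence segmentation and a maximum-likelihood bigram language model. It is not taken from the module materials: the function names, the segmentation rule and the example text are all assumptions made for illustration, and real text needs more careful handling (abbreviations, URLs, emoji and so on).

```python
import re
from collections import Counter


def split_sentences(text):
    # Naive rule (an assumption): break after '.', '!' or '?' when followed by whitespace.
    return [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]


def tokenise(sentence):
    # Lowercased word tokens (allowing internal apostrophes) or single punctuation marks.
    return re.findall(r"\w+(?:'\w+)*|[^\w\s]", sentence.lower())


def bigram_counts(sentences):
    # Count context words and adjacent word pairs, padding each sentence
    # with hypothetical boundary markers <s> and </s>.
    unigrams, bigrams = Counter(), Counter()
    for sentence in sentences:
        tokens = ["<s>"] + tokenise(sentence) + ["</s>"]
        unigrams.update(tokens[:-1])  # every token that has a successor
        bigrams.update(zip(tokens, tokens[1:]))
    return unigrams, bigrams


def bigram_probability(prev, word, unigrams, bigrams):
    # Maximum-likelihood estimate: P(word | prev) = count(prev, word) / count(prev).
    if unigrams[prev] == 0:
        return 0.0
    return bigrams[(prev, word)] / unigrams[prev]


text = "The cat sat. The cat slept! Did the dog sleep?"
sentences = split_sentences(text)
unigrams, bigrams = bigram_counts(sentences)

print(sentences)
print(tokenise(sentences[0]))
print(bigram_probability("the", "cat", unigrams, bigrams))
```

The unsmoothed estimate assigns zero probability to any unseen bigram, which is one of the challenges that smoothing techniques for language models are designed to address.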

Learning outcomes

By the end of the module, students should be able to:

Demonstrate knowledge of the fundamental principles of natural language processing.
Understand the methods and algorithms used to process different types of textual data, as well as the challenges involved.
Understand the state of the art in the core areas of natural language processing, such as Language Models, Part-of-Speech Tagging, Named Entity Recognition, Syntactic Parsing, Information Extraction, Text Classification, Distributional Semantics and Vector Space Models.
Demonstrate working knowledge of state-of-the-art tools available for analysing linguistic data in the context of the above-mentioned areas.
Apply computational skills to create NLP processing pipelines using existing NLP libraries, retrain models and extend existing NLP tools.

Indicative reading list

Reading lists can be found in Talis

Specific reading list for the module

Research element

In Assignment 2, students need to research the features used for training sentiment classifiers.

Subject specific skills

Knowledge of the fundamental principles of Natural Language Processing (NLP).
Understanding of methods and algorithms used to process different types of textual data, as well as the challenges involved.
Understanding of the state of the art in the core areas of Natural Language Processing, such as Language Models, Part-of-Speech Tagging, Named Entity Recognition, Syntactic Parsing, Information Extraction, Text Classification, Distributional Semantics and Vector Space Models.
Understanding of the state of the art in current application areas, such as Semantic Parsing, Sentiment Analysis, Social Media Analysis, Summarisation, Question Answering and Information Extraction.
Working knowledge of state-of-the-art tools available for analysing linguistic data in the context of the above-mentioned areas.
Computational skills to create NLP processing pipelines using existing NLP libraries, retrain models and extend existing NLP tools.

Transferable skills

Analytical skills – Examine NLP problems thoroughly, with attention to detail
Research skills – Identify relevant resources and background information to be used in coursework projects
Problem solving skills – Think creatively and apply sensible approaches to solve the given NLP problems
Communication skills – Present approaches and findings in a coherent manner in coursework reports

Study time

Type	Required
Lectures	20 sessions of 1 hour (13%)
Seminars	8 sessions of 1 hour (5%)
Supervised practical classes	9 sessions of 1 hour (6%)
Private study	113 hours (75%)
Total	150 hours

Private study description

Background reading.
Coursework completion (including programming and report writing).
Revision.

Costs

No further costs have been identified for this module.

You do not need to pass all assessment components to pass the module.

Students can register for this module without taking any assessment.

Assessment group D2

	Weighting	Eligible for self-certification
Assessed practical coursework	30%	No
Assessed practical coursework. This assignment is worth more than 3 CATS and is not, therefore, eligible for self-certification.
Centrally-timetabled examination (On-campus)	70%	No
CS918 exam. Answerbook Pink (12 page). Students may use a calculator.

Assessment group R1

	Weighting	Study time	Eligible for self-certification
In-person Examination - Resit	100%		No
CS918 resit exam. Answerbook Pink (12 page). Students may use a calculator.

Feedback on assessment

Students will receive written feedback on coursework.

Past exam papers for CS918

Pre-requisites

This is a self-contained module, but it would be helpful to take it in conjunction with CS910 and/or CS909.

Courses

This module is Optional for:

TCSA-G5PD Postgraduate Taught Computer Science
- Year 1 of G5PD Computer Science
Year 1 of TCSA-G5PA Postgraduate Taught Data Analytics
Year 1 of TIMA-L995 Postgraduate Taught Data Visualisation

This module is Core option list A for:

Year 1 of TPSS-C803 Postgraduate Taught Behavioural and Data Science

This module is Core option list C for:

Year 1 of TPSS-C803 Postgraduate Taught Behavioural and Data Science

This module is Option list A for:

Year 1 of TIMS-L990 Postgraduate Big Data and Digital Futures

This module is Option list B for:

Year 1 of TIMA-L981 Postgraduate Social Science Research

CS918-15 Natural Language Processing