ST122-15 Foundations of Data Science 1
Introductory description
This module provides an introduction to Data Science and the opportunity to develop knowledge of this thriving area. Our world is data rich, and you will have the opportunity to combine data exploration with Python programming skills to question and explore this data.
This module is designed for those who have taken mathematics to A-level, but who are otherwise not taking a mathematics or statistics course. This module does not assume knowledge of Python and is not designed for those who have a strong knowledge of Python.
Availability
This module can be taken at either level 1 or level 2. Students interested in the level 2 version should consider ST238 Principles of Data Science 1. Students may not take both versions of this module. Students in Year 3 must take ST238 Principles of Data Science 1.
Module aims
This module aims to:
- explore issues and problems through a Data Science lens.
- develop core concepts of data exploration, visualisation and inference.
- work hands-on with real data.
- develop Python programming skills.
Outline syllabus
This is an indicative module outline only to give an indication of the sort of topics that may be covered. Actual sessions held may differ.
This module covers introductory statistics vital to any subsequent study of data. This includes areas such as causality, data structures, data presentation and visualisation, exploratory analysis, sampling, reasoning and decision-making under uncertainty.
- What is data science: skills, aims and process.
- Introduction to statistics: types of datasets and variables, notations, summary statistics for centrality and dispersion.
- Data structures: cleaning and wrangling.
- Principles of exploratory data analysis and data visualisation.
- Basics of probability distributions, sampling and simulation.
- Bivariate distributions: A/B testing and tests of association.
- Statistical fitting, least squares method and linear regression (interpretation and model validation).
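As an illustration of the kind of work the topics above point to, the following is a minimal sketch of summary statistics and a least squares line fit in Python. It is not taken from the module materials: the data values, the variable names `hours` and `scores`, and the helper `least_squares_line` are all hypothetical, chosen only to show the idea.

```python
from statistics import mean, median, stdev


def least_squares_line(x, y):
    """Return (slope, intercept) of the least squares line y = intercept + slope * x."""
    x_bar, y_bar = mean(x), mean(y)
    # Sum of cross-deviations and sum of squared deviations of x
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return slope, intercept


# Made-up illustrative data (an assumption, not a real data set)
hours = [1, 2, 3, 4, 5]
scores = [52, 55, 61, 64, 68]

# Summary statistics for centrality and dispersion
print("mean:", mean(scores))
print("median:", median(scores))
print("standard deviation:", round(stdev(scores), 2))

# Fit a straight line by least squares
slope, intercept = least_squares_line(hours, scores)
print(f"fitted line: score = {intercept:.2f} + {slope:.2f} * hours")
```

The sketch uses only the Python standard library so that it runs without extra installs; in practice a data analysis library would more likely be used for this kind of task.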
Learning outcomes
By the end of the module, students should be able to:
- Use Python to explore data, presenting the results in a variety of forms.
- Apply Python to implement appropriate data techniques to gain insights into real-world data.
- Make judgements on the basis of data analysis and present those judgements coherently and clearly.
- Apply appropriate methods to summarise data.
Indicative reading list
Reading lists can be found in Talis
Specific reading list for the module
Research element
There will be the opportunity to conduct an analysis of a contemporary data set, drawing conclusions from this data set and presenting new insights.
Interdisciplinary
Data Science is an interdisciplinary endeavour leveraging mathematics, statistics and computer science to provide new opportunities to explore other disciplines. Data sets and students will be drawn together by a shared interest in data exploration.
Subject specific skills
- Select and apply appropriate data techniques.
- Create structured and coherent arguments and communicate them in written form.
- Construct and develop logical arguments with clear identification of assumptions and conclusions.
- Communicate subject-specific information effectively and coherently.
- Analyse problems, abstracting their essential information and formulating them using appropriate language to facilitate their solution.
- Select and apply an appropriate statistical programming language for data analysis.
- Understand major aspects of data collection, generation, and quality, and how these influence analyses and conclusions.
Transferable skills
- Critical thinking: extracting patterns from incomplete data and using them to form evidence-based conclusions.
- Problem solving: use of logical reasoning to build arguments grounded in evidence and with explicit underlying assumptions.
- Self-awareness: monitoring of your own learning and seeking feedback.
- Communication: verbal discussion of ideas in seminars and among peers; written communication in assignments and the final project.
- Teamwork: collaboration with peers in seminars, during self-study and during the completion of extended tasks.
- Information literacy: evaluation of data and uncertainty in a model-based way.
- Digital literacy: use of computational tools to understand and visualise data, and to produce reports.
- Professionalism: self-motivation, taking charge of your own learning, and prioritising effectively.
- Ethics: reflection on professional responsibilities as a statistician in conjunction with the generation and dissemination of information.
Study time
| Type | Required |
|---|---|
| Practical classes | 10 sessions of 2 hours (13%) |
| Other activity | 20 hours (13%) |
| Private study | 60 hours (40%) |
| Assessment | 50 hours (33%) |
| Total | 150 hours |
Private study description
Studying learning materials.
Preparation for and consolidation of practical sessions.
Other activity description
Scheduled learning hours which may include in person or online learning.
Costs
No further costs have been identified for this module.
You do not need to pass all assessment components to pass the module.
Assessment group A1
| Component | Weighting | Study time | Eligible for self-certification |
|---|---|---|---|
| Assignment 1 | 20% | 10 hours | Yes (extension) |
| Assignment 2 | 30% | 15 hours | Yes (extension) |
| Assignment 3 | 50% | 25 hours | Yes (extension) |

Assignment 1: The assignment will contain a number of questions for which solutions in the form of Python code, calculations and/or written responses will be required. Students’ answers may have a page/word limit when indicated.

Assignment 2: The assignment will contain a number of questions for which solutions in the form of Python code, calculations and/or written responses will be required. Students’ answers may have a page/word limit when indicated.

Assignment 3: Assignment 3 requires a data analysis project on a selected dataset that demonstrates the ability to make data-driven judgements based on analysis and convince others of its validity. The project will require:
Due to the nature of the work undertaken and the difficulty in assigning a word count to equations, figures, tables, graphics, data output and computer code, the word count is an approximation and an individual word count may vary depending on the nature of the analysis undertaken. The total length should not exceed 5 pages not including appendices.
Assessment group R
| Component | Weighting | Study time | Eligible for self-certification |
|---|---|---|---|
| Assignment | 100% | | Yes (extension) |

Assignment: An assignment that requires a data analysis project on a selected dataset that demonstrates the ability to make data-driven judgements based on analysis and convince others of its validity. The project will require:
Due to the nature of the work undertaken and the difficulty in assigning a word count to equations, figures, tables, graphics, data output and computer code, the word count is an approximation and an individual word count may vary depending on the nature of the analysis undertaken. The total length should not exceed 5 pages not including appendices.
Feedback on assessment
Grades and feedback will be returned online within 20 working days of the submission deadline.
Pre-requisites
The module is open to all students on all UG courses across the university, except for students taking mathematics or statistics degrees, who are already familiar with its key themes.
Only available to first- and second-year students.
The module Principles of Data Science 1 is a Level 5 (Year 2) version of this module with different learning and assessment outcomes. You cannot take both Foundations of Data Science 1 and Principles of Data Science 1.
Anti-requisite modules
If you take this module, you cannot also take:
- ST238-15 Principles of Data Science 1
There is currently no information about the courses for which this module is core or optional.