Programme And Module Handbook
Course Details in 2025/26 Session

Module Title LM Data Mining and Machine Learning
School School of Engineering
Department Elec, Elec & Sys Engineering
Module Code 04 30058
Module Lead Prof. Martin Russell
Level Masters Level
Credits 20
Semester Semester 1
Restrictions None
Contact Hours Lecture: 28 hours
Supervised time in studio/workshop: 9 hours
Guided independent study: 163 hours
Total: 200 hours
Description Data Mining and Machine Learning is concerned with computational techniques for data analysis, to extract relevant information or discover underlying structure. The course consists of three parts.
Part 1 concentrates on text and presents the principles which underpin current text search engines.
Part 2 is concerned with generic techniques for analysing and discovering the underlying structure of general data sets.
Part 3 focuses on the application of hidden Markov models to automatic speech recognition.

Zipf's Law, Query-document similarity, Term-Frequency, Inverse Document Frequency, Topic spotting, Latent Semantic Analysis

Laboratory Session 1: Implementation of a simple search engine using provided C implementations of the techniques covered.

Statistical modelling and probability estimation; Maximum Likelihood estimation for Gaussian PDFs and Gaussian Mixture PDFs (the E-M algorithm); Principal Component Analysis; Clustering; Neural Networks; Hidden Markov models (HMMs).

Laboratory Session 2: Application of agglomerative and k-means clustering.

Spectral analysis of speech/audio data; Basics of human speech production and perception; Introductory phonetics; Automatic speech recognition (ASR) – acoustic modelling, language modelling, adaptation.

Laboratory Session 3: Analysis of speech/audio data; Development of an ASR system using provided software tools.
Learning Outcomes By the end of the module students should be able to:
  • Construct a basic text-based search engine, including: Text normalization; Implementation of a Document Index; Calculation of Term-Frequency and Inverse Document Frequency similarity between queries and documents.
  • Understand the basic principle of Latent Semantic Analysis.
  • Implement maximum likelihood estimation of Gaussian PDF and Gaussian Mixture PDF parameters for a given data set.
  • Understand the basic principle of Principal Component Analysis.
  • Understand and apply agglomerative, divisive and k-means clustering algorithms.
  • Understand the basic principles of neural networks.
  • Demonstrate an in-depth understanding of hidden Markov models (HMMs) for modelling time-varying data.
  • Demonstrate an understanding of the use of HMMs for automatic speech recognition.
  • Explain the basic principles of human speech production and perception and use the language of elementary phonetics.
  • Understand basic spectral and spectro-temporal analysis of time-varying data.
  • Develop an HMM-based speech recognition system using available software tools.
Assessment 30058-01 : Module Mark : Mixed (100%)
Assessment Methods & Exceptions Main assessment
(50%) laboratory reports submitted via Canvas during or after delivery of the module AND/OR several Canvas-based timed summative assessments (e.g., quizzes to consolidate learning)

(50%) end of module assessment:
Option A: 1.5-hour closed-book examination at end of module (centrally timetabled January exam)
Option B: Open-book assessment released and submitted via Canvas OR Canvas-based timed summative assessment (e.g., quiz)

Reassessment will match the main assessment method, with due consideration given to any restrictions imposed at the time of reassessment. Students can carry forward passed assessment components from the main assessment.