Module Title
LM Data Mining and Machine Learning
School
School of Engineering
Department
Electronic, Electrical & Systems Engineering
Module Code
04 30058
Module Lead
Peter Jancovic
Level
Masters Level
Credits
20
Semester
Semester 1
Pre-requisites
Co-requisites
Restrictions
None
Contact Hours
Lecture: 28 hours
Supervised time in studio/workshop: 9 hours
Guided independent study: 163 hours
Total: 200 hours
Exclusions
Description
Data Mining and Machine Learning is concerned with computational techniques for analysing data in order to extract relevant information or discover underlying structure. The course consists of three parts. Part 1 concentrates on text and presents the principles that underpin current text search engines. Part 2 is concerned with generic techniques for analysing and discovering the underlying structure of general data sets. Part 3 focuses on the application of hidden Markov models to automatic speech recognition.
Laboratory Session 1: Implementation of a simple search engine using provided C implementations of the techniques covered.
DATA ANALYSIS AND MACHINE LEARNING: Statistical modelling and probability estimation; Maximum Likelihood estimation for Gaussian PDFs and Gaussian Mixture PDFs (the E-M algorithm); Principal Component Analysis; Clustering; Neural Networks; Hidden Markov Models (HMMs).
Laboratory Session 2: Application of agglomerative and k-means clustering.
APPLICATIONS – SPEECH/AUDIO PATTERN PROCESSING: Spectral analysis of speech/audio data; Basics of human speech production and perception; Introductory phonetics; Automatic speech recognition (ASR) – acoustic modelling, language modelling, adaptation.
Laboratory Session 3: Analysis of speech/audio data; Development of an ASR system using provided software tools.
Learning Outcomes
By the end of the module students should be able to:
Construct a basic text-based search engine, including: text normalization; implementation of a document index; calculation of Term Frequency-Inverse Document Frequency (TF-IDF) similarity between queries and documents.
Understand the basic principle of Latent Semantic Analysis.
Implement maximum likelihood estimation of Gaussian PDF and Gaussian Mixture PDF parameters for a given data set.
Understand the basic principle of Principal Component Analysis.
Understand and apply agglomerative, divisive and k-means clustering algorithms.
Understand the basic principles of neural networks.
Demonstrate an in-depth understanding of hidden Markov models (HMMs) for modelling time-varying data.
Demonstrate an understanding of how HMMs are employed for automatic speech recognition.
Explain the basic principles of human speech production and perception and use the language of elementary phonetics.
Understand basic spectral and spectro-temporal analysis of time-varying data.
Develop an HMM-based speech recognition system using available software tools.
Assessment
30058-01 : Module Mark : Mixed (100%)
Assessment Methods & Exceptions
Main assessment:
(50%) Laboratory reports delivered during/after the module (Canvas submission) AND/OR several Canvas-based timed summative assessments (e.g., quizzes to consolidate learning).
(50%) End-of-module assessment:
Option A: 1.5-hour closed-book examination at the end of the module (centrally timetabled January exam).
Option B: Open-book assessment released and submitted via Canvas OR Canvas-based timed summative assessment (e.g., quiz).
Supplementary/Reassessment: Reassessment will match the main assessment method, with due consideration given to any restrictions imposed at the time of reassessment. Students can carry forward passed assessment components from the main assessment.