Programmes and Modules - Course Details


Module Title	LM Data Mining and Machine Learning
School	School of Engineering
Department	Elec, Elec & Sys Engineering
Module Code	04 30058
Module Lead	Peter Jancovic
Level	Masters Level
Credits	20
Semester	Semester 1
Pre-requisites
Co-requisites
Restrictions	None
Contact Hours	Lecture: 28 hours; Supervised time in studio/workshop: 9 hours; Guided independent study: 163 hours; Total: 200 hours
Exclusions
Description	Data Mining and Machine Learning is concerned with computational techniques for analysing data in order to extract relevant information or discover underlying structure. The module consists of three parts. Part 1 concentrates on text and presents the principles that underpin current text search engines. Part 2 is concerned with generic techniques for analysing and discovering the underlying structure of general data sets. Part 3 focuses on the application of hidden Markov models to automatic speech recognition.
TEXT-BASED INFORMATION RETRIEVAL: Zipf's Law; query-document similarity; Term Frequency (TF); Inverse Document Frequency (IDF); topic spotting; Latent Semantic Analysis. Laboratory Session 1: implementation of a simple search engine using provided C implementations of the techniques covered.
DATA ANALYSIS AND MACHINE LEARNING: statistical modelling and probability estimation; Maximum Likelihood estimation for Gaussian PDFs and Gaussian Mixture PDFs (the E-M algorithm); Principal Component Analysis; clustering; neural networks; hidden Markov models (HMMs). Laboratory Session 2: application of agglomerative and k-means clustering.
APPLICATIONS (SPEECH/AUDIO PATTERN PROCESSING): spectral analysis of speech/audio data; basics of human speech production and perception; introductory phonetics; automatic speech recognition (ASR), including acoustic modelling, language modelling and adaptation. Laboratory Session 3: analysis of speech/audio data; development of an ASR system using provided software tools.
Learning Outcomes	By the end of the module students should be able to:
Construct a basic text-based search engine, including text normalization, implementation of a document index, and calculation of Term Frequency-Inverse Document Frequency (TF-IDF) similarity between queries and documents.
Understand the basic principle of Latent Semantic Analysis.
Implement maximum likelihood estimation of Gaussian PDF and Gaussian Mixture PDF parameters for a given data set.
Understand the basic principle of Principal Component Analysis.
Understand and apply agglomerative, divisive and k-means clustering algorithms.
Understand the basic principles of neural networks.
Demonstrate an in-depth understanding of hidden Markov models (HMMs) for modelling time-varying data.
Demonstrate an understanding of the use of HMMs for automatic speech recognition.
Explain the basic principles of human speech production and perception, and use the language of elementary phonetics.
Understand basic spectral and spectro-temporal analysis of time-varying data.
Develop an HMM-based speech recognition system using available software tools.
Assessment	30058-01 : Module Mark : Mixed (100%)
Assessment Methods & Exceptions	Main assessment:
(50%) Laboratory reports, delivered during/after the module and submitted via Canvas, AND/OR several Canvas-based timed summative assessments (e.g., quizzes to consolidate learning).
(50%) End-of-module assessment, either Option A: a 1.5-hour closed-book examination at the end of the module (centrally timetabled January exam); or Option B: an open-book assessment released and submitted via Canvas, OR a Canvas-based timed summative assessment (e.g., a quiz).
Supplementary/Reassessment: reassessment will match the main assessment method, with due consideration given to any restrictions imposed at the time of reassessment. Students can carry forward passed assessment components from the main assessment.
Other
Reading List
