Fundamentals of probabilistic data mining


3 ECTS, C. 13.5h, L. 4.5h


Xavier Alameda-Pineda


This course introduces probabilistic models with latent variables, together with the algorithms used to estimate their parameters and to perform inference over the latent variables. Such models are used for unsupervised tasks such as clustering and source modeling, as well as for supervised tasks such as classification and regression. You will discover the basic probabilistic models as well as more advanced techniques.

The following topics are addressed:

- Principles of probabilistic data mining and generative models
- Latent variables and probabilistic graphical models
- Mixture models
- The linear-Gaussian model and probabilistic PCA
- Markov models for time series with continuous and discrete latent variables
- Variational inference and variational autoencoders

At the end of the course, the student will have basic knowledge of the most common probabilistic models with latent variables. In particular, the student will be able to perform model-based clustering, analyze and segment time series with hidden Markov models, build the graphical model associated with a given distribution, project numerical multivariate data with missing coordinates onto planes, and work with state-of-the-art non-linear regression models based on variational autoencoders.


Prerequisites: fundamental principles of probability theory (conditioning) and statistics (the maximum likelihood estimator and its usual asymptotic properties).


The first session combines a written exam (E1) and the reports of the three practical sessions (P). The final grade of the first session is computed as (E1+P)/2. The second session consists of a written exam only (E2), which constitutes the final grade.