# Statistical analysis and document mining

### Credits

6 ECTS, lectures and tutorials (CTD) 36h, practicals (TP) 18h

### Instructor

Jean-Baptiste Durand

### Description

The aim of this course is to present statistical approaches for analysing multivariate data. The information age has produced masses of multivariate data in many different fields: finance, marketing, economics, biology, environmental sciences, etc. The theoretical and practical aspects of multivariate data analysis are given equal importance. This balance is achieved through practicals involving actual data analysis using the R software.

#### Content

- Multiple linear regression. Least squares, Gaussian linear model, test of linear hypotheses, one-way analysis of variance.
- Principal Component Analysis (PCA).
- Classification. Linear discriminant analysis, perceptron, Naive Bayes.
- Text mining. Numerical representation of texts, connection with graph clustering.

### Prerequisites

Elementary notions in probability theory (probability distribution, joint probability density function for random vectors, conditional distribution, expectation, variance, covariance, Gaussian distribution).

Elementary notions in mathematical statistics (estimator, confidence interval, statistical tests).

Helpful but not required: simple linear regression, linear algebra (matrix reduction), and basic familiarity with RStudio and the R software.