Lecture 4: Numerical methods in linear algebra

June, 2022 - François HU

Master of Science - EPITA

This lecture is available here: https://curiousml.github.io/


Last lecture

  • Generalities on optimization problems
    • Notion of critical point
    • Necessary and sufficient condition of optimality
  • Unconstrained optimization in dimension $n=1$
    • Golden section search
    • Newton's method
  • Unconstrained optimization in dimension $n\geq 2$
    • Newton's method
    • Gradient descent method
    • Finite-difference method
    • Cross-Entropy method
  • Constrained optimization
    • Equality constraints and Lagrange multipliers
    • Inequality constraints and Lagrange duality

Table of contents

Some numerical methods in linear algebra

  • Principal Component Analysis (PCA)
  • (optional) Singular-Value Decomposition (SVD)

Principal Component Analysis (PCA)

Dimensionality reduction

  • Dimensionality reduction transforms data from a high-dimensional space into a low-dimensional space:
    • Avoid the curse of dimensionality: high-dimensional data is sparse, which makes estimation unreliable
    • Noise reduction: there might be too much noise in high-dimensional data
    • Data visualization (2D or 3D visualization): we cannot visualize more than 3 dimensions
  • PCA can be used as a dimensionality reduction technique while keeping as much of the data's variation as possible.


PCA: pseudo-code (see the linear algebra course for more details)

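A minimal NumPy sketch of the PCA steps (center the data, compute the covariance matrix, eigendecompose it, project onto the leading eigenvectors); the function name `pca` and the toy data are illustrative, not part of the course material:

In [ ]:

import numpy as np

def pca(X, k):
    """Project the (n, d) data matrix X onto its first k principal components."""
    # 1. Center the data (remove the column-wise mean)
    X_centered = X - X.mean(axis=0)
    # 2. Compute the (d, d) covariance matrix
    cov = np.cov(X_centered, rowvar=False)
    # 3. Eigendecomposition (eigh is suited to symmetric matrices)
    eigenvalues, eigenvectors = np.linalg.eigh(cov)
    # 4. Sort the eigenpairs by decreasing eigenvalue
    order = np.argsort(eigenvalues)[::-1]
    eigenvalues, eigenvectors = eigenvalues[order], eigenvectors[:, order]
    # 5. Project the centered data onto the k leading eigenvectors
    return X_centered @ eigenvectors[:, :k], eigenvalues

# Example: reduce 5-dimensional toy data to k=2 principal components
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 5))
X_reduced, eigenvalues = pca(X, k=2)
print(X_reduced.shape)  # (100, 2)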

PCA: optimal number of components

  • We note that each eigenvalue corresponds to the variance of the data along the corresponding principal component (PC)
  • The optimal number of PCs corresponds to a trade-off between dimensionality reduction and information loss
  • One way to quantify this trade-off (w.r.t. $k$) is to compute the cumulative explained variance ratio (computed in the sketch below):
$$ \frac{\sum_{i=1}^{k} \lambda_i}{\sum_{i=1}^{d} \lambda_i} $$
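As a quick illustration, this ratio can be computed for every $k$ with a cumulative sum; the eigenvalues below are made-up numbers and the 90% threshold is only a common rule of thumb:

In [ ]:

import numpy as np

# Made-up eigenvalues, sorted in descending order
# (e.g. as returned by the pca sketch above)
eigenvalues = np.array([4.2, 2.1, 0.9, 0.5, 0.3])

# Cumulative variance ratio for every k = 1, ..., d
ratios = np.cumsum(eigenvalues) / np.sum(eigenvalues)
print(ratios)  # [0.525  0.7875 0.9    0.9625 1.    ]

# Smallest k whose components explain at least 90% of the total variance
k = int(np.argmax(ratios >= 0.90)) + 1
print(k)  # 3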

PCA: some limits

  • Model performance: PCA can degrade model performance on datasets whose features have low (linear) correlation
  • Interpretability: each principal component (PC1, PC2, ...) is a linear combination of the original features (F1, F2, ...) $\implies$ principal components are not directly interpretable

(optional) Singular-Value Decomposition (SVD)

  • Matrix factorization method (like LU decomposition)

  • The singular-value decomposition factorizes a matrix $X$ as $$ X = U \times S \times V^T $$ with

    • $U, V$ orthogonal matrices and
    • $S$ a diagonal matrix whose entries are the singular values of $X$ (i.e. the square roots of the eigenvalues of $X^TX$), assumed to be sorted in descending order
  • We can build a low-rank approximation of $X$ by truncating $U$, $S$ and $V$ (see the sketch below):

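A minimal NumPy sketch of the truncated SVD on random toy data (`np.linalg.svd` returns the singular values already sorted in descending order):

In [ ]:

import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(6, 4))

# Full SVD: U is (6, 6), s holds the singular values (descending), Vt is (4, 4)
U, s, Vt = np.linalg.svd(X)

# Check the reconstruction X = U S V^T (S padded to the shape of X)
S = np.zeros_like(X)
np.fill_diagonal(S, s)
assert np.allclose(X, U @ S @ Vt)

# Rank-k truncation: keep only the k leading singular triplets
k = 2
X_k = U[:, :k] @ np.diag(s[:k]) @ Vt[:k, :]

# By the Eckart-Young theorem, X_k is the best rank-k approximation
# of X in the Frobenius norm
print(np.linalg.norm(X - X_k))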
