Lecture 4: Numerical methods in linear algebra

June, 2022 - François HU

Master of Science - EPITA

This lecture is available here: https://curiousml.github.io/


Last lecture

  • Generalities on optimization problems
    • Notion of critical point
    • Necessary and sufficient condition of optimality
  • Unconstrained optimization in dimension $n=1$
    • Golden section search
    • Newton's method
  • Unconstrained optimization in dimension $n\geq 2$
    • Newton's method
    • Gradient descent method
    • Finite-difference method
    • Cross-Entropy method
  • Constrained optimization
    • Equality constraints and Lagrange multipliers
    • Inequality constraints and Lagrange duality

Table of contents

Some numerical methods in linear algebra

  • Principal Component Analysis (PCA)
  • (optional) Singular-Value Decomposition (SVD)

Principal Component Analysis (PCA)

Dimensionality reduction

  • Dimensionality reduction transforms data from a high-dimensional space into a low-dimensional space:
    • Avoid the curse of dimensionality: high-dimensional data is sparse, which makes estimation unreliable
    • Noise reduction: there might be too much noise in high-dimensional data
    • Data visualization (2D or 3D visualization): we cannot visualize more than 3 dimensions
  • PCA can be used as a dimensionality reduction technique while keeping as much of the data's variation as possible.


PCA: pseudo-code (see the linear algebra course for more details)

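A minimal NumPy sketch of the PCA steps (center the data, compute the covariance matrix, eigendecompose it, project onto the leading eigenvectors); the function name `pca` and the toy data are illustrative, not part of the course material:

In [ ]:

import numpy as np

def pca(X, k):
    """Project the (n, d) data matrix X onto its first k principal components."""
    # 1. Center the data (remove the column-wise mean)
    X_centered = X - X.mean(axis=0)
    # 2. Compute the (d, d) covariance matrix
    cov = np.cov(X_centered, rowvar=False)
    # 3. Eigendecomposition (eigh is suited to symmetric matrices)
    eigenvalues, eigenvectors = np.linalg.eigh(cov)
    # 4. Sort the eigenpairs by decreasing eigenvalue
    order = np.argsort(eigenvalues)[::-1]
    eigenvalues, eigenvectors = eigenvalues[order], eigenvectors[:, order]
    # 5. Project the centered data onto the k leading eigenvectors
    return X_centered @ eigenvectors[:, :k], eigenvalues

# Example: reduce 5-dimensional toy data to k=2 principal components
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 5))
X_reduced, eigenvalues = pca(X, k=2)
print(X_reduced.shape)  # (100, 2)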

PCA: optimal number of components

  • We note that each eigenvalue corresponds to the variance of the data along the corresponding principal component (PC)
  • The optimal number of PCs corresponds to a trade-off between dimensionality reduction and information loss
  • One way to quantify this trade-off (w.r.t. $k$) is to compute the cumulative explained variance ratio (computed in the sketch below):
$$ \frac{\sum_{i=1}^{k} \lambda_i}{\sum_{i=1}^{d} \lambda_i} $$
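As a quick illustration, this ratio can be computed for every $k$ with a cumulative sum; the eigenvalues below are made-up numbers and the 90% threshold is only a common rule of thumb:

In [ ]:

import numpy as np

# Made-up eigenvalues, sorted in descending order
# (e.g. as returned by the pca sketch above)
eigenvalues = np.array([4.2, 2.1, 0.9, 0.5, 0.3])

# Cumulative variance ratio for every k = 1, ..., d
ratios = np.cumsum(eigenvalues) / np.sum(eigenvalues)
print(ratios)  # [0.525  0.7875 0.9    0.9625 1.    ]

# Smallest k whose components explain at least 90% of the total variance
k = int(np.argmax(ratios >= 0.90)) + 1
print(k)  # 3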

PCA: some limits

  • Model performance: PCA can degrade model performance on datasets whose features have low (linear) correlation
  • Interpretability: each principal component (PC1, PC2, ...) is a linear combination of the original features (F1, F2, ...) $\implies$ principal components are not directly interpretable

(optional) Singular-Value Decomposition (SVD)

  • Matrix factorization method (like LU decomposition)

  • The singular-value decomposition factorizes a matrix $X$ as $$ X = U \times S \times V^T $$ with

    • $U, V$ orthogonal matrices and
    • $S$ a diagonal matrix whose entries are the singular values of $X$ (i.e. the square roots of the eigenvalues of $X^TX$), assumed to be sorted in descending order
  • We can build a low-rank approximation of $X$ by truncating $U$, $S$ and $V$ (see the sketch below):

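A minimal NumPy sketch of the truncated SVD on random toy data (`np.linalg.svd` returns the singular values already sorted in descending order):

In [ ]:

import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(6, 4))

# Full SVD: U is (6, 6), s holds the singular values (descending), Vt is (4, 4)
U, s, Vt = np.linalg.svd(X)

# Check the reconstruction X = U S V^T (S padded to the shape of X)
S = np.zeros_like(X)
np.fill_diagonal(S, s)
assert np.allclose(X, U @ S @ Vt)

# Rank-k truncation: keep only the k leading singular triplets
k = 2
X_k = U[:, :k] @ np.diag(s[:k]) @ Vt[:k, :]

# By the Eckart-Young theorem, X_k is the best rank-k approximation
# of X in the Frobenius norm
print(np.linalg.norm(X - X_k))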
