*April, 2022 - François HU*

*Master of Science - EPITA*

*This lecture is available here: https://curiousml.github.io/*

- Existence and uniqueness of extrema

- Necessary and sufficient condition verified by the extrema

- Algorithms for the calculation of extrema in dimension 1

- Problem in **optimization**: given an (*objective*) function $f: \mathbb{R}^n \to \mathbb{R}$ and a set $S\subset\mathbb{R}^n$, find the *minimum* $x^*\in S$, i.e. such that $f(x^*)\leq f(x)$ for all $x\in S$

- It is sufficient to consider only **minimization**, because maximizing $f$ is the same as minimizing $-f$

- The **objective function** $f$ is generally differentiable and can be linear or non-linear

- The set of **constraints** $S$ is described by a system of equations and inequalities, which may be linear or non-linear. If $S = \mathbb{R}^n$ then the problem is called **unconstrained**.

- A point $x\in S$ is called **feasible**

- An optimization problem is generally written as:

$$
\min_{x\in\mathbb{R}^n} f(x) \quad \text{subject to} \quad g(x) = 0 \;\text{ and }\; h(x) \leq 0
$$

with $f: \mathbb{R}^n \to \mathbb{R}$, $g: \mathbb{R}^n \to \mathbb{R}^m$ and $h: \mathbb{R}^n \to \mathbb{R}^p$
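For example, with $n = 2$ and $m = p = 1$, finding the closest point to the origin on the line $x_1 + x_2 = 1$ with $x_1 \geq 0$ reads:

$$
\min_{x\in\mathbb{R}^2} \; x_1^2 + x_2^2 \quad \text{subject to} \quad g(x) = x_1 + x_2 - 1 = 0 \;\text{ and }\; h(x) = -x_1 \leq 0
$$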

**Linear programming:** $f$, $g$ and $h$ are linear functions

**Non-linear programming:** at least one of the functions $f$, $g$ and $h$ is non-linear

- $x^*\in S$ is a **global minimum** if $f(x^*)\leq f(x)$ for all $x\in S$

- $x^*\in S$ is a **local minimum** if $f(x^*)\leq f(x)$ for all $x$ in a neighborhood of $x^*$

- Finding a global minimum is usually very difficult!

- Most optimization methods are designed to find a local minimum

- If a global minimum is sought, one can try to apply an optimization method with different initial points

- For some problems, such as linear programming, the search for a global minimum is possible

- If $f$ is continuous on a closed and bounded set $S\subset\mathbb{R}^n$, then $f$ has a global minimum on $S$

- If $S$ is not closed or not bounded, then $f$ may have no minimum (local or global) on $S$

- A continuous function $f$ is **coercive** on an unbounded set $S$ if $$ \lim\limits_{\lVert x \rVert \to +\infty} f(x) = +\infty $$

- If $f$ is coercive on an unbounded closed set $S\subset\mathbb{R}^n$, then $f$ admits a global minimum on $S$
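- For example, $f(x) = \lVert x \rVert^2$ is coercive on $\mathbb{R}^n$, whereas $f(x) = e^x$ is not coercive on $\mathbb{R}$, since $f(x) \to 0$ when $x \to -\infty$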

- The set $S$ is **convex** if it contains all the segments between two of its points: $$ \forall x, y \in S,\ \forall t\in[0, 1]: \quad tx + (1-t)y \in S $$

- A function $f: S\subset\mathbb{R}^n \to \mathbb{R}$ is **convex** on a convex set $S$ if $$ \forall x, y \in S,\ \forall t\in[0, 1]: \quad f(tx + (1-t)y) \leq t f(x) + (1-t) f(y) $$

- Any local minimum of a convex function $f$ on a convex set $S$ is a global minimum of $f$ on $S$

- Any local minimum of a strictly convex function $f$ on a convex set $S$ is the unique global minimum of $f$ on $S$
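- For example, $f(x) = x^2$ is strictly convex on $\mathbb{R}$, so its local minimum $x^* = 0$ is its unique global minimum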

- For functions of **one** variable, extrema are found by calculating the **zeros** of the derivative

- For functions of **$n$** variables, we look for the **critical points**, i.e. the solutions of the system
$$
\nabla f(x) = 0
$$
where $\nabla f(x) = \left[\frac{\partial f(x)}{\partial x_1}, \frac{\partial f(x)}{\partial x_2}, \cdots, \frac{\partial f(x)}{\partial x_n}\right]$ is the gradient of $f$

- If $f: S\subset \mathbb{R}^n \to \mathbb{R}$ is continuously differentiable (said to be of class $C^1$), then a local minimum $x^*$ belonging to the interior of $S$ is a critical point of $f$

- Be careful: not all critical points are necessarily **minima** (they can be **maxima** or **saddle points**)

- For a function $f: S\subset \mathbb{R}^n \to \mathbb{R}$ of class $C^2$ (twice continuously differentiable), we distinguish the critical points by considering the Hessian matrix $H_f(x)$, defined by
$$
\left[ H_f(x) \right]_{i, j} = \frac{\partial^2f(x)}{\partial x_i\partial x_j}
$$
which is *symmetric* (by Schwarz's theorem).

For a given critical point $x^*$, if $H_f(x^*)$ is

- positive-definite (i.e. all its *eigenvalues* are positive), then $x^*$ is a **(local) minimum** of $f$
- negative-definite (i.e. all its *eigenvalues* are negative), then $x^*$ is a **(local) maximum** of $f$
- indefinite (i.e. it has both positive and negative *eigenvalues*), then $x^*$ is a **saddle point** of $f$
- singular (i.e. at least one *eigenvalue* is zero, hence the determinant is zero), then we **can't say anything** (the test is inconclusive)
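As an illustration, here is a minimal NumPy sketch of this eigenvalue test (the helper name `classify_critical_point` and the tolerance used to decide that an eigenvalue is zero are our own choices). For $f(x, y) = x^2 - y^2$, the origin is a critical point and the Hessian is constant:

```
import numpy as np

def classify_critical_point(hessian, tol=1e-10):
    """Classify a critical point from the eigenvalues of its Hessian."""
    eigenvalues = np.linalg.eigvalsh(hessian)  # symmetric matrix: real eigenvalues
    if np.any(np.abs(eigenvalues) < tol):
        return "singular Hessian: inconclusive"
    if np.all(eigenvalues > 0):
        return "(local) minimum"
    if np.all(eigenvalues < 0):
        return "(local) maximum"
    return "saddle point"

# Hessian of f(x, y) = x^2 - y^2 (constant everywhere):
H = np.array([[2.0, 0.0],
              [0.0, -2.0]])
print(classify_critical_point(H))  # saddle point
```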

**Algorithm 1:** Golden section search

**Algorithm 2:** Newton's method

**Objective:** Let us minimize the function
$$
f(x) = 0.5 - xe^{-x^2}
$$
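As a sanity check, $f'(x) = (2x^2 - 1)e^{-x^2}$, so $f'$ vanishes at $x = \pm\dfrac{1}{\sqrt{2}}$ and the minimum we are looking for is $x^* = \dfrac{1}{\sqrt{2}} \approx 0.7071$.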

We assume that $f$ is unimodal on $[a, b]$ (it has a single local minimum there). Let $x_1, x_2\in[a, b]$ with $x_1<x_2$

By evaluating and comparing $f(x_1)$ and $f(x_2)$, one of the intervals $]x_2, b]$ or $[a, x_1[$ can be discarded; the minimum belongs to the remaining interval

The process can then be iterated, at the cost of only one new function evaluation per iteration

We want to reduce the search interval by the same factor at each iteration and, moreover, we want the points of the new interval to keep the same relative positions as in the old one

To do this, the relative positions of the two points are chosen as $\tau$ and $1-\tau$ with $\tau^2 = 1-\tau$. Therefore $\tau = \dfrac{\sqrt{5}-1}{2} \approx 0.618$ and $1-\tau \approx 0.382$

Whatever sub-interval is chosen, its length will be $\tau$ times that of the previous interval, and the new points will be at positions $\tau$ and $1-\tau$ relative to the new interval
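For example, on $[a, b] = [0, 2]$ we get $x_1 = a + (1-\tau)(b-a) \approx 0.764$ and $x_2 = a + \tau(b-a) \approx 1.236$. Since $f(x_1) \approx 0.074 < f(x_2) \approx 0.232$, the interval $]x_2, b]$ is discarded: the new interval is $[0, 1.236]$ and $x_1 \approx 0.764$ becomes the new $x_2$ (indeed $0.764 \approx \tau \times 1.236$).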

The convergence rate is linear: the length of the search interval shrinks by the constant factor $\tau \approx 0.618$ at each iteration

- Another approach uses the Taylor expansion at order 2: $$ f(x+h) \approx f(x) + f'(x)h + \dfrac{1}{2}f''(x)h^2 $$

- The minimum of this quadratic function of $h$ (assuming $f''(x) > 0$) is reached at $$ h = -\dfrac{f'(x)}{f''(x)} $$

- The iteration $$ x_{k+1} = x_k - \dfrac{f'(x_k)}{f''(x_k)} $$ is Newton's iteration for solving the non-linear equation $f'(x)=0$

- Newton's method for finding a minimum has **quadratic convergence** (the error is roughly squared at each iteration), provided that the iteration starts close enough to the solution
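- For example, for $f(x) = 0.5 - xe^{-x^2}$ we have $f'(x) = (2x^2 - 1)e^{-x^2}$ and $f''(x) = 2x(3 - 2x^2)e^{-x^2}$; the exponentials cancel in the ratio $f'/f''$, and starting from $x_0 = 1$ the iterates are $x_1 = 0.5$, $x_2 = 0.7$, $x_3 \approx 0.70707$, already very close to $1/\sqrt{2} \approx 0.70711$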

Create a function `golden_search(f, a, b, tol)` which returns the minimum of the function `f` in the interval $[a, b]$ (for a given tolerance) using the golden section search algorithm. Minimize with `golden_search` the function $f(x) = 0.5 - xe^{-x^2}$. We can set $a = 0$, $b = 2$ and $tol = 0.001$.

In [ ]:

```
```
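One possible implementation is sketched below (our own sketch, one way among others): it stops when the interval length drops below `tol`, and reuses the already-evaluated point so that each iteration costs a single new evaluation of `f`.

```
import numpy as np

def golden_search(f, a, b, tol):
    """Golden section search: approximate minimizer of a unimodal
    function f on [a, b], for a given tolerance tol."""
    tau = (np.sqrt(5) - 1) / 2            # ~0.618, satisfies tau**2 == 1 - tau
    x1 = a + (1 - tau) * (b - a)
    x2 = a + tau * (b - a)
    f1, f2 = f(x1), f(x2)
    while b - a > tol:
        if f1 < f2:                       # the minimum is in [a, x2]
            b, x2, f2 = x2, x1, f1        # old x1 becomes the new x2
            x1 = a + (1 - tau) * (b - a)  # single new evaluation
            f1 = f(x1)
        else:                             # the minimum is in [x1, b]
            a, x1, f1 = x1, x2, f2        # old x2 becomes the new x1
            x2 = a + tau * (b - a)
            f2 = f(x2)
    return (a + b) / 2

f = lambda x: 0.5 - x * np.exp(-x**2)
print(golden_search(f, 0, 2, 0.001))      # ~0.7071, i.e. 1/sqrt(2)
```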

Find the minimum of the function $f(x) = 0.5 - xe^{-x^2}$ using Newton's method (you can choose $1$ as a starting point).

In [ ]:

```
```
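A minimal sketch, using the derivatives of $f$ computed by hand above (the helper name `newton_min` and the stopping rule on the step size are our own choices):

```
import numpy as np

def newton_min(df, d2f, x0, tol=1e-8, max_iter=100):
    """Newton's method for minimization: solve f'(x) = 0 by iterating
    x <- x - f'(x) / f''(x)."""
    x = x0
    for _ in range(max_iter):
        step = df(x) / d2f(x)
        x -= step
        if abs(step) < tol:               # stop when the update is tiny
            break
    return x

# Hand-computed derivatives of f(x) = 0.5 - x * exp(-x**2):
df  = lambda x: (2 * x**2 - 1) * np.exp(-x**2)
d2f = lambda x: 2 * x * (3 - 2 * x**2) * np.exp(-x**2)
print(newton_min(df, d2f, x0=1.0))        # ~0.70711, i.e. 1/sqrt(2)
```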

Compare numerically the running time of these two methods.

In [ ]:

```
```
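A minimal sketch with `timeit`, assuming `golden_search` and `newton_min` from the previous cells are already defined; for such fast functions, timing many repetitions gives a more reliable estimate than a single run:

```
import timeit
import numpy as np

f   = lambda x: 0.5 - x * np.exp(-x**2)
df  = lambda x: (2 * x**2 - 1) * np.exp(-x**2)
d2f = lambda x: 2 * x * (3 - 2 * x**2) * np.exp(-x**2)

# Same target accuracy for both methods, averaged over 1000 runs
t_golden = timeit.timeit(lambda: golden_search(f, 0, 2, 1e-8), number=1000)
t_newton = timeit.timeit(lambda: newton_min(df, d2f, 1.0, tol=1e-8), number=1000)
print(f"golden section: {t_golden:.4f} s for 1000 runs")
print(f"Newton:         {t_newton:.4f} s for 1000 runs")
```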