*April, 2022 - François HU*

*Master of Science - EPITA*

*This lecture is available here: https://curiousml.github.io/*

- Existence and uniqueness of extrema

- Necessary and sufficient condition verified by the extrema

- Algorithms for the calculation of extrema in dimension 1

- Problem in **optimization**: given an (*objective*) function $f: \mathbb{R}^n \to \mathbb{R}$ and a set $S\subset\mathbb{R}^n$, find the *minimum* $x^*\in S$, i.e. such that $f(x^*)\leq f(x)$ for all $x\in S$

- It is sufficient to consider only **minimization**, because maximizing $f$ is the same as minimizing $-f$

- The **objective function** $f$ is generally differentiable and can be linear or non-linear

- The set of **constraints** $S$ is described by a system of equations and inequalities, which may be linear or non-linear. If $S = \mathbb{R}^n$ then the problem is called **unconstrained**.

- A point $x\in S$ is called **feasible**

- An optimization problem is generally written as:

$$
\min_{x\in\mathbb{R}^n} f(x) \quad \text{subject to} \quad g(x) = 0 \;\text{ and }\; h(x) \leq 0
$$

with $f: \mathbb{R}^n \to \mathbb{R}$, $g: \mathbb{R}^n \to \mathbb{R}^m$ and $h: \mathbb{R}^n \to \mathbb{R}^p$
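For example, with $n = 2$ and $m = p = 1$, finding the closest point to the origin on the line $x_1 + x_2 = 1$ with $x_1 \geq 0$ reads:

$$
\min_{x\in\mathbb{R}^2} \; x_1^2 + x_2^2 \quad \text{subject to} \quad g(x) = x_1 + x_2 - 1 = 0 \;\text{ and }\; h(x) = -x_1 \leq 0
$$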

**Linear programming:** $f$, $g$ and $h$ are linear functions

**Non-linear programming:** at least one of the functions $f$, $g$ and $h$ is non-linear

- $x^*\in S$ is a **global minimum** if $f(x^*)\leq f(x)$ for all $x\in S$

- $x^*\in S$ is a **local minimum** if $f(x^*)\leq f(x)$ for all $x$ in a neighborhood of $x^*$

- Finding a global minimum is usually very difficult!

- Most optimization methods are designed to find a local minimum

- If a global minimum is sought, one can try to apply an optimization method with different initial points

- For some problems, such as linear programming, the search for a global minimum is possible

- If $f$ is continuous on a closed and bounded set $S\subset\mathbb{R}^n$, then $f$ has a global minimum on $S$

- If $S$ is not closed or not bounded, then $f$ may have no minimum (local or global) on $S$

- A continuous function $f$ is **coercive** on an unbounded set $S$ if $$ \lim\limits_{\lVert x \rVert \to +\infty} f(x) = +\infty $$

- If $f$ is coercive on an unbounded closed set $S\subset\mathbb{R}^n$, then $f$ admits a global minimum on $S$
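- For example, $f(x) = \lVert x \rVert^2$ is coercive on $\mathbb{R}^n$, whereas $f(x) = e^x$ is not coercive on $\mathbb{R}$, since $f(x) \to 0$ when $x \to -\infty$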

- The set $S$ is **convex** if it contains all the segments between two of its points: $$ \forall x, y \in S,\ \forall t\in[0, 1]: \quad tx + (1-t)y \in S $$

- A function $f: S\subset\mathbb{R}^n \to \mathbb{R}$ is **convex** on a convex set $S$ if $$ \forall x, y \in S,\ \forall t\in[0, 1]: \quad f(tx + (1-t)y) \leq t f(x) + (1-t) f(y) $$

- Any local minimum of a convex function $f$ on a convex set $S$ is a global minimum of $f$ on $S$

- Any local minimum of a strictly convex function $f$ on a convex set $S$ is the unique global minimum of $f$ on $S$
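- For example, $f(x) = x^2$ is strictly convex on $\mathbb{R}$, so its local minimum $x^* = 0$ is its unique global minimum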

- For functions of **one** variable, extrema are found by calculating the **zeros** of the derivative

- For functions of **$n$** variables, we look for the **critical points**, i.e. the solutions of the system
$$
\nabla f(x) = 0
$$
where $\nabla f(x) = \left[\frac{\partial f(x)}{\partial x_1}, \frac{\partial f(x)}{\partial x_2}, \cdots, \frac{\partial f(x)}{\partial x_n}\right]$ is the gradient of $f$

- If $f: S\subset \mathbb{R}^n \to \mathbb{R}$ is continuously differentiable (said to be of class $C^1$), then a local minimum $x^*$ belonging to the interior of $S$ is a critical point of $f$

- Be careful: not all critical points are necessarily **minima** (they can be **maxima** or **saddle points**)

- For a function $f: S\subset \mathbb{R}^n \to \mathbb{R}$ of class $C^2$ (twice continuously differentiable), we distinguish the critical points by considering the Hessian matrix $H_f(x)$, defined by
$$
\left[ H_f(x) \right]_{i, j} = \frac{\partial^2f(x)}{\partial x_i\partial x_j}
$$
which is *symmetric* (by Schwarz's theorem).

For a given critical point $x^*$, if $H_f(x^*)$ is

- positive-definite (i.e. all its *eigenvalues* are positive), then $x^*$ is a **(local) minimum** of $f$
- negative-definite (i.e. all its *eigenvalues* are negative), then $x^*$ is a **(local) maximum** of $f$
- indefinite (i.e. it has both positive and negative *eigenvalues*), then $x^*$ is a **saddle point** of $f$
- singular (i.e. at least one *eigenvalue* is zero, hence the determinant is zero), then we **can't say anything** (the test is inconclusive)
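As an illustration, here is a minimal NumPy sketch of this eigenvalue test (the helper name `classify_critical_point` and the tolerance used to decide that an eigenvalue is zero are our own choices). For $f(x, y) = x^2 - y^2$, the origin is a critical point and the Hessian is constant:

```
import numpy as np

def classify_critical_point(hessian, tol=1e-10):
    """Classify a critical point from the eigenvalues of its Hessian."""
    eigenvalues = np.linalg.eigvalsh(hessian)  # symmetric matrix: real eigenvalues
    if np.any(np.abs(eigenvalues) < tol):
        return "singular Hessian: inconclusive"
    if np.all(eigenvalues > 0):
        return "(local) minimum"
    if np.all(eigenvalues < 0):
        return "(local) maximum"
    return "saddle point"

# Hessian of f(x, y) = x^2 - y^2 (constant everywhere):
H = np.array([[2.0, 0.0],
              [0.0, -2.0]])
print(classify_critical_point(H))  # saddle point
```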

**Algorithm 1:** Golden section search

**Algorithm 2:** Newton's method

**Objective:** Let us minimize the function
$$
f(x) = 0.5 - xe^{-x^2}
$$
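As a sanity check, $f'(x) = (2x^2 - 1)e^{-x^2}$, so $f'$ vanishes at $x = \pm\dfrac{1}{\sqrt{2}}$ and the minimum we are looking for is $x^* = \dfrac{1}{\sqrt{2}} \approx 0.7071$.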

We assume that $f$ is unimodal on $[a, b]$ (it has a single local minimum there). Let $x_1, x_2\in[a, b]$ with $x_1<x_2$

By evaluating and comparing $f(x_1)$ and $f(x_2)$, one of the intervals $]x_2, b]$ or $[a, x_1[$ can be discarded; the minimum belongs to the remaining interval

The process can then be iterated, at the cost of only one new function evaluation per iteration

We want to reduce the search interval by the same factor at each iteration and, moreover, we want the points of the new interval to keep the same relative positions as in the old one

To do this, the relative positions of the two points are chosen as $\tau$ and $1-\tau$ with $\tau^2 = 1-\tau$. Therefore $\tau = \dfrac{\sqrt{5}-1}{2} \approx 0.618$ and $1-\tau \approx 0.382$

Whatever sub-interval is chosen, its length will be $\tau$ times that of the previous interval, and the new points will be at positions $\tau$ and $1-\tau$ relative to the new interval
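For example, on $[a, b] = [0, 2]$ we get $x_1 = a + (1-\tau)(b-a) \approx 0.764$ and $x_2 = a + \tau(b-a) \approx 1.236$. Since $f(x_1) \approx 0.074 < f(x_2) \approx 0.232$, the interval $]x_2, b]$ is discarded: the new interval is $[0, 1.236]$ and $x_1 \approx 0.764$ becomes the new $x_2$ (indeed $0.764 \approx \tau \times 1.236$).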

The convergence rate is linear: the length of the search interval shrinks by the constant factor $\tau \approx 0.618$ at each iteration

- Another approach uses the Taylor expansion at order 2: $$ f(x+h) \approx f(x) + f'(x)h + \dfrac{1}{2}f''(x)h^2 $$

- The minimum of this quadratic function of $h$ (assuming $f''(x) > 0$) is reached at $$ h = -\dfrac{f'(x)}{f''(x)} $$

- The iteration $$ x_{k+1} = x_k - \dfrac{f'(x_k)}{f''(x_k)} $$ is Newton's iteration for solving the non-linear equation $f'(x)=0$

- Newton's method for finding a minimum has **quadratic convergence** (the error is roughly squared at each iteration), provided that the iteration starts close enough to the solution
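- For example, for $f(x) = 0.5 - xe^{-x^2}$ we have $f'(x) = (2x^2 - 1)e^{-x^2}$ and $f''(x) = 2x(3 - 2x^2)e^{-x^2}$; the exponentials cancel in the ratio $f'/f''$, and starting from $x_0 = 1$ the iterates are $x_1 = 0.5$, $x_2 = 0.7$, $x_3 \approx 0.70707$, already very close to $1/\sqrt{2} \approx 0.70711$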

Create a function `golden_search(f, a, b, tol)` which returns the minimum of the function `f` in the interval $[a, b]$ (for a given tolerance) using the golden section search algorithm. Minimize with `golden_search` the function $f(x) = 0.5 - xe^{-x^2}$. We can set $a = 0$, $b = 2$ and $tol = 0.001$.

In [ ]:

```
```
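One possible implementation is sketched below (our own sketch, one way among others): it stops when the interval length drops below `tol`, and reuses the already-evaluated point so that each iteration costs a single new evaluation of `f`.

```
import numpy as np

def golden_search(f, a, b, tol):
    """Golden section search: approximate minimizer of a unimodal
    function f on [a, b], for a given tolerance tol."""
    tau = (np.sqrt(5) - 1) / 2            # ~0.618, satisfies tau**2 == 1 - tau
    x1 = a + (1 - tau) * (b - a)
    x2 = a + tau * (b - a)
    f1, f2 = f(x1), f(x2)
    while b - a > tol:
        if f1 < f2:                       # the minimum is in [a, x2]
            b, x2, f2 = x2, x1, f1        # old x1 becomes the new x2
            x1 = a + (1 - tau) * (b - a)  # single new evaluation
            f1 = f(x1)
        else:                             # the minimum is in [x1, b]
            a, x1, f1 = x1, x2, f2        # old x2 becomes the new x1
            x2 = a + tau * (b - a)
            f2 = f(x2)
    return (a + b) / 2

f = lambda x: 0.5 - x * np.exp(-x**2)
print(golden_search(f, 0, 2, 0.001))      # ~0.7071, i.e. 1/sqrt(2)
```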

Find the minimum of the function $f(x) = 0.5 - xe^{-x^2}$ using Newton's method (you can choose $1$ as a starting point).

In [ ]:

```
```
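A minimal sketch, using the derivatives of $f$ computed by hand above (the helper name `newton_min` and the stopping rule on the step size are our own choices):

```
import numpy as np

def newton_min(df, d2f, x0, tol=1e-8, max_iter=100):
    """Newton's method for minimization: solve f'(x) = 0 by iterating
    x <- x - f'(x) / f''(x)."""
    x = x0
    for _ in range(max_iter):
        step = df(x) / d2f(x)
        x -= step
        if abs(step) < tol:               # stop when the update is tiny
            break
    return x

# Hand-computed derivatives of f(x) = 0.5 - x * exp(-x**2):
df  = lambda x: (2 * x**2 - 1) * np.exp(-x**2)
d2f = lambda x: 2 * x * (3 - 2 * x**2) * np.exp(-x**2)
print(newton_min(df, d2f, x0=1.0))        # ~0.70711, i.e. 1/sqrt(2)
```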

Compare numerically the running time of these two methods.

In [ ]:

```
```
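A minimal sketch with `timeit`, assuming `golden_search` and `newton_min` from the previous cells are already defined; for such fast functions, timing many repetitions gives a more reliable estimate than a single run:

```
import timeit
import numpy as np

f   = lambda x: 0.5 - x * np.exp(-x**2)
df  = lambda x: (2 * x**2 - 1) * np.exp(-x**2)
d2f = lambda x: 2 * x * (3 - 2 * x**2) * np.exp(-x**2)

# Same target accuracy for both methods, averaged over 1000 runs
t_golden = timeit.timeit(lambda: golden_search(f, 0, 2, 1e-8), number=1000)
t_newton = timeit.timeit(lambda: newton_min(df, d2f, 1.0, tol=1e-8), number=1000)
print(f"golden section: {t_golden:.4f} s for 1000 runs")
print(f"Newton:         {t_newton:.4f} s for 1000 runs")
```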