May, 2022 - François HU
Master of Science - EPITA
This lecture is available here: https://curiousml.github.io/
Unconstrained optimization in dimension $n\geq2$
Let $f:\mathbb{R}^n \to\mathbb{R}$ be a real valued function of $n$ variables
We know that a minimum $x^*$ satisfies $\nabla f(x^*) = 0$. We can therefore try to solve the equation $\nabla f(x) = 0$ with Newton's method.
We can also approximate $f$ by $$ f(x+h) \approx f(x) + \nabla f(x)^T h + \dfrac{1}{2}h^T H_f(x) h $$ and minimise the quadratic approximation as a function of $h$.
In both cases, we obtain the iteration, $$ x_{k+1} = x_k - H_f^{-1}(x_k)\nabla f(x_k) $$
We do not explicitly compute the inverse of the Hessian. Instead, at each iteration we solve the linear system $$ H_f(x_k)\, s_k = -\nabla f(x_k) $$ and set $x_{k+1} = x_k + s_k$.
The convergence of Newton's method is quadratic, provided the iteration is started close enough to the solution.
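As an illustration, a single Newton step can be written with a linear solve instead of an explicit inverse. This is a minimal sketch in Python; the helper name newton_step is introduced here only for illustration.

```python
import numpy as np

def newton_step(grad, hess, x):
    """One Newton step: solve H_f(x) s = -grad f(x), then return x + s."""
    # Solving the linear system avoids forming the inverse Hessian explicitly.
    s = np.linalg.solve(hess(x), -grad(x))
    return x + s
```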
Principle
Optimization goal: find $x^*\in\arg\max_{x} S(x)$.
For $t = 0, 1, 2, \dots$:
we sample $Y_1, \dots, Y_n \sim \mathcal{N}(\mu_t, \sigma_t^2)$
we choose the best $10\%$ of the $Y_i$, i.e. those that maximize $S$
we estimate (by MLE) $\mu_{t+1}$ and $\sigma_{t+1}^2$
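A minimal one-dimensional sketch of this principle, assuming an initial distribution $\mathcal{N}(0, 5^2)$, $n = 1000$ samples and a $10\%$ elite fraction (all illustrative choices):

```python
import numpy as np

def cross_entropy_principle_1d(S, mu0=0.0, sigma0=5.0, n=1000, n_iter=9, elite_frac=0.10):
    """Maximize S by repeatedly fitting a Gaussian to the best 10% of the samples."""
    mu, sigma = mu0, sigma0
    for t in range(n_iter):
        Y = np.random.normal(mu, sigma, size=n)                # Y_1, ..., Y_n ~ N(mu_t, sigma_t^2)
        scores = np.array([S(y) for y in Y])
        elite = Y[np.argsort(scores)[-int(elite_frac * n):]]   # keep the best 10% (largest S)
        mu, sigma = elite.mean(), elite.std()                  # MLE of mu_{t+1} and sigma_{t+1}
    return mu
```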
Let $f(x, y) = (a-x)^2 + b(y-x^2)^2$ be the Rosenbrock function. Introduced by Howard H. Rosenbrock in 1960, it is used as a performance test problem for optimization algorithms. It has a global minimum at $(a, a^2)$.
Question 1:
Define the Rosenbrock function rosenbrock(X, a=1, b=100) and make a 3D plot of it over the region $[-6, 6]\times[-20, 50]$. In our case $X = (x, y)$.
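A possible implementation and plot (a sketch; the grid resolution and colormap are arbitrary choices):

```python
import numpy as np
import matplotlib.pyplot as plt

def rosenbrock(X, a=1, b=100):
    """Rosenbrock function f(x, y) = (a - x)^2 + b (y - x^2)^2."""
    x, y = X
    return (a - x)**2 + b * (y - x**2)**2

# 3D surface plot on [-6, 6] x [-20, 50]
x = np.linspace(-6, 6, 200)
y = np.linspace(-20, 50, 200)
xx, yy = np.meshgrid(x, y)
zz = rosenbrock((xx, yy))

fig = plt.figure()
ax = fig.add_subplot(projection="3d")
ax.plot_surface(xx, yy, zz, cmap="viridis")
ax.set_xlabel("x"); ax.set_ylabel("y"); ax.set_zlabel("f(x, y)")
plt.show()
```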
Question 2:
Define:
rosenbrock_gradient(X, a=1, b=100), which returns the gradient of the Rosenbrock function (as an array);
rosenbrock_hessian(X, a=1, b=100), which returns the Hessian of the Rosenbrock function (as an array).
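A possible implementation, using the partial derivatives $\partial_x f = -2(a-x) - 4bx(y-x^2)$, $\partial_y f = 2b(y-x^2)$ and the corresponding second derivatives (a sketch):

```python
import numpy as np

def rosenbrock_gradient(X, a=1, b=100):
    """Gradient of the Rosenbrock function as a length-2 array."""
    x, y = X
    return np.array([-2 * (a - x) - 4 * b * x * (y - x**2),
                     2 * b * (y - x**2)])

def rosenbrock_hessian(X, a=1, b=100):
    """Hessian of the Rosenbrock function as a 2x2 array."""
    x, y = X
    return np.array([[2 - 4 * b * (y - 3 * x**2), -4 * b * x],
                     [-4 * b * x,                  2 * b]])
```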
Question 3: Newton method
Define the Newton optimizer newton(gradient_function, hessian_function, x0, eps=1e-10, max_iter=1000) that returns the minimum of a function given the gradient and the Hessian. The function should stop if max_iter iterations are reached or if the norm of the gradient is smaller than eps.
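A minimal sketch of such an optimizer, reusing the linear-solve formulation above:

```python
import numpy as np

def newton(gradient_function, hessian_function, x0, eps=1e-10, max_iter=1000):
    """Newton's method: solve H_f(x_k) s_k = -grad f(x_k), stop when ||grad|| < eps."""
    x = np.asarray(x0, dtype=float)
    for _ in range(max_iter):
        g = gradient_function(x)
        if np.linalg.norm(g) < eps:
            break
        x = x + np.linalg.solve(hessian_function(x), -g)
    return x
```

For example, newton(rosenbrock_gradient, rosenbrock_hessian, np.array([2.0, 2.0])) is expected to converge to a point close to the global minimum $(1, 1)$.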
Question 4: Gradient descent method
Define the gradient descent optimizer gradient_descent(gradient_function, alpha=0.01, eps=1e-10, max_iter=1000) that returns the minimum of a function given its gradient. The function should stop if max_iter iterations are reached or if the norm of the gradient is smaller than eps. alpha corresponds to the step size of the method.
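A minimal sketch; a starting point x0 is added to the signature as an assumption, since the method needs one:

```python
import numpy as np

def gradient_descent(gradient_function, x0, alpha=0.01, eps=1e-10, max_iter=1000):
    """Fixed-step gradient descent: x_{k+1} = x_k - alpha * grad f(x_k)."""
    x = np.asarray(x0, dtype=float)
    for _ in range(max_iter):
        g = gradient_function(x)
        if np.linalg.norm(g) < eps:
            break
        x = x - alpha * g
    return x
```

Note that on the Rosenbrock function the gradient is very steep away from the valley, so a fixed step can diverge; a small alpha and a starting point near the minimum are usually needed.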
Question 5: Gradient descent method with optimal step
Define the gradient descent optimizer gradient_descent_optimal(function, gradient_function, eps=1e-10, max_iter=1000) that returns the minimum of a function given the function and its gradient. The function should stop if max_iter iterations are reached or if the norm of the gradient is smaller than eps. At each iteration, alpha should be calibrated with a 1D optimizer (e.g. the golden section search from the first lecture).
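A minimal sketch; the golden_section helper below stands in for the 1D optimizer of the first lecture, and a starting point x0 is again added to the signature as an assumption:

```python
import numpy as np

def golden_section(phi, a=0.0, b=1.0, tol=1e-8):
    """Golden-section search for the minimum of a 1D function phi on [a, b]."""
    inv_phi = (np.sqrt(5) - 1) / 2
    c, d = b - inv_phi * (b - a), a + inv_phi * (b - a)
    while abs(b - a) > tol:
        if phi(c) < phi(d):
            b, d = d, c
            c = b - inv_phi * (b - a)
        else:
            a, c = c, d
            d = a + inv_phi * (b - a)
    return (a + b) / 2

def gradient_descent_optimal(function, gradient_function, x0, eps=1e-10, max_iter=1000):
    """Steepest descent: at each step, alpha minimizes t -> f(x - t * grad f(x))."""
    x = np.asarray(x0, dtype=float)
    for _ in range(max_iter):
        g = gradient_function(x)
        if np.linalg.norm(g) < eps:
            break
        alpha = golden_section(lambda t: function(x - t * g))
        x = x - alpha * g
    return x
```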
Question 6: Cross-entropy method
Define the cross-entropy optimizer cross_entropy(function, n_sample=1000, eps=1e-10, max_iter=1000) to obtain the minimum of any function $S : \mathbb{R}^d \to \mathbb{R}$. Let's apply it to the Rosenbrock function. Let $S(x) = \sum\limits_{i=1}^{d-1} \left[ 100 (x_{i+1}-x_i)^2 + (x_i - 1)^2 \right]$ be the (variant) Rosenbrock function. This function has a global minimum at the point $x^* = (1, \dots, 1)$.
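A minimal sketch of such a cross-entropy minimizer, assuming an independent Gaussian per coordinate, a $10\%$ elite fraction and an initial distribution $\mathcal{N}(0, 5^2)$ on each coordinate; the dimension d is passed explicitly here. All of these are illustrative choices, not requirements.

```python
import numpy as np

def S(x):
    """(Variant) Rosenbrock function in dimension d, with global minimum at (1, ..., 1)."""
    x = np.asarray(x)
    return np.sum(100 * (x[1:] - x[:-1])**2 + (x[:-1] - 1)**2)

def cross_entropy(function, d=2, n_sample=1000, eps=1e-10, max_iter=1000, elite_frac=0.10):
    """Cross-entropy minimization: fit a Gaussian to the best 10% of the samples."""
    mu, sigma = np.zeros(d), 5.0 * np.ones(d)                       # assumed initial distribution
    for _ in range(max_iter):
        Y = np.random.normal(mu, sigma, size=(n_sample, d))         # sample candidates
        scores = np.array([function(y) for y in Y])
        elite = Y[np.argsort(scores)[:int(elite_frac * n_sample)]]  # keep the best 10% (smallest S)
        mu, sigma = elite.mean(axis=0), elite.std(axis=0)           # MLE updates
        if sigma.max() < eps:                                       # the distribution has collapsed
            break
    return mu
```

For example, cross_entropy(S, d=5) is expected to return a vector close to $x^* = (1, \dots, 1)$.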