May, 2022 - François HU
Master of Science - EPITA
This lecture is available here: https://curiousml.github.io/
Unconstrained optimization in dimension $n\geq2$
Let $f:\mathbb{R}^n \to\mathbb{R}$ be a real-valued function of $n$ variables.
We know that a minimum $x^*$ satisfies $\nabla f(x^*) = 0$. We can therefore try to solve the equation $\nabla f(x) = 0$ with Newton's method.
We can also approximate $f$ by $$ f(x+h) \approx f(x) + \nabla f(x)^T h + \dfrac{1}{2}h^T H_f(x)h $$ and minimise the quadratic approximation as a function of $h$.
In both cases, we obtain the iteration, $$ x_{k+1} = x_k - H_f^{-1}(x_k)\nabla f(x_k) $$
We do not explicitly calculate the inverse of the Hessian. Instead, we solve the linear system $$ H_f(x_k)\, s_k = -\nabla f(x_k) $$ and set $x_{k+1} = x_k + s_k$.
The convergence of Newton's method is quadratic, provided the iteration is started close enough to the solution.
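As a minimal sketch (assuming NumPy; `grad` and `hess` are hypothetical callables returning the gradient vector and the Hessian matrix at a point), one iteration can be written with a linear solve instead of an explicit inverse:

```python
import numpy as np

# One Newton step: solve H_f(x_k) s_k = -grad f(x_k), then set x_{k+1} = x_k + s_k.
def newton_step(grad, hess, x):
    s = np.linalg.solve(hess(x), -grad(x))  # avoids forming the inverse of the Hessian
    return x + s
```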
Principle
Optimization problem: find $x^*\in\arg\max_{x} S(x)$.
At each iteration $t = 0, 1, 2, \dots$:
- we sample $Y_1, \dots, Y_n \sim \mathcal{N}(\mu_t, \sigma_t^2)$
- we keep the best $10\%$ of the $Y_i$, i.e. those that maximize $S$
- we estimate (by MLE) $\mu_{t+1}$ and $\sigma_{t+1}^2$ from these retained samples

These steps are repeated (e.g. for $t = 0, 1, \dots, 8$) until the sampling distribution concentrates around the maximizer.
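A minimal one-dimensional sketch of this principle, assuming NumPy, a vectorized score function $S$, and arbitrary initial parameters:

```python
import numpy as np

def cross_entropy_1d(S, mu=0.0, sigma=5.0, n=1000, elite_frac=0.10, n_iter=9):
    """Maximize a vectorized score function S over the reals (sketch)."""
    for t in range(n_iter):                                  # t = 0, 1, ..., 8
        Y = np.random.normal(mu, sigma, size=n)              # Y_i ~ N(mu_t, sigma_t^2)
        elite = Y[np.argsort(S(Y))[-int(elite_frac * n):]]   # keep the best 10% w.r.t. S
        mu, sigma = elite.mean(), elite.std()                 # MLE of mu_{t+1} and sigma_{t+1}
    return mu
```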
Let $f(x, y) = (a-x)^2 + b(y-x^2)^2$ be the Rosenbrock function. The Rosenbrock function (introduced by Howard H. Rosenbrock in 1960) is used as a performance test problem for optimization algorithms. It has a global minimum at $(a, a^2)$.
Question 1:
Define the Rosenbrock function rosenbrock(X, a=1, b=100) and do a 3D plot of the function over the region $[-6, 6]\times[-20, 50]$. In our case, $X = (x, y)$.
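A possible sketch, assuming NumPy and Matplotlib (the grid resolution and the color map are arbitrary choices):

```python
import numpy as np
import matplotlib.pyplot as plt

def rosenbrock(X, a=1, b=100):
    """Rosenbrock function f(x, y) = (a - x)^2 + b (y - x^2)^2."""
    x, y = X
    return (a - x)**2 + b * (y - x**2)**2

# 3D surface plot over [-6, 6] x [-20, 50]
xx, yy = np.meshgrid(np.linspace(-6, 6, 200), np.linspace(-20, 50, 200))
zz = rosenbrock((xx, yy))

fig = plt.figure()
ax = fig.add_subplot(projection="3d")
ax.plot_surface(xx, yy, zz, cmap="viridis")
ax.set_xlabel("x"); ax.set_ylabel("y")
plt.show()
```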
Question 2:
Define:
- rosenbrock_gradient(X, a=1, b=100) that returns the gradient of the Rosenbrock function (in an array)
- rosenbrock_hessian(X, a=1, b=100) that returns the Hessian of the Rosenbrock function (in an array)
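A possible sketch, obtained by differentiating $f(x, y) = (a - x)^2 + b(y - x^2)^2$ by hand:

```python
import numpy as np

# Partial derivatives: df/dx = -2(a - x) - 4bx(y - x^2),  df/dy = 2b(y - x^2)
def rosenbrock_gradient(X, a=1, b=100):
    x, y = X
    return np.array([-2 * (a - x) - 4 * b * x * (y - x**2),
                     2 * b * (y - x**2)])

# Second derivatives: d2f/dx2 = 2 - 4b(y - 3x^2),  d2f/dxdy = -4bx,  d2f/dy2 = 2b
def rosenbrock_hessian(X, a=1, b=100):
    x, y = X
    return np.array([[2 - 4 * b * (y - 3 * x**2), -4 * b * x],
                     [-4 * b * x, 2 * b]])
```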
Question 3: Newton method
Define the Newton optimizer newton(gradient_function, hessian_function, x0, eps=1e-10, max_iter=1000) that returns the minimum of a function given its gradient and Hessian. The function should stop if max_iter iterations are reached or if the norm of the gradient is smaller than eps.
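A minimal sketch of such an optimizer (returning the last iterate, rather than the whole trajectory, is an assumption):

```python
import numpy as np

def newton(gradient_function, hessian_function, x0, eps=1e-10, max_iter=1000):
    x = np.asarray(x0, dtype=float)
    for _ in range(max_iter):
        g = gradient_function(x)
        if np.linalg.norm(g) < eps:     # gradient (numerically) zero: stop
            break
        # Newton step: solve H_f(x) s = -grad f(x) instead of inverting the Hessian
        x = x + np.linalg.solve(hessian_function(x), -g)
    return x
```

For instance, `newton(rosenbrock_gradient, rosenbrock_hessian, x0=[-3.0, 10.0])` should converge to a point close to $(1, 1)$.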
Question 4: Gradient descent method
Define the gradient descent optimizer gradient_descent(gradient_function, alpha=0.01, eps=1e-10, max_iter=1000) that returns the minimum of a function given the gradient. The function should stop if max_iter iterations are reached or if the norm of the gradient is smaller than eps. The parameter alpha corresponds to the step size of the method.
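A minimal sketch; a starting point `x0` is added to the signature as an assumption, since the iteration needs an initial point:

```python
import numpy as np

def gradient_descent(gradient_function, x0, alpha=0.01, eps=1e-10, max_iter=1000):
    x = np.asarray(x0, dtype=float)
    for _ in range(max_iter):
        g = gradient_function(x)
        if np.linalg.norm(g) < eps:   # gradient small enough: (near-)stationary point
            break
        x = x - alpha * g             # fixed step alpha along the steepest-descent direction
    return x
```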
Question 5: Gradient descent method with optimal step
Define the gradient descent optimizer gradient_descent_optimal(function, gradient_function, eps=1e-10, max_iter=1000) that returns the minimum of a function given the gradient. The function should stop if max_iter iterations are reached or if the norm of the gradient is smaller than eps. At each iteration, alpha should be calibrated with a 1D optimizer (e.g. the golden-section search from the first lecture).
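A possible sketch; a starting point `x0` is again assumed, a self-contained golden-section search stands in for the one from the first lecture, and the search interval $[0, 1]$ for alpha is an arbitrary choice:

```python
import numpy as np

def golden_section(phi, lo=0.0, hi=1.0, tol=1e-6):
    """Minimize the 1D function phi on [lo, hi] by golden-section search."""
    inv_phi = (np.sqrt(5) - 1) / 2
    a, b = lo, hi
    while b - a > tol:
        c, d = b - inv_phi * (b - a), a + inv_phi * (b - a)
        if phi(c) < phi(d):
            b = d                      # the minimum lies in [a, d]
        else:
            a = c                      # the minimum lies in [c, b]
    return (a + b) / 2

def gradient_descent_optimal(function, gradient_function, x0, eps=1e-10, max_iter=1000):
    x = np.asarray(x0, dtype=float)
    for _ in range(max_iter):
        g = gradient_function(x)
        if np.linalg.norm(g) < eps:
            break
        # step size minimizing f(x - alpha * g) along the descent direction
        alpha = golden_section(lambda a: function(x - a * g))
        x = x - alpha * g
    return x
```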
Question 6: Cross-entropy method
Define cross_entropy(function, n_sample = 1000, eps = 1e-10, max_iter = 1000) to obtain the minimum of any function $S : \mathbb{R}^d \to \mathbb{R}$. Let us apply it to the Rosenbrock function. Let $S(x) = \sum\limits_{i=1}^{d-1} \left[ 100 (x_{i+1}-x_i)^2 + (x_i - 1)^2 \right]$ be the (variant) Rosenbrock function. This function admits the point $x^* = (1, \dots, 1)$ as its global minimum.
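A possible sketch following the principle above; the dimension argument `d`, the initial Gaussian parameters, and the independent per-coordinate sampling are assumptions not fixed by the question:

```python
import numpy as np

def cross_entropy(function, d=2, n_sample=1000, eps=1e-10, max_iter=1000):
    """Minimize function: R^d -> R with the cross-entropy method (sketch)."""
    mu = np.zeros(d)            # assumed initial mean
    sigma = 5.0 * np.ones(d)    # assumed initial standard deviations
    for _ in range(max_iter):
        Y = np.random.normal(mu, sigma, size=(n_sample, d))   # one Gaussian per coordinate
        scores = np.array([function(y) for y in Y])
        elite = Y[np.argsort(scores)[: n_sample // 10]]        # best 10% (smallest values of S)
        mu, sigma = elite.mean(axis=0), elite.std(axis=0)      # MLE update of the parameters
        if sigma.max() < eps:                                  # sampling distribution has collapsed
            break
    return mu

# Variant Rosenbrock function S(x) = sum_{i=1}^{d-1} [100 (x_{i+1} - x_i)^2 + (x_i - 1)^2]
def S(x):
    return np.sum(100 * (x[1:] - x[:-1])**2 + (x[:-1] - 1)**2)

# cross_entropy(S, d=5) should return a point close to (1, ..., 1)
```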