May, 2022 - François HU
Master of Science - EPITA
This lecture is available here: https://curiousml.github.io/
Constrained optimization
A constrained optimization problem is written as: $$ \min\limits_x f(x) \quad \text{subject to} \quad g(x) = 0 \text{ and } h(x) \leq 0 $$ with:
- $f: \mathbb{R}^n \to \mathbb{R}$ the objective function;
- $g: \mathbb{R}^n \to \mathbb{R}^m$ the equality constraints;
- $h: \mathbb{R}^n \to \mathbb{R}^p$ the inequality constraints.
Later on, we will call this type of optimization problem the primal problem.
An optimization problem with equality constraints is written as: \begin{align*} \min \quad & f(x) \\ \text{subject to} \quad & g(x) = 0 \end{align*} with $f: \mathbb{R}^n \to \mathbb{R}$ and $g: \mathbb{R}^n \to \mathbb{R}^m$.
A necessary condition for a feasible point $x^*$ to be a solution is that $$ \nabla f(x^*) = - J_g(x^*)^T\lambda $$ with $J_g$ the Jacobian matrix ($\neq$ the gradient!) of $g$ $$ J_g(x) = \begin{bmatrix} \dfrac{\partial g}{\partial x_1} & \dots & \dfrac{\partial g}{\partial x_n} \end{bmatrix} = \begin{bmatrix} \nabla^T g_1\\ \vdots \\ \nabla^T g_m\\ \end{bmatrix} = \begin{bmatrix} \dfrac{\partial g_1}{\partial x_1} & \dots & \dfrac{\partial g_1}{\partial x_n}\\ \vdots & \ddots & \vdots \\ \dfrac{\partial g_m}{\partial x_1} & \dots & \dfrac{\partial g_m}{\partial x_n}\\ \end{bmatrix} $$ and $\lambda\in\mathbb{R}^m$ the vector of Lagrange multipliers (named after Joseph-Louis Lagrange, who introduced them in 1788).
The Lagrangian associated with this problem is defined by $\mathcal{L}(x, \lambda) = f(x) + \lambda^T g(x)$. Its gradient is given by $$ \nabla \mathcal{L}(x, \lambda) = \begin{bmatrix} \nabla f(x) + J_g(x)^T\lambda \\ g(x) \end{bmatrix} $$ so the necessary condition above can be restated as: a solution is a critical point of the Lagrangian, i.e. $\nabla \mathcal{L}(x^*, \lambda) = 0$.
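As a quick illustration on a toy problem (assumed for illustration, not from the lecture): when $f$ is quadratic and $g$ is affine, the critical-point condition $\nabla \mathcal{L}(x, \lambda) = 0$ is a linear system, which we can solve directly:

```python
import numpy as np

# Toy problem (illustrative): minimize f(x) = x1^2 + x2^2
# subject to g(x) = x1 + x2 - 1 = 0.
# Stationarity of the Lagrangian, nabla_L = 0, gives the linear system
#   2*x1      + lam = 0
#        2*x2 + lam = 0
#   x1 + x2         = 1
A = np.array([[2.0, 0.0, 1.0],
              [0.0, 2.0, 1.0],
              [1.0, 1.0, 0.0]])
b = np.array([0.0, 0.0, 1.0])
x1, x2, lam = np.linalg.solve(A, b)
print(x1, x2, lam)  # x* = (0.5, 0.5), lambda* = -1
```

The unique critical point $(x^*, \lambda) = ((0.5, 0.5), -1)$ indeed satisfies $\nabla f(x^*) = -J_g(x^*)^T \lambda$.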
Its Hessian is given by $$ H_\mathcal{L}(x, \lambda) = \begin{bmatrix} H_f(x) + \sum_{i=1}^{m}\lambda_i H_{g_i}(x) & J_g(x)^T\\ J_g(x) & 0 \end{bmatrix} = \begin{bmatrix} B(x, \lambda) & J_g(x)^T\\ J_g(x) & 0 \end{bmatrix} $$
An equality constraint can easily be rewritten as two inequality constraints: $$ g(x) = 0 \iff (g(x) \geq 0 \quad\text{and}\quad g(x) \leq 0) $$
Consider an equality constrained problem: \begin{align*} \min \quad & f(x) \\ \text{subject to} \quad & g(x) = 0 \end{align*} It can be written as \begin{align*} \min \quad & f(x) \\ \text{subject to} \quad & g(x) \leq 0\\ \quad & -g(x) \leq 0\\ \end{align*}
For simplicity, we only consider the inequality constraints.
The (Lagrange) dual function associated with the constrained optimization problem is defined by
$$ F(\lambda) = \inf\limits_{x} \mathcal{L}(x, \lambda) = \inf\limits_{x}\left(f(x) + \lambda^T g(x)\right) $$with $\lambda_i \geq 0$
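For instance, on the toy problem $\min_x x^2$ subject to $g(x) = 1 - x \leq 0$ (an illustrative example, not from the lecture), the infimum is attained at $x = \lambda/2$ and the dual function is
$$ F(\lambda) = \inf\limits_{x}\left( x^2 + \lambda(1-x) \right) = \lambda - \frac{\lambda^2}{4}, \qquad \lambda \geq 0. $$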
If we denote $p^*$ the solution of the primal problem (a.k.a primal optimal)
$$ p^* = \inf\limits_{x} \sup\limits_{\lambda\geq0} \mathcal{L}(x, \lambda) $$and $d^*$ the solution of the dual problem (a.k.a dual optimal)
$$ d^* = \sup\limits_{\lambda\geq0} \inf\limits_{x} \mathcal{L}(x, \lambda) $$then
(weak duality) this inequality always holds: $d^* \leq p^*$
(strong duality) the equality $d^* = p^*$ does not hold in general
Strong duality does hold when a convex problem satisfies certain constraint qualifications (to be defined later)
Remark: the Lagrange dual problem is often easier to solve (simpler constraints)!
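As a numerical sanity check on a toy convex problem (assumed for illustration, not from the lecture), we can compare $p^*$ and $d^*$ directly:

```python
import numpy as np

# Toy convex problem: minimize f(x) = x^2 subject to g(x) = 1 - x <= 0,
# i.e. x >= 1, so the primal optimum is p* = f(1) = 1.
# The dual function is F(lambda) = inf_x x^2 + lambda*(1 - x)
#                               = lambda - lambda^2 / 4   (infimum at x = lambda/2).
lam = np.linspace(0.0, 4.0, 100001)     # grid over the dual feasible set lambda >= 0
F = lam - lam**2 / 4.0                  # closed-form dual function on the grid
d_star = F.max()                        # dual optimum (attained at lambda = 2)
p_star = 1.0                            # primal optimum (attained at x = 1)
print(d_star, p_star)                   # d* ≈ p* = 1: strong duality holds
```

Here the problem is convex and strictly feasible (e.g. $x = 2$), so strong duality holds: $d^* = p^* = 1$.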
For a convex optimization problem, we usually have strong duality, but not always.
Slater's condition (or Slater's constraint qualification): there exists an $x\in\mathbb{R}^n$ such that $g_i(x) < 0$ for all $i\in \{1, \dots, m\}$ (strict feasibility!)
Slater's condition is a sufficient condition for strong duality to hold for a convex optimization problem.
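Checking Slater's condition amounts to exhibiting one strictly feasible point. A minimal sketch, with two constraints assumed purely for illustration:

```python
import numpy as np

# Illustrative constraints (assumed example):
#   g1(x) = x1 + x2 - 2 <= 0
#   g2(x) = x1^2 - x2   <= 0
def g(x):
    return np.array([x[0] + x[1] - 2.0, x[0]**2 - x[1]])

# One strictly feasible point suffices for Slater's condition.
x0 = np.array([0.5, 0.5])
print(np.all(g(x0) < 0))  # → True: Slater's condition holds
```

Here $g(x_0) = (-1, -0.25)$, so both constraints are strictly satisfied and strong duality is guaranteed for a convex problem with these constraints.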
Theorem (Karush-Kuhn-Tucker (KKT) conditions): Assume that the primal problem is convex and that Slater's constraint qualification holds. Then $x^*$ and $\lambda^*$ are primal and dual optimal (with strong duality) if and only if all the following conditions hold: