Lecture 3: Constrained optimization¶

May, 2022 - François HU

Master of Science - EPITA

This lecture is available here: https://curiousml.github.io/


Last lecture¶

  • Generalities on optimization problems
    • Notion of critical point
    • Necessary and sufficient condition of optimality
  • Unconstrained optimization in dimension $n=1$
    • Golden section search
    • Newton's method
  • Unconstrained optimization in dimension $n\geq 2$
    • Newton's method
    • Gradient descent method
    • Finite-difference method
    • Cross-Entropy method

Table of contents¶

Constrained optimization

  • Equality constraints and Lagrange
    • Lagrange
    • Sequential quadratic programming
  • Inequality constraints and Lagrange duality
    • Lagrange duality
    • KKT conditions
  • Application: Ridge penalty
  • (optional) Application: SVM
    • Linear classification

Reminder: constrained optimization¶

  • A constrained optimization problem is written as: $$ \min\limits_x f(x) \quad \text{subject to} \quad g(x) = 0 \text{ and } h(x) \leq 0 $$ with:

    • $f: \mathbb{R}^n \to \mathbb{R}$ the function to minimize;
    • $g: \mathbb{R}^n \to \mathbb{R}^m$ the equality constraint;
    • and $h: \mathbb{R}^n \to \mathbb{R}^p$ the inequality constraint.
  • Later on, we will refer to this type of optimization problem as the primal problem. A minimal numerical sketch follows below.
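As an illustration, here is a minimal sketch of how such a problem can be solved numerically with scipy. The objective and constraints below are assumptions chosen for this example, not taken from the lecture:

```python
import numpy as np
from scipy.optimize import minimize

# Hypothetical primal problem (for illustration only):
#   min  f(x) = x0^2 + x1^2
#   s.t. g(x) = x0 + x1 - 1 = 0   (equality constraint)
#        h(x) = -x0 <= 0          (inequality constraint, i.e. x0 >= 0)
f = lambda x: x[0] ** 2 + x[1] ** 2
constraints = [
    {"type": "eq", "fun": lambda x: x[0] + x[1] - 1.0},
    # scipy encodes inequality constraints as fun(x) >= 0,
    # so h(x) <= 0 must be passed as -h(x) >= 0
    {"type": "ineq", "fun": lambda x: x[0]},
]
res = minimize(f, x0=np.array([2.0, 2.0]), method="SLSQP", constraints=constraints)
print(res.x)  # approximately [0.5, 0.5]
```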

Equality constraints and Lagrange¶

Lagrange multipliers¶

  • An optimization problem with equality constraint is written as: \begin{align*} \min \quad & f(x) \\ \text{subject to} \quad & g(x) = 0 \end{align*} with $f: \mathbb{R}^n \to \mathbb{R}$ and $g: \mathbb{R}^n \to \mathbb{R}^m$

  • A necessary condition for a feasible point $x^*$ to be a solution is that $$ \nabla f(x^*) = - J_g(x^*)^T\lambda $$ with $J_g$ the Jacobian matrix ($\neq$ the gradient!) of $g$ $$ J_g(x) = \begin{bmatrix} \dfrac{\partial g}{\partial x_1} & \dots & \dfrac{\partial g}{\partial x_n} \end{bmatrix} = \begin{bmatrix} \nabla^T g_1\\ \vdots \\ \nabla^T g_m\\ \end{bmatrix} = \begin{bmatrix} \dfrac{\partial g_1}{\partial x_1} & \dots & \dfrac{\partial g_1}{\partial x_n}\\ \vdots & \ddots & \vdots \\ \dfrac{\partial g_m}{\partial x_1} & \dots & \dfrac{\partial g_m}{\partial x_n}\\ \end{bmatrix} $$ and $\lambda\in\mathbb{R}^m$ the vector of Lagrange multipliers (introduced by Joseph-Louis Lagrange in 1788). A small worked example follows.
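A small worked example (chosen for illustration, not from the lecture): minimize $f(x, y) = x + y$ subject to $g(x, y) = x^2 + y^2 - 2 = 0$. Here $\nabla f(x, y) = (1, 1)^T$ and $J_g(x, y) = \begin{bmatrix} 2x & 2y \end{bmatrix}$, so the condition $\nabla f = -J_g^T \lambda$ gives $x = y = -\frac{1}{2\lambda}$; plugging into the constraint yields two critical points: $(-1, -1)$ with $\lambda = \frac{1}{2}$ (the minimum, $f = -2$) and $(1, 1)$ with $\lambda = -\frac{1}{2}$ (the maximum, $f = 2$).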

Lagrangian¶

  • The Lagrangian $\mathcal{L}:\mathbb{R}^{n+m}\to \mathbb{R}$ is defined by $$ \mathcal{L}(x, \lambda) = f(x) + \lambda^T g(x) = f(x) + \sum\limits_{i=1}^{m}\lambda_i g_i(x) $$
  • Its gradient is given by $$ \nabla \mathcal{L}(x, \lambda) = \begin{bmatrix} \nabla f(x) + J_g(x)^T\lambda \\ g(x) \end{bmatrix} $$ so the necessary condition above, combined with feasibility $g(x^*) = 0$, says that $(x^*, \lambda^*)$ is a critical point of the Lagrangian: $\nabla \mathcal{L}(x^*, \lambda^*) = 0$

  • Its Hessian is given by $$ H_\mathcal{L}(x, \lambda) = \begin{bmatrix} H_f(x) + \sum_{i=1}^{m}\lambda_i H_{g_i}(x) & J_g(x)^T\\ J_g(x) & 0 \end{bmatrix} = \begin{bmatrix} B(x, \lambda) & J_g(x)^T\\ J_g(x) & 0 \end{bmatrix} $$
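As a sanity check, here is a minimal sympy sketch that builds the Lagrangian of the worked example above and solves $\nabla \mathcal{L}(x, \lambda) = 0$ symbolically (the functions are the illustrative ones introduced earlier, not from the lecture):

```python
import sympy as sp

x, y, lam = sp.symbols("x y lambda", real=True)

# Illustrative example from above: f(x, y) = x + y, g(x, y) = x^2 + y^2 - 2
f = x + y
g = x ** 2 + y ** 2 - 2

# Lagrangian L(x, lambda) = f(x) + lambda * g(x)
L = f + lam * g

# Critical points of the Lagrangian: all partial derivatives vanish
solutions = sp.solve([sp.diff(L, v) for v in (x, y, lam)], [x, y, lam], dict=True)
print(solutions)  # [{x: -1, y: -1, lambda: 1/2}, {x: 1, y: 1, lambda: -1/2}]
```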

Sequential quadratic programming¶

  • By applying Newton's method to the non-linear system $$\nabla \mathcal{L}(x, \lambda) = \begin{bmatrix} \nabla f(x) + J_g(x)^T\lambda \\ g(x) \end{bmatrix} = 0 $$ we obtain, at each iteration, the linear system $$ \begin{bmatrix} B(x, \lambda) & J_g(x)^T\\ J_g(x) & 0 \end{bmatrix} \begin{bmatrix} s\\ \delta \end{bmatrix} = - \begin{bmatrix} \nabla f(x) + J_g(x)^T\lambda \\ g(x) \end{bmatrix} $$ whose solution gives the updates $x \leftarrow x + s$ and $\lambda \leftarrow \lambda + \delta$
  • (Constrained optimization) This approach is called sequential quadratic programming (SQP); a minimal numerical sketch is given below
  • (Unconstrained optimization) If the problem is unconstrained, the method reduces to Newton's method
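Here is a minimal numpy sketch of this iteration on an assumed toy problem (quadratic objective, one affine equality constraint; everything below is chosen for illustration):

```python
import numpy as np

# Hypothetical equality-constrained problem (illustration only):
#   min f(x) = x0^2 + x1^2   s.t.   g(x) = x0 + x1 - 1 = 0
grad_f = lambda x: 2 * x                        # gradient of f
H_f    = lambda x: 2 * np.eye(2)                # Hessian of f
g      = lambda x: np.array([x[0] + x[1] - 1])  # constraint values
J_g    = lambda x: np.array([[1.0, 1.0]])       # constraint Jacobian (g is affine)

x   = np.array([2.0, 2.0])
lam = np.zeros(1)
for _ in range(20):
    # B = H_f + sum_i lam_i * H_{g_i}; here H_{g_i} = 0 since g is affine
    B = H_f(x)
    J = J_g(x)
    # KKT system: [[B, J^T], [J, 0]] [s, delta] = -[grad_f + J^T lam, g]
    KKT = np.block([[B, J.T], [J, np.zeros((1, 1))]])
    rhs = -np.concatenate([grad_f(x) + J.T @ lam, g(x)])
    step = np.linalg.solve(KKT, rhs)
    x, lam = x + step[:2], lam + step[2:]
print(x, lam)  # expected: x approx [0.5, 0.5], lam approx [-1]
```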

Equality constraints to inequality constraints?¶

  • An optimization problem with an equality constraint can easily be rewritten with inequality constraints: $$ g(x) = 0 \iff (g(x) \geq 0 \quad\text{and}\quad g(x) \leq 0) $$

  • Consider an equality constrained problem: \begin{align*} \min \quad & f(x) \\ \text{subject to} \quad & g(x) = 0 \end{align*} It can be written as \begin{align*} \min \quad & f(x) \\ \text{subject to} \quad & g(x) \leq 0\\ \quad & -g(x) \leq 0\\ \end{align*}

  • For simplicity, in the following we only consider inequality constraints.

Inequality constraints and Lagrange duality¶

Lagrangian¶

  • An optimization problem with (inequality) constraint is written as: \begin{align*} \min \quad & f(x) \\ \text{subject to} \quad & g(x) \leq 0 \end{align*} with $f: \mathbb{R}^n \to \mathbb{R}$ and $g: \mathbb{R}^n \to \mathbb{R}^m$
  • The Lagrangian for this optimization problem is $$ \mathcal{L}(x, \lambda) = f(x) + \lambda^T g(x) = f(x) + \sum\limits_{i=1}^{m}\lambda_i g_i(x) $$ with $\lambda_i \geq 0$ the Lagrange multipliers.

Lagrange duality: definition¶

The (Lagrange) dual function associated with the constrained optimization problem is defined by

$$ F(\lambda) = \inf\limits_{x} \mathcal{L}(x, \lambda) = \inf\limits_{x}\left(f(x) + \lambda^T g(x)\right) $$

with $\lambda_i \geq 0$

  • We call the constrained optimization problem the primal problem:
\begin{align*} \min \quad & f(x) \\ \text{subject to} \quad & g(x) \leq 0 \end{align*}
  • and we call the following optimization problem the associated dual problem:
\begin{align*} \max \quad & F(\lambda) \\ \text{subject to} \quad & \lambda \geq 0 \end{align*}
  • Note that the dual problem is always convex: $F$ is a pointwise infimum of functions affine in $\lambda$, hence concave, and we maximize it subject to linear constraints. A worked example follows.
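A small worked example (chosen for illustration, not from the lecture): take $f(x) = x^2$ with the single constraint $g(x) = 1 - x \leq 0$. The inner infimum is attained at $x = \lambda/2$, so $$ F(\lambda) = \inf\limits_{x}\left(x^2 + \lambda(1 - x)\right) = \lambda - \frac{\lambda^2}{4}. $$ Maximizing over $\lambda \geq 0$ gives $\lambda^* = 2$ and $d^* = 1$, which matches the primal optimal value $p^* = 1$ attained at $x^* = 1$.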

Lagrange duality: properties¶

If we denote by $p^*$ the optimal value of the primal problem (a.k.a. the primal optimal value)

$$ p^* = \inf\limits_{x} \sup\limits_{\lambda\geq0} \mathcal{L}(x, \lambda) $$

and by $d^*$ the optimal value of the dual problem (a.k.a. the dual optimal value)

$$ d^* = \sup\limits_{\lambda\geq0} \inf\limits_{x} \mathcal{L}(x, \lambda) $$

then

  • (weak duality) this inequality always holds: $d^* \leq p^*$

  • (strong duality) the equality $d^* = p^*$ does not hold in general

  • Strong duality does hold when a convex problem satisfies certain constraint qualifications (defined below)

Remark: the Lagrange dual problem is often easier to solve (simpler constraints)! A numerical check of weak/strong duality on the toy example above is sketched below.
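A minimal numerical sketch, reusing the illustrative toy problem above ($f(x) = x^2$, $g(x) = 1 - x \leq 0$), that evaluates the dual function by an inner minimization and maximizes it by a coarse grid search:

```python
import numpy as np
from scipy.optimize import minimize_scalar

# Toy problem (illustration only): min x^2  subject to  1 - x <= 0
f = lambda x: x ** 2
g = lambda x: 1 - x

# Dual function F(lambda) = inf_x ( f(x) + lambda * g(x) ), computed numerically
def F(lam):
    return minimize_scalar(lambda x: f(x) + lam * g(x)).fun

# Dual problem: maximize F over lambda >= 0 (coarse grid search for simplicity)
lams = np.linspace(0.0, 5.0, 501)
vals = [F(lam) for lam in lams]
best = int(np.argmax(vals))
print(lams[best], vals[best])  # lambda* approx 2, d* approx 1 = p* (strong duality)
```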

Strong duality: Slater's condition¶

  • We call the problem \begin{align*} \min \quad & f(x) \\ \text{subject to} \quad & g(x) \leq 0 \end{align*} a convex optimization problem if $f$ and the $g_i$ are convex functions.
  • For a convex optimization problem, strong duality usually holds, but not always

  • Slater's condition (or Slater's constraint qualification): there exists an $x\in\mathbb{R}^n$ such that $g_i(x) < 0$ for all $i\in \{1, \dots, m\}$ (strict feasibility!)

  • Slater's condition is a sufficient condition for strong duality to hold for a convex optimization problem.

Strong duality: KKT conditions¶

Theorem (Karush-Kuhn-Tucker (KKT) conditions): Let us assume that the primal problem is convex and that Slater's constraint qualification holds. We have strong duality if and only if all the following conditions hold:

  1. (primal feasibility) there exists a primal optimal $x^*$
  2. (dual feasibility) there exists a dual optimal $\lambda^*$
  3. (complementary slackness): $\lambda^{*T}g(x^*) = 0$, or equivalently $\lambda_i^*g_i(x^*) = 0$ for all $i\in\{1, \dots, m\}$ (since $\lambda_i^* \geq 0$ and $g_i(x^*) \leq 0$, every term of the sum is $\leq 0$, so the sum vanishes iff each term does)
  4. (stationarity) $\nabla_x\mathcal{L}(x^*, \lambda^*) = 0$
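On the illustrative toy problem above ($f(x) = x^2$, $g(x) = 1 - x$), these conditions can be checked directly at $x^* = 1$, $\lambda^* = 2$: primal feasibility $g(x^*) = 0 \leq 0$; dual feasibility $\lambda^* = 2 \geq 0$; complementary slackness $\lambda^* g(x^*) = 2 \cdot 0 = 0$; stationarity $\nabla_x\mathcal{L}(x^*, \lambda^*) = 2x^* - \lambda^* = 2 - 2 = 0$.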