Lecture 3: Constrained optimization¶

May, 2022 - François HU

Master of Science - EPITA

This lecture is available here: https://curiousml.github.io/


Last lecture¶

  • Generalities on optimization problems
    • Notion of critical point
    • Necessary and sufficient condition of optimality
  • Unconstrained optimization in dimension $n=1$
    • Golden section search
    • Newton's method
  • Unconstrained optimization in dimension $n\geq 2$
    • Newton's method
    • Gradient descent method
    • Finite-difference method
    • Cross-Entropy method

Table of contents¶

Constrained optimization

  • Equality constraints and Lagrange
    • Lagrange
    • Sequential quadratic programming
  • Inequality constraints and Lagrange duality
    • Lagrange duality
    • KKT conditions
  • Application: Ridge penalty
  • (optional) Application: SVM
    • Linear classification

Reminder: constrained optimization¶

  • A constrained optimization problem is written as: $$ \min\limits_x f(x) \quad \text{subject to} \quad g(x) = 0 \text{ and } h(x) \leq 0 $$ with:

    • $f: \mathbb{R}^n \to \mathbb{R}$ the function to minimize;
    • $g: \mathbb{R}^n \to \mathbb{R}^m$ the equality constraint;
    • and $h: \mathbb{R}^n \to \mathbb{R}^p$ the inequality constraint.
  • Later on, we will refer to this type of optimization problem as the primal problem. A minimal numerical sketch follows below.
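As an illustration, here is a minimal sketch of how such a problem can be solved numerically with scipy. The objective and constraints below are assumptions chosen for this example, not taken from the lecture:

```python
import numpy as np
from scipy.optimize import minimize

# Hypothetical primal problem (for illustration only):
#   min  f(x) = x0^2 + x1^2
#   s.t. g(x) = x0 + x1 - 1 = 0   (equality constraint)
#        h(x) = -x0 <= 0          (inequality constraint, i.e. x0 >= 0)
f = lambda x: x[0] ** 2 + x[1] ** 2
constraints = [
    {"type": "eq", "fun": lambda x: x[0] + x[1] - 1.0},
    # scipy encodes inequality constraints as fun(x) >= 0,
    # so h(x) <= 0 must be passed as -h(x) >= 0
    {"type": "ineq", "fun": lambda x: x[0]},
]
res = minimize(f, x0=np.array([2.0, 2.0]), method="SLSQP", constraints=constraints)
print(res.x)  # approximately [0.5, 0.5]
```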

Equality constraints and Lagrange¶

Lagrange multipliers¶

  • An optimization problem with equality constraint is written as: \begin{align*} \min \quad & f(x) \\ \text{subject to} \quad & g(x) = 0 \end{align*} with $f: \mathbb{R}^n \to \mathbb{R}$ and $g: \mathbb{R}^n \to \mathbb{R}^m$

  • A necessary condition for a feasible point $x^*$ to be a solution is that $$ \nabla f(x^*) = - J_g(x^*)^T\lambda $$ with $J_g$ the Jacobian matrix ($\neq$ the gradient!) of $g$ $$ J_g(x) = \begin{bmatrix} \dfrac{\partial g}{\partial x_1} & \dots & \dfrac{\partial g}{\partial x_n} \end{bmatrix} = \begin{bmatrix} \nabla^T g_1\\ \vdots \\ \nabla^T g_m\\ \end{bmatrix} = \begin{bmatrix} \dfrac{\partial g_1}{\partial x_1} & \dots & \dfrac{\partial g_1}{\partial x_n}\\ \vdots & \ddots & \vdots \\ \dfrac{\partial g_m}{\partial x_1} & \dots & \dfrac{\partial g_m}{\partial x_n}\\ \end{bmatrix} $$ and $\lambda\in\mathbb{R}^m$ the vector of Lagrange multipliers (introduced by Joseph-Louis Lagrange in 1788). A small worked example follows.
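A small worked example (chosen for illustration, not from the lecture): minimize $f(x, y) = x + y$ subject to $g(x, y) = x^2 + y^2 - 2 = 0$. Here $\nabla f(x, y) = (1, 1)^T$ and $J_g(x, y) = \begin{bmatrix} 2x & 2y \end{bmatrix}$, so the condition $\nabla f = -J_g^T \lambda$ gives $x = y = -\frac{1}{2\lambda}$; plugging into the constraint yields two critical points: $(-1, -1)$ with $\lambda = \frac{1}{2}$ (the minimum, $f = -2$) and $(1, 1)$ with $\lambda = -\frac{1}{2}$ (the maximum, $f = 2$).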

Lagrangian¶

  • The Lagrangian $\mathcal{L}:\mathbb{R}^{n+m}\to \mathbb{R}$ is defined by $$ \mathcal{L}(x, \lambda) = f(x) + \lambda^T g(x) = f(x) + \sum\limits_{i=1}^{m}\lambda_i g_i(x) $$
  • Its gradient is given by $$ \nabla \mathcal{L}(x, \lambda) = \begin{bmatrix} \nabla f(x) + J_g(x)^T\lambda \\ g(x) \end{bmatrix} $$ so the necessary condition above, combined with feasibility $g(x^*) = 0$, says that $(x^*, \lambda^*)$ is a critical point of the Lagrangian: $\nabla \mathcal{L}(x^*, \lambda^*) = 0$

  • Its Hessian is given by $$ H_\mathcal{L}(x, \lambda) = \begin{bmatrix} H_f(x) + \sum_{i=1}^{m}\lambda_i H_{g_i}(x) & J_g(x)^T\\ J_g(x) & 0 \end{bmatrix} = \begin{bmatrix} B(x, \lambda) & J_g(x)^T\\ J_g(x) & 0 \end{bmatrix} $$
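As a sanity check, here is a minimal sympy sketch that builds the Lagrangian of the worked example above and solves $\nabla \mathcal{L}(x, \lambda) = 0$ symbolically (the functions are the illustrative ones introduced earlier, not from the lecture):

```python
import sympy as sp

x, y, lam = sp.symbols("x y lambda", real=True)

# Illustrative example from above: f(x, y) = x + y, g(x, y) = x^2 + y^2 - 2
f = x + y
g = x ** 2 + y ** 2 - 2

# Lagrangian L(x, lambda) = f(x) + lambda * g(x)
L = f + lam * g

# Critical points of the Lagrangian: all partial derivatives vanish
solutions = sp.solve([sp.diff(L, v) for v in (x, y, lam)], [x, y, lam], dict=True)
print(solutions)  # [{x: -1, y: -1, lambda: 1/2}, {x: 1, y: 1, lambda: -1/2}]
```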

Sequential quadratic programming¶

  • By applying Newton's method to the non-linear system $$\nabla \mathcal{L}(x, \lambda) = \begin{bmatrix} \nabla f(x) + J_g(x)^T\lambda \\ g(x) \end{bmatrix} = 0 $$ we obtain, at each iteration, the linear system $$ \begin{bmatrix} B(x, \lambda) & J_g(x)^T\\ J_g(x) & 0 \end{bmatrix} \begin{bmatrix} s\\ \delta \end{bmatrix} = - \begin{bmatrix} \nabla f(x) + J_g(x)^T\lambda \\ g(x) \end{bmatrix} $$ whose solution gives the updates $x \leftarrow x + s$ and $\lambda \leftarrow \lambda + \delta$
  • (Constrained optimization) This approach is called sequential quadratic programming (SQP); a minimal numerical sketch is given below
  • (Unconstrained optimization) If the problem is unconstrained, the method reduces to Newton's method
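Here is a minimal numpy sketch of this iteration on an assumed toy problem (quadratic objective, one affine equality constraint; everything below is chosen for illustration):

```python
import numpy as np

# Hypothetical equality-constrained problem (illustration only):
#   min f(x) = x0^2 + x1^2   s.t.   g(x) = x0 + x1 - 1 = 0
grad_f = lambda x: 2 * x                        # gradient of f
H_f    = lambda x: 2 * np.eye(2)                # Hessian of f
g      = lambda x: np.array([x[0] + x[1] - 1])  # constraint values
J_g    = lambda x: np.array([[1.0, 1.0]])       # constraint Jacobian (g is affine)

x   = np.array([2.0, 2.0])
lam = np.zeros(1)
for _ in range(20):
    # B = H_f + sum_i lam_i * H_{g_i}; here H_{g_i} = 0 since g is affine
    B = H_f(x)
    J = J_g(x)
    # KKT system: [[B, J^T], [J, 0]] [s, delta] = -[grad_f + J^T lam, g]
    KKT = np.block([[B, J.T], [J, np.zeros((1, 1))]])
    rhs = -np.concatenate([grad_f(x) + J.T @ lam, g(x)])
    step = np.linalg.solve(KKT, rhs)
    x, lam = x + step[:2], lam + step[2:]
print(x, lam)  # expected: x approx [0.5, 0.5], lam approx [-1]
```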

Equality constraints to inequality constraints?¶

  • An optimization problem with an equality constraint can easily be rewritten with inequality constraints: $$ g(x) = 0 \iff (g(x) \geq 0 \quad\text{and}\quad g(x) \leq 0) $$

  • Consider an equality constrained problem: \begin{align*} \min \quad & f(x) \\ \text{subject to} \quad & g(x) = 0 \end{align*} It can be written as \begin{align*} \min \quad & f(x) \\ \text{subject to} \quad & g(x) \leq 0\\ \quad & -g(x) \leq 0\\ \end{align*}

  • For simplicity, in the following we only consider inequality constraints.

Inequality constraints and Lagrange duality¶

Lagrangian¶

  • An optimization problem with (inequality) constraint is written as: \begin{align*} \min \quad & f(x) \\ \text{subject to} \quad & g(x) \leq 0 \end{align*} with $f: \mathbb{R}^n \to \mathbb{R}$ and $g: \mathbb{R}^n \to \mathbb{R}^m$
  • The Lagrangian for this optimization problem is $$ \mathcal{L}(x, \lambda) = f(x) + \lambda^T g(x) = f(x) + \sum\limits_{i=1}^{m}\lambda_i g_i(x) $$ with $\lambda_i \geq 0$ the Lagrange multipliers.

Lagrange duality: definition¶

The (Lagrange) dual function associated with the constrained optimization problem is defined by

$$ F(\lambda) = \inf\limits_{x} \mathcal{L}(x, \lambda) = \inf\limits_{x}\left(f(x) + \lambda^T g(x)\right) $$

with $\lambda_i \geq 0$

  • We call the constrained optimization problem the primal problem:
\begin{align*} \min \quad & f(x) \\ \text{subject to} \quad & g(x) \leq 0 \end{align*}
  • and we call the following optimization problem the associated dual problem:
\begin{align*} \max \quad & F(\lambda) \\ \text{subject to} \quad & \lambda \geq 0 \end{align*}
  • Note that the dual problem is always convex: $F$ is a pointwise infimum of functions affine in $\lambda$, hence concave, and we maximize it subject to linear constraints. A worked example follows.
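A small worked example (chosen for illustration, not from the lecture): take $f(x) = x^2$ with the single constraint $g(x) = 1 - x \leq 0$. The inner infimum is attained at $x = \lambda/2$, so $$ F(\lambda) = \inf\limits_{x}\left(x^2 + \lambda(1 - x)\right) = \lambda - \frac{\lambda^2}{4}. $$ Maximizing over $\lambda \geq 0$ gives $\lambda^* = 2$ and $d^* = 1$, which matches the primal optimal value $p^* = 1$ attained at $x^* = 1$.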

Lagrange duality: properties¶

If we denote by $p^*$ the optimal value of the primal problem (a.k.a. the primal optimal value)

$$ p^* = \inf\limits_{x} \sup\limits_{\lambda\geq0} \mathcal{L}(x, \lambda) $$

and by $d^*$ the optimal value of the dual problem (a.k.a. the dual optimal value)

$$ d^* = \sup\limits_{\lambda\geq0} \inf\limits_{x} \mathcal{L}(x, \lambda) $$

then

  • (weak duality) this inequality always holds: $d^* \leq p^*$

  • (strong duality) the equality $d^* = p^*$ does not hold in general

  • Strong duality does hold when a convex problem satisfies certain constraint qualifications (defined below)

Remark: the Lagrange dual problem is often easier to solve (simpler constraints)! A numerical check of weak/strong duality on the toy example above is sketched below.
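A minimal numerical sketch, reusing the illustrative toy problem above ($f(x) = x^2$, $g(x) = 1 - x \leq 0$), that evaluates the dual function by an inner minimization and maximizes it by a coarse grid search:

```python
import numpy as np
from scipy.optimize import minimize_scalar

# Toy problem (illustration only): min x^2  subject to  1 - x <= 0
f = lambda x: x ** 2
g = lambda x: 1 - x

# Dual function F(lambda) = inf_x ( f(x) + lambda * g(x) ), computed numerically
def F(lam):
    return minimize_scalar(lambda x: f(x) + lam * g(x)).fun

# Dual problem: maximize F over lambda >= 0 (coarse grid search for simplicity)
lams = np.linspace(0.0, 5.0, 501)
vals = [F(lam) for lam in lams]
best = int(np.argmax(vals))
print(lams[best], vals[best])  # lambda* approx 2, d* approx 1 = p* (strong duality)
```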

Strong duality: Slater's condition¶

  • We call the problem \begin{align*} \min \quad & f(x) \\ \text{subject to} \quad & g(x) \leq 0 \end{align*} a convex optimization problem if $f$ and the $g_i$ are convex functions.
  • For a convex optimization problem, strong duality usually holds, but not always

  • Slater's condition (or Slater's constraint qualification): there exists an $x\in\mathbb{R}^n$ such that $g_i(x) < 0$ for all $i\in \{1, \dots, m\}$ (strict feasibility!)

  • Slater's condition is a sufficient condition for strong duality to hold for a convex optimization problem.

Strong duality: KKT conditions¶

Theorem (Karush-Kuhn-Tucker (KKT) conditions): Let us assume that the primal problem is convex and that Slater's constraint qualification holds. We have strong duality if and only if all the following conditions hold:

  1. (primal feasibility) there exists a primal optimal $x^*$
  2. (dual feasibility) there exists a dual optimal $\lambda^*$
  3. (complementary slackness): $\lambda^{*T}g(x^*) = 0$, or equivalently $\lambda_i^*g_i(x^*) = 0$ for all $i\in\{1, \dots, m\}$ (since $\lambda_i^* \geq 0$ and $g_i(x^*) \leq 0$, every term of the sum is $\leq 0$, so the sum vanishes iff each term does)
  4. (stationarity) $\nabla_x\mathcal{L}(x^*, \lambda^*) = 0$
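On the illustrative toy problem above ($f(x) = x^2$, $g(x) = 1 - x$), these conditions can be checked directly at $x^* = 1$, $\lambda^* = 2$: primal feasibility $g(x^*) = 0 \leq 0$; dual feasibility $\lambda^* = 2 \geq 0$; complementary slackness $\lambda^* g(x^*) = 2 \cdot 0 = 0$; stationarity $\nabla_x\mathcal{L}(x^*, \lambda^*) = 2x^* - \lambda^* = 2 - 2 = 0$.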