# Algorithm

Optimization Engine (OpEn) solves problems of the following form:

$$\operatorname*{Minimize}_{u \in U}\ f(u, p)\quad \text{subject to: } F_1(u, p) \in C,\ F_2(u, p) = 0$$

Our blanket assumptions are:

**A1.** Function $f({}\cdot{}, p):\mathbb{R}^{n_u}\to\mathbb{R}$ is continuously
differentiable in $u$ with $L$-Lipschitz gradient (which may depend on $p$)

**A2.** $U\subseteq \mathbb{R}^{n_u}$ is a nonempty set on which we can compute
projections (e.g., boxes, balls, discrete sets, etc.); this set may
be nonconvex

**A3.** Mapping $F_1({}\cdot{}, p):\mathbb{R}^{n_u}\to\mathbb{R}^{n_1}$ has partial derivatives
and $C$ is a closed, convex set to which we may compute distances, that is,

$$\operatorname{dist}_C(z) = \inf_{c \in C} \Vert z - c \Vert$$

**A4.** Mapping $F_2({}\cdot{}, p):\mathbb{R}^{n_u}\to\mathbb{R}^{n_2}$ has partial derivatives

## Simple constraints

Let us first focus on problems of the following simpler form (we have dropped the dependence on $p$ for the sake of brevity)

$$\operatorname*{Minimize}_{u \in U}\ f(u)$$

### Projected gradient method

Recall that the projected gradient method applied on this problem yields the iterations

$$u^{\nu+1} = \Pi_U\left(u^\nu - \gamma \nabla f(u^\nu)\right)$$

starting from an initial guess $u^0$.

Let us define $T_\gamma(u) = \Pi_U (u - \gamma \nabla f(u))$, so the projected gradient method can be written as $u^{\nu+1} = T_\gamma(u^\nu)$. Its fixed points $u^\star$ satisfy

$$u^\star = T_\gamma(u^\star)$$

We call such points $u^\star$ the $\gamma$-**critical points** of the problem.

If $\gamma < 2/L$, then the projected gradient method is guaranteed to converge in the sense that all accumulation points of the projected gradient method are fixed points of $T_\gamma$.

The projected gradient method involves only simple steps, which makes it easy to program and suitable for embedded applications, but its convergence is at most Q-linear, often slow, and sensitive to ill-conditioning.
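To make the iteration concrete, here is a minimal Python sketch (an assumed toy example, not OpEn's implementation) of the projected gradient method on $f(u) = \tfrac{1}{2}\Vert u - b\Vert^2$ over the box $U = [0, 1]^2$; here $L = 1$, so $\gamma = 0.5 < 2/L$ guarantees convergence:

```python
# Assumed toy example: f(u) = 0.5*||u - b||^2 with b = (2, -1), U = [0, 1]^2.
def grad_f(u, b=(2.0, -1.0)):
    return [ui - bi for ui, bi in zip(u, b)]

def project_box(u, lo=0.0, hi=1.0):
    # projection onto the box U (clipping, coordinate-wise)
    return [min(max(ui, lo), hi) for ui in u]

def projected_gradient(u0, gamma=0.5, iters=100):
    u = list(u0)
    for _ in range(iters):
        g = grad_f(u)
        u = project_box([ui - gamma * gi for ui, gi in zip(u, g)])
    return u

u_star = projected_gradient([0.5, 0.5])
# converges to the projection of b onto the box: [1.0, 0.0]
```

For this strongly convex $f$, the $\gamma$-critical point is unique and equals $\Pi_U(b)$.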

### Speeding things up

In order to achieve superior convergence rates we proceed as follows:
a $\gamma$-critical point of the above problem is a *zero* of the residual operator

$$R_\gamma(u) = \tfrac{1}{\gamma}\left(u - T_\gamma(u)\right)$$

We may, therefore, use a Newton-type method of the form

$$u^{\nu+1} = u^\nu + d^\nu$$

where $d^\nu$ is computed as

$$d^\nu = -H_\nu R_\gamma(u^\nu)$$

where $H_\nu$ are invertible linear operators that satisfy the secant condition $s^\nu = H_\nu y^\nu$ with $s^\nu = u^{\nu+1}-u^\nu$ and $y^\nu = R_\gamma(u^{\nu+1})-R_\gamma(u^{\nu})$.

We could then use fast L-BFGS quasi-Newton directions, which do not require matrix operations (only vector operations) and have low memory requirements. However, convergence is only local, in a neighbourhood of a critical point $u^\star$.
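To see what the secant condition buys us, here is a small Python sketch (an assumed example; OpEn's solver uses the limited-memory variant, L-BFGS, which applies such updates implicitly via a two-loop recursion without ever forming $H_\nu$): the BFGS update of an inverse operator satisfies $H_{\text{new}} y = s$ by construction.

```python
# Assumed example: the BFGS inverse update
#   H_new = (I - rho*s*y^T) H (I - rho*y*s^T) + rho*s*s^T,  rho = 1/(y^T s),
# satisfies the secant condition H_new @ y = s by construction.
def bfgs_update(H, s, y):
    n = len(s)
    rho = 1.0 / sum(si * yi for si, yi in zip(s, y))
    # A = I - rho * s y^T
    A = [[(1.0 if i == j else 0.0) - rho * s[i] * y[j] for j in range(n)]
         for i in range(n)]
    # H_new = A H A^T + rho * s s^T
    AH = [[sum(A[i][k] * H[k][j] for k in range(n)) for j in range(n)]
          for i in range(n)]
    return [[sum(AH[i][k] * A[j][k] for k in range(n)) + rho * s[i] * s[j]
             for j in range(n)] for i in range(n)]

s, y = [1.0, 2.0], [0.5, 1.0]
H = bfgs_update([[1.0, 0.0], [0.0, 1.0]], s, y)
Hy = [sum(H[i][k] * y[k] for k in range(2)) for i in range(2)]
# Hy equals s = [1.0, 2.0] (up to rounding), verifying the secant condition
```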

Our objective is to employ a globalisation technique using an appropriate exact merit function.

### Forward-Backward Envelope

The forward-backward envelope (FBE) is a function $\varphi_\gamma:\mathbb{R}^{n_u}\to\mathbb{R}$ with the following properties:

- It is real-valued
- It is continuous
- For $0 < \gamma < \tfrac{1}{L}$, it has the same local/strong minima as the original problem

Therefore, the original constrained optimisation problem is equivalent to the unconstrained minimisation problem of function $\varphi_\gamma$.

Additionally, if $f\in C^2$, then the FBE is $C^1$ with gradient $\nabla \varphi_\gamma(u) = (I-\gamma \nabla^2 f(u))R_\gamma(u)$.

The FBE is constructed as follows: function $f$ can be approximated at a point $u \in U$ by the (local) quadratic upper bound

$$Q_\gamma(w; u) = f(u) + \nabla f(u)^\top (w - u) + \frac{1}{2\gamma}\Vert w - u \Vert^2$$

Then, the FBE is defined as

$$\varphi_\gamma(u) = \min_{w \in U} Q_\gamma(w; u)$$

This construction is illustrated in the figure below

Provided that it is easy to compute the distance to $U$, the FBE can be easily computed by

$$\varphi_\gamma(u) = f(u) - \frac{\gamma}{2}\Vert \nabla f(u) \Vert^2 + \frac{1}{2\gamma}\operatorname{dist}_U^2\left(u - \gamma \nabla f(u)\right)$$
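As a sanity check, here is a minimal Python sketch (an assumed toy example, not OpEn code) evaluating the FBE directly for $f(u) = \tfrac{1}{2}\Vert u\Vert^2$ over the box $U = [1, 2]^2$ with $\gamma = 0.5 < 1/L$ (here $L = 1$): the FBE never exceeds $f$ on $U$ and coincides with $f$ at the constrained minimiser $(1, 1)$.

```python
# Assumed toy example: f(u) = 0.5*||u||^2, U = [1, 2]^2, gamma = 0.5 < 1/L (L = 1).
def f(u):
    return 0.5 * sum(ui * ui for ui in u)

def grad_f(u):
    return list(u)  # gradient of 0.5*||u||^2 is u itself

def project_box(u, lo=1.0, hi=2.0):
    return [min(max(ui, lo), hi) for ui in u]

def fbe(u, gamma=0.5):
    g = grad_f(u)
    v = [ui - gamma * gi for ui, gi in zip(u, g)]     # forward (gradient) step
    pv = project_box(v)                               # backward (projection) step
    dist_sq = sum((vi - pi) ** 2 for vi, pi in zip(v, pv))
    return f(u) - gamma / 2 * sum(gi * gi for gi in g) + dist_sq / (2 * gamma)

# fbe([1.5, 1.5]) = 1.25 <= f([1.5, 1.5]) = 2.25, while at the constrained
# minimiser (1, 1) the two coincide: fbe([1.0, 1.0]) = f([1.0, 1.0]) = 1.0
```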

### PANOC

The proximal averaged Newton-type method (PANOC) takes averaged iterates that combine
*safe* projected gradient updates and *fast* quasi-Newton directions, that is,

$$u^{\nu+1} = u^\nu - (1-\tau_\nu)\gamma R_\gamma(u^\nu) + \tau_\nu d^\nu$$

and $\tau_\nu \in (0, 1]$ is chosen so that

$$\varphi_\gamma(u^{\nu+1}) \leq \varphi_\gamma(u^\nu) - \sigma \Vert R_\gamma(u^\nu) \Vert^2$$

for an appropriately chosen positive constant $\sigma$. You may find more information about the algorithm in this paper.

Some facts about PANOC:

- It uses the same oracle as the projected gradient method
- Eventually only fast updates are activated: $u^{\nu+1} = u^\nu + d^\nu$
- The sequence of residuals $r^\nu = \Vert R_\gamma(u^\nu) \Vert$ converges to $0$ square-summably
- The cluster points of $u^\nu$ and $\bar{u}^\nu$ are $\gamma$-critical points
- No need to solve any linear systems
- Converges globally
- and it is very fast in practice
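The averaging and the FBE line search can be sketched in a few lines of Python. This is an assumed, heavily simplified toy (not OpEn's implementation): the "fast" direction `d` is just a scaled residual standing in for a genuine L-BFGS direction, and the problem is the box-constrained quadratic $f(u) = \tfrac{1}{2}\Vert u - b\Vert^2$ on $U = [0,1]^2$.

```python
# Assumed, simplified PANOC sketch on f(u) = 0.5*||u - b||^2, U = [0, 1]^2.
B = (2.0, -1.0)
GAMMA, SIGMA = 0.5, 0.1  # L = 1; sigma below gamma*(1 - gamma*L)/2 = 0.125

def proj(u):
    return [min(max(ui, 0.0), 1.0) for ui in u]

def grad_f(u):
    return [ui - bi for ui, bi in zip(u, B)]

def resid(u):  # fixed-point residual R_gamma(u) = (u - T_gamma(u)) / gamma
    t = proj([ui - GAMMA * gi for ui, gi in zip(u, grad_f(u))])
    return [(ui - ti) / GAMMA for ui, ti in zip(u, t)]

def fbe(u):  # phi_gamma(u) via its direct formula
    g = grad_f(u)
    v = [ui - GAMMA * gi for ui, gi in zip(u, g)]
    d2 = sum((vi - pi) ** 2 for vi, pi in zip(v, proj(v)))
    fu = 0.5 * sum((ui - bi) ** 2 for ui, bi in zip(u, B))
    return fu - GAMMA / 2 * sum(gi * gi for gi in g) + d2 / (2 * GAMMA)

def panoc_step(u):
    r = resid(u)
    d = [-1.5 * GAMMA * ri for ri in r]  # placeholder "fast" direction (not L-BFGS)
    bound = fbe(u) - SIGMA * sum(ri * ri for ri in r)
    tau = 1.0
    while tau > 1e-12:
        u_new = [ui - (1 - tau) * GAMMA * ri + tau * di
                 for ui, ri, di in zip(u, r, d)]
        if fbe(u_new) <= bound:  # sufficient decrease of the FBE
            return u_new
        tau /= 2.0               # back-track towards the safe PG step
    return [ui - GAMMA * ri for ui, ri in zip(u, r)]  # safe step T_gamma(u)

u = [0.5, 0.5]
for _ in range(60):
    u = panoc_step(u)
# u approaches the gamma-critical point (1.0, 0.0)
```

Note that as `tau` shrinks to zero the update falls back to the safe projected gradient step, which is what makes the line search always terminate.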

## General constraints

### Penalty Method

Let us now move on to problems with more general constraints of the form

$$\operatorname*{Minimize}_{u \in U}\ f(u)\quad \text{subject to: } F_2(u) = 0$$

Equality constraints of the form $F_2(u) = 0$ can be used to accommodate inequality ones of the form

$$h(u) \leq 0$$

by taking $F_2(u) = \max\{0, h(u)\}$. Similarly, constraints of the form

$$\Phi(u) \in A$$

where $\Phi:\mathbb{R}^{n_u}\to\mathbb{R}^{n_2}$ and $A\subseteq\mathbb{R}^{n_2}$, can be encoded by $F_2(u) = \operatorname{dist}_A(\Phi(u))$.
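These two reformulations can be sketched in a few lines of Python (an assumed toy example with a one-dimensional constraint and $A = [-1, 1]$):

```python
# Assumed toy example of the two reformulations above.
def f2_from_inequality(h_val):
    # h(u) <= 0  is encoded as  F2(u) = max{0, h(u)} = 0
    return max(0.0, h_val)

def f2_from_set_membership(phi_val, lo=-1.0, hi=1.0):
    # Phi(u) in A = [lo, hi]  is encoded as  F2(u) = dist_A(Phi(u)) = 0
    return max(lo - phi_val, 0.0, phi_val - hi)

# f2_from_inequality(-3.0) == 0.0 (feasible), f2_from_inequality(2.0) == 2.0
# f2_from_set_membership(0.5) == 0.0 (inside A), f2_from_set_membership(2.0) == 1.0
```

In both cases $F_2$ vanishes exactly on the feasible set, so driving $F_2$ to zero enforces the original constraint.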

Although PANOC cannot solve the above problem directly, it can solve the following relaxed problem

$$\mathbb{P}(\lambda) {}:{} \operatorname*{Minimize}_{u \in U}\ f(u) + \lambda \Vert F_2(u) \Vert^2$$

Starting from an initial $\lambda_0 > 1$, we solve $\mathbb{P}(\lambda_\nu)$,
we update $\lambda_{\nu+1} = \rho \lambda_{\nu}$, for some $\rho > 1$, and we then solve
$\mathbb{P}(\lambda_{\nu+1})$ using the previous optimal solution, $u_\nu^\star$, as an
initial guess. We call $\rho$ the **penalty update coefficient**.

This procedure stops once $\|F_2(u^\nu)\|_\infty {}<{} \delta$ for some tolerance $\delta > 0$. Tolerance $\delta$ controls the infeasibility of the solution.

Typically, few such outer iterations are required to achieve a reasonable satisfaction of the constraints, while the inner problems are solved very fast using PANOC.
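The outer loop can be sketched as follows. This is an assumed toy example: we minimise $f(u) = (u-3)^2$ subject to $u \leq 1$, encoded as $F_2(u) = \max\{0, u-1\} = 0$, and the inner minimisation of the penalised cost $f(u) + \lambda F_2(u)^2$ has a closed-form solution here (in OpEn it would be solved by PANOC, warm-started at the previous solution).

```python
# Assumed toy example of the penalty-method outer loop.
def solve_inner(lam):
    # minimizer of (u - 3)^2 + lam * max(0, u - 1)^2; it lies in u > 1, where
    # stationarity 2*(u - 3) + 2*lam*(u - 1) = 0 gives:
    return (3.0 + lam) / (1.0 + lam)

def penalty_method(lam0=2.0, rho=10.0, delta=1e-4, max_outer=20):
    lam = lam0
    u = solve_inner(lam)
    for _ in range(max_outer):
        if max(0.0, u - 1.0) < delta:  # infeasibility ||F2(u)||_inf < delta
            return u, lam
        lam *= rho                     # penalty update coefficient rho > 1
        u = solve_inner(lam)           # warm-started in practice
    return u, lam

u_star, lam_final = penalty_method()
# u_star lies within delta = 1e-4 of the constrained minimizer u = 1
```

Note how the infeasibility $\max\{0, u-1\} = 2/(1+\lambda)$ shrinks as $\lambda$ grows, which is exactly why increasing the penalty drives the iterates towards feasibility.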

### Augmented Lagrangian Method

Similar to the penalty method, the augmented Lagrangian method (ALM) solves
constrained optimization problems by performing an "outer" iterative procedure,
at every iteration of which we need to solve an "inner" optimization problem.
ALM updates both a penalty parameter $c$
as well as a *vector* $y$, known as the vector of Lagrange multipliers.

The inner optimization problems have the following form

$$\mathbb{P}(p, c, y) {}:{} \operatorname*{Minimize}_{u \in U}\ \psi(u, \xi)$$

where $\xi = (c, y)$ and

$$\psi(u, \xi) = f(u) + \frac{c}{2}\left[\operatorname{dist}_C^2\left(F_1(u) + \frac{y}{c}\right) - \left\Vert \frac{y}{c} \right\Vert^2\right] + \frac{c}{2}\Vert F_2(u)\Vert^2$$

This problem can be solved using PANOC.

At every outer iteration, $\nu$, the vector of Lagrange multipliers is updated according to

$$y^{\nu+1} = y^\nu + c_\nu\left(F_1(u^\nu) - \Pi_C\left(F_1(u^\nu) + \frac{y^\nu}{c_\nu}\right)\right)$$
where $u^{\nu}$ is an (approximate) solution of the inner problem, $\mathbb{P}(p, c_{\nu}, y^{\nu})$, which is obtained using PANOC.

**Inputs:** $\epsilon, \delta > 0$ (tolerances), $\nu_{\max}$ (maximum number of iterations), $c_0 > 0$ (initial penalty parameter), $\epsilon_0 > \epsilon$ (initial tolerance), $\rho > 1$ (update factor for the penalty parameter), $\beta \in (0, 1)$ (decrease factor for the inner tolerance), $\theta \in (0, 1)$ (sufficient decrease coefficient), $u^0 \in \mathbb{R}^{n_u}$ (initial guess), $y^0 \in \mathbb{R}^{n_1}$ (initial guess for the Lagrange multipliers), $Y \subseteq C^*$ (compact set)

**Procedure:** The ALM/PM algorithm performs the following iterations:

- $\bar{\epsilon} \gets \epsilon_0$, $y \gets y^0$, $u \gets u^0$, $c \gets c_0$, $t, z \gets 0$
- For $\nu = 0, \ldots, \nu_{\max}$:
  - $y \gets \Pi_Y(y)$
  - $u \gets \arg\min_{u\in U} \psi(u, \xi)$, where $\psi(u, \xi)$ is the inner cost function defined above; this problem is solved with tolerance $\bar\epsilon$ (using PANOC)
  - $y^+ \gets y + c(F_1(u) - \Pi_C(F_1(u) + y/c))$
  - Set $z^+ \gets \Vert y^+ - y \Vert$ and $t^+ \gets \Vert F_2(u) \Vert$
  - If $z^+ \leq c\delta$, $t^+ \leq \delta$ and $\bar\epsilon \leq \epsilon$, return $(u, y^+)$
  - Else, if not ($\nu = 0$ or ($z^+ \leq \theta z$ and $t^+ \leq \theta t$)), update $c \gets \rho{}c$
  - $y \gets y^+$, $z \gets z^+$, $t \gets t^+$
  - $\bar\epsilon \gets \max\{\epsilon, \beta\bar{\epsilon}\}$
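The interplay between the inner solve and the multiplier update can be seen on a tiny assumed example: minimise $f(u) = (u-3)^2$ subject to $F_1(u) = u - 1 \in C = \{0\}$. The inner PANOC solve is replaced here by the closed-form minimiser of the augmented Lagrangian, and the penalty parameter is kept fixed for simplicity.

```python
# Assumed toy example of the ALM outer loop (fixed penalty c, closed-form
# inner solve standing in for PANOC):
#   minimize (u - 3)^2  subject to  F1(u) = u - 1 in C = {0}.
def alm(c=10.0, y=0.0, iters=30):
    u = 0.0
    for _ in range(iters):
        # inner solve: stationarity of (u - 3)^2 + (c/2)*(u - 1 + y/c)^2,
        # i.e. 2*(u - 3) + c*(u - 1 + y/c) = 0
        u = (6.0 + c - y) / (2.0 + c)
        # multiplier update: y <- y + c*(F1(u) - proj_C(F1(u) + y/c)); proj_C = 0
        y = y + c * (u - 1.0)
    return u, y

u_star, y_star = alm()
# converges to the KKT pair u* = 1 (feasible), y* = 4 (since 2*(u* - 3) + y* = 0)
```

Unlike the plain penalty method, the multiplier $y$ lets ALM reach the exact constrained solution without driving $c$ to infinity.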