Brian Borchers Blog: 2011

My sabbatical leave last fall at the Institute for Pure and Applied Mathematics (IPAM) at UCLA was a wonderful opportunity to get up to speed on some recent developments in convex optimization and to reassess where the field is going. I recently gave a talk to our math department seminar with the somewhat provacative title of Convex Optimization 2.0.

This posting is a brief summary of what I talked about, along with some links to useful resources related to the talk.

I'll start with a quote from R. Tyrrell Rockafellar in 1993.

In fact, the great watershed in optimization isn't between linearity and nonlinearity, but convexity and nonconvexity.

R. Tyrrell Rockafellar (1993)

At the time, this was a somewhat surprising statement. Through the 60's, 70's, and into the 80's, most people working in the field thought of mathematical optimization being divided into two parts- linear programming and nonlinear programming. The central mathematical ideas of linear programming were combinatorial in nature, while the mathematics of nonlinear programming was all about analysis of smooth functions and applications of Newton's method.

What's amazing to me about this quote is that today, most of the people that I know who are working in optimization would agree with Rockafellar. The point of my talk and this blog post is to tell the story of how we got to this point.

George Dantzig's simplex method was the preferred method for solving linear programming problems from the 1950's through the mid 1980's. An enormous amount of research and software development effort (and lots of CPU cycles) were invested in the simplex method.

Methods for nonlinear programming are often divided into first order methods that use only first derivative information and second order methods that use second derivative information to obtain faster convergence. Of course second order methods also require storage that grows quadratically with the size of the problem. Thus first order methods are often preferred for very large scale problems, while second order methods are more commonly used for small to medium sized problems. There are also methods such as the Limited Memory BFGS Method that try to fit into the space between first and second order methods to get faster convergence than the first order methods without the memory requirements of second order methods.

When primal-dual interior point methods for linear programming appeared on the scene in the mid 1980's, it was surprising that ideas from nonlinear programming could actually be useful in linear programming. In less than a decade, these new interior point methods evolved to the point that they were often competitive with the classical simplex method, particularly on very large and degenerate linear programming problems. Interior point methods were also being applied to nonlinear programming problems, further blurring the distinction between linear and nonlinear programming.

Convex optimization problems are problems in which the feasible region is a convex set and the objective function to be minimized is a convex function. Some researchers, including Rockafellar, had worked on methods for solving nonlinear optimization problems that were convex but not necessarily smooth. The huge advantage of this class of problems is that any local minimum of a convex optimization problem is also a global minimum. On the other hand, without smoothness it is impossible to directly apply methods like Newton's method that depend on the differentiability of the objective and constraint functions. Methods for nonsmooth convex optimization (or "nondifferentiable optimization" as it was sometimes called) were quite limited.

After the development of interior point methods for linear programming, it was quickly discovered that the primal-dual interior point method for LP could also be extended to handle other kinds of convex optimization problems. For example, interior point methods can be used to solve convex quadratically constrained programming problems (CQP).

Of particular interest were second order cone programming problems which involve constraints of the form

$
\|Ax-b \|_{2} \leq c^{T}x+d
$

and semidefinite programming problems in which the variable $X$ is a symmetric matrix that is constrained to be positive semidefinite.

Looking at this situation from the point of view of our ability to solve various kinds of convex optimization problems, we now had a heirarchy

$
LP \subset CQP \subset SOCP \subset SDP \subset CP.
$

with primal-dual interior point methods able to efficiently and reliably solve problems up to the level of SDP.

Although SOCP and SDP might seem to be extremely weird problem classes, it turned out that in practice, many convex optimization problems could be reformulated into SOCP or SDP form. This led to a paradigm that I'll call Convex Optimization 1.0. Start with a convex optimization problem, reformulate as an LP, CQP, SOCP, or SDP, and then use a primal--dual interior point method to solve the problem. This approach is beautifully laid out in the 2003 textbook, Convex Optimzation, by Stephen Boyd and Lieven Vandenberghe.

In fact, the reformulation techniques used to put a problem into standard form are so simple that they can actually be built into software. For example, Michael Grant and Stephen Boyd developed a MATLAB package called CVX that automatically reformulates convex optimization problems as LP's, SOCP's, or SDP's, and then solves them using a general purpose LP/SOCP/SDP solver routine. Some other important software packages that also implement this approach include YALMIP, ROME for robust optimization, and GLOPTIPOLY and SOSTOOLS for polynomial optimization problems.

As interest in convex optimization grew, one important problem with primal-dual interior point methods became apparent. The primal-dual interior point method is a second order method, with storage requirements that grow as the square of the problem size. Users of interior point codes were often disappointed that the problems they wanted to solve simply couldn't be solved because of this storage requirement. The need for more storage efficient methods was particulary important in the burgeoning field of compressive sensing, where researchers wanted to solve problems involving millions of variables.

This led to a revival of interest in first order methods for convex optimization problems. Although these methods had originally been developed in the 1970's, this was an area of research that had been basically dead for a long time.

The classic algorithm for smooth convex minimization problems is the projected gradient descent method. Given a convex function $f(x)$, and a convex feasible set $C$, and a function for computing the projection of a point onto the nearest point in $C$, the projected gradient descent method simply alternates between moving downhill and projecting back onto the set $C$. If the problem is unconstrained, the $P_C$ is trivial. The projected gradient descent method follows the iteration:

$
x^{k+1}=P_{C}(x^{k}-\alpha_{k}\nabla f(x^{k}))
$

The step lengths $\alpha_{k}$ can be chosen in many different ways to insure convergence. One simple rule is $\alpha_{k}=1/k$.

For nonsmooth convex optimization problems, the projected gradient descent method can be extended by using subgradients instead of the gradient vector. A vector $g^{k}$ is a subgradient of $f$ at the point $x_{k}$, if

$
f(x) \geq f(x^{k})+g^{T}(x-x^{k})\;\; \mbox{for all} \; x.
$

The convergence rate of these methods is sublinear. However, it can be useful to analyze how many iterations are required to obtain a solution $x^{k}$ with $f(x^{k})-f(x^{*})<\epsilon$, where $x^{*}$ is an optimal solution. The classical (projected) (sub)gradient descent method takes $O(1/\epsilon)$ iterations on smooth problems and $O(1/\epsilon^{2})$ iterations on nonsmooth problems.

Nesterov (1983) gave an optimal first order method for smooth convex optimization problems with Lipschitz continuous gradients. This accelerated first order method uses previous gradients in a recursive formula for computing the next step. This effectively provides the algorithm with some information about the curvature of the function. This algorithm requires $O(\sqrt{L/\epsilon})$ iterations.

In 2005, Nesterov returned to this earlier algorithm and showed how a nonsmooth convex optimization problem could be approximated by a smooth convex optimization problem with Lipschitz continuous gradient. Nesterov's smoothed problem had $L=O(1/\sqrt{\epsilon})$, so the accelerated version of gradient descent on the smoothed problem took $O(1/\epsilon)$ iterations. Since 2005 there has been an explosion of interest in methods for various convex optimization problems that use this idea of accelerated gradient descent on a smoothed version of a nonsmooth convex optimization problem.

This approach has been particularly successful in the field of compressive sensing. Here the goal is to find a sparse (few nonzero entries) solution to an underdetermined linear system of equations $Ax=b$. Minimizing the number of nonzero entries in the solution is actually an NP-Hard combinatorial optimization problem. However, it turns out (and this is another story worthy of its own blog posting) that this can be done very effectively in practice by solving a basis pursuit denoising problem .

$
\min \| x \|_{1}
$
subject to
$
\| Ax - b \|_{2} \leq \sigma .
$

Although these problems can be solved in the Convex Optimization 1.0 paradigm by a general purpose SDP solver, in practice, specialized first order methods are vastly more efficient. Stephen Becker has produced a very useful collection of resources on software for BPDN and related problems.

Now we have the beginings of a new paradigm that I'll refer to as Convex Optimization 2.0. Use an accelerated first order method with smoothing to solve your convex optimization problem.

So far, most of the work on first order methods has been specialized to particular problems. An interesting recent development is the release of Templates for First Order Conic Solvers (TFOCS) by Stephen Becker, Emmanuel J. Candès and Michael Grant. TFOCS deals with these problems by formulating them in conic form (using for example SOCP constraints), dualizing, and then applying a first order method to the dual problem.

Although first order methods have worked out very well in many applications, we haven't yet seen these methods develop to the point that a general purpose first order method for LP, SOCP, and SDP can take the place of a primal-dual interior point method for general problems. The semidefinite programming case seems to be particularly hard. However, I'm hopeful that we may soon see a resurgence in interest in convex optimization as first order methods become more broadly applicable.

Now that I've returned to Socorro, it's a good time to sit down and reflect on my recent sabbatical leave at the Institute for Pure and Applied Mathematics (IPAM) at UCLA.

During this visit to IPAM I was a core participant in the Fall 2010 long program on Modern Trends In Optimization And Its Application. This meant that I was in residence at IPAM for 3 months and got to attend a series of week long workshops on various optimization topics. The other core participants were mainly graduate students and postdocs from UCLA, Cal Tech, Stanford, MIT, Rice, the University of Washington, and other places in the US, as well as international visitors from Belgium, France, Germany, Israel, Singapore, and the UK. In addition to the younger folks, there were a few of us more senior folks. I really enjoyed my time with this group of highly motivated and energetic optimization researchers.

The title of the long program was deliberately broad, but as you'll see in a moment, nearly all of the research presented in the various workshops related to convex optimization and applications of convex optimization.

The workshops included:

Optimization Tutorials. This first workshop consisted entirely of tutorial lectures introducing topics that appeared in later workshops.

Workshop I: Convex Optimization and Algebraic Geometry. This workshop focused on applications of convex optimization to polynomial optimization problems.

Workshop II: Numerical Methods for Continuous Optimization. This workshop focused primarily on first methods for convex optimization problems and particularly methods for problems involving nonsmooth convex functions that appear frequently in image processing and compressive sensing.

Workshop III: Discrete Optimization. Discrete optimization problems are inherently nonconvex. However, an important theme of this workshop was convex relaxations of discrete optimization problems, particularly SDP relaxations. There were also a couple of interesting talks related to graph sparsification.

Workshop IV: Robust Optimization. In robust optimization, we consider an optimization problem with problem data that are uncertain within a specified range or uncertainty set and formulate a robust counterpart to the original problem whose solution will be optimal regardless of where the problem data are within this uncertainty set. In most cases of interest the resulting reformulated problems are second order cone programs or semidefinite programming problems.

Workshop V: Applications of Optimization in Science and Engineering. This workshop focused mainly on applications of convex optimization in engineering design optimization and image processing.

Culminating Workshop at Lake Arrowhead. The culminating workshop at Lake Arrowhead was a chance for participants to present results of projects that they'd been working on during the long program. This relaxing week at UCLA's Lake Arrowhead conference center was a wonderful way to end the long program. By this time the group had bonded together, and this was our last chance to enjoy some social time together.

Overall, I would say that the workshops at IPAM give a very good overview of what's going on today in the area of convex optimization. Slides for many of the talks are available from the workshop web pages. In other cases you may have to contact the authors to get copies of the slides or related papers. I think that it would have been very helpful (particularly for the tutorial workshop) if IPAM had used a lecture capture system to capture talks so that others could see them.

In my next couple of postings I'll talk a bit more about some of the most important things that I learned during the long program.

Brian Borchers Blog

Sunday, April 24, 2011

Convex Optimization 2.0

Sunday, February 13, 2011

LaTeX in blogger postings

Sunday, January 2, 2011

How I Spent My Sabbatical Leave

About Me