|
I've been trying to maximize a family of complicated goal functions of ~50 variables. The function is somewhat expensive to calculate as it involves eigenvalue decomposition; 10^4 evaluations can be done in a second, and the complete maximization must be done on the order of seconds. The gradient is not available. The solution is known to lie in a hypercube centered on the origin.
The most promising methods I've looked at are the Efficient Global Optimization (EGO) family of algorithms, which interpolate the goal function from a few samples and solve another, cheaper optimization problem in each iteration to decide where to sample next (a rough sketch of this loop is below). I've only read the original paper by Jones et al., but these methods seem to be aimed at much more expensive goal functions than mine -- the subproblem that must be solved in each iteration seems to take on the order of a second. Here is one instantiation of the goal function in two dimensions:
The 50-dimensional version retains many of these properties (although I can't visualize it): there are hyperplanes with higher function values that intersect close to the maximum, there are many local optima, and the function is "smooth" in the intuitive sense. Have I overlooked some family of algorithms? What algorithm would fit such a problem? |
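For concreteness, the loop I understand these methods to use looks roughly like this: fit a surrogate to the samples seen so far, then pick the next sample by maximizing an acquisition function on the model. This is only a sketch -- the goal function below is a made-up stand-in for my real one, and I just score random candidates instead of properly solving the inner optimization problem:

    import numpy as np
    from scipy.stats import norm
    from sklearn.gaussian_process import GaussianProcessRegressor

    def goal(x):
        # made-up stand-in for the real eigenvalue-based goal function
        return -np.dot(x, x) + np.sum(np.cos(3.0 * x))

    rng = np.random.default_rng(0)
    dim, box = 50, 1.0
    X = rng.uniform(-box, box, (20, dim))             # initial samples
    y = np.array([goal(x) for x in X])

    for _ in range(100):
        gp = GaussianProcessRegressor(normalize_y=True).fit(X, y)
        cand = rng.uniform(-box, box, (2000, dim))    # crude inner search: random candidates
        mu, sigma = gp.predict(cand, return_std=True)
        z = (mu - y.max()) / (sigma + 1e-12)
        ei = (mu - y.max()) * norm.cdf(z) + sigma * norm.pdf(z)   # expected improvement
        x_next = cand[np.argmax(ei)]                  # sample where the model looks most promising
        X, y = np.vstack([X, x_next]), np.append(y, goal(x_next))

    print(X[np.argmax(y)], y.max())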
|
One thing you haven't mentioned is how good the computed optimum must be. If the solution doesn't need to be extremely precise, you could give Simulated Annealing a try. You've mentioned that: (1) no gradient is available, (2) the function has many local optima, (3) the solution is known to lie in a hypercube centered on the origin, and (4) the complete maximization must be done on the order of seconds.
Based on (1)-(3), a search method like SA could work: it requires no gradients, it draws from a generation distribution you can choose to match your search space (i.e. your hypercube), and as time -> infinity you get some guarantees about convergence to an optimum value. However, SA is not known to be very fast, so (4) could be a problem. This really depends on (a) how you schedule the annealing process and (b) how good your solutions need to be. If you can use some heuristics to restrict your search space, and tune the annealing schedule to trade off time for accuracy, then you might be able to make it work. Perhaps take a look at Adaptive Simulated Annealing. It's hard to say more without seeing the function(s).

Note: an SA method might work, but given that your particle swarm optimization was too slow, I wouldn't be too optimistic that SA will be faster. There are numerous hybrid approaches that pair SA with, for example, Nelder-Mead's simplex algorithm (thus addressing the local-optima problem). That might be an avenue worth exploring.

Finally, in case you haven't seen it already, Luis Rios (CMU) has a paper that does a quantitative comparison of many derivative-free methods.

Anecdotal advice: in situations like yours, I have found that my effort is better spent finding ways to improve gradient approximations and/or reduce dimensionality than searching for fancier algorithms that can deal with my extremely hard problem. |
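To make that concrete, here is a bare-bones SA sketch for maximization inside a hypercube, with a Gaussian generation distribution and a geometric cooling schedule. The goal function is just a made-up stand-in, and the step size, temperature, and schedule constants are placeholders you would have to tune:

    import numpy as np

    def goal(x):
        # made-up stand-in for the real eigenvalue-based objective
        return -np.dot(x, x) + np.sum(np.cos(3.0 * x))

    def anneal(dim=50, box=1.0, n_iter=20000, t0=1.0, cooling=0.9995, step=0.1, seed=0):
        rng = np.random.default_rng(seed)
        x = rng.uniform(-box, box, dim)               # start somewhere in the hypercube
        fx = goal(x)
        best_x, best_f, t = x.copy(), fx, t0
        for _ in range(n_iter):
            cand = np.clip(x + rng.normal(0.0, step, dim), -box, box)  # generation distribution
            fc = goal(cand)
            # Metropolis acceptance: always accept improvements, sometimes accept worse moves
            if fc > fx or rng.random() < np.exp((fc - fx) / t):
                x, fx = cand, fc
                if fx > best_f:
                    best_x, best_f = x.copy(), fx
            t *= cooling                              # geometric annealing schedule
        return best_x, best_f

    best_x, best_f = anneal()

The generation step and the cooling schedule are exactly the knobs mentioned above for trading time against solution quality.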
|
You should use Bayesian optimization. A lot of Frank Hutter's work in this area is relevant to your problem, since you have a high budget of trials (you can do many trials per second). See his website: http://www.cs.ubc.ca/~hutter/ Unfortunately, I don't know the exact best paper for you to look at, so you might have to read and skim a bunch of them. For example, take a look at http://www.cs.ubc.ca/~hutter/papers/11-LION5-SMAC.pdf and http://www.cs.ubc.ca/~hutter/papers/11-IJCAI-RCRA-HydraMIP.pdf although other papers (or even his thesis) may be more appropriate. Basically, he replaces the typical Gaussian process regression component of Bayesian optimization with random forests, so that it scales better to large numbers of trials. If this sounds interesting to you, also have a look at http://probabilistic-optimization.org/Global.html
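To sketch the idea (this is not SMAC itself, just the "random forest as surrogate" concept; the goal function, constants, and the random candidate search below are all made-up placeholders):

    import numpy as np
    from sklearn.ensemble import RandomForestRegressor

    def goal(x):
        # made-up stand-in for the real eigenvalue-based objective
        return -np.dot(x, x) + np.sum(np.cos(3.0 * x))

    rng = np.random.default_rng(0)
    dim, box = 50, 1.0
    X = rng.uniform(-box, box, (20, dim))            # initial design
    y = np.array([goal(x) for x in X])

    for _ in range(200):
        forest = RandomForestRegressor(n_estimators=30).fit(X, y)
        cand = rng.uniform(-box, box, (2000, dim))   # cheap inner search: random candidates
        # per-tree predictions give a crude mean / spread estimate
        preds = np.stack([tree.predict(cand) for tree in forest.estimators_])
        mu, sigma = preds.mean(axis=0), preds.std(axis=0) + 1e-9
        x_next = cand[np.argmax(mu + sigma)]         # optimistic (UCB-like) acquisition
        X, y = np.vstack([X, x_next]), np.append(y, goal(x_next))

    print(X[np.argmax(y)], y.max())

The per-tree spread is a crude stand-in for the predictive uncertainty a Gaussian process would give you, while refitting the forest stays cheap as the number of trials grows.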
|
|
You could try the BOBYQA algorithm. It's described in the paper "The BOBYQA algorithm for bound constrained optimization without derivatives" by M. J. D. Powell. It's similar to the Efficient Global Optimization approach you mentioned in that it samples points and uses them to build an interpolated model, except that it builds a simple quadratic model. It then uses this model in a trust-region framework, so its overhead is very minimal. It's also been integrated into a number of open-source libraries, such as dlib. |
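As a sketch of how little wiring this needs, here is roughly what calling the BOBYQA implementation in NLopt (another open-source library that ships it, mentioned in one of the comments here) looks like from Python; the goal function and the evaluation budget below are made-up placeholders:

    import numpy as np
    import nlopt

    def goal(x):
        # made-up stand-in for the real eigenvalue-based objective
        return float(-np.dot(x, x) + np.sum(np.cos(3.0 * x)))

    dim, box = 50, 1.0
    opt = nlopt.opt(nlopt.LN_BOBYQA, dim)            # Powell's BOBYQA: derivative-free, bound-constrained
    opt.set_lower_bounds(-box * np.ones(dim))
    opt.set_upper_bounds(box * np.ones(dim))
    opt.set_max_objective(lambda x, grad: goal(x))   # grad is unused by BOBYQA
    opt.set_maxeval(5000)                            # keep the budget compatible with "seconds"

    x0 = np.random.default_rng(0).uniform(-box, box, dim)
    x_best = opt.optimize(x0)
    f_best = opt.last_optimum_value()

Since the model-building step is just a quadratic fit, most of the runtime goes into your own function evaluations.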
|
The first thing I'd try in such a setting is SPSA. http://www.jhuapl.edu/spsa/ Here's the original SPSA paper: http://www.jhuapl.edu/spsa/PDF-SPSA/Spall_TAC92.pdf The basic algorithm is described in Section 2. |
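Roughly, the basic iteration from Section 2 looks like the sketch below, adapted for maximization over a hypercube. The goal function is a made-up stand-in, and the gain-sequence constants are placeholders that would need tuning:

    import numpy as np

    def goal(x):
        # made-up stand-in for the real eigenvalue-based objective
        return -np.dot(x, x) + np.sum(np.cos(3.0 * x))

    def spsa_maximize(dim=50, box=1.0, n_iter=2000, a=0.05, c=0.05,
                      alpha=0.602, gamma=0.101, seed=0):
        rng = np.random.default_rng(seed)
        x = rng.uniform(-box, box, dim)
        for k in range(1, n_iter + 1):
            ak = a / k**alpha                        # gain sequences (tuning required)
            ck = c / k**gamma
            delta = rng.choice([-1.0, 1.0], dim)     # Bernoulli +/-1 perturbation
            # simultaneous perturbation: only two evaluations per step, regardless of dimension
            g_hat = (goal(x + ck * delta) - goal(x - ck * delta)) / (2.0 * ck) * (1.0 / delta)
            x = np.clip(x + ak * g_hat, -box, box)   # ascent step, projected into the hypercube
        return x, goal(x)

    x_best, f_best = spsa_maximize()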
|
Good question. Not an answer, just some comments:
1) Have you tried gridding x, snapping all x_i to multiples of, say, .01, with Nelder-Mead (N-M)? Any method can be run first on a coarse grid, then on a finer one; the devil is in the details. (A rough sketch is below.)
2) NLopt has a handful of algorithms that can easily be called from C, Python ... (what do you have?) to try them out. They include Powell's BOBYQA, as well as DIRECT and DIRECT_L, which sound similar to EGO.
3) Is there a problem like yours online anywhere? That would be useful for people who play with / refine optimizers. |
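Here is a rough sketch of idea (1) with SciPy's Nelder-Mead, snapping to a coarse grid first and then a finer one. The goal function is a made-up stand-in, and as noted the devil is in the details -- Nelder-Mead can stall on the piecewise-constant snapped function:

    import numpy as np
    from scipy.optimize import minimize

    def goal(x):
        # made-up stand-in for the real eigenvalue-based objective
        return -np.dot(x, x) + np.sum(np.cos(3.0 * x))

    def snapped_neg_goal(x, h):
        # evaluate the (negated, for minimization) goal on a grid of spacing h
        return -goal(np.round(x / h) * h)

    dim, box = 50, 1.0
    x = np.random.default_rng(0).uniform(-box, box, dim)
    for h in (0.1, 0.01):                            # coarse grid first, then finer
        res = minimize(snapped_neg_goal, x, args=(h,), method="Nelder-Mead",
                       options={"maxfev": 20000, "xatol": h, "fatol": 1e-8})
        x = np.clip(res.x, -box, box)
    print(x, goal(x))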
|
If that slice is representative of your function, then although it suffers from many flat areas and many local maxima, it seems to have only one area that is much higher than the others. That is a very good property for many global optimization algorithms, for example the one in the next paragraph.

Look at "Stochastic Simultaneous Optimistic Optimization" by Valko et al. Note these guys do theory. Being able to prove something about the performance of an algorithm across all problems may require more conservative behaviour than getting a good result on your particular problem, so you might want to take more risks than they do.

Another approach: don't give up on gradients too easily. Stochastic Gradient Descent (SGD) does not require a gradient, only an unbiased random estimate of one. You can get that pretty cheaply: choose a random direction, compute just two function values to estimate the directional derivative in that direction, normalize correctly, and Bob's your uncle. See "Optimal Algorithms for Online Convex Optimization with Multi-Point Bandit Feedback" for some hints. Note that none of the guarantees actually apply to you, since they are for convex problems, but SGD sometimes does just fine on non-convex problems too, so it's worth a try. |
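A sketch of that two-point estimator used for plain gradient ascent, projected back into the hypercube; the goal function is a made-up stand-in, and the step size, smoothing radius, and iteration count are placeholders:

    import numpy as np

    def goal(x):
        # made-up stand-in for the real eigenvalue-based objective
        return -np.dot(x, x) + np.sum(np.cos(3.0 * x))

    def two_point_gradient_ascent(dim=50, box=1.0, n_iter=5000, lr=0.05, h=1e-3, seed=0):
        rng = np.random.default_rng(seed)
        x = rng.uniform(-box, box, dim)
        for _ in range(n_iter):
            u = rng.normal(size=dim)
            u /= np.linalg.norm(u)                       # random unit direction
            # two evaluations estimate the directional derivative along u;
            # scaling by dim * u turns it into a (roughly unbiased) gradient estimate
            d = (goal(x + h * u) - goal(x - h * u)) / (2.0 * h)
            x = np.clip(x + lr * dim * d * u, -box, box) # ascent step, kept in the hypercube
        return x, goal(x)

    x_best, f_best = two_point_gradient_ascent()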

Have you looked at automatic differentiation?
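If the goal function can be (re)written in an AD-capable framework, the gradient through the eigenvalue decomposition often comes essentially for free. A minimal sketch with JAX, using a made-up symmetric-matrix objective as a stand-in (with degenerate eigenvalues the derivative can be ill-defined, so whether this applies to the real function depends on its details):

    import jax
    import jax.numpy as jnp

    def goal(x):
        # made-up stand-in: build a symmetric matrix from x and take its largest eigenvalue
        a = jnp.outer(x, x) + jnp.diag(jnp.cos(3.0 * x))
        return jnp.linalg.eigvalsh(a)[-1]

    grad_goal = jax.grad(goal)          # reverse-mode gradient through the eigendecomposition
    x = jnp.linspace(-0.4, 0.6, 50)
    print(goal(x), grad_goal(x))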