Hi. Given the large number of optimization methods, e.g. a) Gradient Descent, b) Coordinate Ascent, c) Conjugate Gradient, d) BFGS (and variants), ... (please add more), is there a good set of rules for which one should be used when? That is, which method is suited to which conditions?

thanks

asked Sep 10 '13 at 01:05

turbo364


One Answer:

A rule of thumb I tend to use:

  • If your cost function is convex, use an elaborate batch method such as L-BFGS. When doing so, initialize it at a good starting point by doing a pass of SGD first (see the sketch after this list).
  • If your cost function is not convex and you have a small dataset, also use a batch method.
  • If your cost function is not convex and you have a large dataset, use a stochastic method such as SGD or one of its variants.
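
The following is only a minimal sketch of that warm-start idea, not the answerer's actual recipe: it runs one pass of SGD on a convex logistic-regression loss and hands the result to scipy's L-BFGS. The synthetic data, the learning rate of 0.1, and all variable names are assumptions made up for illustration.

    import numpy as np
    from scipy.optimize import minimize

    # Synthetic data for illustration only.
    rng = np.random.default_rng(0)
    n_samples, n_features = 1000, 20
    X = rng.normal(size=(n_samples, n_features))
    y = (X @ rng.normal(size=n_features) > 0).astype(float)

    def loss_and_grad(w):
        """Average logistic loss (convex in w) and its gradient."""
        p = 1.0 / (1.0 + np.exp(-(X @ w)))
        loss = -np.mean(y * np.log(p + 1e-12) + (1 - y) * np.log(1 - p + 1e-12))
        grad = X.T @ (p - y) / n_samples
        return loss, grad

    # One pass of SGD to obtain a reasonable starting point.
    w = np.zeros(n_features)
    lr = 0.1  # arbitrary choice for this sketch
    for i in rng.permutation(n_samples):
        p_i = 1.0 / (1.0 + np.exp(-(X[i] @ w)))
        w -= lr * (p_i - y[i]) * X[i]

    # Hand the warm start to a batch method (L-BFGS).
    res = minimize(loss_and_grad, w, jac=True, method="L-BFGS-B")
    print("final loss:", res.fun)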

Additional details:

  • Except for tiny problems, full Newton methods are almost always a bad idea.
  • Coordinate descent is nice when you can compute the optimum exactly along each coordinate. I have never used it otherwise, though that does not mean it is not useful.
  • Conjugate gradient only helps when you do not have any better batch algorithm.
  • Stochastic methods are a bit trickier to get working than batch methods, especially when it comes to choosing the learning rate. It is increasingly accepted that the stepsize does not need to decay in order to achieve good performance on the test set (a rough sketch comparing the two schedules follows below).
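
As a rough illustration of that last point (not code from the original answer), here is a sketch of plain SGD on the same kind of logistic loss with a constant stepsize versus a decaying one; the schedules and the base value 0.1 are arbitrary assumptions.

    import numpy as np

    # Synthetic data for illustration only.
    rng = np.random.default_rng(1)
    n_samples, n_features = 1000, 20
    X = rng.normal(size=(n_samples, n_features))
    y = (X @ rng.normal(size=n_features) > 0).astype(float)

    def sgd(stepsize_fn, n_epochs=5):
        """Plain SGD on the average logistic loss; stepsize_fn maps step index t to a learning rate."""
        w = np.zeros(n_features)
        t = 0
        for _ in range(n_epochs):
            for i in rng.permutation(n_samples):
                p = 1.0 / (1.0 + np.exp(-(X[i] @ w)))
                w -= stepsize_fn(t) * (p - y[i]) * X[i]
                t += 1
        return w

    w_const = sgd(lambda t: 0.1)                     # constant stepsize
    w_decay = sgd(lambda t: 0.1 / (1.0 + t / 1000))  # classic 1/t-style decay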

answered Oct 10 '13 at 05:00

Nicolas Le Roux
