|
Hi, considering the large number of optimization methods, e.g. a) Gradient Descent, b) Coordinate Ascent, c) Conjugate Gradient, d) BFGS (and variants) ... (please add more), is there a good set of rules on what should be used when? That is, which method is suited to which conditions? Thanks.
|
A rule of thumb I tend to use:

- If your cost function is convex, use an elaborate batch method such as L-BFGS. When doing so, initialize it to a good starting point by doing a pass of SGD first.
- If your cost function is not convex and you have a small dataset, also use a batch method.
- If your cost function is not convex and you have a large dataset, use a stochastic method such as SGD or one of its variants.

Additional details:
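To make the first bullet concrete, here is a minimal sketch of the "SGD pass, then batch L-BFGS" recipe on a convex problem (L2-regularized logistic regression). The data, loss, step size, and epoch count are all illustrative choices, not part of the original answer; the batch step uses SciPy's `L-BFGS-B` solver.

```python
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(0)

# Synthetic convex problem: L2-regularized logistic regression.
X = rng.normal(size=(200, 5))
true_w = rng.normal(size=5)
y = (X @ true_w > 0).astype(float)
lam = 1e-2  # regularization strength (illustrative)

def loss_and_grad(w):
    p = 1.0 / (1.0 + np.exp(-(X @ w)))           # predicted probabilities
    loss = -np.mean(y * np.log(p + 1e-12)
                    + (1 - y) * np.log(1 - p + 1e-12)) + 0.5 * lam * w @ w
    grad = X.T @ (p - y) / len(y) + lam * w
    return loss, grad

# Step 1: a short SGD pass to find a decent starting point.
w = np.zeros(5)
lr = 0.5  # illustrative step size
for epoch in range(3):
    for i in rng.permutation(len(y)):
        p_i = 1.0 / (1.0 + np.exp(-(X[i] @ w)))
        w -= lr * ((p_i - y[i]) * X[i] + lam * w)  # stochastic gradient step

# Step 2: hand the warm start to a batch method (L-BFGS).
res = minimize(loss_and_grad, w, jac=True, method="L-BFGS-B")
print(res.success, res.fun)
```

Warm-starting this way mostly matters at scale: a cheap stochastic pass gets you near the basin quickly, and the batch method then converges in few iterations with high-accuracy steps.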
|