I am trying to use L-BFGS for training a convolutional neural network. I know this should work in principle, as was demonstrated in a paper from Andrew Ng's group. However, I am seeing a sensitivity to the initialization that I feel is untypical of convolutional nets (trained with SGD, for example). Does anyone have experience with applying L-BFGS to CNNs? What was your experience?
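For reference, here is a minimal sketch of the batch L-BFGS training pattern I have in mind, using scipy's L-BFGS-B as the optimizer. The objective here is a toy logistic regression standing in for the CNN loss, and the data, parameters, and `loss_and_grad` function are all illustrative assumptions, not my actual setup:

```python
import numpy as np
from scipy.optimize import fmin_l_bfgs_b

rng = np.random.RandomState(0)
X = rng.randn(200, 50)                       # toy data (stand-in for the CNN inputs)
y = (rng.rand(200) > 0.5).astype(float)

def loss_and_grad(w):
    # negative log-likelihood of logistic regression, with its gradient;
    # a CNN would return its loss and a flattened gradient vector here
    p = 1.0 / (1.0 + np.exp(-X.dot(w)))
    loss = -np.mean(y * np.log(p + 1e-12) + (1 - y) * np.log(1 - p + 1e-12))
    grad = X.T.dot(p - y) / len(y)
    return loss, grad

w0 = 0.01 * rng.randn(50)                    # the initialization the question is about
w_opt, final_loss, info = fmin_l_bfgs_b(loss_and_grad, w0, m=10, maxiter=200)
print(final_loss, info['warnflag'])
```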

asked Sep 20 '12 at 06:36


Andreas Mueller


Are you doing the trick where every once in a while you reset L-BFGS's internal data structures?

(Sep 20 '12 at 09:00) Alexandre Passos ♦

Yes, though not very often.

(Sep 20 '12 at 09:38) Andreas Mueller

Is this trick actually documented anywhere? I've seen you mention resetting the Hessian approximation as soon as one has converged, but I am not sure exactly how it works.

(Sep 20 '12 at 09:39) Justin Bayer
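A minimal sketch of what such a periodic reset might look like, as I understand the comments: run L-BFGS for a fixed number of iterations, then restart it from the current parameters, which discards the stored curvature pairs and effectively resets the Hessian approximation to the identity. The restart interval, `n_restarts`, and `loss_and_grad` below are illustrative assumptions:

```python
from scipy.optimize import fmin_l_bfgs_b

def train_with_restarts(loss_and_grad, w0, n_restarts=10, iters_per_run=50):
    w = w0
    for _ in range(n_restarts):
        # each fresh call to fmin_l_bfgs_b starts with an empty L-BFGS
        # memory, i.e. the internal data structures are reset
        w, loss, info = fmin_l_bfgs_b(loss_and_grad, w, m=10,
                                      maxiter=iters_per_run)
        if info['warnflag'] == 0:  # converged within this run
            break
    return w, loss
```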

The only thing I have noticed is that one L-BFGS implementation is not equal to another. For example, scipy's L-BFGS gets thrashed by minFunc's L-BFGS on machine learning problems.

(Sep 20 '12 at 09:40) Justin Bayer
