I am trying to use L-BFGS to train a convolutional neural network. I know this should work in principle, as was demonstrated in a paper from Andrew Ng's group. However, I am seeing a sensitivity to the initialization that strikes me as atypical of convolutional nets (trained with SGD, for example). Does anyone have experience applying L-BFGS to CNNs? What was your experience?
Are you doing the trick where every once in a while you reset L-BFGS's internal data structures?
Yes, though not very often.
Is this trick actually documented anywhere? I've read you mention resetting the Hessian approximation once one has converged, but I am not sure exactly how it works.
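For concreteness, here is a minimal sketch of how I understand the reset trick, assuming a scipy-style interface: each fresh call to minimize() discards the accumulated curvature pairs, so capping the iterations and restarting from the current point rebuilds the Hessian approximation from scratch. loss_and_grad, reset_every, and the toy quadratic objective are made up for illustration, standing in for a real CNN loss:

    import numpy as np
    from scipy.optimize import minimize

    def loss_and_grad(w):
        # Hypothetical objective: returns (loss, flat gradient) for the
        # flattened parameter vector w; a stand-in for a real CNN loss.
        loss = 0.5 * np.dot(w, w)
        grad = w
        return loss, grad

    w = np.random.randn(100) * 0.01  # the initialization the thread worries about

    # "Reset trick": restart L-BFGS every reset_every iterations. Each
    # restart clears the stored history, resetting the Hessian estimate.
    reset_every = 50
    for _ in range(10):
        res = minimize(loss_and_grad, w, jac=True, method="L-BFGS-B",
                       options={"maxiter": reset_every})
        w = res.x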
The only thing I noticed is that L-BFGS is not equal to L-BFGS. For example, scipy's L-BFGS gets thrashed by minFunc's L-BFGS on machine learning problems.
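Part of the gap may just be defaults: implementations differ in things like history size and line search. A sketch of the knobs scipy actually exposes for its L-BFGS-B, reusing the hypothetical loss_and_grad and w from above:

    from scipy.optimize import minimize

    # Tuning scipy's L-BFGS-B toward a stronger configuration; option
    # names are scipy's own, the values here are illustrative guesses.
    res = minimize(loss_and_grad, w, jac=True, method="L-BFGS-B",
                   options={"maxcor": 20,   # curvature pairs kept (default 10)
                            "maxls": 50,    # max line search steps (default 20)
                            "gtol": 1e-6})  # stop when max projected gradient <= gtol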