I'm working through the sparse autoencoder exercises from Andrew Ng here (in python) and I've run into a brick wall with the gradient checking/descent. My numerical checks of the gradient seem to yield good answers (usually < 1e-9 difference between the analytical and numerical gradients), but when I use a BFGS implementation (scipy.optimize.fmin_l_bfgs_b), it informs me that there is either:

a) a disagreement between my cost function and gradient calculation, or

b) rounding errors dominating the calculations.

I haven't yet added in the weight decay or sparsity terms, as recommended, because I'd like to be sure that I'm calculating the gradients correctly.

I'm inexperienced at this, and having trouble determining whether the problem is mine or in the scipy BFGS implementation. My code is here, and the data needed to run it can be found at http://ufldl.stanford.edu/wiki/resources/sparseae_exercise.zip.
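
For context, the numerical check is just the standard central-difference approximation, along these lines (a sketch with made-up names, not the exact code in my script):

    import numpy as np

    def numerical_gradient(cost, theta, eps=1e-4):
        # central differences: dJ/dtheta_i ~= (J(theta + eps*e_i) - J(theta - eps*e_i)) / (2*eps)
        grad = np.zeros_like(theta)
        for i in range(theta.size):
            step = np.zeros_like(theta)
            step[i] = eps
            grad[i] = (cost(theta + step) - cost(theta - step)) / (2.0 * eps)
        return grad

    # the delta I print is just the difference between this and the backprop gradient,
    # e.g. np.linalg.norm(numerical_gradient(cost, theta) - analytical_gradient)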

First, try:

python autoencoder.py --imgpath '/path/to/IMAGES.mat'

This will display the gradients and print the deltas between the analytical and numerical gradients.

and then try:

python autoencoder.py --imgpath '/path/to/IMAGES.mat' --usebfgs

This will use scipy.optimize.fmin_l_bfgs_b to do the minimization.
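
For what it's worth, the call is essentially the following, with a toy quadratic standing in for the autoencoder cost (names here are illustrative, not the ones in autoencoder.py); the messages quoted above come back through the info dict:

    import numpy as np
    from scipy.optimize import fmin_l_bfgs_b

    def cost_and_grad(theta):
        # stand-in objective: 0.5 * ||theta - target||^2, gradient = theta - target
        target = np.arange(theta.size, dtype=float)
        diff = theta - target
        return 0.5 * np.dot(diff, diff), diff

    theta0 = np.zeros(5)
    # with fprime omitted, fmin_l_bfgs_b expects the callable to return (cost, gradient)
    theta_opt, cost_opt, info = fmin_l_bfgs_b(cost_and_grad, theta0)
    print(info['warnflag'])  # 0 on clean convergence; 2 when it stops for the reason in info['task']
    print(info['task'])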

Am I doing something subtly (or blatantly) wrong? When I perform the weight and bias updates in my own code, the reconstruction error always heads steadily downward. If anyone is familiar with either autoencoders, or this BFGS implementation, and can offer me some guidance/tips, I'd be eternally grateful!!!
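
For reference, the hand-rolled weight/bias updates that do drive the error down are just plain gradient descent; with the same cost_and_grad and theta0 as in the toy sketch above, roughly:

    theta = theta0.copy()
    learning_rate = 0.1
    for step in range(500):
        cost, grad = cost_and_grad(theta)
        theta = theta - learning_rate * grad  # the cost decreases steadily under these updates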

asked Oct 20 '11 at 13:19

John Vinyard

closed Oct 27 '11 at 16:49

Alexandre Passos ♦

Are you returning the right sign for the gradient?

(Oct 20 '11 at 14:04) Alexandre Passos ♦

I think so. If some or all of my gradient's signs were incorrect, I'd expect to see the reconstruction error bounce around or head upward (gradient ascent, in other words), right? What I see is an always decreasing reconstruction error though.

(Oct 20 '11 at 17:10) John Vinyard

What I mean is, if you're returning the negative gradient instead of the gradient, you might still get less error when you move in its direction (obviously), but LBFGS can be confused by what is going on. In practice it's far too easy to do this. Have you tried adding a - sign in front of the thing you pass to L-BFGS to see what happens?
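
With a toy quadratic you can see the failure mode directly (a sketch, not your code): flip the sign of the gradient you return and the line search can't make progress, which tends to surface as exactly this class of message:

    import numpy as np
    from scipy.optimize import fmin_l_bfgs_b

    def wrong_sign(theta):
        # same quadratic as the sketch in the question, but returning the NEGATED gradient
        target = np.arange(theta.size, dtype=float)
        diff = theta - target
        return 0.5 * np.dot(diff, diff), -diff

    theta_opt, cost_opt, info = fmin_l_bfgs_b(wrong_sign, np.zeros(5))
    print(info['warnflag'])  # expect a nonzero flag: the line search cannot find a decrease
    print(info['task'])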

(Oct 20 '11 at 17:14) Alexandre Passos ♦

Yes, I get the same result when negating the gradient I've been using. When you say "if you're returning the negative gradient instead of the gradient you might still get less error when you move in its direction (obviously)", that isn't obvious to me, and might mean I'm misunderstanding something important. My intuition tells me that the negative gradient would update all your parameters in exactly the wrong direction, but it sounds like that might not be the case. Thanks for your help!

(Oct 20 '11 at 17:46) John Vinyard

Maybe check whether the signs of the numerical and analytical gradients agree: (scipy.sign(numerical_gradient) == scipy.sign(analytical_gradient)).all()

(Oct 21 '11 at 04:22) Justin Bayer

Surprise, surprise. I had a buggy implementation that wasn't always using the gradient updates from BFGS. Anyway, here's a working python implementation of the sparse autoencoder exercise for anyone who's interested: https://gist.github.com/1319832. Sorry to have wasted everyone's time!

(Oct 27 '11 at 11:16) John Vinyard

The question has been closed for the following reason "Problem is not reproducible or outdated" by Alexandre Passos Oct 27 '11 at 16:49
