I'm working through the sparse autoencoder exercises from Andrew Ng here (in Python) and I've hit a brick wall with the gradient checking/descent. My numerical checks of the gradient seem to yield good answers (usually < 1e-9 difference between the analytical and numerical gradients), but when I use the scipy BFGS implementation, it bails out complaining that either a) there's a disagreement between my cost function and gradient calculation, or b) rounding errors dominate the calculations. I haven't yet added the weight decay or sparsity terms, as recommended, because I'd like to be sure I'm calculating the basic gradients correctly first. I'm inexperienced at this, and I'm having trouble determining whether the trouble is mine or in the scipy BFGS implementation.

My code is here, and the data needed to run it can be found at http://ufldl.stanford.edu/wiki/resources/sparseae_exercise.zip.

First, try:

    python autoencoder.py --imgpath '/path/to/IMAGES.mat'

This will display the gradients and print the deltas between the analytical and numerical gradients.

Then try:

    python autoencoder.py --imgpath '/path/to/IMAGES.mat' --usebfgs

This will use the scipy BFGS implementation instead.

Am I doing something subtly (or blatantly) wrong? When I perform the weight and bias updates in my own code, the reconstruction error always heads steadily downward. If anyone is familiar with either autoencoders or this BFGS implementation and can offer me some guidance/tips, I'd be eternally grateful!
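For anyone following along, the numerical check I'm describing is along these lines. This is a minimal sketch on a toy quadratic, not the actual code from my repo; for the real check, swap in the autoencoder cost and the analytical gradient from backprop.

    import numpy as np

    def numerical_gradient(cost_fn, theta, eps=1e-4):
        """Central-difference estimate of the gradient of cost_fn at theta (a 1-D array)."""
        num_grad = np.zeros_like(theta)
        for i in range(theta.size):
            step = np.zeros_like(theta)
            step[i] = eps
            num_grad[i] = (cost_fn(theta + step) - cost_fn(theta - step)) / (2.0 * eps)
        return num_grad

    # Demo on a toy quadratic; in the real check, the cost and analytical
    # gradient come from the autoencoder's forward/backprop pass instead.
    cost = lambda theta: 0.5 * np.dot(theta, theta)
    analytical_grad = lambda theta: theta

    theta0 = np.random.randn(10)
    num_grad = numerical_gradient(cost, theta0)
    print(np.max(np.abs(num_grad - analytical_grad(theta0))))  # should be tiny, e.g. < 1e-9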
Are you returning the right sign for the gradient?
I think so. If some or all of my gradient's signs were incorrect, I'd expect the reconstruction error to bounce around or head upward (gradient ascent, in other words), right? What I see, though, is an always-decreasing reconstruction error.
What I mean is, if you're returning the negative gradient instead of the gradient, you might still get less error when you move in its direction (obviously), but L-BFGS can be confused by what's going on. In practice it's far too easy to do this. Have you tried adding a minus sign in front of the gradient you pass to L-BFGS to see what happens?
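To be concrete about the convention: fmin_l_bfgs_b minimizes whatever you hand it, so the callable should return the cost and the gradient of that same cost together. Here's a minimal sketch on a toy quadratic (nothing to do with your autoencoder code):

    import numpy as np
    from scipy.optimize import fmin_l_bfgs_b

    # Toy quadratic: f(x) = 0.5 * ||x||^2, whose gradient is x.
    # fmin_l_bfgs_b minimizes f, so the callable must return (cost, gradient)
    # of that same f; returning (cost, -gradient) confuses the line search.
    def cost_and_grad(x):
        return 0.5 * np.dot(x, x), x.copy()

    x0 = np.array([3.0, -4.0])
    x_opt, f_opt, info = fmin_l_bfgs_b(cost_and_grad, x0)
    print(x_opt, f_opt, info['warnflag'])  # warnflag 0 means clean convergence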
Yes, I get the same result when negating the gradient I've been using. When you say "if you're returning the negative gradient instead of the gradient, you might still get less error when you move in its direction (obviously)", that isn't obvious to me, and it might mean I'm misunderstanding something important. My intuition tells me that the negative gradient would update all of the parameters in exactly the wrong direction, but it sounds like that might not be the case. Thanks for your help!
Maybe check whether the signs of the numerical and analytical gradients agree: (scipy.sign(numerical_gradient) == scipy.sign(analytical_gradient)).all()
Surprise, surprise. I had a buggy implementation that wasn't always using the gradient updates from BFGS. Anyway, here's a working python implementation of the sparse autoencoder exercise for anyone who's interested: https://gist.github.com/1319832. Sorry to have wasted everyone's time!