I have been working off of the UFLDL tutorial: http://deeplearning.stanford.edu/wiki/index.php/UFLDL_Tutorial and have been trying out the sparse autoencoder on different datasets. I tried running it on time-series data and ran into problems. Since the input data has negative values, the sigmoid activation function (1/(1 + exp(-x))) is inappropriate. When I substitute in tanh, the optimization routine minFunc (L-BFGS) fails (Step Size below TolX). I decreased the TolX constant dramatically with no change. I changed the output layer to linear and kept the input layer sigmoid, but this isn't a preferable solution: the output of the autoencoder ends up scaled by a constant (0.5), which throws off the cost function. So, in short: is there a good way to set up the sparse autoencoder (activation functions and/or cost) for data with negative values?
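For concreteness, here's a quick sketch (NumPy, not the tutorial's MATLAB code, and purely illustrative) of what I mean about the output ranges of the two activations:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

x = np.linspace(-5, 5, 11)
print(sigmoid(x).min(), sigmoid(x).max())    # stays inside (0, 1), so it can never hit negative targets
print(np.tanh(x).min(), np.tanh(x).max())    # stays inside (-1, 1), so negative reconstructions are possible
```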
Anyway, thanks in advance to anyone who answers! First post on here; I've been reading these forums more and more and am finding them increasingly helpful.
Did you update your gradient to match your cost function? If so, did you run a gradient check? L-BFGS should work if the gradient check passes.
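Something like the following is enough for a basic check (a NumPy sketch; `cost_grad` is a hypothetical stand-in for whatever function you hand to minFunc, i.e. it takes a parameter vector and returns the cost and the analytic gradient):

```python
import numpy as np

def numerical_gradient(cost_grad, theta, eps=1e-4):
    # Central finite differences, one parameter at a time.
    num_grad = np.zeros_like(theta)
    for i in range(theta.size):
        step = np.zeros_like(theta)
        step[i] = eps
        num_grad[i] = (cost_grad(theta + step)[0] - cost_grad(theta - step)[0]) / (2 * eps)
    return num_grad

def check_gradient(cost_grad, theta):
    num_grad = numerical_gradient(cost_grad, theta)
    ana_grad = cost_grad(theta)[1]
    # Relative difference between numeric and analytic gradients; the UFLDL
    # gradient-checking page suggests this should come out very small (~1e-9).
    return np.linalg.norm(num_grad - ana_grad) / np.linalg.norm(num_grad + ana_grad)
```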
Did you use automatic differentiation, or did you compute the gradients by hand? If you did it by hand, keep in mind that the derivative of tanh is not the same as that of the logistic sigmoid or of a linear unit. Here are the derivatives in case this is the cause of the problem:
tanh: f'(x) = 1 - f(x)^2
logistic: f'(x) = f(x)(1 - f(x))
linear: f'(x) = 1
Like nop said, it's good to check your gradient with something like finite differences to be sure.
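For example (a NumPy sketch, names are just illustrative), with each derivative written in terms of the activation value a = f(x), since that is how backprop usually consumes it:

```python
import numpy as np

def tanh_act(x):        return np.tanh(x)
def tanh_deriv(a):      return 1.0 - a**2        # 1 - f(x)^2

def logistic_act(x):    return 1.0 / (1.0 + np.exp(-x))
def logistic_deriv(a):  return a * (1.0 - a)     # f(x)(1 - f(x))

def linear_act(x):      return x
def linear_deriv(a):    return np.ones_like(a)   # derivative of a linear unit is 1
```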
I think I might have figured it out. Thanks to both of you for answering! The sparsity penalty uses Kullback-Leibler divergence. See this link, a bit more than halfway down the page (can you type LaTeX in here? It would probably be kind of long anyway): http://deeplearning.stanford.edu/wiki/index.php/Autoencoders_and_Sparsity In English:
The sparsity penalty tries to keep the activations of the hidden units small, but it assumes a sigmoid with output range between 0 and 1, since the KL divergence is only defined when the average activation lies between 0 and 1. If the average activation of a tanh unit is 0 (which is what we would want for a sparse autoencoder), then the KL divergence given on that page blows up. I've looked around without luck; is there a form of KL divergence with an appropriate range for the tanh activation? Any references someone could point me to? On the site linked above, the author says many choices of sparsity penalty are OK, but doesn't elaborate on what those other choices could be. Is it prudent to just make something up, or should I look for something that's accepted? Thanks again!

You could just square the tanh activation or take its absolute value. Doing this will penalize a value of -0.5 the same as 0.5, which I think is what you want, although I don't have any literature to back this up. I'm also not sure why you said sigmoid hidden units are inappropriate with negative inputs in your original question. That is not the case.
(Jul 12 '12 at 09:26)
alto
You're right, it isn't the case. I'm pretty new to this stuff.
(Jul 12 '12 at 22:09)
PeterRabbit
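For reference, here is a sketch of both penalties side by side (NumPy, not the tutorial's MATLAB; the rho and beta values are just illustrative): the KL-divergence penalty from the UFLDL page, which needs the average activation of each hidden unit to lie in (0, 1), and the absolute-value penalty suggested above for tanh hidden units, where "sparse" means activations near 0.

```python
import numpy as np

def kl_sparsity_penalty(a, rho=0.05, beta=3.0):
    # UFLDL-style penalty: sum over hidden units j of
    # KL(rho || rho_hat_j) = rho*log(rho/rho_hat_j) + (1-rho)*log((1-rho)/(1-rho_hat_j)),
    # where rho_hat_j is the mean activation of unit j over the batch.
    # Only defined for rho_hat in (0, 1), i.e. sigmoid hidden units.
    rho_hat = a.mean(axis=1)                 # a has shape (num_hidden, num_examples)
    kl = rho * np.log(rho / rho_hat) + (1 - rho) * np.log((1 - rho) / (1 - rho_hat))
    return beta * kl.sum()

def abs_sparsity_penalty(a, beta=3.0):
    # Alternative for tanh hidden units: penalize |activation| directly, so
    # -0.5 and 0.5 cost the same and activations are pushed toward 0.
    return beta * np.abs(a).mean(axis=1).sum()
```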