I have a question regarding the auto-encoder when using squared error as the loss function. In "Stacked Denoising Autoencoders: Learning Useful Representations in a Deep Network with a Local Denoising Criterion", the authors tried two cases: an affine + sigmoid encoder paired with either an affine decoder with squared-error loss, or an affine + sigmoid decoder with cross-entropy loss.

My question is: what is the problem with the combination affine + sigmoid encoder and affine + sigmoid decoder with squared-error loss? And for real-valued data, why do people use a linear activation function in the decoder instead of a non-linear one such as tanh or sigmoid?

Thanks

asked Mar 14 '13 at 19:57


pop0432


One Answer:

Intuitively: the sigmoid squashes your output, forcing it to lie between 0 and 1.

  • If your *inputs* are Gaussian-distributed, it does not make much sense to reconstruct only in the range (0, 1).
  • If your inputs are binary, then sigmoid does make sense, but it is already part of the cross-entropy loss, so no need to saturate twice.
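To make the first point concrete, here is a minimal numpy sketch (the data and the `sigmoid` stand-in for an affine + sigmoid decoder are illustrative, not taken from the paper): a sigmoid output unit can only ever produce values strictly inside (0, 1), so a large fraction of Gaussian-distributed targets is unreachable no matter what the weights are.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

rng = np.random.default_rng(0)
x = rng.normal(0.0, 1.0, size=1000)  # Gaussian-distributed "inputs"

# Stand-in for the output of any affine + sigmoid decoder:
# whatever the pre-activation is, the result stays in (0, 1).
recon = sigmoid(x)

print(recon.min() > 0.0 and recon.max() < 1.0)  # True: output confined to (0, 1)
print(np.mean(x <= 0.0))  # roughly half the targets lie outside (0, 1) entirely
```

So with squared error against a sigmoid output, the loss can never reach zero on such data, and the decoder saturates trying to match targets outside its range.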

The two versions you found are straightforward extensions of popular methods you may want to look at: you can view the auto-encoder with linear output as linear regression on top of a hidden layer, and the auto-encoder with cross-entropy loss as logistic regression on top of a hidden layer.
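A small numpy sketch of the linear-regression view (the random encoder weights `W_enc`, `b_enc` and the data are made up for illustration): if you freeze the encoder, the hidden activations act as fixed features, and fitting the linear decoder is exactly an ordinary least-squares regression of the inputs on those features.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 5))            # toy real-valued data

# Fixed (random) affine + sigmoid encoder producing hidden features h.
W_enc = 0.1 * rng.normal(size=(5, 3))
b_enc = np.zeros(3)
h = 1.0 / (1.0 + np.exp(-(X @ W_enc + b_enc)))

# With the encoder held fixed, the linear-output decoder that minimizes
# squared error is just least-squares regression of X on h.
W_dec, *_ = np.linalg.lstsq(h, X, rcond=None)
recon = h @ W_dec
mse = np.mean((recon - X) ** 2)
print(mse)  # finite, non-negative reconstruction error
```

In full autoencoder training the encoder is of course learned jointly rather than fixed; the sketch only shows why the output layer itself is linear/logistic regression.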

answered Mar 15 '13 at 06:03


Hannes S

Another possible interpretation from which these combinations follow automatically is to view the autoencoder loss as the negative conditional log-likelihood of the data given the hidden units. If this conditional distribution is Gaussian with the variance fixed at 1, you get an MSE penalty + linear reconstruction. If it is Bernoulli-distributed, you get a cross-entropy penalty + sigmoid reconstruction.

(Mar 15 '13 at 06:28) Sander Dieleman

Actually that's not entirely accurate, apologies. The variance should be constant, but it doesn't matter which value, since it just scales the objective function.

(Mar 15 '13 at 09:11) Sander Dieleman
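Spelled out (with \(\hat{x}(h)\) denoting the decoder output, a notation added here for clarity): for a Gaussian output model with fixed variance \(\sigma^2\), the negative conditional log-likelihood is

```latex
-\log p(x \mid h)
  = -\log \mathcal{N}\!\bigl(x;\, \hat{x}(h),\, \sigma^2 I\bigr)
  = \frac{1}{2\sigma^2}\,\lVert x - \hat{x}(h) \rVert^2 + \text{const},
```

i.e. squared error up to the constant factor \(1/(2\sigma^2)\), which is why the particular value of the (fixed) variance only rescales the objective. For a Bernoulli output model,

```latex
-\log p(x \mid h)
  = -\sum_i \Bigl[\, x_i \log \hat{x}_i(h) + (1 - x_i)\log\bigl(1 - \hat{x}_i(h)\bigr) \Bigr],
```

the cross-entropy, where taking \(\hat{x}_i(h)\) to be a sigmoid of an affine map keeps every term well-defined in (0, 1).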

Thanks. That helps a lot.

(Mar 15 '13 at 10:11) pop0432


User submitted content is under Creative Commons: Attribution - Share Alike; Other things copyright (C) 2010, MetaOptimize LLC.