After my denoising autoencoder has been trained, its reconstructions from corrupted data look fine, but when it is fed data with no noise, its reconstructions are over-saturated. How does one adapt a trained denoising autoencoder to perform optimally on data that is not corrupted?

The first thing I would try is to train it some more on un-corrupted data with all its parameters clamped except for the biases, but I couldn't find anything about this problem in the literature.
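For concreteness, here is a minimal sketch of what I mean, assuming a one-hidden-layer sigmoid autoencoder with a plain squared-error objective; the names (W, b, W_prime, b_prime) are just placeholders for whatever the trained parameters are:

    import numpy as np

    def sigmoid(x):
        return 1.0 / (1.0 + np.exp(-x))

    def finetune_biases(X, W, b, W_prime, b_prime, lr=0.1, epochs=10):
        # Fine-tune only the biases on un-corrupted data X (rows = examples);
        # the weights W and W_prime stay clamped.
        for _ in range(epochs):
            h = sigmoid(X @ W + b)                    # encode the clean input
            x_hat = sigmoid(h @ W_prime + b_prime)    # reconstruct it
            d_out = (x_hat - X) * x_hat * (1 - x_hat)      # squared-error grad at the output
            d_hid = (d_out @ W_prime.T) * h * (1 - h)      # backprop through the hidden layer
            b_prime -= lr * d_out.mean(axis=0)             # update output biases only
            b -= lr * d_hid.mean(axis=0)                   # update hidden biases only
        return b, b_prime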

Edit:

To corrupt the inputs, I set each bit to 0 with a fixed probability, which was 0.5 in my test on the MNIST digits. I implemented the denoising autoencoder described in this paper, which is also described here.
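In code, that corruption step amounts to something like this (numpy; v is the corruption probability):

    import numpy as np

    rng = np.random.default_rng(0)

    def corrupt(X, v=0.5):
        # Masking noise: set each input unit to 0 with probability v,
        # keep it with probability 1 - v.
        return X * (rng.random(X.shape) >= v)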

Also, what's the deal with tying the weights so that W is the same as the transpose of W'? In all the autoencoders I've tested, this constraint harmed performance on the test set, both for reconstruction and for use as pretraining for a discriminative neural network.

Edit: I alleviated the problem by scaling the trained input-to-hidden weights by a factor of (1 - v), where v is the probability of corrupting a bit in a training example. This ensures that the expected input to each hidden neuron is unchanged. Now, at test time, I can compress and reconstruct, then repeat on the reconstruction, even 10 times, and still have a recognizable digit. Eventually it still gets over-saturated, though.
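Concretely, the test-time trick looks like this (a minimal sketch; W, b, W_prime, b_prime again stand in for my trained parameters):

    import numpy as np

    def sigmoid(x):
        return 1.0 / (1.0 + np.exp(-x))

    def reconstruct_clean(x, W, b, W_prime, b_prime, v=0.5, n_passes=1):
        # Scale the input-to-hidden weights by (1 - v) so the expected input
        # to each hidden neuron matches what it saw on corrupted training data,
        # then optionally feed the reconstruction back in n_passes times.
        W_test = (1.0 - v) * W
        for _ in range(n_passes):
            h = sigmoid(x @ W_test + b)
            x = sigmoid(h @ W_prime + b_prime)
        return x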

asked Jun 08 '13 at 00:17

justonium

edited Jun 11 '13 at 23:44


One Answer:

I think it might be helpful to give some picture of what you are obtaining and what your end goal is, and to look at this paper.

As far as my understanding of this model goes, the denoising autoencoder is trying to model the conditional P(X | tilde{X}) (tilde{X} being an example from the data distribution with additional injected noise). That means it denoises the noise that you actually inject, not the noise that is actually present in the collected data. From this point of view, it is less surprising to obtain a saturated result from un-corrupted data. Unfortunately, I can't give better advice than to train on even less corrupted data.
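To make that concrete, here is a rough sketch of one denoising-autoencoder update (sigmoid units, cross-entropy loss, masking noise; the details may differ from your implementation). The point is only that the input is the corrupted tilde{X} while the reconstruction target is the clean X:

    import numpy as np

    rng = np.random.default_rng(0)

    def sigmoid(x):
        return 1.0 / (1.0 + np.exp(-x))

    def dae_step(x, W, b, W_prime, b_prime, v=0.5, lr=0.1):
        # x is one clean example; parameters are updated in place.
        x_tilde = x * (rng.random(x.shape) >= v)       # inject masking noise
        h = sigmoid(x_tilde @ W + b)                   # encode the corrupted input
        x_hat = sigmoid(h @ W_prime + b_prime)         # reconstruct
        d_out = x_hat - x                              # cross-entropy grad: target is the CLEAN x
        d_hid = (d_out @ W_prime.T) * h * (1 - h)
        W_prime -= lr * np.outer(h, d_out)
        b_prime -= lr * d_out
        W -= lr * np.outer(x_tilde, d_hid)
        b -= lr * d_hid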

The constraint W_{decode} = W_{encode}^{T} is a way to regularize (limit the capacity of) the model, to keep a vanilla autoencoder from learning the identity function on the data points. If g(0) = 0 and g'(0) = 1, then t * g(h / t) = h + 0.5 * g''(0) * h^2 / t + o(h^2 / t) as t goes to infinity for fixed h, so with untied weights (encoding weights scaled like 1/t, decoding weights like t) the model can approximate the identity on finite data. I have no exact report of your results, so I can only guess that your model does not have enough capacity (it's underfitting); apart from dropping this constraint (which is not forbidden), you can try increasing the number of hidden units of your model.
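If it helps, here is a tiny one-dimensional check of that expansion, taking g = tanh as an example of a non-linearity with g(0) = 0 and g'(0) = 1:

    import numpy as np

    g = np.tanh
    h = 0.8                       # a data point
    for t in (1.0, 10.0, 100.0):
        # encoding weight 1/t, decoding weight t: t * g(h / t) -> h as t grows
        print(t, t * g(h / t))    # 0.664..., 0.798..., 0.79998...

With tied weights the encoding and decoding scales cannot be chosen independently, so this trick is unavailable.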

Cheers,

Laurent

answered Jun 08 '13 at 02:33

Laurent Dinh

Thanks, you might be right about underfitting, as I have been training autoencoders of a fixed size.

As for adapting the autoencoder to work well on uncorrupted data, the suggestion I made in my original post (training the trained denoising autoencoder on uncorrupted data with only the biases allowed to change) looks like it will do the job; I was just wondering whether there is a more elegant way around this issue, or whether I did something wrong that caused it to arise in the first place.

(Jun 08 '13 at 14:37) justonium

Also, what are g, h, and t in your answer?

(Jun 08 '13 at 14:41) justonium

It is a one-dimensional example: g is a non-linearity, h would be a data point, and t would be a parameter for the encoding (= 1/t) and decoding (= t) weights.

(Jun 08 '13 at 15:10) Laurent Dinh