Hi, I just finished watching Joseph Turian's talk about deep belief networks at the BIL conference:

http://www.bilconference.com/videos/deep-learning-artificial-intelligence-joseph-turian/

and I have a question regarding the 2nd trick, which is:

  "building the features one layer at a time"

From what I have read, Geoffrey Hinton uses RBMs to do the one-layer-at-a-time training, but I don't know much about Boltzmann machines yet. I have a working backpropagation implementation running on a multi-GPU system, and I was wondering if I could use it to train a deep belief network by building the features one layer at a time, exactly as explained in the video. It seems similar to training an auto-associative (autoencoder) network, just sliding it up one level as the layer index increases. I would train N layers of features, and at the end I would add a small fully connected 3-layer network so it could decide which features to use for the corresponding output. Will this technique work, or do these principles only apply to Boltzmann machines?

Thank you very much in advance.

asked Nov 28 '10 at 11:30


Nulik


One Answer:

Backpropagation can be used without modification to train both the unsupervised and supervised phases of stacked denoising autoencoders. The book-length paper "Learning Deep Architectures for AI" by Yoshua Bengio explains how this is done, and you can find example code in the Theano deep learning tutorial.
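
To make the recipe concrete, here is a minimal numpy sketch (my own illustrative code, not the tutorial's) of greedy layer-wise pretraining with plain backprop: each denoising autoencoder reconstructs its clean input from a corrupted copy, and its hidden codes become the training data for the next layer. The layer sizes, noise level, and learning rate are arbitrary assumptions.

    import numpy as np

    rng = np.random.default_rng(0)

    def sigmoid(x):
        return 1.0 / (1.0 + np.exp(-x))

    def pretrain_layer(X, n_hidden, epochs=10, lr=0.1, noise=0.3):
        """Train one denoising autoencoder on X; return weights and clean codes."""
        n_visible = X.shape[1]
        W = rng.normal(0.0, 0.01, size=(n_visible, n_hidden))
        b_h = np.zeros(n_hidden)
        b_v = np.zeros(n_visible)
        for _ in range(epochs):
            # Corrupt the input with masking noise; reconstruct the clean input.
            X_noisy = X * (rng.random(X.shape) > noise)
            H = sigmoid(X_noisy @ W + b_h)      # encode
            R = sigmoid(H @ W.T + b_v)          # decode (tied weights)
            # Backprop of the squared reconstruction error through both layers.
            d_R = (R - X) * R * (1.0 - R)
            d_H = (d_R @ W) * H * (1.0 - H)
            W -= lr * (X_noisy.T @ d_H + d_R.T @ H) / len(X)
            b_h -= lr * d_H.mean(axis=0)
            b_v -= lr * d_R.mean(axis=0)
        return W, b_h, sigmoid(X @ W + b_h)     # clean codes feed the next layer

    X = rng.random((500, 64))                   # toy unlabeled data
    codes, stack = X, []
    for n_hidden in (32, 16):                   # build two feature layers
        W, b_h, codes = pretrain_layer(codes, n_hidden)
        stack.append((W, b_h))

After pretraining, the stacked weights initialize a feed-forward network that is fine-tuned with ordinary supervised backprop, exactly as the question proposes.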

answered Nov 28 '10 at 11:56


Alexandre Passos ♦


Indeed. Stacked denoising autoencoders share some of the nice properties of deep belief networks. The major difference is that there is no natural way to sample from the model: it is more useful as an (un-/semi-)supervised pre-trained discriminative model, while DBNs can be used both generatively and discriminatively.

(Nov 28 '10 at 12:07) ogrisel

DBNs can be 'fine-tuned' using backpropagation as well. The RBMs are generally trained one at a time using an algorithm called contrastive divergence (or one of its variants); every layer is trained on the output of the previously trained module. At the very end you can choose to train the system discriminatively by adding a layer of classification/regression nodes on top and interpreting the whole stack as a standard multilayer perceptron while doing backpropagation.
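
For reference, here is a hedged numpy sketch of a single CD-1 update for one binary RBM (illustrative names and hyperparameters, not code from any particular library); in the greedy procedure, each RBM's hidden activations become the training data for the next one.

    import numpy as np

    rng = np.random.default_rng(0)

    def sigmoid(x):
        return 1.0 / (1.0 + np.exp(-x))

    def cd1_update(v0, W, b_h, b_v, lr=0.1):
        """One CD-1 step on a batch of binary visible vectors v0."""
        # Positive phase: infer and sample hidden units from the data.
        p_h0 = sigmoid(v0 @ W + b_h)
        h0 = (rng.random(p_h0.shape) < p_h0).astype(float)
        # Negative phase: one Gibbs step (reconstruct visibles, re-infer hiddens).
        p_v1 = sigmoid(h0 @ W.T + b_v)
        p_h1 = sigmoid(p_v1 @ W + b_h)
        # Approximate gradient: data statistics minus reconstruction statistics.
        W += lr * (v0.T @ p_h0 - p_v1.T @ p_h1) / len(v0)
        b_h += lr * (p_h0 - p_h1).mean(axis=0)
        b_v += lr * (v0 - p_v1).mean(axis=0)
        return W, b_h, b_v

    v0 = (rng.random((100, 64)) < 0.5).astype(float)   # toy binary batch
    W = rng.normal(0.0, 0.01, size=(64, 32))
    W, b_h, b_v = cd1_update(v0, W, np.zeros(32), np.zeros(64))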

When using autoencoders you want to be sure they are not just copying the input, e.g., by choosing smart starting weights, adding regularization, or applying some form of noise to the inputs.
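
As an illustration of the corruption trick (the noise types and levels here are arbitrary choices):

    import numpy as np

    rng = np.random.default_rng(0)
    X = rng.random((4, 8))                            # toy clean inputs

    masked = X * (rng.random(X.shape) > 0.3)          # zero out ~30% of entries
    noisy = X + rng.normal(0.0, 0.1, size=X.shape)    # additive Gaussian noise
    # The autoencoder is trained to reconstruct the clean X from the corrupted
    # version, so it cannot succeed by learning the identity mapping.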

(Nov 28 '10 at 16:11) Philemon Brakel
