  1. In Hinton's 2007 NIPS tutorial, when training a DBN to model the digits, 10 label neurons are added to the top layer of the DBN. Should we always add label neurons when training a DBN? What if our training data has no labels at all?

  2. In the paper "To recognize shapes, first learn to generate images", the backpropagation algorithm is introduced to fine-tune the DBN for better discriminative performance. So, if we have already fine-tuned the DBN, how do we use it to discriminate? Just as with a standard artificial neural network?

asked May 14 '13 at 09:18


Chen You


2 Answers:

Hi,

  1. If you want to use the DBN as a classifier you need to add that layer on top of it, and you will need the labels. However, it still makes sense to build the network in an unsupervised way without that last classifier-layer. What you will get is a "higher level" representation of the data with each hidden layer you add to your stack. You can use this to compress the data, for example.

  2. You can see the DBN's greedy layer-wise pre-training as a better way to initialize the neural network weights. Once you have performed this pre-training, it is easier for the backprop algorithm to train the network as a classifier. What I would do is build a stack of 3-4 layers, pre-train it with the greedy unsupervised algorithm, add a classifier layer (i.e. with the same number of units as classes in your problem) on top of it, and train the whole network using standard backpropagation with the labels (see the sketch after this list). I have run experiments where backprop was only applied to the last layers of the stack, and the results were also good. As I said in the first point, what you get with the stack is a "high level" representation of the input data, so the classifier will benefit a lot from it even if you don't fine-tune the whole network with backprop.
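
For concreteness, here is a minimal sketch of that recipe in Python/NumPy, assuming a toy Bernoulli-Bernoulli RBM trained with one-step contrastive divergence (CD-1). The layer sizes, class and function names are purely illustrative, not anything from Hinton's papers:

    import numpy as np

    def sigmoid(x):
        return 1.0 / (1.0 + np.exp(-x))

    class RBM:
        """Bernoulli-Bernoulli RBM trained with 1-step contrastive divergence (CD-1)."""
        def __init__(self, n_visible, n_hidden, lr=0.1, seed=0):
            rng = np.random.RandomState(seed)
            self.W = 0.01 * rng.randn(n_visible, n_hidden)
            self.b_v = np.zeros(n_visible)   # visible biases
            self.b_h = np.zeros(n_hidden)    # hidden biases
            self.lr, self.rng = lr, rng

        def hidden_probs(self, v):
            return sigmoid(v @ self.W + self.b_h)

        def visible_probs(self, h):
            return sigmoid(h @ self.W.T + self.b_v)

        def cd1_update(self, v0):
            h0 = self.hidden_probs(v0)                                  # positive phase
            h0_sample = (self.rng.rand(*h0.shape) < h0).astype(float)   # sample hiddens
            v1 = self.visible_probs(h0_sample)                          # reconstruction
            h1 = self.hidden_probs(v1)                                  # negative phase
            n = v0.shape[0]
            self.W   += self.lr * (v0.T @ h0 - v1.T @ h1) / n
            self.b_v += self.lr * (v0 - v1).mean(axis=0)
            self.b_h += self.lr * (h0 - h1).mean(axis=0)

    # Greedy layer-wise pre-training of a 784-500-250 stack (toy sizes, random data
    # standing in for binarized digit images).
    X = (np.random.rand(1000, 784) > 0.5).astype(float)
    sizes = [784, 500, 250]
    rbms, data = [], X
    for n_vis, n_hid in zip(sizes[:-1], sizes[1:]):
        rbm = RBM(n_vis, n_hid)
        for epoch in range(5):
            rbm.cd1_update(data)        # full-batch CD-1, just to keep the sketch short
        data = rbm.hidden_probs(data)   # hidden activations become input to the next RBM
        rbms.append(rbm)

    # The pre-trained weights then initialize a feed-forward net: add a softmax
    # output layer with one unit per class on top and fine-tune the whole network
    # with ordinary backpropagation on the labelled data.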

I hope that helps.

Regards.

answered May 14 '13 at 11:51


David Diaz Vico

Thank you for your explanation; it helped a lot. But I am still not sure about this: if I were to use a DBN as a generative model, do I still have to add those labels? Another question is about the "classifier" you just mentioned. Does it refer merely to one layer of neurons, or does it also include the labels?

(May 14 '13 at 12:51) Chen You

A good generative model doesn't necessarily imply a good discriminative model. Getting a good generative model only implies you were able to model the underlying distribution of the data. For the model to be better at classification, one can use the labelled data to fine-tune it. In a DBN you can do that with backprop.

(May 14 '13 at 13:28) Rakesh Chalasani

If you just want to use the network to generate samples with the distribution of the training patterns, then you don't need labels.

I'm not sure, but I think that, once trained, if you propagate the activity up from the input layer to the last hidden layer, and then propagate it back down from the last hidden layer to the input layer, you should get the samples you want at the input layer. Just like a BM, but with a multilayer structure. For that purpose, though, a DBM is probably a better choice.
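
If it helps, the usual recipe for sampling from a trained DBN is to run a few alternating Gibbs steps in the top-level RBM and then do a single top-down pass. A rough sketch, reusing the hypothetical RBM class and rbms stack from the sketch in the answer above (everything here is illustrative):

    import numpy as np  # assumes the RBM class and pre-trained `rbms` stack from the earlier sketch

    def sample_from_dbn(rbms, n_gibbs=200, seed=1):
        """Draw one sample: alternating Gibbs in the top RBM, then a top-down pass."""
        rng = np.random.RandomState(seed)
        top = rbms[-1]
        v = (rng.rand(top.W.shape[0]) > 0.5).astype(float)   # random start for the chain
        for _ in range(n_gibbs):
            h = (rng.rand(top.W.shape[1]) < top.hidden_probs(v)).astype(float)
            v = (rng.rand(top.W.shape[0]) < top.visible_probs(h)).astype(float)
        for rbm in reversed(rbms[:-1]):   # deterministic down-pass through the lower layers
            v = rbm.visible_probs(v)
        return v                          # e.g. pixel probabilities in the input space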

When I say "classifier layer" I'm not being very precise, sorry. I mean a layer where you can "write" your labels and then train using backprop: what we call an output layer in a classical multilayer perceptron trained for classification. It's just that I'm a little bit lazy and my English is not very good ;)

Also, as Rakesh has pointed out, the generative pre-training doesn't give you a good classifier by itself. Notice that the objective of that training is not to predict any target, but more or less to make the network "learn the distribution" of the training patterns. However, in my limited experience, it is usually a good idea to pre-train a network that will then be trained for a more specific purpose.

Maybe you can find something of use here: http://arantxa.ii.uam.es/~gaa/. There is a somewhat more detailed explanation of these ideas in my master's thesis, and I also uploaded some very simple (and not totally finished or tested, I'm afraid) Octave code that reproduces some experiments with DBNs and stacked autoencoders. Of course, the papers of Hinton, Bengio, Ng and others are much better, but maybe a master's thesis is easier to follow if you are a beginner with these models.

(May 14 '13 at 14:28) David Diaz Vico
  1. He adds the label neurons so that he can (1) argue that the neural net is learning information about the 10 digits, and (2) backpropagate label error to improve the representation (discriminative fine-tuning).

  2. Yes. While not an exact inference technique, doing the matrix multiplications and a softmax on a DBN often ends up performing well (a rough sketch follows below).
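
For illustration, a minimal sketch of that discriminative use in Python/NumPy: a deterministic up-pass through the (fine-tuned) weights, followed by a softmax over the label units. The names hidden_layers, W_out and b_out are placeholders, not anything from the tutorial:

    import numpy as np

    def sigmoid(x):
        return 1.0 / (1.0 + np.exp(-x))

    def softmax(z):
        z = z - z.max(axis=-1, keepdims=True)       # subtract max for numerical stability
        e = np.exp(z)
        return e / e.sum(axis=-1, keepdims=True)

    def dbn_predict(x, hidden_layers, W_out, b_out):
        """Deterministic up-pass through the fine-tuned stack, then softmax over labels.

        hidden_layers -- list of (W, b) pairs for the hidden layers
        W_out, b_out  -- weights and bias of the label (output) layer
        """
        h = x
        for W, b in hidden_layers:
            h = sigmoid(h @ W + b)          # one matrix multiply + nonlinearity per layer
        return softmax(h @ W_out + b_out)   # class probabilities

    # usage: probs = dbn_predict(test_images, layers, W_out, b_out); preds = probs.argmax(axis=1)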

answered May 14 '13 at 11:28


Alexandre Passos ♦
