I'm trying to replicate some of the experiments in the following papers: http://www.cs.toronto.edu/~ilya/pubs/2011/LANG-RNN.pdf http://www.cs.toronto.edu/~jmartens/docs/RNN_HF.pdf They basically describe how to apply Hessian-free optimization (similar to truncated Newton) to training recurrent neural networks. Then, once trained, the network should be able to predict the next step in a time series. This is the part I'm stuck on. In paper [1], there's a short section called "The RNN as a Generative Model" that says (paraphrasing):
Can someone explain what this means? I get that I'm supposed to take a sequence and do a forward pass through the trained weights and biases, getting my output units. The softmax of the output units gives me the probability distribution over that step in the sequence (4 dimensions, something like [.25, .25, .25, .25]), and I can repeat this to get probabilities for all steps in the sequence. But then what? Or am I doing it all wrong? I've never used a model to generate before, so I'm probably missing something... Any insight very much appreciated!
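For concreteness, here's a minimal sketch of the part I think I understand. The softmax helper and the example activations are just made up for illustration, not taken from the paper:

```python
import numpy as np

def softmax(z):
    # Numerically stable softmax: subtract the max before exponentiating.
    e = np.exp(z - np.max(z))
    return e / e.sum()

# Made-up output-unit activations for a single time step of a 4-symbol model.
output_units = np.array([1.2, 0.3, -0.5, 0.8])
probs = softmax(output_units)
print(probs)  # -> roughly [0.44, 0.18, 0.08, 0.30], a distribution over the 4 symbols
```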
I believe the model in the paper you're referring to is a character-level model; either way, we'll assume that's the case for the purpose of illustrating how you can use an RNN to generate sequences. So at time t, you have an input x_t (a character) and the state of the hidden units. The output of the network, assuming you've trained your model to predict the next character in the sequence, will be a distribution over characters. Sample a character from this distribution. Now treat that sampled character (the output) as the input at time t+1 and feed it back into your model. Rinse and repeat, and you're generating sequences.
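Here's a minimal sketch of that sampling loop, assuming a plain one-layer tanh RNN with one-hot character inputs. The weight names (Wxh, Whh, Why, bh, by) and the generate helper are hypothetical, just for illustration; the papers use their own RNN formulation:

```python
import numpy as np

def softmax(z):
    # Numerically stable softmax over the output units.
    e = np.exp(z - np.max(z))
    return e / e.sum()

def generate(Wxh, Whh, Why, bh, by, chars, seed_idx, n_steps, rng=None):
    """Sample a sequence from a trained vanilla (tanh) RNN, one character at a time.

    Wxh, Whh, Why, bh, by are the trained weights/biases; chars maps indices
    to characters; seed_idx is the index of the first input character.
    """
    rng = rng or np.random.default_rng()
    vocab_size, hidden_size = Why.shape[0], Whh.shape[0]

    h = np.zeros(hidden_size)       # hidden state
    x = np.zeros(vocab_size)        # one-hot encoding of the current input character
    x[seed_idx] = 1.0
    generated = [chars[seed_idx]]

    for _ in range(n_steps):
        # Forward pass for one time step.
        h = np.tanh(Wxh @ x + Whh @ h + bh)
        p = softmax(Why @ h + by)   # distribution over the next character

        # Sample the next character from that distribution...
        idx = rng.choice(vocab_size, p=p)
        generated.append(chars[idx])

        # ...and feed the sample back in as the input at the next time step.
        x = np.zeros(vocab_size)
        x[idx] = 1.0

    return "".join(generated)
```

Note that you sample from the distribution rather than taking the argmax; always picking the single most likely character tends to produce repetitive output, while sampling is what gives you varied generated sequences.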