I'm trying to replicate some of the experiments in the following papers:

http://www.cs.toronto.edu/~ilya/pubs/2011/LANG-RNN.pdf

http://www.cs.toronto.edu/~jmartens/docs/RNN_HF.pdf

They basically talk about how to apply Hessian-free optimization (similar to truncated Newton) to training recurrent neural networks. Once trained, the network should be able to predict the next step in a time series. This is the part I'm stuck on.

In paper [1], there's a short section called "The RNN as a Generative Model" that says (paraphrasing):

given a training sequence (x_1, ..., x_T) (each x_t being a real-valued vector, t referring to a time step), the RNN uses the sequence of its output vectors (o_1, ..., o_T) to obtain a sequence of predictive distributions P(x_{t+1} | x_1..x_t) = softmax(o_t). The objective is to maximize the total log probability of the training sequence, SUM over t of log P(x_{t+1} | x_1..x_t), which implies that the RNN learns a probability distribution over sequences. Even though the hidden units are deterministic, we can sample from an MRNN stochastically because the states of its output units define the conditional distribution softmax(o_t). We can sample from this conditional distribution to get the next character in a generated string and provide it as the next input to the RNN. This means that the RNN is a directed non-Markov model and somewhat resembles the sequence memoizer.
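For concreteness, here's how I read that objective in code (a rough NumPy sketch; the array shapes and the `sequence_log_prob` name are just my own illustration, not from the paper):

```python
import numpy as np

def softmax(o):
    """Turn an output vector o_t into a probability distribution."""
    e = np.exp(o - np.max(o))   # subtract the max for numerical stability
    return e / e.sum()

def sequence_log_prob(outputs, targets):
    """Total log probability of a training sequence under the model.

    outputs : output vectors o_1..o_{T-1}, one per time step
    targets : integer indices of the symbol that actually came next (x_2..x_T)
    """
    total = 0.0
    for o_t, x_next in zip(outputs, targets):
        p = softmax(o_t)            # predictive distribution P(x_{t+1} | x_1..x_t)
        total += np.log(p[x_next])  # log probability assigned to the true next symbol
    return total
```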

Can someone explain what this means? I get that I'm supposed to take a sequence and do a forward pass through the trained weights and biases, getting my output units. The softmax of the output units gives me the probability distribution over the next symbol at that step (4 dimensions in my case, something like [.25, .25, .25, .25]), and I can repeat this to get distributions for all steps in the sequence. But then what? Or am I doing it all wrong? I've never used a model to generate before, so I'm probably missing some things... Any insight very much appreciated!

asked Jul 13 '12 at 18:42

_DaveSullivan

One Answer:

I believe the model in the paper you're referring to is a character-level model; either way, we'll assume that's the case for the purpose of illustrating how you can use an RNN to generate sequences.

So at time t, you have an input x_t (a character) and the state of the hidden units. Assuming you've trained your model to predict the next character in the sequence, the output of the network (after the softmax) will be a distribution over characters. Sample a character from this distribution. Now treat this sampled character as the input at time t+1 and feed it back into your model. Rinse and repeat. You're generating sequences.
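Roughly, that loop might look like the following sketch (NumPy; `rnn_step` is a hypothetical stand-in for whatever your trained forward pass is, and the one-hot encoding of characters is just one possible convention):

```python
import numpy as np

def generate(rnn_step, h0, x0, n_steps, rng=np.random):
    """Sample a sequence from a trained character-level RNN.

    rnn_step : hypothetical function (x_t, h_t) -> (o_t, h_next); your trained forward pass
    h0, x0   : initial hidden state and initial input (e.g. a one-hot seed character)
    """
    h, x = h0, x0
    sampled = []
    for _ in range(n_steps):
        o, h = rnn_step(x, h)          # one forward step of the trained network
        e = np.exp(o - np.max(o))
        p = e / e.sum()                # softmax(o_t): distribution over the next character
        idx = rng.choice(len(p), p=p)  # sample the next character from that distribution
        sampled.append(idx)
        x = np.zeros_like(x)           # feed the sampled character back in as the next input
        x[idx] = 1.0
    return sampled
```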

answered Jul 13 '12 at 22:33

alto