
I know that Hidden Markov Models are well suited to modelling speech signals, since they can handle latency in the data and variable-length sequences (each state has a self-transition probability, so the model can stay in the same state for a variable number of frames).
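For example, here is a toy sketch of what I mean by the self-transition handling variable length (arbitrary numbers, just for illustration):

import numpy as np

p_stay = 0.9                            # probability of staying in the same state
A = np.array([[p_stay, 1 - p_stay],     # toy 2-state transition matrix
              [0.0,    1.0]])
rng = np.random.default_rng(0)
state, dwell = 0, 0
while state == 0:                       # the self-transition lets the model sit in state 0
    dwell += 1                          # for a variable (geometrically distributed) number
    state = rng.choice(2, p=A[state])   # of frames, expected 1/(1 - p_stay) = 10 here
print(dwell)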

Is there an equivalent feature in Conditional RBMs or RNNs when it comes to modelling time series?

Thanks.

asked Apr 09 '12 at 10:27 by rm9


One Answer:

I may be mistaken, but in my mind latency in a time series amounts to the same thing as what is demonstrated by the addition problem discussed in Martens' paper on HF optimization applied to RNNs: generate a sequence of inputs of the form:

{ {x0,y0}, {x1,y1}, {x2,y2}, ..., {xN,yN} }

... where the x's are real-valued numbers and the y's are binary indicator variables. The objective is to track the sum:

s = x0*y0 + x1*y1 + ... + xN*yN

The purpose of this problem is to demonstrate the RNN's ability to handle long-term dependencies (e.g., tens or hundreds of values in a row with y_j == 0 before another y_k == 1 shows up).
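Concretely, generating one such training case might look like this (a rough sketch; the sequence length and the number and placement of the marked positions are arbitrary, not the paper's exact setup):

import numpy as np

rng = np.random.default_rng(0)
N = 100                                         # arbitrary sequence length
x = rng.uniform(0.0, 1.0, size=N)               # real-valued inputs x0..xN
y = np.zeros(N)                                 # binary indicators, mostly zero
y[rng.choice(N, size=2, replace=False)] = 1.0   # mark a couple of far-apart positions
target = float(np.sum(x * y))                   # s = x0*y0 + x1*y1 + ... + xN*yN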

answered Apr 09 '12 at 10:44 by Brian Vandenberg

According to that paper, an RNN learns a transfer matrix from the previous hidden state to the next hidden state, in addition to the regular input-to-hidden matrix. I don't understand how this helps handle delays, unlike an HMM, which can stay in the same state and "wait".
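As I understand it, the update is roughly:

h_t = tanh( W_hh * h_{t-1} + W_xh * x_t + b )

... where W_hh is that hidden-to-hidden transfer matrix and W_xh the input-to-hidden matrix (my notation).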

(Apr 09 '12 at 10:58) rm9

Take a look at my answer in the question I'll link at the end of this comment. In short, an RNN with fixed/constant weights behaves like a dynamical system over its hidden units. Good weights create attractors that keep units, or sets of related units, in certain states (or complementary states?). These attractors give the system a memory-like behavior, allowing it to retain some piece of information over long periods of time. Information arriving through a time-delayed response like the one you've described could potentially be stored within that memory state of the RNN's hidden units. Here's the link: http://metaoptimize.com/qa/questions/8737/whats-the-difference-between-conditional-rbms-and-recurrent-neural-networks
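Here's a rough illustration of what I mean, with a single tanh unit and hand-picked (not learned) weights; once pushed towards +1 it sits there through a long run of zero inputs:

import numpy as np

w_hh, w_xh = 5.0, 5.0                   # fixed toy recurrent and input weights
inputs = [1.0] + [0.0] * 50 + [-1.0] + [0.0] * 50
h = 0.0
for x in inputs:
    h = np.tanh(w_hh * h + w_xh * x)    # same fixed weights at every step
    # the first pulse drives h to ~+1, where it stays through the zeros (an attractor);
    # the opposite pulse knocks it into the basin of the ~-1 attractor
print(h)                                # ends up near -1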

(Apr 09 '12 at 12:13) Brian Vandenberg

What bothers me is that the hidden-to-hidden weights in an RNN are static. I would expect them to change over time. Also, it only looks at the previous step. What am I missing?

(Apr 09 '12 at 13:48) rm9

Why does it bother you? :) RNNs can approximate any (measurable) sequence-to-sequence mapping and are also Turing complete. Thus, with the setup described in the Martens paper, you have all the computational power you want, at least in theory. (Btw, the text-generating RNN by Sutskever and Martens does in a way have changing weights, since the inputs and hidden states interact multiplicatively.)
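Roughly, the multiplicative trick looks like this (illustrative sizes and variable names, not necessarily the paper's exact factored parameterization):

import numpy as np

H, F, V = 4, 6, 3                       # hidden units, factors, vocabulary size (made up)
rng = np.random.default_rng(0)
W_fx = rng.standard_normal((F, V))      # input -> factor gates
W_fh = rng.standard_normal((F, H))      # hidden -> factors
W_hf = rng.standard_normal((H, F))      # factors -> hidden
x_t = np.eye(V)[0]                      # one-hot input symbol at time t
h_prev = rng.standard_normal(H)
# the effective hidden-to-hidden matrix is rescaled by the current input, so the mapping
# applied to h_prev changes with x_t even though the learned parameters stay fixed
W_hh_eff = W_hf @ np.diag(W_fx @ x_t) @ W_fh
h_t = np.tanh(W_hh_eff @ h_prev)        # (input-to-hidden term omitted for brevity)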

(Apr 09 '12 at 14:05) Justin Bayer

The weights aren't static during training, only once the RNN is put to real-world use (or at least as real-world as your experiment gets). In the Martens/Sutskever papers, they used a damping scheme that encourages the system to learn weights that try to keep the hidden-unit representation from changing over time, but it doesn't prevent the weights themselves from changing.
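As I recall, that structural damping term roughly augments the Hessian-free quadratic model like this (my paraphrase, not an exact quote from the paper):

damped_model(d) ≈ quadratic_model(d) + (lambda/2)*||d||^2 + lambda*mu*D( h(theta + d), h(theta) )

... where d is a candidate parameter update and D is a distance (e.g. squared error) between the hidden-state sequences before and after applying it, so large changes to the hidden states are penalized rather than changes to the weights themselves.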

(Apr 09 '12 at 14:12) Brian Vandenberg

OK, I guess I will have to see it in action to fully understand the dynamics. Btw, another interesting paper about RNNs: http://arxiv.org/abs/1111.4259

Thanks all.

(Apr 09 '12 at 14:20) rm9

That paper sounds very interesting, though I haven't worked with Krylov subspaces before, so I fear it's going to take a lot of Pepsi to get me through it.

(Apr 09 '12 at 14:37) Brian Vandenberg

I have tried KSD on RNNs and did not have any success, but Oriol indicated at NIPS that it works with RNNs, so maybe I just did it wrong. My implementation did work on deep networks, though.

(Apr 11 '12 at 05:25) Justin Bayer

I hope you'll forgive my ignorance here, but... what's KSD? The only thing I could find is the kernel subdivision algorithm, but considering the acrobatics I had to go through to arrive at that, I'm not at all certain it's what you're referring to.

(Apr 11 '12 at 11:48) Brian Vandenberg

Krylov Subspace Descent, the optimizer introduced in the above paper.

(Apr 12 '12 at 04:43) Justin Bayer