I know that Hidden Markov Models are good for modelling speech signals since they can handle latency in the data and variable-length data (by having a transition probability to stay in the same state). Is there an equivalent feature in Conditional RBMs or RNNs (when talking about modelling time series)? Thanks.
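For reference, a minimal sketch of the self-loop transition structure I mean (illustrative numbers, not from any particular speech system):

```python
import numpy as np

# A left-to-right HMM transition matrix with self-loops. Each state can stay
# where it is (diagonal) or advance to the next state, so the same 3-state
# model can explain observation sequences of very different lengths.
A = np.array([
    [0.7, 0.3, 0.0],   # state 0: stay with prob 0.7, advance with prob 0.3
    [0.0, 0.6, 0.4],   # state 1
    [0.0, 0.0, 1.0],   # state 2: absorbing final state
])

# Expected dwell time in state i is 1 / (1 - A[i, i]), e.g. about 3.3 steps
# in state 0; this self-loop is how the HMM "waits" out latency and absorbs
# variable durations.
```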
I may be mistaken, but latency in the time series in my mind amounts to the same thing as what is demonstrated by the addition problem discussed in Martens' paper on HF optimization applied to RNNs: generate a sequence of input pairs

(x_0, y_0), (x_1, y_1), ..., (x_T, y_T)

where x_t is a real-valued number and y_t is a binary indicator variable. The objective is to track the sum of the x_t at the marked positions, i.e. those with y_t = 1.
The purpose of this problem is to demonstrate the RNN's ability to handle long-term dependencies (e.g. tens or hundreds of values in a row with y_t = 0 between the marked positions). According to that paper, the RNN learns a transfer matrix from the previous hidden state to the next hidden state in addition to the regular input-to-hidden matrix. I don't understand how this helps with handling delays, unlike an HMM, which can stay in the same state and "wait".
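For concreteness, here is roughly how I picture the data generation (my own sketch, not code from the paper; the exact distributions may differ):

```python
import numpy as np

def make_addition_example(T, rng=np.random):
    """Generate one sequence for the addition problem.

    Each time step is a pair (x_t, y_t): x_t is a random real value and
    y_t is a binary marker. Exactly two steps are marked, and the target
    is the sum of the x values at the marked steps.
    """
    x = rng.uniform(0.0, 1.0, size=T)             # real-valued inputs
    y = np.zeros(T)                               # binary indicators
    i, j = rng.choice(T, size=2, replace=False)   # the two marked positions
    y[i] = y[j] = 1.0
    target = x[i] + x[j]                          # sum of the marked values
    return np.stack([x, y], axis=1), target       # inputs of shape (T, 2)

# The marked positions can be arbitrarily far apart, which is exactly what
# forces the network to carry information across long delays.
inputs, target = make_addition_example(100)
```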
(Apr 09 '12 at 10:58)
rm9
Take a look at my answer in the question I'll link at the end of this comment. In short, when you consider an RNN with fixed/constant weights, the behavior of the hidden units is that of a dynamical system. Good weights create attractors that keep units, or sets of related units, in certain states (or complementary states?). These attractors induce a memory-like behavior in the system, allowing it to remember some piece of information over long periods of time. Information propagated through a time-delayed response like the one you've described could potentially be stored within the memory state of an RNN's hidden units. Here's the link: http://metaoptimize.com/qa/questions/8737/whats-the-difference-between-conditional-rbms-and-recurrent-neural-networks
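A toy illustration of that idea (my own, deliberately oversimplified): a single unit with a strong, fixed self-connection has two attractors near -1 and +1, and a brief input pulse can flip it from one basin to the other, after which it simply stays put.

```python
import numpy as np

def step(h, x, w_rec=5.0, w_in=10.0):
    """One update of a single hidden unit with fixed weights.

    With a strong self-connection, h -> tanh(w_rec * h) has two stable
    fixed points near -1 and +1; a strong input pulse can push the unit
    from one basin of attraction into the other.
    """
    return np.tanh(w_rec * h + w_in * x)

h = -1.0                          # start near the "off" attractor
for t in range(50):
    x = 1.0 if t == 10 else 0.0   # a single input pulse at t = 10
    h = step(h, x)

# h is still close to +1 long after the pulse: the *fixed* weights created
# an attractor that stores one bit of information indefinitely.
print(round(h, 3))                # ~1.0
```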
(Apr 09 '12 at 12:13)
Brian Vandenberg
What bothers me is that the hidden-to-hidden weights in an RNN are static; I would expect them to change over time. Also, it only looks at the previous step... what am I missing?
(Apr 09 '12 at 13:48)
rm9
Why does it bother you? :) RNNs are able to approximate any (measurable) sequence-to-sequence mapping and are also Turing complete. Thus, with the setup described in the Martens paper, you have all the computational power you want, at least in theory. (Btw, the text-generating RNN by Sutskever and Martens does in a way have changing weights, since the inputs and hidden states interact multiplicatively.)
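To make that parenthetical concrete, here is a rough sketch of a multiplicative hidden update in the spirit of that paper (my own simplification; sizes and initialization are arbitrary). The effective hidden-to-hidden matrix W_hf * diag(W_fx x_t) * W_fh depends on the current input, so in that sense the recurrent "weights" do change from step to step:

```python
import numpy as np

rng = np.random.default_rng(0)
n_in, n_hid, n_fac = 4, 8, 6      # input, hidden and factor sizes (arbitrary)

# Factored parameters: the effective hidden-to-hidden matrix is
# W_hf @ diag(W_fx @ x_t) @ W_fh, so it is different for every input x_t.
W_fx = rng.normal(0.0, 0.1, (n_fac, n_in))
W_fh = rng.normal(0.0, 0.1, (n_fac, n_hid))
W_hf = rng.normal(0.0, 0.1, (n_hid, n_fac))
W_hx = rng.normal(0.0, 0.1, (n_hid, n_in))
b = np.zeros(n_hid)

def mrnn_step(h_prev, x):
    """One multiplicative-RNN hidden update (a sketch, not the paper's code)."""
    f = (W_fx @ x) * (W_fh @ h_prev)   # the input gates the recurrent pathway
    return np.tanh(W_hf @ f + W_hx @ x + b)

h = np.zeros(n_hid)
for x in rng.normal(size=(20, n_in)):  # a random length-20 input sequence
    h = mrnn_step(h, x)
```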
(Apr 09 '12 at 14:05)
Justin Bayer
The weights aren't static during training, only during use of the trained RNN in the real world (or at least as real-world as your experiment gets). In the Martens/Sutskever papers, they used a damping scheme that encourages the system to learn weights that try to keep the hidden-unit representation from changing over time, but it doesn't inhibit the weights from changing.
(Apr 09 '12 at 14:12)
Brian Vandenberg
OK, I guess I will have to see it in action to fully understand the dynamics. Btw, here is another interesting paper about RNNs: http://arxiv.org/abs/1111.4259. Thanks, all.
(Apr 09 '12 at 14:20)
rm9
That paper sounds very interesting, though I haven't worked with Krylov subspaces before, so I fear it's going to take a lot of Pepsi to get me through it.
(Apr 09 '12 at 14:37)
Brian Vandenberg
I have tried KSD on RNNs and did not have any success, but Oriol indicated that it's working with RNNs at NIPS, so maybe I just did it wrong. My implementation did work on deep networks, though.
(Apr 11 '12 at 05:25)
Justin Bayer
I hope you'll forgive my ignorance here, but... what's KSD? The only thing I could find is the kernel subdivision algorithm, but considering the acrobatics I had to go through to arrive at that, I'm not at all certain it's what you're referring to.
(Apr 11 '12 at 11:48)
Brian Vandenberg
Krylov Subspace Descent, the optimizer introduced in the above paper.
(Apr 12 '12 at 04:43)
Justin Bayer