I understand that both conditional RBMs and RNNs can be used to model time series data, but I'm not clear on the difference between the two and when you might use one over the other. Can anyone explain?
Typically, when conditional RBMs are used for time series modeling (as in "Modeling Human Motion Using Binary Latent Variables"), the model is simply an RBM that conditions on previous observations, so in the rest of my answer, for concreteness, this is the sort of conditional RBM model I will assume. I will also assume we only have one sequence v(1), ..., v(t), ..., not two paired sequences of observations and outputs, i.e. unlabeled sequence data.

Once we condition on the previous observation(s), in the conditional RBM case we end up with an RBM for the current observation. Depending on what we conditioned on, this RBM might end up with different effective biases, but it is still just an RBM for the current time step v(t). This RBM defines a joint distribution P(v(t), h(t) | v(t-1)). In other words, it defines a distribution over the hidden and visible state at time t given the previous observation. So we can, among other things, marginalize away the hidden state h(t) and sample settings of the visible units. Also, note that in this case we don't have connections from previous hidden states. The conditional RBM is inherently a probabilistic model, since you get an RBM after you do the conditioning. Typically, such RBMs are trained with a crude approximation to maximum likelihood.

Contrast this with a recurrent network that predicts the next observation from the hidden state at the previous time step. Such an RNN is defined by the following two parameterized functions: h(t) = f(v(t), h(t-1); U) and v(t+1) = g(h(t); W), where U and W are the model parameters. Both f and g are nonlinear functions, typically formed by applying a matrix and then an elementwise nonlinearity. This RNN can be fully deterministic. If it is able to stochastically generate sequences, the stochasticity will probably arise from setting up g to produce a distribution over v(t+1), sampling from that distribution, and feeding the sample back into f for the next time step. In this way the RNN can be used to create a distribution over sequences. RNN gradients can be obtained using backpropagation through time once you have defined a suitable error function for the predicted v(t)'s; then you can apply any optimizer you like. You can also think of unrolling the RNN into a feedforward neural net.

As is hopefully clear, there are many differences of various levels of importance between the two models I described above. In particular, remember that the RNN has rich nonlinear hidden-state dynamics because of the recurrent connections, whereas the conditional RBM doesn't have recurrent connections. Also, the conditional RBM provides an undirected model of P(v(t), h(t)) with directed connections from the past, while the RNN needs the (feed-forward) connections from the hidden state to the next observation (the g function in my notation) to define a distribution over the next observation.

Use an RNN if you need a sequence model with rich, nonlinear hidden dynamics. Use a conditional RBM if you want a sequence of RBMs through time that are allowed to look at some of the inputs to previous RBMs in the sequence.
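To make the RNN half of the comparison concrete, here is a minimal NumPy sketch of the two parameterized functions described above, h(t) = f(v(t), h(t-1); U) and v(t+1) = g(h(t); W), together with the sample-and-feed-back loop used for stochastic generation. The names f, g, U, and W mirror the notation in the answer; the layer sizes, the tanh/sigmoid choices, and the Bernoulli output are illustrative assumptions, and training (backpropagation through time with a suitable loss on the predicted v(t)'s) is left out to keep the sketch short.

```python
import numpy as np

rng = np.random.default_rng(0)
n_vis, n_hid = 20, 50          # illustrative sizes, not from the answer

# Parameters of f and g (biases folded in / omitted for brevity)
U_h = rng.normal(scale=0.1, size=(n_hid, n_hid))   # h(t-1) -> h(t)
U_v = rng.normal(scale=0.1, size=(n_hid, n_vis))   # v(t)   -> h(t)
W   = rng.normal(scale=0.1, size=(n_vis, n_hid))   # h(t)   -> v(t+1)

def f(v_t, h_prev):
    """h(t) = f(v(t), h(t-1); U): matrix multiplies plus an elementwise nonlinearity."""
    return np.tanh(U_v @ v_t + U_h @ h_prev)

def g(h_t):
    """g(h(t); W): here it outputs Bernoulli probabilities over the next frame."""
    return 1.0 / (1.0 + np.exp(-(W @ h_t)))

def generate(v0, steps):
    """Stochastic generation: sample v(t+1) from g's distribution and feed it back into f."""
    v, h = v0, np.zeros(n_hid)
    seq = [v0]
    for _ in range(steps):
        h = f(v, h)                                     # update the hidden state
        p_next = g(h)                                   # distribution over v(t+1)
        v = (rng.random(n_vis) < p_next).astype(float)  # sample and feed back
        seq.append(v)
    return np.stack(seq)

sample_sequence = generate(rng.integers(0, 2, n_vis).astype(float), steps=10)
print(sample_sequence.shape)   # (11, 20)
```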
George's answer is well written, but I do want to add one subtle point about the RBM model. Specifically, my comment relates to the CRBM model proposed in Graham Taylor's paper on modeling human motion. More recent models have been proposed that complicate matters by allowing you to condition on previous latent-variable states, not just on previous visibles; however, I'm not versed enough in them to go into detail.

When you get right down to it, the models in Graham's papers on modeling pigeon and human motion are only slightly more complicated RBMs: the previous time steps of the visibles contribute to the model by providing a dynamically changing bias.

The "brain state" of an RNN is more akin to the CRBM variants that condition on previous latent states, where the contribution from previous time steps is no longer just a dynamic bias. If you treat the weights of an RNN as constant, the behavior of its internal "memory" state can be thought of as a dynamical system. In a sense, good weights will create attractors that help induce stability in the internal memory state, which is what gives the RNN the ability to act on information over time in a dynamic fashion.

The CRBM, on the other hand, is far more static: it doesn't have a memory state. At each time step it starts fresh; it sees a set of previous frames (which may or may not have been generated by the system at previous time steps) and more or less mechanically operates on those previous frames to generate a prediction about the next frame. Just to emphasize: the CRBM does not have memory. In my opinion, it's somewhat on par with using a dictionary for a chat bot, no matter how clever the bot. This isn't to say CRBMs are bad; I think they're incredibly cool. I just wanted to make it clear where (in my mind at least) the differences lie.
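To illustrate the "dynamically changing bias" point, here is a minimal sketch of a binary CRBM step that conditions on a window of past frames. The matrices A and B (past-to-visible and past-to-hidden connections), the layer sizes, and the single Gibbs step are illustrative assumptions rather than the exact setup in Taylor's papers; the point is that the history only shifts the biases, after which you are left with an ordinary RBM.

```python
import numpy as np

rng = np.random.default_rng(1)
n_vis, n_hid, n_past = 20, 50, 3       # illustrative sizes

W   = rng.normal(scale=0.1, size=(n_hid, n_vis))           # ordinary RBM weights
b_v = np.zeros(n_vis)                                       # static visible bias
b_h = np.zeros(n_hid)                                       # static hidden bias
A   = rng.normal(scale=0.1, size=(n_vis, n_past * n_vis))   # past frames -> visible bias
B   = rng.normal(scale=0.1, size=(n_hid, n_past * n_vis))   # past frames -> hidden bias

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def predict_next_frame(past_frames):
    """One CRBM prediction: the past frames only shift the biases, then it's a plain RBM."""
    history = np.concatenate(past_frames)        # flatten the conditioning window
    b_v_dyn = b_v + A @ history                  # dynamic visible bias
    b_h_dyn = b_h + B @ history                  # dynamic hidden bias

    # A single (crude) Gibbs step in the resulting RBM, starting from the last frame
    v = past_frames[-1]
    h = (rng.random(n_hid) < sigmoid(W @ v + b_h_dyn)).astype(float)
    v_next = (rng.random(n_vis) < sigmoid(W.T @ h + b_v_dyn)).astype(float)
    return v_next

past = [rng.integers(0, 2, n_vis).astype(float) for _ in range(n_past)]
print(predict_next_frame(past).shape)   # (20,)
```

Nothing persists between calls except the visible frames you pass in, which is exactly the "no memory state" contrast with an RNN described above.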
As far as I understand it, the main difference is the overall interpretation. The architecture is similar to that of Hopfield networks, but while Hopfield networks are deterministic, RBMs are stochastic and generative; that is, you can generate new samples from the network.
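A toy sketch of that deterministic-versus-stochastic contrast (all sizes and weights here are made up for illustration): a Hopfield-style update maps the same state to the same next state every time, whereas an RBM-style Gibbs update samples its units, so repeated calls can produce new, different configurations.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 10

# Hopfield-style net: symmetric weights, zero diagonal, deterministic sign updates
W_hop = rng.normal(scale=0.3, size=(n, n))
W_hop = (W_hop + W_hop.T) / 2
np.fill_diagonal(W_hop, 0.0)

v_pm = rng.integers(0, 2, n) * 2 - 1          # a +/-1 state
print(np.sign(W_hop @ v_pm))                  # same input -> same output, every time

# RBM-style update: bipartite weights, binary 0/1 units, stochastic Gibbs sampling
W_rbm = rng.normal(scale=0.3, size=(n, n))    # hidden x visible

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def rbm_sample(v):
    """One up-down Gibbs pass: sample hiddens given visibles, then visibles given hiddens."""
    h = (rng.random(n) < sigmoid(W_rbm @ v)).astype(float)
    return (rng.random(n) < sigmoid(W_rbm.T @ h)).astype(float)

v01 = (v_pm > 0).astype(float)
print(rbm_sample(v01))                        # two calls from the same input
print(rbm_sample(v01))                        # will generally differ: new samples
```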