I am wondering if there is some equivalence between a certain kind of restricted Boltzmann machine and pairwise Markov networks in terms of MAP inference. More specifically, let $y \in \{0,1\}^n$ be the output/visible layer and $h \in \{0,1\}^k$ be the hidden layer, with $k < n$ (possibly $k \ll n$). Consider the following style of MAP estimation over RBMs (note that instead of marginalizing over $h$, I am maximizing):

$$\arg\max_{y \in \{0,1\}^n,\, h \in \{0,1\}^k} \; y^T A h + u^T y + v^T h \quad (1)$$

for some given $A$, $u$, and $v$. Now consider a pairwise Markov network defined only over $y$, where inference is given by

$$\arg\max_{y \in \{0,1\}^n} \; y^T B y + w^T y \quad (2)$$

for some given $B$ and $w$. My question is: given an RBM with matrix $A$ and vectors $u$ and $v$, do there exist $B$ and $w$ such that (1) and (2) are maximized at the same $y$? If not in general, is this true under some conditions? Does the converse hold (i.e., given $B$ and $w$, can one construct an equivalent RBM) with $k < n$?
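For concreteness, here is a minimal brute-force sketch of the two objectives in numpy (my own illustration, just to pin down the notation; only feasible for tiny $n$ and $k$, and the function names are made up):

```python
import itertools
import numpy as np

def map_rbm(A, u, v):
    """Brute-force MAP for objective (1): max over y, h of y^T A h + u^T y + v^T h."""
    n, k = A.shape
    best, best_y = -np.inf, None
    for y in itertools.product([0, 1], repeat=n):
        y = np.array(y)
        for h in itertools.product([0, 1], repeat=k):
            h = np.array(h)
            val = y @ A @ h + u @ y + v @ h
            if val > best:
                best, best_y = val, y
    return best_y, best

def map_mrf(B, w):
    """Brute-force MAP for objective (2): max over y of y^T B y + w^T y."""
    n = B.shape[0]
    best, best_y = -np.inf, None
    for y in itertools.product([0, 1], repeat=n):
        y = np.array(y)
        val = y @ B @ y + w @ y
        if val > best:
            best, best_y = val, y
    return best_y, best
```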
An RBM is an instance of a pairwise Markov network (= [pairwise] random field = pairwise undirected graphical model) with latent variables. Does that answer your question? You can just let the $y$ in the Markov network be the concatenation of $y$ and $h$ in the RBM. Training is different since $h$ is not observed, but that doesn't change inference formally. Because of the special structure of the RBM, MCMC inference is somewhat easier than in a general pairwise random field.
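To make the concatenation concrete, here is a sketch (my own illustration, following the conventions of (1) and (2)): the joint pairwise network over $z = (y, h)$ puts the RBM's bipartite couplings in the off-diagonal block and has no within-layer interactions, so (2)-style MAP over $z$ is exactly the joint maximization in (1).

```python
import numpy as np

def rbm_to_joint_mrf(A, u, v):
    """Embed RBM parameters (A, u, v) into a pairwise network over z = (y, h):
    z^T B z + w^T z  with  B = [[0, A], [0, 0]]  and  w = [u; v]."""
    n, k = A.shape
    B = np.zeros((n + k, n + k))
    B[:n, n:] = A          # couplings only between visible and hidden units
    w = np.concatenate([u, v])
    return B, w

# MAP in the joint network then coincides with (1):
#   z_star, val = map_mrf(*rbm_to_joint_mrf(A, u, v))   # z_star[:n] maximizes (1)
```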
Thanks for the response, Andreas. However, I am looking solely at expressiveness from the point of view of the output variables $y$. That is, for a given set of output variables $y$, you either directly use pairwise energy functions as in (2), or you introduce hidden variables and use the objective function in (1). My question is: what difference in expressiveness is there from the perspective of $y$? And when are the two equivalent?
(Dec 07 '12 at 16:13)
Rajhans Samdani
The RBM is more powerful as it allows higher-order interactions. With a pairwise MRF, you cannot encode "two of these three should be on" (hope the example is correct ;). I don't have a proof ready that each MRF can be modeled as an RBM, but I'm pretty sure it's true. Maybe add one hidden node to the RBM per edge in the MRF and give it weights only to the two nodes incident to the edge; I feel that should somehow do it. Edit: you could model all the possible states of an edge (i.e. 00, 01, 10, 11) if you add two binary hidden nodes per edge. Maybe this is not the simplest way, but I think it works.
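If one only cares about the MAP objective (1) (rather than matching distributions), note that maximizing out a single binary hidden unit with visible weights $a$ and bias $c$ adds $\max(0, a^T y + c)$ to the objective, and under that observation one hidden unit per edge already appears to suffice. Here is a sketch of such a construction (my own attempt at making the per-edge idea concrete, with a made-up helper name; treat it as a conjecture to verify, not an established reduction):

```python
import numpy as np

def mrf_to_rbm(B, w):
    """Sketch: build (A, u, v) so that objective (1), after maximizing over h,
    equals objective (2) for every y. One hidden unit per nonzero edge."""
    n = B.shape[0]
    u = w.astype(float) + np.diag(B)     # y_i^2 = y_i for binary y
    cols, biases = [], []
    for i in range(n):
        for j in range(i + 1, n):
            b = B[i, j] + B[j, i]        # total coefficient of y_i * y_j in (2)
            if b == 0:
                continue
            a = np.zeros(n)
            if b > 0:
                # max_h h*(b*y_i + b*y_j - b) = b * y_i * y_j
                a[i], a[j] = b, b
                biases.append(-b)
            else:
                # max_h h*(b*y_i - b*y_j) = -b*(1 - y_i)*y_j, and
                # b*y_i*y_j = b*y_j + (-b)*(1 - y_i)*y_j
                a[i], a[j] = b, -b
                biases.append(0.0)
                u[j] += b
            cols.append(a)
    A = np.stack(cols, axis=1) if cols else np.zeros((n, 0))
    return A, u, np.array(biases)

# brute-force sanity check, reusing map_rbm / map_mrf from the sketch above
rng = np.random.default_rng(0)
B = rng.normal(size=(4, 4))
w = rng.normal(size=4)
A, u, v = mrf_to_rbm(B, w)
assert np.isclose(map_mrf(B, w)[1], map_rbm(A, u, v)[1])
```

Note that this uses one hidden unit per edge, so $k$ can be as large as $n(n-1)/2$; it says nothing about the low-rank regime.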
(Dec 07 '12 at 17:00)
Andreas Mueller
Thanks, Andreas. Regarding conversion from MRFs to RBMs, I am looking for a "low-rank" conversion, i.e., converting an MRF to an RBM with a small number of hidden units (number of hidden units < number of output variables, preferably $\ll$). Any references for this one?
(Dec 08 '12 at 17:03)
Rajhans Samdani
Raj, this is a really interesting question! But do you think there is such a low-rank conversion that is efficient in general? My worry is that the performance of such a conversion would be highly dependent on the actual distribution that you are modeling.
(Dec 09 '12 at 07:37)
Oscar Täckström
Well, actually I am not really looking for a direct conversion per se... and I agree that it might be hard/futile. I am just trying to theoretically explain the ease/difficulty of learning MRFs using hidden-variable models like RBMs.
(Dec 09 '12 at 22:01)
Rajhans Samdani