I have a small problem handling sums (integrations) over all combinations of vectors.

First one:

Say h is an n-dimensional (say binary) vector and we are summing a function over all possible values of h, that is, 2^n combinations. How exactly can we split this summation over all values of h into multiple sums over the individual components of h?

Essentially what is done is this:

$$\sum_{h^*} e^{f(h_k)}\, e^{f(h_{i \neq k})} = \sum_{h_k} e^{f(h_k)} \sum_{h^* \setminus h_k} e^{f(h_{i \neq k})}$$

I am missing an intermediate step to feel comfortable with it.
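Writing the sum over $h^*$ as nested sums over the individual components, the intermediate step seems to be:

$$\sum_{h^*} e^{f(h_k)}\, e^{f(h_{i \neq k})} = \sum_{h_k} \sum_{h^* \setminus h_k} e^{f(h_k)}\, e^{f(h_{i \neq k})} = \sum_{h_k} e^{f(h_k)} \sum_{h^* \setminus h_k} e^{f(h_{i \neq k})},$$

where the last equality holds because $e^{f(h_k)}$ is constant with respect to the inner sums and can be pulled out.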

Second one: h is still a vector of binary values, and h^* ranges over all possible h:

$$\sum_{h^*} \prod_j e^{h_j \left( c_j + (Wx)_j + U_j \right)} = \prod_j \left( 1 + e^{c_j + (Wx)_j + U_j} \right)$$


This one appears in the context of the discriminative RBM, when computing p(class|x).
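A quick brute-force check confirms the identity numerically (a sketch; the vector a below is an arbitrary stand-in for the per-unit terms c_j + (Wx)_j + U_j), so it's the justification for the step I'm after rather than the result:

    # Brute-force check: sum over all 2^n binary h of prod_j exp(h_j * a_j)
    # versus the factorized form prod_j (1 + exp(a_j)).
    import numpy as np
    from itertools import product

    rng = np.random.default_rng(0)
    n = 4
    a = rng.normal(size=n)  # arbitrary stand-in for c_j + (Wx)_j + U_j

    # Left-hand side: enumerate all 2^n binary configurations of h.
    # exp(h . a) = prod_j exp(h_j * a_j), so the dot product does the product for us.
    lhs = sum(np.exp(np.dot(h, a)) for h in product([0, 1], repeat=n))

    # Right-hand side: one factor per hidden unit.
    rhs = np.prod(1.0 + np.exp(a))

    print(lhs, rhs)  # the two values agree up to floating-point error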

Thanks!

asked Jul 25 '11 at 16:49

Roderick Nijs

edited Jul 25 '11 at 17:49

In addition to the image, could you post the LaTeX for it? mathurl.com is blocked from my location. Also, would you mind expanding a bit on what this relates to? I get the impression it's related to RBMs, but I'm not certain.

(Jul 25 '11 at 17:11) Brian Vandenberg

You are totally right :). It is a step in obtaining p(h_i = 1|v). If you start from p(v,h), divide by p(v) (marginalizing out h), and sum over all h_{m \neq i}, I should get the desired probability. It's really simple, but cancelling out all the other hidden variables seems to require a trick I am not aware of.

(Jul 25 '11 at 17:21) Roderick Nijs

2 Answers:

Adding to George's answer:

The paper Exponential Harmoniums With Applications To Information Retrieval is a decent starting point, albeit difficult to read. It didn't fall into place for me until I learned a bit more about exponential family distributions; you might have an easier time of it.

As a starting point for understanding how to do this, I was working with Hinton & Salakhutdinov's paper on Semantic Hashing. I'll give a brief synopsis of how I reverse-engineered and derived the transition functions they used.

Start with an energy function of the following form:

E(v,h) = F1(v,h;W) + F2(v;b^v) + F3(h;b^h)

This energy function needs to conform to that of an exponential family probability distribution -- in this case, the Poisson distribution. Therefore, it needs to take a form reminiscent of the following:

P(v_i|h) = f(x;theta) = h(x) * exp[ eta(theta)*T(x) - A(theta) ]

... where the Poisson distribution has the following basic form:

  • f(x,lambda) = (x!)^(-1) * exp( x*ln(lambda) - lambda )
  • h(x) = (x!)^(-1)
  • T(x) = x
  • eta(theta) = ln(lambda)
  • A(theta) = lambda
    • small hint: this is the normalization constant

To arrive at the same results they did, I made some of the following observations or assumptions:

  • The terms strictly involving {W,h,b^h} will go away if they appear in E(v,h) as a linear combination. Things get more complicated but doable if your (prior? is that the right word?) is not binary.
  • The normalization term in the denominator of P(v_i|h) resembles the Taylor series expansion of e^x about the point x=0, with one term removed. You can either solve the summation, and your normalization constant pops out, or you can do some educated guesswork and re-introduce that (constant factor) term, after which the denominator sums to 1 and you're left with A(theta); see the sketch just below this list.
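
Concretely, here's the summation I mean (a sketch, writing $\eta = \ln\lambda$ for the natural parameter from the list above): the denominator is the series for $e^x$ evaluated at $e^{\eta}$,

$$\sum_{x=0}^{\infty} \frac{e^{\eta x}}{x!} = \sum_{x=0}^{\infty} \frac{(e^{\eta})^x}{x!} = e^{e^{\eta}},$$

so the log of the normalizer works out to $A(\theta) = e^{\eta} = \lambda$, matching the list above.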

If that doesn't provide you with enough details, let me know and I'll find the write-up I did on it and re-post it here (I don't have access to it from my current location, otherwise I'd have posted it already).

answered Jul 26 '11 at 18:14

Brian Vandenberg

edited Jul 27 '11 at 01:04

Unless the function factors over the individual components of h, you can't in general avoid the naive summation over all 2^n terms.

However, if you are looking for the derivation for efficiently (linear in the number of hidden units) computing the free energy of an RBM with all the steps filled in, it is in Learning Deep Architectures for AI in equation 5.12 on page 51.
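
In outline, the factorization goes like this (a sketch using the standard binary-RBM energy $E(v,h) = -b^\top v - c^\top h - h^\top W v$, which is the notation assumed here):

$$\sum_h e^{-E(v,h)} = e^{b^\top v} \sum_{h_1} \cdots \sum_{h_n} \prod_j e^{h_j \left( c_j + (Wv)_j \right)} = e^{b^\top v} \prod_j \sum_{h_j \in \{0,1\}} e^{h_j \left( c_j + (Wv)_j \right)} = e^{b^\top v} \prod_j \left( 1 + e^{c_j + (Wv)_j} \right),$$

so the free energy is $F(v) = -b^\top v - \sum_j \log\left( 1 + e^{c_j + (Wv)_j} \right)$, which costs time linear in the number of hidden units.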

answered Jul 25 '11 at 17:35

gdahl ♦

The LaTeX formula I wrote down was trying to represent exactly the factorization over the individual components. In the equation you point out, they don't actually justify that you can split up the sum. That is the step I'm not so confident with, but it's good to see somebody do it explicitly :)

(Jul 25 '11 at 17:47) Roderick Nijs

The sums are all written out explicitly and then factors independent of the summation index are moved out of the sum. The distributive law of multiplication makes this manipulation legitimate. The easiest way to prove that this is legitimate is to start with the factored side of the equation and apply the distributive property of multiplication until you get the un-factored form. If you do that, I am confident you will see why it is justified.
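
For example, with n = 2, write $a_{h_1} = e^{f(h_1)}$ and $b_{h_2} = e^{f(h_2)}$ (shorthand for this illustration only). Then

$$(a_0 + a_1)(b_0 + b_1) = a_0 b_0 + a_0 b_1 + a_1 b_0 + a_1 b_1 = \sum_{(h_1, h_2) \in \{0,1\}^2} a_{h_1} b_{h_2}.$$

Expanding the product enumerates exactly the $2^2$ configurations; reading the identity right to left is the factoring step.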

(Jul 26 '11 at 14:56) gdahl ♦

Of the five lines in displayed equation 5.12, which step(s) are you not comfortable with? If you point out which step, I can try to go into more detail. 1->2 is filling in the definition of the energy and the definition of sum_h; 2->3 is using properties of exponentials, namely exp(x)*exp(y) = exp(x+y); 3->4 is using the distributive law (although going in the factoring direction); 4->5 is using the definition of subscripted product notation.

(Jul 26 '11 at 15:08) gdahl ♦