Hi,

I'm having trouble computing variational marginals like the ones shown at the following link:

http://mathbin.net/87482

asked Jan 17 at 19:30

Haluk Dogan

One Answer:

First write down the functional form of q(x1) and q(x2) in terms of some parameters. For example, you can use q(x1 = i) = theta_i, where theta is a vector of positive real numbers summing to one (and use, say, a phi vector for q(x2)). Then you can write down

KL(q||p) = sum_i sum_j q(x1=i) q(x2=j) log (q(x1=i) q(x2=j)/p(x1=i,x2=j))
    = sum_i sum_j theta_i phi_j (log theta_i + log phi_j - log p(x1=i, x2=j))
    = sum_i sum_j theta_i phi_j (log theta_i ) + sum_i sum_j theta_i phi_j (log phi_j) - sum_i sum_j theta_i phi_j ( log p(x1=i, x2=j))
    = sum_i theta_i (log theta_i ) + sum_j phi_j (log phi_j) - sum_i sum_j theta_i phi_j ( log p(x1=i, x2=j))

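where the last line uses the fact that theta and phi each sum to one, which collapses the double sums in the first two terms. Finally, you can minimize this function over theta and phi by coordinate descent: take the gradient with respect to one set of parameters, set it to zero while holding the other set fixed, and alternate. For theta, the gradient is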

grad_theta_i KL(q||p) = log(theta_i)+1 - sum_j phi_j log p(x1=i,x2=j)

which, once theta is constrained to lie on the simplex (a Lagrange multiplier takes care of the sum-to-one constraint), leads to an update where you set

theta'_i = exp(sum_j phi_j log p(x1=i, x2=j)  - 1)

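and then set theta_i to the normalized value of the theta' variables, that is, theta_i = theta'_i / sum_k theta'_k. The constant -1 cancels in the normalization. The update for phi is exactly the same with the roles of theta and phi swapped, so the sum runs over i instead of j.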
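
In case it helps to see the updates running, here is a minimal sketch in plain Python. It assumes the joint p(x1, x2) is given as a table of strictly positive entries that sum to one. The table p_joint and the names update_theta, update_phi, kl and mean_field are made up for this illustration and do not come from the linked problem.

```python
import math

def normalize(weights):
    """Rescale nonnegative weights so they sum to one."""
    total = sum(weights)
    return [w / total for w in weights]

def update_theta(phi, p):
    """theta'_i = exp(sum_j phi_j log p[i][j] - 1), then normalize."""
    unnorm = [math.exp(sum(f * math.log(p[i][j]) for j, f in enumerate(phi)) - 1.0)
              for i in range(len(p))]
    return normalize(unnorm)

def update_phi(theta, p):
    """Same update with the roles swapped: the sum runs over i."""
    unnorm = [math.exp(sum(t * math.log(p[i][j]) for i, t in enumerate(theta)) - 1.0)
              for j in range(len(p[0]))]
    return normalize(unnorm)

def kl(theta, phi, p):
    """KL(q||p) in the decomposed form from the last line of the derivation."""
    term_theta = sum(t * math.log(t) for t in theta if t > 0.0)
    term_phi = sum(f * math.log(f) for f in phi if f > 0.0)
    cross = sum(t * f * math.log(p[i][j])
                for i, t in enumerate(theta) for j, f in enumerate(phi))
    return term_theta + term_phi - cross

def mean_field(p, sweeps=50):
    """Alternate the two updates, starting from uniform theta and phi."""
    theta = [1.0 / len(p)] * len(p)
    phi = [1.0 / len(p[0])] * len(p[0])
    history = [kl(theta, phi, p)]
    for _ in range(sweeps):
        theta = update_theta(phi, p)
        phi = update_phi(theta, p)
        history.append(kl(theta, phi, p))
    return theta, phi, history

# Hypothetical 2x3 joint table p(x1=i, x2=j), chosen only for illustration.
p_joint = [[0.10, 0.20, 0.10],
           [0.25, 0.05, 0.30]]
theta_fit, phi_fit, kl_history = mean_field(p_joint)
```

Each update is the exact minimizer over its own block, so the values in kl_history should never go up. If p happens to factorize as a product of its marginals, the first sweep already recovers those marginals and the KL drops to zero.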

Of course, you can relax or tighten any of the decisions I made. For example, instead of having q(x1=i) = theta_i, you can have it be q(x1=i) = delta(i = v), for a parameter v. This uses a point approximation (a hard assignment) instead of a soft one, and the global optimum then sets the parameters of q(x1) and q(x2) to the pair of values at which p(x1,x2) is maximized.
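
To make the point-approximation case concrete: when both factors are deltas, every term of the KL sum vanishes except the one at the chosen pair, so the objective reduces to -log p(x1=v, x2=w). A small sketch, again with a made-up table and hypothetical function names (point_kl, best_point):

```python
import math

def point_kl(v, w, p):
    """KL(q||p) for q(x1=i) = delta(i=v), q(x2=j) = delta(j=w).

    The entropy terms are zero and only -log p[v][w] survives.
    """
    return -math.log(p[v][w])

def best_point(p):
    """Search all (v, w) pairs for the one with the smallest point KL."""
    cells = [(i, j) for i in range(len(p)) for j in range(len(p[0]))]
    return min(cells, key=lambda cell: point_kl(cell[0], cell[1], p))

# Hypothetical 2x3 joint table p(x1=i, x2=j), chosen only for illustration.
p_table = [[0.10, 0.20, 0.10],
           [0.25, 0.05, 0.30]]
best_v, best_w = best_point(p_table)
```

For this table the search picks the cell (1, 2), which holds the largest probability, 0.30. Minimizing the KL under a point approximation is the same as finding the mode of p.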

answered Jan 17 at 20:20

Alexandre Passos ♦
edited Jan 17 at 20:25

User submitted content is under Creative Commons: Attribution - Share Alike; Other things copyright (C) 2010, MetaOptimize LLC.