Suppose f(x) is a probability mass distribution. There are a number of statistics u_i=E_f[g_i(X)] and the corresponding estimates y_i. We only know y_i and the probability function p(y_i|u_i) (e.g., a Gaussian or Laplace distribution).

If we know u_i exactly, then we can seek for a distribution f that maximizes the entropy subject to the constrains E_f[g_i(X)]=u_i. But we only know the estimates y_i, so I don't know what should be maximized.

I suppose a reasonable objective function might be to maximize


, the sum of the entropy of X and the log-likelihood. But I don't know how to justify it.

Another problem is that it is hard to optimize for Q(f). f(x) should be in the form of

f(x;w)=1/Z(w)*exp(\sum_i g_i(x)*w_i)

I want to use (stochastic) gradient descend algorithm, so I tried to differentiate H_f(x) w.r.t. w_i, and obtain



The term u_i(f;w) can be simply estimated by Monte Carlo approach. But I don't know how to deal with ln(f(X;w))=\sum_i[g_i(x)*w_i]-ln(Z(w)), because the term ln(Z(w)) is hard to estimate.

A compromise is to drop the entropy term in the Q(f), and optimize only for the likelihood p({y_i}|w). But this solution seems not so elegant.

asked Oct 04 '10 at 06:27

x10000year's gravatar image


edited Oct 04 '10 at 08:47

One Answer:

Standard solution is to maximize entropy subject to relaxed moment constraints. The result is typically L1 regularized maximum likelihood See section 2.3.5 in Miroslav Dudik's thesis. If you really want to use entropy as the penalty term, you could do MCMC sampling there as well -- just note that entropy is expected value of logarithm of probability and switch order of derivative and expectation operators.

answered Oct 04 '10 at 12:16

Yaroslav%20Bulatov's gravatar image

Yaroslav Bulatov

edited Oct 04 '10 at 13:54

Your answer
toggle preview

powered by OSQA

User submitted content is under Creative Commons: Attribution - Share Alike; Other things copyright (C) 2010, MetaOptimize LLC.