Hi everyone,

I wanted to ask if anyone knows of effective Monte Carlo methods for when you observe a deterministic function of a set of random variables. For example, if X_i ~ Poisson(lambda_i) and I observe Z = sum_{i=1}^{N} X_i along with the lambda_i, what is a good way of approximating P(X_1, ..., X_N | Z)?

The simplest approach would be rejection sampling, but as N increases more and more samples will be rejected. One might imagine an importance sampling approach that makes it more likely for samples to have non-zero probability, but I'd really like a principled way of designing the proposal. MCMC is also an option, but choosing a good proposal distribution often seems to be a matter of luck rather than insight.
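To make the degradation concrete, here is a minimal sketch of what I mean by the naive approach (Python with numpy; names like `lambdas` and `Z_obs` are just illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)

def rejection_sample(lambdas, Z_obs, n_draws=1000, max_tries=1_000_000):
    """Naive rejection sampler for P(X_1, ..., X_N | sum_i X_i = Z_obs).

    Draws each X_i ~ Poisson(lambda_i) independently and keeps the vector
    only when the sum happens to hit Z_obs exactly.
    """
    accepted, tries = [], 0
    while len(accepted) < n_draws and tries < max_tries:
        x = rng.poisson(lambdas)          # one independent draw of (X_1, ..., X_N)
        tries += 1
        if x.sum() == Z_obs:
            accepted.append(x)
    return np.array(accepted), len(accepted) / tries

# The acceptance rate equals P(sum_i X_i = Z_obs), which is at best on the
# order of 1/sqrt(sum_i lambda_i), and collapses further when Z_obs is far
# from sum_i lambda_i -- hence the degradation as N grows.
lambdas = np.full(10, 2.0)
samples, acc_rate = rejection_sample(lambdas, Z_obs=20)
print(acc_rate)
```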

Another interesting model: N ~ Binomial(n, p), X_i ~ Poisson(lambda_0), Z = sum_i X_i, with (n, p, Z) observed. Then the X_i are exchangeable, but N is unknown.

Does anyone know of any literature pertaining to models of this sort, or perhaps results on how difficult it is to do inference under these models?

Thanks!

asked Apr 13 '12 at 23:00


Daniel Duckwoth

edited Apr 13 '12 at 23:04


One Answer:

Forgive me if I'm daft, but (1) isn't this (you have a bunch of variables and factors, observe only some of the variables, and want to compute expectations over the rest) the standard use case for MCMC? And (2), unless you have many copies of these models sharing only some of the variables, won't exchangeability make all the variables identically distributed, so that there may be an analytic solution for each variable's posterior (though maybe not for the joint)?

Assuming you want to do MCMC in these settings, for your first model I'd recommend reparameterizing your variables so that X_N is always equal to Z - sum_{i<N} X_i. Then you can work in log-space (so the unconstrained variables can be positive or negative, and no one cares) and use Hamiltonian Monte Carlo to "automatically" get efficient proposals from the gradient of the log-likelihood, which should be computable efficiently with a bit of effort.
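Roughly, assuming you relax the counts to be continuous (which HMC needs for gradients anyway), the target would look something like this sketch (names are illustrative, and I've included the log-Jacobian of the exp transform):

```python
import numpy as np
from scipy.special import gammaln

def logp_unconstrained(y, lambdas, Z_obs):
    """Log-density over unconstrained y = log(X_1), ..., log(X_{N-1}),
    with X_N = Z_obs - sum(X_1, ..., X_{N-1}) pinned down by the constraint.

    The Poisson terms use gammaln in place of the factorial, i.e. the
    counts are treated as continuous so the gradient exists.
    """
    x_free = np.exp(y)                 # X_1, ..., X_{N-1} are positive by construction
    x_last = Z_obs - x_free.sum()      # X_N is determined by the observed sum
    if x_last <= 0:
        return -np.inf                 # outside the feasible region
    x = np.append(x_free, x_last)
    log_pois = x * np.log(lambdas) - lambdas - gammaln(x + 1.0)
    log_jacobian = y.sum()             # from the change of variables x_free = exp(y)
    return log_pois.sum() + log_jacobian

# An HMC sampler would use the gradient of this function (by hand, or via
# an autodiff library) to propose moves in y-space.
```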

Your second case is more interesting, and it looks harder, except that for the life of me I can't figure out how variables are shared in it. It looks like, for a given N, you should be able to compute the likelihood of Z (given lambda) analytically, compute this for many values of N, sample a value of N from the result, and then independently sample from the constrained density using something like the trick above. This sounds funny because it's a case where computing the evidence seems far easier than sampling, which makes me think I'm missing something.
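Concretely, since a sum of N iid Poisson(lambda_0) variables is Poisson(N * lambda_0), the enumeration over N could look something like this sketch (function and variable names are just illustrative):

```python
import numpy as np
from scipy.stats import binom, poisson

def posterior_over_N(n, p, lam0, Z_obs):
    """P(N | Z) for the second model: N ~ Binomial(n, p), X_i iid Poisson(lam0),
    Z = sum_i X_i. Uses the fact that the sum of N iid Poisson(lam0) draws
    is Poisson(N * lam0), so P(Z | N) is available in closed form.
    """
    Ns = np.arange(1, n + 1)                    # N = 0 is only consistent with Z_obs = 0
    log_post = binom.logpmf(Ns, n, p) + poisson.logpmf(Z_obs, Ns * lam0)
    log_post -= np.logaddexp.reduce(log_post)   # normalize
    return Ns, np.exp(log_post)

# Sample N from this posterior, then sample the X_i given N from the
# constrained density, e.g. with the reparameterization trick above.
Ns, post = posterior_over_N(n=20, p=0.5, lam0=1.5, Z_obs=12)
rng = np.random.default_rng(0)
N_draw = rng.choice(Ns, p=post)
```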

Though it's less directly related, you might also be interested in Dan Sheldon's collective graphical models; they are an abstraction of exactly this problem of inferring individual behavior from group-aggregated observations, except he does it with EM, using sampling for the E step.

answered Apr 14 '12 at 21:17


Alexandre Passos ♦

Hi Alexandre,

As for (1), this case is simple enough to come up with a simple MCMC scheme by hand, but my point is that you need to recognize how Z is defined as a function of all the X_i. Single-site Gibbs sampling, for example, will never move out of its initial configuration. If Z is something more complex and silly like [1 + X_1 % (X_2/3 + 8)]*X_3 / ..., maybe even a black box, then what?

As for (2), you're absolutely right about the symmetry, yet identifying an effective proposal distribution is still difficult. It's really counter-intuitive, and I feel like it shouldn't be the case. Lifted Inference [ http://books.nips.cc/papers/files/nips23/NIPS2010_0535.pdf ] is one way of identifying when a collection of variables will have the same distribution, but (to my understanding) it's usually applied when N is known.

Overall, I feel like trying to do Monte Carlo when some variables are defined deterministically by others is pretty common, and it would be the foundation of doing MCMC over the output of a program. If we could find a way around it, I think we could finally begin to decouple models from their inference algorithms.

(Apr 14 '12 at 21:57) Daniel Duckwoth

Agreed; though I think the reason people in practice say z ~ Normal(sum(x), epsilon) instead of z = sum(x) is to avoid exactly these nasty issues with deterministic factors that make Gibbs moves impractical and proposals hard. I think a good generic way out of these situations is to reparametrize things so that the space is unconstrained (as I suggested doing with the sum) and then use standard samplers. Another possibility is to smooth things with distributions rather than deterministic factors and anneal the variances down to get a sample.
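For concreteness, a softened target along those lines might look like this sketch (the eps schedule and names are just illustrative, and the Poisson terms use the same continuous relaxation as before):

```python
import numpy as np
from scipy.special import gammaln
from scipy.stats import norm

def soft_logp(x, lambdas, Z_obs, eps):
    """Softened target: Z_obs ~ Normal(sum(x), eps) replaces the exact
    constraint Z_obs = sum(x). Annealing eps toward 0 recovers the hard
    constraint in the limit.
    """
    x = np.asarray(x, dtype=float)
    log_pois = x * np.log(lambdas) - lambdas - gammaln(x + 1.0)   # continuous relaxation
    return log_pois.sum() + norm.logpdf(Z_obs, loc=x.sum(), scale=eps)

# A generic sampler (HMC, random-walk Metropolis, ...) would target soft_logp
# for a decreasing sequence of eps values, warm-starting each colder chain
# from the end of the previous one.
```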

(Apr 14 '12 at 22:02) Alexandre Passos ♦