I'm trying to learn a specific graphical model that is, at its core, a mixture of multinomials, with a few sets of variables that I expect to be highly correlated. Plain Gibbs sampling on the model already moves in the right direction, but extrapolating from its progress, it would take a few million iterations to converge to a good region of the distribution (which I know exists; I checked by cheating in the initialization and maximizing the likelihood instead of sampling).

Blocked sampling (as mentioned by Ishwaran and James) seems like an alternative. Am I correct that sampling these highly correlated variables together is a good idea? If so, is there a good reference for finite mixtures of Dirichlets (or just for general graphical models)?

asked Jun 29 '10 at 21:23

Alexandre Passos ♦

2 Answers:

This paper by Radford Neal is an absolute classic to me. Although it is written in the context of Dirichlet process mixtures, the discussion applies to finite mixtures as well. The paper describes eight different samplers and evaluates which ones mix faster than others. A blocked Gibbs sampler can help a lot, but in my experience a collapsed Gibbs sampler (see Radford's paper; this one integrates out the parameters) can do wonders. A slice sampler (which adds auxiliary random variables to the problem) can also help the chain mix faster.
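For concreteness, here is a minimal sketch of a collapsed Gibbs sampler for a finite mixture of multinomials, assuming a single categorical draw per observation (for multi-token observations the conditional becomes a ratio of Gamma functions). The function and parameter names are illustrative, not from the thread:

    # Collapsed Gibbs for a finite mixture of categoricals: theta ~ Dir(alpha)
    # and phi_k ~ Dir(beta) are integrated out; only assignments z are sampled.
    import numpy as np

    def collapsed_gibbs(x, K, V, alpha=1.0, beta=1.0, n_iters=1000, rng=None):
        rng = rng or np.random.default_rng(0)
        N = len(x)
        z = rng.integers(K, size=N)                        # random initial assignments
        n_k = np.bincount(z, minlength=K).astype(float)    # points per component
        c_kv = np.zeros((K, V))                            # symbol counts per component
        for i in range(N):
            c_kv[z[i], x[i]] += 1
        for _ in range(n_iters):
            for i in range(N):
                n_k[z[i]] -= 1                             # remove point i from the counts
                c_kv[z[i], x[i]] -= 1
                # p(z_i = k | z_-i, x) ∝ (n_k + alpha) (c_{k,x_i} + beta) / (n_k + V beta)
                p = (n_k + alpha) * (c_kv[:, x[i]] + beta) / (n_k + V * beta)
                z[i] = rng.choice(K, p=p / p.sum())
                n_k[z[i]] += 1                             # add point i back under new component
                c_kv[z[i], x[i]] += 1
        return z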

answered Jun 30 '10 at 03:00

Jurgen

I was already using a collapsed Gibbs sampler (I generally have good experiences with them as well), but using a slice sampler to resample the hyperparameters improved mixing perceptibly; a sketch of that kind of move is below. I wonder how I could apply it to the Dirichlet variables, though.

(Jun 30 '10 at 13:23) Alexandre Passos ♦
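A minimal sketch of the kind of univariate slice move described in the comment above (the stepping-out and shrinkage procedure from Neal, 2003); `log_p` is the unnormalized log conditional of the hyperparameter and `w` an initial bracket width, both placeholders:

    import numpy as np

    def slice_sample(x0, log_p, w=1.0, max_steps=50, rng=None):
        rng = rng or np.random.default_rng()
        log_y = log_p(x0) + np.log(rng.uniform())  # auxiliary "height" under the density
        # Step out: grow [l, r] around x0 until both ends fall below the slice.
        l = x0 - w * rng.uniform()
        r = l + w
        for _ in range(max_steps):
            if log_p(l) < log_y:
                break
            l -= w
        for _ in range(max_steps):
            if log_p(r) < log_y:
                break
            r += w
        # Shrink: sample uniformly on [l, r], pulling the bracket toward x0 on rejection.
        while True:
            x1 = rng.uniform(l, r)
            if log_p(x1) >= log_y:
                return x1
            if x1 < x0:
                l = x1
            else:
                r = x1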

In general, block sampling is a good idea if you can do it sensibly (i.e. your proposal accounts for the structure/correlation of the problem). Liu's Monte Carlo book has all sorts of examples of hybrid MCMC schemes. An alternative is to introduce an auxiliary variable that reduces the dependence.
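For concreteness, a minimal sketch of such a blocked move, done Metropolis-within-Gibbs style with a joint Gaussian proposal whose covariance is meant to encode the block's correlation (e.g. estimated from a pilot run); `idx`, `Sigma` and `log_p` are illustrative placeholders:

    import numpy as np

    def blocked_metropolis_step(x, idx, Sigma, log_p, rng=None):
        rng = rng or np.random.default_rng()
        prop = x.copy()
        L = np.linalg.cholesky(Sigma)
        prop[idx] = x[idx] + L @ rng.standard_normal(len(idx))  # joint correlated proposal
        # The proposal is symmetric, so the acceptance ratio is just the density ratio.
        if np.log(rng.uniform()) < log_p(prop) - log_p(x):
            return prop
        return x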

Ishwaran and James seem to rely on a truncation approximation. I doubt that's necessary: Kalli et al. (2010) show how slice sampling can avoid truncation approximations. It's a pretty intuitive idea: once you introduce the auxiliary slicing variable, the conditional model is finite.
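A minimal sketch of that trick for a stick-breaking mixture, assuming sticks v_k ~ Beta(1, conc) and a hypothetical per-component likelihood `like(i, k)`: given u_i ~ Uniform(0, w_{z_i}), only the finitely many components with w_k > u_i can receive observation i, so no truncation is needed:

    import numpy as np

    def slice_assignments(u, v, conc, like, rng=None):
        rng = rng or np.random.default_rng()
        z = np.empty(len(u), dtype=int)
        for i, u_i in enumerate(u):
            # Extend the stick-breaking weights until the leftover mass drops
            # below u_i; every component beyond that point has w_k < u_i.
            w, remaining, k = [], 1.0, 0
            while remaining >= u_i:
                if k == len(v):
                    v = np.append(v, rng.beta(1.0, conc))  # instantiate sticks lazily
                w.append(remaining * v[k])
                remaining *= 1.0 - v[k]
                k += 1
            active = [j for j, w_j in enumerate(w) if w_j > u_i]
            p = np.array([like(i, j) for j in active])
            z[i] = active[rng.choice(len(active), p=p / p.sum())]
        return z, v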

answered Jun 30 '10 at 00:25

Tristan
