I have not seen this problem asked much here, but it seems an interesting and very challenging task. What are the stable methods for estimating the marginal likelihood of a Dirichlet process mixture (DPM) model? I am mainly interested in MCMC/Gibbs-sampling-based methods.

Besides the infamous harmonic mean approach, there are some better (more accurate and stable) methods for estimating the marginal likelihood. However, because the model parameters (including the latent cluster-assignment variables) are high-dimensional and of unknown dimension, few of these methods have been shown to be accurate for Dirichlet process mixture models. For example, although Chib's method (listed below) is generally considered accurate, it can also yield errors in a mixture model setting, as Radford Neal pointed out here.
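
To make the concern about the harmonic mean approach concrete, here is a minimal sketch on a toy conjugate model (y_i ~ N(theta, 1), theta ~ N(0, tau2)) where the marginal likelihood is available in closed form. The model, the function names, and the settings are assumptions chosen for illustration. This is not a DPM, and exact posterior draws stand in for Gibbs output.

```python
import numpy as np

def log_mean_exp(a):
    """Numerically stable log(mean(exp(a)))."""
    a = np.asarray(a, dtype=float)
    m = a.max()
    return m + np.log(np.mean(np.exp(a - m)))

def exact_log_marginal(y, tau2):
    """Closed-form log p(y) for y_i ~ N(theta, 1), theta ~ N(0, tau2)."""
    n, s = len(y), y.sum()
    return (-0.5 * n * np.log(2 * np.pi)
            - 0.5 * np.log1p(n * tau2)
            - 0.5 * (np.sum(y ** 2) - tau2 * s ** 2 / (1 + n * tau2)))

def harmonic_mean_log_marginal(y, tau2, num_samples, rng):
    """Harmonic mean estimate of log p(y) from posterior draws of theta."""
    n = len(y)
    post_var = 1.0 / (n + 1.0 / tau2)
    post_mean = post_var * y.sum()
    # Exact posterior draws (a stand-in for MCMC output in this toy model).
    theta = rng.normal(post_mean, np.sqrt(post_var), size=num_samples)
    # log p(y | theta_s) for every posterior draw theta_s.
    loglik = (-0.5 * n * np.log(2 * np.pi)
              - 0.5 * ((y[None, :] - theta[:, None]) ** 2).sum(axis=1))
    # 1 / p(y) is estimated by the posterior average of 1 / p(y | theta).
    return -log_mean_exp(-loglik)

rng = np.random.default_rng(0)
y = rng.normal(1.0, 1.0, size=20)
print("exact        :", exact_log_marginal(y, tau2=100.0))
print("harmonic mean:", harmonic_mean_log_marginal(y, 100.0, 5000, rng))
```

With a diffuse prior such as tau2 = 100, the harmonic mean estimate tends to come out larger than the exact value and to vary a lot between seeds, because the posterior draws almost never reach the prior tails that dominate the average of 1 / p(y | theta).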

Below I list some methods for estimating the marginal likelihood from Gibbs sampling or MCMC output for Bayesian models (they may not work for a Dirichlet process mixture model, however). Any suggestions and recommended reading would be highly appreciated.

  1. Chib, S. "Marginal likelihood from the Gibbs output". JASA, Vol. 90 (1995).
  2. Neal, R. M. "Annealed importance sampling". Technical Report No. 9805 (revised), Dept. of Statistics, University of Toronto (1998).
  3. Friel, N. and Pettitt, A. N. "Marginal likelihood estimation via power posteriors". Journal of the Royal Statistical Society: Series B (Statistical Methodology), Vol. 70, Issue 3, pp. 589–607 (July 2008).
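
For context, the identity behind item 1 can be sketched as follows. The notation is an assumption made here for illustration, not copied from the paper: \theta^* is any fixed point of high posterior density, m(y) is the marginal likelihood, and z^{(g)} are the latent variables drawn in G Gibbs sweeps.

```latex
\log m(y) = \log f(y \mid \theta^*) + \log \pi(\theta^*) - \log \pi(\theta^* \mid y),
\qquad
\hat{\pi}(\theta^* \mid y) = \frac{1}{G} \sum_{g=1}^{G} \pi(\theta^* \mid y, z^{(g)}).
```

The first two terms are available in closed form. Only the posterior ordinate has to be estimated from the sampler output, and that is the term that becomes delicate for mixture models, since the average covers only the labelings the sampler actually visited.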

The power posteriors method builds on thermodynamic integration and the path sampling of Gelman and Meng, and is very useful, though again it is not clear whether it works in the DPM case.
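
As a rough illustration of the power posterior idea, the sketch below uses the identity log p(y) = integral over t in [0, 1] of E_t[log p(y | theta)], where the expectation is under the power posterior p_t(theta | y) proportional to p(y | theta)^t p(theta). It again assumes a toy conjugate normal model (not a DPM) in which each power posterior can be sampled exactly. The temperature ladder t_k = (k / K)^5 and all function names are assumptions made for this example.

```python
import numpy as np

def exact_log_marginal(y, tau2):
    """Closed-form log p(y) for y_i ~ N(theta, 1), theta ~ N(0, tau2)."""
    n, s = len(y), y.sum()
    return (-0.5 * n * np.log(2 * np.pi)
            - 0.5 * np.log1p(n * tau2)
            - 0.5 * (np.sum(y ** 2) - tau2 * s ** 2 / (1 + n * tau2)))

def power_posterior_log_marginal(y, tau2, num_temps, num_samples, rng, power=5.0):
    """Thermodynamic integration estimate of log p(y) for the toy model."""
    n, s = len(y), y.sum()
    # Temperatures from 0 (prior) to 1 (posterior), packed densely near 0.
    temps = np.linspace(0.0, 1.0, num_temps + 1) ** power
    expected_loglik = np.empty_like(temps)
    for k, t in enumerate(temps):
        # Power posterior is Gaussian here: precision t * n + 1 / tau2.
        var_t = 1.0 / (t * n + 1.0 / tau2)
        mean_t = t * s * var_t
        theta = rng.normal(mean_t, np.sqrt(var_t), size=num_samples)
        loglik = (-0.5 * n * np.log(2 * np.pi)
                  - 0.5 * ((y[None, :] - theta[:, None]) ** 2).sum(axis=1))
        # Monte Carlo estimate of E_t[log p(y | theta)].
        expected_loglik[k] = loglik.mean()
    # Trapezoidal rule over the temperature ladder.
    widths = np.diff(temps)
    return float(np.sum(widths * 0.5 * (expected_loglik[1:] + expected_loglik[:-1])))

rng = np.random.default_rng(1)
y = rng.normal(1.0, 1.0, size=20)
print("exact          :", exact_log_marginal(y, tau2=10.0))
print("power posterior:", power_posterior_log_marginal(y, 10.0, 40, 2000, rng))
```

In a real mixture or DPM application, each temperature would need its own MCMC run targeting the tempered posterior, and that is where the open question about the DPM case arises.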

Thanks.

asked Aug 31 '10 at 09:06

Denzel

This is an interesting question, although I'm not aware of any approach that you did not list. Do variational lower bounds count?

I was not aware of the "Marginal likelihood from the Gibbs output" paper, and it seems very interesting.

(Aug 31 '10 at 10:28) Alexandre Passos ♦

Hi Alexandre:

Thanks for your comment. Chib's paper shows an example (Section 4.2.1) of applying the approach to a finite mixture of Gaussians, where the number of components is fixed and known. However, applying it to mixture models calls for caution. Regarding the method, Radford Neal pointed out here that

"I believe the problem arises because mixture models are not identifiable, since relabeling the mixture components does not change the probability density for the data"...

He then suggested that a method should constrain the labeling of the components to produce some ordering. To do this correctly, one would need to build such a constraint into the prior of the model.
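
The non-identifiability in the quote can be checked numerically. The sketch below (a hypothetical three-component Gaussian mixture with made-up parameter values) evaluates the log-likelihood under every relabeling of the components and shows that the value does not change.

```python
import numpy as np
from itertools import permutations

def mixture_loglik(y, weights, means, sds):
    """Log-likelihood of data y under a univariate Gaussian mixture."""
    y = np.asarray(y, dtype=float)[:, None]
    # Log of weight_k * N(y_i | mean_k, sd_k^2), shape (n, K).
    log_comp = (np.log(weights)
                - 0.5 * np.log(2 * np.pi * sds ** 2)
                - 0.5 * ((y - means) / sds) ** 2)
    # Stable log-sum-exp over components, then sum over observations.
    m = log_comp.max(axis=1, keepdims=True)
    return float(np.sum(m[:, 0] + np.log(np.exp(log_comp - m).sum(axis=1))))

def permuted_logliks(y, weights, means, sds):
    """Log-likelihood under each of the K! relabelings of the components."""
    values = []
    for perm in permutations(range(len(weights))):
        idx = list(perm)
        values.append(mixture_loglik(y, weights[idx], means[idx], sds[idx]))
    return values

y = np.array([-1.0, 0.3, 2.5, 4.0])
weights = np.array([0.2, 0.3, 0.5])
means = np.array([-1.0, 0.0, 3.0])
sds = np.array([1.0, 0.5, 2.0])
for value in permuted_logliks(y, weights, means, sds):
    print(value)  # the same number for every relabeling
```

With an exchangeable prior, the posterior therefore has K! symmetric copies of each mode, so any estimate that depends on the posterior density at one point is sensitive to how many of those copies the sampler explores.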

The relabeling problem seems to be a general problem in marginal likelihood estimation for mixture models. Beyond it, however, I wonder what other problems could prevent the listed estimation methods from being applied in a DPM setting.

(Aug 31 '10 at 11:25) Denzel

2 Answers:

For mixture models in general, there have been some attempts to address the label-switching problem faced by Chib's approach for marginal likelihood estimation. See this paper, and the references therein. I'm not aware of work that specifically addressed it in the DP mixture framework but similar remedies might apply.

If you are considering the marginal likelihood of DP mixtures for the purpose of hyperparameter estimation (via empirical Bayes), there is this paper that presents an approach.

answered Aug 31 '10 at 12:11

spinxl39

Hi spinxl39:

The reason I am considering the marginal likelihood of the DPM is model comparison.

The paper by Marin and Robert that you mentioned made me reconsider the applicability of Chib's method in a more positive light. Thanks for mentioning it.

(Aug 31 '10 at 14:53) Denzel

Hi, I'm looking for some information about the accuracy of DP in clustering, but could not find much. Is there any kind of lower bound against which we can compare other clustering methods? Or is there any alternative way to compare two clustering methods? I'm not sure if we can use the marginal likelihood either. Any help would be really appreciated! Thanks.

answered Nov 08 '10 at 01:37

Nam Nguyen

Evaluating clustering is hard. DP-based clustering generally performs well whenever the main assumption of the DP holds: that the clusters were formed by a rich-get-richer process. If this assumption does not really apply, you're better off using some other technique to pick the number of clusters, such as the uniform process from Wallach et al.: http://www.cs.umass.edu/~wallach/publications/wallach10alternative.pdf
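
The rich-get-richer behaviour can be seen by simulating the Chinese restaurant process, which is the distribution over partitions implied by a DP. The following is a minimal sketch. The function name and the concentration value alpha = 1.0 are assumptions made for the example.

```python
import numpy as np

def sample_crp(num_customers, alpha, rng):
    """Seat num_customers >= 1 customers by a Chinese restaurant process.

    Returns the table index of each customer and the final table sizes.
    """
    assignments = [0]  # the first customer always opens table 0
    counts = [1]
    for i in range(1, num_customers):
        # Existing table k is chosen with probability counts[k] / (i + alpha),
        # a new table with probability alpha / (i + alpha).
        probs = np.array(counts + [alpha], dtype=float) / (i + alpha)
        table = int(rng.choice(len(probs), p=probs))
        if table == len(counts):
            counts.append(1)
        else:
            counts[table] += 1
        assignments.append(table)
    return np.array(assignments), np.array(counts)

rng = np.random.default_rng(0)
assignments, counts = sample_crp(200, alpha=1.0, rng=rng)
print("table sizes:", sorted(counts.tolist(), reverse=True))
```

Typical draws show a few large tables and a tail of very small ones. If the clusters in your data do not look like that, the DP prior on cluster sizes may be a poor match.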

(Nov 08 '10 at 06:30) Alexandre Passos ♦

Also, the only "proper" way to evaluate a clustering algorithm is to use the clustering as part of a system that has a well-defined performance measure, and then compare how end-to-end performance changes with the clustering algorithm used.

(Nov 08 '10 at 06:33) Alexandre Passos ♦

User submitted content is under Creative Commons: Attribution - Share Alike; Other things copyright (C) 2010, MetaOptimize LLC.