I wanted to know what the mathematical justification is for using ICM (iterated conditional modes) as an approximation to the E step in an EM algorithm.

As I understand it, in the E step the idea is either to find a distribution equal to the posterior distribution of the latent variables, which guarantees that the likelihood increases, or to find the best possible distribution from some simpler family of distributions, which guarantees that a lower bound on the likelihood increases.
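For concreteness, the standard decomposition I have in mind is

```latex
\log p(x \mid \theta)
  = \underbrace{\mathbb{E}_{q(z)}\!\left[ \log \frac{p(x, z \mid \theta)}{q(z)} \right]}_{\mathcal{F}(q,\, \theta)}
  + \mathrm{KL}\big( q(z) \,\|\, p(z \mid x, \theta) \big)
```

so setting q(z) equal to the posterior makes the bound tight (the exact E step), while restricting q to a simpler family still guarantees that the lower bound F(q, θ) does not decrease.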

How does one mathematically justify the use of ICM in such an E step? Any references, derivations, or notes would be very helpful.

Thanks for your help.

asked May 08 '13 at 17:27

Percy Denton


2 Answers:

I don't have a direct answer but hopefully I can point you in the right direction.

I know that for mixture models, if you run EM with hard assignments (i.e., computing the mode of the latent variables in the E step instead of the full posterior), you are no longer optimizing the likelihood but an objective that also tries to minimize the overlap between clusters. Unfortunately I don't know of a paper where this is analysed, but in practice it works quite well and can converge faster than true EM. Since ICM approximates the mode of the latent variables, I assume a similar argument applies and a related objective is being optimized.

The procedure is akin to a hybrid of k-means and EM (see the sketch below), and also similar to Viterbi training of hidden Markov models, where the most likely sequence of hidden states is computed in the E step instead of the posterior. It may be easier to find literature that motivates this approach for HMMs.
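To make the contrast concrete, here is a minimal sketch (my own illustration, not taken from any particular paper) of the soft versus hard E step for a one-dimensional Gaussian mixture with known, shared variance. Swapping `hard_e_step` in for `soft_e_step` turns ordinary EM into the k-means-like procedure described above; all function names are mine:

```python
# Minimal sketch: soft (posterior) vs. hard (mode) E-step for a 1-D
# Gaussian mixture with fixed unit variances. Illustrative only; e.g.,
# empty clusters are not handled.
import numpy as np

def soft_e_step(x, means, weights, sigma=1.0):
    """Standard E-step: responsibilities p(z = k | x_i)."""
    # log N(x_i; mu_k, sigma^2), up to a constant shared across k
    log_lik = -0.5 * ((x[:, None] - means[None, :]) / sigma) ** 2
    log_post = np.log(weights)[None, :] + log_lik
    log_post -= log_post.max(axis=1, keepdims=True)  # numerical stability
    post = np.exp(log_post)
    return post / post.sum(axis=1, keepdims=True)

def hard_e_step(x, means, weights, sigma=1.0):
    """Hard E-step: a point mass at the mode of each posterior."""
    resp = soft_e_step(x, means, weights, sigma)
    hard = np.zeros_like(resp)
    hard[np.arange(len(x)), resp.argmax(axis=1)] = 1.0
    return hard

def m_step(x, resp):
    """The M-step is identical in both variants: weighted MLE updates."""
    nk = resp.sum(axis=0)
    means = (resp * x[:, None]).sum(axis=0) / nk
    weights = nk / len(x)
    return means, weights

rng = np.random.default_rng(0)
x = np.concatenate([rng.normal(-2, 1, 200), rng.normal(3, 1, 200)])
means, weights = np.array([-1.0, 1.0]), np.array([0.5, 0.5])
for _ in range(20):
    means, weights = m_step(x, hard_e_step(x, means, weights))
print(means, weights)  # recovers the two well-separated components
```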

answered May 09 '13 at 14:52

Philemon Brakel

By finding the mode instead of the full distribution in the E step, you are optimizing the "Viterbi likelihood", which can be interpreted as the standard log likelihood plus a strong regularization term on the conditional entropy of the latent variables (a sketch of this decomposition follows the references below). See the following two papers:

"Unambiguity Regularization for Unsupervised Learning of Probabilistic Grammars", EMNLP-CoNLL 2012

"Unified Expectation Maximization", NAACL 2012

answered May 10 '13 at 00:28

tukw
