I'm curious about how to handle the normalization constant in noise-contrastive estimation (NCE) when using it to train an unnormalized model in a supervised way (i.e., to model a conditional distribution). In particular, I'm interested in estimating the probabilities of words in a huge vocabulary, given some input, without having to sum over all words.
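
To fix notation (my own, following Gutmann and Hyvärinen's setup): with $s_\theta(w, u)$ the unnormalized score of word $w$ in context $u$, $c_u$ the per-context log normalizer (the quantity the papers below either learn or pin to 0), $q$ the noise distribution, and $k$ the number of noise samples per data point, the conditional NCE objective I have in mind is

$$J(\theta) = \mathbb{E}_{w \sim p_d(\cdot \mid u)}\left[\log \sigma\big(\Delta_\theta(w, u)\big)\right] + k\, \mathbb{E}_{w \sim q}\left[\log\big(1 - \sigma(\Delta_\theta(w, u))\big)\right],$$

where $\Delta_\theta(w, u) = s_\theta(w, u) - c_u - \log\big(k\, q(w)\big)$ and $\sigma$ is the logistic sigmoid.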

Here's what I've seen in the literature:

  • Mnih and Teh (2012) say they tried having one free normalizing parameter (the log of the partition function) for each value of the input (each "context") during training, and that it did not give better results than simply fixing all of them to zero (the variant sketched in code after this list). However, during validation and testing, they used explicitly-normalized probabilities, and they did not report whether the unnormalized probabilities were close to the normalized ones, or whether using them would have impacted the final results.

  • Gutmann and Hyvärinen (2012) mention that "The maximizing pdf is found to have unit integral automatically", but I think this result assumes there is a free normalizing parameter, and I am not sure the theory still holds when that parameter is pinned to 0.

  • Mnih and Kavukcuoglu (2013) cite the previous reference to justify ignoring the normalization term. However, their focus is on learning embeddings, not on obtaining accurate probability estimates from an unnormalized model, so we do not know whether the estimated conditional probabilities actually ended up approximately normalized.

  • Xiao and Guo (2013) seem to have attempted to explicitly model the normalization parameters as a function of the context, but provide little explanation of how they are modelled (beyond their initialization to 0), and do not report the magnitude of the values that were actually learned (during training or at the end).
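
For concreteness, here is a minimal sketch (in PyTorch, with my own illustrative names; this is not code from any of the papers above) of the conditional NCE loss with the per-context log normalizer pinned to 0, i.e. treating the unnormalized score directly as a log-probability:

```python
import torch
import torch.nn.functional as F

def nce_loss(scores_data, scores_noise, log_q_data, log_q_noise, k):
    """Conditional NCE loss with the per-context log normalizer c_u pinned to 0.

    scores_data:  (batch,)    s(w, u) for the observed word in each context
    scores_noise: (batch, k)  s(w', u) for the k noise words drawn per context
    log_q_data:   (batch,)    log q(w) of each observed word under the noise dist.
    log_q_noise:  (batch, k)  log q(w') of each noise word
    """
    log_k = torch.log(torch.tensor(float(k)))
    # Delta = log p_model - log(k q); sigma(Delta) = P(sample came from the data)
    delta_data = scores_data - log_q_data - log_k
    delta_noise = scores_noise - log_q_noise - log_k
    # -log sigma(x) = softplus(-x) and -log(1 - sigma(x)) = softplus(x)
    return (F.softplus(-delta_data) + F.softplus(delta_noise).sum(dim=1)).mean()
```

Learning a per-context normalizer instead would just mean subtracting a parameter c_u (looked up or predicted from the context) from both score tensors before computing Delta; what I am asking about is whether pinning c_u = 0, as above, still yields approximately normalized probabilities.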

Do any of you have any more insight on that, or additional results or references?

asked Apr 07 '14 at 05:47

Pascal Lamblin

