|
Suppose I have an OCR engine that makes 3 kinds of mistakes:
I'm given pairs of string (correct and observed word) and need to learn a model of P(observed|correct). Each pair can have several different error sequences associated with it, so it makes sense to model error sequence as a hidden variable and use some kind of Expectation Maximization
The problem is that optimal E step doesn't have closed form, so some compromises need to be made. My question -- has anyone seen a very similar model in the literature, and how did they train it? |
|
Indeed, I think this is very similar to the Brown alignment models used in machine translation. I could be wrong, but they also have intractable E steps (this is somewhat dealt with by clever initialization and approximations). See the original Brown et al paper, the mathematics of machine translation, for more details. Another thing that might help you is using hard EM (or viterbi EM), in which you maximize in the E step instead of computing an expectation, as maximization in this model is just the usual dynamic programming text alignment/diff algorithm.
(May 11 '11 at 14:42)
Alexandre Passos ♦
Well, if you do E-step on a per-example basis, then it's tractable. In other words, you compute separate distribution over paths for each example. I was getting non-closed form when I tried to get the E step by following steps in Minka's "Expectation Maximization as lower bound maximization" -- distribution over hidden variables seems to jointly depend on all training examples
(May 11 '11 at 23:53)
Yaroslav Bulatov
|