I have written some Java code implementing the Baum-Welch algorithm to experiment with training hidden Markov models. So far the program only calculates the alpha, beta, gamma and xi values, since I have not written the re-estimation step yet. My understanding is that, given the correct initial-state, state-transition and emission probabilities for the model, the gamma values generated by the algorithm for t=0 should give exactly the initial state probabilities. (Reminder: the gamma value for state i at time t is the probability of being in state i at time t, given the whole observation sequence and the model.) Is this normal behavior? Is it possible to have a model that represents the observations better than the actual model? Why does the law of large numbers work differently here? I hope the question is clear enough. Thanks in advance.
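For reference, these are the standard forward-backward quantities the question refers to. They are written in the usual textbook notation (π_i for initial-state probabilities, a_{ij} for transitions, b_j(o_t) for emissions, time running from t = 1 to T); this notation is an assumption and not taken from the question's code, which appears to start at t = 0.

```latex
\alpha_1(i) = \pi_i \, b_i(o_1), \qquad
\alpha_{t+1}(j) = \Big[ \sum_i \alpha_t(i) \, a_{ij} \Big] b_j(o_{t+1})

\beta_T(i) = 1, \qquad
\beta_t(i) = \sum_j a_{ij} \, b_j(o_{t+1}) \, \beta_{t+1}(j)

\gamma_t(i) = P(q_t = i \mid O, \lambda)
            = \frac{\alpha_t(i) \, \beta_t(i)}{\sum_j \alpha_t(j) \, \beta_t(j)}
```

At t = 1 this gives γ_1(i) ∝ π_i b_i(o_1) β_1(i), so γ_1(i) reduces to π_i only when the observations carry no information about the first state.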
Let me first state my understanding of your experimental setup:
And your question is why the values you computed don't reflect the original parameters which generated your data; gamma seems to be your (only?) problem here. If gamma is the only problem, then that is easily explained. The initial state probability is P(s(1) = i | lambda): the first state is drawn conditional on nothing else (not even the first observation!). If you were to simulate many, many chains from your model, you should see that the distribution of s(1) is essentially equal to the distribution described by your initial state parameter. Gamma(1), on the other hand, is the probability of the initial state given every observed value from the first time step until the end of your chain. In other words, it includes information from all observations, whereas the initial state parameter includes none of this information. Thus the two should not, in general, be the same value. Hope this answers your question!
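The law of large numbers does still apply, just to a different quantity: by the law of total probability, averaging Gamma(1) over many sequences drawn from the model recovers the initial state parameter, since E_O[P(s(1) = i | O, lambda)] = P(s(1) = i | lambda). The Java sketch below illustrates this with a hypothetical two-state, two-symbol model (the numbers in `PI`, `A` and `B` are made up, not from the question). It uses an unscaled forward-backward pass, which is only safe for short sequences; real code should scale or work in log space. Arrays are 0-indexed, so `alpha[0]` corresponds to t = 1 in the usual notation.

```java
import java.util.Random;

/** Sketch: compare gamma_1 from forward-backward with the initial-state parameter pi. */
public class GammaVsPi {
    // Hypothetical model parameters, chosen only for illustration.
    static final double[] PI = {0.6, 0.4};
    static final double[][] A = {{0.7, 0.3}, {0.4, 0.6}}; // A[from][to]
    static final double[][] B = {{0.9, 0.1}, {0.2, 0.8}}; // B[state][symbol]

    /** Returns gamma_1(i) = P(first state = i | obs, model) for one sequence. */
    static double[] gamma1(int[] obs) {
        int n = PI.length, T = obs.length;
        double[][] alpha = new double[T][n];
        double[][] beta = new double[T][n];
        // Forward pass (unscaled).
        for (int i = 0; i < n; i++) alpha[0][i] = PI[i] * B[i][obs[0]];
        for (int t = 1; t < T; t++)
            for (int j = 0; j < n; j++) {
                double s = 0;
                for (int i = 0; i < n; i++) s += alpha[t - 1][i] * A[i][j];
                alpha[t][j] = s * B[j][obs[t]];
            }
        // Backward pass (unscaled).
        for (int i = 0; i < n; i++) beta[T - 1][i] = 1.0;
        for (int t = T - 2; t >= 0; t--)
            for (int i = 0; i < n; i++) {
                double s = 0;
                for (int j = 0; j < n; j++) s += A[i][j] * B[j][obs[t + 1]] * beta[t + 1][j];
                beta[t][i] = s;
            }
        // Normalize alpha * beta at the first time step.
        double[] g = new double[n];
        double z = 0;
        for (int i = 0; i < n; i++) { g[i] = alpha[0][i] * beta[0][i]; z += g[i]; }
        for (int i = 0; i < n; i++) g[i] /= z;
        return g;
    }

    /** Draws an index from the discrete distribution p. */
    static int sample(double[] p, Random rng) {
        double u = rng.nextDouble(), c = 0;
        for (int i = 0; i < p.length; i++) { c += p[i]; if (u < c) return i; }
        return p.length - 1;
    }

    /** Simulates one observation sequence of length T from the model. */
    static int[] simulate(int T, Random rng) {
        int[] obs = new int[T];
        int s = sample(PI, rng);
        for (int t = 0; t < T; t++) {
            obs[t] = sample(B[s], rng);
            if (t + 1 < T) s = sample(A[s], rng);
        }
        return obs;
    }

    /** Averages gamma_1 over many independently simulated sequences. */
    static double[] averageGamma1(int runs, int T, long seed) {
        Random rng = new Random(seed);
        double[] avg = new double[PI.length];
        for (int r = 0; r < runs; r++) {
            double[] g = gamma1(simulate(T, rng));
            for (int i = 0; i < avg.length; i++) avg[i] += g[i] / runs;
        }
        return avg;
    }

    public static void main(String[] args) {
        double[] single = gamma1(simulate(10, new Random(1)));
        System.out.printf("gamma_1, one sequence: [%.3f, %.3f]%n", single[0], single[1]);
        double[] avg = averageGamma1(50000, 10, 2L);
        System.out.printf("gamma_1, averaged:     [%.3f, %.3f]%n", avg[0], avg[1]);
        System.out.printf("pi:                    [%.3f, %.3f]%n", PI[0], PI[1]);
    }
}
```

For a single sequence, gamma_1 is pulled toward whichever state best explains the first few observations (for example, with only the observation 0 it is 0.54 / 0.62, about 0.871, for state 0). The average over many simulated sequences, however, should land close to `PI`.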
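If by "represents the observations better than the original model" you mean that a trained model can assign a higher likelihood to your particular observation sequence than the true model does (i.e. it overfits), then yes, that is possible. Otherwise please clarify, as I don't understand your question.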