I have a 2-state HMM that I train offline. Then I receive real-time observations, keep a rolling window, and decode over the emissions it contains. However, I get a lot of abrupt jumps in the outcome (the probability distribution over the states). What can cause this? Is the emission alphabet too large, perhaps? After the training stage, the emission probabilities of the two states seem to differ by much more than they should. Is this due to an initialization problem, or is my training interval too small? Where should I look for the source of the problem?

I used MATLAB's hmmtrain() and hmmdecode(). Thank you.
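For reference, the setup described above might look roughly like this in MATLAB; the variable names, alphabet size, initial guesses, and window length are assumptions for illustration, not values from the question:

    % Offline training, then posterior decoding over a rolling window.
    K = 50;                                   % assumed emission alphabet size
    trGuess = [0.9 0.1; 0.1 0.9];             % assumed initial guesses
    emGuess = ones(2, K) / K;
    [trEst, emEst] = hmmtrain(trainSeq, trGuess, emGuess);  % offline stage
    W = 2000;                                 % assumed rolling-window length
    window = seq(end-W+1:end);                % most recent W observations
    pStates = hmmdecode(window, trEst, emEst);  % 2-by-W posterior probabilities
    current = pStates(:, end);                % state distribution "now"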

asked Mar 07 '12 at 00:18

Viktor Simjanoski

The transition matrix looks like

    9.1357e-01   8.6425e-02
    7.9794e-02   9.2021e-01

so I really have no idea what the source of this instability is. Any clues?

(Mar 07 '12 at 16:53) Viktor Simjanoski

Do you expect the model to be "not jumpy"? You could add an "inertia prior" by boosting the prior probability of self-transitions.

(Mar 07 '12 at 16:57) Travis Wolfe

Yes, I expect it to be much less jumpy than it is. Even though the time window includes around 2,000 observations, a single new one can flip the posterior from [0.9 0.1] to [0.1 0.9]. Can you please elaborate on your suggestion of an 'inertia prior'? How exactly do I use it? Where can I learn more about this? Thanks.

(Mar 07 '12 at 17:00) Viktor Simjanoski

2 Answers:

If you really believe your data should not be "jumpy", I think the best way to capture this is in your prior on the transition matrix. You estimate your transition probabilities from (partial) counts in your transition count matrix. To add a prior of "inertia", add counts to the diagonal before training. Boosting the diagonal counts boosts the probability of self-transitioning, i.e., of staying in the same state at time t as at time t-1. The bigger the count you add to the diagonal, the stronger your prior and the less you "trust" the local search in your unsupervised training.
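A minimal MATLAB sketch of this idea, assuming you switch hmmtrain to its Viterbi algorithm, which accepts a pseudocount matrix through the 'Pseudotransitions' option (the alphabet size, initial guesses, and prior strength below are made-up values for illustration):

    % "Inertia prior": add extra pseudocounts to the diagonal of the
    % transition count matrix used during training.
    K = 50;                               % assumed emission alphabet size
    trGuess = [0.9 0.1; 0.1 0.9];         % assumed initial transition guess
    emGuess = ones(2, K) / K;             % uniform initial emission guess
    inertia = 1000;                       % assumed prior strength (extra counts)
    pseudoTR = inertia * eye(2);          % counts added to the diagonal only
    [trEst, emEst] = hmmtrain(seq, trGuess, emGuess, ...
        'Algorithm', 'Viterbi', 'Pseudotransitions', pseudoTR);

The larger the inertia value, the closer the estimated diagonal stays to 1, regardless of what the data says.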

If you want to get really fancy, you can try posterior regularization of the transition matrix.

answered Mar 10 '12 at 17:35

Travis%20Wolfe's gravatar image

Travis Wolfe
235119

Are you using supervised or unsupervised training? If unsupervised, it might just be that your data is much better explained by a jumpy model than by a persistent one (for example, observations mixed from two clusters are very well explained by a "jumpy" model). If so, why do you think your model should be less jumpy? Is there some meaning you expect these HMM states to carry that they are not carrying? Which properties of that meaning match the assumptions behind hidden Markov models, and which don't? Regardless, if you think more weight should be given to the transition probabilities, feel free to exponentiate them and renormalize, as in the sketch below.
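As a rough illustration of that last suggestion, using the transition matrix from the comments above (the exponent is an assumed tuning knob, not a value from this thread):

    % Sharpen the transition matrix by exponentiating and renormalizing rows.
    TR = [9.1357e-01 8.6425e-02; 7.9794e-02 9.2021e-01];
    gamma = 3;                              % assumed exponent; >1 sharpens rows
    TRsharp = TR .^ gamma;                  % element-wise power
    TRsharp = bsxfun(@rdivide, TRsharp, sum(TRsharp, 2));  % rows sum to 1 again
    % The diagonal entries now dominate even more, so decoding gives
    % relatively more weight to transitions than to emissions.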

If you're using supervised training, then it might be that you'd be better off using a CRF than an HMM, as it seems like the emissions are outweighing the transitions significantly, and CRFs can properly weight these two factors.

answered Mar 07 '12 at 17:37

Alexandre Passos ♦

edited Mar 07 '12 at 17:38

It is unsupervised. Yes, the outcome has a certain physical meaning that I do not expect to change so rapidly. The changes are in the right direction, but they are too abrupt. When I apply an exponential moving average (EMA) to the outcome, it looks much more satisfactory, but this is just a hack, not a satisfactory solution. Any suggestions on what I can do to solve the problem more cleanly and methodically?
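For concreteness, the EMA hack mentioned above could be something like this (the smoothing factor and variable names are assumptions):

    % Exponential moving average over the decoded posterior pStates (2-by-T).
    alpha = 0.05;                           % assumed smoothing factor
    smoothed = pStates;
    for t = 2:size(pStates, 2)
        smoothed(:, t) = alpha * pStates(:, t) + (1 - alpha) * smoothed(:, t-1);
    end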

(Mar 07 '12 at 18:06) Viktor Simjanoski

I also believe that your emission probability values have a significant influence. You might try putting a prior on your emission distribution (see the sketch below). Alternatively, revisit the independence assumptions you made about your observations. For example, a strong assumption of full independence between observation variables tends to give either very high or very low emission probabilities.
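In the same spirit as the transition prior above, here is a minimal sketch of an emission prior, again assuming Viterbi training with hmmtrain (the pseudocount value, alphabet size, and initial guesses are assumptions):

    % Flat pseudocounts on the emission counts, so no symbol's probability
    % collapses to (near) zero for either state.
    K = 50;                                % assumed emission alphabet size
    trGuess = [0.9 0.1; 0.1 0.9];          % assumed initial guesses
    emGuess = ones(2, K) / K;
    pseudoE = 5 * ones(2, K);              % assumed pseudocount per symbol
    [trEst, emEst] = hmmtrain(seq, trGuess, emGuess, ...
        'Algorithm', 'Viterbi', 'Pseudoemissions', pseudoE);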

(Mar 08 '12 at 05:42) Christopher Tay

Will decreasing the emission alphabet size make the model less jumpy?

(Mar 08 '12 at 15:14) Viktor Simjanoski