Hi. Given an input sequence X = x1, x2, ..., xn, my goal is to predict the best segmentation sequence S = s1, s2, ..., sm, and to predict a tag ti for each segment si, so the predicted tag sequence is T = t1, t2, ..., tm. As a discriminative model, given a sequence X, the model should predict the best S and T jointly, that is, argmax_{S,T} P(S, T | X).
Is there any method that can solve this problem, or have I made a mistake somewhere? Thanks very much.
Conditional random fields (or hidden Markov models) with encodings such as BIO are the usual tools for this kind of problem. See other questions on this site for good references on these methods. The general name for this task is sequence labeling, and you may have good luck searching for solutions to problems like NP chunking in natural language processing.

Hi Alexandre, I know BIO and BCEO; these are token-based methods, where segmentation is converted to tagging. Am I right? But token-based methods have the disadvantage that they cannot easily use features of whole segments. Actually, I would like to know whether it is possible to solve the argmax problem directly. Or maybe I do not understand the BIO method? Thank you very much.
(Jun 02 '11 at 08:37)
binbinsh
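To make the reduction Alexandre describes concrete, here is a minimal sketch (function names are illustrative, not from any library) of converting a tagged segmentation to token-level BIO tags and back. Once in BIO form, an ordinary token-level labeler such as a linear-chain CRF applies directly.

```python
# Sketch of the BIO reduction: segmentation + per-segment tags
# become one tag per token (B- starts a segment, I- continues it).

def segments_to_bio(segments):
    """segments: list of (tokens, tag) pairs -> flat list of BIO tags."""
    bio = []
    for tokens, tag in segments:
        bio.append("B-" + tag)                     # first token of the segment
        bio.extend(["I-" + tag] * (len(tokens) - 1))  # remaining tokens
    return bio

def bio_to_segments(tokens, bio):
    """Invert the encoding: group tokens back into tagged segments."""
    segs = []
    for tok, lab in zip(tokens, bio):
        if lab.startswith("B-") or not segs:
            segs.append(([tok], lab.split("-", 1)[1]))  # start a new segment
        else:
            segs[-1][0].append(tok)                     # continue current segment
    return segs
```

The loss of segment-level features that binbinsh points out is real: in this encoding a model only sees per-token tags, so a feature like "the whole segment is a known phrase" has no natural place.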
If you are interested in features of the segmentation, you likely want a hidden semi-Markov model or a semi-Markov CRF.
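For intuition, here is a sketch of semi-Markov Viterbi decoding, the argmax that a semi-Markov CRF solves at prediction time: it jointly picks the segmentation and the tags by maximizing a sum of segment-level scores. The `score` function is a stand-in (an assumption for illustration); in a trained semi-CRF it would be a dot product of learned weights with features of the whole segment.

```python
# Joint decoding over segmentations and tags via dynamic programming.
# Complexity: O(n * max_len * |tags|) calls to `score`.

def semi_markov_viterbi(x, tags, score, max_len):
    """Return (segments, labels) maximizing the sum of score(x, i, j, t)
    over segments x[i:j] labeled t, with segments at most max_len long."""
    n = len(x)
    best = [float("-inf")] * (n + 1)  # best[j]: best score of any segmentation of x[:j]
    best[0] = 0.0
    back = [None] * (n + 1)           # back[j] = (i, t): last segment x[i:j], tag t
    for j in range(1, n + 1):
        for i in range(max(0, j - max_len), j):
            if best[i] == float("-inf"):
                continue
            for t in tags:
                s = best[i] + score(x, i, j, t)
                if s > best[j]:
                    best[j] = s
                    back[j] = (i, t)
    # Follow backpointers to recover the segments and their tags.
    segs, labs = [], []
    j = n
    while j > 0:
        i, t = back[j]
        segs.append(x[i:j])
        labs.append(t)
        j = i
    return segs[::-1], labs[::-1]
```

Because `score` sees the whole span x[i:j], segment-level features (segment length, whether the span matches a dictionary entry, etc.) are available in a way they are not under token-level BIO tagging.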
Paul Mineiro over at Machined Learning has a really nice sequence of posts on dyadic learning (learning over a space of dyads, or pairs) (one, two, three, four). He doesn't explicitly address sequence modeling, but it would be a relatively straightforward extension to include a latent variable in a Markov chain.
What is your definition of the best segmentation sequence? An ordering? A ranking?
Assuming you have labelled data, you could learn a model of segmentation given the sequence, then learn a model of tags given the sequence and the segmentation.
Hi Yaroslav, your method may be applicable, but here the segmentation is related to the tagging; if the two are combined and solved jointly, I think the model's results will be better.