Hello everybody, I would like to pick up the discussion from a previous question on the Difference between MRF and CRF and ask a few related questions that I'm still unsure about.

1 - In what way does the discriminative approach p(labels | data) allow including the entire dataset, while p(labels, data) doesn't? Is it even correct that we can use the entire dataset BECAUSE of the discriminative approach? Allow me to propose the following explanation, and please correct me if I'm wrong: since the data is always given and we only condition on it, looking at all of it cannot violate any Markov assumption, because those assumptions only concern the labels. Is this the point?

2 - A common claim is that modelling the joint p(labels, data) is computationally more expensive because p(data) must be "enumerated". What is this referring to?

3 - I remember reading papers on MRFs that don't just use the data values but also first derivatives of the data at every site. Why is this still an MRF? If arbitrarily many derivatives are allowed without violating the MRF property, I could just take the derivatives and use a Taylor expansion to get a look at the entire image. Would this not rather be a case for a CRF?

Any help would be most appreciated. Thanks, Bert
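For reference, the contrast behind questions 1 and 2 can be written out with generic clique potentials (the notation is not from the thread: y for the labels, x for the data, psi_c for the clique potentials); the first normaliser below is what "enumerating p(data)" refers to:

    % MRF / generative: one global normaliser over labels AND data
    p(y, x) = \frac{1}{Z} \prod_{c} \psi_c(y_c, x_c),
    \qquad
    Z = \sum_{y'} \sum_{x'} \prod_{c} \psi_c(y'_c, x'_c)

    % CRF / discriminative: a per-instance normaliser over labels only
    p(y \mid x) = \frac{1}{Z(x)} \prod_{c} \psi_c(y_c, x),
    \qquad
    Z(x) = \sum_{y'} \prod_{c} \psi_c(y'_c, x)

Computing Z means summing over every possible data configuration x' as well as every labelling, while Z(x) only sums over labellings of the one observed x, so the CRF potentials may depend on all of x at no extra cost.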
In order,
Dear Alexandre, thank you very much for your detailed reply. 1 - Yes, the basic idea was that I could (if I found this reasonable) run some preprocessing like this. I admit that if the problem were finding singular/plural words, it would be pointless to make this distinction already in the preprocessing. My main problem is this: you've said "Using discriminative training allows to use features that condition on the observed sequence at no penalty in terms of model complexity", and this is the point I don't really understand. Let's forget about the sheep example, it was probably quite stupid. But: the reason that I can include the entire data "for free" without violating the Markov property is, as I understand it, the following: the observed sequence is always known, so conditioning every feature on all of it requires no independence assumptions about the data; the Markov assumptions only concern the labels.
2 - I see, thanks. 3 - I'm afraid I don't have my documents here. From what I remember, it was a gradient over the image data that was used to detect discontinuities in the image information. I may be wrong, and I think the question is in fact not so important. Again, thank you very much for your helpful reply. Regards, Bert
(Nov 16 '11 at 17:09)
Bert Draw
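The point made under 1 above can be written out for the standard linear-chain case (standard CRF notation, not from the thread: f_k are feature functions, lambda_k their weights, T the sequence length):

    p(y \mid x) = \frac{1}{Z(x)}
        \exp\!\Big( \sum_{t=1}^{T} \sum_{k} \lambda_k \, f_k(y_{t-1}, y_t, x, t) \Big)

The Markov factorisation is only over consecutive labels (y_{t-1}, y_t); each feature f_k is free to read the entire observed sequence x, or any deterministic function of it such as image gradients, because x is never a random variable of the model.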
Regarding your comments on 1, remember that the only difference between CRFs and MRFs is in parameter estimation: in one case I model P(labels | data) and in the other case the joint P(labels, data). So, as far as the CRF is concerned, features over labels and data are effectively features over labels only, because all inference and learning is done with respect to the labels. If you use MRFs you also need to model the data, and this will often make for a more complex model. The difference is not about what is known or not, but about what is being learned, and here CRFs do not learn anything about the data (MRFs do), so dependencies involving the data are free.
(Nov 16 '11 at 17:16)
Alexandre Passos ♦
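To illustrate "dependencies involving the data are free", here is a minimal, hypothetical sketch in Python (not tied to any particular CRF package): a feature extractor for sequence labelling in which every feature may read the whole observed sentence, and none of this adds anything to the label-side model.

    # Hypothetical feature extractor for a linear-chain CRF over word sequences.
    # Every feature may inspect the ENTIRE observed sentence x (previous/next
    # words, global sentence properties, ...) because the data is only
    # conditioned on, never modelled.

    def word_features(x, t):
        """Features for position t of the fully observed sentence x."""
        word = x[t]
        return {
            "bias": 1.0,
            "word.lower=" + word.lower(): 1.0,
            "word.istitle": float(word.istitle()),
            "suffix3=" + word[-3:]: 1.0,
            # Features that look beyond position t are harmless for a CRF,
            # since the Markov structure is only over the label sequence.
            "prev_word=" + (x[t - 1].lower() if t > 0 else "<BOS>"): 1.0,
            "next_word=" + (x[t + 1].lower() if t < len(x) - 1 else "<EOS>"): 1.0,
            "sentence_length": float(len(x)),
        }

    def sentence_features(x):
        return [word_features(x, t) for t in range(len(x))]

    if __name__ == "__main__":
        sentence = ["The", "sheep", "graze", "on", "the", "hill"]
        for t, feats in enumerate(sentence_features(sentence)):
            print(sentence[t], sorted(feats)[:3], "...")

Training an MRF with the same information would instead require specifying how the neighbouring words, the sentence length, and so on are generated, which is exactly the extra modelling burden described in the comment above.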
Thank you very much.
(Nov 16 '11 at 17:23)
Bert Draw