Hello all, I hope someone could shed a bit of light on this issue. I am trying to get a grasp on CRF but I get a bit lost in the forest of details. My doubts could be summarized as follows:
Any help would be great!
I don't have Bishop's book here, but from memory I am quite sure that he optimizes P(denoised-pixels|all-pixels) and calls it an MRF, which does seem odd. @Alexandre: I completely agree with your definition, but do you know a place where it is written down? Also, the Wikipedia page on CRFs is in pretty bad shape :( If there were a good reference, one could fix that up.
(Dec 25 '10 at 05:02)
Andreas Mueller
@Andreas: I like Sutton and McCallum's updated version of their "An introduction to conditional random fields" on arXiv: http://arxiv.org/abs/1011.4088
(Dec 25 '10 at 05:12)
Alexandre Passos ♦
But there also ought to be a difference in the model itself, right? We should require more parameters for an MRF to capture the distribution of the features...
(Dec 25 '10 at 05:14)
Roderick Nijs
@Roderick: Not really. For each directed generative model you can think of (naive Bayes, hidden Markov models, etc.) you can build an undirected conditional model with precisely the same number of parameters that is a CRF (in these cases, logistic regression and linear-chain CRFs). Likewise, for every CRF you could theoretically maximize the joint likelihood and use it as a generative model, but in practice this is harder due to issues with normalization. That is why you can do things in CRFs that you don't see in hidden Markov models, such as adding loads of features to each word, adding features for arbitrarily many previous and subsequent words, etc.
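To make the naive Bayes / logistic regression pairing concrete, here is a minimal sketch, assuming binary features and a binary class. It checks numerically that the conditional p(y=1|x) implied by a Bernoulli naive Bayes model has exactly the logistic-regression form sigmoid(w.x + b). The function names and the parameter values are hypothetical and chosen only for illustration.

```python
import numpy as np

def nb_posterior(x, prior, theta0, theta1):
    """p(y=1 | x) computed from the naive Bayes joint p(x, y)."""
    p1 = prior * np.prod(theta1**x * (1 - theta1)**(1 - x))
    p0 = (1 - prior) * np.prod(theta0**x * (1 - theta0)**(1 - x))
    return p1 / (p0 + p1)

def nb_to_logistic(prior, theta0, theta1):
    """Weights and bias of the logistic model with the same conditional."""
    w = np.log(theta1 / theta0) - np.log((1 - theta1) / (1 - theta0))
    b = np.log(prior / (1 - prior)) + np.sum(np.log((1 - theta1) / (1 - theta0)))
    return w, b

def logistic_posterior(x, w, b):
    """p(y=1 | x) under logistic regression."""
    return 1.0 / (1.0 + np.exp(-(w @ x + b)))

# Illustrative (made-up) parameters: p(y=1), p(x_k=1 | y=0), p(x_k=1 | y=1).
prior = 0.3
theta0 = np.array([0.2, 0.5, 0.7])
theta1 = np.array([0.6, 0.1, 0.9])
w, b = nb_to_logistic(prior, theta0, theta1)
```

The two models share the same conditional form; what differs is the training objective (joint likelihood for naive Bayes, conditional likelihood for logistic regression).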
(Dec 25 '10 at 05:21)
Alexandre Passos ♦
There's some inconsistency in terminology between NLP and Vision papers -- vision papers sometimes call their models "CRF" even when they are not normalized and not trained to maximize any kind of likelihood.
BTW, you can view an image CRF as a family of Markov Random Fields -- each new observation gives a random field which is Markov with respect to the grid graph.
That point is not yet completely clear... how would that definition fit into Bishop's example? You can find it here (page 48 in the PDF):
http://research.microsoft.com/en-us/um/people/cmbishop/prml/bishop-prml-sample.pdf
The Graphical Models chapter of Bishop's book happens to be the sample chapter :)
@Roderick: He's not estimating parameters in the denoising (computing the probabilities of denoised pixels given observed pixels), as there are only two parameters in the model (beta and eta). What he's doing is MAP inference on the variables, and this inference is conditional. So it's not a CRF, as the parameters are not trained to maximize the conditional likelihood.
I'm starting to feel really stupid, but here I go anyway:
He doesn't actually say how he estimates beta and eta... but doesn't eta capture exactly "[...]the probabilities of denoised pixels given observed pixels"?
Or, to put it differently: if the only difference between a CRF and an MRF is the training, how do you know it is not a CRF if he does not say how he trained it?
@Roderick: Yeah, maybe it would be a CRF, but that's beside the point, as what's most important in that model is not the values of the two parameters but the soft constraints that original and denoised pixels should agree and neighboring pixels should agree. He gets good results even with the parameters fixed, so no training is needed and there's no need to call it a CRF.
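For reference, here is a rough sketch of that kind of inference with fixed parameters and no training. It assumes pixels in {-1, +1}, a 4-connected grid, and an energy of the form E(x, y) = h*sum_i x_i - beta*sum_{i~j} x_i*x_j - eta*sum_i x_i*y_i, minimized by iterated conditional modes (ICM). The bias term h and the default values beta=1.0, eta=2.1, h=0 are what I recall from the book, so treat them as assumptions; the function names are my own.

```python
import numpy as np

def energy(x, y, beta=1.0, eta=2.1, h=0.0):
    """Energy of labeling x given noisy image y (both arrays in {-1, +1})."""
    pair = np.sum(x[:-1, :] * x[1:, :]) + np.sum(x[:, :-1] * x[:, 1:])
    return h * np.sum(x) - beta * pair - eta * np.sum(x * y)

def icm_denoise(y, beta=1.0, eta=2.1, h=0.0, max_sweeps=10):
    """Coordinate-wise energy minimization; parameters are fixed, not learned."""
    x = y.copy()
    rows, cols = y.shape
    for _ in range(max_sweeps):
        changed = False
        for i in range(rows):
            for j in range(cols):
                # Sum of the 4-connected neighbours that exist.
                nb = 0
                if i > 0: nb += x[i - 1, j]
                if i < rows - 1: nb += x[i + 1, j]
                if j > 0: nb += x[i, j - 1]
                if j < cols - 1: nb += x[i, j + 1]
                # Local energy of x_ij = s is -s * field, so pick s = sign(field).
                field = beta * nb + eta * y[i, j] - h
                new = x[i, j] if field == 0 else (1 if field > 0 else -1)
                if new != x[i, j]:
                    x[i, j] = new
                    changed = True
        if not changed:
            break
    return x

# Tiny demo: a constant +1 image with one flipped pixel.
noisy = np.ones((8, 8), dtype=int)
noisy[3, 3] = -1
restored = icm_denoise(noisy)
```

ICM only finds a local minimum of the energy, but it illustrates the point: everything here is conditional inference on the labels, and nothing is fitted to data.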
A Markov Random Field is a model where the density of the random variable factorizes over the graph. In Bishop's example, the random variable consists of pixel labels, so the density can be called a Markov Random Field for a given image. It will be a different MRF for a different image. You can think of the "Conditional Random Field" framework as a way of creating a family of Markov Random Fields parameterized by some image statistics.
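A toy illustration of the "family of MRFs" view, assuming a 3-pixel chain with labels in {-1, +1} and made-up weights eta and beta: each observed image y yields its own normalized distribution over labelings x, with its own partition function. The function name `make_mrf` is hypothetical, and the brute-force sum is only feasible for tiny examples.

```python
import itertools
import math

def make_mrf(y, eta=2.1, beta=1.0):
    """Return the distribution over labelings x that the observation y induces."""
    n = len(y)
    scores = {}
    for x in itertools.product([-1, 1], repeat=n):
        unary = eta * sum(xi * yi for xi, yi in zip(x, y))       # agreement with y
        pair = beta * sum(x[i] * x[i + 1] for i in range(n - 1))  # smoothness on the chain
        scores[x] = math.exp(unary + pair)
    z = sum(scores.values())  # partition function, recomputed for every y
    return {x: s / z for x, s in scores.items()}

# Two different observations give two different MRFs over the same label space.
p_a = make_mrf((1, 1, -1))
p_b = make_mrf((-1, -1, -1))
```

Both `p_a` and `p_b` factorize over the same chain graph; only the potentials, and hence the normalizer, change with the observation.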
Thank you guys, I guess I just need some time to crystallize these concepts. I'm just going to write down what I understood.
In an MRF, the density p(x=labels, y=image) of some random variables factorizes as a product of functions defined over the cliques.
A CRF "generates" a different MRF depending on the features.
So, an MRF is defined by the graph and the parameters (as many as we want), but if these parameters are a function of the input image (not the case for Bishop's example) we have a CRF.
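For comparison, the two definitions can be written side by side, using the same x = labels, y = image convention and clique potentials \psi_c. This is the standard textbook form as I understand it, not something specific to Bishop's example:

```latex
% MRF: joint density over labels and image, one global normalizer
p(x, y) = \frac{1}{Z} \prod_{c} \psi_c(x_c, y_c),
\qquad
Z = \sum_{x, y} \prod_{c} \psi_c(x_c, y_c)

% CRF: conditional density over labels, normalizer depends on the image
p(x \mid y) = \frac{1}{Z(y)} \prod_{c} \psi_c(x_c, y),
\qquad
Z(y) = \sum_{x} \prod_{c} \psi_c(x_c, y)
```

The only structural difference is the normalizer: Z sums over both x and y, while Z(y) sums over x alone, which is why the CRF potentials can depend on arbitrary features of the whole image.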