
Hello all, I hope someone can shed some light on this issue. I am trying to get a grasp on CRFs, but I get a bit lost in the forest of details. My doubts can be summarized as follows:

  1. What is the difference between CRFs and MRFs? Is the only difference that in an MRF we optimize the parameters of P(Y,X; theta) and in a CRF of P(Y|X; theta)? Is this the usual distinction between generative and discriminative training? (See the sketch after this list.)

  2. Why are global features allowed in a CRF and (apparently) not in an MRF? Where do we make this assumption?

  3. A typical phrase in vision papers: a CRF is an MRF conditioned on the image. In Bishop's MRF example (the typical image de-noising one), the de-noised pixels, i.e. the "labels", are also conditioned on the image. Why isn't it a CRF?
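A sketch of the two training objectives from point 1, in my own notation (assuming i.i.d. training pairs (x_i, y_i); not taken from any particular reference):

```latex
% Generative (MRF-style) training: fit the joint distribution
\hat{\theta}_{\text{gen}} = \arg\max_{\theta} \sum_i \log P(y_i, x_i; \theta)

% Discriminative (CRF-style) training: fit only the conditional
\hat{\theta}_{\text{disc}} = \arg\max_{\theta} \sum_i \log P(y_i \mid x_i; \theta)
```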

Any help would be great!

asked Dec 24 '10 at 13:45


Roderick Nijs

edited Dec 24 '10 at 18:11


There's some inconsistency in terminology between NLP and vision papers -- vision papers sometimes call their models "CRF" even when they are not normalized and not trained to maximize any kind of likelihood.

(Dec 25 '10 at 00:29) Yaroslav Bulatov

BTW, you can view an image CRF as a family of Markov Random Fields -- each new observation gives a random field which is Markov with respect to the grid graph.

(Dec 25 '10 at 17:43) Yaroslav Bulatov

That point is not yet completely clear... How would that definition fit into Bishop's example? You can find it here (page 48 in the PDF):

http://research.microsoft.com/en-us/um/people/cmbishop/prml/bishop-prml-sample.pdf

This Graphical Models chapter of Bishop's book happens to be the sample chapter :)

(Dec 26 '10 at 06:06) Roderick Nijs

@Roderick: He's not estimating parameters in the denoising (computing the probabilities of denoised pixels given observed pixels), as there are only two parameters in the model (beta and eta). What he's doing is MAP inference over the variables, and this inference is conditional. So it's not a CRF, as there's no training of the parameters to maximize the conditional likelihood.

(Dec 26 '10 at 06:11) Alexandre Passos ♦
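To make that concrete, here is a minimal Python sketch of MAP inference by ICM on the Ising-style denoising energy from Bishop's example, E(x, y) = h*sum_i x_i - beta*sum_{ij} x_i x_j - eta*sum_i x_i y_i, with labels and pixels in {-1, +1}. The parameter values are illustrative assumptions, not necessarily the book's settings:

```python
import numpy as np

def icm_denoise(y, beta=1.0, eta=2.0, h=0.0, n_sweeps=10):
    """MAP inference by Iterated Conditional Modes (ICM) on
    E(x, y) = h*sum_i x_i - beta*sum_{ij} x_i x_j - eta*sum_i x_i y_i,
    where x (labels) and y (observed pixels) live in {-1, +1}."""
    x = y.copy()
    H, W = x.shape
    for _ in range(n_sweeps):
        for i in range(H):
            for j in range(W):
                # sum of the (up to four) neighbouring labels
                s = 0.0
                if i > 0:
                    s += x[i - 1, j]
                if i < H - 1:
                    s += x[i + 1, j]
                if j > 0:
                    s += x[i, j - 1]
                if j < W - 1:
                    s += x[i, j + 1]
                # x_ij = sign(beta*s + eta*y_ij - h) minimizes the part
                # of the energy that involves this single pixel
                x[i, j] = 1 if beta * s + eta * y[i, j] - h > 0 else -1
    return x

# usage: x_hat = icm_denoise(noisy)  # noisy is a {-1, +1} numpy array
```

Note the parameters stay fixed throughout: everything conditional happens at inference time, not training time.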

I'm starting to feel really stupid, but here I go anyway:

He doesn't actually say how he estimates beta and eta... but doesn't eta capture exactly "[...] the probabilities of denoised pixels given observed pixels"?

Or, to put it differently: if the only difference between a CRF and an MRF is the training, how do you know it is not a CRF if he does not say how he trained it?

(Dec 26 '10 at 11:26) Roderick Nijs

@Roderick: Yeah, maybe it would be a CRF, but that's beside the point: what matters most in that model is not the values of the two parameters but the soft constraints that original and denoised pixels should agree and that neighboring pixels should agree. He gets good results even with the parameters fixed, so no training is needed, and hence there's no reason to call it a CRF.

(Dec 26 '10 at 22:05) Alexandre Passos ♦

A Markov Random Field is a model where the density of the random variable factorizes over the graph. In Bishop's example, the random variable consists of pixel labels, so the density can be called a Markov Random Field for a given image. It will be a different MRF for a different image. You can think of the "Conditional Random Field" framework as a way of creating a family of Markov Random Fields parameterized by some image statistics.

(Dec 27 '10 at 02:26) Yaroslav Bulatov
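One way to picture "a family of MRFs parameterized by image statistics" is a pairwise potential whose strength is computed from the observed image. A minimal sketch; the contrast-sensitive form below is a common choice in vision models, and the values of w and sigma are made up:

```python
import numpy as np

def edge_weight(img, p, q, w=1.0, sigma=10.0):
    """Strength of the pairwise (smoothness) potential between the
    neighbouring pixels at index tuples p and q. Because it is computed
    from the observed image, each new image induces a different MRF
    over the labels: the CRF is the whole family at once."""
    diff = float(img[p]) - float(img[q])
    return w * np.exp(-diff ** 2 / (2.0 * sigma ** 2))

# usage:
# img = np.random.rand(4, 4) * 255
# strength = edge_weight(img, (0, 0), (0, 1))
```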

Thank you guys, I guess I just need some time to crystallize these concepts. I'm just going to write down what I understood.

In an MRF, the density p(x = labels, y = image) of some random variables factorizes as a product of functions defined over the cliques.

A CRF "generates" a different MRF depending on the features.

So, an MRF is defined by the graph and the parameters (as many as we want), but if these parameters are a function of the input image (not the case in Bishop's example), we have a CRF.

(Dec 27 '10 at 05:30) Roderick Nijs

One Answer:
  1. Yes, precisely.
  2. Global features make sense in MRFs too: you just connect a factor to all nodes, but this really complicates inference. The point with CRFs is that you only need to do inference and optimization over the label nodes, so adding more observation-dependent factors costs you nothing (see the sketch below).
  3. I'm not familiar with Bishop's example, but if it optimizes P(denoised-pixels | all-pixels) it's a CRF, whereas if it optimizes something like P(all-pixels) it's an MRF.
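A minimal sketch of point 2 (my own illustration, assuming a linear-chain CRF over word sequences; the feature itself is made up):

```python
def global_feature(y_prev, y_cur, x, t):
    """A linear-chain CRF feature f(y_{t-1}, y_t, x, t). Since the whole
    input sequence x is observed, the feature may inspect any part of it
    (here: whether the sentence ends in a question mark) without adding
    any cost to inference, which runs only over the label variables y."""
    return 1.0 if y_cur == "VERB" and x[-1] == "?" else 0.0

# Each such feature contributes weight * global_feature(y_prev, y_cur, x, t)
# to the score of a labelling. In an MRF over (x, y), a factor touching
# all of x would make inference over x much harder.
```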

answered Dec 24 '10 at 16:30


Alexandre Passos ♦

I don't have Bishop's book here, but from memory I am quite sure that he optimizes P(denoised-pixels | all-pixels) and calls it an MRF, which does seem odd.

@Alexandre: I completely agree with your definition. But do you know a place where it is written down? Also, the Wikipedia page on CRFs is in pretty bad shape :( If there were a good reference, one could fix that up.

(Dec 25 '10 at 05:02) Andreas Mueller

@Andreas: I like Sutton and McCallum's updated version of their "An Introduction to Conditional Random Fields" on arXiv: http://arxiv.org/abs/1011.4088

(Dec 25 '10 at 05:12) Alexandre Passos ♦

But there also ought to be a difference in the model itself, right? We should require more parameters for an MRF to capture the distribution of the features...

(Dec 25 '10 at 05:14) Roderick Nijs

@Roderick: Not really. For each directed generative model you can think of (naive Bayes, hidden Markov models, etc.) you can build an undirected conditional model with precisely the same number of parameters that is a CRF (in these cases, logistic regression and linear-chain CRFs, respectively).

Likewise, for every CRF you could theoretically maximize the joint likelihood and use it as a generative model, but in practice this is harder due to issues with normalization, which is why you can do things in CRFs that you don't see in hidden Markov models, such as adding loads of features to each word, adding features for arbitrarily many previous and subsequent words, etc.

(Dec 25 '10 at 05:21) Alexandre Passos ♦
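To make the parameter-count point concrete, here's a minimal sketch (my own illustration, not from the thread): logistic regression, the conditional counterpart of naive Bayes, trains one weight per feature plus a bias by maximizing the conditional likelihood directly:

```python
import numpy as np

def conditional_nll(w, b, X, y):
    """Negative conditional log-likelihood -sum_i log P(y_i | x_i; w, b)
    for logistic regression with labels y in {-1, +1}. Naive Bayes over
    the same d features also uses O(d) parameters, but fits the joint
    P(y, x) instead of the conditional."""
    z = X @ w + b                        # linear scores, shape (n,)
    return np.sum(np.log1p(np.exp(-y * z)))

# usage: minimize conditional_nll over (w, b) with any gradient-based
# optimizer; the result maximizes P(y | x) without ever modelling P(x).
```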

