Hello -- So my question is regarding markov random fields. Assume I have a graph such that two vertices (states) are connected by some edge iff there is a markov relationship between the two states. I have a potential function that can describe the interaction between any two connected states.

I have four questions:

1) In this undirected mrf I have some cliques of size 3 (ie. [{state1--state2} {state1--state3} {state2--state3}]). Thus, instead of taking maximal cliques, is it still correct for me to instead factorize with only pairwise interactions? I feel the answer should be yes, since I can create a dummy state breaking the clique size 3 into 4 cliques of size 2.

2) In this undirected mrf, I have a state (state4) which loops back (ie. [{state3--state4} {state4--state4} {state4--state5}]. Are self loops allowed in mrfs?

3) Would a factor graph representation be more appropriate for avoiding these issues?

4) Are feature functions from CRFs and potential functions from MRFs the same?

Thanks.

asked Sep 21 '10 at 20:25

aak's gravatar image

aak
1112


2 Answers:
  1. If you factorize with only pairwise factors then you are restricting yourself to a subset of "all distributions with given Markov Structure". Your model is now a member of a more general class known as "hierarchical interaction models". Your case is the example Lauritzen gives as the simplest hierarchical non-graphical model on page 34 of "Lectures on Contingency Tables"

  2. Loops are allowed, but lots of loops can make things hard. For instance, getting marginal distributions of nodes is intractable for models with lots of strong loops. There are hundreds of papers (probably thousands if you consider related areas) devoted to the issue of accurately approximating marginals in models with loops. For regular graphs with degree above some small number, there are results showing that local algorithms for marginal inference have to take exponential time. Non-local algorithms could in theory do better, but it's hard to find non-local inference algorithms. (Schraudolph, 2008) is a notable exception but only applies in a restricted special case. It's an instance of a holographic algorithm. Unnormalized potentials and and some conditional probabilities remain easy in general case.

  3. Factor Graphs and Markov Graphs are both useful but insufficient descriptors of your model. If you want to know about conditional independence structure, you look at the Markov Graph, if you want to see form of message passing updates, check the Factor Graph, if you want to see number of parameters, you need a description with even more granularity, since your local potentials could be factorized. Factor Graphs capture everything about Markov structure but not vica versa. Some algorithms and concepts like Junction Tree and decay of correlations are formulated in terms of corresponding Markov Graph, so it may be more natural to look at your model in terms of its Markov Graph in those cases.

  4. Essentially, the difference is what they are functions of. You could have an MRF with density proportional to $exp(theta1 y1)$. During training you pick $theta$ to maximize likelihood. Suppose you now have a single observation x and want to fit a corresponding CRF. Now your data likelihood is written as $exp(theta(x,w) y1)$. $theta$ is now a function of input x and some weight vector w, for instance, theta(x,w)=x.w. Instead of picking $theta$ directly, you'd pick w to maximize likelihood. Often it's called conditional likelihood, but from formula standpoint they are fairly similar, both CRF and MRF define a density over y and inference algorithms are the same

answered Sep 21 '10 at 23:26

Yaroslav%20Bulatov's gravatar image

Yaroslav Bulatov
1963193458

edited Sep 22 '10 at 02:25

1) A Markov random field doesn't fully specify the model, so a clique of size 3 can correspond to many different factors. If you leave only pairwise interactions your model is weaker than one that allows for three-way interactions. See this question on this website for some discussion on representations of factor graphs.

2) You mean, can you put a factor that depends only on one node? The answer is yes, I think.

3) Yes. A factor graph representation is very good precisely for clearing up this sort of issues, as I said in point (1)

4) Yes. The only difference between CRFs and MRFs is the likelihood you optimize when learning the parameters: for crfs it's P(Y|X) (for appropriate disjoint sets of variables Y and X) and for mrfs it's P(X,Y). This makes training CRFs easier in some cases, since you only have to marginalize over Y, which can be done exactly in linear-chain and tree cases.

answered Sep 21 '10 at 20:44

Alexandre%20Passos's gravatar image

Alexandre Passos ♦
1893744214333

Your answer
toggle preview

powered by OSQA

User submitted content is under Creative Commons: Attribution - Share Alike; Other things copyright (C) 2010, MetaOptimize LLC.