8
2

In many papers on graphical models, directed graphical models and undirected graphical models are usually presented, and algorithms that operate on both, etc. But factor graphs are usually just mentioned in passing. Why is this so? It slightly bothers me that there are many different factor graphs for a given Markov random field, and that, to compute which factors in the MRF will be in the energy function is generally an NP-complete problem (listing all maximal cliques in a graph). Why not just directly specifing these models as factor graphs, and omit this onerous step of computing the terms of the energy function? Is it because most papers only deal with a specific model, and then can afford to precompute the factors, or am I missing something?

asked Jul 08 '10 at 07:21

Alexandre%20Passos's gravatar image

Alexandre Passos ♦
1899744214335

1

I agree it's bothersome. I usually think about my models in terms of factor graphs. The one thing that they don't convey, though, are the normalization constraints, if you have a directed graphical models.

(Jul 08 '10 at 08:58) Frank

Yes, agreed. This is why the question was specifically about undirected graphical models vs factor graphs. Generally, I also prefer dealing with directed graphical models as-is, instead of converting to undirected (which also the Wainwright & Jordan papers do before inference).

(Jul 08 '10 at 09:08) Alexandre Passos ♦

I think that one reason is that it is (at least for me) generally simpler to intuitively grasp the model through the MRF representation. Then you can list feature templates etc to provide a more detailed description.

(Feb 18 '11 at 13:28) Oscar Täckström

2 Answers:

I think two factors help explain why Markov random fields (MRF) are preferred over factor graphs:

  1. Sociological: MRF has a prominent history in machine learning and other disciplines, through the efforts of statistical physicists and statisticians mid-last century. They have become a kind of canonical undirected probabilistic graphical representation.
  2. Pragmatic: It seems that the factor graph equivalent of MRF is used when convenient, as merely a pass-through for executing message passing and the like-- that is for some problems (perhaps those prominent early in ML) the collapsed MRF provides an adequate, simple enough representation. However, it seems that there are those pushing for more use of factor graphs, given that they aid in problems where there are a large number of overlapping relationships between variables (e.g. gene interaction networks, electrical networks, etc.).

answered Jul 08 '10 at 13:52

John%20L%20Taylor's gravatar image

John L Taylor
61541518

edited Jul 08 '10 at 13:57

So it's mostly due to history, then, plus simplicity (since an MRF will have a lot less nodes than its factor graph by definition, and a lot of the time the factors are obvious).

(Jul 08 '10 at 13:57) Alexandre Passos ♦
2

This is my best judgment, anyways.

As an aside, it was interesting to note that on the linked page advocating factor graphs, it seems that there may be more reason to use a variant of factor graphs in terms of their greater generality: B. J. Frey 2003 Extending factor graphs so as to unify directed and undirected graphical models, in Proceedings of Conference on Uncertainty in Artificial Intelligence 19 (UAI 03), Morgan Kaufmann, CA, Acapulco, Mexico, August

(Jul 08 '10 at 14:07) John L Taylor
1

Yes, factor graphs are clearly more general. The simplest example I can think of is the triangle graph (a->b, b->c, c->a). It admits two different factor graph representations: (a->f, b->f, c->f) and (a->f1, a->f2, b->f1, b->f3, c->f2, c->f3), the first with a three-way interaction and the second with three pairwise interactions. These are clearly two different models. I don't think anybody advocating just specifying the MRF in this case, though.

(Jul 08 '10 at 14:14) Alexandre Passos ♦

I meant to point-out the less trivial generality of Frey's notation for representing both directed and undirected graphs as factor graphs, effectively subsuming MRF and Bayes Nets (BN).

(Jul 08 '10 at 14:42) John L Taylor

Markov Network and Factor Graph are just different levels of granularity when describing your probability distribution. Markov Network gives you all the independencies between variables, Factor Graph specifies which kinds of interactions are allowed within each maximal clique, and "set of features" gives you parameterization of entries in each of the factors.

In practice your log-linear distribution family is specified by the set of features, and "Markov Network/Factor Graph" are higher level overviews. Given a set of features, you get a corresponding factor graph by creating a factor for any every set of variables that are simultaneous inputs to some feature function. Given a factor graph you find a corresponding Markov Network by connecting any two nodes that share a factor.

As to why people even bother talking about their distribution family at the Markov Network level when the actual distribution they use is defined by some set of parameterized probability tables or feature functions, treewidth of Markov Network gives you the complexity of inference, and exact inference (Junction Tree algorithm) is formulated in terms of cliques/separators on the corresponding Markov Network

answered Jul 28 '10 at 20:40

Yaroslav%20Bulatov's gravatar image

Yaroslav Bulatov
1963193458

edited Jul 28 '10 at 23:58

Your answer
toggle preview

powered by OSQA

User submitted content is under Creative Commons: Attribution - Share Alike; Other things copyright (C) 2010, MetaOptimize LLC.