|
hi Can someone explain what is the marginal polytope and it's connection with approximate inference methods for exponential families ? There are a cpl of papers on this but i am not able to understand the basic connection - and hence the maths? please point to any easy-to-understand references Thanks |
|
The paper "Graphical Models, Exponential Families, and Variational Inference" by Wainwright and Jordan explains this in great detail. I'm no expert in this area but I will try to give you the gist of it as I understand it. I'm happy to get corrected by people who are more acquainted with this material. The marginal polytope is a space of valid model parameters that is only defined in terms of the individual nodes and edges of a graph. You can also view the marginal polytope as a set of linear constraints that define the valid parameter vectors of a family of distributions for a certain graph. In the way that the set of valid parameter vectors for a categorical distribution is the simplex of positive vectors that sum to 1, the marginal polytope is a parametrisation that follows these same constraints by defining the join probability of a graphical model as a the convex hull of the space defined by the possible values of the feature functions that define the nodes and edges of a graphical model. It is called the 'marginal' polytope because these are the marginal distributions over individual nodes or joint probabilities over two nodes if two nodes happen to be connected. In variational inference one generally want to find some optimal distribution according to a certain functional (e.g., the KL divergence with the distribution you are trying to approximate). Mean field methods tend to work in a subspace of the marginal polytope that is often not convex any more. So the optimal solution may actually be impossible to reach if it isn't contained in this subset. Belief propagation methods tend to search in a space that is a superset of the marginal polytope. This means that the optimum should always be in the search space but that the algorithm might find an optimum that is outside of the marginal polytope and violates some of the constraints imposed by it. |