|
In Bishop's Book (Machine Learning and Patter Recognition), Chapter 8, Section 8.4.1, page 395 He mentions that each of the potential functions in an undirected chain are given by a KxK table (each of the nodes has k states), I am guessing that table is basically the conditional transition probability table. He goes to do a factorization, to later state it as the message passing principle (So far so good) Then, (here is my confusion) to point the complexity (page 396) He says that after the summation over the x1 for each value of x2 (this is for the forward message) you have a vector of length K (I think is Kx1) then you multiply that by the outern matrix of the factor (x2,x3) (which also is KxK). To my understanding you will have (after the multiplication) a Kx1 vector, which is basically imposible to sum over x2 afterwards (unless it is a trivial sum), so how do you perform that operation, is it a point wise multiplication?, or is it just a scale over the columns of the KxK matrix asociated with (x2,X3)?. Does anyone has a cool reference to have a better understanding of graphical models? Thanks for your insight
showing 5 of 6
show all
|
|
As for better understanding message passing: I strongly suggest the tutorial Factor graphs and the sum-product algorithm and the more general paper The generalized distributive law. Both helped me a lot. If I remember correctly Bishop also heavily referenced them in the relevant chapter. |
|
Here are the steps to get P(x3) for a 4-state, 3-time-slice HMM
|

Sorry, I have trouble understanding your question. But there's no point-wise multiplication -- each elimination is equivalent to matrix-vector multiplication. For very first elimination, use vector of 1. The first paper that made sense of message passing for me was http://acl.ldc.upenn.edu/W/W02/W02-0102.pdf
What do you refer by elimination?
Eliminating a variable means summing it out. IE, summing over x1 is eliminating x1
Ok, but in the book it is described as the potential phi(x1,x2) being defined by a KxK matrix, which you have to use for the elimination. And after doing that you are left with a Kx1 vector
then you multiply that vector by phi(x2,x3), and that's equivalent to eliminating x2
Yes, but phi(x2,x3) is another KxK, the multiplication will leave you with a Kx1 matrix, in which is impossible to sum over x3 to do the elimination.