I've recently started looking into the pyramid match kernel as a way to determine whether two images are similar. I'm following the bag-of-visual-words methodology and have a histogram-of-words representation for each image. Once I'm there, am I right to assume that the next step is to convert the histograms to pyramids, and then to calculate the histogram intersection kernel matrix?

I used some demo code and got an m x n matrix for K, the intersection kernel. I know that the larger K is, the more similar the images are. But why is it multi-dimensional? How exactly should I interpret it, assuming that I just calculated K for the similarity between 2 images?

asked Jul 05 '12 at 19:47

mugetsu


One Answer:

Before you start with the pyramid, I think you should try a single BoW representation with the intersection kernel. Each image is then represented as a vector of length n, the number of "words" in your quantization. It is a good idea to use normalized histograms, i.e. scale the histograms to sum to one.
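As a minimal sketch of the normalized histogram described above, assuming each image's local descriptors have already been quantized to integer visual-word indices (the function name `bow_histogram` and the toy input are made up for illustration):

```python
import numpy as np

def bow_histogram(word_ids, n_words):
    """Bag-of-words histogram of length n_words, scaled to sum to one."""
    # Count how often each visual word occurs in the image.
    counts = np.bincount(word_ids, minlength=n_words).astype(float)
    total = counts.sum()
    # Leave an all-zero histogram untouched to avoid dividing by zero.
    return counts / total if total > 0 else counts

# Toy image with four descriptors assigned to words 0, 2, 2 and 3.
h = bow_histogram(np.array([0, 2, 2, 3]), n_words=5)
```

Normalizing this way keeps images with many keypoints from dominating the kernel value.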

To compute the intersection kernel, you simply do an element-wise minimum between the two vectors and sum the whole thing up. This yields a single number, the kernel between the two BoW representations.
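A small NumPy sketch of that computation follows. The second function is an assumption about what a demo returning an m x n matrix is probably doing: it evaluates the kernel between every image in one set of m histograms and every image in another set of n, so entry (i, j) is the similarity of image i to image j. For just two images, only the single scalar is needed. Both function names are hypothetical.

```python
import numpy as np

def intersection_kernel(h1, h2):
    """Histogram intersection: sum of element-wise minima, a single number."""
    return float(np.minimum(h1, h2).sum())

def intersection_kernel_matrix(A, B):
    """Pairwise kernel between rows of A (m x d) and rows of B (n x d)."""
    # Broadcasting builds an (m, n, d) array of minima, summed over d.
    return np.minimum(A[:, None, :], B[None, :, :]).sum(axis=2)

a = np.array([0.5, 0.5, 0.0])
b = np.array([0.25, 0.25, 0.5])
k = intersection_kernel(a, b)                       # scalar for two images
K = intersection_kernel_matrix(np.stack([a, b]),    # 2 x 2 matrix for
                               np.stack([a, b]))    # two sets of two images
```

With normalized histograms the kernel of an image with itself is 1, so the diagonal of `K` here is all ones.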

To do a pyramid intersection kernel, you produce additional representations by splitting each image into 4 parts (2x2) and computing a histogram in each part. Then you concatenate these 4 histograms and compute the kernel on the resulting vector of size 4 * n. You can do the same with 4x4 parts for another level.

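Then you sum up all the kernel values you got, weighting each level of the pyramid by a power of two. I originally guessed 2^-l, but in the paper below the finer levels get the larger weights: with L the finest level, level 0 is weighted 1/2^L and level l >= 1 is weighted 1/2^(L-l+1). That gives you again a single value for the two images.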
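Putting the levels together, here is a rough sketch, not a reference implementation. It assumes keypoint coordinates are given as fractions of the image width and height in [0, 1], and that each level's concatenated histogram is divided by the number of keypoints. The weights are a parameter; the default uses the scheme from the paper cited in this answer (Lazebnik et al.), 1/2^L for level 0 and 1/2^(L-l+1) for level l >= 1. All function and argument names are made up for illustration.

```python
import numpy as np

def pyramid_histograms(word_ids, xs, ys, n_words, L):
    """One concatenated histogram per level 0..L (level l has 2^l x 2^l cells)."""
    hists = []
    n = max(len(word_ids), 1)
    for level in range(L + 1):
        cells = 2 ** level
        # Grid cell of each keypoint; clamp so a coordinate of exactly 1.0 fits.
        cx = np.minimum((xs * cells).astype(int), cells - 1)
        cy = np.minimum((ys * cells).astype(int), cells - 1)
        # Flatten (cell, word) into one index of the concatenated histogram.
        bin_ids = (cy * cells + cx) * n_words + word_ids
        h = np.bincount(bin_ids, minlength=cells * cells * n_words)
        hists.append(h / n)
    return hists

def pyramid_kernel(hists_a, hists_b, weights=None):
    """Weighted sum of per-level intersection kernels, a single number."""
    L = len(hists_a) - 1
    if weights is None:
        # Assumed default: weights from the cited paper, summing to one.
        weights = [2.0 ** -L] + [2.0 ** -(L - l + 1) for l in range(1, L + 1)]
    return float(sum(w * np.minimum(a, b).sum()
                     for w, a, b in zip(weights, hists_a, hists_b)))

# Toy data: 50 random keypoints per image, 5 visual words, levels 0..2.
rng = np.random.default_rng(0)
img_a = pyramid_histograms(rng.integers(0, 5, 50), rng.random(50), rng.random(50), 5, 2)
img_b = pyramid_histograms(rng.integers(0, 5, 50), rng.random(50), rng.random(50), 5, 2)
k_ab = pyramid_kernel(img_a, img_b)
```

With this normalization and the default weights, an image compared with itself scores 1, which makes the values easy to read as similarities.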

You can find the details in

Lazebnik, Schmid, and Ponce, "Beyond Bags of Features: Spatial Pyramid Matching for Recognizing Natural Scene Categories" (CVPR 2006)

answered Jul 06 '12 at 03:18

Andreas Mueller

Thanks a lot! That makes a lot of sense. It looks like I can easily implement this on my own. However, is there a way to get around having to compute the histogram for every level?

(Jul 06 '12 at 12:15) mugetsu

User submitted content is under Creative Commons: Attribution - Share Alike; Other things copyright (C) 2010, MetaOptimize LLC.