Is the SIFT algorithm scalable? Most questions I've seen asked about SIFT seem focused on a simple comparison between two images. Instead of determining how similar two images are, would it be practical to use SIFT to find the N closest matching images out of a collection of thousands of images?

For example, would it be practical to use SIFT to generate keypoints for a batch of images, store the keypoints in a database, and then find the ones that have the shortest Euclidean distance to the keypoints generated for a "query" image?

When calculating the Euclidean distance, would you ignore the x, y, scale, and orientation parts of the keypoints, and only look at the descriptor?
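For concreteness, here is roughly the kind of pipeline I have in mind; a minimal sketch assuming OpenCV's SIFT implementation (cv2.SIFT_create) and a toy in-memory "database" with made-up file names:

    # Toy version of the idea: rank stored images by distance in descriptor space
    # only, ignoring keypoint x, y, scale, and orientation.
    import cv2
    import numpy as np

    sift = cv2.SIFT_create()

    def descriptors(path):
        gray = cv2.imread(path, cv2.IMREAD_GRAYSCALE)
        _, desc = sift.detectAndCompute(gray, None)  # desc: (n_keypoints, 128) float32
        return desc

    def match_score(query_desc, db_desc):
        # For each query descriptor, the Euclidean distance to its nearest
        # database descriptor; the mean of those is a crude similarity score.
        # Fine for a handful of images, but brute force over all descriptors.
        dists = np.linalg.norm(query_desc[:, None, :] - db_desc[None, :, :], axis=2)
        return dists.min(axis=1).mean()

    db = {p: descriptors(p) for p in ["img1.jpg", "img2.jpg"]}   # stand-in database
    query = descriptors("query.jpg")
    ranked = sorted(db, key=lambda p: match_score(query, db[p]))  # closest first
    print(ranked)

Presumably, for thousands of images, this brute-force distance computation is exactly what stops scaling, hence the question.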

asked Mar 02 '11 at 08:55

Cerin

edited Mar 02 '11 at 08:56

I am interested in this as well, particularly in doing it with Lucene/Solr.

(Mar 02 '11 at 12:54) Oscar Täckström

Since Lucene/Solr are focused on full-text search, wouldn't they be inappropriate for this domain? The descriptors may be largely unique, but have very small Euclidean distances to "similar" descriptors. I was looking into an RDBMS-supported KNN solution, such as PostGIS or Mahout Taste.

(Mar 02 '11 at 13:13) Cerin

I found http://www.semanticmetadata.net/lire/, but I have no idea about its scalability. My experience with Lucene/Solr is very limited; the reason I asked is that all the textual data in our system is already indexed with it.

I looked quickly at PostGIS, but it seems that it only supports simple geometry and is mostly concerned with finding intersections/containment of polygons.

(Mar 02 '11 at 13:29) Oscar Täckström

I think that if you do some sort of discretized feature extraction, such as the one mentioned in my answer, you might be able to use Lucene/Solr as black boxes to query your image database. The main downside is that more complex queries (e.g., ones that depend on the relative positions of features) won't be supported, unless you are willing to add features that encode the relative positions of the original features.

(Mar 02 '11 at 13:33) Alexandre Passos ♦

4 Answers:

Searching with SIFT is still an open area of research, but it is indeed possible. The simplest and fastest search algorithms for SIFT over a large database involve computing the descriptors for images as soon as they enter the database, using a clustering algorithm to do vector quantization on the descriptors to get a set of discrete features for each image, and finally using those features as one would use text features to search the database, perhaps with tf-idf feature vectors or something similar.
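A rough sketch of that pipeline, assuming the SIFT descriptors are already extracted (random arrays stand in for them here) and using scikit-learn's k-means and tf-idf as the building blocks; the vocabulary size and data are purely illustrative:

    # Bag-of-visual-words: cluster descriptors into a vocabulary, quantize each
    # image into a histogram of "visual words", then search as with text.
    import numpy as np
    from sklearn.cluster import MiniBatchKMeans
    from sklearn.feature_extraction.text import TfidfTransformer
    from sklearn.metrics.pairwise import cosine_similarity

    rng = np.random.default_rng(0)
    # Stand-in for real SIFT output: one (n_keypoints, 128) array per image.
    per_image_desc = [rng.random((200, 128), dtype=np.float32) for _ in range(10)]

    # 1. Cluster all descriptors in the database into a visual vocabulary.
    vocab = MiniBatchKMeans(n_clusters=50).fit(np.vstack(per_image_desc))

    # 2. Vector-quantize each image into a histogram of visual words.
    def bow(desc):
        return np.bincount(vocab.predict(desc), minlength=vocab.n_clusters)

    counts = np.vstack([bow(d) for d in per_image_desc])

    # 3. Weight with tf-idf and compare images exactly like text documents.
    tfidf = TfidfTransformer().fit_transform(counts)
    print(cosine_similarity(tfidf[:1], tfidf))  # image 0 as the query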

This, as you mentioned, ignores the x, y, scale, and orientation of the keypoints. Taking those things into consideration is a lot harder, however, as you'd have to search not only for matches between the images' descriptors but for spatially consistent matches, and things like partial occlusion, 3D point-of-view differences, etc., can make this a very hard problem. One possible solution is to first filter the images in the database by presence/absence of the vector-quantized features and then run more expensive matching algorithms on the returned list of images to get better precision. I can't refer you to any papers in this area as I've only watched a couple of seminars and didn't catch the references.
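For the more expensive re-ranking step, one common recipe (not necessarily what those seminars covered) is a ratio test followed by RANSAC fitting of a homography, scoring each candidate by its count of geometrically consistent matches. A hedged OpenCV sketch with typical default thresholds:

    # Geometric verification: score a candidate image by how many of its matches
    # to the query are consistent with a single homography.
    import cv2
    import numpy as np

    def inlier_count(kp_q, desc_q, kp_c, desc_c):
        matcher = cv2.BFMatcher(cv2.NORM_L2)
        good = []
        for pair in matcher.knnMatch(desc_q, desc_c, k=2):
            if len(pair) == 2 and pair[0].distance < 0.75 * pair[1].distance:
                good.append(pair[0])                 # Lowe's ratio test
        if len(good) < 4:                            # homography needs >= 4 points
            return 0
        src = np.float32([kp_q[m.queryIdx].pt for m in good]).reshape(-1, 1, 2)
        dst = np.float32([kp_c[m.trainIdx].pt for m in good]).reshape(-1, 1, 2)
        _, mask = cv2.findHomography(src, dst, cv2.RANSAC, 5.0)
        return 0 if mask is None else int(mask.sum())  # spatially consistent matches

    # The shortlist from the cheap bag-of-words filter could then be re-sorted
    # by inlier_count(query_kp, query_desc, cand_kp, cand_desc).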

answered Mar 02 '11 at 09:06

Alexandre Passos ♦

Would inverted indices really scale for SIFT features used in this way? I thought you had to use some fast approximate nearest-neighbour search approach in order to get this to scale.

(Mar 02 '11 at 13:33) Oscar Täckström

Inverted indices will work if you discretize the features. You do lose some resolution.

The problem with nearest-neighbor search is that an image is a bag of nearly unique SIFT features, so unless you define a distance metric between bags of continuous vectors, it doesn't make sense to do nearest-neighbor search, and I don't know of any obviously right way of doing this.

(Mar 02 '11 at 13:35) Alexandre Passos ♦

I would assume that someone has tried a straight LSH (http://en.wikipedia.org/wiki/Locality_sensitive_hashing) approach. Do you know of any papers where this has been tried?
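For what it's worth, the idea I have in mind is roughly this; a toy random-hyperplane sketch, purely illustrative (a real system would use several hash tables and tuned code lengths), not taken from any paper:

    # Toy locality-sensitive hashing for 128-d SIFT descriptors: random-hyperplane
    # hashing, so descriptors at small angular distance tend to share a bucket.
    import numpy as np
    from collections import defaultdict

    rng = np.random.default_rng(0)
    n_bits = 16
    planes = rng.normal(size=(n_bits, 128))        # one random hyperplane per hash bit

    def lsh_key(descriptor):
        return ((planes @ descriptor) > 0).tobytes()   # sign pattern = hash key

    index = defaultdict(list)                      # bucket -> descriptor indices
    db = rng.random((500, 128))                    # stand-in for real descriptors
    for i, d in enumerate(db):
        index[lsh_key(d)].append(i)

    query = db[42] + 0.01 * rng.normal(size=128)   # a slightly perturbed copy
    print(index[lsh_key(query)])                   # candidate near-neighbours only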

(Mar 24 '11 at 16:31) Travis Wolfe

Can you recommend any tools for clustering high-dimensional data?

(Apr 07 '11 at 09:52) Cerin

How about k-means? If you have an NVIDIA GPU, try this: http://peekaboo-vision.blogspot.com/2011/03/k-means-on-cuda-with-cuv.html ;)

(Apr 09 '11 at 05:29) Andreas Mueller

Some work in this direction was done by Sivic: http://www.robots.ox.ac.uk:5000/~vgg/publications/papers/sivic06c.pdf. Have a look at this series of papers and the related literature. Basically, as Alexandre says, this is still an open area of research, but it seems feasible.

answered Mar 03 '11 at 07:18

Andreas Mueller

I'm not much versed in this area, but I've heard of attempts at using kd-trees for image matching based on SIFT. Here are some papers I found when looking for "SIFT and kd-trees" (a small approximate-search sketch follows the links):

  • http://ieeexplore.ieee.org/xpl/freeabs_all.jsp?arnumber=4587638
  • http://ieeexplore.ieee.org/Xplore/login.jsp?url=http%3A%2F%2Fieeexplore.ieee.org%2Fiel5%2F5473895%2F5485225%2F05485624.pdf%3Farnumber%3D5485624&authDecision=-203
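Along the same lines, though not from those papers: OpenCV's FLANN-based matcher builds randomized kd-trees over SIFT descriptors for approximate nearest-neighbour search. A minimal sketch with stand-in random descriptors; the tree and check counts are common defaults rather than tuned values:

    # Approximate nearest-neighbour matching of SIFT descriptors with randomized
    # kd-trees (FLANN); random arrays stand in for real descriptors here.
    import cv2
    import numpy as np

    rng = np.random.default_rng(0)
    desc_db = rng.random((5000, 128)).astype(np.float32)   # pooled database descriptors
    desc_q = rng.random((300, 128)).astype(np.float32)     # query image descriptors

    FLANN_INDEX_KDTREE = 1
    flann = cv2.FlannBasedMatcher(dict(algorithm=FLANN_INDEX_KDTREE, trees=5),
                                  dict(checks=50))
    matches = flann.knnMatch(desc_q, desc_db, k=2)

    # Keep only distinctive matches (ratio test) before any per-image voting.
    good = [pair[0] for pair in matches
            if len(pair) == 2 and pair[0].distance < 0.75 * pair[1].distance]
    print(len(good))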

answered Mar 22 '11 at 13:25

Breno

The work by Torralba, Fergus, and Weiss, "Small Codes and Large Image Databases for Recognition" (CVPR 2008), describes an efficient SIFT-based image retrieval method capable of finding the nearest neighbors in a database of millions of images in a fraction of a second on a single PC.

For each image, a gist descriptor is computed, which is then reduced to a short binary code (tens to hundreds of bits). Similar images will have a small Hamming distance between their codes. Since the codes are short, a brute force search over the database is fast.
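A small sketch of why brute-force search over such codes is cheap; the codes here are random stand-ins, not the learned ones from the paper:

    # Brute-force Hamming search: pack codes into bytes, XOR against the query,
    # and count differing bits per database entry.
    import numpy as np

    rng = np.random.default_rng(0)
    n_bits = 64                                     # illustrative code length
    db_codes = np.packbits(rng.integers(0, 2, size=(1_000_000, n_bits), dtype=np.uint8),
                           axis=1)                  # one million packed codes
    query = np.packbits(rng.integers(0, 2, size=n_bits, dtype=np.uint8))

    hamming = np.unpackbits(db_codes ^ query, axis=1).sum(axis=1)  # distance to every code
    nearest = np.argsort(hamming)[:10]                             # ten closest images
    print(nearest, hamming[nearest])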

The mapping from the gist descriptor to the binary code is learned. One of the methods presented is based on a multiple-layer restricted Boltzmann machine, and uses the data set's image category labels during the "fine tuning" phase of learning.

answered Mar 30 '11 at 22:58

Louis Howe
