
As far as I understand, face recognition in a corpus of images or videos can be achieved by first running a face detector such as Viola-Jones and then computing the projection of the detected face images onto a vector basis extracted from the face collection, e.g. EigenFaces, LaplacianFaces, FisherFaces...

Hence each photo of a person's face is represented by its components in the {Eigen,Laplacian,Fisher}Faces basis, and one can expect that by using nearest-neighbor search in such a Euclidean space it is possible to recognize the person shown in a new unlabeled picture.
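
For concreteness, here is a minimal sketch of the pipeline I have in mind, in Python with scikit-learn (the LFW dataset and all parameter values are just placeholders for illustration, not a recommendation):

    # Hedged sketch, not a reference implementation: PCA ("EigenFaces")
    # projection followed by nearest-neighbor lookup, using the LFW faces
    # shipped with scikit-learn as placeholder data.
    from sklearn.datasets import fetch_lfw_people
    from sklearn.decomposition import PCA
    from sklearn.neighbors import KNeighborsClassifier
    from sklearn.model_selection import train_test_split

    lfw = fetch_lfw_people(min_faces_per_person=20)   # downloads on first use
    X_train, X_test, y_train, y_test = train_test_split(lfw.data, lfw.target)

    pca = PCA(n_components=100, whiten=True)          # 100 EigenFaces
    X_train_proj = pca.fit_transform(X_train)         # dense codes, one per face

    knn = KNeighborsClassifier(n_neighbors=5)
    knn.fit(X_train_proj, y_train)

    # Recognize new, unlabeled faces: project them onto the same basis and
    # look up nearest neighbors in the projected Euclidean space.
    predictions = knn.predict(pca.transform(X_test))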

However, naive projections generate dense codes, and if the basis is big enough (say 100 or 1000 EigenFaces) then KNN lookups suffer from the curse of dimensionality. Hence it is probably better to use sparse coding with L1-penalized methods to compute a "sparse projection" onto the face manifold.
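
Continuing the sketch above, the L1-penalized projection could look roughly like this (the choice of dictionary and penalty value are assumptions on my part):

    # Hedged sketch of the L1-penalized idea: compute a sparse code for each
    # face against a dictionary, here simply the EigenFaces from the previous
    # snippet (dictionary learning could be used instead).
    from sklearn.decomposition import SparseCoder

    coder = SparseCoder(dictionary=pca.components_,    # (n_atoms, n_pixels)
                        transform_algorithm='lasso_lars',
                        transform_alpha=1.0)           # L1 penalty strength
    sparse_codes = coder.transform(X_test)             # mostly zero entries
    n_nonzero_per_face = (sparse_codes != 0).sum(axis=1)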

So my question is: is it currently possible to achieve high scalability on this problem, for instance scalability similar to that of full-text indexers such as Apache Lucene? Is it possible to recognize, in a couple of seconds, a person out of an index of millions of people (say the pictures of famous persons on Wikipedia)? If so, what are the tricks? What is the typical dimension of the projection space? How many non-zero components should a face have when projected onto that vector space? Any reference papers? Any reference implementations?

asked Jul 03 '10 at 09:54


ogrisel

edited Dec 03 '10 at 07:07


Alexandre Passos ♦


3 Answers:

There are techniques whereby one learns a low-dimensional binary hash code and then does a hash lookup for retrieval: you look in every bucket within Hamming distance k (for k maybe 2) of the binary code to which the image hashes. Look at work that cites Salakhutdinov + Hinton (2007), "Semantic Hashing", to learn more about this approach.
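
A minimal sketch of the lookup side only, assuming the binary codes have already been learned; the 32-bit code length and helper names here are made up for illustration:

    # Hedged sketch: retrieval with short binary hash codes. How the codes
    # are learned (e.g. semantic hashing) is out of scope here. Assumes
    # `buckets` maps a 32-bit integer code to the list of image ids that
    # hash to it.
    from itertools import combinations

    def hamming_ball(code, n_bits=32, radius=2):
        """Yield every code within Hamming distance <= radius of `code`."""
        yield code
        for r in range(1, radius + 1):
            for bits in combinations(range(n_bits), r):
                flipped = code
                for b in bits:
                    flipped ^= (1 << b)
                yield flipped

    def lookup(query_code, buckets, radius=2):
        """Collect candidate image ids from all nearby hash buckets."""
        candidates = []
        for c in hamming_ball(query_code, radius=radius):
            candidates.extend(buckets.get(c, []))
        return candidates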

There is a more recent approach, which I haven't yet gotten to grok, by Gal Chechik et al. (JMLR 2010), "Large Scale Online Learning of Image Similarity Through Ranking". I haven't read the work, but if I remember correctly what a colleague told me, Chechik et al. learn a high-dimensional sparse representation and then use a conventional text retrieval technique. We discuss this sparsification in this answer.
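
If that recollection is right, the retrieval side might look roughly like this; a sketch with made-up names, not the authors' implementation:

    # Hedged sketch of the "text retrieval on sparse codes" idea: treat the
    # non-zero components of each sparse code as terms in an inverted index,
    # the way a text search engine indexes words. `database_codes` is assumed
    # to be {image_id: {component_index: weight}}.
    from collections import defaultdict

    def build_inverted_index(database_codes):
        index = defaultdict(list)              # component -> postings list
        for image_id, code in database_codes.items():
            for component, weight in code.items():
                index[component].append((image_id, weight))
        return index

    def search(index, query_code, top_k=10):
        # Score by dot product, touching only the postings lists of the
        # query's non-zero components -- that is what makes it scale.
        scores = defaultdict(float)
        for component, q_weight in query_code.items():
            for image_id, weight in index.get(component, []):
                scores[image_id] += q_weight * weight
        return sorted(scores.items(), key=lambda kv: -kv[1])[:top_k]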

answered Jul 04 '10 at 17:31


Joseph Turian ♦♦


Indeed, this is the kind of work I was looking for. It is really hard for general object similarity in images, hence I wondered whether restricting the problem to face recognition might give better results. As far as I can see from your very recent references, this is still an active research subject, and there is still no consensus on whether text-retrieval-like queries on high-dimensional sparse codes are a better strategy than reduced-dimension binary hashes with Hamming-ball queries.

An interesting extension of the semantic hashing approach to the celebrity face recognition problem can be found in Ruei-Sung Lin, David A. Ross, Jay Yagnik, "SPEC Hashing: Similarity Preserving algorithm for Entropy-based Coding".

(Jul 04 '10 at 18:35) ogrisel

This seems to be a classic (2000), highly cited paper (~190 cites on Scopus) on this topic: http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.32.953

It might be interesting to follow papers which have recently cited it.

answered Jul 04 '10 at 12:56


DirectedGraph

Some info can be found here:

http://www.intechopen.com/books/state_of_the_art_in_face_recognition

and here:

http://www.cs.columbia.edu/CAVE/projects/face_search/

http://homes.cs.washington.edu/~neeraj/projects/facesearch/

answered Jan 30 '13 at 03:19


mrgloom

edited Jan 30 '13 at 03:38
