Hi all, I have raw feature vectors that are concatenated from very different feature extractors (e.g., image features + acoustic features). Any good ideas for measuring their distance/similarity (e.g., for use in kNN, or using the similarities as features to feed into SVMs)? I guess simple Euclidean distance does not work well?

Thanks a lot.

asked May 22 '11 at 22:18

exppie

One Answer:

A simple and effective approach is to normalize each mode so that, taken individually, its Euclidean distances or dot products typically fall in the same range, and then simply sum or average the per-mode similarities. Because each mode then contributes independently of the others, you are essentially taking a naive Bayes approach to evaluating the data of the different modes.
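The normalize-then-average idea can be sketched roughly as follows. This is only one possible reading of it: the choice of scale (mean pairwise Euclidean distance within each mode), the helper names `mean_pairwise_distance` and `combined_distance`, and the toy data are all assumptions made for illustration, not something specified in the answer.

```python
import math

def mean_pairwise_distance(rows):
    """Average Euclidean distance over all pairs of rows in one mode."""
    total, count = 0.0, 0
    for i in range(len(rows)):
        for j in range(i + 1, len(rows)):
            total += math.dist(rows[i], rows[j])
            count += 1
    mean = total / count if count else 0.0
    # Fall back to 1.0 so a degenerate mode never causes division by zero.
    return mean if mean > 0 else 1.0

def combined_distance(a, b, scales):
    """Average of per-mode Euclidean distances, each divided by its scale.

    a, b: sequences of per-mode vectors, e.g. (image_vec, audio_vec).
    scales: one positive scale per mode, estimated on training data.
    """
    parts = [math.dist(x, y) / s for x, y, s in zip(a, b, scales)]
    return sum(parts) / len(parts)

# Toy data (made up): "image" features with large values,
# "audio" features with small values.
image = [[100.0, 200.0], [130.0, 240.0], [400.0, 100.0]]
audio = [[0.01, 0.02, 0.03], [0.05, 0.01, 0.02], [0.01, 0.02, 0.04]]

# One scale per mode, so each mode's typical distance becomes 1.
scales = [mean_pairwise_distance(image), mean_pairwise_distance(audio)]
samples = list(zip(image, audio))

d01 = combined_distance(samples[0], samples[1], scales)
d02 = combined_distance(samples[0], samples[2], scales)
```

With this scaling, neither mode dominates the sum merely because its raw values are larger; the combined distance can then be used directly in kNN or turned into a kernel for an SVM.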

Ultimately, however, many general methods can adapt to multimodal learning. In some sense, after all, age, height, weight, etc. are significantly different features compared to pixel intensities, yet we don't treat census data specially.

Also, look at this paper: http://www.stanford.edu/~aditya86/nips2010_khosla.pdf

Also look into canonical correlation analysis (CCA), which can deal with cross-modal correlations.
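For reference, the usual textbook form of the canonical correlation analysis objective for two modes is given below. The notation is introduced here and does not come from the answer: $x$ and $y$ are the (centered) feature vectors of the two modes, $w_x$ and $w_y$ are the projection directions being sought, and $\Sigma_{xx}$, $\Sigma_{yy}$, $\Sigma_{xy}$ are the within-mode and cross-mode covariance matrices.

```latex
\rho = \max_{w_x,\, w_y}
  \frac{w_x^{\top} \Sigma_{xy}\, w_y}
       {\sqrt{w_x^{\top} \Sigma_{xx}\, w_x}\;\sqrt{w_y^{\top} \Sigma_{yy}\, w_y}}
```

That is, it finds one direction per mode such that the projections $w_x^{\top} x$ and $w_y^{\top} y$ are maximally correlated; further pairs of directions are found the same way, subject to being uncorrelated with the earlier ones.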

answered May 23 '11 at 01:13

Jacob Jensen

Thanks! I'll take a look at this paper

(May 24 '11 at 10:32) exppie

User submitted content is under Creative Commons: Attribution - Share Alike; Other things copyright (C) 2010, MetaOptimize LLC.