Hi all, I have raw feature vectors that are concatenated from very different feature extractors (e.g., image features + acoustic features). Any good ideas for measuring the distance/similarity between them (e.g., for use in kNN, or for using the similarities as features to feed into SVMs)? I guess simple Euclidean distance does not work well? Thanks a lot.
A simple and effective approach, actually, is to normalize each modality separately so that, taken individually, its Euclidean distances or dot products typically fall in the same range as those of the other modalities, and then simply sum or average the per-modality similarities. You are then essentially using a naive Bayes approach to combining the evidence from the different modalities, since each one contributes to the total independently.

Ultimately, however, many general-purpose methods can adapt to multimodal learning. In some sense, after all, age, height, weight, etc. are very heterogeneous features compared to pixel intensities, yet we don't treat census data specially.

Also, look at this paper: http://www.stanford.edu/~aditya86/nips2010_khosla.pdf, and look into canonical correlation analysis (CCA), which can capture correlations across modalities.

Thanks! I'll take a look at this paper.
(May 24 '11 at 10:32)
exppie
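A rough sketch of the first suggestion in the answer (normalize each modality on its own, then average the per-modality distances), written with NumPy. The specific scaling rule (z-score every feature, then divide by the square root of that modality's dimension), the equal default weights, and the function names `fit_mode_scaler`, `multimodal_distance` and `knn_predict` are illustrative assumptions, not something from the thread; it does not cover the CCA or the linked paper.

```python
import numpy as np


def fit_mode_scaler(X_train, eps=1e-12):
    """Return a function that z-scores one modality and divides by sqrt(dim).

    Assumed scaling rule: after it, the mean squared Euclidean distance over
    all training pairs is exactly 2 for every modality (when no feature is
    constant), regardless of the modality's dimensionality or raw units.
    """
    X_train = np.asarray(X_train, dtype=float)
    mu = X_train.mean(axis=0)
    sd = X_train.std(axis=0)
    sd = np.where(sd > eps, sd, 1.0)  # leave constant features unscaled
    scale = np.sqrt(X_train.shape[1])

    def transform(X):
        return (np.asarray(X, dtype=float) - mu) / sd / scale

    return transform


def multimodal_distance(query_modes, train_modes, weights=None):
    """Weighted average of per-modality Euclidean distances.

    query_modes / train_modes: lists with one array per modality, shaped
    (n_query, d_m) and (n_train, d_m). Scalers are fit on training data only.
    """
    n_modes = len(train_modes)
    w = np.ones(n_modes) if weights is None else np.asarray(weights, dtype=float)
    D = np.zeros((len(query_modes[0]), len(train_modes[0])))
    for w_m, X_tr, X_q in zip(w, train_modes, query_modes):
        scaler = fit_mode_scaler(X_tr)
        A, B = scaler(X_q), scaler(X_tr)
        # Pairwise squared distances via broadcasting: (n_query, n_train)
        sq = ((A[:, None, :] - B[None, :, :]) ** 2).sum(axis=-1)
        D += w_m * np.sqrt(sq)
    return D / w.sum()


def knn_predict(train_modes, y_train, query_modes, k=3, weights=None):
    """Plain majority-vote kNN on top of the combined distance."""
    D = multimodal_distance(query_modes, train_modes, weights)
    nearest = np.argsort(D, axis=1)[:, :k]
    y_train = np.asarray(y_train)
    preds = []
    for labels in y_train[nearest]:
        values, counts = np.unique(labels, return_counts=True)
        preds.append(values[np.argmax(counts)])
    return np.array(preds)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    y = np.repeat([0, 1], 20)
    # Hypothetical data: 100-d "image" features and 13-d "audio" features
    # in much larger raw units.
    image = rng.normal(size=(40, 100)) + y[:, None]
    audio = 1e4 * (rng.normal(size=(40, 13)) + y[:, None])
    # With k=1 on the training set, each point is its own nearest neighbour,
    # so this reproduces y.
    print(knn_predict([image, audio], y, [image, audio], k=1))
```

Without the per-modality step, the audio block above (raw scale around 1e4) would dominate a plain Euclidean distance on the concatenated vector; after scaling, each modality contributes on a comparable footing, and the `weights` argument is where you would tilt the balance if one modality is known to be more reliable.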