I am currently trying to identify the most discriminative features in a given bag of features. I have access to positive bags (bags containing features from positive instances) and one negative bag (a bag containing all negative instances). I have multiple positive bags because I have multiple training instances from which to extract bags. For the negative bag, I simply group together the features from all the negative instances. I am now tasked with identifying the most discriminative features from the positive bags.

I was thinking of solving my problem using the multiple-instance learning (MIL) framework. I have a couple of questions regarding the use of MIL.

  1. I know that mi-SVM can separate the examples well, but the distance of a feature point from the hyperplane is not directly related to a confidence measure. So how can I get a confidence measure for each feature?

  2. Would there be any issues with training bias, since I have many more negative instances than positive instances?

  3. In what other ways can I measure the confidence level of each feature, apart from using the MIL framework? I am guessing that I would need a good discriminative classifier to estimate the conditional probability. Any suggestions?

asked Dec 13 '12 at 08:40

Vittal

edited Dec 13 '12 at 22:36

The question title does not really reflect the body of the question: the title is about instance probability but the main body is about feature importance.

(Dec 13 '12 at 15:40) Daniel Mahler

I agree with Daniel below; the bags seem to just complicate things. The way you describe the problem, it looks like binary classification with imbalanced data. What is the purpose of the feature selection? If the goal is to achieve better classification performance, I would try to simply regularize the model before even considering a heuristic feature selection.

(Dec 14 '12 at 04:07) Oscar Täckström

The goal is not just better classification. The goal, primarily, is to identify discriminative features and the not-so-discriminative ones.

(Dec 14 '12 at 06:57) Vittal

Ok. Did you consider starting with something simple, like mutual information in a naive Bayes style model?

(Dec 14 '12 at 07:44) Oscar Täckström
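As a rough illustration of that suggestion, a per-feature mutual information ranking could be computed along these lines (a minimal sketch; the use of scikit-learn's mutual_info_classif and the toy data are assumptions, not something specified in the thread):

```python
import numpy as np
from sklearn.feature_selection import mutual_info_classif

# Placeholder data: X is (n_instances, n_features), y holds 0/1 labels.
rng = np.random.RandomState(0)
X = rng.randn(300, 10)
y = (X[:, 2] + 0.1 * rng.randn(300) > 0).astype(int)

# Estimated mutual information between each feature and the class label;
# higher scores suggest more discriminative features.
mi = mutual_info_classif(X, y, random_state=0)
print("features ranked by MI:", np.argsort(mi)[::-1])
```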

Can you also say more about your data: number of instances & distinct features, feature distributions/sparsity, class frequencies ...

(Dec 14 '12 at 08:33) Daniel Mahler

One Answer:

You could also try using random forests. There are two fairly standard ways of estimating feature importance with RFs: one is based on the Gini impurity reduction at the nodes that split on a feature, and the other is based on the performance degradation caused by completely randomizing (permuting) that feature. The second approach should really be applicable to any ML algorithm, but I have only seen it used in connection with RFs.

RFs are very sensitive to class imbalance, though; even a small imbalance can push an RF towards just guessing the most common class. Usually, subsampling the more common class to match the less common one seems to give good results. Alternatively, you can include all the data in your training set but undersample the more common class when constructing the bootstrap samples used to train the individual trees. For undersampling during bagging, I have gotten slightly better performance by sampling without replacement. I think this is because, if you sample with replacement while undersampling a much larger class, those samples will tend to be distinct, whereas samples from the less common class will contain duplicates and therefore fewer distinct examples, which still ends up biasing the learning. The randomForest package for R supports all of the above tweaks.
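In case a Python version is helpful, here is a minimal sketch of both importance estimates using scikit-learn rather than the R randomForest package; the toy data, parameters, and the 'balanced_subsample' class-weighting option are illustrative assumptions, not a prescribed recipe:

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.inspection import permutation_importance
from sklearn.model_selection import train_test_split

# Placeholder data: X is (n_instances, n_features), y holds 0/1 labels.
rng = np.random.RandomState(0)
X = rng.randn(500, 20)
y = (X[:, 3] + 0.5 * X[:, 7] + 0.1 * rng.randn(500) > 0).astype(int)

X_tr, X_te, y_tr, y_te = train_test_split(X, y, stratify=y, random_state=0)

# 'balanced_subsample' reweights classes within each bootstrap sample,
# one way of counteracting class imbalance.
rf = RandomForestClassifier(
    n_estimators=500, class_weight="balanced_subsample", random_state=0
).fit(X_tr, y_tr)

# 1) Impurity-based (Gini) importance, accumulated over the training splits.
gini_importance = rf.feature_importances_

# 2) Permutation importance: the drop in score when one feature is shuffled,
#    estimated here on held-out data.
perm = permutation_importance(rf, X_te, y_te, n_repeats=20, random_state=0)
perm_importance = perm.importances_mean

print("by Gini importance:       ", np.argsort(gini_importance)[::-1][:5])
print("by permutation importance:", np.argsort(perm_importance)[::-1][:5])
```

Permutation importance computed on held-out data is generally less biased towards features with many distinct values than the impurity-based score, which is worth keeping in mind when ranking features.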

Is there any significance to the way your instances are grouped into bags, or is it only to counteract the class imbalance? Can you just pool all the data into one training set? You could also run the feature importance algorithm on each of the positive bags against the common negative bag and simply check whether the same features come out as important.

answered Dec 13 '12 at 15:38

Daniel Mahler

edited Dec 13 '12 at 15:45

Thanks for the answer. I will experiment with RFs.

There is no significance to the way in which the instances are grouped into bags. I am given positive images and negative images. I extract multiple features from each image and group them into a bag. Some of the features from the positive images are not significant enough and are found in the negative images as well. So I want to identify a ranking of the most important features from the positive instances.

Regarding pooling all the data into one training set, how would that help?

(Dec 13 '12 at 22:35) Vittal

Regarding pooling the data, I was mainly thinking of simplifying the problem conceptually. The bags seem to be an extraneous artifact that just complicates the problem. It seems better to think of it purely in terms of positive and negative instances. If you do any kind of undersampling or resampling you are reintroducing bags again, but that is part of the solution; it does not seem to belong in the problem statement.

(Dec 14 '12 at 03:28) Daniel Mahler