|
I am currently trying to identify the most discriminative features in a given bag of features. I have access to positive bags (bags that contain features from positive instances) and one negative bag (a bag that contains all negative instances). I have multiple positive bags because I have multiple training instances to extract bags from. For the negative bag, I just group together the features from all the negative instances. I am now tasked with identifying the most discriminative features from the positive bags. I was thinking of solving my problem using the MIL (multiple-instance learning) framework to do the learning. I have a couple of questions regarding the usage of MIL.
|
|
You could also try using random forests. There are two fairly standard ways of estimating feature importance with RFs: one is based on the Gini impurity reduction at the nodes that split on a feature, and the other is based on the performance degradation caused by completely randomizing (permuting) that feature. The second approach is really applicable to any ML algorithm, but I have only seen it used in connection with RFs.

RFs are very sensitive to class imbalance, though; even a small imbalance can push an RF to just guessing the most common class. Usually, subsampling the more common class to match the less common one gets good results. Alternatively, you can include all the data in your training set but undersample the more common class when constructing the bootstrap samples used to train the individual trees. For undersampling during bagging, I have gotten slightly better performance by sampling the bags without replacement. I think this is because if you sample with replacement while undersampling a much bigger class, those samples will tend to be distinct, while samples from the less common class will contain duplicates and fewer distinct examples; I think this still ends up biasing the learning. The randomForest package for R supports all of the above tweaks.

Is there any significance to the way your instances are grouped into bags, or is it only to counteract the class imbalance? Can you just pool all the data into one training set? You could also run the feature importance algorithm on each of the multiple bags against the common bag and just see whether the same features come out as important.

Thanks for the answer. I will experiment with RFs. There is no significance to the way in which the instances are grouped into bags. I am given positive images and negative images. I extract multiple features from each image and group them into a bag. Some of the features from the positive images are not significant enough and are found in the negative images as well. So, I want to identify a ranking of the most important features from the positive instances. Regarding pooling all the data into one training set, how would that help?
(Dec 13 '12 at 22:35)
Vittal
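The two RF importance estimates described in the answer above (Gini reduction and performance drop under feature randomization) can be sketched with scikit-learn. The answer mentions R's randomForest package; this Python version is only an illustrative analogue, and the dataset is synthetic:

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.inspection import permutation_importance
from sklearn.model_selection import train_test_split

# Toy imbalanced data: 3 informative features out of 10,
# with a 90/10 class split.
X, y = make_classification(n_samples=2000, n_features=10, n_informative=3,
                           n_redundant=0, weights=[0.9, 0.1], random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, stratify=y, random_state=0)

# class_weight="balanced_subsample" reweights classes within each tree's
# bootstrap sample -- a rough analogue of undersampling the common class
# per bag, as suggested in the answer.
rf = RandomForestClassifier(n_estimators=200,
                            class_weight="balanced_subsample",
                            random_state=0).fit(X_tr, y_tr)

# 1) Impurity-based importance: total Gini reduction at splits on each feature.
gini_rank = np.argsort(rf.feature_importances_)[::-1]

# 2) Permutation importance: held-out performance drop when one feature
#    is randomly shuffled, breaking its relationship with the label.
perm = permutation_importance(rf, X_te, y_te, n_repeats=10, random_state=0)
perm_rank = np.argsort(perm.importances_mean)[::-1]

print("Gini ranking:", gini_rank)
print("Permutation ranking:", perm_rank)
```

As the answer notes, the permutation approach is model-agnostic: the same shuffle-and-rescore loop works with any fitted classifier.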
Regarding pooling the data, I was mainly thinking of simplifying the problem conceptually. The bags seem to be an extraneous artifact that just complicates the problem. It seems better to think of it just in terms of positive and negative instances. If you do any kind of undersampling or resampling you are reintroducing bags again, but then that is part of the solution; it does not seem to belong in the problem statement.
(Dec 14 '12 at 03:28)
Daniel Mahler
|
The question title does not really reflect the body of the question: the title is about instance probability, but the main body is about feature importance.
I agree with Daniel below: the bags seem to just complicate things. The way you describe the problem, it looks like binary classification with imbalanced data. What is the purpose of the feature selection? If the goal is to achieve better classification performance, I would try simply regularizing the model before even considering some heuristic feature selection.
The goal is not just better classification. The goal, primarily, is to identify discriminative features and the not-so-discriminative ones.
Ok. Did you consider starting with something simple, like mutual information in a naive Bayes style model?
Can you also say more about your data: number of instances & distinct features, feature distributions/sparsity, class frequencies ...
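A minimal version of the mutual-information baseline suggested above: score each feature by its estimated mutual information with the class label and rank. This sketch uses scikit-learn's `mutual_info_classif` on synthetic data (the actual features and data in the question are images, so this is only illustrative):

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.feature_selection import mutual_info_classif

# Synthetic stand-in: 8 features, of which 3 carry class information.
X, y = make_classification(n_samples=1000, n_features=8, n_informative=3,
                           n_redundant=0, random_state=0)

# Estimate MI between each feature and the label, then rank descending.
mi = mutual_info_classif(X, y, random_state=0)
ranking = np.argsort(mi)[::-1]
print("Features ranked by mutual information:", ranking)
```

Features with MI near zero are candidates for the "not-so-discriminative" set the questioner wants to identify; this makes a cheap first pass before heavier model-based importance methods.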