I have the following question. When doing image segmentation and classification, a logistic classifier assigns a label to each pixel or superpixel in an image. However, in my case the test data will inevitably contain more classes than the training data. I would like to classify only the regions that are properly recognized and leave the rest unknown. What are the options for doing classification in the presence of unknown classes? Here they suggest using the ratio of the log-likelihoods, log(P(C|D)/P(~C|D)), and thresholding on it. However, I wonder whether this is the proper way to go, since this ratio only uses conditional likelihoods. Would it be better to use a tractable generative model and evaluate something like log(P(C,D)/P(~C,D))?
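For what it's worth, here is a minimal sketch of the generative option, assuming scikit-learn is available and standing in for "a tractable generative model" with one Gaussian mixture per known class; the feature arrays, the number of mixture components, and the threshold value are illustrative assumptions, not something from the question.

    import numpy as np
    from sklearn.mixture import GaussianMixture

    def fit_class_models(X, y, n_components=3):
        """Fit one class-conditional density p(D|C) per known class and record
        the empirical class prior p(C), so that p(C, D) = p(C) * p(D|C)."""
        models, log_priors = {}, {}
        classes, counts = np.unique(y, return_counts=True)
        for c, n in zip(classes, counts):
            gmm = GaussianMixture(n_components=n_components, covariance_type="full")
            gmm.fit(X[y == c])
            models[c] = gmm
            log_priors[c] = np.log(n / len(y))
        return models, log_priors

    def classify_with_reject(models, log_priors, X_test, threshold=-50.0):
        """Pick the class maximising log p(C, D) = log p(C) + log p(D|C), but
        return None ("unknown") when even the best class explains the sample
        poorly. The threshold is data-dependent; tune it on a validation set."""
        labels = []
        for x in X_test:
            scores = {c: log_priors[c] + m.score_samples(x.reshape(1, -1))[0]
                      for c, m in models.items()}
            best = max(scores, key=scores.get)
            labels.append(best if scores[best] > threshold else None)
        return labels

The practical difference from thresholding log(P(C|D)/P(~C|D)) is that the joint score is not renormalized over the known classes, so a sample that none of the known classes explains well gets a uniformly low score instead of an artificially confident posterior.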
One quick and dirty way that I have used is a SOM (Self-Organizing Map). It basically creates a 2D map of an N-dimensional feature space, trying to mimic the topology of that space (it will try to cluster similar elements). Essentially, the larger the map, the larger the number of classes it can accommodate. So if you know more or less how many unseen classes you'll have, you can try a very large map to leave extra room for them. The map is effective for classifying new inputs online, so these new classes can be allocated as they appear.
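A rough sketch of that idea, assuming the third-party minisom package; the map size, the majority-vote node labelling, and the "unclaimed node means unknown" rule are my own illustrative choices rather than anything built into a SOM.

    import numpy as np
    from minisom import MiniSom

    def train_labelled_som(X_train, y_train, map_size=(20, 20), n_iter=5000):
        """Train an oversized SOM and label each node by majority vote of the
        training samples mapped to it; unclaimed nodes stay unlabelled."""
        som = MiniSom(map_size[0], map_size[1], X_train.shape[1],
                      sigma=1.0, learning_rate=0.5)
        som.random_weights_init(X_train)
        som.train_random(X_train, n_iter)
        node_labels = {}
        for x, label in zip(X_train, y_train):
            node_labels.setdefault(som.winner(x), []).append(label)
        node_labels = {node: max(set(ys), key=ys.count)
                       for node, ys in node_labels.items()}
        return som, node_labels

    def classify(som, node_labels, x):
        """Map a new sample to its best-matching node; nodes never claimed by
        the training data serve as room for unseen classes."""
        return node_labels.get(som.winner(x), "unknown")

Nodes on a sufficiently large map that no training sample claims act as the "extra space" mentioned above; a stricter variant also rejects samples whose quantization distance to the winning node is unusually large.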
One way of solving this problem is to first perform inference on the test data and then find the marginal probability of the label assigned to each pixel. If that marginal probability is below some low threshold (say 0.1 in the case of 2 labels), assume that the class of this pixel is not among the known classes, since the model does not have enough confidence in the label, and mark the pixel as UnknownClass.

This is essentially what I mentioned in my post, right? The biggest problem I see is that classifiers are often over-confident, i.e. the probabilities are not well calibrated, especially if you use something like AdaBoost to get good classification results.
(Apr 26 '11 at 17:44)
Roderick Nijs
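To make the thresholding (and the calibration worry raised in the comment) concrete, here is a minimal sketch assuming scikit-learn, with each pixel's feature vector as one row; the choice of AdaBoost as the base classifier, the sigmoid calibration, and the threshold value are illustrative assumptions.

    import numpy as np
    from sklearn.ensemble import AdaBoostClassifier
    from sklearn.calibration import CalibratedClassifierCV

    UNKNOWN = -1  # sentinel label, assuming the known classes use non-negative ints

    def fit_calibrated(X_train, y_train):
        """Wrap the base classifier in a probability calibrator so the per-pixel
        marginals are less over-confident before we threshold them."""
        base = AdaBoostClassifier(n_estimators=100)
        clf = CalibratedClassifierCV(base, method="sigmoid", cv=3)
        clf.fit(X_train, y_train)
        return clf

    def predict_with_unknown(clf, X_pixels, threshold=0.5):
        """Assign each pixel its most probable known class, but fall back to
        UNKNOWN when even that class does not reach the confidence threshold."""
        proba = clf.predict_proba(X_pixels)          # shape (n_pixels, n_classes)
        best = proba.max(axis=1)
        labels = clf.classes_[proba.argmax(axis=1)]
        return np.where(best < threshold, UNKNOWN, labels)

Note that with K known classes the maximum marginal can never fall below 1/K, so the rejection threshold has to be chosen with that floor in mind, ideally on held-out data.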