|
I'm looking for general relevant results in this area. |
|
The main approaches, as far as I know, are:
I found Hodge and Austin's paper "A Survey of Outlier Detection Methodologies" to be useful. |
|
You can refer to the work of David M. J. Tax, especially his Ph.D. thesis. He also has a MATLAB toolbox. See http://prlab.tudelft.nl/users/david-tax for details. It may be useful for you.
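As far as I know, Tax's thesis is about one-class classification: you fit a description of the "normal" data only and flag whatever falls outside it. To make the idea concrete, here is a minimal sketch in Python. It is not Tax's method and not the toolbox's API; it is a much cruder stand-in that puts a sphere around the training centroid. The names `fit_one_class`, `is_outlier` and `reject_fraction` are hypothetical, made up for this illustration.

```python
import math

def distance(a, b):
    """Euclidean distance between two points of equal dimension."""
    return math.sqrt(sum((ai - bi) ** 2 for ai, bi in zip(a, b)))

def fit_one_class(train, reject_fraction=0.05):
    """Fit a centroid and radius using examples of the normal class only.

    Assumption for this sketch: the radius is set so that roughly
    reject_fraction of the training points would fall outside the sphere.
    """
    n = len(train)
    dim = len(train[0])
    center = [sum(x[j] for x in train) / n for j in range(dim)]
    dists = sorted(distance(x, center) for x in train)
    k = min(n - 1, max(0, math.ceil((1 - reject_fraction) * n) - 1))
    return center, dists[k]

def is_outlier(x, center, radius):
    """A point is flagged when it lies strictly outside the sphere."""
    return distance(x, center) > radius

train = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0)]
center, radius = fit_one_class(train)
print(is_outlier((0.5, 0.4), center, radius))  # False
print(is_outlier((5.0, 5.0), center, radius))  # True
```

Note that no outlier examples are needed for fitting, which is the appeal when reliable labels for one class are hard to get.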
|
|
There's a nice JMLR paper by Owen, "Infinitely Imbalanced Logistic Regression", on logistic regression in the setting where the training set contains finitely many positive observations and infinitely many negative ones. |
|
I think the literature on robust statistics is relevant, though I am not familiar enough with it to give a more precise answer. |
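One standard tool from robust statistics that applies directly to outlier detection is the median absolute deviation (MAD). Unlike the mean and standard deviation, the median and MAD are barely moved by the outliers themselves. Below is a small Python sketch of the usual "modified z-score" rule; the function name `mad_outliers` and the default cutoff of 3.5 are choices made for this example, not something taken from the thread.

```python
from statistics import median

def mad_outliers(values, threshold=3.5):
    """Return indices of points whose modified z-score exceeds threshold."""
    med = median(values)
    abs_dev = [abs(v - med) for v in values]
    mad = median(abs_dev)
    if mad == 0:
        # More than half the points equal the median, so the score is
        # undefined; fall back to flagging anything that differs from it.
        return [i for i, d in enumerate(abs_dev) if d > 0]
    # 0.6745 rescales the MAD so the score is comparable to a z-score
    # when the bulk of the data is roughly normal.
    return [i for i, d in enumerate(abs_dev) if 0.6745 * d / mad > threshold]

data = [9.8, 10.1, 10.0, 9.9, 10.2, 10.0, 25.0]
print(mad_outliers(data))  # prints [6]
```

This only handles one-dimensional data; multivariate robust estimators are a larger topic.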
|
This paper might be of some interest to you: "Regularized F-Measure Maximization for Feature Selection and Classification". I have not used the method myself, though I might have to (which is how I stumbled upon it). Supposedly it works well on unbalanced datasets, which I guess is what you have in mind.
The application I have in mind is not classification per se; I'm interested in the problem of detecting outliers, not necessarily dealing with them.
(Oct 18 '10 at 18:26)
Alexandre Passos ♦
That's a classification problem though, isn't it (f(is_an_outlier)=0, f(!is_an_outlier)=1)?
(Oct 19 '10 at 04:27)
Bob Durrant
@Durrant: This is true; however, I'm interested in detecting more than one sort of outlier, so it'd be hard to get reliable negative ("isn't an outlier") examples. And with logistic regression at least, as per the paper you mentioned, the only relevant thing would be the average features of the "isn't an outlier" class, which is too coarse.
(Oct 19 '10 at 04:37)
Alexandre Passos ♦
|