I'm looking for detailed empirical studies of machine learning algorithms that try to shed light on the strengths and weaknesses of different methods. Most papers take the form "here is a standard method... here is my method... I win! Yay!" I'm looking more for conclusions along the lines of "method X does well in situation 1 and poorly in situation 2." I think these kinds of references would be very useful for anyone getting started with, or learning to apply, machine learning techniques on real-world problems. Some references off the top of my head:

- C. Perlich, F. Provost, J. Simonoff. Tree induction vs. logistic regression: a learning-curve analysis. JMLR 2003.
- A. Niculescu-Mizil, R. Caruana. An empirical comparison of supervised learning algorithms. ICML 2006.
- E. Bernado Mansilla, T. K. Ho. On classifier domains of competence. ICPR 2004.
- I. Rish. An empirical study of the Naive Bayes classifier. IJCAI 2001.

Surveys of work within specific problem domains (e.g. imbalanced data, very large data, very small data, high-dimensional data, non-stationary data) would also be of interest, as long as the strengths and weaknesses of algorithms are discussed. I would imagine there are tons of these, but they are highly scattered across time and publication venues. Any pointers would be great. Don't be afraid to promote your own work.
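To make concrete the kind of analysis I mean, here is a minimal sketch in the spirit of the Perlich et al. learning-curve study (Python with scikit-learn assumed, which none of the papers above use; the synthetic dataset is just a stand-in for real benchmarks): train tree induction and logistic regression on growing samples and see where each method wins.

```python
# Minimal sketch (scikit-learn assumed; synthetic data stands in for the
# real benchmark sets used in the published studies) of a learning-curve
# comparison: tree induction vs. logistic regression on growing samples.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import learning_curve
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=5000, n_features=20, n_informative=10,
                           random_state=0)

for name, model in [("logistic regression", LogisticRegression(max_iter=1000)),
                    ("decision tree", DecisionTreeClassifier(random_state=0))]:
    sizes, _, test_scores = learning_curve(
        model, X, y, train_sizes=np.linspace(0.1, 1.0, 5), cv=5)
    # Mean cross-validated accuracy at each training-set size; the crossover
    # (if any) shows which method wins at small vs. large sample sizes.
    for n, acc in zip(sizes, test_scores.mean(axis=1)):
        print(f"{name}: n={n:4d}  accuracy={acc:.3f}")
```

Roughly, Perlich et al.'s headline finding was that logistic regression tends to win on smaller training sets and tree induction on larger ones, so the crossover point is the interesting output here.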
Some possibly low-quality comparisons:
Here's another one: "An empirical evaluation of supervised learning in high dimensions" by Caruana et al., 2008.
Here's a freely available copy from the IMLS website: http://www.machinelearning.org/archive/icml2008/papers/632.pdf
Sean (Feb 15 '11 at 23:01)
I found the R. Caruana paper listed above, and his others, very enlightening and helpful in creating high-quality ensembles. In case you haven't seen it already, here is a presentation he gave about the results of that paper.
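For anyone who wants to try the ensemble idea hands-on, here is a minimal sketch (scikit-learn assumed; this is plain soft-voting model averaging, not the greedy ensemble-selection procedure from Caruana's papers) of combining heterogeneous supervised learners and checking whether the ensemble beats its pieces:

```python
# Minimal sketch (scikit-learn assumed): soft-voting model averaging over
# heterogeneous learners. This is NOT Caruana's greedy ensemble selection,
# just the simplest way to see an ensemble compete with its members.
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier, VotingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.naive_bayes import GaussianNB

X, y = make_classification(n_samples=2000, n_features=20, random_state=0)

members = [("lr", LogisticRegression(max_iter=1000)),
           ("rf", RandomForestClassifier(random_state=0)),
           ("nb", GaussianNB())]
# "soft" voting averages predicted class probabilities across models.
ensemble = VotingClassifier(estimators=members, voting="soft")

# Cross-validated accuracy of each member and of the averaged ensemble.
for name, model in members + [("ensemble", ensemble)]:
    print(name, cross_val_score(model, X, y, cv=5).mean().round(3))
```

Caruana's actual procedure adds models to the ensemble greedily based on held-out validation performance, which usually beats naive averaging; the sketch above is only the baseline version of the idea.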