
Following this question, I thought it might be a good idea to have a separate question just about benchmarks. Are there any websites that host benchmark datasets together with published results? I know about the UCI repository, but as far as I know it contains only datasets, not results. Also, most of the widely used datasets are very small.

What would be a good way of starting a benchmark collection and what properties should it have?

Classification is certainly easy to benchmark, but other tasks such as collaborative filtering and structured prediction are also very interesting.

I see MNIST being used as a benchmark again and again. It seems to be one of the few datasets where there are results available on the web (at Yann LeCun's website). But people should definitely move away from that :(

This question is marked "community wiki".

asked Sep 11 '11 at 13:52


Andreas Mueller


One Answer:

Initiatives such as mlcomp and mldata are a step in that direction, although they have not yet been adopted by enough researchers to become a standard way of publishing benchmark results.

This answer is marked "community wiki".

answered Sep 11 '11 at 17:52


ogrisel

edited Sep 11 '11 at 18:12

I tried pressuring a professor at my university in Brazil to have his students upload either code or data to MLCOMP as homework instead of doing all the testing separately. Unfortunately, I don't think he actually did it.

(Sep 11 '11 at 18:10) Alexandre Passos ♦

User submitted content is under Creative Commons: Attribution - Share Alike; Other things copyright (C) 2010, MetaOptimize LLC.