I'm looking for a wide range of resources, such as query logs or popular-query statistics. Unfortunately, to the best of my knowledge, leading academic search engines such as Google Scholar or Scirus don't publish datasets of that kind.
|
UCI ML has a wide range of datasets that are commonly used to test new ML algorithms. I'm not 100% sure what you mean by 'query log', but Reddit recently released a large amount of data about their website, including user upvotes and downvotes on stories.
Here is the link for the UCI ML Repository: http://archive.ics.uci.edu/ml/
(Nov 30 '10 at 22:23)
Leon Palafox ♦
Whoops. I had it open in a tab but forgot to include it!
(Dec 01 '10 at 00:38)
Robert Layton
Robert, Leon, thanks for your comments. I'm aware of UCI ML, but it has no relevant datasets, and the Reddit data doesn't look pertinent either. I'm looking for samples similar to those of the Yahoo! Webscope Program (http://webscope.sandbox.yahoo.com/). In particular, I'm interested in queries related to the mentioned fields.
(Dec 01 '10 at 04:13)
Nikita Zhiltsov
|
Have you looked at the LETOR datasets? And things like past TREC tracks?
Not yet. I'm looking through these resources. Thank you.