I'm looking for a wide range of resources. They might be query logs or popular query statistics. Unfortunately, to the best of my knowledge, the leading academic search engines like Google Scholar or Scirus don't publish datasets of that kind.

asked Nov 30 '10 at 17:19

Nikita Zhiltsov

Have you looked at the LETOR datasets? And things like past TREC tracks?

(Dec 01 '10 at 13:39) Alexandre Passos ♦

Not yet. I'm looking through these resources. Thank you.

(Dec 02 '10 at 13:04) Nikita Zhiltsov

One Answer:

UCI ML has a wide range of datasets that are commonly used to test new ML algorithms. I'm not 100% sure what you mean by 'query log', but Reddit recently released a large amount of data relating to their website, with user upvotes and downvotes on stories here.

answered Nov 30 '10 at 17:26

Robert Layton

Here is the link for the UCI ML Repository: http://archive.ics.uci.edu/ml/

(Nov 30 '10 at 22:23) Leon Palafox ♦

Whoops. I did have it open in a tab, but forgot to include it!

(Dec 01 '10 at 00:38) Robert Layton

Robert, Leon, thanks for your comments. I'm aware of UCI ML, but it has no relevant data sets, and the Reddit data doesn't look pertinent either. I'm looking for samples similar to those of the Yahoo! Webscope Program (http://webscope.sandbox.yahoo.com/). In particular, I'm interested in queries related to the mentioned fields.

(Dec 01 '10 at 04:13) Nikita Zhiltsov

User submitted content is under Creative Commons: Attribution - Share Alike; Other things copyright (C) 2010, MetaOptimize LLC.