
Papers, workshops, courses, software...

Cross-posted on quora.

asked Jun 12 '11 at 18:02 by alex; edited Feb 13 '12 at 03:25 by ogrisel


9 Answers:

To bring together people interested in processing Big Data, distributed computing and machine learning, I created a new website, processingbigdata.com.

answered Sep 09 '11 at 13:00 by Sergey Dolgopolov; edited Sep 09 '11 at 13:00

Mining of Massive Datasets is a great free e-book by two Stanford professors. It focuses more on the MapReduce way of doing large-scale machine learning than on, for example, online methods.
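To make "the MapReduce way" a bit more concrete: one common pattern for batch learners is that each mapper computes a partial gradient over its shard of the data, a reducer sums those partials, and a driver takes one gradient step per map/reduce round. Below is a minimal single-process Python sketch of that pattern for logistic regression, assuming labels in {0, 1} and plain batch gradient descent. The function names (`map_partial_gradient`, `reduce_sum`, `train`) are hypothetical; this is not code from the book and not the Hadoop API.

```python
import math
from functools import reduce

def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def map_partial_gradient(w, shard):
    """Map: logistic-loss gradient summed over one shard of (x, y) pairs, y in {0, 1}."""
    grad = [0.0] * len(w)
    for x, y in shard:
        p = sigmoid(sum(wi * xi for wi, xi in zip(w, x)))
        for j, xj in enumerate(x):
            grad[j] += (p - y) * xj
    return grad

def reduce_sum(g1, g2):
    """Reduce: element-wise sum of two partial gradients."""
    return [a + b for a, b in zip(g1, g2)]

def train(shards, dim, steps=100, lr=0.5):
    """Driver: one map/reduce round per batch gradient step."""
    n = sum(len(shard) for shard in shards)
    w = [0.0] * dim
    for _ in range(steps):
        # On a real cluster each map call would run on the node holding that shard.
        partials = [map_partial_gradient(w, shard) for shard in shards]
        total = reduce(reduce_sum, partials)
        w = [wi - lr * gi / n for wi, gi in zip(w, total)]  # step on the mean gradient
    return w
```

Because the gradient is a plain sum over examples, the sharded result equals the single-machine one up to floating-point rounding. That is also why this style suits batch methods better than online ones, which update after every example.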

If you're interested in Natural Language Processing in particular, Jimmy Lin also has a good free e-book called Data-Intensive Text Processing with MapReduce. It's also MapReduce-focused.

Vowpal Wabbit is a great piece of software for fast online learning on huge datasets.

If you're more interested in the research frontiers of large-scale machine learning, Stanford hosts a Workshop on Algorithms for Modern Massive Datasets that has a bunch of great papers.

There's also an upcoming book on Scaling up Machine Learning by Ron Bekkerman, Misha Bilenko, and John Langford.

answered Jun 16 '11 at 14:02 by grautur; edited Sep 19 '11 at 16:08

I'm having trouble finding the paper, but I believe it was either Alexander Krizhevsky (more likely) or Graham Taylor who co-authored a paper that parallelized training of a large RBM across many networked machines. Their basic approach was to add a very large number of hidden units, then parallelize the up and down passes of Gibbs sampling in a manner somewhat similar to the way work is split up in a MapReduce-style framework.
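The reason such a split can work: in a binary RBM the hidden units are conditionally independent given the visible units. Each machine can therefore sample its own slice of hidden units from p(h_j = 1 | v) = σ(b_j + Σ_i v_i W_ij) without talking to the others. The down pass then only needs each slice's partial sums Σ_j W_ij h_j, which are added up as in a reduce step. Below is a minimal single-process sketch of one Gibbs step under that assumption. It is my reconstruction of the general idea, not the method from the paper. The "workers" are just list entries, and all names (`up_pass_worker`, `down_pass_partial`, `gibbs_step`) are hypothetical.

```python
import math
import random

def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def up_pass_worker(v, cols, b_slice, rng):
    """Sample one worker's slice of binary hidden units given the visible vector v.
    cols[k][i] is the weight between visible unit i and this worker's k-th hidden unit."""
    h = []
    for col, b in zip(cols, b_slice):
        p = sigmoid(b + sum(vi * wi for vi, wi in zip(v, col)))
        h.append(1 if rng.random() < p else 0)
    return h

def down_pass_partial(h_slice, cols, n_visible):
    """This worker's contribution sum_j W_ij * h_j to every visible unit's input."""
    partial = [0.0] * n_visible
    for hj, col in zip(h_slice, cols):
        if hj:
            for i in range(n_visible):
                partial[i] += col[i]
    return partial

def gibbs_step(v, shards, a, rng):
    """One v -> h -> v' Gibbs step. shards is a list of (cols, b_slice), one per
    hypothetical worker; a holds the visible biases."""
    # Map: each worker samples its hidden slice independently (no communication).
    h_slices = [up_pass_worker(v, cols, b, rng) for cols, b in shards]
    partials = [down_pass_partial(h, cols, len(a)) for h, (cols, _) in zip(h_slices, shards)]
    # Reduce: add the partial inputs, then sample the visible units.
    totals = [sum(p[i] for p in partials) for i in range(len(a))]
    v_new = [1 if rng.random() < sigmoid(ai + ti) else 0 for ai, ti in zip(a, totals)]
    return h_slices, v_new
```

The only data that crosses machine boundaries in this scheme is v (broadcast to every worker) and one partial-sum vector per worker, each of length equal to the number of visible units.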

This answer is marked "community wiki".

answered Jun 16 '11 at 13:29 by Brian Vandenberg

(Jun 17 '11 at 19:01) alex

I enjoyed reading Leon Bottou's tutorial [1] on large scale learning with SVMs and CRFs trained by stochastic gradient descent.
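For reference, the core of the SGD approach for a linear SVM fits in a few lines. At each step, take one example, shrink w for the L2 regularizer, and add η_t·y·x only when the example is inside the margin. The sketch below assumes labels in {-1, +1}, no separate bias term (append a constant feature instead), and a decreasing step size η_t = 1/(λ(t + t0)). It is an illustration of the general technique, not the tutorial's code, and the function name is hypothetical.

```python
import random

def sgd_linear_svm(data, dim, lam=0.01, epochs=10, t0=100.0, seed=0):
    """SGD on lam/2 * ||w||^2 + mean hinge loss; data is a list of (x, y), y in {-1, +1}."""
    rng = random.Random(seed)
    w = [0.0] * dim
    order = list(range(len(data)))
    t = 0
    for _ in range(epochs):
        rng.shuffle(order)  # visit the examples in a fresh random order each epoch
        for idx in order:
            x, y = data[idx]
            eta = 1.0 / (lam * (t + t0))  # decreasing step size (assumed schedule)
            margin = y * sum(wi * xi for wi, xi in zip(w, x))  # uses w before this update
            w = [wi * (1.0 - eta * lam) for wi in w]  # gradient step on the L2 term
            if margin < 1.0:  # hinge subgradient is nonzero only inside the margin
                w = [wi + eta * y * xi for wi, xi in zip(w, x)]
            t += 1
    return w
```

Each update touches a single example, which is what makes this kind of method attractive when the data does not fit in memory or arrives as a stream.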

answered Jun 15 '11 at 13:37 by levesque; edited Jun 15 '11 at 14:39

I follow John Langford's machine learning blog, hunch.net. There I found this link to a list of resources for learning about large-scale machine learning: http://www.quora.com/Machine-Learning/What-are-some-introductory-resources-for-learning-about-large-scale-machine-learning#ans104989

answered Jun 15 '11 at 04:57 by Georgiana Ifrim

Even though it is more on the data mining side, I think this course and the accompanying book may be of interest: Data Mining: Learning from large datasets.

answered Jun 14 '11 at 14:33 by Svetoslav Marinov

This paper (A Comparison of Optimization Methods and Software for Large-scale L1-regularized Linear Classification, Guo-Xun Yuan, Kai-Wei Chang, Cho-Jui Hsieh, Chih-Jen Lin; JMLR, 2010) is a good overview of that area. My favourite source for reading on various ML topics is still JMLR, because its papers tend to be written in a tutorial style.
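As a taste of why L1 regularization needs dedicated solvers: the penalty is not differentiable at zero. A simple baseline is proximal gradient descent (ISTA), which takes an ordinary gradient step on the loss and then soft-thresholds each weight, setting small ones exactly to zero. Below is a minimal sketch for L1-regularized logistic regression with labels in {-1, +1}. It is not necessarily one of the solvers compared in the paper. The function names, step size and iteration count are assumptions, and for simplicity every weight is penalized, including the one on a constant feature.

```python
import math

def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def soft_threshold(z, tau):
    """Proximal operator of tau * |z|: shrink z toward zero by tau, clipping at zero."""
    if z > tau:
        return z - tau
    if z < -tau:
        return z + tau
    return 0.0

def l1_logistic_ista(data, dim, lam=0.1, lr=0.5, steps=200):
    """ISTA for mean logistic loss + lam * ||w||_1; data is a list of (x, y), y in {-1, +1}."""
    n = len(data)
    w = [0.0] * dim
    for _ in range(steps):
        grad = [0.0] * dim
        for x, y in data:
            m = y * sum(wi * xi for wi, xi in zip(w, x))
            # Gradient of log(1 + exp(-m)) with respect to w is -y * sigmoid(-m) * x.
            coef = -y * sigmoid(-m) / n
            for j, xj in enumerate(x):
                grad[j] += coef * xj
        # Gradient step on the smooth loss, then the L1 prox (soft-thresholding).
        w = [soft_threshold(wi - lr * gi, lr * lam) for wi, gi in zip(w, grad)]
    return w
```

The step size must be at most 1/L, where L is the Lipschitz constant of the loss gradient, for this iteration to be guaranteed to converge. Making each step cheap on very large, sparse data is where the more specialized methods earn their keep.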

answered Jun 14 '11 at 11:12 by Georgiana Ifrim

I think large-scale machine learning is still very much an area of active research. Most of the recent advances are still appearing in papers, and as far as I know there is no comprehensive book or class on the topic. I would therefore recommend watching the talks from the "Learning on Cores, Clusters, and Clouds" workshop at NIPS 2010 and the "Large-Scale Machine Learning" workshop at NIPS 2009. Watching the talks and reading the papers might point you towards other interesting resources on this topic.

Edit: Actually, a book on the subject just came out: Scaling Up Machine Learning, by Bekkerman, Bilenko, and Langford.

answered Jun 13 '11 at 03:38 by Alexandre Passos ♦; edited Feb 26 '12 at 15:37

I'd like to add Alex Smola's talk on graphical models for the internet. (link)

(Feb 26 '12 at 12:10) jjossarin
(Feb 26 '12 at 15:39) Dov

I can recommend Apache Mahout, a Hadoop-based open-source Java library implementing large-scale machine learning and collaborative filtering algorithms.

answered Jun 12 '11 at 18:39 by Sergey Dolgopolov; edited Jun 12 '11 at 18:40


User submitted content is under Creative Commons: Attribution - Share Alike; Other things copyright (C) 2010, MetaOptimize LLC.