Is the data clustering algorithm used by Google publicly known? If so, where can I find it?

asked Jan 12 at 12:16

Shna

edited Jan 26 at 04:35

Joseph Turian ♦♦

Google uses many variants of many clustering algorithms internally. Some of them are known (k-means), some of them are not (their clustering spelling corrector). Can you be more specific?

(Jan 12 at 12:38) Alexandre Passos ♦

What is the main algorithm that Google uses to cluster data (mainly web page content)? For which task do they use k-means? Is it an incremental version? A static clustering (e.g. classical k-means) does not seem feasible, because the data in question is very large and keeps changing and growing.

(Jan 12 at 18:26) Shna

Perhaps you might look into PageRank, which is the algorithm Google is best known for.

(Jan 12 at 23:08) Leon Palafox

this is lol

(Jan 12 at 23:12) Travis Wolfe

They don't use k-means; they use keywords, user behavior, and all the ingenuity and technical effort that their hundreds or thousands of search engineers can muster, working with the largest server farm in the world and a dataset larger than the sum total of all text produced by humans before the year 2005. There is no single algorithm (though PageRank is the largest single element).

(Jan 12 at 23:30) Jacob Jensen

2 Answers:

You could search Google's publication database for clustering and k-means: http://research.google.com/pubs/papers.html

One relevant Googler paper that's not in that list for some reason is "Fast and Accurate k-means For Large Datasets", NIPS 2011
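To make the "incremental version" raised in the comments concrete, here is a minimal sketch of sequential (online) k-means, where each arriving point nudges its nearest center by a running-mean update. This is not the algorithm from the NIPS 2011 paper, and nothing here is known to be what Google runs; the class name `OnlineKMeans` and the choice to seed the centers with the first k points are assumptions made purely for illustration.

```python
from typing import List, Sequence


def nearest(centers: List[List[float]], x: Sequence[float]) -> int:
    """Return the index of the center closest to x (squared Euclidean distance)."""
    return min(
        range(len(centers)),
        key=lambda j: sum((c - v) ** 2 for c, v in zip(centers[j], x)),
    )


class OnlineKMeans:
    """Hypothetical illustration: one pass over a stream, constant memory per center."""

    def __init__(self, k: int) -> None:
        self.k = k
        self.centers: List[List[float]] = []
        self.counts: List[int] = []

    def update(self, x: Sequence[float]) -> int:
        """Assign x to a cluster, move that center toward x, return the cluster index."""
        # Assumption: the first k points seed the centers.
        if len(self.centers) < self.k:
            self.centers.append([float(v) for v in x])
            self.counts.append(1)
            return len(self.centers) - 1
        j = nearest(self.centers, x)
        self.counts[j] += 1
        step = 1.0 / self.counts[j]  # running-mean step size
        self.centers[j] = [c + step * (v - c) for c, v in zip(self.centers[j], x)]
        return j


km = OnlineKMeans(k=2)
for point in [(0, 0), (10, 10), (0, 2), (10, 12)]:
    km.update(point)
```

Because each center is the running mean of the points assigned to it so far, new pages can be folded in without re-clustering the whole collection, at the cost of being sensitive to arrival order and to the initial seeds.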

answered Jan 22 at 05:06

Yaroslav Bulatov

Since you ask about web-scale clustering of web pages, I would assume they use either a distributed k-means or the following algorithm, which has a Googler as a co-author: Efficient Clustering of Web-Derived Data Sets (Sarmento et al., 2009).

I outline the algorithm in the answer to Large-scale clustering.
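For the distributed k-means option, a rough sketch of one iteration in a map/reduce style may help: each shard computes per-center coordinate sums and counts against the current centers, and a reduce step merges them into new centers. This is only an illustration of the general idea, not the Sarmento et al. algorithm and not a description of any Google system; the function names (`map_shard`, `reduce_partials`, `kmeans_step`) are hypothetical, and the shards are processed in a plain loop where a real deployment would run them on separate machines.

```python
from typing import Dict, List, Sequence, Tuple

Point = Sequence[float]
# center index -> (per-coordinate sums, number of points assigned)
Partial = Dict[int, Tuple[List[float], int]]


def map_shard(shard: Sequence[Point], centers: List[List[float]]) -> Partial:
    """Assign each point in one shard to its nearest center and accumulate sums."""
    partial: Partial = {}
    for x in shard:
        j = min(
            range(len(centers)),
            key=lambda i: sum((c - v) ** 2 for c, v in zip(centers[i], x)),
        )
        sums, n = partial.get(j, ([0.0] * len(x), 0))
        partial[j] = ([s + v for s, v in zip(sums, x)], n + 1)
    return partial


def reduce_partials(partials: Sequence[Partial],
                    centers: List[List[float]]) -> List[List[float]]:
    """Merge per-shard sums; centers that received no points keep their old position."""
    totals: Partial = {}
    for partial in partials:
        for j, (sums, n) in partial.items():
            acc, m = totals.get(j, ([0.0] * len(sums), 0))
            totals[j] = ([a + s for a, s in zip(acc, sums)], m + n)
    new_centers = [list(c) for c in centers]
    for j, (sums, n) in totals.items():
        new_centers[j] = [s / n for s in sums]
    return new_centers


def kmeans_step(shards: Sequence[Sequence[Point]],
                centers: List[List[float]]) -> List[List[float]]:
    """One k-means iteration; the map calls are independent and could run in parallel."""
    return reduce_partials([map_shard(s, centers) for s in shards], centers)


shards = [[(0, 0), (10, 10)], [(0, 2), (10, 12)]]
centers = kmeans_step(shards, [[1.0, 1.0], [9.0, 9.0]])
```

Only the small per-center sums and counts cross shard boundaries, which is why this formulation scales with the number of machines; repeating `kmeans_step` until the centers stop moving gives the usual batch k-means.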

answered Jan 26 at 04:35

Joseph Turian ♦♦

User submitted content is under Creative Commons: Attribution - Share Alike; Other things copyright (C) 2010, MetaOptimize LLC.