Is the data clustering algorithm used by Google publicly known? If so, where can I find it?

asked Jan 12 '12 at 12:16

shn

edited Jan 26 '12 at 04:35

Joseph Turian ♦♦


Google uses many variants of many clustering algorithms internally. Some of them are known (k-means), some of them are not (their clustering spelling corrector). Can you be more specific?

(Jan 12 '12 at 12:38) Alexandre Passos ♦

What is the main algorithm that Google uses to cluster data (mainly web-page content)? For which task do they use k-means? Is it an incremental version? Static clustering (e.g., classical k-means) isn't feasible, because the amount of data involved is very large and it changes and grows continuously.

(Jan 12 '12 at 18:26) shn

Perhaps you might look into PageRank, which is the algorithm Google is best known for.

(Jan 12 '12 at 23:08) Leon Palafox ♦

this is lol

(Jan 12 '12 at 23:12) Travis Wolfe

They don't use k-means; they use keywords, user behavior, and all the ingenuity and technical effort that their hundreds or thousands of search engineers, working with the largest server farm in the world and a dataset larger than the sum total of all text produced by humans before the year 2005, can muster. There is no single algorithm (though PageRank is the largest single element).

(Jan 12 '12 at 23:30) Jacob Jensen

2 Answers:

You could search Google's publication database for clustering and k-means: http://research.google.com/pubs/papers.html

One relevant Googler paper that's not in that list for some reason is "Fast and Accurate k-means For Large Datasets" (NIPS 2011).

answered Jan 22 '12 at 05:06

Yaroslav Bulatov

Since you ask about web-scale clustering of web pages, I would assume they use either a distributed k-means or the following algorithm, which has a Googler as a co-author: Efficient Clustering of Web-Derived Data Sets (Sarmento et al., 2009).

I outline the algorithm in the answer to Large-scale clustering.
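
To make the "distributed k-means" option concrete, here is a minimal sketch of how one k-means iteration is commonly split into a per-shard step and a combine step. This is only an illustration under my own assumptions, not Google's implementation and not the Sarmento et al. algorithm: the function names (map_shard, reduce_partials, distributed_kmeans) are hypothetical, the shards are plain in-memory lists, and the "parallel" step is simulated with an ordinary loop.

```python
from typing import List, Sequence, Tuple

Point = Sequence[float]
Partial = Tuple[List[List[float]], List[int]]  # (per-centroid sums, per-centroid counts)


def nearest(point: Point, centroids: Sequence[Point]) -> int:
    """Index of the centroid with the smallest squared Euclidean distance."""
    return min(
        range(len(centroids)),
        key=lambda j: sum((p - c) ** 2 for p, c in zip(point, centroids[j])),
    )


def map_shard(shard: Sequence[Point], centroids: Sequence[Point]) -> Partial:
    """Per-shard step: accumulate coordinate sums and counts per centroid."""
    dim = len(centroids[0])
    sums = [[0.0] * dim for _ in centroids]
    counts = [0] * len(centroids)
    for point in shard:
        j = nearest(point, centroids)
        counts[j] += 1
        for d in range(dim):
            sums[j][d] += point[d]
    return sums, counts


def reduce_partials(partials: Sequence[Partial],
                    centroids: Sequence[Point]) -> List[List[float]]:
    """Combine step: merge the partial sums and recompute each centroid."""
    dim = len(centroids[0])
    updated = []
    for j, old in enumerate(centroids):
        total = sum(counts[j] for _, counts in partials)
        if total == 0:
            updated.append(list(old))  # empty cluster: keep the old centroid
            continue
        updated.append(
            [sum(sums[j][d] for sums, _ in partials) / total for d in range(dim)]
        )
    return updated


def distributed_kmeans(shards: Sequence[Sequence[Point]],
                       centroids: Sequence[Point],
                       iterations: int = 10) -> List[List[float]]:
    """Run a fixed number of k-means iterations over sharded data."""
    current = [list(c) for c in centroids]
    for _ in range(iterations):
        # On a real cluster each shard would be processed on its own machine.
        partials = [map_shard(shard, current) for shard in shards]
        current = reduce_partials(partials, current)
    return current


shards = [[(0.0, 0.0), (0.0, 1.0)], [(10.0, 10.0), (10.0, 11.0)]]
result = distributed_kmeans(shards, [(0.0, 0.0), (10.0, 10.0)], iterations=3)
```

The point of the split is that only the small per-centroid sums and counts travel between machines, never the raw points, which is what makes the approach workable when the data does not fit on one machine.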

answered Jan 26 '12 at 04:35

Joseph Turian ♦♦


User submitted content is under Creative Commons: Attribution - Share Alike; Other things copyright (C) 2010, MetaOptimize LLC.