What is the best method for location disambiguation with GeoNames data?

GeoNames search uses some scoring algorithm, but it is not open source and I'm not sure it is very sophisticated. For example, for "soma, ca" it returns Soma Lake in Canada, which doesn't even have a Wikipedia article, instead of the very popular SoMa neighborhood in San Francisco.

I have also found some papers on Google Scholar, but they seem very shallow and similar to my own heuristics, such as scoring by something like log(population) + 1000*hasWikipedia(article) + isCity*100 + isCapital*10. My domain is travel articles, so my scoring function should return the most probable tourist places: cities and places of interest (Disneyland, the Colosseum, Big Ben).
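A minimal sketch of the heuristic above, in Python, might look like the following. The `Candidate` record and its field names are hypothetical (they are not the GeoNames schema), the weights 1000, 100 and 10 are my reading of the formula in the question, and `log1p` is swapped in for `log` so that a population of 0 does not raise an error. The sample values are made up purely for illustration.

```python
import math
from dataclasses import dataclass


@dataclass
class Candidate:
    """Hypothetical gazetteer record; field names are illustrative only."""
    name: str
    population: int
    has_wikipedia: bool
    is_city: bool
    is_capital: bool


def score(c: Candidate) -> float:
    """Heuristic score: log of population plus fixed bonuses per flag."""
    return (
        math.log1p(c.population)    # log1p(0) == 0.0, unlike log(0) (assumption)
        + 1000 * c.has_wikipedia    # bools count as 0 or 1 in arithmetic
        + 100 * c.is_city
        + 10 * c.is_capital
    )


def disambiguate(candidates):
    """Return the highest-scoring candidate, or None for an empty list."""
    return max(candidates, key=score, default=None)


# Made-up values, for illustration only.
candidates = [
    Candidate("Soma Lake (Canada)", 0, False, False, False),
    Candidate("SoMa (San Francisco)", 0, True, False, False),
]
best = disambiguate(candidates)
print(best.name)
```

With these toy inputs the Wikipedia bonus alone decides the ranking, which also shows the weakness of the approach: the hand-picked weights dominate, and nothing in the score depends on the surrounding text of the travel article.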

Do you know of any important articles describing the geocoding algorithms used in production by Google Maps, Yahoo, or Bing?

asked Feb 22 '12 at 20:02

yura

edited Feb 22 '12 at 20:09


One Answer:

This will sound like a plug for my own work, but I think a book chapter we wrote on Entity Linking might give you a good summary of existing approaches to this problem for all kinds of entities (not just geonames). It also describes a state-of-the-art supervised learning system for this task. In particular, you should look at the features section for what works for places.

answered Feb 22 '12 at 20:52

Delip Rao

Thanks, great article.

(Mar 14 '12 at 03:42) yura

User submitted content is under Creative Commons: Attribution - Share Alike; Other things copyright (C) 2010, MetaOptimize LLC.