Natural language processing is an area that interests me. I know little about it and am hoping somebody can suggest a starting point. Basically, I'd like to develop a search algorithm specifically for natural languages, one that deals with ambiguities in an intelligent fashion. Where would be a good starting point? Can anyone recommend some good papers or other resources to help me get started?

asked Mar 05 '11 at 05:17

Aidan

3 Answers:

To learn about NLP in general, I would recommend having a look at nltk.org, where you will find both the Python library and an online book that introduces the major concepts of NLP. As for search algorithms that resolve ambiguities in natural language, have a look at Latent Semantic Indexing and at named entity extraction and disambiguation (assuming that your ambiguous queries are about ambiguous entities, e.g. distinct entities sharing the same or very similar names).

More advanced semantic search could be based on Semantic Hashing, but this is still an area of active research. You won't find ready-to-use toolkits for implementing it easily in production.
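
To make the entity disambiguation idea a bit more concrete, here is a minimal sketch that ranks candidate entities sharing a name by the word overlap (cosine similarity over bags of words) between the query and a short description of each candidate. The candidate list and its descriptions are made up for illustration and do not come from any real knowledge base, and the function names are hypothetical. A real system would use far richer context than this.

```python
import math
import re
from collections import Counter

# Hypothetical candidates sharing the name "Paris". The descriptions are
# invented for this example, not taken from a real knowledge base.
CANDIDATES = {
    "Paris (city in France)": "capital city of France on the Seine river with the Eiffel Tower",
    "Paris (Greek mythology)": "prince of Troy in Greek mythology who abducted Helen",
    "Paris Hilton": "American media personality and hotel heiress",
}


def bag_of_words(text):
    """Lowercase the text and count its alphabetic tokens."""
    return Counter(re.findall(r"[a-z]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def disambiguate(query, candidates=CANDIDATES):
    """Return (score, name) pairs, best match first."""
    q = bag_of_words(query)
    scored = [(cosine(q, bag_of_words(desc)), name) for name, desc in candidates.items()]
    return sorted(scored, reverse=True)


ranking = disambiguate("paris helen of troy")
best = ranking[0][1]
print(best)
```

Here the words "helen" and "troy" in the query pull the mythological Paris to the top. The same scheme fails as soon as the query shares no words with the right description, which is one motivation for methods like Latent Semantic Indexing that compare texts in a reduced "topic" space instead of by exact words.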

answered Mar 05 '11 at 06:10

ogrisel

I'd advise you to read at least the following three books:

1) Speech & Language Processing (http://www.cs.colorado.edu/~martin/slp.html) by Daniel Jurafsky & James Martin
2) Foundations of Statistical Natural Language Processing by Manning & Schuetze (http://nlp.stanford.edu/fsnlp/)
3) Introduction to Information Retrieval by Manning, Raghavan & Schuetze (http://nlp.stanford.edu/IR-book/information-retrieval-book.html)

Your problem is not as trivial as it may seem. In addition, you have to take into consideration at least the idiosyncrasies of each language you'd like your search algorithm to work with.

In Chapter 4 of the book "Programming Collective Intelligence" by Toby Segaran, you will find a simple Python implementation of a search engine.
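
To give a rough idea of what such a search engine looks like at its core, here is a minimal sketch of an inverted index with TF-IDF ranking. This is not the code from the book. The toy corpus, the function names and the exact scoring formula (term frequency times log inverse document frequency, summed over query terms) are assumptions chosen for illustration.

```python
import math
import re
from collections import Counter, defaultdict

# Toy corpus, invented for this example.
DOCS = {
    "d1": "the bank raised interest rates",
    "d2": "we walked along the river bank",
    "d3": "interest in natural language search is growing",
}


def tokenize(text):
    """Lowercase the text and split it into alphabetic tokens."""
    return re.findall(r"[a-z]+", text.lower())


def build_index(docs):
    """Build an inverted index: term -> {doc_id: term frequency}."""
    index = defaultdict(dict)
    for doc_id, text in docs.items():
        for term, tf in Counter(tokenize(text)).items():
            index[term][doc_id] = tf
    return index


def search(query, index, n_docs):
    """Score each matching document by the sum of tf * idf over query terms."""
    scores = defaultdict(float)
    for term in tokenize(query):
        postings = index.get(term, {})
        if not postings:
            continue  # term does not occur in the corpus
        idf = math.log(n_docs / len(postings))
        for doc_id, tf in postings.items():
            scores[doc_id] += tf * idf
    # Highest score first; ties broken by document id for a stable order.
    return sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))


index = build_index(DOCS)
results = search("river bank", index, len(DOCS))
print(results)
```

Note that d1 is still returned for "river bank" even though its "bank" is the financial one. Plain keyword matching like this has no notion of word senses, which is exactly the kind of ambiguity the question is about.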

Btw, which language do you want to design your search algorithm for? You should be aware that languages exhibit many kinds of ambiguity, and what holds for English may not hold at all for Inuit, for example.

answered Mar 05 '11 at 14:17

Svetoslav Marinov

edited Mar 06 '11 at 14:38

Search is a different problem from natural language processing. For information retrieval (as search is usually called), check out Manning's book on the subject, Introduction to Information Retrieval.

If you want practical natural language processing, I suggest that, after following Svetoslav's advice, you learn how to use and understand the output of every tool in the Stanford CoreNLP package, which includes a parser, a part-of-speech tagger, a named entity recognizer, and a coreference resolver. Also look at WordNet if you plan on doing something with English, as it can help resolve some specialized forms of ambiguity. Most semantic processing these days uses the output of this sort of tool as a basic building block, so it's good to get used to thinking in terms of these structures before attempting to solve a very hard problem.
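
As a small illustration of what "thinking in terms of these structures" can mean, here is a sketch that takes per-token annotations (word, part-of-speech tag, named entity label) and merges consecutive tokens with the same entity label into entity mentions. The annotations below are written by hand for the example. They only imitate the kind of information a tagger and a named entity recognizer produce and are not the actual output format of CoreNLP or any other tool. The function name is hypothetical.

```python
from itertools import groupby

# Hand-written annotations for illustration only: (word, POS tag, NER label).
# "O" marks tokens outside any entity. Real tools have their own formats.
TOKENS = [
    ("Barack", "NNP", "PERSON"),
    ("Obama", "NNP", "PERSON"),
    ("visited", "VBD", "O"),
    ("New", "NNP", "LOCATION"),
    ("York", "NNP", "LOCATION"),
    ("yesterday", "NN", "O"),
    (".", ".", "O"),
]


def entity_mentions(tokens):
    """Merge runs of consecutive tokens that share a non-"O" NER label."""
    mentions = []
    for label, run in groupby(tokens, key=lambda tok: tok[2]):
        if label != "O":
            text = " ".join(word for word, _, _ in run)
            mentions.append((text, label))
    return mentions


mentions = entity_mentions(TOKENS)
print(mentions)
```

One known weakness of this simple grouping is that two different entities of the same type standing directly next to each other would be merged into one mention. Once you have mentions like these, you can index them separately from ordinary words, so a search for "New York" is treated as one location and not as two unrelated terms.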

answered Mar 07 '11 at 11:15

Alexandre Passos ♦

User submitted content is under Creative Commons: Attribution - Share Alike; Other things copyright (C) 2010, MetaOptimize LLC.