
When implementing search auto-complete, what do you do about the fact that stop list words dominate the beginning of the question ("How do I")?

I am building a search auto-complete feature for the OSQA software, which you are using now. As you type a question title, it searches question and answer bodies (as well as tags and titles), using the title typed so far as the query.

However, the beginning of the question is typically all stop words ("How do I"). What is the best behavior for search auto-complete in these circumstances?

asked May 26 '10 at 15:29


Hernâni Cerqueira

edited May 26 '10 at 16:40


Joseph Turian ♦♦


2 Answers:

You should compute the frequencies of the bigrams and trigrams occurring in your queries before applying the stop-word filter. The unigrams "how" and "do" can be removed by the stop-word list if you judge that they only add noise and memory usage without any additional signal, but the bigram "how do" should be kept. Then, when you want to rank candidates for the third word (i.e., estimate its probability), combine the bigram (and possibly trigram) probabilities with those of the filtered unigrams.
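A minimal Python sketch of this idea follows. It rests on assumptions the answer does not specify: a toy stop-word list, whitespace tokenization, maximum-likelihood estimates, and simple linear interpolation with fixed weights (0.6/0.3/0.1) chosen for illustration rather than tuned. The names `NGramCompleter`, `score`, and `complete` are hypothetical. Bigram and trigram counts are collected on the raw tokens, so "how do" survives, while unigram counts skip stop words.

```python
from collections import Counter

# Hypothetical toy stop-word list; a real deployment would use a fuller one.
STOP_WORDS = {"a", "an", "the", "how", "do", "i", "to", "in", "of", "is", "what"}


def tokenize(text):
    """Lowercase and split on whitespace (deliberately simplistic)."""
    return text.lower().split()


class NGramCompleter:
    """Rank next-word candidates by linearly interpolating trigram, bigram
    and stop-word-filtered unigram estimates (weights are illustrative)."""

    def __init__(self, queries, weights=(0.6, 0.3, 0.1)):
        self.w3, self.w2, self.w1 = weights
        self.unigrams = Counter()     # stop words filtered out
        self.bigrams = Counter()      # counted on raw tokens, before filtering
        self.trigrams = Counter()     # counted on raw tokens, before filtering
        self.bigram_ctx = Counter()   # how often w1 starts a bigram
        self.trigram_ctx = Counter()  # how often (w1, w2) starts a trigram
        for q in queries:
            toks = tokenize(q)
            self.unigrams.update(t for t in toks if t not in STOP_WORDS)
            for a, b in zip(toks, toks[1:]):
                self.bigrams[(a, b)] += 1
                self.bigram_ctx[a] += 1
            for a, b, c in zip(toks, toks[1:], toks[2:]):
                self.trigrams[(a, b, c)] += 1
                self.trigram_ctx[(a, b)] += 1
        self.total_unigrams = sum(self.unigrams.values())
        # Candidates: filtered unigrams plus any word seen after another word,
        # so stop words like "i" can still be suggested through the n-grams.
        self.vocab = set(self.unigrams) | {b for _, b in self.bigrams}

    def score(self, history, word):
        """Interpolated maximum-likelihood score of `word` following `history`."""
        h = tokenize(history)
        p3 = p2 = p1 = 0.0
        if len(h) >= 2 and self.trigram_ctx[(h[-2], h[-1])]:
            ctx = (h[-2], h[-1])
            p3 = self.trigrams[ctx + (word,)] / self.trigram_ctx[ctx]
        if h and self.bigram_ctx[h[-1]]:
            p2 = self.bigrams[(h[-1], word)] / self.bigram_ctx[h[-1]]
        if self.total_unigrams:
            p1 = self.unigrams[word] / self.total_unigrams
        return self.w3 * p3 + self.w2 * p2 + self.w1 * p1

    def complete(self, history, k=3):
        """Return up to k candidates with a positive score, best first."""
        scored = [(self.score(history, w), w) for w in self.vocab]
        scored = [(s, w) for s, w in scored if s > 0]
        scored.sort(key=lambda sw: (-sw[0], sw[1]))  # ties broken alphabetically
        return [w for _, w in scored[:k]]


if __name__ == "__main__":
    completer = NGramCompleter([
        "how do i install numpy",
        "how do i train an svm",
        "how do i install scipy",
        "what is stemming",
    ])
    print(completer.complete("how do"))    # ['i', 'install', 'numpy']
    print(completer.complete("how do i"))  # ['install', 'train', 'numpy']
```

Because a stop word such as "i" has no unigram mass, it can only reach the top of the list through the bigram and trigram terms, which is the behavior the answer describes; content words like "install" still get a small unigram contribution even when the preceding context has never been seen.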

answered Jun 23 '10 at 16:29


ogrisel

Use the method I provided in your other question. It does not use a stop-word list.

http://metaoptimize.com/qa/questions/17/stemming-problems-when-writing-search-auto-complete#24

answered Jun 09 '10 at 14:41


buser

edited Jun 09 '10 at 15:58


powered by OSQA

User submitted content is under Creative Commons: Attribution - Share Alike; Other things copyright (C) 2010, MetaOptimize LLC.