|
When implementing search auto-complete, what do you do about the fact that stop words dominate the beginning of the question ("How do I")? I am building a search auto-complete feature for the OSQA software, which you are using now. As you type a question title, it searches question and answer bodies (as well as tags and titles) using the title typed so far. However, the beginning of a question typically consists entirely of stop words ("How do I"). What is the best behavior for search auto-complete in these circumstances? |
|
You should compute the frequencies of the bigrams and trigrams occurring in your queries before applying the stop-word filter. The unigrams "how" and "do" can be filtered by the stop-word list if you judge that they add only noise and memory usage with no additional signal, but the bigram "how do" should be kept. Then, when you want to rank the candidates for the third word (that is, estimate their probabilities), you should combine the probabilities from the bigrams (and maybe trigrams) with those of the filtered unigrams. |
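The idea above could be sketched roughly as follows. This is only an illustration under several assumptions: the stop-word set, the whitespace tokenizer, the function names (`build_counts`, `rank_next_words`), and the interpolation weights are all hypothetical choices, and linear interpolation is just one possible way to "combine the probabilities".

```python
from collections import Counter

# Hypothetical stop-word list; a real one would be much longer.
STOPWORDS = {"how", "do", "i", "you", "a", "the", "to"}


def tokenize(text):
    """Naive lowercase whitespace tokenizer (an assumption for this sketch)."""
    return text.lower().split()


def build_counts(queries, stopwords=STOPWORDS):
    """Count bigrams/trigrams on raw tokens, and unigrams after filtering."""
    unigrams, bigrams, trigrams = Counter(), Counter(), Counter()
    for query in queries:
        tokens = tokenize(query)
        # n-grams are counted BEFORE stop-word filtering, so "how do" survives.
        bigrams.update(zip(tokens, tokens[1:]))
        trigrams.update(zip(tokens, tokens[1:], tokens[2:]))
        # Only the unigram counts are filtered by the stop-word list.
        unigrams.update(t for t in tokens if t not in stopwords)
    return unigrams, bigrams, trigrams


def rank_next_words(prefix, counts, weights=(0.1, 0.3, 0.6), top_k=5):
    """Rank candidate next words by interpolating the three estimates."""
    unigrams, bigrams, trigrams = counts
    w_uni, w_bi, w_tri = weights  # illustrative values, not tuned
    tokens = tokenize(prefix)
    scores = Counter()

    # Filtered unigram probability P(w).
    total = sum(unigrams.values())
    for word, n in unigrams.items():
        scores[word] += w_uni * n / total

    # Bigram probability P(w | last word).
    if len(tokens) >= 1:
        last = tokens[-1]
        following = {w: n for (a, w), n in bigrams.items() if a == last}
        context = sum(following.values())
        for word, n in following.items():
            scores[word] += w_bi * n / context

    # Trigram probability P(w | last two words).
    if len(tokens) >= 2:
        pair = (tokens[-2], tokens[-1])
        following = {w: n for (a, b, w), n in trigrams.items() if (a, b) == pair}
        context = sum(following.values())
        for word, n in following.items():
            scores[word] += w_tri * n / context

    return [word for word, _ in scores.most_common(top_k)]


example_queries = [
    "how do i install numpy",
    "how do i install scipy",
    "how do i parse json",
    "how do you install numpy",
]
example_counts = build_counts(example_queries)
print(rank_next_words("how do i", example_counts, top_k=2))
```

With these toy queries, "how" and "do" never appear in the unigram counts, yet the contexts "do i" and "i" still steer the ranking toward the words that actually followed them. In practice the weights would need tuning on real query logs.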
|
Use the method I provided in your other question. It does not use a stop-word list. http://metaoptimize.com/qa/questions/17/stemming-problems-when-writing-search-auto-complete#24 |