When implementing search auto-complete, how do you implement stemming when you are still typing a word?

Background: I am building a search auto-complete feature for the OSQA software, which you are using now. As you type a question title, it searches question and answer bodies (as well as tags and titles) using the title typed so far.

However, the search implementation will stem the terms in the query. When you are in the middle of typing a word, it is incomplete and cannot be correctly stemmed. What are patterns for including this incomplete word in your search query?

asked May 26 '10 at 15:15

Hernâni Cerqueira

edited May 30 '10 at 18:06

Joseph Turian ♦♦

Aside: auto-complete as a web service was recently proposed by László Kozma: http://www.lkozma.net/seven.html#l2

(May 30 '10 at 18:08) Joseph Turian ♦♦

3 Answers:

In my mind, it would be far easier to have two query parsers: one that does stemming, used when the actual search runs, and one that looks for exact matches, used for auto-completion.

A second option would be to write your own query parser that stems only completed words (i.e., tokenize on spaces and skip stemming the last word). I like this option better, though it would make the implementation a bit harder.
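The second option can be sketched in a few lines of Python. This is only an illustration under stated assumptions: `toy_stem` is a hypothetical stand-in for whatever stemmer the search backend really uses, `build_autocomplete_query` is a made-up helper name, and turning the unfinished word into a trailing-`*` prefix term is one possible way to handle it, not something the answer above prescribes.

```python
def toy_stem(word: str) -> str:
    """Crude suffix stripper; a hypothetical stand-in for a real stemmer such as Porter."""
    for suffix in ("ing", "ed", "es", "s"):
        # Only strip when at least three characters of the word remain.
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def build_autocomplete_query(text: str, stem=toy_stem) -> str:
    """Stem completed words; turn a word still being typed into a prefix term."""
    tokens = text.lower().split()
    if not tokens:
        return ""
    if text[-1].isspace():
        # Trailing whitespace: the user has finished the last word too.
        return " ".join(stem(t) for t in tokens)
    *completed, partial = tokens
    # The last token is left unstemmed and marked as a prefix (assumed syntax).
    return " ".join([stem(t) for t in completed] + [partial + "*"])


print(build_autocomplete_query("searching stemm"))  # search stemm*
```

One caveat with this approach: if the prefix term is run against a stemmed field, a partially typed word can be longer than its stem (for example "stemmi" versus the stem of "stemming"), so the prefix is probably best matched against an unstemmed copy of the text.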

answered Jul 01 '10 at 10:19

Vaidhy Mayilrangam

A gentleman by the name of Ahmet helped me with stemming issues and live search on the Solr mailing list.

Here's what he said:

Let's say you have a short title field and you are going to offer suggest/autocomplete using this field from the index, and order is not important. But in this ca

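<fieldType name="prefix_full" class="solr.TextField" positionIncrementGap="1">
  <!-- Index time: keep the whole title as one token, then emit its leading prefixes -->
  <analyzer type="index">
    <tokenizer class="solr.KeywordTokenizerFactory"/>
    <filter class="solr.TrimFilterFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.EdgeNGramFilterFactory" minGramSize="1" maxGramSize="20"/>
  </analyzer>
  <!-- Query time: no n-grams, so the typed text is compared against the indexed prefixes -->
  <analyzer type="query">
    <tokenizer class="solr.KeywordTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>

<fieldType name="prefix_token" class="solr.TextField" positionIncrementGap="1">
  <!-- Index time: split on whitespace and emit the leading prefixes of each word -->
  <analyzer type="index">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.EdgeNGramFilterFactory" minGramSize="1" maxGramSize="20"/>
  </analyzer>
  <!-- The source is cut off here; this query analyzer and the closing tag are assumed by analogy with prefix_full -->
  <analyzer type="query">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>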

You can define two fields with these types and populate them from your short title field:

<copyField source="Title" dest="titlePrefix"/>
<copyField source="Title" dest="titlePrefixFull"/>

and use a normal query (not a wildcard query) as the user types words:
q=titlePrefix:(term1 te) titlePrefixFull:"term1 te"&defType=lucene&q.op=OR&fl=Title
This will return suggestions. Does this satisfy your needs?
In this case you are suggesting the whole title field.
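To see why a plain (non-wildcard) query is enough here, the index-time behaviour of the two field types can be imitated in Python. This is a rough sketch, not Solr itself: `edge_ngrams`, `index_prefix_token`, `index_prefix_full` and `suggest` are hypothetical names, and the scoring (one point per matching word, plus one if the whole typed text is a prefix of the title) is a simplification of what Solr's ranking would do with `q.op=OR`.

```python
from typing import List, Set


def edge_ngrams(token: str, min_gram: int = 1, max_gram: int = 20) -> List[str]:
    """Leading prefixes of a token, as an edge n-gram filter would emit them."""
    upper = min(len(token), max_gram)
    return [token[:n] for n in range(min_gram, upper + 1)]


def index_prefix_token(title: str) -> Set[str]:
    """Imitates prefix_token: whitespace tokens, lowercased, edge n-grammed."""
    grams: Set[str] = set()
    for token in title.lower().split():
        grams.update(edge_ngrams(token))
    return grams


def index_prefix_full(title: str) -> Set[str]:
    """Imitates prefix_full: the whole trimmed, lowercased title, edge n-grammed."""
    return set(edge_ngrams(title.strip().lower()))


def suggest(typed: str, titles: List[str]) -> List[str]:
    """Return titles matching any typed word or the full typed prefix, best first."""
    words = typed.lower().split()
    phrase = typed.strip().lower()
    scored = []
    for title in titles:
        token_grams = index_prefix_token(title)
        # Each typed word is looked up as an ordinary term among the indexed prefixes.
        score = sum(1 for word in words if word in token_grams)
        if phrase and phrase in index_prefix_full(title):
            score += 1
        if score > 0:
            scored.append((score, title))
    # sorted() is stable, so ties keep their original order.
    return [title for score, title in sorted(scored, key=lambda pair: -pair[0])]
```

Because the prefixes are generated once at index time, the half-typed word "te" is just another term to look up; no stemming or wildcard expansion is needed at query time. Note that with `maxGramSize="20"`, typed text longer than 20 characters can no longer match the full-title field.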

answered Jun 09 '10 at 14:39

buser

Related to Joseph's "aside": we have implemented a first version of our search autocomplete system, and we will soon release it for website search: http://www.metahint.com

answered Oct 09 '10 at 13:03

Laszlo Kozma

edited Oct 09 '10 at 13:05

User submitted content is under Creative Commons: Attribution - Share Alike; Other things copyright (C) 2010, MetaOptimize LLC.