Is there a good paper on using deep or shallow parsing to improve search relevance? For example, in the e-commerce domain, when we search for "mouse" we don't want results such as "mouse pads", "mouse cable", etc. That is, we want to understand the main concept of the title and of the query. There are a lot of other possible uses of NLP in search engine building; I'm looking for papers with a good survey of applying such techniques.
Given that the average query length is 2.5 words, it is hard to do parsing, if by that you mean some sort of syntactic or semantic parsing. So what kind of parsing are you interested in, and do you want to parse the query or the indexed documents? People do things like clustering and categorization in order to, say, find the intent of the query. I actually don't understand what you mean by "understand the main concept of title and query". Do you want to understand how much the title of a document influences the relevance given a query?
Yes, I mean syntactic or semantic parsing of the query and the document title. Even with an average query length of 2.5 words, we can at least weight the individual terms differently, and it makes a lot of sense to use noun-phrase head analysis, which is a form of syntactic parsing. The same can be done for the title. That's what I mean.
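To make the head-word idea concrete, here is a minimal sketch in Python. It assumes that the rightmost token of a short English noun phrase is its head, so the head of "mouse pads" is "pads" while "mouse" is only a modifier. The function names, the weights (1.0 for a head match, 0.2 for a modifier match) and the crude plural stripping are all made up for illustration; a real system would more likely use a POS tagger or chunker, and real product titles are much messier than these.

```python
def normalize(token):
    """Lowercase, strip punctuation, and crudely strip a plural 's' (illustration only)."""
    token = token.lower().strip(".,;:!?\"'")
    if len(token) > 3 and token.endswith("s") and not token.endswith("ss"):
        token = token[:-1]
    return token


def split_head(phrase):
    """Return (head, modifiers), ASSUMING the rightmost token is the head noun."""
    tokens = [normalize(t) for t in phrase.split()]
    tokens = [t for t in tokens if t]
    if not tokens:
        return None, []
    return tokens[-1], tokens[:-1]


def score(query, title, head_weight=1.0, modifier_weight=0.2):
    """Score a title by where the query's head word occurs in it.

    The weights are hypothetical values, not tuned on any data.
    """
    query_head, _ = split_head(query)
    title_head, title_modifiers = split_head(title)
    if query_head is None or title_head is None:
        return 0.0
    if query_head == title_head:
        return head_weight        # query concept is the main concept of the title
    if query_head in title_modifiers:
        return modifier_weight    # query concept only modifies something else
    return 0.0


def rank(query, titles):
    """Order titles by descending score; ties keep their input order."""
    return sorted(titles, key=lambda title: score(query, title), reverse=True)


if __name__ == "__main__":
    titles = ["mouse pads", "mouse cable", "wireless optical mouse"]
    for title in rank("mouse", titles):
        print(score("mouse", title), title)
```

With the query "mouse", the title whose head is "mouse" is ranked above "mouse pads" and "mouse cable", where "mouse" is only a modifier. The rightmost-token rule breaks down as soon as titles contain brand names, model numbers, or trailing attributes, which is where a proper shallow parser would have to take over.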
Thanks for the clarification. The words in the title are one of the parameters for calculating the relevance score. I am not aware of how things work in the e-commerce domain, but in the medical domain you can have a query like "autism antibiotics" and titles like "Antibiotics cause autism" and "Autism linked to antibiotics". So it is very hard to do, say, a dependency-based analysis on the query. As for the titles, often the head word (or phrase) is the verb, and then the tricky question is to decide which is the main concept: the verb, the subject, or the object? But I would also be interested in such a study.
Btw, Autonomy IDOL uses "meaning-based computing". This may be something along the lines of what you are interested in.