I am wondering whether the correct term for stop words is "function words". It seems "function words" is the proper term and is the one linguists use. What do you think?
Although in most cases the words treated as stop words are indeed function words, I think it is still useful to distinguish the linguistic term from the term for words you are not interested in for some task. Some less frequent function words may still be interesting: 'versus' or 'aboard', for example, might not be in a stop word list because they are still highly correlated with certain semantic content.
I couldn't agree more. The term "function words" has a specific linguistic meaning -- they are words that carry little or no referential value, but contribute to the syntactic interpretation of language. "Stop words" are just things you'd like to ignore for a particular task (say, low tf*idf terms). Usually these two groups have a lot of overlap, but I don't think we should forget the distinction.
(Jan 21 '11 at 07:28)
Andrew Rosenberg
Also, sometimes function words are not stopwords. In the Nigam, McCallum, Thrun, and Mitchell paper that introduced EM for semi-supervised naive Bayes (http://www.cs.uu.nl/docs/vakken/ll/Labeled_unlabeled.pdf), they note that the word "my" (which is in many stoplists) is actually a very good feature for one of the tasks (student homepage classification).
(Jan 21 '11 at 08:30)
Alexandre Passos ♦
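To make the distinction discussed above concrete, here is a minimal sketch that compares a fixed function-word list with a stop list derived from the corpus itself. Everything in it is an assumption made for illustration: the toy documents, the hand-picked FUNCTION_WORDS set (which is not an exhaustive linguistic list), the helper names, and the choice of a plain idf cutoff as the notion of "uninteresting" instead of the full tf*idf weighting mentioned in the comments.

```python
import math
from collections import Counter

# Hypothetical, hand-picked function words; not an exhaustive linguistic list.
FUNCTION_WORDS = {"the", "of", "a", "and", "is", "in", "my", "versus", "aboard"}

# Toy corpus invented for this example.
DOCS = [
    "the cat is in the house",
    "my homepage lists the courses of my first year",
    "the ferry and the crew aboard",
    "a review of the house versus the flat",
]


def idf_scores(docs):
    """Return the inverse document frequency of every token in docs."""
    n_docs = len(docs)
    doc_freq = Counter()
    for doc in docs:
        # Count each token once per document.
        doc_freq.update(set(doc.split()))
    return {word: math.log(n_docs / count) for word, count in doc_freq.items()}


def task_stop_words(docs, max_idf):
    """Treat tokens with idf at or below max_idf as stop words for this corpus."""
    return {word for word, score in idf_scores(docs).items() if score <= max_idf}


# Cutoff chosen by hand so that tokens in at least half of the documents are dropped.
stop = task_stop_words(DOCS, max_idf=0.7)

print("stop words that are not function words:", sorted(stop - FUNCTION_WORDS))
print("function words that are not stop words:", sorted(FUNCTION_WORDS - stop))
```

On this toy corpus the two sets overlap without coinciding: a content word that happens to appear in many documents ends up in the derived stop list, while rarer function words such as "my", "versus", and "aboard" stay out of it.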