Hi,

I would like to calculate the frequency of function words in Python/NLTK. I see two ways to go about it:

  1. Use a part-of-speech tagger and sum the counts over the POS tags that correspond to function words

  2. Create a list of function words and perform a simple lookup

The catch in the first case is that my data is noisy and I don't know for sure which POS tags count as function words. The catch in the second case is that I don't have a list, and since my data is noisy the lookup won't be accurate.

I would prefer the first to the second, or any other approach that would give me more accurate results.
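
To make option 1 concrete, here is a minimal sketch of what I have in mind. The set of Penn Treebank tags is only my guess at which tags cover function words (that is exactly the part I am unsure about), and the example is hand-tagged so it runs without a tagger; in practice the (word, tag) pairs would come from something like nltk.pos_tag.

```python
from collections import Counter

# Assumption: Penn Treebank tags that roughly correspond to closed-class
# (function) words. This set is a guess, not an authoritative definition.
FUNCTION_TAGS = {
    "CC", "DT", "EX", "IN", "MD", "PDT", "PRP", "PRP$",
    "RP", "TO", "WDT", "WP", "WP$", "WRB",
}


def function_word_counts(tagged_tokens):
    """Count tokens whose POS tag is in FUNCTION_TAGS.

    tagged_tokens is a list of (word, tag) pairs, e.g. the output of
    nltk.pos_tag(nltk.word_tokenize(text)).
    """
    return Counter(
        word.lower() for word, tag in tagged_tokens if tag in FUNCTION_TAGS
    )


# Hand-tagged example so the sketch runs without a tagger model.
tagged = [
    ("The", "DT"), ("cat", "NN"), ("sat", "VBD"), ("on", "IN"),
    ("the", "DT"), ("mat", "NN"), ("and", "CC"), ("it", "PRP"),
    ("purred", "VBD"),
]
counts = function_word_counts(tagged)
total = sum(counts.values())      # number of function-word tokens
ratio = total / len(tagged)       # share of all tokens
```

On noisy text the tagger's mistakes feed straight into these counts, which is my main worry with this route.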

asked Apr 28 '11 at 13:35

Dexter


2 Answers:

I just used the LIWC English 2007 dictionary (I paid for it) and performed a simple lookup for now. Any other answers are most welcome.
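
Roughly, the lookup can be sketched as below. The word list is a tiny made-up stand-in (the LIWC dictionary is licensed, so it is not reproduced here), and the regex tokenizer is just an assumption about how to cope with noisy casing and punctuation.

```python
import re
from collections import Counter

# Hypothetical stand-in list; replace with a real function-word lexicon.
FUNCTION_WORDS = {
    "the", "a", "an", "of", "in", "on", "and", "but", "i", "it", "to", "is",
}


def function_word_frequency(text, lexicon=FUNCTION_WORDS):
    """Return (per-word hit counts, share of tokens that are function words)."""
    # Lowercase and keep only runs of letters/apostrophes so that
    # punctuation and casing noise do not break the lookup.
    tokens = re.findall(r"[a-z']+", text.lower())
    hits = Counter(t for t in tokens if t in lexicon)
    total = len(tokens)
    relative = sum(hits.values()) / total if total else 0.0
    return hits, relative


hits, relative = function_word_frequency("The cat sat on the mat, and it purred!!")
```

Misspelled function words will still be missed by an exact lookup like this, so the numbers are a lower bound on noisy data.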

answered Apr 29 '11 at 14:00

Dexter

Using a list should be easiest, as you have done. If you know the language, you can sort words by frequency and then manually select the function words.
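
The frequency-sorting step could look something like this minimal sketch. The function name and the regex tokenizer are illustrative assumptions; the idea is just to print the most common tokens and pick the function words out by eye.

```python
import re
from collections import Counter


def top_candidates(text, n=10):
    """Return the n most frequent lowercase tokens as (word, count) pairs."""
    tokens = re.findall(r"[a-z']+", text.lower())
    return Counter(tokens).most_common(n)


sample = "the dog and the cat saw the bird and the bird saw a worm"
for word, count in top_candidates(sample, n=5):
    print(word, count)
```

Function words tend to dominate the top of such a list, but content words that are frequent in your particular corpus will show up there too, which is why the manual pass is needed.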

For English, Wikipedia has a list of prepositions, and a bit of googling should turn up similar lists of pronouns etc. There are also free, ready-made lists of stop words online (of varying quality); I found a few with my first Google search. Here is one for English. It should be easy to check the list for words you don't want to include, and you can check your corpus manually for words you wish to include but don't yet have on the list.

answered May 07 '11 at 02:41

gdahl ♦

edited May 07 '11 at 02:44

Gdahl, is it safe to assume that stop words equal function words? NLTK has a default stop word list. I used LIWC for the list/lexicon lookup. I think it's safer.

(May 07 '11 at 03:00) Dexter

I believe those two terms are usually used interchangeably. Wikipedia indicates that some function words are stop words, but not all: http://en.wikipedia.org/wiki/Stop_words

(May 07 '11 at 09:49) Robert Layton

Robert, Yes. Hence, I guess a pre-defined function word list is apt for my task.

(May 11 '11 at 10:57) Dexter

User submitted content is under Creative Commons: Attribution - Share Alike; Other things copyright (C) 2010, MetaOptimize LLC.