Hi, can anyone tell me how to remove stopwords (e.g. the, a, is, was) from 100 text documents that I have placed in a particular folder on my PC? Please suggest some code in Java. It's urgent, so immediate help would be appreciated.

asked Mar 22 '12 at 05:31

vadivel

One Answer:

I don't know about Java, but you basically need to:

  1. Create an array of all the words in your documents (use some kind of split function).
  2. Look for a database of common stop words, like this one.
  3. Using some kind of lookup function (in Python it is called find), get the indexes of all the stop words in your array in one fell swoop.
  4. Delete the entries at those indexes.
  5. Done.

This is a quick and dirty way, but it is basically how a stopword removal algorithm works.
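
Since the question asked for Java, the steps above could be sketched roughly as follows. This is only a minimal sketch under some assumptions, not a definitive implementation: the class name StopwordRemover, the four-word stop list, the folder names "docs" and "docs_clean", the "*.txt" filter, and UTF-8 encoding are all illustrative choices. It also filters with a set lookup while iterating instead of collecting indexes and deleting them afterwards, which gives the same result.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

class StopwordRemover {

    // Tiny illustrative stop word list (an assumption); a real one would be
    // much longer and is usually loaded from a file.
    static final Set<String> STOPWORDS =
            new HashSet<>(Arrays.asList("the", "a", "is", "was"));

    // Splits on whitespace, drops every token whose lowercase form (ignoring
    // leading and trailing punctuation) is a stop word, and rejoins the rest
    // with single spaces.
    static String removeStopwords(String text, Set<String> stopwords) {
        List<String> kept = new ArrayList<>();
        for (String token : text.trim().split("\\s+")) {
            String key = token.toLowerCase(Locale.ROOT)
                    .replaceAll("^\\p{Punct}+|\\p{Punct}+$", "");
            if (!token.isEmpty() && !stopwords.contains(key)) {
                kept.add(token);
            }
        }
        return String.join(" ", kept);
    }

    // Cleans every *.txt file in inputDir and writes the result under the
    // same file name in outputDir, leaving the originals untouched.
    static void processFolder(Path inputDir, Path outputDir, Set<String> stopwords)
            throws IOException {
        Files.createDirectories(outputDir);
        try (DirectoryStream<Path> files = Files.newDirectoryStream(inputDir, "*.txt")) {
            for (Path file : files) {
                String text = new String(Files.readAllBytes(file), StandardCharsets.UTF_8);
                String cleaned = removeStopwords(text, stopwords);
                Files.write(outputDir.resolve(file.getFileName()),
                        cleaned.getBytes(StandardCharsets.UTF_8));
            }
        }
    }

    public static void main(String[] args) throws IOException {
        // Hypothetical default folder names; pass your own as arguments.
        Path in = Paths.get(args.length > 0 ? args[0] : "docs");
        Path out = Paths.get(args.length > 1 ? args[1] : "docs_clean");
        if (!Files.isDirectory(in)) {
            System.err.println("Input folder not found: " + in);
            return;
        }
        processFolder(in, out, STOPWORDS);
    }
}
```

For example, removeStopwords("The cat is on a mat.", STOPWORDS) returns "cat on mat.". Note that this sketch collapses all whitespace, including line breaks, into single spaces; if the line structure matters, process each file line by line instead.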

answered Mar 22 '12 at 05:38

Leon Palafox ♦

Asked: Mar 22 '12 at 05:31

Seen: 949 times

Last updated: Mar 22 '12 at 05:38

User submitted content is under Creative Commons: Attribution - Share Alike; Other things copyright (C) 2010, MetaOptimize LLC.