Hi, I am looking for a Java-based library for doing the following on a huge dataset of natural-language (English) text. I want to do tokenization, lemmatization, stemming, and stop-word removal, build a vocabulary, and, if possible, build a tf-idf score table. Is there a single library with which I can do all of the above?
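
For concreteness, the non-linguistic parts of that pipeline (tokenization, stop-word removal, vocabulary, tf-idf) could look roughly like the sketch below in plain Java 9+ with only the standard library. Everything in it is an assumption made for illustration: the class name `TfIdfSketch`, the tiny hard-coded stop-word list, the regex-based tokenizer, and the choice of tf = count / document length with idf = ln(N / df). Stemming and lemmatization are deliberately left out, since they need real linguistic resources, which is the part a library would have to supply.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Locale;
import java.util.Set;
import java.util.SortedSet;
import java.util.TreeSet;

public class TfIdfSketch {
    // Tiny illustrative stop-word list (an assumption, not a standard list).
    static final Set<String> STOP_WORDS =
            Set.of("a", "an", "the", "is", "are", "of", "and", "on", "in", "to");

    // Lowercase the text, split on runs of non-letters, and drop stop words.
    public static List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<>();
        for (String t : text.toLowerCase(Locale.ROOT).split("[^a-z]+")) {
            if (!t.isEmpty() && !STOP_WORDS.contains(t)) {
                tokens.add(t);
            }
        }
        return tokens;
    }

    // Vocabulary: the sorted set of distinct tokens across all documents.
    public static SortedSet<String> vocabulary(List<List<String>> docs) {
        SortedSet<String> vocab = new TreeSet<>();
        for (List<String> doc : docs) {
            vocab.addAll(doc);
        }
        return vocab;
    }

    // tf-idf with tf = count / document length and idf = ln(N / df).
    public static double tfIdf(String term, List<String> doc, List<List<String>> docs) {
        if (doc.isEmpty()) {
            return 0.0;
        }
        double tf = (double) Collections.frequency(doc, term) / doc.size();
        long df = docs.stream().filter(d -> d.contains(term)).count();
        if (df == 0) {
            return 0.0;
        }
        return tf * Math.log((double) docs.size() / df);
    }

    public static void main(String[] args) {
        List<List<String>> docs = List.of(
                tokenize("The cat sat on the mat"),
                tokenize("The dog sat on the log"));
        SortedSet<String> vocab = vocabulary(docs);
        System.out.println("Vocabulary: " + vocab);
        // Score every vocabulary term against the first document.
        for (String term : vocab) {
            System.out.printf("%s -> %.4f%n", term, tfIdf(term, docs.get(0), docs));
        }
    }
}
```

This holds every document in memory and rescans the corpus for each term, so it would not scale to a huge dataset as written; it is only meant to pin down what the requested output looks like.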