Are there any free or open-source NLP libraries (with at least basic functionality such as stemming, POS tagging, and morphological analysis) that support Russian, or Russian language models for OpenNLP or other libraries? Which of them are best to use? Russian is not commonly supported, and I didn't find much for it.

I really need to work with Russian texts, so if there is no ready-made solution I will have to implement an NLP stack myself. Where would you recommend starting in that case? Which libraries or tools are easiest to adapt to Russian, and what resources should I use to do this? At first I need only stemming and POS tagging. It would also be nice to have a solution that may not work very well initially but that I can improve substantially: for example, if I use a training set twice as large, I should get a noticeable increase in accuracy.

Russian is very different from many other languages, so I'm asking specifically in the context of this language rather than in general.

This question is marked "community wiki".

asked Sep 21 '10 at 15:18


Sergey Bartunov

I'm guessing you want NLP tools that are pre-trained for Russian. I expect those to be hard to find. It may be easier to find some Russian datasets and use a generic NLP learner. You could browse through some Russian machine learning conferences to see whether any of them cover NLP applications, and ask the authors for their datasets: http://www.machinelearning.ru/wiki/index.php?title=%D0%97%D0%B0%D0%B3%D0%BB%D0%B0%D0%B2%D0%BD%D0%B0%D1%8F_%D1%81%D1%82%D1%80%D0%B0%D0%BD%D0%B8%D1%86%D0%B0
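To make the "generic learner plus Russian data" idea concrete, here is a minimal sketch of a trainable POS-tagging baseline in plain Python: it picks the most frequent tag for known words and backs off to word-final suffixes for unknown ones. The three-sentence corpus, the tag names, and the function names (`train`, `tag_word`, `tag`) are all invented for illustration; a real run would load an annotated Russian corpus instead. The point is only that this kind of model has no language-specific rules and should get better as the training set grows.

```python
from collections import Counter, defaultdict

# Invented toy corpus of (word, tag) pairs; NOT a real dataset.
# A real experiment would load an annotated Russian corpus here.
TRAIN = [
    [("мама", "NOUN"), ("мыла", "VERB"), ("раму", "NOUN")],
    [("кошка", "NOUN"), ("спала", "VERB")],
    [("собака", "NOUN"), ("ела", "VERB"), ("кашу", "NOUN")],
]


def train(sentences, max_suffix=3):
    """Count tags per word, per word-final suffix, and overall."""
    word_tags = defaultdict(Counter)
    suffix_tags = defaultdict(Counter)
    all_tags = Counter()
    for sentence in sentences:
        for word, tag in sentence:
            w = word.lower()
            word_tags[w][tag] += 1
            all_tags[tag] += 1
            # Russian carries much grammatical information in endings,
            # so suffix counts are a cheap back-off for unseen words.
            for n in range(1, min(max_suffix, len(w)) + 1):
                suffix_tags[w[-n:]][tag] += 1
    return word_tags, suffix_tags, all_tags


def tag_word(word, model, max_suffix=3):
    """Known word -> its most frequent tag; otherwise longest known suffix."""
    word_tags, suffix_tags, all_tags = model
    w = word.lower()
    if w in word_tags:
        return word_tags[w].most_common(1)[0][0]
    for n in range(min(max_suffix, len(w)), 0, -1):
        suffix = w[-n:]
        if suffix in suffix_tags:
            return suffix_tags[suffix].most_common(1)[0][0]
    # Nothing matched: fall back to the globally most frequent tag.
    return all_tags.most_common(1)[0][0]


def tag(words, model):
    return [(w, tag_word(w, model)) for w in words]


model = train(TRAIN)
print(tag(["кошка", "мыла", "пела"], model))
```

Here "пела" never appears in the toy corpus, so it is tagged through the suffix back-off. A baseline like this is mostly useful as a yardstick for measuring how much a larger corpus or a better learner actually helps.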

(Sep 21 '10 at 17:32) Yaroslav Bulatov

One Answer:

Russian is not very different from other languages, except that NLP for it is probably less well developed. The existing tools are based on old-fashioned rules rather than on statistics.

There is a fairly large and successful Russian NLP project, http://aot.ru, which has morphological and syntactic analysis tools. They are rule-based, but good enough to be a starting point. The project is also on SourceForge: http://sourceforge.net/projects/seman/
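For a feel of what "rule-based" means at the simplest level, here is a toy suffix-stripping stemmer in Python. It is only a sketch: the ending list, the `MIN_STEM` threshold, and the `stem` function are hypothetical and deliberately tiny, and they are not taken from aot.ru or from any real stemmer. Real Russian morphology needs dictionaries and far richer rules than this.

```python
# Hypothetical, deliberately incomplete list of inflectional endings.
# Invented for illustration; not taken from aot.ru or any real stemmer.
ENDINGS = [
    "ами", "ями", "ого", "его",
    "ая", "яя", "ой", "ый", "ий", "ов", "ам", "ах",
    "а", "я", "ы", "и", "у", "ю", "е", "о",
]
# Try longer endings first so "ами" wins over "и".
ENDINGS.sort(key=len, reverse=True)

MIN_STEM = 2  # assumed threshold: never strip a word below two letters


def stem(word):
    """Strip the longest matching ending, keeping at least MIN_STEM letters."""
    w = word.lower()
    for ending in ENDINGS:
        if w.endswith(ending) and len(w) - len(ending) >= MIN_STEM:
            return w[: -len(ending)]
    return w


for form in ["книга", "книги", "книгами", "красный", "красного"]:
    print(form, "->", stem(form))
```

The different forms of "книга" collapse to one stem, which is all a stemmer promises; it will also overstrip or understrip plenty of words, and that is the gap a dictionary-based morphology tool is meant to close.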

This answer is marked "community wiki".

answered Sep 22 '10 at 06:33


Nickolay Shmyrev


User submitted content is under Creative Commons: Attribution - Share Alike; Other things copyright (C) 2010, MetaOptimize LLC.