Hi, I recently got an email from someone who studies Scandinavian literature and wants to apply some NLP to it. Are there parsers or part-of-speech taggers for Danish, Norwegian, or Swedish? What about trained machine translation models? What are his options beyond purely lexical methods? -Aditi
There is Danish and Swedish data from the CoNLL-X shared task on multilingual dependency parsing. You can easily extract POS-tagged data from it, as well as a dependency treebank, and train a standard tagger or parser.
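As a rough sketch of the "extract POS data and train a tagger" step: CoNLL-X files have one token per line with ten tab-separated columns (ID, FORM, LEMMA, CPOSTAG, POSTAG, FEATS, HEAD, DEPREL, PHEAD, PDEPREL) and a blank line between sentences. The Python below reads that format and trains a most-frequent-tag baseline, which is a sanity check rather than a real tagger. The names `read_conllx` and `MostFrequentTagger` are hypothetical, and the sample sentences, tag labels and relation labels are made up for illustration; they are not taken from the Danish or Swedish treebanks.

```python
from collections import Counter, defaultdict
from typing import Iterable, Iterator, List, NamedTuple


class Token(NamedTuple):
    form: str
    cpostag: str
    postag: str
    head: int
    deprel: str


def read_conllx(lines: Iterable[str]) -> Iterator[List[Token]]:
    """Yield sentences (lists of Tokens) from CoNLL-X formatted lines."""
    sentence: List[Token] = []
    for line in lines:
        line = line.rstrip("\n")
        if not line.strip():
            # A blank line ends the current sentence.
            if sentence:
                yield sentence
                sentence = []
            continue
        cols = line.split("\t")
        # Columns: ID FORM LEMMA CPOSTAG POSTAG FEATS HEAD DEPREL PHEAD PDEPREL
        sentence.append(Token(cols[1], cols[3], cols[4], int(cols[6]), cols[7]))
    if sentence:
        yield sentence


class MostFrequentTagger:
    """Baseline: tag each word with the coarse tag it had most often in training."""

    def __init__(self, sentences: Iterable[List[Token]]):
        counts = defaultdict(Counter)
        overall = Counter()
        for sent in sentences:
            for tok in sent:
                counts[tok.form.lower()][tok.cpostag] += 1
                overall[tok.cpostag] += 1
        self.lexicon = {w: c.most_common(1)[0][0] for w, c in counts.items()}
        # Unknown words fall back to the most frequent tag overall.
        self.default = overall.most_common(1)[0][0]

    def tag(self, words: List[str]) -> List[str]:
        return [self.lexicon.get(w.lower(), self.default) for w in words]


# Made-up toy data in CoNLL-X layout (tags and relations are illustrative only).
SAMPLE = """1\tHunden\thund\tNOUN\tNOUN\t_\t2\tSUBJ\t_\t_
2\tsover\tsova\tVERB\tVERB\t_\t0\tROOT\t_\t_
3\t.\t.\tPUNCT\tPUNCT\t_\t2\tPUNCT\t_\t_

1\tKatten\tkatt\tNOUN\tNOUN\t_\t2\tSUBJ\t_\t_
2\tser\tse\tVERB\tVERB\t_\t0\tROOT\t_\t_
3\thunden\thund\tNOUN\tNOUN\t_\t2\tOBJ\t_\t_
"""
```

With a real treebank file you would pass an open file handle instead, e.g. `read_conllx(open(path, encoding="utf-8"))`. The HEAD and DEPREL columns that `Token` keeps are what a dependency parser would be trained on.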
For Swedish, the HunPos tagger is reported to work well. It is a reimplementation of the TnT HMM tagger. There is also the Swedish Treebank, an extended version of the Swedish CoNLL-X data. Judging from its results on other languages, I suspect the Berkeley Parser would also perform reasonably well on Swedish.
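To make "HMM tagger" concrete, here is a minimal sketch of a much simpler bigram HMM tagger with add-one smoothing and Viterbi decoding. It is not HunPos's or TnT's implementation: TnT uses trigram tag transitions, interpolated smoothing and suffix-based handling of unknown words, all of which this sketch omits. The class name `BigramHMMTagger` and the smoothing choice are assumptions for illustration.

```python
import math
from collections import Counter, defaultdict
from typing import Dict, List, Sequence, Tuple

START = "<s>"


class BigramHMMTagger:
    """Simplified bigram HMM POS tagger (add-one smoothing, Viterbi decoding)."""

    def __init__(self, tagged_sentences: Sequence[Sequence[Tuple[str, str]]]):
        self.trans = defaultdict(Counter)  # trans[prev_tag][tag] counts
        self.emit = defaultdict(Counter)   # emit[tag][word] counts
        vocab = set()
        for sent in tagged_sentences:
            prev = START
            for word, tag in sent:
                w = word.lower()
                self.trans[prev][tag] += 1
                self.emit[tag][w] += 1
                vocab.add(w)
                prev = tag
        self.tags = sorted(self.emit)
        self.vocab_size = len(vocab) + 1  # +1 reserves probability mass for unknown words

    def _log_trans(self, prev: str, tag: str) -> float:
        counts = self.trans[prev]
        return math.log((counts[tag] + 1) / (sum(counts.values()) + len(self.tags)))

    def _log_emit(self, tag: str, word: str) -> float:
        counts = self.emit[tag]
        return math.log((counts[word] + 1) / (sum(counts.values()) + self.vocab_size))

    def tag(self, words: Sequence[str]) -> List[str]:
        if not words:
            return []
        words = [w.lower() for w in words]
        # best[i][t] = (log prob of the best path ending in tag t at position i, backpointer)
        best: List[Dict[str, Tuple[float, str]]] = [
            {t: (self._log_trans(START, t) + self._log_emit(t, words[0]), START)
             for t in self.tags}
        ]
        for i in range(1, len(words)):
            column = {}
            for t in self.tags:
                score, prev = max(
                    (best[i - 1][p][0] + self._log_trans(p, t), p) for p in self.tags
                )
                column[t] = (score + self._log_emit(t, words[i]), prev)
            best.append(column)
        # Start from the best final tag and follow the backpointers.
        tag = max(self.tags, key=lambda t: best[-1][t][0])
        path = [tag]
        for i in range(len(words) - 1, 0, -1):
            tag = best[i][tag][1]
            path.append(tag)
        return path[::-1]
```

Training input is a list of sentences, each a list of `(word, tag)` pairs, which you could build from the FORM and CPOSTAG columns of the CoNLL-X data. Working in log space avoids numerical underflow on long sentences.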
For Danish, another option is the Copenhagen Dependency Treebank (CDT). There are three treebanks available:
|