I recently got an email from someone who studies Scandinavian literature and wants to do some NLP on the texts. Are there parsers or part-of-speech taggers for Danish, Norwegian, or Swedish? What about trained machine translation models?

What are his options beyond lexical stuff?


asked Sep 15 '10 at 09:34

3 Answers:

There is Danish and Swedish data from the CoNLL-X shared task on multilingual dependency parsing. The data is a dependency treebank with POS tags, so you can trivially extract POS data from it and train a standard tagger or parser.
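To show how little work the POS extraction is, here is a minimal sketch that reads CoNLL-X formatted text (ten tab-separated columns per token, blank line between sentences) and keeps only the word form and the coarse POS tag. The function name `read_conllx_pos` and the two sample sentences are made up for illustration; the tags in the sample are placeholders, not the tagset of the actual Danish or Swedish data.

```python
from typing import Iterable, Iterator, List, Tuple


def read_conllx_pos(lines: Iterable[str]) -> Iterator[List[Tuple[str, str]]]:
    """Yield each sentence as a list of (word form, coarse POS tag) pairs."""
    sentence: List[Tuple[str, str]] = []
    for line in lines:
        line = line.rstrip("\n")
        if not line.strip():
            # A blank line ends the current sentence.
            if sentence:
                yield sentence
                sentence = []
            continue
        # CoNLL-X columns: ID, FORM, LEMMA, CPOSTAG, POSTAG,
        # FEATS, HEAD, DEPREL, PHEAD, PDEPREL
        cols = line.split("\t")
        sentence.append((cols[1], cols[3]))
    if sentence:
        # The file may not end with a blank line.
        yield sentence


# Invented toy input, not taken from the real treebanks.
SAMPLE = (
    "1\tJeg\t_\tP\tPP\t_\t2\tsubj\t_\t_\n"
    "2\tsover\t_\tV\tVA\t_\t0\tROOT\t_\t_\n"
    "\n"
    "1\tHun\t_\tP\tPP\t_\t2\tsubj\t_\t_\n"
    "2\tlæser\t_\tV\tVA\t_\t0\tROOT\t_\t_\n"
)

sentences = list(read_conllx_pos(SAMPLE.splitlines()))
print(sentences)
```

Swapping `cols[3]` for `cols[4]` would give the fine-grained tag instead, and columns 7 and 8 (`cols[6]`, `cols[7]`) hold the head index and dependency relation if you want parser training data.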

answered Sep 15 '10 at 10:52

Alexandre Passos ♦

For Swedish, the HunPos tagger is reported to work well. It is a re-implementation of the HMM TnT tagger.

There is also a Swedish Treebank, which is an extended version of this one. Judging from other languages, I suspect the Berkeley Parser would perform reasonably well.
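If you go the HunPos route, you will need the training data as plain tagged text. The sketch below assumes the TnT-style layout of one token per line written as word, tab, tag, with an empty line after each sentence; check the HunPos documentation before relying on it. The function name `to_word_tag_lines` and the sample sentence (including its tags) are hypothetical.

```python
from typing import Iterable, List, Tuple


def to_word_tag_lines(sentences: Iterable[List[Tuple[str, str]]]) -> str:
    """Serialize tagged sentences as 'word<TAB>tag' lines.

    Assumed layout: one token per line, empty line after each sentence.
    """
    out: List[str] = []
    for sentence in sentences:
        for word, tag in sentence:
            out.append(f"{word}\t{tag}")
        out.append("")  # sentence boundary
    return "\n".join(out) + "\n" if out else ""


# Invented example; the tags are placeholders.
tagged = [[("Jag", "PN"), ("sover", "VB")]]
training_text = to_word_tag_lines(tagged)
print(training_text, end="")
```

The resulting string can be written to a file and handed to the tagger's training tool.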

answered Sep 15 '10 at 17:19

For Danish NLP data, another option is the Copenhagen Dependency Treebank (CDT). There are three treebanks available:

  • CDT1: The Danish Dependency Treebank (100,000 words), which was used as training material in the CoNLL 2006 shared task.
  • CDT2: The Danish-English Parallel Dependency Treebank (95,000 words).
  • CDT3: The Copenhagen Dependency Treebanks for Danish, English, German, Italian and Spanish (2x100,000 + 3x60,000 words, work-in-progress).

answered Feb 10 '11 at 12:32

Carsten Lygteskov Hansen

