
Hi,

I recently got an email from someone who studies Scandinavian literature and wants to run some NLP on it - are there parsers or part-of-speech taggers for Danish, Norwegian, or Swedish? What about trained machine translation models?

What are his options beyond lexical stuff?

-Aditi

asked Sep 15 '10 at 09:34

aditi

3 Answers:

There is Danish and Swedish data from the CoNLL-X shared task on multilingual dependency parsing. Each treebank gives you a dependency treebank with POS tags included, so you can easily extract POS data from it and train a standard tagger or parser.
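To illustrate pulling POS data out of such a file, here is a minimal sketch. It assumes the standard CoNLL-X layout: one token per line, ten tab-separated columns (ID, FORM, LEMMA, CPOSTAG, POSTAG, FEATS, HEAD, DEPREL, PHEAD, PDEPREL), and a blank line between sentences. The function name and the toy sample are made up for illustration; the sample tags are not real treebank annotations.

```python
from typing import List, Tuple

def read_conllx_pos(text: str) -> List[List[Tuple[str, str]]]:
    """Return one list of (word form, fine-grained POS tag) pairs per sentence."""
    sentences: List[List[Tuple[str, str]]] = []
    current: List[Tuple[str, str]] = []
    for line in text.splitlines():
        if not line.strip():
            # A blank line closes the current sentence.
            if current:
                sentences.append(current)
                current = []
            continue
        cols = line.split("\t")
        # Column 2 (index 1) is FORM, column 5 (index 4) is POSTAG.
        current.append((cols[1], cols[4]))
    if current:
        # The file may not end with a blank line.
        sentences.append(current)
    return sentences

# Invented toy input in CoNLL-X layout (tags are placeholders).
SAMPLE = (
    "1\tJeg\t_\tP\tPP\t_\t2\tsubj\t_\t_\n"
    "2\tsover\t_\tV\tVA\t_\t0\tROOT\t_\t_\n"
    "\n"
    "1\tHun\t_\tP\tPP\t_\t2\tsubj\t_\t_\n"
    "2\tlæser\t_\tV\tVA\t_\t0\tROOT\t_\t_\n"
)

for sentence in read_conllx_pos(SAMPLE):
    print(sentence)
```

Use index 3 instead of 4 if you prefer the coarse-grained CPOSTAG column; indices 6 and 7 hold the head and dependency relation if you want to train a parser.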

answered Sep 15 '10 at 10:52

Alexandre Passos ♦

For Swedish, the HunPos tagger is reported to work well. It is a re-implementation of the HMM TnT tagger.
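TnT-style taggers are usually trained from a plain text file with one token per line. A rough sketch of producing such a file from tagged sentences follows. The exact layout here (word, tab, tag, with a blank line after each sentence) is an assumption you should check against the HunPos documentation, and the function name is hypothetical.

```python
from typing import Iterable, List, Tuple

def to_tagger_format(sentences: Iterable[List[Tuple[str, str]]]) -> str:
    """Serialize tagged sentences as 'word<TAB>tag' lines, blank line after each sentence.

    Assumed layout; verify against the tagger's own documentation.
    """
    parts = []
    for sentence in sentences:
        for word, tag in sentence:
            parts.append(f"{word}\t{tag}\n")
        parts.append("\n")  # sentence separator
    return "".join(parts)

# Invented toy data (tags are placeholders, not a real tagset).
tagged = [[("Jag", "PN"), ("sover", "VB")], [("Hon", "PN")]]
print(to_tagger_format(tagged), end="")
```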

There is also a Swedish Treebank, which is an extended version of this one. Judging from other languages, I suspect the Berkeley Parser would perform reasonably well.

answered Sep 15 '10 at 17:19

yoavg

For Danish NLP data, another option is the Copenhagen Dependency Treebank (CDT). There are three treebanks available:

  • CDT1: The Danish Dependency Treebank (100,000 words), which was used as training material in the CoNLL 2006 shared task.
  • CDT2: The Danish-English Parallel Dependency Treebank (95,000 words).
  • CDT3: The Copenhagen Dependency Treebanks for Danish, English, German, Italian and Spanish (2x100,000 + 3x60,000 words, work-in-progress).

answered Feb 10 '11 at 12:32

Carsten Lygteskov Hansen


User submitted content is under Creative Commons: Attribution - Share Alike; Other things copyright (C) 2010, MetaOptimize LLC.