From what I've seen, most methods for identifying phrasal translations seem pretty hacky at best. I was wondering if anyone here could point me to the best methods for identifying them, both for synonymous phrases within a language (e.g. "dying" and "kicking the bucket") and for equivalent phrases across languages.

Thanks!

asked Jul 19 '10 at 03:37

Daniel Duckwoth

You might also be interested in this:

Way, A. and Gough, N. 2003. wEBMT: developing and validating an example-based machine translation system using the world wide web. Comput. Linguist. 29, 3 (Sep. 2003), 421-457. DOI= http://dx.doi.org/10.1162/089120103322711596

or this:

Grefenstette, G. 1999. The World Wide Web as a resource for example-based machine translation tasks. In Proceedings of the ASLIB Conference on Translating and the Computer, volume 21, London.

http://ftp.xrce.xerox.com/Publications/Attachments/1999-004/gg_aslib.pdf

(Jul 19 '10 at 11:15) Gregory Grefenstette

2 Answers:

I don't know of any work on phrase translation, but this paper on phrase-to-phrase alignment is interesting.
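For reference, one widely used way to get phrase-to-phrase correspondences is the phrase-extraction heuristic from phrase-based statistical MT: start from a word alignment (e.g. produced by an aligner such as GIZA++) and keep every pair of spans that is consistent with it. This is not necessarily the method of the paper mentioned above; it is a minimal sketch under these assumptions: the function name `extract_phrase_pairs` and the `max_len` limit are hypothetical, alignments are given as (source index, target index) pairs, and the usual extension over unaligned boundary words is omitted for brevity.

```python
from typing import List, Set, Tuple

# Word alignment as a set of (source index, target index) links (assumed input format).
Alignment = Set[Tuple[int, int]]


def extract_phrase_pairs(
    src: List[str], tgt: List[str], alignment: Alignment, max_len: int = 4
) -> Set[Tuple[str, str]]:
    """Return phrase pairs consistent with a word alignment (hypothetical helper).

    A pair (src[i1..i2], tgt[j1..j2]) is kept when it contains at least one
    link and no word inside either span is linked to a word outside the
    other span. Unaligned boundary words are not extended (simplification).
    """
    pairs = set()
    for i1 in range(len(src)):
        for i2 in range(i1, min(i1 + max_len, len(src))):
            # Target positions linked to any word in the source span.
            linked = [j for (i, j) in alignment if i1 <= i <= i2]
            if not linked:
                continue
            j1, j2 = min(linked), max(linked)
            if j2 - j1 >= max_len:
                continue  # target span too long
            # Reject if a target word in [j1, j2] links outside [i1, i2].
            consistent = all(i1 <= i <= i2 for (i, j) in alignment if j1 <= j <= j2)
            if consistent:
                pairs.add((" ".join(src[i1:i2 + 1]), " ".join(tgt[j1:j2 + 1])))
    return pairs
```

For example, with "he kicked the bucket" aligned to the French idiom "il a cassé sa pipe", the extracted pairs include ("kicked the bucket", "a cassé sa pipe") as well as shorter pairs such as ("he", "il"). Pairs extracted this way can then be counted across a parallel corpus to estimate phrase translation probabilities.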

answered Jul 19 '10 at 08:53

Alexandre Passos ♦

This paper (in French, apart from the English abstract below) examines how to detect which words might or might not have literal translations:

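Identification d'erreurs de traduction dans un dictionnaire de recherche d'informations translingue et traduction de mots composés à l'aide du World Wide Web (Identifying translation errors in a cross-language information retrieval dictionary and translating compound words with the help of the World Wide Web) - H. Naets, G. Grefenstette - CORIA'05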

http://www-list.cea.fr/gb/publications/docs/si/ingenierie_connaissance/fr/CORIA2005_naetz_identification_erreurs_traduction.pdf

Cross-language information retrieval over non-parallel text requires a translation phase between a source language query and a target language document. In order to achieve the same performance as a monolingual target language query, good translations for all terms in a source language query must be found. Unfortunately, available translation dictionaries do not contain exact translations for many multiword terms that can be found in a query. Cross-language retrieval systems use statistically or manually built translation dictionaries to perform translation, and in order to translate a multiword term, many systems generate possible word-to-word translations and verify the existence of the translations in the target database. When validated translations of multiword structures are used, retrieval improves. But there are two unsolved problems with the generate-and-validate method: (1) if the proper translation for one word in the multiword term is not in the translation dictionary, the translation that will be validated by the method will not be the best translation, and (2) if the multiword term in the source is not translated by the same number of nonstop words in the target language, then the best translation will not be generated. In this paper, we present two methods for recognizing when these situations arise.
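The generate-and-validate method described in the abstract can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the paper's implementation: the function name `generate_and_validate` is hypothetical, the dictionary is a plain mapping from a source word to its candidate translations, the "target database" is stood in for by a phrase-frequency lookup, and trying every word order of each candidate is an added assumption (French noun-adjective order often becomes adjective-noun in English).

```python
from itertools import permutations, product
from typing import Dict, List, Tuple


def generate_and_validate(
    term: List[str],
    dictionary: Dict[str, List[str]],
    target_counts: Dict[str, int],
) -> List[Tuple[str, int]]:
    """Translate a multiword term word by word, then keep only candidates
    attested in the target-language collection, most frequent first."""
    options = [dictionary.get(word, []) for word in term]
    if not all(options):
        # Some source word has no dictionary entry: nothing can be generated.
        return []
    candidates = set()
    for combo in product(*options):
        # Assumption: also try every word order of each word-for-word combination.
        for order in permutations(combo):
            candidates.add(" ".join(order))
    # Validation step: keep candidates that occur in the target collection.
    validated = [(c, target_counts[c]) for c in candidates if target_counts.get(c, 0) > 0]
    return sorted(validated, key=lambda pair: (-pair[1], pair[0]))
```

The sketch also makes the two problems named in the abstract concrete. Every candidate has exactly as many words as the source term, so a compound like "pomme de terre", whose best English translation is the single word "potato", can never be generated (problem 2). And if the dictionary lacks the right sense of one word, the function can only validate a worse candidate or nothing at all (problem 1).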

answered Jul 19 '10 at 11:08

Gregory Grefenstette

User submitted content is under Creative Commons: Attribution - Share Alike; Other things copyright (C) 2010, MetaOptimize LLC.