ACL
2009


Data Cleaning for Word Alignment


Download www.aclweb.org

Parallel corpora are made by human beings. However, because an MT system is an aggregation of state-of-the-art NLP technologies without any human intervention, it is unavoidable that quite a few sentence pairs are beyond its analysis and therefore do not contribute to the system. Furthermore, they may in turn work against our objectives and make the overall performance worse. Possible unfavorable items are n:m mapping objects, such as paraphrases, non-literal translations, and multiword expressions. This paper presents a pre-processing method that detects such unfavorable items before supplying them to the word aligner, under the assumption that their frequency is low, such as below 5 percent. We show an improvement of BLEU score from 28.0
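
The pre-processing idea in the abstract, removing a small fraction of sentence pairs before they reach the word aligner, might be sketched roughly as follows. This is an illustration only, not the paper's detection method: the scoring function `length_ratio_score`, the helper `filter_unfavorable`, and the `drop_fraction` parameter are all hypothetical names introduced here, and the token-length ratio is a simple stand-in for whatever criterion the paper actually uses. Only the 5 percent threshold is taken from the abstract.

```python
from typing import Callable, List, Tuple

SentencePair = Tuple[str, str]


def length_ratio_score(pair: SentencePair) -> float:
    """Hypothetical stand-in score (NOT the paper's criterion).

    Returns 1.0 when both sides have the same token count and moves
    toward 0.0 as the token counts diverge.
    """
    src_len = len(pair[0].split())
    tgt_len = len(pair[1].split())
    if src_len == 0 or tgt_len == 0:
        return 0.0
    return min(src_len, tgt_len) / max(src_len, tgt_len)


def filter_unfavorable(
    pairs: List[SentencePair],
    score: Callable[[SentencePair], float] = length_ratio_score,
    drop_fraction: float = 0.05,  # "below 5 percent" from the abstract
) -> Tuple[List[SentencePair], List[SentencePair]]:
    """Split pairs into (kept, dropped), dropping the lowest-scoring fraction."""
    if not 0.0 <= drop_fraction < 1.0:
        raise ValueError("drop_fraction must be in [0, 1)")
    n_drop = int(len(pairs) * drop_fraction)
    # Rank indices by score, lowest first; sorted() is stable, so ties
    # are broken by corpus order.
    ranked = sorted(range(len(pairs)), key=lambda i: score(pairs[i]))
    drop_idx = set(ranked[:n_drop])
    kept = [p for i, p in enumerate(pairs) if i not in drop_idx]
    dropped = [pairs[i] for i in sorted(drop_idx)]
    return kept, dropped
```

Under these assumptions, `kept` would be the list passed on to the word aligner, while `dropped` can be inspected by hand to check whether the discarded pairs really are paraphrases, non-literal translations, or multiword expressions.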

Tsuyoshi Okita




Related Content

» Active Learning-Based Elicitation for Semi-Supervised Word Alignment

» Keyword query cleaning

» Aligning words using matrix factorisation

» Data Issues in English-to-Hindi Machine Translation

» Email data cleaning

» Boosting Statistical Word Alignment Using Labeled and Unlabeled Data

» Statistical Machine Translation with Word- and Sentence-Aligned Parallel Corpora

» Word Alignment Annotation in a Japanese-Chinese Parallel Corpus

» Combining Clues for Word Alignment

Post Info

Added	16 Feb 2011
Updated	16 Feb 2011
Type	Journal
Year	2009
Where	ACL
Authors	Tsuyoshi Okita
