INTERSPEECH
2010

Text normalization based on statistical machine translation and internet user support

In this paper, we describe and compare systems for text normalization based on statistical machine translation (SMT) methods which are constructed with the support of internet users. Internet users normalize text displayed in a web interface, thereby providing a parallel corpus of normalized and non-normalized text. With this corpus, SMT models are generated to translate non-normalized into normalized text. To build traditional language-specific text normalization systems, knowledge of linguistics as well as established computer skills to implement text normalization rules are required. Our systems are built without profound computer knowledge due to the simple self-explanatory user interface and the automatic generation of the SMT models. Additionally, no in-house knowledge of the language to normalize is required due to the multilingual expertise of the internet community. All techniques are applied to French texts, crawled with our Rapid Language Adaptation Toolkit [1] and compared t...
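As a rough illustration of the parallel corpus mentioned in the abstract, the Python sketch below stores user-submitted pairs of non-normalized and normalized sentences in two line-aligned files, a layout commonly used as input for SMT training. The function name, the file names, and the French example pairs are assumptions made for illustration and are not taken from the paper.

```python
import os
import tempfile

# Hypothetical example pairs (raw text, user-normalized text); not from the paper.
EXAMPLE_PAIRS = [
    ("Le 3 mai 2010", "le trois mai deux mille dix"),
    ("12 km", "douze kilomètres"),
    ("   ", "ligne vide ignorée"),  # empty source side, skipped below
]


def write_parallel_corpus(pairs, src_path, tgt_path):
    """Write (raw, normalized) pairs to two line-aligned files.

    Line i of src_path corresponds to line i of tgt_path.
    Pairs with an empty side are dropped so the files stay aligned.
    Returns the number of pairs written.
    """
    kept = 0
    with open(src_path, "w", encoding="utf-8") as src, \
            open(tgt_path, "w", encoding="utf-8") as tgt:
        for raw, normalized in pairs:
            # Collapse whitespace so each sentence occupies exactly one line.
            raw = " ".join(raw.split())
            normalized = " ".join(normalized.split())
            if not raw or not normalized:
                continue
            src.write(raw + "\n")
            tgt.write(normalized + "\n")
            kept += 1
    return kept


def demo():
    """Write the example pairs to a temporary directory and return the count."""
    with tempfile.TemporaryDirectory() as tmp:
        return write_parallel_corpus(
            EXAMPLE_PAIRS,
            os.path.join(tmp, "corpus.raw"),
            os.path.join(tmp, "corpus.norm"),
        )
```

Keeping the two sides in separate, line-aligned files means a pair can be dropped or added without touching the rest of the corpus, which suits a setting where users contribute sentences incrementally.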
Tim Schlippe, Chenfei Zhu, Jan Gebhardt, Tanja Schultz
Added 18 May 2011
Updated 18 May 2011
Type Journal
Year 2010
Where INTERSPEECH
Authors Tim Schlippe, Chenfei Zhu, Jan Gebhardt, Tanja Schultz