Transferring structural markup across translations using multilingual alignment and projection



We present here a method for automatically projecting structural information across translations, including canonical citation structure (such as chapters and sections), speaker information, quotations, markup for people and places, and any other element in TEI-compliant XML that delimits spans of text that are linguistically symmetrical in two languages. We evaluate this technique on two datasets, one containing perfectly transcribed texts and one containing errorful OCR, and achieve an accuracy rate of 88.2% projecting 13,023 XML tags from source documents to their transcribed translations, with an 83.6% accuracy rate when projecting to texts containing uncorrected OCR. This approach has the potential to allow a highly granular multilingual digital library to be bootstrapped by applying the knowledge contained in a small, heavily curated collection to a much larger but unstructured one.

Categories and Subject Descriptors: H.3.7 [Information Systems: Information Storage and Retrieval]...
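As a rough illustration of what projecting a tagged span across a translation can look like, the following is a minimal sketch, not the authors' implementation. It assumes that a word alignment between source and target tokens is already available as a list of index pairs, and it maps a source span to the smallest target span covering all aligned tokens. The function name `project_span`, the toy sentences, and the alignment itself are hypothetical and chosen only for this example.

```python
from typing import List, Optional, Tuple

Span = Tuple[int, int]        # inclusive token indices (start, end)
Alignment = List[Tuple[int, int]]  # (source_index, target_index) pairs


def project_span(span: Span, alignment: Alignment) -> Optional[Span]:
    """Map a source token span to the smallest covering target span.

    Returns None when no token inside the source span is aligned.
    This is an illustrative heuristic, not the method from the paper.
    """
    start, end = span
    targets = [t for s, t in alignment if start <= s <= end]
    if not targets:
        return None
    return (min(targets), max(targets))


# Hypothetical toy data: an English source and a Latin translation.
source_tokens = ["Caesar", "went", "to", "Rome"]
target_tokens = ["Caesar", "Romam", "iit"]

# Hypothetical alignment: "went" -> "iit", "to" and "Rome" -> "Romam".
alignment = [(0, 0), (1, 2), (2, 1), (3, 1)]

# A place-name tag covering "Rome" in the source (token 3).
place_span = project_span((3, 3), alignment)
if place_span is not None:
    lo, hi = place_span
    print(target_tokens[lo:hi + 1])
```

Taking the minimum and maximum aligned index keeps the projected span contiguous, which matters for XML elements that must delimit an unbroken run of text; a real system would also need to handle unaligned tokens and nesting between projected tags.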

David Bamman, Alison Babeu, Gregory Crane


Accuracy Rate | Canonical Citation Structure | JCDL 2010 | Multilingual Digital Library


» CrossLanguage Frame Semantics Transfer in Bilingual Corpora

» Crosslingual Propagation for Morphological Analysis

» New Features in Spoken Language Search Hawk SpLaSH Query Language and Query Sequence

Post Info

Added	10 Jul 2010
Updated	10 Jul 2010
Type	Conference
Year	2010
Where	JCDL
Authors	David Bamman, Alison Babeu, Gregory Crane

