New Resources for Document Classification, Analysis and Translation Technologies

The goal of the DARPA MADCAT (Multilingual Automatic Document Classification Analysis and Translation) Program is to automatically convert foreign language text images into English transcripts, for use by humans and downstream applications. The first phase of the program focuses on translation of handwritten Arabic documents. The Linguistic Data Consortium (LDC) is creating publicly available linguistic resources for MADCAT technologies, at a scale and richness not previously available. Corpora will consist of existing LDC corpora and data donations from MADCAT partners, plus new data collection to provide high-quality material for evaluation and to address strategic gaps (for genre, dialect, image quality, etc.) in the existing resources. Training and test data properties will expand over time to encompass a wide range of topics and genres: letters, diaries, training manuals, brochures, signs, ledgers, memos, instructions, postcards and forms, among others. Data will be ground truthed, with ...

Stephanie Strassel, Lauren Friedman, Safa Ismael, Linda Brandschain

Education | Handwritten Arabic Documents | Linguistic Data Consortium | LREC 2008 | Test Data Properties |

» Towards practical genre classification of web documents

» An efficient method for using machine translation technologies in crosslanguage patent sea...

» Fast dimension reduction for document classification based on imprecise spectrum analysis

» Citation based plagiarism detection a new approach to identify plagiarized work language i...

» GenMAPP 2 new features and resources for pathway analysis

» Are SentiWordNet scores suited for multidomain sentiment classification

» Overview of VideoCLEF 2009 New Perspectives on SpeechBased Multimedia Content Enrichment

» Semisupervised Document Classification with a Mislabeling Error Model

Post Info

Added	29 Oct 2010
Updated	29 Oct 2010
Type	Conference
Year	2008
Where	LREC
Authors	Stephanie Strassel, Lauren Friedman, Safa Ismael, Linda Brandschain
