Subword Variation in Text Message Classification

Download www.robertmunro.com

For millions of people in less-resourced regions of the world, text messages (SMS) provide the only regular contact with their doctor. Classifying messages by medical labels supports rapid responses to emergencies, the early identification of epidemics, and everyday administration, but the challenges include text brevity, rich morphology, phonological variation, and limited training data. We present a novel system that addresses these challenges, working with a clinic in rural Malawi and texts in the Chichewa language. We show that modeling morphological and phonological variation leads to a substantial average gain of F = 0.206 and an error reduction of up to 63.8% for specific labels, relative to a baseline system optimized over word sequences. By comparison, there is no significant gain when applying the same system to the English translations of the same texts/labels, emphasizing the need for subword modeling in many languages. Language-independent morphological models perform as accurately as langu...
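The abstract attributes the gains to modeling subword (morphological and phonological) variation instead of treating each word as an opaque token. The sketch below illustrates that general idea only; it is not the authors' system. It assumes a hypothetical r→l normalization rule as a stand-in for phonological/orthographic variation, and it uses boundary-marked character n-grams as a simple language-independent proxy for a morphological model. The names `normalize`, `char_ngrams`, `features`, and `overlap` are illustrative.

```python
from collections import Counter

# Hypothetical alternation table (assumption, not from the paper):
# treat 'r' and 'l' as the same sound. Replace with real rules.
NORMALIZE = str.maketrans({"r": "l"})


def normalize(token: str) -> str:
    """Lowercase, apply the assumed r->l map, and collapse runs of the
    same letter longer than two (e.g. 'sooooo' -> 'soo')."""
    token = token.lower().translate(NORMALIZE)
    out: list[str] = []
    for ch in token:
        # Skip a letter if it would make a run of three or more.
        if len(out) >= 2 and out[-1] == ch and out[-2] == ch:
            continue
        out.append(ch)
    return "".join(out)


def char_ngrams(token: str, n_min: int = 3, n_max: int = 5) -> list[str]:
    """Character n-grams of the token padded with '<' and '>' so that
    prefixes and suffixes are distinguishable from word-internal strings."""
    padded = f"<{token}>"
    grams = []
    for n in range(n_min, n_max + 1):
        for i in range(len(padded) - n + 1):
            grams.append(padded[i:i + n])
    return grams


def features(text: str, subword: bool = True) -> Counter:
    """Bag of features for one message: one whole-word feature per token,
    plus character n-grams of the normalized word when subword=True."""
    feats: Counter = Counter()
    for raw in text.split():
        word = normalize(raw) if subword else raw.lower()
        feats["w=" + word] += 1
        if subword:
            feats.update("c=" + g for g in char_ngrams(word))
    return feats


def overlap(a: Counter, b: Counter) -> int:
    """Number of distinct features shared by two messages."""
    return len(set(a) & set(b))


if __name__ == "__main__":
    # Two inflected forms assumed to share the stem '-dwala'.
    print(overlap(features("ndikudwala", subword=False),
                  features("akudwala", subword=False)))
    print(overlap(features("ndikudwala"), features("akudwala")))
```

With word-only features the two forms share nothing, so evidence learned from one cannot transfer to the other; with subword features they share stem n-grams such as `dwa` and `wal`. Under the assumed rule, a variant spelling like `ndikudwara` also maps to the same word feature as `ndikudwala`. In a real classifier these feature counts would feed a standard learner; the choice of n-gram range and normalization rules would need tuning per language.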

Robert Munro, Christopher D. Manning

Broad Deployment Potential | Computational Linguistics | NAACL 2010 | Phonological Variation | Substantial Average Gain

Added	14 Feb 2011
Updated	14 Feb 2011
Type	Conference
Year	2010
Where	NAACL
Authors	Robert Munro, Christopher D. Manning

Sciweavers