Sciweavers

EMNLP
2008

Mining and Modeling Relations between Formal and Informal Chinese Phrases from Web Corpora

14 years 1 months ago
Mining and Modeling Relations between Formal and Informal Chinese Phrases from Web Corpora
We present a novel method for discovering and modeling the relationship between informal Chinese expressions (including colloquialisms and instant-messaging slang) and their formal equivalents. Specifically, we proposed a bootstrapping procedure to identify a list of candidate informal phrases in web corpora. Given an informal phrase, we retrieve contextual instances from the web using a search engine, generate hypotheses of formal equivalents via this data, and rank the hypotheses using a conditional log-linear model. In the log-linear model, we incorporate as feature functions both rule-based intuitions and data co-occurrence phenomena (either as an explicit or indirect definition, or through formal/informal usages occurring in free variation in a discourse). We test our system on manually collected test examples, and find that the (formal-informal) relationship discovery and extraction process using our method achieves an average 1-best precision of 62%. Given the ubiquity of infor...
Zhifei Li, David Yarowsky
Added 29 Oct 2010
Updated 29 Oct 2010
Type Conference
Year 2008
Where EMNLP
Authors Zhifei Li, David Yarowsky
Comments (0)