ACL
2010

Creating Robust Supervised Classifiers via Web-Scale N-Gram Data

In this paper, we systematically assess the value of using web-scale N-gram data in state-of-the-art supervised NLP classifiers. We compare classifiers that include or exclude features for the counts of various N-grams, where the counts are obtained from a web-scale auxiliary corpus. We show that including N-gram count features can advance the state-of-the-art accuracy on standard data sets for adjective ordering, spelling correction, noun compound bracketing, and verb part-of-speech disambiguation. More importantly, when operating on new domains, or when labeled training data is not plentiful, we show that using web-scale N-gram features is essential for achieving robust performance.
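The idea of adding "features for the counts of various N-grams" can be illustrated with a small sketch. It is a minimal example under stated assumptions, not the authors' feature set or code. It uses adjective ordering, one of the tasks named above, and assumes the following: a tiny hypothetical count table (`TOY_NGRAM_COUNTS`, with invented numbers) stands in for the web-scale auxiliary corpus; the helper names `ngram_count`, `count_features`, and `predict_order` are hypothetical; and hand-set weights stand in for weights a supervised learner would estimate.

```python
import math
from typing import Dict, List

# Hypothetical toy count table standing in for a web-scale N-gram corpus.
# The numbers are invented for illustration only.
TOY_NGRAM_COUNTS: Dict[str, int] = {
    "big red": 5000,
    "red big": 40,
    "big red ball": 900,
    "red big ball": 3,
}


def ngram_count(ngram: str, counts: Dict[str, int]) -> int:
    """Return the corpus count for an N-gram, or 0 if it is unseen."""
    return counts.get(ngram, 0)


def count_features(adj1: str, adj2: str, noun: str,
                   counts: Dict[str, int]) -> List[float]:
    """Log-count features for both candidate orderings of an adjective pair.

    Feature order: [bigram(adj1 adj2), trigram(adj1 adj2 noun),
                    bigram(adj2 adj1), trigram(adj2 adj1 noun)].
    """
    feats: List[float] = []
    for a, b in [(adj1, adj2), (adj2, adj1)]:
        # log(count + 1) maps unseen N-grams to 0.0 and compresses large counts.
        feats.append(math.log(ngram_count(f"{a} {b}", counts) + 1))
        feats.append(math.log(ngram_count(f"{a} {b} {noun}", counts) + 1))
    return feats


def predict_order(feats: List[float], weights: List[float],
                  bias: float = 0.0) -> int:
    """Linear decision: 1 keeps the order (adj1, adj2), 0 swaps it."""
    score = bias + sum(w * f for w, f in zip(weights, feats))
    return 1 if score > 0 else 0


if __name__ == "__main__":
    # Hypothetical hand-set weights: reward counts for the given order,
    # penalize counts for the swapped order.
    weights = [1.0, 1.0, -1.0, -1.0]
    print(predict_order(count_features("big", "red", "ball", TOY_NGRAM_COUNTS), weights))
    print(predict_order(count_features("red", "big", "ball", TOY_NGRAM_COUNTS), weights))
```

With these toy counts, "big red ball" keeps its order (prints `1`) and "red big ball" is flagged for swapping (prints `0`). In a real classifier, such count features would sit alongside ordinary lexical features, and their weights would be learned from labeled data. The "exclude" condition corresponds to training without the count features.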
Shane Bergsma, Emily Pitler, Dekang Lin
Added 10 Feb 2011
Updated 10 Feb 2011
Type Journal