Creating Robust Supervised Classifiers via Web-Scale N-Gram Data

In this paper, we systematically assess the value of using web-scale N-gram data in state-of-the-art supervised NLP classifiers. We compare classifiers that include or exclude features for the counts of various N-grams, where the counts are obtained from a web-scale auxiliary corpus. We show that including N-gram count features can advance the state-of-the-art accuracy on standard data sets for adjective ordering, spelling correction, noun compound bracketing, and verb part-of-speech disambiguation. More importantly, when operating on new domains, or when labeled training data is not plentiful, we show that using web-scale N-gram features is essential for achieving robust performance.
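To make the idea of N-gram count features concrete, here is a minimal Python sketch for the adjective-ordering task. It is an illustration only, not the authors' feature set or classifier: the count table `TOY_NGRAM_COUNTS`, the helper names, the log(count + 1) smoothing, and the threshold rule in `predict_order` are all assumptions made for this example. A real system would look counts up in a web-scale corpus and pass the features to a supervised learner together with the other features.

```python
import math

# Hypothetical toy counts standing in for a web-scale N-gram corpus.
# The numbers are invented for illustration only.
TOY_NGRAM_COUNTS = {
    ("big", "red"): 12000,
    ("red", "big"): 300,
    ("old", "wooden"): 8000,
}


def log_count(ngram, counts):
    """Return log(count + 1), so an unseen N-gram maps to 0.0."""
    return math.log(counts.get(tuple(ngram), 0) + 1)


def ordering_features(adj1, adj2, counts):
    """Build count-based features for choosing 'adj1 adj2' vs 'adj2 adj1'."""
    forward = log_count((adj1, adj2), counts)
    backward = log_count((adj2, adj1), counts)
    return {
        "log_count_forward": forward,
        "log_count_backward": backward,
        "log_count_diff": forward - backward,
    }


def predict_order(adj1, adj2, counts):
    """Stand-in for a trained classifier: pick the more frequent order."""
    feats = ordering_features(adj1, adj2, counts)
    if feats["log_count_diff"] >= 0:
        return (adj1, adj2)
    return (adj2, adj1)


if __name__ == "__main__":
    print(ordering_features("big", "red", TOY_NGRAM_COUNTS))
    print(predict_order("red", "big", TOY_NGRAM_COUNTS))
```

In this sketch the count features do not depend on the labeled training set, which is one plausible reading of why such features could help when labeled data is scarce or the test domain differs.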

Shane Bergsma, Emily Pitler, Dekang Lin

ACL 2010 | Computational Linguistics | N-gram | Supervised NLP Classifiers | Web-scale N-gram Data

Post Info

Added	10 Feb 2011
Updated	10 Feb 2011
Type	Journal
Year	2010
Where	ACL
Authors	Shane Bergsma, Emily Pitler, Dekang Lin
