EMNLP
2011

Named Entity Recognition in Tweets: An Experimental Study

People tweet more than 100 Million times daily, yielding a noisy, informal, but sometimes informative corpus of 140-character messages that mirrors the zeitgeist in an unprecedented manner. The performance of standard NLP tools is severely degraded on tweets. This paper addresses this issue by re-building the NLP pipeline beginning with part-of-speech tagging, through chunking, to named-entity recognition. Our novel T-NER system doubles F1 score compared with the Stanford NER system. T-NER leverages the redundancy inherent in tweets to achieve this performance, using LabeledLDA to exploit Freebase dictionaries as a source of distant supervision. LabeledLDA outperforms cotraining, increasing F1 by 25% over ten common entity types. Our NLP tools are available at: http://github.com/aritter/twitter_nlp
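The abstract reports its results as F1 scores. As a reading aid, here is a minimal sketch of how an exact-match, entity-level F1 is commonly computed. It is not the paper's evaluation script: the function name `entity_f1`, the span representation, and the example type labels are all hypothetical and chosen only for illustration.

```python
from typing import Set, Tuple

# Hypothetical span representation: (start_token, end_token_exclusive, entity_type).
Span = Tuple[int, int, str]


def entity_f1(gold: Set[Span], predicted: Set[Span]) -> float:
    """Exact-match F1: a prediction is correct only if boundaries and type both match."""
    true_positives = len(gold & predicted)
    if true_positives == 0:
        # Covers empty inputs and the no-overlap case, avoiding division by zero.
        return 0.0
    precision = true_positives / len(predicted)
    recall = true_positives / len(gold)
    # F1 is the harmonic mean of precision and recall.
    return 2 * precision * recall / (precision + recall)


# Illustrative data only; these labels are assumptions, not taken from the paper.
gold = {(0, 2, "PERSON"), (5, 6, "COMPANY"), (8, 9, "GEO-LOC")}
predicted = {(0, 2, "PERSON"), (5, 6, "BAND")}
score = entity_f1(gold, predicted)
```

In this toy case one of two predictions is correct (precision 0.5) and one of three gold entities is found (recall 1/3). Because F1 is a harmonic mean, it stays low unless both precision and recall are high.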
Alan Ritter, Sam Clark, Mausam, Oren Etzioni
Added 20 Dec 2011
Updated 20 Dec 2011
Type Conference
Year 2011
Where EMNLP