

Named Entity Recognition in Tweets: An Experimental Study

People tweet more than 100 million times daily, yielding a noisy, informal, but sometimes informative corpus of 140-character messages that mirrors the zeitgeist in an unprecedented manner. The performance of standard NLP tools is severely degraded on tweets. This paper addresses this issue by rebuilding the NLP pipeline, beginning with part-of-speech tagging, through chunking, to named-entity recognition. Our novel T-NER system doubles the F1 score compared with the Stanford NER system. T-NER leverages the redundancy inherent in tweets to achieve this performance, using LabeledLDA to exploit Freebase dictionaries as a source of distant supervision. LabeledLDA outperforms co-training, increasing F1 by 25% over ten common entity types. Our NLP tools are available at: http://
Alan Ritter, Sam Clark, Mausam, Oren Etzioni
Added 20 Dec 2011
Updated 20 Dec 2011
Type Journal
Year 2011