Improving Named Entity Recognition in Tweets via Detecting Non-Standard Words

Most previous work on text normalization for informal text has made the strong assumption that the system already knows which tokens are non-standard words (NSW) and thus need normalization. However, this is not realistic. In this paper, we propose a method for NSW detection. In addition to dictionary-based information, e.g., whether a word is out-of-vocabulary (OOV), we leverage novel information derived from the normalization results for OOV words to help make decisions. Second, this paper investigates two methods that use NSW detection results for named entity recognition (NER) in social media data. One adopts a pipeline strategy, and the other uses joint decoding. We also create a new data set that adds normalization annotation to the existing named entity labels. This is the first data set with such annotation, and we release it for research purposes. Our experimental results demonstrate the effectiveness of our NSW detection method and the benefit of NSW d...
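
As a rough illustration of the dictionary-based signal mentioned in the abstract (whether a token is out-of-vocabulary), the minimal Python sketch below flags OOV tokens against a toy word list. The vocabulary, the function name `oov_flags`, and the regex tokenization are assumptions made for illustration only. This is not the authors' detection method, which also draws on information from normalization results.

```python
import re

# Toy in-vocabulary word list (hypothetical; a real system would load a full dictionary).
VOCAB = {"see", "you", "tomorrow", "at", "the", "game"}

def oov_flags(tweet, vocab=VOCAB):
    """Return (token, is_oov) pairs for the alphanumeric tokens in a tweet."""
    # Simple regex tokenization; real tweet tokenizers handle hashtags, URLs, etc.
    tokens = re.findall(r"[A-Za-z0-9']+", tweet)
    # A token is OOV when its lowercased form is missing from the word list.
    return [(tok, tok.lower() not in vocab) for tok in tokens]

print(oov_flags("see u tmrw at the game"))
```

Here "u" and "tmrw" are flagged as OOV while the remaining tokens are not. An OOV flag on its own is only a candidate signal for NSW detection, which is why the abstract describes combining it with further information.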

Chen Li, Yang Liu

ACL 2015 | Computational Linguistics |

Added	13 Apr 2016
Updated	13 Apr 2016
Type	Journal
Year	2015
Where	ACL
Authors	Chen Li, Yang Liu
