Lexical Comparison Between Wikipedia and Twitter Corpora by Using Word Embeddings

Compared with carefully edited prose, the language of social media is extremely informal. Applying NLP techniques in this context may require a better understanding of how words are used within social media. In this paper, we compute a word embedding for a corpus of tweets and compare it to a word embedding for Wikipedia. After learning a transformation from one vector space to the other and adjusting similarity values according to term frequency, we identify words whose usage differs greatly between the two corpora. For any given word, the set of words closest to it in a particular embedding characterizes that word’s usage within the corresponding corpus.
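The comparison step described in the abstract can be illustrated with a small sketch. It is not the authors' implementation: the abstract does not say what form the transformation takes, so a least-squares linear map is assumed here, the term-frequency adjustment is left out, and the function names (`fit_linear_map`, `rank_by_usage_difference`) and the random toy vectors are hypothetical stand-ins for real Twitter and Wikipedia embeddings.

```python
import numpy as np

def fit_linear_map(src, dst):
    """Assumed transformation: W minimizing ||src @ W - dst|| in the least-squares sense."""
    W, *_ = np.linalg.lstsq(src, dst, rcond=None)
    return W

def cosine_rows(a, b):
    """Row-wise cosine similarity between two matrices of equal shape."""
    num = np.sum(a * b, axis=1)
    den = np.linalg.norm(a, axis=1) * np.linalg.norm(b, axis=1)
    return num / den

def rank_by_usage_difference(words, src, dst):
    """Map src vectors into the dst space and sort words by ascending similarity.

    Words at the front of the list are the ones whose mapped source vector
    agrees least with their target vector, i.e. candidates for differing usage.
    """
    W = fit_linear_map(src, dst)
    sims = cosine_rows(src @ W, dst)
    order = np.argsort(sims)
    return [(words[i], float(sims[i])) for i in order]

# Toy data (hypothetical): 200 shared words, 5-dimensional embeddings.
rng = np.random.default_rng(0)
words = [f"w{i}" for i in range(200)]
twitter_vecs = rng.normal(size=(200, 5))
true_map = rng.normal(size=(5, 5))
wiki_vecs = twitter_vecs @ true_map

# Simulate one word used very differently in the two corpora
# by flipping its vector in the target space.
wiki_vecs[7] = -wiki_vecs[7]

ranked = rank_by_usage_difference(words, twitter_vecs, wiki_vecs)
print(ranked[:3])
```

With real embeddings, the rows of the two matrices would be the vectors of the words shared by both vocabularies, in the same order. A frequency-based correction of the similarity scores, which the abstract mentions, would be applied before ranking.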

Luchen Tan, Haotian Zhang, Charles L. A. Clarke, Mark D. Smucker

ACL 2015 | Computational Linguistics |

Added	13 Apr 2016
Updated	13 Apr 2016
Type	Journal
Year	2015
Where	ACL
Authors	Luchen Tan, Haotian Zhang, Charles L. A. Clarke, Mark D. Smucker
