Web-Scale Distributional Similarity and Entity Set Expansion

Download www.aclweb.org

Computing the pairwise semantic similarity between all words on the Web is a computationally challenging task that requires parallelization and optimization. We propose a highly scalable approach to distributional similarity, implemented in the MapReduce framework and deployed over a 200-billion-word crawl of the Web. The pairwise similarity between 500 million terms is computed in 50 hours using 200 quad-core nodes. We apply the learned similarity matrix to the task of automatic set expansion and present a large empirical study that quantifies the effect of corpus size, corpus quality, seed composition, and seed size on expansion performance. We also publicly release an experimental testbed for set expansion analysis that includes a large collection of diverse entity sets extracted from Wikipedia.
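
The two ideas in the abstract, pairwise similarity computed as a MapReduce job and set expansion driven by that similarity matrix, can be illustrated with a minimal single-machine sketch. This is not the authors' implementation: it assumes terms are already represented as sparse weighted context-feature vectors (the toy VECTORS data and all function names are hypothetical), computes cosine similarity by inverting the vectors on features (map), grouping by feature (shuffle), and summing partial dot products per term pair (reduce), and then ranks candidate terms by their mean similarity to a seed set. Only pairs that share at least one feature ever receive a score, which is what makes the inverted-index formulation cheaper than comparing all pairs directly.

```python
from collections import defaultdict
from itertools import combinations
import math

# Hypothetical toy input: term -> {context feature: weight}. A real system would
# derive weights from corpus co-occurrence statistics (e.g. PMI).
VECTORS = {
    "apple":  {"eat": 2.0, "fruit": 3.0, "tree": 1.0},
    "pear":   {"eat": 1.0, "fruit": 2.0, "tree": 2.0},
    "banana": {"eat": 3.0, "fruit": 1.0},
    "honda":  {"drive": 3.0, "engine": 2.0},
    "toyota": {"drive": 2.0, "engine": 3.0, "eat": 0.5},
}


def map_phase(vectors):
    """Map: invert term vectors, emitting (feature, (term, weight)) pairs."""
    for term, feats in vectors.items():
        for feat, w in feats.items():
            yield feat, (term, w)


def shuffle(pairs):
    """Group mapper output by key, as the MapReduce framework would."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(groups):
    """Reduce: for each feature, emit a partial dot product for every term pair
    sharing it, and sum the partials per (term1, term2) key with term1 < term2."""
    dots = defaultdict(float)
    for postings in groups.values():
        for (t1, w1), (t2, w2) in combinations(sorted(postings), 2):
            dots[(t1, t2)] += w1 * w2
    return dots


def cosine_matrix(vectors):
    """Sparse pairwise cosine similarity; pairs sharing no feature are absent (0)."""
    norms = {t: math.sqrt(sum(w * w for w in f.values())) for t, f in vectors.items()}
    dots = reduce_phase(shuffle(map_phase(vectors)))
    return {pair: d / (norms[pair[0]] * norms[pair[1]]) for pair, d in dots.items()}


def expand(seeds, sims, vocab, k=2):
    """Hypothetical set expansion: rank non-seed terms by mean cosine to the seeds."""
    def sim(a, b):
        return sims.get((a, b), sims.get((b, a), 0.0))

    scores = {t: sum(sim(t, s) for s in seeds) / len(seeds)
              for t in vocab if t not in seeds}
    return sorted(scores, key=scores.get, reverse=True)[:k]


if __name__ == "__main__":
    sims = cosine_matrix(VECTORS)
    print(expand(["apple", "pear"], sims, VECTORS))  # → ['banana', 'toyota']
```

The reduce step is quadratic in the number of terms that share a feature, so at Web scale very frequent features are usually pruned or otherwise bounded; the sketch omits such optimizations for clarity.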

Patrick Pantel, Eric Crestan, Arkady Borkovsky, Ana-Maria Popescu, Vishnu Vyas

Distributional Similarity | EMNLP 2009 | Natural Language Processing | Pairwise Semantic Similarity | Pairwise Similarity

» Distributional Similarity vs PU Learning for Entity Set Expansion

» WebSets: Extracting Sets of Entities from the Web Using Unsupervised Information Extraction

» Semantic Multimedia Retrieval using Lexical Query Expansion and Model-Based Reranking

» Information Retrieval and Information Extraction in TREC Genomics 2007

» Explaining Similarity of Terms

» Exploiting Feature Hierarchy for Transfer Learning in Named Entity Recognition

» Gathering and Ranking Photos of Named Entities with High Precision, High Recall and Diversi...

» The Complexity of Early Deciding Set Agreement

Post Info

Added	17 Feb 2011
Updated	17 Feb 2011
Type	Conference
Year	2009
Where	EMNLP
Authors	Patrick Pantel, Eric Crestan, Arkady Borkovsky, Ana-Maria Popescu, Vishnu Vyas

Sciweavers