

Collecting paraphrase corpora from volunteer contributors

Extensive and deep paraphrase corpora are important for a variety of natural language processing and user interaction tasks. In this paper, we present an approach which i) collects multiple paraphrases per given item from volunteers and ii) incentivises responsible contributions by volunteer contributors. Our approach is to solicit paraphrases from Web volunteers, both collecting new paraphrases with no prompting and asking contributors to guess partially obfuscated paraphrases. To test the approach, we have implemented an online game, 1001 Paraphrases, and deployed it to collect 20,944 entries focused on paraphrases of 400 statements. The approach complements existing text extraction methods and has some inherent unique advantages. We present and motivate our design as well as share preliminary observations and lessons learned about the performance of the approach.
Categories and Subject Descriptors I.2.6 [Artificial Intelligence]: Learning – k...
Timothy Chklovski
Added 26 Jun 2010
Updated 26 Jun 2010
Type Conference
Year 2005
Where KCAP
Authors Timothy Chklovski