Aligning Needles in a Haystack: Paraphrase Acquisition Across the Web

This paper presents a lightweight method for unsupervised extraction of paraphrases from arbitrary textual Web documents. The method differs from previous approaches to paraphrase acquisition in that 1) it removes the assumptions on the quality of the input data, by using inherently noisy, unreliable Web documents rather than clean, trustworthy, properly formatted documents; and 2) it does not require any explicit clue indicating which documents are likely to encode parallel paraphrases, as they report on the same events or describe the same stories. Large sets of paraphrases are collected through exhaustive pairwise alignment of small needles, i.e., sentence fragments, across a haystack of Web document sentences. The paper describes experiments on a set of about one billion Web documents, and evaluates the extracted paraphrases in a natural-language Web search application.
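
The abstract does not spell out how the fragments are aligned, so the following is only a minimal sketch of one plausible reading, not the authors' implementation. It assumes that two fragments count as aligned when they share the same few words at both ends (the "anchors") and differ in the middle, and that the differing middles are then proposed as paraphrase candidates. The function names `fragments` and `candidate_paraphrases`, the whitespace tokenization, and the length parameters are all hypothetical choices made for illustration.

```python
from collections import defaultdict
from itertools import combinations


def fragments(tokens, anchor_len=2, min_mid=1, max_mid=3):
    """Yield (left_anchor, middle, right_anchor) for every window in a sentence.

    anchor_len, min_mid and max_mid are illustrative values, not taken
    from the paper.
    """
    n = len(tokens)
    for start in range(n):
        for mid_len in range(min_mid, max_mid + 1):
            end = start + anchor_len + mid_len + anchor_len
            if end > n:
                break  # longer middles will not fit either
            left = tuple(tokens[start:start + anchor_len])
            mid = tuple(tokens[start + anchor_len:end - anchor_len])
            right = tuple(tokens[end - anchor_len:end])
            yield left, mid, right


def candidate_paraphrases(sentences, **kwargs):
    """Group fragment middles by shared anchors and count co-occurring pairs."""
    # Assumption: fragments align when both anchors match exactly.
    buckets = defaultdict(set)
    for sentence in sentences:
        tokens = sentence.lower().split()  # naive tokenizer, for the sketch only
        for left, mid, right in fragments(tokens, **kwargs):
            buckets[(left, right)].add(mid)

    # Every pair of distinct middles seen between the same anchors is a candidate;
    # the count is the number of distinct anchor pairs supporting it.
    pairs = defaultdict(int)
    for mids in buckets.values():
        for a, b in combinations(sorted(mids), 2):
            pairs[(" ".join(a), " ".join(b))] += 1
    return dict(pairs)


if __name__ == "__main__":
    demo = [
        "the city was ravaged by the flood in 1999",
        "the city was destroyed by the flood in 1999",
    ]
    for pair, count in sorted(candidate_paraphrases(demo).items()):
        print(pair, count)
```

Bucketing by anchors avoids comparing every fragment with every other fragment directly, which matters at Web scale; a real system would also need filtering of noisy candidates, which this sketch omits.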

Marius Pasca, Péter Dienes

IJCNLP 2005 | Natural Language Processing | Paraphrases | Textual Web Documents | Web Documents

Post Info

Added	27 Jun 2010
Updated	27 Jun 2010
Type	Conference
Year	2005
Where	IJCNLP
Authors	Marius Pasca, Péter Dienes
