
ACL
2008

Mining Wikipedia Revision Histories for Improving Sentence Compression

A well-recognized limitation of research on supervised sentence compression is the dearth of available training data. We propose a new and bountiful resource for such training data, which we obtain by mining the revision history of Wikipedia for sentence compressions and expansions. Using only a fraction of the available Wikipedia data, we have collected a training corpus of over 380,000 sentence pairs, two orders of magnitude larger than the standardly used Ziff-Davis corpus. Using this newfound data, we propose a novel lexicalized noisy channel model for sentence compression, achieving improved results in grammaticality and compression rate criteria with a slight decrease in importance.
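The mining step described in the abstract can be illustrated with a minimal sketch. It is not the authors' pipeline: it assumes that a compression or expansion shows up as a sentence in one revision whose words form an ordered subsequence of a sentence in the adjacent revision. The helper names (`tokenize`, `split_sentences`, `is_subsequence`, `mine_pairs`) and the `min_ratio` threshold are hypothetical, and the tokenizer and sentence splitter are deliberately crude.

```python
import re


def tokenize(sentence):
    """Split a sentence into lowercase word tokens (crude, for illustration)."""
    return re.findall(r"\w+", sentence.lower())


def split_sentences(text):
    """Naive splitter: break after '.', '!' or '?' followed by whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def is_subsequence(short, long):
    """True if every token of `short` occurs in `long` in the same order."""
    remaining = iter(long)
    return all(tok in remaining for tok in short)


def mine_pairs(old_text, new_text, min_ratio=0.5):
    """Return (longer, shorter) sentence pairs between two revisions.

    A pair is kept when the shorter sentence is the longer one with words
    deleted. This covers compressions (old is longer) and expansions
    (new is longer). `min_ratio` is an assumed filter that discards pairs
    where the shorter sentence keeps too few of the longer one's tokens.
    """
    pairs = []
    for old_sent in split_sentences(old_text):
        old_toks = tokenize(old_sent)
        for new_sent in split_sentences(new_text):
            new_toks = tokenize(new_sent)
            if not old_toks or not new_toks or len(old_toks) == len(new_toks):
                continue  # nothing was deleted or inserted
            if len(old_toks) > len(new_toks):
                longer, shorter, lt, st = old_sent, new_sent, old_toks, new_toks
            else:
                longer, shorter, lt, st = new_sent, old_sent, new_toks, old_toks
            if len(st) / len(lt) >= min_ratio and is_subsequence(st, lt):
                pairs.append((longer, shorter))
    return pairs


if __name__ == "__main__":
    old = "The committee met on Tuesday. It was a very long and tedious meeting."
    new = "The committee met on Tuesday. It was a long meeting."
    for longer, shorter in mine_pairs(old, new):
        print(longer, "->", shorter)
```

Pairs are always returned with the longer sentence first, so an expansion found in the history can be read in reverse as a compression example.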
Elif Yamangil, Rani Nelken
Added 29 Oct 2010
Updated 29 Oct 2010
Type Conference
Year 2008
Where ACL