Sciweavers

ICDM
2009
IEEE

Multi-document Summarization by Information Distance

Abstract: We live in a world where information grows and changes quickly. Automatic document summarization and update techniques help readers acquire knowledge more efficiently. This paper describes a novel approach to multi-document update summarization. The best summary is defined as the one with the minimal information distance to the entire document set. The best update summary is the one with the minimal conditional information distance to a document cluster, given that a prior document cluster has already been read. We propose two methods to approximate the information distance between two documents, one based on compression and the other on coding theory. Experiments on the DUC 2007 and TAC 2008 datasets show that our method correlates closely with human-written summaries and outperforms LexRank in many categories under the ROUGE evaluation criterion.
Keywords: Data Mining; Text Mining; Kolmogorov Complexity; Information Distance
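The compression-based idea in the abstract can be illustrated with a small sketch. It is not the authors' implementation: it swaps in the standard normalized compression distance (NCD), uses zlib as a stand-in for the uncomputable Kolmogorov complexity, and picks sentences greedily. The function names `compressed_size`, `ncd`, and `greedy_summary` are hypothetical, and zlib gives only a rough estimate on short texts.

```python
import zlib


def compressed_size(text: str) -> int:
    """Approximate the Kolmogorov complexity of text by its zlib-compressed length."""
    return len(zlib.compress(text.encode("utf-8"), 9))


def ncd(x: str, y: str) -> float:
    """Normalized compression distance, a computable proxy for information distance.

    Assumption: this is the textbook NCD formula, not necessarily the
    approximation used in the paper.
    """
    cx, cy = compressed_size(x), compressed_size(y)
    cxy = compressed_size(x + " " + y)
    return (cxy - min(cx, cy)) / max(cx, cy)


def greedy_summary(sentences, max_sentences=2):
    """Greedily add the sentence that brings the summary closest to the document set."""
    document = " ".join(sentences)
    chosen = []
    remaining = list(sentences)
    while remaining and len(chosen) < max_sentences:
        # Score each candidate by the distance of the extended summary to the full text.
        best = min(remaining, key=lambda s: ncd(" ".join(chosen + [s]), document))
        chosen.append(best)
        remaining.remove(best)
    return chosen


if __name__ == "__main__":
    docs = [
        "Information on the web grows and changes quickly.",
        "Automatic summarization helps readers keep up with new documents.",
        "A good summary stays close to the whole document set.",
        "Update summaries assume the reader has seen earlier documents.",
    ]
    for sentence in greedy_summary(docs, max_sentences=2):
        print(sentence)
```

An update summary in the sense of the abstract would additionally condition on the prior cluster, for example by compressing the candidate together with the already-read text; that extension is left out here to keep the sketch short.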
Chong Long, Minlie Huang, Xiaoyan Zhu, Ming Li
Added 23 May 2010
Updated 23 May 2010
Type Conference
Year 2009
Where ICDM
Authors Chong Long, Minlie Huang, Xiaoyan Zhu, Ming Li