

Cross-document summarization by concept classification

In this paper we describe XDoX, a cross-document summarizer designed specifically to summarize large document sets (50-500 documents or more). Such sets are typically obtained from routing or filtering systems run against a continuous stream of data, such as a newswire. XDoX works by identifying the most salient themes within the set, at a level of granularity controlled by the user, and composing an extractive summary that reflects these main themes. In the current version, XDoX is not optimized to produce a summary from a few unrelated documents; such summaries are best obtained simply by concatenating summaries of the individual documents. We show examples of summaries obtained in our tests as well as from our participation in the first Document Understanding Conference (DUC).
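The general idea in the abstract (group passages into themes at a user-chosen granularity, then extract text from the most salient themes) can be illustrated with a minimal sketch. This is not the XDoX algorithm, which the abstract does not specify. The greedy word-overlap clustering, the "theme size equals salience" rule, and all names (summarize, group_into_themes, granularity, max_themes, DOCS) are assumptions made for illustration, and the sample sentences are invented.

```python
import re

# Tiny stopword list, chosen only for this illustration.
STOPWORDS = {"the", "a", "an", "of", "and", "in", "to", "by", "on", "for"}


def tokenize(text):
    """Lowercase the text and return its non-stopword alphabetic tokens."""
    return [w for w in re.findall(r"[a-z]+", text.lower()) if w not in STOPWORDS]


def jaccard(a, b):
    """Jaccard overlap of two term sets (0.0 when both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def group_into_themes(sentences, granularity):
    """Greedy one-pass clustering (an assumed stand-in for theme detection).

    A sentence joins the first theme whose vocabulary overlaps it by at
    least `granularity`; otherwise it starts a new theme. Higher values
    yield more, finer-grained themes.
    """
    themes = []
    for index, sentence in enumerate(sentences):
        terms = set(tokenize(sentence))
        for theme in themes:
            if jaccard(terms, theme["terms"]) >= granularity:
                theme["members"].append(index)
                theme["terms"] |= terms
                break
        else:
            themes.append({"terms": set(terms), "members": [index]})
    return themes


def summarize(documents, granularity=0.2, max_themes=2):
    """Return one sentence from each of the largest themes, in source order."""
    sentences = [
        s.strip()
        for doc in documents
        for s in re.split(r"(?<=[.!?])\s+", doc)
        if s.strip()
    ]
    themes = group_into_themes(sentences, granularity)
    # Assumption: a theme's salience is simply the number of sentences in it.
    themes.sort(key=lambda t: len(t["members"]), reverse=True)
    picked = sorted(t["members"][0] for t in themes[:max_themes])
    return [sentences[i] for i in picked]


# Invented sample data, not taken from the paper or from DUC.
DOCS = [
    "The central bank raised interest rates. Flooding closed the coastal highway.",
    "The central bank raised interest rates again.",
    "Interest rates raised by the central bank hit borrowers.",
]

if __name__ == "__main__":
    for line in summarize(DOCS, granularity=0.2, max_themes=2):
        print(line)
```

In this sketch, lowering granularity merges more sentences into fewer, broader themes, while raising it splits the set into narrower ones. That is one simple way to picture a user-regulated granularity level.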
Added 23 Dec 2010
Updated 23 Dec 2010
Type Journal
Year 2002
Authors Hilda Hardy, Nobuyuki Shimizu, Tomek Strzalkowski, Ting Liu, Xinyang Zhang, G. Bowden Wise