To summarize is to reduce in complexity, and hence in length, while retaining some of the essential qualities of the original. This paper focusses on document extracts, a particular kind of computed document summary. Document extracts consisting of roughly 20% of the original can be as informative as the full text of a document, which suggests that even shorter extracts may be useful indicative summaries. The trends in our results are in agreement with those of Edmundson, who used a subjectively weighted combination of features as opposed to training the feature weights using a corpus. We have developed a trainable summarization program that is grounded in a sound statistical framework.
Julian Kupiec, Jan O. Pedersen, Francine Chen