This paper introduces a statistical model for query-relevant summarization: succinctly characterizing the relevance of a document to a query. Learning parameter values for the proposed model requires a large collection of summarized documents, which we do not have; as a proxy, we use a collection of FAQ (frequently asked question) documents. Taking a learning approach enables a principled, quantitative evaluation of the proposed system, and the results of some initial experiments (on a collection of Usenet FAQs and on a FAQ-like set of customer-submitted questions to several large retail companies) suggest the plausibility of learning for summarization.
Adam L. Berger, Vibhu O. Mittal