Current news interfaces are largely driven by recent information, even though many events are better interpreted in the context of previous related events. To address this problem, we consider the task of constructing an explicit representation of a "saga", i.e., a long-running series of related events. We define a timeline as a concrete representation of a saga. We propose two unsupervised methods for timeline construction and compare the timelines they produce to manually produced timelines using a measure based on tree edit distance. We present preliminary results for these techniques on a weblog corpus and a supplementary news corpus; they show both promise and challenges.
Ramnath Balasubramanyan, Frank Lin, William W. Coh