Abstract— The Topic Detection and Tracking (TDT) research community investigates information retrieval methods for organizing a constantly arriving stream of news articles by the events that they discuss. Our best system for the open evaluations of TDT has used an approach that turned out to be problematic when the cluster detection technology was deployed in a real world setting. To avoid generating “garbage” clusters, we had to revert to a different approach and to explore engineering solutions that were not motivated by the model. Our experiences also led us to propose extensions to the formal TDT evaluation. I. OVERVIEW The Topic Detection and Tracking (TDT) research community investigates information retrieval methods for organizing a constantly arriving stream of news articles by the events that they discuss. TDT is explored in an open and cooperative evaluation sponsored by DARPA and run by NIST; the evaluations have run every year since 1998. One of the organization tasks...
James Allan, Stephen M. Harding, David Fisher, Alv