ct tasks such as extraction of relational information from text [Young] [Jacobs]. Alternative systems [Biebricher] [Lewis] use statistical approaches such as conditional probabilities on summary representations of the documents. One problem with statistical representations of the training database is the high dimensionality of the training space, generally at least 150k unique single features -- or words. Such a large feature space makes it difficult to compute probabilities involving conjunctions or co-occurrence of features. It also makes t...

We describe a method for classifying news stories using Memory Based Reasoning (MBR) (a k-nearest neighbor method), that does not require manual topic definitions. Using an already coded training database of about 50,000 stories from the Dow Jones Press Release News Wire, and SEEKER [Stanfill] (a text retrieval system that supports relevance feedback) as the underlying match engine, codes are assigned to new, unseen stories with a recall of about 80%
Brij M. Masand, Gordon Linoff, David L. Waltz