A system has recently been proposed that saves a digital copy of every document users copy, print, or fax, without prompting the user. Referred to as the Infinite Memory Multifunction Machine (IM3), this system largely solves the problem of lost documents. However, because it captures data indiscriminately, users need easy-to-use retrieval tools. Two document analysis techniques are described that simplify retrieval from large collections such as the one the IM3 accumulates. The first detects duplicates or versions of a document. The second automatically files a document in a hierarchy familiar to the user. Experimental results are presented that illustrate the performance of each method.
Jonathan J. Hull, Dar-Shyang Lee, John F. Cullen,