Stop word detection is attempted in this work in the context of retrieval of document images in the compressed domain. Algorithms are presented to identify text lines and words an...
We present a novel approach to managing redundancy in sequence databanks such as GenBank. We store clusters of near-identical sequences as a representative union-sequence and a se...
Michael Cameron, Yaniv Bernstein, Hugh E. Williams
The design of efficient textual similarities is an important issue in the domain of textual data exploration. Textual similarities are for example central in document collection s...
Interactive clustering refers to situations in which a human labeler is willing to assist a learning algorithm in automatically clustering items. We present a related but somewhat...
Sumit Basu, Danyel Fisher, Steven M. Drucker, Hao ...
It is crucial in many information systems to organize short text segments, such as keywords in documents and queries from users, into a well-formed topic hierarchy. In this paper,...