Latent topic models have been successfully applied as an unsupervised topic discovery technique in large document collections. With the proliferation of hypertext document collect...
We present a query-driven algorithm for the distributed indexing of large document collections within structured P2P networks. To cope with bandwidth consumption that has been ide...
Gleb Skobeltsyn, Toan Luu, Ivana Podnar Zarko, Mar...
This article describes a sketch-based framework for semi-automatic annotation of historical document collections. It is motivated by the fact that fully automatic methods, while h...
This paper addresses the challenging problem of similarity search over widely distributed ultra-high dimensional data. Such an application is retrieval of the top-k most similar d...
Abstract. Latent Semantic Indexing(LSI) has been proved to be effective to capture the semantic structure of document collections. It is widely used in content-based text retrieval...
Full-text information retrieval systems have traditionally been designed for archival environments. They often provide little or no support for adding new documents to an existing...
CT This paper explores several methods for visualizing the thematic content of large document collections. As opposed to traditional query-driven document retrieval, these methods ...
Nancy Miller, Elizabeth G. Hetzler, Grant Nakamura...
Many web documents (such as JAVA FAQs) are being replicated on the Internet. Often entire document collections (such as hyperlinked Linux manuals) are being replicated many times....
A hierarchical browsing interface to a document collection can be constructed by identifying the phrases that recur in the full text of the documents and structuring them into a h...
A significant amount of information is stored in computer systems today, but people are struggling to manage their documents such that the information is easily found. XML is a de...