Semantic analysis of a document collection can be viewed as an unsupervised clustering of the constituent words and documents around hidden or latent concepts. This has shown to i...
Abstract. This paper suggests a novel representation for documents that is intended to improve precision. This representation is generated by combining two central techniques: Rand...
Inverted indexes are the most fundamental and widely used data structures in information retrieval. For each unique word occurring in a document collection, the inverted index sto...
Manish Patil, Sharma V. Thankachan, Rahul Shah, Wi...
Document retrieval systems conventionally use words as the basic unit of representation, a natural choice since words are primary carriers of semantic information. In this paper w...
With the new interest in historical documents insight grew that electronic access to these texts causes many specific problems. In the first part of the paper we survey the presen...
Andreas Hauser, Markus Heller, Elisabeth Leiss, Kl...