The aim of query-based sampling is to obtain a sufficient, representative sample of an underlying (text) collection. Current measures for assessing sample quality are too coarse gr...
While classic information retrieval methods return whole documents as a result of a query, many information demands would be better satisfied by fine-grain access inside the docu...
Topic modeling has been a key problem for document analysis. One of the canonical approaches for topic modeling is Probabilistic Latent Semantic Indexing, which maximizes the join...
Deng Cai, Qiaozhu Mei, Jiawei Han, Chengxiang Zhai
A large amount of research, technical and professional documents are available today in digital formats. Digital libraries are created to facilitate search and retrieval of inform...
In this paper, we introduce an information theoretic method for estimating the usefulness of the hyperlink structure induced from the set of retrieved documents. We evaluate the e...