In this paper, we propose a versatile disambiguation approach which can be used to make explicit the meaning of structure based information such as XML schemas, XML document struc...
Wikipedia is the largest monolithic repository of human knowledge. In addition to its sheer size, it represents a new encyclopedic paradigm by interconnecting articles through hyp...
Sequential patterns of d-gaps exist pervasively in inverted lists of Web document collection indices due to the cluster property. In this paper the information of d-gap sequential...
In Latent Semantic Indexing (LSI), a collection of documents is often pre-processed to form a sparse term-document matrix, followed by a computation of a low-rank approximation to...
We consider the problem of document indexing and representation. Recently, Locality Preserving Indexing (LPI) was proposed for learning a compact document subspace. Different from...
Deng Cai, Xiaofei He, Wei Vivian Zhang, Jiawei Han