Many modern natural language-processing applications utilize search engines to locate large numbers of Web documents or to compute statistics over the Web corpus. Yet Web search e...
Topic modeling has been a key problem for document analysis. One of the canonical approaches for topic modeling is Probabilistic Latent Semantic Indexing, which maximizes the join...
Deng Cai, Qiaozhu Mei, Jiawei Han, Chengxiang Zhai
This paper describes the participation of Columbus Project of Microsoft Research Asia (MSRA) in the GeoCLEF 2006 (a cross-language geographical retrieval track which is part of Cr...
Zhisheng Li, Chong Wang 0002, Xing Xie, Xufa Wang,...
We use Wikipedia articles to semantically inform the generation of query models. To this end, we apply supervised machine learning to automatically link queries to Wikipedia artic...
We present a new algorithm to measure domain-specific readability. It iteratively computes the readability of domainspecific resources based on the difficulty of domain-specific c...