Given a set D = {d1, d2, ..., dD} of D strings of total length n, our task is to report the "most relevant" strings for a given query pattern P. This involves somewhat mo...
In the KL divergence framework, the extended language modeling approach has a critical problem estimating a query model, which is the probabilistic model that encodes user’s inf...
Topic modeling has been a key problem for document analysis. One of the canonical approaches for topic modeling is Probabilistic Latent Semantic Indexing, which maximizes the join...
Deng Cai, Qiaozhu Mei, Jiawei Han, Chengxiang Zhai
Semantic similarity between words or phrases is frequently used to find matching correlations between search queries and documents when straightforward matching of terms fails. Th...
We present a method for automated topic suggestion. Given a plain-text input document, our algorithm produces a ranking of novel topics that could enrich the input document in a m...