In this paper we study the privacy preservation properties of a specific technique for query log anonymization: tokenbased hashing. In this approach, each query is tokenized, and ...
Ravi Kumar, Jasmine Novak, Bo Pang, Andrew Tomkins
In this paper, we define the problem of topic-sentiment analysis on Weblogs and propose a novel probabilistic model to capture the mixture of topics and sentiments simultaneously....
Qiaozhu Mei, Xu Ling, Matthew Wondra, Hang Su, Che...
Sequential patterns of d-gaps exist pervasively in inverted lists of Web document collection indices due to the cluster property. In this paper the information of d-gap sequential...
As part of a large effort to acquire large repositories of facts from unstructured text on the Web, a seed-based framework for textual information extraction allows for weakly sup...
The automatic identification of a user's task has the potential to improve information filtering systems that rely on implicit measures of interest and whose effectiveness ma...