Sciweavers

WWW
2007
ACM
On anonymizing query logs via token-based hashing
In this paper we study the privacy preservation properties of a specific technique for query log anonymization: token-based hashing. In this approach, each query is tokenized, and ...
Ravi Kumar, Jasmine Novak, Bo Pang, Andrew Tomkins
WWW
2007
ACM
A new suffix tree similarity measure for document clustering
In this paper, we propose a new similarity measure to compute the pairwise similarity of text-based documents based on the suffix tree document model. By applying the new suffix tree ...
Hung Chim, Xiaotie Deng
WWW
2007
ACM
Detecting near-duplicates for web crawling
Near-duplicate web documents are abundant. Two such documents differ from each other in a very small portion that displays advertisements, for example. Such differences are irrele...
Gurmeet Singh Manku, Arvind Jain, Anish Das Sarma
WWW
2007
ACM
Topic sentiment mixture: modeling facets and opinions in weblogs
In this paper, we define the problem of topic-sentiment analysis on Weblogs and propose a novel probabilistic model to capture the mixture of topics and sentiments simultaneously....
Qiaozhu Mei, Xu Ling, Matthew Wondra, Hang Su, Che...
WWW
2006
ACM
Question answering on top of the BT digital library
In this poster we present an approach to query answering over knowledge sources that makes use of different ontology management components within an application scenario of the BT...
Johanna Völker, Peter Haase, Philipp Cimiano,...