Search Sciweavers | Sciweavers

» Digital Information Retrieval

WWW 2007 (ACM), Internet Technology

On anonymizing query logs via token-based hashing

Download www2007.org

In this paper we study the privacy preservation properties of a specific technique for query log anonymization: token-based hashing. In this approach, each query is tokenized, and ...

Ravi Kumar, Jasmine Novak, Bo Pang, Andrew Tomkins

WWW 2007 (ACM), Internet Technology

A new suffix tree similarity measure for document clustering

Download www2007.org

In this paper, we propose a new similarity measure to compute the pairwise similarity of text-based documents based on the suffix tree document model. By applying the new suffix tree ...

Hung Chim, Xiaotie Deng

WWW 2007 (ACM), Internet Technology

Detecting near-duplicates for web crawling

Download infolab.stanford.edu

Near-duplicate web documents are abundant. Two such documents differ from each other in a very small portion that displays advertisements, for example. Such differences are irrele...

Gurmeet Singh Manku, Arvind Jain, Anish Das Sarma

WWW 2007 (ACM), Internet Technology

Topic sentiment mixture: modeling facets and opinions in weblogs

Download sifaka.cs.uiuc.edu

In this paper, we define the problem of topic-sentiment analysis on Weblogs and propose a novel probabilistic model to capture the mixture of topics and sentiments simultaneously....

Qiaozhu Mei, Xu Ling, Matthew Wondra, Hang Su, Che...

WWW 2006 (ACM), Internet Technology

Towards practical genre classification of web documents

Download es.csiro.au

Classification of documents by genre is typically done either using linguistic analysis or term frequency based techniques. The former provides better classification accuracy than...

George Ferizis, Peter Bailey
