Sciweavers

52 search results - page 5 / 11
» Effective compression for the web: exploiting document linka...
Sort
View
WWW
2005
ACM
14 years 8 months ago
Improving Web search efficiency via a locality based static pruning method
The unarguably fast, and continuous, growth of the volume of indexed (and indexable) documents on the Web poses a great challenge for search engines. This is true regarding not on...
Edleno Silva de Moura, Célia Francisca dos ...
WWW
2003
ACM
14 years 8 months ago
Detecting Near-replicas on the Web by Content and Hyperlink Analysis
The presence of replicas or near-replicas of documents is very common on the Web. Documents may be replicated completely or partially for different reasons (versions, mirrors, etc...
Ernesto Di Iorio, Michelangelo Diligenti, Marco Go...
CACM
1998
110views more  CACM 1998»
13 years 7 months ago
Viewing WISs as Database Applications
abstraction for modeling these problems is to view the Web as a collection of (usually small and heterogeneous) databases, and to view programs that extract and process Web data au...
Gustavo O. Arocena, Alberto O. Mendelzon
WISE
2009
Springer
14 years 4 months ago
Entry Pairing in Inverted File
Abstract. This paper proposes to exploit content and usage information to rearrange an inverted index for a full-text IR system. The idea is to merge the entries of two frequently ...
Hoang Thanh Lam, Raffaele Perego, Nguyen Thoi Minh...
WEBI
2005
Springer
14 years 1 months ago
A Semi-Supervised Document Clustering Algorithm Based on EM
Document clustering is a very hard task in Automatic Text Processing since it requires to extract regular patterns from a document collection without a priori knowledge on the cat...
Leonardo Rigutini, Marco Maggini