Sciweavers

ICDE
2004
IEEE
14 years 9 months ago
Improved File Synchronization Techniques for Maintaining Large Replicated Collections over Slow Networks
We study the problem of maintaining large replicated collections of files or documents in a distributed environment with limited bandwidth. This problem arises in a number of impo...
Torsten Suel, Patrick Noel, Dimitre Trendafilov
CIKM
2010
Springer
13 years 6 months ago
Improved index compression techniques for versioned document collections
Current Information Retrieval systems use inverted index structures for efficient query processing. Due to the extremely large size of many data sets, these index structures are u...
Jinru He, Junyuan Zeng, Torsten Suel
SAC
2005
ACM
14 years 1 month ago
A hierarchical naive Bayes mixture model for name disambiguation in author citations
Because of name variations, an author may have multiple names and multiple authors may share the same name. Such name ambiguity affects the performance of document retrieval, web ...
Hui Han, Wei Xu, Hongyuan Zha, C. Lee Giles
EDBT
2004
ACM
14 years 7 months ago
HOPI: An Efficient Connection Index for Complex XML Document Collections
In this paper we present HOPI, a new connection index for XML documents based on the concept of the 2-hop cover of a directed graph introduced by Cohen et al. In contrast to most o...
Ralf Schenkel, Anja Theobald, Gerhard Weikum
CIKM
2010
Springer
13 years 6 months ago
Document allocation policies for selective searching of distributed indexes
Indexes for large collections are often divided into shards that are distributed across multiple computers and searched in parallel to provide rapid interactive search. Typically,...
Anagha Kulkarni, Jamie Callan