Sciweavers

SIGIR
2008
ACM
13 years 7 months ago
SpotSigs: robust and efficient near duplicate detection in large web collections
Motivated by our work with political scientists who need to manually analyze large Web archives of news sites, we present SpotSigs, a new algorithm for extracting and matching sig...
Martin Theobald, Jonathan Siddharth, Andreas Paepc...
KDD
2009
ACM
14 years 2 months ago
On burstiness-aware search for document sequences
As the number and size of large timestamped collections (e.g. sequences of digitized newspapers, periodicals, blogs) increase, the problem of efficiently indexing and searching su...
Theodoros Lappas, Benjamin Arai, Manolis Platakis,...
SIGIR
2012
ACM
11 years 10 months ago
Optimizing positional index structures for versioned document collections
Versioned document collections are collections that contain multiple versions of each document. Important examples are Web archives, Wikipedia and other wikis, or source code and ...
Jinru He, Torsten Suel
IR
2008
13 years 7 months ago
Output-sensitive autocompletion search
We consider the following autocompletion search scenario: imagine a user of a search engine typing a query; then with every keystroke display those completions of the last query wo...
Holger Bast, Christian Worm Mortensen, Ingmar Webe...
SIGIR
2011
ACM
12 years 10 months ago
Pseudo test collections for learning web search ranking functions
Test collections are the primary drivers of progress in information retrieval. They provide a yardstick for assessing the effectiveness of ranking functions in an automatic, rapi...
Nima Asadi, Donald Metzler, Tamer Elsayed, Jimmy L...