Sciweavers

SIGIR
2006
ACM
Finding near-duplicate web pages: a large-scale evaluation of algorithms
Broder et al.’s [3] shingling algorithm and Charikar’s [4] random-projection-based approach are considered “state-of-the-art” algorithms for finding near-duplicate web pag...
Monika Rauch Henzinger
SIGIR
2006
ACM
Building a test collection for complex document information processing
Research and development of information access technology for scanned paper documents has been hampered by the lack of public test collections of realistic scope and complexity. A...
David D. Lewis, Gady Agam, Shlomo Argamon, Ophir F...
SIGIR
2006
ACM
Measuring similarity of semi-structured documents with context weights
In this work, we study similarity measures for text-centric XML documents based on an extended vector space model, which considers both document content and structure. Experimenta...
Christopher C. Yang, Nan Liu
SIGIR
2006
ACM
Learning a ranking from pairwise preferences
We introduce a novel approach to combining rankings from multiple retrieval systems. We use a logistic regression model or an SVM to learn a ranking from pairwise document prefere...
Ben Carterette, Desislava Petkova
SIGIR
2006
ACM
User expectations from XML element retrieval
The primary aim of XML element retrieval is to return XML elements to users, rather than whole documents. This poster describes a small study in which we elicited users’ expect...
Stamatina Betsi, Mounia Lalmas, Anastasios Tombros...
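The near-duplicate detection result in this listing names Broder et al.’s shingling algorithm as a standard baseline. As a minimal sketch of the underlying idea only (not the implementation evaluated in that paper), two pages can be split into contiguous word k-shingles and compared by the Jaccard resemblance of their shingle sets. The choice of k=4, lowercase whitespace tokenisation, and the helper names `shingles` and `resemblance` are illustrative assumptions; Broder et al.’s scheme additionally samples shingles with min-wise hashing so it scales to web-sized collections, which this sketch omits.

```python
from __future__ import annotations


def shingles(text: str, k: int = 4) -> set[tuple[str, ...]]:
    """Return the set of contiguous k-word shingles of a text (hypothetical helper)."""
    words = text.lower().split()
    if not words:
        return set()
    if len(words) < k:
        # Texts shorter than k words become a single shingle.
        return {tuple(words)}
    return {tuple(words[i:i + k]) for i in range(len(words) - k + 1)}


def resemblance(a: str, b: str, k: int = 4) -> float:
    """Jaccard resemblance |S(a) & S(b)| / |S(a) | S(b)| of the two shingle sets."""
    sa, sb = shingles(a, k), shingles(b, k)
    if not sa and not sb:
        return 1.0  # two empty texts are treated as identical
    return len(sa & sb) / len(sa | sb)


if __name__ == "__main__":
    page1 = "the quick brown fox jumps over the lazy dog"
    page2 = "the quick brown fox jumps over the lazy cat"
    # 6 shingles each, 5 shared, 7 distinct overall: 5/7
    print(round(resemblance(page1, page2), 3))  # → 0.714
```

Pages whose resemblance exceeds some chosen threshold would then be flagged as near-duplicates; the threshold itself is a tuning decision not specified here.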