Static index pruning techniques aim at removing from the posting lists of an inverted file the references to documents which are likely to be not relevant for answering user querie...
This paper offers a novel look at using a dimensionalityreduction technique called simhash [8] to detect similar document pairs in large-scale collections. We show that this algo...
Many large-scale Web applications that require ranked top-k retrieval are implemented using inverted indices. An inverted index represents a sparse term-document matrix, where non...
George Beskales, Marcus Fontoura, Maxim Gurevich, ...
The internet causes a continuous emergence of novel forms of scholarly communication and collaboration. Electronic publishing provides a means for representing eventual outcomes o...
Wolfram Horstmann, Peter Reimer, Jochen Schirrwage...
This paper presents our ongoing effort on developing a principled methodology for automatic ontology mapping based on BayesOWL, a probabilistic framework we developed for modeling ...