CIKM
2008
Springer
13 years 11 months ago
An extension of PLSA for document clustering
In this paper we propose an extension of the PLSA model in which an extra latent variable allows the model to cocluster documents and terms simultaneously. We show on three datase...
Young-Min Kim, Jean-François Pessiot, Massi...
CIKM
2008
Springer
13 years 11 months ago
Estimating the number of answers with guarantees for structured queries in p2p databases
Structured P2P overlays supporting standard database functionalities are a popular choice for building large-scale distributed data management systems. In such systems, estimating...
Marcel Karnstedt, Kai-Uwe Sattler, Michael Haß...
CIKM
2008
Springer
13 years 11 months ago
A new method for indexing genomes using on-disk suffix trees
We propose a new method to build persistent suffix trees for indexing the genomic data. Our algorithm DiGeST (Disk-Based Genomic Suffix Tree) improves significantly over previous ...
Marina Barsky, Ulrike Stege, Alex Thomo, Chris Upt...
CIKM
2008
Springer
13 years 11 months ago
Modeling LSH for performance tuning
Although Locality-Sensitive Hashing (LSH) is a promising approach to similarity search in high-dimensional spaces, it has not been considered practical partly because its search q...
Wei Dong, Zhe Wang, William Josephson, Moses Chari...
CIKM
2008
Springer
13 years 11 months ago
To swing or not to swing: learning when (not) to advertise
Web textual advertising can be interpreted as a search problem over the corpus of ads available for display in a particular context. In contrast to conventional information retrie...
Andrei Z. Broder, Massimiliano Ciaramita, Marcus F...
CIKM
2008
Springer
13 years 11 months ago
Achieving both high precision and high recall in near-duplicate detection
To find near-duplicate documents, fingerprint-based paradigms such as Broder's shingling and Charikar's simhash algorithms have been recognized as effective approaches a...
Lian'en Huang, Lei Wang, Xiaoming Li
CIKM
2008
Springer
13 years 11 months ago
Combining concept hierarchies and statistical topic models
Statistical topic models provide a general data-driven framework for automated discovery of high-level knowledge from large collections of text documents. While topic models can p...
Chaitanya Chemudugunta, Padhraic Smyth, Mark Steyv...
CIKM
2008
Springer
13 years 11 months ago
Modeling multi-step relevance propagation for expert finding
An expert finding system allows a user to type a simple text query and retrieve names and contact information of individuals that possess the expertise expressed in the query. Thi...
Pavel Serdyukov, Henning Rode, Djoerd Hiemstra
CIKM
2008
Springer
13 years 11 months ago
Trada: tree based ranking function adaptation
Machine Learned Ranking approaches have shown successes in web search engines. With the increasing demands on developing effective ranking functions for different search domains, ...
Keke Chen, Rongqing Lu, C. K. Wong, Gordon Sun, La...