Sciweavers

CIKM
2011
Springer
12 years 8 months ago
Factorization-based lossless compression of inverted indices
Many large-scale Web applications that require ranked top-k retrieval are implemented using inverted indices. An inverted index represents a sparse term-document matrix, where non...
George Beskales, Marcus Fontoura, Maxim Gurevich, ...
CIKM
2011
Springer
12 years 8 months ago
Learning to aggregate vertical results into web search results
Aggregated search is the task of integrating results from potentially multiple specialized search services, or verticals, into the Web search results. The task requires predicting...
Jaime Arguello, Fernando Diaz, Jamie Callan
CIKM
2011
Springer
12 years 8 months ago
Mining entity translations from comparable corpora: a holistic graph mapping approach
This paper addresses the problem of mining named entity translations from comparable corpora, specifically, mining English and Chinese named entity translation. We first observe...
Jinhan Kim, Long Jiang, Seung-won Hwang, Young-In ...
CIKM
2011
Springer
12 years 8 months ago
LogSig: generating system events from raw textual logs
Modern computing systems generate large amounts of log data. System administrators or domain experts utilize the log data to understand and optimize system behaviors. Most system ...
Liang Tang, Tao Li, Chang-Shing Perng
CIKM
2011
Springer
12 years 8 months ago
Classifying trending topics: a typology of conversation triggers on Twitter
Twitter summarizes the large number of messages posted by users in the form of trending topics that reflect the top conversations being discussed at a given moment. These trending ...
Arkaitz Zubiaga, Damiano Spina, Víctor Fres...
CIKM
2011
Springer
12 years 8 months ago
The impact of author ranking in a library catalogue
The field of information retrieval has witnessed over 50 years of research on retrieval methods for metadata descriptions and controlled indexing languages, the prototypical exam...
Jaap Kamps
CIKM
2011
Springer
12 years 8 months ago
Joint inference for cross-document information extraction
Previous information extraction (IE) systems are typically organized as a pipeline architecture of separated stages which make independent local decisions. When the data grows bey...
Qi Li, Sam Anzaroot, Wen-Pin Lin, Xiang Li, Heng J...
CIKM
2011
Springer
12 years 8 months ago
Probabilistic near-duplicate detection using simhash
This paper offers a novel look at using a dimensionality-reduction technique called simhash [8] to detect similar document pairs in large-scale collections. We show that this algo...
Sadhan Sood, Dmitri Loguinov
CIKM
2011
Springer
12 years 8 months ago
Integrating and querying web databases and documents
There exist many interrelated information sources on the Internet that can be categorized into structured (database) and semistructured (documents). A key challenge is to integrat...
Carlos Garcia-Alvarado, Carlos Ordonez