Community QA portals provide an important resource for non-factoid question-answering. The inherent noisiness of user-generated data makes the identification of high-quality cont...
This paper presents Yagada, an algorithm to search labelled graphs for anomalies using both structural data and numeric attributes. Yagada is explained using several security-rela...
Michael Davis, Weiru Liu, Paul Miller, George Redp...
A fundamental problem related to RDF query processing is selectivity estimation, which is crucial to query optimization for determining a join order of RDF triple patterns. In thi...
This paper explores correspondence and mixture topic modeling of documents tagged from two different perspectives. There has been ongoing work in topic modeling of documents with...
In this paper, we reveal a common deficiency of the current retrieval models: the component of term frequency (TF) normalization by document length is not lower-bounded properly;...
Many large-scale Web applications that require ranked top-k retrieval are implemented using inverted indices. An inverted index represents a sparse term-document matrix, where non...
George Beskales, Marcus Fontoura, Maxim Gurevich, ...
Aggregated search is the task of integrating results from potentially multiple specialized search services, or verticals, into the Web search results. The task requires predicting...
This paper addresses the problem of mining named entity translations from comparable corpora, specifically, mining English and Chinese named entity translation. We first observe...
Jinhan Kim, Long Jiang, Seung-won Hwang, Young-In ...
Modern computing systems generate large amounts of log data. System administrators or domain experts utilize the log data to understand and optimize system behaviors. Most system ...