Abstract In this paper, we describe a novel approach to intrinsic plagiarism detection. Each suspicious document is divided into a series of consecutive, potentially overlapping ...
Abstract. We propose a generative model for automatic query reformulations from an initial query using the underlying subtopic structure of top ranked retrieved documents. We addre...
Debasis Ganguly, Johannes Leveling, Gareth J. F. J...
Abstract This paper describes the University of Sheffield entry for the 3rd International Competition on Plagiarism Detection which attempted the monolingual external plagiarism d...
Rao Muhammad Adeel Nawab, Mark Stevenson, Paul D. ...
Abstract The paper overviews the vandalism detection task of the PAN'11 competition. A new corpus is introduced which comprises about 30 000 Wikipedia edits in the languages Engl...
Community QA portals provide an important resource for non-factoid question-answering. The inherent noisiness of user-generated data makes the identification of high-quality cont...
This paper presents Yagada, an algorithm to search labelled graphs for anomalies using both structural data and numeric attributes. Yagada is explained using several security-rela...
Michael Davis, Weiru Liu, Paul Miller, George Redp...
A fundamental problem related to RDF query processing is selectivity estimation, which is crucial to query optimization for determining a join order of RDF triple patterns. In thi...
This paper explores correspondence and mixture topic modeling of documents tagged from two different perspectives. There has been ongoing work in topic modeling of documents with...
In this paper, we reveal a common deficiency of the current retrieval models: the component of term frequency (TF) normalization by document length is not lower-bounded properly;...