Sciweavers

725 search results - page 94 / 145
» A Generalization of Repetition Threshold
WWW
2008
ACM
14 years 9 months ago
Efficient similarity joins for near duplicate detection
With the increasing amount of data and the need to integrate data from multiple data sources, a challenging issue is to find near duplicate records efficiently. In this paper, we ...
Chuan Xiao, Wei Wang 0011, Xuemin Lin, Jeffrey Xu ...
WWW
2007
ACM
14 years 9 months ago
Causal relation of queries from temporal logs
In this paper, we study a new problem of mining causal relation of queries in search engine query logs. Causal relation between two queries means event on one query is the causati...
Yizhou Sun, Kunqing Xie, Ning Liu, Shuicheng Yan, ...
WWW
2007
ACM
14 years 9 months ago
EPCI: extracting potentially copyright infringement texts from the web
In this paper, we propose a new system extracting potentially copyright infringement texts from the Web, called EPCI. EPCI extracts them in the following way: (1) generating a set...
Takashi Tashiro, Takanori Ueda, Taisuke Hori, Yu H...
KDD
2009
ACM
14 years 9 months ago
Probabilistic frequent itemset mining in uncertain databases
Probabilistic frequent itemset mining in uncertain transaction databases semantically and computationally differs from traditional techniques applied to standard "certain"...
Andreas Züfle, Florian Verhein, Hans-Peter Kr...
KDD
2009
ACM
14 years 9 months ago
Correlated itemset mining in ROC space: a constraint programming approach
Correlated or discriminative pattern mining is concerned with finding the highest scoring patterns w.r.t. a correlation measure (such as information gain). By reinterpreting corre...
Siegfried Nijssen, Tias Guns, Luc De Raedt