Sciweavers

161 search results - page 21 / 33
» Workshop on massive datasets
KDD
2004
ACM
195 views, Data Mining
14 years 7 months ago
Improved robustness of signature-based near-replica detection via lexicon randomization
Detection of near duplicate documents is an important problem in many data mining and information filtering applications. When faced with massive quantities of data, traditional d...
Aleksander Kolcz, Abdur Chowdhury, Joshua Alspecto...
KDD
2001
ACM
216 views, Data Mining
14 years 7 months ago
The distributed boosting algorithm
In this paper, we propose a general framework for distributed boosting intended for efficiently integrating specialized classifiers learned over very large and distributed homogeneo...
Aleksandar Lazarevic, Zoran Obradovic
INFOCOM
2008
IEEE
14 years 1 months ago
ALPACAS: A Large-Scale Privacy-Aware Collaborative Anti-Spam System
While the concept of collaboration provides a natural defense against massive spam emails directed at large numbers of recipients, designing effective collaborative anti-spam s...
Zhenyu Zhong, Lakshmish Ramaswamy, Kang Li
CIDM
2007
IEEE
14 years 1 months ago
Data Mining of MISR Aerosol Product using Spatial Statistics
In climate models, aerosol forcing is the major source of uncertainty in climate forcing over the industrial period. To reduce this uncertainty, instruments on satellites have...
Tao Shi, Noel Cressie
HIPC
2004
Springer
14 years 23 days ago
Performance Characteristics of a Cosmology Package on Leading HPC Architectures
The Cosmic Microwave Background (CMB) is a snapshot of the Universe some 400,000 years after the Big Bang. The pattern of anisotropies in the CMB carries a wealth of info...
Jonathan Carter, Julian Borrill, Leonid Oliker