Sciweavers

129 search results - page 6 / 26
» Fully distributed EM for very large datasets
CCGRID
2010
IEEE
13 years 8 months ago
High Performance Dimension Reduction and Visualization for Large High-Dimensional Data Analysis
Large high dimension datasets are of growing importance in many fields and it is important to be able to visualize them for understanding the results of data mining appro...
Jong Youl Choi, Seung-Hee Bae, Xiaohong Qiu, Geoff...
KDD
2009
ACM
198 views, Data Mining
14 years 8 months ago
Pervasive parallelism in data mining: dataflow solution to co-clustering large and sparse Netflix data
All Netflix Prize algorithms proposed so far are prohibitively costly for large-scale production systems. In this paper, we describe an efficient dataflow implementation of a coll...
Srivatsava Daruru, Nena M. Marin, Matt Walker, Joy...
TSP
2008
91 views
13 years 7 months ago
A Sequential Monte Carlo Method for Motif Discovery
We propose a sequential Monte Carlo (SMC)-based motif discovery algorithm that can efficiently detect motifs in datasets containing a large number of sequences. The statistical di...
Kuo-ching Liang, Xiaodong Wang, Dimitris Anastassi...
ICDM
2007
IEEE
129 views, Data Mining
14 years 2 months ago
Semi-supervised Clustering Using Bayesian Regularization
Text clustering is most commonly treated as a fully automated task without user supervision. However, we can improve clustering performance using supervision in the form of pairwi...
Zuobing Xu, Ram Akella, Mike Ching, Renjie Tang
ICC
2007
IEEE
102 views, Communications
14 years 2 months ago
Distributed Scheduling in Input Queued Switches
Dealing with RTTs (Round Trip Time) in IQ switches has been recently recognized as a challenging problem, especially if considering distributed (multi-chip) scheduler implement...
Alessandra Scicchitano, Andrea Bianco, Paolo Giacc...