Sciweavers

KDD
2004
ACM
Data Mining
14 years 11 months ago
A probabilistic framework for semi-supervised clustering
Unsupervised clustering can be significantly improved using supervision in the form of pairwise constraints, i.e., pairs of instances labeled as belonging to same or different clu...
Sugato Basu, Mikhail Bilenko, Raymond J. Mooney
KDD
2003
ACM
Data Mining
14 years 11 months ago
Probabilistic discovery of time series motifs
Several important time series data mining problems reduce to the core task of finding approximately repeated subsequences in a longer time series. In an earlier work, we formalize...
Bill Yuan-chi Chiu, Eamonn J. Keogh, Stefano Lonar...
STOC
2003
ACM
Algorithms
14 years 11 months ago
Better streaming algorithms for clustering problems
We study clustering problems in the streaming model, where the goal is to cluster a set of points by making one pass (or a few passes) over the data using a small amount of storag...
Moses Charikar, Liadan O'Callaghan, Rina Panigrahy
STOC
2003
ACM
Algorithms
14 years 11 months ago
Optimal probabilistic fingerprint codes
We construct binary codes for fingerprinting. Our codes for n users that are ε-secure against c pirates have length O(c^2 log(n/ε)). This improves the codes proposed by Boneh and Sh...
Gábor Tardos
OSDI
2002
ACM
14 years 11 months ago
Taming Aggressive Replication in the Pangaea Wide-Area File System
Pangaea is a wide-area file system that supports data sharing among a community of widely distributed users. It is built on a symmetrically decentralized infrastructure that consi...
Yasushi Saito, Christos T. Karamanolis, Magnus Kar...