Sciweavers

1254 search results - page 232 / 251
» Making Hard Problems Harder
KDD
2009
ACM
198 views, Data Mining
14 years 9 months ago
Pervasive parallelism in data mining: dataflow solution to co-clustering large and sparse Netflix data
All Netflix Prize algorithms proposed so far are prohibitively costly for large-scale production systems. In this paper, we describe an efficient dataflow implementation of a coll...
Srivatsava Daruru, Nena M. Marin, Matt Walker, Joy...
KDD
2007
ACM
182 views, Data Mining
14 years 9 months ago
Cleaning disguised missing data: a heuristic approach
In some applications such as filling in a customer information form on the web, some missing values may not be explicitly represented as such, but instead appear as potentially va...
Ming Hua, Jian Pei
KDD
2004
ACM
112 views, Data Mining
14 years 9 months ago
A rank sum test method for informative gene discovery
Finding informative genes from microarray data is an important research problem in bioinformatics research and applications. Most of the existing methods rank features according t...
Lin Deng, Jian Pei, Jinwen Ma, Dik Lun Lee
SIGMOD
2009
ACM
177 views, Database
14 years 9 months ago
Exploiting context analysis for combining multiple entity resolution systems
Entity Resolution (ER) is an important real world problem that has attracted significant research interest over the past few years. It deals with determining which object descript...
Zhaoqi Chen, Dmitri V. Kalashnikov, Sharad Mehrotr...
SIGMOD
2008
ACM
215 views, Database
14 years 9 months ago
CSV: visualizing and mining cohesive subgraphs
Extracting dense sub-components from graphs efficiently is an important objective in a wide range of application domains ranging from social network analysis to biological network...
Nan Wang, Srinivasan Parthasarathy, Kian-Lee Tan, ...