Sciweavers

4757 search results - page 936 / 952
» Generalized Posynomial Performance Modeling
Sort
View
KDD
2009
ACM
198views Data Mining» more  KDD 2009»
14 years 8 months ago
Pervasive parallelism in data mining: dataflow solution to co-clustering large and sparse Netflix data
All Netflix Prize algorithms proposed so far are prohibitively costly for large-scale production systems. In this paper, we describe an efficient dataflow implementation of a coll...
Srivatsava Daruru, Nena M. Marin, Matt Walker, Joy...
KDD
2007
ACM
165views Data Mining» more  KDD 2007»
14 years 8 months ago
Finding low-entropy sets and trees from binary data
The discovery of subsets with special properties from binary data has been one of the key themes in pattern discovery. Pattern classes such as frequent itemsets stress the co-occu...
Eino Hinkkanen, Hannes Heikinheimo, Heikki Mannila...
KDD
2005
ACM
125views Data Mining» more  KDD 2005»
14 years 8 months ago
Email data cleaning
Addressed in this paper is the issue of `email data cleaning' for text mining. Many text mining applications need take emails as input. Email data is usually noisy and thus i...
Jie Tang, Hang Li, Yunbo Cao, ZhaoHui Tang
KDD
2004
ACM
163views Data Mining» more  KDD 2004»
14 years 8 months ago
Exploiting dictionaries in named entity extraction: combining semi-Markov extraction processes and data integration methods
We consider the problem of improving named entity recognition (NER) systems by using external dictionaries--more specifically, the problem of extending state-of-the-art NER system...
William W. Cohen, Sunita Sarawagi
SIGMOD
2008
ACM
191views Database» more  SIGMOD 2008»
14 years 8 months ago
Efficient aggregation for graph summarization
Graphs are widely used to model real world objects and their relationships, and large graph datasets are common in many application domains. To understand the underlying character...
Yuanyuan Tian, Richard A. Hankins, Jignesh M. Pate...