All Netflix Prize algorithms proposed so far are prohibitively costly for large-scale production systems. In this paper, we describe an efficient dataflow implementation of a coll...
Srivatsava Daruru, Nena M. Marin, Matt Walker, Joy...
The discovery of subsets with special properties from binary data has been one of the key themes in pattern discovery. Pattern classes such as frequent itemsets stress the co-occu...
Eino Hinkkanen, Hannes Heikinheimo, Heikki Mannila...
Addressed in this paper is the issue of 'email data cleaning' for text mining. Many text mining applications need to take emails as input. Email data is usually noisy and thus i...
We consider the problem of improving named entity recognition (NER) systems by using external dictionaries; more specifically, the problem of extending state-of-the-art NER system...
Graphs are widely used to model real-world objects and their relationships, and large graph datasets are common in many application domains. To understand the underlying character...
Yuanyuan Tian, Richard A. Hankins, Jignesh M. Pate...