We describe a statistical signature of chunks and an algorithm for finding chunks. While there is no formal definition of chunks, they may be reliably identified as configurat...
An important area of data mining is anomaly detection, particularly for fraud. However, little work has been done in terms of detecting anomalies in data that is represented as a g...
We describe an in-depth analysis of spam-filtering performance of a simple Naive Bayes learner and two extended variants. A set of seven mailboxes comprising about 65,000 mails f...
Graph grammars combine the relational aspect of graphs with the iterative and recursive aspects of string grammars, and thus represent an important next step in our ability to dis...
Jacek P. Kukluk, Lawrence B. Holder, Diane J. Cook
Natural extension is a powerful tool for combining expert judgments in the framework of imprecise probability theory. However, it assumes that every judgment is “true” and...
Unsupervised sequence learning is important to many applications. A learner is presented with unlabeled sequential data, and must discover sequential patterns that characterize th...
This paper introduces a new algorithm for approximate mining of frequent patterns from streams of transactions using a limited amount of memory. The proposed algorithm co...
DNA micro-arrays provide thousands of genomic expressions on the same subject. A main issue is then to find the subset of genes whose degeneration is responsible for a certain typ...
Simone Garatti, Sergio Bittanti, Diego Liberati, A...
We present a method to extract a time series (Number of Active Requests (NAR)) from web cache logs which serves as a transport-level measurement of internet traffic. This...