We have been working on two different KDD systems for scientific data. One system involves comparative genomics, where the database contains more than 60,000 plant gene and protei...
In data mining, similarity or distance between attributes is one of the central notions. Such a notion can be used to build attribute hierarchies etc. Similarity metrics can be us...
We consider the problem of finding rules relating patterns in a time series to other patterns in that series, or patterns in one series to patterns in another series. A simple examp...
Gautam Das, King-Ip Lin, Heikki Mannila, Gopal Ren...
We describe an industrial-strength data mining application in telecommunications. The application requires building a short (7 byte) profile for all telephone numbers seen on a large t...
WHIRL is an extension of relational databases that can perform "soft joins" based on the similarity of textual identifiers; these soft joins extend the traditional operation of j...
Practical clustering algorithms require multiple data scans to achieve convergence. For large databases, these scans become prohibitively expensive. We present a scalable clusteri...
Direct marketing response models seek to identify individuals most likely to respond to marketing solicitations. Such models are commonly evaluated on classification accuracy and so...
An important issue in data mining is the recognition of complex dependencies between attributes. Past techniques for identifying attribute dependence include correlation coefficie...
In this work we propose a generalisation of the notion of association rule in the context of flat transactions to that of a composite association rule in the context of a structured dir...