Two-dimensional contingency or co-occurrence tables arise frequently in important applications such as text, web-log and market-basket data analysis. A basic problem in contingenc...
Inderjit S. Dhillon, Subramanyam Mallela, Dharmend...
A data stream is a massive unbounded sequence of data elements continuously generated at a rapid rate. Consequently, the knowledge embedded in a data stream is more likely to be c...
Several important time series data mining problems reduce to the core task of finding approximately repeated subsequences in a longer time series. In an earlier work, we formalize...
Bill Yuan-chi Chiu, Eamonn J. Keogh, Stefano Lonar...
From the standpoint of the automated extraction of scientific knowledge, an important but little-studied part of scientific publications is the figures and accompanying captions....
William W. Cohen, Richard C. Wang, Robert F. Murph...
This paper considers the use of computational stylistics for performing authorship attribution of electronic messages, addressing categorization problems with as many as 20 differ...
Shlomo Argamon, Marin Saric, Sterling Stuart Stein
Distance function computation is a key subtask in many data mining algorithms and applications. The most effective form of the distance function can only be expressed in the conte...
Defining outliers by their distance to neighboring examples is a popular approach to finding unusual examples in a data set. Recently, much work has been conducted with the goal o...