The primary constraint in the effective mining of data streams is the large volume of data which must be processed in real time. In many cases, it is desirable to store a summary...
In this paper, we propose GAD (General Activity Detection) for fast clustering on large-scale data. Within this framework we design a set of algorithms for different scenarios: (...
Jiawei Han, Liangliang Cao, Sangkyum Kim, Xin Jin,...
Support Vector Machines (SVMs) are a leading tool in classification and pattern recognition, and the kernel function is one of their most important components. This function is used...
Shaoyi Zhang, M. Maruf Hossain, Md. Rafiul Hassan,...
We study non-parametric measures for the problem of comparing distributions, which arises in anomaly detection for continuous time series. Non-parametric measures take two distribu...
Social networks tend to contain some amount of randomness and some amount of non-randomness. The amount of randomness versus non-randomness affects the properties of a social netw...
Biclustering refers to simultaneous clustering of objects and their features. Use of biclustering is gaining momentum in areas such as text mining, gene expression analysis and co...
Alok N. Choudhary, Arifa Nisar, Waseem Ahmad, Wei-...
Interactive analysis of a datacube, in which a user navigates the cube by launching a sequence of queries, is often tedious since the user may have no idea of what the forthcoming query...
Since current search engines employ link-based ranking algorithms as an important tool to decide a ranking of sites, Web spammers are making a significant effort to man...
Since mining frequent patterns from transactional databases involves an exponential mining space and generates a huge number of patterns, efficient discovery of user-interest-based...