Abstract: This paper introduces a generalization of the Gravitational Clustering Algorithm proposed by Gomez et al. in [1]. First, it is extended in such a way that not only the G...
The Inductive Monitoring System (IMS) software was developed to provide a technique for automatically producing health-monitoring knowledge bases for systems that are either difficu...
This paper describes work in progress on an Interactive Sonification Toolkit, which has been developed to aid the analysis of general data sets. The toolkit allows the des...
Several bioinformatics data sets are naturally represented as graphs, for instance, gene regulation, metabolic pathways, and protein-protein interactions. The graphs are often large ...
Random Indexing is a vector space technique that provides an efficient and scalable approximation to distributional similarity problems. We present experiments showing Random Inde...
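As an illustration of how Random Indexing builds its approximation incrementally, here is a minimal sketch in Python. It assumes the common textbook formulation rather than anything specific to the experiments above. Each word receives a sparse ternary random "index vector" with a few +1/-1 entries. A word's context vector is the running sum of the index vectors of its neighbors within a fixed window. The dimension, number of nonzeros, window size, and all function names are illustrative choices.

```python
import math
import random
from collections import defaultdict


def index_vector(dim, nonzeros, rng):
    """Sparse ternary random index vector: `nonzeros` entries of +1/-1, the rest 0."""
    vec = [0] * dim
    positions = rng.sample(range(dim), nonzeros)
    for i, pos in enumerate(positions):
        # Alternate signs so the vector is balanced (equal numbers of +1 and -1).
        vec[pos] = 1 if i % 2 == 0 else -1
    return vec


def random_indexing(sentences, dim=512, nonzeros=8, window=2, seed=0):
    """Accumulate context vectors in one pass over tokenized sentences (illustrative)."""
    rng = random.Random(seed)
    index = {}  # word -> fixed random index vector
    context = defaultdict(lambda: [0] * dim)  # word -> accumulated context vector
    for tokens in sentences:
        for w in tokens:
            if w not in index:
                index[w] = index_vector(dim, nonzeros, rng)
        for i, w in enumerate(tokens):
            lo, hi = max(0, i - window), min(len(tokens), i + window + 1)
            for j in range(lo, hi):
                if j == i:
                    continue
                ctx = context[w]
                # Add the neighbor's index vector into w's context vector.
                for k, v in enumerate(index[tokens[j]]):
                    if v:
                        ctx[k] += v
    return dict(context)


def cosine(a, b):
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0
```

For example, with `vecs = random_indexing([["the", "cat", "sat"], ["the", "dog", "sat"]])`, the words "cat" and "dog" occur in identical contexts. Their context vectors therefore coincide, and `cosine(vecs["cat"], vecs["dog"])` is 1 up to floating-point rounding. The dimension stays fixed as the vocabulary grows, which is where the scalability comes from.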
Clustering is the process of locating patterns in large data sets. It is an active research area that provides value to scientific as well as business applications. Practical clust...
In this paper we investigate whether paragraphs can be identified automatically in different languages and domains. We propose a machine learning approach which exploits textual a...
In recent years, privacy-preserving data mining has become very important because of the proliferation of large amounts of data on the internet. Many data sets are inherently high...
A common approach for dealing with large data sets is to stream over the input in one pass, and perform computations using sublinear resources. For truly massive data sets, howeve...
Jon Feldman, S. Muthukrishnan, Anastasios Sidiropo...
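The one-pass, sublinear-memory model can be made concrete with a classic streaming algorithm. The one below is not taken from the paper above. It is the Misra-Gries frequent-items summary, which reads each item exactly once and keeps at most k - 1 counters no matter how long the stream is. The function name and the choice of k are illustrative.

```python
def misra_gries(stream, k):
    """One-pass heavy-hitters summary using at most k - 1 counters.

    Any item occurring more than n/k times in a stream of length n is
    guaranteed to remain, and each kept count underestimates the true
    count by at most n/k.
    """
    counters = {}
    for item in stream:
        if item in counters:
            counters[item] += 1
        elif len(counters) < k - 1:
            counters[item] = 1
        else:
            # No free counter: decrement every counter and drop those reaching zero.
            for key in list(counters):
                counters[key] -= 1
                if counters[key] == 0:
                    del counters[key]
    return counters
```

For instance, `misra_gries("abacabada", 3)` returns `{'a': 3}`. The item "a" really occurs 5 times, and the estimate falls short by 2, which is within the n/k = 3 bound. Memory depends only on k, not on the stream length, which is the sublinear-resource property the abstract refers to.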
This paper presents empirical results that contradict the prevailing opinion that entity extraction is a boring, solved problem. In particular, we consider data sets that resemble ...