The immense volume of data resulting from DNA microarray experiments, accompanied by an increase in the number of publications discussing gene-related discoveries, presents a major data...
Hagit Shatkay, Stephen Edwards, W. John Wilbur, Ma...
We propose using large-scale clustering of dependency relations between verbs and multiword nouns (MNs) to construct a gazetteer for named entity recognition (NER). Since dependen...
We present a parallel version of BIRCH with the objective of enhancing the scalability without compromising on the quality of clustering. The incoming data is distributed in a cyc...
Sampling is a widely used technique to increase efficiency in database and data mining applications operating on large datasets. In this paper we present a scalable sampling imple...