Effort prediction is a very important issue for software project management. Historical project data sets are frequently used to support such prediction. But missing data are oft...
This paper describes a theoretical framework for inducing knowledge from incomplete data sets. The general framework can be used with any formalism based on a lattice structure. It...
The concepts of similarity and distance are crucial in data mining. We consider the problem of defining the distance between two data sets by comparing summary statistics compute...
Genomic medicine aims to revolutionize health care by applying our growing understanding of the molecular basis of disease. Research in this arena is data intensive, which means d...
Background: Interpreting the results of high-throughput experiments, such as those obtained from DNA-microarrays, is an often time-consuming task due to the high number of data-po...
Felix Kokocinski, Nicolas Delhomme, Gunnar Wrobel,...
We present a framework for segmenting and storing filament networks from scalar volume data. Filament structures are commonly found in data generated using high-throughput microsc...
A new approach to support vector machine (SVM) classification is proposed wherein each of two data sets is proximal to one of two distinct planes that are not parallel to each oth...
We examine a new approach to building decision trees by introducing a geometric splitting criterion, based on the properties of a family of metrics on the space of partiti...
Several bioinformatics data sets are naturally represented as graphs, for instance gene regulation, metabolic pathways, and protein-protein interactions. The graphs are often large ...
Random k-nearest-neighbour (RKNN) imputation is an established algorithm for filling in missing values in data sets. Assume that data are missing in a random way, so that missing...