The problem of selecting a sample subset sufficient to preserve diversity arises in many applications. One example is in the design of recombinant inbred lines (RIL) for genetic a...
Feng Pan, Adam Roberts, Leonard McMillan, David Th...
Mining frequent patterns in transaction databases has been studied extensively in data mining research. However, most of the existing frequent pattern mining algorithms do not con...
Data discretization is defined as a process of converting continuous data attribute values into a finite set of intervals with minimal loss of information. In this paper, we prove...
Recently the problem of dimensionality reduction (or, subspace learning) has received a lot of interests in many fields of information processing, including data mining, informati...
This paper is motivated to improve the performance of individual ensembles using a hybrid mechanism in the regression setting. Based on an error-ambiguity decomposition, we formal...
We propose an efficient algorithm for mining frequent approximate sequential patterns under the Hamming distance model. Our algorithm gains its efficiency by adopting a "brea...
Many databases will not or can not be disclosed without strong guarantees that no sensitive information can be extracted. To address this concern several data perturbation techniq...
In this paper, we present Concept Chain Queries (CCQ), a special case of text mining in document collections focusing on detecting links between two topics across text documents. ...
Sequential pattern mining is an active field in the domain of knowledge discovery. Recently, with the constant progress in hardware technologies, real-world databases tend to gro...