Document clustering techniques mostly rely on single term analysis of the document data set, such as the Vector Space Model. To better capture the structure of documents, the unde...
In this paper, we introduce an algebraic approach to the foundations of data mining. Our approach is based upon two algebras of functions de ned over a common state space X and a ...
We advocate a new learning task that deals with orders of items, and we call this the Learning from Order Examples (LOE) task. The aim of the task is to acquire the rule that is u...
Current Data Mining techniques usually do not have a mechanism to automatically infer semantic features inherent in the data being “mined”. The semantics are either injected i...
Presently, inductive learning is still performed in a frustrating batch process. The user has little interaction with the system and no control over the final accuracy and traini...
Wei Fan, Haixun Wang, Philip S. Yu, Shaw-hwa Lo, S...
We present an algorithm to find fragments in a set of molecules that help to discriminate between different classes of, for instance, activity in a drug discovery context. Instea...
The similarity join has become an important database primitive to support similarity search and data mining. A similarity join combines two sets of complex objects such that the r...
It is well-known that for high dimensional data clustering, standard algorithms such as EM and the K-means are often trapped in local minimum. Many initialization methods were pro...
Chris H. Q. Ding, Xiaofeng He, Hongyuan Zha, Horst...
The k-means algorithm with cosine similarity, also known as the spherical k-means algorithm, is a popular method for clustering document collections. However, spherical k-means ca...
Proceedings of IEEE Data Mining, IEEE Press, pp. 581-584, 2002. We describe an interactive way to generate a set of clusters for a given data set. The clustering is done by constr...
Michael R. Berthold, Bernd Wiswedel, David E. Patt...