We present a document routing and index partitioning scheme for scalable similarity-based search of documents in a large corpus. We consider the case when similarity-based search ...
Sampling is a widely used technique to increase efficiency in database and data mining applications operating on large datasets. In this paper we present a scalable sampling imple...
Research in privacy-preserving data mining (PPDM) has recently become increasingly popular due to the wide application of data mining and the increased concern regarding the protect...
Bin Yang, Hiroshi Nakagawa, Issei Sato, Jun Sakuma
Frequent itemset mining has been the subject of a lot of work in data mining research ever since association rules were introduced. In this paper we address a problem with frequen...
Recently, a large amount of work has been done in XML data mining. However, we observed that most existing work focuses on snapshot XML data, while XML data is dynamic i...
Qiankun Zhao, Sourav S. Bhowmick, Mukesh K. Mohani...