Data mining applications analyze large collections of set data and high dimensional categorical data. Search on these data types is not restricted to the classic problems of minin...
Abstract—Set intersection is the core in a variety of problems, e.g. frequent itemset mining and sparse boolean matrix multiplication. It is well-known that large speed gains can...
Databases are a key technology for molecular biology which is a very data intensive discipline. Since molecular biological databases are rather heterogeneous, unification and data...
High dimensional data visualization is critical to data analysts since it gives a direct view of original data. We present a method to visualize large amount of high dimensional d...
Missing data handling is an important preparation step for most data discrimination or mining tasks. Inappropriate treatment of missing data may cause large errors or false result...