Major media companies such as The Financial Times, the Wall Street Journal or Reuters generate huge amounts of textual news data on a daily basis. Mining frequent patterns in this...
Practical clustering algorithms require multiple data scans to achieve convergence. For large databases, these scans become prohibitively expensive. We present a scalable clusteri...
In this paper, we propose GAD (General Activity Detection) for fast clustering on large-scale data. Within this framework we design a set of algorithms for different scenarios: (...
Jiawei Han, Liangliang Cao, Sangkyum Kim, Xin Jin,...
Large high dimension datasets are of growing importance in many fields and it is important to be able to visualize them for understanding the results of data mining appro...
Jong Youl Choi, Seung-Hee Bae, Xiaohong Qiu, Geoff...
Using SQL has not been considered an efficient and feasible way to implement data mining algorithms. Although this is true for many data mining, machine learning and statistical a...