Abstract Clustering text data streams is an important issue in data mining community and has a number of applications such as news group filtering, text crawling, document organiza...
It has become a promising direction to measure similarity of Web search queries by mining the increasing amount of clickthrough data logged by Web search engines, which record the...
Qiankun Zhao, Steven C. H. Hoi, Tie-Yan Liu, Soura...
We propose a new clustering algorithm, called SyMP, which is based on synchronization of pulse-coupled oscillators. SyMP represents each data point by an Integrate-and-Fire oscill...
This paper describes our work in learning online models that forecast real-valued variables in a high-dimensional space. A 3GB database was collected by sampling 421 real-valued s...
In this work we describe a sequence compression method based on combining a Bayesian nonparametric sequence model with entropy encoding. The model, a hierarchy of Pitman-Yor proce...