Spectral clustering refers to a flexible class of clustering procedures that can produce high-quality clusterings on small data sets but which has limited applicability to large-s...
In traditional text clustering methods, documents are represented as "bags of words" without considering the semantic information of each document. For instance, if two ...
Xiaohua Hu, Xiaodan Zhang, Caimei Lu, E. K. Park, ...
User browsing information, particularly non-search-related activity, reveals important contextual information about the preferences and intent of web users. In this paper, ...
Software is a ubiquitous component of our daily life. We often depend on the correct working of software systems. Due to the difficulty and complexity of software systems, bugs an...
David Lo, Hong Cheng, Jiawei Han, Siau-Cheng Khoo,...
Influence maximization is the problem of finding a small subset of nodes (seed nodes) in a social network that could maximize the spread of influence. In this paper, we study the ...
Identifying similar keywords, known as broad matches, is an important task in online advertising that has become a standard feature on all major keyword advertising platforms. Eff...
Advanced analysis of data streams is quickly becoming a key area of data mining research as the number of applications demanding such processing increases. Online mining when such...
Albert Bifet, Bernhard Pfahringer, Geoffrey Holmes...
This paper addresses Named Entity Mining (NEM), in which we mine knowledge about named entities such as movies, games, and books from a huge amount of data. NEM is potentially use...
A heterogeneous information network is an information network composed of multiple types of objects. Clustering on such a network may lead to better understanding of both hidden s...