Post streams from public social media platforms such as Instagram and Twitter have become valuable but noisy data sources for discovering what is happening around us. In this paper, we...
Clustering validation is a crucial step in choosing the clustering algorithm that performs best on given input data. Internal clustering validation is efficient and realisti...
Learning user/item relations is a key issue in recommender systems, and most existing methods measure the user/item relation from one particular aspect, e.g., historical ratings, e...
Bin Fu, Guandong Xu, Longbing Cao, Zhihai Wang, Zh...
We introduce the problem of rank matrix factorisation (RMF). That is, we consider the decomposition of a rank matrix, in which each row is a (partial or complete) ranking of all co...
Thanh Le Van, Matthijs van Leeuwen, Siegfried Nijs...
In outlying aspects mining, given a query object, we aim to answer the question of which features make the query most outlying. The most recent works tackle this problem using tw...
Nguyen Xuan Vinh, Jeffrey Chan, James Bailey, Chri...
Crowdsourcing provides a new way to distribute large numbers of tasks to a crowd of annotators. The divergent knowledge backgrounds and personal preferences of crowd annotators lead to nois...
Interestingness measures stand as a proxy for “real human interest,” but their effectiveness is rarely studied empirically due to the difficulty of obtaining ground-tr...
Greg Harris, Anand V. Panangadan, Viktor K. Prasan...
Social networks provide unparalleled opportunities for marketing products or services. Accordingly, tremendous effort has been devoted to research on targeted ...
This study describes a statistically motivated approach to constraint-based data cleansing that derives the cause of errors from a distribution of conflicting tuples. In real-worl...
PM2.5, one of the main components of haze, has recently become a topic of public concern in China. In this paper, we try to predict PM2.5 concentrations in Da...