Topic-based text summaries promise to help average users quickly understand a text collection and derive insights. Recent research has shown that the Latent Dirichlet Allocation (...
On-line news agents provide commenting facilities for readers to express their views with regard to news stories. The number of user supplied comments on a news article may be ind...
Previous anti-spamming algorithms based on link structure suffer from either the weakness of the page value metric or the vagueness of the seed selection. In this paper, we propos...
Expertise retrieval has received increased interests in recent years, whose task is to suggest people with relevant expertise. Motivated by the observation that communities could ...
Clustering ensembles combine different clustering solutions into a single robust and stable one. Most of existing methods become highly time-consuming when the data size turns to ...
Bitmap indexes are widely used in Decision Support Systems (DSSs) to improve query performance. In this paper, we evaluate the use of compressed inverted indexes with adapted quer...
Data generalization is widely used to protect identities and prevent inference of sensitive information during the public release of microdata. The k-anonymity model has been exte...
Many outlier detection methods do not merely provide the decision for a single data object being or not being an outlier but give also an outlier score or “outlier factor” sig...
Leveraging information from relevance assessments has been proposed as an effective means for improving retrieval. We introduce a novel language modeling method which uses inform...
The popularity of Wikipedia and other online knowledge bases has recently produced an interest in the machine learning community for the problem of automatic linking. Automatic hy...