The lack of standards for Romanization of Thai proper names makes searching activity a challenging task. This is particularly important when searching for people-related documents ...
Co-training, a paradigm of semi-supervised learning, may alleviate effectively the data scarcity problem (i.e., the lack of labeled examples) in supervised learning. The standard ...
When training classifiers, presence of noise can severely harm their performance. In this paper, we focus on “non-class” attribute noise and we consider how a frequent fault-t...
One important problem proposed recently in the field of web mining is website classification problem. The complexity together with the necessity to have accurate and fast algorit...
Abstract. An object o of a database D is called a hot item, if there is a sufficiently large population of other objects in D that are similar to o. In other words, hot items are ...
Thomas Bernecker, Hans-Peter Kriegel, Matthias Ren...
It is a challenging and important task to retrieve images from a large and highly varied image data set based on their visual contents. Problems like how to fill the semantic gap b...
In this paper we propose to study budget semi-supervised learning, i.e., semi-supervised learning with a resource budget, such as a limited memory insufficient to accommodate and/...
Zhi-Hua Zhou, Michael Ng, Qiao-Qiao She, Yuan Jian...
Abstract. Newistic is a web mining platform that collects and analyses documents crawled from the Internet. Although it currently processes news articles, it can be easily adapted ...
The usual data mining setting uses the full amount of data to derive patterns for different purposes. Taking cues from machine learning techniques, we explore ways to divide the d...