Challenging the implicit reliance on document collections, this paper discusses the pros and cons of using query logs rather than document collections, as self-contained sources o...
High dimensional structured data such as text and images is often poorly understood and misrepresented in statistical modeling. The standard histogram representation suffers from ...
Supervised approaches to Data Mining are particularly appealing as they allow for the extraction of complex relations from data objects. In order to facilitate their application i...
In Chinese texts, words composed of single or multiple characters are not separated by spaces, unlike most western languages. Therefore Chinese word segmentation is considered an ...
Abstract. We present a possibly great improvement while performing semisupervised learning tasks from training data sets when only a small fraction of the data pairs is labeled. In...