This paper explores the use of clickthrough data for query spelling correction. First, large amounts of query-correction pairs are derived by analyzing users' query reformula...
Given the lack of word delimiters in written Japanese, word segmentation is generally considered a crucial first step in processing Japanese texts. Typical Japanese segmentation a...
This paper describes a rather simplistic method of unsupervised morphological analysis of words in an unknown language. All what is needed is a raw text corpus in the given langua...
We propose a new algorithm for dimensionality reduction and unsupervised text classification. We use mixture models as underlying process of generating corpus and utilize a novel,...
In supervised learning, we commonly assume that training and test data are sampled from the same distribution. However, this assumption can be violated in practice and then standa...