Compression-based pattern mining has been successfully applied to many data mining tasks. We propose an approach based on the minimum description length principle to extract seque...
The problem addressed in this paper is to divide a given multilingual document into single-language segments and then identify the language of each segment. The problem was mot...
We propose a novel patch-based image representation that is useful because it (1) inherently detects regions with repetitive structure at multiple scales and (2) yields a paramete...
Lena Gorelick, Andrew Delong, Olga Veksler, Yuri B...
..., Yunde Jia. Model structure selection is currently an open problem in modeling data via Gaussian Mixture Models (GMM). This paper proposes a discriminative method to select GMM st...
The power of sparse signal coding with learned overcomplete dictionaries has been demonstrated in a variety of applications and fields, from signal processing to statistical infe...
For a book, the title and abstract provide a good first impression of what to expect from it. For a database, getting a first impression is not so straightforward. Whil...
In this paper we introduce a new approach to automatic attribute and granularity selection for building optimum regression trees. The method is based on the minimum descr...
We review the principles of Minimum Description Length and Stochastic Complexity as used in data compression and statistical modeling. Stochastic complexity is formulated as the...
The main statistic used in rough set data analysis, the approximation quality, is of limited value when there is a choice of competing models for predicting a decision variable. ...
We present two methods for unsupervised segmentation of words into morpheme-like units. The model utilized is especially suited for languages with a rich morphology, such as Finnis...
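Several of the abstracts above (the sequence-pattern miner and the review of stochastic complexity, for instance) rest on the minimum description length (MDL) principle: prefer the model under which the data can be encoded most compactly. The sketch below is not taken from any listed paper. It is a minimal illustration that assumes a binary sequence summarized by its number of ones k out of n, and compares a parameter-free fair-coin code (n bits) with the normalized maximum likelihood (NML) code for the Bernoulli family, one standard formulation of stochastic complexity. All function names are hypothetical.

```python
import math

# Hypothetical illustration (not from any listed paper): MDL model selection for
# binary data, comparing a fair-coin code with the Bernoulli NML code.


def _xlogy(x: float, y: float) -> float:
    """Return x * ln(y), using the convention 0 * ln(0) = 0."""
    return 0.0 if x == 0 else x * math.log(y)


def ml_code_bits(k: int, n: int) -> float:
    """-log2 P(x | theta_hat): bits for n symbols (k ones) at the ML parameter k/n."""
    p = k / n
    nats = -(_xlogy(k, p) + _xlogy(n - k, 1.0 - p))
    return nats / math.log(2)


def log2_nml_normalizer(n: int) -> float:
    """log2 of the Bernoulli parametric complexity
    C(n) = sum_k binom(n, k) (k/n)^k (1 - k/n)^(n - k).
    Terms are built in log space so large n does not overflow."""
    total = 0.0
    for k in range(n + 1):
        p = k / n
        log_term = (
            math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)
            + _xlogy(k, p) + _xlogy(n - k, 1.0 - p)
        )
        total += math.exp(log_term)  # each term is a binomial probability, so <= 1
    return math.log2(total)


def nml_code_bits(k: int, n: int) -> float:
    """Stochastic complexity (NML code length) of the data under the Bernoulli family."""
    return ml_code_bits(k, n) + log2_nml_normalizer(n)


def fair_coin_code_bits(n: int) -> float:
    """Parameter-free model: every symbol costs exactly one bit."""
    return float(n)


def select_model(k: int, n: int) -> str:
    """MDL choice: keep whichever model yields the shorter total code length."""
    if nml_code_bits(k, n) < fair_coin_code_bits(n):
        return "bernoulli"
    return "fair coin"


if __name__ == "__main__":
    for k in (50, 60, 90):
        print(k, round(nml_code_bits(k, 100), 2), select_model(k, 100))
```

With n = 100 the Bernoulli code pays roughly 3.7 bits of parametric complexity, so a 50/50 or 60/40 split is encoded more cheaply by the fair-coin model, while a 90/10 split favors the Bernoulli model. The NML normalizer is used here instead of a cruder (1/2) log2 n parameter cost because it gives an exact code length for every n.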