Ideally pattern recognition machines provide constant output when the inputs are transformed under a group G of desired invariances. These invariances can be achieved by enhancing...
The key problem to be faced when building a HMM-based continuous speech recogniser is maintaining the balance between model complexity and available training data. For large vocab...
Thoughit has been possible in the past to learn to predict DNAhydration patterns from crystallographic data, there is ambiguity in the choice of training data (both in terms of th...
Dawn M. Cohen, Casimir A. Kulikowski, Helen Berman
Due to the large variation and richness of visual inputs, statistical learning gets more and more concerned in the practice of visual processing such as visual tracking and recogn...
A serious bottleneck in the development of trainable text summarization systems is the shortage of training data. Constructing such data is a very tedious task, especially because...
This paper investigates bootstrapping for statistical parsers to reduce their reliance on manually annotated training data. We consider both a mostly-unsupervised approach, co-tra...
Mark Steedman, Rebecca Hwa, Stephen Clark, Miles O...
Sources of training data suitable for language modeling of conversational speech are limited. In this paper, we show how training data can be supplemented with text from the web ï...
This paper presents work that uses Transductive Latent Semantic Indexing (LSI) for text classification. In addition to relying on labeled training data, we improve classification ...
The problem of identifying mislabeled training examples has been examined in several studies, with a variety of approaches developed for editing the training data to obtain better...
Training statistical models to detect nonnative sentences requires a large corpus of non-native writing samples, which is often not readily available. This paper examines the exte...