Sciweavers

INTERSPEECH
2010
13 years 6 months ago
Minimally invasive surgery for spoken dialog systems
David Suendermann, Jackson Liscombe, Roberto Piera...
INTERSPEECH
2010
13 years 6 months ago
Chirp complex cepstrum-based decomposition for asynchronous glottal analysis
It was recently shown that complex cepstrum can be effectively used for glottal flow estimation by separating the causal and anticausal components of speech. In order to guarantee...
Thomas Drugman, Thierry Dutoit
INTERSPEECH
2010
13 years 6 months ago
Morphological and predictability effects on schwa reduction: the case of Dutch word-initial syllables
This corpus-based study shows that the presence and duration of schwa in Dutch word-initial syllables are affected by a word's predictability and its morphological structure....
Iris Hanique, Barbara Schuppler, Mirjam Ernestus
INTERSPEECH
2010
13 years 6 months ago
Towards mixed language speech recognition systems
Multilingual speech recognition obviously involves numerous research challenges, including common phoneme sets, adaptation on limited amount of training data, as well as mixed lan...
David Imseng, Hervé Bourlard, Mathew Magima...
INTERSPEECH
2010
13 years 6 months ago
Towards spoken term discovery at scale with zero resources
Aren Jansen, Kenneth Church, Hynek Hermansky
INTERSPEECH
2010
13 years 6 months ago
Hierarchical bottle neck features for LVCSR
This paper investigates the combination of different neural network topologies for probabilistic feature extraction. On one hand, a five-layer neural network used in bottle neck f...
Christian Plahl, Ralf Schlüter, Hermann Ney
INTERSPEECH
2010
13 years 6 months ago
Revisiting VTLN using linear transformation on conventional MFCC
In this paper, we revisit the linear transformation for VTLN on conventional MFCC proposed by Sanand et al. in [1], using the idea of band-limited interpolation. The filter-bank i...
Doddipatla Rama Sanand, Ralf Schlüter, Herman...
INTERSPEECH
2010
13 years 6 months ago
HMM-based automatic visual speech segmentation using facial data
We describe automatic visual speech segmentation using facial data captured by a stereo-vision technique. The segmentation is performed using an HMM-based forced alignment mechani...
Utpala Musti, Asterios Toutios, Slim Ouni, Vincent...
INTERSPEECH
2010
13 years 6 months ago
Comparison of approaches for instrumentally predicting the quality of text-to-speech systems
In this paper, we compare and combine different approaches for instrumentally predicting the perceived quality of Text-to-Speech systems. First, a log-likelihood is determined by ...
Sebastian Möller, Florian Hinterleitner, Tiag...
INTERSPEECH
2010
13 years 6 months ago
Decision tree state clustering with word and syllable features
In large vocabulary continuous speech recognition, decision trees are widely used to cluster triphone states. In addition to commonly used phonetically based questions, others hav...
Hank Liao, Christopher Alberti, Michiel Bacchiani,...