
13 years 4 months ago
Exploring the mechanism of tonal contraction in Taiwan Mandarin
This study investigates the mechanism of tonal contraction when a disyllabic unit is merged into a monosyllable at fast speech rate in Taiwan Mandarin. Various degrees of contract...
Chierh Cheng, Yi Xu, Michele Gubian
13 years 4 months ago
Can conversational word usage be used to predict speaker demographics?
This work surveys the potential for predicting demographic traits of individual speakers (gender, age, education level, ethnicity, and geographic region) using only word usage fea...
Dan Gillick
13 years 4 months ago
Learning speaker normalization using semisupervised manifold alignment
As a child acquires language, he or she: perceives acoustic information in his or her surrounding environment; identifies portions of the ambient acoustic information as languager...
Andrew R. Plummer, Mary E. Beckman, Mikhail Belkin...
13 years 4 months ago
What else is new than the Hamming window? Robust MFCCs for speaker recognition via multitapering
Usually the mel-frequency cepstral coefficients (MFCCs) are derived via Hamming windowed DFT spectrum. In this paper, we advocate to use a so-called multitaper method instead. Mul...
Tomi Kinnunen, Rahim Saeidi, Johan Sandberg, Maria...
13 years 4 months ago
HMM adaptation using linear spline interpolation with integrated spline parameter training for robust speech recognition
We recently proposed a method for HMM adaptation to noisy environments called Linear Spline Interpolation (LSI). LSI uses linear spline regression to model the relationship betwee...
Michael L. Seltzer, Alex Acero
13 years 4 months ago
Fully automatic segmentation for prosodic speech corpora
While automatic methods for phonetic segmentation of speech can help with rapid annotation of corpora, most methods rely either on manually segmented data to initially train the p...
Sarah Hoffmann, Beat Pfister
13 years 4 months ago
Acoustic feature analysis in speech emotion primitives estimation
We recently proposed a family of robust linear and nonlinear estimation techniques for recognizing the three emotion primitives
Dongrui Wu, Thomas D. Parsons, Shrikanth S. Naraya...
13 years 4 months ago
Setup for acoustic-visual speech synthesis by concatenating bimodal units
This paper presents preliminary work on building a system able to synthesize concurrently the speech signal and a 3D animation of the speaker's face. This is done by concaten...
Asterios Toutios, Utpala Musti, Slim Ouni, Vincent...
13 years 4 months ago
Deep-structured hidden conditional random fields for phonetic recognition
We extend our earlier work on deep-structured conditional random field (DCRF) and develop deep-structured hidden conditional random field (DHCRF). We investigate the use of this n...
Dong Yu, Li Deng