An adaptive initialization method for speaker Diarization based on prosodic features



This article presents a novel, adaptive initialization scheme that can be applied to most state-of-the-art speaker diarization algorithms, i.e. algorithms that use agglomerative hierarchical clustering with the Bayesian Information Criterion (BIC) and Gaussian Mixture Models (GMMs) of frame-based cepstral features (MFCCs). The initialization method combines the recently proposed “adaptive seconds per Gaussian” (ASPG) method with a new method, based on prosodic features, that pre-clusters the data and estimates the number of initial clusters. The presented method has two important advantages. First, it requires no manual tuning and is robust against variations in file length and speaker count. Second, it outperforms our previously used initialization methods on all benchmark files from the 2006, 2007, and 2009 NIST Rich Transcription (RT) evaluations, yielding a relative Diarization Error Rate (DER) improvement of up to 67%.
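
The BIC-based merge decision that the abstract refers to can be illustrated with a minimal Python sketch. It is not the authors' implementation: for brevity it assumes one full-covariance Gaussian per cluster instead of a GMM and uses the classic penalized delta-BIC. The names `delta_bic` and `penalty_weight` and the synthetic feature matrices are hypothetical.

```python
import numpy as np


def delta_bic(x, y, penalty_weight=1.0):
    """Penalized delta-BIC between two clusters of feature frames.

    Assumption: each cluster is modeled by a single full-covariance
    Gaussian (a simplification of the GMMs used in real systems).
    A positive value favors keeping the clusters separate; a negative
    value favors merging them.
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    z = np.vstack([x, y])          # frames of the hypothetical merged cluster
    n_x, n_y, n_z = len(x), len(y), len(z)
    d = z.shape[1]                 # feature dimension

    def logdet_cov(frames):
        # Maximum-likelihood covariance (bias=True divides by N).
        cov = np.atleast_2d(np.cov(frames, rowvar=False, bias=True))
        _, logdet = np.linalg.slogdet(cov)
        return logdet

    # Log-likelihood gain of two separate Gaussians over one joint Gaussian.
    gain = 0.5 * (n_z * logdet_cov(z)
                  - n_x * logdet_cov(x)
                  - n_y * logdet_cov(y))
    # Extra parameters of the second Gaussian: mean plus symmetric covariance.
    n_params = d + d * (d + 1) / 2
    penalty = 0.5 * penalty_weight * n_params * np.log(n_z)
    return gain - penalty


# Hypothetical synthetic "features": two clusters from the same source
# and one cluster from a clearly different source.
rng = np.random.default_rng(0)
same_a = rng.normal(0.0, 1.0, size=(500, 4))
same_b = rng.normal(0.0, 1.0, size=(500, 4))
other = rng.normal(5.0, 1.0, size=(500, 4))

print("same source:     ", delta_bic(same_a, same_b))
print("different source:", delta_bic(same_a, other))
```

Under this convention, an agglomerative clustering loop would typically merge the pair of clusters with the lowest score and stop once no pair scores below zero.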

David Imseng, Gerald Friedland


Adaptive Initialization Scheme | ICASSP 2010 | Initialization Methods | Presented Initialization Method | Signal Processing


Post Info

Added	06 Dec 2010
Updated	06 Dec 2010
Type	Conference
Year	2010
Where	ICASSP
Authors	David Imseng, Gerald Friedland
