Sciweavers

MTA
2016

Audio-visual speaker diarization using fisher linear semi-discriminant analysis

Speaker diarization aims to automatically answer the question “who spoke when” given a speech signal. In this work, we focus on applying the FLSD approach, a semi-supervised version of Fisher Linear Discriminant analysis, to both the audio and the video signals to form a complete multimodal speaker diarization system. Extensive experiments show that the FLSD method boosts the performance of the face diarization task (i.e. the task of discovering faces over time given only the visual signal). In addition, we show experimentally that applying the FLSD method to discriminate between faces is independent of the initial feature space and remains relatively unaffected as the number of faces increases. Finally, a fusion method is proposed that improves performance over the best individual modality, which is the audio signal.
Keywords: Speaker diarization · FLsD · FLD · Audio-visual fusion
Added 08 Apr 2016
Updated 08 Apr 2016
Type Journal
Year 2016
Where MTA
Authors Nikolaos Sarafianos, Theodoros Giannakopoulos, Sergios Petridis