Speaker diarization is the task of partitioning an input stream into speaker-homogeneous regions, in other words, determining "who spoke when." While approaches to this problem have traditionally relied entirely on the audio stream, the availability of accompanying video streams in recent diarization corpora has prompted the study of methods based on multimodal audio-visual features. In this work, we propose the use of robust video features based on oriented optical flow histograms. Using the state-of-the-art ICSI diarization system, we show that, when combined with standard audio features, these features improve the diarization error rate by 14% over an audio-only baseline.
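To make the video feature concrete, the sketch below shows one plausible way to compute an oriented optical flow histogram for a pair of video frames using OpenCV. It is illustrative only: the abstract does not specify the flow estimator, bin count, block layout, or normalization actually used in the paper, so all of those choices here are assumptions.

```python
import numpy as np
import cv2


def flow_orientation_histogram(prev_gray, curr_gray, n_bins=8, mag_threshold=0.5):
    """Histogram of optical-flow orientations between two grayscale frames.

    Illustrative sketch only; the paper's exact feature extraction
    (flow method, bin count, spatial blocks, normalization) is assumed here.
    """
    # Dense optical flow (Farneback): per-pixel (dx, dy) displacements.
    flow = cv2.calcOpticalFlowFarneback(
        prev_gray, curr_gray, None,
        0.5,   # pyramid scale
        3,     # pyramid levels
        15,    # averaging window size
        3,     # iterations per level
        5,     # pixel neighborhood for polynomial expansion
        1.2,   # Gaussian sigma for the expansion
        0)     # flags

    # Convert to magnitude/angle and keep only pixels with appreciable motion.
    mag, ang = cv2.cartToPolar(flow[..., 0], flow[..., 1])
    mask = mag > mag_threshold

    # Orientation histogram weighted by flow magnitude, L1-normalized.
    hist, _ = np.histogram(ang[mask], bins=n_bins, range=(0.0, 2.0 * np.pi),
                           weights=mag[mask])
    total = hist.sum()
    return hist / total if total > 0 else hist
```

In a diarization setting, such a histogram would typically be computed per frame (or per short video block) and synchronized with the audio feature stream before being fed to the clustering system.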