Voxel-based Viterbi Active Speaker Tracking (V-VAST) with best view selection for video lecture post-production

An automated system is presented for reducing a multi-view lecture recording to a single-view video containing a best-view summary of the active speakers. The system uses skin color detection and voxel-based analysis to identify likely speaker locations. Speech activity at each candidate speaker position is then analyzed using time-delay estimates from multiple microphones. The Viterbi algorithm is used to estimate the track of the active speaker that maximizes the observed speech activity. This novel approach is termed Voxel-based Viterbi Active Speaker Tracking (V-VAST) and is shown to track speakers with an accuracy of 0.23 m. Using the tracking information, the system then selects, from the available camera views, the most frontal face view of the active speaker for display.
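
The Viterbi step described above can be illustrated with a minimal sketch. It is not the authors' formulation: the abstract does not give the state space or the transition model, so the code assumes a per-frame speech-activity score for each candidate position and a hypothetical transition cost proportional to the distance moved between frames. The names `viterbi_track`, `activity`, `positions` and `jump_penalty` are illustrative only.

```python
import math


def viterbi_track(activity, positions, jump_penalty=1.0):
    """Return the sequence of position indices with the highest total score.

    activity:     list of frames, each a list of speech-activity scores,
                  one score per candidate position (assumed input format).
    positions:    list of candidate coordinates in metres, e.g. (x, y).
    jump_penalty: assumed weight on the distance moved between frames.
    """
    n_frames, n_pos = len(activity), len(positions)
    score = list(activity[0])   # best score of any path ending at each position
    back = []                   # back[t-1][j]: best predecessor of j at frame t

    for t in range(1, n_frames):
        new_score, pointers = [], []
        for j in range(n_pos):
            # Pick the predecessor i that maximizes score minus movement cost.
            best_i = max(
                range(n_pos),
                key=lambda i: score[i]
                - jump_penalty * math.dist(positions[i], positions[j]),
            )
            move_cost = jump_penalty * math.dist(positions[best_i], positions[j])
            new_score.append(score[best_i] - move_cost + activity[t][j])
            pointers.append(best_i)
        score = new_score
        back.append(pointers)

    # Backtrack from the best final position.
    path = [max(range(n_pos), key=lambda j: score[j])]
    for pointers in reversed(back):
        path.append(pointers[path[-1]])
    path.reverse()
    return path


# Three candidate positions; frame 1 has a spurious peak at the far position.
positions = [(0.0, 0.0), (1.0, 0.0), (5.0, 0.0)]
activity = [
    [1.0, 0.2, 0.0],
    [0.1, 0.0, 0.6],
    [0.9, 0.3, 0.0],
]
smooth_track = viterbi_track(activity, positions, jump_penalty=1.0)
greedy_track = viterbi_track(activity, positions, jump_penalty=0.0)
```

With the movement penalty enabled, the track stays at the first position through the spurious peak. With the penalty set to zero, the result reduces to picking the highest-scoring position in each frame independently, which jumps to the far position for one frame.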

Damien Kelly, Anil Kokaram, Frank Boland

Active Speaker | ICASSP 2011 | Signal Processing | Speech Activity | Viterbi Active Speaker

Post Info

Added	20 Aug 2011
Updated	20 Aug 2011
Type	Conference
Year	2011
Where	ICASSP
Authors	Damien Kelly, Anil Kokaram, Frank Boland
