Voxel-based Viterbi Active Speaker Tracking (V-VAST) with best view selection for video lecture post-production