Crossmodal Matching of Speakers Using Lip and Voice Features in Temporally Non-Overlapping Audio and Video Streams