We introduce a novel and inexpensive approach for the temporal alignment of speech to highly imperfect transcripts from automatic speech recognition (ASR). Transcripts are generat...
—As it is true for human perception that we gather information from different sources in natural and multi-modality forms, learning from multi-modalities has become an effective ...
Modeling visual concepts using supervised or unsupervised machine learning approaches are becoming increasing important for video semantic indexing, retrieval, and filtering appli...
Abstract From lyrics-display on electronic music players and Karaoke videos to surtitles for live Chinese opera performance, one feature is common to all these everyday functionali...
Digital watermarking is a process that embeds an imperceptible signature or watermark in a digital file containing audio, image, text or video data. The watermark is later used to...
Arun Kejariwal, Sumit Gupta, Alexandru Nicolau, Ni...