In this paper, we propose an approach for classifying TV commercial videos by the category of the advertised product or service (e.g., automobiles, healthcare products). Since automatic speech recognition (ASR) and optical character recognition (OCR) can deliver meaningful textual information related to products or services, we formulate TV commercial video classification as a text categorization problem. However, two challenges arise. First, the background music of TV commercials causes ASR to yield erroneous and deficient transcripts. Second, even if ASR and OCR worked perfectly, the limited textual information in TV commercials does not suffice to train a generic, non-overfitting text categorizer. To address the first issue, our approach draws on external resources to expand the deficient ASR and OCR transcripts. The ASR and OCR output transcripts are parsed to yield a few keywords, on which a Web search is executed to retrieve relevant a...
Yantao Zheng, Lingyu Duan, Qi Tian, Jesse S. Jin