

Visual Event Recognition in Videos by Learning from Web Data

We propose a visual event recognition framework for consumer-domain videos that leverages a large number of loosely labeled web videos (e.g., from YouTube). First, we propose a new aligned space-time pyramid matching method to measure the distance between two video clips, where each video clip is divided into space-time volumes over multiple levels. We calculate the pairwise distances between any two volumes and further integrate the information from different volumes with the Integer-flow Earth Mover’s Distance (EMD) to explicitly align the volumes. Second, we propose a new cross-domain learning method to 1) fuse the information from multiple pyramid levels and features (i.e., space-time features and static SIFT features) and 2) cope with the considerable variation in feature distributions between videos from the two domains (i.e., the web domain and the consumer domain). For each pyramid level and each type of local feature, we train a set of SVM classifiers based on the combined train...
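The alignment step can be illustrated with a minimal sketch. It is not the authors' implementation and rests on labeled assumptions: each space-time volume is summarized by a single fixed-length vector and compared with Euclidean distance (a hypothetical stand-in for the paper's volume distance), and both clips have the same number R of volumes, each carrying weight 1/R. Under these assumptions, restricting every flow to 0 or 1/R (equivalently, weights scaled to 1 and flows in {0, 1}) turns the EMD transport problem into a one-to-one assignment between volumes. SciPy's `linear_sum_assignment` solves that assignment exactly, and the EMD becomes the mean distance over matched pairs. The function names and toy data are hypothetical.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment


def pairwise_volume_distances(vols_a, vols_b):
    """Distance between every volume of clip A and every volume of clip B.

    Hypothetical simplification: each space-time volume is summarized by one
    fixed-length vector (e.g. a histogram of its local features), and volumes
    are compared with Euclidean distance.
    """
    # Broadcast to an (R_a, R_b, dim) array of differences.
    diff = vols_a[:, None, :] - vols_b[None, :, :]
    return np.sqrt((diff ** 2).sum(axis=-1))


def integer_flow_emd(dist):
    """Integer-flow EMD between two clips with R volumes each.

    Assumes every volume carries weight 1/R and each flow is either 0 or 1/R,
    so the transport problem is a one-to-one assignment of A's volumes to B's.
    Returns the EMD (mean matched distance) and, for each volume i of A,
    the index of the volume of B it is aligned with.
    """
    rows, cols = linear_sum_assignment(dist)  # minimum-cost perfect matching
    return dist[rows, cols].mean(), cols


# Toy example: one pyramid level split into 8 volumes with 5-dim descriptors.
rng = np.random.default_rng(0)
vols_a = rng.random((8, 5))
perm = rng.permutation(8)
vols_b = vols_a[perm]  # same content, but the volumes appear in other positions

dist = pairwise_volume_distances(vols_a, vols_b)
aligned, match = integer_flow_emd(dist)
fixed = np.trace(dist) / len(dist)  # unaligned: volume i compared only with volume i

print(f"aligned distance: {aligned:.3f}, fixed-position distance: {fixed:.3f}")
```

In this toy pair, clip B contains exactly the same volumes as clip A, only rearranged (as if the event happened at a different place or time). The aligned distance is therefore zero, while the fixed-position comparison is larger whenever any volume has moved. This is the intuition behind explicitly aligning volumes rather than comparing them in place.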
Lixin Duan, Dong Xu, Wai-Hung Tsang, Jiebo Luo
Added: 08 Apr 2010
Updated: 14 May 2010
Type: Conference
Year: 2010
Where: CVPR