—With the development of video-sharing websites, P2P, micro-blog, mobile WAP websites, and so on, sensitive videos can be more easily accessed. Effective sensitive video recognition is necessary for web content security. Among web sensitive videos, this paper focuses on violent and horror videos. Based on color emotion and color harmony theories, we extract visual emotional features from videos. A video is viewed as a bag and each shot in the video is represented by a key frame which is treated as an instance in the bag. Then, we combine multi-instance learning (MIL) with sparse coding to recognize violent and horror videos. The resulting MIL-based model can be updated online to adapt to changing web environments. We propose a cost-sensitive context-aware multi-instance sparse coding (MI-SC) method, in which the contextual structure of the key frames is modeled using a graph, and fusion between audio and visual features is carried out by extending the classic sparse coding into cost-...