We propose to use a visual object (e.g., the baseball catcher) detection algorithm to find local, semantic objects in video frames in addition to an audio classification algorit...
Time-frequency representations of audio signals often resemble texture images. This paper derives a simple audio classification algorithm based on treating sound spectrograms as ...