Video genre identification methods are frequently based on image or motion analysis, which are relatively time-consuming processes. Such approaches are suited to batch processing, whereas as-soon-as-possible identification requires faster methods. In this paper, we investigate the use of audio-only methods for on-the-fly video classification. We propose to use several acoustic feature streams, and we evaluate various schemes for combining them at the frame level or at the score level. Results are compared to those obtained by humans as a function of listening duration. Although the system based on model combination slightly outperforms humans on very early detection, humans remain significantly more accurate on long sessions.