As amounts of publicly uvuiluble video dutu grow, the need to query this dutu efficiently becomes signijcunt. Consequently, content-bused retrievul of video datu turns out to be U chullenging and important problem. In this puper, we uddress the speclfic uspect of inferring semuntics automatically fiom ruw video dutu. In purticulur, we introduce U new video dutu model thut supports the integruted use of two different upprouches for mupping low-level feutures to high-level concepts. Firstly, the model is extended with U rule-bused upproach that supports sputio-temporalformulizution of high-level concepts, and then with a stochustic upprouch. Furthermore, results on reul tennis video dutu are presented demonsrrating the vulidity of both approaches, as well us udvuntuges of their integruted use.