A new spatio-temporal model for simulating bottom-up visual attention is proposed. It is built on numerous important properties of the Human Visual System (HVS). This paper focuses both on the architecture of the model and on its performance. Because the spatial model of bottom-up visual attention has already been defined [1], the temporal dimension is described here in greater detail. A qualitative and a quantitative comparison with human fixations collected with an eye-tracking apparatus are undertaken. The qualitative comparison indicates that the quality of the prediction is very good, whereas the quantitative comparison shows that the best predictor of human fixations is the sum of all visual features (achromatic, chromatic and motion).