Computational models of human visual attention split visual features into several independent channels, yielding one map per channel. Combining these maps is difficult because they have different dynamic ranges and distributions. Whenever several maps are considered, such a combination is mandatory in order to compute a single measure of interest for each location, regardless of which features contributed to the salience. This paper proposes several cue combination strategies for spatial cues as well as for temporal saliency. Finally, user tests on still image and video databases single out one operator.
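
To make the combination problem concrete, the following is a minimal sketch, not one of the operators evaluated in the paper. It assumes two hypothetical feature maps with very different dynamic ranges, rescales each to [0, 1] with a simple min-max normalization, and fuses them with either a pixel-wise mean or a pixel-wise maximum. The function names `rescale` and `fuse`, and the choice of these two operators, are illustrative assumptions.

```python
import numpy as np


def rescale(feature_map):
    """Min-max rescale a map to [0, 1]; a constant map becomes all zeros."""
    m = np.asarray(feature_map, dtype=float)
    span = m.max() - m.min()
    if span == 0:
        return np.zeros_like(m)
    return (m - m.min()) / span


def fuse(maps, operator="mean"):
    """Fuse same-shaped maps into one saliency map (illustrative operators only)."""
    stack = np.stack([rescale(m) for m in maps])
    if operator == "mean":
        return stack.mean(axis=0)
    if operator == "max":
        return stack.max(axis=0)
    raise ValueError(f"unknown operator: {operator}")


# Hypothetical maps: one spans 0..255, the other 0..0.2.
intensity = np.array([[0, 255], [0, 0]])
motion = np.array([[0.0, 0.0], [0.2, 0.1]])

saliency_mean = fuse([intensity, motion], "mean")
saliency_max = fuse([intensity, motion], "max")
```

Without the rescaling step, the map with the larger range would dominate the mean regardless of how informative it is. After rescaling, the two operators still disagree on how strongly a location highlighted by a single channel should stand out, which is the kind of difference a comparison of combination strategies has to resolve.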