In the H.264/AVC coding standard, motion estimation (ME) may use multiple reference frames to better exploit the temporal redundancy in a video sequence. Although this further reduces the motion compensation error, it also greatly increases computational complexity. In this paper, we propose a statistical learning approach to reduce the computation involved in multi-reference motion estimation. Representative features are extracted in advance to build a learning model. An off-line pre-classification approach is then used to determine the best number of reference frames according to features measured at run time. As a result, motion estimation is performed only on the reference frames that the learning model deems necessary. Experimental results show that the proposed method is about three times faster than the conventional fast ME algorithm, while the degradation in video quality is negligible.
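The following is a minimal sketch of the idea outlined above, not the method evaluated in this paper. It assumes that the classifier output is a count of reference frames to search, starting from the nearest one. The feature `colocated_sad`, the thresholds 16 and 64, the function names, and the use of a plain full search are all hypothetical choices made for illustration; a trained model and a fast ME algorithm would take their place in practice.

```python
def sad(block_a, block_b):
    """Sum of absolute differences between two equally sized 2-D blocks."""
    return sum(abs(a - b)
               for row_a, row_b in zip(block_a, block_b)
               for a, b in zip(row_a, row_b))


def get_block(frame, top, left, size):
    """Extract a size x size block whose top-left corner is (top, left)."""
    return [row[left:left + size] for row in frame[top:top + size]]


def search_one_reference(cur_block, ref_frame, top, left, size, search_range):
    """Full search in one reference frame; returns (best_sad, (dy, dx))."""
    height, width = len(ref_frame), len(ref_frame[0])
    best = (float("inf"), (0, 0))
    for dy in range(-search_range, search_range + 1):
        for dx in range(-search_range, search_range + 1):
            y, x = top + dy, left + dx
            # Skip candidates that fall outside the reference frame.
            if 0 <= y <= height - size and 0 <= x <= width - size:
                cost = sad(cur_block, get_block(ref_frame, y, x, size))
                if cost < best[0]:
                    best = (cost, (dy, dx))
    return best


def predict_num_refs(features, max_refs):
    """Hypothetical stand-in for the off-line trained classifier.

    The thresholds are invented for illustration only.
    """
    colocated_sad = features["colocated_sad"]
    if colocated_sad < 16:
        return 1
    if colocated_sad < 64:
        return min(2, max_refs)
    return max_refs


def multi_ref_me(cur_frame, ref_frames, top, left, size=4, search_range=2):
    """Search only as many reference frames as the classifier predicts.

    ref_frames is ordered from nearest to farthest in time.
    Returns ((best_sad, ref_index, (dy, dx)), num_refs_searched).
    """
    cur_block = get_block(cur_frame, top, left, size)
    # Run-time feature (assumed): SAD against the co-located block
    # in the nearest reference frame.
    features = {
        "colocated_sad": sad(cur_block,
                             get_block(ref_frames[0], top, left, size))
    }
    num_refs = predict_num_refs(features, len(ref_frames))
    best = None
    for ref_idx in range(num_refs):
        cost, mv = search_one_reference(cur_block, ref_frames[ref_idx],
                                        top, left, size, search_range)
        if best is None or cost < best[0]:
            best = (cost, ref_idx, mv)
    return best, num_refs
```

In this sketch the saving comes from the loop bound: a block that is already well predicted by the nearest reference frame is searched in one frame only, whereas a poorly predicted block falls back to searching every available reference frame.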