We present a general approach for automatically matching electronic slides to videos of corresponding presentations for use in distance learning and video proceedings of conferences. We deal with a large variety of videos, various frame compositions and color balances, arbitrary slides sequence and with dynamic cameras switching, pan, tilt and zoom. To achieve high accuracy, we develop a two-phases process with unsupervised scene background modelling. In the first phase, scale invariant feature transform (SIFT) keypoints are applied to frame to slide matching, under constraint projective transformation (constraint homography) using a random sample consensus (RANSAC). Successful first-phase matches are then used to automatically build a scene background model. In the second phase the background model is applied to the remaining unmatched frames to boost the matching performance for difficult cases such as wide field of view camera shots where the slide shows as a small portion of th...