We present a novel variational method for the simultaneous estimation of dense scene flow and structure from stereo sequences. In contrast to existing approaches that rely on a fully calibrated camera setup, we assume that only the intrinsic camera parameters are known. To couple the estimation of motion, structure and geometry, we propose a joint energy functional that integrates spatial and temporal information from two subsequent image pairs subject to an unknown stereo setup. We further introduce a normalisation of image and stereo constraints such that deviations from model assumptions can be interpreted in a geometrical way. Finally, we suggest a separate discontinuity-preserving regularisation to improve the accuracy. Experiments on calibrated and uncalibrated data demonstrate the excellent performance of our approach. We even outperform recent techniques for the rectified case that make explicit use of the simplified geometry.