In this paper, we present a framework for 6D absolute-scale motion and structure estimation of a multi-camera system in challenging indoor environments. It operates in real time and uses information from two cameras with non-overlapping fields of view. Monocular visual odometry, which supplies up-to-scale 6D motion estimates, runs in each camera, and the metric scale is recovered via a linear solution that imposes the known static transformation between the two sensors. Finally, the redundancy in the motion estimates is exploited by statistically fusing them into an optimal 6D metric result. The proposed technique is robust to outliers and continuously delivers a reasonable estimate of the scale factor. The quality of the framework is demonstrated by a concise evaluation on indoor datasets, including a comparison to accurate ground-truth data provided by an external motion-tracking system.
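The following is a minimal sketch of one common way to set up such a linear scale solution. It is not claimed to be the authors' exact formulation. It assumes the standard hand-eye constraint A X = X B, where A = (R_A, t_A) and B = (R_B, t_B) are the relative motions of camera 1 and camera 2 between two frames and X = (R_X, t_X) is the known pose of camera 2 in camera 1's frame. If visual odometry gives only the translation directions a and b, with t_A = lambda1 * a and t_B = lambda2 * b, the translation part of the constraint becomes lambda1 * a - lambda2 * R_X b = (I - R_A) t_X. These are three linear equations in two unknown scales, solved here by least squares. The names `recover_metric_scales` and `mat_vec` are hypothetical.

```python
import math
from typing import List, Sequence, Tuple

Vec3 = Sequence[float]
Mat3 = Sequence[Sequence[float]]


def mat_vec(R: Mat3, v: Vec3) -> List[float]:
    """Multiply a 3x3 matrix by a 3-vector (hypothetical helper)."""
    return [sum(R[i][j] * v[j] for j in range(3)) for i in range(3)]


def recover_metric_scales(
    R_A: Mat3, a: Vec3, b: Vec3, R_X: Mat3, t_X: Vec3
) -> Tuple[float, float]:
    """Least-squares solution of lambda1*a - lambda2*(R_X b) = (I - R_A) t_X.

    R_A     : rotation of camera 1 between two frames (from its visual odometry).
    a, b    : up-to-scale translation directions of camera 1 and camera 2.
    R_X, t_X: assumed-known extrinsic pose of camera 2 in camera 1's frame.
    Returns (lambda1, lambda2), the metric lengths of the two translations.
    """
    m1 = list(a)                        # column of M multiplying lambda1
    m2 = [-x for x in mat_vec(R_X, b)]  # column of M multiplying lambda2
    R_A_tX = mat_vec(R_A, t_X)
    c = [t_X[i] - R_A_tX[i] for i in range(3)]  # right-hand side (I - R_A) t_X

    # If (I - R_A) t_X vanishes (e.g. pure translation, R_A = I), the
    # constraint only fixes the ratio of the scales, not their metric values.
    if math.sqrt(sum(x * x for x in c)) < 1e-9:
        raise ValueError("scale unobservable: (I - R_A) t_X is (near) zero")

    # Normal equations (M^T M) s = M^T c for the 3x2 system M s = c.
    g11 = sum(x * x for x in m1)
    g12 = sum(x * y for x, y in zip(m1, m2))
    g22 = sum(y * y for y in m2)
    h1 = sum(x * y for x, y in zip(m1, c))
    h2 = sum(x * y for x, y in zip(m2, c))
    det = g11 * g22 - g12 * g12
    if abs(det) < 1e-12:
        raise ValueError("degenerate: a and R_X b are parallel")

    # Closed-form inverse of the 2x2 Gram matrix.
    lam1 = (g22 * h1 - g12 * h2) / det
    lam2 = (g11 * h2 - g12 * h1) / det
    return lam1, lam2
```

The sketch uses the 2x2 normal equations because the system is tiny. In a real pipeline, this per-frame estimate would typically be wrapped in outlier rejection and then fused over time, which the sketch omits. The two explicit checks show that the scale becomes unobservable when camera 1 undergoes no rotation.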