We present a system that combines multiple visual navigation techniques to achieve GPS-denied, non-line-of-sight SLAM capability for heterogeneous platforms. Our approach builds on several layers of vision algorithms, including sparse frame-to-frame structure from motion (visual odometry), a Kalman filter that fuses the visual estimates with inertial measurement unit (IMU) data, and distributed visual landmark matching with geometric consistency verification. We apply these techniques to implement a tag-along robot, in which a human operator leads the way and a robot autonomously follows. We show results for a real-time implementation of such a system operating under realistic field constraints on CPU power and network resources.
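To make the fusion layer concrete, the following is a minimal sketch, not the authors' implementation, of the kind of Kalman-filter fusion described above: IMU accelerations drive the prediction step and visual-odometry position estimates provide the corrections. The state, class name, and noise parameters are illustrative assumptions.

```python
import numpy as np

class VioKalmanFilter:
    """Toy single-axis Kalman filter fusing IMU acceleration with VO position."""

    def __init__(self, dt, accel_noise=0.5, vo_noise=0.05):
        self.dt = dt
        self.x = np.zeros(2)                   # state: [position, velocity]
        self.P = np.eye(2)                     # state covariance
        self.F = np.array([[1.0, dt],
                           [0.0, 1.0]])        # constant-velocity motion model
        self.B = np.array([0.5 * dt**2, dt])   # acceleration (control) input
        self.Q = accel_noise**2 * np.outer(self.B, self.B)  # process noise
        self.H = np.array([[1.0, 0.0]])        # VO observes position only
        self.R = np.array([[vo_noise**2]])     # VO measurement noise

    def predict(self, accel):
        """Propagate the state with one IMU acceleration sample."""
        self.x = self.F @ self.x + self.B * accel
        self.P = self.F @ self.P @ self.F.T + self.Q

    def update(self, vo_position):
        """Correct the state with a visual-odometry position estimate."""
        y = vo_position - self.H @ self.x           # innovation
        S = self.H @ self.P @ self.H.T + self.R     # innovation covariance
        K = self.P @ self.H.T @ np.linalg.inv(S)    # Kalman gain
        self.x = self.x + (K @ y).ravel()
        self.P = (np.eye(2) - K @ self.H) @ self.P

# Example: IMU prediction at 100 Hz, VO correction every 10th step.
kf = VioKalmanFilter(dt=0.01)
for k in range(100):
    kf.predict(accel=0.1)                       # simulated 0.1 m/s^2 acceleration
    if k % 10 == 9:
        t = 0.01 * (k + 1)
        kf.update(np.array([0.05 * t**2]))      # simulated VO position for that motion
print("fused position, velocity:", kf.x)
```

A fielded system would extend this to a full 6-DOF state and handle the asynchronous, lower-rate arrival of visual-odometry updates relative to the IMU stream; the structure of the predict/update loop stays the same.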