Abstract— Robotics researchers are often faced with realtime constraints, and for that reason algorithmic and implementation-level optimization can dramatically increase the overall performance of a robot. In this paper we illustrate how a substantial run-time gain can be achieved by taking advantage of the extended instruction sets found in modern processors, in particular the SSE1 and SSE2 instruction sets. We present an SSE version of Monte Carlo Localization that results in an impressive 9x speedup over an optimized scalar implementation. In the process, we discuss SSE implementations of atan, atan2 and exp that achieve up to a 4x speedup in these mathematical operations alone.