As computational power increases, tele-immersive applications are an emerging trend. These applications place heavy demands on computational resources because they rely on real-time 3D reconstruction algorithms. Since computer vision developers do not necessarily have parallel programming expertise, it is important to give them tools to express computer vision algorithms naturally while retaining high efficiency on modern GPU and large-scale multi-core platforms. In this paper, we describe our efforts to optimize a tele-immersion application by tuning it for GPU and multi-core platforms. Additionally, we introduce a method that provides portability and high performance while increasing programmer productivity.

Categories and Subject Descriptors