We propose a novel method for detecting hands and hand-held objects in desktop manipulation scenarios. To achieve robust tracking under few constraints, we use multiple image sensors: an RGB camera, a stereo camera, and an IR camera. By combining these sensors, our system achieves robust tracking without prior knowledge of the object, even when people or objects are moving in the background. We experimentally verified the object-tracking performance of each of the three sensors and evaluated the effectiveness of their integration.