While many systems are available for audio-visual collaboration between people and for collaboration on shared data, few systems support collaboration on physical objects. In this paper, we present WebDOVE, a system designed to address the needs of collaborative physical tasks. WebDOVE supports live video streaming and pen-based gesture recognition for multi-party, bidirectional communication using inexpensive web cameras. WebDOVE allows distributed collaborators to draw over video streams to produce and interpret pointing and representational gestures as readily as they do in face-to-face settings. To accommodate the potentially diverse platform requirements of different participants, WebDOVE is designed as a web-based, platform-independent, and browser-independent collaboration solution. We show through experiments that, despite its platform independence, WebDOVE requires only moderate network bandwidth and CPU load, which makes it a practical solution for real-world applications.