This paper presents a stereo computer-vision interface for navigating a 3-D Internet city using body gestures. A wide-baseline stereo pair of cameras is used to obtain 3-D body models of the user’s hands and head in a small desk-area environment. The interface feeds this information to a hidden Markov model (HMM) gesture classifier to reliably recognize the user’s browsing commands. To illustrate the features of this interface, we describe its application to our 3-D Internet browser, which facilitates the recollection of information by organizing and embedding it inside a virtual city through which the user navigates.
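
To make the classification step concrete, the following is a minimal sketch of HMM-based gesture recognition: one model per command, with the observed sequence assigned to the model of highest likelihood under the scaled forward algorithm. It is an illustration under stated assumptions, not the system's implementation. The gesture names, the three-symbol alphabet, and all probabilities are hypothetical, and the 3-D hand and head measurements are assumed to have been quantized into discrete symbols beforehand.

```python
import math

def log_likelihood(obs, start, trans, emit):
    """Return log P(obs | model) for a discrete HMM (scaled forward algorithm).

    obs   : non-empty list of observation symbol indices
    start : start[i]    = P(first state is i)
    trans : trans[i][j] = P(next state is j | current state is i)
    emit  : emit[i][k]  = P(symbol k | state i)
    """
    n = len(start)
    # Forward variables for the first observation.
    alpha = [start[i] * emit[i][obs[0]] for i in range(n)]
    log_p = 0.0
    for t in range(len(obs)):
        if t > 0:
            # Propagate one step through the transitions, then weight by emission.
            alpha = [
                sum(alpha[i] * trans[i][j] for i in range(n)) * emit[j][obs[t]]
                for j in range(n)
            ]
        scale = sum(alpha)
        if scale == 0.0:
            return float("-inf")  # sequence is impossible under this model
        log_p += math.log(scale)
        alpha = [a / scale for a in alpha]  # rescale to avoid underflow
    return log_p

def classify(obs, models):
    """Pick the gesture whose HMM assigns the sequence the highest likelihood."""
    return max(models, key=lambda name: log_likelihood(obs, *models[name]))

# Hypothetical symbols: 0 = hand moving left, 1 = hand at rest, 2 = hand moving right.
# Hypothetical two-state models (state 0 = stroke, state 1 = rest); values are illustrative.
MODELS = {
    "turn_left": (
        [0.9, 0.1],
        [[0.8, 0.2], [0.2, 0.8]],
        [[0.80, 0.15, 0.05], [0.1, 0.8, 0.1]],
    ),
    "turn_right": (
        [0.9, 0.1],
        [[0.8, 0.2], [0.2, 0.8]],
        [[0.05, 0.15, 0.80], [0.1, 0.8, 0.1]],
    ),
}

if __name__ == "__main__":
    print(classify([0, 0, 0, 1], MODELS))
    print(classify([2, 2, 1, 1], MODELS))
```

Working in the log domain with per-step rescaling keeps the likelihoods numerically stable for long gesture sequences. A system operating on continuous 3-D positions would typically replace the discrete emission table with continuous densities.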