Lip reading provides useful information for speech perception and language understanding, especially when the auditory speech is degraded. However, many current automatic lip reading systems impose restrictions on users. In this paper, we present our research efforts in the Interactive System Laboratory toward unrestricted lip reading. We first introduce a top-down approach to automatically track and extract lip regions. This technique makes it possible to acquire visual information in real time without limiting the user's freedom of movement. We then discuss normalization algorithms to preprocess images for different lighting conditions: global illumination and side illumination. We also compare different visual preprocessing methods, such as raw images, Linear Discriminant Analysis (LDA), and Principal Component Analysis (PCA). We demonstrate the feasibility of the proposed methods through the development of a modular system for flexible human-computer interaction via both visual and acoustic ...
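As a rough illustration of the PCA-style visual preprocessing compared above, the sketch below projects flattened lip-region frames onto their top principal components ("eigen-lips"). The frame count, image size, and number of components are illustrative placeholders, not values from the paper.

```python
import numpy as np

# Hypothetical data standing in for extracted lip-region frames:
# 200 grayscale frames of 24x16 pixels, one flattened frame per row.
rng = np.random.default_rng(0)
n_frames, h, w = 200, 24, 16
X = rng.random((n_frames, h * w))

# Center the data and obtain principal components via SVD.
mean = X.mean(axis=0)
Xc = X - mean
_, _, Vt = np.linalg.svd(Xc, full_matrices=False)

# Project each frame onto the top k components to get a compact
# visual feature vector for the recognizer.
k = 32
features = Xc @ Vt[:k].T
print(features.shape)  # (200, 32)
```

In practice the reduced feature vectors, rather than raw pixel values, would be fed to the recognizer; LDA differs in that it uses class labels to choose discriminative directions instead of directions of maximal variance.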