An image-based approach provides an efficient way to perform visual speech synthesis. In an image-based visual speech synthesis system, a small set of lip images, called visemes, is used to generate arbitrary new sentences. Many existing approaches select visemes manually. In this paper, we propose a method that lets the system select visemes automatically by minimizing the synthesis error. Experiments demonstrate the feasibility of the proposed method. We also describe an application of image-based visual speech synthesis to a multimodal communication agent for a translation task, in which two people who speak different languages can talk to each other over the Internet.