Recognizing and understanding a person's emotions has long been regarded as one of the most important problems in human-computer interaction. In this paper, we present a multimodal system that supports emotion recognition through the combined analysis of visual and acoustic features. Our main achievement is that this bimodal method effectively extends the set of recognizable emotion categories beyond what visual or acoustic feature analysis can achieve alone. We also show that by carefully fusing the bimodal features, the recognition precision for each emotion category exceeds the limit set by either single modality, visual or acoustic. Moreover, we believe our system is closer to real human perception and experience, and will therefore bring emotion recognition closer to practical application in the future.