Computers have been widely deployed in our daily lives, but human-computer interaction is still far from intuitive. Researchers intend to resolve this shortcoming by augmenting traditional systems with human-like interaction capabilities. Knowledge about human emotion, behavior, and intention is necessary to construct convenient interaction mechanisms. Today, dedicated hardware often infers the emotional state from measurements of the human body. Much as humans interpret facial expressions, our approach accomplishes this task from video acquired with standard hardware that does not interfere with people. It exploits model-based techniques that accurately localize facial features, seamlessly track them through image sequences, and finally interpret the visible information. We make use of state-of-the-art techniques and specifically adapt most of the components involved to this scenario, which provides high accuracy and real-time capability. We base our experimental evaluation on publ...