Few systems combine Embodied Conversational Agents (ECAs) with multimodal input. This research aims to model the behavior of adults and children during multimodal interaction with ECAs. In a Wizard-of-Oz setup, users were video-recorded while interacting with 2D ECAs in a game scenario using speech and pen as input modes. We found that the verbal interaction of both groups with the ECAs was shaped by frequent social cues and natural human-human syntax. Multimodality accounted for 21% of inputs and was used to integrate conversational and social aspects (conveyed by speech) into task-oriented actions (performed with the pen). We closely examined the temporal and semantic integration of the modalities: most of the time, speech and gesture overlapped and produced complementary or redundant messages; children also tended to produce concurrent multimodal inputs as a way of doing several things at the same time. Design implications of our results for multimodal bidirectional ECAs and game systems are discussed.