Multimodal interaction combines input from multiple sensors, such as pointing devices or speech recognition systems, in order to achieve more fluid and natural interaction. Two-handed interaction has recently been used to enrich graphical interaction. Building applications that use such combined interaction requires new software techniques and frameworks. Using additional devices means that user interface toolkits must be more flexible with regard to input devices and event types. The possibility of parallel interactions must also be taken into account, with consequences for the structure of toolkits. Finally, frameworks must be provided for combining the events and status of several devices. This paper reports on the extensions we made to the direct manipulation interface toolkit Whizz in order to experiment with two-handed interaction. These extensions range from structural adaptations of the toolkit to new techniques for specifying the time-dependent fusion of events.
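To make the notion of time-dependent event fusion concrete, the following is a minimal illustrative sketch, not Whizz's actual API: it pairs events from two hypothetical devices when they arrive within a fixed time window. All names here (`Event`, `FusionMachine`, the `window` parameter) are assumptions introduced for illustration only.

```python
# Illustrative sketch of time-dependent event fusion (hypothetical,
# not the Whizz toolkit's API): events from two devices are merged
# into a combined event when they occur within `window` seconds.

from dataclasses import dataclass
from typing import Optional, Tuple

@dataclass
class Event:
    device: str    # e.g. "mouse" or "trackball" (hypothetical device names)
    kind: str      # e.g. "press", "move"
    time: float    # timestamp in seconds

class FusionMachine:
    """Pairs events from two different devices arriving within `window` seconds."""

    def __init__(self, window: float = 0.2):
        self.window = window
        self.pending: Optional[Event] = None

    def feed(self, ev: Event) -> Optional[Tuple[Event, Event]]:
        """Return a fused event pair, or None while still waiting."""
        if (self.pending is not None
                and ev.device != self.pending.device
                and ev.time - self.pending.time <= self.window):
            pair = (self.pending, ev)
            self.pending = None
            return pair
        # Start (or restart) a fusion attempt with this event.
        self.pending = ev
        return None

fusion = FusionMachine(window=0.2)
print(fusion.feed(Event("mouse", "press", 0.00)))      # None: waiting for partner
print(fusion.feed(Event("trackball", "press", 0.15)))  # fused pair within window
```

The sketch only shows the timing aspect of fusion; a real toolkit would also have to route fused events through its normal dispatch structures, which is part of what the structural adaptations mentioned above address.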