Efficient matching of incoming events to persistent queries is fundamental to event pattern matching, complex event processing, and publish/subscribe systems. Recent processing engines based on non-deterministic finite automata (NFAs) have demonstrated scalability in the number of queries that can be efficiently executed on a single machine. However, existing NFA-based systems are limited to processing events on a single machine. Consequently, their event processing capacity cannot be increased by adding more machines. In this paper, we present an experimental evaluation of different methods for distributing an NFA-based event processing system across multiple machines in a cluster. Our results show that careful input stream partitioning gives close to linear performance scaleup for CPU-bound workloads.

Categories and Subject Descriptors
H.2 [Database Management]: Systems—Query Processing

General Terms
Experimentation, design, performance

Keywords
Publish-subscribe, C...