
PVLDB
2010

SigMatch: Fast and Scalable Multi-Pattern Matching

Multi-pattern matching involves matching a data item against a large database of “signature” patterns. Existing algorithms for multi-pattern matching do not scale well as the size of the signature database increases. In this paper, we present sigMatch, a fast, versatile, and scalable technique for multi-pattern signature matching. At its heart, sigMatch organizes the signature database into a (processor) cache-efficient q-gram index structure, called the sigTree. The sigTree groups patterns based on common sub-patterns, such that signatures that don’t match can be quickly eliminated from the matching process. The sigTree also uses parallel Bloom filters and a technique to reduce imbalances across groups, for improved performance. Using extensive empirical evaluation across three diverse domains, we show that sigMatch often outperforms existing methods by an order of magnitude or more.
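The general idea of filtering with a q-gram index before verifying candidates can be illustrated with a minimal Python sketch. This is not the sigTree from the paper: it assumes, purely for illustration, that each pattern is grouped by its first q-gram, that a single small Bloom filter summarizes the group keys, and that patterns and data are byte strings. The names BloomFilter, build_index, and find_matches are hypothetical.

```python
import hashlib
from collections import defaultdict


class BloomFilter:
    """Tiny Bloom filter; the sizes are illustrative, not tuned."""

    def __init__(self, num_bits=1024, num_hashes=3):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = 0  # a Python int used as a bit array

    def _positions(self, item):
        # Derive num_hashes bit positions by salting SHA-256 with a seed byte.
        for seed in range(self.num_hashes):
            digest = hashlib.sha256(bytes([seed]) + item).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, item):
        for pos in self._positions(item):
            self.bits |= 1 << pos

    def might_contain(self, item):
        # False means definitely absent; True may be a false positive.
        return all((self.bits >> pos) & 1 for pos in self._positions(item))


def build_index(patterns, q=2):
    """Group patterns by their first q-gram (an assumption of this sketch)."""
    groups = defaultdict(list)
    bloom = BloomFilter()
    for pattern in patterns:
        if len(pattern) < q:
            raise ValueError("pattern shorter than q")
        key = pattern[:q]
        groups[key].append(pattern)
        bloom.add(key)
    return groups, bloom


def find_matches(data, groups, bloom, q=2):
    """Return sorted (offset, pattern) pairs for each pattern occurrence."""
    matches = []
    for i in range(len(data) - q + 1):
        gram = data[i:i + q]
        if not bloom.might_contain(gram):
            continue  # no pattern starts with this q-gram, skip cheaply
        # Verify only the patterns in the group that shares this q-gram.
        for pattern in groups.get(gram, ()):
            if data.startswith(pattern, i):
                matches.append((i, pattern))
    return sorted(matches)
```

For example, indexing the patterns b"abc", b"abd", and b"xyz" and scanning b"zzabdxyzab" reports b"abd" at offset 2 and b"xyz" at offset 5. Most window positions are rejected by the Bloom filter check alone, so full comparisons run only against the small group that shares the current q-gram.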
Added 30 Jan 2011
Updated 30 Jan 2011
Type Journal
Year 2010
Where PVLDB
Authors Ramakrishnan Kandhan, Nikhil Teletia, Jignesh M. Patel