Regular Expression Matching with Multi-Strings

Regular expression matching is a key task (and often a computational bottleneck) in a variety of software tools and applications. Examples include the standard grep and sed utilities, scripting languages such as Perl, internet traffic analysis, XML querying, and protein searching. A regular expression is built by combining characters with the union, concatenation, and Kleene star operators. The length m of a regular expression is proportional to the number of characters in it. However, often the initial operation is to concatenate characters into fairly long strings, e.g., if we search for certain combinations of words in a firewall. As a result, the number k of strings in the regular expression is significantly smaller than m. Our main result is a new algorithm that essentially replaces m with k in the complexity bounds for regular expression matching. More precisely, after an O(m log k) time and O(m) space preprocessing of the expression, we can match it in a string presented as a stream of characters ...
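To make the distinction between m and k concrete, the following is a minimal illustrative sketch, not the algorithm from the paper. It assumes a simplified pattern syntax in which the only operators are `|`, `*`, `(` and `)`, and it uses the number of character occurrences as a stand-in for m. The function name `count_m_and_k` and the sample pattern are hypothetical and chosen only for illustration.

```python
from typing import List, Tuple


def count_m_and_k(pattern: str) -> Tuple[int, int]:
    """Return (m, k) for a pattern in a simplified syntax.

    m: number of literal character occurrences (a proxy for expression length).
    k: number of maximal strings formed purely by concatenating characters.
    Assumption: the only operators are | * ( ) and there is no escaping.
    """
    operators = set("|*()")
    strings: List[str] = []
    run = ""
    for ch in pattern:
        if ch not in operators:
            run += ch  # extend the current run of concatenated characters
            continue
        if ch == "*" and len(run) > 1:
            # The star binds only to the last character, so that character
            # forms its own string, separate from the preceding run.
            strings.append(run[:-1])
            run = run[-1]
        if run:
            strings.append(run)
            run = ""
    if run:
        strings.append(run)
    m = sum(len(s) for s in strings)
    k = len(strings)
    return m, k


if __name__ == "__main__":
    print(count_m_and_k("(virus|worm)(download|attach)*"))
```

For the sample pattern, the strings are "virus", "worm", "download" and "attach", so the sketch reports m = 23 and k = 4. This is the kind of gap between m and k that the abstract describes for word-based patterns.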

Philip Bille, Mikkel Thorup

Character Class Interval | Discrete Algorithms | Regular Expression Matching | SODA 2010 | Variable Length Gaps

Post Info

Added	01 Mar 2010
Updated	02 Mar 2010
Type	Conference
Year	2010
Where	SODA
Authors	Philip Bille, Mikkel Thorup
