High-performance regular expression scanning on the Cell/B.E. processor

Matching regular expressions (regexps) is a very common workload. For example, tokenization, which consists of recognizing words or keywords in a character stream, appears in every search engine indexer. Tokenization also consumes 30% or more of most XML processors’ execution time and represents the first stage of any programming language compiler. Despite the multi-core revolution, regexp scanner generators like flex haven’t changed much in 20 years, and they do not exploit the power of recent multi-core architectures (e.g., multiple threads and wide SIMD units). This is unfortunate, especially given the pervasive importance of search engines and the fast growth of our digital universe. Indexing such data volumes demands precisely the processing power that multi-cores are designed to offer. We present an algorithm and a set of techniques for using multicore features such as multiple threads and SIMD instructions to perform parallel regexp-based tokenization. As a proof of conce...
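
To make the term "regexp-based tokenization" concrete, here is a minimal sequential sketch of a table-driven scanner of the kind a generator such as flex produces: a DFA driven one character at a time, with the longest match winning. It is not the parallel algorithm the paper presents. The token classes, state names, and the `tokenize` function are hypothetical and chosen only for this illustration.

```python
# Hypothetical sketch: a tiny table-driven DFA tokenizer (sequential baseline).
# All names, states, and token classes are assumptions for illustration;
# this is not the paper's multi-threaded/SIMD algorithm.

def char_class(c):
    """Map a character to a coarse input class used by the DFA."""
    if c.isalpha() or c == "_":
        return "alpha"
    if c.isdigit():
        return "digit"
    if c.isspace():
        return "space"
    return "other"

START = "start"

# Transition table: (current state, input class) -> next state.
TRANSITIONS = {
    (START, "alpha"): "word",
    (START, "digit"): "number",
    (START, "space"): "space",
    ("word", "alpha"): "word",
    ("word", "digit"): "word",
    ("number", "digit"): "number",
    ("space", "space"): "space",
}

# Accepting states and the token kind they emit (None means "skip").
ACCEPTING = {"word": "WORD", "number": "NUMBER", "space": None}

def tokenize(text):
    """Split text into (kind, lexeme) pairs using longest-match scanning."""
    tokens = []
    pos = 0
    while pos < len(text):
        state = START
        last_accept = None  # (end position, token kind) of the longest match so far
        i = pos
        while i < len(text):
            nxt = TRANSITIONS.get((state, char_class(text[i])))
            if nxt is None:
                break  # no transition: the DFA is stuck, stop extending the match
            state = nxt
            i += 1
            if state in ACCEPTING:
                last_accept = (i, ACCEPTING[state])
        if last_accept is None:
            # No rule matched: emit the single character as its own token.
            tokens.append(("OTHER", text[pos]))
            pos += 1
            continue
        end, kind = last_accept
        if kind is not None:
            tokens.append((kind, text[pos:end]))
        pos = end
    return tokens
```

For example, `tokenize("x1 = 42;")` yields a `WORD` for `x1`, a `NUMBER` for `42`, and `OTHER` tokens for `=` and `;`, with whitespace skipped. The inner loop handles one character per step with a data-dependent table lookup, which is the sequential pattern the abstract says existing scanner generators have not moved beyond.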

Daniele Paolo Scarpazza, Gregory F. Russell

ICS 2009 | Multiple Threads | Search Engines | SIMD Instructions | Theoretical Computer Science

Post Info

Added	20 May 2010
Updated	20 May 2010
Type	Conference
Year	2009
Where	ICS
Authors	Daniele Paolo Scarpazza, Gregory F. Russell
