Sciweavers

ICS
2009
Tsinghua U.

High-performance regular expression scanning on the Cell/B.E. processor

Matching regular expressions (regexps) is a very common workload. For example, tokenization, which consists of recognizing words or keywords in a character stream, appears in every search engine indexer. Tokenization also consumes 30% or more of most XML processors’ execution time and represents the first stage of any programming language compiler. Despite the multi-core revolution, regexp scanner generators such as flex have not changed much in 20 years, and they do not exploit the power of recent multi-core architectures (e.g., multiple threads and wide SIMD units). This is unfortunate, especially given the pervasive importance of search engines and the fast growth of our digital universe. Indexing such data volumes demands precisely the processing power that multi-cores are designed to offer. We present an algorithm and a set of techniques for using multi-core features such as multiple threads and SIMD instructions to perform parallel regexp-based tokenization. As a proof of conce...
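To make the term "regexp-based tokenization" concrete, the following is a minimal sketch of a table-driven scanner of the general kind that generators such as flex emit, with independent inputs handed to a thread pool. It is not the authors' algorithm and uses no SIMD. The character classes, the state names, and the helper functions `char_class`, `tokenize`, and `tokenize_many` are all hypothetical, chosen only for illustration.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical states for a toy automaton (assumption, not from the paper).
START, WORD, NUMBER, DEAD = 0, 1, 2, -1

# TRANSITIONS[state][char_class] -> next state.
# Character classes: 0 = letter, 1 = digit, 2 = anything else (separator).
TRANSITIONS = {
    START:  [WORD, NUMBER, DEAD],
    WORD:   [WORD, WORD, DEAD],     # words may contain digits after the first letter
    NUMBER: [DEAD, NUMBER, DEAD],   # numbers are digits only
}
TOKEN_KIND = {WORD: "WORD", NUMBER: "NUMBER"}


def char_class(c):
    """Map a character to its column in the transition table."""
    if c.isalpha():
        return 0
    if c.isdigit():
        return 1
    return 2


def tokenize(text):
    """Scan text with longest-match semantics, one table lookup per character."""
    tokens = []
    pos, n = 0, len(text)
    while pos < n:
        state, end = START, pos
        while end < n:
            nxt = TRANSITIONS[state][char_class(text[end])]
            if nxt == DEAD:
                break
            state = nxt
            end += 1
        if state == START:
            pos += 1  # no token starts here: skip one separator character
        else:
            tokens.append((TOKEN_KIND[state], text[pos:end]))
            pos = end
    return tokens


def tokenize_many(documents, workers=4):
    """Tokenize independent documents concurrently.

    In CPython the global interpreter lock keeps this from running the scans
    truly in parallel; the function only shows how the work can be divided.
    """
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(tokenize, documents))
```

For instance, `tokenize("flex 2009")` returns `[("WORD", "flex"), ("NUMBER", "2009")]`. Splitting work at document boundaries avoids the harder problem of a token straddling two chunks, which any scheme that divides a single stream has to handle.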
Daniele Paolo Scarpazza, Gregory F. Russell
Added: 20 May 2010
Updated: 20 May 2010
Type: Conference
Year: 2009
Where: ICS
Authors: Daniele Paolo Scarpazza, Gregory F. Russell