Matching and searching computations play an important role in the indexing of data. These computations are typically encoded in very tight loops with a single index variable and a simple search/ matching predicate. Their inherent sequential nature, either because of data dependences but more often because of very strong control dependences, makes it impossible to apply existing data dependence and parallelization analysis to exploit significant levels parallelism on traditional architectures. This paper describes a class of searching and matching computations and describes a mapping strategy to map these computations to hardware. We have developed a compiler analysis in SUIF using array data dependence analysis and implicit loop unrolling analysis to expose more parallelism for the parallel evaluation of these computations. Our compiler generates parallel hardware specifications in VHDL. The resulting parallel hardware yields significant performance improvements when these kernel oper...
Pablo Moisset, Pedro C. Diniz, Joonseok Park