The key idea behind Inspector Joins is that during the I/O partitioning phase of a hash-based join, we have the opportunity to look at the actual data itself and then use this knowledge in two ways: (1) to create specialized indexes, specific to the given query on the given data, for optimizing the CPU cache performance of the subsequent join phase of the algorithm, and (2) to decide which join phase algorithm best suits this specific query. We show how inspector joins, employing novel statistics and specialized indexes, match or exceed the performance of state-of-the-art cache-friendly hash join algorithms. For example, when run on eight or more processors, our experiments show
Shimin Chen, Anastassia Ailamaki, Phillip B. Gibbo