We present an approach to dealing with skew in parallel joins in database systems. Our approach is easily implementable within current parallel DBMS, and performs well on skewed data without degrading the performance of the system on non-skewed data. The main idea is to use multiple algorithms, each specialized for a different degree of skew, and to use a small sample of the relations being joined to determine which algorithm is appropriate. We developed, implemented, and experimented with four new skew-handling parallel join algorithms; one, which we call virtual processor range partitioning, was the clear winner in high skew cases, while traditional hybrid hash join was the clear winner in lower skew or no skew cases. We present experimental results from an implementation of all four algorithms on the Gamma parallel database machine. To our knowledge, these are the first reported skew-handling numbers from an actual implementation.
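The sample-then-select idea can be illustrated with a minimal Python sketch. It is not the implementation evaluated on Gamma: the skew measure (estimated load of the busiest processor relative to a perfectly balanced load), the threshold of 2.0, the sample size, and the function names `estimate_max_load_ratio` and `choose_join_algorithm` are all assumptions made for illustration, since the abstract does not state the selection criterion.

```python
import random
from collections import Counter


def estimate_max_load_ratio(join_keys, num_processors, sample_size=1000, seed=0):
    """Estimate, from a sample, how overloaded the busiest processor would be
    under plain hash partitioning. A value near 1.0 suggests little skew.
    (Hypothetical skew measure, chosen for illustration only.)"""
    rng = random.Random(seed)
    n = min(sample_size, len(join_keys))
    if n == 0:
        return 1.0  # nothing to join, so treat the input as unskewed
    sample = rng.sample(join_keys, n)
    # Count how many sampled tuples each processor would receive.
    load = Counter(hash(key) % num_processors for key in sample)
    ideal = n / num_processors  # per-processor load if perfectly balanced
    return max(load.values()) / ideal


def choose_join_algorithm(join_keys, num_processors, threshold=2.0):
    """Pick a join algorithm from the sampled skew estimate.
    The threshold is an assumed tuning knob, not a value from the paper."""
    ratio = estimate_max_load_ratio(join_keys, num_processors)
    if ratio > threshold:
        return "virtual_processor_range_partitioning"
    return "hybrid_hash"


if __name__ == "__main__":
    uniform = list(range(10_000))
    skewed = [7] * 9_000 + list(range(1_000))  # one join value dominates
    print(choose_join_algorithm(uniform, num_processors=8))
    print(choose_join_algorithm(skewed, num_processors=8))
```

Because only a small sample is scanned, the cost of making the choice stays low on non-skewed data, which matches the stated goal of not degrading performance in that case.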
David J. DeWitt, Jeffrey F. Naughton, Donovan A. S