Shared nothing multiprocessor architecture is known to be more scalable to support very large databases. Compared to other join strategies, a hash-based join algorithm is particularly efficient and easily parallelized for this computation model. However, this hardware structure is very sensitive to the data skew problem. Unless the parallel hash join algorithm includes some load balancing mechanism, skew effect can deteriorate the system performance severely. In this paper, we propose two skew avoidance techniques and one skew resolution method. In particular, three new parallel hash join algorithms are presented. We developed an analytical model to study the effectiveness of these algorithms. The performance study indicates that the proposed techniques offer substantial improvement over the conventional strategies in the presence of data skew. It is also interesting to observe that the skew avoidance techniques provide join strategies that are robust against data skew; whereas ...
Kien A. Hua, Chiang Lee