Journal of Jilin University Science Edition


Optimization Algorithm of Two-Table Data Skew Join Based on MapReduce

ZHAO Yulan   

  1. Information Faculty, Business College of Shanxi University, Taiyuan 030031, China
  • Received: 2016-04-14 Online: 2016-11-26 Published: 2016-11-29
  • Contact: ZHAO Yulan E-mail: zhaoyulan24@163.com

Abstract: To address the problem that the Range partition algorithm cannot optimize the efficiency of a two-table join when the tables contain heavily skewed data, we propose an improved algorithm for joins over skewed data. The algorithm treats skewed and non-skewed data differently, sends data to each Reduce node by replication and broadcasting, and completes the entire join in a single round of Map/Reduce tasks. It effectively balances the processing load across the Reduce nodes, thereby overcoming the impact of heavily skewed data on two-table join performance. A comparison with the traditional partition join algorithm shows that the proposed algorithm is effective.

Key words: optimization of join algorithm, data skew, MapReduce, Range partition algorithm

CLC Number: 

  • TP311
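
The skew handling summarized in the abstract can be illustrated with a small in-memory simulation. This is only a sketch of a generic skew-aware join under stated assumptions, not the paper's algorithm: the abstract does not say how skewed keys are detected or how records are routed, so the frequency threshold, the round-robin spreading of skewed records, and the names `skew_join`, `partition`, and `skew_threshold` are all hypothetical choices made for illustration.

```python
import zlib
from collections import Counter, defaultdict
from itertools import count


def partition(key, num_reducers):
    """Deterministic hash partitioner (assumed; stands in for a real partitioner)."""
    return zlib.crc32(repr(key).encode("utf-8")) % num_reducers


def skew_join(left, right, num_reducers=4, skew_threshold=3):
    """Simulate a one-round equi-join of two lists of (key, value) pairs.

    Returns the joined tuples (key, left_value, right_value) and the number
    of left-table records each simulated reducer received.
    """
    # Assumption: a key is "skewed" if it occurs at least skew_threshold
    # times in the left table.
    freq = Counter(k for k, _ in left)
    skewed = {k for k, n in freq.items() if n >= skew_threshold}

    # Map phase: one bucket of left records and one of right records per reducer.
    left_parts = [defaultdict(list) for _ in range(num_reducers)]
    right_parts = [defaultdict(list) for _ in range(num_reducers)]

    round_robin = count()
    for k, v in left:
        if k in skewed:
            # Spread records of a skewed key evenly over all reducers.
            r = next(round_robin) % num_reducers
        else:
            # Non-skewed keys use ordinary hash partitioning.
            r = partition(k, num_reducers)
        left_parts[r][k].append(v)

    for k, v in right:
        if k in skewed:
            # Replicate (broadcast) matching right records to every reducer.
            targets = range(num_reducers)
        else:
            targets = [partition(k, num_reducers)]
        for r in targets:
            right_parts[r][k].append(v)

    # Reduce phase: each reducer joins only the data it holds locally.
    output, loads = [], []
    for lp, rp in zip(left_parts, right_parts):
        loads.append(sum(len(vs) for vs in lp.values()))
        for k, lvals in lp.items():
            for lv in lvals:
                for rv in rp.get(k, []):
                    output.append((k, lv, rv))
    return output, loads
```

Calling `skew_join(left, right)` on two lists of `(key, value)` pairs returns the same tuples as a naive nested-loop join, while the records of a heavily repeated key are split across the simulated reducers instead of landing on one. The price of that balance is the extra copies of the right-table records for skewed keys, which is why only skewed keys are broadcast.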