Journal of Jilin University Science Edition
ZHAO Yulan
Abstract: The Range partition algorithm cannot improve the efficiency of a two-table join when the tables contain heavily skewed data. To address this problem, we propose an improved join algorithm for skewed data. The algorithm treats skewed and non-skewed data differently, sends data to each Reduce node by replication and broadcasting, and completes the entire join in a single round of Map/Reduce tasks. It effectively balances the processing load across the Reduce nodes, thereby addressing the impact of heavily skewed data on two-table join performance. A comparison with the traditional partition join algorithm shows that the proposed algorithm is effective.
Key words: optimization of join algorithm, data skew, MapReduce, Range partition algorithm
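The abstract gives only an outline of the method, so the following is a minimal sketch of the general idea rather than the paper's actual algorithm. It simulates one Map/Reduce round in plain Python under several assumptions that do not come from the source: the set of skewed keys is known in advance (`skewed_keys`), skew occurs only in the left table, skewed left tuples are scattered round-robin, matching right tuples are broadcast to every reducer, and non-skewed keys are hash-partitioned. The names `skew_join`, `left`, `right`, and `num_reducers` are hypothetical.

```python
from collections import defaultdict
from itertools import count


def skew_join(left, right, skewed_keys, num_reducers):
    """Simulate a single Map/Reduce round joining two tables on their key.

    left, right: lists of (key, value) tuples.
    skewed_keys: keys assumed to be heavily skewed in `left` (an assumption;
                 in practice they might be found by sampling).
    Returns (joined_tuples, left_tuples_per_reducer).
    """
    # One pair of per-key buffers for each simulated Reduce node.
    buckets = [{"L": defaultdict(list), "R": defaultdict(list)}
               for _ in range(num_reducers)]

    # Map phase, left table: scatter skewed tuples round-robin so that no
    # single reducer receives the whole hot key; hash-partition the rest.
    round_robin = count()
    for key, value in left:
        if key in skewed_keys:
            target = next(round_robin) % num_reducers
        else:
            target = hash(key) % num_reducers
        buckets[target]["L"][key].append(value)

    # Map phase, right table: broadcast (replicate) tuples with skewed keys
    # to every reducer; hash-partition the rest to the matching reducer.
    for key, value in right:
        if key in skewed_keys:
            targets = range(num_reducers)
        else:
            targets = [hash(key) % num_reducers]
        for target in targets:
            buckets[target]["R"][key].append(value)

    # Reduce phase: each reducer joins its local left and right buffers.
    joined, loads = [], []
    for bucket in buckets:
        loads.append(sum(len(vals) for vals in bucket["L"].values()))
        for key, left_vals in bucket["L"].items():
            for lv in left_vals:
                for rv in bucket["R"].get(key, []):
                    joined.append((key, lv, rv))
    return joined, loads
```

Because each left tuple is sent to exactly one reducer and every reducer holds a copy of the right tuples for the skewed keys, each matching pair is produced exactly once. The cost of this design is the extra network traffic from broadcasting, which is why it is applied only to the skewed keys.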
ZHAO Yulan. Optimization Algorithm of Two Table DataSkew Join Based on MapReduce[J]. Journal of Jilin University Science Edition, 2016, 54(06): 1383-1387.
URL: http://xuebao.jlu.edu.cn/lxb/EN/Y2016/V54/I06/1383