Journal of Jilin University (Science Edition)

• Computer Science •

Online MapReduce Data Transmission Mechanism Supporting Large-Scale Stream Data Processing

WEI Xiaohui, LI Cong, LI Hongliang, LI Xiang, LIU Yuanyuan, LI Lina, ZHUANG Yuan

  1. College of Computer Science and Technology, Jilin University, Changchun 130012, China
  • Received:2014-05-08 Online:2015-03-26 Published:2015-03-24
  • Contact: LI Hongliang E-mail:lihongliang@jlu.edu.cn

Abstract:

Stream data varies widely in scale, its flow rate changes dynamically, and it is strongly bursty. To address these characteristics, we propose a scalable, dynamic MapReduce computation model that supports online processing of large-scale dynamic and static data. On this basis, we build an online MapReduce data transmission mechanism that uses Event-based push and Netty's underlying asynchronous communication, and we implement a prototype of it. The mechanism addresses fast online transfer and data distribution for large-scale distributed computing programs and supports dynamic distribution of stream data, providing support for the dynamic MapReduce model. Experimental results show that, compared with the traditional Socket pipeline transfer used in the HadoopOnline system, the method effectively improves the efficiency of data transfer between jobs and thereby improves the real-time performance of large-scale stream data processing.

Key words: big data, stream data processing, MapReduce model, data transmission mechanism
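
The push-based transfer named in the abstract can be illustrated with a minimal sketch. It is not the paper's prototype, which builds on Netty; Python's asyncio stands in for asynchronous communication here, and the names `Event`, `map_task`, `reduce_task`, `run_job`, and `word_count`, along with the word-count workload, are assumptions made purely for illustration.

```python
import asyncio
from collections import Counter
from dataclasses import dataclass


@dataclass
class Event:
    """One unit of map output pushed downstream (hypothetical structure)."""
    key: str
    value: int


async def map_task(records, outbox: asyncio.Queue) -> None:
    """Push an Event per word as soon as it is produced, rather than
    storing all map output for the reducer to fetch later."""
    for line in records:
        for word in line.split():
            await outbox.put(Event(word, 1))
    await outbox.put(None)  # sentinel: this mapper has finished


async def reduce_task(inbox: asyncio.Queue, n_mappers: int) -> Counter:
    """Aggregate events as they arrive; stop once every mapper is done."""
    totals = Counter()
    finished = 0
    while finished < n_mappers:
        event = await inbox.get()
        if event is None:
            finished += 1
        else:
            totals[event.key] += event.value
    return totals


async def run_job(partitions) -> Counter:
    # A bounded queue makes fast mappers wait when the reducer falls behind.
    channel = asyncio.Queue(maxsize=16)
    mappers = [asyncio.create_task(map_task(p, channel)) for p in partitions]
    reducer = asyncio.create_task(reduce_task(channel, len(partitions)))
    await asyncio.gather(*mappers)
    return await reducer


def word_count(partitions) -> Counter:
    """Run the push-based job over a list of input partitions."""
    return asyncio.run(run_job(partitions))
```

Calling `word_count([["a b a"], ["b c"]])` runs two mappers that push events into one reducer. The bounded queue is one simple way to absorb bursts in this sketch; the paper's own flow-control strategy is not described in the abstract.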

CLC Number:

  • TP391