Journal of Jilin University(Engineering and Technology Edition) ›› 2021, Vol. 51 ›› Issue (4): 1349-1357.doi: 10.13229/j.cnki.jdxbgxb20190854

Previous Articles    

Cost-efficient resource allocation algorithm for scientific workflow accross geo-distributed data centers

Xiao-hui WEI1,2(),Fang-yu TANG1,Hong-liang LI1,2()   

  1. 1.College of Computer Science and Technology,Jilin University,Changchun 130012,China
    2.Key Laboratory of Symbolic Computation and Knowledge Engineering,Jilin University,Changchun 130012,China
  • Received:2019-09-01 Online:2021-07-01 Published:2021-07-14
  • Contact: Hong-liang LI E-mail:weixh@jlu.edu.cn;lihongliang@jlu.edu.cn

Abstract:

Cloud operators charge bandwidth fees for data from user jobs that are transferred between data centers, and this accounts for a large percentage of user spending. It is of practical significance for users to reduce the communication overhead of the submitted job significantly without affecting the completion of the job through reasonable assignment of tasks and resources when determining the leased resources. This paper analyzes and models the traffic cost problem across data centers when workflow tasks are distributed in geographically distributed data centers, and proposes a heuristic algorithm for solving the problem. Several sets of experiments are carried out on the proposed algorithm, and the results are compared with that of the resource allocation algorithms commonly used in the current practice.. The experimental results verify that the heuristic algorithm proposed in this paper can effectively reduce the user communication traffic cost by at least 9.75% in the workflow operation across the geo-distributed data center, which serves to reduce the communication fee between users across geo-distributed data centers.

Key words: cloud computing, geo-distributed, workflow scheduling, cost efficient, heuristic algorithm

CLC Number: 

  • TP399

Fig.1

Workflow jobs deployed across geographically distributed data centers"

Fig.2

Schematic diagram of cloud computing model across geo-distributed data centers"

Table 1

Meaning of each variable in the model"

变量名称变量代表意义
T租赁的节点数目/提交作业的子任务数目
taskii个子任务的名称
Dbataska发往taskb的数据量
N租赁资源涉及的数据中心数目
nii个数据中心的名称
Rii个数据中心上租赁的节点数目
taskLocai子任务taski是否部署在了数据中心na
UPa数据中心na向外的流量带宽单价
BwCjitaskitaskj发送数据的流量带宽费用
BwC工作流作业的整体流量带宽费用

Fig.3

Example of algorithm ideas"

Fig.4

Structures of real-world workflows"

Table 2

Comparative algorithm"

算法名称算法简介
顺序分配策略ABS大部分工作流作业没有合理的资源分配策略,用户将各个任务上传至云平台后,按顺序为其分配租赁的计算资源,这种策略就是顺序分配策略(Allocate?ByNumbers?Strategy,ABS)。
最早完成时间策略EFT最早完成时间策略(Earliest?Finish?Time,EFT )18 是一种常用的作业资源分配策略。目的是让工作流作业的完成时间最短,其思路是将工作流作业的任务按完成时间排序进入队列,然后按照队列中任务的顺序分配资源。
最小数据量策略MBD最小数据量策略(Minium?Bandwidth?Data,MBD)是Li等11工作的内容。其主要思路是将作业资源分配问题抽象为混合整数线性规划问题,通过CPLEX求解器得出一个跨数据中心之间数据传输量最优的解。

Table 3

Characteristics of real-world workflow applications"

工作流结构作业规模任务数目边数目

平均作业内任务数据

传输总量/GB

Montage501083 586.8
10023573 850.9
400984301 382.2
Epigenomics505415 776.6
10012281 938.1
400470288 345.5
CyberShake50465 177.9
10096110 446.3
400396434 251.3
Sipht50563 288.9
10010954 472.7
400439232 937.5
Inspiral506019 407.9
10012339 175.8
400500158 730.5

Fig.5

Experimental of cross data-center traffic cost"

1 Enterprise Customer Success Stories - Amazon Web Services [DB/OL]. [2019-08-10].
2 Introduction to Cloud Economics_AA(final).pdf [DB/OL]. [2019-08-10].
3 Yao G, Ding Y, Hao K, et al. Using imbalance characteristic for fault-tolerant workflow scheduling in cloud systems[J]. IEEE Transactions on Parallel and Distributed Systems, 2017, 28(12): 3671-3683.
4 Zhou X, Wang K, Jia W, et al. Reinforcement learning-based adaptive resource management of differentiated services in geo-distributed data centers[C]∥ International Symposium on Quality of Service(IWQoS), Vilanova I La Geltru, Spain, 2017: 1-6.
5 Hu Z, Li B, Qin Z, et al. Job scheduling without prior information in big data processing systems[C]∥IEEE 37th International Conference on Distributed Computing Systems(ICDCS), Atlanta, GA, USA, 2017: 572-582.
6 Hu Z, Li B, Luo J, et al. Flutter:scheduling tasks closer to data across geo-distributed datacenters[C]∥The 35th Annual IEEE International Conference on Computer Communications, San Francisco, CA, USA, 2016: 1-9.
7 Hu Z, Li B, Luo J, et al. Time- and cost-efficient task scheduling across geo-distributed data centers[J]. IEEE Transactions on Parallel and Distributed Systems, 2018, 29(3): 705-718.
8 Apache Spark™ — unified analytics engine for big data[DB/OL]. [2019-08-10].
9 Hadoop Apache. [DB/OL]. [2019-08-10].
10 Rimal B P, Maier M. Workflow scheduling in multi-tenant cloud computing environments[J]. IEEE Transactions on Parallel and Distributed Systems, 2017, 28(1): 290-304.
11 Li P, Guo S, Miyazaki T, et al. Traffic-Aware geo-distributed big data analytics with predictable Job completion time[J]. IEEE Transactions on Parallel and Distributed Systems, 2017, 28(6): 1785-1796.
12 Mei J, Li K, Tong Z, et al. Profit maximization for cloud brokers in cloud computing[J]. IEEE Transactions on Parallel and Distributed Systems, 2019, 30(1): 190-203.
13 EC2instance pricing-amazon web services (AWS). [DB/OL]. [2019-08-10]. .
14 Stoer M, Wagner F. A simple min-cut algorithm[J]. Journal of the ACM(JACM), 1997, 44(4): 585-591.
15 Juve G, Chervenak A, Deelman E, et al. Characterizing and profiling scientific workflows[J]. Future Generation Computer Systems, 2013, 29(3): 682-692.
16 Deelman E, Vahi K, Juve G, et al. Pegasus, a workflow management system for science automation[J]. Future Generation Computer Systems, 2015, 46: 17-35.
17 WorkflowGenerator - Pegasus - Pegasus Workflow management system [DB/OL].[2019-08-10].
18 Lee Y C, Han H, Zomaya A Y, et al. Resource-efficient workflow scheduling in clouds[J]. Knowledge-Based Systems, 2015, 80: 153-162.
[1] Xiao-hui LI,Chao-yang CHEN,Hua-wei YI,Bo LI. Large scale network traffic prediction based on cloud computing and big data analysis [J]. Journal of Jilin University(Engineering and Technology Edition), 2021, 51(3): 1034-1039.
[2] Yuan SONG,Dan-yuan ZHOU,Wen-chang SHI. Method to enhance security function of OpenStack Swift cloud storage system [J]. Journal of Jilin University(Engineering and Technology Edition), 2021, 51(1): 314-322.
[3] Shun-fu JIN,Xiu-chen QIE,Hai-xing WU,Zhan-qiang HUO. Clustered virtual machine allocation strategy in cloud computing based on new type of sleep-mode and performance optimization [J]. Journal of Jilin University(Engineering and Technology Edition), 2020, 50(1): 237-246.
[4] JIAO Yu-ling, XU Liang-cheng, WANG Zhan-zhong, ZHANG Peng. Balance experiment and analysis of double U-shaped assembly line based on directed network [J]. 吉林大学学报(工学版), 2018, 48(2): 454-459.
[5] WANG Xu, OUYANG Ji-hong, CHEN Gui-fen. Heuristic algorithm of all common subsequences of multiple sequences for measuring multiple graphs similarity [J]. 吉林大学学报(工学版), 2018, 48(2): 526-532.
[6] ZHAO Wei, QU Hui-yan. Fast collision detection algorithm based on Cloud Map-Reduce model [J]. 吉林大学学报(工学版), 2016, 46(2): 578-584.
[7] LI Qi, MA Jian-feng, XIONG Jin-bo,ZHANG Tao,LIU Xi-meng. Attribute-based encryption based access control scheme withconstant-size ciphertext in cloud computing [J]. 吉林大学学报(工学版), 2014, 44(3): 788-794.
[8] LIU Guo-qi, LIU Hui, GAO Yu, LIU Ying, ZHU Zhi-liang. Resource dynamic pricing strategy based on utility in cloud computing [J]. 吉林大学学报(工学版), 2013, 43(06): 1631-1637.
[9] YANG Qing-fang, MEI Duo, HAN Zhen-bo, ZHANG Biao. Ant colony optimization for the shortest path of urban road network based on cloud computing [J]. 吉林大学学报(工学版), 2013, 43(05): 1210-1214.
[10] MENG Chao, SUN Zhi-xin, LIU San-min. Multiple execution paths for virus based on cloud computing [J]. 吉林大学学报(工学版), 2013, 43(03): 718-726.
[11] OUYANG Dan-tong, GENG Xue-na, GUO Jin-song, WANG Xiao-yu. Heuristic algorithm of computing minimal hitting sets matrix [J]. 吉林大学学报(工学版), 2013, 43(01): 106-110.
[12] GUO Ping, DAN Guang-xiang. Mixed encryption algorithm in cloud computing [J]. 吉林大学学报(工学版), 2012, 42(增刊1): 327-331.
[13] CHEN Long, LI Jun-zhong. Verifiable method for remote data integrity supporting different granular operation [J]. 吉林大学学报(工学版), 2012, 42(增刊1): 295-299.
[14] NIE Xiong-ding, HAN De-zhi, BI Kun. Cloud computing data security [J]. 吉林大学学报(工学版), 2012, 42(增刊1): 332-336.
[15] SONG Rui, LIU Zhi-qian. Heuristic algorithm for feeder bus route generation in railway traffic system [J]. 吉林大学学报(工学版), 2011, 41(05): 1234-1239.
Viewed
Full text


Abstract

Cited

  Shared   
  Discussed   
No Suggested Reading articles found!