吉林大学学报(工学版) ›› 2015, Vol. 45 ›› Issue (5): 1559-1564.doi: 10.13229/j.cnki.jdxbgxb201505026

Previous Articles     Next Articles

Performance portable GPU parallel optimization technique on Bellman-Ford algorithm

LIU Lei1, WANG Yan-yan1, SHEN Chun1, LI Yu-xiang2, LIU Lei3   

  1. 1.College of Computer Science and Technology, Jilin University, Changchun 130012, China;
    2.Citic Securities Co., Ltd., Beijing 100026, China;
    3.Institute of Computing Technology, Chinese Academy of Sciences, Beijing 100190, China
  • Received:2014-05-06 Online:2015-09-01 Published:2015-09-01

Abstract: A GPU-oriented performance portable parallel reduction algorithm for extreme value optimization and a global memory access optimization algorithm are presented to resolve the issues of deficient parallel granularity and global memory access conflict on different GPUs. Experimental results show that the presented optimization algorithms can obtain high performance on a variety of NVIDIA and AMD GPU devices, gaining a speedup of 3 to 6 times than existing methods.

Key words: computer software, Bellman-Ford algorithm, GPU parallel programing and optimization techniques, parallel reduction algorithm, performance portability

CLC Number: 

  • TP302
[1] Tehrani Pouya, Zhao Qing. Distributed online learning of the shortest path under unknown random edge weights[C]∥IEEE International Conference on Acoustics, Speech and Signal Processing, Vancouver,BC,2013:3138-3142.
[2] Carli M, Panzieri S, Pascucci F. A joint routing and localization algorithm for emergency scenario[J]. Ad Hoc Networks, 2014, 13:19-33.
[3] Tewari A,Kumar D A.Different routing algorithm for computer networks[J].Kaav International Journal of Science, Engineering & Technology,2014,1(1):21-34.
[4] Abouali M, Timmermans J, Castillo J E, et al. A high performance GPU implementation of Surface energy balance system (SEBS) based on CUDA-C[J]. Environmental Modelling & Software, 2013, 41: 134-138.
[5] Huang B, Mielikainen J, Oh H, et al. Development of a GPU-based high-performance radiative transfer model for the Infrared atmospheric sounding interferometer (IASI)[J]. Journal of Computational Physics, 2011, 230(6): 2207-2221.
[6] Singh D P, Khare N. A study of different parallel implementations of single source shortest path algorithms[J]. International Journal of Computer Applications, 2012, 54(10):26-30.
[7] 颜深根, 张云泉, 龙国平, 等. 基于OpenCL的归约算法优化[J]. 软件学报, 2011, 22(2): 163-171. Yan Shen-gen,Zhang Yun-quan,Long Guo-ping,et al.Reduction algorithm optimization based on the OpenCL[J].Journal of Software,2011,22(2):163-171.
[8] Kumar S, Misra A, Tomar R S. A modified parallel approach to single source shortest path problem for massively dense graphs using CUDA[C]∥2011 2nd International Conference on Computer and Communication Technology, Allahabod,2011: 635-639.
[9] Lee Jaejin, Kim Jungwon,Seo Sangmin.An OpenCL framework for heterogenous multicores with local memory[C]∥Proceeding of the 19th International Conference on Parallel Architectures and Compilation Techiniques,ACM New York,NY,USA,2010:193-204.
[10] 贾海鹏, 张云泉, 龙国平, 等. 基于 OpenCL 的拉普拉斯图像增强算法优化研究[J]. 计算机科学, 2012, 39(5): 271-277. Jia Hai-peng, Zhang Yun-quan, Long Guo-ping, et al. Research on laplace image enhancement algorithm optimization base on OpenCL[J]. Computer Science, 2012, 39(5):271-277.
[11] NVIDIA.NVIDIA's Next Generation CUDA Compute Architecture:Kepler GK110[EB/OL].[2013-06-28].http://www.nvidia.com/content/PDF/kepler/NVIDIA-Kepler-GK110-Architecture-Whitepaper.pdf.
[12] NVIDIA. NVIDIA's next generation CUDA TM compute architecture, Fermi[EB/OL].[2013-05-22].http://www.nvidia.com/content/PDF/fermi_white_papers/NVIDIA_Fermi_Compute_Architecture_Whitepaper.pdf.
[1] MA Jian, FAN Jian-ping, LIU Feng, LI Hong-hui. The evolution model of objective-oriented software system [J]. 吉林大学学报(工学版), 2018, 48(2): 545-550.
[2] LUO Yang-xia, GUO Ye. Software recognition based on features of data dependency [J]. 吉林大学学报(工学版), 2017, 47(6): 1894-1902.
[3] YING Huan, WANG Dong-hui, WU Cheng-gang, WANG Zhe, TANG Bo-wen, LI Jian-jun. Efficient deterministic replay technique on commodity system environment [J]. 吉林大学学报(工学版), 2017, 47(1): 208-217.
[4] LI Yong, HUANG Zhi-qiu, WANG Yong, FANG Bing-wu. New approach of cross-project defect prediction based on multi-source data [J]. 吉林大学学报(工学版), 2016, 46(6): 2034-2041.
[5] WANG Nian-bin, ZHU Guan-wen, ZHOU Lian-ke, WANG Hong-wei. Novel dataspace index for efficient processing of path query [J]. 吉林大学学报(工学版), 2016, 46(3): 911-916.
[6] TE Ri-gen, JIANG Sheng, LI Xiong-fei, LI Jun. Document compression scheme based on integer data [J]. 吉林大学学报(工学版), 2016, 46(1): 228-234.
[7] CHEN Peng-fei, TIAN Di, YANG Guang. Design and implementation of LIBS software based on MVC architecture [J]. 吉林大学学报(工学版), 2016, 46(1): 242-245.
[8] FENG Xiao-ning, WANG Zhuo, ZHANG Xu. Formal method for routing protocol of WSN based on L-π calculus [J]. 吉林大学学报(工学版), 2015, 45(5): 1565-1571.
[9] LI Ming-zhe, WANG Jin-lin, CHEN Xiao, CHEN Jun. Architecture model of streaming media applications on network processors(VPL) [J]. 吉林大学学报(工学版), 2015, 45(5): 1572-1580.
[10] WANG Ke-chao, WANG Tian-tian, SU Xiao-hong, MA Pei-jun. Plagiarism detection in student programs based on frequent closed sequence mining [J]. 吉林大学学报(工学版), 2015, 45(4): 1260-1265.
[11] HUANG Hong-tao,WANG Jing,YE Hai-zhi,HUANG Shao-bin. Lazy slicing based method for verifying linear temporal logic property [J]. 吉林大学学报(工学版), 2015, 45(1): 245-251.
[12] FAN Da-juan, HUANG Zhi-qiu, XIAO Fang-xiong, ZHU Yi, WANG Jin. Compatibility analysis and adaptor generation for multi-service interaction [J]. 吉林大学学报(工学版), 2014, 44(4): 1094-1103.
[13] HE Qin-lu, LI Zhan-huai, WANG Le-xiao, WANG Rui. Testing technology for aggregate bandwidth of cloud storage system [J]. 吉林大学学报(工学版), 2014, 44(4): 1104-1111.
[14] LIU Guo-qi, LIU Hui, GAO Yu, LIU Ying, ZHU Zhi-liang. Resource dynamic pricing strategy based on utility in cloud computing [J]. 吉林大学学报(工学版), 2013, 43(06): 1631-1637.
[15] DENG Hui, WU Jin-zhao. Approximate bisimulation for linear semi-algebraic transition systems [J]. 吉林大学学报(工学版), 2013, 43(04): 1052-1058.
Viewed
Full text


Abstract

Cited

  Shared   
  Discussed   
No Suggested Reading articles found!