Journal of Jilin University(Engineering and Technology Edition) ›› 2025, Vol. 55 ›› Issue (5): 1798-1805.doi: 10.13229/j.cnki.jdxbgxb.20230891


An automatic driving decision control algorithm based on hierarchical reinforcement learning

Wei-dong LI1, Cao-yuan MA1, Hao SHI2, Heng CAO2

  1. School of Automotive Engineering, Dalian University of Technology, Dalian 116024, China
  2. Huadian Coal Industry Group Digital Intelligence Technology Co., Ltd., Beijing 102488, China
  Received: 2023-08-22; Online: 2025-05-01; Published: 2025-07-18

Abstract:

To address the slow convergence and limited applicability of reinforcement learning models in automatic driving tasks, a two-tier reinforcement learning framework is proposed to replace the conventional decision and control layers. Within this framework, the decision layer categorizes driving behaviors into lane keeping, left lane change, and right lane change; once the decision layer selects a behavior, it is executed by modifying the input to the control layer. A new method, RL_COE, which combines reinforcement learning with online experts, is then proposed to train the control layer. Finally, the proposed algorithm is verified in a highway simulation environment built on Carla and compared with baseline reinforcement learning algorithms. The results show that the method significantly improves the convergence and stability of the algorithm and performs the driving task better.
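The decision-to-control handoff described above can be sketched as follows. This is a minimal illustration of the two-tier idea only: all function names, the rule stub standing in for the trained decision network, and the proportional stub standing in for the RL_COE-trained control policy are assumptions, not the paper's implementation.

```python
# Sketch of the two-tier framework: the decision layer picks one of three
# behaviors; the chosen behavior is executed by modifying the control
# layer's input (here, the target lane index). Stubs are illustrative.

HIGH_LEVEL_ACTIONS = ("lane_keep", "lane_change_left", "lane_change_right")

def decision_layer(observation):
    """Stub decision policy over the three driving behaviors."""
    ego_lane, blocked_ahead, left_free = observation
    if blocked_ahead and left_free:
        return "lane_change_left"
    return "lane_keep"

def behavior_to_control_input(behavior, ego_lane):
    """Execute a behavior by shifting the target lane fed to the control layer."""
    offset = {"lane_keep": 0, "lane_change_left": -1, "lane_change_right": 1}
    return ego_lane + offset[behavior]

def control_layer(target_lane, ego_lane):
    """Stub control policy returning (steer, throttle); in the paper this
    layer is trained with RL plus an online expert (RL_COE)."""
    steer = 0.1 * (target_lane - ego_lane)  # proportional stand-in
    return steer, 0.5

# One decision-control step: lane 2, blocked ahead, left lane free.
ego_lane = 2
behavior = decision_layer((ego_lane, True, True))
target = behavior_to_control_input(behavior, ego_lane)
steer, throttle = control_layer(target, ego_lane)
```

The point of the split is that the low-level controller never needs to know which behavior was chosen; it only tracks whatever target lane the decision layer hands it.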

Key words: vehicle engineering, automatic driving, hierarchical reinforcement learning, online experts, Carla

CLC Number: U495

Fig.1

Schematic diagram of algorithm framework

Fig.2

Schematic diagram of state space

Fig.3

Schematic diagram of state space

Table 1

Algorithm hyperparameters

Hyperparameter                          Value
Discount factor                         9.9×10⁻¹
Target entropy                          −2
Policy network learning rate            2.5×10⁻⁴
Value network learning rate             5×10⁻⁴
Trade-off coefficient learning rate     5×10⁻⁴
Soft update rate                        1×10⁻²
Replay buffer capacity                  5×10⁶
Batch size                              512
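For reference, Table 1's hyperparameters can be gathered into a plain config dict for a SAC-style trainer. The key names are assumptions chosen for readability; the values are taken from the table.

```python
# Table 1 hyperparameters as a config dict (key names illustrative).
SAC_COE_CONFIG = {
    "discount_factor": 0.99,         # 9.9×10⁻¹
    "target_entropy": -2,            # often set near −dim(action space)
    "policy_lr": 2.5e-4,             # policy network learning rate
    "value_lr": 5e-4,                # value network learning rate
    "alpha_lr": 5e-4,                # trade-off (temperature) coefficient lr
    "soft_update_rate": 1e-2,        # tau for target-network soft updates
    "replay_buffer_size": 5_000_000, # experience replay pool capacity
    "batch_size": 512,               # samples per training step
}
```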

Fig.4

Training process of control layer

Table 2

Control layer test results

Algorithm    Success rate/%    Mean cumulative reward    Std. dev. of cumulative reward
SAC_COE      100               928.2                     13.3
DDPG_COE     100               908.4                     23.6
SAC          95                806.6                     155.7
DDPG         23                532.1                     197.4
PID          98                857.8                     32.1

Fig.5

Lateral deviation in lane keeping

Table 3

Comparison of lateral deviation

Algorithm    Mean lateral deviation/m    Std. dev. of lateral deviation/m
SAC_COE      0.07                        0.05
DDPG_COE     0.19                        0.10
PID          0.12                        0.11

Fig.6

Training process of decision layer

Table 4

Decision layer test results

Algorithm       Success rate/%    Mean cumulative reward    Std. dev. of cumulative reward
OURS            100               925.1                     56.9
RULE            97                825.6                     148.4
LANE_KEEPING    100               773.2                     183.6

Fig.7

Typical operating condition test results

[1] Zhang Yu-xiang. Domain knowledge and physics-enhanced reinforcement learning for intelligent vehicle decision-making and control[D]. Changchun: School of Automotive Engineering, Jilin University, 2022. (in Chinese)
[2] Schwarting W, Alonso-Mora J, Rus D. Planning and decision-making for autonomous vehicles[J]. Annual Review of Control, Robotics, and Autonomous Systems, 2018, 1: 187-210.
[3] Paden B, Čáp M, Yong S Z, et al. A survey of motion planning and control techniques for self-driving urban vehicles[J]. IEEE Transactions on Intelligent Vehicles, 2016, 1(1): 33-55.
[4] Wang Jing-ke. Research on autonomous driving behavior decision-making based on learning[D]. Hangzhou: College of Control Science and Engineering, Zhejiang University, 2021. (in Chinese)
[5] Sutton R S, Barto A G. Reinforcement Learning: An Introduction[M]. Cambridge: MIT Press, 2018.
[6] Kendall A, Hawke J, Janz D, et al. Learning to drive in a day[C]∥International Conference on Robotics and Automation, Montreal, Canada, 2019: 8248-8254.
[7] Wurman P R, Barrett S, Kawamoto K, et al. Outracing champion Gran Turismo drivers with deep reinforcement learning[J]. Nature, 2022, 602: 223-228.
[8] Li Wen-li, Qiu Fan-ke, Liao Da-ming, et al. Highway lane change decision control model based on deep reinforcement learning[J]. Journal of Automotive Safety and Energy, 2022, 13(4): 750-759. (in Chinese)
[9] Kiran B R, Sobh I, Talpaert V, et al. Deep reinforcement learning for autonomous driving: a survey[J]. IEEE Transactions on Intelligent Transportation Systems, 2021, 23(6): 4909-4926.
[10] Zhu M, Wang Y, Pu Z, et al. Safe, efficient, and comfortable velocity control based on reinforcement learning for autonomous driving[J]. Transportation Research Part C: Emerging Technologies, 2020, 117: 102662.
[11] Huang Z, Wu J, Lyu C. Efficient deep reinforcement learning with imitative expert priors for autonomous driving[J]. IEEE Transactions on Neural Networks and Learning Systems, 2023, 34(10): 7391-7403.
[12] Mnih V, Kavukcuoglu K, Silver D, et al. Human-level control through deep reinforcement learning[J]. Nature, 2015, 518: 529-533.
[13] Haarnoja T, Zhou A, Abbeel P, et al. Soft actor-critic: off-policy maximum entropy deep reinforcement learning with a stochastic actor[C]∥International Conference on Machine Learning, Stockholm, Sweden, 2018: 1861-1870.
[14] Liang X, Wang T, Yang L, et al. CIRL: controllable imitative reinforcement learning for vision-based self-driving[C]∥Proceedings of the European Conference on Computer Vision, Munich, Germany, 2018: 584-599.
[15] Hester T, Vecerik M, Pietquin O, et al. Deep Q-learning from demonstrations[C]∥Proceedings of the AAAI Conference on Artificial Intelligence, New Orleans, USA, 2018: 3223-3230.
[16] Elallid B B, Benamar N, Hafid A S, et al. A comprehensive survey on the application of deep and reinforcement learning approaches in autonomous driving[J]. Journal of King Saud University-Computer and Information Sciences, 2022, 34(9): 7366-7390.
[17] Kulkarni T D, Narasimhan K, Saeedi A, et al. Hierarchical deep reinforcement learning: integrating temporal abstraction and intrinsic motivation[J]. Advances in Neural Information Processing Systems, 2016, 29: 990-998.
[18] Duan J, Eben L S, Guan Y, et al. Hierarchical reinforcement learning for self-driving decision-making without reliance on labelled driving data[J]. IET Intelligent Transport Systems, 2020, 14(5): 297-305.
[19] Dosovitskiy A, Ros G, Codevilla F, et al. CARLA: an open urban driving simulator[C]∥Conference on Robot Learning, Mountain View, USA, 2017: 1-16.