Journal of Jilin University (Engineering and Technology Edition), 2025, Vol. 55, Issue (10): 3151-3161. doi: 10.13229/j.cnki.jdxbgxb.20231335

Behavior-constrained proximal policy optimization for autonomous intersection management

Zhen-hai GAO1, He-sheng HAO2, Fei GAO1, Rui ZHAO2

  1. State Key Laboratory of Automotive Simulation and Control, Jilin University, Changchun 130022, China
  2. College of Automotive Engineering, Jilin University, Changchun 130022, China
  • Received:2023-12-02 Online:2025-10-01 Published:2026-02-03
  • Contact: Rui ZHAO E-mail:gaozh@jlu.edu.cn;rzhao@jlu.edu.cn

Abstract:

To address the low computational efficiency and lack of safety guarantees in existing centralized cooperative control schemes, a swarm-coordination algorithm based on reinforcement learning is first proposed: single-agent proximal policy optimization (PPO) is extended to multi-agent interactive environments so that complex cooperation in multi-agent systems can be handled. Second, the cooperative control of vehicles at unsignalized intersections is formulated as a multi-agent reinforcement learning problem, and a safety-augmented centralized cooperative control method, behavior-constrained proximal policy optimization (BCPPO), is developed. Formal safety verification and behavior constraints are integrated into the swarm-coordination algorithm, which guides the policy to be optimized iteratively in a safe manner and avoids unsafe driving behaviors, so that traffic safety in unknown scenarios is further guaranteed. Finally, simulation experiments are conducted on the CARLA platform. The results show that adding behavior constraints costs 8.06 % in traffic efficiency but eliminates collisions (a 100 % safety improvement). Compared with a representative model predictive control approach, the proposed method reduces computation time to 1/326 of the original, increases traffic efficiency by 67.0 %, lowers the collision rate from 63.5 % to 0, and improves ride comfort by 26.5 %.

Key words: automotive engineering, autonomous intersection management, autonomous and connected vehicles, reinforcement learning, formal verification

CLC Number: 

  • U491

Fig.1

Driving area division, driving routes and potential conflict points

Fig.2

Vehicle motion model

Fig.3

Architecture diagram of BCPPO

Fig.4

Process of formal safety verification

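Algorithm

Initialize the actor-critic network π ← π_0 with network parameters θ ← θ_0. Set the learning rate α, reward discount factor γ, clipping range ε, total time steps T, batch size B, mini-batch size MB and number of epochs U.
for k = 1, 2, …, T/B (for each iteration)
    Initialize the buffer D_batch = ∅
    Compute the state value V_θk(s_t) and the reward R_t
    In state s_t, execute the joint action a_t = {a_i,t}, i = 1:N_a, and obtain the next state s_t+1
    Collect the sample trajectory τ_k
    Compute the TD error δ_t
    Compute the advantage estimate Â^π_θk
    Compute the value target V̂(s_t) = Â^π_θk + V_θk(s_t)
    Store τ_k, V_θk(s_t), Â^π_θk and V̂(s_t) in the buffer D_batch
    Save the current policy as the old policy: π_θk^old ← π_θk
    for epoch = 1, 2, …, U (for each epoch)
        Randomly shuffle the data in D_batch and split it into mini-batches D_mini^m of size MB
        for m = 1, 2, …, B/MB (for each mini-batch)
            Use D_mini^m to compute the difference between the new and old policies, d_t^θk
            Compute the surrogate objective of the policy network, L_actor(θ_k)
            Compute the value-function error, L_critic(θ_k)
            Compute the surrogate objective of the actor-critic network, L_PPO(θ_k)
            Update θ_k using ∇_θk L_PPO(θ_k)
        end for
    end for
end for
for t = 0, 1, …, n (for each time step of policy deployment)
    In state s_t, select the action a_i,t by following the policy π_θ
    for p_ij = 1, 2, …, M (for each conflicting vehicle pair)
        if formal verification finds that vehicle i and vehicle j are about to collide
            if p_i > p_j
                Update a_j ← a_j^con
            else
                Update a_i ← a_i^con
            end if
        end if
    end for
    Execute the joint action a_t = {a_i,t}, i = 1:N_a, and obtain the next state s_t+1
end for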
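The deployment loop that closes the listing above (select actions, check each conflicting vehicle pair, override the lower-priority vehicle) can be sketched in Python as follows. This is a minimal illustration, not the authors' implementation: `Vehicle`, `predicts_collision`, `constrain_actions`, the arrival-time gap test and the constrained action `a_con = -3.0` are all hypothetical. In particular, the arrival-time comparison only stands in for the formal safety verification, whose details are not given here.

```python
from dataclasses import dataclass


@dataclass
class Vehicle:
    priority: float          # p_i in the listing; how it is assigned is not specified here
    dist_to_conflict: float  # metres to the shared conflict point (hypothetical state)
    speed: float             # m/s


def predicts_collision(vi, vj, gap_s=1.2):
    """Stand-in for formal verification (assumption): flag the pair when
    their arrival times at the conflict point differ by less than gap_s."""
    ti = vi.dist_to_conflict / max(vi.speed, 1e-6)
    tj = vj.dist_to_conflict / max(vj.speed, 1e-6)
    return abs(ti - tj) < gap_s


def constrain_actions(vehicles, actions, conflict_pairs, a_con=-3.0):
    """Return a copy of `actions` in which, for every pair flagged as unsafe,
    the lower-priority vehicle's action is replaced by a_con."""
    safe = list(actions)
    for i, j in conflict_pairs:
        if predicts_collision(vehicles[i], vehicles[j]):
            if vehicles[i].priority > vehicles[j].priority:
                safe[j] = a_con   # vehicle i keeps its action, j yields
            else:
                safe[i] = a_con   # vehicle j keeps its action, i yields
    return safe


# Three vehicles; only vehicles 0 and 1 reach the conflict point close together.
vehicles = [Vehicle(2.0, 20.0, 8.0), Vehicle(1.0, 21.0, 8.0), Vehicle(0.5, 60.0, 6.0)]
policy_actions = [1.0, 0.5, 0.2]
joint_action = constrain_actions(vehicles, policy_actions, [(0, 1), (0, 2), (1, 2)])
```

Because the override is applied after the policy has chosen its actions, the learned policy itself is left untouched; only the executed joint action changes.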

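Table 1

Parameters for simulation

Parameter                                   Value
Discrete time step/s                        0.1
Number of vehicles N_a                      8
Number of conflicting vehicle pairs M       19
Speed range/(m·s⁻¹)                         [5.5, 8]
Buffer zone length d_b/m                    25
Actor network weight k_actor                1
Critic network weight k_critic              -0.5
Number of hidden layers                     2
Number of hidden units                      512, 512
Number of seeds                             5
Total time steps T                          2 048 000
Batch size B                                2 048
Mini-batch size MB                          64
Epochs U                                    10
Clipping range ε                            0.2
Reward discount factor γ                    0.99
Learning rate α                             0.000 03 → 0
Return discount factor λ                    0.95
Collision event reward k_c                  -50
Single-vehicle passing reward k_p^one       10
All-vehicle passing reward k_p^all          50
Safety time threshold T_s                   1.2
Intervention time threshold T_c             4
Formal verification weight k_t              -22
Formal verification bias b_f                2
Time-step reward k_s                        -0.1
Comfort reward weight k_j                   -4.5e-2
Maximum collision risk H                    1 000
Risk weight α_v                             0.005
Traffic efficiency weight w_v               1
Comfort weight w_a                          5
Number of prediction steps                  16
Desired speed/(m·s⁻¹)                       8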
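The training hyperparameters in Table 1 map onto the usual PPO quantities. The sketch below assumes the standard generalized advantage estimation and clipped surrogate forms, plus a linear learning-rate decay. The table only gives 0.000 03→0, so the shape of the decay is an assumption. The function names `gae`, `clipped_surrogate` and `learning_rate` are hypothetical.

```python
GAMMA, LAM, EPS = 0.99, 0.95, 0.2        # discount, return discount, clipping range
T, B, MB, U = 2_048_000, 2_048, 64, 10   # total steps, batch, mini-batch, epochs
LR0 = 3e-5                               # initial learning rate


def gae(rewards, values, last_value=0.0):
    """Generalized advantage estimation over one trajectory.
    values[t] is V(s_t); last_value bootstraps the state after the final step."""
    advantages = [0.0] * len(rewards)
    running = 0.0
    for t in reversed(range(len(rewards))):
        next_value = values[t + 1] if t + 1 < len(values) else last_value
        delta = rewards[t] + GAMMA * next_value - values[t]  # TD error
        running = delta + GAMMA * LAM * running
        advantages[t] = running
    return advantages


def value_targets(advantages, values):
    """Value target = advantage estimate + state value, as in the listing."""
    return [a + v for a, v in zip(advantages, values)]


def clipped_surrogate(ratio, advantage):
    """PPO clipped objective for one sample (to be maximized)."""
    clipped = min(max(ratio, 1.0 - EPS), 1.0 + EPS)
    return min(ratio * advantage, clipped * advantage)


def learning_rate(iteration, total_iterations):
    """Linear decay from LR0 to 0 (assumed schedule)."""
    return LR0 * (1.0 - iteration / total_iterations)


iterations = T // B                      # outer loop length T/B
minibatches = B // MB                    # inner loop length B/MB
updates = iterations * U * minibatches   # total gradient updates
```

With these settings the outer loop runs 1000 iterations, and each iteration performs 10 epochs of 32 mini-batch updates.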

Fig.5

Results of MAPPO training

Table 2

Comparison of experimental results for vehicle travelling

Method                 Computation time/s   Collision rate/%   Traffic efficiency/s   Comfort/(m·s⁻²)
MAPPO                  0.004                0.425              6.7                    1.25
BCPPO                  0.005                0                  7.24                   1.33
VICS (asynchronous)    1.63                 63.5               21.92                  1.81
VICS (synchronous)     1.61                 0                  12.86                  1.43
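The percentages quoted in the abstract can be reproduced from Table 2. The short check below uses the values as read from the table and assumes that the VICS (asynchronous) row is the model predictive control baseline mentioned in the abstract, since its 63.5 % collision rate matches. The dictionary keys and the helper `reduction_pct` are hypothetical names.

```python
# Values as read from Table 2 (computation time, collision rate,
# traffic efficiency as travel time, comfort metric).
results = {
    "MAPPO":      {"time_s": 0.004, "collision_pct": 0.425, "travel_s": 6.7,   "comfort": 1.25},
    "BCPPO":      {"time_s": 0.005, "collision_pct": 0.0,   "travel_s": 7.24,  "comfort": 1.33},
    "VICS_async": {"time_s": 1.63,  "collision_pct": 63.5,  "travel_s": 21.92, "comfort": 1.81},
    "VICS_sync":  {"time_s": 1.61,  "collision_pct": 0.0,   "travel_s": 12.86, "comfort": 1.43},
}


def reduction_pct(new, old):
    """Percentage by which `new` is smaller than `old`."""
    return (old - new) / old * 100.0


mappo, bcppo, vics = results["MAPPO"], results["BCPPO"], results["VICS_async"]

# Cost of the behavior constraint: extra travel time relative to MAPPO (about 8.06 %).
efficiency_loss_pct = -reduction_pct(bcppo["travel_s"], mappo["travel_s"])

# Computation time ratio versus the baseline (about 326).
speedup = vics["time_s"] / bcppo["time_s"]

# Travel time and comfort metric reductions versus the baseline (about 67.0 % and 26.5 %).
efficiency_gain_pct = reduction_pct(bcppo["travel_s"], vics["travel_s"])
comfort_gain_pct = reduction_pct(bcppo["comfort"], vics["comfort"])
```

Traffic efficiency and comfort are both treated here as lower-is-better quantities, which is what makes the reported improvements come out as reductions.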