Behavior-constrained proximal policy optimization for autonomous intersection management

doi:10.13229/j.cnki.jdxbgxb.20231335

Abstract

To address the low computational efficiency and lack of safety guarantees in existing centralized cooperative control schemes, this paper first proposes a swarm-coordination algorithm based on reinforcement learning, which extends single-agent proximal policy optimization to multi-agent interactive environments so that complex cooperation in multi-agent systems can be handled. Second, the cooperative control of vehicles at unsignalized intersections is formulated as a multi-agent reinforcement learning problem, and a safety-augmented centralized cooperative control method, behavior-constrained proximal policy optimization (BCPPO), is developed. Formal safety verification and behavior constraints are integrated into the swarm-coordination algorithm, guiding the policy to be optimized iteratively in a safe manner and avoiding unsafe driving behaviors, which further safeguards traffic safety in unknown scenarios. Finally, simulation experiments are conducted on the Carla platform. The results show that adding behavior constraints costs 8.06% in traffic efficiency but yields a 100% improvement in safety. Compared with a representative model predictive control approach, the proposed method cuts computation time to 1/326 of the original, raises traffic efficiency by 67.0%, lowers the collision rate from 63.5% to 0, and improves ride comfort by 26.5%.

Key words: automotive engineering, autonomous intersection management, autonomous and connected vehicles, reinforcement learning, formal verification

CLC Number: U491

Zhen-hai GAO, He-sheng HAO, Fei GAO, Rui ZHAO. Behavior-constrained proximal policy optimization for autonomous intersection management[J]. Journal of Jilin University (Engineering and Technology Edition), 2025, 55(10): 3151-3161.

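1	Initialize the actor-critic network $\pi$ as $\pi_0$ with parameters $\theta$ set to $\theta_0$. Set the learning rate $\alpha$, discount factor $\gamma$, clip range $\epsilon$, total time steps $T$, batch size $B$, mini-batch size $M_B$, and number of epochs $U$
2	for $k = 1, 2, \ldots, T/B$ (for each iteration)
3	  Initialize the buffer $D_{batch} = \emptyset$
4	  Compute the state value $V_{\theta_k}(s_t)$ and the reward $R_t$
5	  In state $s_t$, execute the joint action $a_t = \{a_{i,t}\}_{i=1:N_a}$ and obtain the next state $s_{t+1}$
6	  Collect the sample trajectory $\tau_k$
7	  Compute the TD error $\delta_t$
8	  Compute the advantage estimate $\hat{A}^{\pi_{\theta_k}}$
9	  Compute the value target $\hat{V}(s_t) = \hat{A}^{\pi_{\theta_k}} + V_{\theta_k}(s_t)$
10	  Store $\tau_k$, $V_{\theta_k}(s_t)$, $\hat{A}^{\pi_{\theta_k}}$, $\hat{V}(s_t)$ in the buffer $D_{batch}$
11	  Save $\pi_{\theta_k}$ as the old policy $\pi_{\theta_k}^{old}$
12	  for $epoch = 1, 2, \ldots, U$ (for each epoch)
13	    Shuffle the data in the buffer $D_{batch}$ and split it into mini-batches $D^{m}_{mini}$ of size $M_B$
14	    for $m = 1, 2, \ldots, B/M_B$ (for each mini-batch)
15	      Use $D^{m}_{mini}$ to compute the difference between the new and old policies $d_t^{\theta_k}$
16	      Compute the surrogate objective of the policy network $L_{actor}(\theta_k)$
17	      Compute the value function error $L_{critic}(\theta_k)$
18	      Compute the surrogate objective of the actor-critic network $L_{PPO}(\theta_k)$
19	      Update $\theta_k$ using $\nabla_{\theta_k} L_{PPO}(\theta_k)$
20	    end for
21	  end for
22	end for
23	for $t = 0, 1, \ldots, n$ (for each time step during policy deployment)
24	  In state $s_t$, select the action $a_{i,t}$ by following the policy $\pi_\theta$
25	  for $p_{ij} = 1, 2, \ldots, M$ (for each conflicting vehicle pair)
26	    if formal verification finds that vehicle $i$ and vehicle $j$ are about to collide
27	      if $p_i > p_j$
28	        Update $a_j$ to $a_j^{con}$
29	      else
30	        Update $a_i$ to $a_i^{con}$
31	  end for
32	  Execute the joint action $a_t = \{a_{i,t}\}_{i=1:N_a}$ and obtain the next state $s_{t+1}$
33	end for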
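
The deployment phase of the listing (steps 23 to 33) can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation. The `Vehicle` class, the `will_collide` callback that stands in for the formal verification check, and the single scalar `conservative_action` are all hypothetical names introduced here. How the priorities $p_i$ are assigned is not stated on this page, so they are treated as given numbers.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple


@dataclass
class Vehicle:
    vid: int
    priority: float  # p_i in the listing; its definition is assumed to be given
    action: float    # a_{i,t} proposed by the learned policy (e.g. an acceleration)


def restrict_actions(
    vehicles: Dict[int, Vehicle],
    conflict_pairs: List[Tuple[int, int]],
    will_collide: Callable[[Vehicle, Vehicle], bool],
    conservative_action: float,
) -> Dict[int, float]:
    """Steps 25-31: override the lower-priority vehicle of each unsafe pair."""
    # Start from the actions chosen by the policy (step 24).
    actions = {vid: v.action for vid, v in vehicles.items()}
    for i, j in conflict_pairs:
        vi, vj = vehicles[i], vehicles[j]
        if will_collide(vi, vj):  # hypothetical stand-in for formal verification
            if vi.priority > vj.priority:
                actions[j] = conservative_action  # a_j <- a_j^con
            else:
                actions[i] = conservative_action  # a_i <- a_i^con
    return actions  # joint action executed in step 32


# Toy usage: only the pair (0, 1) is flagged as unsafe.
fleet = {
    0: Vehicle(0, priority=2.0, action=1.0),
    1: Vehicle(1, priority=1.0, action=1.0),
    2: Vehicle(2, priority=0.5, action=0.5),
}
unsafe_01 = lambda a, b: {a.vid, b.vid} == {0, 1}
joint = restrict_actions(fleet, [(0, 1), (1, 2)], unsafe_01, conservative_action=-3.0)
```

In this toy case vehicle 1 has the lower priority in the flagged pair, so only its action is replaced. The other two vehicles keep the actions proposed by the policy.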

Table 1

Parameters for simulation

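Parameter	Value
Discrete time step/s	0.1
Number of vehicles $N_a$	8
Number of conflicting vehicle pairs $M$	19
Speed range/(m·s^-1)	[5.5, 8]
Buffer zone length $d_b$/m	25
Actor network weight $k_{actor}$	1
Critic network weight $k_{critic}$	-0.5
Number of hidden layers	2
Hidden units per layer	512, 512
Number of random seeds	5
Total time steps $T$	2 048 000
Batch size $B$	2 048
Mini-batch size $M_B$	64
Epochs $U$	10
Clip range $\epsilon$	0.2
Discount factor $\gamma$	0.99
Learning rate $\alpha$	0.000 03→0
Return decay coefficient $\lambda$	0.95
Collision event reward $k_c$	-50
Single-vehicle passing reward $k_p^{one}$	10
All-vehicle passing reward $k_p^{all}$	50
Safety time threshold $T_s$	1.2
Intervention time threshold $T_c$	4
Formal verification weight $k_t$	-22
Formal verification bias $b_f$	2
Time-step reward $k_s$	-0.1
Comfort reward weight $k_j$	-4.5e-2
Maximum collision risk $H$	1 000
Risk weight $\alpha_v$	0.005
Traffic efficiency weight $w_v$	1
Comfort weight $w_a$	5
Prediction steps	16
Desired speed/(m·s^-1)	8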
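
To show how the training hyperparameters in Table 1 fit together, the sketch below plugs them into the standard PPO building blocks. It makes several assumptions that this page does not confirm: that $\lambda$ is used as in generalized advantage estimation, that the learning rate decays linearly to 0, and that episode boundaries can be ignored in the toy advantage computation. All function names are hypothetical.

```python
from typing import List, Tuple

# Values taken from Table 1.
GAMMA, LAMBDA, CLIP_EPS = 0.99, 0.95, 0.2
TOTAL_STEPS, BATCH, MINI_BATCH, EPOCHS = 2_048_000, 2_048, 64, 10
LR_START = 3e-5


def update_counts() -> Tuple[int, int, int]:
    """Return (iterations T/B, mini-batches per epoch B/M_B, total gradient updates)."""
    iterations = TOTAL_STEPS // BATCH
    minibatches = BATCH // MINI_BATCH
    return iterations, minibatches, iterations * EPOCHS * minibatches


def learning_rate(k: int, iterations: int) -> float:
    """Assumed linear decay from LR_START at k = 0 to 0 at k = iterations."""
    return LR_START * (1.0 - k / iterations)


def gae(rewards: List[float], values: List[float], last_value: float) -> List[float]:
    """Advantage estimates; values[t] = V(s_t), last_value = V of the final next state."""
    advantages = [0.0] * len(rewards)
    running = 0.0
    for t in reversed(range(len(rewards))):
        next_value = last_value if t == len(rewards) - 1 else values[t + 1]
        delta = rewards[t] + GAMMA * next_value - values[t]  # TD error delta_t
        running = delta + GAMMA * LAMBDA * running
        advantages[t] = running
    return advantages


def clipped_surrogate(ratio: float, advantage: float) -> float:
    """Per-sample PPO clipped objective for a new/old probability ratio."""
    clipped = max(1.0 - CLIP_EPS, min(ratio, 1.0 + CLIP_EPS))
    return min(ratio * advantage, clipped * advantage)


iterations, minibatches, total_updates = update_counts()
```

With the table's values this gives 1000 iterations, 32 mini-batches per epoch, and 320 000 gradient updates in total. The value target in the listing would then be the advantage from `gae` plus the corresponding state value.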

Table 2

Method	Computation time/s	Collision rate/%	Traffic efficiency/s	Comfort/(m·s^-2)
MAPPO	0.004	0.425	6.7	1.25
BCPPO	0.005	0	7.24	1.33
VICS (asynchronous)	1.63	63.5	21.92	1.81
VICS (synchronous)	1.61	0	12.86	1.43
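
The percentages quoted in the abstract can be reproduced from this table with simple arithmetic. The short check below assumes that lower values are better in every column and that the abstract's baseline is the asynchronous VICS row, since that is the row with the 63.5% collision rate. The dictionary keys and the helper name are illustrative only.

```python
results = {
    "MAPPO": {"time_s": 0.004, "collision_pct": 0.425, "pass_time_s": 6.7, "comfort": 1.25},
    "BCPPO": {"time_s": 0.005, "collision_pct": 0.0, "pass_time_s": 7.24, "comfort": 1.33},
    "VICS_async": {"time_s": 1.63, "collision_pct": 63.5, "pass_time_s": 21.92, "comfort": 1.81},
    "VICS_sync": {"time_s": 1.61, "collision_pct": 0.0, "pass_time_s": 12.86, "comfort": 1.43},
}


def reduction_pct(new: float, old: float) -> float:
    """Percentage by which `new` is lower than `old` (negative if it is higher)."""
    return (old - new) / old * 100.0


bcppo, mappo, vics = results["BCPPO"], results["MAPPO"], results["VICS_async"]

# Cost of the behavior constraints relative to unconstrained MAPPO.
efficiency_loss = -reduction_pct(bcppo["pass_time_s"], mappo["pass_time_s"])

# Gains relative to the asynchronous VICS baseline.
speedup = vics["time_s"] / bcppo["time_s"]
efficiency_gain = reduction_pct(bcppo["pass_time_s"], vics["pass_time_s"])
comfort_gain = reduction_pct(bcppo["comfort"], vics["comfort"])

print(round(efficiency_loss, 2), round(speedup), round(efficiency_gain, 1), round(comfort_gain, 1))
```

Rounded, these come out to 8.06, 326, 67.0 and 26.5, which match the figures reported in the abstract.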
