Behavior-constrained proximal policy optimization for autonomous intersection management

doi:10.13229/j.cnki.jdxbgxb.20231335

Abstract

Existing centralized cooperative control methods for intersections suffer from low computational efficiency and offer no safety guarantees. To address this, a swarm-coordination algorithm based on reinforcement learning is first proposed. It extends single-agent proximal policy optimization (PPO) to complex interactive multi-agent environments so that complex cooperation problems in multi-agent systems can be solved. Second, the centralized cooperative control of vehicles at unsignalized intersections is formulated as a multi-agent reinforcement learning problem, and a safety-enhanced centralized cooperative control method, behavior-constrained proximal policy optimization (BCPPO), is proposed. The method integrates formal safety verification and behavior constraints into the swarm-coordination algorithm to guide safe iterative policy optimization and to avoid unsafe driving behaviors, which further safeguards traffic safety in unseen scenarios. Finally, simulation experiments are conducted on the Carla platform. The results show that incorporating behavior constraints sacrifices 8.06% of traffic efficiency in exchange for a 100% improvement in safety. Compared with a typical model predictive control method, the proposed method reduces the computation time to 1/326 of the original, improves traffic efficiency by 67.0%, lowers the collision rate from 63.5% to 0, and improves ride comfort by 26.5%.

Key words: automotive engineering, autonomous intersection management, autonomous and connected vehicles, reinforcement learning, formal verification

CLC number: U491

Zhen-hai GAO, He-sheng HAO, Fei GAO, Rui ZHAO. Behavior-constrained proximal policy optimization for autonomous intersection management[J]. Journal of Jilin University (Engineering and Technology Edition), 2025, 55(10): 3151-3161.

Algorithm 1

BCPPO-based cooperative control of vehicles at intersections

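Initialize the actor-critic network $\pi$ as $\pi_0$ and the network parameters $\theta$ as $\theta_0$. Set the learning rate $\alpha$, reward discount factor $\gamma$, clipping range $\epsilon$, total timesteps $T$, batch size $B$, mini-batch size $MB$, and number of epochs $U$
for $k = 1, 2, \ldots, T/B$ (for each iteration)
    Initialize the buffer $D_{\mathrm{batch}} = \emptyset$
    Compute the state value $V_{\theta_k}(s_t)$ and the reward $R_t$
    In state $s_t$, execute the joint action $a_t = \{a_{i,t}\}_{i=1:N_a}$ and obtain the next state $s_{t+1}$
    Collect the sample trajectory $\tau_k$
    Compute the TD error $\delta_t$
    Compute the advantage estimate $\hat{A}^{\pi_{\theta_k}}$
    Compute the value target $\hat{V}(s_t) = \hat{A}^{\pi_{\theta_k}} + V_{\theta_k}(s_t)$
    Store $\tau_k$, $V_{\theta_k}(s_t)$, $\hat{A}^{\pi_{\theta_k}}$, $\hat{V}(s_t)$ in the buffer $D_{\mathrm{batch}}$
    Save $\pi_{\theta_k}$ as the old policy $\pi_{\theta_k^{\mathrm{old}}}$
    for $epoch = 1, 2, \ldots, U$ (for each epoch)
        Shuffle the data in the buffer $D_{\mathrm{batch}}$ and split it into mini-batches $D_m^{\mathrm{mini}}$ of size $MB$
        for $m = 1, 2, \ldots, B/MB$ (for each mini-batch)
            Use $D_m^{\mathrm{mini}}$ to compute the difference between the new and old policies $d_t^{\theta_k}$
            Compute the surrogate objective of the policy network $L_{\mathrm{actor}}(\theta_k)$
            Compute the value-function error $L_{\mathrm{critic}}(\theta_k)$
            Compute the surrogate objective of the actor-critic network $L_{\mathrm{PPO}}(\theta_k)$
            Update $\theta_k$ using $\nabla_{\theta_k} L_{\mathrm{PPO}}(\theta_k)$
        end for
    end for
end for
for $t = 0, 1, \ldots, n$ (for each timestep during policy deployment)
    In state $s_t$, select the action $a_{i,t}$ following the policy $\pi_\theta$
    for $p_{ij} = 1, 2, \ldots, M$ (for each conflicting vehicle pair)
        if formal verification finds that vehicle $i$ and vehicle $j$ are about to collide
            if $p_i > p_j$
                Update $a_j$ to $a_j^{\mathrm{con}}$
            else
                Update $a_i$ to $a_i^{\mathrm{con}}$
            end if
        end if
    end for
    Execute the joint action $a_t = \{a_{i,t}\}_{i=1:N_a}$ and obtain the next state $s_{t+1}$
end for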
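The training part of Algorithm 1 (TD error, advantage estimate, value target, and the combined actor-critic objective) can be illustrated with a minimal NumPy sketch. This is not the authors' implementation. It assumes generalized advantage estimation for the advantage, the standard PPO clipped surrogate for the actor term, a mean-squared error for the critic term, and a weighted sum of the two using the actor and critic weights listed in Table 1 (1 and -0.5). The function names `gae_advantages` and `ppo_objective` are hypothetical.

```python
import numpy as np


def gae_advantages(rewards, values, last_value, gamma=0.99, lam=0.95):
    """Advantages and value targets for one rollout (assumes no early termination)."""
    rewards = np.asarray(rewards, dtype=float)
    values = np.asarray(values, dtype=float)
    advantages = np.zeros_like(rewards)
    next_value, running = float(last_value), 0.0
    for t in reversed(range(len(rewards))):
        delta = rewards[t] + gamma * next_value - values[t]  # TD error delta_t
        running = delta + gamma * lam * running              # discounted sum of TD errors
        advantages[t] = running
        next_value = values[t]
    # Value target = advantage estimate + current state value, as in Algorithm 1.
    return advantages, advantages + values


def ppo_objective(logp_new, logp_old, advantages, values_pred, value_targets,
                  clip=0.2, k_actor=1.0, k_critic=-0.5):
    """Objective to maximize: assumed form k_actor * L_actor + k_critic * L_critic."""
    advantages = np.asarray(advantages, dtype=float)
    # Probability ratio between the new and old policies.
    ratio = np.exp(np.asarray(logp_new, dtype=float) - np.asarray(logp_old, dtype=float))
    clipped = np.clip(ratio, 1.0 - clip, 1.0 + clip)
    l_actor = np.mean(np.minimum(ratio * advantages, clipped * advantages))
    errors = np.asarray(values_pred, dtype=float) - np.asarray(value_targets, dtype=float)
    l_critic = np.mean(errors ** 2)
    return k_actor * l_actor + k_critic * l_critic


adv, targets = gae_advantages([1.0, 1.0, 1.0], [0.0, 0.0, 0.0],
                              last_value=0.0, gamma=1.0, lam=1.0)
print(adv)  # [3. 2. 1.]
```

With a negative critic weight, maximizing the combined objective minimizes the value error, which is one plausible reading of the sign of the critic weight in Table 1.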

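The deployment loop of Algorithm 1 overrides the action of the lower-priority vehicle in every conflicting pair that formal verification flags. The sketch below shows only that control flow. The verification predicate `will_collide` and the replacement rule `constrained_action` are hypothetical, caller-supplied stand-ins, since the listing does not specify either, and `apply_behavior_constraints` is an illustrative name.

```python
from typing import Callable, Dict, Iterable, Tuple


def apply_behavior_constraints(
    actions: Dict[int, float],
    priorities: Dict[int, float],
    conflict_pairs: Iterable[Tuple[int, int]],
    will_collide: Callable[[int, int], bool],
    constrained_action: Callable[[int], float],
) -> Dict[int, float]:
    """Return a copy of the joint action with flagged vehicles overridden."""
    safe = dict(actions)
    for i, j in conflict_pairs:
        if will_collide(i, j):  # stand-in for the formal verification check
            # The vehicle with the lower priority yields and gets the constrained action.
            yielding = j if priorities[i] > priorities[j] else i
            safe[yielding] = constrained_action(yielding)
    return safe


joint = {0: 1.0, 1: 1.0, 2: 1.0}
result = apply_behavior_constraints(
    joint,
    priorities={0: 3.0, 1: 2.0, 2: 1.0},
    conflict_pairs=[(0, 1), (1, 2)],
    will_collide=lambda i, j: (i, j) == (0, 1),  # toy predicate for illustration
    constrained_action=lambda vehicle: -1.0,     # toy braking command
)
print(result)  # {0: 1.0, 1: -1.0, 2: 1.0}
```

Keeping the check and the replacement rule as parameters means the learned policy is untouched; only the executed joint action changes when a pair is flagged.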
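Table 1

Simulation parameter settings

Parameter	Value
Discrete time step/s	0.1
Number of vehicles $N_a$	8
Number of conflicting vehicle pairs $M$	19
Speed range/(m·s^-1)	[5.5, 8]
Buffer zone length $d_b$/m	25
Actor network weight $k_{\mathrm{actor}}$	1
Critic network weight $k_{\mathrm{critic}}$	-0.5
Number of hidden layers	2
Hidden units per layer	512, 512
Number of random seeds	5
Total timesteps $T$	2 048 000
Batch size $B$	2 048
Mini-batch size $MB$	64
Epochs $U$	10
Clipping range $\epsilon$	0.2
Reward discount factor $\gamma$	0.99
Learning rate $\alpha$	0.000 03 → 0
Return decay coefficient $\lambda$	0.95
Collision reward $k_c$	-50
Single-vehicle passing reward $k_p^{\mathrm{one}}$	10
All-vehicles passing reward $k_p^{\mathrm{all}}$	50
Safety time threshold $T_s$	1.2
Intervention time threshold $T_c$	4
Formal verification weight $k_t$	-22
Formal verification bias $b_f$	2
Time-step reward $k_s$	-0.1
Comfort reward weight $k_j$	-4.5e-2
Maximum collision risk $H$	1 000
Risk weight $\alpha_v$	0.005
Traffic efficiency weight $w_v$	1
Comfort weight $w_a$	5
Prediction steps	16
Desired speed/(m·s^-1)	8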
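The training-budget entries of Table 1 imply a fixed number of iterations and gradient updates. The sketch below derives them and adds a learning-rate schedule. The class name `PPOConfig` and its field names are hypothetical, and the linear shape of the decay is an assumption: the table only states that the learning rate goes from 0.000 03 to 0.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PPOConfig:
    total_timesteps: int = 2_048_000  # T
    batch_size: int = 2_048           # B
    minibatch_size: int = 64          # MB
    epochs: int = 10                  # U
    lr_start: float = 3e-5            # initial learning rate
    lr_end: float = 0.0               # final learning rate

    @property
    def iterations(self) -> int:
        """Number of outer iterations, T / B."""
        return self.total_timesteps // self.batch_size

    @property
    def minibatches_per_epoch(self) -> int:
        """Number of mini-batches per epoch, B / MB."""
        return self.batch_size // self.minibatch_size

    @property
    def gradient_steps(self) -> int:
        """Total parameter updates over the whole run."""
        return self.iterations * self.epochs * self.minibatches_per_epoch

    def learning_rate(self, k: int) -> float:
        """Learning rate at iteration k, assuming a linear decay (not stated in the table)."""
        fraction = k / self.iterations
        return self.lr_start + (self.lr_end - self.lr_start) * fraction


cfg = PPOConfig()
print(cfg.iterations, cfg.minibatches_per_epoch, cfg.gradient_steps)  # 1000 32 320000
```

Under these settings each batch of 2 048 samples is reused for 10 epochs of 32 mini-batch updates before new data are collected.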

Table 2

Method	Computation time/s	Collision rate/%	Traffic efficiency/s	Comfort/(m·s^-2)
MAPPO	0.004	0.425	6.7	1.25
BCPPO	0.005	0	7.24	1.33
VICS (asynchronous)	1.63	63.5	21.92	1.81
VICS (synchronous)	1.61	0	12.86	1.43
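The percentages quoted in the abstract can be reproduced from the rows of this table. The short check below assumes that the traffic-efficiency column is a travel time and that lower values are better for both efficiency and comfort, an interpretation under which the abstract's figures match. The dictionary keys and the helper `percent_change` are illustrative names.

```python
TABLE_2 = {
    "MAPPO": {"compute_time_s": 0.004, "collision_pct": 0.425, "travel_time_s": 6.7, "comfort": 1.25},
    "BCPPO": {"compute_time_s": 0.005, "collision_pct": 0.0, "travel_time_s": 7.24, "comfort": 1.33},
    "VICS_async": {"compute_time_s": 1.63, "collision_pct": 63.5, "travel_time_s": 21.92, "comfort": 1.81},
    "VICS_sync": {"compute_time_s": 1.61, "collision_pct": 0.0, "travel_time_s": 12.86, "comfort": 1.43},
}


def percent_change(new: float, old: float) -> float:
    """Relative change from old to new, in percent."""
    return (new - old) / old * 100.0


bcppo, mappo, vics = TABLE_2["BCPPO"], TABLE_2["MAPPO"], TABLE_2["VICS_async"]

# Cost of adding behavior constraints: BCPPO travel time relative to MAPPO.
efficiency_cost = percent_change(bcppo["travel_time_s"], mappo["travel_time_s"])
# Computation-time ratio of asynchronous VICS to BCPPO.
speedup = vics["compute_time_s"] / bcppo["compute_time_s"]
# Gains over asynchronous VICS (a decrease counts as an improvement).
efficiency_gain = -percent_change(bcppo["travel_time_s"], vics["travel_time_s"])
comfort_gain = -percent_change(bcppo["comfort"], vics["comfort"])

print(f"{efficiency_cost:.2f} {speedup:.0f} {efficiency_gain:.1f} {comfort_gain:.1f}")
# prints: 8.06 326 67.0 26.5
```

The comparison baseline in the abstract therefore corresponds to the asynchronous VICS row, whose collision rate of 63.5% drops to 0 under BCPPO.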