Journal of Jilin University (Engineering and Technology Edition), 2025, Vol. 55, Issue 10: 3151-3161. doi: 10.13229/j.cnki.jdxbgxb.20231335

• Vehicle Engineering · Mechanical Engineering •

Behavior-constrained proximal policy optimization for autonomous intersection management

Zhen-hai GAO1, He-sheng HAO2, Fei GAO1, Rui ZHAO2

  1. State Key Laboratory of Automotive Simulation and Control, Jilin University, Changchun 130022, China
  2. College of Automotive Engineering, Jilin University, Changchun 130022, China
  • Received: 2023-12-02 Online: 2025-10-01 Published: 2026-02-03
  • Contact: Rui ZHAO E-mail: gaozh@jlu.edu.cn; rzhao@jlu.edu.cn
  • About the author: Zhen-hai GAO (1973-), male, professor, Ph.D. Research interests: automotive driver-assistance systems, autonomous driving, and driving behavior analysis. E-mail: gaozh@jlu.edu.cn
  • Funding:
    National Natural Science Foundation of China (52202494); National Natural Science Foundation of China (52202495)

Abstract:

To address the low computational efficiency and lack of safety guarantees in existing centralized cooperative control methods, a group-coordination algorithm based on reinforcement learning is first proposed. It extends single-agent proximal policy optimization to multi-agent interactive environments so that complex cooperation in multi-agent systems can be handled. Second, the centralized cooperative control of vehicles at unsignalized intersections is formulated as a multi-agent reinforcement learning problem, and a safety-enhanced centralized cooperative control method, behavior-constrained proximal policy optimization (BCPPO), is proposed. The method integrates formal safety verification and behavior constraints into the group-coordination algorithm to guide safe iterative optimization of the policy and to avoid unsafe driving behaviors, further safeguarding traffic safety in unseen scenarios. Finally, simulation experiments are conducted on the Carla platform. The results show that incorporating behavior constraints costs 8.06% in traffic efficiency while achieving a 100% safety improvement. Compared with a representative model predictive control method, the proposed method reduces computation time to 1/326 of the original, increases traffic efficiency by 67.0%, lowers the collision rate from 63.5% to 0, and improves ride comfort by 26.5%.

Key words: automotive engineering, autonomous intersection management, intelligent connected vehicles, reinforcement learning, formal verification

CLC number:

  • U491

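Fig. 1

Driving area division, driving routes, and potential conflict points

Fig. 2

Vehicle motion model

Fig. 3

BCPPO system architecture

Fig. 4

Formal safety verification procedure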

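Algorithm 1

BCPPO-based cooperative control of vehicles at an intersection

1   Initialize the actor-critic network $\pi \leftarrow \pi_0$ and the network parameters $\theta \leftarrow \theta_0$. Set the learning rate $\alpha$, reward discount factor $\gamma$, clipping range $\epsilon$, total time steps $T$, batch size $B$, mini-batch size $MB$, and number of epochs $U$
2   for $k = 1, 2, \ldots, T/B$ (for each iteration)
3     Initialize the buffer $D_{batch} = \varnothing$
4     Compute the state value $V_{\theta_k}(s_t)$ and the reward $R_t$
5     In state $s_t$, execute the joint action $a_t = \{a_{i,t}\}_{i=1:N_a}$ and obtain the next state $s_{t+1}$
6     Collect the sample trajectory $\tau_k$
7     Compute the TD error $\delta_t$
8     Compute the advantage estimate $\hat{A}^{\pi_{\theta_k}}$
9     Compute the value target $\hat{V}(s_t) = \hat{A}^{\pi_{\theta_k}} + V_{\theta_k}(s_t)$
10    Store $\tau_k$, $V_{\theta_k}(s_t)$, $\hat{A}^{\pi_{\theta_k}}$, $\hat{V}(s_t)$ in the buffer $D_{batch}$
11    Update $\pi_{\theta_k}^{old} \leftarrow \pi_{\theta_k}$
12    for epoch $= 1, 2, \ldots, U$ (for each epoch)
13      Shuffle the data in the buffer $D_{batch}$ and split it into mini-batches $D^{m}_{mini}$ of size $MB$
14      for $m = 1, 2, \ldots, B/MB$ (for each mini-batch)
15        Use $D^{m}_{mini}$ to compute the difference between the new and old policies $d_t^{\theta_k}$
16        Compute the surrogate objective of the policy network $L_{actor}(\theta_k)$
17        Compute the value function error $L_{critic}(\theta_k)$
18        Compute the surrogate objective of the actor-critic network $L_{PPO}(\theta_k)$
19        Update $\theta_k$ using $\nabla_{\theta_k} L_{PPO}(\theta_k)$
20      end for
21    end for
22  end for
23  for $t = 0, 1, \ldots, n$ (for each time step during policy deployment)
24    In state $s_t$, select the actions $a_{i,t}$ following the policy $\pi_\theta$
25    for $p_{ij} = 1, 2, \ldots, M$ (for each conflicting vehicle pair)
26      if formal verification finds that vehicle $i$ and vehicle $j$ are about to collide
27        if $p_i > p_j$
28          Update $a_j \leftarrow a_j^{con}$
29        else
30          Update $a_i \leftarrow a_i^{con}$
31    end for
32    Execute the joint action $a_t = \{a_{i,t}\}_{i=1:N_a}$ and obtain the next state $s_{t+1}$
33  end for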
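The deployment phase of Algorithm 1 (steps 23 to 33) can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the formal verification step is replaced by a simple stand-in that compares arrival times at a shared conflict point against the safety time threshold of 1.2 from Table 1 (assumed to be in seconds), the constrained action is assumed to be a fixed deceleration, and the names `Vehicle`, `about_to_collide`, and `constrain_actions` are hypothetical.

```python
from dataclasses import dataclass

T_S = 1.2  # safety time threshold from Table 1 (unit assumed to be seconds)


@dataclass
class Vehicle:
    priority: float          # p_i in Algorithm 1 (how it is assigned is not given here)
    dist_to_conflict: float  # distance to the shared conflict point in m (hypothetical state)
    speed: float             # current speed in m/s, assumed strictly positive


def about_to_collide(vi, vj, t_s=T_S):
    """Stand-in for formal verification: flag a pair whose arrival times
    at the conflict point differ by less than t_s."""
    ti = vi.dist_to_conflict / vi.speed
    tj = vj.dist_to_conflict / vj.speed
    return abs(ti - tj) < t_s


def constrain_actions(vehicles, actions, conflict_pairs, a_con=-3.0):
    """Steps 25-31: for each unsafe pair, override the action of the
    lower-priority vehicle with the constrained action a_con (assumed value)."""
    safe = list(actions)  # copy so the policy output is left untouched
    for i, j in conflict_pairs:
        if about_to_collide(vehicles[i], vehicles[j]):
            if vehicles[i].priority > vehicles[j].priority:
                safe[j] = a_con
            else:
                safe[i] = a_con
    return safe


vehicles = [Vehicle(2.0, 20.0, 8.0), Vehicle(1.0, 22.0, 8.0), Vehicle(3.0, 60.0, 6.0)]
policy_actions = [0.5, 0.5, 0.0]  # accelerations proposed by the learned policy
pairs = [(0, 1), (0, 2), (1, 2)]
print(constrain_actions(vehicles, policy_actions, pairs))  # [0.5, -3.0, 0.0]
```

Only vehicles 0 and 1 reach the conflict point within the threshold of each other, so the lower-priority vehicle 1 is overridden while the other actions pass through unchanged. Keeping the override outside the policy means the learned network is never modified at deployment time.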

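Table 1

Simulation experiment parameter settings

| Parameter | Value |
| --- | --- |
| Discrete time step / s | 0.1 |
| Number of vehicles $N_a$ | 8 |
| Number of conflicting vehicle pairs $M$ | 19 |
| Speed range / (m·s⁻¹) | [5.5, 8] |
| Buffer zone length $d_b$ / m | 25 |
| Actor network weight $k_{actor}$ | 1 |
| Critic network weight $k_{critic}$ | -0.5 |
| Number of hidden layers | 2 |
| Number of hidden units | 512, 512 |
| Number of seeds | 5 |
| Total time steps $T$ | 2 048 000 |
| Batch size $B$ | 2 048 |
| Mini-batch size $MB$ | 64 |
| Number of epochs $U$ | 10 |
| Clipping range $\epsilon$ | 0.2 |
| Reward discount factor $\gamma$ | 0.99 |
| Learning rate $\alpha$ | 0.000 03 → 0 |
| Return decay coefficient $\lambda$ | 0.95 |
| Collision event reward $k_c$ | -50 |
| Single-vehicle passing reward $k_p^{one}$ | 10 |
| All-vehicles passing reward $k_p^{all}$ | 50 |
| Safety time threshold $T_s$ | 1.2 |
| Intervention time threshold $T_c$ | 4 |
| Formal verification weight $k_t$ | -22 |
| Formal verification bias $b_f$ | 2 |
| Time-step reward $k_s$ | -0.1 |
| Comfort reward weight $k_j$ | -4.5e-2 |
| Maximum collision risk $H$ | 1 000 |
| Risk weight $\alpha_v$ | 0.005 |
| Traffic efficiency weight $w_v$ | 1 |
| Comfort weight $w_a$ | 5 |
| Number of prediction steps | 16 |
| Desired speed / (m·s⁻¹) | 8 |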
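The training hyperparameters in Table 1 can be made concrete with a small sketch. The loop counts follow directly from T, B, MB, and U as they are used in Algorithm 1. The advantage and clipping functions are the standard generalized advantage estimation and PPO clipped-surrogate forms, which are assumed here because this page lists the steps but not the equations. The function names `gae` and `clipped_surrogate` and the sample rewards and values are illustrative only.

```python
T, B, MB, U = 2_048_000, 2_048, 64, 10   # total steps, batch, mini-batch, epochs (Table 1)
GAMMA, LAM, EPS = 0.99, 0.95, 0.2        # discount, return decay, clipping range (Table 1)

iterations = T // B                      # outer loop of Algorithm 1 (step 2)
minibatches = B // MB                    # innermost loop (step 14)
updates_per_iteration = U * minibatches  # gradient updates per collected batch
print(iterations, minibatches, updates_per_iteration)  # 1000 32 320


def gae(rewards, values, last_value, gamma=GAMMA, lam=LAM):
    """Generalized advantage estimation (assumed form of steps 7-9)."""
    advantages = [0.0] * len(rewards)
    running = 0.0
    next_value = last_value
    for t in reversed(range(len(rewards))):
        delta = rewards[t] + gamma * next_value - values[t]  # TD error (step 7)
        running = delta + gamma * lam * running              # advantage (step 8)
        advantages[t] = running
        next_value = values[t]
    targets = [a + v for a, v in zip(advantages, values)]    # value target (step 9)
    return advantages, targets


def clipped_surrogate(ratio, advantage, eps=EPS):
    """Standard PPO clipped objective for one sample (assumed form of step 16)."""
    clipped = max(1.0 - eps, min(1.0 + eps, ratio))
    return min(ratio * advantage, clipped * advantage)


# Two time-step penalties followed by a single-vehicle passing reward (Table 1 values);
# last_value=0.0 assumes the episode ends after the third step.
adv, tgt = gae(rewards=[-0.1, -0.1, 10.0], values=[1.0, 2.0, 5.0], last_value=0.0)
print([round(a, 4) for a in adv])
print(clipped_surrogate(1.5, 2.0), clipped_surrogate(0.5, -1.0))
```

With these settings each of the 1000 iterations collects 2 048 samples and performs 320 mini-batch updates on them. The clipping example shows that a probability ratio outside the interval [0.8, 1.2] no longer increases the objective, which is what keeps each policy update close to the old policy.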

Fig. 5

MAPPO training results

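Table 2

Comparison of vehicle passing results at the intersection

| Method | Computation time / s | Collision rate / % | Traffic efficiency / s | Comfort / (m·s⁻²) |
| --- | --- | --- | --- | --- |
| MAPPO | 0.004 | 0.425 | 6.7 | 1.25 |
| BCPPO | 0.005 | 0 | 7.24 | 1.33 |
| VICS (asynchronous) | 1.63 | 63.5 | 21.92 | 1.81 |
| VICS (synchronous) | 1.61 | 0 | 12.86 | 1.43 |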
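The percentages quoted in the abstract can be reproduced from Table 2 with a few lines of arithmetic. The sketch below assumes that the model predictive control baseline is the VICS (asynchronous) row, since that row reproduces the reported figures, and it reads the traffic-efficiency and comfort columns as lower-is-better quantities. The variable and function names are illustrative.

```python
# Rows of Table 2: (computation time / s, collision rate / %,
#                   traffic efficiency / s, comfort / (m/s^2))
results = {
    "MAPPO":        (0.004, 0.425, 6.70, 1.25),
    "BCPPO":        (0.005, 0.0,   7.24, 1.33),
    "VICS (async)": (1.63,  63.5, 21.92, 1.81),
    "VICS (sync)":  (1.61,  0.0,  12.86, 1.43),
}


def rel_reduction(baseline, value):
    """Percentage by which `value` is lower than `baseline`."""
    return 100.0 * (baseline - value) / baseline


bc, ma, mpc = results["BCPPO"], results["MAPPO"], results["VICS (async)"]

speedup = mpc[0] / bc[0]                            # computation time ratio vs. MPC baseline
efficiency_loss = 100.0 * (bc[2] - ma[2]) / ma[2]   # cost of adding behavior constraints
collision_reduction = rel_reduction(ma[1], bc[1])   # MAPPO -> BCPPO safety improvement
efficiency_gain = rel_reduction(mpc[2], bc[2])      # vs. MPC baseline
comfort_gain = rel_reduction(mpc[3], bc[3])         # vs. MPC baseline

print(f"computation time: 1/{speedup:.0f} of the baseline")
print(f"efficiency loss from constraints: {efficiency_loss:.2f}%")
print(f"collision reduction from constraints: {collision_reduction:.0f}%")
print(f"efficiency gain over baseline: {efficiency_gain:.1f}%")
print(f"comfort gain over baseline: {comfort_gain:.1f}%")
```

Running it gives 1/326, 8.06%, 100%, 67.0%, and 26.5%, matching the abstract. Note that the synchronous VICS row also reaches a zero collision rate, so the 63.5% figure applies only to the asynchronous variant.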