Journal of Jilin University (Engineering and Technology Edition) ›› 2025, Vol. 55 ›› Issue (10): 3180-3188. doi: 10.13229/j.cnki.jdxbgxb.20240017

• Transportation Engineering · Civil Engineering •

Modeling interaction policy of autonomous vehicle and pedestrian based on deep reinforcement learning

Wei-chao HU1,2, Zhen-ming YANG3, Peng-cheng YU2, Yan-yan CHEN1, She-qiang MA3

  1. Faculty of Architecture, Civil and Transportation Engineering, Beijing University of Technology, Beijing 100124, China
  2. Research Institute for Road Safety of the Ministry of Public Security, Beijing 100062, China
  3. School of Traffic Management, People's Public Security University of China, Beijing 100091, China
  • Received: 2024-01-03 Online: 2025-10-01 Published: 2026-02-03
  • About the author: Wei-chao HU (1987-), male, associate researcher, Ph.D. Research interests: intelligent transportation, traffic safety. E-mail: dfengwin@163.com
  • Funding:
    National Key Research and Development Program of China (2020YFB1600304)

Abstract:

To enable Autonomous Vehicles (AVs) to interact with pedestrians safely and efficiently, this study employs the Multi-Agent Deep Deterministic Policy Gradient (MADDPG) algorithm to build a pedestrian-vehicle interaction model for mixed traffic that includes both autonomous and human-driven vehicles, and solves for the interaction policy. The resulting policy lets AVs avoid accidents without relying on communication. Compared with several baseline algorithms, the proposed algorithm shows significant improvements in training performance, collision rate, and traffic efficiency. The model is also tested in scenarios of different risk levels. The results show that as the intensity of pedestrian behavioral noise increases, the interaction delay of both vehicle types increases, while the collision rate of AVs first rises and then falls, indicating an adaptive learning phase. Under high noise intensity, AVs avoid collisions better than human-driven vehicles, which better protects pedestrians. These outcomes underscore the potential of MADDPG-based frameworks to contribute to safer, more efficient AV integration in mixed traffic scenarios.

Key words: engineering of transportation system, autonomous vehicle, interaction of vehicle and pedestrian, deep reinforcement learning, multi-agent system

CLC Number:

  • U491

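Fig. 1 DDPG workflow

Fig. 2 Comparison of training rewards

Fig. 3 Speed comparison

Fig. 4 AV/HDV collision rate

Fig. 5 AV/HDV delay time

Fig. 6 Schematic of agent interaction

Fig. 7 Speed profiles of AV1 and AV2 during AV-pedestrian interaction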
