Journal of Jilin University (Information Science Edition) ›› 2025, Vol. 43 ›› Issue (5): 965-977.


Autonomous Driving Decision-Making for Multi-City Scenarios Based on Continual Reinforcement Learning

LIU Pengyou, YU Di, CHEN Qili, ZHANG Changwen

  1. School of Automation, Beijing Information Science and Technology University, Beijing 100192, China
  • Received: 2025-02-12  Online: 2025-09-28  Published: 2025-11-19
  • Corresponding author: YU Di (1977—), female, from Anda, Heilongjiang, Ph.D., associate professor and master's supervisor at Beijing Information Science and Technology University; her main research interest is intelligent decision-making. (Tel) 86-13520786717 (E-mail) yudizlg@aliyun.com
  • About the author: LIU Pengyou (1997—), male, from Hailun, Heilongjiang, master's student at Beijing Information Science and Technology University; his main research interest is intelligent decision-making for autonomous driving. (Tel) 86-17745161350 (E-mail) liupengyou21@126.com
  • Supported by: National Natural Science Foundation of China (62103056)

Abstract: To address catastrophic forgetting in decision-making for autonomous driving across multi-city scenarios, a decision-making framework based on continual reinforcement learning is proposed. The framework is built on the IMPALA (Importance Weighted Actor-Learner Architecture) algorithm. First, a co-attention perception module is integrated to extract salient environmental representations through cross-scenario feature interaction. Second, a self-activating neural ensemble architecture is constructed to enable autonomous activation of knowledge modules. Finally, a replay mechanism that combines scenario-specific features with historical trajectory experience replay is applied to alleviate the forgetting of old knowledge. Off-policy behavior cloning and on-policy learning are employed jointly to maintain the plasticity and stability of the decision-making algorithm. Whether to reuse old modules or generate new ones is determined by the requirements of different autonomous driving scenario tasks, and excessive memory usage is addressed through module fusion. Ablation and comparative experiments are conducted on two groups of multi-city scenarios, and the performance of the proposed method is validated by comparing path completion rates and cumulative rewards. Experimental results show that the average completion rate reaches approximately 85% on the first group of sequential scenario tasks and 81.93% on the second. The proposed scheme effectively mitigates catastrophic forgetting in continual decision-making across multi-city scenarios and achieves smoother, more stable driving performance.
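
The abstract describes the framework only in prose. As a rough illustration of how a self-activating neural ensemble combined with replay-based off-policy behavior cloning might be organized, a PyTorch sketch follows; the class and function names, the cosine-similarity activation threshold, and the loss weighting are illustrative assumptions of ours, not the authors' released implementation.

# Minimal sketch, assuming discrete actions and a precomputed scenario descriptor.
# KnowledgeModule, SelfActivatingEnsemble, select_or_spawn, continual_update and the
# 0.7 similarity threshold are hypothetical names/values for exposition only.
import torch
import torch.nn as nn
import torch.nn.functional as F


class KnowledgeModule(nn.Module):
    """One policy/value head of the ensemble, tagged with a scenario prototype."""

    def __init__(self, feat_dim: int, n_actions: int):
        super().__init__()
        self.prototype = nn.Parameter(torch.zeros(feat_dim), requires_grad=False)
        self.policy = nn.Sequential(nn.Linear(feat_dim, 128), nn.ReLU(),
                                    nn.Linear(128, n_actions))
        self.value = nn.Sequential(nn.Linear(feat_dim, 128), nn.ReLU(),
                                   nn.Linear(128, 1))

    def forward(self, feats):
        return self.policy(feats), self.value(feats).squeeze(-1)


class SelfActivatingEnsemble(nn.Module):
    """Reuses the closest existing knowledge module or spawns a new one."""

    def __init__(self, feat_dim: int, n_actions: int, sim_threshold: float = 0.7):
        super().__init__()
        self.feat_dim, self.n_actions = feat_dim, n_actions
        self.sim_threshold = sim_threshold          # assumed activation threshold
        self.knowledge = nn.ModuleList()

    def select_or_spawn(self, scenario_feat: torch.Tensor) -> int:
        # Compare the current scenario descriptor with each stored prototype.
        if len(self.knowledge) > 0:
            sims = torch.stack([F.cosine_similarity(scenario_feat, m.prototype, dim=0)
                                for m in self.knowledge])
            best = int(torch.argmax(sims))
            if float(sims[best]) >= self.sim_threshold:
                return best                         # self-activate an old module
        # Unfamiliar city scenario: generate a new module and remember its prototype.
        module = KnowledgeModule(self.feat_dim, self.n_actions)
        module.prototype.copy_(scenario_feat.detach())
        self.knowledge.append(module)
        return len(self.knowledge) - 1


def continual_update(ensemble, idx, feats, actions, advantages,
                     replay_feats, replay_actions, bc_weight: float = 0.5):
    """One training step: an on-policy policy-gradient term for plasticity plus an
    off-policy behavior-cloning term on replayed old-scenario data for stability."""
    module = ensemble.knowledge[idx]
    logits, _ = module(feats)
    logp = F.log_softmax(logits, dim=-1)
    # Policy-gradient loss on fresh on-policy data (advantages assumed detached).
    pg_loss = -(logp.gather(1, actions.unsqueeze(1)).squeeze(1) * advantages).mean()
    # Behavior cloning on replayed (scenario feature, action) pairs from old cities.
    replay_logits, _ = module(replay_feats)
    bc_loss = F.cross_entropy(replay_logits, replay_actions)
    return pg_loss + bc_weight * bc_loss


# Hypothetical usage:
#   ensemble = SelfActivatingEnsemble(feat_dim=256, n_actions=9)
#   idx = ensemble.select_or_spawn(scenario_descriptor)
#   loss = continual_update(ensemble, idx, feats, acts, adv, replay_feats, replay_acts)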

Key words: autonomous driving, continual reinforcement learning, replay mechanism, self-activating neural ensembles, catastrophic forgetting

CLC number: 

  • TP181