Journal of Jilin University (Information Science Edition) ›› 2024, Vol. 42 ›› Issue (4): 588-599.

Formation Navigation of Multi-Unmanned Surface Vehicles Based on ATMADDPG Algorithm

WANG Siqi1, GUAN Wei1, TONG Min2, ZHAO Shengye3

  1. Maritime College, Dalian Maritime University, Dalian 116026, China; 2. Communication Design Institute Company Limited, Jilin University, Changchun 130012, China; 3. Advanced Technology Research Institute, Liaoning Yihui Technology Group Company Limited, Shenyang 110170, China
  • Received: 2023-05-18 Online: 2024-07-22 Published: 2024-07-22
  • About the authors: WANG Siqi (1997-), female, born in Fushun, Liaoning, master's student at Dalian Maritime University, research interest: unmanned ship formation navigation, (Tel) 86-13290781627 (E-mail) misswang97@foxmail.com; Corresponding author: GUAN Wei (1982-), male, born in Anshan, Liaoning, professor and doctoral supervisor at Dalian Maritime University, research interests: unmanned ship formation navigation and unmanned ship obstacle avoidance, (Tel) 86-18900982591 (E-mail) gwwtxdy@163.com.
  • Supported by:

    National Natural Science Foundation of China (52171342)

Abstract: The ATMADDPG (Attention Mechanism based Multi-Agent Deep Deterministic Policy Gradient) algorithm is proposed to improve the navigation capability of multi-unmanned-ship formation systems. In the training phase, the algorithm learns the best policy through a large number of trials; in the experimental phase, it applies the trained policy directly to obtain the best formation path. The simulation experiments use four identical "Baichuan" unmanned ships as test subjects. The results show that the formation-keeping strategy based on the ATMADDPG algorithm achieves stable formation navigation of multiple unmanned ships and meets the formation-keeping requirements to a certain extent. Compared with the MADDPG (Multi-Agent Deep Deterministic Policy Gradient) algorithm, the proposed ATMADDPG algorithm performs better in convergence speed, formation-keeping ability, and adaptability to environmental changes. The overall navigation efficiency improves by about 80%, indicating considerable application potential.

Key words: formation navigation of multi-unmanned surface vehicles, multi-agent deep deterministic policy gradient (MADDPG) algorithm, attention mechanism, deep reinforcement learning

CLC Number: 

  • TP301