Journal of Jilin University (Information Science Edition), 2025, Vol. 43, Issue (6): 1230-1236.

Route Planning for Multi-UAV Systems Based on Reinforcement Learning

TU Xiaobin

  1. College of Information Engineering, Minnan University of Science and Technology, Quanzhou 362700, China
  • Received: 2025-06-26 Online: 2025-12-08 Published: 2025-12-08
  • About the author: TU Xiaobin (born 1990), male, from Zhangzhou, Fujian; lecturer at Minnan University of Science and Technology; master's degree; research interests: big data analysis and machine learning. (Tel) 86-15159517692 (E-mail) 936705675@qq.com
  • Funding:
    Science and Technology Plan Project of the Fujian Provincial Department of Science and Technology (2024H0038); Science and Technology Innovation Team Fund of Minnan University of Science and Technology (2024XTD160)

Abstract:

To enable multi-UAV (Unmanned Aerial Vehicle) swarms to jointly optimize communication performance, task efficiency, and flight safety under specific network conditions, and thereby better carry out patrol missions in urban areas, the route planning problem under multi-objective constraints is studied. Based on double deep reinforcement learning, the airspace with a known communication quality distribution is spatially discretized, and spatial, energy consumption, and communication models are established. A multi-dimensional reward function covering the amount of data acquired, flight safety, remaining battery power, and path consumption is designed, and the training process is stabilized through experience replay and a target network mechanism. Experiments show that the trained network model can still generate optimal wireless network transmission strategies and safe flight trajectories in environments not known in advance. The approach effectively solves the route planning problem under multi-objective constraints and verifies the applicability of double deep reinforcement learning in this field.

Key words:

CLC Number:

  • TN929
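
The abstract names the ingredients of the method (a four-term reward, experience replay, and a target network) but gives no formulas. As a rough illustration only, and assuming that "double deep reinforcement learning" refers to a Double DQN style update, the sketch below shows how those pieces typically fit together. The function names, the reward weights, and the discount factor are all hypothetical placeholders and are not taken from the paper.

```python
import random
from collections import deque


def patrol_reward(data_collected, collided, battery_left, path_cost,
                  w_data=1.0, w_safety=10.0, w_battery=0.1, w_path=0.05):
    """Hypothetical weighted sum of the four reward terms named in the abstract.

    The weights are placeholders, not values from the paper.
    """
    safety_penalty = w_safety if collided else 0.0
    return (w_data * data_collected - safety_penalty
            + w_battery * battery_left - w_path * path_cost)


class ReplayBuffer:
    """Fixed-size experience replay memory; the oldest transitions are dropped."""

    def __init__(self, capacity):
        self.memory = deque(maxlen=capacity)

    def push(self, transition):
        self.memory.append(transition)

    def sample(self, batch_size):
        # Uniform sampling breaks the correlation between consecutive steps.
        return random.sample(list(self.memory), batch_size)


def double_dqn_targets(rewards, dones, q_online_next, q_target_next, gamma=0.99):
    """Compute Double DQN regression targets for a batch of transitions.

    q_online_next / q_target_next hold, per transition, the Q-values of the
    next state under the online network and the target network (assumed to
    be computed elsewhere).
    """
    targets = []
    for r, done, q_on, q_tg in zip(rewards, dones, q_online_next, q_target_next):
        # The online network selects the action ...
        best_action = max(range(len(q_on)), key=q_on.__getitem__)
        # ... and the target network evaluates it; terminal states do not bootstrap.
        bootstrap = 0.0 if done else gamma * q_tg[best_action]
        targets.append(r + bootstrap)
    return targets
```

Selecting the action with one network and scoring it with the other is what distinguishes a Double DQN update from a standard DQN update, and it is generally used to reduce the overestimation of Q-values. Whether the paper uses exactly this variant is an assumption here.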