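Online trajectory planning for the ascent phase of a high-speed unmanned aerial vehicle based on proximal policy optimization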

doi:10.13229/j.cnki.jdxbgxb20221282

Abstract

Ascent-phase trajectory planning for a high-speed unmanned aerial vehicle (UAV) must be solved online and quickly under multiple constraints. To address this problem, the kinematic and dynamic models of the vehicle were first established and the constraints on trajectory planning were specified. Based on these constraints and the flight characteristics, the action space, state space, and reward function were designed to meet the mission requirements within the proximal policy optimization (PPO) policy-gradient framework. Second, because ascent-phase trajectory planning depends strongly on the time history of the flight, a long short-term memory (LSTM) network structure was introduced into the conventional PPO algorithm. The resulting PPO-LSTM algorithm was used to solve the online ascent-phase trajectory planning problem, yielding a trained model that plans the optimal angle-of-attack policy in real time from the vehicle state. Finally, the performance of the algorithm was verified by Monte Carlo simulation. The results show that, compared with conventional PPO and particle swarm optimization, the root-mean-square error of the terminal state is reduced by about 50%, which demonstrates the effectiveness and superiority of the proposed algorithm.

Keywords: navigation, guidance and control; hypersonic unmanned aerial vehicle; ascent phase; trajectory planning; proximal policy optimization algorithm


Zhi-yong SHE, Tong-ming ZHU, Wang-kui LIU. Rapid trajectory programming for hypersonic unmanned aerial vehicle in ascent phase based on proximal policy optimization[J]. Journal of Jilin University (Engineering and Technology Edition), 2023, 53(3): 863-870.

Figures and tables (13): Figs. 1-9, Tables 1-4



Parameter	Value	Parameter	Value
Input layer nodes	6	Stacked network layers	2
Hidden layer nodes	1	Neuron dropout	0
Use bias	True	Bidirectional network	True
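
The LSTM settings in the table above can be read as a small configuration object. The sketch below is illustrative only: the field names follow the argument names of `torch.nn.LSTM` (an assumption, since the source does not name a framework), and the parameter count assumes the common layout of four gates with two bias vectors per layer and direction.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LSTMConfig:
    """Values from the table; field names are assumed, not taken from the source."""
    input_size: int = 6         # input layer nodes
    hidden_size: int = 1        # hidden layer nodes
    num_layers: int = 2         # stacked network layers
    dropout: float = 0.0        # neuron dropout
    bias: bool = True           # use bias
    bidirectional: bool = True  # bidirectional network


def output_size(cfg: LSTMConfig) -> int:
    """Feature width per time step; both directions are concatenated."""
    return cfg.hidden_size * (2 if cfg.bidirectional else 1)


def parameter_count(cfg: LSTMConfig) -> int:
    """Trainable parameters under the assumed 4-gate, two-bias-vector layout."""
    directions = 2 if cfg.bidirectional else 1
    total = 0
    for layer in range(cfg.num_layers):
        # Layers after the first receive the concatenated output of both directions.
        in_features = cfg.input_size if layer == 0 else cfg.hidden_size * directions
        weights = 4 * cfg.hidden_size * (in_features + cfg.hidden_size)
        biases = 8 * cfg.hidden_size if cfg.bias else 0
        total += directions * (weights + biases)
    return total


cfg = LSTMConfig()
print(output_size(cfg), parameter_count(cfg))
```

With the tabulated values, the sketch reports an output width of 2 and 112 trainable parameters for the recurrent part alone; any policy or value heads placed on top are not described in the table and are not counted.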

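Parameter	Value	Parameter	Value
Learning rate	0.003	GAE lambda	0.95
Steps per update	2048	Clipping parameter	0.2
Batch size	128	Entropy coefficient of the loss	0
Optimization epochs per update	10	Value-function coefficient of the loss	0.5
Discount factor	0.99	Maximum gradient-clipping norm	0.5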
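
To show how the hyperparameters in the table above enter a PPO update, the following minimal sketch implements generalized advantage estimation and the clipped surrogate objective in plain Python. It is not the authors' implementation: the function names are hypothetical, the rollout is assumed to contain no episode boundary, and the step count assumes a single environment.

```python
GAMMA = 0.99        # discount factor (table)
GAE_LAMBDA = 0.95   # GAE lambda (table)
CLIP_RANGE = 0.2    # clipping parameter (table)
N_STEPS = 2048      # steps collected per update (table)
BATCH_SIZE = 128    # minibatch size (table)
N_EPOCHS = 10       # optimization epochs per update (table)


def gae_advantages(rewards, values, last_value, gamma=GAMMA, lam=GAE_LAMBDA):
    """Generalized advantage estimation, computed backwards over one rollout."""
    advantages = [0.0] * len(rewards)
    next_value, running = last_value, 0.0
    for t in reversed(range(len(rewards))):
        delta = rewards[t] + gamma * next_value - values[t]  # TD residual
        running = delta + gamma * lam * running
        advantages[t] = running
        next_value = values[t]
    return advantages


def clipped_surrogate(ratio, advantage, clip=CLIP_RANGE):
    """Per-sample PPO objective: min(r * A, clip(r, 1 - clip, 1 + clip) * A)."""
    clipped_ratio = max(1.0 - clip, min(1.0 + clip, ratio))
    return min(ratio * advantage, clipped_ratio * advantage)


def gradient_steps_per_update(n_steps=N_STEPS, batch_size=BATCH_SIZE, n_epochs=N_EPOCHS):
    """Minibatch gradient steps per policy update (single environment assumed)."""
    return (n_steps // batch_size) * n_epochs


print(gae_advantages([1.0, 1.0], [0.0, 0.0], 0.0))
print(clipped_surrogate(1.5, 1.0), gradient_steps_per_update())
```

Under these assumptions each update splits the 2048 collected steps into 16 minibatches of 128 and passes over them 10 times, giving 160 gradient steps. The clipping stops a sample with positive advantage from contributing once the probability ratio exceeds 1.2.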

Parameter	Dispersion range
Initial flight-path angle/(°)	±3
Engine thrust/N	±20%
Lift and drag/N	±20%
Atmospheric density/(kg·m^-3)	±20%
Vehicle mass/kg	±20%
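
The dispersion ranges above could drive a Monte Carlo study along the following lines. This is a sketch under stated assumptions: the source gives only the ranges, so uniform sampling, a single shared scale factor for lift and drag, and all names used here are hypothetical choices made for illustration.

```python
import math
import random

# Half-ranges from the table: an absolute offset for the angle, relative for the rest.
DISPERSIONS = {
    "initial_flight_path_angle_deg": 3.0,  # +/- 3 degrees
    "thrust": 0.20,                        # +/- 20 %
    "lift_drag": 0.20,                     # +/- 20 %, one shared factor (assumption)
    "air_density": 0.20,                   # +/- 20 %
    "mass": 0.20,                          # +/- 20 %
}


def sample_case(rng):
    """Draw one dispersed case; the uniform distribution is an assumption."""
    case = {}
    for name, half_range in DISPERSIONS.items():
        draw = rng.uniform(-half_range, half_range)
        # The angle is an additive offset in degrees; the others are scale factors.
        case[name] = draw if name.endswith("_deg") else 1.0 + draw
    return case


def rmse(errors):
    """Root-mean-square error of a list of terminal-state errors."""
    return math.sqrt(sum(e * e for e in errors) / len(errors))


rng = random.Random(0)  # fixed seed so the run is repeatable
cases = [sample_case(rng) for _ in range(1000)]
print(cases[0])
```

Each sampled case would be passed to the trajectory simulation, and `rmse` would then be applied to the terminal-state errors collected over all runs, which is the metric the abstract uses to compare the algorithms.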