Journal of Jilin University (Engineering and Technology Edition), 2024, Vol. 54, Issue 3: 797-806. doi: 10.13229/j.cnki.jdxbgxb.20220523


LSTM-MADDPG multi-agent cooperative decision algorithm based on asynchronous collaborative update

Jing-peng GAO1, Guo-xuan WANG1, Lu GAO2

  1. College of Information and Communication Engineering, Harbin Engineering University, Harbin 150001, China
  2. National Key Laboratory of Science and Technology on Test Physics and Numerical Mathematics, Beijing Institute of Space Long March Vehicle, Beijing 100076, China
  Received: 2022-05-06   Online: 2024-03-01   Published: 2024-04-18

Abstract:

In fully cooperative tasks, the MADDPG algorithm suffers from the credit assignment problem and poor training stability. To address these problems, an LSTM-MADDPG multi-agent cooperative decision algorithm based on asynchronous collaborative update was proposed. Following the ideas of difference rewards and value decomposition, an LSTM was used to extract features from trajectory sequences, and the partition of the global reward was optimized to realize reward distribution among the agents. To meet the requirements of joint training, a high-quality training set was constructed. An asynchronous cooperative update method was then designed to jointly train the LSTM-MADDPG network and achieve multi-agent cooperation. In the cooperative capture scenario, simulation results show that the convergence speed of the proposed algorithm is 20.51% higher than that of QMIX. After training converges, the asynchronous cooperative update reduces the mean square error of the normalized reward by 57.59% compared with synchronous update, improving the stability of convergence.

Key words: artificial intelligence, multi-agent cooperative decision making, deep reinforcement learning, credit assignment, asynchronous cooperative update

CLC Number: TP18
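
The abstract describes the core mechanism of the proposed method: an LSTM extracts features from each agent's trajectory sequence and those features are used to split the shared global reward into individual rewards for the MADDPG critics. The paper's code is not included here, so the following is only a minimal PyTorch-style sketch of that idea; the module name LSTMRewardPartition, the softmax-based weighting, and all dimensions are illustrative assumptions rather than the authors' implementation.

```python
# Minimal sketch (assumption, not the authors' code): an LSTM reads each agent's
# recent observation-action trajectory and produces per-agent weights that split
# the shared global reward into individual rewards for the MADDPG critics.
import torch
import torch.nn as nn

class LSTMRewardPartition(nn.Module):
    def __init__(self, traj_dim, hidden_dim, n_agents):
        super().__init__()
        self.lstm = nn.LSTM(traj_dim, hidden_dim, batch_first=True)
        self.score = nn.Linear(hidden_dim, 1)   # one scalar score per agent
        self.n_agents = n_agents

    def forward(self, trajectories, global_reward):
        # trajectories: (batch, n_agents, seq_len, traj_dim)
        # global_reward: (batch, 1)
        b, n, t, d = trajectories.shape
        _, (h, _) = self.lstm(trajectories.reshape(b * n, t, d))
        scores = self.score(h[-1]).reshape(b, n)        # per-agent trajectory scores
        weights = torch.softmax(scores, dim=-1)         # credit-assignment weights
        return weights * global_reward                  # per-agent rewards, summing to the global reward
```

With this kind of decomposition, each agent's critic can be trained on its own partitioned reward instead of the undifferentiated team reward, which is the credit-assignment problem the abstract targets.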

Fig.1  Multi-agent state transition diagram

Fig.2  MADDPG centralized training structure

Fig.3  Global reward partition structure based on LSTM

Fig.4  Update mechanism of asynchronous cooperation
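Fig.4 depicts the asynchronous cooperative update used to jointly train the LSTM-MADDPG network. The exact schedule is not given in the material reproduced here, so the sketch below is only one plausible reading, stated as an assumption: the MADDPG actor-critic networks are updated every training step while the LSTM reward-partition network is refreshed less frequently on its own batch. All object names, methods, and the update period are hypothetical.

```python
# Hypothetical sketch (assumed interpretation of Fig.4, not the authors' procedure):
# MADDPG networks update every step; the LSTM reward-partition network updates
# only every `lstm_period` steps on a smaller trajectory batch.
def train_asynchronously(maddpg, reward_partition, replay_buffer, lstm_buffer,
                         steps, batch_size_1=1024, batch_size_2=32, lstm_period=10):
    for step in range(steps):
        if len(replay_buffer) >= batch_size_1:
            batch = replay_buffer.sample(batch_size_1)
            maddpg.update(batch)                      # actor/critic update each step
        if step % lstm_period == 0 and len(lstm_buffer) >= batch_size_2:
            traj_batch = lstm_buffer.sample(batch_size_2)
            reward_partition.update(traj_batch)       # less frequent reward-partition update
```

Decoupling the two update frequencies in this way is one common route to the stability gain the abstract reports for asynchronous versus synchronous update.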

Fig.5  Multi-agent cooperative capture scene

Table 1  Parameter settings of the cooperative capture scene

Parameter            Value     Parameter                          Value
Scene range          2×2       Number of targets                  5
Number of agents     5         Target radius                      0.02
Agent radius         0.06      Simulation time step interval      0.1
Velocity range       [0, 1]    Number of simulation time steps    30

Table 2  Collaborative capture reward settings

Agent action                      Evaluation        Reward value
Approaching/leaving the target    Reward/penalty    rg
Capturing the target              Reward            10
Collision                         Penalty           -5
Crossing the boundary             Penalty           -5
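The reward settings in Table 2 can be read as a per-step shaping term plus sparse bonuses and penalties. The sketch below encodes that reading; the function name, the coefficient k, and the exact form of the distance-based term rg (here, the change in distance to the target) are illustrative assumptions, since the table only states that approaching or leaving the target yields a reward or penalty rg.

```python
# Illustrative sketch (assumption) of the per-step reward implied by Table 2.
def step_reward(dist_to_target, prev_dist_to_target,
                captured, collided, out_of_bounds, k=1.0):
    """Return the reward for one agent at one simulation step."""
    # Distance-shaping term rg: positive when the agent moves toward the target,
    # negative when it moves away (assumed form).
    reward = k * (prev_dist_to_target - dist_to_target)
    if captured:          # target captured
        reward += 10.0
    if collided:          # collision with another agent
        reward -= 5.0
    if out_of_bounds:     # crossed the scene boundary
        reward -= 5.0
    return reward
```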

Table 3  Training parameter settings

Parameter    Value     Parameter        Value
Episode      60 000    Batch_size 1     1024
Step         30        Batch_size 2     32
D1           25 600    γ                0.95
D2           128       τ                0.000 1
Lr           0.01      N                5
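For reference, the training parameters of Table 3 can be collected in a single configuration block. The parenthetical interpretations below are assumptions: the table itself does not state what D1, D2, and N denote, nor which network each batch size belongs to.

```python
# Training hyperparameters from Table 3, gathered into one config dict.
# Comments marked "assumed" are interpretations, not stated in the table.
TRAIN_CONFIG = {
    "episodes": 60_000,        # total training episodes
    "steps_per_episode": 30,   # simulation steps per episode
    "D1": 25_600,              # parameter D1 (role assumed, e.g. a buffer/threshold size)
    "D2": 128,                 # parameter D2 (role assumed)
    "lr": 0.01,                # learning rate
    "batch_size_1": 1024,      # first batch size (assumed: MADDPG update)
    "batch_size_2": 32,        # second batch size (assumed: LSTM update)
    "gamma": 0.95,             # discount factor
    "tau": 0.0001,             # soft target-update rate
    "N": 5,                    # N (assumed: number of agents, matching Table 1)
}
```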

Fig.6  Training normalized reward curves of the three algorithms

Table 4  Comparison of training time for the three algorithms

Algorithm                  Average training time per 100 episodes/s    Total training time/h
MADDPG                     13.79                                       2.30
QMIX                       15.63                                       2.61
Synchronous LSTM-MADDPG    18.59                                       3.10

Fig.7  Training normalized reward curves of the two update mechanisms

Table 5  Training time of the two update mechanisms

Algorithm                   Average training time per 100 episodes/s    Total training time/h
Asynchronous LSTM-MADDPG    16.48                                       2.75
Synchronous LSTM-MADDPG     18.59                                       3.10

Table 6  Test results of the three algorithms in the cooperative capture scene

Algorithm    Average number of collisions    Average distance to target (scene units)
Proposed     1.015                           0.048
QMIX         1.142                           0.094
MADDPG       1.223                           0.125