Journal of Jilin University (Information Science Edition) ›› 2023, Vol. 41 ›› Issue (3): 437-443.


Proximal Policy Optimization Algorithm Based on Correntropy Induced Metric

ZHANG Huizhen, WANG Qiang   

  1. School of Electrical and Information Engineering, Northeast Petroleum University, Daqing 163318, China
  Received: 2022-05-14  Online: 2023-06-08  Published: 2023-06-14

Abstract: In deep reinforcement learning, PPO (Proximal Policy Optimization) performs very well on many experimental tasks. However, the adaptive KL (Kullback-Leibler) divergence penalty used in KL-PPO is asymmetric, which degrades the efficiency of the KL-PPO policy update. To remove the negative impact of this asymmetry, a proximal policy optimization algorithm based on the CIM (Correntropy Induced Metric) is proposed: the symmetric CIM is used to characterize the difference between the old and new policies, so the policy can be updated more accurately. Experiments on OpenAI Gym show that, compared with the mainstream proximal policy optimization algorithms CLIP-PPO and KL-PPO, the proposed algorithm obtains more than 50% higher reward and converges about 500 to 1 100 episodes faster in different environments, while also exhibiting good robustness.
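To make the idea concrete, the sketch below shows a penalty-form PPO surrogate in which the adaptive KL term is replaced by a CIM distance between the old and new action distributions, following the standard definition CIM(x, y) = [kappa(0) - E kappa(x - y)]^{1/2} with a Gaussian kernel. This is a minimal illustration, not the authors' implementation; the kernel width sigma, penalty coefficient beta, and all function names are illustrative assumptions.

```python
# Minimal sketch (not the paper's code): PPO surrogate with a CIM penalty
# replacing the adaptive KL term. sigma and beta are assumed hyperparameters.
import torch


def cim_distance(p_old: torch.Tensor, p_new: torch.Tensor, sigma: float = 1.0) -> torch.Tensor:
    """Correntropy induced metric between two probability vectors.

    CIM(x, y) = sqrt(kappa(0) - mean_i kappa(x_i - y_i)) with a Gaussian
    kernel kappa(e) = exp(-e^2 / (2 sigma^2)); symmetric, unlike KL divergence.
    """
    kappa0 = 1.0  # Gaussian kernel evaluated at zero
    kappa = torch.exp(-(p_old - p_new) ** 2 / (2.0 * sigma ** 2))
    return torch.sqrt(torch.clamp(kappa0 - kappa.mean(dim=-1), min=0.0))


def cim_ppo_surrogate(log_prob_new, log_prob_old, advantage,
                      probs_new, probs_old, beta: float = 1.0, sigma: float = 1.0):
    """Penalty-form PPO objective with CIM in place of the adaptive KL term."""
    ratio = torch.exp(log_prob_new - log_prob_old)        # importance ratio pi_new / pi_old
    penalty = cim_distance(probs_old, probs_new, sigma)   # symmetric policy distance
    return (ratio * advantage - beta * penalty).mean()


if __name__ == "__main__":
    # Toy check: 4-action discrete policy over a batch of 3 states.
    probs_old = torch.softmax(torch.randn(3, 4), dim=-1)
    probs_new = torch.softmax(torch.randn(3, 4), dim=-1)
    actions = torch.randint(0, 4, (3,))
    log_old = torch.log(probs_old.gather(1, actions[:, None]).squeeze(1))
    log_new = torch.log(probs_new.gather(1, actions[:, None]).squeeze(1))
    adv = torch.randn(3)
    print(cim_ppo_surrogate(log_new, log_old, adv, probs_new, probs_old))
```

Because the Gaussian-kernel CIM is bounded and symmetric in its arguments, the penalty treats moving the new policy toward or away from the old policy identically, which is the asymmetry problem of KL that the abstract describes.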

Key words: Kullback-Leibler (KL) divergence; proximal policy optimization (PPO); correntropy induced metric (CIM); alternative target; deep reinforcement learning

CLC Number: TP273