Journal of Jilin University (Information Science Edition) ›› 2023, Vol. 41 ›› Issue (3): 437-443.


Proximal Policy Optimization Algorithm Based on Correntropy Induced Metric

ZHANG Huizhen, WANG Qiang   

  1. School of Electrical and Information Engineering, Northeast Petroleum University, Daqing 163318, China
  Received: 2022-05-14  Online: 2023-06-08  Published: 2023-06-14

Abstract: In deep reinforcement learning, PPO (Proximal Policy Optimization) performs very well on many experimental tasks. However, the adaptive KL (Kullback-Leibler) divergence penalty used in KL-PPO is asymmetric, which degrades the efficiency of the KL-PPO policy update. To remove the negative impact of this asymmetry, a proximal policy optimization algorithm based on the CIM (Correntropy Induced Metric) is proposed: the symmetric CIM is used to characterize the difference between the old and new policies, so the policy can be updated more accurately. Experiments on OpenAI Gym show that, compared with the mainstream proximal policy optimization algorithms CLIP-PPO and KL-PPO, the proposed algorithm obtains more than 50% higher reward and converges about 500 to 1 100 episodes faster in different environments, while also exhibiting good robustness.
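To make the idea concrete, the sketch below shows a penalty-form PPO surrogate in which the adaptive KL term is replaced by a CIM distance between the old and new action distributions, following the standard definition CIM(x, y) = [kappa(0) - E kappa(x - y)]^{1/2} with a Gaussian kernel. This is a minimal illustration, not the authors' implementation; the kernel width sigma, penalty coefficient beta, and all function names are illustrative assumptions.

```python
# Minimal sketch (not the paper's code): PPO surrogate with a CIM penalty
# replacing the adaptive KL term. sigma and beta are assumed hyperparameters.
import torch


def cim_distance(p_old: torch.Tensor, p_new: torch.Tensor, sigma: float = 1.0) -> torch.Tensor:
    """Correntropy induced metric between two probability vectors.

    CIM(x, y) = sqrt(kappa(0) - mean_i kappa(x_i - y_i)) with a Gaussian
    kernel kappa(e) = exp(-e^2 / (2 sigma^2)); symmetric, unlike KL divergence.
    """
    kappa0 = 1.0  # Gaussian kernel evaluated at zero
    kappa = torch.exp(-(p_old - p_new) ** 2 / (2.0 * sigma ** 2))
    return torch.sqrt(torch.clamp(kappa0 - kappa.mean(dim=-1), min=0.0))


def cim_ppo_surrogate(log_prob_new, log_prob_old, advantage,
                      probs_new, probs_old, beta: float = 1.0, sigma: float = 1.0):
    """Penalty-form PPO objective with CIM in place of the adaptive KL term."""
    ratio = torch.exp(log_prob_new - log_prob_old)        # importance ratio pi_new / pi_old
    penalty = cim_distance(probs_old, probs_new, sigma)   # symmetric policy distance
    return (ratio * advantage - beta * penalty).mean()


if __name__ == "__main__":
    # Toy check: 4-action discrete policy over a batch of 3 states.
    probs_old = torch.softmax(torch.randn(3, 4), dim=-1)
    probs_new = torch.softmax(torch.randn(3, 4), dim=-1)
    actions = torch.randint(0, 4, (3,))
    log_old = torch.log(probs_old.gather(1, actions[:, None]).squeeze(1))
    log_new = torch.log(probs_new.gather(1, actions[:, None]).squeeze(1))
    adv = torch.randn(3)
    print(cim_ppo_surrogate(log_new, log_old, adv, probs_new, probs_old))
```

Because the Gaussian-kernel CIM is bounded and symmetric in its arguments, the penalty treats moving the new policy toward or away from the old policy identically, which is the asymmetry problem of KL that the abstract describes.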

Key words: Kullback-Leibler (KL) divergence; proximal policy optimization (PPO); correntropy induced metric (CIM); alternative target; deep reinforcement learning

CLC Number: TP273