Journal of Jilin University (Information Science Edition), 2024, Vol. 42, Issue (2): 318-325.

Novel Reinforcement Learning Algorithm: Stable Constrained Soft Actor Critic

HAI Ri¹, ZHANG Xingliang², JIANG Yuan¹, YANG Yongjian¹

  1. College of Computer Science and Technology, Jilin University, Changchun 130012, China; 2. China Mobile Jilin Company Limited, China Mobile Communications Group Company Limited, Changchun 130022, China
  • Received: 2023-02-13 Online: 2024-04-10 Published: 2024-04-12

Abstract: Overestimation of the Q function can trap the SAC (Soft Actor Critic) algorithm in a locally optimal solution. To solve this problem, the SCSAC (Stable Constrained Soft Actor Critic) algorithm is proposed. It resolves this weakness, which is hidden in the maximum entropy objective function, and improves the stability of the algorithm during training. Evaluation on the OpenAI Gym MuJoCo suite of environments shows that, compared with SAC, SCSAC exhibits less Q value overestimation and more stable results during training.
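
To make the Q value overestimation named in the abstract concrete, the following is a minimal toy sketch, not the SCSAC method (which the abstract does not specify). It assumes a hypothetical discrete setting in which every action has a true Q value of 0 and each critic's estimate carries independent Gaussian noise. The function name `overestimation_demo` and all of its parameters are illustrative assumptions. It compares the target obtained from one noisy critic with the clipped double-Q target (the minimum of two critics), which standard SAC uses to reduce this bias.

```python
import random
import statistics


def overestimation_demo(n_actions=10, n_trials=5000, noise_std=1.0, seed=0):
    """Estimate the bias of max-based targets when every true Q value is 0.

    Hypothetical toy setup: all names and parameters are illustrative.
    Returns (mean single-critic target, mean clipped double-Q target).
    """
    rng = random.Random(seed)
    single, clipped = [], []
    for _ in range(n_trials):
        # Two independent noisy critics; the true Q value of each action is 0.
        q1 = [rng.gauss(0.0, noise_std) for _ in range(n_actions)]
        q2 = [rng.gauss(0.0, noise_std) for _ in range(n_actions)]
        # Target from one critic: the max over noisy estimates is biased upward.
        single.append(max(q1))
        # Clipped double-Q target: per-action minimum of the two critics, then max.
        clipped.append(max(min(a, b) for a, b in zip(q1, q2)))
    return statistics.mean(single), statistics.mean(clipped)


single_bias, clipped_bias = overestimation_demo()
print(f"single critic bias:    {single_bias:.3f}")
print(f"clipped double-Q bias: {clipped_bias:.3f}")
```

Because the true value is 0 everywhere, any positive mean is pure overestimation. The single-critic target is clearly positive, and the clipped target is never larger than it in any trial, so its mean bias is smaller.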

Key words: reinforcement learning, maximum entropy reinforcement learning, Q value overestimation, soft actor critic (SAC) algorithm

CLC Number: TP301