A Novel Stable and Constrained SAC Algorithm for Reinforcement Learning

Journal of Jilin University (Information Science Edition), 2024, Vol. 42, Issue (2): 318-325.


Novel Reinforcement Learning Algorithm: Stable Constrained Soft Actor Critic

HAI Ri¹, ZHANG Xingliang², JIANG Yuan¹, YANG Yongjian¹

1. College of Computer Science and Technology, Jilin University, Changchun 130012, China; 2. China Mobile Jilin Company Limited, China Mobile Communications Group Company Limited, Changchun 130022, China

Received: 2023-02-13; Online: 2024-04-10; Published: 2024-04-12

Abstract

To address the problem that Q-function overestimation may trap the SAC (Soft Actor Critic) algorithm in a locally optimal solution, the SCSAC (Stable Constrained Soft Actor Critic) algorithm is proposed. It resolves this weakness, which is hidden in the maximum entropy objective function, and improves the stability of the algorithm during training. Evaluation on the OpenAI Gym MuJoCo suite of environments shows that SCSAC exhibits less Q-value overestimation and more stable training results than SAC.
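To make the Q-value overestimation problem concrete, the following minimal sketch simulates noisy value estimates for actions whose true value is zero. It is not the SCSAC method, which the abstract does not detail. It only illustrates why a maximum over noisy estimates is biased upward, and how taking the minimum of two critics (the clipped double-Q trick commonly used in SAC implementations) reduces that bias. The function name, parameters, and Gaussian noise model are assumptions made for illustration.

```python
import numpy as np

def overestimation_demo(n_actions=10, noise_std=1.0, n_trials=20000, seed=0):
    """Hypothetical helper: estimate the bias of value targets when every true Q-value is 0."""
    rng = np.random.default_rng(seed)
    # Two independent noisy critics (assumed Gaussian noise); the true Q-value is 0 everywhere.
    q1 = rng.normal(0.0, noise_std, size=(n_trials, n_actions))
    q2 = rng.normal(0.0, noise_std, size=(n_trials, n_actions))
    # Single critic: the max over noisy estimates is biased upward.
    single_bias = q1.max(axis=1).mean()
    # Clipped double-Q: choose the greedy action with critic 1,
    # then take the minimum of both critics at that action.
    greedy = q1.argmax(axis=1)
    rows = np.arange(n_trials)
    clipped_bias = np.minimum(q1[rows, greedy], q2[rows, greedy]).mean()
    return single_bias, clipped_bias

if __name__ == "__main__":
    single, clipped = overestimation_demo()
    print(f"single-critic bias: {single:.3f}, clipped double-Q bias: {clipped:.3f}")
```

Because the true values are all zero, any nonzero average returned here is pure estimation bias. The single-critic figure is clearly positive, while the clipped estimate is much closer to zero.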

Key words: reinforcement learning, maximum entropy reinforcement learning, Q-value overestimation, soft actor critic (SAC) algorithm

CLC Number: TP301

HAI Ri, ZHANG Xingliang, JIANG Yuan, YANG Yongjian. Novel Reinforcement Learning Algorithm: Stable Constrained Soft Actor Critic[J]. Journal of Jilin University (Information Science Edition), 2024, 42(2): 318-325.

