Journal of Jilin University (Information Science Edition), 2021, Vol. 39, Issue 2: 192-199.

Soft Actor Critic Reinforcement Learning with Prioritized Experience Replay

  

  1. School of Electrical Engineering and Information, Northeast Petroleum University, Daqing 163318, China
Received: 2020-06-27; Online: 2021-04-19; Published: 2021-04-27

Abstract: SAC (Soft Actor Critic) offers strong robustness, strong exploration ability, and good agent generalization, but it suffers from slow training and an unstable training process. To address this problem, we propose the PER-SAC algorithm, which introduces PER (Prioritized Experience Replay) into the SAC algorithm. The network thus trains preferentially on samples with large value-estimation error and poor policy performance, which improves the stability and convergence speed of the agent's training process. Experimental results show that the PER-SAC algorithm outperforms the SAC algorithm in both training speed and stability.
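
The abstract does not specify the exact priority definition used in PER-SAC. As a rough illustration only, a minimal proportional prioritized replay buffer in the style of Schaul et al. (2016), with TD-error-based priorities and importance-sampling weights for the critic update, might look like the following sketch; all names and hyperparameter values here are illustrative assumptions, not the paper's implementation.

import numpy as np

class PrioritizedReplayBuffer:
    """Proportional prioritized experience replay (illustrative sketch).

    Transitions are sampled with probability p_i^alpha / sum_k p_k^alpha,
    where p_i is a TD-error-based priority; importance-sampling weights
    correct the sampling bias in the critic loss.
    """

    def __init__(self, capacity, alpha=0.6, beta=0.4, eps=1e-6):
        self.capacity = capacity
        self.alpha = alpha      # how strongly priorities skew sampling
        self.beta = beta        # importance-sampling correction strength
        self.eps = eps          # keeps every priority strictly positive
        self.buffer = []
        self.priorities = np.zeros(capacity, dtype=np.float64)
        self.pos = 0

    def add(self, transition):
        # New transitions get the current max priority so each is
        # guaranteed to be sampled at least once.
        max_prio = self.priorities.max() if self.buffer else 1.0
        if len(self.buffer) < self.capacity:
            self.buffer.append(transition)
        else:
            self.buffer[self.pos] = transition
        self.priorities[self.pos] = max_prio
        self.pos = (self.pos + 1) % self.capacity

    def sample(self, batch_size):
        prios = self.priorities[:len(self.buffer)]
        probs = prios ** self.alpha
        probs /= probs.sum()
        idxs = np.random.choice(len(self.buffer), batch_size, p=probs)
        # Importance-sampling weights, normalized by the max weight,
        # multiply the per-sample critic loss.
        weights = (len(self.buffer) * probs[idxs]) ** (-self.beta)
        weights /= weights.max()
        batch = [self.buffer[i] for i in idxs]
        return batch, idxs, weights

    def update_priorities(self, idxs, td_errors):
        # Priority is proportional to the magnitude of the critic's TD error.
        self.priorities[idxs] = np.abs(td_errors) + self.eps

In a SAC training loop, the agent would call sample() for each critic update, scale the squared Bellman errors by the returned weights, and then call update_priorities() with the new TD errors of the sampled batch.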

Key words: deep reinforcement learning, Actor-Critic, maximum entropy, prioritized experience replay

CLC Number: TP273