Journal of Jilin University (Information Science Edition) ›› 2020, Vol. 38 ›› Issue (4): 474-481.

TD3 Algorithm with Dynamic Delayed Policy Update

KANG Chaohai, SUN Chao, RONG Chuiting, LIU Pengyun

  1. School of Electrical Engineering and Information, Northeast Petroleum University, Daqing 163318, China
  • Received: 2020-01-17 Online: 2020-07-24 Published: 2020-08-13

Abstract: In deep reinforcement learning, to further reduce the impact of value overestimation
on policy estimation in TD3 (Twin Delayed Deep Deterministic Policy Gradients) and to improve
the learning efficiency of the model, DD-TD3 (Twin Delayed Deep Deterministic Policy Gradients with
Dynamic Delayed Policy Update) is proposed. The delayed update interval of the actor network is
adjusted dynamically according to the difference between the latest loss of the critic network and its
exponentially weighted moving average. Experimental results show that, whereas the original TD3
algorithm obtains a high reward value within 2 000 steps, the DD-TD3 method learns the optimal control
strategy in about 1 000 steps and obtains a higher reward value, thereby improving the efficiency of
finding the optimal strategy.
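
The update rule described in the abstract can be illustrated with a minimal sketch. The abstract does not state how the loss difference is mapped to a delay, so everything below is an assumption made for illustration: the class name `DynamicDelay`, the smoothing factor `beta`, the bounds `min_delay` and `max_delay`, and the rule itself (lengthen the delay by one step when the latest critic loss is above its moving average, shorten it by one step when it is below). It should not be read as the authors' exact method.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class DynamicDelay:
    """Hypothetical scheduler for the actor's delayed-update interval."""

    beta: float = 0.9       # EWMA smoothing factor (assumed value)
    min_delay: int = 1      # assumed lower bound on the delay
    max_delay: int = 4      # assumed upper bound on the delay
    delay: int = 2          # starting delay (a fixed delay of 2 is common in TD3)
    ewma: Optional[float] = None

    def update(self, critic_loss: float) -> int:
        """Feed the latest critic loss and return the delay to use next."""
        if self.ewma is None:
            # First observation: initialise the average, keep the delay.
            self.ewma = critic_loss
            return self.delay

        # Dynamic difference between the latest loss and its moving average.
        diff = critic_loss - self.ewma

        # Assumed rule: a loss above its average suggests a less reliable
        # critic, so the actor waits longer; a loss below it lets the actor
        # update sooner. The delay is clamped to [min_delay, max_delay].
        if diff > 0:
            self.delay = min(self.delay + 1, self.max_delay)
        elif diff < 0:
            self.delay = max(self.delay - 1, self.min_delay)

        # Update the exponentially weighted moving average afterwards.
        self.ewma = self.beta * self.ewma + (1.0 - self.beta) * critic_loss
        return self.delay


if __name__ == "__main__":
    scheduler = DynamicDelay()
    for loss in [1.0, 2.0, 0.5, 0.5, 0.5, 5.0]:
        print(loss, scheduler.update(loss))
```

In a training loop, such a scheduler would be called once per critic update, and the actor and target networks would be updated only when the number of critic updates since the last actor update reaches the returned delay.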

Key words: deep reinforcement learning, twin delayed deep deterministic policy gradients (TD3), dynamic delayed policy update

CLC Number: 

  • TP273