TD3 Algorithm with Dynamic Delayed Policy Update

Abstract

In deep reinforcement learning, to further reduce the impact of value overestimation on policy
estimation in TD3 (Twin Delayed Deep Deterministic Policy Gradients) and to speed up model
learning, DD-TD3 (Twin Delayed Deep Deterministic Policy Gradients with Dynamic Delayed Policy
Update) is proposed. The delayed update step of the actor network is guided by the dynamic
difference between the latest loss of the critic network and its exponentially weighted moving
average. Experimental results show that, whereas the original TD3 algorithm obtains a high reward
value within 2 000 steps, DD-TD3 learns the optimal control strategy in about 1 000 steps and
obtains a higher reward value, which improves the efficiency of finding the optimal strategy.
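
The delay rule described in the abstract can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the abstract does not give the exact mapping from the loss difference to the delay, so the class name `DynamicDelay`, the smoothing factor `beta`, the delay bounds, and the threshold rule (delay the actor longer when the latest critic loss exceeds its moving average) are all hypothetical.

```python
class DynamicDelay:
    """Hypothetical helper: picks the actor update delay from the critic loss trend."""

    def __init__(self, beta=0.9, min_delay=1, base_delay=2, max_delay=4):
        self.beta = beta              # EWMA smoothing factor (assumed value)
        self.min_delay = min_delay    # delay used when the critic loss is falling
        self.base_delay = base_delay  # TD3's usual fixed delay, used before any history exists
        self.max_delay = max_delay    # delay used when the critic loss is rising
        self.ewma = None              # exponentially weighted moving average of the critic loss

    def update(self, critic_loss):
        """Record the latest critic loss and return the actor update delay."""
        if self.ewma is None:
            # First observation: no trend yet, fall back to the fixed delay.
            self.ewma = critic_loss
            return self.base_delay
        # Compare the latest loss with the average of the losses seen so far.
        diff = critic_loss - self.ewma
        self.ewma = self.beta * self.ewma + (1.0 - self.beta) * critic_loss
        # Assumed rule: a loss above its average suggests an unreliable critic,
        # so the actor waits longer; otherwise the actor updates more often.
        return self.max_delay if diff > 0 else self.min_delay


if __name__ == "__main__":
    scheduler = DynamicDelay()
    for loss in [1.0, 2.0, 0.5]:
        print(loss, scheduler.update(loss))
```

In a training loop, the returned delay would replace TD3's fixed policy update interval: the actor and target networks are updated only once every `delay` critic updates.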

Key words: deep reinforcement learning, twin delayed deep deterministic policy gradients (TD3), dynamic delayed policy update

CLC Number: TP273

KANG Chaohai, SUN Chao, RONG Chuiting, LIU Pengyun. TD3 Algorithm with Dynamic Delayed Policy Update [J]. Journal of Jilin University (Information Science Edition), 2020, 38(4): 474-481.

