吉林大学学报(理学版) ›› 2025, Vol. 63 ›› Issue (3): 885-0894.

• • 上一篇    下一篇

基于渐近式k-means聚类的多行动者确定性策略梯度算法

刘全1,2, 刘晓松2, 吴光军2, 刘禹含3   

  1. 1. 喀什大学 计算机科学与技术学院, 新疆 喀什 844000; 2. 苏州大学 计算机科学与技术学院, 江苏 苏州 215008; 3. 西交利物浦大学 未来教育学院, 江苏 苏州 215000
  • 收稿日期:2024-01-25 出版日期:2025-05-26 发布日期:2025-05-26
  • 通讯作者: 刘全 E-mail:quanliu@suda.edu.cn

Multi-actor Deterministic Policy Gradient Algorithm Based on Progressive k-Means Clustering

LIU Quan1,2, LIU Xiaosong2, WU Guangjun2, LIU Yuhan3   

  1. 1. School of Computer Science and Technology, Kashi University, Kashi 844000, Xinjiang Uygur Autonomous Region, China;2. School of Computer Science and Technology, Soochow University, Suzhou 215008, Jiangsu Province, China;3. Academy of Future Education, Xi’an Jiaotong-Liverpool University, Suzhou 215000, Jiangsu Province, China
  • Received:2024-01-25 Online:2025-05-26 Published:2025-05-26

摘要: 针对深度确定性策略梯度(deep deterministic policy gradient, DDPG)算法在一些大状态空间任务中存在学习效果不佳及波动较大等问题, 提出一种基于渐近式k-means聚类算法的多行动者深度确定性策略梯度(multi-actor deep deterministic policy gradient based on progressive k-means clustering, MDDPG-PK-Means)算法. 在训练过程中, 对每一时间步下的状态进行动作选择时, 根据k-means算法判别结果辅佐行动者网络的决策, 同时随训练时间步的增加, 逐渐增加k-means算法类簇中心的个数. 将MDDPG-PK-Means算法应用于MuJoCo仿真平台上, 实验结果表明, 与DDPG等算法相比, MDDPG-PK-Means算法在大多数连续任务中都具有更好的效果.

关键词: 深度强化学习, 确定性策略梯度算法, k-means聚类, 多行动者

Abstract: Aiming at the problems of poor learning performance and high fluctuation in the deep deterministic policy gradient (DDPG) algorithm for tasks with some large state spaces, we proposed a multi-actor deep deterministic policy gradient algorithm based on progressive k-means clustering (MDDPG-PK-Means) algorithm. In the training process, when selecting actions for the state at each time step, the decision-making of the actor network was  assisted based on the discrimination results of  the k-means clustering algorithm. At the same time, as the training steps increased, the number of k-means cluster centers gradually increased. The MDDPG-PK-Means algorithm was applied to the MuJoCo simulation platform, the experimental results show that, compared with 
 DDPG and other algorithms, the MDDPG-PK-Means algorithm has better performance  in most continuous tasks.

Key words: deep reinforcement learning, deterministic policy , gradient algorithm, k-means clustering, multi-actor

中图分类号: 

  • TP18