Multi-actor Deterministic Policy Gradient Algorithm Based on Progressive k-Means Clustering

Journal of Jilin University (Science Edition), 2025, Vol. 63, Issue (3): 885-894.

LIU Quan¹,², LIU Xiaosong², WU Guangjun², LIU Yuhan³

1. School of Computer Science and Technology, Kashi University, Kashi 844000, Xinjiang Uygur Autonomous Region, China; 2. School of Computer Science and Technology, Soochow University, Suzhou 215008, Jiangsu Province, China; 3. Academy of Future Education, Xi'an Jiaotong-Liverpool University, Suzhou 215000, Jiangsu Province, China

Received: 2024-01-25 Online: 2025-05-26 Published: 2025-05-26
Corresponding author: LIU Quan E-mail: quanliu@suda.edu.cn

Abstract

Abstract: To address the poor learning performance and large fluctuations of the deep deterministic policy gradient (DDPG) algorithm on some tasks with large state spaces, we propose a multi-actor deep deterministic policy gradient algorithm based on progressive k-means clustering (MDDPG-PK-Means). During training, when an action is selected for the state at each time step, the discrimination result of the k-means algorithm assists the decision of the actor network, and the number of k-means cluster centers is gradually increased as the number of training time steps grows. The MDDPG-PK-Means algorithm is applied to the MuJoCo simulation platform. Experimental results show that, compared with DDPG and other algorithms, MDDPG-PK-Means performs better on most continuous tasks.

Key words: deep reinforcement learning, deterministic policy gradient algorithm, k-means clustering, multi-actor
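
The mechanism described in the abstract can be illustrated with a minimal Python sketch. The abstract does not say how the clustering result assists the actor, so everything specific here is an assumption made for illustration: one actor per cluster, routing each state to the actor of its nearest center, the `num_clusters` growth schedule with its `grow_every` and `k_max` parameters, and the linear-tanh actors that stand in for actor networks. The critic and the DDPG updates are omitted.

```python
import numpy as np


def kmeans(states, k, iters=20, seed=0):
    """Plain Lloyd's k-means; returns an array of k cluster centers."""
    states = np.asarray(states, dtype=float)
    rng = np.random.default_rng(seed)
    centers = states[rng.choice(len(states), size=k, replace=False)]
    for _ in range(iters):
        # Assign every state to its nearest center.
        dists = np.linalg.norm(states[:, None, :] - centers[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # Move each center to the mean of its members (keep it if empty).
        for j in range(k):
            members = states[labels == j]
            if len(members) > 0:
                centers[j] = members.mean(axis=0)
    return centers


def num_clusters(step, grow_every=1000, k_max=4):
    """Hypothetical schedule: start with 1 cluster, add one every `grow_every` steps."""
    return min(k_max, 1 + step // grow_every)


class MultiActor:
    """Assumed design: one deterministic actor per cluster (linear-tanh stand-ins)."""

    def __init__(self, state_dim, action_dim, k_max=4, grow_every=1000, seed=0):
        rng = np.random.default_rng(seed)
        self.k_max = k_max
        self.grow_every = grow_every
        self.weights = rng.normal(scale=0.1, size=(k_max, action_dim, state_dim))
        self.centers = np.zeros((1, state_dim))

    def refit(self, states, step):
        """Re-cluster stored states with the cluster count allowed at `step`."""
        k = min(num_clusters(step, self.grow_every, self.k_max), len(states))
        self.centers = kmeans(states, k)

    def act(self, state):
        """Route the state to the actor of its nearest cluster center."""
        cluster = int(np.linalg.norm(self.centers - state, axis=1).argmin())
        return cluster, np.tanh(self.weights[cluster] @ state)


# Usage with random stand-in states (not MuJoCo observations).
demo_states = np.random.default_rng(1).normal(size=(500, 3))
agent = MultiActor(state_dim=3, action_dim=2)
for step in (0, 1000, 5000):
    agent.refit(demo_states, step)
    cluster, action = agent.act(demo_states[0])
    print(step, len(agent.centers), cluster, action)
```

Starting from a single cluster makes the sketch behave like one ordinary actor early on, with the state space split among more actors only as training proceeds; whether the paper uses this exact schedule or routing rule is not stated in the abstract.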

CLC number: TP18

LIU Quan, LIU Xiaosong, WU Guangjun, LIU Yuhan. Multi-actor Deterministic Policy Gradient Algorithm Based on Progressive k-Means Clustering[J]. Journal of Jilin University (Science Edition), 2025, 63(3): 885-894.
