Journal of Jilin University (Science Edition) ›› 2025, Vol. 63 ›› Issue (6): 1713-1722.


基于优先经验回放的生成式SAC算法及其应用

张伟1, 李玉俊1, 谢雯雯2, 许耘嘉1, 孙庚2   

  1. 吉林大学 后勤处, 长春 130012; 2. 吉林大学 计算机科学与技术学院, 长春 130012
  • 收稿日期:2025-02-26 出版日期:2025-11-26 发布日期:2025-11-26
  • 通讯作者: 孙庚 E-mail:sungeng@jlu.edu.cn

Prioritized Experience Replay-Based Generative SAC Algorithm and Its Application

ZHANG Wei1, LI Yujun1, XIE Wenwen2, XU Yunjia1, SUN Geng2   

  1. Logistics Department, Jilin University, Changchun 130012, China;
  2. College of Computer Science and Technology, Jilin University, Changchun 130012, China
  • Received:2025-02-26 Online:2025-11-26 Published:2025-11-26
  • Corresponding author: SUN Geng E-mail:sungeng@jlu.edu.cn

摘要: 针对传统柔性演员-评论家算法在探索能力和复杂环境中状态表征不足的问题, 提出一种改进的柔性演员-评论家算法. 首先, 该算法通过引入优先经验回放机制, 利用时序差分误差对经验样本进行动态优先级评估, 从而提高关键经验的利用率, 进而提升学习效率; 其次, 该算法将生成式Transformer架构集成到演员网络中以增强对状态特征的动态捕捉能力, 从而显著提升其在复杂优化任务中的性能; 最后, 在高校后勤人员动态调度优化问题上进行应用实验. 实验结果表明, 与原始柔性演员评论家算法及经典深度Q网络算法相比, 改进的柔性演员-评论家算法在人力需求动态拟合方面误差更小, 从而有效验证了其在实际应用中的优势和实用性.

关键词: 深度强化学习, 柔性演员-评论家算法, 优先经验回放, Transformer架构, 后勤管理

Abstract: To address the insufficient exploration capability and state representation of the conventional soft actor-critic (SAC) algorithm in complex environments, we propose an improved soft actor-critic (ISAC) algorithm. Firstly, the ISAC algorithm introduces a prioritized experience replay (PER) mechanism, which dynamically evaluates the priority of experience samples using their temporal-difference (TD) errors, thereby increasing the utilization of crucial experiences and improving learning efficiency. Secondly, the algorithm integrates a generative Transformer architecture into the actor network to strengthen its ability to dynamically capture state features, thereby significantly improving its performance in complex optimization tasks. Finally, we conduct an application experiment on the dynamic scheduling optimization problem of university logistics staff. The experimental results show that, compared with the original SAC algorithm and the classic deep Q-network (DQN) algorithm, the proposed ISAC algorithm yields smaller errors in dynamically fitting human resource demand, which demonstrates its advantages and practicality in real-world applications.
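
The prioritized experience replay step mentioned in the abstract can be illustrated with a minimal sketch. It assumes the common proportional variant of PER, in which a transition with TD error δ_i receives priority |δ_i| + ε and is sampled with probability proportional to its priority raised to the power α; the abstract does not say which variant the authors use. The class name and the parameters `alpha`, `beta`, and `eps` are illustrative assumptions, not taken from the paper.

```python
import random


class PrioritizedReplayBuffer:
    """Illustrative proportional PER buffer (hypothetical, not the authors' code)."""

    def __init__(self, capacity, alpha=0.6, eps=1e-5, seed=0):
        self.capacity = capacity
        self.alpha = alpha      # 0 gives uniform sampling, 1 gives full prioritization
        self.eps = eps          # keeps every priority strictly positive
        self.data = []
        self.priorities = []
        self.pos = 0            # next slot to overwrite once the buffer is full
        self.rng = random.Random(seed)

    def add(self, transition):
        # A new transition gets the current maximum priority, so it is
        # likely to be sampled before its own TD error is known.
        max_p = max(self.priorities, default=1.0)
        if len(self.data) < self.capacity:
            self.data.append(transition)
            self.priorities.append(max_p)
        else:
            self.data[self.pos] = transition
            self.priorities[self.pos] = max_p
        self.pos = (self.pos + 1) % self.capacity

    def sample(self, batch_size, beta=0.4):
        n = len(self.data)
        scaled = [p ** self.alpha for p in self.priorities]
        total = sum(scaled)
        probs = [s / total for s in scaled]
        idx = self.rng.choices(range(n), weights=probs, k=batch_size)
        # Importance-sampling weights offset the bias of non-uniform
        # sampling; they are normalized by the largest weight in the batch.
        weights = [(n * probs[i]) ** (-beta) for i in idx]
        w_max = max(weights)
        weights = [w / w_max for w in weights]
        batch = [self.data[i] for i in idx]
        return idx, batch, weights

    def update_priorities(self, idx, td_errors):
        # Called after a critic update with the freshly computed TD errors.
        for i, err in zip(idx, td_errors):
            self.priorities[i] = abs(err) + self.eps
```

In a SAC-style training loop, one would typically call `sample` to draw a batch, scale each sample's critic loss by the returned weight, and then pass the new TD errors to `update_priorities`.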

Key words: deep reinforcement learning, soft actor-critic algorithm, prioritized experience replay, Transformer architecture, logistics management

CLC Number: 

  • TP181