Journal of Jilin University (Science Edition) ›› 2025, Vol. 63 ›› Issue (4): 1105-1116.


  • Corresponding author: ZHANG Chijun, E-mail: cjzhang6@gdufe.edu.cn

Emergency Resource Allocation Strategy Based on DBSDER-QL Algorithm

YANG Hao1, ZHANG Chijun1,2, ZHANG Xinwei3   

  1. School of Computer Science and Technology, Changchun University of Science and Technology, Changchun 130022, China;
    2. International Business School, Guangdong University of Finance & Economics, Guangzhou 510320, China; 3. Student Affairs Office, Changchun University, Changchun 130022, China
  • Received: 2025-02-24 Online: 2025-07-26 Published: 2025-07-26



Abstract: Aiming at the problem of emergency resource allocation for natural disasters, we proposed a Q-learning algorithm based on dynamic Boltzmann Softmax (DBS) and dynamic exploration rate (DER), named DBSDER-QL. Firstly, the DBS strategy dynamically adjusted the weights of action values, promoting stable convergence of the algorithm and resolving the over-greediness problem of the maximum operator. Secondly, the DER strategy improved the convergence and stability of the algorithm, resolving the problem that a fixed-exploration-rate Q-learning algorithm cannot fully converge to the optimal policy in the later stage of training. Finally, the effectiveness of the DBS and DER strategies was verified by ablation experiments. Comparative experiments with dynamic programming, the greedy algorithm, and the traditional Q-learning algorithm show that the DBSDER-QL algorithm significantly outperforms these methods in terms of total cost and computational efficiency, demonstrating higher applicability and effectiveness.
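The two ingredients named in the abstract — a Boltzmann softmax backup whose temperature grows over training (DBS), and an exploration rate that decays over training (DER) — can be illustrated in a toy Q-learning loop. The following is a minimal sketch on a hypothetical 5-state chain task, not the paper's resource-allocation model; the schedules `beta = t` and the exponential epsilon decay are assumed examples of "dynamic" schedules, not necessarily the authors' exact choices.

```python
import numpy as np

def dbs_operator(q, beta):
    """Boltzmann softmax backup: a softmax-weighted average of action values.
    It approaches max(q) as beta grows, avoiding the over-greediness of a
    hard max while still becoming greedy late in training."""
    z = beta * (q - np.max(q))      # subtract max for numerical stability
    w = np.exp(z)
    w /= w.sum()
    return float(w @ q)

def dbsder_q_learning(episodes=400, n_states=5, alpha=0.5, gamma=0.9, seed=0):
    """Tabular Q-learning on a chain: actions 0 = left, 1 = right;
    reward 1.0 at the rightmost (terminal) state, -0.01 per step."""
    rng = np.random.default_rng(seed)
    Q = np.zeros((n_states, 2))
    for t in range(1, episodes + 1):
        beta = t                            # DBS: temperature grows each episode
        eps = max(0.01, 0.99 ** t)          # DER: exploration rate decays each episode
        s = 0
        while s != n_states - 1:
            # epsilon-greedy action selection with the decaying rate
            a = rng.integers(2) if rng.random() < eps else int(np.argmax(Q[s]))
            s2 = min(s + 1, n_states - 1) if a == 1 else max(s - 1, 0)
            r = 1.0 if s2 == n_states - 1 else -0.01
            # softmax backup replaces the usual max over Q[s2]
            target = r + gamma * dbs_operator(Q[s2], beta)
            Q[s, a] += alpha * (target - Q[s, a])
            s = s2
    return Q
```

With a growing `beta`, early updates average over actions (dampening overestimation), while late updates approach the standard max backup, and the decaying `eps` lets the greedy policy stabilize; on this chain the learned policy moves right from every non-terminal state.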

Key words: resource allocation, reinforcement learning, Q-learning algorithm, dynamic exploration rate, dynamic Boltzmann Softmax

CLC number: 

  • TP391