Journal of Jilin University (Science Edition) ›› 2025, Vol. 63 ›› Issue (4): 1105-1116.


  • Corresponding author: ZHANG Chijun, E-mail: cjzhang6@gdufe.edu.cn

Emergency Resource Allocation Strategy Based on DBSDER-QL Algorithm

YANG Hao1, ZHANG Chijun1,2, ZHANG Xinwei3   

  1. School of Computer Science and Technology, Changchun University of Science and Technology, Changchun 130022, China;
    2. International Business School, Guangdong University of Finance & Economics, Guangzhou 510320, China; 3. Student Affairs Office, Changchun University, Changchun 130022, China
  • Received: 2025-02-24 Online: 2025-07-26 Published: 2025-07-26



Abstract: Aiming at the problem of emergency resource allocation for natural disasters, we proposed a Q-learning algorithm based on dynamic Boltzmann Softmax (DBS) and dynamic exploration rate (DER), named DBSDER-QL. Firstly, the DBS strategy dynamically adjusted the weights of action values, promoting stable convergence of the algorithm and resolving the over-greediness problem of the maximum operator. Secondly, the DER strategy improved the convergence and stability of the algorithm, resolving the problem that a fixed-exploration-rate Q-learning algorithm cannot fully converge to the optimal policy in the later stage of training. Finally, the effectiveness of the DBS and DER strategies was verified by ablation experiments. Comparative experiments with dynamic programming, the greedy algorithm, and the traditional Q-learning algorithm show that the DBSDER-QL algorithm significantly outperforms these methods in terms of total cost and computational efficiency, demonstrating higher applicability and effectiveness.
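The two ingredients named in the abstract — a Boltzmann softmax backup whose temperature grows over training (DBS), and an exploration rate that decays over training (DER) — can be illustrated in a toy Q-learning loop. The following is a minimal sketch on a hypothetical 5-state chain task, not the paper's resource-allocation model; the schedules `beta = t` and the exponential epsilon decay are assumed examples of "dynamic" schedules, not necessarily the authors' exact choices.

```python
import numpy as np

def dbs_operator(q, beta):
    """Boltzmann softmax backup: a softmax-weighted average of action values.
    It approaches max(q) as beta grows, avoiding the over-greediness of a
    hard max while still becoming greedy late in training."""
    z = beta * (q - np.max(q))      # subtract max for numerical stability
    w = np.exp(z)
    w /= w.sum()
    return float(w @ q)

def dbsder_q_learning(episodes=400, n_states=5, alpha=0.5, gamma=0.9, seed=0):
    """Tabular Q-learning on a chain: actions 0 = left, 1 = right;
    reward 1.0 at the rightmost (terminal) state, -0.01 per step."""
    rng = np.random.default_rng(seed)
    Q = np.zeros((n_states, 2))
    for t in range(1, episodes + 1):
        beta = t                            # DBS: temperature grows each episode
        eps = max(0.01, 0.99 ** t)          # DER: exploration rate decays each episode
        s = 0
        while s != n_states - 1:
            # epsilon-greedy action selection with the decaying rate
            a = rng.integers(2) if rng.random() < eps else int(np.argmax(Q[s]))
            s2 = min(s + 1, n_states - 1) if a == 1 else max(s - 1, 0)
            r = 1.0 if s2 == n_states - 1 else -0.01
            # softmax backup replaces the usual max over Q[s2]
            target = r + gamma * dbs_operator(Q[s2], beta)
            Q[s, a] += alpha * (target - Q[s, a])
            s = s2
    return Q
```

With a growing `beta`, early updates average over actions (dampening overestimation), while late updates approach the standard max backup, and the decaying `eps` lets the greedy policy stabilize; on this chain the learned policy moves right from every non-terminal state.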

Key words: resource allocation, reinforcement learning, Q-learning algorithm, dynamic exploration rate, dynamic Boltzmann Softmax

CLC number: 

  • TP391