Journal of Jilin University (Information Science Edition) ›› 2025, Vol. 43 ›› Issue (1): 83-89.


Artificial Bee Colony Algorithm of Multi-Strategy Self-Optimizing Based on Reinforcement Learning

NI Hongmei, WANG Mei   

  1. School of Computer and Information Technology, Northeast Petroleum University, Daqing 163318, China
  • Received: 2023-11-02 Online: 2025-02-24 Published: 2025-02-24
  • About the author: NI Hongmei (born 1975), female, from Dehui, Jilin; associate professor at Northeast Petroleum University, Ph.D., master's supervisor; her research focuses on intelligent optimization algorithms. (Tel) 86-13836845801, (E-mail) nhm257@163.com.
  • Supported by:

     National Natural Science Foundation of China (51774090)

Abstract: To address the weak local search ability of the artificial bee colony (ABC) algorithm, a multi-strategy self-optimizing ABC algorithm based on reinforcement learning is proposed, drawing on the optimization idea of reinforcement learning. The algorithm integrates the Q-learning method from reinforcement learning with the ABC algorithm. Two indicators serve as the basis for dividing the states: the distance between the best value of the population and the individual fitness value, and the diversity of the population. An action set containing multiple search strategies is built, and an ε-greedy strategy selects the best one so that high-quality offspring are produced, which gives the ABC algorithm an intelligent choice of update strategy. Experiments on 20 test functions and an application to stock prediction show that the proposed algorithm delivers comparatively strong performance, balances exploration and exploitation better, converges comparatively quickly, and has good self-optimizing ability.

Key words: artificial bee colony algorithm, reinforcement learning, multi-strategy, Q-learning, self-optimizing
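
The abstract names the building blocks (Q-learning, states derived from distance-to-best and population diversity, an action set of search strategies, ε-greedy selection) but not their concrete form. The Python sketch below is one possible reading under stated assumptions, not the authors' implementation: the two-bin thresholds in `get_state`, the three search equations in `ACTIONS`, the 0/1 reward, the `sphere` objective, and every parameter default are illustrative placeholders. The onlooker and scout phases of ABC are omitted for brevity.

```python
import math
import random

def sphere(x):
    """Toy minimization objective (assumption; not one of the paper's benchmarks)."""
    return sum(v * v for v in x)

def _pick(pop, i, rng, count):
    """Return `count` distinct partner indices (all != i) and a random dimension."""
    partners = rng.sample([n for n in range(len(pop)) if n != i], count)
    return partners, rng.randrange(len(pop[i]))

def classic(pop, i, best, rng):
    """Classic ABC move: perturb one dimension relative to a random neighbour."""
    (k,), j = _pick(pop, i, rng, 1)
    cand = list(pop[i])
    cand[j] += rng.uniform(-1, 1) * (pop[i][j] - pop[k][j])
    return cand

def gbest_guided(pop, i, best, rng):
    """Classic move plus an attraction term toward the current best solution."""
    (k,), j = _pick(pop, i, rng, 1)
    cand = list(pop[i])
    cand[j] += (rng.uniform(-1, 1) * (pop[i][j] - pop[k][j])
                + rng.uniform(0, 1.5) * (best[j] - pop[i][j]))
    return cand

def best_based(pop, i, best, rng):
    """Exploitative move: restart one dimension from the best solution."""
    (r1, r2), j = _pick(pop, i, rng, 2)
    cand = list(pop[i])
    cand[j] = best[j] + rng.uniform(-1, 1) * (pop[r1][j] - pop[r2][j])
    return cand

ACTIONS = (classic, gbest_guided, best_based)  # assumed action set

def get_state(cost_i, costs, pop, span):
    """Map (distance to best, diversity) to one of 4 states; thresholds are assumed."""
    c_best, c_worst = min(costs), max(costs)
    dist = (cost_i - c_best) / (c_worst - c_best + 1e-12)  # 0 = best, 1 = worst
    centroid = [sum(x[j] for x in pop) / len(pop) for j in range(len(pop[0]))]
    diversity = sum(math.dist(x, centroid) for x in pop) / len(pop) / span
    return 2 * int(dist > 0.5) + int(diversity > 0.1)

def rl_abc(f=sphere, dim=5, n=20, iters=200, lo=-5.0, hi=5.0,
           eps=0.1, alpha=0.1, gamma=0.9, seed=0):
    """Employed-bee loop in which Q-learning picks the search strategy."""
    rng = random.Random(seed)
    pop = [[rng.uniform(lo, hi) for _ in range(dim)] for _ in range(n)]
    costs = [f(x) for x in pop]
    q = [[0.0] * len(ACTIONS) for _ in range(4)]  # Q-table: 4 states x 3 actions
    span = (hi - lo) * math.sqrt(dim)             # search-box diagonal, for scaling
    history = []
    for _ in range(iters):
        for i in range(n):
            best = pop[costs.index(min(costs))]
            s = get_state(costs[i], costs, pop, span)
            if rng.random() < eps:                # explore: random strategy
                a = rng.randrange(len(ACTIONS))
            else:                                 # exploit: highest Q-value
                a = max(range(len(ACTIONS)), key=lambda k: q[s][k])
            cand = [min(hi, max(lo, v)) for v in ACTIONS[a](pop, i, best, rng)]
            c_cand = f(cand)
            reward = 1.0 if c_cand < costs[i] else 0.0  # assumed reward
            if c_cand < costs[i]:                 # greedy selection, as in ABC
                pop[i], costs[i] = cand, c_cand
            s_next = get_state(costs[i], costs, pop, span)
            # Standard Q-learning update.
            q[s][a] += alpha * (reward + gamma * max(q[s_next]) - q[s][a])
        history.append(min(costs))
    return min(costs), q, history

if __name__ == "__main__":
    best_cost, q_table, _ = rl_abc()
    print("best cost:", best_cost)
    for state, row in enumerate(q_table):
        print("state", state, "Q-values:", [round(v, 3) for v in row])
```

Inspecting the learned Q-table per state gives a rough view of which strategy the agent comes to prefer when an individual is far from the best or when the population has lost diversity; with a different reward or state discretization the preferences may change.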

CLC number: 

  • TP183