Journal of Jilin University (Science Edition), 2025, Vol. 63, Issue (1): 91-98.


Game Intelligent Guidance Algorithm Based on Deep Reinforcement Learning

BAI Tian¹, LV Luyao², LI Chu¹, HE Jialiang³

  1. College of Computer Science and Technology, Jilin University, Changchun 130012, China;
  2. College of Software, Jilin University, Changchun 130012, China;
  3. College of Information and Communication Engineering, Dalian Minzu University, Dalian 116600, Liaoning Province, China
  • Received: 2023-12-29 Online: 2025-01-26 Published: 2025-01-26
  • Corresponding author: HE Jialiang, E-mail: 78919121@qq.com

Abstract: To address the high input dimensionality and long training time of traditional game agent algorithms, we proposed a novel deep reinforcement learning algorithm for intelligent game guidance that integrated state information transformation with reward function shaping. Firstly, the interface provided by the Unity engine was used to read game backend information directly, which effectively compressed the dimensionality of the state space and reduced the amount of input data. Secondly, the reward mechanism was finely designed to accelerate the convergence of the model. Finally, comparative experiments between the proposed algorithm and existing methods were conducted from both subjective qualitative and objective quantitative perspectives. The experimental results show that the algorithm not only significantly improves the training efficiency of the model, but also markedly improves the performance of the agent.

Key words: deep reinforcement learning, game agent, reward function shaping, proximal policy optimization algorithm
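The first step in the abstract replaces raw screen observations with values read directly from the game backend through the Unity interface. The abstract does not say which backend values are read, so the following is only an illustrative sketch under assumptions: the `BackendState` fields, the map size used for scaling, and the 84×84 RGB frame used for comparison are all hypothetical. It shows how a handful of backend values yields a much smaller model input than an image frame.

```python
import math
from dataclasses import dataclass


# Hypothetical snapshot of values read from the game backend (assumption:
# the paper does not list the fields it reads through the Unity interface).
@dataclass
class BackendState:
    agent_x: float
    agent_y: float
    velocity_x: float
    velocity_y: float
    target_x: float
    target_y: float


def to_observation(s: BackendState, map_size: float = 100.0) -> list[float]:
    """Turn backend values into a compact feature vector; positions are scaled by map_size."""
    dx = s.target_x - s.agent_x
    dy = s.target_y - s.agent_y
    distance = math.hypot(dx, dy)
    return [
        s.agent_x / map_size,
        s.agent_y / map_size,
        s.velocity_x,
        s.velocity_y,
        dx / map_size,        # relative offset to the target
        dy / map_size,
        distance / map_size,  # scalar distance to the target
    ]


# Size of a raw 84x84 RGB frame, used here only for comparison (assumption).
PIXEL_OBS_DIM = 84 * 84 * 3

state = BackendState(10.0, 20.0, 0.5, -0.5, 40.0, 60.0)
obs = to_observation(state)
print(len(obs), PIXEL_OBS_DIM)  # 7 21168
```

In this toy setting the policy network would see 7 inputs instead of 21,168, which is the kind of reduction in input data the abstract attributes to reading backend information.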
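The second step shapes the reward function to speed up convergence. The abstract does not give the authors' reward terms, so the following sketch substitutes potential-based reward shaping, a standard form of shaping that adds F(s, s') = γΦ(s') − Φ(s) to the environment reward and leaves the optimal policy unchanged. The potential Φ (negative distance to a target), the discount γ = 0.99, and the 2D positions are assumptions made for illustration.

```python
import math

GAMMA = 0.99  # discount factor (assumed value)


def potential(pos: tuple[float, float], target: tuple[float, float]) -> float:
    """Hypothetical potential: higher (less negative) when closer to the target."""
    return -math.dist(pos, target)


def shaped_reward(env_reward: float,
                  pos: tuple[float, float],
                  next_pos: tuple[float, float],
                  target: tuple[float, float],
                  done: bool) -> float:
    """Environment reward plus the shaping term F = gamma * phi(s') - phi(s)."""
    # The potential of a terminal state is taken as 0, as is usual in episodic tasks.
    next_phi = 0.0 if done else potential(next_pos, target)
    return env_reward + GAMMA * next_phi - potential(pos, target)


target = (10.0, 0.0)
# Moving one unit toward the target gives a positive shaping bonus.
r_closer = shaped_reward(0.0, (0.0, 0.0), (1.0, 0.0), target, done=False)
# Moving one unit away from the target gives a negative shaping term.
r_away = shaped_reward(0.0, (1.0, 0.0), (0.0, 0.0), target, done=False)
print(round(r_closer, 2), round(r_away, 2))  # 1.09 -0.9
```

The dense shaping term gives the agent feedback on every step instead of only when a sparse goal reward arrives, which is one common way reward design accelerates training.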
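The keywords name the proximal policy optimization (PPO) algorithm. For reference, PPO maximizes the clipped surrogate objective mean_t[min(r_t·A_t, clip(r_t, 1 − ε, 1 + ε)·A_t)], where r_t is the ratio of the new to the old policy probability of the taken action and A_t is the advantage estimate. The pure-Python sketch below computes this objective from log-probabilities; ε = 0.2 is a common default and an assumption here, since the abstract does not state the authors' hyperparameters.

```python
import math


def ppo_clip_objective(new_logp: list[float],
                       old_logp: list[float],
                       advantages: list[float],
                       eps: float = 0.2) -> float:
    """Mean clipped surrogate objective of PPO (to be maximized)."""
    total = 0.0
    for lp_new, lp_old, adv in zip(new_logp, old_logp, advantages):
        ratio = math.exp(lp_new - lp_old)  # pi_new(a|s) / pi_old(a|s)
        clipped = min(max(ratio, 1.0 - eps), 1.0 + eps)
        # Keep the pessimistic (smaller) of the unclipped and clipped terms.
        total += min(ratio * adv, clipped * adv)
    return total / len(advantages)


# Ratio 1.5 with advantage 2 contributes 1.2 * 2 = 2.4; ratio 0.5 with
# advantage -1 contributes min(-0.5, 0.8 * -1) = -0.8; the mean is about 0.8.
obj = ppo_clip_objective(
    new_logp=[math.log(1.5), math.log(0.5)],
    old_logp=[0.0, 0.0],
    advantages=[2.0, -1.0],
)
print(obj)
```

Clipping removes the incentive to move the policy ratio outside [1 − ε, 1 + ε] in a single update, which keeps PPO's policy updates conservative and its training stable.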

CLC number: TP391