Improved Algorithm of Q-Learning for Trajectory Planning

Journal of Jilin University (Information Science Edition), 2016, Vol. 34, Issue (5): 697-702.


ZHAO Hui 1, LIU Yazhe 2

1. College of Engineering, Bohai University, Jinzhou 121013, China;
2. College of Computer Science and Information Technology, Daqing Normal University, Daqing 163318, China

Received: 2015-11-25 Online: 2016-09-24 Published: 2017-01-16
About the author: ZHAO Hui (1984-), male, born in Gongzhuling, Jilin; master's student at Bohai University; research interests include intelligent control, pattern recognition, and signal processing. Tel: 86-18640615974; E-mail: 704256282@qq.com.
Supported by:
National Youth Fund Project (61304053)

Abstract

To address the tendency of the Q-learning algorithm to become trapped in local optima, the traditional greedy strategy is improved and a segment incremental search strategy is proposed. By dynamically adjusting its strategy parameters, the strategy lets the Q-learning algorithm move gradually through three stages during learning: exploration, learning, and exploitation. Applying this search strategy to Q-learning allows the improved algorithm to approach the global optimum more quickly than the traditional one. The improved algorithm is applied to manipulator trajectory planning, and the simulation results show that it stably guides the manipulator along the optimal trajectory to reach the target position quickly.

Key words: trajectory planning, manipulator, search strategy, online learning, mathematical model
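The abstract does not give the formula of the search strategy, so the following is only a minimal sketch of the general idea, assuming a piecewise schedule for the exploration rate of an epsilon-greedy policy. The function names, the stage fractions (`explore_frac`, `learn_frac`), and the rate bounds (`eps_high`, `eps_low`) are hypothetical choices made for illustration, not values from the paper.

```python
import random


def segmented_epsilon(episode, total_episodes, explore_frac=0.3, learn_frac=0.4,
                      eps_high=0.9, eps_low=0.05):
    """Hypothetical three-stage schedule: explore -> learn -> exploit."""
    t = episode / total_episodes
    if t < explore_frac:
        # Exploration stage: act randomly most of the time.
        return eps_high
    if t < explore_frac + learn_frac:
        # Learning stage: decay linearly from eps_high to eps_low.
        progress = (t - explore_frac) / learn_frac
        return eps_high + (eps_low - eps_high) * progress
    # Exploitation stage: act greedily most of the time.
    return eps_low


def choose_action(q_row, epsilon, rng=random):
    """Epsilon-greedy selection over one row of a tabular Q-function."""
    if rng.random() < epsilon:
        return rng.randrange(len(q_row))
    best = max(q_row)
    # Break ties between equally good actions at random.
    return rng.choice([a for a, q in enumerate(q_row) if q == best])


def q_update(q, s, a, r, s_next, alpha=0.1, gamma=0.9):
    """Standard tabular Q-learning update, applied in place."""
    q[s][a] += alpha * (r + gamma * max(q[s_next]) - q[s][a])
```

In a training loop, `segmented_epsilon(episode, total_episodes)` would be evaluated once per episode and passed to `choose_action`, so that early episodes sample the state space widely and late episodes follow the learned Q-values.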

CLC Number: TP242.2

ZHAO Hui, LIU Yazhe. Improved Algorithm of Q-Learning for Trajectory Planning[J]. Journal of Jilin University (Information Science Edition), 2016, 34(5): 697-702.
