New Q-Learning Algorithm for Trajectory Plan of Manipulator

J4, 2013, Vol. 31, Issue (1): 90-94.

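LI Yan-hui, ZHAO Hui, LI Shan-shan

College of Electrical and Information Engineering, Northeast Petroleum University, Daqing 163318, Heilongjiang, China

Received: 2012-08-24 Online: 2003-01-24 Published: 2013-04-01
About the author: LI Yan-hui (born 1970), female, from Faku, Liaoning; professor and master's supervisor at Northeast Petroleum University; research interests include robust control, filtering, and intelligent control. (Tel) 86-459-6504797 (E-mail) ly_hui@hotmail.com.
Funding:
National Youth Fund Project (61004067); Science and Technology Fund Project of the Heilongjiang Provincial Department of Education (12511002)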

Abstract

To plan the trajectory of a two-degree-of-freedom (2-DOF) manipulator, a new dynamic-search Q-learning algorithm is proposed. The algorithm plans the trajectory directly, without requiring a mathematical model of the manipulator. It dynamically adjusts the proportion parameter of the greedy strategy according to the learning progress, and provides a quantitative strategy evaluation unit that is more objective and fair than the traditional approach. In addition, a dynamic update mechanism updates the learned experience online. Simulation results show that the new Q-learning algorithm enables the manipulator to reach the target position more quickly and achieves a globally optimal trajectory.

Key words: manipulator, Q-learning, greedy strategy, trajectory planning, quantitative evaluation unit
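
As a rough illustration of the idea summarized in the abstract (model-free trajectory planning with a greedy-strategy parameter that changes as learning proceeds), the sketch below runs plain tabular Q-learning on a discretized two-joint configuration grid. Everything specific here is an assumption made for illustration and is not taken from the paper: the 10-bin discretization, the target `GOAL`, the reward values, the linear schedule in `epsilon_at`, and all function names. The paper's quantitative evaluation unit and dynamic update mechanism are not described on this page and are not modeled.

```python
import random

N = 10                      # assumed number of angle bins per joint
GOAL = (7, 4)               # hypothetical target joint configuration (bin indices)
ACTIONS = [(1, 0), (-1, 0), (0, 1), (0, -1)]  # move joint 1 or joint 2 by one bin


def step(state, action):
    """Apply one action, clipping at the joint limits.

    Assumed reward: -1 per move, +10 when the target is reached.
    """
    q1 = min(max(state[0] + action[0], 0), N - 1)
    q2 = min(max(state[1] + action[1], 0), N - 1)
    nxt = (q1, q2)
    done = nxt == GOAL
    return nxt, (10.0 if done else -1.0), done


def epsilon_at(episode, episodes, eps_start=0.9, eps_end=0.05):
    """Assumed linear schedule: explore more early, exploit more late."""
    frac = episode / max(episodes - 1, 1)
    return eps_start + (eps_end - eps_start) * frac


def train(episodes=500, alpha=0.5, gamma=0.95, max_steps=200, seed=0):
    """Tabular Q-learning with an exploration rate tied to learning progress."""
    rng = random.Random(seed)
    n_act = len(ACTIONS)
    q = {((i, j), a): 0.0
         for i in range(N) for j in range(N) for a in range(n_act)}
    for ep in range(episodes):
        eps = epsilon_at(ep, episodes)
        state = (0, 0)
        for _ in range(max_steps):
            if rng.random() < eps:
                a = rng.randrange(n_act)                       # explore
            else:
                a = max(range(n_act), key=lambda k: q[(state, k)])  # exploit
            nxt, reward, done = step(state, ACTIONS[a])
            best_next = 0.0 if done else max(q[(nxt, k)] for k in range(n_act))
            # Standard Q-learning update toward the one-step bootstrapped target.
            q[(state, a)] += alpha * (reward + gamma * best_next - q[(state, a)])
            state = nxt
            if done:
                break
    return q


def greedy_path(q, start=(0, 0), max_steps=100):
    """Follow the learned greedy policy from start and return visited states."""
    path, state = [start], start
    for _ in range(max_steps):
        a = max(range(len(ACTIONS)), key=lambda k: q[(state, k)])
        state, _, done = step(state, ACTIONS[a])
        path.append(state)
        if done:
            break
    return path
```

Calling `greedy_path(train())` returns the sequence of joint-bin pairs visited by the learned policy. Swapping `epsilon_at` for a different schedule is the natural place to experiment with how the greedy-strategy parameter depends on learning progress.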

CLC Number: TP242.2

LI Yan-hui, ZHAO Hui, LI Shan-shan. New Q-Learning Algorithm for Trajectory Plan of Manipulator[J]. J4, 2013, 31(1): 90-94.
