J4 ›› 2013, Vol. 31 ›› Issue (1): 90-94.

• Article

Application of a New Q-Learning Algorithm to Manipulator Trajectory Planning

李艳辉, 赵辉, 李珊珊   

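  1. College of Electrical and Information Engineering, Northeast Petroleum University, Daqing, Heilongjiang 163318
  • Received: 2012-08-24 Online: 2003-01-24 Published: 2013-04-01
  • About the author: LI Yan-hui (b. 1970), female, born in Faku, Liaoning; professor and master's supervisor at Northeast Petroleum University, working mainly on robust control, filtering, and intelligent control. (Tel) 86-459-6504797 (E-mail) ly_hui@hotmail.com
  • Funding:

    National Youth Fund project (61004067); Science and Technology Fund project of the Heilongjiang Provincial Department of Education (12511002)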

New Q-Learning Algorithm for Trajectory Plan of Manipulator

LI Yan-hui, ZHAO Hui, LI Shan-shan   

  1. College of Electrical and Information Engineering, Northeast Petroleum University, Daqing 163318, China
  • Received: 2012-08-24 Online: 2003-01-24 Published: 2013-04-01

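Abstract:

To plan the trajectory of a two-degree-of-freedom manipulator, a new dynamic-search Q-learning algorithm is proposed. The algorithm plans the trajectory directly, without requiring a mathematical model of the manipulator. It dynamically adjusts the proportion parameter of the greedy strategy according to the progress of learning, and it provides a quantitative policy evaluation unit that is more objective and fair than the conventional approach. In addition, a dynamic update mechanism updates the learned experience online. Simulation results show that the new Q-learning algorithm enables the manipulator to reach the target position more quickly and yields a globally optimal trajectory.

Key words: manipulator, Q-learning, greedy strategy, trajectory planning, quantitative evaluation unit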

Abstract:

To plan the trajectory of a 2-DOF (two-degrees-of-freedom) manipulator, we propose an improved Q-learning algorithm that does not require a mathematical model of the manipulator and plans the trajectory directly. The algorithm dynamically adjusts the parameters of the greedy strategy according to the learning process. Simulation results show that, when the new algorithm is applied to 2-DOF manipulator trajectory planning, the manipulator reaches the target position more quickly and the resulting trajectory is globally optimal.

Key words: manipulator, Q-learning, greedy strategy, trajectory planning, quantitative judgment unit
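
The abstract does not give the update rule, the exploration schedule, the evaluation unit, or the dynamic update mechanism, so the following is only a minimal sketch of the general idea: model-free tabular Q-learning in joint space, with a greedy-strategy parameter that changes as learning progresses. Everything concrete here is an assumption made for illustration and is not taken from the paper: the 10 x 10 joint-angle grid, the start and goal cells, the reward values, the linear epsilon schedule, and all function names.

```python
import random

# Assumed discretization: each of the two joints takes one of N angle bins.
N = 10
ACTIONS = [(-1, 0), (1, 0), (0, -1), (0, 1)]   # move one joint by one bin
START, GOAL = (0, 0), (7, 4)                   # hypothetical joint configurations


def epsilon(episode, episodes, eps_max=0.9, eps_min=0.05):
    """Exploration rate that shrinks linearly with learning progress (assumed schedule)."""
    frac = episode / max(1, episodes - 1)
    return eps_max + (eps_min - eps_max) * frac


def step(state, action):
    """Apply a joint increment, clamp to the joint limits, return (next_state, reward, done)."""
    nxt = (min(N - 1, max(0, state[0] + action[0])),
           min(N - 1, max(0, state[1] + action[1])))
    if nxt == GOAL:
        return nxt, 10.0, True
    return nxt, -1.0, False                    # per-step cost favors short trajectories


def best_action(q, state):
    """Index of the highest-valued action (first one wins on ties)."""
    return q[state].index(max(q[state]))


def train(episodes=2000, alpha=0.5, gamma=0.95, max_steps=200, seed=0):
    """Standard tabular Q-learning; no dynamic model of the arm is used."""
    rng = random.Random(seed)
    q = {(i, j): [0.0] * len(ACTIONS) for i in range(N) for j in range(N)}
    for ep in range(episodes):
        state, eps = START, epsilon(ep, episodes)
        for _ in range(max_steps):
            if rng.random() < eps:             # explore
                a = rng.randrange(len(ACTIONS))
            else:                              # exploit current estimates
                a = best_action(q, state)
            nxt, reward, done = step(state, ACTIONS[a])
            target = reward if done else reward + gamma * max(q[nxt])
            q[state][a] += alpha * (target - q[state][a])
            state = nxt
            if done:
                break
    return q


def greedy_path(q, max_steps=100):
    """Follow the learned greedy policy from START and record the joint-space path."""
    state, path = START, [START]
    for _ in range(max_steps):
        state, _, done = step(state, ACTIONS[best_action(q, state)])
        path.append(state)
        if done:
            break
    return path


if __name__ == "__main__":
    path = greedy_path(train())
    print(len(path) - 1, "joint steps:", path)
```

High exploration early helps the agent find the goal configuration, while the low rate late in training lets it refine the values along the path it actually follows. The paper's own schedule may differ from the linear one assumed here.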

CLC number:

  • TP242.2