Journal of Jilin University (Engineering and Technology Edition) ›› 2014, Vol. 44 ›› Issue (5): 1375-1384. doi: 10.7964/jdxbgxb201405025


Decentralized reinforcement learning optimal control for time varying constrained reconfigurable modular robot

DONG Bo1, LIU Ke-ping2, LI Yuan-chun2

  1. Department of Control Science and Engineering, Jilin University, Changchun 130022, China;
  2. Department of Control Engineering, Changchun University of Technology, Changchun 130012, China
  • Received: 2013-06-01 Online: 2014-09-01 Published: 2014-09-01
  • Corresponding author: LI Yuan-chun (1962), male, professor, doctoral supervisor. Research interests: intelligent machinery and robot control. E-mail: liyc@mail.ccut.edu.cn
  • About the author: DONG Bo (1986), male, doctoral candidate. Research interests: intelligent machinery and robot control. E-mail: bodong09@mails.jlu.edu.cn
  • Supported by:
    National Natural Science Foundation of China (61374051, 60974010); Jilin Province Science and Technology Development Plan Project (20110705).

Abstract: Based on the actor-critic-identifier (ACI) architecture and radial basis function (RBF) neural networks, a decentralized reinforcement learning optimal control method is proposed for reconfigurable modular robots subject to external time-varying (dynamic) constraints. The method addresses the continuous-time nonlinear optimal control problem for modular robot systems with strongly coupled uncertainties. The robot dynamics are described as a set of interconnected subsystems. Based on a continuous-time Markov decision process (MDP) performance index, ACI and RBF neural networks are combined to identify each subsystem's optimal value function, optimal control policy, and overall uncertainty term, so that the optimality condition of the Hamilton-Jacobi-Bellman (HJB) equation is satisfied. As a result, each subsystem of the reconfigurable modular robot asymptotically tracks its desired trajectory, and the tracking error converges and remains bounded. System stability is proved using Lyapunov theory, and numerical simulations verify the effectiveness of the proposed decentralized control strategy.

Key words: automatic control technology, reconfigurable modular robot, reinforcement learning, nonlinear optimal control, decentralized control
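
The abstract relies on RBF neural networks to identify unknown functions (the value function, the control policy, and the uncertainty term). The page does not give the authors' update laws, so the following is only a minimal sketch of the underlying idea: a Gaussian RBF network with fixed centers whose output weights are adapted online by a simple gradient (LMS) rule. The target function `math.sin`, the centers, the width, the learning rate, and all function names are illustrative assumptions, not values or names from the paper.

```python
import math

def rbf_features(x, centers, width):
    """Evaluate Gaussian radial basis functions at a scalar input x."""
    return [math.exp(-((x - c) ** 2) / (2.0 * width ** 2)) for c in centers]

def rbf_predict(x, weights, centers, width):
    """Network output: weighted sum of the basis functions."""
    phi = rbf_features(x, centers, width)
    return sum(w * p for w, p in zip(weights, phi))

def train_rbf(target, centers, width, samples, lr=0.1, epochs=200):
    """Adapt the output weights with a per-sample gradient (LMS) update.

    This is a generic stand-in for an online identifier, not the
    ACI adaptation law from the paper.
    """
    weights = [0.0] * len(centers)
    for _ in range(epochs):
        for x in samples:
            phi = rbf_features(x, centers, width)
            err = target(x) - sum(w * p for w, p in zip(weights, phi))
            weights = [w + lr * err * p for w, p in zip(weights, phi)]
    return weights

# Hypothetical setup: approximate sin(x) on [-3, 3].
centers = [-4.0 + 0.5 * i for i in range(17)]   # fixed centers on [-4, 4]
samples = [-3.0 + 0.1 * i for i in range(61)]   # training inputs on [-3, 3]
width = 0.5

weights = train_rbf(math.sin, centers, width, samples)
max_err = max(abs(math.sin(x) - rbf_predict(x, weights, centers, width))
              for x in samples)
print("max abs approximation error on the samples:", max_err)
```

In a control setting the scalar target would be replaced by the quantity being identified, and the weight update would be chosen together with a Lyapunov argument so that the estimation error stays bounded; the sketch only shows the approximation mechanism.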

CLC number:

  • TP273