Optimization Strategy for Safety Reinforcement Learning Guided by Ontology

Journal of Jilin University (Science Edition), 2025, Vol. 63, Issue 1: 83-90.

HAO Jianing^1,2, YAO Yongwei^3, YE Yuxin^1,4

1. College of Computer Science and Technology, Jilin University, Changchun 130012, China; 2. Inspur General Software Co., Ltd., Jinan 250101, China;
3. 63611 Unit of the Chinese People’s Liberation Army, Korla 841000, Xinjiang Uygur Autonomous Region, China; 4. Key Laboratory of Symbolic Computation and Knowledge Engineering of Ministry of Education, Jilin University, Changchun 130012, China

Received: 2024-01-05 Online: 2025-01-26 Published: 2025-01-26
Corresponding author: YE Yuxin E-mail: yeyx@jlu.edu.cn

Abstract

Safety reinforcement learning faces two problems in practice. Shielding-based approaches may lack a suitable backup policy, so the system cannot be prevented from leaving a safe state even when danger has been detected. Knowledge-based approaches can give safety guidance for specified states by extracting conceptual features and applying structured knowledge, but the guidance embedded in that knowledge is not always the optimal policy and may even be inferior to the policy the agent learns through exploration. To address these problems, we propose an ontology-guided optimization strategy for safety reinforcement learning that identifies and avoids risks and optimizes action generation. Based on this theory, we design and implement a simulation system for an unmanned aerial vehicle (UAV) obstacle avoidance scenario and verify its effectiveness with five different reinforcement learning algorithms. The experimental results show that, in addition to shielding risky actions, the ontology-guided strategy enables the agent to select a backup policy and outperforms traditional reinforcement learning methods.

Key words: safety reinforcement learning, shielding mechanism, ontology, deep neural network, conjunctive query

CLC number: TP183

HAO Jianing, YAO Yongwei, YE Yuxin. Optimization Strategy for Safety Reinforcement Learning Guided by Ontology[J]. Journal of Jilin University (Science Edition), 2025, 63(1): 83-90.