Journal of Jilin University Science Edition ›› 2025, Vol. 63 ›› Issue (1): 83-90.


Optimization Strategy for Safety Reinforcement Learning Guided by Ontology

HAO Jianing1,2, YAO Yongwei3, YE Yuxin1,4   

  1. College of Computer Science and Technology, Jilin University, Changchun 130012, China; 2. Inspur General Software Co., Ltd., Jinan 250101, China; 3. 63611 Unit of the Chinese People’s Liberation Army, Korla 841000, Xinjiang Uygur Autonomous Region, China;
    4. Key Laboratory of Symbolic Computation and Knowledge Engineering of Ministry of Education, Jilin University, Changchun 130012, China
  • Received: 2024-01-05 Online: 2025-01-26 Published: 2025-01-26

Abstract: In safety reinforcement learning, shielding-based approaches can be constrained by the lack of suitable alternative policies, so that the system cannot be prevented from leaving a safe state even when danger is detected. Knowledge-integration approaches can provide safety guidance for specific states by extracting conceptual features and applying structured knowledge, but the guidance embedded in the knowledge is not always the optimal strategy and may even be inferior to strategies the agent learns through exploration. To address these problems, we propose an optimization strategy for safety reinforcement learning guided by ontology, which achieves both risk identification and avoidance and the optimization of action generation. Based on this strategy, we designed and implemented a simulation system for an unmanned aerial vehicle obstacle avoidance scenario and verified its effectiveness with five different reinforcement learning algorithms. The experimental results show that the ontology-guided optimization strategy for safety reinforcement learning can select alternative policies for intelligent agents while shielding risky actions, and that it performs better than traditional reinforcement learning methods.
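
The general idea of shielding risky actions while still selecting an alternative can be illustrated with a minimal sketch. This is not the authors' implementation: the function `shielded_action`, the state and action names, and the `RISKY` lookup set (a hypothetical stand-in for a query against an ontology) are all assumptions introduced here for illustration.

```python
from typing import Callable, Dict, Hashable, Optional

State = Hashable
Action = str


def shielded_action(
    state: State,
    q_values: Dict[Action, float],
    is_risky: Callable[[State, Action], bool],
    fallback: Optional[Action] = None,
) -> Optional[Action]:
    """Return the highest-valued action that the risk check does not reject."""
    # Rank candidate actions by the agent's learned value estimates.
    ranked = sorted(q_values, key=q_values.get, reverse=True)
    for action in ranked:
        # Shield: skip any action the risk check flags in this state.
        if not is_risky(state, action):
            return action
    # Every candidate was rejected, so fall back to a predefined action.
    return fallback


# Hypothetical risk knowledge; a real system might answer this check
# with a conjunctive query over an ontology instead of a set lookup.
RISKY = {("obstacle_ahead", "forward")}


def is_risky(state: State, action: Action) -> bool:
    return (state, action) in RISKY


q = {"forward": 0.9, "left": 0.6, "right": 0.4, "hover": 0.1}
choice = shielded_action("obstacle_ahead", q, is_risky, fallback="hover")
```

In this sketch the shield does not replace the learned policy: it only filters the ranked actions, so the agent keeps its own best safe alternative instead of a fixed override.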

Key words: safety reinforcement learning, shielding mechanism, ontology, deep neural network, conjunctive query

CLC Number: TP183