Journal of Jilin University (Information Science Edition), 2020, Vol. 38, Issue (1): 99-106.


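Retrieval Model of Multi-Language Intelligent Information Based on BabelNet

YU Zaifu, YUAN Man

  1. School of Computer and Information Technology, Northeast Petroleum University, Daqing, Heilongjiang 163318, China
  • Received: 2019-03-07 Online: 2020-01-20 Published: 2020-02-17
  • About the authors: YU Zaifu (born 1998), male, from Qiqihar, Heilongjiang; undergraduate student at Northeast Petroleum University; research interests include knowledge engineering and information retrieval; (Tel) 86-15174535383, (E-mail) yzfssg@126.com. Corresponding author: YUAN Man (born 1965), male, from Nong'an, Jilin; professor and doctoral supervisor at Northeast Petroleum University; research interests include data science, data standardization, knowledge organization, and information integration; (Tel) 86-15765959186, (E-mail) yuanman@nepu.edu.cn.
  • Supported by:
    Heilongjiang Province Philosophy and Social Science Planning Research Fund (19ED334); Heilongjiang Provincial Department of Education National Cultivation Fund (2017PYYL-06); Northeast Petroleum University Graduate Innovation Research Fund (JYCX_CX07_2018_2)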

Abstract: Traditional cross-language information retrieval suffers from low translation-mapping
accuracy and from semantic drift after query expansion. To address these problems, a multilingual
information retrieval model is proposed that combines statistical and ontology-based methods:
statistical translation resolves ambiguity in translation mapping, and multi-ontology resources such
as BabelNet reduce the loss of semantic relevance. Because an ontology encodes a large number of
relations between concepts, it is used as the semantic-layer representation for a semantic weighting
algorithm, which is built on the BM25F statistical information retrieval model and serves as the
ranking algorithm for user feedback. Finally, a prototype multilingual information retrieval system
is designed and implemented from the model and tested on a data set collected with a web crawler.
The experimental results show that the model achieves higher average precision than a traditional
information retrieval model based on machine translation.

Key words: BabelNet knowledge base, multilingual information retrieval, ranking algorithm, semantic relevance

CLC Number:

  • TP391
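
The abstract states that a semantic weight is layered on the BM25F retrieval model, but it does not give the formula. The sketch below is therefore only an illustration under stated assumptions, not the authors' algorithm: `bm25f_score` follows the commonly used simple BM25F form (field-weighted, length-normalized term frequency with a non-negative IDF variant), while `rerank`, its `alpha` parameter, and the caller-supplied `semantic_relatedness` function (assumed to return a value in [0, 1], for example derived from BabelNet concept links) are hypothetical names introduced here.

```python
import math


def bm25f_score(query_terms, doc, corpus, field_weights, k1=1.2, b=0.75):
    """Score one document against a query with a simple BM25F variant.

    doc and every item of corpus map a field name to a list of tokens.
    corpus is assumed to be non-empty.
    """
    n_docs = len(corpus)
    # Average length of each field over the corpus, used for length normalization.
    avg_len = {
        f: sum(len(d.get(f, [])) for d in corpus) / n_docs
        for f in field_weights
    }
    score = 0.0
    for term in set(query_terms):
        # Document frequency: documents containing the term in any weighted field.
        df = sum(
            1 for d in corpus
            if any(term in d.get(f, []) for f in field_weights)
        )
        if df == 0:
            continue
        # Non-negative IDF variant (log of 1 plus the classic odds ratio).
        idf = math.log(1.0 + (n_docs - df + 0.5) / (df + 0.5))
        # Field-weighted, length-normalized pseudo term frequency.
        pseudo_tf = 0.0
        for f, w in field_weights.items():
            tokens = doc.get(f, [])
            tf = tokens.count(term)
            if tf == 0:
                continue
            norm = 1.0 - b + b * len(tokens) / avg_len[f]
            pseudo_tf += w * tf / norm
        # Saturate the pseudo frequency so repeated terms give diminishing returns.
        score += idf * pseudo_tf / (k1 + pseudo_tf)
    return score


def rerank(query_terms, corpus, field_weights, semantic_relatedness, alpha=0.5):
    """Return document indices ordered by BM25F boosted by a semantic weight.

    The multiplicative boost (1 + alpha * semantic) is an assumption made for
    this sketch; the source does not specify how the two scores are combined.
    """
    scored = []
    for idx, doc in enumerate(corpus):
        lexical = bm25f_score(query_terms, doc, corpus, field_weights)
        semantic = semantic_relatedness(query_terms, doc)  # assumed in [0, 1]
        scored.append((lexical * (1.0 + alpha * semantic), idx))
    # Highest combined score first; ties keep their original corpus order.
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [idx for _, idx in scored]
```

A multiplicative boost is used here so that a document with no lexical match stays at zero regardless of its semantic score; an additive combination would be an equally plausible choice.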