Journal of Jilin University (Engineering and Technology Edition), 2021, Vol. 51, Issue (5): 1817-1822. doi: 10.13229/j.cnki.jdxbgxb20200370

• Computer Science and Technology •

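Large-scale semantic text overlapping region retrieval based on deep learning

Li-li DONG, Dan YANG, Xiang ZHANG

  1. School of Information and Control Engineering, Xi'an University of Architecture and Technology, Xi'an 710055, China
  • Received: 2020-05-26 Online: 2021-09-01 Published: 2021-09-16
  • Corresponding author: Xiang ZHANG E-mail: kkkuujm@163.com; xd220210@163.com
  • About the author: Li-li DONG (1960-), female, professor, master's degree. Research interests: intelligent control and system decision-making. E-mail: kkkuujm@163.com
  • Funding:
    National Natural Science Foundation of China (61701388); Natural Science Basic Research Program of Shaanxi Province (2018JM6080); Science and Technology Innovation Guidance Project of Xi'an Science and Technology Bureau (201805033YD11CG17)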



Abstract:

Overlapping region recognition is a hot topic in natural language processing that still needs further exploration. To address the poor precision and recall of traditional text overlapping region retrieval methods, a large-scale semantic text overlapping region retrieval method based on deep learning is proposed. A hybrid model is constructed by combining a sparse autoencoder with a deep belief network, and a text classifier is designed on the basis of this hybrid model. The classifier has three main components: text preprocessing, feature learning, and classification retrieval. The texts in the collection first undergo a series of preprocessing steps, including denoising, word segmentation, and stop-word removal. Finally, softmax regression performs the text classification: the learned text features are fed to the classifier, which outputs the classification retrieval results for the overlapping regions. Experimental results show that the method achieves high precision and recall, demonstrating its reliability and robustness.

Key words: deep learning, semantic text, overlapping region retrieval, deep belief network, feature learning

CLC number: TP391

Fig. 1 Structure of the hybrid model
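
The abstract describes the hybrid model as a sparse autoencoder combined with a deep belief network. The sketch below covers only the sparse-autoencoder part, using its common textbook objective: squared reconstruction error plus a KL-divergence penalty that pushes each hidden unit's average activation toward a small target rho. The layer sizes, rho = 0.05, beta = 3.0 and the toy inputs are assumptions for illustration. The paper's actual architecture and hyperparameters are not given here.

```python
import math
import random

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def kl_divergence(rho, rho_hat):
    """KL divergence between Bernoulli(rho) and Bernoulli(rho_hat)."""
    return (rho * math.log(rho / rho_hat)
            + (1 - rho) * math.log((1 - rho) / (1 - rho_hat)))

def sparse_autoencoder_loss(batch, W_enc, b_enc, W_dec, b_dec, rho=0.05, beta=3.0):
    """Mean reconstruction error plus beta * KL sparsity penalty.

    W_enc is n_hidden x n_input; W_dec is n_input x n_hidden.
    rho and beta are assumed values, not taken from the paper.
    """
    n_hidden = len(b_enc)
    recon_error = 0.0
    mean_activation = [0.0] * n_hidden
    for x in batch:
        # Encoder: hidden code h = sigmoid(W_enc x + b_enc)
        h = [sigmoid(sum(w * xi for w, xi in zip(W_enc[j], x)) + b_enc[j])
             for j in range(n_hidden)]
        # Decoder: reconstruction x_hat = sigmoid(W_dec h + b_dec)
        x_hat = [sigmoid(sum(w * hj for w, hj in zip(W_dec[i], h)) + b_dec[i])
                 for i in range(len(x))]
        recon_error += 0.5 * sum((xi - xr) ** 2 for xi, xr in zip(x, x_hat))
        for j in range(n_hidden):
            mean_activation[j] += h[j] / len(batch)
    # Penalize average activations rho_hat_j that drift away from the target rho
    penalty = sum(kl_divergence(rho, r) for r in mean_activation)
    return recon_error / len(batch) + beta * penalty

# Toy usage: hypothetical 4-dimensional binary term vectors, 3 hidden units
rng = random.Random(0)
n_in, n_hid = 4, 3
W_enc = [[rng.uniform(-0.1, 0.1) for _ in range(n_in)] for _ in range(n_hid)]
W_dec = [[rng.uniform(-0.1, 0.1) for _ in range(n_hid)] for _ in range(n_in)]
batch = [[1, 0, 1, 0], [0, 1, 0, 1]]
print(sparse_autoencoder_loss(batch, W_enc, [0.0] * n_hid, W_dec, [0.0] * n_in))
```

In a full model this loss would be minimized by backpropagation, and the learned hidden codes would feed the next stage.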

Fig. 2 Text classifier
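
According to the abstract, the classifier's final stage is softmax regression, and its inputs are the learned text features. The following minimal sketch assumes a plain softmax regression fitted by stochastic gradient descent on cross-entropy loss. The learning rate, the epoch count and the two-dimensional toy features (standing in for the learned features) are hypothetical.

```python
import math

def softmax(z):
    m = max(z)  # subtract the max logit for numerical stability
    exps = [math.exp(v - m) for v in z]
    total = sum(exps)
    return [e / total for e in exps]

def predict_proba(W, b, x):
    """Class probabilities p(k | x) = softmax(W x + b)."""
    return softmax([sum(w * xi for w, xi in zip(W[k], x)) + b[k] for k in range(len(b))])

def train_softmax(X, y, n_classes, lr=0.5, epochs=200):
    """Fit softmax regression by SGD on cross-entropy (lr and epochs are assumed)."""
    n_features = len(X[0])
    W = [[0.0] * n_features for _ in range(n_classes)]
    b = [0.0] * n_classes
    for _ in range(epochs):
        for x, label in zip(X, y):
            p = predict_proba(W, b, x)
            for k in range(n_classes):
                # d(cross-entropy)/d(logit_k) = p_k - 1[k == label]
                g = p[k] - (1.0 if k == label else 0.0)
                for i in range(n_features):
                    W[k][i] -= lr * g * x[i]
                b[k] -= lr * g
    return W, b

def predict(W, b, x):
    p = predict_proba(W, b, x)
    return max(range(len(p)), key=lambda k: p[k])

# Toy usage: hypothetical 2-D feature vectors for two text classes
X = [[1.0, 0.0], [0.9, 0.2], [0.0, 1.0], [0.1, 0.8]]
y = [0, 0, 1, 1]
W, b = train_softmax(X, y, n_classes=2)
print([predict(W, b, x) for x in X])
```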

Fig. 3 Text preprocessing workflow
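
The abstract lists denoising, word segmentation and stop-word removal as the preprocessing steps but does not name a segmenter. The sketch below substitutes forward maximum matching (FMM), a classic dictionary-based method for Chinese word segmentation. The regex denoising rules, the toy dictionary and the stop-word list are assumptions for illustration only.

```python
import re

def denoise(text):
    """Strip HTML tags, URLs and punctuation/symbols, then collapse whitespace."""
    text = re.sub(r"<[^>]+>", " ", text)       # markup remnants
    text = re.sub(r"https?://\S+", " ", text)  # URLs
    text = re.sub(r"[^\w\s]", " ", text)       # punctuation and symbols (Unicode-aware)
    return re.sub(r"\s+", " ", text).strip()

def fmm_segment(text, dictionary, max_len=4):
    """Forward maximum matching: at each position take the longest dictionary word."""
    tokens = []
    for chunk in text.split():
        if chunk.isascii():  # keep Latin-script words and numbers whole
            tokens.append(chunk.lower())
            continue
        i = 0
        while i < len(chunk):
            for size in range(min(max_len, len(chunk) - i), 0, -1):
                piece = chunk[i:i + size]
                # Fall back to a single character when no dictionary word matches
                if size == 1 or piece in dictionary:
                    tokens.append(piece)
                    i += size
                    break
    return tokens

def remove_stop_words(tokens, stop_words):
    return [t for t in tokens if t not in stop_words]

def preprocess(text, dictionary, stop_words):
    return remove_stop_words(fmm_segment(denoise(text), dictionary), stop_words)

# Hypothetical toy dictionary and stop-word list, for illustration only
DICTIONARY = {"深度", "学习", "深度学习", "文本", "检索", "重叠", "区域"}
STOP_WORDS = {"的", "和", "是"}
print(preprocess("<p>深度学习的文本检索。</p>", DICTIONARY, STOP_WORDS))
```

FMM is greedy, so it can mis-segment ambiguous strings. A production system would more likely use a statistical segmenter and a much larger lexicon.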

Fig. 4 Running process of the contrastive divergence method
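
Fig. 4 refers to contrastive divergence, the usual procedure for training the restricted Boltzmann machines (RBMs) that are stacked to form a deep belief network. The pure-Python sketch below implements one-step contrastive divergence (CD-1) for a binary RBM. It assumes Bernoulli units, mean-field (probability) reconstructions of the visible layer and arbitrary toy hyperparameters. It is not taken from the paper.

```python
import math
import random

def sigmoid(z):
    # Numerically stable logistic function
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

class BernoulliRBM:
    """Binary restricted Boltzmann machine trained with CD-1 (illustrative sketch)."""

    def __init__(self, n_visible, n_hidden, seed=0):
        self.rng = random.Random(seed)
        self.W = [[self.rng.gauss(0.0, 0.01) for _ in range(n_hidden)]
                  for _ in range(n_visible)]
        self.a = [0.0] * n_visible  # visible biases
        self.b = [0.0] * n_hidden   # hidden biases

    def hidden_probs(self, v):
        return [sigmoid(self.b[j] + sum(v[i] * self.W[i][j] for i in range(len(v))))
                for j in range(len(self.b))]

    def visible_probs(self, h):
        return [sigmoid(self.a[i] + sum(h[j] * self.W[i][j] for j in range(len(h))))
                for i in range(len(self.a))]

    def cd1_update(self, v0, lr=0.1):
        """One contrastive-divergence step; returns the squared reconstruction error."""
        ph0 = self.hidden_probs(v0)                                # positive phase
        h0 = [1.0 if self.rng.random() < p else 0.0 for p in ph0]  # sample hidden units
        v1 = self.visible_probs(h0)                                # reconstruct visibles
        ph1 = self.hidden_probs(v1)                                # negative phase
        for i in range(len(v0)):
            for j in range(len(ph0)):
                # <v h>_data - <v h>_reconstruction
                self.W[i][j] += lr * (v0[i] * ph0[j] - v1[i] * ph1[j])
            self.a[i] += lr * (v0[i] - v1[i])
        for j in range(len(ph0)):
            self.b[j] += lr * (ph0[j] - ph1[j])
        return sum((x - r) ** 2 for x, r in zip(v0, v1))

# Toy usage: two hypothetical binary patterns
data = [[1, 1, 0, 0], [0, 0, 1, 1]]
rbm = BernoulliRBM(n_visible=4, n_hidden=6)
errors = [sum(rbm.cd1_update(v) for v in data) for _ in range(1000)]
print(errors[0], errors[-1])
```

In a deep belief network, each trained RBM's hidden probabilities become the training data for the next RBM in the stack.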

Fig. 5 Classification accuracy of text features

Fig. 6 Comparison of retrieval precision

Fig. 7 Comparison of retrieval recall
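
Figs. 6 and 7 compare retrieval precision and recall, the two measures reported in the abstract. The sketch below uses the standard set-based definitions, precision = |retrieved ∩ relevant| / |retrieved| and recall = |retrieved ∩ relevant| / |relevant|, and macro-averages them over queries. The paper's criterion for judging an overlapping region relevant is not stated here, and the document IDs are hypothetical.

```python
def precision_recall(retrieved, relevant):
    """Set-based precision and recall for a single query."""
    retrieved, relevant = set(retrieved), set(relevant)
    hits = len(retrieved & relevant)
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall

def macro_average(results):
    """Average (precision, recall) pairs over several queries."""
    n = len(results)
    return (sum(p for p, _ in results) / n, sum(r for _, r in results) / n)

# Toy usage with hypothetical document IDs
p, r = precision_recall(["d1", "d2", "d3", "d4"], ["d1", "d2", "d5"])
print(p, r)
```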
