Journal of Jilin University (Engineering and Technology Edition), 2022, Vol. 52, Issue (4): 910-915. doi: 10.13229/j.cnki.jdxbgxb20210025

Text information similarity search algorithm based on segment estimation and PageRank

Ling ZHAI¹, Xu CUI²

  1. Department of Library Information Technology, Xi'an University of Science and Technology, Xi'an 710054, China
  2. School of Public Management, Northwest University, Xi'an 710027, China
  • Received:2021-01-19 Online:2022-04-01 Published:2022-04-20
  • Contact: Xu CUI E-mail:zhaillll1@163.com;zhai13909282219@163.com

Abstract:

Existing methods do not address feature extraction from text information, which lowers their average correlation, average excellence rate, and new word search accuracy. To solve this problem, a text information similarity search algorithm based on segment estimation and PageRank is proposed. First, the segment estimation method is used to extract text features, and the PageRank value is taken as the criterion for a preliminary classification of the text information. Then, the similarity between the different features of the text information is calculated, and the results are sorted by similarity. Finally, the search is carried out according to the relevance between pieces of text information, which realizes the similarity search. Simulation results show that the proposed algorithm improves the average correlation, average excellence rate, and new word search accuracy, with the highest new word search accuracy reaching 98.98%, indicating that the algorithm produces high-quality, stable search results.
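
The abstract names the stages of the pipeline but this page does not give the formulas, so the following is only a minimal sketch of the general idea, not the authors' method. It assumes plain term-frequency vectors in place of the segment estimation feature extraction (which is not described here), cosine similarity as the similarity measure, and the standard power-iteration PageRank over a document similarity graph. The function names `term_vector`, `cosine_similarity`, `pagerank`, and `search`, and the damping factor of 0.85, are illustrative assumptions.

```python
import math
from collections import Counter


def term_vector(text):
    """Bag-of-words term-frequency vector (a stand-in for the paper's features)."""
    return Counter(text.lower().split())


def cosine_similarity(a, b):
    """Cosine of the angle between two sparse term-frequency vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def pagerank(weights, damping=0.85, iterations=50):
    """Power-iteration PageRank over a weighted adjacency matrix.

    weights[i][j] is the edge weight from node i to node j.
    A node with no outgoing weight spreads its rank uniformly.
    """
    n = len(weights)
    rank = [1.0 / n] * n
    out_sum = [sum(row) for row in weights]
    for _ in range(iterations):
        new_rank = [(1.0 - damping) / n] * n
        for i in range(n):
            if out_sum[i] == 0:
                share = damping * rank[i] / n
                for j in range(n):
                    new_rank[j] += share
            else:
                for j in range(n):
                    new_rank[j] += damping * rank[i] * weights[i][j] / out_sum[i]
        rank = new_rank
    return rank


def search(query, documents, damping=0.85):
    """Rank documents by similarity to the query, using PageRank as a tie-breaker."""
    vectors = [term_vector(d) for d in documents]
    n = len(documents)
    # Similarity graph: edge weight is the cosine similarity between two documents.
    weights = [
        [cosine_similarity(vectors[i], vectors[j]) if i != j else 0.0
         for j in range(n)]
        for i in range(n)
    ]
    ranks = pagerank(weights, damping)
    q = term_vector(query)
    scored = [(cosine_similarity(q, vectors[i]), ranks[i], i) for i in range(n)]
    scored.sort(key=lambda s: (s[0], s[1]), reverse=True)
    return [(documents[i], sim, pr) for sim, pr, i in scored]
```

Calling `search("text similarity", docs)` on a small list of strings returns `(document, similarity, pagerank)` tuples, most similar first. Using the PageRank score only to break ties is a simplification chosen for this sketch; the paper instead uses the PageRank value for a preliminary classification step whose details are not given on this page.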

Key words: segment estimation, PageRank, text information, similarity search

CLC Number: 

  • TP391

Fig.1 Local extremum points and fluctuating points

Fig.2 Vector space model

Fig.3 Flow chart of text information similarity search

Table 1 Search content library

Search category          Number of searches
Total                    1500
Containing new words     50
Complex searches         700
Simple searches          750

Fig.4 Average correlation comparison results of different algorithms

Fig.5 Comparison results of average excellence rate of different algorithms

Table 2 Comparison results of new word search accuracy of different algorithms

Number of new words    New word search accuracy of different algorithms/%
                       This paper    Ref. [3]    Ref. [4]
10                     98.98         96.85       95.63
20                     97.58         95.36       93.85
30                     97.45         94.14       92.74
40                     96.25         92.25       91.33
50                     96.36         92.55       90.74
60                     95.41         91.74       88.39
70                     95.20         90.36       87.85

References:

[1] Liu Su-yan, Liu Yuan-an, Wu Fan, et al. Sensor search based on sensor similarity computing in the Internet of Things[J]. Journal of Electronics & Information Technology, 2018, 40(12): 3020-3027.
[2] Zhu Hao-dong, Ding Wen-xue, Yang Li-zhi, et al. Improved Pagerank algorithm based on user behavior and topic similarity in microblog environment[J]. Computer Engineering, 2017, 43(5): 179-184.
[3] Duan Rui, Fang Huan, Zhan Yue. Process similarity algorithm based on weighted flow relationship[J]. Acta Electronica Sinica, 2019, 47(12): 2596-2601.
[4] Li Wan-ying, Huang Rui-zhang, Ding Zhi-yuan, et al. Multi-dimensional text clustering with user behavior characteristics[J]. Journal of Computer Applications, 2018, 38(11): 3127-3131.
[5] Ku Shan, Liu Zhao. An improved algorithm for page rank optimization based on Pagerank and HITS algorithms[J]. Journal of Wuhan University of Science and Technology (Natural Science Edition), 2019, 42(2): 155-160.
[6] Zhao Pei-ran, Wu Xin-yuan, Tang Xin-yu, et al. An algorithm of small object detection region proposal search based on GN splitting[J]. Acta Optica Sinica, 2018, 38(9): 277-282.
[7] Sun Hong, Zuo Teng. Research on algorithm of micro-blog user influence based on PageRank[J]. Application Research of Computers, 2018, 35(4): 1028-1032.
[8] Tan Si-qiao, Zhang Xi, Li Qian, et al. Information push model-building based on maximum mutual information coefficient[J]. Journal of Jilin University (Engineering and Technology Edition), 2018, 48(2): 558-563.
[9] Kang Wei, Qiu Hong-zhe, Jiao Dong-dong, et al. Search-based short-text classification[J]. Application of Electronic Technique, 2018, 44(11): 121-123.
[10] Jin Jie, Xu Yue-hao, Liu Zhen-yu. Paper relational network mining based on Pagerank[J]. Journal of China Academy of Electronics and Information Technology, 2019, 14(9): 924-928.
[11] Zhao Hong-wei, Wang Peng, Fan Li-li, et al. Similarity retention instance retrieval method[J]. Journal of Jilin University (Engineering and Technology Edition), 2019, 49(6): 2045-2050.