Journal of Jilin University (Information Science Edition) ›› 2024, Vol. 42 ›› Issue (5): 985-990.


Web Search Ranking Algorithm Based on the Softmax Regression Classification Model

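DANG Mihua

  1. School of Humanities and Management, Xi'an Traffic Engineering Institute, Xi'an 710300, China
  • Received: 2023-07-11  Online: 2024-10-21  Published: 2024-10-23
  • About the author: DANG Mihua (born 1985), female, from Weinan, Shaanxi; lecturer at Xi'an Traffic Engineering Institute; research interests: databases and website construction. (Tel) 86-15229368567 (E-mail) dangmihua1002@163.com.
  • Supported by:
    Young and Middle-aged Researchers Fund of Xi'an Traffic Engineering Institute (2023KY-17)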

Sorting Algorithm of Web Search Based on Softmax Regression Classification Model

 DANG Mihua   

  1. School of Humanities and Management, Xi’an Traffic Engineering Institute, Xi’an 710300, China
  • Received: 2023-07-11  Online: 2024-10-21  Published: 2024-10-23

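Abstract: Web search results can exhibit domain drift, in which the returned pages are unrelated to the domain of the search keywords, so users cannot find the information they need. To address this problem, a web search ranking algorithm based on the Softmax regression classification model is proposed. Features of the web search text are selected to obtain the corresponding feature items; a vector representation model converts the selected feature items into formatted data; and the web search text data are balanced to obtain the web search text dataset. The Softmax regression classification model then classifies the dataset and predicts the category of each web search text, and the Okapi BM25 algorithm ranks the web search texts, completing the web search ranking. Experimental results show that the proposed algorithm ranks web search results well, improves ranking precision, and avoids domain drift during ranking.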

Keywords: Softmax regression classification model, web search ranking, text preprocessing, TF-IDF algorithm, Okapi BM25 algorithm

Abstract: Web search results can suffer from domain drift, in which the returned pages are unrelated to the domain of the search keywords, so that users cannot find the information they need. To address this, a web search sorting algorithm based on the Softmax regression classification model is proposed. Feature selection is applied to the web search text to obtain the corresponding feature items. A vector representation model converts the selected feature items into formatted data, and the web search text data are balanced to obtain the web search text dataset. The Softmax regression classification model then classifies the dataset and predicts the category of each web search text, and the Okapi BM25 algorithm sorts the web search texts to produce the final search ordering. The experimental results show that the proposed algorithm performs well in web search sorting, improving sorting accuracy and avoiding domain drift during sorting.

Key words: Softmax regression classification model, web search sorting, text preprocessing, term frequency-inverse document frequency (TF-IDF) algorithm, Okapi BM25 algorithm
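
The classification step named in the abstract can be illustrated with a small sketch. It is not the paper's implementation: it assumes TF-IDF weighting as the vector representation (TF-IDF appears only in the key words), and the toy corpus, the two class labels, the learning rate, and the helper names (`vectorize`, `train_softmax`, `predict`) are all hypothetical. The data balancing step is omitted because the toy classes are already the same size.

```python
import math
from collections import Counter

# Hypothetical toy corpus: class 0 = sports pages, class 1 = finance pages.
DOCS = [
    ["football", "match", "goal", "team"],
    ["team", "league", "goal", "score"],
    ["stock", "market", "price", "bank"],
    ["bank", "loan", "interest", "market"],
]
LABELS = [0, 0, 1, 1]

def build_idf(docs):
    """Return the sorted vocabulary and a smoothed IDF weight for every term."""
    vocab = sorted({t for d in docs for t in d})
    df = Counter(t for d in docs for t in set(d))
    n = len(docs)
    idf = {t: math.log((1 + n) / (1 + df[t])) + 1.0 for t in vocab}
    return vocab, idf

def vectorize(tokens, vocab, idf):
    """Map a token list to a dense TF-IDF vector; unknown tokens are ignored."""
    tf = Counter(tokens)
    length = max(len(tokens), 1)
    return [tf[t] / length * idf[t] for t in vocab]

def softmax(z):
    """Numerically stable softmax over a list of scores."""
    m = max(z)
    e = [math.exp(v - m) for v in z]
    s = sum(e)
    return [v / s for v in e]

def predict_proba(W, b, x):
    """Class probabilities for one feature vector."""
    return softmax([sum(w * v for w, v in zip(W[k], x)) + b[k] for k in range(len(W))])

def train_softmax(X, y, n_classes, lr=0.5, epochs=200):
    """Batch gradient descent on the mean cross-entropy loss."""
    n_feat = len(X[0])
    W = [[0.0] * n_feat for _ in range(n_classes)]
    b = [0.0] * n_classes
    for _ in range(epochs):
        gW = [[0.0] * n_feat for _ in range(n_classes)]
        gb = [0.0] * n_classes
        for x, label in zip(X, y):
            p = predict_proba(W, b, x)
            for k in range(n_classes):
                err = p[k] - (1.0 if k == label else 0.0)
                gb[k] += err
                for j, v in enumerate(x):
                    gW[k][j] += err * v
        for k in range(n_classes):
            b[k] -= lr * gb[k] / len(X)
            for j in range(n_feat):
                W[k][j] -= lr * gW[k][j] / len(X)
    return W, b

vocab, idf = build_idf(DOCS)
X = [vectorize(d, vocab, idf) for d in DOCS]
W, b = train_softmax(X, LABELS, n_classes=2)

def predict(tokens):
    """Return the index of the most probable class for a token list."""
    p = predict_proba(W, b, vectorize(tokens, vocab, idf))
    return max(range(len(p)), key=p.__getitem__)

print(predict(["goal", "team"]), predict(["bank", "price"]))
```

In practice a library implementation of multinomial logistic regression would replace the hand-written training loop; the pure-Python version is used here only so the gradient of the cross-entropy loss is visible.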

CLC number:

  • TP391
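
The Okapi BM25 ranking step named in the abstract can be sketched in the same spirit. The abstract does not say how the predicted category and the BM25 score are combined, so the sketch below assumes one plausible reading: only pages whose predicted category matches the query's category are ranked. That filtering rule, the toy pages, the category labels, and the parameter values `k1=1.5` and `b=0.75` are assumptions, not details from the paper.

```python
import math
from collections import Counter

def bm25_scores(query, docs, k1=1.5, b=0.75):
    """Score each tokenized document against the query with Okapi BM25."""
    if not docs:
        return []
    n = len(docs)
    avgdl = sum(len(d) for d in docs) / n
    df = Counter(t for d in docs for t in set(d))
    scores = []
    for d in docs:
        tf = Counter(d)
        score = 0.0
        for t in query:
            if tf[t] == 0:
                continue
            # Non-negative IDF variant (adds 1 inside the logarithm).
            idf = math.log((n - df[t] + 0.5) / (df[t] + 0.5) + 1.0)
            norm = k1 * (1 - b + b * len(d) / avgdl)
            score += idf * tf[t] * (k1 + 1) / (tf[t] + norm)
        scores.append(score)
    return scores

def rank(query, pages, query_category=None):
    """Rank (page_id, tokens, predicted_category) tuples by BM25 score.

    Assumption: when query_category is given, pages predicted to belong
    to a different category are dropped before scoring.
    """
    candidates = [p for p in pages if query_category is None or p[2] == query_category]
    scores = bm25_scores(query, [p[1] for p in candidates])
    ordered = sorted(zip(candidates, scores), key=lambda cs: cs[1], reverse=True)
    return [(p[0], s) for p, s in ordered]

# Hypothetical pages; the category would come from the classifier.
PAGES = [
    ("p1", ["apple", "fruit", "juice", "recipe"], "food"),
    ("p2", ["apple", "iphone", "chip", "release"], "tech"),
    ("p3", ["apple", "macbook", "chip", "chip", "review"], "tech"),
    ("p4", ["banana", "fruit", "smoothie"], "food"),
]

QUERY = ["apple", "chip"]
print([pid for pid, _ in rank(QUERY, PAGES)])
print([pid for pid, _ in rank(QUERY, PAGES, "tech")])
```

Without the category filter the food page `p1` still receives a positive score because it contains "apple", which is a small instance of the domain drift the abstract describes; with the filter it no longer appears in the ranking.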