Journal of Jilin University (Information Science Edition) ›› 2025, Vol. 43 ›› Issue (4): 837-843.

Text Classification and Label Prediction Algorithms Based on Machine Learning

SUN Xiaoyu

  1. Library, China University of Petroleum (East China), Qingdao 266580, China
  • Received: 2023-12-08 Online: 2025-08-15 Published: 2025-08-15
  • About the author: SUN Xiaoyu (b. 1981), female, from Songzi, Hubei; associate research librarian at China University of Petroleum (East China); holds a master's degree; research interests: knowledge services and information technology. (Tel) 86-15863060818 (E-mail) Sunxiaoyu1124@126.com
  • Funding:
    Library Society of China Reading Promotion Project (2024LSCYDFZZYB059); Qingdao Philosophy and Social Science Planning Research Project (QDSKL2301043)

Abstract: When the volume of text data is large, effective features must be extracted to capture the important information in the text and to make the text easier to store and query. To this end, a machine-learning-based text classification and label prediction algorithm is proposed. A conditional random field is used to perform part-of-speech tagging and word segmentation on the text to be processed, from which the text features are obtained. These features are fed into a recurrent convolutional neural network with a self-attention mechanism, which, after training, outputs the classification and label prediction results for the text. Experiments show that the proposed algorithm can effectively perform text classification and label prediction, with an average false acceptance rate (取伪率) of 95.2% in text classification and an average ranking loss of 0.4% in label prediction.

Key words: text classification, conditional random field, part-of-speech tagging, self-attention recurrent convolutional neural network, machine learning

CLC Number:

  • TP391
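
The abstract reports a ranking loss for label prediction but does not define the metric. The sketch below assumes the common multi-label definition: for each sample, the fraction of (relevant, irrelevant) label pairs in which the irrelevant label scores at least as high as the relevant one, averaged over samples. The function name `ranking_loss` and the toy data are illustrative and are not taken from the paper.

```python
from typing import Sequence


def ranking_loss(y_true: Sequence[Sequence[int]],
                 scores: Sequence[Sequence[float]]) -> float:
    """Mean fraction of (relevant, irrelevant) label pairs ranked in the wrong order.

    Assumption: ties count as errors, and samples with no relevant or no
    irrelevant labels contribute a loss of 0.
    """
    per_sample = []
    for labels, s in zip(y_true, scores):
        relevant = [s[j] for j, y in enumerate(labels) if y == 1]
        irrelevant = [s[j] for j, y in enumerate(labels) if y == 0]
        if not relevant or not irrelevant:
            per_sample.append(0.0)
            continue
        # Count pairs where an irrelevant label scores >= a relevant label.
        wrong = sum(1 for r in relevant for i in irrelevant if r <= i)
        per_sample.append(wrong / (len(relevant) * len(irrelevant)))
    return sum(per_sample) / len(per_sample) if per_sample else 0.0


# Toy data (hypothetical): two documents, three candidate labels each.
y_true = [[1, 0, 0], [0, 1, 1]]
scores = [[0.9, 0.2, 0.1], [0.8, 0.3, 0.6]]
loss = ranking_loss(y_true, scores)
print(loss)
```

In the toy data the first document is ranked perfectly and the second has both of its relevant labels scored below an irrelevant one, so the mean loss is 0.5. A lower value means relevant labels are more consistently ranked above irrelevant ones.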