Journal of Jilin University (Science Edition)

• Computer Science

Centroid Classifier Based on Empirical Risk for Text Categorization

ZHOU Xiaotang, OUYANG Jihong, LI Ximing

  1. College of Computer Science and Technology, Jilin University, Changchun 130012, China
  • Received: 2012-12-19 Online: 2013-09-26 Published: 2013-09-17
  • Corresponding author: OUYANG Jihong E-mail: ouyangjihong@yahoo.com.cn

Abstract:

The empirical risk minimization inductive principle and the gradient descent method are used to adjust the class centroid vectors of the traditional centroid-based text classification algorithm. This addresses the poor representational power of those vectors, which results from ignoring the weighting factors of the training texts, and yields an improved centroid-based classifier whose classification performance is roughly on par with that of support vector machines. Experimental results show that the method is an effective means of improving the performance of traditional centroid-based text classification algorithms.

Key words: text classification, centroid-based text classification algorithms, empirical risk minimization

CLC number:

  • TP391.1
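
The abstract describes adjusting class centroid vectors by gradient descent on an empirical risk, but this page does not give the loss function or update rule. The following is only a minimal sketch of the general idea, not the authors' algorithm: it assumes L2-normalized document vectors, dot-product scoring, and a softmax cross-entropy loss as a stand-in for the empirical risk. All function names, the learning rate, and the epoch count are hypothetical.

```python
import numpy as np

def l2_normalize(X, eps=1e-12):
    """Scale each row to unit Euclidean length."""
    norms = np.linalg.norm(X, axis=1, keepdims=True)
    return X / np.maximum(norms, eps)

def init_centroids(X, y, n_classes):
    """Traditional centroid classifier: each centroid is the mean of its class."""
    return np.vstack([X[y == c].mean(axis=0) for c in range(n_classes)])

def predict(X, C):
    """Assign each document to the class whose centroid scores highest."""
    return np.argmax(X @ C.T, axis=1)

def empirical_risk(X, y, C):
    """Mean softmax cross-entropy on the training set (assumed surrogate loss)."""
    scores = X @ C.T
    scores = scores - scores.max(axis=1, keepdims=True)  # numerical stability
    log_probs = scores - np.log(np.exp(scores).sum(axis=1, keepdims=True))
    return -log_probs[np.arange(len(y)), y].mean()

def refine_centroids(X, y, C, lr=0.5, epochs=200):
    """Adjust centroids by batch gradient descent on the assumed loss."""
    n, k = X.shape[0], C.shape[0]
    Y = np.eye(k)[y]            # one-hot labels, shape (n, k)
    C = C.copy()                # leave the initial centroids untouched
    for _ in range(epochs):
        scores = X @ C.T
        scores -= scores.max(axis=1, keepdims=True)
        P = np.exp(scores)
        P /= P.sum(axis=1, keepdims=True)
        grad = (P - Y).T @ X / n  # gradient of the mean cross-entropy w.r.t. C
        C -= lr * grad
    return C

# Toy usage: three-term "documents" in two classes.
X = l2_normalize(np.array([[3., 1., 0.], [2., 0., 1.], [0., 1., 3.],
                           [1., 0., 2.], [1., 1., 1.]]))
y = np.array([0, 0, 1, 1, 0])
C0 = init_centroids(X, y, 2)
C1 = refine_centroids(X, y, C0)
```

Starting from the class means keeps the sketch anchored to the traditional method; the gradient step then lets individual training texts pull the centroids by different amounts, which is the effect the abstract attributes to accounting for training-text weights.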