Journal of Jilin University (Science Edition)

• Computer Science •

Feature Selection Method Based on Comprehensive Measurement for Text Categorization

杨杰明1,2, 刘元宁2, 曲朝阳1, 刘志颖1   

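  1. School of Information Engineering, Northeast Dianli University, Jilin 132012, Jilin Province, China; 2. College of Computer Science and Technology, Jilin University, Changchun 130012, China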
  • Received:2012-09-07 Online:2013-09-26 Published:2013-09-17
  • Corresponding author: 杨杰明 E-mail:yjmlzy@gmail.com

Feature Selection Algorithm Based on ComprehensiveMeasurement for Text Categorization

YANG Jieming1,2, LIU Yuanning2, QU Zhaoyang1, LIU Zhiying1   

  1. School of Information Engineering, Northeast Dianli University, Jilin 132012, Jilin Province, China; 2. College of Computer Science and Technology, Jilin University, Changchun 130012, China
  • Received:2012-09-07 Online:2013-09-26 Published:2013-09-17
  • Contact: YANG Jieming E-mail:yjmlzy@gmail.com

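Abstract:

To address the shortcomings of traditional feature selection algorithms, a new feature selection algorithm is proposed. The algorithm comprehensively measures the importance of a feature both within a category and across categories. Comparative experiments against five existing feature selection methods were carried out on three different datasets using two classifiers. The experimental results show that the algorithm further reduces the dimensionality of the feature vector space and effectively improves the classification performance of the classifiers.

Key words: feature selection, text categorization, dimensionality reduction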

Abstract:

To address the shortcomings of traditional feature selection algorithms, we propose a new feature selection algorithm that simultaneously measures the importance of a feature within a category (intra-category) and across categories (inter-category). The proposed algorithm was compared with five existing feature selection algorithms, using two classification algorithms, on three benchmark document collections. The experimental results show that the proposed method reduces the dimensionality of the text representation and significantly improves text categorization performance.
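The abstract does not state the scoring formula, so the following is only a minimal sketch of the general idea of combining an intra-category and an inter-category signal into one feature score. The functions `comprehensive_scores` and `select_top_k`, and the specific product-of-rates score, are hypothetical illustrations and are not the measure proposed in the paper.

```python
from collections import defaultdict


def comprehensive_scores(docs, labels):
    """Toy feature score (an assumption, NOT the paper's formula).

    For each term, take the maximum over classes of
    (within-class document rate) * (that class's share of the term's rates).
    """
    class_sizes = defaultdict(int)
    doc_freq = defaultdict(lambda: defaultdict(int))  # term -> class -> doc count
    for tokens, label in zip(docs, labels):
        class_sizes[label] += 1
        for term in set(tokens):  # count each term once per document
            doc_freq[term][label] += 1

    scores = {}
    for term, per_class in doc_freq.items():
        # Intra-category signal: fraction of a class's documents containing the term.
        rates = {c: per_class.get(c, 0) / n for c, n in class_sizes.items()}
        total = sum(rates.values())  # > 0 because the term occurs somewhere
        # Inter-category signal: how concentrated the term is in a single class.
        scores[term] = max(r * (r / total) for r in rates.values())
    return scores


def select_top_k(scores, k):
    """Return the k highest-scoring terms, breaking ties alphabetically."""
    return sorted(scores, key=lambda t: (-scores[t], t))[:k]
```

Under this toy score, a term that appears in every document of one class and in no other class scores 1.0, while a term spread evenly over all classes is penalized, which is the kind of behavior a combined intra/inter-category measure is meant to capture.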

Key words: feature selection, text categorization, dimensionality reduction

CLC number:

  • TP301.6