Journal of Jilin University (Science Edition)

• Computer Science

A New Spam Feature Selection Algorithm Based on Information Gain

LI Meng, LIU Yuanning

  1. College of Computer Science and Technology, Jilin University, Changchun 130012, China
  • Received: 2016-06-24 Online: 2017-03-26 Published: 2017-03-24
  • Contact: LIU Yuanning E-mail: lyn@jlu.edu.cn

Abstract: The concepts of intraclass dispersity and interclass concentration are proposed on the basis of the traditional information gain feature selection algorithm. Combined with traditional information gain, they address the performance degradation caused by ignoring the distribution of feature items, and improve the efficiency of the information gain algorithm. The improved feature selection algorithm is applied to spam filtering experiments and compared with traditional feature selection algorithms under different classifiers. The experimental results show that the improved feature selection algorithm performs better.

Key words: information gain, spam, intraclass dispersity, feature selection, interclass concentration
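
For orientation, the traditional information gain score that the abstract takes as its baseline can be sketched as follows. This is a minimal illustration of the standard textbook definition, IG(t) = H(C) - [P(t) H(C|t) + P(¬t) H(C|¬t)], not the authors' improved algorithm: the abstract does not give formulas for intraclass dispersity or interclass concentration, so they are not reproduced here. The function names `entropy` and `information_gain` and the toy corpus are assumptions made for the example.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    if total == 0:
        return 0.0
    counts = Counter(labels)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def information_gain(docs, labels, term):
    """Standard information gain of a binary term-presence feature.

    docs is a list of token sets, labels the matching class labels.
    """
    with_term = [y for d, y in zip(docs, labels) if term in d]
    without_term = [y for d, y in zip(docs, labels) if term not in d]
    n = len(labels)
    # Class entropy after splitting on term presence, weighted by split size.
    conditional = (len(with_term) / n) * entropy(with_term) \
        + (len(without_term) / n) * entropy(without_term)
    return entropy(labels) - conditional


# Toy corpus (hypothetical data, not from the paper).
docs = [
    {"free", "prize", "click"},
    {"free", "offer"},
    {"meeting", "agenda"},
    {"project", "meeting", "free"},
]
labels = ["spam", "spam", "ham", "ham"]

# Rank the vocabulary by information gain, highest first.
vocabulary = {t for d in docs for t in d}
ranked = sorted(vocabulary,
                key=lambda t: information_gain(docs, labels, t),
                reverse=True)
```

In this toy corpus "free" appears in three of the four messages, yet it scores lower than "meeting", which separates the two classes exactly. The score depends only on whether a term is present in each document, which is the kind of distributional information the abstract says traditional information gain ignores.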

CLC number:

  • TP181