Journal of Jilin University (Science Edition)

• Computer Science

An Improved C4.5 Decision Tree Classification Algorithm in Data Mining

WANG Wenxia

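  1. Department of Computer Science and Technology, Yuncheng University, Yuncheng 044000, Shanxi Province, China
  • Received: 2016-10-25  Published: 2017-09-26  Online: 2017-09-26
  • Corresponding author: WANG Wenxia  E-mail: wangwx@126.com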

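Abstract: The traditional C4.5 decision tree classification algorithm must scan the data set many times, which makes it inefficient. To address this defect, an improved C4.5 decision tree classification algorithm is proposed. The logarithmic operations involved in deriving the information gain are optimized to reduce the running time of the algorithm, and the simple attribute splitting applied to continuous attributes in the traditional algorithm is replaced by splitting at the optimal partition point to improve efficiency. Experimental results show that, compared with the traditional C4.5 algorithm, the improved algorithm greatly increases execution efficiency and reduces the space required.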

Key words: continuous attribute, C4.5 decision tree, data mining, classification algorithm, discriminative ability measure
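
The abstract names two ideas without giving their details: cheaper logarithm work when computing information gain, and splitting continuous attributes at an optimal partition point. The Python sketch below illustrates the standard C4.5 quantities involved under stated assumptions; it is not the author's method, which the abstract does not specify. It rewrites entropy as H = (N log2 N - Σ n_i log2 n_i) / N and looks up every n log2 n value in a precomputed table, so the single sorted sweep over a continuous attribute evaluates each candidate cut without calling a logarithm. The function names `nlogn_table`, `entropy_from_counts`, `gain_ratio_discrete` and `best_threshold` are hypothetical, and placing the threshold at the midpoint between adjacent distinct values is an assumption (some C4.5 implementations use an observed attribute value instead).

```python
import math
from collections import Counter

def nlogn_table(n_max):
    """Return t with t[n] = n * log2(n) for n = 0..n_max, taking 0 * log2(0) = 0."""
    return [0.0] + [n * math.log2(n) for n in range(1, n_max + 1)]

def entropy_from_counts(counts, table):
    """Entropy in bits from class counts: H = (N*log2 N - sum n_i*log2 n_i) / N.

    All logarithms come from the precomputed table; none is evaluated here.
    """
    counts = list(counts)
    total = sum(counts)
    if total == 0:
        return 0.0
    return (table[total] - sum(table[c] for c in counts)) / total

def gain_ratio_discrete(values, labels, table):
    """C4.5 gain ratio of a discrete attribute: information gain / split information."""
    n = len(labels)
    base = entropy_from_counts(Counter(labels).values(), table)
    groups = {}
    for v, y in zip(values, labels):
        groups.setdefault(v, Counter())[y] += 1
    sizes = [sum(g.values()) for g in groups.values()]
    cond = sum(s / n * entropy_from_counts(g.values(), table)
               for s, g in zip(sizes, groups.values()))
    split_info = entropy_from_counts(sizes, table)
    return (base - cond) / split_info if split_info > 0 else 0.0

def best_threshold(values, labels, table):
    """Sort once, sweep once, return (threshold, gain) with the highest information gain.

    Class counts on each side are updated incrementally, so each candidate cut
    costs a few table lookups instead of a fresh pass over the data.
    """
    pairs = sorted(zip(values, labels))
    n = len(pairs)
    left, right = Counter(), Counter(labels)
    base = entropy_from_counts(right.values(), table)
    best_t, best_gain = None, -1.0
    for i in range(n - 1):
        v, y = pairs[i]
        left[y] += 1   # move one example from the right side to the left side
        right[y] -= 1
        nxt = pairs[i + 1][0]
        if v == nxt:
            continue  # no cut point between equal attribute values
        n_left = i + 1
        cond = (n_left * entropy_from_counts(left.values(), table)
                + (n - n_left) * entropy_from_counts(right.values(), table)) / n
        gain = base - cond
        if gain > best_gain:
            best_t, best_gain = (v + nxt) / 2, gain
    return best_t, best_gain

if __name__ == "__main__":
    labels = ["a", "a", "b", "b"]
    t = nlogn_table(len(labels))
    print(best_threshold([1, 2, 3, 4], labels, t))                 # (2.5, 1.0)
    print(gain_ratio_discrete(["x", "x", "y", "y"], labels, t))   # 1.0
```

The table only needs to cover counts up to the number of training examples at the node, so it is built once and reused for every attribute and every candidate threshold.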

CLC number: TP391