Journal of Jilin University (Science Edition)

• Computer Science

Hybrid Variable Selection Algorithm Based on Mutual Information and Random Forest

ZHAO Weiwei1, LI Yanying2, ZHAO Fengqin1, WEI Sasa1   

  1. School of Mathematics and Statistics, Xidian University, Xi'an 710126, China; 2. School of Mathematics and Information Science, Baoji University of Arts and Science, Baoji 721013, Shaanxi Province, China
  • Received: 2016-07-18 Published: 2017-07-26 Online: 2017-07-13
  • Corresponding author: ZHAO Weiwei E-mail: zhaoweiweitg@163.com

Abstract: Variable selection algorithms that rely on a single method often yield models with low classification accuracy and poor generalization ability. To address this problem, we propose a hybrid variable selection algorithm that works in two stages. In the filter stage, mutual information is used to quickly discard part of the irrelevant variables, reducing the dimensionality of the sample space. In the wrapper stage, a random forest refines the selection among the remaining variables within a permutation-based framework. Experimental results show that the proposed algorithm achieves higher classification accuracy and generalization ability than the compared algorithms.
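The two-stage procedure can be illustrated with a small pure-Python sketch. It is not the authors' implementation: the abstract does not specify the mutual information estimator, how many variables the filter keeps, or the exact permutation procedure. The sketch therefore assumes discrete (or pre-discretized) variables, a plug-in MI estimate, a user-chosen cutoff `k` for the filter, ID3-style trees in which `mtry` randomly drawn candidate variables compete at each node, and Breiman-style out-of-bag permutation importance, keeping a variable only if its mean importance is positive. All function names (`mi_filter`, `permutation_importance`, `hybrid_select`, and so on) are hypothetical.

```python
import math
import random
from collections import Counter


def mutual_information(xs, ys):
    """Plug-in estimate of I(X;Y), in bits, for two discrete sequences."""
    n = len(xs)
    joint, px, py = Counter(zip(xs, ys)), Counter(xs), Counter(ys)
    return sum(c / n * math.log2(c * n / (px[x] * py[y]))
               for (x, y), c in joint.items())


def mi_filter(X, y, k):
    """Stage 1 (filter): keep the k columns sharing the most information with y.
    Assumption: a fixed top-k cutoff; the paper's cutoff rule is not given."""
    scores = [mutual_information([row[j] for row in X], y) for j in range(len(X[0]))]
    return sorted(range(len(scores)), key=lambda j: -scores[j])[:k]


def build_tree(X, y, idx, features, mtry, depth, rng):
    """ID3-style tree on discrete columns; every node is (feature, majority, children)."""
    labels = [y[i] for i in idx]
    majority = Counter(labels).most_common(1)[0][0]
    if depth == 0 or len(set(labels)) == 1 or not features:
        return (None, majority, None)  # leaf
    # Random-forest step: only mtry randomly drawn columns compete at this node;
    # the winner maximizes information gain (= MI on this node's samples).
    candidates = rng.sample(features, min(mtry, len(features)))
    best = max(candidates,
               key=lambda j: mutual_information([X[i][j] for i in idx], labels))
    groups = {}
    for i in idx:
        groups.setdefault(X[i][best], []).append(i)
    rest = [f for f in features if f != best]
    children = {v: build_tree(X, y, g, rest, mtry, depth - 1, rng)
                for v, g in groups.items()}
    return (best, majority, children)


def predict(node, row):
    feature, majority, children = node
    while feature is not None:
        child = children.get(row[feature])
        if child is None:  # value unseen at this node: fall back to its majority
            return majority
        feature, majority, children = child
    return majority


def permutation_importance(X, y, features, n_trees=50, mtry=None, max_depth=5, seed=0):
    """Stage 2 (wrapper): mean out-of-bag accuracy drop when one column is shuffled.
    Assumption: Breiman-style OOB permutation importance as the permutation framework."""
    rng = random.Random(seed)
    n = len(X)
    mtry = mtry or max(1, int(math.sqrt(len(features))))
    drops = {j: [] for j in features}
    for _ in range(n_trees):
        boot = [rng.randrange(n) for _ in range(n)]  # bootstrap sample
        oob = sorted(set(range(n)) - set(boot))      # rows this tree never saw
        if not oob:
            continue
        tree = build_tree(X, y, boot, features, mtry, max_depth, rng)
        base = sum(predict(tree, X[i]) == y[i] for i in oob) / len(oob)
        for j in features:
            shuffled = [X[i][j] for i in oob]
            rng.shuffle(shuffled)
            hits = 0
            for i, v in zip(oob, shuffled):
                row = list(X[i])
                row[j] = v
                hits += predict(tree, row) == y[i]
            drops[j].append(base - hits / len(oob))
    return {j: (sum(d) / len(d) if d else 0.0) for j, d in drops.items()}


def hybrid_select(X, y, k, **forest_kwargs):
    """Filter with MI, then keep survivors with positive importance (assumed rule)."""
    kept = mi_filter(X, y, k)
    imp = permutation_importance(X, y, kept, **forest_kwargs)
    return sorted(j for j in kept if imp[j] > 0), imp


# Synthetic data: 6 binary columns, but y depends only on columns 0 and 1.
data_rng = random.Random(1)
X = [[data_rng.randint(0, 1) for _ in range(6)] for _ in range(300)]
y = [row[0] | row[1] for row in X]
selected, imp = hybrid_select(X, y, k=3)
print("selected:", selected)
print("importance:", {j: round(v, 3) for j, v in imp.items()})
```

On this synthetic data the filter should keep columns 0 and 1 plus one noise column, and the wrapper stage should then drop the noise column, because shuffling it leaves the out-of-bag accuracy unchanged. The filter costs one MI score per variable, whereas the wrapper fits a whole forest, so screening first keeps the expensive stage small when the initial dimension is large.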

Key words: hybrid algorithm, random forest, variable selection, mutual information

CLC number: TP391