Journal of Jilin University (Science Edition) ›› 2021, Vol. 59 ›› Issue (3): 627-634.


Bidirectional Feature Selection Algorithm Inspired by XGBoost

WANG Li1, WANG Tao1, XIAO Wei1, LIU Zhaogeng2, LI Zhanshan3

  1. College of Computer Science and Engineering, Changchun University of Technology, Changchun 130012, China; 2. College of Artificial Intelligence, Jilin University, Changchun 130012, China; 3. College of Computer Science and Technology, Jilin University, Changchun 130012, China
  • Received: 2020-10-30 Online: 2021-05-26 Published: 2021-05-23
  • Corresponding author: XIAO Wei, E-mail: xiaowei@ccut.edu.cn

Abstract: To address the reliance on a single feature evaluation criterion in feature selection, we propose a new feature selection algorithm based on extreme gradient boosting (XGBoost), an ensemble learning method. The algorithm first takes the metrics that XGBoost uses to build its ensemble tree model as feature importance measures. It then applies a new bidirectional search strategy that balances the influence of the different feature importance measures on the result and makes the evaluation process more efficient. Experiments on 11 standard datasets of different dimensionality show that the algorithm increases the diversity of feature subsets, speeds up feature selection, achieves high computational efficiency on low- and medium-dimensional datasets, and can also handle high-dimensional datasets.

Key words: feature selection, extreme gradient boosting, bidirectional search
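
The abstract gives no details of the search procedure, so the following is only a minimal sketch of the general idea: rank features under several importance measures, then run a forward pass followed by a backward pass for each ranking. It is not the authors' algorithm. The metric names `weight`, `gain` and `cover` are assumed here because XGBoost exposes importance types with those names; the toy scores, the `toy_evaluate` function and all helper names are hypothetical. In practice the evaluator would be something like a cross-validated classifier score.

```python
from typing import Callable, Dict, FrozenSet, List, Sequence, Tuple

Evaluator = Callable[[FrozenSet[int]], float]


def rank_features(importance: Dict[int, float]) -> List[int]:
    """Return feature indices sorted from most to least important."""
    return sorted(importance, key=lambda f: (-importance[f], f))


def bidirectional_search(
    ranking: Sequence[int], evaluate: Evaluator
) -> Tuple[FrozenSet[int], float]:
    """Forward pass adds features that improve the score; the backward
    pass then drops features whose removal does not lower it."""
    selected: FrozenSet[int] = frozenset()
    best = float("-inf")
    # Forward: walk the ranking from most to least important.
    for f in ranking:
        candidate = selected | {f}
        score = evaluate(candidate)
        if score > best:
            selected, best = candidate, score
    # Backward: revisit the ranking from least to most important.
    for f in reversed(ranking):
        if f not in selected or len(selected) == 1:
            continue
        candidate = selected - {f}
        score = evaluate(candidate)
        if score >= best:
            selected, best = candidate, score
    return selected, best


def select_features(
    importances: Dict[str, Dict[int, float]], evaluate: Evaluator
) -> Tuple[str, FrozenSet[int], float]:
    """Run the search once per importance measure and keep the best subset
    (higher score first, then fewer features)."""
    results = {
        metric: bidirectional_search(rank_features(scores), evaluate)
        for metric, scores in importances.items()
    }
    best_metric = max(results, key=lambda m: (results[m][1], -len(results[m][0])))
    subset, score = results[best_metric]
    return best_metric, subset, score


# Hypothetical toy data: features 0 and 2 are informative, 1 and 3 are noise.
INFORMATIVE = frozenset({0, 2})


def toy_evaluate(subset: FrozenSet[int]) -> float:
    """Assumed stand-in for a real model score: reward informative
    features and penalise noisy ones."""
    return len(subset & INFORMATIVE) - 0.1 * len(subset - INFORMATIVE)


toy_importances = {
    "weight": {0: 5.0, 1: 9.0, 2: 1.0, 3: 2.0},   # ranks noisy feature 1 first
    "gain": {0: 0.6, 1: 0.1, 2: 0.25, 3: 0.05},
    "cover": {0: 0.3, 1: 0.3, 2: 0.3, 3: 0.1},
}

metric, subset, score = select_features(toy_importances, toy_evaluate)
```

On this toy data the `weight` ranking places the noisy feature 1 first, so the forward pass accepts it and only the backward pass removes it. This is one reason for combining the two directions when the rankings disagree.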

CLC Number: 

  • TP18