Journal of Jilin University (Engineering and Technology Edition), 2014, Vol. 44, Issue 01: 137-141. doi: 10.13229/j.cnki.jdxbgxb201401024

Feature selection algorithm based on random forest

YAO Deng-ju (1, 2), YANG Jing (1), ZHAN Xiao-juan (3)

  1. College of Computer Science and Technology, Harbin Engineering University, Harbin 150001, China;
  2. School of Software, Harbin University of Science and Technology, Harbin 150040, China;
  3. College of Computer Science and Technology, Heilongjiang Institute of Technology, Harbin 150050, China
  • Received: 2012-08-21  Online: 2014-01-01  Published: 2014-01-01

Abstract:

A feature selection algorithm based on random forest (RFFS) is proposed. The algorithm is a wrapper method: it uses the random forest classifier as the underlying learner and classification accuracy as the criterion function. Feature subsets are searched with sequential backward selection and generalized sequential backward selection. Experimental results on UCI datasets show that RFFS outperforms other methods reported in the literature in both classification accuracy and the selected feature subset.

Key words: artificial intelligence, random forest, feature selection, wrapper
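
The search procedure named in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not state the stopping rule or how candidate features are ranked, so the sketch uses the textbook form of (generalized) sequential backward selection, in which each step drops the r features whose removal leaves the highest criterion value, and the best subset seen is returned. The names `backward_select`, `make_rf_criterion`, and `toy_criterion` are hypothetical. `make_rf_criterion` assumes scikit-learn is installed and that `X` is a 2-D NumPy array; the toy criterion stands in for random forest accuracy so that the example runs with the standard library alone.

```python
from itertools import combinations


def backward_select(features, criterion, r=1, min_size=1):
    """Textbook (generalized) sequential backward selection.

    r=1 gives plain SBS; r>1 drops r features per step (GSBS).
    criterion(subset) must return a float where higher is better.
    """
    current = tuple(features)
    best_subset, best_score = current, criterion(current)
    while len(current) - r >= min_size:
        # Try every way of dropping r features from the current subset.
        candidates = [
            tuple(f for f in current if f not in dropped)
            for dropped in combinations(current, r)
        ]
        scored = [(criterion(c), c) for c in candidates]
        # Keep the survivor with the highest criterion value.
        step_score, current = max(scored, key=lambda sc: sc[0])
        # ">=" prefers the smaller subset when scores tie.
        if step_score >= best_score:
            best_subset, best_score = current, step_score
    return best_subset, best_score


def make_rf_criterion(X, y, n_trees=100, folds=5, seed=0):
    """Hypothetical criterion: cross-validated random forest accuracy.

    Assumes scikit-learn is available and X is a 2-D NumPy array.
    """
    from sklearn.ensemble import RandomForestClassifier
    from sklearn.model_selection import cross_val_score

    def criterion(subset):
        model = RandomForestClassifier(n_estimators=n_trees, random_state=seed)
        return cross_val_score(model, X[:, list(subset)], y, cv=folds).mean()

    return criterion


def toy_criterion(subset):
    # Stand-in for accuracy: features 0 and 2 are informative,
    # and every retained feature costs a small penalty.
    useful = len({0, 2} & set(subset))
    return useful - 0.1 * len(subset)


subset, score = backward_select(range(5), toy_criterion)
print(subset, score)
```

With real data, `backward_select(range(X.shape[1]), make_rf_criterion(X, y))` would run the same search with random forest accuracy as the criterion. Setting `r` above 1 evaluates every r-feature removal at each step, which is more expensive per step and can yield a different subset than the one-at-a-time search.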

CLC Number: TP18

