Journal of Jilin University (Information Science Edition), 2023, Vol. 41, Issue (6): 1079-1085.

Unbalanced Big Data Classification Algorithm Based on Random Forest Model

WEI Yaming 1, MENG Yuan 2

  1. Information Department, Xuzhou Central Hospital, Xuzhou 221000, China; 2. Graduate School, Jiangsu Normal University, Xuzhou 221000, China
  Received: 2022-11-11; Online: 2023-11-30; Published: 2023-12-01

Abstract: Existing classification algorithms for imbalanced big data often perform poorly. To address this problem, an imbalanced big data classification algorithm based on the random forest model is proposed. First, an SVM (Support Vector Machine) is used to filter the imbalanced big data, and the anti k-nearest neighbor method is then used to detect and eliminate outliers. Incremental principal component analysis removes the singularity of the covariance matrix of the data, and the entropy method is used to weight the features and extract the feature information. CART (Classification and Regression Trees) decision trees serve as the base classifiers from which a random forest classifier is constructed, and the extracted feature information is fed into this classifier to classify the imbalanced big data. Experimental results show that the proposed algorithm achieves good sampling performance, high classification accuracy, high stability, and good overall performance on imbalanced big data.
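
The outlier-removal step names an "anti k-nearest neighbor" method without giving details. A minimal sketch follows, assuming this means reverse k-nearest-neighbor counting: each point is scored by how many other points include it among their k nearest neighbors, and points that nobody chooses as a neighbor are treated as outliers. The function names, the Euclidean distance, and the `min_count` threshold are hypothetical choices for illustration, not the paper's implementation.

```python
import math


def knn_indices(points, i, k):
    """Indices of the k nearest neighbours of points[i] (Euclidean), excluding i itself."""
    dists = sorted((math.dist(points[i], p), j) for j, p in enumerate(points) if j != i)
    return [j for _, j in dists[:k]]


def reverse_knn_counts(points, k):
    """For each point, count how many other points list it among their k nearest neighbours."""
    counts = [0] * len(points)
    for i in range(len(points)):
        for j in knn_indices(points, i, k):
            counts[j] += 1
    return counts


def remove_outliers(points, k=2, min_count=1):
    """Split points into (kept, flagged); a point is flagged when its reverse-kNN count < min_count.

    min_count is a hypothetical threshold, not a value taken from the paper.
    """
    counts = reverse_knn_counts(points, k)
    kept = [p for p, c in zip(points, counts) if c >= min_count]
    flagged = [p for p, c in zip(points, counts) if c < min_count]
    return kept, flagged


data = [(0, 0), (0, 1), (1, 0), (1, 1), (10, 10)]
kept, flagged = remove_outliers(data, k=2)
print(flagged)  # [(10, 10)]
```

The isolated point (10, 10) is never among the two nearest neighbors of any other point, so its reverse count is zero and it is flagged. This brute-force version is O(n² log n) and only suitable for small examples; big data would need a spatial index.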
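
The feature-weighting step relies on "the entropy method". A common reading is the entropy weight method, sketched below under that assumption: each feature column is min-max normalized, turned into a distribution over samples, and given a weight proportional to one minus its normalized Shannon entropy, so features that vary more across samples receive larger weights. The min-max normalization and the equal-weight fallback are illustrative choices, not details confirmed by the paper.

```python
import math


def entropy_weights(matrix):
    """Entropy weight method: features whose values vary more across samples get larger weights.

    matrix is a list of n samples (rows), each with m numeric features (columns); n >= 2.
    """
    n, m = len(matrix), len(matrix[0])
    k = 1.0 / math.log(n)
    divergences = []
    for j in range(m):
        col = [row[j] for row in matrix]
        lo, hi = min(col), max(col)
        if hi == lo:
            # A constant feature carries no information: entropy 1, divergence 0.
            divergences.append(0.0)
            continue
        # Min-max normalise the column, then turn it into a distribution over samples.
        norm = [(v - lo) / (hi - lo) for v in col]
        total = sum(norm)
        probs = [v / total for v in norm]
        # Normalised Shannon entropy in [0, 1], using the convention 0 * ln 0 = 0.
        entropy = -k * sum(p * math.log(p) for p in probs if p > 0)
        divergences.append(1.0 - entropy)
    s = sum(divergences)
    if s == 0:
        return [1.0 / m] * m  # all features constant: fall back to equal weights
    return [d / s for d in divergences]


X = [[1, 5, 3], [2, 5, 9], [3, 5, 1], [4, 5, 7]]
print(entropy_weights(X))
```

In this toy matrix the second feature is constant and receives weight 0, while the third feature, whose normalized values are more concentrated, receives a slightly larger weight than the first.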
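
The classifier is described as a random forest whose base learners are CART decision trees. The sketch below shows that general structure in plain Python: each tree is grown on a bootstrap sample, each node picks the Gini-minimizing threshold over a random subset of features, and the forest predicts by majority vote. The class name `RandomForest`, its hyperparameters, and the toy data are hypothetical and only illustrate the technique, not the authors' configuration.

```python
import math
import random
from collections import Counter


def gini(labels):
    """Gini impurity of a list of class labels (the CART classification criterion)."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def best_split(X, y, features):
    """Return (score, feature, threshold) with the lowest weighted Gini, or None if no split exists."""
    best, n = None, len(y)
    for f in features:
        for t in sorted({row[f] for row in X}):
            left = [y[i] for i in range(n) if X[i][f] <= t]
            right = [y[i] for i in range(n) if X[i][f] > t]
            if not left or not right:
                continue
            score = (len(left) * gini(left) + len(right) * gini(right)) / n
            if best is None or score < best[0]:
                best = (score, f, t)
    return best


def build_tree(X, y, depth, n_features, rng):
    """Grow a binary CART tree: internal nodes are (feature, threshold, left, right), leaves are labels."""
    majority = Counter(y).most_common(1)[0][0]
    if depth == 0 or len(set(y)) == 1:
        return majority
    # Random feature subset at each node, as in a random forest.
    split = best_split(X, y, rng.sample(range(len(X[0])), n_features))
    if split is None:
        return majority
    _, f, t = split
    li = [i for i in range(len(y)) if X[i][f] <= t]
    ri = [i for i in range(len(y)) if X[i][f] > t]
    return (f, t,
            build_tree([X[i] for i in li], [y[i] for i in li], depth - 1, n_features, rng),
            build_tree([X[i] for i in ri], [y[i] for i in ri], depth - 1, n_features, rng))


def predict_tree(node, x):
    """Walk from the root to a leaf (labels must not be tuples)."""
    while isinstance(node, tuple):
        f, t, left, right = node
        node = left if x[f] <= t else right
    return node


class RandomForest:
    """Bagged CART trees with majority voting; all hyperparameters here are illustrative."""

    def __init__(self, n_trees=25, max_depth=5, n_features=None, seed=0):
        self.n_trees, self.max_depth = n_trees, max_depth
        self.n_features, self.seed = n_features, seed
        self.trees = []

    def fit(self, X, y):
        rng = random.Random(self.seed)
        k = self.n_features or max(1, int(math.sqrt(len(X[0]))))
        self.trees = []
        for _ in range(self.n_trees):
            # Bootstrap sample: draw len(X) rows with replacement.
            idx = [rng.randrange(len(X)) for _ in range(len(X))]
            self.trees.append(build_tree([X[i] for i in idx], [y[i] for i in idx],
                                         self.max_depth, k, rng))
        return self

    def predict(self, x):
        votes = Counter(predict_tree(tree, x) for tree in self.trees)
        return votes.most_common(1)[0][0]


# Toy imbalanced data: 16 samples of class 0, 4 of class 1.
X = [[i, 2 * i] for i in range(20)]
y = [0] * 16 + [1] * 4
forest = RandomForest(n_trees=25, seed=0).fit(X, y)
print(forest.predict([18, 36]), forest.predict([3, 6]))  # 1 0
```

This sketch uses plain bootstrap sampling. The abstract does not say how the forest itself compensates for class imbalance, so a class-balanced bootstrap or weighted voting would be a further assumption rather than part of the described method.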

Key words: random forest model, unbalanced big data classification, support vector machine (SVM), anti k-nearest neighbor method, classification and regression trees (CART) decision tree

CLC Number: TP391