Journal of Jilin University (Information Science Edition) ›› 2024, Vol. 42 ›› Issue (5): 894-900.


Optimization Method for Unstructured Big Data Classification Based on Improved ID3 Algorithm

TANG Kailing, ZHENG Hao   

  1. Institute of Marine Mineral Resources Development and Utilization Technology, Changsha Research Institute of Mining and Metallurgy, Changsha 410012, China
  • Received: 2023-06-20  Online: 2024-10-21  Published: 2024-10-23

Abstract: Unstructured big data contains a large amount of redundant data, and if this redundancy is not cleaned in a timely manner, classification accuracy suffers. To improve classification performance, an optimization method for unstructured big data classification based on an improved ID3 (Iterative Dichotomiser 3) algorithm is proposed. The method targets the excessive redundancy and complex dimensionality of unstructured big data sets: it first cleans the data and then combines supervised identification matrices to reduce the data dimensionality. Based on the dimensionality-reduced data, the improved ID3 algorithm is used to build a decision tree classification model, which is then used to classify the unstructured big data. The experimental results show that the method classifies unstructured big data well and with high accuracy.
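
For orientation, the split criterion of the standard ID3 algorithm can be sketched as follows. The abstract does not specify how the authors modify ID3, so this is only a minimal illustration of the baseline information-gain computation, not the proposed method. The function names and the toy records (`source`, `has_image`, and the class labels) are hypothetical and chosen purely for the example.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * math.log2(n / total)
                for n in Counter(labels).values())


def information_gain(rows, labels, attr):
    """Entropy reduction obtained by splitting the records on one attribute."""
    groups = {}
    for row, label in zip(rows, labels):
        groups.setdefault(row[attr], []).append(label)
    # Weighted entropy of the partitions produced by the split.
    remainder = sum(len(g) / len(labels) * entropy(g) for g in groups.values())
    return entropy(labels) - remainder


def best_attribute(rows, labels, attrs):
    """Standard ID3 choice: the attribute with the highest information gain."""
    return max(attrs, key=lambda a: information_gain(rows, labels, a))


# Hypothetical toy records, assumed to be already cleaned and reduced.
rows = [
    {"source": "email", "has_image": "no"},
    {"source": "email", "has_image": "yes"},
    {"source": "log", "has_image": "no"},
    {"source": "log", "has_image": "yes"},
]
labels = ["text", "text", "machine", "machine"]

print(best_attribute(rows, labels, ["source", "has_image"]))
```

In this toy set, `source` separates the two classes completely while `has_image` carries no class information, so a standard ID3 tree would split on `source` first.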

Key words: improved Iterative Dichotomiser 3 (ID3) algorithm, data cleaning, data dimensionality reduction, unstructured big data, data classification methods

CLC Number: TP301