Journal of Jilin University (Information Science Edition) ›› 2024, Vol. 42 ›› Issue (5): 894-900.


Optimization Method for Unstructured Big Data Classification Based on Improved ID3 Algorithm

TANG Kailing, ZHENG Hao

  1. Institute of Marine Mineral Resources Development and Utilization Technology, Changsha Research Institute of Mining and Metallurgy, Changsha 410012, China
  • Received: 2023-06-20 Online: 2024-10-21 Published: 2024-10-23
  • About the authors: TANG Kailing (1999- ), male, born in Changsha, master's student at the Changsha Research Institute of Mining and Metallurgy; research interests: information security, big data, and digital twins; (Tel) 86-18874210789, (E-mail) 104610867@qq.com. ZHENG Hao (1981- ), male, born in Shangrao, Jiangxi, professor-level senior engineer at the Changsha Research Institute of Mining and Metallurgy; research interests: deep-sea mineral resource development and safety; (Tel) 86-18670076080, (E-mail) 1587015388@qq.com.
  • Supported by:
    Natural Science Foundation of Hunan Province (2022JK60058)

Abstract: Unstructured big data contains a large amount of redundant data, and classification accuracy drops if that redundancy is not cleaned in time. To address this problem, an optimization method for unstructured big data classification based on an improved ID3 (Iterative Dichotomiser 3) algorithm is proposed. To handle the heavy redundancy and the complex dimensionality of unstructured big data sets, the method first cleans the data and then reduces its dimensionality using a supervised identification matrix. Based on the dimensionality reduction results, the improved ID3 algorithm is used to build a decision tree classification model, and this model is then used to classify the unstructured big data. Experimental results show that the method classifies unstructured big data effectively and with high accuracy.

Key words: improved iterative dichotomiser 3 (ID3) algorithm, data cleaning, data dimensionality reduction, unstructured big data, data classification method
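
The abstract does not state how the ID3 algorithm is modified, so the following is only a minimal sketch of the classic ID3 split criterion (information gain) that such a decision tree model builds on. It is not the authors' improved method. The function names `entropy`, `information_gain`, and `best_attribute`, and the toy records, are assumptions made for illustration.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a non-empty list of class labels."""
    total = len(labels)
    return -sum((n / total) * math.log2(n / total)
                for n in Counter(labels).values())


def information_gain(rows, labels, attr):
    """Entropy reduction obtained by splitting on the attribute at index `attr`."""
    total = len(labels)
    groups = {}
    for row, label in zip(rows, labels):
        groups.setdefault(row[attr], []).append(label)
    # Weighted entropy of the subsets produced by the split.
    remainder = sum(len(g) / total * entropy(g) for g in groups.values())
    return entropy(labels) - remainder


def best_attribute(rows, labels, attrs):
    """Classic ID3 rule: pick the attribute with the largest information gain."""
    return max(attrs, key=lambda a: information_gain(rows, labels, a))


# Hypothetical toy records: (content type, size bucket) with a class label.
rows = [("text", "large"), ("text", "small"),
        ("image", "large"), ("image", "small")]
labels = ["A", "A", "B", "B"]

print(best_attribute(rows, labels, [0, 1]))
```

In this toy set the first attribute separates the two classes completely while the second carries no information, so classic ID3 would split on the first attribute at the root. Standard ID3 then repeats the same selection on each resulting subset.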

CLC Number:

  • TP301