Optimization Method for Unstructured Big Data Classification Based on Improved ID3 Algorithm

Journal of Jilin University (Information Science Edition), 2024, Vol. 42, Issue (5): 894-900.

TANG Kailing, ZHENG Hao

Institute of Marine Mineral Resources Development and Utilization Technology, Changsha Research Institute of Mining and Metallurgy, Changsha 410012, China

Received: 2023-06-20 Online: 2024-10-21 Published: 2024-10-23
About the authors: TANG Kailing (b. 1999), male, from Changsha, master's student at Changsha Research Institute of Mining and Metallurgy, mainly engaged in research on information security, big data, and digital twins, (Tel) 86-18874210789 (E-mail) 104610867@qq.com; ZHENG Hao (b. 1981), male, from Shangrao, Jiangxi, professor-level senior engineer at Changsha Research Institute of Mining and Metallurgy, mainly engaged in research on deep-sea mineral resource development and safety, (Tel) 86-18670076080 (E-mail) 1587015388@qq.com.
Funding:
Supported by the Natural Science Foundation of Hunan Province (2022JK60058)

Abstract

Abstract: Unstructured big data contains a large amount of redundant data, and if that redundancy is not cleaned in time, classification accuracy drops. To address this problem, an optimization method for unstructured big data classification based on an improved ID3 (Iterative Dichotomiser 3) algorithm is proposed. To handle the heavy redundancy and complex dimensionality of unstructured big data sets, the method first cleans the data and then reduces its dimensionality using a supervised identification matrix. Based on the dimensionality-reduced data, the improved ID3 algorithm is used to build a decision tree classification model, and this model is then used to classify the unstructured big data. The experimental results show that the method classifies unstructured big data effectively and with high accuracy.

Key words: improved ID3 (Iterative Dichotomiser 3) algorithm, data cleaning, data dimensionality reduction, unstructured big data, data classification method
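
As background for the decision tree step mentioned in the abstract, the following is a minimal sketch of the attribute-selection rule used by classic ID3: pick the feature with the largest information gain. It is not the authors' improved variant, whose details the abstract does not give, and it omits the cleaning and dimensionality-reduction stages. The function names, the toy records, and the feature names `source` and `has_text` are all hypothetical and chosen only for illustration.

```python
from collections import Counter
from math import log2


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * log2(n / total) for n in Counter(labels).values())


def information_gain(rows, labels, feature):
    """Entropy reduction obtained by splitting on one categorical feature."""
    # Group the class labels by the value each row takes for this feature.
    groups = {}
    for row, label in zip(rows, labels):
        groups.setdefault(row[feature], []).append(label)
    # Weighted average entropy of the subsets produced by the split.
    remainder = sum(len(g) / len(labels) * entropy(g) for g in groups.values())
    return entropy(labels) - remainder


def best_feature(rows, labels, features):
    """Classic ID3 rule: choose the feature with the highest information gain."""
    return max(features, key=lambda f: information_gain(rows, labels, f))


# Hypothetical toy records (assumed already cleaned and reduced to two features).
rows = [
    {"source": "log", "has_text": "yes"},
    {"source": "log", "has_text": "no"},
    {"source": "image", "has_text": "yes"},
    {"source": "image", "has_text": "no"},
]
labels = ["A", "A", "B", "B"]

print(best_feature(rows, labels, ["source", "has_text"]))  # "source"
```

In this toy data `source` separates the two classes perfectly, so it has a gain of 1 bit, while `has_text` has a gain of 0. A full ID3 tree would apply the same selection recursively to each subset until the labels in a subset are pure or no features remain.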

CLC number: TP301

TANG Kailing, ZHENG Hao. Optimization Method for Unstructured Big Data Classification Based on Improved ID3 Algorithm [J]. Journal of Jilin University (Information Science Edition), 2024, 42(5): 894-900.
