Journal of Jilin University (Information Science Edition) ›› 2024, Vol. 42 ›› Issue (5): 959-965.


Improved Decision Tree Algorithm for Big Data Classification Optimization

TANG Lingyi1, TANG Yiwen2, LI Beibei3

  1. Renji Hospital, Shanghai Jiao Tong University School of Medicine, Shanghai 200127, China; 2. General Office (Information Management Division), Shanghai Municipal Health Commission, Shanghai 200125, China; 3. Community Health Service Center of Linfen Road, Jing’an District, Shanghai 200435, China
  • Received: 2023-05-18 Online: 2024-10-21 Published: 2024-10-23
  • Corresponding author: TANG Yiwen (b. 1993), female, from Shanghai, assistant researcher at the Shanghai Municipal Health Commission, mainly engaged in nursing research. Tel: 86-13744386746; E-mail: liuchunhui869@163.com
  • About the author: TANG Lingyi (b. 1994), male, from Ningbo, Zhejiang, assistant engineer at Shanghai Jiao Tong University, mainly engaged in medical informatics research. Tel: 86-13122389759; E-mail: goyong7894@163.com
  • Supported by:
    Natural Science Foundation of Shanghai (16GR137510)

Abstract: Massive data sets have complex structures and features and often exhibit unstructured and small-sample characteristics, which makes it difficult to classify them with both high accuracy and high efficiency. To address this, a big data classification optimization method based on an improved decision tree algorithm is proposed. A fuzzy decision function is constructed to detect the sequence features of the big data, and these features are fed into a decision tree model to mine and train rules. The decision tree model is then improved with the grey wolf optimization algorithm; the improved model is used to simplify and coarsely classify the data, after which a classifier accuracy objective function is established to achieve accurate classification. Experimental results show that the proposed method achieves the highest classification accuracy and the lowest false positive rate while maintaining high overall throughput, which improves classification efficiency.

Key words: decision tree model, grey wolf optimization algorithm, objective function, big data classification, fuzzy decision function

CLC Number: 

  • TP394
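
The abstract names grey wolf optimization as the step that improves the decision tree model but does not say how the two are coupled. As a rough illustration only, the following is a minimal sketch of a generic grey wolf optimizer that minimizes a toy objective. The function name `grey_wolf_optimize`, its parameters, and the toy objective are all assumptions made for this example and are not taken from the paper. In a setting like the one described, the objective would presumably be something like the classifier's error as a function of tree hyperparameters, which is also an assumption.

```python
import random


def grey_wolf_optimize(objective, bounds, n_wolves=20, n_iters=100, seed=0):
    """Minimize `objective` over the box `bounds` with a basic grey wolf optimizer.

    Hypothetical helper for illustration; not the paper's implementation.
    """
    rng = random.Random(seed)
    dim = len(bounds)
    # Random initial pack inside the search box.
    wolves = [[rng.uniform(lo, hi) for lo, hi in bounds] for _ in range(n_wolves)]
    leaders = []  # best three positions seen so far (alpha, beta, delta)

    for it in range(n_iters):
        # Re-rank the pack together with the previous leaders so the best
        # positions found so far are never lost.
        ranked = sorted(wolves + leaders, key=objective)
        leaders = [list(w) for w in ranked[:3]]
        a = 2.0 * (1 - it / n_iters)  # decreases linearly from 2 towards 0

        new_wolves = []
        for w in wolves:
            pos = []
            for d in range(dim):
                estimate = 0.0
                for leader in leaders:
                    A = 2 * a * rng.random() - a   # step coefficient in [-a, a]
                    C = 2 * rng.random()           # leader weighting in [0, 2]
                    D = abs(C * leader[d] - w[d])  # distance to the leader
                    estimate += leader[d] - A * D
                lo, hi = bounds[d]
                # Average the three leader-guided estimates and clip to the box.
                pos.append(min(max(estimate / 3.0, lo), hi))
            new_wolves.append(pos)
        wolves = new_wolves

    best = min(wolves + leaders, key=objective)
    return best, objective(best)


if __name__ == "__main__":
    # Toy objective with its minimum at (1, -2); stands in for a model error.
    toy = lambda p: (p[0] - 1) ** 2 + (p[1] + 2) ** 2
    best, value = grey_wolf_optimize(toy, [(-5, 5), (-5, 5)], seed=1)
    print(best, value)
```

The sketch keeps the three best positions across iterations so that a good candidate is not discarded when the whole pack moves. Replacing `toy` with a function that trains a classifier and returns its validation error would turn this into a hyperparameter search, at the cost of one model fit per evaluation.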