Journal of Jilin University (Information Science Edition) ›› 2025, Vol. 43 ›› Issue (2): 231-237.

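Classification Algorithm of Big Data Feature Integration under Deep Learning Mode

PENG Jianxiang

  1. Information Department, Chengdu Integrated TCM&Western Medicine Hospital, Chengdu 610000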
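  • Received: 2023-06-25 Online: 2025-04-08 Published: 2025-04-09
  • About the author: PENG Jianxiang (born 1976), male, from Lezhi, Sichuan, is a senior engineer at Chengdu Integrated TCM&Western Medicine Hospital whose research focuses on medical informatization and information management. (Tel) 86-18011417391 (E-mail) pengjianxiang@163.com.
  • Funding:
     Natural Science Foundation of Sichuan Province (201834646554)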

Classification Algorithm of Big Data Feature Integration under Deep Learning Mode

PENG Jianxiang   

  1. Information Department, Chengdu Integrated TCM&Western Medicine Hospital, Chengdu 610000, China
  • Received: 2023-06-25 Online: 2025-04-08 Published: 2025-04-09

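Abstract: Big data usually comes from different data sources, varies in format, structure, and quality, and contains a large number of redundant features; all of these factors reduce accuracy during feature integration classification. To address this, a big data feature integration classification algorithm under the deep learning mode is designed. A feature extraction model for medical big data is built on deep learning. Because model training introduces a large amount of noise, the extracted features contain some irrelevant information that degrades the feature integration classification result. A stacked sparse denoising autoencoder is therefore used to suppress irrelevant features: a divergence function and a greedy algorithm find the best training parameters, and a loss function sparsifies the irrelevant features in the feature space, yielding the actual data features. A feature integration classification model is built with an Auto-encoder network, and a type constraint function and an objective function are used to obtain the globally optimal integration center of each class, completing the feature integration classification. Experimental results show that the proposed method performs well on medical big data, with macro-average values above 0.95 and fast classification, indicating good classification performance.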

Key words: deep learning, medical big data, feature integration, stacked sparse denoising autoencoder, integration center

Abstract: Big data usually comes from different sources with diverse formats, structures, and quality, and it often contains many redundant features; all of these reduce classification accuracy during feature integration classification. To address this, a deep learning-based feature integration classification algorithm for medical big data is proposed. A feature extraction model is first built on deep learning. Because training this model introduces considerable noise, the extracted features contain irrelevant information that degrades the classification result, so a stacked sparse denoising autoencoder is used to suppress irrelevant features: a divergence function and a greedy algorithm determine the best training parameters, and a loss function sparsifies the irrelevant features in the feature space, leaving the actual data features. A feature integration classification model is then constructed with an autoencoder network, and a type constraint function and an objective function yield the globally optimal integration center of each class, which completes the feature integration classification. Experiments on medical big data show macro-averaged values above 0.95 and fast classification, indicating good classification performance.
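
The abstract reports macro-averaged values without naming the underlying metrics. As a minimal sketch, assuming these are the usual macro-averaged precision, recall, and F1 (each computed per class and then averaged with equal class weight), the calculation could look like the following. The function name `macro_average` and the toy labels are hypothetical and are not taken from the paper.

```python
from collections import Counter


def macro_average(y_true, y_pred):
    """Return (macro precision, macro recall, macro F1).

    Assumption: "macro average" means the unweighted mean of the
    per-class scores, so every class counts equally regardless of size.
    """
    labels = sorted(set(y_true) | set(y_pred))
    tp, fp, fn = Counter(), Counter(), Counter()
    for truth, pred in zip(y_true, y_pred):
        if truth == pred:
            tp[truth] += 1   # correct prediction for this class
        else:
            fp[pred] += 1    # predicted class gains a false positive
            fn[truth] += 1   # true class gains a false negative

    precisions, recalls, f1s = [], [], []
    for c in labels:
        prec = tp[c] / (tp[c] + fp[c]) if tp[c] + fp[c] else 0.0
        rec = tp[c] / (tp[c] + fn[c]) if tp[c] + fn[c] else 0.0
        f1 = 2 * prec * rec / (prec + rec) if prec + rec else 0.0
        precisions.append(prec)
        recalls.append(rec)
        f1s.append(f1)

    n = len(labels)
    return sum(precisions) / n, sum(recalls) / n, sum(f1s) / n


# Hypothetical three-class example (labels are illustrative only).
y_true = ["a", "a", "b", "b", "c", "c"]
y_pred = ["a", "a", "b", "c", "c", "c"]
p, r, f = macro_average(y_true, y_pred)
print(f"macro precision={p:.3f} recall={r:.3f} F1={f:.3f}")
```

Because every class contributes equally to the mean, a macro-averaged score above 0.95 would imply that minority classes are also classified well, which matters when class sizes are imbalanced.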

Key words: deep learning, medical big data, feature integration, stacked sparse denoising autoencoder, integration center

CLC number:

  • TN911