Journal of Jilin University (Information Science Edition) ›› 2024, Vol. 42 ›› Issue (2): 372-377.


Missing Value Interpolation Algorithm of Unstructured Big Data Based on Transfer Learning

YAN Yuanhai, YANG Liyun

  1. College of Data Science, Guangzhou Huashang College, Zengcheng 511300, China
  • Received: 2023-03-16 Online: 2024-04-10 Published: 2024-04-12
  • About the author: YAN Yuanhai (1985-), male, from Ji'an, Jiangxi, lecturer at Guangzhou Huashang College; his research focuses on big data visualization technology and data analysis algorithms. (Tel) 86-18924273591, (E-mail) yan85028@163.com.
  • Supported by:
    Innovative Strong University Project fund (2017KQNCX266)

Abstract: Massive, multi-faceted unstructured big data generated from digital information can lose information through external interference, damage to data structures, and similar factors. To address this problem, a missing value imputation (interpolation) algorithm for unstructured big data based on transfer learning is proposed. A transfer learning algorithm predicts where values are missing in the unstructured big data. A naive Bayes algorithm then classifies the data features and measures the weights between attributes, which establishes the feature difference vector of each data category and identifies the degree of feature difference. A kernel regression model applies a nonlinear mapping to the missing portion of the data, and a polynomial transformation encoding describes the cross-space complementarity conditions of the data, completing the imputation of missing values. Experimental results show that the proposed algorithm can effectively impute missing values in unstructured big data, achieves good imputation performance, and improves imputation accuracy.

Key words: transfer learning, unstructured big data, missing value imputation, missing value prediction, kernel regression function
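
The abstract names kernel regression as the step that fills in missing values but does not give the kernel, the bandwidth, or how the transfer learning and naive Bayes stages feed into it. As a rough illustration only, the sketch below imputes missing entries with a Nadaraya-Watson kernel regression estimate. The Gaussian kernel, the `bandwidth` parameter, and the function names `gaussian_kernel` and `kernel_regression_impute` are assumptions made for this example and are not taken from the paper.

```python
import math
from typing import List, Optional, Sequence


def gaussian_kernel(u: float) -> float:
    """Unnormalized Gaussian kernel (assumed choice; the constant cancels below)."""
    return math.exp(-0.5 * u * u)


def kernel_regression_impute(
    x: Sequence[float],
    y: Sequence[Optional[float]],
    bandwidth: float = 1.0,
) -> List[float]:
    """Fill None entries of y with a Nadaraya-Watson estimate from observed pairs."""
    observed = [(xi, yi) for xi, yi in zip(x, y) if yi is not None]
    if not observed:
        raise ValueError("at least one observed value is required")

    filled: List[float] = []
    for xi, yi in zip(x, y):
        if yi is not None:
            # Observed values are kept unchanged.
            filled.append(yi)
            continue
        # Weight every observed point by its kernel distance to the missing position.
        weights = [gaussian_kernel((xi - xo) / bandwidth) for xo, _ in observed]
        total = sum(weights)
        if total == 0.0:
            # All weights underflowed to zero; fall back to the plain mean.
            filled.append(sum(yo for _, yo in observed) / len(observed))
        else:
            weighted = sum(w * yo for w, (_, yo) in zip(weights, observed))
            filled.append(weighted / total)
    return filled


# Hypothetical toy data: one value is missing at x = 2.0.
x = [0.0, 1.0, 2.0, 3.0, 4.0]
y = [0.0, 1.0, None, 3.0, 4.0]
imputed = kernel_regression_impute(x, y, bandwidth=1.0)
```

In this toy case the observed points sit symmetrically around the gap, so the weighted average at `x = 2.0` comes out at approximately 2.0. A smaller `bandwidth` makes the estimate depend more heavily on the nearest observed points, while a larger one smooths toward the overall mean.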

CLC number:

  • TP391