Journal of Jilin University (Information Science Edition) ›› 2025, Vol. 43 ›› Issue (5): 1144-1150.

Integrated Classification Method for Regional Economic Big Data Based on Parallel Clustering Algorithm

QI Weiru, BI Peng

  1. City College, Xi’an Jiaotong University, Xi’an 710018, China
  • Received:2025-03-12 Online:2025-09-28 Published:2025-11-20
  • About the author: QI Weiru (b. 1990), female, from Baoding, Hebei; associate professor at Xi’an Jiaotong University; main research interest: innovation-driven development. (Tel) 86-14760945636, (E-mail) qiweiru1990@163.com.
  • Supported by:
    General Special Scientific Research Program of the Education Department of Shaanxi Province (24JK0137)

Abstract: Regional economic data come from diverse sources, including statistical departments, enterprise reports, and sensor data. Their formats, structures, and semantics differ significantly, which makes uniform processing difficult; as a result, data features are hard to extract accurately and classification results are inaccurate. To address this problem, an integrated classification method for regional economic big data based on a parallel clustering algorithm is proposed. First, based on the characteristics of regional economic big data, the purity and neighborhood radius of the data are calculated to identify missing values, which are then corrected and filled. Second, the parallel clustering algorithm randomly divides the filled data into multiple subsets; because it processes the data on multiple nodes in parallel, it significantly improves computational efficiency and meets the requirements of large-scale data processing. Third, feature quantities are extracted from each subset and used to design the base classifiers. Finally, the weight of each base classifier is determined with the data density inside that classifier taken into account, and the outputs of the base classifiers are combined to produce the final integrated classification result. Experimental results show that the proposed method achieves a Davies-Bouldin Index (DBI) of 0.31 in practical application and classifies regional economic big data accurately.
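The partition, base-classifier, and weighted-combination steps described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation, and the abstract does not specify these details: the nearest-centroid base classifier, the density weight (inverse of the mean distance to the class centroid), the thread pool standing in for multi-node processing, and all function names are assumptions made for illustration. The missing-value filling step is omitted.

```python
import math
import random
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor


def split_randomly(samples, n_subsets, seed=0):
    """Shuffle (features, label) pairs and deal them into n_subsets parts."""
    shuffled = list(samples)
    random.Random(seed).shuffle(shuffled)
    return [shuffled[i::n_subsets] for i in range(n_subsets)]


def train_base_classifier(subset):
    """Fit a nearest-centroid classifier on one subset and score its density."""
    groups = defaultdict(list)
    for features, label in subset:
        groups[label].append(features)
    centroids = {
        label: [sum(col) / len(rows) for col in zip(*rows)]
        for label, rows in groups.items()
    }
    # Assumed density proxy: tighter subsets (smaller mean distance to the
    # class centroid) receive a weight closer to 1.
    mean_dist = sum(
        math.dist(features, centroids[label]) for features, label in subset
    ) / len(subset)
    return centroids, 1.0 / (1.0 + mean_dist)


def fit_ensemble(samples, n_subsets=3, seed=0):
    """Train one base classifier per subset; threads stand in for nodes."""
    subsets = [s for s in split_randomly(samples, n_subsets, seed) if s]
    with ThreadPoolExecutor() as pool:
        return list(pool.map(train_base_classifier, subsets))


def predict(models, features):
    """Combine base-classifier outputs by density-weighted voting."""
    votes = defaultdict(float)
    for centroids, weight in models:
        nearest = min(centroids, key=lambda c: math.dist(features, centroids[c]))
        votes[nearest] += weight
    return max(votes, key=votes.get)


# Toy data: two well-separated groups of hypothetical regional indicators.
samples = [
    ((0.0, 0.1), "low"), ((0.2, 0.0), "low"), ((0.1, 0.2), "low"),
    ((0.0, 0.3), "low"), ((0.3, 0.1), "low"), ((0.2, 0.2), "low"),
    ((5.0, 5.1), "high"), ((5.2, 5.0), "high"), ((5.1, 5.2), "high"),
    ((5.0, 5.3), "high"), ((5.3, 5.1), "high"), ((5.2, 5.2), "high"),
]
models = fit_ensemble(samples, n_subsets=3)
print(predict(models, (0.1, 0.1)), predict(models, (5.1, 5.0)))
```

Weighting the votes lets a base classifier trained on a compact subset count for more than one trained on a scattered subset, which is one plausible reading of "considering the internal data density" of each base classifier.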

Key words: parallel clustering algorithm, regional economic big data, big data integration, big data classification, base classifier, correction filling

CLC Number: 

  • TP391