Journal of Jilin University (Science Edition), 2023, Vol. 61, Issue (3): 665-670.

基于K-medoids聚类算法的多源信息数据集成算法

祝鹏, 郭艳光   

  1. 内蒙古农业大学 计算机技术与信息管理系, 内蒙古 包头 014109
  • Received: 2022-03-28 Published: 2023-05-26 Released: 2023-05-26
  • Corresponding author: GUO Yanguang (郭艳光) E-mail: guoyanguang@imau.edu.cn

Multi-source Information Data Integration Algorithm Based on K-Medoids Clustering Algorithm

ZHU Peng, GUO Yanguang   

  1. Department of Computer Technology and Information Management, Inner Mongolia Agricultural University, Baotou 014109, Inner Mongolia Autonomous Region, China
  • Received: 2022-03-28 Online: 2023-05-26 Published: 2023-05-26

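Abstract: Integrating multi-source information data is difficult because the similarity between source domains is low and hard to determine. To address this, an integration method based on the K-medoids clustering algorithm is proposed. The clustering of multi-source data is first treated as a transfer learning process: the initial sample weights are determined, the learning characteristics of the weights and the expected loss of the training samples are recorded at each iteration, and these characteristic parameters are then used to decide whether a data item belongs to the source domain or the target domain. Clustering in the integration algorithm is then recast as a diversified domain-value labeling problem. Once the data have clustering characteristics, the weight factors between the data to be integrated in the source domain and in the target domain are computed, the coverage property of the weight factors is used to determine the amount of interactive information between the two, and data with a high amount of information are integrated to ensure the success rate of integration. Simulation results show that the algorithm integrates efficiently both on stable, small datasets and on disordered, large and heterogeneous datasets, with few secondary integrations and low overall cost.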

Key words: K-medoids clustering algorithm, multi-source data, source domain, target domain, amount of interactive information

Abstract: To address the difficulty of integrating multi-source information data, which is caused by the low and hard-to-determine similarity between source domains, we propose an integration method based on the K-medoids clustering algorithm. First, the clustering process of multi-source data is regarded as a transfer learning process: the weights of the initial samples are determined, the learning characteristics of the weight and expected loss of the training samples are recorded in each iteration, and these characteristic parameters are then used to determine whether the data belong to the source domain or the target domain. Then the clustering in the integration algorithm is transformed into a diversified domain value labeling problem. Once the data have clustering characteristics, the weight factors between the data to be integrated in the source domain and the target domain are calculated, the amount of interactive information between them is determined from the coverage characteristics of the weight factors, and the data with a high amount of information are integrated to ensure the success rate of integration. The simulation results show that the proposed algorithm achieves efficient integration with few secondary integrations and low overall consumption, both on stable, small datasets and on disordered, larger and more mixed datasets.

Key words: K-medoids clustering algorithm, multi-source data, source domain, target domain, amount of interactive information

CLC number:

  • TP393.09