Journal of Jilin University (Science Edition), 2025, Vol. 63, Issue (5): 1404-1410.


K-means Algorithm Based on Adaptive Dynamic Feature Weighting

XUE Lei¹, WANG Tianfang²

  1. Guidance and Service Center for Student Employment and Entrepreneurship, Jilin University, Changchun 130012, China; 2. College of Software, Jilin University, Changchun 130012, China
  • Received: 2025-01-03 Online: 2025-09-26 Published: 2025-09-26
  • Corresponding author: WANG Tianfang, E-mail: wangtf23@mails.jlu.edu.cn

Abstract: The traditional K-means algorithm has three weaknesses when clustering high-dimensional heterogeneous data: it assumes that all features are equally important, so informative features can be overlooked; its results are highly sensitive to the preset number of clusters; and it depends strongly on the choice of initial centroids. To address these problems, we propose an adaptive dynamic feature weighting K-means (ADFW-K-means) algorithm that integrates dynamic feature weighting, K-means++ initialization, elbow-method-assisted selection of the number of clusters, an empty-cluster handling mechanism, and an adaptive cluster-number adjustment strategy. Experiments on the Jilin University dataset of targeted-selection graduates (选调生) from 2022 to 2024 show that, compared with traditional clustering algorithms, ADFW-K-means achieves significant improvements in three core metrics: silhouette coefficient, clustering stability, and business interpretability. The algorithm effectively overcomes the inherent limitations of traditional methods and improves the accuracy and robustness of clustering complex high-dimensional heterogeneous data.
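The abstract names the building blocks of ADFW-K-means but not their formulas. The following is a minimal, hypothetical sketch in pure Python of how such pieces commonly fit together: K-means++ seeding, a feature-weighted squared distance, a per-iteration weight update, re-seeding of empty clusters, and a second-difference elbow heuristic for choosing k. The weighting rule (w_j proportional to exp(-D_j / gamma), where D_j is the mean within-cluster dispersion of feature j), the parameter `gamma`, and all function names are assumptions for illustration, not the authors' method; the paper's adaptive cluster-number adjustment strategy is not reproduced.

```python
import math
import random


def weighted_sq_dist(x, c, w):
    """Feature-weighted squared Euclidean distance: sum_j w_j * (x_j - c_j)^2."""
    return sum(wj * (xj - cj) ** 2 for xj, cj, wj in zip(x, c, w))


def kmeanspp_init(data, k, w, rng):
    """K-means++ seeding: each new center is drawn with probability proportional to D(x)^2."""
    centers = [list(rng.choice(data))]
    while len(centers) < k:
        d2 = [min(weighted_sq_dist(x, c, w) for c in centers) for x in data]
        r = rng.uniform(0, sum(d2))
        acc = 0.0
        for x, d in zip(data, d2):
            acc += d
            if acc >= r and d > 0:
                centers.append(list(x))
                break
        else:  # every point already sits on a center (or rounding): pick uniformly
            centers.append(list(rng.choice(data)))
    return centers


def update_weights(data, labels, centers, gamma):
    """HYPOTHETICAL weighting rule (not the paper's): w_j proportional to exp(-D_j / gamma),
    where D_j is the mean within-cluster dispersion of feature j, so compact features weigh more."""
    n, m = len(data), len(data[0])
    disp = [sum((x[j] - centers[l][j]) ** 2 for x, l in zip(data, labels)) / n for j in range(m)]
    lo = min(disp)  # shift before exponentiating for numerical stability
    raw = [math.exp(-(d - lo) / gamma) for d in disp]
    total = sum(raw)
    return [r / total for r in raw]


def weighted_kmeans(data, k, gamma=1.0, max_iter=100, seed=0):
    """Returns (labels, centers, feature_weights, unweighted_sse); max_iter must be >= 1."""
    if not 1 <= k <= len(data):
        raise ValueError("k must be between 1 and the number of points")
    rng = random.Random(seed)
    m = len(data[0])
    w = [1.0 / m] * m  # start from the classic equal-weight assumption
    centers = kmeanspp_init(data, k, w, rng)
    labels = None
    for _ in range(max_iter):
        # Assignment step under the current feature weights.
        new = [min(range(k), key=lambda c: weighted_sq_dist(x, centers[c], w)) for x in data]
        # Empty-cluster handling: hand each empty cluster the worst-fitting point
        # taken from a cluster that still has more than one member.
        counts = [new.count(c) for c in range(k)]
        for c in range(k):
            if counts[c] == 0:
                donors = [i for i in range(len(data)) if counts[new[i]] > 1]
                far = max(donors, key=lambda i: weighted_sq_dist(data[i], centers[new[i]], w))
                counts[new[far]] -= 1
                new[far], counts[c] = c, 1
        if new == labels:
            break  # assignments are stable: converged
        labels = new
        # Update step: each center becomes the mean of its members.
        for c in range(k):
            members = [x for x, l in zip(data, labels) if l == c]
            centers[c] = [sum(col) / len(members) for col in zip(*members)]
        w = update_weights(data, labels, centers, gamma)  # dynamic re-weighting
    sse = sum(weighted_sq_dist(x, centers[l], [1.0] * m) for x, l in zip(data, labels))
    return labels, centers, w, sse


def elbow_k(data, k_max, **kwargs):
    """Elbow heuristic (k_max >= 3): pick the k where the SSE curve bends most,
    i.e. the largest second difference SSE(k-1) - 2*SSE(k) + SSE(k+1)."""
    sse = {k: weighted_kmeans(data, k, **kwargs)[3] for k in range(1, k_max + 1)}
    return max(range(2, k_max), key=lambda k: sse[k - 1] - 2 * sse[k] + sse[k + 1])
```

For example, `weighted_kmeans(data, 3, gamma=1.0, seed=1)` returns the labels, centers, learned feature weights and plain SSE, while `elbow_k(data, 6)` scans k = 1 to 6. The sketch computes the elbow on unweighted SSE so that runs with different learned weights remain comparable; that is a design choice of this illustration, not something the abstract specifies.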

Key words: adaptive cluster number, dynamic feature weighting, K-means algorithm, clustering algorithm

CLC number: TP18