Journal of Jilin University (Science Edition) ›› 2025, Vol. 63 ›› Issue (6): 1673-1684.

Active Block Diagonal Subspace Clustering Based on Automatic Weighting

LI Xiangli1,2,3, XIE Tengchi1, WEI Jiafeng1

  1. School of Mathematics & Computing Science, Guilin University of Electronic Technology, Guilin 541004, Guangxi Zhuang Autonomous Region, China; 2. Guangxi University Key Laboratory of Data Analysis and Calculation, Guilin 541004, Guangxi Zhuang Autonomous Region, China; 3. Guangxi Applied Mathematics Center, Guilin 541004, Guangxi Zhuang Autonomous Region, China
  • Received: 2025-01-10 Online: 2025-11-26 Published: 2025-11-26
  • Corresponding author: LI Xiangli E-mail: lixiangli@guet.edu.cn

Abstract: Traditional subspace clustering methods based on spectral clustering are prone to interference from outliers in high-dimensional data, which degrades their clustering performance. To address this problem, we propose an active block diagonal subspace clustering method based on automatic weighting. The method first assigns a weight to each data point and identifies outliers through the differences among these weights. Once the outliers are identified, it actively reduces their contribution to the representation matrix, which yields a better representation matrix and improves the clustering performance of the model. Experimental results on 10 datasets against 8 comparison algorithms show that, on datasets containing 10% or 20% outliers, the proposed method generally outperforms the comparison algorithms in average clustering accuracy, normalized mutual information, and adjusted Rand index. In general clustering tasks, it performs best or ranks in the top three on more than half of the datasets. The method can therefore handle the clustering of high-dimensional data with outliers efficiently while remaining competitive in general clustering tasks. It offers an effective way to improve the robustness of high-dimensional data clustering and has high practical application value.

Key words: subspace clustering, outliers, automatic weighting, block diagonal method
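
The abstract describes the pipeline only in words (weight each point, flag outliers by their weights, then reduce their contribution to the representation matrix) and does not state the objective function. The following is therefore only a rough sketch of the general idea and not the authors' model. It swaps in a ridge-regularized least-squares self-representation for the block diagonal formulation and a median-ratio residual rule for the automatic weighting. All function names, the parameter `lam`, and the synthetic data generator are hypothetical.

```python
import numpy as np


def self_representation(X, lam=1e-2):
    """Ridge-regularized self-representation with a zero diagonal.

    X has shape (d, n); each column is one data point. Column j of the
    returned (n, n) matrix C expresses x_j using all the other points.
    NOTE: a stand-in for the paper's block diagonal model (assumption).
    """
    n = X.shape[1]
    C = np.zeros((n, n))
    for j in range(n):
        others = np.delete(np.arange(n), j)
        D = X[:, others]                       # dictionary without x_j
        G = D.T @ D + lam * np.eye(n - 1)      # regularized Gram matrix
        C[others, j] = np.linalg.solve(G, D.T @ X[:, j])
    return C


def outlier_weights(X, C, eps=1e-12):
    """Weights in (0, 1]: a large reconstruction residual gives a small weight.

    NOTE: the median-ratio rule is an illustrative assumption, not the
    automatic weighting scheme of the paper.
    """
    residual = np.linalg.norm(X - X @ C, axis=0)
    return np.minimum(1.0, np.median(residual) / (residual + eps))


def down_weighted_affinity(C, w):
    """Symmetric affinity in which low-weight points contribute less."""
    A = 0.5 * (np.abs(C) + np.abs(C).T)
    return w[:, None] * A * w[None, :]


def make_demo_data(seed=0, d=30, per_subspace=20, n_outliers=4):
    """Two random 3-dimensional subspaces plus random outliers (synthetic)."""
    rng = np.random.default_rng(seed)
    blocks = []
    for _ in range(2):
        basis = rng.standard_normal((d, 3))
        blocks.append(basis @ rng.standard_normal((3, per_subspace)))
    blocks.append(rng.standard_normal((d, n_outliers)))  # outliers come last
    X = np.hstack(blocks)
    return X / np.linalg.norm(X, axis=0, keepdims=True)


X = make_demo_data()
C = self_representation(X)
w = outlier_weights(X, C)
A = down_weighted_affinity(C, w)
```

In this sketch the last four columns of `X` are the synthetic outliers, so their entries in `w` should be noticeably smaller than those of the subspace points. The matrix `A` could then be passed to any spectral clustering routine, and the resulting labels scored with accuracy, normalized mutual information, and the adjusted Rand index as in the reported experiments.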

CLC Number:

  • TP181