Journal of Jilin University (Science Edition) ›› 2025, Vol. 63 ›› Issue (6): 1723-1730.

An Improved BIRCH Algorithm Integrating Autoencoder and Dynamic Threshold Strategy

WANG Shoujia1, GUO Dongwei2, SHI Zenan3, MO Jinyang3, LIU Hengbin2

  1. Human Resources Department, Jilin University, Changchun 130012, China; 2. College of Software, Jilin University, Changchun 130012, China; 3. College of Computer Science and Technology, Jilin University, Changchun 130012, China
  • Received: 2024-12-25 Online: 2025-11-26 Published: 2025-11-26
  • Corresponding author: WANG Shoujia E-mail: wangshoujia@jlu.edu.cn

Abstract: The traditional BIRCH algorithm tends to produce clusters with excessive intra-cluster variance, or to merge clusters excessively, when applied to indicator data with strongly correlated features and an uneven distribution. To address this problem, we propose an improved BIRCH algorithm that integrates an autoencoder with a dynamic threshold strategy. First, the algorithm uses an autoencoder for nonlinear feature mapping and dimensionality reduction, which weakens the influence of inter-feature correlation on the distance metric and makes the data representation more compact and discriminative. Second, a dynamic threshold adjustment strategy adaptively adjusts the clustering radius according to local sample density and cluster size, improving the algorithm's adaptability to non-uniformly distributed data. Finally, a clustering feature tree is built in the improved feature space under the adaptive threshold strategy to achieve efficient and stable hierarchical clustering, and the method is applied to the intelligent clustering analysis of multidimensional data on university teachers. Experimental results show that the improved algorithm achieves better performance on multiple clustering evaluation metrics and significantly improves the stability and accuracy of clustering.

Key words: BIRCH algorithm, autoencoder, dynamic threshold, teacher evaluation

CLC Number: 

  • TP391
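
The abstract above describes a clustering radius that adapts to local sample density and cluster size, but it does not give the rule itself. The following is a minimal sketch of that idea, not the authors' implementation. The clustering feature (N, LS, SS) and the radius formula are those of standard BIRCH. Everything else is an assumption made for illustration: the function `dynamic_threshold`, its parameters `alpha` and `beta`, the direction in which density and size move the threshold, and the caller-supplied `density_of` callable. The sketch keeps a flat list of leaf entries instead of a full clustering feature tree, and it omits the autoencoder step.

```python
import math
from dataclasses import dataclass, field


@dataclass
class CF:
    """Clustering feature (N, LS, SS) as used by standard BIRCH."""
    n: int = 0
    ls: list = field(default_factory=list)  # per-dimension linear sum
    ss: float = 0.0                         # sum of squared norms

    def add(self, x):
        """Absorb point x by updating the three summary statistics."""
        if not self.ls:
            self.ls = [0.0] * len(x)
        self.n += 1
        self.ls = [a + b for a, b in zip(self.ls, x)]
        self.ss += sum(v * v for v in x)

    def centroid(self):
        return [v / self.n for v in self.ls]

    def radius(self):
        """R = sqrt(SS/N - ||LS/N||^2), clamped against rounding error."""
        c2 = sum(v * v for v in self.centroid())
        return math.sqrt(max(self.ss / self.n - c2, 0.0))

    def radius_if_added(self, x):
        """Radius the entry would have after absorbing x (no mutation)."""
        n = self.n + 1
        ls = [a + b for a, b in zip(self.ls, x)]
        ss = self.ss + sum(v * v for v in x)
        c2 = sum((v / n) ** 2 for v in ls)
        return math.sqrt(max(ss / n - c2, 0.0))


def dynamic_threshold(base_t, n, density, alpha=0.1, beta=0.5):
    """HYPOTHETICAL rule, not taken from the paper.

    Loosens the threshold slowly as the entry grows (log of size) and
    tightens it where the local density is high.
    """
    return base_t * (1.0 + alpha * math.log1p(n)) / (1.0 + beta * density)


def cluster(points, base_t, density_of):
    """Single pass over points, keeping a flat list of CF entries.

    density_of is an assumed callable returning a non-negative local
    density estimate for a point.
    """
    entries = []
    for x in points:
        # Nearest existing entry by centroid distance, if any.
        best = min(entries,
                   key=lambda e: math.dist(e.centroid(), x),
                   default=None)
        if best is not None:
            t = dynamic_threshold(base_t, best.n, density_of(x))
            if best.radius_if_added(x) <= t:
                best.add(x)
                continue
        new_entry = CF()
        new_entry.add(x)
        entries.append(new_entry)
    return entries
```

Calling `cluster(points, base_t, density_of)` on a list of coordinate tuples returns the CF entries, whose `n`, `centroid()`, and `radius()` summarize each cluster. In a fuller version the points would first be passed through the trained encoder, and the entries would be organized in a tree with node splitting.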