Journal of Jilin University (Science Edition)

• Computer Science •

An Active Semi-supervised Clustering Algorithm Based on Seeds Set and Pairwise Constraints

陈志雨1, 王慧君1, 胡明2, 刘 钢1   

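  1. College of Computer Science and Engineering, Changchun University of Technology, Changchun 130012; 2. Office of the President, Changchun Institute of Technology, Changchun 130012
  • Received: 2016-09-09 Online: 2017-05-26 Published: 2017-05-31
  • Corresponding author: LIU Gang E-mail: lg@ccut.edu.cn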

An Active Semi-supervised Clustering Algorithm Based on Seeds Set and Pairwise Constraints

CHEN Zhiyu1, WANG Huijun1, HU Ming2, LIU Gang1   

  1. College of Computer Science and Engineering, Changchun University of Technology, Changchun 130012, China; 2. Office of Principal, Jilin Vocational and Technical Institute Communications, Changchun 130012, China
  • Received:2016-09-09 Online:2017-05-26 Published:2017-05-31
  • Contact: LIU Gang E-mail:lg@ccut.edu.cn

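Abstract: To address two problems in semi-supervised clustering algorithms, namely that supervision information is used insufficiently and that the supervision information has low information content, we propose a semi-supervised clustering algorithm combined with active learning. First, the class labels of the data and pairwise constraint information are used together to guide the Kmeans clustering process, yielding a semi-supervised clustering algorithm based on a Seeds set and pairwise constraints, SC-Kmeans. Second, active learning is introduced into SC-Kmeans to select supervision information with higher information content at as low a cost as possible, improving the clustering accuracy of SC-Kmeans. Finally, simulation experiments are carried out on standard UCI data sets. The experimental results show that the algorithm achieves good clustering results and effectively improves clustering accuracy.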

Key words: Seeds set, active learning, pairwise constraints, semi-supervised clustering, Kmeans algorithm
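As a rough illustration of the first step described in the abstract (labels from a Seeds set plus pairwise constraints guiding Kmeans), the sketch below combines two standard ideas: initializing each centroid from the labeled seeds of its class (as in Seeded-KMeans) and assigning every other point to the nearest cluster that breaks no must-link or cannot-link constraint (as in COP-KMeans). The abstract does not give SC-Kmeans' actual update rules, so this pairing, the hypothetical names `sc_kmeans_sketch` and `violates`, and the requirement that every class have at least one seed are assumptions for illustration only.

```python
def squared_distance(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def centroid(points):
    """Coordinate-wise mean of a non-empty list of points."""
    return tuple(sum(coords) / len(points) for coords in zip(*points))


def violates(i, cluster, labels, must_link, cannot_link):
    """True if putting point i into `cluster` breaks a constraint with an already-assigned point."""
    for a, b in must_link:
        other = b if a == i else a if b == i else None
        if other is not None and labels[other] is not None and labels[other] != cluster:
            return True
    for a, b in cannot_link:
        other = b if a == i else a if b == i else None
        if other is not None and labels[other] == cluster:
            return True
    return False


def sc_kmeans_sketch(X, k, seeds, must_link=(), cannot_link=(), max_iter=100):
    """Hypothetical sketch, NOT the paper's SC-Kmeans: seeded init + COP-KMeans-style assignment.

    seeds maps a point index to its known class in range(k); each class needs >= 1 seed (assumption).
    """
    # Seeded initialization: each centroid is the mean of that class's labeled seeds.
    centroids = [centroid([X[i] for i, c in seeds.items() if c == j]) for j in range(k)]
    labels = None
    for _ in range(max_iter):
        new_labels = [None] * len(X)
        for i, c in seeds.items():
            new_labels[i] = c  # labeled seeds keep their class
        for i in range(len(X)):
            if new_labels[i] is not None:
                continue
            # Try clusters from nearest to farthest; take the first one that breaks no constraint.
            order = sorted(range(k), key=lambda c: squared_distance(X[i], centroids[c]))
            for j in order:
                if not violates(i, j, new_labels, must_link, cannot_link):
                    new_labels[i] = j
                    break
            else:
                raise ValueError(f"no feasible cluster for point {i}")
        if new_labels == labels:
            break  # assignments stable: converged
        labels = new_labels
        for j in range(k):
            members = [X[i] for i in range(len(X)) if labels[i] == j]
            if members:  # keep the old centroid if a cluster empties
                centroids[j] = centroid(members)
    return labels
```

For two well-separated groups `[(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]` with seeds `{0: 0, 3: 1}`, the plain call returns `[0, 0, 0, 1, 1, 1]`, while adding `cannot_link=[(1, 2)]` returns `[0, 0, 1, 1, 1, 1]`: point 2 is nearer the first centroid but may not share a cluster with point 1.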

Abstract: To address the insufficient use of supervision information in semi-supervised clustering algorithms and the low information content of that supervision information, we proposed a semi-supervised clustering algorithm based on active learning. Firstly, we designed a semi-supervised clustering algorithm based on a Seeds set and pairwise constraints (SC-Kmeans), which uses labeled data and pairwise constraints together to guide the clustering process of the Kmeans algorithm. Secondly, we introduced active learning into SC-Kmeans in order to select supervision information with higher information content at as low a cost as possible, thereby improving the clustering accuracy of the SC-Kmeans algorithm. Finally, simulation experiments were performed on standard data sets from the UCI Machine Learning Repository. The experimental results show that the proposed algorithm achieves good clustering results and effectively improves clustering accuracy.

Key words: semi-supervised clustering, pairwise constraint, Kmeans algorithm, active learning, Seeds set
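The abstract does not say how active learning picks the supervision information to request. One common heuristic, swapped in here as an assumption rather than taken from the paper, is margin-based uncertainty sampling: query the unlabeled points whose distances to their two nearest centroids are most similar, since the current clustering is least sure about them. The names `select_queries` and `active_round`, and the `oracle` callable (standing in for an annotator that returns a class label), are hypothetical.

```python
import math


def select_queries(X, centroids, labeled, budget):
    """Return up to `budget` unlabeled indices, most ambiguous first (smallest margin).

    Assumes at least two centroids. `labeled` is any container of already-labeled indices.
    """
    scored = []
    for i, x in enumerate(X):
        if i in labeled:
            continue
        d = sorted(math.dist(x, c) for c in centroids)
        margin = d[1] - d[0]  # small margin = point sits near a cluster boundary
        scored.append((margin, i))
    scored.sort()
    return [i for _, i in scored[:budget]]


def active_round(X, centroids, seeds, oracle, budget):
    """Ask the hypothetical oracle to label the most ambiguous points and add them to the seeds."""
    for i in select_queries(X, centroids, seeds, budget):
        seeds[i] = oracle(i)
    return seeds
```

Each answered query adds a new seed that a seeded, constrained Kmeans run can reuse; a pairwise-constraint variant would instead ask the oracle whether the ambiguous point belongs with a seed of its nearest cluster, and record the answer as a must-link or cannot-link.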

CLC number: TP181