J4, 2012, Vol. 50, Issue (06): 1214-1217.

• Computer Science •

Outlier Detecting Algorithm Based on Clustering and Local Information

ZHANG Qiang1, WANG Chunxia2, ZHAO Jian3, WU Longju3, LI Jingyong3

  1. School of Computer Science, Baicheng Teachers College, Baicheng 137000, Jilin Province, China; 2. Changchun Institute of Optics, Fine Mechanics and Physics, Chinese Academy of Sciences, Changchun 130033, China; 3. College of Computer Science and Technology, Jilin University, Changchun 130012, China
  • Received: 2012-01-18 Online: 2012-11-26 Published: 2012-11-26
  • Contact: LI Jingyong E-mail: lijingyong8888@126.com

Abstract:

Most existing outlier detection algorithms ignore the local information of the data, which lowers their detection accuracy. To address this problem, we propose a new two-phase outlier detection algorithm based on k-means clustering and local information. The algorithm defines a new local outlier factor as the criterion for deciding whether a data object is an outlier, and it improves the detection procedure of traditional outlier detection algorithms. Experimental results show that the algorithm overcomes the shortcomings of existing methods, retains linear time complexity, and finds the outliers in a data set more accurately and effectively.

Key words: outlier detection; k-means clustering; local outlier factor
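The abstract outlines the two phases but does not give the definition of the new local outlier factor. The Python below is only a minimal sketch under stated assumptions: phase 1 runs plain k-means, and phase 2 scores each point with a hypothetical cluster-relative factor, namely its distance to its own centroid divided by the mean member-to-centroid distance of its cluster. That factor, the threshold of 2.0, the first-k-points initialization, and the names `kmeans`, `local_factors` and `detect_outliers` are all illustrative assumptions, not the authors' method; the factor is also not the density-based LOF of Breunig et al.

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]


def dist(a: Sequence[float], b: Sequence[float]) -> float:
    """Euclidean distance between two points."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def kmeans(points: List[Point], k: int, iters: int = 20) -> Tuple[List[List[float]], List[int]]:
    """Phase 1: plain Lloyd's k-means.

    Assumption: centroids are seeded with the first k points so the sketch
    is deterministic; k-means++ or random restarts are the usual choice.
    """
    centroids = [list(p) for p in points[:k]]
    labels = [0] * len(points)
    for _ in range(iters):
        # Assignment step: every point joins its nearest centroid.
        labels = [min(range(k), key=lambda j: dist(p, centroids[j])) for p in points]
        # Update step: move each centroid to the mean of its members.
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # an emptied cluster keeps its previous centroid
                centroids[j] = [sum(axis) / len(members) for axis in zip(*members)]
    return centroids, labels


def local_factors(points: List[Point], centroids: List[List[float]],
                  labels: List[int]) -> List[float]:
    """Phase 2, hypothetical factor: a point's distance to its own centroid
    divided by the mean member-to-centroid distance of that cluster."""
    k = len(centroids)
    d = [dist(p, centroids[lab]) for p, lab in zip(points, labels)]
    totals, counts = [0.0] * k, [0] * k
    for di, lab in zip(d, labels):
        totals[lab] += di
        counts[lab] += 1
    factors = []
    for di, lab in zip(d, labels):
        mean_radius = totals[lab] / counts[lab]
        # A cluster whose members all sit on the centroid gets factor 0.
        factors.append(di / mean_radius if mean_radius > 0 else 0.0)
    return factors


def detect_outliers(points: List[Point], k: int, threshold: float = 2.0) -> List[int]:
    """Indices of points whose local factor exceeds the (assumed) threshold."""
    centroids, labels = kmeans(points, k)
    factors = local_factors(points, centroids, labels)
    return [i for i, f in enumerate(factors) if f > threshold]


if __name__ == "__main__":
    data = [(0.0, 0.0), (10.0, 10.0), (0.0, 1.0), (1.0, 0.0), (1.0, 1.0),
            (10.0, 11.0), (11.0, 10.0), (11.0, 11.0), (5.0, 20.0)]
    print(detect_outliers(data, k=2))  # prints [8]
```

Each k-means iteration costs O(nk) distance computations and the scoring pass is O(n), so for a fixed k and iteration count this sketch is linear in the number of points, which is the kind of cost the abstract claims. Note that an outlier contributes to its own cluster's mean radius, so in very small clusters this simple factor is bounded; a median radius would be a more robust choice.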

CLC number: TP391