Journal of Jilin University (Science Edition)

• Computer Science •

Uncertain Data Generation Algorithm Based on Outlier Factor

LIU Gang1, TANG Dongkai1, WANG Hongmei1, HU Ming2   

  1. School of Computer Science and Engineering, Changchun University of Technology, Changchun 130012, China; 2. School of Computer Technology and Engineering, Changchun Institute of Technology, Changchun 130012, China
  • Received: 2017-06-20 Online: 2018-07-26 Published: 2018-07-31
  • Contact: HU Ming, E-mail: huming@ccut.edu.cn

Abstract: Based on the representation model of uncertain data, we propose ACUDGen (attribute level continuous uncertain data set generation algorithm), an algorithm that generates attribute-level uncertain data. The algorithm applies the LOF (local outlier factor) outlier detection algorithm and uses the outlier factor of each data object as the parameter that controls the perturbation range of the corresponding uncertain data object. The generated data can therefore follow the distribution characteristics of the original data, which addresses the lack of original-data distribution characteristics in existing work. The experimental results show that the uncertain data sets generated by the proposed algorithm have a better clustering effect, that the influence of outliers on the clustering results is reduced, and that the size of each data object's MBR (minimum bounding rectangle) changes adaptively according to the object's own distribution characteristics.
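The idea summarized in the abstract can be illustrated with a minimal sketch. It is not the authors' ACUDGen implementation: the abstract does not give the formula that maps the outlier factor to the perturbation range, so the linear rule `half_width = base_radius * LOF` used here is an assumption, as are the function names, the `base_radius` and `samples` parameters, and the choice of uniform sampling inside each MBR. The LOF computation follows the standard definition (k-distance, reachability distance, local reachability density) and, for brevity, assumes distinct points and ignores distance ties.

```python
import math
import random


def knn(points, i, k):
    """Return (distance, index) pairs of the k nearest neighbours of points[i]."""
    dists = sorted(
        (math.dist(points[i], q), j) for j, q in enumerate(points) if j != i
    )
    return dists[:k]


def local_outlier_factors(points, k):
    """Standard LOF for each point (assumes distinct points, ignores ties)."""
    neigh = [knn(points, i, k) for i in range(len(points))]
    k_dist = [n[-1][0] for n in neigh]  # distance to the k-th neighbour
    lrd = []
    for n in neigh:
        # Reachability distance of p from neighbour o: max(k-distance(o), d(p, o)).
        reach = [max(k_dist[j], d) for d, j in n]
        lrd.append(len(reach) / sum(reach))  # local reachability density
    # LOF(p): mean ratio of the neighbours' density to p's own density.
    return [
        sum(lrd[j] for _, j in n) / (len(n) * lrd[i])
        for i, n in enumerate(neigh)
    ]


def generate_uncertain(points, k=3, base_radius=0.5, samples=5, seed=0):
    """Attach an LOF-scaled MBR and sampled instances to every point.

    ASSUMPTION: the half-width of the MBR grows linearly with the LOF value;
    the paper's actual scaling rule is not stated in the abstract.
    """
    rng = random.Random(seed)
    lofs = local_outlier_factors(points, k)
    objects = []
    for p, lof in zip(points, lofs):
        half = base_radius * lof  # hypothetical scaling rule
        mbr = [(x - half, x + half) for x in p]  # one interval per attribute
        draws = [
            tuple(rng.uniform(lo, hi) for lo, hi in mbr) for _ in range(samples)
        ]
        objects.append({"center": p, "lof": lof, "mbr": mbr, "samples": draws})
    return objects


if __name__ == "__main__":
    # Five clustered points plus one isolated point (illustrative data only).
    data = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (1.0, 1.0), (0.5, 0.5),
            (10.0, 10.0)]
    for obj in generate_uncertain(data):
        print(obj["center"], round(obj["lof"], 3), obj["mbr"])
```

In this toy data set the isolated point receives the largest outlier factor and therefore the widest MBR, while the clustered points get MBRs of similar size. Whether outliers should receive wider or narrower perturbation ranges in ACUDGen itself cannot be determined from the abstract alone.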

Key words: outlier factor; ACUDGen algorithm; uncertain data; representation model

CLC number:

  • TP391