Journal of Jilin University (Engineering and Technology Edition), 2023, Vol. 53, Issue (4): 1181-1186. doi: 10.13229/j.cnki.jdxbgxb.20220087

Fast outlier mining algorithm in uncertain data set based on spectral clustering

Yao-long KANG1, Li-lu FENG2, Jing-an ZHANG3, Su-e CAO1

  1. School of Computer and Network Engineering, Shanxi Datong University, Datong 037009, China
  2. School of Education Science and Technology, Shanxi Datong University, Datong 037009, China
  3. Computer Network Center, Shanxi Datong University, Datong 037009, China
  • Received: 2022-01-22  Online: 2023-04-01  Published: 2023-04-20

Abstract:

Existing outlier mining methods do not extract relevant data features before mining, which leads to long mining time, poor mining results and low mining performance. To address these problems, a fast outlier mining algorithm for uncertain data sets based on spectral clustering is proposed. The algorithm first calculates the similarity of the data over unequal-length sequences and uses the partial least squares method to extract the features of the uncertain data set. It then applies a spectral clustering algorithm to these features to obtain the outlier index of the data. Finally, outlier mining of the uncertain data set is completed according to the outlier index. The experimental results show that the algorithm offers short mining time, good mining results and high mining performance.
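
The steps in the abstract can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the abstract does not define the outlier index, so the function name `spectral_outlier_index`, its parameters (`n_components`, `sigma`, `k`) and the index itself (mean distance to the k nearest neighbours in a normalized spectral embedding) are all hypothetical. The unequal-length-sequence similarity and the partial least squares feature extraction are not reproduced; a plain Gaussian affinity on fixed-length feature vectors stands in for them.

```python
import numpy as np


def spectral_outlier_index(X, n_components=3, sigma=1.0, k=3):
    """Hypothetical outlier index (an assumption, not the paper's definition):
    mean distance to the k nearest neighbours in a spectral embedding."""
    X = np.asarray(X, dtype=float)
    # Gaussian (RBF) affinity; the diagonal stays 1, so every degree is positive.
    sq_dists = ((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=-1)
    W = np.exp(-sq_dists / (2.0 * sigma ** 2))
    # Symmetrically normalized affinity D^{-1/2} W D^{-1/2}.
    d_inv_sqrt = 1.0 / np.sqrt(W.sum(axis=1))
    S = W * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]
    # eigh returns eigenvalues in ascending order; keep the largest ones.
    _, vecs = np.linalg.eigh(S)
    U = vecs[:, -n_components:]
    # Row-normalize so each point lies on the unit sphere of the embedding.
    U = U / np.linalg.norm(U, axis=1, keepdims=True)
    # Pairwise distances in the embedding; after sorting each row,
    # column 0 is the zero self-distance and is skipped.
    emb = np.linalg.norm(U[:, None, :] - U[None, :, :], axis=-1)
    emb.sort(axis=1)
    return emb[:, 1:k + 1].mean(axis=1)


# Toy data: two tight clusters plus one injected far-away point (last row).
rng = np.random.default_rng(0)
cluster_a = rng.normal([0.0, 0.0], 0.3, size=(20, 2))
cluster_b = rng.normal([10.0, 0.0], 0.3, size=(20, 2))
X = np.vstack([cluster_a, cluster_b, [[5.0, 40.0]]])

scores = spectral_outlier_index(X)
print("row with the largest outlier index:", int(np.argmax(scores)))
```

In this toy data set the injected point at (5, 40) is essentially disconnected from both clusters, so it should receive the largest index. Real uncertain data would need a tuned `sigma`, a suitable `n_components`, and the feature-extraction step that the sketch leaves out.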

Key words: spectral clustering algorithm, uncertain data set, data outliers, fast mining, partial least squares method

CLC Number: 

  • TP274

Table 1

Mining time test results of different mining algorithms

Data volume       Mining time (s)
(10^4 records)    Proposed algorithm    Algorithm of Ref. [3]    Algorithm of Ref. [5]
10                45                    40                       50
20                45                    52                       80
30                45                    74                       126
40                55                    100                      186
50                70                    162                      238
60                100                   208                      299
70                174                   285                      363
80                223                   357                      395
90                298                   417                      436
100               349                   489                      525

Fig.1

Mining accuracy test results of different mining algorithms

Fig.2

Test results of false positive rate of different mining algorithms

1 Li Wang-yan, Yu Tong. Research on computer technology-based water conservancy project management information: comment on "water conservancy project management"[J]. Yellow River, 2020, 42(7): 168.
2 Lyu Jiu-heng, Wang Jian-ling, Pan Li-jia, et al. Application characteristics of abdominal acupuncture based on data mining technique[J]. Acupuncture Research, 2020, 45(3): 237-242.
3 Du Xu-sheng, Yu Jiong, Chen Jia-ying, et al. Outlier detection algorithm based on neighborhood system density difference measurement[J]. Application Research of Computers, 2020, 37(7): 1969-1973.
4 Du Xu-sheng, Yu Jiong, Ye Le-le, et al. Outlier detection algorithm based on graph random walk[J]. Journal of Computer Applications, 2020, 40(5): 1322-1328.
5 Zhao Xiao-yong, Wang Ning-ning, Wang Lei. Research of outlier ensemble mining based on active learning[J]. Computer Engineering and Applications, 2020, 56(12): 112-117.
6 Wu Xin-yu, Li Xin-dan, Ma Chao-qun. Measuring VaR based on the information content of option and high-frequency data[J]. Chinese Journal of Management Science, 2021, 29(8): 13-23.
7 Shen Chen-xi, Du Chen-hui, Li Zhen-yu, et al. Differentiation between authentic and adulterated ziziphi spinosae semen by 1H NMR spectroscopy combined with partial least squares[J]. Food Science, 2020, 41(8): 275-281.
8 Xu Sheng-lan, Sicao Ming-zhe, Wan Can, et al. Ensemble spectral clustering algorithm for load profiles considering dual-scale similarities[J]. Automation of Electric Power Systems, 2020, 44(22): 152-160.
9 Wang Qiu-ping, Ding Cheng, Wang Xiao-feng. A hybrid data clustering algorithm based on improved krill herd algorithm and KHM clustering[J]. Control and Decision, 2020, 35(10): 2449-2458.
10 Wang Xiao-hui, Song Xue-kun, Wang Xiao-chuan. Local outlier mining algorithm for heterogeneous data based on neighborhood density[J]. Computer Simulation, 2021, 38(7): 281-285.