Journal of Jilin University (Engineering and Technology Edition), 2023, Vol. 53, Issue 10: 2917-2922. doi: 10.13229/j.cnki.jdxbgxb.20220689


Detection method of abnormal data in cube based on spectral clustering

Shi-jun SONG1, Min FAN2

  1. School of Transportation and Logistics, Southwest Jiaotong University, Chengdu 610031, China
  2. School of Civil Engineering, Southwest Jiaotong University, Chengdu 610031, China
  • Received: 2022-06-12 Online: 2023-10-01 Published: 2023-12-13
  • Contact: Min FAN E-mail: songshijun20220@yeah.net; fanmin@swjtu.edu.cn

Abstract:

Existing methods for detecting abnormal data in a cube lack a dimensionality reduction step, which leads to low detection accuracy, a high false detection rate, and long detection times. A cube abnormal data detection method based on spectral clustering is therefore proposed. The data in the multidimensional data set are first clustered through the Laplace matrix to obtain a preliminary classification. The locally linear embedding (LLE) algorithm then reduces the dimension of the classified data, expressing the high-dimensional data set with eigenvectors and removing its redundant information. Finally, the processed multidimensional data set is input into a support vector machine model, and abnormal data are detected from the calculated regression estimates. Experimental results show that the proposed method has higher accuracy, a lower false detection rate, and a shorter detection time.
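The clustering step named in the abstract can be illustrated with a minimal sketch. The abstract does not specify the similarity function, the form of the Laplace matrix, or the number of clusters, so everything concrete here is an assumption made for illustration: a Gaussian affinity with a hypothetical `sigma` parameter, the unnormalized Laplacian L = D - W, and a split into two clusters by the sign of the eigenvector for the second-smallest eigenvalue (the Fiedler vector), approximated by power iteration. The function names are hypothetical, and the LLE and support vector machine stages are not shown.

```python
import math
import random


def gaussian_affinity(points, sigma=1.0):
    """Similarity matrix W with W[i][j] = exp(-||xi - xj||^2 / (2 sigma^2))."""
    n = len(points)
    W = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            d2 = sum((a - b) ** 2 for a, b in zip(points[i], points[j]))
            W[i][j] = W[j][i] = math.exp(-d2 / (2.0 * sigma ** 2))
    return W


def laplacian(W):
    """Unnormalized graph Laplacian L = D - W, where D holds the row sums of W."""
    n = len(W)
    degrees = [sum(row) for row in W]
    return [[(degrees[i] if i == j else 0.0) - W[i][j] for j in range(n)]
            for i in range(n)]


def fiedler_vector(L, iters=300, seed=0):
    """Approximate the eigenvector of L for its second-smallest eigenvalue.

    Power iteration is run on c*I - L. The constant vector (eigenvalue 0 of L)
    is projected out at every step by subtracting the mean.
    """
    n = len(L)
    # Gershgorin bound: every eigenvalue of L is at most twice the largest degree.
    c = 2.0 * max(L[i][i] for i in range(n))
    rng = random.Random(seed)
    v = [rng.random() for _ in range(n)]
    for _ in range(iters):
        mean = sum(v) / n
        v = [x - mean for x in v]
        norm = math.sqrt(sum(x * x for x in v))
        v = [x / norm for x in v]
        v = [c * v[i] - sum(L[i][j] * v[j] for j in range(n)) for i in range(n)]
    mean = sum(v) / n
    return [x - mean for x in v]


def spectral_bipartition(points, sigma=1.0):
    """Label each point 0 or 1 by the sign of its Fiedler-vector entry."""
    L = laplacian(gaussian_affinity(points, sigma))
    return [1 if x >= 0 else 0 for x in fiedler_vector(L)]


# Toy data (made up for this sketch): two well-separated groups of three points.
points = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0),
          (5.0, 5.0), (5.0, 6.0), (6.0, 5.0)]
labels = spectral_bipartition(points)
print(labels)
```

On the toy data the first three points receive one label and the last three the other; which group is called 0 or 1 depends on the sign of the computed eigenvector. A practical implementation would more likely use a numerical library's eigen-solver and several eigenvectors followed by k-means rather than a single sign split.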

Key words: Laplace matrix, spectral clustering, data dimensionality reduction, cube, support vector machine algorithm

CLC Number: 

  • TP393

Fig.1 Anomaly data detection process

Table 1 Information statistics of the databases

Dataset name         Number of attributes, number of data records
Isolet               6737
Multiple Features    1018 282
KDDCUP1999           1525 021

Fig.2 Detection accuracy of three methods

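Table 2 False detection rate of three methods

Method       Parameter                   Isolet database   Multiple Features database   KDDCUP1999 database
This paper   False detections (count)    1                 0                            2
             False detection rate (%)    2                 0                            4
Ref. [3]     False detections (count)    5                 8                            10
             False detection rate (%)    8                 12                           20
Ref. [4]     False detections (count)    10                12                           18
             False detection rate (%)    20                22                           32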

Fig.3 Detection time of three methods

1 Zhao Chen-xiao, Xue Hui-feng, Wang Lei, et al. Water consumption abnormal data detection method based on isolation forest[J]. Journal of China Institute of Water Resources and Hydropower Research, 2020, 18(1): 31-39.
2 Li Chen, Wang Bu-hong, Tian Ji-wei, et al. Anomaly detection method for UAV sensor data based on LSTM-OCSVM[J]. Journal of Chinese Computer Systems, 2021, 42(4): 700-705.
3 Wu Jin-e, Wang Ruo-yu, Duan Qian-qian, et al. Collective data anomaly detection based on reverse k-nearest neighbor filtering[J]. Journal of Shanghai Jiaotong University, 2021, 55(5): 598-606.
4 Qiu Kai, Jiang Ying. A service running data anomaly detection method based on weighted LOF and context judgment in cloud environment[J]. Computer Engineering and Science, 2020, 42(6): 951-958.
5 Qiu Yuan, Chang Xiang-mao, Qiu Qian, et al. Stream data anomaly detection method based on long short-term memory network and sliding window[J]. Journal of Computer Applications, 2020, 40(5): 1335-1339.
6 Wang Qiu-ping, Ding Cheng, Wang Xiao-feng. A hybrid data clustering algorithm based on improved krill herd algorithm and KHM clustering[J]. Control and Decision, 2020, 35(10): 2449-2458.
7 Shi Xian-feng, Liu Xue-jun, Zhang Li. PUseqClust: a clustering analysis method for RNA-Seq data[J]. Journal of Software, 2019, 30(9): 2857-2868.
8 Qian Xiao-dong, Luo Yan-fu. Incomplete data clustering algorithm based on mutual information attributes ranking[J]. Information and Control, 2019, 48(1): 80-87.
9 Liu Ying, Zhang Yan-bang. Application of Laplacian matrix in clustering[J]. Journal of Tianjin University of Science & Technology, 2019, 34(3): 76-80.
10 Wei Shi-chao, Li Xin, Zhang Yi-chi, et al. Dimension reduction and visualization of mixed-type data based on E-t-SNE[J]. Computer Engineering and Applications, 2020, 56(6): 66-72.
11 Guo Fang-fang, Lv Hong-wu, Ren Wei-lin, et al. Reduction algorithm based on supervised discriminant projection for network security data[J]. Journal on Communications, 2021, 42(6): 84-93.
12 Li Shi-bo, Lin Hui, Ge Miao. Hyperspectral dimensionality reduction and classification of the East Dongting Lake wetland vegetation[J]. Journal of Central South University of Forestry & Technology, 2019, 39(11): 36-41.
13 Liu Li, Li Yong, Cao Yi-jia, et al. Transient rotor angle stability prediction method based on SVM and LSTM network[J]. Electric Power Automation Equipment, 2020, 40(2): 129-139.
14 Lei Qing-zhu, Qin Yong-song. Empirical likelihood for nonparametric regression functions under strong mixing samples[J]. Acta Mathematicae Applicatae Sinica, 2019, 42(2): 179-196.
15 Zhao Hai-yan, Liu Kun, Wang Ting-mei, et al. Simulation of abnormal information acquisition for network text implication relationship recognition[J]. Computer Simulation, 2020, 37(8): 256-260.