Journal of Jilin University (Engineering and Technology Edition), 2023, Vol. 53, Issue (9): 2659-2665. doi: 10.13229/j.cnki.jdxbgxb.20220598


Design of big data anomaly detection model based on random forest algorithm

Shi-jun SONG1, Min FAN2

  1. School of Transportation and Logistics, Southwest Jiaotong University, Chengdu 610031, China
  2. School of Civil Engineering, Southwest Jiaotong University, Chengdu 610031, China
  • Received: 2022-05-18 Online: 2023-09-01 Published: 2023-10-09
  • Contact: Min FAN E-mail: songshijun2022@yeah.net; fanmin@swjtu.edu.cn

Abstract:

To address the problem that big data anomaly detection is easily disturbed by edge data, which lowers detection accuracy, a big data anomaly detection model based on the random forest algorithm was proposed. First, an improved k-means algorithm was used to cluster the big data, and principal component analysis was used to extract its features. Then a big data anomaly detection model based on a random forest classifier was built: the extracted features were input into the model, decision trees were built, and the classification accuracy of the classifier was improved by dynamically updating the weights of the decision trees. Finally, the classification results were output to complete the anomaly detection. The experimental results show that the detection time of the proposed model is about 25 s, the average anomaly detection accuracy is 91%, and the false alarm rate is 4.5%.
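The abstract does not state the rule used to update the decision tree weights, so the following is only a minimal sketch of one possible scheme, not the authors' method. It assumes a weighted majority vote in which every tree that misclassifies a labelled validation sample has its weight multiplied by a hypothetical `penalty` factor before the weights are renormalised. The function names and the toy votes are illustrative assumptions.

```python
from collections import defaultdict


def weighted_vote(predictions, weights):
    """Return the label with the largest total weight among the tree votes."""
    totals = defaultdict(float)
    for label, weight in zip(predictions, weights):
        totals[label] += weight
    return max(totals, key=totals.get)


def update_weights(weights, predictions, true_label, penalty=0.5):
    """Shrink the weight of each tree that was wrong, then renormalise to sum to 1.

    The multiplicative penalty is an assumption made for illustration only.
    """
    updated = [w if p == true_label else w * penalty
               for w, p in zip(weights, predictions)]
    total = sum(updated)
    return [w / total for w in updated]


# Hypothetical votes of three trees on four labelled validation samples
# (1 = anomalous, 0 = normal). The second tree is wrong every time and
# the third tree is wrong on the last sample.
validation = [
    ([1, 0, 1], 1),
    ([0, 1, 0], 0),
    ([1, 0, 1], 1),
    ([0, 1, 1], 0),
]

weights = [1 / 3, 1 / 3, 1 / 3]
for votes, truth in validation:
    weights = update_weights(weights, votes, truth)

print([round(w, 3) for w in weights])
print(weighted_vote([0, 1, 1], weights))
```

With these toy votes the first tree ends up carrying most of the weight, so a new sample on which it votes 0 against two votes of 1 is still classified as 0, whereas an unweighted majority vote would return 1.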

Key words: big data clustering, feature extraction, principal component analysis, random forest classifier, decision tree, update weights

CLC Number: 

  • TM714

Fig.1 Principle of decision tree classification

Fig.2 Flow chart of the random forest-based detection model

Table 1 Detection time of different methods (s)

Experiment No.    Proposed model    Model of Ref. [3]    Model of Ref. [4]
1                 20                41                   78
2                 22                43                   78
3                 26                41                   74
4                 27                48                   72
5                 24                46                   89
6                 23                68                   78
7                 27                41                   72
8                 27                44                   81
9                 25                47                   77
10                24                48                   78
11                23                44                   79
12                22                47                   78
13                24                49                   87
14                25                41                   75
15                26                54                   77
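As a quick check of the "about 25 s" figure quoted in the abstract, the per-method averages over the 15 runs in Table 1 can be computed as below. The values are copied from the table; the variable names are illustrative.

```python
from statistics import mean

# Detection times in seconds for the 15 experiments listed in Table 1.
times = {
    "proposed": [20, 22, 26, 27, 24, 23, 27, 27, 25, 24, 23, 22, 24, 25, 26],
    "ref3": [41, 43, 41, 48, 46, 68, 41, 44, 47, 48, 44, 47, 49, 41, 54],
    "ref4": [78, 78, 74, 72, 89, 78, 72, 81, 77, 78, 79, 78, 87, 75, 77],
}

# Average detection time per method, rounded to one decimal place.
averages = {name: round(mean(values), 1) for name, values in times.items()}

for name, value in averages.items():
    print(f"{name}: {value} s")
```

The averages come out at roughly 24.3 s for the proposed model, 46.8 s for the model of Ref. [3], and 78.2 s for the model of Ref. [4].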

Fig.3 Accuracy of different models

Fig.4 False positive rate of different models

[1] Liu Yong-hui, Zhang Xian, Sun Hong-yan, et al. Discussion on application of big data in electricity market in background of energy internet[J]. Automation of Electric Power Systems, 2021, 45(11): 1-10.
[2] Jiang Dan, Liang Chun-yan, Wu Jun-ying, et al. Alarm method of power operation data anomaly detection based on big data analysis[J]. China Measurement & Test, 2020, 46(7): 18-23.
[3] Wan Lei, Chen Cheng, Huang Wen-jie, et al. Power abnormity detection method based on power big data applying BRB and LSTM network[J]. Electric Power Construction, 2021, 42(8): 38-45.
[4] Li Qing. Power big data anomaly detection method based on an improved PSO-PFCM clustering algorithm[J]. Power System Protection and Control, 2021, 49(18): 161-166.
[5] Ding Xiao-ou, Yu Sheng-jian, Wang Mu-xian, et al. Anomaly detection on industrial time series based on correlation analysis[J]. Journal of Software, 2020, 31(3): 726-747.
[6] Xie Hua, Chen Hao, Deng Xiao-yang, et al. Electric-gas integrated energy system operational risk assessment based on improved k-means clustering technology and semi-invariant method[J]. Proceedings of the CSEE, 2020, 40(1): 59-69, 374.
[7] Wu Jin-wei. The joint asymptotic distribution of probability density function in a finite number of points under φ-mixing samples[J]. Journal of Xinyang Normal University (Natural Science Edition), 2021, 34(4): 541-544.
[8] Zhang Zhong-yuan, Hu Huan, Cheng Huai-hao, et al. Diagnostic method to determine degree and type of winding deformation in power transformer based on Euclidean distance[J]. High Voltage Apparatus, 2020, 56(1): 224-230.
[9] Dai Jin, Chen Ying. Joint linear discrimination and graph regularization for task-oriented cross-modal retrieval[J]. Journal of Computer-Aided Design & Computer Graphics, 2021, 33(1): 106-115.
[10] Cai Rui-chu, Li Jia-hao, Hao Zhi-feng. Unsupervised domain adaptive algorithm with intra-class maximum mean discrepancy[J]. Application Research of Computers, 2020, 37(8): 2371-2375.
[11] Hu Shan-ke, Qin Yu-hua, Duan Ru-min, et al. Research on feature extraction of near-infrared spectroscopy based on joint matrix local preserving projection[J]. Spectroscopy and Spectral Analysis, 2020, 40(12): 3772-3777.
[12] Wu Zheng, Zhang Yue, Dong Ze. Outlier detection during thermal processes based on improved Gaussian mixture model[J]. Journal of System Simulation, 2023, 35(5): 1020-1033.
[13] Xie Hua, Chen Jun-xing, Zhao Yu-ming, et al. Knowledge acquisition method of power transformer condition assessment based on SMOTE and decision tree algorithm[J]. Electric Power Automation Equipment, 2020, 40(2): 137-142.
[14] Cai Rui-chu, Bai Yi-ming, Qiao Jie, et al. Causal inference method based on confounder hidden compact representation model[J]. Journal of Computer Applications, 2021, 41(10): 2793-2798.
[15] Zhang Qing-hua, Pang Guo-hong, Li Xin-tai, et al. Optimal granularity selection method based on cost-sensitive sequential three-way decisions[J]. Journal of Electronics & Information Technology, 2021, 43(10): 3001-3009.