Journal of Jilin University (Engineering and Technology Edition), 2025, Vol. 55, Issue 4: 1396-1405. doi: 10.13229/j.cnki.jdxbgxb.20230751

• Computer Science and Technology •

Intrusion detection method based on ensemble learning and feature selection by PSO-GA

Jun WANG, Chang-fu SI, Kai-peng WANG, Qiang FU

  1. College of Computer Science and Technology, Shenyang University of Chemical Technology, Shenyang 110142, China
  • Received: 2023-07-17 Online: 2025-04-01 Published: 2025-06-19
  • Corresponding author: Qiang FU E-mail: wj_software@hotmail.com; qiang.fu@outlook.com
  • About the author: Jun WANG (1978-), male, professor, Ph.D. Research interest: industrial network security. E-mail: wj_software@hotmail.com
  • Funding:
    Natural Science Foundation of Liaoning Province (2022-MS-291); National Foreign Expert Project (G2022006008L); Basic Scientific Research Project for Universities of the Liaoning Provincial Department of Education (LJKMZ20220781)

Abstract:

To address security problems in industrial networks, a new intrusion detection method is proposed, with two main contributions. First, to cope with the high dimensionality of the raw data, a hybrid particle swarm optimization-genetic algorithm (PSO-GA) with dynamically adjusted parameters is proposed for feature selection; it selects a feature subset that is meaningful for model training and shortens training time. Second, when building the machine learning model, a Stacking ensemble learning framework is used to generalize over the outputs of multiple models and improve overall prediction accuracy. The method was validated on two datasets. The experimental results show a detection precision of 95% on the public intrusion detection dataset CICIDS-2017, and 93% on a real industrial dataset collected from a gas pipeline control system by Ian Turnipseed of Mississippi State University.

Key words: computer application, industrial control system, intrusion detection, ensemble learning, feature selection

CLC number:

  • TP399
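
The abstract describes the feature selection step only at a high level, so the following is a minimal sketch of one way a PSO-GA hybrid with dynamically adjusted parameters could be arranged, not the authors' implementation. Everything specific here is an assumption made for illustration: the binary (sigmoid) PSO update, the linearly decaying inertia weight and mutation rate, the uniform crossover that replaces the worse half of the swarm, the decision-tree fitness with a small subset-size penalty, and the synthetic data from scikit-learn's `make_classification`.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier


def fitness(mask, X, y):
    """Cross-validated accuracy on the selected columns, lightly penalising subset size."""
    if not mask.any():
        return 0.0
    clf = DecisionTreeClassifier(max_depth=5, random_state=0)
    acc = cross_val_score(clf, X[:, mask], y, cv=3).mean()
    return acc - 0.01 * mask.mean()


def pso_ga_select(X, y, n_particles=8, n_iter=5, seed=0):
    """Hypothetical PSO-GA hybrid: a binary PSO move followed by a GA refresh."""
    rng = np.random.default_rng(seed)
    n_feat = X.shape[1]
    pos = rng.random((n_particles, n_feat)) < 0.5        # boolean feature masks
    vel = rng.uniform(-1.0, 1.0, (n_particles, n_feat))
    fit = np.array([fitness(p, X, y) for p in pos])
    pbest, pbest_fit = pos.copy(), fit.copy()
    g = int(fit.argmax())
    gbest, gbest_fit = pos[g].copy(), fit[g]

    for t in range(n_iter):
        # Assumed dynamic schedules: both parameters shrink as the search converges.
        frac = t / max(n_iter - 1, 1)
        w = 0.9 - 0.5 * frac          # inertia weight, 0.9 -> 0.4
        p_mut = 0.10 - 0.08 * frac    # mutation rate, 0.10 -> 0.02

        # PSO step: velocity update, then sigmoid transfer to a new bit mask.
        x = pos.astype(float)
        r1, r2 = rng.random(pos.shape), rng.random(pos.shape)
        vel = (w * vel
               + 2.0 * r1 * (pbest.astype(float) - x)
               + 2.0 * r2 * (gbest.astype(float) - x))
        vel = np.clip(vel, -4.0, 4.0)
        pos = rng.random(pos.shape) < 1.0 / (1.0 + np.exp(-vel))
        fit = np.array([fitness(p, X, y) for p in pos])

        # GA step: the worse half is replaced by offspring of the better half.
        order = np.argsort(fit)[::-1]
        half = n_particles // 2
        elite, worst = order[:half], order[half:]
        for i in worst:
            pa, pb = pos[rng.choice(elite, size=2, replace=False)]
            child = np.where(rng.random(n_feat) < 0.5, pa, pb)   # uniform crossover
            child ^= rng.random(n_feat) < p_mut                  # bit-flip mutation
            pos[i] = child
            fit[i] = fitness(child, X, y)

        # Update personal and global bests.
        improved = fit > pbest_fit
        pbest[improved], pbest_fit[improved] = pos[improved], fit[improved]
        if pbest_fit.max() > gbest_fit:
            g = int(pbest_fit.argmax())
            gbest, gbest_fit = pbest[g].copy(), pbest_fit[g]
    return gbest, gbest_fit


# Synthetic stand-in data; the paper's datasets are not used here.
X, y = make_classification(n_samples=300, n_features=20, n_informative=5,
                           n_redundant=2, random_state=0)
mask, score = pso_ga_select(X, y)
print("selected feature indices:", np.flatnonzero(mask))
print("fitness of best subset:", round(float(score), 3))
```

The returned boolean mask can be applied as `X[:, mask]` before training a downstream classifier, which is where a reduction in training time would come from.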

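Table 1 Simplified dataset

| No. | Label | Number of samples |
|---|---|---|
| 1 | BENIGN | 22 767 |
| 2 | DoS | 19 035 |
| 3 | PortScan | 7 946 |
| 4 | BruteForce | 2 767 |
| 5 | WebAttack | 2 180 |
| 6 | Bot | 1 966 |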

Table 2 Description of the gas pipeline dataset

| Label | Abbreviation (number) |
|---|---|
| Normal | Normal (0) |
| Naïve Malicious Response Injection | NMRI (1) |
| Complex Malicious Response Injection | CMRI (2) |
| Malicious State Command Injection | MSCI (3) |
| Malicious Parameter Command Injection | MPCI (4) |
| Malicious Function Code Injection | MFCI (5) |
| Denial of Service | DOS (6) |
| Reconnaissance | Recon (7) |

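Fig. 1 Label distribution of CICIDS-2017_sample

Fig. 2 Effect of tree depth on precision

Fig. 3 Loss decreasing with the number of iterations

Fig. 4 Accuracy for different feature subsets

Fig. 5 Stacking ensemble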
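
As a rough illustration of the stacking arrangement named in Fig. 5, the sketch below uses scikit-learn's `StackingClassifier`. It is a sketch under stated assumptions rather than the paper's configuration: the base learners mirror the model families listed in Table 4, but `GradientBoostingClassifier` stands in for extreme gradient boosting to keep the example dependency-free, the logistic-regression meta-learner is an assumption (this page does not say which one was used), and the data are synthetic.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import (GradientBoostingClassifier,
                              RandomForestClassifier, StackingClassifier)
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import f1_score, precision_score, recall_score
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Synthetic three-class data standing in for an intrusion detection dataset.
X, y = make_classification(n_samples=600, n_features=20, n_informative=8,
                           n_classes=3, n_clusters_per_class=1, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0)

# Level-0 learners; their out-of-fold predictions become the meta-learner's inputs.
base_learners = [
    ("dt", DecisionTreeClassifier(max_depth=8, random_state=0)),
    ("rf", RandomForestClassifier(n_estimators=50, random_state=0)),
    ("gb", GradientBoostingClassifier(n_estimators=50, random_state=0)),
]

# Level-1 learner (assumed): logistic regression over the base models' outputs.
stack = StackingClassifier(estimators=base_learners,
                           final_estimator=LogisticRegression(max_iter=1000),
                           cv=5)
stack.fit(X_train, y_train)
y_pred = stack.predict(X_test)

macro_precision = precision_score(y_test, y_pred, average="macro")
macro_recall = recall_score(y_test, y_pred, average="macro")
macro_f1 = f1_score(y_test, y_pred, average="macro")
print(f"precision={macro_precision:.3f} recall={macro_recall:.3f} f1={macro_f1:.3f}")
```

With `cv=5`, the meta-learner is trained on predictions for samples each base model did not see during fitting, which is what keeps the second level from simply memorising the base models' training-set outputs.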

Fig. 6 Flowchart of the proposed model

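Table 3 Confusion matrix for a multi-class task

| | Class 1 | Class 2 | … | Class n |
|---|---|---|---|---|
| Class 1 | A11 | A12 | … | A1n |
| Class 2 | A21 | A22 | … | A2n |
| ⋮ | ⋮ | ⋮ | ⋱ | ⋮ |
| Class n | An1 | An2 | … | Ann |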
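
To make the link between the confusion matrix in Table 3 and the precision, recall, and F1 scores reported later concrete, here is a small sketch. It assumes rows are true classes and columns are predicted classes (this page does not state the orientation), and the 3×3 matrix is made-up illustrative data, not a result from the paper.

```python
import numpy as np


def per_class_metrics(cm):
    """Per-class precision, recall and F1 from a square confusion matrix.

    Assumes cm[i, j] counts samples of true class i predicted as class j.
    """
    cm = np.asarray(cm, dtype=float)
    tp = np.diag(cm)                 # correctly classified samples per class
    predicted = cm.sum(axis=0)       # column sums: samples predicted as each class
    actual = cm.sum(axis=1)          # row sums: samples truly in each class
    precision = np.divide(tp, predicted, out=np.zeros_like(tp), where=predicted > 0)
    recall = np.divide(tp, actual, out=np.zeros_like(tp), where=actual > 0)
    denom = precision + recall
    f1 = np.divide(2 * precision * recall, denom,
                   out=np.zeros_like(tp), where=denom > 0)
    return precision, recall, f1


# Illustrative counts only.
cm = [[8, 2, 0],
      [1, 7, 2],
      [0, 1, 9]]
precision, recall, f1 = per_class_metrics(cm)
print("precision:", precision.round(3))
print("recall:   ", recall.round(3))
print("f1:       ", f1.round(3))
```

If the matrix were transposed (rows as predictions), the precision and recall values would simply swap.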

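Table 4 Comparison before and after ensemble and feature selection

| Method | Precision | Recall | F1 score | CPU time |
|---|---|---|---|---|
| Decision tree | 0.925 | 0.964 | 0.951 | 22.8 s |
| Random forest | 0.862 | 0.996 | 0.906 | 3 min 5 s |
| Extreme gradient boosting | 0.903 | 0.997 | 0.935 | 4 min 14 s |
| Stacking ensemble | 0.936 | 0.975 | 0.912 | 3 min 25 s |
| Stacking ensemble + feature selection | 0.951 | 0.987 | 0.953 | 1 min 30 s |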

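Table 5 Comparison with other methods

| Source | Method | Precision | Recall | F1 score | Number of classes |
|---|---|---|---|---|---|
| This paper | Feature selection + stacking ensemble | 0.951 | 0.987 | 0.953 | 6 |
| Ref. [24] | MLP | 0.871 | 0.995 | 0.873 | 6 |
| Ref. [25] | DeepGFL | 0.9484480.53112 | | | |
| Ref. [26] | MLP | 0.884 | 0.862 | 0.872 | 2 |
| Ref. [26] | LSTM | 0.984 | 0.898 | 0.895 | 2 |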

Fig. 7 Confusion matrix of the classification

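Table 6 Precision, recall, and F1 score for each class

| Class | Precision | Recall | F1 score |
|---|---|---|---|
| 0 | 0.98 | 0.99 | 0.98 |
| 1 | 0.72 | 0.83 | 0.77 |
| 2 | 0.85 | 0.77 | 0.81 |
| 3 | 0.96 | 0.96 | 0.96 |
| 4 | 0.97 | 0.94 | 0.96 |
| 5 | 0.99 | 1.00 | 1.00 |
| 6 | 0.96 | 0.95 | 0.95 |
| 7 | 0.99 | 0.98 | 0.98 |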

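Table 7 Average results over the entire dataset

| | Precision | Recall | F1 score |
|---|---|---|---|
| Macro average | 0.93 | 0.93 | 0.93 |
| Weighted average | 0.97 | 0.97 | 0.97 |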
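
The macro-average row of Table 7 can be reproduced from the per-class values in Table 6, since a macro average is the unweighted mean over classes. The short check below does that. The weighted average cannot be recomputed the same way, because it needs the per-class sample counts of the gas pipeline test set, which this page does not list.

```python
import numpy as np

# (precision, recall, F1) per class, copied from Table 6.
per_class = {
    0: (0.98, 0.99, 0.98),
    1: (0.72, 0.83, 0.77),
    2: (0.85, 0.77, 0.81),
    3: (0.96, 0.96, 0.96),
    4: (0.97, 0.94, 0.96),
    5: (0.99, 1.00, 1.00),
    6: (0.96, 0.95, 0.95),
    7: (0.99, 0.98, 0.98),
}

values = np.array(list(per_class.values()))
macro = values.mean(axis=0)   # unweighted mean over the eight classes
print(macro.round(2))         # [0.93 0.93 0.93], matching Table 7
```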

References

1 Gaikwad D P, Thool R C. Intrusion detection system using bagging ensemble method of machine learning[C]∥International Conference on Computing Communication Control and Automation, Pune, India, 2015: 291-295.
2 Shen Y, Zheng K, Wu C, et al. An ensemble method based on selection using bat algorithm for intrusion detection[J]. The Computer Journal, 2018, 61(4): 526-538.
3 Bhati B S, Chugh G, Al‐Turjman F, et al. An improved ensemble based intrusion detection technique using XGBoost[J]. Transactions on Emerging Telecommunications Technologies, 2021, 32(6): No. e4076.
4 Mnih V, Heess N, Graves A, et al. Recurrent models of visual attention[J]. Advances in Neural Information Processing Systems, 2014, 27:1-12.
5 Ahmad I. Feature selection using particle swarm optimization in intrusion detection[J]. International Journal of Distributed Sensor Networks, 2015, 11(10):No. 806954.
6 Dickson A, Thomas C. Improved PSO for optimizing the performance of intrusion detection systems[J]. Journal of Intelligent & Fuzzy Systems, 2020, 38(5): 6537-6547.
7 Aziz M R, Alfoudi A S. Feature selection of the anomaly network intrusion detection based on restoration particle swarm optimization[J]. International Journal of Intelligent Engineering & Systems, 2022, 15(5):592-600.
8 Wei P, Li Y F, Zhang Z, et al. An optimization method for intrusion detection classification model based on deep belief network[J]. IEEE Access, 2019, 7: 87593-87605.
9 Panigrahi R, Borah S. A detailed analysis of CICIDS2017 dataset for designing intrusion detection systems[J]. International Journal of Engineering & Technology, 2018, 7(3): 479-482.
10 Goryunov M N, Matskevich A G, Rybolovlev D A. Synthesis of a machine learning model for detecting computer attacks based on the Cicids2017 dataset[J]. Proceedings of the Institute for System Programming of the RAS, 2020, 32(5): 81-94.
11 Stiawan D, Idris M Y B, Bamhdi A M, et al. CICIDS-2017 dataset feature analysis with information gain for anomaly detection[J]. IEEE Access, 2020, 8:132911-132921.
12 Salo F, Injadat M, Nassif A B, et al. Data mining techniques in intrusion detection systems: a systematic literature review[J]. IEEE Access, 2018, 6: 56046-56058.
13 Turnipseed I P. A new SCADA dataset for intrusion detection research[D]. Starkville: James Worth Bagley College of Engineering, Mississippi State University, 2015.
14 Rastogi A K, Narang N, Siddiqui Z A. Imbalanced big data classification: a distributed implementation of smote[C]∥Proceedings of the Workshop Program of the 19th International Conference on Distributed Computing and Networking, Varanasi, India,2018: 1-6.
15 Myles A J, Feudale R N, Liu Y, et al. An introduction to decision tree modeling[J]. Journal of Chemometrics: a Journal of the Chemometrics Society, 2004, 18(6): 275-285.
16 Biau G, Scornet E. A random forest guided tour[J]. Test, 2016, 25: 197-227.
17 Chen T, He T, Benesty M, et al. Xgboost: extreme gradient boosting(version 0.4-2)[DB/OL]. [2015-12-13]. .
18 Wen Bo-wen, Dong Wen-han, Xie Wu-jie, et al. Parameter optimization method for random forest based on improved grid search algorithm[J]. Computer Engineering and Applications, 2018, 54(10): 154-157.
19 Pattawaro A, Polprasert C. Anomaly-based network intrusion detection system through feature selection and hybrid machine learning technique[C]∥The 16th International Conference on ICT and Knowledge Engineering(ICT&KE), Bangkok, Thailand, 2018: 1-6.
20 Li Hong-ya, Peng Yu-zhong, Deng Chu-yan, et al. Review of hybrids of GA and PSO[J]. Computer Engineering and Applications, 2018, 54(2): 20-28.
21 Mohammed M, Mwambi H, Omolo B, et al. Using stacking ensemble for microarray-based cancer classification[C]∥International Conference on Computer, Control, Electrical, and Electronics Engineering, Khartoum, Sudan, 2018: 1-8.
22 Wang Hui, Li Chang-gang. Application of Stacking integrated learning method in sales forecasting[J]. Computer Applications and Software, 2020, 37(8): 85-90.
23 Zhang Kai-fang, Su Hua-you, Dou Yong. A new multi-classification task accuracy evaluation method based on confusion matrix[J]. Computer Engineering & Science, 2021, 43(11): 1910-1919.
24 Belarbi O, Khan A, Carnelli P, et al. An intrusion detection system based on deep belief networks[C]∥International Conference on Science of Cyber Security,Matsue, Japan, 2022: 377-392.
25 Yao Y, Su L, Lu Z. DeepGFL: deep feature learning via graph for attack detection on flow-based network traffic[C]∥IEEE Military Communications Conference(MILCOM),Los Angeles, USA, 2018: 579-584.
26 Roopak M, Tian G Y, Chambers J. Deep learning models for cyber security in IoT networks[C]∥IEEE The 9th Annual Computing and Communication Workshop and Conference, Las Vegas, USA,2019: 452-457.