Journal of Jilin University (Engineering and Technology Edition), 2025, Vol. 55, Issue (8): 2681-2692. doi: 10.13229/j.cnki.jdxbgxb.20231299

Small target semantic segmentation method based on MFF-STDC network in complex outdoor environments

Qing-lin AI, Yuan-xiao LIU, Jia-hao YANG

  1. Key Laboratory of Special Purpose Equipment and Advanced Manufacturing Technology, Ministry of Education and Zhejiang Province, Zhejiang University of Technology, Hangzhou 310023, China
  • Received: 2023-11-24 Online: 2025-08-01 Published: 2025-11-14
  • Contact: Qing-lin AI E-mail: aqlaql@163.com

Abstract:

This paper builds MFF-STDC, a network model based on multi-level feature fusion, to address the weak segmentation performance of lightweight networks on small-target categories in complex environments. First, feature extraction modules based on group convolution are stacked repeatedly to strengthen the network's feature extraction capability. Second, a hierarchical attention module and a coordinate attention (CA) mechanism improve the fusion of multi-scale feature information. Finally, the A-Cityscapes, A-IDD, and Field datasets are built with an adaptive replication algorithm that increases the number of small-target category objects in each dataset, and the network is trained and tested on them. Compared with STDC, MFF-STDC raises mIoU by 4.01, 3.65, and 2.94 percentage points on the three datasets, respectively, and it segments small-target categories in complex environments better than the other networks tested. A real-world test platform was also built; the results show that MFF-STDC improves the semantic segmentation accuracy and classification of small-target categories while meeting real-time requirements.
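The abstract does not spell out the adaptive replication algorithm. As a rough illustration of the general idea behind copy-based augmentation for small-target classes (compare the captions of Figs. 7 and 8), the sketch below pastes a shifted copy of every pixel of one class into an image and its label map. The function names `copy_small_target` and `class_pixel_count`, the fixed offset, and the list-of-lists image representation are all assumptions made for illustration; this is not the authors' algorithm, which presumably selects targets and placements adaptively.

```python
from copy import deepcopy


def copy_small_target(image, label, class_id, dy, dx):
    """Paste a shifted copy of every pixel labelled `class_id`.

    `image` and `label` are equally sized 2-D lists (a hypothetical toy
    representation). Copies that would fall outside the image are dropped.
    Returns new (image, label) lists; the inputs are left unchanged.
    """
    height, width = len(label), len(label[0])
    new_image, new_label = deepcopy(image), deepcopy(label)
    for y in range(height):
        for x in range(width):
            if label[y][x] != class_id:
                continue
            ty, tx = y + dy, x + dx
            if 0 <= ty < height and 0 <= tx < width:
                new_image[ty][tx] = image[y][x]
                new_label[ty][tx] = class_id
    return new_image, new_label


def class_pixel_count(label, class_id):
    """Number of pixels labelled `class_id`."""
    return sum(row.count(class_id) for row in label)


# Toy 4x4 example: class 1 is the small target, class 0 is background.
label = [[0, 0, 0, 0],
         [0, 1, 0, 0],
         [0, 0, 0, 0],
         [0, 0, 0, 0]]
image = [[10, 10, 10, 10],
         [10, 99, 10, 10],
         [10, 10, 10, 10],
         [10, 10, 10, 10]]

aug_image, aug_label = copy_small_target(image, label, class_id=1, dy=2, dx=2)
print(class_pixel_count(label, 1), class_pixel_count(aug_label, 1))
```

Reading from the original label map while writing to the copies keeps a pasted pixel from being copied again in the same pass.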

Key words: computer applications, small target category detection, multi-level feature fusion, coordinate attention mechanism, adaptive replication algorithm

CLC Number: 

  • TP391

Fig.1

Overall structure of the MFF-STDC network

Fig.2

Structure of DLPRM

Fig.3

Structure of HAM

Fig.4

Structure of MIC module

Fig.5

Improved CA-FFM

Fig.6

Schematic diagram of space object projection on a plane

Fig.7

Effect of intra-image copy data augmentation

Fig.8

Effect of cross-image copy data augmentation

Fig.9

Some pictures of field topography dataset and their annotations

Table 1

Classification result confusion matrix and its parameters

Actual      Predicted positive     Predicted negative
Positive    TP (true positive)     FN (false negative)
Negative    FP (false positive)    TN (true negative)
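Table 1 defines the confusion-matrix counts that underlie the accuracy figures in the later tables. The sketch below computes those metrics from a multi-class confusion matrix, assuming the usual definitions, which this page does not state: per-class IoU = TP / (TP + FP + FN), mIoU as the mean of per-class IoU, and pixel accuracy as the share of correctly labelled pixels. The function names and the toy matrix are illustrative only.

```python
def per_class_iou(confusion):
    """IoU per class from a square confusion matrix.

    confusion[i][j] = number of pixels of true class i predicted as class j.
    """
    n = len(confusion)
    ious = []
    for c in range(n):
        tp = confusion[c][c]
        fn = sum(confusion[c]) - tp                        # true c, predicted as other
        fp = sum(confusion[r][c] for r in range(n)) - tp   # other, predicted as c
        denom = tp + fp + fn
        ious.append(tp / denom if denom else float("nan"))
    return ious


def mean_iou(confusion):
    """Mean IoU over classes that appear in the ground truth or prediction."""
    ious = [v for v in per_class_iou(confusion) if v == v]  # v != v only for NaN
    return sum(ious) / len(ious)


def pixel_accuracy(confusion):
    """Fraction of pixels whose predicted class equals the true class."""
    total = sum(sum(row) for row in confusion)
    correct = sum(confusion[i][i] for i in range(len(confusion)))
    return correct / total


# Toy two-class matrix: rows are actual classes, columns are predictions.
toy = [[8, 2],   # actual positive: TP = 8, FN = 2
       [1, 9]]   # actual negative: FP = 1, TN = 9
print(per_class_iou(toy))
print(mean_iou(toy), pixel_accuracy(toy))
```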

Table 2

mIoU of different networks tested on different datasets

Model               mIoU/%
                    A-Cityscapes   A-IDD   Field
SegNet              44.36          30.72   29.49
ENet                57.93          43.12   38.64
BiSeNet             69.23          54.31   44.91
DeepLabV3+(MV2)     72.14          57.71   47.79
Segformer           71.32          56.13   47.12
STDC                71.85          56.34   46.84
MFF-STDC (ours)     75.86          59.99   49.78
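The gains quoted in the abstract can be checked directly against Table 2. The short script below subtracts the STDC baseline from the MFF-STDC result on each dataset. The values are copied from the table; the dictionary layout and variable names are just an illustrative choice.

```python
# mIoU (%) from Table 2, in the order A-Cityscapes, A-IDD, Field.
datasets = ["A-Cityscapes", "A-IDD", "Field"]
miou = {
    "STDC": [71.85, 56.34, 46.84],
    "MFF-STDC": [75.86, 59.99, 49.78],
}

# Absolute gain in percentage points, rounded to the table's precision.
gains = [round(ours - base, 2)
         for ours, base in zip(miou["MFF-STDC"], miou["STDC"])]

for name, gain in zip(datasets, gains):
    print(f"{name}: +{gain:.2f} points")
```

The differences are absolute percentage points of mIoU rather than relative improvements.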

Fig.10

Prediction results of different networks on A-Cityscapes dataset

Fig.11

Prediction results of different networks on A-IDD dataset

Fig.12

Prediction results of different networks on Field dataset

Table 3

Accuracy and model sizes of different networks

Model               mIoU/%   picAcc/%   Params/M
SegNet              44.36    59.32      29.5
ENet                57.93    66.73      0.4
BiSeNet             69.23    78.65      59.24
DeepLabV3+(MV2)     71.64    83.82      45.57
Segformer           70.32    81.14      3.72
STDC                71.85    81.32      7.08
MFF-STDC (ours)     75.86    83.87      5.43

Table 4

Ablation experiment

Experiment   DW-STDC   CA-FFM   HAM   MIC   mIoU/%   Params/M   FLOPs/G
Exp 1        ×         ×        ×     ×     71.85    8.57       16.95
Exp 2        √         ×        ×     ×     72.67    5.36       13.72
Exp 3        ×         √        ×     ×     72.31    8.576      16.953
Exp 4        ×         ×        √     ×     72.09    8.561      16.95
Exp 5        ×         ×        ×     √     72.35    8.661      17.702
Exp 6        √         √        √     √     75.86    5.43       14.48

Table 5

Model parameter count and accuracy under different group numbers and layer counts

Number of groups   Module layers per stage      Params/M   FLOPs/G   mIoU/%
                   Stage3   Stage4   Stage5
1                  2        2        2          8.57       16.95     71.85
Cout/16            2        2        2          4.34       12.03     71.33
Cout/16            4        5        3          5.36       13.72     72.67
Cout/16            7        9        5          7.08       16.26     73.02
Cout/4             1        2        2          4.43       12.20     70.89
Cout/4             2        4        3          5.51       14.08     71.98
Cout/4             2        8        6          7.35       16.90     72.42
Cout               1        2        2          4.34       12.03     69.83
Cout               2        4        3          5.36       13.72     71.04
Cout               2        8        6          7.08       16.26     71.78
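Table 5 varies the number of groups in the group-convolution modules. A short sketch shows why more groups shrink a layer: a grouped k×k convolution holds k·k·(C_in/g)·C_out weights, so its weight count falls in inverse proportion to the number of groups g. The channel width of 256 and the 3×3 kernel below are assumed for illustration, and reading the table's "Cout/16", "Cout/4", and "Cout" entries as group counts relative to the output channels is an interpretation; none of these is taken from the paper.

```python
def grouped_conv_weights(c_in, c_out, kernel, groups):
    """Weight count of a grouped 2-D convolution, bias excluded."""
    if c_in % groups or c_out % groups:
        raise ValueError("groups must divide both channel counts")
    return kernel * kernel * (c_in // groups) * c_out


c_in = c_out = 256   # assumed channel width, not from the paper
kernel = 3           # assumed kernel size, not from the paper

for label, groups in [("1", 1),
                      ("Cout/16", c_out // 16),
                      ("Cout/4", c_out // 4),
                      ("Cout", c_out)]:
    weights = grouped_conv_weights(c_in, c_out, kernel, groups)
    print(f"groups = {label:8s} -> {weights} weights")
```

Cheaper layers leave room for deeper stages within the same parameter budget, which appears to be the trade-off the layer counts in the table explore.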

Fig.13

Actual scenario testing system and testing system worksite

Fig.14

Actual environment test results
