Journal of Jilin University (Engineering and Technology Edition), 2025, Vol. 55, Issue (10): 3361-3371. doi: 10.13229/j.cnki.jdxbgxb.20240054

Lightweight frequency and spatial feature fused multi-scale remote sensing scene classification network

Wei WANG, Yu-jie SUN, Xin WANG

  1. School of Computer and Communication Engineering,Changsha University of Science and Technology,Changsha 410114,China
  • Received:2024-01-15 Online:2025-10-01 Published:2026-02-03
  • Contact: Xin WANG E-mail:wangwei@csust.edu.cn;wangxin@csust.edu.cn

Abstract:

Remote sensing scene classification must cope with land-cover objects of widely varying size and spatial arrangement, as well as high inter-class similarity and intra-class variability. To address these problems, a lightweight multi-scale remote sensing scene classification network that fuses frequency and spatial features (FS-LMFFNet) is proposed, with the aim of extracting features effectively and fully integrating multi-scale features. First, to combine the advantages of CNNs and Transformers and to extract both local and global features adequately, a Frequency and Spatial MLP (FS-MLP) module is proposed; it introduces frequency-domain analysis to complement conventional spatial operations in extracting global high-frequency texture features. Second, to handle the multi-scale nature of remote sensing scene images, a Lightweight Multi-layer Feature Fusion (LMFF) module is proposed, in which lightweight convolutional blocks efficiently fuse the multi-scale features of the first three stages. Finally, FS-LMFFNet is evaluated on three public datasets, UC_Merced, RSSCN7 and AID, where it achieves accuracies of 99.10%, 96.60% and 95.48%, respectively. The experimental results demonstrate the multi-scale feature extraction and fusion capability of FS-LMFFNet, which outperforms the other state-of-the-art models compared.
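The abstract does not give the internals of the FS-MLP module, so the following is only a minimal NumPy sketch of the general idea it describes: a frequency-domain branch that mixes information globally, a spatial branch that mixes it locally, and the original features carried alongside (the three branch names appear in Table 2). The function names, the element-wise spectral filter, the depthwise kernel and the plain summation are all assumptions made for illustration, not the authors' implementation.

```python
import numpy as np

def frequency_branch(x, freq_filter):
    """Global mixing: scale the 2-D spectrum of each channel (assumed design)."""
    # x: (C, H, W) real features; freq_filter: (C, H, W // 2 + 1)
    spectrum = np.fft.rfft2(x, axes=(-2, -1))
    return np.fft.irfft2(spectrum * freq_filter, s=x.shape[-2:], axes=(-2, -1))

def spatial_branch(x, kernel):
    """Local mixing: depthwise cross-correlation with zero padding (assumed design)."""
    # kernel: (C, k, k) with odd k, one k x k filter per channel
    _, h, w = x.shape
    k = kernel.shape[-1]
    pad = k // 2
    padded = np.pad(x, ((0, 0), (pad, pad), (pad, pad)))
    out = np.zeros_like(x)
    for dy in range(k):
        for dx in range(k):
            out += kernel[:, dy, dx, None, None] * padded[:, dy:dy + h, dx:dx + w]
    return out

def fused_token_mixer(x, freq_filter, kernel):
    """Hypothetical fusion: original features plus both branches, summed."""
    return x + frequency_branch(x, freq_filter) + spatial_branch(x, kernel)

# Example with random features and a 7 x 7 spatial kernel
rng = np.random.default_rng(0)
features = rng.standard_normal((4, 16, 16))
spectral_filter = rng.standard_normal((4, 16, 9))
local_kernel = rng.standard_normal((4, 7, 7)) / 49.0
mixed = fused_token_mixer(features, spectral_filter, local_kernel)
print(mixed.shape)
```

Multiplying the spectrum by a filter is equivalent to a circular convolution over the whole feature map, which is why a frequency branch of this kind can capture global context at FFT cost, while the small spatial kernel covers only a local neighbourhood.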

Key words: remote sensing images, deep learning, convolutional neural network(CNN), fast Fourier transform(FFT), multi-scale feature fusion

CLC Number: TP391.4

Fig.1 FS-LMFFNet structure

Fig.2 Coordinate attention structure

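Table 1 FS-LMFFNet network structure

| Stage | Layers | Output size |
|---|---|---|
| Stage 1 | Tokenizer layer; FS-MLP module ×2 | (H/4)×(W/4) |
| Stage 2 | Downsampling module; FS-MLP module ×2 | (H/8)×(W/8) |
| Stage 3 | Downsampling module; FS-MLP module ×8 | (H/16)×(W/16) |
| Stage 4 | LMFF module | (H/32)×(W/32) |
| Classifier | Normalization, global pooling; fully connected layer | C = number of predicted classes |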
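The output sizes in Table 1 follow from dividing the input height and width by 4, 8, 16 and 32. A small sketch of that arithmetic is given here; the function name and the 224×224 input are hypothetical, and the input is assumed to be divisible by 32.

```python
def stage_output_sizes(height, width):
    """Spatial size after each stage, using the reduction factors of Table 1."""
    factors = {"stage 1": 4, "stage 2": 8, "stage 3": 16, "stage 4": 32}
    return {name: (height // f, width // f) for name, f in factors.items()}

# Hypothetical 224 x 224 input
for name, size in stage_output_sizes(224, 224).items():
    print(name, size)
# stage 1 (56, 56)
# stage 2 (28, 28)
# stage 3 (14, 14)
# stage 4 (7, 7)
```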

Fig.3 Examples of the UC_Merced dataset

Fig.4 Loss curves of FS-LMFFNet on different datasets

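Table 2 Ablation experiments with FSTM

| Frequency branch | Spatial branch | Original branch / weight calculation | Params/M | Computational cost/G | Accuracy/% |
|---|---|---|---|---|---|
| × | Kernel size=7 | | 1.99 | 1.09 | 98.36±0.26 |
| | × | | 2.31 | 1.07 | 98.62±0.20 |
| | Kernel size=3 | | 2.41 | 1.11 | 98.72±0.11 |
| | Kernel size=7 | × | 2.34 | 1.09 | 98.86±0.26 |
| | Kernel size=7 | × | 2.36 | 1.12 | 98.50±0.35 |
| | Kernel size=7 | | 2.43 | 1.12 | 99.10±0.22 |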

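Table 3 Ablation experiments for LMFF’s downsample operations

| Downsampling method | Params/M | Computational cost/G | Accuracy/% |
|---|---|---|---|
| (-, -, downsampling block) | 2.67 | 1.14 | 98.16±0.30 |
| MaxP28, MaxP24, MaxP22 | 2.44 | 1.12 | 98.70±0.26 |
| MaxP88, MaxP44, MaxP22 | 2.43 | 1.12 | 99.10±0.22 |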
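Table 3 compares max-pooling settings for bringing the first three stages to a common resolution before fusion. As a rough sketch only, the code below assumes that the best setting means non-overlapping max pooling by factors 8, 4 and 2, which would take the stage 1 to 3 outputs of Table 1 down to the (H/32)×(W/32) size of stage 4; that reading of the notation, the function names and the channel counts are assumptions, and the fusion block that follows in LMFF is not shown.

```python
import numpy as np

def max_pool(x, factor):
    """Non-overlapping max pooling (kernel = stride = factor) on a (C, H, W) array."""
    c, h, w = x.shape
    if h % factor or w % factor:
        raise ValueError("height and width must be divisible by the pooling factor")
    blocks = x.reshape(c, h // factor, factor, w // factor, factor)
    return blocks.max(axis=(2, 4))

def align_and_concat(stage1, stage2, stage3):
    """Pool three stage outputs to one resolution and stack them along channels."""
    pooled = [max_pool(stage1, 8), max_pool(stage2, 4), max_pool(stage3, 2)]
    return np.concatenate(pooled, axis=0)

# Hypothetical feature maps for a 224 x 224 input (channel counts are made up)
f1 = np.random.rand(4, 56, 56)
f2 = np.random.rand(8, 28, 28)
f3 = np.random.rand(16, 14, 14)
print(align_and_concat(f1, f2, f3).shape)  # (28, 7, 7)
```

Pooling has no learnable parameters, which is consistent with the lower parameter counts of the two max-pooling rows relative to the downsampling-block row in Table 3.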

Table 4 Ablation experiments for LMFF’s feature fusion method

| Model | Params/M | Computational cost/G | Accuracy/% |
|---|---|---|---|
| FS-MLP | 4.32 | 1.12 | 97.42±0.42 |
| Sandglass Block | 2.42 | 1.12 | 98.98±0.20 |
| Sandglass_CA Block | 2.43 | 1.12 | 99.10±0.22 |

Table 5 Overall accuracy and computational complexity of different methods on three datasets

| Model | Params/M | Computational cost/G | UC_Merced accuracy/% | RSSCN7 accuracy/% | AID accuracy/% |
|---|---|---|---|---|---|
| ResNet18 [29] | 11.15 | 1.82 | 98.28±0.25 | 94.96±0.42 | 94.50±0.19 |
| FcaNet18 [30] | 11.23 | 1.82 | 97.96±0.47 | 94.96±0.52 | 94.34±0.34 |
| MobileNeXt [20] | 2.86 | 0.29 | 97.14±0.46 | 94.64±0.73 | 95.08±0.16 |
| MobileNeXt_CA [22] | 3.26 | 0.29 | 97.36±0.37 | 95.10±0.39 | 95.26±0.11 |
| SwinTransformer_Tiny [26] | 26.96 | 4.2 | 95.48±0.45 | 93.40±0.52 | 90.20±0.20 |
| VAN_b0 [8] | 3.91 | 0.88 | 97.62±0.18 | 94.28±0.41 | 93.88±0.23 |
| GFNet_PyramidTi [31] | 12.18 | 1.90 | 95.18±0.68 | 91.54±0.44 | 90.90±0.57 |
| WaveMLP_T [32] | 16.40 | 2.48 | 96.86±0.72 | 92.48±0.41 | 92.54±0.40 |
| ConvMLP_S [33] | 8.60 | 2.30 | 95.24±0.86 | 94.62±0.28 | 93.38±0.22 |
| Strip-MLP-T* [27] | 18.24 | 2.54 | 98.72±0.16 | 95.32±0.51 | 95.12±0.23 |
| SAFF [16] | 14.76 | 15.38 | 95.58±0.39 | 93.78±0.63 | 94.18±0.17 |
| RaNet [34] | 21.47 | 3.85 | 98.38±0.39 | 95.24±0.15 | 95.38±0.22 |
| EMTCAL [35] | 27.30 | 4.23 | 98.78±0.27 | 95.32±0.37 | 94.96±0.19 |
| FS-LMFFNet | 2.43 | 1.12 | 99.10±0.22 | 96.60±0.24 | 95.48±0.13 |

Fig.5 Confusion matrix of the FS-MLP method on the AID dataset

Fig.6 Grad-CAM visualization results on the RSSCN7 dataset

[1] Xu Cong-an, Lyu Ya-fei, Zhang Xiao-han, et al. A discriminative feature representation method based on dual attention mechanism for remote sensing image scene classification[J]. Journal of Electronics & Information Technology, 2021, 43(3): 683-691.
[2] Morell-Monzó S, Sebastiá-Frasquet M T, Estornell J. Land use classification of VHR images for mapping small-sized abandoned citrus plots by using spectral and textural information[J]. Remote Sensing, 2021, 13(4): No.681.
[3] Liang S, Cheng J, Zhang J. Maximum likelihood classification of soil remote sensing image based on deep learning[J]. Earth Sciences Research Journal, 2020, 24(3): 357-365.
[4] Fatemighomi H S, Golalizadeh M, Amani M. Object-based hyperspectral image classification using a new latent block model based on hidden Markov random fields[J]. Pattern Anal Applic, 2022, 25: 467-481.
[5] Krizhevsky A, Sutskever I, Hinton G E. ImageNet classification with deep convolutional neural networks[J]. Communications of the ACM, 2017, 60(6): 84-90.
[6] Dai J, Qi H, Xiong Y, et al. Deformable convolutional networks[C]∥Proceedings of the IEEE International Conference on Computer Vision(ICCV), Venice, Italy, 2017: 764-773.
[7] Ding X, Zhang X, Han J, et al. Scaling up your kernels to 31×31: revisiting large kernel design in CNNs[C]∥Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition(CVPR), New Orleans, Louisiana, USA, 2022: 11953-11965.
[8] Guo M H, Lu C Z, Liu Z N, et al. Visual attention network[J]. Computational Visual Media, 2022, 9(4): 733-752.
[9] Vaswani A, Shazeer N, Parmar N, et al. Attention is all you need[C]∥Proceedings of 31st Conference on Neural Information Processing Systems, Long Beach, USA, 2017: 6000-6010.
[10] Dosovitskiy A, Beyer L, Kolesnikov A, et al. An image is worth 16×16 words: transformers for image recognition at scale[EB/OL]. [2022-10-18].
[11] Bazi Y, Bashmal L, Rahhal M M A, et al. Vision transformers for remote sensing image classification[J]. Remote Sensing, 2021, 13(3): No. 516.
[12] Yu W H, Luo M, Zhou P, et al. MetaFormer is actually what you need for vision[C]∥Proceedings of IEEE/CVF Conference on Computer Vision and Pattern Recognition, New Orleans, USA, 2022: 10809-10819.
[13] Wang Wei, Li Xi-jie, Wang Xin. ADC-CPANet: a remote sensing image classification method based on local-global feature fusion[J]. National Remote Sensing Bulletin, 2024, 28(10): 2661-2672.
[14] Wang W, Hu T, Wang X, et al. BFRNet: bidimensional feature representation network for remote sensing images classification[J]. IEEE Transactions on Geoscience and Remote Sensing, 2023, 61: 1-13.
[15] Huang Z, Zhang Z, Lan C, et al. Adaptive frequency filters as efficient global token mixers[EB/OL]. [2023-03-22].
[16] Cao R, Fang L, Lu T, et al. Self-attention-based deep feature fusion for remote sensing scene classification[J]. IEEE Geoscience and Remote Sensing Letters, 2021, 18(1): 43-47.
[17] Wang Wei, Deng Ji-wei, Wang Xin, et al. GLFFNet model for remote sensing image scene classification[J]. Acta Geodaetica et Cartographica Sinica, 2023, 52(10): 1693-1702.
[18] Hendrycks D, Gimpel K. Gaussian error linear units (GELUs)[EB/OL]. [2024-01-10].
[19] Sandler M, Howard A, Zhu M L, et al. MobileNetV2: inverted residuals and linear bottlenecks[C]∥IEEE/CVF Conference on Computer Vision and Pattern Recognition, Salt Lake City, USA, 2018: 4510-4520.
[20] Zhou D, Hou Q, Chen Y, et al. Rethinking bottleneck structure for efficient mobile network design[J]. Computer Vision-ECCV 2020, Lecture Notes in Computer Science, 2020, 12348: 680-697.
[21] Ioffe S, Szegedy C. Batch normalization: accelerating deep network training by reducing internal covariate shift[C]∥Proceedings of the 32nd International Conference on Machine Learning, Lille, France, 2015: 448-456.
[22] Hou Q, Zhou D, Feng J. Coordinate attention for efficient mobile network design[C]∥Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition(CVPR), Montreal, Canada, 2021: 13713-13722.
[23] Yang Y, Newsam S. Bag-of-visual-words and spatial extensions for land-use classification[C]∥Proceedings of the 18th SIGSPATIAL International Conference on Advances in Geographic Information Systems, San Jose, California, USA, 2010: 270-279.
[24] Zou Q, Ni L H, Zhang T, et al. Deep learning based feature selection for remote sensing scene classification[J]. IEEE Geoscience and Remote Sensing Letters, 2015, 12(11): 2321-2325.
[25] Xia G S, Hu J, Hu F, et al. AID: a benchmark data set for performance evaluation of aerial scene classification[J]. IEEE Transactions on Geoscience and Remote Sensing, 2017, 55(7): 3965-3981.
[26] Liu Z, Lin Y, Cao Y, et al. Swin transformer: hierarchical vision transformer using shifted windows[C]∥IEEE/CVF International Conference on Computer Vision(ICCV), Montreal, Canada, 2021: 10012-10022.
[27] Cao G, Luo S, Huang W, et al. Strip-MLP: efficient token interaction for vision MLP[C]∥Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV), Paris, France, 2023: 1494-1504.
[28] Simonyan K, Zisserman A. Very deep convolutional networks for large-scale image recognition[EB/OL]. [2023-03-18].
[29] He K M, Zhang X Y, Ren S Q, et al. Deep residual learning for image recognition[C]∥Proceedings of IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, USA, 2016: 770-778.
[30] Qin Z, Zhang P, Wu F, et al. FcaNet: frequency channel attention networks[C]∥Proceedings of the IEEE International Conference on Computer Vision, Xi'an, China, 2020: 763-772.
[31] Rao Y, Zhao W, Zhu Z, et al. Global filter networks for image classification[J]. Advances in Neural Information Processing Systems, 2021, 2: 980-993.
[32] Tang Y, Han K, Guo J, et al. An image patch is a wave: phase-aware vision MLP[EB/OL]. [2023-03-18].
[33] Li J, Hassani A, Walton S, et al. ConvMLP: hierarchical convolutional MLPs for vision[C]∥IEEE Computer Society Conference on Computer Vision and Pattern Recognition Workshops, Vancouver, Canada, 2023: 6307-6316.
[34] Wang X, Duan L, Ning C, et al. Relation-attention networks for remote sensing scene classification[J]. IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing, 2022, 15: 422-439.
[35] Tang X, Li M, Ma J, et al. EMTCAL: efficient multiscale transformer and cross-level attention learning for remote sensing scene classification[J]. IEEE Transactions on Geoscience and Remote Sensing, 2022, 60: 1-15.
[36] Selvaraju R R, Cogswell M, Das A, et al. Grad-CAM: visual explanations from deep networks via gradient-based localization[C]∥Proceedings of 2017 IEEE International Conference on Computer Vision, Venice, Italy, 2017: 618-626.