Lightweight frequency and spatial feature fused multi-scale remote sensing scene classification network

doi:10.13229/j.cnki.jdxbgxb.20240054

Abstract

To address the diverse sizes and spatial arrangements of land-cover objects, as well as the high inter-class similarity and intra-class variability in remote sensing image classification, a lightweight multi-scale remote sensing scene classification network that fuses frequency and spatial features (FS-LMFFNet) is proposed, aiming at effective feature extraction and full integration of multi-scale features. First, to combine the advantages of CNNs and Transformers and adequately extract both local and global features, a Frequency and Spatial MLP (FS-MLP) module is proposed; it introduces frequency-domain analysis to complement conventional spatial operations in extracting global high-frequency texture features. Second, to handle the multi-scale nature of remote sensing scene images, a Lightweight Multi-layer Feature Fusion (LMFF) module is proposed, in which lightweight convolutional blocks efficiently fuse the multi-scale features of the first three stages. Finally, FS-LMFFNet is evaluated on three public datasets, UC_Merced, RSSCN7 and AID, and reaches accuracies of 99.10%, 96.60% and 95.48%, respectively. The experimental results demonstrate the multi-scale feature extraction and fusion capability of FS-LMFFNet, which outperforms the other state-of-the-art models it is compared with.
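
The frequency-domain idea in the abstract can be illustrated with a minimal sketch. It is not the FS-MLP module, whose internals are not described on this page; it only shows, assuming a 1-D signal and a hard high-pass mask (both illustrative choices), that filtering in the Fourier domain is a global operation that retains high-frequency, texture-like content. The names `dft`, `idft` and `high_pass` are hypothetical, and a plain O(n²) DFT stands in for the FFT so that the example needs no third-party library.

```python
import cmath


def dft(x):
    """Discrete Fourier transform of a sequence (naive O(n^2) version)."""
    n = len(x)
    return [
        sum(x[t] * cmath.exp(-2j * cmath.pi * k * t / n) for t in range(n))
        for k in range(n)
    ]


def idft(X):
    """Inverse discrete Fourier transform (naive O(n^2) version)."""
    n = len(X)
    return [
        sum(X[k] * cmath.exp(2j * cmath.pi * k * t / n) for k in range(n)) / n
        for t in range(n)
    ]


def high_pass(x, cutoff):
    """Zero every frequency bin below `cutoff` and transform back.

    Bins k and n - k carry the same frequency magnitude, so both are
    kept or dropped together, which keeps the output real for real input.
    """
    n = len(x)
    spectrum = dft(x)
    kept = [spectrum[k] if min(k, n - k) >= cutoff else 0 for k in range(n)]
    return [value.real for value in idft(kept)]


# A constant offset (low frequency) plus an alternating pattern (high frequency).
signal = [4.0, 2.0] * 4
filtered = high_pass(signal, cutoff=1)
print([round(v, 6) for v in filtered])
```

Every output sample depends on all input samples through the transform, which is the sense in which a frequency-domain filter is global; here the constant offset is removed and the alternating component remains.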

Key words: remote sensing images, deep learning, convolutional neural network (CNN), fast Fourier transform (FFT), multi-scale feature fusion

CLC Number: TP391.4

Wei WANG, Yu-jie SUN, Xin WANG. Lightweight frequency and spatial feature fused multi-scale remote sensing scene classification network[J]. Journal of Jilin University (Engineering and Technology Edition), 2025, 55(10): 3361-3371.

Table 3 Ablation experiments on the downsampling operations of LMFF

Downsampling method	Params (M)	Computation (G)	Accuracy (%)
(-, -, downsampling block)	2.67	1.14	98.16±0.30
MaxP(2,8), MaxP(2,4), MaxP(2,2)	2.44	1.12	98.70±0.26
MaxP(8,8), MaxP(4,4), MaxP(2,2)	2.43	1.12	99.10±0.22
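
The pooling variants in the table above can be compared with a small sketch. It rests on two assumptions that this page does not state: each MaxP entry is read as a (window, stride) pair, for example window 8 with stride 8, and the input is 224×224, so the first three stages are 56, 28 and 14 pixels per side. The helper names `pooled_size` and `max_pool2d` are hypothetical. Under that reading, both pooling rows bring all three stages to the same 7×7 grid, but a window of 2 with stride 8 inspects only a small part of each 8×8 region.

```python
def pooled_size(n, kernel, stride):
    """Output length of unpadded pooling along one axis."""
    return (n - kernel) // stride + 1


def max_pool2d(grid, kernel, stride):
    """Unpadded 2-D max pooling over a list-of-lists grid."""
    rows = pooled_size(len(grid), kernel, stride)
    cols = pooled_size(len(grid[0]), kernel, stride)
    return [
        [
            max(
                grid[r * stride + i][c * stride + j]
                for i in range(kernel)
                for j in range(kernel)
            )
            for c in range(cols)
        ]
        for r in range(rows)
    ]


# Assumed side lengths of stages 1-3 for a hypothetical 224x224 input.
stage_sides = [56, 28, 14]
full_window = [(8, 8), (4, 4), (2, 2)]   # window equals stride
small_window = [(2, 8), (2, 4), (2, 2)]  # window smaller than stride

print([pooled_size(n, k, s) for n, (k, s) in zip(stage_sides, full_window)])
print([pooled_size(n, k, s) for n, (k, s) in zip(stage_sides, small_window)])

# An 8x8 patch whose largest value sits in the bottom-right corner.
patch = [[r * 8 + c for c in range(8)] for r in range(8)]
print(max_pool2d(patch, 8, 8))  # window covers the whole patch
print(max_pool2d(patch, 2, 8))  # window covers only the top-left 2x2 corner
```

If this reading is right, the difference in coverage is one plausible reason the two rows differ in accuracy while having almost identical cost; the page itself does not give an explanation.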

References

[1]	Xu Cong-an, Lyu Ya-fei, Zhang Xiao-han, et al. A discriminative feature representation method based on dual attention mechanism for remote sensing image scene classification[J]. Journal of Electronics & Information Technology, 2021, 43(3): 683-691. (in Chinese)
[2]	Morell-Monzó S, Sebastiá-Frasquet M T, Estornell J. Land use classification of VHR images for mapping small-sized abandoned citrus plots by using spectral and textural information[J]. Remote Sensing, 2021, 13(4): No.681.
[3]	Liang S, Cheng J, Zhang J. Maximum likelihood classification of soil remote sensing image based on deep learning[J]. Earth Sciences Research Journal, 2020, 24(3): 357-365.
[4]	Fatemighomi H S, Golalizadeh M, Amani M. Object-based hyperspectral image classification using a new latent block model based on hidden Markov random fields[J]. Pattern Anal Applic, 2022, 25: 467-481.
[5]	Krizhevsky A, Sutskever I, Hinton G E. ImageNet classification with deep convolutional neural networks[J]. Communications of the ACM, 2017, 60(6): 84-90.
[6]	Dai J, Qi H, Xiong Y, et al. Deformable convolutional networks[C]∥Proceedings of the IEEE International Conference on Computer Vision(ICCV), Venice, Italy, 2017: 764-773.
[7]	Ding X, Zhang X, Han J, et al. Scaling up your kernels to 31×31: revisiting large kernel design in CNNs[C]∥Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition(CVPR), New Orleans, Louisiana, USA,2022: 11953-11965.
[8]	Guo M H, Lu C Z, Liu Z N, et al. Visual attention network[J]. Computational Visual Media, 2022, 9(4):733-752.
[9]	Vaswani A, Shazeer N, Parmar N, et al. Attention is all you need[C]∥Proceedings of 31st Conference on Neural Information Processing Systems, Long Beach, USA, 2017: 6000-6010.
[10]	Dosovitskiy A, Beyer L, Kolesnikov A, et al. An image is worth 16×16 words: transformers for image recognition at scale[EB/OL]. [2022-10-18].
[11]	Bazi Y, Bashmal L, Rahhal M M A, et al. Vision transformers for remote sensing image classification[J]. Remote Sensing, 2021, 13(3): No. 516.
[12]	Yu W H, Luo M, Zhou P, et al. MetaFormer is actually what you need for vision[C]∥Proceedings of IEEE/CVF Conference on Computer Vision and Pattern Recognition, New Orleans, USA, 2022: 10809-10819.
[13]	Wang Wei, Li Xi-jie, Wang Xin. ADC-CPANet: a remote sensing image classification method based on local-global feature fusion[J]. National Remote Sensing Bulletin, 2024, 28(10): 2661-2672. (in Chinese)
[14]	Wang W, Hu T, Wang X, et al. BFRNet: bidimensional feature representation network for remote sensing images classification[J]. IEEE Transactions on Geoscience and Remote Sensing, 2023, 61: 1-13.
[15]	Huang Z, Zhang Z, Lan C, et al. Adaptive frequency filters as efficient global token mixers[EB/OL]. [2023-03-22].
[16]	Cao R, Fang L, Lu T, et al. Self-attention-based deep feature fusion for remote sensing scene classification[J]. IEEE Geoscience and Remote Sensing Letters, 2021, 18(1): 43-47.
[17]	Wang Wei, Deng Ji-wei, Wang Xin, et al. GLFFNet model for remote sensing image scene classification[J]. Acta Geodaetica et Cartographica Sinica, 2023, 52(10): 1693-1702. (in Chinese)
[18]	Hendrycks D, Gimpel K. Gaussian error linear units (GELUs)[EB/OL]. [2024-01-10].
[19]	Sandler M, Howard A, Zhu M L, et al. MobileNetV2: inverted residuals and linear bottlenecks[C]∥IEEE/CVF Conference on Computer Vision and Pattern Recognition, Salt Lake City, USA,2018:4510-4520.
[20]	Zhou D, Hou Q, Chen Y, et al. Rethinking bottleneck structure for efficient mobile network design[J]. In Computer Vision-ECCV 2020, Lecture Notes in Computer Science, 2020, 12348: 680-697.
[21]	Ioffe S, Szegedy C. Batch normalization: accelerating deep network training by reducing internal covariate shift[C]∥Proceedings of the 32nd International Conference on Machine Learning, Lille, France, 2015: 448-456.
[22]	Hou Q, Zhou D, Feng J. Coordinate attention for efficient mobile network design[C]∥Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition(CVPR), Montreal, Canada,2021: 13713-13722.
[23]	Yang Y, Newsam S. Bag-of-visual-words and spatial extensions for land-use classification[C]∥Proceedings of the 18th SIGSPATIAL International Conference on Advances in Geographic Information Systems, San Jose, California, USA, 2010: 270-279.
[24]	Zou Q, Ni L H, Zhang T, et al. Deep learning based feature selection for remote sensing scene classification[J]. IEEE Geoscience and Remote Sensing Letters, 2015, 12(11): 2321-2325.
[25]	Xia G S, Hu J, Hu F, et al. AID: a benchmark data set for performance evaluation of aerial scene classification[J]. IEEE Transactions on Geoscience and Remote Sensing, 2017, 55(7): 3965-3981.
[26]	Liu Z, Lin Y, Cao Y, et al. Swin transformer: hierarchical vision transformer using shifted windows[C]∥ IEEE/CVF International Conference on Computer Vision(ICCV), Montreal, Canada, 2021: 10012-10022.
[27]	Cao G, Luo S, Huang W, et al. Strip-MLP: efficient token interaction for vision MLP[C]∥Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV), Paris, France,2023: 1494-1504.
[28]	Simonyan K, Zisserman A. Very deep convolutional networks for large-scale image recognition[EB/OL]. [2023-03-18].
[29]	He K M, Zhang X Y, Ren S Q, et al. Deep residual learning for image recognition[C]∥Proceedings of IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, USA, 2016: 770-778.
[30]	Qin Z, Zhang P, Wu F, et al. FcaNet: frequency channel attention networks[C]∥Proceedings of the IEEE International Conference on Computer Vision, Xi'an, China, 2020: 763-772.
[31]	Rao Y, Zhao W, Zhu Z, et al. Global filter networks for image classification[J]. Advances in Neural Information Processing Systems, 2021, 2: 980-993.
[32]	Tang Y, Han K, Guo J, et al. An image patch is a wave: phase-aware vision MLP[EB/OL]. [2023-03-18].
[33]	Li J, Hassani A, Walton S, et al. ConvMLP: Hierarchical Convolutional MLPs for Vision[C]∥IEEE Computer Society Conference on Computer Vision and Pattern Recognition Workshops, Vancouver,Canada, 2023: 6307-6316.
[34]	Wang X, Duan L, Ning C, et al. Relation-attention networks for remote sensing scene classification[J]. IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing, 2022, 15: 422-439.
[35]	Tang X, Li M, Ma J, et al. EMTCAL: efficient multiscale transformer and cross-level attention learning for remote sensing scene classification[J]. IEEE Transactions on Geoscience and Remote Sensing, 2022, 60: 1-15.
[36]	Selvaraju R R, Cogswell M, Das A, et al. Grad-CAM: visual explanations from deep networks via gradient-based localization[C]∥Proceedings of 2017 IEEE International Conference on Computer Vision, Venice, Italy, 2017: 618-626.


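Stage	Layer	Output size
Stage 1	Tokenizer layer	(H/4)×(W/4)
Stage 1	FS-MLP module ×2	(H/4)×(W/4)
Stage 2	Downsampling module	(H/8)×(W/8)
Stage 2	FS-MLP module ×2	(H/8)×(W/8)
Stage 3	Downsampling module	(H/16)×(W/16)
Stage 3	FS-MLP module ×8	(H/16)×(W/16)
Stage 4	LMFF module	(H/32)×(W/32)
Classifier	Normalization, global pooling, fully connected layer	C = number of predicted classes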
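
The spatial sizes in the table above follow directly from the per-stage downsampling factors 4, 8, 16 and 32. The short sketch below works them out for one concrete input; the 224×224 input size and the helper name `stage_output_sizes` are assumptions made for illustration, not values given on this page.

```python
# Downsampling factor of each stage relative to the H x W input,
# taken from the output-size column of the table.
STAGE_FACTORS = [
    ("Stage 1", 4),
    ("Stage 2", 8),
    ("Stage 3", 16),
    ("Stage 4", 32),
]


def stage_output_sizes(height, width):
    """Map each stage name to its (height, width) feature-map size."""
    return {
        name: (height // factor, width // factor)
        for name, factor in STAGE_FACTORS
    }


# Hypothetical 224x224 input image.
for name, size in stage_output_sizes(224, 224).items():
    print(name, size)
```

With this input, the first three stages yield 56×56, 28×28 and 14×14 maps, which the LMFF stage has to bring to a common 7×7 resolution before fusing them.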

Frequency branch	Spatial branch	Original branch	Weight computation	Params (M)	Computation (G)	Accuracy (%)
×	Kernel size=7	√	√	1.99	1.09	98.36±0.26
√	×	√	√	2.31	1.07	98.62±0.20
√	Kernel size=3	√	√	2.41	1.11	98.72±0.11
√	Kernel size=7	×	√	2.34	1.09	98.86±0.26
√	Kernel size=7	√	×	2.36	1.12	98.50±0.35
√	Kernel size=7	√	√	2.43	1.12	99.10±0.22

Model	Params (M)	Computation (G)	UC_Merced accuracy (%)	RSSCN7 accuracy (%)	AID accuracy (%)
ResNet18 [29]	11.15	1.82	98.28±0.25	94.96±0.42	94.50±0.19
FcaNet18 [30]	11.23	1.82	97.96±0.47	94.96±0.52	94.34±0.34
MobileNeXt [20]	2.86	0.29	97.14±0.46	94.64±0.73	95.08±0.16
MobileNeXt_CA [22]	3.26	0.29	97.36±0.37	95.10±0.39	95.26±0.11
SwinTransformer_Tiny [26]	26.96	4.2	95.48±0.45	93.40±0.52	90.20±0.20
VAN_b0 [8]	3.91	0.88	97.62±0.18	94.28±0.41	93.88±0.23
GFNet_PyramidTi [31]	12.18	1.90	95.18±0.68	91.54±0.44	90.90±0.57
WaveMLP_T [32]	16.40	2.48	96.86±0.72	92.48±0.41	92.54±0.40
ConvMLP_S [33]	8.60	2.30	95.24±0.86	94.62±0.28	93.38±0.22
Strip-MLP-T* [27]	18.24	2.54	98.72±0.16	95.32±0.51	95.12±0.23
SAFF [16]	14.76	15.38	95.58±0.39	93.78±0.63	94.18±0.17
RaNet [34]	21.47	3.85	98.38±0.39	95.24±0.15	95.38±0.22
EMTCAL [35]	27.30	4.23	98.78±0.27	95.32±0.37	94.96±0.19
FS-LMFFNet	2.43	1.12	99.10±0.22	96.60±0.24	95.48±0.13
