Journal of Jilin University (Engineering and Technology Edition) ›› 2025, Vol. 55 ›› Issue (7): 2393-2401. doi: 10.13229/j.cnki.jdxbgxb.20231128


Few-shot remote sensing image classification based on contrastive learning text perception

Wen-hui LI, Chen YANG

  1. College of Computer Science and Technology,Jilin University,Changchun 130012,China
  • Received:2023-10-20 Online:2025-07-01 Published:2025-09-12

Abstract:

Existing few-shot remote sensing classification methods rely mainly on the single image modality and therefore struggle when samples of the same class show low visual similarity. To address this problem, a remote sensing image classification method based on multimodal learning is proposed. First, the spatial features of the image are corrected: the image encoder is pre-trained with contrastive learning to generate image features, while text features are generated by a text encoder. Second, a feature decoder is introduced to obtain text-aware visual features, and a new attention mechanism is proposed for the feature-fusion stage. Third, a new image encoder is designed to improve classification accuracy. Finally, the similarity between the support set and the query set is computed for class prediction. Experiments on the NWPU-RESISC45, AID, and UC Merced datasets yield 5-way 5-shot accuracies of 86.46%, 85.89%, and 80.32%, respectively, outperforming existing few-shot remote sensing image classification methods.
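The final prediction step described in the abstract (fusing image and text features, then comparing support-set and query-set features) can be sketched as follows. This is a minimal illustrative sketch only: the function names, the fixed fusion weight, and the cosine-similarity prototype matching are assumptions for illustration, not the paper's actual implementation, which uses a learned attention-based fusion and trained encoders.

```python
import numpy as np

def l2_normalize(x, axis=-1, eps=1e-8):
    """Scale feature vectors to unit length for cosine similarity."""
    return x / (np.linalg.norm(x, axis=axis, keepdims=True) + eps)

def fuse_features(img_feat, txt_feat, lam=0.5):
    """Weighted fusion of image and text features.

    `lam` stands in for the weighting factor studied in the paper;
    the real method fuses modalities with an attention mechanism.
    """
    return lam * img_feat + (1.0 - lam) * txt_feat

def classify_queries(support_feats, support_labels, query_feats, n_way):
    """Prototype-style prediction: average the support features of each
    class, then assign each query to the most cosine-similar prototype."""
    prototypes = np.stack([
        support_feats[support_labels == c].mean(axis=0) for c in range(n_way)
    ])
    sims = l2_normalize(query_feats) @ l2_normalize(prototypes).T
    return sims.argmax(axis=1)

# Toy 2-way example: class 0 features cluster near [1, 0], class 1 near [0, 1].
support = np.array([[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]])
labels = np.array([0, 0, 1, 1])
queries = np.array([[0.8, 0.2], [0.2, 0.8]])
preds = classify_queries(support, labels, queries, n_way=2)  # predicts classes [0, 1]
```

In a 5-way 5-shot episode, `support_feats` would hold 25 fused support features and each prototype would be the mean of 5 same-class features.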

Key words: computer application technology, contrastive learning, text perception, few-shot learning, remote sensing image classification

CLC Number: 

  • TP751.1

Fig.1

Pre-training contrastive learning stage

Fig.2

Structure of CLTP-ML framework

Fig.3

Structure of image encoder

Fig.4

Structure of feature decoder

Fig.5

Fusion of image features and text features

Fig.6

Structure of attention mechanism

Table 1

Comparison of classification accuracies of models on three datasets (%)

Model (5-way 5-shot)   NWPU          AID           UCM
ProtoNet[6]            66.64±0.51    70.33±0.48    67.84±0.15
RelationNet[7]         81.43±0.73    74.46±0.87    62.24±0.54
DLA-MatchNet[1]        82.62±0.54    76.57±0.74    63.86±0.37
SCL-MLNet[2]           82.77±0.29    79.83±0.66    67.39±0.74
FEAT[8]                83.84±0.37    81.39±0.71    75.87±0.52
SPNet[9]               80.55±0.46    78.28±0.58    73.52±0.53
TeAw[10]               85.57±0.25    84.62±0.25    77.50±0.27
CLTP-ML                86.46±0.43    85.89±0.24    80.32±0.06

Table 2

Parameter counts and FLOPs of each model

Model             Params/M   FLOPs/G
ProtoNet[6]       11.84      1.82
RelationNet[7]    22.43      2.44
DLA-MatchNet[1]   15.06      1.95
SCL-MLNet[2]      14.24      2.03
FEAT[8]           37.52      3.98
SPNet[9]          12.96      2.26
TeAw[10]          14.59      2.17
CLTP-ML           14.35      2.06

Fig.7

Comparison of classification accuracy with different weighting factors

Fig.8

Comparison of classification accuracy of different image encoders

Table 3

Comparison of classification accuracy of different attention mechanisms

Method    Dataset   5-way 5-shot/%
SENet     NWPU      85.71±0.63
ECANet    NWPU      85.84±0.28
CBAM      NWPU      86.19±0.56
ECSAM     NWPU      86.46±0.43
SENet     AID       85.42±0.67
ECANet    AID       85.04±0.31
CBAM      AID       84.88±0.45
ECSAM     AID       85.89±0.24
SENet     UCM       79.37±0.72
ECANet    UCM       79.65±0.39
CBAM      UCM       79.97±0.24
ECSAM     UCM       80.32±0.36

Fig.9

Cluster heatmap

Table 4

Ablation experiment

Method          Dataset   Accuracy/%
CLTP-ML         NWPU      83.39±0.63
CLTP-ML+P       NWPU      85.84±0.28
CLTP-ML+P+T     NWPU      86.05±0.58
CLTP-ML+P+T+A   NWPU      86.46±0.43
CLTP-ML         AID       82.19±0.89
CLTP-ML+P       AID       84.83±0.24
CLTP-ML+P+T     AID       85.28±0.55
CLTP-ML+P+T+A   AID       85.89±0.36
CLTP-ML         UCM       77.26±0.72
CLTP-ML+P       UCM       78.89±0.52
CLTP-ML+P+T     UCM       79.73±0.44
CLTP-ML+P+T+A   UCM       80.32±0.41
[1] Li L, Han J, Yao X, et al. DLA-MatchNet for few-shot remote sensing image scene classification[J]. IEEE Transactions on Geoscience and Remote Sensing, 2020, 59(9): 7844-7853.
[2] Li X, Shi D, Diao X, et al. SCL-MLNet: boosting few-shot remote sensing scene classification via self-supervised contrastive learning[J]. IEEE Transactions on Geoscience and Remote Sensing, 2021, 60: 1-12.
[3] Cui Lu, Zhang Peng, Che Jin. A review of deep neural network based remote sensing image classification algorithms[J]. Computer Science, 2018, 45(6): 50-53. (in Chinese)
[4] Gong T, Zheng X, Lu X. Meta self-supervised learning for distribution shifted few-shot scene classification[J]. IEEE Geoscience and Remote Sensing Letters, 2022, 19: 1-5.
[5] Yuan Z, Tang C, Yang A, et al. Few-shot remote sensing image scene classification based on metric learning and local descriptors[J]. Remote Sensing, 2023, 15(3): 831.
[6] Snell J, Swersky K, Zemel R. Prototypical networks for few-shot learning[J]. Advances in Neural Information Processing Systems, 2017, 30.
[7] Sung F, Yang Y, Zhang L, et al. Learning to compare: relation network for few-shot learning[C]∥Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Salt Lake City, USA, 2018: 1199-1208.
[8] Ye H J, Hu H, Zhan D C, et al. Few-shot learning via embedding adaptation with set-to-set functions[C]∥Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Seattle, USA, 2020: 8808-8817.
[9] Cheng G, Cai L, Lang C, et al. SPNet: Siamese-prototype network for few-shot remote sensing image scene classification[J]. IEEE Transactions on Geoscience and Remote Sensing, 2021, 60: 1-11.
[10] Cheng K, Yang C, Fan Z, et al. TeAw: text-aware few-shot remote sensing image scene classification[C]∥IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), Rhodes Island, Greece, 2023: 1-5.
[11] Khosla P, Teterwak P, Wang C, et al. Supervised contrastive learning[J]. Advances in Neural Information Processing Systems, 2020, 33: 18661-18673.
[12] Vaswani A, Shazeer N, Parmar N, et al. Attention is all you need[J]. Advances in Neural Information Processing Systems, 2017, 30.
[13] Radford A, Kim J W, Hallacy C, et al. Learning transferable visual models from natural language supervision[C]∥International Conference on Machine Learning, Jeju Island, Republic of Korea, 2021: 8748-8763.
[14] Li Y, Zhu Z, Yu J G, et al. Learning deep cross-modal embedding networks for zero-shot remote sensing image scene classification[J]. IEEE Transactions on Geoscience and Remote Sensing, 2021, 59(12): 10590-10603.
[15] He K, Zhang X, Ren S, et al. Deep residual learning for image recognition[C]∥Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, USA, 2016: 770-778.
[16] Tolstikhin I O, Houlsby N, Kolesnikov A, et al. MLP-Mixer: an all-MLP architecture for vision[J]. Advances in Neural Information Processing Systems, 2021, 34: 24261-24272.
[17] Cheng G, Han J, Lu X. Remote sensing image scene classification: benchmark and state of the art[J]. Proceedings of the IEEE, 2017, 105(10): 1865-1883.
[18] Xia G S, Hu J, Hu F, et al. AID: a benchmark data set for performance evaluation of aerial scene classification[J]. IEEE Transactions on Geoscience and Remote Sensing, 2017, 55(7): 3965-3981.
[19] Yang Y, Newsam S. Bag-of-visual-words and spatial extensions for land-use classification[C]∥Proceedings of the 18th SIGSPATIAL International Conference on Advances in Geographic Information Systems, San Jose, USA, 2010: 270-279.
[20] Hu J, Shen L, Sun G. Squeeze-and-excitation networks[C]∥Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Salt Lake City, USA, 2018: 7132-7141.
[21] Wang Q, Wu B, Zhu P, et al. ECA-Net: efficient channel attention for deep convolutional neural networks[C]∥Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Online, 2020: 11534-11542.
[22] Woo S, Park J, Lee J Y, et al. CBAM: convolutional block attention module[C]∥Proceedings of the European Conference on Computer Vision (ECCV), Munich, Germany, 2018: 3-19.
[23] Fu R, Hu Q, Dong X, et al. Axiom-based Grad-CAM: towards accurate visualization and explanation of CNNs[J/OL]. 2020.