Journal of Jilin University(Engineering and Technology Edition) ›› 2024, Vol. 54 ›› Issue (2): 524-532.doi: 10.13229/j.cnki.jdxbgxb.20221176


Channel attention bilinear metric network

Xiao-xu LI1, Wen-juan AN1, Ji-jie WU1, Zhen LI1, Ke ZHANG2,3, Zhan-yu MA4

  1. School of Computer and Communication, Lanzhou University of Technology, Lanzhou 730050, China
    2. Department of Electronics and Communication Engineering, North China Electric Power University, Baoding 071003, China
    3. Hebei Key Laboratory of Power Internet of Things Technology, North China Electric Power University, Baoding 071003, China
    4. Laboratory of Pattern Recognition and Intelligent System, School of Artificial Intelligence, Beijing University of Posts and Telecommunications, Beijing 100876, China
  • Received:2022-09-12 Online:2024-02-01 Published:2024-03-29

Abstract:

In few-shot image classification, when a model compares visually similar images from different classes, it tends to pay insufficient attention to locally important features of each sample and struggles to capture the subtle differences between similar images, leaving the classification boundary between query samples and the correct class prototype blurred. To address this, this paper proposes a Channel Attention Bilinear Metric Network (CABMN). CABMN first strengthens the model's attention to locally important regions of the image, and then applies a bilinear Hadamard-product operation to mine deep second-order feature information from those regions, enabling the model to localize the key local regions of an image more accurately. Comparative experiments show that CABMN improves classification performance on all tested datasets, reaching 86.19% and 81.51% accuracy on the fine-grained datasets CUB-200-2011 and Stanford-Cars, respectively.

Key words: few-shot learning, fine-grained image classification, metric learning, attention mechanism, Hadamard product
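The two-step computation the abstract describes, channel attention followed by a bilinear Hadamard-product metric, can be sketched in NumPy. This is only an illustrative toy under assumed design choices (SE-style sigmoid gating, signed square-root plus L2 normalisation, random weights standing in for learned parameters), not the authors' exact implementation:

```python
import numpy as np

rng = np.random.default_rng(0)

def channel_attention(feat, w1, w2):
    # feat: (C, H, W). Squeeze (global average pool) -> two-layer MLP -> sigmoid
    # gate -> reweight channels, so informative channels are emphasised.
    squeezed = feat.mean(axis=(1, 2))                       # (C,)
    gate = 1.0 / (1.0 + np.exp(-(w2 @ np.maximum(w1 @ squeezed, 0.0))))
    return feat * gate[:, None, None]

def bilinear_hadamard_score(q, p):
    # Element-wise (Hadamard) product of attended features captures
    # second-order channel interactions between query and prototype.
    inter = (q * p).reshape(-1)
    inter = np.sign(inter) * np.sqrt(np.abs(inter))         # signed sqrt norm
    inter /= np.linalg.norm(inter) + 1e-8                   # L2 normalisation
    return inter.sum()                                      # scalar similarity

C, H, W, r = 8, 4, 4, 2
w1 = rng.standard_normal((C // r, C)) * 0.1                 # toy gating weights
w2 = rng.standard_normal((C, C // r)) * 0.1

query = channel_attention(rng.standard_normal((C, H, W)), w1, w2)
protos = [channel_attention(rng.standard_normal((C, H, W)), w1, w2)
          for _ in range(3)]                                # 3 class prototypes
scores = [bilinear_hadamard_score(query, p) for p in protos]
pred = int(np.argmax(scores))                               # predicted class
```

In a trained network the gating weights and the feature extractor would be learned end to end; here they are random so the snippet only demonstrates the shape of the computation.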

CLC Number: TP391

Fig.1

Illustration of proposed CABMN for a few-shot learning task in the 3-way 1-shot setting
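For context, the "3-way 1-shot" setting in Fig. 1 means each evaluation episode samples 3 classes with 1 labelled support image per class, plus unlabelled query images. A minimal episode sampler under this convention (function and variable names are hypothetical) could look like:

```python
import random

def sample_episode(class_to_images, n_way=3, k_shot=1, n_query=2, seed=0):
    # Draw an N-way K-shot episode: n_way classes, k_shot support images
    # and n_query query images per class, labelled 0..n_way-1.
    rng = random.Random(seed)
    classes = rng.sample(sorted(class_to_images), n_way)
    support, query = [], []
    for label, cls in enumerate(classes):
        imgs = rng.sample(class_to_images[cls], k_shot + n_query)
        support += [(img, label) for img in imgs[:k_shot]]
        query += [(img, label) for img in imgs[k_shot:]]
    return support, query

# Toy dataset: 5 classes with 10 image ids each
data = {f"class{i}": [f"img{i}_{j}" for j in range(10)] for i in range(5)}
support, query = sample_episode(data)   # 3 support items, 6 query items
```

Accuracy in Tables 1-3 is then the fraction of query images assigned to the correct support class, averaged over many such episodes.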

Fig.2

Detailed structure diagram of the three modules in CABMN

Table 1

Few-shot classification performance on the CUB-200-2011, Stanford-Cars and Stanford-Dogs datasets under the Conv-4 network structure

| Method | CUB-200-2011 1-shot | CUB-200-2011 5-shot | Stanford-Cars 1-shot | Stanford-Cars 5-shot | Stanford-Dogs 1-shot | Stanford-Dogs 5-shot |
|---|---|---|---|---|---|---|
| Matching[5] | 60.06±0.88 | 74.57±0.73 | 44.73±0.77 | 64.74±0.72 | 46.10±0.86 | 59.79±0.72 |
| ProtoNet*[3] | 62.97±0.23 | 83.64±0.15 | 48.42±0.22 | 71.38±0.18 | 45.12±0.26 | 69.16±0.26 |
| RelationNet[16] | 63.94±0.92 | 77.87±0.64 | 46.04±0.91 | 68.52±0.78 | 47.35±0.88 | 66.20±0.74 |
| Baseline++*[17] | 62.36±0.84 | 79.08±0.61 | 46.82±0.76 | 68.20±0.72 | 44.49±0.70 | 64.48±0.66 |
| PARN[18] | 79.86±0.85 | 88.85±0.54 | 60.23±0.97 | 71.17±0.76 | 55.71±0.97 | 69.01±0.74 |
| DeepEMD[6] | 64.08±0.50 | 80.55±0.71 | 61.63±0.27 | 72.95±0.38 | 46.73±0.49 | 65.74±0.63 |
| LRPABN[1] | 63.63±0.77 | 76.06±0.58 | 60.28±0.76 | 73.29±0.58 | 45.72±0.75 | 60.94±0.66 |
| MattML[19] | 66.29±0.56 | 80.34±0.30 | 66.11±0.54 | 82.80±0.28 | 54.84±0.53 | 71.34±0.38 |
| BSNet(P&C)[20] | 55.81±0.97 | 76.34±0.65 | 44.56±0.83 | 63.72±0.78 | 43.14±0.85 | 62.61±0.73 |
| VFD[21] | 68.42±0.92 | 82.42±0.61 | - | - | 57.03±0.86 | 73.00±0.66 |
| MixtFSL*[22] | 53.61±0.88 | 73.24±0.75 | 44.56±0.80 | 59.63±0.79 | 43.96±0.77 | 64.43±0.68 |
| Ours | 71.15±0.24 | 86.19±0.14 | 66.20±0.24 | 81.51±0.16 | 57.64±0.23 | 76.61±0.16 |

Table 2

Few-shot classification performance on the CUB-200-2011, Stanford-Cars and Stanford-Dogs datasets under the ResNet-12 network structure

| Method | CUB-200-2011 1-shot | CUB-200-2011 5-shot | Stanford-Cars 1-shot | Stanford-Cars 5-shot | Stanford-Dogs 1-shot | Stanford-Dogs 5-shot |
|---|---|---|---|---|---|---|
| FEAT[23] | 73.27±0.22 | 85.77±0.14 | - | - | - | - |
| DeepEMD*[6] | 71.11±0.31 | 86.30±0.19 | 73.30±0.29 | 88.37±0.17 | 67.59±0.30 | 83.13±0.20 |
| RENet*[24] | 79.49±0.44 | 91.11±0.24 | 79.66±0.44 | 91.95±0.22 | 71.69±0.47 | 85.60±0.30 |
| MixtFSL*[22] | 67.86±0.94 | 82.18±0.66 | 58.15±0.87 | 80.54±0.63 | 67.26±0.90 | 82.05±0.56 |
| Ours | 79.23±0.21 | 90.07±0.12 | 82.83±0.20 | 91.95±0.23 | 72.29±0.22 | 86.38±0.13 |

Table 3

Classification performance on mini-ImageNet dataset under the Conv-4 network structure

| Method | mini-ImageNet 1-shot | mini-ImageNet 5-shot |
|---|---|---|
| Matching[5] | 48.14±0.78 | 63.48±0.66 |
| MAML[25] | 46.47±0.82 | 62.71±0.71 |
| MemoryNetwork[26] | 53.37±0.48 | 66.97±0.35 |
| RelationNet[16] | 49.31±0.85 | 66.60±0.69 |
| Baseline++[17] | 48.24±0.75 | 66.43±0.63 |
| Baseline[17] | 42.11±0.71 | 62.53±0.69 |
| Ours | 52.79±0.24 | 67.02±0.16 |

Fig.3

Ablation results of CABMN on four datasets

Table 4

Classification performance with different combinations of attention mechanisms

| Backbone | Method | CUB-200-2011 1-shot | CUB-200-2011 5-shot | Stanford-Cars 1-shot | Stanford-Cars 5-shot |
|---|---|---|---|---|---|
| Conv-4 | Ours(PSA+BM) | 70.31 | 84.56 | 65.41 | 81.35 |
| Conv-4 | Ours(PMA+BM) | 70.51 | 85.16 | 62.03 | 80.09 |
| Conv-4 | Ours(PCA+BM) | 71.15 | 86.19 | 66.20 | 81.51 |
| ResNet-12 | Ours(PSA+BM) | 77.56 | 89.31 | 83.14 | 91.89 |
| ResNet-12 | Ours(PMA+BM) | 76.92 | 89.85 | 81.71 | 91.81 |
| ResNet-12 | Ours(PCA+BM) | 79.23 | 90.07 | 82.83 | 91.95 |

Fig.4

Visualization of similarity scores of Ours(PSA+BM), Ours(PMA+BM) and Ours(PCA+BM) on the CUB-200-2011 and Stanford-Cars datasets

1 Huang H, Zhang J, Zhang J, et al. Low-rank pairwise alignment bilinear network for few-shot fine-grained image classification[J]. IEEE Transactions on Multimedia, 2020, 23: 1666-1680.
2 Cao Jie, Qu Xue, Li Xiao-xu. Few-shot image classification method based on sliding feature vectors[J]. Journal of Jilin University (Engineering and Technology Edition), 2021, 51(5): 1785-1791. (in Chinese)
3 Snell J, Swersky K, Zemel R. Prototypical networks for few-shot learning[C]∥Conference on Neural Information Processing Systems, Long Beach California, USA, 2017: 4080-4090.
4 Nguyen V N, Løkse S, Wickstrøm K, et al. Sen: a novel feature normalization dissimilarity measure for prototypical few-shot learning networks[C]∥European Conference on Computer Vision, Glasgow, UK, 2020: 118-134.
5 Vinyals O, Blundell C, Lillicrap T, et al. Matching networks for one shot learning[C]∥Conference on Neural Information Processing Systems, Barcelona, Spain, 2016: 3637-3645.
6 Zhang C, Cai Y, Lin G, et al. Deepemd: few-shot image classification with differentiable earth mover's distance and structured classifiers[C]∥IEEE Conference on Computer Vision and Pattern Recognition, Seattle, USA, 2020: 12203-12213.
7 Liu Ping-ping, Zhao Hong-wei, Geng Qing-tian, et al. Image classification method based on local feature and visual cortex recognition mechanism[J]. Journal of Jilin University (Engineering and Technology Edition), 2011, 41(5): 1401-1406. (in Chinese)
8 Huang K, Geng J, Jiang W, et al. Pseudo-loss confidence metric for semi-supervised few-shot learning[C]∥IEEE/CVF International Conference on Computer Vision, Montreal, Canada, 2021: 8671-8680.
9 Lin T Y, RoyChowdhury A, Maji S. Bilinear CNN models for fine-grained visual recognition[C]∥IEEE/CVF International Conference on Computer Vision, Santiago, Chile, 2015: 1449-1457.
10 Wah C, Branson S, Welinder P, et al. The caltech-ucsd birds-200-2011 dataset[J]. California Institute of Technology, 2011, 7: 7138640.
11 Khosla A, Jayadevaprakash N, Yao B, et al. Novel dataset for fine-grained image categorization: Stanford dogs[J/OL]. [2022-09-12].
12 Krause J, Stark M, Deng J, et al. 3D object representations for fine-grained categorization[C]∥IEEE International Conference on Computer Vision Workshops, Sydney, Australia, 2013: 554-561.
13 Maji S, Rahtu E, Kannala J, et al. Fine-grained visual classification of aircraft[J]. arXiv preprint arXiv:1306.5151, 2013.
14 Russakovsky O, Deng J, Su H, et al. ImageNet large scale visual recognition challenge[J]. International Journal of Computer Vision, 2015, 115(3): 211-252.
15 Liu Y, Lee J, Park M, et al. Transductive propagation network for few-shot learning[J]. arXiv preprint arXiv:1805.10002, 2018.
16 Sung F, Yang Y, Zhang L, et al. Learning to compare: relation network for few-shot learning[C]∥IEEE Conference on Computer Vision and Pattern Recognition, Salt Lake City, USA, 2018:1199-1208.
17 Chen W Y, Liu Y C, Kira Z, et al. A closer look at few-shot classification[J]. arXiv preprint arXiv:1904.04232, 2019.
18 Wu Z, Li Y, Guo L, et al. Parn: position-aware relation networks for few-shot learning[C]∥IEEE/CVF International Conference on Computer Vision, Seoul, Korea (South), 2019: 6659-6667.
19 Zhu Y, Liu C, Jiang S. Multi-attention meta learning for few-shot fine-grained image recognition[C]∥Proceedings of the Twenty-Ninth International Joint Conference on Artificial Intelligence Main track, Yokohama, Japan, 2020: 1090-1096.
20 Li X, Wu J, Sun Z, et al. BSNet: bi-similarity network for few-shot fine-grained image classification[J]. IEEE Transactions on Image Processing, 2020, 30: 1318-1331.
21 Xu J, Le H, Huang M, et al. Variational feature disentangling for fine-Grained few-shot classification[C]∥IEEE/CVF International Conference on Computer Vision, Montreal, Canada, 2021: 8792-8801.
22 Afrasiyabi A, Lalonde J F, Gagné C. Mixture-based feature space learning for few-shot image classification[C]∥IEEE International Conference on Computer Vision, Montreal, Canada, 2021: 9041-9051.
23 Ye H J, Hu H, Zhan D C, et al. Few-shot learning via embedding adaptation with set-to-set functions[C]∥IEEE Conference on Computer Vision and Pattern Recognition, Seattle, USA, 2020: 8808-8817.
24 Kang D, Kwon H, Min J, et al. Relational embedding for few-shot classification[C]∥IEEE/CVF International Conference on Computer Vision, Montreal, Canada, 2021: 8822-8833.
25 Finn C, Abbeel P, Levine S. Model-agnostic meta-learning for fast adaptation of deep networks[C]∥International Conference on Machine Learning, Sydney, Australia, 2017: 1126-1135.
26 Cai Q, Pan Y, Yao T, et al. Memory matching networks for one-shot image recognition[C]∥IEEE Conference on Computer Vision and Pattern Recognition, Salt Lake City, USA, 2018: 4080-4088.
27 Li X, Wu J, Chang D, et al. Mixed attention mechanism for small-sample fine-grained image classification[C]∥Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, Lanzhou, China, 2019: 80-85.