Journal of Jilin University(Engineering and Technology Edition) ›› 2024, Vol. 54 ›› Issue (2): 524-532.doi: 10.13229/j.cnki.jdxbgxb.20221176


Channel attention bilinear metric network

Xiao-xu LI1, Wen-juan AN1, Ji-jie WU1, Zhen LI1, Ke ZHANG2,3, Zhan-yu MA4

  1. School of Computer and Communication, Lanzhou University of Technology, Lanzhou 730050, China
    2. Department of Electronics and Communication Engineering, North China Electric Power University, Baoding 071003, China
    3. Hebei Key Laboratory of Power Internet of Things Technology, North China Electric Power University, Baoding 071003, China
    4. Laboratory of Pattern Recognition and Intelligent System, School of Artificial Intelligence, Beijing University of Posts and Telecommunications, Beijing 100876, China
  • Received:2022-09-12 Online:2024-02-01 Published:2024-03-29

Abstract:

In few-shot image classification, when a model compares visually similar images from different classes, it tends to pay insufficient attention to locally important features of each sample and struggles to capture the subtle differences between similar images, leaving the classification boundary between query samples and the correct class prototype blurred. To address this, this paper proposes a Channel Attention Bilinear Metric Network (CABMN). CABMN first strengthens the model's attention to locally important regions of the image, and then applies a bilinear Hadamard-product operation to mine deep second-order feature information from those regions, enabling the model to localize the key local regions of an image more accurately. Comparative experiments show that CABMN improves classification performance on all tested datasets, reaching 86.19% and 81.51% accuracy on the fine-grained datasets CUB-200-2011 and Stanford-Cars, respectively.

Key words: few-shot learning, fine-grained image classification, metric learning, attention mechanism, Hadamard product
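The two-step computation the abstract describes, channel attention followed by a bilinear Hadamard-product metric, can be sketched in NumPy. This is only an illustrative toy under assumed design choices (SE-style sigmoid gating, signed square-root plus L2 normalisation, random weights standing in for learned parameters), not the authors' exact implementation:

```python
import numpy as np

rng = np.random.default_rng(0)

def channel_attention(feat, w1, w2):
    # feat: (C, H, W). Squeeze (global average pool) -> two-layer MLP -> sigmoid
    # gate -> reweight channels, so informative channels are emphasised.
    squeezed = feat.mean(axis=(1, 2))                       # (C,)
    gate = 1.0 / (1.0 + np.exp(-(w2 @ np.maximum(w1 @ squeezed, 0.0))))
    return feat * gate[:, None, None]

def bilinear_hadamard_score(q, p):
    # Element-wise (Hadamard) product of attended features captures
    # second-order channel interactions between query and prototype.
    inter = (q * p).reshape(-1)
    inter = np.sign(inter) * np.sqrt(np.abs(inter))         # signed sqrt norm
    inter /= np.linalg.norm(inter) + 1e-8                   # L2 normalisation
    return inter.sum()                                      # scalar similarity

C, H, W, r = 8, 4, 4, 2
w1 = rng.standard_normal((C // r, C)) * 0.1                 # toy gating weights
w2 = rng.standard_normal((C, C // r)) * 0.1

query = channel_attention(rng.standard_normal((C, H, W)), w1, w2)
protos = [channel_attention(rng.standard_normal((C, H, W)), w1, w2)
          for _ in range(3)]                                # 3 class prototypes
scores = [bilinear_hadamard_score(query, p) for p in protos]
pred = int(np.argmax(scores))                               # predicted class
```

In a trained network the gating weights and the feature extractor would be learned end to end; here they are random so the snippet only demonstrates the shape of the computation.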

CLC Number: TP391

Fig.1

Illustration of proposed CABMN for a few-shot learning task in the 3-way 1-shot setting
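For context, the "3-way 1-shot" setting in Fig. 1 means each evaluation episode samples 3 classes with 1 labelled support image per class, plus unlabelled query images. A minimal episode sampler under this convention (function and variable names are hypothetical) could look like:

```python
import random

def sample_episode(class_to_images, n_way=3, k_shot=1, n_query=2, seed=0):
    # Draw an N-way K-shot episode: n_way classes, k_shot support images
    # and n_query query images per class, labelled 0..n_way-1.
    rng = random.Random(seed)
    classes = rng.sample(sorted(class_to_images), n_way)
    support, query = [], []
    for label, cls in enumerate(classes):
        imgs = rng.sample(class_to_images[cls], k_shot + n_query)
        support += [(img, label) for img in imgs[:k_shot]]
        query += [(img, label) for img in imgs[k_shot:]]
    return support, query

# Toy dataset: 5 classes with 10 image ids each
data = {f"class{i}": [f"img{i}_{j}" for j in range(10)] for i in range(5)}
support, query = sample_episode(data)   # 3 support items, 6 query items
```

Accuracy in Tables 1-3 is then the fraction of query images assigned to the correct support class, averaged over many such episodes.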

Fig.2

Detailed structure diagram of the three modules in CABMN

Table 1

Few-shot classification performance on the CUB-200-2011, Stanford-Cars and Stanford-Dogs datasets under the Conv-4 network structure

| Method | CUB-200-2011 1-shot | CUB-200-2011 5-shot | Stanford-Cars 1-shot | Stanford-Cars 5-shot | Stanford-Dogs 1-shot | Stanford-Dogs 5-shot |
|---|---|---|---|---|---|---|
| Matching[5] | 60.06±0.88 | 74.57±0.73 | 44.73±0.77 | 64.74±0.72 | 46.10±0.86 | 59.79±0.72 |
| ProtoNet*[3] | 62.97±0.23 | 83.64±0.15 | 48.42±0.22 | 71.38±0.18 | 45.12±0.26 | 69.16±0.26 |
| RelationNet[16] | 63.94±0.92 | 77.87±0.64 | 46.04±0.91 | 68.52±0.78 | 47.35±0.88 | 66.20±0.74 |
| Baseline++*[17] | 62.36±0.84 | 79.08±0.61 | 46.82±0.76 | 68.20±0.72 | 44.49±0.70 | 64.48±0.66 |
| PARN[18] | 79.86±0.85 | 88.85±0.54 | 60.23±0.97 | 71.17±0.76 | 55.71±0.97 | 69.01±0.74 |
| DeepEMD[6] | 64.08±0.50 | 80.55±0.71 | 61.63±0.27 | 72.95±0.38 | 46.73±0.49 | 65.74±0.63 |
| LRPABN[1] | 63.63±0.77 | 76.06±0.58 | 60.28±0.76 | 73.29±0.58 | 45.72±0.75 | 60.94±0.66 |
| MattML[19] | 66.29±0.56 | 80.34±0.30 | 66.11±0.54 | 82.80±0.28 | 54.84±0.53 | 71.34±0.38 |
| BSNet(P&C)[20] | 55.81±0.97 | 76.34±0.65 | 44.56±0.83 | 63.72±0.78 | 43.14±0.85 | 62.61±0.73 |
| VFD[21] | 68.42±0.92 | 82.42±0.61 | - | - | 57.03±0.86 | 73.00±0.66 |
| MixtFSL*[22] | 53.61±0.88 | 73.24±0.75 | 44.56±0.80 | 59.63±0.79 | 43.96±0.77 | 64.43±0.68 |
| Ours | 71.15±0.24 | 86.19±0.14 | 66.20±0.24 | 81.51±0.16 | 57.64±0.23 | 76.61±0.16 |

Table 2

Few-shot classification performance on the CUB-200-2011, Stanford-Cars and Stanford-Dogs datasets under the ResNet-12 network structure

| Method | CUB-200-2011 1-shot | CUB-200-2011 5-shot | Stanford-Cars 1-shot | Stanford-Cars 5-shot | Stanford-Dogs 1-shot | Stanford-Dogs 5-shot |
|---|---|---|---|---|---|---|
| FEAT[23] | 73.27±0.22 | 85.77±0.14 | - | - | - | - |
| DeepEMD*[6] | 71.11±0.31 | 86.30±0.19 | 73.30±0.29 | 88.37±0.17 | 67.59±0.30 | 83.13±0.20 |
| RENet*[24] | 79.49±0.44 | 91.11±0.24 | 79.66±0.44 | 91.95±0.22 | 71.69±0.47 | 85.60±0.30 |
| MixtFSL*[22] | 67.86±0.94 | 82.18±0.66 | 58.15±0.87 | 80.54±0.63 | 67.26±0.90 | 82.05±0.56 |
| Ours | 79.23±0.21 | 90.07±0.12 | 82.83±0.20 | 91.95±0.23 | 72.29±0.22 | 86.38±0.13 |

Table 3

Classification performance on mini-ImageNet dataset under the Conv-4 network structure

| Method | mini-ImageNet 1-shot | mini-ImageNet 5-shot |
|---|---|---|
| Matching[5] | 48.14±0.78 | 63.48±0.66 |
| MAML[25] | 46.47±0.82 | 62.71±0.71 |
| MemoryNetwork[26] | 53.37±0.48 | 66.97±0.35 |
| RelationNet[16] | 49.31±0.85 | 66.60±0.69 |
| Baseline++[17] | 48.24±0.75 | 66.43±0.63 |
| Baseline[17] | 42.11±0.71 | 62.53±0.69 |
| Ours | 52.79±0.24 | 67.02±0.16 |

Fig.3

Ablation results of CABMN on four datasets

Table 4

Classification performance with different combinations of attention mechanisms

| Backbone | Method | CUB-200-2011 1-shot | CUB-200-2011 5-shot | Stanford-Cars 1-shot | Stanford-Cars 5-shot |
|---|---|---|---|---|---|
| Conv-4 | Ours(PSA+BM) | 70.31 | 84.56 | 65.41 | 81.35 |
| Conv-4 | Ours(PMA+BM) | 70.51 | 85.16 | 62.03 | 80.09 |
| Conv-4 | Ours(PCA+BM) | 71.15 | 86.19 | 66.20 | 81.51 |
| ResNet-12 | Ours(PSA+BM) | 77.56 | 89.31 | 83.14 | 91.89 |
| ResNet-12 | Ours(PMA+BM) | 76.92 | 89.85 | 81.71 | 91.81 |
| ResNet-12 | Ours(PCA+BM) | 79.23 | 90.07 | 82.83 | 91.95 |

Fig.4

Visualization of similarity scores of Ours(PSA+BM), Ours(PMA+BM) and Ours(PCA+BM) on the CUB-200-2011 and Stanford-Cars datasets

1 Huang H, Zhang J, Zhang J, et al. Low-rank pairwise alignment bilinear network for few-shot fine-grained image classification[J]. IEEE Transactions on Multimedia, 2020, 23: 1666-1680.
2 Cao Jie, Qu Xue, Li Xiao-xu. Few-shot image classification method based on sliding feature vectors[J]. Journal of Jilin University (Engineering and Technology Edition), 2021, 51(5): 1785-1791. (in Chinese)
3 Snell J, Swersky K, Zemel R. Prototypical networks for few-shot learning[C]∥Conference on Neural Information Processing Systems, Long Beach California, USA, 2017: 4080-4090.
4 Nguyen V N, Løkse S, Wickstrøm K, et al. Sen: a novel feature normalization dissimilarity measure for prototypical few-shot learning networks[C]∥European Conference on Computer Vision, Glasgow, UK, 2020: 118-134.
5 Vinyals O, Blundell C, Lillicrap T, et al. Matching networks for one shot learning[C]∥Conference on Neural Information Processing Systems, Barcelona, Spain, 2016: 3637-3645.
6 Zhang C, Cai Y, Lin G, et al. Deepemd: few-shot image classification with differentiable earth mover's distance and structured classifiers[C]∥IEEE Conference on Computer Vision and Pattern Recognition, Seattle, USA, 2020: 12203-12213.
7 Liu Ping-ping, Zhao Hong-wei, Geng Qing-tian, et al. Image classification method based on local feature and visual cortex recognition mechanism[J]. Journal of Jilin University (Engineering and Technology Edition), 2011, 41(5): 1401-1406. (in Chinese)
8 Huang K, Geng J, Jiang W, et al. Pseudo-loss confidence metric for semi-supervised few-shot learning[C]∥IEEE/CVF International Conference on Computer Vision, Montreal, Canada, 2021: 8671-8680.
9 Lin T Y, RoyChowdhury A, Maji S. Bilinear CNN models for fine-grained visual recognition[C]∥IEEE/CVF International Conference on Computer Vision, Santiago, Chile, 2015: 1449-1457.
10 Wah C, Branson S, Welinder P, et al. The caltech-ucsd birds-200-2011 dataset[J]. California Institute of Technology, 2011, 7: 7138640.
11 Khosla A, Jayadevaprakash N, Yao B, et al. Novel dataset for fine-grained image categorization: Stanford dogs[J/OL]. [2022-09-12].
12 Krause J, Stark M, Deng J, et al. 3D object representations for fine-grained categorization[C]∥IEEE International Conference on Computer Vision Workshops, Sydney, Australia, 2013: 554-561.
13 Maji S, Rahtu E, Kannala J, et al. Fine-grained visual classification of aircraft[J]. arXiv preprint arXiv:1306.5151, 2013.
14 Russakovsky O, Deng J, Su H, et al. ImageNet large scale visual recognition challenge[J]. International Journal of Computer Vision, 2015, 115(3): 211-252.
15 Liu Y, Lee J, Park M, et al. Transductive propagation network for few-shot learning[J]. arXiv preprint arXiv:1805.10002, 2018.
16 Sung F, Yang Y, Zhang L, et al. Learning to compare: relation network for few-shot learning[C]∥IEEE Conference on Computer Vision and Pattern Recognition, Salt Lake City, USA, 2018:1199-1208.
17 Chen W Y, Liu Y C, Kira Z, et al. A closer look at few-shot classification[J]. arXiv preprint arXiv:1904.04232, 2019.
18 Wu Z, Li Y, Guo L, et al. Parn: position-aware relation networks for few-shot learning[C]∥IEEE/CVF International Conference on Computer Vision, Seoul, Korea (South), 2019: 6659-6667.
19 Zhu Y, Liu C, Jiang S. Multi-attention meta learning for few-shot fine-grained image recognition[C]∥Proceedings of the Twenty-Ninth International Joint Conference on Artificial Intelligence Main track, Yokohama, Japan, 2020: 1090-1096.
20 Li X, Wu J, Sun Z, et al. BSNet: bi-similarity network for few-shot fine-grained image classification[J]. IEEE Transactions on Image Processing, 2020, 30: 1318-1331.
21 Xu J, Le H, Huang M, et al. Variational feature disentangling for fine-Grained few-shot classification[C]∥IEEE/CVF International Conference on Computer Vision, Montreal, Canada, 2021: 8792-8801.
22 Afrasiyabi A, Lalonde J F, Gagné C. Mixture-based feature space learning for few-shot image classification[C]∥IEEE International Conference on Computer Vision, Montreal, Canada, 2021: 9041-9051.
23 Ye H J, Hu H, Zhan D C, et al. Few-shot learning via embedding adaptation with set-to-set functions[C]∥IEEE Conference on Computer Vision and Pattern Recognition, Seattle, USA, 2020: 8808-8817.
24 Kang D, Kwon H, Min J, et al. Relational embedding for few-shot classification[C]∥IEEE/CVF International Conference on Computer Vision, Montreal, Canada, 2021: 8822-8833.
25 Finn C, Abbeel P, Levine S. Model-agnostic meta-learning for fast adaptation of deep networks[C]∥International Conference on Machine Learning, Sydney, Australia, 2017: 1126-1135.
26 Cai Q, Pan Y, Yao T, et al. Memory matching networks for one-shot image recognition[C]∥IEEE Conference on Computer Vision and Pattern Recognition, Salt Lake City, USA, 2018: 4080-4088.
27 Li X, Wu J, Chang D, et al. Mixed attention mechanism for small-sample fine-grained image classification[C]∥Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, Lanzhou, China, 2019: 80-85.