Journal of Jilin University (Engineering and Technology Edition), 2023, Vol. 53, Issue (12): 3518-3528. doi: 10.13229/j.cnki.jdxbgxb.20220166

Anchor-free target tracking algorithm based on multiple attention mechanism

Jing-hong LIU1, An-ping DENG1,2, Qi-qi CHEN1,2, Jia-qi PENG3, Yu-jia ZUO1

  1. Changchun Institute of Optics, Fine Mechanics and Physics, Chinese Academy of Sciences, Changchun 130033, China
  2. University of Chinese Academy of Sciences, Beijing 100039, China
  3. The First Military Representative Office of the Military Representative Bureau of the Army Equipment Department of the Chinese People's Liberation Army in Shenyang and in Changchun, Changchun 130022, China
  • Received: 2022-02-21  Online: 2023-12-01  Published: 2024-01-12

Abstract:

Siamese-network-based trackers have two branches that operate independently of each other and lack information interaction, so they cannot track accurately and robustly under challenges such as target occlusion and similar objects. To solve this problem, an anchor-free target tracking algorithm based on a multiple attention mechanism is proposed. The multiple attention mechanism encodes the features of the target template and the search area. After a self-attention mechanism improves feature saliency, a mutual attention mechanism aggregates the feature interaction between the target template and the search area, which strengthens the algorithm's ability to discriminate between target and background. At the same time, an anchor-free mechanism completes the end-to-end visual target tracking task pixel by pixel, avoiding the manual intervention required by anchor-box mechanisms. Extensive experiments were conducted on challenging benchmarks including OTB50, OTB100 and GOT-10K. The results show that the proposed algorithm is robust to target occlusion and similar objects, and that it effectively improves the precision rate and success rate of tracking.
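The self-attention and mutual attention steps described in the abstract can be illustrated with a minimal NumPy sketch. This is not the authors' network: it assumes plain scaled dot-product attention with a residual connection and no learned projections, and the feature sizes (a 7×7 template map, a 15×15 search map, 16 channels) as well as the names `softmax` and `attention` are hypothetical choices made only for illustration.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax along the given axis.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def attention(query_feat, context_feat):
    """Scaled dot-product attention with a residual connection.

    query_feat:   (Nq, C) flattened feature map that is being refined
    context_feat: (Nk, C) flattened feature map that is attended to
    Returns the refined (Nq, C) features and the (Nq, Nk) attention weights.
    """
    channels = query_feat.shape[1]
    weights = softmax(query_feat @ context_feat.T / np.sqrt(channels), axis=-1)
    return query_feat + weights @ context_feat, weights

# Hypothetical backbone outputs, flattened to (locations, channels).
rng = np.random.default_rng(0)
template = rng.standard_normal((7 * 7, 16))
search = rng.standard_normal((15 * 15, 16))

# Step 1: self-attention, each branch attends to itself.
template_sa, _ = attention(template, template)
search_sa, _ = attention(search, search)

# Step 2: mutual attention, each branch attends to the other branch.
search_out, cross_weights = attention(search_sa, template_sa)
template_out, _ = attention(template_sa, search_sa)

print(search_out.shape, template_out.shape, cross_weights.shape)
```

In this sketch every search-area location receives a weighted sum of template features (and vice versa), which is one simple way the two otherwise independent Siamese branches can exchange information before the tracking head.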

Key words: computer vision, object tracking, attention mechanism, anchor-free

CLC Number: TP391.4

Fig.1 Framework diagram of the proposed algorithm

Fig.2 Spatial attention mechanism map

Fig.3 Channel attention mechanism map

Fig.4 Cross attention mechanism map

Fig.5 Comparison chart on OTB50

Fig.6 Comparison of accuracy and success rate for 4 attributes on the OTB50 dataset

Fig.7 Comparison of heat maps

Fig.8 Comparison chart of qualitative results

Table 1 Experimental comparison results of different algorithms on the GOT-10K dataset

Metric   SiamFC   SiamRPN   SiamRPN++   ATOM    SiamCAR   Ours
AO       0.374    0.463     0.516       0.556   0.569     0.576
SR0.50   0.404    0.549     0.620       0.634   0.670     0.672
SR0.75   0.144    0.253     0.334       0.402   0.415     0.439
FPS      25.8     74        26          21      18        17
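For readers unfamiliar with the metrics in Table 1, the sketch below shows how average overlap (AO) and success rate (SR) at the 0.50 and 0.75 thresholds can be computed from per-frame bounding boxes. It is a simplified per-frame version; the official GOT-10K toolkit has its own averaging protocol, and the boxes and the function names `iou` and `overlap_metrics` here are made up for illustration.

```python
def iou(box_a, box_b):
    """Intersection over union of two boxes given as (x, y, width, height)."""
    ax, ay, aw, ah = box_a
    bx, by, bw, bh = box_b
    inter_w = max(0.0, min(ax + aw, bx + bw) - max(ax, bx))
    inter_h = max(0.0, min(ay + ah, by + bh) - max(ay, by))
    inter = inter_w * inter_h
    union = aw * ah + bw * bh - inter
    return inter / union if union > 0 else 0.0

def overlap_metrics(predicted, ground_truth):
    """AO is the mean IoU; SR_t is the fraction of frames whose IoU exceeds t."""
    overlaps = [iou(p, g) for p, g in zip(predicted, ground_truth)]
    n = len(overlaps)
    return {
        "AO": sum(overlaps) / n,
        "SR0.50": sum(o > 0.50 for o in overlaps) / n,
        "SR0.75": sum(o > 0.75 for o in overlaps) / n,
    }

# Toy example: four frames with a fixed ground-truth box (hypothetical data).
ground_truth = [(0, 0, 10, 10)] * 4
predicted = [(0, 0, 10, 10), (0, 0, 10, 8), (5, 0, 10, 10), (20, 20, 10, 10)]
metrics = overlap_metrics(predicted, ground_truth)
print(metrics)
```

The four toy frames have overlaps of 1.0, 0.8, 1/3 and 0, so two of the four frames pass both thresholds while AO averages all four overlaps.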

Table 2 Comparative experiment on operational efficiency

Algorithm   FLOPs   Params   FPS
SiamCAR     83.2G   91.9M    18
Ours        84.5G   93.9M    17
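The cost of the added attention modules relative to SiamCAR can be read off Table 2 with a small calculation. The snippet below only reuses the FLOPs, parameter and FPS values reported in the table; the dictionary keys and the helper `relative_change` are illustrative names.

```python
# Values copied from Table 2 (FLOPs in G, parameters in M, speed in FPS).
siamcar = {"FLOPs": 83.2, "Params": 91.9, "FPS": 18}
proposed = {"FLOPs": 84.5, "Params": 93.9, "FPS": 17}

def relative_change(new, old):
    """Percentage change of new with respect to old."""
    return (new - old) / old * 100.0

overhead = {key: relative_change(proposed[key], siamcar[key]) for key in siamcar}
for key, value in overhead.items():
    print(f"{key}: {value:+.1f}%")
```

Under these figures the proposed tracker needs about 1.6% more FLOPs and 2.2% more parameters than SiamCAR, and runs about 5.6% slower.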