Journal of Jilin University (Engineering and Technology Edition), 2025, Vol. 55, Issue (10): 3384-3393. doi: 10.13229/j.cnki.jdxbgxb.20231415

Multi-pedestrian tracking based on Transformer double branch detection and re-identification

Dan-dan HUANG¹, Xin-ru ZHANG¹, Zhi LIU¹,², Gang PENG³

  1. School of Electronics and Information Engineering, Changchun University of Science and Technology, Changchun 130022, China
  2. National and Local Joint Engineering Research Center of Space Photoelectric Technology, Changchun University of Science and Technology, Changchun 130022, China
  3. Changchun Shikai Technology Industry Co., Research and Development Center, Changchun 130015, China
  • Received: 2023-12-17 Online: 2025-10-01 Published: 2026-02-03

Abstract:

To address false and missed detections, inaccurate association, and re-identification errors in multi-target tracking of dense pedestrian scenes, this study proposes a Transformer-based multi-pedestrian tracking network. The algorithm consists of three modules: detection, data association, and tracking. The detection module adopts the selective query recollection method to strengthen the decoder's collection of key features and improve the model's ability to represent targets, which reduces false and missed detections. The data association module fuses a bilinear LSTM with a secondary (two-stage) data association strategy to resolve the inaccurate association of dense pedestrians caused by similar target appearance. Finally, in the tracking module, the attention pyramid is embedded into the pyramid spatio-temporal aggregation module to capture spatio-temporal information of the feature maps at different scales, which improves the accuracy of target re-identification. The proposed network is evaluated on the public MOT16 and MOT17 datasets, and the experimental results show that the method achieves more accurate multi-pedestrian tracking than the compared methods.
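The secondary data association mentioned in the abstract can be illustrated with a minimal sketch in the spirit of ByteTrack [12]: high-confidence detections are matched to tracks first, and the tracks left over are then matched against low-confidence detections. This is not the authors' implementation. Plain IoU with greedy matching stands in for the learned bilinear LSTM affinity, and the function names and the thresholds `high`, `low`, and `min_iou` are hypothetical values chosen for illustration.

```python
from typing import Dict, List, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2)


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def greedy_match(tracks: Dict[int, Box], det_idx: List[int],
                 boxes: List[Box], min_iou: float) -> Dict[int, int]:
    """Pair track ids with detection indices in order of decreasing IoU."""
    pairs = sorted(
        ((iou(tb, boxes[d]), tid, d) for tid, tb in tracks.items() for d in det_idx),
        reverse=True,
    )
    matched: Dict[int, int] = {}
    used_dets = set()
    for score, tid, d in pairs:
        if score < min_iou:
            break  # every remaining pair overlaps even less
        if tid in matched or d in used_dets:
            continue
        matched[tid] = d
        used_dets.add(d)
    return matched


def two_stage_associate(tracks: Dict[int, Box], boxes: List[Box], scores: List[float],
                        high: float = 0.6, low: float = 0.1, min_iou: float = 0.3):
    """Return (track id -> detection index, unmatched high-score detections)."""
    # Thresholds are illustrative assumptions, not values from the paper.
    high_idx = [i for i, s in enumerate(scores) if s >= high]
    low_idx = [i for i, s in enumerate(scores) if low <= s < high]
    # Stage 1: tracks against confident detections.
    first = greedy_match(tracks, high_idx, boxes, min_iou)
    # Stage 2: still-unmatched tracks against low-score detections.
    remaining = {t: b for t, b in tracks.items() if t not in first}
    second = greedy_match(remaining, low_idx, boxes, min_iou)
    unmatched_high = [i for i in high_idx if i not in first.values()]
    return {**first, **second}, unmatched_high


tracks = {1: (0, 0, 10, 10), 2: (20, 20, 30, 30)}
boxes = [(1, 1, 11, 11), (21, 21, 31, 31), (50, 50, 60, 60)]
scores = [0.9, 0.3, 0.8]
print(two_stage_associate(tracks, boxes, scores))  # ({1: 0, 2: 1}, [2])
```

In this toy frame, track 2 is recovered only in the second stage, from a detection scoring 0.3 that a single-threshold matcher would have discarded. The unmatched high-score detection (index 2) would be a candidate for starting a new track.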

Key words: computer vision, multi-pedestrian tracking, re-identification, data association

CLC Number: TP391

Fig.1 Overall framework

Fig.2 Structure of selective query recollection module

Fig.3 APNet structure diagram

Fig.4 ASTAM structure diagram

Fig.5 Data association block diagram

Table 1 Effect of adding SQR, BL_Byte, and PA on the MOT17 dataset

| Method | MOTA | HOTA | AssA | DetA | IDF1 |
|---|---|---|---|---|---|
| Baseline | 66.86 | 49.18 | 46.07 | 53.14 | 60.79 |
| +SQR | 67.43 | 50.75 | 48.20 | 53.97 | 61.57 |
| +SQR+BL_Byte | 68.27 | 54.23 | 53.95 | 55.07 | 67.02 |
| +SQR+BL_Byte+PA | 68.77 | 55.15 | 55.66 | 55.10 | 68.94 |
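To read the ablation more easily, the gain contributed by each added component can be computed as the difference between consecutive rows of Table 1. The sketch below uses only the values reported in the table; the names `ABLATION` and `step_gains` are introduced here for illustration.

```python
METRICS = ("MOTA", "HOTA", "AssA", "DetA", "IDF1")

# Rows of Table 1 (MOT17), in the order the components were added.
ABLATION = [
    ("Baseline", (66.86, 49.18, 46.07, 53.14, 60.79)),
    ("+SQR", (67.43, 50.75, 48.20, 53.97, 61.57)),
    ("+SQR+BL_Byte", (68.27, 54.23, 53.95, 55.07, 67.02)),
    ("+SQR+BL_Byte+PA", (68.77, 55.15, 55.66, 55.10, 68.94)),
]


def step_gains(rows):
    """Difference of each row from the row before it, rounded to 2 decimals."""
    gains = []
    for (_, prev), (name, cur) in zip(rows, rows[1:]):
        gains.append((name, tuple(round(c - p, 2) for p, c in zip(prev, cur))))
    return gains


for name, deltas in step_gains(ABLATION):
    print(name, dict(zip(METRICS, deltas)))
```

The largest single-step gains come from adding BL_Byte: 5.75 points of AssA and 5.45 points of IDF1, the two association-oriented metrics, while DetA moves by only 1.10 points in that step.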

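Table 2 Comparison of experimental results on the MOT16 test dataset

| Method | MOTA | HOTA | AssA | DetA | IDF1 |
|---|---|---|---|---|---|
| CTracker-V1 [20] | 67.60 | 48.80 | 43.70 | 54.90 | 57.20 |
| GSDT [21] | 66.70 | 55.90 | 54.90 | 57.20 | 69.20 |
| Tube_TK [22] | 64.00 | 48.70 | 45.50 | 52.50 | 59.40 |
| KDNT [23] | 68.20 | 50.10 | 45.20 | 56.00 | 60.00 |
| Hugmot2 [24] | 70.20 | 52.10 | 49.50 | 55.30 | 65.40 |
| TraDes [25] | 70.10 | 53.20 | 50.90 | 56.20 | 64.70 |
| CNNMTT [26] | 65.20 | 49.40 | 47.00 | 52.20 | 62.20 |
| SmartSORT [27] | 60.40 | 46.10 | 41.90 | 51.00 | 56.10 |
| TransTrack | 68.84 | 47.41 | 40.93 | 55.53 | 56.60 |
| Ours | 70.52 | 55.86 | 55.76 | 56.53 | 69.73 |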

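Table 3 Comparison of experimental results on the MOT17 test dataset

| Method | MOTA | HOTA | AssA | DetA | IDF1 |
|---|---|---|---|---|---|
| CTracker-V1 [20] | 66.60 | 49.00 | 45.20 | 53.60 | 57.40 |
| GSDT [21] | 66.20 | 55.50 | 54.80 | 56.40 | 68.70 |
| Tube_TK [22] | 63.00 | 48.00 | 45.10 | 51.40 | 58.60 |
| CJTracker [28] | 58.70 | 48.40 | 48.00 | 49.10 | 58.20 |
| Hugmot2 [24] | 68.80 | 51.50 | 49.40 | 54.10 | 64.60 |
| TraDes [25] | 69.10 | 52.70 | 50.80 | 55.20 | 63.90 |
| MENDER [29] | 65.00 | 53.90 | 54.40 | 53.60 | 67.10 |
| QuasiDense [30] | 68.70 | 53.90 | 52.70 | 55.60 | 66.30 |
| CTTrack17 [31] | 67.80 | 52.20 | 51.00 | 53.80 | 64.70 |
| TransTrack | 66.86 | 49.18 | 46.07 | 53.14 | 60.79 |
| Ours | 68.77 | 55.15 | 55.66 | 55.10 | 68.94 |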
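The relation between the HOTA, DetA, and AssA columns can be checked with a short calculation. At a single localization threshold, HOTA is defined as the geometric mean of detection accuracy and association accuracy. The reported scores are averaged over thresholds, so the square root of the reported DetA times the reported AssA is only an approximation of the reported HOTA. The sketch below applies this to two MOT17 rows; `hota_estimate` is a name introduced here for illustration.

```python
import math

# TransTrack baseline and the proposed method on the MOT17 test set (Table 3).
MOT17 = {
    "TransTrack": {"HOTA": 49.18, "AssA": 46.07, "DetA": 53.14},
    "Ours": {"HOTA": 55.15, "AssA": 55.66, "DetA": 55.10},
}


def hota_estimate(det_a: float, ass_a: float) -> float:
    """Geometric mean of DetA and AssA, an approximation of averaged HOTA."""
    return math.sqrt(det_a * ass_a)


for name, m in MOT17.items():
    est = hota_estimate(m["DetA"], m["AssA"])
    print(f"{name}: reported HOTA {m['HOTA']:.2f}, sqrt(DetA*AssA) {est:.2f}")
```

For both rows the estimate lands within half a point of the reported HOTA. Read this way, most of the HOTA improvement over TransTrack is attributable to the rise in AssA (46.07 to 55.66), since DetA changes far less (53.14 to 55.10).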

Fig.6 MOT17-02 visualization image

Fig.7 MOT16-11 visualization image

[1] 丁贵鹏, 陶钢, 庞春桥, 等. 基于无锚的轻量化孪生网络目标跟踪算法[J]. 吉林大学学报: 理学版, 2023, 61(4): 890-898.
Ding Gui-peng, Tao Gang, Pang Chun-qiao, et al. Anchorless target tracking algorithm for lightweight siamese network[J]. Journal of Jilin University (Science Edition), 2023, 61(4): 890-898.
[2] 徐涛, 马克, 刘才华. 基于深度学习的行人多目标跟踪方法[J]. 吉林大学学报: 工学版, 2021, 51(1): 27-38.
Xu Tao, Ma Ke, Liu Cai-hua, et al. Multi-object pedestrian tracking based on deep learning[J]. Journal of Jilin University (Engineering and Technology Edition), 2021, 51(1): 27-38.
[3] Wojke N, Bewley A, Paulus D. Simple online and realtime tracking with a deep association metric[C]∥IEEE International Conference on Image Processing (ICIP), Beijing, China, 2017: 3645-3649.
[4] Zhang Y, Wang C, Wang X, et al. FairMOT: on the fairness of detection and re-identification in multiple object tracking[J]. International Journal of Computer Vision, 2021, 129: 3069-3087.
[5] Xu Y, Ban Y, Delorme G, et al. TransCenter: Transformers with dense representations for multiple-object tracking[J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2022, 45(6): 7820-7835.
[6] Zhou X, Yin T, Koltun V, et al. Global tracking transformers[C]∥Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, New Orleans, USA, 2022: 8771-8780.
[7] Cai J, Xu M, Li W, et al. MeMOT: multi-object tracking with memory[C]∥Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, New Orleans, USA, 2022: 8090-8100.
[8] Chen F, Zhang H, Hu K, et al. Enhanced training of query-based object detection via selective query recollection[C]∥Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Vancouver, Canada, 2023: 23756-23765.
[9] Wang Y, Zhang P, Gao S, et al. Pyramid spatial-temporal aggregation for video-based person re-identification[C]∥Proceedings of the IEEE/CVF International Conference on Computer Vision, Montreal, Canada, 2021: 12026-12035.
[10] Chen G, Gu T, Lu J, et al. Person re-identification via attention pyramid[J]. IEEE Transactions on Image Processing, 2021, 30: 7663-7676.
[11] Kim C, Li F X, Alotaibi M, et al. Discriminative appearance modeling with multi-track pooling for real-time multi-object tracking[C]∥Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Nashville, USA, 2021: 9553-9562.
[12] Zhang Y, Sun P, Jiang Y, et al. ByteTrack: multi-object tracking by associating every detection box[C]∥The 17th European Conference on Computer Vision, Tel Aviv, Israel, 2022: 1-21.
[13] Meinhardt T, Kirillov A, Leal-Taixe L, et al. TrackFormer: multi-object tracking with transformers[C]∥Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, New Orleans, USA, 2022: 8844-8854.
[14] Zeng F, Dong B, Zhang Y, et al. MOTR: end-to-end multiple-object tracking with transformer[C]∥The 17th European Conference on Computer Vision, Tel Aviv, Israel, 2022: 659-675.
[15] Sun P, Cao J, Jiang Y, et al. TransTrack: multiple object tracking with transformer[J/OL]. [2023-11-20].
[16] Zhu X, Su W, Lu L, et al. Deformable DETR: deformable transformers for end-to-end object detection[J/OL]. [2023-11-21].
[17] 庄珊娜, 王君帅, 白晶, 等. 基于三维卷积与自注意力机制的视频行人重识别[J]. 吉林大学学报: 工学版, 2025, 55(7): 2409-2417.
Zhuang Shan-na, Wang Jun-shuai, Bai Jing, et al. Video-based person re-identification based on three-dimensional convolution and self-attention mechanism[J]. Journal of Jilin University (Engineering and Technology Edition), 2025, 55(7): 2409-2417.
[18] 涂淑琴, 黄正鑫, 梁云, 等. 改进TransTrack多目标生猪行为跟踪方法[J]. 农业工程学报, 2023, 39(15): 172-180.
Tu Shu-qin, Huang Zheng-xin, Liang Yun, et al. Improvement of the TransTrack multi-objective hog behavior tracking method[J]. Transactions of the Chinese Society of Agricultural Engineering, 2023, 39(15): 172-180.
[19] Guo Y, Stutz D, Schiele B. Robustifying token attention for vision transformers[C]∥Proceedings of the IEEE/CVF International Conference on Computer Vision, Paris, France, 2023: 17511-17522.
[20] Peng J, Wang C, Wan F, et al. Chained-tracker: Chaining paired attentive regression results for end-to-end joint multiple-object detection and tracking[C]∥Computer Vision-ECCV 2020: 16th European Conference, Glasgow, UK, 2020: 145-161.
[21] Wang Y, Kitani K, Weng X. Joint object detection and multi-object tracking with graph neural networks[C]∥IEEE International Conference on Robotics and Automation (ICRA), Xi'an, China, 2021: 13708-13715.
[22] Pang B, Li Y, Zhang Y, et al. TubeTK: adopting tubes to track multi-object in a one-step training model[C]∥Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Seattle, USA, 2020: 6307-6317.
[23] Yu F, Li W, Li Q, et al. POI: multiple object tracking with high performance detection and appearance feature[C]∥European Conference on Computer Vision, Amsterdam, The Netherlands, 2016: 36-42.
[24] Cao J, Zhang J, Li B, et al. RetinaMOT: rethinking anchor-free YOLOv5 for online multiple object tracking[J]. Complex & Intelligent Systems, 2023, 9(5): 5115-5133.
[25] Wan X, Zhou S, Wang J, et al. Multiple object tracking by trajectory map regression with temporal priors embedding[C]∥Proceedings of the 29th ACM International Conference on Multimedia, Chengdu, China, 2021: 1377-1386.
[26] Wu J, Cao J, Song L, et al. Track to detect and segment: an online multi-object tracker[C]∥Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Nashville, USA, 2021: 12347-12356.
[27] Nguyen P, Quach K G, Kitani K, et al. Type-to-track: retrieve any object via prompt-based tracking[J/OL]. [2023-11-20].
[28] Mahmoudi N, Ahadi S M, Rahmati M. Multi-target tracking using CNN-based features: CNNMTT[J]. Multimedia Tools and Applications, 2019, 78(6): 7077-7096.
[29] Meneses M, Matos L, Prado B, et al. Learning to associate detections for real-time multiple object tracking[J/OL]. [2023-11-22].
[30] Pang J, Qiu L, Li X, et al. Quasi-dense similarity learning for multiple object tracking[C]∥Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Nashville, USA, 2021: 164-173.
[31] Zhou X, Koltun V, Krähenbühl P. Tracking objects as points[C]∥European Conference on Computer Vision, Glasgow, UK, 2020: 474-490.