Journal of Jilin University (Engineering and Technology Edition), 2025, Vol. 55, Issue (10): 3283-3295. doi: 10.13229/j.cnki.jdxbgxb.20240086

Human pose estimation based on graph structure guidance and location information enhancement

Xin GUAN, Zi-jian ZHOU, Qiang LI

  1. School of Microelectronics, Tianjin University, Tianjin 300072, China
  • Received: 2024-01-23  Online: 2025-10-01  Published: 2026-02-03
  • Contact: Qiang LI  E-mail: guanxin@tju.edu.cn; liqiang@tju.edu.cn

Abstract:

The high degree of freedom of human limbs often produces complex poses in which key points are prone to occlusion, and locating occluded key points is one of the main difficulties in human pose estimation. To address this, this paper proposes a method that combines graph structure guidance with enhanced key point location information. A location information enhancement module (LIEM) is incorporated into HRNet to improve the representation of the spatial location information of visible key points. A visual graph neural module (VGNM) is integrated into the backbone network to extract features related to the key points and to exploit the local and global topological connectivity between key points in pixel coordinate space, so that the locations of occluded key points can be inferred. Finally, a heatmap aggregation unit (HAU) and a semantic graph convolutional network (SGCN) are employed to update the affinity weights between key points in the semantic space; these weights represent the topological dependencies between key points under the constraints of the skeleton structure and further refine the estimates of occluded key points. The proposed model achieves an average precision (AP) of 78.1% on the COCO2017 test set and can accurately estimate key points that are prone to occlusion in complex poses.
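
The idea of updating affinity weights between key points under a skeleton constraint can be illustrated with a minimal sketch in the style of semantic graph convolution [24]. It is not the authors' SGCN: the 5-joint skeleton, the names `EDGES`, `skeleton_mask`, `masked_softmax`, and `propagate`, and the feature values are all hypothetical, and the learned feature transform is omitted so that only the masked, normalized aggregation step remains.

```python
import math

# Hypothetical toy skeleton (not from the paper): joint 1 is a hub linked to 0, 2, 3, 4.
EDGES = [(0, 1), (1, 2), (1, 3), (1, 4)]
NUM_JOINTS = 5


def skeleton_mask(num_joints, edges):
    """Boolean adjacency with self-loops: True where two joints are connected."""
    mask = [[i == j for j in range(num_joints)] for i in range(num_joints)]
    for a, b in edges:
        mask[a][b] = mask[b][a] = True
    return mask


def masked_softmax(scores, mask):
    """Row-wise softmax restricted to the entries the skeleton mask allows."""
    rows = []
    for row, allowed in zip(scores, mask):
        peak = max(s for s, ok in zip(row, allowed) if ok)
        exps = [math.exp(s - peak) if ok else 0.0 for s, ok in zip(row, allowed)]
        total = sum(exps)
        rows.append([e / total for e in exps])
    return rows


def propagate(features, affinity):
    """Each joint becomes an affinity-weighted mix of itself and its neighbours."""
    dim = len(features[0])
    return [
        [sum(w * features[j][d] for j, w in enumerate(weights)) for d in range(dim)]
        for weights in affinity
    ]


mask = skeleton_mask(NUM_JOINTS, EDGES)
# Affinity scores would be learned; zeros here weight all connected joints equally.
scores = [[0.0] * NUM_JOINTS for _ in range(NUM_JOINTS)]
affinity = masked_softmax(scores, mask)

# Made-up 2-D features per joint; joint 2 plays the occluded joint with no evidence.
features = [[1.0, 0.0], [0.8, 0.2], [0.0, 0.0], [0.6, 0.4], [0.4, 0.6]]
updated = propagate(features, affinity)
print(updated[2])  # joint 2 now borrows information from its neighbour, joint 1
```

In this sketch the occluded joint receives half of its updated feature from the joint it is attached to, which is the intuition behind using skeleton topology to support occluded key points. Training the `scores` matrix would let the network decide how much each connected joint should contribute.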

Key words: computer vision, human pose estimation, key points, graph convolution

CLC Number: TP391.4

Fig.1 Overall architecture of pose estimation networks

Fig.2 Structure of LIEM

Fig.3 Structure of VGNM

Fig.4 Structure of HAU

Fig.5 Structure of asymmetric convolution

Fig.6 Structure of SGCN

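Table 1 Experimental results for different network configurations with a PCKh threshold of 0.5 on MPII validation set

| Baseline | LIEM | VGNM | SGCN | HAU&SGCN | Head | Shoulder | Elbow | Wrist | Hip | Knee | Ankle | Mean |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
|  |  |  |  |  | 97.1 | 95.9 | 90.3 | 86.4 | 89.1 | 87.1 | 83.3 | 90.3 |
|  |  |  |  |  | 97.3 | 96.0 | 90.3 | 86.4 | 89.3 | 87.1 | 83.2 | 90.3 |
|  |  |  |  |  | 97.4 | 96.1 | 90.7 | 86.5 | 89.4 | 87.4 | 83.4 | 90.4 |
|  |  |  |  |  | 97.6 | 96.1 | 90.8 | 86.5 | 89.4 | 87.4 | 83.5 | 90.5 |
|  |  |  |  |  | 97.6 | 96.3 | 90.8 | 86.7 | 89.5 | 87.4 | 83.6 | 90.6 |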
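
Table 1 reports PCKh at a threshold of 0.5, that is, the percentage of joints whose prediction error is at most half of a head-size reference length. The following is a minimal sketch of that metric, not the evaluation code used in the paper; the function name `pckh`, the way `head_size` is supplied, and the coordinates are assumptions made for illustration.

```python
import math


def pckh(pred, gt, head_size, threshold=0.5, visible=None):
    """Percentage of visible joints whose error is within threshold * head_size."""
    if visible is None:
        visible = [True] * len(gt)
    hits = 0
    total = 0
    for (px, py), (gx, gy), vis in zip(pred, gt, visible):
        if not vis:
            continue  # unlabelled joints do not count towards the score
        total += 1
        if math.hypot(px - gx, py - gy) <= threshold * head_size:
            hits += 1
    return 100.0 * hits / total if total else float("nan")


# Made-up ground truth and predictions for four joints (pixel coordinates).
gt = [(100.0, 50.0), (120.0, 90.0), (140.0, 130.0), (160.0, 170.0)]
pred = [(103.0, 54.0), (120.0, 92.0), (170.0, 130.0), (161.0, 171.0)]

# With head_size = 40 the tolerance is 20 px; only the third joint (30 px off) misses.
score = pckh(pred, gt, head_size=40.0)
print(score)  # 75.0
```

Because the tolerance scales with head size, the same pixel error counts as a hit on a large person and a miss on a small one, which is why per-joint scores for small, fast-moving joints such as wrists and ankles tend to be the lowest.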

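Table 2 Experimental results for different network configurations on COCO2017 validation set

| Baseline | LIEM | VGNM | SGCN | HAU&SGCN | Params/M | FLOPs/G | AP/% | AP0.5/% | AP0.75/% | APM/% | APL/% | AR/% |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
|  |  |  |  |  | 28.5 | 7.1 | 74.4 | 90.5 | 81.9 | 70.8 | 81.0 | 79.8 |
|  |  |  |  |  | 28.7 | 7.2 | 74.9 | 91.1 | 82.0 | 71.6 | 81.4 | 79.9 |
|  |  |  |  |  | 29.4 | 7.5 | 75.6 | 91.8 | 82.3 | 72.2 | 81.8 | 80.2 |
|  |  |  |  |  | 30.1 | 7.8 | 76.2 | 92.5 | 82.7 | 72.4 | 82.0 | 80.6 |
|  |  |  |  |  | 30.3 | 7.9 | 76.5 | 92.7 | 82.8 | 72.5 | 82.3 | 80.8 |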
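
The AP and AR columns for COCO2017 follow the COCO keypoint protocol, which scores each predicted pose by object keypoint similarity (OKS) and then thresholds it (for example, OKS of at least 0.5 for AP0.5). The sketch below shows the OKS computation only, under stated assumptions: the function name `oks`, the three-joint example, and the coordinates are illustrative, and the per-joint constants are quoted from memory of the COCO nose, shoulder, and wrist values rather than taken from this paper.

```python
import math


def oks(pred, gt, visibility, area, sigmas):
    """Object keypoint similarity averaged over labelled joints (visibility > 0)."""
    total = 0.0
    count = 0
    for (px, py), (gx, gy), vis, sigma in zip(pred, gt, visibility, sigmas):
        if vis <= 0:
            continue  # unlabelled joints are ignored
        d2 = (px - gx) ** 2 + (py - gy) ** 2
        k = 2.0 * sigma  # per-joint falloff constant
        total += math.exp(-d2 / (2.0 * area * k * k))
        count += 1
    return total / count if count else 0.0


# Illustrative three-joint example; the third joint is unlabelled (visibility 0).
gt = [(10.0, 10.0), (20.0, 30.0), (40.0, 50.0)]
visibility = [2, 2, 0]
sigmas = [0.026, 0.079, 0.062]  # assumed COCO-style constants (nose, shoulder, wrist)
area = 1000.0  # object area in pixels squared

perfect = oks(gt, gt, visibility, area, sigmas)
shifted = oks([(13.0, 14.0), (20.0, 30.0), (400.0, 500.0)], gt, visibility, area, sigmas)
print(perfect, shifted)
```

A 5-pixel error on a joint with a small constant costs almost all of that joint's similarity, while the large error on the unlabelled joint has no effect, so the metric is strict on facial points and tolerant of joints without annotation.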

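Table 3 Comparison results with different pose estimation networks on COCO2017 validation set

| Method | Input size | Params/M | FLOPs/G | AP/% | AP0.5/% | AP0.75/% | APM/% | APL/% | AR/% |
|---|---|---|---|---|---|---|---|---|---|
| 8-stage Hourglass [6] | 256×192 | 25.1 | 14.3 | 66.9 | - | - | - | - | - |
| CPN50 [7] | 256×192 | 27.0 | 6.2 | 68.6 | - | - | - | - | - |
| CPN50+OHKM [7] | 256×192 | 27.0 | 6.2 | 69.4 | - | - | - | - | - |
| Simple Baseline152 [8] | 256×192 | 68.6 | 15.7 | 72.0 | 89.3 | 79.8 | 68.7 | 78.9 | 77.8 |
| HRNetW32 [9] | 256×192 | 28.5 | 7.1 | 74.4 | 90.5 | 81.9 | 70.8 | 81.0 | 79.8 |
| HRNetW48 [9] | 256×192 | 63.6 | 14.6 | 75.1 | 90.6 | 82.2 | 71.5 | 81.8 | 80.4 |
| TokenPose-L/D24 [12] | 256×192 | 27.5 | 11.0 | 75.8 | 90.3 | 82.5 | 72.3 | 82.7 | 80.9 |
| HRFormer-B [13] | 256×192 | 43.2 | 12.2 | 75.6 | 90.8 | 82.8 | 71.7 | 82.6 | 80.8 |
| RAM-GPRNet(W32) [30] | 256×192 | 31.4 | 7.7 | 76.0 | - | - | - | - | - |
| RAM-GPRNet(W48) [30] | 256×192 | 70.0 | 15.8 | 76.5 | - | - | - | - | - |
| EMF-HRNet [31] | 256×192 | 28.8 | 9.5 | 75.6 | 90.4 | 82.6 | 72.0 | 82.4 | 80.8 |
| AMHRNet(W32) [32] | 256×192 | 36.4 | - | 76.1 | 91.0 | 82.7 | 71.5 | 82.9 | 81.2 |
| AMHRNet(W48) [32] | 256×192 | 71.8 | - | 76.4 | 91.1 | 83.1 | 72.2 | 83.3 | 81.4 |
| SCC-Net [33] | 256×192 | 58.9 | 10.5 | 73.4 | 92.6 | 81.5 | 70.4 | 77.5 | 76.2 |
| Ours(W32) | 256×192 | 30.3 | 7.9 | 76.5 | 92.7 | 82.8 | 72.5 | 82.3 | 80.8 |
| Ours(W48) | 256×192 | 66.2 | 17.6 | 77.2 | 93.0 | 83.3 | 72.9 | 82.7 | 81.3 |
| CPN50 [7] | 384×288 | - | 13.9 | 70.6 | - | - | - | - | - |
| CPN50+OHKM [7] | 384×288 | - | 13.9 | 71.6 | - | - | - | - | - |
| Simple Baseline152 [8] | 384×288 | 68.6 | 35.6 | 74.3 | 89.6 | 81.1 | 70.5 | 79.7 | 79.7 |
| HRNetW32 [9] | 384×288 | 28.5 | 16.0 | 75.8 | 90.6 | 82.7 | 71.9 | 82.8 | 81.0 |
| HRNetW48 [9] | 384×288 | 63.6 | 32.9 | 76.3 | 90.8 | 82.9 | 72.3 | 83.4 | 81.2 |
| HRFormer-B [13] | 384×288 | 43.2 | 26.8 | 77.2 | 91.0 | 83.6 | 73.2 | 84.2 | 82.0 |
| RAM-GPRNet(W32) [30] | 384×288 | 31.4 | 17.2 | 77.3 | - | - | - | - | - |
| RAM-GPRNet(W48) [30] | 384×288 | 70.0 | 35.6 | 77.7 | - | - | - | - | - |
| EMF-HRNet [31] | 384×288 | 28.8 | - | 76.5 | 90.7 | 83.1 | 72.7 | 83.6 | 81.5 |
| Ours(W32) | 384×288 | 30.3 | 18.6 | 78.0 | 93.1 | 83.5 | 73.1 | 82.9 | 81.4 |
| Ours(W48) | 384×288 | 66.2 | 37.4 | 78.4 | 93.3 | 83.6 | 73.4 | 83.7 | 81.7 |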

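Table 4 Comparison results with different pose estimation networks on COCO2017 dataset

| Method | Input size | Params/M | FLOPs/G | AP/% | AP0.5/% | AP0.75/% | APM/% | APL/% | AR/% |
|---|---|---|---|---|---|---|---|---|---|
| CPN50 [7] | 384×288 | - | - | 72.6 | 86.1 | 69.7 | 78.3 | 64.1 | - |
| Simple Baseline152 [8] | 384×288 | 68.6 | 35.6 | 73.7 | 91.9 | 81.1 | 70.3 | 80.0 | 79.0 |
| HRNet(W32) [9] | 384×288 | 28.5 | 16.0 | 74.9 | 92.5 | 82.8 | 71.3 | 80.9 | 80.1 |
| HRNet(W48) [9] | 384×288 | 63.6 | 32.9 | 75.5 | 92.5 | 83.3 | 71.9 | 81.5 | 80.5 |
| TokenPose-L/D24 [12] | 384×288 | 29.8 | 22.1 | 75.9 | 92.3 | 83.4 | 72.2 | 82.1 | 80.8 |
| HRFormer-B [13] | 384×288 | 43.2 | 26.8 | 76.2 | 92.7 | 83.8 | 72.5 | 82.3 | 81.2 |
| RAM-GPRNet(W32) [30] | 384×288 | 31.4 | 17.2 | 76.5 | - | - | - | - | - |
| RAM-GPRNet(W48) [30] | 384×288 | 70.0 | 35.6 | 77.0 | - | - | - | - | - |
| Ours(W32) | 384×288 | 30.3 | 18.6 | 77.6 | 92.9 | 83.4 | 72.8 | 82.5 | 81.2 |
| Ours(W48) | 384×288 | 66.2 | 37.4 | 78.1 | 93.0 | 83.6 | 73.2 | 83.1 | 81.5 |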

Fig.7 Comparison of visualization results between baseline model and proposed method

Fig.8 Comparison of skeleton visualization results between baseline model and proposed method

[1] Eduardo RDS, Adams LS, Stoffel R A, et al. Monocular multi-person pose estimation: a survey[J]. Pattern Recognition, 2021, 118: No.108046.
[2] Tian Hao-yu, Ma Xin, Li Yi-bin. Abnormal gait recognition method based on skeleton information[J]. Journal of Jilin University(Engineering and Technology Edition), 2022, 52(4): 725-737.
[3] Lecun Y, Bottou L, Bengio Y, et al. Gradient-based learning applied to document recognition[J]. Proceedings of the IEEE, 1998, 86(11): 2278-2324.
[4] Toshev A, Szegedy C. DeepPose: Human pose estimation via deep neural networks[C]∥IEEE Conference on Computer Vision and Pattern Recognition, Columbus, USA, 2014: 1653-1660.
[5] Tompson J, Jain A, Lecun Y, et al. Joint training of a convolutional network and a graphical model for human pose estimation[C]∥Neural Information Processing Systems,Montreal, Canada, 2014: 1799-1807.
[6] Newell A, Yang K, Deng J. Stacked hourglass networks for human pose estimation[C]∥European Conference on Computer Vision, Amsterdam, Netherlands, 2016: 483-499.
[7] Chen Y L, Wang Z C, Peng Y X, et al. Cascaded pyramid network for multi-person pose estimation[C]∥IEEE/CVF Conference on Computer Vision and Pattern Recognition, Salt Lake City, USA, 2018: 7103-7112.
[8] Xiao B, Wu H P, Wei Y C. Simple baselines for human pose estimation and tracking[C]∥European Conference on Computer Vision, Munich, Germany, 2018: 472-487.
[9] Sun K, Xiao B, Liu D, et al. Deep high-resolution representation learning for human pose estimation [C]∥IEEE/CVF Conference on Computer Vision and Pattern Recognition, Long Beach, USA, 2019: 5686-5796.
[10] Vaswani A, Shazeer N, Parmar N, et al. Attention is all you need[C]∥Neural Information Processing Systems(NeurIPS),Long Beach, USA, 2017: 5998-6008.
[11] Dosovitskiy A, Beyer L, Kolesnikov A, et al. An image is worth 16×16 words: transformers for image recognition at scale[C]∥International Conference on Learning Representations, Online, 2021.
[12] Li Y J, Zhang S K, Wang Z C, et al. Tokenpose: Learning keypoint tokens for human pose estimation[C]∥Proceedings of the IEEE International Conference on Computer Vision(ICCV), Montreal, Canada, 2021: 11293-11302.
[13] Yuan Y H, Fu R, Huang L, et al. Hrformer: high-resolution transformer for dense prediction[J]. Advances in Neural Information Processing Systems, 2021, 34: 7281-7293.
[14] Yang S, Quan Z B, Nie M, et al. Transpose: Keypoint localization via transformer[C]∥Proceedings of the IEEE International Conference on Computer Vision, Montreal, Canada, 2021: 11782-11792.
[15] Li G H, Müller M, Thabet A, et al. DeepGCNs: Can GCNs Go As Deep As CNNs?[C]∥IEEE International Conference on Computer Vision, Seoul, South Korea, 2019: 9266-9275.
[16] Qiu L T, Zhang X Y Y, Li Y R, et al. Peeking into occluded joints: A novel framework for crowd pose estimation[C]∥European Conference on Computer Vision, Glasgow, UK, 2020: 488-504.
[17] Bin Y R, Chen Z M, Wei X S, et al. Structure-aware human pose estimation with graph convolutional networks[J]. Pattern Recognition, 2020, 106: No.107410.
[18] Wang J, Long X, Gao Y, et al. Graph-PCNN: Two stage human pose estimation with graph pose refinement[C]∥European Conference on Computer Vision, Glasgow, UK, 2020: 492-508.
[19] Banik S, García A M, Knoll A. 3D human pose regression using graph convolutional network[C]∥IEEE International Conference on Image Processing(ICIP), Anchorage, USA, 2021: 924-928.
[20] Hou Q B, Zhou D Q, Feng J S. Coordinate attention for efficient mobile network design[C]∥IEEE Conference on Computer Vision and Pattern Recognition, Nashville, USA, 2021: 13708-13717.
[21] Han K, Wang Y H, Guo J Y, et al. Vision gnn: An image is worth graph of nodes[J]. Advances in Neural Information Processing Systems, 2022, 35: 8291-8303.
[22] Huang G, Liu Z, Laurens V D M, et al. Densely connected convolutional networks[C]∥Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition(CVPR), Honolulu, USA, 2017: 4700-4708.
[23] Ding X H, Guo Y C, Ding G G, et al. ACNet: Strengthening the kernel skeletons for powerful CNN via asymmetric convolution blocks[C]∥International Conference on Computer Vision, Seoul, South Korea, 2019: 1911-1920.
[24] Zhao L, Peng X, Tian Y, et al. Semantic graph convolutional networks for 3D Human Pose Regression[C]∥IEEE/CVF Conference on Computer Vision and Pattern Recognition, Long Beach, USA, 2019: 3420-3430.
[25] Wang X L, Girshick R, Gupta A, et al. Non-local neural networks[C]∥Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition(CVPR), Salt Lake City, USA, 2018: 7794-7803.
[26] Yang J W, Lu J S, Lee S, et al. Graph R-CNN for scene graph generation[C]∥European Conference on Computer Vision, Munich, Germany, 2018: 690-706.
[27] Veličković P, Cucurull G, Casanova A, et al. Graph attention networks[J]. Stat, 2017, 1050(20): No.10-48550.
[28] Andriluka M, Pishchulin L, Gehler P, et al. 2D human pose estimation: New benchmark and state of the art analysis[C]∥IEEE Conference on Computer Vision and Pattern Recognition, Columbus, USA, 2014: 3686-3693.
[29] Lin T Y, Maire M, Belongie S, et al. Microsoft COCO: Common objects in context[C]∥Proceedings of the European Conference on Computer Vision(ECCV), Zurich, Switzerland, 2014: 740-755.
[30] Zhang K, He P, Yao P, et al. Learning enhanced resolution-wise features for human pose estimation[C]∥IEEE International Conference on Image Processing, Abu Dhabi, United Arab Emirates, 2020: 2256-2260.
[31] Wang R, Wu W Y, Wang X Y. Enhancing multi-scale information exchange and feature fusion for human pose estimation[J]. The Visual Computer, 2023, 39(10): 4751-4765.
[32] Tran T D, Vo X T, Nguyen D L, et al. High-resolution network with attention module for human pose estimation[C]∥Asian Control Conference, Jeju Island, South Korea, 2022: 459-464.
[33] Dong K W, Sun Y J, Cheng X Z, et al. Combining detailed appearance and multi-scale representation: A structure-context complementary network for human pose estimation[J]. Applied Intelligence, 2023, 53(7): 8097-8113.
[34] Soomro K, Zamir A R, Shah M. UCF101: a dataset of 101 human actions classes from videos in the wild[J/OL].[2023-08-16]. .