Human pose estimation corrector algorithm based on implicit key point interconnection

doi:10.13229/j.cnki.jdxbgxb.20230600

Abstract:

Existing human pose estimation models ignore the relationships between the key points of the human body. To address this, a human pose estimation correction algorithm is proposed that uses correlation matrices to establish implicit key point interconnection. The algorithm adopts a two-stage network structure. In the first stage, an existing model is used to obtain the accuracy of each key point. In the second stage, a correlation coefficient matrix is computed to build a key point corrector that realizes implicit key point interconnection: a subset of key points with better results is selected as guidance to optimize the remaining key points with poorer results. The pseudo-heatmap information is reorganized and split into two matrices, a guide region and a to-be-indexed region. The guide matrix is divided into three regions according to accuracy, and each region is multiplied by the transposed to-be-indexed matrix to obtain a correlation matrix. A different influence factor is applied to each correlation matrix, and the results are combined by formula into a total correlation matrix. Multiplying the total correlation matrix by the to-be-indexed matrix yields the index region, which is restored to the original dimensions by inverse information reorganization and added pixel by pixel to the pseudo-heatmap to give the final predicted heatmap. In this way the relationships between key points are established automatically, correcting for the key point interrelations that existing models ignore. Experimental results on the COCO2017 dataset show that the proposed correction algorithm reaches a key point recognition accuracy of 70.5%, 0.4 percentage points higher than the existing SimpleBaseLine (SBL) model. Parts such as ankles and wrists are small, lie at the ends of the limbs, are frequently occluded, have few adjacent key points, and carry few surface features. The corrector strengthens the connection between these limb-end key points and the other key points of the body, establishing more feature connections. Experiments show that the recognition accuracy of ankles and wrists improves by a relative 1.5% and is also considerably higher than that of other existing models, demonstrating the effectiveness of the correction algorithm based on implicit key point interconnection.

Key words: computer vision, human pose estimation, implicit key points, correlation matrix
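
The second-stage corrector described in the abstract can be sketched in a few lines of NumPy. This is only an illustration under stated assumptions, not the authors' implementation: the function name `correct_heatmaps`, the parameters `order`, `k2` and `factors`, the equal-size guide regions, the weighted sum used as the "total correlation" formula, and the division by the number of pixels are all hypothetical choices, since the abstract does not give the formula or the exact shapes.

```python
import numpy as np


def correct_heatmaps(pseudo, order, k2=4, factors=(0.5, 0.3, 0.2)):
    """Sketch of a key point corrector (all design details are assumptions).

    pseudo  : (K, H, W) pseudo-heatmaps from the first-stage model
    order   : key point indices sorted from most to least accurate
    k2      : number of low-accuracy key points to correct (illustrative)
    factors : one influence factor per guide region (illustrative values)
    """
    K, H, W = pseudo.shape
    order = np.asarray(order)
    regions = len(factors)
    if regions * k2 + k2 > K:
        raise ValueError("not enough key points for the requested split")

    # Information reorganization: one row of H*W pixels per key point.
    flat = pseudo.reshape(K, H * W)

    weak = order[-k2:]                 # to-be-indexed key points
    to_index = flat[weak]              # (k2, H*W)

    # Assumption: each guide region holds k2 key points so that every
    # correlation matrix is (k2, k2) and the weighted sum is well defined.
    total = np.zeros((k2, k2))
    for i, alpha in enumerate(factors):
        guide = flat[order[i * k2:(i + 1) * k2]]     # (k2, H*W)
        total += alpha * (guide @ to_index.T)        # correlation matrix
    total /= H * W                     # assumed normalization, not from the source

    index_region = total @ to_index    # (k2, H*W)

    # Inverse reorganization and pixel-wise addition to the pseudo-heatmaps.
    corrected = flat.copy()
    corrected[weak] += index_region
    return corrected.reshape(K, H, W)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    heatmaps = rng.random((17, 8, 6))          # 17 key points, toy resolution
    out = correct_heatmaps(heatmaps, order=np.arange(17))
    print(out.shape)
```

In this sketch only the low-accuracy channels change; the guide channels pass through untouched, which matches the idea of using well-predicted key points to refine poorly predicted ones.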

CLC number:

TP391.4

Hua CAI, Rui-kun ZHU, Qiang FU, Wei-gang WANG, Zhi-yong MA, Jun-xi SUN. Human pose estimation corrector algorithm based on implicit key point interconnection[J]. Journal of Jilin University (Engineering and Technology Edition), 2025, 55(3): 1061-1071.

Table 1

Detection results for different values of $K_1$, $K_2$, and $Q$

Model	$K_1$	$K_2$	$Q$	AP/%
SBL-KPC	9	8	1	70.2
SBL-KPC	10	7	1	70.1
SBL-KPC	11	6	1	68.9
SBL-KPC	12	5	2	70.3
SBL-KPC	13	4	3	70.5
SBL-KPC	14	3	4	70.4
SBL-KPC	15	2	7	70.3
SBL-KPC	16	1	16	70.1

Table 2

Pose estimation evaluation results of different methods on the COCO 2017 dataset

Model	GFLOPs	AP/%	$AP_{50}$/%	$AP_{75}$/%
CPN	9.142	68.4	87.5	74.4
CPN-KPC	9.321	68.7 (↑0.3)	87.9 (↑0.4)	74.6 (↑0.2)
SBL	7.671	70.1	91.3	76.9
SBL-KPC	7.862	70.5 (↑0.4)	91.5 (↑0.2)	77.1 (↑0.2)
HRNet	1.533	73.5	92.2	80.3
HRNet-KPC	1.721	73.6 (↑0.1)	92.2	80.6 (↑0.3)
UDP	9.432	71.2	91.3	78.2
UDP-KPC	9.622	71.4 (↑0.2)	91.2 (↓0.1)	78.3 (↑0.1)
DarkPose	9.452	71.3	90.9	78.4
DarkPose-KPC	9.638	71.4 (↑0.1)	91.2 (↑0.3)	78.5 (↑0.1)
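
The trade-off in the table above can be checked with a short script that pairs each base model with its KPC variant and reports the AP gain alongside the extra computation. The dictionary layout and the helper name `kpc_cost_benefit` are illustrative only; the numbers are copied from the table as printed.

```python
# (GFLOPs, AP) for each base model and its KPC variant, copied from the table.
results = {
    "CPN":      ((9.142, 68.4), (9.321, 68.7)),
    "SBL":      ((7.671, 70.1), (7.862, 70.5)),
    "HRNet":    ((1.533, 73.5), (1.721, 73.6)),
    "UDP":      ((9.432, 71.2), (9.622, 71.4)),
    "DarkPose": ((9.452, 71.3), (9.638, 71.4)),
}


def kpc_cost_benefit(table):
    """Return {model: (AP gain in points, extra GFLOPs)}."""
    summary = {}
    for name, ((g_base, ap_base), (g_kpc, ap_kpc)) in table.items():
        summary[name] = (round(ap_kpc - ap_base, 1), round(g_kpc - g_base, 3))
    return summary


summary = kpc_cost_benefit(results)
for name, (ap_gain, extra_gflops) in summary.items():
    print(f"{name}: +{ap_gain} AP for +{extra_gflops} GFLOPs")
```

By these figures the corrector adds between 0.179 and 0.191 GFLOPs to every base model, while the AP gain ranges from 0.1 to 0.4 points.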


Table 3

Model	Head	Sho. (shoulder)	Elb. (elbow)	Wri. (wrist)	Hip	Knee	Ank (ankle)	OKS
CPN	90.1	86.2	85.1	83.5	86.1	84.3	80.2	75.2
CPN-KPC	90.2	86.4	85.3	85.2	87.1	85.2	83.1	78.4
SBL	92.2	91.1	89.2	87.8	90.1	87.2	86.3	90.3
SBL-KPC	92.4	91.5	89.5	88.1	90.5	88.3	89.1	91.4
HourGlass	91.8	90.3	88.7	87.3	90.3	88.2	85.3	91.3
HourGlass-KPC	91.8	90.5	89.1	87.6	91.2	89.3	88.2	92.5
HRNet	92.4	92.1	90.3	89.2	90.4	89.3	86.2	91.8
HRNet-KPC	92.4	92.2	90.5	91.1	90.8	89.9	89.8	93.2
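
To see where the corrector helps most, the per-joint scores in the table above can be differenced row by row. The snippet below is a small sketch; the names `JOINTS`, `scores`, and `joint_gains` are illustrative, and the values are copied from the table as printed.

```python
JOINTS = ["Head", "Sho", "Elb", "Wri", "Hip", "Knee", "Ank"]

# Per-joint scores: (base model, base model with KPC), copied from the table.
scores = {
    "CPN": ([90.1, 86.2, 85.1, 83.5, 86.1, 84.3, 80.2],
            [90.2, 86.4, 85.3, 85.2, 87.1, 85.2, 83.1]),
    "SBL": ([92.2, 91.1, 89.2, 87.8, 90.1, 87.2, 86.3],
            [92.4, 91.5, 89.5, 88.1, 90.5, 88.3, 89.1]),
    "HourGlass": ([91.8, 90.3, 88.7, 87.3, 90.3, 88.2, 85.3],
                  [91.8, 90.5, 89.1, 87.6, 91.2, 89.3, 88.2]),
    "HRNet": ([92.4, 92.1, 90.3, 89.2, 90.4, 89.3, 86.2],
              [92.4, 92.2, 90.5, 91.1, 90.8, 89.9, 89.8]),
}


def joint_gains(base, kpc):
    """Return {joint: gain in points} for one base model and its KPC variant."""
    return {j: round(k - b, 1) for j, b, k in zip(JOINTS, base, kpc)}


for model, (base, kpc) in scores.items():
    gains = joint_gains(base, kpc)
    largest = max(gains, key=gains.get)
    print(f"{model}: {gains} (largest gain: {largest})")
```

For every base model in the table, the ankle is the joint with the largest gain, which is consistent with the abstract's emphasis on limb-end key points.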
