Postprocessing of human pose heatmap based on sub-pixel location

doi:10.13229/j.cnki.jdxbgxb.20230268

Abstract

To improve the accuracy of joint points predicted from heatmaps, this paper proposes a postprocessing method for human pose heatmaps based on sub-pixel localization. The method comprises two strategies. The first is sub-pixel shift processing of the flipped-image heatmap, which eliminates its misalignment with the original-image heatmap. The second is heatmap decoding by local region surface fitting, which localizes joint points at sub-pixel precision. The postprocessing method is independent of the network model and can be applied to current heatmap-based human pose estimation models without modifying them. Experiments were carried out on the COCO2017 and MPII datasets. Taking the HRNet-W32-256×192 and Simple Baseline-W32-256×192 models as examples, average precision on COCO2017 improved by 0.9 and 1.1, respectively, which verifies the effectiveness of the method.

Key words: computer vision, human pose estimation, heatmap postprocessing, Gaussian fitting, heatmap decoding

CLC number: TP391.4

Yu WANG, Kai ZHAO. Postprocessing of human pose heatmap based on sub-pixel location[J]. Journal of Jilin University (Engineering and Technology Edition), 2024, 54(5): 1385-1392.

Table 1 Experimental results on the COCO dataset

Network	Model	Resolution	Decoding method	AP	AP.5	AP.75	AP(M)	AP(L)	AR	AR.5	AR.75	AR(M)	AR(L)
HRNet	C-32	256×192	Standard shift	74.4	90.5	81.9	70.8	81.0	79.8	94.2	86.5	75.7	85.8
			SSFH	74.7	90.6	82.0	71.0	81.6	80.0	94.3	86.6	75.7	86.3
			DARK	74.8	90.4	82.0	71.4	81.6	80.2	94.1	86.7	76.1	86.2
			LRSF	75.0	90.5	82.0	71.4	81.7	80.2	94.1	86.7	76.1	86.2
			SSFH+LRSF	75.3	90.6	82.2	71.6	82.3	80.5	94.3	86.7	76.3	86.7
		384×288	Standard shift	75.8	90.6	82.5	72.0	82.7	80.9	94.3	86.9	76.7	87.1
			SSFH	75.9	90.6	82.8	72.0	82.9	81.0	94.3	87.1	76.6	87.4
			DARK	75.9	90.6	82.4	72.1	82.9	81.1	94.3	87.0	76.7	87.4
			LRSF	76.0	90.6	82.5	72.2	83.0	81.1	94.3	87.0	76.9	87.3
			SSFH+LRSF	76.2	90.6	82.8	72.3	83.2	81.2	94.3	87.1	76.8	87.6
	C-48	256×192	Standard shift	75.1	90.6	82.2	71.5	81.8	80.4	94.3	86.7	76.2	86.4
			SSFH	75.3	90.6	82.4	71.3	82.4	80.5	94.2	86.6	76.1	86.9
			DARK	75.5	90.5	82.5	71.9	82.3	80.6	94.2	86.7	76.4	87.0
			LRSF	75.6	90.6	82.5	71.9	82.4	80.7	94.2	86.8	76.5	86.9
			SSFH+LRSF	75.9	90.7	82.5	72.0	82.9	81.0	94.2	86.7	76.6	87.3
		384×288	Standard shift	76.3	90.8	82.9	72.3	83.4	81.2	94.2	87.1	76.7	87.6
			SSFH	76.4	90.8	83.1	72.3	83.6	81.3	94.2	87.3	76.8	87.8
			DARK	76.5	90.8	82.9	72.4	83.5	81.3	94.2	87.0	76.8	87.7
			LRSF	76.5	90.8	82.9	72.5	83.6	81.3	94.2	87.0	76.9	87.7
			SSFH+LRSF	76.6	90.8	83.1	72.5	83.9	81.5	94.2	87.3	76.9	88.0
Simple Baseline	R-50	256×192	Standard shift	70.4	88.6	78.3	67.1	77.2	76.3	92.9	83.4	72.1	82.4
			SSFH	70.7	88.6	78.1	67.3	77.6	76.6	92.9	83.2	72.2	72.2
			DARK	71.0	88.6	78.5	67.7	77.9	76.7	93.0	83.3	72.5	83.0
			LRSF	71.2	88.6	78.6	67.8	78.1	76.8	93.0	83.4	72.6	83.0
			SSFH+LRSF	71.5	88.6	78.7	67.9	78.3	77.2	93.1	83.5	72.7	83.4
		384×288	Standard shift	72.2	89.3	78.9	68.1	79.7	77.6	93.2	83.8	72.8	84.6
			SSFH	72.3	89.4	78.9	68.1	80.1	77.7	93.3	83.6	72.9	84.8
			DARK	72.4	89.3	79.0	68.2	80.1	77.8	93.2	83.8	73.0	84.8
			LRSF	72.5	89.3	79.1	68.4	80.1	77.9	93.2	83.9	73.0	84.8
			SSFH+LRSF	72.6	89.4	79.1	68.4	80.4	78.0	93.2	83.8	73.1	85.1
	R-101	256×192	Standard shift	71.4	89.3	79.3	68.1	78.1	77.1	93.4	84.0	73.0	83.2
			SSFH	71.7	89.3	79.4	68.2	78.7	77.3	93.3	84.1	73.0	83.6
			DARK	71.9	89.3	79.5	68.9	78.6	77.6	93.5	84.1	73.5	83.7
			LRSF	72.0	89.3	79.6	68.9	78.7	77.7	93.5	84.2	73.6	83.7
			SSFH+LRSF	72.3	89.3	79.5	68.8	79.4	77.9	93.3	84.2	73.6	84.2
		384×288	Standard shift	73.6	89.6	80.3	69.9	81.1	79.1	93.6	85.1	74.5	85.8
			SSFH	73.8	89.5	80.6	69.7	81.4	79.0	93.2	85.3	74.3	85.9
			DARK	73.9	89.4	80.6	69.9	81.4	79.1	93.2	85.3	74.4	85.9
			LRSF	73.9	73.9	80.7	70.0	81.5	79.2	93.3	85.3	74.5	86.0
			SSFH+LRSF	74.1	89.4	80.8	70.0	81.8	79.3	93.3	85.4	74.5	86.2
	R-152	256×192	Standard shift	72.0	89.3	79.8	68.7	78.9	77.8	93.4	84.6	73.6	83.9
			SSFH	72.4	89.4	79.7	68.9	79.5	78.0	93.5	84.5	73.7	84.3
			DARK	72.6	89.3	80.0	69.4	79.7	78.3	93.4	84.9	74.1	84.4
			LRSF	72.7	89.3	80.0	69.4	79.7	78.3	93.4	84.9	74.1	84.4
			SSFH+LRSF	72.9	89.3	80.0	69.4	80.2	78.5	93.4	84.8	74.2	84.8
		384×288	Standard shift	74.3	89.6	81.1	70.5	81.6	79.7	93.7	85.8	75.1	86.3
			SSFH	74.4	89.6	81.3	81.3	81.9	79.8	93.7	85.9	75.1	86.5
			DARK	74.6	89.6	81.4	70.9	82.0	79.8	93.6	86.0	75.2	86.5
			LRSF	74.6	89.6	81.5	70.8	82.0	79.9	93.6	86.0	75.3	86.5
			SSFH+LRSF	74.7	89.6	81.5	70.7	82.3	80.0	93.6	86.0	75.3	86.7
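
As a quick cross-check of the gains quoted in the abstract, the AP column of the table can be compared between the first (baseline) decoding row and the SSFH+LRSF row of each configuration. The snippet below is only an arithmetic sketch: the numbers are copied from the table, while the dictionary layout and the variable names are illustrative choices.

```python
# AP from Table 1: (baseline decoding row, SSFH+LRSF row) per configuration.
ap = {
    ("HRNet C-32", "256x192"): (74.4, 75.3),
    ("HRNet C-32", "384x288"): (75.8, 76.2),
    ("HRNet C-48", "256x192"): (75.1, 75.9),
    ("HRNet C-48", "384x288"): (76.3, 76.6),
    ("Simple Baseline R-50", "256x192"): (70.4, 71.5),
    ("Simple Baseline R-50", "384x288"): (72.2, 72.6),
    ("Simple Baseline R-101", "256x192"): (71.4, 72.3),
    ("Simple Baseline R-101", "384x288"): (73.6, 74.1),
    ("Simple Baseline R-152", "256x192"): (72.0, 72.9),
    ("Simple Baseline R-152", "384x288"): (74.3, 74.7),
}

# AP gain of the combined method over the baseline, rounded to one decimal.
gains = {key: round(after - before, 1) for key, (before, after) in ap.items()}

for (model, resolution), gain in gains.items():
    print(f"{model:22s} {resolution}  +{gain:.1f}")
```

The HRNet C-32 and Simple Baseline R-50 rows at 256×192 give the 0.9 and 1.1 gains reported in the abstract. For every network in the table, the gain at 256×192 is larger than at 384×288.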
