Journal of Jilin University (Engineering and Technology Edition), 2024, Vol. 54, Issue (5): 1385-1392. doi: 10.13229/j.cnki.jdxbgxb.20230268

• Computer Science and Technology •

Postprocessing of human pose heatmap based on sub-pixel location

Yu WANG1, Kai ZHAO1,2

  1. School of Electronic Information Engineering, Changchun University of Science and Technology, Changchun 130022, China
  2. School of Electronics and Information Engineering, Heilongjiang University of Science and Technology, Harbin 150022, China
  • Received: 2023-03-26 Online: 2024-05-01 Published: 2024-06-11
  • About the author: Yu WANG (1974-), female, professor, Ph.D. Research interests: image processing and machine vision. E-mail: wangyulfy@cust.edu.cn
  • Funding:
    Natural Science Foundation of Jilin Province (20210101180JC)

Abstract:

To improve the accuracy of joint points predicted from heatmaps, this paper proposes a postprocessing method for human pose heatmaps based on sub-pixel localization. The method comprises two strategies. The first is a sub-pixel shift of the flipped-image heatmap (SSFH), which removes its misalignment with the original-image heatmap. The second is heatmap decoding by local region surface fitting (LRSF), which localizes joint points with sub-pixel precision. The method is independent of the network model, so it can be applied to current heatmap-based human pose estimation models without modifying them. Experiments were carried out on two public datasets, COCO2017 and MPII. On COCO2017, the average precision (AP) of the HRNet-W32-256×192 and Simple Baseline-W32-256×192 models improved by 0.9 and 1.1, respectively, which verifies the effectiveness of the method.

Key words: computer vision, human pose estimation, heatmap postprocessing, Gaussian fitting, heatmap decoding
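
As a rough illustration of the first strategy, the sketch below flips the heatmap predicted for the mirrored image back to the original orientation, shifts it horizontally by a fractional number of pixels using linear interpolation, and averages it with the original heatmap. The function names, the interpolation scheme, and the `shift` argument are assumptions made for illustration; the abstract does not state the actual offset, and the swapping of left/right joint channels that a full flip test needs is omitted.

```python
import numpy as np


def shift_heatmap_x(heatmap, shift):
    """Shift a (H, W) heatmap to the right by `shift` pixels (may be fractional).

    Uses 1-D linear interpolation along each row; samples that fall outside
    the row reuse the nearest edge value (np.interp's default behaviour).
    """
    h, w = heatmap.shape
    x = np.arange(w, dtype=np.float64)
    src = x - shift  # input position sampled for each output column
    out = np.empty((h, w), dtype=np.float64)
    for r in range(h):
        out[r] = np.interp(src, x, heatmap[r])
    return out


def merge_flip(heatmap, heatmap_flipped, shift):
    """Average the original heatmap with the realigned flipped-image heatmap.

    `shift` is a hypothetical sub-pixel offset supplied by the caller.
    """
    restored = heatmap_flipped[:, ::-1]         # undo the horizontal flip
    aligned = shift_heatmap_x(restored, shift)  # sub-pixel realignment
    return 0.5 * (heatmap + aligned)
```

A whole-pixel shift is the special case `shift=1.0`; a fractional value spreads each response over the two neighbouring columns instead of snapping it to one of them.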

CLC number:

  • TP391.4

Fig. 1 Heatmap postprocessing pipeline

Fig. 2 Decoding results of local region surface fitting
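
The idea behind local region surface fitting can be illustrated with a minimal sketch. It assumes the heatmap response near a joint is approximately Gaussian, so its logarithm is a quadratic surface; the sketch fits that surface by least squares over a small window around the integer maximum and takes the vertex as the sub-pixel joint location. The function name `decode_subpixel`, the window `radius`, and the log-quadratic model are illustrative assumptions, not the authors' published formulation.

```python
import numpy as np


def decode_subpixel(heatmap, radius=2, eps=1e-10):
    """Return a sub-pixel (x, y) peak estimate for one (H, W) joint heatmap."""
    h, w = heatmap.shape
    py, px = np.unravel_index(np.argmax(heatmap), heatmap.shape)

    # Local window around the integer maximum, clipped to the heatmap bounds.
    y0, y1 = max(py - radius, 0), min(py + radius, h - 1)
    x0, x1 = max(px - radius, 0), min(px + radius, w - 1)
    ys, xs = np.mgrid[y0:y1 + 1, x0:x1 + 1]
    dx = (xs - px).ravel().astype(np.float64)
    dy = (ys - py).ravel().astype(np.float64)

    # The log of a Gaussian is quadratic; clamp to eps to avoid log(0).
    z = np.log(np.maximum(heatmap[y0:y1 + 1, x0:x1 + 1], eps)).ravel()

    # Least-squares fit of z = a*dx^2 + b*dy^2 + c*dx*dy + d*dx + e*dy + f.
    design = np.stack(
        [dx * dx, dy * dy, dx * dy, dx, dy, np.ones_like(dx)], axis=1
    )
    a, b, c, d, e, _ = np.linalg.lstsq(design, z, rcond=None)[0]

    # The fitted surface has a maximum only if its Hessian is negative definite.
    hessian = np.array([[2.0 * a, c], [c, 2.0 * b]])
    if a >= 0 or np.linalg.det(hessian) <= 0:
        return float(px), float(py)  # fall back to the integer peak

    # Vertex of the quadratic: solve gradient = 0 for the offset from (px, py).
    off_x, off_y = np.linalg.solve(hessian, -np.array([d, e]))
    if abs(off_x) > radius or abs(off_y) > radius:
        return float(px), float(py)  # implausible fit, keep the integer peak
    return float(px + off_x), float(py + off_y)
```

The returned coordinates are in heatmap pixels; mapping them back to the input image still requires the usual scale and offset of the model's preprocessing.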

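Table 1 Experimental results on the COCO dataset

| Network | Model | Resolution | Decoding method | AP | AP.5 | AP.75 | AP(M) | AP(L) | AR | AR.5 | AR.75 | AR(M) | AR(L) |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| HRNet | C-32 | 256×192 | Standard shift | 74.4 | 90.5 | 81.9 | 70.8 | 81.0 | 79.8 | 94.2 | 86.5 | 75.7 | 85.8 |
| HRNet | C-32 | 256×192 | SSFH | 74.7 | 90.6 | 82.0 | 71.0 | 81.6 | 80.0 | 94.3 | 86.6 | 75.7 | 86.3 |
| HRNet | C-32 | 256×192 | DARK | 74.8 | 90.4 | 82.0 | 71.4 | 81.6 | 80.2 | 94.1 | 86.7 | 76.1 | 86.2 |
| HRNet | C-32 | 256×192 | LRSF | 75.0 | 90.5 | 82.0 | 71.4 | 81.7 | 80.2 | 94.1 | 86.7 | 76.1 | 86.2 |
| HRNet | C-32 | 256×192 | SSFH+LRSF | 75.3 | 90.6 | 82.2 | 71.6 | 82.3 | 80.5 | 94.3 | 86.7 | 76.3 | 86.7 |
| HRNet | C-32 | 384×288 | Standard shift | 75.8 | 90.6 | 82.5 | 72.0 | 82.7 | 80.9 | 94.3 | 86.9 | 76.7 | 87.1 |
| HRNet | C-32 | 384×288 | SSFH | 75.9 | 90.6 | 82.8 | 72.0 | 82.9 | 81.0 | 94.3 | 87.1 | 76.6 | 87.4 |
| HRNet | C-32 | 384×288 | DARK | 75.9 | 90.6 | 82.4 | 72.1 | 82.9 | 81.1 | 94.3 | 87.0 | 76.7 | 87.4 |
| HRNet | C-32 | 384×288 | LRSF | 76.0 | 90.6 | 82.5 | 72.2 | 83.0 | 81.1 | 94.3 | 87.0 | 76.9 | 87.3 |
| HRNet | C-32 | 384×288 | SSFH+LRSF | 76.2 | 90.6 | 82.8 | 72.3 | 83.2 | 81.2 | 94.3 | 87.1 | 76.8 | 87.6 |
| HRNet | C-48 | 256×192 | Standard shift | 75.1 | 90.6 | 82.2 | 71.5 | 81.8 | 80.4 | 94.3 | 86.7 | 76.2 | 86.4 |
| HRNet | C-48 | 256×192 | SSFH | 75.3 | 90.6 | 82.4 | 71.3 | 82.4 | 80.5 | 94.2 | 86.6 | 76.1 | 86.9 |
| HRNet | C-48 | 256×192 | DARK | 75.5 | 90.5 | 82.5 | 71.9 | 82.3 | 80.6 | 94.2 | 86.7 | 76.4 | 87.0 |
| HRNet | C-48 | 256×192 | LRSF | 75.6 | 90.6 | 82.5 | 71.9 | 82.4 | 80.7 | 94.2 | 86.8 | 76.5 | 86.9 |
| HRNet | C-48 | 256×192 | SSFH+LRSF | 75.9 | 90.7 | 82.5 | 72.0 | 82.9 | 81.0 | 94.2 | 86.7 | 76.6 | 87.3 |
| HRNet | C-48 | 384×288 | Standard shift | 76.3 | 90.8 | 82.9 | 72.3 | 83.4 | 81.2 | 94.2 | 87.1 | 76.7 | 87.6 |
| HRNet | C-48 | 384×288 | SSFH | 76.4 | 90.8 | 83.1 | 72.3 | 83.6 | 81.3 | 94.2 | 87.3 | 76.8 | 87.8 |
| HRNet | C-48 | 384×288 | DARK | 76.5 | 90.8 | 82.9 | 72.4 | 83.5 | 81.3 | 94.2 | 87.0 | 76.8 | 87.7 |
| HRNet | C-48 | 384×288 | LRSF | 76.5 | 90.8 | 82.9 | 72.5 | 83.6 | 81.3 | 94.2 | 87.0 | 76.9 | 87.7 |
| HRNet | C-48 | 384×288 | SSFH+LRSF | 76.6 | 90.8 | 83.1 | 72.5 | 83.9 | 81.5 | 94.2 | 87.3 | 76.9 | 88.0 |
| simple baseline | R-50 | 256×192 | Standard shift | 70.4 | 88.6 | 78.3 | 67.1 | 77.2 | 76.3 | 92.9 | 83.4 | 72.1 | 82.4 |
| simple baseline | R-50 | 256×192 | SSFH | 70.7 | 88.6 | 78.1 | 67.3 | 77.6 | 76.6 | 92.9 | 83.2 | 72.2 | 72.2 |
| simple baseline | R-50 | 256×192 | DARK | 71.0 | 88.6 | 78.5 | 67.7 | 77.9 | 76.7 | 93.0 | 83.3 | 72.5 | 83.0 |
| simple baseline | R-50 | 256×192 | LRSF | 71.2 | 88.6 | 78.6 | 67.8 | 78.1 | 76.8 | 93.0 | 83.4 | 72.6 | 83.0 |
| simple baseline | R-50 | 256×192 | SSFH+LRSF | 71.5 | 88.6 | 78.7 | 67.9 | 78.3 | 77.2 | 93.1 | 83.5 | 72.7 | 83.4 |
| simple baseline | R-50 | 384×288 | Standard shift | 72.2 | 89.3 | 78.9 | 68.1 | 79.7 | 77.6 | 93.2 | 83.8 | 72.8 | 84.6 |
| simple baseline | R-50 | 384×288 | SSFH | 72.3 | 89.4 | 78.9 | 68.1 | 80.1 | 77.7 | 93.3 | 83.6 | 72.9 | 84.8 |
| simple baseline | R-50 | 384×288 | DARK | 72.4 | 89.3 | 79.0 | 68.2 | 80.1 | 77.8 | 93.2 | 83.8 | 73.0 | 84.8 |
| simple baseline | R-50 | 384×288 | LRSF | 72.5 | 89.3 | 79.1 | 68.4 | 80.1 | 77.9 | 93.2 | 83.9 | 73.0 | 84.8 |
| simple baseline | R-50 | 384×288 | SSFH+LRSF | 72.6 | 89.4 | 79.1 | 68.4 | 80.4 | 78.0 | 93.2 | 83.8 | 73.1 | 85.1 |
| simple baseline | R-101 | 256×192 | Standard shift | 71.4 | 89.3 | 79.3 | 68.1 | 78.1 | 77.1 | 93.4 | 84.0 | 73.0 | 83.2 |
| simple baseline | R-101 | 256×192 | SSFH | 71.7 | 89.3 | 79.4 | 68.2 | 78.7 | 77.3 | 93.3 | 84.1 | 73.0 | 83.6 |
| simple baseline | R-101 | 256×192 | DARK | 71.9 | 89.3 | 79.5 | 68.9 | 78.6 | 77.6 | 93.5 | 84.1 | 73.5 | 83.7 |
| simple baseline | R-101 | 256×192 | LRSF | 72.0 | 89.3 | 79.6 | 68.9 | 78.7 | 77.7 | 93.5 | 84.2 | 73.6 | 83.7 |
| simple baseline | R-101 | 256×192 | SSFH+LRSF | 72.3 | 89.3 | 79.5 | 68.8 | 79.4 | 77.9 | 93.3 | 84.2 | 73.6 | 84.2 |
| simple baseline | R-101 | 384×288 | Standard shift | 73.6 | 89.6 | 80.3 | 69.9 | 81.1 | 79.1 | 93.6 | 85.1 | 74.5 | 85.8 |
| simple baseline | R-101 | 384×288 | SSFH | 73.8 | 89.5 | 80.6 | 69.7 | 81.4 | 79.0 | 93.2 | 85.3 | 74.3 | 85.9 |
| simple baseline | R-101 | 384×288 | DARK | 73.9 | 89.4 | 80.6 | 69.9 | 81.4 | 79.1 | 93.2 | 85.3 | 74.4 | 85.9 |
| simple baseline | R-101 | 384×288 | LRSF | 73.9 | 73.9 | 80.7 | 70.0 | 81.5 | 79.2 | 93.3 | 85.3 | 74.5 | 86.0 |
| simple baseline | R-101 | 384×288 | SSFH+LRSF | 74.1 | 89.4 | 80.8 | 70.0 | 81.8 | 79.3 | 93.3 | 85.4 | 74.5 | 86.2 |
| simple baseline | R-152 | 256×192 | Standard shift | 72.0 | 89.3 | 79.8 | 68.7 | 78.9 | 77.8 | 93.4 | 84.6 | 73.6 | 83.9 |
| simple baseline | R-152 | 256×192 | SSFH | 72.4 | 89.4 | 79.7 | 68.9 | 79.5 | 78.0 | 93.5 | 84.5 | 73.7 | 84.3 |
| simple baseline | R-152 | 256×192 | DARK | 72.6 | 89.3 | 80.0 | 69.4 | 79.7 | 78.3 | 93.4 | 84.9 | 74.1 | 84.4 |
| simple baseline | R-152 | 256×192 | LRSF | 72.7 | 89.3 | 80.0 | 69.4 | 79.7 | 78.3 | 93.4 | 84.9 | 74.1 | 84.4 |
| simple baseline | R-152 | 256×192 | SSFH+LRSF | 72.9 | 89.3 | 80.0 | 69.4 | 80.2 | 78.5 | 93.4 | 84.8 | 74.2 | 84.8 |
| simple baseline | R-152 | 384×288 | Standard shift | 74.3 | 89.6 | 81.1 | 70.5 | 81.6 | 79.7 | 93.7 | 85.8 | 75.1 | 86.3 |
| simple baseline | R-152 | 384×288 | SSFH | 74.4 | 89.6 | 81.3 | 81.3 | 81.9 | 79.8 | 93.7 | 85.9 | 75.1 | 86.5 |
| simple baseline | R-152 | 384×288 | DARK | 74.6 | 89.6 | 81.4 | 70.9 | 82.0 | 79.8 | 93.6 | 86.0 | 75.2 | 86.5 |
| simple baseline | R-152 | 384×288 | LRSF | 74.6 | 89.6 | 81.5 | 70.8 | 82.0 | 79.9 | 93.6 | 86.0 | 75.3 | 86.5 |
| simple baseline | R-152 | 384×288 | SSFH+LRSF | 74.7 | 89.6 | 81.5 | 70.7 | 82.3 | 80.0 | 93.6 | 86.0 | 75.3 | 86.7 |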
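
The AP gains quoted in the abstract (0.9 and 1.1) can be checked against the AP column of Table 1. The snippet below copies the 256×192 AP values of the two rows whose gains match those figures (HRNet C-32 and simple baseline R-50) and subtracts the standard-shift baseline. The dictionary layout and the function name `gains` are illustrative only.

```python
# AP values copied from Table 1 (256x192 input), keyed by decoding method.
AP_256x192 = {
    "HRNet C-32": {
        "standard shift": 74.4, "SSFH": 74.7, "DARK": 74.8,
        "LRSF": 75.0, "SSFH+LRSF": 75.3,
    },
    "simple baseline R-50": {
        "standard shift": 70.4, "SSFH": 70.7, "DARK": 71.0,
        "LRSF": 71.2, "SSFH+LRSF": 71.5,
    },
}


def gains(row, baseline="standard shift"):
    """AP improvement of each decoding method over the baseline, to one decimal."""
    return {
        method: round(ap - row[baseline], 1)
        for method, ap in row.items()
        if method != baseline
    }


for model, row in AP_256x192.items():
    print(model, gains(row))
```

For both models the combined SSFH+LRSF gain exceeds the gain of either strategy alone, and LRSF alone is 0.2 AP above DARK.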

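Table 2 Experimental results on the MPII dataset

| Decoding method | Head | Shoulder | Elbow | Wrist | Hip | Knee | Ankle | Mean | PCKh@0.1 |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Standard shift | 97.1 | 95.9 | 90.3 | 86.4 | 89.1 | 87.1 | 83.3 | 90.3 | 37.7 |
| SSFH | 97.1 | 96.1 | 90.3 | 86.5 | 89.1 | 87.0 | 83.3 | 90.4 | 38.9 |
| DARK | 97.2 | 96.0 | 90.4 | 86.5 | 89.3 | 87.1 | 83.3 | 90.5 | 40.3 |
| LRSF | 97.2 | 96.0 | 90.5 | 86.5 | 89.5 | 87.1 | 83.3 | 90.5 | 41.3 |
| SSFH+LRSF | 97.2 | 96.1 | 90.6 | 86.5 | 89.3 | 87.0 | 83.4 | 90.5 | 41.5 |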
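
Table 2 reports PCKh, the percentage of predicted joints that lie within a fraction of the head segment length from the ground truth. The following is a minimal sketch of that metric, assuming the caller supplies per-image head sizes and a visibility mask; the function name and array layout are hypothetical, and this is not the official MPII evaluation code.

```python
import numpy as np


def pckh(pred, gt, head_sizes, visible, alpha=0.5):
    """Percentage of visible joints whose error is at most alpha * head size.

    pred, gt:    (N, K, 2) predicted and ground-truth joint coordinates
    head_sizes:  (N,) head segment length per image, in pixels
    visible:     (N, K) boolean mask of annotated joints
    """
    # Euclidean error per joint, normalised by the head size of its image.
    dist = np.linalg.norm(pred - gt, axis=2) / head_sizes[:, None]
    correct = (dist <= alpha) & visible
    return 100.0 * correct.sum() / visible.sum()
```

At the stricter threshold of 0.1 the tolerance shrinks to a few pixels, which is consistent with PCKh@0.1 in Table 2 rising from 37.7 to 41.5 while the Mean column changes by only 0.2.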

Fig. 3 Speed and accuracy comparison of different heatmap postprocessing methods

Fig. 4 Visualization analysis
