Object pose estimation method based on bidirectional feature fusion

doi:10.13229/j.cnki.jdxbgxb.20240748

Abstract

To fully exploit the appearance features of RGB images and the geometric features of depth images, this paper proposes an "appearance-geometry" parallel feature fusion method for object pose estimation. First, in the feature extraction and fusion stage, a three-stream bidirectional fusion architecture is constructed so that the parallel RGB image features and depth image features are fused at every encoding layer and decoding layer. To prevent the loss of important features and to fuse the two feature types thoroughly, two complementary attention mechanisms are designed, enabling the two feature types to complement each other both locally and globally. Second, in the pose inference stage, the distance between each keypoint output by the network and the object's center point is taken into account, and a keypoint detection network that combines a distance metric with a distance constraint is proposed to achieve accurate pose estimation. The proposed algorithm was evaluated on two challenging 6D object pose estimation datasets, and the results validate its effectiveness.

Key words: pose estimation, bidirectional feature fusion, feature disparity, distance constraint, attention mechanism
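
The pose inference stage described in the abstract goes from detected keypoints to a 6D pose. The sketch below shows one common way to perform that step, under assumptions the abstract does not confirm: that the network yields 3D keypoints in the camera frame, that they are matched one-to-one with keypoints defined on the object model, and that the pose is recovered by least-squares fitting (the Kabsch/SVD method). The function names `fit_rigid_transform` and `center_distance_residual` are hypothetical, and the residual is only an illustration of why keypoint-to-center distances can serve as a consistency check (rigid motions preserve distances); it is not the paper's loss.

```python
import numpy as np

def fit_rigid_transform(model_kps, scene_kps):
    """Least-squares R, t with scene_kps ~= model_kps @ R.T + t (Kabsch / SVD)."""
    model_c = model_kps.mean(axis=0)
    scene_c = scene_kps.mean(axis=0)
    # 3x3 cross-covariance between centered model and scene keypoints.
    H = (model_kps - model_c).T @ (scene_kps - scene_c)
    U, _, Vt = np.linalg.svd(H)
    # Flip the last axis if needed so that R is a proper rotation (det = +1).
    d = np.sign(np.linalg.det(Vt.T @ U.T))
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T
    t = scene_c - R @ model_c
    return R, t

def center_distance_residual(model_kps, model_center, pred_kps, pred_center):
    """Per-keypoint |predicted distance to center - model distance to center|.

    Illustrative assumption: the residual is zero when the predicted keypoints
    and center are an exact rigid motion of the model's keypoints and center.
    """
    d_model = np.linalg.norm(model_kps - model_center, axis=1)
    d_pred = np.linalg.norm(pred_kps - pred_center, axis=1)
    return np.abs(d_pred - d_model)

# Usage with made-up keypoints: move the model by a known pose, then recover it.
angle = np.deg2rad(30.0)
R_known = np.array([[np.cos(angle), -np.sin(angle), 0.0],
                    [np.sin(angle),  np.cos(angle), 0.0],
                    [0.0,            0.0,           1.0]])
t_known = np.array([0.1, -0.2, 0.5])
model_pts = np.array([[0.0, 0.0, 0.0], [1.0, 0.0, 0.0],
                      [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]])
scene_pts = model_pts @ R_known.T + t_known
R_fit, t_fit = fit_rigid_transform(model_pts, scene_pts)
```

In practice the predicted keypoints are noisy, so a large residual for one keypoint would suggest treating it as an outlier before fitting; whether the paper does this is not stated in the abstract.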

CLC Number: TP391.41

Jun MIAO, Jie YAN, Rong-hua DU, Lei LI, Jun CHU. A bidirectional feature fusion method for object position estimation[J]. Journal of Jilin University (Engineering and Technology Edition), 2026, 56(2): 523-532.

References 17

[1]	Guan J, Hao Y M, Wu Q X, et al. A survey of 6DoF object pose estimation methods for different application scenarios[J]. Sensors, 2024, 24(4): 1076.
[2]	Marullo G, Tanzi L, Piazzolla P, et al. 6D object position estimation from 2D images: A literature review[J]. Multimedia Tools and Applications, 2023, 82(16): 24605-24643.
[3]	Wang Jing, Jin Yu-chu, Guo Ping, et al. A review of camera pose estimation methods based on deep learning[J]. Computer Engineering and Applications, 2023, 59(7): 1-14.
[4]	Wang C, Xu D F, Zhu Y K, et al. DenseFusion: 6D object pose estimation by iterative dense fusion[C]∥Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. Piscataway, NJ: IEEE, 2019: 3343-3352.
[5]	He Y S, Huang H B, Fan H Q, et al. FFB6D: A full flow bidirectional fusion network for 6D pose estimation[C]∥Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. Piscataway, NJ: IEEE, 2021: 3003-3013.
[6]	Peng S D, Liu Y, Huang Q X, et al. PVNet: Pixel-wise voting network for 6DoF pose estimation[J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2022, 44(6): 3212-3223.
[7]	Lin S F, Wang Z R, Ling Y G, et al. E2EK: End-to-end regression network based on keypoint for 6D pose estimation[J]. IEEE Robotics and Automation Letters, 2022, 7(3): 6526-6533.
[8]	Xiang Y, Schmidt T, Narayanan V, et al. PoseCNN: A convolutional neural network for 6D object pose estimation in cluttered scenes[J]. arXiv Preprint, 2017: 1711.00199.
[9]	Zakharov S, Shugurov I, Ilic S. DPOD: 6D pose object detector and refiner[C]∥Proceedings of the IEEE/CVF International Conference on Computer Vision. Piscataway, NJ: IEEE, 2019: 1941-1950.
[10]	Wang Lian-ming, Wu Xin. Measurement of 3D motion parameters of an object based on attitude estimation[J]. Journal of Jilin University (Engineering and Technology Edition), 2023, 53(7): 2099-2108.
[11]	Ding Z F, Sun Y X, Xu S J, et al. Recent advances and perspectives in deep learning techniques for 3D point cloud data processing[J]. Robotics, 2023, 12(4): 100.
[12]	Zhou J, Chen K, Xu L L, et al. Deep fusion transformer network with weighted vector-wise keypoints voting for robust 6D object pose estimation[C]∥Proceedings of the IEEE/CVF International Conference on Computer Vision. Piscataway, NJ: IEEE, 2023: 13967-13977.
[13]	Bai Lin, Liu Lin-jun, Li Xuan-ang, et al. A depth estimation algorithm for monocular images based on self-supervised learning[J]. Journal of Jilin University (Engineering and Technology Edition), 2023, 53(4): 1139-1145.
[14]	Song C, Song J R, Huang Q X. HybridPose: 6D object pose estimation under hybrid representations[C]∥Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. Piscataway, NJ: IEEE, 2020: 431-440.
[15]	Zhang Chen-jia, Zhu Lei, Yu Lu. A review of attention mechanisms in convolutional neural networks[J]. Journal of Computer Engineering & Applications, 2021, 57(20): 64-72.
[16]	Hinterstoisser S, Lepetit V, Ilic S, et al. Model based training, detection and pose estimation of texture-less 3D objects in heavily cluttered scenes[C]∥Computer Vision-ACCV 2012: 11th Asian Conference on Computer Vision. Piscataway, NJ: IEEE, 2013: 548-562.
[17]	Calli B, Singh A, Walsman A, et al. The YCB object and model set: Towards common benchmarks for manipulation research[C]∥ International Conference on Advanced Robotics. Piscataway, NJ: IEEE, 2015: 510-517.

	RGB-D
	DenseFusion		PVN3D		FFB6D		Ours (KP)		Ours
object	ADD	ADDS	ADD	ADDS	ADD	ADDS	ADD	ADDS	ADD	ADDS
ape	83.9	70.2	90.9	84.6	96.0	80.5	98.4	90.6	99.8	96.5
benchvise	86.9	83.1	93.1	89.3	96.1	94.8	98.3	94.6	98.5	95.1
camera	84.2	88.4	94.3	94.2	97.4	96.3	98.6	96.6	99.6	98.6
can	86.0	76.2	90.5	84.8	98.2	88.5	97.6	89.6	97.9	96.8
cat	90.4	81.0	95.6	87.5	97.5	96.2	99.8	97.0	99.8	97.6
driller	88.0	70.7	91.7	83.8	96.0	89.3	98.8	93.9	99.6	97.0
duck	84.1	82.7	94.3	87.1	97.1	95.7	99.1	94.6	99.3	97.8
eggbox	87.2	85.2	92.9	89.5	97.7	96.1	98.1	96.9	99.4	99.4
glue	83.5	74.5	88.2	81.6	93.3	88.6	98.7	94.1	99.7	99.7
holepuncher	86.0	82.3	94.8	81.0	96.6	93.7	99.2	95.5	99.5	95.3
iron	87.0	83.6	89.5	87.3	97.4	96.5	99.6	96.9	99.2	97.3
lamp	81.6	75.3	88.4	79.8	96.0	93.2	98.8	96.3	99.5	97.7
phone	79.6	79.6	82.3	88.3	86.0	90.2	96.3	96.3	99.6	98.6
ALL	85.3	79.4	91.3	86.1	95.8	92.3	98.6	94.8	99.4	97.5
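
The ADD and ADDS columns refer to the average-distance metrics commonly used for 6D pose evaluation. A minimal sketch of how they are usually computed follows, assuming the standard definitions: ADD is the mean distance between corresponding model points under the ground-truth and predicted poses, and ADD-S is the mean closest-point distance, which is used for symmetric objects. This page does not state the acceptance threshold or whether the tabulated figures are accuracies or area-under-curve values, so the 10%-of-diameter threshold in `accuracy` is an assumption, and all function names are hypothetical.

```python
import numpy as np

def transform(points, R, t):
    """Apply a rigid transform to an (N, 3) array of model points."""
    return points @ R.T + t

def add_error(points, R_gt, t_gt, R_pred, t_pred):
    """ADD: mean distance between corresponding transformed model points."""
    diff = transform(points, R_gt, t_gt) - transform(points, R_pred, t_pred)
    return float(np.linalg.norm(diff, axis=1).mean())

def adds_error(points, R_gt, t_gt, R_pred, t_pred):
    """ADD-S: mean distance from each ground-truth point to its nearest predicted point."""
    gt = transform(points, R_gt, t_gt)
    pred = transform(points, R_pred, t_pred)
    # Pairwise distances, shape (N, N); fine for small N, a KD-tree scales better.
    dists = np.linalg.norm(gt[:, None, :] - pred[None, :, :], axis=2)
    return float(dists.min(axis=1).mean())

def accuracy(errors, diameter, ratio=0.1):
    """Percentage of poses with error below ratio * diameter (assumed threshold)."""
    errors = np.asarray(errors, dtype=float)
    return float((errors < ratio * diameter).mean() * 100.0)
```

For an object that is symmetric under a 180° rotation, a prediction that is off by exactly that rotation gives a large ADD but an ADD-S near zero, which is why the two metrics can differ substantially for the same object.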

Method	Baseline+ICP	Baseline+KP	Baseline+DCKP
ADD	93.0	95.5	97.4
ADDS	86.9	91.8	94.5

	RGB				RGB-D
	PoseCNN		PVNet		DenseFusion		FFB6D		Ours
object	ADD	ADDS	ADD	ADDS	ADD	ADDS	ADD	ADDS	ADD	ADDS
002_master_chef_can	83.9	50.2	90.9	74.6	95.3	70.7	96.3	80.6	98.2	93.2
003_cracker_box	76.9	53.1	87.1	79.3	92.5	86.9	96.3	94.6	99.8	96.3
004_sugar_box	84.2	68.4	94.3	84.2	95.1	90.8	97.6	96.6	99.2	97.9
005_tomato_soup_can	81.0	66.2	90.5	79.8	93.8	84.7	95.6	89.6	97.8	94.8
006_mustard_bottle	90.4	81.0	90.6	83.5	95.8	90.9	97.8	97.0	98.8	98.5
007_tuna_fish_can	88.0	70.7	91.7	73.8	95.7	79.6	96.8	89.7	99.8	88.9
008_pudding_box	79.1	62.7	89.3	84.1	94.3	89.3	97.1	94.6	97.7	96.8
009_gelatin_box	87.2	75.2	92.9	89.5	97.2	95.8	98.3	96.9	98.1	97.3
024_bowl*	69.6	69.6	80.3	80.3	86.0	86.0	96.3	96.3	99.8	99.8
025_mug	78.2	58.5	90.7	76.6	95.3	83.8	97.3	94.2	98.5	93.4
035_power_drill	72.7	55.3	87.4	78.4	92.1	83.7	97.2	95.9	98.0	97.6
036_wood_block*	64.3	64.3	84.2	84.2	89.5	89.5	92.6	92.6	98.9	98.9
037_scissors	56.9	35.8	84.2	70.3	90.1	77.4	97.7	95.7	98.5	95.2
040_large_marker*	71.7	58.3	89.5	81.0	95.1	89.1	96.6	89.1	99.0	99.0
051_large_clamp*	50.2	50.2	63.6	63.6	71.5	71.5	96.8	96.8	99.1	99.1
061_foam_brick	88.0	88.0	83.1	83.1	92.2	92.2	97.3	97.3	98.6	98.6
ALL	76.4	63.0	86.9	79.1	92.0	85.1	96.7	93.6	98.8	96.6
