Journal of Jilin University (Engineering and Technology Edition), 2026, Vol. 56, Issue (2): 523-532. doi: 10.13229/j.cnki.jdxbgxb.20240748

A bidirectional feature fusion method for object position estimation

Jun MIAO1,2, Jie YAN1, Rong-hua DU1, Lei LI3, Jun CHU3

  1. School of Aeronautical Manufacturing and Mechanical Engineering, Nanchang Hangkong University, Nanchang 330063, China
  2. Key Laboratory of Lunar and Deep Space Exploration, CAS, Beijing 100190, China
  3. Institute of Computer Vision, Nanchang Hangkong University, Nanchang 330063, China
  • Received:2024-07-06 Online:2026-02-01 Published:2026-03-17
  • Contact: Rong-hua DU E-mail:miaojun@nchu.edu.cn;7130l@nchu.edu.cn

Abstract:

To fully exploit the appearance features of RGB images and the geometric features of depth images, this paper proposes a parallel "appearance-geometry" feature fusion method for object position estimation. First, in the feature extraction and fusion stage, a three-stream bidirectional fusion architecture is constructed so that the parallel RGB image features and depth image features are fused at every encoding and decoding layer. To prevent the loss of important features and to fuse the two types of features sufficiently, two complementary attention mechanisms are designed, allowing the two feature types to complement each other both locally and globally. Second, in the pose inference stage, a keypoint detection network that combines a distance metric with a distance constraint is proposed; it takes into account the distance between the keypoints output by the network and the object's center point, and achieves accurate position estimation. The proposed algorithm was tested on two challenging 6D object position estimation datasets, LineMOD and YCB-Video, and the results validate its effectiveness.

Key words: position estimation, bidirectional feature fusion, feature disparity, distance constraint, attention mechanism

CLC Number: 

  • TP391.41

Fig.1 Structure diagram of the proposed pose estimation network

Fig.2 Structure diagram of the proposed feature processing module

Fig.3 Structure diagram of the proposed keypoint detection network (DCKP)
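
The abstract states that the pose is inferred from keypoints predicted by the network, but it does not say how the final rotation and translation are computed. A common choice in keypoint-based RGB-D methods such as PVN3D and FFB6D is a least-squares rigid fit between the keypoints defined on the object model and the keypoints detected in the scene. The sketch below illustrates that generic step only, under the assumption that this paper follows the same convention; the function name `fit_pose` and its arguments are hypothetical, and the distance metric and distance constraint of DCKP are not modeled.

```python
import numpy as np


def fit_pose(model_kps, scene_kps):
    """Least-squares rigid fit (Kabsch): find R, t so that scene ≈ R @ model + t.

    model_kps: (N, 3) keypoints in the object (model) frame.
    scene_kps: (N, 3) corresponding keypoints in the camera frame.
    Assumption: correspondences are known and N >= 3 non-collinear points.
    """
    model_kps = np.asarray(model_kps, dtype=float)
    scene_kps = np.asarray(scene_kps, dtype=float)

    # Center both point sets on their centroids.
    mu_model = model_kps.mean(axis=0)
    mu_scene = scene_kps.mean(axis=0)

    # 3x3 cross-covariance between the centered sets.
    H = (model_kps - mu_model).T @ (scene_kps - mu_scene)
    U, _, Vt = np.linalg.svd(H)

    # Flip the last axis if needed so that R is a proper rotation (det = +1).
    d = np.sign(np.linalg.det(Vt.T @ U.T))
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T
    t = mu_scene - R @ mu_model
    return R, t
```

Given eight model keypoints and their detected counterparts, `R, t = fit_pose(model_kps, scene_kps)` returns the estimated 6D pose; noisy detections yield the pose that minimizes the summed squared keypoint distances.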

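Table 1 Comparison of the proposed method with other methods on the LineMOD dataset (all methods use RGB-D input)

| Object | DenseFusion ADD | DenseFusion ADDS | PVN3D ADD | PVN3D ADDS | FFB6D ADD | FFB6D ADDS | Ours (KP) ADD | Ours (KP) ADDS | Ours ADD | Ours ADDS |
|---|---|---|---|---|---|---|---|---|---|---|
| ape | 83.9 | 70.2 | 90.9 | 84.6 | 96.0 | 80.5 | 98.4 | 90.6 | 99.8 | 96.5 |
| benchvise | 86.9 | 83.1 | 93.1 | 89.3 | 96.1 | 94.8 | 98.3 | 94.6 | 98.5 | 95.1 |
| camera | 84.2 | 88.4 | 94.3 | 94.2 | 97.4 | 96.3 | 98.6 | 96.6 | 99.6 | 98.6 |
| can | 86.0 | 76.2 | 90.5 | 84.8 | 98.2 | 88.5 | 97.6 | 89.6 | 97.9 | 96.8 |
| cat | 90.4 | 81.0 | 95.6 | 87.5 | 97.5 | 96.2 | 99.8 | 97.0 | 99.8 | 97.6 |
| driller | 88.0 | 70.7 | 91.7 | 83.8 | 96.0 | 89.3 | 98.8 | 93.9 | 99.6 | 97.0 |
| duck | 84.1 | 82.7 | 94.3 | 87.1 | 97.1 | 95.7 | 99.1 | 94.6 | 99.3 | 97.8 |
| eggbox | 87.2 | 85.2 | 92.9 | 89.5 | 97.7 | 96.1 | 98.1 | 96.9 | 99.4 | 99.4 |
| glue | 83.5 | 74.5 | 88.2 | 81.6 | 93.3 | 88.6 | 98.7 | 94.1 | 99.7 | 99.7 |
| holepuncher | 86.0 | 82.3 | 94.8 | 81.0 | 96.6 | 93.7 | 99.2 | 95.5 | 99.5 | 95.3 |
| iron | 87.0 | 83.6 | 89.5 | 87.3 | 97.4 | 96.5 | 99.6 | 96.9 | 99.2 | 97.3 |
| lamp | 81.6 | 75.3 | 88.4 | 79.8 | 96.0 | 93.2 | 98.8 | 96.3 | 99.5 | 97.7 |
| phone | 79.6 | 79.6 | 82.3 | 88.3 | 86.0 | 90.2 | 96.3 | 96.3 | 99.6 | 98.6 |
| ALL | 85.3 | 79.4 | 91.3 | 86.1 | 95.8 | 92.3 | 98.6 | 94.8 | 99.4 | 97.5 |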
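
The ADD and ADDS columns refer to the standard average-distance metrics for 6D pose evaluation. As a reading aid, the sketch below computes both errors for a single prediction using their usual definitions: ADD averages the distance between corresponding model points under the ground-truth and predicted poses, while ADD-S uses the nearest predicted point instead, which suits symmetric objects. The helper names and the 10%-of-diameter acceptance threshold in `accuracy` are assumptions based on common practice; this page does not state the exact evaluation protocol behind the reported numbers.

```python
import numpy as np


def transform(points, R, t):
    """Apply a rigid transform to an (N, 3) array of model points."""
    return np.asarray(points, dtype=float) @ np.asarray(R).T + np.asarray(t)


def add_error(points, R_gt, t_gt, R_pred, t_pred):
    """ADD: mean distance between corresponding points under the two poses."""
    gt = transform(points, R_gt, t_gt)
    pred = transform(points, R_pred, t_pred)
    return float(np.linalg.norm(gt - pred, axis=1).mean())


def add_s_error(points, R_gt, t_gt, R_pred, t_pred):
    """ADD-S: for each ground-truth point, distance to the closest predicted point."""
    gt = transform(points, R_gt, t_gt)
    pred = transform(points, R_pred, t_pred)
    # Pairwise distance matrix of shape (N_gt, N_pred); fine for small N.
    dists = np.linalg.norm(gt[:, None, :] - pred[None, :, :], axis=2)
    return float(dists.min(axis=1).mean())


def accuracy(errors, diameter, ratio=0.1):
    """Percentage of test frames whose error is below ratio * object diameter.

    The 0.1 ratio is an assumed convention, not a value given in the source.
    """
    errors = np.asarray(errors, dtype=float)
    return float(100.0 * np.mean(errors < ratio * diameter))
```

For a symmetric point set, a pose that differs from the ground truth by a symmetry rotation yields a large ADD error but an ADD-S error close to zero, which is why symmetric objects are usually scored with ADD-S.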

Fig.4 Visual comparison of the proposed method with other methods on the LineMOD dataset

Table 2 Validation of the effectiveness of the keypoint detection network proposed in this paper

| Method | Baseline+ICP | Baseline+KP | Baseline+DCKP |
|---|---|---|---|
| ADD | 93.0 | 95.5 | 97.4 |
| ADDS | 86.9 | 91.8 | 94.5 |

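Table 3 Comparison of the proposed method with other methods on the YCB-Video dataset (input: RGB for PoseCNN and PVNet; RGB-D for DenseFusion, FFB6D, and Ours)

| Object | PoseCNN ADD | PoseCNN ADDS | PVNet ADD | PVNet ADDS | DenseFusion ADD | DenseFusion ADDS | FFB6D ADD | FFB6D ADDS | Ours ADD | Ours ADDS |
|---|---|---|---|---|---|---|---|---|---|---|
| 002_master_chef_can | 83.9 | 50.2 | 90.9 | 74.6 | 95.3 | 70.7 | 96.3 | 80.6 | 98.2 | 93.2 |
| 003_cracker_box | 76.9 | 53.1 | 87.1 | 79.3 | 92.5 | 86.9 | 96.3 | 94.6 | 99.8 | 96.3 |
| 004_sugar_box | 84.2 | 68.4 | 94.3 | 84.2 | 95.1 | 90.8 | 97.6 | 96.6 | 99.2 | 97.9 |
| 005_tomato_soup_can | 81.0 | 66.2 | 90.5 | 79.8 | 93.8 | 84.7 | 95.6 | 89.6 | 97.8 | 94.8 |
| 006_mustard_bottle | 90.4 | 81.0 | 90.6 | 83.5 | 95.8 | 90.9 | 97.8 | 97.0 | 98.8 | 98.5 |
| 007_tuna_fish_can | 88.0 | 70.7 | 91.7 | 73.8 | 95.7 | 79.6 | 96.8 | 89.7 | 99.8 | 88.9 |
| 008_pudding_box | 79.1 | 62.7 | 89.3 | 84.1 | 94.3 | 89.3 | 97.1 | 94.6 | 97.7 | 96.8 |
| 009_gelatin_box | 87.2 | 75.2 | 92.9 | 89.5 | 97.2 | 95.8 | 98.3 | 96.9 | 98.1 | 97.3 |
| 024_bowl* | 69.6 | 69.6 | 80.3 | 80.3 | 86.0 | 86.0 | 96.3 | 96.3 | 99.8 | 99.8 |
| 025_mug | 78.2 | 58.5 | 90.7 | 76.6 | 95.3 | 83.8 | 97.3 | 94.2 | 98.5 | 93.4 |
| 035_power_drill | 72.7 | 55.3 | 87.4 | 78.4 | 92.1 | 83.7 | 97.2 | 95.9 | 98.0 | 97.6 |
| 036_wood_block* | 64.3 | 64.3 | 84.2 | 84.2 | 89.5 | 89.5 | 92.6 | 92.6 | 98.9 | 98.9 |
| 037_scissors | 56.9 | 35.8 | 84.2 | 70.3 | 90.1 | 77.4 | 97.7 | 95.7 | 98.5 | 95.2 |
| 040_large_marker* | 71.7 | 58.3 | 89.5 | 81.0 | 95.1 | 89.1 | 96.6 | 89.1 | 99.0 | 99.0 |
| 051_large_clamp* | 50.2 | 50.2 | 63.6 | 63.6 | 71.5 | 71.5 | 96.8 | 96.8 | 99.1 | 99.1 |
| 061_foam_brick | 88.0 | 88.0 | 83.1 | 83.1 | 92.2 | 92.2 | 97.3 | 97.3 | 98.6 | 98.6 |
| ALL | 76.4 | 63.0 | 86.9 | 79.1 | 92.0 | 85.1 | 96.7 | 93.6 | 98.8 | 96.6 |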

Fig.5 Visualization results of the proposed method trained on the YCB-Video dataset

[1] Guan J, Hao Y M, Wu Q X, et al. A survey of 6DoF object pose estimation methods for different application scenarios[J]. Sensors, 2024, 24(4): 1076.
[2] Marullo G, Tanzi L, Piazzolla P, et al. 6D object position estimation from 2D images: A literature review[J]. Multimedia Tools and Applications, 2023, 82(16): 24605-24643.
[3] Wang Jing, Jin Yu-chu, Guo Ping, et al. A review of camera pose estimation methods based on deep learning[J]. Computer Engineering and Applications, 2023, 59(7): 1-14.
[4] Wang C, Xu D F, Zhu Y K, et al. DenseFusion: 6D object pose estimation by iterative dense fusion[C]∥Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. Piscataway, NJ: IEEE, 2019: 3343-3352.
[5] He Y S, Huang H B, Fan H Q, et al. FFB6D: A full flow bidirectional fusion network for 6D pose estimation[C]∥Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. Piscataway, NJ: IEEE, 2021: 3003-3013.
[6] Peng S D, Liu Y, Huang Q X, et al. PVNet: Pixel-wise voting network for 6DoF pose estimation[J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2022, 44(6): 3212-3223.
[7] Lin S F, Wang Z R, Ling Y G, et al. E2EK: End-to-end regression network based on keypoint for 6D pose estimation[J]. IEEE Robotics and Automation Letters, 2022, 7(3): 6526-6533.
[8] Xiang Y, Schmidt T, Narayanan V, et al. PoseCNN: A convolutional neural network for 6D object pose estimation in cluttered scenes[J]. arXiv Preprint, 2017: arXiv:1711.00199.
[9] Zakharov S, Shugurov I, Ilic S. DPOD: 6D pose object detector and refiner[C]∥Proceedings of the IEEE/CVF International Conference on Computer Vision. Piscataway, NJ: IEEE, 2019: 1941-1950.
[10] Wang Lian-ming, Wu Xin. Measurement of 3D motion parameters of an object based on attitude estimation[J]. Journal of Jilin University (Engineering and Technology Edition), 2023, 53(7): 2099-2108.
[11] Ding Z F, Sun Y X, Xu S J, et al. Recent advances and perspectives in deep learning techniques for 3D point cloud data processing[J]. Robotics, 2023, 12(4): 100.
[12] Zhou J, Chen K, Xu L L, et al. Deep fusion transformer network with weighted vector-wise keypoints voting for robust 6D object pose estimation[C]∥Proceedings of the IEEE/CVF International Conference on Computer Vision. Piscataway, NJ: IEEE, 2023: 13967-13977.
[13] Bai Lin, Liu Lin-jun, Li Xuan-ang, et al. A depth estimation algorithm for monocular images based on self-supervised learning[J]. Journal of Jilin University (Engineering and Technology Edition), 2023, 53(4): 1139-1145.
[14] Song C, Song J R, Huang Q X. HybridPose: 6D object pose estimation under hybrid representations[C]∥Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. Piscataway, NJ: IEEE, 2020: 431-440.
[15] Zhang Chen-jia, Zhu Lei, Yu Lu. A review of attention mechanisms in convolutional neural networks[J]. Computer Engineering and Applications, 2021, 57(20): 64-72.
[16] Hinterstoisser S, Lepetit V, Ilic S, et al. Model based training, detection and pose estimation of texture-less 3D objects in heavily cluttered scenes[C]∥Computer Vision-ACCV 2012: 11th Asian Conference on Computer Vision. Berlin: Springer, 2013: 548-562.
[17] Calli B, Singh A, Walsman A, et al. The YCB object and model set: Towards common benchmarks for manipulation research[C]∥International Conference on Advanced Robotics. Piscataway, NJ: IEEE, 2015: 510-517.