Journal of Jilin University (Engineering and Technology Edition), 2024, Vol. 54, Issue (8): 2355-2363. doi: 10.13229/j.cnki.jdxbgxb.20230341


Anterior-posterior memory matrix model for chest radiology image report generation

Li-jun LIU1,2, Yun-feng ZHANG1, Qing-song HUANG1

  1. School of Information Engineering and Automation, Kunming University of Science and Technology, Kunming 650500, China
  2. Yunnan Key Laboratory of Computer Technologies Application, Kunming 650500, China
  • Received: 2023-04-11  Online: 2024-08-01  Published: 2024-08-30

Abstract:

Existing radiology report generation methods rely mainly on the preceding context to guide generation and ignore the role of the following context, which leads to incomplete semantic information and missing key information. To address this, an anterior-posterior memory matrix model for chest radiology image report generation (RadRG) is proposed. To obtain richer semantic information during generation, an anterior-posterior memory matrix (MA) is introduced: the memory matrix records the schema information and semantic information of the report, and a gating unit regulates it to prevent vanishing gradients or information explosion during training. The model is evaluated on the public IU X-Ray and MIMIC-CXR datasets, and the experimental results show a 2.6% improvement in the BLEU metrics over existing mainstream models.

Key words: medical report generation, context memory matrix, gating unit, cross-modal

CLC Number: TP391
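The abstract describes a memory matrix that is regulated by a gating unit so that its contents neither vanish nor grow without bound. The exact update equations are not given on this page, so the following is only a minimal sketch under an assumption: an LSTM-style update with an elementwise forget gate and input gate. The function name `gated_memory_update` and all of its parameters are hypothetical and are not taken from the paper.

```python
import math


def sigmoid(x):
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def gated_memory_update(memory, candidate, forget_logits, input_logits):
    """Blend the previous memory with a candidate update, elementwise.

    ASSUMPTION: LSTM-style gating, not the paper's confirmed formulation.
    All four arguments are lists of rows with identical shapes.
    new = sigmoid(forget) * old + sigmoid(input) * tanh(candidate)
    """
    updated = []
    for m_row, c_row, f_row, i_row in zip(
        memory, candidate, forget_logits, input_logits
    ):
        updated.append([
            sigmoid(f) * m + sigmoid(i) * math.tanh(c)
            for m, c, f, i in zip(m_row, c_row, f_row, i_row)
        ])
    return updated
```

Because both gates lie in (0, 1) and the candidate is squashed by tanh, each entry can grow by less than 1 in magnitude per step, which is one common way to keep a recurrent memory from exploding.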

Fig.1 Overall structure diagram of the model

Fig.2 Visual feature extraction module

Fig.3 Anterior-posterior memory matrix generation module

Fig.4 Gating unit

Fig.5 MLCN module structure

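Table 1 Experimental parameter settings

Parameter      Value      Description
lr_ve          5×10^-5    Initial learning rate for visual features
lr_ed          1×10^-4    Learning rate for the other parameters
weight_decay   5×10^-5    Weight decay rate
batch          10         Number of samples fed to the network at once
m_f            512        Dimension of the memory matrix
mma_heads      8          Number of heads in multi-head attention
gamma          0.1        Learning-rate decay rate (IU X-Ray)
               0.8        Learning-rate decay rate (MIMIC-CXR)
epoch          100        Number of training epochs (IU X-Ray)
               50         Number of training epochs (MIMIC-CXR)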
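The settings in Table 1 can be collected into a single configuration object. The sketch below is illustrative only: the dictionary keys follow the parameter names in the table, while the `decayed_lr` helper and its `step_size` argument are hypothetical, since the page does not state how often the decay rate `gamma` is applied.

```python
# Hyperparameters transcribed from Table 1.
HPARAMS = {
    "lr_ve": 5e-5,         # initial learning rate for visual features
    "lr_ed": 1e-4,         # learning rate for the other parameters
    "weight_decay": 5e-5,
    "batch": 10,
    "m_f": 512,            # memory matrix dimension
    "mma_heads": 8,        # heads in multi-head attention
    "gamma": {"IU X-Ray": 0.1, "MIMIC-CXR": 0.8},
    "epoch": {"IU X-Ray": 100, "MIMIC-CXR": 50},
}


def decayed_lr(base_lr, gamma, epoch, step_size):
    """Step decay: multiply the rate by gamma once every step_size epochs.

    ASSUMPTION: step_size is a placeholder; the source gives only gamma.
    """
    return base_lr * gamma ** (epoch // step_size)
```

For example, `decayed_lr(HPARAMS["lr_ed"], HPARAMS["gamma"]["MIMIC-CXR"], epoch, step_size)` gives the rate for the non-visual parameters at a given epoch under this assumed schedule.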

Table 2 Comparison of test results of different algorithms on the IU X-Ray and MIMIC-CXR datasets

Dataset     Model          BLEU-1  BLEU-2  BLEU-3  BLEU-4  METEOR  ROUGE
IU X-Ray    ADAATT [9]     0.220   0.127   0.089   0.068   -       0.308
            CoATT [24]     0.455   0.288   0.205   0.154   -       0.369
            HRGR [25]      0.438   0.298   0.208   0.151   -       0.322
            CMAS-RL [26]   0.464   0.301   0.210   0.154   -       0.362
            CMN [22]       0.475   0.309   0.222   0.170   0.181   0.375
            PPKED [21]     0.480   0.309   0.224   0.168   0.183   0.376
            V-TL [23]      0.481   0.311   0.224   0.169   -       0.380
            Ours           0.496   0.325   0.237   0.179   0.192   0.391
MIMIC-CXR   ADAATT         0.269   0.145   0.084   0.058   0.098   0.226
            TOPDOWN [27]   0.277   0.155   0.090   0.052   0.108   0.237
            CMN            0.278   0.165   0.104   0.065   0.118   0.247
            PPKED          0.282   0.168   0.107   0.064   0.116   0.247
            V-TL           0.283   0.169   0.107   0.068   -       0.249
            Ours           0.294   0.176   0.116   0.082   0.133   0.256
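The BLEU-1 to BLEU-4 columns are built on clipped n-gram precision between a generated report and its reference. As a rough illustration of that building block only, the sketch below computes the modified precision for a single n; it omits the brevity penalty and the geometric mean over n that full BLEU uses, and it is not the evaluation code used in the paper. The function names are hypothetical.

```python
from collections import Counter


def ngrams(tokens, n):
    """Return all contiguous n-grams of a token list as tuples."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def modified_precision(candidate, reference, n):
    """Clipped n-gram precision of candidate against one reference.

    Each candidate n-gram is credited at most as many times as it
    occurs in the reference, which penalizes repeated words.
    """
    cand_counts = Counter(ngrams(candidate, n))
    ref_counts = Counter(ngrams(reference, n))
    total = sum(cand_counts.values())
    if total == 0:
        return 0.0
    clipped = sum(min(count, ref_counts[gram])
                  for gram, count in cand_counts.items())
    return clipped / total
```

A call such as `modified_precision("the lungs are clear".split(), "the lungs are clear".split(), 2)` compares the bigrams of two tokenized sentences; higher n rewards longer exact matches, which is why BLEU-4 scores in the tables are much lower than BLEU-1.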

Table 3 Ablation experiment

Dataset     Model            BLEU-1  BLEU-2  BLEU-3  BLEU-4  METEOR  ROUGE
IU X-Ray    BASE             0.470   0.304   0.219   0.165   0.187   0.371
            BASE+Attention   0.476   0.300   0.215   0.169   0.186   0.371
            BASE+MA          0.477   0.304   0.224   0.171   0.191   0.383
            BASE+MA+MLCN     0.481   0.311   0.223   0.173   0.193   0.387
            Ours             0.491   0.319   0.231   0.176   0.192   0.391
MIMIC-CXR   BASE             0.274   0.161   0.104   0.060   0.117   0.246
            BASE+Attention   0.273   0.161   0.104   0.072   0.115   0.252
            BASE+MA          0.283   0.163   0.108   0.073   0.110   0.251
            BASE+MA+MLCN     0.285   0.165   0.109   0.072   0.111   0.250
            Ours             0.286   0.168   0.108   0.073   0.123   0.251

Table 4 Visual analysis

Dataset: IU X-Ray
Ground truth: The heart is enlarged. There is pulmonary vascular congestion with diffusely increased interstitial and mild patchy airspace opacities. The distribution XXXX pulmonary edema. There is no pneumothorax or large pleural effusion. There are no acute bony findings
Generated text: there is stable cardiomegaly with xxxx pulmonary vascular congestion and probable mild interstitial edema. there are bilateral pleural effusions with bibasilar airspace disease right greater than left. there is no pneumothorax. there are no acute bony findings

Ground truth: The lungs are clear. there is no pleural effusion or pneumothorax. the heart and mediastinum are normal. the skeletal structures are normal.
Generated text: cardiac and mediastinal contours are within normal limits. there is no pleural effusion or pneumothorax. the lungs are clear. bony structures are intact

Dataset: MIMIC-CXR
Ground truth: moderate cardiomegaly is re-demonstrated. the aorta is tortuous. pulmonary vasculature is not engorged. patchy opacities are seen in the left lung base potentially atelectasis but infection or aspiration cannot be excluded. streaky atelectasis is also demonstrated in the left lung base. no pleural effusion or pneumothorax is present. no acute osseous abnormality is visualized
Generated text: lung volumes are low. heart size is mildly enlarged. mediastinal and hilar contours are unremarkable. pulmonary vasculature is not engorged. streaky opacities in the lung bases likely reflect atelectasis. no focal consolidation pleural effusion or pneumothorax is present. no acute osseous abnormality is detected.

1 Liu Gui-xia, Tian Yu-xin, Wang Tao, et al. Pancreas segmentation algorithm based on dual input 3D convolutional neural network[J]. Journal of Jilin University (Engineering and Technology Edition), 2023, 53(12): 3565-3572.
2 Kaur N, Mittal A, Singh G. Methods for automatic generation of radiological reports of chest radiographs: a comprehensive survey[J]. Multimedia Tools and Applications, 2022, 81(10): 13409-13439.
3 Li M, Liu R, Wang F, et al. Auxiliary signal-guided knowledge encoder-decoder for medical report generation[J]. World Wide Web, 2023, 26(1): 253-270.
4 Li C Y, Liang X, Hu Z, et al. Knowledge-driven encode, retrieve, paraphrase for medical image report generation[DB/OL].[2023-03-22]..
5 Wang Z, Zhou L, Wang L, et al. A self-boosting framework for automated radiographic report generation[C]∥Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition,Nashville, USA,2021: 2433-2442.
6 Messina P, Pino P, Parra D, et al. A survey on deep learning and explainability for automatic report generation from medical images[J]. ACM Computing Surveys (CSUR), 2022, 54(10s): 1-40.
7 Kuo C W, Kira Z. Beyond a pre-trained object detector: cross-modal textual and visual context for image captioning[C]∥Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition,New Orleans,USA,2022: 17969-17979.
8 Luo Y, Ji J, Sun X, et al. Dual-level collaborative transformer for image captioning[DB/OL].[2023-03-22]..
9 Lu J, Xiong C, Parikh D, et al. Knowing when to look: adaptive attention via a visual sentinel for image captioning[C]∥Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Honolulu, USA, 2017: 375-383.
10 Diao H, Zhang Y, Ma L, et al. Similarity reasoning and filtration for image-text matching[DB/OL].[2023-03-23]..
11 Wang L, Li Y, Lazebnik S. Learning deep structure-preserving image-text embeddings[C]∥Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, USA,2016: 5005-5013.
12 Wang H, Zhang Y, Ji Z, et al. Consensus-aware visual-semantic embedding for image-text matching[C]∥Computer Vision–ECCV 2020: 16th European Conference, Glasgow, UK, 2020: 18-34.
13 Kim W, Son B, Kim I. Vilt: Vision-and-language transformer without convolution or region supervision[DB/OL].[2023-03-23].∥.
14 Srinivasan P, Thapar D, Bhavsar A, et al. Hierarchical X-ray report generation via pathology tags and multi head attention[C]∥Proceedings of the Asian Conference on Computer Vision. Berlin:Springer, 2020: 600-616.
15 Chen Z, Song Y, Chang T H, et al. Generating radiology reports via memory-driven transformer[DB/OL].[2023-03-23]..
16 Liu F, You C, Wu X, et al. Auto-encoding knowledge graph for unsupervised medical report generation[J]. Advances in Neural Information Processing Systems, 2021, 34: 16266-16279.
17 Vu Y N T, Wang R, Balachandar N, et al. Medaug: contrastive learning leveraging patient metadata improves representations for chest X-ray interpretation[DB/OL].[2023-03-24]..
18 Woo S, Park J, Lee J Y, et al. Cbam: Convolutional block attention module[C]∥Proceedings of the European conference on computer vision (ECCV). Berlin: Springer,2018: 3-19.
19 Xiao Ming-yao, Li Xiong-fei, Zhu Rui. Medical image fusion based on pixel correlation analysis in NSST domain[J]. Journal of Jilin University (Engineering and Technology Edition), 2023, 53(9): 2640-2648.
20 Demner-Fushman D, Kohli M D, Rosenman M B, et al. Preparing a collection of radiology examinations for distribution and retrieval[J]. Journal of the American Medical Informatics Association, 2016, 23(2): 304-310.
21 Liu F, Wu X, Ge S, et al. Exploring and distilling posterior and prior knowledge for radiology report generation[C]∥Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition,Nashville, USA, 2021: 13753-13762.
22 Chen Z, Shen Y, Song Y, et al. Cross-modal memory networks for radiology report generation[DB/OL].[2023-03-24]..
23 Yang S, Wu X, Ge S, et al. Radiology report generation with a learned knowledge base and multi-modal alignment[J]. Medical Image Analysis, 2023, 86: No.102798.
24 Jing B, Xie P, Xing E. On the automatic generation of medical imaging reports[DB/OL].[2023-03-24]..
25 Li Y, Liang X, Hu Z, et al. Hybrid retrieval-generation reinforced agent for medical image report generation[DB/OL].[2023-03-24]..
26 Jing B, Wang Z, Xing E. Show, describe and conclude: on exploiting the structure information of chest X-ray reports[DB/OL].[2023-03-24]..
27 Anderson P, He X, Buehler C, et al. Bottom-up and top-down attention for image captioning and visual question answering[C]∥Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition,Salt Lake City, USA,2018: 6077-6086.