Journal of Jilin University (Engineering and Technology Edition) ›› 2024, Vol. 54 ›› Issue (8): 2355-2363. doi: 10.13229/j.cnki.jdxbgxb.20230341

• Computer Science and Technology •

Anterior-posterior memory matrix model for chest radiology image report generation

Li-jun LIU(1,2), Yun-feng ZHANG(1), Qing-song HUANG(1)

  1. School of Information Engineering and Automation, Kunming University of Science and Technology, Kunming 650500, China
  2. Yunnan Key Laboratory of Computer Technologies Application, Kunming 650500, China
  • Received: 2023-04-11 Online: 2024-08-01 Published: 2024-08-30
  • About the author: LIU Li-jun (1978-), male, associate professor, Ph.D. Research interests: intelligent medical image diagnosis, medical image report generation, medical visual question answering. E-mail: cloneiq@kust.edu.cn
  • Funding:
    National Natural Science Foundation of China (81860318)

Abstract:

Existing radiology report generation methods mainly focus on the guiding role of the preceding text during generation and ignore the role of the following text, which leads to incomplete semantic information and missing key information. To address this, an anterior-posterior (preceding and following context) memory matrix model for chest radiology image report generation (RadRG) is proposed. To obtain richer semantic information during generation, a memory matrix generation module (MA) is proposed; it uses a memory matrix to record the pattern information and semantic information of reports and uses a gating unit to regulate the matrix, preventing vanishing gradients or information explosion during training. The model is tested on the public IU X-Ray and MIMIC-CXR datasets, and the experimental results show that it achieves a 2.6% improvement in the BLEU metric over existing mainstream models.

Key words: medical report generation, context memory matrix, gating unit, cross-modal
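The gating idea described in the abstract can be illustrated with a minimal sketch. The abstract does not give the update equations, so the LSTM-style forget and input gates below, the function name `gated_memory_update`, and the scalar weights `w_forget` and `w_input` are all assumptions made for illustration, not the RadRG formulation.

```python
import math


def sigmoid(x):
    """Logistic function, mapping any real number into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def gated_memory_update(memory, candidate, w_forget=1.0, w_input=1.0):
    """Blend a previous memory matrix with a candidate update, element by element.

    memory, candidate: lists of rows (lists of floats) with the same shape.
    w_forget, w_input: hypothetical scalar gate weights; a trained model
    would learn full weight matrices instead.
    """
    updated = []
    for mem_row, cand_row in zip(memory, candidate):
        new_row = []
        for m, c in zip(mem_row, cand_row):
            # Forget gate: how much of the old memory entry is kept.
            forget_gate = sigmoid(w_forget * (m + c))
            # Input gate: how much of the new candidate is written.
            input_gate = sigmoid(w_input * (m + c))
            # tanh keeps each candidate contribution inside [-1, 1].
            new_row.append(forget_gate * m + input_gate * math.tanh(c))
        updated.append(new_row)
    return updated


memory = [[0.0, 0.0]]
candidate = [[100.0, -100.0]]
print(gated_memory_update(memory, candidate))
```

Because both gates lie in (0, 1) and the candidate passes through tanh, a single update can change each memory entry by at most 1 in magnitude, which is the kind of regulation the abstract attributes to the gating unit.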

CLC number:

  • TP391

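Fig. 1 Overall structure of the model

Fig. 2 Visual feature extraction module

Fig. 3 Anterior-posterior memory matrix generation module

Fig. 4 Gating unit

Fig. 5 Structure of the MLCN module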

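Table 1 Experimental parameter settings

Parameter     Value     Description
lr_ve         5×10^-5   Initial learning rate for visual features
lr_ed         1×10^-4   Learning rate for other parameters
weight_decay  5×10^-5   Weight decay rate
batch         10        Number of samples fed to the network at a time
m_f           512       Dimension of the memory matrix
mma_heads     8         Number of heads in multi-head attention
gamma         0.1       Learning-rate decay rate (IU X-Ray)
gamma         0.8       Learning-rate decay rate (MIMIC-CXR)
epoch         100       Number of training epochs (IU X-Ray)
epoch         50        Number of training epochs (MIMIC-CXR)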
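As a rough illustration of how the settings in Table 1 might be organised per dataset, the sketch below stores them in a small configuration object and applies a step decay to the learning rate. The class name `TrainConfig`, the helper `decayed_lr`, and in particular the `step_size` argument are hypothetical: the table gives the decay rate `gamma` but not how often it is applied.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TrainConfig:
    """Hyperparameters taken from Table 1; field names follow the table."""
    lr_ve: float = 5e-5         # initial learning rate for visual features
    lr_ed: float = 1e-4         # learning rate for other parameters
    weight_decay: float = 5e-5  # weight decay rate
    batch: int = 10             # samples per batch
    m_f: int = 512              # memory matrix dimension
    mma_heads: int = 8          # heads in multi-head attention
    gamma: float = 0.1          # learning-rate decay rate
    epoch: int = 100            # number of training epochs


# Only gamma and epoch differ between the two datasets in Table 1.
IU_XRAY = TrainConfig(gamma=0.1, epoch=100)
MIMIC_CXR = TrainConfig(gamma=0.8, epoch=50)


def decayed_lr(base_lr, gamma, epoch, step_size):
    """Step decay: multiply base_lr by gamma once every step_size epochs.

    step_size is an assumption; the source does not state the decay interval.
    """
    return base_lr * gamma ** (epoch // step_size)


# Learning rate for the non-visual parameters on MIMIC-CXR after 2 decay steps.
print(decayed_lr(MIMIC_CXR.lr_ed, MIMIC_CXR.gamma, epoch=2, step_size=1))
```

With `m_f = 512` and `mma_heads = 8`, each attention head would work on 64 dimensions if the memory dimension is split evenly across heads, which is the usual convention for multi-head attention.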

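Table 2 Comparison of test results of different algorithms on the IU X-Ray and MIMIC-CXR datasets

Dataset    Model         BLEU-1  BLEU-2  BLEU-3  BLEU-4  METEOR  ROUGE
IU X-Ray   ADAATT [9]    0.220   0.127   0.089   0.068   -       0.308
IU X-Ray   CoATT [24]    0.455   0.288   0.205   0.154   -       0.369
IU X-Ray   HRGR [25]     0.438   0.298   0.208   0.151   -       0.322
IU X-Ray   CMAS-RL [26]  0.464   0.301   0.210   0.154   -       0.362
IU X-Ray   CMN [22]      0.475   0.309   0.222   0.170   0.181   0.375
IU X-Ray   PPKED [21]    0.480   0.309   0.224   0.168   0.183   0.376
IU X-Ray   V-TL [23]     0.481   0.311   0.224   0.169   -       0.380
IU X-Ray   Ours          0.496   0.325   0.237   0.179   0.192   0.391
MIMIC-CXR  ADAATT        0.269   0.145   0.084   0.058   0.098   0.226
MIMIC-CXR  TOPDOWN [27]  0.277   0.155   0.090   0.052   0.108   0.237
MIMIC-CXR  CMN           0.278   0.165   0.104   0.065   0.118   0.247
MIMIC-CXR  PPKED         0.282   0.168   0.107   0.064   0.116   0.247
MIMIC-CXR  V-TL          0.283   0.169   0.107   0.068   -       0.249
MIMIC-CXR  Ours          0.294   0.176   0.116   0.082   0.133   0.256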
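For readers unfamiliar with the BLEU-n columns above, the following is a minimal sketch of sentence-level BLEU: the geometric mean of clipped n-gram precisions multiplied by a brevity penalty. It is not the evaluation code behind the reported scores, which are typically computed at corpus level with a standard toolkit; the function names, whitespace tokenisation, and absence of smoothing are simplifying assumptions.

```python
import math
from collections import Counter


def ngrams(tokens, n):
    """Count all contiguous n-grams in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def bleu(reference, hypothesis, max_n=4):
    """Sentence-level BLEU-max_n against one reference, without smoothing."""
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not hyp:
        return 0.0
    log_precision_sum = 0.0
    for n in range(1, max_n + 1):
        hyp_counts = ngrams(hyp, n)
        ref_counts = ngrams(ref, n)
        # Clipped overlap: each n-gram is credited at most as often
        # as it occurs in the reference.
        overlap = sum(min(count, ref_counts[gram])
                      for gram, count in hyp_counts.items())
        total = sum(hyp_counts.values())
        if overlap == 0 or total == 0:
            return 0.0  # no smoothing: any empty precision gives zero
        log_precision_sum += math.log(overlap / total)
    # Brevity penalty discourages hypotheses shorter than the reference.
    if len(hyp) >= len(ref):
        brevity_penalty = 1.0
    else:
        brevity_penalty = math.exp(1 - len(ref) / len(hyp))
    return brevity_penalty * math.exp(log_precision_sum / max_n)


reference = "the lungs are clear"
print(bleu(reference, "the lungs are normal", max_n=1))
print(bleu(reference, reference, max_n=4))
```

In the first call three of the four hypothesis unigrams appear in the reference, so BLEU-1 is 3/4; an exact match scores 1 at every order.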

Table 3 Ablation experiments

Dataset    Model           BLEU-1  BLEU-2  BLEU-3  BLEU-4  METEOR  ROUGE
IU X-Ray   BASE            0.470   0.304   0.219   0.165   0.187   0.371
IU X-Ray   BASE+Attention  0.476   0.300   0.215   0.169   0.186   0.371
IU X-Ray   BASE+MA         0.477   0.304   0.224   0.171   0.191   0.383
IU X-Ray   BASE+MA+MLCN    0.481   0.311   0.223   0.173   0.193   0.387
IU X-Ray   Ours            0.491   0.319   0.231   0.176   0.192   0.391
MIMIC-CXR  BASE            0.274   0.161   0.104   0.060   0.117   0.246
MIMIC-CXR  BASE+Attention  0.273   0.161   0.104   0.072   0.115   0.252
MIMIC-CXR  BASE+MA         0.283   0.163   0.108   0.073   0.110   0.251
MIMIC-CXR  BASE+MA+MLCN    0.285   0.165   0.109   0.072   0.111   0.250
MIMIC-CXR  Ours            0.286   0.168   0.108   0.073   0.123   0.251

Table 4 Visualization analysis

Columns: dataset, radiograph (image not reproduced here), ground truth, generated text.

IU X-Ray, example 1
  Ground truth: The heart is enlarged. There is pulmonary vascular congestion with diffusely increased interstitial and mild patchy airspace opacities. The distribution XXXX pulmonary edema. There is no pneumothorax or large pleural effusion. There are no acute bony findings
  Generated text: there is stable cardiomegaly with xxxx pulmonary vascular congestion and probable mild interstitial edema. there are bilateral pleural effusions with bibasilar airspace disease right greater than left. there is no pneumothorax. there are no acute bony findings

IU X-Ray, example 2
  Ground truth: The lungs are clear. there is no pleural effusion or pneumothorax. the heart and mediastinum are normal. the skeletal structures are normal.
  Generated text: cardiac and mediastinal contours are within normal limits. there is no pleural effusion or pneumothorax. the lungs are clear. bony structures are intact

MIMIC-CXR
  Ground truth: moderate cardiomegaly is re-demonstrated. the aorta is tortuous. pulmonary vasculature is not engorged. patchy opacities are seen in the left lung base potentially atelectasis but infection or aspiration cannot be excluded. streaky atelectasis is also demonstrated in the left lung base. no pleural effusion or pneumothorax is present. no acute osseous abnormality is visualized
  Generated text: lung volumes are low. heart size is mildly enlarged. mediastinal and hilar contours are unremarkable. pulmonary vasculature is not engorged. streaky opacities in the lung bases likely reflect atelectasis. no focal consolidation pleural effusion or pneumothorax is present. no acute osseous abnormality is detected.

References:

1 Liu Gui-xia, Tian Yu-xin, Wang Tao, et al. Pancreas segmentation algorithm based on dual input 3D convolutional neural network[J]. Journal of Jilin University (Engineering and Technology Edition), 2023, 53(12): 3565-3572.
2 Kaur N, Mittal A, Singh G. Methods for automatic generation of radiological reports of chest radiographs: a comprehensive survey[J]. Multimedia Tools and Applications, 2022, 81(10): 13409-13439.
3 Li M, Liu R, Wang F, et al. Auxiliary signal-guided knowledge encoder-decoder for medical report generation[J]. World Wide Web, 2023, 26(1): 253-270.
4 Li C Y, Liang X, Hu Z, et al. Knowledge-driven encode, retrieve, paraphrase for medical image report generation[DB/OL]. [2023-03-22].
5 Wang Z, Zhou L, Wang L, et al. A self-boosting framework for automated radiographic report generation[C]∥Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Nashville, USA, 2021: 2433-2442.
6 Messina P, Pino P, Parra D, et al. A survey on deep learning and explainability for automatic report generation from medical images[J]. ACM Computing Surveys (CSUR), 2022, 54(10s): 1-40.
7 Kuo C W, Kira Z. Beyond a pre-trained object detector: cross-modal textual and visual context for image captioning[C]∥Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, New Orleans, USA, 2022: 17969-17979.
8 Luo Y, Ji J, Sun X, et al. Dual-level collaborative transformer for image captioning[DB/OL]. [2023-03-22].
9 Lu J, Xiong C, Parikh D, et al. Knowing when to look: adaptive attention via a visual sentinel for image captioning[C]∥Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Honolulu, USA, 2017: 375-383.
10 Diao H, Zhang Y, Ma L, et al. Similarity reasoning and filtration for image-text matching[DB/OL]. [2023-03-23].
11 Wang L, Li Y, Lazebnik S. Learning deep structure-preserving image-text embeddings[C]∥Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, USA, 2016: 5005-5013.
12 Wang H, Zhang Y, Ji Z, et al. Consensus-aware visual-semantic embedding for image-text matching[C]∥Computer Vision-ECCV 2020: 16th European Conference, Glasgow, UK, 2020: 18-34.
13 Kim W, Son B, Kim I. ViLT: vision-and-language transformer without convolution or region supervision[DB/OL]. [2023-03-23].
14 Srinivasan P, Thapar D, Bhavsar A, et al. Hierarchical X-ray report generation via pathology tags and multi head attention[C]∥Proceedings of the Asian Conference on Computer Vision. Berlin: Springer, 2020: 600-616.
15 Chen Z, Song Y, Chang T H, et al. Generating radiology reports via memory-driven transformer[DB/OL]. [2023-03-23].
16 Liu F, You C, Wu X, et al. Auto-encoding knowledge graph for unsupervised medical report generation[J]. Advances in Neural Information Processing Systems, 2021, 34: 16266-16279.
17 Vu Y N T, Wang R, Balachandar N, et al. MedAug: contrastive learning leveraging patient metadata improves representations for chest X-ray interpretation[DB/OL]. [2023-03-24].
18 Woo S, Park J, Lee J Y, et al. CBAM: convolutional block attention module[C]∥Proceedings of the European Conference on Computer Vision (ECCV). Berlin: Springer, 2018: 3-19.
19 Xiao Ming-yao, Li Xiong-fei, Zhu Rui. Medical image fusion based on pixel correlation analysis in NSST domain[J]. Journal of Jilin University (Engineering and Technology Edition), 2023, 53(9): 2640-2648.
20 Demner-Fushman D, Kohli M D, Rosenman M B, et al. Preparing a collection of radiology examinations for distribution and retrieval[J]. Journal of the American Medical Informatics Association, 2016, 23(2): 304-310.
21 Liu F, Wu X, Ge S, et al. Exploring and distilling posterior and prior knowledge for radiology report generation[C]∥Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Nashville, USA, 2021: 13753-13762.
22 Chen Z, Shen Y, Song Y, et al. Cross-modal memory networks for radiology report generation[DB/OL]. [2023-03-24].
23 Yang S, Wu X, Ge S, et al. Radiology report generation with a learned knowledge base and multi-modal alignment[J]. Medical Image Analysis, 2023, 86: No.102798.
24 Jing B, Xie P, Xing E. On the automatic generation of medical imaging reports[DB/OL]. [2023-03-24].
25 Li Y, Liang X, Hu Z, et al. Hybrid retrieval-generation reinforced agent for medical image report generation[DB/OL]. [2023-03-24].
26 Jing B, Wang Z, Xing E. Show, describe and conclude: on exploiting the structure information of chest X-ray reports[DB/OL]. [2023-03-24].
27 Anderson P, He X, Buehler C, et al. Bottom-up and top-down attention for image captioning and visual question answering[C]∥Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Salt Lake City, USA, 2018: 6077-6086.