Journal of Jilin University (Science Edition) ›› 2026, Vol. 64 ›› Issue (2): 319-328.


  • Corresponding author: MA Xiaolei, E-mail: maxl@imun.edu.cn

Respiratory Rate Prediction Method Based on Multimodal Adaptive Fusion

LU Yang, ZHANG Xuepei, MA Xiaolei, WANG Yibo, BAI Jinfeng   

  1. College of Computer Science and Technology, Inner Mongolia Minzu University, Tongliao 028000, Inner Mongolia Autonomous Region, China
  • Received: 2025-01-26 Online: 2026-03-26 Published: 2026-03-26



Abstract: To address the limitations of existing respiratory rate prediction research in the deep joint analysis of multimodal physiological signals, as well as the difficulty of balancing long-term temporal dependencies with the capture of local details, we propose a prediction model based on a dynamic multidimensional feature hybrid network. First, we construct an adaptive multi-scale fusion module that dynamically extracts multi-frequency features from the electrocardiogram (ECG) and photoplethysmogram (PPG) respectively, generating single-modal feature maps rich in multi-scale information and thereby resolving the limited receptive field of a single convolutional kernel. Second, the model incorporates a hybrid spatio-temporal attention mechanism: by stacking Transformer encoder blocks and combining local, global, and spatio-temporal triple attention strategies, it achieves deep interaction between heterogeneous features and precise modeling of long-term temporal dependencies. Validation on the public BIDMC and CapnoBase datasets shows that the model attains mean absolute errors of 1.08 breaths/min and 0.76 breaths/min, respectively, significantly outperforming existing mainstream models in accuracy and robustness, and can provide a theoretical basis for clinical non-invasive health monitoring.
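The adaptive multi-scale fusion idea described above can be illustrated with a minimal sketch: several filters of different widths are applied to one physiological signal in parallel, and the resulting feature maps are combined with softmax gate weights. This is only an illustrative toy in NumPy, not the paper's implementation; the function name `multi_scale_fusion`, the kernel sizes, the random stand-in filters, and the mean-based gating are all assumptions made for demonstration.

```python
import numpy as np

def multi_scale_fusion(signal, kernel_sizes=(3, 7, 15), seed=0):
    """Illustrative sketch of adaptive multi-scale fusion.

    Each kernel size plays the role of one convolutional branch with a
    different receptive field; the random kernels stand in for learned
    filters, and the softmax gate stands in for an adaptive weighting
    network. Returns one fused feature map the length of the input.
    """
    rng = np.random.default_rng(seed)
    branches = []
    for k in kernel_sizes:
        kernel = rng.standard_normal(k) / np.sqrt(k)  # stand-in for a learned filter
        branches.append(np.convolve(signal, kernel, mode="same"))
    feats = np.stack(branches)                 # shape: (n_scales, signal_len)
    gate_logits = feats.mean(axis=1)           # stand-in for an adaptive gating network
    weights = np.exp(gate_logits) / np.exp(gate_logits).sum()  # softmax over scales
    return (weights[:, None] * feats).sum(axis=0)  # weighted fusion across scales
```

In a real model the branches would be trainable 1-D convolutions over ECG and PPG windows and the gate would be produced by a small learned network, but the structural point is the same: multiple receptive fields are evaluated in parallel and fused adaptively rather than fixing a single kernel size.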

Key words: multimodal, respiratory rate prediction, hybrid spatio-temporal attention, adaptive multi-scale feature fusion
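The reported evaluation metric, mean absolute error (MAE), is the average absolute difference between predicted and reference respiratory rates. A minimal sketch, with made-up breaths/min values that are not from the paper's experiments:

```python
import numpy as np

def mae(y_true, y_pred):
    """Mean absolute error between reference and predicted values."""
    return float(np.mean(np.abs(np.asarray(y_true) - np.asarray(y_pred))))

# Illustrative respiratory rates in breaths/min (hypothetical values)
true_rr = [14.0, 16.0, 18.0, 20.0]
pred_rr = [15.0, 15.5, 18.5, 19.0]
print(mae(true_rr, pred_rr))  # → 0.75
```

An MAE of 1.08 breaths/min on BIDMC therefore means the model's predictions deviate from the capnography-derived reference by about one breath per minute on average.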

CLC number: 

  • TP391