Journal of Jilin University (Science Edition) ›› 2026, Vol. 64 ›› Issue (2): 319-328.


  • Corresponding author: MA Xiaolei, E-mail: maxl@imun.edu.cn

Respiratory Rate Prediction Method Based on Multimodal Adaptive Fusion

LU Yang, ZHANG Xuepei, MA Xiaolei, WANG Yibo, BAI Jinfeng   

  1. College of Computer Science and Technology, Inner Mongolia Minzu University, Tongliao 028000, Inner Mongolia Autonomous Region, China
  • Received: 2025-01-26 Online: 2026-03-26 Published: 2026-03-26



Abstract: To address the limitations of existing respiratory rate prediction research in the deep joint analysis of multimodal physiological signals, as well as the difficulty of balancing long-term temporal dependencies with the capture of local details, we propose a prediction model based on a dynamic multidimensional feature hybrid network. First, we construct an adaptive multi-scale fusion module that dynamically extracts multi-frequency features from the electrocardiogram (ECG) and photoplethysmogram (PPG) respectively, generating single-modal feature maps rich in multi-scale information and thereby resolving the limited receptive field of a single convolutional kernel. Second, the model incorporates a hybrid spatio-temporal attention mechanism: by stacking Transformer encoder blocks and combining local, global, and spatio-temporal triple attention strategies, it achieves deep interaction between heterogeneous features and precise modeling of long-term temporal dependencies. Validation on the public BIDMC and CapnoBase datasets shows that the model attains mean absolute errors of 1.08 breaths/min and 0.76 breaths/min, respectively, significantly outperforming existing mainstream models in accuracy and robustness, and can provide a theoretical basis for clinical non-invasive health monitoring.
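The adaptive multi-scale fusion idea described above can be illustrated with a minimal sketch: several filters of different widths are applied to one physiological signal in parallel, and the resulting feature maps are combined with softmax gate weights. This is only an illustrative toy in NumPy, not the paper's implementation; the function name `multi_scale_fusion`, the kernel sizes, the random stand-in filters, and the mean-based gating are all assumptions made for demonstration.

```python
import numpy as np

def multi_scale_fusion(signal, kernel_sizes=(3, 7, 15), seed=0):
    """Illustrative sketch of adaptive multi-scale fusion.

    Each kernel size plays the role of one convolutional branch with a
    different receptive field; the random kernels stand in for learned
    filters, and the softmax gate stands in for an adaptive weighting
    network. Returns one fused feature map the length of the input.
    """
    rng = np.random.default_rng(seed)
    branches = []
    for k in kernel_sizes:
        kernel = rng.standard_normal(k) / np.sqrt(k)  # stand-in for a learned filter
        branches.append(np.convolve(signal, kernel, mode="same"))
    feats = np.stack(branches)                 # shape: (n_scales, signal_len)
    gate_logits = feats.mean(axis=1)           # stand-in for an adaptive gating network
    weights = np.exp(gate_logits) / np.exp(gate_logits).sum()  # softmax over scales
    return (weights[:, None] * feats).sum(axis=0)  # weighted fusion across scales
```

In a real model the branches would be trainable 1-D convolutions over ECG and PPG windows and the gate would be produced by a small learned network, but the structural point is the same: multiple receptive fields are evaluated in parallel and fused adaptively rather than fixing a single kernel size.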

Key words: multimodal, respiratory rate prediction, hybrid spatio-temporal attention, adaptive multi-scale feature fusion
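The reported evaluation metric, mean absolute error (MAE), is the average absolute difference between predicted and reference respiratory rates. A minimal sketch, with made-up breaths/min values that are not from the paper's experiments:

```python
import numpy as np

def mae(y_true, y_pred):
    """Mean absolute error between reference and predicted values."""
    return float(np.mean(np.abs(np.asarray(y_true) - np.asarray(y_pred))))

# Illustrative respiratory rates in breaths/min (hypothetical values)
true_rr = [14.0, 16.0, 18.0, 20.0]
pred_rr = [15.0, 15.5, 18.5, 19.0]
print(mae(true_rr, pred_rr))  # → 0.75
```

An MAE of 1.08 breaths/min on BIDMC therefore means the model's predictions deviate from the capnography-derived reference by about one breath per minute on average.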

CLC number: 

  • TP391