Journal of Jilin University (Information Science Edition) ›› 2026, Vol. 44 ›› Issue (1): 78-86.




  • Corresponding author: YANG Bo (1974—), male, from Xinxiang, Henan, professor and doctoral supervisor at Jilin University; his research focuses on graph neural networks and graph optimization. (Tel) 86-431-85166892 (E-mail) ybo@jlu.edu.cn
  • About the author: ZHAO Chenyang (1999—), male, from Harbin, master's student at Jilin University; his research focuses on graph optimization and Bayesian optimization. (Tel) 86-15643067937 (E-mail) 15643067937@163.com

Parallel Incremental Graph Bayesian Optimization for Large-Scale Virtual Screening

ZHAO Chenyang a,b, ZHAO Haishi a,b, YANG Bo a,b

  a. College of Computer Science and Technology; b. Key Laboratory of Symbolic Computation and Knowledge Engineering of Ministry of Education, Jilin University, Changchun 130012, China
  • Received: 2024-12-19 Online: 2026-01-31 Published: 2026-02-03



Abstract: Traditional methods such as molecular docking face prohibitively high time costs, or outright infeasibility, in large-scale virtual screening tasks. To address this problem, a parallel graph Bayesian optimization framework incorporating incremental learning is proposed to handle such tasks efficiently. The method uses a deep graph Bayesian optimization framework for screening and employs parallelization to enable flexible deployment across multiple computational nodes on multiple servers, significantly improving computational efficiency. To reduce the long training time of the surrogate model, an incremental learning strategy is introduced, together with an exponential moving average mechanism and a replay mechanism that mitigate catastrophic forgetting. Experimental results demonstrate that the framework identifies over 96% of the top-scoring molecules while docking only 6% of the compound library. Deployed on four computational nodes, the parallel framework reduces time cost by 71% compared with the serial framework. With the incremental learning strategy, total runtime is further reduced by 13.8%, while still identifying 93.7% of the top-scoring molecules. The proposed method thus greatly reduces the time cost of virtual screening while maintaining screening performance.
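The loop the abstract describes — a surrogate ranks candidates, only the top batch is docked, and the surrogate is updated incrementally with EMA-smoothed parameters and a replay buffer — can be sketched as follows. This is a minimal toy illustration under strong assumptions (each molecule reduced to one positive scalar feature, a one-weight linear surrogate, greedy acquisition), not the paper's deep graph model; the names `EMAReplaySurrogate` and `screen` are hypothetical.

```python
import random

class EMAReplaySurrogate:
    """Toy linear surrogate with incremental updates: a sketch of the
    EMA-plus-replay idea, with the 'network' reduced to one weight."""

    def __init__(self, decay=0.9, replay_size=64):
        self.decay = decay            # EMA decay for parameter smoothing
        self.weight = 0.0             # current model parameter
        self.ema_weight = 0.0         # exponentially averaged parameter
        self.replay = []              # buffer of past (feature, score) pairs
        self.replay_size = replay_size

    def predict(self, x):
        # Score a candidate with the smoothed (EMA) parameter.
        return self.ema_weight * x

    def update(self, batch):
        # Replay: mix the new batch with stored old samples so the model
        # does not forget earlier docking results (catastrophic forgetting).
        old = random.sample(self.replay, min(len(self.replay), len(batch)))
        data = list(batch) + old
        # 'Training' step: nudge the weight toward the mean slope y/x
        # (features are assumed strictly positive in this toy setup).
        target = sum(y / x for x, y in data) / len(data)
        self.weight += 0.5 * (target - self.weight)
        # EMA: smooth the parameter across incremental rounds.
        self.ema_weight = self.decay * self.ema_weight + (1 - self.decay) * self.weight
        self.replay = (self.replay + list(batch))[-self.replay_size:]

def screen(library, dock, rounds=5, batch_size=4):
    """BO-style loop: the surrogate ranks undocked candidates, only the
    top batch goes through the expensive dock() call, then the surrogate
    is updated incrementally on the new results."""
    model = EMAReplaySurrogate()
    docked = {}
    for _ in range(rounds):
        pool = [x for x in library if x not in docked]
        batch = sorted(pool, key=model.predict, reverse=True)[:batch_size]
        results = [(x, dock(x)) for x in batch]
        docked.update(results)
        model.update(results)
    return docked
```

With `dock` standing in for an expensive docking call, `screen` evaluates only `rounds × batch_size` molecules in total; in the paper's setting, the surrogate is a deep graph model and batches would additionally be docked in parallel across computational nodes.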

Key words: parallel, virtual screening, Bayesian optimization, incremental learning

CLC number: 

  • TP391