Journal of Jilin University (Engineering and Technology Edition) ›› 2021, Vol. 51 ›› Issue (2): 667-676. doi: 10.13229/j.cnki.jdxbgxb20191070

• Computer Science and Technology •

Accelerating CALYPSO structure prediction with machine learning

Xiao-hui WEI¹, Chang-bao ZHOU¹, Xiao-xian SHEN¹, Yuan-yuan LIU¹, Qun-chao TONG²

  1. College of Computer Science & Technology, Jilin University, Changchun 130012, China
  2. State Key Lab of Superhard Materials, Jilin University, Changchun 130012, China
  • Received: 2019-11-22 Online: 2021-03-01 Published: 2021-02-09
  • About the author: Xiao-hui WEI (1972-), male, professor, doctoral supervisor. Research interests: cloud computing, high-performance computing, distributed systems. E-mail: weixh@jlu.edu.cn
  • Funding:
    National Natural Science Foundation of China (61772228)

Abstract:

The feasibility of accelerating CALYPSO structure prediction by replacing DFT energy calculations with machine learning was studied, and five machine learning methods were evaluated on predicting the total energy of boron clusters. First, the raw data were represented as structural information matrices using the Coulomb matrix. The eigenvalue vectors of these matrices were then extracted and used as the input of the machine learning algorithms to train the models. The five algorithms were trained and tested on the same dataset, and other factors affecting their performance were explored. Finally, a method for judging the similarity between predicted values and ground truth was proposed based on the characteristics of the potential energy surface (PES), and a confidence model was built to validate the best-performing method, kernel ridge regression (KRR). The results show that the PES predicted by KRR is similar to the PES obtained with DFT, and that the confidence of the algorithm is close to 90% when the permissible error is 1 kcal/mol. Timing tests show that the time complexity of KRR is O(n), an improvement of 1 to 2 orders of magnitude over DFT methods.

Key words: computer application, structure prediction, energy calculation, root mean square error, confidence
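
The kernel ridge regression step named in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the Gaussian kernel, the values of `sigma` and `lam`, the helper names, and the toy data are all assumptions made for illustration, and in practice a library routine (for example scikit-learn's `KernelRidge`) would normally be used. The sketch solves the closed-form system (K + λI)α = y and predicts with f(x) = Σ αᵢ k(x, xᵢ).

```python
import math

def gaussian_kernel(x, y, sigma):
    """Gaussian (RBF) kernel between two feature vectors (assumed kernel choice)."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-sq_dist / (2.0 * sigma ** 2))

def solve_linear(a, b):
    """Solve a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    aug = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        tail = sum(aug[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (aug[i][n] - tail) / aug[i][i]
    return x

def krr_fit(xs, ys, sigma, lam):
    """Return the dual coefficients alpha solving (K + lam * I) alpha = y."""
    n = len(xs)
    k = [[gaussian_kernel(xs[i], xs[j], sigma) + (lam if i == j else 0.0)
          for j in range(n)] for i in range(n)]
    return solve_linear(k, ys)

def krr_predict(x, xs, alphas, sigma):
    """Predict f(x) as a kernel-weighted sum over the training points."""
    return sum(a * gaussian_kernel(x, xi, sigma) for a, xi in zip(alphas, xs))

# Hypothetical one-dimensional toy data, not taken from the paper.
train_x = [(0.0,), (1.0,), (2.0,), (3.0,)]
train_y = [0.0, 1.0, 4.0, 9.0]
alphas = krr_fit(train_x, train_y, sigma=1.0, lam=1e-8)
print(krr_predict((1.5,), train_x, alphas, sigma=1.0))
```

With a very small `lam` the model nearly interpolates the training values; a larger `lam` trades training accuracy for smoothness.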

CLC number:

  • TP399

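Fig. 1 Schematic of a two-dimensional potential energy surface

Fig. 2 Schematic of a neural network

Fig. 3 Boron cluster structures

Table 1 Data format of boron clusters

X        Y        Z
Energy: -123.82 eV
12.36    13.54    6.50
6.84     6.45     6.50
10.66    10.95    6.49
8.18     9.27     6.49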

Fig. 4 Coulomb matrix computed for B8
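
As a rough illustration of how a Coulomb matrix like the one in Fig. 4 is built, the sketch below uses the commonly cited definition (diagonal 0.5·Zᵢ^2.4, off-diagonal ZᵢZⱼ/|Rᵢ − Rⱼ|). Whether the paper uses exactly this form and which length unit it assumes are not stated on this page, so both are assumptions; the function name `coulomb_matrix` is hypothetical. The coordinates are the four rows shown in Table 1, not a full B8 cluster.

```python
import math

BORON_Z = 5  # atomic number of boron

def coulomb_matrix(coords, charges):
    """Build a Coulomb matrix as a list of lists.

    Diagonal entries: 0.5 * Z_i ** 2.4
    Off-diagonal entries: Z_i * Z_j / |R_i - R_j|
    Distances are taken in whatever unit the coordinates use (assumption).
    """
    n = len(coords)
    matrix = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i == j:
                matrix[i][j] = 0.5 * charges[i] ** 2.4
            else:
                distance = math.dist(coords[i], coords[j])
                matrix[i][j] = charges[i] * charges[j] / distance
    return matrix

# The four coordinate rows listed in Table 1.
coords = [
    (12.36, 13.54, 6.50),
    (6.84, 6.45, 6.50),
    (10.66, 10.95, 6.49),
    (8.18, 9.27, 6.49),
]
cm = coulomb_matrix(coords, [BORON_Z] * len(coords))
for row in cm:
    print(" ".join(f"{value:8.3f}" for value in row))
```

The resulting matrix is symmetric, and all diagonal entries are equal because every atom is boron.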

Fig. 5 Eigenvalue extraction process
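
The eigenvalue extraction in Fig. 5 can be sketched as follows. This is an illustrative stand-in, not the paper's procedure: it uses a plain cyclic Jacobi rotation scheme so that the example has no dependencies, whereas a library routine such as `numpy.linalg.eigvalsh` would normally be used. Sorting the eigenvalues in descending order is an assumption, chosen so that the vector does not depend on atom ordering. The function name and the sample matrix are hypothetical.

```python
import math

def jacobi_eigenvalues(matrix, tol=1e-12, max_sweeps=100):
    """Eigenvalues of a real symmetric matrix via cyclic Jacobi rotations."""
    a = [row[:] for row in matrix]
    n = len(a)
    for _ in range(max_sweeps):
        # Frobenius norm of the off-diagonal part; stop once it is negligible.
        off = math.sqrt(sum(a[i][j] ** 2
                            for i in range(n) for j in range(n) if i != j))
        if off < tol:
            break
        for p in range(n - 1):
            for q in range(p + 1, n):
                if a[p][q] == 0.0:
                    continue
                diff = a[q][q] - a[p][p]
                # Rotation angle that zeroes a[p][q]; |theta| <= pi / 4.
                if diff == 0.0:
                    theta = math.pi / 4.0
                else:
                    theta = 0.5 * math.atan(2.0 * a[p][q] / diff)
                c, s = math.cos(theta), math.sin(theta)
                # Column update (A <- A J), then row update (A <- J^T A).
                for k in range(n):
                    akp, akq = a[k][p], a[k][q]
                    a[k][p] = c * akp - s * akq
                    a[k][q] = s * akp + c * akq
                for k in range(n):
                    apk, aqk = a[p][k], a[q][k]
                    a[p][k] = c * apk - s * aqk
                    a[q][k] = s * apk + c * aqk
    return sorted((a[i][i] for i in range(n)), reverse=True)

# Hypothetical symmetric matrix standing in for a Coulomb matrix.
sample = [
    [4.0, 1.0, 0.5],
    [1.0, 3.0, 0.2],
    [0.5, 0.2, 1.0],
]
eigs = jacobi_eigenvalues(sample)
print(eigs)
```

The eigenvalue vector has one entry per atom, so clusters of the same size map to feature vectors of the same length.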

Fig. 6 Prediction accuracy of the five methods

Fig. 7 Distribution of cluster structures in the dataset

Table 2 Time cost of the algorithms

Method    Training time/ms    Prediction time/ms
KNN       29.02               76.49
KRR       13 777.21           2 157.36
LRR       10.34               0.5
MNN       4 403.28            5.46
SVR       14 404.39           2 408.4

Fig. 8 Results of exploring other factors

Fig. 9 Potential energy surface of B20

Table 3 Confidence of the KRR algorithm tested on B20 and other structures

Permissible error    B20      B22      B24      B28      B30      B36      B38      B40
0.50                 0.666    0.655    0.721    0.769    0.732    0.756    0.752    0.767
0.55                 0.709    0.710    0.753    0.793    0.754    0.766    0.773    0.795
0.60                 0.745    0.754    0.780    0.809    0.786    0.797    0.801    0.832
0.65                 0.767    0.782    0.805    0.829    0.808    0.810    0.809    0.852
0.70                 0.790    0.802    0.816    0.849    0.840    0.834    0.835    0.866
0.75                 0.803    0.833    0.836    0.869    0.855    0.844    0.850    0.881
0.80                 0.815    0.845    0.854    0.880    0.874    0.864    0.860    0.895
0.85                 0.825    0.877    0.872    0.884    0.874    0.878    0.868    0.908
0.90                 0.845    0.885    0.877    0.896    0.882    0.885    0.877    0.919
0.95                 0.858    0.893    0.885    0.900    0.887    0.905    0.884    0.924
1.00                 0.862    0.897    0.891    0.900    0.894    0.915    0.888    0.929
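
One plausible reading of the confidence values in Table 3 is the fraction of test structures whose predicted energy lies within the permissible error of the DFT reference. The page does not give the paper's exact definition, so this reading is an assumption, and the function name `confidence` and the sample energies are hypothetical. Predictions, references, and the tolerance are assumed to share one energy unit.

```python
def confidence(predicted, reference, tolerance):
    """Fraction of predictions within `tolerance` of the reference values."""
    if len(predicted) != len(reference) or not predicted:
        raise ValueError("inputs must be non-empty and of equal length")
    hits = sum(1 for p, r in zip(predicted, reference)
               if abs(p - r) <= tolerance)
    return hits / len(predicted)

# Hypothetical energies (same unit for both lists), not taken from the paper.
predicted = [-123.80, -123.15, -122.95, -124.40]
reference = [-123.82, -123.60, -123.00, -123.50]

for tol in (0.1, 0.5, 1.0):
    print(tol, confidence(predicted, reference, tol))
```

Under this definition the confidence can only stay the same or rise as the permissible error grows, which matches the trend down each column of the table.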

Fig. 10 Prediction time of KRR
