吉林大学学报(工学版) ›› 2021, Vol. 51 ›› Issue (4): 1420-1426.doi: 10.13229/j.cnki.jdxbgxb20200515

• 计算机科学与技术 • 上一篇    

基于深度强化学习的随机局部搜索启发式方法

吕帅1(),刘京2   

  1. 1.吉林大学 计算机科学与技术学院,长春 130012
    2.吉林大学 软件学院,长春 130012
  • 收稿日期:2020-07-08 出版日期:2021-07-01 发布日期:2021-07-14
  • 作者简介:吕帅(1981-),男,副教授,博士生导师. 研究方向:人工智能,机器学习,自动推理.E-mail: lus@jlu.edu.cn
  • 基金资助:
    国家重点研发计划项目(2017YFB1003103);国家自然科学基金项目(61300049);吉林省自然科学基金项目(20180101053JC)

Stochastic local search heuristic method based on deep reinforcement learning

Shuai LYU1(),Jing LIU2   

  1. 1.College of Computer Science and Technology,Jilin University,Changchun 130012,China
    2.College of Software,Jilin University,Changchun 130012,China
  • Received:2020-07-08 Online:2021-07-01 Published:2021-07-14

摘要:

为了更充分地利用可满足问题(SAT)的数据分布中的信息,从而提升算法性能,提出了一种基于深度强化学习的随机局部搜索启发式方法。把随机局部搜索算法中变量的选择看作强化学习任务,训练强化学习Agent学习策略作为随机局部搜索算法选择翻转变量的启发式,以期望通过端到端的方式获得效率更好的翻转变量的选择方法。实验结果表明,本文方法是有效的,并且与经典随机局部搜索算法ProbSAT相比,本文方法在性能上也有一定的优势,可以在更少的决策步骤内求出问题的解。

关键词: 计算机软件, 可满足性, 随机局部搜索, 深度强化学习

Abstract:

In order to make full use of the information in the data distribution of the Satisfiability Problem (SAT) and thereby improve the performance of the algorithm, this paper proposes a stochastic local search heuristic method based on deep reinforcement learning. The selection of the variables in stochastic local search algorithm is regarded as a reinforcement learning task. The strategy agent learns is used as a heuristic for selecting variables, so that we can obtain a more efficient variable selection heuristic in an end-to-end way. Experiments show that the method in this paper is effective. Compared with the classical stochastic local search algorithm ProbSAT, our method also has certain advantages in performance, and can solve the problems in fewer decision steps.

Key words: computer software, satisfiability, stochastic local search, deep reinforcement learning

中图分类号: 

  • TP181

图1

SAT问题图表示"

图2

网络结构"

表1

本文算法和ProbSAT在随机3-SAT的实例上测试的结果"

DatasetRL?basedProbSAT
ttotal#flipsttotal#flips
3?SAT(20)17.316 1641.318 892
3?SAT(50)44.0319 47916.1463 831
3?SAT(100)80.2772 876109.7990 461

表2

同分布不同规模实例泛化能力测试"

DatasetTrain_on_D20Train_on_D
ttotal#flipsttotal#flips
3?SAT(20)17.3161 6417.316 164
3?SAT(50)49.1390 37144.0319 479
3?SAT(100)112.41 061 20180.2772 876

表3

不同分布实例泛化能力测试"

Dataset#vars#clausesTrain_on_D20Train_on_D
ttotal#flipsttotal#flips
DomSet(2)15691.514 2011.615 796
DomSet(3)3636914.3127 83412.7110 420
DomSet(7)1284 344133.51413 83685.6858 127
1 Cook S A. The complexity of theorem-proving procedures[C]∥Proceedings of the 3rd Annual ACM Symposium on Theory of Computing, New Orleans, USA, 1971: 151-158.
2 Audemard G, Simon L. Predicting learnt clauses quality in modern SAT solvers[C]∥International Joint Conference on Artificial Intelligence, Hainan Island, China, 2009: 399-404.
3 Moskewicz M W, Madigan C F, Zhao Y, et al. Chaff: Engineering an efficient SAT solver [C]∥Proceedings of the 38th Annual Design Automation Conference, Las Vegas, USA, 2001: 530-535.
4 Eén N, Sörensson N. An extensible SAT-solver[C]∥International Conference on Theory and Applications of Satisfiability Testing, Santa Margherita Ligure, Italy,2003: 502-518.
5 Cai S, Su K. Local search for Boolean Satisfiability with configuration checking and subscore[J]. Artificial Intelligence, 2013, 204: 75-98.
6 Shang Y, Wah B W. A discrete Lagrangian-based global-search method for solving satisfiability problems[J]. Journal of Global Optimization, 1998, 12(1): 61-99.
7 Balint A, Schöning U. Choosing probability distributions for stochastic local search and the role of make versus break[C]∥International Conference on Theory and Applications of Satisfiability Testing, Trento, Italy, 2012: 16-29.
8 Wang F, Jiang M, Qian C, et al. Residual attention network for image classification[C]∥Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Honolulu, USA,2017: 3156-3164.
9 Chiu C C, Sainath T N, Wu Y, et al. State-of-the-art speech recognition with sequence-to-sequence models[C]∥IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), Calgary, Alberta, Canada, 2018: 4774-4778.
10 Vaswani A, Shazeer N, Parmar N, et al. Attention is all you need[C]∥Advances in Neural Information Processing Systems, Los Angeles,USA, 2017: 5998-6008.
11 Xu L, Hutter F, Hoos H H, et al. SATzilla: portfolio-based algorithm selection for SAT[J]. Journal of Artificial Intelligence Research, 2008, 32: 565-606.
12 Liang J H, Ganesh V, Poupart P, et al. Learning rate based branching heuristic for SAT solvers [C]∥International Conference on Theory and Applications of Satisfiability Testing, Bordeaux, France, 2016: 123-140.
13 Liang J H, VK H G, Poupart P, et al. An empirical study of branching heuristics through the lens of global learning rate[C]∥International Conference on Theory and Applications of Satisfiability Testing, Melbourne, Australia,2017: 119-135.
14 Selsam D, Lamm M, Bünz B, et al. Learning a SAT solver from single-bit supervision[C]∥The 7th International Conference on Learning Representations, San Diego, USA, 2019.
15 Sutton R S, Barto A G. Reinforcement Learning: an Introduction[M]. Boston:MIT Press, 2018: 68-69.
16 Mnih V, Kavukcuoglu K, Silver D, et al. Human-level control through deep reinforcement learning[J]. Nature, 2015, 518(7540): 529-533.
17 Mezard M, Zecchina R. Random k-satisifability problem: from an analytic solution to an efficient algorithm[J]. Physical Review E, 2002, 66 (5) :056126.
[1] 赵亚慧,杨飞扬,张振国,崔荣一. 基于强化学习和注意力机制的朝鲜语文本结构发现[J]. 吉林大学学报(工学版), 2021, 51(4): 1387-1395.
[2] 魏晓辉,孙冰怡,崔佳旭. 基于图神经网络的兴趣活动推荐算法[J]. 吉林大学学报(工学版), 2021, 51(1): 278-284.
[3] 袁满,胡超,仇婷婷. 基于Linked data的数据完整性评估新方法[J]. 吉林大学学报(工学版), 2020, 50(5): 1826-1831.
[4] 刘磊,瓮杰,郭德贵. 面向编译器测试的部分求值静态输入确定方法[J]. 吉林大学学报(工学版), 2020, 50(1): 262-267.
[5] 杨顺,蒋渊德,吴坚,刘海贞. 基于多类型传感数据的自动驾驶深度强化学习方法[J]. 吉林大学学报(工学版), 2019, 49(4): 1026-1033.
[6] 马健, 樊建平, 刘峰, 李红辉. 面向对象软件系统演化模型[J]. 吉林大学学报(工学版), 2018, 48(2): 545-550.
[7] 罗养霞, 郭晔. 基于数据依赖特征的软件识别[J]. 吉林大学学报(工学版), 2017, 47(6): 1894-1902.
[8] 应欢, 王东辉, 武成岗, 王喆, 唐博文, 李建军. 适用于商用系统环境的低开销确定性重放技术[J]. 吉林大学学报(工学版), 2017, 47(1): 208-217.
[9] 李勇, 黄志球, 王勇, 房丙午. 基于多源数据的跨项目软件缺陷预测[J]. 吉林大学学报(工学版), 2016, 46(6): 2034-2041.
[10] 王念滨, 祝官文, 周连科, 王红卫. 支持高效路径查询的数据空间索引方法[J]. 吉林大学学报(工学版), 2016, 46(3): 911-916.
[11] 陈鹏飞, 田地, 杨光. 基于MVC架构的LIBS软件设计与实现[J]. 吉林大学学报(工学版), 2016, 46(1): 242-245.
[12] 康辉, 王家琦, 梅芳. 基于Pi演算的并行编程语言[J]. 吉林大学学报(工学版), 2016, 46(1): 235-241.
[13] 特日跟, 江晟, 李雄飞, 李军. 基于整数数据的文档压缩编码方案[J]. 吉林大学学报(工学版), 2016, 46(1): 228-234.
[14] 冯晓宁, 王卓, 张旭. 基于L-π演算的WSN路由协议形式化方法[J]. 吉林大学学报(工学版), 2015, 45(5): 1565-1571.
[15] 刘磊, 王燕燕, 申春, 李玉祥, 刘雷. Bellman-Ford算法性能可移植的GPU并行优化[J]. 吉林大学学报(工学版), 2015, 45(5): 1559-1564.
Viewed
Full text


Abstract

Cited

  Shared   
  Discussed   
No Suggested Reading articles found!