基于深度强化学习的随机局部搜索启发式方法

doi:10.13229/j.cnki.jdxbgxb20200515

吉林大学学报(工学版) ›› 2021, Vol. 51 ›› Issue (4): 1420-1426.doi: 10.13229/j.cnki.jdxbgxb20200515

• 计算机科学与技术 • 上一篇

基于深度强化学习的随机局部搜索启发式方法

吕帅¹(),刘京²

^1.吉林大学计算机科学与技术学院，长春 130012
^2.吉林大学软件学院，长春 130012

收稿日期:2020-07-08 出版日期:2021-07-01 发布日期:2021-07-14
作者简介:吕帅（1981-），男，副教授，博士生导师. 研究方向：人工智能，机器学习，自动推理.E-mail： lus@jlu.edu.cn
基金资助:
国家重点研发计划项目(2017YFB1003103);国家自然科学基金项目(61300049);吉林省自然科学基金项目(20180101053JC)

Stochastic local search heuristic method based on deep reinforcement learning

Shuai LYU¹(),Jing LIU²

^1.College of Computer Science and Technology，Jilin University，Changchun 130012，China
^2.College of Software，Jilin University，Changchun 130012，China

Received:2020-07-08 Online:2021-07-01 Published:2021-07-14

摘要/Abstract

摘要：

为了更充分地利用可满足问题（SAT）的数据分布中的信息，从而提升算法性能，提出了一种基于深度强化学习的随机局部搜索启发式方法。把随机局部搜索算法中变量的选择看作强化学习任务，训练强化学习Agent学习策略作为随机局部搜索算法选择翻转变量的启发式，以期望通过端到端的方式获得效率更好的翻转变量的选择方法。实验结果表明，本文方法是有效的，并且与经典随机局部搜索算法ProbSAT相比，本文方法在性能上也有一定的优势，可以在更少的决策步骤内求出问题的解。

关键词: 计算机软件, 可满足性, 随机局部搜索, 深度强化学习

Abstract:

In order to make full use of the information in the data distribution of the Satisfiability Problem (SAT) and thereby improve the performance of the algorithm, this paper proposes a stochastic local search heuristic method based on deep reinforcement learning. The selection of the variables in stochastic local search algorithm is regarded as a reinforcement learning task. The strategy agent learns is used as a heuristic for selecting variables, so that we can obtain a more efficient variable selection heuristic in an end-to-end way. Experiments show that the method in this paper is effective. Compared with the classical stochastic local search algorithm ProbSAT, our method also has certain advantages in performance, and can solve the problems in fewer decision steps.

Key words: computer software, satisfiability, stochastic local search, deep reinforcement learning

中图分类号:

TP181

吕帅,刘京. 基于深度强化学习的随机局部搜索启发式方法[J]. 吉林大学学报(工学版), 2021, 51(4): 1420-1426.

Shuai LYU,Jing LIU. Stochastic local search heuristic method based on deep reinforcement learning[J]. Journal of Jilin University(Engineering and Technology Edition), 2021, 51(4): 1420-1426.

图/表 5

图1

图2

表1

本文算法和ProbSAT在随机3-SAT的实例上测试的结果"

Dataset	RL?based		ProbSAT
Dataset	$t t o t a l$	$# f l i p s$	$t t o t a l$	$# f l i p s$
3?SAT（20）	17.3	16 164	1.3	18 892
3?SAT（50）	44.0	319 479	16.1	463 831
3?SAT（100）	80.2	772 876	109.7	990 461

表1

表2

同分布不同规模实例泛化能力测试"

Dataset	Train_on_D20		Train_on_D
Dataset	$t t o t a l$	$# f l i p s$	$t t o t a l$	$# f l i p s$
3?SAT（20）	17.3	161 64	17.3	16 164
3?SAT（50）	49.1	390 371	44.0	319 479
3?SAT（100）	112.4	1 061 201	80.2	772 876

表2

表3

不同分布实例泛化能力测试"

Dataset	$# v a r s$	$# c l a u s e s$	Train_on_D20		Train_on_D
Dataset	$# v a r s$	$# c l a u s e s$	$t t o t a l$	$# f l i p s$	$t t o t a l$	$# f l i p s$
DomSet（2）	15	69	1.5	14 201	1.6	15 796
DomSet（3）	36	369	14.3	127 834	12.7	110 420
DomSet（7）	128	4 344	133.5	1413 836	85.6	858 127

表3

参考文献 17

1	Cook S A. The complexity of theorem-proving procedures[C]∥Proceedings of the 3rd Annual ACM Symposium on Theory of Computing, New Orleans, USA, 1971: 151-158.
2	Audemard G, Simon L. Predicting learnt clauses quality in modern SAT solvers[C]∥International Joint Conference on Artificial Intelligence, Hainan Island, China, 2009: 399-404.
3	Moskewicz M W, Madigan C F, Zhao Y, et al. Chaff: Engineering an efficient SAT solver [C]∥Proceedings of the 38th Annual Design Automation Conference, Las Vegas, USA, 2001: 530-535.
4	Eén N, Sörensson N. An extensible SAT-solver[C]∥International Conference on Theory and Applications of Satisfiability Testing, Santa Margherita Ligure, Italy,2003: 502-518.
5	Cai S, Su K. Local search for Boolean Satisfiability with configuration checking and subscore[J]. Artificial Intelligence, 2013, 204: 75-98.
6	Shang Y, Wah B W. A discrete Lagrangian-based global-search method for solving satisfiability problems[J]. Journal of Global Optimization, 1998, 12(1): 61-99.
7	Balint A, Schöning U. Choosing probability distributions for stochastic local search and the role of make versus break[C]∥International Conference on Theory and Applications of Satisfiability Testing, Trento, Italy, 2012: 16-29.
8	Wang F, Jiang M, Qian C, et al. Residual attention network for image classification[C]∥Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Honolulu, USA,2017: 3156-3164.
9	Chiu C C, Sainath T N, Wu Y, et al. State-of-the-art speech recognition with sequence-to-sequence models[C]∥IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), Calgary, Alberta, Canada, 2018: 4774-4778.
10	Vaswani A, Shazeer N, Parmar N, et al. Attention is all you need[C]∥Advances in Neural Information Processing Systems, Los Angeles,USA, 2017: 5998-6008.
11	Xu L, Hutter F, Hoos H H, et al. SATzilla: portfolio-based algorithm selection for SAT[J]. Journal of Artificial Intelligence Research, 2008, 32: 565-606.
12	Liang J H, Ganesh V, Poupart P, et al. Learning rate based branching heuristic for SAT solvers [C]∥International Conference on Theory and Applications of Satisfiability Testing, Bordeaux, France, 2016: 123-140.
13	Liang J H, VK H G, Poupart P, et al. An empirical study of branching heuristics through the lens of global learning rate[C]∥International Conference on Theory and Applications of Satisfiability Testing, Melbourne, Australia,2017: 119-135.
14	Selsam D, Lamm M, Bünz B, et al. Learning a SAT solver from single-bit supervision[C]∥The 7th International Conference on Learning Representations, San Diego, USA, 2019.
15	Sutton R S, Barto A G. Reinforcement Learning: an Introduction[M]. Boston:MIT Press, 2018: 68-69.
16	Mnih V, Kavukcuoglu K, Silver D, et al. Human-level control through deep reinforcement learning[J]. Nature, 2015, 518(7540): 529-533.
17	Mezard M, Zecchina R. Random k-satisifability problem: from an analytic solution to an efficient algorithm[J]. Physical Review E, 2002, 66 (5) :056126.

相关文章 15

[1]	赵亚慧,杨飞扬,张振国,崔荣一. 基于强化学习和注意力机制的朝鲜语文本结构发现[J]. 吉林大学学报(工学版), 2021, 51(4): 1387-1395.
[2]	魏晓辉,孙冰怡,崔佳旭. 基于图神经网络的兴趣活动推荐算法[J]. 吉林大学学报(工学版), 2021, 51(1): 278-284.
[3]	袁满,胡超,仇婷婷. 基于Linked data的数据完整性评估新方法[J]. 吉林大学学报(工学版), 2020, 50(5): 1826-1831.
[4]	刘磊,瓮杰,郭德贵. 面向编译器测试的部分求值静态输入确定方法[J]. 吉林大学学报(工学版), 2020, 50(1): 262-267.
[5]	杨顺,蒋渊德,吴坚,刘海贞. 基于多类型传感数据的自动驾驶深度强化学习方法[J]. 吉林大学学报(工学版), 2019, 49(4): 1026-1033.
[6]	马健, 樊建平, 刘峰, 李红辉. 面向对象软件系统演化模型[J]. 吉林大学学报(工学版), 2018, 48(2): 545-550.
[7]	罗养霞, 郭晔. 基于数据依赖特征的软件识别[J]. 吉林大学学报(工学版), 2017, 47(6): 1894-1902.
[8]	应欢, 王东辉, 武成岗, 王喆, 唐博文, 李建军. 适用于商用系统环境的低开销确定性重放技术[J]. 吉林大学学报(工学版), 2017, 47(1): 208-217.
[9]	李勇, 黄志球, 王勇, 房丙午. 基于多源数据的跨项目软件缺陷预测[J]. 吉林大学学报(工学版), 2016, 46(6): 2034-2041.
[10]	王念滨, 祝官文, 周连科, 王红卫. 支持高效路径查询的数据空间索引方法[J]. 吉林大学学报(工学版), 2016, 46(3): 911-916.
[11]	陈鹏飞, 田地, 杨光. 基于MVC架构的LIBS软件设计与实现[J]. 吉林大学学报(工学版), 2016, 46(1): 242-245.
[12]	康辉, 王家琦, 梅芳. 基于Pi演算的并行编程语言[J]. 吉林大学学报(工学版), 2016, 46(1): 235-241.
[13]	特日跟, 江晟, 李雄飞, 李军. 基于整数数据的文档压缩编码方案[J]. 吉林大学学报(工学版), 2016, 46(1): 228-234.
[14]	冯晓宁, 王卓, 张旭. 基于L-π演算的WSN路由协议形式化方法[J]. 吉林大学学报(工学版), 2015, 45(5): 1565-1571.
[15]	刘磊, 王燕燕, 申春, 李玉祥, 刘雷. Bellman-Ford算法性能可移植的GPU并行优化[J]. 吉林大学学报(工学版), 2015, 45(5): 1559-1564.

Metrics

Viewed

Full text

Abstract

Cited

Shared

Discussed

基于深度强化学习的随机局部搜索启发式方法

Stochastic local search heuristic method based on deep reinforcement learning

RICH HTML

PDF (PC)

摘要/Abstract

引用本文

使用本文

图/表 5

参考文献 17

相关文章 15

Metrics

本文评价

推荐阅读 0