Journal of Jilin University(Engineering and Technology Edition) ›› 2021, Vol. 51 ›› Issue (4): 1420-1426.doi: 10.13229/j.cnki.jdxbgxb20200515

Previous Articles    

Stochastic local search heuristic method based on deep reinforcement learning

Shuai LYU1(),Jing LIU2   

  1. 1.College of Computer Science and Technology,Jilin University,Changchun 130012,China
    2.College of Software,Jilin University,Changchun 130012,China
  • Received:2020-07-08 Online:2021-07-01 Published:2021-07-14

Abstract:

In order to make full use of the information in the data distribution of the Satisfiability Problem (SAT) and thereby improve the performance of the algorithm, this paper proposes a stochastic local search heuristic method based on deep reinforcement learning. The selection of the variables in stochastic local search algorithm is regarded as a reinforcement learning task. The strategy agent learns is used as a heuristic for selecting variables, so that we can obtain a more efficient variable selection heuristic in an end-to-end way. Experiments show that the method in this paper is effective. Compared with the classical stochastic local search algorithm ProbSAT, our method also has certain advantages in performance, and can solve the problems in fewer decision steps.

Key words: computer software, satisfiability, stochastic local search, deep reinforcement learning

CLC Number: 

  • TP181

Fig.1

Graph representation of SAT problem"

Fig.2

Network structure"

Table 1

Results of proposed method and ProbSAT tested on random 3-SAT problem instances"

DatasetRL?basedProbSAT
ttotal#flipsttotal#flips
3?SAT(20)17.316 1641.318 892
3?SAT(50)44.0319 47916.1463 831
3?SAT(100)80.2772 876109.7990 461

Table 2

Generalization ability test of instances with the same distribution but different scales"

DatasetTrain_on_D20Train_on_D
ttotal#flipsttotal#flips
3?SAT(20)17.3161 6417.316 164
3?SAT(50)49.1390 37144.0319 479
3?SAT(100)112.41 061 20180.2772 876

Table 3

Generalization ability test of instances with different distribution"

Dataset#vars#clausesTrain_on_D20Train_on_D
ttotal#flipsttotal#flips
DomSet(2)15691.514 2011.615 796
DomSet(3)3636914.3127 83412.7110 420
DomSet(7)1284 344133.51413 83685.6858 127
1 Cook S A. The complexity of theorem-proving procedures[C]∥Proceedings of the 3rd Annual ACM Symposium on Theory of Computing, New Orleans, USA, 1971: 151-158.
2 Audemard G, Simon L. Predicting learnt clauses quality in modern SAT solvers[C]∥International Joint Conference on Artificial Intelligence, Hainan Island, China, 2009: 399-404.
3 Moskewicz M W, Madigan C F, Zhao Y, et al. Chaff: Engineering an efficient SAT solver [C]∥Proceedings of the 38th Annual Design Automation Conference, Las Vegas, USA, 2001: 530-535.
4 Eén N, Sörensson N. An extensible SAT-solver[C]∥International Conference on Theory and Applications of Satisfiability Testing, Santa Margherita Ligure, Italy,2003: 502-518.
5 Cai S, Su K. Local search for Boolean Satisfiability with configuration checking and subscore[J]. Artificial Intelligence, 2013, 204: 75-98.
6 Shang Y, Wah B W. A discrete Lagrangian-based global-search method for solving satisfiability problems[J]. Journal of Global Optimization, 1998, 12(1): 61-99.
7 Balint A, Schöning U. Choosing probability distributions for stochastic local search and the role of make versus break[C]∥International Conference on Theory and Applications of Satisfiability Testing, Trento, Italy, 2012: 16-29.
8 Wang F, Jiang M, Qian C, et al. Residual attention network for image classification[C]∥Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Honolulu, USA,2017: 3156-3164.
9 Chiu C C, Sainath T N, Wu Y, et al. State-of-the-art speech recognition with sequence-to-sequence models[C]∥IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), Calgary, Alberta, Canada, 2018: 4774-4778.
10 Vaswani A, Shazeer N, Parmar N, et al. Attention is all you need[C]∥Advances in Neural Information Processing Systems, Los Angeles,USA, 2017: 5998-6008.
11 Xu L, Hutter F, Hoos H H, et al. SATzilla: portfolio-based algorithm selection for SAT[J]. Journal of Artificial Intelligence Research, 2008, 32: 565-606.
12 Liang J H, Ganesh V, Poupart P, et al. Learning rate based branching heuristic for SAT solvers [C]∥International Conference on Theory and Applications of Satisfiability Testing, Bordeaux, France, 2016: 123-140.
13 Liang J H, VK H G, Poupart P, et al. An empirical study of branching heuristics through the lens of global learning rate[C]∥International Conference on Theory and Applications of Satisfiability Testing, Melbourne, Australia,2017: 119-135.
14 Selsam D, Lamm M, Bünz B, et al. Learning a SAT solver from single-bit supervision[C]∥The 7th International Conference on Learning Representations, San Diego, USA, 2019.
15 Sutton R S, Barto A G. Reinforcement Learning: an Introduction[M]. Boston:MIT Press, 2018: 68-69.
16 Mnih V, Kavukcuoglu K, Silver D, et al. Human-level control through deep reinforcement learning[J]. Nature, 2015, 518(7540): 529-533.
17 Mezard M, Zecchina R. Random k-satisifability problem: from an analytic solution to an efficient algorithm[J]. Physical Review E, 2002, 66 (5) :056126.
[1] Ya-hui ZHAO,Fei-yang YANG,Zhen-guo ZHANG,Rong-yi CUI. Korean text structure discovery based on reinforcement learning and attention mechanism [J]. Journal of Jilin University(Engineering and Technology Edition), 2021, 51(4): 1387-1395.
[2] Xiao-hui WEI,Bing-yi SUN,Jia-xu CUI. Recommending activity to users via deep graph neural network [J]. Journal of Jilin University(Engineering and Technology Edition), 2021, 51(1): 278-284.
[3] Man YUAN,Chao HU,Ting-ting QIU. A new method for data integrity assessment based on Linked data [J]. Journal of Jilin University(Engineering and Technology Edition), 2020, 50(5): 1826-1831.
[4] Lei LIU,Jie WENG,De-gui GUO. Static input determination method in partial evaluation for compiler test [J]. Journal of Jilin University(Engineering and Technology Edition), 2020, 50(1): 262-267.
[5] Shun YANG,Yuan⁃de JIANG,Jian WU,Hai⁃zhen LIU. Autonomous driving policy learning based on deep reinforcement learning and multi⁃type sensor data [J]. Journal of Jilin University(Engineering and Technology Edition), 2019, 49(4): 1026-1033.
[6] MA Jian, FAN Jian-ping, LIU Feng, LI Hong-hui. The evolution model of objective-oriented software system [J]. 吉林大学学报(工学版), 2018, 48(2): 545-550.
[7] LUO Yang-xia, GUO Ye. Software recognition based on features of data dependency [J]. 吉林大学学报(工学版), 2017, 47(6): 1894-1902.
[8] YING Huan, WANG Dong-hui, WU Cheng-gang, WANG Zhe, TANG Bo-wen, LI Jian-jun. Efficient deterministic replay technique on commodity system environment [J]. 吉林大学学报(工学版), 2017, 47(1): 208-217.
[9] LI Yong, HUANG Zhi-qiu, WANG Yong, FANG Bing-wu. New approach of cross-project defect prediction based on multi-source data [J]. 吉林大学学报(工学版), 2016, 46(6): 2034-2041.
[10] WANG Nian-bin, ZHU Guan-wen, ZHOU Lian-ke, WANG Hong-wei. Novel dataspace index for efficient processing of path query [J]. 吉林大学学报(工学版), 2016, 46(3): 911-916.
[11] CHEN Peng-fei, TIAN Di, YANG Guang. Design and implementation of LIBS software based on MVC architecture [J]. 吉林大学学报(工学版), 2016, 46(1): 242-245.
[12] TE Ri-gen, JIANG Sheng, LI Xiong-fei, LI Jun. Document compression scheme based on integer data [J]. 吉林大学学报(工学版), 2016, 46(1): 228-234.
[13] FENG Xiao-ning, WANG Zhuo, ZHANG Xu. Formal method for routing protocol of WSN based on L-π calculus [J]. 吉林大学学报(工学版), 2015, 45(5): 1565-1571.
[14] LIU Lei, WANG Yan-yan, SHEN Chun, LI Yu-xiang, LIU Lei. Performance portable GPU parallel optimization technique on Bellman-Ford algorithm [J]. 吉林大学学报(工学版), 2015, 45(5): 1559-1564.
[15] LI Ming-zhe, WANG Jin-lin, CHEN Xiao, CHEN Jun. Architecture model of streaming media applications on network processors(VPL) [J]. 吉林大学学报(工学版), 2015, 45(5): 1572-1580.
Viewed
Full text


Abstract

Cited

  Shared   
  Discussed   
No Suggested Reading articles found!