

Reference - Results: Experiments

SBSAT was tested on several popular benchmark suites. We also ran current versions of Berkmin (v. 561), zChaff (v. 2003.10.9), and Siege (v. 4) on these benchmarks for comparison. In addition, we concocted a class of random problems, called sliders, which resemble BMC problems in that copies of the same function, each differing only in the input variables it depends on, are conjoined. Making those functions random, in some sense, makes sliders hard. Specifically, sliders are defined as follows:

Choose $m$, even, the number of constraints, and choose two Boolean functions $f$ and $g$ of $k$ and $l$ inputs, respectively. The instance is the conjunction of $m$ constraints: $m/2$ copies of $f$ followed by $m/2$ copies of $g$, each copy applied to its own selection of the variables (the selections shift from one copy to the next) and constrained to equal an output bit $o_h$, where each $o_h$ is independently and uniformly chosen from $\{0,1\}$.

We find sliders appealing because they resemble some real-world problem domains and because $f$ and $g$ can be designed to force inferences to occur only when nearly all inputs of $f$ and $g$ are assigned values. This makes conflict analysis useless and challenges a search heuristic that looks for information contained in groups of variables.
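To make the construction concrete, the following is a minimal Python sketch of a slider generator. It assumes the reading given above: each constraint is a copy of $f$ or $g$ applied to a window of variables that shifts with the copy index, with the output bit $o_h$ drawn uniformly. The function names and the window layout are illustrative assumptions only; this is not the generator used to produce our benchmarks.

    import random

    def random_function(arity):
        # A random Boolean function of `arity` inputs, stored as a truth table.
        return tuple(random.randint(0, 1) for _ in range(2 ** arity))

    def make_slider(m, k, l, seed=0):
        # m (even) constraints over m variables: m/2 copies of f (k inputs)
        # followed by m/2 copies of g (l inputs).  Each copy reads a window of
        # consecutive variables starting at its own index (assumed layout) and
        # must evaluate to a uniformly random output bit o_h.
        random.seed(seed)
        f, g = random_function(k), random_function(l)
        constraints = []
        for h in range(m):
            func, arity = (f, k) if h < m // 2 else (g, l)
            window = [(h + t) % m for t in range(arity)]
            o_h = random.randint(0, 1)
            constraints.append((func, window, o_h))
        return constraints

    def satisfies(assignment, constraints):
        # Check a complete 0/1 assignment (list of length m) against every constraint.
        for func, window, o_h in constraints:
            index = 0
            for v in window:
                index = (index << 1) | assignment[v]
            if func[index] != o_h:
                return False
        return True

With $f$ and $g$ chosen to be balanced, most copies yield no inference until nearly all of their inputs are assigned, which is what makes these instances hard.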

At this stage of our SBSAT implementation, lemmas are handled in a rather primitive manner, so we observe an unusually low number of backtracks per second. All experiments were run on a single-processor 2 GHz Pentium 4 with 2 GB of RAM.

Our first set of results, shown in Table [*], is for the problem of verifying a long multiplier. The circuit definition is available from Carnegie Mellon University. All inputs are unsatisfiable. The left column of the table shows the number of time steps involved in the verification of each benchmark (see Section [*]). Experiments were run from 4 time steps to 70 time steps. The next three columns present the performance of SBSAT on three-address inputs as the total number of choice points, the total time, and the search time. The following three columns present the same information when the translated CNF formulas are input instead (see Section [*]). The next two columns present the performance of zChaff in choice points and total time, and the last two columns present the total times of Siege and Berkmin on the CNF versions we generated.


Table: SBSAT, zChaff, Siege, Berkmin times on the Long Multiplier benchmarks
  SBSAT on Three-Address SBSAT on CNF zChaff on CNF Siege Berkmin
#time steps   choices  total(sec)  branch(sec)   choices  total(sec)  branch(sec)   choices  total(sec)   total(sec)   total(sec)
4 720 2.3 0.16 687 1.47 0.86 1041 0.45 0.2 0.27
8 10000 14.78 7.12 13110 41.02 39.28 33272 50.37 12.73 18.9
12 19398 42.14 27.31 31963 167.8 163.8 122522 357.1 71.61 96.9
16 17508 61.05 38.89 32969 247.3 240.4 125026 366.7 177.4 200.6
20 14077 72.63 41.65 34426 347.0 335.1 164373 585.9 165.2 178.8
24 17775 118.5 77.03 23854 270.0 252.0 214263 790.3 542.8 312.2
28 18872 134.1 81.71 23847 319.0 293.8 220045 888.2 805.4 255.0
32 18538 155.6 90.5 16718 262.9 228.3 216916 882.8 1035 334.6
36 20356 186.8 109.9 14750 278.0 233.5 269856 1055 576.8 420.4
40 19141 203.3 113.5 11703 281.1 225.0 289687 1103 845.3 442.6
50 21867 263.0 134.4 11306 378.6 286.9 472053 2032 1552 466.9
60 24985 434.1 239.4 10844 450.2 313.6 461867 2183 3340 709.2
70 26907 618.4 335.9 11270 632.8 164.3 850942 5875 2860 844.7


Observe that SBSAT working in the user domain on three-address code shows a slight advantage over working with the CNF translation. It is interesting that, in the case of CNF inputs, more preprocessing seems to result in less searching. The fact that preprocessing varies so much from benchmark to benchmark on CNF inputs may reflect the imprecision of the guesses made when trying to recreate domain-specific information from the given CNF formulas. Such preprocessing fluctuations are not as pronounced when three-address code is input to SBSAT.

Observe that zChaff and Siege cannot compete with SBSAT on the long multiplier benchmarks. The problem appears to be that they encounter many more choice points during search. Berkmin visits only about an order of magnitude more choice points than SBSAT on CNF inputs, but because of the slower implementation of lemmas in SBSAT, Berkmin ends up, in general, only a fraction slower than SBSAT. The difference in choice points suggests that SBSAT's success here is due to the complex search heuristic it uses natively.

Table [*] shows timings for the set of barrel benchmarks. The three-address code equivalents were generated by applying the bmc tool to the output of the genbarrel utility in the bmc suite. All inputs are unsatisfiable. Runs were cut off if not completed within 3600 seconds; a cut-off run is shown as a dash (--) in the corresponding table entry.

Observe that in all cases SBSAT solved the problems constructed from the three-address code without any search. This raises the question of whether a BDD tool might do as well. It appears not: SBSAT builds a collection of BDDs of about 10 variables each and then strengthens them against each other, and the inferences resulting from this process are enough to generate a contradiction before search is applied. We suppose a BDD tool would either have attempted to build a single BDD from the three-address code, in which case it would have been forced to give up because of unmanageable size, or it would have combined the BDDs with the conjoin operation instead of the strengthening operation, probably again taking too much space. Although the time SBSAT spends in preprocessing is considerable, it is well spent: SBSAT, zChaff, Siege, and Berkmin all have difficulty with the larger CNF versions of the barrel benchmarks. Thus, staying closer to the user domain and preprocessing to reveal inferences early has paid off on these benchmarks.
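The Python sketch below illustrates one common reading of strengthening, using explicit truth tables as a stand-in for BDDs: pruning from one constraint every assignment that cannot be extended to satisfy another, then collecting variables that are forced afterwards. The data layout and function names are ours and only convey the concept; they are not SBSAT's implementation.

    def strengthen(b1, b2):
        # Keep only those satisfying assignments of b1 (over b1's variables) that
        # can be extended to satisfy b2: conjoin b1 with b2 after existentially
        # quantifying away the variables of b2 that b1 does not mention.
        shared = sorted(set(b1["vars"]) & set(b2["vars"]))
        ok = {tuple(a[b2["vars"].index(v)] for v in shared) for a in b2["sat"]}
        kept = [a for a in b1["sat"]
                if tuple(a[b1["vars"].index(v)] for v in shared) in ok]
        return {"vars": b1["vars"], "sat": kept}

    def inferences(b):
        # A variable that takes the same value in every remaining satisfying
        # assignment is forced; if no assignments remain, a contradiction is found.
        return {v: next(iter({a[i] for a in b["sat"]}))
                for i, v in enumerate(b["vars"])
                if len({a[i] for a in b["sat"]}) == 1}

    # Example: b1 encodes x1 XOR x2, b2 encodes x2 AND x3.
    b1 = {"vars": [1, 2], "sat": [(0, 1), (1, 0)]}
    b2 = {"vars": [2, 3], "sat": [(1, 1)]}
    print(inferences(strengthen(b1, b2)))   # {1: 0, 2: 1}: x2 forced to 1, hence x1 forced to 0

Applying such pruning repeatedly across many small constraints can propagate enough forced values to expose a contradiction before any branching, which matches the zero choice-point counts for the three-address barrel inputs.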


Table: SBSAT, zChaff, Siege, Berkmin times on the Barrel benchmarks
  SBSAT on Three-Address SBSAT on CNF zChaff on CNF Siege Berkmin
Name   choices  total(sec)  branch(sec)   choices  total(sec)  branch(sec)   choices  total(sec)   total(sec)   total(sec)
barrel2 0 0.00 0.00 3 0.05 0.00 3 0.00 0.01 0.0
barrel3 0 0.11 0.00 13 0.08 0.00 48 0.00 0.01 0.0
barrel4 0 0.12 0.00 33 0.15 0.01 201 0.02 0.01 0.01
barrel5 0 0.72 0.00 354 0.66 0.21 8856 0.58 0.67 0.65
barrel6 0 1.48 0.00 1205 2.89 1.96 28110 2.81 5.97 5.56
barrel7 0 2.84 0.00 2848 11.10 8.51 66959 11.37 21.19 29.96
barrel8 0 5.05 0.00 4304 25.15 18.71 116858 31.98 136.7 298.3
barrel9 0 67.87 0.00 -- -- -- 649532 254.6 41.24 89.27
barrel10 0 108.9 0.00 -- -- -- 1801476 1191 86.34 184.0
barrel11 0 166.2 0.00 -- -- -- -- -- 134.7 238.3
barrel12 0 243.8 0.00 -- -- -- -- -- 927.1 999.3
barrel13 0 348.4 0.00 -- -- -- -- -- 629.9 1049
barrel14 0 481.9 0.00 -- -- -- -- -- 2122 3389
barrel15 0 655.9 0.00 -- -- -- -- -- -- --
barrel16 0 859.7 0.00 -- -- -- -- -- -- --


Tables [*] and [*] show timings for a set of queue benchmarks and permute benchmarks generated by genqueue and genpermute, respectively, from the bmc suite. The cutoff was 3600 seconds for the queue benchmarks and 60000 seconds for the permute benchmarks. All inputs are unsatisfiable. The pattern is similar to the previous sets of runs: when SBSAT works with three-address code, timings are much better than when the equivalent CNF inputs are used, and SBSAT on three-address code obtains results faster than the other solvers do on the equivalent CNF inputs.


Table: SBSAT, zChaff, Siege, Berkmin times on the Queue benchmarks
  SBSAT on Three-Address SBSAT on CNF zChaff on CNF Siege Berkmin
Name   choices  total(sec)  branch(sec)   choices  total(sec)  branch(sec)   choices  total(sec)   total(sec)   total(sec)
queue4 41 0.1 0.0 19 0.11 0.00 32 0.00 0.01 0.0
queue8 651 3.04 0.07 291 0.49 0.10 561 0.05 0.04 0.05
queue12 4351 5.53 1.02 3875 5.52 4.38 11752 3.09 1.04 0.96
queue16 30835 22.3 14.7 41029 107 104 73407 62.22 30.27 32.38
queue20 311127 265 227 565559 2420 2412 698914 1874 400.4 401.0
queue22 1052750 843 798 2016859 9367 9356 -- -- 1886 1050
queue24 3262464 2666 2613 -- -- -- -- -- -- 2724



Table: SBSAT, zChaff, Siege, Berkmin times on the Permute benchmarks
  SBSAT on Three-Address SBSAT on CNF zChaff on CNF Siege Berkmin
Name   choices  total(sec)  branch(sec)   choices  total(sec)  branch(sec)   choices  total(sec)   total(sec)   total(sec)
permute2 0 0.01 0.00 1 0.05 0.00 1 0.00 0.01 0.00
permute3 5 0.04 0.00 14 0.07 0.00 11 0.00 0.01 0.00
permute4 68 0.65 0.00 47 0.11 0.00 52 0.00 0.01 0.01
permute5 174 10.1 0.01 304 0.27 0.10 199 0.02 0.02 0.03
permute6 893 11.46 0.09 1655 1.44 1.15 2021 0.28 0.17 0.16
permute7 5537 23.24 0.81 8551 9.21 8.77 16485 9.51 2.88 1.12
permute8 64607 71.16 70.21 58051 244.1 243.2 110492 172.93 21.6 15.7
permute9 454726 686.6 685.0 471422 2575 2573 361422 1018 315 228
permute10 1311291 2064 2062 -- -- -- 2118409 12101 3003 3891
permute11 20462503 39260 39257 -- -- -- -- -- -- --


The story changes on the queue invariant benchmarks of Table [*]. In this case, SBSAT ran into memory problems: to fit the resulting SMURFs into memory, the BDDs on which they were based had to be kept so small that we changed their maximum size manually, that is, after preprocessing. The result was that an unexpectedly large amount of domain-specific information was garbled, and the results were so dismal that we did not feel they were worth reporting. Although SBSAT did solve the CNF versions of these problems, the other solvers performed better, as in the previous benchmark sets.


Table: SBSAT, zChaff, Siege, Berkmin times on the Queue Invariant benchmarks
  SBSAT on CNF zChaff on CNF Siege Berkmin
Name   choices  total(sec)  branch(sec)   choices  total(sec)   total(sec)   total(sec)
queueinv4 83 0.08 0.01 136 0.00 0.01 0.01
queueinv8 438 0.17 0.07 1122 0.04 0.06 0.06
queueinv12 1429 0.98 0.42 4368 0.22 0.31 0.12
queueinv16 2411 1.04 0.75 7721 0.27 0.53 0.24
queueinv20 4787 6.81 2.91 16258 1.63 0.73 0.81
queueinv24 7379 13.62 6.00 26995 2.96 1.89 1.90
queueinv28 10914 25.61 11.23 38145 5.69 3.88 3.40
queueinv32 15403 16.73 14.56 68641 3.20 3.74 4.20
queueinv36 21324 116.6 35.01 103281 23.58 9.59 10.33
queueinv40 27404 189.2 52.54 145691 38.08 17.62 16.46
queueinv44 35820 309.2 88.65 166634 46.42 57.38 25.16
queueinv48 44719 476.2 135.8 217615 79.95 62.0 43.61
queueinv52 52320 683.8 189.9 297830 179.2 155.5 55.93
queueinv56 51768 928.0 238.9 397142 239.1 514.9 82.13


For completeness, we include results on the dlx suite available from Carnegie Mellon University in Table [*]. Some inputs are satisfiable and some are unsatisfiable. We applied SBSAT to two variants of each benchmark, the Trace and CNF formats (both are available). All problems in this suite are easy for all the solvers, and that is about all that can be said about them. We did not include results for the dlx9 benchmarks because SBSAT had memory problems on them.


Table: SBSAT, zChaff, Siege, Berkmin times on the DLX benchmarks. Benchmarks with ``bug'' in the name are satisfiable (verified); the rest are unsatisfiable.
  SBSAT on Trace SBSAT on CNF zChaff on CNF Siege Berkmin
Name   choices  total(sec)  branch(sec)   choices  total(sec)  branch(sec)   choices  total(sec)   total(sec)   total(sec)
dlx1_c 525 0.12 0.02 592 0.12 0.03 1082 0.02 0.01 0.01
dlx2_aa 1755 0.22 0.06 2062 0.26 0.08 5224 0.10 0.06 0.02
dlx2_ca 7247 1.49 1.00 6861 1.60 0.91 9800 0.30 0.17 0.12
dlx2_cc 9655 2.60 2.03 9631 2.83 1.97 17825 0.95 0.36 0.26
dlx2_cl 9375 2.14 1.56 8872 2.33 0.57 25390 1.50 0.71 0.29
dlx2_cs 8489 1.84 1.31 7916 2.15 1.37 16310 0.77 0.20 0.23
dlx2_la 6233 1.06 0.64 6814 1.41 0.84 9246 0.26 0.11 0.10
dlx2_sa 2938 0.35 0.16 2168 0.38 0.15 5563 0.14 0.08 0.03
dlx2_cc_bug01 6603 1.77 1.20 6448 2.11 1.25 14471 0.84 0.18 0.28
dlx2_cc_bug02 6584 1.80 1.22 6432 2.09 1.25 13717 0.79 0.48 0.26
dlx2_cc_bug03 6861 1.81 1.23 6628 2.09 1.23 22776 1.05 0.01 0.10
dlx2_cc_bug04 6932 1.92 1.33 6699 1.12 1.28 12860 0.52 0.08 0.08
dlx2_cc_bug05 3743 1.24 0.65 3413 1.47 0.62 376 0.01 0.22 0.13
dlx2_cc_bug06 3630 1.19 0.60 3581 1.52 0.67 374 0.01 0.01 0.10
dlx2_cc_bug07 4601 1.36 0.77 3567 1.50 0.65 316 0.01 0.03 0.05
dlx2_cc_bug08 5964 1.65 1.06 5353 1.75 0.92 747 0.02 0.01 0.04
dlx2_cc_bug09 2549 0.92 0.42 2693 1.18 0.33 321 0.01 0.01 0.02
dlx2_cc_bug10 3423 1.15 0.55 3564 1.41 0.56 259 0.00 0.02 0.02
dlx2_cc_bug11 6037 1.60 1.03 6886 2.20 1.35 10528 0.43 0.02 0.06
dlx2_cc_bug12 7099 2.00 1.43 5702 1.91 1.05 11099 0.44 0.07 0.10
dlx2_cc_bug13 5998 1.69 1.12 6133 1.91 1.08 12049 0.50 0.03 0.02
dlx2_cc_bug14 253 0.59 0.01 298 0.87 0.01 234 0.01 0.12 0.02
dlx2_cc_bug15 4405 1.93 1.27 3756 1.99 0.99 296 0.01 0.01 0.06
dlx2_cc_bug16 252 0.58 0.01 297 0.86 0.01 233 0.01 0.13 0.01
dlx2_cc_bug17 504 1.16 0.06 4453 2.97 1.01 5806 0.40 0.01 0.01
dlx2_cc_bug18 1066 1.06 0.10 3236 2.51 0.78 337 0.01 0.01 0.02
dlx2_cc_bug19 269 0.63 0.02 302 0.89 0.02 4452 0.15 0.01 0.00
dlx2_cc_bug20 703 0.60 0.03 777 0.89 0.50 521 0.01 0.01 0.02
dlx2_cc_bug21 331 0.59 0.02 360 0.85 0.02 458 0.01 0.01 0.01
dlx2_cc_bug22 744 0.62 0.40 865 0.91 0.05 4456 0.19 0.01 0.04
dlx2_cc_bug23 620 0.60 0.03 323 0.86 0.02 4726 0.14 0.01 0.10
dlx2_cc_bug24 270 0.59 0.02 313 0.86 0.02 4034 0.14 0.01 0.04
dlx2_cc_bug25 3931 1.32 0.75 3233 1.44 0.59 4406 0.14 0.01 0.02
dlx2_cc_bug26 4200 1.42 0.83 3687 1.58 0.48 543 0.02 0.02 0.02
dlx2_cc_bug27 591 0.52 0.02 2979 1.16 0.46 293 0.01 0.03 0.00
dlx2_cc_bug28 2205 0.88 0.22 5275 2.05 1.09 339 0.01 0.08 0.01
dlx2_cc_bug29 324 0.58 0.01 334 0.87 0.02 243 0.01 0.19 0.03
dlx2_cc_bug30 311 0.60 0.02 267 0.89 0.02 323 0.01 0.30 0.02
dlx2_cc_bug31 294 0.58 0.02 325 0.88 0.02 247 0.00 0.24 0.02
dlx2_cc_bug32 278 0.59 0.02 317 0.86 0.02 242 0.00 0.02 0.01
dlx2_cc_bug33 299 0.58 0.02 305 0.88 0.02 272 0.01 0.19 0.06
dlx2_cc_bug34 329 0.60 0.02 506 0.86 0.03 298 0.01 0.30 0.02
dlx2_cc_bug35 282 0.59 0.02 328 0.89 0.02 318 0.01 0.32 0.03
dlx2_cc_bug36 279 0.61 0.02 325 0.86 0.02 316 0.01 0.08 0.07
dlx2_cc_bug37 3643 1.28 0.71 3214 1.45 0.60 329 0.01 0.05 0.01
dlx2_cc_bug38 6249 1.70 0.43 5854 1.93 1.09 9500 0.36 0.44 0.07
dlx2_cc_bug39 3307 1.07 0.54 6058 1.88 1.04 12314 0.50 0.04 0.40
dlx2_cc_bug40 8046 2.21 1.64 6748 2.26 1.40 9972 0.41 0.12 0.02


Finally, Table [*] shows the result of applying all the solvers to a family of slider problems, some satisfiable and some unsatisfiable, based on the following:

sliderxx_sat:

\begin{eqnarray*}
&& f = (x_1 \oplus (\neg x_{i_3}\wedge x_{i_1}) \oplus \neg(x_{m/2} \ldots \\
&& \ldots \wedge x_{j_4}) \oplus x_{j_3}) \oplus (x_{m/2}\equiv x_{j_1})
\end{eqnarray*}

$f$:
$m$ $i_1$ $i_2$ $i_3$ $i_4$
60 13 15 17 24
70 12 15 17 24
80 15 17 33 24
90 15 17 24 33
100 15 17 24 43
110 15 17 24 43
120 15 24 43 57
  $g$:
$m$ $j_1$ $j_2$ $j_3$ $j_4$
60 12 16 18 27
70 12 15 19 27
80 12 16 18 27
90 12 16 18 27
100 18 26 27 42
110 20 26 27 42
120 6 18 27 42

sliderxx_unsat:

\begin{eqnarray*}
&& f = (x_1 \oplus (\neg x_{i_3}\wedge x_{i_1}) \oplus \neg(x_{m/2} \ldots \\
&& \ldots \wedge x_{j_4}) \oplus x_{j_3}) \oplus (x_{j_5}\equiv x_{j_1})))
\end{eqnarray*}

$f$:
$m$ $i_1$ $i_2$ $i_3$ $i_4$
60 13 15 17 24
70 12 15 17 24
80 15 17 33 24
90 15 17 24 33
100 15 17 24 43
110 15 17 24 43
120 15 24 43 57
  $g$:
$m$ $j_1$ $j_2$ $j_3$ $j_4$ $j_5$
60 12 16 18 19 27
70 12 16 18 19 27
80 12 16 18 27 29
90 12 16 18 27 29
100 18 19 26 27 42
110 18 29 26 27 42
120 6 18 27 29 42

If ``unsat'' is in the name of a benchmark then it is unsatisfiable; otherwise it is satisfiable. The number in the name of each benchmark is the value of $m$. The value of $k$ is fixed at 6 for all benchmarks and the value of $l$ is 6 or 7 (see the beginning of this section for an explanation of this family of benchmarks and the meaning of $m$, $k$, and $l$). The two functions were chosen to yield somewhat balanced BDDs, requiring nearly all inputs to have a value before an inference can be established. These are hard problems, and only zChaff was able to approach the runtimes of SBSAT. Table [*] shows why these problems are hard: we turned off lemmas in SBSAT and reran all the benchmarks, and the number of choice points generated did not change very much. Thus, for these problems, learning from conflict analysis during search seems to help little. Notice also that SBSAT's running time improves by an order of magnitude when lemmas are turned off. This clearly points to adjustments that must be made to lemma handling.
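To see why balanced functions delay inferences, consider a parity constraint as an extreme stand-in for the $f$ and $g$ above (the example and names are ours, purely illustrative): no input is forced until all but one input has been assigned. The small Python check below enumerates the satisfying completions of a partial assignment and reports any forced free inputs.

    from itertools import product

    def forced_values(func, arity, partial):
        # Inputs forced by the constraint func(x) = True under a partial
        # assignment (dict from input position to 0/1): a free input is forced
        # if it takes the same value in every satisfying completion.
        free = [i for i in range(arity) if i not in partial]
        completions = []
        for bits in product((0, 1), repeat=len(free)):
            a = {**partial, **dict(zip(free, bits))}
            if func(tuple(a[i] for i in range(arity))):
                completions.append(a)
        return {i: completions[0][i] for i in free
                if completions and len({c[i] for c in completions}) == 1}

    parity = lambda xs: sum(xs) % 2 == 1   # a perfectly balanced function
    print(forced_values(parity, 6, {0: 1, 1: 0, 2: 1}))                  # {} : no inference yet
    print(forced_values(parity, 6, {0: 1, 1: 0, 2: 1, 3: 0, 4: 0}))      # {5: 1} : forced only at the end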


Table: SBSAT, zChaff, Siege, Berkmin times on the Slider benchmarks
  SBSAT zChaff Siege Berkmin
Name   choices  total(sec)  branch(sec)   choices  total(sec)   choices  total(sec)   choices  total(sec)
slider60_sat 1051 0.25 0.10 534 0.02 2900 0.16 2114 0.09
slider70_sat 622 0.27 0.06 1511 0.07 329 0.01 425 0.01
slider80_sat 79884 39.4 39.2 149153 52.8 38044 6.20 209805 73.5
slider90_sat 2765 0.64 0.44 66152 14.3 47180 9.56 41372 8.63
slider100_sat 36761 15.4 15.9 104054 85.5 70693 35.8 120468 48.2
slider110_sat 171163 113.4 113.2 280126 173.3 576670 437.4 1909731 801.4
slider60_unsat 9227 1.49 1.27 27414 3.4 19505 2.63 25251 4.37
slider70_unsat 7957 1.46 1.29 18157 1.93 17735 2.21 17543 2.17
slider80_unsat 148242 78.8 78.6 245112 116.6 215436 104.4 -- --
slider90_unsat 429468 263.4 263.0 685026 513.4 501539 302.5 -- --
slider100_unsat 1600514 1066 1065 1495633 3094 2482913 6540 -- --



Table: SBSAT times and choice points on the Slider benchmarks, with and without Lemmas.
  SBSAT with Lemmas SBSAT without Lemmas
Name   choices  total(sec)  branch(sec)   choices  total(sec)  branch(sec)
slider60_sat 1051 0.25 0.10 1152 0.14 0.04
slider70_sat 622 0.27 0.06 1265 0.22 0.05
slider80_sat 79884 39.4 39.2 111575 5.22 5.12
slider90_sat 2765 0.64 0.44 3576 0.30 0.18
slider100_sat 36761 15.4 15.9 51994 2.83 2.69
slider110_sat 171163 113.4 113.2 282213 16.0 15.8
slider120_sat -- -- -- 1539977 86.3 86.1
slider60_unsat 9227 1.49 1.27 10004 0.46 0.37
slider70_unsat 7957 1.46 1.29 9373 0.50 0.39
slider80_unsat 148242 78.8 78.6 190177 8.67 8.57
slider90_unsat 429468 263.4 263.0 626812 29.8 29.7
slider100_unsat 1600514 1066 1065 2403878 124.2 124.1
slider110_unsat -- -- -- 10256075 564.7 564.5


