The Satisfiability problem (SAT) is to determine whether there exists
an assignment of values to a collection of Boolean variables that
causes a propositional formula expressed in those variables to
evaluate to true. The problem is at the core of many other
problems - so much so that it became the first problem to be shown to
be NP-complete via a general translation that can be applied to just
about all interesting combinatorial problems. The NP-completeness of
SAT means there is no known algorithm that is guaranteed to run
efficiently on all instances of SAT. The Davis Putnam Loveland
Logemann (DPLL) search algorithm for solving instances of SAT that are
expressed in Conjunctive Normal Form (CNF - a canonical representation
of propositional formulas) was discovered in 1960 and remains the
backbone of currently successful Satisfiability solvers. In the late
1970s Martin Davis, then a professor at New York University, asked a
student, Allan Goldberg, to investigate the average running time of
DPLL. The student surprised the world by showing that if all inputs
are distributed uniformly, DPLL runs efficiently on the average.
I analyzed the distribution that Goldberg used and found that it,
rather than DPLL, was responsible for this amazing result:
specifically, I showed that any random assignment is a solution
to a randomly generated input with probability tending to 1 as the
number of variables increases. I proposed an alternative
distribution, now known as the constant-width distribution. The
constant-width distribution comes with a parameter
called density. I showed that the DPLL variant considered by
Goldberg performs inefficiently on random inputs generated according
to the constant-width distribution, regardless of density.
With collaborators, I further showed that random constant-width
formulas nearly never have a true assignment if the density is
greater than a constant times 2k, where k
is the disjunction size. I introduced non-backtracking variants of
DPLL which are now known in the literature as UC, GUC, and UC with
majority rule and showed that each of these performed efficiently on
almost all random instances when density is less than a constant times
2k/k - the constant depending on the variant.
We applied the techniques used to obtain the above results to estimate
the impact of certain classes of SAT that are known to be solved
efficiently without search. There are many such classes but some of
the best known are 2-SAT, Horn, q-Horn, Extended Horn, Hidden Horn,
CC-balanced, Linear Autarkies, and Matched formulas. The q-Horn class
is especially interesting because, before our analysis, it had been
thought to be the largest succinctly expressible, efficiently solved
class of CNF formulas. We showed that some of these classes,
including q-Horn, represent a substantial fraction of input formulas
only if density is less than a constant times k and others only
if density is less than a constant. None of the classes is
substantially large up to densities on the order of even
2k/k. We introduced another efficiently
solved class called SLUR (for Single Lookahead Unit Resolution) and
showed it subsumes most of the classes above. However, SLUR is
no larger in the probabilistic sense than the trivial Matched
Our probabilistic studies have provided insights which have had some
influence on SAT solver design and, more importantly, have helped to
understand the boundaries of efficiently solved problems: the
structures that allow such problems to be solved efficiently, and the
structures that prevent formulas from being members of efficiently
solved classes. The constant-width distribution remains to this day
the most important distribution for analysis and interpreting SAT
algorithmic behavior and has been used effectively by physicists to
help understand what happens to solution structure as density
Our more recent research attempts to transfer some of the knowledge
gained from our previous research and some new problem solving ideas
to the real world. Doing so required building our own testing tool
which became the SBSAT solver mentioned earlier. SBSAT is unique in
several ways and I mention the two most interesting ways here. First,
SBSAT is non-clausal. All other modern SAT solvers depend on CNF
inputs but few real problems are naturally expressed that way. Thus,
problems must be translated to CNF from some other language, for
example temporal logic, and the translation tends to garble critical
domain-specific information that would otherwise be useful in
directing a search for a solution. SBSAT bypasses the translation
step. It is designed to memoize domain-specific information in
preprocessing and use it to guide the search. SBSAT maintains a
collection of state machines, the nodes and edges of which retain data
such as how many inferences might be generated deeper in the search if
a particular unassigned variable is assigned a particular value.
Thus, SBSAT uses a new form of look-ahead, which we call
function-complete look-ahead. This is the second unique feature of
SBSAT. Other solvers perform either depth-first look-ahead, also
known as restarts, or breadth-first look-ahead, such as in Prover.
Since, as was mentioned earlier, many problems are hard because they
do not reveal inferences until deep in the search, the
function-complete look-ahead approach appears to have an enhanced
potential for discovery of at least some of these hard-to-get
inferences. SBSAT remains under development and currently serves as a
testbed for several new approaches to solving difficult instances of
||Groundwater Remediation Software
The problem of cleaning up (alternatively, remediating) a contaminated
landfill has become a national priority only recently. This is partly
because, until the 1980s, it was not generally realized that
groundwater, the water flowing beneath the surface of the earth in
aquifers, can become polluted to the point of being unhealthy. Since
about 50% of the drinking water in the United States comes from
aquifers, remediation of contaminated sites drained by groundwater has
received urgent attention.
When contaminated sites are drained by aquifers, remediation plans
must include an effective means to intercept and extract a sufficient
volume of contaminant from groundwater systems near the site to
prevent unsatisfactory concentrations from accumulating at other sites
served by these systems. The kinds of interception equipment
available include injection wells, extractions wells, and trenches.
Injection wells pump water into the ground and are used to control
locally the direction of groundwater and therefore the flow of
contaminant. Extraction wells are used to pump contaminated water out
of the ground. Trenches provide a passive means for contaminant
extraction and may be an attractive alternative to extraction wells in
The optimal placement of these remediation objects is a complex
function of many variables such as local knowledge, capital and
operating costs of the remediation objects, availability of
remediation object resources, what are considered safe levels of
contaminant concentration, what can or must be done with the recovered
contaminated water, and several others. This level of complexity
requires some mechanical computational assistance. However, although
the flow of contaminant through a groundwater system remediated by a
particular extraction system can be simulated by a computer, there was
no computer aid which finds an optimal or near optimal recovery plan:
that is, a plan which recovers sufficient contaminant at least cost.
Some of the features of our software are:
It was available on DOS machines and MacIntosh computers.
It could be used in the classroom to assist in the teaching of
It could mix approximation algorithms for optimization with simulation
algorithms for validation. As such, it could be a research platform for
experimenting with a variety of optimization techniques.
It could provide several levels of simulation and approximation accuracy:
the final remediation plan for a site could be determined by the process of
stepwise-refinement through several levels.
It could provide several levels of output detail. For example, one
may view the status of the entire site at one particular time in order
to understand global, spatial relationships, or view specific
information at a point on the site in order to understand time
dependent relationships between measurable attributes. Windows and
dialog boxes provide the output. These may be opened and closed
easily when needed.
Many of the attributes of the simulation could be represented visually.
A model of a site could be represented on the computer screen. Site features
such as roads, buildings, and rivers could be visible or invisible as needed.
Soil properties such as permeability could be represented visually.
Quantities such as Head, and Concentration could also be represented visually
using color shading. Remediation objects could be moved around the screen
under mouse control and automatically by the results of intrinsic
approximation algorithms; in this way, partial results could be observed
rapidly, possibly leading to abandonment of some alternatives in favor of
others. Here is a screenshot:
It could solve a variety of subsurface transport problems using a
variety of treatment technologies. Thus, the cost effectiveness of
various methods of treatment could be compared.
The most important feature of the program was the ability to integrate
different kinds of information at different levels of detail. The
user was able to concentrate on site visualization and specific
quantities at the same time, as needed. For example, a user could
locate a test-point on the visualized landscape and pop open a graph
showing the history of an attribute at that point. Then the user
could explore that graph with the mouse to find exact values at
particular graph points.
I am working closely with colleagues at Ft. George Meade, Maryland,
until late March, 2010 on a number of projects including improving SBSAT.
||SIAM Monograph on Satisfiability
I am writing a monograph for the Society of Industrial and Applied
Mathematics entitled Analysis of Algorithms for Monotonic and
Non-monotonic Reasoning, with John Schlipf of the University of
Cincinnati. Expected completion is late 2010.
||Non-clausal Satisfiability Solver Competition
I am co-organizing a non-clausal Satisfiability solver competition
with Prof. E. Clarke of Carnegie Mellon University, Prof. A. Biere of
the Johannes Kepler University, and S. Weaver of the U.S. DoD.
||Special issue of AMAI
I am guest editing a special issue of Annals of Mathematics and
Artificial Intelligence on Constraints in Formal Verification with
Miroslav Velev of Aires Design Automation.
||11th ISAIM conference
I am program co-chair, with Raj Bhatnagar of the University of
Cincinnati, of the 11th International Symposium on Mathematics
and Artifical Intelligence, to be held in Ft. Lauderdale, Florida,
January 6-8, 2010.
||6th CFV workshop
I am a member of the technical program committee of the 6th workshop
on constraints in formal verification, to be held in Grenoble, France,
on June 26, 2009. The workshop is organized by Miroslav Velev.
• to the workshop website
||ACM Regional Programming Contest
Our department has hosted the annual East Central, North America
region International Collegiate Programming Contest since 2000.
I have been the site director all those years and have been ably
assisted by a variety of students helpers.
• to the contest website
Summer camp lasts for two weeks. During that time high school
students are exposed to basic computer science principles through
programming of computers and robots.
• to the camp website
||Difficult search problems
Discovering reasons for the difficulty of search algorithms in solving
some well-known and important combinatorial problems. Specifically,
we are interested in classes of Satisfiability problems (that is,
determining the solution space of propositional logic formulas).
Numerous problems, especially those arising in the area of Information
Assurance, may be cast as Satisfiability problems so success on
Satisfiability problems translates immediately to success for many
other problems. In the past, research into this question has examined
input structure for exploitable properties: it has been found that,
during search, some input structures may reveal inferences that can be
used to prune the search space considerably, perhaps making an
otherwise intractable problem trivial. The structure of problems
resistant to this approach generally do not provide sufficient
information to generate such inferences until very deep in the search
space, when it is too late to do much about it. Our research
considers guessing inferences from looking at the structure
of solutions to solvable versions of the same problem and
instantiating those inferences before search. Doing so has
allowed us to compute van der Waerden number W(2,6), the first
van der Waerden number to be found in 40 years. The next step is to
apply these techniques to problems of practical interest, such as
Bounded Model Checking.
I am involved in the continued development of a non-clausal
Satisfiability solver which we call SBSAT. SBSAT has been under
development in cooperation with the U.S. DoD. We have demonstrated
that this solver can outperform others on targeted classes of
problems. SBSAT is now a prototype, intended primarily for
proof-of-concept, and does not have most features commonly found in
production quality solvers. We are enhancing SBSAT by adding such
features, speeding up the code, and integrating with complementary
techniques such as algebraic methods.
• to the sbsat website
||Solving difficult combinatorial problems
We wish to apply SBSAT to a variety of combinatorial problems that are
known to be difficult. We hope to gather such a collection of
benchmarks from government and industry. These will be used to
improve the performance of SBSAT, eventually leading to a practical,
general purpose problem solving tool.
||Understanding the nature of hard problems
Study of the difficult benchmarks mentioned above will lead to
insights into the nature of hard problems and will help us to discover
the means to tame some of them.