The Satisfiability problem (SAT) is to determine whether there exists an assignment of values to a collection of Boolean variables that causes a propositional formula expressed in those variables to evaluate to true. The problem is at the core of many other problems: it was the first problem shown to be NP-complete, via a general translation that can be applied to just about every interesting combinatorial problem. The NP-completeness of SAT means there is no known algorithm that is guaranteed to run efficiently on all instances of SAT. The Davis-Putnam-Logemann-Loveland (DPLL) search algorithm, which solves instances of SAT expressed in Conjunctive Normal Form (CNF, a canonical representation of propositional formulas), dates to the early 1960s and remains the backbone of today's successful Satisfiability solvers. In the late 1970s Martin Davis, then a professor at New York University, asked a student, Allan Goldberg, to investigate the average running time of DPLL. Goldberg surprised the community by showing that, when inputs are distributed uniformly, DPLL runs efficiently on average.
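For reference, the DPLL scheme just mentioned amounts to unit propagation plus branching on an unassigned variable. The following is a minimal sketch in Python, with an illustrative clause representation (sets of signed integers); it is not drawn from any particular solver.

```python
# Minimal DPLL sketch: a CNF formula is a list of clauses, each clause a
# set of nonzero integers; literal -v denotes the negation of variable v.
# Illustrative skeleton only, not any particular solver's code.

def unit_propagate(clauses, assignment):
    """Repeatedly assign literals forced by unit clauses; None on conflict."""
    changed = True
    while changed:
        changed = False
        for clause in clauses:
            if any(l in assignment for l in clause):
                continue                          # clause already satisfied
            unassigned = [l for l in clause
                          if l not in assignment and -l not in assignment]
            if not unassigned:
                return None                       # clause falsified: conflict
            if len(unassigned) == 1:
                assignment.add(unassigned[0])     # unit clause forces a literal
                changed = True
    return assignment

def dpll(clauses, assignment=frozenset()):
    """Return a satisfying set of literals, or None if unsatisfiable."""
    assignment = unit_propagate(clauses, set(assignment))
    if assignment is None:
        return None
    free = {abs(l) for c in clauses for l in c} - {abs(l) for l in assignment}
    if not free:
        return assignment
    v = next(iter(free))
    for literal in (v, -v):                       # branch on both truth values
        result = dpll(clauses, assignment | {literal})
        if result is not None:
            return result
    return None

# Example: (x1 or x2) and (not x1 or x2) and (not x2 or x3)
print(dpll([{1, 2}, {-1, 2}, {-2, 3}]))
```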
I analyzed the distribution that Goldberg used and found that it, rather than DPLL, was responsible for this surprising result: specifically, I showed that a random assignment satisfies a randomly generated input with probability tending to 1 as the number of variables increases. I proposed an alternative distribution, now known as the constant-width distribution, which comes with a parameter called density. I showed that the DPLL variant considered by Goldberg performs inefficiently on random inputs generated according to the constant-width distribution, regardless of density.
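For concreteness, here is a small sketch of how an instance from the constant-width distribution can be generated, assuming the usual convention that every clause has a fixed number k of literals on distinct variables and that density means the ratio of clauses to variables; the function name and the finer details are illustrative only, since conventions vary slightly across papers.

```python
import random

def random_constant_width_formula(n, k, density, rng=random):
    """Sketch of a constant-width random CNF generator.

    n       : number of Boolean variables (1..n)
    k       : width of every clause (disjunction size)
    density : ratio of clauses to variables, so m = round(density * n)
    Each clause picks k distinct variables uniformly and negates each
    independently with probability 1/2.
    """
    m = round(density * n)
    formula = []
    for _ in range(m):
        variables = rng.sample(range(1, n + 1), k)
        clause = {v if rng.random() < 0.5 else -v for v in variables}
        formula.append(clause)
    return formula

# Example: 100 variables, width 3, density 4.0 -> 400 clauses
formula = random_constant_width_formula(100, 3, 4.0)
print(len(formula), formula[0])
```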
With collaborators, I further showed that random constant-width formulas almost never have a satisfying assignment if the density is greater than a constant times 2^k, where k is the disjunction size. I introduced non-backtracking variants of DPLL, now known in the literature as UC, GUC, and UC with majority rule, and showed that each performs efficiently on almost all random instances when the density is less than a constant times 2^k/k, the constant depending on the variant.
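The following rough sketch conveys the flavor of these non-backtracking variants: satisfy a unit clause whenever one exists, otherwise assign a free variable (here, uniformly at random), and give up on conflict rather than backtrack. It is meant only to illustrate the unit-clause idea, not to reproduce the exact UC, GUC, or majority-rule variants that were analyzed.

```python
import random

def uc_heuristic(clauses, n, rng=random):
    """Rough sketch of a non-backtracking unit-clause (UC) style procedure.

    Repeatedly: if some clause has become unit, satisfy it; otherwise
    assign a random value to a random unassigned variable.  Returns a
    set of literals if every clause is satisfied, or None if a clause
    is falsified (the procedure never backtracks).
    """
    assignment = set()
    clauses = [set(c) for c in clauses]
    while True:
        remaining = []
        for c in clauses:
            if c & assignment:
                continue                          # clause satisfied
            reduced = {l for l in c if -l not in assignment}
            if not reduced:
                return None                       # clause falsified: give up
            remaining.append(reduced)
        clauses = remaining
        if not clauses:
            return assignment                     # all clauses satisfied
        unit = next((c for c in clauses if len(c) == 1), None)
        if unit:
            assignment.add(next(iter(unit)))      # forced step
        else:
            free = [v for v in range(1, n + 1)
                    if v not in assignment and -v not in assignment]
            v = rng.choice(free)
            assignment.add(v if rng.random() < 0.5 else -v)

# Example: may succeed or give up, depending on the random choices
print(uc_heuristic([{1, 2}, {-1, 2}, {-2, 3}], 3))
```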
We applied the techniques used to obtain the above results to estimate the significance of certain classes of CNF formulas that are known to be solvable efficiently without search. There are many such classes; some of the best known are 2-SAT, Horn, q-Horn, Extended Horn, Hidden Horn, CC-balanced, Linear Autarkies, and Matched formulas. The q-Horn class is especially interesting because, before our analysis, it had been thought to be the largest succinctly expressible, efficiently solvable class of CNF formulas. We showed that some of these classes, including q-Horn, represent a substantial fraction of input formulas only if the density is less than a constant times k, and others only if the density is less than a constant. None of the classes remains substantially large even at densities on the order of 2^k/k. We introduced another efficiently solvable class called SLUR (for Single Lookahead Unit Resolution) and showed that it subsumes most of the classes above; however, SLUR is no larger in the probabilistic sense than the trivial Matched class.
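As an example of what "solvable efficiently without search" means for one of these classes, Horn formulas (at most one positive literal per clause) can be decided by unit propagation alone; a minimal sketch, using the same signed-integer clause representation as above:

```python
def horn_satisfiable(clauses):
    """Decide satisfiability of a Horn formula (each clause has at most
    one positive literal) by unit propagation alone -- a minimal sketch
    of why such classes need no search.

    Start with all variables false; the only clauses that can fail are
    those whose negative literals are all falsified, and each such clause
    either forces its single positive variable true or, if it has no
    positive literal, witnesses unsatisfiability.
    """
    true_vars = set()
    changed = True
    while changed:
        changed = False
        for clause in clauses:
            positives = [l for l in clause if l > 0]
            if any(p in true_vars for p in positives):
                continue                          # clause already satisfied
            if all(-l in true_vars for l in clause if l < 0):
                if not positives:
                    return False                  # all-negative clause falsified
                true_vars.add(positives[0])       # forced: set the positive var true
                changed = True
    return True

# (x1) and (not x1 or x2) and (not x1 or not x2): forces x1, then x2, then fails
print(horn_satisfiable([{1}, {-1, 2}, {-1, -2}]))   # False
```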
Our probabilistic studies have provided insights that have had some influence on SAT solver design and, more importantly, have helped to delineate the boundaries of efficiently solvable problems: the structures that allow such problems to be solved efficiently, and the structures that prevent formulas from being members of efficiently solvable classes. The constant-width distribution remains to this day the most important distribution for analyzing and interpreting the behavior of SAT algorithms, and it has been used effectively by physicists to help understand what happens to solution structure as density increases.
Our more recent research attempts to transfer some of the knowledge gained from our previous work, along with some new problem-solving ideas, to the real world. Doing so required building our own testing tool, which became the SBSAT solver mentioned earlier. SBSAT is unique in several ways; I mention the two most interesting here. First, SBSAT is non-clausal. All other modern SAT solvers depend on CNF inputs, but few real problems are naturally expressed that way. Problems must therefore be translated to CNF from some other language, for example temporal logic, and the translation tends to garble critical domain-specific information that would otherwise be useful in directing a search for a solution. SBSAT bypasses the translation step. It is designed to memoize domain-specific information in preprocessing and use it to guide the search.

SBSAT maintains a collection of state machines, the nodes and edges of which retain data such as how many inferences might be generated deeper in the search if a particular unassigned variable is assigned a particular value. Thus, SBSAT uses a new form of look-ahead, which we call function-complete look-ahead. This is the second unique feature of SBSAT. Other solvers perform either depth-first look-ahead, also known as restarts, or breadth-first look-ahead, such as in Prover. Since, as was mentioned earlier, many problems are hard because they do not reveal inferences until deep in the search, the function-complete look-ahead approach appears to have an enhanced potential for discovering at least some of these hard-to-get inferences. SBSAT remains under development and currently serves as a testbed for several new approaches to solving difficult instances of Satisfiability.
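To make the idea of function-complete look-ahead concrete, the toy sketch below keeps each constraint as a whole Boolean function and, for a candidate variable/value pair, counts how many other variables the constraints would force. SBSAT precomputes and memoizes this kind of information in state machines during preprocessing; the sketch recomputes it naively over explicit truth tables and is meant only to show what the retained counts mean, not how SBSAT actually represents or stores them.

```python
from itertools import product

def forced_literals(constraint, assignment):
    """Literals forced by one constraint under a partial assignment.

    constraint = (variables, satisfying_rows), where satisfying_rows is a
    set of value tuples over those variables; assignment maps variable -> bool.
    """
    variables, rows = constraint
    consistent = [r for r in rows
                  if all(assignment.get(v, r[i]) == r[i]
                         for i, v in enumerate(variables))]
    forced = set()
    for i, v in enumerate(variables):
        if v in assignment or not consistent:
            continue
        values = {r[i] for r in consistent}
        if len(values) == 1:                      # every consistent row agrees
            forced.add((v, values.pop()))
    return forced

def lookahead_score(constraints, assignment, variable, value):
    """Number of inferences obtained by tentatively setting variable=value."""
    trial = dict(assignment, **{variable: value})
    return sum(len(forced_literals(c, trial)) for c in constraints)

# Example: an XOR over (a, b, c) and an implication a -> d, kept as functions.
xor3 = (("a", "b", "c"),
        {r for r in product((False, True), repeat=3) if r[0] ^ r[1] ^ r[2]})
impl = (("a", "d"), {(False, False), (False, True), (True, True)})
constraints = [xor3, impl]

print(lookahead_score(constraints, {}, "a", True))    # 1: forces d via the implication
print(lookahead_score(constraints, {}, "a", False))   # 0: nothing becomes forced
```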
♦ Groundwater Remediation Software
The problem of cleaning up (alternatively, remediating) a contaminated
landfill has become a national priority only recently. This is partly
because, until the 1980s, it was not generally realized that
groundwater, the water flowing beneath the surface of the earth in
aquifers, can become polluted to the point of being unhealthy. Since
about 50% of the drinking water in the United States comes from
aquifers, remediation of contaminated sites drained by groundwater has
received urgent attention.
When contaminated sites are drained by aquifers, remediation plans must include an effective means to intercept and extract a sufficient volume of contaminant from groundwater systems near the site to prevent unsatisfactory concentrations from accumulating at other sites served by these systems. The kinds of interception equipment available include injection wells, extraction wells, and trenches. Injection wells pump water into the ground and are used to control locally the direction of groundwater flow and therefore the flow of contaminant. Extraction wells are used to pump contaminated water out of the ground. Trenches provide a passive means of contaminant extraction and may be an attractive alternative to extraction wells in certain cases.
The optimal placement of these remediation objects is a complex function of many variables, such as local knowledge, capital and operating costs of the remediation objects, availability of remediation object resources, what are considered safe levels of contaminant concentration, what can or must be done with the recovered contaminated water, and several others. This level of complexity requires some mechanical computational assistance. However, although the flow of contaminant through a groundwater system remediated by a particular extraction system can be simulated by a computer, there was no computer aid that could find an optimal or near-optimal recovery plan: that is, a plan that recovers sufficient contaminant at least cost.
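As a purely hypothetical illustration of the kind of optimization involved, the sketch below greedily adds the remediation object that yields the most additional recovery per unit cost until a target recovery level is reached. The simulate_recovery function stands in for a groundwater flow simulator, and all names and numbers here are assumptions made for the sake of the example, not taken from the software itself.

```python
def greedy_plan(candidates, cost, simulate_recovery, target_recovery):
    """Greedy sketch: candidates are possible placements (wells, trenches),
    cost maps a placement to dollars, simulate_recovery maps a set of
    placements to the fraction of contaminant recovered, and
    target_recovery is the required fraction (e.g. 0.9)."""
    plan, remaining = set(), set(candidates)
    while simulate_recovery(plan) < target_recovery and remaining:
        def gain_per_dollar(p):
            return (simulate_recovery(plan | {p}) - simulate_recovery(plan)) / cost[p]
        best = max(remaining, key=gain_per_dollar)   # best recovery per dollar
        plan.add(best)
        remaining.remove(best)
    return plan

# Toy usage with a made-up additive recovery model (purely illustrative):
candidates = ["well_A", "well_B", "trench_C"]
cost = {"well_A": 120_000, "well_B": 60_000, "trench_C": 50_000}
recovery = {"well_A": 0.6, "well_B": 0.45, "trench_C": 0.2}
fake_sim = lambda plan: min(1.0, sum(recovery[p] for p in plan))
print(greedy_plan(candidates, cost, fake_sim, target_recovery=0.9))
```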
Some of the features of our software are:
It could be used in the classroom to assist in the teaching of groundwater principles.
It could mix approximation algorithms for optimization with simulation algorithms for validation. As such, it could be a research platform for experimenting with a variety of optimization techniques.
It could provide several levels of simulation and approximation accuracy: the final remediation plan for a site could be determined by the process of stepwise-refinement through several levels.
It could provide several levels of output detail. For example, one may view the status of the entire site at one particular time in order to understand global, spatial relationships, or view specific information at a point on the site in order to understand time dependent relationships between measurable attributes. Windows and dialog boxes provide the output. These may be opened and closed easily when needed.
Many of the attributes of the simulation could be represented visually. A model of a site could be represented on the computer screen. Site features such as roads, buildings, and rivers could be visible or invisible as needed. Soil properties such as permeability could be represented visually. Quantities such as head and concentration could also be represented visually using color shading. Remediation objects could be moved around the screen under mouse control and automatically by the results of intrinsic approximation algorithms; in this way, partial results could be observed rapidly, possibly leading to abandonment of some alternatives in favor of others.
It could solve a variety of subsurface transport problems using a variety of treatment technologies. Thus, the cost effectiveness of various methods of treatment could be compared.
The most important feature of the program was the ability to integrate different kinds of information at different levels of detail. The user was able to concentrate on site visualization and specific quantities at the same time, as needed. For example, a user could locate a test-point on the visualized landscape and pop open a graph showing the history of an attribute at that point. Then the user could explore that graph with the mouse to find exact values at particular graph points.
I am working closely with colleagues at Ft. George Meade, Maryland, until late March 2010 on a number of projects, including improving SBSAT.
♦ SIAM Monograph on Satisfiability
I am writing a monograph for the Society for Industrial and Applied Mathematics entitled Analysis of Algorithms for Monotonic and Non-monotonic Reasoning, with John Schlipf of the University of Cincinnati. Expected completion is late 2010.
♦ Non-clausal Satisfiability Solver Competition
I am co-organizing a non-clausal Satisfiability solver competition with Prof. E. Clarke of Carnegie Mellon University, Prof. A. Biere of the Johannes Kepler University, and S. Weaver of the U.S. DoD.
♦ Special issue of AMAI
I am guest editing a special issue of Annals of Mathematics and Artificial Intelligence on Constraints in Formal Verification with Miroslav Velev of Aries Design Automation.
♦ 11th ISAIM conference
I am program co-chair, with Raj Bhatnagar of the University of Cincinnati, of the 11th International Symposium on Artificial Intelligence and Mathematics (ISAIM), to be held in Ft. Lauderdale, Florida, January 6-8, 2010.
♦ 6th CFV workshop
I am a member of the technical program committee of the 6th workshop on constraints in formal verification, to be held in Grenoble, France, on June 26, 2009. The workshop is organized by Miroslav Velev.
♦ ACM Regional Programming Contest
Our department has hosted a site of the annual East Central North America regional of the ACM International Collegiate Programming Contest since 2000. I have been the site director all those years and have been ably assisted by a variety of student helpers.
Our summer camp lasts for two weeks. During that time, high school students are exposed to basic computer science principles through programming computers and robots.
♦ Difficult search problems
We are investigating the reasons why search algorithms have difficulty solving some well-known and important combinatorial problems. Specifically, we are interested in classes of Satisfiability problems (that is, determining the solution space of propositional logic formulas). Numerous problems, especially those arising in the area of Information Assurance, may be cast as Satisfiability problems, so success on Satisfiability problems translates immediately to success for many other problems. In the past, research into this question has examined input structure for exploitable properties: it has been found that, during search, some input structures may reveal inferences that can be used to prune the search space considerably, perhaps making an otherwise intractable problem trivial. Problems resistant to this approach generally do not provide sufficient information to generate such inferences until very deep in the search space, when it is too late to do much about it. Our research considers guessing inferences by looking at the structure of solutions to solvable versions of the same problem and instantiating those inferences before search. Doing so has allowed us to compute the van der Waerden number W(2,6), the first van der Waerden number to be found in 40 years. The next step is to apply these techniques to problems of practical interest, such as Bounded Model Checking.
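For readers unfamiliar with the connection, the underlying SAT question for W(2,k) is whether the integers 1..n can be 2-colored without a monochromatic arithmetic progression of length k; the standard CNF encoding is sketched below. The actual W(2,6) computation additionally pre-instantiates inferences learned from solvable smaller cases, as described above; that step is not shown here.

```python
def vdw_cnf(n, k):
    """CNF asserting that 1..n can be 2-colored with no monochromatic
    arithmetic progression of length k.  Variable i is the color of i.
    The formula is satisfiable exactly when n < W(2, k)."""
    clauses = []
    for start in range(1, n + 1):
        for step in range(1, (n - start) // (k - 1) + 1):
            progression = [start + i * step for i in range(k)]
            clauses.append(progression)                  # not all color 0
            clauses.append([-x for x in progression])    # not all color 1
    return clauses

# Example: W(2,3) = 9, so the encoding for n = 8 is satisfiable, for n = 9 it is not.
print(len(vdw_cnf(8, 3)), "clauses for n=8, k=3")
```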
I am involved in the continued development of a non-clausal Satisfiability solver which we call SBSAT. SBSAT has been under development in cooperation with the U.S. DoD. We have demonstrated that this solver can outperform others on targeted classes of problems. SBSAT is currently a prototype, intended primarily for proof of concept, and does not have most features commonly found in production-quality solvers. We are enhancing SBSAT by adding such features, speeding up the code, and integrating it with complementary techniques such as algebraic methods.
♦ Solving difficult combinatorial problems
We wish to apply SBSAT to a variety of combinatorial problems that are known to be difficult. We hope to gather such a collection of benchmarks from government and industry. These will be used to improve the performance of SBSAT, eventually leading to a practical, general purpose problem solving tool.
♦ Understanding the nature of hard problems
Study of the difficult benchmarks mentioned above will lead to insights into the nature of hard problems and will help us to discover the means to tame some of them.