*In which I tell you how I got a bottle of Scotch.*

The overarching plan of mad scientists is to replace flesh with metal. As an example of objects made of flesh, let us consider software developers. At the moment, eliminating them seems unfortunately unfeasible, but we can take baby steps. We begin by noticing the daily routine of a typical programmer: short bursts of creative code-writing, and long periods of bug-hunting. Which of these two activities should we try to automate first? Bug-hunting. It involves less creativity, so it should be easier to automate. It occupies more of the programmer's time, so its automation should have higher impact. Bug-hunting is our relatively-low-hanging fruit.

Assuming we, mad scientists, succeed, far fewer humans shall goggle Java code in the future.
Instead, goggling of programs shall be performed by other programs.
Let us call this activity *program analysis*.

Besides eliminating software developers, program analysis offers other benefits.
In the long run, it should help ensure
(1) that web pages don't crash, no matter how unpredictable the user,
(2) that the password a user types will never end up stored on the server,
(3) that numbers are rounded in favor of the bank rather than its clients,
(4) and so on.
(OK, maybe not number 3.)
To achieve these goals, we should be able to perform some basic tasks.
For a language like Java, one of these basic tasks is to tell, by looking at the program text, if variables `x`

and `y`

may point to the same object.
In other words, we need to be able to do *pointer aliasing analysis*.

How to implement an aliasing analysis?
An analyzer written in C would look quite different from one written in Haskell.
This is the standard conundrum:
Should I use a language where I can do as I please, but run the risk of shooting my shoe?
Or should I use a language that makes it difficult to make mistakes, but occasionally it also makes it difficult to make non-mistakes?
Luckily, some people have figured out how to write a *fast* analyzer in a really-high-level^{®} language.
The language is called Datalog.
Perhaps not quite general purpose, but quite high-level.
(See, for example, [Bravenboer, Smaragdakis, *Strictly Declarative Specification of Sophisticated Points-to Analyses*, 2009].)

Datalog is really-high-level, but no magic bullet.
It does not make the problem we try to solve any easier.
In theory, all interesting program analyses, including aliasing analysis, are equally hard: there is no algorithm that solves them precisely.
Intuitively, this equivalence is fairly easy to see.
First, recall that it is impossible to decide whether a program terminates.
If you can't recall this, then, please, just take my word for it.
Second, we can always cook up a program that depends on the termination of another.
For example, consider the program `t(); x=y`

.
Are `x`

and `y`

aliases?
Well, it depends on whether `t()`

terminates, and we know we can't decide that.
However, this answer is what you'd get from a sane mathematician.
If, instead, you go and ask a mad scientist, they're likely to answer ‘yes, `x`

may-alias `y`

’.
Mad scientists are happy enough with an approximate answer.

Wait a minute! Did I not make two opposing statements? On the one hand, all analyses are equally difficult; on the other hand, to decide whether a website may crash we should first see which variables alias which. Surely, if any two analyses are equally difficult, then deciding whether a website may crash is equally difficult to deciding which variables may be aliased. But, if these tasks are equally difficult, then why would I bother to solve the aliasing question at all? Aliasing questions aren't easier, and their answer isn't that interesting in itself. My case for aliasing analysis makes no sense, it would seem. But theory tells us not only that the two tasks are equally difficult, it tells us they are equally impossible. Thus, I'm not talking about implementations that always give the right answer, because they don't exist. I'm talking about implementations that sometimes err. If errors (aka approximations) are allowed, then different analyses need not be equally difficult.

Why are not all analyses equally difficult?
The short answer is that some programs are more likely than others.
Let's zoom into a bit more detail.
Static analysis questions usually have a ‘yes’ or ‘no’ answer.
We agreed to allow errors; that is, perhaps the real answer is ‘yes’ but our implementation says ‘no’, or the other way around.
An implementation is good when its error probability is low.
But how exactly is the error probability defined?
The simplest idea would be to go through all programs and see for what fraction of them the implementation errs.
Let's suppose we can do some pen-and-paper magic and compute this probability to two decimals.
(We can't compute it exactly, at least not for interesting, undecidable analyses.)
Would this number correspond to what we see if we actually try the analysis for a set of real-world programs?
No!
Because some programs are more likely than others.
So, the right definition for probability of error, which should tell us how good an implementation is, ought to weight programs by how likely they are.
This is important, so let me restate it in a different form.
How good an implementation is depends not only on the analysis question, but also on the programs that are being analyzed.
It is perfectly possible to build a termination analysis using an aliasing analysis as a black box, and some thin glue: `t();x=y`

.
But it may very well be that the aliasing analysis is good for the kind of programs humans write, and the termination analysis we just built is good for the kind of programs that humans don't write.

To escape the paralyzing negativity of the theory of computability we embrace approximations. That sounds philosophical and deep. But how exactly do we go about this ‘embracing’, and how does it help in practice? To see this, let's look at a more realistic type of approximation. Consider the following code:

void f() { h(new Object(), new Object()); } void g() { Object x = new Object(); h(x, x); } void h(Object x, Object y) { /* … here goes the body of h … */ }

Could `x`

alias `y`

in `h`

?
The simple answer is ‘yes’, because `h`

could be called from `g`

.
However, we could give a more precise answer:
‘If `h`

is called from `f`

, then `x`

and `y`

are not aliases;
if `h`

is called from `g`

, then `x`

and `y`

are aliases.’
The analysis would need some slightly, but only slightly, fancier data structures to express such facts.
The answer ‘yes’ is just a boolean, but the long answer contains bits like ‘`h`

is called from `f`

’ which you wouldn't represent with a boolean.
The part ‘if …’ is called *context*, and analyses that keep track of it are said to be *context-sensitive*.
So, how would you represent contexts?
The example here suggests that contexts are lists of method names:

It is really tempting to see this as a list of method names,ifzwas called byy, which was called byx, …, which was called bya.

`[a;b;…;z]`

.
(Don't worry about why I wrote it reversed, it's an implementation detail relevant only if you're a functional programmer.)
There is a minor problem with this representation: contexts are not always about method names.
But, I am willing to cover this detail under the rug for this presentation.
I am not willing to ignore the more important problem: lists may be arbitrarily long.
An analysis could keep creating longer and longer contexts forever, and not terminate.
One way to limit the number of possible contexts is to truncate them to some fixed length $k$.
For example, if $k$ is 0, then the only context is the empty list.
The context does not even mention the name of the method being executed.
As another example, if $k$ is 1, then we'd keep track of the method currently being executed, but not of who called it.
If we want to distinguish between `h`

being called from `f`

or `g`

, then we'd need $k\ge2$.

Observe how focusing on a particular analysis naturally suggested an approximation.
We started with the question whether `x`

and `y`

are aliases.
To answer it precisely, we needed to refer to a context.
In our example, the context is a list of method names that are on the call stack.
Clearly, there are programs for which the number of such contexts is unbounded.
(For example, the contexts of
`f(int n) {return n<2? 1 : n*f(n-1);}`

are lists `[f;…;f]`

up to length $n+1$.)
If the analysis would explore all contexts, then it would not terminate.
Hence, it seems natural to limit the number of contexts;
a natural way to do so is to limit their length.

Another observation is that truncating contexts is an overapproximation.
To understand the term overapproximation, recall that the implementation of a static analysis deduces a set of statements of the form ‘fact $f$ holds in context $c$’.
These are shorthands that really mean ‘fact $f$ holds in *all* executions $\tau$ that have context $c$’.
Thus, although these statements taken together look like a set of $(f,c)$ pairs, they really denote a set of $(f,\tau)$ pairs.
In other words, the finite set of statements about (abstract) contexts is a shorthand for a typically infinite set of statements about concrete executions.
Consider two sets $C_1$ and $C_2$ of statements about executions.
If $C_1 \subseteq C_2$, we say that $C_1$ underapproximates $C_2$, and $C_2$ overapproximates $C_1$.
So here's what it means that truncation is an overapproximation: if you truncate contexts that occur in the original statements about contexts, then you end up representing a bigger set of statements about executions.
The terms overapproximation and underapproximation are sometimes used in a slightly different way.
Above, truncation was a process that took us from one set $C_1$ to another set $C_2$, and the term overapproximation described how these two sets relate.
Sometimes, however, there is no process involved and it would seem that only one set $C$ is mentioned.
For example: ‘The analysis deduces an overapproximation.’
In this case, the comparison term is implicitly the set of statements about executions that describe the program *exactly*.

The parameter $k$, which tells us how to truncate contexts, lets us trade running time for precision of the analysis.
Higher values of $k$ mean less approximation, better precision, and slower analysis;
lower values of $k$ mean more approximation, worse precision, and faster analysis.
Multiple parameters, not just one $k$, would allow finer tuning.
Intuitively, if we are interested in one particular fact, then we should need to increase the precision of the analysis only when looking at the relevant parts of the program.
Instead of having one parameter $k$ for the whole analysis, we could have several.
For example, parameter $k_f$ would tell us how to truncate contexts that start with `f`

, parameter $k_g$ would tell us how to truncate contexts that start with `g`

, and so on.
There is a slight problem with multiple parameters.
With one parameter, there is a simple strategy for choosing its value.
You start with $k=0$.
At every step, either the analysis is too slow, or it found the fact you are interested in, or it didn't find the fact you are interested in.
In the latter case, you increment $k$ and try again.
But what do you do with multiple parameters?
First, it's clear that the strategy must be automated, because no human wants to weed through hundreds of parameters.
Second, how to decide which parameters to increase?

If there are many parameters, we need an algorithm that picks their values. Intuitively, we must perform some sort of dependency analysis. Some parts of the program are more relevant than others to the query we try to answer. We should probably prefer to increase parameters associated to the more relevant parts of the program. The difficulty lies in making these vague intuitions precise. How exactly should we compute how relevant is one part of the program? How exactly is a parameter associated with a part of the program? And so on. Moreover, we'd like to make the intuitions precise in a way that is not too tied to the particular language (Java) or the particular analysis (aliasing) that we happen to use as examples. It seems difficult to do so. But, there is a big simplifying factor: people tend to write static analyses in Datalog. We don't need to worry about Java semantics, because they are encoded in Datalog. We don't need to do dependency analysis on Java programs, because they are encoded in Datalog. We don't need to talk about parts of the original Java program, because … you guessed.

Since Datalog is so magical, I'll spend a tiny bit of space on describing it. Datalog programs are sets of rules such as

f(x,z) :- g(x,y), h(y,z), p(y,1)To the left of

`:-`

you have one tuple, the *head*. To the right of

`:-`

you have a possibly empty list of tuples, the *body*. The arguments of tuples are either variables (such as

`x`

, `y`

, `z`

) or constants (such as `1`

).
A tuple is *ground*when all its arguments are constants. The semantics of a Datalog program is its least fixed-point. That is, to run it, you maintain a set of known ground tuples such as

`h(3,5)`

, if a rule lets you derive a new tuple from known ones you do so, otherwise you are done.
For example, if the current set of known facts includes `g(0,1)`

, `h(1,2)`

, and `p(1,1)`

, then the rule from above tells as that we should insert `f(0,2)`

into the set of known facts.
An *open*Datalog program is one to which you are supposed to add a few more rules (the

*input*) before you run it. A

*closed*Datalog program already includes its input, you just have to run it. In the case of program analysis there are two types of inputs — the Java program to analyze, and the values of the parameters. In other words, our closed Datalog programs have three types of rules, which encode, respectively:

- the analysis,
- the parameter setting, and
- the Java program to analyze.

Once the analysis and the program are encoded in Datalog, we can search automatically for a good setting of the parameters. The algorithm that conducts this search is the main focus of our paper. As a result, the paper spends little space on how the encoding works. Since I don't have any space constraints for my blog posts, I'll give an example here.

Let's analyze a very stupid language. Here is an example program:

1: goto 2, 6 2: x := new() 3: goto 4, 7 4: check x 5: goto 1 6: x := new() 7: check x 8: goto 1

This language has variables, object instantiation, non-deterministic goto, and a special `check`

statement.
In this example, we use a single variable, namely `x`

.
On lines 2 and 6, new objects are instantiated.
On lines 4 and 7, we check whether `x`

could come from different `new`

statements.
This check would make more sense if the language would be more complicated.
Think of the code above as a simplified version of a Java program.
Pretend that on lines 2 and 6 the program instantiates *different* classes.
Then, on lines 4 and 7, some method is called on `x`

.
Since Java does dynamic dispatch, we need to pick the method body based on the dynamic type of `x`

, which we know at runtime.
Or do we?
If there is only one place where `x`

could have been instantiated, then we can optimize away the dynamic dispatch.
Well, … almost: Reflection falsifies almost all statements about Java, so I'm sure there must be something wrong with what I said above as well.
But, anyway, the story should provide enough motivation for our `check`

statement.

Given the motivation above, let's say that `check`

statements print the label of statement that instantiated the object.
As humans, we can quickly see what will the `check`

statements do in this example.
It is particularly easy if we look at the flowgraph.

The `check`

at line 7 could print `2` or `6`.
The `check`

at line 4 prints only `2`.

Many analyses reduce to some form of reachability, but often reachability *not* on the flowgraph.
Instead, for context sensitive analyses, the graph's vertices are contexts.
So, let's pick some contexts for our example.
One simple choice is the execution trace itself;
that is, the contexts could be sequences like `12378167` that list labels from the program.
It is easy to see that this sequences form an infinite tree.
The execution `12378167` is highlighted below.

The next step is to turn this tree into a forest.
We are led to do so because we want to add abstraction arcs.
If we truncate the first statement from `12378167` we get `2378167`.
This later sequence will be represented by a vertex in a tree whose root is `2`.
Thus, we need a forest: a tree rooted at `1`, a tree rooted at `2`, and so on.
Below is a small part of this forest of infinite trees.

The abstraction arcs are depicted with dashed lines.
Intuitively, we will be propagating facts through this infinite forest.
Whenever we travel a normal arc from `a` to `b`, we take the facts associated with `a`, pass them through the transfer function of the statement at label `a`, and insert the resulting facts in the set of facts associated with `b`.
Abstraction arcs, on the other hand, will be treated differently.
We distinguish between two states of abstractions arcs: inactive and active.
An inactive abstraction arcs does nothing — it is as if the arc is missing.
An active abstraction arc copies all facts from its source to its target, unmodified.
Abstraction arcs will get activated/deactivated in bunches, based on a parameter setting.
For each label $\ell$ there is a parameter $k(\ell)$.
If $k(7)$ is $2$, this means that the abstraction arc going out of vertices labelled $7$ that are at depth bigger than 2 are activated.
Intuitively, if the execution reaches label $7$, then we will be forgetting what happened more than $k(7)$ statements ago.

The question we will be asking has the following form:
‘Given that the parameters have values $k(1),\ldots,k(8)$, does fact *foo* get attached to any vertex labelled *bla*?’
At first sight, this problem may seem only semi-decidable, because the trees are infinite.
However, we can stop after we explore down to depth $1+\max_\ell k(\ell)$.
Let's see on an example why this is so.
Suppose that the vertex identified by the path `1237816 7` gets some fact

*foo*. Suppose further that $k(\ell)$ is $1$ for $\ell\le4$, and $k(\ell)$ is $2$ for $\ell\gt4$. Let's write under each vertex its depth and its $k$.

12378167 vertex 12345678 depth 11122122 k

The column on gray background is the first column where the depth exceeds the parameter.
Because vertex `1 2` has depth bigger than $k(2)$, there is an active abstraction arc from

`1`to

**2**`. Thus, any facts on`

**2**`1`are also on

**2**`, so we can just as well look at the vertex`

**2**`237816`. Now repeat.

**7**2378167 vertex 1234567 depth 1122122 k

Repeat.

378167 vertex 123456 depth 122122 k

Repeat.

8167 vertex 1234 depth 2122 k

Repeat.

167 vertex 123 depth 122 k

Repeat.

7 vertex 1 depth 2 k

Oh, well, we can't continue here, because no vertex has the depth bigger than its parameter.
But, the point is that, as long as the depth is bigger than the maximum parameter, we can continue.
In this example, all the facts that are on vertex `1237816 7` must also be on vertex

`. So, if we looked at vertex`

**7**`and didn't find fact`

**7***foo*, then we can be sure that vertex

`1237816`won't have fact

**7***foo*either. (Well, if you payed much attention you'd see that you need to appeal to co-induction to conclude what I said above.) In other words, to decide questions of the form we are interested in, it suffices to look down to depth $1+\max_\ell k(\ell)$.

Let's see now how to encode the digraph of contexts in Datalog. For the picture above, we simply list all the arcs, and all the abstraction arcs.

arc(2,23). arc(1,12). arc(6,67). arc(23,234). arc(12,123). arc(67,678) arc(23,233). arc(1,16). arc(16,167). abs(12,2,21). abs(16,6,61). abs(123,23,32). abs(167,67,72).

The expression `abs(12,2,21)` means that there is an abstraction arc from vertex `1 2` to vertex

`; furthermore, this arc is supposed to be active when $k(2)\le1$. We will see in a moment how this activation takes place.`

**2**
Let's recap where we are.
From the code we got a flowgraph.
The flowgraph can be seen as an infinite tree.
The infinite tree can be seen as a forest, with roots starting with each possible label.
The is forest of infinite trees can be safely cut at a certain depth, once we agree we don't care about contexts longer than a certain bound.
And now … we turn this forest back into a tree.
How?
In the simplest way possible, we add root to rule them all.
Note how vertices are contexts; that is, sequences of integers like `1234`.
Well, it makes sense to have a vertex for the empty sequence – that's our super-root.
The empty sequence is usually denoted by `ε`.
(You can think of `ε` as being represented by `0`, if you'd prefer this to be consistent with the previous story that Datalog constants are integers.)

Now, let's look at aliasing information.
Let `pto(x,y,z)` denote that in context `x` variable `y` could point to the object instantiated at label `z`.
We obtain such facts from `new` statements, and we propagate such facts through the digraph of contexts.

pto(x,y,z) :- label(x,z), new(z,y). pto(x,y,z) :- pto(u,y,z), arc(u,x), label(x,v), kind(v,w), w!=1. pto(x,y,z) :- pto(u,y,z), abs(u,x,v), param(v).

The first rule deduces `pto` facts from other facts, `label` and `new`.
The expression `new(x,y)` means that at label `x` variable `y` gets to point to a new object.
Thus, we represent the two `new` statements from our example as follows:

new(2,0). new(6,0).

(By convention, I chose to represent variable `x` of the program we analyze with the integer 0.
Keeping the letter `x` would've been an opportunity for confusion because the Datalog rules also have some variables named `x`.)
The expression `label(x,y)` means that, if we are in context `x`, then we are at label `y`.
And, by the way, when I say ‘at label `y`’, I mean ‘immediately *after* executing the statement at label `y`’.
For the subgraph depicted above, we generate the following rules:

label(2,2). label(1,1). label(6,6). label(23,3). label(12,2). label(67,7). label(234,4). label(123,3). label(678,8). label(233,3). label(16,6). label(167,7).

Now let's see how `pto` facts propagate.
Normally, if the statement is something else than a `new` statement, then aliasing information marches on undisturbed.

pto(x,y,z) :- pto(u,y,z), arc(u,x), label(x,v), kind(v,w), w!=1.

For this to work, we need to know the kind of each statement.

kind(1,0). kind(2,1). kind(3,0). kind(4,0). kind(5,0). kind(6,1). kind(7,0). kind(8,0).

Notice that I used two kinds only: 1 means ‘`new` statement’, 0 means ‘not a `new` statement’.
Notice also that this is a sneaky trick to avoid using negation in Datalog.
Or, rather, to avoid negating user-defined predicates — we're still using disequalities.
Perhaps you now feel an unstoppable urge to figure out a different, simpler encoding.
If so, then please go ahead, and don't forget to let me know your answer.
I, on the other hand, will keep plodding.
The other way to propagate facts is through active abstraction arcs:

pto(x,y,z) :- pto(u,y,z), abs(u,x,v), param(v).

The `param` predicate says which approximations we are willing to make.
For example, the expression `param(21)` means that $k(2)\le1$.
Again looking at the picture from above, we write:

param(21). param(61). param(32). param(72).

That's it.
That's the encoding.
So, let's take a step back and admire it.
I said earlier that there are three types of rules.
The last batch, with the list of cases where `param` holds, demonstrates how the setting of parameters is encoded.
The three rules that look like an inductive definition for `pto` demonstrate how to encode a static analysis.
Finally, all the other rules, including those for `arc` and `abs`, demonstrate how to encode the program we are analyzing.
The encoding is finished in the sense that we can run a Datalog solver and get back the `pto` facts that hold, for a certain parameter setting.
But, it isn't quite finished.
The questions we want to ask are:
‘At label $\ell$, can `x` come from more than one `new` statement?’
Here, $\ell$ is $4$ or $7$.
We need to encode this as well, so we add a rule:

poly(x,y) :- pto(z,y,u), pto(z',y,u'), label(z,x), label(z',x), u!=u'.

The expression `poly(x,y)` means that at label `x` it is not known that variable `y` comes from a fixed `new` statement.

You might want to stop and breathe, after all these new notations. The bad news is that so many concepts were introduced to handle a trivial example. The good news is that complicated examples are handled similarly. Let's recap.

Predicate | Meaning |
---|---|

arc(2,23) |
the control flows from context 2 to context 23 |

abs(12,2,21) |
facts that hold in context 12 should also hold in context 2 if parameter 2 has value at most 1 |

param(21) |
parameter 2 has value at most 1 |

label(123,3) |
the context 123 includes the statement with label 3 |

pto(4,0,2) |
in context 4 variable 0 may point to the object instantiated at the line with label 2 |

new(2,0) |
the statement at label 2 assigns a fresh object to variable 0 |

kind(2,0) |
the statement at label 2 is not a new statement |

kind(2,1) |
the statement at label 2 is a new statement |

poly(4,0) |
at label 4 variable 0 may point to objects instantiated by different new statements |

Let's see if we can actually do stuff with all these definitions.
Yes.
At the very least we can express the two queries that interest us: `poly(4,0)` and `poly(7,0)`.
What else?
Let's set all parameters to 0: `param(10)`, `param(20)`, …, `param(80)`.
Also, we need to produce `arc` and `abs` for contexts of length 0 and 1, as described above.
For example, we include `abs(1,ε,10)`.
Now that all parameters are set, let's run the Datalog solver.
What will it do?
First, it will notice that at line 2 variable `x` points to something allocated on line 2.
Second, it notices that a parameter allows it it to truncate the context `2` to the empty context `ε`.
Third, from the empty context `ε` it goes to context `4` to conclude that at line 4 variable `x` could still come from the `new` statement on line 2.
In fact, truncating a context to length 0 amounts to doing a *flow-insensitive* analysis — the arcs of the flow-graph are ignored, since a fact immediately bleeds out to all other statements.
Here is a snippet of the Datalog derivation:

pto(2,0,2) :- label(2,2), new(2,0). pto(ε,0,2) :- pto(2,0,2), abs(2,ε,20), param(20). pto(6,0,6) :- label(6,6), new(6,0). pto(ε,0,6) :- pto(6,0,6), abs(6,ε,60), param(60). pto(4,0,2) :- pto(ε,0,2), arc(ε,4), label(4,4), kind(4,0), 0!=1. pto(4,0,6) :- pto(ε,0,6), arc(ε,4), label(4,4), kind(4,0), 0!=1. poly(4,0) :- pto(4,0,2), pto(4,0,6), label(4,4), label(4,4), 2!=6. pto(7,0,2) :- pto(ε,0,2), arc(ε,7), label(7,7), kind(7,0), 0!=1. pto(7,0,6) :- pto(ε,0,6), arc(ε,7), label(7,7), kind(7,0), 0!=1. poly(7,0) :- pto(7,0,2), pto(7,0,6), label(7,7), label(7,7), 2!=6.

When all parameters are set to $0$, both `poly(4,0)` and `poly(7,0)` are derived.
In reality, the `check` on line 4 prints only `2`.
So, it better be that `poly(4,0)` ceases being derived if we increase some parameters.
But which?
At last, MaxSat comes into the picture.
What do we want?
We want `poly(4,0)` not to be derived.
We want to play by the (Datalog) rules.
And we want to increase as few parameters as possible.
What we want is readily expressed by the following partial MaxSat problem:

hard ¬poly(4,0) hard pto(2,0,2) ∨ ¬label(2,2) ∨ ¬new(2,0) … hard poly(7,0) ∨ ¬pto(7,0,2) ∨ ¬pto(7,0,6) ∨ ¬label(7,7) soft param(10) soft param(20) … soft param(80)

The hard clauses must be satisfied.
The soft clauses should be satisfied.
In this case, the MaxSat solver will notice that dropping either of `param(20)` or `param(60)` is enough to disrupt the derivation of `poly(4,0)`.
Out of `param(20)` and `param(60)` it will pick one at random.
Let's say it picks `param(60)`.
We interpret this to mean that we should set parameter `6` to a value greater than `0`.
We repeat the process from above.
The Datalog solver will derive `pto(6,0,6)`, then `pto(67,0,6)`, then `pto(7,0,6)`.
At this point the fact that variable `0` may come from `6` bleeds into the empty context `ε` again, because parameter `7` has value `0`.
This time, the MaxSat solver will pick randomly between `param(20)` and `param(70)`.
After going back-and-forth several times between Datalog and MaxSat, eventually the values of parameters on the right loop (that is, parameters `1`, `6`, `7`, `8`) would be at least `1`, and Datalog will not derive `poly(4,0)` anymore.

If, on the other hand, we are asking about `poly(7,0)` in the same way, then we will end up increasing parameters forever.
To ensure termination we can put an upper limit on how much we are willing to increase parameters.
This is better than just trying to set all parameters to the upper limit in the first try!
There are typically hundreds of parameters, and the refinement approach outlined above pushes to the limit only a few tens, before deciding that the query cannot be ruled-out.

This post was rather heavy on details. I will conclude on a lighter note.

**Where next.**
If you are intrigued by this post, but a little annoyed that there's too much detail and too little insight, then there's a paper you should read:
[Zhang et al., *On Abstraction Refinement for Program Analyses in Datalog*, 2014].
(As an aside, if you look at the first proof in the appendix, you may recognize a shortened version of a previous post on this blog on least fixed-points.)
If you want to play with the implementation then head to jChord.
The refinement strategy described above is in the repository (mainly r1832), but not yet in a release.

What's there not to like about this work? At PLDI14, Yannis Smaragdakis told me that he thinks it's ugly that we use two solvers. There are two answers to that. First, I think it's beautiful how the two solvers perfectly fit together. So, beauty may be subjective. Second, the two solvers handle rather different tasks: one computes least fixed-points, the other does boolean optimization. There simply was no off-the-shelf solver that was good at both. We used bddbddb for the Datalog part, and MiFuMax for the MaxSat part. But, it's true: There's no principled reason why a sover couldn't attack both questions in one go. And, in fact, Nikolaj Bjørner coded on the spot something that does just this in Z3!!

What else is there not to like?
What I dislike the most is the random choices.
It's not that I have a problem with randomness or nondeterminism.
(Although, perhaps I do.)
But, it feels like there should be a more principled way to pick between `param(20)` and `param(60)` in the example above.
It feels like we're ignoring some information that could speed things up even more.
Well, …
I kinda know this is the case, because Hongseok Yang already did some experiments that showed better is possible!

The other downside was that they gave only one bottle per distinguished paper at PLDI14, and there are 5 authors for this paper. But, somehow, the bottle ended up with me. So, I guess, it's not such a bad downside after all.

## No comments:

## Post a Comment

Note: (1) You need to have third-party cookies enabled in order to comment on Blogger. (2) Better to copy your comment before hitting publish/preview. Blogger sometimes eats comments on the first try, but the second works. Crazy Blogger.