Avoiding false-positives with flow-sensitive conditional analysis
It’s hard to develop a new static analysis tool for a language that’s been around a couple of decades — alarm bells go off in your users’ minds if your tool flags code that has worked fine for a dozen years.
Some relatively new static analysis tools have overcome this through a sort of argument from authority. TypeScript’s author can say “I have designed famous programming languages, trust me when I say your code is poorly-written”.
I, however, have designed zero programming languages, so I have to be a little more accommodating when building a static analysis tool for PHP.
Psalm has a feature I'm calling flow-sensitive conditional analysis. I don't believe any similar tools employ it. Since writing this Paul Khuong has pointed to Typed Racket as a modern type system that also performs this analysis, where it's more properly named Occurrence Typing.
This article will show how Psalm’s analysis can help us avoid a false-positive on this snippet of code:
<?php
class A {
public function isValid() : bool {
return (bool) rand(0, 1);
}
public function someAction() : void {}
}
function takesA(?A $a) : void {
$valid_a = $a && $a->isValid();
if ($valid_a) {
$a->someAction();
}
}
Flow-sensitive type analysis
Any type-inferring static analysis tool for dynamically-typed languages like PHP or JavaScript needs to perform flow-sensitive type analysis, meaning that the tool understands how the datatypes of variables change within a function given that function’s collection of statements and expressions.
At every point during the analysis of a given set of statements such tools keep a map of variable names with corresponding types, updating that map as they traverse the code's abstract syntax tree.
This allows those tools to infer that executing this code will never result in a null reference:
<?php
function getStringOrDefault(?string $a) : string {
if ($a === null) {
return 'default';
}
return $a;
}
Here Psalm’s behaviour is similar to other static analysis tools:
- Psalm sees the conditional expression and generates a formula (in Conjunctive Normal Form) composed of a series of clauses. The
if
conditional$a === null
is converted to the formula($a is null)
. That formula represents all the things we know to be true inside theif
block. - From this formula we generate a list of assertions that can be applied back to the types that are currently defined. To do this, we look for all clauses in our formula that contain just a single variable. In this case, that’s the clause with the
($a is null)
assertion that we found earlier. We extract that clause, then reconcile it with the existing known type for$a
,string|null
, resulting in us setting the type of$a
tonull
. - Psalm sees that the
if
block always returns as long as the formula we generated above evaluates to true, so after theif
block Psalm knows that the formula must evaluate to false. Thus the negation of the formula is true after theif
block. - Psalm calculates the negation of the formula as
($a is !null)
trivially. - As in step 2, it looks for clauses containing a single variable which can translate to a type assignment. Here we take the
($a is !null)
clause and reconcile it with thestring|null
type for$a
to give us the typestring
. - Psalm can then determine that
return $a
always returns a string.
Flow-sensitive conditional analysis
While flow-sensitive type analysis keeps a map of variables and their respective types, flow-sensitive conditional analysis adds a list of conditionals it knows to be true at each point in the code.
Let’s introduce some extra complexity to our function above and see how flow-sensitive conditional analysis is useful:
<?php
function getStringsOrDefault(?string $a, ?string $b) : string {
if ($a === null && $b === null) {
return "default";
}
if ($a !== null) {
return $a;
}
return $b;
}
- Psalm analyses the first conditional
$a === null && $b === null
and generates the formula($a is null) && ($b is null)
. - Psalm then looks for clauses containing a single variable assertion, and sees two —
($a is null)
and($b is null)
. Psalm dutifully assigns$a
and$b
to null inside the if condition (though these assignments are never used). - Since the
if
block always returns when entered, theif
condition must be false after it, so we compute the negation!(($a is null) && ($b is null))
to get($a is !null || $b is !null)
. This is a single clause, but since it contains two variables we cannot infer any types straight away. At this point tools that just perform flow-sensitive type analysis discard that negated formula, since the formula cannot translate directly into type assignments. But Psalm retains this information, stored as the set of conditions it knows to be true. - Now it encounters a new
if
condition$a !== null
. It derives the formula($a is !null)
and inside theif
block it trivially determines$a
has the typestring
- After that second
if
condition Psalm then knows that($a is null)
. It combines that formula with the previously-generated formula($a is !null || $b is !null)
.
($a is null) && ($a is !null || $b is !null)
simplifies to
($b is !null)
.
This gives us a clause with a single variable which allows us to determine that$b
has the typestring
. Tools that just perform flow-sensitive type analysis emit a false-positive warning here.
If you’re still with me, congratulations!
Knowing a bit about flow-sensitive conditional analysis, let’s revisit the very first example:
<?php
class A {
public function isValid() : bool {
return (bool) rand(0, 1);
}
public function someAction() : void {}
}
function takesA(?A $a) : void {
$valid_a = $a && $a->isValid();
if ($valid_a) {
$a->someAction();
}
}
Whenever Psalm encounters an assignment where the right-hand side is an AND/OR operation like this:
$valid_a = $a && $a->isValid();
It introduces a new condition to its list of conditions it knows to be true:
($valid_a is falsy || (($a is !falsy) && <extra stuff>))
In plain English “it’s either the case that $valid_a
is falsy
or it’s the case that $a
is not falsy”.
Since we store everything in Conjunctive Normal Form this becomes
($valid_a is falsy || $a is !falsy)
&& ($valid_a is falsy || <extra stuff>)
From the if
condition Psalm derives the formula ($valid_a is !falsy)
. Now Psalm knows that inside the if
block this formula holds:
($valid_a is !falsy)
&& ($valid_a is falsy || $a is !falsy)
&& ($valid_a is falsy || <extra stuff>)
This formula can again be simplified to
($a is !falsy) && (<extra stuff>)
This tells Psalm that $a
is never falsy
inside the if
condition, which allows the tool to avoid a false-positive that other tools miss.
What’s the cost?
Performing flow-sensitive conditional analysis adds a small (roughly 6%) average increase in Psalm’s overall runtime. There are a number of pathological cases that can create very large formulae (Psalm maxes out at 20,000 terms before it gives up trying to simplify formulae).
This analysis can also generate some slightly convoluted issue messages. For instance
<?php
function checkValue(string $a) : void {
if ($a !== "hello") {
if ($a === "hello") {
// some code
}
}
}
Produces
Condition
($a is string(hello))
contradicts a previously-established condition($a is not string(hello))
On the plus-side, a slightly-convoluted message is better than no message at all.
I am an island
Lastly there's a cost to maintaining a system that doesn’t appear to exist outside of Psalm. I’m hoping this article might change that!