Security Analysis in Psalm
Psalm can attempt to find connections between user-controlled input (like $_GET['name']) and places that we don’t want unescaped user-controlled input to end up (like echo "<h1>$name</h1>") by looking at the ways that data flows through your application (via assignments, function/method calls and array/property access).
You can enable this mode with the --taint-analysis command line flag. When taint analysis is enabled, no other analysis is performed. To ensure comprehensive results, run Psalm normally before taint analysis and fix any errors it reports.
Tainted input is anything that can be controlled, wholly or in part, by a user of your application. In taint analysis, tainted input is called a taint source.
Example sources:
$_GET['id']
$_POST['email']
$_COOKIE['token']
Taint analysis tracks how data flows from taint sources into taint sinks. Taint sinks are places you really don’t want untrusted data to end up.
Example sinks:
<div id="section_<?= $id ?>">
$pdo->exec("select * from users where name='" . $name . "'")
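To make the source-to-sink idea concrete, here is a minimal sketch of the kind of flow taint analysis follows. The function names renderHeading and renderHeadingEscaped are hypothetical and exist only for illustration; the point is that the value travels from $_GET through an assignment and a function call before it reaches echo.

```php
<?php

// Hypothetical helper: builds markup from a value without escaping it.
function renderHeading(string $name): string
{
    return "<h1>" . $name . "</h1>";
}

// Hypothetical helper: the same markup, with the value escaped for HTML text.
function renderHeadingEscaped(string $name): string
{
    return "<h1>" . htmlspecialchars($name, ENT_QUOTES, 'UTF-8') . "</h1>";
}

// Taint source: user-controlled input (empty string when the parameter is absent).
$name = (string) ($_GET['name'] ?? '');

// Taint sink: echo. Passing renderHeading($name) to echo is the kind of flow
// that taint analysis is meant to report; the escaped variant is used instead.
echo renderHeadingEscaped($name);
```

The unescaped helper is kept only to show what a tainted path would look like; nothing in the sketch calls it with user input.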
Taint Types
Psalm recognises a number of taint types by default, defined in the Psalm\Type\TaintKind class:
sql - used for strings that could contain SQL
ldap - used for strings that could contain an LDAP DN or filter
html - used for strings that could contain angle brackets or unquoted strings
has_quotes - used for strings that could contain unquoted strings
shell - used for strings that could contain shell commands
callable - used for callable strings that could be user-controlled
unserialize - used for strings that could contain a serialized string
include - used for strings that could contain a path being included
eval - used for strings that could contain code
ssrf - used for strings that could contain text passed to Curl or similar
file - used for strings that could contain a path
cookie - used for strings that could contain an HTTP cookie
header - used for strings that could contain an HTTP header
user_secret - used for strings that could contain user-supplied secrets
system_secret - used for strings that could contain system secrets
You're also free to define your own taint types when defining custom taint sources – they're just strings.
Taint Sources
Psalm currently defines three default taint sources: the $_GET, $_POST and $_COOKIE server variables.
You can also define your own taint sources.
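As a rough sketch of what a custom source can look like, the function below is marked with a docblock annotation so that its return value is treated as tainted. This assumes the @psalm-taint-source annotation and the input taint kind behave as described in Psalm's documentation on custom taint sources; getQueryParam is a hypothetical helper, not part of Psalm.

```php
<?php

/**
 * Hypothetical wrapper around the query string.
 * The annotation asks Psalm to treat the return value as user-controlled.
 *
 * @psalm-taint-source input
 */
function getQueryParam(string $name): string
{
    // Returns an empty string when the parameter is missing.
    return (string) ($_GET[$name] ?? '');
}
```

Marking a wrapper like this can be useful when input reaches the application through a framework's request object rather than directly from $_GET, $_POST or $_COOKIE.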
Taint Sinks
Psalm currently defines a number of different sinks for builtin functions and methods, including echo, include and header.
You can also define your own taint sinks.
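A custom sink might be sketched as follows, assuming the @psalm-taint-sink annotation takes a taint type followed by the parameter name, as in Psalm's documentation on custom taint sinks. The HtmlPrinter class and its printRaw method are hypothetical.

```php
<?php

final class HtmlPrinter
{
    /**
     * Hypothetical method that writes markup straight to the output.
     * The annotation asks Psalm to report html-tainted data reaching $markup.
     *
     * @psalm-taint-sink html $markup
     */
    public function printRaw(string $markup): void
    {
        echo $markup;
    }
}
```

With an annotation like this, passing a value derived from a taint source to printRaw should be reported in the same way as passing it to echo directly.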
Avoiding False-Positives
Nobody likes to wade through a ton of false-positives – here’s a guide to avoiding them.
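One common cause of false positives is a project-specific escaping function that Psalm does not know about. The sketch below assumes the @psalm-taint-escape annotation, which tells Psalm that a function removes a given taint type from its return value; escapeForHtml is a hypothetical helper, and whether both taint types should be escaped depends on how the function is used.

```php
<?php

/**
 * Hypothetical escaping helper for HTML text and quoted attribute values.
 * The annotations ask Psalm to treat the result as no longer carrying
 * the html and has_quotes taints.
 *
 * @psalm-taint-escape html
 * @psalm-taint-escape has_quotes
 */
function escapeForHtml(string $value): string
{
    // ENT_QUOTES escapes both single and double quotes.
    return htmlspecialchars($value, ENT_QUOTES | ENT_SUBSTITUTE, 'UTF-8');
}
```

An annotation like this is a promise to Psalm, not a check: if the function does not escape correctly for the context where its result is used, the analysis will no longer flag the problem.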
Limitations
Taint analysis assumes that values are escaped correctly for the context they end up in. It will not catch escaping mistakes such as these:
$sql = 'SELECT * FROM users WHERE id = ' . $mysqli->real_escape_string((string) $_GET['id']);
$html = "
<img src=" . htmlentities((string) $_GET['img']) . " alt='' />
<a href='" . htmlentities((string) $_GET['a1']) . "'>Link 1</a>
<a href='" . htmlentities((string) $_GET['a2']) . "'>Link 2</a>";
// Details:
// $id = 'id' - Missing quotes
// $img = '/ onerror=alert(1)' - Missing quotes
// $a1 = 'javascript:alert(1)' - Normal inline JavaScript
// $a2 = '/' onerror='alert(1)' - Pre PHP 8.1, single quotes are not escaped by default
// Test:
// /?id=id&img=%2F+onerror%3Dalert%281%29&a1=javascript%3Aalert%281%29&a2=%2F%27+onerror%3D%27alert%281%29
To avoid these issues, use parameterised queries for SQL and for commands (e.g. exec), and a context-aware templating engine for HTML. Then use the literal-string type to ensure that sensitive strings are defined in your application (i.e. have been written by a developer).
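The following is a minimal sketch of how a parameterised query and the literal-string type can be combined. The fetchAllRows helper is hypothetical; it assumes PDO is available and that Psalm checks the literal-string parameter type during normal analysis.

```php
<?php

/**
 * Hypothetical helper: the SQL text must be a literal written in the
 * source code, and user data may only be passed as bound parameters.
 *
 * @param literal-string $sql
 * @param array<string, scalar|null> $params
 * @return list<array<string, mixed>>
 */
function fetchAllRows(PDO $pdo, string $sql, array $params = []): array
{
    $statement = $pdo->prepare($sql);
    if ($statement === false) {
        // Only reachable when PDO is not configured to throw exceptions.
        throw new RuntimeException('Could not prepare statement');
    }
    $statement->execute($params);

    return $statement->fetchAll(PDO::FETCH_ASSOC);
}

// Usage, with $name coming from user input:
// $rows = fetchAllRows($pdo, 'SELECT * FROM users WHERE name = :name', ['name' => $name]);
```

Because the user-supplied value is bound as a parameter instead of being concatenated into the query, quoting mistakes like the missing quotes in the example above cannot change the structure of the SQL.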
Using Baseline With Taint Analysis
Since taint analysis is performed separately from other static code analysis, it makes sense to use a separate baseline for it.
You can use the --use-baseline=PATH option to set a different baseline for taint analysis.
Viewing Results in a User Interface
Psalm supports the SARIF standard for exchanging static analysis results. This enables you to view the results in any SARIF compatible software, including the taint flow.
GitHub Code Scanning
GitHub code scanning can be set up by using the Psalm GitHub Action.
Alternatively, the generated SARIF file can be manually uploaded as described in the GitHub documentation.
The results will then be available in the "Security" tab of your repository.
Other SARIF compatible software
To generate a SARIF report, run Psalm with the --report flag and a .sarif extension. For example:
psalm --report=results.sarif
Debugging the taint graph
Psalm can output the taint graph using the DOT language. This is useful when expected taints are not detected. To generate a DOT graph, run Psalm with the --dump-taint-graph flag. For example:
psalm --taint-analysis --dump-taint-graph=taints.dot
dot -Tsvg -o taints.svg taints.dot