Free Software and the Aftermath of the XZ Affair (Part 1: Technical)

This essay will be revised in response to commentary and/or subsequent events. Its change history is viewable here.

Please post public commentary as responses to this fediverse post, which links back to this essay. Private commentary may be emailed to zack@owlfolio.org. In either case, I will not quote you here without your permission.

Like practically everyone else with a connection to computer security and/or free software development, I’ve been thinking a bunch in the past two weeks about the insider attack on the XZ compression library and how it could have been prevented.

What happened

The XZ compression library was originally written starting in 2005, based on the older LZMA SDK by Igor Pavlov. Its contribution frequency graph on GitHub indicates that the library was finished circa 2011, with steadily declining project activity from then until 2022. Lasse Collin (GitHub handle Larhzu) did almost all of the work until 2022, when a new contributor appeared: “Jia Tan” (GitHub handle JiaT75; the name is almost certainly a pseudonym). This person single-handedly revitalized the project, both by doing a lot of piled-up maintenance work themselves and by rekindling Lasse Collin’s interest. They showed up at exactly the right time; Lasse Collin admitted to “limited ability to care” and “longterm mental health issues” in a mailing list thread asking about the library’s maintenance. By late 2022, Jia Tan had earned enough trust to be making releases of the library.

In February of 2024, Jia Tan released version 5.6.0 of XZ, which included malicious code hidden within files that they had, only a couple of days earlier, added to the test suite. The library’s build scripts were subtly altered to extract the malicious code from the test suite and link it into the library. We’re not yet sure of everything the malicious code was intended to do, but we do know that it created a “back door” in the SSH remote login service, potentially allowing the attacker to take control of thousands of servers worldwide. The malicious code was discovered less than a month after the 5.6.0 release, because it made the SSH daemon run suspiciously slowly.

In retrospect, it seems that Tan’s entire involvement with XZ was in preparation to pull off this attack. The people who were asking about the maintenance status of XZ may have been sock puppets used by Tan to put pressure on Collin, and again later to urge Linux distributions to pick up the malicious 5.6.0 update.

For more detail, consult Russ Cox’s timeline of the attack, Evan Boehs’ similar timeline with different emphasis, and Sam James’ FAQ.

What can we do?

We must recognize that the XZ attack was fundamentally a social engineering attack, and that against an attacker as determined and patient as Jia Tan was, technical countermeasures will not help by themselves. If this assertion seems bizarre and improbable to you, please take a moment and do one or more of: read Ken Thompson’s essay “Reflections on Trusting Trust”, browse through the submissions to the Underhanded C Contest, and, if you’re of a mathematical bent, ponder the implications of Rice’s theorem. Bruce Schneier’s book Liars and Outliers is also strongly recommended background reading.

The rest of this essay is going to be about technical countermeasures, but they are all intended to make it easier for human beings to catch future insider attacks, and perhaps also to deter such attacks, not to make them impossible. Later in the week I will be posting a second essay that focuses on the social side of things.

Hardening Source Distribution

There are technical measures that could have made it significantly harder for this attack to evade notice. The most important one I can think of is: The reproducible builds project should start testing that the presence or absence of a program’s test suite has absolutely no effect on the result of a build.

Let me unpack that a little: Right now, when you download the source code for a program (whether via version control or via traditional “tarballs”), you get both the program itself and its test suite, and there’s no simple way to separate the two. The XZ attack took advantage of this, hiding the bulk of the malicious code inside the XZ library’s test suite. When the library was built, the malicious code would be extracted from its hiding place and linked into the library. But if the files where the malicious code was hidden were just not there, the build would carry on without error, so as not to attract attention. Thus, someone could have caught this attack by testing whether removing the test suite from the source tree changed the library itself.
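To make that check concrete, here is a minimal sketch in Python. It assumes the package has already been built twice into two separate output directories, once from the full source tree and once from a tree with the test suite removed; the function names are mine, and this is not how the reproducible builds project actually does its comparisons.

```python
import hashlib
from pathlib import Path


def digest_tree(root):
    """Map each regular file under root (by relative path) to its SHA-256."""
    root = Path(root)
    return {
        p.relative_to(root).as_posix(): hashlib.sha256(p.read_bytes()).hexdigest()
        for p in sorted(root.rglob("*"))
        if p.is_file()
    }


def changed_artifacts(with_tests, without_tests):
    """Return the paths whose presence or contents differ between two build outputs."""
    a = digest_tree(with_tests)
    b = digest_tree(without_tests)
    return sorted(path for path in a.keys() | b.keys() if a.get(path) != b.get(path))
```

An empty result means the test suite had no effect on what was built. Any path in the list is something a human should look at. This only works if the build is already bit-for-bit reproducible, which is why it belongs with the reproducible builds project.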

For this to be a reasonable thing to ask of the reproducible builds project, we need to make it easy to separate a program from its test suite. This will be straightforward for software where the test suite is already in separate files from the program (the usual practice in C) and more difficult when the test suite is blended together with the program (e.g. “doc tests” in Rust). We will need to be able to do it semi-adversarially, that is, on the assumption that most of the authors of the software intend to make the distinction obvious, but that there could be someone who is trying to evade this process. I imagine a tool that can be applied to an unpacked tarball or VCS checkout and that will prune all the test material, trusting machine-readable declarations of what is what, but not running any code or build scripts from inside the tarball. The result will need to be audited by a human. We could also modify common build tools (e.g. Automake) so that tarball creation (“make dist” or equivalent) produces two tarballs: one for the program and one for its test suite. Blended files would still ship with the program, so a pruning step would still be needed, but this would put another obstacle in the path of an attacker.
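A rough Python sketch of the pruning tool I have in mind follows. The "machine-readable declaration" is assumed here to be nothing more than a list of glob patterns naming the test material; no such convention exists today, so treat both the format and the function name as hypothetical.

```python
import fnmatch
import shutil
from pathlib import Path


def prune_tests(src, dest, test_patterns):
    """Copy src to dest, leaving out every file that matches a declared test pattern.

    Only reads and copies files; nothing from the source tree is executed.
    Returns the sorted list of relative paths that were left out.
    """
    src, dest = Path(src), Path(dest)
    pruned = []
    for path in sorted(src.rglob("*")):
        if not path.is_file():
            continue
        rel = path.relative_to(src).as_posix()
        # fnmatch's "*" also matches "/", so "tests/*" covers nested directories.
        if any(fnmatch.fnmatch(rel, pattern) for pattern in test_patterns):
            pruned.append(rel)
            continue
        target = dest / rel
        target.parent.mkdir(parents=True, exist_ok=True)
        shutil.copy2(path, target)
    return pruned
```

The returned list is what the human auditor would review: it should contain everything that looks like test data and nothing that looks like program source. A real tool would also have to decide how to treat symbolic links and blended files, which this sketch ignores.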

Of course, nothing stops an attacker from hiding malicious code in the program itself. But it isn’t nearly as easy. The XZ attack payload was precompiled machine code, unreadable by humans, and hidden inside an xz-compressed file, which is a perfectly normal thing to have in XZ’s test suite. It is not normal to have unreadable files of any kind in the program itself, because an unreadable file is almost never going to be “the preferred form for making changes,” as the GPL puts it.

(Relatedly, last week, on the Automake mailing list, Jacob Bachmeyer suggested that the GNU style guidelines should forbid, or at least strongly discourage, shipping binary files of any kind in a source package, even for testing purposes. Avoiding binary files is a good idea for other reasons: test data, for example, needs to be documented just like any other component. (What is wrong with this invalid xz-compressed file? How is the decompressor expected to handle the corruption?) Generating such files from text-based templates gives you a place to put the documentation, and makes it easier to add more variants of the test in the future. However, converting an existing test suite that contains binary files may be a tremendous amount of work, which would take away developer time from other improvements that are probably more valuable (e.g. implementing property-based testing instead). Also, a ban on binary files would not prevent a determined insider attack from smuggling malicious object code inside the test suite, for instance using stegsnow.)

The XZ attack also makes it obvious that the process of creating source packages needs to be reproducible and verifiable. (A key piece of the XZ malware was included only in tarball releases; builds made directly from XZ’s Git repository would not be malicious. For reasons discussed below, we cannot abandon tarball releases, but we can make them not be a hiding place.) Simon Josefsson has put forward a concrete proposal for this, with a proof of concept implementation for one program. The next step is to adopt it in all relevant build tools (Automake, CMake, Meson, etc.). Josefsson’s experience indicates that changes to gnulib and gettext will also be needed, and probably there will be more. Verification of tarball reproducibility is within the scope of the reproducible builds project, but, in the long run, source distribution sites that accept uploads directly from developers (e.g. ftp.gnu.org, CPAN, CRAN, CTAN, PyPI, npm, crates.io…) should be verifying this step themselves.
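As a first approximation of what such verification involves, here is a Python sketch that lists every file in a release tarball that is missing from, or different from, a checkout of the corresponding release tag. It is not Josefsson’s proposal, just an illustration; the function name is mine, and it assumes the usual layout in which the whole tarball sits under one top-level directory.

```python
import tarfile
from pathlib import Path


def tarball_only_or_modified(tarball, checkout):
    """List tarball members that are absent from, or differ from, a VCS checkout.

    Assumes everything in the tarball sits under a single top-level directory
    (e.g. "pkg-1.0/"), which is stripped before comparing.
    """
    checkout = Path(checkout)
    suspicious = []
    with tarfile.open(tarball) as tar:
        for member in tar:
            if not member.isfile():
                continue
            # Drop the leading "pkg-1.0/" component, if there is one.
            rel = member.name.split("/", 1)[-1]
            data = tar.extractfile(member).read()
            local = checkout / rel
            if not local.is_file() or local.read_bytes() != data:
                suspicious.append(rel)
    return sorted(suspicious)
```

For an Autotools project this list will never be empty, because generated files such as “./configure” are in the tarball but not in version control. The point of making tarball creation reproducible is that every entry on the list can then be regenerated independently and compared, instead of being taken on trust.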

Hardening the Dynamic Loader

The SSH protocol does not use XZ compression. The XZ compression library was loaded into sshd’s address space because some Linux distributions patched sshd to use libsystemd, a utility library providing several unrelated functions, some of which do involve XZ compression. But the patched sshd did not use any of those functions, so how was the malicious code getting invoked at all? It used an obscure feature of GNU libc’s dynamic loader, indirect functions, to get itself called very early in process startup, while it was still possible to modify the PLT and GOT. By modifying these tables, it was able to intercept calls into a library that sshd does actively use, libcrypto (a general purpose cryptography library).

Indirect functions are very limited in what they can do, and people struggle to use them reliably even just for their original intended use. They work by calling arbitrary application code (the “resolver” for the function) from deep within the dynamic loader, which is risky to begin with. Worse, depending on system configuration, they might be called before symbol resolution is complete and before the core C library is fully initialized. (The XZ exploit actually relied on this; if I understand correctly, it would not have worked on a system using “lazy” symbol resolution. Ironically, “eager” symbol resolution was developed as a security hardening measure.)

The intended use of indirect functions is to allow a library to select one of several implementations of a single function, based on the characteristics of the CPU. For example, GNU libc itself uses indirect functions to select an implementation of memcpy that uses AVX instructions only when running on an x86 processor that has those instructions. I’m not aware of any other important use cases. It seems to me that it might be a good idea to scrap the indirect-function feature entirely and handle its intended use with a declarative mechanism for selecting function implementations based on CPU characteristics. For example, a library could supply an array of candidate implementations each paired with a bit vector that declares all of the CPU capabilities that that implementation requires. In “eager” symbol resolution mode, this would also mean that no application-controlled code can run before the PLT and GOT are made read-only, closing the path that the XZ exploit used to become active.
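The selection logic for such a declarative table is tiny. The Python sketch below only models the idea; a real version would live inside the dynamic loader and read the table from the library’s ELF metadata. The capability names and the function name are hypothetical.

```python
from enum import IntFlag


class Cpu(IntFlag):
    """Hypothetical capability bits; a real loader would fill these in from the CPU."""
    SSE2 = 1
    AVX = 2
    AVX2 = 4


def select_implementation(candidates, available):
    """Return the first candidate whose required capabilities are all available.

    candidates is a list of (required_bits, implementation) pairs, ordered from
    most to least preferred. Selection is a pure table lookup.
    """
    for required, implementation in candidates:
        if (required & available) == required:
            return implementation
    raise LookupError("no candidate implementation fits this CPU")
```

The library contributes only data, so the loader never has to call into it to make the choice. A table ending in an entry with no required bits always has a fallback.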

More generally (and much more ambitiously), the XZ exploit demonstrates that the dynamic loader is security-critical and probably shouldn’t be in the same address space as the code it loads. I wonder what it would take to make the dynamic loader into a daemon, started as the first user space process after kernel initialization. It would need the ability to alter virtual memory mappings of processes other than itself, but I don’t think any other new system calls would be needed. Another technical challenge would be completely separating the loader from the C library: right now, each C library (glibc, musl, bionic, etc.) has its own loader that’s tightly coupled with the rest of the C runtime. We would need to define a stable IPC interface that covers everything the loader needs to be involved with after process startup is complete. I probably haven’t thought of all the problems.

Still, if we could do it, I think it would solve any number of other problems besides the immediate headache of malware messing with the PLT. It might well be easier to add new types of dynamic relocations. It would definitely be easier for new system programming languages to support dynamic linkage while still avoiding any dependency on the rest of the C runtime. Startup of dynamically and statically linked programs would no longer take radically different paths. The guts of the C library would no longer need to worry about the possibility of running before symbol resolution was complete. Possibly, most of the kernel’s executable loader could be moved into the dynamic loader. (I may be the last person in the world who still thinks it’s a good idea to minimize the code size of the kernel, but I’m not going to stop.)

Autoconf and the Need for a Better Shell

Remember I said a key piece of the XZ malware was included only in tarball releases, and we needed to make tarball generation reproducible? You might now be wondering why there’s any difference at all between the contents of a release tarball and the source tree in the project’s VCS that was marked with the release tag.

There doesn’t have to be a difference, but many projects add extra files to their tarball releases to make life easier for people who want to use the program but have no intention of hacking on it. One of the most common cases of this, and the one that’s relevant to the XZ affair, is the “Autotools suite”: Automake, which I already mentioned, plus Autoconf, Libtool, and a few more pieces that aren’t always needed. If you’re not familiar with these, you can think of them as compilers for build scripts. The scripts they generate are machine-independent, but they aren’t intended to be edited by hand.

Traditionally, these scripts are not checked into version control, but they are included in tarball releases, so you don’t need to install the Autotools suite before you can build the thing you actually wanted. This is particularly important because the Autotools are written in Perl and M4. Perl is large (1.2 million lines of code as of the 5.38.2 release) and not easy to set up by hand. GNU’s version of M4 uses Autoconf and Automake to build, so you couldn’t install it on a system that doesn’t already have them if the compiled build scripts weren’t included in the M4 tarball.

Autoconf, in particular, is a complete programming language, but one that nobody loves. It’s an awkward combination of M4 macros and Bourne shell. Bourne shell by itself is infamously hard to read, especially if you are trying to be as portable as possible (which is usually what you want in an Autoconf script). Layering M4 on top of this only makes it worse. The XZ attacker took advantage of this by slipping that key piece of their malware into one of the source files for XZ’s Autoconf script. They didn’t check that modification into version control, thus the only trace of it was in the compiled “./configure” included in the tarball, which they expected no one would look at. Even if they had checked in the modified Autoconf source, it’s likely that no one would have wanted to look at that, either. (If you want a taste of what it’s like to read Autoconf source code, here is the original version of the file that the XZ attacker modified.)

I wrote some notes about the XZ exploit on my hackers.town account right after the news broke. One of the things I said then was: It’s possible to write incomprehensible, underhanded code in any programming language. There’s competitions for it, even. But when you have a programming language, or perhaps a mashup of two languages, that everyone expects not to be able to understand—no matter how careful the author is—well, then you have what we might call an attractive nuisance. And when blobs of code in that language are passed around in copy-and-paste fashion without much review or testing or version control, that makes it an even easier target.

So maybe we all ought to be using something else? There are several popular alternatives, the most prominent being CMake and Meson. However, rewriting one’s build scripts can be an enormous task. I currently don’t think any existing alternative is enough of an improvement to warrant the switching costs. None of the alternatives support cross-compilation as well as Autoconf does. The CMake language can be just as confusing as Autoconf’s language. Meson is generally agreed to be nicer to work with than either CMake or Autoconf, but it requires you to have Python available when you build things; it has no equivalent of Autoconf’s compiled build scripts. Python is just as big as Perl, and just as difficult to set up by hand if you’re building from source. Meson is also, intentionally, not a complete programming language, which means that you can get 90% of the way through a conversion and then discover that there’s no clean way to finish the job. (This actually happened to me with libxcrypt.)

In my earlier notes I brought up the “half-baked idea” of weaning Autoconf off of M4; you would write the configure script by hand, in ordinary (but still portable) Bourne shell. All the existing M4 macros would be turned into shell functions. There would still be a build step for tarballs, but all it would do is copy the shell functions you used out of the Autoconf installation into your tarball. The biggest problem with this idea is that one is still stuck using portable Bourne shell. It’s not enough of an improvement; in particular, it would still be easy for a malicious insider to hide something in the configure script. And it doesn’t help at all with the part of the job that’s done by Automake. Automake generates portable Makefiles, which are even more annoying to work with than portable shell scripts. Let’s not talk about Libtool.

(Is Autoconf, or something like it, still even necessary? The short answer is yes and the long answer is another 3000-word essay by itself.)

If we want to make it easier to review build scripts (which is the only thing that would raise the bar for an attacker) I think we must take a step back and think hard about what a good shell language would actually be like. Bourne shell is awful, but C shell is worse. fish, rc, and ysh are better, but in my opinion, they still aren’t good. A big piece of why not, I think, is that a shell language is only partly defined by its interpreter. We also need to think about the ergonomics of all the “command line utilities” that constitute the shell’s runtime library (sed, grep, expr, test, etc). Nobody’s seriously tried to do that for as long as I can remember.