Andrew Kelley - `zig cc`: a Powerful Drop-In Replacement for GCC/Clang (2020 Mar 24)

`zig cc`: a Powerful Drop-In Replacement for GCC/Clang

If you have heard of Zig before, you may know it as a promising new programming language which is ambitiously trying to overthrow C as the de facto systems language. But did you know that it also can straight up compile C code?

This has been possible for a while, and you can see some examples of this on the home page. What's new is that the zig cc sub-command is available, and it supports the same options as Clang, which, in turn, supports the same options as GCC.

Now, I'm sure you're feeling pretty skeptical right about now, so let me hook you real quick before I get into the juicy details.

Clang and GCC cannot do this:

andy@ark ~/tmp> cat hello.c
#include <stdio.h>

int main(int argc, char **argv) {
    fprintf(stderr, "Hello, World!\n");
    return 0;
}
andy@ark ~/tmp> clang -o hello.exe hello.c -target x86_64-windows-gnu
clang-7: warning: argument unused during compilation: '--gcc-toolchain=/nix/store/ificps9si1nvz85f9xa7gjd9h6r5lzg6-gcc-9.2.0' [-Wunused-command-line-argument]
/nix/store/7bhi29ainf5rjrk7k7wyhndyskzyhsxh-binutils-2.31.1/bin/ld: unrecognised emulation mode: i386pep
Supported emulations: elf_x86_64 elf32_x86_64 elf_i386 elf_iamcu elf_l1om elf_k1om
clang-7: error: linker command failed with exit code 1 (use -v to see invocation)
andy@ark ~/tmp> clang -o hello hello.c -target mipsel-linux-musl
In file included from hello.c:1:
In file included from /nix/store/8pp3i3hcp7bv0f8jllzqq7gcp9dbzvp9-glibc-2.27-dev/include/stdio.h:27:
In file included from /nix/store/8pp3i3hcp7bv0f8jllzqq7gcp9dbzvp9-glibc-2.27-dev/include/bits/libc-header-start.h:33:
In file included from /nix/store/8pp3i3hcp7bv0f8jllzqq7gcp9dbzvp9-glibc-2.27-dev/include/features.h:452:
/nix/store/8pp3i3hcp7bv0f8jllzqq7gcp9dbzvp9-glibc-2.27-dev/include/gnu/stubs.h:7:11: fatal error: 
      'gnu/stubs-32.h' file not found
# include <gnu/stubs-32.h>
          ^~~~~~~~~~~~~~~~
1 error generated.
andy@ark ~/tmp> clang -o hello hello.c -target aarch64-linux-gnu
In file included from hello.c:1:
In file included from /nix/store/8pp3i3hcp7bv0f8jllzqq7gcp9dbzvp9-glibc-2.27-dev/include/stdio.h:27:
In file included from /nix/store/8pp3i3hcp7bv0f8jllzqq7gcp9dbzvp9-glibc-2.27-dev/include/bits/libc-header-start.h:33:
In file included from /nix/store/8pp3i3hcp7bv0f8jllzqq7gcp9dbzvp9-glibc-2.27-dev/include/features.h:452:
/nix/store/8pp3i3hcp7bv0f8jllzqq7gcp9dbzvp9-glibc-2.27-dev/include/gnu/stubs.h:7:11: fatal error: 
      'gnu/stubs-32.h' file not found
# include <gnu/stubs-32.h>
          ^~~~~~~~~~~~~~~~
1 error generated.

`zig cc` can:

andy@ark ~/tmp> zig cc -o hello.exe hello.c -target x86_64-windows-gnu
andy@ark ~/tmp> wine64 hello.exe
Hello, World!
andy@ark ~/tmp> zig cc -o hello hello.c -target mipsel-linux-musl
andy@ark ~/tmp> qemu-mipsel ./hello
Hello, World!
andy@ark ~/tmp> zig cc -o hello hello.c -target aarch64-linux-gnu
andy@ark ~/tmp> qemu-aarch64 -L ~/Downloads/glibc/multi-2.31/install/glibcs/aarch64-linux-gnu ./hello
Hello, World!

Features of `zig cc`

zig cc is not the main purpose of the Zig project. It merely exposes the already-existing capabilities of the Zig compiler via a small frontend layer that parses C compiler options.

Install simply by unzipping a tarball

Zig is an open source project, and of course can be built and installed from source the usual way. However, the Zig project also has tarballs available on the download page. You can download a 45 MiB tarball, unpack it, and you're done. You can even have multiple versions at the same time, no problem.

Here, rather than downloading the x86_64-linux version, which matches the computer I am currently using, I'll download the Windows version and run it in Wine to show how simple installation is:

andy@ark ~/tmp> wget --quiet https://ziglang.org/builds/zig-windows-x86_64-0.5.0+13d04f996.zip
andy@ark ~/tmp> unzip -q zig-windows-x86_64-0.5.0+13d04f996.zip 
andy@ark ~/tmp> wine64 ./zig-windows-x86_64-0.5.0+13d04f996/zig.exe cc -o hello hello.c -target x86_64-linux
andy@ark ~/tmp> ./hello
Hello, World!

Take a moment to appreciate what just happened here - I downloaded a Windows build of Zig, ran it in Wine, using it to cross compile for Linux, and then ran the binary natively. Computers are fun!

Compare this to downloading Clang, which has 380 MiB Linux-distribution-specific tarballs. Zig's Linux tarballs are fully statically linked, and therefore work correctly on all Linux distributions. The size difference here comes because the Clang tarball ships with more utilities than a C compiler, as well as pre-compiled static libraries for both LLVM and Clang. Zig does not ship with any pre-compiled libraries; instead it ships with source code, and builds what it needs on-the-fly.

Caching System

The Zig compiler uses a sophisticated caching system to avoid needlessly rebuilding artifacts. I carefully designed this caching system to make optimal use of the file system while maintaining correct semantics - which is trickier than you might think!

The caching system uses a combination of hashing inputs and checking the fstat values of file paths, while being mindful of mtime granularity. This makes it avoid needlessly hashing files, while at the same time detecting when a modified file has the same contents. It always has correct behavior, whether the file system has nanosecond mtime granularity, second granularity, always sets mtime to zero, or anything in between.
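To make the interplay between stat values and hashing more concrete, here is a minimal sketch in C. It is not Zig's implementation: the struct layouts, the function names, the explicit granularity parameter, and the rule "an mtime is untrustworthy if it is zero or falls within one clock tick of the moment the file was hashed" are all assumptions made for illustration, and FNV-1a merely stands in for whatever hash a real cache would use.

```c
#include <stddef.h>
#include <stdint.h>

/* Values one would read from fstat(); the field names are hypothetical. */
struct file_stamp {
    uint64_t inode;
    int64_t size;
    int64_t mtime_ns;
};

struct cache_entry {
    struct file_stamp stamp; /* stat values seen when the file was hashed */
    int64_t hashed_at_ns;    /* clock reading when the file was hashed */
    uint64_t content_hash;
};

/* FNV-1a, used here only as a stand-in hash function. */
uint64_t hash_bytes(const unsigned char *data, size_t len) {
    uint64_t h = 14695981039346656037ULL;
    for (size_t i = 0; i < len; i++) {
        h ^= data[i];
        h *= 1099511628211ULL;
    }
    return h;
}

/* Assumed rule: an mtime cannot be trusted if it is zero, or if it is so
   close to the time of hashing that a later write could have landed in
   the same clock tick without changing it. */
int mtime_is_ambiguous(const struct cache_entry *e, int64_t granularity_ns) {
    return e->stamp.mtime_ns == 0 ||
           e->stamp.mtime_ns + granularity_ns > e->hashed_at_ns;
}

/* Returns 1 on a cache hit. *did_hash reports whether the file contents
   had to be hashed to reach that decision. */
int cache_hit(const struct cache_entry *e, const struct file_stamp *now,
              int64_t granularity_ns, const unsigned char *data, size_t len,
              int *did_hash) {
    *did_hash = 0;
    if (!mtime_is_ambiguous(e, granularity_ns) &&
        e->stamp.inode == now->inode &&
        e->stamp.size == now->size &&
        e->stamp.mtime_ns == now->mtime_ns) {
        return 1; /* stat values match and can be trusted: skip hashing */
    }
    *did_hash = 1; /* fall back to comparing content hashes */
    return hash_bytes(data, len) == e->content_hash;
}
```

In this sketch an unchanged file is accepted without being read, a file whose mtime changed but whose bytes did not is accepted after one hash, and an edited file is rejected.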

You can find a detailed description of the caching system in the 0.4.0 release notes.

zig cc makes this caching system available when compiling C code. For simple enough projects, this obviates the need for a Makefile or other build system.

andy@ark ~/tmp> cat foo.c
#include <stdio.h>

#include "another_file.c"

int main(int argc, char **argv) {
#include "printf_many_times.c"
}
andy@ark ~/tmp> cat another_file.c 
void another(void) {}
andy@ark ~/tmp> time zig cc -c foo.c
0.12
andy@ark ~/tmp> time zig cc -c foo.c
0.01
andy@ark ~/tmp> touch another_file.c 
andy@ark ~/tmp> time zig cc -c foo.c
0.01
andy@ark ~/tmp> echo "/* add a comment */" >>another_file.c
andy@ark ~/tmp> time zig cc -c foo.c
0.12
andy@ark ~/tmp> time zig cc -c foo.c
0.01

Here you can see the caching system is smart enough to find dependencies that are included with the preprocessor, and smart enough to avoid a full rebuild when the mtime of another_file.c was updated.

One last thing before I move on. I want to point out that this caching system is not some fluffy bloated feature - rather it is an absolutely critical component to making cross-compiling work in a usable manner. As we'll see below, other compilers ship with pre-compiled, target-specific binaries, while Zig ships with source code only and cross-compiles on-the-fly, caching the result.

Cross Compiling

I have carefully designed Zig since the very beginning to treat cross compilation as a first class use case. Now that the zig cc frontend is available, it brings these capabilities to C code.

I showed you above cross-compiling some simple "Hello, World!" programs. But now let's try a real-world C project.

Let's try LuaJIT!

[~/Downloads]$ git clone https://github.com/LuaJIT/LuaJIT
[~/Downloads]$ cd LuaJIT
[~/Downloads/LuaJIT]$ ls
COPYRIGHT  doc  dynasm  etc  Makefile  README  src

OK so it uses standard Makefiles. Here we go, first let's make sure it works natively with zig cc.

[~/Downloads/LuaJIT]$ export CC="zig cc"
[~/Downloads/LuaJIT]$ make CC="$CC"
==== Building LuaJIT 2.1.0-beta3 ====
make -C src
make[1]: Entering directory '/home/andy/Downloads/LuaJIT/src'
HOSTCC    host/minilua.o
HOSTLINK  host/minilua
DYNASM    host/buildvm_arch.h
HOSTCC    host/buildvm.o
HOSTCC    host/buildvm_asm.o
HOSTCC    host/buildvm_peobj.o
HOSTCC    host/buildvm_lib.o
HOSTCC    host/buildvm_fold.o
HOSTLINK  host/buildvm
BUILDVM   lj_vm.S
ASM       lj_vm.o
CC        lj_gc.o
BUILDVM   lj_ffdef.h
CC        lj_err.o
CC        lj_char.o
BUILDVM   lj_bcdef.h
CC        lj_bc.o
CC        lj_obj.o
CC        lj_buf.o
CC        lj_str.o
CC        lj_tab.o
CC        lj_func.o
CC        lj_udata.o
CC        lj_meta.o
CC        lj_debug.o
CC        lj_state.o
CC        lj_dispatch.o
CC        lj_vmevent.o
CC        lj_vmmath.o
CC        lj_strscan.o
CC        lj_strfmt.o
CC        lj_strfmt_num.o
CC        lj_api.o
CC        lj_profile.o
CC        lj_lex.o
CC        lj_parse.o
CC        lj_bcread.o
CC        lj_bcwrite.o
CC        lj_load.o
CC        lj_ir.o
CC        lj_opt_mem.o
BUILDVM   lj_folddef.h
CC        lj_opt_fold.o
CC        lj_opt_narrow.o
CC        lj_opt_dce.o
CC        lj_opt_loop.o
CC        lj_opt_split.o
CC        lj_opt_sink.o
CC        lj_mcode.o
CC        lj_snap.o
CC        lj_record.o
CC        lj_crecord.o
BUILDVM   lj_recdef.h
CC        lj_ffrecord.o
CC        lj_asm.o
CC        lj_trace.o
CC        lj_gdbjit.o
CC        lj_ctype.o
CC        lj_cdata.o
CC        lj_cconv.o
CC        lj_ccall.o
CC        lj_ccallback.o
CC        lj_carith.o
CC        lj_clib.o
CC        lj_cparse.o
CC        lj_lib.o
CC        lj_alloc.o
CC        lib_aux.o
BUILDVM   lj_libdef.h
CC        lib_base.o
CC        lib_math.o
CC        lib_bit.o
CC        lib_string.o
CC        lib_table.o
CC        lib_io.o
CC        lib_os.o
CC        lib_package.o
CC        lib_debug.o
CC        lib_jit.o
CC        lib_ffi.o
CC        lib_init.o
AR        libluajit.a
CC        luajit.o
BUILDVM   jit/vmdef.lua
DYNLINK   libluajit.so
LINK      luajit
warning: unsupported linker arg: -E
OK        Successfully built LuaJIT
make[1]: Leaving directory '/home/andy/Downloads/LuaJIT/src'
==== Successfully built LuaJIT 2.1.0-beta3 ====

[~/Downloads/LuaJIT]$ ls
COPYRIGHT  doc  dynasm  etc  Makefile  README  src

[~/Downloads/LuaJIT]$ ./src/
host/         jit/          libluajit.so  luajit        zig-cache/    

[~/Downloads/LuaJIT]$ ./src/luajit 
LuaJIT 2.1.0-beta3 -- Copyright (C) 2005-2020 Mike Pall. http://luajit.org/
JIT: ON SSE2 SSE3 SSE4.1 BMI2 fold cse dce fwd dse narrow loop abc sink fuse
> print(3 + 4)
7
> 

OK so that worked. Now for the real test - can we make it cross compile?

[~/Downloads/LuaJIT]$ git clean -xfdq
[~/Downloads/LuaJIT]$ export CC="zig cc -target aarch64-linux-gnu"
[~/Downloads/LuaJIT]$ export HOST_CC="zig cc"
[~/Downloads/LuaJIT]$ make CC="$CC" HOST_CC="$HOST_CC" TARGET_STRIP="echo"
==== Building LuaJIT 2.1.0-beta3 ====
make -C src
make[1]: Entering directory '/home/andy/Downloads/LuaJIT/src'
HOSTCC    host/minilua.o
HOSTLINK  host/minilua
DYNASM    host/buildvm_arch.h
HOSTCC    host/buildvm.o
HOSTCC    host/buildvm_asm.o
HOSTCC    host/buildvm_peobj.o
HOSTCC    host/buildvm_lib.o
HOSTCC    host/buildvm_fold.o
HOSTLINK  host/buildvm
BUILDVM   lj_vm.S
ASM       lj_vm.o
CC        lj_gc.o
BUILDVM   lj_ffdef.h
CC        lj_err.o
CC        lj_char.o
BUILDVM   lj_bcdef.h
CC        lj_bc.o
CC        lj_obj.o
CC        lj_buf.o
CC        lj_str.o
CC        lj_tab.o
CC        lj_func.o
CC        lj_udata.o
CC        lj_meta.o
CC        lj_debug.o
CC        lj_state.o
CC        lj_dispatch.o
CC        lj_vmevent.o
CC        lj_vmmath.o
CC        lj_strscan.o
CC        lj_strfmt.o
CC        lj_strfmt_num.o
CC        lj_api.o
CC        lj_profile.o
CC        lj_lex.o
CC        lj_parse.o
CC        lj_bcread.o
CC        lj_bcwrite.o
CC        lj_load.o
CC        lj_ir.o
CC        lj_opt_mem.o
BUILDVM   lj_folddef.h
CC        lj_opt_fold.o
CC        lj_opt_narrow.o
CC        lj_opt_dce.o
CC        lj_opt_loop.o
CC        lj_opt_split.o
CC        lj_opt_sink.o
CC        lj_mcode.o
CC        lj_snap.o
CC        lj_record.o
CC        lj_crecord.o
BUILDVM   lj_recdef.h
CC        lj_ffrecord.o
CC        lj_asm.o
CC        lj_trace.o
CC        lj_gdbjit.o
CC        lj_ctype.o
CC        lj_cdata.o
CC        lj_cconv.o
CC        lj_ccall.o
CC        lj_ccallback.o
CC        lj_carith.o
CC        lj_clib.o
CC        lj_cparse.o
CC        lj_lib.o
CC        lj_alloc.o
CC        lib_aux.o
BUILDVM   lj_libdef.h
CC        lib_base.o
CC        lib_math.o
CC        lib_bit.o
CC        lib_string.o
CC        lib_table.o
CC        lib_io.o
CC        lib_os.o
CC        lib_package.o
CC        lib_debug.o
CC        lib_jit.o
CC        lib_ffi.o
CC        lib_init.o
AR        libluajit.a
CC        luajit.o
BUILDVM   jit/vmdef.lua
DYNLINK   libluajit.so
libluajit.so
LINK      luajit
warning: unsupported linker arg: -E
luajit
OK        Successfully built LuaJIT
make[1]: Leaving directory '/home/andy/Downloads/LuaJIT/src'
==== Successfully built LuaJIT 2.1.0-beta3 ====

[~/Downloads/LuaJIT]$ file ./src/luajit 
./src/luajit: ELF 64-bit LSB executable, ARM aarch64, version 1 (SYSV), dynamically linked, interpreter /lib/ld-linux-aarch64.so.1, for GNU/Linux 2.0.0, with debug_info, not stripped

It worked! Will it run in QEMU though?

[~/Downloads/LuaJIT]$ qemu-aarch64 -L ~/Downloads/glibc/multi-2.31/install/glibcs/aarch64-linux-gnu ./src/luajit
LuaJIT 2.1.0-beta3 -- Copyright (C) 2005-2020 Mike Pall. http://luajit.org/
JIT: ON fold cse dce fwd dse narrow loop abc sink fuse
> print(4 + 3)
7
> 

Amazing. QEMU never fails to impress me.

Before we move on, I want to show one more thing. You can see above, in order to run the foreign-architecture binary, I had to pass -L ~/Downloads/glibc/multi-2.31/install/glibcs/aarch64-linux-gnu. This is due to the binary being dynamically linked. You can confirm this with the output from file above where it says: dynamically linked, interpreter /lib/ld-linux-aarch64.so.1

Often, when cross-compiling, it is useful to make a static binary. In the case of Linux, for example, this will make the resulting binary able to run on any Linux distribution, rather than only ones with a hard-coded glibc dynamic linker path of /lib/ld-linux-aarch64.so.1.

We can accomplish this by targeting musl rather than glibc:

[~/Downloads/LuaJIT]$ git clean -qxfd
[~/Downloads/LuaJIT]$ export CC="zig cc -target aarch64-linux-musl"
[~/Downloads/LuaJIT]$ make CC="$CC" CXX="$CXX" HOST_CC="$HOST_CC" TARGET_STRIP="echo"
==== Building LuaJIT 2.1.0-beta3 ====
(same output)
==== Successfully built LuaJIT 2.1.0-beta3 ====
[~/Downloads/LuaJIT]$ file src/luajit
src/luajit: ELF 64-bit LSB executable, ARM aarch64, version 1 (SYSV), statically linked, not stripped
[~/Downloads/LuaJIT]$ qemu-aarch64 ./src/luajit
LuaJIT 2.1.0-beta3 -- Copyright (C) 2005-2020 Mike Pall. http://luajit.org/
JIT: ON fold cse dce fwd dse narrow loop abc sink fuse
> print(11 + 22)
33

Here you can see the file command reported statically linked, and in the qemu command, the -L parameter was not needed.

Use Cases of `zig cc`

Alright, so I've given you a taste of what zig cc can do, but now I will list explicitly what I consider to be the use cases:

Experimentation

Sometimes you just want a tool that you can use to try out different things. It can quickly answer questions such as "What assembly does this code generate on MIPS vs ARM?". The widely popular Compiler Explorer serves this purpose.

zig cc provides a lightweight tool which can also answer questions such as, "What happens if I swap out glibc for musl?" and "How big is this executable when cross-compiled for Windows?". Here's me using Zig to quickly find out what the maximum UDP packet size is on Linux.

Since Zig is so easy to install - and it actually works everywhere without patches, even on Linux distributions such as NixOS - it can often be a more convenient tool for running quick C test programs on your computer.

At the time of this writing, LLVM 10 was released just two hours ago. It will take days or weeks for it to become available in various system package managers. But you can already download a master branch build of Zig and play with the new features of Clang/LLVM 10. For example, improved RISC-V support!

andy@ark ~/tmp> zig cc -o hello hello.c -target riscv64-linux-musl
andy@ark ~/tmp> qemu-riscv64 ./hello
Hello, World!

Bundling a C compiler as part of a larger project

With tarballs weighing in at under 45 MiB, zero system dependencies, no configuration, and an MIT license, Zig makes for an ideal candidate when you need to bundle a C compiler along with another project.

For example, maybe you have a programming language that compiles to C. Zig is an obvious choice for what C compiler to ship with your language.

Or maybe you want to make a batteries-included IDE that ships with a compiler.

Lightweight alternative to a cross compilation environment

If you're trying to build something with a large dependency tree, you'll probably want to use a full cross compilation environment, such as mxe.cc or musl.cc.

But if you don't need such a sledgehammer, zig cc could be a useful alternative, especially if your goal is to compile for N different targets. Consider that musl.cc lists different tarballs for each architecture, each weighing in at roughly 85 MiB. Meanwhile Zig weighs in at 45 MiB and it supports all those architectures, plus glibc and Windows.

An alternative to installing MSVC on Windows

You could spend days - literally! - waiting for Microsoft Visual Studio to install, or you could install Zig and VS Code in a matter of minutes.

Under the Hood

If zig cc is built on top of Clang, why doesn't Clang just do this? What exactly is Zig doing on top of Clang to make this work?

The answer is, a lot, actually. I'll go over how it works here.

compiler-rt

compiler-rt is a library that provides "polyfill" implementations of language-supported features when the target does not have machine code instructions for them. For example, compiler-rt has the function __muldi3 to perform signed 64-bit integer multiplication on architectures that do not have a 64-bit wide integer multiplication instruction.
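As a rough illustration of what such a polyfill does, the sketch below builds a 64-bit multiplication out of 32-bit halves. It is not the compiler-rt source: the names sketch_mul64 and sketch_muldi3 are made up, and the code assumes the target offers a widening 32x32 -> 64 multiply, written here as a cast followed by `*`.

```c
#include <stdint.h>

/* Low 64 bits of a * b, computed from 32-bit halves.
   a * b = a_lo*b_lo + ((a_lo*b_hi + a_hi*b_lo) << 32) + (a_hi*b_hi << 64),
   and the last term falls entirely outside a 64-bit result. */
uint64_t sketch_mul64(uint64_t a, uint64_t b) {
    uint32_t a_lo = (uint32_t)a, a_hi = (uint32_t)(a >> 32);
    uint32_t b_lo = (uint32_t)b, b_hi = (uint32_t)(b >> 32);

    /* The only place the full 64-bit width of a partial product is needed. */
    uint64_t low = (uint64_t)a_lo * b_lo;

    /* Only the low 32 bits of the cross terms survive the shift by 32. */
    uint32_t cross = (uint32_t)((uint64_t)a_lo * b_hi + (uint64_t)a_hi * b_lo);

    return low + ((uint64_t)cross << 32);
}

/* In two's complement the low 64 bits of a signed product equal those of
   the unsigned product, so the signed version is a thin wrapper. */
int64_t sketch_muldi3(int64_t a, int64_t b) {
    return (int64_t)sketch_mul64((uint64_t)a, (uint64_t)b);
}
```

A compiler targeting a 32-bit machine can emit a call to a routine like this wherever the C source multiplies two 64-bit integers.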

In the GNU world, compiler-rt is named libgcc.

Most C compilers ship with this library pre-built for the target. For example, on an Ubuntu (Bionic) system, with the build-essential package installed, you can find this at /lib/x86_64-linux-gnu/libgcc_s.so.1.

If you download clang+llvm-9.0.1-x86_64-linux-gnu-ubuntu-16.04.tar.xz and take a look around, clang actually does not even ship with compiler-rt. Instead, it relies on the system libgcc noted above. This is one reason that this tarball is Ubuntu-specific and does not work on other Linux distributions, FreeBSD's Linuxulator, or WSL, which have system files in different locations.

Zig's strategy with compiler-rt is that we have our own implementation of this library, written in Zig. Most of it is ported from LLVM's compiler-rt project, but we also have some of our own improvements on top of this.

Anyway, rather than depending on system compiler-rt being installed, or shipping a pre-compiled library, Zig ships its compiler-rt in source form, and lazily builds compiler-rt for the compilation target, and then caches the result using the caching system discussed above.

Zig's compiler-rt is not yet complete. However, completing it is a prerequisite for releasing Zig version 1.0.0.

libc

When C code calls printf, printf has to be implemented somewhere, and that somewhere is libc.

Some operating systems, such as FreeBSD and macOS, have a designated system libc, and it is the kernel syscall interface. On others, such as Windows and Linux, libc is optional, and therefore there are multiple options of which libc to use, if any.

As of the time of this writing, Zig can provide libcs for the following targets:

andy@ark ~> zig targets | jq .libc
[
  "aarch64_be-linux-gnu",
  "aarch64_be-linux-musl",
  "aarch64_be-windows-gnu",
  "aarch64-linux-gnu",
  "aarch64-linux-musl",
  "aarch64-windows-gnu",
  "armeb-linux-gnueabi",
  "armeb-linux-gnueabihf",
  "armeb-linux-musleabi",
  "armeb-linux-musleabihf",
  "armeb-windows-gnu",
  "arm-linux-gnueabi",
  "arm-linux-gnueabihf",
  "arm-linux-musleabi",
  "arm-linux-musleabihf",
  "arm-windows-gnu",
  "i386-linux-gnu",
  "i386-linux-musl",
  "i386-windows-gnu",
  "mips64el-linux-gnuabi64",
  "mips64el-linux-gnuabin32",
  "mips64el-linux-musl",
  "mips64-linux-gnuabi64",
  "mips64-linux-gnuabin32",
  "mips64-linux-musl",
  "mipsel-linux-gnu",
  "mipsel-linux-musl",
  "mips-linux-gnu",
  "mips-linux-musl",
  "powerpc64le-linux-gnu",
  "powerpc64le-linux-musl",
  "powerpc64-linux-gnu",
  "powerpc64-linux-musl",
  "powerpc-linux-gnu",
  "powerpc-linux-musl",
  "riscv64-linux-gnu",
  "riscv64-linux-musl",
  "s390x-linux-gnu",
  "s390x-linux-musl",
  "sparc-linux-gnu",
  "sparcv9-linux-gnu",
  "wasm32-freestanding-musl",
  "x86_64-linux-gnu",
  "x86_64-linux-gnux32",
  "x86_64-linux-musl",
  "x86_64-windows-gnu"
]

In order to provide libc on these targets, Zig ships with a subset of the source files for three projects: glibc, musl, and mingw-w64.

For each libc, there is a process for upgrading to a new release. This process is a sort of pre-processing step. We still end up with source files, but we de-duplicate non-multi-arch source files into multi-arch source files.

glibc

glibc is the most involved. The first step is building glibc for every target that it supports, which takes upwards of 24 hours and 74 GiB of disk space.

From here, the process_headers tool inspects all the header files from all the targets, and identifies which files are the same across all targets, and which header files are target-specific. They are then sorted into the corresponding directories in Zig's source tree.

Additionally, Linux header files are not included in glibc, and so the same process is applied to them, with their own set of directories.

That takes care of the header files, but now we have the problem of dynamic linking against glibc, without touching any system files.

For this, we have the update_glibc tool. Given the path to the glibc source directory, it finds all the .abilist text files and uses them to produce three simple but crucial files.

Together, these files amount to only 192 KB (27 KB gzipped), and they allow Zig to target any version of glibc.

Yes, I did not make a typo there. Zig can target any of the 42 versions of glibc for any of the architectures listed above. I'll show you:

andy@ark ~/tmp> cat rand.zig 
const std = @import("std");

pub fn main() anyerror!void {
    var buf: [10]u8 = undefined;
    _ = std.c.getrandom(&buf, buf.len, 0);
    std.debug.warn("random bytes: {x}\n", .{buf});
}
andy@ark ~/tmp> zig build-exe rand.zig -lc -target native-native-gnu.2.25
andy@ark ~/tmp> ./rand
random bytes: e2059382afb599ea6d29
andy@ark ~/tmp> zig build-exe rand.zig -lc -target native-native-gnu.2.24
lld: error: undefined symbol: getrandom
>>> referenced by rand.zig:5 (/home/andy/tmp/rand.zig:5)
>>>               ./rand.o:(main.0)

Sure enough, if you look at the man page for getrandom, it says:

Support was added to glibc in version 2.25.

When no explicit glibc version is requested, and the target OS is the native (host) OS, Zig detects the native glibc version by inspecting the Zig executable's own dynamically linked libraries, looking for glibc, and checking the version. It turns out you can look for libc.so.6 and then readlink on that, and it will look something like libc-2.27.so. When this strategy does not work, Zig looks at /usr/bin/env, looking for the same thing. Since this file path is hard-coded into countless shebang lines, it's a pretty safe bet to find out the dynamic linker path and glibc version (if any) of the native system!
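The string-handling half of that detection can be sketched in a few lines of C. This is only an illustration, not Zig's code: the struct and the function name parse_glibc_soname are hypothetical, and the sketch assumes the name has already been obtained (for example via readlink) and has the libc-2.27.so shape mentioned above.

```c
#include <stdio.h>

struct glibc_version {
    int major;
    int minor;
};

/* Returns 1 and fills *out when name has the form "libc-<major>.<minor>.so",
   otherwise returns 0 and leaves *out untouched. */
int parse_glibc_soname(const char *name, struct glibc_version *out) {
    int major = 0, minor = 0, consumed = 0;

    /* %n records how many characters were matched, but only if the
       trailing ".so" literal was matched as well. */
    if (sscanf(name, "libc-%d.%d.so%n", &major, &minor, &consumed) != 2)
        return 0;
    if (consumed == 0 || name[consumed] != '\0')
        return 0;

    out->major = major;
    out->minor = minor;
    return 1;
}
```

Given "libc-2.27.so" this yields major 2 and minor 27, while a name such as "libc.so.6" is rejected, which is the point at which a fallback strategy would take over.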

zig cc currently does not provide a way to choose a specific glibc version (because C compilers do not provide a way), and so Zig chooses the native version for compiling natively, and the default (2.17) for cross-compiling. However, I'm sure this problem can be solved, even when using zig cc. For example, maybe it could support an environment variable, or simply introduce an extra command line option that does not conflict with any Clang options.

When you request a certain version of glibc, Zig uses those text files noted above to create dummy .so files to link against, which contain exactly the correct set of symbols (with appropriate name mangling) based on the requested version. The symbols will be resolved at runtime, by the dynamic linker on the target platform.
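The selection step can be pictured as a filter over a table of symbols and the glibc version that introduced each one. The sketch below is a guess at the general shape only: the table layout and the function name symbol_in_stub are hypothetical, and the single entry is the one data point this article provides (getrandom, added in 2.25).

```c
#include <stddef.h>
#include <string.h>

struct versioned_symbol {
    const char *name;
    int major; /* glibc version that introduced the symbol */
    int minor;
};

/* Hypothetical excerpt; a real table would cover every exported symbol. */
static const struct versioned_symbol symbol_table[] = {
    { "getrandom", 2, 25 },
};

/* Returns 1 if a dummy library for the requested glibc version should
   export the symbol, 0 if it is too new or unknown. */
int symbol_in_stub(const char *name, int major, int minor) {
    size_t count = sizeof symbol_table / sizeof symbol_table[0];
    for (size_t i = 0; i < count; i++) {
        const struct versioned_symbol *s = &symbol_table[i];
        if (strcmp(s->name, name) != 0)
            continue;
        /* Compare (major, minor) pairs lexicographically. */
        return major > s->major || (major == s->major && minor >= s->minor);
    }
    return 0;
}
```

With this table, targeting 2.25 exports getrandom and targeting 2.24 does not, which matches the undefined symbol error in the transcript above.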

In this way, most of libc in the glibc case resides on the target file system. But not all of it! There are still the "C runtime start files".

These are statically compiled into every binary that dynamically links glibc, and their ABI is therefore Very Very Stable.

And so, Zig bundles a small subset of glibc's source files needed to build these object files from source for every target. The total size of this comes out to 1.4 MiB (252 KB gzipped). I do think there is some room for improvement here, but I digress.

There are a couple of patches to this small subset of glibc source files, which simplify them to avoid including too many .h files, since the end result that we need is some bare bones object files, and not all of glibc.

And finally, we certainly do not ship the build system of glibc with Zig! I manually inspected, audited, and analyzed glibc's build system, and then by hand wrote code in the Zig compiler which hooks into Zig's caching system and performs a minimal build of only these start files, as needed.

musl

The process for preparing musl to ship with Zig is much simpler by comparison.

It still involves building musl for every target architecture that it supports, but in this case only the install-headers target has to be run, and it takes less than a minute, even to do it for all targets.

The same process_headers tool used for glibc headers is used on the musl headers.

Unlike glibc, musl supports building statically. Zig currently assumes a static libc when musl is chosen, and does not support dynamically linking against musl, although that could potentially be added in the future.

And so for musl, Zig actually bundles most - but still not all - of musl's source files. Everything in arch, crt, compat, src, and include gets copied in.

Again much like glibc, I carefully studied musl's build system, and then hand-coded logic in the Zig compiler to build these source files. In musl's case it is simpler - just a bit of logic having to do with the file extension, and whether to override files with an architecture-specific file. The only file that needs to be patched (by hand) is version.h, which is normally generated during the configure phase in musl's build system.

I really appreciate Rich Felker's efforts to make musl simple to utilize in this way, and he has been incredibly helpful in the #musl IRC channel when I ask questions. I proudly sponsor Rich Felker for $150/month.

mingw-w64

mingw-w64 was an absolute joy to support in Zig. The beautiful thing about this project is that they have already been transitioning into having one set of header files that applies to all architectures (using #ifdefs only where needed). One set of header files is sufficient to support all four architectures: arm, aarch64, x86, and x86_64.

So for updating headers, all we have to do is build mingw-w64, then:

mv $INSTALLPREFIX/include $ZIGSRC/lib/libc/include/any-windows-any

After doing this for all 3 libcs, the libc/include directory looks like this:

aarch64_be-linux-any   i386-linux-musl           powerpc-linux-any
aarch64_be-linux-gnu   mips64el-linux-any        powerpc-linux-gnu
aarch64-linux-any      mips64el-linux-gnuabi64   powerpc-linux-musl
aarch64-linux-gnu      mips64el-linux-gnuabin32  riscv32-linux-any
aarch64-linux-musl     mips64-linux-any          riscv64-linux-any
any-linux-any          mips64-linux-gnuabi64     riscv64-linux-gnu
any-windows-any        mips64-linux-gnuabin32    riscv64-linux-musl
armeb-linux-any        mips64-linux-musl         s390x-linux-any
armeb-linux-gnueabi    mipsel-linux-any          s390x-linux-gnu
armeb-linux-gnueabihf  mipsel-linux-gnu          s390x-linux-musl
arm-linux-any          mips-linux-any            sparc-linux-gnu
arm-linux-gnueabi      mips-linux-gnu            sparcv9-linux-gnu
arm-linux-gnueabihf    mips-linux-musl           x86_64-linux-any
arm-linux-musl         powerpc64le-linux-any     x86_64-linux-gnu
generic-glibc          powerpc64le-linux-gnu     x86_64-linux-gnux32
generic-musl           powerpc64-linux-any       x86_64-linux-musl
i386-linux-any         powerpc64-linux-gnu
i386-linux-gnu         powerpc64-linux-musl

When Zig generates a C command line to send to clang, it adds the appropriate include paths using -I depending on the target. For example, if the target is aarch64-linux-musl, then the include directories matching that target are appended.

Anyway, back to mingw-w64.

Again, Zig includes a subset of source files from mingw-w64 with a few patches applied to make things compile successfully.

The Zig compiler code that builds mingw-w64 from source files emulates only the parts of the build system that are needed for this subset. This includes preprocessing .def.in files to get .def files, and then in turn using LLD to generate .lib files from the .def files, which allows Zig to provide .lib files for any Windows DLL, such as kernel32.dll or even opengl32.dll.

Invoking Clang Without a System Dependency

Since Zig already links against Clang libraries for the translate-c feature, it was not much more cost to expose the main() entry point from Zig. So that's exactly what we do:

The following patch is applied:

--- a/src/zig_clang_driver.cpp
+++ b/src/zig_clang_driver.cpp
@@ -206,8 +205,6 @@
                     void *MainAddr);
 extern int cc1as_main(ArrayRef<const char *> Argv, const char *Argv0,
                       void *MainAddr);
-extern int cc1gen_reproducer_main(ArrayRef<const char *> Argv,
-                                  const char *Argv0, void *MainAddr);
 
 static void insertTargetAndModeArgs(const ParsedClangName &NameParts,
                                     SmallVectorImpl<const char *> &ArgVector,
@@ -330,19 +327,18 @@
   if (Tool == "-cc1as")
     return cc1as_main(makeArrayRef(ArgV).slice(2), ArgV[0],
                       GetExecutablePathVP);
-  if (Tool == "-cc1gen-reproducer")
-    return cc1gen_reproducer_main(makeArrayRef(ArgV).slice(2), ArgV[0],
-                                  GetExecutablePathVP);
   // Reject unknown tools.
   llvm::errs() << "error: unknown integrated tool '" << Tool << "'. "
                << "Valid tools include '-cc1' and '-cc1as'.\n";
   return 1;
 }
 
-int main(int argc_, const char **argv_) {
+extern "C" int ZigClang_main(int argc_, const char **argv_);
+int ZigClang_main(int argc_, const char **argv_) {
   noteBottomOfStack();
   llvm::InitLLVM X(argc_, argv_);
-  SmallVector<const char *, 256> argv(argv_, argv_ + argc_);
+  size_t argv_offset = (strcmp(argv_[1], "-cc1") == 0 || strcmp(argv_[1], "-cc1as") == 0) ? 0 : 1;
+  SmallVector<const char *, 256> argv(argv_ + argv_offset, argv_ + argc_);
 
   if (llvm::sys::Process::FixupStandardFileDescriptors())
     return 1;

This disables some cruft, and then renames main to ZigClang_main so that it can be called like any other function. Next, in Zig's actual main, it looks for clang as the first parameter, and calls it.
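The dispatch can be sketched in C as follows. The argv_offset expression is taken from the patch above, with an added argc check; everything else is an assumption. In particular, routing -cc1 and -cc1as to Clang from the top-level main is inferred from the patch rather than stated in this article, and the function names are hypothetical.

```c
#include <string.h>

static int is_cc1_tool(const char *arg) {
    return strcmp(arg, "-cc1") == 0 || strcmp(arg, "-cc1as") == 0;
}

/* 1 when the top-level main should hand the command line to Clang.
   The -cc1/-cc1as case covers Clang re-invoking the same executable. */
int should_dispatch_to_clang(int argc, const char **argv) {
    return argc >= 2 &&
           (strcmp(argv[1], "clang") == 0 || is_cc1_tool(argv[1]));
}

/* Same decision as argv_offset in the patch: keep argv[0] when the next
   argument is an integrated tool, otherwise drop it so that "clang"
   becomes the program name Clang's driver sees. */
int clang_argv_offset(int argc, const char **argv) {
    return (argc >= 2 && is_cc1_tool(argv[1])) ? 0 : 1;
}

/* The string Clang's driver would treat as its own argv[0], or NULL. */
const char *clang_argv0(int argc, const char **argv) {
    int offset = clang_argv_offset(argc, argv);
    return offset < argc ? argv[offset] : NULL;
}
```

For a command line such as `zig clang -c hello.c`, the offset is 1 and the driver sees `clang -c hello.c`; for `zig -cc1 ...` the offset is 0 and the argument list is passed through unchanged.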

So, zig clang is a low-level, undocumented API that Zig exposes for directly invoking Clang. But zig cc is much higher level than that. When Zig needs to compile C code, it invokes itself as a child process, taking advantage of zig clang. zig cc, on the other hand, has a more difficult job: it must parse Clang's command line options and map those to the Zig compiler's settings, so that ultimately zig clang can be invoked as a child process.

Parsing Clang Command Line Options

When using zig cc, Zig acts as a proxy between the user and Clang. It does not need to understand all the parameters, but it does need to understand some of them, such as the target. This means that Zig must understand when a C command line parameter expects to "consume" the next parameter on the command line.

For example, -z -target would mean to pass -target to the linker, whereas -E -target would mean that the next parameter specifies the target.

Clang has a long list of command line options and so it would be foolish to try to hard-code all of them.

Fortunately, LLVM has a file "options.td" which describes all of its command line parameter options in some obscure format. But fortunately again, LLVM comes with the llvm-tblgen tool that can dump it as JSON.

Zig has an update_clang_options tool which processes this JSON dump and produces a big sorted list of Clang's command line options.

Combined with a list of "known options" which correspond to Zig compiler options, this is used to make an iterator API that zig cc uses to parse command line parameters and instantiate a Zig compiler instance. Any Clang options that Zig is not aware of are forwarded to Clang directly. Some parameters are handled specially.
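To show why knowing which options consume the next parameter matters, here is a small C sketch that looks for the target on a command line. It is not Zig's iterator API: the function names are hypothetical, and the four-entry table is a hand-picked stand-in for the generated list of Clang options.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical excerpt: options whose operand is the following parameter. */
static const char *const takes_next_arg[] = { "-o", "-z", "-I", "-x" };

static int consumes_next(const char *arg) {
    size_t count = sizeof takes_next_arg / sizeof takes_next_arg[0];
    for (size_t i = 0; i < count; i++) {
        if (strcmp(arg, takes_next_arg[i]) == 0)
            return 1;
    }
    return 0;
}

/* Returns the parameter following -target, or NULL when no -target option
   is present. A "-target" that is merely the operand of another option
   (as in "-z -target") is skipped rather than misread. */
const char *find_target(int argc, const char **argv) {
    for (int i = 1; i < argc; i++) {
        if (strcmp(argv[i], "-target") == 0)
            return (i + 1 < argc) ? argv[i + 1] : NULL;
        if (consumes_next(argv[i]))
            i++; /* step over the operand so it is not parsed as an option */
    }
    return NULL;
}
```

With `-E -target aarch64-linux-musl` the sketch reports aarch64-linux-musl, while with `-z -target` it reports no target, because -target is the operand of -z.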

Linking

This part is pretty straightforward. Zig depends on LLD for linking rather than shelling out to the system linker, like GCC and Clang do.

When you use -o with zig cc, Clang is not actually acting as a linker driver here. Zig is still the linker driver.

Everybody Wins

Now that I've spent this entire blog article comparing Zig and Clang as if they are competitors, let me make it absolutely clear that both of these are harmonious, mutually beneficial open-source projects. It's pretty obvious how Clang and the entire LLVM project are massively beneficial to the Zig project, since Zig builds on top of them.

But it works the other way, too.

With Zig's focus on cross-compiling, its test suite has been expanding rapidly to cover a large number of architectures and operating systems, leading to dozens of bugs reported upstream and patches sent.

Everybody wins.

This is still experimental!

I landed zig cc support only last week, and it is still experimental. Please do not expect it to be production quality yet.

Zig's 0.6.0 release is right around the corner, scheduled for April 13th. I will be sure to provide an update in the release notes on how stable and robust you can expect zig cc to be in the 0.6.0 release.

There are some follow-up issues related to zig cc which are still open.

As always, contributions are most welcome.

💖 Sponsor Zig 💖

Sponsor Andrew Kelley on GitHub

If you're reading this and you already sponsor me, thank you so much! I wake up every day absolutely thrilled that I get to do this for my full time job.

As Zig has been gaining popularity, demands for my time have been growing faster than funds to hire another full-time programmer. Every recurring donation helps, and if the funds keep growing then soon enough the Zig project will have two full-time programmers.

That's all folks. I hope you and your loved ones are well.


Thanks for reading my blog post.
