Andrew Kelley - Zig: December 2017 in Review (2018 Jan 03)

Zig: December 2017 in Review

I figured since I ask people to donate monthly, I will start giving a monthly progress report to provide accountability.

So here's everything that happened in Zig land in December 2017:

enum tag types

You can now specify the tag type for an enum (#305):

const Small2 = enum (u2) {
    One,
    Two,
};

If you specify the tag type for an enum, you can put it in a packed struct:

const A = enum (u3) {
    One,
    Two,
    Three,
    Four,
    One2,
    Two2,
    Three2,
    Four2,
};

const B = enum (u3) {
    One3,
    Two3,
    Three3,
    Four3,
    One23,
    Two23,
    Three23,
    Four23,
};

const C = enum (u2) {
    One4,
    Two4,
    Three4,
    Four4,
};

const BitFieldOfEnums = packed struct {
    a: A,
    b: B,
    c: C,
};

const bit_field_1 = BitFieldOfEnums {
    .a = A.Two,
    .b = B.Three3,
    .c = C.Four4,
};
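
As a quick check on the layout, the three tag types add up to 3 + 3 + 2 = 8 bits, so the packed struct should fit in a single byte. Here is a minimal sketch in the same December 2017 syntax as the rest of this post. The test name is made up for illustration, and the enums are repeated in compact form so the snippet stands alone:

```zig
const assert = @import("std").debug.assert;

const A = enum (u3) { One, Two, Three, Four, One2, Two2, Three2, Four2 };
const B = enum (u3) { One3, Two3, Three3, Four3, One23, Two23, Three23, Four23 };
const C = enum (u2) { One4, Two4, Three4, Four4 };

const BitFieldOfEnums = packed struct {
    a: A,
    b: B,
    c: C,
};

test "packed struct of enums fits in one byte" {
    // 3 + 3 + 2 bits of tag data pack into a single byte.
    assert(@sizeOf(BitFieldOfEnums) == 1);

    // Each field still reads back as its own enum type.
    var data = BitFieldOfEnums { .a = A.Two, .b = B.Three3, .c = C.Four4 };
    assert(data.a == A.Two);
    assert(data.b == B.Three3);
    assert(data.c == C.Four4);
}
```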

You can no longer cast from an enum to an arbitrary integer. Instead you must cast to the enum tag type and vice versa:

const Small2 = enum (u2) {
    One,
    Two,
};
test "casting enum to its tag type" {
    testCastEnumToTagType(Small2.Two);
}

fn testCastEnumToTagType(value: Small2) {
    assert(u2(value) == 1);
}
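
The "vice versa" direction can be sketched the same way: an integer of the tag type casts back to the enum. The function name testCastTagTypeToEnum is hypothetical, and the snippet assumes the default numbering shown above (One is 0, Two is 1):

```zig
const assert = @import("std").debug.assert;

const Small2 = enum (u2) {
    One,
    Two,
};

test "casting a tag type integer back to the enum" {
    testCastTagTypeToEnum(1);
}

fn testCastTagTypeToEnum(value: u2) {
    // The integer must already have the enum's tag type (u2 here).
    assert(Small2(value) == Small2.Two);
}
```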

enum tag values

Now you can set the tag values of enums:

const MultipleChoice = enum(u32) {
    A = 20,
    B = 40,
    C = 60,
    D = 1000,
};
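
To see the explicit values at work, here is a small sketch that casts a member to the tag type, using the cast syntax from the previous section. The helper name testTagValue is made up for illustration:

```zig
const assert = @import("std").debug.assert;

const MultipleChoice = enum(u32) {
    A = 20,
    B = 40,
    C = 60,
    D = 1000,
};

test "enum with specified tag values" {
    testTagValue(MultipleChoice.C);
}

fn testTagValue(x: MultipleChoice) {
    // Casting to the tag type yields the value assigned in the declaration.
    assert(u32(x) == 60);
}
```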

Complete enum and union overhaul

Related issue: #618

Enums are now a simple mapping between a symbol and a number. They can no longer contain payloads.

Unions have been upgraded and can now accept an enum as an argument:

const TheTag = enum {A, B, C};
const TheUnion = union(TheTag) { A: i32, B: i32, C: i32 };
test "union field access gives the enum values" {
    assert(TheUnion.A == TheTag.A);
    assert(TheUnion.B == TheTag.B);
    assert(TheUnion.C == TheTag.C);
}

If you want to auto-create an enum for a union, you can use the enum keyword like this:

const TheUnion2 = union(enum) {
    Item1,
    Item2: i32,
};

You can switch on a union-enum just like you could previously with an enum:

const SwitchProngWithVarEnum = union(enum) {
    One: i32,
    Two: f32,
    Meh: void,
};
fn switchProngWithVarFn(a: &const SwitchProngWithVarEnum) {
    switch(*a) {
        SwitchProngWithVarEnum.One => |x| {
            assert(x == 13);
        },
        SwitchProngWithVarEnum.Two => |x| {
            assert(x == 13.0);
        },
        SwitchProngWithVarEnum.Meh => |x| {
            const v: void = x;
        },
    }
}

However, if you do not give an enum to a union, the tag value is not visible to the programmer:

const Payload = union {
    A: i32,
    B: f64,
    C: bool,
};
export fn entry() {
    const a = Payload { .A = 1234 };
    foo(a);
}
fn foo(a: &const Payload) {
    switch (*a) {
        Payload.A => {},
        else => unreachable,
    }
}

test.zig:11:13: error: switch on union which has no attached enum
    switch (*a) {
            ^
test.zig:1:17: note: consider 'union(enum)' here
const Payload = union {
                ^
test.zig:12:16: error: container 'Payload' has no member called 'A'
        Payload.A => {},
               ^

There is still debug safety though!

const Foo = union {
    float: f32,
    int: u32,
};

pub fn main() -> %void {
    var f = Foo { .int = 42 };
    bar(&f);
}

fn bar(f: &Foo) {
    f.float = 12.34;
}

access of inactive union field
lib/zig/std/special/panic.zig:12:35: 0x0000000000203674 in ??? (test)
        @import("std").debug.panic("{}", msg);
                                  ^
test.zig:12:6: 0x0000000000217bd7 in ??? (test)
    f.float = 12.34;
     ^
test.zig:8:8: 0x0000000000217b7c in ??? (test)
    bar(&f);
       ^
Aborted

However, if you make an extern union to be compatible with C code, there is no debug safety, just like a C union.
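
For comparison, here is a sketch of the previous example with the union marked extern. It assumes nothing beyond what is described above: with no hidden tag to check, writing the other field does not trip a safety check. The names setFloat and the test title are hypothetical:

```zig
const assert = @import("std").debug.assert;

const Foo = extern union {
    float: f32,
    int: u32,
};

test "extern union has no active field check" {
    var f = Foo { .int = 42 };
    setFloat(&f);
    // Reading back the field that was just written.
    assert(f.float == 12.34);
}

fn setFloat(f: &Foo) {
    // With a plain union this write panics in debug mode;
    // an extern union stores no tag, so nothing is checked.
    f.float = 12.34;
}
```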

Other tidbits:

test "cast tag type of union to union" {
    var x: Value2 = Letter2.B;
    assert(Letter2(x) == Letter2.B);
}
const Letter2 = enum { A, B, C };
const Value2 = union(Letter2) { A: i32, B, C, };

test "implicit cast union to its tag type" {
    var x: Value2 = Letter2.B;
    assert(x == Letter2.B);
    giveMeLetterB(x);
}
fn giveMeLetterB(x: Letter2) {
    assert(x == Value2.B);
}

Update LLD fork to 5.0.1rc2

We have a fork of LLD in the zig project because of several upstream issues, all of which I have filed bugs for:

When LLVM 6.0.0 comes out, Zig will have to keep its fork because of the one issue, but we can drop all the other patches since they have been accepted upstream.

Self-hosted compiler progress

The self-hosted compiler effort has begun.

So far we have a tokenizer, and an incomplete parser and formatter. The code uses no recursion and therefore has compile-time known stack space usage. See #157

The self-hosted compiler works on every supported platform, is built using the zig build system, tested with zig test, links against LLVM, and can import 100% of the LLVM symbols from the LLVM C-API .h files - even the inline functions.

There is one C++ file in Zig which uses the more powerful LLVM C++ API (for example to create debug information) and exposes a C API. This file is now shared between the C++ compiler and the self-hosted compiler. In stage1, we create a static library with this one file in it, and then use that library in both the C++ compiler and the self-hosted compiler.

Higher level arg-parsing API

It's really a shame that Windows command line parsing requires you to allocate memory. This means that to have a cross-platform API for command line arguments, even though in POSIX it can never fail, we have to handle the possibility of failure because of Windows. This led to a command line args API like this:

pub fn main() -> %void {
    var arg_it = os.args();
    // skip my own exe name
    _ = arg_it.skip();
    while (arg_it.next(allocator)) |err_or_arg| {
        const arg = %return err_or_arg;
        defer allocator.free(arg);
        // use the arg...
    }
}

Yikes, a bit cumbersome. I added a higher level API. Now you can call std.os.argsAlloc and get a %[]const []u8, and you just have to call std.os.argsFree when you're done with it.

pub fn main() -> %void {
    const allocator = std.heap.c_allocator;

    const args = %return os.argsAlloc(allocator);
    defer os.argsFree(allocator, args);

    var arg_i: usize = 1;
    while (arg_i < args.len) : (arg_i += 1) {
        const arg = args[arg_i];
        // do something with arg...
    }
}

Better! Single point of failure.

For now this uses the other API under the hood, but it could be reimplemented with the same API to do a single allocation.

I added a new kind of test to make sure command line argument parsing works.

Automatic C-to-Zig translation

#define NRF_GPIO ((NRF_GPIO_Type *) NRF_GPIO_BASE)

Zig now understands this C macro.

std.mem

std.os.ChildProcess

I added std.os.ChildProcess.exec for when you want to spawn a child process, wait for it to complete, and then capture the standard output into a buffer.

pub fn exec(self: &Builder, argv: []const []const u8) -> []u8 {
    const max_output_size = 100 * 1024;
    const result = os.ChildProcess.exec(self.allocator, argv, null, null, max_output_size) %% |err| {
        std.debug.panic("Unable to spawn {}: {}", argv[0], @errorName(err));
    };
    switch (result.term) {
        os.ChildProcess.Term.Exited => |code| {
            if (code != 0) {
                warn("The following command exited with error code {}:\n", code);
                printCmd(null, argv);
                warn("stderr:{}\n", result.stderr);
                std.debug.panic("command failed");
            }
            return result.stdout;
        },
        else => {
            warn("The following command terminated unexpectedly:\n");
            printCmd(null, argv);
            warn("stderr:{}\n", result.stderr);
            std.debug.panic("command failed");
        },
    }
}

std.sort

Hejsil pointed out that the quicksort implementation in the standard library failed a simple test case.

There was another problem with the implementation of sort in the standard library, which is that it used O(n) stack space via recursion. This is fundamentally insecure, especially if you consider that the length of an array you might want to sort could be user input. It prevents #157 from working as well.

I had a look at Wikipedia's Comparison of Sorting Algorithms and only 1 sorting algorithm checked all the boxes:

And that algorithm is Block sort.

I found a high quality implementation of block sort in C, which has been released into the public domain.

I ported the code from C to Zig, integrated it into the standard library, and it passed all tests first try. Amazing.

Surely, I thought, there must be some edge case. So I created a simple fuzz tester:

test "sort fuzz testing" {
    var rng = std.rand.Rand.init(0x12345678);
    const test_case_count = 10;
    var i: usize = 0;
    while (i < test_case_count) : (i += 1) {
        fuzzTest(&rng);
    }
}

var fixed_buffer_mem: [100 * 1024]u8 = undefined;

fn fuzzTest(rng: &std.rand.Rand) {
    const array_size = rng.range(usize, 0, 1000);
    var fixed_allocator = mem.FixedBufferAllocator.init(fixed_buffer_mem[0..]);
    var array = %%fixed_allocator.allocator.alloc(IdAndValue, array_size);
    // populate with random data
    for (array) |*item, index| {
        item.id = index;
        item.value = rng.range(i32, 0, 100);
    }
    sort(IdAndValue, array, cmpByValue);

    var index: usize = 1;
    while (index < array.len) : (index += 1) {
        if (array[index].value == array[index - 1].value) {
            assert(array[index].id > array[index - 1].id);
        } else {
            assert(array[index].value > array[index - 1].value);
        }
    }
}

This test passed as well. And so I think this problem is solved.
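
The fuzz test refers to IdAndValue and cmpByValue without showing them. Here is a plausible sketch, with the field types inferred from how the test fills them in. The comparator signature (two &const pointers, returning bool) is an assumption about what sort expects, not something stated in this post:

```zig
const assert = @import("std").debug.assert;

const IdAndValue = struct {
    id: usize,
    value: i32,
};

// Assumed "less than" comparator: it orders by value only and never looks at id.
fn cmpByValue(a: &const IdAndValue, b: &const IdAndValue) -> bool {
    return a.value < b.value;
}

test "cmpByValue ignores id" {
    const x = IdAndValue { .id = 5, .value = 1 };
    const y = IdAndValue { .id = 0, .value = 2 };
    assert(cmpByValue(x, y));
    assert(!cmpByValue(y, x));
}
```

Because a comparator like this only looks at value, the id assertion in the fuzz test is effectively checking that the sort is stable: items with equal values must keep their original order.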

@export

There is now an @export builtin function which can be used in a comptime block to conditionally export a function:

const builtin = @import("builtin");

comptime {
    const strong_linkage = builtin.GlobalLinkage.Strong;
    if (builtin.link_libc) {
        @export("main", main, strong_linkage);
    } else if (builtin.os == builtin.Os.windows) {
        @export("WinMainCRTStartup", WinMainCRTStartup, strong_linkage);
    } else {
        @export("_start", _start, strong_linkage);
    }
}

It can also be used to create aliases:

const builtin = @import("builtin");
const is_test = builtin.is_test;

comptime {
    const linkage = if (is_test) builtin.GlobalLinkage.Internal else builtin.GlobalLinkage.Weak;
    const strong_linkage = if (is_test) builtin.GlobalLinkage.Internal else builtin.GlobalLinkage.Strong;

    @export("__letf2", @import("comparetf2.zig").__letf2, linkage);
    @export("__getf2", @import("comparetf2.zig").__getf2, linkage);

    if (!is_test) {
        // only create these aliases when not testing
        @export("__cmptf2", @import("comparetf2.zig").__letf2, linkage);
        @export("__eqtf2", @import("comparetf2.zig").__letf2, linkage);
        @export("__lttf2", @import("comparetf2.zig").__letf2, linkage);
        @export("__netf2", @import("comparetf2.zig").__letf2, linkage);
        @export("__gttf2", @import("comparetf2.zig").__getf2, linkage);
    }
}

Previous export syntax is still allowed. See #462 and #420.

Labeled loops, blocks, break, and continue, and R.I.P. goto

We used to have labels and goto like this:

export fn entry() {
    label:
    goto label;
}

Now this does not work, because goto is gone.

test.zig:2:10: error: expected token ';', found ':'
    label:
         ^

There are a few reasons to use goto, but all of the use cases are better served with other zig control flow features:

goto backward

export fn entry() {
    start_over:

    while (some_condition) {
        // do something...
        goto start_over;
    }
}

Instead, use a loop!

export fn entry() {
    outer: while (true) {

        while (some_condition) {
            // do something...
            continue :outer;
        }

        break;
    }
}

goto forward

pub fn findSection(elf: &Elf, name: []const u8) -> %?&SectionHeader {
    var file_stream = io.FileInStream.init(elf.in_file);
    const in = &file_stream.stream;

    section_loop: for (elf.section_headers) |*elf_section| {
        if (elf_section.sh_type == SHT_NULL) continue;

        const name_offset = elf.string_section.offset + elf_section.name;
        %return elf.in_file.seekTo(name_offset);

        for (name) |expected_c| {
            const target_c = %return in.readByte();
            if (target_c == 0 or expected_c != target_c) goto next_section;
        }

        {
            const null_byte = %return in.readByte();
            if (null_byte == 0) return elf_section;
        }
next_section:
    }

    return null;
}

Looks like the use case is skipping to the next iteration of an outer loop from inside an inner loop, which a labeled continue handles:

pub fn findSection(elf: &Elf, name: []const u8) -> %?&SectionHeader {
    var file_stream = io.FileInStream.init(elf.in_file);
    const in = &file_stream.stream;

    section_loop: for (elf.section_headers) |*elf_section| {
        if (elf_section.sh_type == SHT_NULL) continue;

        const name_offset = elf.string_section.offset + elf_section.name;
        %return elf.in_file.seekTo(name_offset);

        for (name) |expected_c| {
            const target_c = %return in.readByte();
            if (target_c == 0 or expected_c != target_c) continue :section_loop;
        }

        {
            const null_byte = %return in.readByte();
            if (null_byte == 0) return elf_section;
        }
    }

    return null;
}

You can also break out of arbitrary blocks:

export fn entry() {
    outer: {

        while (some_condition) {
            // do something...
            break :outer;
        }
    }
}

This can be used to return a value from a block in the same way you can return a value from a function:

export fn entry() {
    const value = init: {
        for (slice) |item| {
            if (item > 100)
                break :init item;
        }
        break :init 0;
    };
}
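
Here is a self-contained version of the same idea as a test, so the result can be checked. The array contents and the test name are made up for illustration:

```zig
const assert = @import("std").debug.assert;

test "labeled break yields a value from a block" {
    const items = []i32 { 3, 250, 7 };
    const value = init: {
        for (items) |item| {
            // The first item above 100 becomes the value of the block.
            if (item > 100)
                break :init item;
        }
        // Fallback when no item qualifies.
        break :init 0;
    };
    assert(value == 250);
}
```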

Omitting a semicolon no longer causes the value to be returned by the block. Instead you must use explicit block labels to return a value from a block. I'm considering a keyword such as result which defaults to the current block.

Removal of goto caused a regression in C-to-Zig translation: switch statements can no longer be translated. However, this code will be resurrected soon using labeled loops and labeled break instead of goto.

See #346, #630, and #629.

New IR pass iteration strategy

Before:

while (cond) {
    if (false) { }
    break;
}

Pretty crazy right? Something as simple as this would crash the compiler.

Now:

This improvement deletes a lot of messy code:

 5 files changed, 288 insertions(+), 1243 deletions(-) 

And it also fixes comptime branches not being respected sometimes:

export fn entry() {
    while (false) {
        @compileError("bad");
    }
}

Before, this would cause a compile error. Now the while loop respects the implicitly compile-time-known condition, so the @compileError in the dead body is not triggered.

See #667.

Bug Fixes

Miscellaneous changes

Thank you contributors!

Thank you financial supporters!

Special thanks to those who donate monthly:


Thanks for reading my blog post.
