Zig: December 2017 in Review
I figured since I ask people to donate monthly, I will start giving a monthly progress report to provide accountability.
So here's everything that happened in Zig land in December 2017:
enum tag types
You can now specify the tag type for an enum (#305):
const Small2 = enum (u2) {
One,
Two,
};
If you specify the tag type for an enum, you can put it in a packed struct:
const A = enum (u3) {
One,
Two,
Three,
Four,
One2,
Two2,
Three2,
Four2,
};
const B = enum (u3) {
One3,
Two3,
Three3,
Four3,
One23,
Two23,
Three23,
Four23,
};
const C = enum (u2) {
One4,
Two4,
Three4,
Four4,
};
const BitFieldOfEnums = packed struct {
a: A,
b: B,
c: C,
};
const bit_field_1 = BitFieldOfEnums {
.a = A.Two,
.b = B.Three3,
.c = C.Four4,
};
You can no longer cast from an enum to an arbitrary integer. Instead you must cast to the enum tag type and vice versa:
const Small2 = enum (u2) {
One,
Two,
};
test "casting enum to its tag type" {
testCastEnumToTagType(Small2.Two);
}
fn testCastEnumToTagType(value: Small2) {
assert(u2(value) == 1);
}
enum tag values
Now you can set the tag values of enums:
const MultipleChoice = enum(u32) {
A = 20,
B = 40,
C = 60,
D = 1000,
};
Complete enum and union overhaul
Related issue: #618
Enums are now a simple mapping between a symbol and a number. They can no longer contain payloads.
Unions have been upgraded and can now accept an enum as an argument:
const TheTag = enum {A, B, C};
const TheUnion = union(TheTag) { A: i32, B: i32, C: i32 };
test "union field access gives the enum values" {
assert(TheUnion.A == TheTag.A);
assert(TheUnion.B == TheTag.B);
assert(TheUnion.C == TheTag.C);
}
If you want to auto-create an enum for a union, you can use the enum
keyword like this:
const TheUnion2 = union(enum) {
Item1,
Item2: i32,
};
You can switch on a union-enum just like you could previously with an enum:
const SwitchProngWithVarEnum = union(enum) {
One: i32,
Two: f32,
Meh: void,
};
fn switchProngWithVarFn(a: &const SwitchProngWithVarEnum) {
switch(*a) {
SwitchProngWithVarEnum.One => |x| {
assert(x == 13);
},
SwitchProngWithVarEnum.Two => |x| {
assert(x == 13.0);
},
SwitchProngWithVarEnum.Meh => |x| {
const v: void = x;
},
}
}
However, if you do not give an enum to a union, the tag value is not visible to the programmer:
const Payload = union {
A: i32,
B: f64,
C: bool,
};
export fn entry() {
const a = Payload { .A = 1234 };
foo(a);
}
fn foo(a: &const Payload) {
switch (*a) {
Payload.A => {},
else => unreachable,
}
}
test.zig:11:13: error: switch on union which has no attached enum
switch (*a) {
^
test.zig:1:17: note: consider 'union(enum)' here
const Payload = union {
^
test.zig:12:16: error: container 'Payload' has no member called 'A'
Payload.A => {},
^
There is still debug safety though!
const Foo = union {
float: f32,
int: u32,
};
pub fn main() -> %void {
var f = Foo { .int = 42 };
bar(&f);
}
fn bar(f: &Foo) {
f.float = 12.34;
}
access of inactive union field
lib/zig/std/special/panic.zig:12:35: 0x0000000000203674 in ??? (test)
@import("std").debug.panic("{}", msg);
^
test.zig:12:6: 0x0000000000217bd7 in ??? (test)
f.float = 12.34;
^
test.zig:8:8: 0x0000000000217b7c in ??? (test)
bar(&f);
^
Aborted
However, if you make an extern union
to be compatible with C code,
there is no debug safety, just like a C union.
Other tidbits:
-
@enumTagName
is renamed to @tagName -
@EnumTagType
is renamed to @TagType, and it works on both enums and union-enums. - There is no longer an
EnumTag
type - It is now an error for enums and unions to have 0 fields. However you can still have a struct with 0 fields.
- union values can implicitly cast to enum values when the enum type is the tag type of the union and the union value tag is comptime known to have a void field type. likewise, enum values can implicitly cast to union values. See #642.
test "cast tag type of union to union" {
var x: Value2 = Letter2.B;
assert(Letter2(x) == Letter2.B);
}
const Letter2 = enum { A, B, C };
const Value2 = union(Letter2) { A: i32, B, C, };
test "implicit cast union to its tag type" {
var x: Value2 = Letter2.B;
assert(x == Letter2.B);
giveMeLetterB(x);
}
fn giveMeLetterB(x: Letter2) {
assert(x == Value2.B);
}
Update LLD fork to 5.0.1rc2
We have a fork of LLD in the zig project because of several upstream issues, all of which I have filed bugs for:
- LDD calls exit after successful link. Patch to fix sent upstream and accepted.
- LDD crashes on a linker script with empty sections. Fixed upstream.
- Buggy MACH-O code causes assertion failure in simple object. We have a hacky workaround for this bug in zig's fork of LLD, but the workaround is not good enough to send upstream. See #662 for more details.
- MACH-O: Bad ASM code generated for __stub_helpers section Thanks to Patricio V. for sending the fix upstream, which has been accepted.
When LLVM 6.0.0 comes out, Zig will have to keep its fork because of the one issue, but we can drop all the other patches since they have been accepted upstream.
Self-hosted compiler progress
The self-hosted compiler effort has begun.
So far we have a tokenizer, and an incomplete parser and formatter. The code uses no recursion and therefore has compile-time known stack space usage. See #157
The self-hosted compiler works on every supported platform, is built using
the zig build system, tested with zig test
, links against LLVM,
and can import 100% of the LLVM symbols from the LLVM
C-API .h files - even the inline functions.
There is one C++ file in Zig which uses the more powerful LLVM C++ API (for example to create debug information) and exposes a C API. This file is now shared between the C++ self-hosted compiler and the self-hosted compiler. In stage1, we create a static library with this one file in it, and then use that library in both the C++ compiler and the self-hosted compiler.
Higher level arg-parsing API
It's really a shame that Windows command line parsing requires you to allocate memory. This means that to have a cross-platform API for command line arguments, even though in POSIX it can never fail, we have to handle the possibility because of Windows. This lead to a command line args API like this:
pub fn main() -> %void {
var arg_it = os.args();
// skip my own exe name
_ = arg_it.skip();
while (arg_it.next(allocator)) |err_or_arg| {
const arg = %return err_or_arg;
defer allocator.free(arg);
// use the arg...
}
}
Yikes, a bit cumbersome. I added a higher level API. Now you can call
std.os.argsAlloc
and get a %[]const []u8
, and you just have to call
std.os.argsFree
when you're done with it.
pub fn main() -> %void {
const allocator = std.heap.c_allocator;
const args = %return os.argsAlloc(allocator);
defer os.argsFree(allocator, args);
var arg_i: usize = 1;
while (arg_i < args.len) : (arg_i += 1) {
const arg = args[arg_i];
// do something with arg...
}
}
Better! Single point of failure.
For now this uses the other API under the hood, but it could be reimplemented with the same API to do a single allocation.
I added a new kind of test to make sure command line argument parsing works.
Automatic C-to-Zig translation
- Translation now understands enum tag types.
- I refactored the C macro parsing, and made it understand pointer casting. Now some kinds of int-to-ptr code in .h files for embedded programming just work:
#define NRF_GPIO ((NRF_GPIO_Type *) NRF_GPIO_BASE)
Zig now understands this C macro.
std.mem
- add
aligned
functions to Allocator interface mem.Allocator
initializes bytes to undefined. This does nothing in ReleaseFast mode. In Debug and ReleaseSafe modes, it initializes bytes to0xaa
which helps catch memory errors.- add
mem.FixedBufferAllocator
std.os.ChildProcess
I addedstd.os.ChildProcess.exec
for when you want to spawn a child process, wait for it
to complete, and then capture the stdandard output into a buffer.
pub fn exec(self: &Builder, argv: []const []const u8) -> []u8 {
const max_output_size = 100 * 1024;
const result = os.ChildProcess.exec(self.allocator, argv, null, null, max_output_size) %% |err| {
std.debug.panic("Unable to spawn {}: {}", argv[0], @errorName(err));
};
switch (result.term) {
os.ChildProcess.Term.Exited => |code| {
if (code != 0) {
warn("The following command exited with error code {}:\n", code);
printCmd(null, argv);
warn("stderr:{}\n", result.stderr);
std.debug.panic("command failed");
}
return result.stdout;
},
else => {
warn("The following command terminated unexpectedly:\n");
printCmd(null, argv);
warn("stderr:{}\n", result.stderr);
std.debug.panic("command failed");
},
}
}
std.sort
Hejsil pointed out that the quicksort implementation in the standard library failed a simple test case.
There was another problem with the implementation of sort in the standard library,
which is that it used O(n)
stack space via recursion. This is fundamentally
insecure, especially if you consider that the length of an array you might want to sort could be
user input. It prevents #157
from working as well.
I had a look at Wikipedia's Comparison of Sorting Algorithms and only 1 sorting algorithm checked all the boxes:
- Best case
O(n)
complexity (adaptive sort) - Average case
O(n * log(n))
complexity - Worst case
O(n * log(n))
complexity O(1)
memory- Stable sort
And that algorithm is Block sort.
I found a high quality implementation of block sort in C, which is licensed under the public domain.
I ported the code from C to Zig, integrated it into the standard library, and it passed all tests first try. Amazing.
Surely, I thought, there must be some edge case. So I created a simple fuzz tester:
test "sort fuzz testing" {
var rng = std.rand.Rand.init(0x12345678);
const test_case_count = 10;
var i: usize = 0;
while (i < test_case_count) : (i += 1) {
fuzzTest(&rng);
}
}
var fixed_buffer_mem: [100 * 1024]u8 = undefined;
fn fuzzTest(rng: &std.rand.Rand) {
const array_size = rng.range(usize, 0, 1000);
var fixed_allocator = mem.FixedBufferAllocator.init(fixed_buffer_mem[0..]);
var array = %%fixed_allocator.allocator.alloc(IdAndValue, array_size);
// populate with random data
for (array) |*item, index| {
item.id = index;
item.value = rng.range(i32, 0, 100);
}
sort(IdAndValue, array, cmpByValue);
var index: usize = 1;
while (index < array.len) : (index += 1) {
if (array[index].value == array[index - 1].value) {
assert(array[index].id > array[index - 1].id);
} else {
assert(array[index].value > array[index - 1].value);
}
}
}
This test passed as well. And so I think this problem is solved.
@export
There is now an @export builtin function which can be used in a comptime block to conditionally export a function:
const builtin = @import("builtin");
comptime {
const strong_linkage = builtin.GlobalLinkage.Strong;
if (builtin.link_libc) {
@export("main", main, strong_linkage);
} else if (builtin.os == builtin.Os.windows) {
@export("WinMainCRTStartup", WinMainCRTStartup, strong_linkage);
} else {
@export("_start", _start, strong_linkage);
}
}
It can also be used to create aliases:
const builtin = @import("builtin");
const is_test = builtin.is_test;
comptime {
const linkage = if (is_test) builtin.GlobalLinkage.Internal else builtin.GlobalLinkage.Weak;
const strong_linkage = if (is_test) builtin.GlobalLinkage.Internal else builtin.GlobalLinkage.Strong;
@export("__letf2", @import("comparetf2.zig").__letf2, linkage);
@export("__getf2", @import("comparetf2.zig").__getf2, linkage);
if (!is_test) {
// only create these aliases when not testing
@export("__cmptf2", @import("comparetf2.zig").__letf2, linkage);
@export("__eqtf2", @import("comparetf2.zig").__letf2, linkage);
@export("__lttf2", @import("comparetf2.zig").__letf2, linkage);
@export("__netf2", @import("comparetf2.zig").__letf2, linkage);
@export("__gttf2", @import("comparetf2.zig").__getf2, linkage);
}
}
Previous export syntax is still allowed. See #462 and #420.
Labeled loops, blocks, break, and continue, and R.I.P. goto
We used to have labels and goto like this:
export fn entry() {
label:
goto label;
}
Now this does not work, because goto is gone.
test.zig:2:10: error: expected token ';', found ':'
label:
^
There are a few reasons to use goto, but all of the use cases are better served with other zig control flow features:
- cleanup pattern. Use
defer
and%defer
instead. - goto backward
- goto forward
goto backward
export fn entry() {
start_over:
while (some_condition) {
// do something...
goto start_over;
}
}
Instead, use a loop!
export fn entry() {
outer: while (true) {
while (some_condition) {
// do something...
continue :outer;
}
break;
}
}
goto forward
pub fn findSection(elf: &Elf, name: []const u8) -> %?&SectionHeader {
var file_stream = io.FileInStream.init(elf.in_file);
const in = &file_stream.stream;
section_loop: for (elf.section_headers) |*elf_section| {
if (elf_section.sh_type == SHT_NULL) continue;
const name_offset = elf.string_section.offset + elf_section.name;
%return elf.in_file.seekTo(name_offset);
for (name) |expected_c| {
const target_c = %return in.readByte();
if (target_c == 0 or expected_c != target_c) goto next_section;
}
{
const null_byte = %return in.readByte();
if (null_byte == 0) return elf_section;
}
next_section:
}
return null;
}
Looks like the use case is breaking out of an outer loop:
pub fn findSection(elf: &Elf, name: []const u8) -> %?&SectionHeader {
var file_stream = io.FileInStream.init(elf.in_file);
const in = &file_stream.stream;
section_loop: for (elf.section_headers) |*elf_section| {
if (elf_section.sh_type == SHT_NULL) continue;
const name_offset = elf.string_section.offset + elf_section.name;
%return elf.in_file.seekTo(name_offset);
for (name) |expected_c| {
const target_c = %return in.readByte();
if (target_c == 0 or expected_c != target_c) continue :section_loop;
}
{
const null_byte = %return in.readByte();
if (null_byte == 0) return elf_section;
}
}
return null;
}
You can also break out of arbitrary blocks:
export fn entry() {
outer: {
while (some_condition) {
// do something...
break :outer;
}
}
}
This can be used to return a value from a block in the same way you can return a value from a function:
export fn entry() {
const value = init: {
for (slice) |item| {
if (item > 100)
break :init item;
}
break :init 0;
};
}
Omitting a semicolon no longer causes the value to be returned by the block.
Instead you must use explicit block labels to return a value from a block.
I'm considering a keyword such as result
which defaults to the
current block.
Removal of goto caused a regression in C-to-Zig translation: Switch statements no longer can be translated. However this code will be resurrected soon using labeled loops and labeled break instead of goto.
New IR pass iteration strategy
Before:
- IR basic blocks are in arbitrary order
- when doing an IR pass, when a block is encountered, code must look at all the instructions in the old basic block, determine what blocks are referenced, and queue up those old basic blocks first.
- This had a bug
while (cond) {
if (false) { }
break;
}
Pretty crazy right? Something as simple as this would crash the compiler.
Now:
- IR basic blocks are required to be in an order that guarantees they will be referenced by a branch, before any instructions within are referenced. ir pass1 is updated to meet this constraint.
- hen doing an IR pass, we iterate over old basic blocks in the order they appear. Blocks which have not been referenced are discarded.
- fter the pass is complete, we must iterate again to look for old basic blocks which now point to incomplete new basic blocks, due to comptime code generation.
- his last part can probably be optimized - most of the time we don't need to iterate over the basic block again.
This improvement deletes a lot of messy code:
5 files changed, 288 insertions(+), 1243 deletions(-)
And it also fixes comptime branches not being respected sometimes:
export fn entry() {
while (false) {
@compileError("bad");
}
}
Before, this would cause a compile error. Now the while loop respects the implicit compile-time.
See #667.
Bug Fixes
- fix const and volatile qualifiers being dropped sometimes.
in the expression
&const a.b
, the const (and/or volatile) qualifiers would be incorrectly dropped. See #655. - fix compiler crash in a nullable if after an if in a switch prong of a switch with 2 prongs in an else. See #656.
- fix assert when wrapping zero bit type in nullable. See #659.
- fix crash when implicitly casting array of len 0 to slice. See #660.
- fix endianness of sub-byte integer fields in packed structs. In the future packed structs will require specifying endianness. See #307.
- fix
std.os.path.resolve
when the drive is missing. - fix automatically C-translated functions not having debug information.
- fix crash when passing union enum with sub-byte field to const slice parameter. See #664.
Miscellaneous changes
- Rename
builtin.is_big_endian
tobuiltin.endian
. This is in preparation for having endianness be a pointer property, which is related to packed structs. See #307. - Tested Zig with LLVM debug mode and fixed some bugs that were causing LLVM assertions.
- Add
@noInlineCall.
See #640.
This fixes a crash in
--release-safe
and--release-fast
modes where the optimizer inlines everything into_start
and clobbers the command line argument data. If we were able to verify that the user's code never reads command line args, we could leave off this "no inline" attribute. This might call for a patch to LLVM. It seems like inlining into a naked function should correctly bump the stack pointer. - add
i29
andu29
primitive types.u29
is the type of alignment, so it makes sense to be a primitive. probably in the future we'll make anyi
oru
followed by digits into a primitive. - add implicit cast from enum tag type of union to const ptr to the union. closes #654
- ELF stack traces support
DW_AT_ranges
, so sometimes when you would see "???" you now get a useful stack trace instead. - add
std.sort.min
andstd.sort.max
functions std.fmt.bufPrint
returns a possibleerror.BufferTooSmall
instead of asserting that the buffer is large enough.- Remove unnecessary inline calls in
std.math
. zig build
now has a--search-prefix
option. Any number of search prefixes can be specified.- add some utf8 parsing utilities to the standard library.
Thank you contributors!
- MIURA Masahiro fixed the color of compiler messages for light-themed terminals. (See #644)
- Peter Rönnquist added format for floating point numbers.
{.x}
wherex
is the number of decimals. (See #668)
Thank you financial supporters!
Special thanks to those who donate monthly:
- Lauren Chavis
- Andrea Orru
- Adrian Sinclair
- David Joseph
- jeff kelley
- Hasan Abdul-Rahman
- Wesley Kelley
- Jordan Torbiak
- Richard Ohnemus
- Martin Schwaighofer
- Matthew
- Mirek Rusin
- Brendon Scheinman
- Pyry Kontio
- Thomas Ballinger
- Peter Ronnquist
- Robert Paul Herman
- Audun Wilhelmsen
- Marko Mikulicic
- Anthony J. Benik
- Caius
- Tyler Philbrick
- Jeremy Larkin
- Rasmus Rønn Nielsen