Gettin’ Ziggy With It: Lexers & Ziglings
Published on: 2025-02-10
Okay, so I played around with Odin earlier and had a great time. There are a lot
of things to love about it. For example, what’s not to love about the walrus
operator (:=) and the four-eyes operator (::) for declarations? It’s heavenly.
But everyone and their mother keeps crowing about Zig lately. Zig this. Zig that. Zig cured my depression. Zig got me a job. Zig solved world hunger. Zig replaced my therapist and filled the hole in my life that my father left. Like, Jesus f@!#&^g Christ, I get it.
Why I Tried Zig
Two simple reasons:
- Error handling in Zig is cool!
- I want to compare it to Odin.
Zig Handles Errors Sanely
Zig feels like the best parts of Go’s error system with some of the tedium and
poor behavior excised. For example, treating errors as values makes sense. That is great.
It’s one of Go’s best features. The if err != nil boilerplate kinda sucks, but
it’s not the bigger problem. The bigger problem is passing around (value, err)
pairs, where the compiler can’t guarantee you only touch the value when the error is nil.
Zig’s error unions feel like a refined version of the idea. A return value is either an error or the value you actually wanted, never both. I tried explaining it to my partner with this analogy:
Let’s say you have a recipe for a cake. You know that if you follow the steps and they all go okay, you’ll get a cake at the end, right? Okay, but what if the recipe also told you all the possible wrong versions of the cake you could get?! You might end up with a burnt cake if the oven temp is wrong. You could get a brick cake if you work the batter too much and make it tough.
It’s inelegant, but it gets the idea across reasonably well.
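To make the analogy concrete, here is a minimal sketch of the cake recipe as a Zig error union. The CakeError set, the Cake struct, and the thresholds inside bake are all made up for illustration.

```zig
const std = @import("std");

// Hypothetical error set: the "wrong cakes" the recipe warns you about.
const CakeError = error{
    Burnt, // oven too hot
    Brick, // batter overworked
};

const Cake = struct {
    layers: u8,
};

// `CakeError!Cake` is an error union: the caller gets
// either one of the CakeError values or a Cake, never both.
fn bake(oven_temp_f: u16, mix_minutes: u8) CakeError!Cake {
    if (oven_temp_f > 400) return CakeError.Burnt;
    if (mix_minutes > 10) return CakeError.Brick;
    return Cake{ .layers = 2 };
}

pub fn main() void {
    // `catch` handles the error branch right here.
    const cake = bake(450, 5) catch |err| {
        std.debug.print("no cake today: {s}\n", .{@errorName(err)});
        return;
    };
    std.debug.print("got a cake with {d} layers\n", .{cake.layers});
}
```

Swapping catch for try would pass the error up to the caller instead of handling it on the spot, and there is no separate err variable to forget to check.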
Zig vs. Odin
Simply put, how can you understand the difference between two languages if you haven’t used them both? Is this a great use of my unemployed time? I think it is, but this probably is not making anyone want to hire me. I should probably be making a Pinterest clone in React or some other clout-chasing exercise, but I want to follow my heart.
Hair Pulling Madness
I got a swift kick in the pants within 30 minutes of writing some Zig. Let’s
take a look at the nextToken method I wrote on the Lexer and break down what
went wrong:
pub fn nextToken(self: *Lexer) token.Token {
    var tok: token.Token = undefined;
    self.skipWhitespace();
    switch (self.ch) {
        '=' => {
            if (self.peekChar() == '=') {
                // Two-character "==" token.
                const ch = self.ch;
                self.readChar();
                var l = [_]u8{ ch, self.ch };
                const literal = l[0..];
                tok = token.Token{
                    .type = token.TokenType.EQ,
                    .literal = literal,
                    .line = self.line,
                    .col = self.col,
                };
            } else {
                // A single "=" is assignment.
                var l = [_]u8{self.ch};
                const literal = l[0..];
                tok = self.newToken(token.TokenType.ASSIGN, literal);
            }
        },
        //... more similar cases...
        else => {
            var l = [_]u8{0};
            const literal = l[0..];
            tok = token.Token{
                .type = token.TokenType.ILLEGAL,
                .literal = literal,
                .line = self.line,
                .col = self.col,
            };
        },
    }
    self.readChar();
    return tok;
}
10 points to anyone who can spot the problem with this code, because it’s a big honking issue. I couldn’t figure it out at first, so let’s step away from the Lexer for a second.
Enter Ziglings
It was clear that just reading the docs and zig.guide was not quite enough for me. My background in low-level languages is too weak.
Fortunately, there exists an amazing way to pick up Zig and quickly learn some
of the low-level concepts that Zig is trying to address:
Ziglings! It features over 100 exercises that gently
increase in difficulty. Just fix the small program in each file and run
zig build to see if you succeeded.
I’m not 100% finished yet, but it already got me comfortable enough that I was able to go back and figure out what in the unholy hell I was doing wrong earlier.
Enter The Arena
You see, everything I was doing was almost completely on the stack. My issue
turned out to be simple and frustrating: in the main file, I would initialize a
Lexer (cool, no big deal) and then loop over it until an EOF token was found
(also chill, easy enough). But when I’d go to take a look at what was inside the
tokens I had generated, the line, column, and type were perfect… But the token
literal!??! It might just be a []const u8 holding two 0 bytes. What the f$%k!?!
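My understanding now is this: each literal was sliced out of a tiny array (l)
declared inside nextToken, so that array lived on nextToken’s stack frame. A
[]const u8 is basically just a pointer and a length, so slicing the array didn’t
copy the bytes; it just pointed at them. The line, column, and type were copied
into the token by value, which is why they survived. But once nextToken
returned, its stack frame was up for grabs, and my tokens were pointing at
memory that no longer belonged to them. Reading it is undefined behavior, and in
my case I kept grabbing (drumroll, please)… 00000000, AKA 0, AKA NUL.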
The fix was fairly simple: make an arena allocator and pass it into the Lexer
upon initialization. Anytime you make a string, just std.mem.Allocator.dupe the
sucker onto the heap and you’re good.
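A minimal sketch of that fix, assuming a stripped-down, hypothetical Lexer and Token (the fields and the twoCharToken helper are made up for illustration, not the real ones). It only shows the part that matters: building a literal in a stack buffer and dupe-ing it into the arena before returning it.

```zig
const std = @import("std");

// Hypothetical, stripped-down token: only the literal matters here.
const Token = struct {
    literal: []const u8,
};

// Hypothetical lexer that holds an allocator handed to it at init.
const Lexer = struct {
    allocator: std.mem.Allocator,
    input: []const u8,

    fn init(allocator: std.mem.Allocator, input: []const u8) Lexer {
        return .{ .allocator = allocator, .input = input };
    }

    // Builds a two-byte literal in a stack buffer, then copies it into the
    // allocator's memory. Returning a slice of `buf` itself would leave the
    // token pointing into this function's stack frame after it returns.
    fn twoCharToken(self: *Lexer, a: u8, b: u8) !Token {
        const buf = [_]u8{ a, b };
        const literal = try self.allocator.dupe(u8, &buf);
        return Token{ .literal = literal };
    }
};

pub fn main() !void {
    var arena = std.heap.ArenaAllocator.init(std.heap.page_allocator);
    // A single deinit frees every literal the lexer duped.
    defer arena.deinit();

    var lexer = Lexer.init(arena.allocator(), "a == b");
    const tok = try lexer.twoCharToken('=', '=');
    std.debug.print("literal: {s}\n", .{tok.literal});
}
```

Because the Lexer only sees a std.mem.Allocator, the arena could be swapped for any other allocator later; the arena just makes cleanup a single call.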
An arena is a good choice here because we are fine just dumping all the contents of the Lexer onto the heap for now and then just freeing the whole shebang once we’re done with the Lexer and Tokens. There are clearly performance optimizations that could be made, but just getting it to work is a huge deal for me.
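One such optimization, sketched here as an assumption rather than what this lexer does: if the source text outlives the tokens, a literal can simply be a slice of the input, so there is nothing to copy and nothing to free. The Lexer, Token, and nextToken below are hypothetical and only recognize =, ==, runs of letters, and single bytes.

```zig
const std = @import("std");

// Hypothetical minimal token: the literal borrows from the source text.
const Token = struct {
    literal: []const u8,
};

// Hypothetical lexer that only understands `=`, `==`, runs of letters,
// and otherwise emits one byte per token.
const Lexer = struct {
    input: []const u8,
    pos: usize = 0,

    fn nextToken(self: *Lexer) ?Token {
        // Skip spaces.
        while (self.pos < self.input.len and self.input[self.pos] == ' ') {
            self.pos += 1;
        }
        if (self.pos >= self.input.len) return null;

        const start = self.pos;
        const c = self.input[start];
        if (c == '=' and start + 1 < self.input.len and self.input[start + 1] == '=') {
            self.pos += 2; // "=="
        } else if (std.ascii.isAlphabetic(c)) {
            while (self.pos < self.input.len and std.ascii.isAlphabetic(self.input[self.pos])) {
                self.pos += 1;
            }
        } else {
            self.pos += 1; // any other single byte, including "="
        }
        // No allocation: the literal is a view into `input`.
        return Token{ .literal = self.input[start..self.pos] };
    }
};

pub fn main() void {
    var lexer = Lexer{ .input = "let x == y" };
    while (lexer.nextToken()) |tok| {
        std.debug.print("[{s}]\n", .{tok.literal});
    }
}
```

The catch is ownership: the caller has to keep the input alive for as long as any token is in use. It is the same lifetime rule as before, just applied to memory that actually sticks around.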
What’s Next?
I have two things on my plate:
- My new game OctoDive is well under way and I need to kick it into high gear. I like the idea and it’s shaping up quite nicely.
- There’s an idea I’ve been mulling over that I might want to try, but the complexity is pretty high and it mixes a lot of pieces I have yet to tackle in development: audio streaming, efficiently storing and converting large audio files, and payments for users. It’s scary to know all the work that can go into a project and then dive in anyway. Sometimes it’s better to be stupid and naive.