A Unit of Analogy

Porting Lemon to Zig: A YACC Shave

The Angle of Attack

In our last installment, I had pounded in a spike, run out of steam, and slept on the problem. In the morning I had what I needed: an approach to the rest of the port, a place to begin that work, and a new goal to go with it.

The next piece of the puzzle was somewhat obvious in retrospect: port the parser first, covering any and all missing types and functions as I encounter them. This is the shortest path to making any of this code reachable, which is the only way to get it to typecheck, and of course from there, I can continue to follow the top-to-bottom logic of the main function. This leaves some loose ends, like option parsing, but really not many: Lemon is a straight-shot program, which loads, reads the input, digests it into data, grabs the output template, and emits the parser file, along with some other artifacts.

Once my port generates those artifacts, I have an easy test of a basic level of correctness: simply diff my output against that generated by Lemon itself. Until they are identical, my code isn’t even provisionally correct. I can ransack the Internet for grammar files which use Lemon, and even if they have customized templates, I don’t care: I can just use one canonical template and compare all the outputs. A Lemon template file cannot differ so much from the original that Lemon can’t use it, and I don’t care about compiling and running the output, just diffing it.
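A minimal sketch of that check, written in Python purely for brevity and not taken from the port itself. It assumes each program has already written its artifacts into its own directory; the function name and the `reference_dir` and `port_dir` parameters are hypothetical. It lists every file that is missing on one side or differs by even a single byte.

```python
import filecmp
from pathlib import Path


def differing_artifacts(reference_dir, port_dir):
    """Return sorted names of files that are missing on one side or not byte-identical."""
    ref = {p.name for p in Path(reference_dir).iterdir() if p.is_file()}
    port = {p.name for p in Path(port_dir).iterdir() if p.is_file()}
    # A file present on only one side counts as a failure.
    bad = ref ^ port
    for name in ref & port:
        # shallow=False compares file contents rather than trusting os.stat() data.
        if not filecmp.cmp(Path(reference_dir) / name, Path(port_dir) / name, shallow=False):
            bad.add(name)
    return sorted(bad)
```

An empty list is the only passing result, which matches the all-or-nothing standard described above.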

For a start, I happened to have pikchr.y handy, and until that compiles, there’s little advantage in adding additional samples. By the nature of the program, achieving perfect fidelity in one non-trivial input-output mapping will mean that a broad swathe of the happy path is correct; bugs uncovered by the introduction of variety should be manageable in scope.

But if this were the whole of the strategy, I would be in for a world of hurt, because I would be flying blind all the way through some rather dense and difficult code. I needed a way to ensure that each step, as I port it, leaves the program in the correct internal state to enable the next step to proceed.

Fortunately, the same technique works as early in the process as I need it to. Simply instrument lemon.c and lemon.zig with print statements, redirect stderr, and diff those until the output a) reflects the important results of the stage in question and b) is byte-for-byte identical.
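Plain `diff` does this job perfectly well. As an illustration of what "byte-for-byte identical" demands, here is a small sketch in Python (again for brevity, and with a hypothetical function name) that takes the stderr captured from each program and reports the first line at which the two traces part ways.

```python
def first_divergence(reference, port):
    """Return (line_number, reference_line, port_line) for the first mismatch, or None.

    Both arguments are the raw bytes captured from stderr. A line that is
    missing on one side is reported as None.
    """
    # keepends=True preserves line terminators, so a missing final newline counts.
    ref_lines = reference.splitlines(keepends=True)
    port_lines = port.splitlines(keepends=True)
    for i in range(max(len(ref_lines), len(port_lines))):
        a = ref_lines[i] if i < len(ref_lines) else None
        b = port_lines[i] if i < len(port_lines) else None
        if a != b:
            return (i + 1, a, b)
    return None
```

The comparison works on bytes rather than decoded text so that a stray carriage return or a missing trailing newline still registers as a difference.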

Reflecting on all of this led me to realize: by the end of this process, I would have but little work ahead of me to make the executable of lemon.zig a drop-in replacement for Lemon itself. That additional work would be to my advantage as well, because it’s my only opportunity to compare the outputs to the canonical original: once I start generating Zig, I’m on my own, needing a way to decide what correctness is on top of my obligation to achieve it.

Given that, it would be a shame to abandon such a finished program in the depths of my repository’s commit history. Why not carry it forward? While there may be no pressing need for a second, functionally-identical Lemon, there’s no harm in it either, and it could even come in handy.

With that, the plan came together. I did it, and it worked.

[roll credits]

Well. Why stop there? Here you are, reading part four of a fairly long-winded, er, port report, I suppose it is. I might fairly assume my remaining audience has an appetite for this sort of thing, no?

If Zig is to be as successful as I am convinced it must be, there will be a lot of porting C to Zig in the future. Part of why I anticipate such success is that Zig is designed specifically so there need never be any hurry to do so.

A C program can begin by converting from makefiles, CMake, or, Heaven forfend, autotools, to the Zig build system, without changing a single line of load-bearing code. This is a gentle induction into Zig land, and comes with numerous advantages, such as stellar cross-compilation, sane dependency management, and, of course, it then becomes easy to write new parts of the system in Zig instead of C.

Sometimes, although it never becomes a requirement, it will be advantageous to port some, or all, of the C code to Zig. Aside from documenting this process for my own later enjoyment, and that of anyone who might also find it to be of interest, my goal is to shed some light on what that process will entail, in the hope that this will help those who embark on similar work.

In the next chapter, I’ll concentrate on one of the most important questions which needs to be answered: where are the bytes?

Part Three: A Faithful Translation
Part Five: On Memory and Policy