A Unit of Analogy

Porting Lemon to Zig: A YACC Shave

An Introduction

Recently I embarked upon a bit of a yacc shave: porting the Lemon parser generator to Zig. Zig’s most cherished goal is to be a viable replacement for C. How does it stack up?

Background

Lemon is the creation of D. Richard Hipp, best known as the author of SQLite. SQLite is the most widely installed database in existence: whatever device you’re reading this on has, most likely, dozens of SQLite databases. It is famed for its reliability, efficiency, portability, and correctness.

SQL is a very complex language. The parser for SQLite’s SQL dialect, the program which converts a query string into a form suitable for compiling, is written in Lemon. This is a parser generator, or compiler-compiler, which uses LALR(1), an algorithm with good characteristics: it always yields an unambiguous parse, in time linear to the input, and with only one token of lookahead.

This is the same basic technique as used by yacc itself, and its various ports such as bison. These programs may be obscure, but parsers generated using them are under the hood of a remarkable number of languages. Even languages which ultimately use a hand-rolled parser will often begin with a LALR(1) generator, to ensure that the grammar defining the language is, in fact, unambiguous and context-free.

Lemon, in drh’s opinion, and mine, improves on yacc and its dialects in several important ways, which we will explore in more detail as this series continues.

All of the aforementioned programs are code generators, emitting source code which is then compiled to produce the parser specified in the input file. Canonically, the output language is C, and for Lemon, the output language is only C. Anticipating a need for myself and others to generate parsers in Zig, I decided the path forward was to port Lemon to Zig. This is a classic Yak Shave, in that the end is not the port itself, but the things I can accomplish with it once the port is complete, and then adapted for Ziggish purposes.

Yacc Shaving: Prelude

A confession: I’m the biggest D. Richard Hipp fanboy there is. I admire his patience and kindness, the rigorous quality of his work, and his success in earning a right livelihood while donating the central substance of his creations to the public domain.

A factor which appears essential to this success is his willingness to, or perhaps, insistence upon, building his own tools to support his work. For reasons I haven’t enquired about, but are hinted at in the documentation, he didn’t find yacc itself suitable for his needs in a parser generator, so he wrote Lemon instead. With this complete control over the mechanism of his parser generator, he was able to create a robust SQL parser for SQLite. Needing a revision control system for SQLite, he wrote Fossil, a “coherent software configuration management system” with much to recommend it. The SQLite web page is served by althttpd, rather than, say, Apache. Needing to improve his syntax diagram workflow for SQLite, and otherwise make technical diagrams for his documentation, he wrote Pikchr, a port of Brian Kernighan’s pic language with, as we must expect, various changes and improvements to suit his needs. Not a complete list, even!

Pikchr is where this yacc shave begins. Lacking in myself a certain humility, and seeing opportunities to improve Pikchr, at least from my perspective and toward my own purposes, I’ve been hacking on it on and off for the last couple of years. That work isn’t complete, so I won’t be linking to it, but the curious can find the story and the fork on the forum.

Pikchr, and you might have guessed this already, is parsed with a Lemon grammar. Having spent a good chunk of my professional life researching parsing theory under a DARPA grant, with my focus being Parsing Expression Grammars, I was familiar with the compiler-compiler school of parser generation, and was poised to appreciate the subtle ways in which Lemon is better than the mainstream of such tools.

It was my work on parsers which lead me to Zig, and that has kindled a love affair which shows no signs of slowing. Recently, as I prepared to hand-roll yet another parser1, I woke up one morning and decided to solve the problem that particular parser represents several times, rather than just once. Perhaps, in the process, easing the solution to similar problems for others as well.

A Wild Strategy Appears!

Some time ago, on the Pikchr forums, I came across a blog post on porting2 Pikchr to the Go language, for easier integration into the Hugo static site generator. This, naturally, required porting Lemon to Go as well. This gave me a template (quote from the linked post):

  1. Translate lemon.c to Go, but still generate C code

  2. Find all the bugs until it compiles and generates the exact same C code

  3. Translate the parser template, lempar.tpl to Go, and generate Go code

  4. Laboriously find all the bugs

The post has two more steps, but I’m not intending to translate Pikchr. The plan was afoot: substitue Zig for Go in the above recipe, and write lemon.zig for great justice!

Unlike Go, using Pikchr from Zig programs would be very nearly trivial. The interface consists of a single function, and Zig is not only world-class in its support for linked C libraries, but ships with a C compiler as well.

So much so, that I considered keeping my port of Lemon in C, and merely changing the code generation to emit Zig instead. Evaluating that strategy convinced me that it wouldn’t be that much more work to just port the whole thing, and so, uttering the Programmer’s Credo3, I rolled up my figurative sleeves and got to work.

In the next posts, we’ll do a blow-by-blow of that project. In the process, I’ll explore the ways in which Zig and C differ, the ways in which they are congruent, and why Zig is the most credible candidate to replace C in existence.

  1. The alert reader might fairly suspect that this parser is intended to be a parser-parser. This is correct. Yo dawg I heard you like parsing? Also correct.

    1

  2. My thanks to the author for not making the “yacc shave” joke. I would have beaten it to death anyway, but felt bad about it.

    2

  3. “We do these things, not because they are easy, but because we thought they would be easy”.

    3

Part Two: Lemon.c: A Critical Review