Merlin Development Diary
I am working on a Lisp-based term rewriting language named Merlin. This is the story of its design and implementation, presented in a not-particularly-structured manner.
July 26, 2016: First Steps
I implemented the ﬁrst bit of the interpreter today — after spending way too long time thinking about methods, approaches, and design principles. Sometimes I spend too much time philosophizing rather than hacking.
As it stands, the interpreter isn't much more sophisticated than a little
calculator that shows intermediate results. With
TRACE-STEPS on, here's one
(+ (* 10 (+ 1 1)) (- 40 20) (+ 1 1)) (+ (* 10 2) (- 40 20) (+ 1 1)) (+ 20 (- 40 20) (+ 1 1)) (+ 20 20 (+ 1 1)) (+ 20 20 2) 42
Rewriting proceeds leftmost-innermost. The rewriter simply traverses a term from the left to the right, and upon ﬁnding the ﬁrst reducible subterm, it recursively dives in and performs a rewrite step.
Performance is abysmal: The entire term up to the ﬁrst reducible expression gets traversed on every rewrite step. That doesn't matter at this point; once I start preparing to actually compile this thing, I'll come up with a more efficient representation.
July 20, 2016: Introducing Merlin
Inspired by a stray thought about what might come out of
crossing pure Lisp with a term rewriting engine,
I am developing a new programming language. I have decided
to keep a log of my
descent into insanity
progress here, to track my thoughts during the project. Or
possibly to serve as a warning to others.
First of all, I should probably explain why this idea appeals to me. The essence of Lisp — the one deep idea from which the entire character of the language ﬂows — is that code is data. The famous macro facilities are a reflection of this; they're one interesting thing we can do when code is data. This is also why Lisp has its distinctive syntax; the S-expressions are a notation for trees, and Lisp code is built out of trees.
But what if we pushed that idea even further, such that code remains data at runtime? The same linguistic structures that are used for describing programs would then form the basis of executing them as well. Computation, then, is transformation of trees into other trees, starting with the program and its inputs, and proceeding until it has been transformed into its output. Computation begins and ends with linguistic structures, and programming is designing the rules for transforming them.
If code is data, then data can just be inert code; trees that don't induce transformation. Taken to an extreme, we then have "language all the way down"; linguistic structures are used pervasively. The runtime could provide a whole range of interesting external representations. A graphics package, for example, could just be syntactic sugar for a language of graphics terms. There are also other delightful implications: A coherent explanation readily emerges for how all the usual language implementation tools—compilers, partial evaluators, debuggers—ﬁt into the metaphor of the language. They become explainable within the language's own reality.
This is term rewriting. It's mathematically well-explored stuff, but few programming languages have been based on it. There are a few ﬂies in the ointment too, though.
Error handing in term rewriting systems is weird. An
expression that doesn't have any rules applicable to it
isn't an error, it's just a normal form that can't be
rewritten any further. In other words, unless special care
is taken, an expression like
(+ 42 "horse") isn't a type
error. This implies that errors become evident arbitrarily
far from where they occurred, and you end up with a
half-rewritten program rather than an error message. There
are several ways of dealing with this that I want to
Data abstraction is another potential problem. It's very easy to build arbitrary data structures as inert terms, but they expose all their concrete structure by default, possibly leading to deeply intertwingled messy code. This can probably also be remedied with a little bit of clever language design.
It will probably be hard to compile efficiently. I'm looking forward to the challenge.
My main motivation for this project is simply that the idea just feels right in my brain. In fact, the overriding reason for the entire design is to optimize for how I personally would like to program. Nobody in their right mind would want to pay me for doing something like this, so you should probably view it as a personal creative expression rather than as a "real" product. I guess you just reach a point in your life where you feel the burning need to invent a new programming language for your own ends.
There are many things I'd like to use this language for, some perhaps of dubious sanity. At the end of the day, a large part of why I'm doing this is just the sheer joy of programming language development, and Merlin is also a fun project for me to play with and in my free time. I've started this round of hackery in my summer holidays, so here we are: I have summer evenings to burn, I have a large playlist of coding-appropriate death metal, I have an idea and I have a whole galaxy of caffeine delivery mechanisms. Let's see where this goes, and happy hacking to one and all.