DATARAMA

Merlin Development Diary

I am working on a Lisp-based term rewriting language named Merlin. This is the story of its design and implementation, presented in a not-particularly-structured manner.

July 26, 2016: First Steps

I implemented the first bit of the interpreter today — after spending way too long time thinking about methods, approaches, and design principles. Sometimes I spend too much time philosophizing rather than hacking.

As it stands, the interpreter isn't much more sophisticated than a little calculator that shows intermediate results. With TRACE-STEPS on, here's one output example:

(+ (* 10 (+ 1 1)) (- 40 20) (+ 1 1))
(+ (* 10 2) (- 40 20) (+ 1 1))
(+ 20 (- 40 20) (+ 1 1))
(+ 20 20 (+ 1 1))
(+ 20 20 2)
42

Rewriting proceeds leftmost-innermost. The rewriter simply traverses a term from the left to the right, and upon finding the first reducible subterm, it recursively dives in and performs a rewrite step.

Performance is abysmal: The entire term up to the first reducible expression gets traversed on every rewrite step. That doesn't matter at this point; once I start preparing to actually compile this thing, I'll come up with a more efficient representation.

July 20, 2016: Introducing Merlin

Inspired by a stray thought about what might come out of crossing pure Lisp with a term rewriting engine, I am developing a new programming language. I have decided to keep a log of my descent into insanity progress here, to track my thoughts during the project. Or possibly to serve as a warning to others.

First of all, I should probably explain why this idea appeals to me. The essence of Lisp — the one deep idea from which the entire character of the language flows — is that code is data. The famous macro facilities are a reflection of this; they're one interesting thing we can do when code is data. This is also why Lisp has its distinctive syntax; the S-expressions are a notation for trees, and Lisp code is built out of trees.

But what if we pushed that idea even further, such that code remains data at runtime? The same linguistic structures that are used for describing programs would then form the basis of executing them as well. Computation, then, is transformation of trees into other trees, starting with the program and its inputs, and proceeding until it has been transformed into its output. Computation begins and ends with linguistic structures, and programming is designing the rules for transforming them.

If code is data, then data can just be inert code; trees that don't induce transformation. Taken to an extreme, we then have "language all the way down"; linguistic structures are used pervasively. The runtime could provide a whole range of interesting external representations. A graphics package, for example, could just be syntactic sugar for a language of graphics terms. There are also other delightful implications: A coherent explanation readily emerges for how all the usual language implementation tools—compilers, partial evaluators, debuggers—fit into the metaphor of the language. They become explainable within the language's own reality.

This is term rewriting. It's mathematically well-explored stuff, but few programming languages have been based on it. There are a few flies in the ointment too, though.

Error handing in term rewriting systems is weird. An expression that doesn't have any rules applicable to it isn't an error, it's just a normal form that can't be rewritten any further. In other words, unless special care is taken, an expression like (+ 42 "horse") isn't a type error. This implies that errors become evident arbitrarily far from where they occurred, and you end up with a half-rewritten program rather than an error message. There are several ways of dealing with this that I want to explore.

Data abstraction is another potential problem. It's very easy to build arbitrary data structures as inert terms, but they expose all their concrete structure by default, possibly leading to deeply intertwingled messy code. This can probably also be remedied with a little bit of clever language design.

It will probably be hard to compile efficiently. I'm looking forward to the challenge.

My main motivation for this project is simply that the idea just feels right in my brain. In fact, the overriding reason for the entire design is to optimize for how I personally would like to program. Nobody in their right mind would want to pay me for doing something like this, so you should probably view it as a personal creative expression rather than as a "real" product. I guess you just reach a point in your life where you feel the burning need to invent a new programming language for your own ends.

There are many things I'd like to use this language for, some perhaps of dubious sanity. At the end of the day, a large part of why I'm doing this is just the sheer joy of programming language development, and Merlin is also a fun project for me to play with and in my free time. I've started this round of hackery in my summer holidays, so here we are: I have summer evenings to burn, I have a large playlist of coding-appropriate death metal, I have an idea and I have a whole galaxy of caffeine delivery mechanisms. Let's see where this goes, and happy hacking to one and all.

 
Powered by Plutonium