Watch the recording on YouTube!
Over the last year, I've been hacking occasionally on a few different projects in my free time:
The one thing that left me unsatisfied in these projects is that I couldn't easily combine the power of Scheme with the low-level memory management and system access of C.
Scheme is designed to be garbage collected! You can access C APIs and structures, but ultimately you're still dealing with a garbage-collected language.
What if I could write my own Scheme-inspired language for systems programming?
To accomplish these goals, here are the parts that are needed:
The compiler is the obvious part. To produce standalone binaries, I'm going to need a compiler that takes source code in the language and produces an intermediate representation, which can be optimized and ultimately turned into a form of assembly.
The intermediate representation will have only the necessary code prepared in a format that is ready to be converted to machine language.
One other job of the compiler would be to manage just-in-time (JIT) compilation at runtime. Since this is a Lisp, we definitely need the ability to dynamically evaluate code!
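To make the idea of an intermediate representation more concrete, here is a minimal sketch in Python of lowering a nested expression into a flat list of three-address instructions. Everything in it is an assumption made for illustration: the tuple encoding of S-expressions, the `lower` and `compile_expr` helpers, and the `t0`, `t1` temporary names are hypothetical and are not taken from the actual compiler.

```python
from itertools import count

def lower(expr, code, temps):
    """Flatten a nested expression into three-address instructions.

    Appends instructions to `code` and returns the name (or literal)
    that holds the value of `expr`.
    """
    if not isinstance(expr, tuple):
        return expr  # literals and variable names are already atomic
    op, *args = expr
    # Lower the arguments first so their results exist before they are used.
    operands = [lower(arg, code, temps) for arg in args]
    dest = f"t{next(temps)}"  # fresh temporary for this result
    code.append((dest, op, *operands))
    return dest

def compile_expr(expr):
    """Return (instructions, name of the final result) for one expression."""
    code = []
    result = lower(expr, code, count())
    return code, result

# The Scheme expression (+ 1 (* 2 3)) written as a nested tuple.
ir, result = compile_expr(("+", 1, ("*", 2, 3)))
for dest, op, *operands in ir:
    print(dest, "=", op, *operands)
```

Each instruction in the flat list has one operator and only atomic operands, which is the kind of shape that is easy to optimize and then hand to an assembler.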
The assembler takes a low-level representation of your code and turns it into machine instructions. You've probably seen code that looks like this:
    .text
    .global _start
_start:
    mov $1, %eax      # System call number 1: exit()
    mov $42, %ebx     # Exits with exit status 42
    int $0x80         # Invoke system call
This is a simple program that just sets the program exit code to 42 and then exits.
It is written in a way that is closer to what actually happens in machine instructions, but in reality an instruction like mov can turn into a variety of different opcodes depending on the operands it is given!
The assembler needs to know the appropriate instruction encoding for the target architecture (64-bit Intel in my case) so that it can produce the proper opcodes and use the processor effectively.
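As a rough illustration of what "knowing the instruction encoding" means, the sketch below hand-encodes the three instructions from the example program in Python. The helper names (`mov_imm32`, `mov_reg_reg`, `int_imm8`) and the tiny `REGISTERS` table are hypothetical and cover only what this example needs; a real assembler has to handle far more operand forms.

```python
import struct

# x86 register numbers (only the two registers the example uses).
REGISTERS = {"eax": 0, "ebx": 3}

def mov_imm32(reg, value):
    """mov $value, %reg -> opcode (0xB8 + register number), then a little-endian imm32."""
    return bytes([0xB8 + REGISTERS[reg]]) + struct.pack("<I", value)

def mov_reg_reg(src, dst):
    """mov %src, %dst -> opcode 0x89, then a ModRM byte naming both registers."""
    modrm = 0b11000000 | (REGISTERS[src] << 3) | REGISTERS[dst]
    return bytes([0x89, modrm])

def int_imm8(vector):
    """int $vector -> opcode 0xCD, then an imm8."""
    return bytes([0xCD, vector])

# The example program: mov $1, %eax / mov $42, %ebx / int $0x80
program = mov_imm32("eax", 1) + mov_imm32("ebx", 42) + int_imm8(0x80)
print(program.hex(" "))                        # b8 01 00 00 00 bb 2a 00 00 00 cd 80

# The same mnemonic with register operands uses a different opcode.
print(mov_reg_reg("eax", "ebx").hex(" "))      # 89 c3
```

Note how the two immediate moves fold the destination register into the opcode byte itself (0xB8 versus 0xBB), while the register-to-register form uses a separate opcode plus a ModRM byte.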
One other interesting aspect here: the assembly representation will be written in the language itself, meaning that the full power of macros can be used for low-level code generation!
The output of the assembler will be object files, typically in the system's object file format. These object files will be used by the next component in the toolchain: the linker.
The linker's job is to take the object files produced by the assembler and produce a working executable. For now I'm only focused on 64-bit Linux so I have a clear path to follow to produce ELF (Executable and Linkable Format) binaries.
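To give a feel for what producing an ELF binary involves, here is a minimal Python sketch that wraps a blob of machine code in a 64-bit ELF header and a single loadable segment. It is not how the real linker works: it skips object files, symbols, and relocations entirely. The `build_elf` helper and the `BASE_ADDR` load address are assumptions chosen for illustration, and the code bytes are the 12 bytes that the earlier exit-with-42 example assembles to.

```python
import struct

BASE_ADDR = 0x400000   # assumed load address for this sketch
EHDR_SIZE = 64         # size of Elf64_Ehdr
PHDR_SIZE = 56         # size of Elf64_Phdr

def build_elf(code):
    """Wrap raw x86-64 machine code in a minimal static ELF executable image."""
    entry = BASE_ADDR + EHDR_SIZE + PHDR_SIZE   # code sits right after the headers
    total = EHDR_SIZE + PHDR_SIZE + len(code)

    # e_ident: magic, 64-bit class, little-endian, version 1, System V ABI, padding
    ident = b"\x7fELF" + bytes([2, 1, 1, 0]) + bytes(8)
    ehdr = struct.pack(
        "<16sHHIQQQIHHHHHH",
        ident,
        2,            # e_type: ET_EXEC
        0x3E,         # e_machine: EM_X86_64
        1,            # e_version
        entry,        # e_entry
        EHDR_SIZE,    # e_phoff: program headers follow the ELF header
        0,            # e_shoff: no section headers
        0,            # e_flags
        EHDR_SIZE,    # e_ehsize
        PHDR_SIZE,    # e_phentsize
        1,            # e_phnum
        0, 0, 0,      # e_shentsize, e_shnum, e_shstrndx
    )
    phdr = struct.pack(
        "<IIQQQQQQ",
        1,            # p_type: PT_LOAD
        5,            # p_flags: read + execute
        0,            # p_offset: map the file from its start
        BASE_ADDR,    # p_vaddr
        BASE_ADDR,    # p_paddr
        total,        # p_filesz
        total,        # p_memsz
        0x1000,       # p_align
    )
    return ehdr + phdr + code

code = bytes.fromhex("b801000000bb2a000000cd80")
image = build_elf(code)
print(len(image), hex(struct.unpack_from("<Q", image, 24)[0]))
```

In principle, writing `image` to a file and marking it executable gives a program with the same behavior as the assembly example, though whether `int $0x80` works from a 64-bit process depends on the kernel configuration.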
This is the part of the code that is very OS-dependent, so I probably won't get to Windows support for a while and macOS support for even longer.
Once I have a toolchain that is capable of compiling basic code, I'll also need some kind of lightweight runtime or standard library that can implement memory management and other low-level tasks like module loading. My goal is to make this part as small as possible so that the output program isn't weighed down with unneeded code.
Any extra behavior needed by the programmer should be pulled in using modules which aren't built into the program by default.
I'll try to produce a set of modules that feel like a coherent standard library for the language, providing all the functionality and data structures you would need for day to day coding. The compiler itself may not come with these parts bundled: it could be better if they were installed as dependencies for your project where only the parts you use actually get compiled into the program.
This does mean that compiled programs wouldn't be able to load arbitrary modules that weren't compiled into them, but I might be able to find a way to make that possible, perhaps with a secondary tool.
I'm actually starting from the bottom with this project! I want to write my own standalone toolchain which doesn't depend on anything else aside from the language/compiler needed to bootstrap the project.
I've started working on the assembler and linker first, which may sound strange, but it has a few benefits:
I'm using Chibi Scheme as the bootstrapping language because it's easy to embed in a simple C application. This gives me the ability to produce a compiler executable which can produce binaries for my language before I can rewrite the compiler in the language itself.
Bootstrapping your Lisp with another Lisp is a longstanding tradition!
Primarily to learn! But also because it's an opportunity to write a fully Lisp-oriented toolchain from the ground up and optimize everything for my goals. For some reason the idea is irresistible to me: I started thinking about it a year ago and it keeps popping up, so now I've decided to go for it!
Another option is using something like LLVM, but LLVM is huge and is a whole system I'd have to learn anyway!
By writing the toolchain from the ground up, every aspect of it can be optimized for the control flow and memory management schemes of the language: continuations and hybrid GC/manual memory. It'd be similarly hard work to do this with existing toolchains, so it's better to do it myself and learn a lot in the process!
What I want in the end is a tool that is tailored to the kinds of projects I want to work on and produces the kind of programs that feel right to me. Once it's working I'm going to write a lot of the programs I've been dreaming about for years!
Part of the reason I'm telling you about this today is to see if you would be interested in hearing more about building a compiler toolchain from scratch. I'm no expert, but I can share what I'm learning over time!
The project itself won't be meant for other people to use for a long time; I'm really just building it for myself for now. However, if you want to follow along, you can sign up for the mailing list on Sourcehut here:
https://lists.sr.ht/~mesche/dev
The (empty) repository for the compiler project is here:
https://git.sr.ht/~mesche/compiler
Why the name "Mesche":
Mesche scheme