3/14/26
The shape of compilers originates from a long lineage of systems, each mostly built on top of previous foundations. As a result, we still see the ghosts of many constraints from the 70s in modern designs.
When two systems wish to interact, they require a shared representation; I call this a seam. Seams exist everywhere, from biological to social to computational domains. When you wish to pick up a pencil, your cerebellum transforms the high-level intent into intricate nerve-firing patterns; that’s a seam. When you want to allocate a spot in memory and type malloc(), that’s a seam.
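As a toy illustration (the JSON wire format, the `allocator_*` names, and the fake address are all invented for this sketch, not from any real allocator): a seam is just a representation both ends must agree on.

```python
import json

# A seam: two systems interact only through an agreed representation.
# Here the "wire format" is a JSON object with agreed-upon keys.

def allocator_request(size: int) -> str:
    """Caller side of the seam: encodes intent into the shared shape."""
    return json.dumps({"op": "alloc", "size": size})

def allocator_handle(message: str) -> dict:
    """Allocator side: decodes the shared shape back into intent."""
    request = json.loads(message)
    assert request["op"] == "alloc"   # both ends must agree on "op"
    return {"address": 0x1000, "size": request["size"]}  # fake address

print(allocator_handle(allocator_request(64)))
```

If either side changes its idea of what "op" or "size" means, the other side must change in lockstep; that obligation is the seam.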
The thing about seams is that they accumulate complexity by requiring both ends to agree on exactly what goes in and what comes out. The more interdependent the systems around a seam are, the more they cascade upwards into exponential trees of complexity, becoming ecosystems orders of magnitude larger than the seam that spawned them. The mechanism through which this occurs is local solutions to non-local problems: a seam inherently makes any problem domain non-local unless the systems on either end are in perfect sync. The source of this non-locality is that the raising and lowering actions of the seam must be symmetric: if the information that goes in comes out the other end differently, that asymmetry requires handling. The shape this most often takes is the reconstruction of destroyed information, information that was discarded because of bandwidth concerns.
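A hedged sketch of that asymmetry (the helpers are invented): the lowering side discards variable names to shrink the representation, so the raising side on the other end can only reconstruct placeholders.

```python
# Lowering destroys information for compactness; raising must then
# reconstruct it, and can only approximate what was thrown away.

def lower(names):
    """Lowering: replace names with slot indices; the names are destroyed."""
    slots = {}
    return [slots.setdefault(n, len(slots)) for n in names]

def raise_(slots):
    """Raising: the original names are gone, so invent placeholder ones."""
    return [f"v{s}" for s in slots]

print(raise_(lower(["x", "y", "x"])))   # ['v0', 'v1', 'v0'] -- "x" is now "v0"
```

The round trip is not the identity: what went in as `"x"` comes out as `"v0"`, and every downstream consumer now has to cope with that mismatch.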
A seam’s width is its bandwidth: the minimal input the raising system accepts. In a network this is literally bandwidth; in compilers it’s computer memory. There was a time when memory was limited enough that bandwidth was a real constraint, which necessitated seams between processing stages: you simply couldn’t hold the entire program in memory without making it smaller in some way or doing less with it. These days this is far from the case, but the ghost of this constraint still haunts many systems. Once a seam is in place, you can’t just remove it; it’s integral to the very function of the system, for it defines the shape and every assumption that system makes.
The two ends of a seam are raising and lowering, or upwards and downwards abstraction, or decoding and encoding, take your pick. Most systems have both a raising and a lowering component, as well as a middle processing stage that takes advantage of the fully raised state. I call these TAST (raising), DRE (processing), and MIX (lowering); essentially any processing system can be broken down into one or all three of these. For instance, ANTLR is a TAST system: it raises a grammar specification into a parsed tree. LLVM is a MIX: it lowers IR to platform-specific binary. The DRE is where the implementation of most programs lives: Hindley-Milner is a DRE system, borrow checking is DRE, type systems are DRE.
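A minimal sketch of the three roles on an invented one-operator expression language (the function names mirror the essay's terms; the toy language and its stack machine are mine):

```python
# TAST -> DRE -> MIX on a toy "a+b" language.

def tast(text: str):
    """Raising: parse 'a+b' into a tree (op, lhs, rhs)."""
    lhs, rhs = text.split("+")
    return ("+", int(lhs), int(rhs))

def dre(tree):
    """Processing: the actual work -- here, constant folding."""
    op, lhs, rhs = tree
    return lhs + rhs if op == "+" else tree

def mix(value):
    """Lowering: emit instructions for a fictional stack machine."""
    return [f"PUSH {value}", "HALT"]

print(mix(dre(tast("2+3"))))   # ['PUSH 5', 'HALT']
```

Note how little of the pipeline is the DRE: one line of folding, bracketed by parsing and emission on either side.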
What I’ve noticed is that almost no compiler is just one of each. Instead, they’re often multiple stages of TAST->DRE->MIX, each one lowering the representation only to have the next system raise it again for its DRE, so it can lower it again, and then the next system raises it again… From text to silicon this happens anywhere from 6 to 12 times depending on the compiler and OS implementation, with many smaller seam navigations in between. What this produces is systems where 70% of the implementation is spent on the TAST and MIX despite the only real goal being the DRE, often at a massive cost in performance and maintainability. The thing is, these seams are almost entirely artificial to the problem domain; when they’re properly removed, suddenly C is 1500 lines of handlers and not 2 million lines of seam management. The key is this: instead of lowering and raising the same information between subsystems, have each system mutate one stream of information into the shape needed by the next pass. That means one protocol, one representation, one decision on what an object even is.
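A toy contrast between the two architectures (both pipelines and all names are invented): style A lowers to text between passes, forcing the next pass to re-raise it, while style B mutates one token stream from pass to pass.

```python
# Style A: each stage lowers to a string, and the next must re-raise it.
def stage_a1(tokens):
    """DRE (uppercase the keyword), then lower the result to text."""
    return " ".join(t.upper() if t == "let" else t for t in tokens)

def stage_a2(text):
    """Re-raise the text into tokens, then do this stage's DRE."""
    tokens = text.split()                     # seam navigation: parse again
    return [t for t in tokens if t != "LET"]  # DRE: strip the keyword

# Style B: one representation, mutated in shape by each pass.
def pass_upper(tokens):
    return [t.upper() if t == "let" else t for t in tokens]

def pass_strip(tokens):
    return [t for t in tokens if t != "LET"]

tokens = ["let", "x", "=", "1"]
print(stage_a2(stage_a1(tokens)))       # ['x', '=', '1']
print(pass_strip(pass_upper(tokens)))   # ['x', '=', '1']
```

Both produce the same result, but style B never leaves the token representation, so there is no lowering/raising pair whose symmetry has to be maintained.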
This is the core principle underlying what I’m working on with GDSL.