Strange Loops

terminology

Mon Apr 06 13:22:38 -0700 2009

In Godel, Escher, and Bach, the author describes a phenomenon called strange loops. Strange loops are things that are defined in terms of themselves. Like MC Escher’s hands drawing themselves:

Self-referencing systems are more than just visual candy; they’re the backbone of many key conceptual systems used by humankind. For example, a dictionary defines the meaning of words, but what do you get when you look up a word? More words. What happens when you look up those words? More words. It’s turtles all the way down.

Some strange loops in software: gcc is written in C. Rubinius is written in Ruby. Postgres stores its internal data in itself.

In computing, the verb we use for strange loops is bootstraping (or “booting” for short). The metaphor for bootstrapping is grabbing yourself by the straps of your boots and pulling hard enough that you levitate. On its surface, writing gcc in C, Rubinius in Ruby, gcc in C, or storing Postgres in Postgres seem as nonsensical as the literal image of bootstrapping.

I suspect most development tools and deployment infrastructure bump into this sooner or later. The author of TextMate probably uses TextMate to edit the source. PostgreSQL stores internal information about users, databases, and tables in a set of internal tables (pg_*). At boot, the Linux kernel reads the device driver for being able to access the hard drive from that very same hard drive.

One way to build software defined in terms of itself is primitives. Instead of thinking as the software as one unified mass - like Ruby, the language, with all its features and add-on libraries - Rubinius builds up some basic pieces of the Ruby language to create a sort of proto-Ruby, and from that it becomes possible to define everything else that Ruby does.

gcc uses primitives. If you’re ever compiled it from source, you know that there is a multi-pass process, with the initial pass building a rudimentary compiler that can compile its more complex sibiling. I used a simple version of this technique in rush, where there are fifteen or so primitives defined in the connection class, from which an infinite number of more complex functions can be built.

There’s another sort of strange loop going on to bootstrap self-hosted languages like C or Rubinius: using an ancestral version of a tool to build a newer one. The entire build process for Rubinius is controlled by rake, which is of course a Ruby program. Rubinius can count on the “ruby” command always existing - for now, it’s Matz’ Ruby Intepreter (MRI, or what most of us just think of as Ruby 1.8), but in the future it may be a previously compiled version of Rubinius.

Ken Thompson used this bootstrapping technique to insert a backdoor into the original version of unix, it should be noted. So this technique is not without its potential downsides.

Strange loops in software are like mental mobius strips: challenging to get your head around, but fun (and mind-expanding) when you do, and a powerful technique for building infrastructure tools. Very meta.