Wednesday, September 28, 2011

Transforming Large Unions in F#

Writing compilers is not a really a niche activity. If you think of it, every programmer is doing precisely that, defining a specific language for a given domain and then explaining its semantics to the computer. F#, ML and Haskell programmers know that union types are a life-saver for dealing with languages systematically. These types do, however, sometimes grow large, becoming less and less nice to work with.

If a 30-case recursive union type, or 2-3 co-recursive 20-30 case union types, is something you have to deal with, consider writing a transform function, a shallow map over the immediate children. This simple addition helps your more complex transformations focus on the cases they really need to focus on, leaving the boilerplate work to transform. The gist below provides an extended example.



There is, of course, a downside to using such shortcuts. The compiler no longer forces you to re-read and mentally check all transformations when you change the Expression definition, for example by adding new cases. If you add LetRecursive, the code will still compile but will be broken because of the catch-all pattern in getFreeIdentifiers.

Some readers will note that these transformations can be thought of cata-, ana-, para-, or some-such morphisms and can be treated directly as such. Others will perhaps suggest using reflection or meta-programming to automatically derive transform, or even more complex transforms. As always, you should use what works for you.

As for me, I find that I am not yet comfortable with climbing the abstraction ladder much further. It seems to inevitably involve some loss of control. It helps a lot to have explicit control over doing top-down, bottom-up, or mixed traversals, as evident in recursive function definitions. The technique also scales easily to languages with expression-statement distinction. Most importantly, my brain does not explode when I look at such code the way it does when dealing with monstrous morphisms.

To summarize, factoring out mapping over intermediate sub-expressions helps a lot to reduce boilerplate in compiler code while keeping this code relatively easy to read and understand. The technique, although presented in F#, applies equally well to any language with algebraic
datatypes.

Is all this practical? We found it very useful in WebSharper, the F#->JavaScript platform my company is developing. There we have union types to represent JavaScript syntax, simplified optimisation-friendly JavaScript-like language we call Core, and F# itself, as starting from version 2.4 we had to use our own quotation type in an attempt to avoid reflection.

No comments:

Post a Comment