Friday, December 30, 2011

Solving F# Async.StartChild Leak: Futures

I discovered a memory leak in Async.StartChild and here discuss a workaround based on a Future abstraction.

I noticed what appears to be a memory leak in F# standard library function Async.StartChild. This happened in a context of a socket server, where I attempted to perform socket reading and writing in parallel. It seems that memory use slowly grows and memory profiler points to some CancellationTokenSource-related objects not being released.

As a non-leaking alternative, I used my own abstractions. The basic idea is to use synchronizable events. Unfortunately Event is already used in F# to mean something different, so I will use the word Future instead. If you know F# events, the key problem is that subscribing to events after they happen is meaningless, for example this code procuces nothing:

let test () =
    let e = Event<_>()
    e.Trigger(1)
    e.Publish.Add(printf "%i")

In contrast, Future objects retain the value. For simplicity, I allow subscribing and triggering only once. In addition, the sample includes Executor code. I found by experimentation that running short-lived coordination tasks on a single thread instead of the ThreadPool is beneficial. Enjoy:

Symbol Interning in WebSharper 2.4

When WebSharper compiles F# to JavaScript it preserves namespaces, module and class nesting to make it easy to navigate the compiled code from JavaScript shell. Roughly speaking, A.B.C.D.E identifier in F# can be found by typing A.B.C.D.E in JavaScript.

This poses a challenge: as you can imagine, emitting long qualified identifiers everywhere is not a good idea for compact code generation. To save space WebSharper 2.4 does class/module interning. The basic idea is to say L=Microsoft.FSharp.Core.ListModule once and then say L.ofArray at every use site.

An example of this in action can be seen below:

Thursday, December 29, 2011

F# Frustration: More Async Memory Leaks

Can anyone explain why this code leaks memory?

Looks like using Async.StartAsTask makes it complete in constant space.

Friday, December 23, 2011

Hacking Type Classes in F#

Recent FPish FPish discussion focused on some hacks available in F# to write code that resembles using Haskell type classes. I particularly enjoyed the comments by Gustavo Leon and Loic Denuziere.

To cut the long story short, before compiling to .NET F# expands methods delcared inline and does overload resolution. This was intended to support flexible operator overloading, but opens up the door for interesting hacks. Even code that generalizes over higher kinds and therefore cannot exist at .NET level can with these hacks exist at the inline F# level.

Although these remain hacks (ugly code, unreadable inferred types, fragility to changes when type inference stops working, difficulty in scaling to more complicated examples), I cannot help but enjoy it. Here is a writeup that infers JSON serialization automatically over lists at the inline level:

Thursday, December 15, 2011

Making Async 5x Faster

In this article I discuss why F# Async is a good thing for writing concurrent software on .NET and show how to implement your own Async specialized for low-concurrency use. As a sample application, I look at a simple CML-style blocking channel. 30-50 lines of custom async and threadpool implementation increase the throughput from 100-400 K to 1M messages a second.

Concurrency? Again?

It is hard to believe that after so many man-years of computer science research anyone would still have quetions about how to write concurrent software. But I do, if only out of ignorance.

My specific problem right now is writing a decently scalable web server. It is really nothing more than developing a sensible bridge API between a web application and the OS, with a bit of parsing thrown in. All heavy lifting is already done by IO completion ports available to .NET code through System.Net.Socket and SocketAsyncEventArgs types.

Blocking and Non-Blocking IO

In a web server, there are two flows of data: one flow comes from the client via the server to the application, and the other goes in the reverse direction. Imperative programming answer to this the System.IO.Stream object which your app uses to read and write. It is easy to use, but there are problems.

The first obvious problem is that writes cannot really block. If you attempt to write some data, the client socket may not be ready. In fact, it can not become ready for a while. Making the thread wait for it would be lovely, but .NET threads are expensive. Not a good idea if one is to cope with thousands of concurrent clients.

Fine, how about Non-Blockig IO? It seems to solve the problem as application code resumes immediately. But what then, where does the data go? We need a buffer of unbounded size somewhere to hold the data. What happens here is the application receives no feedback when it produces data too quickly. A pathological case would be generating an 8G file.

Reading from a System.IO.Stream poses even more problems. What if the data is not there yet? Blocking wastes a thread, not blocking makes the application retry but requires programming arbitrary timeouts or busy-waits. In light of this it seems much more reasonable to invert control and have the server write data to the application instead as it arrives.

First-Class Continuations

From now on let us consider designing a writer that does not block but also does not require an unbounded buffer, throttling the data producer if the consumer does not keep up. This is an instance of the classic tradeoff between asynchronous and synchronous channels.

Ideally the language would support efficient first-class continuations. Then one would reify the continuation and shove it somewhere, to resume computation later when the client is ready:

(call-with-current-continuation save-for-later)

This, as far as I understand, is indeed how John Reppy's Concurrent ML was originally implemented. CML first-class channels are unbuffered: you can read and write on them, but both writers and readers block until they are matched. Sounds good, but is this not back to blocking threads again?

Enter Async

The answer greatly depends on what you define as a thread. CML used light-weight threads built on first-class continuations, benchmarking at millions of synchronizations per second. Without first-class continuations, what can .NET do? The answer is F# async: we can model threads as chains of callbacks, then they become first-class and capturable.

As a first cut, let us forget about Async. Just a simple channel that accepts continuation functions with its Read and Write operations. If a match is available, the continuation is invoked immediately, otherwise it is queued for later. Locking is designed to minimize the critical section region:

Note: that the code assumes a spawn : (unit -> unit) -> unit function that forks of the computation on a thread pool. It is just a form of trampolining: one could call it immediately but that makes it a non-tail call and can lead to stack overflow. You could probably work around the need to do this..

Benchmarking Message-Passing

Now let us check out the implementation with F# Async. Note that it is very easy to write adapters to Read and Write methods on the channel to give them an Async signature. F# Async syntax also makes it very easy to define new "threads" - these are exactly what they are conceptually. Although async code may jump between real .NET threads, it is logically a single thread and allows, for example, to safely use single-threaded data structures.

For the benchmark we define two async chains: one pushes a series of numbers on the channel and another one reads them and computes their sum. Then we run them in parallel.

Great. I have been getting 100, 200, sometimes if I am very lucky up to 400 K hops on this benchmark. That sounds like a good enough result at it might be for some applications. However, it looks like the benchmark is using all 4 of my cores. In fact, they seem to spend a jolly lot of time coordinating an essentially sequential computation.. More on this later.

Async is The Answer

Performance issues aside for the moment, the take home message I want you to get is this: Async is the answer for how to write concurrent code on .NET. The writer interface I talked about earlier would essentially be a form of channel passing byte array segments. The key is coordination: Async makes it easy to write and coordinate code, and does not block .NET-level threads.

But.. How About Performance?

Sceptics would say Async is a performance hog here. Just think of all the callbacks it allocates, the GC pressure, the thread hopping and work stealing, and what not! SocketAsyncEventArgs examples out there seem to offer plenty of evidence that one can design socket servers that do not allocate at all, somehow chaining the components together with preallocated callbacks.

I thought a lot about this and even tried to write some of those optimized solutions. Upon reflection, the criticism is probably valid but it is more a comment about the implementation, not the concept. Async provides a general, easy to read, understand and proofread way to orchestrate concurrent code: I simply do not see a viable general alternative. Attempting to specialize code by hand seems to yield impossible to maintain and fix spaghetti. If you know a competitive way to structure such code sensibly I will be very happy to hear about it.

For Real Performance Junkies

For real performance junkies like me, the lesson is this: if you are unhappy with the peformance of beautiful Async code, before rewriting it to spaghetti consider specializing Async implementaion. In fact, try to get the F# compiler expand the code to spaghetti instead of doing it manually. F# is not as good as GHC at this, but its inline facility does seem to help in this.

What exactly do we want from Async here? Not much. Throw cancellation and exception handling out the window. Throw thread hopping and parallelism out the window. Assume that all our callbacks execute quickly and without exceptions. Inline everything in hope that F# will reduce at least some of the intermediate lambdas at compile time, and lower GC pressure. With this in mind I got something like this:

The idea is that since we are not using all the features of F# async, our specialized implementation will be a better fit. Incidentally, the same applies to the thread pool. Instead of scheduling many tasks let us just run one (you can have N if you want) threads dedicated to running these short-lived callbacks in a tight loop. Forget work stealing, cancellation, and so on: our objective is to maximize the peformance of the mostly-sequential processes. Here is my humble attempt:

Drumroll..

With these changes, I managed to get up to 1M hops. A lot better. The key observation is that having a single thread in the pool is critical. As more threads fight for work, performance quickly decreases: the process is essentially sequential and likes to run as such. Inlining in async also seems to help but just a little. Having a simpler async helps too but the vanilla one can be tolerated.

Is this result relevant to web servers? I hope so. It seems to me that webservers run mostly-sequential processes, albeit a lot of them in parallel. Dumping all those on a shared pool would make the work jump around a lot and probably be detrimental to throughput. Is specializing Async necessary to help here? Probably not. I will be glad to hear of ways to set thread or core affinity or something like that to get similar improvements.

In conclusion, Async is the right tool for concurrent code, but you might want to tailor it a bit to your application's need.

Full code of the article:

Wednesday, December 7, 2011

Coq Trivia: Dependent Pattern-Matching and Inversion

I found some more time to study Coq. One place where I stumble very frequently is case analysis of value-indexed inductive types. There are often cases that lead to contradiction. Other cases intuitively imply some variables to be equal. Vanilla match constructs gives you no help, leaving you stranded. In proof mode, inversion tactic helps a lot. However, generated proofs are huge and hard to navigate. Another option is to use dependent pattern matching directly (which is what inversion generates for you).

I was eager to figure this out because there are times when proof mode is too confusing for me as a beginner or I seem to lack control defining precisely what I want. In particular, if you do code extraction, you do not want to use inversion-generated convoluted code. But up until today I had little success. Here is the first, and very modest, success story: retrieving the head of a non-empty length-indexed list, and discharging the Nil case with dependent pattern-matching:

Monday, December 5, 2011

Stack-Allocated Lists

Don Syme and several others sent tweets to correct my claims about boxing in F#. I made the mistake of assuming that structs cannot be used as interface implementations without being boxed and allocated on the heap. As it turns out, .NET constraints allow structs to implement interfaces without being boxed. Out of sheer curiosity, I attempted to (ab)use this facility to systematically remove heap allocation. I decided to write a simple functional program that would not use the heap at all, performing all computation on the stack.

I went for encoding lists and implementing list reversal:



This method for getting rid of heap allocation would not work in C as this particular use of generics implies runtime code generation. As far as I understand, the CLR here generates new code paths for every possible list length.

My little benchmark has stack-allocated lists of 5 elements each being reversed a bit slower than heap-allocated lists. But it is fun to try. FSI reports zero GC usage.

Friday, November 11, 2011

Optimizing the Heck Out of F#: HTTP Request Parsing

As part of the WebSharper web server effort, I have been writing an HTTP request parser. Tuning the parser for performance for the common simple case (small, correct HTTP request) has improved performance 8-fold, from 30K to 250K sample requests parsed every second on a single core of my Core i3. Let me review what I have learned from this.

Indexing


Accessing array elements goes through a bounds check. Unmanaged C++ code clearly wins here. C# has unsafe regions, but F# does not. So what can we do in F# to be competitive? The only option left is using bulk operations from the standard library. BCL is not at all helpful here - it is not clear from the documentation which functions drop the check. Also, many operations one would want are simply missing.

For an example where it matters, I was not able to match the performance of System.Array.FindIndex with any F# code I wrote to do the same job.

I imagine this is a killer problem for numerical computing. With unavoidable bounds checking, one really cannot hope to design numerical code in safe managed .NET that would match fortran routines.

Specialization


Generic code gets a staggering performance hit when certain simple operations such as equality do not get specialized to the simple value type you are using. Polymorphism has a cost. Inline F# functions sometimes help here. But it is unfortunate there is no flag to monomorphise some code. MLton users, I envy you here.

Value Types


Using value types such as structs and enums reduces the GC pressure. Note, however, that they still get boxed sometimes. For example, if a struct implements an interface, code that expects an interface implementation will receive the struct boxed. This code has to be specialized to structs.

Mutation


If we care about every bit of performance, mutation matters. However, I found myself wasting lots of time trying to wrap my head around a problem thinking about it in terms of mutation. Clearly, the diagnosis is premature optimization. What I found more helpful is writing a purely functional solution and then transforming it to eliminate allocations and introduce mutation.

Note also that the GC is good enough in most cases. One cannot afford to allocate on the heap per every byte, but allocating short-lived objects does not matter much until you need to do it 100K times a second.

Profiling


Profiling is a life-saver. I used a SlimTune profiler this time. My first discovery was that using System.Collections.Specialized.NameValueCollection for headers is really expensive. It spends a lot of time computing case-insensitive hash values for the header keys. What a bother, especially when the application does not look into the headers. I settled for queuing the headers instead and exposing them as a sequence.

The profiler helps to spend your time effectively - optimizing what really matters.

Specifics of HTTP Request Parsing


The problem is rather simple: HTTP requests keep arriving and need to be parsed and forwarded to the responder thread. In the keep-alive scenario many requests arrive on the same socket. If there is pipelining, they all come at once.

What I wanted to solve here is parsing the requests incrementally, so that if half of a request arrives we say OK and suspend in mid-air until more data is available.

Iteratees are the general solution here. However, iteratees are allocating on the heap, and F#, unlike Haskell, does not do any program transformation magic to simplify them. For this reason it seems that it is not the ideal solution, at least on the byte level.

What I ended up doing instead with incomplete requests is re-parsing. The parsing logic is expressed over a TextReader-like interface. Parser return codes are Done, Error, or Waiting. If the parser says Waiting, I keep the data in the buffer. If it succeeds, the data is discarded. Errors cannot be recovered from.

To some extent micro-parsers can be combined without using the heap. The trick here is to use mutation to return the result on success. Since the return code is an enum, I can join parsers with `&&&`:

parseMethod r req
&&& skipChar r ' '
&&& parseUntil r ' ' &req.uri
&&& parseVersion r req
&&& parseHeaders r req

In case of an early error, parsing does not stop, but there is no reason to care since most requests are well-formed.

To work with TextReader-like interface and avoid allocation, I use a constant-space ring buffer that acts as a limited-size queue for bytes. Most servers limit the size of the request head to 8192, this is what I do as well. It provides its own TextReader that assumes ASCII encoding.

The most rewarding optimization was adding custom methods to the buffer and its reader to make parseUntil and r.ReadLine possible. Instead of going byte-by-byte through several layers of indirection, I switched to System.Array.IndexOf. A ring buffer needs to do it at most twice per operation.

Monday, November 7, 2011

An F# Web Server From Sockets and Up

I have implemented a simple web server in F#. The idea was to try to marry .NET asynchronous socket operations with F# async. Result: F# async seems to be the right tool for the job of webserver implementation: it makes asynchronous programming intuitive without adding too much performance overhead. The server executes 3500 keep-alive or 1000 normal request per second on my Core i3 machine, compared to 2500/500 requests per second using IIS or System.Net.HttpListener.

Asynhronous Socket Operations


Working with sockets in .NET is done with the Socket class. From the MSDN documentation, the recommended approach is to use the asynchronous methods such as AcceptAsync, SendAsync and ReceiveAsync. These methods register callbacks to be executed when data arrives or ships through the socket. As a result of the callback approach, no threads are blocked by slow connections.

Sockets and F# Async


Unfortunately, the default interface is not very intuitive. The example code is atrocious. Since the operations are callback-based, this seems like a good match for
F# async. I went for the first mapping that came to mind:



Implementing this interface is easy - it is just working around the boilerplate: creating a SocketAsyncEventArgs object, registering a callback, calling the method, checking the result for errors. I was able to express all of it in a single helper method:



Optimizations


It seems that the common optimization paths include pooling sockets, pooling SocketAsyncEventArgs, and pooling buffers to prevent memory fragmentation. The latest point is the most interesting. Socket code itself is written in unmanaged C and passing data between garbage-collected .NET code and C code is done by pinning the .NET array used as a buffer. A pinned array is never relocated by the garbage collector, so the C code
has no trouble finding it. A lot of pinned arrays to work around make garbage collector's job harder - memory gets fragmented.

To avoid fragmentation issues, instead of allocating a lot of small arrays I allocate one huge array and then lease sections of it to be used as buffers by the socket operations.

I have not yet tried pooling `Socket` or `SocketAsyncEventArgs` objects in a similar manner.

Benchmarks


For benchmarking I have used Apache Bench (ab) tool running on Arch Linux inside a VirtualBox VM. All benchmarks involved dynamically generating and serving a "HELLO, WORLD" document on my Core i3 laptop, with ab -k -c 1000 -n 10000:

ServerKeep-alive r/sRegular r/s
F# WebServer35001000
Haskell warp/wai GHC 735003500
IIS 2500500
System.Net.HttpListener?500
node.js (Windows)800400
node.js (Linux)?3000

I do not feel very good about these numbers, in particular because I have seen claims of Haskell WARP doing 90000 r/s on only slightly faster hardware (8-core Core i5). It may be that I am hitting VirtualBox networking overhead or I have not built the Haskell code with proper flags.

But for what they are worth, the numbers seem to indicate that F# async is a good enough foundation for web servers with performance in the IIS league. It does not need to be faster, it just needs to be good enough. The real advantage is that F# async code is tremendously easier to read and write than explicit callback code.

EDIT: Please do take the benchmarks with a grain of salt. They are far from comprehensive or correctly done.

Wednesday, September 28, 2011

Transforming Large Unions in F#

Writing compilers is not a really a niche activity. If you think of it, every programmer is doing precisely that, defining a specific language for a given domain and then explaining its semantics to the computer. F#, ML and Haskell programmers know that union types are a life-saver for dealing with languages systematically. These types do, however, sometimes grow large, becoming less and less nice to work with.

If a 30-case recursive union type, or 2-3 co-recursive 20-30 case union types, is something you have to deal with, consider writing a transform function, a shallow map over the immediate children. This simple addition helps your more complex transformations focus on the cases they really need to focus on, leaving the boilerplate work to transform. The gist below provides an extended example.



There is, of course, a downside to using such shortcuts. The compiler no longer forces you to re-read and mentally check all transformations when you change the Expression definition, for example by adding new cases. If you add LetRecursive, the code will still compile but will be broken because of the catch-all pattern in getFreeIdentifiers.

Some readers will note that these transformations can be thought of cata-, ana-, para-, or some-such morphisms and can be treated directly as such. Others will perhaps suggest using reflection or meta-programming to automatically derive transform, or even more complex transforms. As always, you should use what works for you.

As for me, I find that I am not yet comfortable with climbing the abstraction ladder much further. It seems to inevitably involve some loss of control. It helps a lot to have explicit control over doing top-down, bottom-up, or mixed traversals, as evident in recursive function definitions. The technique also scales easily to languages with expression-statement distinction. Most importantly, my brain does not explode when I look at such code the way it does when dealing with monstrous morphisms.

To summarize, factoring out mapping over intermediate sub-expressions helps a lot to reduce boilerplate in compiler code while keeping this code relatively easy to read and understand. The technique, although presented in F#, applies equally well to any language with algebraic
datatypes.

Is all this practical? We found it very useful in WebSharper, the F#->JavaScript platform my company is developing. There we have union types to represent JavaScript syntax, simplified optimisation-friendly JavaScript-like language we call Core, and F# itself, as starting from version 2.4 we had to use our own quotation type in an attempt to avoid reflection.

Tuesday, September 27, 2011

A 30-40% Faster and Leaner WebSharper 2.4 is Coming

We are currently finalizing the 2.4 release of WebSharper. This release is mainly concerned with bug fixes and quality-of-implementation improvements. The exciting part is that we are witnessing about 30%-35% generated code size reduction, and a 40% reduction in compilation time.

Getting there was rather tedious. One key refactoring was replacing System.Reflection with Mono.Cecil as the main workhorse for processing F# assemblies. To use Mono.Cecil, and also to avoid some of the nasty reflection bugs in the F# itself, I had to rewrite quotation metadata parsing by having a peek at F# compiler sources. I also had to refactor most of the data structures to stop assuming System.Reflection availability.

For getting better code generation, I revised our Core language for simplicity. Both Core and associated optimizer are made simpler. Compiler optimizations are more focused, targeting the common use-cases.

In particular, local tail-call elimination combined with uncurrying finally allows for automated unrolling of recursive functional code into JavaScript loops. The following code, for example, is now safe for stack use:



I am thankful for my colleagues and their effort at building the F# community site, FPish. This codebase provided an invaluable resource for testing and optimizing the compiler on a large project.

I feel that I have learned a lot on this project. It turns out that reading about compilers is not quite the same as writing compilers! You only appreciate this truth after trying. I hardly discovered anything you might call a research contribution, but there are some practical little techniques specialized to F#/.NET that made my life easier and might help you too. Stay tuned for discussions of an approach to maintain and traverse large F# union types, considerations for designing data structures, the memoizing fixpoint pattern, large F# codebase organization ideas, and other things along those lines.

Monday, September 26, 2011

Firefox fails on nested closures at 16 layers

How deep do you think you can nest your closures in JavaScript? Yes, we know there is no tail-recursion, probably no inlining either, we know we should not do this, but sometimes we still do. Before today I expected something like 64-128 levels of nesting to work out fine. In fact, Chrome chokes only at level 512 or so, due to stack overflow - fine.

EDIT: There is Bug #517781 reported 2009-09-20, and STILL NOT ADDRESSED. Please upvote it if you can.

But Firefox.. At level 16! Reference error: x16 is not defined



This is on 6.0.2. Shame.

Sunday, September 25, 2011

Earley Parsing in Haskell

After a bit of a struggle I can run simple parsing examples using my Earley algorithm implementation [1] in Haskell. At this stage the implementation is likely to be flawed, with respect to correctness and likely performance. Regardless, the point I wanted to prove to myself is that it is perfectly plausible to define context-free grammars and their interpretation in plain Haskell, without resorting to code generation, or giving up full grammar analysis. I suspect it is a well-known method but it was a pleasure to discover. See for yourself:



Aycock and Horspool's paper [2] was my source for the algorithm. I admit I did not get very far, pretty much ignoring (for now) the bulk of their paper and the automaton construction, and focusing on their review of the basics. I also had to adjust things a bit to fit Haskell's purity, and devised a slightly modified (and possibly faulty) interaction of completer and predictor.

Earley's algorithm is beautiful. Very simple, fully general (all CFGs!), cubic in worst-case but close to linear on practical grammars, and, perhaps most importantly, incremental (think completion in an IDE). A highly recommended exercise.

[1] https://github.com/toyvo/haskell-earley
[2] http://webhome.cs.uvic.ca/~nigelh/Publications/PracticalEarleyParsing.pdf

Thursday, August 25, 2011

Minimalistic Unit Testsing in F#

So you want to do some unit testing in F#. Should you really use a framework?

The blessing and the curse of it is that there are so many frameworks out there that choosing one takes about as much time as creating one. Besides, what should a unit testing framework do anyway? There are frameworks offering real features such as randomized (QuickCheck) or exhaustive (SmallCheck) testing. On the other hand, frameworks like NUnit and xUnit do not seem to offer much - just bells and whistles such as result reporting and IDE integration. Is it worth it?

To roll our own, throw out everything but the bare bones. No reflection - all tests first class. No randomized / exhaustive testing - simple tests do not need that. Simply spend a few lines to define nice syntax, and start testing. Compile the test assembly to an executable, and run it on every build.

For example:

Thursday, July 14, 2011

Combinatory Regular Expressions in Haskell

Combinatory parsing fascinates me. It is an area that has open problems, but also a lot of solved problems that make great exercises. A while ago I tried to define a combinatory regular expression matcher in F#, and was puzzled at how complex it turned out to be. As usual, marginal understanding was to blame.

Now I turned to Haskell for a change. I do think it helped, to learn more both about the problem and about Haskell. I hope to backport this exercise to the ML world, in the meanwhile here is the gist:

Thursday, April 28, 2011

Pretty-Printed Code Blocks in LaTeX

I came across PLT Scribble which appears to be the best tool available today for writing documentation. You write text with some Scheme in it, it generates HTML and LaTeX. Without much ado, I could use PLT Racket to define a lexer for F# that pretty-prints code blocks. It worked beautifully for HTML, but in LaTeX I hit a problem – how do I define a macro that typesets its argument with preserved whitespace and mono-space font, while keeping other commands as-is?

Note, verbatim does not work – it types inner commands verbatim, instead of applying them to, say, make keywords blue. There is alltt and fancyvrb but they do not work either: somehow they change the reader too much. Scribble generated commands of the form “\char’146” that LaTeX accepts in default mode but fails to accept inside alltt or Verbatim from fancyvrb.

I knew then I had no choice but to roll my own..

Enter 1970-s text processing technology. Without proper documentation either, because I do not currently have a copy of the TeX book handy. Anyhow, here is what I got after more time than I care to admit:

Friday, February 18, 2011

Groking HOAS in F#

It is 3 AM in Budapest and, quite typically, I cannot sleep since my mind is actively engaged in remembering all the functional programming tricks I have read about. So, tonight it is HOAS, or Higher Order Abstract Syntax. If you have ever implemented a De Bruijn-indexed AST for Scheme and spent a few days catching indexing bugs, you surely would appreciate the beauty of this:

Wednesday, February 16, 2011

Where F#/.NET Falls Short: Higher Kinds

Every programmer thinks their favorite language has a problem. I am no exception. I used to think that F# misses an ML module system or Haskell-style typeclasses, but I came to realize that the complaint is really about a feature called higher-kinded polymorphism. Without this feature, there is a lot of very beautiful, general, and type-safe code in OCaml and Haskell that cannot be expressed in type-safe F#. It can be expressed, but you are facing a tradeoff:

  • Sacrifice generality – this is what workflow builders do. In effect, F# has no true monads, because you cannot easily write code generalized over an arbitrary monad.
  • Sacrifice type safety by using downcasts and rely on convention.

Option #1 is by far the most popular, but very detrimental to the F# ecosystem, as it encourages duplication of specific code that could have otherwise been expressed by a single, general definition.

Option #2 is controversial. I have attempted a few encodings I personally have not yet found a practical one, and in practice usually fall back to Option #1.

To illustrate, suppose we have this Haskell code:

Functor type class is parameterized over f, which is itself generic. This is exactly the piece that .NET and F# are lacking. If one really insists on writing code that is generic in every possible Functor, not just a concrete Functor instance, one ends up having to rely on downcasts and convention:

As you can see, this approach (1) has a run-time cost and (2) gets annoying very quickly as the complexity of the code grows.

It is sad.

Thursday, February 10, 2011

The Sad State of fshtmldoc – Can We Do Better?

If you ever tried to generate HTML documentation to document the API of F#-produced assemblies, you have likely tried fshtmldoc, a tool that is a part of F# PowerPack. It does a few things well and another couple of things not so well, there is a single overarching problem this tool. It uses System.Reflection.

Why is System.Reflection evil? In fact it is not, it is simply an API designed for a purpose vastly different from ours (HTML API report generation). The side effects of using these APIs are: the runtime needs to load the documented assemblies, and their references, which is slow and error-prone. fshtmldoc often crashes as it cannot find a reference. Also, the runtime cannot unload these assemblies. So, using this from an MS Build task, for example, is not an option, as it can make your Visual Studio session leak memory.

Is there an alternative? A non-alternative is to use C#-centric tools such as monodoc or SandCastle. These tools do not know anything about F# constructs, and are likely to produce output that will easily confuse you for code that is highly functional – full of function types, unions, records.

Enter Mono.Cecil. There is the alternative reflection and bytecode-manipulation library. It addresses all of the above problems. Could we base a documentation tool on Cecil? Yes. All that needs to be done, is to detect F# constructs from metadata.

The good news is I have almost completed this – remaining tasks are fairly minor. I am testing this out on a bunch of our IntelliFactory assemblies. More coming soon..

Wednesday, February 9, 2011

Home-made Regular Expressions in F#: Thompson NFA

Russ Cox re-discovered and popularized an old (originating in a 1968 paper by Thompson) technique for implementing regular expressions: http://swtch.com/~rsc/regexp/regexp1.html. The article offers a wonderfully simple to understand presentation, and I just could not resist trying it in F#.

First, the regular expressions themselves: the source language can be nicely described with a union type, encompassing the empty string, choice, concatenation, the Kleene star, and a token parser. The  algorithm is general enough to accept the Filter case, which generalizes over character classes and the like.

The first step is to compile this definition to a non-deterministic finite automaton (NFA). As a first cut, the state of an NFA could look like this:

When writing the compiler function, however, I realized that I would want two more features for the NFA state. First, compiling the Kleene star makes it hard to tie the knot, making one wish the language was Haskell. As a workaround, I simulate mutable union fields with a helper record. Second, it is nice to be able to compare state values to use binary search trees later in the game. To do this, I label every state with an integer and use it for comparison:

The compiler looks like this:

The key insight is that the NFA corresponds to a deterministic (DFA) machine. Whenever our machine encounters choice (Branch), it follows both options simultaneously. So then, the state of this machine (DFA state) is a set of NFA states. You can easily write a transition function that takes an input token and produces a transformation on the DFA state. Moreover, this function can be cached by token and input state. Once caching is introduced, we obtain a simple yet fascinating, lazily-constructed DFA machine.

The definition of the DFA state is not surprising:

In addition to token-based transitions, I also cache IsMatch information which can be obtained by traversing the NFA states and testing if any of them contains a Match case.

Here is the sketch of the transition function:

What remains is to write an interpreter for the DFA. I will post it in the next article when it is more tested. Perhaps I will also simplify the above definitions a bit. It is typical for me to find that the first code that I write for any problem is much more complicated than necessary.

Monday, February 7, 2011

WebSharper B5 with Extensions

We have finally released the next beta (5) of WebSharper that can run the samples I have been demonstrating on the blog. In addition, the download page now lets you play with an exciting set of extensions: Google Maps, Visualization, Info Vis Toolkit, Protovis, Raphael, jQuery Mobile and jQuery UI. Check it out: http://www.websharper.com/

Friday, January 28, 2011

WebSharper Sitelets Cont’d: Developing a Blog Service

The previous article might have left you with an erroneous impression that WebSharper sitelets only support a static set of URLs. This is not the case. In fact, the user-defined Action type can be an arbitrary data structure that maps to a URL. This comes particularly handy when designing a blogging website, since the set of blogs and their URLs is not statically bounded. Something like this would do:

Now you would need to write a router, which is roughly a bijection between Action values and URLs. A boring way to do so might look like this:

Writing routers by hand you get full control – you can inspect an Http.Request value when routing, and generate any URL when linking. However writing the above code is not much fun. If you do not particularly care for the exact URLs, there is a much shorter way to get there:

The Infer function inspects the Action type by reflection to derive a router automatically. It is smart enough to handle records, unions, most scalar types, lists, and options. In our case it derives the following mapping:

A smart thing to do would be to alter it just a little bit, for example by making ShowRecentBlogs available at the root. This can be done by combining a simple table router with the inferred router:

This is much better: 80% of the benefit with 20% of the work.

Now that URL routing is out of the way, you can finally get down to the blog service itself. I present the most interesting thing first, the controller:

The controller simply handles actions (Action values) and returns responses (Content values), no surprises here. Together with a router, the controller forms a sitelet, a self-contained website ready to go by itself or be embedded in a bigger application:

For the model in this example, you can use the simplest thing that can possibly work in a development environment, namely a static mutable variable (see the listing at the end of the article). Now, views are quite straightforward as well. The only thing of interest is the use of WebSharper Formlets (client-side form combinators) to build the UI for editing and creating the blogs. I build a simple control that renders the form and talks to the model directly over RPC, redirecting to the homepage when done:

Time to try it out:

And here is the complete code listing:

Friday, January 21, 2011

WebSharper sitelets: building a two-page website

Let me show the simplest possible self-contained example involving WebSharper sitelets that are coming with the 2.1 release. You define a website with two pages (and two actions):

Let us quickly mash up a template for the website using F#-embedded HTML combinators. A template is just a function taking the body and decorating it:

Two things to notice:

  1. F# lets you define your own syntax, and the example makes liberal use of that (=>).
  2. Instead of generating URLs by hand, you ask the context to create a link to the action. This ensures safety: renaming Home action makes the project stop compiling, and moving it to a different URL keeps the links correct.

Now, you define the sitelets:

HomePage and AboutUsPage are single-page sitelets, with a single URL and a single Action. They are combined into a website by the Sum operator.

Now, a little bit of administrative boilerplate:

And you are done! Let’s browse:

s2 s1

So far so good. The pages have the expected URLs and the menu links work.

The above could have been accomplished by any reasonable web framework. Let us push the limit and spice it up a bit now. Let us add a few lines of F# that will actually compile to JavaScript:

So there is a button that changes its title when clicked. AND there is a control. Now, this while the Button lives on the client-side completely (constructs DOM nodes, in fact), the Control executes a quantum leap: the constructor executes on the server, and the body on the client. But that means you can use the Control to glue the server and the client together. Let us update the AboutUs page:

That’s it. The user will now see a clickable, JavaScript-enabled, F#-implemented button, right where you expect it. No script tags to worry about, no dependency chasing, no “ondocumentready” worries, it just works:

s2

Below is the complete listing. As soon as the 2.1 release becomes public, you will able to enjoy running it yourself. Stay tuned!

Thursday, January 20, 2011

Can WebSharper beat ASP.NET MVC at the routing game?

At work we have been having a lot of fun designing WebSharper sitelets lately. A quick recap: WebSharper is our web development framework that is all about cross-compiling F# to neatly interoperating client-side JavaScript and server-side .NET assemblies - a GWT of sorts. Sitelets are the recently introduced server-side abstraction, the new kid on the block where the bully is of course the notorious ASP.NET MVC.

ASP.NET MVC operation is all about controllers and actions: a request comes in, is matched to a controller/action, and then gets handled. At first glance it appears that controllers are classes and actions are methods. But NO! By introducing something they consider an improvement, the MVC team decoupled actions from methods and insists now that both controllers and actions are strings:

http://haacked.com/archive/2008/08/29/how-a-method-becomes-an-action.aspx

If you now want to construct a link to a particular action in a view, you have to know it by its string name. And what if it changes? Bummer, the project compiles fine and then fails at runtime.

This is particularly sad given that they used to offer a safer way:

http://weblogs.asp.net/scottgu/archive/2007/12/03/asp-net-mvc-framework-part-2-url-routing.aspx 

While ASP.NET MVC fans wait for this feature to come back, we can do better with F#. The basic idea is that we model everything first-class: an action is just a value of a user-defined Action data type, and a controller is a value as well. Consider:

Now, a controller is simply a function of type Action –> Response. And a website? A website, or a sitelet, as we call it in WebSharper, is a controller coupled with a router, where a router is a bijection from locations to actions.

While the controller is fun to write, the router is not. One tedious way to write it is to write two F# functions, one pattern-matching a location and returning an Action, and the other doing the reverse. For actions that can be listed in a finite list, a less tedious is to write a table with a 1:1 correspondence between locations and actions. But the easiest is to automate it all:

This little function will automatically infer locations such as “/HomePage”, “/AboutUs”, “/Blog/123”... All the tedious work done, time to write the controller!

And how about safe links in the view? Nothing is easier:

And if anything happens to change, F# compiler will make sure the links are still correct.

Sitelets offer a lot more than this, such as easy embedding of F#-defined client-side (JavaScript) controls or the pre-generation of the static part of the web application as a set of HTML and resource files. But none of it can beat the fun of designing routing the functional way.