Don’t settle for a playground

This is a transcript of a talk I did at Booster 2023 in Bergen.

Here’s a scene. Imagine we’re having a stand-up meeting. A stand-up meeting, as you will be aware, is a ritual where a team of software developers stands in a circle in the hope of summoning the spirit known as agility. If a sufficient number of teams performs this ritual, it brings about a digital transformation in the organization, and everyone becomes happy and productive.

Anyway, we’re having this stand-up meeting, and something comes up. Suddenly everyone goes quiet, and gazes down at their feet. There is a moment’s awkward silence, until someone looks up from under their eyebrows and says “we’re gonna need a grown-up for that”. And then we snicker a bit, cast sideways glances, shuffle our feet, and move on to something else.

Is this scene familiar to you? Have you been in this scene? I don’t know how culturally specific it is, but I’ve been in this scene multiple times. And I must say that to me, this scene is the least dignified scene in all of software development. It makes my skin crawl, it fills me with shame, and it makes me angry.

Here’s the thing: I’m not a child, and I’m not going to cosplay as one. I’m a grown man! I’ve been married for 20 years! We did a house renovation project last year. It was hell and cost a small fortune. I have two teenagers at home. Even my kids aren’t kids anymore! If I’m not grown-up, then who the hell is?

And it’s not just me. Take a look in the mirror! Do you see a carefree child, or a person juggling a tangled mess of incompatible responsibilities and making trade-offs all day long, every day? Of course you do. Everyone does.

And still we perform that stupid little scene. At least in my part of the world we do. I’m sure I know developers who played out that scene just last week. Isn’t that strange? I think it’s strange. Why do we do it?

When we behave in strange ways, it’s often because we’re getting in touch with something uncomfortable. And so we look for a way out, a way to defuse the nervous energy and get back to normal. We use jokes for that. But we shouldn’t. We shouldn’t fall for that temptation. Discomfort is actually useful and interesting. It can be a clue, a symptom of something. If we look a little closer at that discomfort, maybe we can learn something, and maybe we can fix the problem at the source.

Obviously, it has something to do with brushing up against some kind of boundary. We find ourselves at the threshold of something, and the most obvious candidate is the good old business/IT divide. That awkward point where what we do as software developers meets the reason why the business exists. And when it feels too business-y, we retreat, and call for the real business people to take over.

When we say “we’re gonna need a grown-up for that”, what does it really mean? It’s code for something, right? What could it be? What are we really saying? Here are a few candidates:

  • We don’t have the autonomy to make that decision.
  • It’s not our responsibility.
  • We don’t have enough information.
  • It’s not just up to us, it’s going to affect someone else too.

If I’m right, and these are the kinds of things we’re trying to say, why don’t we say them outright? It seems like a shame, because they all sound like really interesting topics to me. We could have meaningful conversations about our environment, our boundaries, the constraints we operate under, our dependencies on others, our degree of ownership over the problem we’re solving, etc etc. And not just as generic, abstract topics from the realm of philosophy either, but concretely: we could build our understanding of what these big words actually mean to us, in our context. But if we just laugh it off as something awkward, we won’t be having those conversations. And so those things are going to remain unexamined and unclear.

You might say, ok, but why does it matter? What’s the harm? It’s just a joke, right? I think it matters, and that it does real harm. First of all, the child/adult trope sediments a very real divide between “the business people” that make decisions on one side, and “the developers” that carry out instructions, essentially still just building to order, on the other. This is the divide that we’ve been fighting for at least 20 years of agile software development. I would argue that it’s a really bad idea to reinforce that divide, even in the form of a joke.

And second, it’s much easier to challenge the things we say outright. We don’t have the autonomy to make that decision? Ok, let’s talk about that. First of all, is it true? Or are we just scared, or reluctant, to exercise the autonomy that we do have? And if it is true, what would need to change for us to have that autonomy? Is it possible to make those changes? Who would need to be involved? What would the consequences be? This conversation offers a path towards potentially increased autonomy, or at least clarified autonomy.

And we do want autonomy, right? I’m sure that many of you are familiar with Daniel Pink’s book Drive, about what motivates us as knowledge workers. He says there are three things: Autonomy, mastery and purpose. And autonomy is the first word in that list.

Well, what kind of autonomy is that going to be? What are we aiming for here?

Consider the difference between the autonomy of children and the autonomy of adults. If you take a bunch of children and put them in a playground, they have considerable autonomy within that playground. They can run around freely, without a care in the world. They can do what they want and play with whatever toys are available. But they can’t venture outside. They need a grown-up for that.

Adults have a very different kind of autonomy. They can do many more things than children can, but that type of autonomy comes with a bunch of strings attached, like responsibility and accountability. Right? When we make a choice, we’re willing to own the consequences in a way, or at least be able to explain to ourselves and others why we made that choice. In a sense, that’s what it means to be an adult.

Similarly, I think we can distinguish between autonomy inside and outside the developer bubble. That’s our playground. I once tried to tell a joke, that the ultimate autonomy for a developer is “THEY LET US USE HASKELL”. I thought that was pretty funny.

The thing is, no-one really cares what programming language you use. Not really. They might have some concerns about being able to replace you if and when you leave, but ultimately it doesn’t matter. It’s not important. They’ll manage. They’ll find someone. Similarly, no-one really cares if you have one or two or three monitors on your desk, or what kind of mechanical keyboard you prefer, even if it’s an expensive model. They can afford them all. It’s peanuts. It’s insignificant. It’s just toys to them. Choose whichever you like.

Now consider this: if no-one really cares what you decide, does it even make sense to think of it as autonomy?

If we want real, significant autonomy, it’s going to have to be about entirely different things! Almost by definition, it’s going to be things that are painful to yield control over, because they actually matter to the so-called business people. It’s going to require real trust, and a conviction that your team is in fact best equipped to make that decision. It must be because they realize that you actually understand the business problem the best, and that you are best situated to own the consequences of that decision. That’s what it means to own a business problem, or to focus on business outcome over output. You can’t do that if your idea of autonomy is to use Vim.

Ironically, real autonomy tends to involve exactly the things that make us snicker nervously and say “we’re going to need an adult for that”. The thing to realize is that, yes, obviously we’re going to need an adult for that. Probably several. But the good news is: here we are! The adults we need are right here! We’re standing in a circle, trying to summon agility!

Of course I realize that it’s not just up to the team itself. We’ll have to fight for that kind of autonomy, and there will always be limits. Not even adults have unbounded autonomy. But we can explore those boundaries, and see where they yield to pressure. There’s no reason to fail on the safe side.

I would like to see development teams with much higher ambitions with respect to autonomy. The aim should be to become tiny organizations in their own right: self-organizing, self-governing and self-modifying. They have a mission and a budget, and they are free to carry out that mission in the way they see fit, within that budget. They can even challenge the mission itself, because they understand why the mission is important, and how it fits into the larger scheme of things. They can pivot if they find a more useful mission to pursue within that scheme. Or they can choose to disband. Dissolve the team itself, if it turns out that mission is ill-advised or infeasible or outdated. Just pass whatever money you have left back and say “it would be better for the business to invest this in something else”. That would be the grown-up thing to do.

But of course, we must really want that kind of autonomy. The kind that comes with responsibility, accountability and real business impact. We must prefer to be adults. We can’t settle for a playground.


Something for nothing

I thought I’d jot down some fairly obvious things about values in programs.

Say you have some value in your program. For instance, it could be a String, or a Thing. Then conceptually, each String belongs to the set of possible Strings, and each Thing belongs to the set of possible Things. Right?

Like so:

[Figure: the set of possible Strings and the set of possible Things]

Of course, you might even have something like a function from String to Thing or the other way around. That’s no different, they’re just values that belong to some set of possible values. Hard to draw, though.

In programs, this notion of sets of possible values is baked into the concept of types. So instead of saying that some value belongs to the set of possible Things, we say that it has type Thing or is of type Thing.

I wish that was all there was to it, but alas, it gets more complicated. Not only do we want to represent the presence of values in our programs, sometimes we want to represent the absence of values as well. The absence of a value isn’t necessarily an error. It could be, but it could be fine, too. There are many valid reasons why we might end up with absences flowing through our programs. So we need to represent them.

This is where null enters the picture.

In languages like C# and Java – any object-oriented language that carries DNA from Tony Hoare’s ALGOL W – the drawing of the sets above doesn’t map directly over to types. For each so-called reference type (object values that are accessed by means of references), there’s a twist. In addition to the set of possible values, each reference type also allows for the value null:

[Figure: the same sets of Strings and Things, each now also including null]

It looks pretty similar, but the consequences for program semantics are significant.

The purpose of null is, of course, to represent the absence of a value of a given type. But now your type identifies a pretty weird set of possible values. In the case of Thing, for instance, you have all the legitimate actual Things, but also this weird thing that represents the absence of a Thing. So it’s decidedly not a Thing as such, yet it is of type Thing. It’s bizarre.

But it’s not just confusing to think about. It causes practical problems since null just doesn’t fit in. It’s a phony – a hologram that successfully fools the compiler, which is unable to distinguish between null and proper values of a given type. It’s nothing like the other values. Hence you need to think about it and worry about it all the time. Since it’s not really there, you can’t treat it like a proper value. You most decidedly cannot invoke a method on it, which is sort of what you do with objects. The interpretation of null is radically different from the interpretation of all the other values of the same type. (Interestingly, it’s different in precisely the same way for all types: how null sticks out from legit Things mirrors how it sticks out from legit Strings and everything else.)

But it gets worse. Once you’ve invited null into your home, there’s no way of getting rid of it! In other words, when you make null part of, say, the Thing type, you can no longer express the idea of “one of the actual, legit Things, not that spectral special ‘Thing’”. There is no way you can say explicitly in your program that you know that a value is present. It’s all anecdotes and circumstance. You can obviously take a look at some value at any given time in your program and decide whether it’s a legit Thing or just an illusion, but it’s completely ephemeral. You’ve given your type system a sort of brain damage that prevents it from forming memories about absence and presence of values: you might check for null, but your program immediately forgets about it!

So much for reference types. What about so-called primitive values, like integers and booleans? Neither can be null in C# or Java. So the question is: lacking null, how do we represent the absence of a value?

Well, one hack is to think “Gee, I’m not really going to use all the possible values, so I can take one of them and pretend it’s sort of like null.” So instead of interpreting the value literally, you override the interpretation for some magic values. (Using -1 as a special value for integers is a classic, in the case where your legit values are non-negative.) The consequence is that you now have two kinds of values inside your type, operating at different semantic levels and being used for different purposes. Your set of values isn’t just a set of values anymore – it’s a conglomerate of conceptually different things.
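To make the hazard concrete, here is a minimal sketch in Java (one of the languages discussed above). The lookup ageOf, its hard-coded data and the UNKNOWN constant are all hypothetical, invented purely for illustration. The point is only that the compiler sees -1 as an ordinary int, so nothing stops the magic value from leaking into arithmetic:

```java
final class SentinelDemo {
    // Hypothetical convention: -1 means "age unknown".
    static final int UNKNOWN = -1;

    // Hypothetical lookup, hard-coded for illustration.
    static int ageOf(String name) {
        return "Ada".equals(name) ? 36 : UNKNOWN;
    }

    // Compiles without complaint, but forgets to check for the magic value.
    static int ageNextYear(String name) {
        return ageOf(name) + 1;
    }

    public static void main(String[] args) {
        System.out.println(ageNextYear("Ada")); // a real age plus one: 37
        System.out.println(ageNextYear("Bob")); // the sentinel plus one: a bogus age of 0
    }
}
```

Nothing in the type int distinguishes the legit ages from the stand-in for absence, so the mistake in ageNextYear goes unnoticed until someone looks at the data.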

This leaves us in a situation that’s similar to the reference type situation, except it’s ad-hoc, convention-based at best. In both cases, we have two things we’re trying to express. One thing is the presence or absence of a value. And the other thing is the set of possible values that value belongs to. These are distinct, orthogonal concerns. And usually, the word of wisdom in programming is to let distinct things be distinct, and avoid mixing things up.

So how can we do that? If we reject the temptation to put special values with special interpretation rules into the same bag as the legit values, what can we do?

If you’re a C# programmer, you’re thinking that you can use Nullable to represent the absence of a primitive value instead. This is true, and you should. Nullable is a wrapper around a primitive type, creating a new type that can have either null or a legit instance of the primitive type as value. On top of that, the compiler works pretty hard to blur the line between the underlying type and its Nullable counterpart, with special syntax and implicit type conversion from T to Nullable<T>.

In a language with sum types, we can do something similar to Nullable, but in a completely generic way. What is a sum type? It is a composite type created by combining various classes of things into a single thing. Here’s an example:

type Utterance = Fee | Fie | Foe | Fum

So it’s sort of like an enumeration of variants. But it’s a rich man’s enumeration, since you can associate a variant with a value of another type. Like so:

type ThingOrNothing = Indeed of Thing | Nothing

This creates a type that distinguishes neatly between legit Things and the absence of a Thing. Either you’ll indeed have a Thing, or you’ll have nothing. And since absence stands to presence in the same way for all types, we can generalize it:

type Mayhaps<'T> = Indeed of 'T | Nothing

The nice thing is that we can now distinguish between the case where we might have a value or not and the case where we know we do have a value (provided our language doesn’t allow null, of course). In other words, we can have a function of type Mayhaps<Thing> -> Thing. This is a big deal! We can distinguish between the parts of our program that have to worry about absent values and the parts that don’t. We’ve fixed the brain damage, our program can form memories of the checks we’ve made. It’s a beautiful feat of surgery, enabled by sum types and the absence of null.

So sum types neatly solve the problem of representing absence and presence of values, on top of, and orthogonally to, the issue of defining a set of possible values for a given shape of data. What’s more, this is just one application of a general feature of the language. There is no need to complicate the language itself with special handling of absence. Instead, you’re likely to find something like Mayhaps in the standard library. In F# it’s called Option, in Elm it’s called Maybe. In languages like F# and Elm – any functional language that carries DNA from Robin Milner’s ML – you’ll find that you have both sum types and pattern matching. Pattern matching makes it very easy to work with the sum type variants.

The code you write to handle absence and presence follows certain patterns, which means you can create abstractions. For instance, say you have some function thingify of type String -> Thing, which takes a String and produces a Thing. Now suppose you’re given a Mayhaps<String> instead of a String. If you don’t have a String, you don’t get a Thing, but if indeed you do have a String, you’d like to apply thingify to get a Thing. Right?

Here’s how you might write it out, assuming F#-style syntax and pattern matching:

match dunnoCouldBe with 
| Indeed str -> Indeed (thingify str) 
| Nothing -> Nothing

This pattern is going to pop up again and again when you’re working with Mayhaps values, where the only thing that varies is the function you’d like to apply. So you can write a general function to handle this:

let quux (f : 'T -> 'U) (v : Mayhaps<'T>) : Mayhaps<'U> = 
  match v with 
  | Indeed t -> Indeed (f t) 
  | Nothing -> Nothing

Whenever you need to optionally transform an option value using some function, you just pass both of them to quux to handle it. But of course, chances are you’ll find that quux has already been written for you. In F#, it’s called Option.map, because it maps from one kind of Option value to another.

At this point, we’ve got the mechanics and practicalities of working with values that could be present or absent worked out. Now what should you do when you change your mind about where and how you handle the absence of a value? This is a design decision, even a business rule. These things change.

The short answer is that when these things change, you get a rippling change in type signatures in your program – from the place where you made a change, to the place where you handle the effect of the change. This is a good thing. This is the compiler pointing out where you need to do some work to make the change work as planned. That’s another benefit of treating the absence of values explicitly instead of mixing it up with the values themselves.

That’s all well and good, but what can you do if you’re a C# or Java programmer? What if your programming language has null and no sum types? Well, you could implement something similar to Mayhaps using the tools available to you.

Here’s a naive implementation written down very quickly, without a whole lot of thought put into it:


using System;

public abstract class Mayhaps<T>
{
    // Private constructor: only the nested classes below can derive from Mayhaps<T>.
    private Mayhaps() {}

    public abstract bool HasValue { get; }
    public abstract T Value { get; }
    public abstract Mayhaps<TR> Map<TR>(Func<T, TR> f);

    // The "Indeed" case: wraps a value that is known not to be null.
    private class MayhapsValue : Mayhaps<T>
    {
        private readonly T _value;

        public MayhapsValue(T value)
        {
            if (value == null)
            {
                // First argument is the parameter name, second is the message.
                throw new ArgumentNullException("value", "Begone, null!");
            }
            _value = value;
        }

        public override bool HasValue
        {
            get { return true; }
        }

        public override T Value
        {
            get { return _value; }
        }

        public override Mayhaps<TR> Map<TR>(Func<T, TR> f)
        {
            return Mayhaps<TR>.Indeed(f(_value));
        }
    }

    // The "Nothing" case: there is no value to hand out or to transform.
    private class MayhapsNothing : Mayhaps<T>
    {
        public override bool HasValue
        {
            get { return false; }
        }

        public override T Value
        {
            get { throw new InvalidOperationException("Nothing here."); }
        }

        public override Mayhaps<TR> Map<TR>(Func<T, TR> f)
        {
            return Mayhaps<TR>.Nothing;
        }
    }

    // One shared instance per T is enough: all Nothings are alike.
    private static readonly MayhapsNothing _nothing = new MayhapsNothing();

    public static Mayhaps<T> Indeed(T value)
    {
        return new MayhapsValue(value);
    }

    public static Mayhaps<T> Nothing
    {
        get { return _nothing; }
    }
}


Now you can write code like this:

var foo = Mayhaps<string>.Nothing;
var bar = Mayhaps<string>.Indeed("lol");
var couldBeStrings = new[] { foo, bar };
var couldBeLengths = couldBeStrings.Select(it => it.Map(s => s.Length));

A better solution would be to use a library such as Succinc<T> to do the job for you.
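For Java programmers, the standard library has shipped java.util.Optional since Java 8, and its map method plays the role of Map above. Here is a rough Java counterpart to the snippet, offered as a sketch rather than a recommendation; the class and method names (OptionalDemo, lengths) are made up for the occasion:

```java
import java.util.Arrays;
import java.util.List;
import java.util.Optional;
import java.util.stream.Collectors;

final class OptionalDemo {
    // Maps each possibly-absent string to its possibly-absent length.
    static List<Optional<Integer>> lengths(List<Optional<String>> couldBeStrings) {
        return couldBeStrings.stream()
                .map(it -> it.map(String::length))
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        Optional<String> foo = Optional.empty();
        Optional<String> bar = Optional.of("lol");
        List<Optional<Integer>> couldBeLengths = lengths(Arrays.asList(foo, bar));
        System.out.println(couldBeLengths);
    }
}
```

Note that an Optional reference can itself be null in Java, so this shares the limitation of the C# version.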

Regardless of how you do it, however, it’s always going to be a bit clunky. What’s more, it won’t really solve our problem.

As you’ll recall, the problem with null is that you can’t escape from it. In a sense, what is missing isn’t Mayhaps. It’s the opposite. With null, everything is Mayhaps! We still don’t have a way to say that we know that the value is there. So perhaps a better solution is to implement the opposite? We could try. Here’s a very simple type that banishes null:


public sealed class Indeed<T>
{
    private readonly T _value;

    public Indeed(T value)
    {
        if (value == null)
        {
            // First argument is the parameter name, second is the message.
            throw new ArgumentNullException("value", "Begone, null!");
        }
        _value = value;
    }

    public T Value
    {
        get { return _value; }
    }
}


Now the question is – apart from being very clunky – does it work? And the depressing answer is: not really. It addresses the correct problem, but it fails for an obvious reason – how do you ensure that the Indeed value itself isn’t null? Put it inside another Indeed?
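The same failure is easy to reproduce in Java. The sketch below is my own Java rendering of the Indeed idea, with a hypothetical IndeedDemo class around it; it is not a translation of any library. The signature of lengthOf promises a present value, yet the compiler happily accepts null in place of the wrapper itself:

```java
import java.util.Objects;

// Java rendering of the Indeed idea: the wrapped value can never be null.
final class Indeed<T> {
    private final T value;

    Indeed(T value) {
        this.value = Objects.requireNonNull(value, "Begone, null!");
    }

    T value() {
        return value;
    }
}

final class IndeedDemo {
    // Looks safe, but the wrapper reference itself may still be null.
    static int lengthOf(Indeed<String> s) {
        return s.value().length();
    }

    // Returns true if passing a null wrapper blows up at run time.
    static boolean nullWrapperFails() {
        try {
            lengthOf(null); // compiles fine
            return false;
        } catch (NullPointerException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        System.out.println(lengthOf(new Indeed<>("lol"))); // 3
        System.out.println(nullWrapperFails());            // true
    }
}
```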

Implementing Indeed as a struct (that is, a value type) doesn’t work too well either. While a struct Indeed cannot be null, you can very easily obtain an uninitialized Indeed, for instance by getting the default value of the struct, which is always available. In that case, you would end up with an Indeed which wraps a null, which is unacceptable.

So I’m afraid it really is true. You can’t get rid of null once you’ve invited it in. It’s pretty annoying. I wish they hadn’t invited in null in the first place.


Pragmatism is poison

Yesterday I gave a lightning talk called “Pragmatism is Poison” at the Booster Conference in Bergen. This blog post is essentially that talk in written form.

The basic idea of the talk, and therefore of this blog post, is to launch a public attack on perhaps the most fundamental virtue of the software craftsman: pragmatism. The reason is that I think pragmatism has turned toxic, to the point where it causes more harm than good.

While I do believe that pragmatism was once a useful maxim to combat analysis paralysis and over-engineering, I also believe that the usefulness has expired. Pragmatism no longer represents a healthy attitude; in my mind it’s not even a word anymore. It has degenerated into a thought-terminating cliché, which is used to stifle discussion, keep inquiries at bay and to justify not thinking things through.

These are, of course, pretty harsh words. I’ll try to make my case though.

Let’s start by having a look at what it means to be pragmatic. The definition Google gave me is as follows:

Dealing with things sensibly and realistically in a way that is based on practical rather than theoretical considerations.

[A small caveat: this isn’t necessarily the only or “correct” definition of what it means to be pragmatic. But I suppose it’s a sensible and realistic approximation?]

Anyways: this is good by definition! How can it possibly be bad?

Well, for a whole bunch of reasons really. We can divide the problems into two parts: 1) that pragmatism lends itself to abuse because it’s ambiguous and subjective, and 2) all the forms that abuse can take.

So the underlying problem is that the definition relies heavily on subjective judgment. (Consider the task of devising a test that would determine whether or not someone was being pragmatic – the very idea is absurd!) One thing is that what qualifies as “realistic” and “sensible” is clearly subjective. For any group of people, you will find varying degrees of agreement and disagreement depending on who they are and the experiences they’ve had. But the distinction between “practical” and “theoretical” considerations is problematic as well.

Here are some quick considerations to -uh- consider:

#1 laws of nature change
#2 bomb hits data center
#3 cloud providers go down
#4 servers are hacked
#5 unlikely timing of events

I suppose we can agree that #1 is of theoretical interest only. What about the others? If #2 happens, maybe we have more serious problems than your service going down, but it depends on how critical that service is. The same goes for #3. With respect to #4, a lot of people seem surprised when their servers are hacked – as if they hadn’t thought it possible! And #5 depends on whether or not you’d like luck to be a part of your architecture. I’ve been on projects where people would tell me that scenarios I asked about were so unlikely that they were practically impossible. Which to me just means they’ll be hard to debug and reproduce when they happen in production.

The point, though, is that the distinction between “practical” and “theoretical” is pretty much arbitrary. Where do we draw the line? Who’s to say? But it’s important, because mislabeling important considerations – things that affect the quality of software! – as “theoretical” leads to bad software.

So that’s the underlying problem of ambiguity. On to the various forms of abuse!

The first is that pragmatism is often used to present a false dilemma in software development. We love our dichotomies! I could use cheap rhetoric to imply that it traces back to the 0s and 1s of our computers, but let’s not – I have no evidence for such a claim! Luckily I don’t need it either. Suffice it to say that the world is complex, and it’s always very tempting to see things in black and white instead. And so we pit “pragmatism” on the one hand against “dogmatism” on the other, and it’s really important to stay on the right side of that divide! Sometimes we use different words instead, like “practical” vs “theoretical” or “real world” vs “ivory tower”. It all means the same thing: “good” vs “bad”. Which is a big lie, because we’re not making ethical judgments, we’re trying to assess the pros and cons of different solutions to particular problems in concrete contexts. This isn’t The Lord of the Rings.

The consequences of this polarizing are pretty severe. The false dilemma is often used as a self-defense bluff in discussions between team members. The so-called impostor syndrome is rampant in our industry, and so we reach for tools that help us deal with insecurity. One such tool is pragmatism, which can be abused as a magical spell to turn insecurity on its head.

Here’s how it works. Because of the false dilemma, a claim to be pragmatic is implicitly an accusation that says that whoever disagrees is dogmatic: they’re not being sensible, they’re not being realistic, they’re just obsessing over theoretical considerations. So while a statement like “I’m being pragmatic” sounds innocuous enough, it’s really not. It leads to stupid, unrefined, pointless discussions where no knowledge is gained. Instead we’re fighting over who’s good and who’s bad. Polarizing does not make discussions more interesting, it makes them degenerate into banality.

A related strategy is to use pragmatism as a diversion or a smoke bomb, offering the confronted party an easy exit and effectively ending the discussion. The reason is that it takes a lot of guts and perseverance to call someone’s bluff when they’re claiming to be pragmatic. You might approach your co-worker with a concern like “hey, I was looking at the code, and it seems like we’re blocking in our streaming API, which sort of defeats the purpose of a streaming API in the first place”. It sounds like a valid concern until your co-worker says the magic words “I was just being pragmatic” and vanishes in a puff of smoke, like so:

[Animation: vanishing in a puff of smoke]

What we should do instead is accept the complexity we’re faced with and resist the urge to trivialize it. There’s always a need for thinking and discussion, and spurious claims of pragmatism don’t help.

Another problem with pragmatism is that it can be – and is – used as an excuse for sloppy thinking, or no thinking at all. Pragmatism encourages partial “solutions” that work not by reflecting a conceptual solution to a problem, but rather by mimicking correct behavior for various inputs. That way, we can short-circuit the need for design and collaboration. Instead we start with a trivial “happy path” solution and add flags and epicycles to flesh it out into richer behavior as needed. This approach yields software that more or less works for the inputs we’ve tried, and maybe for other inputs as well. (Behavior in the latter case is not as well understood, for obvious reasons.) If we come across inputs that cause problems, we apply patches in the form of additional flags and epicycles.

Because the approach sounds rather dubious when written out like that, we use the magic word “pragmatic” to make it sound better. It’s pragmatic problem-solving. We call the solutions themselves “good enough” solutions – wasting any more time thinking would be gold-plating! Sometimes we use quotes like “perfect is the enemy of good” as further evidence that we’re doing a good thing – as if the problem we’re facing is too much perfect software in the world!

Here’s an obviously made-up example of this approach:

function square(x) = {
    if (x == 1) then return 1;
    if (x == 2) then return 4;
    if (x == 3) then return 9;
    if (x == 4) then return 15;
    if (x == 5) then return 25;
    // Should never happen.
    return -1;
}

This is a function that computes the square of the integers 1-5, not by reflecting any understanding of what it means to square a number, but rather by emulating correct behavior. It has a small bug for the input number 4, but that doesn’t matter much, we rarely get that value, and the result isn’t too far off. It’s a perfectly pragmatic solution if you can assume that x will always be in the range 1-5.

A more general solution would be this:

function square(x) = {
    return x * x;
}

Which solves the general case of calculating the square of integers – sort of! Unfortunately, integers themselves are deeply pragmatic. (What happens when x * x is greater than the maximum value for integers?)
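To answer the parenthetical for one concrete case: assuming Java’s 32-bit int (the pseudocode above doesn’t commit to any particular integer type), plain multiplication silently wraps around, while Math.multiplyExact refuses to. The class and method names in this little sketch are mine:

```java
final class SquareDemo {
    // Plain int multiplication silently wraps around on overflow.
    static int square(int x) {
        return x * x;
    }

    // Math.multiplyExact throws ArithmeticException instead of wrapping.
    static int squareExact(int x) {
        return Math.multiplyExact(x, x);
    }

    public static void main(String[] args) {
        System.out.println(square(46340)); // 2147395600, the largest square that fits in an int
        System.out.println(square(46341)); // wraps around to a negative number
        try {
            squareExact(46341);
        } catch (ArithmeticException e) {
            System.out.println("overflow detected");
        }
    }
}
```

Whether wrapping, throwing or switching to a bigger number type is the right call depends on the application. The “general” solution still has to decide.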

But these are all silly examples – theoretical considerations!

So let’s consider something more “real world”. Since software exists and executes in time, software typically needs to take time into account – by registering time stamps for various events and so forth.

How do you handle dates and times in your applications? Are you aware of the related complexities that exist? Do you handle those complexities explicitly? Do you think about how they might affect your application in various ways? Or do you simply close your eyes and hope for the best? Assume that all the systems you integrate with use time stamps from the same time zone? Assume that leap years and leap seconds won’t affect you (that all years have 365 days and all minutes have 60 seconds)? Assume that daylight saving time won’t cause any problems (even though it means that time isn’t linear – depending on your time zone(s), some points in time may not exist, whereas others may exist more than once)? Assume that everyone else around you is making the same assumptions? That’s mighty pragmatic!

Finally, pragmatism is sometimes used to create outright logical contradictions. Pragmatism is about compromise, but some compromises cannot be made without compromising the concept itself! For instance, some architectural properties have principles that cannot be violated while still maintaining the properties – they are simply no longer present due to the violation. (A vegetarian cannot eat meat on the weekends and still be a vegetarian, if you will.) Not even pragmatism can fix this, but that doesn’t stop people from trying!
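The claim about daylight saving time is easy to check. Here is a small sketch using Java’s java.time API. The time zone (Europe/Oslo) and the 2023 dates are my choice of example, and DstDemo and validOffsets are made-up names. ZoneRules.getValidOffsets returns the UTC offsets that are valid for a local date-time: none if that local time is skipped, one normally, and two if it occurs twice.

```java
import java.time.LocalDateTime;
import java.time.ZoneId;
import java.time.zone.ZoneRules;

final class DstDemo {
    // Number of valid UTC offsets for a local date-time in a zone:
    // 0 = the local time does not exist, 1 = normal, 2 = it occurs twice.
    static int validOffsets(LocalDateTime local, ZoneId zone) {
        ZoneRules rules = zone.getRules();
        return rules.getValidOffsets(local).size();
    }

    public static void main(String[] args) {
        ZoneId oslo = ZoneId.of("Europe/Oslo");
        // Clocks jump from 02:00 to 03:00 on 26 March 2023: 02:30 never happens.
        System.out.println(validOffsets(LocalDateTime.of(2023, 3, 26, 2, 30), oslo));
        // Clocks fall back from 03:00 to 02:00 on 29 October 2023: 02:30 happens twice.
        System.out.println(validOffsets(LocalDateTime.of(2023, 10, 29, 2, 30), oslo));
        // An ordinary summer day for comparison.
        System.out.println(validOffsets(LocalDateTime.of(2023, 6, 1, 12, 0), oslo));
    }
}
```

An application that stores local time stamps without an offset has to decide what those first two cases mean, whether it thinks about it or not.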

To illustrate, here’s (a reproduction of) a funny meme I found on the Internet.

batman-rest

I think it’s funny because a lot of people seem to get annoyed when someone points out that their self-proclaimed RESTful APIs aren’t really RESTful because they violate some property of REST or other – typically the hypermedia constraint. People get annoyed because they don’t want to think about that, they’d rather be pragmatic and get on with stuff.

But for some reason they still want to keep that word, REST. Maybe they think it sounds good, or maybe they promised their manager a REST API, I don’t know. It doesn’t matter. They want to keep that word. And so they say “well, maybe it’s not your Ivory Tower REST” (implicitly bad!), “maybe it’s Pragmatic REST instead” (implicitly good!). And then they go on to do something like JSON over HTTP, which is really simple and great, and they can easily deserialize the JSON in their JavaScript, and they’ve practically shipped it already. And when someone comes along and talks about hypermedia being a requirement for REST, they just slap them! Pretty funny!

Here’s another meme. I made this one myself. Unfortunately it’s not funny.

batman-secure

Why isn’t it funny? It looks a lot like the previous one.

The problem is that when someone violates some established principle of security – maybe they decide it’s convenient to store encrypted passwords on their server or to roll their own cryptography – we think it’s a bad idea. And we don’t think it’s a good excuse to say “well, maybe it’s not Ivory Tower Security, maybe it’s Pragmatic Security instead”. We simply don’t agree that it’s very secure at all. So in some sense it’s not funny because the roles have been reversed. Turns out it’s much more funny being Batman than being Robin in this meme. Go figure.

Now we have the strange situation that it’s apparently OK for some words in software to have no meaning (REST), whereas we insist that others do have meaning (secure). For the meaningless words, prefixing “pragmatic” will absolve you from your sins; for the meaningful words, it will not. This is a problem, I think. Who’s to decide which words should have meaning?

Here’s a third meme. I made this one too.

batman-tea

It’s a bit strange, I’ll admit that. But bear with me. It’s the last one, I promise.

What would you say if someone offered you hot water and a biscuit and said “have some pragmatic tea”? Would you be content? Would you pay for a cup of pragmatic tea? Or would you take the route of the dogmatist and argue that it’s not really tea if no tea was involved in the preparation? Well, SLAP you! Crazy tea zealot! Hang out with the other ivory tower hipsters, and have your fancy tea! Who drinks tea anyway?!

At this point you can see I’ve gone absurd – but we’re still just doing variations of the same joke. I didn’t bring the absurdity, it was there all along. The point I’m trying to make is that words do have meaning, whether it’s REST or security or tea. We should respect that meaning instead of using pragmatism as a poor excuse for undermining it. Some properties have principles that cannot be broken while at the same time keeping the properties intact. You can’t have REST without Representational State Transfer, because that’s literally what the acronym means. Secure applications shouldn’t be storing passwords, even if they’re encrypted. Tea should contain tea. (Please don’t get me started on Rooibos.)

I should add that it’s perfectly fine to not care about REST, or to dispute the value of REST – or tea, or security, for that matter. Those are conversations worth having. It’s a free world, and everyone is entitled to choose the properties they care about, based on the context they’re in. Lots of people very vocally don’t care about REST, perhaps even more people than people who know what REST actually is! I have no problem with that. What’s less fine is pretending it has no meaning.

This concludes my attack on pragmatism, at least for now! To reiterate: pragmatism is easily abused because it’s hard to tell if someone is genuinely pragmatic or just claiming to be so. The abuse takes various forms: false dilemma, self-defense bluff, smoke bomb, justification for sloppy thinking and undermining of the meaning of words.

A call for action? Please stop using pragmatism as an excuse for doing sloppy work! And if you find you do need to use the word, please have it be the beginning of a discussion rather than the end of one.


Technical debt isn’t technical

TL;DR

Technical debt is not primarily caused by clumsy programming, it is a third-order effect of poor communication. Technical debt is a symptom of an underlying lack of appropriate abstractions, which in turn stems from insufficient modelling of the problem domain. This means that necessary communication has not taken place: discussions and decisions to resolve ambiguity and make informed trade-offs have been swept under the rug. Technical debt is the reification of this lack of resolution in code.

The technical debt meme

For a while now, I’ve been wanting to write about technical debt. As we all know, technical debt is a very successful meme in software development – it needs no introduction as a concept. Like any good virus, it has self-replicated and spread throughout the software development world, even reaching into the minds of project leaders and stakeholders. This is good, since the notion of technical debt brings attention to the fact that the internal quality of software matters – that there are aspects of software that are invisible to anyone but the programmers, but still have very visible effects – in the form of prolonged quality problems, missed deadlines, development grinding to a halt and so forth. For this reason, we should tip our hats to Ward Cunningham for coming up with the term. It gives us terminology that allows us to communicate better with non-technical stakeholders in software projects.

Why technical debt is a misnomer

That’s not what I want to write about, however. What I want to say is that technical debt is also a deeply problematic notion, because it speaks little of the causes of technical debt, or how to fix them.

The usual story is that technical debt stems from project deadlines. If the code is inadequate, sloppy or otherwise “bad”, it has probably been written in a hurry because the project leader said so. This indicates that time is the cause of our problems, and also conveniently places the responsibility for the mess on someone other than the developers.

This is certainly true in some cases; we have all written code like that, and for those exact reasons. I just don’t think it’s the whole story, or even a major part of it. It seems to me entirely inadequate to explain the majority of technical debt that I’ve seen on software projects. The so-called technical problems go much deeper than mere sloppiness of implementation, and reveal fundamental problems in the process of understanding the business domain and how that understanding is captured and represented in software. In particular, it is very common to see weak abstractions that fail to represent the richness of the domain. The code tends to be overgrown with conditionals and flags, which indicates a weak model that has handled evolution and change very poorly – by ad hoc sprouting of extra branches and the booleans needed to navigate them as appropriate for different use cases. Complexity grows like ever new epicycles on the inadequate model – easily recognizable as things in your code that cannot be given meaningful names because they have no meaningful counterpart in the problem domain. The end result is a horrific steampunk contraption of accidental complexity.

This makes the code extraordinarily difficult to reason about. Hence, it would seem that the so-called technical debt really stems from modelling debt; the code lacks the higher-level concepts of a rich domain model that would make it possible to express the use cases more directly.

The currency of technical debt is knowledge. — @sarahmei

In DDD terms, modelling debt indicates that insufficient knowledge crunching has taken place. Knowledge crunching involves learning about the problem domain and capturing that knowledge in a suitable domain model. This is a communication-driven process that involves identifying and resolving ambiguity in the problem domain, and expressing the domain as clearly as possible. Most of all, it is a chaotic and messy process that involves people and discussion. Insufficient knowledge crunching in turn points towards the ultimate cause of technical debt: poor communication.

Communication is the principal portion of the “technical debt.” Messy code is just the ever-increasing interest. — @nycplayer

Why technical debt is misrepresented

So if technical debt isn’t really technical – or at least not ultimately caused by technical issues – why do we keep referring to it as technical debt? Unfortunately, it seems to me that developers have a tendency to look for technical solutions to soft problems.

Technical tasks are alluring because, unlike modelling and communication, they have no psychological dimensions, and tend not to lead to conflict. I don’t want to add to the stereotype of the programmer as particularly socially inept; suffice it to say that most people will prefer to avoid conflict if possible. Technical work is a series of puzzles to be solved. Modelling work uncovers human issues, differences of opinion, different focus, different hopes for the application, even personal conflicts. Figuring out what the application should do exposes all of these issues, and it is painful! This is why everyone is quoting Conway’s Law these days.

And so, even as widespread as the meme technical debt is, it seems to be poorly understood, even by developers – perhaps even particularly by developers! Indeed, the Wikipedia page for technical debt – no doubt authored by developers – currently lists 11 causes of technical debt, but lack of understanding of the problem domain is not one of them! “Lack of knowledge” sounds promising, until you read the explanation: “when the developer simply doesn’t know how to write elegant code”(!)

Again we see the focus on the technical aspects, as if technical debt were caused by clumsy, unskilled programmers with nagging, incompetent project leaders – and hence as if it were fixable by some virtuous programmer – a master craftsman, no less! – using generic, context-free principles like SOLID, dependency injection and patterns. It is not! Code hygiene is certainly a virtue, but it is no substitute for modelling, just like frantically washing your hands is not sufficient for successful surgery. Getting lost in code hygiene discussions is like arguing about the optimal kinds of soap and water temperature while the patient is dying on the operating table.

And yet it is indirectly true: a developer who doesn’t know the importance of understanding the problem domain, of proper modelling, will certainly fail to write elegant code. Elegance of the implementation can only stem from an elegant model that reflects a deep understanding of the problem addressed.

Down payment

This has deep ramifications, in particular in how we address technical debt. Refactoring is another successful meme in software development, and we often use it to describe the process of making down payments on technical debt. But if technical debt isn’t just clumsy code, if instead it is clumsy code caused by unresolved ambiguity in the problem domain, then it is poorly addressed by rearranging code. We need to start at the other end, with a better understanding of the problem we are trying to solve, and with modelling concepts permeating the code instead of branches and booleans. This is what Eric Evans calls “refactoring towards deeper insight”. Unless we have a model to drive our efforts, there is no reason to believe that we will be able to do much better than before. Refactoring without an improved domain model is just hubris.

A rewrite will end up with the same problems as the original unless you close the understanding gap. — @sarahmei

To conclude

That’s what I wanted to say about technical debt. It’s not very technical at all. It’s about code that gets bad because humans fail to communicate when trying to solve problems in some business domain using software. It usually is.

 


Diamond mirrors

My friend Bjørn Einar did a nice write-up about the Diamond code kata in F# the other day. He did so in the context of TDD-style evolutionary design vs up-front thinking away from the keyboard. Apparently he has this crazy idea that it might be worthwhile to do a bit of conceptual problem-solving and thinking about properties of the domain before you start typing. Very out of vogue, I know.

Anyways, he ended up with an interesting implementation centered on exploiting something called the taxicab norm. (I hadn’t heard of it either, which makes it all the more interesting.) I really like that approach: cast your problem as an instance of an existing, well-understood problem for which there exists a well-understood solution. It replaces ad-hoc code with a mathematical idea, and is rather a far step away from typical implementations that get heavy on string manipulations and where the solution to the problem in general is swamped with things related to outputting the diamond to the console.

I wondered if I could come up with an alternative approach, and hence I got to thinking a bit myself. Away from the keyboard, like a madman. The solution I came up with is perhaps a bit more conventional, a bit less mathematical (I’m sorry to say), but still centered on a single idea. That idea is mirroring.

To illustrate the approach, consider a sample diamond built from five letters, A through E. It should look like the following:

....A....
...B.B...
..C...C..
.D.....D.
E.......E
.D.....D.
..C...C..
...B.B...
....A....

The mirroring is fairly obvious. One way to look at the diamond is to consider it as a pyramid mirrored along the E-row. But at the same time, it is also a pyramid mirrored along the A-column. So it goes both ways. This means that we could rather easily build our diamond from just a quarter of it, by mirroring that quarter twice. We would start with just this:

A....
.B...
..C..
...D.
....E

We could then proceed by mirroring along the A-column to produce this:

....A....
...B.B...
..C...C..
.D.....D.
E.......E

And then we could complete the diamond by mirroring along the E-row, and it would look like the diamond we wanted.

So far so good. But we need the first quarter. How could we go about producing that?

Assume we start with a list ['A' .. 'E']. We would like to use that to produce this list:


[
['A'; '.'; '.'; '.'; '.'];
['.'; 'B'; '.'; '.'; '.'];
['.'; '.'; 'C'; '.'; '.'];
['.'; '.'; '.'; 'D'; '.'];
['.'; '.'; '.'; '.'; 'E'];
]

But that’s rather easy. Each inner list is just the original list ['A' .. 'E'] with all letters except one replaced by ‘.’. That’s a job for map. Say I want to keep only the ‘B’:


List.map (fun x -> if x = 'B' then x else '.') ['A' .. 'E']

And so on and so forth for each letter in the original list. We can use a list comprehension to generate all of them for us. For convenience, we’ll create a function genLists:


let genLists lst =
    [ for e in lst do yield List.map (fun x -> if x = e then x else '.') lst ]

This gives us the first quarter. Now for the mirroring. That’s easy too:


let mirror lst =
    match lst with
    | [] -> []
    | h::t -> List.rev t @ lst

(We’ll never actually call mirror with an empty list, but I think it’s better form to include it anyway.)

So now we can map the mirror function over the quarter diamond to produce a half diamond:


[
['.'; '.'; '.'; '.'; 'A'; '.'; '.'; '.'; '.'];
['.'; '.'; '.'; 'B'; '.'; 'B'; '.'; '.'; '.'];
['.'; '.'; 'C'; '.'; '.'; '.'; 'C'; '.'; '.'];
['.'; 'D'; '.'; '.'; '.'; '.'; '.'; 'D'; '.'];
['E'; '.'; '.'; '.'; '.'; '.'; '.'; '.'; 'E'];
]

Excellent. Now we’re almost ready to do the second mirroring. The only problem is that the mirror function uses the head element as the pivot for mirroring, so we would end up with an X instead of a diamond!

That’s trivial to fix though. We’ll just reverse the list first, and then do the mirroring. I’m not even going to write up the result for that – it is obviously the completed diamond. Instead, here’s the complete diamond function, built from the parts we’ve seen so far:


let diamond letters =
    letters |> genLists
    |> List.map (fun a -> mirror a)
    |> List.rev
    |> mirror

Could I speed up things by reversing my lists before the first mapping instead of after? No, because the (outer) list has the same number of elements before and after the first mirroring. Plus it’s easier to explain this way. And really, perf optimization for a code kata? Come on!

Now for rendering:


let toStr d =
    d |> List.map (fun a -> new string(Array.ofList(a)))
    |> String.concat "\n"

And to run everything (for a full-sized diamond, because why not):


['A' .. 'Z'] |> diamond |> toStr |> printfn "%s"

And that’s all there is to it. The entire code looks like this:


open System

let genLists lst =
    [ for e in lst do yield List.map (fun x -> if x = e then x else '.') lst ]

let mirror lst =
    match lst with
    | [] -> []
    | h::t -> List.rev t @ lst

let diamond letters =
    letters |> genLists
    |> List.map (fun a -> mirror a)
    |> List.rev
    |> mirror

let toStr d =
    d |> List.map (fun a -> new string(Array.ofList(a)))
    |> String.concat "\n"

['A' .. 'Z'] |> diamond |> toStr |> printfn "%s"


Aspects without Aspects

In the previous blog posts, we saw that we could hide the problematic concrete exception type from the C# compiler by tucking it inside a transformation from a closure of type Func<TR> to another closure of the same type. But of course we can use such transformations for many things besides exception handling. Any old behaviour that we would like to throw on top of the business logic, we can apply in layers using this approach.

This capability is so cool that I took a break from writing this blog post to share my enthusiasm with my wife. She was like, “what are you blogging about?”, and I was like “there’s this really cool thing you can do, where you apply this transformation to some method call, and then you can like, do additional stuff with it, entirely transparently!”, and she was like “like what?”, and I was like “like anything!”, and she was like “like what?”, and I was like “anything you want!”, but she was still like “like what though?” and then I turned more like “uh… uh… like say you had this method that returned a string – you could easily transform that into a method that looked exactly the same, but returned the reversed string instead”, and she was like “…the reversed string? why?” and I was like “or-or-or maybe you could return the uppercase string instead…?”, and she was like “uppercase?” with totally uppercase eyebrows and I was like “nonono! I got it! say you had this method that did a really expensive and slow computation, you could turn that into a method that would keep along the result of that computation, so you didn’t have to do the actual computation all the time”, and she was like “oh, that’s cool” and I was like “phew! I’m never talking to her about blogging again!”.

So that was a close call. But yes, you can totally use this for caching. All we need is a suitable transformation thing.


public static Func<T> Caching<T>(this Func<T> f)
{
    bool cached = false;
    T t = default(T);
    return () => {
        if (cached) return t;
        t = f();
        cached = true;
        return t;
    };
}

Here, we’re taking advantage of the fact that C# has mutable closures – that is, that we can write to cached and t from inside the body of the lambda expression.

To verify that it works, we need a suitable example – something that’s really expensive and slow to compute. And as we all know, one of the most computationally intensive things we can do in a code example is to sleep:


Func<string> q = () => {
    Thread.Sleep(2000);
    return "Hard-obtained string";
};
Console.WriteLine(q());
Console.WriteLine(q());
Console.WriteLine(q());
q = q.Caching();
Console.WriteLine(q());
Console.WriteLine(q());
Console.WriteLine(q());

Well, what kind of behaviour should we expect from this code? Obviously, the first three q calls will be slow. But what about the last three? The last three execute the caching closure instead. When we execute the fourth call, cached is false, and so the if test fails, and we proceed to evaluate the original, non-caching q (which is slow), tuck away the result value for later, set the cached flag to true, and return the computed result – the hard-obtained string. But the fifth and sixth calls should be quick, since cached is now true, and we have a cached result value to return to the caller, without ever having to resort to the original q.

That’s theory. Here’s practice:
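
One way to check this without a stopwatch is to count how often the original closure actually runs. What follows is a minimal sketch: the `calls` counter and the `CachingSketch` and `CountCalls` names are additions for illustration, and the `Caching` method is repeated only so that the sketch compiles on its own.

```csharp
using System;

public static class CachingSketch
{
    // Same transformation as above, repeated to keep the sketch self-contained.
    public static Func<T> Caching<T>(this Func<T> f)
    {
        bool cached = false;
        T t = default(T);
        return () =>
        {
            if (cached) return t;
            t = f();
            cached = true;
            return t;
        };
    }

    // Calls the plain and the caching closure three times each and returns
    // how many times the underlying computation ran in each case.
    public static int[] CountCalls()
    {
        int calls = 0;
        Func<string> q = () =>
        {
            calls++; // stands in for the slow part
            return "Hard-obtained string";
        };

        q(); q(); q();
        int withoutCaching = calls;

        calls = 0;
        Func<string> c = q.Caching();
        c(); c(); c();
        int withCaching = calls;

        return new[] { withoutCaching, withCaching };
    }

    public static void Main()
    {
        int[] counts = CountCalls();
        Console.WriteLine("Without caching: {0} evaluations", counts[0]); // 3
        Console.WriteLine("With caching: {0} evaluations", counts[1]);    // 1
    }
}
```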

So that seems to work according to plan. What else can we do? We’ve seen exception handling in the previous posts and caching in this one – both examples of so-called “cross-cutting concerns” in our applications. Cross-cutting concerns was hot terminology ten years ago, when the enterprise world discovered the power of the meta-object protocol (without realizing it, of course). It did so in the guise of aspect-oriented programming, which carried with it a whole vocabulary besides the term “cross-cutting concerns” itself, including “advice” (additional behaviour to handle), “join points” (places in your code where the additional behaviour may be applied) and “pointcuts” (a way of specifying declaratively which join points the advice applies to). And indeed, we can use these transformations that we’ve been doing to implement a sort of poor man’s aspects.

Why a poor man’s aspects? What’s cheap about them? Well, we will be applying advice at various join points, but we won’t be supporting pointcuts to select them. Rather, we will be applying advice to the join points manually. Arguably, therefore, it’s not really aspects at all, and yet we get some of the same capabilities. That’s why we’ll call them aspects without aspects. Makes sense?

Let’s consider wrapping some closure f in a hypothetical try-finally-block, and see where we might want to add behaviour.


// 1. Before calling f.
try {
    f();
    // 2. After successful call to f.
}
finally {
    // 3. After any call to f.
}

So we’ll create extension methods to add behaviour in those three places. We’ll call them Before, Success and After, respectively.


public static class AspectExtensions {
    public static Func<T> Before<T>(this Func<T> f, Action a) {
        return () => { a(); return f(); };
    }
    public static Func<T> Success<T>(this Func<T> f, Action a) {
        return () => {
            var result = f();
            a();
            return result;
        };
    }
    public static Func<T> Success<T>(this Func<T> f, Action<T> a) {
        return () => {
            var result = f();
            a(result);
            return result;
        };
    }
    public static Func<T> After<T>(this Func<T> f, Action a) {
        return () => {
            try {
                return f();
            } finally {
                a();
            }
        };
    }
    public static Func<T> After<T>(this Func<T> f, Action<T> a) {
        return () => {
            T result = default(T);
            try {
                result = f();
                return result;
            } finally {
                a(result);
            }
        };
    }
}

Note that we have two options for each of the join points that occur after the call to the original f closure. In some cases you might be interested in the value returned by f, in others you might not be.

How does it work in practice? Let’s look at a contrived example.


static void Main (string[] args)
{
    Func<Func<string>, Func<string>> wrap = fn => fn
        .Before(() => Console.WriteLine("I'm happening early on."))
        .Success(r => Console.WriteLine("Successfully obtained: " + r))
        .Before(() => Console.WriteLine("When do I occur???"))
        .After(r => Console.WriteLine("What did I get? " + r));
    var m1 = wrap(() => {
        Console.WriteLine("Executing m1…");
        return "Hello Kiczales!";
    });
    var m2 = wrap(() => {
        Console.WriteLine("Executing m2…");
        throw new Exception("Boom");
    });
    Call("m1", m1);
    Call("m2", m2);
}
static void Call(string name, Func<string> m) {
    Console.WriteLine(name);
    try {
        Console.WriteLine(name + " returned: " + m());
    }
    catch (Exception ex) {
        Console.WriteLine("Exception in {0}: {1}", name, ex.Message);
    }
    Console.WriteLine();
}

So here we have a transformation thing that takes a Func<string> closure and returns another Func<string> closure, with several pieces of advice applied. Can you work out when the different closures will be executed?

We start with some closure fn, but before fn executes, the first Before must execute (that’s why we call it Before!). Assuming both of these execute successfully (without throwing an exception), the Success will execute. But before all these things, the second Before must execute! And finally, regardless of how the execution turns out with respect to exceptions, the After should execute.

In the case of m1, no exception occurs, so we should see the message “Successfully obtained: Hello Kiczales!” in between “Executing m1…” and “What did I get? Hello Kiczales!”. In the case of m2, on the other hand, we do get an exception, so the Success closure is never executed.
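
The ordering can also be checked without reading console output, by logging each step to a list. This is a sketch under assumptions: `AspectOrderSketch` and `Trace` are hypothetical names, the log labels are invented, and only the three overloads the example needs are repeated here.

```csharp
using System;
using System.Collections.Generic;

public static class AspectOrderSketch
{
    public static Func<T> Before<T>(this Func<T> f, Action a)
    {
        return () => { a(); return f(); };
    }

    public static Func<T> Success<T>(this Func<T> f, Action<T> a)
    {
        return () => { var result = f(); a(result); return result; };
    }

    public static Func<T> After<T>(this Func<T> f, Action<T> a)
    {
        return () =>
        {
            T result = default(T);
            try { result = f(); return result; }
            finally { a(result); }
        };
    }

    // Wraps fn in the same chain as the example above and records
    // the order in which the pieces of advice run.
    public static List<string> Trace(Func<string> fn)
    {
        var log = new List<string>();
        Func<string> wrapped = fn
            .Before(() => log.Add("first Before"))
            .Success(r => log.Add("Success: " + r))
            .Before(() => log.Add("second Before"))
            .After(r => log.Add("After: " + r));
        try { wrapped(); }
        catch (Exception ex) { log.Add("caught: " + ex.Message); }
        return log;
    }

    public static void Main()
    {
        Console.WriteLine(string.Join(" | ", Trace(() => "Hello Kiczales!")));
        Console.WriteLine(string.Join(" | ", Trace(() => { throw new Exception("Boom"); })));
    }
}
```

The second Before is logged first because it was applied later and therefore sits further out. In the throwing case the Success entry is missing, and After receives the default value (null for a string) because f never produced a result.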

A screenshot of my console verifies this:

Screen Shot 2014-07-04 at 10.59.07 PM

So we’ve seen that we can do fluent exception handling, caching and aspects without aspects using the same basic idea: we take something of type Func<TR> and produce something else of the same type. Of course, this means that we’re free to mix and match all of these things if we wanted to, and compose them all using Linq’s Aggregate method! For once, though, I think I’ll leave that as an exercise for the reader.

And of course, we can transform other things besides closures as well – we can use the same approach to transform any instance of type T to some other T instance. In fact, let’s declare a delegate to capture such a generalized concept:


delegate T Decorate<T>(T t);

Why Decorate? Well, nothing is ever new on this blog. I’m just rediscovering old ideas and reinventing flat tires as Alan Kay put it. In this case, it turns out that all we’ve been doing is looking at the good old Decorator pattern from the GoF book in a new or unfamiliar guise.
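
For anyone who wants a head start on the Aggregate exercise, here is one possible sketch of composing decorators by folding over them. It is not the only way to do it. The `DecorateSketch` class, the `Compose` helper and the two sample decorators are hypothetical. Only the `Decorate<T>` delegate comes from the text.

```csharp
using System;
using System.Linq;

public delegate T Decorate<T>(T t);

public static class DecorateSketch
{
    // Folds a sequence of decorators into a single decorator,
    // applying them from left to right.
    public static Decorate<T> Compose<T>(params Decorate<T>[] decorators)
    {
        return decorators.Aggregate(
            (Decorate<T>)(t => t),                // start from the identity decorator
            (acc, next) => t => next(acc(t)));    // then stack each decorator on top
    }

    public static void Main()
    {
        Decorate<Func<string>> shout = f => () => f().ToUpperInvariant();
        Decorate<Func<string>> exclaim = f => () => f() + "!";

        Func<string> greet = () => "hello";
        Func<string> decorated = Compose(shout, exclaim)(greet);

        Console.WriteLine(decorated()); // HELLO!
    }
}
```

Since the closure transformations from these posts all map a Func<TR> to a Func<TR>, each of them could be lifted into a Decorate<Func<TR>> and passed to the same helper.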


Rethrow recoil

The closure-based exception handling scheme in the previous blog post works almost perfectly, and it would have worked entirely perfectly if a catch block were an ordinary block of code. But alas, it is not. Not quite. A catch block is special in that there is one statement that you can make inside a catch block that you cannot make anywhere else in your program. Unfortunately, it is also a fairly common statement to use inside catch blocks: the rethrow statement – that is, a throw statement with no operand. We cannot simply use a throw statement with an operand instead, since that has different semantics. In fact, it’s not a rethrow at all, it’s a brand new throw (even when we’re using the exception we just caught). The consequence is that the original stack trace is lost. In other words, we lose trace of where the exception originally occurred, which is almost never what we want.
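
The difference between the two statements shows up in the stack trace. The sketch below is only an illustration: `RethrowSketch`, `Faulty`, `Rethrow`, `ThrowAgain` and `Probe` are made-up names, and the NoInlining attributes are there to keep each method visible as its own stack frame.

```csharp
using System;
using System.Runtime.CompilerServices;

public static class RethrowSketch
{
    // The place where the exception originally occurs.
    [MethodImpl(MethodImplOptions.NoInlining)]
    static void Faulty()
    {
        throw new InvalidOperationException("Boom");
    }

    // throw; keeps the stack trace recorded at the original throw.
    [MethodImpl(MethodImplOptions.NoInlining)]
    public static void Rethrow()
    {
        try { Faulty(); }
        catch (InvalidOperationException) { throw; }
    }

    // throw ex; starts a fresh stack trace at this statement.
    [MethodImpl(MethodImplOptions.NoInlining)]
    public static void ThrowAgain()
    {
        try { Faulty(); }
        catch (InvalidOperationException ex) { throw ex; }
    }

    // Reports whether the trace seen by the caller still mentions Faulty.
    public static bool Probe(Action a)
    {
        try { a(); }
        catch (Exception ex) { return ex.StackTrace.Contains("Faulty"); }
        return false;
    }

    public static void Main()
    {
        Console.WriteLine(Probe(Rethrow));    // True
        Console.WriteLine(Probe(ThrowAgain)); // False
    }
}
```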

So that’s unfortunate. Unacceptable even. Can we fix it? Of course we can fix it! We’re programmers, we can fix anything!

There’s no way to put a rethrow into our lambda expression though, so we need to do something else. That something else turns out to be trivial. We do have a genuine catch-block available, so we’ll just put it there. Or rather, we’ll create a new try-catch block with a hard-coded rethrow inside, and put that block inside a new extension method which complements the one we created last time. Like so:


public static Func<TR> Touch<TR, TE>(this Func<TR> f, Action<TE> h)
    where TE : Exception
{
    return () =>
    {
        try
        {
            return f();
        }
        catch (TE e)
        {
            h(e);
            throw;
        }
    };
}

In this case, we use an Action<TE> instead of a Func<TE, TR>, because obviously the exception handler won’t be returning anything – we just hard-coded a rethrow in there!

Why Touch and not an overload of the Catch method we saw before? Well, first of all we’re not catching the exception, we’re merely touching it on the way through – hence Catch is not really a suitable name for what we’re doing. And besides, the following code snippet would be ambiguous, at least to the reader of the code. Assuming we had overloaded the Catch method, how should we interpret something like this?


Catch((Exception ex) => { throw ex; })

Is that an Action<TE> or a Func<TE, TR>? I have no idea. It turns out that the compiler is OK with the ambiguity – it decides it must be an Action<TE> (which leaves us with a rather strange catch block where the throw ex statement in the handler terminates the block before the rethrow occurs). But I’m not! Better to be explicit. Touch it is.

Now we can write code like this:


var g = f
    .Catch((ArgumentException ex) => "Something")
    .Touch((NullReferenceException ex) => Console.WriteLine("I saw you!"))
    .Catch((Exception ex) => "ex");
Console.WriteLine(g());

So what happens in the last line? Well, if an ArgumentException is thrown, we’ll see the string “Something” written. If a NullReferenceException is thrown, however, we’ll see the “I saw you” string, in addition to “ex”, since the exception percolates up to the outer handler where it is caught and swallowed.
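
To try this out in isolation, here is a self-contained sketch. The `Catch` method shown is an assumed shape, inferred from the Func<TE, TR> handler mentioned above rather than copied from the earlier post. The `ExceptionSketch` class and the `Run` helper are hypothetical, and the messages are collected in a list instead of being written to the console.

```csharp
using System;
using System.Collections.Generic;

public static class ExceptionSketch
{
    // Assumed shape of Catch: handle TE and produce a replacement value.
    public static Func<TR> Catch<TR, TE>(this Func<TR> f, Func<TE, TR> h)
        where TE : Exception
    {
        return () =>
        {
            try { return f(); }
            catch (TE e) { return h(e); }
        };
    }

    // Touch as defined above: observe TE, then let it continue on its way.
    public static Func<TR> Touch<TR, TE>(this Func<TR> f, Action<TE> h)
        where TE : Exception
    {
        return () =>
        {
            try { return f(); }
            catch (TE e) { h(e); throw; }
        };
    }

    // Runs f through the same chain as in the text and records what happened.
    public static List<string> Run(Func<string> f)
    {
        var seen = new List<string>();
        var g = f
            .Catch((ArgumentException ex) => "Something")
            .Touch((NullReferenceException ex) => seen.Add("I saw you!"))
            .Catch((Exception ex) => "ex");
        seen.Add(g());
        return seen;
    }

    public static void Main()
    {
        // Something
        Console.WriteLine(string.Join(", ", Run(() => { throw new ArgumentException(); })));
        // I saw you!, ex
        Console.WriteLine(string.Join(", ", Run(() => { throw new NullReferenceException(); })));
    }
}
```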

Fixed!


Linq to Exceptions

Lately I’ve been thinking about the mundane task of exception handling. Remarkably unsexy, and yet completely mandatory. There’s no escape from handling the exceptions that may (and therefore will) occur in your application, unless you want it to crash or misbehave regularly.

Proper exception handling can be a chore anywhere in your application, but in particular at integration points. Whenever your application needs to communicate with some external service, any number of things can go wrong, at various levels in the stack – from application level errors to network problems. We need to be able to cope with all these different kinds of errors, and hence we surround each service call with a veritable flood of exception handlers. That sort of works, but there are problems – first, the actual service is drowning amidst all the exception handling, and second, we get a lot of code duplication, since we tend to handle many kinds of exceptions in the same way for all service calls.

I’ve been discussing how to best solve this problem in a completely general and flexible way in several impromptu popsicle- and coffee-driven sessions with some of my compatriots at work (in particular Johan and Jonas – thanks guys!). Some of the code examples in this blog post, especially those that exhibit some trace of intelligence, can legitimately be considered rip-offs, evolutions, misunderstandings or various other degenerations of similar code snippets they have come up with.

But I’m getting ahead of myself. Let’s return to the root of the problem: how exception handling code has a tendency to overshadow the so-called business logic in our applications and to cause severe cases of duplication.

The problem is minor if there is a single exception we’re catching:


try {
    // Interesting code.
}
catch (SomeException ex) {
    // Dull exception handling code.
}

However, things regress quickly with additional types of exceptions:


try {
    // Interesting code.
}
catch (SomeException ex) {
    // Dull code.
}
catch (SomeOtherException ex) {
    // Dull code.
}
catch (YetAnotherException ex) {
    // Dull code.
}
catch (Exception ex) {
    // Dull code.
}

This is a bit messy, and the signal-to-noise-ratio is low. It gets much worse when you have a second piece of interesting code that you need to guard with the same exception handling. Suddenly you have rampant code repetition on your hands.

A solution would be to use a closure to inject the interesting code into a generic exception handling method, like this:


TR Call<TR>(Func<TR> f) {
try {
return f();
}
catch (SomeException ex) {
// Dull code.
}
catch (SomeOtherException ex) {
// Dull code.
}
catch (YetAnotherException ex) {
// Dull code.
}
catch (Exception ex) {
// Dull code.
}
}

And then you can use this in your methods:


Foo SafeMethod1(int x, string s) {
    return Call(() => Method1(x, s));
}
Bar SafeMethod2(double d) {
    return Call(() => Method2(d));
}

Adding another method is trivial:


Quux SafeMethod3() {
    return Call(() => Method3());
}

This works pretty nicely and solves our immediate issue. But there are a couple of limitations. In particular, three problems spring to mind:

  1. What if you want to return some legal, non-default value of type TR from one or more of the catch-blocks? As it stands now, each catch-block must either rethrow or return the default value of TR.
  2. What if there are variations in how you want to handle some of the exceptions? For instance, it may be that YetAnotherException should be handled differently for each method.
  3. What if there are slight variations between the “catching” needs for the various methods? What if you decided that SafeMethod2 doesn’t need to handle SomeOtherException, whereas SafeMethod3 should handle IdiosyncraticException in addition to the “standard” ones?

As an answer to the first two problems, you could pass in each exception handler to the Call method! Then you would have a method like this:


    TR Call<TR>(
    Func<TR> f,
    Func<SomeException, TR> h1,
    Func<SomeOtherException, TR> h2,
    Func<YetAnotherException, TR> h3,
    Func<Exception, TR> h4)
    {
    try {
    return f();
    }
    catch (SomeException ex) {
    return h1(ex);
    }
    catch (SomeOtherException ex) {
    return h2(ex);
    }
    catch (YetAnotherException ex) {
    return h3(ex);
    }
    catch (Exception ex) {
    return h4(ex);
    }
    }

    And at this point you’re about to stop reading this blog post, because WTF. Now your methods look like this:


    Foo SafeMethod1(int x, string s) {
        return Call(() => Method1(x, s),
            ex => // Handle SomeException,
            ex => // Handle SomeOtherException,
            ex => // Handle YetAnotherException,
            ex => // Handle Exception
        );
    }

    So we’re pretty much back at square one, except it’s a bit more convoluted and confusing for the casual reader. And we still don’t have a good solution for the third problem. We could of course fake “non-handling” of SomeOtherException for SafeMethod2 by simply handing in a non-handling handler, that is, one that simply rethrows directly: ex => { throw ex; }. But that’s ugly and what about the IdiosyncraticException? Well, it’s not going to be pretty:


    Quux SafeMethod3() {
    try {
    return Call(() => Method3(),
    ex => // Handle SomeException,
    ex => // Handle SomeOtherException,
    ex => // Handle YetAnotherException,
    ex => // Handle Exception
    );
    }
    catch (IdiosyncraticException ex) {
    // Do something.
    }
    }

    Which just might be the worst code ever, and also has the idiosyncrasy that the additional catch-handler will only be reached if the Exception handler rethrows. Horrible. Better, perhaps, to put it inside?


    Quux SafeMethod3() {
    return Call(() =>
    {
    try { return Method3(); } catch (IdiosyncraticException ex) { throw ex; }
    },
    ex => // Handle SomeException,
    ex => // Handle SomeOtherException,
    ex => // Handle YetAnotherException,
    ex => // Handle Exception
    );
    }

    Well yes, slightly better, but still pretty horrible, and much worse than just suffering the duplication in the first place. But maybe we’ve learned something. What we need is composition – our current solution doesn’t compose at all. We need to be able to put together exactly the exception handlers we need for each method, while at the same time avoiding repetition. The problem is in some sense the coupling between the exception handlers. What if we tried a different approach, handling a single exception at a time?

    We could have a less ambitious Call-method, that would handle just a single type of exception for a method. Like this:


    TR Call<TR, TE>(
    Func<TR> f,
    Func<TE, TR> h)
    where TE : Exception
    {
    try {
    return f();
    }
    catch (TE ex) {
    return h(ex);
    }
    }

    Now we have a single generic exception handler h. Note that when we constrain the type variable TE to be a subclass of Exception, we can use TE in the catch clause, to select precisely the exceptions we would like to catch. Then we could write a method like this:


    Frob SafeMethod4a() {
    return Call(
    () => new Frob(),
    (NullReferenceException ex) => ... );
    }

    What if we wanted to catch another exception as well? The solution is obvious:


    Frob SafeMethod4b() {
    return Call(
    () => SafeMethod4a(),
    (InvalidOperationException ex) => ... );
    }

    And yet another exception?


    Frob SafeMethod4c() {
    return Call(
    () => SafeMethod4b(),
    (FormatException ex) => ... );
    }

    You get the picture. Of course, we can collapse all three to a single method if we want to:


    Frob SafeMethod4() {
    return
    Call(() =>
    Call(() =>
    Call(() =>
    FrobFactory.Create(),
    (NullReferenceException ex) => ... ),
    (InvalidOperationException ex) => ... ),
    (FormatException ex) => ... );
    }

    What have we gained? Well, not readability, I’ll admit that. But we’ve gained flexibility! Flexibility goes a long way! And we’ll work on the readability shortly. First, though: just in case it’s not clear, what we’ve done is that we’ve created an exception handling scenario that is similar to this:


    Frob TraditionalSafeMethod4() {
    try {
    try {
    try {
    return FrobFactory.Create();
    }
    catch (NullReferenceException ex) { ... handler code ... }
    }
    catch (InvalidOperationException ex) { ... handler code ... }
    }
    catch (FormatException ex) { ... handler code ... }
    }

    So there’s nothing very complicated going on here. In fact, I bet you can see how similar the two methods really are – the structure is identical! All we’ve done is replace the familiar try-catch construct with our own Call-construct.

    As an aside, we should note that the composed try-catch approach has slightly different semantics than the sequential, coupled try-catch approach. The difference in semantics is due to decoupling provided by the composed try-catch approach – each catch-block is completely independent. Therefore, there is nothing stopping us from having multiple catch-handlers for the same type of exception should we so desire.
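To make one such difference concrete, here is a minimal sketch. It is my own illustration rather than code from the sessions mentioned earlier, and the class and method names are made up for the occasion. With sequential catch blocks, an exception thrown inside one handler is never seen by its siblings. With composed handlers, the outer Call wraps the inner handler as well, so it does get a chance to catch it.

```csharp
using System;

static class SemanticsDemo
{
    // Same shape as the single-exception Call method above.
    static TR Call<TR, TE>(Func<TR> f, Func<TE, TR> h) where TE : Exception
    {
        try { return f(); }
        catch (TE ex) { return h(ex); }
    }

    // Sequential catch blocks: an exception thrown inside one handler
    // is not seen by the sibling handlers of the same try statement.
    public static string Sequential()
    {
        try
        {
            try { throw new FormatException("bad input"); }
            catch (FormatException) { throw new InvalidOperationException("from handler"); }
            catch (InvalidOperationException) { return "sibling handler ran"; }
        }
        catch (InvalidOperationException) { return "escaped to the caller"; }
    }

    // Composed handlers: the outer Call wraps the inner handler too,
    // so it catches whatever the inner handler throws.
    public static string Composed()
    {
        return Call(
            () => Call<string, FormatException>(
                () => { throw new FormatException("bad input"); },
                ex => { throw new InvalidOperationException("from handler"); }),
            (InvalidOperationException ex) => "outer handler ran");
    }

    static void Main()
    {
        Console.WriteLine(Sequential()); // escaped to the caller
        Console.WriteLine(Composed());   // outer handler ran
    }
}
```

Whether that behaviour is a feature or a trap depends on what your handlers do, so it is worth keeping in mind when you pick the nesting order.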

    Now, to work on the readability a bit. What we really would like is some way to attach catch-handlers for various exception types to our function call. So assuming that we wrap up our original function call in a closure using a delegate of type Func<TR>, we would like to be able to attach a catch-handler for some exception type TE, and end up with a new closure that still has the type Func<TR>. Then we would have encapsulated the exception handling completely. Our unambitious Call-method from above is almost what we need, but not quite. Instead, let’s define an extension method on the type that we would like to extend! Func<TR>, that is:


    static class CatchExtensions
    {
        public static Func<TR> Catch<TR, TE>(
            this Func<TR> f,
            Func<TE, TR> h)
            where TE : Exception
        {
            // Wrap f in a new closure that also handles TE.
            return () => {
                try {
                    return f();
                } catch (TE ex) {
                    return h(ex);
                }
            };
        }
    }

    So the trick is to return a new closure that encapsulates calling the original closure and the exception handling. Then we can write code like this:


    Frob ExtSafeMethod4() {
    Func<Frob> it = () => FrobFactory.Create();
    var safe =
    it.Catch((NullReferenceException ex) => ... )
    .Catch((InvalidOperationException ex) => ... )
    .Catch((FormatException ex) => ... );
    return safe();
    }
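Since the snippets above elide the handler bodies, here is a small end-to-end sketch you can actually run. The Parse method and the sentinel return values are hypothetical stand-ins of mine, chosen only because int.Parse is an easy way to provoke a couple of different exceptions.

```csharp
using System;

static class CatchExtensions
{
    // Same Catch extension as above, repeated so the sketch is self-contained.
    public static Func<TR> Catch<TR, TE>(this Func<TR> f, Func<TE, TR> h)
        where TE : Exception
    {
        return () =>
        {
            try { return f(); }
            catch (TE ex) { return h(ex); }
        };
    }
}

static class FluentDemo
{
    // Hypothetical stand-in for a call that can fail in more than one way.
    public static int Parse(string s)
    {
        Func<int> it = () => int.Parse(s);
        var safe = it
            .Catch((FormatException ex) => -1)     // innermost handler
            .Catch((OverflowException ex) => -2);  // wrapped around the first
        return safe();
    }

    static void Main()
    {
        Console.WriteLine(Parse("42"));          // 42
        Console.WriteLine(Parse("forty-two"));   // -1
        Console.WriteLine(Parse("99999999999")); // -2
    }
}
```

Note that the first Catch in the chain ends up innermost, and each later one wraps everything before it.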

    Now the neat thing is that you can very easily separate out the catch-handler-attachment from the rest of the code:


    Frob ExtSafeMethod4b() {
    var safe = Protect(() => FrobFactory.Create());
    return safe();
    }
    Func<TR> Protect<TR>(Func<TR> it) {
    return
    it.Catch((NullReferenceException ex) => ... )
    .Catch((InvalidOperationException ex) => ... )
    .Catch((FormatException ex) => ... );
    }

    So we have essentially created a fluent interface for attaching catch-handlers to a method call. The cool thing is that it is trivial to attach additional exception handlers as needed – and since we do so programmatically, we can even have logic to control the attachment of handlers. Say we discovered that we needed to catch WerewolfExceptions when the moon is full? No problem:


    Func<Frob> WolfProof() {
    var f = Protect(() => FrobFactory.Create());
    if (IsFullMoon()) {
    f = f.Catch((WerewolfException ex) => ... ); // silver bullet?
    }
    return f;
    }

    In my eyes, this is pretty cool. You might be running away screaming, thinking I’m crazy and that with this approach, you’ll never know which exceptions you’re actually catching anymore. You could be right. Opinions differ.

    But that’s OK. All I’m doing is providing an alternative approach to the handling of multiple exceptions – one that I think offers increased power and flexibility. I’m not saying you should take advantage of it. With greater power comes greater responsibility and all that.

    And besides, we still haven’t talked about Linq. An alternative (and attractive) solution to our current fluent interface is to attach a sequence of catch-handlers at once. Something like this:


    // Unfortunately, this won't compile.
    Func<TR> Protect<TR>(Func<TR> it) {
    return
    it.CatchAll(
    (NullReferenceException ex) => ... ,
    (InvalidOperationException ex) => ... ,
    (FormatException ex) => ... );
    }

    However, it’s surprisingly difficult to provide a suitable type for that sequence of catch-handlers – in fact, the C# compiler fails to do so! The problem is that delegates are contravariant in their parameters, which means that a delegate D1 is considered a subtype of delegate D2 if the parameters of D1 are supertypes of the parameters of D2. That’s all a bit abstract, so perhaps an example will help:


    Action<object> d1 = (object o) => {};
    Action<string> d2 = (string s) => {};
    d1 = d2; // This won't compile.
    d2 = d1; // This is OK.

    To make sense of the abstract description above, assume that D1 is Action<object> and D2 is Action<string>. Since the D1 parameter (object) is a supertype of the D2 parameter (string), it follows that D1 is a subtype of D2 – and not the other way around, as we might have guessed. This is why the C# compiler won’t let us assign a D2 instance to a D1 reference.

    The implication is that the C# compiler will fail to find a type that will reconcile the catch handlers above. In particular, due to the contravariance of delegate parameters, we cannot type the sequence as Func<Exception, TR>, since neither Func<NullReferenceException, TR>, nor Func<InvalidOperationException, TR>, nor Func<FormatException, TR> are assignable to Func<Exception, TR>. It would go the other way around: we could assign a Func<Exception, TR> to all three of the other types, but which one should the compiler pick? If it (arbitrarily) picked Func<NullReferenceException, TR>, clearly it wouldn’t work for the two other delegates – and all other choices have the same problem.
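Here is the same point in terms of exception handlers, as a small sketch of my own (the names are made up). A general handler can stand in for a specific one, but not the other way around, which is exactly why no single delegate type fits the whole sequence.

```csharp
using System;

static class VarianceDemo
{
    public static string Describe(Exception ex)
    {
        return "caught " + ex.GetType().Name;
    }

    public static string Run()
    {
        Func<Exception, string> general = Describe;

        // Allowed: a handler that accepts any Exception can be used
        // wherever a handler for a more specific exception is expected.
        Func<NullReferenceException, string> forNull = general;
        Func<FormatException, string> forFormat = general;

        // Not allowed (uncomment to see the compile error): a handler that
        // only knows about FormatException cannot pose as a general handler.
        // Func<FormatException, string> specific = ex => ex.Message;
        // Func<Exception, string> broken = specific;

        return forNull(new NullReferenceException()) + ", " +
               forFormat(new FormatException());
    }

    static void Main()
    {
        Console.WriteLine(Run()); // caught NullReferenceException, caught FormatException
    }
}
```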

    So we’re stuck. Sort of. The only solution we have is to hide the exception type somehow, so that we don’t have to include the exception type in the type of the sequence. Now how do we do that? Well, in some sense, we’ve already seen an example of how to do that: we hide the exception handling (and the type) inside a closure. So all we need is some way to convert an exception handler to a simple transformation function that doesn’t care about the type of the exception itself. Like this:


    Func<Func<TR>, Func<TR>> Encapsulate<TR, TE>(Func<TE, TR> h)
    where TE : Exception
    {
    return f => () =>
    {
    try {
    return f();
    }
    catch (TE ex) {
    return h(ex);
    }
    };
    }

    So what is this thing? It’s a method that encapsulates the catch-handler inside a closure. This closure will take as input a closure of type Func<TR> and produce as output another closure of type Func<TR>. In the process, we have hidden the type TE, so that the C# compiler doesn’t have to worry about it anymore: all we have is a thing that will transform a Func<TR> to another Func<TR>.

    So now we can sort of accomplish what we wanted, even though it’s less than perfect.


    Func<TR> Protect<TR>(Func<TR> it) {
    return
    it.CatchAll(
    Encapsulate((NullReferenceException ex) => ...),
    Encapsulate((InvalidOperationException ex) => ...),
    Encapsulate((FormatException ex) => ... ));
    }

    But now we can have some fun using Linq’s Aggregate method to compose our exception handlers. So we might write code like this:


    var catchers = new [] {
    Encapsulate((ArgumentException x) => x.Message),
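    // Explicit type arguments here: the result type cannot be inferred from a lambda that only throws.
    Encapsulate<string, InvalidOperationException>(x => { Log(x.Message); throw x; }),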
    Encapsulate((NullReferenceException x) => "Uh oh")
    };
    var protect = catchers.Aggregate((acc, nxt) => thing => nxt(acc(thing)));
    var f = protect(() => FetchStringSomewhere());
    var s = f();

    The cool part is obviously the Aggregate call, where acc is the “accumulated” composed closure, nxt is the next encapsulated exception handler and thing is the thing we’re trying to protect with our exception handlers – so in other words, the closure that contains the call to FetchStringSomewhere.
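If the nesting order that Aggregate produces is hard to picture, this runnable sketch might help. It is my own toy example with made-up names: the first catcher in the array ends up innermost, and the second one is wrapped around it, so it also sees what the first handler throws.

```csharp
using System;
using System.Linq;

static class AggregateDemo
{
    // Same shape as Encapsulate above, repeated to keep the sketch self-contained.
    static Func<Func<TR>, Func<TR>> Encapsulate<TR, TE>(Func<TE, TR> h)
        where TE : Exception
    {
        return f => () =>
        {
            try { return f(); }
            catch (TE ex) { return h(ex); }
        };
    }

    public static string Run(Func<string> thing)
    {
        var catchers = new[]
        {
            // Innermost: turns a FormatException into a different exception.
            Encapsulate<string, FormatException>(
                x => { throw new InvalidOperationException("wrapped: " + x.Message); }),
            // Wrapped around the first, so it also sees what that handler throws.
            Encapsulate((InvalidOperationException x) => "recovered from " + x.Message)
        };
        var protect = catchers.Aggregate((acc, nxt) => t => nxt(acc(t)));
        return protect(thing)();
    }

    static void Main()
    {
        Console.WriteLine(Run(() => "fine"));
        // prints "fine"
        Console.WriteLine(Run(() => { throw new FormatException("oops"); }));
        // prints "recovered from wrapped: oops"
    }
}
```

Swapping acc and nxt in the lambda would reverse the nesting, which matters as soon as one handler can throw something another handler catches.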

    And of course we can now implement CatchAll if we want to:


    static class CatchExtensions
    {
    public static Func<TR> CatchAll<TR>(
    this Func<TR> f,
    params Func<Func<TR>, Func<TR>>[] catchers)
    {
    var protect = catchers.Aggregate((acc, nxt) => thing => nxt(acc(thing)));
    return protect(f);
    }
    }

    Now please, if you are Eric Lippert and can come up with code that proves that I’m wrong with respect to the typing of sequences of exception handler delegates – please let me know! I would very much like to be corrected if that is the case.


Another wild tail chase

It appears I’ve been waiting in vain! It’s been more than a month since my last blog post, and still no pull requests for TailCop! In particular, no pull requests that implement rewriting of recursive calls to loops for instance methods. I don’t know why.

I guess it’s up to me, then.

To recap, TailCop is a simple utility I wrote to rewrite tail-recursive static methods to use loops instead (which prevents stack overflow in cases where the recursion goes very deep). The reason we shied away from instance methods last time is dynamic dispatch, which complicates matters a bit. We’ll tackle that in this blog post. To do so, however, we need to impose a couple of constraints.

First, we need to make sure that the instance method is non-virtual, that is, that it cannot be overridden in a subclass. Why? Well, let’s suppose you let Add be virtual, so that people may override it. Sounds reasonable? If it isn’t overridden, then it will behave just the same whether or not it’s rewritten. If it is overridden, then it shouldn’t matter if we override the recursive or the rewritten version, right? Well, you’d think, but unfortunately that’s not the case.

Say you decided to make Add virtual and rewrote it using TailCop. A few months pass by. Along comes your enthusiastic, dim-witted co-worker, ready to redefine the semantics of Add. He’s been reading up on object-orientation and is highly motivated to put all his hard-won knowledge to work. Unfortunately, he didn’t quite get to the Liskov thing, and so he ends up with this:


class Adder
{
    public virtual int Add(int x, int y)
    {
        return x > 0 ? Add(x - 1, y + 1) : y;
    }
}
class BlackAdder : Adder
{
    public override int Add(int x, int y)
    {
        return base.Add(x, x > 0 ? y + 1 : y);
    }
}

So while he overrides the Add method in a subclass, he doesn’t replace it wholesale – he still invokes the original Add method as well. But then we have a problem. Due to dynamic dispatch, the recursive Add call in Adder will invoke BlackAdder.Add which will then invoke Adder.Add and so forth. Basically we’re taking the elevator up and down in the inheritance hierarchy. If we rewrite Adder.Add to use a loop, we will never be allowed to take the elevator back down to BlackAdder.Add. Obviously, then, the rewrite is not safe. Running BlackAdder.Add(30, 30) yields 90 with the original recursive version of Adder.Add and 61 with the rewritten version. Long story short: we will not rewrite virtual methods.
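You don’t need TailCop to see the two numbers. In the sketch below, LoopAdder is a hand-written stand-in for what a loop rewrite of Adder.Add would look like (an assumption on my part, not actual TailCop output), and BlackLoopAdder is the same override applied on top of it.

```csharp
using System;

class Adder
{
    public virtual int Add(int x, int y)
    {
        // The recursive call is dispatched dynamically, so it can land in a subclass.
        return x > 0 ? Add(x - 1, y + 1) : y;
    }
}

// Hand-written stand-in for a loop rewrite of Adder.Add.
class LoopAdder
{
    public virtual int Add(int x, int y)
    {
        while (x > 0) { x = x - 1; y = y + 1; }
        return y;
    }
}

class BlackAdder : Adder
{
    public override int Add(int x, int y)
    {
        return base.Add(x, x > 0 ? y + 1 : y);
    }
}

class BlackLoopAdder : LoopAdder
{
    public override int Add(int x, int y)
    {
        return base.Add(x, x > 0 ? y + 1 : y);
    }
}

static class DispatchDemo
{
    static void Main()
    {
        // Recursive base: every step bounces through the override, adding 2 to y.
        Console.WriteLine(new BlackAdder().Add(30, 30));     // 90
        // Loop base: the override runs exactly once.
        Console.WriteLine(new BlackLoopAdder().Add(30, 30)); // 61
    }
}
```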

Our second constraint is that, obviously, the recursive call has to be made on the same object instance. If we call the same method on a different instance, there’s no guarantee that we’ll get the same result. For instance, if the calculation relies on object state in any way, we’re toast. So we need to invoke the method on this, not that. So how do we ensure that a recursive call is always made on the same instance – that is, on this? Well, obviously we need to be in a situation where the evaluation stack contains a reference to this in the right place when we’re making the recursive call. In IL, the this parameter to instance methods is always passed explicitly, unlike in C#. So a C# instance method with n parameters is represented as a method with n+1 parameters on the IL level. The additional parameter in IL is for the this reference, and is passed as the first parameter to the instance method. (This is similar to Python, where the self parameter is always passed explicitly to instance methods.) So anyways, if we take the evaluation stack at the point of a call to an instance method and pop off n values (corresponding to the n parameters in the C# method), we should find a this there. If we find something else, we won’t rewrite.

While the first constraint is trivial to check for, the second one is a bit more involved. What we have at hand is a data-flow problem, which is a big thing in program analysis. In our case, we need to identify places in the code where this references are pushed onto the stack, and emulate how the stack changes when different opcodes are executed. To model the flow of data in a method (in particular: flow of this references), we first need to construct a control flow graph (CFG for short). A CFG shows (statically) the different possible execution paths through the method. It consists of nodes that represent blocks of instructions and edges that represent paths from one such block to another. A method without branching instructions has a trivial CFG, consisting of a single node representing a block with all the instructions in the method. Once we have branching instructions, however, things become a little more interesting. For instance, consider the code below:


public static int Loop(int x, int y)
{
    while (x > 0)
    {
        --x;
        ++y;
    }
    return y;
}

The CFG for (the IL representation of) that code looks like this:

[Figure: control flow graph for the Loop method]

As you can see, some nodes now have multiple inbound edges. This matters when we try to describe how data flows through the method. Let’s see if we can sketch it out by looking at what happens inside a single node first. A node represents a block of n instructions. Each instruction can be thought of as a function f : S -> S' that accepts a stack as input and produces a stack as output. We can then produce an aggregated stack transformation function for an entire node in the CFG by composing such functions. Since each node represents a block of n instructions, we have a corresponding sequence of functions f_0, f_1, ..., f_{n-1} of stack transformations. We can use this sequence to compose a new function g : S -> S' by applying each function f_i in order, as follows: g(s) = f_{n-1}(...(f_1(f_0(s)))). In superior languages, this is sometimes written g = f_{n-1} ∘ ... ∘ f_1 ∘ f_0, where ∘ is the composition operator.

Each node in the CFG is associated with such a transformation function g. Now the edges come into play: since it is possible to arrive at some node n in the CFG by following different paths, we may end up with more than a single stack as potential input for n’s g function – and hence more than a single stack as potential output. In general, therefore, we associate with each node a set I of distinct input stacks and a set O of distinct output stacks. Obviously, if there is an edge from node n to node m in the CFG, then all stacks in n’s output set will be elements in m’s input set. To determine the sets I and O for each node in the CFG, we traverse the edges in the CFG until the various Is and Os stabilize, that is, until we no longer produce additional distinct stacks in any of the sets.

This gives us the following pseudocode for processing a node in the CFG, given a set S of stacks as input:


def process S
  if exists s in S where not I contains s
    I = I union S
    O = map g I
    foreach e in edges
      process e.target O
    end
  end
end

Initially, I and O for all nodes will be empty sets. We start processing at the entry node, with S containing just the empty stack. When we’re done, each node will have its appropriate sets I and O.

So now we have the theory pretty much in place. We still need a way to determine the potential stack state(s) at the place in the code where it matters, though: at the call instruction for the recursive method call. It’s very easy at this point – we already have all the machinery we need. Assuming that a recursive call happens as the k’th instruction in some node, all we have to do is compose the function h(s) = f_{k-1}(...f_1(f_0(s))), alternatively h = f_{k-1} ∘ ... ∘ f_1 ∘ f_0. Mapping h onto the stack elements of the node’s I set, we get a set C of stack elements at the call site. Now we pop off any “regular” argument values for the method call off the stacks in C to produce a new set C'. Finally we verify that for all elements in C', we have a this reference at the top of the stack.

Now we should be in good shape to tackle the practicalities of our implementation. One thing we obviously need is a data type to represent our evaluation stack – after all, our description above is littered with stack instances. The stack can be really simple, in that it only needs to distinguish between two kinds of values: this and anything else. So it’s easy, we’ll just use the plain ol’ Stack<bool>, right? Unfortunately we can’t, since Stack<bool> is mutable (in that push and pop mutate the stack they operate on). That’s definitely not going to work. When producing the stack instances in O, we don’t want the g function to mutate the actual stack instances in I in the process. We might return later on with stack instances we’ll want to compare to the instances in I, so we need to preserve those as-is. Hence we need an immutable data structure. So we should use ImmutableStack<bool> from the Immutable Collections, right? I wish! Unfortunately, ImmutableStack<bool>.Equals has another problem (which Stack<bool> also has) – it implements reference equality, whereas we really need value equality (so that two distinct stack instances containing the same values are considered equal). So I ended up throwing together my own EvalStack class instead, highly unoptimized and probably buggy, but still serving our needs.


class EvalStack
{
private const char YesThis = '1';
private const char NotThis = '0';
private readonly string _;
private EvalStack(string s) { _ = s; }
public EvalStack() : this("") {}
public bool Peek()
{
if (IsEmpty)
{
throw new Exception("Cannot Peek an empty stack.");
}
return _[0] == YesThis;
}
public EvalStack Pop()
{
if (IsEmpty)
{
throw new Exception("Cannot Pop an empty stack.");
}
return new EvalStack(_.Substring(1));
}
public EvalStack Push(bool b)
{
char c = b ? YesThis : NotThis;
return new EvalStack(c + _);
}
public bool IsEmpty
{
get { return _.Length == 0; }
}
public override bool Equals(object that)
{
return Equals(that as EvalStack);
}
public bool Equals(EvalStack that)
{
// Guard against null, which is what "that as EvalStack" yields for other types.
return that != null && _.Equals(that._);
}
public override int GetHashCode()
{
return (_ != null ? _.GetHashCode() : 0);
}
public override string ToString()
{
return "[" + _ + "]";
}
}

I’m particularly happy about the way the code for the ToString method ended up.
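In case the equality problem sounds academic, here is a small sketch of it. It assumes the System.Collections.Immutable package is available (it ships with recent .NET versions), and the class and method names are mine. Two ImmutableStack<bool> instances with identical contents are still two different elements as far as a set is concerned, whereas strings compare by value, which is what EvalStack leans on.

```csharp
using System;
using System.Collections.Generic;
using System.Collections.Immutable;
using System.Linq;

static class EqualityDemo
{
    static ImmutableStack<bool> Build()
    {
        return ImmutableStack<bool>.Empty.Push(true).Push(false);
    }

    // Two distinct instances with the same contents are not Equal.
    public static bool BuiltInEquals() { return Build().Equals(Build()); }

    // The contents do match when compared element by element.
    public static bool SameContents() { return Build().SequenceEqual(Build()); }

    // So a set keeps both copies instead of collapsing them into one.
    public static int DistinctInSet()
    {
        var set = new HashSet<ImmutableStack<bool>> { Build(), Build() };
        return set.Count;
    }

    // Strings compare by value, so equal contents collapse to one element.
    public static int DistinctStrings()
    {
        var set = new HashSet<string> { "01", string.Concat("0", "1") };
        return set.Count;
    }

    static void Main()
    {
        Console.WriteLine(BuiltInEquals());   // False
        Console.WriteLine(SameContents());    // True
        Console.WriteLine(DistinctInSet());   // 2
        Console.WriteLine(DistinctStrings()); // 1
    }
}
```

With reference equality, every pass through a loop in the CFG would appear to produce brand new stacks, so the I and O sets could never settle.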

Now that we have a data structure for the evaluation stack, we can proceed to look at how to implement the functions f : S -> S' associated with each instruction in the method body. At first glance, this might seem like a gargantuan task – in fact, panic might grip your heart – as there are rather a lot of different opcodes in IL. I haven’t bothered to count them, but it’s more than 200. Luckily, we don’t have to implement unique functions for all of them. Instead, we’ll treat groups of them in bulk. At a high level, we’ll distinguish between three groups of instructions: generators, propagators and consumers.

The generators are instructions that conjure up this references out of thin air and push them onto the stack. The prime example is ldarg.0 which loads the first argument to the method and pushes it. For instance methods, the first argument is always this. In addition to ldarg.0, there are a few other instructions that in principle could perform the same task (such as ldarg n).

The propagators are instructions that can derive or pass on a this reference from an existing this reference. The dup instruction is such an instruction. It duplicates whatever is currently at the top of the stack. If that happens to be a this reference, the result will be this references at the two topmost locations in the stack.

The vast majority of the instructions, however, are mere consumers in our scenario. They might vary in their exact effect on the stack (how many values they pop and push), but they’ll never produce a this reference. Hence we can treat them generically, as long as we know the number of pops and pushes for each – we’ll just pop values regardless of whether or not they are this references, and we’ll push zero or more non-this values onto the stack.

At this point, it’s worth considering the soundness of our approach. In particular, what happens if I fail to identify a generator or a propagator? Will the resulting code still be correct? Yes! Why? Because we’re always erring on the conservative side. As long as we don’t falsely produce a this reference that shouldn’t be there, we’re good. Failing to produce a this reference that should be there is not much of a problem, since the worst thing that can happen is that we miss a tail call that could have been rewritten. For instance, I’m not even bothering to try to track this references that are written to fields with stfld (for whatever reason!) and then read back and pushed onto the stack with ldfld.

Does this mean TailCop is safe to use and you should apply it to your business critical applications to benefit from the immense speed benefits and reduced risk of stack overflows that stem from rewriting recursive calls to loops? Absolutely not! Are you crazy? Expect to find bugs, oversights and gaffes all over the place. In fact, TailCop is very likely to crash when run on code examples that deviate much at all from the simplistic examples found in this post. All I’m saying is that the principles should be sound.

Mono.Cecil does try to make our implementation task as simple as possible, though. The OpCode class makes it almost trivial to write f-functions for most consumers – which is terrific news, since so many instructions end up in that category. Each OpCode in Mono.Cecil knows how many values it pops and pushes, and so we can easily compose each f from the primitive functions pop and push. For instance, assume we want to create the function fadd for the add instruction. Since Mono.Cecil is kind enough to tell us that add pops two values and pushes one value, we’ll use that information to compose the function fadd(s) = push(*non-this*, pop(pop(s))).

Here’s how we compose such f-functions for consumer instructions in TailCop:


public Func<EvalStack, EvalStack> CreateConsumerF(int pops, int pushes)
{
Func<EvalStack, EvalStack> pop = stack => stack.Pop();
Func<EvalStack, EvalStack> push = stack => stack.Push(false);
Func<EvalStack, EvalStack> result = stack => stack;
for (int i = 0; i < pops; i++)
{
var fresh = result;
result = stack => pop(fresh(stack));
}
for (int i = 0; i < pushes; i++)
{
var fresh = result;
result = stack => push(fresh(stack));
}
return result;
}

Notice the two fresh variables, which are there to avoid problems related to closure modification. Eric Lippert explains the problem here and here. TL;DR is: we need a fresh variable to capture each intermediate result closure.
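If you want to see the problem for yourself without blowing the stack, here is a toy sketch on plain integers. The depth guard in the second method is something I added for this illustration only, so that the runaway recursion ends in an exception we can catch rather than a crash.

```csharp
using System;

static class CaptureDemo
{
    // With a fresh variable per iteration, each lambda captures the
    // previous function, so the chain is built up step by step.
    public static int WithFresh(int steps)
    {
        Func<int, int> result = x => x;
        for (int i = 0; i < steps; i++)
        {
            var fresh = result;
            result = x => fresh(x) + 1;
        }
        return result(0);
    }

    // Without it, the lambda captures the variable 'result' itself. By the
    // time the lambda runs, 'result' refers to that very lambda, so it
    // just keeps calling itself until the guard gives up.
    public static int WithoutFresh(int steps, int maxDepth)
    {
        int depth = 0;
        Func<int, int> result = x => x;
        for (int i = 0; i < steps; i++)
        {
            result = x =>
            {
                if (++depth > maxDepth)
                {
                    throw new InvalidOperationException("runaway recursion");
                }
                return result(x) + 1;
            };
        }
        return result(0);
    }

    static void Main()
    {
        Console.WriteLine(WithFresh(3)); // 3
        try
        {
            WithoutFresh(3, 100);
        }
        catch (InvalidOperationException ex)
        {
            Console.WriteLine(ex.Message); // runaway recursion
        }
    }
}
```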

We’ll call the CreateConsumerF method from the general CreateF method which also handles generators and propagators. The simplest possible version looks like this:


private Func<EvalStack, EvalStack> CreateF(Instruction insn)
{
var op = insn.OpCode;
if (op == OpCodes.Ldarg_0)
{
// Prototypical *this* generator!!
return stack => stack.Push(true);
}
if (op == OpCodes.Dup)
{
// Prototypical *this* propagator!!
return stack => stack.Push(stack.Peek());
}
return CreateConsumerF(insn);
}

You’ll note that I’ve only included a single generator and a single propagator! I might add more later on. The minimal CreateF version is sufficient to handle our naïve Add method though.

Now that we have a factory that produces f-functions for us, we’re all set to compose g-functions for each node in the CFG.


private Func<EvalStack, EvalStack> CreateG()
{
return CreateComposite(_first, _last);
}
private Func<EvalStack, EvalStack> CreateComposite(
Instruction first,
Instruction last)
{
Instruction insn = first;
Func<EvalStack, EvalStack> fun = stack => stack;
while (insn != null)
{
var f = CreateF(insn);
var fresh = fun;
fun = stack => f(fresh(stack));
insn = (insn == last) ? null : insn.Next;
}
return fun;
}

Once we have the g-function for each node, we can proceed to turn the pseudocode for processing nodes into actual C# code. In fact the C# code looks remarkably similar to the pseudocode.


public void Process(ImmutableHashSet<EvalStack> set)
{
var x = set.Except(_I);
if (!x.IsEmpty)
{
_I = _I.Union(x);
_O = _I.Select(s => G(s)).ToImmutableHashSet();
foreach (var n in _targets)
{
n.Process(_O);
}
}
}

We process the nodes in the CFG, starting at the root node, until the I-sets of input stacks and the O-sets of output stacks stabilize. At that point, we can determine the stack state (with respect to this references) for any given instruction in the method – including for any recursive method call. We determine whether or not it is safe to rewrite a recursive call to a loop like this:


public bool SafeToRewrite(Instruction call,
    Dictionary<Instruction, Node> map)
{
    var insn = call;
    while (!map.ContainsKey(insn))
    {
        insn = insn.Previous;
    }
    var node = map[insn];
    var stacks = node.GetPossibleStacksAt(call);
    var callee = (MethodReference)call.Operand;
    var pops = callee.Parameters.Count;
    return stacks.Select(s =>
    {
        var st = s;
        for (int i = 0; i < pops; i++)
        {
            st = st.Pop();
        }
        return st;
    }).All(s => s.Peek());
}

We find the node that the call instruction belongs to, find the possible stack states at the call site, pop off any values intended to be passed as arguments to the method, and verify that we find a this reference at the top of each stack. Simple! To find the possible stack states, we need to compose the h-function for the call, but that’s easy at this point.


ImmutableHashSet<EvalStack> GetPossibleStacksAt(Instruction call)
{
    var h = CreateH(call);
    return _I.Select(h).ToImmutableHashSet();
}

Func<EvalStack, EvalStack> CreateH(Instruction call)
{
    return CreateComposite(_first, call.Previous);
}

And with that, we’re done. Does it work? It works on my machine. You’ll have to download TailCop and try for yourself.
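
If you’d like to watch the node processing settle down without downloading anything, here’s a small stand-alone model of it. It is only a sketch under some loud assumptions: a stack is modelled as a string of 'T' (this) and 'F' (anything else) so that set equality comes for free, the g-functions are hand-written lambdas rather than composed from instructions, and the names SketchNode, FixpointSketch and Visits are made up. The Process method mirrors the one shown earlier.

```csharp
using System;
using System.Collections.Generic;
using System.Collections.Immutable;
using System.Linq;

// Stand-in for a CFG node. Stacks are strings of 'T'/'F', top of stack last.
sealed class SketchNode
{
    private readonly Func<string, string> _g;
    private readonly List<SketchNode> _targets = new List<SketchNode>();

    public ImmutableHashSet<string> I { get; private set; } = ImmutableHashSet<string>.Empty;
    public ImmutableHashSet<string> O { get; private set; } = ImmutableHashSet<string>.Empty;
    public int Visits { get; private set; }

    public SketchNode(Func<string, string> g) { _g = g; }

    public void AddTarget(SketchNode n) { _targets.Add(n); }

    public void Process(ImmutableHashSet<string> set)
    {
        Visits++;
        var fresh = set.Except(I);
        if (fresh.IsEmpty) return;   // nothing new arrived, so the propagation stops here
        I = I.Union(fresh);
        O = I.Select(_g).ToImmutableHashSet();
        foreach (var n in _targets) n.Process(O);
    }
}

static class FixpointSketch
{
    static void Main()
    {
        // entry pushes 'this'; body leaves the stack as it found it and loops back to itself.
        var entry = new SketchNode(s => s + "T");
        var body = new SketchNode(s => s);
        entry.AddTarget(body);
        body.AddTarget(body);        // back edge
        entry.Process(ImmutableHashSet.Create(""));
        Console.WriteLine(string.Join(",", body.I));   // T
        Console.WriteLine(body.Visits);                // 2
    }
}
```

The second visit to the body node brings no stacks it hasn’t already seen, so the recursion ends even though the graph contains a cycle.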


Chasing your tail with bytecode manipulation

Last week I was at the TDC conference in Trondheim to do a talk entitled “Bytecode for beginners”. In one of my demos, I showed how you might do a limited form of tail call elimination using bytecode manipulation. To appreciate what (recursive) tail calls are and why you might want to eliminate them, consider the following code snippet:


static int Add(int x, int y)
{
    return x > 0 ? Add(x - 1, y + 1) : y;
}

As you can see, it’s a simple recursive algorithm to add two (non-negative) integers together. Yes, I am aware that there is a bytecode operation available for adding integers, but let’s forget about such tedious practicalities for a while. It’s just there to serve as a minimal example of a recursive algorithm. Bear with me.

The algorithm exploits two simple facts:

1. Addition is trivial to do if one of the numbers is zero.
2. We can work our way to the trivial case incrementally.

So basically we just decrement x and increment y until we run out of x, and then all we have left is y. Pretty simple.

This algorithm works really well for lots of integers, but the physical world of the computer puts a limit on how big x can be. The problem is this: each time we call Add, the .NET runtime will allocate a bit of memory known as a stack frame for the execution of the method. To illustrate, consider the addition of two small numbers, 6 + 6. If we imagine the stack frames -uh- stacked on top of each other, it might look something like this:

[Figure: add-call-stack]

So we allocate a total of 7 stack frames to perform the calculation. The .NET runtime will handle that just fine, but 6 is a pretty small number. In general we allocate x + 1 stack frames, and at some point that becomes a problem. The .NET runtime can only accommodate so many stack frames before throwing in the towel (where the towel takes on the physical form of a StackOverflowException).
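
If you’d rather count than take my word for it, here’s a small sketch that instruments Add with a depth counter. The FrameCounter class and its fields are made up for the illustration. Note that the bookkeeping after the recursive call means this instrumented version is no longer tail recursive; it’s only meant for counting frames.

```csharp
using System;

static class FrameCounter
{
    static int _depth;             // invocations of Add currently in progress
    public static int MaxDepth;    // deepest nesting seen so far

    public static int Add(int x, int y)
    {
        _depth++;
        MaxDepth = Math.Max(MaxDepth, _depth);
        int result = x > 0 ? Add(x - 1, y + 1) : y;
        _depth--;
        return result;
    }

    static void Main()
    {
        Console.WriteLine(Add(6, 6));   // 12
        Console.WriteLine(MaxDepth);    // 7
    }
}
```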

It’s worth noting, though, that all we’re really doing in each of the stack frames leading up to Add(0, 12) is waiting around for the next invocation of Add to finish, and when we get its result, we immediately return it as the result of the current stack frame.

This is what is known as a tail recursive call. In general, a tail call is any call in tail position, that is, any call that happens as the last operation of a method. It may be a call to the same method (as in our example) or it may be a call to some other method. In either case, we’re making a method call at a point in time where we don’t have much need for the old stack frame anymore.
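
To make the notion of tail position concrete, here’s a sketch that contrasts a call that is not in tail position with one that is. Factorial is just a stock example that doesn’t appear elsewhere in this post, and the names TailPosition, Factorial and FactorialAcc are mine.

```csharp
using System;

static class TailPosition
{
    // Not a tail call: the multiplication happens after the recursive call
    // returns, so the current stack frame still has work left to do.
    public static long Factorial(int n)
    {
        return n <= 1 ? 1 : n * Factorial(n - 1);
    }

    // Tail call: the result of the recursive call is returned as it is.
    public static long FactorialAcc(int n, long acc)
    {
        return n <= 1 ? acc : FactorialAcc(n - 1, n * acc);
    }

    static void Main()
    {
        Console.WriteLine(Factorial(10));         // 3628800
        Console.WriteLine(FactorialAcc(10, 1));   // 3628800
    }
}
```

In the second version, nothing remains to be done in the caller’s stack frame once the recursive call has been made.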

It should come as no surprise, therefore, that clever people have figured out that in principle, we don’t need a brand new stack frame for each tail call. Instead, we can reuse the old one, slightly modified, and simply jump to the appropriate method. This is known as tail call optimization or tail call elimination. You can find all the details in a classic paper by the eminent Guy L Steele Jr. The paper has the impressive title DEBUNKING THE “EXPENSIVE PROCEDURE CALL” MYTH or PROCEDURE CALL IMPLEMENTATIONS CONSIDERED HARMFUL or LAMBDA: THE ULTIMATE GOTO, but is affectionately known as simply Lambda: The Ultimate GOTO (probably because overly long and complex titles are considered harmful).

In this blog post, we’ll implement a poor man’s tail call elimination by transforming recursive tail calls into loops. Instead of actually making a recursive method call, we’ll just jump to the start of the method – with the arguments to the method set to the appropriate values. That’s actually remarkably easy to accomplish using bytecode rewriting with the ever-amazing Mono.Cecil library. Let’s see how we can do it.

First, we’ll take a look at the original bytecode, the one that does the recursive tail call.


.method private hidebysig static
    int32 Add(
        int32 x,
        int32 y
    ) cil managed
{
    // Code size 17 (0x11)
    .maxstack 8
    IL_0000: ldarg.0
    IL_0001: brtrue.s IL_0005
    IL_0003: ldarg.1
    IL_0004: ret
    IL_0005: ldarg.0
    IL_0006: ldc.i4.1
    IL_0007: sub
    IL_0008: ldarg.1
    IL_0009: ldc.i4.1
    IL_000a: add
    IL_000b: call int32 Program::Add(int32,int32)
    IL_0010: ret
} // end of method Program::Add

So the crucial line is at IL_000b; that’s where the recursive tail call happens. We’ll eliminate the call instruction and replace it with essentially a goto. In terms of IL we’ll use a br.s opcode (where “br” means branch), with the first instruction (IL_0000) as target. Prior to jumping to IL_0000, we need to update the argument values for the method. The way method calls work in IL is that the argument values have been pushed onto the execution stack prior to the call, with the first argument deepest down in the stack, and the last argument at the top. Therefore we already have the necessary values on the execution stack; it is merely a matter of writing them to the right argument locations. All we need to do is starg 1 and starg 0 in turn, to update the value of y and x respectively.


.method private hidebysig static
    int32 Add (
        int32 x,
        int32 y
    ) cil managed
{
    // Method begins at RVA 0x2088
    // Code size 22 (0x16)
    .maxstack 8
    IL_0000: ldarg.0
    IL_0001: ldc.i4.0
    IL_0002: bgt.s IL_0006
    IL_0004: ldarg.1
    IL_0005: ret
    IL_0006: ldarg.0
    IL_0007: ldc.i4.1
    IL_0008: sub
    IL_0009: ldarg.1
    IL_000a: ldc.i4.1
    IL_000b: add
    IL_000c: starg 1
    IL_0010: starg 0
    IL_0014: br.s IL_0000
} // end of method Program::Add

If we reverse engineer this code into C# using a tool like ILSpy, we’ll see that we’ve indeed produced a loop.


private static int Add(int x, int y)
{
    while (x != 0)
    {
        int arg_0F_0 = x - 1;
        y++;
        x = arg_0F_0;
    }
    return y;
}

You may wonder where the arg_0F_0 variable comes from; I do too. ILSpy made it up for whatever reason. There’s nothing in the bytecode that mandates a local variable, but perhaps it makes for simpler reverse engineering.

Apart from that, we note that the elegant recursive algorithm is gone, replaced by a completely pedestrian and mundane one that uses mutable state. The benefit is that we no longer run the risk of running out of stack frames – the reverse engineered code never allocates more than a single stack frame. So that’s nice. Now if we could do this thing automatically, we could have the best of both worlds: we could write our algorithms in the recursive style, yet have them executed as loops. That’s where TailCop comes in.

TailCop is a simple command line utility I wrote that rewrites some tail calls into loops, as in the example we’ve just seen. Why some and not all? Well, first of all, rewriting to loops doesn’t help much for mutually recursive methods, say. So we’re restricted to strictly self-recursive tail calls. Furthermore, we have to be careful with dynamic dispatch of method calls. To keep TailCop as simple as possible, I evade that problem altogether and don’t target instance methods at all. Instead, TailCop will only rewrite tail calls for static methods. (Obviously, you should feel free, even encouraged, to extend TailCop to handle benign cases of self-recursive instance methods, i.e. cases where the method is always invoked on the same object. Update: I’ve done it myself.)

The first thing we need to do is find all the recursive tail calls.


private IList<Instruction> FindTailCalls(MethodDefinition method)
{
    var calls = new List<Instruction>();
    foreach (var insn in method.Body.Instructions)
    {
        if (insn.OpCode == OpCodes.Call)
        {
            var methodRef = (MethodReference)insn.Operand;
            if (methodRef == method)
            {
                if (insn.Next != null && insn.Next.OpCode == OpCodes.Ret)
                {
                    calls.Add(insn);
                }
            }
        }
    }
    return calls;
}

So as you can see, there’s nothing mystical going on here. We’re simply selecting call instructions from method bodies where the invoked method is the same as the method we’re in (so it’s recursive) and the following instruction is a ret instruction.

The second (and final) thing is to do the rewriting described above.


private void TamperWith(
    MethodDefinition method,
    IEnumerable<Instruction> calls)
{
    foreach (var call in calls)
    {
        var il = method.Body.GetILProcessor();
        int counter = method.Parameters.Count;
        while (counter > 0)
        {
            // Store in reverse order: the last argument is on top of the stack.
            var starg = il.Create(OpCodes.Starg, --counter);
            il.InsertBefore(call, starg);
        }
        var start = method.Body.Instructions[0];
        var loop = il.Create(OpCodes.Br_S, start);
        il.InsertBefore(call, loop);
        il.Remove(call.Next); // Ret
        il.Remove(call);
    }
}

As you can see, we consistently inject new instructions before the recursive call. There are three things to do:

1. Loop to update the argument values using starg instructions.
2. Insert the br.s instruction that will jump to the start of the method.
3. Remove the recursive call instruction as well as the ret that follows immediately after it.

That’s all there is to it. If you run TailCop on an assembly that contains the tail recursive Add method, it will produce a new assembly where the Add method contains a loop instead. Magic!
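
To see why storing the arguments in reverse order does the right thing, here’s a small simulation of the rewritten Add method in plain C#. It is only a sketch: the args array stands in for the argument slots, a Stack<int> stands in for the evaluation stack, and the StargSketch name is made up. No actual bytecode is involved.

```csharp
using System;
using System.Collections.Generic;

static class StargSketch
{
    // Simulates the rewritten Add: the values for the "call" are pushed in
    // argument order, then stored back into the argument slots in reverse order.
    public static int Add(int x, int y)
    {
        var args = new[] { x, y };        // argument slots 0 and 1
        var stack = new Stack<int>();     // evaluation stack
        while (args[0] > 0)
        {
            stack.Push(args[0] - 1);      // ldarg.0, ldc.i4.1, sub
            stack.Push(args[1] + 1);      // ldarg.1, ldc.i4.1, add
            for (int i = args.Length - 1; i >= 0; i--)
            {
                args[i] = stack.Pop();    // starg 1, then starg 0
            }
            // br.s to the first instruction: the while loop plays that role here
        }
        return args[1];                   // ldarg.1, ret
    }

    static void Main()
    {
        Console.WriteLine(Add(6, 6));     // 12
    }
}
```

Because the last argument sits on top of the stack, the first value popped belongs in the last argument slot, which is why the counter in TamperWith runs downwards.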

To convince ourselves (or at least make it plausible) that TailCop works in general, not just for the Add example, let’s consider another example. It looks like this:


private static int Sum(List<int> numbers, int result = 0)
{
    int size = numbers.Count();
    if (size == 0)
    {
        return result;
    }
    int last = size - 1;
    int n = numbers[last];
    numbers.RemoveAt(last);
    return Sum(numbers, n + result);
}

So once again we have a tail recursive algorithm, this time to compute the sum of numbers in a list. It would be sort of nice and elegant if it were implemented in a functional language, but we’ll make do. The idea is to exploit two simple facts about sums of lists of integers:

1. The sum of the empty list is zero.
2. The sum of a non-empty list is the value of one element (the code above takes the last) plus the sum of the rest of the list.

The only complicating detail is that we use an accumulator (the result variable) in order to make the implementation tail-recursive. That is, we pass the partial result of summing along until we run out of numbers to sum, and then the result is complete. But of course, this algorithm is now just as susceptible to stack overflows as the recursive Add method was. And so we run TailCop on it to produce this instead:


private static int Sum(List<int> numbers, int result = 0)
{
    while (true)
    {
        int num = numbers.Count<int>();
        if (num == 0)
        {
            break;
        }
        int index = num - 1;
        int num2 = numbers[index];
        numbers.RemoveAt(index);
        List<int> arg_27_0 = numbers;
        result = num2 + result;
        numbers = arg_27_0;
    }
    return result;
}

And we’re golden. You’ll note that ILSpy is just about as skilled at naming things as that guy you inherited code from last month, but there you go. You’re not supposed to look at reverse engineered code, you know.