Inspirational math

It’s pretty cute that self-help people have discovered exponential functions. Just improve 1% every day, and you will have improved almost 38-fold by the end of the year! Because you see, 1.01^365 is 37.78. Wow! So inspirational!

But what does it mean? What does it mean for me to improve 1%? What is it that I improve? My creativity? My output? My writing? My talks? My interpersonal skills? Something else? Is it everything I do? Because that sounds like a lot, to get 1% better in total, across everything I do. I’d spread myself thin, wouldn’t I? So I guess I must choose, but what will happen to my other skills then? Because skills in general need to be maintained, right? Will they somehow stay unchanged? Or will they drop 1% per day? But that would be terrible – an accumulated skill loss of more than 97% by the end of the year!

Where did the 1% number come from? How much does 1% improvement entail? Is it realistic? I guess it’s supposed to sound like a minuscule improvement, something easily achievable, but is it really? How can we tell? In fact, if 1% is so easily within our reach, why stop there? Why not put in some extra effort and go for 2% improvement? That would yield a truly mind-blowing 1377-fold improvement!

But that’s obviously ridiculous. We can’t just arbitrarily decide to improve by 2% every day! That’s absurd! But how is 1% any different? In fact, how can we be sure that the improvement – whatever it is, and whatever it even means – is not something like 0.1% instead? It makes for less impressive math of course – 1.001^365 is approximately 1.44. It is still a significant improvement though! To get 44% better at something in a year (even if I’m not entirely sure what that means unless I have a very clear metric, like time spent on something, or weights lifted or whatever) is truly remarkable in many cases. But of course it could be even bleaker. Maybe daily improvement is unrealistic, and we have to settle for 0.1% weekly? Then we’re down to about 5% for the whole year. I’m not sure about you, but I’m getting less inspired. But which calculation is most realistic? How can we tell?
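
If you want to check these figures for yourself, the arithmetic is just compound growth: (1 + rate) raised to the number of periods. A few lines of Python (my own illustration, not part of the original self-help pitch) reproduce every number used above:

  # Compound "improvement": (1 + rate) ** periods
  print(1.01 ** 365)      # ~37.78  - 1% better every day
  print(1.02 ** 365)      # ~1377   - 2% better every day
  print(1.001 ** 365)     # ~1.44   - 0.1% better every day
  print(1.001 ** 52)      # ~1.05   - 0.1% better every week, the "5%" case
  print(1 - 0.99 ** 365)  # ~0.97   - 1% worse every day, the 97% skill loss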

But it gets worse. Is it likely that we’ll be able to improve any skill or process consistently by any fixed percentage over time? If you consider an athlete, do you think they can run 1% faster every day? Or even 0.1%? Most things don’t improve that way at all! Improvement tends to be easiest in the beginning. As you get better, it gets harder and harder to improve. Even with deliberate practice, it is very hard to improve by a fixed amount every day – not to mention an exponentially growing amount! After all, 1% at the end of the year is a much larger improvement than at the beginning – that’s the whole point of exponential functions! That growth of daily increment is where the impressive 38-fold improvement comes from! It makes you wonder if exponential functions really are a good fit for most optimization problems.

Of course, continuously trying to make things a little better, or to learn something new every day, remains good advice. But we don’t need to dress it up in arbitrary, ill-fitting math to make that point.


Drift into debt

Kent Beck recently published a blog post called “Friction >> Debt”, where he argues that the “technical debt” metaphor has outlived its usefulness and proposes “friction” as a replacement. The problem, according to Beck, is that “the business folks” are prone to interpret “technical debt” as a weak excuse for wanting to spend time fixing stupid mistakes that could have been avoided in the first place. Why would they want to pay for that?

This is certainly a problem. I’m not sure how to react to the fact that even Kent Beck still finds himself in the situation where he needs to explain to the business people that software development by its very nature involves spending some time and effort on development activities that don’t directly result in new or changed features. Is it heartening (same boat)? Or depressing (it’s sinking)?

Anyway, I’m not optimistic that switching metaphors will help. The reason is that it’s not just a matter of communication across the business/IT divide. What is lacking is a better understanding of the dynamics that produce the situation we – both developers and business people – invariably find ourselves in: that our joint venture, the software we build, has accumulated so much “debt” or “friction” or “cruft” or “dirty dishes” or whatever we want to call it, that our ability to make the software more valuable for end users has degraded to the point where it can no longer be ignored. I believe this understanding is missing not just from the business people, but from us software developers as well. We might observe the symptoms and consequences of this process more directly, but I’m not convinced we understand the forces that produce them very well.

When we speak of “change” in software, we tend to think about changes in functionality: new features, extensions, refinements; in short, things that end users are meant to appreciate, and that the people footing the bill are relatively happy to pay for. But as we know, programs are hit by many other changes as well. Frameworks and libraries constantly evolve, some with breaking changes. Security vulnerabilities pop up and must be addressed. The license cost of a service provider increases by an order of magnitude and a replacement must be found. Old cloud offerings become obsolete and are eventually discontinued, while new ones appear. I’m sure you can think of many more. It is a constant push-stream of changes. We might not even notice all of them immediately, but they are still there.

When something changes that affects the software in some way, it produces a gap or a delta. This goes for all kinds of changes. Suddenly, there is a difference between the software as it is and the software as it should be. There may even be a difference between our current mental model of the problem domain and one that is suitable to support the change. These differences represent outstanding work. Actions are required if the gaps are to be closed. Since programs are dead and inert by themselves, it falls upon us humans to perform these actions.

Unfortunately, the process of gap-closing is riddled with problems. A change must first be noticed and reasonably understood, and then the corresponding gap must be addressed in a meaningful manner. To figure out the most appropriate response to a change may require deep understanding of many things: the problem domain, the mental model of that domain and the conceptual solution to the problem, the actual code itself, the architecture of the system, the programming models and technologies involved, the non-functional properties of the system, its dependencies on other systems, the runtime environment, and so forth. This is necessary to identify where and how the change affects the software, what parts need to be changed, to what extent the change is compatible with the assumptions the current solution builds upon, how it will interact with existing features, etc etc.

In the best case, it is obvious what we need to do to close the gap, and there will be no residues. For instance, we may be able to fix the gap caused by the discovery of a security vulnerability in a library by bumping the library version. But often we are not so lucky. In fact, it might not be immediately obvious what the proper response should be. Maybe there are different alternatives with different trade-offs. If we are to find a good solution, we might need to spend time researching and exploring different options. At the same time, we are always bound by constraints on time and money. Responding to any particular change competes with all the other changes in the queue. This dynamic tends to favor minimal solutions with little immediate risk over optimal solutions for the long-term health of the software.

The physical properties of the code play a role here, too. Code has a mass of sorts. As such, it exhibits inertia in the face of change. This inertia is not a property of “bad” code alone. All code has inertia. As a rule of thumb, the more code, the more inertia. That said, some code is more “dense” than other code, and as such represents more mass per line of code, and exhibits more inertia. But even simple code has inertia and presents an obstacle to change. Change that involves changing code is never free. Again, this favors minimal solutions that require us to change as little as possible of the existing structure.

In general, there will be a residue when we respond to change. We close the gaps, but not completely and not without compromise. In the process, we may incur assumptions that limit our degrees of freedom in responding to future changes. Over time, we accumulate lots and lots of these imperfectly bridged gaps that are woven into the fabric of the code. Compromises, assumptions, and imperfections multiply and ossify. The system becomes rigid and brittle. This is when we start talking about friction and debt. We have painted ourselves into a corner. There are no more minimal solutions.

Various factors influence how effective we are in closing the gaps, but we can’t avoid the fundamental processes at work. At best, a healthy and highly competent team in a healthy environment can minimize the accumulation of residues by favoring solutions that benefit the long-term health of the software, choosing simplicity over sophistication wherever possible, recognizing gaps early, consolidating models and architecture regularly, and investing in the skills and competence of its members. On the flip side, there are many things we can do to speed up the accumulation of problems, both inside and outside the team. We are familiar with many of them: unreasonable deadlines, gamification of arbitrary metrics, high pressure, low trust, no slack, focus on short-term goals, doing the bare minimum, avoiding difficult tasks, neglecting consolidation and structural improvements, and so forth. An enabling problem underlying many of these dysfunctions is a lack of understanding of the dynamics of software development, and of how the forces and constraints we operate under affect the systems we build.

Our poor understanding of the processes that lead to technical debt is mirrored in the generic and feeble language we use to describe the process of paying it off as well. In fact, it’s a bit of a stretch to call it a language, it’s a single word: refactoring. While refactoring as an activity is fine, by itself it says nothing about the processes that should drive that refactoring. After all, refactoring just means restructuring a body of code. We must do so in a way that restores the system’s ability to handle change and evolution, which means we will need to make significant changes. Chances are, we’ll need help from the business people in the process. We may need to revisit previous decisions and solutions if they don’t form a coherent whole. We may need to look at the assumptions we have made, to what extent they are still valid, and whether or not they hold us back. We may need to come up with a better mental model and language for the problem domain to drive our software going forward. We may have to remove features that don’t fit in. We may need to delete some of our favorite code. We may have to look for constraints we can embrace that will simplify our architecture, make it more consistent, and enable desirable system properties. These are crucial tasks that go way beyond mere code hygiene. Our hope in succeeding lies in the benefit of hindsight. We can apply our learning to make better, more consistent choices retroactively. But it requires an environment that is both capable of learning and understands the necessity of feeding that learning back into the system to improve it.

The nature of software development is such that the entropy of the system must grow over time. We are bound to drift into debt. That’s just the way it is. We can mitigate the effects of that process or we can accelerate it, but we can’t avoid it. This is a basic insight that we developers first need to accept ourselves, and then share with our friends, the business people. We must take collective ownership of this reality if we hope to improve our ability to build software systems that adapt and evolve successfully in a world of constant change.


Don’t settle for a playground

This is a transcript of a talk I did at Booster 2023 in Bergen.

Here’s a scene. Imagine we’re having a stand-up meeting. A stand-up meeting, as you will be aware, is a ritual where a team of software developers stands in a circle in the hope of summoning the spirit known as agility. If a sufficient number of teams performs this ritual, it brings about a digital transformation in the organization, and everyone becomes happy and productive.

Anyway, we’re having this stand-up meeting, and something comes up. Suddenly everyone goes quiet, and gazes down at their feet. There is a moment’s awkward silence, until someone looks up from under their eyebrows and says “we’re gonna need a grown-up for that”. And then we snicker a bit, cast sideways glances, shuffle our feet, and move on to something else.

Is this scene familiar to you? Have you been in this scene? I don’t know how culturally specific it is, but I’ve been in this scene multiple times. And I must say that to me, this scene is the least dignified scene in all of software development. It makes my skin crawl, it fills me with shame, and it makes me angry.

Here’s the thing: I’m not a child, and I’m not going to cosplay as one. I’m a grown man! I’ve been married for 20 years! We did a house renovation project last year. It was hell and cost a small fortune. I have two teenagers at home. Even my kids aren’t kids anymore! If I’m not grown-up, then who the hell is?

And it’s not just me. Take a look in the mirror! Do you see a carefree child, or a person juggling a tangled mess of incompatible responsibilities and making trade-offs all day long, every day? Of course you do. Everyone does.

And still we perform that stupid little scene. At least in my part of the world we do. I’m sure I know developers who played out that scene just last week. Isn’t that strange? I think it’s strange. Why do we do it?

When we behave in strange ways, it’s often because we’re getting in touch with something uncomfortable. And so we look for a way out, a way to diffuse the nervous energy and get back to normal. We use jokes for that. But we shouldn’t. We shouldn’t fall for that temptation. Discomfort is actually useful and interesting. It can be a clue, a symptom of something. If we look a little closer at that discomfort, maybe we can learn something, and maybe we can fix the problem at the source.

Obviously, it has something to do with brushing up against some kind of boundary. We find ourselves at the threshold of something, and the most obvious candidate is the good old business/IT divide. That awkward point where what we do as software developers meets the reason why the business exists. And when it feels too business-y, we retreat, and call for the real business people to take over.

When we say “we’re gonna need a grown-up for that”, what does it really mean? It’s code for something, right? What could it be? What are we really saying? Here are a few candidates:

  • We don’t have the autonomy to make that decision.
  • It’s not our responsibility.
  • We don’t have enough information.
  • It’s not just up to us, it’s going to affect someone else too.

If I’m right, and these are the kinds of things we’re trying to say, why don’t we say them outright? It seems like a shame, because they all sound like really interesting topics to me. We could have meaningful conversations about our environment, our boundaries, the constraints we operate under, our dependencies on others, our degree of ownership over the problem we’re solving, etc etc. And not just as generic, abstract topics from the realm of philosophy either, but concretely: we could build our understanding of what these big words actually mean to us, in our context. But if we just laugh it off as something awkward, we won’t be having those conversations. And so those things are going to remain unexamined and unclear.

You might say, ok, but why does it matter? What’s the harm? It’s just a joke, right? I think it matters, and that it does real harm. First of all, the child/adult trope cements a very real divide between “the business people” that make decisions on one side, and “the developers” that carry out instructions, essentially still just building to order, on the other. This is the divide that we’ve been fighting for at least 20 years of agile software development. I would argue that it’s a really bad idea to reinforce that divide, even in the form of a joke.

And second, it’s much easier to challenge the things we say outright. We don’t have the autonomy to make that decision? Ok, let’s talk about that. First of all, is it true? Or are we just scared, or reluctant, to exercise the autonomy that we do have? And if it is true, what would need to change for us to have that autonomy? Is it possible to make those changes? Who would need to be involved? What would the consequences be? This conversation offers a path towards potentially increased autonomy, or at least clarified autonomy.

And we do want autonomy, right? I’m sure that many of you are familiar with Daniel Pink’s book Drive, about what motivates us as knowledge workers. He says there are three things: Autonomy, mastery and purpose. And autonomy is the first word in that list.

Well, what kind of autonomy is that going to be? What are we aiming for here?

Consider the difference between the autonomy of children and the autonomy of adults. If you take a bunch of children and put them in a playground, they have considerable autonomy within that playground. They can run around freely, without a care in the world. They can do what they want and play with whatever toys are available. But they can’t venture outside. They need a grown-up for that.

Adults have a very different kind of autonomy. They can do many more things than children can, but that type of autonomy comes with a bunch of strings attached, like responsibility and accountability. Right? When we make a choice, we’re willing to own the consequences in a way, or at least be able to explain to ourselves and others why we made that choice. In a sense, that’s what it means to be an adult.

Similarly, I think we can distinguish between autonomy inside and outside the developer bubble. That’s our playground. I once tried to tell a joke, that the ultimate autonomy for a developer is “THEY LET US USE HASKELL”. I thought that was pretty funny.

The thing is, no-one really cares what programming language you use. Not really. They might have some concerns about being able to replace you if and when you leave, but ultimately it doesn’t matter. It’s not important. They’ll manage. They’ll find someone. Similarly, no-one really cares if you have one or two or three monitors on your desk, or what kind of mechanical keyboard you prefer, even if it’s an expensive model. They can afford them all. It’s peanuts. It’s insignificant. It’s just toys to them. Choose whichever you like.

Now consider this: if no-one really cares what you decide, does it even make sense to think of it as autonomy?

If we want real, significant autonomy, it’s going to have to be about entirely different things! Almost by definition, it’s going to be things that are painful to yield control over, because they actually matter to the so-called business people. It’s going to require real trust, and a conviction that your team is in fact best equipped to make that decision. It must be because they realize that you actually understand the business problem the best, and that you are best situated to own the consequences of that decision. That’s what it means to own a business problem, or to focus on business outcome over output. You can’t do that if your idea of autonomy is to use Vim.

Ironically, real autonomy tends to involve exactly the things that make us snicker nervously and say “we’re going to need an adult for that”. The thing to realize is that, yes, obviously we’re going to need an adult for that. Probably several. But the good news is: here we are! The adults we need are right here! We’re standing in a circle, trying to summon agility!

Of course I realize that it’s not just up to the team itself. We’ll have to fight for that kind of autonomy, and there will always be limits. Not even adults have unbounded autonomy. But we can explore those boundaries, and see where they yield to pressure. There’s no reason to fail on the safe side.

I would like to see development teams with much higher ambitions with respect to autonomy. The aim should be to become tiny organizations in their own right: self-organizing, self-governing and self-modifying. They have a mission and a budget, and they are free to carry out that mission in the way they see fit, within that budget. They can even challenge the mission itself, because they understand why the mission is important, and how it fits into the larger scheme of things. They can pivot if they find a more useful mission to pursue within that scheme. Or they can choose to disband. Dissolve the team itself, if it turns out that mission is ill-advised or infeasible or outdated. Just pass whatever money you have left back and say “it would be better for the business to invest this in something else”. That would be the grown-up thing to do.

But of course, we must really want that kind of autonomy. The kind that comes with responsibility, accountability and real business impact. We must prefer to be adults. We can’t settle for a playground.


Dragging a dead priest: programs and programmers in time

A fundamental challenge we face as programmers is that the world is alive, but the programs we write, alas, are not. When we deploy our dead programs to operate in the living world, it presents an immediate maintenance problem. Or, if you like, a matter of keeping up appearances.

In the movie Night on Earth, Roberto Benigni plays an exuberant and eccentric taxi driver working the night shift in Rome. You might be able to see the movie clip on YouTube. If you can, you should.

As night turns into early morning, the taxi driver picks up a priest. During the ride, the talkative driver starts confessing his many sins of the flesh to the priest – including affairs with a pumpkin, a sheep named Lola, and his brother’s wife. The priest listens in increasing disbelief and discomfort. Apparently he has a weak heart. By the time the taxi driver has finished his confessions, the poor priest is long dead.

What should the taxi driver do? He decides to try to make it look as if the priest has not in fact died, but rather just fallen asleep on a bench. Unfortunately, the priest’s body is unwieldy and heavy, which makes the task cumbersome. (As the taxi driver puts it, “he is only a priest, but he weighs enough to be a cardinal”.) He drags the priest out of his taxi, and with much effort manages to place him on a bench. But dead people are not good at sitting upright. The priest repeatedly slides down. After several tries, however, he succeeds in placing the priest in a reasonable sitting position. He immediately notices another problem: the priest’s eyes are wide open. It’s not a natural look. He tries to close the priest’s eyelids, but they spring back open. The solution? The sunglasses he has been wearing while driving the taxi at night. He puts the sunglasses on the priest, and drives away. As he does so, the priest slumps to the side.

I suggest that we programmers are Roberto Benigni in this story, and that our programs play the part of the priest. We are constantly dragging our programs around, trying to make them look alive. In our case, we have the additional problem that the programs grow larger as we drag them around – as if the priest were gaining ever more weight.

The world around us constantly changes. Our programs are unable to respond and adapt to those changes by themselves. They are, after all, mere code. They are dead and have no agency. It falls on us, the programmers, to run around modifying the code as appropriate in response to the changes that occur in the world. Unfortunately, since this is a manual process and we are bound by time and economic constraints, our response is bound to be partial and imperfect. We might not immediately be aware of all relevant changes that we need to respond to. Sometimes we don’t know exactly how to respond. We might not consider it worthwhile to respond to the change. The change might be incompatible with previous choices or assumptions we have made. It might take a lot of time to make the necessary changes to the code. When there are many changes, we might lag behind.

The problem, then, is that our programs are dead structures that exist in time. By necessity, they exhibit inertia in the face of change. Effort must be expended to overcome that inertia. Effort does not only cost resources, but also takes time. And time brings with it more changes.

There are further complicating factors as well. For instance, it’s increasingly hard to clearly define what the program is and isn’t. Is it just the code in the git repository? Surely not, because it almost certainly depends on third-party components and external services as well, not to mention a hosting environment, perhaps in the cloud somewhere. In a sense, our code is just a plug-in to this much larger program – which also has structure, is affected by change, exhibits inertia, and needs external help in adapting to that change. The interplay between changes and more or less delayed responses to these changes in various parts of the conglomerate program can be chaotic.

Speaking of help: the program is interwoven with the organization that creates the program, as part of a larger socio-technical system. This system is also hard to delineate. The distinction between system and non-system is an abstraction that we will into being by collaborative illusion-making. And of course the people, too, are in a sense structures (albeit live ones), with their jobs, roles, salaries, status, group memberships, relationships, loyalties and trust. We also exhibit inertia in the face of change.

What changes brush up against and confront all this inertia? All kinds: functional requirements, non-functional requirements, vulnerabilities in third party components, fashions and trends, legislation, macro and micro economic developments, funding, people leaving and joining – anything that changes the relationship between the program (or our part of it, anyway), the organization and the world they both operate in. The world is an engine that creates constant gaps, and we must keep running to stand relatively still. Or rather to maintain our balance, since the world is spinning, never where it was a moment ago.

What happens if we can’t keep up? If the gaps between expectations and reality become too large, and the illusion of the living program breaks down? Is it a second death that awaits when the priest has become too heavy for us to move?


NO! Programming as Other

A year ago I was reading the CfP for HAPOP 2020, the 5th symposium on the History and Philosophy of Programming, and there it was again, the quasi-philosophical question that keeps haunting our industry. In fact, it was listed as the very first question in a long list of potential questions of relevance for the conference:

Can/has been/should be programming understood as an art, a craft and/or a science?

Which is admittedly a strange version of the question, as it doesn’t quite parse and also contains several permutations of the basic question written as a single question. It kind of makes you suspect that it was written by a programmer. But it’s still fundamentally the same question:

Is programming art or science?

And my immediate reaction was: Et tu, Brute? Even you, HAPOP?

And my second response was just NO! I reject the question! I hate it! And I didn’t necessarily know why, but I knew very well that I did, and so I quickly wrote an incoherent rant in my TiddlyWiki to get it out of my system. And then nothing much happened for a year. Until now, in fact. This blog post is an attempt to write down a more coherent version of that rant. It is an attempt to perform a little introspection and try to explain my own reaction to myself. What is it about this question that annoys me so much? What’s wrong with it? Where does the anger come from?

Let’s start by considering the question as a meme. Memetics is the idea that memes – ideas or soundbites – are like genes in that they replicate, mutate and compete for attention.

When discussing a particular meme, an interesting question is how valuable it is. What is it worth? One way you might measure the value of a meme is by looking at the output it generates – the sum of the insights and new ideas it produces over time. And I’m frankly very, very disappointed by the output generated by the question “is programming art or science?”.

From what I can tell, the original “is programming art or science?” article is Knuth’s 1974 ACM Turing Award lecture “Computer programming as an art”. If you read that lecture, and compare it to whatever people have been writing about the topic since then – at least the writings I’ve found – you’ll find practically no new insights. None! Instead you’ll find poor echoes, and echoes of echoes, of what Knuth tried to say in 1974. That’s a terrible track record for an idea: close to 50 years of stagnation, of rehashing the same stale points over and over. And yet we continue to treat that question with reverence, as if it were somehow an important question, one that is worthwhile to keep asking.

It’s a shame that memes don’t compete on the usefulness of their output. But alas, for a meme to be reproductive, it just needs to make the host feel good, and a good way to do that is to appeal to our vanity. And it just so happens that merely asking the question “is programming art or science?” makes us feel sophisticated and philosophical: we can sip a little wine, gaze at the horizon and nod as we contemplate and admire the depth of our own souls.

It’s also useful for a meme to create a sense of community, of recognition and belonging – a sort of in-code for members of a club. One of the things that unite us as programmers is that we are constantly discussing if we’re doing art or science. And the answer that we have all memorized (yet at intervals feel the need to repeat to each other), is this: “it is neither and both and isn’t it grand”. And if you get really worked up about that answer, you write a blog post. Such blog posts invariably approach the question in a slightly roundabout fashion, which involves a Google search for more or less arbitrary definitions of art and science that fit your purpose, and then meandering a bit back and forth before concluding with what was already known. And again, that pattern was laid down by Knuth in 1974. Nothing new there.

That silly little dance is probably what annoys me the most about the question. It is a form of pseudo-intellectual posturing and peacockery. It’s just show; there is no substance to it. It’s very, very easy to do. It requires no original thinking. While dressing up as somehow intellectual, it really represents the opposite: mindless parroting.

And it’s not without cost. After all, memes do compete for attention and replication, and if we’re constantly asking the same old question (which generates no new output, no new insights into what programming is or isn’t) then we’re wasting precious meme space. We should free up that space to ask different questions. There’s an opportunity cost to asking unproductive questions over and over again, as if they were interesting. The unproductive questions occupy brain real estate that could be used for something productive instead. Hence we need new questions, questions that will stimulate actual intellectual effort.

So much for memes. Another perspective has to do with linguistics, or at least it seems that way to me, a non-linguist. If we are to take the question “is programming art or science?” seriously, we should start by taking a close look at what we mean by the words we use. For instance, we’re not very explicit about what we mean when we use the word “is”.

Consider a sentence like “programming is art”. What is the significance of such a sentence? What could it mean? Do we mean it descriptively, as a statement of fact – I think programming is art in an objective sense – or do we mean it normatively, as something that we wish were so – I think programming should be art, programming would be better if it were art. We can’t possibly mean that programming actually equals art. They’re not synonyms. There must be some other relationship between the words. Presumably it’s more about readings or perspectives than about identity. We’re perhaps trying to paint the meaning of the word “programming” by applying the paint of other words, which seems to be the way language works.

In reflection of that, let’s see what happens when we substitute the word “as” for “is”. It seems to be closer to what we’re actually trying to accomplish. We can ask, for instance, what happens if we consider programming as art, what insights might follow from that. (And to be fair to Knuth, that was actually the title of his 1974 lecture: Computer programming as an art. He didn’t say that programming was art, he said that programming could be and perhaps should be viewed as art.) Or we can ask, what happens if we consider programming as science, what insights might follow from that?

But we never seem to do that. We always discuss programming as art and as science at the same time, together. They’re tied together, and for that reason we never delve very deeply in either of the perspectives. Why is that? Why is the comparison more important than the perspectives by themselves? I think it’s because we’re really using art and science as metaphors. Art isn’t just art, it is a symbol for creativity, gut feeling, the humane. Similarly, science represents cool reason, logic, the mechanical. When we’re asking (over and over!) if programming is art or science, art and science act as metaphors for these perceived opposing forces. Sometimes, we substitute “craft” for “art” or “engineering” for “science”, but the metaphorical battle between the elemental forces remains the same.

One might expect that such an exercise would be fruitful, that it would produce useful output. There is something to be said about using two more or less opposing perspectives to describe something. But at the same time, Knuth has already done that exercise for us, and apparently we don’t have much more to say. If we did, we should have done so by now. Moreover, viewing art only in contrast to science and science only in contrast to art is also very limiting. It holds us back from going very far in our consideration of programming as either art or science.

For instance, let’s go back to the reading of programming as art. Such a perspective could be greatly expanded upon, but it’s unlikely that it will happen as long as it is forever joined at the hip to that other question. If art is always considered in contrast to science, then we won’t ever delve very deeply into the notion of programming as art as such. But we could. We could take the perspective programming as art seriously, and we might even discover something interesting, or at least be saying something that hasn’t been said a million times before. Even without much thinking, it is very easy to sketch some basic questions that still never come up in these endless repetitions of the art-vs-science meme.

If we consider programming as art, what is the artwork? To Knuth, the answer is the source code, the “listing” as they called it back then, presumably because you had it printed out, and could treat it as a physical artifact. And it’s both interesting and peculiar to think of it that way. Transposed to the world of music, it’s like considering the sheet of music to be the artwork rather than the actual music you hear. Moreover, since no-one has expanded much on anything in Knuth’s original article, that’s the only perspective I’ve seen, which is quite startling. It’s certainly not the only possible answer. The ontological status of programs is actually very interesting. What is the program? Can we really ignore what happens at runtime, the execution or performance of the program if you like? Is what happens at runtime implicit and somehow baked into the source code? Is the program the sum of all possible executions of the program? All actual executions? Are the partial simulated executions inside a programmer’s head included as well, including erroneous ones? What about self-modifying programs? Or a program’s evolution over time in general?

Who is the artist? In Knuth’s time, it seems, it was assumed that there was a single author for programs, and so a single artist for the artwork. But much if not most programming today is collaborative. Does the artist matter? Is the artist interesting? What about mob programming?

What about the experience of art? Obviously that’s going to be related to what the artwork is, the nature of the artful artifact. Is it tangible? Visible? Decipherable? To whom? Who is the audience? Can only programmers experience programming as art? Come to think of it, where are the exhibitions, and what are they like? If programming can be seen as art, can be artful, surely there must be art exhibitions, where we can appreciate the art of programming? Is it GitHub?

I guess you could argue that the demoscene arranges exhibitions of the art of programming. But I don’t think most programmers think of the demoscene when they talk about programming as art. Maybe they’ll mention it when pushed by someone asking annoying questions, but my impression is that it’s mostly about the sentiment that “code can be beautiful”. Which is fine, but not very profound. It also implies a very limited and antiquated notion of art. Modern art is often unconcerned with beauty or meaning in a conventional sense. (What is the aesthetic of programs? When is a program kitsch?)

Those are just a few of the questions that can be asked to delve at least a little deeper into the reading of programming as art. And we could do the same for programming as science.

A more pressing issue is: why on earth would we stop at just those two? It seems to me that we suffer from an extreme lack of imagination if those are the only two readings of programming that we can give, and that we are somehow condemned to repeat those two readings forever, in tandem. It’s like you could never analyze Hamlet from any angle other than a Freudian one, as an Oedipus drama. I think we desperately need to break out and look at other perspectives. What is programming as neither art nor science? What is it then?

What happens if we think of programming as something else entirely, or even as the negation or absence of something? Are there aspects of programming that somehow escape our attempts to capture or describe them with metaphors? What is the irreducible other of programming?

Eventually we might ask ourselves – as we perform these readings, these attempts to highlight some aspect of what programming might be – what is the purpose of this exercise? Why are we doing it?

That might seem silly, because surely we do it to understand programming better, right? It’s an act of interpretation aimed at answering the question “what is programming?”.

But that’s not the only possible explanation. Another way to look at it is that we are reading – interpreting – programming to be something ourselves. Because programming is something that we, programmers, do. Whatever programming is determines who or what we are. Programming makes the programmer. Whatever metaphor we apply acts as a mirror, and we want to like what we see in the mirror.

For instance, viewing programming as art makes the programmer an artist. This is clearly a desirable identity for some, with Paul Graham being the most prominent example. Unfortunately, borrowed clothes often fit poorly, and borrowed identities are often hollow. You risk that someone will call your bluff. Programmers are first and foremost programmers. And why shouldn’t that be enough? I am curious as to where this impulse to borrow legitimacy and status from other identities comes from. Why aren’t we content to be mere programmers? Why do we feel the need to borrow feathers from others?

To summarize then: “is programming art or science?” is a stale meme that generates little insight. It is as much about crafting an interesting identity for ourselves as it is about programming as such. If we truly want to understand the nature of programming better or our own identities as programmers better, we should focus on more specific, more direct, more honest questions. There is no shortage of such questions if we are prepared to make a little effort. But the more to the point the questions are, the less room there is for bluffing.


Programmer vs developer

I read this tweet today:

I’ve seen variations of that statement many times over the years. It’s a statement that resonates with many people. It’s easy to see why. After all, we’re not mere programmers. We do lots of things.

And yet I find myself going in the opposite direction. I am increasingly referring to myself as a programmer these days. You might say I’ve come full circle.

Many years ago, I started out thinking of myself as a programmer. I was naive, I didn’t know what software development was really about. Then, quite quickly, I started thinking of myself as a software developer instead. But now I think of myself as a programmer again. This despite doing a lot more “non-programmer” tasks now than I used to. I also think code is less important than I used to. In fact, I’m more of a software developer and less of a programmer than ever. And yet I’ve started going back to the designation of programmer. How come?

How can I explain these two switches, first from programmer to developer, then back to programmer again?

The switches correspond to exactly two realizations.

The first realization, maybe twenty years ago, was that programming isn’t everything. To be effective in my work, I soon figured that I had to do more – much more! – than just write code. And so I called myself a software developer in reflection of that.

I realized that there were many things I needed to master that were just as important as programming. For one thing, I needed to learn to collaborate effectively with other people (how to negotiate, explain opinions, share ideas, handle being challenged, compromise, how to speak up and how to shut up). I also needed to learn that figuring out what the problem should be is just as important as chasing a solution as quickly as possible, that there are many problems for which there are no technical solutions, and a slew of other things. I even realized that code quality is largely determined by contextual forces, not coding chops. I realized that agility is primarily a property of organizations. To deal with all of this, I started looking into domain-driven design, thinking about language, categorization, abstraction, systems thinking. I grew aware of socio-technical factors.

Along that path, I’ve eventually come to a second realization. That realization, made only recently, is that programming isn’t everything.

Ha ha! Gotcha! It’s the exact same realization! But my interpretation is different now.

Here’s how I think about it these days: If we really mean that programming isn’t everything, if software development is a multi-disciplinary, cross-team effort, then how come only programmers are referred to as software developers? That doesn’t sound right at all. Why aren’t the user experience experts software developers? The testers? The graphic designers? The product owners? They all develop software, don’t they?

Hence, the reason I’ve started referring to myself as a programmer again is not that I’m not a software developer. I am. It’s that I’m not the only software developer. By referring to myself as a software developer, I feel like I’m perpetuating the view that as a programmer, I am somehow special in this effort. Like I’m the one doing the real work. I’d like to get away from that. It’s not true.

The one activity that I do do that people with other specialities don’t do, is program. I am a programmer. Some of the other software developers I work with are not.


Conway’s mob

Is there anything interesting happening at the intersection between Conway’s law and mob programming? Yes! What? Read on!

As you may know, the term “Conway’s law” comes from a 1968 paper by Mel Conway called “How do committees invent?”. If you haven’t already, I really recommend taking the time to read the paper. It’s just four pages! Granted, it’s four pages in a really tiny, downright minuscule font, yes, but still just four pages. It’s also very readable, very approachable, very understandable. I bet you could work through it in half an hour with a good cup of coffee. And if you do that, you’ll get a much richer understanding of what Mel Conway is trying to say, the argument that he is making. If nothing else, you can use that understanding to call out people who write blog posts or do talks at conferences when they’re misrepresenting Conway. It’s a win-win.

So what did Mel Conway say in 1968? Here’s my take. The paper talks about the relationship between organizations and the systems they design. There are two things I’d like to highlight. First, in the introduction, Conway says that given some team organization – any team organization – there is going to be a class of design alternatives that cannot be effectively pursued by that organization because the necessary communication paths aren’t there. This tells us, of course, that communication is vital for design! But perhaps we already suspected as much, and if that were all there was to it, perhaps we wouldn’t be reading Conway’s paper today. But then he goes on to argue that the influence of communication patterns on design is much more direct than we might expect. Indeed, he says that organizations are constrained to produce system designs that are copies of the communication patterns in the organization! And this startling insight is what has become known as Conway’s law.

To reiterate: given some organization tasked with designing some system, then according to Conway’s law, there is a force at work to mimic the communication patterns of the organization in the communication patterns of the system. If this is true, it follows that if you care about the system design, you better care about the team organization as well! In fact, you should make sure that you organize your teams in such a way that you can efficiently pursue a satisfactory system design. The Team Topologies book discusses this at some length, including the perils of being ignorant of Conway’s law when organizing your teams.

Deliberately creating an organization that will produce the system architecture you want is sometimes called “The inverse Conway maneuver”. This always struck me as odd, because I can’t see anything inverse about it. To me, it’s just “having read Conway’s paper”! It’s still the same force acting in the same direction: deriving a system design from an organization structure. You’re just trying to be in control and use that force for good.

Anyway, I think we’re still only looking at half the picture. We can’t really step outside reality and coolly and objectively consider the ideal organization to produce the ideal system design. We are always entrenched in reality, which means we are part of some existing organization, and chances are there is also an existing system! This is the system we’re working on as developers or designers or architects or whatever. And of course, Conway’s law will have been in effect, so the communication patterns of the organization will have their counterparts in the system as well.

In this situation, there is an actual inverse force at work. The very existence of the system mandates that your organization communicates along certain paths – the communication paths of the system! So this is a force that tries to mimic the communication patterns of the system in the communication patterns of the organization. Not only is the organization shaping the system, the system is shaping the organization as well. This is a reinforcing loop.

Allan Kelly calls this “the homomorphic force”, where homomorphic means “having the same shape”. It is a bi-directional force ensuring that the organization structure and the system architecture stay in sync. If you try to change either one, communication will suffer. It is a very stable and change-resistant constellation. The inertia of the organization will prevent changes to the system, and the inertia of the system will prevent changes to the organization.

This is potentially very bad! Sometimes we want and need change! (XP said embrace change!) How can we make that happen? What kind of organizational juggernaut has the energy required to overcome “the homomorphic force”?

I’m glad you asked! Let’s switch gears and talk about mob programming.

Mob programming works roughly like this. You gather a group of people, ideally a cross-functional group of people, in a room with a big screen and a keyboard. (Complementary skills are a great asset in this setting, because it means that the mob can handle more kinds of challenges.) There is a single person sitting at the keyboard and typing, and the rest of the group is telling the typist what to do. At fairly short intervals (10-15 minutes, perhaps) the roles rotate. Everyone gets to be a typist at regular intervals.

I think there are some good reasons to be sort of strict about the mob programming routine. One thing is that it keeps everyone focused on the task at hand, because they know they will be typing soon. Also, if you have a mob consisting of people who haven’t worked together before, having a structured, role-based, semi-formal mode of working can help overcome some initial awkwardness as you gain trust and get to know each other. And finally it counters some potentially bad group dynamics, e.g. someone who really wants to hold on to the keyboard, or to decide what should be done, or whatever.

There are many potential make-ups of such mobs, that is, we can put them together in various ways for various purposes. We can distinguish, for instance, between team mobs (where the participants all belong to the same team) and inter-team mobs (where there are participants from multiple teams). A team mob is – quite naturally – appropriate for features that can be implemented by a single team. If you have teams that are closely aligned with the value streams in your organization, you may often find yourself in this situation. But in other cases, you may find that multiple teams need to be involved to implement a feature. There can be many reasons for this. For instance, if your organization makes a software product that is available on multiple platforms such as iOS, Android and desktop, you may have separate teams for each of those platforms. If you want to launch a new feature across all your different platforms, you’ll need to coordinate and communicate across team boundaries.

In such cases, communication really is the bottleneck. Communication is almost always the bottleneck in software development, but within a single team, communication bandwidth tends to be high (maxing out at a team size of one, of course) and so it’s less noticeable. In cross-team development efforts the communication challenges become more pronounced. It’s all too easy to end up in a situation where the different teams are doing a lot of handoffs, constantly waiting for each other, talking past each other, misunderstanding each other, not talking to each other when they really should be talking to each other and so on and so forth. It’s practically inevitable.

The problem with communication is that it rots. I mean that quite literally: we forget what we’ve been talking about. This is not much of a problem when you have constant communication, because you’re constantly reminded. But when the communication stops and starts at irregular intervals, it can be a disaster. We may talk together, but then we forget and our shared understanding rots, our agreements rot, our promises rot. The consequence is that it can be very difficult to get anything done, because we need a critical mass of sustained, shared understanding in order to be able to implement a feature in a meaningful way.

Since you’re a perceptive reader, you’ll no doubt have noticed that we are back to talking about communication! Just like with Conway’s law! If communication is the bottleneck and we notice that our communication patterns are inadequate and don’t allow us to implement a feature efficiently, we need to change our communication patterns!

One way of doing that is to form an inter-team mob to implement a feature across team boundaries. That is going to give us high-bandwidth, sustained communication focused on the feature. In a sense, the mob becomes a temporary feature team. Note that membership in the mob doesn’t need to be fixed, participants may vary over the lifetime of the mob. The important part is that the mob has the knowledge and resources it needs to implement the feature at all times. The mob is more important than the individual participants in the mob.

The great thing about such a mob is that it frees itself from having to work within the communication pattern structure dictated by “the homomorphic force”. It can work across both team and subsystem boundaries. That’s why I like to think of it as Conway’s mob.

I am not much of a sci-fi person, but I’ve watched just enough Star Trek to know that there is something there called “The Borg”. The Borg is a collective of cybernetic organisms with a single hive mind. The Borg acquire the technology and knowledge of other alien species through a process known as “assimilation”. And that’s how Conway’s mob can work as well. Whenever the mob encounters a team boundary, needs to communicate with someone new, needs to acquire new knowledge to solve a problem, it can simply invite people in to join the mob.

If you work like this, you may notice a remarkable effect. I have. We are so used to cross-team features running into friction and delays at team boundaries, I think, that it feels weird when it doesn’t happen. It’s almost as if something is wrong, or we’re cheating. You get the feeling that you’re practically unstoppable. It’s exhilarating. The mob has so much momentum. There is so much competence and knowledge in the mob room at all times that things never grind to a halt. There is so much decision-making power as well that decisions can be made directly, because everyone that needs to be involved in the decision is already present. You have the discussion right there in the mob room, make a decision and move on. There is no need to call meetings a week in advance, hope that everyone shows up, hopefully reach a conclusion before the meeting is over, and then forget what you agreed upon afterwards. It’s remarkably efficient and a lot of fun.

Too much? Too good to be true? Not buying it?

Obviously nothing is ever perfect. Mob programming is not magic. Crossing team boundaries is always difficult. People are different, not everyone shares the same perspective or the same experience. You are likely to encounter some issues, challenges and concerns if you try to launch a Conway’s mob to take on “the homomorphic force” and work across the dual boundaries of teams and subsystems. That’s to be expected.

One concern has to do with priorities. How do you split your time? If you’re a member of a cross-team mob, chances are you’re also a member of a long-term team that is aligned with existing subsystems. Those teams in turn have their own priorities – other features, refactorings, bug fixes, all kinds of stuff. How can we just put all of those things on hold while we work on the one feature that the mob is working on?

To my mind, this is really an alignment issue, which in turn means it’s a communication issue. (Yes, again. Communication – is there anything it can’t do?) Do we agree, across teams, that the feature the mob is working on takes precedence? That’s obviously a very important question to ask, but it’s not like it’s impossible to answer. For the product, there is a single answer to that question. It’s a great question to ask a product owner. If the answer is no, then the mob’s work will be much more difficult. Perhaps a cross-team mob shouldn’t be formed at all. But if the answer is yes? Problem solved, go ahead! Whatever the answer, the question should be asked only once. There is no need to revisit it. If the question keeps recurring, that’s yet another communication issue! Someone didn’t hear the answer, or didn’t like it.

A related concern has to do with productivity. In my experience, and contrary to what one might think, it’s primarily developers who are concerned with this. Not so much team leads or product owners. (Perhaps I’ve just been lucky with team leads and product owners, but there you go.) Some developers are likely to feel that the mob is inefficient or unnecessary. The feature could have been split according to team boundaries, each team would do their part, we could work in parallel – much more efficient. Of course I think they’re wrong because I believe very strongly that progress is not going to be bound by typing speed, but by communication bandwidth. But I think I understand where this viewpoint comes from. Ten, fifteen years of sprinting and standups and Jira tickets and burndown charts will do that to you. The productivity thing gets under your skin. We’ve been told over and over again that the ideal situation for a developer is to “get in the zone” and stay there uninterrupted, closing Jira tickets. But we risk suboptimization when we work like that. Productivity is fine, but it should always be focused on the product, not individual developers or individual teams.

And finally, some developers may feel that mob programming is fine, but that team mobs are much better than cross-team mobs. Cross-team mobs have too much overhead and friction and it’s just not worth it. And again, I understand where this sentiment comes from. It’s true, cross-team mobs involve much more overhead and friction than team mobs. But I think this view misses something important. It’s too narrow in scope. We’re taught that friction is always a bad thing, that it is waste. But friction can also be a sign of change. We’re encountering friction and overhead in cross-team mobs because we’re changing communication patterns. We need to spend time to establish trust and even develop a shared language. We need to be able to understand each other. This work never ends up on Jira tickets. But it can still be valuable and important.

We need to remember that the systems we are working on are sociotechnical systems, and that we are part of those systems. We are not just building the software, we are also building the organization that builds the software. When we are changing our communication patterns we are in fact refactoring a sociotechnical system. I think mob programming – in particular using cross-team mobs – can be a great vehicle for bringing about change in sociotechnical systems, since such mobs can free themselves from established patterns of communication.


Into the Tar Pit

I recently re-read the “Out of the Tar Pit” paper by Ben Moseley and Peter Marks for a Papers We Love session at work. It is a pretty famous paper. You can find it in the Papers We Love repository on GitHub, for the simple reason that lots of people love it. Reading the paper again triggered some thoughts, hence this blog post.

The title of the paper is taken from Alan Perlis (epigram #54):

Beware of the Turing tar-pit in which everything is possible but nothing of interest is easy.

A Turing tar-pit is a language or system that is Turing complete (and so can do as much as any language or system can do) yet is cumbersome and impractical to use. The Turing machine itself is a Turing tar-pit, because you probably wouldn’t use it at work to solve real problems. It might be amusing but not practical.

The implication of the title is that we are currently in a Turing tar-pit, and we need to take measures to get out of it. Specifically, the measures outlined in the paper.

The paper consists of two parts. The first part is an essay about the causes and effects of complexity in software. The second part is a proposed programming model to minimize so-called accidental complexity in software.

The argument of the paper goes like this: Complexity is the primary cause of problems in software development. Complexity is problematic because it hinders the understanding of software systems. This leads to all kinds of bad second-order effects, including unreliability, security issues, late delivery and poor performance – and, in a vicious circle, yet more complexity, making all the problems worse in a non-linear fashion as systems grow larger.

Following Fred Brooks in “No Silver Bullet“, the authors distinguish between essential and accidental complexity. The authors define “essential complexity” as the complexity inherent in the problem seen by the users. The “accidental complexity” is all the rest, including everything that has to do with the mundane, practical aspects of computing.

The authors identify state handling, control flow and code volume as drivers of complexity. Most of this complexity is labeled “accidental”, since it has to do with the physical reality of the machine, not with the user’s problem.

The proposed fix is to turn the user’s informal problem statement into a formal one: to derive an executable specification. Beyond that, we should only allow for the minimal addition of “accidental” complexity as needed for practical efficiency concerns.

The authors find our current programming models inadequate because they incur too much accidental complexity. Hence a new programming model is needed, one that incurs a minimum of accidental complexity. The second part of the paper presents such a model.

What struck me as I was reading the paper again was that it is wrong about the causes of complexity and naive about software development in general.

The paper is wrong for two reasons. First, because it treats software development as an implementation problem. It would be nice if that were true. It’s not. We will not get much better at software development if we keep thinking that it is. Second, because it ignores the dynamics of software development and makes invalid assumptions. Specifically, it is naive about the nature of the problems we address by making software.

I agree with the authors that complexity is a tremendous problem in software. The non-linear accumulation of complexity often threatens to make non-trivial software development efforts grind to a halt. Many software systems are not just riddled with technical debt – the term we often use for runaway complexity – they have practically gone bankrupt! However, the problem of complexity cannot be solved by means of a better programming model alone. We must trace the causes of complexity beyond the realm of the machine and into the real world. While better programming models would be nice, we can’t expect wonders from them. The reason is that the root cause of complexity is to be found in the strenuous relationship between the software system and the world in which it operates. This is outside the realm of programming models.

According to the authors, the role of a software development team is “to produce (using some given language and infrastructure) and maintain a software system which serves the purposes of its users”. In other words, the role is to implement the software. The role of the user, on the other hand, is to act as an oracle with respect to the problem that needs to be solved. The authors note in parentheses that they are assuming “that the users do in fact know and understand the problem that they want solved”. Yet it is well-known that this assumption doesn’t hold! Ask anyone in software! Already we’re in trouble. How can we create an executable specification without a source for this knowledge and understanding?

The paper’s analysis of the causes of complexity begins like this:

In any non-trivial system there is some complexity inherent in the problem that needs to be solved.

So clearly the problem is important. But what is it? In fact, let’s pull the sentence “the problem that needs to be solved” apart a bit, by asking some questions.

Where did the problem come from? Who defined the problem? How is the problem articulated and communicated, by whom, to whom? Is there agreement on what the problem is? How many interpretations and formulations of the problem are there? Why this problem and not some other problem? Who are affected by this problem? Who has an interest in it? Who owns it? Why does the problem matter? Who determined that it was a problem worth solving? Why does it need to be solved? How badly does it need to be solved? Is time relevant? Has it always been a problem? How long has this problem or similar problems existed? Could it cease to be a problem? What happens if it isn’t solved? Is a partial solution viable? How does the problem relate to other problems, or to solutions to other problems? How often does the problem change? What does it mean for the problem to change? Will it still need solving? What forces in the real world could potentially lead to changes? How radical can we expect such changes to be?

We quickly see that the hard part isn’t the solution – the hard part is the problem itself! How can we even begin to entertain notions about how best to develop a “solution” to our “problem” without answers to at least some of these questions? The curse of software development is that we can never fully answer all of these questions, yet they are crucial to our enterprise! If we are to look for root causes of complexity in software, we must start by addressing questions such as these.

When we treat the problem definition as somehow outside the scope of the software development effort, we set ourselves up for nasty surprises – and rampant complexity. As Gerald Weinberg put it in “Are Your Lights On?”: “The computer field is a mother lode of problem definition lessons.” Indeed, any ambiguity, misunderstandings, conflicts or conflict avoidance with respect to what the problem is will naturally come back to haunt us in the form of complexity when we try to implement a solution.

Consider an example of actual software development by an actual organization in an actual domain: the TV streaming service offered by NRK, the national public broadcaster in Norway. It’s where I work. What is the problem? It depends on who you ask. Who should you ask? I happen to be nearby. If you ask me, one of many developers working on the service, I might say something like “to provide a popular, high-quality, diverse TV streaming service for the Norwegian public”. It is immediately clear that providing such a service is not a purely technical problem: we need great content, great presentation, great usability, a great delivery platform, among many other things. Creating useful, non-trivial software systems is a multi-disciplinary effort.

It is also clear that such a high-level problem statement must be interpreted and detailed in a million ways in order to be actionable. All the questions above start pouring in. Who provides the necessary interpretation and deliberation? Who owns this problem? Is it our CEO? The product owner for the TV streaming service? The user experience experts? Me? The public? The answer is all and none of us!

But it gets worse, or more interesting, depending on your perspective. The world is dynamic. It changes all the time, whether we like it or not. Hence “the problem” changes as well. It is not something that we can exercise full control over. We don’t exist in a vacuum. We are heavily influenced by changes to the media consumption habits of the public, for instance. The actions of the international media giants influence our actions as well, as do the actions of the large social media platforms. Everything changes, sometimes in surprising ways from surprising angles.

With this backdrop, how do we address “the problem”? What is the best future direction for our service? What would make it more popular, higher quality, more diverse, better for the public? Opinions vary! Is it ML-driven customization and personalization? Is it more social features? Is it radical new immersive and interactive experiences that challenge what TV content is and how it is consumed? We don’t know. No-one knows.

It is naive to think that there is such a thing as “the user”. If there were such a thing as “the user”, it is naive to think that they could provide us with “the problem”. If they could provide us with “the problem”, it is naive to think that it would stay stable over time. If “the problem” did stay stable over time, it is naive to think that everyone would understand it the same way. And so on and so forth.

We cannot expect “the user” to provide us with a problem description, at least not one that we could use to implement an executable specification. The problem of defining the problem unfolds over time in a concrete yet shifting context, in a complex system of human actors. There is nothing inessential about this, it is ingrained in everything we do. We can’t escape from it. Labeling it accidental won’t make it go away.

Instead of ignoring it or dreaming about an “ideal world” where all of these aspects of software development can be ignored, we should accept it. Not only accept it, in fact, but see it as our job to handle. Software developers should provide expertise not just in programming or in running software in production, but also in the externalization of mental models to facilitate communication and enable collaborative modelling. Software development is largely a communication problem. We should take active part in defining, delineating, describing and exploring the problem domain itself, which goes beyond the software system. We should contribute to better concepts and a richer language to describe the domain. It will help us uncover new and better problem descriptions, which will lead to new and better software systems. This exploration is a never-ending process of discovery, negotiation and reevaluation. We should lead in this effort, not wait for someone else to do it for us.

When we pretend that there is such a thing as “the essential problem” that the user can hand over to “the development team” for implementation, we are being naive Platonists. We’re acting as if “the problem” is something stable and eternal, an a priori, celestial entity that we can uncover. But that is not the reality of most problem domains. It may be possible to identify such problems for purely abstract, mathematical structures – structures that need no grounding in the fleeting world that we inhabit. But most software systems don’t deal with such structures.

Instead, most programs or software systems deal with informal, ambiguous, self-contradictory, fluctuating, unstable problems in a shifting, dynamic world. “The problem that needs solving” is always in a state of negotiation and partial understanding. Assumptions and presumed invariants are rendered invalid by a reality that has no particular regard for our attempts to describe it. Indeed, there can be no innovation without the invalidation of existing models! The problem of software development is not “how to implement a solution to a given problem without shooting yourself in the foot”. It is to formalize something that in its nature is informal and unformalizable. As Stephen Jay Gould puts it in “What, if anything, is a zebra?“, “I do not believe that nature frustrates us by design, but I rejoice in her intransigence nonetheless.”

As software developers, we can’t turn a blind eye to this state of affairs. It is an intrinsic and hence essential problem in software development, and one that we must tackle head-on. In “The World and the Machine“, Michael A. Jackson refers to what he calls the “Von Neumann principle”:

There is no point in using exact methods where there is no clarity in the concepts and issues to which they are to be applied.

This means that we must gain a deep understanding and a rich language to describe the problem domain itself, not just the software system we want to operate in that problem domain.

The challenge is to fight an impossible battle successfully. We must constantly try to pin down a problem to a sufficient degree to be able to construct a useful machine that helps solve the problem, as we currently understand it. We must accept that this solution is temporary, since the problem will change. And then we must try to keep a dance going between a fundamentally unstable problem and a machine that longs for stability, without toppling over.

We can’t hope to be successful in this endeavor if we ignore the nature of this process. An account of complexity in software that doesn’t account for the continuous tension between a necessarily formal system and an irreducibly informal world is missing something essential about software development. That’s why “Out of the Tar Pit” is wrong.

I think we need to accept and embrace the tar-pit. At least then we’re grappling with the real causes of complexity. The real world is a hot and sticky place. This is where our software systems and we, as software developers, must operate. Nothing of interest is ever going to be easy. But perhaps we can take heart that everything is still possible!


Proper JSON and property bags

I recently wrote a blog post where I argued that “JSON serialization” as commonly practiced in the software industry is much too ambitious. This is the case at least in the .NET and Java ecosystems. I can’t really speak to the state of affairs in other ecosystems, although I note that the amount of human folly appears to be a universal constant much like gravity.

The problem is that so-called “JSON serializers” handle not just serialization and deserialization of JSON, they tend to support arbitrary mapping between the JSON model and some other data model as well. This additional mapping, I argued, causes much unnecessary complexity and pain. Whereas serialization and deserialization of JSON is a “closed” problem with bounded complexity, arbitrary mapping between data models is an “open” problem with unbounded complexity. Hence JSON serializers should focus on the former and let us handle the latter by hand.

I should add that there is nothing that forces developers to use the general data model mapping capabilities of JSON serializers of course. We’re free to use them in much more modest ways. And we should.

That’s all well and good. But what should we do in practice? There are many options open to us. In this blog post I’d like to explore a few. Perhaps we’ll learn something along the way.

Before we proceed though, we should distinguish cleanly between serialization and deserialization. When a software module uses JSON documents for persistence, it may very well do both. In many cases, however, a module will do one or the other. A producer of JSON documents only does serialization, a consumer only does deserialization. In general, the producer and consumer are separate software modules, perhaps written in different languages by different teams.

It looks like this:

Source and target models for JSON serialization and deserialization

There doesn’t have to be a bi-directional mapping between a single data model and JSON text. There could very well be two independent unidirectional mappings, one from a source model to JSON (serialization) and the other from JSON to a target model (deserialization). The source model and the target model don’t have to be the same. Why should they be? Creating a model that is a suitable source for serialization is a different problem from creating a model that is a suitable target for deserialization. We are interested in suitable models for the task at hand. What I would like to explore, then, are some alternatives with respect to what the JSON serializer actually should consume during serialization (the source model) and produce during deserialization (the target model).

In my previous blog post I said that the best approach was to use “an explicit representation of the JSON data model” to act as an intermediate step. You might be concerned about the performance implications of the memory allocations involved in populating such a model. I am not, to be honest. Until I discover that those allocations are an unacceptable performance bottleneck in my application, I will optimize for legibility and changeability, not memory footprint.

But let’s take a closer look at what a suitable explicit representation could be. JSON objects are property bags. They have none of the ambitions of objects as envisioned in the bold notion of object-oriented programming put forward by Alan Kay. There is no encapsulation and definitely no message passing involved. They’re not alive. You can’t interact with them. They’re just data. That may be a flaw or a virtue depending on your perspective, but that’s the way it is. JSON objects are very simple things. A JSON object has keys that point to JSON values, which may be null, true, false, a number, a string, an array of JSON values, or another object. That’s it. So the question becomes: what is an appropriate representation for those property bags?

JSON serializers typically come with their own representation of the JSON data model. To the extent that it is public, this representation is an obvious possibility. But what about others?

I mentioned in my previous blog post that the amount of pain related to ambitious “JSON serializers” is proportional to the conceptual distance involved in the mapping, and also to the rate of change of that mapping. In other words, if the model you’re mapping to or from is significantly different from the JSON model, there will be pain. If the model changes often, there will be no end to the pain. It’s a bad combination. Conversely, if you have a model that is very close to the JSON model and that hardly ever changes, the amount of pain will be limited. We are still technically in the land of unbounded complexity, but if we are disciplined and stay close to the border, it might not be so bad? A question might be: how close to the JSON model must we stay to stay out of trouble? Another might be: what would be good arguments to deviate from the JSON model?

When exploring these alternatives, I’ll strive to minimize the need for JSON serializer configuration. Ideally there should be no configuration at all; it should just work as expected out of the box. In my previous blog post I said that the black box must be kept closed lest the daemon break free and wreak havoc. Once we start configuring, we need to know the internal workings of the JSON serializer, and the fight to control the daemon will never stop. Now your whole team needs to be experts in JSON serializer daemon control. Let’s make every effort to stay out of trouble and minimize the need for configuration. In other words, a need for configuration counts heavily against a candidate model in the evaluation.

Example: shopping-cart.json

To investigate our options, I’m going to need an example. I’m going to adapt the shopping cart example from Scott Wlaschin’s excellent book on domain modelling with F#.

A shopping cart can be in one of three states: empty, active or paid. How would we represent something like that as JSON documents?

First, an empty shopping cart.


{
"_state": "empty"
}


Second, an active shopping cart with two items in it, a gizmo and a widget. You’ll notice that items may have an optional description that we include when it’s present.


{
  "_state": "active",
  "unpaidItems": [
    {
      "id": "1bcd",
      "title": "gizmo"
    },
    {
      "id": "3cdf",
      "title": "widget",
      "description": "A very useful item"
    }
  ]
}

And finally, a paid shopping cart with two items in it, the amount and currency paid, and a timestamp for the transaction.


{
  "_state": "paid",
  "paidItems": [
    {
      "id": "1bcd",
      "title": "gizmo"
    },
    {
      "id": "3cdf",
      "title": "widget",
      "description": "A very useful item"
    }
  ],
  "payment": {
    "amount": 123.5,
    "currency": "USD"
  },
  "timestamp": "2020-04-11T10:11:33.514+02:00"
}


You’ll notice that I’ve added a _state property to make it easier for a client to check which case they’re dealing with. This is known as a discriminator in OpenAPI, and can be used with the oneOf construct to create a composite schema for a JSON document.

So what are our options for explicit representations of these JSON documents in code?

We’ll take a look at the following:

  • Explicit JSON model (Newtonsoft)
  • Explicit DTO model
  • Anonymous DTO model
  • Dictionary

Explicit JSON model

Let’s start by using an explicit JSON model. An obvious possibility is to use the JSON model from whatever JSON serializer library we happen to be using. In this case, we’ll use the model offered by Newtonsoft.

We’ll look at serialization first. Here’s how we might create a paid cart as a JObject and use it to serialize to the appropriate JSON.


var paidCartObject = new JObject(
    new JProperty("_state", new JValue("paid")),
    new JProperty("paidItems",
        new JArray(
            new JObject(
                new JProperty("id", new JValue("1bcd")),
                new JProperty("title", new JValue("gizmo"))),
            new JObject(
                new JProperty("id", new JValue("3cdf")),
                new JProperty("title", new JValue("widget")),
                new JProperty("description", new JValue("A very useful item"))))),
    new JProperty("payment",
        new JObject(
            new JProperty("amount", new JValue(123.5)),
            new JProperty("currency", new JValue("USD")))),
    new JProperty("timestamp", new JValue("2020-04-11T10:11:33.514+02:00")));
var paidCartJsonText = paidCartObject.ToString(); // the JObject serializes itself to JSON text

There’s no denying it: it is a bit verbose. At the same time, it’s very clear what we’re creating. We are making no assumptions that could be invalidated by future changes. We have full control over the JSON since we are constructing it by hand. We have no problems with optional properties.

What about deserialization?


var paidCartJsonString = @"{
    ""_state"": ""paid"",
    ""paidItems"": [
        {
            ""id"": ""1bcd"",
            ""title"": ""gizmo""
        },
        {
            ""id"": ""3cdf"",
            ""title"": ""widget"",
            ""description"": ""A very useful item""
        }
    ],
    ""payment"": {
        ""amount"": 123.5,
        ""currency"": ""USD""
    },
    ""timestamp"": ""2020-04-11T10:11:33.514+02:00""
}";
var paidCartDeserialized = JsonConvert.DeserializeObject<JObject>(paidCartJsonString);
var firstItemTitleToken = paidCartDeserialized["paidItems"][0]["title"];
var firstItemTitle = ((JValue) firstItemTitleToken).Value;
var paymentCurrencyToken = paidCartDeserialized["payment"]["currency"];
var paymentCurrency = ((JValue) paymentCurrencyToken).Value;

The deserialization itself is trivial, a one-liner. More importantly: there is no configuration involved, which is great news. Deserialization is often a one-liner, but you have to set up and configure the JSON serializer “just so” to get the output you want. Not so in this case. There are no hidden mechanisms and hence no surprises.

We can read data from the deserialized JObject by using indexers, which reads pretty nicely. Unfortunately the last step is a little bit cumbersome, since we need to cast the JToken to a JValue before we can actually get to the value itself. Also, we obviously have to make sure that we get the property names right.
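
If the cast feels noisy, Newtonsoft also has the Value<T>() extension method, which converts the token for us. Continuing from the paidCartDeserialized object above, reading the same values might look like this:

var firstItemTitle = paidCartDeserialized["paidItems"][0]["title"].Value<string>();
var paymentCurrency = paidCartDeserialized["payment"]["currency"].Value<string>();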

A drawback of using Newtonsoft’s JSON model is, of course, that we get locked in to Newtonsoft. If we decide we want to try a hot new JSON serializer for whatever reason, we have to rewrite a bunch of pretty boring code. An alternative would be to create our own simple data model for JSON. But that approach has its issues too. Not only would we have to implement that data model, but we would probably have to teach our JSON serializer how to use it as a serialization source or deserialization target as well. A lot of work for questionable gain.
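
Just to make the idea concrete, here is a minimal sketch (hypothetical, and without the serializer integration that would be the real work) of what such a home-grown JSON data model could look like:

// A bare-bones, serializer-agnostic representation of the JSON data model.
public abstract class JsonValue { }
public sealed class JsonNull : JsonValue { }
public sealed class JsonBool : JsonValue { public bool Value { get; set; } }
public sealed class JsonNumber : JsonValue { public double Value { get; set; } }
public sealed class JsonString : JsonValue { public string Value { get; set; } }
public sealed class JsonArray : JsonValue { public List<JsonValue> Items { get; } = new List<JsonValue>(); }
public sealed class JsonObject : JsonValue { public Dictionary<string, JsonValue> Properties { get; } = new Dictionary<string, JsonValue>(); }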

Explicit DTO model

Many readers of my previous blog post said they mitigated the pain of JSON serialization by using dedicated data transfer objects or DTOs as intermediaries between their domain model and any associated JSON documents. The implied cure for the pain, of course, is that the DTOs are much nearer to the JSON representation than the domain model is. The DTOs don’t have to concern themselves with things such as data integrity and business rules. The domain model will handle all those things. The domain model in turn doesn’t need to know that such a thing as JSON even exists. This gives us a separation of concerns, which is great.

However, the picture is actually a little bit more complex.

JSON serialization and deserialization with DTOs.

To keep the drawing simple, I’m pretending that there is a single DTO model and a bi-directional mapping between the DTO and the JSON. That doesn’t have to be the case. There might well be just a unidirectional mapping.

Even with a DTO, we have ventured into the land of unbounded complexity, on the tacit promise that we won’t go very far. The pain associated with JSON serialization will be proportional to the distance we travel. So let’s agree to stay within an inch of actual JSON. In fact, let’s just treat our explicit DTO model as named, static property bags.

To minimize pain, we’ll embrace some pretty tough restrictions on our DTOs. We’ll only allow properties of the following types: booleans, numbers (integers and doubles), strings, arrays, lists and objects that are themselves also DTOs. That might seem like a draconian set of restrictions, but it really just follows from the guideline that JSON serialization and deserialization should work out of the box, without configuration.

You’ll probably notice that there are no types representing dates or times in that list. The reason is that there are no such types in JSON. Dates and times in JSON are just strings. Ambitious JSON serializers will take a shot at serializing and deserializing types like DateTime for you of course, but the exact behavior varies between serializers. You’d have to know what your JSON serializer of choice happens to do, and you’d have to configure your JSON serializer to override the default behavior if you didn’t like it. That, to me, is venturing too far from the JSON model. I’ve seen many examples of developers being burned by automatic conversion of dates and times by JSON serializers.
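
If our domain model wants an actual DateTimeOffset, nothing stops us from doing that conversion ourselves, explicitly, at the point where we map between the DTO and the domain model. A minimal sketch:

// Explicit, visible conversions at the mapping step, instead of hidden serializer magic.
var paidAt = DateTimeOffset.Parse("2020-04-11T10:11:33.514+02:00"); // the string from the JSON
var timestampText = paidAt.ToString("o");                           // back to an ISO 8601 string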

Even with those restrictions, we still have many choices to make. In fact, it’s going to be a little difficult to achieve the results we want without breaking the no-configuration goal.

First we’re going to have to make some decisions about property names. In C#, the convention is to use PascalCase for properties, whereas our JSON documents use camelCase. This is a bit of a “when in Rome” issue, with the complicating matter that there are two Romes. There are two possible resolutions to this problem.

One option is to combine an assumption with an admission. That is, we can 1) make the assumption that our JSON documents will only contain “benign” property names that don’t contain whitespace or control characters and 2) accept that our DTOs will have property names that violate the sensibilities of a C# style checker. That will yield the following set of DTOs:


public abstract class ShoppingCart
{
    public ShoppingCart(string state)
    {
        _state = state;
    }

    public string _state { get; }
}

public class EmptyCart : ShoppingCart
{
    public EmptyCart() : base("empty") {}
}

public class ActiveCart : ShoppingCart
{
    public ActiveCart() : base("active") { }

    public Item[] unpaidItems { get; set; }
}

public class PaidCart : ShoppingCart
{
    public PaidCart() : base("paid") {}

    public Item[] paidItems { get; set; }
    public Money payment { get; set; }
    public string timestamp { get; set; }
}

public class Item
{
    public string id { get; set; }
    public string title { get; set; }
    public string description { get; set; }
}

public class Money
{
    public double amount { get; set; }
    public string currency { get; set; }
}

Depending on your sensibilities, you may have run away screaming at this point. A benefit, however, is that it works reasonably well out of the box. The property names in the DTOs and in the JSON are identical, which makes sense since the DTOs are a representation of the same property bags we find in the JSON. In this scenario, coupling of names is actually a good thing.

Another option is to add custom attributes to the properties of our DTOs. Custom attributes are a mechanism that some JSON serializers employ to let us create an explicit mapping between property names in our data model and property names in the JSON document. This clearly is a violation of the no-configuration rule, though. Do it at your own peril.


abstract class ShoppingCart
{
    public ShoppingCart(string state)
    {
        State = state;
    }

    [JsonProperty("_state")]
    public string State { get; }
}

class EmptyCart : ShoppingCart
{
    public EmptyCart() : base("empty") {}
}

class ActiveCart : ShoppingCart
{
    public ActiveCart() : base("active") { }

    [JsonProperty("unpaidItems")]
    public Item[] UnpaidItems { get; set; }
}

class PaidCart : ShoppingCart
{
    public PaidCart() : base("paid") {}

    [JsonProperty("paidItems")]
    public Item[] PaidItems { get; set; }

    [JsonProperty("payment")]
    public Money Payment { get; set; }

    [JsonProperty("timestamp")]
    public string Timestamp { get; set; }
}

class Item
{
    [JsonProperty("id")]
    public string Id { get; set; }

    [JsonProperty("title")]
    public string Title { get; set; }

    [JsonProperty("description", NullValueHandling = NullValueHandling.Ignore)]
    public string Description { get; set; }
}

class Money
{
    [JsonProperty("amount")]
    public double Amount { get; set; }

    [JsonProperty("currency")]
    public string Currency { get; set; }
}

This yields perhaps more conventional-looking DTOs. They are, however, now littered with custom attributes specific to the JSON serializer I’m using. There’s really a lot of configuration going on: every property is being reconfigured to use a different name.

We also have the slightly strange situation where the property names for the DTOs don’t really matter. It is a decoupling of sorts, but it doesn’t really do much work for us, seeing as the whole purpose of the DTO is to represent the data being transferred.

But ok. Let’s look at how our DTOs hold up as source models for serialization and target models for deserialization, respectively.

Here’s how you would create an instance of DTO v1 and serialize it to JSON.


var paidCartDto1 = new PaidCart
{
    paidItems = new Item[] {
        new Item {
            id = "1bcd",
            title = "gizmo"
        },
        new Item {
            id = "3cdf",
            title = "widget",
            description = "A very useful item"
        }
    },
    payment = new Money {
        amount = 123.5,
        currency = "USD"
    },
    timestamp = "2020-04-11T10:11:33.514+02:00"
};
var paidCartDto1JsonText = JsonConvert.SerializeObject(paidCartDto1);

It’s pretty succinct and legible, and arguably looks quite similar to the JSON text it serializes to. However, there is a small caveat: our optional description is included with a null value in the JSON. That’s not really what we aimed for. To change that behaviour, we can configure our JSON serializer to omit properties with null values from the serialized output. But now we have two problems. The first is that we had to resort to configuration, the second is that we’ve placed a bet: that all properties with null values should always be omitted from the output. That’s the case today, but it could definitely change. To gain more fine-grained control, we’d have to dig out more granular and intrusive configuration options, like custom attributes or custom serializers. Or perhaps some combination? That’s even worse: now our configuration is spread over multiple locations – who knows what the aggregated behavior is and why?
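
For reference, with Newtonsoft that global configuration is something like this:

// Omit all null-valued properties from the output - for every property, on every type.
var settings = new JsonSerializerSettings { NullValueHandling = NullValueHandling.Ignore };
var paidCartDto1JsonText = JsonConvert.SerializeObject(paidCartDto1, settings);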

What about DTO v2? The code looks very similar, except it follows C# property naming standards and at the same time deviates a little bit from the property names that we actually find in the JSON document. We’d have to look at the definition of the PaidCart to convince ourselves that it probably will serialize to the appropriate JSON text, since we find the JSON property names there – not at the place we’re creating our DTO.


var paidCartDto2 = new PaidCart
{
    PaidItems = new Item[] {
        new Item {
            Id = "1bcd",
            Title = "gizmo"
        },
        new Item {
            Id = "3cdf",
            Title = "widget",
            Description = "A very useful item"
        }
    },
    Payment = new Money {
        Amount = 123.5,
        Currency = "USD"
    },
    Timestamp = "2020-04-11T10:11:33.514+02:00"
};
var paidCartDto2JsonText = JsonConvert.SerializeObject(paidCartDto2);

A benefit is that, since we had already littered the DTO with custom attributes, I could simply add a NullValueHandling.Ignore to the Description property, so that the property is not included in the JSON if the value is null. Of course I had to Google how to do it, since I can’t ever remember all the configuration options and how they fit together.

So that’s serialization. We can get it working, but it’s obvious that the loss of control compared to using the explicit JSON model is pushing us towards making assumptions and having to rely on configuration to tweak the JSON output. We’ve started pushing buttons and levers. The daemon is banging against the walls of the black box.

What about deserialization? Here’s how it looks for a paid cart using DTO v2:


var paidCartJsonString = @"{
    ""_state"": ""paid"",
    ""paidItems"": [
        {
            ""id"": ""1bcd"",
            ""title"": ""gizmo""
        },
        {
            ""id"": ""3cdf"",
            ""title"": ""widget"",
            ""description"": ""A very useful item""
        }
    ],
    ""payment"": {
        ""amount"": 123.5,
        ""currency"": ""USD""
    },
    ""timestamp"": ""2020-04-11T10:11:33.514+02:00""
}";
var paidCartDtoFromText = JsonConvert.DeserializeObject<PaidCart>(paidCartJsonString);
var firstItemTitle = paidCartDtoFromText.PaidItems[0].Title;
var currency = paidCartDtoFromText.Payment.Currency;

Well, what can I say. It’s quite easy if we know in advance whether we’re dealing with an empty cart, an active cart or a paid cart! And it’s very easy to access the various property values.

But of course we generally don’t know what kind of shopping cart the JSON document describes. That information is in the JSON document!

What we would like to write in our code is something like this:


var shoppingCartDtoFromText = JsonConvert.DeserializeObject<ShoppingCart>(jsonText);

But the poor JSON serializer can’t do that, not without help! The problem is that the JSON serializer doesn’t know which subclass of ShoppingCart to instantiate. In fact, it doesn’t even know that the subclasses exist.

We have three choices at this point. First, we can create a third variation of our DTO, one that doesn’t have this problem. We could just collapse our fancy class hierarchy and use something like this:


class ShoppingCart
{
    [JsonProperty("_state")]
    public string State { get; set; }

    [JsonProperty("unpaidItems", NullValueHandling = NullValueHandling.Ignore)]
    public Item[] UnpaidItems { get; set; }

    [JsonProperty("paidItems", NullValueHandling = NullValueHandling.Ignore)]
    public Item[] PaidItems { get; set; }

    [JsonProperty("payment", NullValueHandling = NullValueHandling.Ignore)]
    public Money Payment { get; set; }

    [JsonProperty("timestamp", NullValueHandling = NullValueHandling.Ignore)]
    public string Timestamp { get; set; }
}

It’s not ideal, to put it mildly. I think we can probably agree that this is not a good DTO, as it completely muddles together what was clearly three distinct kinds of JSON documents. We’ve lost that now, in an effort to make the JSON deserialization process easier.

The second option is to pull out the big guns and write a custom deserializer. That way we can sneak a peek at the _state property in the JSON document, and based on that create the appropriate object instance. Managing that requires, needless to say, a fair bit of knowledge of the workings of our JSON serializer. Chances are your custom deserializer will be buggy. If the JSON document format and hence the DTOs are subject to change (as typically happens), chances are it will stay buggy over time.
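
To give a flavour of what such a custom deserializer could look like with Newtonsoft, here is a naive sketch of my own (quite possibly buggy, and assuming the DTO v2 class hierarchy from earlier): it loads the JSON into a JObject, peeks at the _state property, and hands the rest over to the serializer for the matching subclass.

class ShoppingCartConverter : JsonConverter
{
    public override bool CanWrite => false;

    // Only claim the abstract base type. The subclasses are handled by the serializer
    // itself, which also keeps us from recursing back into this converter.
    public override bool CanConvert(Type objectType) => objectType == typeof(ShoppingCart);

    public override object ReadJson(JsonReader reader, Type objectType, object existingValue, JsonSerializer serializer)
    {
        var obj = JObject.Load(reader);
        var state = (string) obj["_state"];
        switch (state)
        {
            case "empty": return obj.ToObject<EmptyCart>(serializer);
            case "active": return obj.ToObject<ActiveCart>(serializer);
            case "paid": return obj.ToObject<PaidCart>(serializer);
            default: throw new JsonSerializationException("Unknown _state: " + state);
        }
    }

    public override void WriteJson(JsonWriter writer, object value, JsonSerializer serializer)
    {
        throw new NotImplementedException();
    }
}

With something like that in place, JsonConvert.DeserializeObject<ShoppingCart>(jsonText, new ShoppingCartConverter()) would pick the right subclass. But now we are knee-deep in the workings of the serializer, which is exactly the point.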

The third option is to protest against the design of the JSON documents! That would mean that we’re letting our problems with deserialization dictate our communication with another software module and potentially a different team of developers. It is not the best of reasons for choosing a design, I think. After all, there are alternative target models for deserialization that don’t have these problems. Why can’t we use one of them? But we might still be able to pull it off, if we really want to. It depends on our relationship with the supplier of the JSON document. It is now a socio-technical issue that involves politics and power dynamics between organizations (or different parts of the same organization): do we have enough leverage with the supplier of the JSON document to make them change their design to facilitate deserialization at our end? Do we want to exercise that leverage? What are the consequences?

It’s worth noting that these problems only apply to DTOs as target models for deserialization. As source models for serialization, we can use our previous two variations, with the caveats mentioned earlier.

To conclude then, explicit DTOs are relatively straightforward as source models for serialization, potentially less so as target models for deserialization. A general drawback of using explicit DTOs is that we must write, maintain and configure a bunch of classes. That should be offset by some real, tangible advantage. Is it?

Anonymous classes

We can avoid the chore of having to write and maintain such classes by using anonymous classes in C# as DTOs instead. It might not be as silly as it sounds, at least for simple use cases.

For the serialization case, it would look something like this:


var paidCartAnon = new
{
    _state = "paid",
    paidItems = new object[] {
        new {
            id = "1bcd",
            title = "gizmo"
        },
        new {
            id = "3cdf",
            title = "widget",
            description = "A very useful item"
        }
    },
    payment = new {
        amount = 123.5,
        currency = "USD"
    },
    timestamp = "2020-04-11T10:11:33.514+02:00"
};
var paidCartAnonJsonText = JsonConvert.SerializeObject(paidCartAnon);

This is actually very clean! The code looks really similar to the target JSON output. You may notice that the paidItems array is typed as an array of object. This is to allow for the optional description of items. The two items are actually instances of distinct anonymous classes generated by the compiler. One is a DTO with two properties, the other a DTO with three properties. For the compiler, the two DTOs have no more in common than the fact that they are both objects.

As long as we’re fine with betting that the property names of the target JSON output will never contain whitespace or control characters, this isn’t actually a bad choice. No configuration is necessary to handle the optional field appropriately.

A shortcoming compared to explicit DTOs is ease of composition and reuse across multiple DTOs. That’s not an issue in the simple shopping cart example, but it is likely that you will encounter it in a real-world scenario. Presumably you will have smaller DTOs that are building blocks for multiple larger DTOs. That might be more cumbersome to do using anonymous DTOs.

What about deserialization? Surely it doesn’t make sense to use an anonymous type as target model for deserialization? Newtonsoft thinks otherwise! Ambitious JSON serializers indeed!


var paidCartJsonString = @"{
    ""_state"": ""paid"",
    ""paidItems"": [
        {
            ""id"": ""1bcd"",
            ""title"": ""gizmo""
        },
        {
            ""id"": ""3cdf"",
            ""title"": ""widget"",
            ""description"": ""A very useful item""
        }
    ],
    ""payment"": {
        ""amount"": 123.5,
        ""currency"": ""USD""
    },
    ""timestamp"": ""2020-04-11T10:11:33.514+02:00""
}";
var anonymousPaidCartObject = JsonConvert.DeserializeAnonymousType(paidCartJsonString,
    new
    {
        _state = default(string),
        paidItems = new[] {
            new {
                id = default(string),
                title = default(string),
                description = default(string)
            }
        },
        payment = new
        {
            amount = default(double),
            currency = default(string)
        },
        timestamp = default(string)
    });
var firstItemTitle = anonymousPaidCartObject.paidItems[0].title;
var currency = anonymousPaidCartObject.payment.currency;

This actually works, but it’s a terrible idea, I hope you’ll agree. Creating a throw-away instance of an anonymous type in order to be able to reflect over the type definition is not how you declare types. It’s convoluted and confusing.

So while it is technically possible to use anonymous DTOs as target models for deserialization, you really shouldn’t. As source models for serialization, however, anonymous DTOs are not too bad. In fact, they have some advantages over explicit DTOs in that you don’t have to write and maintain them yourself.

Dictionary

Finally, we come to the venerable old dictionary! With respect to representing a property bag, it really is an obvious choice, isn’t it? A property bag is literally what a dictionary is. In particular, it should be a dictionary that uses strings for keys and objects for values.

Here is a dictionary used as serialization source:


var paidCartBag = new Dictionary<string, object> {
    {
        "_state", "paid"
    },
    {
        "paidItems",
        new List<object>() {
            new Dictionary<string, object> {
                { "id", "1bcd" },
                { "title", "gizmo" }
            },
            new Dictionary<string, object> {
                { "id", "3cdf" },
                { "title", "widget" },
                { "description", "A very useful item" }
            }
        }
    },
    {
        "payment",
        new Dictionary<string, object> {
            { "amount", 123.5 },
            { "currency", "USD" }
        }
    },
    {
        "timestamp", "2020-04-11T10:11:33.514+02:00"
    }
};

It is more verbose than the versions using explicit or anonymous DTOs above. I’m using the dictionary initializer syntax in C# to make it as compact as possible, but still.

It is very straightforward however. It makes no assumptions and places no bets against future changes to the JSON document format. Someone could decide to rename the paidItems property in the JSON to paid items and we wouldn’t break a sweat. The code change would be trivial. Moreover the effect of the code change would obviously be local – there would be no surprise changes to the serialization of other properties.

What about the deserialization target scenario, which caused so much trouble for our DTOs? We would like to be able to write something like this:


var paidCartJsonString = @"{
    ""_state"": ""paid"",
    ""paidItems"": [
        {
            ""id"": ""1bcd"",
            ""title"": ""gizmo""
        },
        {
            ""id"": ""3cdf"",
            ""title"": ""widget"",
            ""description"": ""A very useful item""
        }
    ],
    ""payment"": {
        ""amount"": 123.5,
        ""currency"": ""USD""
    },
    ""timestamp"": ""2020-04-11T10:11:33.514+02:00""
}";
var paidCartBagFromText = JsonConvert.DeserializeObject<Dictionary<string, object>>(paidCartJsonString);
var firstItemTitle = paidCartBagFromText["paidItems"][0]["title"];
var currency = paidCartBagFromText["payment"]["currency"];

Alas, it doesn’t work! The reason is that while we can easily tell the JSON serializer that we want the outermost object to be a dictionary, it doesn’t know that we want that rule to apply recursively. In general, the JSON serializer doesn’t know what to do with JSON objects and JSON arrays, so it must revert to defaults.

We’re back to custom deserializers, in fact. Deserialization really is much more iffy than serialization. The only good news is that deserialization of JSON into a nested structure of string-to-object dictionaries, object lists and primitive values is again a closed problem. It is not subject to change. We could do it once, and not have to revisit it again. Since our target model won’t change, our custom deserializer won’t have to change either. So while it’s painful, the pain is at least bounded.

Here is a naive attempt at an implementation, thrown together in maybe half an hour:


public class PropertyBagDeserializer : JsonConverter
{
    public override bool CanRead => true;

    public override bool CanWrite => false;

    public override bool CanConvert(Type objectType)
    {
        return true;
    }

    public override object ReadJson(JsonReader reader, Type objectType, object existingValue, JsonSerializer serializer)
    {
        return ReadValue(reader, serializer);
    }

    private static object ReadValue(JsonReader reader, JsonSerializer serializer)
    {
        if (reader.TokenType == JsonToken.StartObject)
        {
            return ReadObjectValue(reader, serializer);
        }
        else if (reader.TokenType == JsonToken.StartArray)
        {
            return ReadArrayValue(reader, serializer);
        }
        else
        {
            return ReadSimpleValue(reader, serializer);
        }
    }

    private static object ReadObjectValue(JsonReader reader, JsonSerializer serializer)
    {
        reader.Read();
        var dictionary = new Dictionary<string, object>();
        while (reader.TokenType != JsonToken.EndObject)
        {
            if (reader.TokenType == JsonToken.PropertyName)
            {
                var propertyName = (string) reader.Value;
                reader.Read();
                dictionary[propertyName] = ReadValue(reader, serializer);
            }
        }
        reader.Read();
        return dictionary;
    }

    private static object ReadArrayValue(JsonReader reader, JsonSerializer serializer)
    {
        reader.Read();
        var list = new List<object>();
        while (reader.TokenType != JsonToken.EndArray)
        {
            list.Add(ReadValue(reader, serializer));
        }
        reader.Read();
        return list;
    }

    private static object ReadSimpleValue(JsonReader reader, JsonSerializer serializer)
    {
        var val = serializer.Deserialize(reader);
        reader.Read();
        return val;
    }

    public override void WriteJson(JsonWriter writer, object value, JsonSerializer serializer)
    {
        throw new NotImplementedException();
    }
}

It probably has bugs. It didn’t crash on my one test input (the paid cart JSON we’ve seen multiple times in this blog post), and that’s all the verification I have done. Writing custom deserializers is a pain, and few developers have enough time available to become experts at it. I’m certainly no expert; I have to look it up and go through a slow discovery process every time. But there is a chance that it might one day become relatively bug-free, since the target isn’t moving. There are no external sources of trouble.

With the custom deserializer, deserializing to a dictionary looks like this:


var paidCartJsonString = @"{
    ""_state"": ""paid"",
    ""paidItems"": [
        {
            ""id"": ""1bcd"",
            ""title"": ""gizmo""
        },
        {
            ""id"": ""3cdf"",
            ""title"": ""widget"",
            ""description"": ""A very useful item""
        }
    ],
    ""payment"": {
        ""amount"": 123.5,
        ""currency"": ""USD""
    },
    ""timestamp"": ""2020-04-11T10:11:33.514+02:00""
}";
var paidCartBagFromText = JsonConvert.DeserializeObject<Dictionary<string, object>>(
    paidCartJsonString, new PropertyBagDeserializer());
var paidItems = (List<object>) paidCartBagFromText["paidItems"];
var firstItem = (Dictionary<string, object>) paidItems[0];
var firstItemTitle = (string) firstItem["title"];
var payment = (Dictionary<string, object>) paidCartBagFromText["payment"];
var currency = (string) payment["currency"];

There is a lot of casting going on. We might be able to gloss it over a bit by offering some extension methods on dictionary and list. I’m not sure whether it would help or make matters worse.
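
For what it’s worth, such extension methods could be as simple as this (hypothetical helpers, named by me):

public static class PropertyBagExtensions
{
    // Hide the casts involved in walking a deserialized property bag.
    public static Dictionary<string, object> GetObject(this Dictionary<string, object> bag, string key) =>
        (Dictionary<string, object>) bag[key];

    public static List<object> GetArray(this Dictionary<string, object> bag, string key) =>
        (List<object>) bag[key];

    public static string GetString(this Dictionary<string, object> bag, string key) =>
        (string) bag[key];

    public static Dictionary<string, object> GetObject(this List<object> list, int index) =>
        (Dictionary<string, object>) list[index];
}

That would let us write paidCartBagFromText.GetArray("paidItems").GetObject(0).GetString("title") instead, but the casts are still there, just tucked away.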

The reason is in JSON’s nature, I guess. It is a completely heterogeneous property bag. It’s never going to be a frictionless thing in a statically typed language, at least not one of the C# ilk.

Summary

What did we learn? Did we learn anything?

Well, we learned that deserialization in general is much more bothersome than serialization. Perhaps we already suspected as much, but it really became painfully clear, I think. In fact, the only target model that will let us do deserialization without either extensive configuration, making bets against the future or potentially engaging in organizational tug of war is the explicit JSON model. But luckily that’s actually a very clean model as well. The explicit JSON model is verbose when you use it to create instances by hand. But we’re not doing that. The JSON serializer does all of that, and it does it robustly because it’s the JSON serializer’s own model. Reading values out of the JSON model is actually quite succinct and nice. And when we’re deserializing, we’re only reading. I therefore recommend using that model as target model for deserialization.

For serialization, there is more competition and hence the conclusion is less clear-cut. The explicit JSON model is still a good choice, but it is pretty verbose. You might prefer to use a dictionary or some sort of DTO, either explicit or anonymous. However, both of the latter come with some caveats and pitfalls. I think actually the good old dictionary might be the best choice as source model for serialization.

What do you think?


On the complexity of JSON serialization

I vented a bit on Twitter the other day about my frustrations with JSON serialization in software development.

I thought I’d try to write it out in a bit more detail.

I’m going to embrace ambiguity and use the term “serialization” to mean both actual serialization (producing a string from some data structure) and deserialization (producing a data structure from some string). The reason is that our terminology sucks, and we have no word that encompasses both. No, (de-)serialization is not a word. I think you’ll be able to work out which sense I’m using at any given point. You’re not a machine after all.

Here’s the thing: on every single software project or product I’ve worked on, JSON serialization has been an endless source of pain and bugs. It’s a push stream of trouble. Why is that so? What is so inherently complicated in the problem of JSON serialization that we always, by necessity, struggle with it?

It’s weird, because the JSON object model is really really simple. Moreover, it’s a bounded, finite set of problems, isn’t it? How do you serialize or deserialize JSON? Well, gee, you need to map between text and the various entities in the JSON object model. Specifically, you need to be able to handle the values null, true and false, you need to handle numbers, strings and whitespace (all of which are unambiguously defined), and you need to handle arrays and objects of values. That’s it. Once you’ve done that, you’re done. There are no more problems!

I mean, it’s probably an interesting engineering challenge to make that process fast, but that’s not something that should ever end up causing us woes. Someone else could solve that problem for us once and for all. People have solved problems much, much more complicated than that once and for all.

But I’m looking at the wrong problem of course. The real problem is something else, because “JSON serialization” as commonly practiced in software development today is much more than mere serializing and deserializing JSON!

This is actual JSON serialization:

Actual JSON serialization.

This problem has some very nice properties! It is well-defined. It is closed. It has bounded complexity. There are no sources of new complexity unless the JSON specification itself is changed. It is an eminent candidate for a black box magical solution – some highly tuned, fast, low footprint enterprise-ready library or other. Great.

This, however, is “JSON serialization” as we practice it:

JSON serialization as practised

“JSON serialization” is not about mapping a single, canonical, well-defined object model to or from a text string. It is much more ambitious! It is about mapping between a text string containing JSON and some arbitrarily complex data model that we invented using a programming language of our choice. The reason, of course, is that we don’t want to work with a JSON representation in our code, we want to work with our own data structure. We may have a much richer type system, for instance, that we would like to exploit. We may have business rules that we want to enforce. But at the same time it’s so tedious to write the code to map between representations. Boilerplate, we call it, because we don’t like it. It would be very nice if the “JSON serializer” could somehow produce our own, custom representation directly! Look ma, no boilerplate! But now the original problem has changed drastically.

It now includes this:

Generic data model mapping

The sad truth is that it belongs to a class of problems that is both boring and non-trivial. They do exist. It is general data model mapping, where only one side has fixed properties. It could range from very simple (if the two models are identical) to incredibly complex or even unsolvable. It depends on your concrete models. And since models are subject to change, so is the complexity of your problems. Hence the endless stream of pain and bugs mentioned above.

How does an ambitious “JSON serializer” attempt to solve this problem? It can’t really know how to do the mapping correctly, so it must guess, based on conventions and heuristics. Like if two names are the same, you should probably map between them. Probably. Like 99% certain that it should map. Obviously it doesn’t really know about your data model, so it needs to use the magic of reflection. For deserialization, it needs to figure out the correct way to construct instances of your data representation. What if there are multiple ways? It needs to choose one. Sometimes it will choose or guess wrong, so there need to be mechanisms for giving it hints to rectify that. What if there are internal details in your data representation that don’t have a natural representation in JSON? It needs to know about that. More mechanisms to put in place. And so on and so forth, ad infinitum.

This problem has some very bad properties! It is ill-defined. It is open. It has unbounded, arbitrary complexity. There are endless sources of new complexity, because you can always come up with new ways of representing your data in your own code, and new exceptions to whatever choices the “JSON serializer” needs to make. The hints you gave it may become outdated. It’s not even obvious that there exists an unambiguous mapping to or from your data model and JSON. It is therefore a terrible candidate for a black box magical solution!

It’s really mind-boggling that we can talk about “single responsibility principle” with a grave expression on our faces and then happily proceed to do our “JSON serialization”. Clearly we’re doing two things at once. Clearly our “JSON serializer” now has two responsibilities, not one. Clearly there is more than one reason to change. And yet here we are. Because it’s so easy, until it isn’t.

But there are more problems. Consider a simple refactoring: changing the name of a property of your data model, for instance. It’s trivial, right? Just go ahead and change it! Now automatically your change also affects your JSON document. Is that what you want? Always? To change your external contract when your internal representation changes? You want your JSON representation tightly coupled to your internal data model? Really? “But ha, ha! That’s not necessarily so!” you may say, because you might be a programmer and a knight of the technically correct. “You can work around that!” And indeed you can. You can use the levers on the black box solution to decouple what you’ve coupled! Very clever! You can perhaps annotate your property with an explicit name, or even take full control over the serialization process by writing your own custom serializer plug-in thing. But at that point it is time for a fundamental question: “why are you doing this again?”.

Whenever there is potentially unbounded complexity involved in a problem, you really want full control over the solution. You want maximum transparency. Solving the problem by trying to give the black box the right configurations and instructions is much, much more difficult than just doing it straightforwardly “by hand”, as it were. By hand, there are no “exceptions to the default”, you just make the mapping you want. Conversely, if and when you summon a daemon to solve a problem using the magic of reflection, you really want that problem to be a fixed one. Keep the daemon locked in a sealed box. If you ever have to open the box, you’ve lost. You’ll need to tend to the daemon endlessly, and its mood varies. It is a Faustian bargain.

So what am I suggesting? I’m suggesting letting JSON serialization be about JSON only. Let JSON serializer libraries handle translating between text and a representation of the JSON object model. They can do that one job really well, quickly and robustly. Once you have that, you take over! You take direct control over the mapping from JSON to your own model.

It looks like this:

JSON serialization and mapping by hand

There is still potentially arbitrary complexity involved of course, in the mapping between JSON and your own model. But it is visible, transparent complexity that you can address with very simple means. So simple that we call it boilerplate.
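
As a tiny illustration of what that boilerplate might look like (with a hypothetical domain type and property names, using Newtonsoft’s JSON model):

// Our own model, free of any serializer concerns.
public class Payment
{
    public Payment(decimal amount, string currency)
    {
        Amount = amount;
        Currency = currency;
    }

    public decimal Amount { get; }
    public string Currency { get; }
}

public static class PaymentMapper
{
    // The boilerplate: an explicit, hand-written mapping from the JSON model to our own model.
    public static Payment FromJson(JObject json) =>
        new Payment(
            amount: (decimal) json["amount"],
            currency: (string) json["currency"]);
}

It is dull code, but it is completely transparent: when the JSON changes, or our model changes, we change the mapping by hand and we know exactly what happened and why.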

There is a famous paper by Fred Brooks Jr called “No Silver Bullet”. In it, Brooks distinguishes between “essential” and “accidental” complexity. That’s an interesting distinction, worthy of a discussion of its own. But I think it’s fair to say that for “JSON serialization”, we’re deep in the land of the accidental. There is nothing inescapable about the complexity of serializing and deserializing JSON.