Patching polymorphic pain at runtime

In the last post, we saw that data binding in ASP.NET doesn’t support polymorphism. We also saw that we could mitigate the problem by using simple wrapper types. Writing such wrappers by hand won’t kill you, but it is fairly brain-dead. I mentioned that an alternative would be to generate the wrappers at runtime, using reflection. That actually sounds like a bit of fun, so let’s see how it can be done. If nothing else, it’s a nice introductory lesson in using Reflection.Emit.

Comeback of the canines

As an example, let’s revisit our two four-legged friends and one enemy from the previous post: the Dog, the Chihuahua and the Wolf. They all implement ICanine.

The canines have gained a skill since last time, though – they can now indicate whether or not they’ll enjoy a particular kind of food. The code looks like this:

What we want to do in our web application is display a grid that shows the canine’s eating preferences as well as its bark. This calls for a combination of auto-generated and custom columns: an automatic one for the Bark property, and a custom one for each kind of food.

The DataGrid is declared in the .aspx page:

This gives us a column for the Bark out of the box.

In the code-behind, we add a column for each kind of food. We also get a list of canines, which we wrap in something called an BoxEnumerable<ICanine> before binding to it.

The food preference columns use an ItemTemplate called FoodColumnTemplate. It’s a simple example of data binding which goes beyond mere properties, since we’re invoking a method on the data item:

If we run the application, we get the result we wanted:

Foods-result

Without the presence of the BoxEnumerable<ICanine> above, though, we’d have a runtime exception at our hands. Under the covers, BoxEnumerable<ICanine> is producing the necessary wrappers around the actual canines to keep the DataGrid happy.

How it works

Let’s see how we can do this. Here’s an overview of the different moving parts:

Box-overview

That’s a fair amount of types, but most of them have trivial implementations. Consider BoxEnumerable<T> first:

As you can see, it’s really the simplest possible wrapper around the original IEnumerable<T>, turning it into an IEnumerable<Box<T>>. It relies on another wrapper type, BoxEnumerator<T>:

That too is just a minimal wrapper. The only remotely interesting code is in the Current property, where a BoxFactory<T> is responsible for turning the T instance into a Box<T> instance. BoxFactory<T> looks like this:

This is short but a little weird, perhaps. For fun, we’re adding a dash of premature optimization here. We’re using EmptyBoxFactory to create an “empty” instance of Box<T> (that is, without an instance of T inside). The BoxFactory<T> holds on to that empty instance for the rest of its lifetime, and uses it to create “populated” boxes. In other words, the initial empty box acts as a prototype for all subsequent boxes. That way, we avoid using reflection more than once to create the boxes. This should make people who fear the performance penalty of reflection a little happier. Let’s see how the prototype creates populated boxes for the factory:

Easy as pie, we’re just cloning and setting the protected T field. Doesn’t get much simpler than that.

It’s time to start worrying about the box itself, though. Of course, this is where things get both non-trivial and interesting.

So the goal is to create a type at runtime. The type should be used to wrap each item in an IEnumerable<T>, so that the control’s DataSource is set to a perfectly homogenous IEnumerable. That is, it will only contain instances of the same concrete type. The wrapper type won’t have any intelligence of its own, it will merely delegate to the wrapped instance of T.

To support auto-generation of columns, the wrapper type must have the same public properties as T. (We won’t consider the option of masking or renaming properties – that’s a use case that goes beyond just fixing what is broken.) In the case of T being an interface, a viable option would be for the wrapper type to implement T. However, we need the wrapper to work for all kinds of T, including when T is a base class with one or more non-virtual members. In the general case, therefore, the wrapper must simply mimic the same properties, duck typing-style.

Auto-generation of columns is pretty nifty, and a property-mimicking wrapper is sufficient for that scenario. For more sophisticated data binding scenarios, though, you need to be able to call arbitrary methods on the item we’re binding to. To do so in the general case (where T might be a class), we need some way of shedding the wrapper. We can’t simply call the methods on the wrapper itself, since we don’t have access to the name of the dynamically generated wrapper type at compile time. The C# compiler wouldn’t let us (well, we could use dynamic, but then we’re giving up static typing). So we’ll be using an Unwrap method, giving us access to the bare T. (Note that we can’t use a property, since that would show up when auto-generating columns!)

Now how can we call Unwrap if the type doesn’t even exist at compile time? Well, we know that there’s a small set of core capabilities that all wrapper types are going to need: the wrapped instance of T, and a way of wrapping and unwrapping T. So let’s create an abstract base class containing just that:

That way, we can always cast to Box<T>, call Unwrap, and we’re good.

Why are we calling it a “box”, by the way? It’s sort of a tip of the hat to academia, of all things. According to this paper on micro patterns, a “box” is “a class which has exactly one, mutable, instance field”. That suits our implementation to a T (hah!) so “box” it is.

The concrete box for our example should conceptually look like this:

Of course, the boxes we generate at runtime will never actually have a C# manifestation – they will be bytecode only. At this point though, the hand-written example will prove useful as target for our dynamically generated type.

Note that we’re going to try to be a little clever in our implementation. In the case where T is an interface (like ICanine), we’re going to let the dynamically generated box implement the original interface T, in addition to extending Box<T>. This will allow us to pretend that the box isn’t even there during data binding. You might recall that we’re casting to ICanine rather than calling Unwrap in the FoodColumnTemplate, even though the data item is our dynamically generated type rather than the original canine. Obviously we won’t be able to pull off that trick when T is a class, since C# has single inheritance.

Looking at the bytecode for BoxedICanine in ILDASM, ILSpy or Reflector, you should see something like this (assuming you’re doing a release compilation):

This, then, is what we’re aiming for. If we can generate this type at runtime, using ICanine as input, we’re good.

IL for beginners

If you’re new to IL, here’s a simple walk-through of the get_Bark method. IL is a stack-based language, meaning it uses a stack to transfer state between operations. In addition, state can be written to and read from local variables.

The .maxstack 8 instruction tells the runtime that a stack containing a eight elements will be sufficient for this method (in reality, the stack will never be more than a single element deep, so eight is strictly overkill). That’s sort of a preamble to the actual instructions, which come next. The ldarg.0 instruction loads argument 0 onto the stack, that is, the first parameter of the method. Now that’s confusing, since get_Bark seems to have no parameters, right? However, all instance methods receive a reference to this as an implicit 0th argument. So ldarg.0 loads the this reference onto the stack. This is necessary to read the _ instance field, which happens in the ldfld !0 instruction that follows. The ldfld !0 pops the this reference from the stack, and pushes the reference held by the 0th field (_) back on. So now we got an reference to an ICanine on there. The following callvirt instruction pops the ICanine reference from the stack and invokes get_Bark on it (passing the reference as the implicit 0th argument, of course). When the method returns, it will have pushed its return value onto the stack. So there will be a reference to a string there. Finally, ret returns from the method, leaving the string reference on the stack as the return value from the method.

If you take a look at the Eats method next, you’ll notice it’s practically identical to get_Bark. That’s because we’re essentially doing the same thing: delegating directly to the underlying T instance referenced by the _ field.

Now, how can we generate stuff like this on the fly?

Creating a type at runtime

As you can see below, a .NET type lives inside a module that lives inside an assembly that lives inside an appdomain.

Appdomain-blue

So before we can start generating the actual type, we need to provide the right environment for the type to live in. We only want to create this environment once, so we’ll do it inside the constructor of our singleton EmptyBoxFactory:

AssemblyBuilderAccess.Run indicates that we’re creating a transient assembly – it won’t be persisted to disk. We’re holding on to the module builder, which we’ll use when creating types later on. Assuming that we’ll be using the BoxEnumerable<T> in multiple data binding scenarios (for various Ts), the module will be accumulating types over time.

The public API of EmptyBoxFactory is limited to a single method, CreateEmptyBox. It uses reflection to create an instance of the appropriate type.

Creating the instance is simple enough (albeit slower than newing up objects the conventional way). The real work lies in coming up with the type to instantiate, so we need to move on! GetBoxType<T> looks like this:

We’re still treading the waters, though. Specifically, we’re just checking if the module already contains the suitable box type – meaning that we’ve been down this road before. Assuming we haven’t (and we haven’t, have we?), we’ll go on to CreateBoxType. Hopefully we’ll see something interesting there.

Oh man, it seems we’re still procrastinating! We haven’t reached the bottom of the rabbit hole just yet. Now we’re preparing for the BoxTypeFactory to create the actual type.

Two things worth noting, though. One thing is that if t is an interface, then we’ll let our new type implement it as mentioned earlier. This will let us pretend that the box isn’t even there during data binding. The other thing is that we’re obtaining a FieldInfo instance to represent the _ field of BoxType<T>, which as you’ll recall holds the instance of T that we’ll be delegating all our method calls and property accesses to. Once we have the FieldInfo, we can actually forget all about BoxType<T>. It’s sort of baked into the TypeBuilder as the superclass of the type we’re creating, but apart from that, BoxTypeFactory is oblivious to it.

But now! Now there’s nowhere left to hide. Let’s take a deep breath, dive in and reflect:

Oh. That’s almost anti-climatic – it’s not really hard at all. The Create method is super-simple: create proxy methods for any public methods in the type we’re wrapping, wire up any properties to the corresponding getter and/or setter methods, and we’re done! CreateProxyMethod seems like it might warrant some explanation; however, all we’re really doing is copying verbatim the IL we looked at in our walkthrough of get_Bark earlier. The wiring up of properties is necessary because a property consists of two parts at the IL level, a .property thing and a .method thing for each accessor. That, too, we saw in the IL of the hand-written class. So there’s really not much to it.

You might note that we’re explicitly not creating a proxy for the GetType method, defined on System.Object. This applies to the case where the type we’re boxing is a class, not an interface. In general, we shouldn’t proxy any non-virtual methods inherited from System.Object, but in practice that’s just GetType. So we’re taking the easy way out. (Note that the .NET runtime wouldn’t actually be fooled if we did inject a lying GetType implementation – it would still reveal the actual type of the object. Still, it’s better to play by the book.)

We will be providing proxies for virtual methods, though (e.g. Equals, GetHashCode and ToString). This makes the box as invisible as possible.

Afterthought: Anonymous types

There’s actually an alternative way of getting around the problem with broken polymorphism in simple scenarios. Rather than hand-writing your own wrapper or generating one at runtime, you can have the C# compiler generate one for you at compile time, using anonymous types. In fact, you can approximate a working solution for our example just by doing this in the code-behind:

Note that you don’t add any custom columns in this case, it’s all auto-generated. Running the application, you get this:

Food-result-anon

It’s not exactly the same as before, but it’s pretty close. Unfortunately, the approach isn’t very flexible – it breaks down as soon as you want to display something that’s not just text in the grid. For instance, say you want something like this:

Food-dropdown

Anonymous types won’t help you, but the runtime wrapper will (as will a hand-written one, of course). You just need a suitable ITemplate:

So…

Turns out that generating types at runtime is no big deal. It provides a flexible solution to the data binding problem, without the need for mindless hand-written wrappers.

As usual, let me know if you think there’s something wrong with the approach or the implementation. Also, I’d love to hear it if you have a different solution to the problem.


Polymorphic pain in ASP.NET data binding

I recently found out – the hard way, of course – that data binding in ASP.NET is broken with respect to polymorphism. It’s not consistently broken, though – it depends on the particular control you’re using. Makes life as a programmer that much more interesting, doesn’t it?

Let’s consider a very simple example. We have a type hierarchy like the following:

Canine-hierarchy

There’s an ICanine interface, with implementing classes Wolf and Dog. Chihuahua is a subclass of the latter. The code looks like this:

Now we’d like to conjure up a collection of canines and bind to them. Sounds innocent enough, right? And it is, if you use one of the benign controls. “Benign” as in “not badly broken”. ListBox is one of those. We create an IEnumerable<ICanine> like so:

And then we do the two-step song-and-dance of ASP.NET data binding (assuming that _box is an instance of ListBox):

We run it, and get the following result:

Bark-listbox

Presto! All is well! What is this guy talking about? Broken polymorphism? Where?

Well, that was the benign control, remember? Here’s a malicious one: DataGrid.

We do the exact same thing, except using _grid of the obvious type:

We run it, and get…

Exception-grid-wolf

Ouch.

Evidently, there’s some reflection voodoo going on underneath the hood when you’re doing data binding in ASP.NET. And in the case of DataGrid, that voodoo is just too feeble.

Now, consider a variation of the code above, omitting the Wolf. Wolves are trouble after all.

This time…

Bark-grid-dogs

It works! Oh man, that’s weird. So apparently subclassing works as long as there’s a common base class? You wish. Let’s try this instead:

That is, we reverse the order, passing in the Chihuahua first, before the Dog. And now:

Reflection-dog-exception

Gaah!

The reflection voodoo seems to be making some arbitrary assumptions regarding types based on the first element in the enumerable or some such. You could probably tease out the details using Reflector and coffee, but there’s no point. It’s just broken; I don’t care exactly how. What we need is a simple and predicatable workaround.

Workaround

You can mitigate the problem (aka complicate your program in order to work around a broken framework) by using a wrapper type that always stays the same. That way, the type of the instances handed out by the IEnumerable stays nice and homogenous, just the way DataGrid likes it. Inside the wrapper, you delegate to whatever ICanine you want:

This effectively replaces the original IEnumerable<ICanine> containing the bare Chihuahua and Dog with one that contains only wrapper canines. So data binding should work. And it does:

Wrapped-chihuahua

Notice that we got the Chihuahua Arff!ing as the first grid element.

You could generate such wrappers on the fly, using reflection. In fact, you can download and use something called a “defensive datasource“. Turns out I’m not the only one who’s been bitten and annoyed by this issue.

Peace, love and understanding

Why is DataGrid broken? Well, if you crack open BaseDataList, a base class for both DataGrid and ListBox, you’ll find that the DataSource property assumes that you’re passing it an IEnumerable. No T, just a plain ol’ .NET 1.1-style untyped IEnumerable. So basically, it’s just a series of arbitrary stuff, the type of which could be anything. You could stuff apples and oranges in there, no problem. Now recall this result:

Bark-grid-dogs

See the Bark header? That’s the name of the property shared by Dog and Chihuahua. This is DataGrid auto-generating columns based on the properties of the objects you pass it for data binding. It’s sort of cool, even though it’s broken. Of course, the DataGrid couldn’t pull that trick off without knowing something about the types of the instances in the IEnumerable. In fact, it absolutely needs to know that all instances share the properties that it’s going to display. If you put instances of both Apple and Orange into your IEnumerable, what would you expect to see? You need some commonality between the types, or the whole premise of DataGrid just falls apart.

Of course, an IEnumerable<T> would give you what you need: a common type T for the DataGrid to work with. But DataGrid is stuck with an IEnumerable for its data source and has to make do, somehow. One solution would be to build an inheritance tree for all the elements in the IEnumerable and pick the root type, the least common denominator so to speak. But I imagine that would be costly. Instead, DataGrid looks at the type of the first element, and assumes that the rest will be just like it (or a subclass). Weird, arbitrary, yet at least not completely irrational.

ListBox revisited

Now, given that ListBox also expects an untyped IEnumerable, how come polymorphism seems to work in that case?

Consider three unrelated classes, Huey, Dewey and Louie. We’ll make them singletons since there can only be one of each. More importantly, they all inherit directly from System.Object; there’s nothing else linking them together (no IDuck interface, no Duck base class). Coincidentally, though, they each sport a QuacksLike property.

Here’s the code for Huey:

As you can imagine, the declarations for Dewey and Louie are remarkably similar.

Let’s toss all three into an IEnumerable and see what happens:

The result is this:

Ducks-listbox

Isn’t that something? It’s not really polymorphic at all! Instead, it turns out that ListBox supports duck typing. As long as the object has a public property matching the DataTextField for the ListBox, the actual type is irrelevant. The property’s type is irrelevant too. We could change Dewey‘s implementation of QuacksLike like this, and it will still work:

Quacker could be anything, really. Here’s a possibility:

Now we get this:

Any-old-duck

Of course, if we were to replace Dewey with, say, an Apple, we’d be in trouble (unless it happens to have a public QuacksLike property, but that seems unlikely):

Exception-apple-quacks

No duck typing without quacking, that’s what I always say!

Conclusion

So polymorphism is not actually supported by either control. It’s just that it’s more likely that you’ll notice when you use a DataGrid than a ListBox. Fun!


A simple LRU cache

Disclaimer: copy code from this site at your own peril!

So yeah, I wanted to share this simple LRU cache I put together. LRU means “least-recently-used”. The idea is that you want to keep the hottest, most useful elements in the cache, so you evict elements that haven’t been used for a while. It’s a useful approximation of Belady’s algorithm (aka the clairvoyant algorithm) in many scenarios. Wikipedia has all the details.

Of course, my LRU implementation (written in C#) is extremely simple, naive even. For instance, it’s not self-pruning, meaning there’s no threading or callback magic going on to make sure that elements are evicted as soon as they expire. Rather, you check for expired elements when you interact with the cache. This incurs a little bit of overhead for each operation. Also, it means that if you don’t touch the cache, elements linger there forever. Another name for that would be “memory leak”, which sounds less impressive than “LRU cache”, but there you go. (We’re going off on a tangent here. Why would you create a cache that you never use? So let’s assume that you’re going to use it. “Useful” sort of implies “use”, indeed it should be “ful” of it.)

(The implementation isn’t synchronized either, which means you could wreak havoc if you have multiple threads accessing it simultaneously. But of course, you would be anyway. None of us can write correct multi-threaded code, so we should just refrain from doing so until the STM angel arrives to salvage us all. I believe his name is “Simon“.)

Anyways, the basic idea of an LRU cache is that you timestamp your cache elements, and use the timestamps to decide when to evict elements. To do so efficiently, you do two kinds of book-keeping: one for lookup, another for pruning of expired elements. In my implementation, this translates to a hash table and a linked list. The linked list is sorted by age, making it easy to find the oldest elements. New elements are added to the front, elements are pruned from the back. Pretty simple. I’ve sketched it below, just in case you cannot read.

Lru-cache

The cache in the figure holds five elements, two of which have expired. To be extraordinarily pedagogical, I’ve colored those red. The still-fresh elements are green. Notice that there are arrows going both ways between the hash table and the list. We’ll be using both to do our double book-keeping.

Now, let’s look at the implementation details. Performance-wise, we want to keep everything nice and O(1), with the caveat that there’ll be a little book-keeping going on. It shouldn’t be too bad, there’s not much to do.

Overview

A skeletal view of the ExpirationCache class looks like this:

There’s a fair amount of generic noise there, but the important stuff should be graspable. We can see the hash table (i.e. the Dictionary) and the linked list, as well as the TimeSpan which determines how long items should be held in the cache before expiring. The type declaration reveals that the cache holds elements of type TValue, that are accessed by keys of type TKey.

The TimeStamped type is simply a way of adding a timestamp to any ol’ type, like so:

Pruning

Let’s look at something a bit more interesting: the pruning. (Of course, I don’t know if caches can really be “pruned”, as they’re not plants. But programming is the realm of mangled metaphors, so we’ll make do.) Here’s the Prune method:

The linked list is the tool we use for pruning. We iterate over the list backwards, starting with the last and oldest element. If the element has expired, we need to remove it from both the linked list and the dictionary. Of course, these operations must be O(1), or we’re toast. Removing the current node from a linked list is not a problem, but what about the dictionary? We need access to the key, so the linked list node needs to contain that. Once we have that, we’re good. We stop iterating once we reach an element that hasn’t expired yet. The neat thing is that we never look at more than a single non-expired element, since the list is sorted by age.

There are plenty of variations that you might consider. You might want to pose a strict limit on the number of elements, for instance. You might use that instead of, or in conjunction with, the expiration time.

How aggressively should we be pruning the cache? In my implementation, I prune on all operations, including counting elements in the cache. But you could choose different approaches. For instance, if you don’t mind serving slightly stale elements, you could prune only when adding. Speaking of adding: what happens when you add an element to the cache?

Adding

There are two scenarios to consider when adding an element to the cache: 1) adding a new element, and 2) replacing an existing element. Let’s look at the code to see how it’s done.

After the mandatory pruning step is done, we can do what we’re here for. Let’s assume we’re adding a new element first. It’s trivial, even though we need to nest a few type envelopes: we’re wrapping our element in a TimeStamped inside a KeyValuePair inside a LinkedListNode. Then we just put that in front of the list (because it’s our newest one), and add it to the dictionary. Replacing an existing element is not much harder; we just remove it before adding it. We can’t mutate it, since we need a new timestamp and – more importantly – a new position in the linked list. We can’t afford to shuffle things around, since we’d lose O(1) in a heartbeat.

Look up

There isn’t much to it, really. Look up is a pure hash table operation:

Of course, you need open up all those type envelopes, too. But that’s pretty much it. For convenience, it’s nice to add an indexer, like so:

Wrapping up

And we’re done! If you think the implementation sucks or is flat-out broken, please tell me why.