Cast or GetHashCode?

I really hate to resurrect this issue, but after some recent conversations I think it’s necessary.  We have a lot of code – particularly in the deep, dark recesses of our application that no one dares touch – that uses GetHashCode() to retrieve the underlying value of an enumeration item.
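For readers who haven’t run into it, the pattern looks roughly like this. The OrderStatus enum and the two helper methods are hypothetical stand-ins for the real code. For an enum backed by int, both helpers return the same number today, which is exactly why the habit spreads.

```csharp
using System;

// Hypothetical enum standing in for the ones scattered through the code base.
enum OrderStatus
{
    Pending = 1,
    Shipped = 2,
    Delivered = 3
}

static class Program
{
    // The legacy pattern: returns 2 for Shipped only because of how
    // Enum.GetHashCode() currently happens to be implemented.
    public static int LegacyValue(OrderStatus status) => status.GetHashCode();

    // The intended conversion: an explicit cast to the underlying type.
    public static int CastValue(OrderStatus status) => (int)status;

    static void Main()
    {
        OrderStatus status = OrderStatus.Shipped;
        Console.WriteLine($"{LegacyValue(status)} {CastValue(status)}"); // prints "2 2"
    }
}
```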

I’ve been slowly working to eliminate this technique from the system, but it works, has been in use for eight or so years, and old habits die hard.  An unfortunate side effect, though, is that less experienced developers see this pattern repeated throughout the code, internalize the practice, and propagate it.  If GetHashCode() works, why should we care?

This question has already been discussed a bit on Stack Overflow.  The top-rated answer gives two reasons for preferring a cast:

That GetHashCode() happens to return the integer value of the enum is an implementation detail and may change in future versions of .NET.

GetHashCode() guarantees that if two values are equal their hash codes are equal too. The other way round is not guaranteed.

In other words, using GetHashCode() in this manner is abusing the method; it’s intended for use with hashing algorithms and hash tables, not type conversions.  With the .NET Framework currently on version 4, I don’t expect the underlying algorithm to change, but it always remains a possibility and, as The Pragmatic Programmer tells us, we shouldn’t rely on programming by coincidence.  Just because something happens to work doesn’t mean it will continue to work or that it’s right.  I fully agree with that reasoning, but I think we need to dig a little deeper to fully appreciate why casting is the preferred technique.
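The second reason is easy to demonstrate once the underlying type is wider than Int32. This sketch assumes a hypothetical enum backed by long. On the runtimes I’m aware of, Enum.GetHashCode() defers to the underlying type’s GetHashCode(), and Int64’s hash XORs the upper and lower 32 bits together, so two different items report the same “value” while the cast keeps them apart.

```csharp
using System;

// Hypothetical enum backed by long, so values can exceed Int32.
enum Quota : long
{
    Small = 1L,
    Huge = 1L << 32   // 4,294,967,296
}

static class Program
{
    static void Main()
    {
        // Int64's hash folds the high 32 bits into the low 32 bits,
        // so these two unequal items produce the same hash code.
        Console.WriteLine(Quota.Small.GetHashCode()); // 1
        Console.WriteLine(Quota.Huge.GetHashCode());  // 1

        // Casting to the underlying type preserves the real values.
        Console.WriteLine((long)Quota.Small);         // 1
        Console.WriteLine((long)Quota.Huge);          // 4294967296
    }
}
```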

I wrote a simple test script to determine whether GetHashCode() has any negative impact on execution.  The script iterated 1,000,000 times, grabbing a random enum item and timing how long it took to get the underlying value with both GetHashCode() and a cast to int.  I expected to see some performance impact, but the gap between the two techniques turned out to be much larger than I anticipated.
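The original script isn’t reproduced here, so the following is only a sketch of that kind of measurement, assuming a hypothetical five-item Color enum and System.Diagnostics.Stopwatch for timing. Both techniques run over the same pre-generated random items, and the running sum keeps the work from being optimized away.

```csharp
using System;
using System.Diagnostics;

// Hypothetical enum; the enum used in the original test isn't shown.
enum Color { Red, Green, Blue, Yellow, Black }

static class Program
{
    const int Iterations = 1_000_000;

    // Builds a reproducible array of random enum items.
    public static Color[] RandomItems(int count, int seed)
    {
        var rng = new Random(seed);
        var values = (Color[])Enum.GetValues(typeof(Color));
        var items = new Color[count];
        for (int i = 0; i < count; i++)
            items[i] = values[rng.Next(values.Length)];
        return items;
    }

    // Times one technique over all items; returns elapsed ms and a checksum.
    public static (double Ms, long Sum) Time(Color[] items, Func<Color, int> getValue)
    {
        long sum = 0;
        var sw = Stopwatch.StartNew();
        foreach (var item in items)
            sum += getValue(item);
        sw.Stop();
        return (sw.Elapsed.TotalMilliseconds, sum);
    }

    static void Main()
    {
        Color[] items = RandomItems(Iterations, seed: 42);
        var hash = Time(items, c => c.GetHashCode());
        var cast = Time(items, c => (int)c);
        Console.WriteLine($"GetHashCode(): {hash.Ms:F4} ms, cast: {cast.Ms:F4} ms");
    }
}
```

Routing each conversion through a Func<Color, int> adds the same delegate overhead to both sides, so absolute numbers from this sketch won’t match the table below. For anything more rigorous, a warm-up pass or a harness such as BenchmarkDotNet would be a better choice.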

I tabulated the results over 10 runs.  The table below shows that GetHashCode() is significantly slower than a simple cast, and also less predictable.

Execution Time Over 1,000,000 Items
(All times are in milliseconds)

Run   GetHashCode() total   Cast total
1     251.4508              47.5184
2     297.3245              51.1837
3     274.0526              47.1972
4     309.5782              49.1028
5     253.9144              47.0770
6     274.3677              47.7544
7     347.5474              49.7458
8     308.8236              51.0616
9     300.2899              49.1750
10    299.1564              48.1113

I also graphed the data points to get a better view.  The graph clearly shows the performance difference but also illustrates GetHashCode()’s unpredictability.
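The same picture falls out of a little arithmetic on the table. This sketch copies the ten runs into arrays and computes the per-run slowdown ratio and the spread (slowest minus fastest run) for each technique; nothing is re-measured.

```csharp
using System;
using System.Linq;

static class Program
{
    // Run totals copied from the table, in milliseconds (runs 1 to 10).
    public static readonly double[] HashMs =
    {
        251.4508, 297.3245, 274.0526, 309.5782, 253.9144,
        274.3677, 347.5474, 308.8236, 300.2899, 299.1564
    };

    public static readonly double[] CastMs =
    {
        47.5184, 51.1837, 47.1972, 49.1028, 47.0770,
        47.7544, 49.7458, 51.0616, 49.1750, 48.1113
    };

    static void Main()
    {
        // How many times slower GetHashCode() was than the cast in each run.
        double[] ratios = HashMs.Zip(CastMs, (hash, cast) => hash / cast).ToArray();
        Console.WriteLine($"Slowdown: {ratios.Min():F2}x to {ratios.Max():F2}x");

        // Spread between the fastest and slowest run for each technique.
        Console.WriteLine($"GetHashCode() spread: {HashMs.Max() - HashMs.Min():F2} ms");
        Console.WriteLine($"Cast spread: {CastMs.Max() - CastMs.Min():F2} ms");
    }
}
```

Worked by hand from the table, the ratios run from about 5.3 (run 1) to just under 7.0 (run 7), and the GetHashCode() totals spread across roughly 96 ms while the cast totals stay within about 4 ms of each other.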

What we see is that over 1,000,000 iterations GetHashCode() is generally about 5 to 7 times slower than casting.  We can also easily see how consistent casting is, with every run coming in at roughly 50 milliseconds.  Why is there such a disparity?

If we look at the enumeration’s MSIL, we can see that each item is explicitly defined as the enumeration’s underlying type (Int32 by default).  Casting the enumeration is therefore just a basic type conversion that instructs the runtime to give us the item’s underlying value instead of the enumeration.
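A small illustration of what “underlying type” means in practice; both enums are hypothetical. Enum.GetUnderlyingType reports the declared type, and the cast simply hands back the stored integer. As far as I can tell, the compiler emits no conversion instruction at all for a cast to the enum’s own underlying type, since at the IL level the value already is that integer.

```csharp
using System;

// Hypothetical enums: one with the default underlying type, one explicit.
enum Priority { Low, Normal, High }                   // underlying type: int
enum Access : byte { None = 0, Read = 1, Write = 2 }  // underlying type: byte

static class Program
{
    // Casting to the underlying type just exposes the stored integer.
    public static int ToInt(Priority p) => (int)p;
    public static byte ToByte(Access a) => (byte)a;

    static void Main()
    {
        Console.WriteLine(Enum.GetUnderlyingType(typeof(Priority))); // System.Int32
        Console.WriteLine(Enum.GetUnderlyingType(typeof(Access)));   // System.Byte
        Console.WriteLine(ToInt(Priority.High));                     // 2
        Console.WriteLine(ToByte(Access.Write));                     // 2
    }
}
```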

GetHashCode() is more complicated.  A quick glance at the method in your decompiler of choice will reveal a chain of method calls.  Nested inside this chain are calls to two extern methods:

  • System.Enum.InternalGetValue()
  • System.Runtime.CompilerServices.RuntimeHelpers.GetHashCode()

If you’re not familiar with the extern modifier, it is used to declare methods that are implemented elsewhere, typically in an unmanaged library.

Both of these methods are decorated with [MethodImpl(MethodImplOptions.InternalCall)] meaning that their implementation is within the CLR itself.  So whenever we use GetHashCode() to get the underlying value of an enumeration item we’re always going to make two jumps out of managed code into the CLR’s native code.  This added overhead could become significant depending on how often you need an underlying value.
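InternalCall is reserved for the runtime’s own libraries, so user code doesn’t normally use it, but an ordinary extern declaration shows the same idea of a method body living outside managed code. This sketch assumes a Unix-like system where libc exports getpid; the method is never called, only inspected through reflection, so the sketch runs on any platform.

```csharp
using System;
using System.Reflection;
using System.Runtime.InteropServices;

static class NativeMethods
{
    // 'extern' with DllImport: there is no managed body; the runtime
    // binds to the native function the first time the method is called.
    [DllImport("libc", EntryPoint = "getpid")]
    public static extern int GetPid();
}

static class Program
{
    static void Main()
    {
        MethodInfo method = typeof(NativeMethods).GetMethod(nameof(NativeMethods.GetPid));

        // DllImport marks the method with the PinvokeImpl metadata flag.
        Console.WriteLine(method.Attributes.HasFlag(MethodAttributes.PinvokeImpl)); // True
    }
}
```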

We’ve focused on casting and GetHashCode(), but in the interest of completeness I also want to examine a third way to get an enumeration’s value.  System.Enum implements IConvertible and explicitly implements the various type conversion methods defined by that interface.  Because the conversion methods are explicitly implemented, we need to cast the enumeration to IConvertible in order to use them.
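Here are the cast and the IConvertible route side by side, using a hypothetical Level enum. Because Enum implements IConvertible explicitly, calling ToInt32 requires casting (and therefore boxing) the value to IConvertible first; Convert.ToInt32(object) takes the same IConvertible route internally.

```csharp
using System;
using System.Globalization;

// Hypothetical enum used for illustration.
enum Level { Debug = 10, Info = 20, Warn = 30 }

static class Program
{
    // Cast: direct conversion to the underlying type.
    public static int ViaCast(Level level) => (int)level;

    // IConvertible: the explicit implementation is only reachable
    // through an IConvertible reference, which boxes the enum value.
    public static int ViaIConvertible(Level level) =>
        ((IConvertible)level).ToInt32(CultureInfo.InvariantCulture);

    static void Main()
    {
        Console.WriteLine(ViaCast(Level.Warn));          // 30
        Console.WriteLine(ViaIConvertible(Level.Warn));  // 30
        Console.WriteLine(Convert.ToInt32(Level.Warn));  // 30
    }
}
```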

I ran a few tests using IConvertible and found that over 1,000,000 items IConvertible.ToInt32() was consistently about 70 milliseconds slower than GetHashCode().  I didn’t look as deeply into the internals of the IConvertible implementation, but the extra slowness appears to come from both an extern method call and some additional casts.

So what have we learned?  We’ve seen that using GetHashCode() has several drawbacks, and that IConvertible is no better an alternative:

  1. Microsoft could change the implementation
  2. GetHashCode() offers somewhat unpredictable performance
  3. GetHashCode() is slower than casting
  4. IConvertible is slower still than GetHashCode()

Does all this mean anything?  I think it does; otherwise I wouldn’t have written this.  I’ll concede that my tests aggregated the elapsed time over 1,000,000 items and that normal volumes are probably much lower.  I’ll also concede that even at this volume we’re only dealing with milliseconds.  One thing is clear, though: using GetHashCode() in this manner forces the system to do unnecessary work.  Why would we intentionally consume system resources without a good reason?  Still, I think performance is only an ancillary consideration.

What is more important to me is that we not underestimate the significance of abusing/bastardizing the GetHashCode() method.  It is generally understood that GetHashCode() is for use in hashing algorithms and hash tables.  When we repurpose it as a type conversion, we violate the method’s implicit contract and make the code more difficult to understand.
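For contrast, here is GetHashCode() doing the job it was designed for, in a sketch with a hypothetical Region enum. The Dictionary hashes its keys through EqualityComparer<Region>.Default to place them in buckets, so our own code never calls GetHashCode() directly; when we want the number, we cast.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical enum used for illustration.
enum Region { North, South, East, West }

static class Program
{
    // Counts occurrences per region; hashing happens inside the dictionary.
    public static Dictionary<Region, int> CountByRegion(IEnumerable<Region> regions)
    {
        var counts = new Dictionary<Region, int>();
        foreach (var region in regions)
        {
            counts.TryGetValue(region, out int current); // 0 when not yet present
            counts[region] = current + 1;
        }
        return counts;
    }

    static void Main()
    {
        var counts = CountByRegion(new[] { Region.North, Region.East, Region.North });
        foreach (var pair in counts)
        {
            // The underlying value comes from a cast, not from GetHashCode().
            Console.WriteLine($"{pair.Key} ({(int)pair.Key}): {pair.Value}");
        }
    }
}
```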
