C#

Specifications, Expression Trees, and NHibernate

For my latest project we decided to follow a domain driven approach.  It wasn’t until after we had built out much of the domain model that we decided to change course and embrace CQRS and event sourcing.  By this point though we had already developed several specifications and wanted to adapt what we could on the query side.

(more…)

String Distances

One of my first projects at Leaf has been trying to match some data read from an OCR solution against what is in the database.  As sophisticated as OCR algorithms have become, they’re still not reliable enough to guarantee 100% accurate results every time due to the number of variations in typefaces, artifacts introduced through scanning or faxing the document, or any number of other factors.

Most of the documents I’ve been working with have been pretty clean and I’ve been able to get an exact match automatically.  One of my samples though, has some security features that intentionally obfuscate some of the information I care about.  This naturally makes getting an exact match difficult.  Amazingly though, the OCR result was about 80% accurate so there was still some hope.

One of my coworkers suggested that I look at some of the string distance algorithms to see if any of them could help us get closer to an exact match.  He pointed me at the Levenshtein algorithm so I took a look at that along with the Hamming and Damerau-Levenshtein algorithms.

For the uninitiated (like myself a week ago), these algorithms provide a way to determine the distance between two strings.  The distance is essentially a measurement of string similarity. In other words, they calculate how many steps are required to transform one string into another.

I want to look briefly at each of these and show some examples. Note that each of these algorithms is case sensitive but modifying them to ignore case is trivial.
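To make "steps required to transform one string into another" concrete, here is a minimal sketch of the textbook dynamic-programming form of the Levenshtein distance. The StringDistance class and its method name are my own for illustration and aren't necessarily what the full post uses.

```csharp
using System;

public static class StringDistance
{
    // Minimum number of single-character insertions, deletions, or
    // substitutions needed to turn source into target (case sensitive).
    public static int Levenshtein(string source, string target)
    {
        if (source == null) throw new ArgumentNullException("source");
        if (target == null) throw new ArgumentNullException("target");

        var distances = new int[source.Length + 1, target.Length + 1];

        // Transforming to or from an empty prefix costs one step per character.
        for (var i = 0; i <= source.Length; i++) distances[i, 0] = i;
        for (var j = 0; j <= target.Length; j++) distances[0, j] = j;

        for (var i = 1; i <= source.Length; i++)
        {
            for (var j = 1; j <= target.Length; j++)
            {
                var cost = source[i - 1] == target[j - 1] ? 0 : 1;
                distances[i, j] = Math.Min(
                    Math.Min(distances[i - 1, j] + 1,    // deletion
                             distances[i, j - 1] + 1),   // insertion
                    distances[i - 1, j - 1] + cost);     // substitution
            }
        }

        return distances[source.Length, target.Length];
    }
}

public static class Program
{
    public static void Main()
    {
        Console.WriteLine(StringDistance.Levenshtein("kitten", "sitting")); // 3
        Console.WriteLine(StringDistance.Levenshtein("Leaf", "leaf"));      // 1
    }
}
```

The second call shows the case sensitivity mentioned above: "L" and "l" count as a substitution.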

(more…)

Theoris Innovation Series

On April 20, 2012 from 1:00 – 4:00 PM Theoris IT Services is hosting the next installment of its Theoris Innovation Series.  For this event Alex Gheith and I will be discussing many of the modern features of C# including:

  • LINQ
  • Dynamic Programming
  • Parallel Programming (including the upcoming async and await keywords)

This is a free event but please note that space is limited to the first 40 respondents.  For more information, please check the event site.

Going Underground: Microsoft Moles

3/27/2012 Update: According to the Moles page, the Moles framework has been integrated into Visual Studio 11 as the Fakes framework and Moles is no longer under active development.   After a quick review of the changes it appears that most of this guide still applies but there are a few changes to be aware of:

  • Mole types are now referred to as Shim types
  • The configuration file now has a .fakes extension
  • The generated types are now placed in a .Fakes namespace (e.g., System.Fakes)

6/25/2012 Update: In preparation for my Faking It talk covering the Fakes framework I compiled a list of the notable differences between Moles and Fakes. There are quite a few more than I listed above so if you’re using this post as an introduction to either framework you’ll probably want to look them over.

Despite having been around for several years I hadn’t heard about Microsoft’s Moles framework until a few months ago when one of my teammates mentioned it during the 2011 Indy GiveCamp. I was interested in learning more about it at the time but given how we were running on caffeine I’m not surprised I forgot about it until he mentioned it again a few weeks ago. This time I was much more alert and started reading about it almost immediately. After seeing what Moles offered and not finding much in the way of resources for it on the Web I knew I needed to spread the word.

(more…)

Some C# Syntax Tricks

Christmas came a bit early for my team this year when we were told that the company had purchased licenses for both Reflector AND ReSharper!  Naturally I installed both right away.  Of course we’re all familiar with Reflector but haven’t used it lately since it’s no longer free.  ReSharper on the other hand is new to many of us.  I know lots of great developers that swear by it but I’ve never really had a chance to dive in and try it out.  I generally think I’m pretty savvy with C# syntax but after just two days ReSharper has already taught me a few syntax tricks that I’ve adopted.

(more…)

System.Diagnostics.Debugger

I hardly ever use the classes in the System.Diagnostics namespace.  As much as I’d like everyone to believe that it’s because I’m such a rockstar that I don’t need them, it’s really just that I generally use other techniques.  With Visual Studio providing so many tools for debugging I’ve rarely had reason to dig into this namespace much.  Sure, I’ve used the Debug, Trace, and EventLog classes but I haven’t taken the time to investigate what else is in there.

(more…)

Cast or GetHashCode?

I really hate to resurrect this issue but after some recent conversations I think it’s necessary.  We have a lot of code – particularly in the deep, dark recesses of our application that no one dares touch – that uses GetHashCode() to retrieve the underlying value of an enumeration item.

I’ve been slowly working to eliminate this technique from the system but it works, has been in use for eight or so years, and old habits die hard.  An unfortunate side effect though, is that less experienced developers see this pattern repeated throughout the code, internalize the practice, and propagate it.  If GetHashCode() works why should we care?
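One reason to care, sketched with a hypothetical OrderStatus enum that is not from our code base: GetHashCode() always returns an int, so it cannot be the underlying value of an enum whose values don't fit in an int, while a cast always produces the underlying value.

```csharp
using System;

// Hypothetical enum used only for illustration.
public enum OrderStatus : long
{
    Pending = 1,
    Shipped = 2,
    Archived = 5000000000 // larger than int.MaxValue
}

public static class Program
{
    public static void Main()
    {
        // The cast states the intent and yields the underlying value.
        long viaCast = (long)OrderStatus.Archived;
        Console.WriteLine(viaCast); // 5000000000

        // GetHashCode() returns an int, which cannot hold 5000000000.
        int viaHash = OrderStatus.Archived.GetHashCode();
        Console.WriteLine(viaHash == viaCast); // False
    }
}
```

For int-based enums the two happen to agree today, but that is an implementation detail of the hash code rather than something its contract promises.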

(more…)

Not Another Regular Expression

I haven’t done anything with the System.Drawing namespace directly in a long time.  So long in fact that before today I honestly can’t remember the last time I needed anything in there.  When I needed to update the border color on an old ASP.NET DataGrid and the compiler informed me that I couldn’t use a hex string I was a bit surprised.  I needed a way to convert that string to a System.Drawing.Color.

In my haste the first thing I did was start writing a method to parse out the string and get the integer values to pass in to Color.FromArgb.  Because I needed to account for both the 3-digit and 6-digit formats in both uppercase and lowercase characters with or without the leading hash I started hacking out a regular expression.

I haven’t had much reason to use regular expressions for a long time either but apparently (and amazingly) I can remember their syntax better than I can remember what’s in System.Drawing because with minimal documentation referencing this is what I came up with:

var re = new Regex(
	@"^#?(?([\dA-F]{3}$)(?<r>[\dA-F])(?<g>[\dA-F])(?<b>[\dA-F])|(?<r>[\dA-F]{2})(?<g>[\dA-F]{2})(?<b>[\dA-F]{2}))$",
	RegexOptions.IgnoreCase
);

As irritating and confusing as the syntax is I’m always amazed at how powerful regular expressions are.  There’s really quite a bit going on in this example so let’s take a look at what it’s matching.  I won’t talk about the RegexOptions piece because that should be pretty self-explanatory but otherwise we can break this one down into a few pieces starting with the most basic.

We start and end with the ^ and $ characters.  These ensure that the string we’re checking is respectively the first and last thing on the line.  Immediately following the opening ^ we see the #? pattern that says a valid match will start with no more than one instance of the # character.

Throughout the expression we repeatedly see the [\dA-F] pattern.  On its own this pattern matches a single hexadecimal digit (0-9, A-F).  When we need to match multiple consecutive hexadecimal digits we follow the pattern with a quantifier like {2} or {3}.

The remaining constructs in the expression deal with groups and conditional matching (formally called alternation).  These constructs look similar and are closely related.  In this example we’re using two types of grouping patterns and an alternation pattern.  It’s probably best to start with the outermost construct and work our way in.

In this example the alternation construct follows the (?(expression)yes-part|no-part) syntax.  I like to think of this conditional matching construct as the regular expression version of the ternary operator.  The expression is a zero-width assertion construct (non-advancing) that is used to determine whether the yes-part or no-part pattern should be matched.  Most of the time the construct for a zero-width assertion begins with (?= but in this case the assertion is implied and the .NET regular expression parser allows us to omit the ?=.  In this example our zero-width assertion is ([\dA-F]{3}$).  That is, we’re evaluating whether the string matches exactly 3 hexadecimal digits followed by the end of the line.  In short, if the string is a 3-digit format the parser will match the “yes” part otherwise it will match the “no” part.  The reason we’re asserting the end of line here too is that we want to ensure that a 6-digit color doesn’t fall in to the “yes” part.

Note: Alternatively we could assert [\dA-F]{6} and swap the yes/no parts.

The “yes” and “no” parts are very similar in that they both consist of three named capturing groups: “r”, “g”, and “b”.  The named capturing groups are identified by the (?<name>pattern) syntax and instruct the parser to remember the values for use later in the pattern through backreferences or for returning to C# via the Groups collection on the Match object.  Since we’ve already covered what the pattern does we won’t go into detail here.  We just need to recognize that when we’re matching a 3-digit color we capture the individual digits whereas when we have a 6-digit color we capture pairs of digits.  By using the same names in both parts our C# code can be completely ignorant of how the expression captured them.

Note: Regular expressions also allow for unnamed capturing groups that can be referred to by their ordinal index.  Even though they add clutter to an already potentially confusing string I usually stick to the named capturing groups because they make it easier to remember which group I’m working with.

This regular expression did the trick nicely.  I was able to extract the individual color components from both 3-digit and 6-digit color codes and fail out of anything that didn’t match by checking the match’s Success property.  Unfortunately this was only part of the conversion process.  I still needed to convert the values from the 3-digit pattern over to their 6-digit equivalent and pass the integer values to Color.FromArgb.  At this point I got to thinking “there has to be an easier way” as though the regular expression wasn’t enough.
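For the curious, the rest of that road might have looked something like this.  It's a sketch rather than the code I actually wrote: the HexColorParser class and its TryParse method are made up for illustration, and it stops at the integer components instead of calling Color.FromArgb so that it doesn't depend on System.Drawing.

```csharp
using System;
using System.Text.RegularExpressions;

public static class HexColorParser
{
    private static readonly Regex HexColor = new Regex(
        @"^#?(?([\dA-F]{3}$)(?<r>[\dA-F])(?<g>[\dA-F])(?<b>[\dA-F])|(?<r>[\dA-F]{2})(?<g>[\dA-F]{2})(?<b>[\dA-F]{2}))$",
        RegexOptions.IgnoreCase);

    // Returns false when the input is not a 3- or 6-digit hex color.
    public static bool TryParse(string input, out int r, out int g, out int b)
    {
        r = g = b = 0;

        var match = HexColor.Match(input);
        if (!match.Success) return false;

        // The same group names work for both formats.
        r = ToComponent(match.Groups["r"].Value);
        g = ToComponent(match.Groups["g"].Value);
        b = ToComponent(match.Groups["b"].Value);
        return true;
    }

    // A single digit such as "A" is doubled to "AA" before conversion,
    // which is the usual expansion rule for the 3-digit shorthand.
    private static int ToComponent(string digits)
    {
        if (digits.Length == 1) digits += digits;
        return Convert.ToInt32(digits, 16);
    }
}

public static class Program
{
    public static void Main()
    {
        int r, g, b;
        if (HexColorParser.TryParse("#abc", out r, out g, out b))
        {
            Console.WriteLine("{0}, {1}, {2}", r, g, b); // 170, 187, 204
        }
        Console.WriteLine(HexColorParser.TryParse("#12345", out r, out g, out b)); // False
    }
}
```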

No matter how far you have gone on a wrong road, turn back.
– Turkish Proverb

Remember that I said that I haven’t done anything with the System.Drawing namespace directly in a long time…  It turns out that there’s a ColorTranslator class in System.Drawing that provides a nice FromHtml method.  FromHtml takes a hex string and returns the equivalent System.Drawing.Color.  Problem solved.
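In its simplest form the call looks like this.  The color value is just an example, and the snippet assumes a reference to an assembly that provides System.Drawing.

```csharp
using System;
using System.Drawing;

public static class Program
{
    public static void Main()
    {
        // FromHtml handles the parsing that the regular expression was written for.
        Color orange = ColorTranslator.FromHtml("#FF8000");
        Console.WriteLine("{0}, {1}, {2}", orange.R, orange.G, orange.B); // 255, 128, 0
    }
}
```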

Building Strings Fluently

Last night I was reading the second edition of Effective C#.  Item 16 discusses avoiding creation of unnecessary objects with part of the discussion using the typical example of favoring StringBuilder over string concatenation.  The tip itself was nothing new (StringBuilder has been available since the first versions of the .NET framework) but it did remind me of something I “discovered” a few months ago.

Back in late July Esther and I took a week vacation.  We rented a two story loft on a marina in southwest Michigan.  It was incredibly relaxing and apparently more refreshing than I realized at the time.  When I returned to the office the following Monday I was looking at a block of code that was doing a lot of string concatenation and decided to rewrite it to use a StringBuilder instead.  When using a StringBuilder I follow the familiar pattern seen in most books and even in the MSDN documentation:

var sb = new StringBuilder();
sb.Append("Hello, Dave");
sb.AppendLine();
sb.AppendFormat("Today is {0:D}", DateTime.Now);
Console.WriteLine(sb.ToString());

For some reason though as I was writing code this particular Monday I noticed something that I hadn’t noticed before.  I realized that StringBuilder, a class I’ve been using for nearly 10 years, implements a fluent interface!  All of those familiar methods like Append, AppendFormat, Insert, Replace, etc… each return the StringBuilder instance meaning we can chain calls together!

Armed with this new knowledge I started thinking about all the places that code can be simplified just by taking advantage of the fluent interface.  No longer do I need to define a variable for the StringBuilder and pass it to something.  Instead, I can create the instance inline, build it up, then pass it along.

Console.WriteLine(
	(new StringBuilder())
		.Append("Hello, Dave")
		.AppendLine()
		.AppendFormat("Today is {0:D}", DateTime.Now)
		.ToString()
);

Hoping I hadn’t been completely oblivious for so long I hopped over to the .NET 1.1 documentation and what I found was astonishing – this functionality has been there all along.  I asked a few trusted colleagues if they knew about it and incredibly none of them had realized it either!  How did we miss this for so long?
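If you want to convince yourself that the chained calls really do operate on a single instance, a quick check along these lines should do it (a throwaway sketch, nothing more):

```csharp
using System;
using System.Text;

public static class Program
{
    public static void Main()
    {
        var sb = new StringBuilder();

        // Each Append returns the builder it was called on.
        StringBuilder returned = sb.Append("Hello, ").Append("Dave");

        Console.WriteLine(ReferenceEquals(sb, returned)); // True
        Console.WriteLine(sb.ToString());                 // Hello, Dave
    }
}
```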

Basic Dynamic Programming in .NET 4

.NET 4 adds some nice tools to the toolbox.  Chief among them is support for dynamic languages and dynamic features in strongly typed languages.  In this post we’ll examine how to use reflection to work with unknown data types then we’ll see how to use dynamics to accomplish the same task. Next, we’ll see an example of interacting with IronRuby from within a C# application.  Finally, we’ll take a brief look at two of the specialized classes in the System.Dynamic namespace.

Introducing Dynamic Programming

.NET has historically been a statically typed environment.  By virtue of being statically typed we get many benefits such as type safety and compile-time member checking.  There are times though that we don’t know the type of a variable when we’re writing the code.  Consider this code:

module Bank

type BankAccount() = class
  let mutable _balance = 0m

  member this.CurrentBalance
    with get() = _balance

  member this.Deposit(amount) =
    let lastBalance = _balance
    _balance <- _balance + amount
    ( lastBalance, _balance)

  member this.Withdrawal(amount) =
    let lastBalance = _balance
    let tmpBalance = _balance - amount
    if tmpBalance < 0m then raise(new System.Exception("Balance cannot go below zero"))
    _balance <- tmpBalance
    ( lastBalance, _balance)

  member this.Close() =
    let lastBalance = _balance
    _balance <- 0m
    ( lastBalance, _balance)
end

The code above defines a simple type in F#.  Don’t worry if you’re not familiar with it – all that’s important to understand here is that the BankAccount type has a parameterless constructor, a read-only CurrentBalance property, and three methods (Deposit, Withdrawal, and Close) that each return a two item tuple.

What if we want to work with this type from a C# assembly?  In many cases it will be possible to add a reference to the F# project but what if that isn’t possible?  What if we’re getting a reference to the type from an IoC container, a factory method, or a COM interop component that returns object?  In those cases we may not have enough information to cast the instance to a known type.

Reflecting on the Past

In the past when these situations would arise we had to resort to reflection to access an object’s members.  Aside from being costly, using reflection requires a lot of extra code and often involves additional casting.

var account = BankAccountFactory.GetBankAccount();

var accountType = account.GetType();
var depositMethod = accountType.GetMethod("Deposit");
var currentBalanceProperty = accountType.GetProperty("CurrentBalance");
var withdrawalMethod = accountType.GetMethod("Withdrawal");
var closeMethod = accountType.GetMethod("Close");

Console.WriteLine(Resources.CurrentBalanceFormat, ((Tuple<decimal, decimal>)depositMethod.Invoke(account, new object[] { 1000m })).Item2);
Console.WriteLine(Resources.CurrentBalanceFormat, ((decimal)currentBalanceProperty.GetValue(account, null)));
Console.WriteLine(Resources.CurrentBalanceFormat, ((Tuple<decimal, decimal>)withdrawalMethod.Invoke(account, new object[] { 250m })).Item2);
Console.WriteLine(Resources.CurrentBalanceFormat, ((Tuple<decimal, decimal>)withdrawalMethod.Invoke(account, new object[] { 100m })).Item2);
Console.WriteLine(Resources.CurrentBalanceFormat, ((Tuple<decimal, decimal>)depositMethod.Invoke(account, new object[] { 35m })).Item2);
Console.WriteLine(Resources.CurrentBalanceFormat, ((Tuple<decimal, decimal>)closeMethod.Invoke(account, new object[] { })).Item2);

Look at those pipes!  There’s a ton of code and most of it is just plumbing.  Before we can do anything useful with the BankAccount instance we need to get a MemberInfo (specifically PropertyInfo or MethodInfo) instance for each member we want to use.  Once we have the MemberInfo instances we need to call the Invoke or GetValue method, passing the source instance and any other required arguments, before casting the result of the call to the expected type! Isn’t there a better way?

Dynamic Language Runtime to the Rescue!

Prior to .NET 4 we were stuck with reflection but .NET 4 introduces the Dynamic Language Runtime (DLR) to help reduce this complexity.  The DLR is exposed to C# through the dynamic type.  Rather than declaring variables as type object we can define them as type dynamic.

Note: Please, please, pleeeease don’t confuse var with dynamic.  In C#, var uses type inference to determine the actual type of the variable at compile-time whereas dynamic defers type resolution to the DLR at run-time.  Variables defined with var are still strongly typed.
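A small sketch of the difference, using throwaway variable names of my own:

```csharp
using System;

public static class Program
{
    public static void Main()
    {
        // The compiler infers int here; assigning a string to count
        // afterwards would be a compile-time error.
        var count = 5;
        Console.WriteLine(count.GetType().Name); // Int32

        // With dynamic the type is resolved at run-time, so the same
        // variable can hold values of unrelated types.
        dynamic value = 5;
        Console.WriteLine(value.GetType().Name); // Int32
        value = "five";
        Console.WriteLine(value.GetType().Name); // String
    }
}
```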

The dynamic type is just like any other type in that it can be used for defining variables, fields, method return values, method arguments, etc… Variables defined as dynamic can be used as though we’re working directly with an instance of a known type. Let’s take the reflection example and revise it to use dynamics instead.

dynamic account = BankAccountFactory.GetBankAccount();

Console.WriteLine(Resources.CurrentBalanceFormat, account.Deposit(1000m).Item2);
Console.WriteLine(Resources.CurrentBalanceFormat, account.CurrentBalance);
Console.WriteLine(Resources.CurrentBalanceFormat, account.Withdrawal(250m).Item2);
Console.WriteLine(Resources.CurrentBalanceFormat, account.Withdrawal(100m).Item2);
Console.WriteLine(Resources.CurrentBalanceFormat, account.Deposit(35m).Item2);
Console.WriteLine(Resources.CurrentBalanceFormat, account.Close().Item2);

Reflection and Dynamic Output

As you can see, the code is much more concise.  We don’t have any of the complexity required by reflection and dynamics typically performs better than reflection.  This convenience does come at a price though.

When we use dynamic types we lose all of the compile-time support that comes with a strongly typed environment.  That means we lose type safety, type checking, and even IntelliSense.  If something is wrong we won’t find out about it until run-time so it is especially important to have good tests around any code using dynamics.
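Here’s a sketch of what “finding out at run-time” looks like.  The DynamicChecks class and the misspelled Depositt member are invented for illustration; the exception type comes from the Microsoft.CSharp assembly.

```csharp
using System;
using Microsoft.CSharp.RuntimeBinder;

public static class DynamicChecks
{
    // Returns true when calling the misspelled member fails at run-time.
    public static bool MissingMemberThrows(dynamic target)
    {
        try
        {
            // The typo compiles without complaint because target is dynamic.
            target.Depositt(100m);
            return false;
        }
        catch (RuntimeBinderException)
        {
            // The binder only discovers the problem when the call executes.
            return true;
        }
    }
}

public static class Program
{
    public static void Main()
    {
        Console.WriteLine(DynamicChecks.MissingMemberThrows(new object())); // True
    }
}
```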

Using Dynamic Languages

We’ve seen how to use the dynamic type so let’s take a look at using the DLR with an actual dynamic language (F# is functional, not dynamic).  In this example we’ll define a class using an IronRuby script hosted within a C# application.  We’ll then create and use an instance of that type.

var engine = Ruby.CreateEngine();
dynamic account = engine.Execute(
@"class BankAccount
	attr_reader :CurrentBalance

	def initialize()
		@CurrentBalance = 0.0
	end

	def Deposit(amount)
		@CurrentBalance = @CurrentBalance + amount
		@CurrentBalance
	end

	def Withdrawal(amount)
		@CurrentBalance = @CurrentBalance - amount
		@CurrentBalance
	end

	def Close()
		@CurrentBalance = 0.0
		@CurrentBalance
	end
end
return BankAccount.new"
);

Note: A full discussion of IronRuby is beyond the scope of this article.  For now just know that the Ruby class exposes the Microsoft Scripting classes.  The Ruby class itself is found in the IronRuby namespace in IronRuby.dll.

The IronRuby code snippet passed to engine.Execute defines a class very similar to the F# class we used earlier.  The ScriptEngine’s Execute method evaluates the script and returns a dynamic that will be bound to the IronRuby type.  Once we have that reference we can use the DLR to manipulate the instance as follows:

Console.WriteLine(Resources.CurrentBalanceFormat, account.Deposit(1000m));
Console.WriteLine(Resources.CurrentBalanceFormat, account.CurrentBalance);
Console.WriteLine(Resources.CurrentBalanceFormat, account.Withdrawal(250m));
Console.WriteLine(Resources.CurrentBalanceFormat, account.Withdrawal(100m));
Console.WriteLine(Resources.CurrentBalanceFormat, account.Deposit(35m));
Console.WriteLine(Resources.CurrentBalanceFormat, account.Close());

With the exception that the IronRuby class doesn’t return a Tuple the C# code is identical to that used to work with the F# class. In both cases the DLR handles resolving properties, methods, and data types despite the fact that the underlying class is not only entirely different but is also completely unrelated. This illustrates how dynamics can also simplify working with similar classes that don’t share a common interface or base class.
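The same idea can be shown without leaving C#.  In this sketch the two classes are hypothetical and share nothing but a member name, yet one dynamic method works with both:

```csharp
using System;

// Two unrelated, made-up types with no common interface or base class.
public class CheckingAccount
{
    public decimal CurrentBalance { get { return 100m; } }
}

public class GiftCard
{
    public decimal CurrentBalance { get { return 25m; } }
}

public static class BalanceReader
{
    // The DLR resolves CurrentBalance against whatever type arrives at run-time.
    public static decimal ReadBalance(dynamic source)
    {
        return source.CurrentBalance;
    }
}

public static class Program
{
    public static void Main()
    {
        Console.WriteLine(BalanceReader.ReadBalance(new CheckingAccount())); // 100
        Console.WriteLine(BalanceReader.ReadBalance(new GiftCard()));        // 25
    }
}
```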

The Microsoft Scripting classes don’t restrict us to using inline scripts.  We can also use the ScriptEngine’s ExecuteFile method to invoke external scripts.  Unlike the Execute method, which returns dynamic, ExecuteFile returns an instance of ScriptScope that can be used to dive back into the engine and provide more control for using the loaded script(s).

var scope = engine.ExecuteFile("IronRubySample.ir");
dynamic globals = engine.Runtime.Globals;
dynamic account = globals.BankAccount.@new();

Special Dynamic Types

In addition to declaring any unknown types as dynamic the .NET Framework now provides classes that allow dynamic behavior from the traditional .NET languages.  Each of these types is located in the System.Dynamic namespace and instances must be defined as type dynamic to avoid static typing and take advantage of their dynamic capabilities.

Of the classes in the System.Dynamic namespace we’ll only be looking at ExpandoObject and DynamicObject here.

ExpandoObject

ExpandoObject is a dynamic type that allows members to be added or removed at run-time.  The behavior of ExpandoObject is similar to that of objects in traditional dynamic languages.

public sealed class ExpandoObject : IDynamicMetaObjectProvider,
	IDictionary<string, Object>, ICollection<KeyValuePair<string, Object>>,
	IEnumerable<KeyValuePair<string, Object>>, IEnumerable, INotifyPropertyChanged

For the following examples assume we have an ExpandoObject defined as:

dynamic car = new ExpandoObject();
car.Make = "Toyota";
car.Model = "Prius";
car.Year = 2005;
car.IsRunning = false;

In their simplest form ExpandoObjects will only use properties:

Console.WriteLine("{0} {1} {2}", car.Year, car.Make, car.Model);

…but we can also add methods:

car.TurnOn = (Action)(() => { Console.WriteLine("Starting {0}", car.Model); car.IsRunning = true; });
car.TurnOff = (Action)(() => { Console.WriteLine("Stopping {0}", car.Model); car.IsRunning = false; });

Console.WriteLine("Is Running? {0}", car.IsRunning);
car.TurnOn();
Console.WriteLine("Is Running? {0}", car.IsRunning);
car.TurnOff();
Console.WriteLine("Is Running? {0}", car.IsRunning);

…and events:

var OnStarted =
	(Action<dynamic, EventArgs>)((dynamic c, EventArgs ea) =>
	{
		if (c.Started != null)
		{
			c.Started(c, new EventArgs());
		}
	});

var OnStopped =
	(Action<dynamic, EventArgs>)((dynamic c, EventArgs ea) =>
	{
		if (c.Stopped != null)
		{
			c.Stopped(c, new EventArgs());
		}
	});

car.Started = null;
car.Started += (Action<dynamic, EventArgs>)((dynamic c, EventArgs ea) => Console.WriteLine("{0} Started", c.Model));
car.Stopped = null;
car.Stopped += (Action<dynamic, EventArgs>)((dynamic c, EventArgs ea) => Console.WriteLine("{0} Stopped", c.Model));
car.TurnOn = (Action)(() => { car.IsRunning = true; OnStarted(car, EventArgs.Empty); });
car.TurnOff = (Action)(() => { car.IsRunning = false; OnStopped(car, EventArgs.Empty); });

Console.WriteLine("Is Running? {0}", car.IsRunning);
car.TurnOn();
Console.WriteLine("Is Running? {0}", car.IsRunning);
car.TurnOff();
Console.WriteLine("Is Running? {0}", car.IsRunning);

In addition to the standard IDynamicMetaObjectProvider interface ExpandoObject also implements several interfaces for accessing members as though they were a dictionary.  The DLR will handle adding members through its binding mechanism but we need to use the dictionary syntax to remove them.

var carDict = (IDictionary<string, object>)car;
Console.WriteLine("{0} {1} {2}", carDict["Year"], carDict["Make"], carDict["Model"]);
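Removing a member goes through the same dictionary view.  A minimal sketch, reusing a cut-down version of the car from above:

```csharp
using System;
using System.Collections.Generic;
using System.Dynamic;

public static class Program
{
    public static void Main()
    {
        dynamic car = new ExpandoObject();
        car.Make = "Toyota";
        car.Year = 2005;

        // The dictionary view exposes the dynamically added members.
        var members = (IDictionary<string, object>)car;
        Console.WriteLine(members.ContainsKey("Year")); // True

        // Remove drops the member from the object.
        members.Remove("Year");
        Console.WriteLine(members.ContainsKey("Year")); // False
        Console.WriteLine(members.Count);               // 1
    }
}
```

After the removal, reading car.Year through the dynamic reference fails at run-time because the member no longer exists.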

DynamicObject

While ExpandoObject allows us to dynamically add and remove members at run-time, DynamicObject allows us to control that behavior.

public class DynamicObject : IDynamicMetaObjectProvider

Since DynamicObject doesn’t expose a public constructor (its only constructor is protected) we must create a derived class to take advantage of its features.  A side effect of this is that we can also define members directly on the class and the DLR will handle resolving them correctly.

using System;
using System.Collections.Generic;
using System.Dynamic;

public class DynamicCar : System.Dynamic.DynamicObject
{
	public DynamicCar()
	{
		Extensions = new System.Dynamic.ExpandoObject();
	}

	private ExpandoObject Extensions { get; set; }

	public string Make { get; set; }
	public string Model { get; set; }
	public int Year { get; set; }

	public override bool TryGetMember(GetMemberBinder binder, out object result)
	{
		string name = binder.Name.ToLower();
		Console.WriteLine("Getting: {0}", name);
		return (Extensions as IDictionary<string, object>).TryGetValue(name, out result);
	}

	public override bool TrySetMember(SetMemberBinder binder, object value)
	{
		var name = binder.Name.ToLower();

		Console.WriteLine("Setting: {0} -> {1}", name, value);
		(Extensions as IDictionary<string, object>)[name] = value;
		return true;
	}

	public override bool TryInvokeMember(InvokeMemberBinder binder, object[] args, out object result)
	{
		Console.WriteLine("Invoking: {0}", binder.Name);
		return base.TryInvokeMember(binder, args, out result);
	}
}

Once the type is defined we create and use instances just like any other dynamic type:

dynamic car = new DynamicCar()
{
	Make = "Toyota",
	Model = "Prius",
	Year = 2005
};

car.IsRunning = false;
car.TurnOn = (Action)(() => car.IsRunning = true);
car.TurnOff = (Action)(() => car.IsRunning = false);

Console.WriteLine("Make: {0}", car.Make);
Console.WriteLine("Model: {0}", car.Model);
Console.WriteLine("Year: {0}", car.Year);
Console.WriteLine("IsRunning: {0}", car.IsRunning);
car.TurnOn();
car.TurnOff();

DynamicObject Output

Notice how we are able to take advantage of object initializer syntax because the members we’re setting are defined on the class itself rather than being dynamic.  We can still access those members normally later on despite the variable being defined as dynamic.

The output shows how we’ve changed the behavior of the dynamic members while the static members are unaffected.  In this example actions affecting the dynamic members display a message.

Can’t Everything be Dynamic?

It’s true that there’s nothing preventing us from declaring everything as dynamic but it’s usually not a good idea in statically typed languages like C#.  In addition to losing all compile-time support that comes from statically typed languages, code that uses dynamic typing generally performs worse than code using static typing.  Generally speaking, only use dynamics when you have a good reason.