Not Another Regular Expression

I haven’t done anything with the System.Drawing namespace directly in a long time.  So long in fact that before today I honestly can’t remember the last time I needed anything in there.  When I needed to update the border color on an old ASP.NET DataGrid and the compiler informed me that I couldn’t use a hex string I was a bit surprised.  I needed a way to convert that string to a System.Drawing.Color.

In my haste, the first thing I did was start writing a method to parse the string and extract the integer values to pass to Color.FromArgb.  Because I needed to account for both the 3-digit and 6-digit formats, in uppercase or lowercase, with or without the leading hash, I started hacking out a regular expression.

I haven’t had much reason to use regular expressions for a long time either, but apparently (and amazingly) I can remember their syntax better than I can remember what’s in System.Drawing, because with minimal reference to the documentation this is what I came up with:

var re = new Regex(
    @"^#?(?([\dA-F]{3}$)(?<r>[\dA-F])(?<g>[\dA-F])(?<b>[\dA-F])|(?<r>[\dA-F]{2})(?<g>[\dA-F]{2})(?<b>[\dA-F]{2}))$",
    RegexOptions.IgnoreCase
);

As irritating and confusing as the syntax is I’m always amazed at how powerful regular expressions are.  There’s really quite a bit going on in this example so let’s take a look at what it’s matching.  I won’t talk about the RegexOptions piece because that should be pretty self-explanatory but otherwise we can break this one down into a few pieces starting with the most basic.

We start and end with the ^ and $ characters.  These anchor the match to the start and end of the input respectively (without RegexOptions.Multiline they apply to the whole string rather than to each line), so nothing may precede or follow the color code.  Immediately following the opening ^ we see the #? pattern, which says a valid match may begin with at most one # character.

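Throughout the expression we repeatedly see the [\dA-F] pattern.  On its own this pattern matches a single hexadecimal digit (0-9, A-F, and a-f as well because of RegexOptions.IgnoreCase).  Strictly speaking, \d in .NET also matches decimal digits from other Unicode scripts, so [0-9A-F] is the tighter way to write it.  When we need to match multiple consecutive hexadecimal digits we follow the pattern with a quantifier like {2} or {3}.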
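To see the anchors, the optional hash, and a quantified character class working together, here is a minimal sketch that checks only the 6-digit form.  The class and method names (HexDigitDemo, IsSixDigitHex) are made up for illustration and are not part of the original code.

```csharp
using System.Text.RegularExpressions;

public static class HexDigitDemo
{
    // ^ and $ anchor the match, #? allows one optional hash,
    // and [\dA-F]{6} requires exactly six hexadecimal digits.
    private static readonly Regex SixDigits =
        new Regex(@"^#?[\dA-F]{6}$", RegexOptions.IgnoreCase);

    public static bool IsSixDigitHex(string s)
    {
        return SixDigits.IsMatch(s);
    }
}

// HexDigitDemo.IsSixDigitHex("#ff8800")   -> true
// HexDigitDemo.IsSixDigitHex("FF8800")    -> true  (hash is optional)
// HexDigitDemo.IsSixDigitHex("##ff8800")  -> false (only one hash allowed)
// HexDigitDemo.IsSixDigitHex("#ff88")     -> false (too few digits)
// HexDigitDemo.IsSixDigitHex("x#ff8800")  -> false (^ rejects the leading x)
```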

The remaining constructs in the expression deal with groups and conditional matching (formally called alternation).  These constructs look similar and are closely related.  In this example we’re using two types of grouping patterns and an alternation pattern.  It’s probably best to start with the outermost construct and work our way in.

In this example the alternation construct follows the (?(expression)yes-part|no-part) syntax.  I like to think of this conditional matching construct as the regular expression version of the ternary operator.  The expression is a zero-width (non-advancing) assertion that determines whether the yes-part or the no-part pattern should be matched.  Most of the time a zero-width assertion begins with (?= but in this case the assertion is implied and the .NET regular expression parser allows us to omit the ?=.  In this example our zero-width assertion is ([\dA-F]{3}$).  That is, we’re evaluating whether the next thing in the string is exactly 3 hexadecimal digits followed by the end of the line.  In short, if the string is in the 3-digit format the parser will match the “yes” part; otherwise it will match the “no” part.  The reason we’re asserting the end of the line here too is that the first 3 digits of a 6-digit color would otherwise satisfy the assertion, and we want to ensure that a 6-digit color doesn’t fall into the “yes” part.

Note: Alternatively we could assert [\dA-F]{6} and swap the yes/no parts.
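To make the note concrete, here is a minimal sketch that puts the two forms side by side.  The names (ConditionalDemo, ThreeFirst, SixFirst, Agree) are made up for illustration, and the capture groups are left out to keep the patterns short.

```csharp
using System.Text.RegularExpressions;

public static class ConditionalDemo
{
    // Form used above: assert "3 hex digits, then end of input".
    // The 3-digit pattern is the yes-part, the 6-digit pattern the no-part.
    public static readonly Regex ThreeFirst = new Regex(
        @"^#?(?([\dA-F]{3}$)[\dA-F]{3}|[\dA-F]{6})$",
        RegexOptions.IgnoreCase);

    // Alternative from the note: assert 6 hex digits and swap the two parts.
    public static readonly Regex SixFirst = new Regex(
        @"^#?(?([\dA-F]{6})[\dA-F]{6}|[\dA-F]{3})$",
        RegexOptions.IgnoreCase);

    // True when both forms give the same accept/reject answer for s.
    public static bool Agree(string s)
    {
        return ThreeFirst.IsMatch(s) == SixFirst.IsMatch(s);
    }
}

// Both forms accept "#abc" and "abcdef", and both reject "abcd" and "abcdefa".
```

The second form doesn't need the $ inside the assertion: a 3-digit string can never satisfy a 6-digit assertion, and anything longer than 6 digits is still rejected by the final $.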

The “yes” and “no” parts are very similar in that they both consist of three named capturing groups: “r”, “g”, and “b”.  The named capturing groups are identified by the (?<name>pattern) syntax and instruct the parser to remember the values for later use, either within the pattern through backreferences or from C# via the Groups collection on the Match object.  Since we’ve already covered what the [\dA-F] pattern does we won’t go into detail here.  We just need to recognize that when we’re matching a 3-digit color we capture the individual digits, whereas when we have a 6-digit color we capture pairs of digits.  By using the same names in both parts our C# code can be completely ignorant of how the expression captured them.
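Reading the named groups back from C# might look something like the sketch below.  The wrapper class and method (HexColorGroups, TryGetComponents) are hypothetical names introduced for this example; the pattern is the one shown earlier.

```csharp
using System.Text.RegularExpressions;

public static class HexColorGroups
{
    private static readonly Regex HexColor = new Regex(
        @"^#?(?([\dA-F]{3}$)(?<r>[\dA-F])(?<g>[\dA-F])(?<b>[\dA-F])|(?<r>[\dA-F]{2})(?<g>[\dA-F]{2})(?<b>[\dA-F]{2}))$",
        RegexOptions.IgnoreCase);

    // Returns false for anything that is not a 3- or 6-digit hex color.
    // On success r, g and b hold the raw captured text (1 or 2 characters each).
    public static bool TryGetComponents(string input, out string r, out string g, out string b)
    {
        r = g = b = string.Empty;
        Match m = HexColor.Match(input);
        if (!m.Success)
        {
            return false;
        }

        // The same group names work no matter which branch matched.
        r = m.Groups["r"].Value;
        g = m.Groups["g"].Value;
        b = m.Groups["b"].Value;
        return true;
    }
}

// TryGetComponents("#F80", ...)   -> true, r = "F",  g = "8",  b = "0"
// TryGetComponents("336699", ...) -> true, r = "33", g = "66", b = "99"
// TryGetComponents("#12345", ...) -> false
```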

Note: Regular expressions also allow for unnamed capturing groups that can be referred to by their ordinal index.  Even though they add clutter to an already potentially confusing string I usually stick to the named capturing groups because they make it easier to remember which group I’m working with.

This regular expression did the trick nicely.  I was able to extract the individual color components from both 3-digit and 6-digit color codes and fail out of anything that didn’t match by checking the match’s Success property.  Unfortunately this was only part of the conversion process.  I still needed to convert the values from the 3-digit pattern over to their 6-digit equivalent and pass the integer values to Color.FromArgb.  At this point I got to thinking “there has to be an easier way” as though the regular expression wasn’t enough.
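For the record, the remaining steps could be sketched roughly as follows.  This is an illustration rather than the code I actually wrote: the names (HexColorParser, TryParse, ToComponent) are hypothetical, it assumes a 3-digit color expands by doubling each digit (so F80 becomes FF8800), and it swaps \d for 0-9 so that every captured character is something Convert.ToInt32 can parse as base 16.

```csharp
using System;
using System.Drawing;
using System.Text.RegularExpressions;

public static class HexColorParser
{
    // Same structure as the pattern above, with [0-9A-F] in place of [\dA-F].
    private static readonly Regex HexColor = new Regex(
        @"^#?(?([0-9A-F]{3}$)(?<r>[0-9A-F])(?<g>[0-9A-F])(?<b>[0-9A-F])|(?<r>[0-9A-F]{2})(?<g>[0-9A-F]{2})(?<b>[0-9A-F]{2}))$",
        RegexOptions.IgnoreCase);

    public static bool TryParse(string input, out Color color)
    {
        color = Color.Empty;
        if (input == null)
        {
            return false;
        }

        Match m = HexColor.Match(input);
        if (!m.Success)
        {
            return false;
        }

        color = Color.FromArgb(
            ToComponent(m.Groups["r"].Value),
            ToComponent(m.Groups["g"].Value),
            ToComponent(m.Groups["b"].Value));
        return true;
    }

    // A single digit from the 3-digit form is doubled ("F" -> "FF")
    // before being parsed as a base-16 number in the range 0-255.
    private static int ToComponent(string hex)
    {
        if (hex.Length == 1)
        {
            hex += hex;
        }
        return Convert.ToInt32(hex, 16);
    }
}

// TryParse("#F80", out c)   -> true, c.R = 255, c.G = 136, c.B = 0
// TryParse("336699", out c) -> true, c.R = 51,  c.G = 102, c.B = 153
// TryParse("#12345", out c) -> false
```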

No matter how far you have gone on a wrong road, turn back.
– Turkish Proverb

Remember that I said that I haven’t done anything with the System.Drawing namespace directly in a long time…  It turns out that there’s a ColorTranslator class in System.Drawing that provides a nice FromHtml method.  FromHtml takes a hex string and returns the equivalent System.Drawing.Color.  Problem solved.
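A minimal sketch of the call is below.  The wrapper (FromHtmlDemo.Parse) is a hypothetical name, and the example only exercises strings with a leading #; how FromHtml treats other inputs is not shown here.  On .NET Framework this needs a reference to the System.Drawing assembly.

```csharp
using System.Drawing;

public static class FromHtmlDemo
{
    // Thin wrapper so the call site is easy to see.
    public static Color Parse(string htmlColor)
    {
        return ColorTranslator.FromHtml(htmlColor);
    }
}

// FromHtmlDemo.Parse("#FF8800") -> R = 255, G = 136, B = 0
// FromHtmlDemo.Parse("#F80")    -> the same color; each digit is doubled
```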
