C#

Back to Basics: Streamlining the StringBuilder

To get back in the habit of blogging I thought I’d start by trying to breathe some new life into one of the oldest pieces of the .NET Framework – the StringBuilder. A few years ago I wrote about an aspect of the StringBuilder class that’s often overlooked – it’s fluent interface. Back then I speculated that one reason the fluent interface is so rarely used is that virtually every StringBuilder example, including those on MSDN, fail to highlight it by using a more imperative style. (more…)

Replicating F#’s using Function in C#

Update – 5 April 2014

I’ve reconsidered the approach discussed here. This post is still worth reading for the context and motivation behind creating a Using method in C# but the updated approach works better and passes static code analysis. You can find the updated approach at the address below:

https://davefancher.com/2014/04/05/revisiting-the-using-function/

Thanks!

It didn’t take long for me to really appreciate the power offered by F#’s using function. Properly managing IDisposable instances in C# isn’t particularly problematic but the using statement really doesn’t offer a whole lot in the way of flexibility.  Sure, the block lets you create and dispose some IDisposable instance and put some arbitrary code within it but even before I entered the world of functional programming I found the syntax clunky, particularly when I only want the instance around long enough to get some value from it so I can do something with that value later. The obvious solution to the problem is to just use F# instead but unfortunately that’s not always a viable option in this C# dominated market.

Assuming that I have to work in C# I could address the situation by just putting all the code in the using block.

// C#
using(var img = Image.FromFile(@"C:\Windows\Web\Screen\img100.png"))
{
  Console.WriteLine("Dimensions: {0} x {1}", img.Width, img.Height);
}

Granted, in this contrived example I’m only writing the dimensions to the console so the image would be disposed quickly but what if I needed those dimensions somewhere else? There are a few approaches to take and honestly, I don’t like any of them all that much.

The first way would be to forego using altogether.

// C#
var img = Image.FromFile(@"C:\Windows\Web\Screen\img100.png");
var dims = Tuple.Create(img.Width, img.Height);
img.Dispose();

// Do something with dims elsewhere

This approach is probably the cleanest and is very similar to a use binding in F# but it requires discipline to remember to manually dispose of the object. In C# I’m so conditioned to define IDisposables within using blocks though that I seldom take this approach. In order to do this same task with a using statement there are basically two options, define a variable outside of the block and assign it inside the block or define a method to wrap the using statement. I generally prefer the later because it facilitates reuse and eliminates a state change.

Variable approach

// C#
Tuple<int, int> dims;
using(var img = Image.FromFile(@"C:\Windows\Web\Screen\img100.png"))
{
  dims = Tuple.Create(img.Width, img.Height);
}

// Do something with dims elsewhere

Method approach

// C#
private Tuple<int, int> GetDimensions()
{
  using(var img = Image.FromFile(@"C:\Windows\Web\Screen\img100.png"))
  {
    return Tuple.Create(img.Width, img.Height);
  }
}

// Do something with dims elsewhere

Now let’s see how this same example would look in F# with the using function.

// F#
let dims = using (Image.FromFile(@"C:\Windows\Web\Screen\img100.png"))
                 (fun img -> (img.Width, img.Height))

Look how clean that is! Wouldn’t it be nice to have something like that in C#? You’ve probably guessed if from nothing else than the title of this article that it’s entirely possible. Right now you might be thinking that you could just reference FSharp.Core and call the function from the Operators module but you’ll quickly find that more trouble than it’s worth. The using function’s signature is:

// F#
val using : resource:'T -> action:('T -> 'U) (requires 'T :> System.IDisposable)

The function accepts two arguments: resource, a generic argument constrained to IDisposable, and action, a function that accepts 'T and returns another (unconstrained) generic type, 'U. That second argument is what would prove problematic if you tried to call the function from C# since it compiles to FSharpFunc<T, TResult> instead of Func<T, TResult>. Fortunately though it’s really easy to replicate the functionality natively in C#.

Due to differences between C# and F# the C# version of the Using function needs to be overloaded to accept either a Func or an Action depending on whether you’re returning a value. You’ll see though that in either case the function is just wrapping up the provided IDisposable instance inside a using statement and invoking the delegate, passing the IDisposable as an argument. To make the code accessible you’ll want to put the IDisposableHelper class in one of your common assemblies.

// C#
public static class IDisposableHelper
{
  public static TResult Using<TResource, TResult> (TResource resource, Func<TResource, TResult> action)
    where TResource : IDisposable
  {
    using (resource) return action(resource);
  }

  public static void Using<TResource> (TResource resource, Action<TResource> action)
    where TResource : IDisposable
  {
    using (resource) action(resource);
  }  
}

Using the functions isn’t quite as elegant as in F# but it definitely gets the job done in a much more functional manner.

Not returning a value

// C#
IDisposableHelper.Using(
  Image.FromFile(@"C:\Windows\Web\Screen\img100.png"),
  img => Console.WriteLine("Dimensions: {0} x {1}", img.Width, img.Height)
);

Returning a value

// C#
var dims =
  IDisposableHelper.Using(
    Image.FromFile(@"C:\Windows\Web\Screen\img100.png"),
    img => Tuple.Create(img.Width, img.Height)
  );

Console.WriteLine("Dimensions: {0} x {1}", dims.Item1, dims.Item2);

I’d still prefer to use F# but when I can’t at least I can turn to this to make C# feel a little more like home.

Why F#?

If you’ve been following along with my posts over the past six months or so you can probably imagine that I’ve been asked some variation of this post’s title more than a few times. One question that I keep getting is why I chose F# over some other functional languages like Haskell, Erlang, or Scala. The problem with that question though is that it’s predicated on the assumption that I actually set out to learn a functional language. The truth is that moving to F# was more of a long but natural progression from C# rather than a conscious decision.

The story begins about two and a half years ago. I had pretty much burned out and was deep into what I can only begin to describe as a stagnation coma. I stopped attending user group meetings; I cut way back on reading; I pretty much stopped doing anything that would help me remain even remotely aware of what was going on in the industry.

It’s easy to explain how I got to this point. My employer at the time gave very little incentive to stay current. For instance, there were homegrown frameworks for nearly every aspect of the product. Who needs NHibernate (or other ORM) when you have a proprietary DAL? Why learn ASP.NET MVC when you have a proprietary system for page layout? What’s the point of diving into WPF when the entire application is Web-based? Introducing new technologies or practices was generally discouraged under the banner of being too risky.

It wasn’t until the company hired a new architect that I started to wake up. He brought a wealth of knowledge of fascinating technologies that I’d hardly heard of and his excitement reminded me of what I loved about technology. My passion for software development was reigniting and I started looking at many of the technologies that had passed me by.

The first technology that really caught my attention during this time was LINQ. I’d consider it my introduction to the wonderful world of functional programming. As cool as I thought its query syntax was, I was really interested in the method syntax. I remember reading early on (although I don’t remember where) that query syntax was only added to C# after users complained that method syntax was too cumbersome in the preview versions. I didn’t understand this sentiment at all because to me method syntax felt completely natural. Gradually I began to realize that the reason it felt so natural was because it works the way I think. Lambda expressions, higher-order functions, composability, all of these functional concepts just spoke to me.

Over time I started using more of C#’s functional aspects but found myself getting increasingly frustrated. For the longest time though I couldn’t quite pinpoint anything specific but something just didn’t feel right anymore. It wasn’t until I was mowing the lawn late on a summer afternoon when I had that “a ha!” moment that changed my life.

That afternoon my yard work podcast selection included Hanselminutes #311. In this episode Richard Minerich and Phillip Trelford were talking about a functional language called F# that had been around for a few years and was built upon the .NET framework. I’d seen a few mentions of F# here and there but before hearing this podcast I hadn’t given it much thought.

At one point the discussion turned to language productivity and Phillip remarked that writing C# feels like completing government forms in triplicate. As he elaborated I experienced a sudden burst of clarity into one of the major things that had been bothering me about C# – it’s verbosity! After listening to the rest of the podcast and hearing more about how the functional nature of the language made it less error prone and how things like default immutability helped alleviate some of the complexity of parallel programming I knew I had to give it a try.

A language that doesn’t affect the way you think about programming, is not worth knowing.
— Alan Perlis, Epigrams on Programming

Despite my love of F# most of my work is still in C# but learning F# has had an amazing impact on how I write C#. C# has been becoming more of a functional language with virtually every new release and I’ve been using many of those capabilities for a few years but the language is hardly built around them. That said, forgetting about all the times I’ve typed let instead of var or tried to define a default constructor in the class signature in the last week alone F# really has changed the way I work. I find myself writing more higher-order functions and making much better use of delegation; I’ve developed a strong preference for readonly member fields and properties; and I regularly find myself longing for some of F#’s constructs like tuples, records, discriminated unions, and pattern matching.

The truth is that for all of its strengths though I’m finding working in C# increasingly annoying especially as I continue to work with F#. In many ways, working with C# feels like interacting with a toddler. I feel like I have to hold its hand and guide it along with explicit instructions on what to do every step of the way – even if I’ve already told it something. On the other hand, F# feels like having a personal assistant. Its functional nature allows me to more or less describe what I want and it handles the details.

There are plenty of things that make me prefer F# over C# but I’d like to highlight a few in particular. I’ve already written extensively about some of these and will be writing more about others as time permits but here I’d like to look at them from a more comparative angle.

Terse Syntax

Even though F# is a functional-first language I think a great way to illustrate the language’s expressiveness is with an object-oriented. We’ll start with a simple class definition in C#.

public class CircleMeasurements
{
  private double _diameter;
  private double _area;
  private double _circumference;

  public CircleMeasurements(double diameter, double area, double circumference)
  {
    _diameter = diameter;
    _area = area;
    _circumference = circumference;
  }

  public double Diameter { get { return _diameter; } }
  public double Area { get { return _area; } }
  public double Circumference { get { return _circumference; }  }
}

// Usage
var measurements = new CircleMeasurements(5.0, 19.63495408, 15.70796327);

Look how much code was required to create a class with three properties. Even in this small example we had to tell the compiler what type to use for each value three times! I could have used auto-implemented properties to simplify it a bit but even then we still need to tell the compiler which type to use twice for each value. Now let’s take a look at the same class in F#:

type CircleMeasurements(diameter, area, circumference) =
  member x.Diameter = diameter
  member x.Area = area
  member x.Circumference = circumference

// Usage
let measurements = CircleMeasurements(5.0, 19.63495408, 15.70796327);

That’s it – four lines of code. Granted this compiles to something that resembles the C# class but we didn’t have to write any of it and thanks to the fantastic type inference system we didn’t have to tell the compiler multiple times which type to use for each value. Quite often though even this is more verbose than we actually need. Many times we can use another F# construct – a record type – to represent something we’d represent with a class in other .NET languages:

type CircleMeasurements = { Diameter : float; Area : float; Circumference : float }

// Usage
let measurements = { Diameter = 5.0; Area = 19.63495408; Circumference = 15.70796327 }

In addition to being even more concise than the corresponding class definition, record types have the added benefit of being structurally comparable so we can easily check for equality between two instances. Record types do require us to include the type annotations but we only need to explicitly tell the compiler what to use once for each value and the constructor and properties are each created implicitly.

Functional Style

I already mentioned that functional programming feels more natural to me and by design F# really shines when it comes to expressing and using functions. Traditional .NET development has always had some type of support for delegation and it has definitely improved over the years, particularly with the common generic delegate classes (e.g.: Func, Action) and lambda expressions but actually trying to use them in a more functional style is a pain. This is complicated by the fact that in some situations the C# compiler can’t infer whether a lambda expression is a delegate or an expression tree. Although in some regards I prefer the C# lambda expression syntax I definitely prefer F#’s syntactic distinction between delegates and code quotations.

While on the topic of functional programming I have to mention F#’s default immutability. Immutability is key to any functional language and has been shown to improve overall program correctness by eliminating side effects. C# has some support for immutability through readonly fields or by omitting setters from property definitions but either of these require a conscious decision to enable. Nearly everything in F# is immutable unless explicitly declared otherwise. Immutability also provides benefits when writing asynchronous code because if nothing is changing, there’s a reduced need for locking.

Discriminated Unions

If you haven’t read my post on discriminated unions yet, they’re a way to express simple object hierarchies (or even tree structures) in a concise manner. From a syntactic perspective they’re incredibly simple but trying to replicate them even for simple structures really isn’t particularly feasible in C#. Here’s a simple example:

In this example suit is another discriminated union.

type Card =
| Ace of Suit
| King of Suit
| Queen of Suit
| Jack of Suit
| ValueCard of Suit * int

Using this discriminated union we express the 7 of hearts as ValueCard(Heart, 7). For illustration of what it would take to represent this structure in C# I’m including a screenshot of ILSpy’s decompilation. Note that I’ve only included the signatures and even this five case discriminated union more than fills my screen. Just to drive the point home, fully expanded, this code is nearly 700 lines long! Granted there are a few CompilerGeneratedAttributes in there but they hardly count for a majority of the code.

Ultimately, the union type is an abstract class and each case is a nested class. The nested classes are each assigned a unique tag that’s used for type checking and code branching in some of the union type’s methods. Not included in the screenshot are implementations of several interfaces and overrides of Equals and GetHashCode.

Decompiled Card Discriminated Union

Decompiled Card Discriminated Union

Collection Types & Comprehensions

C# has made great strides in regard to initializing various collection types but it still pales in comparison to the constructs offered by F#. Don’t get me wrong, collection initializers are a nice syntactic convenience but nearly every time I use it I think how much easier it would likely be with a comprehension. LINQ can address some of these shortcomings but even convenience methods like Enumerable.Range feel limiting. Yes, I could write some additional convenience methods to address some of the shortcomings but in F# I don’t have to.

Part of the beauty of comprehensions is that they generally apply regardless of which collection type you’re creating. Although each of the examples below create F# lists they can easily be modified to create sequences or arrays instead.

// Numbers 0 - 10
> [0..10];;
val it : int list = [0; 1; 2; 3; 4; 5; 6; 7; 8; 9; 10]

// Numbers 0 - 10 by 2
> [0..2..10];;
val it : int list = [0; 2; 4; 6; 8; 10]

// Charcters 'a' - 'z'
> ['a'..'z'];;
val it : char list =
  ['a'; 'b'; 'c'; 'd'; 'e'; 'f'; 'g'; 'h'; 'i'; 'j'; 'k'; 'l'; 'm'; 'n'; 'o';
   'p'; 'q'; 'r'; 's'; 't'; 'u'; 'v'; 'w'; 'x'; 'y'; 'z']

// Unshuffled deck
> let deck =
  [ for suit in [ Heart; Diamond; Spade; Club ] do
      yield Ace(suit)
      yield King(suit)
      yield Queen(suit)
      yield Jack(suit)
      for v in 2 .. 10 do
        yield ValueCard(suit, v)
  ];;

val deck : Card list =
  [Ace Heart; King Heart; Queen Heart; Jack Heart; ValueCard (Heart,2);
   ValueCard (Heart,3); ValueCard (Heart,4); ValueCard (Heart,5);
   ValueCard (Heart,6); ValueCard (Heart,7); ValueCard (Heart,8);
   ValueCard (Heart,9); ValueCard (Heart,10); Ace Diamond; King Diamond;
   Queen Diamond; Jack Diamond; ValueCard (Diamond,2); ValueCard (Diamond,3);
   ValueCard (Diamond,4); ValueCard (Diamond,5); ValueCard (Diamond,6);
   ValueCard (Diamond,7); ValueCard (Diamond,8); ValueCard (Diamond,9);
   ValueCard (Diamond,10); Ace Spade; King Spade; Queen Spade; Jack Spade;
   ValueCard (Spade,2); ValueCard (Spade,3); ValueCard (Spade,4);
   ValueCard (Spade,5); ValueCard (Spade,6); ValueCard (Spade,7);
   ValueCard (Spade,8); ValueCard (Spade,9); ValueCard (Spade,10); Ace Club;
   King Club; Queen Club; Jack Club; ValueCard (Club,2); ValueCard (Club,3);
   ValueCard (Club,4); ValueCard (Club,5); ValueCard (Club,6);
   ValueCard (Club,7); ValueCard (Club,8); ValueCard (Club,9);
   ValueCard (Club,10)]

Pattern Matching

faced with a 20th century computer
Scotty: Computer! Computer?
He’s handed a mouse, and he speaks into it
Scotty: Hello, computer.
Dr. Nichols: Just use the keyboard.
Scotty: Keyboard. How quaint.
— Star Trek IV

The above conversation enters my mind when I’m working with C#’s branching constructs, switch statements in particular.  F#’s pattern matching may bear a slight resemblance to C# switch statements but they’re so much more powerful.  switch statements limit us to simply branching on constant values but pattern matching allows value extraction, multiple cases, and refinement constraints for more precise control with a syntax much friendlier than your common if/else statement. Furthermore, like virtually everything else in F#, pattern matches are expressions so they return a value making them ideal candidates for inline conditional bindings.

Units of Measure

Every programmer works with code that uses different units of measure. In most languages dealing with units of measure is error prone because they require discipline from the developer to ensure that the correct units are always used. There are some libraries that attempt to address the problem but to my knowledge (please correct me if I’m wrong) F# is the only one to actually include it in the type system. F#’s unit of measure support is complete enough that it can often automatically convert values between units, particularly when a conversion expression is included with the type definition.

[<Measure>] type px
[<Measure>] type inch
[<Measure>] type dot = 1 px
[<Measure>] type dpi = dot / inch

let convertToPixels (inches : float<inch>) (resolution : float<dpi>) : float<px> =
  inches * resolution

let convertToInches (pixels : float<px>) (resolution : float<dpi>) : float<inch> =
  pixels / resolution

The biggest downfall of F#’s units of measure is that they’re a feature of the type system and compiler rather than a CLR feature. As such, the compiled code doesn’t have any unit information and we can’t enforce the units in cross-language scenarios.

Object Expressions

I haven’t written about object expressions yet but they’re definitely on my backlog. Object expressions provide a way to create ad-hoc (anonymous) types based on one or more interfaces or a base class. They’re useful in a variety of scenarios like creating one-off formatters or comparers and I think they can at least supplement, if not replace some mocking libraries. To illustrate we’ll use a somewhat contrived example of a logging service.

type LogMessageType =
| DebugMessage of string
| InfoMessage of string
| ErrorMessage of string

type ILogProvider =
  abstract member Log : LogMessageType -> unit

type LogService(provider : ILogProvider) =
  let log = provider.Log
  member this.LogDebug msg = DebugMessage(msg) |> log
  member this.LogInfo msg = InfoMessage(msg) |> log
  member this.LogError msg = ErrorMessage(msg) |> log

Here we use a discriminated union to define the message types the logging provider interface can handle. The log service itself provides a slightly friendlier API than the provider interface itself. Normally if we wanted to use the log service instance we’d have to define a concrete implementation of ILogProvider but object expressions allow us to easily define one inline.

let logger =
  LogService(
    { new ILogProvider with
      member this.Log msg =
        match msg with
        | DebugMessage(m) -> printfn "DEBUG: %s" m
        | InfoMessage(m) -> printfn "INFO: %s" m
        | ErrorMessage(m) -> printfn "ERROR: %s" m
    }
  )

In our object expression we use pattern matching to detect the message type, extract the associated string, and write an appropriate message to the console.

> logger.LogDebug "message"
logger.LogInfo "message"
logger.LogError "message";;
DEBUG: message
INFO: message
ERROR: message

What’s Not So Great?

I could continue on for a while with things I like about F# but this post is already long enough as it is. That said, I think it’s only fair to list out a few of the things I don’t like about the language.

Tooling Support

Probably the biggest gripe I have is the lack of tooling support around the language. So many tools and templates that I take for granted when working with C# simply aren’t available. Things like IntelliSense are pretty complete but if you’re just getting started with F# and looking to do more than a console application or library be prepared to spend some time looking for 3rd party templates and reading blog posts.

Language Interop

On a somewhat related note, even though F# compiles to MSIL and can reference or be referenced by other .NET assemblies there are some quirks that make language interoperability a but cumbersome. For instance, using extension methods defined in C# doesn’t work as cleanly as I thought they would. When I was experimenting with MassTransit I couldn’t get the UseMsmq, VerifyMsmqConfiguration, or a number of other extension methods to appear no matter what I tried. I ultimately had to call the static methods directly.

I’ve read that this is addressed in F# 3.0 but I haven’t done enough with 3.0 yet to confirm.

Project Structure

It’s not really fair to put this under the “what’s not so great?” heading but it seemed most appropriate. This isn’t so much an issue with the language as much as it’s a big mindset shift of a similar magnitude of switching from OO to functional. The structure of an F# project is significantly different than that of a C# (or even VB) project and is something I’m still struggling with.

In C# we generally organize code into folders representing namespaces and keep one type (class) per file. F# evaluates code from top down throughout the project so file sequence is significant. Furthermore, code is organized by modules rather than type. F# does have namespaces but even then they’re usually divided across one or more files and from my experience, not grouped by folder.

Wrap Up

No matter what language you work in, programming in a functional style provides benefits. You should do it whenever it is convenient, and you should think hard about the decision when it isn’t convenient.
— John Carmack, Functional Programming in C++

In general I’ve found that the more I learn and work with F# the more I like it. I regularly find myself reaching for it as my first choice, especially when it comes to new development. Although there are a few things that I don’t like about working in F# most of them just require more diligence on my part or are easily managed. I’ve only listed a few key areas where I think F# excels but I firmly believe that their strengths far outweigh any weaknesses.

Overload Resolution Oops

Earlier today I was observing the output of some calls to Debug.WriteLine when I decided that one of the messages was a little too verbose. Basically the message included a fully qualified name when just the class name would suffice. I found the line which originally read:

Debug.WriteLine("Storing Pending Events for {0}", aggregate.GetType());

and changed it to:

Debug.WriteLine("Storing Pending Events for {0}", aggregate.GetType().Name);

Upon rerunning the app I saw something that surprised me but really shouldn’t have. The message now read “Document: Storing Pending Events for {0}” instead of “Storing Pending Events for Document.” How could this be?

The issue came down to overload resolution. Debug.WriteLine has five overloads but we’re really only interested in the last two:

public static void WriteLine(Object value)
public static void WriteLine(string message)
public static void WriteLine(Object value, string category)
public static void WriteLine(string format, params Object[] args)
public static void WriteLine(string message, string category)

The final two overloads serve very different purposes but potentially conflict as I was observing. One of them writes a formatted message (using String.Format behind the scenes) whereas the other writes a message and category. It just so happens that changing the second argument from aggregate.GetType() to aggregate.GetType().Name resulted in the compiler selecting a different overload because the one that accepts two strings is a better match than the one that accepts the object params array. Had our message included two or more format arguments we’d have never seen this but because we happened to be passing a Type rather than a string we got the params version.

To resolve the problem I first wrapped the two arguments into a call to String.Format but of course ReSharper complained about a redundant call (apparently it also thought that params version would be called). Ultimately I just cast the name to object and moved on.

Debug.WriteLine("Storing Pending Events for {0}", (object)aggregate.GetType().Name);

Like I said, this really shouldn’t have surprised me but it did. Hopefully next time I’ll remember that there’s a potentially conflicting overload to watch out for.

C# 5.0 Breaking Changes

In the Language Lab section of the November 2012 issue of Visual Studio Magazine Patrick Steele highlights some of the lesser known changes to C#. Among the changes are some new attributes to help obtain caller information without having to resort to directly accessing StackFrames but that’s not what I want to call attention to. The more important part of his article are some breaking changes that anyone moving to C# 5 should be aware of.

The first of the breaking changes relate to capturing the value of an iteration variable in a lambda expression. If you’ve ever written a loop where the body contained a lambda expression that directly used the iteration variable you’ve encountered some unexpected behavior.  Consider Patrick’s example:

var computes = new List<Func<int>>();

foreach(var i in Enumerable.Range(0, 10))
{
	computes.Add(() => i * 2);
}

foreach(var func in computes)
{
	Console.WriteLine(func());
}

Without knowing the old behavior one could reasonably assume that the second loop would print out 0 – 18 (by 2s of course) but that’s not what happens. Prior to C# 5.0 deferring execution of the lambda expression to the second loop causes the expression to use the last value of i (9) so the number 18 is printed 10 times. We can observe similar behavior in LINQ as it iterates over sequences. The way to work around it was to create a state variable and capture it in a closure like in this modified example:

var computes = new List<Func<int>>();

foreach(var i in Enumerable.Range(0, 10))
{
	var state = i;
	computes.Add(() => state * 2);
}

foreach(var func in computes)
{
	Console.WriteLine(func());
}

Under C# 5.0 using a state variable is no longer necessary. The compiler will handle capturing the value of the iteration variable when it’s created.

The other breaking change relates to how named and positional arguments are handled. I typically only use explicit, ordered parameters so the old behavior never really affected me but previous versions of the compiler would evaluate named arguments before evaluating the ordered parameters. This behavior wasn’t particularly intuitive so it has been changed in C# 5.0. The only time this would really be a problem is when the expression being evaluated affected subsequent expression evaluations but since the change does affect compiler behavior it’s important to be aware of.

String Distances

One of my first projects at Leaf has been trying to match some data read from an OCR solution against what is in the database.  As sophisticated as OCR algorithms have become though it’s still not reliable enough to guarantee 100% accurate results every time due to the number of variations in type faces, artifacts introduced through scanning or faxing the document, or any number of other factors.

Most of the documents I’ve been working with have been pretty clean and I’ve been able to get an exact match automatically.  One of my samples though, has some security features that intentionally obfuscate some of the information I care about.  This naturally makes getting an exact match difficult.  Amazingly though, the OCR result was about 80% accurate so there was still some hope.

One of my coworkers suggested that I look at some of the string distance algorithms to see if any of them could help us get closer to an exact match.  He pointed me at the Levenshtein algorithm so I took a look at that along with the Hamming and Damerau-Levenshtein algorithms.

For the uninitiated (like myself a week ago), these algorithms provide a way to determine the distance between two strings.  The distance is essentially a measurement of string similarity. In other words, they calculate how many steps are required to transform one string into another.

I want to look briefly at each of these and show some examples. Note that each of these algorithms are case sensitive but modifying them to ignore case is trivial.

(more…)

Theoris Innovation Series

On April 20, 2012 from 1:00 – 4:00 PM Theoris IT Services is hosting the next installment of its Theoris Innovation Series.  For this event Alex Gheith and I will be discussing many of the modern features of C# including:

  • LINQ
  • Dynamic Programming
  • Parallel Programming (including the upcoming async and await keywords)

This is a free event but please note that space is limited to the first 40 respondents.  For more information, please check the event site.

JustDecompile – One More Time

I’ve looked at JustDecompile a few times over the past year, following its progress from its early beta stages.  When I saw that it was officially released yesterday (14 Feb 2012) I remembered telling Vladimir Dragoev from the JustDecompile team that I’d give it another look.  I’ve been very critical of this product in the past but given that everything I’ve looked at so far has been pre-release software it’s only fair that I give it one more chance to redeem itself.  Can this version finally handle my tests?

(more…)

Some C# Syntax Tricks

Christmas came a bit early for my team this year when we were told that the company had purchased licenses for both Reflector AND ReSharper!  Naturally I installed both right away.  Of course we’re all familiar with Reflector but haven’t used it lately since it’s no longer free.  ReSharper on the other hand is new to many of us.  I know lots of great developers that swear by it but I’ve never really had a chance to dive in and try it out.  I generally think I’m pretty savvy with C# syntax but after just two days ReSharper has already taught me a few syntax tricks that I’ve adopted.

(more…)

System.Diagnostics.Debugger

I hardly ever use the classes in the System.Diagnostics namespace.  As much as I’d like everyone to believe that it’s because I’m such a rockstar that I don’t need them, it’s really just that I generally use other techniques.  With Visual Studio providing so many tools for debugging I’ve rarely had reason to dig into this namespace much.  Sure, I’ve used the Debug, Trace, and EventLog classes but I haven’t taken the time to investigate what else is in there.

(more…)