F#

Bloomington .NET Society – June 27

I know I’ve been quiet for a few months but don’t worry, I haven’t disappeared. Instead I’ve been hard at work on an upcoming F# book! The book has consumed most of my time but not being one to pass up a chance to talk about my obsession I’m making the trip down to Bloomington, IN at the end of the month to talk to the Bloomington .NET Society.

If you’re in the Bloomington area on June 27th and interested in learning about F#, please join us. You can find the full meeting details on the group’s site: http://dotnet.indiana.edu/news/jun-2013-meeting.

I hope to see you there!

Replicating F#’s using Function in C#

Update – 5 April 2014

I’ve reconsidered the approach discussed here. This post is still worth reading for the context and motivation behind creating a Using method in C# but the updated approach works better and passes static code analysis. You can find the updated approach at the address below:

https://davefancher.com/2014/04/05/revisiting-the-using-function/

Thanks!

It didn’t take long for me to really appreciate the power offered by F#’s using function. Properly managing IDisposable instances in C# isn’t particularly problematic but the using statement really doesn’t offer a whole lot in the way of flexibility. Sure, the block lets you create and dispose some IDisposable instance and put some arbitrary code within it but even before I entered the world of functional programming I found the syntax clunky, particularly when I only want the instance around long enough to get some value from it so I can do something with that value later. The obvious solution to the problem is to just use F# instead but unfortunately that’s not always a viable option in this C# dominated market.

Assuming that I have to work in C# I could address the situation by just putting all the code in the using block.

// C#
using(var img = Image.FromFile(@"C:\Windows\Web\Screen\img100.png"))
{
  Console.WriteLine("Dimensions: {0} x {1}", img.Width, img.Height);
}

Granted, in this contrived example I’m only writing the dimensions to the console so the image would be disposed quickly but what if I needed those dimensions somewhere else? There are a few approaches to take and honestly, I don’t like any of them all that much.

The first way would be to forego using altogether.

// C#
var img = Image.FromFile(@"C:\Windows\Web\Screen\img100.png");
var dims = Tuple.Create(img.Width, img.Height);
img.Dispose();

// Do something with dims elsewhere

This approach is probably the cleanest and is very similar to a use binding in F# but it requires discipline to remember to manually dispose of the object. In C# I’m so conditioned to define IDisposables within using blocks though that I seldom take this approach. In order to do this same task with a using statement there are basically two options, define a variable outside of the block and assign it inside the block or define a method to wrap the using statement. I generally prefer the later because it facilitates reuse and eliminates a state change.

Variable approach

// C#
Tuple<int, int> dims;
using(var img = Image.FromFile(@"C:\Windows\Web\Screen\img100.png"))
{
  dims = Tuple.Create(img.Width, img.Height);
}

// Do something with dims elsewhere

Method approach

// C#
private Tuple<int, int> GetDimensions()
{
  using(var img = Image.FromFile(@"C:\Windows\Web\Screen\img100.png"))
  {
    return Tuple.Create(img.Width, img.Height);
  }
}

// Do something with dims elsewhere

Now let’s see how this same example would look in F# with the using function.

// F#
let dims = using (Image.FromFile(@"C:\Windows\Web\Screen\img100.png"))
                 (fun img -> (img.Width, img.Height))

Look how clean that is! Wouldn’t it be nice to have something like that in C#? You’ve probably guessed if from nothing else than the title of this article that it’s entirely possible. Right now you might be thinking that you could just reference FSharp.Core and call the function from the Operators module but you’ll quickly find that more trouble than it’s worth. The using function’s signature is:

// F#
val using : resource:'T -> action:('T -> 'U) (requires 'T :> System.IDisposable)

The function accepts two arguments: resource, a generic argument constrained to IDisposable, and action, a function that accepts 'T and returns another (unconstrained) generic type, 'U. That second argument is what would prove problematic if you tried to call the function from C# since it compiles to FSharpFunc<T, TResult> instead of Func<T, TResult>. Fortunately though it’s really easy to replicate the functionality natively in C#.

Due to differences between C# and F# the C# version of the Using function needs to be overloaded to accept either a Func or an Action depending on whether you’re returning a value. You’ll see though that in either case the function is just wrapping up the provided IDisposable instance inside a using statement and invoking the delegate, passing the IDisposable as an argument. To make the code accessible you’ll want to put the IDisposableHelper class in one of your common assemblies.

// C#
public static class IDisposableHelper
{
  public static TResult Using<TResource, TResult> (TResource resource, Func<TResource, TResult> action)
    where TResource : IDisposable
  {
    using (resource) return action(resource);
  }

  public static void Using<TResource> (TResource resource, Action<TResource> action)
    where TResource : IDisposable
  {
    using (resource) action(resource);
  }  
}

Using the functions isn’t quite as elegant as in F# but it definitely gets the job done in a much more functional manner.

Not returning a value

// C#
IDisposableHelper.Using(
  Image.FromFile(@"C:\Windows\Web\Screen\img100.png"),
  img => Console.WriteLine("Dimensions: {0} x {1}", img.Width, img.Height)
);

Returning a value

// C#
var dims =
  IDisposableHelper.Using(
    Image.FromFile(@"C:\Windows\Web\Screen\img100.png"),
    img => Tuple.Create(img.Width, img.Height)
  );

Console.WriteLine("Dimensions: {0} x {1}", dims.Item1, dims.Item2);

I’d still prefer to use F# but when I can’t at least I can turn to this to make C# feel a little more like home.

IndySA – March 21, 2013

The March IndySA meeting is this Thursday. I’m excited for the opportunity to spread around a bit more F# love as this month’s speaker. If you’re looking for a fun way to fill the evening please join us at the SEP office in Carmel at 5:30 PM. All of the logistics details are available on the meetup site.

I hope to see you there!

About the Talk

F# Needs Love Too

Originally developed by Microsoft Research, Cambridge, F# is an open-source, functional-first language in the ML family. Despite its lofty position as a first-class Visual Studio language for the past two releases and its cross-platform availability it hasn’t seen widespread adoption in the business world. In this talk we’ll take an introductory look at F#, exploring how its constructs and terse syntax can allow you to write more stable, maintainable code while keeping you focused on the problem rather than the plumbing.

Custom Dark Colors for F# Depth Colorizer for VS2012

A few days ago I installed the F# Depth Colorizer extension for Visual Studio 2012. I really liked the idea but didn’t care much for the default colors used with the dark theme. Rather than alternating light and dark colors I thought it would look better with the background getting progressively lighter giving the illusion of each block being stacked on its container.

After a little tweaking I created the necessary registry entry and was pretty pleased with the result. The image to the right shows the colors against the same code snippet used in the extension’s description. If you’d like to use these colors just copy the registry information below into a .reg file and apply it. You’ll need to restart Visual Studio for the changes to take effect.

Windows Registry Editor Version 5.00

[HKEY_CURRENT_USER\Software\Microsoft\VisualStudio\11.0\Text Editor\FSharpDepthColorizer\Dark]
"Depth0"="0,0,0,0,0,0"
"Depth1"="15,15,15,15,15,15"
"Depth2"="30,30,30,30,30,30"
"Depth3"="45,45,45,45,45,45"
"Depth4"="55,55,55,55,55,55"
"Depth5"="65,65,65,65,65,65"
"Depth6"="75,75,75,75,75,75"
"Depth7"="80,80,80,80,80,80"
"Depth8"="85,85,85,85,85,85"
"Depth9"="90,90,90,90,90,90"

More information about the extension including how to customize the colors is available on Brian McNamara’s blog.

Why F#?

If you’ve been following along with my posts over the past six months or so you can probably imagine that I’ve been asked some variation of this post’s title more than a few times. One question that I keep getting is why I chose F# over some other functional languages like Haskell, Erlang, or Scala. The problem with that question though is that it’s predicated on the assumption that I actually set out to learn a functional language. The truth is that moving to F# was more of a long but natural progression from C# rather than a conscious decision.

The story begins about two and a half years ago. I had pretty much burned out and was deep into what I can only begin to describe as a stagnation coma. I stopped attending user group meetings; I cut way back on reading; I pretty much stopped doing anything that would help me remain even remotely aware of what was going on in the industry.

It’s easy to explain how I got to this point. My employer at the time gave very little incentive to stay current. For instance, there were homegrown frameworks for nearly every aspect of the product. Who needs NHibernate (or other ORM) when you have a proprietary DAL? Why learn ASP.NET MVC when you have a proprietary system for page layout? What’s the point of diving into WPF when the entire application is Web-based? Introducing new technologies or practices was generally discouraged under the banner of being too risky.

It wasn’t until the company hired a new architect that I started to wake up. He brought a wealth of knowledge of fascinating technologies that I’d hardly heard of and his excitement reminded me of what I loved about technology. My passion for software development was reigniting and I started looking at many of the technologies that had passed me by.

The first technology that really caught my attention during this time was LINQ. I’d consider it my introduction to the wonderful world of functional programming. As cool as I thought its query syntax was, I was really interested in the method syntax. I remember reading early on (although I don’t remember where) that query syntax was only added to C# after users complained that method syntax was too cumbersome in the preview versions. I didn’t understand this sentiment at all because to me method syntax felt completely natural. Gradually I began to realize that the reason it felt so natural was because it works the way I think. Lambda expressions, higher-order functions, composability, all of these functional concepts just spoke to me.

Over time I started using more of C#’s functional aspects but found myself getting increasingly frustrated. For the longest time though I couldn’t quite pinpoint anything specific but something just didn’t feel right anymore. It wasn’t until I was mowing the lawn late on a summer afternoon when I had that “a ha!” moment that changed my life.

That afternoon my yard work podcast selection included Hanselminutes #311. In this episode Richard Minerich and Phillip Trelford were talking about a functional language called F# that had been around for a few years and was built upon the .NET framework. I’d seen a few mentions of F# here and there but before hearing this podcast I hadn’t given it much thought.

At one point the discussion turned to language productivity and Phillip remarked that writing C# feels like completing government forms in triplicate. As he elaborated I experienced a sudden burst of clarity into one of the major things that had been bothering me about C# – it’s verbosity! After listening to the rest of the podcast and hearing more about how the functional nature of the language made it less error prone and how things like default immutability helped alleviate some of the complexity of parallel programming I knew I had to give it a try.

A language that doesn’t affect the way you think about programming, is not worth knowing.
— Alan Perlis, Epigrams on Programming

Despite my love of F# most of my work is still in C# but learning F# has had an amazing impact on how I write C#. C# has been becoming more of a functional language with virtually every new release and I’ve been using many of those capabilities for a few years but the language is hardly built around them. That said, forgetting about all the times I’ve typed let instead of var or tried to define a default constructor in the class signature in the last week alone F# really has changed the way I work. I find myself writing more higher-order functions and making much better use of delegation; I’ve developed a strong preference for readonly member fields and properties; and I regularly find myself longing for some of F#’s constructs like tuples, records, discriminated unions, and pattern matching.

The truth is that for all of its strengths though I’m finding working in C# increasingly annoying especially as I continue to work with F#. In many ways, working with C# feels like interacting with a toddler. I feel like I have to hold its hand and guide it along with explicit instructions on what to do every step of the way – even if I’ve already told it something. On the other hand, F# feels like having a personal assistant. Its functional nature allows me to more or less describe what I want and it handles the details.

There are plenty of things that make me prefer F# over C# but I’d like to highlight a few in particular. I’ve already written extensively about some of these and will be writing more about others as time permits but here I’d like to look at them from a more comparative angle.

Terse Syntax

Even though F# is a functional-first language I think a great way to illustrate the language’s expressiveness is with an object-oriented. We’ll start with a simple class definition in C#.

public class CircleMeasurements
{
  private double _diameter;
  private double _area;
  private double _circumference;

  public CircleMeasurements(double diameter, double area, double circumference)
  {
    _diameter = diameter;
    _area = area;
    _circumference = circumference;
  }

  public double Diameter { get { return _diameter; } }
  public double Area { get { return _area; } }
  public double Circumference { get { return _circumference; }  }
}

// Usage
var measurements = new CircleMeasurements(5.0, 19.63495408, 15.70796327);

Look how much code was required to create a class with three properties. Even in this small example we had to tell the compiler what type to use for each value three times! I could have used auto-implemented properties to simplify it a bit but even then we still need to tell the compiler which type to use twice for each value. Now let’s take a look at the same class in F#:

type CircleMeasurements(diameter, area, circumference) =
  member x.Diameter = diameter
  member x.Area = area
  member x.Circumference = circumference

// Usage
let measurements = CircleMeasurements(5.0, 19.63495408, 15.70796327);

That’s it – four lines of code. Granted this compiles to something that resembles the C# class but we didn’t have to write any of it and thanks to the fantastic type inference system we didn’t have to tell the compiler multiple times which type to use for each value. Quite often though even this is more verbose than we actually need. Many times we can use another F# construct – a record type – to represent something we’d represent with a class in other .NET languages:

type CircleMeasurements = { Diameter : float; Area : float; Circumference : float }

// Usage
let measurements = { Diameter = 5.0; Area = 19.63495408; Circumference = 15.70796327 }

In addition to being even more concise than the corresponding class definition, record types have the added benefit of being structurally comparable so we can easily check for equality between two instances. Record types do require us to include the type annotations but we only need to explicitly tell the compiler what to use once for each value and the constructor and properties are each created implicitly.

Functional Style

I already mentioned that functional programming feels more natural to me and by design F# really shines when it comes to expressing and using functions. Traditional .NET development has always had some type of support for delegation and it has definitely improved over the years, particularly with the common generic delegate classes (e.g.: Func, Action) and lambda expressions but actually trying to use them in a more functional style is a pain. This is complicated by the fact that in some situations the C# compiler can’t infer whether a lambda expression is a delegate or an expression tree. Although in some regards I prefer the C# lambda expression syntax I definitely prefer F#’s syntactic distinction between delegates and code quotations.

While on the topic of functional programming I have to mention F#’s default immutability. Immutability is key to any functional language and has been shown to improve overall program correctness by eliminating side effects. C# has some support for immutability through readonly fields or by omitting setters from property definitions but either of these require a conscious decision to enable. Nearly everything in F# is immutable unless explicitly declared otherwise. Immutability also provides benefits when writing asynchronous code because if nothing is changing, there’s a reduced need for locking.

Discriminated Unions

If you haven’t read my post on discriminated unions yet, they’re a way to express simple object hierarchies (or even tree structures) in a concise manner. From a syntactic perspective they’re incredibly simple but trying to replicate them even for simple structures really isn’t particularly feasible in C#. Here’s a simple example:

In this example suit is another discriminated union.

type Card =
| Ace of Suit
| King of Suit
| Queen of Suit
| Jack of Suit
| ValueCard of Suit * int

Using this discriminated union we express the 7 of hearts as ValueCard(Heart, 7). For illustration of what it would take to represent this structure in C# I’m including a screenshot of ILSpy’s decompilation. Note that I’ve only included the signatures and even this five case discriminated union more than fills my screen. Just to drive the point home, fully expanded, this code is nearly 700 lines long! Granted there are a few CompilerGeneratedAttributes in there but they hardly count for a majority of the code.

Ultimately, the union type is an abstract class and each case is a nested class. The nested classes are each assigned a unique tag that’s used for type checking and code branching in some of the union type’s methods. Not included in the screenshot are implementations of several interfaces and overrides of Equals and GetHashCode.

Decompiled Card Discriminated Union

Collection Types & Comprehensions

C# has made great strides in regard to initializing various collection types but it still pales in comparison to the constructs offered by F#. Don’t get me wrong, collection initializers are a nice syntactic convenience but nearly every time I use it I think how much easier it would likely be with a comprehension. LINQ can address some of these shortcomings but even convenience methods like Enumerable.Range feel limiting. Yes, I could write some additional convenience methods to address some of the shortcomings but in F# I don’t have to.

Part of the beauty of comprehensions is that they generally apply regardless of which collection type you’re creating. Although each of the examples below create F# lists they can easily be modified to create sequences or arrays instead.

// Numbers 0 - 10
> [0..10];;
val it : int list = [0; 1; 2; 3; 4; 5; 6; 7; 8; 9; 10]

// Numbers 0 - 10 by 2
> [0..2..10];;
val it : int list = [0; 2; 4; 6; 8; 10]

// Charcters 'a' - 'z'
> ['a'..'z'];;
val it : char list =
  ['a'; 'b'; 'c'; 'd'; 'e'; 'f'; 'g'; 'h'; 'i'; 'j'; 'k'; 'l'; 'm'; 'n'; 'o';
   'p'; 'q'; 'r'; 's'; 't'; 'u'; 'v'; 'w'; 'x'; 'y'; 'z']

// Unshuffled deck
> let deck =
  [ for suit in [ Heart; Diamond; Spade; Club ] do
      yield Ace(suit)
      yield King(suit)
      yield Queen(suit)
      yield Jack(suit)
      for v in 2 .. 10 do
        yield ValueCard(suit, v)
  ];;

val deck : Card list =
  [Ace Heart; King Heart; Queen Heart; Jack Heart; ValueCard (Heart,2);
   ValueCard (Heart,3); ValueCard (Heart,4); ValueCard (Heart,5);
   ValueCard (Heart,6); ValueCard (Heart,7); ValueCard (Heart,8);
   ValueCard (Heart,9); ValueCard (Heart,10); Ace Diamond; King Diamond;
   Queen Diamond; Jack Diamond; ValueCard (Diamond,2); ValueCard (Diamond,3);
   ValueCard (Diamond,4); ValueCard (Diamond,5); ValueCard (Diamond,6);
   ValueCard (Diamond,7); ValueCard (Diamond,8); ValueCard (Diamond,9);
   ValueCard (Diamond,10); Ace Spade; King Spade; Queen Spade; Jack Spade;
   ValueCard (Spade,2); ValueCard (Spade,3); ValueCard (Spade,4);
   ValueCard (Spade,5); ValueCard (Spade,6); ValueCard (Spade,7);
   ValueCard (Spade,8); ValueCard (Spade,9); ValueCard (Spade,10); Ace Club;
   King Club; Queen Club; Jack Club; ValueCard (Club,2); ValueCard (Club,3);
   ValueCard (Club,4); ValueCard (Club,5); ValueCard (Club,6);
   ValueCard (Club,7); ValueCard (Club,8); ValueCard (Club,9);
   ValueCard (Club,10)]

Pattern Matching

faced with a 20th century computer
Scotty: Computer! Computer?
He’s handed a mouse, and he speaks into it
Scotty: Hello, computer.
Dr. Nichols: Just use the keyboard.
Scotty: Keyboard. How quaint.
— Star Trek IV

The above conversation enters my mind when I’m working with C#’s branching constructs, switch statements in particular. F#’s pattern matching may bear a slight resemblance to C# switch statements but they’re so much more powerful. switch statements limit us to simply branching on constant values but pattern matching allows value extraction, multiple cases, and refinement constraints for more precise control with a syntax much friendlier than your common if/else statement. Furthermore, like virtually everything else in F#, pattern matches are expressions so they return a value making them ideal candidates for inline conditional bindings.

Units of Measure

Every programmer works with code that uses different units of measure. In most languages dealing with units of measure is error prone because they require discipline from the developer to ensure that the correct units are always used. There are some libraries that attempt to address the problem but to my knowledge (please correct me if I’m wrong) F# is the only one to actually include it in the type system. F#’s unit of measure support is complete enough that it can often automatically convert values between units, particularly when a conversion expression is included with the type definition.

[<Measure>] type px
[<Measure>] type inch
[<Measure>] type dot = 1 px
[<Measure>] type dpi = dot / inch

let convertToPixels (inches : float<inch>) (resolution : float<dpi>) : float<px> =
  inches * resolution

let convertToInches (pixels : float<px>) (resolution : float<dpi>) : float<inch> =
  pixels / resolution

The biggest downfall of F#’s units of measure is that they’re a feature of the type system and compiler rather than a CLR feature. As such, the compiled code doesn’t have any unit information and we can’t enforce the units in cross-language scenarios.

Object Expressions

I haven’t written about object expressions yet but they’re definitely on my backlog. Object expressions provide a way to create ad-hoc (anonymous) types based on one or more interfaces or a base class. They’re useful in a variety of scenarios like creating one-off formatters or comparers and I think they can at least supplement, if not replace some mocking libraries. To illustrate we’ll use a somewhat contrived example of a logging service.

type LogMessageType =
| DebugMessage of string
| InfoMessage of string
| ErrorMessage of string

type ILogProvider =
  abstract member Log : LogMessageType -> unit

type LogService(provider : ILogProvider) =
  let log = provider.Log
  member this.LogDebug msg = DebugMessage(msg) |> log
  member this.LogInfo msg = InfoMessage(msg) |> log
  member this.LogError msg = ErrorMessage(msg) |> log

Here we use a discriminated union to define the message types the logging provider interface can handle. The log service itself provides a slightly friendlier API than the provider interface itself. Normally if we wanted to use the log service instance we’d have to define a concrete implementation of ILogProvider but object expressions allow us to easily define one inline.

let logger =
  LogService(
    { new ILogProvider with
      member this.Log msg =
        match msg with
        | DebugMessage(m) -> printfn "DEBUG: %s" m
        | InfoMessage(m) -> printfn "INFO: %s" m
        | ErrorMessage(m) -> printfn "ERROR: %s" m
    }
  )

In our object expression we use pattern matching to detect the message type, extract the associated string, and write an appropriate message to the console.

> logger.LogDebug "message"
logger.LogInfo "message"
logger.LogError "message";;
DEBUG: message
INFO: message
ERROR: message

What’s Not So Great?

I could continue on for a while with things I like about F# but this post is already long enough as it is. That said, I think it’s only fair to list out a few of the things I don’t like about the language.

Tooling Support

Probably the biggest gripe I have is the lack of tooling support around the language. So many tools and templates that I take for granted when working with C# simply aren’t available. Things like IntelliSense are pretty complete but if you’re just getting started with F# and looking to do more than a console application or library be prepared to spend some time looking for 3rd party templates and reading blog posts.

Language Interop

On a somewhat related note, even though F# compiles to MSIL and can reference or be referenced by other .NET assemblies there are some quirks that make language interoperability a but cumbersome. For instance, using extension methods defined in C# doesn’t work as cleanly as I thought they would. When I was experimenting with MassTransit I couldn’t get the UseMsmq, VerifyMsmqConfiguration, or a number of other extension methods to appear no matter what I tried. I ultimately had to call the static methods directly.

I’ve read that this is addressed in F# 3.0 but I haven’t done enough with 3.0 yet to confirm.

Project Structure

It’s not really fair to put this under the “what’s not so great?” heading but it seemed most appropriate. This isn’t so much an issue with the language as much as it’s a big mindset shift of a similar magnitude of switching from OO to functional. The structure of an F# project is significantly different than that of a C# (or even VB) project and is something I’m still struggling with.

In C# we generally organize code into folders representing namespaces and keep one type (class) per file. F# evaluates code from top down throughout the project so file sequence is significant. Furthermore, code is organized by modules rather than type. F# does have namespaces but even then they’re usually divided across one or more files and from my experience, not grouped by folder.

Wrap Up

No matter what language you work in, programming in a functional style provides benefits. You should do it whenever it is convenient, and you should think hard about the decision when it isn’t convenient.
— John Carmack, Functional Programming in C++

In general I’ve found that the more I learn and work with F# the more I like it. I regularly find myself reaching for it as my first choice, especially when it comes to new development. Although there are a few things that I don’t like about working in F# most of them just require more diligence on my part or are easily managed. I’ve only listed a few key areas where I think F# excels but I firmly believe that their strengths far outweigh any weaknesses.

Revisiting String Distances

Back in April I wrote about three different string distance algorithms. If you’re not familiar with string distance algorithms I encourage you to read the original post but in short, these algorithms provide different ways to determine how many steps are required to translate one string into another. Since the algorithms I wrote about aren’t particularly complex I thought translating them to F# would be a fun exercise.

In general the code is pretty similar to its C# counterpart but I was able to take advantage of a few F# constructs, tuples and pattern matching in particular. These examples may or may not be the best way to implement the algorithms in F# (ok, I’ll be honest, they’re probably not) but they certainly show how well suited F# is to this type of work.

Hamming Distance

The Hamming distance algorithm simply counts how many characters are different in equal length strings.

let hammingDistance (source : string) (target : string) =
  if source.Length <> target.Length then failwith "Strings must be equal length"
 
  Array.zip (source.ToCharArray()) (target.ToCharArray())
  |> Array.fold (fun acc (x, y) -> acc + (if x = y then 0 else 1)) 0

hammingDistance "abcde" "abcde" |> printfn "%i" // 0
hammingDistance "abcde" "abcdz" |> printfn "%i" // 1
hammingDistance "abcde" "abcyz" |> printfn "%i" // 2
hammingDistance "abcde" "abxyz" |> printfn "%i" // 3
hammingDistance "abcde" "awxyz" |> printfn "%i" // 4
hammingDistance "abcde" "vwxyz" |> printfn "%i" // 5

Levenshtein Distance

The Levenshtein distance algorithm counts how many characters are different between any two strings taking into account character insertions and deletions.

open System
let levenshteinDistance (source : string) (target : string) =
  let wordLengths = (source.Length, target.Length)
  let matrix = Array2D.create ((fst wordLengths) + 1) ((snd wordLengths) + 1) 0

  for c1 = 0 to (fst wordLengths) do
    for c2 = 0 to (snd wordLengths) do
      matrix.[c1, c2] <-
        match (c1, c2) with
        | h, 0 -> h
        | 0, w -> w
        | h, w ->
          let sourceChar, targetChar = source.[h - 1], target.[w - 1]
          let cost = if sourceChar = targetChar then 0 else 1
          let insertion = matrix.[h, w - 1] + 1
          let deletion = matrix.[h - 1, w] + 1
          let subst = matrix.[h - 1, w - 1] + cost
          Math.Min(insertion, Math.Min(deletion, subst))
  matrix.[fst wordLengths, snd wordLengths]

levenshteinDistance "abcde" "abcde" |> printfn "%i" // 0
levenshteinDistance "abcde" "abcdz" |> printfn "%i" // 1
levenshteinDistance "abcde" "abcyz" |> printfn "%i" // 2
levenshteinDistance "abcde" "abxyz" |> printfn "%i" // 3
levenshteinDistance "abcde" "awxyz" |> printfn "%i" // 4
levenshteinDistance "abcde" "vwxyz" |> printfn "%i" // 5
levenshteinDistance "abcdefg" "abcde" |> printfn "%i" // 2
levenshteinDistance "abcdefg" "zxyde" |> printfn "%i" // 5
levenshteinDistance "abcde" "bacde" |> printfn "%i" // 2
levenshteinDistance "abcde" "baced" |> printfn "%i" // 4

Damerau-Levenshtein Distance

The Damerau-Levenshtein algorithm builds upon the Levenshtein algorithm by taking transpositions into account.

open System
let damerauLevenshteinDistance (source : string) (target : string) =
  let wordLengths = (source.Length, target.Length)
  let matrix = Array2D.create ((fst wordLengths) + 1) ((snd wordLengths) + 1) 0

  for h = 0 to (fst wordLengths) do
    for w = 0 to (snd wordLengths) do
      matrix.[h, w] <-
        match (h, w) with
        | x, 0 -> x
        | 0, y -> y
        | x, y ->
          let sourceChar, targetChar = source.[x - 1], target.[y - 1]
          let cost = if sourceChar = targetChar then 0 else 1
          let insertion = matrix.[x, y - 1] + 1
          let deletion = matrix.[x - 1, y] + 1
          let subst = matrix.[x - 1, y - 1] + cost

          let distance = Math.Min(insertion, Math.Min(deletion, subst))

          if x > 1 && w > 1 && sourceChar = target.[y - 2] && source.[x - 2] = targetChar then
            Math.Min(distance, matrix.[x - 2, y - 2] + cost)
          else
            distance
  matrix.[fst wordLengths, snd wordLengths]

damerauLevenshteinDistance "abcde" "abcde" |> printfn "%i" // 0
damerauLevenshteinDistance "abcde" "abcdz" |> printfn "%i" // 1
damerauLevenshteinDistance "abcde" "abcyz" |> printfn "%i" // 2
damerauLevenshteinDistance "abcde" "abxyz" |> printfn "%i" // 3
damerauLevenshteinDistance "abcde" "awxyz" |> printfn "%i" // 4
damerauLevenshteinDistance "abcde" "vwxyz" |> printfn "%i" // 5
damerauLevenshteinDistance "abcdefg" "abcde" |> printfn "%i" // 2
damerauLevenshteinDistance "abcdefg" "zxyde" |> printfn "%i" // 5
// Transpositions
damerauLevenshteinDistance "abcde" "bacde" |> printfn "%i" // 1
damerauLevenshteinDistance "abcde" "baced" |> printfn "%i" // 2

F# Enumerations

I was originally going to include this discussion about F# enumerations in the discriminated unions post because their syntax is so similar but the section quickly took on a life of its own. Enumerations in F# are the same as enumerations from other .NET languages so all of the same capabilities and restrictions apply.

Defining Enumerations

Defining an enumeration in F# is much like defining a discriminated union but here we bind an integral value. Unlike in C# though, F# doesn’t automatically generate values for each item so we need to give every case a value.

type DayOfWeek =
| Sunday = 0
| Monday = 1
| Tuesday = 2
| Wednesday = 3
| Thursday = 4
| Friday = 5
| Saturday = 6

Changing Base Type

We can also change the enumeration’s underlying type by changing the suffix of each value. For example, if we wanted to change the underlying type for the DayOfWeek enumeration to byte we could do so by appending a y to each value as follows:

type DayOfWeekByte =
| Sunday = 0y
| Monday = 1y
| Tuesday = 2y
| Wednesday = 3y
| Thursday = 4y
| Friday = 5y
| Saturday = 6y

Referencing Enumeration Values

As is the case with enumerations in other .NET languages enumeration values are referenced through the enumeration name.

> DayOfWeek.Friday
val it : DayOfWeek = Friday

FlagsAttribute

Also like in other .NET languages, F# enumerations support the FlagsAttribute to indicate that each value in the enumeration represents a bit position.

open System

[<Flags>]
type Border =
| None = 0
| Right = 1
| Top = 2
| Left = 4
| Bottom = 8;;

let showBorder = Border.Bottom ||| Border.Left

To determine whether a particular flag is set we can either use bit-math or we can use the enumeration’s HasFlag method. Since I generally prefer the code I don’t have to write so I generally favor built-in approach:

> showBorder.HasFlag Border.Top;;
val it : bool = false

> showBorder.HasFlag Border.Left;;
val it : bool = true

Conversion

For those times when we only have the underlying integer value and want to convert that value to the corresponding enum value we can use the built-in generic enum function:

> enum<DayOfWeek> 5
val it : DayOfWeek = Friday

Note: The signature of the enum function above is int32 -> 'U so it can only be used with enumerations based on int32. If your enumeration is based on another data type such as with the DayOfWeekByte example above you’ll need to use the generic EnumOfValue function from the Microsoft.FSharp.Core.LanguagePrimitives module instead.

Likewise, we can get the underlying value by calling the corresponding conversion function:

int DayOfWeek.Friday
// val it : int = 5

byte DayOfWeekByte.Friday
// val it : byte = 5uy

F# Discriminated Unions

When I started learning F# discriminated unions were probably the first thing that really sold me on the language only partially because there’s nothing like them in C#. From a traditional .NET background a quick glance at them, particularly simple ones, may lead you to believe that they’re just glorified enumerations but in reality, discriminated unions are so much more powerful. While enumerations are limited to a single, fixed integral value but applications for discriminated unions go far beyond that.

Like enumerations, discriminated unions allow us to define values that can only be one of a specified set of named cases but that’s where the similarities end. Discriminated unions allow us to associate zero or more data items of any type with each case. The effect is that we can easily define tree structures, simple object hierarchies, and even leverage pattern matching.

Discriminated Unions as Object Hierarchies

I think the easiest way to get into discriminated unions is to start off with an example. In my studies of the language I’ve become quite fond of the example I first encountered in Programming F# where discriminated unions and a list comprehension are used to define a standard, unshuffled card deck. This example does a great job illustrating not only simple, “dataless” unions but also shows how we can associate multiple data items and even combine discriminated unions to achieve more expressive code.

We start by defining a discriminated union for each suit. The suits don’t have any data associated with them so this discriminated union simply includes the case names.

type Suit =
| Heart
| Diamond
| Spade
| Club

The second discriminated union defines the individual cards. Each card belongs to a particular suit and non-face cards will also include their denomination.

type Card =
| Ace of Suit
| King of Suit
| Queen of Suit
| Jack of Suit
| ValueCard of Suit * int

With just these two constructs we can define a value representing any card in a standard deck. For instance, if we wanted to represent the Ace of Spades, or 7 of Hearts we’d write Ace(Spade) and ValueCard(Heart, 7), respectively.

We don’t need to include to the actual discriminated union name when referring to a case. Should there be a naming conflict we’re free to include the union name with dot syntax (Suit.Heart) or we could instruct the compiler to force a qualified name with the RequireQualifiedAccess attribute.

If you were paying attention during the Card example you probably noticed a familiar syntax. When multiple data elements are associated with a case (such as with ValueCard) we use the tuple syntax to identify each element’s data type. While this allows for a lot of flexibility it also brings along some of the drawbacks of tuples such as not being able to define names for each value. We’ll look at one way to work around this limitation a little later when we discuss pattern matching.

The tuples in discriminated unions are syntactic tuples. The types the compiler generates for each case are nested classes that derive from the discriminated union type. The data elements are exposed as read-only properties following the same conventions as tuples but there aren’t any actual tuple class instances involved.

Now that we’re armed with some basic knowledge of discriminated unions we can use a list comprehension to easily construct a full, unshuffled deck.

let deck =
  [ for suit in [ Heart; Diamond; Spade; Club ] do
      yield Ace(suit)
      yield King(suit)
      yield Queen(suit)
      yield Jack(suit)
      for v in 2 .. 10 do
        yield ValueCard(suit, v)
  ]

Option Type

No introductory discussion about discriminated unions would be complete without at least a brief mention of the option type. The option type is a predefined discriminated union (Microsoft.FSharp.Core.FSharpOption) that’s useful for indicating when a value may or may not exist (remember: null generally isn’t used in F#). There are only two cases in the option type: Some(_) and None. Because the type itself is an unconstrained generic type, its cases can be used to indicate the presence of any value.

Although options are generally referenced in pattern matching expressions there’s an associated option module that contains several useful functions including Option.get, Option.bind, and Option.exists.

Discriminated Unions as Trees

A really nice aspect of discriminated unions is that they can be self-referencing. This makes creating simple tree structures trivial. To illustrate let’s create a very rudimentary HTML structure.

type Markup =
| Element of string * Markup list
| EmptyElement of string
| Content of string

Note that the Element case has two associated pieces of data: a string (the element name), and a list of additional Markup instances. From just these three cases we can quickly represent a simple HTML document:

let htmlTree =
  Element("html",
    [
      Element("head", [ Element("title", [ Content("Very Simple HTML") ]) ])
      Element("body",
        [
          Element("article",
            [
              Element("h1", [ Content("Discriminated Unions") ])
              Element("p",
                [
                  Content("See how ")
                  Element("b", [ Content("easy") ])
                  Content(" this is?")
                ])
              Element("p", [ Content("F#"); EmptyElement("br"); Content("is fun") ])
            ])
        ])
    ])

Discriminated Unions in Pattern Matching

We’ve seen how discriminated unions can be used to easily define both object hierarchies and tree structures. Like with so many things in F# though that power is amplified when combined with pattern matching. With pattern matching we can not only can we provide conditional logic for operating against each case but we can also work around the limitation I mentioned earlier where each associated data item is assigned a less than useful name.

Let’s expand upon the Markup example from the previous section to illustrate. Here we’ll define a recursive function to convert the htmlTree value to an HTML string.

let rec toHtml m : string =
  match m with
  | Element (tag, children) ->
      let sb = System.Text.StringBuilder()
      let append (s : string) = sb.Append s |> ignore

      append (sprintf "<%s>" tag)
      List.iter (fun c -> append (c |> toHtml)) children
      append (sprintf"</%s>" tag)
      sb.ToString()
  | EmptyElement tag -> sprintf "<%s />" tag
  | Content v -> sprintf "%s" v;;

htmlTree |> toHtml
> val it : string =
  "<html><head><title>Very Simple HTML</title></head><body><article><h1>Discriminated Unions</h1><p>See how <b>easy</b> this is?</p><p>F#<br />is fun</p></article></body></html>"

In this example we have a pattern for each Markup case. Each case is covered by a pattern so there’s no need to introduce the wildcard (_) pattern here. Each pattern includes an identifier for each data item associated with the case. The identifiers allow us to refer to each item with something meaningful rather than the default name. Identifiers are also useful within when constraints to provide more granular control over what matches a given pattern. For instance if we wanted to apply different handling for p elements we could add the pattern | Element (tag, children) when tag = "p" -> ... to the beginning of the match expression.

Members in Discriminated Unions

Just as with record types, discriminated unions ultimately compile down to classes. As such, we can expand upon them with additional members such as methods. Continuing with the Markup it makes sense to keep the toHtml method with the Markup type so let’s combine them.

To convert the function to a method we only need to make a few small tweaks. First, we’ll eliminate the argument since we’ll be operating against the current instance. We’ll also add a type annotation to the delegate used by the List.iter function.

type Markup =
| Element of string * Markup list
| EmptyElement of string
| Content of string
  member x.toHtml() : string =
    match x with
    | Element (tag, children) ->
        let sb = System.Text.StringBuilder()
        let append (s : string) = sb.Append s |> ignore

        append (sprintf "<%s>" tag)
        List.iter (fun (c : Markup) -> append (c.toHtml())) children
        append (sprintf"</%s>" tag)
        sb.ToString()
    | EmptyElement tag -> sprintf "<%s />" tag
    | Content v -> sprintf "%s" v

Note that the member function is no longer marked as recursive. This is because recursion is implicitly allowed for members.

Now that the toHtml function is a method of the Markup type we can call it like we would a method on any other type:

htmlTree.toHtml()
> 
val it : string =
  "<html><head><title>Very Simple HTML</title></head><body><article><h1>Discriminated Unions</h1><p>See how <b>easy</b> this is?</p><p>F#<br />is fun</p></article></body></html>"

Wrapping Up

Discriminated unions are a highly useful construct with applications far beyond that of enumerations. Whether they’re being used to construct an object hierarchy or represent a tree structure discriminated unions can greatly enhance the clarity of your code.

Sometimes a discriminated union is bit overkill for the problem space and an enumeration is all that’s needed. In the next post we’ll explore how to define enumerations in F#.

Concatenating PDFs with F# and PdfSharp

In order to test part some new functionality I’m working on I needed to combine a bunch of somewhat arbitrary documents into one. At first I thought about just re-scanning the documents but the scanner attached to my PC is a single-sheet flatbed. I also thought about running them through our sheet-feed scanner but it’s attached to my counterpart’s PC, not shared, and I didn’t want to bother him with it. I’ve been working a bit with the PdfSharp library lately so I decided to do the programmer thing and write a script to concatenate the documents. Of course, this would be a perfect chance to flex my developing F# muscles and came up with what I think is a pretty decent solution.

Before taking a look at the script there are a few things to note:

The script is intended to be run from within a folder that contains the documents that you want to concatenate. This is mainly because I had a bunch of documents I wanted to combine and including the full path to each one quickly became unwieldy.
I don’t check that files exist or that what’s supplied is actually a PDF. PdfSharp will throw an exception in those cases
I needed something quick so I hard-coded the destination file name as Result.pdf.
fsi.CommandLineArgs includes the name of the script as the first array item. The easiest thing I could think of for getting a list that didn’t include the script name was to create a new list with Array.toList and grab the Tail.
The PdfPages class is built around IEnumerable rather than IEnumerable<T> so I needed to use a comprehension to build the pages list.

Script:

// PdfConcat.fsx

#r "PdfSharp.dll"

open System;
open System.IO
open PdfSharp.Pdf
open PdfSharp.Pdf.IO

let readPages (sourceFileName : string) =
  use source = PdfReader.Open(sourceFileName, PdfDocumentOpenMode.Import)
  [ for p in source.Pages -> p ]

let createDocFromPages pages =
  let targetDoc = new PdfDocument()
  pages |> List.iter (fun p -> targetDoc.Pages.Add p |> ignore)
  targetDoc

let docNames = (Array.toList fsi.CommandLineArgs).Tail
let workingDirectory = Environment.CurrentDirectory;
let targetFileName = Path.Combine(workingDirectory, "Result.pdf")

let allPages =
  [ for n in docNames -> Path.Combine(workingDirectory, n) ]
  |> List.map readPages
  |> List.concat

let doc = createDocFromPages allPages
doc.Save targetFileName
doc.Dispose()

Usage:

D:\ScannedDocuments>fsi D:\Dev\FSharp\PdfConcat\PdfConcat.fsx "07-09-2012 05;29;44PM.PDF" "07-11-2012 08;47;13PM.PDF" "07-11-2012 08;48;24PM.PDF" "07-11-2012 08;50;27PM.PDF"

F# Record Types

In the last post we talked about tuples and how they’re a convenient construct for grouping data. For all their convenience though sometimes tuples aren’t quite powerful enough. Sometimes we want to name the values with something more descriptive than fst or snd. Sometimes we may even want to attach additional members to the type. In these cases we may be inclined to fall back on traditional OOP and create a class – OOP is one of the paradigms supported by F#, after all. In some cases a class may be appropriate but F# offers a syntactically lighter-weight alternative – record types.

Although at their core record types are classes in MSIL but in F# they differ from classes in a few important ways. First, record types have structural equality rather than reference equality. This is achieved by generating overrides for CompareTo, GetHashCode, and Equals. Next, every label is automatically exposed as a read-only property. Finally, record types give us no control over the constructor.

Defining and Creating Record Types

Defining a record type is much easier than defining a class in F#. In keeping with the circle measurements theme from the tuple post, we can create a record type to contain the same measurements as the tuple as follows:

> type CircleMeasurements = { Diameter : float; Area : float; Circumference : float; };;

type CircleMeasurements =
  {Diameter: float;
   Area: float;
   Circumference: float;}

Each of the names within the record definition (the names between the curly braces) are called labels and they must have a type annotation. If we wanted to put each label on its own line we could omit the semicolons.

Creating instances of record types is accomplished with record expressions. With basic record expressions we assign a value to each label:

> let measurements = { Diameter = 5.0; Area = 19.63495408; Circumference = 15.70796327 };;

val measurements : CircleMeasurements = {Diameter = 5.0;
                                         Area = 19.63495408;
                                         Circumference = 15.70796327;}

Just as with defining the type, we can omit the semicolons between the values if we place them each on a separate line.

Conflict Resolution

Occasionally we’ll run across multiple record types with the same labels. When this happens, the type inference engine can’t resolve the proper record type and we need to provide a little more information to the compiler to ensure that the correct type is selected. Consider a system that has both favorites and bookmarks defined as record types:

> type Favorite = { Name : string; Url : string };;

type Favorite =
  {Name: string;
   Url: string;}

> type Bookmark = { Name : string; Url : string };;

type Bookmark =
  {Name: string;
   Url: string;}

You can see that both of these record types has the labels Name and Url. If we were to create an instance as we saw above the compiler will infer that we want the Bookmark type because it was evaluated more recently than Favorite.

> let mySite = { Name = "Didactic Code"; Url = "http://davefancher.com" };;

val mySite : Bookmark = {Name = "Didactic Code";
                         Url = "http://davefancher.com";}

If we really need to use the Favorite type instead of Bookmark we need to explicitly identify the it by prefixing it on the first label:

> let mySite = { Favorite.Name = "Didactic Code"; Url = "http://davefancher.com" };;

val mySite : Favorite = {Name = "Didactic Code";
                         Url = "http://davefancher.com";}

Copy & Update

As with everything else in F#, record types are immutable by default. The implication for record types is that if we want to make a change to an existing instance we need to copy that instance and provide new values where appropriate. Record expressions not only allow creating new instances from scratch but also provide a convenient copy & update syntax. Consider the Favorite instance from the last section. If we needed to change the Url value we could use the copy & update syntax and specify mySite as the template:

> let myOtherSite = { mySite with Url = "http://www.davefancher.com" };;

val myOtherSite : Favorite = {Name = "Didactic Code";
                              Url = "http://www.davefancher.com";}

With copy & update syntax the system will copy forward any values not explicitly assigned. If we were changing multiple values we’d follow the same pattern, separating each label/value pair with a semicolon.

Mutability

If we actually do want to allow changes to a record we can easily make individual labels mutable by prefixing each label we want to make mutable with the mutable keyword. For instance, we can change the type definition for the Favorite type to allow changing the Name as follows:

> type Favorite = { mutable Name : string; Url : string };;

type Favorite =
  {mutable Name: string;
   Url: string;}

Creating an instance of the Favorite type is the same as we’ve already seen:

> let mySite = { Name = "My Blog"; Url = "http://davefancher.com" };;

val mySite : Favorite = {Name = "My Blog";
                         Url = "http://davefancher.com";}

Changing the mutable value is the same as changing any other mutable value in F#:

> mySite.Name <- "Didactic Code";;

val it : unit = ()

> mySite;;

val it : Favorite = {Name = "Didactic Code";
                     Url = "http://davefancher.com";}

Additional Members

Like classes, record types can have additional members such as methods defined on them. If we wanted to extend our Favorites type to include a method to open the associated page we could do so with a simple Visit method that invokes Process.Start.

> open System.Diagnostics;;
> type Favorite =
    { Name : string; Url : string }
    member this.Visit() = Process.Start("IExplore.exe", this.Url);;

type Favorite =
  {Name: string;
   Url: string;}
  with
    member Visit : unit -> Process
  end

Invoking the method is just like invoking a method on any other type in F#:

> let mySite = { Name = "Didactic Code"; Url = "http://davefancher.com" };;

val mySite : Favorite = {Name = "Didactic Code";
                         Url = "http://davefancher.com";}

// Remove the pipe forward to ignore to see process details
mySite.Visit() |> ignore

Pattern Matching

One area that I really haven’t devoted enough space to so far is F#’s pattern matching. I plan to explore pattern matching in much more detail in a future post so for now, if you’re not already familiar with the concept it’s enough to think of it as a much more powerful version of C#’s switch statements (or VB’s select case).

When using record types with pattern matching we need to use a record pattern. Record patterns deconstruct the record into it’s components so we can work with them individually. Record patterns don’t require us to use every value from the record either so we’re free to only include the relevant parts.

Identifying a point’s quadrant in a two-dimensional coordinate plane lets us see most of these concepts in action.

> type Point = { X : float; Y : float }

let FindQuadrant point =
  match point with
  | { X = 0.0; Y = 0.0 } -> "origin"
  | { X = 0.0 } -> "y-axis"
  | { Y = 0.0 } -> "x-axis"
  | p when p.X > 0.0 && p.Y > 0.0 -> "I"
  | p when p.X < 0.0 && p.Y > 0.0 -> "II"
  | p when p.X < 0.0 && p.Y < 0.0 -> "III"
  | p when p.X > 0.0 && p.Y < 0.0 -> "IV"
  | _ -> failwith "unknown";;

type Point =
  {X: float;
   Y: float;}
val FindQuadrant : Point -> string

In this example we see matching against both the X and Y coordinate to determine if the point is the origin. We then see matching against either the X or Y coordinate to determine if the point is on an axis. With the special cases accounted for we move on to using when conditions to check for the individual quadrants. Finally, we have a wildcard pattern that uses failwith to throw an exception.

Wrapping Up

Record types provide a concise and convenient syntax for quickly creating types with named values and optional members.

In the next post we’ll explore another one of F#’s powerful constructs, the discriminated union, and see how it can help us quickly create simple object hierarchies.

Rate this:

Spread The Word:

Rate this:

Spread The Word:

Rate this:

Spread The Word:

Rate this:

Spread The Word:

Terse Syntax

Functional Style

Discriminated Unions

Collection Types & Comprehensions

Pattern Matching

Units of Measure

Object Expressions

What’s Not So Great?

Tooling Support

Language Interop

Project Structure

Wrap Up

Rate this:

Spread The Word:

Hamming Distance

Levenshtein Distance

Damerau-Levenshtein Distance

Rate this:

Spread The Word:

Defining Enumerations

Changing Base Type

Referencing Enumeration Values

FlagsAttribute

Conversion

Rate this:

Spread The Word:

Discriminated Unions as Object Hierarchies

Option Type

Discriminated Unions as Trees

Discriminated Unions in Pattern Matching

Members in Discriminated Unions

Wrapping Up

Rate this:

Spread The Word:

Rate this:

Spread The Word:

Defining and Creating Record Types

Conflict Resolution

Copy & Update

Mutability

Additional Members

Pattern Matching

Wrapping Up

Rate this:

Spread The Word: