LINQed Up (Part 3)

This is the third part of a series intended as an introduction to LINQ.  The series should provide a good enough foundation to allow the uninitiated to begin using LINQ in a productive manner.  In this post we’ll take a more detailed look at composing queries through a variety of examples and introduce some new concepts such as deferred execution.

In the previous post we covered some of the common query methods and introduced both method and query syntax.  By the end of this post you should have a solid foundation to begin composing your own queries within your applications.  The examples will use either method or query syntax depending upon their complexity.  Before we see the examples though we should take a closer look at query syntax.

Diving Into Query Syntax

As mentioned in the previous post, query syntax allows us to compose LINQ expressions in a SQL-like manner.  This is accomplished through several new or repurposed keywords that the compiler converts into their method syntax equivalents at compile time.

Note: Not every method in the System.Linq.Enumerable class has a query syntax equivalent.

The main keywords used by query syntax are:

  • from
  • where
  • select
  • join
  • orderby [ascending | descending]
  • group by
  • let

Although the purpose of most of these keywords should be pretty clear, it’s worth taking a closer look at each of them.

from

The from keyword identifies the sequence that will serve as the primary source for the query and declares the range variable that represents each element of it.  Every query built using query syntax must begin with a from clause.  By placing the from clause first we get full IntelliSense support throughout the rest of the query.  Any type that implements IEnumerable<T> can be used in the from clause.

Some queries will contain multiple from clauses.  The compiler translates each additional from clause into a call to SelectMany, which flattens multiple sequences into one.

where

LINQ query syntax provides a where clause that serves the same basic function as its SQL counterpart: filtering.  In the LINQ form we also get the benefit of strong typing and IntelliSense.

Note: The criteria provided in the where clause can include any number of .NET operations but may be limited depending on the capabilities of the LINQ provider.

select

The select clause is used to project query results into a new sequence.  Select is capable of returning single values, well-known types, or anonymous types.  Select is the workhorse behind one of LINQ’s most powerful features: transforming data from one structure to another.

join

LINQ is capable of joining multiple sequences together much like SQL can join tables or views.  Part of what makes LINQ so powerful is its ability to join data from disparate sources in a single query.  This allows us to join simple .NET collections to XML or data returned from LINQ to SQL or Entity Framework.

One restriction on joins in LINQ is that they require an equality comparison.  To eliminate any ambiguity about what can be used, Microsoft introduced the equals keyword in place of the == operator.

orderby

LINQ allows sorting in query syntax through the use of the orderby keyword.  By default items are sorted in ascending order, but that can be controlled through the ascending and descending keywords.  To sort on multiple values, separate the sort keys with commas.

group by

Grouping in LINQ works a bit differently than in SQL.  In SQL we simply specify one or more column names in the group by clause.  LINQ instead projects each group into a System.Linq.IGrouping<TKey, TElement>, which exposes the grouping value through its Key property and contains all of the members of that group.  Since IGrouping<TKey, TElement> implements IEnumerable<TElement>, each group can be enumerated directly or used as the source of another query.  Grouping can be performed upon either a data value or a calculated value.

let

The let keyword is a bit of an oddity.  Unlike the other query syntax keywords, let doesn’t correspond to a query method of its own; the compiler implements it with a Select that carries the new value along with the range variable.  Instead, let is provided for convenience to declare query-scoped variables, eliminating the need for repetitive operations within a query.
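As a small illustration of let, the sketch below uses a hypothetical array of words (not the library data introduced later) and stores each word’s length in a query-scoped variable so it is computed once and reused by both the where and select clauses.

```csharp
using System;
using System.Linq;

// Hypothetical sample data, unrelated to the GetLibrary() examples.
var words = new[] { "LINQ", "query", "deferred", "IGrouping", "let" };

// 'let' introduces the query-scoped variable 'length', computed once per word
// and then reused by both the where clause and the select clause.
var longWords = from w in words
                let length = w.Length
                where length > 4
                select new { Word = w, Length = length };

foreach (var item in longWords)
{
    Console.WriteLine($"{item.Word}: {item.Length}");
}
```

Without let, w.Length would have to be written out in both clauses; for a cheap property that hardly matters, but for an expensive calculation it avoids doing the work twice.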

LINQ Examples

Now that we’ve covered how LINQ works, its common methods, and its syntax, we should have a good foundation for seeing some examples of LINQ in action and understanding what is happening.  In the following sections we’ll see how all of the pieces we’ve discussed so far fit together to make LINQ a powerful tool for solving a wide variety of set-based problems.

Each of the examples assumes the following data structures and data:

class Author
{
	public Author(string firstName, string lastName)
	{
		FirstName = firstName;
		LastName = lastName;
	}

	public string FirstName { get; set; }
	public string LastName { get; set; }
}

class Book
{
	public Book(int id, string title, string isbn13, int copyrightYear, params Author[] authors)
	{
		ID = id;
		Title = title;
		Isbn13 = isbn13;
		CopyrightYear = copyrightYear;
		Authors = authors;
	}

	public int ID { get; set; }
	public string Title { get; set; }
	public string Isbn13 { get; set; }
	public int CopyrightYear { get; set; }
	public IEnumerable<Author> Authors { get; set; }
}

IEnumerable<Book> GetLibrary()
{
	return new List<Book>()
	{
		new Book(
			1,
			"Essential LINQ",
			"978-0-321-56416-0",
			2009,
			new Author("Charlie", "Calvert"),
			new Author("Dinesh", "Kulkarni")
		),
		new Book(
			2,
			"Programming F#",
			"978-0-596-15364-9",
			2010,
			new Author("Chris", "Smith")
		),
		new Book(
			3,
			"C# 4.0 in a Nutshell",
			"978-0-596-80095-6",
			2010,
			new Author("Joseph", "Albahari"),
			new Author("Ben", "Albahari")
		),
		new Book(
			4,
			"WPF in Action with Visual Studio 2008",
			"978-1-933-98822-1",
			2009,
			new Author("Arlen", "Feldman"),
			new Author("Maxx", "Daymon")
		),
	};
}

Note: I recommend using LINQPad to run these examples. If using LINQPad, remember to change the language option to C# Program. If using Visual Studio to run the examples, make sure your code file includes using directives for System.Linq, System.Collections.Generic, and (for the XML examples) System.Xml.Linq.

Starting Simple

This first example illustrates the most basic form of a LINQ statement.  Consider it the “Hello World” example.

var query = from b in GetLibrary()
			select b;

Even in this example we can see several of the LINQ concepts coming into play. First, query is defined using the var keyword indicating type inference. The compiler understands that GetLibrary() returns IEnumerable<Book> and the select clause is projecting instances of Book so in this case the inferred type of query is IEnumerable<Book>. We also see how the from clause defines the range variable and appears before the select clause.

Given how simple this query is, LINQ isn’t really adding anything here. The same sequence can be obtained by simply calling GetLibrary() since all we’re doing is retrieving each item in the sequence.

Transformations

LINQ includes extension methods for converting sequences to lists, dictionaries, and arrays, but it is by no means restricted to those types. Through the select clause (or the Select method) we can return either a new instance of a well-known type or define a new anonymous type on the fly.

Transform to Array

var arr = GetLibrary().ToArray();
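The same family of conversion methods includes ToList and ToDictionary. A minimal sketch, using a hypothetical array of (Id, Title) tuples in place of GetLibrary() so it runs on its own, might look like this:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical stand-in for GetLibrary(): just an ID and a title per book.
var books = new[]
{
    (Id: 1, Title: "Essential LINQ"),
    (Id: 2, Title: "Programming F#"),
    (Id: 3, Title: "C# 4.0 in a Nutshell"),
};

// ToList copies the projected titles into a new List<string>.
List<string> titles = books.Select(b => b.Title).ToList();

// ToDictionary builds a lookup: the first lambda selects the key and the
// second selects the value. A duplicate key would throw an ArgumentException.
Dictionary<int, string> titlesById = books.ToDictionary(b => b.Id, b => b.Title);

Console.WriteLine(titles.Count);   // 3
Console.WriteLine(titlesById[2]);  // Programming F#
```

Like ToArray, both methods enumerate the source immediately and copy the results, so later changes to the source are not reflected in the list or dictionary.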

Transform to XML

To really demonstrate the transformation capabilities we need to see a more complex example. Here we’ll transform the entire structure returned by GetLibrary() into a well-formed XML document with LINQ to XML.

var query = new XElement("library",
	from b in GetLibrary()
	select new XElement("book",
		new XAttribute("title", b.Title),
		new XAttribute("isbn_13", b.Isbn13),
		new XAttribute("copyright", b.CopyrightYear),
		new XElement("authors",
			from a in b.Authors
			select new XElement("author",
				new XText(String.Format("{0}, {1}", a.LastName, a.FirstName))
			)
		)
	)
);

Basic Filtering

Much of LINQ’s real power comes from the operations it can perform against sequences, so what better place to start than with its filtering capabilities?  Prior to LINQ, if we wanted to operate against a subset of a sequence we needed to nest potentially complex logic in the body of a loop. With LINQ we can filter the sequence down to only the elements we care about before entering the loop.

var query = from b in GetLibrary()
			where b.ID == 1
			select b;

This example illustrates how easy it is to reduce a sequence to only the elements we care about. Here we’re reducing the full sequence down to only those elements with an ID of 1. Although it isn’t necessarily obvious, this query introduces a lambda expression. Let’s look at the same query using method syntax to bring out the lambda.

var query = GetLibrary().Where(b => b.ID == 1);

In either case we’ll get an IEnumerable<Book> that contains only a single item. In situations like this, where we know we’ll only get one item or we just want the first no matter how many items are returned, we can turn to the First() or FirstOrDefault() extension methods. Each of these methods has an overload that accepts a lambda expression to filter the sequence it acts upon. We can simplify the above queries to just get the single book we care about; note that the result is a Book (or null if no book matches) rather than a sequence:

var query = GetLibrary().FirstOrDefault(b => b.ID == 1);

Filtering is also not limited to a single value nor is it restricted to values contained directly in a sequence.

var query = from b in GetLibrary()
			where b.CopyrightYear == 2009 && b.Authors.Count() > 1
			select b;

The example above reduces the sequence to only those books with a copyright year of 2009 and multiple authors. While the collection of authors is contained within each book, the number of authors is determined by calling the Count() extension method.

Deferred Execution

Having seen a few queries and the role that lambda expressions play in them, we have a perfect opportunity to explore deferred execution. In most of the above examples we assign a query to a variable named query, but the query is not actually executed until it is enumerated. That is, even though we have defined the query, we don’t actually determine which items make up the sequence until we force enumeration, either by using it in a loop or by calling a method such as ToArray() or FirstOrDefault() that enumerates it. To observe the effect consider the next example:

var title = "LINQ";

var query = from b in GetLibrary()
			where b.Title.Contains(title)
			select b;

title = "F#";

var book = query.FirstOrDefault();

The query references a local string variable named title that was initialized to “LINQ” before the query was defined. We then change the value of title to “F#” before retrieving the first item from the sequence. Because the lambda expression captures the variable itself rather than its value at the time the query was defined, title isn’t read until the query executes, so we get the book “Programming F#” rather than “Essential LINQ.”

Deferred execution is an important concept to understand in LINQ to Objects and critical for LINQ to SQL and Entity Framework.
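The flip side of deferred execution is that methods such as ToList, ToArray, Count, and FirstOrDefault enumerate a query immediately. The sketch below, which uses a simple list of integers rather than the library data, contrasts a deferred query with a ToList snapshot taken before the source changes.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

var numbers = new List<int> { 1, 2, 3 };

// Deferred: calling Where only describes the query; nothing is filtered yet.
IEnumerable<int> deferred = numbers.Where(n => n > 1);

// Immediate: ToList enumerates the source right now and copies the matches.
List<int> snapshot = numbers.Where(n => n > 1).ToList();

// Change the source after both queries have been defined.
numbers.Add(4);

Console.WriteLine(deferred.Count());  // 3: evaluated against the current list (2, 3, 4)
Console.WriteLine(snapshot.Count);    // 2: captured before the Add (2, 3)
```

Calling ToList (or ToArray) is therefore a simple way to pin down a query’s results when later changes to the source, or repeated enumeration, would be undesirable.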

Sorting

Sorting with LINQ is pretty straightforward. Results can be sorted on one or more values in ascending or descending order.

Single Value Ascending

var query = from b in GetLibrary()
			orderby b.Title
			select b.Title;

Single Value Descending

var query = from b in GetLibrary()
			orderby b.Title descending
			select b.Title;

Multiple Values

var query = from b in GetLibrary()
			orderby b.CopyrightYear descending, b.Title
			select new { b.ID, b.Title, b.CopyrightYear };

The same query can be expressed in method syntax as follows:

var query = GetLibrary()
				.OrderByDescending(b => b.CopyrightYear)
				.ThenBy(b => b.Title)
				.Select(b => new
					{
						b.ID,
						b.Title,
						b.CopyrightYear
					}
				);

Joins

Sequences from one or more data sources may be joined together in a single query.  The data sources don’t even need to be of the same type.

LINQ supports both inner and outer joins. To demonstrate the join capabilities we’ll introduce an XElement that we can join to using part of each book’s ISBN.

Note: The XElement class is part of LINQ to XML and can be found in the System.Xml.Linq namespace. The LINQ to XML classes offer many advantages over the traditional XML classes in that they were specifically designed for queries and composition.

XElement GetPublishers()
{
	return new XElement("publishers",
						new XElement("publisher",
									 new XAttribute("id", 321),
									 new XText("Addison-Wesley")),
						new XElement("publisher",
									 new XAttribute("id", 933),
									 new XText("Manning")),
						new XElement("publisher",
									 new XAttribute("id", 596),
									 new XText("O'Reilly")),
						new XElement("publisher",
									 new XAttribute("id", 7653),
									 new XText("Tor Books")),
						new XElement("publisher",
									 new XAttribute("id", 312),
									 new XText("St. Martin's Griffin")));
}

Inner Join

In the next example we retrieve all of the publisher elements from our XML document. Since the publisher ID isn’t available directly within the Book class, we extract it from the ISBN and specify that the extracted publisher ID equals the id attribute of each element. We then project a new anonymous type that includes values from both sequences.

var query = from b in GetLibrary()
			join p in GetPublishers().Descendants("publisher")
				on int.Parse(b.Isbn13.Split('-')[2]) equals int.Parse(p.Attribute("id").Value)
			select new { PublisherName = p.Value, b.Title };

Outer Join

Outer joins are a bit more complicated than inner joins, but only slightly. In addition to specifying the join values, we use into to capture each publisher’s matching books in a new sequence, then select from that sequence through the DefaultIfEmpty extension method so that publishers without any books still appear in the results.

var query = from p in GetPublishers().Descendants("publisher")
			let publisherID = int.Parse(p.Attribute("id").Value)
			join b in GetLibrary() on publisherID equals int.Parse(b.Isbn13.Split('-')[2])
				into bookPublishers
			from bp in bookPublishers.DefaultIfEmpty()
			select new
			{
				PublisherID = publisherID,
				PublisherName = p.Value,
				Book = bp
			};
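Because DefaultIfEmpty supplies default(T), the Book property is null for publishers with no matching book (Tor Books and St. Martin’s Griffin in the data above), so any code that reads from it needs a null check. The self-contained sketch below repeats the same pattern with hypothetical anonymous-type data in place of the XML and library sources and shows where that null has to be handled.

```csharp
using System;
using System.Linq;

// Hypothetical stand-ins for the publisher XML and the library.
var publishers = new[]
{
    new { Id = 321, Name = "Addison-Wesley" },
    new { Id = 596, Name = "O'Reilly" },
    new { Id = 7653, Name = "Tor Books" },
};
var books = new[]
{
    new { PublisherId = 321, Title = "Essential LINQ" },
    new { PublisherId = 596, Title = "Programming F#" },
    new { PublisherId = 596, Title = "C# 4.0 in a Nutshell" },
};

// join ... into performs a group join; DefaultIfEmpty then yields a single
// null for any publisher whose group is empty, making this a left outer join.
var query = from p in publishers
            join b in books on p.Id equals b.PublisherId into publisherBooks
            from pb in publisherBooks.DefaultIfEmpty()
            select new
            {
                Publisher = p.Name,
                // pb is null for unmatched publishers, so guard before using it.
                Title = pb == null ? "(no books)" : pb.Title
            };

foreach (var row in query)
{
    Console.WriteLine($"{row.Publisher}: {row.Title}");
}
```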

SelectMany

SelectMany allows us to flatten multiple sequences. Using our Books data we can flatten the structure to extract a single sequence containing all the authors.

var query = from b in GetLibrary()
			from a in b.Authors
			orderby a.LastName
			select a;
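The second from clause is what turns this into a SelectMany. The sketch below, using hypothetical anonymous-type books that carry only author last names, writes the same flattening both ways and checks that they agree.

```csharp
using System;
using System.Linq;

// Hypothetical data: each book holds an array of author last names.
var books = new[]
{
    new { Title = "Essential LINQ", Authors = new[] { "Calvert", "Kulkarni" } },
    new { Title = "Programming F#", Authors = new[] { "Smith" } },
    new { Title = "C# 4.0 in a Nutshell", Authors = new[] { "Albahari", "Albahari" } },
};

// Query syntax: the second from clause flattens each book's authors.
var querySyntax = from b in books
                  from a in b.Authors
                  orderby a
                  select a;

// Method syntax: SelectMany performs the flattening explicitly.
var methodSyntax = books.SelectMany(b => b.Authors).OrderBy(a => a);

Console.WriteLine(string.Join(", ", methodSyntax));
// Albahari, Albahari, Calvert, Kulkarni, Smith
```

Note that flattening keeps duplicates; adding Distinct() would collapse the two Albahari entries if a unique list of names were wanted.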

Grouping

Grouping query results by a value is fairly straightforward. With grouping we identify the source sequence and specify the grouping value; the result is a sequence of IGrouping objects, which we can continue to query by naming them with into. In this example we’ll see how to group books by copyright year.

var query = from b in GetLibrary()
			group b by b.CopyrightYear into y
			select y;
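Each y in the query above is an IGrouping whose Key is the copyright year. A minimal sketch of consuming those groups, assuming a hypothetical anonymous-type copy of the titles and years rather than the Book class, could look like this:

```csharp
using System;
using System.Linq;

// Hypothetical copy of the library's titles and copyright years.
var books = new[]
{
    new { Title = "Essential LINQ", Year = 2009 },
    new { Title = "Programming F#", Year = 2010 },
    new { Title = "C# 4.0 in a Nutshell", Year = 2010 },
    new { Title = "WPF in Action with Visual Studio 2008", Year = 2009 },
};

var byYear = from b in books
             group b by b.Year into y
             orderby y.Key
             select y;

// Each element is an IGrouping<int, T>: Key holds the year, and the group
// itself can be enumerated to reach the books published that year.
foreach (var yearGroup in byYear)
{
    Console.WriteLine($"{yearGroup.Key} ({yearGroup.Count()} books)");
    foreach (var book in yearGroup)
    {
        Console.WriteLine($"  {book.Title}");
    }
}
```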

Next Steps

Having read through (and hopefully tried) the examples, you should have a good understanding of how to use LINQ in your projects. While LINQ is great for simplifying code, it does come at a price. In the next post we’ll examine some of the performance implications of using LINQ and look at how to optimize some queries.

Working With JavaScript

I’ve started on a new project that’s going to be pretty heavy on JavaScript.  Today one of my teammates and I were discussing which editor we should use.  I haven’t done much with JavaScript in the past few years, but when I have I’ve typically used Notepad++.  Given the amount of work I’m going to be doing with it in the near future, I decided I needed something more than a bulked-up text editor.  My co-worker convinced me to give Eclipse with the EclipseJS plug-in a try because that combination allowed easy navigation between JavaScript files, some code completion capabilities, and a nice function list.  I messed around with it for a little while, and although the UI was clean it didn’t feel right.  Admittedly I didn’t use it for long, but the Eclipse/EclipseJS combination was missing some things I’ve come to expect from a modern IDE.

The first thing I noticed was the lack of code formatting.  I don’t know about you, but I really don’t want to spend my days worrying about indentation levels, bracket positioning, and spacing when the IDE can do it all for me.  When you’re used to having this done automatically, you become keenly aware of what a huge time sink code formatting is when you have to do it manually.

Another productivity blocker came from the lack of any automated commenting.  When I need to comment out a block of code in Visual Studio it’s Ctrl + E, Ctrl + C (uncomment: Ctrl + E, Ctrl + U), but when I looked for a button or menu item in Eclipse to do the same thing I came up empty-handed.  Manually commenting each line or navigating to the start and end of the block to insert the comment symbols is a tedious waste of time.  On a related note, many of the convenient shortcuts I’m used to, such as cutting an entire line just by moving the cursor to that line, weren’t available.

Finally, there was no easy way to open the folder containing the JavaScript file I was working on.  This might not sound important but since we’re using TFS for source control on this project I need easy access to either the Source Control Explorer or the TFS Power Tools shell extension to check out the file.  Sure, I could copy the full file path from a context menu but then that still requires navigating to the folder.  Visual Studio provides both.

With all this in mind it was clear that Eclipse wasn’t going to work for me so I jumped back to Visual Studio.  After a few quick tweaks to the JavaScript code formatting settings I was coding like it was C#.  There was one thing I really liked in EclipseJS that was missing in Visual Studio: a JavaScript function list.  The ability to navigate between functions simply by double-clicking on a name was really nice and ultimately a good time saver.  It turns out there’s a free extension for that.

The JavaScript Parser extension adds a new JavaScript Parser window that can be docked along with Solution Explorer (or other windows) and displays a tree containing all of the functions in the file.  What really sold me on the extension was that it also includes all of the anonymous functions used for event handling or jQuery functions such as .each.  To get the extension just install it through the extension manager.  Once the extension is installed restart Visual Studio and open the window through View -> Other Windows -> JavaScript Parser.

One of the most compelling reasons I found for using Visual Studio 2010 for editing JavaScript is its superb IntelliSense capabilities.  In addition to the core JavaScript objects, DOM objects, and user-defined types and functions it also allows for external references through the use of directives.  The inclusion of support for external references means that it is incredibly easy to activate IntelliSense support for other JavaScript libraries like jQuery or Reactive Extensions for JavaScript by adding a reference directive to the file.

Adding a Reference to jQuery

  1. In the file that will reference jQuery, add the following line before any script (typically at the beginning of the file).  The path attribute must reflect the relative URI of the jQuery JavaScript file.
    /// <reference path="jQuery-1.4.2.min.js" />
    
  2. Press Ctrl + Shift + J to force IntelliSense to refresh.

References aren’t restricted to local files nor are they restricted to just JavaScript files.  It is perfectly acceptable to reference aspx files as well.

Notice the syntax of the reference directive in the example above.  It should be familiar to anyone using XML Comments in C# because that’s exactly what it is.  As part of the IntelliSense support Visual Studio supports XML Comments in JavaScript.

I’d be willing to try Eclipse again if I missed something.  I’ve also both heard and read some good things about Aptana Studio which I haven’t tried yet although it looks promising.  For now though I’ll probably continue to use VS2010 for JavaScript editing because it meets my needs really well.

TFS2010: Reverting a Branch to a Folder

My team is in the process of transitioning from SVN to TFS for version control. One lesson we just learned the hard way was that TFS doesn’t support the concept of nested branching. Early on in our transition I branched an individual folder, and things had been going quite smoothly until yesterday, when we needed to branch the main folder to spin off a side-project and TFS gave me a nice message about the folder already containing a branch. Uh oh…

I spent a few hours grasping at straws trying to get out of this situation. I even tried reverting a couple of changesets with the rollback functionality included in the TFS 2010 Power Tools, but nothing seemed to get rid of the old branch. Eventually I stumbled across this MSDN forum post that said changing a branch back to a folder is just a matter of clicking File -> Source Control -> Branching and Merging -> Convert to Folder in Visual Studio. After selecting that menu item the branch icon changed back to the standard folder icon and I was able to create the new branch.  Had this option been available in the context menu I’d have found it right away rather than spinning my wheels, but now I know it’s there should I need it again.

LINQed Up (Part 2)

This is the second part of a series intended as an introduction to LINQ.  The series should provide a good enough foundation to allow the uninitiated to begin using LINQ in a productive manner.  In this post we’ll look at some of the common query methods in Enumerable and what LINQ looks like.

In the previous post we defined LINQ and discussed some of the features of the .NET Framework that make LINQ possible.  This post will build upon that foundation.  By the end of this post you should understand the basics of LINQ and understand how it can fit into your development toolbox.

Common Query Methods

The Enumerable class in the System.Linq namespace is central to LINQ’s functionality in that it defines all of the core extension methods that make up LINQ.  There are tons of methods in the Enumerable class, but for the purpose of this post we’ll focus on just a few.  Most of the methods here have at least one overload.  Rather than examining every overload, we’ll take a more generalized approach to introduce the methods and divide them into a few categories.

Sequence Operations

The methods in this section work across an entire sequence.

  • Where: Primary method used to filter a sequence.
  • Select: Primary method used to project results from the query.
  • Join: Allows combining sequences into a single query.  Similar to a SQL join.
  • OrderBy/OrderByDescending: Sorts a sequence.
  • All: Indicates whether every element in a sequence meets the specified criteria.
  • Any: Indicates whether any element in a sequence meets the specified criteria.
  • Count: Gets the number of elements in a sequence.
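To give a feel for these methods before we look at syntax in detail, here are a few of them applied to the same ten Fibonacci numbers used later in this post (redeclared so the snippet stands on its own):

```csharp
using System;
using System.Linq;

var fibonacci = new[] { 1, 1, 2, 3, 5, 8, 13, 21, 34, 55 };

bool allPositive = fibonacci.All(n => n > 0);      // true: every element passes
bool anyOverFifty = fibonacci.Any(n => n > 50);    // true: 55 passes
int total = fibonacci.Count();                     // 10
int evenCount = fibonacci.Count(n => n % 2 == 0);  // 3: 2, 8, and 34

// Filter, then sort the remaining elements in descending order.
var smallDescending = fibonacci.Where(n => n < 10).OrderByDescending(n => n);
Console.WriteLine(string.Join(", ", smallDescending)); // 8, 5, 3, 2, 1, 1
```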

Element Retrieval Operations

Each of the methods below allows retrieval of a specific element in a sequence and has two forms.  The basic form will throw an exception if the element cannot be found, while the OrDefault version will return default(T).

  • First/FirstOrDefault: Retrieves the first element in a sequence.
  • ElementAt/ElementAtOrDefault: Retrieves the element at the specified position in a sequence.
  • Last/LastOrDefault: Retrieves the last element in a sequence.
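The difference between the two forms is easiest to see when nothing matches. This sketch reuses the Fibonacci array from later in the post:

```csharp
using System;
using System.Linq;

var fibonacci = new[] { 1, 1, 2, 3, 5, 8, 13, 21, 34, 55 };

int first = fibonacci.First();                    // 1
int firstOverTen = fibonacci.First(n => n > 10);  // 13
int last = fibonacci.Last();                      // 55
int fifth = fibonacci.ElementAt(4);               // 5 (positions are zero-based)

// The OrDefault forms return default(int), which is 0, instead of throwing.
int missing = fibonacci.FirstOrDefault(n => n > 100); // 0
int outOfRange = fibonacci.ElementAtOrDefault(50);    // 0

try
{
    fibonacci.First(n => n > 100);
}
catch (InvalidOperationException)
{
    Console.WriteLine("First throws when no element matches.");
}
```

One design consequence worth noting: for value types the default can be indistinguishable from a real element (a sequence could legitimately contain 0), so the OrDefault forms are most natural with reference types, where null clearly means "not found".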

What Does LINQ Look Like?

LINQ statements can be written in either of two forms: query syntax and method (dot) syntax.  While these forms are functionally equivalent they each have their place and the decision about which form to use will often come down to readability and is typically situational.  There is no reason to use one form exclusively.

Each of the following examples will use a sequence that contains the first ten numbers in the Fibonacci sequence.

var fibonacci = new int[] { 1, 1, 2, 3, 5, 8, 13, 21, 34, 55 };

Where We’re Coming From

Before looking at each LINQ syntax it’s helpful to see an example of what LINQ aims to address.  Consider the following code that builds a sequence of the even numbers found in the Fibonacci sequence defined above:

var evenNumbers = new List<int>();
foreach(var i in fibonacci)
{
	if(i % 2 == 0)
	{
		evenNumbers.Add(i);
	}
}

This code, while not particularly complex, involves a number of steps to accomplish a relatively simple task.  First we define a List<int> to hold the even numbers.  We then enter a loop where we check whether each value is even before adding it to the list.  This is imperative programming at its finest.  The focus of the code is on how rather than what.  Of the eight lines of code only one line is really relevant to the problem of finding even numbers in the source sequence.  LINQ addresses the imperative nature of this code by providing a functional framework to let us focus on the what rather than the how.

Method Syntax

Method syntax is a fluent interface that allows building queries using method calls.  As its name implies, method syntax calls the LINQ extension methods directly passing lambda expressions as parameters.  Compare the code below with the traditional example above.

var evenNumbers = fibonacci.Where(i => i % 2 == 0);

In this example we remove all of the imperative code and replace it with a single method call and let LINQ do all of the heavy lifting.  When the Where method executes it calls the supplied lambda expression for each value in the source sequence.  The lambda expression must return a boolean value that informs the Where method whether or not the current value meets the criteria.

Method syntax tends to be more readable for simple queries such as this example.

Query Syntax

Alternatively, query syntax introduces a SQL-like syntax for writing queries.  Here is the same example repeated using query syntax:

var evenNumbers =
    from i in fibonacci
    where i % 2 == 0
    select i;

Query syntax tends to be more verbose than method syntax but is well suited for composing more complex queries such as those that use joins.

Notice the SQL-like structure of the above query. One important difference to note between SQL and query syntax is that the from and select clauses appear in the opposite order from a SQL query.  In fact, query syntax requires the from clause to appear first.  By placing the from clause at the beginning of the query we get all of the benefits of IntelliSense within Visual Studio.  Ultimately, though, query syntax is just syntactic sugar.  When the code is compiled, any queries written in query syntax are translated into method syntax.
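One way to convince yourself that the two forms are equivalent is to run both against the same data and compare the results:

```csharp
using System;
using System.Linq;

var fibonacci = new[] { 1, 1, 2, 3, 5, 8, 13, 21, 34, 55 };

// Query syntax...
var viaQuery = from i in fibonacci
               where i % 2 == 0
               select i;

// ...and the equivalent method call. The trailing "select i" simply returns
// each element unchanged, so the method form is just a Where call.
var viaMethods = fibonacci.Where(i => i % 2 == 0);

Console.WriteLine(viaQuery.SequenceEqual(viaMethods)); // True
Console.WriteLine(string.Join(", ", viaMethods));      // 2, 8, 34
```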

Next Steps

Now that we’ve seen some of the common LINQ methods and understand how to compose LINQ statements with both method and query syntax we can look at how to use the various methods.  The next post will go in-depth showing how to compose LINQ statements using the common methods and introduce some new concepts such as deferred execution.

LINQed Up (Part 1)

This is the first of a series intended as an introduction to LINQ. The series should provide a good enough foundation to allow the uninitiated to begin using LINQ in a productive manner. In this post we’ll look at what LINQ is and how it works.

My director and I were recently talking about questions he has been asking candidates for senior level .NET development positions. He mentioned that he has been asking the candidates to describe LINQ and some situations where it would be useful.  The response from each of the candidates has ranged from a blank stare to something along the lines of “it means you don’t need to write SQL anymore.”  Those responses are the inspiration for this series.

The blank stares are discouraging but the statements that constrain LINQ to a very specific use case illustrate a fundamental lack of understanding of the technology. It is true that LINQ can greatly simplify interaction with a database through LINQ to SQL or Entity Framework but those are only a small part of what LINQ can do.  In fact, the majority of the places I’ve used LINQ have no database interaction whatsoever.  LINQ has so many applications beyond database access that I find myself using at least some part of it in most of my projects and often in some unexpected places.

Let’s start with a trip through the basics.

What is LINQ?

Language INtegrated Query (LINQ) was introduced with the .NET Framework v3.5. MSDN has this to say about it:

LINQ is a set of extensions to the .NET Framework that encompass language-integrated query, set, and transform operations. It extends C# and Visual Basic with native language syntax for queries and provides class libraries to take advantage of these capabilities.

Although the description is accurate I think the language regarding “native language syntax for queries” is what leads people to mistakenly believe that the only use for LINQ is with a database. After all, we’ve been conditioned to think that queries are database operations.  That said, I offer an alternative definition:

LINQ is a set of extensions to the .NET Framework that encompass language-integrated query, set, and transform operations. It extends C# and Visual Basic with native language syntax for querying data from a variety of sources and provides class libraries to take advantage of these capabilities.

The idea that LINQ makes it possible to query data from a variety of sources is critically important to using it to its full potential. It means that LINQ is not constrained to working with databases but actually comes in several flavors:

  • LINQ to Objects
  • LINQ to XML
  • LINQ to SQL (and Entity Framework)
  • LINQ to DataSets
  • LINQ to Twitter
  • etc…

Essentially any data source can be queried with LINQ as long as there’s a corresponding provider. In addition to providing a common query language for disparate data sources, LINQ lets these sources be queried in a unified manner through the use of joins and subqueries.  LINQ also gives us some really powerful transformation capabilities.  In short, LINQ is a domain-specific language for working with sets of data.

How Does LINQ Work?

LINQ is made possible by several additions to the .NET Framework and in order to truly appreciate its power and elegance we need to first look at:

  • Extension Methods
  • Delegates/Lambda Expressions
  • Type Inference
  • Anonymous Types

Since these are all features of the .NET framework and/or compiler their usage is not restricted to LINQ.  Most of them are actually quite useful outside of LINQ as well.

Extension Methods

Central to the functionality of LINQ are extension methods. Extension methods allow adding capabilities to types without needing to derive a new type. They must be static methods within a non-nested, non-generic static class. The type being extended must be the first parameter of the method and is marked with the this modifier.  Because extension methods add capabilities to an existing type without relying on inheritance, we can even write extension methods for sealed classes.

LINQ introduces the class System.Linq.Enumerable, which contains extension methods that extend the IEnumerable<T> interface.  Microsoft could have added the signatures to the interface itself, but that would have been a breaking change: every existing type implementing the interface would no longer compile until it provided implementations of the new members.  By using extension methods Microsoft was able to introduce all of the LINQ query methods into the framework without breaking anything.
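To see that an extension method is still just a static method, compare calling Enumerable.Where with instance-style syntax against calling it directly on the Enumerable class. Both lines below invoke the same method:

```csharp
using System;
using System.Linq;

var numbers = new[] { 5, 3, 8, 1 };

// Instance-style call, made possible by the 'this' modifier on the
// first parameter of Enumerable.Where...
var viaExtension = numbers.Where(n => n > 2);

// ...is just a convenient way of writing this ordinary static call.
var viaStatic = Enumerable.Where(numbers, n => n > 2);

Console.WriteLine(string.Join(", ", viaExtension));       // 5, 3, 8
Console.WriteLine(viaExtension.SequenceEqual(viaStatic)); // True
```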

Activating LINQ is merely a matter of importing the System.Linq namespace.  Once the namespace is imported, the extension methods are available to any type that implements IEnumerable<T>, including lists, arrays, and even strings.  There’s even a trick for using the non-generic IEnumerable with LINQ that we’ll discuss in a later post.

Delegates/Lambda Expressions

While extension methods provide the methods that make LINQ possible, delegates make them work.  Most of the extension methods in the Enumerable class accept one or more delegates as parameters.  Delegates have always been available in .NET, but their usage and syntax have evolved over the years.

Before C# 2.0, the only way to create a delegate was to point it at a named method.  C# 2.0 introduced anonymous methods using the delegate keyword.
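For comparison, here’s a rough sketch of the two pre-lambda styles, using List<T>.FindAll (which takes a Predicate<T> delegate) as a stand-in for a query method.  It’s written with modern C# conveniences, top-level statements and a local function named IsEven playing the role of the named method, so treat it as an illustration of the patterns rather than literal C# 1.0 or 2.0 code.

```csharp
using System;
using System.Collections.Generic;

var numbers = new List<int> { 1, 2, 3, 4, 5, 6 };

// Named-method style: the delegate is constructed from a separately declared method.
Predicate<int> isEvenNamed = new Predicate<int>(IsEven);
List<int> evensNamed = numbers.FindAll(isEvenNamed);

// C# 2.0 style: an anonymous method written inline with the delegate keyword.
List<int> evensAnonymous = numbers.FindAll(delegate(int n) { return n % 2 == 0; });

Console.WriteLine(string.Join(", ", evensNamed));     // 2, 4, 6
Console.WriteLine(string.Join(", ", evensAnonymous)); // 2, 4, 6

// The "named method" the first delegate points at.
static bool IsEven(int n) => n % 2 == 0;
```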

Handling an event with the delegate syntax

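using System;

// Raise Elapsed every 1000 ms and handle it with an anonymous method.
var t = new System.Timers.Timer(1000);
t.Elapsed += delegate(object sender, System.Timers.ElapsedEventArgs ea) { Console.WriteLine("Timer elapsed"); };

t.Start();
System.Threading.Thread.Sleep(10000); // keep the program alive long enough to see several ticks
t.Stop();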

Notice how the event is handled by an inline anonymous method rather than a separate named method.  LINQ makes heavy use of delegates to control query behavior, and having to spell out a full method signature for every delegate passed to a query method would make LINQ statements virtually unreadable.  Clearly something else was needed.  This is where lambda expressions come into play.

C# 3.0 added support for lambda expressions.  When converted to a delegate, a lambda expression is functionally equivalent to the anonymous method syntax above, but it is far more developer friendly.  In C#, lambda expressions use the => (goes to) operator: the left side contains the list of parameters and the right side contains the method body.

Handling an event with a lambda expression

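using System;

// The same timer, with the handler written as a lambda expression.
var t = new System.Timers.Timer(1000);
t.Elapsed += (s, ea) => Console.WriteLine("Timer elapsed");

t.Start();
System.Threading.Thread.Sleep(10000); // keep the program alive long enough to see several ticks
t.Stop();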

In both examples the timer’s Elapsed event is handled by an anonymous method, and both handlers behave exactly the same way, but the lambda version is much more concise and easier to read.  The key difference between the traditional delegate syntax and the lambda expression is that the lambda’s parameter list contains no type information; the compiler works out that s is an object and ea is an ElapsedEventArgs from the event’s ElapsedEventHandler delegate type.  This lack of type information is a great segue into the next technology important to LINQ: type inference.
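The same inference applies to any delegate type, not just event handlers.  In this sketch the variable names are illustrative only; all three delegates do the same thing, and the last one omits the parameter types and lets the compiler take them from Func<int, int, int>.

```csharp
using System;

// Anonymous method (C# 2.0): parameter types spelled out.
Func<int, int, int> addAnonymous = delegate(int x, int y) { return x + y; };

// Lambda with explicit parameter types.
Func<int, int, int> addExplicit = (int x, int y) => x + y;

// Lambda with inferred parameter types: the compiler takes them from Func<int, int, int>.
Func<int, int, int> addInferred = (x, y) => x + y;

Console.WriteLine(addAnonymous(2, 3)); // 5
Console.WriteLine(addExplicit(2, 3));  // 5
Console.WriteLine(addInferred(2, 3));  // 5
```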

Type Inference

Type inference allows the compiler to determine the type of a variable, the return type of a lambda expression, or the type arguments of a generic method.  By letting the compiler do its job with type inference we can remove much of the explicit type information from our code.  Type inference is what makes the var keyword possible, and it is required in order to use anonymous types.

Using the var keyword to declare variables is the subject of some debate.  One side opposes its use, saying that it makes code too ambiguous, whereas the other side likes its simplicity and convenience.  I fall into the latter group because I’ve found that when I’m first developing something I may change variable or return types several times as the design is fleshed out.  By using the var keyword I typically only have to change the type in one place rather than several.  The var keyword is also required when using anonymous types.  If there’s ever any question about which type is being resolved, just hover over var in Visual Studio.

Don’t confuse the var keyword with C# 4.0’s dynamic keyword or with JavaScript’s var.  Variables declared with the var keyword are still strongly typed; we’re just letting the compiler figure out what the type really is.
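A short sketch of type inference at work, assuming a C# 4.0 or later compiler for the dynamic comparison at the end.  The variable names are arbitrary; the point is which types the compiler settles on.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// The compiler infers each variable's static type from its initializer.
var count = 42;                                  // int
var names = new List<string> { "Ann", "Bob" };   // List<string>

// Generic type arguments are inferred too: this call resolves to Enumerable.Repeat<string>.
var echoes = Enumerable.Repeat("hi", 3);         // IEnumerable<string>

Console.WriteLine(count.GetType());              // System.Int32
Console.WriteLine(names.Count + echoes.Count()); // 5

// var is still strongly typed; uncommenting this line causes a compile-time error.
// count = "forty-two";

// dynamic, by contrast, defers member resolution until run time.
dynamic anything = 42;
anything = "forty-two";
Console.WriteLine(anything.Length);              // 9, looked up at run time
```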

Anonymous Types

Finally, we have anonymous types.  Anonymous types are types with no formal definition outside of the expression that creates them.  At compile time the compiler generates a read-only class based on the inline definition of the type.  The generated type name isn’t a valid C# identifier, so the only way to declare a variable of an anonymous type is through the var keyword described above.  Although they’re not required to use LINQ, anonymous types add a lot of capabilities for projecting results from a query.

Creating an anonymous type with two properties

var anon = new { IntegerValue = 1, StringValue = "A String" };

Due to the way anonymous types are defined, there are some restrictions on their use.  Although there are some ways around this, the rule of thumb is that anonymous types can only be used within the scope where they are declared.
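Here’s a rough sketch of anonymous types doing what they do best in LINQ: shaping query results.  The people data is made up for the example.  It also shows two properties of the compiler-generated type: it overrides Equals to compare property values, and its properties are read-only.

```csharp
using System;
using System.Linq;

// Both elements share one compiler-generated type: same property names, types, and order.
var people = new[]
{
    new { Name = "Ada", BirthYear = 1815 },
    new { Name = "Grace", BirthYear = 1906 },
};

// Project each match into a new anonymous type with a computed property.
var summaries = people
    .Where(p => p.BirthYear > 1900)
    .Select(p => new { p.Name, Century = p.BirthYear / 100 + 1 })
    .ToList();

foreach (var summary in summaries)
    Console.WriteLine($"{summary.Name}: century {summary.Century}"); // Grace: century 20

// The generated type compares by value through Equals.
var first = new { IntegerValue = 1, StringValue = "A String" };
var second = new { IntegerValue = 1, StringValue = "A String" };
Console.WriteLine(first.Equals(second)); // True
Console.WriteLine(first == second);      // False: == compares references
// first.IntegerValue = 2;               // would not compile: the property has no setter
```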

Next Steps

We’ve covered what LINQ is and the main pieces of the .NET Framework that make it possible.  In the next post we’ll look at the common query methods and how to construct queries.

DaveFancher.com Reloaded

I’ve owned DaveFancher.com for as long as I can remember but I’ve been neglecting it for the past few years.  I’ve neglected it so much that I’ve actually been paying a Web host for e-mail.  That came to an abrupt end tonight.

When I started the site I rolled my own blog, and for the most part it met my needs.  I had a rudimentary rich text editor, I had attachments, I had commenting, I think I even had an RSS feed.  I ultimately got to a point where I wanted to allow drafts, versioning, trackbacks (not that they’d ever be used!), and even ping sites like Technorati, but I didn’t have the desire to build any of it.  I just wanted to write.  By the time I reached this point blogging software was coming of age, so I started seeking other solutions.

For a while I used Blogger (Blogspot at the time) but I never really liked it although I couldn’t really explain why.  After a long but unproductive run with Blogger/Blogspot I went hunting again.  I checked a few of my friends’ blogs and many of them were using WordPress so I decided to check it out and was hooked almost immediately.

One of the first things I looked into with WordPress was how to self-host.  After all, I was paying for it, right?  Unfortunately it required MySQL which my host didn’t support.  I was kind of disappointed but looked at the hosted option anyway.  WordPress made migrating from Blogger really easy and was so feature-rich I knew it was what I was looking for.  DaveFancher.com would continue to appear abandoned but I wasn’t about to give up my e-mail address.

Fast forward to this evening.  I took the plunge.  I purchased the domain add-on for my WordPress blog, updated the name servers with my registrar, and waited… Amazingly it only took about an hour for the changes to take effect.  But what about e-mail?

As I mentioned, the only reason I’ve really been hanging on to the host was e-mail, but the increase in spam over the past few months was becoming an annoyance and was a huge influence on my decision.  Luckily Google offers a free version of Google Apps that makes Gmail available to custom domains.  WordPress’s recent addition of DNS editing made it simple to allow Google Apps to manage e-mail.  All I had to do was enter the verification code from Google Apps to let WordPress generate some entries and manually add a few extra CNAME entries to simplify some access.

In the few months since I switched to WordPress I’ve been posting with more frequency than ever before.  Tonight’s changes should give me even more motivation to keep it up.  Now, just a few hours after starting the process DaveFancher.com has a new lease on life thanks to WordPress and Google.

IndyNDA: MongoDB & Reactive Extensions

The June 10 IndyNDA meeting was one of the most interesting I’ve attended in months.  First, Dennis Burton gave a high-level introduction to MongoDB then for the C# SIG Joel Dart gave a fairly detailed introduction to Reactive Extensions.  I’ve heard a lot of buzz about both of these technologies lately but really haven’t done much investigation on my own so I was glad to see them both in action.  I was inspired to look deeper into both technologies after the presentations.

MongoDB

MongoDB has been getting quite a bit of press lately.  There was even a session devoted to it at IndyTechFest.  Even though I had heard a bit about it and knew it falls under the NoSQL heading I really didn’t know much more than that.  What’s wrong with relational databases?  What does it mean to be NoSQL?  Where does Mongo fit in?  Dennis did a great job answering each of these questions and more.

What’s Wrong With Relational Databases?

Nothing.  The relational model is alive and well and has a definite place in the world of software development.  The problem is that the relational model was originally described in 1970 and hasn’t changed significantly since then, but the world has, and the types of applications we’re building have changed with it.

Think about some of the decisions that need to be made for a new application (assuming .NET):

  • Which language should it be written in?  C#?  VB?  C++?  F#?
  • What type of UI should it have?  WinForms?  WPF?  Silverlight?
  • If it needs interprocess communication should it use remoting or WCF?

We answer all of these questions and more, then we say “oh, and we need a relational database” regardless of whether it’s the right tool for the job.  Relational databases have been the default answer to the question of how to store the data for years.  The NoSQL family of databases offers an alternative to the relational model that may or may not be the best solution to the problem.

What Does it Mean to be NoSQL?

NoSQL is exactly what it sounds like: NoSQL databases don’t use SQL for storing or accessing data.  MongoDB is a document oriented database.  Unlike a relational database, which stores related objects separately and brings them together with joins, a document oriented database stores related objects together.  This difference requires a change in both mindset and terminology.

Relational          Document Oriented
Database            Database
Table               Collection
Row                 Document

With MongoDB there is no predefined schema.  Databases can be implicitly created just by referencing them.  The same goes for collections.  Need a new collection?  Just specify a name and add a document.  The lack of a predefined schema also means that documents stored within a collection don’t need to follow a consistent structure although they typically will.
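To illustrate the difference in shape (not MongoDB itself), here’s a sketch in plain C# with no database involved; it does not use the MongoDB driver API.  The order data and names such as ordersCollection are hypothetical, and a list of dictionaries stands in for a schemaless collection.  The _id key mirrors MongoDB’s convention for a document’s primary key.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Relational shape: orders and their lines live in separate "tables" and are joined at query time.
var orders = new[] { new { OrderId = 1, Customer = "Ann" } };
var orderLines = new[]
{
    new { OrderId = 1, Product = "Widget", Qty = 2 },
    new { OrderId = 1, Product = "Gadget", Qty = 1 },
};
var joined = (from o in orders
              join l in orderLines on o.OrderId equals l.OrderId
              select new { o.Customer, l.Product, l.Qty }).ToList();

// Document shape: each order document embeds its own lines, so no join is needed.
// The "collection" is just a list of dictionaries, so documents are free to differ in shape.
var ordersCollection = new List<Dictionary<string, object>>
{
    new Dictionary<string, object>
    {
        ["_id"] = 1,
        ["customer"] = "Ann",
        ["lines"] = new[]
        {
            new Dictionary<string, object> { ["product"] = "Widget", ["qty"] = 2 },
            new Dictionary<string, object> { ["product"] = "Gadget", ["qty"] = 1 },
        },
    },
    new Dictionary<string, object>
    {
        ["_id"] = 2,
        ["customer"] = "Bob",
        ["giftWrap"] = true, // a field the first document doesn't have
        ["lines"] = new Dictionary<string, object>[0],
    },
};

var annLines = (Dictionary<string, object>[])ordersCollection[0]["lines"];
Console.WriteLine(joined.Count);    // 2 rows, produced by a join
Console.WriteLine(annLines.Length); // 2 lines, read straight from the document
Console.WriteLine(ordersCollection.Count(d => d.ContainsKey("giftWrap"))); // 1
```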

Where Does Mongo Fit in?

Mongo is designed for speed and scalability.  This makes it ideal for applications where performance is critical.  The MongoDB web site states that it is well suited for content management, caching, high-volume applications, and areas requiring a flexible data structure.  That said, given the emphasis on speed and scalability, it is not well suited to environments requiring complex transactions, such as banking systems, nor is it a good fit for problem-specific business intelligence systems.  MongoDB also has very limited support for user security, leaving the responsibility for managing security up to the application.

Additional Resources

I’ve started learning MongoDB and will probably have more posts to go along with what I’m discovering but until then:

Reactive Extensions for .NET (RxNet)

Reactive Extensions (Rx) for .NET is another technology I’ve heard some whispers about but hadn’t really looked into.  So, what is Rx?

The Microsoft DevLabs site states “Rx is a library for composing asynchronous and event-based programs using observable collections.”  What this means is that we can now use a superset of LINQ to respond to events or other asynchronous operations in a declarative rather than imperative manner.

Rx is built on the IObserver<T> and IObservable<T> interfaces, which are essentially the mirror image of IEnumerable<T> and IEnumerator<T> (as of .NET 4.0 the two interfaces are part of the base class library itself).  IEnumerable<T> and IEnumerator<T> work with pull-based operations, where the consumer requests each item from a predefined set of data, whereas IObservable<T> and IObserver<T> work with push-based operations, where the producer delivers items as they arrive and the entire set of data cannot be known in advance.  Rx also defines an ISubject<T> interface for objects that act as both an observer and an observable.
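A minimal sketch of the push model using only the two interfaces, without the Rx library.  PushSource and ListObserver are hypothetical types written for this illustration; in practice Rx provides the operators and plumbing so you rarely implement IObservable<T> by hand.

```csharp
using System;
using System.Collections.Generic;

// Pull: the consumer asks for each value (foreach drives an IEnumerator<int>).
IEnumerable<int> pulled = new[] { 1, 2, 3 };
foreach (int n in pulled)
    Console.WriteLine($"pulled {n}");

// Push: the producer hands each value to its subscribed observers when it chooses.
var source = new PushSource();
var received = new List<int>();
using (source.Subscribe(new ListObserver(received)))
{
    source.Publish(1);
    source.Publish(2);
    source.Publish(3);
    source.Complete();
}
Console.WriteLine(string.Join(", ", received)); // 1, 2, 3

// Hypothetical hand-rolled observable.
sealed class PushSource : IObservable<int>
{
    private readonly List<IObserver<int>> _observers = new List<IObserver<int>>();

    public IDisposable Subscribe(IObserver<int> observer)
    {
        _observers.Add(observer);
        // Disposing the returned object unsubscribes the observer.
        return new Unsubscriber(() => _observers.Remove(observer));
    }

    public void Publish(int value)
    {
        foreach (var observer in _observers.ToArray()) observer.OnNext(value);
    }

    public void Complete()
    {
        foreach (var observer in _observers.ToArray()) observer.OnCompleted();
    }

    private sealed class Unsubscriber : IDisposable
    {
        private readonly Action _unsubscribe;
        public Unsubscriber(Action unsubscribe) => _unsubscribe = unsubscribe;
        public void Dispose() => _unsubscribe();
    }
}

// Hypothetical observer that records every value pushed to it.
sealed class ListObserver : IObserver<int>
{
    private readonly List<int> _sink;
    public ListObserver(List<int> sink) => _sink = sink;
    public void OnNext(int value) => _sink.Add(value);
    public void OnError(Exception error) => Console.WriteLine($"error: {error.Message}");
    public void OnCompleted() => Console.WriteLine("completed");
}
```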

Joel’s first demo really drove home why Rx is often referred to as LINQ to Events.  He showed a Silverlight application with a red square that he could drag around with the mouse.  Rather than responding to each MouseDown, MouseUp, and MouseMove event individually and setting state flags to manage what should happen within each handler, he showed how Rx let him handle all of those events within a single LINQ statement.  That statement calculated and projected the delta between the mouse’s previous and current positions and passed it to a separate function responsible for updating the position of the box.

Rx really offers a powerful new way to work with asynchronous and event-based programs.  I can already see a few places in some current projects where it would be really handy.  Rx also isn’t restricted to .NET.  There’s already an Rx library for JavaScript!  I’m really looking forward to getting some time for further investigation.

Additional Resources

My IndyTechFest Experience

This past Saturday I attended IndyTechFest along with 400+ other developers, admins, and DBAs.  It was a long, intense day of sessions covering topics such as WPF, Silverlight, SQL Server, C#, VB, testing, and Windows Phone 7.  I’ve had a few days to digest what I heard and wanted to highlight some things from each of the sessions I attended.

This year’s conference was split into seven tracks, each with five sessions, plus an all-day open space.  All of the tracks had at least one topic I was interested in and many time slots had conflicts, but ultimately I stayed within the general .NET and Silverlight tracks.  My schedule for the day was:

  • Keynote: Are My Three Screens Cloudy?
  • WPF for Developers
  • Implementing MVVM for WPF
  • The State of Data Services: Open Data for the Open Web
  • C# Tips and Tricks
  • Silverlight Code Survey

For the most part I found value in each of the sessions I attended.  Thanks go out to the sponsors, organizers, and volunteers that made this event possible.

Keynote: Are My Three Screens Cloudy?

Presented By: Jesse Liberty

In many ways Jesse Liberty’s keynote was the highlight of the day.  I think my #1 takeaway for the day is that Jesse Liberty is awesome!  In the keynote Jesse briefly described his position within Microsoft and how he got there, and gave a quick history of the evolution of Silverlight.  He went on to describe what Microsoft sees as the “three screens” (computer, TV, and phone) and how Silverlight is the technology that will bring the three screens together through Windows, Xbox 360, and Windows Phone 7.

WPF For Developers

Presented By: Phil Japikse

This was the first of two Windows Presentation Foundation (WPF) sessions from Phil Japikse.  In this session Phil gave a good introduction to WPF for the uninitiated (like me).  He started by defining WPF, describing the advantages and disadvantages of WPF compared to WinForms, and discussing new features in .NET 4.0.  The majority of the session was spent demonstrating some of the more common features.

Some highlights:

  • Creating custom spell-check dictionaries with .lex files
  • Panels dock in XAML order
  • Controls tab in XAML order by default
  • INotifyPropertyChanged interface
  • INotifyCollectionChanged interface

The presentation and example code are both available on Phil’s Samples and Presentations page.
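For readers who haven’t met the two notification interfaces from the list above, here’s a small sketch of what they do; it isn’t taken from Phil’s samples.  PersonViewModel is a hypothetical class, and ObservableCollection<T> is the framework’s built-in INotifyCollectionChanged implementation.

```csharp
using System;
using System.Collections.ObjectModel;
using System.ComponentModel;

var person = new PersonViewModel();
person.PropertyChanged += (s, e) => Console.WriteLine($"PropertyChanged: {e.PropertyName}");
person.Name = "Ada";   // prints "PropertyChanged: Name"
person.Name = "Ada";   // prints nothing: the value didn't change

// ObservableCollection<T> already implements INotifyCollectionChanged.
var tags = new ObservableCollection<string>();
tags.CollectionChanged += (s, e) => Console.WriteLine($"CollectionChanged: {e.Action}");
tags.Add("wpf");       // prints "CollectionChanged: Add"
tags.Remove("wpf");    // prints "CollectionChanged: Remove"

// Hypothetical view model: raising PropertyChanged is what tells a WPF binding to refresh.
sealed class PersonViewModel : INotifyPropertyChanged
{
    private string _name;

    public event PropertyChangedEventHandler PropertyChanged;

    public string Name
    {
        get => _name;
        set
        {
            if (_name == value) return; // skip redundant notifications
            _name = value;
            PropertyChanged?.Invoke(this, new PropertyChangedEventArgs(nameof(Name)));
        }
    }
}
```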

Implementing MVVM for WPF

Presented By: Phil Japikse

Expanding upon his first WPF session, Phil discussed how to implement the Model-View-ViewModel (MVVM) pattern in WPF.  This session was almost entirely demo showing the classes that represent each part of the pattern and how they interact.

The presentation and example code are both available on Phil’s Samples and Presentations page.

Additional Resources:

The State of Data Services: Open Data for the Open Web

Presented By: Dan Rigsby

Dan Rigsby gave a great introduction to OData, a protocol developed by Microsoft to facilitate data interchange between systems using existing Web technologies.  He started by describing REST and Atom/Pub, two technologies that make OData possible, then went on to show OData in action.

REST (http://en.wikipedia.org/wiki/Representational_State_Transfer)

  • Embrace the URI
  • HTTP Verbs (GET, POST, etc…) translate to methods
  • Content-Type defines the object model
  • Status code is the result

Atom/Pub (http://atompub.org/)

  • Standards-based XML syndication format for publishing and editing web resources
  • Preserves metadata
  • Provides constructs

OData (http://www.odata.org/)

  • “Open” Data
  • Formerly known as “Astoria” and ADO.NET Data Services
  • Open protocol
  • WCF Data Services is Microsoft’s provider for creating and consuming OData
  • Netflix provides an OData interface to its video library
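As a rough illustration of “embrace the URI,” this sketch builds an OData-style query URI using the $filter, $orderby, and $top system query options.  The service root and the ReleaseYear and Name properties are hypothetical placeholders, and the code only constructs the string; it doesn’t send a request.

```csharp
using System;
using System.Linq;

// Hypothetical service root; OData exposes each entity set at its own URI.
const string serviceRoot = "https://example.com/odata/Titles";

// System query options are ordinary query-string parameters whose names start with "$".
var options = new (string Name, string Value)[]
{
    ("$filter", "ReleaseYear gt 2005"),
    ("$orderby", "Name"),
    ("$top", "5"),
};

// Percent-encode each value (spaces become %20) and join the pairs with "&".
string query = string.Join("&", options.Select(o => o.Name + "=" + Uri.EscapeDataString(o.Value)));
string requestUri = serviceRoot + "?" + query;

Console.WriteLine(requestUri);
// https://example.com/odata/Titles?$filter=ReleaseYear%20gt%202005&$orderby=Name&$top=5
```

An HTTP GET against a URI like this is how an OData client queries the service; the other HTTP verbs map to creating, updating, and deleting entries.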

C# Tips and Tricks

Presented By: Mark Strawmyer

With all due respect to Mr. Strawmyer I was incredibly disappointed by this session.  The IndyTechFest program had this to say about the session:

This C# presentation focuses on tips and tricks for the C# developer.  It contains a mixture of C# specific features along with other handy how-to items such as shortcuts for working with the C# IDE that will make you more productive.

This session did not fit the description.  I understand that the previous session was a C# 4.0 overview and there was a strong desire to avoid duplicating information, but only two of the tips/tricks mentioned were actually specific to C#, and one of those was a C# 4.0 feature!

For the curious, the tips and tricks discussed were:

  • Optional & named parameters
  • Extension methods
  • ObsoleteAttribute
  • GC.Collect()
  • using keyword
  • Parallel extensions
  • Utilities

I really question offering up GC.Collect() as a tip, especially when it was provided with the caveat “you can do this but don’t.”  Is letting people know something is possible really a tip if it shouldn’t be done or is not doing it the tip?

To me, a C# tips and tricks presentation should include things such as lesser known/used operators, XML documentation & IntelliSense, compiler options, automatic properties, etc…
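For reference, here’s a sketch of two items from the session’s list: optional and named parameters, and the using keyword (assuming the latter meant the using statement that disposes IDisposable objects rather than the using directive).  Greet is a hypothetical helper.

```csharp
using System;
using System.IO;

// Optional parameters (C# 4.0) declare defaults; named arguments let callers skip or reorder them.
string Greet(string name, string greeting = "Hello", bool shout = false)
{
    string message = $"{greeting}, {name}!";
    return shout ? message.ToUpperInvariant() : message;
}

Console.WriteLine(Greet("Dave"));                       // Hello, Dave!
Console.WriteLine(Greet("Dave", shout: true));          // HELLO, DAVE!
Console.WriteLine(Greet(greeting: "Hi", name: "Dave")); // Hi, Dave!

// The using statement calls Dispose when the block exits, even if an exception is thrown.
string written;
using (var writer = new StringWriter())
{
    writer.Write(Greet("IndyTechFest", greeting: "Welcome"));
    written = writer.ToString();
}
Console.WriteLine(written); // Welcome, IndyTechFest!
```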

Silverlight Code Survey

Presented By: Jesse Liberty

This session was originally going to be “Application Development with Silverlight 4” but after some feedback from the morning’s keynote and the overlap with the MVVM session it was changed.  In this session Jesse did a quick run-through of creating a new Silverlight application, showing some basic data binding, and some basic animation.  Most of the demonstration is available on the Learn page of Silverlight.net.  Nevertheless, I wasn’t about to miss the opportunity to listen to Jesse present again.

In the Hallway

As with just about any conference lots of interesting things happen in the hallway between sessions.  I’ll sheepishly admit that I didn’t use this time as well as I could (should?) have but I really did enjoy playing with the Windows Phone 7 demo application at the Microsoft booth.  I was even clever enough to crash the app by clicking the emulator’s home button while a dialog box was open :)

Font Squirrel

This isn’t really photography related but a few weeks ago I stumbled across Font Squirrel.  They’ve aggregated tons of fonts that are available free for commercial use.  Most of the fonts I’ve looked at are available directly from the Font Squirrel site but there are a few that were only available from a vendor site.  I’ve only snagged four fonts so far but found one that I really like.  I’ve been using Windsong to “sign” my photos.

One of the things I really appreciate about Font Squirrel is the Test Drive feature.  Test Drive lets you try out a font in your browser before you download and install it, so you don’t find out only afterward that it doesn’t look quite like you thought it would in practice.

If you’re looking for some new fonts definitely check out this site.

Test Framework Philosophy

My development team is working to implement and enforce more formal development processes than we have used in the past.  Part of this process involves deciding which unit test framework to use going forward.  Traditionally we have used NUnit and it has worked well for our needs, but now that we’re implementing Visual Studio Team System we also have MSTest available.  This has sparked a bit of a debate as to whether we should stick with NUnit or migrate to MSTest.  As we examine the capabilities of each framework and weigh their advantages and disadvantages, I’ve come to realize that the decision is a philosophical matter.

MSTest has a bit of a bad reputation.  The general consensus seems to be that MSTest sucks.  A few weeks ago I would have thoroughly agreed with that assessment but recently I’ve come to reconsider that position.  The problem isn’t that MSTest sucks, it’s that MSTest follows a different paradigm than some other frameworks as to what a test framework should provide.

My favorite feature of NUnit is its rich, expressive syntax.  I especially like NUnit’s constraint-based assertion model.  By comparison, MSTest’s assertion model is limited, even restrictive if you’re used to the rich model offered by NUnit.  Consider the following “classic” assertions from both frameworks:

Category             NUnit                       MSTest
Equality/Inequality  Assert.AreEqual(e, a)       Assert.AreEqual(e, a)
                     Assert.AreNotEqual(e, a)    Assert.AreNotEqual(e, a)
                     Assert.Greater(a, e)        Assert.IsTrue(a > e)
                     Assert.LessOrEqual(a, e)    Assert.IsTrue(a <= e)
Boolean Values       Assert.IsTrue(a)            Assert.IsTrue(a)
                     Assert.IsFalse(a)           Assert.IsFalse(a)
Reference            Assert.AreSame(e, a)        Assert.AreSame(e, a)
                     Assert.AreNotSame(e, a)     Assert.AreNotSame(e, a)
Null                 Assert.IsNull(a)            Assert.IsNull(a)
                     Assert.IsNotNull(a)         Assert.IsNotNull(a)
e – expected value
a – actual value

They’re similar, aren’t they?  Each pair of assertions listed is functionally equivalent, but notice how the Greater and LessOrEqual assertions are handled in MSTest.  MSTest doesn’t provide assertion methods for these cases; instead it relies on evaluating expressions to define the condition.  This difference, above all else, defines the divergence in philosophy between the two frameworks.  So why is this important?

Readability

Unit tests should be readable.  In unit tests we often break established conventions and/or violate the coding standards we use in our product code.  We sacrifice brevity in naming with Really_Long_Snake_Case_Names_So_They_Can_Be_Read_In_The_Test_Runner_By_Non_Developers.  We sacrifice DRY to keep code together.  All of these things are done in the name of readability.

The Readability Debate

Argument 1: A rich assertion model can unnecessarily complicate a suite of tests particularly when multiple developers are involved.

Rich assertion models make it possible to assert the same condition in a variety of ways, resulting in a lack of consistency.  Readability naturally falls out of a weak assertion model because the guesswork of which form of an assertion is being used is removed.

Argument 2: With a rich model there is no guesswork because assertions are literally spelled out as explicitly as they can be.

Assert.Greater(a, e) doesn’t require a mental context shift from English to parsing an expression.  The spelled-out statement of intent is naturally more readable for developers and non-developers alike.

My Position

I strongly agree with argument 2.  When I’m reading code I derive as much meaning from the method name as I can before examining the arguments.  “Greater” conveys more contextual information than “IsTrue.”  When I see “IsTrue” I immediately need to ask “What’s true?” then delve into an argument which could be anything that returns a boolean value.  In any case I still need to think about what condition is supposed to be true.

NUnit takes expressiveness to another level with its constraint-based assertions.  The table below lists the same assertions as the table above when written as constraint-based assertions.

Category             Constraint-based assertion
Equality/Inequality  Assert.That(a, Is.EqualTo(e))
                     Assert.That(a, Is.Not.EqualTo(e))
                     Assert.That(a, Is.GreaterThan(e))
                     Assert.That(a, Is.LessThanOrEqualTo(e))
Boolean Values       Assert.That(a, Is.True)
                     Assert.That(a, Is.False)
Reference            Assert.That(a, Is.SameAs(e))
                     Assert.That(a, Is.Not.SameAs(e))
Null                 Assert.That(a, Is.Null)
                     Assert.That(a, Is.Not.Null)
e – expected value
a – actual value

Constraint-based assertions are virtually indistinguishable from English.  To me this is about as readable as code can be.

Even the frameworks with a weak assertion model provide multiple ways of accomplishing the same task.  Is it not true that Assert.AreEqual(e, a) is functionally equivalent to Assert.IsTrue(e == a), at least for types whose == operator agrees with Equals?  Is it not also true that Assert.AreNotEqual(e, a) is functionally equivalent to Assert.IsTrue(e != a)?  Since virtually all assertions ultimately boil down to ensuring that some condition is true and throwing an exception when it is not, shouldn’t weak assertion models be limited to little more than Assert.IsTrue(a)?
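To illustrate that last point, here’s a sketch of a toy assertion model built from a single IsTrue primitive.  The IsTrue, Greater, and Check helpers are hypothetical and not part of NUnit or MSTest; the difference between the two styles shows up in the failure messages each one produces.

```csharp
using System;

// Hypothetical primitive: every assertion ultimately reduces to "throw unless true".
void IsTrue(bool condition, string message = "Expected true but was false")
{
    if (!condition) throw new Exception(message);
}

// A named assertion built on that primitive; its name and message carry the intent.
void Greater(int actualValue, int lowerBound) =>
    IsTrue(actualValue > lowerBound, $"Expected {actualValue} to be greater than {lowerBound}");

// Runs an assertion and returns its failure message, or "passed".
string Check(Action assertion)
{
    try { assertion(); return "passed"; }
    catch (Exception ex) { return ex.Message; }
}

int actual = 3, expected = 5;
string weakMessage = Check(() => IsTrue(actual > expected));
string richMessage = Check(() => Greater(actual, expected));

Console.WriteLine(weakMessage); // Expected true but was false
Console.WriteLine(richMessage); // Expected 3 to be greater than 5
```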

Clearly there are other considerations beyond readability when deciding upon a unit test framework but given that much of the power of a given framework is provided by the assertion model it’s among the most important.  To me, an expressive assertion model is just as important as the tools associated with the framework.

Your thoughts?