LINQed Up (Part 3)

This is the third part of a series intended as an introduction to LINQ.  The series should provide a good enough foundation to allow the uninitiated to begin using LINQ in a productive manner.  In this post we’ll take a more detailed look at composing queries through a variety of examples and introduce some new concepts such as deferred execution.

In the previous post we covered some of the common query methods and introduced both method and query syntax.  By the end of this post you should have a solid foundation to begin composing your own queries within your applications.  The examples will use either method or query syntax depending upon their complexity.  Before we see the examples, though, we should take a closer look at query syntax.

Diving Into Query Syntax

As mentioned in the previous post, query syntax allows us to compose LINQ expressions in a SQL-like manner.  This is accomplished through the use of several new or overloaded keywords that the compiler converts into their method syntax equivalents at compile time.

Note: Not every method in the System.Linq.Enumerable class has a query syntax equivalent.

The main keywords used by query syntax are:

  • from
  • where
  • select
  • join
  • orderby [ascending | descending]
  • group by
  • let

Although the purpose of most of these keywords should be pretty clear it’s worth taking a closer look at each of them.

from

The from keyword identifies the sequence that will serve as the primary source for the query.  Every query built using query syntax must begin with a from clause.  By placing the from clause first we get full IntelliSense support on our query syntax queries.  Any type that implements IEnumerable can be used in the from clause.

Some queries will contain multiple from clauses.  The compiler translates the additional from clauses into calls to the SelectMany method, which is used to flatten multiple sequences into one.

where

LINQ query syntax includes a where clause that serves the same basic function as its SQL counterpart.  In the LINQ form we get the benefit of strong typing and IntelliSense.

Note: The criteria provided in the where clause can include any number of .NET operations but may be limited depending on the capabilities of the LINQ provider.

select

The select clause is used to project query results into a new sequence.  Select is capable of returning single values, well-known types, or anonymous types.  Select is the workhorse behind one of LINQ’s most powerful features: transforming data from one structure to another.

join

LINQ is capable of joining multiple sequences together much like SQL can join tables or views.  Part of what makes LINQ so powerful is its ability to join data from disparate sources in a single query.  This allows us to join simple .NET collections to XML or data returned from LINQ to SQL or Entity Framework.

One restriction on joins in LINQ is that the join clause requires an equality comparison.  To eliminate any ambiguity about what can be used, Microsoft introduced the equals keyword to take the place of the == operator within the join clause.

orderby

LINQ allows sorting in query syntax through the use of the orderby keyword.  By default items will be sorted in ascending order but that can be controlled through the ascending and descending keywords.  Multiple items can be sorted by separating them with a comma.

group by

Grouping in LINQ works a bit differently than in SQL.  In SQL we simply specify one or more column names in the group by clause.  LINQ operates by projecting values into a System.Linq.IGrouping<TKey, TElement> that exposes the grouping value through its Key property and contains all of the group members as its elements.  Since IGrouping implements IEnumerable it can be directly projected into another query.  Grouping can be performed upon either a data value or a calculated value.

let

The let keyword is a bit of an oddity.  Unlike the other query syntax keywords, let doesn’t have a direct mapping to a method.  Instead, let is provided as a convenience for declaring query-scoped variables, eliminating the need for repetitive operations within a query.
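As a rough illustration of let, the sketch below parses the publisher segment of an ISBN once and reuses it in both the where and select clauses.  The class name, the sample array, and the 500 threshold are assumptions made purely for this example, and the method syntax version only approximates what the compiler produces.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

static class LetExample
{
    // Hypothetical sample data: ISBN-13 strings borrowed from this post's library.
    static readonly string[] Isbns =
    {
        "978-0-321-56416-0",
        "978-0-596-15364-9",
        "978-1-933-98822-1"
    };

    // Query syntax: let names the parsed publisher segment once so that
    // both the where and select clauses can reuse it.
    public static List<int> PublisherIdsQuery()
    {
        var query = from isbn in Isbns
                    let publisherId = int.Parse(isbn.Split('-')[2])
                    where publisherId > 500
                    select publisherId;
        return query.ToList();
    }

    // Approximate method syntax equivalent: the let value travels
    // alongside the range variable inside an anonymous type.
    public static List<int> PublisherIdsMethod()
    {
        return Isbns
            .Select(isbn => new { isbn, publisherId = int.Parse(isbn.Split('-')[2]) })
            .Where(x => x.publisherId > 500)
            .Select(x => x.publisherId)
            .ToList();
    }

    static void Main()
    {
        Console.WriteLine(string.Join(", ", PublisherIdsQuery()));  // 596, 933
        Console.WriteLine(string.Join(", ", PublisherIdsMethod())); // 596, 933
    }
}
```

Without let, the int.Parse(...) expression would have to be repeated in each clause that needs the value.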

LINQ Examples

Now that we’ve covered how LINQ works, its common methods, and its syntax we should have a good foundation to see some examples of LINQ in action and understand what is happening.  In the following sections we’ll see how all of the pieces we’ve discussed so far fit together to make LINQ a powerful tool for solving a wide variety of set-based problems.

Each of the examples assumes the following data structures and data:

class Author
{
	public Author(string firstName, string lastName)
	{
		FirstName = firstName;
		LastName = lastName;
	}

	public string FirstName { get; set; }
	public string LastName { get; set; }
}

class Book
{
	public Book(int id, string title, string isbn13, int copyrightYear, params Author[] authors)
	{
		ID = id;
		Title = title;
		Isbn13 = isbn13;
		CopyrightYear = copyrightYear;
		Authors = authors;
	}

	public int ID { get; set; }
	public string Title { get; set; }
	public string Isbn13 { get; set; }
	public int CopyrightYear { get; set; }
	public IEnumerable<Author> Authors { get; set; }
}

IEnumerable<Book> GetLibrary()
{
	return new List<Book>()
	{
		new Book(
			1,
			"Essential LINQ",
			"978-0-321-56416-0",
			2009,
			new Author("Charlie", "Calvert"),
			new Author("Dinesh", "Kulkarni")
		),
		new Book(
			2,
			"Programming F#",
			"978-0-596-15364-9",
			2010,
			new Author("Chris", "Smith")
		),
		new Book(
			3,
			"C# 4.0 in a Nutshell",
			"978-0-596-80095-6",
			2010,
			new Author("Joseph", "Albahari"),
			new Author("Ben", "Albahari")
		),
		new Book(
			4,
			"WPF in Action with Visual Studio 2008",
			"978-1-933-98822-1",
			2009,
			new Author("Arlen", "Feldman"),
			new Author("Maxx", "Daymon")
		),
	};
}

Note: I recommend using LINQPad to run these examples. If using LINQPad remember to change the language option to C# Program. If using Visual Studio to run the examples make sure your code file includes using directives for System.Linq and System.Collections.Generic (and System.Xml.Linq for the XML examples).

Starting Simple

This first example illustrates the most basic form of a LINQ statement.  Consider it the “Hello World” example.

var query = from b in GetLibrary()
			select b;

Even in this example we can see several of the LINQ concepts coming into play. First, query is defined using the var keyword indicating type inference. The compiler understands that GetLibrary() returns IEnumerable<Book> and the select clause is projecting instances of Book so in this case the inferred type of query is IEnumerable<Book>. We also see how the from clause defines the range variable and appears before the select clause.

Despite the simplicity of this query, LINQ really isn’t that effective here. This query can be better expressed by simply referring to GetLibrary() since all we’re doing is retrieving each item in the sequence.

Transformations

LINQ includes extension methods for converting sequences to lists, dictionaries, and arrays but it is by no means restricted to those types. Through the select clause (or Select method) we can return either a new instance of a well-known type or define a new anonymous type on the fly.

Transform to Array

var arr = GetLibrary().ToArray();
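The list and dictionary conversions follow the same pattern as ToArray.  The sketch below uses a hypothetical BookStub class and GetBooks() method as a trimmed-down stand-in for Book and GetLibrary() so that it can run on its own.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical, trimmed-down stand-in for the Book class used in this post.
sealed class BookStub
{
    public int ID { get; set; }
    public string Title { get; set; }
}

static class ConversionExample
{
    // Hypothetical stand-in for GetLibrary().
    static IEnumerable<BookStub> GetBooks()
    {
        yield return new BookStub { ID = 1, Title = "Essential LINQ" };
        yield return new BookStub { ID = 2, Title = "Programming F#" };
        yield return new BookStub { ID = 3, Title = "C# 4.0 in a Nutshell" };
    }

    // ToList copies the projected titles into a List<string>.
    public static List<string> Titles()
    {
        return GetBooks().Select(b => b.Title).ToList();
    }

    // ToDictionary takes a key selector and an element selector.
    public static Dictionary<int, string> TitlesById()
    {
        return GetBooks().ToDictionary(b => b.ID, b => b.Title);
    }

    static void Main()
    {
        Console.WriteLine(Titles().Count);  // 3
        Console.WriteLine(TitlesById()[2]); // Programming F#
    }
}
```

Keep in mind that ToDictionary throws an ArgumentException if the key selector produces the same key twice, so it is best suited to values that are known to be unique, such as an ID.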

Transform to XML

To really demonstrate the transformation capabilities we need to see a more complex example. Here we’ll transform the entire structure returned by GetLibrary() into a well-formed XML document with LINQ to XML.

var query = new XElement("library",
	from b in GetLibrary()
	select new XElement("book",
		new XAttribute("title", b.Title),
		new XAttribute("isbn_13", b.Isbn13),
		new XAttribute("copyright", b.CopyrightYear),
		new XElement("authors",
			from a in b.Authors
			select new XElement("author",
				new XText(String.Format("{0}, {1}", a.LastName, a.FirstName))
			)
		)
	)
);

Basic Filtering

Much of LINQ’s real power comes from the operations it can perform against sequences, so what better place to start than with its filtering capabilities?  Prior to LINQ if we wanted to operate against a subset of a sequence we needed to nest potentially complex logic in the body of a loop. With LINQ we can filter the sequence down to only the elements we care about before entering the loop.

var query = from b in GetLibrary()
			where b.ID == 1
			select b;

This example illustrates how easy it is to reduce a sequence to only the elements we care about. Here we’re reducing the full sequence down to only those elements with an ID of 1. Although it isn’t necessarily obvious, this query introduces a lambda expression. Let’s look at the same query using method syntax to bring out the lambda.

var query = GetLibrary().Where(b => b.ID == 1);

In either case we’ll get an IEnumerable<Book> that contains only a single item. In situations like this where we know we’ll only get one item, or we just want the first no matter how many items are returned, we can turn to the First() or FirstOrDefault() extension methods. Each of these methods has an overload that accepts a lambda expression to filter the sequence it is acting upon. We can simplify the above queries to just get the single book we care about:

var query = GetLibrary().FirstOrDefault(b => b.ID == 1);
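The choice between First() and FirstOrDefault() matters when nothing matches the filter.  The following sketch, which uses a hypothetical array of titles rather than GetLibrary(), shows the difference: FirstOrDefault() returns the default value for the type (null for reference types) while First() throws.

```csharp
using System;
using System.Linq;

static class FirstExample
{
    // Hypothetical sample data: titles borrowed from this post's library.
    static readonly string[] Titles =
    {
        "Essential LINQ",
        "Programming F#",
        "C# 4.0 in a Nutshell"
    };

    // FirstOrDefault returns default(T), which is null for a string,
    // when no element satisfies the predicate.
    public static string FindOrNull(string term)
    {
        return Titles.FirstOrDefault(t => t.Contains(term));
    }

    // First throws InvalidOperationException when no element matches.
    public static bool FirstThrows(string term)
    {
        try
        {
            Titles.First(t => t.Contains(term));
            return false;
        }
        catch (InvalidOperationException)
        {
            return true;
        }
    }

    static void Main()
    {
        Console.WriteLine(FindOrNull("F#"));              // Programming F#
        Console.WriteLine(FindOrNull("Haskell") == null); // True
        Console.WriteLine(FirstThrows("Haskell"));        // True
    }
}
```

As a rule of thumb, reach for FirstOrDefault() when a missing item is an expected outcome and First() when a missing item indicates a bug.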

Filtering is also not limited to a single value nor is it restricted to values contained directly in a sequence.

var query = from b in GetLibrary()
			where b.CopyrightYear == 2009 && b.Authors.Count() > 1
			select b;

The example above reduces the sequence to only those books with a copyright year of 2009 and multiple authors. While the collection of authors is contained within each book the number of authors is determined by calling the Count() extension method.

Deferred Execution

Having seen a few queries and the role that lambda expressions play in them we have a perfect opportunity to explore deferred execution. In each of the above examples we define a variable named query and set it to the query. The query is not actually executed until it is enumerated. That is, even though we have defined the query we don’t actually determine what items make up the sequence until we force enumeration of the query by using it in a loop or calling a method that causes it to enumerate. To observe the effect consider the next example:

var title = "LINQ";

var query = from b in GetLibrary()
			where b.Title.Contains(title)
			select b;

title = "F#";

var book = query.FirstOrDefault();

The query references a local string variable named title that was initialized to “LINQ” prior to the definition of the query. We then change the value of title to “F#” before retrieving the first item from the new sequence. Because the value of the variable isn’t resolved in the query until the query is executed we get the book “Programming F#” rather than “Essential LINQ.”

Deferred execution is an important concept to understand in LINQ to Objects and critical for LINQ to SQL and Entity Framework.
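When a query’s results need to be pinned down at a particular moment, calling a method such as ToList() forces immediate execution and stores the results.  The sketch below, built on a hypothetical array of titles, contrasts a deferred query with a snapshot taken before the captured variable changes.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

static class DeferredExample
{
    // Hypothetical sample data: titles borrowed from this post's library.
    static readonly string[] Titles =
    {
        "Essential LINQ",
        "Programming F#",
        "C# 4.0 in a Nutshell"
    };

    // Returns { first title from the deferred query, first title from the snapshot }.
    public static string[] Run()
    {
        var term = "LINQ";

        // Nothing is evaluated yet; the lambda captures the variable term itself.
        IEnumerable<string> deferred = Titles.Where(t => t.Contains(term));

        // ToList enumerates immediately, while term is still "LINQ".
        List<string> snapshot = deferred.ToList();

        term = "F#";

        // The deferred query runs now and sees the updated value of term.
        return new[] { deferred.First(), snapshot.First() };
    }

    static void Main()
    {
        var results = Run();
        Console.WriteLine(results[0]); // Programming F#
        Console.WriteLine(results[1]); // Essential LINQ
    }
}
```

The trade-off is that the snapshot no longer reflects later changes to the source or to captured variables, and it holds all of its results in memory.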

Sorting

Sorting with LINQ is pretty straightforward. Results can be sorted on one or more values in ascending or descending order.

Single Value Ascending

var query = from b in GetLibrary()
			orderby b.Title
			select b.Title;

Single Value Descending

var query = from b in GetLibrary()
			orderby b.Title descending
			select b.Title;

Multiple Values

var query = from b in GetLibrary()
			orderby b.CopyrightYear descending, b.Title
			select new { b.ID, b.Title, b.CopyrightYear };

The same query can be expressed in method syntax as follows:

var query = GetLibrary()
				.OrderByDescending (b => b.CopyrightYear)
				.ThenBy (b => b.Title)
				.Select(b => new
					{
						b.ID,
						b.Title,
						b.CopyrightYear
					}
				);

Joins

Sequences from one or more data sources may be joined together in a single query.  The data sources don’t even need to be of the same type.

LINQ supports both inner and outer joins. To demonstrate the join capabilities we’ll introduce an XElement that we can join to using part of each book’s ISBN.

Note: The XElement class is part of LINQ to XML and can be found in the System.Xml.Linq namespace. The LINQ to XML classes offer many advantages over the traditional XML classes in that they were specifically designed for queries and composition.

XElement GetPublishers()
{
	return new XElement("publishers",
						new XElement("publisher",
									 new XAttribute("id", 321),
									 new XText("Addison-Wesley")),
						new XElement("publisher",
									 new XAttribute("id", 933),
									 new XText("Manning")),
						new XElement("publisher",
									 new XAttribute("id", 596),
									 new XText("O'Reilly")),
						new XElement("publisher",
									 new XAttribute("id", 7653),
									 new XText("Tor Books")),
						new XElement("publisher",
									 new XAttribute("id", 312),
									 new XText("St. Martin's Griffin")));
}

Inner Join

In the next example we retrieve all of the publisher elements from our XML document. Since we don’t have the publisher ID available directly within the Book class we extract it from the ISBN and specify that the extracted publisher id is equal to the id attribute of each element. We then project a new anonymous type that includes values from both sequences.

var query = from b in GetLibrary()
			join p in GetPublishers().Descendants("publisher") on int.Parse(b.Isbn13.Split('-')[2]) equals int.Parse(p.Attribute("id").Value)
			select new { PublisherName = p.Value, b.Title };
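For comparison, an inner join can also be written with the Join extension method, which takes the inner sequence, a key selector for each side, and a result selector.  The sketch below is self-contained, so it uses a hypothetical BookStub class and a shortened publisher document in place of GetLibrary() and GetPublishers().

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Xml.Linq;

// Hypothetical, trimmed-down stand-in for the Book class used in this post.
sealed class BookStub
{
    public string Title { get; set; }
    public string Isbn13 { get; set; }
}

static class JoinExample
{
    static readonly BookStub[] Books =
    {
        new BookStub { Title = "Essential LINQ", Isbn13 = "978-0-321-56416-0" },
        new BookStub { Title = "Programming F#", Isbn13 = "978-0-596-15364-9" }
    };

    // Shortened version of the publishers document.
    static readonly XElement Publishers = new XElement("publishers",
        new XElement("publisher", new XAttribute("id", 321), "Addison-Wesley"),
        new XElement("publisher", new XAttribute("id", 596), "O'Reilly"),
        new XElement("publisher", new XAttribute("id", 7653), "Tor Books"));

    public static List<string> Run()
    {
        return Books
            .Join(
                Publishers.Elements("publisher"),       // inner sequence
                b => int.Parse(b.Isbn13.Split('-')[2]), // key taken from each book
                p => (int)p.Attribute("id"),            // key taken from each publisher
                (b, p) => p.Value + ": " + b.Title)     // result for each matching pair
            .ToList();
    }

    static void Main()
    {
        foreach (var line in Run())
        {
            Console.WriteLine(line);
        }
        // Addison-Wesley: Essential LINQ
        // O'Reilly: Programming F#
    }
}
```

Tor Books has no matching book, so it does not appear in the output; that is the defining behavior of an inner join.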

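Outer Join

Outer joins are only slightly more complicated than inner joins. In addition to specifying the join values we need to project the results of the join into another sequence with the into keyword, then select from that sequence using the DefaultIfEmpty extension method. For a publisher with no matching books, DefaultIfEmpty supplies a single null so the publisher still appears in the results with its Book value set to null.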

var query = from p in GetPublishers().Descendants("publisher")
			let publisherID = int.Parse(p.Attribute("id").Value)
			join b in GetLibrary() on publisherID equals int.Parse(b.Isbn13.Split('-')[2])
				into bookPublishers
			from bp in bookPublishers.DefaultIfEmpty()
			select new
			{
				PublisherID = publisherID,
				PublisherName = p.Value,
				Book = bp
			};
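Because the unmatched side of an outer join comes back as null, code that consumes the results needs to check for it.  The sketch below shows one way to do that; the BookStub class, the shortened data, and the "(no books)" placeholder text are all assumptions made for the example.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Xml.Linq;

// Hypothetical, trimmed-down stand-in for the Book class used in this post.
sealed class BookStub
{
    public string Title { get; set; }
    public string Isbn13 { get; set; }
}

static class OuterJoinExample
{
    static readonly BookStub[] Books =
    {
        new BookStub { Title = "Essential LINQ", Isbn13 = "978-0-321-56416-0" },
        new BookStub { Title = "Programming F#", Isbn13 = "978-0-596-15364-9" }
    };

    // Shortened version of the publishers document.
    static readonly XElement Publishers = new XElement("publishers",
        new XElement("publisher", new XAttribute("id", 321), "Addison-Wesley"),
        new XElement("publisher", new XAttribute("id", 596), "O'Reilly"),
        new XElement("publisher", new XAttribute("id", 7653), "Tor Books"));

    public static List<string> Run()
    {
        var query = from p in Publishers.Elements("publisher")
                    join b in Books
                        on (int)p.Attribute("id") equals int.Parse(b.Isbn13.Split('-')[2])
                        into matches
                    from m in matches.DefaultIfEmpty()
                    // m is null when the publisher has no matching book.
                    select p.Value + ": " + (m == null ? "(no books)" : m.Title);

        return query.ToList();
    }

    static void Main()
    {
        foreach (var line in Run())
        {
            Console.WriteLine(line);
        }
        // Addison-Wesley: Essential LINQ
        // O'Reilly: Programming F#
        // Tor Books: (no books)
    }
}
```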

SelectMany

SelectMany allows us to flatten multiple sequences. Using our Books data we can flatten the structure to extract a single sequence containing all the authors.

var query = from b in GetLibrary()
			from a in b.Authors
			orderby a.LastName
			select a;
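The same flattening can be written directly against the SelectMany extension method.  The sketch below uses a hypothetical BookStub class with plain string author names; the method syntax version produces the same results as the query but is not necessarily the exact code the compiler generates.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical, trimmed-down stand-in for the Book class used in this post.
sealed class BookStub
{
    public string Title { get; set; }
    public string[] Authors { get; set; }
}

static class SelectManyExample
{
    static readonly BookStub[] Books =
    {
        new BookStub { Title = "Essential LINQ", Authors = new[] { "Calvert", "Kulkarni" } },
        new BookStub { Title = "Programming F#", Authors = new[] { "Smith" } }
    };

    // Query syntax: the second from clause walks each book's authors.
    public static List<string> AuthorsQuery()
    {
        var query = from b in Books
                    from a in b.Authors
                    orderby a
                    select a;
        return query.ToList();
    }

    // Method syntax: SelectMany flattens the per-book author arrays into one sequence.
    public static List<string> AuthorsMethod()
    {
        return Books
            .SelectMany(b => b.Authors)
            .OrderBy(a => a)
            .ToList();
    }

    static void Main()
    {
        Console.WriteLine(string.Join(", ", AuthorsQuery()));  // Calvert, Kulkarni, Smith
        Console.WriteLine(string.Join(", ", AuthorsMethod())); // Calvert, Kulkarni, Smith
    }
}
```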

Grouping

Grouping query results by a value is fairly straightforward. With grouping we need to identify the source sequence, specify the grouping value, then project the grouped results into a sequence of IGrouping objects. In this example we’ll see how to group books by copyright year.

var query = from b in GetLibrary()
			group b by b.CopyrightYear into y
			select y;
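To make use of a grouped query we typically loop over the groups, reading each group’s Key and then enumerating its members.  The following sketch shows that pattern with a hypothetical BookStub class; the orderby on the key is added here only to make the output order explicit.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical, trimmed-down stand-in for the Book class used in this post.
sealed class BookStub
{
    public string Title { get; set; }
    public int CopyrightYear { get; set; }
}

static class GroupingExample
{
    static readonly BookStub[] Books =
    {
        new BookStub { Title = "Essential LINQ", CopyrightYear = 2009 },
        new BookStub { Title = "Programming F#", CopyrightYear = 2010 },
        new BookStub { Title = "C# 4.0 in a Nutshell", CopyrightYear = 2010 },
        new BookStub { Title = "WPF in Action with Visual Studio 2008", CopyrightYear = 2009 }
    };

    public static List<string> TitlesByYear()
    {
        var query = from b in Books
                    group b by b.CopyrightYear into y
                    orderby y.Key
                    select y;

        var lines = new List<string>();
        foreach (var yearGroup in query)
        {
            // Key holds the grouping value; enumerating the group yields its members.
            var titles = string.Join("; ", yearGroup.Select(b => b.Title));
            lines.Add(yearGroup.Key + ": " + titles);
        }
        return lines;
    }

    static void Main()
    {
        foreach (var line in TitlesByYear())
        {
            Console.WriteLine(line);
        }
        // 2009: Essential LINQ; WPF in Action with Visual Studio 2008
        // 2010: Programming F#; C# 4.0 in a Nutshell
    }
}
```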

Next Steps

Having read through and (hopefully) tried the examples, you should have a good understanding of how to implement LINQ in your projects. While LINQ is great for simplifying code it does come at a price. In the next post we’ll examine some of the performance implications of using LINQ and look at how to optimize some queries.