C#

LINQed Up (Part 2)

This is the second part of a series intended as an introduction to LINQ.  The series should provide a good enough foundation to allow the uninitiated to begin using LINQ in a productive manner.  In this post we’ll look at some of the common query methods in Enumerable and what LINQ looks like.

In the previous post we defined LINQ and discussed some of the features of the .NET Framework that make LINQ possible.  This post will build upon that foundation.  By the end of this post you should understand the basics of LINQ and understand how it can fit into your development toolbox.

Common Query Methods

The Enumerable class in the System.Linq namespace is central to LINQ’s functionality in that it defines all of the core extension methods that make up LINQ.  There are tons of methods in the Enumerable class but for the purpose of this post we’ll focus on just a few.  Most of the methods here have at least one overload.  We’ll examine each method taking a more generalized approach to introduce the methods and divide them into a few categories.

Sequence Operations

The methods in this section work across an entire sequence.

Method Name Description
Where Primary method used to filter a sequence.
Select Primary method used to project results from the query
Join Allows combining sequences into a single query.  Similar to a SQL join.
OrderBy/OrderByDescending Sorts a sequence.
All Indicates whether every element in a sequence meet the specified criteria.
Any Indicates whether any element in a sequence meet the specified criteria.
Count Gets the number of elements in a sequence.

Element Retrieval Operations

Each of the methods below allow retrieval of a specific element in a sequence and have two forms.  The basic form will throw an exception if the element cannot be found while the OrDefault version will return default(T).

Method Name Description
First/FirstOrDefault Retrieves the first element in a sequence.
ElementAt/ElementAtOrDefault Retrieves the element at the specified position in a sequence.
Last/LastOrDefault Retrieves the last element in a sequence.

What Does LINQ Look Like?

LINQ statements can be written in either of two forms: query syntax and method (dot) syntax.  While these forms are functionally equivalent they each have their place and the decision about which form to use will often come down to readability and is typically situational.  There is no reason to use one form exclusively.

Each of the following examples will use a sequence that contains the first ten numbers in the Fibonacci sequence.

var fibonacci = new int[] { 1, 1, 2, 3, 5, 8, 13, 21, 34, 55 };

Where We’re Coming From

Before looking at each LINQ syntax it’s helpful to see an example of what LINQ aims to address.  Consider the following code that builds a sequence of the even numbers found in the Fibonacci sequence defined above:

var evenNumbers = new List();
foreach(var i in fibonacci)
{
	if(i % 2 == 0)
	{
		evenNumbers.Add(i);
	}
}

This code, while not particularly complex involves a number of steps to accomplish a relatively simple task.  First we define a List<int> to hold the even numbers.  We then enter a loop where we check whether each value is even before adding it to the list.  This is imperative programming at its finest.  The focus of the code is on how rather than what.  Of the eight lines of code only one line is really relevant to the problem of finding even numbers in the source sequence.  LINQ addresses the imperative nature of this code by providing a functional framework to let us focus on the what rather than the how.

Method Syntax

Method syntax is a fluent interface that allows building queries using method calls.  As its name implies, method syntax calls the LINQ extension methods directly passing lambda expressions as parameters.  Compare the code below with the traditional example above.

var evenNumbers = fibonacci.Where(i => i % 2 == 0);

In this example we remove all of the imperative code and replace it with a single method call and let LINQ do all of the heavy lifting.  When the Where method executes it calls the supplied lambda expression for each value in the source sequence.  The lambda expression must return a boolean value that informs the Where method whether or not the current value meets the criteria.

Method syntax tends to be more readable for simple queries such as this example.

Query Syntax

Alternatively, query syntax introduces a SQL-like syntax for writing queries.  Here is the same example repeated using query syntax:

var evenNumbers =
    from i in fibonacci
    where i % 2 == 0
    select i;

Query syntax tends to be more verbose than method syntax but is well suited for composing more complex queries such as those that use joins.

Notice the SQL like structure of the above query. One important difference to note between SQL and query syntax is that the from and where clauses are in the opposite order as they would be in a SQL query.  In fact, query syntax requires the from clause to appear first.  By placing the from clause at the beginning of the query we get all of the benefits of IntelliSense within Visual Studio.  Ultimately though, query syntax is just some syntactic sugar.  When the code is compiled any queries using query syntax are parsed and converted to method syntax.

Next Steps

Now that we’ve seen some of the common LINQ methods and understand how to compose LINQ statements with both method and query syntax we can look at how to use the various methods.  The next post will go in-depth showing how to compose LINQ statements using the common methods and introduce some new concepts such as deferred execution.

LINQed Up (Part 1)

This is the first of a series intended as an introduction to LINQ. The series should provide a good enough foundation to allow the uninitiated to begin using LINQ in a productive manner. In this post we’ll look at what LINQ is and how it works.

My director and I were recently talking about questions he has been asking candidates for senior level .NET development positions. He mentioned that he has been asking the candidates to describe LINQ and some situations where it would be useful.  The response from each of the candidates has ranged from a blank stare to something along the lines of “it means you don’t need to write SQL anymore.”  Those responses are the inspiration for this series.

The blank stares are discouraging but the statements that constrain LINQ to a very specific use case illustrate a fundamental lack of understanding of the technology. It is true that LINQ can greatly simplify interaction with a database through LINQ to SQL or Entity Framework but those are only a small part of what LINQ can do.  In fact, the majority of the places I’ve used LINQ have no database interaction whatsoever.  LINQ has so many applications beyond database access that I find myself using at least some part of it in most of my projects and often in some unexpected places.

Let’s start with a trip through the basics.

What is LINQ?

Language INtegrated Query (LINQ) was introduced with the .NET Framework v3.5. MSDN has this to say about it:

LINQ is a set of extensions to the .NET Framework that encompass language-integrated query, set, and transform operations. It extends C# and Visual Basic with native language syntax for queries and provides class libraries to take advantage of these capabilities.

Although the description is accurate I think the language regarding “native language syntax for queries” is what leads people to mistakenly believe that the only use for LINQ is with a database. After all, we’ve been conditioned to think that queries are database operations.  That said, I offer an alternative definition:

LINQ is a set of extensions to the .NET Framework that encompass language-integrated query, set, and transform operations. It extends C# and Visual Basic with native language syntax for querying data from a variety of sources and provides class libraries to take advantage of these capabilities.

The idea that LINQ makes it possible to query data from a variety of sources is critically important to using it to its full potential. It means that LINQ is not constrained to working with databases but actually comes in several flavors:

  • LINQ to Objects
  • LINQ to XML
  • LINQ to SQL (and Entity Framework)
  • LINQ to DataSets
  • LINQ to Twitter
  • etc…

Essentially any data source can be queried with LINQ as long as there’s a corresponding provider. In addition to providing a common query language for disparate data sources these sources can be queried in a unified manner through the use of joins and subqueries.  LINQ also gives us some really powerful transformation capabilities.  Essentially LINQ is a domain specific language for working with sets of data.

How Does LINQ Work?

LINQ is made possible by several additions to the .NET Framework and in order to truly appreciate its power and elegance we need to first look at:

  • Extension Methods
  • Delegates/Lambda Expressions
  • Type Inference
  • Anonymous Types

Since these are all features of the .NET framework and/or compiler their usage is not restricted to LINQ.  Most of them are actually quite useful outside of LINQ as well.

Extension Methods

Central to the functionality of LINQ are extension methods. Extension methods allow adding capabilities to types without needing to derive a new type. They must be static methods within a static class. The type being extended must be the first parameter of the method and is modified using an overload of the this keyword.  Because extension methods add capabilities to an existing type without relying on inheritance we can even write extension methods for sealed classes.

LINQ introduces the class System.Linq.Enumerable that contains extension methods that extend the IEnumerable interface.  Microsoft could have added the signatures to the interface but that would be a breaking change and everything that previously built against IEnumerable would no longer compile until the implementations of those were provided.  By using extension methods Microsoft was able to introduce all of the LINQ query methods into the framework without breaking anything.

Activating LINQ is merely a matter of importing the System.Linq namespace.  Once the namespace is imported the extension methods are available to any type that implements IEnumerable including lists, arrays, and even strings.  There’s even a trick for using the non-generic IEnumerable with LINQ that we’ll discuss in a later post.

Delegates/Lambda Expressions

While extension methods provide the methods that make LINQ possible delegates make them work.  Most of the extension methods in the Extensible class accept one or more delegates as parameters.  Delegates have always been available in .NET but their usage and syntax has evolved over the years.

Before C# 2.0 the only way to use delegates was to have a named method.  C# 2.0 introduced anonymous methods using the delegate keyword.

Handling an event with the delegate syntax

var t = new System.Timers.Timer(1000);
t.Elapsed += delegate(object sender, System.Timers.ElapsedEventArgs ea) { Console.WriteLine("Timer elapsed"); };

t.Start();
System.Threading.Thread.Sleep(10000);
t.Stop();

Notice how the event is handled by an inline anonymous method rather than a separate named method.  LINQ makes heavy use of delegates to control query behavior. Having to include a full method signature to pass to a method would make LINQ statements virtually unreadable so clearly something else was needed. This is where lambda expressions come in to play.

C# 3.0 added support for lambda expressions. Lambda expressions are functionally equivalent to the delegate syntax above but are more developer friendly. In C# lambda expressions use the => (goes to) operator. The left side contains the list of parameters and the right side contains the method body.

Handling an event with a lambda expression

var t = new System.Timers.Timer(1000);
t.Elapsed += (s, ea) => Console.WriteLine("Timer elapsed");

t.Start();
System.Threading.Thread.Sleep(10000);
t.Stop();

In both examples the timer’s elapsed event is handled by an anonymous method and both handle the event exactly the same way but in the lambda example we have the much more concise and easier to read syntax.  The key difference between the traditional delegate syntax and a lambda expression is the lack of any type information in the parameter list of the lambda expression.  This lack of type information is a great segue into the next technology important to LINQ: type inference.

Type Inference

Type inference allows the compiler to determine the type of a variable, return type, or generic type. By letting the compiler do its job with type inference we can remove a lot of the explicit nature of type identification. Type inference gives us the ability to use anonymous types and use the var keyword to declare variables (and is required to use anonymous types).

Using the var keyword to declare variables is the subject of debate. One side is opposed to its use saying that code is too ambiguous whereas the other side likes the simplicity and convenience of it. I fall into the later group because I’ve found that as I’m first developing something I may change variable or return types multiple times as the design is flushed out.  By using the var keyword I typically only have to change the type on one place rather.  The var keyword is also required when using anonymous types.  If there’s ever any question about what type is being resolved, just hover over var in Visual Studio.

Don’t confuse use of the var keyword with the dynamic keyword in .NET 4.0 or JavaScript’s var. Variables declared with the var keyword are still strongly typed, we’re just letting the compiler figure out what the type really is.

Anonymous Types

Finally, we have anonymous types. Anonymous types are dynamically defined types with no formal definition outside of their usage.  At compile time the compiler will generate a read-only type based on the inline definition of the type.  The type name is not known until compile time and the generated type name is not valid within C# so the only way to declare a variable of an anonymous type is through the var keyword described above. Although they’re not required to use LINQ anonymous types add a lot of capabilities for projecting results from a query.

Creating an anonymous type with two properties

var anon = new { IntegerValue = 1, StringValue = "A String" };

Due to the way anonymous types are defined there are some restrictions on their use. Although there are some ways around this the rule of thumb is that anonymous types can only be used within the scope where they are declared.

Next Steps

We’ve covered what LINQ is and the main pieces of the .NET Framework make it possible.  In the next post we’ll look at the common query methods and how to construct queries.

LINQ: IEnumerable to DataTable

Over the past several months I’ve been promoting LINQ pretty heavily at work.  Several of my coworkers have jumped on the bandwagon and are realizing how much power is available to them.

This week two of my coworkers were working on unrelated projects but both needed to convert a list of simple objects to a DataTable and asked me for an easy way to do it.  LINQ to DataSet provides wonderful functionality for exposing DataTables to LINQ expressions and converting the data into another structure but it doesn’t have anything for turning a collection of objects into a DataTable.  Lucky for us LINQ makes this task really easy.

First we need to use reflection to get the properties for the type we’re converting to a DataTable.

var props = typeof(MyClass).GetProperties();

Once we have our property list we build the structure of the DataTable by converting the PropertyInfo[] into DataColumn[].  We can add each DataColumn to the DataTable at one time with the AddRange method.

var dt = new DataTable();
dt.Columns.AddRange(
  props.Select(p => new DataColumn(p.Name, p.PropertyType)).ToArray()
);

Now that the structure is defined all that’s left is to populate the DataTable.  This is also trivial since the Add method on the Rows collection has an overload that accepts params object[] as an argument.  With LINQ we can easily build a list of property values for each object, convert that list to an array, and pass it to the Add method.

source.ToList().ForEach(
  i => dt.Rows.Add(props.Select(p =>; p.GetValue(i, null)).ToArray())
);

That’s all there is to it for collections of simple objects.  Those familiar with LINQ to DataSet might note that the example doesn’t use the CopyToDataTable extension method.  The main reason for adding the rows directly to the DataTable instead of using CopyToDataTable is that we’d be doing extra work.  CopyToDataTable accepts IEnumerable but constrains T to DataRow.  In order to make use of the extension method (or its overloads) we still have to iterate over the source collection to convert each item into a DataRow, add each row into a collection, then call CopyToDataTable with that collection.  By adding the rows directly to the DataTable we avoid the extra step altogether.

We can now bring the above code together into a functional example. To run this example open LINQPad, change the language selection to C# Program, and paste the code into the snippet editor.

class MyClass
{
  public Guid ID { get; set; }
  public int ItemNumber { get; set; }
  public string Name { get; set; }
  public bool Active { get; set; }
}

IEnumerable<MyClass> BuildList(int count)
{
  return Enumerable
    .Range(1, count)
    .Select(
      i =>
      new MyClass()
      {
        ID = Guid.NewGuid(),
        ItemNumber = i,
        Name = String.Format("Item {0}", i),
        Active = (i % 2 == 0)
      }
    );
}

DataTable ConvertToDataTable<TSource>(IEnumerable<TSource> source)
{
  var props = typeof(TSource).GetProperties();

  var dt = new DataTable();
  dt.Columns.AddRange(
    props.Select(p => new DataColumn(p.Name, p.PropertyType)).ToArray()
  );

  source.ToList().ForEach(
    i => dt.Rows.Add(props.Select(p => p.GetValue(i, null)).ToArray())
  );

  return dt;
}

void Main()
{
  var dt = ConvertToDataTable(
    BuildList(100)
  );

  // NOTE: The Dump() method below is a LINQPad extension method.
  //       To run this example outside of LINQPad this method
  //       will need to be revised.

  Console.WriteLine(dt.GetType().FullName);
  dt.Dump();
}

Of course there are other ways to accomplish this and the full example has some holes but it’s pretty easy to expand. An obvious enhancement would be to rename the ConvertToDataTable method and change it to handle child collections and return a full DataSet.

What is this?

Recently I’ve been updating one of our older utilities to .NET 4.  A few days ago I stumbled across this line of C#:

if(this == null) return null;

I was dumbfounded.  When would that ever evaluate to true?  Worse yet, why was it repeated in two other places?

Out of curiosity (read: late night boredom) I did some research to see if there’s ever a case where the condition would be met and found a good discussion over on Stack Overflow.  There apparently are a few cases where this == null could actually be true:

  1. Overload the == operator to explicitly return true when comparing to null.
  2. Pass this to a base constructor as part of a closure.

Neither of these cases applied to this code.  We weren’t overloading the == operator and we certainly weren’t using it in a closure let alone a closure being passed to a base constructor.  The second case has apparently been fixed for .NET 4 so it definitely wouldn’t apply with the changes I was making.

As part of the Stack Overflow discussion Eric Lippert provided an interesting comment about why the C# compiler doesn’t complain about checking for this == null.   He basically says that the compiler doesn’t complain because they didn’t think about it because it’s obviously wrong.  So for those wondering, yes, I eliminated all three instances of this code from the utility.