Clean Code, Evolved

Bob Martin’s 2008 book, Clean Code, is considered by many to be one of the “must read” books for software developers, and for good reason: The guidelines discussed in the book aim to decrease long-term software maintenance costs by improving code readability. One of the most beautiful aspects of Clean Code is that although it was written with an emphasis on Java development, the guidelines are applicable to software development in general.

Clean Code was written to combat the plague of unmaintainable code that creeps into software projects. You know the kind: intertwined dependencies, useless comments, poor names, long functions, functions with side-effects, and so on. Each of these things makes code difficult to read, but some are more nefarious because they also make code fragile and difficult to reason about. The end result is code that is difficult and expensive to maintain or extend.

In March I had the good fortune to speak at Nebraska Code Camp. I was really excited about the way the schedule worked out because I was able to sit in on Cory House’s talk about Clean Code. As Cory walked through the points I remembered reading the book and pondered its impact on how I write code. Eventually, my thoughts drifted to the climate that necessitated such a book and its continued relevance today. In the days following Cory’s talk, I reread Clean Code and further developed my thoughts on these subjects. (Confession: I skipped the chapters on successive refinement, JUnit internals, and refactoring SerialDate this time around.) What I came to realize is that while the Clean Code guidelines are an important step toward improving code quality, many of them simply identify deficiencies in our tools and describe ways to avoid them. Several of the guidelines gently nudge us toward functional programming but they stop short of fixing the problems, instead relying on programmer discipline to write cleaner code.

There is no doubt that understanding and embracing the Clean Code guidelines leads to higher code quality but relying on developer discipline to enforce them isn’t enough. We need to evolve Clean Code.

One of the key ways we can evolve Clean Code is by being more pragmatic about language selection. Rather than continuing to use languages that necessitate these guidelines because “we’re a {language} shop,” shouldn’t we instead favor languages that make such violations difficult if not impossible? In many cases, functional languages such as F# do just that.

For this discussion I’m going to ignore the more subjective guidelines such as using descriptive identifiers and comment quality. These are things that can’t really be addressed at a language level so I don’t believe that discussing them here is of much value. Instead, I’ll cover several of the more concrete guidelines, emphasizing those that address code structure and predictability. To illustrate the points I’ll use a combination of C# and F#.

Denote Code Blocks with Consistent Indentation

One of the easiest things we can do to improve code readability is to consistently indent code blocks. This may seem like an obvious guideline but I’ve had to maintain code where different developers had different tab configurations in Visual Studio. Some developers were set up to use tabs while others used spaces. Even when we recognized the problem and updated the IDE settings, the damage to the code had already been done. Arguably worse is when lines are arbitrarily indented without concern for logical context, as I often encountered on a previous project.

The following C# code is perfectly valid but it’s difficult to read due to inconsistent indentation:

public byte[] EncodeFileContents(Encoding encoding, string fileName)
{
  string content;
using (var fs = File.OpenText(fileName))
  {
content = fs.ReadToEnd();
}
  var bytes = encoding.GetBytes(content);
    return bytes;
	}

This function is short but it’s nearly impossible to discern where code blocks begin and end just by glancing at it. The curly braces look like they’re helping but at a glance I’d be inclined to think the function ends on line 7 rather than line 10.

In F#, these concerns disappear because the guideline is actively enforced by the compiler. Rather than relying on syntactic tokens such as curly braces or other keywords to denote code blocks, F# uses whitespace. The rule is that each nested block must be indented beyond the line that opens the block. Here’s the same function ported directly to F# (I’ve intentionally not taken advantage of several language features for this example):

let encodeFileContents (encoding : Encoding) (fileName : string) =
  let content =
    using (File.OpenText fileName)
          (fun fs -> fs.ReadToEnd())
  let bytes = encoding.GetBytes content
  bytes

In the preceding example, the hierarchical structure makes it clear where each code block begins and ends. F# takes this a step further and makes a ruling on the age-old spaces vs tabs debate by requiring spaces!

Keep Functions Small

Bob Martin didn’t provide any numbers or point to any studies to justify this guideline when he wrote the book, nor will I here. But, like Bob, I’ve learned from experience that more code means more opportunity for error and that smaller functions are generally easier to read, reason about, and maintain than long ones.

F# can’t force you to keep your functions small, but as a functional-first language its syntax and features actively promote small functions. Let’s rewrite the encodeFileContents function as more idiomatic F# and observe the difference.

let encodeFileContents (encoding : Encoding) fileName =
  using (File.OpenText fileName) (fun fs -> fs.ReadToEnd())
  |> encoding.GetBytes

This version accepts the same parameters and invokes the same functions yet it is nearly 40% smaller. We achieved this by taking advantage of several language features: type inference, pipelining, and implicit return values. You can see the type inference on the first line, where we’ve omitted the type annotation for the fileName parameter. The pipelining change involved replacing the intermediate bindings (F#’s approximation of variables) with the forward pipelining operator (|>), which sends the result of the expression on the left to the function on the right as the final argument. Finally, the lack of a return keyword indicates the implicit return of the final expression’s value.
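As a quick illustration of the pipelining piece (a minimal sketch, not from the original function), these two bindings are equivalent:

```fsharp
// The forward pipelining operator sends the value on the left to the
// function on the right as its final argument, so these are equivalent.
let direct = String.length "clean"
let piped = "clean" |> String.length
```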

Top-Down Coding

A major component of code maintainability is code discoverability; keeping related definitions near each other makes them easier to find. Tools like Visual Studio’s Go To Definition or ReSharper’s Go To Implementation alleviate the problem of code separation but they can be jarring and often force you out of context. I sometimes find that returning to the original context can be even more jarring as it often leads me to an internal dialog that goes something like this: Which file was that function in? Was it above or below this one? Hmmm, maybe in here? Screw it. CTRL+-, CTRL+-, CTRL+-, CTRL+-, CTRL+-, CTRL+-, CTRL+-, CTRL+-. Oh, there it is. Now, what was I looking for again?

Clean Code recommends that code be structured in a manner that resembles a top-down narrative: every function should be followed by those at the next level of abstraction; that is, functions are organized so that major concepts appear before their dependencies. This rule can be difficult to follow in C# because C# places few restrictions on where functions are defined. As long as one function is visible to another, it can be invoked no matter where it appears in the code. This is compounded by the fact that C# code files are typically organized hierarchically by namespace.

F# partially enforces this guideline, albeit not quite the way described in the book. Rather than placing functions at lower levels of abstraction after functions at higher levels, the F# compiler evaluates code from top to bottom, requiring that things be defined before they’re used. This applies across the entire project so it more or less forces you to group code by abstraction from low to high. The structure also allows the compiler to make more assumptions about your code, and it lets you start at the end of the project for a high-level overview, then work your way back toward the details.
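As a minimal sketch (with hypothetical helper names), the top-to-bottom rule means a low-level function must appear before the higher-level function that uses it:

```fsharp
// Low-level helper: must be defined first.
let normalize (s : string) = s.Trim().ToLowerInvariant()

// Higher-level function: can refer to 'normalize' because it appears
// earlier. Swapping the order of these two definitions would not compile.
let equalIgnoringCase (a : string) (b : string) =
  normalize a = normalize b
```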

Don’t Repeat Yourself

One of my biggest pet peeves in software development is duplicated code. I once worked on a project where a single file had the same 200 (yes, 200) lines of code duplicated three times. Naturally, I refactored the monstrosity shortly after I discovered it but the code should never have been duplicated in the first place. F# wouldn’t have prevented this duplication but, just as it encourages small functions, F#’s syntax would have encouraged extracting the duplicated code into a function composed of many small ones.

Similarly, one of the things that has bothered me for years when working in C# is how many times we need to tell the compiler something it already knows. I hate repeating myself (something my wife would certainly confirm) so imagine what I think about writing even simple code like this:

public class MyClass
{
  private readonly int _intValue;
  private readonly string _stringValue;
  private readonly DateTime _dateValue;
  
  public MyClass(int intValue, string stringValue, DateTime dateValue)
  {
    _intValue = intValue;
    _stringValue = stringValue;
    _dateValue = dateValue;
  }
  
  public int IntValue { get { return _intValue; } }
  public string StringValue { get { return _stringValue; } }
  public DateTime DateValue { get { return _dateValue; } }
}

Here we’ve told the C# compiler about the values and the associated data types three times: once for the backing fields, once for the property definitions, and yet again in the constructor definition. (Yes, I could have used auto-implemented properties with private setters but then the class would still be internally mutable – more on that later.) How can we possibly avoid repeating ourselves when the compiler itself requires us to do it? This is one of the primary reasons that I was quick to adopt the var keyword when it was introduced to C# – it meant one less place to tell the compiler something it already knew.

Now consider an equivalent class in F#:

type MyClass(intValue : int, stringValue : string, dateValue : DateTime) =
  member x.IntValue = intValue
  member x.StringValue = stringValue
  member x.DateValue = dateValue

In this version we had to tell the compiler about the values twice and, thanks to F#’s amazing type inference capabilities, we needed to tell it about the corresponding data types only once (as annotations in the primary constructor)! Even so, there’s still some repetition because we’ve named the individual values twice. We can further reduce the duplication by using an F# record type instead.

type MyClass = { IntValue : int; StringValue : string; DateValue : DateTime }

Notice how we’ve progressed from 17 lines of code in the C# version to 4 lines in the F# class, then to a single line in the F# record type version. The end result is no duplication and 85% less code. What’s not apparent from this example is that the record type also has structural equality without us having to explicitly define it. Which version would you rather maintain?
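To illustrate the structural-equality point (a sketch added for this comparison, not from the original post), two independently constructed record values with the same field values compare as equal:

```fsharp
open System

type MyClass = { IntValue : int; StringValue : string; DateValue : DateTime }

// Two separately constructed records with identical field values
let a = { IntValue = 1; StringValue = "one"; DateValue = DateTime(2015, 4, 1) }
let b = { IntValue = 1; StringValue = "one"; DateValue = DateTime(2015, 4, 1) }

// Structural equality compares field values rather than references.
let sameValue = (a = b)
```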

Functions Should Accept No More Than Three Parameters

The rationale behind this guideline is simple: more parameters means more things to mentally process when reading a function and more permutations to test. F# enforces this guideline by treating functions differently than C# does.

In C#, all arguments are applied simultaneously, regardless of how many there are. From this we get additional complexity. For example, to represent an arbitrary function in C# as a delegate we typically select from the many generic overloads of Action (for void functions) or Func (for functions that return a value). In some cases we might even need to define a custom delegate type to represent the function. Naturally, these delegate types typically aren’t compatible with one another, which makes treating C# functions as data cumbersome at best.

Contrast that with F# which follows a very simple rule: every function has exactly one input and exactly one output. Functions that don’t have any particular input (parameterless functions in C#) accept a value called unit which indicates that no particular input is required. Functions that don’t have any particular output (void functions in C#) return unit. The one input rule applies even in cross-language scenarios where the F# compiler treats multiple arguments as a tuple and void functions as functions that return unit.
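To make the one-input/one-output rule concrete, here is a small sketch with hypothetical function names:

```fsharp
// string -> unit: a side-effecting function returns unit rather than "void".
let logMessage (message : string) : unit =
  printfn "%s" message

// unit -> DateTime: a "parameterless" function accepts unit as its input.
let getTimestamp () = System.DateTime.UtcNow
```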

How, then, was I able to write the encodeFileContents function a bit earlier with two parameters (encoding and fileName)? Accepting multiple parameters is an illusion created by function currying. If you were to inspect the signature of the function you’d see that it is actually:

Encoding -> string -> byte[]

What this is saying is that encodeFileContents is a function that accepts an Encoding and returns another function that accepts a string and returns a byte array.

What does currying have to do with Clean Code? For one, it avoids the complexity associated with multiple function types. More importantly, it enables partial application which allows you to easily compose new functions by specifying values for the first n parameters. This can have a profound impact on the cleanliness of your code because, by carefully selecting the order of your parameters, you can easily include your functions in pipelining and composition chains.

We can see how parameter order for curried functions affects code cleanliness by examining how we might consume the encodeFileContents function. We’ll start with accepting the file name first.

let fileName = "foo.txt"
let encodedContents = encodeFileContents fileName Encoding.UTF8

This isn’t too bad as far as readability goes but it’s a prime example of treating F# like C#. By specifying the file name before the encoding we’ve made it impossible to pipe in the file name because it’s not the final parameter. (Ok, not quite impossible, but piping values as a tuple is ugly.) We’ve also made it more difficult to compose new encoding functions using encodeFileContents. For instance, what if we recognize that we’ll frequently use UTF8 and want to compose a new function? Without changing the parameter order, such a function might look like this:

let encodeFileContentsUTF8 fileName =
  encodeFileContents fileName Encoding.UTF8

Consuming this function would then look like this:

let encodedContents = "foo.txt" |> encodeFileContentsUTF8

Now the code is a bit cleaner – we have a specialized function for UTF8 encodings and we can pipe the file name into the function. Notice the duplication in the encodeFileContentsUTF8 function though. We needed to include the fileName parameter and explicitly pass it to encodeFileContents. Now let’s look at the impact of switching the parameter order to accept the encoding first.

let encodedContents = "foo.txt" |> encodeFileContents Encoding.UTF8

Even without redefining the encodeFileContentsUTF8 function, we can pipe the file name to the encodeFileContents function because we’re partially applying it with the UTF8 encoding.

The encodeFileContentsUTF8 definition is greatly simplified, too.

let encodeFileContentsUTF8 = encodeFileContents Encoding.UTF8

Notice here how we didn’t need to specify the fileName parameter. Because partially applying encodeFileContents with the UTF8 encoding results in a function with a signature string -> byte[], the argument is implicit as evidenced by invoking the function.

let encodedContents = "foo.txt" |> encodeFileContentsUTF8

The end result may be the same but by taking advantage of currying and partial application, we were able to simplify the code and watch how data flows and transforms through the process.

Do One Thing/Avoid Side Effects

A colleague and I were recently trying to determine why a discount was being applied to only part of an order. Eventually we traced execution to a function named “GetOrderItems” or something like that. This function accepted an order ID and returned a list of order items. Given the title of this section, it shouldn’t surprise you that this function did more than its name and signature implied. In addition to retrieving the order items, it also deleted certain items from the database before calculating the discount.

Examples like this show why the Clean Code guidelines regarding meaningful names and small functions are so important. Equally important is to ensure that functions do only one thing. When functions do more than one thing, those other things are considered side effects.

This example is a rather extreme case of what can happen when a function has a side effect. Unfortunately, it can be quite easy to get into this situation with just about any language but what about more subtle side effects such as changing the contents of a reference type or modifying a shared value? In languages that don’t offer any safeguards against these types of changes, often the only valid answer to “what does this function do?” is “I don’t know.”

F# actively protects you from these types of side effects in a number of ways.

Default Immutability

F# doesn’t have the concept of variables, per se. Instead, it has bindings. Bindings simply associate a name with a value. The reason that bindings aren’t truly variables is that they’re immutable by default. In fact, nearly everything defined within the F# sandbox is immutable by default (no such guarantees exist when using types from other CLR languages). This characteristic alone virtually eliminates the possibility of a function unexpectedly changing some piece of shared data.

It is possible to override the default behavior by defining a binding as mutable but this is only advisable when the scope is limited and it benefits the overall solution. The beauty of this approach is that because everything is immutable by default, declaring a mutable binding doubles as an explicit warning to the reader that the associated value may change. It also means that code written in F# is more naturally suited to asynchronous and parallel tasks because there’s a reduced need to lock the shared resources.
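A small sketch of the default behavior and the opt-in escape hatch:

```fsharp
// Bindings are immutable by default; re-assignment requires opting in.
let answer = 42
// answer <- 43          // would not compile: the binding is not mutable

// 'mutable' doubles as an explicit warning that this value may change.
let mutable counter = 0
counter <- counter + 1
```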

Output Parameters

Output parameters are ugly. First, they violate the implicit contract that parameters should represent inputs to the function. Second, like void functions, their very presence indicates that the function will have at least one side-effect.

Consider the commonly used TryParse methods such as Int32.TryParse or Double.TryParse, which accept two parameters and return a Boolean. The first parameter is the string to parse and the second is an output parameter that will contain the parsed value if parsing is successful. Although these functions are great for avoiding the exceptions thrown when parsing fails, using them in C# is clunky, as shown here:

int parsedValue;
var success = Int32.TryParse("42", out parsedValue);

We need to define a variable to receive the parsed value, then invoke the TryParse function. Only upon inspecting the function’s return value can we be certain that the variable contains something useful.

F# doesn’t allow output parameters so they’re a non-issue when working solely with F# code. For those (hopefully) rare times when you’re working across language boundaries and need to invoke a function that uses an out parameter, the language designers devised a clever workaround – the compiler wraps the call within a generated class that hides the out parameter by including it in a tuple containing the return value and parsed value. The impact on code cleanliness is significant:

let success, parsedValue = Int32.TryParse "42"

This approach keeps you firmly planted in F#’s immutable garden and doesn’t pollute your code with extraneous declarations. As an added benefit, this tupled form plays nicely with F#’s pattern matching capabilities so branching on parsing success and failure is simple.
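For example (a sketch added for illustration), the tupled result can be matched directly:

```fsharp
open System

// Match directly on the (success, value) tuple produced by the
// compiler's out-parameter rewriting.
let describe (s : string) =
  match Int32.TryParse s with
  | true, value -> sprintf "parsed %d" value
  | false, _ -> "not a number"
```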

Return Values

In C#, void functions are functions that have no return value. Void functions are executed solely to have some effect on the system, be it writing to a log file, updating a database, or something else altogether. F# is different in that it has no concept of a void function.

Just as every F# function accepts only one input, F# functions always have exactly one output. When a function has no particular return value, the return value is unit. This may seem like a subtle difference since they serve the same basic purpose but it has a significant impact on your code. For one, the single input/single output rule greatly simplifies using F# functions as data because there’s only one generic function type to pass around. Perhaps more important is how it allows the compiler to make different assumptions about your code such as implicitly returning the result of the last evaluated expression. Next, it forces you to be explicit about ignoring non-unit return values when invoking a function for a side effect by passing the result to the ignore function. Finally, it forces you to think more critically about whether unit is the correct return value or if the function should actually return something else.
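The explicit-discard point can be seen in a short sketch:

```fsharp
// Invoking a function for its side effect requires explicitly discarding
// any non-unit result with 'ignore'.
let items = System.Collections.Generic.List<int>()

items.Add 42               // Add returns unit; nothing to discard
items.Remove 42 |> ignore  // Remove returns bool; discard it explicitly
```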

Avoid Null

This section is actually a simplification of two guidelines: don’t pass null and don’t return null. It’s reasonable to argue that these are distinctly different than a blanket “avoid null” guideline but given how much code is devoted to accounting for null and how many defects arise because of null, avoiding it altogether seems more pragmatic.

Seldom does a day pass where I don’t hear of at least one instance of a problem caused by a null reference. Usually it’s me or another developer tracking down a NullReferenceException. It’s annoying enough in development but even more so when a customer support ticket includes the phrase “object reference not set to an instance of an object.”

NullReferenceExceptions are virtually a non-issue in F# because, unless interacting with code from other CLR languages, F# doesn’t use null. When working solely with F# code there are but two ways to make null a legal value and both require a conscious effort. Instead, F# uses a more explicit model with options.

Options are a built-in type (Option<'T>) with two possible values: Some<'T> and None. None is similar to null in that it indicates that the data item contains no specific value, but None is a value in and of itself. Some<'T> is a container that wraps a specific value.

Although None serves a similar purpose to null, F#’s lack of null and inclusion of options offer several distinct advantages. First, options force you to consider whether something truly doesn’t have a value. Next, options are type safe; you can pass around Option<'T> like any other value without having to worry about a NullReferenceException. Finally, they explicitly tell you that a data item may not have an associated value so you know to handle None accordingly, rather than being forced to always check for a value or take your chances that one will always be present.
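A short sketch (with a hypothetical lookup function) shows how None is handled as an ordinary value:

```fsharp
// A hypothetical lookup that may not find a value.
let tryFindDiscount code =
  match code with
  | "SAVE10" -> Some 0.10m
  | _ -> None

// Callers must handle both cases explicitly; there is no null to forget.
let describeOrder code =
  match tryFindDiscount code with
  | Some _ -> "discounted"
  | None -> "full price"
```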

Data/Object Anti-Symmetry

Chapter 6 describes the difference between objects and data structures. Objects are defined as those types that hide their data behind abstractions and expose functions which operate upon their data whereas data structures are those types that expose their data and have no meaningful functions.

The distinction is important because they’re essentially opposite ways to look at the same problem. It’s easy to add objects to the system without affecting existing objects but it’s difficult to add new functions because every object must change. Conversely, data structures make it easy to add new functions without affecting existing functions but adding data structures requires changing existing functions.

Languages like C# make it easy to fall into the trap of intermixing the approaches because virtually everything is a class. When this happens, it’s difficult to add both functions and objects.

F# actively steers you away from the hybrid trap by providing distinct types for both objects and data structures. Given F#’s immutable nature, you’ll often find yourself favoring data structures and representing them as tuples, records, and discriminated unions. For those times where you truly want an object, you can define a class.
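As an illustrative sketch (not from the original post), a discriminated union is a pure data structure: it exposes its data, and new functions over it can be added without touching the existing cases.

```fsharp
// A data structure: the shapes expose their data directly.
type Shape =
  | Circle of radius : float
  | Rectangle of width : float * height : float

// Adding a new function over existing shapes requires no changes to them.
let area shape =
  match shape with
  | Circle radius -> System.Math.PI * radius * radius
  | Rectangle (width, height) -> width * height
```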

Summary

As much as we developers like to think we spend most of our time writing new code, the truth is that we spend far more time reading existing code. Everything we can do today to make the code easier for our future selves to read and maintain will pay off over time. This is why the guidelines described in Clean Code continue to be so relevant.

Unfortunately, while we acknowledge that many of the guidelines address deficiencies in our tools, namely the languages we work in from day to day, we simply accept the problems as the way of the world rather than adapt. Languages such as F# evolve the guidelines described by Clean Code by incorporating many of them directly into the language, making them difficult if not impossible to break.

7 comments

  1. You’ve flipped the argument order between definition

    let encodeFileContents (encoding : Encoding) fileName =

    and use

    let encodeFileContentsUTF8 fileName =
      encodeFileContents fileName Encoding.UTF8

    The original definition lets you write

    let encodeFileContentsUTF8 = encodeFileContents Encoding.UTF8

    which drives the point about multi-argument functions being inherently higher order home — and in less code to boot!

    1. That was actually intentional there but I probably could have been clearer about it in the text. The first example in that section (with the file name following the encoding) was crafted to drive home the point of partial application later in that section.

  2. You can write encodeFileContents function more succinctly:

    let encodeFileContents (encoding: Encoding) = File.ReadAllText >> encoding.GetBytes

    1. Indeed. I was trying to keep the comparison more direct than it would be by introducing a specialized method call but I definitely prefer your version.
