14 Aug
2009

Observations on the ‘if’ statement

A little while back a small debate has erupted over the if statement.  There is even and anti-if campaign going on.  If you read things wrong you would think that the ?if? needed to be banished from all languages.   And since I had so much fun with my ?Refactoring a Switch statement?  (and the rebuttal), I thought I would see what damage I could do here.  (Come on Sean Timm, I double-dog dare you)

This isn?t what is going on here ? well, not as I read it.  People often make brash statements so people will reconsider their actions.  So we call things ?evil?, compare them to the Vietnam war, say they cripple the mind, etc.

What people are really after is the miss-use of the if statement, not the expulsion.  After all, I believe there is an if statement in virtually every computer language ever created ? oh wait, assembly doesn?t have one, but do you really want to code in assembly?

So how do you miss-use something as simple as an if?   There are several ways actually.  But the basic problem is nesting (see the Arrowhead Anti-pattern for an extreme example).  Nesting code can lead to confusion — regardless of container. 

Arrowhead Anti-pattern

The Arrowhead anti-pattern, as written up by Chris Missal is a good place to start with this one.  The code ends up looking like this (code shamelessly stolen)

   1: if
   2:   if
   3:     if
   4:       if
   5:         do something
   6:       endif
   7:     endif
   8:   endif
   9: endif

The issue comes from the nested if statements.  The farther the nesting goes, the harder the code is to follow.  Flatten the list as best you can.  Another tactic is to leave the method as soon as possible (return is your friend, or throw an exception).

So the code could be refactored to look like this:

   1: if not something
   2:    return;
   3:  
   4: if not something
   5:    return;
   6:  
   7: do something

When I started coding (1997-ish), we thought of that having multiple return statements as bad coding practice.  Now I can?t stand to do anything else.  But there are conditions on that.  First off, if you have multiple returns in a method, return out as early as possible, or at the end.  Try not do do this in the middle.  Also, I have no problem with a method that is nothing more than a series of small if statements with returns (as the code above does).

As an aside, we are talking about code blocks here as well.  In C#, that would be any code between the { and } with a for/if/while preceding it.  I don?t like to see significant amounts of code inside of any of them.  If there is a lot of code there, move it to another method.

Never-ending conditions Anti-Pattern

You?ve all seen this type of code.  Where the distance between the ?if? and the ?then? can be measured by counting zip codes. Below is a simple example of the problem:

   1: if (this and that and otherthing or something and not like-like-like ...)

If your ?if conditions? look like something that would come out of the mouth of a valley girl, you might be a bad developer.  How often have you looked at the if conditionals had had to spend five minutes to figure out WTF what going on?  Way too often for me.

Any time you add an ?AND? (&&) or an ?OR? (||) to an if statement, please think about what you are trying to do.  Or better yet, do an extract to method.  Even as simple as this refactoring would be of help:

   1: if (INeedSmackedUpSideTheHead(data)) {
   2:  
   3: }
   4:  
   5: bool INeedSmackedUpSideTheHead(data) {
   6:   return this and that and otherthing or something and not like-like-like...);
   7: }

Even better would be to break up the conditions into separate calls within the new method, and remove the OR entirely.

   1: bool IsThisBetterNow(data) {
   2:   if (thisThing and that and otherthing) 
   3:     return true;
   4:   if (something and not like-like-like)
   5:     return true;
   6:   return false;
   7: }

You may disagree with me here, but in extended conditions, OR statements are a bit unnatural.  They break up the flow and the readability.  They cause the reader to have to go back and reevaluate what has been done, because it might not be relevant anymore ? because of the OR.

For-If Anti-Pattern

This is a .net 3.5 specific anti-pattern (because of linq and lambda), so if you are using a lamba-less language, just ignore this for now.  But the For-If Anit-Pattern could be considered a functional development anti-pattern, so also consider this if you are entering the wonderful world of F# or Scala. 

Now some will call foul on this anti-pattern, mostly because the code looks simple enough, but the solution is still nicer.  First off, there is a simple for look with a nested if:

   1: foreach(var data in list)

   2: {

   3:    if (data.IsTrue)

   4:    {

   5:       // do something

   6:    }

   7: }

With lambda you can now refactor this out a bit with what is essentially a custom iterator:

   1: foreach(var data in list.Where(x=>x.IsTrue))

   2: {

   3:    // do something

   4: }

OK, that is all that comes to mind at this point.  But as you can see, I hope, there are ways to misuse something as simple as an ?if?. Actually, it is often the simplest things that get misused the most, you rarely see an IoC library so grossly misused.  Anyway, hope you all have a good weekend.

22 thoughts on “Observations on the ‘if’ statement

  1. Heh, I just posted something around the office where I work amounting to the same thing. In that case it was specific to the way some of the systems had been built and the amount of conditional code that bloated the complexity figures and resulted in very large and bug-prone sections of code.

    A lot of the time, “if” is resorted to to aid re-usability, but in the end all it does is hinder it. Re-usable code is code that does one thing, adding if statements is usually a sign of code being expected to do one thing, or, one thing else. (or another thing, or yet another, provided the first thing wasn’t suitable, and the second thing would have been suitable if not for this new thing.)

  2. Chris,

    I won’t agree the last advice you have given on using Lambdas. The very first reason is that,every time you iterate in the foreach, you are going to refilter the list unnecessarily. So the code in the advice is less efficient than the code to be refactored because of the number of iterations done in the list.

    Even if you store filtered list in a variable instead of having it called in the foreach loop each time, the performance of hit will still be there because in the first code there is only 1 loop through the list.

    Another important thing to mention is that I hope this won’t lead to argue that all “for-if” s are bad. Because they are not. You don’t mention it but your code “implies” that there should be nothing else outside the if and inside the foreach for this approach which I see worth mentioning explicitly. (otherwise it wouldn’t be possible to use filtered list because you would miss the other part of the code that has to be applied to all elements)

  3. Yes, the Lambda rule would be evaluated each iteration in that example. Still, using the Lambda filtering before looping should be preferable to performing conditional code within a looping construct.

    var trueData = list.Where(x => x.IsTrue);
    foreach ( var data in trueData )
    {
    // Do Stuff.
    }

    My argument for this approach is that it is crystal clear what this code intends to do. Yes, the Where filtering takes cycles, but the resulting loop only iterations through relevant data. I gladly sacrifice a few cycles to aid in readability and concise, single-purpose code.

    Having conditional code where “this” applies to “these” elements, while “that” applies to “those” elements, and “this other thing” applies to all elements (which gets changed 3 months down the road by another developer) in one pass through a collection to save a few cycles is a maintenance nightmare waiting to happen.

    “if”s are a good indication that you’ve got code doing more than one thing. It gets a whole bunch more reliable, readable, and testable when you avoid them as much as possible.

  4. @Sidar, I don’t agree with you. There is as much of filtering in the first example as in the second. Just as the “if (data.IsTrue) { … }” condition is evaluated in every iteration, the lambda condition is evaluated in every iteration as well. There will be no performance penalty. What is the difference if I execute a condition “in-line” ar by means of a delegate (which a lambda effectivly is) ? None, I suppose.
    regards, Przemek Soszynski

  5. Copy List anti-pattern – Where’s that defined?
    Too much garbage to collect? These are references to the same items, it’s not creating copies of the items in the list.

  6. @Bjorn and @Sidar – you need to research how Linq-to-Objects actually works. Przemek and Steve’s responses are good place to start! In short, there is practically no overhead.

    With regard to this blog post generally, the big problem that the “anti if campaign” intended to highlight was the use of if/switch to perform a type switch. Meaning that really the operation to be peformed should be a method on an object, and so all the client has to do is call the method. The “decision” has already been (or should have been) taken when the object was constructed. Obviously this doesn’t apply to all uses of “if” which is why it’s not really an “anti if campaign” at all. It’s more of an effort to educate people in the basic notions of OOP.

  7. Actually, with the mention of the LINQ filtering I would put forward a little gem that I’ve found very, very handy especially when dealing with legacy systems that were perhaps not as well designed as they could have been… (ala lack of proper inheritance)

    Given a list of base types or base interfaces, executing code on a specific sub-set by type:
    I.e You have a list of Animals which include Dogs, Cats, Hamsters, and an assortment of other pet-store varieties.
    It might be nice to do something like:
    foreach ( Dog dog in animalList )
    {
    doDogStuff(dog); // Works until you hit a cat etc. Scary thing for real that I saw was code like this wrapped in a try-empty-catch block.. *shudder*
    }
    Instead you’d need to do: (This is the kind of stuff I’ve been digging through in legacy systems…)
    foreach ( Animal animal in animalList )
    {
    if (animal is Dog)
    doDogStuff((Dog) animal);
    }

    With LINQ it’s easy & readable.
    foreach (Dog dog in animalList.Where(x => x.GetType().IsAssignableFrom(typeof(Dog)))
    {
    doDogStuff(dog);
    }
    or to avoid the extra performance hit per iteration
    var dogs = animalList.Where(x => x.GetType().IsAssignableFrom(typeof(Dog)).Cast.ToList();
    foreach (Dog dog in dogs)
    {
    doDogStuff(dog);
    }

    The cast isn’t necessary in the first LINQ example which is kind of cool…

    It’s probably been figured out and written somewhere on the www but it was something I came up with when staring at hundreds of lines of conditional crud in loops.

  8. Also, converting to a List doesn’t avoid the performance hit per iteration. It just does the iteration twice and takes an additional performance hit of allocating a copy of the list.

  9. Steve Py :
    Yes, the Lambda rule would be evaluated each iteration in that example.

    Um…no.

    void Main()
    {
    var list = new[] { new Animal{Name = “Bulldog”}, new Animal{Name = “Leopard”}, new Animal{Name = “Doberman”} };

    foreach (var data in list.Where(x => x.IsDog))
    {
    Console.WriteLine(data.ToString());
    }

    }

    class Animal
    {
    public string Name { get; set; }
    public bool IsDog
    {
    get
    {
    Console.WriteLine(“IsDog called”);
    return Name != “Leopard”;
    }
    }
    public override string ToString() { return Name; }
    }

    Output:

    IsDog called
    Bulldog
    IsDog called
    IsDog called
    Doberman

  10. @Daniel Earwicker
    OfType, cool! I didn’t spot that one. I just did a quick whirr and it handles A : B : C nicely where OfType returns B’s & A’s.

    @Bill, um, not sure I get whether you got my original point or not. Lists are immutable within a loop (I.e. an exception if you try to add while inside a foreach), but the elements within that collection can be modified. The difference here is that if you use the LINQ expression inside the foreach statement and change an element that hasn’t been iterated through yet while in the loop, the linq expression will pick up that change as it is evaluated for each element. It is not evaluated once, creating a new list of results to iterate through, so there is a difference in behaviour between whether you retrieve the IEnumerable outside of the loop, or use LINQ inside the loop. It was thought there might be a performance hit with that, but it turns out there isn’t. (In fact it looks like there is a small savings)

    I’ve set up a set of tests to check performance between the foreach using var vs. cast with the LINQ query in the foreach vs. done separately. Essentially I added 10000 elements to a list with the property to get the element number pausing for 1ms. The LINQ statement returns items MOD 5 (returning 2005 items)
    I ran the tests in different orders several times and the results were virtually identical within a 2% variance. The curious thing was that performing the query outside of the foreach was typically that 2% slower than both tests with the LINQ performed in the foreach. There was no difference between using var vs. explicit type specification within the foreach.

  11. @Steve Py
    Well about the var vs explicit type… its the same once its compiled. “var” is just syntactic sugar that tells the compiler to find out what is the type being used in the right side of the operation, so when its compiled both:
    int a = 5 and var a = 5 will compile to int a = 5

Comments are closed.