Home Uncategorized Revisiting ISNULL, COALESCE, and the Perils of Micro-Optimization

    Revisiting ISNULL, COALESCE, and the Perils of Micro-Optimization

    400
    10

    Almost six years ago–in November of 2004–I posted what would turn out to be one of my most popular blog posts in terms of number of reads, “Performance: ISNULL vs. COALESCE.” (If you’re curious, the post is dated July 2006 because I was too lazyit was difficult to transition the publication dates over with the posts when I transferred to SQLblog from my previous blog location.)

    In this post I set out to determine whether ISNULL is faster than COALESCE, even though I admitted that I didn’t particularly care about the results:

    “Before getting to my own tests, I’d like to jump off on a quick tanget. COALESCE vs. ISNULL? Who cares! This isn’t a performance question, this is a question of standards-conformant vs. proprietary code. ISNULL is non-standard and provides less functionality than COALESCE. Yet a lot of SQL Server developers love to use it, I suspect because it’s a lot easier to remember (and spell). So learn a new word and type two extra characters and you’ll end up with more maintainable, more functional code. Sounds good to me — which is why I am a big fan of COALESCE.”

    I still agree 100% with what I said in this paragraph, and despite the fact that my tests showed ISNULL to be slightly faster than COALESCE, I didn’t switch over and start using ISNULL. I use the tool that I’m comfortable with, that is standard, and that has more functionality. COALESCE. Seems like a no-brainer.

    And yet, that other post, as I mentioned, is one of my most popular ever. It gets loads of hits, usually from Google, Bing, and the like. People want to know which is faster. Can they give their code that extra edge? Recently a reader named Kit chastised me for not updating the body of the post, given that there was some new and “important” information disclosed by another reader in the comments. And this made me think about this issue yet again… What follows are my current views on this topic, based on what I’ve learned in the past six years, much of which has been spent doing performance consulting.

    There is too much to focus on to bother with this issue.

    In my original tests I noted a 10% time benefit of ISNULL over COALESCE. Those tests were done on SQL Server 2000, and I don’t have a server to test on today, but in 2005 and 2008 the difference is closer to 1%. And even a 10% difference isn’t enough to get me excited. In most of my performance engagements I start by looking for areas where I can get at least a 100% improvement in performance by making simple changes, and usually there so many of these “low-hanging fruit” that I don’t even need to bother looking for lesser areas of improvement. Furthermore, the 10% benefit is applicable only to the ISNULL or COALESCE function itself, and will be totally overshadowed by the rest of the query. Each call to these functions takes approximately 0.002ms on my test server. Think about how much time data access from disk or even from memory takes. Do you really think you’ll see an appreciable difference in total query performance by switching between these functions?

    Sometimes switching between these two functions will appear to fix a problem–but it has nothing to do with relative performance.

    ISNULL and COALESCE seem like they’re equivalent aside from the fact that ISNULL accepts only two arguments while COALESCE accepts any number. However, there is a subtle difference between the two. The output data type of ISNULL is the same data type as the first input argument, whereas the output data type of COALESCE is determined by the argument with the highest data type precedence. Consider the following two expressions:

    [sql]ISNULL(‘1’, 2)
    COALESCE(‘1’, 2)
    [/sql]

    The first expression, which uses ISNULL, returns a value typed as VARCHAR, since that’s the type of the first argument. The second expression, on the other hand, will return a value typed as INTEGER, since given both of the arguments–one VARCHAR and one INTEGER–the INTEGER has a higher precedence.

    What does this have to do with query performance? Sometimes, when using ISNULL or COALESCE as part of a predicate, a user may end up with a data type mismatch that is not implicitly convertable and which therefore causes a table scan or other less-than-ideal access method to be used. A user that is using trial-and-error and guess work to solve performance problems may swap ISNULL for COALESCE or the other way around and discover that suddenly the query will appear to run much faster. The conclusion that will be made is that one function is faster than the other, but that’s clearly not the case. In reality, highly selective index seeks tend to be a lot faster than index scans, and that’s what swapping out the function caused to occur.

    When performance tuning, don’t guess. Collect evidence, form hypotheses, and test thoroughly. SQL Server 2005 and 2008 gives us plenty of information to diagnose the vast majority of issues; there is no reason we should have to poke or prod in an attempt to find an answer.

    ISNULL has a really cool use case.

    If you’re anything like me, you use SELECT INTO… a lot. And you may have discovered that sometimes you need to make special accommodations to control the resultant data type, size, precision, scale, and other metadata for each output column. The most difficult of these to control is nullability, which is mostly implicitly derived by the query engine and for which there does not exist a conversion. Take, for example, the following query:

    [sql]

    SELECT
    Color
    INTO C
    FROM Production.Product
    WHERE
    Color IS NOT NULL [/sql]

    We can clearly see that the Color column in our new table, C, will contain no NULLs, because that’s what has been specified in the WHERE clause. But the query optimizer doesn’t seem to make this leap, and should you run this query followed by a quick check using sp_help or the catalog views, you’ll notice that C.Color is in fact nullable. I used to get around this by running ALTER TABLE … ALTER COLUMN, but when working with a billion-row table I discovered that this is hardly a cheap process, and I went looking for a better solution. The answer to my problem? ISNULL:

    [sql]

    SELECT
    ISNULL(Color, ”) AS Color
    INTO c
    FROM Production.Product
    WHERE
    Color IS NOT NULL [/sql]

    This query will produce a non-nullable Color column, exactly what I want and expect. And COALESCE won’t work here. Why? I don’t know, it just won’t. Finally, a solid reason to use ISNULL that has nothing to do with performance.

    What have we learned today?

    There is nothing to be gained from micro-optimization. If you have time to worry about ISNULL vs. COALESCE, you should probably consider an early retirement, because your job is done.

    One should never guess when performance tuning. It only leads to improper conclusions and less than ideal practices further down the road.

    Don’t completely dismiss a tool. You may discover a great way to use it once you look further afield.

    Thank you for reading and as always, enjoy!

    10 COMMENTS

    1. Reminds me of a discussion on Twitter recently that stemmed out of me not specifying a length for the varchar data type in an example of converting a date to a string.
      They said they had been yelled at more than once for this practice because it was a performance hit. I challenged the person to prove it to me, and in the end, he said that the difference was so slight to be barely detectable.

    2. Regarding the typical abundance of low hanging fruit: Shhh!! you’ll spoil it for the rest of us.
      Along similar lines, it’s about time we stopped worrying about user defined stored procedures with names that start with "sp".

    3. I guess the reason that ISNULL works for creating non-null columns with SELECT…INFO and COALESCE doesn’t is because COALESCE is really just blindly translated into a CASE expression by the engine.
      In other words…
      SELECT COALESCE(Color,”)
      is translated to…
      SELECT CASE WHEN Color IS NOT NULL THEN Color ELSE ” END
      And I guess it’s harder to parse out that CASE expression to determine nullability than it is to interpret a simple ISNULL.
      Just my guess…
      –Brad

    4. The reason COALESCE doesn’t work in the SELECT INTO is that at the "meta level", there’s no guarantee that COALESCE’s output value is non-null. e.g in this the COALESCE is NULL for the first row:
      WITH Point(x,y) AS
      (
      SELECT NULL AS x, NULL AS y
      UNION ALL
      SELECT NULL, 1
      UNION ALL
      SELECT 2, NULL
      )
      SELECT ISNULL(x,0) AS NotNullX
       , ISNULL(y,0) AS NotNullY
       , COALESCE(x,y) AS IsNullXY
      FROM Point;

    5. Marc: Your COALESCE returns a NULL for that row only because you’re not using a constant for the final argument. If you were to use a constant, COALESCE would never return a NULL.

    6. Excellent post today, Adam!
      The behaviour seen in the Color example is fascinating.  It doesn’t surprise me that only ISNULL works to generate a NOT NULL column with SELECT…INTO since that’s the same arrangement that works for computed columns, and that’s a documented feature ("Computed Columns" in Books Online, right at the bottom of the entry).
      What is *interesting* is that the "WHERE Color IS NOT NULL" part is insufficient to produce a NOT NULL column without the ISNULL.
      The optimiser tracks column NULLability as it explores plan alternatives via ‘properties’, which can result in extra optimisation opportunities.  The "WHERE Color IS NOT NULL" predicate sets a QO property bit to let later operators know that NULLs are not possible in that column from that point forward.  (So far so good.)
      You can see that in action since the query plan for the ISNULL version _does not contain_ an ISNULL expression anywhere!  The ISNULL was in the Compute Scalar, but was optimised away by later application of a rule recognising that performing an ISNULL operation on an attribute that cannot be NULL is redundant and wasteful.
      Nevertheless, even though the ISNULL was optimised away, the fact that it was there originally was enough to trigger the rule that defines the destination column as NOT NULL.
      What I find surprising is that the SELECT…INTO doesn’t use the QO property information when deciding if the column should be created as NULLable or not, without the ISNULL.  Seems like a bug (of omission).
      Something else just to illustrate the point.  If we run:
      DECLARE @C TABLE (Color NVARCHAR(15) NOT NULL); — Note NOT NULL
      INSERT  @C (Color)
      SELECT  Color
      FROM    Production.Product;
      …there is a RaiseIfNullInsert in the predicate property of the Table Insert iterator, to check for NULLs.  If we add the WHERE Color IS NOT NULL clause, the RaiseIfNullInsert disappears – optimised away by the knowledge that Color cannot be NULL by then.  
      So, the property is there, the optimiser ‘knows’ the column can’t contain NULLs and can optimise on that basis.  Double weird that the optimiser doesn’t reason that in the SELECT…INTO case.
      Apologies for the long comment 🙂
      Paul

    7. Great post Adam.  I particularly found your Color example interesting.  It just goes to show that every tool has value when used appropriately.

    8. Thanks for this post, and the one from 6 years ago. Who cares? Pedants do! Seems to be a high ratio of those working in this industry.
      I like your lessons, and the specific use case for ISNULL.
      Personally I like COALESCE because if I accidentally do this:
      COALESCE(decimaltypecolumn,0) it returns the decimal data type
      If I *thinking I am being careful* do this
      ISNULL(inttypecolumn,0.0) it returns an integer data type! Which would be bad

    LEAVE A REPLY

    Please enter your comment!
    Please enter your name here