I am amazed at the amount of SQL and other code I encounter, which employs poor methodologies of establishing start and end points in time for a filter. I want to show some examples, and share some simple guidelines which can help you avoid the pitfalls of miscalculating time, or introducing gaps or overlaps in coverage.

I recently ran into this code example in a stored procedure:

SET @dMonthStartDate = '2010-12-01'
SET @dStartDate = DATEADD(MONTH, -1, @dMonthStartDate)
SET @dEndDate = DATEADD(SECOND, -1, DATEADD(MONTH, 1, @dStartDate))

It's obvious that the goal of the code is to set @dStartDate to Nov 1st and @dEndDate to the last second of Nov 30th, to filter the data to only the specified month. Later, when the code moves into production, the intent is to change SET @dMonthStartDate = '2010-12-01' to SET @dMonthStartDate = GETDATE(), which will dynamically adapt the variables to mean "start/end of the previous month".

SET @dMonthStartDate = GETDATE()   /* The intended production change */
SET @dStartDate = DATEADD(MONTH, -1, @dMonthStartDate)
SET @dEndDate = DATEADD(SECOND, -1, DATEADD(MONTH, 1, @dStartDate))

There are three (yes, three!) flaws in this logic, and they lead to some big problems.

#1: It assumes the code will always run on the first of the month. The goal is to return the subset of data that falls within the previous month. But because the test value for @dMonthStartDate was set to the 1st of the month (the expected execution time), the two expressions beneath it carry that assumption forward into the code.

The code does not allow for the case where it fails to execute, or needs to be rerun, after the 1st of the month. Run on December 4th, for example, it produces a time frame of 11/04/2010 to 12/03/2010 instead of covering all of November.

The date can be forced to the first by using DATEADD() like this:

SET @dMonthStartDate = DATEADD(DD, -(DAY(@dNow) - 1), @dNow)

Note: I always use @dNow to hold a snapshot of the current date/time whenever more than one call to GETDATE() would otherwise be made and the same value is assumed for every call. It isolates the code from weird, hard-to-reproduce answers where the current time from GETDATE() rolls over to a new minute, hour, day, week, etc., between one call and the next.

#2: It assumes the code will always run exactly at midnight. GETDATE() is poorly named: it returns both the date and the time. The test value of '2010-12-01' actually resolves to '2010-12-01 00:00:00.000', or Dec 1st at midnight. Now assume this code runs on Dec 1st, 2010 at 3:59:23.231 AM. GETDATE(), without forcing the time portion back to midnight, returns '2010-12-01 03:59:23.231', making the start and end dates '2010-11-01 03:59:23.231' and '2010-12-01 03:59:22.231'.

The time portion can be forced to midnight of the 1st of the month using…

SET @dMonthStartDate = CONVERT(
         VARCHAR, DATEADD(DD, -(DAY(@dNow) - 1), @dNow), 101)

.. making it the 1st of the month at midnight.
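The combined effect of these two fixes (the day forced to the 1st, the time forced to midnight) can be sketched outside of T-SQL as well. Here is a minimal, illustrative Python version; the function and variable names are my own, not part of the original procedure:

```python
from datetime import datetime

def first_of_month(now: datetime) -> datetime:
    # Snap a single clock snapshot to the 1st of its month at midnight,
    # covering both fixes at once: day forced to 1, time forced to 00:00.
    return now.replace(day=1, hour=0, minute=0, second=0, microsecond=0)

# Any moment during December 2010 resolves to the same anchor:
print(first_of_month(datetime(2010, 12, 4, 3, 59, 23, 231000)))
# 2010-12-01 00:00:00
```

The point is the same as in the T-SQL: derive both boundaries from one snapshot, never from the raw clock.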

#3: Not all of the time within the month is covered. The clue is the nested DATEADD() calls (add 1 month, subtract 1 second). Because of them, @dEndDate is actually set to 2010-11-30 23:59:59.000 (using the constant date in the first example), so any entry made in the last second of the month is excluded by the filter (e.g. 2010-11-30 23:59:59.231 would be excluded).
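A quick way to see the lost second is to compare the closed range against a half-open one. This Python sketch is illustrative only (the timestamps mirror the example above): a row stamped inside the month's final second falls through the closed filter but is kept by the half-open filter.

```python
from datetime import datetime

start = datetime(2010, 11, 1)
end = datetime(2010, 11, 30, 23, 59, 59)   # the "subtract 1 second" endpoint
cutoff = datetime(2010, 12, 1)             # first instant of the next month

late_row = datetime(2010, 11, 30, 23, 59, 59, 231000)  # last second of November

print(start <= late_row <= end)     # False: the closed range drops this row
print(start <= late_row < cutoff)   # True: the half-open range keeps it
```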

Is it a small amount of data? Probably. Will someone notice? Probably not, but it's still a bad assumption. Will it become noticeable if the system scales to 100 or more actions a second, and introduce doubt about your core method? My experience says yes... definitely.

These types of errors usually surface when the process is run over several days and generates different answers. That can mistakenly cast doubt on the algorithms that consume the filtered data, instead of on the filter itself. It's a recipe for heartburn when troubleshooting.

One other flaw I see from time to time is this type of evaluation:

SET @dMonthStartDate = '2010-12-01'
SET @dMonthEndDate = '2011-01-01'
WHERE ActionDate >= @dMonthStartDate
AND ActionDate <= @dMonthEndDate

In this scenario, data captured exactly at midnight is included in both this month's summary and the next month's summary. It's a subtler error, but it will nonetheless show up in high-volume systems.
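The double count is easy to demonstrate. In this illustrative Python sketch (dates taken from the example above), a row stamped exactly at midnight on the boundary satisfies both months' closed ranges, while a half-open range places it in exactly one month:

```python
from datetime import datetime

midnight_row = datetime(2011, 1, 1)  # captured exactly at the shared boundary

dec_start, dec_end = datetime(2010, 12, 1), datetime(2011, 1, 1)
jan_start, jan_end = datetime(2011, 1, 1), datetime(2011, 2, 1)

# With <= on both ends, the boundary row lands in BOTH months:
print(dec_start <= midnight_row <= dec_end)  # True
print(jan_start <= midnight_row <= jan_end)  # True

# With a half-open range (< cutoff), it lands in exactly one:
print(dec_start <= midnight_row < dec_end)   # False
print(jan_start <= midnight_row < jan_end)   # True
```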

Having said that, here are my guidelines for handling date ranges.

Rule #1: Always have the end time actually be the “cutoff” time.

Cutoff sounds non-inclusive, whereas End is a bit nebulous. Think of the time range in your WHERE clause as "from the start time and before the cutoff time", and write your code that way.

Rule #2: Set the time portion of both the start and "cutoff" dates to the same value, and evaluate the time range as greater than or equal to the start, and less than the "cutoff".

Take the following example:

/* Drop the time portion from the system date/time,
forcing midnight of the current day */
SET @dToday = CONVERT(VARCHAR, GETDATE(), 101)
/* Back @dToday up to the 1st of the month, i.e. XXXX-XX-01 at midnight. */
SET @dMonthlyReportCutoffDate = DATEADD(DAY, -(DAY(@dToday) - 1), @dToday)
/* Using the first of the month you just calculated, find the 1st of the previous month at midnight */
SET @dMonthlyReportStartDate = DATEADD(MONTH, -1, @dMonthlyReportCutoffDate)
WHERE ActionDate >= @dMonthlyReportStartDate
AND ActionDate < @dMonthlyReportCutoffDate

Note: This code can be refactored to calculate start time before it calculates end time. This is just my preference.

Notice I use the term “cutoff” date. It’s a personal preference.
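Putting the start/cutoff pattern together, a rough Python equivalent of the same calculation (the function name is mine, purely for illustration) might look like:

```python
from datetime import datetime, timedelta

def previous_month_window(now: datetime):
    # Cutoff: the 1st of the current month at midnight.
    cutoff = now.replace(day=1, hour=0, minute=0, second=0, microsecond=0)
    # Start: step back one day from the cutoff, then snap to the 1st again,
    # which lands on the 1st of the previous month at midnight.
    start = (cutoff - timedelta(days=1)).replace(day=1)
    return start, cutoff

start, cutoff = previous_month_window(datetime(2010, 12, 4, 17, 30))
print(start, cutoff)   # 2010-11-01 00:00:00 2010-12-01 00:00:00
```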

For a month filter (as in this example), the time range would be "XXXX-XX-XX 00:00:00.0000" to "XXXX-(XX+1)-XX 00:00:00.0000". For a week filter, the time range should be "XXXX-XX-XX 00:00:00.0000" to "XXXX-XX-(XX+7) 00:00:00.0000". For a day filter, the time range should be "XXXX-XX-XX 00:00:00.0000" to "XXXX-XX-(XX+1) 00:00:00.0000".

For an hour filter, the time range should be "XXXX-XX-XX 00:00:00.0000" to "XXXX-XX-XX 01:00:00.0000".

… NOT "XXXX-XX-XX 00:00:00.0000" to "XXXX-XX-XX 23:59:59.999999999"
… or "XXXX-XX-XX 01:00:00.0000" to "XXXX-XX-XX 01:59:59.999999999", and so on.

There are two reasons for this rule. The first reason is simplicity. If I am able to calculate the 1st of the month at midnight, I can easily calculate the first of the next month at midnight by simply adding one month using DATEADD(month, 1, {date}), or the first of the previous month at midnight by simply subtracting one month using DATEADD(month, -1, {date}). The technique stays the same, regardless of the time frame being a month, a day, a week, an hour, a quarter, a year, etc. I just change the “what” of the offset in the DATEADD() method, e.g. DATEADD(hour, 1, {date})
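That interchangeability can be sketched in Python as well. In this illustration, `add_months` is a minimal stand-in of my own for DATEADD(MONTH, …); it is only safe here because the anchor is always the 1st of a month:

```python
from datetime import datetime, timedelta

def add_months(d: datetime, n: int) -> datetime:
    # Minimal stand-in for DATEADD(MONTH, n, d); assumes d is a
    # first-of-period date so the day number never overflows the month.
    years, month_index = divmod(d.month - 1 + n, 12)
    return d.replace(year=d.year + years, month=month_index + 1)

anchor = datetime(2010, 12, 1)
print(add_months(anchor, 1))        # 2011-01-01 00:00:00 (next month's cutoff)
print(add_months(anchor, -1))       # 2010-11-01 00:00:00 (previous month's start)
print(anchor + timedelta(days=7))   # weekly cutoff: same pattern, different unit
print(anchor + timedelta(hours=1))  # hourly cutoff: same pattern again
```

Only the unit of the offset changes; the start/cutoff technique itself never does.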

The second reason is accuracy and readability. If I offset the ending time of the day to "23:59:59.9999" and evaluate it using "less than or equal to", I would have accurate code on current Windows or Linux systems. But if future systems are released (or other systems already exist) with an accuracy of 1/10,000th of a second or finer, I have to revisit this code and extend it to 5 or more digits past the decimal point to remain complete and accurate.

By using a filter that says "the timestamp occurs before the cutoff" (i.e. less than the cutoff), the system's accuracy cannot affect my evaluation. Also, which statement is easier to understand if you had to maintain it:

WHERE ActionDate >= '1/1/2010'
AND ActionDate <= '1/31/2010 23:59:59.9999'

WHERE ActionDate >= '1/1/2010'
AND ActionDate < '2/1/2010'

I quickly identify the second example as being based on a one-month time frame. It clearly shows the first days of back-to-back months, and I don't have to worry about why the seconds have a decimal point.

Rule #3: Never assume your code is running at its scheduled launch time.

Data is often summarized at a regular interval, but sometimes a job needs to be launched "out-of-cycle" when system problems occur or some source data straggles in after its expected arrival. That out-of-cycle execution can easily be a few days after the normal scheduled start on the 1st of the month.

How do you deal with this in code? Simple: ensure that your calculated start and cutoff dates do not vary for any system time within the same reporting or filtering period. A weekly time range generated on any day of a given week should always give the same answer. A monthly time range generated on any day of a given month should always give the same answer. So if I set my test date to 6/8/2010 @ 5:00 PM or to 6/22/2010, and I am calculating for the previous month, any date within June of 2010 should resolve to a start date of 5/1/2010 00:00:00 and a cutoff date of 6/1/2010 00:00:00.
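That invariant is easy to check mechanically. This illustrative Python sketch (the helper name is mine) confirms that several different run times inside June all resolve to one and the same previous-month window:

```python
from datetime import datetime, timedelta

def previous_month_window(now: datetime):
    # Cutoff = 1st of the current month at midnight; start = 1st of the
    # previous month at midnight. Both depend only on the month of 'now'.
    cutoff = now.replace(day=1, hour=0, minute=0, second=0, microsecond=0)
    return (cutoff - timedelta(days=1)).replace(day=1), cutoff

# Re-running on any day (or time of day) in June 2010 must not move the window:
runs = [datetime(2010, 6, 1), datetime(2010, 6, 8, 17, 0),
        datetime(2010, 6, 22, 9, 15)]
windows = {previous_month_window(r) for r in runs}
print(len(windows))   # 1: the window does not drift with the run time
```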

What do I hope you get out of this? I urge you to take the time to draw out scenarios of time handling to make sure all cases are covered, and to understand that calculating time filters improperly (or making assumptions about what the system date/time will be) can lead to serious user distrust in your code over time. This is especially true in system processes. These three rules are intended to help you avoid those problems.

In fact, treat date/time handling as you would the foundation of a house. If the foundation is off by even a few inches, or has small gaps here and there, it can spell disaster for the whole house as flooring, plumbing, electrical, walls, and furniture are added. The last thing a homeowner wants to discover in a finished, occupied house is that low water pressure is due to improper piping under the concrete slab. Fixing it means tearing up much of the house, and the whole house is affected. And if, after fixing the weak pipe, the water pressure jumps 20 psi and starts leaks in a bunch of other places, those leaks have to be fixed too. It gets ugly. Even worse, the integrity of the whole house comes into question.

That’s the effect bad date handling can potentially have as your data trickles downstream to the consumers of your results. So take care when processing time stamped data.