Drax'Jinn Development, LLC.

Partial Dates in PHP

Monday, January 5th, 2009

The problem with PHP’s date parsing

I’ve been working for Dream World Technology on the WikiHorseWorld.com project, and needed a small piece of code that would allow partial dates to be entered into a text input field and still be parsed properly. So, after digging in the PHP documentation, I found mktime(), strptime(), and strtotime(). (Yes, I know that the date routines were updated as of PHP 5.1.x, but they still had to be compiled in until PHP 5.2, so that wasn’t an option right now.)

The problem with the built-in functions is that all of them expect a well-formed date: you must include the month, day, and year, or the function will either fail or return a date you did not intend. PHP does include some methods for parsing strings based on formatting, but you actually have to know the format before they can be used!
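
As a quick illustration of the problem, here is a minimal sketch using strtotime(). The time zone is pinned to UTC only so the output is repeatable; that setting and the variable names are my own assumptions for the example.

```php
<?php
// Pin the time zone so the results do not depend on server settings.
date_default_timezone_set('UTC');

// A month and year are accepted, but the day is silently set to 1.
$monthYear = date('Y-m-d', strtotime('January 2009'));
echo $monthYear, "\n"; // 2009-01-01

// A bare four-digit year is read as a time of day (20:09), not as a year.
$yearOnly = date('H:i', strtotime('2009'));
echo $yearOnly, "\n"; // 20:09
```

Neither result tells the caller that part of the date was missing, which is exactly the information a partial-date field needs to keep.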

In addition, from past experience, I have discovered that without some form of prompting, everyone seems to have their own way of entering a date into a text box. Even with a small description paragraph that explains how to enter the date, and provides examples, you will still get some people who enter the date the way that THEY want to!

To further complicate matters, date conventions differ by country. For example, here in the USA we enter dates in the form of m-d-Y, but the European convention is d/m/Y. On top of that, MySQL expects dates to be entered in the ISO format of Y-m-d, and people may want to enter any of these with a 2 or 4 digit year, with or without a leading zero in front of the month and day. (And these scenarios are just for numeric dates. Just imagine the chaos if you allowed month names to be entered!)

Before I get into this any deeper, I should probably explain the reason for the piece of code I am about to show you.

Why would I need partial date matching?

WikiHorseWorld.com is a community site for horse enthusiasts to collaborate and exchange information. To that end, I am programming a pedigree research database that will allow horse breeders to show the lineage of their horses. The information collected includes both a date of birth and, for past generations of horses, a date of death.

The date code needs to handle partial matches because people often know the year of birth, or even the month and year, but not the full date. I could fake the missing information by defaulting to January for the month and the first for the day, but that presents its own problems. (Users will be Users, and I don’t want to field the Tech Support questions regarding “Why is my horse’s birthday 2009-01-01 when I only put in 2009?”.)

The Criteria

The specifications that I was given to program were very precise. The pedigree was to allow for dates to be entered in any form, and also allow for partial date matching with just a year, and year/month combinations. To that end, I worked up the following criteria that the script would need to fulfill:

  • The script must be able to parse full and partial month names (such as January and Jan)
  • Two and Four digit years must be supported (such as 98 and 1998)
  • The script must understand months and days with and without leading zeros (such as 01/01/2001 and 1/1/2001)
  • The script must allow for punctuation and day suffixes to be used (such as January 21st, 1998)
  • The script must allow for the most common date delimiters to be used (such as -/,.\ and spaces)
  • Extra whitespace should not break the algorithms
  • The code must be compact and easy to use
  • The code must allow me to determine if any parts of a date were left out
  • I must be able to output a standardized “Pretty” output no matter what the user enters.

Finally! A Solution

I’m not going to post the entire function code here unless I get permission from Dream World Technology; however, I will post enough for you to understand the concept.

I finally settled on four regular expressions (regex) that will allow for the most commonly used date formats. The formats I decided to use are: m-d-Y, Y-m-d, d-m-Y, and Y-d-m.

First we pre-process the string to “normalize” it. This involves converting everything to lowercase and replacing any month names with their abbreviations. This process also converts the delimiters to spaces so that whitespace can be taken into consideration.
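
The normalization step could look something like the sketch below. This is not the production function: the name normalizeDateInput(), the exact order of the steps, and the collapsing of repeated whitespace are assumptions made for illustration, and it is written for current PHP versions.

```php
<?php
// Hypothetical helper: the name and exact steps are assumptions for illustration.
function normalizeDateInput(string $input): string
{
    // Lowercase everything so month names compare consistently.
    $s = strtolower(trim($input));

    // Replace full month names with three-letter abbreviations.
    // "may" is already its own abbreviation, so it needs no entry.
    $months = [
        'january' => 'jan', 'february' => 'feb', 'march' => 'mar',
        'april' => 'apr', 'june' => 'jun', 'july' => 'jul',
        'august' => 'aug', 'september' => 'sep', 'october' => 'oct',
        'november' => 'nov', 'december' => 'dec',
    ];
    $s = strtr($s, $months);

    // Turn the common delimiters (- / , . \) into spaces, then collapse whitespace runs.
    $s = str_replace(['-', '/', ',', '.', '\\'], ' ', $s);
    return trim(preg_replace('/\s+/', ' ', $s));
}

echo normalizeDateInput('January 5th, 2009'), "\n"; // jan 5th 2009
echo normalizeDateInput('01-05-2009'), "\n";        // 01 05 2009
echo normalizeDateInput('  1.5.2009 '), "\n";       // 1 5 2009
```

Ordinal suffixes such as "th" are deliberately left in place here, on the assumption that the regular expression deals with them.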

Next, I created a regular expression that would allow parsing of the m-d-Y format:

'/^\b(0?[1-9]|1[012]|jan|feb|mar|apr|may|jun|jul|aug|sep|oct|nov|dec)[\s]+(?:(?:(0?[1-9]|[12][0-9]|3[01])(?:st|nd|rd|th|)[\s]+)|)((?:19|20)?[0-9]{2})\b/'

This regular expression allows for a leading zero in the month and day, and captures the day without its ordinal suffix (1st, 2nd, 3rd, 4th, etc.). The month can be either numeric or an abbreviation. The year can be in 2 or 4 digit format, and the day can even be left out, which allows this one expression to parse m-d-Y, n-j-y, M-d-Y, m-Y, n-y, and M-Y, or any combination of those formats!

Note: This expression only parses dates from 1900-2099. If you need dates beyond this, you will need to modify the expression slightly.
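
As a sketch of what such a modification might look like, the year group can be widened on its own. The variant below, which also accepts the 1800s, is an assumption for illustration and is tested here in isolation rather than inside the full expression.

```php
<?php
// Original year group: ((?:19|20)?[0-9]{2})    -> two-digit years, or 1900-2099.
// Assumed variant:     ((?:1[89]|20)?[0-9]{2}) -> also accepts 1800-1899.
$szYear = '/^((?:1[89]|20)?[0-9]{2})\b/';

var_dump(preg_match($szYear, '1875')); // int(1)
var_dump(preg_match($szYear, '2009')); // int(1)
var_dump(preg_match($szYear, '1775')); // int(0)
```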

The results of the match would include the parsed string in index 0, the month in index 1, the day in index 2, and the year in index 3.

Running this regular expression against the normalized input (the original entry is shown in each comment):

$szFormat = '/^\b(0?[1-9]|1[012]|jan|feb|mar|apr|may|jun|jul|aug|sep|oct|nov|dec)[\s]+(?:(?:(0?[1-9]|[12][0-9]|3[01])(?:st|nd|rd|th|)[\s]+)|)((?:19|20)?[0-9]{2})\b/';
preg_match($szFormat, '01 05 2009', $aTemp);   // entered as "01-05-2009"
preg_match($szFormat, '1 5 2009', $aTemp);     // entered as "1.5.2009"
preg_match($szFormat, '01 09', $aTemp);        // entered as "01/09"
preg_match($szFormat, 'jan 5th 2009', $aTemp); // entered as "January 5th, 2009"

The resulting arrays would be:

{0 => '01 05 2009', 1 => '01', 2 => '05', 3 => '2009' }

{0 => '1 5 2009', 1 => '1', 2 => '5', 3 => '2009' }

{0 => '01 09', 1 => '01', 2 => '', 3 => '09' }

{0 => 'jan 5th 2009', 1 => 'jan', 2 => '5', 3 => '2009' }
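
Because an omitted day comes back as an empty string, the match array itself shows which parts were supplied. Here is a minimal sketch of reading it; the helper name partsFromMatch() and the pivot of 30 for two-digit years are assumptions of mine, not part of the production code.

```php
<?php
// Hypothetical helper: converts a match array from the m-d-Y pattern into
// named parts, using null for anything the user left out.
function partsFromMatch(array $aMatch): array
{
    $months = ['jan' => 1, 'feb' => 2, 'mar' => 3, 'apr' => 4, 'may' => 5, 'jun' => 6,
               'jul' => 7, 'aug' => 8, 'sep' => 9, 'oct' => 10, 'nov' => 11, 'dec' => 12];

    // The month is either an abbreviation or a number.
    $month = isset($months[$aMatch[1]]) ? $months[$aMatch[1]] : (int) $aMatch[1];

    // An empty day group means the day was not entered.
    $day = ($aMatch[2] === '') ? null : (int) $aMatch[2];

    // Expand two-digit years. The pivot (00-29 => 20xx, 30-99 => 19xx) is an assumption.
    $year = (int) $aMatch[3];
    if (strlen($aMatch[3]) === 2) {
        $year += ($year < 30) ? 2000 : 1900;
    }

    return ['year' => $year, 'month' => $month, 'day' => $day];
}

$parts = partsFromMatch([0 => '01 09', 1 => '01', 2 => '', 3 => '09']);
var_dump($parts['day'] === null); // bool(true): the day was left out
echo $parts['year'], "\n";        // 2009
```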

The other three regular expressions work similarly, but with the fields moved around to match the format. I use an if…elseif structure to handle the processing, with the most common (for our audience) format in the first if statement, and the other three in order of precedence.
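
The structure might be sketched like this, with only two of the four patterns. The m-d-Y pattern is the one shown earlier, rebuilt from fragments; the Y-m-d pattern, the function name parsePartialDate(), and the returned array layout are assumptions made up for this example, written for current PHP versions.

```php
<?php
// Building blocks taken from the m-d-Y pattern shown earlier.
$szMonth = '(0?[1-9]|1[012]|jan|feb|mar|apr|may|jun|jul|aug|sep|oct|nov|dec)';
$szDay   = '(0?[1-9]|[12][0-9]|3[01])(?:st|nd|rd|th|)';
$szYear  = '((?:19|20)?[0-9]{2})';

// The m-d-Y pattern, plus an assumed Y-m-d pattern for illustration.
// The Y-m-d variant requires a four-digit year here to limit ambiguity.
$szMDY = '/^\b' . $szMonth . '[\s]+(?:(?:' . $szDay . '[\s]+)|)' . $szYear . '\b/';
$szYMD = '/^\b((?:19|20)[0-9]{2})(?:[\s]+' . $szMonth . '(?:[\s]+' . $szDay . ')?)?\b/';

function parsePartialDate(string $szDate, string $szMDY, string $szYMD): ?array
{
    if (preg_match($szMDY, $szDate, $aTemp)) {
        return ['format' => 'm-d-Y', 'year' => $aTemp[3],
                'month' => $aTemp[1], 'day' => $aTemp[2]];
    } elseif (preg_match($szYMD, $szDate, $aTemp)) {
        // Trailing optional groups that did not match are absent from $aTemp.
        return ['format' => 'Y-m-d', 'year' => $aTemp[1],
                'month' => $aTemp[2] ?? '', 'day' => $aTemp[3] ?? ''];
    }
    return null; // no supported format matched
}

$aFirst = parsePartialDate('01 05 2009', $szMDY, $szYMD);
echo $aFirst['format'], "\n";  // m-d-Y

$aSecond = parsePartialDate('2009 jan', $szMDY, $szYMD);
echo $aSecond['format'], "\n"; // Y-m-d
```

The order of the branches matters: an input such as "01 05 09" could satisfy more than one layout, so the first pattern that matches decides how it is read.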

Finally, just a little extra processing to normalize the output, and we have a very versatile algorithm that does exactly what we set out to do.
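
For the “Pretty” output, one possible approach is to print only the parts that were actually entered. The function name prettyPartialDate() and the output style ("January 5, 2009") are assumptions chosen for this sketch, not the format used on the site.

```php
<?php
// Hypothetical formatter: takes numeric parts (null = not entered) and
// prints only what the user actually supplied.
function prettyPartialDate(int $year, ?int $month = null, ?int $day = null): string
{
    // Keys start at 1, so the month number indexes the name directly.
    $names = [1 => 'January', 'February', 'March', 'April', 'May', 'June', 'July',
              'August', 'September', 'October', 'November', 'December'];

    if ($month === null) {
        return (string) $year;                         // year only
    }
    if ($day === null) {
        return $names[$month] . ' ' . $year;           // month and year
    }
    return $names[$month] . ' ' . $day . ', ' . $year; // full date
}

echo prettyPartialDate(2009), "\n";       // 2009
echo prettyPartialDate(2009, 1), "\n";    // January 2009
echo prettyPartialDate(2009, 1, 5), "\n"; // January 5, 2009
```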

Conclusion

Though PHP is a very versatile language, and has many useful built-in functions, I’ve always felt that it was lacking a little in the area of date/time manipulation. However, with a little creative processing with regular expressions, these shortcomings can be overcome.

-Jason




Code and Content © Copyright 2007-2008 Drax'Jinn Development, LLC. All Rights Reserved