Performing Validity Checking on Date or Time Subparts
10.31.1 Problem
A string passes a pattern test as a date or time, but you want to perform further checking to make sure that it's legal.
10.31.2 Solution
Break up the value into subparts and perform the appropriate range checking on each part.
10.31.3 Discussion
Pattern matching may not be sufficient for checking dates or times. For example, a value like 1947-15-19 may match a date pattern, but if you insert the value into a DATE column, MySQL will convert it to 0000-00-00. If you want to find out that the value is bad before putting it into your database, combine pattern matching with range checking.
To make sure that a date is legal, break out the year, month, and day values, then check that they're within the proper ranges. Years should be less than 9999 (MySQL represents dates to an upper limit of 9999-12-31), month values should be in the range from 1 to 12, and days should be in the range from 1 to the number of days in the month. That latter part is the trickiest; it's month-dependent, and for February, it's also year-dependent because it changes for leap years.
Suppose you're checking input dates in ISO format. Earlier, in Recipe 10.26, we used an is_iso_date( ) function from the Cookbook_Utils.pm library file to perform a pattern match on a date string and break it into component values:
my $ref = is_iso_date ($val); if (defined ($ref)) { # $val matched ISO format pattern; # check its subparts using $ref->[0] through $ref->[2] } else { # $val didn't match ISO format pattern }
is_iso_date( ) returns undef if the value doesn't satisfy a pattern that matches ISO date format. Otherwise, it returns a reference to an array containing the year, month, and day values.[5] To perform additional checking on the date parts, pass them to is_valid_date( ), another library function:
[5] The Cookbook_Utils.pm file also contains is_mmddyy_date( ) and is_ddmmyy_date( ) routines that match dates in U.S. or British format and return undef or a reference to an array of date parts. (The parts are always in year, month, day order, not the order in which the parts appear in the date string.)
$valid = is_valid_date ($ref->[0], $ref->[1], $ref->[2]);
Or, more concisely:
$valid = is_valid_date (@{$ref});
is_valid_date( ) checks the parts of a date like this:
sub is_valid_date { my ($year, $month, $day) = @_; # year must be non-negative, month and day must be positive return (0) if $year < 0 || $month < 1 || $day < 1; # check maximum limits on individual parts return (0) if $year > 9999; return (0) if $month > 12; return (0) if $day > days_in_month ($year, $month); return (1); }
is_valid_date( ) requires separate year, month, and day values, not a date string. This forces you to break apart candidate values into components before invoking it, but makes it applicable in more contexts. For example, you can use it to check dates like 12 February 2003 by mapping the month to its numeric value before calling is_valid_date( ). Were is_valid_date( ) to take a string argument assumed to be in a given date format, it would be much less general.
is_valid_date( ) uses a subsidiary function days_in_month( ) to determine how many days there are in the month represented by the date. days_in_month( ) requires both the year and the month as arguments, because if the month is 2 (February), the number of days depends on whether the year is a leap year. This means you must pass a four-digit year value. Two-digit years are ambiguous with respect to the century, and proper leap-year testing is impossible, as discussed in Recipe 5.28. The days_in_month( ) and is_leap_year( ) functions are based on techniques taken straight from there:
sub is_leap_year { my $year = shift; return (($year % 4 == 0) && ((($year % 100) != 0) || ($year % 400) == 0)); } sub days_in_month { my ($year, $month) = @_; my @day_tbl = (31, 28, 31, 30, 31, 30, 31, 31, 30, 31, 30, 31); my $days = $day_tbl[$month-1]; # add a day for Feb of leap years $days++ if $month == 2 && is_leap_year ($year); return ($days); }
To perform validity checking on time values, a similar procedure can be used, although the ranges for the subparts are different: 0 to 24 for the hour, and 0 to 59 for the minute and second. Here is a function is_24hr_time( ) that checks for values in 24-hour format:
sub is_24hr_time { my $s = shift; return undef unless $s =~ /^(d{1,2})D(d{2})D(d{2})$/; return [ $1, $2, $3 ]; # return hour, minute, second }
The following is_ampm_time( ) function looks for times in 12-hour format with an optional AM or PM suffix, converting PM times to 24-hour values:
sub is_ampm_time { my $s = shift; return undef unless $s =~ /^(d{1,2})D(d{2})D(d{2})(?:s*(AM|PM))?$/i; my ($hour, $min, $sec) = ($1, $2, $3); $hour += 12 if defined ($4) && uc ($4) eq "PM"; return [ $hour, $min, $sec ]; # return hour, minute, second }
Both functions return undef for values that don't match the pattern. Otherwise, they return a reference to a three-element array containing the hour, minute, and second values.