Using Patterns to Match Dates or Times

10.26.1 Problem

You need to make sure a string looks like a date or time.

10.26.2 Solution

Use a pattern that matches the type of temporal value you expect. Be sure to consider issues such as how strict to be about delimiters between subparts and the lengths of the subparts.

10.26.3 Discussion

Dates are a validation headache because they come in so many formats. Pattern tests are extremely useful for weeding out illegal values, but often insufficient for full verification: a date might have a number where you expect a month, but if the number is 13, the date isn't valid. This section introduces some patterns that match a few common date formats. Recipe 10.31 revisits this topic in more detail and discusses how to combine pattern tests with content verification.

To require values to be dates in ISO (CCYY-MM-DD) format, use this pattern:

/^\d{4}-\d{2}-\d{2}$/

The pattern requires - as the delimiter between date parts. To allow either - or / as the delimiter, use a character class between the numeric parts (the slashes are escaped with a backslash to prevent them from being interpreted as the end of the pattern constructor):

/^\d{4}[-\/]\d{2}[-\/]\d{2}$/

Or you can use a different delimiter around the pattern and avoid the backslashes:

m|^\d{4}[-/]\d{2}[-/]\d{2}$|

To allow any non-digit delimiter (which corresponds to how MySQL operates when it interprets strings as dates), use this pattern:

/^\d{4}\D\d{2}\D\d{2}$/
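As a quick way to compare the three delimiter policies, the following sketch runs a few sample strings through each pattern. The hash keys and the matching_patterns() helper are names invented for this illustration; they are not part of the recipes distribution.

```perl
use strict;
use warnings;

# Three variants of the ISO date pattern, from strictest to loosest.
# The key names are labels made up for this example.
my %patterns = (
    dash_only     => qr/^\d{4}-\d{2}-\d{2}$/,
    dash_or_slash => qr{^\d{4}[-/]\d{2}[-/]\d{2}$},
    any_nondigit  => qr/^\d{4}\D\d{2}\D\d{2}$/,
);

# Return the names (in sorted order) of the patterns that a value matches.
sub matching_patterns {
    my $val  = shift;
    my @hits = grep { $val =~ $patterns{$_} } sort keys %patterns;
    return @hits;
}

for my $val ("2024-03-15", "2024/03/15", "2024.03.15", "20240315") {
    my @names = matching_patterns($val);
    printf "%-12s %s\n", $val, @names ? join(", ", @names) : "(no match)";
}
```

Only the value with - delimiters satisfies all three patterns, and the undelimited value satisfies none of them.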

If you don't require the full number of digits in each part (to allow leading zeros in values like 03 to be missing, for example), just look for three nonempty digit sequences:

/^\d+\D\d+\D\d+$/

Of course, that pattern is so general that it will also match other values such as U.S. Social Security numbers (which have the format 012-34-5678). To constrain the subpart lengths by requiring two to four digits in the year part and one or two digits in the month and day parts, use this pattern:

/^\d{2,4}\D\d{1,2}\D\d{1,2}$/
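To see the difference between the general pattern and the length-constrained one, here is a minimal sketch that tests both against a few values, including the Social Security number mentioned above. The check_date_patterns() helper is a hypothetical name used only for this example.

```perl
use strict;
use warnings;

my $loose       = qr/^\d+\D\d+\D\d+$/;
my $constrained = qr/^\d{2,4}\D\d{1,2}\D\d{1,2}$/;

# Hypothetical helper: report whether a value matches each pattern.
# Returns a two-element list of 1/0 flags: (loose, constrained).
sub check_date_patterns {
    my $val = shift;
    return (($val =~ $loose ? 1 : 0), ($val =~ $constrained ? 1 : 0));
}

for my $val ("2024-3-5", "99-12-31", "012-34-5678") {
    my ($l, $c) = check_date_patterns($val);
    printf "%-12s loose=%d constrained=%d\n", $val, $l, $c;
}
```

The Social Security number passes the loose test but fails the constrained one because its last part has four digits. Note that the constrained pattern still accepts nonsense such as 99-99-99, because it checks lengths, not values.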

For dates in other formats such as MM-DD-YY or DD-MM-YY, similar patterns apply, but the subparts are arranged in a different order. This pattern matches both of those formats:

/^\d{2}-\d{2}-\d{2}$/

If you need to check the values of individual date parts, use parentheses in the pattern and extract the substrings after a successful match. If you're expecting dates to be in ISO format, for example, do something like this:

if ($val =~ /^(\d{2,4})\D(\d{1,2})\D(\d{1,2})$/)
{
  ($year, $month, $day) = ($1, $2, $3);
}

The library file lib/Cookbook_Utils.pm in the recipes distribution contains several of these pattern tests, packaged as function calls. If the date doesn't match the pattern, they return undef. Otherwise, they return a reference to an array containing the broken-out values for the year, month, and day. This can be useful for performing further checking on the components of the date. For example, is_iso_date( ) looks for dates that match ISO format. It's defined as follows:

sub is_iso_date
{
  my $s = shift;

  return undef unless $s =~ /^(\d{2,4})\D(\d{1,2})\D(\d{1,2})$/;
  return [ $1, $2, $3 ];  # return year, month, day
}

To use the function, do something like this:

my $ref = is_iso_date ($val);
if (defined ($ref))
{
  # $val matched ISO format pattern;
  # check its subparts using $ref->[0] through $ref->[2]
}
else
{
  # $val didn't match ISO format pattern
}
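As an illustration of what checking the subparts might look like, the following sketch layers a coarse range test on top of is_iso_date(). The is_plausible_iso_date() function is a hypothetical name, not part of Cookbook_Utils.pm, and it deliberately ignores month lengths and leap years.

```perl
use strict;
use warnings;

# is_iso_date() as defined in the text
sub is_iso_date {
    my $s = shift;
    return undef unless $s =~ /^(\d{2,4})\D(\d{1,2})\D(\d{1,2})$/;
    return [ $1, $2, $3 ];    # year, month, day
}

# Hypothetical helper: pattern test plus a coarse range check.
# It does not verify days per month or leap years.
sub is_plausible_iso_date {
    my $ref = is_iso_date(shift);
    return 0 unless defined $ref;
    my ($year, $month, $day) = @$ref;
    return ($month >= 1 && $month <= 12 && $day >= 1 && $day <= 31) ? 1 : 0;
}

for my $val ("2024-03-15", "2024-13-01", "2024-02-30", "not a date") {
    printf "%-12s %s\n", $val,
        is_plausible_iso_date($val) ? "plausible" : "rejected";
}
```

A value such as 2024-02-30 still passes this coarse check, which is why fuller content verification is a separate topic.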

You'll often find additional processing necessary with dates, because although date-matching patterns help to weed out values that are syntactically malformed, they don't assess whether the individual components contain legal values. To do that, some range checking is necessary. That topic is covered later in Recipe 10.31.

If you're willing to skip subpart testing and just want to rewrite the pieces, you can use a substitution. For example, to rewrite values assumed to be in MM-DD-YY format into YY-MM-DD format, do this:

$val =~ s/^(\d+)\D(\d+)\D(\d+)$/$3-$1-$2/;
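The substitution can be wrapped in a small function so that callers can tell whether the rewrite happened. This is only a sketch; mdy_to_ymd() is a name made up for this example.

```perl
use strict;
use warnings;

# Hypothetical helper: rewrite MM-DD-YY into YY-MM-DD.
# Returns undef if the value doesn't look like three digit groups.
sub mdy_to_ymd {
    my $val = shift;
    return undef unless $val =~ s/^(\d+)\D(\d+)\D(\d+)$/$3-$1-$2/;
    return $val;
}

print mdy_to_ymd("12-25-99"), "\n";    # prints 99-12-25
print mdy_to_ymd("7/4/76"), "\n";      # prints 76-7-4
```

The rewritten value always uses - as the delimiter, whatever delimiter the input used, and digit counts are carried over unchanged (no zero-padding is applied).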

Time values are somewhat more orderly than dates, usually being written with hours first and seconds last, with two digits per part:

/^\d{2}:\d{2}:\d{2}$/

To be more lenient, you can allow the hours part to have a single digit, or the seconds part to be missing:

/^\d{1,2}:\d{2}(:\d{2})?$/

You can mark parts of the time with parentheses if you want to range-check the individual parts, or perhaps to reformat the value to include a seconds part of 00 if it happens to be missing. However, this requires some care with the parentheses and the ? characters in the pattern if the seconds part is optional. You want to allow the entire :\d{2} at the end of the pattern to be optional, but not to save the : character in $3 if the third time section is present. To accomplish that, use (?:pat), an alternative grouping notation that doesn't save the matched substring. Within that notation, use parentheses around the digits to save them. Then $3 will be undef if the seconds part is not present, but will contain the seconds digits otherwise:

if ($val =~ /^(\d{1,2}):(\d{2})(?::(\d{2}))?$/)
{
  my ($hour, $min, $sec) = ($1, $2, $3);
  $sec = "00" if !defined ($sec); # seconds missing; use 00
  $val = "$hour:$min:$sec";
}
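The same logic, packaged as a function, makes the behavior of the optional seconds group easy to try out. This is a sketch only; normalize_time() is a hypothetical name.

```perl
use strict;
use warnings;

# Hypothetical helper: return the time with a seconds part,
# or undef if the value doesn't match the pattern.
sub normalize_time {
    my $val = shift;
    return undef unless $val =~ /^(\d{1,2}):(\d{2})(?::(\d{2}))?$/;
    my ($hour, $min, $sec) = ($1, $2, $3);
    $sec = "00" unless defined $sec;    # (?:...) group did not match
    return "$hour:$min:$sec";
}

for my $val ("9:30", "09:30:15", "9:3") {
    my $t = normalize_time($val);
    printf "%-10s %s\n", $val, defined $t ? $t : "(no match)";
}
```

The first value gains a seconds part of 00, the second is returned unchanged, and the third fails because the minutes part needs exactly two digits.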

To rewrite times in 12-hour format with AM and PM suffixes into 24-hour format, you can do something like this:

if ($val =~ /^(\d{1,2}):(\d{2})(?::(\d{2}))?\s*(AM|PM)?$/i)
{
  my ($hour, $min, $sec) = ($1, $2, $3);
  # supply missing seconds
  $sec = "00" unless defined ($sec);
  # convert 0 .. 11 -> 12 .. 23 for PM times
  $hour += 12 if defined ($4) && uc ($4) eq "PM";
  $val = "$hour:$min:$sec";
}

The time parts are placed into $1, $2, and $3, with $3 set to undef if the seconds part is missing. The suffix goes into $4 if it's present. If the suffix is AM or missing (undef), the value is interpreted as an AM time. If the suffix is PM, the value is interpreted as a PM time.
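The conversion shown above assumes PM hours in the range 0 to 11, so a value such as 12:30 PM would come out as 24:30:00. If your data can contain 12 o'clock values, a variant along the following lines may be a reasonable starting point. It is a sketch under stated assumptions: to_24_hour() is a hypothetical name, values with a suffix are required to have an hour from 1 to 12, and values with no suffix are passed through as if they were already in 24-hour form.

```perl
use strict;
use warnings;

# Hypothetical helper: convert a 12-hour time with optional AM/PM suffix
# to HH:MM:SS in 24-hour form. Returns undef if the value is not usable.
sub to_24_hour {
    my $val = shift;
    return undef
        unless $val =~ /^(\d{1,2}):(\d{2})(?::(\d{2}))?\s*(AM|PM)?$/i;
    my ($hour, $min, $sec, $suffix) = ($1, $2, $3, $4);
    $sec = "00" unless defined $sec;       # supply missing seconds
    if (defined $suffix) {
        # Assumption: a suffixed value must use a 12-hour clock hour
        return undef if $hour < 1 || $hour > 12;
        $hour %= 12;                        # 12 AM -> 0; 12 PM -> 0, then +12
        $hour += 12 if uc($suffix) eq "PM";
    }
    return sprintf("%02d:%02d:%02d", $hour, $min, $sec);
}

for my $val ("12:30 PM", "12:05 am", "1:15:20 pm", "9:00") {
    printf "%-12s %s\n", $val, to_24_hour($val);
}
```

Unlike the snippet in the text, this version zero-pads each part with sprintf, so 9:00 becomes 09:00:00.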

10.26.4 See Also

This section is just the beginning of what you can do when processing dates for data transfer purposes. Date and time testing and conversion can be highly idiosyncratic, and the sheer number of issues to consider is mind-boggling:

The prevalence of these issues in data transfer problems means that you'll probably end up writing some of your own validators on occasion to handle very specific date formats. Later sections of this chapter can provide additional assistance. For example, Recipe 10.30 covers conversion of two-digit year values to four-digit form, and Recipe 10.31 discusses how to perform validity checking on components of date or time values.
