Using Patterns to Match Numeric Values
10.25.1 Problem
You need to make sure a string looks like a number.
10.25.2 Solution
Use a pattern that matches the type of number you're looking for.
10.25.3 Discussion
Patterns can be used to classify values into several types of numbers:
| Pattern | Type of value the pattern matches |
|---|---|
| `/^\d+$/` | Unsigned integer |
| `/^-?\d+$/` | Negative or unsigned integer |
| `/^[-+]?\d+$/` | Signed or unsigned integer |
| `/^[-+]?(\d+(\.\d*)?\|\.\d+)$/` | Floating-point number |
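The following is a minimal sketch of how the table's patterns might be applied in a script. The helper name classify and the sample values are illustrative assumptions, not part of the recipe. Each pattern in the table accepts everything the one above it accepts. Testing the patterns in order and stopping at the first match therefore yields the most restrictive type:

```perl
use strict;
use warnings;

# Patterns from the table, ordered from most to least restrictive.
# Each later pattern also matches everything the earlier ones match.
my @types = (
    [ qr/^\d+$/,                      'unsigned integer' ],
    [ qr/^-?\d+$/,                    'negative or unsigned integer' ],
    [ qr/^[-+]?\d+$/,                 'signed or unsigned integer' ],
    [ qr/^[-+]?(\d+(\.\d*)?|\.\d+)$/, 'floating-point number' ],
);

# Hypothetical helper: return the label of the most restrictive
# pattern that matches $val, or undef if none matches.
sub classify {
    my ($val) = @_;
    for my $t (@types) {
        return $t->[1] if $val =~ $t->[0];
    }
    return;
}

for my $val ('123', '-123', '+123', '12.5', '.5', '1e3', 'abc') {
    my $type = classify($val);
    printf "%-6s %s\n", $val, defined $type ? $type : 'not a number';
}
```

None of these patterns accepts exponential notation. A value such as 1e3 is therefore reported as not a number, and you would need to extend the floating-point pattern to accept it.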
The pattern /^\d+$/ matches unsigned integers by requiring a nonempty value that consists only of digits from the beginning to the end of the value. If you care only that a value begins with an integer, you can match the initial numeric part and extract it. To do this, omit the $ that requires the pattern to match through the end of the string, and place parentheses around the \d+ part. After a successful match, refer to the matched number as $1:

    if ($val =~ /^(\d+)/) {
        $val = $1;    # reset value to matched subpart
    }
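The same extraction can be wrapped in a small function. This is a sketch, and the name leading_int is hypothetical. Because it returns $1, which is a string, leading zeros survive:

```perl
use strict;
use warnings;

# Hypothetical helper: return the leading run of digits in $val
# (as a string, so leading zeros are kept), or undef if $val does
# not begin with a digit.
sub leading_int {
    my ($val) = @_;
    return $val =~ /^(\d+)/ ? $1 : undef;
}

for my $val ('123abc', '0013 units', 'abc123') {
    my $int = leading_int($val);
    print "$val -> ", (defined $int ? $int : 'no leading integer'), "\n";
}
```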
You could also add zero to the value, which causes Perl to perform an implicit string-to-number conversion that discards the non-numeric suffix:
    if ($val =~ /^\d+/) {
        $val += 0;
    }
However, if you run Perl with the -w option (which I recommend), this form of conversion generates warnings for values that actually have a non-numeric part. It will also convert string values like 0013 to the number 13, which may be unacceptable in some contexts.
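The following sketch shows both drawbacks of the add-zero approach. It collects warnings with a $SIG{__WARN__} handler instead of letting them print. It also uses the use warnings pragma, which turns on the same "isn't numeric" warning as -w but scopes it to the file rather than enabling it globally. The sample value 0013abc is an arbitrary choice:

```perl
use strict;
use warnings;

# Collect warnings instead of printing them, so they can be inspected.
my @warnings;
$SIG{__WARN__} = sub { push @warnings, $_[0] };

my $val = '0013abc';

# Capture approach: keeps the matched digits as the string "0013".
my $captured = $val =~ /^(\d+)/ ? $1 : undef;

# Add-zero approach: yields the number 13 and triggers a warning
# because the string has a non-numeric suffix.
my $converted = $val;
$converted += 0;

print "captured:  $captured\n";
print "converted: $converted\n";
print "warnings:  ", scalar(@warnings), "\n";
print "  $_" for @warnings;
```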
Some kinds of numeric values have a special format or other unusual constraints. Here are a few examples, and how to deal with them:
Zip Codes
Zip and Zip+4 Codes are postal codes used for mail delivery in the United States. They have values like 12345 or 12345-6789 (that is, five digits, possibly followed by a dash and four more digits). To match one form or the other, or both forms, use the following patterns:
| Pattern | Type of value the pattern matches |
|---|---|
| `/^\d{5}$/` | Zip Code, five digits only |
| `/^\d{5}-\d{4}$/` | Zip+4 Code |
| `/^\d{5}(-\d{4})?$/` | Zip or Zip+4 Code |
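One way to use these patterns together is sketched below. The helper zip_kind is a hypothetical name. It reports which form a value takes and checks that the combined pattern agrees:

```perl
use strict;
use warnings;

# Hypothetical helper: return 'Zip' or 'Zip+4' for a valid code,
# or undef if the value matches neither form.
sub zip_kind {
    my ($val) = @_;
    return 'Zip'   if $val =~ /^\d{5}$/;
    return 'Zip+4' if $val =~ /^\d{5}-\d{4}$/;
    return;
}

for my $val ('12345', '12345-6789', '1234', '12345-678', '12345 6789') {
    my $kind   = zip_kind($val);
    # The combined pattern should accept exactly what zip_kind accepts.
    my $either = $val =~ /^\d{5}(-\d{4})?$/ ? 'yes' : 'no';
    printf "%-11s %-8s combined match: %s\n",
        $val, (defined $kind ? $kind : 'invalid'), $either;
}
```

Keep in mind that $ also matches just before a string-final newline, so 12345 followed by a newline passes these checks. If that matters, chomp the input first or anchor with \z instead of $.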
Credit card numbers
Credit card numbers typically consist of digits, but it's common for values to be written with spaces, dashes, or other characters between groups of digits. For example, the following numbers would be considered equivalent:
    0123456789012345
    0123 4567 8901 2345
    0123-4567-8901-2345
To match such values, use this pattern:
    /^[- \d]+$/
(Perl allows the \d digit specifier within character classes.) However, that pattern doesn't reject values of the wrong length, and you may want to remove the extraneous characters anyway. If you require credit card values to contain 16 digits, use a substitution to remove all non-digits, then check the length of the result:
    $val =~ s/\D//g;                 # strip all non-digits
    $valid = (length($val) == 16);   # true if exactly 16 digits remain
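The two steps (check the allowed characters, then strip and count the digits) can be combined into one function. This is a sketch that assumes the 16-digit requirement from the text. The name normalize_card is hypothetical. The function returns the bare digit string so that the caller can store a canonical form:

```perl
use strict;
use warnings;

# Hypothetical helper: accept values made of digits, spaces, and dashes,
# strip the separators, and require exactly 16 digits (the length rule
# assumed in the text). Returns the digit string, or undef if rejected.
sub normalize_card {
    my ($val) = @_;
    return unless $val =~ /^[- \d]+$/;   # only digits, spaces, dashes
    (my $digits = $val) =~ s/\D//g;      # remove everything but digits
    return length($digits) == 16 ? $digits : undef;
}

for my $val ('0123456789012345', '0123 4567 8901 2345',
             '0123-4567-8901-2345', '0123-4567-8901',
             '0123.4567.8901.2345') {
    my $card = normalize_card($val);
    print "$val -> ", (defined $card ? $card : 'rejected'), "\n";
}
```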
Innings pitched
In baseball, one statistic recorded for pitchers is the number of innings pitched, measured in thirds of innings (corresponding to the number of outs recorded). These values are numeric, but must satisfy an additional constraint: a fractional part is allowed, but if present, it must consist of a single digit 0, 1, or 2. That is, legal values take the form 0, .1, .2, 1, 1.1, 1.2, 2, and so forth. To match an unsigned integer (optionally followed by a decimal point and perhaps a fractional digit of 0, 1, or 2), or a fractional digit with no leading integer, use this pattern:
    /^(\d+(\.[012]?)?|\.[012])$/
The alternatives in the pattern are grouped within parentheses because otherwise the ^ anchors only the first of them to the beginning of the string and the $ anchors only the second to the end.
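To make the anchoring point concrete, this sketch compares the grouped pattern with the same alternatives written without the enclosing parentheses. The sample values are arbitrary:

```perl
use strict;
use warnings;

# Correct form: both alternatives are anchored at both ends.
my $grouped   = qr/^(\d+(\.[012]?)?|\.[012])$/;
# Incorrect form: ^ applies only to the first alternative,
# $ only to the second.
my $ungrouped = qr/^\d+(\.[012]?)?|\.[012]$/;

for my $val ('0', '.1', '1.2', '7.', '1.3', '1.10', 'x.2', '5 innings') {
    printf "%-10s grouped: %-3s ungrouped: %s\n", $val,
        ($val =~ $grouped   ? 'yes' : 'no'),
        ($val =~ $ungrouped ? 'yes' : 'no');
}
```

The ungrouped version wrongly accepts 1.3, 1.10, and 5 innings because its first alternative lacks a $. It also accepts x.2 because its second alternative lacks a ^.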