Matching Strings with Regular Expressions

Problem

You want to know whether or not a string matches a certain pattern.

Solution

You can usually describe the pattern as a regular expression. The =~ operator tests a string against a regular expression:

    string = 'This is a 30-character string.'

    if string =~ /([0-9]+)-character/ and $1.to_i == string.length
      "Yes, there are #$1 characters in that string."
    end
    # => "Yes, there are 30 characters in that string."

You can also use Regexp#match:

    match = Regexp.compile('([0-9]+)-character').match(string)
    if match && match[1].to_i == string.length
      "Yes, there are #{match[1]} characters in that string."
    end
    # => "Yes, there are 30 characters in that string."
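The two approaches differ mainly in what they return. As a minimal sketch (the variable names are illustrative and not part of the recipe), =~ gives back the index where the match starts, or nil, while Regexp#match gives back a MatchData object, or nil:

```ruby
string = 'This is a 30-character string.'

# =~ returns the index where the match starts, or nil when nothing matches.
index    = string =~ /([0-9]+)-character/
no_index = string =~ /([0-9]+)-byte/

# Regexp#match returns a MatchData object (or nil) instead of an index.
match  = /([0-9]+)-character/.match(string)
whole  = match[0]        # the full matched text
digits = match[1]        # the first capture group
start  = match.begin(0)  # the same offset that =~ reported

puts index.inspect     # 10
puts no_index.inspect  # nil
puts whole             # 30-character
puts digits            # 30
puts start             # 10
```

The MatchData object keeps the captures in one place, which avoids relying on the global $1 variable.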

You can check a string against a series of regular expressions with a case statement:

    string = "123"

    case string
    when /^[a-zA-Z]+$/
      "Letters"
    when /^[0-9]+$/
      "Numbers"
    else
      "Mixed"
    end
    # => "Numbers"
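The case statement works with regular expressions because each when clause compares its pattern to the string using Regexp#===. A rough sketch of the same check wrapped in a method (classify is a hypothetical name chosen for this example):

```ruby
# classify is a hypothetical helper name used only for this sketch.
def classify(string)
  case string
  when /^[a-zA-Z]+$/ then "Letters"
  when /^[0-9]+$/    then "Numbers"
  else                    "Mixed"
  end
end

# Each `when` clause calls Regexp#=== on its pattern, so this is the
# same test the second clause performs:
direct = (/^[0-9]+$/ === "123")

puts classify("123")     # Numbers
puts classify("abc")     # Letters
puts classify("abc123")  # Mixed
puts direct              # true
```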

 

Discussion

Regular expressions are a cryptic but powerful minilanguage for string matching and substring extraction. They've been around for a long time in Unix utilities like sed, but Perl was the first general-purpose programming language to include them. Now almost all modern languages support Perl-style regular expressions.

Ruby provides several ways of initializing regular expressions. The following forms all create equivalent Regexp objects:

    /something/
    Regexp.new("something")
    Regexp.compile("something")
    %r{something}
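One way to check the equivalence is to compare the objects directly. The sketch below (variable names are illustrative) relies on Regexp#==, which is true when two patterns have the same source and options:

```ruby
forms = [
  /something/,
  Regexp.new("something"),
  Regexp.compile("something"),
  %r{something}
]

# Compare every form against the first one.
all_equal = forms.all? { |re| re == forms.first }

# Regexp#source returns the pattern text without delimiters or modifiers.
sources = forms.map(&:source).uniq

puts all_equal        # true
puts sources.inspect  # ["something"]
```

The %r{} form is mostly a convenience for patterns that contain forward slashes, since they do not need to be escaped inside it.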

The following modifiers are also of note.

Table 1-1.

Constant             Modifier   Effect
Regexp::IGNORECASE   i          Makes matches case-insensitive.
Regexp::MULTILINE    m          Normally, the . character does not match a line break. This modifier makes . match line breaks like any other character.
Regexp::EXTENDED     x          Lets you space out your regular expressions with whitespace and comments, making them more legible.

Here's how to use these modifiers to create regular expressions:

    /something/mxi
    Regexp.new('something',
               Regexp::EXTENDED | Regexp::IGNORECASE | Regexp::MULTILINE)
    %r{something}mxi
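The modifier constants are bit flags, so they can be inspected after the fact with Regexp#options. A small sketch, with illustrative variable names, that confirms the three forms carry the same flags (the flags are combined here with |; + gives the same result only as long as no flag is repeated):

```ruby
literal = /something/mxi
built   = Regexp.new('something',
                     Regexp::EXTENDED | Regexp::IGNORECASE | Regexp::MULTILINE)
percent = %r{something}mxi

# Regexp#options returns the combined flag bits as an Integer.
same_options = [literal, built, percent].map(&:options).uniq.size == 1

# An individual flag can be tested with a bitwise AND.
ignores_case       = (built.options & Regexp::IGNORECASE) != 0
plain_ignores_case = (/something/.options & Regexp::IGNORECASE) != 0

puts same_options        # true
puts ignores_case        # true
puts plain_ignores_case  # false
```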

Here's how the modifiers work:

    case_insensitive = /mangy/i
    case_insensitive =~ "I'm mangy!"                      # => 4
    case_insensitive =~ "Mangy Jones, at your service."   # => 0

    multiline = /a.b/m
    multiline =~ "banana\nbanana"                         # => 5
    /a.b/ =~ "banana\nbanana"                             # => nil
    # But note:
    /a\sb/ =~ "banana\nbanana"                            # => 5

    extended = %r{ \ was  # Match " was"
                   \s     # Match one whitespace character
                   a      # Match "a"
                 }xi
    extended =~ "What was Alfred doing here?"             # => 4
    extended =~ "My, that was a yummy mango."             # => 8
    extended =~ "It was\n\n\na fool's errand"             # => nil

 

See Also
