Regular Expressions
Overview
VBScript has fully supported regular expressions since version 5. Before that, they were a feature sorely lacking in VBScript, and one that made it inferior to other scripting languages, including JavaScript.
Introduction to Regular Expressions
Regular expressions provide powerful facilities for matching and replacing character patterns. Before regular expressions were added to the VBScript engine, performing a search-and-replace task throughout a string required a fair amount of code, consisting mainly of loops and calls to the InStr and Mid functions. Now it is possible to do all this with one line of code using a regular expression.
If you've programmed in another language (C++, Perl, awk, or JavaScript; even Microsoft's own JScript supported regular expressions before VBScript did), regular expressions won't be new to you. However, experienced programmers need to know one thing to make the most of regular expressions in VBScript: VBScript does not support regular expression literals (like /a pattern/). Instead, VBScript uses text strings assigned to the Pattern property of a RegExp object. In many ways this is superior to the traditional method because there is no new syntax to learn. But if you are used to regular expressions from other languages, especially client-side JavaScript, this might not be what you expect.
Note: Many Windows-based text editors have followed in the footsteps of the Unix text editor vi and now support regular expression searches. These include UltraEdit-32 (www.ultraedit.com) and SlickEdit (www.slickedit.com).
Regular Expressions in Action
The quickest and easiest way to become familiar with regular expressions is to look at a few examples. Here is probably one of the simplest examples of a regular expression in action: a simple find-and-replace.
Dim re, s
Set re = New RegExp
re.Pattern = "France"
s = "The rain in France falls mainly on the plains."
MsgBox re.Replace(s, "Spain")
Nothing spectacular; it's just a simple find and replace, but it is a powerful foundation to build on. Here's how the code works. First, we create a new regular expression object.
Set re = New RegExp
Then we set the key property on that object. This is the pattern that we want to match.
re.Pattern = "France"
And the following line is the string we will be searching.
s = "The rain in France falls mainly on the plains."
The last line is the powerhouse of the script and is the line that does the real work. It asks our regular expression object to find the first occurrence of "France" (the pattern) within the string held in variable s and to replace it with "Spain" . Once we've done that, we use a message box to show off our great find-and-replace skills.
MsgBox re.Replace(s, "Spain")
When the script is run, the final output should be as shown in Figure 9-1.
Figure 9-1
Now, it's all well and good hard-coding the string and search criteria straight from the start, but you can make it a lot more flexible by making the script accept the string and the find-and-replace criteria from an input.
Dim re, s, sc
Set re = New RegExp
s = InputBox("Type a string for the code to search")
re.Pattern = InputBox("Type in a pattern to find")
sc = InputBox("Type in a string to replace the pattern")
MsgBox re.Replace(s, sc)
This is pretty much the exact same code as we had before, but with three key differences. Instead of having everything hard-coded into the script, we introduce flexibility by using three input boxes in the code.
s = InputBox("Type a string for the code to search")
re.Pattern = InputBox("Type in a pattern to find")
sc = InputBox("Type in a string to replace the pattern")
The final change to the code is in the final line enabling the Replace method to make use of the sc variable.
MsgBox re.Replace(s, sc)
This lets you manually enter the string you want to be searched, as shown in Figure 9-2.
Figure 9-2
Then you can enter the pattern you want to find (see Figure 9-3).
Figure 9-3
Finally, enter a string to replace the pattern (see Figure 9-4).
Figure 9-4
This lets you try out something that you might already be wondering about: what happens if you try to find and replace a pattern that doesn't exist in the string? In fact, nothing happens, as shown here. Type in the string as shown in Figure 9-5.
Figure 9-5
Next, enter a pattern that doesn't appear anywhere in the string. In Figure 9-6 we use the string 'JScript'.
Figure 9-6
In the next prompt, enter a string to replace the nonexistent pattern. As no replacement will be carried out, it can be anything. In Figure 9-7 we use the string 'JavaScript'.
Figure 9-7
Notice what happens. Nothing. As you can see in Figure 9-8, the initial string is unchanged.
Figure 9-8
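Because Replace simply hands back the original string when nothing matches, the result alone does not tell a script whether a replacement took place. The following is a minimal sketch of one way to report the outcome, using the Test method described later in this chapter. The sample text is an assumption made for this sketch and is not the string used in the figures.

```vbscript
Dim re, s
Set re = New RegExp
s = "VBScript supports regular expressions."   ' sample text (assumed for this sketch)
re.Pattern = "JScript"                          ' a pattern that does not occur in s

If re.Test(s) Then
    MsgBox re.Replace(s, "JavaScript")
Else
    ' Replace would return s unchanged, so say so explicitly
    MsgBox "Pattern not found; string left as: " & s
End If
```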
Building on Simplicity
Obviously the examples that you've seen so far are quite simple, and to be honest, we could probably do everything we've done here just as easily using VBScript's string manipulation functions. But what if we wanted to replace all occurrences of a string? Or what if we wanted to replace all occurrences of a string, but only when they appear at the start of a word?
We need to make some tweaks to the code. Take a look at the following code.
Dim re, s
Set re = New RegExp
re.Pattern = "\bin"
re.Global = True
s = "The rain in Spain falls mainly on the plains."
MsgBox re.Replace(s, "in the country of")
This version has two key differences. First, it uses a special sequence (\b) to match a word boundary (we'll explore all the special sequences available in the Regular Expression Characters section). This is demonstrated in Figure 9-9.
Figure 9-9
What if we left the \b out, like this?
Dim re, s
Set re = New RegExp
re.Pattern = "in"
re.Global = True
s = "The rain in Spain falls mainly on the plains."
MsgBox re.Replace(s, "in the country of")
Without the \b, the "in" part of the words "rain", "Spain", "mainly", and "plains" would also be changed to "in the country of". As you can see in Figure 9-10, this gives some very funny, but undesirable, results.
Figure 9-10
Second, by setting the Global property we ensure that we match all the occurrences of "in" that we want.
Dim re, s
Set re = New RegExp
re.Pattern = "\bin"
re.Global = True
s = "The rain in Spain falls mainly on the plains."
MsgBox re.Replace(s, "in the country of")
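Where the \b is placed decides which occurrences are picked up. As a rough sketch (the replacement text "IN" is chosen here only to make the changes easy to spot), the same sentence can be run through three variations of the pattern:

```vbscript
Dim re, s
Set re = New RegExp
re.Global = True
s = "The rain in Spain falls mainly on the plains."

re.Pattern = "\bin"          ' "in" at the start of a word
MsgBox re.Replace(s, "IN")   ' The rain IN Spain falls mainly on the plains.

re.Pattern = "in\b"          ' "in" at the end of a word
MsgBox re.Replace(s, "IN")   ' The raIN IN SpaIN falls mainly on the plains.

re.Pattern = "\bin\b"        ' "in" as a whole word only
MsgBox re.Replace(s, "IN")   ' The rain IN Spain falls mainly on the plains.
```

For this sentence the first and third patterns happen to give the same result; they would differ on a word such as "inside".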
Regular expressions provide a very powerful language for expressing complicated patterns like these, so let's get on with learning about the objects that allow us to use them within VBScript.
The RegExp Object
The RegExp object is the object that provides simple regular expression support in VBScript. All the properties and methods relating to regular expressions in VBScript are related to this object.
Dim re
Set re = New RegExp
This object has three properties and three methods. The three properties are:
- Global property
- IgnoreCase property
- Pattern property
The three methods are:
- Execute method
- Replace method
- Test method
RegExp Properties
As mentioned before, the RegExp object has three properties that you can use. Let's take a look at each of them.
Global Property
The Global property is responsible for setting or returning a Boolean value that indicates whether or not a pattern is to match all occurrences in an entire search string or just the first occurrence.
object.Global [= value ]
object | Always a RegExp object |
value | There are two possible values: True and False |
If the value of the Global property is True, the search applies to the entire string; if it is False, the search stops after the first match. The default is False, not True as documented in some Microsoft sources.
Dim re, s
Set re = New RegExp
re.Pattern = "in"
re.Global = True
s = "The rain in Spain falls mainly on the plains."
MsgBox re.Replace(s, "in the country of")
IgnoreCase Property
The IgnoreCase property sets or returns a Boolean value that indicates whether or not the pattern search is case-sensitive.
object.IgnoreCase [= value ]
object | Always a RegExp object |
value | There are two possible values: True and False |
If the value of the IgnoreCase property is False, the search is case-sensitive; if it is True, it is not. The default is False, not True as documented in some Microsoft sources.
Dim re, s
Set re = New RegExp
re.Pattern = "in"
re.Global = True
re.IgnoreCase = True
s = "The rain In Spain falls mainly on the plains."
MsgBox re.Replace(s, "in the country of")
Pattern Property
The Pattern property sets or returns the regular expression pattern being searched.
object.Pattern [= "searchstring"]
object | Always a RegExp object |
searchstring | The regular expression string being searched for. May include any of the regular expression characters (optional) |
Dim re, s
Set re = New RegExp
re.Pattern = "in"
re.Global = True
s = "The rain in Spain falls mainly on the plains."
MsgBox re.Replace(s, "in the country of")
Regular Expression Characters
Tip: Capitalized special characters do the opposite of their lowercase counterparts.
Many of these codes are self-explanatory, but some examples would probably help with others. We've already seen a simple pattern:
re.Pattern = "in"
Often it's useful to match any one of a whole class of characters. We do this by enclosing the characters that we want to match in square brackets. For example, the following code replaces a single digit (here, any digit from 2 to 9) with a more generic term.
Dim re, s
Set re = New RegExp
re.Pattern = "[23456789]"
s = "Spain received 3 millimeters of rain last week."
MsgBox re.Replace(s, "many")
Figure 9-11 shows the output from this code.
Figure 9-11
In this case, the number "3" is replaced with the text "many" . As you might expect, we can shorten this class by using a range. This pattern does the same as the preceding one but saves some typing.
Dim re, s
Set re = New RegExp
re.Pattern = "[2-9]"
s = "Spain received 3 millimeters of rain last week."
MsgBox re.Replace(s, "many")
Replacing digits is a common task. In fact, the pattern [0-9] (covering all the digits) is used so often that there is a shortcut for it: \d is equivalent to [0-9].
Dim re, s
Set re = New RegExp
re.Pattern = "\d"
s = "a b c d e f 1 g 2 h ... 10 z"
MsgBox re.Replace(s, "a number")
The string with the replaced characters is shown in Figure 9-12.
Figure 9-12
But what if you wanted to match everything except a digit? Then we can use negation, which is indicated by a circumflex ( ^ ) used within the class square brackets.
Tip: Using ^ outside the square brackets has a totally different meaning and is discussed after the next example.
Thus, to match any character other than a digit we can use any of the following patterns.
re.Pattern = "[^0-9]"   ' the hard way
re.Pattern = "[^\d]"    ' a little shorter
re.Pattern = "[\D]"     ' another of those special characters
The last option here uses another of the dozen or so special characters. In most cases these characters just save you some extra typing (or act as a good memory shorthand) but a few, like matching tabs and other nonprintable characters, can be very useful.
There are three special characters that anchor a pattern. They don't match any characters themselves but force another pattern to appear at the beginning of the input (^ used outside of []), the end of the input ($), or at a word boundary (we've already seen \b).
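The effect of the ^ and $ anchors is easiest to see with the Test method, which is covered later in this chapter. The sketch below uses sample inputs made up for illustration; it checks the same text with and without anchors.

```vbscript
Dim re
Set re = New RegExp

re.Pattern = "\d+"            ' unanchored: digits anywhere in the input
MsgBox re.Test("room 101")    ' True

re.Pattern = "^\d+$"          ' anchored: the whole input must be digits
MsgBox re.Test("room 101")    ' False
MsgBox re.Test("101")         ' True
```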
Another way to shorten our patterns is to use repeat counts. The basic idea is to place the repeat count after the character or class. For example, the following pattern, as shown in Figure 9-13, matches all three digits of "100" and replaces them.
Dim re, s
Set re = New RegExp
re.Pattern = "\d{3}"
s = "Spain received 100 millimeters of rain in the last 2 weeks."
MsgBox re.Replace(s, "a whopping number of")
Figure 9-13
Without the repeat count, the code would replace only the first digit and leave the '00' in the final string, as Figure 9-14 shows.
Figure 9-14
Dim re, s
Set re = New RegExp
re.Pattern = "\d"
s = "Spain received 100 millimeters of rain in the last 2 weeks."
MsgBox re.Replace(s, "a whopping number of")
Note also that we can't just set re.Global = True because we'd end up with four instances of the phrase "a whopping number of" in the result. The result is shown in Figure 9-15.
Figure 9-15
Dim re, s
Set re = New RegExp
re.Global = True
re.Pattern = "\d"
s = "Spain received 100 millimeters of rain in the last 2 weeks."
MsgBox re.Replace(s, "a whopping number of")
As the previous table shows, we can also specify a minimum number of matches {min,} or a range {min,max}. Again, a few repeat patterns are used so often that they have special shortcuts.
re.Pattern = "\d+"   ' one or more digits, \d{1,}
re.Pattern = "\d*"   ' zero or more digits, \d{0,}
re.Pattern = "\d?"   ' optional: zero or one, \d{0,1}

Dim re, s
Set re = New RegExp
re.Global = True
re.Pattern = "\d+"
s = "Spain received 100 millimeters of rain in the last 2 weeks."
MsgBox re.Replace(s, "a number")
The output of the last code is shown in Figure 9-16.
Figure 9-16
Dim re, s
Set re = New RegExp
re.Global = True
re.Pattern = "\d*"
s = "Spain received 100 millimeters of rain in the last 2 weeks."
MsgBox re.Replace(s, "a number")
The output of the preceding code is shown in Figure 9-17.
Figure 9-17
Dim re, s
Set re = New RegExp
re.Global = True
re.Pattern = "\d?"
s = "Spain received 100 millimeters of rain in the last 2 weeks."
MsgBox re.Replace(s, "a number")
The output of the preceding code is shown in Figure 9-18.
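The explicit {min,max} and {min,} forms can be sketched in the same way as the shortcuts. The sample string and the "#" replacement below are assumptions chosen for illustration; the \b anchors keep the pattern from matching part of a longer number.

```vbscript
Dim re, s
Set re = New RegExp
re.Global = True
s = "Codes: 7, 42, 365, 2024"

re.Pattern = "\b\d{2,3}\b"    ' whole numbers of two or three digits
MsgBox re.Replace(s, "#")     ' Codes: 7, #, #, 2024

re.Pattern = "\b\d{3,}\b"     ' whole numbers of three or more digits
MsgBox re.Replace(s, "#")     ' Codes: 7, 42, #, #
```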
The last special characters we should discuss are remembered matches. These are useful when we want to use some or all of the text that matched our pattern as part of the replacement text. See the Replace method for an example of using remembered matches.
Figure 9-18
To illustrate this, and to bring all this discussion of special characters together, let's do something more useful. We want to search an arbitrary text string and locate any URLs within it. To keep this example simple and reasonable in size, we will only search for the 'http:' protocol, but we will handle some of the vagaries of DNS names, including an unlimited number of domain levels. Don't worry if you don't 'speak DNS'; what you know from typing URLs into your browser will suffice.
Our code uses yet another of the RegExp object's methods that we'll meet in more detail in the next section. For now, we need only know that Execute simply performs the pattern match and returns each match via a collection. Here's the code.
Dim re, s, colMatches, match
Set re = New RegExp
re.Global = True
re.Pattern = "http://(\w+[\w-]*\w+\.)*\w+"
s = "http://www.kingsley-hughes.com is a valid web address. And so is "
s = s & vbCrLf & "http://www.wrox.com. And "
s = s & vbCrLf & "http://www.pc.ibm.com - even with 4 levels."
Set colMatches = re.Execute(s)
For Each match In colMatches
    MsgBox "Found valid URL: " & match.Value
Next
As we'd expect, the real work is done in the line that sets the actual pattern. It looks a bit daunting at first, but it's actually quite easy to follow. Let's break it down.
Our pattern begins with the fixed string http:// . We then use parentheses to group the real workhorse of this pattern. The group, (\w[\w-]*\w\.), matches one level of a DNS name, including a trailing dot. (The script above writes \w+ where this breakdown writes \w; the two forms match the same text.)
re.Pattern = "http://(\w[\w-]*\w\.)*\w+"
The group begins with one of the special characters we looked at earlier, \w, which matches [a-zA-Z0-9_], or in English, the alphanumeric characters plus the underscore. Next we use the class brackets, [\w-], to match either a word character or a dash, because DNS names can include dashes. Why didn't we use this expanded class for the first character as well? Simple: valid DNS names cannot begin or end with a dash. We allow zero or more characters from this expanded class by using the * repeat count.
re.Pattern = "http://(\w[\w-]*\w\.)*\w+"
After that, we again strictly want a word character (\w) so our domain name doesn't end in a dash. The last pattern in the parentheses, \. , matches the dot (.) used to separate DNS levels.
Note: We can't use the dot alone because it is a special character that normally matches any single character except a newline. Thus, we 'escape' this character by preceding it with a backslash (\).
After wrapping all that in parentheses, just to keep our grouping straight, we again use the * repeat count. Each repetition of the group matches one level of a fully qualified DNS name followed by a dot, so the group followed by * matches any number of such levels.
re.Pattern = "http://(\w[\w-]*\w\.)*\w+"
We end the pattern with \w+, requiring one or more alphanumeric characters for the top-level domain name (for example, com, org, edu, and so on).
re.Pattern = "http://(\w[\w-]*\w\.)*\w+"
RegExp Methods
We've covered the properties of the RegExp object, so it's time to take a look at the methods. There are three methods associated with the RegExp object that we can look at.
Execute Method
This method is used to execute a regular expression search against a specified string and returns a Matches collection. This is the trigger in the code to run the pattern matching on the string.
object.Execute(string)
object | Always a RegExp object |
string | The text string to be searched (required) |
Dim re, s, colMatches, match
Set re = New RegExp
re.Global = True
re.Pattern = "http://(\w+[\w-]*\w+\.)*\w+"
s = "http://www.kingsley-hughes.com is a valid web address. And so is "
s = s & vbCrLf & "http://www.wrox.com. And "
s = s & vbCrLf & "http://www.pc.ibm.com - even with 4 levels."
Set colMatches = re.Execute(s)
For Each match In colMatches
    MsgBox "Found valid URL: " & match.Value
Next
Note: Some of Microsoft's own documentation has been known to contain errors, most of which have hopefully been corrected by now. Remember that the result of Execute is always a collection, possibly even an empty collection. You can use a test like If re.Execute(s).Count = 0, or better yet use the Test method, which is designed for this purpose.
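The two checks mentioned in the note can be sketched side by side. The pattern and sample string here are assumptions chosen so that nothing matches.

```vbscript
Dim re, s, colMatches
Set re = New RegExp
re.Global = True
re.Pattern = "\d+"
s = "No digits in this sentence."

Set colMatches = re.Execute(s)     ' a collection, even when nothing matches
If colMatches.Count = 0 Then
    MsgBox "Execute returned an empty collection."
End If

If Not re.Test(s) Then             ' the same check without building a collection
    MsgBox "Test returned False."
End If
```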
Replace Method
This method is used to replace text found in a regular expression search.
object.Replace(string1, string2)
object | Always a RegExp object. |
string1 | This is the text string in which the text replacement is to occur (required). |
string2 | This is the replacement text string (required). |
MsgBox re.Replace(s, "** TOP SECRET! **")
The output of the last code is shown in Figure 9-19.
Figure 9-19
The Replace method can also replace subexpressions in the pattern. To accomplish this, we use the special characters $1, $2, and so on in the replacement text. These 'parameters' refer to remembered matches.
Backreferencing
A remembered match is simply part of a pattern. This is known as backreferencing. We designate which parts we want stored in a temporary buffer by enclosing them in parentheses. Each captured submatch is stored in the order in which it is encountered (from left to right in the regular expression pattern). The buffer numbers where the submatches are stored begin at 1 and continue up to a maximum of 99 subexpressions. They are then referred to sequentially as $1, $2, and so on.
You can override the saving of that part of the regular expression using the noncapturing metacharacters '?:', '?=', or '?!'.
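As a small sketch of the difference a noncapturing group makes, assuming a script engine recent enough to support the ?: syntax (version 5.5 or later) and a sample string invented for illustration: in the first pattern the title occupies $1 and the name $2, while in the second the title group is not remembered, so the name becomes $1.

```vbscript
Dim re, s
Set re = New RegExp
s = "Dear Mr. Smith,"

re.Pattern = "(Mr|Ms)\. (\w+)"      ' both groups are remembered
MsgBox re.Replace(s, "$2")          ' Dear Smith,

re.Pattern = "(?:Mr|Ms)\. (\w+)"    ' the title group is not remembered
MsgBox re.Replace(s, "$1")          ' Dear Smith,
```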
In the following example we remember the first five words (each consisting of one or more non-white-space characters) and then display only four of them in the replacement text.
Dim re, s
Set re = New RegExp
re.Pattern = "(\S+)\s+(\S+)\s+(\S+)\s+(\S+)\s+(\S+)"
s = "VBScript is not very cool."
MsgBox re.Replace(s, "$1 $2 $4 $5")
The output of the preceding code is shown in Figure 9-20.
Figure 9-20
Notice how in the last code we have added a (\S+)\s+ pair for each word in the string. This gives the code greater control over how the string is handled: because the pattern consumes the whole sentence, the tail of the string is not added to the end of the string displayed. Take great care when using backreferencing to make sure that the output you get is what you expect!
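Remembered matches can also be used to reorder text rather than drop part of it. A minimal sketch, using a made-up name in "Last, First" form:

```vbscript
Dim re
Set re = New RegExp
' Group 1 is the last name, group 2 the first name
re.Pattern = "^(\w+),\s*(\w+)$"
MsgBox re.Replace("Doe, Jane", "$2 $1")   ' Jane Doe
```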
Test Method
The Test method executes a regular expression search against a specified string and returns a Boolean value that indicates whether or not a pattern match was found.
object.Test(string)
object | Always a RegExp object |
string | The text string upon which the regular expression is executed (required) |
The Test method returns True if a pattern match is found and False if no match is found. This is the preferred way to determine if a string contains a pattern. Note that we often must make patterns case-insensitive, as in the following example.
Dim re, s
Set re = New RegExp
re.IgnoreCase = True
re.Pattern = "http://(\w+[\w-]*\w+\.)*\w+"
s = "Some long string with http://www.wrox.com buried in it."
If re.Test(s) Then
    MsgBox "Found a URL."
Else
    MsgBox "No URL found."
End If
The output of the preceding code is shown in Figure 9-21.
Figure 9-21
The Matches Collection
The Matches collection is a collection of regular expression Match objects.
The only way to create this collection is by using the Execute method of the RegExp object. It is important to remember that the Matches collection is read-only, as are the individual Match objects.
When a regular expression is executed, zero or more Match objects result. Each Match object provides access to three things:
- The string found by the regular expression
- The length of the string
- An index to where the match was found
Remember to set the Global property to True or your Matches collection can never contain more than one member. This is an easy way to create a very simple but hard to trace bug!
Dim re, s, objMatch, colMatches, sMsg
Set re = New RegExp
re.Global = True
re.Pattern = "http://(\w+[\w-]*\w+\.)*\w+"
s = "http://www.kingsley-hughes.com is a valid web address. And so is "
s = s & vbCrLf & "http://www.wrox.com. As is "
s = s & vbCrLf & "http://www.wiley.com."
Set colMatches = re.Execute(s)
sMsg = ""
For Each objMatch In colMatches
    sMsg = sMsg & "Match of " & objMatch.Value
    sMsg = sMsg & ", found at position " & objMatch.FirstIndex & " of the string. "
    sMsg = sMsg & "The length matched is "
    sMsg = sMsg & objMatch.Length & "." & vbCrLf
Next
MsgBox sMsg
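The Global pitfall is easy to demonstrate by counting matches both ways. This is only a sketch with a sample string invented for the purpose:

```vbscript
Dim re, s
Set re = New RegExp
re.Pattern = "\d+"
s = "10 green bottles, 9 green bottles, 8 green bottles"

re.Global = False
MsgBox re.Execute(s).Count    ' 1 (only the first match is collected)

re.Global = True
MsgBox re.Execute(s).Count    ' 3
```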
Matches Properties
Matches is a simple collection and supports just two properties:
- Count
- Item
Count returns the number of items in the collection.
Dim re, s, colMatches
Set re = New RegExp
re.Global = True
re.Pattern = "http://(\w+[\w-]*\w+\.)*\w+"
s = "http://www.kingsley-hughes.com is a valid web address. And so is "
s = s & vbCrLf & "http://www.wrox.com. As is "
s = s & vbCrLf & "http://www.wiley.com."
Set colMatches = re.Execute(s)
MsgBox colMatches.Count
The output of the preceding code is shown in Figure 9-22.
Figure 9-22
Item returns the item at the specified index, counting from 0.
Dim re, s, colMatches
Set re = New RegExp
re.Global = True
re.Pattern = "http://(\w+[\w-]*\w+\.)*\w+"
s = "http://www.kingsley-hughes.com is a valid web address. And so is "
s = s & vbCrLf & "http://www.wrox.com. As is "
s = s & vbCrLf & "http://www.wiley.com."
Set colMatches = re.Execute(s)
MsgBox colMatches.Item(0)
MsgBox colMatches.Item(1)
MsgBox colMatches.Item(2)
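When the number of matches is not known in advance, Count and Item can be combined in a loop instead of hard-coding the indexes. A minimal sketch, with a sample pattern and string assumed for illustration:

```vbscript
Dim re, s, colMatches, i, sMsg
Set re = New RegExp
re.Global = True
re.Pattern = "\d+"
s = "Rooms 12, 34 and 56 are free."
Set colMatches = re.Execute(s)

sMsg = ""
For i = 0 To colMatches.Count - 1      ' items are numbered from 0
    sMsg = sMsg & "Item " & i & ": " & colMatches.Item(i).Value & vbCrLf
Next
MsgBox sMsg
```

If the collection is empty, the loop body simply never runs, so no separate check on Count is needed here.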
The Match Object
Match objects are the members in a Matches collection. The only way to create a Match object is by using the Execute method of the RegExp object. When a regular expression is executed, zero or more Match objects can result.
Each Match object provides the following:
- Access to the string found by the regular expression
- The length of the string found
- An index to where in the string the match was found
Match Properties
The Match object has three properties. All three properties are read-only:
- FirstIndex
- Length
- Value
FirstIndex Property
The FirstIndex property returns the position in a search string where a match occurs.
object.FirstIndex
object | Always a Match object |
Dim re, s, objMatch, colMatches, sMsg
Set re = New RegExp
re.Global = True
re.Pattern = "http://(\w+[\w-]*\w+\.)*\w+"
s = "http://www.kingsley-hughes.com is a valid web address. And so is "
s = s & vbCrLf & "http://www.wrox.com. As is "
s = s & vbCrLf & "http://www.wiley.com."
Set colMatches = re.Execute(s)
sMsg = ""
For Each objMatch In colMatches
    sMsg = sMsg & "Match of " & objMatch.Value
    sMsg = sMsg & ", found at position " & objMatch.FirstIndex & " of the string. "
    sMsg = sMsg & "The length matched is "
    sMsg = sMsg & objMatch.Length & "." & vbCrLf
Next
MsgBox sMsg
Length Property
The Length property returns the length of a match found in a search string.
object.Length
object | Always a Match object |
Dim re, s, objMatch, colMatches, sMsg
Set re = New RegExp
re.Global = True
re.Pattern = "http://(\w+[\w-]*\w+\.)*\w+"
s = "http://www.kingsley-hughes.com is a valid web address. And so is "
s = s & vbCrLf & "http://www.wrox.com. As is "
s = s & vbCrLf & "http://www.wiley.com."
Set colMatches = re.Execute(s)
sMsg = ""
For Each objMatch In colMatches
    sMsg = sMsg & "Match of " & objMatch.Value
    sMsg = sMsg & ", found at position " & objMatch.FirstIndex & " of the string. "
    sMsg = sMsg & "The length matched is "
    sMsg = sMsg & objMatch.Length & "." & vbCrLf
Next
MsgBox sMsg
Value Property
The Value property returns the value or text of a match found in a search string.
object.Value
object | Always a Match object. |
Dim re, s, objMatch, colMatches, sMsg
Set re = New RegExp
re.Global = True
re.Pattern = "http://(\w+[\w-]*\w+\.)*\w+"
s = "http://www.kingsley-hughes.com is a valid web address. And so is "
s = s & vbCrLf & "http://www.wrox.com. As is "
s = s & vbCrLf & "http://www.wiley.com."
Set colMatches = re.Execute(s)
sMsg = ""
For Each objMatch In colMatches
    sMsg = sMsg & "Match of " & objMatch.Value
    sMsg = sMsg & ", found at position " & objMatch.FirstIndex & " of the string. "
    sMsg = sMsg & "The length matched is "
    sMsg = sMsg & objMatch.Length & "." & vbCrLf
Next
MsgBox sMsg
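The three properties fit together: Value is the piece of the search string that starts at FirstIndex and runs for Length characters. The sketch below assumes the zero-based FirstIndex of current versions of the script engine, which is why 1 is added before calling Mid (Mid counts from 1).

```vbscript
Dim re, s, objMatch
Set re = New RegExp
re.Pattern = "Spain"
s = "The rain in Spain falls mainly on the plains."

For Each objMatch In re.Execute(s)
    ' FirstIndex counts from 0, Mid counts from 1, hence the + 1
    MsgBox Mid(s, objMatch.FirstIndex + 1, objMatch.Length)   ' Spain
    MsgBox objMatch.FirstIndex                                ' 12
Next
```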
A Few Examples
We've covered a lot of theory in the past few pages. Theory is great but you might like to see regular expressions in action. Let's complete this chapter with a few examples of how you can make use of regular expressions to solve real life problems.
Validating Phone Number Input
Validating input prevents bogus or dubious information from being entered by a user. One piece of information that many developers need to check is that a telephone number has been entered correctly. While we cannot write a script to check whether a number is actually a valid phone number, we can use script and regular expressions to enforce a format on the input, which helps to eliminate false entries.
Dim re, s, objMatch, colMatches
Set re = New RegExp
re.Pattern = "\([0-9]{3}\)[0-9]{3}-[0-9]{4}"
re.Global = True
re.IgnoreCase = True
s = InputBox("Enter your phone number in the following format (XXX)XXX-XXXX:")
If re.Test(s) Then
    MsgBox "Thank you!"
Else
    MsgBox "Sorry but that number is not in a valid format."
End If
The code is simple, but again it is the pattern that does all the hard work. Depending on the input, you can get one of two possible output messages, shown in Figures 9-23 and 9-24.
Figure 9-23
Figure 9-24
If you want to make this script applicable in countries with other formats you will have to do a little work on it, but customizing it wouldn't be difficult.
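One possible customization is to anchor the pattern so that the whole input, not just part of it, must fit the format. The sketch below wraps the check in a function; the name IsPhoneNumber and the sample numbers are hypothetical and used only for illustration.

```vbscript
' IsPhoneNumber is a hypothetical helper name used for this sketch
Function IsPhoneNumber(sInput)
    Dim re
    Set re = New RegExp
    ' ^ and $ force the whole input to fit the (XXX)XXX-XXXX format
    re.Pattern = "^\(\d{3}\)\d{3}-\d{4}$"
    IsPhoneNumber = re.Test(sInput)
End Function

MsgBox IsPhoneNumber("(555)123-4567")            ' True
MsgBox IsPhoneNumber("call (555)123-4567 now")   ' False
```

Without the anchors, the second input would also pass, because the unanchored pattern only needs to find the format somewhere inside the string.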
Breaking Down URIs
Here is an example that can be used to break down a Uniform Resource Identifier (URI) into its component parts. Take the following URI:
http://www.wrox.com:80/misc-pages/support.shtml
We can write a script that will break it down into the protocol (ftp, http, and so on), the domain address, the port, and the page/path. To do this we can use the following pattern.
"(\w+):\/\/([^/:]+)(:\d*)?([^# ]*)"
The following code will carry out the task.
Dim re, s
Set re = New RegExp
re.Pattern = "(\w+):\/\/([^/:]+)(:\d*)?([^# ]*)"
re.Global = True
re.IgnoreCase = True
s = "http://www.wrox.com:80/misc-pages/support.shtml"
MsgBox re.Replace(s, "$1")   ' protocol
MsgBox re.Replace(s, "$2")   ' domain address
MsgBox re.Replace(s, "$3")   ' port, including the leading colon
MsgBox re.Replace(s, "$4")   ' page/path
Testing for HTML Elements
Testing for HTML elements is easy; all you need is the right pattern. Here is one that works for elements with both an opening and closing tag.
"<(.*)>.*<\/\1>"
How you script this depends on what you want to do. Here is a simple script just for demonstration purposes.
Dim re, s
Set re = New RegExp
re.IgnoreCase = True
re.Pattern = "<(.*)>.*<\/\1>"
s = "<p>This is a paragraph</p>"
If re.Test(s) Then
    MsgBox "HTML element found."
Else
    MsgBox "No HTML element found."
End If
Matching White Space
Sometimes it can be really handy to be able to match white space, that is, lines that are either completely empty or that contain only white space (spaces and tab characters). Here is the pattern you would need for that.
"^[ \t]*$"
That breaks down to the following:
^ - Matches the start of the line.
[ \t]* - Matches zero or more space or tab (\t) characters.
$ - Matches the end of the line.
Dim re, s, colMatches, objMatch, sMsg
Set re = New RegExp
re.Global = True
re.Pattern = "^[ \t]*$"
s = " "
Set colMatches = re.Execute(s)
sMsg = ""
For Each objMatch In colMatches
    sMsg = sMsg & "Blank line found at position " & objMatch.FirstIndex & " of the string."
Next
MsgBox sMsg
Matching HTML Comment Tags
When you come to the section on Windows Script Host, we'll show you how you can use VBScript and Windows Script Host to work with the file system. Once you can do this, reading and modifying files comes within your reach. One good application of regular expressions might be to look for comment tags within an HTML file. You could then choose to remove these before making the files available on the Web.
Here is a script that can detect HTML comment tags.
Dim re, s
Set re = New RegExp
re.Global = True
re.Pattern = "^.*<!--.*-->.*$"
' Illustrative input containing one HTML comment
s = "<title>A Title</title><!-- A comment -->"
If re.Test(s) Then
    MsgBox "HTML comment tags found."
Else
    MsgBox "No HTML comment tags found."
End If
With a simple modification to the pattern and the use of Replace method, we can get the script to remove the comment tag altogether.
Dim re, s
Set re = New RegExp
re.Global = True
re.Pattern = "(^.*)(<!--.*-->)(.*$)"
' Illustrative input containing one HTML comment
s = "<title>A Title</title><!-- A comment -->"
If re.Test(s) Then
    MsgBox "HTML comment tags found."
Else
    MsgBox "No HTML comment tags found."
End If
MsgBox re.Replace(s, "$1" & "$3")
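When a line holds more than one comment, a different approach may be easier: match each comment on its own and replace it with an empty string. The sketch below is one way to do that, not the method used above. It assumes a script engine that supports the non-greedy *? quantifier (version 5.5 or later), and the sample markup is invented for illustration.

```vbscript
Dim re, s
Set re = New RegExp
re.Global = True
re.Pattern = "<!--.*?-->"     ' *? stops at the first closing -->
s = "<p>One</p><!-- first --><p>Two</p><!-- second -->"
MsgBox re.Replace(s, "")      ' <p>One</p><p>Two</p>
```

Because the dot does not match a newline, this sketch only handles comments that begin and end on the same line.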
Summary
In this chapter we've covered, in depth, regular expressions and how they fit into the world of VBScript. You've seen how regular expressions can be used to carry out effective, flexible pattern matching within text strings. You've also seen examples of what can be done by integrating regular expressions with script, including customizable find and replace within text strings and input validation.
Learning to use regular expressions can seem a bit daunting and even those comfortable with programming sometimes find regular expressions forbidding and choose instead less flexible solutions. However, the power and flexibility that regular expressions give to the programmer is immense and your efforts will be quickly rewarded!