Grep

Overview

Grep is a search utility that originated on Unix systems but is now implemented in many environments, including Qshell. The grep utility searches files for lines that contain character strings matching a pattern. Although grep is typically used with text files, it may also be used with binary files.

Anyone who deals with the Integrated File System (IFS) should learn to use grep, because the Find String Using PDM command (FNDSTRPDM) won't work with IFS files. Even if you don't deal with the IFS, you still might want to use grep, because it works with source physical files and has more powerful search capabilities than FNDSTRPDM.

Different sources give different versions of the origin of the term grep, but it's likely that it comes from a search command in the ed and ex text editors. The command is g/re/p, where g indicates that the search is global (that is, over the entire file), re indicates that a regular expression describes the search string, and p indicates that the results are to be printed (i.e., displayed on the screen).

Here is the syntax of grep:

grep [options] regular-expression [input-files]

Regular Expressions

The regular-expression parameter is the string for which you are searching. The simplest form of a regular expression is an exact sequence of characters for which the system is to search. In the following example, grep searches all files whose names end with .java (i.e., all Java source files) in the current directory for the string print:

grep print *.java

By default, the search is case-sensitive, so this grep command will not find Print, PRINT, pRINT, or any other combination of cases.
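The difference between a case-sensitive and a case-insensitive search can be sketched as follows. This is a minimal illustration that assumes a POSIX-style shell and grep; the scratch file name and its contents are made up for the example, and the -i option it uses is described in the options table later in this chapter.

```shell
# Hypothetical scratch file; the name and the three lines are illustrative only.
demo="${TMPDIR:-/tmp}/grep_case_demo.$$"
printf '%s\n' 'System.out.print(x);' 'Print the report' 'PRINT QUEUE' > "$demo"

# Default search: only the line with lowercase "print" is counted.
sensitive=$(grep -c print "$demo")

# With -i, all three spellings are counted.
insensitive=$(grep -c -i print "$demo")

echo "case-sensitive matches:   $sensitive"
echo "case-insensitive matches: $insensitive"

rm -f "$demo"
```

The -c option is used here only to keep the output short; without it, grep would print the matching lines themselves.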

Grep would be useful even if this were the only kind of search it could perform, but grep can do much more, because it knows how to interpret metacharacters (sophisticated versions of wildcards). Table 17.1 describes these special symbols and their meanings.

Table 17.1: Metacharacters for Use with Grep

Metacharacter   Description
. (period)      Match any character except end-of-line.
*               Match zero or more occurrences of the preceding pattern.
^               Match from the beginning of the line.
$               Match from the end of the line.
[ ]             Match any character within the brackets. Ranges may be specified with a hyphen.
[^ ]            Match any character not listed in the brackets. The caret must be the first character within the brackets.
\{m\}           Match exactly m occurrences of the preceding pattern.
\{m,\}          Match m or more occurrences of the preceding pattern.
\{m,n\}         Match m to n occurrences of the preceding pattern.
\ (backslash)   Turn off the special meaning of the following character.
\( \)           Define a back reference to save matched characters as a pattern. The matched pattern can be referred to with a backslash followed by a number later in the expression.

You may also use certain symbolic names in place of characters. These are shown in Table 17.2.

Table 17.2: Symbolic Names

Symbol         Description
[[:alpha:]]    Any letter in either case
[[:upper:]]    Any uppercase letter
[[:lower:]]    Any lowercase letter
[[:digit:]]    Any decimal digit
[[:xdigit:]]   Any hexadecimal digit, where A-F may be upper or lowercase
[[:alnum:]]    Any letter or decimal digit
[[:space:]]    Any white-space character: space, tab, newline, carriage return, vertical tab, or formfeed
[[:blank:]]    Any space or tab character
[[:punct:]]    Any punctuation mark
[[:cntrl:]]    Any control character
[[:print:]]    Any printable character, including the space
[[:graph:]]    Any visible character (a letter, digit, or punctuation mark, but not a space)
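A short sketch of the symbolic names in use follows. It assumes a POSIX-style shell and grep; the scratch file and its four lines are hypothetical and exist only to give each pattern something to match.

```shell
# Hypothetical sample data written to a scratch file.
demo="${TMPDIR:-/tmp}/grep_class_demo.$$"
printf '%s\n' 'alpha' 'BR-549' 'room 101' '...' > "$demo"

# Count the lines that contain at least one decimal digit.
digits=$(grep -c '[[:digit:]]' "$demo")

# Lines in which an uppercase letter is followed by a punctuation mark.
upper_punct=$(grep '[[:upper:]][[:punct:]]' "$demo")

# Lines made up entirely of one or more punctuation marks.
only_punct=$(grep '^[[:punct:]][[:punct:]]*$' "$demo")

echo "lines with a digit: $digits"
echo "uppercase then punctuation: $upper_punct"
echo "punctuation only: $only_punct"

rm -f "$demo"
```

Note that each symbolic name sits inside its own pair of brackets, so it can be combined with other characters in the same bracket expression, as in [[:digit:]-].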

Grep Examples

To illustrate how regular expressions work, several grep examples follow, along with explanations of what each one accomplishes. The data file being searched is goodoleboys.txt, shown in Figure 17.1.

cat goodoleboys.txt
Name      Born     Phone    Dog      Wife      Shotgun Paid
========= ======== ======== ======== ========= ======= =====
Chuck     Dec 25   444-2345 Blue     Mary Sue  12      $2.50
Bubba     Oct 13   444-1111 Buck     Mary Jean 12
Billy Bob June 11  444-4340 Leotis   Lisa Sue  12
Amos      Jan 4    333-1119 Amos     Abigail   20
Otis      Sept 17  444-8000 Ol' Sal  Sally     12      $5
Claude    May 31   333-4340 Blue     Etheline  12
Roscoe    Feb 2    444-2234 Rover    Alice Jean 410
Arlis     June 19  444-1314 Redeye   Suzy Beth 12      $10.75
Junior    April 30 BR-549   Percival Lilly Faye 12
Bill      Feb 29   333-4444 Daisy    Daisy     20
Ernest T. ??       none     none     none      none

Figure 17.1: The goodoleboys.txt file is used for the search examples that follow.

The first example simply finds all lines that begin with uppercase C:

grep ^C goodoleboys.txt
Chuck     Dec 25   444-2345 Blue     Mary Sue  12      $2.50
Claude    May 31   333-4340 Blue     Etheline  12

Figure 17.2 is a slightly more complex example. It finds all lines that end with a zero. The first form of grep shown in the figure is used with files that are delimited with a single linefeed character, as is typical of Unix files. The second form is for files that are delimited with a combination of carriage-return and linefeed characters. The [[:cntrl:]] expression allows for the carriage return.

grep '0$' goodoleboys.txt
grep '0[[:cntrl:]]$' goodoleboys.txt
Chuck     Dec 25   444-2345 Blue     Mary Sue  12      $2.50
Amos      Jan 4    333-1119 Amos     Abigail   20
Roscoe    Feb 2    444-2234 Rover    Alice Jean 410
Bill      Feb 29   333-4444 Daisy    Daisy     20

Figure 17.2: Find lines that end with a zero.

The following grep command looks for lines that contain a dollar sign followed by any two characters and a period:

grep '\$..\.' goodoleboys.txt
Arlis     June 19  444-1314 Redeye   Suzy Beth 12      $10.75

The first two periods function as metacharacters. The dollar sign and last period do not because they are preceded with backslashes.
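The effect of the backslash can be seen by running the same search with and without it. The following is a minimal sketch, assuming a POSIX-style shell and grep; the scratch file and its three lines are hypothetical.

```shell
# Hypothetical scratch file with three short lines.
demo="${TMPDIR:-/tmp}/grep_escape_demo.$$"
printf '%s\n' '$10.75' '10x75' '1075' > "$demo"

# Unescaped period: matches any character, so "10.75" and "10x75" both count.
any_char=$(grep -c '10.75' "$demo")

# Escaped period: only a literal period matches.
literal_dot=$(grep -c '10\.75' "$demo")

# Escaped dollar sign: matches the literal $ character.
literal_dollar=$(grep -c '\$10' "$demo")

echo "10.75 as a pattern:  $any_char line(s)"
echo "10\\.75 as a pattern: $literal_dot line(s)"
echo "\\\$10 as a pattern:   $literal_dollar line(s)"

rm -f "$demo"
```

The patterns are enclosed in single quotes so that the shell passes the backslashes and the dollar sign to grep unchanged.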

These two grep commands find lines that contain a single quote:

grep "'" goodoleboys.txt
grep \' goodoleboys.txt
Otis      Sept 17  444-8000 Ol' Sal  Sally     12      $5

The first line shows that double quotes can "escape" single quotes. The second line escapes the single quote with a backslash instead.

This command finds lines with three zeros together:

grep '0\{3\}' goodoleboys.txt
Otis      Sept 17  444-8000 Ol' Sal  Sally     12      $5

Figure 17.3 expands on the previous examples to find lines where the same uppercase letter followed by a lowercase letter is repeated. The \( and \) pair indicates that the match is to be saved as a pattern, which can be referred to as \1. If other patterns were saved, they would be referred to as \2, \3, etc. These expressions are known as back references.

grep '\([A-Z][a-z]\).*\1' goodoleboys.txt
Bubba     Oct 13   444-1111 Buck     Mary Jean 12
Amos      Jan 4    333-1119 Amos     Abigail   20
Otis      Sept 17  444-8000 Ol' Sal  Sally     12      $5
Roscoe    Feb 2    444-2234 Rover    Alice Jean 410
Bill      Feb 29   333-4444 Daisy    Daisy     20

Figure 17.3: Find lines where a particular pair of uppercase and lowercase letters are repeated.

In the first line returned, the pattern Bu is found twice. In the second line, Am is repeated; in the third, Sa; in the fourth, Ro; and in the fifth, Da.
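A back reference can be tried on a smaller scale as well. The sketch below assumes a POSIX-style shell and grep; the scratch file and its four lines are invented for the illustration.

```shell
# Hypothetical scratch file; each line holds two capitalized words.
demo="${TMPDIR:-/tmp}/grep_backref_demo.$$"
printf '%s\n' 'Bubba Buck' 'Billy Bob' 'Daisy Daisy' 'Chuck Blue' > "$demo"

# \( \) saves an uppercase letter followed by a lowercase letter;
# \1 requires that same two-letter pair to appear again later in the line.
repeated=$(grep '\([A-Z][a-z]\).*\1' "$demo")
repeated_count=$(grep -c '\([A-Z][a-z]\).*\1' "$demo")

echo "$repeated"
echo "lines with a repeated pair: $repeated_count"

rm -f "$demo"
```

Billy Bob is not selected because Bi and Bo are different pairs; the back reference demands the identical text, not just another match of the same sub-pattern.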

Figure 17.4 illustrates the use of grep metacharacters that have to do with including and excluding characters in a search.

grep '^[CR]' goodoleboys.txt
Chuck     Dec 25   444-2345 Blue     Mary Sue  12      $2.50
Claude    May 31   333-4340 Blue     Etheline  12
Roscoe    Feb 2    444-2234 Rover    Alice Jean 410
grep '^[C-R]' goodoleboys.txt
Name      Born     Phone    Dog      Wife      Shotgun Paid
Chuck     Dec 25   444-2345 Blue     Mary Sue  12      $2.50
Otis      Sept 17  444-8000 Ol' Sal  Sally     12      $5
Claude    May 31   333-4340 Blue     Etheline  12
Roscoe    Feb 2    444-2234 Rover    Alice Jean 410
Junior    April 30 BR-549   Percival Lilly Faye 12
Ernest T. ??       none     none     none      none
grep 'A[^p-r]' goodoleboys.txt
Amos      Jan 4    333-1119 Amos     Abigail   20
Roscoe    Feb 2    444-2234 Rover    Alice Jean 410

Figure 17.4: Find lines that begin with C or R, then find lines that begin with any letter between C and R (inclusive). Finally, find lines that contain an A that is not followed by a letter from p to r, inclusive.

The three grep commands in Figure 17.5 use symbolic names. The first command finds lines with an alphabetic character, in either case, followed by a hyphen. The second command finds lines with white space followed by exactly three digits and more white space. The last command finds lines where a letter is followed by a punctuation mark.

grep '[[:alpha:]]-' goodoleboys.txt
Junior    April 30 BR-549   Percival Lilly Faye 12
grep '[[:space:]][[:digit:]]\{3\}[[:space:]]' goodoleboys.txt
Roscoe    Feb 2    444-2234 Rover    Alice Jean 410
grep '[[:alpha:]][[:punct:]]' goodoleboys.txt
Otis      Sept 17  444-8000 Ol' Sal  Sally     12      $5
Junior    April 30 BR-549   Percival Lilly Faye 12
Ernest T. ??       none     none     none      none

Figure 17.5: These commands illustrate the use of symbolic names.

Figures 17.6 and 17.7 combine metacharacters and symbolic names to perform complex searches. Figure 17.6 finds lines whose first non-blank token is a group of five to seven letters in any case. Figure 17.7 finds lines that contain a zero followed by a printable character, and then finds lines where the zero is followed by a control character.

grep '^[[:space:]]*[[:alpha:]]\{5,7\}[[:space:]]' goodoleboys.txt
Chuck     Dec 25   444-2345 Blue     Mary Sue  12      $2.50
Bubba     Oct 13   444-1111 Buck     Mary Jean 12
Billy Bob June 11  444-4340 Leotis   Lisa Sue  12
Claude    May 31   333-4340 Blue     Etheline  12
Roscoe    Feb 2    444-2234 Rover    Alice Jean 410
Arlis     June 19  444-1314 Redeye   Suzy Beth 12      $10.75
Junior    April 30 BR-549   Percival Lilly Faye 12
Ernest T. ??       none     none     none      none

Figure 17.6: Find lines whose first non-blank token is a group of five to seven letters in any case.

grep '0[[:print:]]' goodoleboys.txt
Billy Bob June 11  444-4340 Leotis   Lisa Sue  12
Otis      Sept 17  444-8000 Ol' Sal  Sally     12      $5
Claude    May 31   333-4340 Blue     Etheline  12
Arlis     June 19  444-1314 Redeye   Suzy Beth 12      $10.75
Junior    April 30 BR-549   Percival Lilly Faye 12
grep '0[[:cntrl:]]' goodoleboys.txt
Chuck     Dec 25   444-2345 Blue     Mary Sue  12      $2.50
Amos      Jan 4    333-1119 Amos     Abigail   20
Roscoe    Feb 2    444-2234 Rover    Alice Jean 410
Bill      Feb 29   333-4444 Daisy    Daisy     20

Figure 17.7: Find lines where the character 0 (zero) is followed by a printable character, then find lines where zero is followed by a control character.

Quotes

A search pattern does not have to be enclosed in quotes if it does not contain any white space or special characters. In the following example, grep searches for the string an:

grep an goodoleboys.txt

However, there's nothing wrong with placing single or double quotes around a search string that has no blanks. Therefore, the following two grep commands are equivalent to the previous one:

grep 'an' goodoleboys.txt
grep "an" goodoleboys.txt

When the search argument includes a parameter or variable, you need to use quotes, unless you are sure that the parameter or variable will never contain blanks. Even so, it is good to use quotes just to be on the safe side.

The grep command in Figure 17.8 fails when $searchname is not quoted because Qshell sees Billy as the search string, Bob as the first file name, and goodoleboys.txt as the second file name. Grep succeeds only when the search argument is quoted.

/home/JSMITH $ searchname='Billy Bob'
/home/JSMITH $ grep $searchname goodoleboys.txt
grep: 001-0023 Error found opening file Bob. No such path or directory.
/home/JSMITH $ grep "$searchname" goodoleboys.txt

Figure 17.8: The grep search fails if the search pattern is not quoted because of an embedded space.

Single quotes and double quotes function differently in Qshell. Single quotes, also called strong quotes, protect from parameter substitution. Double quotes, also called weak quotes, permit parameter substitution.

Figure 17.9 illustrates this point. The echo command shows that the fifth positional parameter has the value 444. Parameter substitution occurs in the first grep command, in which the search pattern is not quoted, and the second grep command, in which the search pattern is delimited by weak quotes. In the third grep command, parameter substitution does not take place; grep looks for the string $5 (five dollars).

echo $5
444
/home/JSMITH $ grep $5 goodoleboys.txt
Chuck     Dec 25   444-2345 Blue     Mary Sue  12      $2.50
Bubba     Oct 13   444-1111 Buck     Mary Jean 12
Billy Bob June 11  444-4340 Leotis   Lisa Sue  12
Otis      Sept 17  444-8000 Ol' Sal  Sally     12      $5
Roscoe    Feb 2    444-2234 Rover    Alice Jean 410
Arlis     June 19  444-1314 Redeye   Suzy Beth 12      $10.75
Bill      Feb 29   333-4444 Daisy    Daisy     20
/home/JSMITH $ grep "$5" goodoleboys.txt
Chuck     Dec 25   444-2345 Blue     Mary Sue  12      $2.50
Bubba     Oct 13   444-1111 Buck     Mary Jean 12
Billy Bob June 11  444-4340 Leotis   Lisa Sue  12
Otis      Sept 17  444-8000 Ol' Sal  Sally     12      $5
Roscoe    Feb 2    444-2234 Rover    Alice Jean 410
Arlis     June 19  444-1314 Redeye   Suzy Beth 12      $10.75
Bill      Feb 29   333-4444 Daisy    Daisy     20
/home/JSMITH $ grep '$5' goodoleboys.txt
Otis      Sept 17  444-8000 Ol' Sal  Sally     12      $5

Figure 17.9: Strong quotes forbid parameter substitution; weak quotes allow it.

Did you notice the strange thing that grep does in this example? It matches a literal dollar-sign character ($) instead of treating it as an end-of-line metacharacter. Why? In a basic regular expression, the dollar sign acts as the end-of-line anchor only when it is the last character of the pattern. In $5, nothing could follow the end of the line, so grep treats the dollar sign as an ordinary character. Although taking advantage of this behavior will show that you are a real grep guru, you shouldn't rely on it. It's tricky and unclear to whoever might be changing the script later, even if that person is you. Instead, to search for a literal dollar sign, use \$.

Here is the rule of thumb you should keep in mind:

Use double quotes if the search string includes the name of a variable whose value is to be substituted. Otherwise, use single quotes.
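The rule of thumb can be sketched in a few lines. This example assumes a POSIX-style shell and grep; the variable name num, the scratch file, and its contents are hypothetical.

```shell
# Hypothetical scratch file: one line with a phone number, one with the literal text $num.
demo="${TMPDIR:-/tmp}/grep_quote_demo.$$"
printf '%s\n' 'Call 444-8000' 'cost: $num' > "$demo"

num=444

# Double (weak) quotes: the shell substitutes the value of num, so grep searches for 444.
weak=$(grep "$num" "$demo")

# Single (strong) quotes: no substitution; the backslash makes grep treat $ literally.
strong=$(grep '\$num' "$demo")

echo "weak quotes found:   $weak"
echo "strong quotes found: $strong"

rm -f "$demo"
```

The strong-quoted pattern uses \$ rather than a bare dollar sign, so it does not depend on how grep interprets a dollar sign in the middle of a pattern.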

Files

You may list one or more file names at the end of the grep command. Each name can be an individual file, or it may contain wildcard characters for file-name expansion (globbing, discussed in chapter 14). If you omit the input-files parameter, grep reads from stdin. The only time you are likely to omit the input-files parameter, however, is when grep is reading the output of another command through a pipeline.

In the following example, output of the List Directory Contents command (ls) is the input to grep:

ls | grep -i '[A-Z][12]'

This example lists files in the current directory whose names contain a letter of the alphabet, followed by either a one or a two.

There are several different ways to fill in the input-files parameter. One way is to list a file name, like this:

grep '22.34' mydata.csv

In this case, grep searches only one file, mydata.csv, in the current directory. You can specify a full path on the file name, of course:

grep '22.34' /home/jsmith/mydata.csv

You may want to use globbing to search more than one file at a time. The following example shows how to search all the CSV files in the current directory:

grep '22.34' *.csv

In the preceding two examples, the input-files parameter has only one argument. You can list more than one file, if you wish, separating them with white space. The command shown here searches three files:

grep '22.34' fileone.csv filetwo.csv filethree.csv

All of these commands search IFS files, but you can search source physical file members, too. For example, the next command searches all members of source physical file SRC in library MYLIB for the string pgm:

grep 'pgm' /qsys.lib/mylib.lib/src.file/*

You can mix and match IFS files and source physical files, too. In this command, grep searches all members of a source physical file, as well as all HTML and text files in the current IFS directory:

grep 'pgm' /qsys.lib/js.lib/src.file/* *.htm* *.txt

Options

Options can be added to affect the behavior of grep. Table 17.3 contains a list of the permitted options.

Table 17.3: Grep Options

Option    Release  Description
-E        V4R3     Use extended regular expressions (egrep, discussed later in this chapter).
-F        V4R3     Treat metacharacters literally (see the discussion of fgrep, later in this chapter).
-H        V5R2     If the -R option is specified, symbolic links on the command line are followed. Symbolic links encountered in the tree traversal are not followed.
-L        V5R2     If the -R option is specified, both symbolic links on the command line and symbolic links encountered in the tree traversal are followed.
-P        V5R2     If the -R option is specified, no symbolic links are followed.
-R        V5R2     If the file designates a directory, grep searches each file in the entire subtree connected at that point.
-c        V4R3     Output consists of file names and the number of matched lines.
-e        V4R3     Multiple search patterns follow, separated by newline characters.
-f        V4R3     The argument following -f is the name of a file that contains search patterns. Each pattern must be separated by a newline character.
-h        V4R3     Do not print the file name.
-i, -y    V4R3     Ignore the case of letters in making comparisons.
-l (ell)  V4R3     Output consists of file names, not matching lines.
-n        V4R3     Print a line number. This option is ignored if the -c, -l, or -s options are specified.
-q        V4R3     Quiet mode; no messages are printed.
-s        V4R3     Suppress the error messages ordinarily written for nonexistent or unreadable files. Other messages are not suppressed.
-v        V4R3     Invert the search; print the lines that do not match the search patterns.
-w        V4R3     Search for the expression as a whole word.
-x        V4R3     Match only if the search pattern is the only thing on the line. If the -w option is also specified, it is ignored.

The following examples illustrate how you can use options to run more powerful searches. This command searches for the string bi, regardless of case:

grep -i bi goodoleboys.txt
Billy Bob June 11  444-4340 Leotis   Lisa Sue  12
Amos      Jan 4    333-1119 Amos     Abigail   20
Bill      Feb 29   333-4444 Daisy    Daisy     20

This example searches for Bill as a whole word, not as part of a word:

grep -w Bill goodoleboys.txt
Bill      Feb 29   333-4444 Daisy    Daisy     20
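Several of the other options can be tried the same way. The following sketch assumes a POSIX-style shell and grep; the scratch directory, the file names a.txt and b.txt, and their contents are hypothetical.

```shell
# Hypothetical scratch directory with two small files.
dir="${TMPDIR:-/tmp}/grep_opts_demo.$$"
mkdir -p "$dir"
printf '%s\n' 'Bill' 'Billy Bob' 'Amos' > "$dir/a.txt"
printf '%s\n' 'Otis' 'Claude' > "$dir/b.txt"

count=$(grep -c Bill "$dir/a.txt")                  # number of matching lines
numbered=$(grep -n Amos "$dir/a.txt")               # matching line, prefixed with its line number
inverted=$(grep -v Bill "$dir/a.txt")               # lines that do NOT contain Bill
whole=$(grep -w Bill "$dir/a.txt")                  # Bill as a whole word only
names=$(cd "$dir" && grep -l Bill a.txt b.txt)      # names of files that contain a match

echo "-c: $count"
echo "-n: $numbered"
echo "-v: $inverted"
echo "-w: $whole"
echo "-l: $names"

rm -rf "$dir"
```

Options can be combined. For instance, -c together with -i counts matching lines without regard to case.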

The example in Figure 17.10 returns lines that meet any of three criteria, as if the conditions were concatenated with ORs. Grep finds lines that include an uppercase C, followed by any character, followed by a lowercase u. Grep also includes lines that contain the string BR, as well as those that end with the character 5 followed by any other character.

/home/JSMITH $
> grep -e 'C.u
>
> BR
>
> 5.$' goodoleboys.txt
Chuck     Dec 25   444-2345 Blue     Mary Sue  12      $2.50
Otis      Sept 17  444-8000 Ol' Sal  Sally     12      $5
Arlis     June 19  444-1314 Redeye   Suzy Beth 12      $10.75
Junior    April 30 BR-549   Percival Lilly Faye 12

Figure 17.10: Grep returns lines that meet any of three criteria.

Figure 17.11 carries out the same search as in Figure 17.10, but reads the search patterns from file greppats.txt instead of from the command line.

/home/JSMITH $ cat greppats.txt
C.u
BR
5.$
/home/JSMITH $ grep -f greppats.txt goodoleboys.txt
Chuck     Dec 25   444-2345 Blue     Mary Sue  12      $2.50
Otis      Sept 17  444-8000 Ol' Sal  Sally     12      $5
Arlis     June 19  444-1314 Redeye   Suzy Beth 12      $10.75
Junior    April 30 BR-549   Percival Lilly Faye 12

Figure 17.11: Grep reads the search patterns from file greppats.txt to carry out the search.

Extended Regular Expressions (Egrep)

Extended regular expressions are an alternative to the basic regular expressions discussed so far in this chapter. When you use an extended regular expression, grep recognizes a different set of metacharacters, which are listed in Table 17.4.

Table 17.4: Egrep Metacharacters

Metacharacter      Description
. (period)         Match any character except end of line.
| (vertical bar)   Perform an OR.
?                  The preceding pattern is optional and is to be matched at most once.
*                  The preceding pattern is optional and is to be matched zero or more times.
+                  The preceding pattern is not optional and is to be matched one or more times.
^                  Match from the beginning of the line.
$                  Match from the end of the line.
[ ]                Match any character within the brackets. Ranges may be specified with a hyphen.
\ (backslash)      Turn off the special meaning of the following character.
( )                Group characters or patterns into a larger pattern for more complex matches. For example, (abc)+ matches abc, abcabc, abcabcabc, etc.

There are two ways to use the alternate metacharacter set. One way is with the egrep utility; the other is to use grep with the -E option. The following examples illustrate the egrep features that differ from basic regular expressions. In the first example, egrep finds lines that contain either Oct or Feb:

egrep 'Oct|Feb' goodoleboys.txt
Bubba     Oct 13   444-1111 Buck     Mary Jean 12
Roscoe    Feb 2    444-2234 Rover    Alice Jean 410
Bill      Feb 29   333-4444 Daisy    Daisy     20

The following command finds lines with an uppercase B, zero or one i, and a lowercase l:

egrep 'Bi?l' goodoleboys.txt
Chuck     Dec 25   444-2345 Blue     Mary Sue  12      $2.50
Billy Bob June 11  444-4340 Leotis   Lisa Sue  12
Claude    May 31   333-4340 Blue     Etheline  12
Bill      Feb 29   333-4444 Daisy    Daisy     20

Finally, this example finds lines where an uppercase E is preceded by zero or more spaces:

egrep ' *E' goodoleboys.txt
Claude    May 31   333-4340 Blue     Etheline  12
Ernest T. ??       none     none     none      none
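The same kinds of searches can be run through grep with the -E option instead of egrep. The sketch below assumes a POSIX-style shell and grep; the scratch file and its lines are invented for the illustration.

```shell
# Hypothetical scratch file with a few short lines.
demo="${TMPDIR:-/tmp}/grep_ere_demo.$$"
printf '%s\n' 'Oct 13' 'Feb 2' 'May 31' 'Blue' 'Bill' 'abcabc' > "$demo"

# Alternation: count lines that contain Oct or Feb.
either=$(grep -E -c 'Oct|Feb' "$demo")

# Optional character: B, then zero or one i, then l (matches Blue and Bill).
optional=$(grep -E -c 'Bi?l' "$demo")

# Grouping with +: a line made up of one or more repetitions of abc.
grouped=$(grep -E '^(abc)+$' "$demo")

echo "Oct|Feb:   $either line(s)"
echo "Bi?l:      $optional line(s)"
echo "^(abc)+\$:  $grouped"

rm -f "$demo"
```

In an extended regular expression the parentheses, vertical bar, question mark, and plus sign are metacharacters as written; they need no backslash.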

Fgrep

You can tell grep to interpret metacharacters literally when you are looking for a character that would otherwise be interpreted as a metacharacter. For example, you might wish to search for a period, which grep interprets as a wildcard that stands for any character. You have already learned one method to have the period metacharacter interpreted literally, which is to precede it with a backslash, as shown here:

grep '\.' goodoleboys.txt

A second method is to use the -F option, which tells grep to treat metacharacters literally:

grep -F '.' goodoleboys.txt

A third method is to use the fgrep utility. Fgrep, which stands for fast grep or fixed grep (depending on whom you ask), does not interpret metacharacters. The following example illustrates the use of fgrep :

fgrep '.' goodoleboys.txt

The fgrep method is probably the easiest of the three. However, it cannot be used when you want to interpret some metacharacters literally, but make others use their wildcard abilities, as in this example:

grep '\$[0-9]*\.[0-9]*.$' goodoleboys.txt

This command searches for records that have a dollar sign, then zero or more digits, then a period, then zero or more digits again, and finally one other character, anchored at the end of the line. Table 17.5 shows the parts of this search string.

Table 17.5: A Search String, Explained

Symbols   Description
\$        Search for a dollar sign.
[0-9]*    Search for zero or more digits.
\.        Search for a period.
[0-9]*    Search for zero or more digits, again.
.         Search for any character.
$         Anchor the match at the end of the line.
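The three ways of treating a period can be compared side by side. This is a minimal sketch, assuming a POSIX-style shell and grep; the scratch file and its contents are hypothetical.

```shell
# Hypothetical scratch file; only the first line contains a period.
demo="${TMPDIR:-/tmp}/grep_fixed_demo.$$"
printf '%s\n' 'paid $2.50' 'paid 12' 'none' > "$demo"

# As a regular expression, a period matches any character, so every line counts.
regex_dot=$(grep -c '.' "$demo")

# Escaped with a backslash, only a literal period matches.
escaped_dot=$(grep -c '\.' "$demo")

# With -F, the whole pattern is taken literally; no escaping is needed.
fixed_dot=$(grep -c -F '.' "$demo")
fixed_amount=$(grep -c -F '$2.50' "$demo")

echo "grep '.':        $regex_dot line(s)"
echo "grep '\\.':       $escaped_dot line(s)"
echo "grep -F '.':     $fixed_dot line(s)"
echo "grep -F '\$2.50': $fixed_amount line(s)"

rm -f "$demo"
```

With -F, both the dollar sign and the period in $2.50 are ordinary characters, which is convenient when the search string comes from data rather than from a hand-written pattern.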

Exit Status

Many Unix utilities only return a non-zero error status in case of an error. Grep is a bit more useful. As Table 17.6 shows, grep distinguishes between an error and a case of not finding the search string. For example, if the file does not exist, the exit status is 2.

Table 17.6: Exit Status Codes Set by Grep

Status  Description
0       The search string was found.
1       The search string was not found in the specified files.
>1      An error occurred.

Sometimes, grep both succeeds and fails. In Figure 17.12, for example, grep finds the first file, but not the second one. Grep reports failure even though the search was partially successful.

grep 'B[lu]' goodoleboys.txt nosuchfile.data
goodoleboys.txt:Chuck     Dec 25   444-2345 Blue     Mary Sue  12      $2.50
goodoleboys.txt:Bubba     Oct 13   444-1111 Buck     Mary Jean 12
goodoleboys.txt:Claude    May 31   333-4340 Blue     Etheline  12
grep: 001-0023 Error found opening file nosuchfile.data. No such path or directory.
/home/JSMITH $ echo $?
2
/home/JSMITH $

Figure 17.12: Grep reports failure if any part of the search process fails.

In Figure 17.13, the order of the files is reversed. Grep never searches the second file, since the first file does not exist.

grep 'B[lu]' nosuchfile.data goodoleboys.txt
grep: 001-0023 Error found opening file nosuchfile.data. No such path or directory.
/home/JSMITH $ echo $?
2
/home/JSMITH $

Figure 17.13: Grep reports failure as soon as possible.

Figure 17.14 contains a similar example, using source physical file members rather than IFS files. Notice the exit status of each command.

grep -i 'goto' /qsys.lib/jsmith.lib/qrpglesrc.file/M*.MBR
/home/JSMITH $ echo $?
1
grep -i 'goto' /qsys.lib/jsmith.lib/qrpglesrc.file/B*.MBR
grep: 001-0023 Error found opening file /qsys.lib/jsmith.lib/qrpglesrc.file/B*.MBR. No such path or directory.
/home/JSMITH $ echo $?
2
/home/JSMITH $

Figure 17.14: Grep distinguishes between an unsuccessful search and nonexistent source physical file members.

The first grep returns an exit status of one because there are members that start with M, but none have goto in them. The second grep returns an exit status of two because there are no members that start with B in file QRPGLESRC in library JSMITH.
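A script can act on the three outcomes by testing the exit status. The following sketch assumes a POSIX-style shell and grep; the function name describe, the scratch file, and its contents are hypothetical.

```shell
# Hypothetical scratch file with two names.
demo="${TMPDIR:-/tmp}/grep_status_demo.$$"
printf '%s\n' 'Bubba' 'Otis' > "$demo"

# describe PATTERN FILE: print a word that describes grep's exit status.
describe() {
  grep "$1" "$2" > /dev/null 2>&1   # discard output; only the status matters here
  case $? in
    0) echo "found" ;;
    1) echo "not found" ;;
    *) echo "error" ;;
  esac
}

found=$(describe Otis "$demo")                  # pattern is present in the file
missing=$(describe Roscoe "$demo")              # file exists, pattern is absent
failed=$(describe Otis "$demo.nosuchfile")      # file does not exist

echo "Otis:         $found"
echo "Roscoe:       $missing"
echo "missing file: $failed"

rm -f "$demo"
```

Testing for status 1 separately from larger values lets a script tell "no match" apart from a real failure, such as a mistyped file name.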

Summary

The grep utility searches files for strings that match a pattern coded as a regular expression. Grep is very powerful because it allows regular expressions to include metacharacters. The egrep and fgrep utilities are variations on grep that are useful when grep's normal behavior does not achieve the desired results.
