A Practical Guide to Linux® Commands, Editors, and Shell Programming


A gawk program (from the command line or from program-file) consists of one or more lines containing a pattern and/or action in the following format:

pattern { action }

The pattern selects lines from the input. The gawk utility performs the action on all lines that the pattern selects. The braces surrounding the action enable gawk to differentiate it from the pattern. If a program line does not contain a pattern, gawk selects all lines in the input. If a program line does not contain an action, gawk copies the selected lines to standard output.

To start, gawk compares the first line of input (from the file-list or standard input) with each pattern in the program. If a pattern selects the line (if there is a match), gawk takes the action associated with the pattern. If the line is not selected, gawk takes no action. When gawk has completed its comparisons for the first line of input, it repeats the process for the next line, continuing until it has read all of the input.

If several patterns select the same line, gawk takes the actions associated with each of the patterns in the order in which they appear in the program. It is possible for gawk to send a single line from the input to standard output more than once.
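The following minimal sketch illustrates these rules. The input lines are invented for illustration, and the program is run as awk, which is assumed here to be gawk or a compatible implementation. The first two program lines have a pattern but no action, and the third has an action but no pattern.

```shell
# Sample input (hypothetical): a name and a number on each line.
out=$(printf 'apple 3\nbanana 12\ncherry 7\n' | awk '
/an/                    # no action: the selected line is copied to standard output
$2 > 5                  # also no action; "banana 12" is selected by both patterns
{ print "seen:", $1 }   # no pattern: the action runs for every input line
')
printf '%s\n' "$out"
```

Because both patterns select banana 12, that line appears twice in the output, ahead of the seen: banana line produced by the third program line.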

Patterns

You can use a regular expression (Appendix A), enclosed within slashes, as a pattern. The ~ operator tests whether a field or variable matches a regular expression. The !~ operator tests for no match. You can perform both numeric and string comparisons using the relational operators listed in Table 12-1. You can combine any of the patterns using the Boolean operators || (OR) or && (AND).

Table 12-1. Relational operators

Relop    Meaning
<        Less than
<=       Less than or equal to
==       Equal to
!=       Not equal to
>=       Greater than or equal to
>        Greater than
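As a rough sketch of how these pattern types combine, the program below uses ~, !~, relational operators, and the Boolean operators on made-up input (a name followed by a number). It is run as awk, assumed to be gawk or a compatible implementation.

```shell
# Hypothetical data: field 1 is a name, field 2 is a number.
data='max 5000
sam 12000
zach 800'
out=$(printf '%s\n' "$data" | awk '
$1 ~ /^s/                  { print "starts with s:", $1 }
$1 !~ /m/                  { print "no letter m:", $1 }
$2 >= 1000 && $2 < 10000   { print "mid-range:", $1 }
$1 == "zach" || $2 > 10000 { print "zach or large:", $1 }
')
printf '%s\n' "$out"
```

Each input line is tested against all four patterns in order, so sam and zach each produce two lines of output.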

BEGIN and END

Two special patterns, BEGIN and END, let you run commands outside the main input loop. The gawk utility executes the actions associated with the BEGIN pattern before it reads any input and the actions associated with the END pattern after it has processed all the input.
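A small sketch of the three phases, using invented numeric input and the awk command name (assumed to be gawk or compatible):

```shell
out=$(printf '10\n20\n30\n' | awk '
BEGIN { print "start" }           # runs once, before any input is read
      { total += $1 }             # runs once for each input line
END   { print "total:", total }   # runs once, after the last line
')
printf '%s\n' "$out"
# Output:
# start
# total: 60
```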

, (comma)

The comma is the range operator. If you separate two patterns with a comma on a single gawk program line, gawk selects a range of lines, beginning with the first line that matches the first pattern. The last line gawk selects is the next line that matches the second pattern. If no line matches the second pattern, gawk selects every line through the end of the input. After gawk finds the second pattern, it starts over, looking for the first pattern again.
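The sketch below shows both behaviors on invented input: the first range is closed by STOP, while the second START has no matching STOP, so the range runs to the end of the input. The marker words and the awk command name (assumed to be gawk or compatible) are choices made for this illustration.

```shell
out=$(printf 'a\nSTART\nb\nSTOP\nc\nSTART\nd\n' | awk '/START/,/STOP/')
printf '%s\n' "$out"
# Output:
# START
# b
# STOP
# START
# d
```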

Actions

The action portion of a gawk command causes gawk to take that action when it matches a pattern. When you do not specify an action, gawk performs the default action, which is the print command (explicitly represented as {print}). This action copies the record (normally a line; see "Variables") from the input to standard output.

When you follow a print command with arguments, gawk displays only the arguments you specify. These arguments can be variables or string constants. You can send the output from a print command to a file (>), append it to a file (>>), or send it through a pipe to the input of another program ( | ). A coprocess (|&) is a two-way pipe that exchanges data with a program running in the background (page 557).

Unless you separate items in a print command with commas, gawk catenates them. Commas cause gawk to separate the items with the output field separator (OFS, normally a SPACE; see "Variables").

You can include several actions on one line by separating them with semicolons.
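To make the difference concrete, here is a minimal sketch with three print commands on one line, separated by semicolons. The input text is invented, and awk is assumed to be gawk or a compatible implementation.

```shell
out=$(echo 'alpha beta' | awk '{ print $1 $2; print $1, $2; print $2 "-" $1 }')
printf '%s\n' "$out"
# Output:
# alphabeta      (no comma: the items are catenated)
# alpha beta     (comma: the items are separated by OFS)
# beta-alpha     (a string constant catenated between two fields)
```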

Comments

The gawk utility disregards anything on a program line following a pound sign (#). You can document a gawk program by preceding comments with this symbol.

Variables

Although you do not need to declare gawk variables prior to their use, you can optionally assign initial values to them. Unassigned numeric variables are initialized to 0; string variables are initialized to the null string. In addition to user variables, gawk maintains program variables. You can use both user and program variables in the pattern and in the action portion of a gawk program. Table 12-2 lists a few program variables.

Table 12-2. Variables

Variable    Meaning
$0          The current record (as a single variable)
$1...$n     Fields in the current record
FILENAME    Name of the current input file (null for standard input)
FS          Input field separator (default: SPACE or TAB)
NF          Number of fields in the current record
NR          Record number of the current record
OFS         Output field separator (default: SPACE)
ORS         Output record separator (default: NEWLINE)
RS          Input record separator (default: NEWLINE)

In addition to initializing variables within a program, you can use the assign (-v) option to initialize variables on the command line. This feature is useful when the value of a variable changes from one run of gawk to the next.

By default the input and output record separators are NEWLINE characters. Thus gawk takes each line of input to be a separate record and appends a NEWLINE to the end of each output record. By default the input field separators are SPACEs and TABs. The default output field separator is a SPACE. You can change the value of any of the separators at any time by assigning a new value to its associated variable either from within the program or from the command line by using the assign (-v) option.
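As an illustration, the sketch below sets FS, OFS, and a user variable from the command line with -v. The colon-separated input and the variable name min are invented for this example, and awk is assumed to be gawk or a compatible implementation.

```shell
out=$(printf 'root:x:0\nsam:x:1000\n' | awk -v FS=':' -v OFS=' -> ' -v min=500 '
$3 >= min { print NR, $1, NF }   # record number, first field, number of fields
')
printf '%s\n' "$out"
# Output:
# 2 -> sam -> 3
```

Only the second record has a third field of at least 500, so NR is 2. The commas in the print command are replaced by the OFS value given on the command line.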

Functions

Table 12-3 lists a few of the functions that gawk provides for manipulating numbers and strings.

Table 12-3. Functions

Function               Meaning
length(str)            Returns the number of characters in str; without an argument, returns the number of characters in the current record
int(num)               Returns the integer portion of num
index(str1,str2)       Returns the index of str2 in str1 or 0 if str2 is not present
split(str,arr,del)     Places elements of str, delimited by del, in the array arr[1]...arr[n]; returns the number of elements in the array
sprintf(fmt,args)      Formats args according to fmt and returns the formatted string; mimics the C programming language function of the same name
substr(str,pos,len)    Returns the substring of str that begins at pos and is len characters long
tolower(str)           Returns a copy of str in which all uppercase letters are replaced with their lowercase counterparts
toupper(str)           Returns a copy of str in which all lowercase letters are replaced with their uppercase counterparts
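The following sketch exercises each function in Table 12-3 once. The input string and the array name parts are invented for illustration, and awk is assumed to be gawk or a compatible implementation.

```shell
out=$(echo 'Hello, gawk world' | awk '{
    print length($0)                 # number of characters in the whole record
    print index($0, "gawk")          # position of "gawk", counting from 1
    print substr($0, 8, 4)           # 4 characters starting at position 8
    print toupper($1), tolower($3)
    n = split("a:b:c", parts, ":")   # fills parts[1], parts[2], and parts[3]
    print n, parts[2]
    print int(7.9), sprintf("%03d", 5)
}')
printf '%s\n' "$out"
# Output:
# 17
# 8
# gawk
# HELLO, world
# 3 b
# 7 005
```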

Arithmetic Operators

The gawk arithmetic operators listed in Table 12-4 are from the C programming language.

Table 12-4. Arithmetic operators

Operator    Meaning
*           Multiplies the expression preceding the operator by the expression following it
/           Divides the expression preceding the operator by the expression following it
%           Takes the remainder after dividing the expression preceding the operator by the expression following it
+           Adds the expression preceding the operator to the expression following it
-           Subtracts the expression following the operator from the expression preceding it
=           Assigns the value of the expression following the operator to the variable preceding it
++          Increments the variable preceding the operator
--          Decrements the variable preceding the operator
+=          Adds the expression following the operator to the variable preceding it and assigns the result to the variable preceding the operator
-=          Subtracts the expression following the operator from the variable preceding it and assigns the result to the variable preceding the operator
*=          Multiplies the variable preceding the operator by the expression following it and assigns the result to the variable preceding the operator
/=          Divides the variable preceding the operator by the expression following it and assigns the result to the variable preceding the operator
%=          Assigns the remainder, after dividing the variable preceding the operator by the expression following it, to the variable preceding the operator
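A short sketch that applies each operator in turn to a single variable. The starting value 17 is arbitrary, and awk is assumed to be gawk or a compatible implementation. The program uses only a BEGIN pattern, so it reads no input.

```shell
out=$(awk 'BEGIN {
    n = 17
    print n * 2, n / 4, n % 5, n + 3, n - 3   # 34 4.25 2 20 14
    n++;     print n    # 18
    n--;     print n    # 17
    n += 3;  print n    # 20
    n -= 10; print n    # 10
    n *= 4;  print n    # 40
    n /= 8;  print n    # 5
    n %= 3;  print n    # 2
}')
printf '%s\n' "$out"
```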

Associative Arrays

An associative array is one of gawk's most powerful features. These arrays use strings as indexes. Using an associative array, you can mimic a traditional array by using numeric strings as indexes.

You assign a value to an element of an associative array just as you would assign a value to any other gawk variable. The syntax is

array[string] = value

where array is the name of the array, string is the index of the element of the array you are assigning a value to, and value is the value you are assigning to that element.

You can use a special for structure with an associative array. The syntax is

for (elem in array) action

where elem is a variable that takes on the index of each element of the array as the for structure loops through them, array is the name of the array, and action is the action that gawk takes for each element in the array. You can use the elem variable in this action, for example as array[elem] to retrieve the value of the element.

The "Examples" section found later in this chapter contains programs that use associative arrays.
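As a minimal sketch of the assignment syntax, the program below stores values under string indexes and then mimics a traditional array with numeric indexes. The array names price and day and their contents are invented for illustration, and awk is assumed to be gawk or a compatible implementation.

```shell
out=$(awk 'BEGIN {
    price["apple"] = 0.5      # string indexes
    price["pear"]  = 0.75
    day[1] = "Mon"; day[2] = "Tue"; day[3] = "Wed"   # numeric indexes mimic a traditional array
    for (i = 1; i <= 3; i++)
        print i, day[i]
    print "pear costs", price["pear"]
}')
printf '%s\n' "$out"
# Output:
# 1 Mon
# 2 Tue
# 3 Wed
# pear costs 0.75
```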

printf

You can use the printf command in place of print to control the format of the output that gawk generates. The gawk version of printf is similar to that found in the C language. A printf command has the following syntax:

printf "control-string", arg1, arg2, ..., argn

The control-string determines how printf formats arg1, arg2, ..., argn. These arguments can be variables or other expressions. Within the control-string you can use \n to indicate a NEWLINE and \t to indicate a TAB. The control-string contains conversion specifications, one for each argument. A conversion specification has the following syntax:

%[-][x[.y]]conv

where - causes printf to left-justify the argument; x is the minimum field width, and .y is the number of places to the right of a decimal point in a number. The conv indicates the type of numeric conversion and can be selected from the letters in Table 12-5. Refer to "Examples" later in this chapter for examples of how to use printf.

Table 12-5. Numeric conversion

conv    Type of conversion
d       Decimal
e       Exponential notation
f       Floating-point number
g       Use f or e, whichever is shorter
o       Unsigned octal
s       String of characters
x       Unsigned hexadecimal
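The sketch below shows several conversion specifications side by side. The values are arbitrary, the vertical bars are included only to make the field widths visible, and awk is assumed to be gawk or a compatible implementation.

```shell
out=$(awk 'BEGIN {
    # left-justified string in 8 columns, integer in 5, number in 8 with 2 decimals
    printf "%-8s|%5d|%8.2f|\n", "sam", 42, 3.14159
    # hexadecimal, octal, and exponential notation
    printf "%x %o %e\n", 255, 8, 12345.678
    # \t produces a TAB between the two items
    printf "%s\t%g\n", "ratio", 0.5
}')
printf '%s\n' "$out"
```

Unlike print, printf does not append a NEWLINE automatically, which is why each control-string here ends with \n.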

Control Structures

Control (flow) statements alter the order of execution of commands within a gawk program. This section details the if...else, while, and for control structures. In addition, the break and continue statements work in conjunction with the control structures to alter the order of execution of commands. See page 436 for more information on control structures. You do not need to use braces around commands when you specify a single, simple command.

if...else

The if...else control structure tests the status returned by the condition and transfers control based on this status. The syntax of an if...else structure is shown below. The else part is optional.

if (condition) {commands} [else {commands}]

The simple if statement shown here does not use braces:

if ($5 <= 5000) print $0

Next is a gawk program that uses a simple if...else structure. Again, there are no braces.

$ cat if1
BEGIN   {
        nam="sam"
        if (nam == "max")
                print "nam is max"
            else
                print "nam is not max, it is", nam
        }
$ gawk -f if1
nam is not max, it is sam

while

The while structure loops through and executes the commands as long as the condition is true. The syntax of a while structure is

while (condition) {commands}

The next gawk program uses a simple while structure to display powers of 2. This example uses braces because the while loop contains more than one statement.

$ cat while1
BEGIN   {
        n = 1
        while (n <= 5)
                {
                print "2^" n, 2**n
                n++
                }
        }
$ gawk -f while1
2^1 2
2^2 4
2^3 8
2^4 16
2^5 32

for

The syntax of a for control structure is

for (init; condition; increment) {commands}

A for structure starts by executing the init statement, which usually sets a counter to 0 or 1. It then loops through the commands as long as the condition is true. After each loop it executes the increment statement. The for1 gawk program does the same thing as the preceding while1 program except that it uses a for statement, which makes the program simpler:

$ cat for1
BEGIN   {
        for (n=1; n <= 5; n++)
                print "2^" n, 2**n
        }
$ gawk -f for1
2^1 2
2^2 4
2^3 8
2^4 16
2^5 32

The gawk utility supports an alternative for syntax for working with associative arrays:

for (var in array) {commands}

This for structure loops through elements of the associative array named array, assigning the value of the index of each element of array to var each time through the loop.

END {for (name in manuf) print name, manuf[name]}
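The END action above refers to an array named manuf without showing how it is filled. The following is a minimal self-contained sketch, assuming that manuf counts how many input lines begin with each manufacturer name. The input data is invented, and awk is assumed to be gawk or a compatible implementation. Because a for...in loop visits the indexes in no guaranteed order, the output is passed through sort.

```shell
# Hypothetical input: a manufacturer name followed by a model name.
out=$(printf 'ford mustang\nchevy nova\nford focus\nbmw 325i\nford ltd\nchevy malibu\n' | awk '
      { manuf[$1]++ }                                  # count lines per manufacturer
END   { for (name in manuf) print name, manuf[name] }
' | sort)
printf '%s\n' "$out"
# Output:
# bmw 1
# chevy 2
# ford 3
```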

break

The break statement transfers control out of a for or while loop, terminating execution of the innermost loop it appears in.

continue

The continue statement transfers control to the end of a for or while loop, causing execution of the innermost loop it appears in to continue with the next iteration.
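A brief sketch that uses both statements in one loop. The loop bounds and tests are arbitrary, and awk is assumed to be gawk or a compatible implementation.

```shell
out=$(awk 'BEGIN {
    for (n = 1; n <= 10; n++) {
        if (n % 2 == 0) continue   # skip even numbers and go on to the next iteration
        if (n > 7) break           # leave the loop entirely
        print n
    }
    print "done"
}')
printf '%s\n' "$out"
# Output:
# 1
# 3
# 5
# 7
# done
```

The loop never reaches 10: when n is 9, the break statement ends the loop and control passes to the print command that follows it.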
