A Practical Guide to Linux® Commands, Editors, and Shell Programming
A gawk program (from the command line or from program-file) consists of one or more lines containing a pattern and/or action in the following format:
    pattern { action }
The pattern selects lines from the input. The gawk utility performs the action on all lines that the pattern selects. The braces surrounding the action enable gawk to differentiate it from the pattern. If a program line does not contain a pattern, gawk selects all lines in the input. If a program line does not contain an action, gawk copies the selected lines to standard output.
To start, gawk compares the first line of input (from the file-list or standard input) with each pattern in the program. If a pattern selects the line (if there is a match), gawk takes the action associated with the pattern. If the line is not selected, gawk takes no action. When gawk has completed its comparisons for the first line of input, it repeats the process for the next line, continuing until it has read all of the input.
If several patterns select the same line, gawk takes the actions associated with each of the patterns in the order in which they appear in the program. It is therefore possible for gawk to send a single line from the input to standard output more than once.
Patterns
You can use a regular expression (Appendix A), enclosed within slashes, as a pattern. The ~ operator tests whether a field or variable matches a regular expression. The !~ operator tests for no match. You can perform both numeric and string comparisons using the relational operators listed in Table 12-1. You can combine any of the patterns using the Boolean operators || (OR) or && (AND).
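The pattern forms described above can be sketched as follows. The sample data, the variable names, and the AWK fallback line are assumptions made for this illustration and do not come from the text; the sketch uses gawk when it is installed and awk otherwise.

```shell
# Use gawk when it is installed; otherwise fall back to awk (assumption of this sketch).
AWK=$(command -v gawk || command -v awk)

# Hypothetical sample data: name, department, salary.
data='max sales 4200
zach support 5100
sam sales 6100
pat admin 3900'

# A regular expression pattern with no action: matching lines are copied unchanged.
regex_out=$(printf '%s\n' "$data" | "$AWK" '/sales/')

# ~ tests a single field; && combines that test with a numeric comparison.
combo_out=$(printf '%s\n' "$data" | "$AWK" '$2 ~ /^s/ && $3 > 5000 { print $1 }')

# !~ selects lines whose first field does not contain the letter m.
nomatch_out=$(printf '%s\n' "$data" | "$AWK" '$1 !~ /m/ { print $1 }')

printf '%s\n' "$regex_out" "$combo_out" "$nomatch_out"
```

With this data the first command displays the two sales lines, the second displays zach and sam, and the third displays zach and pat.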
BEGIN and END
Two unique patterns, BEGIN and END, execute commands before gawk starts its processing and after it finishes. The gawk utility executes the actions associated with the BEGIN pattern before, and with the END pattern after, it processes all the input.
, (comma)
The comma is the range operator. If you separate two patterns with a comma on a single gawk program line, gawk selects a range of lines, beginning with the first line that matches the first pattern. The last line gawk selects is the next line that matches the second pattern. If no line matches the second pattern, gawk selects every line through the end of the input. After gawk finds a line that matches the second pattern, it begins the process again by looking for the first pattern.
Actions
The action portion of a gawk command causes gawk to take that action when it matches a pattern. When you do not specify an action, gawk performs the default action, which is the print command (explicitly represented as {print}). This action copies the record (normally a line; see "Variables") from the input to standard output.
When you follow a print command with arguments, gawk displays only the arguments you specify. These arguments can be variables or string constants. You can send the output from a print command to a file (>), append it to a file (>>), or send it through a pipe to the input of another program (|). A coprocess (|&) is a two-way pipe that exchanges data with a program running in the background (page 557).
Unless you separate items in a print command with commas, gawk catenates them. Commas cause gawk to separate the items with the output field separator (OFS, normally a SPACE; see "Variables"). You can include several actions on one line by separating them with semicolons.
Comments
The gawk utility disregards anything on a program line following a pound sign (#). You can document a gawk program by preceding comments with this symbol.
Variables
Although you do not need to declare gawk variables prior to their use, you can optionally assign initial values to them. Unassigned numeric variables are initialized to 0; string variables are initialized to the null string. In addition to user variables, gawk maintains program variables. You can use both user and program variables in the pattern and in the action portion of a gawk program. Table 12-2 lists a few program variables.
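A minimal sketch of user and program variables follows. NR (the number of the current record) and NF (the number of fields in the current record) are standard gawk program variables; the sample data, the user variable names total and label, and the AWK fallback line are assumptions made for this illustration.

```shell
# Use gawk when it is installed; otherwise fall back to awk (assumption of this sketch).
AWK=$(command -v gawk || command -v awk)

# Hypothetical sample data: item name and quantity.
data='apple 3
pear 10
fig 7'

# total is never declared or initialized, so it starts at 0 when used as a number.
# label is never assigned, so it is the null string when used as a string.
result=$(printf '%s\n' "$data" | "$AWK" '
    { total += $2; print NR, NF, $1 }
END { print "records:", NR, "total:", total, "unset:[" label "]" }')

printf '%s\n' "$result"
```

The last line of output is records: 3 total: 20 unset:[] which shows both default initial values at work.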
In addition to initializing variables within a program, you can use the assign (-v) option to initialize variables on the command line. This feature is useful when the value of a variable changes from one run of gawk to the next.
By default the input and output record separators are NEWLINE characters. Thus gawk takes each line of input to be a separate record and appends a NEWLINE to the end of each output record. By default the input field separators are SPACEs and TABs. The default output field separator is a SPACE. You can change the value of any of the separators at any time by assigning a new value to its associated variable, either from within the program or from the command line by using the assign (-v) option.
Functions
Table 12-3 lists a few of the functions that gawk provides for manipulating numbers and strings.
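The following sketch exercises a handful of string and numeric functions. The selection (length, substr, index, toupper, int, and split) is an assumption about what is most useful to illustrate and may not match the table exactly; the sample strings and the AWK fallback line are also introduced only for this example.

```shell
# Use gawk when it is installed; otherwise fall back to awk (assumption of this sketch).
AWK=$(command -v gawk || command -v awk)

result=$("$AWK" 'BEGIN {
    s = "gawk utility"
    print length(s)                  # number of characters in s
    print substr(s, 1, 4)            # 4 characters starting at position 1
    print index(s, "util")           # position of "util" in s (0 if absent)
    print toupper(s)                 # uppercase copy of s
    print int(7.9)                   # integer part of a number
    n = split("a:b:c", parts, ":")   # fill parts[1] through parts[n]
    print n, parts[2]
}')

printf '%s\n' "$result"
```

The program displays 12, gawk, 6, GAWK UTILITY, 7, and 3 b, one per line.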
Arithmetic Operators
The gawk arithmetic operators listed in Table 12-4 are from the C programming language.
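As a brief illustration, the sketch below applies several C-style arithmetic operators inside a BEGIN block. The operators shown are standard in gawk, but which of them the table lists is an assumption; the variable names and the AWK fallback line are hypothetical.

```shell
# Use gawk when it is installed; otherwise fall back to awk (assumption of this sketch).
AWK=$(command -v gawk || command -v awk)

result=$("$AWK" 'BEGIN {
    a = 7; b = 2
    print a + b, a - b, a * b   # addition, subtraction, multiplication
    print a / b                 # division yields a floating-point result
    print a % b                 # remainder
    print a ^ b                 # exponentiation
    a++; b += 3                 # increment and add-and-assign
    print a, b
}')

printf '%s\n' "$result"
```

The output is 9 5 14, then 3.5, 1, 49, and finally 8 5. Unlike integer division in C, the / operator here does not truncate.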
Associative Arrays
An associative array is one of gawk's most powerful features. These arrays use strings as indexes. Using an associative array, you can mimic a traditional array by using numeric strings as indexes. You assign a value to an element of an associative array just as you would assign a value to any other gawk variable. The syntax is
    array[string] = value
where array is the name of the array, string is the index of the element of the array you are assigning a value to, and value is the value you are assigning to that element.
You can use a special for structure with an associative array. The syntax is
    for (elem in array) action
where elem is a variable that takes on the index of each element of the array as the for structure loops through them, array is the name of the array, and action is the action that gawk takes for each element in the array. You can use the elem variable in this action, for example as array[elem] to retrieve the value of the element. The "Examples" section found later in this chapter contains programs that use associative arrays.
printf
You can use the printf command in place of print to control the format of the output that gawk generates. The gawk version of printf is similar to that found in the C language. A printf command has the following syntax:
    printf "control-string", arg1, arg2, ..., argn
The control-string determines how printf formats arg1, arg2, ..., argn. These arguments can be variables or other expressions. Within the control-string you can use \n to indicate a NEWLINE and \t to indicate a TAB. The control-string contains conversion specifications, one for each argument. A conversion specification has the following syntax:
    %[-][x[.y]]conv
where - causes printf to left-justify the argument; x is the minimum field width, and .y is the number of places to the right of a decimal point in a number. The conv indicates the type of numeric conversion and can be selected from the letters in Table 12-5. Refer to "Examples" later in this chapter for examples of how to use printf.
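A short sketch of conversion specifications follows. It assumes the s (string), f (floating point), and d (decimal integer) conversion letters, which are standard in gawk; the sample data and the AWK fallback line are hypothetical.

```shell
# Use gawk when it is installed; otherwise fall back to awk (assumption of this sketch).
AWK=$(command -v gawk || command -v awk)

# Hypothetical sample data: part name, weight, count.
data='widget 3.14159 7
bolt 12.5 42'

# %-8s   left-justifies a string in a field at least 8 characters wide
# %6.2f  right-justifies a number in 6 columns with 2 decimal places
# %3d    right-justifies an integer in 3 columns
result=$(printf '%s\n' "$data" | "$AWK" '{ printf "%-8s|%6.2f|%3d\n", $1, $2, $3 }')

printf '%s\n' "$result"
```

Unlike print, printf does not append a NEWLINE on its own, which is why the control-string ends with \n.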
Control Structures
Control (flow) statements alter the order of execution of commands within a gawk program. This section details the if...else, while, and for control structures. In addition, the break and continue statements work in conjunction with the control structures to alter the order of execution of commands. See page 436 for more information on control structures. You do not need to use braces around commands when you specify a single, simple command.
if...else
The if...else control structure tests the status returned by the condition and transfers control based on this status. The syntax of an if...else structure is shown below. The else part is optional.
    if (condition)
            {commands}
    [else
            {commands}]
The simple if statement shown here does not use braces:
    if ($5 <= 5000) print $0
Next is a gawk program that uses a simple if...else structure. Again, there are no braces.
    $ cat if1
    BEGIN   {
            nam="sam"
            if (nam == "max")
                    print "nam is max"
            else
                    print "nam is not max, it is", nam
            }
    $ gawk -f if1
    nam is not max, it is sam
while
The while structure loops through and executes the commands as long as the condition is true. The syntax of a while structure is
    while (condition)
            {commands}
The next gawk program uses a simple while structure to display powers of 2. This example uses braces because the while loop contains more than one statement.
    $ cat while1
    BEGIN   {
            n = 1
            while (n <= 5)
                    {
                    print "2^" n, 2**n
                    n++
                    }
            }
    $ gawk -f while1
    2^1 2
    2^2 4
    2^3 8
    2^4 16
    2^5 32
for
The syntax of a for control structure is
    for (init; condition; increment)
            {commands}
A for structure starts by executing the init statement, which usually sets a counter to 0 or 1. It then loops through the commands as long as the condition is true. After each loop it executes the increment statement. The for1 gawk program displays powers of 2 just as the preceding while1 program does, except that it uses a for statement, which makes the program simpler:
    $ cat for1
    BEGIN   {
            for (n=1; n <= 5; n++)
                    print "2^" n, 2**n
            }
    $ gawk -f for1
    2^1 2
    2^2 4
    2^3 8
    2^4 16
    2^5 32
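A for structure is also a natural way to step through the fields of a record. The sketch below, which is an illustration rather than an example from the text, sums the fields on each input line; the sample data, the variable names i and sum, and the AWK fallback line are assumptions.

```shell
# Use gawk when it is installed; otherwise fall back to awk (assumption of this sketch).
AWK=$(command -v gawk || command -v awk)

# Hypothetical sample data: a varying number of numeric fields per line.
data='1 2 3
10 20
5'

result=$(printf '%s\n' "$data" | "$AWK" '{
    sum = 0
    for (i = 1; i <= NF; i++)   # i runs over the field numbers of this record
        sum += $i               # $i is the value of field number i
    print NR ": " sum
}')

printf '%s\n' "$result"
```

The loop body is a single statement, so it needs no braces. The output is 1: 6, 2: 30, and 3: 5.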
The gawk utility supports an alternative for syntax for working with associative arrays:
    for (var in array)
            {commands}
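This for structure loops through elements of the associative array named array, assigning the value of the index of each element of array to var each time through the loop. For example, the following line displays each index of the manuf array followed by the value stored under that index:
    END     {for (name in manuf) print name, manuf[name]}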
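A complete version of this idea might look like the sketch below, which counts how many input lines name each manufacturer. The sample data and the AWK fallback line are hypothetical. Because the order in which for (var in array) visits the indexes is not specified, the output is piped through sort to make it predictable.

```shell
# Use gawk when it is installed; otherwise fall back to awk (assumption of this sketch).
AWK=$(command -v gawk || command -v awk)

# Hypothetical sample data: manufacturer and model.
data='chevy malibu
ford mustang
chevy impala
ford explorer
volvo s80'

# manuf[$1]++ uses the first field as the index and counts its occurrences.
result=$(printf '%s\n' "$data" | "$AWK" '
    { manuf[$1]++ }
END { for (name in manuf) print name, manuf[name] }' | sort)

printf '%s\n' "$result"
```

After sorting, the output is chevy 2, ford 2, and volvo 1.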
break
The break statement transfers control out of a for or while loop, terminating execution of the innermost loop it appears in.
continue
The continue statement transfers control to the end of a for or while loop, causing execution of the innermost loop it appears in to continue with the next iteration.
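The two statements can be seen together in the following sketch, which is an illustration and not an example from the text. The loop bounds, the variable n, and the AWK fallback line are assumptions.

```shell
# Use gawk when it is installed; otherwise fall back to awk (assumption of this sketch).
AWK=$(command -v gawk || command -v awk)

result=$("$AWK" 'BEGIN {
    for (n = 1; n <= 10; n++) {
        if (n % 2 == 0) continue   # skip even values; the increment still runs
        if (n > 7) break           # leave the loop at the first odd value above 7
        print n
    }
    print "stopped at", n
}')

printf '%s\n' "$result"
```

The program displays 1, 3, 5, and 7, followed by stopped at 9. The final line shows that break left the loop before the condition n <= 10 became false.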