.\"	chapter 3
.nr H1 2
.H 1 "Pattern Recognition and File Comparison Tools"
There are several revision tasks common to all text processing projects.
You will undoubtedly find yourself 
changing a key term, name, or phrase everywhere it 
appears, or locating references to items you need to change or delete.
You may need to compare and contrast multiple versions of your text
in order to locate variations.
You may also need to alter some aspect of the text format
to suit production requirements. 
To do this, you must locate a string--a word, a phrase, 
a text formatting macro or 
any repeated set of characters--and, if necessary, change it globally. 
With the use of the text processing tools discussed in this chapter,
these changes can be made very rapidly and with a high degree of consistency. 
.P
This chapter describes and compares
some of the \*(x1 tools you can use to accomplish these tasks.
All of these tools are common \*(x1 utilities also used by programmers
for searching and editing data and program text. 
As you read, it may become apparent that
several of the programs introduced here can be used interchangeably
in some situations,
and that many of the same tasks can also be done with
the text editors with which you are already familiar.
The emphasis here is on those tools which streamline complicated editing
command procedures, and on manipulating many files at once.
Having a number of alternatives available,
and knowing which one will do the job at hand most efficiently,
can save you a great deal of time and effort. 
.P
The following are discussed in this chapter:
.BL 
.LI 
the
\fBgrep\fR
command prints all lines which match a single specified pattern.
When it is combined with other commands in a shell procedure, and used to 
process many files at once, it becomes an extremely powerful aid in many text 
processing situations.
Two variants of the  
.B grep
family,
the commands
.B egrep
and 
.B fgrep,
are also introduced here. 
.LI
three \*(x1 file comparison utilities: the commands \fBdiff\fR, \fBdiff3\fR, 
and \fBcomm\fR.
All of these utilities have the capacity to
compare two or more files and output those lines which are
different. 
In text processing applications these programs can
be extremely useful for locating variations
between several versions of text very quickly.
.LI 
\fBsed\fR
is a non-interactive, or \*`batch\*', editor which is useful
if you must work with large files or run a complicated sequence of 
editing commands on a file or group of files.
.LI 
the
\fBawk\fR
program offers the added capability to search numerics, logical relations, 
variables, and particular fields within lines of text.
.LE
.P
Note that the members of the \fBgrep\fR family, the \fBawk\fR program, and 
.B sed
have as their basis the same principle of pattern recognition as
.B ed .
In each case, a file is searched for the occurrence of
a given pattern--a character or group of characters,
a word or word string--and a list of locations where
the pattern appears is generated.
.H 2 "Pattern Recognition: The grep Command" 
You will often need to find all occurrences of some word
or pattern in a group of files. 
The
.B grep 
command and its variants,
.B egrep
and
.B fgrep,
are among the most useful tools available to you
for quick searching of multiple files, as well as being fast
and extremely easy to use.
.B Grep
searches for the same \*`regular expressions\*' recognized by 
.B ed.
The word \*`grep\*' stands for
.DS I
g/re/p
.DE
that is, globally locate and print a regular expression.
It does exactly this. 
.B Grep
searches every line in a set of files 
and prints each line that contains the specified regular expression. 
Thus,
.DS I
grep thing  file1 file2 file3 ...
.DE
finds \*`thing\*' wherever it occurs in any of the 
files 
.I file1 ,
.I file2 ,
.I file3 ,
etc.
.B Grep  
also indicates the file in which the line
was found, so that it can be edited later.
.P
.B Grep
becomes even more powerful when it is used as a
\*`filter\*'. 
By combining the use of 
.B grep
with other commands
to generate a shell program to read and transform input,
large quantities of text can be processed through multiple
searching or editing procedures quickly.
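.P
As a minimal sketch of this filter usage, the following shell fragment
passes text through two successive
.B grep
commands.
The file names and their contents are hypothetical, invented only for
this illustration.

```shell
# Hypothetical chapter files, created only for this illustration.
dir=$(mktemp -d)
printf 'The widget is red.\nNothing here.\n' > "$dir/ch1"
printf 'A widget and a gadget.\nAnother widget.\n' > "$dir/ch2"

# Keep the lines that mention "widget", then count those that
# also mention "gadget".
count=$(cat "$dir/ch1" "$dir/ch2" | grep 'widget' | grep -c 'gadget')
echo "$count"

rm -r "$dir"
```

Each command in the pipeline reads the output of the one before it, so
the second search sees only the lines that survived the first.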
.P
The commands 
.B grep, 
.B egrep, 
and 
.B fgrep 
all search one or more files for a 
specified pattern. They are expressed in the following form,
with options as listed below:
.DS I
grep [option] expression filename
  
egrep [option] expression filename 

fgrep [option] string filename
.DE
Commands of the
.B grep 
family search the files you specify, or if no files are specified, the 
standard input, for lines matching a pattern.
Normally, each matching line is copied to the standard output, but if you
are processing great quantities of text, you should redirect the output
to a file in which to store the results of the 
.B grep
search.
Unless the
.B -h
flag is used, the filename is shown if there is more than one input file.
.P
The difference between the three 
.B grep
variants is that
.B grep
patterns are limited regular expressions in the style of
.B ed.
.B Egrep
patterns are full regular expressions;
a faster algorithm is used which requires more space.
.B Fgrep
patterns, on the other hand, are fixed strings rather than regular expressions. 
.B Fgrep 
is faster and more compact.
All three programs recognize the following options, except as noted:
.VL 10
.LI \fB-v\fR
All lines but those matching are printed.
.LI \fB-c\fR
Only a count of matching lines is printed.
.LI \fB-l\fR
The names of files with matching lines are listed (once)
separated by newlines.
.LI \fB-n\fR
Each line is preceded by its line number in the file.
.LI \fB-b\fR
Each line is preceded by the block number on which it was found.
This is sometimes useful in locating disk block numbers by context.
.LI \fB-s\fR
No output is produced, only status.
.LI \fB-h\fR 
Does not print filename headers with output lines.
.LI \fB-y\fR
Alphabetic letters in the pattern will match letters of either 
case in the input (with 
.B grep 
and
.B fgrep
only).
.LI \fB-e\fR
Same as a simple expression argument, but useful when the expression
begins with a \-.
.LI \fB-f\fR 
The regular expression (for
\fBegrep\fR)
or string list (for
\fBfgrep\fR)
is taken from the file named in the next argument.
.LI \fB-x\fR
Only lines matched in their entirety are printed
(for \fBfgrep\fR 
only).
.LE
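.P
The following sketch shows the effect of several of these options on two
small files.
The files
.I one
and
.I two
are hypothetical and exist only for this illustration.

```shell
# Hypothetical sample files, created only for this illustration.
dir=$(mktemp -d)
printf 'alpha\nbeta\nalphabet\n' > "$dir/one"
printf 'gamma\ndelta\n' > "$dir/two"

numbered=$(grep -n 'alpha' "$dir/one")            # matching lines with line numbers
count=$(grep -c 'alpha' "$dir/one")               # only a count of matching lines
inverse=$(grep -v 'alpha' "$dir/one")             # all lines but those matching
listed=$(cd "$dir" && grep -l 'mm' one two)       # names of files with a match
prefixed=$(cd "$dir" && grep 'l' one two)         # several files: names are shown
bare=$(cd "$dir" && grep -h 'l' one two)          # -h drops the filename headers

printf '%s\n' "$numbered" "$count" "$inverse" "$listed" "$prefixed" "$bare"
rm -r "$dir"
```

With more than one input file, each output line is prefixed by its
filename and a colon unless
.B -h
is given.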
.P
Just as with the editors,
you should be cautious in your use of the special characters
($ * [ ^ | ? \' " ( ) and \e) in a regular expression as they are
also meaningful to the shell.
It is safest to enclose the entire expression
argument in single quotes (' ').
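.P
As a small illustration of quoting, the sketch below searches a
hypothetical draft file for lines that consist of the paragraph macro
alone.
The single quotes keep the shell from interpreting the circumflex,
backslash, and dollar sign before
.B grep
sees them.

```shell
# Hypothetical draft file, created only for this illustration.
dir=$(mktemp -d)
printf '.P\nSome text.\n.PP\n.P\n' > "$dir/draft"

# ^ and $ anchor the match to the whole line; \. is a literal period.
macros=$(grep -c '^\.P$' "$dir/draft")
echo "$macros"

rm -r "$dir"
```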
.P
.B Fgrep
searches for lines that contain one of the strings
separated by newlines.
.B Egrep 
on the other hand, accepts the following extended regular expressions.
The term \*`character\*' excludes newline:
.AL 1
.LI
A \e followed by a single character
matches that character.
.LI
A circumflex (^) matches the beginning of a line; a dollar sign ($) matches the end.
.LI
A dot (.) matches any character.
.LI
A single character not otherwise endowed with special
meaning matches itself.
.LI 
A string enclosed in brackets [\|]
matches any single character from the string.
Ranges of ASCII character codes may be abbreviated
as in \*`a\-z0\-9\*'.
A right bracket (]) may occur only as the first character of the string.
A literal dash (\-) must be placed where it cannot be
mistaken as a range indicator.
.LI
A regular expression followed by * (+, ?) matches a sequence
of 0 or more (1 or more, 0 or 1)
matches of the regular expression.
.LI
Two regular expressions concatenated
match a match of the first expression followed by a match of 
the second.
.LI
Two regular expressions separated by a pipe character (|) or a newline
match either a match for the first expression or a match for the
second.
.LI
A regular expression enclosed in parentheses
matches a match for the regular expression.
.LE
.P
The order of precedence of operators
at the same parenthesis level
is [\|], then *+?, then concatenation, then | and
newline.
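.P
The sketch below tries a few of these extended expressions and contrasts
them with a fixed-string search.
It assumes a POSIX
.B grep,
in which the behavior of
.B egrep
and
.B fgrep
is also available as
.B "grep -E"
and
.B "grep -F";
the word list is hypothetical.

```shell
# Hypothetical word list, created only for this illustration.
dir=$(mktemp -d)
printf 'color\ncolour\ngrey sky\ngreen\nabab\na.c\nabc\n' > "$dir/words"

# Alternation (|), an optional character (?), and a bracketed set.
alt=$(grep -E 'colou?r|gr[ae]y' "$dir/words")
# Parentheses group "ab" so that + repeats the pair.
rep=$(grep -E '^(ab)+$' "$dir/words")
# As a fixed string the dot is literal; as a regular expression
# it matches any character.
fixed=$(grep -F -c 'a.c' "$dir/words")
regex=$(grep -c 'a.c' "$dir/words")

printf '%s\n' "$alt" "$rep" "$fixed" "$regex"
rm -r "$dir"
```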
.H 2 "File Comparison: diff, diff3, and comm"
In addition to locating occurrences of particular strings
or regular expressions in your text, you may often find
it useful to compare and contrast two or more similar text files
for variations which are not immediately apparent.
.B Diff
and
.B diff3
not only provide you with a facility for comparing two files rapidly,
but you can also use
.B diff 
to store file versions more compactly.
This is accomplished by storing the output of 
.B diff, 
which would be the differences in that file version,
rather than the file itself.
The 
.B -e
option collects a script of those 
.B ed 
commands (such as append, change, and delete)
which would be necessary to recreate the revised file from the original.
One more comparison tool,
.B comm,
is also discussed here.
.B Comm
is useful primarily for comparing two sorted lists.
.H 3 "Using diff"
To use the 
.B diff 
command to compare two files, use the form:
.DS I
diff [-option] file1 file2
.DE
.B Diff
reports which lines must be changed in two files to bring them
into agreement.
If you give \*`\-\*' instead of the first filename,
.B diff
will read the \*`standard input\*', that is, what you type in.
If one of the two arguments is a directory,
.B diff
uses the file in that directory that has the same name as the other argument.
The normal output contains lines in this format:
.DS I
n1 a n3,n4 
n1,n2 d n3
n1,n2 c n3,n4
.DE
.P
In fact, these lines resemble
.B ed
commands to convert
.I file1
into
.I file2.
The numbers after the letters refer to
.I file2.
By exchanging \*`a\*' for \*`d\*' and reading backward
you can convert
.I file2
back into
.I file1.
In those cases where n1 = n2 or n3 = n4, the pairs 
are abbreviated as a single number.
Following each of these lines are printed all the lines that are
affected in the first file, flagged by \*`<\*', 
then all the lines that are affected in the second file,
flagged by \*`>\*'.
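.P
To make the format concrete, the following sketch compares two small
hypothetical files.
The second file changes line 2 and adds a fifth line.

```shell
# Hypothetical files, created only for this illustration.
dir=$(mktemp -d)
printf 'a\nb\nc\nd\n' > "$dir/file1"
printf 'a\nB\nc\nd\ne\n' > "$dir/file2"

# diff exits with status 1 when the files differ, hence "|| true".
report=$(diff "$dir/file1" "$dir/file2" || true)
printf '%s\n' "$report"

rm -r "$dir"
```

The report reads 2c2 (line 2 changed), followed by the old line flagged
with < and the new line flagged with >, and then 4a5 (a line appended
after line 4 of the first file, appearing as line 5 of the second).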
.P
.B Diff
has the following options:
.VL 10
.LI "\fB-b\fR"
ignores trailing blanks, including spaces and tabs; 
other strings of blanks are considered equal.
.LI "\fB-e\fR"
produces a script of append, change and delete commands for 
.B ed,
which will recreate
.I file2
from
.I file1 .
.LI "\fB-f\fR"
produces a similar script, in the opposite order, which is not useful with 
.B ed .
If you are using the
.B \-e 
option, you can use the following shell program to maintain
multiple versions of a file.
Only an ancestral file ($1) and a chain of 
version-to-version
.B ed
scripts ($2,$3,...) made by
.B diff
need be on hand.
A \*`latest version\*' will appear on
the standard output, or you can redirect it to a file:
.DS I
(shift; cat $*; echo '1,$p') | ed \- $1
.DE
.LI "\fB-h\fR"
does a faster, but half-hearted job.
It works only when changed stretches are short and well separated,
but does work on files of unlimited length.
You cannot use the \fB-e\fR and \fB-f\fR options with \fB-h.\fR
.LE
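.P
As an illustration of the
.B \-e
option, the sketch below prints the script that
.B diff
generates for two small hypothetical files.
It only displays the script and does not run
.B ed.

```shell
# Hypothetical files, created only for this illustration.
dir=$(mktemp -d)
printf 'a\nb\nc\nd\n' > "$dir/file1"
printf 'a\nB\nc\nd\ne\n' > "$dir/file2"

# diff exits with status 1 when the files differ, hence "|| true".
script=$(diff -e "$dir/file1" "$dir/file2" || true)
printf '%s\n' "$script"

rm -r "$dir"
```

The commands are listed from the end of the file toward the beginning
(4a before 2c), so that applying one does not disturb the line numbers
used by the next.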
.sp
Normally,
.B diff
finds the smallest sufficient set of differences between two files.
.H 3 "Using diff3" 
.B Diff3 
works like 
.B diff,
except that it has the capacity to compare three files.
It is stated in the form:
.DS I
diff3 [-option] file1 file2 file3
.DE
.B Diff3
compares three versions of a file,
and reports disagreeing ranges of text
flagged with these codes:
.DS I
====
all three files differ

====1
file1 is different

====2
file2 is different

====3
file3 is different
.DE
The change which has occurred in converting a given range
of a given file to some other is
indicated as follows:
.VL 20
.LI "f : n1 a"
Text is to be appended after line number n1 in file \fIf\fR, where
\fIf\fR is 1, 2 or 3.
.LI "f : n1 , n2 c"
Text is to be changed in the range line n1 to line n2.
If n1 = n2, the range may be abbreviated to n1.
.LE
.P
The original contents of the range follow immediately
after a \*`c\*' indication.
When the contents of two files are identical, the contents of the lower-numbered
file are suppressed.
.P
As in the case of 
.B diff,
.B diff3
with the \*`-e\*' option publishes a script for 
.B ed
that will incorporate into
.I file1
all changes between
.I file2
and
.I file3 .
In other words, it records
the changes that normally would be flagged \*`====\*' and \*`====3\*'.
Two further options restrict the script: 
.B \-x
produces a script to incorporate only changes flagged \*`====\*', and
.B \-3
only changes flagged \*`====3\*'.
.H 3 "Comm"
The
.B comm 
program selects or rejects lines common to two sorted files.
It is expressed in the form:
.DS I
comm [-option] file1 file2
.DE
.B Comm
reads
.I file1
and
.I file2,
and produces a three-column output: lines only in
.I file1,
lines only in
.I file2,
and lines in both files.
Ordinarily, the lines
should be sorted in ASCII collating sequence,
a process which can be carried out
using the program
.B sort
before using 
.B comm.
Since
.B sort
writes to the standard output, save each sorted copy in a new file:
.DS I
sort filename > newfile
.DE
Do this for each of the files to be compared with
.B comm.
As in 
.B diff
and its variants,
if you give \*`-\*' instead of a filename, 
.B comm
presumes the standard input.
.P
The possible options with 
.B comm 
are the
flags 1, 2, or 3, which suppress printing of the corresponding
column. 
Thus
.B comm
with \*`-12\*'
prints only the lines common to the two files;
.B comm 
\*`-23\*' prints only lines in the first file but not in the second.
Obviously,
.B comm -123
would print no lines.
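.P
A minimal sketch of the whole procedure follows.
The two word lists are hypothetical, and the locale is set to C here (an
assumption of this example) so that both
.B sort
and
.B comm
use the ASCII collating sequence.

```shell
# Hypothetical word lists, created only for this illustration.
dir=$(mktemp -d)
printf 'pear\napple\nfig\n' | LC_ALL=C sort > "$dir/a"
printf 'fig\nkiwi\napple\n' | LC_ALL=C sort > "$dir/b"

both=$(LC_ALL=C comm -12 "$dir/a" "$dir/b")     # lines common to both files
only_a=$(LC_ALL=C comm -23 "$dir/a" "$dir/b")   # lines only in the first file
only_b=$(LC_ALL=C comm -13 "$dir/a" "$dir/b")   # lines only in the second file

printf '%s\n' "$both" "$only_a" "$only_b"
rm -r "$dir"
```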
.H 2 "Using sed"
The
.B sed
program is a non-interactive, or batch, editor which is especially useful 
when either the files to be edited are too large, or the sequence
of editing commands is too complex, to be handled interactively.
.B Sed
works on only a few lines of input at a time and does not use temporary
files,
so the only limit on the size of the files you can process is that both the
input and output must be able to fit simultaneously on your disk.
You can apply multiple \*`global\*' editing functions to your text 
in one pass.
Since you can create complicated editing scripts and submit them to
.B sed
as a command file,
you can save yourself considerable retyping and the possibility of making
errors.
You can also save and re-use command files which perform editing operations
you need to repeat frequently. 
Processing files with 
.B sed
command files is more efficient than using
.B ed,
even if you prepare a pre-written script. 
Note, however, that
.B sed
lacks relative addressing because it processes a file one line at a time.
Also, like any batch editing facility,
.B sed
gives you no immediate verification that a command has altered your text in 
the way you actually intended.
Check your output carefully.
.P
The
.B sed
program is derived from  
.B ed,
although there are considerable differences between the two, 
resulting from the different characteristics of interactive
and batch operation.
You will notice, however, a striking resemblance 
in the class of regular expressions they recognize;
the code for matching patterns is nearly identical for
.B ed
and
.B sed.
.H 3 "Overall Operation"
By default,
.B sed
copies the standard input to the standard output,
performing one or more editing commands on each
line before writing it to the output.
Typically, you will need to specify the file or files you are processing,
along with the name of the command file which contains your editing script,
as in the following:
.DS I
sed -f script filename
.DE
The flags are optional. The 
.B -n
flag tells
.B sed
to copy only those lines specified by
.B p
functions
or
.B p
flags after
.B s
functions.
The
.B -e
flag tells 
.B sed 
to take the next argument as an editing command, and
the
.B -f
flag tells 
.B sed
to take the next argument as a file name. 
(This file must contain editing commands, one to a line.)
.P
The general format of a 
.B sed
editing command is:
.DS I
address1,address2 function arguments
.DE
In any command, one or both addresses may be omitted. 
A function is always required, but an
argument is optional for some functions.
Any number of blanks or tabs may separate the addresses
from the function, and tab characters and spaces at the beginning 
of lines are ignored.
.P
Three flags are recognized on the command line:
.VL 10
.LI "\fB-n\fR"
directs
.B sed
to copy only those lines specified by
.B p
functions or
.B p
flags after 
.B s
functions.
.LI "\fB-e\fR"
indicates that the next argument is an editing command.
.LI "\fB-f\fR"
indicates that the next argument is the name of the 
file which contains editing commands, typed one to a line.
.LE
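.P
The three flags can be compared side by side in the following sketch.
The input file and the command file are hypothetical and are created
only for this illustration.

```shell
# Hypothetical input and command file, created only for this illustration.
dir=$(mktemp -d)
printf 'one\ntwo\nthree\n' > "$dir/in"
printf 's/one/1/\ns/two/2/\n' > "$dir/script"

from_file=$(sed -f "$dir/script" "$dir/in")                  # commands read from a file
inline=$(sed -e 's/one/1/' -e 's/two/2/' "$dir/in")          # the same commands given with -e
quiet=$(sed -n -e '/t/p' "$dir/in")                          # -n: only lines printed by p appear

printf '%s\n' "$from_file" "$inline" "$quiet"
rm -r "$dir"
```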
.P
.B Sed
commands are applied one at a time, generally in the order they are
encountered, unless you change this order with one of the \*`flow-of-control\*'
functions discussed below.
.B Sed
works in two phases: first compiling the editing commands in the order they 
are given, 
then applying the commands one by one to each line of the input file.
.P
The input to each command is the output of all preceding commands.
Even if you change this default order of applying commands with 
one of the two flow-of-control commands, 
.B t
and
.B b , 
the input line to any command is still the output of any previously applied 
command.
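.P
A short sketch makes the point: the same two substitutions give
different results depending on their order, because the second command
sees the line as the first one left it.

```shell
# The second command acts on the output of the first.
forward=$(printf 'cat\n' | sed -e 's/cat/dog/' -e 's/dog/bird/')
# In the opposite order, "dog" is not yet present when s/dog/bird/ runs.
reverse=$(printf 'cat\n' | sed -e 's/dog/bird/' -e 's/cat/dog/')

printf '%s\n' "$forward" "$reverse"
```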
.P
You should also note that the range of pattern match is normally one line
of input text. 
This range is called \*`the pattern space\*'.
More than one line can be read into the pattern space by using the
.B N
command described below in \*`Multiple Input-Line Functions\*'.
.P
The rest of this section discusses the principles of 
.B sed
addressing, followed by a description of 
.B sed 
functions.
All the examples here are based on the following lines from Samuel Taylor 
Coleridge's poem, \*`Kubla Khan\*':
.DS I
In Xanadu did Kubla Khan
A stately pleasure dome decree:
Where Alph, the sacred river, ran
Through caverns measureless to man 
Down to a sunless sea.
.DE
For example, the command
.DS I
2q
.DE
will quit after copying the first two lines of the input.
Since we are borrowing the sample text from Coleridge, the result will be:
.DS I
In Xanadu did Kubla Khan
A stately pleasure dome decree:
.DE
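.P
To try this and the later examples yourself, the sample text can be
saved in a file and the command given on the command line, as in this
sketch.
The file name
.I poem
is an assumption of this example.

```shell
# Save the sample text in a temporary file (the name "poem" is arbitrary).
dir=$(mktemp -d)
cat > "$dir/poem" <<'EOF'
In Xanadu did Kubla Khan
A stately pleasure dome decree:
Where Alph, the sacred river, ran
Through caverns measureless to man
Down to a sunless sea.
EOF

# Copy lines to the output and quit after the second one.
first_two=$(sed 2q "$dir/poem")
printf '%s\n' "$first_two"

rm -r "$dir"
```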
.H 3 "Addresses"
The following rules apply to addressing in 
.B sed.
There are two ways to select the lines in the input file to which editing 
commands are to be applied: line numbers or \*`context addresses\*'.
Context addresses correspond to \*`regular expressions\*'.
The application of a group of commands can be controlled by
one address, or an address pair, by grouping
the commands with curly braces ({ }).
There may be 0, 1, or 2 addresses specified, depending on the 
command.
The maximum number of addresses possible for each command is indicated. 
.P
A line number is a decimal integer.
As each line is read from the input file, a line-number counter
is incremented.
A line-number address matches the input
line that causes the internal counter to equal the address line number.
The counter runs cumulatively through multiple input files;
it is not reset when a new input file is opened.
A special case is the dollar sign character \*`$\*',
which matches the last line of the last input file.
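.P
The cumulative counter can be seen in the following sketch, which
addresses lines across two hypothetical two-line files.

```shell
# Hypothetical input files, created only for this illustration.
dir=$(mktemp -d)
printf 'a\nb\n' > "$dir/f1"
printf 'c\nd\n' > "$dir/f2"

# Line 3 overall is the first line of the second file.
third=$(sed -n '3p' "$dir/f1" "$dir/f2")
# $ addresses the last line of the last file only.
last=$(sed -n '$p' "$dir/f1" "$dir/f2")

printf '%s\n' "$third" "$last"
rm -r "$dir"
```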
.P
Context addresses are enclosed in slashes (/).
They include all the regular expressions common to both
.B ed
and
.B sed:
.AL 1
.LI
An ordinary character is a regular expression and matches itself.
.LI
A circumflex (^) at the beginning of a regular expression matches the
null character at the beginning of a line.
.LI
A dollar sign ($) at the end of a regular expression matches the null
character at the end of a line.
.LI
The characters \*`\en\*' match an embedded newline character, but not the
newline at the end of a pattern space.
.LI
A period (.) matches any character except the terminal newline of the
pattern space.
.LI
A regular expression followed by an asterisk (*) matches any number, including
0, of adjacent occurrences of the regular expression it follows.
.LI
A string of characters in square brackets ([]) matches any character in the
string, and no others.
If, however, the first character of the string is a circumflex (^),
the regular expression matches any character except the characters in the
string and the terminal newline of the pattern space.
.LI
A concatenation of regular expressions is a regular expression which
matches the concatenation of strings matched by the components of the
regular expression.
.LI
A regular expression between the sequences \*`\e(\*' and \*`\e)\*' is 
identical in effect to the unadorned regular expression, but has side-effects
which are described under the
.B s
command, and in the following specification.
.LI
The expression \*`\ed\*' means the same string of characters matched by an
expression enclosed in \*`\e(\*' and \*`\e)\*' earlier in the same pattern.
Here \*`d\*' is a single digit; the string specified is that beginning with
the \fId\fRth occurrence of \*`\e(\*' counting from the left.
For example, the expression \*`^\e(.*\e)\e1\*' matches a line beginning with
two repeated occurrences of the same string.
.LI
The null regular expression standing alone is equivalent to the last
regular expression compiled. 
.LE
.P
For a context address to \*`match\*' the input, 
the whole pattern within the address must match some
portion of the pattern space.
If you want to use one of the special characters literally, that is, to 
match an occurrence of itself in the input file, precede the character with
a backslash (\e) in the command.
.P
Each
.B sed
command can have 0, 1, or 2 addresses.
The maximum number of allowed addresses is included with its description.
A command with no addresses specified is applied to every line
in the input.
If a command has one address, it is applied to all
lines which match that address.
On the other hand,
if two addresses are specified, the command is applied to the first
line which matches the first address, and to all subsequent lines
until and including the first subsequent line which matches
the second address.
An attempt is made on subsequent lines to again match the first
address, and the process is repeated.
Two addresses are separated by a comma.
Here are some examples:
.DS I
.nf
/an/    	matches lines 1, 3, 4 in our sample text
/an.*an/	matches line 1
/^an/   	matches no lines
/./             matches all lines
/r*an/  	matches lines 1, 3, 4 (number = zero!)
.fi
.DE
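.P
These matches can be checked with the sketch below.
It uses the
.B =
function, which prints the line number of each addressed line, and the
.B p
function under the
.B -n
flag; the file name
.I poem
is an assumption of this example.

```shell
dir=$(mktemp -d)
cat > "$dir/poem" <<'EOF'
In Xanadu did Kubla Khan
A stately pleasure dome decree:
Where Alph, the sacred river, ran
Through caverns measureless to man
Down to a sunless sea.
EOF

numbers=$(sed -n '/an/=' "$dir/poem")          # line numbers matched by /an/
double=$(sed -n '/an.*an/p' "$dir/poem")       # a line with two occurrences of "an"
none=$(sed -n '/^an/p' "$dir/poem")            # no line begins with "an"
range=$(sed -n '/Alph/,/man/p' "$dir/poem")    # an address pair selects a range

printf '%s\n' "$numbers" "$double" "$range"
rm -r "$dir"
```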
.H 3 "Functions"
All 
.B sed
functions are named by a single character.
They are of the following types:
.BL
.LI
whole-line oriented functions add, delete, and change whole text
lines
.LI
substitute functions search and substitute regular expressions
within a line
.LI
input-output functions read and write lines and/or files
.LI
multiple input-line functions match patterns that
extend across line boundaries
.LI
hold and get functions save and retrieve input text for later use
.LI
flow-of-control functions control the order of application of functions
.LI
miscellaneous functions
.LE
.HU "Whole-line Oriented Functions"
.VL 10 
.LI "\fBd\fR" 
The
.B d
function deletes from the file all lines matched by its addresses.
No further commands will be executed on a deleted line.
As soon as the \fBd\fR function is executed, a new line is read from the 
input, and the list of editing commands is restarted from the beginning
on the new line.
The maximum number of addresses is two.
.sp
.LI "\fBn\fR" 
The \fBn\fR function reads the next line from the input, replacing the
current line; the current line is first written to the output, unless
output has been suppressed with the \fB-n\fR flag.
The list of editing commands is continued following the 
\fBn\fR command.
The maximum number of addresses is two.
.LI "\fBa\fR"
The \fBa\fR function causes the argument--which is the text to be inserted--
to be written to the output after the line matched by its address.
The \fBa\fR command is inherently multiline;
\fBa\fR must appear at the end of a line. The text may contain
any number of lines.
The interior newlines must be hidden by a
backslash character (\e) immediately preceding the
newline.
The text argument is terminated by the first unhidden
newline, the first one not immediately preceded
by backslash.
Once an \fBa\fR
function is successfully executed, the text will be
written to the output regardless of what later commands do to
the line which triggered it,
even if the line is subsequently deleted.
The text is not scanned for address matches, and no editing
commands are attempted on it,
nor does it cause any change in the line-number counter.
Only one address is possible.
.sp
.LI "\fBi\fR"
The \fBi\fR function  followed by a text argument is the same as the \fBa\fR
function, except that the text is written to the output before the matched line.
It has only one possible address.
.sp
.LI "\fBc\fR"
The \fBc\fR function deletes the lines selected by its addresses,
and replaces them with the lines in the text.
Like the \fBa\fR and \fBi\fR commands, \fBc\fR 
must be followed by a newline hidden with a backslash;
interior newlines in the text must be hidden by
backslashes.
The \fBc\fR command may have two addresses, and therefore select a range
of lines.
If it does, all the lines in the range are deleted, but only
one copy of the text is written to the output, not
one copy per line deleted.
As in the case of \fBa\fR and \fBi\fR,
the text is not scanned for address matches, and no
editing commands are attempted on it.
It does not change the line-number counter.
After a line has been deleted by a \fBc\fR function, no further commands are 
attempted on it.
If text is appended after a line by \fBa\fR or \fBr\fR
functions, and the line is subsequently changed, the text
inserted by the \fBc\fR function will be placed before the text of the
\fBa\fR or \fBr\fR functions.
.LE
.P
Note that when you insert text in the output with these functions,
leading blanks and tabs will disappear in all 
.B sed
commands.
To get leading blanks and tabs into the output, precede the first
desired blank or tab by a backslash; the backslash will not
appear in the output.
.P
For example,
the list of editing commands:
.DS I
n
a\e
XXXX
d
.DE
applied to our standard input, produces:
.DS I
In Xanadu did Kubla Khan
XXXX
Where Alph, the sacred river, ran
XXXX
Down to a sunless sea.
.DE
In this particular case,
the same effect would be produced by either
of the two following command lists:
.DS I
.nf
n		n
i\e		c\e
XXXX    	XXXX
d
.fi
.DE
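.P
The first command list above can be run as in the following sketch,
which writes the commands to a command file and applies it with the
.B -f
flag.
The file names
.I poem
and
.I script
are assumptions of this example.

```shell
dir=$(mktemp -d)
cat > "$dir/poem" <<'EOF'
In Xanadu did Kubla Khan
A stately pleasure dome decree:
Where Alph, the sacred river, ran
Through caverns measureless to man
Down to a sunless sea.
EOF

# The quoted here-document keeps the backslash after "a" intact.
cat > "$dir/script" <<'EOF'
n
a\
XXXX
d
EOF

result=$(sed -f "$dir/script" "$dir/poem")
printf '%s\n' "$result"
rm -r "$dir"
```

On each cycle,
.B n
writes the current line and reads the next one,
.B a
queues the text XXXX, and
.B d
deletes the line just read; the queued text is still written.
On the last line there is no further input for
.B n
to read, so the line is written and processing stops.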
.HU "Substitute Functions"
The substitute function changes parts of lines selected by
a context search within the line, as in:
.DS I
(2)s<pattern><replacement><flags> -- substitute
.DE
The \fBs\fR function replaces part of a line selected by the designated 
pattern with the replacement pattern.
The pattern argument contains a pattern, exactly like the patterns in addresses.
The only difference between a pattern and a context address is
that the context address must be delimited by slash (/) characters.
A pattern argument may be delimited by any character other than space or
newline.
By default, only the first string matched by the pattern is replaced,
except when the \fBg\fR flag is used.
.P
The replacement argument begins immediately after the
second delimiting character of the pattern, and must be followed
immediately by another instance of the delimiting character.
The replacement is not a pattern, and the characters which are special in 
patterns do not have special meaning in replacement.
Instead, the following characters are special:
.VL 10
.LI "\fB&\fR" 
is replaced by the string matched by the pattern.
.LI "\fB\ed\fR"
\fId\fR is a single digit which is replaced by the \fId\fRth substring
matched by parts of the pattern enclosed in \*`\e\(\*' and \*`\e\)\*'.
If nested substrings occur in the pattern, the \fId\fRth
substring is determined by counting opening delimiters \*`\e\(\*'.
.LE
.P
As in patterns, special characters may be made
literal by preceding them with a backslash (\e).
.P
A flags argument may contain the following:
.VL 10
.LI "\fBg\fR" 
substitutes the replacement for all non-overlapping
instances of the pattern in the line.
After a successful substitution, the scan for the next
instance of the pattern begins just after the end of the
inserted characters; characters put into the line from
the replacement are not rescanned.
.LI "\fBp\fR" 
prints the line if a successful replacement was done.
The \fBp\fR flag causes the line to be written to the output if and only
if a substitution was actually made by the \fBs\fR function.
Notice that if several \fBs\fR functions, each followed by a
\fBp\fR flag, successfully substitute in the same input line,
multiple copies of the line will be written to the
output: one for each successful substitution.
.LI "\fBw\fR <filename>" 
writes the line to a file if a successful replacement was done.
The \fBw\fR  
flag causes lines which are actually substituted by the
.B s
function to be written to the named file.
If the filename existed before
.B sed
is run, it is overwritten;
if not, the file is created.
A single space must separate 
.B w
and the filename.
The possibilities of multiple, somewhat different copies of
one input line being written are the same as for \fBp\fR.
A combined maximum of ten different file names may be mentioned after
\fBw\fR flags and \fBw\fR functions.
.LE
.P
Here are some examples.
The following command, applied to our standard input,
.DS I
s/to/by/w changes
.DE
produces, on the standard output
.DS I
In Xanadu did Kubla Khan
A stately pleasure dome decree:
Where Alph, the sacred river, ran
Through caverns measureless by man
Down by a sunless sea.
.DE
and, on the file 
.I changes:
.DS I
Through caverns measureless by man
Down by a sunless sea.
.DE
The command
.DS I
s/[.,;?:]/*P&*/gp
.DE
produces, when
.B sed
is run with the
.B -n
flag so that only the lines printed by the
.B p
flag appear:
.DS I
A stately pleasure dome decree*P:*
Where Alph*P,* the sacred river*P,* ran
Down to a sunless sea*P.*
.DE
.P
To illustrate the effect of the
.B g
flag,
the command
.DS I
/X/s/an/AN/p
.DE
produces (again with the
.B -n
flag):
.DS I
In XANadu did Kubla Khan
.DE
and the command
.DS I
/X/s/an/AN/gp
.DE
produces:
.DS I
In XANadu did Kubla KhAN
.DE
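.P
The substitution examples can be reproduced with the sketch below.
The file names
.I poem
and
.I changes
are assumptions of this example, and the
.B -n
flag is used wherever only the lines printed by the
.B p
flag are wanted.

```shell
dir=$(mktemp -d)
cat > "$dir/poem" <<'EOF'
In Xanadu did Kubla Khan
A stately pleasure dome decree:
Where Alph, the sacred river, ran
Through caverns measureless to man
Down to a sunless sea.
EOF

# Mark every punctuation character; & stands for the matched string.
marked=$(sed -n 's/[.,;?:]/*P&*/gp' "$dir/poem")
# Without g only the first match on the line is replaced; with g, all of them.
first=$(sed -n '/X/s/an/AN/p' "$dir/poem")
every=$(sed -n '/X/s/an/AN/gp' "$dir/poem")
# The w flag also copies each substituted line to the named file.
sed "s/to/by/w $dir/changes" "$dir/poem" > /dev/null
logged=$(cat "$dir/changes")

printf '%s\n' "$marked" "$first" "$every" "$logged"
rm -r "$dir"
```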
.HU "Input-Output Functions"
.VL 10
.LI "\fBp\fR"
The print function writes the addressed lines to the standard output file
at the time the \fBp\fR 
function is encountered, regardless of what succeeding
editing commands may do to the lines.
The maximum number of possible addresses is two.
.LI "\fBw\fR" 
The write function writes the addressed lines to the file named
by <filename>.
If the file previously existed, it is overwritten; if not, it is created.
The lines are written exactly as they exist when the write function
is encountered for each line, regardless of what subsequent
editing commands may do to them.
Exactly one space must separate the \fBw\fR command  
and the filename.
A maximum of ten different files may be mentioned in write
functions and \fBw\fR 
flags after \fBs\fR functions combined.
.LI "\fBr\fR" 
The read function reads the contents of the named file, and appends
them after the line matched by the address.
The file is read and appended regardless of what subsequent
editing commands do to the line which matched its address.
If \fBr\fR and \fBa\fR functions are executed on the same line,
the text from the \fBa\fR functions and the
\fBr\fR functions is written to the output in the order that
the functions are executed.
Exactly one space must separate the \fBr\fR 
and the filename.
One address is possible.
If a file mentioned by an \fBr\fR 
function cannot be opened, it is considered a null file rather than
an error, and no diagnostic is given.
.LE
.P
Since there is a limit to the number of files that can be opened
simultaneously, be sure that no more than ten
files are mentioned in \fBw\fR 
functions or flags; that number is reduced by one if any
\fBr\fR functions are present.
Only one read file is open at one time.
.P
Here are some examples.
Assume that the file 
.I note1
has the following contents:
.DS I
Note:  Kubla Khan (more properly Kublai Khan; 
1216-1294) was the grandson and most eminent
successor of Genghiz (Chingiz) Khan, and
founder of the Mongol dynasty in China.
.DE
Then the following command:
.DS I
/Kubla/r note1
.DE
produces:
.DS I
In Xanadu did Kubla Khan
Note:  Kubla Khan (more properly Kublai Khan; 
1216-1294) was the grandson and most eminent
successor of Genghiz (Chingiz) Khan, and
founder of the Mongol dynasty in China.
A stately pleasure dome decree:
Where Alph, the sacred river, ran
Through caverns measureless to man
Down to a sunless sea.
.DE
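.P
The \fBw\fR function can be sketched in the same way.
The following is only an illustration: it assumes the standard example
is stored in a file called
.I poem
and uses
.I khan.out
as the output file; both names are invented here.
The \fB-n\fR flag suppresses the normal output, so only the file written
by \fBw\fR receives the matching line:
.DS I
cat > poem <<EOF
In Xanadu did Kubla Khan
A stately pleasure dome decree:
Where Alph, the sacred river, ran
Through caverns measureless to man
Down to a sunless sea.
EOF
sed -n '/Khan/w khan.out' poem
cat khan.out
.DE
The final
.B cat
should show the single line \*`In Xanadu did Kubla Khan\*'.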
.HU "Multiple Input-Line Functions"
.br
Three functions, all spelled with capital letters, deal
specially with pattern spaces containing embedded newlines.
They are intended principally to provide pattern matches across
lines in the input.
.VL 10
.LI "\fBN\fR"
Appends the next input line to the current line in the
pattern space; the two input lines are separated by an embedded
newline.
Pattern matches may extend across the embedded newline(s).
There is a maximum of two addresses.
.LI "\fBD\fR" 
Deletes up to and including the first newline character
in the current pattern space.
If the pattern space becomes empty (the only newline
was the terminal newline),
another line is read from the input.
In any case, begin the list of editing commands again
from its beginning.
The maximum number of addresses is two.
.LI "\fBP\fR" 
Prints up to and including the first newline in the pattern space.
The maximum number of addresses is two.
.LE
.P
The \fBP\fR and \fBD\fR functions are equivalent to their lowercase 
counterparts if there are no embedded newlines in the pattern space.
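.P
As a rough sketch of how \fBN\fR and \fBP\fR work together, the
following commands print only the first line of each pair of input lines.
The file name
.I lines
and its contents are invented for the illustration, and the \fB-n\fR
flag is used to suppress the normal output:
.DS I
cat > lines <<EOF
one
two
three
four
EOF
sed -n -e N -e P lines
.DE
In each cycle, \fBN\fR appends the following line to the pattern space and
\fBP\fR prints the pattern space up to the embedded newline, so the
output should consist of the lines \*`one\*' and \*`three\*'.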
.HU "Hold and Get Functions"
These functions save and retrieve part of the input for possible later
use:
.VL 10
.LI "\fBh\fR"
The \fBh\fR function copies the contents of the pattern space
into a holding area, destroying any previous contents of the
holding area.
The maximum number of addresses is two.
.LI "\fBH\fR" 
The \fBH\fR function appends the contents of the pattern space
to the contents of the holding area. 
The former and new contents are separated by a newline.
.LI "\fBg\fR" 
The \fBg\fR function copies the contents of the holding area into
the pattern space destroying the previous contents of the
pattern space.
.LI "\fBG\fR"
The \fBG\fR function appends the contents of the holding area to the
contents of the pattern space. 
The former and new contents are separated by a newline.
The maximum number of addresses is two.
.LI "\fBx\fR"
The exchange command interchanges the contents
of the pattern space and the hold area.
The maximum number of addresses is two.
.LE
.P
For example, the commands
.DS I
1h
1s/ did.*//
1x
G
s/\en/  :/
.DE
applied to our standard example, produce:
.DS I
In Xanadu did Kubla Khan  :In Xanadu
A stately pleasure dome decree:  :In Xanadu
Where Alph, the sacred river, ran  :In Xanadu
Through caverns measureless to man  :In Xanadu
Down to a sunless sea.  :In Xanadu
.DE
.HU "Flow-of-Control Functions"
.br
These functions do no editing on the input
lines, but control the application of functions
to the lines selected by the address part.
.VL 10
.LI "\fB!\fR"
This command causes the next command
written on the same line to be applied to all and only those input lines
not selected by the address part.
There are two possible addresses.
.LI "\fB{\fR" 
The grouping command \fB{\fR causes the next set of commands to be applied
or not applied as a block to the input lines selected by the addresses
of the grouping command.
The first of the commands under control of the grouping
may appear on the same line as the \fB{\fR or on the next line.
The group of commands is terminated by a matching \fB}\fR standing on a line 
by itself.
Groups can be nested and may have two addresses.
.LI "\fB:<label>\fR"
The label function marks a place in the list
of editing commands which may be referred to by
\fBb\fR and \fBt\fR functions.
The <label> may be any sequence of eight or fewer characters;
if two different colon functions have identical labels,
an error message will be generated, and no execution attempted.
.LI "\fBb<label>\fR"
The branch function causes the sequence of editing commands being
applied to the current input line to be restarted immediately
after the colon function with the same label.
If no colon function with the same label can be found after
all the editing commands have been compiled, an error message 
is produced, and no execution is attempted.
A \fBb\fR function with no label is interpreted as a branch to the end of the
list of editing commands.
Whatever should be done with the current input line is done, and
another input line is read; the list of editing commands is restarted from the
beginning on the new line.
Two addresses are possible.
.LI "\fBt<label>\fR" 
The \fBt\fR function tests whether any
successful substitutions have been made on the current input line.
If so, it branches to the label; if not, it does nothing.
The flag which indicates that a successful substitution has
been executed is reset either by reading a new input line, or
executing a \fBt\fR function.
.LE
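.P
The label and test functions can be combined into a loop.
The sketch below is only an illustration; the script name
.I squeeze.sed
and the label \*`again\*' are invented.
It replaces two adjacent blanks with one, and branches back
to try again for as long as the substitution succeeds:
.DS I
cat > squeeze.sed <<'EOF'
:again
s/  / /
t again
EOF
echo "a    b  c" | sed -f squeeze.sed
.DE
When no pair of blanks remains, the \fBs\fR function fails, \fBt\fR
does not branch, and the line should be written out as \*`a b c\*'.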
.HU "Miscellaneous Functions"
.br
There are two other functions of 
.B sed
not discussed above.
.VL 15 10
.LI "\fB=\fR" 
The 
.B = 
function writes to the standard output the line number of the
line matched by its address.
One address is possible.
.LI "\fBq\fR"
The
.B q
function causes the current line to be written to the
output (if it should be), any appended or read text to be written, and
execution to be terminated.
One address is possible.
.LE
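.P
Both functions are often given directly on the command line.
As a sketch, assume the standard example is kept in a file named
.I poem
(a name invented for this illustration):
.DS I
cat > poem <<EOF
In Xanadu did Kubla Khan
A stately pleasure dome decree:
Where Alph, the sacred river, ran
Through caverns measureless to man
Down to a sunless sea.
EOF
sed 2q poem
sed -n '/Alph/=' poem
.DE
The first command should print the first two lines of the file and then stop;
the second should print only the number 3, the number of the line
containing \*`Alph\*'.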
.H 2 "Using awk"
By now you are familiar with several tools for 
locating patterns and strings in one or more text files, including
.B grep
and its variants.
You are also familiar with global search techniques in
the editors
.B ed
and
.B ex,
as well as the batch editing capabilities of
.B sed .
.B Awk
offers another approach to many of these same tasks.
.B Awk
is actually a programming language designed to make
many common searching and text manipulation tasks
easy to state and to perform.
Although it may take more effort to learn and use,
.B awk
offers several key features not available with
.B grep
or 
.B sed .
These include the capability to do numeric processing, handle variables, 
make general selections, and control flow in commands.
.B Awk
is unique among these tools in providing a way to access fields within lines.
.P
In practice,
.B awk 
is used in two general ways: to generate reports,
processing an input to extract counts, sums, subtotals, and so on;
and to transform data from the form produced by one program into that expected 
by another.
.B Awk
searches input lines consecutively for a match of any patterns
which you designate.
For each pattern, an action can be specified;
this action will be performed on each line that matches the pattern.
.B Awk
recognizes more general patterns than 
.B grep ,
and the possible actions are more complex than merely
printing the matching line.
For example, the
.B awk
program
.DS I
{print $3, $2}
.DE
prints the third and second columns of a table in that order.
The program
.DS I
$2 ~/A|B|C/
.DE
prints all input lines with an A, B, or C in the second field.
The program
.DS I
$1 != prev { print; prev = $1 }
.DE
prints all lines in which the first field is different
from what was previously the first field.
.H 3 "How to Invoke awk"
The command in the following form:
.DS I
awk program filename
.DE
executes the
.B awk
statements in the string \fIprogram\fR on the set of named files,
or on the standard input if no files are named.
The statements can also be placed in a file
.I pfile ,
and executed by the command:
.DS I
awk  -f pfile filename
.DE
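.P
When the program is typed on the command line, it is normally enclosed
in single quotes, so that the shell passes it to
.B awk
as one argument and leaves characters such as \*`$\*' alone.
The following sketch shows both forms of invocation; the file
.I table
and its contents are invented for the illustration:
.DS I
cat > table <<EOF
alpha 10 x
beta 20 y
EOF
awk '{ print $3, $2 }' table
echo '{ print $3, $2 }' > pfile
awk -f pfile table
.DE
Each
.B awk
command should print \*`x 10\*' and \*`y 20\*' on separate lines.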
.H 3 "Program Structure"
An
.B awk
program is a sequence of statements in the form:
.DS I
pattern	{ action }
pattern	{ action }
...
.DE
Each line of input is matched against each of the patterns in turn.
For each pattern matched, the associated action is executed.
When all the patterns have been tested, the next line
is read and the matching process repeated.
Either the pattern or the action may be omitted, but not both.
If there is no action for a pattern,
the matching line is simply copied to the output.
Thus a line which matches several patterns can be printed several times.
If there is no pattern for an action,
then the action is performed for every input line.
A line which matches no pattern is ignored.
Since patterns and actions are both optional, actions must be enclosed in braces
to distinguish them from patterns.
.H 3 "Records and Fields "
.B Awk
input is divided into \*`records\*' which are terminated by a record separator.
Because the default record separator is a newline,
.B awk
processes its input one line at a time.
The number of the current record is available in a variable
named NR,
for \*`number of records\*'.
.P
Each input record is divided into \*`fields\*'.
Fields are normally separated by
white space, either blanks or tabs, but
the input field separator may be changed as described below.
Fields are referred to as \*`$1\*', \*`$2\*',
and so forth, where $1 is the first field, and $0
is the whole input record itself.
Assignments may be made to fields.
The number of fields in the current record
is available in a variable named NF,
for \*`number of fields\*'.
.P
The variables FS and RS
refer to the input field and record separators;
they may be changed at any time to any single character.
The optional command-line argument
.B -Fc
may also be used to set FS to the character \*`c\*'.
If the record separator is empty,
an empty input line is taken as the record separator,
and blanks, tabs and newlines are treated as field separators.
The variable FILENAME
contains the name of the current input file.
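.P
As a brief sketch of these variables, the first command below splits
a colon-separated line with \fB-F\fR, and the second sets RS to the
empty string so that each group of lines is read as one record.
The input data and the file name
.I cards
are invented; the BEGIN pattern used to set RS is described
under \*`Patterns\*' below.
.DS I
echo "root:x:0:0" | awk -F: '{ print NF, $1 }'
cat > cards <<EOF
Ann
Boston

Bob
Denver
EOF
awk 'BEGIN { RS = "" } { print NR, $1, $2 }' cards
.DE
The first command should print \*`4 root\*'; the second should print
\*`1 Ann Boston\*' and \*`2 Bob Denver\*'.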
.H 3 "Printing"
If an action has no pattern, the action is executed for all lines.
The simplest action is to print some or all of a record,
using the
.B awk
command
.B print.
Used with no arguments, this command prints each record in full, copying the input to the output intact.
A field or group of fields may be printed from each record.
For instance, 
.DS I
print $2, $1
.DE
prints the first two fields in reverse order.
Items separated by a comma in the print statement will be separated 
by the current output field separator when output.
Items not separated by commas will be concatenated, so
.DS I
print $1 $2
.DE
runs the first and second fields together.
.P
The predefined variables NF and NR can be used.
For example,
.DS I
{ print NR, NF, $0 }
.DE
prints each record preceded by the record number and the number of fields.
Also, output may be diverted to multiple files.
The program
.DS I
{ print $1 >"foo1"; print $2 >"foo2" }
.DE
writes the first field, $1, on the file 
.I foo1 ,
and the second field on file
.I foo2 .
The \*`>>\*' notation can also be used.
For example,
.DS I
print $1 >>"foo"
.DE
appends the output to the file
.I foo .
In each case, the output files are created if necessary.
The file name can be a variable or a field as well as a constant.
For example,
.DS I
print $1 >$2
.DE
uses the contents of field 2 as a filename.
There is a limit of ten on the possible number of output files.
Output can also be piped into another process. 
For instance,
.DS I
print | "mail bwk"
.DE
mails the output to
.I bwk .
.P
The variables OFS and ORS
may be used to change the current output field separator and output
record separator.
The output record separator is appended to the output of the 
.B print 
statement.
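.P
For instance, in the following sketch (the input line is invented)
the output field separator is set to a hyphen before the record is printed:
.DS I
echo "a b c" | awk '{ OFS = "-"; print $1, $2, $3 }'
.DE
This should print \*`a-b-c\*'.
.P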
.B Awk
also provides the
.B printf
statement for output formatting.
.DS I
printf format, expr, expr, ...
.DE
formats the expressions in the list according to the specification
in the string
.I format
and prints them.
For example,
.DS I
printf "%8.2f  %10ld\en", $1, $2
.DE
prints $1
as a floating point number 8 digits wide, with two after the decimal point,
and $2
as a 10-digit long decimal number, followed by a newline.
No output separators are produced automatically; they must be added, as in 
the above example.
.H 3 "Patterns"
You may specify a pattern before an action to act as a selector
for determining whether the action is to be executed.
A variety of expressions may be used as patterns:
regular expressions, arithmetic relational expressions,
string-valued expressions, and arbitrary boolean combinations of these.
.P
The special pattern BEGIN
matches the beginning of the input, before the first record is read.
The pattern END matches the end of the input, after the last record has been 
processed.
BEGIN and END
thus provide a way to gain control before and after processing,
for initialization and wrapup.
.P
For example, the field separator can be set to a colon with: 
.DS I
BEGIN	{ FS = ":" }
... rest of program ...
.DE
Or the input lines may be counted by:
.DS I
END  { print NR }
.DE
If
BEGIN
is present, it must be the first pattern, and END must be the last if used.
.HU "Regular Expressions"
.br
The simplest regular expression is a literal string of characters
enclosed in slashes, such as: 
.DS I
/smith/
.DE
This is actually a complete
.B awk
program which prints all lines containing any occurrence
of the name \*`smith\*'.
If a line contains \*`smith\*'
as part of a larger word,
it will also be printed, as in
.DS I
blacksmithing
.DE
.P
The regular expressions recognized by
.B awk
include those recognized by
.B ed ,
.B sed ,
and the
.B grep
command.
In addition, 
.B awk
allows parentheses for grouping, the \*`|\*' for alternatives,
the \*`+\*' for \*`one or more\*', and the \*`?\*' for \*`zero or one\*'.
Character classes may be abbreviated:
[a\-zA\-Z0\-9] is the set of all letters and digits.
For example, the
.B awk
program
.DS I
/[Aa]ho|[Ww]einberger|[Kk]ernighan/
.DE
prints all lines which contain any of the names
\*`Aho\*', \*`Weinberger\*', or \*`Kernighan\*',
whether capitalized or not.
.P
Regular expressions must be enclosed in slashes, just as in
.B ed
and
.B sed .
Within a regular expression, blanks and the regular expression
metacharacters are significant.
To turn off the special meaning of one of the regular expression metacharacters,
precede it with a backslash.
For example, the pattern
.DS I
/\e/.*\e//
.DE
matches any string of characters
enclosed in slashes.
You can also specify that any field or variable matches
a regular expression (or does not match it) with the operators
\*`~\*' and \*`!~\*'.
The program
.DS I
$1 ~ /[jJ]ohn/
.DE
prints all lines where the first field matches \*`john\*' or \*`John\*'.
Notice that this will also match \*`Johnson\*', \*`St. Johnsbury\*', and so on.
To restrict the match to exactly \*`John\*' or \*`john\*', use
.DS I
$1 ~ /^[jJ]ohn$/
.DE
The caret (^) refers to the beginning of a line or field;
the dollar sign ($) refers to the end.
.HU "Relational Expressions"
.br
An
.B awk
pattern can be a relational expression involving the usual relational operators
<, <=, ==, !=, >=, and >.
For example, 
.DS I
$2 > $1 + 100
.DE
selects lines where the second field
is more than 100 greater than the first field.
Similarly,
.DS I
NF % 2 == 0
.DE
prints all lines with an even number of fields.
.P
In relational tests, if neither operand is numeric,
a string comparison is made; otherwise it is numeric.
Thus,
.DS I
$1 >= "s"
.DE
selects lines that begin with \*`s\*', \*`t\*', \*`u\*', etc.  
In the absence of any other information, fields are treated as strings, 
so the program
.DS I
$1 > $2
.DE
will perform a string comparison.
.HU "Combinations of Patterns"
.br
A pattern can be any boolean combination of patterns, using the operators
\||\|| (or), && (and), and ! (not).
For example,
.DS I
$1 >= "s" && $1 < "t" && $1 != "smith"
.DE
selects lines where the first field begins with \*`s\*', but is not \*`smith\*'.
The operators && and \||\||
guarantee that their operands will be evaluated from left to right;
evaluation stops as soon as their truth or falsehood is determined.
.P
The pattern that selects an action may also
consist of two patterns separated by a comma, as in
.DS I
pat1, pat2	{ ... }
.DE
In this case, the action is performed for each line between
an occurrence of
.I pat1
and the next occurrence of
.I pat2
(inclusive).
For example,
.DS I
/start/, /stop/
.DE
prints all lines between \*`start\*' and \*`stop\*',
while
.DS I
NR == 100, NR == 200 { ... }
.DE
does the action for lines 100 through 200 of the input.
.H 3 "Actions"
In addition to the patterns described above,
.B awk
offers a variety of actions. An
.B awk
action is a sequence of action statements
terminated by newlines or semicolons.
These action statements can do a variety of
bookkeeping and string manipulating tasks.
The possible actions are
built-in functions, the assignment of variables
and strings, the use of field variables, string concatenation statements,
arrays, and flow-of-control statements.
.HU "Built-in Functions"
.br
.B Awk
provides a \*`length\*' function to compute the length of a string of 
characters.
This program prints each record, preceded by its length:
.DS I
{print length, $0}
.DE
The length by itself is a \*`pseudo-variable\*' which
yields the length of the current record;
\fBlength\fR(argument) is a function which yields the length of its argument,
as in the equivalent:
.DS I
{print length($0), $0}
.DE
The argument may be any expression.
.P
.B Awk
also provides the arithmetic functions
.B sqrt ,
.B log ,
.B exp ,
and
.B int ,
for square root,
logarithm, exponential,
and integer parts of their respective arguments.
The name of one of these built-in functions,
without argument or parentheses,
stands for the value of the function on the whole record.
The program
.DS I
length < 10 || length > 20
.DE
prints lines whose length is less than 10 or greater than 20.
.P
The function \fBsubstr\fR(s,\ m,\ n) produces the substring of \*`s\*'
that begins at position \*`m\*' (origin 1) and is at most
\*`n\*' characters long.
If \*`n\*' is omitted, the substring goes to the end of \*`s\*'.
The function \fBindex\fR(s1,\ s2) 
returns the position where the string
\*`s2\*' occurs in \*`s1\*', or zero if it does not.
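.P
A small sketch of both functions, using the word from an earlier example
as input:
.DS I
echo "blacksmithing" | awk '{ print substr($1, 6, 5), index($1, "smith") }'
.DE
Here \fBsubstr\fR should yield \*`smith\*', the five characters starting at
position 6, and \fBindex\fR should yield 6, so the output is \*`smith 6\*'.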
.P
The function \fBsprintf\fR(f,\ e1,\ e2,\ ...)
produces the value of the expressions e1, e2, etc., in the
.B printf
format specified by \*`f\*'.
Thus, for example,
.DS I
x = sprintf("%8.2f %10ld", $1, $2)
.DE
sets \*`x\*' to the string produced by formatting the values of
$1 and $2.
.HU "Variables, Expressions, and Assignments"
.br
.B Awk
variables take on numeric (floating-point) or string 
values according to context.
In the following example,
.DS I
x = 1
.DE
\*`x\*' is clearly a number, while in
.DS I
x = "smith"
.DE
it is clearly a string.
Strings are converted to numbers and vice versa whenever context demands it.
For instance,
.DS I
x = "3" + "4"
.DE
assigns 7 to \*`x\*'.
Strings which cannot be interpreted as numbers in a numerical context
will generally have the numeric value zero. 
.P
By default, variables (other than built-ins) are initialized to the 
null string, which has numerical value zero.
This eliminates the need for most BEGIN sections.
For example, the sums of the first two fields can be computed by
.DS I
     { s1 += $1; s2 += $2 }
END  { print s1, s2 }
.DE
.P
Arithmetic is done internally in floating point.
The arithmetic operators are: +, \-, \(**, /, and % (mod).
The C increment ++ and decrement \-\- operators are also available,
as well as the assignment operators +=, \-=, *=, /=, and %=.
These operators may all be used in expressions.
.HU "Field Variables"
.br
Fields in
.B awk
share essentially all of the properties of variables.
They may be used in arithmetic or string operations,
and may be assigned to.
Thus you can replace the first field with a sequence number:
.DS I
{ $1 = NR; print }
.DE
or accumulate two fields into a third, 
.DS I
{ $1 = $2 + $3; print $0 }
.DE
or assign a string to a field,
.DS I
{ if ($3 > 1000)
	$3 = "too big"
  print
}
.DE
which replaces the third field by \*`too big\*' when it is too big,
and prints the record in either case.
.P
Field references may be numerical expressions,
as in the following:
.DS I
{ print $i, $(i+1), $(i+n) }
.DE
Whether a field is deemed numeric or string depends on context;
in ambiguous cases like
.DS I
if ($1 == $2) ...
.DE
fields are treated as strings.
.P
Each input line is automatically split into fields as necessary.
It is also possible to split any variable or string into fields.
For example,
.DS I
n = split(s, array, sep)
.DE
splits the string
.I s
into
.I array[1] ,
\&...,
.I array[n] .
The number of elements found is returned.
If the
.I sep
argument is provided, it is used as the field separator.
Otherwise FS is used as the separator.
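.P
As a sketch, the following command splits a string at a hyphen;
the input and the array name
.I year
are invented for the illustration:
.DS I
echo "1216-1294" | awk '{ n = split($1, year, "-"); print n, year[1], year[2] }'
.DE
This should print \*`2 1216 1294\*'.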
.HU "String Concatenation"
.br
Strings may be concatenated.
For example:
.DS I
length($1 $2 $3)
.DE
returns the combined length of the first three fields.
Or in a
.B print
statement,
.DS I
print $1 " is " $2
.DE
prints the two fields separated by \*`is\*'.
Variables and numeric expressions may also appear in concatenations.
.HU "Arrays"
.br
Array elements are not declared; they spring into existence as mentioned.
Subscripts may have any non-null value, including non-numeric strings.
As an example of a conventional numeric subscript, the statement
.DS I
x[NR] = $0
.DE
assigns the current input record to the NR-th element of the array \*`x\*'.
In fact, in principle it is possible 
to process the entire input in a random order with the
.B awk
program:
.DS I
     { x[NR] = $0 }
END { ... program ... }
.DE
The first action merely records each input line in the array \*`x\*'.
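.P
As one possible sketch of this technique, the END action below prints
the saved lines in reverse order.
The file name
.I lines
and its contents are invented, and the
.B for
statement is described under \*`Flow-of-Control Statements\*' below.
.DS I
cat > lines <<EOF
one
two
three
EOF
awk '{ x[NR] = $0 }
END { for (i = NR; i >= 1; i--) print x[i] }' lines
.DE
The output should be the lines \*`three\*', \*`two\*', and \*`one\*'.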
.P
Array elements may be named by non-numeric values. 
Suppose the input contains fields with values like
.I apple ,
.I orange ,
etc.
The program
.DS I
/apple/	{ x["apple"]++ }
/orange/	{ x["orange"]++ }
END		{ print x["apple"], x["orange"] }
.DE
increments counts for the named array elements,
and prints them at the end of the input.
Any expression can be used as a subscript in an array reference.
Thus,
.DS I
x[$1] = $2
.DE
uses the first field of a record as a string to index the array \*`x\*'.
.P
Suppose each line of input contains two fields, a name and a non-zero value.
Names may be repeated.
To print a list of each unique name
followed by the sum of all the values for that name,
use the program:
.DS I
     { amount[$1] += $2 }
END  { for (name in amount)
		print name, amount[name] }
.DE
To sort the output, replace the last line with 
.DS I
print name, amount[name] | "sort" }
.DE
.HU "Flow-of-Control Statements"
.br
.B Awk
provides the flow-of-control statements
.B if-else ,
.B while ,
.B for ,
and statement grouping with braces. 
When using the
.B if
statement the condition in parentheses is evaluated.
If it is true, the statement following the
.B if
is done.
The
.B else
part is optional.
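.P
For example, the following sketch labels each value as big or small;
the file
.I sizes
and the threshold of 1000 are chosen only for illustration:
.DS I
cat > sizes <<EOF
1500
20
EOF
awk '{ if ($1 > 1000) print $1, "big"; else print $1, "small" }' sizes
.DE
This should print \*`1500 big\*' and \*`20 small\*'.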
.P
A
.B while
statement is also available.
For example, to print all input fields one per line, use:
.DS I
i = 1
while (i <= NF) {
	print $i
	++i
}
.DE
.P
The 
.B for
statement
.DS I
for (i = 1; i <= NF; i++)
	print $i
.DE
does the same job as the
.B while
statement above.
.P
An alternate form of the
.B for
statement is suited for accessing the elements of an associative array.
For example,
.DS I
for (i in array)
	statement
.DE
performs
.B statement
with \*`i\*' set in turn to each subscript of the array.
The elements are accessed in an apparently random order.
Chaos will ensue if \*`i\*' is altered, or if any new elements are
accessed during the loop.
.P
The expression in the condition part of an
.B if ,
.B while
or
.B for
statement can include relational operators like <, <=, >, >=, ==
(\*`is equal to\*'),
and != (\*`not equal to\*');
regular expression matches with the match operators \*`~\*' and \*`!~\*';
the logical operators \||\||, &&, and !, and parentheses for grouping.
.P
The
.B break
statement causes an immediate exit from an enclosing
.B while
or
.B for
statement.
The
.B continue
statement causes the next iteration to begin.
The statement
.B next
causes
.B awk
to skip immediately to
the next record and begin scanning the patterns from the top.
The statement
.B exit
causes the program to behave as if the end of the input
had occurred.
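.P
A short sketch of
.B next
and
.B exit ,
using invented input:
.DS I
cat > lines <<EOF
skip this
one
two
stop
three
EOF
awk '/^skip/ { next }
/stop/ { exit }
{ print }' lines
.DE
The first line is skipped by \fBnext\fR, and \fBexit\fR ends the
program at the fourth line, so only \*`one\*' and \*`two\*' should be printed.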
.P
Comments may be placed in
.B awk
programs. 
They begin with the character \*`#\*' and end with the end of the line,
as in
.DS I
print x, y	# this is a comment
.DE
.TC
