Simple Awk

Is awk simple? Or will this guide make learning awk simple? Ha-ha, you should decide that after you read it.

Recommended books (not written by me)

Definitive Guide to sed - by Daniel Goldman <br> Sed & Awk - by Dale Dougherty & Arnold Robbins <br> Effective awk Programming - by Arnold Robbins

More guides that I wrote

useful-sed - Useful sed tips, techniques & tricks for daily usage <br> wizardly-tips-vim - Less known Vim tips & tricks <br> quick-grep - Quick grep reference and tutorial. <br> convenient-utils-linux - Linux utils to make life easier and more convenient.


Before you start

Prerequisites

Intro

Note about lines/records fields/words

How to call awk

#!/usr/bin/awk -f
BEGIN {print "BEGINNING"}
/Gollum/ {print "I like it raw and wriggling"}

Simple pattern

for each record (line) in all the files passed to awk

if record (line) matches /bilbo/ pattern

print the whole record (line)
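The three steps above boil down to one pattern-action pair. A minimal sketch, assuming a POSIX awk on your PATH; the sample text and the `input`/`prog` shell variable names are made up for illustration:

```shell
# Hypothetical sample input; any text file behaves the same way.
input='bilbo went there
frodo stayed home
and back again came bilbo'

# Print every record (line) that matches /bilbo/.
prog='/bilbo/ { print $0 }'

printf '%s\n' "$input" | awk "$prog"
# prints:
# bilbo went there
# and back again came bilbo
```

Since printing the whole record is the default action, `awk '/bilbo/'` should behave the same.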

Field

for each record (line) in all the files passed to awk

if record (line) matches /bilbo/ pattern

print the first field (word) from the record (line)
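Swapping `$0` for `$1` gives the field version. Again only a sketch with made-up sample text (the `input` and `prog` names are not from the guide):

```shell
# Hypothetical sample input.
input='bilbo went there
frodo stayed home
old bilbo came back'

# On records matching /bilbo/, print only the first field (word).
prog='/bilbo/ { print $1 }'

printf '%s\n' "$input" | awk "$prog"
# prints:
# bilbo
# old
```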

Pattern AND Pattern

On each record (line) that matches /bilbo/ AND /frodo/

print the string "my precious"
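One way to write this is to join the two patterns with `&&`. A small sketch on made-up input (the `input` and `prog` names are only for illustration):

```shell
# Hypothetical sample input: only the last line mentions both hobbits.
input='bilbo alone
frodo alone
bilbo and frodo together'

# The action runs only when the record matches both patterns.
prog='/bilbo/ && /frodo/ { print "my precious" }'

printf '%s\n' "$input" | awk "$prog"
# prints:
# my precious
```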

Pattern OR Pattern

On each record (line) that matches /bilbo/ OR /frodo/

print the string "Is it you mister Frodo?"
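The OR case looks the same with `||` between the patterns. A sketch, assuming the made-up input below:

```shell
# Hypothetical sample input: three of the four lines mention bilbo or frodo.
input='bilbo alone
frodo alone
bilbo and frodo together
samwise cooks'

# The action runs when the record matches at least one of the patterns.
prog='/bilbo/ || /frodo/ { print "Is it you mister Frodo?" }'

printf '%s\n' "$input" | awk "$prog"
# prints "Is it you mister Frodo?" three times
```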

NOT Pattern

On each record (line) that DOESN'T match /frodo/

Print "Pohtatoes"
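Negation is a `!` in front of the pattern. A sketch with invented sample lines:

```shell
# Hypothetical sample input: one line mentions frodo, two do not.
input='frodo eats
samwise cooks
gollum sulks'

# The action runs for every record that does not match /frodo/.
prog='!/frodo/ { print "Pohtatoes" }'

printf '%s\n' "$input" | awk "$prog"
# prints "Pohtatoes" twice
```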

IF Pattern present then check for Pattern, ELSE check for Pattern

Read record.

If it matches /frodo/

Does it also match /ring/? If yes, then print "Either frodo with the ring, or the orcs"

If it doesn't match /frodo/

Does it match /orcs/? If yes, then print "Either frodo with the ring, or the orcs"
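This decision tree can be written as a single ternary pattern. The sketch below prefixes the message with `NR` (the record number) so you can see which lines fired; that prefix and the sample input are additions for illustration only:

```shell
# Hypothetical sample input.
input='frodo has the ring
frodo sleeps
the orcs are coming
samwise cooks'

# If the record matches /frodo/, test /ring/; otherwise test /orcs/.
prog='/frodo/ ? /ring/ : /orcs/ { print NR ": Either frodo with the ring, or the orcs" }'

printf '%s\n' "$input" | awk "$prog"
# prints:
# 1: Either frodo with the ring, or the orcs
# 3: Either frodo with the ring, or the orcs
```

Line 2 matches /frodo/ but not /ring/, and line 4 matches neither /frodo/ nor /orcs/, so both are skipped.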

Pattern Range

Execute command for each record (line)

Between the record (line) that matches /Shire/ (including that record (line))

And the record (line) that matches /Osgiliath/ (including that record (line))

What is it mister Frodo?
Do you miss the Shire?
I miss the shire too.
This Osgiliath is too drab for me.
Too many orcs.

Do you miss the Shire?
I miss the shire too.
This Osgiliath is too drab for me.
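A range is written as two patterns separated by a comma. A sketch that feeds the five sample lines through such a range; the `input` and `prog` variable names are just for illustration:

```shell
# The five-line sample text used in this section.
input='What is it mister Frodo?
Do you miss the Shire?
I miss the shire too.
This Osgiliath is too drab for me.
Too many orcs.'

# Start printing at the first /Shire/ match, stop after the /Osgiliath/ match.
prog='/Shire/,/Osgiliath/ { print }'

printf '%s\n' "$input" | awk "$prog"
# prints lines 2 through 4 of the input
```

Matching is case sensitive here, so the range opens on "Shire" in line 2; the lowercase "shire" in line 3 is printed only because it sits inside the range.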

BEGIN Pattern

Before any input is read

Print "And so it begins"

END Pattern

After all input was read

Print "There and back, by Bilbo Baggins"
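BEGIN and END usually show up together: one sets things up, the other reports. A sketch that also counts records in between; the counter `n` and the sample input are assumptions added for illustration:

```shell
# Hypothetical sample input with two records.
input='chapter one
chapter two'

# BEGIN runs before any input, the bare block runs per record, END runs last.
prog='BEGIN { print "And so it begins" }
{ n++ }
END { print "There and back, by Bilbo Baggins; records read: " n }'

printf '%s\n' "$input" | awk "$prog"
# prints:
# And so it begins
# There and back, by Bilbo Baggins; records read: 2
```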

BEGINFILE, ENDFILE Patterns

Before input is read from a file

print "A new chapter is beginning mister Frodo"

NO Pattern

For all records (lines):

print the first word
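With no pattern at all, the action block runs for every record. A sketch on made-up input:

```shell
# Hypothetical sample input.
input='first second
third fourth'

# No pattern: the action applies to all records.
prog='{ print $1 }'

printf '%s\n' "$input" | awk "$prog"
# prints:
# first
# third
```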

Conditional Pattern

About commands

Variables

More Variables

BEGIN {IGNORECASE=1}
/frodo/ {print "do you remember the taste of strawberries Frodo?"}

Programming intro

   Operators
       The operators in AWK, in order of decreasing precedence, are:

       (...)       Grouping

       $           Field reference.

       ++ --       Increment and decrement, both prefix and postfix.

       ^           Exponentiation (** may also be used, and **= for the assignment operator).

       + - !       Unary plus, unary minus, and logical negation.

       * / %       Multiplication, division, and modulus.

       + -         Addition and subtraction.

       space       String concatenation.

       |   |&      Piped I/O for getline, print, and printf.

       < > <= >= == !=
                   The regular relational operators.

       ~ !~        Regular expression match, negated match.  NOTE: Do not use a constant regular expression (/foo/) on the left-hand side of a ~ or !~.  Only use one on the right-hand side.  The expression
                   /foo/ ~ exp has the same meaning as (($0 ~ /foo/) ~ exp).  This is usually not what you want.

       &&          Logical AND.

       ||          Logical OR.

       ?:          The  C  conditional  expression.  This has the form expr1 ? expr2 : expr3.  If expr1 is true, the value of the expression is expr2, otherwise it is expr3.  Only one of expr2 and expr3 is
                   evaluated.

       = += -= *= /= %= ^=
                   Assignment.  Both absolute assignment (var = value) and operator-assignment (the other forms) are supported.

   Control Statements
       The control statements are as follows:

              if (condition) statement [ else statement ]
              while (condition) statement
              do statement while (condition)
              for (expr1; expr2; expr3) statement
              for (var in array) statement
              break
              continue
              delete array[index]
              delete array
              exit [ expression ]
              { statements }
              switch (expression) {
              case value|regex : statement
              ...
              [ default: statement ]
              }

Programming usage

#!/usr/bin/awk -f
BEGIN {
	IGNORECASE=1
	hobitses=0
}
/fellowship/ {
	if (index($0,"samwise") >0 ) {
		hobitses+=1
		print "Hurry up hobitses"
	}
}
END {
	print "Found a total of " hobitses " hobitses"
}

Check whether the string "samwise" occurs inside the record (line). index() is a built-in function that takes two strings. If the second string is contained within the first, it returns the position (counting from 1) where the second string starts. If the second string is not present, it returns 0.

If index() returns a value bigger than 0 ("samwise" was found inside the current record (line)), do the following:

increase hobitses by 1

print a message
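To see what index() actually returns, here is a small sketch that prints the return value next to each record. The sample lines are invented:

```shell
# Hypothetical sample input: the first line contains "samwise", the second does not.
input='frodo and samwise
gandalf alone'

# Print the value index() returns, then the record it was computed from.
prog='{ print index($0, "samwise") ": " $0 }'

printf '%s\n' "$input" | awk "$prog"
# prints:
# 11: frodo and samwise
# 0: gandalf alone
```

"samwise" starts at character 11 of the first record, and the second record yields 0, which is why the `> 0` test works as a "contains" check.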

Options

String concatenation

System commands

-rw-rw-r-- 1 me me 0 Nov 14 17:40 f1
-rw-rw-r-- 1 me me 59 Nov 14 17:41 f2
-rw-rw-r-- 1 me me 20 Nov 12 15:42 col1

Writing dynamically to files

System commands with stdin

-rw-rw-r-- 1 me me 0 Nov 14 17:40 file.txt

Getline example

$ - the positional variable

Modify the positional variable

Selective .csv column print

city,area,population
LA,400,100
Miami,500,101,
Buenos Aires,800,102

Custom field separator with OFS

Mix with command line text

drwxr-xr-x  2 root root        69632 Nov 13 19:21 .
drwxr-xr-x 16 root root         4096 Nov  9 07:35 ..
-rwxr-xr-x  1 root root        59888 Dec  5  2020 [
-rwxr-xr-x  1 root root        18456 Feb  7  2021 411toppm
-rwxr-xr-x  1 root root           39 Aug 15  2020 7z
-rwxr-xr-x  1 root root           40 Aug 15  2020 7za
-rwxr-xr-x  1 root root           40 Aug 15  2020 7zr
-rwxr-xr-x  1 root root        35344 Jul  1 00:42 aa-enabled
-rwxr-xr-x  1 root root        35344 Jul  1 00:42 aa-exec

Math on text.

city,area,population
LA,400,100
Miami,500,101,
Buenos Aires,800,102

#!/usr/bin/awk -f
BEGIN {
	total=0
	FS=","
}
{
	if (FNR>1) {
		real_pop=$3 * 1000
		total+=real_pop
		print "Real population of", $1, "is", real_pop
	}
}
END {
	print "Total Population:", total
}

Real population of LA is 100000
Real population of Miami is 101000
Real population of Buenos Aires is 102000
Total Population: 303000

Fancy line numbers

(1) line one
(2) line two

Print words by their number

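first second  third
fourth fifth sixth
seventh eight

#!/usr/bin/awk -f
# RS="" switches to paragraph mode: with no blank lines in the input,
# all three lines form one record and the words are numbered across lines
BEGIN { RS="" }
{
	print $1, $8
}

first eight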

Pass stdin to awk (and show nicely formatted size of files)

-rwxr-xr-x  1 root root           39 Aug 15  2020 7z
-rwxr-xr-x  1 root root           40 Aug 15  2020 7za
-rwxr-xr-x  1 root root           40 Aug 15  2020 7zr

Pass both stdin and file to awk

Check if text coming from stdin or file

Arrays intro

#!/usr/bin/awk -f
{
	myarr["hobits"]="hobitses"
	print(myarr["hobits"])
}

Store lines in array

#!/usr/bin/awk -f
/bilbo/ {
	myarr[NR]=$0
}
END{
	for (i in myarr){
		print "subscript is",i
		print myarr[i]
	}
}

subscript is 3
a story by frodo and bilbo
subscript is 13
bilbo again and frodo

Delete array elems

Array index concatenation

#!/usr/bin/awk -f
BEGIN{
	arr[1,2]="one"
	arr["abc","bcd"]="two"
	arr["abc",1]="three"
	arr["foo" "bar"]="four"
	arr["bar" "foo"]="five"
	for (i in arr){
		print "at index",i,"value is",arr[i]
	}
}

at index abc1 value is three
at index abcbcd value is two
at index foobar value is four
at index barfoo value is five
at index 12 value is one

Ordered array indexes

#!/usr/bin/awk -f
BEGIN{
	i=0
	arr[""]=0
}
/bilbo/{
	arr[i++]=$0
}
END{
	for (j=0;j<i;j++){
		print "at index",j,"value is",arr[j]
	}
}

at index 0 value is a story by frodo and bilbo with ring
at index 1 value is bilbo again and frodo and orcs

The power of printf

{printf("%d is nice but %.2f is better",1,2)}

The printf Statement
       The  AWK  versions of the printf statement and sprintf() function (see
       below) accept the following conversion specification formats:

       %a, %A  A floating point number  of  the  form  [-]0xh.hhhhp+-dd  (C99
               hexadecimal floating point format).  For %A, uppercase letters
               are used instead of lowercase ones.

       %c      A single character.  If the argument used for %c  is  numeric,
               it  is treated as a character and printed.  Otherwise, the
               argument is assumed to be a string, and only the first character
               of that string is printed.

       %d, %i  A decimal number (the integer part).

       %e, %E  A  floating  point number of the form [-]d.dddddde[+-]dd.  The
               %E format uses E instead of e.

       %f, %F  A floating point number of the  form  [-]ddd.dddddd.   If  the
               system  library  supports it, %F is available as well. This is
               like %f, but uses capital letters for special “not  a  number”
               and “infinity” values. If %F is not available, gawk uses %f.

       %g, %G  Use %e or %f conversion, whichever is shorter, with
               nonsignificant zeros suppressed.  The %G format uses %E instead of %e.

       %o      An unsigned octal number (also an integer).

       %u      An unsigned decimal number (again, an integer).

       %s      A character string.

       %x, %X  An unsigned hexadecimal number (an integer).   The  %X  format
               uses ABCDEF instead of abcdef.

       %%      A single % character; no argument is converted.

       Optional,  additional parameters may lie between the % and the control
       letter:

       count$ Use the count'th argument at  this  point  in  the  formatting.
              This is called a positional specifier and is intended primarily
              for use in translated versions of format strings,  not  in  the
              original text of an AWK program.  It is a gawk extension.

       -      The expression should be left-justified within its field.

       space  For  numeric  conversions, prefix positive values with a space,
              and negative values with a minus sign.

       +      The plus sign, used before the width modifier (see below), says
              to  always  supply  a sign for numeric conversions, even if the
              data to be formatted is positive.  The +  overrides  the  space
              modifier.

       #      Use  an  “alternate form” for certain control letters.  For %o,
              supply a leading zero.  For %x, and %X, supply a leading 0x  or
              0X for a nonzero result.  For %e, %E, %f and %F, the result
              always contains a decimal point.  For %g, and %G, trailing  zeros
              are not removed from the result.

       0      A  leading  0  (zero)  acts  as  a flag, indicating that output
              should be padded with zeroes instead of spaces.   This  applies
              only  to the numeric output formats.  This flag only has an
              effect when the field  width  is  wider  than  the  value  to  be
              printed.

       '      A  single quote character instructs gawk to insert the locale's
              thousands-separator character into decimal numbers, and to also
              use  the  locale's  decimal point character with floating point
              formats.  This requires correct locale support in the C library
              and in the definition of the current locale.

       width  The  field  should  be padded to this width.  The field is
              normally padded with spaces.  With the 0 flag, it is  padded  with
              zeroes.

       .prec  A  number  that  specifies  the precision to use when printing.
              For the %e, %E, %f and %F, formats, this specifies  the  number
              of  digits  you want printed to the right of the decimal point.
              For the %g, and %G formats, it specifies the maximum number  of
              significant  digits.   For  the %d, %i, %o, %u, %x, and %X
              formats, it specifies the minimum number of digits to print.   For
              the  %s  format,  it specifies the maximum number of characters
              from the string that should be printed.
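A few of these modifiers in action. This is only a sketch that combines width, the `-` and `0` flags, and `.prec` in one format string; the values are arbitrary and the `prog` variable name is made up:

```shell
# %5d    -> integer right-justified in a field of width 5
# %-5s   -> string left-justified in a field of width 5
# %05.1f -> one digit after the decimal point, zero-padded to width 5
# %.2f   -> two digits after the decimal point
prog='BEGIN { printf("%5d|%-5s|%05.1f|%.2f\n", 42, "ab", 3.14159, 2) }'

awk "$prog" </dev/null
# prints:
#    42|ab   |003.1|2.00
```

The `|` characters are only there to make the padding visible.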

Selective file processing

#!/usr/bin/awk -f
{
	if (FILENAME=="skip.txt" || FILENAME=="skip2.txt"){
		nextfile
	}
	print $0
}

Skip records (lines) based on a certain condition

Some math funcs

{
	print "log",log($1),$2
	print "cos",cos($1),$2
	print "sin",sin($1),$2
	print "rand",rand()
}

Print some random integers (1000 rand ints between 0-99)

#!/usr/bin/awk -f
BEGIN {
	# rand() returns a value in [0,1), so the integers range from 0 to 99
	for (i=0; i<1000; i++) printf("%d\n",rand()*100)
}

String funcs - index

#!/usr/bin/awk -f
{
	i=index($0,"bilbo")
	print $0
	if (i>0) print ">>>> INDEX IS",i
}

String funcs - length

[ Hello world world ] has length of 17
[ hello andback again Again ] has length of 25
[ a story by frodo and bilbo with ring ] has length of 36

String funcs - split

#!/usr/bin/awk -f
BEGIN {
	mystring="The best | leaf in the Shire, isn't it?"
	n=split(mystring,array," ")
	# split() fills array[1]..array[n]; a counted loop keeps the words in order
	for (i=1; i<=n; i++) print array[i]
	print "TOTAL splits:",n
}

The
best
|
leaf
in
the
Shire,
isn't
it?
TOTAL splits: 9

String funcs - substr

#!/usr/bin/awk -f
BEGIN {
	mystring="lotr is cool"
	n=substr(mystring,1,3)
	print n
}

String funcs - gensub

#!/usr/bin/awk -f
BEGIN {
	mystring="Run Halifax. Show us the meaning of haste."
	res=gensub(/[Hh]ali/,"Shadow","g",mystring)
	print res
}

String func - gsub

String func - sub

String func - match

#!/usr/bin/awk -f
{
	i=match($0,/bilbo/)
	if (i>0){
		res=substr($0,i,5)
		print res
	}
}

String func - tolower

String func - toupper

String func - asort

#!/usr/bin/awk -f
BEGIN {
	for (i=0;i<5;i++){
		arr[i]=rand()*100
		print arr[i]
	}
	print ">> SORTED"
	asort(arr)
	for (i in arr)print arr[i]
}

92.4046
59.3909
30.6394
57.8941
74.0133
>> SORTED
30.6394
57.8941
59.3909
74.0133
92.4046

Time func - strftime

%a	The locale's abbreviated weekday name
%A	The locale's full weekday name
%b	The locale's abbreviated month name
%B	The locale's full month name
%c	The locale's "appropriate" date and time representation
%d	The day of the month as a decimal number (01--31)
%H	The hour (24-hour clock) as a decimal number (00--23)
%I	The hour (12-hour clock) as a decimal number (01--12)
%j	The day of the year as a decimal number (001--366)
%m	The month as a decimal number (01--12)
%M	The minute as a decimal number (00--59)
%p	The locale's equivalent of the AM/PM
%S	The second as a decimal number (00--61).
%U	The week number of the year (Sunday is first day of week)
%w	The weekday as a decimal number (0--6). Sunday is day 0
%W	The week number of the year (Monday is first day of week)
%x	The locale's "appropriate" date representation
%X	The locale's "appropriate" time representation
%y	The year without century as a decimal number (00--99)
%Y	The year with century as a decimal number
%Z	The time zone name or abbreviation
%%	A literal %.

Extract the inverse of a regex match

#!/usr/bin/awk -f
{
	reg="[Bb]ilbo"
	if (match($0,reg)){
		bef=substr($0,1,RSTART-1)
		aft=substr($0,RSTART+RLENGTH)
		pat=substr($0,RSTART,RLENGTH)
		print bef,"|",pat,"|",aft
	}
	else print $0
}

Hello world world
hello andback again Again
a story by frodo and  | bilbo |  with ring with bilbo
 | Bilbo |  baggins baggins baggins

Put all lines on one line

User declared funcs


#!/usr/bin/awk -f

# Declare custom func outside of any pattern/action block
function throw_ring(who){
	if (who=="gollum"){
		return 0
	}
	else if (who=="frodo"){
		return 1
	}
}

# On all records matching /ring/
/ring/{
	# Find gollum or frodo
	if (match($0,"gollum")){
		was_thrown=throw_ring("gollum")
		# Ternary if/else: if was_thrown is true (non-zero) yield "THROWN", else "NOT THROWN"
		print "the ring throw status is", (was_thrown?"THROWN":"NOT THROWN")
	}

	if (match($0,"frodo")){
		was_thrown=throw_ring("frodo")
		print "the ring throw status is", (was_thrown?"THROWN":"NOT THROWN")
	}
}

More about conditional patterns

Advanced conditional patterns

Print the first N lines of every file

Print until you encounter this pattern. Then move to next file.

Print every Nth line of file

About