Text Processing Recipes Linux.

Linux text processing reference & recipes. Featuring: vim, tr, cat, tac, sort, shuf, seq, pr, paste, fmt, cut, nl, split, csplit, sed, awk, grep and regex.

Recommended books (not written by me)

Definitive Guide to sed - by Daniel Goldman
Sed & Awk - by Dale Dougherty & Arnold Robbins
Effective awk programming - by Arnold Robbins

Basic Regex.

Literal. Matches 'hello'

hello

Any single character.

.

Two any single characters

..

Multiple specific characters. 'a' or 'b' or 'c'

[abc]

Character range. All characters between a-z.

[a-z]

Multiple ranges. All characters between a-z and A-Z.

[a-zA-Z]

Ranges + extra characters. Chars from 1-9 and 'a','b','c'

[1-9abc]

Negated chars. Any char that is NOT 'a', 'b', or 'c'

[^abc]

Negated char ranges. Any char NOT between a-z.

[^a-z]

* multiplier. Any char zero or more times. Works on the preceding ITEM.

.*

* multiplier again. Char 'a', followed by char 'b' (preceding item) zero or more times. This is how all multipliers work.

ab*

? multiplier. Any char zero or one times.

.?

+ multiplier. Any char one or more times.

.+

Numbered multiplier. Character 'l' five times. ('lllll')

l{5}

Numbered range multiplier. Char 'a' between 1-5 times ('a', 'aa','aaa', etc)

a{1,5}

Open ended numbered range multiplier. Char 'a' at least 2 times.

a{2,}
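
As a quick check of the numbered multipliers, the sketch below feeds a few sample strings (invented for this illustration) to `grep -E`. The `-x` option makes the pattern match the whole line, so the counts are easy to see.

```shell
# Sample lines are invented for this illustration.
samples='a
aa
aaaaa
aaaaaa'

# a{1,5}: between 1 and 5 'a' characters (-x anchors the match to the whole line)
one_to_five=$(printf '%s\n' "$samples" | grep -xE 'a{1,5}')

# a{2,}: at least 2 'a' characters
two_or_more=$(printf '%s\n' "$samples" | grep -xE 'a{2,}')

echo "a{1,5} matches:"
printf '%s\n' "$one_to_five"
echo "a{2,} matches:"
printf '%s\n' "$two_or_more"
```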

Greedy matching (default). From 'hello you loving kool kaizer king' will match 'llo you loving kool kaizer k'

l.*k

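Lazy matching. From 'hello you loving kool kaizer king' will match 'llo you loving k' and 'l k'. In most cases this is the desired behaviour. Lazy multipliers are a Perl-style (PCRE) extension and are not part of POSIX basic or extended regex (with GNU grep, use '-P').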

l.*?k
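
The difference between greedy and lazy matching can be tried out with `grep -o`, which prints each match on its own line. This is a small sketch that assumes GNU grep built with PCRE support, since the lazy form needs `-P`.

```shell
text='hello you loving kool kaizer king'

# Greedy: plain POSIX regex, works with ordinary grep
greedy=$(printf '%s\n' "$text" | grep -o 'l.*k')

# Lazy: '*?' is a Perl-style multiplier, so GNU grep needs -P here
lazy=$(printf '%s\n' "$text" | grep -oP 'l.*?k')

printf 'greedy: %s\n' "$greedy"
printf 'lazy (one match per line):\n%s\n' "$lazy"
```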

Shorthand char classes:

\s - whitespace (space, newline, tab, etc)
\S - opposite of \s (not whitespace)
\d - digit (0-9)
\D - not digit
\w - word char ([a-zA-Z0-9_])
\W - not word char

^ beginning of line. Beginning of line followed by 'h', 'e','l','l' and 'o'

^hello

$ end of line. 'b','y','e' followed by end of line.

bye$

\b - beginning or end of word. Will match ' FOO' at end of line but not 'BARFOO' at end of line.

\bFOO$

Group in parentheses. Back reference it by '\1'. 'foo' is group 1. Matches 'foo bar foo' (foo, followed by bar, followed by the first group, which is 'foo')

(foo) bar \1

Alternation. 'foo' or 'bar'

foo|bar

General Tools

Replace ':' with tabs in the output

cat /etc/passwd | tr ':' '\t'

Delete unwanted chars ('#')

echo "hello # world" | tr -d '#'

Replace all but the alphanumeric chars with _

cat main.py | tr -c '[:alnum:]\n' '_'

___usr_bin_python3
from_termcolor_import_colored__cprint
import_sys
for_i_in_range_0_10__
____print_str_i__end___r__

Convert all to uppercase:

cat main.py | tr 'a-z' 'A-Z'

Convert all to lowercase:

cat main.py | tr 'A-Z' 'a-z'

Swap uppercase/lowercase

tr 'a-zA-Z' 'A-Za-z' <file.txt

Replace all digits with empty space.

cat main.py | tr '0-9' ' '

Replace all punctuation with empty space. Braces, parentheses, etc.

cat main.py | tr '[:punct:]' ' '

Remove repeated spaces (keep just a single space between words)

cat file | tr -s ' '

Sort using human-numeric sort, which also understands size suffixes such as 2K or 1G (will order: '1,10,15,100' and not '1,10,100,15')

sort -h file

Sort and reverse

sort -r file

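Random sort (shuffle lines). Identical lines end up next to each other because 'sort -R' sorts by a random hash of the keys; use 'shuf' for a true shuffle.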

sort -R file

Sort in place (OVERWRITE file with sorted lines). Don't print anything on stdout.

sort file -o file

Sort and remove duplicate lines

sort -u file

Get N random lines from file

shuf -n2 file

Get N random lines from file, with repetition allowed (one line may be present multiple times)

shuf -rn2 file

Generate 3 rand numbers between 1 and 20 (no repetition)

shuf -n3 -i 1-20

Generate 3 rand numbers between 1-20 with repetition

shuf -rn3 -i 1-20

Generate 3 rand negative numbers between -10 and 0 with repetition ('-r' is for repetition. remove it for no repetition)

seq -10 0 | shuf -rn3

Generate 3 rand floats (0.1 step) between 0-5. Add '-r' for repetition ('-rn3')

seq 0 0.1 5 | shuf -n3

Generate 3 rand numbers and sort them (numerical sort)

shuf -n3 -i 0-100 | sort -n

Get 2 random words from word list provided after '-e'

shuf -n2 -e red blue green orange

Return two random file names from current dir

shuf -n2 -e *

Generate random IP (replace newlines with dots using 'tr'). The output ends with a trailing dot, and without '-r' no octet can repeat.

shuf -n4 -i 0-255 | tr '\n' '.'
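
A variant without the trailing dot could join the four lines with `paste` instead of `tr`. The sketch below wraps it in a helper named `random_ip`, which is a made-up name for this example and not part of any tool.

```shell
# random_ip is a hypothetical helper name used only in this sketch.
random_ip() {
    # -r lets an octet repeat (e.g. 10.10.0.1); paste -sd. joins the
    # four lines with dots and leaves no trailing dot.
    shuf -rn4 -i 0-255 | paste -sd. -
}

ip=$(random_ip)
echo "$ip"
```

The result is only a syntactically valid dotted quad; nothing here checks whether the address is routable or reserved.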

Generate a sequence of numbers 1 to 10

seq 10

Generate sequence of numbers between 5-15

seq 5 15

Generate sequence of numbers between 10-20, skipping every second number (10,12,14,etc)

seq 10 2 20

Generate a descending sequence of numbers from 10 to 0

seq 10 -1 0

Generate a sequence of numbers separated by space

seq -s' ' 10

Generate sequence of numbers and pad them with 0 (01,02,etc)

seq -w 10

Generate sequence of numbers with custom formatting (using printf formatting)

seq -f'%.5f' 10

Double space between lines (if no empty lines add one. If 2 empty lines - make them 4)

pr -dt file

Gen numbers and put them on 3 columns, separated by tabs:

seq 9 | pr -3ts

1	4	7
2	5	8
3	6	9

Gen numbers, put them on 3 cols, separated by commas:

seq 9 | pr -3ts,

1,4,7
2,5,8
3,6,9

Gen numbers, put them on 3 cols ACROSS (from left to right, not top to bottom), separated by commas:

seq 9 | pr -3ats,

1,2,3
4,5,6
7,8,9

Put the contents of files each on a column, separated by tabs. Works well with files with short lines.

pr -mts file1 file2

10	100
100	One thousand
20	2000
25	2,5
one	one and tow
	extra
	extra2
	extra3

Generate rand nums from: 1-10, 1-100. Put them on columns separated by tabs.

pr -mts <(shuf -n10 -i 1-10) <(shuf -n10 -i 1-100)


2	62
10	88
4	76
6	96
5	35
1	39
7	64
3	1
9	48
8	53

Generate 100 sequential numbers. Put them on 5 columns separated by tabs ('-t' omits the page header and trailer, so there is usually no extra whitespace at the bottom)

seq 100 | pr -5t


1	      21	    41		  61		81
2	      22	    42		  62		82
3	      23	    43		  63		83
4	      24	    44		  64		84
5	      25	    45		  65		85
6	      26	    46		  66		86
7	      27	    47		  67		87
8	      28	    48		  68		88
9	      29	    49		  69		89
10	      30	    50		  70		90
11	      31	    51		  71		91
12	      32	    52		  72		92
13	      33	    53		  73		93
14	      34	    54		  74		94
15	      35	    55		  75		95
16	      36	    56		  76		96
17	      37	    57		  77		97
18	      38	    58		  78		98
19	      39	    59		  79		99
20	      40	    60		  80		100

Another way to put contents of a file on columns (across), separated by tabs. One column for one dash (3 cols total)

paste - - - <file 

Another way to generate a sequence of nums on 3 columns with custom separator

seq 20 | paste - - - -d,


1,2,3
4,5,6
7,8,9
10,11,12
13,14,15
16,17,18
19,20,

Put all lines on a single line, separated by custom delimiter (comma). The lines of each file go on a separate output line (output is 3 lines)

paste -sd, file1 file2 file3

Break lines at max width. Break at spaces where possible (omit '-s' if you want the breaks to occur anywhere). Words longer than the max width ('-w') are still cut.

fold -sw5 file

Lorem
 
ipsum
 
dolor
 
amet.
 I 
am 
first
.
What 
a 
story
. I 
am 
secon
d.
The 
End.
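
To see that the width limit really holds, the sketch below folds one sample line with `fold -sw5` and then measures the longest output line with awk. The sample text is taken from the output above; the variable names are arbitrary.

```shell
line='Lorem ipsum dolor amet. I am first.'

# fold -s breaks at spaces where possible; -w5 caps every output line at 5 columns
folded=$(printf '%s\n' "$line" | fold -sw5)
printf '%s\n' "$folded"

# Length of the longest output line; it should never exceed the requested width
longest=$(printf '%s\n' "$folded" | awk '{ if (length($0) > max) max = length($0) } END { print max + 0 }')
echo "longest line: $longest"
```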

Break lines at max width. Will keep words even if they exceed max width.

fmt -w3 file


Lorem
ipsum
dolor
amet.
I
am
first.
What
a
story.
I
am
second.
The
End.

Show file contents with line numbering

cat -n file

     1	Lorem ipsum dolor amet. I am first.
     2	What a story. I am second.
     3	The End.

Show file contents. Squeeze empty lines (to at most one line)

cat -s file

Show file contents. Number only non-empty lines

cat -b file

Display file lines in reverse

tac file

Reverse text.

echo "hello" | rev

olleh

Display first 5 lines

head -n 5 file

Display all except last N lines

head -n -5 file

Display a column (the second). Use custom separator.

echo "hello world of linux" | cut -f2 -d' '


world

Display a column range. Use custom separator.

echo "hello world of linux" | cut -f2-4 -d' '

world of linux

Display column range (from beginning to third). (use -f3- to display from third to end)

echo "hello world of linux" | cut -f-3 -d' '

hello world of

Number lines. Increase count by 3 at each line (1 line1, 4 line2, etc)

nl -i3 file

Number lines. Custom string after line number.

nl -s'--' file


     1--Lorem ipsum dolor amet. I am first.
     2--What a story. I am second.
     3--Before empty lines.
        
        
     4--After empty lines
     5--The End.
        

Number lines. Start counting from 5.

nl -v5 file

Split file into smaller files. Each smaller file has at most 10 lines. (WARNING - it may create lots of files in dir where cmd is run)

split -l10 file

Generate 100 sequential numbers. Put the first 10 in one file, the next 10 in another file, etc. File names start with 'x' - eg 'xaa', 'xab'.

seq 100 | split -l10
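
Because `split` writes its pieces into the current directory, it can be convenient to point it at a throwaway directory while experimenting. The sketch below does that with `mktemp -d`; the `part_` prefix is an arbitrary choice for this example.

```shell
# Work in a throwaway directory so the generated pieces do not clutter the current one.
workdir=$(mktemp -d)

# '-' reads stdin; "$workdir/part_" is the output prefix (arbitrary name)
seq 100 | split -l10 - "$workdir/part_"

pieces=$(ls "$workdir" | wc -l)
first_piece_lines=$(wc -l < "$workdir/part_aa")
echo "pieces: $pieces, lines in part_aa: $first_piece_lines"

# Clean up the temporary directory
rm -r "$workdir"
```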

Split file in smaller files by byte count. (eg: 'xai' has content 'abcde', 'xaj' has 'fghij', etc). Each smaller file has 5 bytes at most.

split -b5 file

Split file into 3 (approximately) equal chunks. WILL SPLIT LINES.

split -n3 file

Split file into 3 (approx.) equal chunks. Don't split lines (in the middle).

split -nl/3 file

Split file into 8 and display 4th chunk on stdout. Will cut lines. No files are generated.

split -n4/8 file

Split file into 8 and display 4th chunk on stdout. Lines are kept intact. No files generated.

split -nl/4/8 file

Split file into 3 chunks. Don't split lines. Use numeric suffixes for generated files ('x00', 'x01', etc)

split -dnl/3 file

Split file into 3 chunks. Use custom prefix for generated filenames ('sub_aa', 'sub_ab')

split -nl/3 file sub_

Split file into 3 chunks. Use custom prefix AND numeric suffixes ('sub_00', 'sub_01')

split -dnl/3 file sub_

Split file at 5th line into 2 subfiles. Subfile 1 ('xx00') will have 4 lines. Subfile 2 ('xx01') will have the rest.

csplit file 5

Split file at line matching regex into 2 subfiles. Line matching regex will be in second file.

csplit file '/my line/'

sed

Print one line

sed -n '10p' myfile.txt

Do replacement on all lines except line 5

sed '5!s/foo/bar/' file.txt

Do replacement on lines matching regex (eg: lines starting with 'hello')

sed '/^hello/ s/h/H/' file.txt

Do replacement from line 5 to end of file

sed '5,$ s/foo/bar/' file.txt

Delete empty lines

sed '/^$/d' file

Print lines between two regex matches

sed -nE '/^foo/,/^bar/p' file.txt

Use custom delimiters to make it easy for some strings that contain slashes

sed 's_/bin/bash_/bin/sh_' file.txt

Custom delimiters for regex address combined with the classical delimiter for substitute command (you could also use there a custom delimiter). Useful for paths.

sed '\_/bin/bash_s/grep/egrep/' file.txt

Insert a space after every letter (lowercase or uppercase) using & (which represents the regex match)

sed 's/[a-zA-Z]/& /g' file.txt

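Keep the first word of every line (where word is defined by alphanumeric chars + underscores for simplicity's sake). The first word is captured in a group so it can be referenced as '\1'; '/' is used as the delimiter because the pattern itself contains '_'.

sed -E 's/([a-zA-Z0-9_]+).*/\1/' file.txt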

Switch the first two words

sed -E 's/([a-zA-Z0-9_]*) ([a-zA-Z0-9_]*)/\2 \1/' f1

Remove duplicate words separated by a single space (but not triplicate)

sed -E 's/([a-zA-Z0-9_]+) \1/\1/ig' f1
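
The three word recipes above can be tried on a couple of sample sentences (made up for this sketch). The commands use '/' as the delimiter so that the '_' inside the bracket expression cannot be mistaken for a delimiter.

```shell
# Keep only the first word
first=$(echo 'hello world again' | sed -E 's/([a-zA-Z0-9_]+).*/\1/')

# Swap the first two words
swapped=$(echo 'hello world again' | sed -E 's/([a-zA-Z0-9_]*) ([a-zA-Z0-9_]*)/\2 \1/')

# Collapse a doubled word separated by a single space
deduped=$(echo 'the the cat sat' | sed -E 's/([a-zA-Z0-9_]+) \1/\1/g')

printf '%s\n' "$first" "$swapped" "$deduped"
```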

Search and replace for pattern, write just the lines with the replacements in a new file

sed 's_foo_bar_w replaced.txt' file.txt

Multiple replacements

sed -e 's_foo_bar_' -e 's_hello_HELLO_' file.txt

Multiple replacements by using a sed script

#!/usr/bin/sed -f
s/a/A/
s/foo/BAR/
s/hello/HELLO/

Multiple commands using the ; operator which in theory concatenates commands (WARNING! It won't work as expected with certain commands such as 'r' or 'w'. Use a sed script instead OR put the command dealing with filenames last). Print line 10 and insert before line 5.

sed '10p;5i\INSERTED BEFORE LINE 5' file.txt

Remove comments between lines starting with these two keywords. Empty lines will be put there instead

sed -E '/start/,/end/ s/#.*//' file.txt

Delete comments starting with # (no empty lines left behind)

sed -E '/^#/d' f1

Insert an empty line after pattern (after each line containing comment in this case)

sed '/^#/G' file.txt

View lines minus lines between line starting with pattern and end of file

sed '/start/,$ d' file.txt

View lines except lines between line starting with pattern and line ending with pattern

sed -rn '/start/,/end/ !p' file.txt

Print until you encounter pattern then quit

sed '/start/q' file.txt

Insert contents of file after a certain line

sed '5 r newfile.txt' file.txt

Append text after lines containing regex (AFTER FOO)

sed '/foo/a\AFTER FOO' file.txt

Insert text before lines containing regex (BEFORE FOO)

sed '/foo/i\BEFORE FOO' file.txt

Change line containing regex match

sed '/foo/c\FOO IS CHANGED' file.txt

Nested sed ranges with inversion. Between lines 1,100 apply actions where the pattern DOESN'T match.

#!/usr/bin/sed -f
1,100 {
	/foo/ !{
		s_hello_HELLOOOOWORLD_
		s_yes_YES_
	}
}

Use nested addresses with change, insert and append to modify: the line before match, the line with match, the line after match.

#!/usr/bin/sed -f
/^#/ {
i\
#BEFORE ORIGINAL COMMENT
a\
#AFTER ORIGINAL COMMENT
c\
# ORIGINAL COMMENT IS NOW THIS LINE
}

Insert new line before the first comment, after the first comment put in the contents of file and quit immediately afterwards

#!/usr/bin/sed -f
/^#/ {
i\#BEFORE COMMENT
r myotherfile.txt
q
}

Transform text

sed 'y/abc/ABC/' file.txt

Copy all the comments (starting with #) to a new file

sed -E '/^#/w comments.txt' file.txt

Print every second line (substitute ~3 for third line, etc)

sed -n '1~2p' file.txt

Edit file in place but also create a backup

sed -i.bak 's/hello/HELLO/' file.txt

Append two empty lines after regex match

sed '/^#/{G;G}' file.txt

grep

Search for match (the string 'hello') in file (called generically 'file'). Display every line that matches pattern (in this case every line containing 'hello')

grep hello file

Search for match in file and use quotes on the pattern. Not required unless you have special chars that are expanded by the shell. (in this case not required)

grep 'hello' file

Search for a match in multiple files

grep hello file1 file2

Search for match in all files in current dir (will show a warning if dirs are present too)

grep hello *

Search for a match in all files in current dir. Don't show errors if dirs are present. (grep treats dirs just as ordinary files and tries to "read" them). '-s' is for silent. Will also skip errors regarding nonexistent files.

grep -s hello *

Search for a match in all files that end with '.py'

grep hello *.py

Search for match in all files in current dir. Suppress warning if dirs are present. (it searches for 'hello' in all files. 'skip' is an action passed to '-d'). Still shows warnings about nonexistent files.

grep -d skip hello *

Case insensitive

grep -i Hello file

Invert search

grep -v hello file

Combine options. Case insensitive AND invert search

grep -iv Hello file

Use regex. (search for either 'year' or 'Year')

grep '[Yy]ear' file

Use basic regex (default). Match literal 'years+' in string. ('?+{|()' have no special meaning). Don't match 'years', 'yearss', 'yearsss', etc.

grep 'years+' file

Use extended regex. Match 'years', 'yearss', 'yearsss', etc. ('+' means one or more of the chars before it, in this case an 's'). '?+{|()' have special meaning.

grep -E 'years+' file

Same as above (extended regex). 'egrep' is deprecated in favour of 'grep -E'; recent GNU grep versions print a warning.

egrep 'years+' file

Match whole words. Will match ' year ' but not 'goodyear'

grep -w year file

Match whole lines. Will match 'year' (where 'year' is the only word on a line). Won't match 'one year' or 'goodyear'.

grep -x year file

Treat the search pattern literally, not as a regex. Will match the literal '[Yy]ear' but won't match 'year'.

grep -F '[Yy]ear' file

Search for multiple patterns. Match lines containing either 'year' or 'hello'.

grep -e hello -e year file
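
Multiple `-e` patterns are combined with OR. When both patterns must appear on the same line, one common approach is to chain two greps. The sample input below is invented for this sketch.

```shell
input='hello there
happy new year
nothing here
hello, what a year'

# Two -e patterns are ORed: a line is printed if it matches either one
either=$(printf '%s\n' "$input" | grep -e hello -e year)

# To require both patterns on the same line, chain two greps
both=$(printf '%s\n' "$input" | grep hello | grep year)

echo "either pattern:"
printf '%s\n' "$either"
echo "both patterns:"
printf '%s\n' "$both"
```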

Read search patterns from a file. Each pattern on a new line. Match all found patterns. 'patterns.txt' can have 'word' on one line, '[Yy]ear' on the second, etc.

grep -f patterns.txt file

Read search patterns from file AND from text passed to option '-e'. Match all found patterns.

grep -f patterns.txt -e '[Ee]xtra' file

Count matching lines for pattern (NOT matching patterns). Display a number - how many lines matched.

grep -c hello file

Count matching lines for every file except dirs (suppressed with '-s'). Display how many lines matched (for every file). Will show multiple files with 0 or more matches.

grep -sc hello *

Print ONLY file names where match found (don't print the actual matches).

grep -l hello *.txt

Print ONLY file names where match NOT found.

grep -L hello *.txt

Stop searching after the first N matching lines. (prints at most 10 matching lines in the example, wherever they are in the file)

grep -m 10 hello file
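
Note that `-m` counts matching lines, not input lines. The sketch below (sample input invented) shows the difference between `grep -m` and limiting the input with `head` first.

```shell
input='no match here
hello 1
still nothing
hello 2
hello 3'

# -m 2 stops after the second matching line, wherever it is in the input
first_two=$(printf '%s\n' "$input" | grep -m 2 hello)
echo "grep -m 2:"
printf '%s\n' "$first_two"

# To restrict the search to the first 3 lines of input, cut the input with head first
in_first_three=$(printf '%s\n' "$input" | head -n 3 | grep hello)
echo "head -n 3 | grep:"
printf '%s\n' "$in_first_three"
```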

Stop after N matching lines in every file in current dir. Don't show errors for dirs. Note how we concatenate '-m' and '10'. We could've also written them with a space, like '-m 10'

grep -sm10 hello *

Print only the matched parts, without the surrounding text. Example will print 'year' or 'Year' - each on a new line. The pattern is quoted so the shell cannot expand it as a glob.

grep -o '[Yy]ear' file

Suppress error messages about files not existing.

grep -s hello file nonexisting_file

Print filename before each match. Eg: 'file:goodyear' (default when multiple files are searched).

grep -H year file

Suppress printing filenames before each match (even if multiple files are searched)

grep -h year file file2

Add line number before each output line (Eg: '1:goodyear')

grep -n year file

Print both line number and file name (eg: 'file:3:goodyear'). '-H' will force displaying the filename even if just one file is searched (by default not shown). '-s' suppresses warnings about dirs and missing files.

grep -nHs year *

Also print N trailing lines AFTER matched line. (show N lines AFTER the matched line)

grep -A 2 year file

Also print N leading lines BEFORE matched line. (show N lines BEFORE the matched line)

grep -B 2 year file

Print 2 lines before and 4 lines after matched line.

grep -B2 -A4 hello file

Also print N lines BEFORE and N lines AFTER matched line (eg: 2 before and 2 after)

grep -C 2 year file

Force process binary files. Without this you'll get 'grep: /usr/bin/pamon: binary file matches'. (search for string 'au' in binary file)

grep -a au /usr/bin/pamon

Exclude files that match this pattern. (eg: don't search .py or .c files)

grep --exclude='*.py' --exclude='*.c' year *

Include files that match this pattern. Use in conjunction with --exclude. (exclude all .py files and then include only 'main.py' in the search)

grep --exclude='*.py' --include=main.py year *

Search recursively in dir (go as deep as possible, searching for files). DON'T follow symlinks. No warning about searching dirs shown.

grep -r hello

Exclude dirs from searching. Useful when using '-r' to skip certain dirs, such as '.git'

grep hello -r --exclude-dir='.git'

Search recursively in dir. If a symlink is encountered, follow it and search the file pointed to by the symlink.

grep -R hello

Print the byte offset before each matched line. The first line of the file has an offset of '0'. Eg: line 1 - '0:abc', line 2 - '4:def'. It shows 4 because 4 bytes come before it ('abc' + newline from the first line)

grep -b hello file

Search for 'hello' in files that might start with the '-' character. Without the '--' a file like '-myfile' won't be searched. WARNING - having such a file in your dir will BREAK "normal" grep functioning (eg: grep hello * WON'T SHOW all 'hello' lines from files. Reason is that when it encounters file '-x' it treats it as an option since it expands the * wildcard)

grep -- hello *

Sausage options 1. Search in binary files (text too but friendly toward binaries). Print byte count (or offset as grep calls it), force filename, ignore case, show match only, also show line number, search recursively in this dir. Output is like 'hau/f:15:193:hello'

grep -abHionr hello .

awk

Prerequisites

Intro

Note about lines/records fields/words

How to call awk

#!/usr/bin/awk -f
BEGIN {print "BEGINNING"}
/Gollum/ {print "I like it raw and riggling"}

Simple pattern

for each record (line) in all the files passed to awk

if record (line) matches /bilbo/ pattern

print the whole record (line)

Field

for each record (line) in all the files passed to awk

if record (line) matches /bilbo/ pattern

print the first field (word) from the record (line)
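
The commands for 'Simple pattern' and 'Field' are not shown above. One way to write them, following the descriptions, is sketched here with invented sample input piped in instead of a file.

```shell
# Sample input invented for this illustration
input='bilbo found the ring
frodo left the shire
old bilbo went to rivendell'

# Simple pattern: print every record (line) that matches /bilbo/
whole=$(printf '%s\n' "$input" | awk '/bilbo/ { print $0 }')

# Field: print only the first field (word) of each matching record
first_field=$(printf '%s\n' "$input" | awk '/bilbo/ { print $1 }')

echo "whole records:"
printf '%s\n' "$whole"
echo "first fields:"
printf '%s\n' "$first_field"
```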

Pattern AND Pattern

On each record (line) that matches /bilbo/ AND /frodo/

print the string "my precious"

Pattern OR Pattern

On each record (line) that matches /bilbo/ OR /frodo/

print the string "Is it you mister Frodo?"

NOT Pattern

On each record (line) that DOESN'T match /frodo/

Print "Pohtatoes"
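
The AND, OR and NOT combinations described above map onto awk's `&&`, `||` and `!` operators. This is a minimal sketch with invented input; the printed strings are the ones from the descriptions.

```shell
input='bilbo and frodo
only bilbo
only frodo
gandalf alone'

# AND: both patterns must match the same record
and_out=$(printf '%s\n' "$input" | awk '/bilbo/ && /frodo/ { print "my precious" }')

# OR: either pattern may match
or_out=$(printf '%s\n' "$input" | awk '/bilbo/ || /frodo/ { print "Is it you mister Frodo?" }')

# NOT: records that do not match the pattern
not_out=$(printf '%s\n' "$input" | awk '!/frodo/ { print "Pohtatoes" }')

printf '%s\n' "$and_out" "$or_out" "$not_out"
```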

IF Pattern present then check for Pattern, ELSE check for Pattern

Read record.

If it matches /frodo/

Does it also match /ring/? If yes then print "Either frodo with the ring, or the orcs"

If it doesn't match /frodo/

Does it match /orcs/? If yes then print "Either frodo with the ring, or the orcs"
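
The IF/ELSE logic described above can be expressed with awk's conditional pattern form, `pattern ? pattern : pattern`. The command itself is not shown here, so the following is only a sketch with invented input; the matching record is appended to the message to make the result easier to follow.

```shell
input='frodo has the ring
frodo is asleep
the orcs are coming
sam is cooking'

# If the record matches /frodo/, the action runs only when it also matches /ring/;
# otherwise the action runs only when the record matches /orcs/.
out=$(printf '%s\n' "$input" | awk '/frodo/ ? /ring/ : /orcs/ { print "Either frodo with the ring, or the orcs:", $0 }')

printf '%s\n' "$out"
```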

Pattern Range

Execute command for each record (line)

Between the record (line) that matches /Shire/ (including that record (line))

And record (line) that matches /Osgiliath/ (including that record (line))

What is it mister Frodo? 
Do you miss the Shire?
I miss the shire too.
This Osgiliath is to drab for me. 
Too many orcs.

Do you miss the Shire?
I miss the shire too.
This Osgiliath is to drab for me.
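
A range pattern is written as two patterns separated by a comma. The command is not shown above; the sketch below is one way to produce the three-line result from the five sample lines, which are fed in through a variable instead of a file.

```shell
input='What is it mister Frodo?
Do you miss the Shire?
I miss the shire too.
This Osgiliath is to drab for me.
Too many orcs.'

# The range starts at the first record matching /Shire/ and ends at the
# next record matching /Osgiliath/; both ends are included.
range=$(printf '%s\n' "$input" | awk '/Shire/,/Osgiliath/ { print $0 }')

printf '%s\n' "$range"
```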

BEGIN Pattern

Before any input is read

Print "And so it begins"

END Pattern

After all input was read

Print "There and back, by Bilbo Baggins"
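
BEGIN and END blocks are often used together: BEGIN for setup or a header, END for a summary. A small sketch with two invented input lines follows; the `count` variable is an addition for this example.

```shell
out=$(printf 'line one\nline two\n' | awk '
BEGIN { print "And so it begins" }      # runs before any input is read
      { count++ }                       # runs once for every record
END   {                                 # runs after all input was read
        print "There and back, by Bilbo Baggins"
        print count, "records read"
      }')

printf '%s\n' "$out"
```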

BEGINFILE, ENDFILE Patterns

Before input is read from a file

print "A new chapter is beginning mister Frodo"

NO Pattern

For all records (lines):

print the first word

Conditional Pattern

About commands

Variables

More Variables

BEGIN {IGNORECASE=1}
/frodo/ {print "do you remember the taste of strawberries Frodo?"}

Programming intro

 Operators
       The operators in AWK, in order of decreasing precedence, are:

       (...)       Grouping

       $           Field reference.

       ++ --       Increment and decrement, both prefix and postfix.

       ^           Exponentiation (** may also be used, and **= for the assignment operator).

       + - !       Unary plus, unary minus, and logical negation.

       * / %       Multiplication, division, and modulus.

       + -         Addition and subtraction.

       space       String concatenation.

       |   |&      Piped I/O for getline, print, and printf.

       < > <= >= == !=
                   The regular relational operators.

       ~ !~        Regular expression match, negated match.  NOTE: Do not use a constant regular expression (/foo/) on the left-hand side of a ~ or !~.  Only use one on the right-hand side.  The expression
                   /foo/ ~ exp has the same meaning as (($0 ~ /foo/) ~ exp).  This is usually not what you want.

       &&          Logical AND.

       ||          Logical OR.

       ?:          The  C  conditional  expression.  This has the form expr1 ? expr2 : expr3.  If expr1 is true, the value of the expression is expr2, otherwise it is expr3.  Only one of expr2 and expr3 is
                   evaluated.

       = += -= *= /= %= ^=
                   Assignment.  Both absolute assignment (var = value) and operator-assignment (the other forms) are supported.

   Control Statements
       The control statements are as follows:

              if (condition) statement [ else statement ]
              while (condition) statement
              do statement while (condition)
              for (expr1; expr2; expr3) statement
              for (var in array) statement
              break
              continue
              delete array[index]
              delete array
              exit [ expression ]
              { statements }
              switch (expression) {
              case value|regex : statement
              ...
              [ default: statement ]
              }

Programming usage

#!/usr/bin/awk -f
BEGIN {
	IGNORECASE=1
	hobitses=0
}
/fellowship/ {
	if (index($0,"samwise") >0 ) {
		hobitses+=1
		print "Hurry up hobitses"
	}
}
END {
	print "Found a total of " hobitses " hobitses"
}

Check if we have the string "samwise" inside the record (line). Index is a built in function. It takes two strings. If the second string is contained within the first it will return a value bigger than 0. If the second string is not present in the first return 0. If index() returns a value bigger than 0 ("samwise" was found inside the current record (line)) do the following:

increase hobitses by 1

print a message

Options

String concatenation

System commands

-rw-rw-r-- 1 me me 0 Nov 14 17:40 f1
-rw-rw-r-- 1 me me 59 Nov 14 17:41 f2
-rw-rw-r-- 1 me me 20 Nov 12 15:42 col1

Writing dynamically to files

System commands with stdin

-rw-rw-r-- 1 me me 0 Nov 14 17:40 file.txt

Getline example

$ - the positional variable

Modify the positional variable

Selective .csv column print

city,area,population
LA,400,100
Miami,500,101,
Buenos Aires,800,102
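
The command for this recipe is not shown above. Assuming the goal is to print the city and population columns of the sample data, one possible version is:

```shell
csv='city,area,population
LA,400,100
Miami,500,101,
Buenos Aires,800,102'

# -F, sets the input field separator to a comma; NR > 1 skips the header row
cols=$(printf '%s\n' "$csv" | awk -F, 'NR > 1 { print $1, $3 }')

printf '%s\n' "$cols"
```

Fields are addressed by position, so the trailing comma on the Miami row only adds an empty fourth field and does not affect `$1` or `$3`.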

Custom field separator with OFS

Mix with command line text

drwxr-xr-x  2 root root        69632 Nov 13 19:21 .
drwxr-xr-x 16 root root         4096 Nov  9 07:35 ..
-rwxr-xr-x  1 root root        59888 Dec  5  2020 [
-rwxr-xr-x  1 root root        18456 Feb  7  2021 411toppm
-rwxr-xr-x  1 root root           39 Aug 15  2020 7z
-rwxr-xr-x  1 root root           40 Aug 15  2020 7za
-rwxr-xr-x  1 root root           40 Aug 15  2020 7zr
-rwxr-xr-x  1 root root        35344 Jul  1 00:42 aa-enabled
-rwxr-xr-x  1 root root        35344 Jul  1 00:42 aa-exec

Math on text.

city,area,population
LA,400,100
Miami,500,101,
Buenos Aires,800,102

#!/usr/bin/awk -f
BEGIN {
	total=0
	FS=","
}
{
	if (FNR>1) {
		real_pop=$3 * 1000
		total+=real_pop
		print "Real population of", $1, "is" ,real_pop
	}
}

END {
	
	print "Total Population:", total
}

Real population of LA is 100000
Real population of Miami is 101000
Real population of Buenos Aires is 102000
Total Population: 303000

Fancy line numbers

(1) line one
(2) line two
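
Output in this shape can be produced with the built-in `NR` variable (the current record number). The original command is not shown, so this is just one way to do it:

```shell
# NR is the number of the current record; string concatenation builds "(1)", "(2)", ...
numbered=$(printf 'line one\nline two\n' | awk '{ print "(" NR ")", $0 }')

printf '%s\n' "$numbered"
```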

Print words by their number

first second  third
fourth fifth sixth
seventh eight

#!/usr/bin/awk -f
BEGIN { RS="" }   # paragraph mode: the whole block of lines is one record
{
	print $1, $8
}

first eight

Pass stdin to awk (and show nicely formatted size of files)

-rwxr-xr-x  1 root root           39 Aug 15  2020 7z
-rwxr-xr-x  1 root root           40 Aug 15  2020 7za
-rwxr-xr-x  1 root root           40 Aug 15  2020 7zr

Pass both stdin and file to awk

Check if text coming from stdin or file

Arrays intro

#!/usr/bin/awk -f
{
		myarr["hobits"]="hobitses"
		print(myarr["hobits"])
}

Store lines in array

#!/usr/bin/awk -f
/bilbo/ {
	myarr[NR]=$0
}
END{
	for (i in myarr){
		print "subscript is",i
		print myarr[i]
	}
}

subscript is 3
a story by frodo and bilbo
subscript is 13
bilbo again and frodo

Delete array elems

Array index concatenation

#!/usr/bin/awk -f
BEGIN{
	arr[1,2]="one"
	arr["abc","bcd"]="two"
	arr["abc",1]="three"
	arr["foo" "bar"]="four"
	arr["bar" "foo"]="five"
	for (i in arr){
		print "at index",i,"value is",arr[i]
	}
}

at index abc1 value is three
at index abcbcd value is two
at index foobar value is four
at index barfoo value is five
at index 12 value is one

Ordered array indexes

#!/usr/bin/awk -f
BEGIN{
	i=0
	arr[""]=0
}
/bilbo/{
	arr[i++]=$0
}
END{
	for (j=0;j<i;j++){
		print "at index",j,"value is",arr[j]
	}
}

at index 0 value is a story by frodo and bilbo with ring
at index 1 value is bilbo again and frodo and orcs

The power of printf

{printf("%d is nice but %.2f is better\n",1,2)}

The printf Statement
       The  AWK  versions of the printf statement and sprintf() function (see
       below) accept the following conversion specification formats:

       %a, %A  A floating point number  of  the  form  [-]0xh.hhhhp+-dd  (C99
               hexadecimal floating point format).  For %A, uppercase letters
               are used instead of lowercase ones.

       %c      A single character.  If the argument used for %c  is  numeric,
               it  is treated as a character and printed.  Otherwise, the
               argument is assumed to be a string, and only the first character
               of that string is printed.

       %d, %i  A decimal number (the integer part).

       %e, %E  A  floating  point number of the form [-]d.dddddde[+-]dd.  The
               %E format uses E instead of e.

       %f, %F  A floating point number of the  form  [-]ddd.dddddd.   If  the
               system  library  supports it, %F is available as well. This is
               like %f, but uses capital letters for special “not  a  number”
               and “infinity” values. If %F is not available, gawk uses %f.

       %g, %G  Use %e or %f conversion, whichever is shorter, with
               nonsignificant zeros suppressed.  The %G format uses %E instead of %e.

       %o      An unsigned octal number (also an integer).

       %u      An unsigned decimal number (again, an integer).

       %s      A character string.

       %x, %X  An unsigned hexadecimal number (an integer).   The  %X  format
               uses ABCDEF instead of abcdef.

       %%      A single % character; no argument is converted.

       Optional,  additional parameters may lie between the % and the control
       letter:

       count$ Use the count'th argument at  this  point  in  the  formatting.
              This is called a positional specifier and is intended primarily
              for use in translated versions of format strings,  not  in  the
              original text of an AWK program.  It is a gawk extension.

       -      The expression should be left-justified within its field.

       space  For  numeric  conversions, prefix positive values with a space,
              and negative values with a minus sign.

       +      The plus sign, used before the width modifier (see below), says
              to  always  supply  a sign for numeric conversions, even if the
              data to be formatted is positive.  The +  overrides  the  space
              modifier.

       #      Use  an  “alternate form” for certain control letters.  For %o,
              supply a leading zero.  For %x, and %X, supply a leading 0x  or
              0X for a nonzero result.  For %e, %E, %f and %F, the result
              always contains a decimal point.  For %g, and %G, trailing  zeros
              are not removed from the result.

       0      A  leading  0  (zero)  acts  as  a flag, indicating that output
              should be padded with zeroes instead of spaces.   This  applies
              only  to the numeric output formats.  This flag only has an
              effect when the field  width  is  wider  than  the  value  to  be
              printed.

       '      A  single quote character instructs gawk to insert the locale's
              thousands-separator character into decimal numbers, and to also
              use  the  locale's  decimal point character with floating point
              formats.  This requires correct locale support in the C library
              and in the definition of the current locale.

       width  The  field  should  be padded to this width.  The field is
              normally padded with spaces.  With the 0 flag, it is  padded  with
              zeroes.

       .prec  A  number  that  specifies  the precision to use when printing.
              For the %e, %E, %f and %F, formats, this specifies  the  number
              of  digits  you want printed to the right of the decimal point.
              For the %g, and %G formats, it specifies the maximum number  of
              significant  digits.   For  the %d, %i, %o, %u, %x, and %X
              formats, it specifies the minimum number of digits to print.   For
              the  %s  format,  it specifies the maximum number of characters
              from the string that should be printed.

Selective file processing

#!/usr/bin/awk -f
{
	if (FILENAME=="skip.txt" || FILENAME=="skip2.txt"){
		nextfile
	}
	print $0
}

Skip records (lines) based on a certain condition

Some math funcs

{
	print "log",log($1),$2
	print "cos",cos($1),$2
	print "sin",sin($1),$2
	print "rand",rand()
}

Print some random integers (1000 rand ints between 0-99)

#!/usr/bin/awk -f
BEGIN {
for (i=0; i<1000; i++) printf("%d\n",rand()*100)
}

String funcs - index

#!/usr/bin/awk -f
{
	i=index($0,"bilbo")
	print $0
	if (i>0) print ">>>> INDEX IS",i
}

String funcs - length

[ Hello world world ] has length of 17
[ hello andback again Again ] has length of 25
[ a story by frodo and bilbo with ring ] has length of 36
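
Output in the shape shown above can be produced with length($0). This is a sketch only: the function name show_length is illustrative and the input is one of the sample lines.

```shell
# Illustrative helper: print each record with its length in characters.
show_length() {
  awk '{ print "[", $0, "] has length of", length($0) }'
}

printf 'Hello world world\n' | show_length
```

The commas in print join the pieces with the output field separator (a space by default), which gives the padding inside the brackets.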

String funcs - split

#!/usr/bin/awk -f
BEGIN {
	mystring="The best leaf in the Shire, isn't it?"
	n=split(mystring,array," ")
	# Loop by index: "for (i in array)" does not guarantee order
	for (i=1; i<=n; i++) print array[i]
	print "TOTAL splits:",n
}
The
best
leaf
in
the
Shire,
isn't
it?
TOTAL splits: 8

String funcs - substr

#!/usr/bin/awk -f
BEGIN {
	mystring="lotr is cool"
	n=substr(mystring,1,3)
	print n
}

String funcs - gensub

#!/usr/bin/awk -f
BEGIN {
	mystring="Run Halifax. Show us the meaning of haste."
	res=gensub(/[Hh]ali/,"Shadow","g",mystring)
	print res
}

String func - gsub
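
A minimal sketch, assuming the goal is to replace every match in the record. gsub() edits $0 in place and returns the number of replacements. The function name replace_all and the sample text are illustrative.

```shell
# Illustrative helper: replace every "bilbo" with "frodo" and report the count.
replace_all() {
  awk '{ n = gsub(/bilbo/, "frodo"); print n ": " $0 }'
}

printf 'bilbo met bilbo\n' | replace_all
```

Both occurrences are replaced, so the line comes out as "2: frodo met frodo".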

String func - sub
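
A minimal sketch, assuming the goal is to replace only the first match in the record. sub() edits $0 in place and stops after one replacement. The function name replace_first and the sample text are illustrative.

```shell
# Illustrative helper: replace only the first "bilbo" with "frodo".
replace_first() {
  awk '{ sub(/bilbo/, "frodo"); print }'
}

printf 'bilbo met bilbo\n' | replace_first
```

The second occurrence is left alone, so the line comes out as "frodo met bilbo".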

String func - match

#!/usr/bin/awk -f
{
	i=match($0,/bilbo/)
	if (i>0){
		res=substr($0,i,5)
		print res
	}
}

String func - tolower
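
A minimal sketch: tolower() returns a lowercase copy of its argument and leaves the original untouched. The function name lower_all is illustrative.

```shell
# Illustrative helper: print every record in lowercase.
lower_all() {
  awk '{ print tolower($0) }'
}

printf 'Hello WORLD\n' | lower_all
```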

String func - toupper
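
A minimal sketch: toupper() returns an uppercase copy of its argument and leaves the original untouched. The function name upper_all is illustrative.

```shell
# Illustrative helper: print every record in uppercase.
upper_all() {
  awk '{ print toupper($0) }'
}

printf 'Hello world\n' | upper_all
```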

String func - asort

#!/usr/bin/awk -f
BEGIN {
	for (i=0;i<5;i++){
		arr[i]=rand()*100
		print arr[i]
	}
	print ">> SORTED"
	n=asort(arr)
	# asort() renumbers the array from 1 to n; loop by index to keep the sorted order
	for (i=1;i<=n;i++) print arr[i]
}

92.4046
59.3909
30.6394
57.8941
74.0133
>> SORTED
30.6394
57.8941
59.3909
74.0133
92.4046

Time func - strftime

%a	The locale's abbreviated weekday name
%A	The locale's full weekday name
%b	The locale's abbreviated month name
%B	The locale's full month name
%c	The locale's "appropriate" date and time representation
%d	The day of the month as a decimal number (01-31)
%H	The hour (24-hour clock) as a decimal number (00-23)
%I	The hour (12-hour clock) as a decimal number (01-12)
%j	The day of the year as a decimal number (001-366)
%m	The month as a decimal number (01-12)
%M	The minute as a decimal number (00-59)
%p	The locale's equivalent of AM/PM
%S	The second as a decimal number (00-61)
%U	The week number of the year (Sunday is first day of week)
%w	The weekday as a decimal number (0-6). Sunday is day 0
%W	The week number of the year (Monday is first day of week)
%x	The locale's "appropriate" date representation
%X	The locale's "appropriate" time representation
%y	The year without century as a decimal number (00-99)
%Y	The year with century as a decimal number
%Z	The time zone name or abbreviation
%%	A literal %.

Extract the inverse of a regex match

#!/usr/bin/awk -f
{
	reg="[Bb]ilbo"
	if (match($0,reg)){
		bef=substr($0,1,RSTART-1)
		aft=substr($0,RSTART+RLENGTH)
		pat=substr($0,RSTART,RLENGTH)
		print bef,"|",pat,"|",aft
	}
	else print $0
}

Hello world world
hello andback again Again
a story by frodo and  | bilbo |  with ring with bilbo
 | Bilbo |  baggins baggins baggins

Put all lines on one line
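
A minimal sketch, assuming the lines should be joined with a single space. printf does not add a newline, so each record is appended to the same output line, and the END rule terminates it. The function name join_lines is illustrative.

```shell
# Illustrative helper: join all input lines into one, separated by spaces.
join_lines() {
  awk '{ printf "%s%s", (NR > 1 ? " " : ""), $0 }   # separator before every line but the first
       END { print "" }                             # finish with a single newline'
}

printf 'one\ntwo\nthree\n' | join_lines
```

The three input lines come out as "one two three".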

User declared funcs


#!/usr/bin/awk -f

# Declare custom func outside
function throw_ring(who){
	if (who=="gollum"){
		return 0
	}
	else if (who=="frodo"){
		return 1
	}
}

# On all records matching /ring/
/ring/{
	# Find gollum or frodo
	if (match($0,"gollum")){
		was_thrown=throw_ring("gollum")
		# Ternary if/else: if was_thrown is true (nonzero) print "THROWN", else print "NOT THROWN"
		print "the ring throw status is", was_thrown?"THROWN":"NOT THROWN"
	}

	if (match($0,"frodo")){
		was_thrown=throw_ring("frodo")
		print "the ring throw status is", was_thrown?"THROWN":"NOT THROWN"
	}
}

More about conditional patterns

Advanced conditional patterns

Print the first N lines of every file
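
A minimal sketch using FNR, the record number within the current file, which restarts at 1 for each file. The function name head_each, its first argument (the line count), and the temporary sample files are illustrative.

```shell
# Illustrative helper: head_each N file... prints the first N lines of each file.
head_each() {
  n=$1; shift
  awk -v n="$n" 'FNR <= n' "$@"
}

# Throwaway sample files, created only for this demonstration.
dir=$(mktemp -d)
printf 'a1\na2\na3\n' > "$dir/a.txt"
printf 'b1\nb2\nb3\n' > "$dir/b.txt"
head_each 2 "$dir/a.txt" "$dir/b.txt"
rm -r "$dir"
```

With the two sample files this prints a1, a2, b1, b2. On large files, adding a rule that calls nextfile once FNR exceeds n would avoid reading the rest of each file.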

Print until you encounter this pattern. Then move to next file.
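
A minimal sketch, assuming the stop pattern is the literal word STOP and that the matching line itself should not be printed. Instead of nextfile it uses a per-file flag that is reset when FNR is 1, which gives the same output on awk versions without nextfile but still reads each file to the end. The function name print_until_stop and the sample files are illustrative.

```shell
# Illustrative helper: print each file up to (not including) the first STOP line.
print_until_stop() {
  awk 'FNR == 1 { done = 0 }   # new file: start printing again
       /STOP/   { done = 1 }   # pattern seen: suppress the rest of this file
       !done' "$@"
}

# Throwaway sample files, created only for this demonstration.
dir=$(mktemp -d)
printf 'a1\nSTOP\na3\n' > "$dir/a.txt"
printf 'b1\nb2\nSTOP\n' > "$dir/b.txt"
print_until_stop "$dir/a.txt" "$dir/b.txt"
rm -r "$dir"
```

With the two sample files this prints a1, b1, b2.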

Print every Nth line of file
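
A minimal sketch: a record is printed when its number NR is a multiple of N. The function name every_nth and its argument are illustrative.

```shell
# Illustrative helper: every_nth N prints lines N, 2N, 3N, ...
every_nth() {
  awk -v n="$1" 'NR % n == 0'
}

printf '%s\n' 1 2 3 4 5 6 7 8 9 | every_nth 3
```

For the nine sample lines this prints 3, 6 and 9. With several input files, FNR in place of NR would restart the count for each file.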

Vim

HOW TO READ SHORTCUTS:

<Esc> - Escape. Usually you should press the key, not type it. When you see ii<Esc> you type ii and then press Escape, not 'ii<Esc>' literally.

<C-g> - means Ctrl+G

g<C-g> - means press g then press Ctrl+g

<S-g> - means press Shift+g

<C-Z> - means Ctrl+Shift+Z (Shift is not shown but implied since we have an uppercase Z)

<zZ> - z followed by Shift+Z

<C-m><CR> - type Ctrl+M followed by Enter. <CR> is Carriage Return (another word for Enter key)

Find files and send them to vim.

find ~ -size +10M | vim -

Encrypt

Macros

Open files in own tab/win

Info

Enter special chars

Quickly write buffer to disk

Sort

Paste register

Insert mode navigation

Autocomplete

How to quit Vim

Repeat Insert

On the fly computations

History of past searches

Useful selection of g commands

Paste word under cursor

Create and use abbreviation.

:iab sep #-----------------------------------------------<CR>

Now, in Insert mode, type sep and press Tab. Your long separator is inserted. Using :iab creates the abbreviation for Insert mode only and keeps the other modes clean.

List words under cursor.

Pattern searching tricks

Use expression register to store and iterate

Save and load your sessions.

Go back in history.

Go back to the line you last edited

Terminal inside vim

Persistent undo

set undodir=~/.vim/undo-dir
set undofile

Load the previous command

Append your registers.

Range shortcut

Visual Block Syntax

I'm a long line, very very long, YEAH.
Short.
I'm a long line, very very long, YEAH.

Quickly run an external command

Replace with confirmation

Documentation at your fingertips

Input/output with external commands.

  1. :!date - execute the date command and print its output.
  2. :r !date - execute the date command and insert its output after the last line of the range. With no range, the current line is used, so here the output of date is inserted after the current line.
  3. :w !date - send all lines to the date command as stdin and print the result (nothing is inserted into the current buffer).
  4. :1,3!date - send lines 1 to 3 to the date command as stdin and replace those lines with its output.

Put each word on a new line

Buffer delete

Autocomplete with files/lines

Repeat your insert

Better wrap

hello wo
rld of v
im
hello 
world of 
vim

Indent in Insert mode

Smart folding

Increment numbers

Copy a protected file and edit it immediately.

Comment a whole file (or a portion)

Keep temp commands in registers

#bla bla
-1,+1norm i#

Open extra files with terminal commands like find

:args `find /var/log -size +1M -name '*.py' \|\| true`

(OR)

:args `find /var/log -size +1M -name '*.py' \| xargs -n1`

Format with care

Patterns

Quickly edit and reload your .vimrc