pz

Ever wished to use Python in Bash? Would you choose Python syntax over sed, awk, ...? If you know exactly which command you would use in Python, yet you end up querying man again and again, read on. The utility lets you pythonize the shell: pipe arbitrary contents through pz, loaded with your tiny Python script.

How? Simply meddle with the s variable. Example: appending '.com' to every line.

$ echo -e "example\nwikipedia" | pz 's += ".com"'
example.com
wikipedia.com
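
Conceptually, the snippet runs once per input line with the line bound to s, and whatever s holds afterwards is printed. The pure-Python sketch below mimics that loop for the example above. The function name `pythonize` is hypothetical, and this illustrates the idea only; it is not pz's actual implementation.

```python
def pythonize(command, lines):
    """Run `command` once per line with the line bound to `s`; yield the resulting `s`."""
    for line in lines:
        scope = {"s": line}
        exec(command, scope)  # the snippet may reassign `s`
        yield scope["s"]


for out in pythonize('s += ".com"', ["example", "wikipedia"]):
    print(out)
```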

Installation

Install with a single command from PyPI.

pip3 install pz    

Or download and launch the pz file from here.

Examples

How does your data look when pythonized via pz? Which Bash programs may the utility substitute?

Extract a substring

Just use the [:] notation.

echo "hello world" | pz s[6:]  # world

<sub>Note that omitting the quotes around the argument may not work (Zsh) or may lead to unexpected behaviour: touch s1 && echo "hello" | pz s[1] fails with Exception: <class 'NameError'>, because the shell expands the glob s[1] to the file name s1 before pz sees it. Use echo "hello" | pz 's[1]' instead.</sub>

Prepend to every line in a stream

We prepend the length of the line.

# let's use the f-string `--format` flag
tail -f /var/log/syslog | pz -f '{len(s)}: {s}' 

# or do it the long way, explicitly setting the `s` variable
tail -f /var/log/syslog | pz 's = str(len(s)) + ": " + s'

Converting to lowercase

Replacing | tr '[:upper:]' '[:lower:]'.

echo "HELLO" | pz s.lower  # "hello"

Reversing lines

Replacing | tac or | tail -r (on some systems only) or | sed '1!G;h;$!d' (for cool guys only)

$ echo -e "1\n2\n3" | pz -E 'lines[::-1]'
3
2
1

Parsing numbers

Replacing cut. Note you can chain multiple pz calls. Split by a comma ',', then use n to access the line converted to a number.

echo "hello,5" | pz 's.split(",")[1]' | pz n+7  # 12

Find out all URLs in a text

Replacing sed. All functions from the re library are already available, e.g. findall.

# either use the `--findall` flag
pz --findall "(https?://[^\s]+)" < file.log

# or spell out the full command that the `--findall` flag is equivalent to
pz "findall(r'(https?://[^\s]+)', s)" < file.log
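
For comparison, here is a minimal standalone Python sketch of the same per-line extraction using re.findall directly. The helper name `find_urls` and the sample line are made up for illustration.

```python
import re

URL_PATTERN = r"(https?://[^\s]+)"  # same pattern as in the pz example


def find_urls(line):
    """Return every URL-like substring found in one line."""
    return re.findall(URL_PATTERN, line)


print(find_urls("see https://example.com and http://example.org/page now"))
```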

If chained, you can open all the URLs in the current web browser. Note that the function webbrowser.open gets auto-imported from the standard library.

pz --findall "(https?://[^\s]+)" < file.log | pz webbrowser.open

Sum numbers

Replacing | awk '{count+=$1} END{print count}' or | paste -sd+ | bc. Just use sum in the --end clause.

# internally changed to --end `s = sum(numbers)`
echo -e "1\n2\n3\n4" | pz --end sum  # 10

Keep unique lines

Replacing | sort | uniq makes little sense, but the demonstration gives you the idea. We initialize a set c (like a collection). When processing a line, skip is set to True if already seen.

$ echo -e "1\n2\n2\n3" | pz "skip = s in c; c.add(s)"  --setup "c=set()"
1
2
3

However, an advantage over | sort | uniq comes when handling a stream. You see unique lines instantly, without waiting for the stream to finish. Useful in combination with tail --follow.
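
The streaming behaviour can be sketched in plain Python as a generator that remembers what it has already seen. The name `unique` is hypothetical; the point is that each new line is emitted immediately, without waiting for the input to end.

```python
def unique(lines):
    """Yield each line the first time it appears, as soon as it arrives."""
    seen = set()
    for line in lines:
        if line in seen:
            continue  # corresponds to `skip = True`
        seen.add(line)
        yield line


print(list(unique(["1", "2", "2", "3"])))
```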

Alternatively, to ensure the values are sorted, we can make use of the --end flag, which produces the output after the processing has finished.

echo -e "1\n2\n2\n3" | pz "S.add(s)" --end "sorted(S)" -0

Note that we used the variable S, which is initialized by default to an empty set (hence we do not have to use --setup at all), and the flag -0 to suppress the per-line output (so we do not have to use the skip variable).

<sub>(Strictly speaking we could omit -0 too. If you use the verbose -v flag, you would see the command changed to s = S.add(s) internally. And since set.add produces None output, it is the same as if it was skipped.)</sub>

We can omit (s) in the main clause and hence get rid of the quotes altogether.

echo -e "1\n2\n2\n3" | pz S.add --end "sorted(S)"

Nevertheless, the most straightforward approach would involve the lines variable, available when using the --end clause.

echo -e "1\n2\n2\n3" | pz --end "sorted(set(lines))"

Counting words

We split the line to get the words and put them in S, a global instance of the set. Then, we print the set length to get the number of unique words.

echo -e "red green\nblue red green" | pz 'S.update(s.split())' --end 'len(S)'  # 3

But what if we want the most common words and their usage counts? Let's use C, a global instance of collections.Counter. We then see that red tops the most_common list, having been used 2 times (tied with green).

$ echo -e "red green\nblue red green" | pz 'C.update(s.split())' --end C.most_common
red	2
green	2
blue	1
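
The same counting can be written out in plain Python, which may help when reading the one-liner. This is a sketch with a hypothetical helper name `word_counts`; it uses collections.Counter just as the C variable does.

```python
from collections import Counter


def word_counts(lines):
    """Count words over all lines and return (word, count) pairs, most common first."""
    counter = Counter()
    for line in lines:
        counter.update(line.split())
    return counter.most_common()


for word, count in word_counts(["red green", "blue red green"]):
    print(f"{word}\t{count}")
```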

Aggregating suffixes in a directory

To get a quick idea of how many files of each extension live in a directory, first convert the file names to their suffixes. Then, feed them to the collections.Counter constructor.

$ ls
a.txt  b.txt  c.txt  v1.mp4  v2.mp4

$ ls | pz 'Path(s).suffix' | pz --end 'Counter(lines).most_common' 
.txt	3
.mp4	2
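
Written as a standalone Python sketch, the two chained steps collapse into one expression. The helper name `suffix_counts` is hypothetical, and the file names are taken from the listing above.

```python
from collections import Counter
from pathlib import Path


def suffix_counts(names):
    """Map file names to their suffixes and count them, most common first."""
    return Counter(Path(name).suffix for name in names).most_common()


print(suffix_counts(["a.txt", "b.txt", "c.txt", "v1.mp4", "v2.mp4"]))
```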

Fetching web content

Accessing the internet is easy thanks to the requests library. Here, we fetch example.com, grep it for all lines containing "href" and print them out while stripping spaces.

$ echo "http://example.com" | pz 'requests.get(s).content' | grep href | pz s.strip 
<p><a href="https://www.iana.org/domains/example">More information...</a></p>

To see how auto-imports are resolved, use the verbose mode. (Notice the line Importing requests.)

$ echo "http://example.com" | pz 'requests.get(s).content' -v | grep href | pz s.strip 
Changing the command clause to: s = requests.get(s).content
Importing requests
<p><a href="https://www.iana.org/domains/example">More information...</a></p>

Handling nested quotes

To match every line that contains a quoted expression and print out the quoted contents, you may make use of Python triple quotes. In the example below, an apostrophe delimits the COMMAND argument. If we used an apostrophe in the text, we would have to escape it. Instead, triple quotes might improve readability.

echo -e 'hello "world".' | pz 'match(r"""[^"]*"(.*)".""", s)' # world

In that case, even better is to use the --match flag to get rid of the quoting as much as possible.

echo -e 'hello "world".' | pz --match '[^"]*"(.*)"'  # world
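
To check what the pattern captures, the same regular expression can be tried with re.match in plain Python. This is only a sketch; the helper name `quoted_content` is made up, and returning None for a non-matching line is an assumption of this example.

```python
import re


def quoted_content(line):
    """Return the text between the first and the last double quote, if any."""
    found = re.match(r'[^"]*"(.*)"', line)
    return found.group(1) if found else None


print(quoted_content('hello "world".'))
```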

Computing factorial

Take a look at multiple ways. The simplest is to use the function.

echo 5 | pz factorial  # 120

What happens in the background? factorial is available from math.factorial. Since it is a callable, we try to pass the current line as the parameter: factorial(s). Since s = "5" is a string, it fails. pz then tries factorial(n), where n is the current line automatically converted to a number. That works.
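
The fallback described above can be sketched as follows. The helper name `apply_callable` is hypothetical, and which exceptions pz really catches is an assumption here; the sketch only shows the "try the string, then try the number" idea.

```python
from math import factorial


def apply_callable(func, s):
    """Call func with the raw line; if it rejects a string, retry with the line as a number."""
    try:
        return func(s)
    except TypeError:
        return func(int(s))  # assumption: the line holds an integer


print(apply_callable(factorial, "5"))
print(apply_callable(str.upper, "abc"))
```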

Harder way? Let's use math.prod then.

echo 5 | pz 'prod(i for i in range(1,n+1))'  # 120

Without any built-in library? Let's just use a for loop. Multiply n by every number from 1 to n-1 (range(1,n) is evaluated once, while n is still 5). Finally, assign n to s, which is output.

echo 5 | pz 'for c in range(1,n): n*= c ; s = n'   # 120
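
Modifying n inside its own loop may look suspicious, so here is the same loop as a plain Python function. The name `factorial_by_loop` is hypothetical.

```python
def factorial_by_loop(n):
    # range(1, n) is created once, while n still holds the input value,
    # so later changes to n do not affect the loop bounds.
    for c in range(1, n):
        n *= c
    return n


print(factorial_by_loop(5))
```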

Using the generator prints the factorial of every number from 1 up to the value given to -g.

$ pz factorial -g5
1
2
6
24
120

Read CSV

As csv is one of the auto-imported libraries, we may directly instantiate the reader object. In the following example, we output the second element of every line, either progressively or at once when the processing has finished.

# output line by line
echo '"a","b1,b2,b3","c"' | pz "(x[1] for x in csv.reader([s]))"  # "b1,b2,b3"

# output at the end
echo '"a","b1,b2,b3","c"' | pz --end "(x[1] for x in csv.reader(lines))"  # "b1,b2,b3"   
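
In plain Python, the same extraction looks like this. csv.reader accepts any iterable of strings, which is why a list of lines works; the helper name `second_fields` is hypothetical.

```python
import csv


def second_fields(lines):
    """Parse CSV lines and return the second field of every row."""
    return [row[1] for row in csv.reader(lines)]


print(second_fields(['"a","b1,b2,b3","c"']))
```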

Generate random number

First, take a look at how to stream random numbers from 1 to 100 in Bash.

while :; do echo $((1+$RANDOM%100)); done

Now examine a pure Python solution, without pz involved.

python3 -c "while True: from random import randint; print(randint(1,100))"

Using pz, we relieve the command of the loop handling and the importing burden.

pz "randint(1,100)" --generate=0

Let's generate a few random strings of variable length from 1 to 30. When the generator flag is used without a number, it cycles five times.

pz "''.join(random.choice(string.ascii_letters) for _ in range(randint(1,30)))" -S "import string" -g

Average a stream value

Let's have a stream and output the average value.

# print out current line `count` and current average `sum/count`
$ while :; do echo $((1 + $RANDOM % 100)) ; sleep 0.1; done | pz 'sum+=n;s=count, sum/count' --setup "sum=0"
1	38.0
2	67.0
3	62.0
4	49.75

# print out every 10 000th line
# (thanks to the `not count % 10000` expression)
$ while :; do echo $((1 + $RANDOM % 100)) ;  done | pz 'sum+=n;s=sum/count; s = (count,s) if not count % 10000 else ""' --setup "sum=0"
10000	50.9058
20000	50.7344
30000	50.693466666666666
40000	50.5904

How can this be simplified? Let's use an infinite generator -g0. The generator sets n to the current line number, and i is implicitly initialized to 0 by default, so we use it to hold the sum. No setup clause needed. No Bash loop needed.

$ pz "i+=randint(1,100); s = (n,i/n) if not n % 10000 else ''" -g0
10000	49.9488
20000	50.5399
30000	50.39906666666667
40000	50.494425

Multiline statements

Should you need to evaluate a short multiline statement, use a standard multiline string, as supported by Bash.

$ echo -e "1\n2\n3" | pz "if n > 2:
  s = 'bigger'
else:
  s = 'smaller'
"
smaller
smaller
bigger

Simple progress bar

Simulate lengthy processing by generating a long sequence of numbers (as they are not needed, we throw them away with 1>/dev/null). On every 100th line, we move the cursor up (\033[1A), clear the line (\033[K) and print the current status to STDERR.

$ seq 1 100000 | pz 's = f"\033[1A\033[K ... {count} ..." if count % 100 == 0 else None ' --stderr 1>/dev/null
 ... 100 ...  # replaced by ... 200 ...

Docs

Scope variables

In the script scope, you have access to the following variables:

s – current line

Change it according to your needs

echo 5 | pz 's += "4"'  # 54 

n – current line converted to an int (or float) if possible

echo 5 | pz n+2  # 7
echo 5.2 | pz n+2  # 7.2
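
The conversion behind n can be pictured as "try int first, then float". The sketch below uses a hypothetical helper `to_number`; returning None when the line is not numeric is an assumption of this example, not documented pz behaviour.

```python
def to_number(text):
    """Convert a line to int if possible, else to float, else give up."""
    try:
        return int(text)
    except ValueError:
        try:
            return float(text)
        except ValueError:
            return None  # assumption: no numeric value available


print(to_number("5") + 2)
print(to_number("5.2") + 2)
```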

b – current line as a byte-string

Sometimes the input cannot be converted to str easily. A warning is output; however, you can still operate on the raw bytes.

echo -e '\x80 invalid line' | pz s
Cannot parse line correctly: b'\x80 invalid line'
� invalid line

# use the `--quiet` flag to suppress the warning, then decode the bytes
echo -e '\x80 invalid line' | pz 'b.decode("cp1250")' --quiet
€ invalid line
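
The difference between the two outputs can be reproduced in plain Python. The byte 0x80 is not valid UTF-8 on its own, but in the cp1250 code page it encodes the euro sign. That pz falls back to a lenient UTF-8 decode is an assumption of this sketch; the replacement character in the output above merely suggests it.

```python
raw = b"\x80 invalid line"

# Lenient UTF-8 decoding substitutes U+FFFD for the undecodable byte.
lenient = raw.decode("utf-8", errors="replace")

# Decoding with the right code page recovers the intended character.
decoded = raw.decode("cp1250")

print(lenient)
print(decoded)
```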

count – current line number

# display every 1000th line
$ pz -g0 n*3 | pz "n if not count % 1000 else None"
3000
6000
9000

# the same, using the `--filter` flag
$ pz -g0 n*3 | pz -F "not count % 1000"

text – whole text, all lines together

Not available when the --overflow-safe flag is set, nor in the main clause unless the --whole flag is set. Ex: get the character count (an alternative to | wc -c).

echo -e "hello\nworld" | pz --end 'len(text)' # 11

When used in the main clause, an error appears.

$ echo -e "1\n2\n3" | pz 'len(text)'
Did not you forget to use --text?
Exception: <class 'NameError'> name 'text' is not defined on line: 1

Appending --whole helps, but the result is processed for every line.

$ echo -e "1\n2\n3" | pz 'len(text)' -w 
5
5
5

Appending -1 makes sure the statement gets computed only once.

$ echo -e "1\n2\n3" | pz 'len(text)' -w1
5

lines – list of lines so far processed

Not available with the --overflow-safe flag set.
Ex: returning the last line

echo -e "hello\nworld" | pz --end lines[-1]  # "world"

numbers – list of numbers so far processed

Not available with the --overflow-safe flag set.
Ex: show current average of the stream. More specifically, we output tuples: line count, current line, average.

$ echo -e "20\n40\n25\n28" | pz 's = count, s, sum(numbers)/count'
1	20	20.0
2	40	30.0
3	25	28.333333333333332
4	28	28.25
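
For long streams, summing the whole numbers list on every line repeats work. A running total gives the same averages in plain Python, as the sketch below shows; the generator name `running_average` is hypothetical.

```python
def running_average(values):
    """Yield (count, average so far) after each value, keeping only a running total."""
    total = 0
    for count, value in enumerate(values, start=1):
        total += value
        yield count, total / count


for count, average in running_average([20, 40, 25, 28]):
    print(f"{count}\t{average}")
```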

skip – skip the current line

If set to True, current line will not be output. If set to False when using the -0 flag, the line will be output regardless.

i, S, L, D, C – other global variables

Some variables are initialized and ready to be used globally. They are common for all the lines.

<sub>It is true that using uppercase does not conform to the naming convention. However, in these tiny scripts readability is the chief principle and every character counts.</sub>

Using a set S. In the example, we add every line to the set and, at the end, print it out in sorted order.

$ echo -e "2\n1\n2\n3\n1" | pz "S.add(s)" --end "sorted(S)"
1
2
3  

Using a list L. Append the lines that contain a number bigger than one and, finally, print their count. As only the final count matters, suppress the line output with the flag -0.

$ echo -e "2\n1\n2\n3\n1" | pz "if n > 1: L.append(s)" --end "len(L)" -0
3  

Auto-import

Caveat: When a name is accessed for the first time, the auto-import causes the row to be reprocessed. This may influence your global variables. Use the verbose output to see whether something has been auto-imported.

$ echo -e "hey\nbuddy" | pz 'a+=1; sleep(1); b+=1; s = a,b ' --setup "a=0;b=0;" -v
Importing sleep from time
2	1
3	2

As seen, a was incremented three times and b only twice, because we had to process the first line twice in order to auto-import sleep. In the first run, the processing raised an exception because sleep was not known. To prevent that, explicitly add from time import sleep to the --setup flag.
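
The side effect can be reproduced with a small simulation of "run, import on NameError, run again". Everything here is hypothetical: the `AUTO_IMPORTS` table, the helper names, and the retry mechanism itself, which is a guess at the general idea rather than pz's real code.

```python
import importlib

AUTO_IMPORTS = {"sleep": "time"}  # hypothetical table: name -> module providing it


def run_line(command, scope):
    """Execute the command; on an unknown name, import it and run the whole line again."""
    try:
        exec(command, scope)
    except NameError as error:
        missing = str(error).split("'")[1]  # "name 'sleep' is not defined" -> "sleep"
        if missing not in AUTO_IMPORTS:
            raise
        module = importlib.import_module(AUTO_IMPORTS[missing])
        scope[missing] = getattr(module, missing)
        exec(command, scope)  # statements before the failure run a second time


def process(lines, command, setup):
    """Feed every line through the command, sharing one scope across lines."""
    scope = dict(setup)
    results = []
    for line in lines:
        scope["s"] = line
        run_line(command, scope)
        results.append(scope["s"])
    return results


# sleep(0) keeps the sketch fast; the original example sleeps for one second
print(process(["hey", "buddy"], "a += 1; sleep(0); b += 1; s = a, b", {"a": 0, "b": 0}))
```

In this simulation, the first line increments a twice before b is touched once, which matches the 2 and 1 shown in the output above.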

Output

CLI flags

Command clauses

Input / output

Regular expressions shortcuts

Bash completion

  1. Run: apt-get install bash-completion jq
  2. Copy: extra/pz-autocompletion.bash to /etc/bash_completion.d/
  3. Restart terminal