hsp

hsp is a command line text processor that provides most of the functionality of grep, sed, and awk, and much more, using standard Haskell text and list functions as well as custom functions. hsp uses a Haskell interpreter (from the hint package) that makes available any function or operator defined in the Prelude, Data.Text (qualified as T), Data.List, and several other modules. The interface is largely based on the Python Pyed Piper project, first developed by Toby Rosen at Sony Imageworks. However, hsp is much faster.

Why use hsp?

The Unix utilities are powerful and fast, but their syntax can be difficult to remember. For a regular Haskell user, hsp provides an intuitive, easy-to-master syntax with acceptable performance even on large amounts of text. Another advantage of hsp over its Unix cousins is its easy-to-use macro facility, which offers a convenient way to save, list, search, and recall complex, frequently used text manipulations without the need to collect and manage executable scripts.

Installation

Step 1

Install stack if not already installed. See https://docs.haskellstack.org/en/stable/install_and_upgrade/ for instructions.

Step 2

Download the source from github, build the executable, and install the hsp script, using the following commands.

git clone https://github.com/bawolk/hsp
cd hsp
make && make install

This will create an executable script hsp and copy it to ~/.local/bin. If that directory does not exist, or that directory is not in your $PATH, simply copy the hsp file from the hsp source directory to a directory in your $PATH. The hsp script runs stack in the downloaded source directory, so the source directory should not be deleted.

A simple example

Suppose you have a large list of words in a file (say the ~102K words in /usr/share/dict/words or the equivalent file provided by most Linux distributions). You wish to find all of the words that have five or more characters and are palindromes. Furthermore, when you output the result, you wish to reverse the order of the words and prefix each word with the string "Palindrome: ". We can accomplish this with hsp as follows:

$ cat /usr/share/dict/words | hsp 'T.length p > 4 && p == T.reverse p | reverse pp | "Palindrome: " <> p'
Palindrome: tenet
Palindrome: stats
(+ 15 more)

This contrived example illustrates several important features of hsp. The command string provided to hsp is contained within single quotes. Double quotes are reserved for string literals. The command string can be divided into pipes using the "|" character, quite analogous to Unix pipes. The output of each pipe is the input to the next. p and pp are the workhorse variables of hsp. p represents the line-by-line output from the previous pipe. pp represents a list of all the lines of the previous pipe. If the statement within a given pipe yields a Bool (i.e., True or False), input lines that yield True are kept and lines that yield False are discarded. As a general rule, any Haskell function that takes a Text argument can be used with p and any Haskell function that takes a list argument (technically, Ord a => [a], which is why a function such as sort will work) can be used with pp.
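In plain Haskell terms, the three pipes of the example behave roughly like a filter, a whole-list function, and a map. The sketch below is only a model of those semantics, not hsp's implementation; the name palindromePipeline and the sample words are made up for illustration.

```haskell
{-# LANGUAGE OverloadedStrings #-}
import           Data.Text    (Text)
import qualified Data.Text    as T
import qualified Data.Text.IO as TIO

-- Model of: T.length p > 4 && p == T.reverse p | reverse pp | "Palindrome: " <> p
palindromePipeline :: [Text] -> [Text]
palindromePipeline =
    map ("Palindrome: " <>)                              -- pipe 3: a Text result per line (p)
  . reverse                                              -- pipe 2: a function of the whole list (pp)
  . filter (\p -> T.length p > 4 && p == T.reverse p)    -- pipe 1: a Bool result keeps or drops p

main :: IO ()
main = mapM_ TIO.putStrLn
         (palindromePipeline ["level", "noon", "stats", "tenet", "word"])
```

With this sample input, "noon" is dropped for being too short and "word" for not being a palindrome; the remaining three words are printed in reverse order with the prefix.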

Splits, joins, and lists

In the simple example above, the output of each of the three pipes in the pipeline is a list of Text. But there is a second acceptable type of output: a list of lists of Text. This allows a user to use a Haskell expression to divide each line into fields, which are then passed to the next pipe for further manipulation. To this end, hsp provides a number of special built-in splits and joins that cover most of the common ways lines are split and then joined.

  $ echo $'a:1:cat\nb:3:dog\nd:7:fish' | hsp 'c'
  [0][[0]a[1]1[2]cat]
  [1][[0]b[1]3[2]dog]
  [2][[0]d[1]7[2]fish]
  $ echo $'1@@cat\n3@@dog\n1@@fish' |hsp 'splitOn "@@" p'
  [0][[0]1[1]cat]
  [1][[0]3[1]dog]
  [2][[0]1[1]fish]
  $ echo $'a:1:cat\nb:3:dog\nd:7:fish' | hsp 'colon | u'
  a_1_cat
  b_3_dog
  d_7_fish
  $ echo $'a:1:cat\nb:3:dog\nd:7:fish' | hsp 'c | T.intercalate "%%" p'
  a%%1%%cat
  b%%3%%dog
  d%%7%%fish
  $ echo $'a:1:cat\nb:3:dog\nd:7:fish' | hsp 'c | drop 1 p'
  [0][[0]1[1]cat]
  [1][[0]3[1]dog]
  [2][[0]7[1]fish]
  $ echo $'a:1:cat\nb:3:dog\nd:7:fish' | hsp 'c | p !!! [2, 1, 2]'
  [0][[0]cat[1]1[2]cat]
  [1][[0]dog[1]3[2]dog]
  [2][[0]fish[1]7[2]fish]
  $ echo $'a:1:cat\nb:3:dog\nd:7:fish' | hsp 'c | p!!2 <> "*" <> p!!0 <> "-" <> p!!1'
  cat*a-1
  dog*b-3
  fish*d-7
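The split, join, and indexing examples above can be modeled with ordinary Data.Text functions. This is a sketch of the behavior the examples show, assuming 'c' splits on ":" and 'u' joins with "_"; the function names are illustrative and are not part of hsp.

```haskell
{-# LANGUAGE OverloadedStrings #-}
import           Data.Text    (Text)
import qualified Data.Text    as T
import qualified Data.Text.IO as TIO

-- Model of 'c': split each line on ":" so that p becomes a list of fields.
splitColon :: [Text] -> [[Text]]
splitColon = map (T.splitOn ":")

-- Model of 'u': join the fields of each line with "_".
joinUnderscore :: [[Text]] -> [Text]
joinUnderscore = map (T.intercalate "_")

-- Model of: p!!2 <> "*" <> p!!0 <> "-" <> p!!1, applied to one split line.
-- (!!) is partial: a line with fewer than three fields raises an error.
rearrange :: [Text] -> Text
rearrange p = p !! 2 <> "*" <> p !! 0 <> "-" <> p !! 1

main :: IO ()
main = do
  let rows = splitColon ["a:1:cat", "b:3:dog", "d:7:fish"]
  mapM_ TIO.putStrLn (joinUnderscore rows)
  mapM_ TIO.putStrLn (map rearrange rows)
```

The first loop reproduces the 'colon | u' output and the second reproduces the final indexing example.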

Operations on the entire input list

The entire input list can be manipulated using the pp variable. In the simple example above, reverse pp reversed the order of the lines; sort pp would sort them instead. Duplicate lines can be deleted using nub pp or the equivalent hsp function uniq pp. Other Haskell functions, such as drop, take, tail, and init, can be used as well. When pp is used, the output is displayed as a numbered list.

  $ echo $'aa\nbb\naa' | hsp 'dropWhile (T.isPrefixOf "aa" . getText) pp'
  [0]bb
  [1]aa
  $ echo $'1:a\n2:b\n1:c' | hsp 'c | dropWhileEnd (\tt -> (head $ getText tt) == "1") pp | c'
  1:a
  2:b
  $ echo $'cat\ndog' | hsp '["Animals", "======="] ++ pp ++ ["bird"]'
  [0]Animals
  [1]=======
  [2]cat
  [3]dog
  [4]bird
  $ echo $'bird\ndog' | hsp '[text ("pwd: " <> pwd)] ++ pp| p'
  pwd: /home/user/hsp
  bird
  dog
  $ echo $'cat\ndog' | hsp '["Animals", "======="] ++ pp ++ ["bird"] | oneline pp'
  Animals ======= cat dog bird
  $ echo $'1:cat\n3:dog' | hsp 'c | expand pp'
  [0]1
  [1]cat
  [2]3
  [3]dog
  $ echo $'1:cat\n3:dog' | hsp 'c | expand pp | p'
  1
  cat
  3
  dog
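One point worth illustrating is that nub, unlike the Unix uniq utility, removes duplicates even when they are not adjacent. The sketch below shows this in plain Haskell, together with an assumed model of expand as a simple concat over the split lines; expandModel is a stand-in name, and hsp's own expand may differ in detail.

```haskell
{-# LANGUAGE OverloadedStrings #-}
import           Data.List (nub)
import           Data.Text (Text)
import qualified Data.Text as T

-- nub keeps the first occurrence of each line, wherever the repeats are.
dedupe :: [Text] -> [Text]
dedupe = nub

-- Assumed model of 'expand pp' after a split: flatten the per-line field lists.
expandModel :: [[Text]] -> [Text]
expandModel = concat

main :: IO ()
main = do
  print (dedupe ["aa", "bb", "aa"])
  print (expandModel (map (T.splitOn ":") ["1:cat", "3:dog"]))
```

The first call yields two lines even though the repeated "aa" lines are separated by "bb"; the second yields the four fields as one flat list.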

Filtering lines

If the expression in a pipe results in a Bool (i.e., True or False), the pipe acts as a filter. Lines that result in True are passed to the output; lines that result in False are dropped. Standard Haskell logic operators such as &&, ||, and not are available.

  $ echo  $'1:cat\n3:dog\n1:fish' | hsp  'T.isPrefixOf "1" p && T.length p > 5'
  1:fish
  $ echo $'1:cat\n3:dog\n1:fish' | hsp 'k ["og", "is"]'
  3:dog
  1:fish
  $ echo $'1:cat\n3:dog\n1:fish' | hsp 'l ["og", "is"]'
  1:cat
  $ echo $'1:cat\n3:dog\n1:fish' | hsp 'rek "^1.*i"'
  1:fish
  $ echo $'1:cat\n3:dog\n1:fish' | hsp 'rel "h$"'
  1:cat
  3:dog
  $ echo  $'1:cat\n3:dog\n1:fish' | hsp  'c | p !! 1 == "dog" | c'
  3:dog
  $ echo  $'1:cat\n3:dog\n1:fish' | hsp --keep-false 'c | p !! 0 == "1" | c'
  1:cat

  1:fish
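Judging from the examples above, k keeps lines that contain any of the given substrings, l drops them, and --keep-false replaces a rejected line with an empty line rather than removing it. The following sketch models that reading in plain Haskell. These are assumptions drawn from the sample output, and keepAny, loseAny, and keepFalse are illustrative names, not hsp functions.

```haskell
{-# LANGUAGE OverloadedStrings #-}
import           Data.Text    (Text)
import qualified Data.Text    as T
import qualified Data.Text.IO as TIO

-- Assumed model of 'k': keep lines containing at least one of the substrings.
keepAny :: [Text] -> [Text] -> [Text]
keepAny needles = filter (\p -> any (`T.isInfixOf` p) needles)

-- Assumed model of 'l': drop lines containing at least one of the substrings.
loseAny :: [Text] -> [Text] -> [Text]
loseAny needles = filter (\p -> not (any (`T.isInfixOf` p) needles))

-- Assumed model of --keep-false: a rejected line becomes an empty line.
keepFalse :: (Text -> Bool) -> [Text] -> [Text]
keepFalse ok = map (\p -> if ok p then p else "")

main :: IO ()
main = do
  let input = ["1:cat", "3:dog", "1:fish"]
  mapM_ TIO.putStrLn (keepAny ["og", "is"] input)
  mapM_ TIO.putStrLn (loseAny ["og", "is"] input)
  mapM_ TIO.putStrLn (keepFalse (T.isPrefixOf "1") input)
```

Keeping a placeholder for rejected lines preserves line positions, which matters when the result is later paired line by line with another input.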

Line numbers

hsp provides a special variable, n, that represents the zero-based line number of each line. The type of n is Int to facilitate arithmetic operations. Typically one uses the special hsp function tshow to display n: the usual Haskell show function yields a String, but hsp needs a Text.

$ echo $'cat\ndog\nfish' | hsp 'tshow (n + 1)  <> " " <> p'
1 cat
2 dog
3 fish
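The role of n and tshow can be sketched in plain Haskell by zipping the lines with their zero-based indices. tshowModel below is a stand-in that assumes tshow is equivalent to packing the result of show; hsp's actual definition is not shown in this document.

```haskell
{-# LANGUAGE OverloadedStrings #-}
import           Data.Text    (Text)
import qualified Data.Text    as T
import qualified Data.Text.IO as TIO

-- Stand-in for hsp's tshow: show a value, then pack the String into Text.
tshowModel :: Show a => a -> Text
tshowModel = T.pack . show

-- Model of: tshow (n + 1) <> " " <> p, where n is the zero-based line number.
numberLines :: [Text] -> [Text]
numberLines = zipWith (\n p -> tshowModel (n + 1) <> " " <> p) [0 :: Int ..]

main :: IO ()
main = mapM_ TIO.putStrLn (numberLines ["cat", "dog", "fish"])
```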

Math

hsp provides two functions to facilitate basic mathematical operations: integer and double. Currently, if the string cannot be parsed into the appropriate type, the hsp command will fail. The second example below illustrates how hsp can be used to sum a list of numbers. Recall that after a split, p represents a list of Text. Also note the use of tshow to convert from a number to Text.

$ echo $'12\n73\n7' | hsp 'tshow (integer p + 1)'
13
74
8
$ echo $'12\n73\n7' | hsp 'oneline pp | w | (tshow . sum) $ integer <$> p'
92
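Because a field that does not parse makes the whole command fail, it can help to see what a non-failing parse looks like in plain Haskell. The sketch below uses Data.Text.Read from the text package; it is not how integer is implemented in hsp, and parseInteger and sumFields are illustrative names.

```haskell
{-# LANGUAGE OverloadedStrings #-}
import           Data.Text      (Text)
import qualified Data.Text      as T
import           Data.Text.Read (decimal, signed)

-- Parse a whole field as an Integer; Left carries an error message.
parseInteger :: Text -> Either String Integer
parseInteger t = case signed decimal (T.strip t) of
  Right (n, rest) | T.null rest -> Right n
  Right _                       -> Left ("trailing input in " <> T.unpack t)
  Left err                      -> Left err

-- Sum a list of fields, stopping at the first field that does not parse.
sumFields :: [Text] -> Either String Integer
sumFields = fmap sum . traverse parseInteger

main :: IO ()
main = do
  print (sumFields ["12", "73", "7"])   -- all fields parse, so the result is a Right
  print (sumFields ["12", "x", "7"])    -- "x" does not parse, so the result is a Left
```

A custom function along these lines could return a default value instead of failing, at the cost of silently hiding malformed input.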

History

hsp caches the output of each pipe in the hsp command pipeline for possible use later in the pipeline. Two functions, hp and hpp, provide access to this cache. Each takes as its only argument a literal integer. A non-negative integer, n, refers to the nth (zero-based) pipe, with n = 0 referring to the original standard input to the hsp command. A negative integer, -n, counts backward from the current pipe. Thus, -1 refers to the immediately preceding pipe. To limit unnecessary memory use, no caching occurs if history is not used in the pipeline.

  $ echo $'1:cat\n3:dog\n1:fish' | hsp 'c|u|hp 1'
  [0][[0]1[1]cat]
  [1][[0]3[1]dog]
  [2][[0]1[1]fish]
  $ echo $'cat\ndog\nbird\nfish' | hsp 'sort pp | lose ["do"] | pp ++ ["rat"] | upper p <> "-" <> hp 0 <> "-" <> o'
  BIRD-cat-bird
  CAT-dog-cat
  FISH-bird-fish
  RAT-fish-rat
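The indexing rule for hp and hpp can be made concrete with a small model. It assumes the cache is a list in which index 0 holds standard input and index k holds the output of pipe k, and that the pipe being evaluated is numbered from 1. resolveHistory is a hypothetical helper, not an hsp function, and hsp's handling of out-of-range arguments is not described here.

```haskell
-- Resolve the argument of hp/hpp to a cache index.
--   current: 1-based number of the pipe being evaluated
--   ref:     the literal passed to hp or hpp
-- Returns Nothing when the reference does not point at an earlier stage.
resolveHistory :: Int -> Int -> Maybe Int
resolveHistory current ref
  | idx >= 0 && idx < current = Just idx
  | otherwise                 = Nothing
  where
    -- non-negative: absolute position; negative: counted back from here
    idx = if ref >= 0 then ref else current + ref

main :: IO ()
main = do
  print (resolveHistory 3 1)      -- third pipe of 'c|u|hp 1': output of pipe 1
  print (resolveHistory 3 (-1))   -- the immediately preceding pipe, pipe 2
  print (resolveHistory 3 0)      -- the original standard input
```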

Other inputs

The normal input to the hsp command is via standard input, usually by piping into it via a shell command, often cat. But sometimes it may be necessary to combine two streams of inputs, such as combining the output of two shell commands (or two files) line by line. hsp provides several ways to accomplish this.

  $ echo $'cat\nfish' | hsp 'p <> " " <> sp' dog bird
  cat dog
  fish bird
  $ echo $'cat\nfish' | hsp 'p <> " " <> sp' `ls`
  cat app
  fish ChangeLog.md
  $ cat f.txt | hsp -t g.txt 'p <> " " <> fp'
  fline1 gline1
  fline2 gline2
  ...
  $ ls short_directory | hsp 'pp ++ blanklines 10 | p <> "-" <> sp' `ls long_directory`
  short1-long1
  short2-long2
  -long3
  -long4
  ...
  $ hsp -b 4 'tshow(n+1)'
  1
  2
  3
  4
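The blanklines example suggests that lines are paired one to one and that the primary input must be padded when the second input is longer. The sketch below models that with zipWith, which stops at the shorter list. This is an assumption about the pairing rule drawn from the sample output, and pairWithPadding is an illustrative name.

```haskell
{-# LANGUAGE OverloadedStrings #-}
import           Data.Text    (Text)
import qualified Data.Text.IO as TIO

-- Model of: pp ++ blanklines 10 | p <> "-" <> sp
-- The primary lines are padded with empty lines, then paired with the
-- second input; zipWith ends at the shorter of the two lists.
pairWithPadding :: Int -> [Text] -> [Text] -> [Text]
pairWithPadding pad ps sps =
  zipWith (\p sp -> p <> "-" <> sp) (ps ++ replicate pad "") sps

main :: IO ()
main = mapM_ TIO.putStrLn
         (pairWithPadding 10 ["short1", "short2"]
                             ["long1", "long2", "long3", "long4"])
```

Without the padding, this model would emit only the first two pairs.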

Macros

Macros are a way to permanently store useful commands for future use. Macros can become quite complex, and provide a useful intermediate between shell commands and scripts, especially for solving one-time problems.

    $ hsp -s "palindrome# palindrome finder" 'T.length p > 4 && p == T.reverse p | reverse pp | "Palindrome: " <> p'

Custom functions

hsp provides a number of special functions and imports all the Haskell functions in the Data.Text (qualified as "T") and Data.List modules. You can make new functions available to hsp by placing them in a module named HspCustom.hs. The default directory is your home directory, but this can be changed by setting the HSP_CUSTOM variable in the hsp executable script. The source directory already has an HspCustom.hs.example file with a sample function to illustrate what is needed. To experiment with the sample function, just copy the example file to HspCustom.hs in your home directory and run hsp. Note that you must import any modules that you need for your function.
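As a rough idea of what such a module might contain, here is a minimal sketch. The function bracket is hypothetical, and the exact module header, exports, and language extensions hsp expects are assumptions; the HspCustom.hs.example file in the source directory is the authoritative template.

```haskell
-- Hypothetical HspCustom.hs; compare with HspCustom.hs.example before use.
module HspCustom where

-- Every module the functions need must be imported explicitly.
import           Data.Text (Text)
import qualified Data.Text as T

-- Hypothetical custom function: wrap a line in square brackets.
bracket :: Text -> Text
bracket t = T.concat [T.pack "[", t, T.pack "]"]
```

If the module is picked up as described, a command string such as 'bracket p' would then apply the function to each line.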

Bash quirks

Because the hsp command line string must first be parsed by bash, there are some issues to keep in mind.

Additional variables and functions

In addition to the variables and functions already discussed, hsp provides other useful special variables and functions.

Variables

Text functions

These functions take a Text argument, often p, sp, fp, or hp.

Regular expression functions: re and sub

hsp provides two text functions that take a regular expression as their first argument. Note that hsp uses POSIX extended regular expressions, the same standard used by egrep. This differs from the Perl and Python flavors, but there is considerable overlap.

Functions on pp and hpp (when they are not split), spp and fpp.

hsp system functions

Testing

Testing is currently fairly primitive. There is a series of system tests that check hsp's basic functionality. These can be run using shelltestrunner, a command-line tool for testing command-line programs, which is available in most distributions. Once shelltestrunner is installed, just run shelltest test in the hsp package directory.