Nim for awk programmers

A library of GNU awk functions for nim: the standard awk library functions, written in and for nim.

Awk and nim can look very similar. Here is an example awk program that prints the word "text":

BEGIN{
  str = "This is <a href=\"my text\">here</a>"
  if(match(str, "<a href=\"my text\">", dest)) {
    split(dest[0], arr, "\"")
    if(arr[2] ~ /text/)
      print substr(arr[2], 4, length(arr[2]))
  }
}

nim version:

import awk

var str = "This is <a href=\"my text\">here</a>"
if(match(str, "<a href=\"my text\">", dest) > 0):   
  awk.split(dest, arr, "\"")
  if(arr[1] ~ "text"):
    echo awk.substr(arr[1], 3, len(arr[1]) - 1)  

Nim compiles to C source, which a C compiler such as gcc then compiles to a standalone binary executable. To compile (c) and run (-r) in one step:

nim c -r "test.nim"
text

Versions

Most of the nim procs in this package deal with awk's regex functionality.

Two versions are included: awk.nim uses the "re" module and awknre.nim uses the "nre" module.

The re module is significantly faster and is the recommended choice. awknre.nim is included for backward compatibility: the first version of this package used nre, and there may be some differences in regex options between the two modules.

Functions

~ and !~

Emulate awk's ~ and !~ operators, which can be thought of as regex-enabled versions of nim's contains().

proc `~`*(source, pattern: string): bool 
proc `!~`*(source, pattern: string): bool 

Nim does not have an equivalent of awk's // to mark text as a regex. Therefore all text to the right of ~ is treated as a regex. To do a literal string test, use == instead of ~.

Use grouping () when building the pattern with '&', because ~ binds more tightly than & in nim. For example:

if s ~ ("^" & re & "$"):

Example:

import awk
if "george" ~ "ge.*?rge":
  echo "true" #=> true
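One way to picture these operators is as thin wrappers around contains() from nim's re module. The sketch below works under that assumption and is not the package's source: the two operator bodies are hypothetical stand-ins. It also shows the difference between a regex test with ~ and a literal test with ==.

```nim
import std/re

# Hypothetical stand-ins for the package's operators (illustration only).
proc `~`(source, pattern: string): bool =
  ## True when the regex `pattern` matches anywhere in `source`.
  source.contains(re(pattern))

proc `!~`(source, pattern: string): bool =
  ## Negation of `~`.
  not (source ~ pattern)

echo "abc" ~ "a.c"     # regex test: '.' matches any character
echo "abc" == "a.c"    # literal test: the strings differ
echo "abc" !~ "^b"     # "abc" does not start with 'b'
```

The three lines print true, false and true.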

>* and >>

Write text to a file (append or overwrite)

proc `>*`(text, filename: string): bool

Write text to filename, overwrite previous content. Close on finish.

proc `>>`(text, filename: string): bool

Append text to filename. Close on finish.

Example:

"Hello" & " world" >* "/tmp/test.txt"
"Hello" >* "/dev/stderr"

Note that awk's ">" is renamed ">*" to avoid conflicting with nim's ">" operator.
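The overwrite and append behaviour can be sketched with nim's standard file procs. This is an illustration only: the operator bodies are hypothetical, and writing the text verbatim (no trailing newline added) is an assumption, not a statement about the package.

```nim
import std/os

# Hypothetical stand-ins for `>*` and `>>` (illustration only).
proc `>*`(text, filename: string): bool =
  ## Overwrite `filename` with `text`; false if the file cannot be opened.
  var f: File
  if not open(f, filename, fmWrite): return false
  defer: f.close()            # close on finish
  f.write(text)
  result = true

proc `>>`(text, filename: string): bool =
  ## Append `text` to `filename`; false if the file cannot be opened.
  var f: File
  if not open(f, filename, fmAppend): return false
  defer: f.close()            # close on finish
  f.write(text)
  result = true

let path = getTempDir() / "awk_sketch.txt"   # hypothetical scratch file
discard "Hello" & " world" >* path           # & binds tighter than >*
discard "!" >> path
echo readFile(path)
```

Because both operators return a bool, a caller that does not check the result has to discard it explicitly in this sketch.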

match

Find regex pattern in source and optionally store result in dest.

proc match(source: string, pattern: string [, dest: string]): int

Example:

import awk
if match("this is a test a", "s.*?a", a) > 0:
  echo a #=> "s is a"   
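For readers who want to see what a match() of this kind does underneath, here is a minimal sketch built on findBounds() from nim's re module. The proc name matchSketch is hypothetical, and returning the 1-based start position (awk's convention) with 0 for no match is an assumption about the return value, not something this page states.

```nim
import std/re

# Hypothetical helper, not the package's match().
proc matchSketch(source, pattern: string, dest: var string): int =
  ## Store the first regex match in `dest`.
  ## Returns the 1-based start position, or 0 when nothing matches (assumed).
  let (first, last) = findBounds(source, re(pattern))
  if first < 0:
    dest = ""
    return 0
  dest = source[first .. last]
  result = first + 1

var found: string
if matchSketch("this is a test a", "s.*?a", found) > 0:
  echo found
```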

split

Split source along regex match and store segments in dest.

template split(source: string, dest: untyped, match: string): int

The function behaves much like awk's split(), except that dest is indexed from 0 rather than 1.

Example:

import awk
awk.split("This is a string", arr, "is")
echo arr[0] #=> "Th"
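The same split can be reproduced with split() from nim's re module, which helps when checking what each segment should contain. The sketch below is an illustration: splitSketch is a hypothetical proc (the package uses a template), and returning the segment count is an assumption borrowed from awk.

```nim
import std/re

# Hypothetical helper, not the package's split template.
proc splitSketch(source: string, dest: var seq[string], pattern: string): int =
  ## Split `source` on every regex match and return the number of segments.
  dest = split(source, re(pattern))
  result = dest.len

var parts: seq[string]
let n = splitSketch("This is a string", parts, "is")
echo n          # number of segments
echo parts[0]   # first segment, index 0
```

Here parts holds "Th", " " and " a string", so n is 3.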

gsub

Global substitute the regex pattern with replacement in the source string

gsub(pattern: string, replacement: string, source: string): string

gsub() returns the new string in addition to changing the source string in-place. It is discardable.

If the source string is not a var (let, const or literal string) the source string is not modified in-place.

Example 1:

var str = "this is is string"
gsub("[ ]is.*?st", " is a st", str)   
echo str #=> "this is a string"

Example 2:

echo gsub("[ ]is.*?st", " is a st", "this is is string")
#=> "this is a string"

Caution: a self-reference will not produce expected results. For example this doesn't produce an error but doesn't work:

var str = "abc"
str = gsub("b", "z", str)
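The split between "var source is changed in place" and "let, const or literal source is left alone" can be modelled in nim with two overloads, since a var parameter is preferred when the argument is a mutable variable. The sketch below illustrates that idea with replace() from the re module; gsubSketch is a hypothetical name and this is not the package's implementation.

```nim
import std/re

# Hypothetical stand-ins for gsub(), not the package's implementation.
proc gsubSketch(pattern, replacement: string, source: var string): string {.discardable.} =
  ## Chosen when `source` is a var: substitute in place and return the result.
  source = source.replace(re(pattern), replacement)
  result = source

proc gsubSketch(pattern, replacement: string, source: string): string =
  ## Chosen for let, const and literals: return a new string only.
  source.replace(re(pattern), replacement)

var s = "this is is string"
gsubSketch("[ ]is.*?st", " is a st", s)   # result discarded, s changed in place
echo s

let fixed = "this is is string"
echo gsubSketch("[ ]is.*?st", " is a st", fixed)
echo fixed                                 # unchanged
```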

gsubi

Global substitute the regex pattern with replacement in the source string, leaving the source string unmodified

gsubi(pattern: string, replacement: string, source: string): string

gsubi() returns the new string but leaves the source string untouched.

Example 1:

var str = "this is is string"
echo gsubi("[ ]is.*?st", " is a st", str)  #=> "this is a string"
echo str #=> "this is is string"

gsubs

Global substitute non-regex pattern with replacement in the source string. A literal-string version of gsub()

gsubs(pattern: string, replacement: string, source: string): string

gsubs() returns the new string in addition to changing the source string in-place. It is discardable.

Example 1:

var str = "this is is string"
gsubs(" is is st", " is a st", str)   
echo str #=> "this is a string"

Example 2:

echo gsubs(" is is st", " is a st", "this is is string")
#=> "this is a string"
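The practical difference between a literal and a regex substitution shows up when the pattern contains regex metacharacters. As a small illustration that uses only nim's strutils (not the package), replace() treats "a.c" as three literal characters:

```nim
import std/strutils

let s = "a.c abc"
# Literal replacement: only the exact text "a.c" is replaced.
# A regex pattern "a.c" would also match "abc", because '.' matches any character.
let literal = s.replace("a.c", "X")
echo literal
```

This prints "X abc".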

sub

sub(pattern: string, replacement: string, source: string [, occurance: int]): string

Substitute in-place the first occurrence of regex pattern with replacement in the source string. The optional occurance argument substitutes at the Xth occurrence instead.

If source is not a pre-declared variable, sub returns the new string but does not substitute in-place. Substitutions do not overlap, e.g. sub("22","33","222222") => "333333", not "3333333333".

Example:

var str = "This is a string"
sub("[ ]is[ ]", " or ", str)                       # substitute 'str' in-place.
echo str #=> "This or a string"
echo sub("[ ]is[ ]", " or ", "This is a string")   # doesn't sub "This is a string" in-place, returns a new string
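Replacing only the Xth occurrence amounts to scanning matches from left to right, resuming after each one so that matches never overlap. The sketch below shows one way to do that with findBounds() from the re module. subSketch and its occurrence parameter are hypothetical, and it always returns a new string rather than substituting in place.

```nim
import std/re

# Hypothetical helper, not the package's sub().
proc subSketch(pattern, replacement, source: string, occurrence = 1): string =
  ## Replace only the `occurrence`-th non-overlapping regex match.
  let rx = re(pattern)
  var pos = 0
  var seen = 0
  while pos <= source.len:
    let (first, last) = findBounds(source, rx, pos)
    if first < 0: break                  # no further match
    inc seen
    if seen == occurrence:
      return source[0 ..< first] & replacement & source[last + 1 .. ^1]
    pos = max(last + 1, first + 1)       # resume after the match (also skips empty matches)
  result = source                        # fewer matches than requested: unchanged

echo subSketch("[ ]is[ ]", " or ", "This is a string")
echo subSketch("is", "IS", "This is a string", 2)
```

The first call prints "This or a string"; the second skips the "is" inside "This" and prints "This IS a string".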

subs

Single substitute non-regex pattern with replacement in the source string. A literal-string version of sub(). See gsubs() for documentation

patsplit

Divide source into pieces defined by regex pattern and store the pieces in the seq field. The optional sep stores the separators.

patsplit(source: string, field: seq, pattern: string [, sep: seq]): int

patsplit() behaves as follows:

Example 1:

var str = "This is <!--comment1--> a string <!--comment2--> with comments."
var field = newSeq[string](0)
if patsplit(str, field, "<[ ]{0,}[!].*?>") > 0:
  echo field[0] #=> "<!--comment1-->"
  echo field[1] #=> "<!--comment2-->"

Example 2:

var ps = "This is <!--comment--> a string <!--comment2--> with comments."
var field, sep = newSeq[string](0)
patsplit(ps, field, "<[ ]{0,}[!].*?>", sep)
echo sep[1] #=> " a string "
echo unpatsplit(field, sep)
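The pieces that end up in field are the successive regex matches. As a cross-check that does not use the package, findAll() from nim's re module collects the same matches for the string in Example 1. This is an illustration of what the field sequence contains; it does not cover the sep sequence.

```nim
import std/re

let text = "This is <!--comment1--> a string <!--comment2--> with comments."
# Every non-overlapping match of the pattern, in order of appearance.
let pieces = findAll(text, re"<[ ]{0,}[!].*?>")
echo pieces.len
echo pieces[0]
echo pieces[1]
```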

unpatsplit

Recombine two sequences created by patsplit()

unpatsplit(field: seq, sep: seq)

Given two seqs created by patsplit, recombine them into a single string in alternating sequence, i.e. field[0] & sep[0] & field[1] & sep[1] etc.

If field has more elements than sep, return ""

substr

Return the substring of source that starts at character number start (counting from 0) and is length characters long. If length is omitted, return the rest of the string.

substr(source: string, start: int [, length: int]): string

Example:

echo awk.substr("Hello World", 3)
#=> "lo World"
echo awk.substr("Hello World", 3, 2)
#=> "lo"
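The 0-based start and the optional length map directly onto nim string slices. A minimal sketch follows; substrSketch is a hypothetical helper, and clamping an oversized length to the end of the string is an assumption, suggested by the opening example where a length longer than the remaining text is passed.

```nim
# Hypothetical helper, not the package's substr().
proc substrSketch(source: string, start: int, length = int.high): string =
  ## 0-based `start`; `length` is clamped to what is left of `source`.
  if start < 0 or start >= source.len or length <= 0:
    return ""
  # Compare against the remaining length first to avoid integer overflow.
  let stop = if length > source.len - start: source.len else: start + length
  result = source[start ..< stop]

echo substrSketch("Hello World", 3)
echo substrSketch("Hello World", 3, 2)
echo substrSketch("my text", 3, 6)    # length runs past the end and is clamped
```

The three calls print "lo World", "lo" and "text".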

index

Return the start location (index, counting from 0) of the first occurrence of the non-regex target in source.

index(source: string, target: string): int

Example:

var loc = index("This is string", "is")
echo loc #=> 2
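For comparison, nim's own strutils.find() performs the same kind of literal, 0-based search. The snippet below uses only the standard library and says nothing about what the package's index() returns when the target is absent.

```nim
import std/strutils

let hit = "This is string".find("is")     # position of the "is" inside "This"
let miss = "This is string".find("xyz")   # strutils reports a missing target as -1
echo hit
echo miss
```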

Techniques

associative arrays

Awk uses associative arrays. Nim also supports associative arrays, called "tables".

For example, in awk, to remove duplicates from a list of words:

split("Blue Blue Red Green", arr, " ")        # Whoops, let's get rid of the extra "Blue"

for(i in arr)
  uarr[arr[i]] = 1
for(i in uarr)
  print i

The equivalent in Nim:

import strutils, tables

var 
  arr = split("Blue Blue Red Green", " ")     # list of words containing a duplicate
  uarr = initTable[string, int]()             # create empty table (associative array) to hold words

for i in arr:                                 # unique the list
  uarr[i] = 1
for j in uarr.keys:                           # print the list     
  echo j
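A Table does not promise any particular iteration order, so the unique words may print in a different order than they appeared. If the original order matters, one alternative from the standard library is an ordered set. The sketch below is a variation on the example above, not part of the package.

```nim
import std/[sets, strutils]

let words = "Blue Blue Red Green".split(" ")   # list of words containing a duplicate

# An OrderedSet drops duplicates and remembers insertion order.
var unique: seq[string]
for w in words.toOrderedSet:
  unique.add w

for w in unique:
  echo w
```

This prints Blue, Red and Green, one per line, in first-seen order.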

Getting started with nim