<p align="center"><strong><a href="https://github.com/modernish/modernish/releases">Releases</a></strong></p> <p align="center"><strong>For code examples, see <a href="https://github.com/modernish/modernish/blob/master/EXAMPLES.md"> <code>EXAMPLES.md</code></a> and <a href="https://github.com/modernish/modernish/tree/master/share/doc/modernish/examples"> <code>share/doc/modernish/examples</code></a> </strong></p>

modernish – harness the shell

Modernish is a library for shell script programming which provides features like safer variable and command expansion, new language constructs for loop iteration, and much more. Modernish programs are shell programs; the new constructs are mixed with shell syntax so that the programmer can take advantage of the best of both.

There is no compiled code to install, as modernish is written entirely in the shell language. It can be deployed in embedded or multi-user systems in which new binary executables may not be introduced for security reasons, and is portable among numerous shell implementations. The installer can also bundle a reduced copy of the library with your scripts, so they can run portably with a known version of modernish without requiring prior installation.

Join us and help breathe some new life into the shell! We are looking for testers, early adopters, and developers. Download the latest release or check out the very latest development code from the master branch. Read through the documentation below. Play with the example scripts and write your own. Try to break the library and send reports of breakage.

Getting started

Run install.sh and follow instructions, choosing your preferred shell and install location. After successful installation you can run modernish shell scripts and write your own. Run uninstall.sh to remove modernish.

Both the install and uninstall scripts are interactive by default, but support fully automated (non-interactive) operation as well. Command line options are as follows:

install.sh [ -n ] [ -s shell ] [ -f ] [ -P pathspec ] [ -d installroot ] [ -D prefix ] [ -B scriptfile ... ]

uninstall.sh [ -n ] [ -f ] [ -d installroot ]

Two basic forms of a modernish program

In the simple form, modernish is added to a script written for a specific shell. In the portable form, your script is shell-agnostic and may run on any shell that can run modernish.

Simple form

The simplest way to write a modernish program is to source modernish as a dot script. For example, if you write for bash:

#! /bin/bash
. modernish
use safe
use sys/base
...your program starts here...

The modernish use command loads modules with optional functionality. The safe module initialises the safe mode. The sys/base module contains modernish versions of certain basic but non-standardised utilities (e.g. readlink, mktemp, which), guaranteeing that modernish programs all have a known version at their disposal. There are many other modules as well. See Modules for more information.

The above method makes the program dependent on one particular shell (in this case, bash). So it is okay to mix and match functionality specific to that particular shell with modernish functionality.

(On zsh, there is a way to integrate modernish with native zsh scripts. See Appendix E.)

Portable form

The most portable way to write a modernish program is to use the special generic hashbang path for modernish programs. For example:

#! /usr/bin/env modernish
#! use safe
#! use sys/base
...your program begins here...

For portability, it is important there is no space after env modernish; NetBSD and OpenBSD consider trailing spaces part of the name, so env will fail to find modernish.

A program in this form is executed by whatever shell the user who installed modernish on the local system chose as the default shell. Since you as the programmer can't know what shell this is (other than the fact that it passed some rigorous POSIX compliance testing executed by modernish), a program in this form must be strictly POSIX compliant – except, of course, that it should also make full use of the rich functionality offered by modernish.

Note that modules are loaded in a different way: the use commands are part of a hashbang comment (starting with #! like the initial hashbang path). Only such lines that immediately follow the initial hashbang path are evaluated; even an empty line in between causes the rest to be ignored. This special way of pre-loading modules is needed to make any aliases they define work reliably on all shells.

Interactive use

Modernish is primarily designed to enhance shell programs/scripts, but also offers features for use in interactive shells. For instance, the new repeat loop construct from the var/loop module can be quite practical to repeat an action x times, and the safe module on interactive shells provides convenience functions for manipulating, saving and restoring the state of field splitting and globbing.

To use modernish on your favourite interactive shell, you have to add it to your .profile, .bashrc or similar init file.

Important: Upon initialising, modernish adapts itself to other settings, such as the locale. It also removes certain aliases that may keep modernish from initialising properly. So you have to organise your .profile or similar file in the following order:

Non-interactive command line use

After installation, the modernish command can be invoked as if it were a shell, with the standard command line options from other shells (such as -c to specify a command or script directly on the command line), plus some enhancements. The effect is that the shell chosen at installation time will be run enhanced with modernish functionality. It is not possible to use modernish as an interactive shell in this way.

Usage:

  1. modernish [ --use=module | shelloption ... ] [ scriptfile ] [ arguments ]
  2. modernish [ --use=module | shelloption ... ] -c [ script [ me-name [ arguments ] ] ]
  3. modernish --test [ testoption ... ]
  4. modernish [ --version | --help ]

In the first form, the script in the file scriptfile is loaded and executed with any arguments assigned to the positional parameters.

In the second form, -c executes the specified modernish script, optionally with the me-name assigned to $ME and the arguments assigned to the positional parameters.

The --use option pre-loads any given modernish modules before executing the script. The module argument to each specified --use option is split using standard shell field splitting. The first field is the module name and any further fields become arguments to that module's initialisation routine.
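As a rough illustration of the splitting described above, the following plain POSIX sh sketch breaks a module specification into a name and its arguments using standard field splitting. The function split_use_spec, its output variables and the module name some/module are hypothetical, invented for this example; it is not how modernish itself is implemented.

```shell
# Hypothetical helper, for illustration only: splits a module specification
# the way standard shell field splitting would (default IFS assumed).
split_use_spec() {
	set -f			# keep the unquoted expansion below from globbing
	set -- $1		# field splitting happens here
	set +f
	spec_module=${1-}	# first field: the module name
	case $# in
	( 0 )	;;
	( * )	shift ;;
	esac
	spec_args=$*		# remaining fields: arguments for the module
}

split_use_spec 'some/module first second'
printf 'module: %s\n' "$spec_module"
printf 'arguments: %s\n' "$spec_args"
```

Because the whole specification has to arrive as one argument, it needs quoting on the command line whenever it contains more than the module name.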

Any given short-form or long-form shelloptions are set or unset before executing the script. Both POSIX shell options and shell-specific options are supported, depending on the shell executing modernish. Using the shell option -e or -o errexit is an error, because modernish does not support it and would break.

The --test option runs the regression test suite and exits. This verifies that the modernish installation is functioning correctly. See Appendix B for more information.

The --version and --help options output the relevant information and exit.

Non-interactive usage examples

Shell capability detection

Modernish includes a battery of shell feature, quirk and bug detection tests, each of which is given a special capability ID. See Appendix A for a list of shell capabilities that modernish currently detects, as well as further general information on the capability detection framework.

thisshellhas is the central function of the capability detection framework. It not only tests for the presence of shell features/quirks/bugs, but can also detect specific shell built-in commands, shell reserved words, shell options (short or long form), and signals.

Modernish itself extensively uses capability detection to adapt itself to the shell it's running on. This is how it works around shell bugs and takes advantage of efficient features not all shells have. But any script using the library can do this in the same way, with the help of this function.

Test results are cached in memory, so repeated checks using thisshellhas are efficient and there is no need to avoid calling it to optimise performance.

Usage:

thisshellhas item ...

thisshellhas continues to process items until one of them produces a negative result or is found invalid, at which point any further items are ignored. So the function only returns successfully if all the items specified were found on the current shell. (To check if either one item or another is present, use separate thisshellhas invocations separated by the || shell operator.)

Exit status: 0 if this shell has all the items in question; 1 if not; 2 if an item was encountered that is not recognised as a valid identifier.

Note: The tests for the presence of reserved words, built-in commands, shell options, and signals are different from capability detection tests in an important way: they only check if an item by that name exists on this shell, and don't verify that it does the same thing as on another shell.

Names and identifiers

All modernish functions require portable variable and shell function names, that is, ones consisting of ASCII uppercase and lowercase letters, digits, and the underscore character _, and that don't begin with a digit. For shell option names, the constraints are the same except a dash - is also accepted. An invalid identifier is generally treated as a fatal error.

Internal namespace

Function-local variables are not supported by the standard POSIX shell; only global variables are provided for. Modernish needs a way to store its internal state without interfering with the program using it. So most of the modernish functionality uses an internal namespace _Msh_* for variables, functions and aliases. All these names may change at any time without notice. Any names starting with _Msh_ should be considered sacrosanct and untouchable; modernish programs should never directly use them in any way. Of course this is not enforceable, but names starting with _Msh_ should be uncommon enough that no unintentional conflict is likely to occur.
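The underlying problem is easy to reproduce in plain POSIX sh. In the sketch below, all function and variable names are made up, and the _Demo_ prefix is a hypothetical stand-in that merely mimics the idea behind _Msh_; real programs should pick their own prefix and leave _Msh_ alone.

```shell
# Plain POSIX sh: an assignment inside a function changes the global variable.
count=10

careless_helper() {
	count=0			# clobbers the caller's variable of the same name
}

careful_helper() {
	_Demo_count=0		# hypothetical prefix; collisions become unlikely
}

careless_helper
after_careless=$count
printf 'after careless_helper: count=%s\n' "$after_careless"

count=10
careful_helper
after_careful=$count
printf 'after careful_helper: count=%s\n' "$after_careful"
```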

Modernish system constants

Modernish provides certain constants (read-only variables) to make life easier. These include:

Control character, whitespace and shell-safe character constants

POSIX does not provide for the quoted C-style escape codes commonly used in bash, ksh and zsh (such as $'\n' to represent a newline character), leaving the standard shell without a convenient way to refer to control characters. Modernish provides control character constants (read-only variables) with hexadecimal suffixes $CC01 .. $CC1F and $CC7F, as well as $CCe, $CCa, $CCb, $CCf, $CCn, $CCr, $CCt, $CCv (corresponding with printf backslash escape codes). This makes it easy to insert control characters in double-quoted strings.
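For comparison, this is roughly what it takes to obtain a newline and a tab in plain POSIX sh without $'...' quoting. The names my_nl and my_tab are hypothetical stand-ins for what $CCn and $CCt provide ready-made; the sketch does not use modernish.

```shell
# Plain POSIX sh, no $'...' quoting. my_nl and my_tab are hypothetical
# stand-ins for what $CCn and $CCt provide ready-made.
my_nl=$(printf '\nX')	# command substitution strips trailing newlines,
my_nl=${my_nl%X}	# so protect the newline with a dummy character
my_tab=$(printf '\t')

msg="name${my_tab}value${my_nl}second line"
printf '%s\n' "$msg"
```

The dummy character is needed only for the newline; other control characters survive command substitution as they are.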

More convenience constants, handy for use in bracket glob patterns for use with case or modernish match:

Usage examples:

# Use a glob pattern to check against control characters in a string:
	if str match "$var" "*[$CONTROLCHARS]*"; then
		putln "\$var contains at least one control character"
	fi
# Use '!' (not '^') to check for characters *not* part of a particular set:
	if str match "$var" "*[!$ASCIICHARS]*"; then
		putln "\$var contains at least one non-ASCII character"
	fi
# Safely split fields at any whitespace, comma or slash (requires safe mode):
	use safe
	LOOP for --split=$WHITESPACE,/ field in $my_items; DO
		putln "Item: $field"
	DONE

Reliable emergency halt

The die function reliably halts program execution, even from within subshells, optionally printing an error message. Note that die is meant for an emergency program halt only, i.e. in situations where continuing would mean the program is in an inconsistent or undefined state. Shell scripts running in an inconsistent or undefined state may wreak all sorts of havoc. They are also notoriously difficult to terminate correctly, especially if the fatal error occurs within a subshell: exit won't work then. That's why die is optimised for killing all the program's processes (including subshells and external commands launched by it) as quickly as possible. It should never be used for exiting the program normally.

On interactive shells, die behaves differently. It does not kill or exit your shell; instead, it issues SIGINT to the shell to abort the execution of your running command(s), which is equivalent to pressing Ctrl+C. In addition, if die is invoked from a subshell such as a background job, it kills all processes belonging to that job, but leaves other running jobs alone.

Usage: die [ message ]

If the trap stack module is active, a special DIE pseudosignal can be trapped (using plain old trap or pushtrap) to perform emergency cleanup commands upon invoking die.

If the MSH_HAVE_MERCY variable is set in a script and die is invoked from a subshell, then die will only terminate the current subshell and its subprocesses and will not execute DIE traps, allowing the script to resume execution in the parent process. This is for use in special cases, such as regression tests, and is strongly discouraged for general use. Modernish unsets the variable on init so it cannot be inherited from the environment.

Low-level shell utilities

Outputting strings

The POSIX shell lacks a simple, straightforward and portable way to output arbitrary strings of text, so modernish adds two commands for this.

There is no processing of options or escape codes. (Modernish constants $CCn, etc. can be used to insert control characters in double-quoted strings. To process escape codes, use printf instead.)

The echo command is notoriously unportable and kind of broken, so it is deprecated in favour of put and putln. Modernish does provide its own version of echo, but it is only activated for portable-form scripts. Otherwise, the shell-specific version of echo is left intact. The modernish version of echo does not interpret any escape codes and supports only one option, -n, which, like BSD echo, suppresses the final newline. However, unlike BSD echo, if -n is the only argument, it is not interpreted as an option and the string -n is printed instead. This makes it safe to output arbitrary data using this version of echo as long as it is given as a single argument (using quoting if needed).

Legibility aliases: not, so, forever

Modernish sets three aliases that can help to make the shell language look slightly friendlier. Their use is optional.

not is a new synonym for !. They can be used interchangeably.

so is a command that tests if the previous command exited with a status of zero, so you can test the preceding command's success with if so or if not so.

forever is a new synonym for while :;. This allows simple infinite loops of the form: forever do stuff; done.

Enhanced exit

The exit command can be used as normal, but has gained capabilities.

Extended usage: exit [ -u ] [ status [ message ] ]

chdir

chdir is a robust cd replacement for use in scripts.

The standard cd command is designed for interactive shells and appropriate to use there. However, for scripts, its features create serious pitfalls:

Thus, robust and portable use of cd in scripts is unreasonably difficult. The modernish chdir function calls cd in a way that takes care of all these issues automatically: it disables $CDPATH and special operand meanings, and resolves symbolic links by default.
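The $CDPATH issue can be reproduced in plain POSIX sh, as in the sketch below. The directory layout is made up, mktemp -d is assumed to be available, and the careful cd invocation is only an approximation of what the paragraph above says chdir does, not its actual implementation.

```shell
# Plain POSIX sh demonstration of the $CDPATH pitfall. The directory layout
# is made up for this example; mktemp -d is assumed to be available.
base=$(mktemp -d) || exit 1
base=$(cd -P -- "$base" && pwd -P) || exit 1	# physical path, for comparison
mkdir -p "$base/work/sub" "$base/other/sub"

cd "$base/work" || exit 1
CDPATH=$base/other

cd sub >/dev/null || exit 1	# looks like ./sub, but $CDPATH is searched first
landed_naive=$(pwd -P)

cd "$base/work" || exit 1
CDPATH='' cd -P -- ./sub || exit 1	# CDPATH off, options ended, symlinks resolved
landed_careful=$(pwd -P)

unset CDPATH
cd / && rm -rf "$base"

printf 'plain cd landed in:   %s\n' "${landed_naive#"$base"}"
printf 'careful cd landed in: %s\n' "${landed_careful#"$base"}"
```

A script that inherits $CDPATH from the user's environment can thus end up in an unintended directory without any error being reported.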

Usage: chdir [ -f ] [ -L ] [ -P ] [ -- ] directorypath

Normally, failure to change the present working directory to directorypath is a fatal error that ends the program. To tolerate failure, add the -f option; in that case, exit status 0 signifies success and exit status 1 signifies failure, and scripts should always check and handle exceptions.

The options -L (logical: don't resolve symlinks) and -P (physical: resolve symlinks) are the same as in cd, except that -P is the default. Note that on a shell with BUG_CDNOLOGIC (NetBSD sh), the -L option to chdir does nothing.

To use arbitrary directory names (e.g. directory names input by the user or other untrusted input) always use the -- separator that signals the end of options, or paths starting with - may be misinterpreted as options.

insubshell

The insubshell function checks if you're currently running in a subshell environment (usually called simply subshell).

A subshell is a copy of the parent shell that starts out as an exact duplicate (including non-exported variables, functions, etc.), except for traps. A new subshell is invoked by constructs like (parentheses), $(command substitutions), pipe|lines, and & (to launch a background subshell). Upon exiting a subshell, all changes to its state are lost.

This is not to be confused with a newly initialised shell that is merely a child process of the current shell, which is sometimes (confusingly and wrongly) called a "subshell" as well. This documentation avoids such a misleading use of the term.
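The difference can be observed in plain POSIX sh without modernish. In this sketch the variable names are arbitrary, and the child shell is started with sh -c on the assumption that an sh is available in $PATH.

```shell
# Plain POSIX sh. The variable names are arbitrary.
demo_state=parent

( demo_state=changed )		# parentheses: the change is made in a subshell
after_parens=$demo_state	# ...and is lost when the subshell exits

# A subshell sees non-exported variables; a new child shell does not.
in_subst=$(printf '%s' "$demo_state")
in_child=$(sh -c 'printf "%s" "${demo_state-unset}"')

printf 'after parentheses:       %s\n' "$after_parens"
printf 'in command substitution: %s\n' "$in_subst"
printf 'in child shell:          %s\n' "$in_child"
```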

Usage: insubshell [ -p | -u ]

This function returns success (0) if it was called from within a subshell and non-success (1) if not. One of two options can be given:

isset

isset checks if a variable, shell function or option is set, or has certain attributes. Usage:

Exit status: 0 if the item is set; 1 if not; 2 if the argument is not recognised as a valid identifier. Unlike most other modernish commands, isset does not treat an invalid identifier as a fatal error.

When checking a shell option, a nonexistent shell option is not an error, but returns the same result as an unset shell option. (To check if a shell option exists, use thisshellhas.)

Note: just isset -f checks if shell option -f (a.k.a. -o noglob) is set, but with an extra argument, it checks if a shell function is set. Similarly, isset -x checks if shell option -x (a.k.a. -o xtrace) is set, but isset -x varname checks if a variable is exported. If you use unquoted variable expansions here, make sure they're not empty, or the shell's empty removal mechanism will cause the wrong thing to be checked (even in the safe mode).

setstatus

setstatus manually sets the exit status $? to the desired value. The function exits with the status indicated. This is useful in conditional constructs if you want to prepare a particular exit status for a subsequent exit or return command to inherit under certain circumstances. The status argument is parsed as a shell arithmetic expression. A negative value is treated as a fatal error. The behaviour of values greater than 255 is not standardised and depends on your particular shell.
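The idea can be approximated in plain POSIX sh as follows. The function sketch_setstatus is a hypothetical stand-in written for this example; it omits the validation described above and is not the modernish implementation. check_size is likewise invented to show a status being prepared for a later return.

```shell
# Hypothetical stand-in for setstatus (illustration only; without the
# validation that modernish performs).
sketch_setstatus() {
	return "$(( $1 ))"	# the argument is evaluated as arithmetic
}

sketch_setstatus "40 + 2"
status=$?
printf 'status: %s\n' "$status"

# Typical use: prepare a status for a later return to inherit.
check_size() {
	case ${#1} in
	( 0 )	sketch_setstatus 3 ;;
	( * )	sketch_setstatus 0 ;;
	esac
	return		# without an operand, return passes on the last status
}

check_size ''
empty_status=$?
check_size 'abc'
nonempty_status=$?
printf 'empty: %s, non-empty: %s\n' "$empty_status" "$nonempty_status"
```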

Testing numbers, strings and files

The test/[ command is the bane of casual shell scripters. Even advanced shell programmers are frequently caught unaware by one of the many pitfalls of its arcane, hackish syntax. It attempts to look like shell grammar without being shell grammar, causing myriad problems. Its -a, -o, ( and ) operators are inherently and fatally broken as there is no way to reliably distinguish operators from operands, so POSIX deprecates their use; however, most manual pages do not include this essential information, and even the few that do will not tell you what to do instead.

Ksh, zsh and bash offer a [[ alternative that fixes many of these problems, as it is integrated into the shell grammar. Nevertheless, it increases confusion, as entirely different grammar and quoting rules apply within [[...]] than outside it, yet many scripts end up using them interchangeably. It is also not available on all POSIX shells. (To make matters worse, Busybox ash has a false-friend [[ that is just an alias of [, with none of the shell grammar integration!)

Finally, the POSIX test/[ command is incompatible with the modernish "safe mode" which aims to eliminate most of the need to quote variables. See use safe for more information.

Modernish deprecates test/[ and [[ completely. Instead, it offers a comprehensive alternative command design that works with the usual shell grammar in a safer way while offering various feature enhancements. The following replacements are available:

Integer number arithmetic tests and operations

To test if a string is a valid number in shell syntax, str isint is available. See String tests.

The arithmetic command let

An implementation of let as in ksh, bash and zsh is now available to all POSIX shells. This makes C-style signed integer arithmetic evaluation available to every supported shell, with the exception of the unary ++ and -- operators (which are a nonstandard shell capability detected by modernish under the ID of ARITHPP).

This means let should be used for operations and tests, e.g. both let "x=5" and if let "x==5"; then... are supported (note: single = for assignment, double == for comparison). See POSIX 2.6.4 Arithmetic Expansion for more information on the supported operators.

Multiple expressions are supported, one per argument. The exit status of let is zero (the shell's idea of success/true) if the last expression argument evaluates to non-zero (the arithmetic idea of true), and 1 otherwise.

It is recommended to adopt the habit of quoting each let expression with "double quotes", as this consistently makes everything work as expected: double quotes protect operators that would otherwise be misinterpreted as shell grammar, while shell expansions starting with $ continue to work.
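The exit status convention and the quoting advice can be tried out with the following plain POSIX sh sketch. sketch_let is a hypothetical stand-in that leans on the shell's own $(( )) arithmetic expansion; it does no validation and is not the modernish implementation of let.

```shell
# Hypothetical stand-in for let, for illustration only; it relies on the
# shell's own $(( )) arithmetic expansion and does no validation.
sketch_let() {
	_sl_result=0
	for _sl_expr in "$@"; do
		_sl_result=$(( $_sl_expr ))
	done
	case $_sl_result in
	( 0 )	return 1 ;;	# arithmetic false -> shell failure
	( * )	return 0 ;;	# arithmetic true (non-zero) -> shell success
	esac
}

sketch_let "x = 5" "y = x * 2"		# single = assigns; one expression per argument
printf 'x=%s y=%s\n' "$x" "$y"

if sketch_let "x < y"; then		# quotes keep < from being a redirection
	compare=true
else
	compare=false
fi
printf 'x < y is %s\n' "$compare"

sketch_let "x - 5"			# evaluates to 0, so the status is 1
zero_status=$?
printf 'status of "x - 5": %s\n' "$zero_status"
```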

Arithmetic shortcuts

Various handy functions that make common arithmetic operations and comparisons easier to program are available from the var/arith module.

String and file tests

The following notes apply to all commands described in the subsections of this section:

  1. "True" is understood to mean exit status 0, and "false" is understood to mean a non-zero exit status – specifically 1.
  2. Passing more than the number of arguments specified for each command is a fatal error. (If the safe mode is not used, excessive arguments may be generated accidentally if you forget to quote a variable. The test result would have been wrong anyway, so modernish kills the program immediately, which makes the problem much easier to trace.)
  3. Passing fewer than the number of arguments specified to the command is assumed to be the result of removal of an empty unquoted expansion. Where possible, this is not treated as an error, and an exit status corresponding to the omitted argument(s) being empty is returned instead. (This helps make the safe mode possible; unlike with test/[, paranoid quoting to avoid empty removal is not needed.)
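The empty removal mentioned in note 3 is what makes quoting mandatory with test/[. The following plain POSIX sh sketch, which does not use modernish, shows the wrong result that an unquoted empty variable produces there.

```shell
# Plain POSIX sh: what empty removal does to test/[.
var=''

if [ -n $var ]; then	# unquoted: expands to [ -n ], a one-argument test
	unquoted=true	# of the non-empty string "-n", which is true
else
	unquoted=false
fi

if [ -n "$var" ]; then	# quoted: the empty argument survives
	quoted=true
else
	quoted=false
fi

printf 'unquoted: %s\nquoted: %s\n' "$unquoted" "$quoted"
```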

String tests

The str function offers various operators for tests on strings. For example, str in $foo "bar" tests if the variable foo contains "bar".

The str function takes unary (one-argument) operators that check a property of a single word, binary (two-argument) operators that check a word against a pattern, as well as an option that makes binary operators check multiple words against a pattern.

Unary string tests

Usage: str operator [ word ]

The word is checked for the property indicated by operator; if the result is true, str returns status 0, otherwise it returns status 1.

The available unary string test operators are:

If word is omitted, it is treated as empty, on the assumption that it is an unquoted empty variable. Passing more than one argument after the operator is a fatal error.

Binary string matching tests

Usage: str operator [ [ word ] pattern ]

The word is compared to the pattern according to the operator; if it matches, str returns status 0, otherwise it returns status 1. The available binary matching operators are:

If word is omitted, it is treated as empty on the assumption that it is an unquoted empty variable, and the single remaining argument is assumed to be the pattern. Similarly, if both word and pattern are omitted, an empty word is matched against an empty pattern. Passing more than two arguments after the operator is a fatal error.

Multi-matching option

Usage: str -M operator [ [ word ... ] pattern ]

The -M option causes str to compare any number of words to the pattern. The available operators are the same as the binary string matching operators listed above.

All matching words are stored in the REPLY variable, separated by newline characters ($CCn) if there is more than one match. If no words match, REPLY is unset.

The exit status returned by str -M is as follows:

Usage example: the following matches a given GNU-style long-form command line option $1 against a series of available options. To make it possible for the options to be abbreviated, we check if any of the options begin with the given argument $1.

if str -M begin --fee --fi --fo --fum --foo --bar --baz --quux "$1"; then
	putln "OK. The given option $1 matched $REPLY"
else
	case $? in
	( 1 )	putln "No such option: $1" >&2 ;;
	( * )	putln "Ambiguous option: $1" "Did you mean:" "$REPLY" >&2 ;;
	esac
fi

File type tests

These avoid the snags with symlinks you get with [ and [[. By default, symlinks are not followed. Add -L to operate on files pointed to by symlinks instead of symlinks themselves (the -L makes no difference if the operands are not symlinks).

These commands all take one argument. If the argument is absent, they return false. More than one argument is a fatal error. See notes 1-3 in the parent section.
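One such snag with plain test/[ can be reproduced as follows: test -e follows symlinks, so a broken symlink is reported as nonexistent even though a directory entry is present. This is a plain POSIX sh sketch that does not use modernish; the file names are made up and mktemp -d is assumed to be available.

```shell
# Plain POSIX sh; mktemp -d is assumed to be available.
dir=$(mktemp -d) || exit 1
ln -s "$dir/no-such-target" "$dir/dangling"	# a broken symlink

# -e follows the symlink; -L examines the symlink itself.
if [ -e "$dir/dangling" ]; then exists=yes; else exists=no; fi
if [ -L "$dir/dangling" ]; then is_link=yes; else is_link=no; fi

printf 'test -e: %s\ntest -L: %s\n' "$exists" "$is_link"
rm -rf "$dir"
```

By the definitions in this section, is present would report this file as present, while is -L present would not.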

is present file: Returns true if the file is present in the file system (even if it is a broken symlink).

is -L present file: Returns true if the file is present in the file system and is not a broken symlink.

is sym file: Returns true if the file is a symbolic link (symlink).

is -L sym file: Returns true if the file is a non-broken symlink, i.e. a symlink that points (either directly or indirectly via other symlinks) to a non-symlink file that is present in the file system.

is reg file: Returns true if file is a regular data file.

is -L reg file: Returns true if file is either a regular data file or a symlink pointing (either directly or indirectly via other symlinks) to a regular data file.

Other commands are available that work exactly like is reg and is -L reg but test for other file types. To test for them, replace reg with one of:

File comparison tests

The following notes apply to these commands:

is newer file1 file2: Compares file timestamps, returning true if file1 is newer than file2. Also returns true if file1 exists, but file2 does not; this is consistent for all shells (unlike test file1 -nt file2).

is older file1 file2: Compares file timestamps, returning true if file1 is older than file2. Also returns true if file1 does not exist, but file2 does; this is consistent for all shells (unlike test file1 -ot file2).

is samefile file1 file2: Returns true if file1 and file2 are the same file (hardlinks).

is onsamefs file1 file2: Returns true if file1 and file2 are on the same file system. If any non-regular, non-directory files are specified, their parent directory is tested instead of the file itself.

File status tests

These always follow symlinks.

is nonempty file: Returns true if the file exists, is not a broken symlink, and is not empty. Unlike [ -s file ], this also works for directories, as long as you have read permission in them.

is setuid file: Returns true if the file has its set-user-ID flag set.

is setgid file: Returns true if the file has its set-group-ID flag set.

I/O tests

is onterminal FD: Returns true if file descriptor FD is associated with a terminal. The FD may be a non-negative integer number or one of the special identifiers stdin, stdout and stderr which are equivalent to 0, 1, and 2. For instance, is onterminal stdout returns true if commands that write to standard output (FD 1), such as putln, would write to the terminal, and false if the output is redirected to a file or pipeline.

File permission tests

Any symlinks given are resolved, as these tests would be meaningless for a symlink itself.

can read file: True if the file's permission bits indicate that you can read the file - i.e., if an r bit is set and applies to your user.

can write file: True if the file's permission bits indicate that you can write to the file: for non-directories, if a w bit is set and applies to your user; for directories, both w and x.

can exec file: True if the file's type and permission bits indicate that you can execute the file: for regular files, if an x bit is set and applies to your user; for other file types, never.

can traverse file: True if the file is a directory and its permission bits indicate that a path can traverse through it to reach its subdirectories: for directories, if an x bit is set and applies to your user; for other file types, never.

The stack

In modernish, every variable and shell option gets its own stack. Arbitrary values/states can be pushed onto the stack and popped off it in reverse order. For variables, both the value and the set/unset state are (re)stored.

Usage:

where item is a valid portable variable name, a short-form shell option (dash plus letter), or a long-form shell option (-o followed by an option name, as two arguments).

Before pushing or popping anything, both functions check that all the given arguments are valid, and pop also checks that every item has a non-empty stack. This allows pushing and popping groups of items with a check for the integrity of the entire group. pop exits with status 0 if all items were popped successfully, and with status 1 if one or more of the given items could not be popped (and no action was taken at all).

The --key= option is an advanced feature that can help different modules or functions to use the same variable stack safely. If a key is given to push, then for each item, the given key value is stored along with the variable's value for that position in the stack. Subsequently, restoring that value with pop will only succeed if the key option with the same key value is given to the pop invocation. Similarly, popping a keyless value only succeeds if no key is given to pop. If there is any key mismatch, no changes are made and pop returns status 2. Note that this is a robustness/convenience feature, not a security feature; the keys are not hidden in any way.

If the --keepstatus option is given, pop will exit with the exit status of the command executed immediately prior to calling pop. This can avoid the need for awkward workarounds when restoring variables or shell options at the end of a function. However, note that this makes failure to pop (stack empty or key mismatch) a fatal error that kills the program, as pop no longer has a way to communicate this through its exit status.

The shell options stack

push and pop allow saving and restoring the state of any shell option available to the set builtin. The precise shell options supported (other than the ones guaranteed by POSIX) depend on the shell modernish is running on. To facilitate portability, nonexistent shell options are treated as unset.

Long-form shell options are matched to their equivalent short-form shell options, if they exist. For instance, on all POSIX shells, -f is equivalent to -o noglob, and push -o noglob followed by pop -f works correctly. This also works for shell-specific short & long option equivalents.
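The equivalence of -f and -o noglob, and the amount of code needed to save and restore a single option by hand, can be seen in this plain POSIX sh sketch. It does not use modernish, and the variable names are arbitrary.

```shell
# Plain POSIX sh: -f and -o noglob are the same option, and saving and
# restoring it by hand takes this much code. Variable names are arbitrary.
case $- in
( *f* )	saved_noglob=on ;;
( * )	saved_noglob=off ;;
esac

set -o noglob			# set with the long form...
case $- in
( *f* )	during=on ;;		# ...and it shows up as the short-form letter
( * )	during=off ;;
esac

case $saved_noglob in		# restore the saved state using the short form
( on )	set -f ;;
( off )	set +f ;;
esac
case $- in
( *f* )	after=on ;;
( * )	after=off ;;
esac

printf 'saved=%s during=%s after=%s\n' "$saved_noglob" "$during" "$after"
```

This manual approach only works for options that have a letter in $-, which is one reason a general option stack is convenient.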

On shells with a dynamic no option name prefix, that is on ksh, zsh and yash (where, for example, noglob is the opposite of glob), the no prefix is ignored, so something like push -o glob followed by pop -o noglob does the right thing. But this depends on the shell and should never be used in portable scripts.

The trap stack

Modernish can also make traps stack-based, so that each program component or library module can set its own trap commands without interfering with others. This functionality is provided by the var/stack/trap module.

Modules

As modularity is one of modernish's design principles, much of its essential functionality is provided in the form of loadable modules, so the core library is kept lean. Modules are organised hierarchically, with names such as safe, var/loop and sys/cmd/harden. The use command loads and initialises a module or a combined directory of modules.

Internally, modules exist in files with the name extension .mm in subdirectories of lib/modernish/mdl – for example, the module var/stack/trap corresponds to the file lib/modernish/mdl/var/stack/trap.mm.

Usage:

use modulename [ argument ... ]
use [ -q | -e ] modulename
use -l

The first form loads and initialises a module. All arguments, including the module name, are passed on to the dot script unmodified, so modules know their own name and can implement option parsing to influence their initialisation. See also Two basic forms of a modernish program for information on how to use modules in portable-form scripts.

In the second form, the -q option queries if a module is loaded, and the -e option queries if a module exists. use returns status 0 for yes, 1 for no, and 2 if the module name is invalid.

The -l option lists all currently loaded modules in the order in which they were originally loaded. Just add | sort for alphabetical order.

If a directory of modules, such as sys/cmd or even just sys, is given as the modulename, then all the modules in that directory and any subdirectories are loaded recursively. In this case, passing extra arguments is a fatal error.

If a module file X.mm exists along with a directory X, resolving to the same modulename, then use will load the X.mm module file without automatically loading any modules in the X directory, because it is expected that X.mm handles the submodules in X manually. (This is currently the case for var/loop, which auto-loads submodules containing loop types on first use.)

The complete lib/modernish/mdl directory path, which depends on where modernish is installed, is stored in the system constant $MSH_MDL.

The following subchapters document the modules that come with modernish.

use safe

The safe module sets the 'safe mode' for the shell. It removes most of the need to quote variables, parameter expansions, command substitutions, or glob patterns. It uses shell settings and modernish library functionality to secure and demystify split and glob mechanisms. This creates a new and safer way of shell script programming, essentially building a new shell language dialect while still running on all POSIX-compliant shells.

Why the safe mode?

One of the most common headaches with shell scripting is caused by a fundamental flaw in the shell as a scripting language: constantly active field splitting (a.k.a. word splitting) and pathname expansion (a.k.a. globbing). To cope with this situation, it is hammered into programmers of shell scripts to be absolutely paranoid about properly quoting nearly everything, including variable and parameter expansions, command substitutions, and patterns passed to commands like find.

These mechanisms were designed for interactive command line usage, where they do come in very handy. But when the shell language is used as a programming language, splitting and globbing often ends up being applied unexpectedly to unquoted expansions and command substitutions, helping cause thousands of buggy, brittle, or outright dangerous shell scripts.

One could blame the programmer for forgetting to quote an expansion properly, or one could blame a pitfall-ridden scripting language design where hammering punctilious and counterintuitive habits into casual shell script programmers is necessary. Modernish does the latter, then fixes it.

How the safe mode works

Every POSIX shell comes with a little-used ability to disable global field splitting and pathname expansion: IFS=''; set -f. An empty IFS variable disables split; the -f (or -o noglob) shell option disables pathname expansion. The safe mode sets these, and two others (see below).
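The effect of these two settings can be seen in plain POSIX sh, without modernish. The sketch below counts how many arguments an unquoted expansion produces under the default settings and under IFS=''; set -f. The helper count_args is hypothetical, and the default count depends on the files in the current directory, so no exact number is claimed for it.

```shell
# Plain POSIX sh, no modernish: count the arguments an unquoted expansion yields.
count_args() { echo "$#"; }

var='two words and a *'

default_count=$(count_args $var)   # split on whitespace; '*' is globbed

saved_ifs=$IFS
IFS=''; set -f                     # the two settings described above
safe_count=$(count_args $var)      # neither split nor globbed: one argument
IFS=$saved_ifs; set +f             # restore (assumes noglob was off before)

echo "default: $default_count argument(s), safe: $safe_count argument(s)"
```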

The reason these safer settings are hardly ever used is that they are not practical to use with the standard shell language. For instance, for textfile in *.txt breaks, as does for item in $(some command), which both (!) field-splits and pathname-expands the output of a command.

However, that is where modernish comes in. It introduces several powerful new loop constructs, as well as arbitrary code blocks with local settings, each of which has straightforward, intuitive operators for safely applying field splitting or pathname expansion – to specific command arguments only. By default, they are not both applied to the arguments, which is much safer. And your script code as a whole is kept safe from them at all times.

With global field splitting and pathname expansion removed, a third issue still affects the safe mode: the shell's empty removal mechanism. If the value of an unquoted expansion like $var is empty, it will not expand to an empty argument, but will be removed altogether, as if it were never there. This behaviour cannot be disabled.

Thankfully, the vast majority of shell and Un*x commands order their arguments in a way that is actually designed with empty removal in mind, making it a good thing. For instance, when doing ls $option some_dir, if $option is -l the listing will be long-format and if it is empty it will be removed, which is the desired behaviour. (An empty argument there would cause an error.)
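Empty removal is easy to observe in plain POSIX sh. In this sketch (count_args is a hypothetical helper), the unquoted empty expansion vanishes from the argument list, while the quoted one survives as an empty argument.

```shell
# Plain POSIX sh: an unquoted empty expansion is removed, a quoted one is kept.
count_args() { echo "$#"; }

option=''
unquoted=$(count_args $option some_dir)     # $option disappears
quoted=$(count_args "$option" some_dir)     # "" stays as an empty argument
echo "unquoted: $unquoted, quoted: $quoted"
```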

However, one command that is used in almost all shell scripts, test/[, is completely unable to cope with empty removal due to its idiosyncratic and counterintuitive syntax. Potentially empty operands come before options, so operands removed as empty expansions cause errors or, worse, false positives. Thus, the safe mode does not remove the need for paranoid quoting of expansions used with test/[ commands. Modernish fixes this issue by deprecating test/[ completely and offering a safe command design to use instead, which correctly deals with empty removal.
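A false positive of this kind can be reproduced in plain POSIX sh. With an empty, unquoted $var, the command [ -n $var ] collapses to [ -n ], which tests whether the literal string -n is non-empty and therefore succeeds. The variable names below are illustrative.

```shell
# Plain POSIX sh: empty removal turns '[ -n $var ]' into '[ -n ]', which is true.
var=''
if [ -n $var ]; then unquoted_result=true; else unquoted_result=false; fi
if [ -n "$var" ]; then quoted_result=true; else quoted_result=false; fi
echo "unquoted: $unquoted_result, quoted: $quoted_result"
```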

With the 'safe mode' shell settings, plus the safe, explicit and readable split and glob operators and test/[ replacements, the only quoting requirements left are:

  1. a very occasional need to stop empty removal from happening;
  2. to quote "$@" and "$*" until shell bugs are fixed (see notes below).

In addition to the above, the safe mode also sets these shell options:

Important notes for safe mode

Extra options for the safe mode

Usage: use safe [ -k | -K ] [ -i ]

The -k and -K module options install an extra handler that reliably kills the program if it tries to execute a command that is not found, on shells that have the ability to catch and handle 'command not found' errors (currently bash, yash, and zsh). This helps catch typos, forgetting to load a module, etc., and stops your program from continuing in an inconsistent state and potentially causing damage. The MSH_NOT_FOUND_OK variable may be set to temporarily disable this check. The uppercase -K module option aborts the program on shells that cannot handle 'command not found' errors (so should not be used for portable scripts), whereas the lowercase -k variant is ignored on such shells.

If the -i option is given, or the shell is interactive, two extra one-letter functions are loaded, s and g. These are pre-command modifiers for use when split and glob are globally disabled; they allow running a single command with local split and glob applied to that command's arguments only. They also have some options designed to manipulate, examine, save, restore, and generally experiment with the global split and glob state on interactive shells. Type s --help and g --help for more information. In general, the safe mode is designed for scripts and is not recommended for interactive shells.

use var/loop

The var/loop module provides an innovative, robust and extensible shell loop construct. Several powerful loop types are provided, while advanced shell programmers may find it easy and fun to create their own. This construct is also ideal for the safe mode: the for, select and find loop types allow you to selectively apply field splitting and/or pathname expansion to specific arguments without subjecting a single line of your code to them.

The basic form is a bit different from native shell loops. Note the caps:

LOOP looptype arguments; DO
      your commands here
DONE

The familiar do...done block syntax cannot be used because the shell will not allow modernish to add its own functionality to it. The DO...DONE block does behave in the same way as do...done: you can append redirections at the end, pipe commands into a loop, etc. as usual. The break and continue shell builtin commands also work as normal.

Remember: using lowercase do...done with modernish LOOP will cause the shell to throw a misleading syntax error. So will using uppercase DO...DONE with the shell's native loops. To help you remember to use the uppercase variants for modernish loops, the LOOP keyword itself is also in capitals.

Loops exist in submodules of var/loop named after the loop type; for instance, the find loop lives in the var/loop/find module. However, the core var/loop module will automatically load a loop type's module when that loop is first used, so use-ing individual loop submodules at your script's startup time is optional.

The LOOP block internally uses file descriptor 8 to do its thing. If your script happens to use FD 8 for other purposes, you should know that FD 8 is made local to each loop block, and always appears initially closed within DO...DONE.

Simple repeat loop

This simply iterates the loop the number of times indicated. Before the first iteration, the argument is evaluated as a shell integer arithmetic expression as in let and its value used as the number of iterations.

LOOP repeat 3; DO
	putln "This line is repeated 3 times."
DONE

BASIC-style arithmetic for loop

This is a slightly enhanced version of the FOR loop in BASIC. It is more versatile than the repeat loop but still very easy to use.

LOOP for varname=initial to limit [ step increment ]; DO
      some commands
DONE

To count from 1 to 20 in steps of 2:

LOOP for i=1 to 20 step 2; DO
	putln "$i"
DONE

Note the varname=initial needs to be one argument as in a shell assignment (so no spaces around the =).

If "step increment" is omitted, increment defaults to 1 if limit is equal to or greater than initial, or to -1 if limit is less than initial (so counting backwards 'just works').

Technically precise description: On entry, the initial, limit and increment values are evaluated once as shell arithmetic expressions as in let, the value of initial is assigned to varname, and the loop iterates. Before every subsequent iteration, the value of increment (as determined on the first iteration) is added to the value of varname, then the limit expression is re-evaluated. The loop reiterates as long as the current value of varname is less than or equal to the current value of limit (if increment is non-negative), or greater than or equal to it (if increment is negative).
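As a rough illustration of these semantics, the loop for i=1 to 20 step 2 can be spelled out as a plain POSIX while loop. This is a sketch, not modernish's implementation: the variable names are made up, and it also tests the limit before the first iteration, an assumption that makes no difference for this example.

```shell
# Plain POSIX sh rendering of 'for i=1 to 20 step 2' (sketch only).
initial=1 limit=20 increment=2
i=$((initial))
out=''
while
	# the limit is re-evaluated before each iteration
	if [ "$((increment))" -ge 0 ]; then
		[ "$i" -le "$((limit))" ]
	else
		[ "$i" -ge "$((limit))" ]
	fi
do
	out="$out $i"
	i=$((i + increment))
done
echo "visited:$out"
```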

C-style arithmetic for loop

A C-style for loop akin to for (( )) in ksh93, bash and zsh is now available on all POSIX-compliant shells, with a slightly different syntax. The one loop argument contains three arithmetic expressions (as in let), separated by semicolons within that argument. The first is only evaluated before the first iteration, so is typically used to assign an initial value. The second is evaluated before each iteration to check whether to continue the loop, so it typically contains some comparison operator. The third is evaluated before the second and further iterations, and typically increases or decreases a value. For example, to count from 1 to 10:

LOOP for "i=1; i<=10; i+=1"; DO
	putln "$i"
DONE

However, using complex expressions allows doing much more powerful things. Any or all of the three expressions may also be left empty (with their separating ; character remaining). If the second expression is empty, it defaults to 1, creating an infinite loop.

(Note that ++i and i++ can only be used on shells with ARITHPP, but i+=1 or i=i+1 can be used on all POSIX-compliant shells.)
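To make the role of the three expressions concrete, here is the count-to-10 example written as a plain POSIX while loop, without modernish. It is only an approximation: placing the third expression at the end of the body matches the description as long as the body does not use continue.

```shell
# 'i=1; i<=10; i+=1' spelled out as a plain POSIX while loop (sketch only).
: "$((i = 1))"                       # first expression: once, before the loop
sum=0
while [ "$((i <= 10))" -ne 0 ]; do   # second expression: before every iteration
	sum=$((sum + i))
	: "$((i += 1))"                  # third expression: before the next iteration
done
echo "sum=$sum i=$i"
```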

Enumerative for/select loop with safe split/glob

The enumerative for and select loop types mirror those already present in native shell implementations. However, the modernish versions provide safe field splitting and globbing (pathname expansion) functionality that can be used without globally enabling split or glob for any of your code – ideal for the safe mode. They also add a unique operator for processing text in fixed-size slices. The select loop type brings select functionality to all POSIX shells and not just ksh, zsh and bash.

Usage:

LOOP [ for | select ] [ operators ] varname in argument ... ; DO commands ; DONE

Simple usage example:

LOOP select --glob textfile in *.txt; DO
	putln "You chose text file $textfile."
DONE

If the loop type is for, the loop iterates once for each argument, storing it in the variable named varname.

If the loop type is select, the loop presents before each iteration a numbered menu that allows the user to select one of the arguments. The prompt from the PS3 variable is displayed and a reply read from standard input. The literal reply is stored in the REPLY variable. If the reply was a number corresponding to an argument in the menu, that argument is stored in the variable named varname. Then the loop iterates. If the user enters ^D (end of file), REPLY is cleared and the loop breaks with an exit status of 1. (To break the menu loop under other conditions, use the break command.)

The following operators are supported. Note that the split and glob operators are only for use in the safe mode.

If multiple operators are given, their mechanisms are applied in the following order: split, glob, base, slice.

The find loop

This powerful loop type turns your local POSIX-compliant find utility into a shell loop, safely integrating both find and xargs functionality into the POSIX shell. The infamous pitfalls and limitations of using find and xargs as external commands are gone, as all the results from find are readily available to your main shell script. Any "dangerous" characters in file names (including whitespace and even newlines) "just work", especially if the safe mode is also active. This gives you the flexibility to use either the find expression syntax, or shell commands (including your own shell functions), or some combination of both, to decide whether and how to handle each file found.

Usage:

LOOP find [ options ] varname [ in path ... ] [ find-expression ] ; DO commands ; DONE

LOOP find [ options ] --xargs[=arrayname] [ in path ... ] [ find-expression ] ; DO commands ; DONE

The loop recursively walks down the directory tree for each path given. For each file encountered, it uses the find-expression to decide whether to iterate the loop with the path to the file stored in the variable referenced by varname. The find-expression is a standard find utility expression except as described below.

Any number of paths to search may be specified after the in keyword. By default, a nonexistent path is a fatal error. The entire in clause may be omitted, in which case it defaults to in . so the current working directory will be searched. Any argument that starts with a -, or is identical to ! or (, indicates the end of the paths and the beginning of the find-expression; if you need to explicitly specify a path with such a name, prefix ./ to it.

Except for syntax errors, any errors or warnings issued by find are considered non-fatal and will cause the exit status of the loop to be non-zero, so your script has the opportunity to handle the exception.

Available options

Available find-expression operands

LOOP find can use all expression operands supported by your local find utility; see its manual page. However, portable scripts should use only operands specified by POSIX along with the modernish additions described below.

The modernish -iterate expression primary evaluates as true and causes the loop to iterate, executing your commands for each matching file. It may be used any number of times in the find-expression to start a corresponding series of loop iterations. If it is not given, the loop acts as if the entire find-expression is enclosed in parentheses with -iterate appended. If the entire find-expression is omitted, it defaults to -iterate.

The modernish -ask primary asks confirmation of the user. The text of the prompt may be specified in one optional argument (which cannot start with - or be equal to ! or (). Any occurrences of the characters {} within the prompt text are replaced with the current pathname. If not specified, the default prompt is: "{}"? If the answer is affirmative (y or Y in the POSIX locale), -ask yields true, otherwise false. This can be used to make any part of the expression conditional upon user input, and (unlike commands in the shell loop body) is capable of influencing directory traversal mid-run.

The standard -exec and -ok primaries are integrated into the main shell environment. When used with LOOP find, they can call a shell builtin command or your own shell function directly in the main shell (no subshell). Its exit status is used in the find expression as a true/false value capable of influencing directory traversal (for example, when combined with -prune), just as if it were an external command -exec'ed with the standard utility.

Some familiar, easy-to-use but non-standard find operands from GNU and/or BSD may be used with LOOP find on all systems. Before invoking the find utility, modernish translates them internally to portable equivalents. The following expression operands are made portable:

Expression primaries that write output (-print and friends) may be used for debugging or logging the loop. Their output is redirected to standard error.

Picking a find utility

Upon initialisation, the var/loop/find module searches for a POSIX-compliant find utility under various names in $DEFPATH and then in $PATH. To see a trace of the full command lines of utility invocations when the loop runs, set the _loop_DEBUG variable to any value.

For debugging or system-specific usage, it is possible to use a certain find utility in preference to any others on the system. To do this, add an argument to a use var/loop/find command before the first use of the loop. For example:

Compatibility mode for obsolete find utilities

Some systems come with obsolete or broken find utilities that don't fully support -exec ... {} + aggregating functionality as specified by POSIX. Normally, this is a fatal error, but passing the -b/-B option to the use command, e.g. use var/loop/find -b, enables a compatibility mode that tolerates this defect. If no compliant find is found, then an obsolete or broken find is used as a last resort, a warning is printed to standard error, and the variable _loop_find_broken is set. The -B option is equivalent to -b but does not print a warning. Loop performance may suffer as modernish adapts to using the older -exec ... {} \; form, which is very inefficient.
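The efficiency difference between the two -exec forms can be seen with a standard find utility alone. This plain sh sketch creates three files in a scratch directory (using mktemp -d, which is widely available but not specified by POSIX) and counts how many times the command is invoked in each form.

```shell
# Plain sh: '-exec ... {} +' aggregates paths, '-exec ... {} \;' runs once per file.
dir=$(mktemp -d)
: >"$dir/a"; : >"$dir/b"; : >"$dir/c"

plus_runs=$(find "$dir" -type f -exec sh -c 'echo run' sh {} + | wc -l)
semi_runs=$(find "$dir" -type f -exec sh -c 'echo run' sh {} \; | wc -l)

rm -rf "$dir"
echo "with +: $((plus_runs)) invocation(s), with \\;: $((semi_runs)) invocation(s)"
```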

Scripts using this compatibility mode should handle their logic using shell code in the loop body as much as possible (after DO) and use only simple find expressions (before DO), as obsolete utilities are often buggy and breakage is likely if complex expressions or advanced features are used.

find loop usage examples

Simple example script: without the safe mode, the *.txt pattern must be quoted to prevent it from being expanded by the shell.

. modernish
use var/loop
LOOP find TextFile in ~/Documents -name '*.txt'
DO
	putln "Found my text file: $TextFile"
DONE

Example script with safe mode: the --glob option expands the patterns of the in clause, but not the expression – so it is not necessary to quote any pattern.

. modernish
use safe
use var/loop
LOOP find --glob lsProg in /*bin /*/*bin -type f -name ls*
DO
	putln "This command may list something: $lsProg"
DONE

Example use of the modernish -ask primary: ask the user if they want to descend into each directory found. The shell loop body could skip unwanted results, but cannot physically influence directory traversal, so skipping large directories would take a long time. A find expression can prevent directory traversal using the standard -prune primary, which can be combined with -ask, so that unwanted directories never iterate the loop in the first place.

. modernish
use safe
use var/loop
LOOP find file in ~/Documents \
	-type d \( -ask 'Descend into "{}" directory?' -or -prune \) \
	-or -iterate
DO
	put "File found: "
	ls -li $file
DONE

Creating your own loop

The modernish loop construct is extensible. To define a new loop type, you only need to define a shell function called _loopgen_type where type is the loop type. This function, called the loop iteration generator, is expected to output lines of text to file descriptor 8, containing properly shell-quoted iteration commands for the shell to run, one line per iteration.

The internal commands expanded from LOOP, DO and DONE (which are defined as aliases) launch that loop iteration generator function in the background with safe mode enabled, while causing the main shell to read lines from that background process through a pipe, evaling each line as a command before iterating the loop. As long as that iteration command finishes with an exit status of zero, the loop keeps iterating. If it has a nonzero exit status or if there are no more commands to read, iteration terminates and execution continues beyond the loop.
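The general principle can be imitated in plain sh without modernish. In the sketch below, a background generator writes one command per iteration, and the main shell reads those lines through file descriptor 8 and evals each one before running the loop body. Everything here is an assumption made for illustration: the function gen_countdown, the use of a named pipe, and mktemp -d (common, but not POSIX). It is not how modernish wires up LOOP, DO and DONE.

```shell
# Plain sh sketch of the generator principle; all names are hypothetical.
gen_countdown() {
	n=$1
	while [ "$n" -gt 0 ]; do
		echo "count=$n"             # one iteration command per line
		n=$((n - 1))
	done
}

dir=$(mktemp -d) && mkfifo "$dir/fifo"
gen_countdown 3 >"$dir/fifo" &      # generator runs in the background
exec 8<"$dir/fifo"                  # main shell reads it through FD 8
seen=''
while IFS= read -r cmd <&8 && eval "$cmd"; do
	seen="$seen$count "             # loop body runs in the main shell
done
exec 8<&-
wait
rm -rf "$dir"
echo "seen: $seen"
```

Because the body runs in the main shell rather than in a pipeline subshell, the variable seen keeps its value after the loop ends.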

Instead of the normal internal namespace which is considered off-limits for modernish scripts, var/loop and its submodules use a _loop_* internal namespace for variables, which is also for use by user-implemented loop iteration generator functions.

The above is just the general principle. For the details, study the comments and the code in lib/modernish/mdl/var/loop.mm and the loop generators in lib/modernish/mdl/var/loop/*.mm.

use var/local

This module defines a new LOCAL...BEGIN...END shell code block construct with local variables, local positional parameters and local shell options. The local positional parameters can be filled using safe field splitting and pathname expansion operators similar to those in the LOOP construct described above.

Usage: LOCAL [ localitem | operator ... ] [ -- [ word ... ] ] ; BEGIN commands ; END

The commands are executed once, with the specified localitems applied. Each localitem can be:

Modernish implements LOCAL blocks as one-time shell functions that use the stack to save and restore variables and settings. So the return command exits the block, causing the global variables and settings to be restored and resuming execution at the point immediately following END. Like any shell function, a LOCAL block exits with the exit status of the last command executed within it, or with the status passed on by or given as an argument to return.

The positional parameters ($@, $1, etc.) are always local to the block, but a copy is inherited from outside the block by default. Any changes to the positional parameters made within the block will be discarded upon exiting it.
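This behaviour follows from ordinary shell function semantics, which can be checked in plain POSIX sh. In this sketch (show_block is a hypothetical function standing in for a block), the caller's parameters are passed in as a copy, changed inside, and found unchanged afterwards.

```shell
# Plain POSIX sh: positional parameters are local to a shell function.
show_block() {
	inherited_count=$#          # a copy of the caller's parameters
	set -- one two three        # changes only the function's own parameters
	inner_count=$#
}

set -- outer
show_block "$@"
outer_count=$#
echo "inherited: $inherited_count, inside: $inner_count, outside: $outer_count"
```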

However, if a double-dash -- argument is given in the LOCAL command line, the positional parameters outside the block are ignored and the set of words after -- (which may be empty) becomes the positional parameters instead.

These words can be modified prior to entering the LOCAL block using the following operators. The safe glob and split operators are only accepted in the safe mode. The operators are:

If multiple operators are given, their mechanisms are applied in the following order: split, glob, base, slice.

Important var/local usage notes

use var/arith

These shortcut functions are alternatives for using let.

Arithmetic operator shortcuts

inc, dec, mult, div, mod: simple integer arithmetic shortcuts. The first argument is a variable name. The optional second argument is an arithmetic expression, but a sane default value is assumed (1 for inc and dec, 2 for mult and div, 256 for mod). For instance, inc X is equivalent to X=$((X+1)) and mult X Y-2 is equivalent to X=$((X*(Y-2))).

ndiv is like div but with correct rounding down for negative numbers. Standard shell integer division simply chops off any digits after the decimal point, which has the effect of rounding down for positive numbers and rounding up for negative numbers. ndiv consistently rounds down.
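The difference between truncating and flooring division is visible with -7 / 2. The helper floor_div below is a hypothetical plain POSIX sh sketch of flooring division, written for illustration; it is not modernish's ndiv code.

```shell
# Hypothetical helper, not modernish's ndiv: integer division rounding down.
floor_div() {
	q=$(( ($1) / ($2) ))
	# Adjust when there is a remainder and the operands have different signs.
	if [ "$(( ($1) % ($2) ))" -ne 0 ] &&
	   [ "$(( (($1) < 0) != (($2) < 0) ))" -ne 0 ]; then
		q=$((q - 1))
	fi
	echo "$q"
}

trunc=$(( -7 / 2 ))             # standard shell division chops toward zero
floor=$(floor_div -7 2)
echo "truncated: $trunc, floored: $floor"
```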

Arithmetic comparison shortcuts

These have the same name as their test/[ option equivalents. Unlike with test, the arguments are shell integer arithmetic expressions, which can be anything from simple numbers to complex expressions. As with $(( )), variable names are expanded to their values even without the $.

Function:         Returns successfully if:
eq <expr> <expr>  the two expressions evaluate to the same number
ne <expr> <expr>  the two expressions evaluate to different numbers
lt <expr> <expr>  the 1st expr evaluates to a smaller number than the 2nd
le <expr> <expr>  the 1st expr eval's to smaller than or equal to the 2nd
gt <expr> <expr>  the 1st expr evaluates to a greater number than the 2nd
ge <expr> <expr>  the 1st expr eval's to greater than or equal to the 2nd
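To show what passing arithmetic expressions as arguments looks like, here are two stand-in functions in plain POSIX sh. The names my_lt and my_eq are hypothetical and the bodies are a guess at one possible approach, not modernish's code; they only mimic the calling style of lt and eq.

```shell
# Hypothetical stand-ins (not modernish's code) with the same calling style.
my_lt() { [ "$(( ($1) < ($2) ))" -ne 0 ]; }
my_eq() { [ "$(( ($1) == ($2) ))" -ne 0 ]; }

X=5 Y=12
# Variable names inside the expressions need no '$'.
if my_lt X+1 Y-2; then lt_result=yes; else lt_result=no; fi          # 6 < 10
if my_eq 'X * 2' 'Y - 2'; then eq_result=yes; else eq_result=no; fi  # 10 == 10
echo "lt: $lt_result, eq: $eq_result"
```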

use var/assign

This module is provided to solve a common POSIX shell language annoyance: in a normal shell variable assignment, only literal variable names are accepted, so it is impossible to use a variable whose name is stored in another variable. The only way around this is to use eval which is too difficult to use safely. Instead, you can now use the assign command.

Usage: assign [ [ +r ] variable=value ... ] | [ -r variable=variable2 ... ] ...

assign safely processes assignment-arguments in the same form as customarily given to the readonly and export commands, but it only assigns values to variables without setting any attributes. Each argument is grammatically an ordinary shell word, so any part or all of it may result from an expansion. The absence of a = character in any argument is a fatal error. The text preceding the first = is taken as the variable name in which to store the value; an invalid variable name is a fatal error. No whitespace is accepted before the = and any whitespace after the = is part of the value to be assigned.

The -r (reference) option causes the part to the right of the = to be taken as a second variable name variable2, and its value is assigned to variable instead. +r turns this option back off.

Examples: Each of the lines below assigns the value 'hello world' to the variable greeting.

var=greeting; assign $var='hello world'
var=greeting; assign "$var=hello world"
tag='greeting=hello world'; assign "$tag"
var=greeting; gvar=myinput; myinput='hello world'; assign -r $var=$gvar
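To illustrate why a bare eval is hard to use safely here, the following plain POSIX sh sketch validates the variable name before evaluating anything. The function safe_assign is hypothetical and much simpler than assign; in particular, it returns a status where modernish treats these cases as fatal errors.

```shell
# Hypothetical helper, not modernish's assign: validate the name, then eval.
safe_assign() {
	case $1 in *=*) ;; *) echo "no = in argument" >&2; return 2 ;; esac
	name=${1%%=*} value=${1#*=}
	case $name in
	'' | [0-9]* | *[!A-Za-z0-9_]*) echo "invalid name: $name" >&2; return 2 ;;
	esac
	eval "$name=\$value"        # the value itself is never parsed as code
}

var=greeting
safe_assign "$var=hello world"
# An argument that tries to smuggle in a command is refused.
if safe_assign 'x; echo INJECTED=1' 2>/dev/null; then bad=accepted; else bad=rejected; fi
echo "$greeting / $bad"
```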

use var/readf

readf reads arbitrary data from standard input into a variable until end of file, converting it into a format suitable for passing to the printf utility. For example, readf var <foo; printf "$var" >bar will copy foo to bar. Thus, readf allows storing both text and binary files into shell variables in a textual format suitable for manipulation with standard shell facilities.

All non-printable, non-ASCII characters are converted to printf octal or one-letter escape codes, except newlines. Not encoding newline characters allows for better processing by line-based utilities such as grep, sed, awk, etc. However, if the file ends in a newline, that final newline is encoded to \n to protect it from being stripped by command substitutions.
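The reason for escaping the final newline can be demonstrated with printf alone. In this plain POSIX sh sketch, the value of encoded is written by hand in the format described above (inner newline literal, final newline as \n); it is not actual readf output.

```shell
# Plain POSIX sh: why a final newline is stored as the escape '\n'.
raw=$(printf 'two\nlines\n')            # command substitution strips the final newline
encoded='two
lines\n'                                # hand-written, in the format described above
raw_len=$(printf '%s' "$raw" | wc -c)
enc_len=$(printf "$encoded" | wc -c)    # printf turns '\n' back into a newline
echo "raw: $((raw_len)) bytes, decoded: $((enc_len)) bytes"
```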

Usage: readf [ -h ] varname

The -h option disables conversion of high-byte characters (accented letters, non-Latin scripts). Do not use for binary files; this is only guaranteed to work for text files in an encoding compatible with the current locale.

Caveats:

use var/shellquote

This module provides an efficient, fast, safe and portable shell-quoting algorithm for quoting arbitrary data in such a way that the quoted values are safe to pass to the shell for parsing as string literals. This is essential for any context where the shell must grammatically parse untrusted input, such as when supplying arbitrary values to trap or eval.

The shell-quoting algorithm is optimised to minimise exponential growth when quoting repeatedly. By default, it also ensures that quoted strings are always one single printable line, making them safe for terminal output and processing by line-oriented utilities.
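For contrast, the textbook way to shell-quote a value is to wrap it in single quotes and replace each embedded single quote with '\''. The sketch below (naive_quote is a hypothetical helper, not modernish's algorithm) shows that the result survives eval unchanged, and that quoting an already quoted value makes it noticeably longer, which is the growth the modernish algorithm aims to limit.

```shell
# Hypothetical helper showing the textbook approach, not modernish's algorithm.
naive_quote() {
	printf '%s\n' "$1" | sed "s/'/'\\\\''/g; 1s/^/'/; \$s/\$/'/"
}

value="it's a \$HOME * test"
quoted=$(naive_quote "$value")
eval "copy=$quoted"                 # the shell parses the quoted form as a literal
twice=$(naive_quote "$quoted")      # a second level of quoting
echo "$quoted"
echo "lengths: ${#value} -> ${#quoted} -> ${#twice}"
```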

shellquote

Usage: shellquote [ -f|+f|-P|+P ] varname[=value] ...

The values of the variables specified by name are shell-quoted and stored back into those variables. Repeating a variable name will add another level of shell-quoting. If a = plus a value (which may be empty) is appended to the varname, that value is shell-quoted and assigned to the variable.

Options modify the algorithm for variable names following them, as follows:

shellquote will die if you attempt to quote an unset variable (because there is no value to quote).

shellquoteparams

The shellquoteparams command shell-quotes the current positional parameters in place using the default quoting method of shellquote. No options are supported and any attempt to add arguments results in a syntax error.

use var/stack

Modules that extend the stack.

use var/stack/extra

This module contains stack query and maintenance functions.

If you only need one or two of these functions, they can also be loaded as individual submodules of var/stack/extra.

For the four functions below, item can be:

stackempty [ --key=value ] [ --force ] item: Tests if the stack for an item is empty. Returns status 0 if it is, 1 if it is not. The key feature works as in pop: by default, a key mismatch is considered equivalent to an empty stack. If --force is given, this function ignores keys altogether.

clearstack [ --key=value ] [ --force ] item [ item ... ]: Clears one or more stacks, discarding all items on them. If (part of) the stack is keyed or a --key is given, only clears until a key mismatch is encountered. The --force option overrides this and always clears the entire stack (be careful, e.g. don't use within LOCAL ... BEGIN ... END). Returns status 0 on success, 1 if that stack was already empty, 2 if there was nothing to clear due to a key mismatch.

stacksize [ --silent | --quiet ] item: Leaves the size of a stack in the REPLY variable and, if option --silent or --quiet is not given, writes it to standard output. The size of the complete stack is returned, even if some values are keyed.

printstack [ --quote ] item: Outputs a stack's content. Option --quote shell-quotes each stack value before printing it, allowing for parsing multi-line or otherwise complicated values. Columns 1 to 7 of the output contain the number of the item (down to 0). If the item is set, columns 8 and 9 contain a colon and a space, and if the value is non-empty or quoted, columns 10 and up contain the value. Sets of values that were pushed with a key are started with a special line containing --- key: value. A subsequent set pushed with no key is started with a line containing --- (key off). Returns status 0 on success, 1 if that stack is empty.

use var/stack/trap

This module provides pushtrap and poptrap. These functions integrate with the main modernish stack to make traps stack-based, so that each program component or library module can set its own trap commands without interfering with others.

This module also provides a new DIE pseudosignal that allows pushing traps to execute when die is called.

Note an important difference between the trap stack and stacks for variables and shell options: pushing traps does not save them for restoring later, but adds them alongside other traps on the same signal. All pushed traps are active at the same time and are executed from last-pushed to first-pushed when the respective signal is triggered. Traps cannot be pushed and popped using push and pop but use dedicated commands as follows.
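The execution order can be pictured with a toy model in plain POSIX sh. This sketch keeps commands in a newline-separated list and runs them newest-first; the names my_pushtrap and run_traps are hypothetical, no real signal is involved, and it says nothing about how pushtrap is implemented.

```shell
# Toy model, not modernish's pushtrap: all pushed commands stay active
# and run from last-pushed to first-pushed.
trap_stack=''
my_pushtrap() {
	trap_stack="$1
$trap_stack"                        # newest command goes on top
}
run_traps() { eval "$trap_stack"; }

log=''
my_pushtrap 'log="${log}first "'
my_pushtrap 'log="${log}second "'
run_traps                           # stands in for the signal being triggered
echo "order: $log"
```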

Usage:

pushtrap works like regular trap, with the following exceptions:

poptrap takes just signal names or numbers as arguments. It takes the last-pushed trap for each signal off the stack. By default, it discards the trap commands. If the -R option is given, it stores commands to restore those traps into the REPLY variable, in a format suitable for re-entry into the shell. Again, the --key option works as in plain pop.

With the sole exception of DIE traps, all stack-based traps, like native shell traps, are reset upon entering a subshell. However, commands for printing traps will print the traps for the parent shell, until another trap, pushtrap or poptrap command is invoked, at which point all memory of the parent shell's traps is erased.

Trap stack compatibility considerations

Modernish tries hard to avoid incompatibilities with existing trap practice. To that end, it intercepts the regular POSIX trap command using an alias, reimplementing and interfacing it with the shell's builtin trap facility so that plain old regular traps play nicely with the trap stack. You should not notice any changes in the POSIX trap command's behaviour, except for the following:

POSIX traps for each signal are always executed after that signal's stack-based traps; this means they should not rely on modernish modules that use the trap stack to clean up after themselves on exit, as those cleanups would already have been done.

The new DIE pseudosignal

The var/stack/trap module adds a new DIE pseudosignal whose traps are executed upon invoking die. This allows for emergency cleanup operations upon fatal program failure, as EXIT traps cannot be executed after die is invoked.

use var/string

String comparison and manipulation functions.

use var/string/touplow

toupper and tolower: convert case in variables.

Usage:

Arguments are taken as variable names (note: they should be given without the $) and case is converted in the contents of the specified variables, without reading input or writing output.

toupper and tolower try hard to use the fastest available method on the particular shell your program is running on. They use built-in shell functionality where available and working correctly, otherwise they fall back on running an external utility.

Which external utility is chosen depends on whether the current locale uses the Unicode UTF-8 character set or not. For non-UTF-8 locales, modernish assumes the POSIX/C locale and tr is always used. For UTF-8 locales, modernish tries hard to find a way to correctly convert case even for non-Latin alphabets. A few shells have this functionality built in with typeset. The rest need an external utility. Modernish initialisation tries tr, awk, GNU awk and GNU sed before giving up and setting the variable MSH_2UP2LOW_NOUTF8. If isset MSH_2UP2LOW_NOUTF8, it means modernish is in a UTF-8 locale but has not found a way to convert case for non-ASCII characters, so toupper and tolower will convert only ASCII characters and leave any other characters in the string alone.
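
To illustrate what "arguments are taken as variable names" means in practice, here is a minimal plain POSIX shell sketch of the external-utility fallback. The function demo_toupper is hypothetical, handles ASCII letters only, and is not the modernish implementation.

```sh
# Plain POSIX sh illustration only; ASCII letters, via the external tr utility.
demo_toupper() {
	for demo_name do
		# Refuse anything that is not a valid variable name before using eval.
		case $demo_name in
		( '' | [0-9]* | *[!A-Za-z0-9_]* )
			echo "demo_toupper: invalid variable name: $demo_name" >&2
			return 2 ;;
		esac
		# Read the variable by name, convert, and assign the result back.
		# (Command substitution strips trailing newlines from the value.)
		eval "$demo_name=\$(printf '%s' \"\${$demo_name}\" | LC_ALL=C tr '[:lower:]' '[:upper:]')"
	done
}

greeting='hello world'
demo_toupper greeting	# note: no $ before the variable name
echo "$greeting"
```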

use var/string/trim

trim: strip whitespace from the beginning and end of a variable's value. Whitespace is defined by the [:space:] character class. In the POSIX locale, this is tab, newline, vertical tab, form feed, carriage return, and space, but in other locales it may be different. (On shells with BUG_NOCHCLASS, $WHITESPACE is used to define whitespace instead.) Optionally, a string of literal characters can be provided in the second argument. Any characters appearing in that string will then be trimmed instead of whitespace. Usage: trim varname [ characters ]
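
For readers curious how whitespace trimming can be done without external utilities, the following plain POSIX shell sketch uses nested parameter expansions. The function demo_trim is a hypothetical stand-in that covers only the default [:space:] case; it is not the modernish trim.

```sh
demo_trim() {
	# Assumes $1 is a valid variable name; copy its value to a work variable.
	eval "demo_val=\${$1}"
	# Strip the longest leading run of whitespace...
	demo_val=${demo_val#"${demo_val%%[![:space:]]*}"}
	# ...then the longest trailing run.
	demo_val=${demo_val%"${demo_val##*[![:space:]]}"}
	eval "$1=\$demo_val"
}

padded='   some text	'
demo_trim padded
echo "[$padded]"
```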

use var/string/replacein

replacein: Replace the leading, trailing (-t) or all (-a) occurrences of a string with another string in a variable.
Usage: replacein [ -t | -a ] varname oldstring newstring

use var/string/append

append and prepend: Append or prepend zero or more strings to a variable, separated by a string of zero or more characters, avoiding the hairy problem of dangling separators. Usage: append|prepend [ --sep=separator ] [ -Q ] varname [ string ... ]
If the separator is not specified, it defaults to a space character. If the -Q option is given, each string is shell-quoted before appending or prepending.
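
The "dangling separator" problem is easiest to see in code. The sketch below is plain POSIX shell with a hypothetical demo_append function (separator passed as a positional argument rather than with --sep); it adds the separator only when the variable already holds a value. It treats an empty variable like an unset one, which may differ from what modernish does.

```sh
demo_append() {
	# usage: demo_append varname separator [ string ... ]
	# Assumes varname is a valid variable name.
	demo_var=$1 demo_sep=$2
	shift 2
	for demo_str do
		# Insert the separator only if the variable already has a value.
		eval "$demo_var=\${$demo_var:+\${$demo_var}\$demo_sep}\$demo_str"
	done
}

unset -v list
demo_append list ', ' apples pears
demo_append list ', ' plums
echo "$list"
```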

use var/unexport

The unexport function clears the "export" bit of a variable, conserving its value, and/or assigns values to variables without setting the export bit. This works even if set -a (allexport) is active, allowing an "export all variables, except these" way of working.

Usage is like export, with the caveat that variable assignment arguments containing non-shell-safe characters or expansions must be quoted as appropriate, unlike in some specific shell implementations of export. (To get rid of that headache, use safe.)

Unlike export, unexport does not work for read-only variables.

use var/genoptparser

As the getopts builtin is not portable when used in functions, this module provides a command that generates modernish code to parse options for your shell function in a standards-compliant manner. The generated parser supports short-form (one-character) options which can be stacked/combined.

Usage: generateoptionparser [ -o ] [ -f func ] [ -v varprefix ] [ -n options ] [ -a options ] [ varname ]

At least one of -n and -a is required. All other arguments are optional. Option characters must be valid components of portable variable names, so they must be ASCII upper- or lowercase letters, digits, or the underscore.

generateoptionparser stores the generated parser code in a variable: either REPLY or the varname specified as the first non-option argument. This makes it possible to generate and use the parser on the fly with a command like eval "$REPLY" immediately following the generateoptionparser invocation.

For better efficiency and readability, it will often be preferable to insert the option parser code directly into your shell function instead. The -o option writes the parser code to standard output, so it can be redirected to a file, inserted into your editor, etc.

Parsed options are shifted out of the positional parameters while setting or unsetting corresponding variables, until a non-option argument, a -- end-of-options delimiter argument, or the end of arguments is encountered. Unlike with getopts, no additional shift command is required.

Each specified option gets a corresponding variable with a name consisting of the varprefix (default: opt_) plus the option character. If an option is not passed to your function, the parser unsets its variable; otherwise it sets it to either the empty value or its option-argument if it requires one. Thus, your function can check if any option x was given using isset, for example, if isset opt_x; then...
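
The conventions described above (options shifted out, one opt_ variable per option, unset when the option is absent) can be sketched by hand in plain POSIX shell. This is only an illustration of the semantics, not the code that generateoptionparser emits; demo_func is hypothetical, and stacked options such as -ao are deliberately not handled here.

```sh
demo_func() {
	# Sketch for options -a (no argument) and -o (takes an argument).
	unset -v opt_a opt_o
	while [ "$#" -gt 0 ]; do
		case $1 in
		( -- )	shift; break ;;
		( -a )	opt_a='' ;;
		( -o )	[ "$#" -ge 2 ] || { echo "demo_func: -o requires an argument" >&2; return 2; }
			opt_o=$2; shift ;;
		( -?* )	echo "demo_func: invalid option: $1" >&2; return 2 ;;
		( * )	break ;;
		esac
		shift
	done
	# ${opt_a+given} expands to "given" only if opt_a is set, even if empty.
	echo "a=${opt_a+given} o=${opt_o-unset} rest=$*"
}

demo_func -a -o out.txt file1 file2
demo_func file1
```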

use sys/base

Some very common and essential utilities are not specified by POSIX, differ widely among systems, and are not always available. For instance, the which and readlink commands have incompatible options on various GNU and BSD variants and may be absent on other Unix-like systems. The sys/base module provides a complete re-implementation of such non-standard but basic utilities, written as modernish shell functions. Using the modernish version of these utilities can help a script to be fully portable. These versions also have various enhancements over the GNU and BSD originals, some of which are made possible by their integration into the modernish shell environment.

use sys/base/mktemp

A cross-platform shell implementation of mktemp that aims to be just as safe as native mktemp(1) implementations, while avoiding the problem of having various mutually incompatible versions and adding several unique features of its own.

Creates one or more unique temporary files, directories or named pipes, atomically (i.e. avoiding race conditions) and with safe permissions. The path name(s) are stored in REPLY and optionally written to stdout.

Usage: mktemp [ -dFsQCt ] [ template ... ]

The template defaults to “/tmp/temp.”. A suffix of random shell-safe ASCII characters is added to the template to create the file. For compatibility with other mktemp implementations, any optional trailing X characters in the template are removed. The length of the suffix will be equal to the number of X characters removed, or 10, whichever is more. The longer the random suffix, the higher the security of using mktemp in a shared directory such as /tmp.

Since /tmp is a world-writable directory shared by other users, for best security it is recommended to create a private subdirectory using mktemp -d and work within that.

Option -C cannot be used without option -s when in a subshell. Modernish will detect this and treat it as a fatal error. The reason is that a typical command substitution like tmpfile=$(mktemp -C) is incompatible with auto-cleanup, as the cleanup EXIT trap would be triggered not upon exiting the program but upon exiting the command substitution subshell that just ran mktemp, thereby immediately undoing the creation of the file. Instead, do something like: mktemp -sC; tmpfile=$REPLY
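
The underlying shell behaviour can be reproduced without modernish. The sketch below assumes a native mktemp(1) is available and sets an ordinary EXIT trap inside a command substitution; because the substitution runs in a subshell, the cleanup fires as soon as the substitution ends.

```sh
# Plain sh illustration of the pitfall; uses the system's native mktemp.
tmpfile=$(
	f=$(mktemp) || exit
	trap 'rm -f "$f"' EXIT	# fires when this subshell exits
	printf '%s\n' "$f"
)

if [ -e "$tmpfile" ]; then
	echo "file still exists"
else
	echo "file was already removed by the subshell's EXIT trap"
fi
```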

This module depends on the trap stack to do auto-cleanup (the -C option), so it will automatically use var/stack/trap on initialisation.

use sys/base/readlink

readlink reads the target of a symbolic link, robustly handling strange filenames such as those containing newline characters. It stores the result in the REPLY variable and optionally writes it on standard output.

Usage: readlink [ -nsefmQ ] path [ path ... ]

The exit status of readlink is 0 on success and 1 if the path either is not a symlink, or could not be canonicalised according to the option given.

use sys/base/rev

rev copies the specified files to the standard output, reversing the order of characters in every line. If no files are specified, the standard input is read.

Usage: like rev on Linux and BSD, which is like cat except that - is a filename and does not denote standard input. No options are supported.

use sys/base/seq

A cross-platform implementation of seq that is more powerful and versatile than native GNU and BSD seq(1) implementations. The core is written in bc, the POSIX arbitrary-precision calculator language. That means this seq inherits the capacity to handle numbers with a precision and size only limited by computer memory, as well as the ability to handle input numbers in any base from 1 to 16 and produce output in any base 1 and up.

Usage: seq [ -w ] [ -L ] [ -f format ] [ -s string ] [ -S scale ] [ -B base ] [ -b base ] [ first [ incr ] ] last

seq prints a sequence of arbitrary-precision floating point numbers, one per line, from first (default 1), to as near last as possible, in increments of incr (default 1). If first is larger than last, the default incr is -1. An incr of zero is treated as a fatal error.

The -S, -B and -b options take shell integer numbers as operands. This means a leading 0X or 0x denotes a hexadecimal number and a leading 0 denotes an octal number.
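
As a reminder of how shell integer constants are read, plain shell arithmetic follows the same prefix rules:

```sh
echo "$((0x1F))"	# hexadecimal constant
echo "$((017))"	# octal constant (leading 0)
echo "$((17))"	# decimal constant
```

So an operand written as 010 is read as eight, not ten.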

For portability reasons, modernish seq uses a full stop (.) for the radix point, regardless of the system locale. This applies both to command arguments and to output. The -L option causes seq to use the current locale's radix point character for output only.

Differences with GNU and BSD seq

The -S, -B and -b options are modernish innovations. The -w, -f and -s options are inspired by GNU and BSD seq. The following differences apply:

The sys/base/seq module depends on, and automatically loads, var/string/touplow.

use sys/base/shuf

Shuffle lines of text. A portable reimplementation of a commonly used GNU utility.

Usage:

By default, shuf reads lines of text from standard input, or from file (the file - signifies standard input). It writes the input lines to standard output in random order.

Differences with GNU shuf:

use sys/base/tac

tac (the reverse of cat) is a cross-platform reimplementation of the GNU tac utility, with some extra features.

Usage: tac [ -rbBP ] [ -S separator ] file [ file ... ]

tac outputs the files in reverse order of lines/records. If file is - or is not given, tac reads from standard input.

Differences between GNU tac and modernish tac:

use sys/base/which

The modernish which utility finds external programs and reports their absolute paths, offering several unique options for reporting, formatting and robust processing. The default operation is similar to GNU which.

Usage: which [ -apqsnQ1f ] [ -P number ] program [ program ... ]

By default, which finds the first available path to each given program. If program is itself a path name (contains a slash), only that path's base directory is searched; if it is a simple command name, the current $PATH is searched. Any relative paths found are converted to absolute paths. Symbolic links are not followed. The first path found for each program is written to standard output (one per line), and a warning is written to standard error for every program not found. The exit status is 0 (success) if all programs were found, 1 otherwise.

which also leaves its output in the REPLY variable. This may be useful if you run which in the main shell environment. The REPLY value will not survive a command substitution subshell as in ls_path=$(which ls).

The following options modify the default behaviour described above:

use sys/base/yes

yes very quickly outputs infinite lines of text, each consisting of its space-separated arguments, until terminated by a signal or by a failure to write output. If no argument is given, the default line is y. No options are supported.

This infinite-output command is useful for piping into commands that need an indefinite input data stream, or to automate a command requiring interactive confirmation.

Modernish yes is like GNU yes in that it outputs all its arguments, whereas BSD yes only outputs the first. It can output multiple gigabytes per second on modern systems.

use sys/cmd

Modules in this category contain functions for enhancing the invocation of commands.

use sys/cmd/extern

extern is like command but always runs an external command, without having to know or determine its location. This provides an easy way to bypass a builtin, alias or function. It does the same $PATH search the shell normally does when running an external command. For instance, to guarantee running external printf just do: extern printf ...

Usage: extern [ -p ] [ -v ] [ -u varname ... ] [ varname=value ... ] command [ argument ... ]

use sys/cmd/harden

The harden function allows implementing emergency halt on error for any external commands and shell builtin utilities. It is modernish's replacement for set -e a.k.a. set -o errexit (which is fundamentally flawed, not supported and will break the library). It depends on, and auto-loads, the sys/cmd/extern module.

harden sets a shell function with the same name as the command hardened, so it can be used transparently. This function hardens the given command by checking its exit status against values indicating error or system failure. Exactly what exit statuses signify an error or failure depends on the command in question; this should be looked up in the POSIX specification (under "Utilities") or in the command's man page or other documentation.

If the command fails, the function installed by harden calls die, so it will reliably halt program execution, even if the failure occurred within a subshell.
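
The general idea of a same-purpose wrapper function that checks the exit status can be sketched in plain POSIX shell. This is only a rough illustration: demo_hardgrep is hypothetical, and a plain exit is used as a crude stand-in for die, which means that, unlike real hardening, it would not halt the main program if the failure happened inside a subshell.

```sh
demo_hardgrep() {
	grep "$@"
	demo_status=$?
	# For grep, 0 means a match and 1 means no match; only > 1 is an error.
	if [ "$demo_status" -gt 1 ]; then
		echo "demo_hardgrep: grep failed with status $demo_status" >&2
		exit 70	# crude stand-in for die; arbitrary status chosen for the demo
	fi
	return "$demo_status"
}

printf '%s\n' alpha beta | demo_hardgrep -c alp
printf '%s\n' alpha beta | demo_hardgrep -c zzz || echo "no match, but not fatal"
```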

Usage:

harden [ -f funcname ] [ -[cSpXtPE] ] [ -e testexpr ] [ var=value ... ] [ -u var ... ] command_name_or_path [ command_argument ... ]

The -f option hardens the command as the shell function funcname instead of defaulting to command_name_or_path as the function name. (If the latter is a path, that's always an invalid function name, so the use of -f is mandatory.) If command_name_or_path is itself a shell function, that function is bypassed and the builtin or external command by that name is hardened instead. If no such command is found, harden dies with the message that hardening shell functions is not supported. (Instead, you should invoke die directly from your shell function upon detecting a fatal error.)

The -c option causes command_name_or_path to be hardened and run immediately instead of setting a shell function for later use. This option is meant for commands that run once; it is not efficient for repeated use. It cannot be used together with the -f option.

The -S option allows specifying several possible names/paths for a command. It causes the command_name_or_path to be split by comma and interpreted as multiple names or paths to search. The first name or path found is used. Requires -f.

The -e option, which defaults to >0, indicates the exit statuses corresponding to a fatal error. It depends on the command what these are; consult the POSIX spec and the manual pages. The status test expression testexpr, argument to the -e option, is like a shell arithmetic expression, with the binary operators == != <= >= < > turned into unary operators referring to the exit status of the command in question. Assignment operators are disallowed. Everything else is the same, including && (logical and) and || (logical or) and parentheses. Note that the expression needs to be quoted as the characters used in it clash with shell grammar tokens.

The -X option causes harden to always search for and harden an external command, even if a built-in command by that name exists.

The -E option causes the hardening function to consider it a fatal error if the hardened command writes anything to the standard error stream. This option allows hardening commands (such as bc) where you can't rely on the exit status to detect an error. The text written to standard error is passed on as part of the error message printed by die. Note that:

The -p option causes harden to search for commands using the system default path (as obtained with getconf PATH) as opposed to the current $PATH. This ensures that you're using a known-good external command that came with your operating system. By default, the system-default PATH search only applies to the command itself, and not to any commands that the command may search for in turn. But if the -p option is specified at least twice, the command is run in a subshell with PATH exported as the default path, which is equivalent to adding a PATH=$DEFPATH assignment argument (see below).

Examples:

harden make                           # simple check for status > 0
harden -f tar '/usr/local/bin/gnutar' # id.; be sure to use this 'tar' version
harden -e '> 1' grep                  # for grep, status > 1 means error
harden -e '==1 || >2' gzip            # 1 and >2 are errors, but 2 isn't (see manual)

Important note on variable assignments

As far as the shell is concerned, hardened commands are shell functions and not external or builtin commands. This essentially changes one behaviour of the shell: variable assignments preceding the command will not be local to the command as usual, but will persist after the command completes. (POSIX technically makes that behaviour optional but all current shells behave the same in POSIX mode.)

For example, this means that something like

harden -e '>1' grep
# [...]
LC_ALL=C grep regex some_ascii_file.txt

should never be done, because the meant-to-be-temporary LC_ALL locale assignment will persist and is likely to cause problems further on.

To solve this problem, harden supports adding these assignments as part of the hardening command, so instead of the above you do:

harden -e '>1' LC_ALL=C grep
# [...]
grep regex some_ascii_file.txt

With the -u option, harden also supports unsetting variables for the duration of a command, e.g.:

harden -e '>1' -u LC_ALL grep

The -u option may be specified multiple times. It causes the hardened command to be invoked from a subshell with the specified variables unset.

Hardening while allowing for broken pipes

If you're piping a command's output into another command that may close the pipe before the first command is finished, you can use the -P option to allow for this:

harden -e '==1 || >2' -P gzip		# also tolerate gzip being killed by SIGPIPE
gzip -dc file.txt.gz | head -n 10	# show first 10 lines of decompressed file

head will close the pipe of gzip input after ten lines; the operating system kernel then kills gzip with the PIPE signal before it's finished, causing a particular exit status that is greater than 128. This exit status would normally make harden kill your entire program, which in the example above is clearly not the desired behaviour. If the exit status caused by a broken pipe were known, you could specifically allow for that exit status in the status expression. The trouble is that this exit status varies depending on the shell and the operating system. The -P option was made to solve this problem: it automatically detects and whitelists the correct exit status corresponding to SIGPIPE termination on the current system.

Tolerating SIGPIPE is an option and not the default, because in many contexts it may be entirely unexpected and a symptom of a severe error if a command is killed by a broken pipe. It is up to the programmer to decide which commands should expect SIGPIPE and which shouldn't.
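
To see which exit status a broken pipe produces on a given shell and system, the following plain POSIX shell sketch can be used. It relies on the native yes and head utilities and passes the status of the first pipeline element out through file descriptor 3. The value printed is often 141 (128 + 13), but, as noted above, it varies, so no particular number should be relied upon.

```sh
# Capture the exit status of the writer in a pipeline whose reader quits early.
status=$( { { yes 2>/dev/null; echo "$?" >&3; } | head -n 1 >/dev/null; } 3>&1 )
echo "yes exited with status $status"
```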

Tip: It could happen that the same command should expect SIGPIPE in one context but not another. You can create two hardened versions of the same command, one that tolerates SIGPIPE and one that doesn't. For example:

harden -f hardGrep -e '>1' grep		# hardGrep does not tolerate being aborted
harden -f pipeGrep -e '>1' -P grep	# pipeGrep for use in pipes that may break

Note: If SIGPIPE was set to ignore by the process invoking the current shell, the -P option has no effect, because no process or subprocess of the current shell can ever be killed by SIGPIPE. However, this may cause various other problems and you may want to refuse to let your program run under that condition. thisshellhas WRN_NOSIGPIPE can help you easily detect that condition so your program can make a decision. See the WRN_NOSIGPIPE description for more information.

Tracing the execution of hardened commands

The -t option traces command execution. Each execution of a command hardened with -t causes the command line to be output to standard error, in the following format:

[functionname]> commandline

where functionname is the name of the shell function used to harden the command and commandline is the actual command executed. The commandline is properly shell-quoted in a format suitable for re-entry into the shell; however, command lines longer than 512 bytes will be truncated and the unquoted string (TRUNCATED) will be appended to the trace. If standard error is on a terminal that supports ANSI colours, the tracing output will be colourised.

The -t option was added to harden because the commands that you harden are often the same ones you would be particularly interested in tracing. The advantage of using harden -t over the shell's builtin tracing facility (set -x or set -o xtrace) is that the output is a lot less noisy, especially when using a shell library such as modernish.

Note: Internally, -t uses the shell file descriptor 9, redirecting it to standard error (using exec 9>&2). This allows tracing to continue to work normally even for commands that redirect standard error to a file (which is another enhancement over set -x on most shells). However, this does mean harden -t conflicts with any other use of the file descriptor 9 in your shell program.

If file descriptor 9 is already open before harden is called, harden does not attempt to override this. This means tracing may be redirected elsewhere by doing something like exec 9>trace.out before calling harden. (Note that redirecting FD 9 on the harden command itself will not work as it won't survive the run of the command.)
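
The effect of tracing through a dedicated file descriptor can be demonstrated in plain POSIX shell. The demo_trace function below is a hypothetical, much simplified stand-in for harden -t: it does no hardening, no shell-quoting and no truncation, and only shows why the trace survives a redirection of standard error.

```sh
# Duplicate the current standard error onto file descriptor 9 once.
exec 9>&2

demo_trace() {
	# Write the trace line to FD 9, then run the command unchanged.
	printf '[%s]> %s\n' "$1" "$*" >&9
	"$@"
}

# The command's own standard error is discarded, but the trace line still
# reaches the original standard error through FD 9.
demo_trace ls /nonexistent/demo/path 2>/dev/null || echo "ls failed"
```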

Simple tracing of commands

Sometimes you just want to trace the execution of some specific commands as in harden -t (see above) without actually hardening them against command errors; you might prefer to do your own error handling. trace makes this easy. It is modernish's replacement or complement for set -x a.k.a. set -o xtrace. Unlike harden -t, it can also trace shell functions.

Usage 1: trace [ -f funcname ] [ -[cSpXE] ] [ var=value ... ] [ -u var ... ] command_name_or_path [ command_argument ... ]

For non-function commands, trace acts as a shortcut for harden -t -P -e '>125 && !=255' command_name_or_path. Any further options and arguments are passed on to harden as given. The result is that the indicated command is automatically traced upon execution. A bonus is that you still get minimal hardening against fatal system errors. Errors in the traced command itself are ignored, but your program is immediately halted with an informative error message if the traced command:

Note: The caveat for command-local variable assignments for harden also applies to trace. See Important note on variable assignments above.

Usage 2: [ #! ] trace -f funcname

If no further arguments are given, trace -f will trace the shell function funcname without applying further hardening (except against nonexistence). trace -f can be used to trace the execution of modernish library functions as well as your own script's functions. The trace output for shell functions shows an extra () following the function name.

Internally, this involves setting an alias under the function's name, so the limitations of the shell's alias expansion mechanism apply: only function calls that the shell had not yet parsed before calling trace -f will be traced. So you should use trace -f at the beginning of your script, before defining your own functions. To facilitate this, trace -f does not check that the function funcname exists while setting up tracing, but only when attempting to execute the traced function.

In portable-form modernish scripts, trace -f should be used as a hashbang command to be compatible with alias expansion on all shells. Only the trace -f form may be used that way. For example:

#! /usr/bin/env modernish
#! use safe -k
#! use sys/cmd/harden
#! trace -f push
#! trace -f pop
...your program begins here...

use sys/cmd/mapr

mapr (map records) is an alternative to xargs that shares features with the mapfile command in bash 4.x. It is fully integrated into your script's main shell environment, so it can call your shell functions as well as builtin and external utilities. It depends on, and auto-loads, the sys/cmd/procsubst module.

Usage: mapr [ -d delimiter | -P ] [ -s count ] [ -n number ] [ -m length ] [ -c quantum ] callback

mapr reads delimited records from the standard input, invoking the specified callback command once or repeatedly as needed, with batches of input records as arguments. The callback may consist of multiple arguments. By default, an input record is one line of text.

Options:

Arguments:

Differences from mapfile

mapr was inspired by the bash 4.x builtin command mapfile a.k.a. readarray, and uses similar options, but there are important differences.

Differences from xargs

mapr shares important characteristics with xargs while avoiding its myriad pitfalls.

use sys/cmd/procsubst

This module provides a portable process substitution construct, the advantage being that this is not limited to bash, ksh or zsh but works on all POSIX shells capable of running modernish. It is not possible for modernish to introduce the original ksh syntax into other shells. Instead, this module provides a % command for use within a $(command substitution).

The % command takes one simple command as its arguments, executes it in the background, and writes a file name from which to read its output. So if % is used within a command substitution as intended, that file name is passed on to the invoking command as an argument.

The % command supports one option, -o. If that option is given, then it is expected that, instead of reading input, the invoking command writes output to the file name passed on to it, so that the command invoked by % -o can read that data from its standard input.

<table> <caption>Example syntax comparison:</caption> <tr> <th>ksh/bash/zsh</th><th>modernish</th> </tr> <tr> <td valign="top"> <code>diff -u <(ls) <(ls -a)</code> </td> <td> <code>diff -u $(% ls) $(% ls -a)</code> </td> </tr> <tr> <td valign="top"> <code>IFS=' ' read -r user vsz args < <(ps -o 'user= vsz= args=' -p $$)</code> </td> <td> <code>IFS=' ' read -r user vsz args < $(% ps -o 'user= vsz= args=' -p $$)</code> </td> </tr> <tr> <td valign="top"> <code>{ some commands; } > >(tee stdout.log) 2> >(tee stderr.log)</code> <br/><small>(both `tee` commands write terminal output to standard output)</small> </td> <td> <code>{ some commands; } > $(% -o tee stdout.log) 2> $(% -o tee stderr.log)</code> <br/><small>(both `tee` commands write terminal output to standard error)</small> </td> </tr> </table>

Unlike the bash/ksh/zsh version, modernish process substitution only works with simple commands. This includes shell function calls, but not aliases or anything involving shell grammar or reserved words (such as redirections, pipelines, loops, etc.). To use such complex commands, enclose them in a shell function and call that function from the process substitution.

Also note that anything that a command invoked by the % -o writes to its standard output is redirected to standard error. The main shell environment's standard output is not available because the command substitution subsumes it.
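
The basic idea of handing a command a file name from which it can read another command's output can be sketched in plain POSIX shell with a named pipe. This illustrates the general technique only; whether modernish does it exactly this way is an assumption, and the sketch relies on a native mktemp -d being available.

```sh
demo_dir=$(mktemp -d) || exit
demo_fifo=$demo_dir/fifo
mkfifo "$demo_fifo" || exit

# Producer: runs in the background and writes into the FIFO.
printf '%s\n' pear apple plum | sort > "$demo_fifo" &

# Consumer: is simply handed a file name to read from.
result=$(cat "$demo_fifo")
wait
rm -r "$demo_dir"
echo "$result"
```

The % command takes care of this kind of bookkeeping itself, so the calling script never has to create or remove anything by hand.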

use sys/cmd/source

The source command sources a dot script like the . command, but additionally supports passing arguments to sourced scripts like you would pass them to a function. It mostly mimics the behaviour of the source command built in to bash and zsh.

If a filename without a directory path is given, then, unlike the . command, source looks for the dot script in the current directory by default, as well as searching $PATH.

It is a fatal error to attempt to source a directory, a file with no read permission, or a nonexistent file.

use sys/dir

Functions for working with directories.

use sys/dir/countfiles

countfiles: Count the files in a directory using nothing but shell functionality, so without external commands. (It's amazing how many pitfalls this has, so a library function is needed to do it robustly.)

Usage: countfiles [ -s ] directory [ globpattern ... ]

Count the number of files in a directory, storing the number in REPLY and (unless -s is given) printing it to standard output. If any globpatterns are given, only count the files matching them.
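
One of the pitfalls alluded to above is that an unmatched glob pattern is left in place as a literal string, so a naive loop counts one file in an empty directory. A plain POSIX shell sketch that guards against this follows; demo_countfiles is hypothetical, counts only names matched by * (so no hidden files), and is far less thorough than the real function.

```sh
demo_countfiles() {
	# usage: demo_countfiles directory
	demo_n=0
	for demo_f in "$1"/*; do
		# An unmatched pattern stays literal, so verify that each name exists
		# (-L also catches dangling symlinks, for which -e is false).
		if [ -e "$demo_f" ] || [ -L "$demo_f" ]; then
			demo_n=$((demo_n + 1))
		fi
	done
	echo "$demo_n"
}

demo_countfiles /
```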

use sys/dir/mkcd

The mkcd function makes one or more directories, then, upon success, changes into the last-mentioned one. mkcd inherits mkdir's usage, so options depend on your system's mkdir; only the POSIX options are guaranteed. When mkcd is run from a script, it uses cd -P to change the working directory, resolving any symlinks in the present working directory path.

use sys/term

Utilities for working with the terminal.

use sys/term/putr

This module provides commands to efficiently output a string repeatedly.

Usage:

Output the string number times. When using putrln, add a newline at the end.

If a - is given instead of a number, the string is repeated as many times as fit on one terminal line: the line length of the terminal divided by the length of the string, rounded down.

Note that, unlike with put and putln, only a single string argument is accepted.

Example: putrln - '=' prints a full terminal line of equals signs.
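
The repetition arithmetic can be sketched in plain POSIX shell. The function demo_putr is a hypothetical, unoptimised stand-in, and a width of 80 columns is assumed instead of querying the terminal.

```sh
demo_putr() {
	# usage: demo_putr number string
	demo_i=0
	while [ "$demo_i" -lt "$1" ]; do
		printf '%s' "$2"
		demo_i=$((demo_i + 1))
	done
}

demo_putr 3 'ab'; echo

# For the "-" case, the repetition count is derived from the line length;
# 80 columns is an assumption made for this demo.
demo_cols=80 demo_str='=-'
demo_putr "$((demo_cols / ${#demo_str}))" "$demo_str"; echo
```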

use sys/term/readkey

readkey: read a single character from the keyboard without echoing back to the terminal. Buffering is done so that multiple waiting characters are read one at a time.

Usage: readkey [ -E ERE ] [ -t timeout ] [ -r ] [ varname ]

-E: Only accept characters that match the extended regular expression ERE (the type of RE used by grep -E/egrep). readkey will silently ignore input not matching the ERE and wait for input matching it.

-t: Specify a timeout in seconds (one significant digit after the decimal point). After the timeout expires, no character is read and readkey returns status 1.

-r: Raw mode. Disables INTR (Ctrl+C), QUIT, and SUSP (Ctrl+Z) processing as well as translation of carriage return (13) to linefeed (10).

The character read is stored into the variable referenced by varname, which defaults to REPLY if not specified.

This module depends on the trap stack to save and restore the terminal state if the program is stopped while reading a key, so it will automatically use var/stack/trap on initialisation.


Appendix A: List of shell cap IDs

This appendix lists all the shell capabilities, quirks, and bugs that modernish can detect in the current shell, so that modernish scripts can easily query the results of these tests and decide what to do. Certain problematic system conditions are also detected this way and listed here.

The all-caps IDs below are all usable with the thisshellhas function. This makes it easy for a cross-platform modernish script to be aware of relevant conditions and decide what to do.

Each detection test has its own little test script in the lib/modernish/cap directory. These tests are executed on demand, the first time the capability or bug in question is queried using thisshellhas. See README.md in that directory for further information. The test scripts also document themselves in the comments.

Capabilities

Modernish currently identifies and supports the following non-standard shell capabilities:

Quirks

Modernish currently identifies and supports the following shell quirks:

Bugs

Modernish currently identifies and supports the following shell bugs:

Warning IDs

Warning IDs do not identify any characteristic of the shell, but instead warn about a potentially problematic system condition that was detected at initialisation time.

Appendix B: Regression test suite

Modernish comes with a suite of regression tests to detect bugs in modernish itself, which can be run using modernish --test after installation. By default, it will run all the tests verbosely but without tracing the command execution. The install.sh installer will run modernish --test -eqq on the selected shell before installation.

A few options are available to specify after --test:

These short options can be combined so, for example, --test -qxx is the same as --test -q -x -x.

Difference between capability detection and regression tests

Note the difference between these regression tests and the cap tests listed in Appendix A. The latter are tests for whatever shell is executing modernish: they detect capabilities (features, quirks, bugs) of the current shell. They are meant to be run via thisshellhas and are designed to be taken advantage of in scripts. On the other hand, the tests run by modernish --test are regression tests for modernish itself; it does not make sense to use them in a script.

New/unknown shell bugs can still cause modernish regression tests to fail, of course. That's why some of the regression tests also check for consistency with the results of the capability detection tests: if there is a shell bug in a widespread release version that modernish doesn't know about yet, this in turn is considered to be a bug in modernish, because one of its goals is to know about all the shell bugs in all released shell versions currently seeing significant use.

Testing modernish on all your shells

The testshells.sh program in share/doc/modernish/examples can be used to run the regression test suite on all the shells installed on your system. You could put it as testshells in some convenient location in your $PATH, and then simply run:

testshells modernish --test

(adding any further options you like – for instance, you might like to add -q to avoid very long terminal output). On first run, testshells will generate a list of shells it can find on your system and it will give you a chance to edit it before proceeding.

Appendix C: Supported locales

Modernish, like most shells, fully supports two system locales: POSIX (a.k.a. C, a.k.a. ASCII) and Unicode's UTF-8. It will work in other locales, but things like converting to upper/lower case, and matching single characters in patterns, are not guaranteed.

Caveat: some shells or operating systems have bugs that prevent (or lack features required for) full locale support. If portability is a concern, check for thisshellhas WRN_MULTIBYTE or thisshellhas BUG_NOCHCLASS where needed. See Appendix A.
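A script that depends on locale-sensitive behaviour could run both checks up front. In this sketch, report_locale_caveats and its messages are hypothetical; only thisshellhas and the two IDs come from this document. The function must be called after modernish has been initialised, since modernish provides thisshellhas.

```sh
# Hypothetical helper; call it only after initialising modernish.
# See Appendix A for what each ID means.
report_locale_caveats() {
	_caveats=0
	if thisshellhas WRN_MULTIBYTE; then
		echo "caveat: WRN_MULTIBYTE detected" >&2
		_caveats=$(( _caveats + 1 ))
	fi
	if thisshellhas BUG_NOCHCLASS; then
		echo "caveat: BUG_NOCHCLASS detected" >&2
		_caveats=$(( _caveats + 1 ))
	fi
	# Succeed only if neither condition was detected.
	[ "$_caveats" -eq 0 ]
}
```

A script might then use report_locale_caveats || exit to refuse to run, or just continue with the warnings printed.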

Scripts/programs should not change the locale (LC_* or LANG) after initialising modernish. Doing this might break various functions, as modernish sets specific versions depending on your OS, shell and locale. (Temporarily changing the locale is fine as long as you don't use modernish features that depend on it – for example, setting a specific locale just for an external command. However, if you use harden, see the important note in its documentation!)
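The "just for an external command" case relies on standard shell behaviour: an assignment prefixed to an external command affects only that command's environment. A minimal sketch, with a hypothetical function name:

```sh
# Sort in byte order without touching the script's own locale.
# (sort_bytewise is a hypothetical name; sort is an external command.)
sort_bytewise() {
	LC_ALL=C sort "$@"
}
```

For instance, printf 'b\nB\na\n' | sort_bytewise outputs B, a, b in that order, because in the C locale upper case letters sort before lower case ones. The locale variables of the script itself are left as they were.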

Appendix D: Supported shells

Modernish builds on the POSIX 2018 Edition standard, so it should run on any sufficiently POSIX-compliant shell and operating system. It uses both bug/feature detection and regression testing to determine whether it can run on any particular shell, so it does not block or support particular shell versions as such. However, modernish has been confirmed to run correctly on the following shells:

Currently known not to run modernish due to excessive bugs:

Appendix E: zsh: integration with native scripts

This appendix is specific to zsh.

While modernish duplicates some functionality already available natively on zsh, it still has plenty to add. However, writing a normal simple-form modernish script turns emulate sh on for the entire script, so you lose important aspects of the zsh language.

But there is another way – modernish functionality may be integrated with native zsh scripts using 'sticky emulation', as follows:

emulate -R sh -c '. modernish'

This causes modernish functions to run in sh mode while your script will still run in native zsh mode with all its advantages. The following notes apply:

See man zshbuiltins under emulate, option -c, for more information.

Appendix F: Bundling modernish with your script

The modernish installer install.sh can bundle one or more scripts with a stripped-down version of the modernish library. This allows the bundled scripts to run with a known version of modernish, whether or not modernish is installed on the user's system. Like modernish itself, bundling is cross-platform and portable (or as portable as your script is).

Bundled scripts are not modified. Instead, for each script, a wrapper script is installed under the same name in the installation root directory. This wrapper automatically looks for a suitable POSIX-compliant shell that passes the modernish battery of fatal bug tests, then sets up the environment to run the real script with modernish on that shell. Your modernish script can be run through the supplied wrapper script from any directory location on any POSIX-compliant operating system, as long as all files remain in the same location relative to each other.

Bundling is always a non-interactive installer operation, with options specified on the command line. The installer usage for bundling is as follows:

install.sh -B -D rootdir [ -d subdir ] [ -s shell ] scriptfile [ scriptfile ... ]

The -B option enables bundling mode. The option does not itself take an option-argument. Instead, any number of scriptfiles to bundle can be given as arguments following all other options. All scripts are bundled with a single copy of modernish. The bundling operation does not deal with any auxiliary files the scripts may require (other than modernish modules); any such need to be added manually after bundling is complete.

The -D option specifies the path to the bundled installation's root directory, where wrapper scripts are installed. This option is mandatory. If the directory doesn't exist, it is created.

The -d option specifies the subdirectory of the -D root directory where the bundled scripts and modernish are installed. It can contain slashes to install the bundle at a deeper directory level. The default subdirectory is bndl. The option-argument can be empty or /, in which case the bundle is installed directly into the installation root directory.
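For illustration, a bundling run that uses both options might look like the sketch below. The directory and script names are hypothetical, and the function is only a convenient wrapper around the command. It assumes the current directory contains install.sh and the two scripts.

```sh
# Hypothetical names throughout: /opt/myapp, lib/bundle, myscript, mytool.
bundle_myapp() {
	# Wrapper scripts go to /opt/myapp (-D); the scripts themselves and the
	# bundled modernish go to /opt/myapp/lib/bundle (-d).
	./install.sh -B -D /opt/myapp -d lib/bundle myscript mytool
}
```

Going by the description above, users would then start the programs through the wrappers /opt/myapp/myscript and /opt/myapp/mytool.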

The -s option specifies a preferred shell for the bundled scripts. A shell name or a full path to a shell can be given. Wrapper scripts try the full path first (if any), then try to find a shell with its basename, and then try to find a shell with that basename minus any version number (e.g. bash instead of bash-5.0 or ksh instead of ksh93). If all that doesn't produce a shell that passes the fatal bug tests, the wrapper script continues with the normal shell search.

This means the script won't fail to launch if the preferred shell can't be found. Instead, it is up to the script itself to refuse to run if required shell-specific conditions are not met. Scripts should use the thisshellhas function to check for any nonstandard capabilities required, or any bugs or quirks that the script is incompatible with (or indeed requires!).
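The order in which a preferred shell is tried can be sketched as follows. This is an illustration of the search order described for the -s option, not the actual wrapper code: preferred_shell_candidates is a hypothetical helper, and the pattern used to recognise a version number is an assumption.

```sh
# Hypothetical helper: print the names that would be tried, in order,
# for a preferred shell given as "$1". Not the actual wrapper code.
preferred_shell_candidates() {
	_pref=$1
	# 1. The full path, if one was given.
	case $_pref in
	*/*)	printf '%s\n' "$_pref" ;;
	esac
	# 2. The basename.
	_base=${_pref##*/}
	printf '%s\n' "$_base"
	# 3. The basename minus a trailing version number (assumed here to be
	#    digits and dots, optionally preceded by a hyphen).
	_bare=$(printf '%s\n' "$_base" | sed 's/-\{0,1\}[0-9][0-9.]*$//')
	if [ -n "$_bare" ] && [ "$_bare" != "$_base" ]; then
		printf '%s\n' "$_bare"
	fi
}
```

For /usr/local/bin/bash-5.0 this lists the full path, then bash-5.0, then bash. For ksh93 it lists ksh93, then ksh.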

Bundling is supported for both portable-form and simple-form modernish scripts. The installer automatically adapts the wrapper scripts to the form used. For simple-form scripts, the directory containing the bundled modernish core library (by default, .../bndl/bin/modernish) is prefixed to $PATH so that . modernish works. Since simple-form scripts are often more shell-specific, you may want to specify a preferred shell with the -s option.

To save space, the bundled copy of the modernish library is reduced such that all comments are stripped from the code, interactive use is not supported, the regression test suite is not included, thisshellhas does not have the --cache and --show operators, and the cap/*.t capability detection scripts are "statically linked" (directly included) into bin/modernish instead of shipped as separate files. A README.modernish file is added with a short explanation, the licence, and a link for people to get the complete version of modernish. Please do not remove this when distributing bundled scripts.


EOF