% pickle(1) | A Small TCL like interpreter

NAME

PICKLE - A Small and Embeddable TCL like interpreter and library

SYNOPSES

pickle files...

pickle

DESCRIPTION

Author:     Richard James Howe / Salvatore Sanfilippo
License:    BSD
Repository: <https://github.com/howerj/pickle>
Email:      howe.r.j.89@gmail.com
Copyright:  2007-2016 Salvatore Sanfilippo
Copyright:  2018-2020 Richard James Howe

This is a copy, and modification, of a small interpreter for a small TCL like language, written by Antirez in about 500 lines of C. The blog post describing this interpreter can be found at http://oldblog.antirez.com/post/picol.html, along with the code itself at http://antirez.com/picol/picol.c.txt. It does a surprising amount for such a small amount of code. This project is a little bit bigger than the original at around 6000 lines.

LICENSE

The files pickle.c and pickle.h are licensed under the 2 clause BSD License, as are all the other files in this project.

BUILDING

To build you will need a C compiler and Make.

Type 'make' to build the executable 'pickle' (or 'pickle.exe' on Windows). To run type 'make run', which will drop you into a pickle shell. 'make test' will run the built in unit tests and the unit tests in shell.

RUNNING

To run the project you will need to build it. The default makefile target will do this, type:

make

Or:

make pickle

This will build the pickle library and then link this library with an example program contained in main.c. This example program is very simple and adds a few commands to the interpreter that do not exist in the library ("gets", "puts", "getenv", "exit", "source", "clock" and "heap"). This is the minimal set of commands needed to get a usable shell up and running and to do performance optimization.

The executable 'pickle' that is built is quite simple: it executes all arguments given to it on the command line as scripts. There are no options, flags or detection of an interactive session with isatty. This makes usage of the interpreter in interactive sessions challenging; instead, the language itself can be used to define a shell and process command line arguments. This is done in a program called 'shell'. As mentioned it contains the unit tests for the project, as well as other subprograms, and most importantly it contains the interactive shell that reads a line at a time and prints the result.

The code in shell is quite large, if you do not want to use it an incredibly minimal shell can be defined with the code:

#!./pickle

set r 0
while { } {
	puts -nonewline "pickle> "
	set r [catch [list eval [gets]] v]
	puts "\[$r\] $v"
}

For those experienced with TCL some differences in the 'while' and 'gets' command should be apparent.

The Picol Language

The internals of the interpreter do not deviate much from the original interpreter, so the document on the picol language still applies. The language is like a simplified version of TCL, where everything is a command and the primary data structure is the string.

Some programmers seem to have an obsessive interest in their language of choice, do not become one of those programmers. This language, like any other language, will not solve all of your problems and may be entirely unsuitable for the task you want to achieve. It is up to you to evaluate whether this language, and implementation of it, is suitable.

Language and implementation advantages:

Disadvantages:

Potential Improvements:

The interpreter is fairly small, on my 64-bit x86 machine the interpreter weighs in at 100KiB (stripped, dynamically linked to C library, Linux ELF). The interpreter can be configured so that it is even smaller, for example:

On Debian Linux, x86_64, dynamically linking against glibc, using
gcc version 8.3.0, with version 4.1.4 of the interpreter:
Size    Options/Notes
100KiB  Normal target, optimized for speed (-O2).
84KiB   No debugging (-DNDEBUG), optimized for speed (-O2).
54KiB   No debugging (-DNDEBUG), 32-bit target (-m32), optimized
        for size (-Os), stripped, No features disabled.
34KiB   No debugging, 32-bit target, optimized for size, stripped,
        with as many features disabled as possible.

On Debian Linux, x86_64, statically linked against musl C library:
147KiB  Normal target, optimized for speed, statically linked.

This is still larger than I would like it to be, the original picol interpreter in the smallest configuration (32 bit target, optimized for size), comes in at 18KiB.

The language itself

Picol, and TCL, are dynamic languages with only one real data type, the string. This might seem inefficient but it is fine for a glue language whose main purpose is to bind lots of things written in C together. It is similar to Lisp in some ways: it is homoiconic, and it is simple with very little in the way of syntax.

The following table sums up the different language constructs:

string  called if first argument
{ }     quote, used to prevent evaluation
[ ]     command substitution
" "     string
$var    variable lookup
\c      escape a character
#       comment
;       terminates a command

A Picol program consists of a series of commands and arguments to those commands. Before a command is evaluated, variables are looked up and strings substituted.

You may have noticed that things such as 'if' or 'while', and even procedure definition, are not part of the language's syntax. Instead, they are built in commands and are called like any other command.

Examples of commands:

puts "Hello, World"
"puts" "Hello, World"

# prints "Hello, World"
set cmd puts
$cmd "Hello, World"

# prints "Hello, World"
set a pu
set b ts
$a$b "Hello, World"

+ 2 2
- 4 5
if {bool 4} { puts "TRUE"}

proc x {a b} { + $a $b }
puts "x(3, 9) == [x 3 9]"

# prints 4 to 10 inclusive
set z 3
while {< $z 10} { set z [+ $z 1]; puts $z }

To best understand the language, play around with it, and look at the source, there really is not that much there.

Internally Defined Commands

Picol defines the commands in this section internally, in a default build all of the commands in this section will be available. There are some build options to remove some commands (such as the string function, the math functions, and the list functions).

The options passed to the command and their types are indicated after the command. A question mark suffix on an argument indicates an optional argument, an ellipsis indicates an optional series of arguments.

For some concrete examples of commands being run, see unit.tcl, which contains unit tests for the project.

'argv' is a variable, not a function, which should contain the arguments passed to the pickle interpreter on the command line in a TCL list.

Create a variable, or overwrite an existing variable, with a value. If only one argument is given, it returns the value of that variable if it exists or an error if it does not.

if is the command used to implement conditional execution of either one clause, or one clause or (exclusive or) another clause. Like in every other programming language ever (or more accurately the languages with more than one user, the implementer).

Keep executing the while clause whilst the condition is true (ie. is non-zero).

Break out of a while loop. This will continue to break out of things until the return code is caught by a loop, or 'catch'.

Desist from executing the rest of the clause in a while loop, and go back to testing the condition.

Create a new command with the name 'identifier', or function if you prefer, with the arguments in 'argument list', and code to be executed in the 'function body'. If the final command is not a 'return' then the result of the last command is used.

There is a special case whereby the last argument in the argument list is called 'args', if this is the case then the remaining arguments are concatenated together and passed in to the function body. This allows variadic functions to be created.

Optionally return a string, optionally with an internal number that can affect control flow.

Evaluate the 'strings...' in the scope indicated by 'number'. A special case is '#0', which is the global context. The strings are concatenated together as if they had been run through the 'concat' command. A scope of 0 is the current scope, of 1, the caller, of 2, the caller's caller, and so on. A '#' prefix is meant to reverse the search and start from the global scope and work down through the call stack, however only '#0' is supported.

Form a link from myVar to otherVar in the scope specified by number. A special case is '#0', which is the global context, see 'uplevel' for a description of the scoping traversal rules implied by the number argument.

You may have noticed that 'upvar' and 'uplevel', which come from TCL, are strange, very strange. No arguments from me.

Unset a variable, removing it from the current scope.

Concatenate a list of strings with a space in-between them, as with 'concat', then evaluate the string, returning the result of the evaluation.

Applies an argument list to a function body, substituting the provided arguments into the variables.

Examples:

# Returns 4
apply {{x} {* $x $x}} 2
# Returns 7
apply {{x y} {+ $x $y}} 3 4

It essentially allows for anonymous functions to be made.

The following mathematical operations are defined:

'+', '-', '*', '/', 'mod', '<', '<=', '>', '>=', '==', '!=', 'min', 'max', 'pow', and 'log'. It should be obvious what each one does.

It should be noted that because all variables are stored internally as strings, mathematical operations are egregiously slow. Strings are first converted to numbers, the operation is performed, then the result is converted back to a string. There are also some bitwise operations; 'lshift', 'rshift', 'and', 'or', 'xor'. These mathematical operations can accept a list of integers. '&', '|' and '^' are aliases for 'and', 'or' and 'xor' respectively. '&&' and '||' implement logical 'and' and 'or', but all arguments are evaluated, and it is not a bug!

There are also the following unary mathematical operators defined: 'not'/'!' (logical negation), 'invert'/'~' (bitwise inversion), 'abs' (absolute value), 'bool' (turn number into a boolean 0 or 1), 'negate' (negate a number). '-' is not defined as negate, as that symbol is already used for subtraction.

Number conversion is strict: an invalid number will not be silently converted into a zero, and a string containing a part of a number will not become that number. For example "0", "-1" and "12" are valid numbers, whilst "0a", "x", "--2" and "22x" are not.
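The strictness rule can be illustrated with a small stand-alone C function. This is only a sketch of the idea, not the routine used in pickle.c: the name 'strict_number' is hypothetical, the sketch assumes a leading '+' is rejected (the text above does not say), and it refuses values that would overflow a 'long'.

```c
#include <assert.h>
#include <limits.h>

/* Hypothetical helper, not part of pickle.h. Returns 0 and stores the value
 * in *out if 's' is entirely a decimal integer (optional leading '-', then
 * digits only), otherwise returns -1 and leaves *out untouched. */
int strict_number(const char *s, long *out) {
	assert(s && out);
	int negative = 0;
	if (*s == '-') {
		negative = 1;
		s++;
	}
	if (*s == '\0')
		return -1; /* empty string, or a lone sign */
	long n = 0;
	for (; *s; s++) {
		if (*s < '0' || *s > '9')
			return -1; /* junk such as "22x", "0a" or a second '-' */
		const int digit = *s - '0';
		if (n > (LONG_MAX - digit) / 10)
			return -1; /* would overflow a long */
		n = n * 10 + digit;
	}
	*out = negative ? -n : n;
	return 0;
}
```

A caller would check the return value before using the number, which is what makes the conversion strict: there is no default of zero to fall back on.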

This allows arbitrary codes to be caught, 'catch' evaluates an expression and puts the return code into 'varname', the string returned is the result of the evaluation of 'expr'.

This function is used to inspect the currently defined commands in the system.

If no arguments are given then the number of commands defined is returned. If an item is given, a number indicates which command it applies to. Commands are indexed by numbers. Defining a new command may change the index of other commands. Commands are either user defined or built in commands.

Given a TCL list, 'join' will flatten that list and return a string by inserting a String in-between its elements. For example "join {a b c} ," yields "a,b,c".

'conjoin' works the same as 'join' except instead of a list it joins its arguments, for example:

join {a b c} ,
conjoin , a b c

Are equivalent.

Implements a for loop.

Rename a function to 'new-name', this will fail if the function does not exist or a function by the same name exists for the name we are trying to rename to. A special case exists when the new-name is an empty string, the function gets deleted.

Get the length of a list. A TCL list consists of a specially formatted string argument, each element of that list is separated by either space or is a string or quote. For example the following lists each contain three elements:

"a b c"
"a { b } c"
"a \" b \" c"

The list is the basic higher level data structure in Pickle, and as you can see, there is nothing special about them. They are just strings treated in a special way. Processing these lists is incredibly inefficient as everything is stored as a string, so a list needs to be parsed before it can be manipulated at all. This applies to all of the list functions. A more efficient, non-TCL compatible, set of list functions could be designed, or the internals of the library could be changed so they are more complex (which would help speed up the mathematical functions), but either option is undesirable for different reasons.

See 'llength'.

Index into a list, retrieving an element from that list. Indexing starts at zero, the first element being the zeroth element.

Repeat a string a number of times to form a list.

Examples:

pickle> lrepeat 3 abc
abc abc abc
pickle> lrepeat 2 {x x}
{x x} {x x}

Look up a variable containing a list and set the element specified by an index to be equal to 'value'.

Insert a value into a list at a specified index, indices less than zero are treated as zero and greater than the last element are appended to the end of the list.

Replace ranges of elements within a list, the function has a number of special cases.

This command sorts a list, it uses insertion sort internally and lacks many of the options of the full command. It does implement the following options:

Sort the list in increasing order.

Sort the list in decreasing order.

The list is a series of strings that should be sorted in ASCII order.

The list is a series of numbers that should be sorted numerically.
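For readers unfamiliar with the insertion sort mentioned above, the following is a minimal C sketch of it applied to an array of strings in ASCII order, with a flag for decreasing order. It is an illustration only: the function name and the array-of-pointers representation are assumptions, and the interpreter itself works on TCL lists stored as strings.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper, not part of pickle.h. Sorts 'items' in place using
 * strcmp (byte-wise, so ASCII order); 'decreasing' reverses the order. */
void insertion_sort(const char **items, size_t count, int decreasing) {
	assert(items || count == 0);
	for (size_t i = 1; i < count; i++) {
		const char *key = items[i];
		size_t j = i;
		for (; j > 0; j--) {
			const int c = strcmp(items[j - 1], key);
			if (decreasing ? c >= 0 : c <= 0)
				break; /* items[j - 1] already belongs before key */
			items[j] = items[j - 1]; /* shift the larger element up */
		}
		items[j] = key;
	}
}
```

Insertion sort is quadratic in the worst case, but it needs no extra memory and very little code, which fits a library that favours small size over speed.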

Reverse the elements in a list.

Extract a range from a list.

The search command attempts to find a pattern within a list and if found it returns the index at which the pattern was found within the list, or '-1' if it was not found.

Do a case insensitive search, beware this is ASCII only!

Invert the selection, matching patterns that do not match.

Pattern is an exact string to search for.

The pattern is a number to search for.

This subcommand uses the same regex syntax (and engine) as the 'string match' subcommand, it is quite limited, and it is the default search option.

Instead of returning the index, return the found element.

Start at the specified index instead of at zero.

Split a string into a list, the value to split on is not a regular expression, but a string literal. There is a special case where the value to split on is the empty string, in this case it splits a string into a list of its constituent characters.

Append values to a list, stored in a variable, the function returns the newly created list.

Turn arguments into a list, arguments with spaces in them are quoted, the list command returns the concatenation of the escaped elements.

Trim arguments before concatenating them into a string.

'reg' implements a small regular expression engine that can be used to extract matches from text. It has a few options that can be passed to it, and a few virtues; lazy, greedy and possessive.

Ignore case when matching a string.

Set the start of the string to match from, numbers less than zero are treated as zero, and numbers greater than the length of the string are treated as referring to the end of the string.

Match the shortest string possible.

Match the longest string possible.

Match the longest string possible, with no backtracking. If backtracking is necessary the match fails.

This command is not defined at startup, but can be defined by the user to catch command-not-found exceptions.

When the interpreter encounters a command that has not been defined it attempts to find the 'unknown' command and execute that. If it is not found, it performs its default action, which is to throw an error and return an error string indicating the command has not been found. If the 'unknown' command has been found then it is executed with the command and its arguments being passed to 'unknown' as a list.

For example, defining:

proc unknown {args} { system "$args" }

Would mean any command the interpreter does not know about will be executed by the system shell, including its arguments, provided the system command is defined.

If an unknown command is found within the unknown function then a generic error message is returned instead.

This command can be used to turn tracing on, off, or to query the status of tracing. The TCL trace command is quite powerful, this one is far more limited.

This command is not defined at startup, but can be defined by the user. This can be used to trace the execution of the program.

The commands executed within tracer will not be traced.

The 'info' command is used to query the status of the interpreter and supports many subcommands. The subcommands that are supported are:

Match defaults to '*'. Get a list of all defined commands filtered on 'match'.

Match defaults to '*'. Get a list of all commands defined with 'proc' filtered on 'match'.

Match defaults to '*'. Get a list of all mathematical functions filtered on 'match'.

Match defaults to '*'. Get a list of all defined locals filtered on 'match'.

Match defaults to '*'. Get a list of all defined globals filtered on 'match'.

Get the current 'level' of the interpreter, which is the degree of nesting or scopes that exist relative to the top level scope. Entering a function increases the level by one, for example.

Get the number of commands executed since startup, this can be used as a crude form of a performance counter if the command clock is not available.

Return the version number of the interpreter in list format "major minor patch", semantic versioning is used.

Does the 'line' constitute a command that can be called (which may result in an error)? Or 'does this line parse correctly'? "0" is returned if it cannot, "1" is returned if it can.

Does 'variable' exist in the current scope, "0" is returned if it does not whilst "1" is returned if it does.

Get the arguments of the named function. Functions that are defined in C will return the string 'built-in', otherwise a list is returned containing the function arguments.

Get the body of the named function. Functions that are built in functions defined in C will return a function pointer that represents that C function. Functions defined with 'proc' will return the body of the function as a string.

Get the private data of a function.

The "system" subcommand is used to access various attributes that have been set in the interpreter at compile time or due to the environment that the system is compiled for.

Attributes that can be looked up are:

  1. "pointer": size of a pointer in bits.
  2. "number": size of a number in bits.
  3. "recursion": recursion depth limit.
  4. "length": maximum length of a string or -1 if string length is unlimited.
  5. "min": minimum size of a signed number.
  6. "max": maximum size of a signed number.
  7. "string": are string operations defined?.
  8. "maths": are math operations defined?.
  9. "list": are list operations defined?.
  10. "regex": are regular expression operations defined?.
  11. "help": are help strings compiled in?.
  12. "debugging": is debugging turned on?.
  13. "strict": is strict numeric conversion turned on?.

String Operator

The 'string' command in TCL implements nearly every string command you could possibly want, however this version of 'string' is more limited and behaves differently in many circumstances. 'string' also pulls in more standard C library functions from 'ctype.h' and 'string.h'.

Some of the commands that are implemented:

This command is a primitive regular expression matcher, as available from http://c-faq.com/lib/regex.html. What it lacks in functionality, safety and usability, it makes up for by being only ten lines long (in the original). It is meant more for wildcard expansion of file names (so '?' replaces the meaning of '.' in most regular expression languages). '\' is used as an escape character, which escapes the next character.

The following operations are supported: '*' (match any string) and '?' (match any character). By default all patterns are anchored to match the entire string, but the usual behavior can be emulated by prefixing and suffixing the pattern with '*'.
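A matcher of this kind is short enough to sketch in full. The version below is written in the style of the c-faq matcher with a '\' escape added. It is not the code from pickle.c: the name 'wildcard_match' is hypothetical, a trailing lone '\' is assumed never to match, and recursion depth is not limited.

```c
#include <assert.h>

/* Hypothetical helper, not part of pickle.h. Returns 1 if the whole of
 * 'str' matches 'pat', 0 otherwise. */
int wildcard_match(const char *pat, const char *str) {
	assert(pat && str);
	switch (*pat) {
	case '\0': /* end of pattern: only matches at end of string */
		return *str == '\0';
	case '*': /* match nothing, or consume one character and retry */
		return wildcard_match(pat + 1, str) ||
			(*str != '\0' && wildcard_match(pat, str + 1));
	case '?': /* any single character */
		return *str != '\0' && wildcard_match(pat + 1, str + 1);
	case '\\': /* escape: the next pattern character is taken literally */
		if (pat[1] == '\0')
			return 0; /* assumption: a dangling escape never matches */
		return *str == pat[1] && wildcard_match(pat + 2, str + 1);
	default: /* ordinary character must match exactly */
		return *pat == *str && wildcard_match(pat + 1, str + 1);
	}
}
```

Because the match is anchored at both ends, the pattern "b" does not match "abc", whereas "*b*" does.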

If 'class' is empty, a white-space class is used. 'trimleft' removes leading characters in a class from the given String.

If 'class' is empty, a white-space class is used. 'trimright' removes trailing characters in a class from the given String.

If 'class' is empty, a white-space class is used. 'trim' removes both leading and trailing characters in a class from the given String.

Get the length of String. This is a simple byte length excluding an ASCII NUL terminator.

Convert an ASCII String to lower case.

Convert an ASCII String to upper case.

Reverse a string.

Compare two strings for equality. Returns '1' if equal, '0' if not equal. This comparison is case sensitive.

Compare two strings.

Retrieve a character from a String at the specified Index. The index starts at zero for the first character up to the last character. Indices past the last character return the last character. Negative indexes start counting from the last character (the last character being -1) and count downward, negative indexes that go before the first character return the first character.
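The clamping rules for the index can be written down in a few lines of C. This is a sketch of the rules as described, not the interpreter's code: the function name is hypothetical, and returning '\0' for an empty string is an assumption, since the text does not cover that case.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical helper, not part of pickle.h. Returns the character selected
 * by 'index', or '\0' if 's' is empty (an assumption of this sketch). */
char string_index(const char *s, long index) {
	assert(s);
	const long len = (long)strlen(s);
	if (len == 0)
		return '\0';
	if (index < 0)
		index += len;    /* -1 becomes the last character */
	if (index < 0)
		index = 0;       /* before the first character: clamp to first */
	if (index >= len)
		index = len - 1; /* past the last character: clamp to last */
	return s[index];
}
```

With the string "abc", indices 2, 10 and -1 all select 'c', while 0, -3 and -10 all select 'a'.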

'is' determines whether a given String belongs to a Class. Most class tests accept a zero length string as matching that class, with a few exceptions. Most class tests check that a string contains only certain characters (such as 'alpha', which checks that a string only contains the characters 'a-z' and 'A-Z', or 'digit', which checks that a string only contains the characters '0-9'). Other class tests check that a string matches a specific format, such as 'integer' (which does not accept a zero length string), which expects the string to contain a decimal number with an optional '+' or '-' prefix.

Class can be:

- alnum
- alpha
- digit
- graph
- lower
- print
- punct
- space
- upper
- xdigit
- ascii
- control
- integer

Any other Class is invalid. Most classes are based on a C function (or macro) available in the ctype.h header.

Repeat a String 'Count' many times. 'Count' must be zero or greater.

Find a Needle in a Haystack, optionally starting from 'StartIndex'. If Needle is found, the index into Haystack of the first character of the match is returned, otherwise negative one is returned.

Convert the first character in a string to a number that represents that character.

Convert a number to its character representation.

Convert a lower or uppercase hexadecimal number to its decimal representation.

Convert a decimal number to its lowercase hexadecimal representation.

Hash a string returning the hash of that string as a number.

Create a sub-string from Index1 to Index2 from a String. If Index1 is greater than Index2 an empty string is returned. If Index1 is less than zero, it is set to zero, if Index2 is greater than the index of the last character, it is set to the index of the last character. Indexing starts at zero and goes up to one less than the string's length (or zero for an empty string), which is the index of the last character. The characters from Index1 to Index2 inclusive form the sub-string.
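The inclusive range and its clamping can be sketched in C as follows. The function name, the caller-supplied output buffer and the return convention are all assumptions made for this illustration; the interpreter allocates its result strings itself.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper, not part of pickle.h. Copies s[first..last]
 * (inclusive) into 'out', which holds 'outsz' bytes and is always NUL
 * terminated. Returns the sub-string length, or -1 if it does not fit. */
long string_range(const char *s, long first, long last, char *out, size_t outsz) {
	assert(s && out && outsz > 0);
	const long len = (long)strlen(s);
	if (first < 0)
		first = 0;      /* clamp to the first character */
	if (last >= len)
		last = len - 1; /* clamp to the last character */
	out[0] = '\0';
	if (first > last)       /* also covers the empty string, where last is -1 */
		return 0;
	const size_t n = (size_t)(last - first) + 1;
	if (n >= outsz)
		return -1;      /* no room for the characters plus the NUL */
	memcpy(out, s + first, n);
	out[n] = '\0';
	return (long)n;
}
```

For the string "hello", the range 1 to 3 gives "ell", the range -5 to 100 gives the whole string, and the range 3 to 1 gives an empty string.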

Much like the Unix utility 'tr', this performs various translations given a set (or two sets) of characters. 'tr' can delete characters in the set of characters in 'set' from 'string' if the option provided to it is 'd', or it can perform a translation from one set to another if the 'r' specifier is given. If the second set is larger than the first for the 'r' command the last character applies to the rest of the characters in 'set2'.

The 'r' and 'd' options can both have the additional specifier 'c', which complements the given 'set' of characters.

'r' can also have the 's' specifier, which will squeeze repeated characters in the set.

Example:

proc lowercase {x} {
	string tr r abcdefghijklmnopqrstuvwxyz ABCDEFGHIJKLMNOPQRSTUVWXYZ $x
}

Which creates a function with the same functionality as 'string lowercase $x'.

This subcommand replaces a substring starting at 'first' and ending at 'last'. The 'new-string' replaces the removed section of the 'old-string'.

Returns '0' if two strings are not equal and '1' if they are. Unlike '==' this acts on the entire string.

Returns '1' if two strings are not equal and '0' if they are. Unlike '!=' this acts on the entire string.

Increment a variable by 1, or by an optional value. 'incr' returns the incremented variable. 'incr' being implemented in C is usually a lot more efficient than defining 'incr' in TCL, like so:

proc incr {x} { upvar 1 $x i; set i [+ $i 1] }

And it is used often in looping constructs.

Optionally perform substitutions on a string, controllable via three flags. When you enter a string in a TCL program substitutions are automatically performed on that string, 'subst' can be used to perform a subset of those substitutions (command execution, variable substitutions, or escape character handling) on a string.

Disable escape characters.

Do not process variables.

Do not process command substitutions.

Extension Commands

These commands are present in the main.c file and have been added to the interpreter by extending it. They deal with I/O.

Read in a new-line delimited string, returning the string on success, on End Of File it returns 'EOF' with a return code of 'break'.

Write a line to stdout, the option '-nonewline' may be specified, which means no newline will be appended to the string.

If no string is given, then a single new line is printed out.

Retrieve an environment variable by the name 'string', returning it as a string.

Exit the program with a status of 0, or with the provided status number.

A simplified version of the TCL command 'clock' the subcommands it supports are:

Return the CPU clock.

Return the seconds since the Unix Epoch.

The format command formats a time in seconds since the Unix Epoch against an optional time-specification (the default time specification is "%a %b %d %H:%M:%S %Z %Y"). The formatting is done entirely by the function strftime.

There are internal limits on this string length (512 bytes excluding the NUL terminator).

This command is useful for inspecting the size of the heap, it can report the number of bytes allocated, the number of frees, the number of allocations, and other statistics.

The options are:

This is the number of frees that have taken place, excluding freeing 'NULL'.

This is the number of allocations that have taken place, including any reallocations.

This is the total number of bytes that have been allocated.

This is the number of reallocations that have been performed on an already allocated pointer.

The "heap" command has another subcommand "fail-after", which is used for internal testing purposes, it takes a number and after that many calls to the allocation function it causes it to return a failure, which is fatal to the interpreter (but should not cause a crash). You should not need to use this subcommand. Calling this subcommand again resets the count until failure, setting the count to zero disables deliberate failure. This feature could be used as a crude watchdog, but it would be inadvisable to do so.
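As an illustration of how such statistics could be gathered, here is a sketch of a counting allocator that uses the same four-argument allocator signature as the interpreter loop example in the C API section. The struct, its field names and the function name are hypothetical, and the code in main.c may count things differently. The statistics block is passed in through the 'arena' argument.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical statistics block, handed to the allocator as 'arena'. */
typedef struct {
	unsigned long frees;    /* frees performed, excluding free(NULL) */
	unsigned long allocs;   /* allocations, including reallocations */
	unsigned long reallocs; /* growth of an already allocated pointer */
	unsigned long total;    /* sum of all bytes requested */
} heap_stats_t;

void *counting_allocator(void *arena, void *ptr, size_t oldsz, size_t newsz) {
	heap_stats_t *h = arena;
	assert(h);
	if (newsz == 0) { /* a new size of zero means free */
		if (ptr)
			h->frees++;
		free(ptr);
		return NULL;
	}
	if (newsz > oldsz) { /* allocate, or grow an existing block */
		h->allocs++;
		if (ptr)
			h->reallocs++;
		h->total += newsz;
		return realloc(ptr, newsz);
	}
	return ptr; /* shrinking is a no-op in this sketch */
}
```

A command along the lines of "heap" would then only need to read the counters out of the statistics block and return them as strings.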

Read and then evaluate a file off of disk. This may fail because the file could not be read or something went wrong during the evaluation.

Compile Time Options

I am not a big fan of using the C Preprocessor to define a myriad of compile time options. It leads to messy and unreadable code.

That said the following compile time options are available:

If defined this will disable assertions. It will also prevent the unit test functions from being compiled. Assertions are used heavily to check that the library is being used correctly and to check the library's internals, this applies both to the block allocation routines and pickle itself.

There are other compile time options within pickle.c that control: the maximum string length and whether to use one, whether to provide the default allocator or not, whether certain functions are to be made available to the interpreter or not (such as the command 'string', the mathematical operators and the list functions), and whether strict numeric conversion is used. These options are semi-internal, they are subject to change and removal, you should use the source to determine what they are and be aware that they may change across releases.

Inevitably when an interpreter is made for a new language, readline (or linenoise) integration is a build option, usually because the author is tired of pressing the up arrow key and seeing '^[[A'. Naturally this increases the complexity of the build system, adds more options, and adds more code. Instead you can use rlwrap, or an alternative, as a wrapper around your program.

Custom Allocator / C Library usage

To aid in porting the system to embedded platforms, pickle.c contains no Input and Output functions (they are added in by registering commands in main.c). pickle.c does include stdio.h, but only to access vsnprintf. The big problem with porting a string heavy language to an embedded platform, unlike a language like FORTH, is memory allocation. It is unavoidable that some kind of dynamic memory allocation is required. For this purpose it is possible to provide your own allocator to the pickle library. If an allocator is not provided, malloc will be used; you can remove this from the initialization function to stop your build system pulling in your platform's allocator.

The block allocation library provided in block.c can be optionally used, but unlike malloc will require tweaking to suit your purposes. The maximum block size available to the allocator will also determine the maximum string size that can be used by pickle.

Apart from vsnprintf, the other functions pulled in from the C library are quite easy to implement. They include (but are not necessarily limited to); strlen, memcpy, memchr, memset and abort.

C API

The language can be extended by defining new commands in C and registering those commands with the pickle_command_register function. The internal structures used are mostly opaque and can be interacted with from within the language. As stated a custom allocator can be used and a block allocator is provided, it is possible to do quite a bit with this scripting language whilst only allocating about 32KiB of memory total on a 64-bit machine (for example all of the unit tests and example programs run within that amount).

The C API is small and regular. All of the functions exported return the same error codes and implementing an interpreter loop is trivial.

The language can be extended with new functions written in C, each function accepts an integer length, and an array of pointers to ASCIIZ strings - much like the 'main' function in C.

User defined commands can be registered with the 'pickle_command_register' function. Within the user defined callbacks the 'pickle_result_set' family of functions can be used. The callbacks passed to 'pickle_command_register' look like this:

typedef int (*pickle_func_t)(pickle_t *i, int argc, char **argv, void *privdata);

The callbacks accept a pointer to an instance of the pickle interpreter, and a list of strings (in 'argc' and 'argv'). Arbitrary data may be passed to the custom callback when the command is registered.

The function returns one of the following status codes:

PICKLE_ERROR    = -1 (Throw an error until caught)
PICKLE_OK       =  0 (Signal success, continue execution)
PICKLE_RETURN   =  1 (Return out of a function)
PICKLE_BREAK    =  2 (Break out of a while loop)
PICKLE_CONTINUE =  3 (Immediately proceed to next iteration of while loop)

These error codes can affect the flow control within the interpreter. The actual return string of the callback is set with 'pickle_result_set' functions.

Variables can be set either within or outside of the user defined callbacks with the 'pickle_var_set' family of functions.

The pickle library does not come with many built in functions, and comes with no Input/Output functions (even those available in the C standard library) to make porting to non-hosted environments easier. The example test driver program does add functions available in the standard library.

The following is the source code for a simple interpreter loop that reads a line and then evaluates it:

#include "pickle.h"
#include <stdio.h>
#include <stdlib.h>

static void *allocator(void *arena, void *ptr, size_t oldsz, size_t newsz) {
	if (newsz ==     0) { free(ptr); return NULL; }
	if (newsz  > oldsz) { return realloc(ptr, newsz); }
	return ptr;
}

static int prompt(FILE *f, int err, const char *value) {
	if (fprintf(f, "[%d]: %s\n> ", err, value) < 0)
		return -1;
	return fflush(f) < 0 ? -1 : 0;
}

int main(void) {
	pickle_t *p = NULL;
	if (pickle_new(&p, allocator, NULL) < 0)
		return 1;
	if (prompt(stdout, 0, "") < 0)
		return 1;
	for (char buf[512] = { 0 }; fgets(buf, sizeof buf, stdin);) {
		const char *r = NULL;
		const int er = pickle_eval(p, buf);
		if (pickle_result_get(p, &r) != PICKLE_OK)
			return 1;
		if (prompt(stdout, er, r) < 0)
			return 1;
	}
	return pickle_delete(p);
}

It should be obvious that the interface presented is not efficient for many uses; treating everything as a string has a cost. It is, however, simple and sufficient for many tasks.

While the API presented in 'pickle.h' is small, there are a few areas of complication. They are: the memory allocation API, registering a command, the getopt function and the unit tests. The most involved is the memory allocation API and there is not too much to it. You do not even need to use it, and can pass NULL to 'pickle_new' for the allocator argument if you want to use the built in malloc/realloc/free based allocator (provided the library was built with support for it).
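As an illustration of what the allocator argument allows, here is a sketch of an allocator that counts live blocks, which can help to spot leaks. It uses the same signature as the 'allocator' function in the interpreter loop above. The 'heap_stats_t' structure and the name 'counting_allocator' are hypothetical, and the sketch assumes that fresh allocations arrive with a NULL pointer and that frees arrive with a new size of zero, as that example suggests. It deliberately does not rely on the old size argument.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical bookkeeping structure, passed in as the 'arena' argument. */
typedef struct {
	long live;  /* blocks allocated and not yet freed */
	long total; /* blocks ever allocated */
} heap_stats_t;

static void *counting_allocator(void *arena, void *ptr, size_t oldsz, size_t newsz) {
	heap_stats_t *h = arena;
	assert(h);
	(void)oldsz; /* not relied upon in this sketch */
	if (newsz == 0) { /* a request to free 'ptr' */
		if (ptr)
			h->live--;
		free(ptr);
		return NULL;
	}
	void *r = realloc(ptr, newsz);
	if (r && !ptr) { /* a fresh block rather than a resize */
		h->live++;
		h->total++;
	}
	return r;
}
```

A pointer to a zeroed 'heap_stats_t' would presumably be passed as the last argument to 'pickle_new', in place of the NULL used in the loop above. If everything is released correctly, 'live' should be back at zero once the interpreter has been deleted.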

It may not be obvious from the API how to go about designing functions to integrate with the interpreter. The C API is deliberately kept as simple as possible. More could be exported, but there is a trade-off in doing so: it places more of a burden on backwards compatibility, limits the development of the library internals and makes the library more difficult to use. It is always possible to hack your own personal copy of the library to suit your purpose; the library is small enough that this should be possible.

The Pickle interpreter has no way of registering different types with it, the string is king. As such, it is not immediately clear what the best way of adding functionality that requires manipulating non-string data (such as file handles or pointers to binary blobs) is. There are several ways of doing this:

  1. Convert the pointer to a string and add functions which deal with this string.
  2. Put data into the private data field
  3. Create a function which registers another function that contains private data.

Option '1' may seem natural, but it is much more error prone. It is possible to pass the wrong string around and cause the program to crash. Option '2' is limiting, the C portion of the program is entirely in control of what resources get added, and only one handle to a resource can be controlled. Option '2' is a good option for certain cases.
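To make the risk in option '1' concrete, here is a minimal sketch of the round trip from pointer to string and back using the standard '%p' conversion. The helper names are hypothetical. Note that nothing in the parsing step can tell a valid handle from a stale or mistyped one, which is exactly the weakness described above.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical helper: format a pointer as a string the script can hold.
 * Returns 0 on success, -1 if the buffer is too small. */
static int handle_to_string(void *handle, char *buf, size_t len) {
	assert(buf);
	const int n = snprintf(buf, len, "%p", handle);
	return (n < 0 || (size_t)n >= len) ? -1 : 0;
}

/* Hypothetical helper: recover the pointer from a string.
 * Returns 0 on success, -1 if the string does not parse as a pointer.
 * A string that parses is NOT proof that the pointer is still valid. */
static int string_to_handle(const char *s, void **handle) {
	assert(s && handle);
	void *p = NULL;
	if (sscanf(s, "%p", &p) != 1)
		return -1;
	*handle = p;
	return 0;
}
```

The text form produced by '%p' is implementation defined, so such strings are only meaningful within the process that created them.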

Option '3' is the most general and allows an arbitrary resource to be managed by the interpreter. The idea is to create a function that acquires the resource to be managed and registers a new function in the pickle global function namespace with the resource in the private data field of the newly registered function. The newly created function, a limited form of a closure, can then perform operations on the handle. It can also clean up the resource by releasing the object in its private data field, and then deleting itself with the 'pickle_command_rename' function. An example of this is the 'fopen' command, which returns a closure that contains a file handle.

An example of using the 'fopen' command and the returned function from within the pickle interpreter is:

set fh [fopen file.txt rb]
set line [$fh -gets]
$fh -close

And an example of how this might be implemented in C is:

int pickleCommandFile(pickle_t *i, int argc, char **argv, void *pd) {
	FILE *fh = (FILE*)pd;
	if (!strcmp(argv[1], "-close")) {
		fclose(fh);                                   /* free handle */
		return pickle_command_rename(i, argv[0], ""); /* delete self */
	}
	if (!strcmp(argv[1], "-gets")) {
		char buf[512] = { 0 }; /* stays empty if fgets fails */
		fgets(buf, sizeof buf, fh);
		return pickle_result_set(i, PICKLE_OK, "%s", buf);
	}
	return pickle_result_set(i, PICKLE_ERROR, "invalid option");
}

int pickleCommandFopen(pickle_t *i, int argc, char **argv, void *pd) {
	char name[64];
	FILE *fh = fopen(argv[1], argv[2]);
	sprintf(name, "%p", fh); /* unique name */
	pickle_command_register(i, name, pickleCommandFile, fh);
	return pickle_result_set(i, PICKLE_OK, "%s", name);
}

The code illustrates the point, but lacks the assertions, error checking, and functionality of the real 'fopen' command. The 'pickleCommandFopen' function should be registered with 'pickle_command_register'. The 'pickleCommandFile' function should not be, as 'pickleCommandFopen' does the registering when needed.

It should be possible to implement the commands 'update', 'after' and 'vwait', extending the interpreter with task management like behavior without any changes to the API. It is possible to implement most commands, although it might be awkward to do so. Cleanup is still a problem.

Style Guide

Style/coding guide and notes, for the file pickle.c:

The callbacks all have their 'argv' argument defined as 'char **' rather than a const qualified type, even though they do not modify their arguments. Adding the qualifiers in just adds a lot of noise to the function definitions. Also see http://c-faq.com/ansi/constmismatch.html.

Notes

There are other implementations of TCL and other extensions of the original picol interpreter, here are a few:

And I am sure if you were to search https://github.com you would find more.

One of the goals of the interpreter is low(ish) memory usage, there are a few design decisions that go against this, along with the language itself, however an effort has been made to make sure memory usage is kept low.

Some of the (internal) decisions made:

Some of the design decisions made that prevent and hamper memory usage and things that could be done:

It would be possible to include some simple compilation of the procedures that are stored, turning certain keywords into byte codes that fall outside of the UTF-8 and ASCII character ranges, as well as removing runs of white space and comments entirely. This would be possible to implement without changing the interface and would both speed things up and reduce memory usage. However, it would increase the complexity of the implementation (perhaps by about 500 LoC, if that can be thought of as a proxy for complexity).

If you need an implementation of vsnprintf the Musl C library has one. This is the most complicated C function in use from the standard library and the one most likely not to be available on an embedded platform (although the base software packages are getting better nowadays). It is not difficult to make your own version of the vsnprintf function usable by this library, as you do not need to support all of the functionality of the library function. For example, floating point numbers are not used within this library.
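To give an idea of the scale involved, here is a sketch of a reduced vsnprintf. The set of conversions handled ('%s', '%d', '%c' and '%%') is an assumption chosen for illustration, so check which conversions 'pickle.c' actually uses before relying on anything like this. The names 'mini_vsnprintf' and 'mini_snprintf' are hypothetical.

```c
#include <assert.h>
#include <stdarg.h>
#include <stddef.h>

/* Store one character if there is room, always counting it. */
static void emit(char *buf, size_t len, size_t *pos, char ch) {
	if (*pos + 1 < len)
		buf[*pos] = ch;
	(*pos)++;
}

/* Hypothetical reduced vsnprintf: handles only %s, %d, %c and %%.
 * Returns the length the full output would have had, like vsnprintf,
 * or -1 on an unsupported or incomplete conversion. */
static int mini_vsnprintf(char *buf, size_t len, const char *fmt, va_list ap) {
	assert(fmt && (buf || len == 0));
	size_t pos = 0;
	for (; *fmt; fmt++) {
		if (*fmt != '%') {
			emit(buf, len, &pos, *fmt);
			continue;
		}
		fmt++; /* move on to the conversion character */
		switch (*fmt) {
		case 's': {
			const char *s = va_arg(ap, const char *);
			if (!s)
				s = "(null)";
			while (*s)
				emit(buf, len, &pos, *s++);
			break;
		}
		case 'd': {
			const int d = va_arg(ap, int);
			unsigned u = (unsigned)d;
			char tmp[24];
			int n = 0;
			if (d < 0) {
				emit(buf, len, &pos, '-');
				u = 0u - u; /* magnitude, safe even for INT_MIN */
			}
			do { /* digits are produced in reverse order */
				tmp[n++] = (char)('0' + u % 10u);
				u /= 10u;
			} while (u);
			while (n)
				emit(buf, len, &pos, tmp[--n]);
			break;
		}
		case 'c': emit(buf, len, &pos, (char)va_arg(ap, int)); break;
		case '%': emit(buf, len, &pos, '%'); break;
		default:  return -1; /* includes a dangling '%' at the end */
		}
	}
	if (len)
		buf[pos < len ? pos : len - 1] = '\0';
	return (int)pos;
}

/* Convenience wrapper with a variable argument list. */
static int mini_snprintf(char *buf, size_t len, const char *fmt, ...) {
	va_list ap;
	va_start(ap, fmt);
	const int r = mini_vsnprintf(buf, len, fmt, ap);
	va_end(ap);
	return r;
}
```

Field widths, precision, length modifiers and floating point are all left out, which is where most of the complexity of a full implementation lies.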

The list functions are also far too complex, big, and error prone; they should be rewritten.

It might be nice to go back to the original source, with what I know now, and create a very small version of this library with a goal of compiling to under 30KiB, or to start from scratch and make my own version. The 'micro' makefile target does this somewhat. A smaller API could be made as well; there really only needs to be: pickle_new, pickle_delete, pickle_eval, pickle_command_register, and pickle_result_set.

This interpreter lacks a module system. There are a few small and simple modules that could be integrated with the library quite easily, see: Constant Data Base Library https://github.com/howerj/cdb, a HTTP 1.1 client https://github.com/howerj/httpc, tiny compression routines https://github.com/howerj/shrink, a fixed point arithmetic library https://github.com/howerj/q, and UTF-8 string handling https://github.com/howerj/utf8. These would have to be external modules that could be integrated with this library.

The current project that attempts to remedy this is available at:

https://github.com/howerj/mod-pickle

A proper module system would also allow Shared Objects / Dynamically Linked Libraries to be loaded at run time into the interpreter. This complicates the library, but I have done this before in a Lisp interpreter, see https://github.com/howerj/liblisp.

Interpreter Limitations

Known limitations of the interpreter include: