Argdata

Argdata is a binary serialization format inspired by YAML, MessagePack and nvlists.

It was originally designed for CloudABI and is currently used there, but it does not depend on anything specific to CloudABI.

The encoding is optimized for reading: once the encoded argdata is stored in a buffer, no dynamic memory is needed at all to use any of the values contained within.

Types

Argdata values have one of the following types: null, binary, bool, fd, float, int, map, seq, string, or timestamp.

Library

This repository contains a C library for handling argdata. See the (well-commented) argdata.h for the details. A C++ interface is available for the same library in argdata.hpp. This repository also provides some example source files.

Binary encoding

The first byte of an encoded argdata value encodes the type of the value, using the tag value in the table below. Unlike many other serialization formats, encoded argdata values do not encode their own length. It is assumed that the length of the full buffer is already known. The length of the value is simply the length of the full buffer, minus the one tag byte encoding the type.

Null values do not have a tag, and they take no space at all: they are encoded as nothing, zero bytes.
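As a minimal sketch of the two rules above, the type of a value can be read from the first byte of its buffer, with an empty buffer meaning null. The names `sketch_type` and `sketch_type_of` are hypothetical and are not the identifiers used by argdata.h. Treating any tag outside 01 to 09 as invalid is an assumption of this sketch.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical names for illustration; not taken from argdata.h. */
enum sketch_type {
    SKETCH_INVALID = -1,  /* tag not listed in the table (assumption) */
    SKETCH_NULL = 0,      /* empty buffer: no tag byte at all */
    SKETCH_BINARY = 1,
    SKETCH_BOOL = 2,
    SKETCH_FD = 3,
    SKETCH_FLOAT = 4,
    SKETCH_INT = 5,
    SKETCH_MAP = 6,
    SKETCH_SEQ = 7,
    SKETCH_STRING = 8,
    SKETCH_TIMESTAMP = 9
};

/* Returns the type of the encoded value in buf[0..len). */
enum sketch_type sketch_type_of(const uint8_t *buf, size_t len) {
    if (len == 0)
        return SKETCH_NULL;  /* null is encoded as zero bytes */
    if (buf[0] >= 1 && buf[0] <= 9)
        return (enum sketch_type)buf[0];
    return SKETCH_INVALID;
}
```

The payload of the value is then `buf + 1` with length `len - 1`, since the value does not store its own length.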

| Type | Tag | Value bytes |
|---|---|---|
| binary | 01 | the binary data |
| bool | 02 | false: none, true: 01 |
| fd | 03 | 32-bit big-endian integer (e.g. 00 00 00 02 for fd 2) |
| float | 04 | 64-bit big-endian IEEE 754 float |
| int | 05 | N-bit signed big-endian integer (see below) |
| map | 06 | repeated subfield (see below): key, value, key, value, ... |
| seq | 07 | repeated subfield (see below): value, value, value, ... |
| string | 08 | null-terminated UTF-8 |
| timestamp | 09 | N-bit signed big-endian integer (see below), encoding nanoseconds since 1970 (UTC) |

For example, the string "123" is encoded as 08 31 32 33 00: The tag byte 08 for the type, followed by the UTF-8 encoding of the string, followed by a terminating null byte.
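The string example above can be sketched in a few lines of C. This is an illustration of the byte layout only, written without the library; `sketch_encode_string` and `sketch_decode_string` are hypothetical names, not functions from argdata.h. The decoder returns a pointer straight into the encoded buffer, which is one way the format avoids dynamic memory when reading.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Encodes a NUL-terminated UTF-8 string as: tag 08, the bytes, a NUL.
 * Returns the number of bytes written, or 0 if `cap` is too small. */
size_t sketch_encode_string(const char *s, uint8_t *out, size_t cap) {
    size_t n = strlen(s);
    if (cap < n + 2)  /* tag byte + string bytes + terminator */
        return 0;
    out[0] = 0x08;
    memcpy(out + 1, s, n + 1);  /* n + 1 also copies the terminating NUL */
    return n + 2;
}

/* Returns a pointer to the string stored inside buf, or NULL if buf does
 * not look like an encoded string value. Nothing is copied or allocated. */
const char *sketch_decode_string(const uint8_t *buf, size_t len) {
    if (len < 2 || buf[0] != 0x08 || buf[len - 1] != 0x00)
        return NULL;
    return (const char *)(buf + 1);
}
```

Encoding "123" with this sketch yields the five bytes 08 31 32 33 00 from the example.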

N-bit signed integers

Integers are signed, and encoded big-endian in as few bytes as needed. The C library doesn't support decoding and encoding integers that don't fit in intmax_t or uintmax_t (usually 64-bit), but the binary format has no restrictions on the size of integers.

Some examples:

| Value | Bytes |
|---|---|
| 0 | none |
| 1 | 01 |
| 127 | 7F |
| -128 | 80 |
| -1 | FF |
| 255 | 00 FF |
| 1000 | 03 E8 |
| -1000 | FC 18 |
| 2^32 - 1 | 00 FF FF FF FF |

Since integers are always encoded using the fewest bytes possible, no integer should be encoded as, for example, 00 or FF FF: the values 0 and -1 are encoded as an empty sequence and FF, respectively.
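The shortest-form rule can be sketched as follows. This is an illustration limited to values that fit in `int64_t`, written without the library; `sketch_encode_int` and `sketch_decode_int` are hypothetical names. The decoder converts an unsigned value back to `int64_t`, which relies on the two's complement behaviour of common compilers.

```c
#include <stddef.h>
#include <stdint.h>

/* Writes tag 05 followed by the shortest big-endian two's complement
 * form of v. `out` must have room for 9 bytes. Returns bytes written. */
size_t sketch_encode_int(int64_t v, uint8_t *out) {
    size_t n = 0;  /* payload bytes; the value 0 needs none */
    if (v != 0) {
        n = 1;
        while (n < 8) {
            int64_t lim = (int64_t)1 << (8 * n - 1);  /* 2^(8n-1) */
            if (v >= -lim && v < lim)
                break;  /* v fits in n bytes */
            n++;
        }
    }
    out[0] = 0x05;
    for (size_t i = 0; i < n; i++)
        out[1 + i] = (uint8_t)((uint64_t)v >> (8 * (n - 1 - i)));
    return n + 1;
}

/* Decodes an integer payload (the bytes after the tag) of at most
 * 8 bytes. Returns 0 on success, -1 if the payload is too long. */
int sketch_decode_int(const uint8_t *p, size_t n, int64_t *out) {
    if (n > 8)
        return -1;
    /* Start from all ones when the top bit is set, to sign-extend. */
    uint64_t u = (n > 0 && (p[0] & 0x80)) ? UINT64_MAX : 0;
    for (size_t i = 0; i < n; i++)
        u = (u << 8) | p[i];
    *out = (int64_t)u;  /* assumes two's complement conversion */
    return 0;
}
```

With this sketch, 255 needs the leading 00 byte because FF alone would decode as -1, which matches the examples above.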

Subfields

Subfields, inside the map and seq types, are encoded by their length, followed by the encoded value itself. The length is encoded as a variable length unsigned big-endian integer, where the high bit of each byte is not part of the integer, but indicates whether the byte is the last byte of the integer.

For example, the length of a subfield of 6 bytes is encoded as 86: The high bit is on, indicating that this is the last byte of the variable length integer, and the other 7 bits encode the value 6. As another example, a length of 128 is encoded as 01 80: Only the high bit of the second byte is set, indicating that two bytes are used, and the other 14 bits (7 bits of the first, and 7 bits of the second byte) encode the value 128.
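A minimal sketch of this length encoding, assuming lengths fit in `size_t`. The names `sketch_encode_length` and `sketch_decode_length` are hypothetical and do not come from argdata.h.

```c
#include <stddef.h>
#include <stdint.h>

/* Writes len as big-endian groups of 7 bits, with the high bit set only
 * on the last byte. `out` must have room for 10 bytes. Returns the
 * number of bytes written. */
size_t sketch_encode_length(size_t len, uint8_t *out) {
    size_t n = 1;
    for (size_t rest = len >> 7; rest != 0; rest >>= 7)
        n++;  /* one more byte per extra 7-bit group */
    for (size_t i = 0; i < n; i++)
        out[i] = (uint8_t)((len >> (7 * (n - 1 - i))) & 0x7F);
    out[n - 1] |= 0x80;  /* mark the final byte */
    return n;
}

/* Reads a length from buf[0..avail). Returns the number of bytes
 * consumed, or 0 if the input is truncated or overflows size_t. */
size_t sketch_decode_length(const uint8_t *buf, size_t avail, size_t *len) {
    size_t v = 0;
    for (size_t i = 0; i < avail; i++) {
        if (v > (SIZE_MAX >> 7))
            return 0;  /* another 7-bit shift would overflow */
        v = (v << 7) | (buf[i] & 0x7F);
        if (buf[i] & 0x80) {  /* high bit set: this was the last byte */
            *len = v;
            return i + 1;
        }
    }
    return 0;  /* ran out of bytes before the final byte */
}
```

Encoding 6 gives 86 and encoding 128 gives 01 80, as in the examples above.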

Maps and sequences don't encode their own length, or number of elements stored. The subfields simply end when there are no more bytes left.

Maps always have an even number of subfields, as every pair represents a key with its value.

For example, a sequence of values 0, true, and "A" is encoded as: 07 81 05 82 02 01 83 08 41 00:

| Bytes | Meaning |
|---|---|
| 07 | seq type tag |
| 81 | length of first subfield: 1 |
| 05 | first subfield: the integer 0 |
| 82 | length of second subfield: 2 |
| 02 01 | second subfield: the boolean value true |
| 83 | length of the third subfield: 3 |
| 08 41 00 | third subfield: the string "A" (with null terminator) |
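Walking the subfields of such a buffer needs only a cursor, with no allocation. The following is a sketch under the rules described in this section, not the iterator API of argdata.h. The `sketch_iter` type and both functions are hypothetical names.

```c
#include <stddef.h>
#include <stdint.h>

/* Cursor over the subfields of an encoded map or seq (hypothetical). */
struct sketch_iter {
    const uint8_t *p;  /* next unread byte */
    size_t left;       /* bytes remaining after p */
};

/* Positions the cursor after the tag byte. Returns 0 on success,
 * -1 if buf is not an encoded map (06) or seq (07). */
int sketch_iter_init(struct sketch_iter *it, const uint8_t *buf, size_t len) {
    if (len == 0 || (buf[0] != 0x06 && buf[0] != 0x07))
        return -1;
    it->p = buf + 1;
    it->left = len - 1;
    return 0;
}

/* Yields the next subfield as a pointer into the original buffer.
 * Returns 1 if a subfield was produced, 0 when no bytes are left,
 * and -1 if the data is malformed. */
int sketch_iter_next(struct sketch_iter *it, const uint8_t **val, size_t *val_len) {
    if (it->left == 0)
        return 0;  /* subfields end when the buffer ends */
    size_t n = 0, used = 0;
    for (;;) {  /* read the variable-length subfield length */
        if (used == it->left || n > (SIZE_MAX >> 7))
            return -1;
        uint8_t b = it->p[used++];
        n = (n << 7) | (b & 0x7F);
        if (b & 0x80)
            break;  /* high bit marks the last length byte */
    }
    if (n > it->left - used)
        return -1;  /* subfield would run past the buffer */
    *val = it->p + used;
    *val_len = n;
    it->p += used + n;
    it->left -= used + n;
    return 1;
}
```

For a map, a caller would call `sketch_iter_next` twice per entry: once for the key and once for the value.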

File Descriptors

File descriptor numbers in argdata refer to open file descriptors in the same process or passed along with the message that contained the argdata. This is not useful on every platform, but can for example be used on CloudABI and POSIX systems.

File descriptors are always stored as a 32-bit integer in exactly four bytes, unlike integers, which use only as many bytes as needed. The reason for this is to allow substitution of file descriptors in encoded argdata. If file descriptors were variable length, changing the value could involve resizing and thus re-encoding (part of) the argdata. Being able to substitute these values is useful since in many cases file descriptor numbers cannot be chosen freely.
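Because the width is fixed, a substitution can be sketched as a plain in-place overwrite. `sketch_replace_fd` is a hypothetical helper, not part of argdata.h, and treating the number as a `uint32_t` is an assumption of this sketch.

```c
#include <stddef.h>
#include <stdint.h>

/* Overwrites the descriptor number of an encoded fd value in place.
 * `val` points at the tag byte of the fd value, which may sit inside a
 * larger encoded map or seq; `len` is the length of that value.
 * Returns 0 on success, -1 if val is not an encoded fd. */
int sketch_replace_fd(uint8_t *val, size_t len, uint32_t new_fd) {
    if (len != 5 || val[0] != 0x03)
        return -1;  /* an fd is always the tag plus four bytes */
    val[1] = (uint8_t)(new_fd >> 24);  /* big-endian: high byte first */
    val[2] = (uint8_t)(new_fd >> 16);
    val[3] = (uint8_t)(new_fd >> 8);
    val[4] = (uint8_t)new_fd;
    return 0;
}
```

No surrounding subfield length has to change, because the encoded value is five bytes before and after the substitution.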