Home

Awesome

Singeli

Introductions: Singeli as interpreter | Singeli as compiler -> Purity and Ford write a min filter

Singeli is a domain-specific language for building high-performance algorithms (including SIMD) with flexible abstractions over code that corresponds to individual instructions. It's implemented in BQN, with a frontend that emits IR and a backend that converts it to C (the IR is simple, so that other backends like LLVM or machine code could be supported without much work).

With 5k lines in production in CBQN, I think Singeli counts as usable! See also SingeliSort and 1brc. The core language should now be stable (and past breaking changes have been done with about a 6 month deprecation period). Standard includes are important for programming comfortably and are less solid in some areas, particularly the more sophisticated SIMD instructions. But if a better design comes up we'd probably introduce a new include rather than make major breaking changes.

To compile input.singeli:

$ singeli input.singeli [-o output.c]

For options see $ singeli -h or this section. To run singeli as an executable, ensure that CBQN is installed as bqn in your executable path, or call as /path/to/bqn singeli ….

Debugging compilation errors is easy enough since they come with parsing or stack traces (if not it's a bug—please report), and show{} prints whatever you want at compile time. At runtime, lprintf{} provided by include 'debug/printf' prints what you pass to it, and the emitted C code is rather verbose but it embeds source function and variable names you can use to get your bearings. For tooling, the interactive Singeli playground is a nice way to get parts of your code working without the awkward compile-debug loop, and singeli-lsp provides advanced editing features with parsing and name resolution.

Early design discussion for Singeli took place at topanswers.xyz; now it's in the BQN forum.

Language overview

Singeli is primarily a metaprogramming language. Its purpose is to build abstractions around CPU instructions in order to create large amounts of specialized code. Source code will tend to do complicated things at compile time to emit programs that do relatively simple things at runtime, so it's probably better to orient your thinking around what happens at compile time.

The primary tool for abstraction is the generator. Written with {parameters}, generators perform similar tasks as C macros, C++ templates, or generics, but offer more flexibility. They are expanded during compilation, and form a Turing-complete language. Generators use lexical scoping and allow recursive calls. Here's a generator that calls another one:

def gen{func, arg} = func{arg, arg + 1}

In fact, + is also a generator, if it's defined. Singeli operators aren't built in; instead the user declares each as infix or prefix syntax for a generator. For example, the following line from include/skin/cop.singeli makes + a left-associative infix operator with precedence 30 (there can be one infix and one prefix definition).

oper +  __add infix left  30

Generators can be extended with additional definitions. Each definition overrides the previous ones on its domain, which means that applying a generator searches backwards through all definitions visible in the current scope until it finds one that fits. gen above applies to any two arguments, but a definition can be narrowed using types and conditions:

def gen{x:T, n, n, S if 'number'==kind{n} and T<=S} = {
  cast{S, x} + n
}

Here, x must be a typed value, and have type T, and the two n parameters must match. And the condition following if must hold. Otherwise, previous definitions are tried and you get an error if there aren't any left.

The end goal here is to define functions for use by some other program. Functions are declared with the fn keyword and a parenthesis syntax at the top level, possibly with generator-like parameters. Types use value:type syntax rather than type value.

fn demo{T}(a:T, len:u64) : void = {
  while (len > 0) {
    a = a + 1
    len = len - 1
  }
}

Here you can see that code to be executed at runtime is written in an imperative style. if, else, and while control structures are provided. Statements are separated by newlines or semicolons; these two characters are equivalent.

If possible, iteration should be done with for-each loops. In SIMD programming these loops are very important and come in many varieties, taking vectorization and unrolling into account. So Singeli doesn't provide a single for loop structure, but rather a general mechanism for defining loops. Here's what a definition looks like:

def for{vars,begin,end,iter} = {
  i:u64 = begin
  while (i < end) {
    iter{i, vars}
    ++i
  }
}

While this loop is an ordinary generator (allowing other loops to call it), special syntax is used to invoke it on variables and a code block. Here's an example with two pointers:

@for (src,dst over i from 0 to len) {
  src = 1 + dst
}

Outside the loop, these variables are pointers, and inside it, they are individual values. It's the iter generator that performs this conversion, using the given index i to index into each pointer in vars. Each variable tracks whether its value was set inside the loop and writes to memory if this is the case.

Generators

A generator is essentially a compile-time function, which takes values called parameters and returns a result. There are a few ways to create generators:

# Anonymous generator, immediately invoked
def a_sq = ({x} => x*x){a}

# Named generator with two cases
def min{a, b} = a
def min{a, b if b<a} = b

# Generated function
fn triple{T}(x:T) = x + x + x

These are all effectively the same thing: there's a parameter list in {}, and a definition. When invoked, the body is evaluated with the parameters set to the values provided. Depending on the definition, this evaluation might end not with a static value, but with a computation to be performed at runtime (this always happens for functions). Generators are dynamically typed, and naturally have no constraints on the parameters or result. But the parameter list can include conditions that determine whether to accept a particular set of parameters. These are described in the next section; first let's go through the syntax for each case.

An anonymous generator is written {params} => body. If it's not the entire expression, it usually needs to be parenthesized. Otherwise {params} might be interpreted as a generator call on the previous word, for example. body can be either a single expression, or a block consisting of multiple expressions surrounded by {}.

A named generator is a statement rather than an expression: def name{params} = body. name can be followed by multiple parameter lists, defining a nested generator. This form also allows multiple definitions. When the generator is called, these definitions are scanned, beginning with the last, for one that applies to the arguments. That is, when applying the generator, it tests the arguments against all its conditions, then runs if they match and otherwise calls the previous definition. Sometimes it's useful to extend many generators in the same way; see extend for this use case.

A function can also have generator parameters. This case is discussed in the section on functions.

Parameter matching

A generator definition is only used when its specified parameter list matches those given, allowing multiple definitions to be effective. In addition to the names, the parameter list can include implicit and explicit constraints, such as:

The full system of matching and destructuring is described under def. The overall parameter list is matched in the same way as a tuple. This also means that (at most) one parameter slot can be variable-length if marked with a leading .... This "gathered" parameter corresponds to any number of inputs, and it's defined as the tuple of those values. For example, def tup{...t} = t is a possible implementation of the builtin tup, which returns the tuple of all parameters.

Spread syntax

When calling a generator, any parameter slot may be preceded by ... to expand it from a tuple into multiple parameters. For example, gen{a, ...tup{b, c}, d} expands to gen{a, b, c, d}. Any expression can follow: while ... isn't an operator, it acts like it has infinitely low precedence.

Partial application

A generator can be partially applied by using . in place of one or more parameters in call syntax. This doesn't call the generator, but creates a new generator whose parameters are used for those . positions. For example, gen{., 5, .} is a generator with two parameters, and when called on {a, b} the result is gen{a, 5, b}.

One parameter slot can also be replaced with ... without any following expression. This stands for any number of parameters, so that for example when gen{., 5, ...} is called on one or more values, the first is taken for the . and the rest are passed at the end where the ... is.

Kinds of value

A generator is one kind of value—that is, something that's first-class at compile time. Like most values, it doesn't exist at runtime. Hopefully it's already done what's needed! In fact it's one of the more complicated kinds of value. Here's the full list:

KindSummary
numberA compile-time, high-precision number
symbolA compile-time string
tupleA compile-time list of values
generatorTakes and returns values at compile time
typeA specific type that a runtime value can have
constantA typed value known at compile time
registerA typed value, unknown until runtime
functionTakes and returns typed values at runtime
labelTarget for goto{}

The simplest are discussed in this section, and others have dedicated sections below.

Numbers are floating-point, with enough precision to represent both double-precision floats and 64-bit integers (signed or unsigned) exactly. Specifically, they're implemented as pairs of doubles, giving about 105 bits of precision over the same exponent range as a double. Numeric literal syntax, which supports scientific notation and arbitrary bases up to 36, is described below.

Symbols are Unicode strings, written as a literal using single quotes: 'symbol'. They're used with the emit{} generator to emit instructions, and in export{} to identify the function name that should be exposed.

Constants consist of a value and a type. They appear when a value such as a number is cast, for example by creating a variable v:f64 = 6 or with an explicit cast{f64, 6}. For programming, constants work like registers (variables), so there's never any need to consider them specifically. Just cast a compile-time value if you need it to have a particular type—say, when calling a function that could take several different types.

Labels are for goto{} and related builtins described here.

Numeric literals

number     = repeat* ( scientific | base )
scientific = digit+ ( "." digit+ )? "e" "-"? digit+
base       = ( "0x" | digit+ "b" ) ( digit | letter )+
repeat     = digit+ ( "r" | "d" | "w" )

A number in Singeli starts with a digit but can contain letters, the decimal dot ., and an internal - as well; an initial - is not parsed as part of the number, but as a separate operator. Letters in numeric literals are case-insensitive and can contain underscores, which are ignored. Typical scientific notation such as 45 or 1.3e-12 is supported.

Integer hex literals can be written with x like 0xf3cc0, but b can be used for a more general base. A decimal number between 2 and 36 before b gives the base, like 2b110101 or 32b0jbm1. As with hex, digit 9 is followed by a.

An additional prefix allows repetition of digits to be specified. This repeats the mantissa digits, including leading 0s, as written, maintaining the base and exponent. The lowest-order digit always retains its position relative to the decimal point. Multiple prefixes can be specified as a form of documentation; they must have the same meaning.

Definition

The def statement, with syntax def target = value, makes compile-time definitions. A value defined this way is constant in its scope (following the rules of lexical scoping), with the exception that if it's a generator it can still be extended by generator definition statements (def target{params} = ...). More precisely, the definition remains constant at compile time. If it's a register, it will always refer to the same register, but the register can be mutable at runtime, allowing its value to be freely changed to another of the same type. Every copy of a mutable register made with def will reflect these changes, and assignments to it change the value.

Matching

The target of a definition can be a plain name, but other forms are allowed for destructuring tuples and types, and adding conditions. Some of these are more useful for generator parameters than def, and in fact the operators = and == aren't allowed at the top level of a def target.

These are the simple targets:

And these are the compound targets:

Each operator (=, ==, :, *, [k]) is right-associative, that is, the left-hand side can only be a simple target or tuple while the right-hand side extends to the end of the expression. For example, a:P=*V=[k]T groups as a:(P=(*(V=[k]T))), to match a variable a whose type is P, *V, and *[k]T.

Furthermore, any target may be followed by a condition such as if a < 5. Here if has to come after any operators, but it can appear in any component of a tuple matcher, or in the bracketed part [k] of a vector type matcher.

A tuple contains zero or more targets (and also can have an if condition inside even if there are no targets), and only matches a tuple or tuple type with the same length. However, it can also contain one "gathered" variable-length slot, indicated with a leading ..., that matches zero or more values—so that the total number of values allowed is the number of non-gathered targets, or more. These values are treated as a tuple. So the target {...a} is identical to a, except that it requires the matched value to be a tuple, and if it happens to be a tuple type it will be converted to a tuple of types.

A final implicit condition is that any name that appears more than once has to be assigned to matching values, so {a,a} matches tup{3,3} but not tup{3,4}. This doesn't apply to the placeholder _, so {_,_} matches either tuple.

Parenthesized expressions and conditions can freely use names from anywhere else in the target. This is accomplished by evaluating them after all other matching requirements are checked and names are defined. So {a if show{b>a}, b, b} is a perfectly fine target, and the show is only shown in cases where both b values match. The location of the if condition doesn't affect its meaning at all, except that the relative ordering of conditions controls what order they're evaluated in.

Note that the type in a:T is also a target! A target such as a:[8]i16 will assign to both a and i16. To specify the exact type, use parentheses like a:([8]i16).

Operators

Operators are formed from the characters !$%&*+-/<=>?\^|~. Any number of these will stick together to form a single token unless separated by spaces. Additionally, non-ASCII characters can be used as operators, and don't stick to each other.

The oper statement defines a new operator, and applies to all code later in the scope (operators are handled with a Pratt parser, which naturally allows this). Here are the two declarations of - taken from include/skin/cop.singeli.

oper -  __neg prefix      30
oper -  __sub infix left  30

The declaration lists the operator's spelling, generator, form (arity, and associativity for infix operators), and precedence. After the declaration, applying the operator runs the associated generator. More precisely, the generator name is looked up in the scope where the operator is used each time: by default the oper statement only associates the operator with a name and not any particular value. To assign a specific value instead, use a declaraction with the value—any expression—in parentheses such as oper - (__neg) prefix 30.

An operator can have at most one infix and one prefix definition. Prefix operators have no associativity (as operators can't be used as operands, they always run from right to left), while infix operators can be declared left, right, or none. With none, an error occurs in ambiguous cases where the operator is applied multiple times. The precedence is any number, and higher numbers bind tighter.

Parameters can be passed to operators before calling them, such as a -{b} c. This is converted to the generator call __sub{b}{a,c}. The arity is determined when it's called with operator syntax, and doesn't depend on any earlier calls with generator syntax.

Types

Singeli's type system consists of the following type-kinds: basic void and primitive types as well as compound types constructed from several underlying types.

Type kindDescriptionExample display
voidNothingvoid
primitiveA numberi32
vectorA list of same-type values[4]i32
pointerA pointer to memory*f64
functionA pointer to code(i8,u1)->void
tupleMultiple valuestup{u1,[2]u32}

The display isn't always valid Singeli code. Void and primitive types are built-in names but can be overwritten. The notation [4]i32 resolves to __vec{4,i32}, where __vec{} is also a built-in name. And * indicates the built-in __pnt{}, which isn't defined automatically but is part of skin/c. The notation given for functions can't be used, and the tuple tup{u1,[2]u32} is technically a different value from an actual tuple type but can be used as one where a type is expected.

Primitive types are written with a letter indicating the quality followed by the width in bits. The list of supported types is given below. Note the use of u1 for boolean data. A vector of booleans such as [128]u1 is one important use.

unsigned:   u1 u8 u16 u32 u64
signed int:    i8 i16 i32 i64
float:                f32 f64

A vector type indicates that the value should be stored in a vector or SIMD register, and the backend limits which vector sizes can be used based on the target architecture.

Functions

Generators are great for compile-time computation, but all run-time computation happens in functions. Functions are declared and called with parenthesis syntax:

fn times{T}(a:T, b:T) = a*b  # Type-generic function

fn square{T}(x:T) : T = {
  times{T}(x, x)
}

The body of a function can be either a plain expression like a*b above, or can be enclosed in curly braces {} to allow multiple statements. It returns the value of the last expression. The return type is given with : following the argument list, but can often be omitted (if it's left out and the body uses braces, = is also optional).

Function parameters like the {T} above are slightly different from arbitrary generator parameters: the function is only ever generated once for each unique set of parameters. Once generated, its handle is saved so that later calls return the saved function immediately (in contrast, a generator that declares a variable would make a new one each time). This avoids creating source code with lots of copies of functions, and also makes it possible for a function like square{T} to include recursion.

The parenthesis syntax to call a function is really just a nicer way to use the call{} builtin generator. As in a generator, such a call supports spread arguments with ....

A function argument can have a tuple type; in Singeli's compiled output it's flattened into multiple arguments. It can be made into a gathered argument with ... syntax: inside a function fn f(...a:tup{i8,i16}, b:i32), a is a tuple of registers, but the function is called as f(a0, a1, b) with a0:i8 and a1:i16. Since the length of a is known based on its type, any number of arguments can be prefixed by ..., unlike gathered parameters.

Export

The export builtin exports values for use in the calling language. In C this means a non-static constant with that name is defined in the output file.

export{'some_function', fn{i32}}     # Export as some_function()

export{tup{'fn', 'alias'}, fn{i16}}  # Export with two names

export{'twelve', cast{i16, 12}}      # Export a typed number

Registers

A function's arguments, and variables that it manipulates at runtime, are represented with typed slots called registers. Registers are first-class values, meaning they can be passed around and manipulated at compile time. For example, a generator can take a register as a parameter and set its value. Copies of registers, made by passing parameters or def statements, exhibit aliasing, like pass-by-reference.

In a function, registers can be declared using name : type = value syntax, where the type can be omitted if the value is already typed. The initial value is required. A function argument is also declared as a register, but it doesn't use an initial value as that's passed in when the function is called.

x:i32 = 25   # Type specified: numbers are untyped
y := x + 1   # Type inferred

The runtime value of a register can be changed with name = value syntax, with no :.

x = x + 1    # Increment

This does not change the compile-time value of x or y, which is a register. Declaration and reassignment do essentially the same thing at runtime, but at compile time they're two different things. Declaration is basically a def statement bundled with an assignment of the initial value. Assignment is an operation (in fact, a built-in operator) that acts on a register and a value, and can be used in generator calls. The left-hand side can be a full expression, as long as it resolves to a register at compile time—try (if (0) a; else b) = c for example. The built-in file skin/cmut (part of skin/c) defines generators for C operators +=, /=, >>=, and so on, so you can write:

x += 1
++x

At runtime, a register represents one value at a time, so whenever it's used it's the current value that will be visible. If you want to save the value somewhere, make another register, like x0 := x. This can be done even in a generator, so that def clone{old} = { new:=old } copies any typed value. So while skin/cmut doesn't define x++ because Singeli has no postfix operators, it's possible to make a generator that copies the parameter, increments it, and returns the copy.

Control flow

Singeli's built-in control flow statements are if-(else), and (do)-while. The semantics are just like in C, and the syntax is pretty similar too. You can use a single statement, or multiple statements enclosed in braces. Here are a few examples:

if (a < 4) a = a + 1

if (a > b) { c = a }
else { c = b }

do {
  doThing{c}
  c = c + 1
} while (c < a)

The result of a condition has to have the boolean type u1, or be a compile-time number with value 0 or 1. The number option allows if to be used as a compile-time switch: it won't compile the "if" block if the condition is 0, or the "else" block if it's 1.

In any if or while condition, the pseudo-operators and, or, and not can be used to make a compound condition. These are implemented by manipulating jumps, never with logic instructions. and and or are short-circuiting, meaning that they don't evaluate the second part of the condition if the result is determined by the first.

For loops

A for "loop" looks a lot like the for-each loops that are becoming common in high-level as well as low-level languages. Appearances are decieving, since it's really a special kind of generator call that can evaluate the block when it executes. But it covers the for-each functionality pretty well, with the right for generator.

# Loop over three pointers with a specific range
# Expressions in the descriptor like len/2 are only evaluated once
@for (dst, a, b over i from len/2 to len) {
  dst = a + b + i
}

# Use "in" to give the element a different name from the pointer
@other_for (x in src over n) x = 2*x

The name after @ is just a name, and is called as a generator. The following definition of for gives a typical C-like loop. It's passed a generator iter that evaluates the block; the details are discussed below.

def for{vars,begin,end,iter} = {
  i:u64 = begin
  while (i < end) {
    iter{i, vars}
    ++i
  }
}

Descriptor

The descriptor is the part in parentheses, and lists the variables and range to use for the loop. It provides the vars (a tuple of pointers), begin, and end parameters to the for generator above, and it defines which names are used in the main block of the loop, which then forms the iter parameter.

Here's the pattern. Square brackets [] indicate an optional part, and the means more names-maybe-in-pointers can appear, separated by commas.

[name ["in" pointer], … "over"]
  [index ["from" begin] "to"]
    end

That's a lot of options! You can have any number of pointers, and can leave the index out—although an oddity is that you have to include it if you want a starting value different from the default of 0. Some possibilities with the typical interpretations are listed below.

Other than the fixed default starting point of 0, the interpretation of the start and endpoint is all done by the generator that's used for the loop. It could ignore those values and always run exactly once, run backwards, or something else.

For generator

Once the descriptor and body are parsed, the following values are passed to whatever generator is named after @, and the result of the loop is whatever comes out. The names vars and so on are only for explanation, as they are nameless in Singeli compilation.

Most of the time the generator will evaluate the block—otherwise what's the point? This is done with iter{index, ptrs}, where ptrs is a tuple of pointers (it doesn't have match vars created by the for loop), and index is an index. This generator:

Here are some examples.

# The standard for loop, yet again
def for{vars,begin,end,iter} = {
  i:u64 = begin
  while (i < end) {
    iter{i, vars}
    ++i
  }
}

# Loop expanded at compile time, implemented with recursion
# Assumes begin and end are constant, to use compile-time if
# Each iter{} call compiles to code with a constant index
def for_const{vars,begin,end,iter} = {
  if (begin < end) {
    for_const{vars,begin,end-1,iter}
    iter{end-1, vars}
  }
}

# Loop over vectors of length vlen
def for_vec{vlen}{vars,begin,end,iter} = {
  # Cast each pointer to a vector
  def vptr{ptr:P} = reinterpret{*[vlen]eltype{P}, ptr}
  def vvars = each{vptr, vars}

  # Endpoints for vector part
  def vb = (begin+(vlen-1))/vlen  # Round up
  def ve = end/vlen               # Round down

  # Do the loops
  for{ vars, begin,   vlen*vb, iter}
  for{vvars, vb,      ve,      iter}
  for{ vars, vlen*ve, end,     iter}
}

for_vec showcases the flexibility of this approach. Since for is a normal generator, it can be called to avoid rewriting the same while-based logic everywhere. And since any pointer can be passed to iter, modified pointers vvars with a different type are fair game. This means iter will need to handle two different variable types, T and [vlen]T—fortunately Singeli is designed to support exactly this kind of polymorphism. And it's a requirement you control since you can decide what kind of for loop to use. Finally, for_vec takes another parameter before being called as a loop. It's invoked with, say, @for_vec{8} (…).

Match

The match structure performs compile-time case matching, similar to a switch-case statement. It wraps up the functionality of multiple def statements in an anonymous form. For example, this match statement:

def result = match (x, y) {
  {{a,b}, c} => b
  {a, {b,c}} => tup{a,c}
  {...args} => 0
}

Is equivalent to this sequence, except that it doesn't define temp.

def temp{...args} = 0
def temp{a, {b,c}} = tup{a,c}
def temp{{a,b}, c} = b
def result = temp{x, y}

Note the reverse order: in match, the first matching case will be used. The matched values ((x, y) above) can also be left out to create an anonymous generator. match (x, y) {…} behaves identically to (match {…}){x, y}.

Extend

The extend keyword allows for programmable generator extension, so that the same extension can be applied to multiple generators easily. It's mainly used for more "internal" Singeli definitions like in arch/c. In short, this repetitive code:

def sin{arg if arg<0} = -sin{-arg}
def tan{arg if arg<0} = -tan{-arg}

could be rewritten like this:

def extend odd{fn} = {
  def fn{arg if arg<0} = -fn{-arg}
}
extend odd{sin}
extend odd{tan}

The def extend statement creates a special kind of generator that can only be called by an extend statement. This statement (which can appear at the top level or anywhere else) looks like a generator call, but the parameters, sin and tan above, all have to be names—the extend mechanism allows the call to modify their values. The generator (odd above) can be any expression, but needs parentheses in most cases if it's compound. This means additional information can be passed in with extend (gen{args}){g0, g1}, where gen is an ordinary generator whose body returns a generator defined with def extend.

The top level

Top-level code is the statements that make up a file, local block, or body of an if_inline/else statement. Statements that could be used within a def or similar block context can also be used here, but there are some added features as well.

A different distinction prevents some features from being used at the top level. This is that code to be evaluated at runtime has to appear inside a function. This can be the main function, if the intent is to build a stand-alone executable. Also, a variable declared outside a function is constant and can't be modified with =. However, if it's a pointer to an array, it can be used in load and store instructions as usual.

Including files

The include statement can be used at the top level of a program. It evaluates the specified file as though it were part of the current one. The filename is a symbol, which loads from a relative path if it starts with . and from Singeli's built-in scripts (kept in the include/ source folder) otherwise. The option -l can be used to specify additional paths as well.

include 'arch/c'    # Built-in library
include './things'  # Relative path

As a result, an included file's definitions affect the file that includes it, any files that include that file, and other files included after that one.

Local

The far reaching consequences of file including might not be wanted for all definitions, so the local keyword restricts a top-level definition to apply only to the current file, and not anything else that includes it.

local def fn{a,b} = b              # Local generator
local b:u8 = 3                     # Local typed constant
local oper ++ merge infix left 30  # Local operator
local include 'skin/c'             # Local lots of operators

The local keyword restricts the scope of compile-time value and operator definitions. It doesn't do anything at runtime: all the functions and so on are still placed together in one big output file.

A file can only included once per scope; inclusions after the first will be ignored.

For larger sets of definitions, local also allows a block syntax. The contents of the block behave like a separate file included with local include, and more local statements are allowed inside—they'll apply inside the block but not to the rest of the file.

local {
  # All this stuff can be seen by the rest of the file,
  # but not outside it
  include 'skin/c'
  def t = 5
  def s = t + 1
}

if_inline

Top-level if statements act as if_inline currently; they will be changed to act as normal if soon.

An if_inline statement is allowed at the top level, and follows the same basic syntax as if but has very different evaluation rules. This system is quite a mess, and hopefully a better solution will be found eventually; for now, try to avoid these!

The condition of an if_inline statement is evaluated in a special scope that only has access to built-in Singeli definitions, so that for example if_inline (hasarch{'SSSE3'}) will work but not if_inline (debug) where debug has been defined previously. This is because the body of the statement doesn't have its own scope, but instead its definitions and operator declarations affect the scope it appears in—since the result of the if_inline statement isn't used, this is the only way for the body to define things. But if the condition could read from that scope, there's a circular dependency that can lead to paradoxes.

Main

In order to build stand-alone Singeli programs, a main block defines code that will be run when the program is invoked. A full version might look like this:

main(argc, argv) : i32 = {
  # Here argc:i32 and argv:**u8
}

The two arguments argc and argv are the number of command-line arguments and their values, as in C. Only their names can be specified; they'll implicitly be given the types i32 and **u8. Fewer names can be given (a single one will have type i32), and parentheses are optional if there are no arguments. The result type can be either i32 or void, and will be inferred as in a typical function if omitted. This means main can be as simple as single expression:

include 'debug/printf'
main = lprintf{'Hello'}

Config

The config statement allows compile-time configuration to be passed in as an argument to the build process. It has the same syntax as def except that the defined value must be a single name and not destructured. So config var = 4 ordinarily gives var the value 4. But if the argument -c var='conf' is passed when building, it'll be defined as 'conf' instead. The command-line argument can be any code and behaves as though it was written in the definition instead, which means it can access local operators and definitions.

Built-in generators

The following generators are pre-defined in any program. They're placed in a parent scope of the main program, so these names can be shadowed by the program.

Program

SyntaxResult
emit{type,op,args…}Call instruction op (a symbol, to be interpreted by the backend)
call{fun,args…}Call a function
return{result}Return result (optional if return type is void) from current function
export{name,value}Export value for use by the calling language
require{name}Require something from the calling language, such as a C header
load{ptr,ind}Return value at ptr+ind
store{ptr,ind,val}Store val at ptr+ind
makelabel{}Create a label value
setlabel{label}Set label to the current position
setlabel{}Short for setlabel{makelabel{}}
goto{label}Jump to the position set for a label

Architecture

SyntaxResult
setarch{feature…}Set the current function's feature set
addarch{feature…}Add features to the current function's set
hasarch{feature…}Return 1 if all features are present and 0 otherwise
listarch{}Current feature set as a tuple of symbols
witharch{fn,feat…}Return a generator like generated function fn but with the given feature set

An architecture feature is an uppercase symbol such as 'AVX2'. Each function is created with a set of such features, which can be set outside the function using witharch{} or inside using setarch{} and addarch{}.

Values

SyntaxResult
is{a,b}Return 1 if the parameters match and 0 otherwise
hastype{val}Return 1 if val is a typed value (constant, register, function)
hastype{val,type}Return 1 if val is a typed value of the given type
type{val}Return the type of val
kind{val}Return a symbol indicating the kind of value
__set{reg,val}Set value val for register reg, same as reg = val.
undefined{type}Unspecified or uninitialized value of the given type
undefined{type,len}Pointer to undefined array with the given element type and length
show{vals…}For debugging: print the parameters, and return it if there's exactly one

Possible kind results are number, constant, symbol, tuple, generator, type, register, and function.

Types

SyntaxResult
width{type}The number of bits taken up by type
eltype{type}The underlying type of a vector or pointer type
vcount{[n]t}The number of elements n in a vector type
cast{type,val}val converted to the given type
quality{type}Quality of primitive type: unsigned 'u', int 'i', or float 'f'
isfloat{type}1 if type is floating point and 0 otherwise
issigned{type}1 if type is signed integer and 0 otherwise
isint{type}1 if type is integer and 0 otherwise
typekind{type}A symbol indicating the nature of the type
primtype{q, w}The primitive type with quality q and width w
__pnt{t}Pointer type with element t
__vec{n,t}Vector type with element t and length n; equivalent to [n]t
tuptype{t…}Tuple type with the given element types
fntype{t…, res}Function type with the given argument types and result type res
unfntype{ft}Decompose a function type, giving fntype parameters as a tuple

Possible typekind results are void, primitive, vector, pointer, function, and tuple.

Typecasts

Casting generators convert untyped constants to typed, or convert between types, under specific conditions. In each case an untyped constant can be converted to any type that contains it. The conditions and behavior for converting between types are shown below.

SyntaxRequirement if typedConversion
cast{type,val}val's type matches typeSame value
promote{type,val}val's type is a subset of typeSame value
reinterpret{type,val}val's type has the same width as typeSame binary representation

Generators

SyntaxResult
bind{gen,param…}gen{param…, ...}
withenv{name,value,gen}gen but with environment name set to value during evaluation
getenv{name}Value for name in environment
memoize{gen}gen but return a saved result if parameters match a previous call

The program's environment used by withenv and getenv associates symbols (name in the table) with values. It's a form of dynamic scoping, in that associations are created and freed according to the generator call stack.

Tuples

SyntaxResult
tup{elems…}Create a tuple of the parameters
range{len}The tuple 0, 1,… len-1
merge{tups…}Concatenate elements from multiple tuples into one tuple
length{tuple}Return the length of a tuple or symbol
select{tuple,ind}Select the indth element (0-indexed) from the tuple
slice{tup,start,end}Take slice from start to (optional) end, like Javascript
apply{gen,tuple}gen{...tuple}
each{gen,tuple…}Map a generator over the given tuples
symchars{symbol}Return the characters of a symbol, as a tuple of symbols
fmtnat{num}Format natural number as symbol
findmatches{i, f}Return a tuple with, for each element of f, the indices of all matching elements in i

Generators merge{} and slice{} also work if applied to symbols instead of tuples.

Arithmetic

Arithmetic functions are named with a double underscore, as they're meant to be aliased to operators. The default definitions work on compile-time numbers, and sometimes types. The definitions fn{x} or fn{x,y} for numbers are shown, with a C-like syntax plus ** for exponentiation and // for floored division.

__neg__shr__shl__add__sub__mul__div__mod
-xx//2**yx*2**yx+yx-yx*yx/yx%y
__and__or__xor__not
x&yx|yx^y!x
__eq__ne__lt__gt__le__ge
x==yx!=yx<yx>yx<=yx>=y

The following builtins do the same thing as C's math.h functions with the same names.

__abs__floor__ceil__min__max
Absolute valueFloorCeilingMinimumMaximum

Built-in arithmetic is pervasive over tuples, meaning that if one or both arguments are tuples it will map over them, recursively until reaching non-tuples.

Command-line options

The arguments to the singeli command are input files and options in any order. Options all have a short form with one - and a long form with two. All except -h are followed by an additional argument.

-h, --help: Print short descriptions for all compilation options.

-o, --out: File path for compiled output; otherwise print directly to stdout.

-t, --target: Output type: c for C code, cpp for C++ (avoiding "crosses initialization" errors) and ir for Singeli IR. The IR format may not be stable, and that setting is currently just used for development.

-a, --arch: List of architecture features in the target system.

-i, --infer: Inference type, or when to assume one architecture feature implies another: strict to use only specified dependencies, loose to assume based on currently-existing architectures.

-l, --lib: Library paths to search in include statements. So -l path means include 'x' will check for lib/x.singeli, and -l lib=path means include 'lib/x' will check for path/x.singeli. All paths implied by -l are searched in order, followed by Singeli's built-in includes.

-c, --config: Specify the value of a config variable. For example, config var=4 normally acts as def var=4, but with -c var='conf' it will act as def var='conf' instead.

-p, --pre: A preamble to be placed before the emitted C output.

-n, --name: Prefix to use for C functions and global variables, instead of si_ (short for Singeli). If multiple files built with Singeli are to be included in the same project, this helps to avoid name conflicts.

-os, --show: Destination for show{} builtin. Set to stdout by default; may also be set to stderr, none to discard it, file= followed by a path to write to a file on program exit, or bqn to return all output as a BQN result, allowing it to be read by a BQN program that calls Singeli.

-oe, --errout: Destination for errors when the compilation fails. Set to stderr by default; other options same as -os. With bqn set, an error exits the current compilation rather than the program as a whole, and if there's no error causes the compilation to return 1 unless -os bqn is set.

-d, --deplog: A file path. A list of dependencies—files read while compiling—will be placed in this file. This way an incremental compiler framework can check these files to see if this compilation needs to be re-run.