Home

Awesome

swervec: The Swerve compiler

Since this project is a work in progress, a lot of the stuff in here is subject to change and can become outdated

About this project

This is a simple compiler for a made up programming language, Swerve.

The compiler phases are planned to be:

1. Lexical Analysis
2. Parsing
3. Semantic Analysis
4. IR generation
5. Optimization
6. Code generation

Although the phases can, and most likely will, change as the project goes on.

The goal is to create a somewhat practical language that is statically typed and is influenced by syntactic features of other languages.

Memory management is TBD

Parsing Strategy: Recursive Descent, LL(1)

The parser produces an AST.

Examples of the AST structure can be found here.

Testing strategy

This project has both unit and integration tests.

Unit tests will be used to test simple cases, and integration tests will be used to handle more complex ones (such as testing variable scope)

There are no coverage requirements, but the objective is to cover enough edge cases that someone would be confident different parts of the compiler will work

Testing commands

Testing Objectives

  1. Provide proof of correctness for implementation of compiler and language ideas
  2. Handle regressions breaking existing code
  3. Document compiler and language features/structure through usage
  4. Catch bugs as early as possible

Language specification

Statements in the language must end with a semicolon

Statements are:

Reserved words:

Operators:

Comments

Inline comments can be made with // which will cause the compiler to ignore the rest of the line (unless // is in a string)

Multiline comments can be made as starting with /* and ending with */. Unterminated multiline comments are a syntax error.

Type System

Swerve is intended to be a statically, strongly typed language. Variable types and function return types must be explicitly declared.

Basic types include

Array types are dependent on the depth (level of nesting) and the type of data stored in the array. For example, Array<Array<int>> is used to store an array of arrays of ints.

Typecasting is only allowed via builtin functions

Operations on types

Binary operators

Unary operators

Other type requirements

    if (boolean) {
      // ...
    }
    else if (boolean) {
      // ...
    }

Variable Declarations

variables declarations must have a type and can optionally be const

Examples:

A valid variable name can contain uppercase letters, lowercase letters, numbers, and underscores, but must start with an uppercase or lowercase letter.

Once a variable is declared, you can reassign it (unless it's a const variable). Re-declaring, by specifying a type before it, is an error unless it's part of a loop condition.

Arrays

Arrays can be declared using the Array keyword and must specify their types by wrapping the type in < and >. There are a few different ways to declare arrays that change their behavior.

  1. immutable arrays
  1. mutable arrays

Strings

Regular strings, like in many other languages, must be wrapped in double quotes and can not contain any nested double quotes unless those are escaped

Examples:

Another way to declare strings is to use multiline strings. The multiline strings in Swerve were inspired by the syntax of multiline comments, so they operate in a similar way. You can start multiline string with /" and terminate them with "/. Multiline strings allow you to nest quotes without escaping them, and they preserve any whitespace characters such as tabs and newlines. Unterminated multiline strings are a syntax error!

Examples:

Function declarations

Functions are declared with the fn keyword followed by the name, and then parameters enclosed in parenthesis. A function body starts with an opening curly brace and must be terminated by a closing curly brace.

Functions with that are not void must specify the return type by adding : after the closing parenthesis of the parameters and then the type. Void functions can skip the : and type.

All parameters must be valid variables and have a type to the left of them.

Examples:

Function names can be reused only with different parameters.

Valid Example:

Invalid Example:

The order of function declarations matter so if you have a function calling another one, the caller must be defined after the called function.

Prototypes and generics

There are certain times when the developers will want to pass in Arrays as parameters to functions but don't necessarily care what's in the arrays.

For example, when you want to get the length of an array, it doesn't matter whether the array is of type Array<string> or Array<Array<int>>.

In order to add flexibility to the type system without completely giving up type safety, two new keywords were introduced, prototype and generic.

Generics

The generic keyword is a type label that can be placed in front of a variable during declaration that states that we'll figure out the type later on.

You cannot perform any operations on generic variables that aren't compatible with every type.

Example:

  generic var1;
  generic var2;
  // ...assign them values at some point
  var1 + var2;

The above example will produce an error because you aren't allowed to do addition with generics.

Generic arrays are declared by using Array<generic> which let you treat the variable as an array without caring about what subtype it contains.

When you index a generic array, it will return a generic type. You can't index Array<generic> twice, but you can index Array<Array<generic>> twice.

Prototypes

A prototype is a function definition that contains one or more generic parameters. It can return nothing, a concrete (non-generic) type, or a generic type.

The use of the generic keyword outside a prototype definition is not allowed. You cannot initialize a generic to any literal type except for null.

When a prototype is called with concrete types, the semantic analyzer will re-analyze it with the concrete parameter types in place of the generic ones to make sure it's type safe and to place any local variables that were generic as concrete types in the symbol table. Generic returns are analyzed to approximate their return types and transformed into functions.

Translation to functions (AKA Monomorphization)

On a function call:

  1. Check the function part of the symbol table for a matching definition; if found, return that
  2. Check prototypes part of symbol table for a matching definition; if not found, throw error
  3. If prototype returns nothing or concrete type, run process of type checking local variables and de-genericizing them in the symbol table, return the return type
  4. If prototype returns generic type (generic or array of generics), do the same analysis of step 3 but also derive a concrete return type
  5. Return the first non-null return type derived from step 4; if there are more than one non-null types, throw an error

Since generics can call other generics, this process is inherently recursive.

Prototypes can also be directly or indirectly (they call functions that call them) recursive, we need to store a queue of the prototypes being translated and if we detect a duplicate translation, we return null for that call.

Prototypes can slow compilation due to this process and are still less type safe than normal functions so use them sparingly and only when the types truly don't matter

Program Structure

All programs must have an entry point into the execution of the program. That's why the compiler requires you to define a main function.

The main function can take one of a few forms:

  1. no params, no return
  2. no params, return int
  3. int param and Array<string> param, no return
  4. int param and Array<string> param, int return

The return from the main function is treated as the program's return code. If your main doesn't return, it will automatically have a return code of 0 (successful execution).

The parameters specified in main function forms 3 and 4 are used to deal with command line arguments.

Typically, they'll look like this:

fn main(int argc, Array<string> argv) {}

argv is an array of the command line arguments (separated by space) and argc is the length of argv.

You don't have to use the parameter names, but those names are common practice in C, which is an inspiration for this language.

The only code allowed outside a function are variable declarations (including arrays). Any variables declared outside a function will be global; while any variables inside a function are local to the function.

Built-ins

There are built-in variables, functions, and prototypes in order to make development easier. These names are reserved so re-defining them is an error.

Variables

nametypedescription
INT_MINintminimum value of an integer
INT_MAXintmaximum value of an integer
DOUBLE_MINdoubleminimum value of a double
DOUBLE_MAXdoublemaximum value of a double

Functions

nameparam typesreturn typedescription
printerrstringnoneprint to stderr
printerrstring, intnoneprint to stderr and terminate the program with an exit code
lengthstringintget length of string
maxint, intintcompare two values and return the greater one
maxdouble, doubledoublecompare two values and return the greater one
maxint, doubledoublecompare two values and return the greater one
maxdouble, intdoublecompare two values and return the greater one
minint, intintcompare two values and return the lesser one
mindouble, doubledoublecompare two values and return the lesser one
minint, doubledoublecompare two values and return the lesser one
mindouble, intdoublecompare two values and return the lesser one
replacestring, string, stringstringReplace the first matching substring with a different substring
replaceAllstring, string, stringstringReplace all matching substrings with a different substring
splitstringArray<string>Split string character by character
splitstring, stringArray<string>Split string by delimiter
slicestring, intstringCreate string from the index to the end of the string
slicestring, int, intstringCreate string from start to end index of string
containsstring, stringbooleancheck if string contains substring
toIntdoubleintreturn int from double (rounds down)
toIntstringintreturn int from string
toDoubleintdoublereturn double version of int
toDoublestringdoublereturn double from string
atstring, intstringindex string
joinArray<string>, stringstringCombine strings into one string with delimiter
reversestringstringreverse string
startsWithstring, stringbooleancheck if string starts with substring
endsWithstring, stringbooleancheck if string ends with substring
sleepdoublenonepause execution for specified amount of time (seconds)
sleepintnonepause execution for specified amount of time (seconds)
exitnonenonestop the program with exit code 0
exitintnonestop the program with supplied exit code
fileExistsstringbooleancheck if the file exists
readFilestringArray<string>Get contents of file line by line
writeFilestring, stringnoneWrite to the file
appendToFilestring, stringnoneAdd to the end of the file
renameFilestring, stringnoneChange the name of the file
deleteFilestringnoneDelete the file
getEnvstringstringGet a value from a specified environment variable
setEnvstring, stringnoneSet the value of an environment variable

Prototypes

nameparam typesreturn typedescription
printgenericnoneprint to the screen
printlngenericnoneprint to the screen with new line character
getTypegenericstringGet the type of the variable or constant
lengthArray<generic>intReturn length of array
capacityArray<generic>intGet the total size of the array
sliceArray<generic>, intArray<generic>Create an array from the index to the end of the array
sliceArray<generic>, int, intArray<generic>Create array from start to end index of array
containsArray<generic>, genericbooleanCheck if the array contains an element
appendArray<generic>, genericnoneAdd an element to the end of an array
prependArray<generic>, genericnoneAdd an element to the beginning of an array
insertArray<generic>, generic, intnoneAdd element at index
removeIndexArray<generic>, intnoneRemove element at index from array
removeArray<generic>, genericnoneRemove first matching element from array
removeAllArray<generic>, genericnoneRemove all matching elements from array
indexOfArray<generic>, genericintGet the index of an array element; return -1 if not found
toStringgenericstringreturn string version of parameter
reverseArray<generic>noneIn place reversing of array
sortArray<generic>noneIn place sort an array