Home

Awesome

This repo is a prototype of a VM intended to serve as the backend of a compiler/interpreter.

Virtual Machine Internals

This is a stack-based VM which takes some inspiration from the JVM and the CPython VM. It is written in C and designed to be the backend for this compiler.

This repo may eventually be merged into the compiler repo.

The VM is made up of the following parts:

Arrays and memory management

Arrays are references to a list of values stored in either a frame's locals or the VM globals. When an array is created, the reference records the destination of the values, the offset into that destination (the value of lp at creation), the array size (total amount of memory allocated), and the array length (number of elements actually in the array).
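The fields described above could be laid out roughly as follows. This is a minimal sketch, not the VM's actual definition: the names `ArrayRef`, `array_new`, and `array_push` are hypothetical, `DataConstant` is reduced to a single field for brevity, and size is counted in slots rather than bytes.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for the VM's value type (the real one is 32 B). */
typedef struct { long value; } DataConstant;

/* Hypothetical array reference: where the values live and how many there are. */
typedef struct {
    DataConstant *dest;  /* destination region: a frame's locals or the VM globals */
    size_t offset;       /* index of the first element (value of lp at creation) */
    size_t size;         /* slots reserved for the array */
    size_t length;       /* slots actually in use */
} ArrayRef;

/* Reserve `size` slots starting at *lp and advance lp past them. */
ArrayRef array_new(DataConstant *locals, size_t *lp, size_t size) {
    assert(locals != NULL && lp != NULL);
    ArrayRef ref = { locals, *lp, size, 0 };
    *lp += size;
    return ref;
}

/* Append a value; returns 0 on success, -1 if the array is full. */
int array_push(ArrayRef *ref, DataConstant v) {
    if (ref->length == ref->size) return -1;
    ref->dest[ref->offset + ref->length] = v;
    ref->length++;
    return 0;
}
```

Keeping size and length separate lets an array grow in place until the reserved slots run out.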

By default, arrays are stored in the local frame. When the frame returns, all of its locals are deallocated, so the array memory is cleaned up with them. A frame that returns an array copies the array's memory to the caller's frame before it is deallocated. When you store an array reference globally, its values are copied to the VM globals, the pointer is changed to point to the globals, and the offset is set to the value of gp.
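Storing an array globally can be pictured as a copy followed by repointing the reference. The sketch below assumes a hypothetical `ArrayRef` layout and a hypothetical helper `array_store_global`; advancing gp past the reserved slots is also an assumption, since the text only says the offset becomes the value of gp.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for the VM's value type. */
typedef struct { long value; } DataConstant;

/* Hypothetical array reference (destination, offset, size, length). */
typedef struct {
    DataConstant *dest;
    size_t offset, size, length;
} ArrayRef;

/* Copy the array's values into the globals at *gp, repoint the reference,
 * and advance gp. Returns 0 on success, -1 if the globals lack room. */
int array_store_global(ArrayRef *ref, DataConstant *globals,
                       size_t globals_cap, size_t *gp) {
    assert(*gp <= globals_cap);
    if (ref->size > globals_cap - *gp) return -1;
    memcpy(globals + *gp, ref->dest + ref->offset,
           ref->length * sizeof(DataConstant));
    ref->dest = globals;   /* pointer now refers to the globals */
    ref->offset = *gp;     /* offset becomes the value of gp */
    *gp += ref->size;      /* assumption: gp moves past the reserved slots */
    return 0;
}
```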

The configuration file .swerve_vm_config.yml lets you adjust the behavior of arrays. The DynamicResourceExpansion setting gives you the option of using less memory upfront and expanding as needed; it is on by default. If DynamicResourceExpansion is off, the soft maximums are ignored. The HeapStorageBackup setting gives you the option of storing array values in the globals if the array is too big for the locals; it is also on by default. You can adjust the number of frames and the size of the VM globals, frame locals, and frame stack in this file.

The HeapStorageBackup setting applies only to array values; local variables cannot be backed up by the heap.

Going over the configured hard limits will crash the program with an out of memory error. Setting the amount of allocated memory too high or too low will also result in a memory error.

| Memory Region | Minimum | Maximum |
|---|---|---|
| frames | 1 | 16,384 |
| stack | sizeof(DataConstant) | 64 KB |
| locals | sizeof(DataConstant) | 1 MB |
| globals | sizeof(DataConstant) | 32 GB |

sizeof(DataConstant) is 32 B on a 64-bit machine, so the minimum value there is 32. It may vary on machines with a different word size.
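A range check against these limits might look like the sketch below. The `VmLimits` struct and `limits_valid` function are hypothetical, the sizes are assumed to be in bytes with 1 KB = 1024 B, and the 32 B minimum is the 64-bit figure quoted above.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* sizeof(DataConstant) on a 64-bit machine, per the text. */
#define DATA_CONSTANT_SIZE 32u

/* Hypothetical container for the configurable sizes. */
typedef struct {
    uint64_t frames;         /* number of frames */
    uint64_t stack_bytes;    /* frame stack size */
    uint64_t locals_bytes;   /* frame locals size */
    uint64_t globals_bytes;  /* VM globals size */
} VmLimits;

int in_range(uint64_t v, uint64_t min, uint64_t max) {
    return v >= min && v <= max;
}

/* Returns 1 if every value lies inside the limits from the table, else 0. */
int limits_valid(const VmLimits *c) {
    assert(c != NULL);
    return in_range(c->frames, 1, 16384)
        && in_range(c->stack_bytes, DATA_CONSTANT_SIZE, 64ull * 1024)
        && in_range(c->locals_bytes, DATA_CONSTANT_SIZE, 1024ull * 1024)
        && in_range(c->globals_bytes, DATA_CONSTANT_SIZE,
                    32ull * 1024 * 1024 * 1024);
}
```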

Function calls and returns

As stated above, when a non-built-in function is called, a new frame is loaded and the pc of the current frame is saved as the return address of that frame.

Functions are called as CALL [function name] [argc], where [function name] is replaced by a function name and [argc] is replaced by the number of parameters your function takes (both without square brackets). The number of arguments tells the VM how many values to pop off the stack; these popped values are treated as the function parameters. Since a stack is "last in, first out", you must push your parameters in reverse order. The parameters are added as local variables of the new frame.
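The argument handling can be sketched as below. The `Frame` layout and `call_enter` are hypothetical and values are simplified to `long`; the point is only that the first value popped becomes local 0, which is why parameters are pushed in reverse order.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_SLOTS 16

/* Hypothetical frame: an operand stack, locals, and a return address. */
typedef struct {
    long stack[MAX_SLOTS];
    size_t sp;             /* number of values on the operand stack */
    long locals[MAX_SLOTS];
    size_t return_addr;    /* pc to resume at when this frame returns */
} Frame;

/* Prepare `callee` for a call taking `argc` arguments from `caller`'s stack.
 * Returns 0 on success, -1 if the caller's stack holds too few values. */
int call_enter(Frame *caller, Frame *callee, size_t argc, size_t caller_pc) {
    assert(caller != NULL && callee != NULL);
    if (argc > caller->sp || argc > MAX_SLOTS) return -1;
    callee->sp = 0;
    callee->return_addr = caller_pc;
    for (size_t i = 0; i < argc; i++) {
        caller->sp--;
        callee->locals[i] = caller->stack[caller->sp];  /* first pop is local 0 */
    }
    return 0;
}
```

For a two-parameter function, pushing the second argument and then the first leaves the first argument on top, so it lands in local 0.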

When a function returns, the top of the stack is popped as the return value, the frame is popped off the call stack, and the pc is set to the frame's return address. The return value is then pushed onto the top of the stack, unless it is of type None (indicating a void return).
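A rough sketch of the return sequence follows. The `Value`, `Frame`, and `call_return` names are hypothetical, and the sketch assumes the return value ends up on the caller's operand stack, which is how the description reads once the callee's frame is gone.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_SLOTS 16

/* Hypothetical tagged value; TYPE_NONE marks a void return. */
typedef enum { TYPE_NONE, TYPE_INT } ValueType;
typedef struct { ValueType type; long i; } Value;

/* Hypothetical frame with an operand stack and a return address. */
typedef struct {
    Value stack[MAX_SLOTS];
    size_t sp;
    size_t return_addr;
} Frame;

/* Return from frames[*fp] to frames[*fp - 1]; yields the pc to resume at. */
size_t call_return(Frame *frames, size_t *fp) {
    assert(*fp > 0 && frames[*fp].sp > 0);
    Frame *callee = &frames[*fp];
    callee->sp--;
    Value result = callee->stack[callee->sp];  /* pop the return value */
    size_t pc = callee->return_addr;
    *fp -= 1;                                  /* pop the frame */
    Frame *caller = &frames[*fp];
    if (result.type != TYPE_NONE) {            /* None means nothing is pushed */
        assert(caller->sp < MAX_SLOTS);
        caller->stack[caller->sp] = result;
        caller->sp++;
    }
    return pc;
}
```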

Built-in functions are somewhat similar in behavior. The only difference is they don't create a frame; instead they pop parameters, run some C code, and push the return value onto the stack (if not None).
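One way to picture a frameless built-in call is a C function pointer that receives the popped parameters. Everything named here is hypothetical (`Value`, `Stack`, `BuiltinFn`, `call_builtin`, and the `builtin_add` helper, which exists only to show a non-None result); `builtin_println` is a simplified stand-in for the real println.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

#define MAX_SLOTS 16

typedef enum { TYPE_NONE, TYPE_INT } ValueType;
typedef struct { ValueType type; long i; } Value;
typedef struct { Value slots[MAX_SLOTS]; size_t sp; } Stack;

/* Hypothetical signature shared by all built-ins. */
typedef Value (*BuiltinFn)(const Value *args, size_t argc);

/* Simplified println: prints each integer argument, returns None. */
Value builtin_println(const Value *args, size_t argc) {
    for (size_t i = 0; i < argc; i++) printf("%ld\n", args[i].i);
    Value none = { TYPE_NONE, 0 };
    return none;
}

/* Illustrative built-in with a non-None result. */
Value builtin_add(const Value *args, size_t argc) {
    Value sum = { TYPE_INT, 0 };
    for (size_t i = 0; i < argc; i++) sum.i += args[i].i;
    return sum;
}

/* Pop argc parameters, run the C code, push the result unless it is None.
 * Returns 0 on success, -1 on stack underflow or overflow. */
int call_builtin(Stack *s, BuiltinFn fn, size_t argc) {
    assert(s != NULL && fn != NULL);
    if (argc > s->sp) return -1;
    Value args[MAX_SLOTS];
    for (size_t i = 0; i < argc; i++) args[i] = s->slots[--s->sp];
    Value result = fn(args, argc);
    if (result.type == TYPE_NONE) return 0;
    if (s->sp == MAX_SLOTS) return -1;
    s->slots[s->sp++] = result;
    return 0;
}
```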

When a function call happens, the VM first checks whether the name matches any of the built-in function names. If not, the VM searches the source code for defined functions and returns the index of the chunk of source code representing that function. Functions must be separated by new lines. The name of the function must be followed by a colon and then the instructions.
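The lookup order can be sketched as follows. The helper names `is_builtin` and `find_function` are hypothetical, the built-in table lists only println because it is the only built-in named here, and the sketch returns a line index where the VM returns the index of a chunk of source.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Only println is named in the text; a real table would be longer. */
const char *BUILTINS[] = { "println" };

/* Returns 1 if `name` is a built-in function, else 0. */
int is_builtin(const char *name) {
    assert(name != NULL);
    for (size_t i = 0; i < sizeof BUILTINS / sizeof BUILTINS[0]; i++)
        if (strcmp(BUILTINS[i], name) == 0) return 1;
    return 0;
}

/* Returns the index of the line that reads "name:", or -1 if none does. */
int find_function(const char *const *lines, size_t n, const char *name) {
    size_t len = strlen(name);
    for (size_t i = 0; i < n; i++) {
        const char *l = lines[i];
        /* The whole line must be the name followed by a single colon. */
        if (strncmp(l, name, len) == 0 && l[len] == ':' && l[len + 1] == '\0')
            return (int)i;
    }
    return -1;
}
```

A caller would try `is_builtin` first and fall back to `find_function` only when it returns 0.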

Example:

This byte code

plus_one:
    LOAD 0
    LOAD_CONST 1
    ADD
    RET

caller:
    LOAD_CONST 1
    CALL plus_one 1
    CALL println 1
    HALT

is equivalent to this source code

    fn plus_one(int num): int {
        return num + 1;
    }

    fn caller() {
        println(plus_one(1));
        exit();
    }

println is a built-in function

Byte code

White space is used to denote the end of a function; each function must be separated by one or more new lines. Other than that, white space is not consequential, but it's a good idea to use spacing and indentation.

Any lines starting with a semicolon are ignored.

Any line that starts with a dot and ends with a colon is used as a jump point. Jump points are used by the jump instructions (JMP, JMPT, JMPF, SJMPT, SJMPF) to move the pc non-sequentially. All jumps must be explicitly defined; otherwise, the pc moves to the next instruction it sees. When the VM reaches a jump label during normal execution, it ignores the label and moves straight to the next instruction.
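The line rules above could be applied with a small classifier like this one. `LineKind` and `classify_line` are hypothetical names, and ignoring leading and trailing white space before testing the first and last characters is an assumption based on indentation being inconsequential.

```c
#include <assert.h>
#include <ctype.h>
#include <string.h>

typedef enum {
    LINE_BLANK,        /* separates functions */
    LINE_COMMENT,      /* starts with a semicolon, ignored */
    LINE_LABEL,        /* starts with a dot and ends with a colon */
    LINE_FUNCTION,     /* function name followed by a colon */
    LINE_INSTRUCTION   /* anything else */
} LineKind;

/* Classify one line of byte code text. */
LineKind classify_line(const char *line) {
    assert(line != NULL);
    while (isspace((unsigned char)*line)) line++;          /* skip indentation */
    size_t len = strlen(line);
    while (len > 0 && isspace((unsigned char)line[len - 1])) len--;
    if (len == 0) return LINE_BLANK;
    if (line[0] == ';') return LINE_COMMENT;
    if (line[len - 1] == ':')
        return line[0] == '.' ? LINE_LABEL : LINE_FUNCTION;
    return LINE_INSTRUCTION;
}
```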

Exit codes

Any other exit codes will come from C functions or C itself

Dependencies

Running the VM

valid start commands: