Home

Awesome

Summus

Basic compiler frontend using LLVM as backend written without C macros (includes and simple defines don't count 😊) so it is easy to read and understand.

Mission

Create and refine a most basic compiler using LLVM as backend that can easily be used as a starting point for any kind of compiler and can also be used in learning the compiler construction basics.

Feel free to open issues if you feel any part of the code could be better documented.

Requirements and compiling

Windows

If you install Clang to C:\Program Files\LLVM\ the VS project should just work. Otherwise you will have to change project library path under linker options. Also in case you get a lot of linker errors try changing Runtime Library in project options, under C/C++ -> Code Generation, to /MD or /MT.

You could also try to setup official Clang build from llvm.org but I found it is not as straightforward as the unofficial build I linked above.

Linux

You can use clangCompile.sh or gccCompile.sh scripts to compile the compiler. You will get the binary output in the bin/ subdirectory.

Features

Language at the moment supports:

The goal of this project is not to create a perfect new language but to try and create a perfect compiler for minimal possible language.

Character literal examples

C charSummus char
'a'@a
'z'@z
'\n'@\n
'\t'@\t
'@'@@
'\x20'@\x20
'\20'@\16
'\'@\
' '@ 

Commands

Once you build summus compiler you can use these commands with it:

Here are some useful commands you can run on that output ll file:

Also if you want to experiment and discover what kind of llvm code needs to be written for certain constructs in C and C++ you can write the code you want to compile in a test.cpp file and then run:

For this you need llc built with cpp option which isn't always included in binary builds of LLVM so you may need to compile LLVM manually in order to get it.

Compiler structure

Compiler source is in compiler directory but it also uses smmgvpass from utility folder:

Test folder contains code and samples for automatic tests

Example improvement: How I added support for if statement

You can see the exact changes mentioned bellow in a commit called "Adds support for if and while statements" from January 14th 2017.

  1. First I decided what the sintax of if statement I want. I never liked parantheses around conditions so I decided I want this format:
    if condition then statement; else statement; // statement can also be { block }
  2. This means I first need to define 3 new keywords in lexer: if, then and else.
  3. In smmlexer.h I add three new values to SmmTokenKind enum: tkIf, tkThen and tkElse.
  4. In smmlexer.c I add string representations of these new enum values to tokenTypeToString array.
  5. I add definitions of new keywords to keywords array within initSymTableWithKeywords function in the same file.
  6. Within parser I need to have a node that represents if statement. That node must, besides a pointer to next statement, also have pointers to then and else statement so I need to define a new node type in smmparser.h. The new node type is called SmmAstIfWhileNode because I can reuse it for while statement as well. I also add new enum value to SmmAstNodeKind: nkSmmIf.
  7. In smmparser.c I first add string representations of new node kind values to nodeKindToString array. Since I am adding new statement I need to handle it in function parseStatement so I add a case for tkSmmIf token and make it call a new function: parseIfWhileStmt. In it I call parseExpression to parse the condition, then I check if tkSmmThen token is present and after that I call parseStament to parse if body. If after this the next token is tkSmmElse I again call parseStatement to parse the else part and I link it all into a newly created if node. Thus I get AST representation of if statement.
  8. To make sure all this works I now add handling of this new node kind to smmgvpass.c so that I can print it and see how it looks like. I just add some code under case nkSmmIf inside processStatement function to print if node itself and call processExpression and processStatement in order to print nodes for its condition and body.
  9. I compile all this, write some sample if/then code in inputfile.smm and I run summus -pp1 inputfile.smm | dot -Tsvg -oast.svg to generate an image of AST tree.
  10. After all that works I can add handling to further passes. First is smmtypeinference.c. Like every pass it has processStatement function and within it I add a new case for nkSmmIf node where I just call processExpression and processStatement to do type inference for its condition and body.
  11. Next is smmsempass.c where again I add handling of new node under processStatement function. In processIfWhile function that I call there I also just call processExpression and processStatement for condition and body which will check if all operand types are correct.
  12. At this point I can compile summus and run summus -pp2 inputfile.smm | dot -Tsvg -oast.svg and the same with -pp3 parameter to see how AST looks after each pass.
  13. Last pass is smmllvmcodegen.c where I need to construct LLVM module. In new processIf function that I again call from processStatement I first build LLVMBasicBlock for then body, for else body if it is given and for code that comes after if statement. If root of condition node is logical and or or node I call processAndOrInstr function directly because I just need conditional jumps it generates while I call processExpression for all other types of nodes. If I also called processExpression for logical nodes I would get a node that represents a resulting value of the condition (which is true or false) and then generate additional conditional jump based on that value which creates some extra instructions that I don't need. After this I position instruction building in then body block and call processStatement to generate instructions for it. At the end I add a branch instruction which jumps to end block. If there is an else body I position builder in its block and again call processStatement after which I also add a branch instruction that jumps to end block. At the end I position the builder in end block so the rest of code is generated in it.

This is where I stopped and left the rest of the work for the reader. The next step is to test different kind of invalid code and see how summus compiler handles it like missing then and similar. Once you are happy how it does that you need to write automatic tests for it:

  1. First you write a sample for all invalid situations and a sample for valid combinations of if statement inside tests/samples.
  2. You add handling of if statement to smmastwritter.c under processStatement function so it writes relevant data.
  3. You run the tests and you get ast files generated for new samples. You need to manually check these ast files and make sure all the data is as expected.
  4. You add handling of if statement to smmastreader.c and smmastmatcher.c so you can load the if node from ast file and then compare it with original.
  5. When you run the tests next time they should load ast file and compare it to parsed original and everything you check for in smmastmatcher should pass.

Further things that can be done