Awesome

Tokstyle

C style checker for TokTok projects.

This project uses cimple, a highly restrictive simplified C dialect used in c-toxcore. Tokstyle builds on top of an already restrictive grammar to do additional semantic checks.

Using the tool

Using the docker image

Using the built docker image (only contains binaries, so you need an OS around it):

FROM toxchat/haskell:hs-tokstyle AS tokstyle
FROM ubuntu:22.04

COPY --from=tokstyle /bin/check-cimple /bin/
WORKDIR /work
ENTRYPOINT ["/bin/check-cimple"]

Then, build and run:

$ docker build -t check-cimple .
$ docker run --rm -v "$PWD:/work" check-cimple myfile.c

Building from source (cabal)

$ cabal install tokstyle
$ check-cimple myfile.c

Building from source (bazel)

$ bazel run //hs-tokstyle/tools:check-cimple -- $PWD/myfile.c

How to write a new linter

If you want to just get started by copy/pasting an existing one, have a look at one of the many existing linters in src/Tokstyle/Linter/. A simple example is LoggerCalls.hs. The tutorial below walks you through the implementation of another existing linter, which happens to be the simplest linter we currently have.

Roughly speaking, a linter is a function from (file path, ast) to a list of diagnostics represented simply as Text. Typically, we call the linter function analyse.

analyse :: (FilePath, [Node (Lexeme Text)]) -> [Text]

Most linters will be using the TraverseAst framework which simplifies tree traversals so you can match over the entire tree, recursively, with just a few lines of code. You don't have to do this, but bear in mind that you will need to manually recurse into top level objects like PreprocIfdef to even get at all the top level declarations.

TraverseAst provides a traversal object called AstActions with functions for all kinds of elements you might find in the AST. The most commonly used one is doNode, which operates on a single AST node, but if you want to override other functions like doFile to handle an entire file at once, or doLexeme if you only want to inspect the tokens, those can be changed in the definition of your linter. Each of the do functions also has a plural form, e.g. doNodes, which as a default is simply iterating over the list, but can be used to operate on the list as a whole.

Let's get started with a simple linter: CompoundInit, which checks that we don't write the following unnecessarily verbose code:

MyType foo = (MyType){0};

when instead, we should simply be writing

MyType foo = {0};

Writing the test

We'll start by writing the test first, in a classical Test-Driven Development style. Create a new file called test/Tokstyle/Linter/CompoundInitSpec.hs.

{-# LANGUAGE OverloadedStrings #-}
module Tokstyle.Linter.CompoundInitSpec (spec) where

import           Test.Hspec          (Spec, it, shouldBe)

import           Tokstyle.Linter     (analyse)
import           Tokstyle.LinterSpec (mustParse)

spec :: Spec
spec = do

First, some boilerplate: we need OverloadedStrings because we'll use string literals for code and expected diagnostics (which are Text instead of String).
We name our module according to what we'll be calling our linter module, suffixed with Spec.
Then we import a few Hspec functions used in the spec below.
Then, we need the analyse function which dispatches over all linters (we'll add our linter to it, later).
Our last import is the mustParse function from LinterSpec, which fails the test when parsing fails, so we don't need to deal with parse errors. We don't expect parse errors, because we write the C code ourselves and it should be parseable.
Finally, the start of our executable specification: spec, which is automatically called by hspec-discover and hooked up to the test runner.

Now, for our first test, we write one with a piece of code containing the pattern we want to detect. In this case, a function containing the above variable declaration. We write this in the Spec monad (using it and shouldBe).

    it "detects compound literal initialisers" $ do
        ast <- mustParse
            [ "void f(void) {"
            , "  Foo foo = (Foo){0};"
            , "}"
            ]
        analyse ["compound-init"] ("test.c", ast)
            `shouldBe`
            [ "test.c:2: don't use compound literals in initialisations; use simple `Type var = {0};` [-Wcompound-init]"
            ]

First, we parse a piece of code given as [Text] into an AST. If that fails, the test stops.
Otherwise, we continue by calling analyse with only the linter we want to test ("compound-init") in the enabled linter list. We pass it the (file path, ast) tuple. For most linters, the file path won't matter, but some (like the logger related ones) may want to exempt some files. This way, you can test that the exemption works.
Finally, we check that the linter gives the diagnostic we expect. We can leave this empty if we don't know what the diagnostic will be, yet, or put an empty string in it to make sure it fails (Test-Driven Development: we start with a failing test and make it pass by writing code later).

Typically, we also write a negative test for the pattern we do want to allow. If the developer sees the above error, they fix it, and then the linter should be silent.

    it "accepts aggregate initialisers" $ do
        ast <- mustParse
            [ "void f(void) {"
            , "  Foo foo = {0};"
            , "}"
            ]
        analyse ["compound-init"] ("test.c", ast)
            `shouldBe` []

Registering the linter

Now we have our test, which will fail if we run it (using bazel below, but you can run it with stack or build and run with cabal as well):

$ bazel run //hs-tokstyle:testsuite -- --match "/Tokstyle.Linter.CompoundInit/"
...
  hs-tokstyle/test/Tokstyle/Linter/CompoundInitSpec.hs:18:9:
  1) Tokstyle.Linter.CompoundInit detects compound literal initialisers
       expected: ["test.c:2: don't use compound literals in initialisations; use simple `Type var = {0};` [-Wcompound-init]"]
        but got: []

It fails because there is no linter for -Wcompound-init, so we'll write one:

{-# LANGUAGE OverloadedStrings #-}
module Tokstyle.Linter.CompoundInit (analyse) where

import           Data.Text       (Text)
import           Language.Cimple (Lexeme (..), Node)

analyse :: (FilePath, [Node (Lexeme Text)]) -> [Text]
analyse _ = []

This linter doesn't do anything yet, but we'll register it with the linter list anyway. Open src/Tokstyle/Linter.hs, add a line like

import qualified Tokstyle.Linter.CompoundInit as CompoundInit

to the imports, and then a line like this to the localLinters list:

    , ("compound-init"      , CompoundInit.analyse     )

Don't worry about global linters at this point, those are a more advanced (but to be honest not actually more complicated) feature if you need to do whole program analysis.

Now we've created our linter boilerplate and registered it, we can run the test again, and it will still fail. The linter exists, but doesn't return any diagnostics. You can try returning one in the = [] part, e.g. saying = ["test.c: oh no!"], and now both tests will fail: one has the wrong diagnostic and the other has a diagnostic when it expects none. Let's write it to actually do something useful now.

Traversing the AST

We'll be using the TraverseAst framework here, so we'll need some more imports:

import           Language.Cimple.TraverseAst (AstActions, astActions, doNode,
                                              traverseAst)

We also need the AST type constructors to actually perform pattern matching. The Cimple AST uses fixpoint types, so we'll need Fix and the NodeF functor constructors (don't worry about what that means, TL;DR: all the AST node layers have an extra Fix in between, useful for recursion strategies, but we're not using those here).

import           Data.Fix                    (Fix (..))
import           Language.Cimple             (Lexeme (..), Node, NodeF (..))

And finally, we will be using the Diagnostics.warn helper function which takes care of formatting diagnostics and adding them to the State we'll be using:

import           Control.Monad.State.Strict  (State)
import qualified Control.Monad.State.Strict  as State
import           Language.Cimple.Diagnostics (warn)

Next, our actual code, starting with a no-op traversal. The default for astActions is to simply traverse the entire tree and do nothing. State [Text] is the monad each of the do actions runs in. This can be anything, and more advanced linters may choose to use a record type to contain more than just diagnostics, but for now it's just a list of Text.

linter :: AstActions (State [Text]) Text
linter = astActions

The analyse function now looks like this:

analyse :: (FilePath, [Node (Lexeme Text)]) -> [Text]
analyse = reverse . flip State.execState [] . traverseAst linter

It creates the linter monad using traverseAst linter and then runs it with State.execState and the initial state being the empty list [] (i.e. no diagnostics). Since warn adds diagnostics in reverse (because prepending to a linked list is O(1) while appending is O(n)), we need to reverse the final list when returning it.

Running the test now will still fail, because our linter only does a traversal but doesn't actually check anything.

Pattern matching and emitting diagnostics

We need to extend the astActions by overriding some of its functions that by default do nothing:

linter :: AstActions (State [Text]) Text
linter = astActions
    { doNode = \file node act ->
        case unFix node of
            _ -> act
    }

As you can see, the doNode function gets the current file path being processed, the current node being visited, and the recursive action. The default doNode ignores file and node and performs the recursive action. So, we have just written the default doNode implementation. We use unFix here to make the patterns slightly more readable (we still need all the Fixes for inner patterns).

Now, let's have a look at what things we might be able to match on by outputting everything this doNode function sees:

import           Debug.Trace                 (traceShowM)
...
        case unFix node of
            x -> do
                traceShowM x  -- show the node we're visiting
                act           -- still want to recurse

Running the test now will, in addition to failing, print a whole bunch of Haskell expressions, each of which can be copy/pasted as a pattern into the case unFix node of we wrote (using ... here to skip some uninteresting bits):

FunctionDefn Global (Fix (FunctionPrototype (Fix ..."void") (L ... "f") ...)) (Fix (CompoundStmt ...))
FunctionPrototype (Fix ..."void") (L ... "f") ...
TyStd (L (AlexPn 0 1 1) KwVoid "void")
CompoundStmt [...]
VarDeclStmt (Fix (VarDecl ..."Foo"... (L ... "foo") [])) (Just (Fix (CompoundLiteral ...)))
VarDecl ..."Foo"...
CompoundLiteral ...

Out of all of these, we want the VarDeclStmt. Why? Because if we match too deeply, e.g. on a VarDecl, we don't have the initialiser, or if we match the CompoundLiteral, we also match them in expressions outside variable initialisers. So next, we copy/paste the entire Haskell expression into the case:

        case unFix node of
            VarDeclStmt (Fix (VarDecl ...)) (Just (Fix (CompoundLiteral ...))) -> do
                traceShowM node
                -- don't recurse further, there's nothing interesting inside, so no "act" here.

            _ -> act  -- recurse for anything not matched

Running the test again, we can now see we only output the one node we care about. This means a match succeeds. You can also check the second test, the negative test which shouldn't match, to make sure only the first test triggers the traceShowM case. You can use the following command lines to run specific tests:

$ bazel run //hs-tokstyle:testsuite -- \
    --match "/Tokstyle.Linter.CompoundInit/detects compound literal initialiser/"
$ bazel run //hs-tokstyle:testsuite -- \
    --match "/Tokstyle.Linter.CompoundInit/accepts aggregate initialisers/"

The huge expression we just copied in (simplified above with ..., which isn't actually valid Haskell code, just done to shorten the example here) only exactly matches the one test we wrote. We should simplify and generalise it a bit until it matches everything we want to match and nothing we don't. We don't care about the type or name of the variable, so we use _ for that part, and we don't care what's inside the CompoundLiteral, so we use {} to match any CompoundLiteral regardless of its contents.

Finally, we want to formulate our diagnostic message using warn. We pass the currently processed file path, the node to use for source locations, and the diagnostic text.

        case unFix node of
            VarDeclStmt _ (Just (Fix CompoundLiteral{})) -> do
                warn file node $ "don't use compound literals in initialisations; "
                    <> "use simple `Type var = {0};`"

            _ -> act  -- recurse for anything not matched

Now run the test, and it should pass. We're done! You have now written your first linter. The sky's the limit from now on.

Appendix I: Add files to tokstyle.cabal

The above tutorial mostly assumes bazel, which automatically detects new files. To make a pull request or to run these with stack or cabal, you'll need to add your new module names to tokstyle.cabal in the library and test-suite sections.

Appendix II: Some useful tips

Have a look at other linters to see how they work with diagnostics. E.g. the Language.Cimple.Pretty module can be useful for pretty printing nodes as C code.
As a performance optimisation, stop recursive traversals whenever you've hit your pattern. If you know there are large subtrees you will never match in, write a pattern to avoid recursing into them (this can save a lot of time if you, for example, only look at top level declarations and don't need to inspect FunctionDefns).
The above is also useful if you have a pattern that's always OK in certain contexts, but not in others. You can skip the entire subtree for a node in which it's OK. Example: LoggerConst.hs.
Use PatternSynonyms if you need to match the same pattern many times (for an example, look at Booleans.hs).
If you need to keep more state during your linter execution, make a data Linter with a HasDiagnostics instance that tells warn how to add diagnostics to your data type. See DeclaredOnce.hs for an example.
If you write a linter that you don't want to enable by default, add it to the defaultFlags list in tools/check-cimple.hs as -Wno-my-linter. Users can still run it explicitly with -Wmy-linter.