Home

Awesome

Rhine

Rhine is a Clojure-inspired Lisp on LLVM JIT featuring variable-length untyped arrays, first-class functions, closures, and macros. While Clojure hides the lower-level details by running atop the JVM, Rhine aims to expose how common Lisp constructs map to hardware.

Building

First, opam switch 4.02.1 to make sure that you're running a custom-built ocaml (for camlp4). First, run brew install libffi. Then, run opam install ocamlfind menhir core textutils ctypes, open a new shell to refresh env, and invoke make.

Troubleshooting the build

There are a number of reasons for the build failing:

  1. Silly things like git submdoule --init failing can be fixed easily; just anchor the submodule to a valid commit and send a PR.

  2. opam problems. If you run into one of these after following the instructions presented above, open an issue. An upstream update probably screwed something up.

  3. Silly build issues usually arise from the build not being perfectly parallel: due to races, the -j8 picks the dependee too earlier. Either go into llvm-build/ and keep hitting make -j8 until it succeeds, or drop the -j8 together, waiting for a longer time for a predictable result.

  4. Running into more serious build issues usually means that llvm upstream has changed in a trivial way; you can attempt to fix this yourself, or open an issue.

How it works

An untyped system means that all values are boxed/unboxed from a value_t structure at runtime:

%value_t = type {
	 i32,                                ; type of data
	 i64,                                ; integer
	 i1,                                 ; bool
	 i8*,                                ; string
	 %value_t**,                         ; array/fenv
	 i64,                                ; array/string length
	 double,                             ; double
	 %value_t* (i32, %value_t**, ...)*,  ; function
	 i8                                  ; char
}

The overhead of boxing/unboxing is paid by all dynamic languages, although multiple optimizations (including speculative optimization) can reduce the overhead. Rhine currently only implements the basic optimizations bundled with LLVM.

Rhine does automatic type conversions, so (+ 3 4.2) will do the right thing. To implement this, IR is generated to inspect the types of the operands (zeroth member of value_t), and a br (branch) is used to take the right codepath. A possible optimization is to generate a branchless codepath for all-integer arguments.

LLVM provides the array type and vector type. They cannot be used since they are fixed-length; i.e. the length must be known at compile-time. The problem is that a construct like (cons 8 coll) generates a runtime length which is equal to the (length coll) + 1. So, we malloc, getelementptr, and store by hand. It has the type specified by the fourth member of value_t.

To implement first-class functions, note that all functions must have the same type; i.e. the type of the function pointer (the seventh member of value_t). How else would you implement:

(defn map
  [f coll]
  (if coll
    (cons (f (first coll))
          (map f (rest coll)))
    []))

(defn map2
  [f c1 c2]
  (if (and c1 c2)
    (cons (f (first c1) (first c2))
          (map2 f (rest c1) (rest c2)))
    []))

Here, f takes one argument in map, but takes two arguments in map2. A function pointer type embracing variable arguments is implemented using the varargs framework of LLVM. Note that va_arg doesn't work on x86, so Rhine extracts the values by hand. The first argument gives the number of arguments, and is used to implement varargs in the Rhine language. The second argument is the closure environment (which has the same type as an array).

Closures are simple to implement with this framework in place. First, when a function is declared, parse out all the unbound variable names (not present in arguments or let), sort the names, put it in a hashtable for later reference, and codegen the code required to bind the names from the env argument in order. At the callsite, look up this hashtable, and pack all the corresponding environment variables into the env argument. So, stuff like this will work:

(defn quux [] (let [a aenv] (println a) (println env)))
(let [env 12 aenv 17] (quux))

But there's a problem because we have first-class functions. What happens to this?

(defn t [y] (+ x y))
(defn f [x] t)
(let [g (f 3)] (println (g 4)))

Here, the callsite for t is not in f, but in the anonymous function at the end. But the anonymous function doesn't have the environment variable x that t requires, f does. So, we have to augment function pointers with the environment (resuing the fourth argument of value_t).

It's important to realize that macros require that we go back-and-forth between LLVM values and the OCaml codegen engine. How else would you evaluate something like:

(defmacro baz [x]
  `[1 2 ~x])
(baz (+ 2 2))

Note that macro arguments must be passed unevaluated at the callsite. The maco-expand stage now needs to codegen [1 2 <something>], where that <something> itself needs to be codegen'ed by evaluating the argument: the result from the compiler must be returned as an AST object to OCaml. This requires some involved construction of OCaml objects from C.

Another subtle point to note is that macros must be lifted out of the program and macro-expanded at the beginning of the program. This is because we can't suddenly codegen segments required for macroexpansion in the middle of codegen'ing another function.

Todo

Notes