Awesome

scall - A reference implementation of Dhall in Scala

This project is a Scala implementation of the Dhall language, a purely functional programming language designed for programmable configuration with strong guarantees of consistency and security.

Example usage

Read a Dhall expression into a Dhall syntax tree, perform type checking and beta-normalization, and convert into a Scala value.

import io.chymyst.dhall.Parser.StringAsDhallExpression
import io.chymyst.dhall.codec.FromDhall.DhallExpressionAsScala

val a: Boolean = "Natural/odd 123".dhall.typeCheckAndBetaNormalize().unsafeGet.asScala[Boolean]

assert(a == true)

val b: BigInt = "1 + 2".dhall.typeCheckAndBetaNormalize().unsafeGet.asScala[BigInt]

assert(b == 3)

Define a Dhall factorial function as a Dhall expression, and apply it to another a Dhall expression.

import io.chymyst.dhall.Parser.StringAsDhallExpression
import io.chymyst.dhall.Syntax.Expression
import io.chymyst.dhall.codec.FromDhall.DhallExpressionAsScala

val factorial: Expression =
  """
    |\(x: Natural) ->
    |  let t = {acc: Natural, count: Natural}
    |  let result = Natural/fold x t (\(x: t) -> {acc = x.acc * x.count, count = x.count + 1} ) {acc = 1, count = 1}
    |    in result.acc
        """.stripMargin.dhall.betaNormalized

assert(factorial.print ==
  """
    |λ(x : Natural) → (Natural/fold x { acc : Natural, count : Natural } (λ(x : { acc : Natural, count : Natural }) → { acc = x.acc * x.count, count = x.count + 1 }) { acc = 1, count = 1 }).acc
    |""".stripMargin.trim)

val ten: Expression = "10".dhall

// Manipulate Dhall expressions.
val tenFactorial: Expression = factorial(ten)

assert(tenFactorial.betaNormalized.asScala[BigInt] == BigInt(3628800))

In this example, we skipped type-checking since we know that the Dhall factorial expression is well-typed. However, Dhall only guarantees correct evaluation for well-typed expressions. An ill-typed expression may fail to evaluate or even cause an infinite loop:

import io.chymyst.dhall.Parser.StringAsDhallExpression

// Curry's Y combinator. We set the `Bool` type arbitrarily; the types cannot match in any case.
val illTyped = """\(f : Bool) -> let p = (\(x : Bool) -> f x x) in p p""".dhall
val argument = """\(x: Bool) -> x""".dhall
val bad = illTyped(argument)

// These expressions fail type-checking.
assert(illTyped.inferType.isValid == false)
assert(bad.inferType.isValid == false)

// If we try evaluating `bad` without type-checking, we will get an infinite loop.
bad.betaNormalized // java.lang.StackOverflowError

The Dhall factorial function can be also converted directly to a Scala function:

import io.chymyst.dhall.Parser.StringAsDhallExpression
import io.chymyst.dhall.codec.FromDhall.DhallExpressionAsScala

val factorial: BigInt => BigInt = """
                                    |\(x: Natural) ->
                                    |  let t = {acc: Natural, count: Natural}
                                    |  let result = Natural/fold x t (\(x: t) -> {acc = x.acc * x.count, count = x.count + 1} ) {acc = 1, count = 1}
                                    |    in result.acc
        """.stripMargin.dhall.betaNormalized.asScala[BigInt => BigInt]

assert(factorial(BigInt(10)) == BigInt(3628800))

Goals of this project

Fully implement the syntax and semantics of Dhall. All standard tests from the dhall-lang repository must pass. (This is done.)
Implement JSON and YAML export. (This is done.)
Implement tools for working with Dhall values in Scala conveniently (this is in progress). Convert between ordinary Scala types and Dhall types (both at run time and at compile time if possible). Most Dhall integrations only support a small subset of Dhall, but Scala has a rich type system. We would like to support Scala function types, Scala type constructors, higher-kinded types, and other Scala features as much as possible.
Implement tools for converting Dhall values into compiled Scala code (JAR format). JAR dependencies should be a transparent replacement of the standard Dhall imports, as far as Scala is concerned.
Optimize Dhall execution further. At the moment, all intermediate results of typechecking and beta-normalization are cached. For more optimization, perhaps rewrite the interpreter use HOAS, PHOAS, or Normalization-By-Evaluation. Compute different parts of a record in parallel. Perform typechecking and beta normalization of different subexpressions in parallel. Make sure caching is thread-safe.

Current status

Experimental features and optimizations

All alpha-normalization, beta-normalization, and type-checking results are cached in LRU caches of configurable size.
A non-standard "do-notation" is implemented.
Optimization: Natural/fold will short-cut the loop when the current result stops changing. (Backward compatible, no change to normal forms!) I also contributed this optimization to the Haskell backend.
Experimental feature: assert : a === b will perform alpha, beta, and eta-reduction on a and b before comparing their CBOR serializations. (Breaking change to normal forms!)
Experimental optimization: Natural/fold will not expand under lambda if intermediate expressions keep growing beyond about 500 sub-terms. (Breaking change to normal forms! Standard tests still pass because they do not include terms with such large normal forms.)

Other features in the Scala implementation of Dhall

Dhall values of function types are converted to Scala functions. For example, λ(x : Natural) -> x + 1 is converted into a Scala function equivalent to { x : BigInt => x + 1 }, which has type Function1[BigInt, BigInt].
Dhall values of type Type (for example, Text, Bool, or Natural -> Natural) are converted to Scala type tags such as Tag[String], Tag[Boolean], or Tag[BigInt => BigInt].
Print Dhall values using the standard Dhall syntax.

Ideas for future developments

Implement automatic type inference for certain solvable cases. Omit type annotations from lambdas and omit parentheses: \x -> x + 1 should be sufficient for simple cases. Omit the type argument from curried functions if other arguments can be used to infer the type. List/map [ 1, 2, 3 ] (\x -> x + 1) should be sufficient. Just a None without a type should be sufficient in most cases. Similarly, with the do-notation, as bind with x in p then q should be sufficient. (This probably requires introducing a new syntax form for do-notation rather than immediate desugaring, but perhaps not.)
Try HOAS and PHOAS to make the implementation faster.
Make sure the parser and the interpreter are stack-safe or at least do not introduce stack-overflow bottlenecks beyond what is expected.
Convert between Dhall values and Scala values automatically (as much as possible given the Scala type system). Support both Scala 2 and Scala 3.
Create Scala-based Dhall values at compile time from Dhall files or from literal Dhall strings (compile-time constants).
Compile Dhall values into a library JAR. Enable importing JAR dependencies instead of Dhall imports ( import as Scala?). Publish the Dhall standard library and other libraries as JARs.
Extend Dhall on the Scala side (with no changes to the Dhall language definition) so that certain Dhall types or values may be interpreted via custom Scala code.
Avoid beta-normalizing under lambda when that would increase the size of a Dhall function body. This is needed to operate efficiently on literal arguments (function body should not be fully beta-normalized until applied to a literal argument).
Detect Dhall functions that will ignore some (curried) arguments when given certain values of literal arguments, and implement laziness to make code more efficient. With that, detect fixpoints of Dhall functions under List/fold and stop the iteration early.
Implement some elementary functions for Natural more efficiently (probably no need to change Dhall language), such as gcd, div_mod, int_sqrt.
Implement numerical functions for rational numbers and for floating-point numbers.
Implement higher-kinded types, heterogeneous lists, dependently-typed lists, etc., if possible.
assert should be more powerful. Enable comparing types, enable associative simplification, product and co-product rewriting.

Parsing with `fastparse`

The ABNF grammar of Dhall is translated into rules of fastparse.

The "cut" is used sparingly as the ~/ operator, usually after a keyword or after a required whitespace.

However, in some cases adding this "cut" operator made the parsing results incorrect and had to be removed.

Another feature is that some parses need to fail for others to succeed. For example, missingfoo should be parsed as an identifier. However, missing is a keyword and is matched first. To ensure correct parsing, negative lookahead is used for keywords.

To improve parsing performance, the parsing results for some sub-expressions are memoized. This is implemented via an add-on library fastparse-memoize. See fastparse-memoise README for more information. That library is published as a separate artifact.

Limitations

So far, there are some issues with the Unicode characters:

If the input contains non-UTF8 sequences, the fastparse library will replace those sequences by the "replacement" character (Unicode decimal 65533). However, the Dhall standard specifies that non-UTF8 input should be rejected by the parser. As a workaround, at the moment, Unicode character 65533 is not allowed in Dhall files and will be rejected at parsing time.
Deeply nested expressions will cause a stack overflow error.

Release version history

0.2.1

Implemented fastparse-memoize to speed up parsing (by 10x and more in some cases).
Upgrade to fastparse 3.1.x

0.2.0

First version published on Sonatype
Fixed the regression described in https://github.com/dhall-lang/dhall-haskell/issues/2597
Support for Yaml and JSON export
Standalone JAR executable dhall.jar with command-line options similar to dhall-haskell
Published to Maven (Scala 2.13 only)

0.1 (Not published)

Full support for Dhall language standard version 23.0.0, submitted corrections to the Dhall standard: https://github.com/dhall-lang/dhall-lang/pull/1362
Fixing the bug in CBOR library: https://github.com/peteroupc/CBOR-Java/pull/25 and uptaking CBOR-Java 4.5.3 with the fix.
Some extensions (all backward-compatible, with no changes in normal forms)

Building a command-line utility

To build a standalone dhall executable: bash make_jar.sh

This creates the file ./dhall.jar.

To test the executable: bash test_jar.sh

The test script should print "Tests successful." at the end. If it does not print that, some tests failed.

Using the command-line utility

$ java -jar ./dhall.jar --help
java -jar dhall.jar --flags... command
  -f --file <str>    Path to the input Dhall file (default: stdin)
  -o --output <str>  Path to the output file (default: stdout)
  -q --quoted        Quote all strings (for Yaml output only; default is false)
  -d --documents     Create a Yaml file with document separators (for Yaml output only; default is
                     false)
  -i --indent <int>  Indentation depth for JSON and Yaml (default: 2)
  command <str>...   Optional command: decode, encode, hash, text, type, yaml, json

Examples:

Compute the Dhall normal form of a Dhall expression from a given file.

$ java -jar ./dhall.jar --file ./scall-cli/src/test/resources/jar-tests/3.dhall
{ True = [1.23, 4.56], a = 2, b = None Bool, c = Some "abc", y = True }

Print the inferred type of a Dhall expression.

$ java -jar ./dhall.jar --file ./scall-cli/src/test/resources/jar-tests/3.dhall type
{ True : List Double, a : Natural, b : Optional Bool, c : Optional Text, y : Bool }

Compute the SHA256 hash of a Dhall expression.

$ java -jar ./dhall.jar --file ./scall-cli/src/test/resources/jar-tests/3.dhall hash
sha256:e06ccdb4df3721dba87291eb49754c87955462d73df626e5e4c77de3af06e87f

Export a Dhall expression to Yaml format. Default indentation is 2 spaces.

$ java -jar ./dhall.jar --file ./scall-cli/src/test/resources/jar-tests/3.dhall yaml
'True':
  - 1.23
  - 4.56
a: 2
c: abc
'y': true

Export a Dhall expression to JSON format, with 4 spaces of indentation.

$ java -jar ./dhall.jar --file ./scall-cli/src/test/resources/jar-tests/3.dhall --indent 4 json
{
    "True":   [
        1.23,
        4.56
    ],
    "a":   2,
    "c":   "abc",
    "y":   true
}