Awesome
scall - A reference implementation of Dhall in Scala
This project is a Scala implementation of the Dhall language, a purely functional programming language designed for programmable configuration with strong guarantees of consistency and security.
Example usage
Read a Dhall expression into a Dhall syntax tree, perform type checking and beta-normalization, and convert into a Scala value.
import io.chymyst.dhall.Parser.StringAsDhallExpression
import io.chymyst.dhall.codec.FromDhall.DhallExpressionAsScala
val a: Boolean = "Natural/odd 123".dhall.typeCheckAndBetaNormalize().unsafeGet.asScala[Boolean]
assert(a == true)
val b: BigInt = "1 + 2".dhall.typeCheckAndBetaNormalize().unsafeGet.asScala[BigInt]
assert(b == 3)
Define a Dhall factorial function as a Dhall expression, and apply it to another a Dhall expression.
import io.chymyst.dhall.Parser.StringAsDhallExpression
import io.chymyst.dhall.Syntax.Expression
import io.chymyst.dhall.codec.FromDhall.DhallExpressionAsScala
val factorial: Expression =
"""
|\(x: Natural) ->
| let t = {acc: Natural, count: Natural}
| let result = Natural/fold x t (\(x: t) -> {acc = x.acc * x.count, count = x.count + 1} ) {acc = 1, count = 1}
| in result.acc
""".stripMargin.dhall.betaNormalized
assert(factorial.print ==
"""
|λ(x : Natural) → (Natural/fold x { acc : Natural, count : Natural } (λ(x : { acc : Natural, count : Natural }) → { acc = x.acc * x.count, count = x.count + 1 }) { acc = 1, count = 1 }).acc
|""".stripMargin.trim)
val ten: Expression = "10".dhall
// Manipulate Dhall expressions.
val tenFactorial: Expression = factorial(ten)
assert(tenFactorial.betaNormalized.asScala[BigInt] == BigInt(3628800))
In this example, we skipped type-checking since we know that the Dhall factorial expression is well-typed. However, Dhall only guarantees correct evaluation for well-typed expressions. An ill-typed expression may fail to evaluate or even cause an infinite loop:
import io.chymyst.dhall.Parser.StringAsDhallExpression
// Curry's Y combinator. We set the `Bool` type arbitrarily; the types cannot match in any case.
val illTyped = """\(f : Bool) -> let p = (\(x : Bool) -> f x x) in p p""".dhall
val argument = """\(x: Bool) -> x""".dhall
val bad = illTyped(argument)
// These expressions fail type-checking.
assert(illTyped.inferType.isValid == false)
assert(bad.inferType.isValid == false)
// If we try evaluating `bad` without type-checking, we will get an infinite loop.
bad.betaNormalized // java.lang.StackOverflowError
The Dhall factorial function can be also converted directly to a Scala function:
import io.chymyst.dhall.Parser.StringAsDhallExpression
import io.chymyst.dhall.codec.FromDhall.DhallExpressionAsScala
val factorial: BigInt => BigInt = """
|\(x: Natural) ->
| let t = {acc: Natural, count: Natural}
| let result = Natural/fold x t (\(x: t) -> {acc = x.acc * x.count, count = x.count + 1} ) {acc = 1, count = 1}
| in result.acc
""".stripMargin.dhall.betaNormalized.asScala[BigInt => BigInt]
assert(factorial(BigInt(10)) == BigInt(3628800))
Goals of this project
- Fully implement the syntax and semantics of Dhall. All standard tests from the dhall-lang repository must pass. (This is done.)
- Implement JSON and YAML export. (This is done.)
- Implement tools for working with Dhall values in Scala conveniently (this is in progress). Convert between ordinary Scala types and Dhall types (both at run time and at compile time if possible). Most Dhall integrations only support a small subset of Dhall, but Scala has a rich type system. We would like to support Scala function types, Scala type constructors, higher-kinded types, and other Scala features as much as possible.
- Implement tools for converting Dhall values into compiled Scala code (JAR format). JAR dependencies should be a transparent replacement of the standard Dhall imports, as far as Scala is concerned.
- Optimize Dhall execution further. At the moment, all intermediate results of typechecking and beta-normalization are cached. For more optimization, perhaps rewrite the interpreter use HOAS, PHOAS, or Normalization-By-Evaluation. Compute different parts of a record in parallel. Perform typechecking and beta normalization of different subexpressions in parallel. Make sure caching is thread-safe.
Current status
-
The Dhall language standard version v23.0.0 is fully implemented:
-
A parser from Dhall to Scala case classes is implemented according to the Dhall ABNF grammar using fastparse, closely following the syntax guidelines.
-
A serializer and deserializer for CBOR format is implemented. User may choose one of the three CBOR libraries: cbor-java, CBOR-Java, or borer. The fastest of them is CBOR-Java.
-
Alpha-normalization is implemented according to the Dhall specification.
-
Beta-normalization is implemented according to the Dhall specification.
-
Typechecking is implemented according to the Dhall specification for type inference including the "function check".
-
Import resolution code is implemented according to the Dhall specification for imports.
-
All the Dhall acceptance tests pass: parsing, CBOR encoding and decoding, alpha-normalization, beta-normalization, type-checking, and imports.
-
-
Supporting Scala 2.13 only, for now.
-
GitHub Actions are used to test with JDK 8, 17, 22.
-
Converting Dhall values to Scala values: basic support is complete.
-
Standalone executable JAR with command-line arguments for type-checking, evaluating, and exporting Dhall expressions.
-
Converting Dhall to YAML and JSON: complete.
-
Converting Dhall to TOML: basic support is complete.
Experimental features and optimizations
-
All alpha-normalization, beta-normalization, and type-checking results are cached in LRU caches of configurable size.
-
A non-standard "do-notation" is implemented.
-
Optimization:
Natural/fold
will short-cut the loop when the current result stops changing. (Backward compatible, no change to normal forms!) I also contributed this optimization to the Haskell backend. -
Experimental feature:
assert : a === b
will perform alpha, beta, and eta-reduction ona
andb
before comparing their CBOR serializations. (Breaking change to normal forms!) -
Experimental optimization:
Natural/fold
will not expand under lambda if intermediate expressions keep growing beyond about 500 sub-terms. (Breaking change to normal forms! Standard tests still pass because they do not include terms with such large normal forms.)
Other features in the Scala implementation of Dhall
-
Dhall values of function types are converted to Scala functions. For example,
λ(x : Natural) -> x + 1
is converted into a Scala function equivalent to{ x : BigInt => x + 1 }
, which has typeFunction1[BigInt, BigInt]
. -
Dhall values of type
Type
(for example,Text
,Bool
, orNatural -> Natural
) are converted to Scala type tags such asTag[String]
,Tag[Boolean]
, orTag[BigInt => BigInt]
. -
Print Dhall values using the standard Dhall syntax.
Ideas for future developments
- Implement automatic type inference for certain solvable cases. Omit type annotations from lambdas and omit
parentheses:
\x -> x + 1
should be sufficient for simple cases. Omit the type argument from curried functions if other arguments can be used to infer the type. List/map [ 1, 2, 3 ] (\x -> x + 1) should be sufficient. Just aNone
without a type should be sufficient in most cases. Similarly, with the do-notation,as bind with x in p then q
should be sufficient. (This probably requires introducing a new syntax form for do-notation rather than immediate desugaring, but perhaps not.) - Try HOAS and PHOAS to make the implementation faster.
- Make sure the parser and the interpreter are stack-safe or at least do not introduce stack-overflow bottlenecks beyond what is expected.
- Convert between Dhall values and Scala values automatically (as much as possible given the Scala type system). Support both Scala 2 and Scala 3.
- Create Scala-based Dhall values at compile time from Dhall files or from literal Dhall strings (compile-time constants).
- Compile Dhall values into a library JAR. Enable importing JAR dependencies instead of Dhall imports (
import
as Scala
?). Publish the Dhall standard library and other libraries as JARs. - Extend Dhall on the Scala side (with no changes to the Dhall language definition) so that certain Dhall types or values may be interpreted via custom Scala code.
- Avoid beta-normalizing under lambda when that would increase the size of a Dhall function body. This is needed to operate efficiently on literal arguments (function body should not be fully beta-normalized until applied to a literal argument).
- Detect Dhall functions that will ignore some (curried) arguments when given certain values of literal arguments, and
implement laziness to make code more efficient. With that, detect fixpoints of Dhall functions under
List/fold
and stop the iteration early. - Implement some elementary functions for Natural more efficiently (probably no need to change Dhall language), such as gcd, div_mod, int_sqrt.
- Implement numerical functions for rational numbers and for floating-point numbers.
- Implement higher-kinded types, heterogeneous lists, dependently-typed lists, etc., if possible.
assert
should be more powerful. Enable comparing types, enable associative simplification, product and co-product rewriting.
Parsing with fastparse
The ABNF grammar of Dhall is translated into rules of fastparse
.
The "cut" is used sparingly as the ~/
operator, usually after a keyword or after a required whitespace.
However, in some cases adding this "cut" operator made the parsing results incorrect and had to be removed.
Another feature is that some parses need to fail for others to succeed. For example, missingfoo
should be parsed as an
identifier. However, missing
is a keyword and is matched first. To ensure correct parsing, negative lookahead is used
for keywords.
To improve parsing performance, the parsing results for some sub-expressions are memoized.
This is implemented via an add-on library fastparse-memoize
.
See fastparse-memoise README for more information.
That library is published as a separate artifact.
Limitations
So far, there are some issues with the Unicode characters:
-
If the input contains non-UTF8 sequences, the
fastparse
library will replace those sequences by the "replacement" character (Unicode decimal65533
). However, the Dhall standard specifies that non-UTF8 input should be rejected by the parser. As a workaround, at the moment, Unicode character65533
is not allowed in Dhall files and will be rejected at parsing time. -
Deeply nested expressions will cause a stack overflow error.
Release version history
0.2.1
- Implemented
fastparse-memoize
to speed up parsing (by 10x and more in some cases). - Upgrade to fastparse 3.1.x
0.2.0
- First version published on Sonatype
- Fixed the regression described in https://github.com/dhall-lang/dhall-haskell/issues/2597
- Support for Yaml and JSON export
- Standalone JAR executable
dhall.jar
with command-line options similar todhall-haskell
- Published to Maven (Scala 2.13 only)
0.1 (Not published)
- Full support for Dhall language standard version 23.0.0, submitted corrections to the Dhall standard: https://github.com/dhall-lang/dhall-lang/pull/1362
- Fixing the bug in CBOR library: https://github.com/peteroupc/CBOR-Java/pull/25 and uptaking CBOR-Java 4.5.3 with the fix.
- Some extensions (all backward-compatible, with no changes in normal forms)
Building a command-line utility
To build a standalone dhall
executable: bash make_jar.sh
This creates the file ./dhall.jar
.
To test the executable: bash test_jar.sh
The test script should print "Tests successful." at the end. If it does not print that, some tests failed.
Using the command-line utility
$ java -jar ./dhall.jar --help
java -jar dhall.jar --flags... command
-f --file <str> Path to the input Dhall file (default: stdin)
-o --output <str> Path to the output file (default: stdout)
-q --quoted Quote all strings (for Yaml output only; default is false)
-d --documents Create a Yaml file with document separators (for Yaml output only; default is
false)
-i --indent <int> Indentation depth for JSON and Yaml (default: 2)
command <str>... Optional command: decode, encode, hash, text, type, yaml, json
Examples:
Compute the Dhall normal form of a Dhall expression from a given file.
$ java -jar ./dhall.jar --file ./scall-cli/src/test/resources/jar-tests/3.dhall
{ True = [1.23, 4.56], a = 2, b = None Bool, c = Some "abc", y = True }
Print the inferred type of a Dhall expression.
$ java -jar ./dhall.jar --file ./scall-cli/src/test/resources/jar-tests/3.dhall type
{ True : List Double, a : Natural, b : Optional Bool, c : Optional Text, y : Bool }
Compute the SHA256 hash of a Dhall expression.
$ java -jar ./dhall.jar --file ./scall-cli/src/test/resources/jar-tests/3.dhall hash
sha256:e06ccdb4df3721dba87291eb49754c87955462d73df626e5e4c77de3af06e87f
Export a Dhall expression to Yaml format. Default indentation is 2 spaces.
$ java -jar ./dhall.jar --file ./scall-cli/src/test/resources/jar-tests/3.dhall yaml
'True':
- 1.23
- 4.56
a: 2
c: abc
'y': true
Export a Dhall expression to JSON format, with 4 spaces of indentation.
$ java -jar ./dhall.jar --file ./scall-cli/src/test/resources/jar-tests/3.dhall --indent 4 json
{
"True": [
1.23,
4.56
],
"a": 2,
"c": "abc",
"y": true
}