Home

Awesome

Contents

Purpose of Eso

I develop and maintain this for two principle reasons:

Eso started as a functional BrainFuck interpreter made out of curiosity about pure functional programming. In the course of turning Eso into what it is now, I've learned a shocking amount about language design, types, category theory, compiler theory, CPS, combinators, parsers, generative grammars, concurrency, effects, prime generators, and... quite a lot more.

You may notice that some things in Eso that could be implemented with a relatively simple library are not; usually that's on purpose, because writing the boilerplate and abstractions myself is... Well, it's fun and educational.

Meaning of functional

Functional programming is a paradigm with three principle characteristics:

First Class Functions

Functional programming gets its name from the fact that functions are treated as first-class values. This means you can use and manipulate functions just as you would any other variable or parameter.

Zero Side Effects

A function defines a relationship between an input and an output. This means a function does not change anything outside of itself. One major example of this is error handling: In most languages, an error is thrown by a method and prevents a return value, and thus must be caught separately. This means code can have unexpected behavior that extends beyond the return, and it can be a pain to deal with. In functional programming, errors are usually handled by returning a wrapper that holds either a result or error information, which eliminates behavior that the caller does not expect.

Referential Transparency

A function is also stable in that each input returns exactly one output. You should be able to replace any call to a function with the result and not change the behavior of the program, which means a function has no mutable state. As an extension of this, functional programming also makes heavy use of immutable data structures, which are data structures that cannot be changed in place.

Ultimately, what functional programming accomplishes is highly modular, testable, and type-safe code. Many functional languages also allow for far more elegant and concise code than is common of imperative and object oriented languages.

State of the Project

Supported Languages

Current native language support (mostly in chronological order):

  1. Brainfuck
  2. Fluffle Puff
  3. Ook
  4. WhiteSpace (as defined here)
  5. WhiteSpace Assembly
  6. FracTran
  7. FracTran++
  8. Thue
  9. P''
  10. ///
  11. Deadfish
  12. Emmental
  13. Befunge-93
  14. Befunge-98 (Handprint: 1165193033)
  15. Wierd (Compliant with wierd-new.c)
  16. Unlambda
  17. Grass
  18. Path
  19. SNUSP
  20. Metatape
  21. Prelude
  22. NULL
  23. Volatile
  24. Glypho
  25. Platts
  26. WordLang
  27. LazyK
  28. ALPL
  29. LazyBird
  30. Senpai (Currently missing module import command)
  31. Scala

Current features:

WIP:

Languages Under Consideration

This is not an exhaustive list, but here are some of the languages I've considered, am considering, or am currently working on.

Design of Eso

As I mentioned earlier, Eso started out as a simple, single functional-style BrainFuck interpreter. Since then it's gotten... Complicated. Though I prefer the word "sophisticated".

To clarify what is meant by "Functional Esoteric Language Interpreter": All of the language components (with the sole exception of the Scala component) are purely functional. The user interface is not purely functional, but its structure borrows a lot from functional style.

Language Components

Languages supported by Eso can currently have 3 types of components:

Parsers

One of the first things everyone would ask me when I started telling people about Eso is "what are you using for parsing?"

I didn't really understand the question at first, because each language's parser is different. I wrote the boilerplate code every time. But, as I wrote more and more parsers, I started to understand why they asked: There's a lot of boilerplate.

So I started looking into parser libraries, then parser combinator libraries, then just decided I'd write my own damn parsing tools.

So, Eso has two general forms of parsers: Bespoke parsers, which are written by hand boilerplate and all, and abstracted parsers, which use Eso's parser combinators to more cleanly and elegantly put together a parser. I only use bespoke parsers when there's a significant gain in performance (and I care enough about performance), or in any old parsers I left alone when overhauling for the new tools.

Eso's parser combinator toolset is based on two primitive parsers and a collection of functions to compose, group, and modify them (called combinators). The primitive parsers are:

The combinators are as follows, where p and q are parsers, a is a value, and f is a function:

The result is a shockingly versatile, expressive, and readable parser framework. A complete parser for the combinator calculus, for instance, can be expressed in two lines:

def combinator = ("S" ^^^ S) | ("K" ^^^ K) | ("I" ^^^ I) | ("(" &> expression <& ")")
def expression = combinator.+ map (_.foldLeft(I)(Application))

Now, you may notice that I didn't actually make any primitive parsers in there. That's because both string and regex parsers have implicit conversions that allow you to use string and regex values directly as parsers. Implicits may be dangerous, but they sure are fun.

Let's take a look at the Thue parser for some regex goodness:

def ruleParse: EsoParser[(String, String)] = """(?m)^(.*\S.*)::=""".r <&> """^(.*)\v""".r
def initParse: EsoParser[String] = """(?m)^\s*::=.*\v""".r &> """(?s).*""".r
def thueParse: EsoParser[(Vector[(String, String)], String)] = ruleParse.* <&> initParse

Going piece by piece...

def substring = """(?m)^(.*\S.*)::=""".r // match a substring at the beginning of a line of the form "[expression]::=", and uses a capturing group to only return "[expression]"
def replacement = """^(.*)\v""".r // Matches any substring from the beginning of input to a new line (vertical whitespace character), uses a capturing group to return everything but the new line character
def ruleParse = substring <&> replacement // Combines the first two parts to parse a line of the form "a::=b" into (a, b)
def ruleList = ruleParse.* // Parses all matches from ruleParse at the same time and returns them as a vector

def ruleTerminator = """(?m)^\s*::=.*\v""".r // Matches the first line with only "::=" and whitespace
def initialString = """(?s).*""".r // Matches the whole input
def initParse = ruleTerminator &> initialString // Matches all of the input after a line consisting of only "::=" and whitespace

def thueParse = ruleList <&> initParse // Parses all Thue replacement rules into a list of pairs followed by the initial string after the terminating empty rule

Isn't that fun? I think that's fun.

Eso's first set of parser tools was far more cumbersome and a bit less expressive, relying on overcomplicated primitive parser classes rather than powerful combinators. By relying on combinators to build up complicated parsers out of simple primitives, Eso's newer parser combinator toolset is far more elegant, readable, and concise. I wish I had it from the start.

As one last note on the parser combinators, you may have realized a potential problem hiding in the Combinator Calculus parser: Recursion.

Let's say for example we have this Iota parser:

def expression = ("i" ^^^ i) | ("`" &> (expression <&> expression) map (Application))

You see how the parser appears in its definition? Yeah, that'll blow up your stack real fast.

The solution is, of course, trampolining. There's a more thorough explanation of trampolining near the end of this writeup, but basically the function calls are turned into a series of tasks that can be done within a single stack frame. All the combinators have special classes that trampoline with each other, so no matter how recursive your parser, it will never blow up the stack! Might kill your heap though... but not your stack!

How the Interface Works

It's always a challenge to handle UI functionally. Everything behind the scenes is fair game, but actually receiving input from and sending output to the user is by definition a side-effect.

The philosophy I took with Eso's interface is to focus on modularity without overcomplicating things with a doomed crusade for pure functional code.

Persistent vs Non-Persistent Interfaces

Eso has two different interfaces, persistent and non-persistent.

The non-persistent interface is the entry point; it parses the input, executes it, then exits. This interface is useful for quick actions and working with other tools, for example:

>eso transpile -sl BrainFuck -tl C++ -s lostKingdom.b -o lostKingdom.cpp -init 1000 -dyn true -indent true
Transpiled program saves to lostKingdom.cpp.
>gcc lostKingdom.cpp -o lostKingdom
>lostKingdom.exe
...

If the non-persistent interface receives either no arguments or the command persistent, it starts the persistent interface. This interface takes over the console, so you cannot use other tools in the same terminal while it's running. The advantage of this is that it can maintain environment variables, such as runtime parameters, bindings, and user-defined translators.

If you supply persistent with arguments, the persistent interface initializes the corresponding environment variables to the provided values.

The Command Parser

Eso parses each line of input into two components: A command and an argument list

The command is just a string, while the arguments are read as a list of name-value pairs in the form command -name1 value1 -name2 value2 ..., then stored in a hash map. Boolean options listed without a value default to true.

To make bindings relatively seamless, the bind/unbind commands as well as any active bindings are intercepted before reaching the parser. The bind and unbind commands are instead parsed as bind <token> <command> and unbind <token>, and any active bindings are replaced with the bound command then sent to the parser.

States and Configurations

Many of the languages supported by Eso have configurable options or parts of their environment, like optional dynamic tape size, random number generators, or even timers. Rather than using simple global values, which would severely impact modularity and disregard the functional style, Eso wraps all these into a data structure to pass down to components.

There are two structures used for this:

The interface maintains a State, and when needed uses that State to build a Config which it can pass to the language components.

Any language features which would otherwise break the functional style are pushed to the Config. For instance, there is a Befunge instruction which makes a random decision; instantiating a random number generator from within the interpreter would require either using the same seed every time or breaking referential transparency, so the Config carries its own random number generator, thus making the generator itself an input to the function.

Command Handlers

Eso originally used a single monolithic interface, but that grew cumbersome. The current design uses objects called "Handlers".

A Handler represents a single command, and it holds three things:

A Handler is essentially a function with the signature (State => (Arguments => State)) (it isn't necessarily a pure function, but it's a simple way to look at it). The main interface parses its input, passes its state and any additional arguments to the corresponding Handler, then the Handler executes the command and returns the new state.

This approach makes it very easy to extend Eso with new features. Any new environment variables or components can be added to the State and Config structures without changing any other code, and new commands only require writing the new Handler and adding it to the interface's list of Handlers.

Load Handlers

Eso has several configuration components that can be saved and loaded later. Rather than requiring the user to manually load every part of their preferred configuration on each startup, Eso keeps a list of "Load Handlers" that it runs at startup to automatically load everything from the default locations. This means that if you save your bindings, custom translators, file associations, or anything else you can save, it'll be available to both the persistent and non-persistent interfaces without any action from the user.

Interpreter I/O

As the interpreters are pure functions, internal interaction with the console is out. Instead, Eso leans heavily on something called lazy evaluation.

By default, Scala adopts eager evaluation; every expression is immediately evaluated to its result as it's encountered. My guess is that this is mostly due to Scala's multi-paradigm design, mixing imperative and functional styles.

One of the features common to many functional languages, and indeed available in Scala, is laziness; the ability to hold off evaluation of expressions until they are needed. In Scala, this culminates in one of my favorite data structures: The LazyList.

A LazyList is, as its name implies, a lazily-evaluated linked-list; each element is only evaluated as it is used. This allows you to do lots of things, like a list of all the prime numbers, or simple reductions of consecutive filters and mappings to a single traversal, and so on.

In Eso, both the input and output of an interpreter are LazyLists.

The interface builds a LazyList by reading one Char from the input for each element and passes it to the interpreter, where each time it pulls another element off the list it's effectively reading from the console. The interpreter in turn builds a LazyList by essentially running the program in chunks; the main routine is generally a recursive function that executes instructions until it reaches an output instruction, at which point it returns its current state plus the output. The interpreter sequentially calls this routine for each element, passing the state down the line; when an element is evaluated, the routine is called and steps forward.

The interface actually runs the program by traversing the output list and printing each element.

This preserves pure functionality because LazyLists are immutable; each element is only evaluated once, after that it doesn't change. The input list always represents a single constant input, it just figures out what that input is while it's being used. The interpreter always maps each unique input to a single output.

File and Console I/O

LazyLists are all well and good for interpreters, but command handlers need a bit more freedom. They originally just directly used built-in functions like println() and Java's File API. Once I started writing automated tests, this grew cumbersome.

In the current design command handlers can be given two types of interface objects, one for console I/O and one for file I/O. These objects have methods for reading, printing, and whatever else makes sense, and are used in place of the standard functions. In general, these object can either be a passthrough that goes directly to the real console/file system, or they can have a self-contained environment that is very convenient for testing.

A pleasant side-effect (badum-tss) of this design is that it pushes the effectful I/O still further up the chain, making interface handlers almost pure functions.

How Translators are Used

Aside from directly translating a program using the command, Eso uses translators under the hood to make the interface more flexible.

For the translate command, Eso doesn't just look for a translator between the two specified languages; it instead looks for the shortest chain of translators that can get it from the source language to the target language, then composes the chain into a single function. So, if you translate a program from Ook to FlufflePuff, Eso is actually translating Ook>BrainFuck>FlufflePuff.

When you use the run command, Eso searches for a chain from the source language to the nearest (i.e. shortest translation path) available interpreter. If it finds one, it automatically translates.

The transpile handler searches for the shortest chain from the source language to a known transpiler to the target language; bear in mind that it will only use a single transpiler.

Eso can do this with translators because they are guaranteed to be invertible and to preserve the structure of a program. In essence, no matter how many times you translate it, the program is guaranteed to be the same program. Transpilers can freely change how a program works (they just need to preserve the behavior), which means they're more flexible in terms of functionality but less flexible in terms of how they can be manipulated.

Environment Variables

The environment maintained by Eso has collected quite a few parameters, detailed here:

Boolean values can be set to true with an argument matching the regex (true|yes|t|y|[1-9]) and false matching (false|no|f|n|0), both case-insensitive.

Numeric values may be set either with a numeric argument or with a character in single quotes 'c' (in which case it will be set to that character's ASCII code).

Config Environment

As mentioned earlier, one thing that was necessary to maintain a functional design was to make the environment an input, thus Eso hands off a Config object to language components.

The Config object currently holds:

Caching

If you have caching enabled, Eso will keep a list of built programs to save time on reading and initialization when you run a program multiple times.

The caching scheme is pretty simple thanks to Eso's functional design. Since the interpreter actually returns a function mapping the user input to the program output, the run handler can simply store the function it gets back in a HashMap and grab it from there instead of the interpreter for later runs. The built programs are keyed by the file name and language used, and are bundled with the last modified time of the source file at the time it was built so the run handler knows to build a new one if the code has been changed.

One of the most critical things that Eso must do for this scheme to work is to enforce referential transparency, meaning that the cached program will do exactly the same thing every time it's run. Thankfully, this is a natural consequence of functional programming.

The one bit where this can cause trouble is with the Scala interpreter, since it can run arbitrary Scala code. I recommend turning caching off if you plan to do much with Scala.

Notes

BrainFuck Optimization Strategy

The optimizing BrainFuck interpreter translates the program into an intermediate language in a series of passes:

  1. Clear loops (loops that set the current cell to 0) are replaced with a single instruction.
  2. Repeated instructions are contracted into one instruction. For instance, >>>>> would be contracted to (>,5).
  3. Unbroken sequences of shifts, increments/decrements, and clears are contracted into single 'bulk' operations.
  4. Copy/multiply loops are replaced with a single instruction.
  5. Brackets are assigned the index of their partner to eliminate scrubbing.

User Defined BrainFuck Translators

The best way to define your own translator is to use the command defineBFLang and follow the prompts.

The translators are stored in a JSON, and the names of the translators are listed in an array names "names", thus you are not able to define a BrainFuck translator with the name "names".

On the Scala "Interpreter"

The Scala 'interpreter' compiles the source code to an abstract syntax tree, defines a symbol with that tree, then tries to run the standard "main(args: Array[String]): Unit" method.

Its original purpose is to run programs generated by Eso's transpilers. Currently, the Scala 'interpreter' and the internal compiling interpreters are different; The internal compiling interpreters cast the code to a specific class, while the Scala 'interpreter' only assumes there's a main method. This ultimately means the internal ones are faster but the Scala language component is more versatile.

FracTran Program Format

The FracTran interpreter reads programs as an initial value followed by a list of fractions of the form "n/d", each term separated by a line break. The prime generator program looks like this:

2
17/91
78/85
19/51
23/38
29/33
77/29
95/23
77/19
1/17
11/13
13/11
15/2
1/7
55/1

FracTran++ has identical structure to (and is fully compatible with) FracTran, but with additional syntax detailed here.

P'' Program Format

The P'' interpreter handles Böhm's extended instruction set, with the seven instructions (, ), r, r', L, R, and A. The first line is the ordered alphabet (the first character is the empty symbol), the second line is the initial tape state (blank for empty tape), and all following lines contain the program.

Hello world looks like this:

.Helo wrd!
.
r'L
r'r'L
rrA
r'r'r'L
rrrA
rrrrrA
rrrrA
rrrA
rrA
rrA
rA
A

On the Befunge-98 Interpreter

This is by far the biggest, most complex, and most difficult to get right interpreter in Eso, as it will likely remain for some time.

For a while, there was a glaring issue in the Mycology report: Eso didn't report stack size correctly. There was a very simple reason for this; the Befunge-93 interpreter handles bottomless stacks with LazyLists, meaning the stack listerally is an endless list of 0s. Moving this over to Befunge-98 posed an issue, because Funge-98 adds the ability to query stack sizes, which is always infinite with the LazyList approach.

My first approach to resolve this was just to report the stack size as -1, but that obviously wasn't expected behavior. Eventually, I ended up writing a new collection type that keeps a finite Vector of elements but behaves like a bottomless list when elements are extracted.

Another difficulty comes from the fact that Funge-98 allows the program to query a clock/timer.

*long, shaky sigh*

Ultimately, I settled on a design where clock reads are handled similarly to user input, where it's an immutable LazyList of clock reads that is evaluated as it is used.

Of course, the Funge-98 interpreter also does not support file I/O or the system.exec instruction, as this would break the functional style. Thankfully, these features are actually optional in the spec.

Expect more fingerprints to become available over time. ... Slowly, probably. Whenever I get bored between new language components.

On the LazyK Interpreter

The LazyK parser doesn't support mixing dialects in the same source file. That feature had only one mention in the entire spec, and the author put that mention there to say he didn't know how he felt about it -and implementing it with my current set of parser tools would mean bringing an ugly, inelegant monstrosity into this world where I otherwise have a parser so neat and tidy it makes me smile. Sorry, I don't plan to implement that unless my parser tools evolve to the point where it becomes a 3-4 line solution.

This interpreter is also a little unique in that it's the only interpreter in Eso that uses a mutable state. When an S combinator transforms (Sxyz) into (xz)(yz), the duplicated function z is wrapped into a special expression that collapses both occurrences to the result when either one is evaluated. I did this because the performance gain is literally more than a thousand times.

The reason I had to use mutable states to accomplish this is that I have to perform the evaluation in the same level as the rest of the expression in order to not blow up the call stack. If I simply made the result a lazy val that made a new call to eval, that would introduce a recursion that can't be optimized away with trampolining. The moment I figure out a way to do this without mutable state, though, I'm changing it.

How the Interface Code Works

You may notice odd-looking structures like this in the code for several interface handlers:

Trampoline.doOrElse(state){
  DoOrOp(args.get("s"), "Missing Source File", eio){src =>
    DoOrOp(getLang(args, "l", "s"), "Unrecognized Language or File Extension", eio){lang =>
      DoOrOp(findTranslator(state, lang, state.interpNames), "Language Not Recognized", eio){
        case (inam, t) =>
          DoOrErr(EsoFileReader.readFile(src, normLineFlag), eio){progRaw =>
            DoOrErr(t(progRaw), eio){prog =>
              TimeIt(state.interps(inam)(state.config)(prog)) match{
                case (i, dur) =>
                  DoOrErr(i, eio){r =>
                    DoOrErr(inputs, eio){inp =>
                      TimeIt{tryAll{printer(olim(r(inp)))}} match{
                        case (flg, rdr) => flg match{
                          case Failure(e) =>
                            if(timeFlg) eio.println(s"\nError: $e\nProgram failed in ${rdr}ms")
                            else eio.println(s"\nError: $e")
                          case Success(_) =>
                            if(timeFlg) eio.println(s"\nProgram completed in ${rdr}ms")
                            else eio.println()}}
                      Done{state}}}}}}}}}}

These actually represent most of the core logic. Eso uses Try monads for most of its error handling, as well as Options for a lot of faillable tasks, so if the code were to use a normal imperative or structural style there would be a lot of redundant code for checking whether or not something failed and breaking the flow.

These structures just nest each step into the previous one it's dependent on. You can see that the first layer tries to get the source file, then the second layer tries to figure out which language to use, and so-on. Each layer tries to do something, and either passes the result to the next layer or short-circuits with a failure. This works out to be quite an elegant way to represent multi-step IO interactions.

You'll also notice that the whole thing is wrapped in a Trampoline method. If you were to just arbitrarily nest things like this, you would be pointlessly eating up the call stack -so instead, each layer is actually a wrapper that returns the next layer. So, during execution it's actually going into a layer then coming out of that layer with the next one, only ever going down into one layer at a time. This is called trampolining, and it's a common way to optimize recursive structures that could otherwise blow up the call stack.

Of course, Eso wouldn't need special handling of trampolines if it had full TCE, but that's a matter for future JVM versions to address.

Abstraction Eliminators

For some combinator-calculus languages without support for lambda expressions, Eso provides abstraction eliminators. These are special transpilers that turn all lambda expressions into combinator expressions. For example, if you wanted to run abstraction elimination on an Unlambda program, you would type:

Eso> transpile -s example.txt -o example.unl -sl Lambda_Calculus

Building

I use SBT assembly. This repo should give you everything you need to build it, just put it in a folder, run SBT from that directory, let it do its thing, then run assembly.