Home

Awesome

Metascala

Metascala is a tiny metacircular Java Virtual Machine (JVM) written in the Scala programming language. Metascala is barely 3000 lines of Scala, and is complete enough that it is able to interpret itself metacircularly. Being written in Scala and compiled to Java bytecode, the Metascala JVM requires a host JVM in order to run.

The goal of Metascala is to create a platform to experiment with the JVM: a 3000 line JVM written in Scala is probably much more approachable than the 1,000,000 lines of C/C++ which make up HotSpot, the standard implementation, and more amenable to implementing fun features like continuations, isolates or value classes. The 3000 lines of code gives you:

Although it is far from a complete implementation, Metascala already provides the ability to run untrusted bytecode securely (albeit slowly), since every operation which could potentially cause harm (including memory allocations and CPU usage) is virtualized and can be controlled. Ongoing work includes tightening of the security guarantees, improving compatibility and increasing performance.

Getting Started

<!-- --------------- -->

Metascala requires Scala 2.10 and is built using SBT 12. After checking out the repository, if you have SBT installed, all you need to do is run

sbt
> test-only metascala.features.*

Which will download the dependencies (currently just asm), compile the code, and run the unit tests in the test/scala/features folder. Compiling Metascala could take a minute, but running the unit tests should take less than 10 seconds. These tests exercise individual pieces of functionality available on the JVM: math, methods, classes, exceptions, etc., and verify that the result of executing a method via Metascala is identical to the result of executing it directly via reflection.

Apart from the basic feature tests, metascala also includes basic tests for the garbage collector, Java and Scala library functionality. Lastly, Metascala contains tests for Meta-interpretation, which tests the ability for Metascala to interpret its own source code, which in turn interprets some simple programs (e.g. a square-root calculator). Meta-interpretation is extremely slow, and these tests take several minutes to run.

Implementation

<!-- --------------- -->

Metascala is a simple Scala application, and compiles to Java bytecode like any other Scala program. It is literally a program that loads in a class file, parses it into a data structure and then has a while(true) loop that interprets the bytecodes one by one, updating the internal state of the VM following the JVM Spec and spitting out an answer at the end.

In fact, each Metascala JVM is a single Java object, containing in itself all state relevant to its own computation. Instantiating one and invoking methods using it is simple:

val x = new metascala.VM()
x.invoke("metascala.features.arrays.ArrayStuff", "bubbleSort", Seq(Array(6, 5, 2, 7, 3, 4, 9, 1, 8)))
// Array(1, 2, 3, 4, 5, 6, 7, 8, 9)

Arguments passed into the invoke() method are converted from their real representation into virtual versions (the vrt.* classes) to be handled within the Metascala VM. The return value is similarly converted from a virtual value back to a real value before being given to the caller of invoke().

The main packages of interest in Metascala are:

metascala/imm

An immutable model of the data structures that make up a Java .class file. These are an almost direct conversion of the data structures provided by the ASM Tree API, converted to idiomatic, immutable Scala case classes. These classes should be purely immutable, and should have no dependency on the rest of Metascala.

metascala/opcodes

This contains the logic related to parsing and compiling the Java bytecode instructions from the default hybrid stack/register format into an SSA bytecode, simplifying it in the process. Currently Metascala does not perform any real optimizations on the bytecode apart from linking up class/method names with the relevant classes and methods, but the SSA format should make it easier to perform these operations in the future.

metascala/rt

Runtime data-structures that make up the JVM: threads, classes, methods, objects and arrays. These classes also contain the mutable state associated with these constructs (e.g. static class fields) or Metascala-specific optimizations (e.g. pre-computed vtables).

metascala/natives

Trapped methods, or Bindings, which when called within the Metascala VM result in some interaction with the Host VM. There is a default implementation of bindings, but it can easily be swapped out for a custom version of Bindings e.g. to redirect filesystem access, or mock out currentTimeMillis() with a custom time. All interaction between the code inside the VM and the external world takes place through these Bindings.


Many concepts have classes in several of these packages representing them. For example, the abstract idea of a Java "Class" is modelled by:

These types are always referred to by their qualified names in the source code (i.e. imm.Cls rather than simply Cls) in order to avoid confusion between them or name collisions.

Compatibility

<!-- --------------- -->

Metascala implements a subset of the Java Virtual Machine Specification. The implementation has been mostly focused on the features that Metascala needs to run. However, Metascala does not require (and hence does not implement) several things such as:

Apart from the language specification, there is a large amount of functionality in the JVM which is from native methods. These are required for the JVM to interact with the outside world in any way, and again Metascala only implements those which were necessary to interpret itself, leaving out a lot of other basic things such as

Nonetheless, Metascala is compatible enough to interpret itself: a moderately sized Scala program which makes heavy use of the standard library, some basic reflection, and some external Java libraries (Basically only ASM right now).

MetaScala has been tested on Windows 7 using the Sun JVM (Java 7), and Ubuntu 12.04 using OpenJDK 7.

Performance

<!-- --------------- -->

The performance of Metascala is absolutely abysmal: it performs basic optimizations (e.g. pre-calculating virtual-dispatch tables and maintaining invoke-interface caches), but not much more. This is partly because it is written in Scala in a mostly immutable, functional style, and that results in overhead (lots of extra allocations and method calls) over an imperative, mutable style. The other part is probably a fundamental limitation of being an interpreter, and any major increase in performance would require pretty fundamental changes to the system.

This can easily be improved a great deal in the future: micro-bottlenecks (e.g. for-comprehensions) can be optimized, and the SSA bytecode is amenable to analysis and optimization. Nonetheless, my focus so far has been on completeness and compliance; performance optimizations will have to wait.

Security

<!-- --------------- -->

Part of Metascala's goal is to allow the developer to safely run arbitrary untrusted bytecode. Metascala's approach to security is completely separate from the JVM's existing security model. In short: everything is virtualized, and everything is controlled. Because it reads and interprets each and every bytecode one by one, code executed with the Metascala VM cannot:

There are still some weaknesses in Metascala's security model: time spent garbage collecting isn't accounted for, neither is memory not directly allocated but required by the VM's auxiliary data structures (e.g. classes). These provide an attacker means to consume more resources than they should be allowed to, and solving this is part of the ongoing work.

Ongoing Work

<!-- --------------- -->

Immediate work includes:

Feel free to contact me (below) or open an issue/send a pull request if you're interested and want to help out. Contributions are welcome!

Fun Facts

<!-- --------------- -->

Credits

<!-- --------------- -->

Copyright (c) 2013, Li Haoyi (haoyi.sg at gmail.com)

Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the "Software"), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.