Awesome
DCLua - Go Lua Compiler and VM:
This is a Lua 5.3 VM and compiler written in Go. This is intended to allow easy embedding into Go programs, with minimal fuss and bother.
I have been using this VM/compiler as the primary script host in Rubble (a scripted templating system used to generate data files for the game Dwarf Fortress) for over a year now, so they are fairly well tested. In addition to the real-world "testing" that this has received I am slowly adding proper tests based on the official Lua test suite. These tests are far from complete, but are slowly getting more so as time passes.
Most (if not all) of the API functions may cause a panic, but only if things go REALLY wrong. If a function does not state that it can panic or "raise an error" it will only do so if a critical internal assumption proves to be wrong (AKA there is a bug in the code somewhere). These errors will have a special prefix prepended onto the error message stating that this error indicates an internal VM bug. If you ever see such an error I want to know about it ASAP.
That said, if an API function can "raise an error" it can and will panic if something goes wrong. This is not a problem inside a native function (as the VM is prepared for this), but if you need to call these functions outside of code to be run by the VM you may want to use Protect or Recover to properly catch these errors.
The VM itself does not provide any Lua functions, the standard library is provided entirely by other packages. This means that the standard library never does anything that your own code cannot do (there is no "private API" that is used by the standard library).
Anything to do with the OS or file IO is not provided. Such things do not belong in the core libraries of an embedded scripting language (do you really want scripts to be able to read and write random files without restriction?).
All functions (including most of the internal functions) are documented to one degree or another, most quite well. The API is designed to be easy to use, and everything was added because I needed it. There are no "bloat" functions added because I thought they could be useful.
Note that another version of this exists over at ofunc/lua. That version has some interesting changes/features, I suggest you give it look to see if it suits your needs better.
Loading Code:
This VM fully supports binary chunks, so if you want to precompile your script it is possible. To precompile a script
for use with this VM you can either build a copy of luac
(the reference Lua compiler) or use any other third party Lua
complier provided that it generates code compatible with the reference compiler. There is no separate compiler binary
that you can build, but it wouldn't be hard to write one. Note that the VM does not handle certain instructions in pairs
like the reference Lua VM does, and I don't remember if I made the compiler take advantage of this or not. If I did then
binaries generated by my compiler may not work with the reference VM.
If you want to use a third-party compiler it will need to produce binaries with the following settings:
- 64 or 32 bit pointers (C type
size_t
), 64 bit preferred. - 32 bit integers (C type
int
). - 64 bit float numbers.
- 64 bit integer numbers.
- Little Endian byte order.
When building the reference compiler on most systems these settings should be the default.
The VM API has a function that wraps luac
to load code, but the way it does this may or may not fit your needs. To use
this wrapper you will need to have luac
on your path or otherwise placed so the VM can find it. See the documentation
for State.LoadTextExternal
for more information. Keep in mind that due to limitations in Go and luac
, this function
is not reentrant! If you need concurrency support it would be better to use State.LoadBinary
and write your own wrapper.
The default compiler provided by this library does not support constant folding, and some special instructions are not used at all (instead preferring simpler sequences of other instructions). Expressions use a simple "recursive" code generation style, meaning that it wastes registers like crazy in some (rare) cases.
One of the biggest code quality offenders is or
and and
, as they can result in sequences like this one:
[4] LT A:1 B:r(0) C:k(2) ; CK:5
[5] JMP A:0 SBX:1 ; to:7
[6] LOADBOOL A:2 B:1 C:1
[7] LOADBOOL A:2 B:0 C:0
[8] TEST A:2 C:1
[9] JMP A:0 SBX:7 ; to:17
[10] EQ A:1 B:r(1) C:k(3) ; CK:<nil>
... (7 more instructions to implement next part of condition)
As you can see this is terrible. That sequence would be better written as:
[4] LT A:1 B:r(0) C:k(2) ; CK:5
[5] JMP A:0 SBX:2 ; to:8
[6] EQ A:1 B:r(1) C:k(3) ; CK:<nil>
... (1 more instruction to implement next part of condition)
But the current expression compiler is not smart enough to do it that way. Luckily this is the worst offender, most
things produce code that is very close or identical to what luac
produces. Note that the reason why this code is so
bad is entirely because the expression used or
(and the implementation of and
and or
is very bad).
To my knowledge there is only one case where my compiler does a better job than luac
, namely when compiling loops or
conditionals with constant conditions, impossible conditions are elided (so if you say while false do x(y z) end
the
compiler will do nothing). AFAIK there is no way to jump into such blocks anyway, so eliding them should have no effect
on the correctness of the program.
The compiler provides an implementation of a continue
keyword, but the keyword definition in the lexer is commented
out. If you want continue
all you need to do is uncomment the indicated line (near the top of ast/lexer.go
). There
is also a flag in the VM that should make tables use 0 based indexing. This feature has received minimal testing, so
it probably doesn't work properly. If you want to try 0 based indexing just set the variable TableIndexOffset
to 0.
Note that TableIndexOffset
is strictly a VM setting, the standard modules do not respect this setting (for example the
table
module and ipairs
will still insist on using 1 as the first index).
Missing Stuff:
The following standard functions/variables are not available:
collectgarbage
(not possible, VM uses the Go collector)dofile
(violates my security policy)loadfile
(violates my security policy)xpcall
(VM has no concept of a message handler)package.config
(violates my security policy)package.cpath
(VM has no support for native modules)package.loadlib
(VM has no support for native modules)package.path
(violates my security policy)package.searchpath
(violates my security policy)string.gmatch
(No pattern matching support)string.gsub
(No pattern matching support)string.match
(No pattern matching support)string.pack
(too lazy to implement, ask if you need it)string.packsize
(too lazy to implement, ask if you need it)string.unpack
(too lazy to implement, ask if you need it)
The following standard modules are not available:
coroutine
(no coroutine support yet, ask if you need it)io
(violates my security policy)os
(violates my security policy)debug
(violates my security policy, if you really need something from here ask)
Coroutine support is not available. I can implement something based on goroutines fairly easily, but I will only do so if someone actually needs it and/or if I get really bored...
In addition to the stuff that is not available at all the following functions are not implemented exactly as the Lua 5.3 specification requires:
string.find
does not allow pattern matching yet (the fourth option is effectively always set totrue
).- Only one searcher is added to
package.searchers
, the one for finding modules inpackage.preloaded
. next
is not reentrant for a single table, as it needs to store state information about each table it is used to iterate. Starting a new iteration for a particular table invalidates the state information for the previous iteration of that table. Never use this function for iterating a table unless you absolutely have to, use the non-standardgetiter
function instead.getiter
works the waynext
should have, namely it uses a single iterator value that stores all required iteration state internally (the way the defaultnext
works is only possible if your hash table is implemented a certain way).
Finally there are a few things that are implemented exactly as the Lua 5.3 specification requires, where the reference Lua implementation does not follow the specification exactly:
- The
#
(length) operator always returns the exact length of a (table) sequence, not the total length of the array portion of the table. See the comment intable.go
(about halfway down) for more details (including quotes from the spec and examples). - My modulo operator (
%
) is implemented the same way most languages implement it, not the way Lua does. This does not matter unless you are using negative operands, in which case it may not provide the results a Lua programmer may expect (although C or Go programmers will be fine :P).
The following core language features are not supported:
-
Hexadecimal floating point literals are not supported at this time. This "feature" is not supported for two reasons: I hate floating point in general (so trying to write a converter is pure torture), and when have you ever used hexadecimal floating point literals? Lua is the only language I have ever used that supports them, so they are not exactly popular...
-
Weak references of any kind are not supported. This is because I use Go's garbage collector, and it does not support weak references.
-
I do not currently support finalizers. It would probably be possible to support them, but it would be a lot of work for a feature that is of limited use (I have only ever needed to use a finalizer once, ironically in this library). If you have a compelling reason why you need finalizers I could probably add them...
-
The reference compiler allows you to use
goto
to jump to a label at the end of a block ignoring any variables in said block. For example:do goto x local a ::x:: end
My compiler does not currently allow this, treating it as a jump into the scope of a local variable. I consider this a bug, and will probably fix it sooner or later...
Note that AFAIK there is nothing in the Lua spec that implies this is allowed, but it seems like a logical thing to permit so I suppose I'll have to fix it, sigh.
TODO:
Stuff that should be done sometime. Feel free to help out :)
The list is (roughly) in priority order.
- Write more tests for the compiler and VM.
- (supermeta) Allow using byte slices as strings and vice-versa. Maybe attach a method to byte slices that allows conversion
back and forth? (this would probably be fairly easy to do)
- Do the same with rune slices?
- Write better stack traces for errors.
- Improve compilation of
and
andor
. - Fix jumping to a label at the end of a block.
- Fix
CONCAT
so it performs better when there is a value with a__concat
metamethod. - (supermeta) Look into allowing scripts to call functions/methods. It's certainly possible, but possibly difficult (possible not as difficult as I think).
- Marshaling the AST as XML works poorly at best. The main problem is that some items retain their type info, and others have it stripped in favor of their parent field name.
Changes:
A note on versions:
For this project I more-or-less follow semantic versioning, so I try to maintain backwards compatibility across point releases. That said I feel free to break minor things in the name of bugfixes. Read the changelog before upgrading!
1.1.8
- Fixed
State.ConvertString
so it actually works (based on PR #21 by ofunc) - Fixed a really weird issue where statements that started with a parenthesized expression would be assumed to be a function call and error out if they were not. (Fixed #23)
- Fixed issue in
string.byte
. (PR #24 by ofunc) - Fixed
math.huge
to have the correct value. (PR #25 by ofunc) - Added
utf8
package (github.com/milochristiansen/lua/lmodutf8
) (PR #26 by ofunc)
1.1.7
- Function calls or parenthesized expressions that are followed by table indexers are now properly compiled (Fixed #13).
- The compiler sometimes did not always mark "used" the proper number of registers when compiling identifiers (Fixed #16).
- Fixed the table iterator not finalizing (Fixed #17).
- Removed my hacky slice library and just did things properly (Fixed #18).
- Fix
pcall
not returningtrue
on success (Fixed #19). - Fixed setting a nil index in a table not raising an error (Fixed #20).
1.1.6
Fun with tables! Ok, not so much fun.
- Fixed scripts with lots of constants overflowing RK fields in certain instructions. The proper constant load instructions are emitted in this case now.
- Tables with lots of empty space at the beginning of the array portion will no longer cause crashes when the array portion is resized.
1.1.5
And, another stupid little bug.
- Constructs similar to the following
[=[]==]]=]
were not working properly. The lexer was not properly constructing the lexeme, and it would return the wrong number of equals signs and it would eat the last square bracket. As a bonus I greatly simplified the string lexing code. (ast/lexer.go)
1.1.4
Not sure how I missed this one... Oh well, it should work now.
require
was not checkingpackage.loaded
properly. (lmodpackage/functions.go)
1.1.3
One of the tests was failing on 32 bit systems, now it isn't.
- Integer table keys that fit into a script integer but not a system default int value will no longer be truncated sometimes. Such keys were always supposed to go in the hash part of the table, but before this fix the keys were being truncated first in some cases. (table.go)
1.1.2
More script tests, but no real compiler bugs this time. Instead I found several minor issues with a few of the API functions and a few other miscellaneous VM issues (mostly related to metatables).
This version also adds a minor new feature, nothing to get excited about... Basically I made it so that JSON or XML encoding an AST produces slightly more readable results for operator expression nodes. Someone else suggested the idea (actually they submitted a patch, yay them!). I never would have thought to do this myself (never needed it), but now that I have it, it seems like it could be useful for debugging the compiler among other things.
Unfortunately due to the way the AST and most encodings work, it is impossible to unmarshal the AST. I am not 100% sure if it is possible with XML or not, but it certainly will not work with JSON. This could maybe be fixed, but would be way too much work.
Anyway, these improvements are still useful if you want to examine the AST for whatever reason...
- Added another set of script tests. (script_test.go)
- Fixed the
tostring
script function and theConvertString
API function so they pass the return value from a_tostring
metamethod through unchanged (instead of converting the result to a string, for example"nil"
). (api.go) - You may now use
nil
as a metatable value for theSetMetaTable
API function (and thesetmetatable
script function). (api.go) - Made some changes to the compiler tester so it is easier to tweak the output for specific error types (for example it is now possible to suppress the assembly listing). (test.go)
- Fixed variadic functions that also have named parameters (this was an issue with the new stack frame code, not the compiler). (stack.go)
- Fixed comparisons of non-matching types via metamethods, before if the types did not match the comparison would always fail (I must have been sleep deprived when I wrote that bit). (value.go)
- Changed the way greater-than and greater-than or equals (
>
and>=
) where implemented. The old way was2 > 1 == !(2 <= 1)
and2 >= 1 == !(2 < 1)
, the new way is2 > 1 == 1 < 2
and2 >= 1 == 1 <= 2
. The old way worked fine (and was slightly easier to implement), but it was not what the spec required (and so could cause problems with metamethods). (compile_expr.go) - Changed the way the
CONCAT
instruction is implemented so that the__concat
metamethod works correctly. Warning: using__concat
even once will cause string concatenation performance to nosedive for that group of concatenation operations! (vm.go) - Fixed the
rawset
andrawget
script functions so they ignore extra arguments. (lmodbase/functions.go) - Values with a
__newindex
metamethod that is a table now properly use a regular set (triggers metamethods) when indexing this table. I'm not sure how I missed this, I did it properly for gets (__index
). (value.go) - AST operator type constants now marshal and unmarshal as text rather than raw (more-or-less meaningless) integers. Also
when printed via
fmt
functions they should use text names by default. (ast/expr.go, commit by "erizocosmico") - When encoded as JSON, each Node now has an extra key named after the node type that contains an object with all fields common to all nodes (namely the line number). This greatly enhances readability of a JSON encoded AST. (ast/ast.go)
1.1.1
More script tests, more compiler bugs fixed. Same song, different verse.
- Added another set of script tests. (script_test.go)
- Fixed unary operators after a power operator, for example
2 ^ - -2
. To fix this issue I totally rewrote how operators are parsed. (ast/parse_expr.go) - Fixed semicolons immediately after a return statement. (ast/parse.go)
- Fixed an improper optimization or repeat-until loops. Basically if the loop had a constant for the loop condition its sense was being reversed (so a false condition resulted in the loop being compiled as a simple block, and a true condition resulted in an infinite loop). (compile.go)
- Fixed
and
in non-boolean contexts. Alsoand
andor
may produce slightly better code now. (compile_expr.go)
1.1.0
I was a little bored recently, so I threw together a generic metatable API. It was a quick little project, based on earlier work for one of my many toy languages. This new API is kinda cool, but it in no way replaces proper metatables! Basically it is intended for quick projects and temporarily exposing data to scripts. It was fun to write, and so even if no one uses it, it has served its purpose :P
I really should have been working on more script tests, but this was more fun... I have no doubt responsibility will reassert itself soon.
Anyway, I also added two new convenience methods for table iteration, as well as some minor changes to the old one (you can still use it, but it is now a thin wrapper over one of the new functions, so you shouldn't).
- Ran all code through
go fmt
. I often forget to do this, but I recently switched to a new editor that formats files automatically whenever they are saved. Anyway, everything is formatted now. (almost every file in minor ways) - Added
Protect
andRecover
, simple error handlers for native code. They are to be used when calling native APIs outside of code otherwise protected (such as by a call to PCall).Recover
is the old handler fromPCall
, wrapped so it can be used by itself.Protect
simply wrapsRecover
so it is easier to use. (api.go) - Added
ForEachRaw
, basicallyForEachInTable
, but the passed in function returns a boolean specifying if you want to break out of the loop early. In other newsForEachInTable
is now depreciated. (api.go) - Added
ForEach
, a version ofForEachRaw
that respects the__pairs
metamethod.ForEachRaw
uses the table iterator directly and does much less stack manipulation, so it is probably a little faster. (api.go) - Added a new sub-package:
supermeta
adds "generic" metatables for just about any Go type. For obvious reasons this makes heavy use of reflection, so it is generally much faster to write your own metatables, that said this is really nice for quickly exposing native data to scripts. From the user's perspective you just callsupermeta.New(l, &object)
andobject
is suddenly a script value on the top ofl
's stack. Arrays, slices, maps, structs, etc should all work just fine. Note that this is very new, and as of yet has received little real-world testing! (supermeta/supermeta.go, supermeta/tables.go) - Added a new sub-package:
testhelp
contains a few test helper functions I find useful when writing tests that interact with the VM. Better to have all this stuff in one place rather than copied and pasted all over... (testhelp/testhelp.go) - Modified the script tests in the base package to use the helper functions in
testhelp
rather than their own copies. The API tests still have their own copies of some of the functions, as they need to be in the base package so they can access internal APIs (stupid circular imports). (script_test.go) - Clarified what API functions may panic, I think I got them all... (api.go)
1.0.2
More tests, more (compiler) bugs fixed. Damn compiler will be the death of me yet...
In addition to the inevitable compiler bugs I also fixed the way the VM handles upvalues. Before I was giving each closure its own copy of each upvalue, so multiple closures never properly shared values. This change fixes several subtle (and several not so subtle) bugs.
Oh, and pcall
works now (it didn't work at all before. Sorry, I never used it).
- Added more script tests. I still have a lot more to do... (script_test.go)
- Fixed incorrect compilation of method declarations (
function a:x() end
). Depressingly the issue was only one incorrect word, but it resulted in very wrong results (I am really starting to remember why I hated writing the compiler, the VM was fun, the compiler... not.) (ast/parse.go) - Parenthesized expression that would normally (without the parenthesis) return multiple values (for example:
(...)
) were not properly truncating the result to a single value. (compile_expr.go) - Fixed a semi-major VM issue with upvalues. Closures that should have a single shared upvalue were instead each using their own private copy after said upvalue was closed. This required an almost total rewrite of the way upvalues are stored internally. (all over the place, but mainly callframe.go, function.go, api.go, and vm.go)
- JMP instructions created by
break
andcontinue
statements are now properly patched by the compiler to close any upvalues there may be. (compile.go) - Fixed the
pcall
script function so it actually works. (lmodbase/functions.go) - On a recovered error each stack frame's upvalues are closed before the stack is stripped. This corrects incorrect behavior that arises when a function stores a closure to an unclosed upvalue then errors out (the closure may still be referenced, but it's upvalues may be invalid). (api.go, callframe.go)
1.0.1
This version adds a bunch of tests (still not nearly as many as I would like), and fixes a ton of minor compiler errors. Most of the compiler errors were simple oversights, usually syntax constructs that I never used in my own code (and hence never tested).
The VM itself seems to be mostly bug free, but the compiler is a different story. I'm fixing bugs as fast as I discover
them, but sometimes it's really tempting to just use luac
and call it a day :P
- Fixed a issue with State.Pop possibly causing a panic if you pop values when the stack is empty (or if you try to pop more values than the stack contains), it now does nothing in this case. (stack.go)
- Added some tests for the VM native API (api_test.go)
- Added some script tests based on the official Lua 5.3 test suite. These tests are not (even close to) complete yet, (many) more are on the way. (script_test.go)
- Added a
String
method toSTypeID
to match the one forTypeID
. (value.go) - Made the custom
string
module extensions optional. (lmodstring/functions.go, lmodstring/README.md) - Fixed an issue with the
ForEachInTable
helper function, it left the table iterator object on the stack when it returned. (api.go) - Fixed inexplicably missing lexer entry for the semicolon (I know it was there before, it must have gotten removed by accident at some point). (ast/lexer.go)
- Lexer errors now contain the line number where the problem resides (or at least close to it). (ast/parse.go)
- Fixed that numeric for loops required all three arguments. I always use the full form, so I forgot that a short two argument form is legal... (ast/parse.go)
- Fixed that you could not repeat two unary operators in a row. (ast/parse_expr.go)
- You may now use semicolons as well as commas as field separators in table constructors (did you know that was legal? I didn't until I rechecked the BNF). (ast/parse_expr.go)
- Fixed certain cases in expression/name parsing. Some things are less permissive, others are more. (ast/parse.go ast/parse_expr.go)
- Fixed certain multiple assignment statements involving table assignments and direct assignments to the same variable.
If the table assignment came first the direct assignment would clobber its register/upvalue and you would get an error
or (even worse) unexpected behavior. This affected statements such as the following:
local a = {}; a[1], a = 1, 1
(compile.go) - All numeric constants were always being treated as floats, leading to errors when you tried to use a hexadecimal constants (and probably other subtle issues). (ast/lexer.go)
- You may now use the shorthand null string escape sequence ('\0'). Thank you to whoever wrote the Lua spec, not having a proper list of valid escape sequences is really helpful /s. (ast/lexer.go)
- Both sides of a shift are now converted to an unsigned integer for the duration of the shift, then converted back to the proper signed type. This resolves some strangeness with bitwise shifts. (value.go)
- Removed various debugging print statements that I forgot to remove earlier. The only ones still in were a few that printed just before an error triggered, so it is unlikely anyone ever saw one... (all over the place)