Home

Awesome

Opal: Monadic Parser Combinators for OCaml

Opal is a minimum collection of useful parsers and combinators (~150 loc of OCaml) that makes writing parsers easier. It is designed to be small, self-contained, pure-functional, and only includes most essential parsers, so that one could include single file in the project or just embed it in other OCaml source code files.

I find myself writing lots of recursive-descent parsers from scratch in OCaml when I was solving Hackerrank FP challenges. That's why I wrote opal: to include it in the source code and build parsers on top of parser combinators easily.

Example

Trivial arithmetic calculator:

open Opal

let parens = between (exactly '(') (exactly ')')
let integer = many1 digit => implode % int_of_string
let add = exactly '+' >> return ( + )
let sub = exactly '-' >> return ( - )
let mul = exactly '*' >> return ( * )
let div = exactly '/' >> return ( / )

let rec expr input = chainl1 term (add <|> sub) input
and term input = chainl1 factor (mul <|> div) input
and factor input = (parens expr <|> integer) input

let () =
  let input = LazyStream.of_channel stdin in
  match parse expr input with
  | Some ans -> Printf.printf "%d\n" ans
  | None -> print_endline "ERROR!"

For non-trivial examples, see Hackerrank challenge solutions using opal in examples/.

Documentation

The expressiveness of parser combinators are attributed to higher-order functions and the extensive use of currying. However, due to lack of do syntax, the bind operation of monad would not be as succinct as that in Haskell.

A parser monad is either None (indicates failure), or Some pair of result and unconsumed input, where the result is a user-defined value. The input is a lazy stream of arbitrary token type. A parser is a function that accepts an input and returns a parser monad. Although most parsers in opal is polymorphic over token type and result type, some useful parsers only accepts char as input token type.

Since combinators in opal are roughly based on Haskell's Parsec. The following documentation is somehow a rip-off of Parsec's doc.

Lazy Stream

type 'a LazyStream t

Polymorphic lazy stream type.

val LazyStream.of_stream : 'a Stream.t -> 'a LazyStream.t

Build a lazy stream from stream.

val LazyStream.of_function : (unit -> 'a) -> 'a LazyStream.t

Build a lazy stream from a function f. The elements in the stream is populated by calling f ().

val LazyStream.of_string : string -> char LazyStream.t

Build a char lazy stream from string.

val LazyStream.of_channel : in_channel -> char LazyStream.t

Build a char lazy stream from a input channel.

Utilities

val implode : char list -> bytes

Implode character list into a string. Useful when used with many.

val explode : bytes -> char list

Explode a string into a character list.

val ( % ) : ('a -> 'b) -> ('b -> 'c) -> 'a -> 'c

Infix operator for left-to-right function composition. (f % g % h) x is equivalent to h (g (f x)).

val parse : ('token, 'result) parser -> 'token LazyStream -> 'result option

parse parser input parses input with parser, and returns Some result if succeed, or None on failure.

Primitives

type 'token input = 'token LazyStream.t

type ('token, 'result) monad = ('result * 'token input) option

type ('token, 'result) parser = 'token input -> ('token, 'result) monad

A parser is a function that accepts an input and returns either None on failure, or Some (result, input'), where result is user-defined value and input' is unconsumed input after parsing.

val return : 'result -> 'token input -> ('token, 'result) monad

Accepts a value and an input, and returns a monad.

val ( >>= ) : ('t, 'r) parser -> ('r -> ('t, 'r) monad) -> ('t, 'r) parser

x >>= f returns a new parser that if parser x succeeds, applies function f on monad produced by x, and produces a new monad (a.k.a. bind).

val ( let* ) : ('t, 'r) parser -> ('r -> ('t, 'r) monad) -> ('t, 'r) parser

This operator is the same as >>= but using the let notation. It is usefull to avoid ugly sequences of bindings. For exemple, p >>= fun x -> f x can be rewritten let* x = p in f x. Combined with the return function, we can define complex parsers :

let tuple_parser =
  let* x = digit in
  let* _ = exactly ',' in
  let* y = digit in
  return (x, y)

val ( <|> ) : ('t, 'r) parser -> ('t, 'r) parser -> ('t, 'r) parser

Choice combinator. The parser p <|> q first applies p. If it succeeds, the value of p is returned. If p fails, parser q is tried.

val mzero : 'a -> ('t, 'r) monad

A parser that always fails.

val any : ('t, 'r) parser

The parser succeeds for any token in the input. Consumes a token and returns it.

val satisfy : ('t -> bool) -> ('t, 'r) parser

The parser satisfy test succeeds for any token for which the supplied function test returns true. Returns the token that is actually parsed.

val eof : 'a -> ('t, 'a) parser

The parser eof x succeeds if the input is exhausted. Returns value x.

Derived

val ( => ) : ('t, 'r) parser -> ('r -> 'a) -> ('t, 'a) parser

Map combinator. x => f parses x. If it succeeds, returns the value of x applied with f.

val ( >> ) : ('t, 'a) parser -> ('t, 'b) parser -> ('t, 'b) parser

Ignore-left combinator. x >> y parses x and then y. Returns the value returned by y.

val ( << ) : ('t, 'a) parser -> ('t, 'b) parser -> ('t, 'a) parser

Ignore-right combinator. x >> y parses x and then y. Returns the value returned by x.

val ( <~> ) : ('t, 'r) parser -> ('t, 'r list) parser -> 't input -> ('t, 'r list) monad

Cons combinator. x <~> y parses x and then y. Returns the value of x prepended to the value of y (a list).

let ident = letter <~> many alpha_num

val choice : ('t, 'r) parser list -> ('t, 'r) parser

choice ps tries to apply the parsers in the list ps in order, until one of them succeeds. Returns the value of the succeeding parser.

val count : int -> ('t, 'r) parser -> 't input -> ('t, 'r list) monad

count n parses n occurrences of p. If n is smaller or equal to zero, the parser equals to return []. Returns a list of n values returned by p.

between : ('t, 'a) parser -> ('t, 'b) parser -> ('t, 'r) parser -> ('t, 'r) parser

between open close p parses open, followed by p and close. Returns the value returned by p.

let braces = between (exactly '{') (exactly '}')

val option : 'r -> ('t, 'r) parser -> ('t, 'r) parser

option default p tries to apply parser p. If p fails, it returns the value default, otherwise the value returned by p.

let priority = option 0 (digit => String.make 1 % int_of_string)

val optional : 'r -> ('t, 'r) parser -> ('t, unit) parser

optional p tries to apply parser p. It will parse p or nothing. It only fails if p fails. Discard the result of p.

val skip_many : ('t, 'r) parser -> ('t, unit) parser

skip_many p applies p zero or more times, skipping its result.

let spaces = skip_many space

val skip_many1 : ('t, 'r) parser -> ('t, unit) parser

skip_many1 p applies p one or more times, skipping its result.

val many : ('t, 'r) parser -> 't input -> ('t, 'r list) monad

many p applies the parser p zero or more times. Returns a list of returned values of p.

val many1 : ('t, 'r) parser -> 't input -> ('t, 'r list) monad

many1 p applies the parser p one or more times. Returns a list of returned values of p.

val sep_by : ('t, 'r) parser -> ('t, 'a) parser -> 't input -> ('t, 'r list) monad

sep_by p sep parses zero or more occurrences of p, separated by sep. Returns a list of values returned by p.

let comma_sep p = sep_by p (token ",")

val sep_by1 : ('t, 'r) parser -> ('t, 'a) parser -> 't input -> ('t, 'r list) monad

sep_by1 p sep parses one or more occurrences of p, separated by sep. Returns a list of values returned by p.

val end_by: ('t, 'r) parser -> ('t, 'a) parser -> ('t, 'r) parser

end_by p sep parses zero or more ocurrences of p, separated and ended by sep. Returns a list of values returned by p.

let statements = end_by statement (token ";")

val end_by1: ('t, 'r) parser -> ('t, 'a) parser -> ('t, 'r) parser

end_by1 p sep parses one or more ocurrences of p, separated and ended by sep. Returns a list of values returned by p.

val chainl : ('t, 'r) parser -> ('t, 'r -> 'r -> 'r) parser -> 'r -> ('t, 'r) parser

chainl p op default parses zero or more occurrences of p, separated by op. Returns a value obtained by a left associative application of all functions by op to the values returned by p. If there are zero occurences of p, the value default is returned.

val chainl1 : ('t, 'r) parser -> ('t, 'r -> 'r -> 'r) parser -> ('t, 'r) parser

chainl1 p op parses one or more occurrences of p, separate by op. Returns a value obtained by a left associative application of all functions returned by op to the values returned by p. This parser can be used to eliminate left recursion which typically occurs in expression grammars. See the arithmetic caculator example above.

val chainr : ('t, 'r) parser -> ('t, 'r -> 'r -> 'r) parser -> 'r -> ('t, 'r) parser

chainr p op default parses zero or more occurrences of p, separated by op. Returns a value obtained by right associative application of all functions returned by op to the values returned by p. If there are no occurrences of p, the value x is returned.

val chainr1 : ('t, 'r) parser -> ('t, 'r -> 'r -> 'r) parser -> ('t, 'r) parser

chainr p op parses one or more occurrences of p, separated by op. Returns a value obtained by right associative application of all functions returned by op to the values returned by p.

Singletons

val exactly : 'r -> ('t, 'r) parser

exactly x parses a single token x. Returns the parsed token (i.e. x).

let semi_colon = exactly ';'

val one_of : 'r list -> ('t, 'r) parser

one_of xs succeeds if the current token is in the supplied list of tokens xs. Returns the parsed token.

let vowel = one_of ['a'; 'e'; 'i'; 'o'; 'u']

val none_of : 'r list -> ('t, 'r) parser

As the dual of one_of, none_of xs succeeds if the current token not in the supplied list of tokens xs. Returns the parsed token.

let consonant = none_of ['a'; 'e'; 'i'; 'o'; 'u']

val range : 'r -> 'r -> ('t, 'r) parser

range low high succeeds if the current token is in the range between low and high (inclusive). Returns the parsed token.

Char Parsers

val space = (char, char) parser

Parses a white space character ('\s\t\r\n'). Returns the parsed character.

val spaces = (char, unit) parser

Skip zero or more white spaces characters.

val newline = (char, char) parser

Parses a newline character ('\n'). Returns a newline character.

val tab = (char, char) parser

Parses a tab character ('\t'). Returns a tab character.

val upper = (char, char) parser

Parses an upper case letter (a character between 'A' and 'Z'). Returns the parsed character.

val lower = (char, char) parser

Parses a lower case letter (a character between 'a' and 'z'). Returns the parsed character.

val digit : (char, char) parser

Parses a digit. Returns the parsed character.

val letter = (char, char) parser

Parses a letter (an upper case or lower case letter). Returns the parsed character.

val alpha_num = (char, char) parser

Parses a letter or digit. Returns the parser character.

val hex_digit = (char, char) parser

Parses a hexadecimal digit (a digit or a letter between 'a' and 'f' or 'A' and 'F'). Returns the parsed character.

val oct_digit = (char, char) parser

Parses an octal digit (a character between '0' and '7'). Returns the parsed character.

Lex Helper

val lexeme : (char, 'r) parser -> (char, 'r) parser

lexeme p first applies skip_many space and then parser p. Returns the value returned by p.

val token : string -> char input -> (char, char list) monad

token s skips leading white spaces and parses a sequence of characters given by string s. Returns the parsed character sequence as a list.

div_or_mod = token "div" <|> token "mod"