Masala Parser: Javascript Parser Combinators

Masala Parser is inspired by the paper titled: Direct Style Monadic Parser Combinators For The Real World.

Masala Parser is a Javascript implementation of the Haskell Parsec. It is plain Javascript that works in the browser, is tested with more than 450 unit tests, covering 100% of code lines.

Use cases

Masala Parser's keywords are simplicity, variations and maintainability. You do not need a theoretical background in languages for extraction or validation use cases.

Masala Parser has relatively good performance; however, Javascript is obviously not the fastest platform.

Usage

With Node Js or modern build

    npm install -S @masala/parser

Or in the browser

Check the Change Log if you come from a previous version.

Reference

You will find a Masala Parser online reference, generated from the TypeScript interface.

Quick Examples

Hello World

const helloParser = C.string('hello');
const white = C.char(' ');
const worldParser = C.string('world');
const combinator = helloParser.then(white.rep()).then(worldParser);

Floor notation

// N: Number Bundle, C: Chars Bundle
const {Streams, N, C}= require('@masala/parser');

const stream = Streams.ofString('|4.6|');
const floorCombinator = C.char('|').drop()
    .then(N.number())      // we have ['|', 4.6], we drop '|'
    .then(C.char('|').drop())   // we have [4.6, '|'], we keep [4.6]
    .single() // we had [4.6], now just 4.6
    .map(x =>Math.floor(x));

// The parser parses a stream of characters
const parsing = floorCombinator.parse(stream);
assertEquals( 4, parsing.value, 'Floor parsing');

Explanations

According to Wikipedia "in functional programming, a parser combinator is a higher-order function that accepts several parsers as input and returns a new parser as its output."
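To make that definition concrete, here is a minimal hand-rolled sketch in plain Javascript. It does not use Masala Parser and is not how the library is implemented; the names `literal` and `sequence` are hypothetical and exist only for this illustration. A parser is modelled as a function, and the combinator is a function that takes two parsers and returns a new one.

```javascript
// A parser here is a plain function: (text, offset) -> result object, or null on failure.
// Illustration only: this is not Masala Parser's implementation.

// Primitive parser: matches one exact string at the given offset.
const literal = (expected) => (text, offset) =>
    text.startsWith(expected, offset)
        ? { value: expected, offset: offset + expected.length }
        : null;

// Combinator: accepts two parsers and returns a new parser that runs them in sequence.
const sequence = (first, second) => (text, offset) => {
    const left = first(text, offset);
    if (left === null) return null;
    const right = second(text, left.offset);
    if (right === null) return null;
    return { value: [left.value, right.value], offset: right.offset };
};

const helloWorld = sequence(literal('hello '), literal('world'));

const accepted = helloWorld('hello world', 0);
const rejected = helloWorld('hello there', 0);

console.log(accepted); // { value: [ 'hello ', 'world' ], offset: 11 }
console.log(rejected); // null
```

Because `sequence` returns a parser of the same shape as its inputs, its result can be passed to `sequence` again, which is what makes combinators composable.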

The Parser

Let's say we have a document:

The James Bond series, by writer Ian Fleming, focuses on a fictional British Secret Service agent created in 1953, who featured him in twelve novels and two short-story collections. Since Fleming's death in 1964, eight other authors have written authorised Bond novels or novelizations: Kingsley Amis, Christopher Wood, John Gardner, Raymond Benson, Sebastian Faulks, Jeffery Deaver, William Boyd and Anthony Horowitz.

The parser could fetch every name, i.e. two consecutive words starting with an uppercase letter. The parser reads through the document and aggregates a Response, which contains a value and the current offset in the text.

This value evolves as the parser meets new characters, and also through some function calls, such as the map() function.

The Response

By definition, a Parser takes text as an input, and the Response is a structure that represents your problem. After parsing, there are two subtypes of Response:


    let response = C.char('a').rep().parse(Streams.ofString('aaaa'));
    assertEquals(response.value.join(''), 'aaaa' );
    assertEquals(response.offset, 4 );
    assertTrue(response.isAccepted());
    assertTrue(response.isConsumed());
    
    // Partially accepted
    response = C.char('a').rep().parse(Streams.ofString('aabb'));
    assertEquals(response.value.join(''), 'aa' );
    assertEquals(response.offset, 2 );
    assertTrue(response.isAccepted());
    assertFalse(response.isConsumed());
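The same two cases can be modelled without the library. The sketch below is a hypothetical stand-in for a Response, written in plain Javascript; `makeResponse` and `repeatChar` are names invented for this illustration, and "consumed" is assumed to mean "the whole input was read", which is how the examples above use it.

```javascript
// Hypothetical stand-in for a Response; not Masala Parser's actual class.
const makeResponse = (accepted, value, offset, inputLength) => ({
    value,
    offset,
    isAccepted: () => accepted,
    // Assumption: "consumed" means the parser reached the end of the input.
    isConsumed: () => accepted && offset === inputLength,
});

// Accepts one or more repetitions of a single character from the start of the text.
const repeatChar = (c) => (text) => {
    let offset = 0;
    const value = [];
    while (offset < text.length && text[offset] === c) {
        value.push(c);
        offset += 1;
    }
    return makeResponse(value.length > 0, value, offset, text.length);
};

const full = repeatChar('a')('aaaa');    // accepted, whole input read
const partial = repeatChar('a')('aabb'); // accepted, stops at offset 2
const none = repeatChar('a')('bbbb');    // nothing matched: rejected

console.log(full.offset, full.isAccepted(), full.isConsumed());          // 4 true true
console.log(partial.offset, partial.isAccepted(), partial.isConsumed()); // 2 true false
console.log(none.offset, none.isAccepted(), none.isConsumed());          // 0 false false
```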

Building the Parser, and execution

Like a language, the parser is built then executed. With Masala, we build using other parsers.

const helloParser = C.string('hello');
const white = C.char(' ');
const worldParser = C.string('world');
const combinator = helloParser.then(white.rep()).then(worldParser);

There is a compile time when you combine your parsers, and an execution time when the parser runs its parse(stream) function. You will have the Response after parsing.

So after building, the parser is executed against a stream of tokens. For simplicity, we will use a stream of characters, which is a text :)

Hello Gandhi

The goal is to check that we have Hello 'someone', then to grab that name.

// Plain old javascript
const {Streams,  C}= require('@masala/parser');

var helloParser = C.string("Hello")
                    .then(C.char(' ').rep())
                    .then(C.letters()) // succession of A-Za-z letters
                    .last();    // keep only the last value: the letters

var value = helloParser.val("Hello Gandhi");  // val(x) is a shortcut for parse(Streams.ofString(x)).value;

assertEquals('Gandhi', value);

Parser Combinations

Let's use a real example. We combine many functions that return a new Parser. And each new Parser is a combination of Parsers given by the standard bundles or previous functions.

import  {Streams, N,C, F} from '@masala/parser';

const blanks = ()=>C.char(' ').optrep();

function operator(symbol) {
    return blanks().drop()
        .then(C.char(symbol))   // '+' or '*'
        .then(blanks().drop())
        .single();
}

function sum() {
    return N.integer()
        .then(operator('+').drop())
        .then(N.integer())  // then(x) creates a tuple - here, one value was dropped
        .map(tuple => tuple.at(0) + tuple.at(1)); 
        
}

function multiplication() {
    return N.integer()
        .then(operator('*').drop())
        .then(N.integer())
        .array() // we can have access to the value of the tuple
        .map( ([left,right])=> left * right); // more modern js 
}

function scalar() {
    return N.integer();
}

function combinator() {
    return F.try(sum())
        .or(F.try(multiplication()))    // or() will often work with try()
        .or(scalar());
}

function parseOperation(line) {
    return combinator().parse(Streams.ofString(line));
}

assertEquals(4, parseOperation('2   +2').value, 'sum: ');
assertEquals(6, parseOperation('2 * 3').value, 'multiplication: ');
assertEquals(8, parseOperation('8').value, 'scalar: ');

A curry paste is a higher-order ingredient made from a good combination of spices.

Precedence

Precedence is a technical term for priority. Using:

function combinator() {
    return F.try(sum())
        .or(F.try(multiplication()))    // or() will often work with try()
        .or(scalar());
}

console.info('sum: ',parseOperation('2+2').value);

We give priority to sum, then multiplication, then scalar. If we had put scalar() first, we would have first accepted 2, then what could we do with +2 alone? It's not a valid sum! Moreover, +2 and -2 are acceptable scalars.

try(x).or(y)

or() will often be used with try(), which performs backtracking: it saves the current offset, then tries an option. As soon as that option is not satisfied, it goes back to the original offset and uses the parser inside the .or(P) expression.

Like Haskell's Parsec, Masala Parser can parse infinite look-ahead grammars but performs best on predictive (LL[1]) grammars.

Let's see how, with try(), we can look a bit ahead at the next characters, then go back:

    F.try(sum()).or(F.try(multiplication())).or(scalar())
    // try(sum()) parser in action
    2         *2
    ..ok..ok  ↑oops: go back and try multiplication. Should be OK.

Suppose we do not try() but use or() directly:

    sum().or(multiplication()).or(scalar())
    // testing sum()
    2         *2
    ..ok..ok  ↑oops: cursor is NOT going back. So now we must test '*2' ;
                                               Is it (multiplication())? No ;
                                               or(scalar()) ? neither
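The two traces above can be reproduced with a small hand-rolled model in plain Javascript. This is only a sketch of the behaviour described in this section, not Masala Parser's code: `or`, `tryP`, `binary` and `integer` are hypothetical helpers, and a failed parser is assumed to report the offset where it gave up so that consumed input stays consumed.

```javascript
// Hand-rolled model; a parser is (text, offset) -> { ok, value, offset }.
// On failure, `offset` is where the parser gave up (input already read stays read).

const integer = (text, offset) => {
    let end = offset;
    while (end < text.length && text[end] >= '0' && text[end] <= '9') end += 1;
    return end > offset
        ? { ok: true, value: Number(text.slice(offset, end)), offset: end }
        : { ok: false, offset };
};

// integer, optional blanks, the operator symbol, optional blanks, integer
const binary = (symbol, compute) => (text, offset) => {
    const left = integer(text, offset);
    if (!left.ok) return left;
    let pos = left.offset;
    while (text[pos] === ' ') pos += 1;
    if (text[pos] !== symbol) return { ok: false, offset: pos };
    pos += 1;
    while (text[pos] === ' ') pos += 1;
    const right = integer(text, pos);
    if (!right.ok) return right;
    return { ok: true, value: compute(left.value, right.value), offset: right.offset };
};

const sum = binary('+', (a, b) => a + b);
const multiplication = binary('*', (a, b) => a * b);

// or(): if the first parser fails, run the second from wherever the first stopped.
const or = (first, second) => (text, offset) => {
    const result = first(text, offset);
    return result.ok ? result : second(text, result.offset);
};

// tryP(): on failure, report the original offset. This is the backtracking step.
const tryP = (parser) => (text, offset) => {
    const result = parser(text, offset);
    return result.ok ? result : { ok: false, offset };
};

const withTry = or(tryP(sum), or(tryP(multiplication), integer));
const withoutTry = or(sum, or(multiplication, integer));

console.log(withTry('2   *2', 0));    // { ok: true, value: 4, offset: 6 }
console.log(withoutTry('2   *2', 0)); // { ok: false, offset: 4 }
```

Without `tryP`, the failed `sum` leaves the cursor on `*`, so neither `multiplication` nor `integer` can match from there.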

Recursion

Masala-Parser (like Parsec) is a top-down parser and doesn't like left recursion.

However, it is a solved problem for this kind of parser, with a lot of documentation. You can read more on recursion with Masala, and check out examples on our Github repository ( simple recursion, or calculus expressions ).
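As a brief illustration of the usual fix, the sketch below rewrites a left-recursive rule as a repetition. It is plain Javascript written for this explanation, not taken from the Masala examples; `number` and `expr` are hypothetical names.

```javascript
// Left-recursive grammar: a naive top-down parser would call expr() forever,
// because expr starts by calling itself without reading any input.
//     expr -> expr '+' number | number
// Equivalent form without left recursion:
//     expr -> number ('+' number)*

const number = (text, offset) => {
    let end = offset;
    while (end < text.length && text[end] >= '0' && text[end] <= '9') end += 1;
    return end > offset
        ? { value: Number(text.slice(offset, end)), offset: end }
        : null;
};

const expr = (text, offset) => {
    const first = number(text, offset);
    if (first === null) return null;
    let total = first.value;
    let pos = first.offset;
    // Loop instead of recursing on the left: fold each "+ number" into the total.
    while (text[pos] === '+') {
        const next = number(text, pos + 1);
        if (next === null) break; // a trailing '+' is left unconsumed
        total += next.value;
        pos = next.offset;
    }
    return { value: total, offset: pos };
};

const result = expr('1+2+3', 0);
console.log(result); // { value: 6, offset: 5 }
```

Every iteration of the loop reads at least one character before parsing again, so the parser always makes progress.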

Simple documentation of Core bundles

Core Parser Functions

Here is a link for Core functions documentation.

It will explain then(), drop(), map(), rep(), opt() and other core functions of the Parser with code examples.

The Chars Bundle

Example:

C.char('-')
    .then(C.letters())
    .then(C.char('-'))
// accepts  '-hello-' ; value is ['-','hello','-']
// rejects '-hel lo-' because space is not a letter

General use

Other example:

C.string('Hello')
    .then(C.char(' '))
    .then(C.lowerCase().rep().join(''))

// accepts Hello johnny ; value is ['Hello', ' ', 'johnny']
// rejects Hello Johnny : J is not lowercase ; no value

The Numbers Bundle

The Flow Bundle

The flow bundle will mix ingredients together.

For example, if you have a Parser p, F.not(p) will accept anything that does not satisfy p.

All of these functions will return a brand new Parser that you can combine with others.
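To illustrate the idea behind a negation combinator, here is a hand-rolled sketch in plain Javascript. It is one plausible reading of "accept anything that does not satisfy p", assuming that a successful negation accepts exactly one character; it is not Masala Parser's F.not, and `literal` and `not` are hypothetical names.

```javascript
// A parser is (text, offset) -> { value, offset }, or null on failure.
// Illustration only; Masala Parser's F.not may behave differently in its details.

const literal = (expected) => (text, offset) =>
    text.startsWith(expected, offset)
        ? { value: expected, offset: offset + expected.length }
        : null;

// not(parser): succeeds where `parser` fails, accepting a single character.
const not = (parser) => (text, offset) => {
    if (offset >= text.length) return null;         // nothing left to accept
    if (parser(text, offset) !== null) return null; // parser matched, so not() rejects
    return { value: text[offset], offset: offset + 1 };
};

const notDash = not(literal('-'));

console.log(notDash('a-', 0)); // { value: 'a', offset: 1 }
console.log(notDash('a-', 1)); // null
```

Like the other sketches, `not` returns a new parser of the same shape, so it can be combined with others.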

Most important:

Others:

License

Copyright (C)2016-2024 Didier Plaindoux & Nicolas Zozol

This program is free software; you can redistribute it and/or modify it under the terms of the GNU Lesser General Public License as published by the Free Software Foundation; either version 2, or (at your option) any later version.

This program is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU Lesser General Public License for more details.

You should have received a copy of the GNU Lesser General Public License along with this program; see the file COPYING. If not, write to the Free Software Foundation, 675 Mass Ave, Cambridge, MA 02139, USA.