Home

Awesome

MaCaQue : Macros for Caml Queries

MaCaQue (or macaque) is a DSL for SQL Queries in Caml.

It has the following objectives :

macaque is still a work in progress, please send feedback to bluestorm dot dylc (gmail). I'm interested in bug reports and reasonable feature requests.

Usual makefile provided (in src/) : make, make install, bleh, make uninstall Depends on PG'Ocaml

Summary of this document :

<a name="macaque_types"></a>

Important macaque types and structures

macaque builds SQL queries/views/values at three different levels :

query

Macaque queries represent SELECT, INSERT, DELETE and UPDATE queries. They can not be composed.

Examples:

Type : macaques queries have type 'a Sql.query, where 'a is the return type of the query : a row list for SELECT queries, unit otherwise.

view

Views represent SQL tables, views and (sub-)-SELECT results; view are composable and can be used nearly anywhere a SQL table can.

Examples :

Type : ('a, 'b) Sql.view, where 'a is the type of a row (considered as a value). 'b is either non_writable or 'c writable : only writable views correspond to real tables in the SQL database, and thus insert/update/delete won't accept non-writable views. The 'c type parameter in 'c writable is useless for now, it will eventually be used to transport information about default values for table fields.

:Note:

views and select queries are distinct objects (view = reusable set of rows, select = final query), but views are very common so a specific library function is provided to directly send a view to the SQL server, without having to manually build a select query first.

value

Values represent SQL values and expressions : int, float, string, and results of SQL data functions and operators.

Examples :

Type : 'a Sql.t, the most pervasive type in macaque. 'a is actually a rich type containing the value type (int, string, a row, etc.) but also several phantom types witnessing different properties of the value (detailed description in the "phantom types" section of this documentation). View rows are also values, and rows can be nested at will :

let strange_pack foo = <:value< { a = 1; b = { c = 2; d = $foo$ } } >>

<a name="sending_queries"></a>

Sending queries to the SQL server

Most of macaque efforts are directed towards producing safe and composables SQL queries, and rebuilding typed values from the all-string result of SQL queries. The actual interaction with the SQL server is relatively minor and well separated. It is done through the Query module.

The Query module is designed to work with PGOCaml (and is actually the only part of macaque with a hard dependence on PGOCaml). It thus reproduces the functorized design of PGOCaml, which is intended to support flexible threading/concurrence models. See PGOCaml documentation for more information. The casual user can use the non-functorized values, which are a directly-usable implementation of the QUERY interface, with no threading integrated.

The core Query interface is quite simple :

val query : _ Db.t -> ?log:out_channel -> 'a Sql.query -> 'a Db.monad
val view : _ Db.t -> ?log:out_channel -> 'a Sql.view -> 'a list Db.monad
val view_one : _ Db.t -> ?log:out_channel -> 'a Sql.view -> 'a Db.monad
val view_opt : _ Db.t -> ?log:out_channel -> 'a Sql.view -> 'a option Db.monad

(in the simple case, _ Db.t is _ PGOCaml.t, and 'a Db.monad is just 'a)

The main function is query, which accepts a database handler (the one produced by PGOCaml.connect), the query, and returns the typed result : in case of SELECT, a list of value rows, otherwise (UPDATE, INSERT...) unit. An optional parameter ?log will output the query string in the given output channel, if any.

view, view_one and view_opt are given for convenience. view_one expect exactly one result, and view_opt at most one result; if they get more results, they will raise a Failure exception.

:Note:

The query module is very loosely integrated with the rest of Macaque. In particular, all the typed result processing is done by the Sql modules, through the Sql.handle_query_results function. It makes the Query module inessential : an user willing to experiment other interfaces to the SQL server could easily get the SQL string itself (Sql.sql_of_query, Sql.sql_of_view), call the SQL server in her preferred way, and give back the results to Sql.handle_query_results.

Beware however that Sql.handle_query_results is unsafe, as it strongly relies on results corresponding to the given query; giving incorrect results (eg. coming from another query) results in an undefined behaviour, even possibly a Segfault. You should really not try anything clever except giving the query string to a PostGreSQL server (in any way you like), and sending back the results to Macaque.

<a name="processing_results"></a>

Processing query results

macaque produces Sql.t values from the query results. The real caml values can be accessed using the Sql.get and Sql.getn operators :

val get : < get : _; nul : non_nullable; t : 't #type_info > t -> 't
val getn : < get : _; nul : nullable; t : 't #type_info > t -> 't option

Sql.get operates on non-nullable values, while Sql.getn operates no nullable values and returns an option.

A common use case is to access fields from a row, leading to code such as Sql.get row#field. The problem with this syntax is that it is not chainable : to access field of a subrow (row inside the result row), which is a Sql.t value itself, one has to write code such as Sql.get (Sql.get row#sub_row)#field.

To lighten the notation, a specific syntactic sugar is integrated with the pa_bananas syntax extension : a#!b is equivalent to Sql.get a#b, and a#?b is equivalent to Sql.getn a#b. It is thus possible to write row#!sub_row#!field directly.

Note that a#?b still returns an option, so you can't have something like a#?b#?c (as a#?b is not an object) : you still need to do pattern matching on the option value yourself. I could create an unsafe constructor which would raise an exception in the None case, but that would defeat the point of macaque (as few runtime errors as possible). I also feels that such facilities would encourage nullability-laziness from the user : you have to work on your database design upfront to eradicate nullable types were possible; in the case were a NULL case is a semantic necessity, you will want the explicit option matching anyway.

<a name="single_value_query"></a>

Single value queries

It is sometimes useful to query single values from the database, instead of the usual list-of-rows SELECT query -- for example to request the current time, or the value of a specific counter.

For this purpose, Macaque was extended with:

As an example, the following three definitions of get_time are equivalent:

let time = <:value< current_timestamp () >>

let get_time dbh = Sql.get (Query.view_one dbh <:view< {x = $time$} >>)#x

let get_time dbh = Sql.get (Query.query dbh (Sql.value time))

let get_time dbh = Query.value dbh time

<a name="syntax_description"></a>

General syntax : values, comprehensions, queries

macaque use the Camlp4 quotations system to integrate Domain-specific syntaxes into the Objective Caml language. General principles are as follows :

quotations

<:foo< ... >> is a quotation using the 'foo' syntax expander. macaque provides 6 syntax expanders :

'view' is the default syntax expander : << ... >> is equivalent to <:view< ... >>

antiquotations

Inside a quotations, $foo$ is an antiquotation with denotes a caml-level value :

let value v = <:value< $v$ >> (* the identity function on values *)

antiquotations can also be named in specific context :

let int n = <:value< $int:n$ + 0 >> (* function that transforms a caml int into a Sql.t value *)

named antiquotations are not hardcoded in the syntax definition : they rely on values of the Sql module, which are in the appropriate submodule : $foo:bar$ will expand to Sql.Module.foo bar, where the Foo module depends on the antiquotation position (View, Value, Table_type); this way, on can read the Sql interface to discover named parameters and their meaning; similarly, operators are defined in Sql.Op.

Loosely defined syntax

:BNF:

select ::= view
insert ::= table ':=' value [refinement]
delete ::= row name 'in' table refinement
update ::= row name 'in' table ':=' value refinement
view ::= comprehension
       | view-op '(' view ')' '(' view ')' ...
comprehension ::= comp-result comp-modifiers [refinement]
refinement ::= '|' comp-items ';' ...
comp-result ::= value
              | 'group' tuple ['by' tuple]
comp-modifiers ::= ['order' 'by' order ',' ...] ['limit' value] ['offset' value]
order ::= value ['asc' | 'desc']
comp-item ::= row-binding
            | guard
row-binding ::= row-name 'in' view
guard ::= boolean value
value ::= litteral constants
        | 0-ary operator : 'null', 'current_timestamp'
        | 1-ary operator : 'nullable' value
        | infix binary operator : value '+' value
        | tuple : tuple
        | row '.' field : field-access
        | row '?' field : default value for a field
        | 'cast' value 'as' column-type
tuple ::= '{' tuple-field ';' ... '}'
tuple-field ::= field-name '=' value
              | field-access
field-access ::= value '.' field-name '.' ...
view-op ::=
  | 'union'     | 'union_all'
  | 'intersect' | 'intersect_all'
  | 'except"    | 'except_all'
column-type ::= identifier   (see the "Column Types" section)

:Note:

A few remarks.

<a name="exhaustive_manipulations"></a>

Remark : implicit exhaustive manipulation

It was found that user sometimes forgot the guards in an UPDATE or DELETE query. The results of that mistake are usually rather displeasing (deleting or modifying the whole table). In order to make the mistake more visible and less likely to happen, the "|" before the guards of an UPDATE or DELETE query are not optional by default : if the user forgets the guards altogether, she will also forget the "|" and will get an error.

<a name="view_expressions"></a>

View expressions

:BNF:

view ::= comprehension
       | view-op '(' view ')' '(' view ')' ...
comprehension ::= comp-result comp-modifiers [refinement]
view-op ::=
  | 'union'     | 'union_all'
  | 'intersect' | 'intersect_all'
  | 'except"    | 'except_all'

<a name="view_results"></a>

View results

View results are either a simple value or a GROUP BY expression (see the GROUP BY expressions and aggregate functions part of this document).

The usual return value is a SQL tuple: {a = foo; b = bar} will return a view with columns named a and b and values corresponding to the foo and bar expressions. You can also give the name of a row : << t | t in .. >>

Columns values can be tuples themselves. This is a quite Macaque-specific feature, which makes the language much more homogenous. Tuple types (column or immediate records) can be nested arbitrarily. For example, a cartesian product view :

let cartesian_product va vb = << {a = a; b = b} | a in $va$; b in $vb$ >>

Care is taken that tuple values support all operations the plain SQL values support : Macaque works hard to translate them to expressions the usual SQL server can understand (tuple flattening). This part of macaque is rather delicate and you might encounter bugs with no-yet-well-tested combinations of nested tuples and other macaque features.

<a name="view_modifiers"></a>

View modifiers

View modifiers correspond to following SQL features : LIMIT, OFFSET and ORDER BY. Their behaviour should not come as a surprise : ORDER BY will sort the returned rows according to the specified sorting criteria, LIMIT will restrict the number of returned rows, and OFFSET will skip a given number of rows (for example, LIMIT 2 OFFSET 3 will return the 4th and 5th row of a view).

ORDER BY syntax mimics the usual SQL syntax : << ... order by foo asc, bar desc, ... >>

The ordering modifier (asc | desc) is optional, asc will be choosed by default. If the ordering expression (foo, bar) is a record (immediate tuple or row), no guarantee is given as to the column ordering decided (fields of a macaque record are not ordered, so a lexicographic order cannot be chosen) : the view result will be sorted on each column of the tuple, with the given ordering modifier.

ORDER BY expressions are in the comprehension items scope : they can depend on the rows bound in the latter part of the comprehension. LIMIT and OFFSET are not : obviously << foo LIMIT foo.id | foo in $..$ >> is not a well-formed query.

<a name="comprehension_items"></a>

Comprehension items

There is one important thing to know about comprehension items : row bindings are not sequential, they're simultaneous (let .. and .. and .. in ..) ! This is a not so happy feature, as the comprehension syntax strongly suggest a sequential binding : there is an impedence mismatch between the comprehension syntax and the SQL behaviour. For further discussion, see the Remarks > semantic of row bindings section of this manual.

Of course, guards can depend on the declared rows.

View operators

The usual UNION, INTERSECT and EXCEPT operators can be used in prefix position, with parentheses around their parameters. The syntax allows to pass arbitrary number of parameters, and a left-associativity is applied:

except (foo...) (bar...) (baz...)

is equivalent to

except (except (foo...) (bar...)) (baz...)

<a name="group_by"></a>

GROUP BY expressions and aggregate functions

The SQL query SELECT fields GROUP BY group_fields roughly translate in the group {fields} by {group_fields} expression : after group the aggregating part, and after by the grouping part.

The returned row are the concatenation of the GROUP record ({fields}) and the BY records ({group_fields}).

Example : << group {subtotal = sum[t.a]} by {k = t.b} | t in $view$ >> Will expand to SELECT SUM(t.a) AS subtotal, t.b AS k GROUP BY t.b FROM (...) AS t

<a name="empty_group"></a>

Empty GROUP part

GROUP BY expressions with all fields in the BY part are equivalent to a SELECT DISTINCT query.

<a name="accumulators"></a>

Restrictions on the GROUP record, accumulators [..] syntax

The BY values (here t.b) can be any expression depending on the bound view (here t, but possibly more than one), and anything else in scope. The group values are restricted to match the GROUP .. BY semantic :

<a name="static_group_by_typing"></a>

Corresponding typing errors

<a name="non_grouped_aggregates"></a>

Non-grouped aggregates

Aggregates functions cannot be used outside group by expressions. If you want to use an aggregate functions over all the rows of a table, use a group by expression with an empty BY record :

SELECT max(t.id) FROM ... -> group {max = max[t.id]} by {} | ...

In that case, by is optional : group {max = max[t.id]} | ...

Rationale : aggregate functions are not regular operators; they have a non-trivial semantic, which is modeled by the typing transformations inside the group .. by expressions. Using them outside those expressions would not be safe and could lead to runtime errors if the SQL server doesn't accept an ill-formed query. It is a design goal of macaque to protect from runtime errors.

<a name="descriptions"></a>

Description syntax

Description syntax is used to describe existing database tables, making them as macaque views (internally, it builds a runtime description of the data-base typing informations, to be used by further macaque processing). The syntax is a far relative of SQL CREATE TABLE statements, and can be discovered in the example base.ml file. Currently, no other information that name, type, nullability and default value are accepted.

<:table< recipes ( recipe bigint NOT NULL, name text, amount bigint NOT NULL DEFAULT($ <:value< 0L >> $), category text NOT NULL DEFAULT($ <:value< "unknown category" >> $) ) >>

This is not a table creation/specification tool : it does not free you from the need of creating your tables in the database (with a potentially richer description : foreign keys, etc.). If you change a table description, you will have to duplicate the changes in the caml description as well.

<a name="incr_column"></a>

Auto-incrementing columns

SERIAL columns are not supported (yet). It is however possible to emulate this feature with explicit SEQUENCE manipulations (sequence operators section) :

let my_table = <:table< my_sql_table ( id bigint NOT NULL DEFAULT(nextval $table_id_seq$), ... ) >>

At insertion site, one can then use the default access syntax to get the next identifier:

<:insert< my_table := { id = my_table?id; ... } >>

<a name="sequence_descriptions"></a>

sequence descriptions

It is not possible to create new sequences from Macaque (this is coherent with the choice that macaques allows descriptions, not declarations), but macaque support sequence descriptions. Sequence creation operators are part of the Sql.Sequence module.

let table_id_seq = <:sequence< serial "the_sql_sequence_name" >>

<a name="check_descriptions"></a>

Checking descriptions

To help you with the macaque/database synchronization, the Check module provide coherence check routines (check_table, check_sequence). It will raise errors if the caml-side description is not faithful to the real table structure (as described by the SQL server table informations). It is possible to enforce an automatic runtime check of every macaque-described table or sequence with the '-check_tables' command line option (camlp4 time), which can be enabled by the use_check ocamlbuild flag (see the OCamlbuild section of this document).

<a name="ocamlbuild"></a>

OCamlbuild

I have created a macaque-specific OCamlbuild plugin. It was originally intended to help during macaque development only, but is probably a valuable ressource if you want to use Macaque inside your project. Besides the classical ocamlfind integration, you'll find the Sql_syntax modules with support for the following tags :

You should generally use the macaque tag, the three other preprocessing tags being there for finer-grained control.

<a name="SQL_operators"></a>

Sql functions, operators and data types

<a name="value_operators"></a>

Sql value operators

macaque can use all function and operators defined in the Sql.Op module, using the standard ocaml syntax (and operator associativities and precedences) :

aggregate function

Sql.Op has some aggregate functions. See the GROUP BY expressions and aggregate functions section of this document for more information.

<a name="value_antiquotations"></a>

Value types antiquotations

macaque supports some SQL data types, some of them having a litteral syntax (ints and string : <:value< 2 >>, <:value< "foo" >>). They can all be constructed by using named antiquotations : <:value< $bool:true$ >> <:value< $float:cos 1.2$ >>

More generally, the type constructors are the values in Sql.Value : <:value< $foo:bar$ >> is equivalent to Sql.Value.foo bar

<a name="column_types"></a>

Column types

Data types used in table descriptions are defined in the Sql.Table_type module. We use a different set of type names, in order to mimic SQL type names and ease specification derivation from existing SQL tables. For example, "integer" is used instead of "int", and will expand to a Sql.Table_type.integer value.

<a name="casts"></a>

Casts

You can use an explicit cast of the form cast foo as bar, where bar is an identifier for a column-type as described in the "Column types" section above. This is useful in particular to work around some limitations of PostgreSQL type inference for nulls (see the "NULL and semantic issues" subsection): if a null somewhere is "too polymorphic" for PostgreSQL, you may want to use for example cast null as integer instead.

Sequence description

Sequence creators are defined in the Sql.Sequence module.

<a name="sequence_ops"></a>

Sequence operators

Sql.Op has some sequence functions. Currently supported are currval and nextval. They can be used to have an auto-incrementing identifier :

<:insert< $table$ := {id = nextval $table_id_seq$; .. } >>

See the src/tests/sequence.ml for example.

<a name="view_antiquotations"></a>

View antiquotations

In view positions, macaque supports antiquotations through the Sql.View module : $foo:bar$ will expand to Sql.View.foo bar.

<a name="phantom_types"></a>

phantom types

Sql.t types sure are heavy. You will be confronted to cluttered error messages with ugly as hell unification problems. Hopes this section helps.

Sql values are packed in a phantom type providing several information about the value. It is an object type with a field t containing type information, and other fields for value information.

Type information fields :

Value information fields :

Now, you probably understand why the error messages tends to be slightly longer than usual. And all sql values carry such information, including (nested) rows...

<a name="type_subtleties"></a>

Specific type fourberies

macaque being strongly (and intricately) typed, you will often find yourself confronted to hostile error messages, which means someone has done something wrong. I hope that you will more often be wrong than macaque. There are nonetheless some specificities that you should be aware of.

<a name="interface_safety"></a>

Sql interface safety

Macaque syntax extensions transform user code into complicated caml code. But the produced codes still lies outside macaque module boundaries, and has access to the same information that the user has. Concretely, as most of the operations needed by those extensions are not typable inside the OCaml type system (for example dynamic construction of an object based on a list of field names and values), some unsafe operations had to be exposed through Sql interface^α. They are marked with the "unsafe" parametrized type (which is only there for documentation purposes) and you should NOT use them : every function having an "unsafe" type somewhere in its interface is to be considered forbidden. All the other functions should be type-safe, otherwise it's a bug.

:Footnote:

α: and there is some Obj magic behind the scene; but it's protected by typing and you won't get a segfault, I hope.

<a name="update_subtyping"></a>

Update subtyping problem

An example of update syntax is << t in $tab$ := {amount = t.amout + 1} >>,- which increment the "amout" column of all rows in table tab. Table tab probably has more fields than just the "amount" colum, so the type of the tuple on the right should be a subtype of table row type.

The problem is that polymorphic subtyping quantification is not available in OCaml type system : subtype relations have to be explicitely constructed from the object types. This is fine when, as in the given example, the right tuple object type is known as camlp4-time (an object with only one field 'amout'), but not when the corresponding value is an antiquotation :

    let update table value predicate =
       <:update< row in $table$ := $value row$ | $predicate row$ >>

In this case, the update syntax is used to define a generic update operation^β : I know of no syntactic way to impose that the right tuple type be a subtype of the table row type.

To keep things simple, there is an ad-hoc rule for the specific update problem : when tuple type is known as camlp4-time, a subtyping relation is used, but when there is a quotation, the two values types are unified : the set tuple has to have exactly the same columns as the table, no less.

This loss of generality could surprise the innocent user. In order to prevent hair scratching, a warning is emitted at camlp4-time when an antiquoted caml value is used for the update tuple. This warning can be disabled using the -sql-nowarn-undetermined-update command line option.

Of course, it is always possible to manually expand an antiquotated value, when the fields name are known at development-time : <:update< t in $table$ := $tup$ | ... >> should be rewritten into <:update< t in $table$ := { foo = $tup$.foo; bar = $tup$.bar } | ... >>

:Footnote:

β: notice how value and predicate are actually functions depending on row values; this kind of thing can greatly improve code factorization among your SQL queries

<a name="general_remarks"></a>

Remarks

<a name="side_effects"></a>

Side effects

It is probably an obvious thing to say, but users should not put expressions which have side-effect when evaluated inside macaque expressions. No guarantee is given that any part of a macaque expression will be evaluated at all, or only once.

In case of doubt, you should explicitely evaluate expressions before handling them to the macaque expression : let my_val = my_expr in << ... $my_val$ ... >>

<a name="row_binding_semantics"></a>

Semantic of row bindings

From the View expressions > Comprehension items part of this document : %

There is one important thing to know about comprehension items : row bindings are not sequential, they're simultaneous (let .. and .. and .. in ..) ! This is a not so happy feature, as the comprehension syntax strongly suggest a sequential binding : there is an impedence mismatch between the comprehension syntax and the SQL behaviour.

The rationale behind this choice is purely pragmatic : SQL queries use simultaneous bindings, and reproducing that choice much facilitate the translation. It would be possible to have sequential bindings, but that's somewhat more complicated, and I supposed it was unnatural to the SQL query writer anyway.

I'm not quite satisfacted with this state of affairs, so it is possible that I replace the current behaviour with sequential declarations in the future. Please do not write code that would break if that was the case : << .. | t1 in $..$; t2 in $ .. t1 $ >> when t1 is already in the scope. Such trickery tremendously hurts readability anyway, so you should not use it even if it wasn't for compatibility reasons.

<a name="camlp4_use"></a>

Camlp4 use

I am under the impression that some "serious" ocaml users try to avoid to include camlp4 in their compilation chain if possible, and to minimize their reliance on camlp4 extensions. Camlp4 is a not-so-simple (and really-not-so-well-documented) tool, and camlp4 extensions are sometimes fragile and more subject to bugs than other pieces of ocaml software; syntactic bugs can have far-reaching effects (they are, however, nearly always spotted at compilation-time) and are difficult to spot for the non-camlp4-aware user.

Specific care has been taken in Macaque to make the camlp4 processing as solid as possible :

<a name="pgocaml_compat"></a>

Macaque and PG'OCaml

Macaque relies on PG'OCaml low-level interface. It is fully compatible with PG'OCaml : you can use a database handler for Macaque queries and PG'OCaml queries at the same time (PG'OCaml stocks some private information in those handler; Macaque doesn't access nor modify them).

PG'OCaml is more mature and its general approach leads to a safer software : Macaque will probably catch less errors at compile-time than PG'OCaml (see the PostGreSQL typing issues of the next section for an area where I believe Macaque lacks behind PG'OCaml in terms of safety), and it is very young software which probably have quite a few bugs remaining.

This inconvenient comes with the flexibility Macaque brings : it is composable, and does not need the SQL server available at compile time (or, more exactly, camlp4-time).

I also believe that PG'OCaml prepared queries model is a bit more efficient than Macaque raw-SQL-queries output. I have done absolutely no work on Macaque optimisation, and my priority are rather safety and composability. If you're interested in investigating Macaque performances, let me know.

<a name="null_inference"></a>

NULL and semantic issues

I have tried to reproduce SQL behaviour as closely as possible. In particular, all operators are really SQL operators, so they will have the SQL behaviour, for example null = null is null, not false.

I have encountered issues with PostGreSQL typing system though. PostGreSQL gives NULL the type 'unknown', which is problematic because sql operators are strongly typed, but the type system is not very expressive. For example NULL + 0 will work okay, but :

base=> SELECT e.n + 0 FROM (SELECT NULL AS n) AS e; ERROR: failed to find conversion function from unknown to integer

I have tried to work around those issues, but think some of them are probably lurking somewhere. It is unfortunate as it undermines Macaque static safety : these runtime errors are subtle and quite difficult to prevent. Please report them if you find some. I hope not.