Awesome
MaCaQue : Macros for Caml Queries
MaCaQue (or macaque) is a DSL for SQL Queries in Caml.
It has the following objectives :
- be highly compositional
- use expressive types to avoid runtime errors
macaque is still a work in progress, please send feedback to
bluestorm dot dylc (gmail)
. I'm interested in bug reports and
reasonable feature requests.
Usual makefile provided (in src/) : make, make install, bleh, make uninstall Depends on PG'Ocaml
Summary of this document :
- Important macaque types and structures
- Sending queries to the SQL server
- Processing query results
- General syntax : values, comprehensions, queries
- View expressions
- GROUP BY expressions and aggregate functions
- Description syntax
- Sql functions, operators and data types
- phantom types
- Specific type fourberies
- Remarks
<a name="macaque_types"></a>
Important macaque types and structures
macaque builds SQL queries/views/values at three different levels :
query
Macaque queries represent SELECT
, INSERT
, DELETE
and UPDATE
queries. They can not be composed.
Examples:
-
Select all rows from table
sql_table
whoseid
is inferior to50
:<:select< row | row in $sql_table$; row.id < 50 >>
-
Delete all rows from
sql_table
whoseid
is superior to50
:<:delete< row in $sql_table$ | row.id > 50 >>
Type : macaques queries have type 'a Sql.query
, where 'a
is the
return type of the query : a row list for SELECT
queries, unit
otherwise.
view
Views represent SQL tables, views and (sub-)-SELECT
results; view
are composable and can be used nearly anywhere a SQL table can.
Examples :
-
User name and category of the user with id superior to
42
<:view< {user = u.name; category = cat.name} | u in $users$; u.id > 32 + 10; cat in $categories$; cat.id = u.cat >>
-
Names of all simple users of the 'prev_view' view (which might be the previous example view)
<:view< {name = row.user} | row in $prev_view$; row.category = "simple user" >>
-
Generic function that will project any view on its "name" column
let names view = <:view< {name = t.name} | t in $view$ >>
-
Generic function that will build an associative table, which the 'name' column in the 'name' field and the whole column in the 't' field.
let names view = <:view< {name = t.name; t = t} | t in $view$ >>
Type : ('a, 'b) Sql.view
, where 'a
is the type of a row
(considered as a value). 'b
is either non_writable
or 'c writable
: only writable views correspond to real tables in the SQL
database, and thus insert/update/delete won't accept non-writable
views. The 'c
type parameter in 'c writable
is useless for now, it
will eventually be used to transport information about default values
for table fields.
:Note:
views and select queries are distinct objects (view = reusable set of rows, select = final query), but views are very common so a specific library function is provided to directly send a view to the SQL server, without having to manually build a select query first.
value
Values represent SQL values and expressions : int
, float
,
string
, and results of SQL data functions and operators.
Examples :
-
2
(as a Sql Value, not a Caml int)<:value< 1 + 1 >>
-
true
(as a Sql Value, not a Caml bool)<:value< 1 = 1 >>
-
the current timestamp (with local timezone information, use
localtimestamp
to omit it):<:value< current_timestamp () >>
-
field 'id' of table 'foo', casted as a nullable value (which can take the
NULL
value)<:value< nullable foo.id >>
-
returns twice the given int, as a Sql value
let double (n : int) = <:value< 2 * $int:n$ >>
-
returns the immediate row with column
foo
andbar
of values1
andbaz
<:value< { foo = 1; bar = "baz" } >>
Type : 'a Sql.t
, the most pervasive type in macaque. 'a
is
actually a rich type containing the value type (int, string, a row,
etc.) but also several phantom types witnessing different properties
of the value (detailed description in the "phantom types" section of
this documentation). View rows are also values, and rows can be nested
at will :
let strange_pack foo = <:value< { a = 1; b = { c = 2; d = $foo$ } } >>
<a name="sending_queries"></a>
Sending queries to the SQL server
Most of macaque efforts are directed towards producing safe and
composables SQL queries, and rebuilding typed values from the
all-string result of SQL queries. The actual interaction with the SQL
server is relatively minor and well separated. It is done through the
Query
module.
The Query
module is designed to work with PGOCaml (and is actually the
only part of macaque with a hard dependence on PGOCaml). It thus
reproduces the functorized design of PGOCaml, which is intended to
support flexible threading/concurrence models. See PGOCaml
documentation for more information. The casual user can use the
non-functorized values, which are a directly-usable implementation of
the QUERY
interface, with no threading integrated.
The core Query
interface is quite simple :
val query : _ Db.t -> ?log:out_channel -> 'a Sql.query -> 'a Db.monad
val view : _ Db.t -> ?log:out_channel -> 'a Sql.view -> 'a list Db.monad
val view_one : _ Db.t -> ?log:out_channel -> 'a Sql.view -> 'a Db.monad
val view_opt : _ Db.t -> ?log:out_channel -> 'a Sql.view -> 'a option Db.monad
(in the simple case, _ Db.t
is _ PGOCaml.t
, and 'a Db.monad
is just 'a
)
The main function is query
, which accepts a database handler (the one
produced by PGOCaml.connect
), the query, and returns the typed
result : in case of SELECT
, a list of value rows, otherwise
(UPDATE
, INSERT
...) unit
. An optional parameter ?log
will
output the query string in the given output channel, if any.
view
, view_one
and view_opt
are given for convenience.
view_one
expect exactly one result, and view_opt
at most one
result; if they get more results, they will raise a Failure
exception.
:Note:
The query module is very loosely integrated with the rest of
Macaque. In particular, all the typed result processing is done by
the Sql
modules, through the Sql.handle_query_results
function. It makes the Query
module inessential : an user willing
to experiment other interfaces to the SQL server could easily get
the SQL string itself (Sql.sql_of_query
, Sql.sql_of_view
), call
the SQL server in her preferred way, and give back the results to
Sql.handle_query_results
.
Beware however that Sql.handle_query_results
is unsafe, as it
strongly relies on results corresponding to the given query; giving
incorrect results (eg. coming from another query) results in an
undefined behaviour, even possibly a Segfault. You should really not
try anything clever except giving the query string to a PostGreSQL
server (in any way you like), and sending back the results to
Macaque.
<a name="processing_results"></a>
Processing query results
macaque produces Sql.t
values from the query results. The real caml
values can be accessed using the Sql.get
and Sql.getn
operators :
val get : < get : _; nul : non_nullable; t : 't #type_info > t -> 't
val getn : < get : _; nul : nullable; t : 't #type_info > t -> 't option
Sql.get
operates on non-nullable values, while Sql.getn
operates
no nullable values and returns an option
.
A common use case is to access fields from a row, leading to code such
as Sql.get row#field
. The problem with this syntax is that it is not
chainable : to access field of a subrow (row inside the result row),
which is a Sql.t value itself, one has to write code such as Sql.get (Sql.get row#sub_row)#field
.
To lighten the notation, a specific syntactic sugar is integrated with
the pa_bananas
syntax extension : a#!b
is equivalent to Sql.get a#b
, and a#?b
is equivalent to Sql.getn a#b
. It is thus possible
to write row#!sub_row#!field
directly.
Note that a#?b
still returns an option, so you can't have something
like a#?b#?c
(as a#?b
is not an object) : you still need to do
pattern matching on the option value yourself. I could create an
unsafe constructor which would raise an exception in the None
case,
but that would defeat the point of macaque (as few runtime errors as
possible). I also feels that such facilities would encourage
nullability-laziness from the user : you have to work on your database
design upfront to eradicate nullable types were possible; in the case
were a NULL
case is a semantic necessity, you will want the explicit
option matching anyway.
<a name="single_value_query"></a>
Single value queries
It is sometimes useful to query single values from the database, instead of the usual list-of-rows SELECT query -- for example to request the current time, or the value of a specific counter.
For this purpose, Macaque was extended with:
-
A
Sql.value
function building a query from aSql.t
value; This query, once run throughQuery.query
, agains return aSql.t
value, on whichSql.get
orSql.getn
can be used depending on its nullability. -
Convenience functions in Query to get the extracted value directly without using Sql.get(n):
Query.value
for non-nullable value, andQuery.value_opt
for nullable ones.
As an example, the following three definitions of get_time
are equivalent:
let time = <:value< current_timestamp () >>
let get_time dbh = Sql.get (Query.view_one dbh <:view< {x = $time$} >>)#x
let get_time dbh = Sql.get (Query.query dbh (Sql.value time))
let get_time dbh = Query.value dbh time
<a name="syntax_description"></a>
General syntax : values, comprehensions, queries
macaque use the Camlp4 quotations system to integrate Domain-specific syntaxes into the Objective Caml language. General principles are as follows :
quotations
<:foo< ... >>
is a quotation using the 'foo' syntax expander.
macaque provides 6 syntax expanders :
- 'select', 'insert', 'delete', 'update' : queries syntax expanders
- 'view' : view syntax expander
- 'value' : value syntax expander
'view' is the default syntax expander : << ... >>
is equivalent to <:view< ... >>
antiquotations
Inside a quotations, $foo$
is an antiquotation with
denotes a caml-level value :
let value v = <:value< $v$ >> (* the identity function on values *)
antiquotations can also be named in specific context :
let int n = <:value< $int:n$ + 0 >> (* function that transforms a caml int into a Sql.t value *)
named antiquotations are not hardcoded in the syntax definition :
they rely on values of the Sql
module, which are in the appropriate
submodule : $foo:bar$
will expand to Sql.Module.foo bar
, where the
Foo
module depends on the antiquotation position (View
, Value
,
Table_type
); this way, on can read the Sql
interface to discover
named parameters and their meaning; similarly, operators are defined
in Sql.Op
.
Loosely defined syntax
- camlp4 quotations symbols aren't represented
[foo]
means optionalfoo
(one or none)foo sep ...
means list offoo
s separated bysep
(";" or ",")
:BNF:
select ::= view
insert ::= table ':=' value [refinement]
delete ::= row name 'in' table refinement
update ::= row name 'in' table ':=' value refinement
view ::= comprehension
| view-op '(' view ')' '(' view ')' ...
comprehension ::= comp-result comp-modifiers [refinement]
refinement ::= '|' comp-items ';' ...
comp-result ::= value
| 'group' tuple ['by' tuple]
comp-modifiers ::= ['order' 'by' order ',' ...] ['limit' value] ['offset' value]
order ::= value ['asc' | 'desc']
comp-item ::= row-binding
| guard
row-binding ::= row-name 'in' view
guard ::= boolean value
value ::= litteral constants
| 0-ary operator : 'null', 'current_timestamp'
| 1-ary operator : 'nullable' value
| infix binary operator : value '+' value
| tuple : tuple
| row '.' field : field-access
| row '?' field : default value for a field
| 'cast' value 'as' column-type
tuple ::= '{' tuple-field ';' ... '}'
tuple-field ::= field-name '=' value
| field-access
field-access ::= value '.' field-name '.' ...
view-op ::=
| 'union' | 'union_all'
| 'intersect' | 'intersect_all'
| 'except" | 'except_all'
column-type ::= identifier (see the "Column Types" section)
:Note:
A few remarks.
-
";"-separated lists allows an optional ending ";", to match OCaml syntax : {a;b;}
-
'table' denotes a place where an
'a table
type is required -
antiquotations can be used in 'view', 'table' and 'value' places.
You can use antiquoted expressions in all the places where a 'view', 'table' (view) or 'value' (including rows, but not row names or field names) is accepted
-
Tuples, comprehension and guard lists can be empty :
<:view< $row$ | >>
is the view with only one line, which is 'row'<:insert< $table$ := $row$ | >>
inserts a single row<:delete< row in $table$ | >>
deletes a whole table<:update< row in $table$ := $value$ | >>
updates a whole table
In the 'view' and 'insert' cases, the ending "|" is optional. It is however not the case for 'update' and 'delete' : see the remark at the end of this section.
-
In order to simplify the common
{ a = foo.a; b = bar.foobar.b }
use case, tuple field names are optional when the value is a field access : in that case, the name of the accessed field is used (in case of nested access, eg.'foo.bar.baz
, the rightmost name is used, herebaz
); the example can thus be written{ foo.a; bar.foobar.b }
-
if the
BY
record of agroup .. by
expression is empty, theBY
part (by {}
) is optional. See the GROUP BY expressions and aggregate functions section.
<a name="exhaustive_manipulations"></a>
Remark : implicit exhaustive manipulation
It was found that user sometimes forgot the guards in an UPDATE
or
DELETE
query. The results of that mistake are usually rather
displeasing (deleting or modifying the whole table). In order to make
the mistake more visible and less likely to happen, the "|" before the
guards of an UPDATE
or DELETE
query are not optional by default :
if the user forgets the guards altogether, she will also forget the
"|" and will get an error.
<a name="view_expressions"></a>
View expressions
:BNF:
view ::= comprehension
| view-op '(' view ')' '(' view ')' ...
comprehension ::= comp-result comp-modifiers [refinement]
view-op ::=
| 'union' | 'union_all'
| 'intersect' | 'intersect_all'
| 'except" | 'except_all'
<a name="view_results"></a>
View results
View results are either a simple value or a GROUP BY
expression (see
the GROUP BY expressions and aggregate functions part of
this document).
The usual return value is a SQL tuple: {a = foo; b = bar}
will return
a view with columns named a
and b
and values corresponding to the
foo
and bar
expressions.
You can also give the name of a row : << t | t in .. >>
Columns values can be tuples themselves. This is a quite Macaque-specific feature, which makes the language much more homogenous. Tuple types (column or immediate records) can be nested arbitrarily. For example, a cartesian product view :
let cartesian_product va vb = << {a = a; b = b} | a in $va$; b in $vb$ >>
Care is taken that tuple values support all operations the plain SQL values support : Macaque works hard to translate them to expressions the usual SQL server can understand (tuple flattening). This part of macaque is rather delicate and you might encounter bugs with no-yet-well-tested combinations of nested tuples and other macaque features.
<a name="view_modifiers"></a>
View modifiers
View modifiers correspond to following SQL features : LIMIT
,
OFFSET
and ORDER BY
. Their behaviour should not come as
a surprise : ORDER BY
will sort the returned rows according to the
specified sorting criteria, LIMIT
will restrict the number of
returned rows, and OFFSET
will skip a given number of rows (for
example, LIMIT 2 OFFSET 3
will return the 4th and 5th row of
a view).
ORDER BY
syntax mimics the usual SQL syntax :
<< ... order by foo asc, bar desc, ... >>
The ordering modifier (asc
| desc
) is optional, asc
will be
choosed by default. If the ordering expression (foo, bar)
is
a record (immediate tuple or row), no guarantee is given as to the
column ordering decided (fields of a macaque record are not ordered,
so a lexicographic order cannot be chosen) : the view result will be
sorted on each column of the tuple, with the given ordering modifier.
ORDER BY
expressions are in the comprehension items scope : they can
depend on the rows bound in the latter part of the
comprehension. LIMIT
and OFFSET
are not : obviously << foo LIMIT foo.id | foo in $..$ >>
is not a well-formed query.
<a name="comprehension_items"></a>
Comprehension items
There is one important thing to know about comprehension items : row
bindings are not sequential, they're simultaneous (let .. and .. and .. in ..
) ! This is a not so happy feature, as the comprehension
syntax strongly suggest a sequential binding : there is an impedence
mismatch between the comprehension syntax and the SQL behaviour. For
further discussion, see the Remarks > semantic of row
bindings section of this manual.
Of course, guards can depend on the declared rows.
View operators
The usual UNION, INTERSECT and EXCEPT operators can be used in prefix position, with parentheses around their parameters. The syntax allows to pass arbitrary number of parameters, and a left-associativity is applied:
except (foo...) (bar...) (baz...)
is equivalent to
except (except (foo...) (bar...)) (baz...)
<a name="group_by"></a>
GROUP BY expressions and aggregate functions
The SQL query SELECT fields GROUP BY group_fields
roughly translate
in the group {fields} by {group_fields}
expression : after group
the aggregating part, and after by
the grouping part.
The returned row are the concatenation of the GROUP
record ({fields}
)
and the BY
records ({group_fields}
).
Example : << group {subtotal = sum[t.a]} by {k = t.b} | t in $view$ >> Will expand to SELECT SUM(t.a) AS subtotal, t.b AS k GROUP BY t.b FROM (...) AS t
<a name="empty_group"></a>
Empty GROUP part
GROUP BY
expressions with all fields in the BY
part are equivalent to
a SELECT DISTINCT
query.
<a name="accumulators"></a>
Restrictions on the GROUP record, accumulators [..] syntax
The BY
values (here t.b
) can be any expression depending on the bound
view (here t
, but possibly more than one), and anything else in
scope. The group values are restricted to match the GROUP .. BY
semantic :
- values depending on the bound tables of the view must be inside
"accumulators", that is square brackets :
<:view< group {c = t.c} by {k = t.b} | ... >>
is illegal (and won't type),
- accumulators must be used by an aggregate function, and nowhere else :
<:view< group {c = [t.c]} by {k = t.b} | ... >>
is illegal (and won't type),<:view< group {c = count[t.c]} by {k = t.b} | ... >>
is legal
- fields name bound in the
BY
record can be used freely :<:view< group {c = k; d = k + k; e = count[l]} by {k = t.b; l = t.c} | ... >>
is legal
<a name="static_group_by_typing"></a>
Corresponding typing errors
-
Use of group-varying values outside accumulators
Error: This expression has type Sql.grouped_row but an expression was expected of type < nul : Sql.non_nullable; t : 'a #Sql.row_t; .. > Sql.t
Cause : use of a bound row in the
GROUP
record outside an accumulator -
Use of aggregate functions outside accumulators
Error: This expression has type < nul : 'a; t : 'b > Sql.group Sql.unsafe but an expression was expected of type < nul : 'c; t : 'd; .. > Sql.t Sql.unsafe
Cause : one of the field values is an accumulator expression; accumulators should be consumed by aggregate functions and cannot be returned directly
<a name="non_grouped_aggregates"></a>
Non-grouped aggregates
Aggregates functions cannot be used outside group by expressions. If you want to use an aggregate functions over all the rows of a table, use a group by expression with an empty BY record :
SELECT max(t.id) FROM ...
-> group {max = max[t.id]} by {} | ...
In that case, by
is optional : group {max = max[t.id]} | ...
Rationale : aggregate functions are not regular operators; they have a non-trivial semantic, which is modeled by the typing transformations inside the group .. by expressions. Using them outside those expressions would not be safe and could lead to runtime errors if the SQL server doesn't accept an ill-formed query. It is a design goal of macaque to protect from runtime errors.
<a name="descriptions"></a>
Description syntax
Description syntax is used to describe existing database tables, making
them as macaque views (internally, it builds a runtime description of
the data-base typing informations, to be used by further macaque
processing). The syntax is a far relative of SQL CREATE TABLE
statements, and can be discovered in the example base.ml
file. Currently, no other information that name, type, nullability and
default value are accepted.
<:table< recipes ( recipe bigint NOT NULL, name text, amount bigint NOT NULL DEFAULT($ <:value< 0L >> $), category text NOT NULL DEFAULT($ <:value< "unknown category" >> $) ) >>
This is not a table creation/specification tool : it does not free you from the need of creating your tables in the database (with a potentially richer description : foreign keys, etc.). If you change a table description, you will have to duplicate the changes in the caml description as well.
<a name="incr_column"></a>
Auto-incrementing columns
SERIAL
columns are not supported (yet). It is however possible to
emulate this feature with explicit SEQUENCE
manipulations (sequence
operators section) :
let my_table = <:table< my_sql_table ( id bigint NOT NULL DEFAULT(nextval $table_id_seq$), ... ) >>
At insertion site, one can then use the default access syntax to get the next identifier:
<:insert< my_table := { id = my_table?id; ... } >>
<a name="sequence_descriptions"></a>
sequence descriptions
It is not possible to create new sequences from Macaque (this is
coherent with the choice that macaques allows descriptions, not
declarations), but macaque support sequence descriptions. Sequence
creation operators are part of the Sql.Sequence
module.
let table_id_seq = <:sequence< serial "the_sql_sequence_name" >>
<a name="check_descriptions"></a>
Checking descriptions
To help you with the macaque/database synchronization, the Check
module provide coherence check routines (check_table
,
check_sequence
). It will raise errors if the caml-side description
is not faithful to the real table structure (as described by the SQL
server table informations). It is possible to enforce an automatic
runtime check of every macaque-described table or sequence with the
'-check_tables' command line option (camlp4 time), which can be enabled
by the use_check
ocamlbuild flag (see the OCamlbuild
section of this document).
<a name="ocamlbuild"></a>
OCamlbuild
I have created a macaque-specific OCamlbuild plugin. It was originally
intended to help during macaque development only, but is probably
a valuable ressource if you want to use Macaque inside your
project. Besides the classical ocamlfind integration, you'll find the
Sql_syntax
modules with support for the following tags :
use_macaque
: enable the macaque syntax extensionsuse_check
: enable the-check_tables
flag (see the Decription syntax > Checking table descriptions subsection)
You should generally use the macaque
tag, the three other
preprocessing tags being there for finer-grained control.
<a name="SQL_operators"></a>
Sql functions, operators and data types
<a name="value_operators"></a>
Sql value operators
macaque can use all function and operators defined in the Sql.Op
module, using the standard ocaml syntax (and operator associativities
and precedences) :
<:value< $a$ + $b$ >>
is equivalent toSql.Op.(+) a b
<:value< nullable $foo$ >>
is equivalent toSql.Op.nullable foo
aggregate function
Sql.Op
has some aggregate functions. See the GROUP BY expressions
and aggregate functions section of this document for more
information.
<a name="value_antiquotations"></a>
Value types antiquotations
macaque supports some SQL data types, some of them having a litteral
syntax (ints and string : <:value< 2 >>
, <:value< "foo" >>
). They can
all be constructed by using named antiquotations :
<:value< $bool:true$ >>
<:value< $float:cos 1.2$ >>
More generally, the type constructors are the values in Sql.Value :
<:value< $foo:bar$ >>
is equivalent to Sql.Value.foo bar
<a name="column_types"></a>
Column types
Data types used in table descriptions are defined in the
Sql.Table_type
module. We use a different set of type names, in
order to mimic SQL type names and ease specification derivation from
existing SQL tables. For example, "integer" is used instead of "int",
and will expand to a Sql.Table_type.integer
value.
<a name="casts"></a>
Casts
You can use an explicit cast of the form cast foo as bar
, where
bar
is an identifier for a column-type as described in the "Column
types" section above. This is useful in particular to work around some
limitations of PostgreSQL type inference for nulls (see the "NULL and
semantic issues" subsection): if a null
somewhere is "too
polymorphic" for PostgreSQL, you may want to use for example cast null as integer
instead.
Sequence description
Sequence creators are defined in the Sql.Sequence
module.
<a name="sequence_ops"></a>
Sequence operators
Sql.Op has some sequence functions. Currently supported are
currval
and nextval
. They can be used to have an
auto-incrementing identifier :
<:insert< $table$ := {id = nextval $table_id_seq$; .. } >>
See the src/tests/sequence.ml for example.
<a name="view_antiquotations"></a>
View antiquotations
In view positions, macaque supports antiquotations through the
Sql.View
module : $foo:bar$
will expand to Sql.View.foo bar
.
<a name="phantom_types"></a>
phantom types
Sql.t types sure are heavy. You will be confronted to cluttered error messages with ugly as hell unification problems. Hopes this section helps.
Sql values are packed in a phantom type providing several information
about the value. It is an object type with a field t
containing type
information, and other fields for value information.
Type information fields :
-
'typ' : the corresponding caml type (eg.
<typ : int>
);Sql.null
has a polymorphict
field, asNone
for option typesval null : < t : < typ : 'a; numeric : 'b >; nul : nullable; get : unit > t
-
'numeric' : allows for numerical operators overloading (see
Sql.Op.(+)
type)
Value information fields :
-
'nul' field : nullability information, is either
Sql.nullable
orSql.non_nullable
. Constants from theSql.Data
operators have a polymorphicnul
field, so that they can be used in both nullability context -
'get' : just-built sql values (<:value< 1 >>, etc.) or values retrieved from a SQL query are gettable : you can ask for their corresponding caml value values using the two accessors Sql.get and Sql.getn (get nullable) :
val get : < t : < typ : 'a; ..>; nul : non_nullable; get : _; .. > Sql.t -> 'a val getn : < t : <typ : 'a; ..>; nul : nullable; get : _; .. > Sql.t -> 'a option
On the contrary, values built from SQL operators or field access can not be transformed back into caml values without being first sent to the SQL server as part of a query : they don't have a 'get' field.
Now, you probably understand why the error messages tends to be slightly longer than usual. And all sql values carry such information, including (nested) rows...
<a name="type_subtleties"></a>
Specific type fourberies
macaque being strongly (and intricately) typed, you will often find yourself confronted to hostile error messages, which means someone has done something wrong. I hope that you will more often be wrong than macaque. There are nonetheless some specificities that you should be aware of.
<a name="interface_safety"></a>
Sql interface safety
Macaque syntax extensions transform user code into complicated caml code. But the produced codes still lies outside macaque module boundaries, and has access to the same information that the user has. Concretely, as most of the operations needed by those extensions are not typable inside the OCaml type system (for example dynamic construction of an object based on a list of field names and values), some unsafe operations had to be exposed through Sql interface^α. They are marked with the "unsafe" parametrized type (which is only there for documentation purposes) and you should NOT use them : every function having an "unsafe" type somewhere in its interface is to be considered forbidden. All the other functions should be type-safe, otherwise it's a bug.
:Footnote:
α: and there is some Obj magic behind the scene; but it's protected by typing and you won't get a segfault, I hope.
<a name="update_subtyping"></a>
Update subtyping problem
An example of update syntax is << t in $tab$ := {amount = t.amout + 1} >>,-
which increment the "amout" column of all rows in table tab
. Table
tab
probably has more fields than just the "amount" colum, so the
type of the tuple on the right should be a subtype of table
row
type.
The problem is that polymorphic subtyping quantification is not available in OCaml type system : subtype relations have to be explicitely constructed from the object types. This is fine when, as in the given example, the right tuple object type is known as camlp4-time (an object with only one field 'amout'), but not when the corresponding value is an antiquotation :
let update table value predicate =
<:update< row in $table$ := $value row$ | $predicate row$ >>
In this case, the update syntax is used to define a generic update operation^β : I know of no syntactic way to impose that the right tuple type be a subtype of the table row type.
To keep things simple, there is an ad-hoc rule for the specific update problem : when tuple type is known as camlp4-time, a subtyping relation is used, but when there is a quotation, the two values types are unified : the set tuple has to have exactly the same columns as the table, no less.
This loss of generality could surprise the innocent user. In order to prevent hair scratching, a warning is emitted at camlp4-time when an antiquoted caml value is used for the update tuple. This warning can be disabled using the -sql-nowarn-undetermined-update command line option.
Of course, it is always possible to manually expand an antiquotated value, when the fields name are known at development-time : <:update< t in $table$ := $tup$ | ... >> should be rewritten into <:update< t in $table$ := { foo = $tup$.foo; bar = $tup$.bar } | ... >>
:Footnote:
β: notice how value
and predicate
are actually functions
depending on row
values; this kind of thing can greatly improve
code factorization among your SQL queries
<a name="general_remarks"></a>
Remarks
<a name="side_effects"></a>
Side effects
It is probably an obvious thing to say, but users should not put expressions which have side-effect when evaluated inside macaque expressions. No guarantee is given that any part of a macaque expression will be evaluated at all, or only once.
In case of doubt, you should explicitely evaluate expressions before handling them to the macaque expression : let my_val = my_expr in << ... $my_val$ ... >>
<a name="row_binding_semantics"></a>
Semantic of row bindings
From the View expressions > Comprehension items part of this document : %
There is one important thing to know about comprehension items : row bindings are not sequential, they're simultaneous (let .. and .. and .. in ..) ! This is a not so happy feature, as the comprehension syntax strongly suggest a sequential binding : there is an impedence mismatch between the comprehension syntax and the SQL behaviour.
The rationale behind this choice is purely pragmatic : SQL queries use simultaneous bindings, and reproducing that choice much facilitate the translation. It would be possible to have sequential bindings, but that's somewhat more complicated, and I supposed it was unnatural to the SQL query writer anyway.
I'm not quite satisfacted with this state of affairs, so it is
possible that I replace the current behaviour with sequential
declarations in the future. Please do not write code that would break
if that was the case : << .. | t1 in $..$; t2 in $ .. t1 $ >>
when
t1
is already in the scope. Such trickery tremendously hurts
readability anyway, so you should not use it even if it wasn't for
compatibility reasons.
<a name="camlp4_use"></a>
Camlp4 use
I am under the impression that some "serious" ocaml users try to avoid to include camlp4 in their compilation chain if possible, and to minimize their reliance on camlp4 extensions. Camlp4 is a not-so-simple (and really-not-so-well-documented) tool, and camlp4 extensions are sometimes fragile and more subject to bugs than other pieces of ocaml software; syntactic bugs can have far-reaching effects (they are, however, nearly always spotted at compilation-time) and are difficult to spot for the non-camlp4-aware user.
Specific care has been taken in Macaque to make the camlp4 processing as solid as possible :
-
camlp4 processing is split in two independent extensions (
pa_macaque
,pa_bananas
) so that the user can make fine-grained decisions as to what extension to activate for a specific compilation -
pa_macaque
relies on the "quotation" mechanism of the existing ocaml camlp4 grammar : they DO NOT modify the ocaml grammar but instead reuse the<:..< .. >>
mechanism which is widely used inside camlp4 itself, and thus can be reasonably trusted -
pa_bananas
is the only extension modifying the OCaml grammar itself; it is a simple extension which affects a localized part of the parsing process (the "." level of expressions :a.b
,a#b
..), with no precedence/associativity subtleties. It is a relatively trustable extension, and any spotted defects would be easy to fix.Moreover,
pa_bananas
is only a convenience extension and it is easy not to use it (useSql.get
/Sql.getn
instead of#!
/#?
). If a problem were to arise, it would be easy for a user to mechanically replace allpa_bananas
syntax uses by camlp4-free equivalent expressions, and droppa_bananas
from her compilation chain entirely.
<a name="pgocaml_compat"></a>
Macaque and PG'OCaml
Macaque relies on PG'OCaml low-level interface. It is fully compatible with PG'OCaml : you can use a database handler for Macaque queries and PG'OCaml queries at the same time (PG'OCaml stocks some private information in those handler; Macaque doesn't access nor modify them).
PG'OCaml is more mature and its general approach leads to a safer software : Macaque will probably catch less errors at compile-time than PG'OCaml (see the PostGreSQL typing issues of the next section for an area where I believe Macaque lacks behind PG'OCaml in terms of safety), and it is very young software which probably have quite a few bugs remaining.
This inconvenient comes with the flexibility Macaque brings : it is composable, and does not need the SQL server available at compile time (or, more exactly, camlp4-time).
I also believe that PG'OCaml prepared queries model is a bit more efficient than Macaque raw-SQL-queries output. I have done absolutely no work on Macaque optimisation, and my priority are rather safety and composability. If you're interested in investigating Macaque performances, let me know.
<a name="null_inference"></a>
NULL and semantic issues
I have tried to reproduce SQL behaviour as closely as possible. In
particular, all operators are really SQL operators, so they will have
the SQL behaviour, for example null = null
is null, not false.
I have encountered issues with PostGreSQL typing system
though. PostGreSQL gives NULL the type 'unknown', which is problematic
because sql operators are strongly typed, but the type system is not
very expressive. For example NULL + 0
will work okay, but :
base=> SELECT e.n + 0 FROM (SELECT NULL AS n) AS e; ERROR: failed to find conversion function from unknown to integer
I have tried to work around those issues, but think some of them are probably lurking somewhere. It is unfortunate as it undermines Macaque static safety : these runtime errors are subtle and quite difficult to prevent. Please report them if you find some. I hope not.