Home

Awesome

OCaml Towards Clarity and Grace

This is a guide towards writing better OCaml code. We all like to write OCaml code that is correct, maintainable, efficient, and maybe even beautiful. The sad truth is that this cannot be achieved simply by following some rules. Just like it is not enough for a book to be composed of correct sentences and coherent paragraphs. However, a book violating these principles stands little chance of finding many readers. Likewise, this guide tries to help with the small structures in programming OCaml upon which we can hope to build bigger ones.

I am happy to accept more material and to include critical discussion of the recommendations already in place. Uncovered Topics is a place to record desirable topics.

<details> <summary>Table of Contents</summary>

Table of Contents

Created by gh-md-toc

</details>

Other Resources

Below is a list of projects that I like for their clean code that could serve as a general inspiration.

Guiding Principles

This probably needs some discussion

Uncovered Topics

Indentation, Line Length

Rationale:

Comments

Comments generally go before the code they are referencing. The possible exception are declarations in interfaces (mli files, signatures) and types where they can go after the declaration.

Syntactically there are two kinds of comments:

  1. General comments, enclosed in (* and *)
  2. Special comments, enclosed in (** and *)

Special comments are associated with type and values in a program and treated specially by the compiler. They become available in automatically generated documentation. For the association to work, there must be no empty line between a special comment and the element they are associated with. See the section about ocamldoc for details.

Code should always be as clear as possible but that clarity cannot always convey the reason behind a design. Comments have the role to provide it: the why.

What to Comment

What not to Comment

Examples

Caveats

Code duplication is bad but copying comments is worse. Be very careful when copying comments to make sure they are not becoming misleading in a new context.

Viewing rendered documentation

You can use odig odoc && odig doc to view the documentation of all installed packages. If your package uses jbuilder then you can also view the documentation of the package you are working on with jbuilder doc.

Naming and Declarations

OCaml has a few conventions for names. In particular, capitalisation is significant.

General considerations:

Order of declarations: in a module, typically the following order is maintained unless dependencies force a different order or mixing declarations:

  1. Exceptions
  2. Types
  3. Modules
  4. Values

Special cases:

  module Tree: sig
    type 'a t
    val empty: 'a t
  end = struct
    type 'a t = Empty | Tree of 'a t * 'a * 'a t
    let empty = Empty
  end

This ties in with the recommendation above to not use tree for the type declaration in module Tree.

Don't use ;;

There is no place for the double semicolon (;;) in OCaml source code. It is used only interactively in the OCaml toplevel (REPL).

Scoping

Use the module system to group values. It's quite common to define simple values that belong together and to indicate this in their names:

let option_debug     = false
let option_verbosity = High
let option_log       = stdout

It is better to let the module system do the work:

module Option = struct
  let debug     = false
  let verbosity = High
  let log       = stdout
end

A value can now be accessed like in Option.debug. A module simply used for grouping doesn't require an interface.

Constants

  let ( ** ) x y    = Int64.mul (Int64.of_int x) y
  let sec           = 1L
  let sec_per_min   = 60 ** sec
  let sec_per_hour  = 60 ** sec_per_min
  let sec_per_day   = 24 ** sec_per_hour

More details are in the OCaml Manual.

Introduce and Document Interfaces

Module interfaces are the best way to document and control an implementation - employ them widely.

A module encapsulates code serving a specific purpose. An interface should hide implementation details and document the API.

In OCaml, every file abc.ml is a module and should come with an interface abc.mli. In addition, a module can be part of a (file) module using struct .. end. This is the only way to define functors.

module ID : sig
  type t
  val make: unit -> t
  (** [make] creates a unique token *)
  val equal: t -> t -> bool
  (** [equal] is true, if and only if two tokens are the same *)
end = struct
  type t = unit ref
  let make () = ref ()
  let equal x y = x == y (* use pointer equality *)
end

Interfaces not only help document code, but can also prevent needless recompilation (at least when you are using bytecode). As an exception to this rule, if your module only defines signatures then prefer using a .ml file for this, otherwise either the .mli would just be a duplicate of the .ml file, or you'd have to use .mli-only modules which don't have good tooling support.

Polymorphic vs. Regular Variants

OCaml has two kinds of variants: regular and polymorphic variants:

type colour
  = Red   of float
  | Blue  of float
  | Green of float    (* regular      *)

let blue = Blue 1.0

let blue' = `blue 0.1 (* polymorphic, no declaration requited *)

Use regular variants by default. Polymorphic don't require a type declaration, which makes them flexible but also difficult to debug and they lead to complicated inferred types. They have their use case in specific use cases but they should not be used simply to avoid a type declaration.

A recent discussion of polymorphic and regular variants is Do I need polymorphic variants?. The subject is encoding a finite state machine where the author would like the type system to prohibit certain state transitions.

Avoid Opening Modules Globally

For convenience, many developers open modules globally in order to gain access to functions without having to use qualified access. Below, Printf, List, and Sys are opened in this way:

open Printf
open List
open Sys

let rec join = function
  | []      -> ""
  | [x]     -> x
  | [x;y]   -> x ^ " and " ^ y
  | x::xs   -> x ^ ", "    ^ join xs

let main () =
  let argv = Array.to_list argv in
  let args = tl argv in
  match args with
  | []        -> printf "Hello, world!\n"
  | names     -> printf "Hello, %s!\n" (join names)

let () = main ()

Do This Instead

Between opening a module globally and not at all, several options exist.

  let main () =
    let argv = Array.to_list Sys.argv in
    let args = List.tl argv in
    match args with
    | []        -> Printf.printf "Hello, world!\n"
    | names     -> Printf.printf "Hello, %s!\n" (join names)
  module L = List

  let printf  = Printf.printf
  let sprintf = Printf.sprintf
  let main () =
    let open Printf in
    let argv = Array.to_list Sys.argv in
    let args = List.tl argv in
    match args with
    | []        -> printf "Hello, world!\n"
    | names     -> printf "Hello, %s!\n" (join names)

  Int64.(add (of_int x) 1_000_000L)

The code above is another way of writing the code below. The module Int64 is open inside the parentheses.

  Int64.add (Int64.of_int x) 1_000_000L

This is especially effective to access constructors that are defined inside another module.

  module M = struct
     type +'a t
     module Infix = struct
       val (>>=) : 'a t -> ('a -> 'b t) -> 'b t
     end
     ...
  end
  ....
  let foo =
     let open M.Infix in
     f x
     >>= fun () ->
     ...

This avoid bringing in all the names from M itself in scope, it only brings in scope the very small number of operators defined by M.Infix

Rationale

Opening a module introduces all its values and constructors into the local module. Since the definitions of these values and constructors are not visible, it becomes very hard to understand the code without tool support. While a developer might argue that tool support is available, it is not during reviews on GitHub or when looking at a printout. The problem is exaggerated when several modules are opened.

Avoid using references

Most code does not require references. Introducing them should be well justified. In particular, using references to implement loops and similar local control flow is probably avoidable.

Equality: != and == vs <> and =

Using != and == for equality is probably wrong and you should use <> and = instead.

OCaml has two kinds of equality:

  1. Structural equality, tested with = and <>. This compares the shape of two values and is typically the correct choice.

  2. Physical equality (pointer equality), tested with == and !=. This compares the address in memory of two values. This is typically used in performance-oriented code. Pointer equality implies structural equality but not vice versa.

if vs. match - use of semicolon

Be aware that statement sequences require begin/end in if expressions. OCaml has some inconsistencies how it handles statement sequences (containing ;). Compare:

match true with
  | true -> ()
  | false -> print_endline "1"; print_endline "2"
(* prints nothing *)

versus:

if false then print_endline "1"; print_endline "2"
(* prints 2 *)

A statement sequence inside an if or else branch must be grouped by begin/end or parentheses to be governed by the guard.

if false then begin
  print_endline "1";
  print_endline "2"
end else begin
  print_endline "3";
  print_endline "4"
end

The danger of not realising what code is governed by an if-expression is exasperated by incorrect indentation and by incremental changes to existing code.

Error Messages and Error Handling

type 'a t = Ok of 'a | Error of string
let error fmt = Printf.kprintf (fun msg -> Error msg) fmt

Split imperative and functional code

Purely functional code is easiest to test. Therefore code should be as functional as possible and imperative code minimised.

In order of preference the interface of a module should expose:

Functions - Argument Order

In a function definition, the most common arguments should come first.

While a function can work with any order of formal arguments, putting common arguments make it more usable. For example:

let sum     = List.fold_left (+) 0
let default x = function
  | None   -> x
  | Some y -> y

let sum' xs  = xs |> List.map (default 0) |> sum

Function sum: int list -> int computes the sum from a list of integers. It's definition is so concise because List.fold_left takes its list argument last. Likewise, default and List.map combine well because of the order of their arguments. Usually operations on data structures should always take the data structure last to allow for the above concise data-flow usage.

The best argument order is not always clear - so this should be just a general consideration when writing code.

Functions - Named Parameters

Parameters in a function call are usually identified by their position but may be identified by name. The name becomes part of the type.

(** [prefix ~pre s] returns [true] iff [pre] is a prefix of [s]. *)
let prefix ~pre s =
  let len = String.length pre in
  if len > String.length s then false
  else begin
    let rec check i =
      if i=len then true
      else if String.get s i <> String.get pre i then false
      else check (i+1)
    in
    check 0
  end

Consider using named parameters if a function takes several arguments of the same type that could be confused. Without a named parameter, the above function would have type string -> string -> bool and is it not obvious, which roles the two strings have. As it is, the function has the following type:

val prefix : pre:string -> string -> bool

This function may be called as prefix "foobar" ~pre:"foo" but it can't be called without a named parameter -- which adds to clarity.

Functions - Pattern Matching

Pattern matching is the preferred way to define functions. Patterns are easier to read and extend than if-then-else expressions. Consider matching multiple arguments at the same time to arrive at a short and un-nested function definition.

The example below defines a binary tree and a function sum that sums up the nodes along a path in the tree. It uses pattern matching effectively by matching over two values (the current node, and the current position in the path) at the same time.

type 'a tree
  = Empty
  | Node of 'a tree * 'a * 'a tree

type branch = Left | Right
type path   = branch list

let sum tree path =
  let rec loop acc path tree = match path, tree with
    | []          , _           -> acc
    | _::_        , Empty       -> failwith "path too long"
    | Left::path  , Node(l,n,r) -> loop (acc+n) path l
    | Right::path , Node(l,n,r) -> loop (acc+n) path r
  in
    loop 0 path tree

Functions - Data Flow

The readability of code can often be improved by using the pipe operator |> because it allows functions to be written such that data flows from top to bottom and left to right.

Consider the following function numbers and numbers' that compute a string from a list of optional numbers:

let xs = [Some 1; None; Some 2; None; None; Some 3; Some 4; Some 5] in

let (++) x xs = match x with
  | Some x -> x :: xs
  | None   -> xs

let numbers xs =
  String.concat "," (List.map string_of_int (List.fold_right (++) xs [])

let numbers' xs =
  List.fold_right (++) xs []
  |> List.map string_of_int
  |> String.concat ", "

The definition of numbers is more traditional and numbers' uses the pipe operator. In the latter the data from the argument xs flows through the function definition from top to bottom and left to right. In the more traditional definition, data flows from right to left.

Functions - Avoid Deep Nesting

Code that is deeply nested is hard to read and hard to test. This suggests that it should be restructured - probably by introducing new functions. Pattern matching is a another proven way to reduce the nesting of code.

Functions - Tail Recursion

Recursive functions operating on large data structures need to be tail recursive as otherwise the runtime stack may overflow.

While tail recursiveness is generally desirable, it is often not required because data is not large. For performance it is usually better to pay attention to allocation patterns than to tail recursion.

Be aware that tail recursion can be inhibited by exceptions: the loop below is not tail recursive!

let read_lines inc =
   let rec loop acc =
     try
       let l = input_line inc in
       loop (l :: acc)
     with End_of_file -> List.rev acc
   in
   loop []

To make it tail recursive, handling of values and exceptions need to be combined into one match expression (available since OCaml 4.02):

let read_lines io =
  let rec loop acc =
      match input_line io with
      | l -> loop (l :: acc)
      | exception End_of_file -> List.rev acc
  in
  loop []

Function calls can be annotated to receive a compiler warning if a call is not tail recursive as expected:

let read_lines io =
  let rec loop acc =
    try
      let l = input_line io in
      (loop [@tailcall]) (l :: acc)
    with End_of_file -> List.rev acc
  in
  loop []

The problem with this is that one has to be aware of the problem in the first place to write such an annotation.

Be aware that some functions from the standard library are not tail recursive (e.g. List.map).

Resources and Exceptions: use finally

Ensure that resources are de-allocated in the presence of exceptions. This is typically done with a function like finally:

let finally (f: unit -> 'a) (free: unit -> 'b) =
  let res = try f () with exn -> free (); raise exn in
  free ();
  res

let with_file path (f: in_channel -> 'a) =
  let io = open_in path in
  finally
    (fun () -> f io)
    (fun () -> close_in io)

with_file "/etc/passwd" (fun io -> input_line io |> print_endline)

In the above code we make sure to close the file even if the function reading from it throws an exception. A function like finally should be already defined in existing projects so there is no need to re-implement it locally.

Resources that need to be managed are not just files but can be anything like database and network connections or timers.

A finer detail is that care should be taken not to destroy the backtrace of an exception if the code in finally (or functions called by it) raise/catch exceptions of its own.

Exceptions: try-with vs. match

OCaml 4.02 introduced the ability to match exceptions in patterns. This is often more elegant than using try .. with. However, a small difference is worth knowing:

let f x =
  try
    match x with
    | [] -> 0
    | _ :: _ -> failwith "not empty" (* caught below *)
  with _ -> 1

let g x =
  match x with
  | [] -> 0
  | _ :: _ -> failwith "not empty" (* escapes *)
  | exception _ -> 1

When using match, the evaluation of the right hand side is not governed by an exception handler, whereas it is when using match inside a try block.

Compare Functions

OCaml provides a generic compare function but it can give unexpected results when comparing complex values or not a desired order. This is the reason that the Map.Make functor requires to define a compare function. Here is a recipe to define a custom compare function:

module Time = struct
  type 'a t =
    { hour:     int
    ; minutes:  int
    ; seconds:  int
    ; info:     'a
    }

  let (<?>) a b =
    if a = 0 then b else a

  let compare a b =
        compare a.hour    b.hour
    <?> compare a.minutes b.minutes
    <?> compare a.seconds b.seconds
    <?> 0
end

This relies on the left-associativity of the infix <?> operator.

Type Compatibility and Functors

Below we have a functor and instantiate it twice:

module X (E: sig end): sig 
  type t 
  val t: t 
end = struct 
  type t = int 
  let t = 0 
end

module X1 = X (struct end)
module X2 = X (struct end)

The types X1.t and X2.t are distinct:

X1.t = X2.t;;
Error: This expression has type X2.t but an expression was expected of type
         X1.t

However, if we construct the two modules slightly different by applying an existing module twice, they are compatible:

module E = struct end
module X1' = X(E)
module X2' = X(E)

X1'.t = X2'.t;;
- : bool = true

Both behaviours can be useful - so be careful when instantiating a functor multiple times.

Objects

OCaml has an object system. It should not be used by default. Since it was introduced, the language gained a number of features that make using objects less convincing. In particular, OCaml supports modules as first-class values. Almost all libraries in the OCaml/Opam ecosystem use a functional style and don't provide a class hierarchy.

Git - Commit Messages

Examples:

Use Pattern Matching for Value Destruction

The readability of code is improved by using pattern matching rather than (deeply nested) if-then-else control flow. Multiple patterns can be matched at the same time.

type color = Red | Yellow | Green

let wait = function (* good *)
  | Red     -> seconds 30
  | Yellow  -> seconds 10
  | Green   -> seconds 0

let wait color = (* bad *)
  if color = Red then seconds 30
  else if color = Yellow then seconds 10
  else seconds 0

Avoid introducing new dependencies

Introducing new dependencies on outside libraries needs to be well justified.

With Opam it is easy to install libraries and to use them. It is usually better to use a well-established and maintained library than to roll your own. This is especially true for well-established protocols, formats and problems in general. But any new library also brings along the responsibility to watch its development.