Home

Awesome

wmk

wmk is a flexible and versatile static site generator written in Python.

<!-- features "Main features" 10 -->

Main features

The following features are present in several static site generators (SSGs); you might almost call them standard:

The following features are among the ones that set wmk apart:

The only major feature that wmk is missing compared to some other SSGs is tight integration with a Javascript assets pipeline and interaction layer. Although wmk allows you to configure virtually any assets processing you like, this nevertheless means that if your site is reliant upon React, Vue or similar, then other options are probably more convenient.

That exception aside, wmk is suitable for building any small or medium-sized static website (up to a couple of thousand pages, depending on the content).

<!-- installation "Installation" 20 -->

Installation

Method 1: git + pip

Clone this repo into your chosen location ($myrepo) and install the necessary Python modules into a virtual environment:

cd $myrepo
python3 -m venv venv
. venv/bin/activate
pip install -r requirements.txt

After that, either put $myrepo/bin into your $PATH or create a symlink from somewhere in your $PATH to $myrepo/bin/wmk.

Required software (aside from Python, of course):

wmk requires a Unix-like environment. In particular, bash must be installed in /bin/bash, and the directory separator is assumed to be /.

Method 2: Homebrew

If you are on MacOS and already have Homebrew, this is the easiest installation method.

First add the tap to your repositories:

brew tap bk/wmk

Then install wmk from it:

brew install --build-from-source wmk

Method 3: Docker

If you are neither on a modern Linux system nor on MacOS with Homebrew, it may be a better option for you to run wmk via Docker. In that case, after cloning the repo (or simply copying the Dockerfile from it) you can give the command

docker build -t wmk .

in the directory containing the Dockerfile, in order to build an image called wmk. You can then run the various wmk subcommands via Docker, for instance

docker run --rm --volume $(pwd):/data --user $(id -u):$(id -g) wmk b .

to build the wmk project in the current directory, or

docker run --rm -i -t --volume $(pwd):/data --user $(id -u):$(id -g) -p 7007:7007 wmk ws . -i 0.0.0.0

to watch for changes in the current directory and run a webserver for the built files.

Obviously, such commands can be unwieldy, so if you run them regularly you may want to create aliases or wrappers for them.

<!-- usage "Usage: The wmk command" 30 -->

Usage

The wmk command structure is wmk <action> <base_directory>. The base directory is of course the directory containing the source files for the site. (They are actually in subdirectories such as templates, content, etc. – see the "File organization" section below).

<!-- organization "File organization" 40 -->

File organization

Inside a given working directory, wmk assumes the following subdirectories for content and output. They will be created if they do not exist:

<!-- input_formats "Input formats" 45 -->

Input formats

The format of the files in the content/ directory is determined on the basis if their file extension. The following extensions are recognized by default:

Pandoc is turned on automatically for all non-markdown, non-HTML formats in the above list. In order to use such content, a sufficiently recent version of Pandoc therefore must be installed.

The list of input formats and how they are handled is configurable through the content_extensions setting in the config file. See the "Configuration file" section below for details.

Note: The three formats JATS, DocBook and TEI are all XML-based. Files in all three formats would therefore often use the generic .xml extension. However, wmk currently assumes that .xml implies that the JATS format is intended. If you want to force wmk to handle a file with that extension as DocBook or TEI, you would have to add an external YAML metadata file with pandoc_input_format set to the appropriate value.

In-file YAML frontmatter is supported for all of the above except for the three binary formats DOCX, ODT and EPUB. Of course, metadata from an associated external YAML file or inherited metadata applies in all cases. In addition, the "native" metadata seen by Pandoc for most of the formats (more precisely all non-markdown, non-HTML formats other than Textile, which uses YAML frontmatter natively) will be used as a fallback source of in-file metadata, although this is limited to specific standard keys such as title, author and date.

Note that although other input formats are supported, the canonical format is still markdown. Unless there is a special reason to do otherwise it is the most sensible and efficient choice for websites generated using wmk.

<!-- gotchas "A few gotchas" 50 -->

A few gotchas

When creating a website with wmk, you might want to keep the following things in mind lest they surprise you:

<!-- vars "Context variables" 60 -->

Context variables

The Mako/Jinja2 templates, whether they are stand-alone or being used to render markdown (or other) content, receive the following context variables:

In the case of Jinja2 templates, three extra context variables are available:

When templates are rendering markdown (or other) content, they additionally get the following context variables:

For further details on context variables set in the document frontmatter and in index.yaml files, see the "Site, page and nav variables" section below.

<!-- config "Configuration file" 70 -->

Configuration file

A config file, $basedir/wmk_config.yaml, can be used to configure many aspects of how wmk operates. The name of the file may be changed by setting the environment variable WMK_CONFIG which should contain a filename without a leading directory path.

The configuration file must exist (but may be empty). If it specifies a theme and a file named wmk_config.yaml (regardless of the WMK_CONFIG environment variable setting) exists in the theme directory, then any settings in that file will be merged with the main config – unless ignore_theme_conf is true.

It is also possible to split the configuration file up into several smaller files. These are placed in the wmk_config.d/ directory (inside the base directory). The filename of each yaml file in that directory (minus the .yaml extension) is treated as a key and the contents as its value. Subdirectories can be used to represent a nested structure. For instance, the file wmk_config.d/site/colors/darkmode.yaml would contain the settings that will be visible to templates as the site.colors.darkmode variable. Note that the WMK_CONFIG environment variable affects the name of the directory looked for; setting it to myconf.yaml would e.g. mean that wmk will inspect myconf.d/ for extra configuration settings instead of wmk_config.d/ (although this does not apply to themes, whose configuration file/directory name is fixed).

Currently there is support for the following settings:

<!-- pandoc "A note on Pandoc" 80 -->

A note on Pandoc

Pandoc's variant of markdown is very featureful and sophisticated, but since its use in wmk involves spawning an external process for each content file being converted, it is quite a bit slower than Python-Markdown. Therefore, it is only recommended if you really do need it. Often, even if you do, it can be turned on for individual pages or site sections rather than for the entire site. (Of course, if you are working with non-markdown, non-HTML input content, using Pandoc is unavoidable.)

If you decide to use Pandoc for a medium or large site (or if you have a significant amount of non-markdown content), it is recommended to turn the use_cache setting on in the configuration file. When doing this, be aware that content that is sensitive to changes apart from the content file itself will need to be marked as non-cacheable by adding no_cache: true to the frontmatter. If you for instance call the pagelist() shortcode in the page, you would normally want to mark the file in this way.

The markdown_extensions setting will of course not affect pandoc, but there is one extension which is partially emulated in wmk's Pandoc setup, namely toc.

If the toc frontmatter variable is true and the string [TOC] is present as a separate line in a document which is to be processed by pandoc, then it will be asked to generate a table of contents which will be placed in the indicated location, just like the toc extension for Python-Markdown does. The toc_depth setting (whose default value is 3) is respected as well, although only in its integer form and not as a range (such as "2-4"). This applies not only to markdown documents but also to the non-markdown formats handled by Pandoc.

<!-- themes "Available themes" 90 -->

Available themes

There are several wmk themes available:

<!-- shortcodes "Shortcodes" 100 -->

Shortcodes

A shortcode consists of an opening tag, {{<, followed by any number of whitespace characters, followed by a string representing the "short version" of the content, followed by any number of whitespace characters and the closing tag >}}.

A typical use case is to easily embed content from external sites into your markdown (or other) content. More advanced possibilities include formatting a table containing data from a CSV file or generating a cropped and scaled thumbnail image.

Shortcodes are normally implemented as Mako components named <shortcode>.mc in the shortcodes subdirectory of templates (or of some other directory in your template search path, e.g. themes/<my-theme>/templates/shortcodes). If jinja2_templates is set to true, however, the shortcode templates are in Jinja2 format instead, and use the .jc extension rather than .mc.

The shortcode itself looks like a function call. Note that positional arguments can only be used if the component has an appropriate <%page> block declaring the expected arguments.

The shortcode component will have access to a context composed of (1) the parameters directly specified in the shortcode call; (2) the information from the metadata block of the markdown file in which it appears; (3) a counter variable, nth, indicating number of invocations for that kind of shortcode in that markdown document; and (4) the global template variables.

Shortcodes are applied before the content document is converted to HTML, so it is possible to replace a shortcode with markdown content which will then be processed normally. Note, however, that this may lead to undesirable results when you use such shortcodes in a non-markdown content document.

A consequence of this is that shortcodes do not have direct access to (1) the list of files to be processed, i.e. MDCONTENT, or (2) the rendered HTML (including the parts supplied by the Mako template). A shortcode which needs either of these must place a (potential) placeholder in the markdown source as well as a callback in page.POSTPROCESS. Each callback in this list will be called just before the generated HTML is written to htdocs/ (or, in the case of a cached page, after document conversion but right before the Mako layout template is called), receiving the full HTML as a first argument followed by the rest of the context for the page. Examples of such shortcodes are linkto and pagelist, described below. (For more on page.POSTPROCESS and page.PREPROCESS, see the "Site, page and nav variables" section below).

Here is an example of a simple shortcode call in markdown content:

### Yearly expenses

{{< csv_table('expenses_2021.csv') >}}

Here is an example csv_table.mc Mako component that might handle the above shortcode call:

<%page args="csvfile, delimiter=',', caption=None"/>
<%! import os, csv %>
<%
info = []
with open(os.path.join(context.get('DATADIR'), csvfile.strip('/'))) as f:
    info = list(csv.DictReader(f, delimiter=delimiter))
if not info:
    return ''
keys = info[0].keys()
%>
<table class="csv-table">
  % if caption:
    <caption>${ caption }</caption>
  % endif
  <thead>
    <tr>
      % for k in keys:
        <th>${ k }</th>
      % endfor
    </tr>
  </thead>
  <tbody>
    % for row in info:
      <tr>
        % for k in keys:
          <td>${ row[k] }</td>
        % endfor
      </tr>
    % endfor
  </tbody>
</table>

Note that if Jinja2 templates are being used, positional arguments are not supported except for in built-in shortcodes, so the shortcode call in the Markdown in the above example would have to be changed to cvs_table(csvfile='expenses_2021.csv') or similar.

Shortcodes can take up more than one line if desired, for instance:

{{< figure(
      src="/img/2021/11/crocodile-or-alligator.jpg",
      caption="""
Although they appear similar, **crocodiles** and **alligators** differ in easy-to-spot ways:

- crocodiles have narrower and longer heads;
- their snouts are more V-shaped;
- also, crocodiles have a protruding tooth, visible when their mouth is closed.
""") >}}

In this example, the caption contains markdown which would be converted to HTML by the shortcode component (assuming we're dealing with the default figure shortcode).

Note that shortcodes are not escaped inside code blocks, so if you need to show examples of shortcode usage in your content they must be escaped in some way in such contexts. One relatively painless way is to put a non-breaking space character after the opening tag {{< instead of a space.

Default shortcodes

The following default shortcodes are provided by the wmk installation:

<!-- pagevars "Site, page and nav variables" 110 -->

Site, page and nav variables

When a markdown file (or other supported content) is rendered, the Mako template receives a number of context variables as partly described above. A few of these variables, such as MDTEMPLATES and DATADIR set directly by wmk (see above). Others are user-configured either (1) in wmk_config.yaml (the contents of the site object and potentially additional "global" variables in template_context); or (2) the cascade of index.yaml files in the content directory and its subdirectories along with the YAML frontmatter of the markdown file itself, the result of which is placed in the page object.

When gathering the content of the page variable, wmk will start by looking for index.yaml files in each parent directory of the markdown file in question, starting at the root of the content directory and moving upwards, at each step extending and potentially overriding the data gathered at previous stages. Only then will the YAML in the frontmatter of the file itself be parsed and added to the page data.

The file-specific frontmatter may be in the content file itself, or it may be in a separate YAML file with the same name as the content file but with an extra .yaml extension. For instance, if the content filename is important.md, then the YAML file would be named important.md.yaml. If both in-file and external frontmatter is present, the two will be merged, with the in-file values "winning" in case of conflict.

At any point, a data source in this cascade may specify an extra YAML file using the special LOAD variable. This file will then be loaded as well and subsequently treated as if the data in it had been specified directly at the start of the file containing the LOAD directive.

Which variables are defined and used by templates is very much up the user, although a few of them have a predefined meaning to wmk itself. For making it easier to switch between different themes it is however suggested to stick to the following meaning of some of the variables:

The variables site and page are dicts with a thin convenience layer on top which makes it possible to reference subkeys belonging to them in templates using dot notation rather than subscripts. For instance, if page has a dict variable named foo, then a template could contain a fragment such as ${ page.foo.bar or 'splat' } -- even if the foo dict does not contain a key named bar. Without this syntactic sugar you would have to write something much more defensive and long-winded such as ${ page.foo.bar if page.foo and 'bar' in page.foo else 'splat' }.

The nav variable

The nav variable is an easy way of configuring a navigation tree for websites with content that has a hierarchical structure, such as a typical documentation site. It is set via the nav key in the wmk_config.yaml file and is represented in templates as a tree-like Nav object.

A Nav instance is a list-like object with two types of entries: links and sections. A link is just a title and a URL. A section has a title and a list of links or sections (possibly nested).

Each item has a parent (with the nav itself as the top level parent) and a level (starting from 0 for the immediate children of the nav). The nav has a homepage attribute which by default is the first local link in the nav. Each local link has previous and next attributes. Each section has children. There are other attributes but these are the basics.

Manually configured

A typical explicit nav setting looks something like this:

nav:
    - Home: /
    - User Guide:
        - Lorem:
            - Ipsum: /guide/ipsum/
            - Eu fuit: /guide/mageisse/
        - Dolor sit amet: /guide/concupescit/
    - Resources:
        - Community: 'https://example.com/'
        - Source code: 'https://github.com/example/com/'
    - About:
        - License: /about/license/
        - History: /about/history/

A manually configured nav setting of this kind is necessary if you want to have more than one level of nesting or if you need to link to something outside of the site from the nav. Both of these characteristics are present in the above example. However, if you need neither of them, an automatically generated nav may be more convenient.

Automatically generated

A simple nav object with a single level of nesting can also be generated by wmk from the frontmatter of the content files. In order for this to happen two conditions must be met:

  1. The value of nav in wmk_config.yaml is set to auto.

  2. Each item in the config that is to appear in the navigation tree must have at least the key nav_section in the frontmatter. To determine ordering, nav_order or (equivalently) weight may also be specified; and if necessary the page title may be overridden in the nav by setting the nav_title attribute.

The nav_section value Root is special. Pages assigned to that section are placed directly at the front of the nav structure as link items.

Other sections are simply grouped by their nav_section values. Please note that these values are case-sensitive.

Within each section the link items are ordered by the their nav_order/weight value, which should be an integer. If two or more items have the same ordering number, they are ordered by nav_title/title.

The sections themselves are ordered within the nav by the lowest nav_order/weight value of the pages assigned to them. Sections with the same ordering number are sorted alphabetically.

The TOC variable

When a page is rendered, the generated HTML is examined and a simple table of contents object constructed, which will be available to templates as TOC. It contains a list of the top-level headings (i.e. H1 headings, or H2 headings if no H1 headings are present, etc.), with lower-level headings hierarchically arranged in its children. Other attributes are url and title. TOC.item_count contains the heading count (regardless of nesting).

The TOC variable can e.g. be used by the page template to show a table of contents elsewhere on the page.

The table of contents object is not constructed unless each heading has an id attribute. When using the default python-markdown, this means that the toc extension must be active.

System variables

The following frontmatter variables affect the operation of wmk itself, rather than being exclusively used by templates.

Templates

Note that a variable called something like page.foo below is referenced as such in templates but specified in YAML frontmatter simply as foo: somevalue.

For both template and layout, the .mhtml (or .html in the case of Jinja2) extension of the template may be omitted. If the template value appears to have no extension, .mhtml or .html (depending on the template engine) is assumed; but if the intended template file has a different extension, then it must of course be specified.

Likewise, a leading base/ directory may be omitted when specifying template or layout. For instance, a layout value of post would find the template file base/post.mhtml unless a post.mhtml file exists in the template root somewhere in the template search path.

If neither template nor layout has been specified and no default_template setting is found in wmk_config.yaml, the default template name for markdown files is md_base.mhtml (or md_base.html if Jinja2 templates have been selected).

The special template/layout value __empty__ (case-insensitive) indicates that no base template should be applied to the given content file.

Taxonomy handling

A taxonomy is a classification of pieces of content for the purpose of grouping them together. Common taxonomy types are tags, categories, sections and article authors. However, the taxonomy that is appropriate to a particular website mainly depends on the content. On a site with book reviews you would have genres, book authors and publishers, on a movie site you would have genres and actors, and so on. Each set of frontmatter classifiers (e.g. the single classifier tag or the list ['tag', 'tags']) used in a taxonomy may be called a term. Each term may have several values, and each value represents a list of content items associated with it.

Up to version 1.13 of wmk, taxonomies had to be handled by templates, and this is still the best way to do it if you want a form of presentation which is tailored to a particular term. However, as a consequence, themes had to be designed around specific taxonomies, typically tags, categories, or sections. In other words, the presentation of taxonomies was not primarily content-driven.

From version 1.13 it is therefore possible to specify the taxonomy criteria directly in the front matter of the main content page for the corresponding term. Here is an example based on a movie site, for the term director. The content file might be named directors/index.md:

---
title: Directors
date: 2024-11-01
template: base/taxonomy/list.mhtml
TAXONOMY:
  taxon: ['director', 'directors']
  order: name
  detail_template: base/taxonomy/detail.mhtml
  list_settings:
    pagination: true
    per_page: 24
  detail_settings:
    biographies: directors.yaml
    item_template: lib/movie_teaser.mc
---

Below is a list of the directors of the movies
that have been covered on this website.

Click on the name of a director to see a short biography
and an overview of their movies.

The frontmatter variable page.TAXONOMY triggers the special processing of the page, provided that it contains at least the subkeys taxon and detail_template. This special processing consists in the following:

  1. wmk fetches a list of values for the term specified in taxon using the taxonomy_info() method of MDCONTENT. This will be added to the template context as TAXONS.

  2. For each value in the list, wmk renders the template detail_template with the same context, except that the two keys TAXON (the value) and TAXON_INDEX (the 0-based index of the value in the list) are added. Each TAXON has items which represent the pages tagged with that director, and the main job of thet detail page is to show a list of them to the user. The result is written to a destination file the name of which is based on the destination of the rendered Markdown content plus the slug of the string identifying the value (e.g. directors/orson-welles/index.html in this example). The target url will be available as TAXON['url'] (and thus also under the key 'url' for each item in TAXONS).

  3. wmk resumes normal operation by calling the main template with the modified template context as well as the content from the markdown file, and writes the result to the target file.

Please note that the settings in list_settings and detail_settings in the example above are merely for the purposes of illustration. Whether any of them are actually supported is entirely up to the template or theme author. The only subvariables used by wmk itself are taxon, order (if present), and detail_template.

Variables affecting rendering

Note that if two files in the same directory have the same slug, they may both be rendered to the same output file; it is unpredictable which of them will go last (and thus "win the race"). The same kind of conflict may arise between a slug and a filename or even between two filenames containing non-ascii characters. It is up to the content author to take care to avoid this; wmk does nothing to prevent it.

Standard variables and their recommended meaning

The following variables are not used directly by wmk but affect templates in different ways. It is a list of recommendations rather than something which must be necessarily followed.

Typical site variables

Site variables are the keys-value pairs under site: in wmk_config.yaml.

Templates or themes may be configurable through various site variables, e.g. site.paginate for number of items per page in listings or site.mainfont for configuring the font family.

Classic meta tags

These variables mostly relate to the text content and affect the metadata section of the <head> of the HTML page.

Note that this is by no means an exhaustive list of variables likely to affect the <head> part of the generated HTML. For instance, several other variables may affect meta tags used for sharing on social media. One of the more common ones is probably page.image (described below). In any case, the list of supported frontmatter attributes and how they are interpreted is for the most part up to the theme or template author.

Dates

Dates and datetimes should normally be in a format conformant with or similar to ISO 8601, e.g. 2021-09-19 and 2021-09-19T09:19:21+00:00. The T may be replaced with a space and the time zone may be omitted (localtime is assumed). If the datetime string contains hours it should also contain minutes, but seconds may be omitted. If these rules are followed, the following variables are converted to date or datetime objects (depending on the length of the string) before they are passed on to templates.

See also the description of the DATE and MTIME context variables above.

Media content

Taxonomy

See also the description of page.TAXONOMY above. The following are terms commonly used for taxonomy purposes:

<!-- template_filters "Template filters" 120 -->

Template filters

In addition to the built-in template filters provided by Mako or Jinja2 respectively, the following filters are by default made available in templates:

If you wish to provide additional filters in Mako without having to explicitly define or import them in templates, the best way of doing this his to add them via the mako_imports setting in wmk_config.yaml (see above). There is currently no easy way to do this if Jinja2 templates are being used, however.

Please note that in order to avoid conflicts with the above filters you should not place a file named wmk_mako_filters.py or wmk_jinja2_extras.py in your py/ directories.

<!-- mdcontentlist "Working with lists of pages" 130 -->

Working with lists of pages

Templates which render a list of content files (e.g. a list of blog posts or pages belonging to a category) will need to filter or sort MDCONTENT accordingly. In order to make this easier, MDCONTENT is wrapped in a list-like object called MDContentList, which has the following methods:

General searching/filtering

Each of the following methods returns a new MDContentList containing those entries for which the predicate (pred) is True.

Specialized searching/filtering

All of these return a new MDContentList object (at least by default).

A match_expr for page_match() is either a dict or a list of dicts. If it is a dict, each page in the result set must match each of the attributes specified in it. If it is a list of dicts, each page in the result set must match at least one of the dicts (i.e., the returned result set contains the union of all matches from all dicts in the list). When a string or regular expression match is being performed in this process, it will be case-insensitive. The supported attributes (i.e. dict keys) are as follows:

Searching/filtering using SQL

An MDContentList has three methods for examining the content using an SQLite in-memory database:

The content table constructed by get_db() always contains the columns source_file, source_file_short, url target, template, MTIME, DATE, doc, and rendered. In addition, it contains each page metadata field that appears in any of the entries in the MDContentList in question. These will be added as columns with the page_ prefix; for instance, the title field will become page_title.

It should be noted that all page fields added to the table will have to match the regular expression ^[a-z]\w*$. Thus, any metadata field with a key that is all uppercase, titlecased, or contains non-word characters (such as hyphens) will be omitted. Also, field names are case-sensitive in the raw metadata, but case-insensitive in the database table, so inconsistently capitalized field names may lead to unexpected results.

A field value that is not either string, integer, float, boolean, date, datetime, or None, will be serialized using json.dumps() with ensure_ascii set to False (for easier utf-8 matching). Dates and datetimes are stringified. Booleans will be represented as 1 or 0.

Sorting

All of these return a new MDContentList object with the entries in the specified order.

Pagination

Typical usage of paginate():

<%
  posts = MDCONTENT.posts()
  chunks, page_urls = posts.paginate(5, context)
  curpage = context.get('_page', 1)
%>

% for post in chunks[curpage-1]:
  ${ show_post(post) }
% endfor

% if len(chunks) > 1:
  ${ prevnext(len(chunks), curpage, page_urls) }
% endif

Render to an arbitrary file

Typical usage of write_to():

<%
  if not CHUNK:
     for tag in tags:
         tagged = MDCONTENT.has_tag([tag])
         if not tagged:
             continue  # avoid potential infinite loop!
         outpath = '/tags/' + slugify(tag) + '/index.html'
         tagged.write_to(outpath, context, {'TAG': tag})
%>

% if CHUNK:
  ${ list_tagged_pages(TAG, CHUNK) }
% else:
  ${ list_tags() }
% endif
<!-- site_search "Site search" 140 -->

Site search

Using Lunr

Lunr is the only search solution "natively" supported by wmk. That being said, implementing site search is not a simple matter of turning lunr indexing on. It takes a bit of work by the author of the site or theme templates, so depending on your needs it may even be easier to base your search functionality on another solution.

With lunr_index (and optionally lunr_index_fields) in wmk_config.yaml, wmk will build a search index for Lunr.js and place it in idx.json in the webroot. In order to minimize its size, no metadata about each record is saved to the index. Instead, a simple list of pages (with title and summary) is placed in idx.summaries.json. The summary is taken either from one of the frontmatter fields summary, intro or description (in order of preference) or, failing that, from the start of the page body.

If lunr_languages is present in wmk_config.yaml, stemming rules for those languages will be applied when building the index. The value may be a two-letter lowercase country code (ISO-639-1) or a list of such codes. The currently accepted languages are de, da, en, fi, fr, hu, it, nl, no, pt, ro, and ru (this is the intersection of the languages supported by lunr.js and NLTK, respectively). The default language is en. Attempting to specify a non-supported language will raise an exception.

The index is built via the lunr.py module and the stemming support is provided by the Python Natural Language Toolkit.

For information about the supported syntax of the search expression, see the Lunr documentation.

Limitations of Lunr

Overview of alternative solutions

If you are looking for an alternative to lunr, the first thing to consider is whether a server-based solution is needed or whether a Javascript-based client-side solution would be enough.

If the site has a lot of text (more than 200,000 words or so) or if it needs to work even without Javascript, then a server-based solution is required. You then need to decide whether you want to self-host it or if you are ready to pay for a third-party hosted solution. Meilisearch is open source and allows for self-hosting (although a hosted solution called Meilisearch Cloud is also available), while the market leader in hosted site search is probably Algolia.

If, however, a client-side Javascript solution is sufficient, there are several alternatives to lunr that could come into consideration, e.g. Pagefind, Tinysearch, Elasticlunr or Stork.

Whichever solution is picked, you of course need to add the required HTML, CSS and Javascript to the templates for the search functionality to work. You also need to take care of updating the search index whenever the site is built.

Assuming you have opted not to use the built-in lunr support, the index creation/updating step can basically be implemented in two ways:

  1. By running after the build step has finished via a cleanup_commands entry in wmk_config.yaml. This calls a script or another external program which can update the index based on either the HTML in the output folder or the JSON file specified using the mdcontent_json configuration option.

  2. By implementing a hook function in wmk_hooks.py (or wmk_theme_hooks.py), most likely for post_build_actions() or index_content(); see the "Overriding and extending wmk via hooks" section below.

Example: Pagefind

Taking Pagefind as an example of the steps described above, you would, per their documentation, add something similar to this to your templates in an appropriate location:

<link href="/pagefind/pagefind-ui.css" rel="stylesheet">
<script src="/pagefind/pagefind-ui.js"></script>
<div id="search"></div>
<script>
    window.addEventListener('DOMContentLoaded', (event) => {
        new PagefindUI({ element: "#search", showSubResults: true });
    });
</script>

It would also be a good idea to make sure you modify all base templates so as to identify the main part of each page with the data-pagefind-body attribute and thus omit repeated elements such as navigation and footer from the index.

Finally, in order to actually create or update the search index whenever the site is built, you would need to add the following to the wmk_config.yaml file:

cleanup_commands:
  - "npx -y pagefind --site htdocs"

This obviously assumes that you have npm installed on your system.

<!-- hooks "Overriding and extending wmk via hooks" 150 -->

Overriding and extending wmk via hooks

Much of the functionality of wmk can be changed by overriding or extending specific steps it performs. This is done by adding Python code to a file named wmk_hooks.py in the project py/ directory. Themes can do the same thing via the wmk_theme_hooks.py file in the theme's py/ directory. If both try to affect the same functionality, the project directory takes precedence.

Currently, the following defs from wmk.py can be extended by running hooks before or after them, or can be redefined entirely:

In order to override any of these entirely, define a function of the same name in the hooks file. One may also define a function that runs before or after:

You should examine wmk's source code to make sure that any replacement function you may write is compatible with the original in terms of its parameters and possible return values. Updates to wmk may of course make it necessary to change your hook functions.

Examples

Here is a generic get_extra_content() def which adds HTML pages fetched from a database to the "normal" content from the content/ directory:

def get_extra_content(content, ctdir, datadir, outputdir, template_vars, conf):
    known_ids = set([_['data']['page']['id'] for _ in content])
    content_extensions = { '.html': {'raw': True}, }
    extpat = re.compile(r'\.html$')
    result = _get_articles_from_database()
    for i, row in enumerate(result):
        meta, doc, pseudo = _munge_row(row, i, result, ctdir)
        wmk.process_content_item(
            meta, doc, content, conf, template_vars,
            ctdir, outputdir, datadir, content_extensions, known_ids,
            pseudo['root'], pseudo['fn'],
            pseudo['source_file'], pseudo['source_file_short'],
            extpat, False)

The functions _get_articles_from_database() and _munge_row() are left as an exercise for the reader.

Here is an __after hook for maybe_extra_meta() which fetches a conference schedule (e.g. from from an online calendar) if the conference_id key is present in the frontmatter. The retrieved information will then be available to the templates for that page as page.schedule.

def maybe_extra_meta__after(meta):
    if 'conference_id' in meta:
        meta['schedule'] = _get_conference_schedule(meta['conference_id'])
    return meta

A third example: Let's say you want to show information from a few RSS sources in a sidebar that will appear on several pages. In order to avoid refetching it for each page you can use something like this:

def get_template_vars__after(template_vars):
    if 'rss_sources' in template_vars:
        template_vars['rss_info'] = fetch_rss_feeds(template_vars['rss_sources'])
    return template_vars

This assumes that you set rss_sources in the template_context section of your wmk_config.yaml file.