wmk
wmk is a flexible and versatile static site generator written in Python.
<!-- features "Main features" 10 -->Main features
The following features are present in several static site generators (SSGs); you might almost call them standard:
- Markdown or HTML content with YAML metadata in the frontmatter.
- Support for themes.
- Sass/SCSS support (via libsass).
- Can generate a search index for use by lunr.js.
- Shortcodes for more expressive and extensible content.
The following features are among the ones that set wmk apart:
- By default, the content is rendered using Mako, a template system which makes all the resources of Python easily available to you. However, Jinja2 templates are also supported if that is what you prefer.
- "Stand-alone" templates – i.e. templates that are not used for presenting markdown-based content – are also rendered if present. This can e.g. be used for list pages or content based on external sources (such as a database).
- Additional data for the site may be loaded from separate YAML files or even (with a small amount of Python/Mako code) from other data sources such as CSV files, SQL databases or REST/graphql APIs.
- The shortcode system is quite powerful and flexible. For instance, among the default shortcodes are an image thumbnailer and a page list component. A shortcode is just a template, so you can easily build your own.
- Optional support for the powerful Pandoc document converter, for the entire site or on a page-by-page basis. This gives you access to such features as LaTeX math markup and academic citations, as well as to Pandoc's well-designed filter system for extending markdown. Pandoc also enables you to export your content to other formats (such as PDF) in addition to HTML, if you so wish.
- Also via Pandoc, support for several non-markdown input formats for content, namely LaTeX, Org, RST, Textile, Djot, Typst, man, JATS, TEI, DocBook, RTF, DOCX, ODT and EPUB.
The only major feature that wmk is missing compared to some other SSGs is tight integration with a JavaScript assets pipeline and interaction layer. Although wmk allows you to configure virtually any assets processing you like, if your site relies on React, Vue or similar, other options are probably more convenient.
That exception aside, wmk is suitable for building any small or medium-sized static website (up to a couple of thousand pages, depending on the content).
<!-- installation "Installation" 20 -->Installation
Method 1: git + pip
Clone this repo into your chosen location ($myrepo) and install the necessary Python modules into a virtual environment:
cd $myrepo
python3 -m venv venv
. venv/bin/activate
pip install -r requirements.txt
After that, either put $myrepo/bin into your $PATH or create a symlink from somewhere in your $PATH to $myrepo/bin/wmk.
Required software (aside from Python, of course):
- rsync (for static file copying).
- For wmk watch functionality (as well as watch-serve), you need either inotifywait or fswatch to be installed and in your $PATH. If both are available, the former is preferred.
wmk requires a Unix-like environment. In particular, bash must be installed in /bin/bash, and the directory separator is assumed to be /.
Method 2: Homebrew
If you are on macOS and already have Homebrew, this is the easiest installation method.
First add the tap to your repositories:
brew tap bk/wmk
Then install wmk from it:
brew install --build-from-source wmk
Method 3: Docker
If you are neither on a modern Linux system nor on macOS with Homebrew, it may be a better option for you to run wmk via Docker. In that case, after cloning the repo (or simply copying the Dockerfile from it) you can give the command
docker build -t wmk .
in the directory containing the Dockerfile, in order to build an image called wmk. You can then run the various wmk subcommands via Docker, for instance
docker run --rm --volume $(pwd):/data --user $(id -u):$(id -g) wmk b .
to build the wmk project in the current directory, or
docker run --rm -i -t --volume $(pwd):/data --user $(id -u):$(id -g) -p 7007:7007 wmk ws . -i 0.0.0.0
to watch for changes in the current directory and run a webserver for the built files.
Obviously, such commands can be unwieldy, so if you run them regularly you may want to create aliases or wrappers for them.
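As a starting point for such a wrapper, here is a small Python sketch that assembles the same `docker run` argument lists as the two commands above. The function name `wmk_docker_argv` is hypothetical and not part of wmk. The choice to add `-i -t`, the port mapping and wmk's own `-i 0.0.0.0 -p <port>` options only in serve mode is an assumption made for illustration.

```python
import os


def wmk_docker_argv(action, basedir=".", serve=False, port=7007):
    """Return the `docker run` argv for a wmk subcommand in the `wmk` image.

    Hypothetical helper, not part of wmk; it mirrors the commands shown above.
    """
    argv = ["docker", "run", "--rm"]
    if serve:
        # Serving runs in the foreground, so attach an interactive TTY.
        argv += ["-i", "-t"]
    # Equivalent of --volume $(pwd):/data --user $(id -u):$(id -g).
    argv += ["--volume", f"{os.getcwd()}:/data",
             "--user", f"{os.getuid()}:{os.getgid()}"]
    if serve:
        # Publish the port so the server is reachable from the host.
        argv += ["-p", f"{port}:{port}"]
    argv += ["wmk", action, basedir]
    if serve:
        # wmk's own options: listen on all interfaces inside the container.
        argv += ["-i", "0.0.0.0", "-p", str(port)]
    return argv
```

A wrapper script would then call, for example, `subprocess.run(wmk_docker_argv("ws", serve=True), check=True)`. Building a list rather than a shell string avoids quoting problems with paths that contain spaces.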
<!-- usage "Usage: The wmk command" 30 -->Usage
The wmk command structure is wmk <action> <base_directory>. The base directory is of course the directory containing the source files for the site. (They are actually in subdirectories such as templates, content, etc.; see the "File organization" section below.)
- wmk info $basedir: Shows the real path to the location of wmk.py and of the content base directory. Example: wmk info . (Synonyms for info are env and debug.)
- wmk init $basedir: In a folder which contains content/ (with markdown or HTML files) but no wmk_config.yaml, creates some initial templates as well as a sample wmk_config.yaml, thus making it quicker for you to start a new project.
- wmk build $basedir [-q|--quick]: Compiles/copies files into $basedir/htdocs. If -q or --quick is specified as the third argument, only files considered to have changed, based on timestamp checking, are processed. Synonyms for build are run, b and r.
- wmk watch $basedir: Watches for changes in the source directories inside $basedir and recompiles if changes are detected. (Note that build is not performed automatically before setting up file watching, so you may want to run that first.) A synonym for watch is w.
- wmk serve $basedir [-p|--port <portnum>] [-i|--ip <ip-addr>]: Serves the files in $basedir/htdocs on http://127.0.0.1:7007/ by default. The port and IP can be changed with the -p and -i switches, or configured via wmk_config.yaml (see the "Configuration file" section). Synonyms for serve are srv and s.
- wmk watch-serve $basedir [-p|--port <portnum>] [-i|--ip <ip-addr>]: Combines watch and serve in one command. Synonym: ws.
- wmk clear-cache $basedir: Removes the HTML rendering cache, which is a SQLite file in $basedir/tmp/. This should only be necessary in case of changed shortcodes or shortcode dependencies. Note that the cache can be disabled in wmk_config.yaml by setting use_cache to false, or on a file-by-file basis via a frontmatter setting (no_cache). A synonym for clear-cache is c.
- wmk preview $basedir $filename: $filename is the name of a file relative to the content subdirectory of $basedir. This prints (to stdout) the HTML which the given file will be converted to (before it is passed to the template and before potential post-processing). Example: wmk preview . index.md
- wmk admin $basedir: Builds the site and then starts wmkAdmin, which must have been installed beforehand into the admin subdirectory of $basedir (or into the subdirectory specified with wmk admin $basedir $subdir). The subdirectory may be a symbolic link pointing to a central instance. wmkAdmin allows you to manage the content of the site via a web interface. It is not designed to allow you to install or modify themes or to perform tasks that require more technical knowledge, and it works best for a standard site based on Markdown or HTML files in the content directory.
- wmk repl $basedir: Launches a Python shell (ipython, bpython or python3, in order of preference) with the wmk environment loaded and with $basedir as the current working directory. Useful for examining wmk's view of the site content or debugging MDContent filtering methods. For these purposes, from wmk import get_content_info, followed by content = get_content_info('.'), is often a good start.
- wmk pip <pip-command>: Runs pip in the virtual environment used by wmk. Mainly useful for installing or upgrading Python modules that you want to use in Python files belonging to your projects.
- wmk homedir: Outputs the path to wmk's installation directory. May be useful in shell scripts.
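For quick reference, the actions and their documented synonyms can be summarized as a lookup table. The Python sketch below only restates the list above. The names `WMK_ACTIONS` and `canonical_action` are hypothetical, and this is not how wmk itself parses its arguments.

```python
# Canonical wmk actions mapped to the synonyms listed above
# (illustrative table; hypothetical, not taken from wmk's source).
WMK_ACTIONS = {
    "info": ("env", "debug"),
    "init": (),
    "build": ("run", "b", "r"),
    "watch": ("w",),
    "serve": ("srv", "s"),
    "watch-serve": ("ws",),
    "clear-cache": ("c",),
    "preview": (),
    "admin": (),
    "repl": (),
    "pip": (),
    "homedir": (),
}


def canonical_action(name):
    """Resolve an action name or one of its synonyms to the canonical name."""
    for action, synonyms in WMK_ACTIONS.items():
        if name == action or name in synonyms:
            return action
    raise ValueError(f"unknown wmk action: {name!r}")
```

Such a table can be handy in wrapper scripts that need to treat, say, `b`, `r`, `run` and `build` the same way.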
File organization
Inside a given working directory, wmk assumes the following subdirectories for content and output. They will be created if they do not exist:
- htdocs: The output directory. Rendered, processed or copied content is placed here, and wmk serve will serve files from this directory.
- templates: Mako templates (or Jinja2 templates if jinja2_templates is set to true in wmk_config.yaml). Templates with the extension .mhtml (.html if Jinja2 templates are being used) are rendered directly into htdocs as .html files (or with another extension if the filename ends with .$ext.mhtml/.$ext.html, where $ext is a string consisting of 2-4 alphanumeric characters), unless their filename starts with a dot or underscore or contains the string base, or they are inside a subdirectory named base. For details on context variables received by such stand-alone templates, see the "Context variables" section below.
- content: Typically markdown (*.md) and/or HTML (*.html) content with YAML metadata, although other formats are also supported. For a full list, see the "Input formats" section below.
  - Markdown (or other supported content) will be converted into HTML and then "wrapped" in a layout using the template specified in the metadata, or md_base.mhtml by default.
  - HTML files inside content are assumed to be fragments rather than complete documents. Accordingly, they will be wrapped in a layout just like the converted markdown. In general, such content is treated just like markdown files except that the markdown-to-HTML conversion step is skipped. For instance, shortcodes can be used normally, although they may not work as expected if they return markdown rather than HTML. (Complete HTML documents are best placed in static rather than content.)
  - The YAML metadata may be (a) at the top of the md/html document itself, inside a frontmatter block delimited by ---; (b) in a separate file with the same filename as the content file, but with an extra .yaml extension added; or (c) in index.yaml files, which are inherited by subdirectories and the files contained in them. For details, see the "Site, page and nav variables" section below.
  - The target filename will be index.html in a directory corresponding to the basename of the source file, unless pretty_path in the metadata is false or the name of the file itself is index.md or index.html. In that case the relative path remains the same, except that the extension is of course changed to .html if the source is a markdown file.
  - The processed content will be passed to the template as a string in the context variable CONTENT, along with other metadata.
  - A YAML datasource can be specified in the metadata block as LOAD; the data in this file will be added to the context. For further details on the context variables, see the "Context variables" section.
  - Files that have other extensions than .md, .html or .yaml will be copied directly over to the appropriate subdirectory of the htdocs directory. This enables "bundling", i.e. keeping images and "attachments" together with related markdown files.
- data: YAML files for additional metadata. May be referenced in frontmatter data or used by templates. Other data files (CSV, SQLite, etc.) should typically also be placed here.
- py: Directory for Python files. This directory is automatically added to the front of sys.path before Mako or Jinja2 is initialized, meaning that templates can import modules placed here. Implicit imports (for Mako only) are possible by setting mako_imports in the config file (see the "Configuration file" section). There are also two special files that may be placed here: wmk_autoload.py in your project, and wmk_theme_autoload.py in the theme's py/ directory. If one or both of these are present, wmk imports a dict named autoload from them. This means that you can assign PREPROCESS and POSTPROCESS page actions by name (i.e. keys in the autoload dict) rather than as function references, which in turn makes it possible to specify them in the frontmatter directly rather than having to do it via a shortcode. (For more on PREPROCESS and POSTPROCESS, see the "Site, page and nav variables" section.)
- assets: Assets for an asset pipeline. The only default handling of assets involves compiling SCSS/Sass files in the subdirectory scss. They will be compiled to CSS, which is placed in the target directory htdocs/css. Other assets handling can be configured via settings in the configuration file, e.g. assets_commands and assets_fingerprinting, which are described in more detail in the "Configuration file" section. Also take note of the fingerprint template filter, described in the "Template filters" section.
- static: Static files. Everything in here will be rsynced directly over to htdocs.
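The naming rules for stand-alone templates and for content output paths can be hard to keep in mind. The Python sketch below restates them as two small functions. It assumes POSIX-style paths relative to templates/ or content/. The function names are hypothetical and not part of wmk. Slugification of directory names (see slugify_dirs in the "Configuration file" section) is deliberately left out.

```python
import re
from pathlib import PurePosixPath


def standalone_target(rel_path, jinja2=False):
    """Output path (relative to htdocs) for a template under templates/,
    or None if it would not be rendered stand-alone."""
    path = PurePosixPath(rel_path)
    suffix = ".html" if jinja2 else ".mhtml"
    name = path.name
    if not name.endswith(suffix):
        return None
    # Skipped: names starting with '.' or '_', names containing 'base'
    # (which also catches e.g. 'database.mhtml'), and anything under base/.
    if name.startswith((".", "_")) or "base" in name or "base" in path.parts[:-1]:
        return None
    stem = name[: -len(suffix)]
    # 'feed.xml.mhtml' keeps its inner 2-4 character extension -> 'feed.xml';
    # everything else gets '.html'.
    if re.fullmatch(r".+\.[A-Za-z0-9]{2,4}", stem):
        return str(path.with_name(stem))
    return str(path.with_name(stem + ".html"))


def content_target(rel_path, pretty_path=True):
    """Output path (relative to htdocs) for a file under content/."""
    path = PurePosixPath(rel_path)
    if not pretty_path or path.name in ("index.md", "index.html"):
        return str(path.with_suffix(".html"))
    # Pretty path: blog/post.md -> blog/post/index.html
    return str(path.parent / path.stem / "index.html")
```

With these rules, `blog/post.md` ends up at `blog/post/index.html`, which the web server can then serve under the URL `/blog/post/`.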
Input formats
The format of the files in the content/ directory is determined on the basis of their file extension. The following extensions are recognized by default:
- .md, .mdwn, .mdown, .markdown, .gfm, .mmd: Markdown files. If Pandoc is being used, the input formats for .gfm and .mmd will be assumed to be gfm (GitHub-flavored markdown) and markdown_mmd (MultiMarkdown), respectively. (Note, however, that non-YAML metadata given in MultiMarkdown format is currently not picked up automatically in .mmd files.)
- .htm, .html: HTML. These are typically not standalone HTML documents but will be "wrapped" by the configured layout template. Like other input files, they may have a YAML frontmatter block.
- .tex: LaTeX format. ConTeXt is currently not supported.
- .org: Org-mode format.
- .rst: reStructuredText format (RST).
- .textile: Textile markup format.
- .dj: The Djot lightweight markup format.
- .man: Roff man format.
- .rtf: Rich Text Format (RTF).
- .typ: Typst format.
- .jats, .xml: The XML-based JATS (Journal Article Tag Suite) format.
- .docbook: The XML-based DocBook format.
- .tei: The Simple variant of the XML-based TEI (Text Encoding Initiative) format.
- .docx: MS Word DOCX, a.k.a. "Office Open XML" format.
- .odt: OpenDocument Text format.
- .epub: The EPUB e-book format.
Pandoc is turned on automatically for all non-markdown, non-HTML formats in the above list. In order to use such content, a sufficiently recent version of Pandoc must therefore be installed.
The list of input formats and how they are handled is configurable through the content_extensions setting in the config file. See the "Configuration file" section below for details.
Note: The three formats JATS, DocBook and TEI are all XML-based. Files in all three formats would therefore often use the generic .xml extension. However, wmk currently assumes that .xml implies that the JATS format is intended. If you want to force wmk to handle a file with that extension as DocBook or TEI, you have to add an external YAML metadata file with pandoc_input_format set to the appropriate value.
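To make the extension handling above concrete, the following Python sketch maps each default extension to the corresponding Pandoc reader name. An explicit pandoc_input_format in the metadata takes precedence, which is how the .xml case above is resolved. The table and function names are hypothetical and for illustration only. The authoritative defaults are DEFAULT_CONTENT_EXTENSIONS in wmk.py, which also record things such as whether a format is binary.

```python
from pathlib import PurePosixPath

# Extension -> Pandoc reader name (None: HTML, which is not converted).
# Illustrative only; not wmk's DEFAULT_CONTENT_EXTENSIONS.
DEFAULT_FORMATS = {
    ".md": "markdown", ".mdwn": "markdown", ".mdown": "markdown",
    ".markdown": "markdown", ".gfm": "gfm", ".mmd": "markdown_mmd",
    ".htm": None, ".html": None,
    ".tex": "latex", ".org": "org", ".rst": "rst", ".textile": "textile",
    ".dj": "djot", ".man": "man", ".rtf": "rtf", ".typ": "typst",
    ".jats": "jats", ".xml": "jats", ".docbook": "docbook", ".tei": "tei",
    ".docx": "docx", ".odt": "odt", ".epub": "epub",
}
MARKDOWN_EXTS = {".md", ".mdwn", ".mdown", ".markdown", ".gfm", ".mmd"}


def pandoc_input_format(filename, metadata=None):
    """Pandoc reader for a content file; explicit metadata wins."""
    metadata = metadata or {}
    if "pandoc_input_format" in metadata:
        return metadata["pandoc_input_format"]
    ext = PurePosixPath(filename).suffix
    if ext not in DEFAULT_FORMATS:
        raise ValueError(f"not a recognized content file: {filename}")
    return DEFAULT_FORMATS[ext]


def pandoc_forced(filename):
    """Pandoc is switched on automatically for non-markdown, non-HTML files."""
    ext = PurePosixPath(filename).suffix
    return ext not in MARKDOWN_EXTS and DEFAULT_FORMATS.get(ext) is not None
```

The metadata dict passed in here stands for whatever has been read from the external `paper.xml.yaml` file (or inherited), so a DocBook file with a generic `.xml` name only needs `pandoc_input_format: docbook` there.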
In-file YAML frontmatter is supported for all of the above except for the three binary formats DOCX, ODT and EPUB. Of course, metadata from an associated external YAML file or inherited metadata applies in all cases. In addition, the "native" metadata seen by Pandoc for most of the formats (more precisely all non-markdown, non-HTML formats other than Textile, which uses YAML frontmatter natively) will be used as a fallback source of in-file metadata, although this is limited to specific standard keys such as title, author and date.
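For text-based formats, separating the frontmatter from the document body amounts to splitting off a block delimited by --- lines. The Python sketch below shows one way to do this under simple assumptions: the opening --- must be the very first line, and the YAML is returned unparsed. The function names are hypothetical, and wmk's own parser may accept variants that this sketch does not.

```python
from pathlib import PurePosixPath

# Binary formats cannot carry an in-file frontmatter block.
BINARY_FORMATS = {".docx", ".odt", ".epub"}


def may_have_infile_frontmatter(filename):
    """False for DOCX/ODT/EPUB, whose metadata must come from YAML files."""
    return PurePosixPath(filename).suffix not in BINARY_FORMATS


def split_frontmatter(text):
    """Split a '---'-delimited YAML block off the top of a text document.

    Returns (yaml_text, body). yaml_text is '' when there is no complete
    frontmatter block; parsing the YAML itself is left to a YAML library.
    """
    lines = text.splitlines(keepends=True)
    if not lines or lines[0].rstrip() != "---":
        return "", text
    for i in range(1, len(lines)):
        if lines[i].rstrip() == "---":
            return "".join(lines[1:i]), "".join(lines[i + 1:])
    return "", text  # no closing delimiter: treat everything as body
```

Note that an unclosed block is treated as plain body text here. Returning an error instead would be an equally reasonable design choice.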
Note that although other input formats are supported, the canonical format is still markdown. Unless there is a special reason to do otherwise, it is the most sensible and efficient choice for websites generated using wmk.
A few gotchas
When creating a website with wmk, you might want to keep the following things in mind lest they surprise you:
- The order of operations is as follows: (1) copy files from static/; (2) run the asset pipeline; (3) render stand-alone templates from templates; (4) render markdown content from content. As a consequence, later steps may overwrite files placed by earlier steps. This is intentional but definitely something to keep in mind.
- For the run and watch actions, when -q or --quick is specified as a modifier, wmk.py uses timestamps to prevent unnecessary re-rendering of templates, markdown files and SCSS sources. The check is rather primitive and does not take account of such things as shortcodes or changed dependencies in the template chain. As a rule, --quick is therefore not recommended unless you are working on a small, self-contained set of content files.
- If templates or shortcodes have been changed, it may sometimes be necessary to clear out the page rendering cache with wmk c. During development you may want to add use_cache: no to the wmk_config.yaml file. Also, some pages should never be cached, in which case it is a good idea to add no_cache: true to their frontmatter.
- If files are removed from source directories, the corresponding files in htdocs/ will not disappear automatically. You have to clear them out manually, or simply remove the entire directory and regenerate.
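The kind of timestamp comparison that --quick relies on can be illustrated with a minimal Python sketch, which also shows why it misses template and shortcode changes. The function names are hypothetical and this is not wmk's actual code. It only compares a source file with its own output file.

```python
import os


def needs_rebuild(source_mtime, target_mtime):
    """True if the target is missing (None) or older than its source."""
    return target_mtime is None or source_mtime > target_mtime


def is_stale(source, target):
    """Timestamp-only staleness check for one source/target file pair.

    Nothing else is consulted: an edited layout template or shortcode does
    not change either timestamp, so such changes go unnoticed.
    """
    if not os.path.exists(target):
        return True
    return needs_rebuild(os.path.getmtime(source), os.path.getmtime(target))
```

Because only the two files involved are examined, a full build (or clearing the cache) remains the safe option after changing anything that many pages depend on.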
Context variables
The Mako/Jinja2 templates, whether they are stand-alone or being used to render markdown (or other) content, receive the following context variables:
- DATADIR: The full path to the data directory.
- WEBROOT: The full path to the htdocs directory.
- CONTENTDIR: The full path to the content directory.
- TEMPLATES: A list of all templates which will potentially be rendered as stand-alone. Each item in the list contains the keys src (relative path to the source template), src_path (full path to the source template), target (full path of the file to be written), and url (relative URL to the file to be written).
- MDCONTENT: An MDContentList representing all the content files which will potentially be rendered by a template. Each item in the list contains the keys source_file and source_file_short (full and truncated paths to the source), target (HTML file to be written), template (filename of the template which will be used for rendering), data (most of the context variables seen by this content), doc (the raw content document source), and url (the SELF_URL value for this content; see below). Note that MDCONTENT is not available inside shortcodes. An MDContentList is a list object with some convenience methods for filtering and sorting. It will be described further later on.
- Whatever is defined under template_context in the wmk_config.yaml file (see the "Configuration file" section below).
- SELF_URL: The relative path to the HTML file which the output of the template will be written to.
- SELF_TEMPLATE: The path to the current template file (from the template root).
- ASSETS_MAP: A map of fingerprinted assets (such as JavaScript or CSS files), used by the fingerprint template filter.
- LOADER: The template loader/env. In the case of Mako, this is a TemplateLookup object; in the case of Jinja2, this is an Environment object with a FileSystemLoader loader.
- site: A dict-like object containing the variables specified under the site key in wmk_config.yaml.
- CACHE: An ordinary dictionary object, intended for use by templates as a simple shared in-memory cache.
In the case of Jinja2 templates, three extra context variables are available:
- mako_lookup: A Mako TemplateLookup instance which makes it possible to call Mako templates from a Jinja2 template.
- get_context: A function returning all context variables as a dict.
- import: An alias for importlib.import_module, which can thus be used to import a Python module into a Jinja template as the value of a variable, e.g. {% set utils = import('my_utils') %}. The main intent is to make code inside the project's py/ subdirectory as easily available in Jinja templates as it is in Mako templates.
When templates are rendering markdown (or other) content, they additionally get the following context variables:
- CONTENT: The rendered HTML produced from the source document.
- RAW_CONTENT: The original source document.
- SELF_FULL_PATH: The full filesystem path to the source document file.
- SELF_SHORT_PATH: The path to the source document file relative to the content directory.
- MTIME: A datetime object representing the modification time of the source file.
- DATE: A datetime object representing the first value found among date, pubdate, modified_date, expire_date and created_date in the YAML frontmatter, or the MTIME value as a fallback. Since this is guaranteed to be present, it is natural to use it for sorting and generic display purposes.
- RENDERER: A callable which enables a template to render markdown in wmk's own environment. This is mainly so that it is possible to support shortcodes which depend on other markdown content which itself may contain shortcodes. The callable receives a dict containing the keys doc (the markdown) and data (the context variables) and returns rendered HTML.
- page: A dict-like object containing the variables defined in the YAML meta section at the top of the markdown file, in index.yaml files in the markdown file's directory and its parent directories inside content, and possibly in YAML files from the data directory loaded via the LOAD directive in the metadata.
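The DATE fallback rule can be expressed as a short function. This Python sketch assumes that the frontmatter has already been parsed into a dict; YAML loaders typically yield date or datetime objects for unquoted dates. It also assumes that strings are parsed as ISO 8601. How wmk itself normalizes other string formats is not covered here, and the names `page_date` and `to_datetime` are hypothetical.

```python
from datetime import date, datetime

# Frontmatter keys checked for DATE, in the order listed above.
DATE_KEYS = ("date", "pubdate", "modified_date", "expire_date", "created_date")


def to_datetime(value):
    """Normalize a date, datetime or ISO-format string to a datetime."""
    if isinstance(value, datetime):  # check first: datetime subclasses date
        return value
    if isinstance(value, date):
        return datetime(value.year, value.month, value.day)
    return datetime.fromisoformat(str(value))


def page_date(meta, mtime):
    """First date key found in the frontmatter dict, else the file's mtime."""
    for key in DATE_KEYS:
        if meta.get(key):
            return to_datetime(meta[key])
    return mtime
```

Because every page ends up with a value, sorting a page list newest-first by this value never fails on pages that lack a date in their frontmatter.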
For further details on context variables set in the document frontmatter and in index.yaml files, see the "Site, page and nav variables" section below.
Configuration file
A config file, $basedir/wmk_config.yaml, can be used to configure many aspects of how wmk operates. The name of the file may be changed by setting the environment variable WMK_CONFIG, which should contain a filename without a leading directory path.
The configuration file must exist (but may be empty). If it specifies a theme and a file named wmk_config.yaml (regardless of the WMK_CONFIG environment variable setting) exists in the theme directory, then any settings in that file will be merged with the main config, unless ignore_theme_conf is true.
It is also possible to split the configuration file up into several smaller files. These are placed in the wmk_config.d/ directory (inside the base directory). The filename of each YAML file in that directory (minus the .yaml extension) is treated as a key and the contents as its value. Subdirectories can be used to represent a nested structure. For instance, the file wmk_config.d/site/colors/darkmode.yaml would contain the settings that will be visible to templates as the site.colors.darkmode variable.
Note that the WMK_CONFIG environment variable affects the name of the directory looked for. Setting it to myconf.yaml would, for example, mean that wmk inspects myconf.d/ for extra configuration settings instead of wmk_config.d/ (although this does not apply to themes, whose configuration file/directory name is fixed).
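The mapping from the wmk_config.d/ tree to nested settings can be sketched in Python as follows. To stay within the standard library, the sketch takes the per-file loader as a parameter and by default just returns each file's raw text. In practice you would plug in a YAML parser. `config_dir_name` and `load_config_dir` are hypothetical names. The sketch does not model how wmk merges the result with the main config file, including how conflicts are handled.

```python
from pathlib import Path


def config_dir_name(config_file="wmk_config.yaml"):
    """'wmk_config.yaml' -> 'wmk_config.d'; 'myconf.yaml' -> 'myconf.d'."""
    return Path(config_file).stem + ".d"


def load_config_dir(config_dir, load=lambda p: p.read_text()):
    """Turn config_dir/a/b/c.yaml into {'a': {'b': {'c': <loaded contents>}}}.

    `load` parses a single file; the default returns the raw text.
    """
    result = {}
    root = Path(config_dir)
    for path in sorted(root.rglob("*.yaml")):
        keys = path.relative_to(root).with_suffix("").parts
        node = result
        for key in keys[:-1]:  # create intermediate dicts for subdirectories
            node = node.setdefault(key, {})
        node[keys[-1]] = load(path)
    return result
```

The directory name would typically be derived from the environment, e.g. `config_dir_name(os.environ.get("WMK_CONFIG", "wmk_config.yaml"))`.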
Currently there is support for the following settings:
- template_context: Default values for the context passed to templates. This should be a dict.
- site: Values for common information relating to the website. These are also added to the template context under the key site. They are often used by templates and themes to affect the look and feel of the website. For further discussion, see the "Site, page and nav variables" section below.
- render_drafts: Normally, content files with draft set to a true value in the metadata section will be skipped during rendering. This can be turned off (so that the draft status flag is ignored) by setting render_drafts to True in the config file.
- markdown_extensions: A list of extensions to enable for markdown processing by Python-Markdown. The default is ['extra', 'sane_lists']. If you specify third-party extensions here, you have to install them into the Python virtual environment first. Obviously, this has no effect if pandoc is true. May be set or overridden through frontmatter variables.
- markdown_extension_configs: Settings for your markdown extensions. May be set in the config file or in the frontmatter. For convenience, there are special frontmatter settings for two extensions, namely toc and wikilinks:
  - The toc boolean setting will turn the toc extension off if set to False and on if set to True, regardless of its presence in markdown_extensions.
  - If toc is in markdown_extensions (or has been turned on via the toc boolean), then the toc_depth frontmatter variable will affect the configuration of the extension regardless of the markdown_extension_configs setting.
  - If wikilinks is in markdown_extensions, then the options specified in the wikilinks frontmatter setting will be passed on to the extension. Example: wikilinks: {'base_url': '/somewhere'}.
- pandoc: Normally Python-Markdown is used for markdown processing, but if this boolean setting is true, then Pandoc via Pypandoc is used by default instead. This can be turned off or on through frontmatter variables as well. Another config setting which affects whether Pandoc is used is content_extensions, for which see below.
- pandoc_filters, pandoc_options: Lists of filters and options for Pandoc. These have no effect unless pandoc is true. May be set or overridden through frontmatter variables.
- pandoc_input_format: Which input format to assume for Pandoc; has no effect unless pandoc is true. The default value is markdown. If set, the value should be a markdown subvariant for markdown-like content, i.e. one of markdown (pandoc-flavoured), gfm (GitHub-flavoured), markdown_mmd (MultiMarkdown), markdown_phpextra, markdown_strict, commonmark, or commonmark_x. For the other supported input formats there is little reason to set pandoc_input_format explicitly, since they have no variants in the relevant sense, and the right format is picked based on the file extension. May be set or overridden through frontmatter variables.
- pandoc_output_format: Output format for Pandoc; has no effect unless pandoc is true. This should be an HTML variant, i.e. either html, html5 or html4, or alternatively one of the HTML-based slide formats, i.e. s5, slideous, slidy, dzslides or revealjs. Chunked HTML (new in Pandoc 3) is not supported. May be set or overridden through frontmatter variables.
- pandoc_extra_formats, pandoc_extra_formats_settings: If pandoc is True, then pandoc_extra_formats in the frontmatter can be used to convert to other formats than HTML, for instance PDF or MS Word (docx). pandoc_extra_formats is a dict where each key is a format name (e.g. pdf) and its value is the output filename relative to the web root (e.g. subdir/myfile.pdf). The special value auto indicates that the name of the output file should be based on that of the source file but with the file extension replaced by the name of the format. For instance, a source file named subdir/index.md (relative to the content directory) maps to an output file named subdir/index.pdf (relative to the web root directory) if the output format is pdf, and so on. pandoc_extra_formats_settings, if present, contains any special settings for the conversion in the form of a dict where each key is a format name and its value is either a dict with the keys extra_args and/or filters, or a list (which is then interpreted as the value of the extra_args setting).
- slugify_dirs: Affects the names of directories created in htdocs because of the pretty_path setting. If true (which is the default), the name will be identical to the slug of the source file. If explicitly set to false, then the directory name will be the same as the basename of the source file, almost regardless of the characters in the filename.
- use_cache: Boolean, True by default. If you set this to False, the rendering cache will be disabled. This is useful for small and medium-sized projects where the final HTML output often depends on factors other than the content file alone. Note that caching for a specific file can be turned off by putting no_cache: true in the frontmatter.
- cache_mtime_matters: Boolean, False by default. Normally only the body of the markdown file and a few selected processing settings make up the cache key. If, on the other hand, this setting is True (either in the configuration file or in the frontmatter), then the modification time of the markdown file affects the cache key, so touching the file is sufficient for refreshing its cache entry.
- use_sass: A boolean indicating whether to handle Sass/SCSS files in assets/scss automatically. True by default.
- sass_output_style: The output style for Sass/SCSS rendering. This should be one of compact, compressed, expanded or nested. The default is expanded. Has no effect if use_sass is false.
- assets_map: An assets map is a mapping from filenames or aliases to names of files containing a hash identifier (under the webroot). A typical entry might thus map from /css/style.css to /css/style.1234abcdef56.css. The value of this setting is either a dict or the name of a JSON or YAML file (inside the data directory) containing the mapping. It will be available to templates as ASSETS_MAP.
- assets_fingerprinting: A boolean indicating whether to automatically fingerprint assets files (i.e. add hash indicators to their names). If true, any fingerprinted files will be added to the ASSETS_MAP template variable.
- assets_fingerprinting_conf: A dict where the keys are subdirectories of the webroot, e.g. js or img/icons, and the values are dicts containing the keys pattern and (optionally) exclude. These are regular expressions indicating which files to fingerprint under these directories. A filename is fingerprinted if it matches pattern but does not match exclude. (The default value of exclude looks for files that appear to have been fingerprinted already, so it does not normally need to be set.) The default value of this setting is a simple setup for the js and css subdirectories of the webroot.
- assets_commands: A list of arbitrary commands to run at the assets compilation stage (just before Sass/SCSS files in assets/scss are processed, assuming use_sass is not false). The commands are run in order inside the base directory of the site. Example: ['bin/fetch_external_assets.sh', 'node esbuild.mjs'].
- lunr_index: If this is True, a search index for lunr.js is written as a file named idx.json in the root of the htdocs/ directory. Basic information about each page (title and summary) is additionally written to idx.summaries.json.
- lunr_index_fields: The default fields for generating the lunr search index are title and body. Additional fields and their weight can be configured through this variable, for instance {"title": 10, "tags": 5, "body": 1}. Aside from body, the fields are assumed to be attributes of page.
- lunr_languages: A two-letter language code or a list of such codes, indicating which language(s) to use for stemming when building a Lunr index. The default language is en. For more on this, see the "Site search" section below.
- http: A dict for configuring the address used for wmk serve. It may contain either or both of two keys: port (default: 7007) and ip (default: 127.0.0.1). These can also be set directly via command line options.
- output_directory: Normally the output will be written to the directory htdocs inside the basedir, but this can be overridden by setting this configuration variable. The value should be a relative path that does not start with / or ., e.g. site or public.
- mako_imports: A list of Python statements to add to the top of each generated Mako template module file. Generally these are import statements.
- theme: The name of a subdirectory of $basedir/themes (or a symlink placed there) in which to look for extra static, assets, py and templates directories. Note that neither the content nor the data directory of a theme will be used by wmk. A theme-provided template may be rendered as a stand-alone page, but only if no local template overrides it (i.e. has the same relative path). Mako's internal template lookup will similarly first look for referenced components in the normal templates directory before looking in the theme directory. Configuration settings from wmk_config.yaml in the theme directory will be used as long as they do not conflict with those in the main config file.
- ignore_theme_conf: If set to true in the main configuration file, this tells wmk to ignore any settings in wmk_config.yaml in the theme directory.
- extra_template_dirs: A list of directories in which to look for template files. These are placed after both $basedir/templates and theme-provided templates in the template engine search path. This makes it possible to build up a library of components which can be easily used on multiple sites and across different themes.
- jinja2_templates: If this boolean setting is true, the template files in the templates directory (and those supplied by the theme, or otherwise found in the template engine search path) are interpreted by Jinja2 rather than Mako. Note that Jinja2 templates used stand-alone or as layout templates for Markdown content should have the extension .html rather than .mhtml.
- redirects: If this is True or a string pointing to a YAML file in the data/ directory (whose default name is redirects.yaml), then wmk will write HTML stubs containing <meta http-equiv="refresh" ...> in the indicated locations. The contents of the YAML file is a list of entries with the keys from and to. The former is a path under htdocs/ or a list of such paths, while to is an absolute or relative URL to which the visitor should be redirected.
- content_extensions: Customizes which file extensions are handled inside the content/ directory. May be a list (e.g. ['.md', '.html']) or a dict. The value for each key in the dict should itself be a dict where the following keys have an effect: pandoc (boolean), pandoc_input_format (string), is_binary (boolean), raw (boolean), pandoc_binary_format (string). See the value of DEFAULT_CONTENT_EXTENSIONS in wmk.py for details.
- mdcontent_json: This option may specify the name of a JSON file to which to write the entire MDCONTENT object in serialized form, along with the environment variables for each page. The destination file may be either in htdocs/, data/ or tmp/. If the file path does not start with one of these, data
is assumed. The specified (or implied) directory must exist. -
init_commands
: A list of arbitrary commands to run at the very beginning of processing, just after theme settings have been loaded and the Python search path configured. They are run in order inside the base directory of the site. -
cleanup_commands
: A list of arbitrary commands to run at the very end of wmk processing. The commands are run in order inside the base directory of the site.
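To make the redirects setting more concrete, here is a minimal Python sketch that turns redirect entries into meta-refresh stubs. This is not wmk's implementation. The function name redirect_stubs, the exact HTML of the stub, and the rule that a path without a file extension gets an index.html inside it are all assumptions made for illustration. The entries are given as the list of dicts a YAML loader would return for the redirects file.

```python
import html
import posixpath


def redirect_stubs(entries: list[dict]) -> dict[str, str]:
    """Map output paths (relative to htdocs/) to HTML redirect stubs.

    Each entry has 'from' (a path or a list of paths) and 'to' (an
    absolute or relative URL). Assumption: a path ending in '/' or
    lacking a file extension receives an index.html inside it.
    """
    stubs = {}
    for entry in entries:
        sources = entry["from"]
        if isinstance(sources, str):
            sources = [sources]
        # Escape the target so quotes or '&' cannot break the attributes.
        target = html.escape(entry["to"], quote=True)
        page = ('<!DOCTYPE html><html><head><meta charset="utf-8">'
                f'<meta http-equiv="refresh" content="0; url={target}">'
                f'</head><body><a href="{target}">{target}</a></body></html>')
        for src in sources:
            path = src.lstrip("/")
            if path.endswith("/") or not posixpath.splitext(path)[1]:
                path = posixpath.join(path, "index.html")
            stubs[path] = page
    return stubs
```

Writing each value to htdocs/ under its key would then produce the stub files. Because a single entry's from may be a list, several old locations can share one target.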
A note on Pandoc
Pandoc's variant of markdown is very featureful and sophisticated, but since its
use in wmk
involves spawning an external process for each content file being
converted, it is quite a bit slower than Python-Markdown. Therefore, it is
only recommended if you really do need it. Often, even if you do, it can be
turned on for individual pages or site sections rather than for the entire site.
(Of course, if you are working with non-markdown, non-HTML input content, using
Pandoc is unavoidable.)
If you decide to use Pandoc for a medium or large site (or if you have a significant amount of non-markdown content), it is recommended to turn on the use_cache setting in the configuration file. When doing this, be aware that any content whose output depends on something other than the content file itself will need to be marked as non-cacheable by adding no_cache: true to the frontmatter. For instance, if you call the pagelist() shortcode in a page, you would normally want to mark that file in this way.
The markdown_extensions setting will of course not affect pandoc, but there is one extension which is partially emulated in wmk's Pandoc setup, namely toc.
If the toc frontmatter variable is true and the string [TOC] is present as a separate line in a document which is to be processed by pandoc, then pandoc will be asked to generate a table of contents, which will be placed at the indicated location, just as the toc extension for Python-Markdown does.
The toc_depth setting (whose default value is 3) is respected as well, although only in its integer form and not as a range (such as "2-4"). This applies not only to markdown documents but also to the non-markdown formats handled by Pandoc.
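The conditions above can be sketched in Python as a helper that decides which pandoc options to request. The name pandoc_toc_args and the fallback to 3 for a non-integer toc_depth are assumptions. --toc and --toc-depth are real pandoc options, but how wmk actually positions the generated table of contents at the [TOC] marker is not shown here.

```python
import re

# A line consisting solely of the [TOC] marker (surrounding spaces allowed).
TOC_MARKER = re.compile(r"^[ \t]*\[TOC\][ \t]*$", re.MULTILINE)


def pandoc_toc_args(document: str, page: dict) -> list[str]:
    """Hypothetical helper: extra pandoc options for a page's TOC.

    A TOC is requested only when page['toc'] is true and [TOC] appears
    on a line of its own. toc_depth defaults to 3; range strings such as
    "2-4" are unsupported and (by assumption) fall back to the default.
    """
    if not page.get("toc") or not TOC_MARKER.search(document):
        return []
    depth = page.get("toc_depth", 3)
    if not isinstance(depth, int):
        depth = 3  # assumption: non-integer depths are ignored
    return ["--toc", f"--toc-depth={depth}"]
```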
Available themes
There are several wmk themes available:
- Lanyonesque, a blog-oriented theme based on the Jekyll theme Lanyon. Demo.
- Historia, a flexible single-page theme based on the Story template by HTML5 UP. Demo.
- Picompany, a general-purpose theme based on the Company template that accompanies the PicoCSS documentation. Demo.
- Walto, a more lightweight documentation theme. The wmk documentation site uses Walto.
- Birta, the theme used for bornogtonlist.net.
- Cider, a simple and elegant theme for product or company sites.
Shortcodes
A shortcode consists of an opening tag, {{<, followed by any number of whitespace characters, followed by a string representing the "short version" of the content, followed by any number of whitespace characters and the closing tag >}}.
A typical use case is to easily embed content from external sites into your markdown (or other) content. More advanced possibilities include formatting a table containing data from a CSV file or generating a cropped and scaled thumbnail image.
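The syntax just described can be illustrated with a regular expression. The Python sketch below is a hypothetical pattern, not the one wmk uses. It requires the call to start with an identifier and uses re.ASCII so that only ASCII whitespace counts. That choice also means a non-breaking space right after {{< prevents a match, which is consistent with the escaping tip given later in this section.

```python
import re

# Hypothetical pattern: "{{<", optional whitespace, a call starting with an
# identifier, optional whitespace, then ">}}". re.DOTALL lets a call span
# several lines; re.ASCII restricts \s and \w to ASCII characters.
SHORTCODE_RE = re.compile(
    r"\{\{<\s*([A-Za-z_]\w*\(.*?\))\s*>\}\}", re.DOTALL | re.ASCII
)


def find_shortcodes(text: str) -> list[tuple[str, str]]:
    """Return (name, call) pairs for every shortcode found in text."""
    found = []
    for match in SHORTCODE_RE.finditer(text):
        call = match.group(1)
        name = call.split("(", 1)[0]  # the part before the first "("
        found.append((name, call))
    return found
```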
Shortcodes are normally implemented as Mako components named <shortcode>.mc in the shortcodes subdirectory of templates (or of some other directory in your template search path, e.g. themes/<my-theme>/templates/shortcodes). If jinja2_templates is set to true, however, the shortcode templates are in Jinja2 format instead, and use the .jc extension rather than .mc.
The shortcode itself looks like a function call. Note that positional
arguments can only be used if the component has an appropriate <%page>
block declaring the expected arguments.
The shortcode component will have access to a context composed of (1) the parameters directly specified in the shortcode call; (2) the information from the metadata block of the markdown file in which it appears; (3) a counter variable, nth, indicating the number of invocations of that kind of shortcode in that markdown document; and (4) the global template variables.
Shortcodes are applied before the content document is converted to HTML, so it is possible to replace a shortcode with markdown content which will then be processed normally. Note, however, that this may lead to undesirable results when you use such shortcodes in a non-markdown content document.
A consequence of this is that shortcodes do not have direct access to (1) the list of files to be processed, i.e. MDCONTENT, or (2) the rendered HTML (including the parts supplied by the Mako template). A shortcode which needs either of these must place a (potential) placeholder in the markdown source as well as a callback in page.POSTPROCESS. Each callback in this list will be called just before the generated HTML is written to htdocs/ (or, in the case of a cached page, after document conversion but right before the Mako layout template is called), receiving the full HTML as its first argument, followed by the rest of the context for the page. Examples of such shortcodes are linkto and pagelist, described below. (For more on page.POSTPROCESS and page.PREPROCESS, see the "Site, page and nav variables" section below.)
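The placeholder-plus-callback pattern can be sketched in plain Python. The names linkto_like and run_postprocess are hypothetical, and two details are assumptions: that the rest of the context reaches the callback as keyword arguments, and that MDCONTENT behaves like a list of page dicts. The sketch only shows the mechanics. The shortcode emits a unique marker right away, and the callback replaces the marker once the full HTML and page list are available.

```python
import uuid


def linkto_like(page: dict, target_title: str) -> str:
    """Hypothetical shortcode body: return a placeholder now and register
    a callback in page['POSTPROCESS'] that fills it in later."""
    placeholder = f"@@LINK-{uuid.uuid4().hex}@@"

    def callback(html: str, **context) -> str:
        # Assumption: MDCONTENT arrives as a list of dicts with title/url.
        for other in context.get("MDCONTENT", []):
            if other.get("title") == target_title:
                return html.replace(placeholder, other["url"])
        return html.replace(placeholder, "#not-found")

    page.setdefault("POSTPROCESS", []).append(callback)
    return placeholder


def run_postprocess(html: str, page: dict, **context) -> str:
    """Apply each callback in order, passing the HTML first."""
    for func in page.get("POSTPROCESS", []):
        html = func(html, page=page, **context)
    return html
```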
Here is an example of a simple shortcode call in markdown content:
### Yearly expenses
{{< csv_table('expenses_2021.csv') >}}
Here is an example csv_table.mc
Mako component that might handle the above
shortcode call:
<%page args="csvfile, delimiter=',', caption=None"/>
<%! import os, csv %>
<%
    info = []
    # Read the CSV file from the data directory (newline='' as the csv module expects).
    with open(os.path.join(context.get('DATADIR'), csvfile.strip('/')), newline='') as f:
        info = list(csv.DictReader(f, delimiter=delimiter))
    if not info:
        return ''
    keys = info[0].keys()
%>
<table class="csv-table">
% if caption:
<caption>${ caption }</caption>
% endif
<thead>
<tr>
% for k in keys:
<th>${ k }</th>
% endfor
</tr>
</thead>
<tbody>
% for row in info:
<tr>
% for k in keys:
<td>${ row[k] }</td>
% endfor
</tr>
% endfor
</tbody>
</table>
Note that if Jinja2 templates are being used, positional arguments are not supported except in built-in shortcodes, so the shortcode call in the Markdown in the above example would have to be changed to csv_table(csvfile='expenses_2021.csv') or similar.
Shortcodes can take up more than one line if desired, for instance:
{{< figure(
src="/img/2021/11/crocodile-or-alligator.jpg",
caption="""
Although they appear similar, **crocodiles** and **alligators** differ in easy-to-spot ways:
- crocodiles have narrower and longer heads;
- their snouts are more V-shaped;
- also, crocodiles have a protruding tooth, visible when their mouth is closed.
""") >}}
In this example, the caption contains markdown which would be converted to HTML
by the shortcode component (assuming we're dealing with the default figure
shortcode).
Note that shortcodes are not escaped inside code blocks, so if you need to show
examples of shortcode usage in your content they must be escaped in some way in
such contexts. One relatively painless way is to put a non-breaking space
character after the opening tag {{<
instead of a space.
Default shortcodes
The following default shortcodes are provided by the wmk
installation:
- figure: An image wrapped in a <figure> tag. Accepts the following arguments: src (the image path or URL), img_link, link_target, caption, figtitle, alt, credit (image attribution), credit_link, width, height, resize. Except for src, all arguments are optional. The caption and credit will be treated as markdown. If resize is true and width and height have been provided, then a resized version of the image is used instead of the original, via the resize_image shortcode (the details can be controlled by specifying a dict of resize_image arguments rather than a boolean; see below).
- gist: A GitHub gist. Two arguments, both required: username and gist_id.
- include: Insert the contents of the named file at this point. One required argument: filename. Optional argument: fallback (which defaults to the empty string), indicating what to show if the file is not found. The file must be inside the content directory (CONTENTDIR); otherwise it will not be read. The path is interpreted as relative to the directory in which the content file is placed. A path starting with / is taken to start at CONTENTDIR. Nested includes are possible, but the paths of sub-includes are interpreted relative to the original directory (rather than the directory in which the included file has been placed). Note that include() is always handled before other shortcodes.
- linkto: Links to the first matching (markdown-based) page. The first parameter, page, specifies the page which is to be linked to. This is either (a) a simple string representing a slug, title, (partial) path/filename or (partial) URL; or (b) a match_expr in the form of a dict or list which will be passed to page_match() with a limit of 1. Optional arguments: label (the link text; the default is the title of the matching page); ordering, passed to page_match() if applicable; fallback, the text to be shown if no matching page is found, which is (LINKTO: page not found) by default; the boolean unique, which if set to true causes a fatal error to be raised if multiple pages are found to match; and link_attr, which is a string to insert into the <a> tag (by default class="linkto"). A query string or anchor ID fragment for the link can be added via link_append, e.g. link_append='#section2' or link_append='?q=searchstring'. If the boolean parameter url_only is true, then the output will not be a link but only the URL (including link_append, if any).
- pagelist: Runs a page_match() and lists the found pages. Required argument: match_expr. Optional arguments: exclude_expr, ordering, limit, template, fallback, template_args, sql_match. exclude_expr is a match expression which serves to exclude pages from the list found using the match_expr. For instance, pagelist({'has_tag': True}, exclude_expr={'has_tag': 'private'}) finds all tagged pages except those that have the tag private. By default, the found pages are represented as a simple unordered list of links to them, using the page titles as the link text. If nothing is found, a string specified in the fallback parameter (by default an empty string) replaces the shortcode call. The formatting of the list can be changed by pointing to a Mako template using the template argument, which will receive the argument pagelist (an MDContentList of found pages), as well as template_args, if any. The template will only be called if something is found. If sql_match is true, the match_expr, ordering and limit will be passed to page_match_sql() (as where_clause, order_by and limit, respectively) rather than to page_match().
- resize_image: Scales and crops images to a specified size. Required arguments: path, width, height. Optional arguments: op ('fit_width', 'fit_height', 'fit', 'fill'; the last is the default), format ('jpg' or 'png'; default is 'jpg'), quality (default 0.75; applies only to JPEGs), focal_point (default center; only used for op='fill'). Returns a path under /resized_images/ (possibly prefixed with the value of site.leading_path) pointing to the resized version of the image. The filename incorporates a SHA1 hash, so repeated requests for the same resize operation are only performed once. The source path is taken to be relative to the WEBROOT, i.e. the project htdocs directory.
- template: The first argument (template) is either the filename of a template or literal template source code. The heuristic used to distinguish between these two cases is simply that filenames are assumed never to contain whitespace, while source code always does. In either case, the template is called and its output inserted into the content document. The boolean argument is_jinja (default False) can be used to indicate that the given template source code is to be handled by Jinja2; otherwise Mako is assumed. For template files, however, the currently active engine as determined by the jinja2_templates setting is always used, regardless of the is_jinja parameter. Any additional arguments are passed directly on to the template (which will also see the normal template context for the shortcode itself).
- twitter: A tweet. Takes a tweet_id, which may be a Twitter status URL or the last part (i.e. the actual ID) of the URL.
- var: The value of a variable, e.g. "page.title" or "site.description". One required argument: varname. Optional argument: default (which defaults to the empty string), indicating what to show if the variable is not available.
- vimeo: A Vimeo video. One required argument: id. Optional arguments: css_class, autoplay, dnt (do not track), muted, title.
- youtube: A YouTube video. One required argument: id. Optional arguments: css_class, autoplay, title, nowrap, nocookie, width, height.
- wp: A link to Wikipedia. One required argument: title. Optional arguments: label, lang. Example: {{< wp('L.L. Zamenhof', lang='eo') >}}.
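The path rules for include can be sketched as a small Python function. The name resolve_include and the use of posixpath.commonpath to reject paths that escape the content directory are choices made for this illustration, not wmk's code. Because sub-includes are resolved relative to the original directory, a nested include would be resolved by calling the function again with the same content_file.

```python
import posixpath


def resolve_include(filename: str, content_file: str, contentdir: str):
    """Hypothetical helper for the include() path rules.

    content_file is the including document's path relative to contentdir.
    Returns the path to read, or None if it would fall outside contentdir.
    """
    root = posixpath.normpath(contentdir)
    if filename.startswith("/"):
        # A leading slash means "relative to CONTENTDIR".
        candidate = posixpath.join(root, filename.lstrip("/"))
    else:
        # Otherwise resolve relative to the including file's directory.
        base = posixpath.join(root, posixpath.dirname(content_file))
        candidate = posixpath.join(base, filename)
    candidate = posixpath.normpath(candidate)
    # Refuse anything that ends up outside the content directory.
    if posixpath.commonpath([root, candidate]) != root:
        return None
    return candidate
```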
Template library
It is generally up to the site or theme author to define any needed Mako/Jinja templates. In rare cases, however, the templates are general enough that it may be natural to distribute them with wmk itself in the form of a Mako template library located under /lib/.
seo.mc
The template /lib/seo.mc
makes it easier to format metadata for use in the
<head>
section of a base template. It is used in something like the following
way:
<%namespace import="seo" file="/lib/seo.mc" />
% if page:
${ seo(site, page, url=SELF_URL, title=self.page_title) }
% else:
${ seo(site, page=None, url=SELF_URL, title=self.page_title,
img=self.attr.main_image) }
% endif
This will add common meta tags (including basic OpenGraph and JSON-LD
information). By default, it also adds a <title>
tag. For further details
regarding the functionality, see the template file itself.
atom_xml.mc
The template /lib/atom_xml.mc
can be used to facilitate the creation of an
Atom feed for the website. Set site.base_url
to a valid URL and set
site.atom_feed
to a true value. Then create a file named atom.xml.mhtml
in
the template root, containing something like the following:
<%namespace name="atom" file="/lib/atom_xml.mc" />\
${ atom.feed(contentlist=MDCONTENT.sorted_by_date()) }\
There are several optional parameters (with_img, get_img, with_summary, get_summary, pubdate_attr, updated_attr, with_full_text, limit) for tweaking the output.
sitemap_xml.mc
Similarly, /lib/sitemap_xml.mc can be used to create a sitemap.xml file. Set site.enable_sitemap to a true value and ensure that site.base_url is present. Then create a file named sitemap.xml.mhtml in the template root, with the following content:
<%namespace import="sitemap" file="/lib/sitemap_xml.mc" />\
${ sitemap(contentlist=MDCONTENT) }\
Usage in Jinja templates
No Jinja version of these components has been created, but the Mako version can be called from a Jinja2 template using code such as the following:
{% set seo = mako_lookup.get_template("/lib/seo.mc").get_def("seo") %}
{{ seo.render(site, page, url=SELF_URL, title=page.title) |safe }}
<!-- pagevars "Site, page and nav variables" 110 -->
Site, page and nav variables
When a markdown file (or other supported content) is rendered, the Mako template receives a number of context variables, as partly described above. A few of these variables, such as MDTEMPLATES and DATADIR, are set directly by wmk (see above). Others are user-configured either (1) in wmk_config.yaml (the contents of the site object and potentially additional "global" variables in template_context); or (2) by the cascade of index.yaml files in the content directory and its subdirectories, along with the YAML frontmatter of the markdown file itself, the result of which is placed in the page object.
When gathering the content of the page variable, wmk will start by looking for index.yaml files in each parent directory of the markdown file in question, starting at the root of the content directory and moving down towards the file's own directory, at each step extending and potentially overriding the data gathered at previous stages. Only then will the YAML in the frontmatter of the file itself be parsed and added to the page data.
The file-specific frontmatter may be in the content file itself, or it may be in a separate YAML file with the same name as the content file but with an extra .yaml extension. For instance, if the content filename is important.md, then the YAML file would be named important.md.yaml. If both in-file and external frontmatter are present, the two will be merged, with the in-file values "winning" in case of conflict.
At any point, a data source in this cascade may specify an extra YAML file using the special LOAD variable. This file will then be loaded as well and subsequently treated as if the data in it had been specified directly at the start of the file containing the LOAD directive.
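The cascade can be summarized in a short Python sketch. It assumes a shallow dict.update() merge at every step and assumes the LOAD key itself is dropped once the loaded data has been merged in. Neither detail is stated above, and build_page_vars and apply_load are hypothetical names. The loader argument stands in for whatever reads and parses the referenced YAML file.

```python
def apply_load(data: dict, loader) -> dict:
    """Treat a LOAD file's data as if it appeared at the start of data:
    its keys come first, and data's own keys override them.
    Assumption: the LOAD key itself is dropped afterwards."""
    if "LOAD" not in data:
        return dict(data)
    merged = dict(loader(data["LOAD"]))
    merged.update({k: v for k, v in data.items() if k != "LOAD"})
    return merged


def build_page_vars(index_yamls: list, external_fm: dict, infile_fm: dict, loader) -> dict:
    """Merge in order: index.yaml files from the content root down to the
    file's own directory, then the external .yaml frontmatter, then the
    in-file frontmatter (which wins on conflict).
    Assumption: a shallow merge at every step."""
    page = {}
    for source in [*index_yamls, external_fm, infile_fm]:
        page.update(apply_load(source, loader))
    return page
```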
Which variables are defined and used by templates is very much up to the user, although a few of them have a predefined meaning to wmk itself. To make it easier to switch between different themes, however, it is suggested to stick to the recommended meanings of some of the variables (see "Standard variables and their recommended meaning" below).
The variables site and page are dicts with a thin convenience layer on top, which makes it possible to reference subkeys belonging to them in templates using dot notation rather than subscripts. For instance, if page has a dict variable named foo, then a template could contain a fragment such as ${ page.foo.bar or 'splat' }, even if the foo dict does not contain a key named bar. Without this syntactic sugar you would have to write something much more defensive and long-winded, such as ${ page['foo']['bar'] if 'foo' in page and 'bar' in page['foo'] else 'splat' }.
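The forgiving dot notation described above can be approximated with a small dict subclass. AttrDict is a hypothetical stand-in, not wmk's actual class. In this sketch a missing key yields an empty AttrDict, which is falsy, so a chained lookup such as page.foo.bar falls through to the or branch instead of raising an exception.

```python
class AttrDict(dict):
    """Hypothetical sketch of the convenience layer: attribute access maps
    to keys, nested dicts are wrapped, and missing keys yield an empty
    (falsy) AttrDict so that chained lookups never raise."""

    def __getattr__(self, name):
        if name.startswith("__"):
            # Leave dunder lookups (copying, pickling, etc.) alone.
            raise AttributeError(name)
        value = self.get(name, AttrDict())
        if isinstance(value, dict) and not isinstance(value, AttrDict):
            value = AttrDict(value)
        return value
```

One trade-off of this approach: a key that shares its name with a dict method (such as items or get) is shadowed by that method when accessed with dot notation.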
The nav variable
The nav variable is an easy way of configuring a navigation tree for websites with content that has a hierarchical structure, such as a typical documentation site. It is set via the nav key in the wmk_config.yaml file and is represented in templates as a Nav object.
A Nav instance is a list-like object with two types of entries: links and sections. A link is just a title and a URL. A section has a title and a list of links or sections (possibly nested). It may or may not have a URL as well. Each item has a parent (with the nav itself as the top-level parent) and a level (starting from 0 for the immediate children of the nav).
The nav has a homepage attribute, which by default is the first local link in the nav. Each local link has previous and next attributes. Each section has children. There are other attributes, but these are the basics.
Manually configured
A typical explicit nav setting looks something like this:
nav:
    - Home: /
    - 'User Guide [url=/guide/]':
        - Lorem:
            - Ipsum: /guide/ipsum/
            - Eu fuit: /guide/mageisse/
        - Dolor sit amet: /guide/concupescit/
    - Resources:
        - Community: 'https://example.com/'
        - Source code: 'https://github.com/example/com/'
    - About:
        - License: /about/license/
        - History: /about/history/
A manually configured nav setting of this kind is only necessary if you want to link to something outside of the site from the nav (as in the above example). Otherwise, whether a manually defined or an automatically generated nav is more appropriate depends on the kind of content you have.
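The following Python sketch walks a nav list of the kind shown above, as a YAML loader would return it: a list of single-key dicts whose values are either URL strings (links) or nested lists (sections). It reports each item's level and parent and picks the first local link as the homepage. walk_nav, homepage and the [url=...] parsing regex are hypothetical illustrations of the described structure, not the Nav class itself. A parent of None stands for the nav itself.

```python
import re

# Hypothetical parser for section titles such as 'User Guide [url=/guide/]'.
SECTION_URL = re.compile(r"^(.*?)\s*\[url=([^\]]+)\]$")


def walk_nav(entries, level=0, parent=None):
    """Yield one dict per nav item in document order."""
    for entry in entries:
        for raw_title, value in entry.items():
            if isinstance(value, list):
                m = SECTION_URL.match(raw_title)
                title, url = (m.group(1), m.group(2)) if m else (raw_title, None)
                yield {"kind": "section", "title": title, "url": url,
                       "level": level, "parent": parent}
                yield from walk_nav(value, level + 1, title)
            else:
                yield {"kind": "link", "title": raw_title, "url": value,
                       "level": level, "parent": parent}


def homepage(items):
    """First local link (URL starting with '/'), or None."""
    for item in items:
        if item["kind"] == "link" and item["url"].startswith("/"):
            return item
    return None
```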
Automatically generated
A nav object can also be generated by wmk from the frontmatter of the content files. For this to happen, two conditions must be met:
- The value of nav in wmk_config.yaml is set to auto.
- Each content page that is to appear in the navigation tree must have at least the key nav_section in its frontmatter. To determine ordering, nav_order or (equivalently) weight may also be specified; and if necessary, the page title may be overridden in the nav by setting the nav_title attribute.
The nav_section value Root is special. Pages assigned to that section are placed directly at the front of the nav structure. For many sites, you would simply set nav_section: Root in the index.yaml file at the root of your content directory.
Other sections are simply grouped by their nav_section values. Please note that these values are case-sensitive.
Within each section, the link items are ordered by their nav_order/weight value, which should be an integer. If two or more items have the same ordering number, they are ordered by nav_title/title.
The sections themselves are ordered within the nav by the lowest nav_order/weight value of the pages assigned to them. Sections with the same ordering number are sorted alphabetically.
A page may be excluded from the nav (even if it has a nav_section) by setting its nav_exclude to a true value.
The pages inside each section may be nested to an arbitrary depth by using the nav_parent (or parent) variable in the frontmatter of the subpages. The value of this is normally the nav_title/title (case-insensitive) of the parent page. However, if more than one page in the same section has the same title, then one may disambiguate by specifying the slug or (in extreme cases) the id of the target page instead.
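The grouping and ordering rules for an automatically generated nav can be sketched in Python. The function auto_nav and its return shape are hypothetical. One assumption is labeled in the code: pages without nav_order/weight sort as 0. Nesting via nav_parent is deliberately left out to keep the sketch short.

```python
from collections import defaultdict


def auto_nav(pages):
    """Group and order pages by nav_section (a sketch, not wmk's code).

    Returns ("link", title) items for Root pages first, then
    ("section", name, [titles]) items. Assumption: a missing
    nav_order/weight counts as 0. nav_parent nesting is not handled.
    """
    def order(p):
        return p.get("nav_order", p.get("weight", 0))

    def label(p):
        return p.get("nav_title") or p["title"]

    sections = defaultdict(list)
    for p in pages:
        if p.get("nav_section") and not p.get("nav_exclude"):
            sections[p["nav_section"]].append(p)  # case-sensitive grouping

    nav = []
    # Root pages go directly at the front of the nav.
    for p in sorted(sections.pop("Root", []), key=lambda p: (order(p), label(p))):
        nav.append(("link", label(p)))
    # Sections: lowest order value among their pages, then alphabetically.
    for name in sorted(sections, key=lambda s: (min(order(p) for p in sections[s]), s)):
        members = sorted(sections[name], key=lambda p: (order(p), label(p)))
        nav.append(("section", name, [label(p) for p in members]))
    return nav
```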
The TOC variable
When a page is rendered, the generated HTML is examined and a simple table of contents object is constructed, which will be available to templates as TOC. It contains a list of the top-level headings (i.e. H1 headings, or H2 headings if no H1 headings are present, etc.), with lower-level headings hierarchically arranged in their children. Each item also has url and title attributes.
TOC.item_count contains the heading count (regardless of nesting).
The TOC
variable can e.g. be used by the page template to show a table of
contents elsewhere on the page.
The table of contents object is not constructed unless each heading has an id
attribute. When using the default python-markdown, this means that the toc
extension must be active.
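A Python sketch of how such a nested table of contents could be built from rendered HTML follows. build_toc is hypothetical, and for brevity it uses a regular expression rather than a proper HTML parser. Unlike wmk, which skips the TOC entirely unless every heading has an id, this sketch simply ignores headings without one. A stack of open headings makes the shallowest level present become the top level.

```python
import re

# Simplified matcher for <h1>..<h6> elements carrying an id attribute.
HEADING_RE = re.compile(
    r'<h([1-6])[^>]*\bid="([^"]+)"[^>]*>(.*?)</h\1>', re.IGNORECASE | re.DOTALL
)


def build_toc(html: str):
    """Return (items, item_count); each item is a dict with title,
    url (a #fragment) and children, nested by heading level."""
    headings = [(int(lvl), hid, re.sub(r"<[^>]+>", "", text).strip())
                for lvl, hid, text in HEADING_RE.findall(html)]
    root = {"children": []}
    stack = [(0, root)]  # (level, node); level 0 is a virtual root
    for level, hid, title in headings:
        node = {"title": title, "url": "#" + hid, "children": []}
        # Pop until the top of the stack is a shallower heading.
        while stack[-1][0] >= level:
            stack.pop()
        stack[-1][1]["children"].append(node)
        stack.append((level, node))
    return root["children"], len(headings)
```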
System variables
The following frontmatter variables affect the operation of wmk
itself, rather
than being exclusively used by templates.
Templates
Note that a variable called something like page.foo below is referenced as such in templates but specified in YAML frontmatter simply as foo: somevalue.
- page.template specifies the template which will render the content.
- page.layout is used by several other static site generators. For compatibility with them, this variable is supported as a fallback synonym for template. It only takes effect if template has not been specified explicitly anywhere in the cascade of frontmatter data sources.
For both template and layout, the .mhtml (or .html in the case of Jinja2) extension of the template may be omitted. If the template value appears to have no extension, .mhtml or .html (depending on the template engine) is assumed; but if the intended template file has a different extension, then it must of course be specified.
Likewise, a leading base/ directory may be omitted when specifying template or layout. For instance, a layout value of post would find the template file base/post.mhtml unless a post.mhtml file exists in the template root somewhere in the template search path.
If neither template nor layout has been specified and no default_template setting is found in wmk_config.yaml, the default template name for markdown files is md_base.mhtml (or md_base.html if Jinja2 templates have been selected).
The special template/layout value __empty__ (case-insensitive) indicates that no base template should be applied to the given content file.
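The lookup rules above can be condensed into a small Python function. resolve_template is a hypothetical helper: exists stands in for checking the template search path, and the template/layout fallback is assumed to have been resolved before the call. The function applies the __empty__ marker, the default extension, the base/ fallback and the md_base default, in that order.

```python
import posixpath


def resolve_template(value, exists, jinja=False, default_template=None):
    """Sketch of the template-name rules (hypothetical helper).

    value  -- the template/layout value from the frontmatter, or None
    exists -- callable: does this path exist in the template search path?
    Returns a template path, or None if no base template should be used.
    """
    ext = ".html" if jinja else ".mhtml"
    if value is None:
        value = default_template or "md_base" + ext
    if value.lower() == "__empty__":
        return None
    if not posixpath.splitext(value)[1]:
        value += ext  # no extension given: assume the engine's default
    if exists(value) or value.startswith("base/"):
        return value
    fallback = "base/" + value  # e.g. post -> base/post.mhtml
    return fallback if exists(fallback) else value
```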
Taxonomy handling
A taxonomy is a classification of pieces of content for the purpose of grouping them together. Common taxonomy types are tags, categories, sections and article authors. However, the taxonomy that is appropriate to a particular website mainly depends on the content. On a site with book reviews you would have genres, book authors and publishers; on a movie site you would have genres and actors, and so on. Each set of frontmatter classifiers (e.g. the single classifier tag or the list ['tag', 'tags']) used in a taxonomy may be called a term. Each term may have several values, and each value represents a list of content items associated with it.
Up to version 1.13 of wmk, taxonomies had to be handled by templates, and this is still the best way to do it if you want a form of presentation which is tailored to a particular term. However, as a consequence, themes had to be designed around specific taxonomies, typically tags, categories, or sections. In other words, the presentation of taxonomies was not primarily content-driven.
From version 1.13 it is therefore possible to specify the taxonomy criteria directly in the frontmatter of the main content page for the corresponding term. Here is an example based on a movie site, for the term director. The content file might be named directors/index.md:
---
title: Directors
date: 2024-11-01
template: base/taxonomy/list.mhtml
TAXONOMY:
  taxon: ['director', 'directors']
  order: name
  detail_template: base/taxonomy/detail.mhtml
  list_settings:
    pagination: true
    per_page: 24
  detail_settings:
    biographies: directors.yaml
    item_template: lib/movie_teaser.mc
---
Below is a list of the directors of the movies
that have been covered on this website.
Click on the name of a director to see a short biography
and an overview of their movies.
The frontmatter variable page.TAXONOMY triggers special processing of the page, provided that it contains at least the subkeys taxon and detail_template. This special processing consists of the following:
- wmk fetches a list of values for the term specified in taxon using the taxonomy_info() method of MDCONTENT. This will be added to the template context as TAXONS.
- For each value in the list, wmk renders the template detail_template with the same context, except that the two keys TAXON (the value) and TAXON_INDEX (the 0-based index of the value in the list) are added. (If no detail_template is specified, then the template for the page is used.) Each TAXON has items, which represent the pages tagged with that director, and the main job of the detail page is to show a list of them to the user. The result is written to a destination file whose name is based on the destination of the rendered Markdown content plus the slug of the string identifying the value (e.g. directors/orson-welles/index.html in this example). The target URL will be available as TAXON['url'] (and thus also under the key 'url' for each item in TAXONS).
- wmk resumes normal operation by calling the main template with the modified template context as well as the content from the markdown file, and writes the result to the target file.
Please note that the settings in list_settings and detail_settings in the example above are merely for the purposes of illustration. Whether any of them are actually supported is entirely up to the template or theme author. The only subvariables used by wmk itself are taxon, order (if present), and detail_template (if present).
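The destination rule for detail pages, based on the list page's destination plus a slug of each value, can be sketched in Python. Both slugify and taxon_destinations are hypothetical, and wmk's own slugification may differ (for instance in how it treats non-Latin scripts). The index field corresponds to TAXON_INDEX.

```python
import posixpath
import re
import unicodedata


def slugify(text: str) -> str:
    """Hypothetical slugifier: strip accents, lowercase, and join runs of
    ASCII letters and digits with hyphens."""
    ascii_text = (unicodedata.normalize("NFKD", text)
                  .encode("ascii", "ignore").decode("ascii"))
    return re.sub(r"[^a-z0-9]+", "-", ascii_text.lower()).strip("-")


def taxon_destinations(page_dest: str, values: list[str]) -> list[dict]:
    """For a list page rendered to page_dest (e.g. directors/index.html),
    compute one detail-page destination and URL per taxon value."""
    base_dir = posixpath.dirname(page_dest)
    result = []
    for index, value in enumerate(values):
        dest = posixpath.join(base_dir, slugify(value), "index.html")
        result.append({"name": value, "index": index, "dest": dest,
                       "url": "/" + posixpath.dirname(dest) + "/"})
    return result
```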
Variables affecting rendering
- page.slug: If the value of slug is nonempty and consists exclusively of lowercase alphanumeric characters, underscores and hyphens (i.e. matches the regular expression ^[a-z0-9_-]+$), then it will be used instead of the basename of the markdown file to determine where to write the output. If a slug variable is missing, one will be automatically added by wmk based on the basename of the current content file (as well as, in the case of index.* files, their proximate directory). Note that autogenerated slugs do not affect the location of the destination file. Slugs are not necessarily unique, but page.id values are (see below).
- page.pretty_path: If this is true, the basename of the markdown filename (or the slug) will become a directory name and the HTML output will be written to index.html inside that directory. By default it is false for files named index.md or index.html and true for all other files. If the filename contains symbols that do not match the character class [\w.,=-], then it will be "slugified" before final processing (although this only works for languages using the Latin alphabet).
- page.do_not_render: Tells wmk not to write the output of this template to a file in htdocs. All other processing will be done, so the gathered information can be used by templates for various purposes. (This is similar to the headless setting in Hugo.)
- page.draft: If this is true, it prevents further processing of the markdown file unless render_drafts has been set to true in the config file.
- page.no_cache: If this is true, the rendering cache will not be used for this file. (See also the use_cache setting in the configuration file.)
- page.markdown_extensions, page.markdown_extension_configs, page.pandoc, page.pandoc_filters, page.pandoc_options, page.pandoc_input_format, page.pandoc_output_format: See the description of these options in the section on the configuration file, above.
- page.POSTPROCESS: This contains a list of processing instructions which are called on the rendered HTML just before writing it to the output directory. Each instruction is either a function (placed into POSTPROCESS by a shortcode) or a string (possibly specified in the frontmatter). If the latter, it points to a function entry in the autoload dict imported from either the project's py/wmk_autoload.py file or the theme's py/wmk_theme_autoload.py file. In either case, the function receives the HTML as the first argument, while the rest of the arguments constitute the template context. It should return the processed HTML.
- page.PREPROCESS: This is analogous to page.POSTPROCESS, except that the instructions in the list are applied to the markdown (or other content document) just before converting it to HTML. The function receives two arguments: the document text and the page object. It should return the altered document source. Note that this happens before shortcodes have been expanded, so (unlike page.POSTPROCESS) such actions cannot be added via shortcode.
Note that if two files in the same directory have the same slug, they may both
be rendered to the same output file; it is unpredictable which of them will go
last (and thus "win the race"). The same kind of conflict may arise between a
slug and a filename or even between two filenames containing non-ascii
characters. It is up to the content author to take care to avoid this; wmk
does nothing to prevent it.
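Because `wmk` does not guard against such clashes, a small pre-build check can help content authors. The sketch below is a hypothetical helper, not part of `wmk`. It assumes, as a simplification, that a file's output name is its slug if one is set and otherwise its filename stem. It does not reproduce `wmk`'s slugification of non-ASCII filenames.

```python
import os
from collections import defaultdict


def find_output_collisions(entries):
    """entries: (source_path, slug_or_None) pairs, paths relative to content/.

    Returns {(directory, name): [source_path, ...]} for every output name
    that more than one source file in the same directory would use.
    """
    seen = defaultdict(list)
    for src, slug in entries:
        directory, filename = os.path.split(src)
        # Assumption: output name = slug if set, else the filename stem.
        # Lowercased because some filesystems are case-insensitive.
        name = (slug or os.path.splitext(filename)[0]).lower()
        seen[(directory, name)].append(src)
    return {key: srcs for key, srcs in seen.items() if len(srcs) > 1}
```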
Standard variables and their recommended meaning
The following variables are not used directly by `wmk` but affect templates in
various ways. This is a list of recommendations rather than rules which must
necessarily be followed.
Typical site variables
Site variables are the key-value pairs under `site:` in `wmk_config.yaml`.
- `site.title`: Name or title of the site.
- `site.lang`: Language code, e.g. 'en' or 'en-us'. Used e.g. for translations by some themes.
- `site.locale`: Locale code, e.g. 'en_US.utf8'. Used when sorting `MDCONTENT` by name or title.
- `site.tagline`: Subtitle or slogan.
- `site.description`: Site description.
- `site.author`: Main author/proprietor of the site. Depending on the site templates (or the theme), this may be a string or a dict with keys such as "name", "email", etc.
- `site.base_url`: The protocol and hostname of the site (perhaps followed by a directory path if `site.leading_path` is not being used). Normally without a trailing slash.
- `site.leading_path`: If the web pages built by `wmk` are not at the root of the website but in a subdirectory, this is the appropriate prefix path. Normally without a trailing slash.
- `site.build_time`: This is automatically added to the site variable by `wmk`. It is a datetime object indicating when the rendering phase of the current run started.
- `site.lunr_search`: A boolean automatically added to the site variable. It is true when `lunr_index` is true in the configuration file.
Templates or themes may be configurable through various site variables, e.g.
`site.paginate` for the number of items per page in listings or `site.mainfont`
for configuring the font family.
Classic meta tags
These variables mostly relate to the text content and affect the metadata
section of the `<head>` of the HTML page.
- `page.title`: The title of the page, typically placed in the `<title>` tag in the `<head>` and used as a heading on the page. Normally the title should not be repeated as a heading in the body of the markdown file. Most markdown documents should have a title. If it is not explicitly specified, the title will be generated automatically from the filename.
- `page.slug`: See above. If it is missing, the slug is created from the title.
- `page.id`: This is guaranteed to be unique at rendering time. If it is present but not unique, then "-1", "-2", etc., will be appended as necessary. If it is not explicitly specified, it is generated by slugifying the full path to the source markdown file (relative to the content directory). For instance, `blog/2022/09/The letter Þ in Old English.md` will become the ID `blog-2022-09-the-letter-th-in-old-english`.
- `page.description`: Affects the `<meta name="description" ...>` tag in the `<head>` of the page. The variable `summary` (see below) may also be used as a fallback here.
- `page.keywords`: Affects the `<meta name="keywords" ...>` tag in the `<head>` of the page. This may be either a list or a string (with items separated by commas).
- `page.robots`: Instructions for Google and other search engines relating to this content (e.g. `noindex, nofollow`) should be placed in this variable.
- `page.author`: The name of the author (if there is only one). May lead to a `<meta name="author" ...>` tag in the `<head>` as well as appear in the body of the rendered HTML file. Some themes may expect this to be a dict with keys such as `name`, `email`, `image`, etc.
- `page.authors`: If there are several authors, they may be specified here as a list. It is up to the template how to handle the case where both `author` and `authors` are specified, but one way is to add the `author` to the `authors` unless it is already present in the list.
- `page.summary`: This may affect the `<meta name="description" ...>` tag as a fallback if no `description` is provided, but its main purpose is for list pages with article teasers and similar content. If it is initially not present but `page.generate_summary` is True, then it will be generated from the body of the page, as follows: (1) if the HTML comment `<!--more-->` is present in the body, then any non-heading content before it will be used as the summary; (2) otherwise the first paragraph of the body will be used. In either case, if the autogenerated summary is longer than 300 characters, it is truncated so as to be shorter than that (this maximum length is configurable with `page.summary_max_length`). Autogenerated summaries will contain neither HTML tags nor Markdown markup; if such markup is desired, the summary must be explicitly added to the frontmatter.
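The summary rules above can be approximated in a few lines of Python. This is a hypothetical sketch, not `wmk`'s own `generate_summary()`. It treats markdown lines starting with `#` as headings and splits paragraphs on blank lines. Its markup stripping is deliberately crude: it also removes underscores and similar characters that may legitimately occur in text.

```python
import re


def sketch_summary(body, max_length=300):
    """Approximate the summary rules described above (not wmk's own code)."""
    if '<!--more-->' in body:
        before = body.split('<!--more-->', 1)[0]
        # Keep everything before the marker except markdown heading lines.
        text = '\n'.join(line for line in before.splitlines()
                         if not line.lstrip().startswith('#'))
    else:
        # First non-empty, non-heading paragraph (paragraphs split on blank lines).
        paras = [p for p in re.split(r'\n\s*\n', body)
                 if p.strip() and not p.lstrip().startswith('#')]
        text = paras[0] if paras else ''
    # Crude markup removal: HTML tags first, then common markdown characters.
    text = re.sub(r'<[^>]+>', '', text)
    text = re.sub(r'[*_`>#]', '', text)
    text = ' '.join(text.split())
    if len(text) > max_length:
        # Cut at a word boundary so the result is shorter than max_length.
        text = text[:max_length - 1].rsplit(' ', 1)[0]
    return text
```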
Note that this is by no means an exhaustive list of variables likely to affect
the `<head>` part of the generated HTML. For instance, several other variables
may affect meta tags used for sharing on social media. One of the more common
ones is probably `page.image` (described below). In any case, the list of
supported frontmatter attributes and how they are interpreted is for the most
part up to the theme or template author.
Dates
Dates and datetimes should normally be in a format conformant with or similar to
ISO 8601, e.g. `2021-09-19` and `2021-09-19T09:19:21+00:00`. The `T` may be
replaced with a space and the time zone may be omitted (local time is assumed).
If the datetime string contains hours, it should also contain minutes, but
seconds may be omitted. If these rules are followed, the following variables
are converted to date or datetime objects (depending on the length of the
string) before they are passed on to templates.
- `page.date`: A generic date or datetime associated with the document.
- `page.pubdate`: The date/datetime when first published. Currently `wmk` does not omit rendering files with `date` or `pubdate` in the future, but it may do so in a later version.
- `page.modified_date`: The last-modified date/datetime. Note that `wmk` will also add the variable `MTIME`, which is the modification time of the file containing the markdown source, so this information can be inferred from that if this variable is not explicitly specified.
- `page.created_date`: The date the document was first created.
- `page.expire_date`: The date from which the document should no longer be published. As with `pubdate`, this currently has no direct effect on how `wmk` builds and renders the site but may do so in a later version.
- `page.auto_date`: If this is True and no `page.date` is present (or rather the field specified in `page.auto_date_field`, which defaults to `date`), then `wmk` tries to find an ISO date in the source filename or its directory path. In this context, that means a group of 4+2+2 digits with a separator which may be `-`, `_` or `/`, e.g. `posts/2024-05-13-find-the-fish.md` or `diary/2024/02/19/spam.org`. If a date is found, then `page.date` is set accordingly. (Normally you would set `auto_date` in an `index.yaml` file so as to affect all content files in that directory and its subdirectories.)
See also the description of the `DATE` and `MTIME` context variables above.
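The date conventions above and the `auto_date` filename pattern can both be approximated with the standard library. This is a sketch under the stated rules, not `wmk`'s own parser. The function names are hypothetical, and the regular expression is one reading of "4+2+2 digits with a separator".

```python
import datetime
import re


def parse_page_date(value):
    """Return a date for 'YYYY-MM-DD' strings and a datetime for longer ones."""
    if isinstance(value, datetime.date):  # YAML may already have parsed it
        return value
    s = str(value).strip()
    if len(s) == 10:
        return datetime.date.fromisoformat(s)
    # fromisoformat() accepts 'T' or a space between date and time, optional
    # seconds and an optional +HH:MM offset; without an offset the result is
    # naive, which here stands for local time.
    return datetime.datetime.fromisoformat(s)


# auto_date: 4+2+2 digits separated by '-', '_' or '/', not part of a longer digit run.
AUTO_DATE_RE = re.compile(r'(?<!\d)(\d{4})[-_/](\d{2})[-_/](\d{2})(?!\d)')


def date_from_path(path):
    """Return the date in the first date-like group of a source path, or None."""
    m = AUTO_DATE_RE.search(path)
    if m is None:
        return None
    try:
        return datetime.date(*(int(g) for g in m.groups()))
    except ValueError:  # e.g. 2024-13-45 has the right shape but is not a date
        return None
```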
Media content
- `page.image`: The main image associated with the document. Affects the `og:image` meta tag in HTML output and may be used for both teasers and content rendering.
- `page.images`: A list of images associated with the document. If `image` is not specified, the main image will be taken to be the first in the list.
- `page.audio`: A list of audio files/URLs associated with this document.
- `page.videos`: A list of video files/URLs associated with this document.
- `page.attachments`: A list of attachments (e.g. PDF files) associated with this document.
Taxonomy
See also the description of `page.TAXONOMY` above. The following are terms
commonly used for taxonomy purposes:
- `page.section`: One of a quite small number of sections on the site, often corresponding to the leading subdirectory in `content`. E.g. "blog", "docs", "products".
- `page.categories`: A list of broad categories the page belongs to. E.g. "Art", "Science", "Food". The first-named category may be regarded as the primary one.
- `page.tags`: A list of tags relevant to the content of the page. E.g. "quantum physics", "knitting", "Italian food".
- `page.weight`: A measure of importance attached to a page and used as an ordering key for a list of pages. This should be a positive integer. The list is normally in ascending order, i.e. with the lower numbers at the top. (Pages may of course be ordered by other criteria, e.g. by `pubdate`.)
Template filters
In addition to the built-in template filters provided by Mako or Jinja2 respectively, the following filters are by default made available in templates:
- `date`: Date formatting using strftime. By default, the format `'%c'` is used. A different format is specified using the `fmt` parameter, e.g. `${ page.pubdate | date(fmt=site.date_format) }`.
- `date_to_iso`: Format a datetime as ISO 8601 (or similarly, depending on parameters). The parameters are `sep` (the separator between the date part and the time part; by default `'T'`, but a space is sensible as well); `upto` (by default `'sec'`, but `'day'`, `'hour'` and `'frac'` are also acceptable values); and `with_tz` (by default False).
- `date_to_rfc822`: Format a datetime as RFC 822 (a common datetime format in email headers and some types of XML documents).
- `date_short`: E.g. "7 Nov 2022".
- `date_short_us`: E.g. "Nov 7th, 2022".
- `date_long`: E.g. "7 November 2022".
- `date_long_us`: E.g. "November 7th, 2022".
- `slugify`: Turns a string into a slug. Only works for strings in the Latin alphabet.
- `markdownify`: Convert markdown to HTML. It is possible to specify custom extensions using the `extensions` argument.
- `truncate`: Convert markdown/HTML to plain text and return the first `length` characters (default: 200), with an `ellipsis` (default: "…") appended if any shortening has taken place.
- `truncate_words`: Convert markdown/HTML to plain text and return the first `length` words (default: 25), with an `ellipsis` (default: "…") appended if any shortening has taken place.
- `p_unwrap`: Remove a wrapping `<p>` tag if and only if there is only one paragraph of text. Suitable for short pieces of text to which a `markdownify` filter has previously been applied. Example: `<h1>${ page.title | markdownify,p_unwrap }</h1>`.
- `strip_html`: Remove any markdown/HTML markup from the text. Paragraphs will not be preserved.
- `cleanurl`: Remove a trailing 'index.html' from URLs.
- `url`: Unless the given path already starts with '/', '.' or 'http', prefix it with the first defined leading path of `site.leading_path`, `site.base_url` or a literal `/`. Append a `/` unless the path already has one or seems to end with a file extension. Calls `cleanurl` on the result.
- `to_json`: Converts the given data structure to JSON. Note that this should not normally be used as a string filter (i.e. `${ value | to_json }`) but directly as a function, like this: `${ to_json(value) }`.
- `fingerprint`: Replace an unadorned path to an assets file with its fingerprinted (i.e. versioned) equivalent. Example: `${ 'js/site.js' | url, fingerprint }`. Uses the corresponding entry from the `ASSETS_MAP` context variable if it is available, but otherwise does the fingerprinting itself.
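The path rules of the `url` and `cleanurl` filters can be restated as plain Python. This is a hypothetical re-implementation for illustration, not `wmk`'s own filter code. It takes the leading path and base URL as explicit arguments instead of reading them from `site`. It also treats "seems to end with a file extension" as a dot followed by one to five word characters, which is an assumption.

```python
import re


def cleanurl(url):
    """Drop a trailing 'index.html', keeping the slash before it."""
    if url == 'index.html' or url.endswith('/index.html'):
        return url[:-len('index.html')]
    return url


def url_sketch(path, leading_path=None, base_url=None):
    """Approximate the `url` filter rules listed above (not wmk's own code)."""
    if not path.startswith(('/', '.', 'http')):
        # First defined of leading_path, base_url, or a literal '/'.
        prefix = leading_path or base_url or '/'
        path = prefix.rstrip('/') + '/' + path
    # "Seems to end with a file extension": a dot plus 1-5 word characters.
    if not path.endswith('/') and not re.search(r'\.\w{1,5}$', path):
        path += '/'
    return cleanurl(path)
```

For example, `url_sketch('about')` gives `/about/`, while `url_sketch('js/site.js', leading_path='/sub')` leaves the file extension alone and gives `/sub/js/site.js`.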
If you wish to provide additional filters in Mako without having to explicitly
define or import them in templates, the best way of doing this is to add them
via the `mako_imports` setting in `wmk_config.yaml` (see above). However, there
is currently no easy way to do this if Jinja2 templates are being used.
Please note that in order to avoid conflicts with the above filters you should
not place a file named `wmk_mako_filters.py` or `wmk_jinja2_extras.py` in your
`py/` directories.
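A custom Mako filter is just a function that takes a string and returns a string. The sketch below uses a hypothetical module `py/my_filters.py`, deliberately not named like the reserved files above. It assumes that `mako_imports` accepts Python import lines and that `py/` is on the import path. Under those assumptions, an entry such as `from my_filters import widont` would let templates write `${ page.title | widont }`.

```python
# py/my_filters.py (hypothetical module name)


def widont(text):
    """Join the last two words with &nbsp; so a heading never ends with one lone word."""
    text = text.rstrip()
    head, sep, last = text.rpartition(' ')
    if not sep:  # a single word: nothing to join
        return text
    return head + '&nbsp;' + last
```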
Working with lists of pages
Templates which render a list of content files (e.g. a list of blog posts or
pages belonging to a category) will need to filter or sort `MDCONTENT`
accordingly. To make this easier, `MDCONTENT` is wrapped in a list-like
object called `MDContentList`, which has the following methods:
General searching/filtering
Each of the following methods returns a new `MDContentList` containing those
entries for which the predicate (`pred`) is True.
- `match_entry(self, pred)`: The `pred` (i.e. predicate) is a callable which receives the full information on each entry in the `MDContentList` and returns True or False.
- `match_ctx(self, pred)`: The `pred` receives the context for each entry and returns a boolean.
- `match_page(self, pred)`: The `pred` receives the `page` object for each entry and returns a boolean.
- `match_doc(self, pred)`: The `pred` receives the markdown body for each entry and returns a boolean.
- `url_match(self, url_pred)`: The `url_pred` receives the `url` (relative to `htdocs`) for each entry and returns a boolean.
- `path_match(self, src_pred)`: The `src_pred` receives the path to the source document for each entry and returns a boolean.
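To make the predicate semantics concrete, here is a toy stand-in for `MDContentList`. The class is hypothetical and much simpler than the real one. Each entry is assumed to be a dict with `url`, `source_file`, `doc` and `data` keys, and the page metadata lives under `entry['data']['page']`, as in the `get_extra_content()` example further below. Each method hands its predicate a different part of the entry and returns a new list of the same type, so calls can be chained.

```python
class ToyContentList(list):
    """Hypothetical stand-in for MDContentList (illustration only)."""

    def match_entry(self, pred):
        return ToyContentList(e for e in self if pred(e))

    def match_ctx(self, pred):
        return self.match_entry(lambda e: pred(e['data']))

    def match_page(self, pred):
        return self.match_entry(lambda e: pred(e['data']['page']))

    def match_doc(self, pred):
        return self.match_entry(lambda e: pred(e['doc']))

    def url_match(self, url_pred):
        return self.match_entry(lambda e: url_pred(e['url']))

    def path_match(self, src_pred):
        return self.match_entry(lambda e: src_pred(e['source_file']))
```

In a template, the real methods are called on `MDCONTENT` in the same way, e.g. `MDCONTENT.url_match(lambda u: u.startswith('/blog/'))`.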
Specialized searching/filtering
All of these return a new `MDContentList` object (at least by default).
- `posts(self, ordered=True)`: Returns a new `MDContentList` with those entries which are blog posts. In practice this means those with markdown sources in the `posts/` or `blog/` subdirectories, or those which have a `page.type` of "post", "blog", "blog-entry" or "blog_entry". Normally ordered by date (newest first), but this can be turned off by setting `ordered` to False.
- `not_posts(self)`: Returns a new `MDContentList` with "pages", i.e. those entries which are not blog posts.
- `has_slug(self, sluglist)`, `has_id(self, idlist)`: Entries with specific slugs/ids.
- `in_date_range(self, start, end, date_key='DATE')`: Posts/pages with a date between `start` and `end`. The key for the date field can be specified using `date_key`. Unless the value of `date_key` is either `DATE` or `MTIME`, the key is looked up in the `page` variables for the entry.
- `has_taxonomy(self, haystack_keys, needles)`: A general search for entries belonging to a taxonomy group, such as category, tag, section or type. The `haystack_keys` are the `page` variables to examine, while `needles` is a list of the values to look for in the values of those variables. A string value for `needles` is treated as a one-item list. The search is case-insensitive.
- `in_category(self, catlist)`: A shortcut method for `self.has_taxonomy(['category', 'categories'], catlist)`.
- `has_tag(self, taglist)`: A shortcut method for `self.has_taxonomy(['tag', 'tags'], taglist)`.
- `in_section(self, sectionlist)`: A shortcut method for `self.has_taxonomy(['section', 'sections'], sectionlist)`.
- `get_used_taxonomies(self)`: Get a list of all known taxonomies that are actually used by items in this `MDContentList` (i.e. content files). These may be of two types: (1) the standard taxonomies tags, sections, categories and authors; and (2) anything defined as a `TAXONOMY` in the frontmatter of a page. Returns a list of dicts with the keys `taxon`, `name`, `name_singular` and `name_plural`. If the taxonomy belongs to the latter group, then `order`, `list_url`, `item_url_pattern` and `page_id` will be present as well, and `name_singular`/`name_plural` may be empty. If a standard taxonomy (e.g. tags) has been handled as a content page `TAXONOMY`, then the latter type takes precedence (i.e. the standard one is omitted from the list).
- `group_by(self, pred, normalize=None, keep_empty=False)`: Group items in an `MDContentList` using a given criterion. Parameters: `pred` is a callable receiving a content item and returning a string or a list of strings. For convenience, `pred` may also be specified as a string and is then interpreted as the value of the named `page` variable, e.g. `category`. `normalize` is an optional callable that transforms the grouping values, e.g. by truncating and lowercasing them. `keep_empty` should be set to True if the content items whose predicate evaluates to the empty string are to be included in the result, since they are otherwise omitted. Returns a dict whose keys are strings and whose values are `MDContentList` instances.
- `taxonomy_info(self, keys, order='count', tostring=None)`: Returns a list of dicts, where each dict corresponds to the slugified value of any of the keys in `keys`. The keys in the dict are `name`, `slug`, `forms` (different forms of `name` that appear in the result, e.g. upper/lowercase), `count`, and `items` (an `MDContentList` object). `tostring`, if present, is a callable that changes non-string and non-list values into strings for the purposes of grouping. Shorthand forms for common taxonomy types are available, namely `get_categories(self, order='name')`, `get_tags(self, order='name')`, `get_sections(self, order='name')`, and `get_authors(self, order='name', tostring=None)`. These look for both singular and plural forms of the given keys, e.g. `['tag', 'tags']` for `get_tags()`.
- `page_match(self, match_expr, ordering=None, limit=None)`: This is quite a general matching method, but it does not require the caller to pass a predicate callable to it. It can therefore be employed in more varied contexts than the general methods described in the previous section. A `match_expr` contains the filtering specification; it is described further below. The `ordering` parameter, if specified, should be either `title`, `slug`, `url` or `date`, with an optional `-` in front to indicate reverse ordering. The `date` option for `ordering` may be followed by the preferred frontmatter date field after a colon, e.g. `ordering='-date:modified_date'` for a list with the most recently changed files at the top. The `limit`, if specified, indicates the maximum number of pages to return.
- `page_match_sql()`, `get_db()`, `get_db_columns()`: See "Searching/filtering using SQL" below.
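The semantics of `has_taxonomy()` and `group_by()` can be mimicked over plain page dicts. The two functions below are hypothetical toys rather than the real methods. They operate on `page` dicts instead of full content entries and return plain lists and dicts. They do follow the rules given above: case-insensitive matching, a string `needles` treated as a one-item list, a string `pred` naming a page variable, and empty groups dropped unless `keep_empty` is set.

```python
def toy_has_taxonomy(pages, haystack_keys, needles):
    """Pages whose values for any of haystack_keys include one of needles (case-insensitive)."""
    if isinstance(needles, str):  # a string counts as a one-item list
        needles = [needles]
    wanted = {str(n).lower() for n in needles}

    def values(page):
        for key in haystack_keys:
            val = page.get(key)
            if val is None:
                continue
            for v in (val if isinstance(val, list) else [val]):
                yield str(v).lower()

    return [p for p in pages if any(v in wanted for v in values(p))]


def toy_group_by(pages, pred, normalize=None, keep_empty=False):
    """Group page dicts by pred(page), which may return a string or a list of strings."""
    if isinstance(pred, str):  # a string names a page variable
        key = pred
        pred = lambda page: page.get(key, '')
    groups = {}
    for page in pages:
        vals = pred(page)
        for v in (vals if isinstance(vals, list) else [vals]):
            v = '' if v is None else str(v)
            if normalize is not None:
                v = normalize(v)
            if v == '' and not keep_empty:
                continue
            groups.setdefault(v, []).append(page)
    return groups
```

With the real class, the equivalent calls would be `MDCONTENT.has_taxonomy(['tag', 'tags'], 'python')` and `MDCONTENT.group_by('category')`.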
A `match_expr` for `page_match()` is either a dict or a list of dicts. If it is
a dict, each page in the result set must match each of the attributes specified
in it. If it is a list of dicts, each page in the result set must match at least
one of the dicts (i.e., the returned result set contains the union of all
matches from all dicts in the list). Any string or regular expression matching
performed in this process is case-insensitive. The supported
attributes (i.e. dict keys) are as follows:
- `title`: A regular expression which will be applied to the page title.
- `slug`: A regular expression which will be applied to the slug.
- `id`: A string or list of strings (one of) which must match the page id exactly.
- `url`: A regular expression which will be applied to the target URL.
- `path`: A regular expression which will be applied to the path to the markdown source file (i.e. the `source_file_short` field).
- `doc`: A regular expression which will be applied to the body of the markdown source document.
- `date_range`: A list containing two ISO-formatted dates and optionally a date key (`DATE` by default); see the description of `in_date_range()` above.
- `has_attrs`: A list of frontmatter variable names. Matching pages must have a non-empty value for each of them.
- `attrs`: A dict where each key is the name of a frontmatter variable and the value is the value of that attribute. If the value is a string, it will be matched case-insensitively. All key-value pairs must match.
- `has_tag`, `in_section`, `in_category`: The values are lists of tags, sections or categories, respectively, at least one of which must match (case-insensitively). See the methods with these names above.
- `is_post`: If set to True, will match if the page is a blog post; if set to False, will match if the page is not a blog post.
- `exclude_url`: The page with this URL should be omitted from the results (normally the calling page).
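The key point of `match_expr` is that keys within one dict are combined with AND, while dicts in a list are combined with OR. The toy evaluator below illustrates this for a handful of keys (`title`, `has_tag`, `attrs`, `exclude_url`). It is hypothetical, not `wmk`'s implementation. In particular, it assumes that regular expressions are applied with `re.search` and that tags live under the `tags` key only.

```python
import re


def _matches(entry, expr):
    """True if entry satisfies every key in expr (AND semantics)."""
    page = entry['data']['page']
    for key, val in expr.items():
        if key == 'title':
            if not re.search(val, page.get('title', ''), re.IGNORECASE):
                return False
        elif key == 'has_tag':
            tags = {str(t).lower() for t in page.get('tags', [])}
            if not any(str(v).lower() in tags for v in val):
                return False
        elif key == 'attrs':
            for name, want in val.items():
                have = page.get(name)
                if isinstance(want, str) and isinstance(have, str):
                    if have.lower() != want.lower():
                        return False
                elif have != want:
                    return False
        elif key == 'exclude_url':
            if entry['url'] == val:
                return False
        else:
            raise ValueError('key not handled by this sketch: %s' % key)
    return True


def toy_page_match(entries, match_expr):
    # A dict must match on all its keys; a list of dicts matches if any dict does.
    exprs = match_expr if isinstance(match_expr, list) else [match_expr]
    return [e for e in entries if any(_matches(e, x) for x in exprs)]
```

A real call in a template might read `MDCONTENT.page_match({'has_tag': ['python'], 'exclude_url': page_url}, ordering='-date', limit=5)`, where `page_url` is a placeholder for the calling page's URL.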
Searching/filtering using SQL
An `MDContentList` has three methods for examining the content using an SQLite
in-memory database:
- `get_db(self)`: Builds an SQLite database containing a single table, `content`, whose structure is described below. Returns a connection to this database which can then be worked with using normal sqlite3/DB-API methods. The database has a locale-sensitive collation called `locale` (which applies `locale.strxfrm`) and a custom function `casefold` (which simply applies the Python `casefold` string method). The row factory is `sqlite3.Row`, so row fields can be read using either column names or integer indices.
- `get_db_columns(self)`: Returns a simple list of the columns in the `content` table.
- `page_match_sql(self, where_clause=None, bind=None, order_by=None, limit=None, offset=None, raw_sql=None, raw_result=False, first=False)`: Either `where_clause` or `raw_sql` must be specified. In either case, if `bind` is specified, the bind variables in it will be applied to the SQL upon execution. If `order_by` (a string), `limit` or `offset` (integers) are specified, they will be appended to the SQL before it is executed against the database connection. The result will be an `MDContentList` unless `raw_result` is True, in which case it is a cursor object. (If `raw_result` is False but `raw_sql` is supplied, the column list in the SQL select statement must include `source_file` so as to permit the construction of an appropriate `MDContentList`.) If `first` is True, only the first item from the results is returned (or None, if the results are empty).
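The database setup described for `get_db()` can be reproduced with the standard `sqlite3` module. This is useful for trying out `WHERE` and `ORDER BY` fragments outside a template. The sketch below is not `wmk`'s code. It registers an equivalent `locale` collation and `casefold` function on a tiny hand-made `content` table. That table has only a few of the real columns, and the `page_featured` field is hypothetical.

```python
import locale
import sqlite3


def locale_collation(a, b):
    """Compare two strings via locale.strxfrm; SQLite collations return -1, 0 or 1."""
    ka, kb = locale.strxfrm(a), locale.strxfrm(b)
    return (ka > kb) - (ka < kb)


conn = sqlite3.connect(':memory:')
conn.create_collation('locale', locale_collation)
conn.create_function('casefold', 1, lambda s: s.casefold() if isinstance(s, str) else s)
conn.row_factory = sqlite3.Row  # rows readable by column name or by index

# A tiny stand-in for the `content` table (the real one has more columns).
conn.execute('CREATE TABLE content (source_file TEXT, url TEXT, page_title TEXT, page_featured INTEGER)')
conn.executemany('INSERT INTO content VALUES (?, ?, ?, ?)', [
    ('content/b.md', '/b/', 'Übersicht', 1),
    ('content/a.md', '/a/', 'apples', 1),
    ('content/c.md', '/c/', 'Old Notes', 0),
])

featured = conn.execute(
    'SELECT source_file FROM content WHERE page_featured = ? '
    'ORDER BY page_title COLLATE locale', (1,)).fetchall()
old = conn.execute(
    'SELECT url FROM content WHERE casefold(page_title) = casefold(?)',
    ('old notes',)).fetchall()
```

Fragments such as `page_featured = 1` and `page_title COLLATE locale` are the kind of strings one would pass to `page_match_sql()` as `where_clause` and `order_by`.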
The `content` table constructed by `get_db()` always contains the columns
`source_file`, `source_file_short`, `url`, `target`, `template`, `MTIME`, `DATE`,
`doc`, and `rendered`. In addition, it contains each `page` metadata field that
appears in any of the entries in the `MDContentList` in question. These are
added as columns with the `page_` prefix; for instance, the `title` field
becomes `page_title`.
Note that all `page` fields added to the table must match
the regular expression `^[a-z]\w*$`. Thus, any metadata field with
a key that is all uppercase, titlecased, or contains non-word characters
(such as hyphens) will be omitted. Also, field names are case-sensitive in the
raw metadata but case-insensitive in the database table, so inconsistently
capitalized field names may lead to unexpected results.
A field value that is not a string, integer, float, boolean, date, datetime
or None will be serialized using `json.dumps()` with `ensure_ascii` set to `False`
(for easier UTF-8 matching). Dates and datetimes are stringified. Booleans are
represented as 1 or 0.
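The column and value rules above can be expressed as two small helpers. They are hypothetical restatements for illustration, not `wmk`'s table-building code. In particular, the exact stringification of datetimes (`str()` versus `isoformat()`) is an assumption. Note also that the `bool` check must come before the `int` check, because `bool` is a subclass of `int` in Python.

```python
import datetime
import json
import re

PAGE_FIELD_RE = re.compile(r'^[a-z]\w*$')


def page_columns(page):
    """Column names a page's metadata would contribute under the rule above."""
    return ['page_' + key for key in page if PAGE_FIELD_RE.match(key)]


def to_column_value(value):
    """Convert a metadata value as described above for storage in SQLite."""
    if isinstance(value, bool):  # check before int: bool is a subclass of int
        return 1 if value else 0
    if value is None or isinstance(value, (str, int, float)):
        return value
    if isinstance(value, datetime.date):  # also covers datetime objects
        return str(value)
    return json.dumps(value, ensure_ascii=False)  # lists, dicts, etc.
```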
Sorting
All of these return a new `MDContentList` object with the entries in the
specified order.
- `sorted_by(self, key, reverse=False, default_val=-1)`: A general sorting method. The `key` is the `page` variable to sort on, `default_val` is the value to assume if there is no such variable present in the entry, while `reverse` indicates whether the sort is to be descending (True) or ascending (False, the default).
- `sorted_by_date(self, newest_first=True, date_key='DATE')`: Sorting by date, newest first by default. The date key to sort on can be specified if desired.
- `sorted_by_title(self, reverse=False)`: Sorting by `page.title`, ascending by default.
Pagination
`paginate(self, pagesize=5, context=None)`: Divides the `MDContentList` into chunks of size `pagesize` and returns a tuple consisting of the chunks and a list of `page_urls` (one for each page, in order). If an appropriate template context is provided, pages 2 and up will be written to the webroot output directory, to destination files whose names are based upon the URL of the first page (and the page number, of course). Without the context, `page_urls` will be None. It is the responsibility of the calling template to check the `_page` variable for the current page to be rendered (this defaults to 1). Each iteration will get all chunks and must use this variable to limit itself appropriately.
Typical usage of `paginate()`:
<%
posts = MDCONTENT.posts()
chunks, page_urls = posts.paginate(5, context)
curpage = context.get('_page', 1)
%>
% for post in chunks[curpage-1]:
${ show_post(post) }
% endfor
% if len(chunks) > 1:
${ prevnext(len(chunks), curpage, page_urls) }
% endif
Render to an arbitrary file
`write_to(self, dest, context, extra_kwargs=None, template=None)`: Calls a template with the `MDContentList` in `self` as the value of `CHUNK` and writes the result to the file named in `dest`. The file path is relative to the webroot. Any directories are created if necessary. The `template` is by default the calling template, while `extra_kwargs` may be added if desired.
Typical usage of `write_to()`:
<%
    if not CHUNK:
        for tag in tags:
            tagged = MDCONTENT.has_tag([tag])
            if not tagged:
                continue  # avoid potential infinite loop!
            outpath = '/tags/' + slugify(tag) + '/index.html'
            tagged.write_to(outpath, context, {'TAG': tag})
%>
% if CHUNK:
${ list_tagged_pages(TAG, CHUNK) }
% else:
${ list_tags() }
% endif
<!-- site_search "Site search" 140 -->
Site search
Using Lunr
Lunr is the only search solution "natively" supported by `wmk`. That being said,
implementing site search is not a simple matter of turning Lunr indexing on. It
takes a bit of work by the author of the site or theme templates, so depending
on your needs it may even be easier to base your search functionality on another
solution.
With `lunr_index` set to true (and optionally `lunr_index_fields` specified) in
`wmk_config.yaml`, `wmk` will build a search index for Lunr.js and place it in
`idx.json` in the webroot. To minimize its size, no metadata about
each record is saved to the index. Instead, a simple list of pages (with title
and summary) is placed in `idx.summaries.json`. The summary is taken either from
one of the frontmatter fields `summary`, `intro` or `description` (in order of
preference) or, failing that, from the start of the page body.
If `lunr_languages` is present in `wmk_config.yaml`, stemming rules for those
languages will be applied when building the index. The value may be a two-letter
lowercase language code (ISO 639-1) or a list of such codes. The currently
accepted languages are `de`, `da`, `en`, `fi`, `fr`, `hu`, `it`, `nl`, `no`,
`pt`, `ro`, and `ru` (this is the intersection of the languages supported by
lunr.js and NLTK, respectively). The default language is `en`. Attempting to
specify a non-supported language will raise an exception.
The index is built via the `lunr.py` module, and the stemming support is
provided by the Python Natural Language Toolkit (NLTK).
For information about the supported syntax of the search expression, see the Lunr documentation.
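The `lunr_languages` rules above can be sketched as a small normalizing helper, usable as a pre-flight check on a configuration value. The rules are: a single code or a list, `en` as the default, and an error for anything unsupported. The helper is hypothetical; `wmk` performs its own validation when building the index.

```python
LUNR_LANGUAGES = {'de', 'da', 'en', 'fi', 'fr', 'hu', 'it', 'nl', 'no', 'pt', 'ro', 'ru'}


def check_lunr_languages(value=None):
    """Normalize a lunr_languages setting to a list, defaulting to English."""
    if value is None:
        return ['en']
    langs = [value] if isinstance(value, str) else list(value)
    unsupported = [lang for lang in langs if lang not in LUNR_LANGUAGES]
    if unsupported:
        raise ValueError('unsupported lunr_languages: ' + ', '.join(unsupported))
    return langs
```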
Limitations of Lunr
- Building the index does not mean that the search functionality is complete. It remains to point to `lunr.js` in the templates and write some JavaScript to interface with it and display the results. However, since every website is different, this cannot be provided by wmk directly. It is up to the template (or theme) author to actually load the index and present a search interface to the user.
- Similarly, if a "fancy" preview of results is required which cannot be produced from the information in `idx.summaries.json`, this must currently be solved independently by the template/theme author.
- Note that only the raw content document is indexed, not the HTML after the markdown (or other input content) has been processed. The only exception to this is that the binary input formats (DOCX, ODT, EPUB) are converted to markdown before being indexed. The output of templates (including even text resulting from shortcodes called from the content documents) is not indexed either.
- Because Lunr creates a single index file for the whole site, it may not be a practical option for large sites with lots of content; a realistic limit may be somewhere around 1,000 pages. Some other client-side search solutions break the index into smaller chunks and may therefore be a viable option for such sites.
Overview of alternative solutions
If you are looking for an alternative to Lunr, the first thing to consider is whether a server-based solution is needed or whether a JavaScript-based client-side solution would be enough.
If the site has a lot of text (more than 200,000 words or so) or if it needs to work even without JavaScript, then a server-based solution is required. You then need to decide whether you want to self-host it or if you are ready to pay for a third-party hosted solution. Meilisearch is open source and allows for self-hosting (although a hosted solution called Meilisearch Cloud is also available), while the market leader in hosted site search is probably Algolia.
If, however, a client-side JavaScript solution is sufficient, there are several alternatives to Lunr that could be considered, e.g. Pagefind, Tinysearch, Elasticlunr or Stork.
Whichever solution is picked, you of course need to add the required HTML, CSS and JavaScript to the templates for the search functionality to work. You also need to take care of updating the search index whenever the site is built.
Assuming you have opted not to use the built-in Lunr support, the index creation/updating step can basically be implemented in two ways:
- By running a command after the build step has finished, via a `cleanup_commands` entry in `wmk_config.yaml`. This calls a script or another external program which can update the index based on either the HTML in the output folder or the JSON file specified using the `mdcontent_json` configuration option.
- By implementing a hook function in `wmk_hooks.py` (or `wmk_theme_hooks.py`), most likely for `post_build_actions()` or `index_content()`; see the "Overriding and extending wmk via hooks" section below.
Example: Pagefind
Taking Pagefind as an example of the steps described above, you would, per their documentation, add something similar to this to your templates in an appropriate location:
<link href="/pagefind/pagefind-ui.css" rel="stylesheet">
<script src="/pagefind/pagefind-ui.js"></script>
<div id="search"></div>
<script>
window.addEventListener('DOMContentLoaded', (event) => {
new PagefindUI({ element: "#search", showSubResults: true });
});
</script>
It would also be a good idea to modify all base templates so as to
identify the main part of each page with the `data-pagefind-body` attribute,
and thus omit repeated elements such as navigation and footer from the index.
Finally, in order to actually create or update the search index whenever the
site is built, you would need to add the following to the `wmk_config.yaml`
file:
cleanup_commands:
- "npx -y pagefind --site htdocs"
This obviously assumes that you have npm installed on your system.
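The hook-based alternative mentioned earlier could look roughly like the following `wmk_hooks.py` sketch. It defines an `__after` hook for `post_build_actions()` (one of the hookable defs listed below) that runs the same Pagefind command as the `cleanup_commands` entry above. Use one approach or the other, not both. The sketch assumes the output directory is `htdocs` relative to the current working directory, and that `post_build_actions()` runs once the HTML has been written. The helper name `pagefind_command` is hypothetical.

```python
# py/wmk_hooks.py (sketch)
import shutil
import subprocess


def pagefind_command(site_dir='htdocs'):
    """Build the same command as the cleanup_commands entry above."""
    npx = shutil.which('npx') or 'npx'
    return [npx, '-y', 'pagefind', '--site', site_dir]


def post_build_actions__after(result):
    # Runs after wmk's post_build_actions(); its return value is passed through.
    subprocess.run(pagefind_command(), check=True)
    return result
```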
<!-- hooks "Overriding and extending wmk via hooks" 150 -->Overriding and extending wmk via hooks
Much of the functionality of `wmk` can be changed by overriding or extending
specific steps it performs. This is done by adding Python code to a file named
`wmk_hooks.py` in the project `py/` directory. Themes can do the same thing via
the `wmk_theme_hooks.py` file in the theme's `py/` directory. If both try to
affect the same functionality, the project directory takes precedence.
Currently, the following defs from `wmk.py` can be extended by running hooks
before or after them, or can be redefined entirely:
auto_nav_from_content
binary_to_markdown
build_lunr_index
copy_static_files
doc_with_yaml
fingerprint_assets
generate_summary
get_assets_map
get_content_extensions
get_content
get_extra_content
get_index_yaml_data
get_nav
get_template_lookup
get_template_vars
get_templates
handle_redirects
handle_shortcode
handle_taxonomy
index_content
locale_and_translation
lunr_summary
markdown_extensions_settings
maybe_extra_meta
maybe_save_mdcontent_as_json
pandoc_extra_formats
pandoc_metadata
parse_dates
post_build_actions
postprocess_html
preferred_date
process_assets
process_content_item
process_markdown_content
process_templates
render_markdown
run_init_commands
run_cleanup_commands
write_redir_file
In order to override any of these entirely, define a function of the same name in the hooks file. You may also define a function that runs before or after the original:
- A function that runs before any of the above has the same name but with __before appended, e.g. index_content__before. It receives the arguments passed to the original function and may return modified arguments, either as a two-tuple of a list and a dict (for *args and **kwargs) or as a single dict (for **kwargs only). In either case, these are passed to the affected function instead of the original arguments. If the before hook returns nothing, the original arguments are passed on unchanged.
- A function that runs after any of the above has the same name but with __after appended, e.g. index_content__after. It receives the return value of the original function and can return a new value, which is then returned to the caller instead. (If it returns nothing, the original return value is returned unchanged.)
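The following minimal sketch makes this calling convention concrete by implementing a dispatcher that follows it. It is not wmk's own code, and call_with_hooks() and the hooks dictionary are hypothetical. The sketch also assumes that when a before hook returns a single dict, that dict replaces all of the original arguments, so the function is then called with keyword arguments only.

```python
from typing import Any, Callable, Dict


def call_with_hooks(func: Callable, hooks: Dict[str, Callable], *args, **kwargs) -> Any:
    """Run func (or its replacement) with optional <name>__before / <name>__after hooks."""
    name = func.__name__
    before = hooks.get(name + '__before')
    if before is not None:
        new_args = before(*args, **kwargs)
        if isinstance(new_args, tuple):
            # Two-tuple of (list, dict): new *args and **kwargs.
            args, kwargs = tuple(new_args[0]), dict(new_args[1])
        elif isinstance(new_args, dict):
            # Single dict: new **kwargs, no positional arguments (assumption).
            args, kwargs = (), new_args
        # None: the original arguments are passed on unchanged.
    # A function of the same name in the hooks file replaces the original entirely.
    result = hooks.get(name, func)(*args, **kwargs)
    after = hooks.get(name + '__after')
    if after is not None:
        new_result = after(result)
        if new_result is not None:
            result = new_result
    return result
```

One consequence of this convention is that an __after hook cannot make the function return None. Returning None means "keep the original return value".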
You should examine wmk's source code to make sure that any replacement function you write is compatible with the original in terms of its parameters and possible return values. Updates to wmk may of course make it necessary to change your hook functions.
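One way to catch parameter mismatches early is to compare signatures with Python's inspect module. The helper below is a hypothetical convenience and is not part of wmk. It compares only parameter names and kinds, not return values.

```python
import inspect
from typing import Callable, Optional


def signature_mismatch(original: Callable, replacement: Callable) -> Optional[str]:
    """Describe how the parameters of replacement differ from original, or return None."""
    def params(f):
        # (name, kind) pairs, e.g. ('meta', 'POSITIONAL_OR_KEYWORD')
        return [(p.name, p.kind.name) for p in inspect.signature(f).parameters.values()]

    expected, actual = params(original), params(replacement)
    if expected == actual:
        return None
    return f"{original.__name__}: expected {expected}, got {actual}"
```

For example, after upgrading wmk you could call signature_mismatch(wmk.maybe_extra_meta, wmk_hooks.maybe_extra_meta) for each function you have replaced. This assumes both modules can be imported in the same Python session.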
Examples
Here is a generic get_extra_content() def which adds HTML pages fetched from a database to the "normal" content from the content/ directory:
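import re

# Assumption: the name wmk refers to the wmk module (e.g. via "import wmk").


def get_extra_content(content, ctdir, datadir, outputdir, template_vars, conf):
    # IDs of the pages already loaded from content/ (passed on to process_content_item)
    known_ids = {item['data']['page']['id'] for item in content}
    # Register .html as a content extension, to be handled as raw HTML
    content_extensions = {'.html': {'raw': True}}
    extpat = re.compile(r'\.html$')
    result = _get_articles_from_database()
    for i, row in enumerate(result):
        # Turn each row into frontmatter, document body and pseudo file paths
        meta, doc, pseudo = _munge_row(row, i, result, ctdir)
        wmk.process_content_item(
            meta, doc, content, conf, template_vars,
            ctdir, outputdir, datadir, content_extensions, known_ids,
            pseudo['root'], pseudo['fn'],
            pseudo['source_file'], pseudo['source_file_short'],
            extpat, False)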
The functions _get_articles_from_database() and _munge_row() are left as an exercise for the reader.
Here is an __after hook for maybe_extra_meta() which fetches a conference schedule (e.g. from an online calendar) if the conference_id key is present in the frontmatter. The retrieved information will then be available to the templates for that page as page.schedule.
def maybe_extra_meta__after(meta):
    if 'conference_id' in meta:
        # _get_conference_schedule() is your own helper, defined elsewhere in wmk_hooks.py
        meta['schedule'] = _get_conference_schedule(meta['conference_id'])
    return meta
A third example: let's say you want to show information from a few RSS sources in a sidebar that appears on several pages. To avoid refetching it for each page, you can use something like this:
def get_template_vars__after(template_vars):
    if 'rss_sources' in template_vars:
        # fetch_rss_feeds() is your own helper, defined elsewhere in wmk_hooks.py
        template_vars['rss_info'] = fetch_rss_feeds(template_vars['rss_sources'])
    return template_vars
This assumes that you set rss_sources in the template_context section of your wmk_config.yaml file.
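The fetch_rss_feeds() helper is not part of wmk. A minimal sketch using only the standard library might look like the code below. It assumes that rss_sources is a list of RSS 2.0 feed URLs, and it keeps only the title and link of the first few items of each feed. A feed that cannot be fetched or parsed yields an empty list instead of aborting the build.

```python
import urllib.request
import xml.etree.ElementTree as ET


def parse_rss(xml_data, limit=5):
    """Return up to `limit` {'title', 'link'} dicts from an RSS 2.0 document."""
    root = ET.fromstring(xml_data)
    items = []
    for item in root.iter('item'):
        if len(items) >= limit:
            break
        items.append({
            'title': (item.findtext('title') or '').strip(),
            'link': (item.findtext('link') or '').strip(),
        })
    return items


def fetch_rss_feeds(sources, limit=5):
    """Fetch each feed URL once and return a {url: [items]} mapping."""
    info = {}
    for url in sources:
        try:
            with urllib.request.urlopen(url, timeout=10) as resp:
                info[url] = parse_rss(resp.read(), limit=limit)
        except (OSError, ET.ParseError):
            # Network errors (URLError is an OSError) or malformed XML: show nothing for this feed.
            info[url] = []
    return info
```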
Incorporating external sources
A wmk-maintained website may incorporate material that does not originate as content files in the site's content/ directory. The source of the material may be a database or an external API, perhaps provided by a headless CMS such as Sanity, Directus, or DatoCMS.
In either case, there are two main approaches to integrating such content into a wmk site. The first is to use the hooks system described earlier, especially get_extra_content(). The second is to fetch the material independently of wmk (or perhaps from the init_commands that can be specified in the configuration file) and write it as a set of HTML or markdown files into content/, whereupon wmk can treat it as normal file-based content.
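For the second approach, the essential step is writing each fetched item as a file with YAML frontmatter. The sketch below illustrates one way to do this and is not something shipped with wmk; the function names are hypothetical. To avoid a PyYAML dependency, it writes each metadata value with json.dumps. This relies on the fact that JSON-encoded simple values such as strings, integers, booleans and lists of those are, in practice, also valid YAML.

```python
import json
import os


def render_content_file(meta, body):
    """Return markdown/HTML text with the metadata as YAML frontmatter."""
    lines = ['---']
    for key, value in meta.items():
        # json.dumps quotes and escapes simple values in a YAML-compatible way.
        lines.append(f'{key}: {json.dumps(value, ensure_ascii=False)}')
    lines.append('---')
    return '\n'.join(lines) + '\n\n' + body.strip() + '\n'


def write_content_file(ctdir, slug, meta, body, ext='.md'):
    """Write one content file, e.g. content/news/<slug>.md, creating directories as needed."""
    path = os.path.join(ctdir, slug + ext)
    os.makedirs(os.path.dirname(path) or '.', exist_ok=True)
    with open(path, 'w', encoding='utf-8') as f:
        f.write(render_content_file(meta, body))
    return path
```

You could run such a script by hand or list it among the init_commands so that it runs before each build.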
Example: Import from WordPress
As an example of the latter approach, a set of scripts is available in the extras/ subdirectory to fetch and maintain content from a WordPress site. The script wordpress2content.py uses the WordPress REST API to get posts and pages from a WordPress site and export them as content files in content/. Images and other media files from the origin's wp-content/uploads/ folder go into static/_fetched/.
This can be used either to migrate from WordPress to a static site maintained by wmk, or to use a (possibly non-public) WordPress installation as a headless CMS for external authors or non-technical users.
When used in the latter way, the helper scripts duplicate_wp_content.py and removed_wp_content.py may help with the housekeeping involved in keeping the content properly synchronized. For further details, see the readme in the extras/ directory.
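The following sketch gives an idea of what a script like wordpress2content.py has to do, although it is not that script's actual code. It pages through the standard WordPress REST API endpoint /wp-json/wp/v2/posts and maps each post to a slug, some frontmatter and an HTML body. The frontmatter keys chosen here (title, date, wp_id) are assumptions.

```python
import html
import json
import urllib.parse
import urllib.request


def fetch_wp_posts(site_url, per_page=100):
    """Yield all publicly visible posts from a WordPress site, one result page at a time."""
    page = 1
    while True:
        query = urllib.parse.urlencode({'per_page': per_page, 'page': page})
        url = f"{site_url.rstrip('/')}/wp-json/wp/v2/posts?{query}"
        with urllib.request.urlopen(url, timeout=30) as resp:
            posts = json.load(resp)
            # WordPress reports the number of result pages in this response header.
            total_pages = int(resp.headers.get('X-WP-TotalPages', '1'))
        yield from posts
        if page >= total_pages:
            break
        page += 1


def post_to_content(post):
    """Map one REST API post object to (slug, frontmatter dict, HTML body)."""
    meta = {
        # title.rendered may contain HTML entities such as &amp; or &#8217;
        'title': html.unescape(post['title']['rendered']),
        'date': post['date'],
        'wp_id': post['id'],
    }
    return post['slug'], meta, post['content']['rendered']
```

The same pagination pattern works for the /wp-json/wp/v2/pages endpoint. Note that the API caps per_page at 100.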