Paru—Pandoc wrapped around in Ruby
Introduction
Paru is a simple Ruby wrapper around pandoc, the great multi-format document converter. Paru supports automating pandoc by writing Ruby programs and using pandoc in your Ruby programs (see Chapter 2 in the manual). Paru also supports writing pandoc filters in Ruby (see Chapter 3 in the manual). In paru’s manual the use of paru is explained in detail, from explaining how to install and use paru, creating and using filters, to putting it all together in a real-world use case: generating the manual!
See also the paru API documentation.
Note: If you’re using pandoc 3, use paru version 1.1.x or higher; paru 1.0.x doesn’t work with pandoc 3. If you’re still using pandoc version 2, use paru version 1.0.x instead.
This README is a brief overview of paru’s features and usage.
Licence
Paru is free software; paru is released under the GPLv3. You can find paru’s source code on GitHub.
Acknowledgements
I would like to thank the following users for their contributions of patches, bug reports, fixes, and suggestions. With your help paru is growing beyond a simple tool for personal use into a useful addition to the pandoc ecosystem.
Installation
Paru is installed through rubygems as follows:
gem install paru
You can also build and install the latest version of the gem yourself by running the following commands:
cd /path/to/paru/repository
bundle install
rake build
gem install pkg/paru-1.4.1.gem
Paru, obviously, requires pandoc. See https://pandoc.org/installing.html for how to install pandoc on your system, and pandoc’s manual for how to use pandoc.
You can generate the API documentation for paru by cloning the repository and running rake yard. It’ll put it in documentation/api-doc.
Paru says hello to pandoc
Using paru is straightforward. It is a thin “rubyesque” layer around the pandoc executable. After requiring paru in your Ruby program, you create a new paru pandoc converter as follows:
require "paru/pandoc"
converter = Paru::Pandoc.new
The various command-line options of pandoc map to methods on this newly created instance. When you want to use a pandoc command-line option that contains dashes, replace each dash with an underscore to get the corresponding paru method. For example, the pandoc command-line option --pdf-engine becomes the paru method pdf_engine. Knowing this convention, you can convert from markdown to pdf using the lualatex engine by calling the from, to, and pdf_engine methods to configure the converter. There is a convenience configure method that takes a block to configure multiple options at once:
require "paru/pandoc"
converter = Paru::Pandoc.new
converter.configure do
  from "markdown"
  to "latex"
  pdf_engine "lualatex"
  output "my_first_pdf_file.pdf"
end
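The naming convention and the configuration above can be spelled out in plain Ruby. The following is only a sketch of the convention, not paru's implementation: the helpers option_to_method, method_to_option, and equivalent_command are hypothetical names introduced here, and the options are passed as a plain Hash rather than through a converter.

```ruby
# Illustration only: this is NOT how paru is implemented. It just writes
# the dash/underscore naming convention out in plain Ruby.

# Map a pandoc long option to the corresponding paru method name,
# for example "--pdf-engine" becomes "pdf_engine".
def option_to_method(option)
  option.delete_prefix("--").tr("-", "_")
end

# Map a method name back to the pandoc long option it stands for.
def method_to_option(method_name)
  "--#{method_name.to_s.tr('_', '-')}"
end

# Build the pandoc command line that a set of options corresponds to
# (hypothetical helper; the options are given as a plain Hash).
def equivalent_command(options)
  ["pandoc"] + options.map { |name, value| "#{method_to_option(name)}=#{value}" }
end

command = equivalent_command(
  from: "markdown",
  to: "latex",
  pdf_engine: "lualatex",
  output: "my_first_pdf_file.pdf"
)
puts command.join(" ")
```

The printed line is the pandoc invocation you would otherwise type by hand, which can be handy when checking that a configuration block says what you intended.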
As creating and immediately configuring a converter is a common pattern, the constructor takes a configuration block as well. Finally, when you have configured the converter, you can use it to convert a string with the convert method, which is aliased by the << operator. You can call convert multiple times and re-configure the converter in between.
This introductory section ends with the obligatory “hello world” program, paru-style:
#!/usr/bin/env ruby
require "paru/pandoc"

input = "Hello world, from **pandoc**"
output = Paru::Pandoc.new do
  from "markdown"
  to "html"
end << input
puts output
Running the above program results in the following output:
<p>Hello world, from <strong>pandoc</strong></p>
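Because a configured converter can be used more than once, one converter can serve several inputs. A minimal sketch, assuming paru and pandoc are installed when html_converter is actually called; html_converter and convert_all are hypothetical helper names, not part of paru:

```ruby
# Build a markdown-to-HTML converter. paru is required lazily, inside the
# method, so this file still loads on a machine without paru installed.
def html_converter
  require "paru/pandoc"
  Paru::Pandoc.new do
    from "markdown"
    to "html"
  end
end

# Convert every snippet with the same converter. `converter` can be any
# object that responds to <<, which is how the converter is used in the
# hello world program.
def convert_all(snippets, converter = html_converter)
  snippets.map { |snippet| converter << snippet }
end

# Example call (needs paru and pandoc installed):
# puts convert_all(["Hello **world**", "Goodbye *world*"])
```

Passing the converter as an argument keeps the helper easy to exercise with a stand-in object, and lets you hand in a converter that you have re-configured between calls.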
To support converting files that cannot easily be represented by a single string, such as EPUB or docx, paru also has the convert_file method. It takes a path as argument, and when executed, it tells pandoc to convert the file at that path using the converter’s current configuration.
In the next chapter, the development of do-pandoc.rb is presented as an example of real-world usage of paru.
Writing and using pandoc filters with paru
One of pandoc’s interesting capabilities is its support for custom filters. This is an extremely powerful feature that allows you to automate certain tasks, such as numbering figures, using other command-line programs to pre- or post-process parts of the input, or changing the structure of the input document before pandoc writes it out. Paru allows you to write pandoc filters in Ruby.
For a collection of paru filters, have a look at the paru-filter-collection.
The simplest paru pandoc filter is the identity filter, which does nothing:
#!/usr/bin/env ruby
# Identity filter
require "paru/filter"

Paru::Filter.run do
  # nothing
end
Nevertheless, it shows the structure of every paru pandoc filter: A filter is an executable script (line 1), it uses the paru/filter module, and it executes a Paru::Filter object. Running the identity filter is a good way to start writing your own filters. In the next sections several simple but useful filters are developed to showcase the use of paru to write pandoc filters in Ruby.
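Paru hides the plumbing around a filter. As a rough sketch of what surrounds an identity filter, assuming pandoc's JSON filter protocol (pandoc writes the document as JSON to the filter's standard input and reads the result back from its standard output), the same idea can be written with only Ruby's standard library. This is not paru's implementation; identity_filter and the sample document, including its API version, are made up for illustration.

```ruby
require "json"
require "stringio"

# Read a pandoc JSON document from `input`, leave it untouched, and write
# it to `output`. Hypothetical helper, not part of paru.
def identity_filter(input = $stdin, output = $stdout)
  document = JSON.parse(input.read)
  # A filter that changes something would modify document["blocks"] here.
  output.write(JSON.generate(document))
end

# Try it on a tiny hand-written document instead of real pandoc output.
sample = '{"pandoc-api-version":[1,23,1],"meta":{},' \
         '"blocks":[{"t":"Para","c":[{"t":"Str","c":"Hello"}]}]}'
result = StringIO.new
identity_filter(StringIO.new(sample), result)
puts result.string
```

With paru you work with node objects such as Image instead of raw JSON, which is what the filters in the following examples do.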
A more useful filter numbers figures. In some output formats, such as PDF, HTML + CSS, or ODT, figures can be numbered automatically. In other formats, notably markdown itself, numbering has to be done manually. However, it is very easy to create a filter that does this numbering of figures automatically as well:
#!/usr/bin/env ruby
# Number all figures in a document and prefix the caption with "Figure".
require "paru/filter"

figure_counter = 0

Paru::Filter.run do
  with "Image" do |image|
    figure_counter += 1
    image.inner_markdown = "Figure #{figure_counter}. #{image.inner_markdown}"
  end
end
The filter number_figures.rb keeps track of the last figure’s sequence number in figure_counter. Each time an Image is encountered while processing the input file, that counter is incremented and the image’s caption is prefixed with “Figure #{figure_counter}.” by overwriting the image node’s inner markdown.
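The counting logic itself does not depend on paru. The sketch below applies the same pattern (a counter defined outside the block and incremented inside it) to a plain list of captions; number_captions and the captions are made up for illustration and stand in for the images of a document.

```ruby
# Prefix each caption with "Figure N.", numbering in order of appearance.
# Hypothetical helper that mimics what the filter does to each Image.
def number_captions(captions)
  figure_counter = 0
  captions.map do |caption|
    figure_counter += 1
    "Figure #{figure_counter}. #{caption}"
  end
end

puts number_captions(["A cat", "A dog", "A bird"])
# Prints:
# Figure 1. A cat
# Figure 2. A dog
# Figure 3. A bird
```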
For more information about writing filters, please see paru’s manual or the API documentation for the Filter class. Furthermore, example filters can also be found in the filters subdirectory of paru’s examples. Feel free to copy and adapt them to your needs.
Documentation
Manual
For more information on automating the use of pandoc with paru or writing pandoc filters in Ruby, please see paru’s manual.
API documentation
The API documentation covers the whole of paru. Where the manual just describes a couple of scenarios, the API documentation shows all available functionality. It also gives more examples of using paru and writing filters.
Frequently asked questions
Feel free to ask me a question: send me an email or submit a new issue if you’ve found a bug!
- I get an error like “Error: JSON parse error: Error in $: Incompatible API versions: encoded with [1,20] but attempted to decode with [1,21].”
The versions of pandoc and paru you are using are incompatible. Please install the latest versions of pandoc and paru.
Why does this happen? Internally pandoc uses pandoc-types to represent the documents it converts and filters. Documents represented by one version of pandoc-types are slightly incompatible with documents represented by another version of pandoc-types. This also means that filters written in paru for one version of pandoc-types are not guaranteed to work on documents represented by another version of pandoc-types. As a result, not all paru versions work together with all pandoc versions.
As a general rule: Use the latest versions of pandoc and paru.
- I get an error like “‘values_at’: no implicit conversion of String into Integer (TypeError) from lib/paru/filter/document.rb:54:in ‘from_JSON’”
The most likely cause is that you’re using an old version of Pandoc. Paru version 0.2.x only supports pandoc version 1.18 and up. In pandoc version 1.18 there was a breaking API change in the way filters worked. Please upgrade your pandoc installation.
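When diagnosing the “Incompatible API versions” error from the first question, it can help to look at the API version recorded in a document's JSON representation (as produced by pandoc with -t json). The sketch below uses only Ruby's standard library. It assumes that two versions count as compatible when their first two components agree, which matches the numbers in the error message but is an assumption here; api_version and compatible? are hypothetical helpers and the sample document is hand-written.

```ruby
require "json"

# Extract the pandoc-types API version recorded in a pandoc JSON document.
def api_version(json_text)
  JSON.parse(json_text).fetch("pandoc-api-version")
end

# Assumption: versions are compatible when major and minor components agree.
def compatible?(encoded, expected)
  encoded.first(2) == expected.first(2)
end

# Hand-written sample, encoded with API version [1,20] as in the error message.
document = '{"pandoc-api-version":[1,20],"meta":{},"blocks":[]}'
encoded = api_version(document)

puts "encoded with #{encoded.inspect}"
puts compatible?(encoded, [1, 21]) # the mismatch from the error message
puts compatible?(encoded, [1, 20])
```

If the versions differ, the fix remains the one given above: install the latest versions of pandoc and paru.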