YaLafi: Yet another LaTeX filter

***<br> *** THIS REPOSITORY HAS BEEN ARCHIVED<br> ***<br> *** Development continues under <ins>torik42/YaLafi</ins> since August 2022.<br> *** Thank you sincerely, <ins>@torik42</ins>, for taking over!<br> ***

Notice. The library of LaTeX macros, environments, document classes, and packages is still rather restricted; compare the list of macros. Please don't hesitate to raise an Issue if you would like to have something added.

Summary. This Python package extracts plain text from LaTeX documents. The software may be integrated with a proofreading tool and an editor. It provides

The sample Python application yalafi.shell from section Example application integrates the LaTeX filter with the proofreading software LanguageTool. It sends the extracted plain text to the proofreader, maps position information in returned messages back to the LaTeX text, and generates results in different formats. You may easily

For instance, the LaTeX input

Only few people\footnote{We use
\textcolor{red}{redx colour.}}
is lazy.

will lead to the text report

1.) Line 2, column 17, Rule ID: MORFOLOGIK_RULE_EN_GB
Message: Possible spelling mistake found
Suggestion: red; Rex; reds; redo; Red; Rede; redox; red x
Only few people is lazy.    We use redx colour. 
                                   ^^^^
2.) Line 3, column 1, Rule ID: PEOPLE_VBZ[1]
Message: If 'people' is plural here, don't use the third-person singular verb.
Suggestion: am; are; aren
Only few people is lazy.    We use redx colour. 
                ^^

<a name="example-html-report"></a> This is the corresponding HTML report (for an example with a Vim plugin, see here):

HTML report

The tool builds on results from project Tex2txt, but differs in the internal processing method. Instead of using recursive regular expressions, a simple tokeniser and a small machinery for macro expansion are implemented; see sections Differences to Tex2txt and Remarks on implementation.

Besides the interface from section Python package interface, application Python scripts like yalafi/shell/shell.py from section Example application can access an interface emulating tex2txt.py from repository Tex2txt by 'from yalafi import tex2txt'. The pure LaTeX filter can be directly used in scripts via a command-line interface; it is described in section Command-line of pure filter.

If you use this software and encounter a bug or have other suggestions for improvement, please leave a note under category Issues, or initiate a pull request. Many thanks in advance.

Happy TeXing!

Contents

Installation<br> Example application<br> Interfaces to Vim<br> Interface to Emacs<br> Interface to Atom<br> Usage under Windows<br> Related projects<br> <br> Filter actions<br> Fundamental limitations<br> Adaptation of LaTeX and plain text<br> Extension modules for LaTeX packages<br> Inclusion of own macros<br> <br> Multi-file projects<br> Handling of displayed equations<br> Multi-language documents<br> Python package interface<br> Command-line of pure filter<br> Differences to Tex2txt<br> Remarks on implementation

Installation

YaLafi (requires at least Python version 3.6). Choose one of the following possibilities.

LanguageTool. On most systems, you have to install the software “manually” (1). At least under Arch Linux, you can also use a package manager (2). Please note that, for example under Ubuntu, sudo snap install languagetool will not install the components required here.

  1. The LanguageTool zip archive, for example LanguageTool-5.0.zip, can be obtained from the LanguageTool download page. Option --lt-directory of application yalafi.shell from section Example application has to point to the directory created after uncompressing the archive at a suitable place. For instance, the directory has to contain file 'languagetool-server.jar'.

  2. Under Arch Linux, you can simply say sudo pacman -S languagetool. In this case, it is not necessary to set option --lt-directory from variant 1. Instead, you have to specify --lt-command languagetool.

Back to contents

Example application

Remark. You can find examples for tool integration with Bash scripts in Tex2txt/README.md.

Example Python script yalafi/shell/shell.py will generate a proofreading report in text or HTML format from filtering the LaTeX input and application of LanguageTool (LT). It is best called as a module, as shown below, but can also be placed elsewhere and invoked as a script. A simple invocation producing an HTML report could be:

python -m yalafi.shell --lt-directory ~/lib/LT --output html t.tex > t.html

With option '--server lt', LT's Web server is contacted. Otherwise, Java has to be present, and the path to LT has to be specified with --lt-directory or --lt-command. Note that from version 4.8, LT no longer fully supports 32-bit systems. Both LT and the script will print some progress messages to stderr. They can be suppressed with python ... 2>/dev/null.

python -m yalafi.shell [OPTIONS] latex_file [latex_file ...] [> text_or_html_file]

Option names may be abbreviated. If present, options are also read from a configuration file designated by script variable 'config_file' (one option per line, possibly with argument), unless --no-config is given. Default option values are set at the beginning of the Python script.
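The configuration file format mentioned above (one option per line, possibly with argument) can be pictured with a short Python sketch. It is only an illustration and not the code used by yalafi.shell: the function name read_config_options and the use of shlex for splitting a line are assumptions made here.

```python
import shlex

def read_config_options(text):
    """Turn configuration text (one option per line) into an argument list.

    Hypothetical helper for illustration; yalafi.shell may read its
    configuration file differently.
    """
    args = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue                      # skip empty lines
        args.extend(shlex.split(line))    # option name plus optional argument
    return args

# The resulting list could be placed in front of the command-line arguments.
config_text = '--output html\n--single-letters "A|a|I"\n--multi-language\n'
print(read_config_options(config_text))
```

Putting the file options first would let options given on the command line take precedence, which is an assumption about the desired behaviour.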

<a name="dictionary-adaptation"></a> Dictionary adaptation. LT evaluates the two files 'spelling.txt' and 'prohibit.txt' in directory

.../LanguageTool-?.?/org/languagetool/resource/<lang-code>/hunspell/

Additional words and words that shall raise an error can be appended here. LT version 4.8 introduced additional files 'spelling_custom.txt' and 'prohibit_custom.txt'.

HTML report. The idea of an HTML report goes back to Sylvain Hallé, who developed TeXtidote. Opened in a Web browser, the report displays excerpts from the original LaTeX text, highlighting the problems indicated by LT. The corresponding LT messages can be viewed when hovering the mouse over these marked places, see the introductory example above. With option --link, Web links provided by LT can be directly opened with left-click. Script option --context controls the number of lines displayed around each tagged region; a negative option value will show the complete LaTeX input text. If the localisation of a problem is unsure, highlighting will use yellow instead of orange colour. For simplicity, marked text regions that intertwine with other ones are separately repeated at the end. In case of multiple input files, the HTML report starts with an index.

Back to contents

Interfaces to Vim

As Vim is a great editor, there are several possibilities that build on existing Vim plugins or use Vim's compiler interface:

plugin vimtex | “plain Vim” | plugin vim-grammarous | plugin vim-LanguageTool | plugin ALE

Plugin vimtex

The Vim plugin vimtex provides comprehensive support for writing LaTeX documents. It includes an interface to YaLafi; documentation is available with :help vimtex-grammar-vlty. A copy of the corresponding Vim compiler script is editors/vlty.vim.

The following snippet demonstrates a basic vimrc setting and some useful values for vlty option field 'shell_options'.

map <F9> :w <bar> compiler vlty <bar> make <bar> :cw <cr><esc>
let g:tex_flavor = 'latex'
set spelllang=de_DE
let g:vimtex_grammar_vlty = {}
let g:vimtex_grammar_vlty.lt_directory = '~/lib/LanguageTool-5.0'
" let g:vimtex_grammar_vlty.lt_command = 'languagetool'
let g:vimtex_grammar_vlty.server = 'my'
let g:vimtex_grammar_vlty.show_suggestions = 1
let g:vimtex_grammar_vlty.shell_options =
        \   ' --multi-language'
        \ . ' --packages "*"'
        \ . ' --define ~/vlty/defs.tex'
        \ . ' --replace ~/vlty/repls.txt'
        \ . ' --equation-punctuation display'
        \ . ' --single-letters "i.\,A.\|z.\,B.\|\|"'

A file passed with option --define, like ~/vlty/defs.tex above, could for instance contain these LaTeX definitions:

    \newcommand{\zB}{z.\,B. }   % LanguageTool correctly insists on
                                % narrow space in this German abbreviation
    \newtheorem{Satz}{Satz}     % correctly expand \begin{Satz}[siehe ...]
    \LTinput{main.glsdefs}      % read database of glossaries package

<a name="example-vimtex-plugin"></a> Here is the introductory example from above:

Vim plugin vim-vimtex

“Plain Vim”

File editors/ltyc.vim proposes a simple application to Vim's compiler interface. The file has to be copied to a directory like ~/.vim/compiler/.

For a Vim session, the component is activated with :compiler ltyc. Command :make invokes yalafi.shell, and the cursor is set to the first indicated problem. The related error message is displayed in the status line. Navigation between errors is possible with :cn and :cp; an error list is shown with :cl. The quickfix window appears on :cw.

The following snippet demonstrates a basic vimrc setting and some useful values for option 'ltyc_shelloptions'. Please refer to section Plugin vimtex for related comments.

map <F9> :w <bar> compiler ltyc <bar> make <bar> :cw <cr><esc>
let g:ltyc_ltdirectory = '~/lib/LanguageTool-5.0'
" let g:ltyc_ltcommand = 'languagetool'
let g:ltyc_server = 'my'
let g:ltyc_showsuggestions = 1
let g:ltyc_language = 'de-DE'
let g:ltyc_shelloptions =
        \   ' --multi-language'
        \ . ' --replace ~/ltyc/repls.txt'
        \ . ' --define ~/ltyc/defs.tex'
        \ . ' --equation-punctuation display'
        \ . ' --single-letters "i.\,A.\|z.\,B.\|\|"'
compiler ltyc

The screenshot resembles that from section Plugin vimtex.

Plugin vim-grammarous

For the Vim plugin vim-grammarous, it is possible to provide an interface for checking LaTeX texts. With an entry in ~/.vimrc, one may simply replace the command that invokes LanguageTool. For instance, you can add to ~/.vimrc

let g:grammarous#languagetool_cmd = '/home/foo/bin/yalafi-grammarous'
map <F9> :GrammarousCheck --lang=en-GB<CR>

A proposal for Bash script /home/foo/bin/yalafi-grammarous (replace foo with username ;-) is given in editors/yalafi-grammarous. It has to be made executable with chmod +x .... Please adapt script variable ltdir, compare option --lt-directory in section Example application. If you do not want a local LT server to be started, comment out the line defining script variable use_server.

In order to avoid the problem described in Issue #89@vim-grammarous (shifted error highlighting, if after non-ASCII character on same line), you can set output=xml-b in yalafi-grammarous.

<a name="troubleshooting-for-vim-interface"></a> Troubleshooting for Vim interface. If Vim reports a problem with running LT, you can do the following. In ~/bin/yalafi-grammarous, comment out the final ... 2>/dev/null. For instance, you can just place a '#' in front: ... # 2>/dev/null. Then start, with a test file t.tex,

$ ~/bin/yalafi-grammarous t.tex

This should display some error message, if the problem goes back to running the script, Python, yalafi.shell or LanguageTool.

Here is the introductory example from above:

Vim plugin vim-grammarous

Plugin vim-LanguageTool

The Vim plugin vim-LanguageTool relies on the same XML interface to LanguageTool as the variant in section Plugin vim-grammarous. Therefore, one can reuse the Bash script editors/yalafi-grammarous. You can add to ~/.vimrc

let g:languagetool_cmd = '$HOME/bin/yalafi-grammarous'
let g:languagetool_lang = 'en-GB'
let g:languagetool_disable_rules = 'WHITESPACE_RULE'
map <F9> :LanguageToolCheck<CR>

Please note the general problem indicated in Issue #17. Here is again the introductory example from above. Navigation between highlighted text parts is possible with :lne and :lp.

Vim plugin vim-LanguageTool

Plugin ALE

With ALE, the proofreader ('linter') is by default invoked as a background task whenever one leaves insert mode. You might add to ~/.vimrc

" if not yet set:
filetype plugin on
" F9: show detailed LT message for error under cursor, is left with 'q'
map <F9> :ALEDetail<CR>
" this turns off all other tex linters
let g:ale_linters = { 'plaintex': ['lty'], 'tex': ['lty'] }
" default place of LT installation: '~/lib/LanguageTool'
let g:ale_tex_lty_ltdirectory = '~/lib/LanguageTool-4.7'
" uncomment the following assignment, if LT has been installed via package
" manager; in this case, g:ale_tex_lty_ltdirectory hasn't to be specified
" let g:ale_tex_lty_command = 'languagetool'
" set to '' to disable server usage or to 'lt' for LT's Web server
let g:ale_tex_lty_server = 'my'
" default language: 'en-GB'
let g:ale_tex_lty_language = 'en-GB'
" default disabled LT rules: 'WHITESPACE_RULE'
let g:ale_tex_lty_disable = 'WHITESPACE_RULE'

Similarly to setting 'g:ale_tex_lty_disable', one can specify LT's options --enable, --disablecategories, and --enablecategories. Further options for yalafi.shell (compare section Plugin vimtex) may be passed like

let g:ale_tex_lty_shelloptions = '--single-letters "A|a|I|e.g.|i.e.||"'
                \ . ' --equation-punctuation display'

Additionally, one has to install ALE and copy or link file editors/lty.vim to directory ~/.vim/bundle/ale/ale_linters/tex/, or a similar location.

Here is again the introductory example from above. The complete message for the error at the cursor is displayed on F9, together with LT's rule ID, replacement suggestions, and the problem context (left with q). Navigation between highlighted text parts is possible with :lne and :lp; an error list is shown with :lli.

Vim plugin ALE

Back to contents

Interface to Emacs

The Emacs plugin Emacs-langtool may be used in two variants. First, you can add to ~/.emacs

(setq langtool-bin "/home/foo/bin/yalafi-emacs")
(setq langtool-default-language "en-GB")
(setq langtool-disabled-rules "WHITESPACE_RULE")
(require 'langtool)

A proposal for Bash script /home/foo/bin/yalafi-emacs (replace foo with username ;-) is given in editors/yalafi-emacs. It has to be made executable with chmod +x .... Please adapt script variable ltdir, compare option --lt-directory in section Example application. If you do not want a local LT server to be started, comment out the line defining script variable use_server.

Troubleshooting for Emacs interface. If Emacs reports a problem with running LT, you can apply the steps from Troubleshooting for Vim interface to ~/bin/yalafi-emacs.

Server interface. This variant may result in better tracking of character positions. In order to use it, you can write in ~/.emacs

(setq langtool-http-server-host "localhost"
      langtool-http-server-port 8082)
(setq langtool-default-language "en-GB")
(setq langtool-disabled-rules "WHITESPACE_RULE")
(require 'langtool)

and start yalafi.shell as server in another terminal with

$ python -m yalafi.shell --as-server 8082 [--lt-directory /path/to/LT]

The server will print some progress messages and can be stopped with CTRL-C. Further script arguments from section Example application may be given. If you add, for instance, '--server my', then a local LT server will be used. It is started on the first HTTP request received from Emacs-langtool, if it is not yet running.

Installation of Emacs-langtool. Download and unzip Emacs-langtool. Place file langtool.el in directory ~/.emacs.d/lisp/. Set in your ~/.profile or ~/.bash_profile (and log in again)

export EMACSLOADPATH=~/.emacs.d/lisp:

Here is the introductory example from above:

Emacs plugin Emacs-langtool

Back to contents

Interface to Atom

For the editor Atom, you can use the plugin linter-yalafi. Please note that we have not yet tested this interface.

Back to contents

Usage under Windows

Both yalafi.shell and yalafi can be directly used in a Windows command script or console. For example, this could look like

py -3 -m yalafi.shell --server lt --output html t.tex > t.html

or

"c:\Program Files\Python\Python37\python.exe" -m yalafi.shell --server lt --output html t.tex > t.html

if the Python launcher has not been installed.

Files with Windows-style line endings (CRLF) are accepted, but the text output of the pure LaTeX filter will be Unix style (LF only), unless a Windows Python interpreter is used.

Python's version for Windows by default prints Latin-1 encoded text to standard output. As this ensures proper work in a Windows command console, we do not change it for yalafi.shell when generating a text report. All other output is fixed to UTF-8 encoding.

Back to contents

Related projects

This project relates to software like

OpenDetex | pandoc | plasTeX | pylatexenc | TeXtidote | tex2txt | vscode-ltex

From these examples, currently (March 2020) only TeXtidote and vscode-ltex provide position mapping between the LaTeX input text and the plain text that is sent to the proofreading software. Both use (simple) regular expressions for plain-text extraction and are easy to install. YaLafi, on the other hand, aims to achieve high flexibility and a good filtering quality with a minimal number of false positives from the proofreading software.

Back to contents

Filter actions

Here is a list of the most important filter operations. When the filter encounters a LaTeX problem like a missing end of equation, a message is printed to stderr. Additionally, the mark from 'Parameters.mark_latex_error' in file yalafi/parameters.py is included into the filter output. This mark should raise a spelling error from the proofreader at the place where the problem was detected.

Back to contents

Fundamental limitations

The implemented parsing mechanism can only roughly approximate the behaviour of a real LaTeX system. We assume that only “reasonable” macros are used; lower-level TeX operations are not supported. If necessary, they should be enclosed in \LTskip{...} (see section Adaptation of LaTeX and plain text) or be placed in a LaTeX file “hidden” for the filter (compare option --skip of yalafi.shell in section Example application). With little additional work, it might be possible to include some plain-TeX features like parsing of elastic length specifications. A list of remaining incompatibilities must contain at least the following points.

Back to contents

Adaptation of LaTeX and plain text

In order to suppress unsuitable but annoying messages from the proofreading tool, it is sometimes necessary to modify the input text. You can do that in the LaTeX code, or after filtering in the plain text.

Modification of LaTeX text

The following operations can be deactivated with options --nosp and --no-specials of yalafi and yalafi.shell, respectively. For instance, macro \LTadd will be defined, but it will not add its argument to the plain text.

Special macros. Small modifications, for instance concerning punctuation, can be made with the predefined macros \LTadd, \LTalter and \LTskip. In order to add a full stop for the proofreader only, you would write

... some text\LTadd{.}

For LaTeX itself, the macros also have to be defined. A good place is the document preamble. (For the last line, compare section Inclusion of own macros.)

\newcommand{\LTadd}[1]{}
\newcommand{\LTalter}[2]{#1}
\newcommand{\LTskip}[1]{#1}
\newcommand{\LTinput}[1]{}

The LaTeX filter will ignore these statements. In turn, it will include the argument of \LTadd, use the second argument of \LTalter, and neglect the argument of \LTskip. The macro names for \LTadd etc. are defined by variables 'Parameters.macro_filter_add' etc. in file yalafi/parameters.py.
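The two views of the special macros can be contrasted in a toy Python sketch. It uses regular expressions on arguments without nested braces, whereas the actual filter works with a tokeniser and macro expansion; the function names filter_view and latex_view are made up for this illustration.

```python
import re

def filter_view(text):
    """Approximate what the proofreader sees (arguments without nested braces only)."""
    text = re.sub(r'\\LTadd\{([^{}]*)\}', r'\1', text)               # argument is included
    text = re.sub(r'\\LTalter\{[^{}]*\}\{([^{}]*)\}', r'\1', text)   # second argument is used
    text = re.sub(r'\\LTskip\{[^{}]*\}', '', text)                   # argument is dropped
    return text

def latex_view(text):
    """Approximate what LaTeX typesets with the \\newcommand definitions above."""
    text = re.sub(r'\\LTadd\{[^{}]*\}', '', text)                    # expands to nothing
    text = re.sub(r'\\LTalter\{([^{}]*)\}\{[^{}]*\}', r'\1', text)   # first argument is used
    text = re.sub(r'\\LTskip\{([^{}]*)\}', r'\1', text)              # argument is typeset
    return text

source = r'some text\LTadd{.} \LTalter{A}{B} \LTskip{C}'
print(filter_view(source))
print(latex_view(source))
```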

Special comments. The document preamble in particular often contains statements that are not properly processed “out-of-the-box”. Placing the critical parts in \LTskip{...} may lead to problems, as the statements now are executed slightly differently by the TeX system. As a “brute-force” variant, the LaTeX filter therefore ignores input enclosed in comments starting with %%% LT-SKIP-BEGIN and %%% LT-SKIP-END. Note that the single space after %%% is significant. The opening special comment is given in variable 'Parameters.comment_skip_begin' of file yalafi/parameters.py.

A preamble could look as follows.

\documentclass{article}
%%% LT-SKIP-BEGIN
... disturbing stuff ...
%%% LT-SKIP-END
\title{A paper}
\begin{document}
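The effect of the special comments on such a preamble may be sketched in Python as follows. This is a simplified stand-in, not YaLafi's implementation: the function name drop_skipped is hypothetical, the markers are assumed to start a line, and the sketch simply deletes the lines, while a real filter also has to keep track of text positions.

```python
def drop_skipped(latex, begin='%%% LT-SKIP-BEGIN', end='%%% LT-SKIP-END'):
    """Remove all lines from a begin marker up to and including the end marker."""
    kept = []
    skipping = False
    for line in latex.splitlines(keepends=True):
        if not skipping and line.startswith(begin):
            skipping = True           # opening comment: start ignoring input
        elif skipping and line.startswith(end):
            skipping = False          # closing comment: resume normal processing
        elif not skipping:
            kept.append(line)
    return ''.join(kept)

preamble = ('\\documentclass{article}\n'
            '%%% LT-SKIP-BEGIN\n'
            '... disturbing stuff ...\n'
            '%%% LT-SKIP-END\n'
            '\\title{A paper}\n'
            '\\begin{document}\n')
print(drop_skipped(preamble))
```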

Phrase replacement in the plain text

The applications yalafi.shell and yalafi provide options --replace file and --repl file, respectively. They may be valuable if you often use a phrase (possibly of multiple words) that is not accepted by the proofreader. In the given file, a '#' sign marks the rest of the line as comment. The first '&' separated by space splits a line into two parts; the first part is replaced by the second one. Space in the first part may correspond to arbitrary space in the plain text that does not break the paragraph.

Remark. With option --multi-language, yalafi.shell only replaces in text parts with language according to option --language.

This German example replaces two words by a single one and vice versa:

so dass & sodass
nichtlineare & nicht lineare
nichtlineares & nicht lineares
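The rules for the replacement file can be mimicked with a small Python sketch. It is an approximation under assumptions: the helper names parse_replacements and apply_replacements are invented here, word boundaries are checked with a regular expression, and the position bookkeeping that the real tools need for mapping messages back to the LaTeX text is left out.

```python
import re

# whitespace that does not break the paragraph: blanks, or one single line break
SPACE = r'(?:[ \t]*\n[ \t]*|[ \t]+)'

def parse_replacements(text):
    """Read 'phrase & replacement' lines; '#' starts a comment."""
    rules = []
    for line in text.splitlines():
        line = line.split('#', 1)[0]
        # split at the first '&' that is surrounded by space
        parts = re.split(r'\s&\s', ' ' + line + ' ', maxsplit=1)
        if len(parts) != 2 or not parts[0].split():
            continue
        words = parts[0].split()
        pattern = re.compile(r'\b' + SPACE.join(re.escape(w) for w in words) + r'\b')
        rules.append((pattern, ' '.join(parts[1].split())))
    return rules

def apply_replacements(plain, rules):
    for pattern, repl in rules:
        plain = pattern.sub(lambda m, r=repl: r, plain)
    return plain

rules = parse_replacements('so dass & sodass   # one word\nnichtlineare & nicht lineare\n')
print(apply_replacements('Es gilt, so\ndass nichtlineare Terme', rules))
```

The phrase 'so dass' is also found across a single line break, but not across an empty line, which would start a new paragraph.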

Finally, please note the comment on dictionary adaptation.

Back to contents

Extension modules for LaTeX packages

The modules yalafi.documentclasses and yalafi.packages contain further submodules that are activated by the LaTeX filter when executing \documentclass or \usepackage, and on other occasions.

Each extension module has to provide a list 'require_packages' of strings that causes loading of other modules, and a function 'init_module()'. It is called by the parser and can modify the object of class 'Parameters'. In order to add macros and environments, it has to construct strings or object lists that are included in the returned object of class 'InitModule'. Classes for definition of macros and environments are described in the sections starting at Definition of macros. For an example, see file yalafi/packages/amsmath.py.

Back to contents

Inclusion of own macros

Unknown macros and environment frames are silently ignored. As all input files are processed independently, it may be necessary to provide project-specific definitions in advance.

For macros, which may be declared with \newcommand or \def (the latter is only roughly approximated), you can apply \LTinput{file.tex} as a simple solution. This adds the macros defined in the given file, skipping all other content. For the “real” LaTeX, macro \LTinput has to be defined as \newcommand{\LTinput}[1]{} that is in turn ignored by the filter.

If LaTeX files have to stay untouched, you can use options --defs and --define for yalafi and yalafi.shell, respectively. Alternatively, one can add the definitions to member 'Parameters.macro_defs_latex' in file yalafi/parameters.py. Here are examples from this file and extension module yalafi/packages/xcolor.py:

        \newcommand{\quad}{\;}
        \newcommand{\textasciicircum}{\verb?^?} % \^ is accent
---
        \newcommand{\textcolor}[3][]{#3}

More complicated macros as well as environments have to be registered with Python code. This may be done with options --pack and --packages for yalafi and yalafi.shell, respectively; compare section Extension modules for LaTeX packages. Alternatively, you can modify the collections 'Parameters.macro_defs_python' and 'Parameters.environment_defs' in yalafi/parameters.py.

Definition of macros

Macro(parms, name, args='', repl='', defaults=[], extract='')

Definition of environments

Environ(parms, name, args='', repl='', defaults=[], remove=False, add_pars=True, items=None, end_func=None)

Parameters parms to defaults are the same as for Macro(), where name does not start with a backslash. The arguments are those behind the opening '\begin{xyz}'. This means that the environment name 'xyz' does not yet count as argument in args and repl.

Definition of equation environments

EquEnv(parms, name, args='', repl='', defaults=[], remove=False)

This is equivalent to Environ(), but maths material is replaced according to section Handling of displayed equations. Replacements in repl and defaults are still interpreted in text mode.

Macro handler functions

Parameter repl of class Macro may specify a function with the following arguments.

handler(parser, buf, mac, args, delim, pos)

It has to return a possibly empty list of tokens that are used as result of the macro expansion. The list may include tokens of class VoidToken (see argument args).

For examples, see file yalafi/handlers.py.

Back to contents

Multi-file projects

Here, we present one of several possibilities to cope with multiple files. The main point is that the base LaTeX filter currently cannot directly follow file inclusions like \input{...}. Assume you have the following file main.tex.

% (load document class and packages)
% possibly: load own macro definitions etc.
\input{defs.tex}
% the previous command is ignored by the filter, thus:
\LTinput{defs.tex}
\begin{document}
Test text.
\input{ch1/intro.tex}
\end{document}

Please provide the definition of \LTinput as in section Adaptation of LaTeX and plain text.

In order to check the “normal text” only in file main.tex, you say

python -m yalafi.shell [...] --packages "" main.tex

Macros like \input are ignored, in this case. With the optional '--packages ""', default loading of all packages known to the filter is suppressed.

The check of file ch1/intro.tex may look like

python -m yalafi.shell [...] --packages "" --define main.tex ch1/intro.tex

Option '--define main.tex' ensures that all settings and definitions from file main.tex are available. “Normal text” from that file is ignored. Alternatively, you can add '\LTinput{main.tex}' at the beginning of file ch1/intro.tex.

A recursive check of all files is initiated by

python -m yalafi.shell [...] --packages "" --include --define main.tex main.tex

During a first phase, all file names are collected by evaluation of \include, \input, \subfile and \subfileinclude commands. Then, each file is processed on its own. If you want to exclude certain files, for instance figures given in TeX code, you can use option --skip from section Example application.
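The first phase, collecting file names, might be sketched in Python as shown here. This is an illustration only: the function collect_files, the callback read_text and the rule of appending '.tex' to names without that suffix are assumptions, and TeX comments are not treated specially.

```python
import re

# \include{...}, \input{...}, \subfile{...} and \subfileinclude{...}
INCLUDE = re.compile(r'\\(?:include|input|subfile|subfileinclude)\s*\{([^{}]+)\}')

def collect_files(main, read_text):
    """Return main file and all files reachable via inclusion commands.

    read_text(name) has to return the file content as string
    (empty string, if the file cannot be read).
    """
    found = []
    todo = [main]
    while todo:
        name = todo.pop(0)
        if name in found:
            continue                  # do not visit a file twice
        found.append(name)
        for match in INCLUDE.finditer(read_text(name)):
            ref = match.group(1).strip()
            if not ref.endswith('.tex'):
                ref += '.tex'
            todo.append(ref)
    return found

files = {
    'main.tex': '\\input{defs.tex}\n\\LTinput{defs.tex}\n\\input{ch1/intro.tex}\n',
    'ch1/intro.tex': '\\include{ch1/part}\n',
}
print(collect_files('main.tex', lambda name: files.get(name, '')))
```

Each collected file would then be checked on its own, as described above.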

Remark. An alternative version is as follows. Write all commands that YaLafi needs in an own file, say yy-defs.tex. Then use option '--define yy-defs.tex', or place '\LTinput{yy-defs.tex}' in all sources.

Back to contents

Handling of displayed equations

Displayed equations should be part of the text flow and include the necessary punctuation. The German version of LanguageTool (LT) will detect a missing dot in the following snippet. For English texts, see the comments in section Equation replacements in English documents ahead.

Wir folgern
\begin{align}
    a   &= b \\
    c   &= d
\end{align}
Daher ...

Here, 'a' to 'd' stand for arbitrary mathematical terms (meaning: “We conclude <maths> Therefore, ...”). In fact, LT complains about the capital “Daher” that should start a new sentence.

Trivial version

With the entry

    Environ(self, 'align', remove=True, add_pars=False),

in list 'environments' of file yalafi/packages/amsmath.py, the equation environment is simply removed. We get the following filter output that will probably cause a problem, even if the equation itself ends with a correct punctuation mark.

Wir folgern
Daher ...

Simple version

With the entry

    EquEnv(self, 'align', repl='  Relation', remove=True),

in 'Parameters.environment_defs', one gets:

Wir folgern
  Relation
Daher ...

Adding a dot '= d.' in the equation will lead to 'Relation.' in the output. This will also hold true if the punctuation mark ('Parameters.math_punctuation') is followed by maths space or by macros such as \label and \nonumber.
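The behaviour of the simple version can be imitated with a few lines of Python. The sketch is a rough stand-in based on a regular expression: the function name simple_equation and the punctuation set '.,;:' are assumptions, and trailing maths space or macros like \label, which the real filter tolerates after the punctuation mark, are not handled.

```python
import re

def simple_equation(latex, env='align', repl='  Relation', punctuation='.,;:'):
    """Replace each environment by a fixed word, keeping a final punctuation mark."""
    name = re.escape(env)
    pattern = re.compile(r'\\begin\{' + name + r'\}(.*?)\\end\{' + name + r'\}', re.S)

    def substitute(match):
        body = match.group(1).rstrip()
        mark = body[-1] if body and body[-1] in punctuation else ''
        return repl + mark

    return pattern.sub(substitute, latex)

source = ('Wir folgern\n\\begin{align}\n'
          '    a   &= b \\\\\n    c   &= d.\n'
          '\\end{align}\nDaher ...')
print(simple_equation(source))
```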

Full version

Remark. Our equation parsing currently assumes that aligned operators like '=' and '+' are placed on the right side of the alignment character '&'. LaTeX does not enforce that, but it is the style found in examples of the documentation for package amsmath.

Remark. For a simplification, see option --simple-equations in section Example application.

With the default entry

    EquEnv(self, 'align'),

we obtain (“gleich” means equal, and setting language to English will produce “equal”):

Wir folgern
  V-V-V  gleich W-W-W
  W-W-W  gleich X-X-X
Daher ...

The replacements like 'V-V-V' are taken from collections 'math_repl_display*' in file yalafi/parameters.py that depend on language setting, too. Now, LT will additionally complain about repetition of 'W-W-W'. Finally, writing '= b,' and '= d.' in the equation leads to the output:

Wir folgern
  V-V-V  gleich W-W-W,
  X-X-X  gleich Y-Y-Y.
Daher ...

The rules for equation parsing are described in section Parser for maths material. They ensure that variations like

    a   &= b \\
        &= c.

and

    a   &= b \\
        &\qquad -c.

also will work properly. In contrast, the text

    a   &= b \\
    -c  &= d.

will again produce an LT warning due to the missing comma after 'b', since the filter replaces both 'b' and '-c' by 'W-W-W' without intermediate text.
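The sample outputs suggest that the replacement for a maths part only changes once some text (an operator word or a punctuation mark) has been produced in between. The following toy in Python is built on exactly this inferred rule and reproduces the two align outputs shown above, apart from spacing. It is not YaLafi's parser: the function name replace_display, the tokenisation and the handling of '=' only are assumptions, and the placeholders are simply copied from the German sample output.

```python
import re

PLACEHOLDERS = ['V-V-V', 'W-W-W', 'X-X-X', 'Y-Y-Y']   # taken from the sample output
OPERATOR_WORDS = {'=': 'gleich'}
PUNCTUATION = '.,;:'

def replace_display(body):
    """Toy replacement for the body of an align environment."""
    index = 0                       # number of the current placeholder
    lines = []
    for row in body.split('\\\\'):  # rows are separated by a double backslash
        pieces = []
        in_maths = False
        for tok in re.findall(r'=|[.,;:]|[^=.,;:\s]+', row.replace('&', ' ')):
            if tok in OPERATOR_WORDS:
                pieces.append(' ' + OPERATOR_WORDS[tok] + ' ')
                index += 1          # text was produced: next maths part changes
                in_maths = False
            elif tok in PUNCTUATION:
                pieces.append(tok)
                index += 1
                in_maths = False
            elif not in_maths:      # adjacent maths tokens form one part
                pieces.append(PLACEHOLDERS[index % len(PLACEHOLDERS)])
                in_maths = True
        lines.append('  ' + ''.join(pieces).strip())
    return '\n'.join(lines)

print(replace_display('a &= b \\\\ c &= d'))
print(replace_display('a &= b, \\\\ c &= d.'))
```

In the first call, 'b' and 'c' receive the same placeholder, since no text separates them; this is the repetition that the proofreader complains about.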

In rare cases, manipulation with \LTadd{...} or \LTskip{...} may be necessary to avoid false warnings from the proofreader; compare section Adaptation of LaTeX and plain text.

Inclusion of “normal” text

In variant “Full version”, the argument of \mbox (macro names: collection 'Parameters.math_text_macros', loading of LaTeX package amsmath adds \text) is directly copied. Outside of \mbox etc., only maths space like \; and \quad (see 'Parameters.math_space') is considered as space. Therefore, one will get warnings from the proofreading program, if subsequent \text and maths parts are not properly separated.

Equation replacements in English documents

The replacement collections 'math_repl_display*' in file yalafi/parameters.py do not work well, if single letters are taken as replacements. For instance, 'V.' cannot be safely considered as end of a sentence. We now have chosen replacements as 'U-U-U' for German and English texts.

Furthermore, the English version of LanguageTool (like other proofreading tools) rarely detects mistakenly capital words inside of a sentence; they are probably considered as proper names. Therefore, a missing dot at the end of a displayed equation is hardly found. An experimental hack is provided by option --equation-punctuation of application script yalafi/shell/shell.py described in section Example application.

Back to contents

Multi-language documents

Remarks. This feature is experimental; any comments are welcome. Operation may be slow, unless a LanguageTool server is used, for instance, via option '--server my'.

As an example, assume option '--multi-language' for yalafi.shell and the LaTeX text:

\documentclass{article}
\usepackage[german,english]{babel}
\newcommand{\german}[1]{\textit{\foreignlanguage{german}{#1}}}

\begin{document}
This is thex German word \german{excellent}..
\end{document}

Then, the Vim example from section “Plain Vim” with setting let g:ltyc_showsuggestions = 1 will produce this quickfix window:

t.tex|6 col 9 info|  Possible spelling mistake found. Suggestion: the; then; they; them; thee; Theo; hex; THX; TeX; Tex; The; t hex; the x; Théo
t.tex|6 col 34 info|  Möglicher Tippfehler gefunden. Suggestion: exzellent; exzellente; exzellenten; exzellenter; Exzellenz; exzellentes; erzählend; exzellentem; erhellend; erkältend; exzelliert
t.tex|6 col 44 info|  Two consecutive dots Suggestion: .; …

The initial language is specified by option --language; it is overwritten upon \usepackage[...]{babel}. Commands like \selectlanguage{...} are also effective in files loaded via option --define or with \LTinput{...}. Language names in babel commands are mapped to xx-XX codes by dictionary 'language_map' in file yalafi/packages/babel.py.

Further options. In the above example, LanguageTool is invoked for 'This is thex German word L-L-L..' with language en-GB, and for 'excellent' with language de-DE. The following options for yalafi.shell can be used to adjust the behaviour.

Please consider also the tweaks in section Adaptation of LaTeX and plain text.
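The mapping of babel language names to xx-XX codes can be pictured with a short Python sketch. The dictionary below is a made-up excerpt containing only the two pairs that appear in the example of this section; the actual content of 'language_map' and the function name babel_languages are assumptions. With babel, the last language in the option list is the main one, which fits the example, where the main text is checked with en-GB.

```python
import re

# hypothetical excerpt; the real dictionary is in yalafi/packages/babel.py
LANGUAGE_MAP = {'german': 'de-DE', 'english': 'en-GB'}

def babel_languages(latex):
    """Return the language codes for the options of \\usepackage[...]{babel}."""
    match = re.search(r'\\usepackage\[([^\]]*)\]\{babel\}', latex)
    if not match:
        return []
    names = [s.strip() for s in match.group(1).split(',') if s.strip()]
    return [LANGUAGE_MAP[n] for n in names if n in LANGUAGE_MAP]

codes = babel_languages(r'\usepackage[german,english]{babel}')
print(codes)        # all languages of the document
print(codes[-1])    # main language: last babel option
```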

Back to contents

Python package interface

We comment on the central function in file yalafi/tex2txt.py that uses the package interface to emulate the behaviour of script tex2txt.py in repository Tex2txt.

# excerpt from yalafi/tex2txt.py;
# parameters, parser and utils are modules of package yalafi
def tex2txt(latex, opts, multi_language=False, modify_parms=None):
    # auxiliary function passed to the parser:
    # returns a success flag and the file content
    def read(file):
        try:
            with open(file, encoding=opts.ienc) as f:
                return True, f.read()
        except:
            return False, ''

    parms = parameters.Parameters(opts.lang or '')
    parms.multi_language = multi_language
    packages = get_packages(opts.dcls, parms.class_modules)
    packages.extend(get_packages(opts.pack, parms.package_modules))

    # macro names given with --extr, with the backslash added
    if opts.extr:
        extr = ['\\' + s for s in opts.extr.split(',')]
    else:
        extr = []
    if opts.seqs:
        parms.math_displayed_simple = True

    # let the caller adjust the parameters before parsing
    if modify_parms:
        modify_parms(parms)
    p = parser.Parser(parms, packages, read_macros=read)
    toks = p.parse(latex, define=opts.defs, extract=extr)

    # single-language case: one plain-text string with position numbers
    if not multi_language:
        txt, pos = utils.get_txt_pos(toks)
        if opts.repl:
            txt, pos = utils.replace_phrases(txt, pos, opts.repl)
        if opts.unkn:
            txt = '\n'.join(p.get_unknowns()) + '\n'
            pos = [0 for n in range(len(txt))]
        # shift position numbers by one
        pos = [n + 1 for n in pos]
        return txt, pos

    # multi-language case: dictionary of text parts per language
    main_lang = opts.lang or ''
    ml = utils.get_txt_pos_ml(toks, main_lang, parms)
    if opts.repl and main_lang in ml:
        for part in ml[main_lang]:
            part[0], part[1] = utils.replace_phrases(part[0], part[1],
                                                        opts.repl)
    for lang in ml:
        for part in ml[lang]:
            part[1] = list(n + 1 for n in part[1])
    return ml
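A caller can use the returned position list to translate an offset in the plain text into line and column of the LaTeX source. The following sketch assumes that each entry of pos is a 1-based character offset into the LaTeX string, as the statement pos = [n + 1 for n in pos] suggests. The sample strings, the hand-computed pos list and the helper line_col are hypothetical.

```python
def line_col(latex, pos, txt_index):
    """Hypothetical helper: map index txt_index of the plain text to a
    1-based (line, column) pair in the LaTeX string. Assumes pos[i] is
    the 1-based LaTeX offset of plain-text character i."""
    offset = pos[txt_index] - 1           # back to a 0-based offset
    line = latex.count('\n', 0, offset) + 1
    line_start = latex.rfind('\n', 0, offset) + 1
    return line, offset - line_start + 1


# Hand-made sample data (assumption, not produced by YaLafi)
latex = 'A \\textbf{b}\nc'
txt = 'A b\nc'
pos = [1, 2, 11, 13, 14]

print(line_col(latex, pos, txt.index('b')))
print(line_col(latex, pos, txt.index('c')))
```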

Back to contents

Command-line of pure filter

The LaTeX filter can be integrated into shell scripts; compare the examples in Tex2txt/README.md.

python -m yalafi [--nums file] [--repl file] [--defs file] [--dcls class]
                 [--pack modules] [--extr macros] [--lang xy] [--ienc enc]
                 [--seqs] [--unkn] [--nosp] [--mula base] [latexfile]

Without positional argument latexfile, standard input is read.
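From a Python script, the filter could be driven through the same command line. The sketch below uses only the option --lang and the standard-input behaviour described above. The helper names are hypothetical, and it is an assumption that the plain text is written to standard output.

```python
import subprocess
import sys


def build_filter_command(lang=None, latexfile=None):
    """Hypothetical helper: assemble the argument list for the filter."""
    cmd = [sys.executable, '-m', 'yalafi']
    if lang:
        cmd += ['--lang', lang]
    if latexfile:
        cmd.append(latexfile)
    return cmd


def run_filter(latex, lang=None):
    """Send LaTeX text via standard input and return what the filter
    prints (assumed to be the plain text). Requires an installed yalafi."""
    result = subprocess.run(build_filter_command(lang), input=latex,
                            capture_output=True, text=True, check=True)
    return result.stdout


# Usage, with yalafi installed:
#   plain = run_filter('Some \\textbf{bold} text.', lang='en')
```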

Back to contents

Differences to Tex2txt

Invocation of python -m yalafi ... differs as follows from python tex2txt.py ... (the script described in Tex2txt/README.md).

YaLafi/yalafi/tex2txt.py is faster for input texts up to about 30 kilobytes; for larger files, it can be slower than 'Tex2txt/tex2txt.py --char'. Run-time increases almost linearly with file size. Because a token is generated for each single “normal” character, memory usage may be substantial for long input texts.

<a name="equation-html-report"></a> With

python -m yalafi.shell --equation-punct all --output html test.tex > test.html

and input

For each $\epsilon > 0$, there is a $\delta > 0$ so that
%
\begin{equation}
\norm{y-x} < \delta \text{\quad implies\quad}
    \norm{A(y) - A(x)} < \epsilon, \label{lab}
\end{equation}
%
Therefore, operator $A$ is continuous at point $x$.

we get

HTML report

Back to contents

Remarks on implementation

Scanner / tokeniser

The scanner identifies token types defined in yalafi/defs.py.

Parser

The central method 'Parser.expand_sequence()' does not directly read from the scanner, but from an intermediate buffer that can take back tokens. On macro expansion, the parser simply pushes back all tokens generated by argument substitution. (Method 'Parser.expand_arguments()' collects tokens forming macro arguments and returns a list of replacement tokens that is eventually pushed back in the main loop.) The result is close to the “real” TeX behaviour; compare the tests in directory tests/.
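The push-back idea can be shown with a toy expander. It is a sketch under strong simplifications, not the code of yalafi/parser.py: tokens are plain strings, every macro takes exactly one single-token argument, and the names TokenBuffer and expand are hypothetical.

```python
class TokenBuffer:
    """Toy token source that can take back tokens."""

    def __init__(self, tokens):
        self._stack = list(reversed(tokens))

    def next(self):
        return self._stack.pop() if self._stack else None

    def push_back(self, tokens):
        # pushed tokens are delivered first, in their given order
        self._stack.extend(reversed(tokens))


def expand(tokens, macros):
    """macros maps a name to its replacement tokens; '#1' marks the
    argument. Simplification: the argument is the next single token."""
    buf = TokenBuffer(tokens)
    out = []
    while True:
        tok = buf.next()
        if tok is None:
            return out
        if tok not in macros:
            out.append(tok)
            continue
        arg = buf.next()
        replacement = []
        for t in macros[tok]:
            if t != '#1':
                replacement.append(t)
            elif arg is not None:
                replacement.append(arg)
        # the replacement is read again, so nested macros get expanded
        buf.push_back(replacement)


macros = {'\\twice': ['#1', '#1'], '\\shout': ['\\twice', '#1', '!']}
print(expand(['\\shout', 'a', 'b'], macros))
```

Because the replacement tokens go back into the buffer instead of the output, a macro that appears in another macro's replacement text is expanded by the same main loop without recursion.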

A method that keeps the implementation simple is 'Parser.arg_buffer()'. It creates a new buffer that subsequently returns the tokens forming a macro argument (either a single token or all tokens enclosed in paired {} braces or [] brackets).
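Collecting a macro argument can be sketched as below. This is a toy function over a list of string tokens and not the real 'Parser.arg_buffer()': it handles only {} braces, ignores [] brackets, and the name read_argument is hypothetical.

```python
def read_argument(tokens, start=0):
    """Return (argument tokens, index after the argument).
    The argument is a single token, or all tokens inside a {...}
    group, where nested groups are kept intact."""
    if start >= len(tokens):
        return [], start
    if tokens[start] != '{':
        return [tokens[start]], start + 1
    depth = 0
    for i in range(start, len(tokens)):
        if tokens[i] == '{':
            depth += 1
        elif tokens[i] == '}':
            depth -= 1
            if depth == 0:
                return tokens[start + 1:i], i + 1
    # unbalanced group: take everything that is left
    return tokens[start + 1:], len(tokens)


print(read_argument(list('{a{b}c}d')))
print(read_argument(['x', 'y']))
```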

Parser for maths material

We follow the ideas described in section Handling of displayed equations; compare the tests in tests/test_display.py. All unknown macros that are not in the blacklist 'Parameters.math_ignore' are assumed to generate some “visible” output. Thus, it is not necessary to declare all the maths macros like \alpha and \sum.
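The blacklist rule can be illustrated with a few lines. The entries of math_ignore below are made up for the example, and the function visible_macros is hypothetical; the real list is 'Parameters.math_ignore'.

```python
# Hypothetical blacklist entries (assumption); see 'Parameters.math_ignore'
math_ignore = {'\\quad', '\\displaystyle'}


def visible_macros(tokens, ignore):
    """Return the macros that are assumed to produce visible output:
    every macro token that is not on the blacklist."""
    return [t for t in tokens if t.startswith('\\') and t not in ignore]


tokens = ['\\displaystyle', '\\sum', 'x', '\\quad', '\\alpha']
print(visible_macros(tokens, math_ignore))
```

Neither \sum nor \alpha has to be declared anywhere: being absent from the blacklist is enough for them to count as visible.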

Displayed equations are parsed as follows.

Removal of unnecessary blank lines

In order to avoid the creation of new blank lines by macros expanding to space or “nothing”, we include a token of type 'ActionToken' whenever a macro is expanded. Method 'Parser.remove_pure_action_lines()' removes all lines that contain only space and at least one such token. Initially empty lines are retained. Together with the extraction of special text flows, for instance from footnotes, this preserves sentences and paragraphs, thus improving checks and reducing false positives from the proofreading software.
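The rule for dropping lines can be sketched on a simplified model. Here each line is a list of string tokens, the marker ACTION stands in for a token of type 'ActionToken', and the function drop_pure_action_lines is hypothetical. The real 'Parser.remove_pure_action_lines()' works on YaLafi's own token objects.

```python
ACTION = object()  # stands in for a token of type 'ActionToken'


def drop_pure_action_lines(lines):
    """lines: one token list per line; tokens are non-empty strings or
    ACTION. A line is dropped if it holds at least one ACTION marker and
    otherwise only white space. Initially empty lines are kept."""
    kept = []
    for line in lines:
        has_action = any(t is ACTION for t in line)
        only_space = all(t is ACTION or t.isspace() for t in line)
        if has_action and only_space:
            continue  # blank only because a macro expanded to nothing
        kept.append(line)
    return kept


lines = [['Text'], [ACTION, ' '], [], ['More', ACTION]]
print(len(drop_pure_action_lines(lines)))
```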

Back to contents