spirit-po

spirit-po is a header-only C++11 library that you can use for localization within the GNU gettext system, instead of using libintl.

spirit-po has the advantage that it does not use mo files, so your program can use the simple, portable po files produced directly by translators rather than require a binary format which requires special tools to create.

spirit-po is very easy to use in your C++11 projects. According to sloccount, it is only 916 lines of code in total (at time of writing), and is implemented using boost::spirit. Our po grammar itself is only a few dozen lines. This makes it relatively easy to understand its behavior, and makes the library as a whole easy to modify if needed.

By contrast, the equivalent po-parser facility within the gettext project is the libgettext-po po-manipulation library. sloccount counts the entire libgettext-po directory as ansic: 29382 (98.35%), that is, 30kloc of ANSI C. The parser is only a piece of this, but it is also quite complex and difficult to separate from the rest of the code. (Which is partly why I made spirit-po.)

An in-depth explanation of the rationale for this library as compared to libintl and boost::locale::gettext, and specifically, the advantages of parsing po files rather than mo files at run-time, is provided on the wiki.

Compatibility

spirit_po is intended to be a drop-in replacement for the use of GNU msgfmt and GNU libintl.
It should parse any well-formed .po file that msgfmt would read and the interface should produce the same results.

It's not guaranteed to reject any po file that msgfmt would reject, or to emit warnings similar to msgfmt for common translator errors. Broadly speaking, the parser has been engineered with a fail-fast mentality, and there are several unit tests that check that major structural problems cause a parse error rather than silently being accepted. However, for best results you may wish to validate po files by running them through msgfmt just to see if it emits warnings, before deploying them, even if you use spirit_po in your application.

Similarly, there are certain cases that I am aware of in which msgfmt will drop a message from the catalog if it contains invalid C format specifiers. spirit_po doesn't do this, which is a minor discrepancy.

If you are aware of any .po file which msgfmt parses but spirit_po fails to parse, or for which our emulation of the libintl interface doesn't yield the expected results, please post a report on the issue tracker, with the po file included.

Quick Start

To begin, first obtain some .po files. PO files are created by translators and contain a dictionary of translated strings.

For examples from various GNU projects, see our test folder.
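If you just want something small to experiment with, here is a rough sketch of what a po file for the strings used below might look like, wrapped in a C++ raw string so it can be written to disk. The German translations, the names sample_po and write_sample_po, and the header fields chosen are all made up for illustration; real files from translators will carry more metadata.

```cpp
#include <ostream>
#include <string>

// Hypothetical German catalog for the Quick Start strings. It is made up for
// illustration and is not one of the files in the test folder.
const char * const sample_po = R"po(# The entry with the empty msgid is the header.
msgid ""
msgstr ""
"Content-Type: text/plain; charset=UTF-8\n"
"Plural-Forms: nplurals=2; plural=(n != 1);\n"

msgid "Hello world!"
msgstr "Hallo Welt!"

# msgctxt carries the context string used with pgettext.
msgctxt "prompt"
msgid "Pick a number: "
msgstr "Zahl eingeben: "

# Plural entries carry one msgstr[i] per plural form.
#, c-format
msgid "Did I fire %d shot or was it only %d? Do you feel lucky, punk?"
msgid_plural "Did I fire %d shots or was it only %d? Do you feel lucky, punk?"
msgstr[0] "War das %d Schuss oder nur %d? Hast du heute Glueck, Kleiner?"
msgstr[1] "Waren das %d Schuesse oder nur %d? Hast du heute Glueck, Kleiner?"
)po";

// Writes the sample to any output stream, e.g. a std::ofstream for "test.po".
inline void write_sample_po(std::ostream & out) {
  out << sample_po;
}
```

Writing it through a std::ofstream opened on "test.po" gives a file with the name the example below expects.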

Then, load the file and construct a spirit_po::catalog from it.


#include <spirit_po/spirit_po.hpp>
#include <cstdio>
#include <fstream>
#include <iostream>
#include <iterator>
#include <string>

int main() {
  std::ifstream ifs("test.po");
  std::string po_file{std::istreambuf_iterator<char>{ifs}, std::istreambuf_iterator<char>()};
  
  spirit_po::default_catalog cat{spirit_po::default_catalog::from_range(po_file)};

  std::cout << cat.gettext("Hello world!") << std::endl;
  
  std::cout << cat.pgettext("Pick a number: ", "prompt") << std::endl;
  
  int number = 6;
  std::cin >> number;

  std::cout << std::endl;

  printf(cat.ngettext("Did I fire %d shot or was it only %d? Do you feel lucky, punk?",
                      "Did I fire %d shots or was it only %d? Do you feel lucky, punk?",
                      number),
         number,
         number - 1);
}

The catalog object serves translation requests using the strings it loaded from the PO file.

In this line,

std::cout << cat.gettext("Hello world!") << std::endl;

the translated form of "Hello world!" will be displayed. The result will be a const char * pointing to a string owned by the catalog. (Or, if the lookup misses because this string wasn't in the catalog, it will simply return the English text "Hello world!", that is, the same pointer that was passed in.)

In this line,

std::cout << cat.pgettext("Pick a number: ", "prompt") << std::endl;

a string is translated, and also marked with a context string. Sometimes the same English phrase or sentence appears in your program in multiple places, but should be translated differently depending on context. The context string allows you to provide a hint to the translator and allows the program to disambiguate the two usages. (This particular example is unfortunately a poor one.)

In this line,

printf(cat.ngettext("Did I fire %d shot or was it only %d? Do you feel lucky, punk?",
                    "Did I fire %d shots or was it only %d? Do you feel lucky, punk?",
                    number),
       number,
       number - 1);

the catalog object will look up the C-format string in the catalog, and search for the plural form corresponding to number. This ensures that "shots" will be pluralized correctly no matter what language is used. (In many languages, there are more than two plural forms and language-specific logic is needed to determine the appropriate form to use based on the number. The translator provides this logic in the po-file header.) Then we use printf to substitute the numbers into the string.
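To make the plural-form selection concrete, here is a small sketch of two rules of the kind a translator writes in the Plural-Forms header. The catalog reads such a rule from the po file; the functions below hardcode two well-known rules from the gettext documentation purely for illustration, and their names are made up.

```cpp
#include <cstddef>

// Two-form rule "nplurals=2; plural=(n != 1);" (English, German, ...).
// Returns the index i of the msgstr[i] entry to use for the count n.
inline std::size_t plural_index_two_forms(unsigned long n) {
  return n != 1 ? 1 : 0;
}

// Three-form rule listed for Russian in the gettext documentation:
// plural=(n%10==1 && n%100!=11 ? 0
//         : n%10>=2 && n%10<=4 && (n%100<10 || n%100>=20) ? 1 : 2);
inline std::size_t plural_index_three_forms(unsigned long n) {
  if (n % 10 == 1 && n % 100 != 11) return 0;
  if (n % 10 >= 2 && n % 10 <= 4 && (n % 100 < 10 || n % 100 >= 20)) return 1;
  return 2;
}
```

Under the three-form rule, 1 and 21 share a form while 11 does not, which is why the choice cannot be reduced to a singular/plural check in the program itself.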

These examples are actually all rehashed from the gettext documentation -- the member functions gettext, pgettext, ngettext are all analogous to calls to the C library libintl.

If you aren't already familiar with gettext, have a look at their documentation.

Note: It is standard in gettext documentation examples to use functions like printf with translated strings, since it is a C library. However, if you are working in C++, you might be better off using a type-safe alternative to printf. Otherwise, if the translator makes a mistake, or an adversary modifies the po files, you can get undefined behavior in your program with code like the above. You might prefer to use something like tinyformat, for instance.
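To show what "type-safe" buys you here, below is a minimal sketch of a formatter that cannot read past its arguments. It is not tinyformat and not part of spirit_po; safe_format is a hypothetical helper that only understands two-character conversions such as %d or %s (no flags or widths) and the %% escape.

```cpp
#include <cstddef>
#include <sstream>
#include <string>
#include <vector>

namespace detail {
// Base case: no arguments left to convert.
inline void collect(std::vector<std::string> &) {}

// Streams each argument into a string, so the argument's own type decides
// how it is printed rather than the (untrusted) format string.
template <typename T, typename... Rest>
void collect(std::vector<std::string> & out, const T & first, const Rest &... rest) {
  std::ostringstream ss;
  ss << first;
  out.push_back(ss.str());
  collect(out, rest...);
}
} // namespace detail

template <typename... Args>
std::string safe_format(const std::string & fmt, const Args &... args) {
  std::vector<std::string> values;
  detail::collect(values, args...);

  std::string result;
  std::size_t next = 0;
  for (std::size_t i = 0; i < fmt.size(); ++i) {
    if (fmt[i] != '%' || i + 1 == fmt.size()) {
      result += fmt[i];            // ordinary character or trailing '%'
    } else if (fmt[i + 1] == '%') {
      result += '%';               // "%%" is a literal percent sign
      ++i;
    } else if (next < values.size()) {
      result += values[next++];    // substitute the next argument
      ++i;                         // skip the conversion letter
    } else {
      result += fmt[i];            // no argument left: keep the text verbatim
    }
  }
  return result;
}
```

With this, a translated string containing one conversion too many yields slightly odd output instead of undefined behavior; for example, safe_format(cat.ngettext(...), number, number - 1) would be a drop-in for the printf call above under these assumptions.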

Usage

Loading and Construction

When you load translations with spirit_po the loading process is entirely in your hands and you can make it work however you like. A catalog can be constructed using one of three methods:

(You should use one of these rather than using the ctor directly.)

If the po content is malformed, one of two things will happen (configurable):

Looking Up Strings (Messages)

The spirit_po::catalog object has four methods which are part of the gettext specification: gettext, ngettext, pgettext, and npgettext.

These implement basic message lookup, plural-forms lookup, contextual lookup, and plural-forms-with-context lookup, respectively. See the GNU gettext docs for details.

As in GNU libintl, the string pointers which are returned are non-owning pointers. When a translated form is found in the catalog, the returned pointer points to a string owned by the catalog. When a translated form is not found, the returned pointer is one of the arguments. This is maximally efficient when using gettext with _ macros and such, where the input parameters will be string literals with static storage duration. However, in the general case, if the input pointer becomes invalid, then the output pointer may become invalid also.

We also give equivalent, alternate versions of these which return std::string and take const std::string & in place of const char * as parameters. In some scenarios (e.g. if you must make a copy of the output anyway, and the input string is already held by a std::string) these versions may actually be more efficient, and the lifetime of the result is unambiguous.

They are otherwise equivalent.
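The difference between the two flavors can be sketched with a toy stand-in. This is not spirit_po's implementation: toy_catalog and the method name gettext_str are assumptions made for the example, and only the pointer behavior described above is mimicked.

```cpp
#include <string>
#include <unordered_map>

// Hypothetical stand-in for a catalog, used only to show result lifetimes.
class toy_catalog {
  std::unordered_map<std::string, std::string> translations_;

public:
  void insert(const std::string & msgid, const std::string & msgstr) {
    translations_[msgid] = msgstr;
  }

  // Non-owning result: points into the catalog on a hit, and is the msgid
  // argument itself on a miss. It dangles if whatever it points to goes away.
  const char * gettext(const char * msgid) const {
    auto it = translations_.find(msgid);
    return it != translations_.end() ? it->second.c_str() : msgid;
  }

  // Owning result: always a separate std::string, so its lifetime does not
  // depend on the catalog or on the input.
  std::string gettext_str(const std::string & msgid) const {
    auto it = translations_.find(msgid);
    return it != translations_.end() ? it->second : msgid;
  }
};
```

The hazard with the pointer flavor is a call like cat.gettext(some_temporary_string.c_str()): on a miss the result aliases the temporary's buffer, so it must not outlive that temporary.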

Managing multiple catalogs

In the simplest setup, you would have one application, and the set of all strings it needs to translate. If there are n languages you support, you might have one .pot file (po-template which you give to translators) and get back n different .po files. Your application would then load only one of them at a time based on the locale.

In many GNU programs, the situation is more complicated. The application may be broken up into components, and the strings from different components classified into different "textdomains". Then, there would be one .pot file for each textdomain. If there are m textdomains and n languages, you would have n * m different .po files.

In the traditional libintl C-interface, textdomains are, like locale, handled by a global variable. The libintl library manages the loading of catalogs and textdomains, which are accessed by gettext and friends. At any time, a global function may be called to change the current locale or textdomain, which are stored in global variables. So, if for instance your program has a UI module and a UI textdomain, the UI code would set the textdomain when it is entered, and then call gettext on each string it needs. Then when you enter another module, you would bind a different textdomain, and then make gettext calls for those strings.

There are various problems that I've experienced when writing programs that use this interface.

When using spirit_po, you should understand that spirit_po is not attempting to emulate the entire libintl interface. A spirit_po::catalog is only a single catalog, corresponding to a single po file loaded into memory. It doesn't have a dcgettext method like libintl does, for instance, because a catalog only represents one textdomain.

If you don't need multiple textdomains, I recommend that you avoid them. (AFAIK the main advantage of having multiple textdomains is that if you have a large number of strings, you can assign different textdomains to different translators, to divide up the work. Many translators and translation management platforms already support having multiple textdomains for a project, and giving progress reports for each one. That is, I think this is primarily done for convenience of translators rather than programmers.)

If you do need multiple textdomains, I recommend that you throw together your own mechanism for this. For instance, if you are already using textdomains and libintl and want to switch to spirit_po, you can have a global std::map<std::string, spirit_po::catalog>. You can provide a function like dcgettext which dereferences this map at the correct textdomain, or manage the textdomain in a global variable yourself, or both. Coding this up is straightforward.
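As a rough sketch of such a mechanism, the registry below maps textdomain names to catalogs and falls back to the msgid when the domain or the string is missing. All names here (domain_registry, lookup, set_default_domain, stub_catalog) are hypothetical; the only thing assumed about the catalog type is a gettext(const char *) const member and that it can be moved into a std::map.

```cpp
#include <map>
#include <string>
#include <utility>

// Hypothetical stand-in so the sketch compiles on its own; any type with a
// gettext(const char *) const member could be used instead.
struct stub_catalog {
  std::map<std::string, std::string> entries;

  const char * gettext(const char * msgid) const {
    auto it = entries.find(msgid);
    return it != entries.end() ? it->second.c_str() : msgid;
  }
};

template <typename Catalog>
class domain_registry {
  std::map<std::string, Catalog> catalogs_;
  std::string default_domain_;

public:
  void add(const std::string & domain, Catalog cat) {
    catalogs_.emplace(domain, std::move(cat));
  }

  // Plays the role of the "current textdomain", but as state you own.
  void set_default_domain(const std::string & domain) { default_domain_ = domain; }

  // Roughly dcgettext without the category argument: look in one domain.
  const char * lookup(const std::string & domain, const char * msgid) const {
    auto it = catalogs_.find(domain);
    return it != catalogs_.end() ? it->second.gettext(msgid) : msgid;
  }

  // Look in the default domain.
  const char * lookup(const char * msgid) const {
    return lookup(default_domain_, msgid);
  }
};
```

Whether one such registry is a global, a member of some application object, or passed around explicitly is exactly the kind of decision this leaves to you.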

If your program has multiple threads, you can have one such map for each thread, and put it in thread-local storage. Or it may be that your main thread needs multiple textdomains, but the other threads each use at most one textdomain, which would make things simpler. You could then pass each one a catalog object when it starts up, or whatever.

Basically, I don't want to make such architectural decisions for you. spirit_po is focused just on parsing po-files and handling the queries; I don't want to force you to use global variables or broken filesystem functions. So, we don't completely reproduce the libintl interface.

In the interest of being flexible, spirit_po also allows you to merge compatible catalogs together into one master catalog. This is an alternate approach -- if the translation team wants to have multiple textdomains for their convenience, it doesn't mean the programmers have to think about multiple textdomains all the time as well.

However, merging catalogs has its own pitfalls. What happens if two catalogs contain the same string? One of them gets discarded, but which one is essentially arbitrary. You can give the catalog object a "warning channel" where it can send warnings when that happens. But it's hard to actually handle the problem or fix it. Also, merging catalogs requires dropping the metadata of one of them, and some other features, like getting the line number at which a string appeared in the po file, no longer make sense, because when we merge we don't keep track of which file a string came from. So, merging catalogs might be appropriate for some projects, but I certainly wouldn't recommend it for every project.
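The duplicate problem is easy to see in a stripped-down model. The sketch below merges plain msgid-to-msgstr maps and reports collisions through a callback; it does not use spirit_po's merge facility, and merge_into, message_map and warning_channel are names invented for the example. The "first one wins" policy is just one arbitrary choice.

```cpp
#include <functional>
#include <map>
#include <string>

using message_map = std::map<std::string, std::string>;
using warning_channel = std::function<void(const std::string &)>;

// Merges `other` into `master`. On a duplicate msgid the entry already in
// `master` is kept, the one from `other` is dropped, and a warning is sent
// if a channel was provided.
inline void merge_into(message_map & master, const message_map & other,
                       const warning_channel & warn) {
  for (const auto & entry : other) {
    const bool inserted = master.insert(entry).second;
    if (!inserted && warn) {
      warn("duplicate msgid dropped during merge: \"" + entry.first + "\"");
    }
  }
}
```

Note that the warning can tell you a collision happened, but by that point nothing in the merged map records which file either translation came from.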

Other functions

Some less commonly useful accessors

Customization points

spirit_po::catalog is a template, and you may customize it in two ways.

Licensing and Distribution

spirit-po is open-source software available under the Boost software license.

Dependencies

Compiler Support

spirit-po has been tested with

See .travis.yml and appveyor.yml for info about our CI.

Tests

The test/ folder contains the unit tests, built with boost.build. (There is also an old Makefile that should still work.)

The test_libintl/ folder contains the validation tests against libintl, built using cmake. To add new validation test cases, just drop new .po files in the folder test_libintl/po/.

Acknowledgements

The author thanks David White, Kristina Simpson, and others for conversations which informed the creation of this library.