
This Python module provides an API with data about languages/regions/scripts for use in the language-support categorization of the font families in the Google Fonts collection.

You can also access the raw textproto files directly in the Lib/gflanguages/data directory.

Most of the code in this project was copied from the gftools repository (https://github.com/googlefonts/gftools/) so that language/region/script data is easily available to all our tools without pulling in the large dependency tree of gftools. The most immediate user of this module is Font Bakery, which needs to validate language support in the font binaries it checks (see https://github.com/googlefonts/fontbakery/issues/3605).

The second obvious user of this gflanguages module is gftools itself.

Language/region/script definitions and the gflanguages module are used as a subtree in the google/fonts repo, in its lang/ directory (https://github.com/google/fonts/tree/main/lang).

This module is the main place to update these definitions, avoiding data duplication and guaranteeing uniformity across tools.

To learn more about how lang metadata affects downstream tools, see gf-guide/lang.

Sample text rules

If there is a sample_text field for a language, it should contain all of the following fields:

masthead_full: show off four glyphs
masthead_partial: show off two glyphs
styles: a phrase of 40-60 characters
tester: a phrase of 60-90 characters
poster_sm: a word or phrase of 10-17 characters
poster_md: a word or phrase of 6-12 characters
poster_lg: a word or phrase of 3-8 characters
specimen_48: a sentence of 50-80 characters
specimen_36: a paragraph of 100-120 characters
specimen_32: a paragraph of 140-180 characters
specimen_21: one or more paragraphs totalling 300-500 characters
specimen_16: one or more paragraphs totalling 550-750 characters
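
The length ranges above can be checked mechanically. The following is a minimal sketch, not part of gflanguages: the names LENGTH_RULES, PRESENCE_ONLY and check_sample_text are hypothetical, the sample text is assumed to be available as a plain dict of field name to string, and "characters" is assumed to mean Unicode code points as counted by Python's len(), which may differ from how the project counts them. The two masthead fields are described in glyphs rather than characters, so the sketch only checks that they are present.

```python
# Character-count ranges copied from the field list above (inclusive bounds).
LENGTH_RULES = {
    "styles": (40, 60),
    "tester": (60, 90),
    "poster_sm": (10, 17),
    "poster_md": (6, 12),
    "poster_lg": (3, 8),
    "specimen_48": (50, 80),
    "specimen_36": (100, 120),
    "specimen_32": (140, 180),
    "specimen_21": (300, 500),
    "specimen_16": (550, 750),
}

# Masthead fields are specified in glyphs, so only their presence is checked.
PRESENCE_ONLY = ("masthead_full", "masthead_partial")


def check_sample_text(sample):
    """Return a list of problems found; an empty list means none were found.

    `sample` is a hypothetical plain dict mapping field name to string.
    """
    problems = []
    for field in PRESENCE_ONLY:
        if not sample.get(field):
            problems.append(f"{field}: missing or empty")
    for field, (low, high) in LENGTH_RULES.items():
        text = sample.get(field)
        if not text:
            problems.append(f"{field}: missing or empty")
            continue
        # Assumption: length is counted in Unicode code points.
        length = len(text)
        if not low <= length <= high:
            problems.append(f"{field}: {length} characters, expected {low}-{high}")
    return problems


# Usage: build a placeholder sample that sits on each lower bound, then break one field.
sample = {name: "x" * low for name, (low, _) in LENGTH_RULES.items()}
sample.update(masthead_full="AaBb", masthead_partial="Aa")
print(check_sample_text(sample))
sample["poster_lg"] = "much too long"
print(check_sample_text(sample))
```

Returning a list of problems rather than raising on the first failure lets a contributor see every field that needs adjusting in one pass.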

Generally, the sample text should be taken from the Universal Declaration of Human Rights (UDHR); if using Eric Muller's XML translations, snippets/lang_sample_text.py will convert the XML into textproto.

If the UDHR is not available in the language, the sample text should be a "neutral" text (not political or religious) - folk tales are generally good sources. (We recognise that for some liturgical languages, religious texts may be the only extant samples.) In these cases, please add a note: field with the source of the sample text.