Cross-Linguistic Transcription Systems

This repository provides the data underlying the "cross-linguistic transcription systems" project (CLTS [siː ɛl tʰiː ɛs]), which offers transcription systems and transcription data for various sources. Please see CONTRIBUTING.md for more information on how to contribute.

Master data

This repository contains files that are generated by running commands from the pyclts package, intended to help with curation. Thus, it is important to know where master (or authoritative) copies of certain data types live (i.e. where to edit data).

CLDF Dataset

CLDF Metadata: cldf-metadata.json

Sources: data/references.bib

The Cross-Linguistic Transcription Systems (CLTS) project provides a catalog of speech sounds aggregated from (and linked to) phonetic notation systems from various sources.

| property | value |
| --- | --- |
| dc:conformsTo | CLDF Generic |
| dc:identifier | https://doi.org/10.5281/zenodo.3515744 |

<a name="table-sourcesindextsv"></a>Table sources/index.tsv

CLTS is compiled from information about transcriptions and how these relate to sounds from many sources, such as phoneme inventory databases like PHOIBLE or relevant typological surveys.

| property | value |
| --- | --- |
| dc:extent | 33 |

Columns

| Name/Property | Datatype | Description |
| --- | --- | --- |
| NAME | string | Primary key |
| DESCRIPTION | string | |
| REFS | list of string (separated by `, `) | References data/references.bib::BibTeX-key |
| TYPE | string<br>Valid choices:<br> `td` `ts` `sc` | CLTS groups transcription information into three categories: transcription systems (ts), transcription data (td), and sound class systems (sc). |
| URITEMPLATE | string | Several CLTS sources provide an online catalog of the graphemes they describe. If this is the case, the URI template specified in this column was used to derive the URL column in data/graphemes.tsv. |
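
As a rough illustration of how a URL could be derived from a URITEMPLATE value, the sketch below expands a template that uses simple `{KEY}` placeholders. The placeholder syntax, the example template, and the helper name `expand_uri_template` are assumptions made for this example; consult sources/index.tsv for the templates actually in use.

```python
import re
from urllib.parse import quote


def expand_uri_template(template, values):
    """Fill each {KEY} placeholder in `template` with values[KEY].

    Assumption: templates use plain {KEY} placeholders.
    Returns None for an empty template (source without an online catalog).
    """
    if not template:
        return None

    def fill(match):
        # Percent-encode the value so it is safe inside a URL path.
        return quote(str(values[match.group(1)]), safe="")

    return re.sub(r"\{(\w+)\}", fill, template)


# Hypothetical template and value, not taken from the real data.
example_template = "https://example.org/sounds/{ID}"
example_url = expand_uri_template(example_template, {"ID": "42"})
print(example_url)
```

A source without an online catalog would simply yield `None` here, leaving the URL column empty for its graphemes.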

<a name="table-datafeaturestsv"></a>Table data/features.tsv

The feature system employed by CLTS describes sounds by assigning values for certain features (constrained by sound type). The permissible values per (feature, sound type) are listed in this table.

| property | value |
| --- | --- |
| dc:extent | 163 |

Columns

| Name/Property | Datatype | Description |
| --- | --- | --- |
| ID | string | Primary key |
| TYPE | string<br>Valid choices:<br> `consonant` `vowel` `tone` | CLTS distinguishes the basic sound types consonant, vowel, tone, and marker. Features are defined for consonants, vowels, and tones. |
| FEATURE | string | Note that CLTS features are not necessarily binary. |
| VALUE | string | |
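
The idea of permissible values per (feature, sound type) can be sketched by grouping rows of this table into a lookup. The sample rows below, including their ID format, are made up for illustration and only mirror the column layout described above; the helper names are likewise hypothetical.

```python
import csv
import io
from collections import defaultdict

# Hypothetical rows in the column layout of data/features.tsv.
SAMPLE_FEATURES = (
    "ID\tTYPE\tFEATURE\tVALUE\n"
    "f1\tconsonant\tphonation\tvoiced\n"
    "f2\tconsonant\tphonation\tvoiceless\n"
    "f3\tvowel\theight\topen\n"
)


def permissible_values(handle):
    """Map (TYPE, FEATURE) to the set of VALUEs listed for that pair."""
    allowed = defaultdict(set)
    for row in csv.DictReader(handle, delimiter="\t"):
        allowed[(row["TYPE"], row["FEATURE"])].add(row["VALUE"])
    return dict(allowed)


def is_permissible(allowed, sound_type, feature, value):
    """Check a single feature value against the lookup."""
    return value in allowed.get((sound_type, feature), set())


allowed = permissible_values(io.StringIO(SAMPLE_FEATURES))
print(is_permissible(allowed, "consonant", "phonation", "voiced"))
print(is_permissible(allowed, "vowel", "phonation", "voiced"))
```

Because features are not necessarily binary, the lookup stores a set of values per pair instead of a true/false flag.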

<a name="table-datagraphemestsv"></a>Table data/graphemes.tsv

| property | value |
| --- | --- |
| dc:extent | 81895 |

Columns

| Name/Property | Datatype | Description |
| --- | --- | --- |
| PK | integer | Primary key |
| GRAPHEME | string | Grapheme used in a particular transcription to denote a sound |
| NAME | string | The ordered concatenation of feature values of the denoted sound<br>References data/sounds.tsv::NAME |
| BIPA | string | The grapheme for the denoted sound in the Broad IPA transcription system |
| DATASET | string | Links to the source of this grapheme<br>References sources/index.tsv::NAME |
| FREQUENCY | integer | |
| URL | anyURI | URL of the grapheme in its source online database |
| IMAGE | string | Image of the typeset grapheme. |
| SOUND | string | Audio recording of the sound being pronounced. |
| EXPLICIT | string | Indicates whether the mapping of grapheme to sound was done manually (explicitly, +) or whether it was inferred from the Grapheme. |
| FEATURES | string | Features of the sound as described in the local feature system of the source dataset |
| NOTE | string | |
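
The references between tables can be checked with a few lines of code. The following is a minimal sketch, assuming tab-separated files with a header row; the sample rows and the helper names `read_tsv` and `dangling_datasets` are invented for illustration and do not reproduce the real data.

```python
import csv
import io

# Hypothetical excerpts in the column layout described above.
SAMPLE_SOURCES = "NAME\tTYPE\nsource_a\tts\n"
SAMPLE_GRAPHEMES = (
    "PK\tGRAPHEME\tDATASET\tEXPLICIT\n"
    "1\tp\tsource_a\t+\n"
    "2\tb\tsource_b\t\n"
)


def read_tsv(text):
    """Parse tab-separated text with a header row into a list of dicts."""
    return list(csv.DictReader(io.StringIO(text), delimiter="\t"))


def dangling_datasets(graphemes, sources):
    """Return the PKs of graphemes whose DATASET is not a known source NAME."""
    known = {row["NAME"] for row in sources}
    return [row["PK"] for row in graphemes if row["DATASET"] not in known]


sources = read_tsv(SAMPLE_SOURCES)
graphemes = read_tsv(SAMPLE_GRAPHEMES)

# Rows mapped manually carry "+" in EXPLICIT; others were inferred.
explicit = [row["GRAPHEME"] for row in graphemes if row["EXPLICIT"] == "+"]
dangling = dangling_datasets(graphemes, sources)
print(explicit, dangling)
```

The same pattern would apply to the NAME column, which references data/sounds.tsv::NAME.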

<a name="table-datasoundstsv"></a>Table data/sounds.tsv

| property | value |
| --- | --- |
| dc:extent | 8765 |

Columns

| Name/Property | Datatype | Description |
| --- | --- | --- |
| ID | string | |
| NAME | string | Ordered list of features + sound type<br>Primary key |
| FEATURES | list of string (separated by a space) | Ordered list of feature values for the sound.<br>References data/features.tsv::ID |
| GRAPHEME | string | CLTS chooses the BIPA grapheme as the canonical representative of the graphemes mapped to a sound. |
| UNICODE | list of string (separated by `/`) | Unicode character names of the codepoints in GRAPHEME |
| GENERATED | boolean | Indicates whether the sound was inferred by our algorithmic procedure (which is active for all diphthongs, all cluster sounds, and all sounds which we do not label explicitly) or whether no inference was needed, since the sound is explicitly defined. |
| TYPE | string<br>Valid choices:<br> `consonant` `vowel` `diphthong` `tone` `cluster` | CLTS defines five sound types: consonant, vowel, tone, diphthong, and cluster. The latter two are always GENERATED. |
| NOTE | string | |
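
To show how the list-valued columns and the GENERATED rule fit together, here is a small sketch that splits FEATURES and UNICODE and flags diphthong or cluster rows that are not marked as generated. The sample rows are invented, and the set of strings treated as true for GENERATED is an assumption, since the serialization of the boolean is not stated here.

```python
import csv
import io

# Assumption: strings that count as "true" in the GENERATED column.
TRUE_VALUES = {"true", "+", "1"}

# Hypothetical rows in the column layout of data/sounds.tsv.
SAMPLE_SOUNDS = (
    "NAME\tFEATURES\tUNICODE\tGENERATED\tTYPE\n"
    "sound one\tf1 f2\tLATIN SMALL LETTER P\tfalse\tconsonant\n"
    "sound two\tf3 f4\tLATIN SMALL LETTER A / LATIN SMALL LETTER U\tfalse\tdiphthong\n"
)


def parse_sound(row):
    """Split list-valued columns and convert GENERATED to a bool."""
    parsed = dict(row)
    parsed["FEATURES"] = row["FEATURES"].split()
    parsed["UNICODE"] = [name.strip() for name in row["UNICODE"].split("/")]
    parsed["GENERATED"] = row["GENERATED"].strip().lower() in TRUE_VALUES
    return parsed


def rule_violations(sounds):
    """Names of diphthong/cluster sounds that are not marked GENERATED."""
    return [
        s["NAME"]
        for s in sounds
        if s["TYPE"] in {"diphthong", "cluster"} and not s["GENERATED"]
    ]


reader = csv.DictReader(io.StringIO(SAMPLE_SOUNDS), delimiter="\t")
sounds = [parse_sound(row) for row in reader]
violations = rule_violations(sounds)
print(violations)
```

The second sample row is deliberately inconsistent so that the check has something to report.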