<div align="center"> <img src="https://raw.githubusercontent.com/proycon/folia/master/logo.png" width="200" /> </div>

FoLiA: Format for Linguistic Annotation

Project Status: Active. The project has reached a stable, usable state and is being actively developed.

Documentation | Examples | Python Library | Python Library Documentation | C++ Library | Rust Library | FoLiA-Tools | FoLiA Utilities | FLAT: Web-based Annotation environment

by Maarten van Gompel, CLST/Radboud University Nijmegen & KNAW Humanities Cluster

https://proycon.github.io/folia

FoLiA is an XML-based annotation format, suitable for the representation of linguistically annotated language resources. FoLiA's intended use is as a format for storing and/or exchanging language resources, including corpora. Our aim is to introduce a single rich format that can accommodate a wide variety of linguistic annotation types through a single generalised paradigm. We do not commit to any label set, language or linguistic theory. These choices are always left to the developer of the language resource, which provides maximum flexibility.

XML is an inherently hierarchical format. FoLiA does justice to this by making maximal use of a hierarchical, inline setup. We inherit from the D-Coi format, which is itself loosely based on a minimal subset of TEI. Because FoLiA introduces a new and much broader paradigm, it is not backwards-compatible with D-Coi, i.e. validators for D-Coi will not accept FoLiA XML. It is, however, easy to convert FoLiA to less complex or less verbose formats such as the D-Coi format or plain text. Converters are provided.

The main characteristics of FoLiA are:

The FoLiA format makes mixed use of inline and stand-off annotation. Inline annotation is used for annotations pertaining to single tokens, whilst stand-off annotation in separate annotation layers is adopted for annotation types that span multiple tokens. This provides FoLiA with the necessary flexibility and extensibility to deal with various kinds of annotations.
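
The difference between the two styles can be illustrated with a small sketch. The fragment below is a simplified, hand-written sentence in FoLiA-like XML, not a complete or validated FoLiA document: it omits the metadata and annotation declarations a real document needs, and the class labels (`N`, `V`, `per`) are hypothetical, since FoLiA does not prescribe a label set. The part-of-speech tag sits inline inside each token, while the named entity lives in a separate layer that points back at tokens by ID. The sketch reads the fragment with Python's standard `xml.etree.ElementTree` rather than any FoLiA library, and the helper names `inline_pos` and `span_entities` are made up for this example.

```python
import xml.etree.ElementTree as ET

# Namespace used by FoLiA elements, and the expanded name of the xml:id attribute.
NS = {"f": "http://ilk.uvt.nl/folia"}
XML_ID = "{http://www.w3.org/XML/1998/namespace}id"

# Simplified illustrative fragment (assumption: not a full, valid FoLiA document).
FRAGMENT = """
<s xmlns="http://ilk.uvt.nl/folia" xml:id="example.s.1">
  <w xml:id="example.s.1.w.1"><t>John</t><pos class="N"/></w>
  <w xml:id="example.s.1.w.2"><t>Smith</t><pos class="N"/></w>
  <w xml:id="example.s.1.w.3"><t>sleeps</t><pos class="V"/></w>
  <entities>
    <entity xml:id="example.s.1.entity.1" class="per">
      <wref id="example.s.1.w.1" t="John"/>
      <wref id="example.s.1.w.2" t="Smith"/>
    </entity>
  </entities>
</s>
"""


def inline_pos(sentence):
    """Collect (token text, PoS class) pairs from the inline annotations."""
    pairs = []
    for word in sentence.findall("f:w", NS):
        text = word.find("f:t", NS).text
        pos = word.find("f:pos", NS).get("class")
        pairs.append((text, pos))
    return pairs


def span_entities(sentence):
    """Resolve each entity in the stand-off layer to the tokens it references."""
    # Index the tokens by ID so the word references can be looked up.
    words = {
        word.get(XML_ID): word.find("f:t", NS).text
        for word in sentence.findall("f:w", NS)
    }
    spans = []
    for entity in sentence.findall("f:entities/f:entity", NS):
        tokens = [words[ref.get("id")] for ref in entity.findall("f:wref", NS)]
        spans.append((entity.get("class"), tokens))
    return spans


sentence = ET.fromstring(FRAGMENT.strip())
print(inline_pos(sentence))
print(span_entities(sentence))
```

Because the entity refers to its tokens by ID instead of wrapping them, further span layers could be added to the same sentence without restructuring the tokens themselves.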

Notable features are:

Paradigm Schema

<div align="center"> <img src="https://github.com/proycon/folia/blob/master/docs/folia_paradigm2.png" width="800" /> </div>

Resources

A more extensive list of FoLiA-capable software is maintained on the FoLiA website (https://proycon.github.io/folia).

Publications

See the FoLiA website for more publications and full-text links.