FROC-MSS: Old French and Old Occitan Medieval Manuscripts HTR Data and Models

This repository contains:

If you plan to use this data or the provided model in a publication, please cite it as:

Jean-Baptiste Camps (éd.), FROC-MSS: Old French and Old Occitan Medieval Manuscripts HTR Data and Models, Paris: École nationale des chartes (PSL), 2018, https://github.com/Jean-Baptiste-Camps/FROC-MSS.

Data format

The data is organised as follows:

Unicode NFD normalisation has been applied on the ground-truth text.
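As an illustration of what NFD normalisation does to a transcription, here is a minimal sketch using Python's standard `unicodedata` module. The helper name `to_nfd` and the sample characters are assumptions made for the example; they are not taken from the repository's tooling.

```python
import unicodedata


def to_nfd(text: str) -> str:
    """Return text in Unicode Normalization Form D (canonical decomposition)."""
    return unicodedata.normalize("NFD", text)


# A precomposed "é" (one code point) becomes "e" + a combining accent.
precomposed = "\u00e9"
decomposed = to_nfd(precomposed)
names = [unicodedata.name(ch) for ch in decomposed]

# Characters without a canonical decomposition, such as long s (ſ)
# or dotless i (ı), are left unchanged by NFD.
unchanged = to_nfd("\u017f\u0131")
```

Because accents and tildes are stored as separate combining characters after NFD, they are counted as characters of their own when errors are tallied, which is why entries such as COMBINING ACUTE ACCENT can appear in a confusion table.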

Models

Summary and C.E.R.

The root folder contains a vanilla Kraken model (model_froc.mlmodel), trained with default settings and without any additional data (e.g. no artificially noised data).

The data was randomly split into 80% for training (train.txt), 10% for in-training validation (val.txt) and 10% for final testing of the model (test.txt).
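A random 80/10/10 split of this kind can be sketched as follows. This is only an illustration under stated assumptions: the function name `split_dataset`, the fixed seed and the placeholder file names are hypothetical, and the script actually used to produce train.txt, val.txt and test.txt is not documented here.

```python
import random


def split_dataset(items, train=0.8, val=0.1, seed=42):
    """Shuffle items reproducibly and split them into train/val/test lists."""
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)
    n_train = int(len(shuffled) * train)
    n_val = int(len(shuffled) * val)
    return (
        shuffled[:n_train],
        shuffled[n_train:n_train + n_val],
        shuffled[n_train + n_val:],  # whatever remains goes to the test set
    )


# Hypothetical line-image names standing in for the real data.
lines = [f"line_{i:04d}.png" for i in range(100)]
train_set, val_set, test_set = split_dataset(lines)
```

Each resulting list could then be written out one path per line to obtain files analogous to train.txt, val.txt and test.txt.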

It achieved a C.E.R. of:

Errors and most frequent confusions on test data

There were 13540 characters and 1061 errors on test data.
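For readers who want to relate these counts to an error rate, the short sketch below derives one from the two figures above. It assumes the character error rate is defined as the number of errors divided by the number of ground-truth characters; the resulting value is computed from these counts for illustration and is not quoted from the model report.

```python
def character_error_rate(errors: int, characters: int) -> float:
    """Character error rate as errors divided by ground-truth characters."""
    if characters <= 0:
        raise ValueError("characters must be a positive integer")
    return errors / characters


# Counts reported for the test data.
cer = character_error_rate(errors=1061, characters=13540)
accuracy = 1 - cer

print(f"CER: {cer:.2%}, accuracy: {accuracy:.2%}")
# prints "CER: 7.84%, accuracy: 92.16%"
```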

Overall, the errors are as follows:

The most frequent confusions concerned spacing.

The 20 most frequent confusions are:

Errors	Ground Truth-Prediction
70	{ SPACE } - {  }
54	{  } - { SPACE }
48	{ ı } - {  }
43	{ n } - {  }
43	{ COMBINING ACUTE ACCENT } - {  }
27	{ e } - {  }
24	{ l } - {  }
24	{ u } - {  }
21	{ . } - {  }
20	{ u } - { n }
18	{ ſ } - {  }
18	{ a } - {  }
17	{ r } - {  }
14	{ t } - {  }
13	{ COMBINING TILDE } - {  }
13	{  } - { ı }
12	{ o } - { e }
12	{ o } - {  }
12	{ ı } - { m }
11	{ e } - { c }
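A confusion table like the one above pairs each ground-truth character with what the model predicted, using an empty slot for insertions and deletions. The sketch below shows one way to count such pairs with Python's standard `difflib`. It is an assumption-laden illustration: the alignment used to produce the table above may differ from `difflib`'s, and the function name `count_confusions` and the sample strings are hypothetical.

```python
from collections import Counter
from difflib import SequenceMatcher


def count_confusions(ground_truth: str, prediction: str) -> Counter:
    """Count (ground truth, prediction) character pairs that differ.

    An empty string on one side marks a deletion or an insertion.
    """
    confusions = Counter()
    matcher = SequenceMatcher(None, ground_truth, prediction, autojunk=False)
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            continue
        gt, pred = ground_truth[i1:i2], prediction[j1:j2]
        # Pair characters position by position, padding the shorter side.
        for k in range(max(len(gt), len(pred))):
            g = gt[k] if k < len(gt) else ""
            p = pred[k] if k < len(pred) else ""
            confusions[(g, p)] += 1
    return confusions


# Hypothetical example: "u" read as "n", and a final "r" dropped.
confusions = count_confusions("un mur", "nn mu")
for (gt_char, pred_char), n in confusions.most_common():
    print(f"{n}\t{{ {gt_char} }} - {{ {pred_char} }}")
```

Summing the counts over all test lines and sorting by frequency would give a table in the same "Errors / Ground Truth-Prediction" layout.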

List of manuscripts

The data comes from partial allographetic transcriptions of the following manuscripts:

For these transcriptions, see: Jean-Baptiste Camps, La ‘Chanson d’Otinel’: édition complète du corpus manuscrit et prolégomènes à l’édition critique, PhD thesis, dir. Dominique Boutet, Paris-Sorbonne, 2016, DOI: https://doi.org/10.5281/zenodo.1116735.

<!-- TODO: to be completed with the other manuscripts: Vatican, Mende, … -->

License

<a rel="license" href="http://creativecommons.org/licenses/by/4.0/"><img alt="Creative Commons License" style="border-width:0" src="https://i.creativecommons.org/l/by/4.0/88x31.png" /></a><br />This work is licensed under a <a rel="license" href="http://creativecommons.org/licenses/by/4.0/">Creative Commons Attribution 4.0 International License</a>.

Contribute

If you want to contribute training data or models, you can do so by cloning the repository and sending us a pull request, or by sending an email to jbcamps at hotmail.com.
