Datasets
Medieval Dataset
The following datasets are included in the current version of the HuggingFace dataset, as described in the CATMuS Medieval paper.
All data were passed through Choco Mufin. Any attempt to replicate the HuggingFace dataset should use the conversion table provided with each dataset. The conversion table for the modified dataset will be made available.
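As a rough illustration of the replication step, the sketch below applies a character conversion table to a transcription line. It is not Choco Mufin's own interface. The CSV layout (a `char` column and a `replacement` column), the helper names, and the two sample mappings are assumptions made for this example, and the real tables shipped with each dataset may use different columns or multi-character entries.

```python
import csv
import io


def load_conversion_table(csv_text):
    """Read a conversion table into a {source: replacement} mapping.

    Assumption: the table is CSV with `char` and `replacement` columns;
    check the actual table provided with each dataset.
    """
    reader = csv.DictReader(io.StringIO(csv_text))
    return {row["char"]: row["replacement"] for row in reader}


def apply_conversion(line, table):
    """Replace each character that has a table entry; keep all others as-is."""
    return "".join(table.get(ch, ch) for ch in line)


# Hypothetical two-entry table, for illustration only.
SAMPLE_TABLE = "char,replacement\nſ,s\nꝛ,r\n"

table = load_conversion_table(SAMPLE_TABLE)
normalized = apply_conversion("ſeignoꝛ", table)
print(normalized)  # prints "seignor"
```

This per-character lookup only covers single code points. A table containing sequences or patterns would need a different matching strategy.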
language | source | version |
---|---|---|
Latin | DEEDS-Project/htr-dataset | v0.0.8 |
Latin | HTRomance-Project/medieval-latin | v0.0.8 |
Spanish Languages | HTRomance-Project/middle-ages-in-spain | v0.0.6 |
Old/Middle French | HTRomance-Project/medieval-french | v0.0.9 |
Italian Languages | HTRomance-Project/medieval-italian | v1.0.2 |
Old/Middle French | HTR-United/cremma-medieval | v2.0.1 |
Latin | HTR-United/cremma-medieval-lat | v0.1.2 |
Old/Middle French | Gallicorpora/HTR-imprime-gothique-16e-siecle | v0.0.19 |
Old/Middle French | Gallicorpora/HTR-MSS-15e-Siecle | v0.0.37 |
Old/Middle French | Gallicorpora/HTR-incunable-15e-siecle | v0.0.29 |
Old/Middle French | ciham-htr/fabliaux | v0.0.22 |
Old/Middle French | ciham-htr/liber | v0.0.5 |
Latin/Italian/French | Reorganized from HN2021 Boccace | last |
Old/Middle French | Reorganized [Decameron-Fr] | last |
Latin | Reorganized malamatenia/Eutyches | last |
Spanish Languages | Reorganized & Augmented Gille-Levenson's PhD Data | last |
Latin | Adapted rescribe/carolineminuscule-groundtruth | 2023-10-03 |
Vocabulary:
- Adapted: Images were replaced or rescaled
- Reorganized: Data were reorganized in a way that made them more usable
- Augmented: Data whose digitizations were not publicly available were used to extract lines.