[WIP]

General principles


Graphemic transcription principles

Our task [1] is to find a way to translate how a text is delivered on its original medium into a machine-readable system that facilitates its learning. Our solutions will inevitably be reductive and fundamentally interpretative, since the limited character set available to a computer cannot render the full variety of handwriting. To characterize our approach, we use the definitions given by Peter Robinson and Elizabeth Solopova [2]:

  • Graphetic transcription [3] aims to give access to all the forms of each letter or sign, also called allographs.
  • Graphemic transcription [4], on the other hand, preserves the sequence of letters and reduces each form to its meaning in an alphabetical system.

To propose an accessible transcription system, we rejected the idea of producing graphetic transcriptions, as we felt it would be impossible to give general recommendations for transcriptions of this type covering all medieval documents from the 10th to the 15th century. Moreover, some allographs are so similar to each other, such as "u" and "n", or "ſ" (long s) and "f", that a transcription relying solely on the shape of the letter could represent them with the same character, disconnecting the transcription from the meaning of the word in favor of the shape of the sign. Pushing the imitation too far would risk making the transcription impossible to complete and unusable. This is why we chose to adopt graphemic transcription principles.
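As an illustration of the graphemic principle, the following minimal Python sketch reduces two allographs to the letters they stand for. The mapping table and the function name `to_graphemic` are hypothetical and chosen only for this example; they are not the project's official character table.

```python
# Hypothetical allograph-to-grapheme table, for illustration only.
# It is NOT the official mapping of these guidelines.
ALLOGRAPH_TO_GRAPHEME = {
    "\u017f": "s",  # ſ LATIN SMALL LETTER LONG S
    "\ua75b": "r",  # ꝛ LATIN SMALL LETTER R ROTUNDA
}

# Build the translation table once; keys must be single characters.
_TABLE = str.maketrans(ALLOGRAPH_TO_GRAPHEME)


def to_graphemic(text: str) -> str:
    """Replace each known allograph with the letter it stands for."""
    return text.translate(_TABLE)


if __name__ == "__main__":
    # A graphetic rendering keeps the long s and the r rotunda;
    # the graphemic rendering keeps only the letters.
    print(to_graphemic("\u017feigno\ua75b"))
```

Characters absent from the table pass through unchanged, so the sequence of letters is preserved while only the letter shapes are normalized.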

Favoring Unicode's public domain

Unicode's public domain, that is, its standard characters, is to be preferred to the various private domains (called Private Use Areas), even when these have been defined by initiatives such as MUFI (Medieval Unicode Font Initiative). When a project nevertheless chooses to use Private Use characters, this must be clearly indicated in the project's documentation.
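One way to check a transcription against this recommendation is to look for characters whose Unicode general category is "Co" (Private Use). The sketch below uses Python's standard `unicodedata` module; the function name `private_use_chars` is an assumption made for this example.

```python
import unicodedata


def private_use_chars(text: str) -> list[tuple[int, str]]:
    """Return (offset, code point) for every Private Use character in text."""
    return [
        (i, f"U+{ord(ch):04X}")
        for i, ch in enumerate(text)
        # "Co" is the Unicode general category for Private Use code points.
        if unicodedata.category(ch) == "Co"
    ]


if __name__ == "__main__":
    # U+E000 is the first code point of the Private Use Area in the
    # Basic Multilingual Plane; it stands in here for any private character.
    sample = "ab\ue000c"
    for offset, code_point in private_use_chars(sample):
        print(f"Private Use character {code_point} at offset {offset}")
```

An empty result means the text relies only on standard Unicode characters; a non-empty one lists what the project's documentation would need to declare.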

Notes

  1. This work is a synthesis of the guide written by Ariane Pinche, "Guide de transcription pour les manuscrits du Xe au XVe siècle", 2022, hal-03697382, and the work of Thibault Clérice, Malamatenia Vlachou-Efstathiou and Alix Chagué, "CREMMA Medii Aevi: Literary manuscript text recognition in Latin", Journal of Open Humanities Data, 2023, 9, p. 4, doi: 10.5334/johd.97.

  2. Robinson and Solopova use the term graphetic: Robinson, Peter, and Elizabeth Solopova. "Guidelines for Transcription of the Manuscripts of the Wife of Bath's Prologue". In The Canterbury Tales Project Occasional Papers, 19-52, 1993, doi: 10.5281/zenodo.4050360.

  3. Dominique Stutzmann uses the term allographétique here. See Stutzmann, Dominique. "Paléographie statistique pour décrire, identifier, dater... Normaliser pour coopérer et aller plus loin ?" In Codicology and Palaeography in the Digital Age, 2:34, 2010; see also Camps, Jean-Baptiste. "La Chanson d'Otinel : édition complète du corpus manuscrit et prolégomènes à l'édition critique", 2016.

  4. Stutzmann uses the term graphématique. Op. cit.