Letters and numbers
The source MUST be transcribed following the letters used in the source: there MUST NOT be any editorial corrections.
Ramist letters ("u"/"v", "i"/"j")
In medieval documents
Ramist letters are a distinction of form, and MUST be normalized and transcribed as "i" and "u".
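As an illustration only, the medieval rule above can be sketched in Python. The function name `normalize_ramist_medieval` is hypothetical. The sketch also assumes that only lowercase "j" and "v" are remapped: this section does not say how capitals are handled in medieval documents, so uppercase letters are left untouched here.

```python
# Hypothetical helper: the guidelines do not prescribe any implementation.
# Assumption: only lowercase j/v are remapped; capitals are left as they are.
_RAMIST_LOWER = str.maketrans({"j": "i", "v": "u"})


def normalize_ramist_medieval(text: str) -> str:
    """Map lowercase j -> i and v -> u; leave every other character unchanged."""
    return text.translate(_RAMIST_LOWER)


print(normalize_ramist_medieval("vniversis jvdicibus"))  # uniuersis iudicibus
```

Such a helper would only make sense as a post-processing or validation step for medieval transcriptions. For modern, contemporary and printed sources, the rules below apply instead.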
In modern and contemporary documents
For modern and contemporary manuscripts, ramist normalization is not required: the transcription MAY instead follow modern usage, including for capital letters.
In printed sources, the transcription MUST follow the usage of the printer.
Capital letters
Three use cases are recommended:
- drop capitals or decorated initials;
- running titles;
- semantically significant letters at the beginning of a word.
The distinction between lowercase and uppercase can be difficult to make: whichever practice is chosen, the transcription of upper and lower case MUST be consistent.
Lowercase letters MUST NOT be normalized into capital letters. However, when the shape of a letter at the beginning of a word is ambiguous and could be read as lowercase where a capital is expected, it MAY be transcribed as a capital letter.
Drop capitals and decorated letters MUST be transcribed as capital letters and MUST NOT be treated as part of the text line they semantically belong to: they should be identified as drop capitals during the image segmentation phase.
However, oversized capitals extending above the baseline, in particular at the beginning of paragraphs, MUST be transcribed regardless of how much of the letter is missing from the mask's polygon.
Digits and numerals
Numbers MUST be transcribed as they appear in the document, whether as Roman or Arabic numerals.
- For Roman numerals, ramist normalization (u, v, i, j) MAY be applied following the rules specified above.
- Numbers written with superscript letters MUST follow the rules specified in the section dedicated to superscript abbreviations.
- The groups of numbers MUST be clearly separated, and the punctuation around the number MUST be retained where it exists.
In modern and contemporary documents, when the distinction between a capital I and a 1 is difficult to make (as can be the case in typewritten documents or in some manuscripts), the transcriber MAY choose the reading that makes the most sense in context.
Old-style Roman numerals M and D (sometimes transcribed as "CIↃ" and "IↃ") MUST be transcribed as M and D, respectively.
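The rule for old-style numerals can be sketched as a simple substitution in Python. This is a minimal illustration, not a prescribed tool: the function name `normalize_old_style_numerals` is hypothetical, and the sketch only covers the two spellings quoted above ("CIↃ" and "IↃ"). The longer sequence is replaced first, because "IↃ" is contained in "CIↃ".

```python
# Hypothetical helper: covers only the two sequences named in the guidelines.
# Order matters: "CIↃ" contains "IↃ", so the longer sequence is replaced first.
_OLD_STYLE_NUMERALS = (("CIↃ", "M"), ("IↃ", "D"))


def normalize_old_style_numerals(text: str) -> str:
    """Replace old-style apostrophus forms of M and D with plain letters."""
    for old, new in _OLD_STYLE_NUMERALS:
        text = text.replace(old, new)
    return text


print(normalize_old_style_numerals("CIↃ IↃ C XX"))  # M D C XX
```

Replacing "IↃ" first would turn "CIↃ" into "CD", which is why the pairs are kept in an ordered tuple rather than an unordered mapping.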