Linguistic segmentation
Spacing
Segmentation, that is, the separation of text into "words", can be purely subjective, as the notion of a word only really makes sense for printed matter. The irregularity of spacing in manuscripts has led to the following choices:
Word segmentation MUST follow a modernized segmentation, to ensure homogeneity, using the typographic space (" " [U+0020]).
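As an illustration of the rule above, here is a minimal sketch of a post-processing step that maps every whitespace character to the typographic space U+0020. The function name normalize_spaces is hypothetical, and collapsing runs of spaces into a single space is an assumption rather than something these guidelines require.

```python
import re

def normalize_spaces(line: str) -> str:
    """Replace any run of Unicode whitespace with a single U+0020.

    Assumption: runs of spaces are collapsed and the line is trimmed;
    the guidelines only require that U+0020 be the separator.
    """
    # In Python 3, \s on str matches Unicode whitespace (e.g. U+00A0, U+2009).
    return re.sub(r"\s+", " ", line).strip()

print(normalize_spaces("et\u00a0en  effet"))
```

Such a step only harmonizes the separator character; deciding where word boundaries fall remains an editorial decision.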
Exceptions and special cases
- When modern usage would like to show an elision (such as "q'il" for "qil") or doubled consonants (such as "á riva" for "arriva"), whereas the source tends to use an agglutination, agglutinations MUST be kept (we note "qil" and "arriva").
- When in doubt, word segmentation MAY follow the usage of the contemporary or normalized language of the manuscript.
- Locutions in the process of being lexicalized (such as for verbs like "enchargier"/"en chargier", "en fuir"/"enfuir" or certain locutions such as "aujourd'hui") MUST stay as close as possible to the source.
- An initial in poetry that is separated from the rest of the text MAY be transcribed separately.
Agglutinations and dragging strokes in cursive writing MUST NOT be imitated in the transcription. Therefore, we MUST transcribe "et en effet" and not "eteneffet" or "et_en_effet", even when the quill was not lifted from the paper.
Hyphenation
Hyphenation refers to the act of indicating that a word was cut off at the end of a line. It can be rare in medieval manuscripts but is frequent in modern and contemporary sources.
Hyphenation MUST be transcribed whenever it exists in the source.
The transcription MUST NOT add a hyphenation symbol to signal a word cut off at the end of a line if no hyphenation mark is present in the source.
The character "-" [U+002D] MUST be used to transcribe the hyphenation symbol, whichever symbol is traced on the source.
When the hyphenation symbol is repeated at the beginning of the next line, it should also be transcribed with "-" [U+002D].
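The hyphenation rules above could be checked or enforced with a small normalization pass. The sketch below assumes a transcription stored as a list of lines; the name normalize_hyphenation and the HYPHEN_LIKE set are hypothetical, and the set of characters it contains is only a guess at marks a transcriber might have typed, to be adjusted to the corpus.

```python
# Hypothetical set of hyphen-like marks a transcriber may have used:
# U+2010 hyphen, U+2011 non-breaking hyphen, U+00AC not sign,
# U+2E17 double oblique hyphen, U+2E40 double hyphen.
HYPHEN_LIKE = "\u2010\u2011\u00ac\u2e17\u2e40"

def normalize_hyphenation(lines):
    """Rewrite line-final and line-initial hyphenation marks as "-" (U+002D).

    No hyphen is ever added where the line carries no mark at all.
    """
    out = []
    for line in lines:
        # Mark at the end of the line: the word continues on the next line.
        if line and line[-1] in HYPHEN_LIKE:
            line = line[:-1] + "-"
        # Mark repeated at the beginning of the following line.
        if line and line[0] in HYPHEN_LIKE:
            line = "-" + line[1:]
        out.append(line)
    return out

print(normalize_hyphenation(["transcrip\u2e17", "\u2e17tion"]))
```

The pass only touches the first and last character of each line, so hyphen-like characters inside a line are left as transcribed.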
Diastoles
Sometimes, diastoles (vertical or oblique pen strokes) are drawn between two contiguous letters to indicate that they belong to different words.
Word-separating diastoles MAY be transcribed with the sign "/" [U+002F].
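For downstream processing, a transcription that records diastoles with "/" can be tokenized by treating that sign as a word boundary alongside the space. This is a minimal sketch under that assumption; the helper name split_words is hypothetical, and it presumes that "/" is used in the transcription only for diastoles.

```python
import re

def split_words(line: str):
    """Split a transcribed line into words.

    Assumption: both the space (U+0020) and the diastole sign "/" (U+002F)
    mark word boundaries, and "/" has no other use in the transcription.
    """
    return [token for token in re.split(r"[ /]+", line) if token]

print(split_words("in/principio erat"))
```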