[WIP]

Linguistic segmentation


Spacing

Segmentation, meaning the separation of text into "words", can be purely subjective, as this notion only really makes sense for printed matter. The irregularity of spacing in manuscripts has led to the following choices:

Word segmentation MUST follow a modernized segmentation, to ensure homogeneity, using the typographic space (" " [U+0020]).

Exceptions and special cases

  • When modern usage would show an elision (such as q'il for qil) or doubled consonants (such as á riva for arriva), whereas the source tends to agglutinate, the agglutinations MUST be kept (we note qil and arriva).
  • When in doubt, word segmentation MAY follow the usage of the contemporary or normalized language of the manuscript.
  • Locutions in the process of being lexicalized (such as verbs like enchargier/en chargier or en fuir/enfuir, or certain locutions such as aujourd'hui) MUST stay as close as possible to the source.
  • Initials in poetry that are separated from the rest of the text MAY be transcribed separately.

Agglutinations and dragging strokes in cursive writing MUST NOT be imitated in the transcription. Therefore, we MUST transcribe "et en effet" and not "eteneffet" or "et_en_effet", even when the quill was not lifted from the paper.
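The spacing rules above lend themselves to a simple automated check. The following is only a minimal sketch, not part of the guidelines: the function names `find_spacing_issues` and `normalize_spaces` are hypothetical, and it assumes that any Unicode space separator other than U+0020, as well as any underscore, signals a departure from the rule.

```python
import unicodedata

ALLOWED_SPACE = "\u0020"  # the only word separator the rule permits


def find_spacing_issues(line: str) -> list[tuple[int, str]]:
    """Return (index, description) pairs for characters that break the spacing rule.

    Assumption: underscores are never legitimate in a transcription line.
    """
    issues = []
    for i, ch in enumerate(line):
        if ch == "_":
            issues.append((i, "underscore used as word joiner"))
        elif ch != ALLOWED_SPACE and unicodedata.category(ch) == "Zs":
            # Zs = Unicode space separators (no-break space, thin space, ...)
            issues.append((i, f"non-standard space U+{ord(ch):04X}"))
    return issues


def normalize_spaces(line: str) -> str:
    """Replace every Unicode space separator with U+0020."""
    return "".join(
        ALLOWED_SPACE if unicodedata.category(ch) == "Zs" else ch for ch in line
    )
```

Underscores are reported rather than rewritten, since deciding whether "et_en_effet" should become "et en effet" remains the transcriber's call.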

Hyphenation

Hyphenation refers to the act of indicating that a word was cut off at the end of a line. It can be marginal in medieval manuscripts but is frequent in modern and contemporary sources.

Hyphenation MUST be transcribed whenever it exists in the source.

The transcription MUST NOT add a hyphenation symbol to signal a word break at the end of a line if no hyphenation mark is present in the source.

The character "-" [U+002D] MUST be used to transcribe the hyphenation symbol, whichever symbol is traced on the source.

When the hyphenation symbol is repeated at the beginning of the next line, it should also be transcribed with "-" [U+002D].
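As an illustration of the two rules above, here is a minimal sketch of a post-processing step that maps hyphen-like marks at line edges to "-" [U+002D]. The set `HYPHEN_LIKE` and the function `normalize_hyphenation` are hypothetical and not prescribed by these guidelines. In particular, treating "=" at a line edge as a hyphenation mark is an assumption that should be checked against the source.

```python
# Hypothetical set of marks a transcriber might have used for hyphenation.
HYPHEN_LIKE = {
    "\u2010",  # HYPHEN
    "\u2011",  # NON-BREAKING HYPHEN
    "\u2E17",  # DOUBLE OBLIQUE HYPHEN
    "\u2E40",  # DOUBLE HYPHEN
    "=",       # assumption: an equals sign standing in for a double stroke
}


def normalize_hyphenation(lines: list[str]) -> list[str]:
    """Map hyphen-like marks at the end or start of each line to U+002D.

    Surrounding whitespace is trimmed. No hyphen is ever added where the
    transcription has none, in keeping with the MUST NOT rule.
    """
    result = []
    for line in lines:
        text = line.strip()
        if text and text[-1] in HYPHEN_LIKE:
            text = text[:-1] + "-"   # mark at the end of the line
        if text and text[0] in HYPHEN_LIKE:
            text = "-" + text[1:]    # mark repeated at the start of the next line
        result.append(text)
    return result
```

Only line edges are touched, so hyphen-like characters inside a line are left for manual review.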

Diastoles

Sometimes, diastoles (vertical or oblique pen strokes) are drawn between two contiguous letters to indicate that they belong to different words.

Word-separating diastoles MAY be transcribed with the sign "/" [U+002F].