Appendix:Bulgarian hyphenation
Hyphenation and syllabification of Bulgarian on Wiktionary
These two features of pronunciation are handled by the {{bg-hyph}} template, which outputs both hyphenation and syllabification, or just "Hyphenation" if the two happen to be the same. The sections below explain what each term means.
Hyphenation
Hyphenation is a system of rules that determines the points at which a word can be broken across two lines with a hyphen. It is used in word processing, for example, to mark the boundaries at which a word can be split onto a new line and still look pleasing. By contrast, syllabification concerns dividing the word into the spoken syllables that make it up.
For Bulgarian, the rules of hyphenation are published by the Institute for Bulgarian Language in their orthographic dictionary, and codify the precepts by which a[note 1] valid hyphenation must abide.[note 2]
The rules are as follows:
- A consonant between two vowels links with the second vowel. For example, ви-со-чи-на (vi-so-či-na).
- In a sequence of two or more consonants between two vowels, at least one consonant stays with the first vowel and at least one with the second vowel. For example, сес-тра (ses-tra) and сест-ра (sest-ra).[note 3]
- Two equal consonants are separated. For example, плен-ник (plen-nik).
- In a sequence of two or more vowels, the first vowel stays before the hyphen. For example, пре-одолея (pre-odoleja) and прео-долея (preo-doleja).[note 4]
- In a sequence of three or more vowels, the last vowel stays after the hyphen. For example, мао-изъм (mao-izǎm), but not маои-зъм (maoi-zǎm).
- The letter й (j) between a vowel and a consonant stays with the vowel. For example, май-ка (maj-ka).
- When a sequence of two or more consonants follows й (j), at least one consonant links with й (j). For example, айс-берг (ajs-berg) (not ай-сберг (aj-sberg)).
- The letter й (j) between two vowels links with the second vowel. For example, ма-йор (ma-jor).
- No hyphenation before or after ь (palatalization mark).
- When the letters дж (dž) denote a single consonant, then they are not separated. For example, су-джук (su-džuk) (not суд-жук (sud-žuk)), but над-живея (nad-živeja).
- There must be at least one vowel before and after the hyphen.
- A single letter is never left alone on either side of the hyphen.
These are adapted and reproduced from this article by the University of Sofia. The rules above apply to the 1983 specification of the hyphenation standard, but they are also forward-compatible with the latest 2012 standard, which introduces the following two changes:
- Rule 5 is rescinded.
- A hyphenation that violates the above rules, but is more morphologically consistent (i.e. better separates the word on its morpheme boundaries), is allowed.
Because both changes only loosen the rules, rather than impose new requirements, our algorithm still produces valid results.
As the University of Sofia identifies, the hyphenation rules as of 1983 do not impose any requirement of morphological sense, which can make some hyphenations look strange. Please be aware that there may actually be numerous valid hyphenations of a word, but our algorithm as used on Wiktionary will only ever choose one.
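As a rough illustration, a few of the rules listed above can be checked mechanically for a single proposed break. The Python sketch below is not the code used on Wiktionary; the function name `break_problems` and its messages are hypothetical. It covers only the first, ninth, eleventh and twelfth rules in the list (counting й as a consonant, in line with the eighth), so a break that passes this check may still violate one of the others.

```python
VOWELS = set("аъоуеиюя")


def is_vowel(ch: str) -> bool:
    return ch.lower() in VOWELS


def is_consonant(ch: str) -> bool:
    # Any letter that is neither a vowel nor ь; й counts as a consonant here.
    ch = ch.lower()
    return ch.isalpha() and ch not in VOWELS and ch != "ь"


def break_problems(word: str, i: int) -> list[str]:
    """List the checked rules violated by a break between word[i-1] and word[i].

    Assumes 0 < i < len(word). An empty list means none of the four
    checked rules is violated, not that the break is valid overall.
    """
    left, right = word[:i], word[i:]
    problems = []
    # Twelfth rule: a single letter is not left alone.
    if len(left) < 2 or len(right) < 2:
        problems.append("a single letter is left alone")
    # Eleventh rule: at least one vowel on each side of the hyphen.
    if not any(map(is_vowel, left)) or not any(map(is_vowel, right)):
        problems.append("no vowel on one side of the hyphen")
    # Ninth rule: no break immediately before or after ь.
    if left[-1].lower() == "ь" or right[0].lower() == "ь":
        problems.append("break next to ь")
    # First rule: a consonant between two vowels goes with the second vowel,
    # so a break of the shape vowel + consonant | vowel is rejected.
    if (len(left) >= 2 and is_vowel(left[-2])
            and is_consonant(left[-1]) and is_vowel(right[0])):
        problems.append("consonant between vowels must join the second vowel")
    return problems


print(break_problems("височина", 2))  # ви-сочина: no checked rule is violated
print(break_problems("височина", 3))  # вис-очина: the first rule is violated
```

Checking breaks one at a time keeps each rule a small, local test; producing a single preferred hyphenation, as the template does, additionally requires choosing among the breaks that survive.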
Notes
- ^ A word may have more than one valid hyphenation.
- ^ Note also that a multi-word term can be broken on a space or hyphen, so you may consider those to be potential hyphenation points, too.
- ^ In practice, this algorithm allots one consonant to the left side of the boundary, and any other consonants to the right side, thus satisfying the rule on both sides.
- ^ We apply the same rule as for consonants, i.e. one vowel goes on the left, and subsequent vowels go on the right (before the subsequent consonant). This also satisfies rule 5 for free.
Syllabification
Syllabification is the process of breaking down a word into its spoken syllables, each of which has (in Bulgarian) a vowel in the middle, optionally with consonants before and after. For example, преодолея (preodoleja) can be syllabified as пре‧о‧до‧ле‧я. Unlike hyphenation, which focuses on where a word can be orthographically split, syllabification is concerned more with the phonetic aspect, and so follows the general phonetic rules below:
- Each syllable must have exactly one vowel.[note 1]
- A new syllable is formed when the sonority[note 2] of sounds stops decreasing.
- The sonority scale for Bulgarian is defined to be the following:
- Fricatives (в, ф, ж, ш, з, с, х): 1
- Stops (plosives; б, п, г, к, д, т) and affricates (ч, ц): 2
- Sonorants (л, м, н, р, й, ў[note 3]): 3
- Vowels (а, ъ, о, у, е, и, ю, я): 4.
- Anything else (not sounds, e.g. punctuation): 0.
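The scale above translates directly into a lookup table. The following Python sketch is only an illustration, and the names `SONORITY_CLASSES`, `sonority` and `sonority_profile` are hypothetical rather than taken from the Wiktionary implementation.

```python
# Sonority classes exactly as listed in the scale above.
SONORITY_CLASSES = {
    1: "вфжшзсх",    # fricatives
    2: "бпгкдтчц",   # stops and affricates
    3: "лмнрйў",     # sonorants (ў is the internal symbol for /w/)
    4: "аъоуеиюя",   # vowels
}

# Flatten the classes into a letter -> level mapping.
SONORITY = {ch: level for level, chars in SONORITY_CLASSES.items() for ch in chars}


def sonority(ch: str) -> int:
    """Return the sonority level of one letter; anything unlisted is 0."""
    return SONORITY.get(ch.lower(), 0)


def sonority_profile(word: str) -> list[int]:
    """Return the sonority level of every letter in the word, in order."""
    return [sonority(ch) for ch in word]


print(sonority_profile("преодолея"))  # [2, 3, 4, 4, 2, 4, 3, 4, 4]
```

Letters that the scale does not list, such as щ and ь, fall into the 0 bucket in this sketch; how they should be classified is not stated above, so that is an assumption.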
The rules above are the most pertinent ones for syllabification. We also perform some smaller adjustments, as there are times when this general process is not perfect:
- Certain prefixes, such as без- (bez-), пред- (pred-), and превъз- (prevǎz-), would be incorrectly handled by this algorithm, so we treat them specially to ensure they always appear in their correct form at the beginning of a word.
- There are also three limited cases where a sequence should be broken according to sonority rules, but in Bulgarian it isn't. These are ств, св, and вс.
- Certain consonant clusters, e.g. км, цн, тн, згн, adhere to the rising sonority principle, but are unnatural as the onset of a syllable (in Bulgarian, at least). For each consonant cluster, we make sure it is not one of these, but if it is, we break it up differently to make the syllabification more natural.
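The general sonority rule, without any of the adjustments listed above, can be sketched in Python as follows. This is one possible reading of the rule and not the code used on Wiktionary: it assumes that each syllable's onset is the longest run of letters directly before its vowel in which sonority rises strictly, and that any remaining consonants stay with the preceding syllable. The names `sonority` and `syllabify` are hypothetical.

```python
# Letter -> sonority level, following the scale given earlier in this section.
SONORITY = {
    **dict.fromkeys("вфжшзсх", 1),    # fricatives
    **dict.fromkeys("бпгкдтчц", 2),   # stops and affricates
    **dict.fromkeys("лмнрйў", 3),     # sonorants
    **dict.fromkeys("аъоуеиюя", 4),   # vowels
}


def sonority(ch: str) -> int:
    return SONORITY.get(ch.lower(), 0)


def syllabify(word: str) -> list[str]:
    """Split a word into syllables using only the rising-sonority rule.

    The prefix handling, the ств/св/вс exceptions and the unnatural-onset
    adjustments are deliberately not implemented here.
    """
    vowels = [i for i, ch in enumerate(word) if sonority(ch) == 4]
    if not vowels:
        return [word]  # vowelless words such as с are left as they are
    starts = [0]  # the first syllable takes every leading consonant
    for prev, cur in zip(vowels, vowels[1:]):
        start = cur
        # Grow the onset leftwards while sonority keeps rising toward the
        # vowel, never reaching back into the previous syllable's vowel.
        while start - 1 > prev and sonority(word[start - 1]) < sonority(word[start]):
            start -= 1
        starts.append(start)
    bounds = starts + [len(word)]
    return [word[a:b] for a, b in zip(bounds, bounds[1:])]


print("‧".join(syllabify("преодолея")))  # пре‧о‧до‧ле‧я
```

Anchoring every syllable on a vowel guarantees the first rule (exactly one vowel per syllable) by construction, which is why the sketch looks backwards from each vowel instead of scanning the word left to right.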
Notes
- ^ Words with no vowels, such as с (s), are excluded from this consideration.
- ^ Sonority can be thought of as the "loudness" of a sound, where certain sounds are considered to be louder than others. In a given syllable (in some languages, like Bulgarian and English), the sonority starts out low, and is only allowed to rise. If we look at a word and identify a part where two letters have the same sonority, or the second one has lower sonority, then the lower letter marks the beginning of a new syllable (the letter after it forms the first letter of the new syllable); it's then guaranteed that this syllable will have its own vowel, per rule 1.
- ^ This symbol represents у (u) when it acts as a consonant (/w/). It is not used in real Bulgarian, and is only used internally to denote the sound /w/.