Wiktionary:Alphabetical order

About

Some sections of Wiktionary are lists of terms that are supposed to be ordered alphabetically. The purpose of this article is to precisely define what that means. Note that this page is a work in progress, and does not have any official policy status.

Alphabet soup

In linguistics, an alphabet is a standardized set of symbols that, through phonetic rules or guides, roughly represents the sounds of a spoken natural language. That is to say, alphabetic symbols are loosely associated with phonemes, sometimes more closely as with Spanish, sometimes in complex combination as with French, and sometimes illogically as with English.

Formally, an alphabet distinguishes between consonant and vowel sounds, whereas consonant symbols in an abugida such as Thai are implicitly associated with a vowel sound should none be indicated. A syllabary such as Japanese kana is also phonetic. Each symbol, or syllabogram, represents a group of phonemes, but phonetically related syllabic symbols are not related graphically. Less formally, all of these may be referred to as alphabets, especially when an order is imposed. In contrast, logogrammatical systems such as hieroglyphics, with a large number of ideograms representing many morphemes, are not considered alphabets even in a loose sense. Such systems have highly specialized rules of ordering, if any.

Languages are grouped into classes of nearly identical alphabets called scripts. English and most other European languages are written in Roman script, also called Latin script. Other widely shared scripts include Arabic, Chinese, Cyrillic, and Devanāgarī. Confusingly, these scripts are sometimes also referred to as alphabets. However, the alphabet itself, and the rules for ordering it, may vary between languages that use the same script. Many other scripts, such as Javanese, are used with only one or a handful of very closely related languages.

The symbols of an alphabet for a particular language are called letters. These are classes of various letterforms, or graphical signs, that are interchangeable in a given language. For instance, the two glyphs a and ɑ, one with a hook at top and one without, represent the same letter (a) in the English alphabet. The two glyphs и and u are different stylizations of the same letter (i) in the Serbian alphabet.

Many languages also distinguish an attribute of letters known as case, especially between majuscule (upper case) and minuscule (lower case). For instance, the glyph B is the majuscule of β (beta) in Greek and of в (ve) in Russian. A different case may be substituted based on rules of capitalization for a given language. Each case exists uniquely for every letter of an alphabet, with rare exceptions such as the minuscule ſ (long s) and likewise ß (sharp s) in German.
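The exceptions mentioned above can be observed in the default case mappings that Python's built-in string methods apply. The following is only a small illustration; the helper name `round_trips` is invented for this sketch, and the default mappings are not language-specific rules of capitalization.

```python
def round_trips(letter):
    """Return True if upper-casing then lower-casing gives the letter back."""
    return letter.upper().lower() == letter


# An ordinary letter has exactly one majuscule and one minuscule form.
print(round_trips("a"))        # True

# The long s shares its majuscule with the ordinary s, so it is lost.
print("\u017f".upper())        # S
print(round_trips("\u017f"))   # False

# The sharp s has no single-letter majuscule in the default mapping.
print("\u00df".upper())        # SS
print(round_trips("\u00df"))   # False
```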

The order of the alphabet helps to establish a lexical order on the language, but often with exceptions. Language rules may consider certain combinations, even if they are not linked visually, to be a single letter or digraph, such as the seventh letter dz of the Hungarian alphabet, or the archaic ฦๅ (lo lue) in Thai. In many cases tonal marks, diacritics such as the accent, and various other symbols are not counted as part of the letter, and would be considered secondarily in ordering words. Additionally, often letters that are not part of the standard alphabet nonetheless find their way into modern formal writing. For certain languages, lexical comparisons normalize foreign and obsolete letters as native or more contemporary ones. For instance, ç and ï equate to c and i in English, omitting the diacritical marks, and the single ligature æ to the two letters ae.
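As a rough illustration of treating a combination such as dz as a single letter, the sketch below splits a Hungarian word into letters by greedy longest match. The function name `split_letters` and the greedy strategy are assumptions of this example. A purely mechanical split like this cannot tell a true digraph from two separate letters that merely happen to be adjacent, so it should be read as an approximation only.

```python
# Multi-character letters of the Hungarian alphabet.
HUNGARIAN_MULTIGRAPHS = ("dzs", "cs", "dz", "gy", "ly", "ny", "sz", "ty", "zs")


def split_letters(word, multigraphs=HUNGARIAN_MULTIGRAPHS):
    """Split a word into letters, preferring the longest multigraph."""
    ordered = sorted(multigraphs, key=len, reverse=True)
    lowered = word.lower()
    letters = []
    i = 0
    while i < len(word):
        for candidate in ordered:
            if lowered.startswith(candidate, i):
                # Keep the original spelling (and case) of the matched letter.
                letters.append(word[i:i + len(candidate)])
                i += len(candidate)
                break
        else:
            # No multigraph starts here, so this is a single-character letter.
            letters.append(word[i])
            i += 1
    return letters


print(split_letters("edzés"))     # ['e', 'dz', 'é', 's']
print(split_letters("dzsungel"))  # ['dzs', 'u', 'n', 'g', 'e', 'l']
```

Once a word is split this way, each letter can be mapped to its position in the alphabet, so that dz sorts as one unit after d rather than as d followed by z.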

More broadly, letters are a type of grapheme, a category that also includes numerals, punctuation marks, logograms, and other symbols. Punctuation marks are those graphical signs used to structure and organize writing, which may therefore affect pronunciation. Often all non-alphanumeric symbols, including logograms such as @ (at), ∴ (therefore), and marks for currency, are considered punctuation. Punctuation more generally includes spacing, indentation, and many other aspects of writing. Its use is more prevalent in languages of European origin, and it is a modern invention in others. Punctuation is completely excluded from the highest level, or sometimes the first few levels, of lexical comparison.

Spaces are not considered graphemes, but they do count as characters, which are codes for graphemes and for control of layout. An encoding is a system for assigning these codes. Note that only under artificial circumstances could an encoding perfectly match graphemes and characters. Usually a handful of the less common graphemes will be omitted, or some similar graphemes such as the virgule (forward slash) and solidus (used in fractions) may be considered duplicative. The more extensive encodings, on the other hand, tend to have duplicate assignments for the same glyph. Unicode, in particular, does this to separate scripts that have different associations of capitalization.
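The duplicate assignments can be inspected with Python's standard `unicodedata` module. This short sketch looks up three characters that are usually drawn with the same glyph B and shows that each lowercases within its own script; the function name `describe` is made up for the example.

```python
import unicodedata


def describe(ch):
    """Return the code point, Unicode name, and default lowercase of ch."""
    return f"U+{ord(ch):04X}", unicodedata.name(ch), ch.lower()


# Latin, Greek, and Cyrillic capitals that share one glyph shape.
for ch in ("B", "\u0392", "\u0412"):
    print(describe(ch))
# ('U+0042', 'LATIN CAPITAL LETTER B', 'b')
# ('U+0392', 'GREEK CAPITAL LETTER BETA', 'β')
# ('U+0412', 'CYRILLIC CAPITAL LETTER VE', 'в')
```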

English

The English alphabet

Minuscules a b c d e f g h i j k l m n o p q r s t u v w x y z
Majuscules A B C D E F G H I J K L M N O P Q R S T U V W X Y Z

Lexical ordering

First level (left to right): end of string < Hindu-Arabic numeral < English letter or equivalent < logogram
Second level (left to right): no separation < punctuation or spacing
Third level (left to right): standard alphabetical letter < equivalent
Fourth level (left to right): minuscule < majuscule
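One possible reading of the four levels above is a sort key made of four tuples that are compared in turn. The sketch below is not an official algorithm. It assumes that letters within the first level compare in alphabet order regardless of case, that diacritics are found by Unicode decomposition, that Unicode categories P and Z count as punctuation or spacing, and that anything else which is not a digit or an English letter counts as a logogram. The names `sort_key`, `_units`, and `EXPANSIONS` are invented for the example.

```python
import string
import unicodedata

DIGITS = set("0123456789")
LETTERS = set(string.ascii_lowercase)
# Ligatures that Unicode decomposition does not expand (assumed mapping).
EXPANSIONS = {"æ": "ae", "Æ": "Ae"}


def _units(ch):
    """Return (base character, is_equivalent) pairs for one character."""
    if ch in EXPANSIONS:
        return [(base, True) for base in EXPANSIONS[ch]]
    decomposed = unicodedata.normalize("NFD", ch)
    # A decomposition longer than one code point means a diacritic was removed.
    return [(decomposed[0], len(decomposed) > 1)]


def sort_key(term):
    """Build a four-level key; Python compares the tuples level by level."""
    primary, secondary, tertiary, quaternary = [], [], [], []
    separated = False  # punctuation or spacing seen since the last element
    for ch in unicodedata.normalize("NFC", term):
        if unicodedata.category(ch)[0] in "PZ" or ch.isspace():
            separated = True  # ignored at the first level
            continue
        for base, equivalent in _units(ch):
            lower = base.lower()
            if base in DIGITS:
                primary.append((1, base))
            elif lower in LETTERS:
                primary.append((2, lower))
            else:
                primary.append((3, base))  # treated as a logogram
            secondary.append(int(separated))
            tertiary.append(int(equivalent))
            quaternary.append(int(base != lower))
            separated = False
    secondary.append(int(separated))  # trailing punctuation or spacing
    return tuple(primary), tuple(secondary), tuple(tertiary), tuple(quaternary)


terms = ["résumé", "Polish", "black bird", "ab", "resume",
         "blackbird", "a1", "polish", "black", "a"]
print(sorted(terms, key=sort_key))
# ['a', 'a1', 'ab', 'black', 'blackbird', 'black bird',
#  'polish', 'Polish', 'resume', 'résumé']
```

"End of string" needs no explicit code here, because a tuple that is a prefix of another tuple already compares as smaller in Python.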

Numerals

Hindu-Arabic 0 1 2 3 4 5 6 7 8 9

Foreign and archaic equivalents

æ Æ a + e A + e as in æsthetics
Æ A + E as in encyclopædia
ç Ç c C as in façade
é É e E as in résumé
ï Ï i I as in naïve
ñ Ñ n N as in mañana
N + o as in

Punctuation

interpunct < hyphen or dash (short to long) < apostrophe < space (thin to wide) < period/full stop < exclamation point < interrobang < question mark
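The order listed above could be stored as a simple rank table, as in the following sketch. The specific code points chosen to stand for each class (for example, which dashes and which spaces are included) are assumptions of this example, as are the names `PUNCTUATION_ORDER` and `punctuation_rank`.

```python
# Marks in ascending order; earlier entries sort first.
PUNCTUATION_ORDER = [
    "\u00b7",                            # interpunct
    "-", "\u2010", "\u2013", "\u2014",   # hyphens, then en dash, em dash
    "'", "\u2019",                       # apostrophes
    "\u2009", " ", "\u2003",             # thin space, space, em space
    ".",                                 # period / full stop
    "!",                                 # exclamation point
    "\u203d",                            # interrobang
    "?",                                 # question mark
]
RANK = {mark: position for position, mark in enumerate(PUNCTUATION_ORDER, start=1)}


def punctuation_rank(mark):
    """Rank a mark; unlisted marks sort after all listed ones."""
    return RANK.get(mark, len(RANK) + 1)


print(sorted(["?", ".", " ", "-", "'", "!"], key=punctuation_rank))
# ['-', "'", ' ', '.', '!', '?']
```

A rank of 0 is left unused so that it can stand for "no separation" if these ranks are used as second-level weights.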

See also