Wiktionary:About Contemporary Arabic

This is a Wiktionary policy, guideline or common practices page. This is a draft proposal. It is unofficial, and it is unknown whether it is widely accepted by Wiktionary editors.

Shortcut:
WT:CON AR

This is an attempt to standardize the documentation of "Contemporary Arabic" terms, i.e. terms in the contemporary Arabic varieties. See WT:About Arabic for considerations relating to Modern Standard Arabic, and WT:About Arabic/Egyptian for a page discussing transliteration of Egyptian Arabic.

Because this page is a think-tank draft, some of its proposals are nonstandard or disagreeable. The nonstandardest, disagreeablest ones are marked with the text [DISCUSS], but everything else on the page is open for discussion and editing as well.

Note that, while the goal of "unified Arabic" on Wiktionary is a good one to work toward (compare Chinese, with only one language header for all concerned varieties), this page operates under the assumption that each individual dialect or dialect group will be listed under its own heading.

Regarding entry layout

Verbs

In verbs' headword lines, we follow up the standard past-tense lemma (e.g. فَعَل (faʕal)) with the 3sg.m nonpast subjunctive (e.g. يَفْعَل (yafʕal)). For varieties where the "subjunctive" is not a distinct verb form, we instead use the least-inflected 3sg.m nonpast form accordingly. (Some current entries for Levantine or Egyptian instead show the present indicative, which has an unnecessary b- prefix.)
- A North Levantine example: رَكَد • (rakad) (nonpast يِركُد (yirkud))
Verb-conjugation tables list all possible conjugations, but cross out or highlight (in red) the implausible or unused ones.
Verb-conjugation tables list their respective varieties' "new passives" as passive-voice conjugations. These are typically derived from, or formed by analogy with, the old active-voice reflexive verb forms (e.g. تْفَعَّل (tfaʕʕal), ٱنْفَعَل (nfaʕal), ٱتْفَعَل (tfaʕal)). (Obviously, this doesn't apply to varieties that have preserved the internal-voweling passives.)
In entries for verbs that are both "new passives" and fossilized MSA loans of the aforementioned verb forms, e.g. تْصَوَّر (tṣawwar, “to imagine; or, passive of صَوَّر (ṣawwar)”), we use an Etymology 1 heading for the fossilized-loan meaning and secondary etymologies for the passive-voice meanings.

Not verbs

For prefixes, and for prepositions that normally require object-pronoun suffixes, the main entry shows the base form (with no suffix) alongside an inflection-table template. If morphological conditions cause the morpheme to change form, then there will be another suffixless entry for the other form with the text "[condition] form of [main entry]." (e.g. ب/في in Levantine)
- [DISCUSS] This works fine for examples like عِنْد (ʕind) that can function without a pronoun suffix, but it starts to get awkward at e.g. بَدّ (badd, “to want”), and it gets really awkward around هَيَّ (hayya, “here's”) and يَحّ (yaḥḥ, “here's”). Instead, perhaps the 3sg.m form should be the default for prepositions that require a suffix, giving Levantine بَدُّو (baddo, “to want”), هَيَّاه (hayyāh, “here's”), and يَحُوِّي (yaḥ(ḥ)uwwe, “here's”). The entries could list these definitions along with a qualifier that this is the 3sg.m form.

Regarding orthography

Arabic script

We do not represent /ʔi/ with ئ if it is the first radical or if it is word-initial but preceded by a prefix. Instead, we only use إ. This gives لإنو /laʔinno/ for some varieties' "because" (formerly under لئنو as the lemma, which is not preferable), متإكد /mitʔikkid/ "sure of", etc., but keeps سئيل /saʔiːl/ as is.
We spell the 3pl subject-conjugation suffix, also 1pl in the Maghreb, as ـُوا (-u) — not ـُو (-u). This isn't necessarily any "more correct", but it's an equally- or even a more-accepted spelling in most regions, so there's nothing wrong with preferring it.
The 3sg.m object suffix is spelled ـو in varieties where both (1) it is pronounced /u~o/ and (2) it is already customary among speakers to write it as ـو. Otherwise, it is spelled ـه.
Regarding verb forms incorporating the t-prefix, in varieties that elide its original /ta-/ vowel: if the resulting initial consonant cluster is allowed as is, like in Levantine تفَعَّل (tfaʿʿal), then the verb should not be written with an initial alif. But if the variety in question adds a short prosthetic vowel before this new cluster, as in Egyptian اتفعَّل (itfaʿʿal), then it is written with a hamza-less ا.

Transcription

As a rule of thumb, IPA transcription enumerates all recognizably distinct pronunciations, whereas Romanization aims to represent all possible pronunciations using as few transliterations as possible (by way of polyphony), and preferably only one transliteration.

All manners of transcription

[DISCUSS] ـِيّـ (-iyy-) and ـُوّـ (-uww-) should be transcribed as "VCC" rather than as "VVC". Morphologically they're VVC, as is plainly proven by a number of things^[1], but on the surface they're pronounced and analyzed as VCC. Proof/notes:
- The Arabic ـِيَّة (-iyya) suffix has shifted in Egyptian Arabic into something resembling /ejja/. One would expect /eːja/ were the original sequence analyzed as VVC.
- Urban North Levantine varieties show something that very much appears to be a distinction between the two: historic ـُوهَا (-ūhā) and ـِيهَا (-īhā) undergo h-elision into /-uːwa/ and /-iːja/, which aren't intuitively interpretable as /-uwwa/ and /-ijja/, whereas historic ـِيّـ (-iyy-) and ـُوّـ (-uww-) remain /ijj/ and /uww/. This gives, for example, جُوَّا (juwwa, “inside”) vs. فَرجُوَا؟ (farjūwa, “they showed her”). The other explanation would be that there's a hiatus there, but hiatus has always been verboten in Arabic to my knowledge — and native speakers in Lebanon consistently Romanize these as ⟨-iya⟩ and ⟨-uwa⟩, rarely as ⟨-ia⟩ and ⟨-ua⟩.
- Lastly, writing these patterns in this way also lets us match both Wiktionary's current MSA-Romanization scheme and the Arabic-script convention of writing a shadda on the semivowel consonant.
- ^[1] ...the "number of things" being:
  1. Dialectal "صِنِيّة (ṣiniyya) pl. صنايا (ṣanāya)", or "رِيّة (riyya) pl. روايا (rawāya)", or "صَبِيَّة (ṣabiyya) pl. صبايا (ṣabāya)", etc. — the long ā in the plural is necessarily derived from a corresponding long vowel in the singular
  2. The fact that [i] can never be stressed in a heavy syllable in Lebanese Arabic, yet the word مِيّة (hundred) has no issue with the first syllable being stressed, indicating that underlyingly it's not *[ˈmij.je] but [miːje]
We DO NOT represent word-final orthographic long vowels as phonemic (or phonetic) long vowels. They are phonetically pronounced and phonemically analyzed as short vowels — heck, they're even pronounced short in MSA, but at least declining to represent MSA pronunciation has a basis in diachronics. There is zero reason to do the same for contemporary Arabic.
- The exception is single-syllable words such as شِي شُو مَا مُو جَا هِي هُو (šī šū mā mū jā hī hū). Being monosyllabic in various lects, they are free to alternate between a long and a short final vowel.
We always double underlyingly geminate consonants, even where some writers would not double them because the gemination isn't obvious in speech. For example, Levantine "2pl want" is /baddkun/ and ⟨baddkun⟩, not /badkun/ and ⟨badkun⟩. This also goes for morphologically doubled word-final consonants.
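The word-final rule and its monosyllable exception can be sketched in a few lines of Python. This is only an illustration: the function name and the pre-split syllable input are assumptions made for the example, not an existing Wiktionary module, and the sample word ⟨katabū⟩ is a hypothetical input rather than a form taken from any entry.

```python
import unicodedata

# Long vowels that a word-final orthographic long vowel may be Romanized as,
# mapped to the short vowel we actually transcribe (assumed inventory).
SHORT = {"ā": "a", "ī": "i", "ū": "u"}

def transcribe_final_vowel(syllables):
    """Join pre-split syllables, shortening a word-final long vowel.

    Monosyllables are returned unchanged, since their final vowel is
    free to alternate between long and short.
    """
    word = unicodedata.normalize("NFC", "".join(syllables))
    if len(syllables) > 1 and word[-1] in SHORT:
        word = word[:-1] + SHORT[word[-1]]
    return word

print(transcribe_final_vowel(["ka", "ta", "bū"]))  # polysyllabic: final vowel shortened
print(transcribe_final_vowel(["šū"]))              # monosyllabic: left as is
```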
If, in some variety, there appears to be a word-initial vowel that mysteriously isn't preceded by a glottal stop when pronounced, then it really is a semivowel and we write it as such. (Currently, the Gulf Arabic entry for كَبّ (kabb) is in violation of this: the imperfective is spelled ikubb rather than ykubb.)
Stress can still be determined automatically if a long vowel is in a word's final three syllables or if a word is monosyllabic. However, we represent stress explicitly using an accent in the absence of these conditions, as it can be unpredictable otherwise. (Current Romanizations on Egyptian Arabic entries represent stress unconditionally, which looks cluttered — especially when the accent stacks on top of a vowel-length macron — and should not be necessary.)
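The condition above reduces to a small predicate. Following is a minimal Python sketch, assuming the word arrives already split into Romanized syllables with long vowels marked by a macron. The function name and input shape are illustrative assumptions, and the check follows the wording of this page literally rather than any full stress algorithm.

```python
import unicodedata

# Macron-marked vowels treated as long (assumed Romanization inventory).
LONG_VOWELS = set("āīūēō")

def needs_stress_accent(syllables):
    """Return True when the Romanization should carry an explicit accent.

    Stress is considered predictable (no accent needed) if the word is
    monosyllabic or has a long vowel in its final three syllables.
    """
    if len(syllables) <= 1:
        return False
    last_three = unicodedata.normalize("NFC", "".join(syllables[-3:]))
    return not any(ch in LONG_VOWELS for ch in last_three)

print(needs_stress_accent(["ṣa", "nā", "ya"]))  # long vowel present
print(needs_stress_accent(["fa", "ʕal"]))       # no long vowel, two syllables
print(needs_stress_accent(["šū"]))              # monosyllable
```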

IPA only

Again, IPA transcriptions enumerate all common and phonemically distinct variants within the given dialect or dialect group, where "common" is probably best defined as "recognizably present either in more than one country or all throughout a single country". This does leave out affectations like Lebanese rounding of post-emphatic /aː/, which (although recognizable) is a minority phenomenon in the big picture.

Romanization only

Our Romanization scheme is based on Wiktionary's current MSA Romanization, with the following exceptions and notes.

[DISCUSS] Use IPA ⟨ʕ⟩ ⟨ʔ⟩, not their Hans Wehr equivalents. Not only are the apostrophic Hans Wehr marks hard to see and distinguish from one another, but their use also sort of feeds the West-centric misconception that the glottal stop & pharyngeal approximant/fricative are "not really consonant sounds" not deserving of their own letters. Besides, ⟨ʕ⟩ ⟨ʔ⟩ are derived from the same apostrophes that Hans Wehr uses.
- If this is implemented, we can actually retain ⟨ʾ⟩ and have it serve a good purpose: representing a word-initial glottal stop that is elidible, but only in some of the varieties that the transliteration concerns. For example, some (but not all) North Levantine speakers have pronouns like اَنَا وَانا (ana wana, “I, and I”) rather than أَنَا وأَنَا (ʔana wʔana), but because both variants exist, "wʾana" could be used to bridge the gap. It would replace the clunkier "wʔana or wana".
- [DISCUSS] On this topic, something needs to be done about Hans Wehr ⟨ẓ⟩. Maybe. It's bad enough in MSA because it suggests a false pronunciation, but it's even worse here because there's certainly a phonemic contrast between /zˤ/ and /ðˤ/; there may even be varieties with minimal pairs between the two! However, introducing an unnecessary orthographic contrast violates the "as few transliterations as possible" goal of Romanization, so it's best to discuss what exactly is warranted here. If distinguishing the two is a good idea, then the standard solution is presumably to use ◌̣ U+0323 COMBINING DOT BELOW to create ⟨ḏ̣⟩, but that's quite ugly and likely not displayable in the same way everywhere, so this will need to be discussed as well. (It's currently used in the Gulf entry for ضبط, if we want to see it in action.) A better workaround may be ⟨ḍ̱⟩ (combining macron below) or ⟨ḏ̇⟩ (combining dot above).
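Apart from how they render, the candidate characters also differ at the code-point level, which matters for searching and for page titles. The following Python sketch uses only the standard `unicodedata` module to inspect them; the variable names and the choice of NFC normalization are assumptions made for the illustration.

```python
import unicodedata

def describe(text):
    """List the code points of `text` after NFC normalization."""
    nfc = unicodedata.normalize("NFC", text)
    return [f"U+{ord(ch):04X} {unicodedata.name(ch)}" for ch in nfc]

# Line (macron) below typed first, then dot below.
line_then_dot = "d\u0331\u0323"
# Dot below typed first, then line (macron) below.
dot_then_line = "d\u0323\u0331"
# Line below plus dot above, typed in either order.
line_dot_above = "d\u0331\u0307"
dot_above_line = "d\u0307\u0331"

for candidate in (line_then_dot, dot_then_line, line_dot_above, dot_above_line):
    print(describe(candidate))
```

The two below-marks share a combining class, so normalization keeps them in the order typed and the two "dot below" candidates remain distinct strings. The "dot above" candidate normalizes to the same sequence whichever mark is typed first.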
[DISCUSS] In pursuit of transliteration-homogenization, original Arabic */aw/ and */aj/ should be represented using ⟨w⟩ and ⟨y⟩, because whether or not they monophthongize varies wildly depending on region and speaker. This way, ⟨ywmyn⟩ can be read as any of /joːmeːn/, /jawmeːn/, /joːmajn/, /jawmajn/ without needing to write all four possibilities explicitly.
- If this is implemented, then ⟨ē⟩ and ⟨ō⟩ can contrastingly be used for the same monophthongs when they don't come from historic diphthongs (e.g. ⟨motēr⟩, not ⟨mwtyr⟩), and specifically when said monophthongs can't diphthongize under any conditions whatsoever in any variety. (For example, the /oː/ in the loanword بنطلون "pants" (compare English pantaloons) can diphthongize in Lebanon, despite not being from a historic diphthong, and so the word might ideally be transliterated ⟨banṭalwn⟩ rather than ⟨banṭalōn⟩.)
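To make the polyphony idea concrete, here is a rough Python sketch that expands one such transliteration into the readings it stands for. Everything in it is an assumption for illustration: the function name, the vowel inventory, and the heuristic that a ⟨w⟩ or ⟨y⟩ counts as a syllable nucleus when it follows a consonant and is not followed by a vowel. Letters other than consonantal ⟨y⟩ are passed through unchanged rather than converted to IPA.

```python
from itertools import product

# Vowel letters of the Romanization (assumed inventory).
VOWELS = set("aiueoāīūēōə")
# Readings of a nucleus ⟨w⟩ or ⟨y⟩: diphthong first, then monophthong.
NUCLEUS_READINGS = {"w": ("aw", "oː"), "y": ("aj", "eː")}
# Consonant letters whose IPA symbol differs (deliberately incomplete).
CONSONANT_IPA = {"y": "j"}

def expand_readings(translit):
    """Enumerate the readings encoded by one polyphonic transliteration."""
    slots = []
    prev_is_consonant = False
    for i, ch in enumerate(translit):
        nxt = translit[i + 1] if i + 1 < len(translit) else ""
        if ch in NUCLEUS_READINGS and prev_is_consonant and nxt not in VOWELS:
            # ⟨w⟩/⟨y⟩ between consonants: historic diphthong, two readings.
            slots.append(NUCLEUS_READINGS[ch])
            prev_is_consonant = False
        elif ch in VOWELS:
            slots.append((ch,))
            prev_is_consonant = False
        else:
            slots.append((CONSONANT_IPA.get(ch, ch),))
            prev_is_consonant = True
    return ["".join(combo) for combo in product(*slots)]

print(expand_readings("ywmyn"))
```

For ⟨ywmyn⟩ this yields the four readings listed above, one per combination of diphthong and monophthong.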
An imaala'd (or otherwise affected) alif is still an alif, and the presence of affectations like raising can vary greatly within any given region. We therefore represent it using ⟨ā⟩ invariably.
Short vowels:
- Certain varieties, particularly Saudi and Bedouin lects, have a phonemic /ə/ distinct from /a/ and typically corresponding to a damma. We represent it using a ⟨ə⟩ in Romanization, too. Similarly, some varieties like Moroccan Arabic have a phonemic contrast between /i/ and /e/, and so we use the same characters appropriately in Romanization.
- Otherwise, we use ⟨i⟩ for all kasras and ⟨u⟩ for all dammas.
  - [DISCUSS] Current Gulf Arabic entries use IPA ⟨ɪ⟩ in Romanization for a kasra. Should this be kept...? Or replaced with ⟨e⟩ as below?
  - [DISCUSS] Levantine varieties have many, many words where the vowel can and often does alternate between /i~u/ with no semantic effect. Perhaps find a single symbol to transliterate this vowel with, e.g. a dotless ı.
- Use ⟨e⟩ for the epenthetic high vowel some varieties use to break up consonant clusters, not ⟨i⟩. However, this should be used sparingly. Some potentially acceptable contexts for it...
  1. when the epenthetic is stressed as in Gulf Arabic (see the "ghəwa" phenomenon, although it's best-modeled here by verbs such as يشربه jəšerba(h): insertion of an epenthetic vowel followed by a stress shift to it)
  2. in usage-example transliterations, to preserve the flow of a sentence (e.g. من التنين (min l-tnyn vs. mne t-tnyn))
  3. as a syllabification aid? (e.g. بيِكسروا (byiksru vs. byikesru); the first suggests a syllabic /s/)
- At the end of a word, use ⟨i⟩ and ⟨u⟩ for the corresponding historic long vowels. Use ⟨-o⟩ for the 3sg.m suffix in varieties where it's pronounced as such.
- [DISCUSS] The feminine suffix, historically */a(h)/, varies in its modern pronunciation between /a/ and /e/ depending on lect and phonological environment. Should a word ending in it be simply transcribed as "[...]a or [...]e"? (That seems to run counter to the goal of flattening out variation here.) Or should a single character be used to represent it, and if so, what character? ⟨ǽ⟩ looks handy.

Dialect specifics

This section will ideally be added to, as time passes, by contributors experienced in individual dialects (which warrant more-specific considerations than the generalities above).

Egyptian Arabic

The conventional Egyptian spelling of Arabic ـة (-a) is ـه (-a), so main entries in Egyptian Arabic go under a title with ـه. There can then be a separate page titled with the ـة spelling, containing an Egyptian Arabic definition that uses the "alternative form of" template. (We also follow the ـه convention when determining Arabic spelling in quotes, usage examples, etc.)
Ditto for ـى (-i) when it represents ـِي (-ī).

Levantine Arabic

The conventional Levantine spelling of Arabic ـة is not ـه. This means that Levantine entries and writings on Wiktionary do not use ـه when representing Arabic ـة.
- Additionally, Levantine ـة is typically pronounced as /e/ when imaala applies, not as /i/. Therefore, we never, ever transcribe or Romanize it using ⟨i⟩, particularly to avoid confusion in transliteration with word-final ـي (-i).
However, North Levantine is one of the varieties, mentioned above, where it is customary to spell the 3sg.m object suffix as ـو rather than as ـه. We therefore spell it ـو for such entries on Wiktionary.
Following are Levantine's "word-initial hamza" rules, which are descriptions of pronunciation that we refer to when spelling (in Arabic) and transcribing (in Romanization or IPA).
- Imperative Form I verbs invariably start with a glottal stop when constructed on إفْعِل (ʾifʿil). If instead constructed on ٱفْعـVـل (fʿv̄l), there is never a glottal stop.
- The past tense and imperative of any verb whose perfective starts with a kasra (namely, verbs of Form VII and higher) do not start with a glottal stop. Instead, they start with a two-consonant cluster. For example, اجتمع (jtamaʿ, “to come together”).
- However, the verbal nouns of such verbs do start with a glottal stop. For example, إجتماع (ʾijtimāʿ, “meeting”), which differs from Modern Standard Arabic اِجْتِمَاع (ijtimāʕ). (The first-person present subjunctive does as well, of course.)
- Any other word, if it sounds like it starts with a glottal stop, always does. This includes pronouns such as أنا (ʾana, “I”) and إنت (ʾinta, “you (m)”).
- The ش (š) of the word for "what" can sometimes cause elision of a following ء (ʔ) in common collocations, such as شِسْمِك (šismik, “what's your name?”) and شَخْبَارِك (šaḵbārik, “how have you been?”, literally “what's your news”). However, this is a unique case of elision, and it's not reason enough to write the base words as, for example, اسم and especially اخبار. (What is reason enough to write اسم that way is its history, but that's another story.)