Module talk:ar-translit

Latest comment: 3 months ago by Atitarev in topic bug

Test

Invoke it like this, e.g. اَلْلُغَةُ ٱلْعَرَبِيَّةُ (al-lúġa(tu) al-ʿarabíyya(tu)): [MODULE CALL REDACTED] --Anatoli (обсудить/вклад) 03:40, 18 March 2013 (UTC)

ا, ي, و done. For tanwin and ta' marbuta, I think the best solution is changing the text style, e.g. turning them grey, or superscripting them (as practised by some sources). --Z 10:26, 18 March 2013 (UTC)
Good job! I will test with more examples. My preference would be to silence them altogether; we don't transliterate tanwīn in entries or translations. So لُغَةُ simply becomes "luġa", not "luġatun". As for tāʾ marbūṭa, we mark the pronounced "t" in ʾiḍāfa constructs (إضافة or genitive constructs): همزة الوصل "hamzat al-waṣl", but I'm not sure whether that's feasible. Other choices are ä, (t). --Anatoli (обсудить/вклад) 10:49, 18 March 2013 (UTC)
For ta' marbuta, ä is not common, and ignoring the ة is not a good way to go. If we are not going to transliterate tanwin, then "(t)"/t is the best choice, I think. --Z 11:23, 18 March 2013 (UTC)
Let's leave it as "(t)" then. What if we have an optional variable "showTanwin"? Pseudocode:
If showTanwin then tāʾ marbūṭa = "t" and tanwīn endings = -un, -in, -an
Else tāʾ marbūṭa = "(t)" and tanwīn suppressed? --Anatoli (обсудить/вклад) 11:54, 18 March 2013 (UTC)
Done: {{#invoke:ar-translit|tr|لُغَةٌ مَجْهُولَةٌ|showI3raab=yeah}} -> [MODULE CALL REDACTED] --Z 12:24, 18 March 2013 (UTC)
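
The option discussed above can be sketched roughly as follows. This is a minimal illustration in plain Lua, not the module's actual code: the function name `translit_ending` and the argument `show_i3raab` are hypothetical, and it handles only a word-final tāʾ marbūṭa with optional tanwīn, assuming the rest of the word has already been transliterated.

```lua
-- Hypothetical sketch; characters are written as UTF-8 byte escapes.
local TA_MARBUTA = "\216\169" -- U+0629
local TANWIN = {
    ["\217\139"] = "an", -- U+064B fathatan
    ["\217\140"] = "un", -- U+064C dammatan
    ["\217\141"] = "in", -- U+064D kasratan
}

-- stem:   transliteration of the word before the ending, e.g. "luġa"
-- ending: the Arabic ending (tāʾ marbūṭa and/or a tanwīn sign)
local function translit_ending(stem, ending, show_i3raab)
    local has_ta = ending:sub(1, #TA_MARBUTA) == TA_MARBUTA
    local rest = has_ta and ending:sub(#TA_MARBUTA + 1) or ending
    local tanwin = TANWIN[rest]
    if show_i3raab then
        -- full ending: pronounced "t" plus -un / -in / -an
        return stem .. (has_ta and "t" or "") .. (tanwin or "")
    end
    -- default: bracketed "(t)", tanwīn suppressed
    return stem .. (has_ta and "(t)" or "")
end

print(translit_ending("luġa", TA_MARBUTA .. "\217\140", true))
print(translit_ending("luġa", TA_MARBUTA .. "\217\140", false))
```

With the flag set, the ending comes out as "tun"; without it, the tanwīn is dropped and the tāʾ marbūṭa is shown as "(t)".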

well

I think this is a bad idea, I'm just going to say that. lol. — [Ric Laurent] 16:15, 18 March 2013 (UTC)

Why, because of the transliteration or because it's not feasible? We haven't started using it for anything, mate. It's still very raw. It may be helpful to automatically transliterate fully vocalised words, if and when it's working, provided the Arabic is written in a strict style. The usage is limited, of course.
I don't blame you for that feeling. I think Google Translate gave up on transliterating Arabic, Persian and Urdu, also Hebrew. --Anatoli (обсудить/вклад) 22:25, 18 March 2013 (UTC)
It just seems that a lot could go wrong. — [Ric Laurent] 23:43, 18 March 2013 (UTC)
I don't have skills in Lua but this may change. I'm tied up with Japanese romaji and Russian. I will address Korean translit next, and will ask Ruakh or learn to do something myself. This may become something useful, but only for fully vocalised Arabic or for someone wanting help with Arabic letters. --Anatoli (обсудить/вклад) 02:27, 21 March 2013 (UTC)
It could be useful for fully vocalized Arabic if more people did it correctly. — [Ric Laurent] 02:39, 21 March 2013 (UTC)

Various translit tests

Original: اللغة العربية هي أكثر اللغات تحدثا ضمن مجموعة اللغات السامية، وإحدى أكثر اللغات انتشارًا في العالم، يتحدثها أكثر من 422 مليون نسمة، ويتوزع متحدثوها في الوطن العربي، بالإضافة إلى العديد من المناطق الأخرى المجاورة كالأحواز وتركيا وتشاد ومالي والسنغال وإرتيريا. اللغة العربية ذات أهمية قصوى لدى المسلمين، فهي لغة مقدسة (لغة القرآن)، ولا تتم الصلاة (وعبادات أخرى) في الإسلام إلا بإتقان بعض من كلماتها. العربية هي أيضا لغة شعائرية رئيسية لدى عدد من الكنائس المسيحية في الوطن العربي، كما كتبت بها الكثير من أهم الأعمال الدينية والفكرية اليهودية في العصور الوسطى. وأثّر انتشار الإسلام، وتأسيسه دولاً، في ارتفاع مكانة اللغة العربية، وأصبحت لغة السياسة والعلم والأدب لقرون طويلة في الأراضي التي حكمها المسلمون، وأثرت العربية، تأثيرًا مباشرًا أو غير مباشر على كثير من اللغات الأخرى في العالم الإسلامي، كالتركية والفارسية والأمازيغية والكردية والأردوية والماليزية والإندونيسية والألبانية وبعض اللغات الإفريقية الأخرى مثل الهاوسا والسواحيلية، وبعض اللغات الأوروبية وخاصةً المتوسطية منها كالإسبانية والبرتغالية والمالطية والصقلية. كما أنها تُدرَّس بشكل رسمي أو غير رسمي في الدول الإسلامية والدول الإفريقية المحاذية للوطن العربي.

Using User:ZxxZxxZ/arTranslit.js:

al-lġa(t) al-ʿrbya(t) hy/ī ʾkṯr al-lġāt tḥdṯā ḍmn mjmw/ūʿa(t) al-lġāt al-sāmya(t), wʾiḥdā ʾkṯr al-lġāt ʾntšāran fy/ī al-ʿālm, ytḥdṯhā ʾkṯr mn 422 mly/īw/ūn nsma(t), wy/ītw/ūzʿ mtḥdṯw/ūhā fy/ī al-wṭn al-ʿrby/ī, bālʾiḍāfa(t) ʾilā al-ʿdy/īd mn al-mnāṭq al-ʾxrā al-mjāw/ūra(t) kālʾḥwāz wtrkyā wtšād wmāly/ī wālsnġāl wʾirty/īryā. al-lġa(t) al-ʿrbya(t) ḏāt ʾhmya(t) qṣw/ūā ldā al-mslmy/īn, fhy/ī lġa(t) mqdsa(t) (lġa(t) al-qrʾān), wlā ttm al-ṣlāa(t) (wʿbādāt ʾxrā) fy/ī al-ʾislām ʾilā bʾitqān bʿḍ mn klmāthā. al-ʿrbya(t) hy/ī ʾyḍā lġa(t) šʿāʾrya(t) rʾysya(t) ldā ʿdd mn al-knāʾs al-msy/īḥya(t) fy/ī al-wṭn al-ʿrby/ī, kmā ktbt bhā al-kṯy/īr mn ʾhm al-ʾʿmāl al-dy/īnya(t) wālfkrya(t) al-yhw/ūdya(t) fy/ī al-ʿṣw/ūr al-wsṭā. wʾṯṯr ʾntšār al-ʾislām, wtʾsy/īsh dw/ūlāan, fy/ī ʾrtfāʿ mkāna(t) al-lġa(t) al-ʿrbya(t), wʾṣbḥt lġa(t) al-syāsa(t) wālʿlm wālʾdb lqrw/ūn ṭw/ūy/īla(t) fy/ī al-ʾrāḍy/ī al-ty/ī ḥkmhā al-mslmw/ūn, wʾṯrt al-ʿrbya(t), tʾṯy/īran mbāšran ʾw ġy/īr mbāšr ʿlā kṯy/īr mn al-lġāt al-ʾxrā fy/ī al-ʿālm al-ʾislāmy/ī, kāltrkya(t) wālfārsya(t) wālʾmāzy/īġya(t) wālkrdya(t) wālʾrdw/ūya(t) wālmāly/īzya(t) wālʾindw/ūny/īsya(t) wālʾlbānya(t) wbʿḍ al-lġāt al-ʾifry/īqya(t) al-ʾxrā mṯl al-hāw/ūsā wālswāḥy/īlya(t), wbʿḍ al-lġāt al-ʾwrw/ūbya(t) wxāṣa(t)an al-mtw/ūsṭya(t) mnhā kālʾisbānya(t) wālbrtġālya(t) wālmālṭya(t) wālṣqlya(t). kmā ʾnhā tudraas bškl rsmy/ī ʾw ġy/īr rsmy/ī fy/ī al-dw/ūl al-ʾislāmya(t) wāldw/ūl al-ʾifry/īqya(t) al-mḥāḏya(t) llw/ūṭn al-ʿrby/ī.

This module (current):

[MODULE CALL REDACTED]

This module (today's result only from preview):

al-lġ(t) al-ʿrbī(t) hī ʾkṯr al-lġāt tḥdṯā ḍmn mjmūʿ(t) al-lġāt al-sāmī(t), ūʾḥdā ʾkṯr al-lġāt āntšārā fī al-ʿālm, ītḥdṯhā ʾkṯr mn 422 mlīūn nsm(t), ūītūzʿ mtḥdṯūhā fī al-ūṭn al-ʿrbī, bālʾḍāf(t) ʾlā al-ʿdīd mn al-mnāṭq al-ʾxrā al-mjāūr(t) kālʾḥūāz ūtrkīā ūtšād ūmālī ūālsnġāl ūʾrtīrīā. al-lġ(t) al-ʿrbī(t) ḏāt ʾhmī(t) qṣūā ldā al-mslmīn, fhī lġ(t) mqds(t) (lġ(t) al-qrʾān), ūlā ttm al-ṣlā(t) (ūʿbādāt ʾxrā) fī al-ʾslām ʾlā bʾtqān bʿḍ mn klmāthā. al-ʿrbī(t) hī ʾīḍā lġ(t) šʿāʾrī(t) rʾīsī(t) ldā ʿdd mn al-knāʾs al-msīḥī(t) fī al-ūṭn al-ʿrbī, kmā ktbt bhā al-kṯīr mn ʾhm al-ʾʿmāl al-dīnī(t) ūālfkrī(t) al-īhūdī(t) fī al-ʿṣūr al-ūsṭā. ūʾṯṯr āntšār al-ʾslām, ūtʾsīsh dūlā, fī ārtfāʿ mkān(t) al-lġ(t) al-ʿrbī(t), ūʾṣbḥt lġ(t) al-sīās(t) ūālʿlm ūālʾdb lqrūn ṭūīl(t) fī al-ʾrāḍī al-tī ḥkmhā al-mslmūn, ūʾṯrt al-ʿrbī(t), tʾṯīrā mbāšrā ʾū ġīr mbāšr ʿlā kṯīr mn al-lġāt al-ʾxrā fī al-ʿālm al-ʾslāmī, kāltrkī(t) ūālfārsī(t) ūālʾmāzīġī(t) ūālkrdī(t) ūālʾrdūī(t) ūālmālīzī(t) ūālʾndūnīsī(t) ūālʾlbānī(t) ūbʿḍ al-lġāt al-ʾfrīqī(t) al-ʾxrā mṯl al-hāūsā ūālsūāḥīlī(t), ūbʿḍ al-lġāt al-ʾūrūbī(t) ūxāṣ(t) al-mtūsṭī(t) mnhā kālʾsbānī(t) ūālbrtġālī(t) ūālmālṭī(t) ūālṣqlī(t). kmā ʾnhā tudraas bškl rsmī ʾū ġīr rsmī fī al-dūl al-ʾslāmī(t) ūāldūl al-ʾfrīqī(t) al-mḥāḏī(t) llūṭn al-ʿrbī.

--Anatoli (обсудить/вклад) 04:09, 28 March 2013 (UTC)

I tried these but it didn't work

I hoped to get اُكتُبْ transliterated as "uktub", not "āuktub".

I tried this but it didn't work:

    ["اَ"]="a",
    ["اِ"]="i",
    ["اُ"]="u",

Test (اُكتُبْ): [MODULE CALL REDACTED]. --Anatoli (обсудить/вклад) 11:28, 5 July 2013 (UTC)

Fixed. --Z 13:24, 5 July 2013 (UTC)
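
The fix itself is not described here. One plausible reason that two-character keys such as alif plus a short vowel have no effect is that a lookup done one character at a time never sees the pair. The sketch below, in plain Lua, illustrates that ordering issue only: the tables `single` and `digraphs` and both functions are hypothetical, cover just the letters in this one word, and are not the module's code.

```lua
-- UTF-8 byte escapes for the few characters used in this example
local ALIF  = "\216\167" -- U+0627
local FATHA = "\217\142" -- U+064E
local DAMMA = "\217\143" -- U+064F
local KASRA = "\217\144" -- U+0650
local SUKUN = "\217\146" -- U+0652
local KAF   = "\217\131" -- U+0643
local TA    = "\216\170" -- U+062A
local BA    = "\216\168" -- U+0628

local single = {
    [ALIF] = "ā", [FATHA] = "a", [DAMMA] = "u", [KASRA] = "i",
    [SUKUN] = "", [KAF] = "k", [TA] = "t", [BA] = "b",
}
local digraphs = {
    { ALIF .. FATHA, "a" }, { ALIF .. KASRA, "i" }, { ALIF .. DAMMA, "u" },
}

-- Looks up each two-byte Arabic character on its own.
local function tr_single_only(text)
    return (text:gsub("[\216-\219][\128-\191]", single))
end

-- Replaces alif + vowel pairs first, then falls back to single characters.
local function tr_digraphs_first(text)
    for _, pair in ipairs(digraphs) do
        text = text:gsub(pair[1], pair[2])
    end
    return tr_single_only(text)
end

local word = ALIF .. DAMMA .. KAF .. TA .. DAMMA .. BA .. SUKUN
print(tr_single_only(word))
print(tr_digraphs_first(word))
```

The first call reproduces the unwanted "āuktub"; the second gives "uktub" because the pair is consumed before the bare alif can be mapped to "ā".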

showI3raab parameter

The module jumps back and forth in functionality. Not sure what happened. I'm trying to display and transliterate all verb endings, but once again they don't show at User:Atitarev/ar-conjug-I-test. The part "ar_translit.tr(form, "showI3raab")" in Module:ar-verb is supposed to make endings mandatory. It worked before but has stopped working again. --Anatoli (обсудить/вклад) 03:25, 25 November 2013 (UTC)

ـ

The "underline" character (ـ) should be transliterated as a dash, -, particularly when it is not preceded by another Arabic-script character, see [1] for example. --Z 13:41, 6 August 2015 (UTC)

I agree with you. Typically I've been manually inserting translit with the dash but it would be better to do it automatically. Benwing (talk) 20:51, 6 August 2015 (UTC)
@ZxxZxxZ I implemented this at beginning and end of word. Benwing (talk) 09:55, 11 August 2015 (UTC)
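
For illustration, a rule of this kind (the character is U+0640, tatweel) could look like the plain-Lua sketch below. The function name `tatweel_to_dash` is hypothetical and this is not the code that was added to the module; it only converts a tatweel at the very start or end of a whitespace-delimited word and leaves word-internal ones alone.

```lua
local TATWEEL = "\217\128" -- U+0640 ARABIC TATWEEL as UTF-8 bytes

local function tatweel_to_dash(text)
    -- %S+ matches each run of non-space bytes, i.e. one word at a time
    return (text:gsub("%S+", function(word)
        word = word:gsub("^" .. TATWEEL, "-") -- leading tatweel
        word = word:gsub(TATWEEL .. "$", "-") -- trailing tatweel
        return word
    end))
end

-- A suffix written with a leading tatweel, here followed by yāʾ (U+064A)
print(tatweel_to_dash(TATWEEL .. "\217\138"))
```

In a real transliteration pass this step would presumably run before the per-letter lookup, so that the dash survives into the output.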

ة

ة is transliterated inconsistently: in مِشْكَاة (miškāh) it adds a final "h". --Z 21:08, 23 August 2015 (UTC)

There was a decision to transliterate ـَة as -a and ـَاة as -āh. I'm not personally a fan of the latter, but it is not a bug in the module. --WikiTiki89 00:49, 24 August 2015 (UTC)
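
The convention described above can be written down as a small rule. The following plain-Lua sketch is only an illustration: `ta_marbuta_ending` is a hypothetical helper, not a function from the module, and it looks solely at the last characters of a fully vocalized word.

```lua
local FATHA      = "\217\142" -- U+064E
local ALIF       = "\216\167" -- U+0627
local TA_MARBUTA = "\216\169" -- U+0629

local function ends_with(s, suffix)
    return s:sub(-#suffix) == suffix
end

-- Returns the transliteration of the word-final ending, or nil if the
-- word does not end in one of the two patterns.
local function ta_marbuta_ending(word)
    if ends_with(word, FATHA .. ALIF .. TA_MARBUTA) then
        return "āh" -- fatha + alif + tāʾ marbūṭa
    elseif ends_with(word, FATHA .. TA_MARBUTA) then
        return "a"  -- fatha + tāʾ marbūṭa
    end
    return nil
end

local KAF = "\217\131" -- U+0643
print(ta_marbuta_ending(KAF .. FATHA .. ALIF .. TA_MARBUTA))
print(ta_marbuta_ending(KAF .. FATHA .. TA_MARBUTA))
```

The longer pattern is tested first; with the order reversed the result would be the same here, but checking the more specific ending first keeps the rule easy to extend.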

Translit problems

For some reason, the quotations in أَحَد (ʔaḥad) are not being transliterated. I tried replacing the odd Qur'anic diacritics in the first example with normal ones, to no avail. What's wrong? — Eru·tuon 22:07, 22 January 2017 (UTC)

  • قُلۡ هُوَ ٱللَّهُ أَحَدٌ / ٱللّٰهُ ٱلصَّمَدُ / لَمۡ يَلِدۡ وَلَمۡ يُولَدۡ / وَلَمۡ يَكُن لَّهُ ۥ ڪُفُوًا أَحَدٌ
    Say: He, Allah, is one. / Allah is He on Whom all depend. / He begets not, nor is He begotten; / And none is like Him.
  • قُلْ هُوَ ٱللَّهُ أَحَدٌ / ٱللّٰهُ ٱلصَّمَدُ / لَمْ يَلِدْ وَلَمْ يُولَدْ / وَلَمْ يَكُنْ لَهُ ڪُفُوًا أَحَدٌ
    qul huwa l-lahu ʔaḥadun / llāhu ṣ-ṣamadu / lam yalid walam yūlad / walam yakun lahu kufuwan ʔaḥadun
    Say: He, Allah, is one. / Allah is He on Whom all depend. / He begets not, nor is He begotten; / And none is like Him.

Aha, the problem is the "swash kaf" ڪ (k). — Eru·tuon 17:34, 28 July 2017 (UTC)

  • قُلْ هُوَ ٱللَّهُ أَحَدٌ / ٱللّٰهُ ٱلصَّمَدُ / لَمْ يَلِدْ وَلَمْ يُولَدْ / وَلَمْ يَكُنْ لَهُ كُفُوًا أَحَدٌ
    qul huwa l-lahu ʔaḥadun / llāhu ṣ-ṣamadu / lam yalid walam yūlad / walam yakun lahu kufuwan ʔaḥadun
    Say: He, Allah, is one. / Allah is He on Whom all depend. / He begets not, nor is He begotten; / And none is like Him.

But that doesn't make sense. It's listed in the replacements (tt) for individual letters in the module, but it still causes the transliteration to fail. — Eru·tuon 17:38, 28 July 2017 (UTC)

Fixed. — Eru·tuon 17:46, 28 July 2017 (UTC)
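
What the fix consisted of is not stated. As a general illustration, one way to keep a letter variant like this from tripping up later checks is to normalize it to the ordinary letter before anything else runs. The plain-Lua sketch below assumes that approach; `normalize_kaf` is a hypothetical name and not a function from the module.

```lua
local SWASH_KAF = "\218\170" -- U+06AA ARABIC LETTER SWASH KAF
local KAF       = "\217\131" -- U+0643 ARABIC LETTER KAF

-- Replaces every swash kaf with a plain kaf; all other bytes are untouched.
local function normalize_kaf(text)
    return (text:gsub(SWASH_KAF, KAF))
end

print(normalize_kaf(SWASH_KAF) == KAF)
```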

Nunation before waSla

@Atitarev, Wikitiki89: In the following, should ʾaḥadun / llāhu be ʾaḥaduni / llāhu, with an epenthetic i before the waSla? It's unpronounceable as is.

قُلْ هُوَ ٱللَّٰهُ أَحَدٌ / ٱللَّٰهُ ٱلصَّمَدُ / لَمْ يَلِدْ وَلَمْ يُولَدْ / وَلَمْ يَكُنْ لَهُ كُفُوًا أَحَدٌ‏ (qul huwa llāhu ʔaḥadun / llāhu ṣ-ṣamadu / lam yalid walam yūlad / walam yakun lahu kufuwan ʔaḥadun)

Eru·tuon 17:32, 28 July 2017 (UTC)

As with other consonant-final words, the epenthetic vowel is determined by a certain set of rules that depends both on the ending of the preceding word, and the beginning of the following word, e.g. عَلَيْكُمُ السَّلَامُ (ʕalaykumu s-salāmu), مِنَ الْبَيْتِ (mina l-bayti), كَتَبَتِ الْكِتَابَ (katabati l-kitāba). Unlike other cases, with nunation this vowel is not written, because there is nowhere to write it. I don't know the specific rule for nunation before the definite article. And regarding specifically this verse, I've heard it read, but only with a pause between each verse, and in pausal position, أَحَدٌ (ʔaḥadun) is pronounced أَحَدْ (ʔaḥad). --WikiTiki89 18:20, 28 July 2017 (UTC)

Cantillation marks

Finally noting something that has crossed my mind a few times: the automatic transliteration of Arabic should ignore the prosodic signs added to the Qurʾān. Currently, if one such sign is present in an otherwise correctly vocalized quote, the transliteration fails completely. This leads the pesky Saudi IP to work around it by substituting the signs (extra work), while others remove them, and the two combined result in edit wars. I imagine, however, that people who have often heard the Scripture recited may find it easier to be reminded of its lines by reading those marks, and in the entries on these characters the quotes should logically contain them. Additionally, the Qurʾān is likely the most-quoted source in Arabic entries, and it would save a lot of work if these signs, which are often already contained in texts copied from sites like quran.com, were handled; besides, they are not wrong either. @Benwing2, Erutuon, Fenakhay. Fay Freak (talk) 22:54, 12 November 2020 (UTC)

@Fay Freak: I'm not familiar with these signs. Programming-wise, would you like to just pass them through to the transliteration? Do you have a full list of these symbols that can be used in the Lua code? It's a matter of changing has_diacritics, which determines whether export.tr will try to transliterate. — Eru·tuon 21:01, 19 November 2020 (UTC)
@Fay Freak, Benwing2, Erutuon: They just add visual clutter and nothing else. They should be removed from all quotations, that's like adding Hebrew cantillation markers to Tanakh quotations. They are not even included in quotations from religious books. They have no value whatsoever outside the Qur'an. — فين أخاي (تكلم معاي · ما ساهمت) 16:27, 20 December 2020 (UTC)
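
If the signs were to be ignored for transliteration purposes, one option would be to strip them from a copy of the text before it reaches the has_diacritics check, leaving the displayed quote unchanged. The plain-Lua sketch below assumes the relevant signs are the small high marks U+06D6 to U+06DC of the Unicode Arabic block; that range is an assumption made for illustration, not a complete list, and `strip_pause_marks` is a hypothetical helper rather than module code.

```lua
-- U+06D6..U+06DC are encoded in UTF-8 as byte 219 followed by 150..156.
-- Other Quranic annotation signs are deliberately left untouched.
local function strip_pause_marks(text)
    return (text:gsub("\219[\150-\156]", ""))
end

-- ASCII stands in for the Arabic text around two of the marks
print(strip_pause_marks("qul\219\150 huwa\219\154"))
```

Because byte 219 can only begin a character in valid UTF-8, the pattern cannot match in the middle of some other character.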

bug

اهْدِنَا الصِّرَاطَ الْمُسْتَقِيمَ
This text should be transcribed automatically, but there's no output. I guess an initial alif not followed by a diacritic should be transcribed as "i" if it is not part of the definite article (a kasra is inferred in this case). This feature was probably removed because of the initial "al-" handling. LinguisticMystic (talk) 22:37, 23 January 2024 (UTC)
@LinguisticMystic: No inferences, no bugs. It should be اِهْدِنَا (ihdinā, "guide us"). Check the conjugation table at هَدَى (hadā).
  • اِهْدِنَا الصِّرَاطَ الْمُسْتَقِيمَ.
    ihdinā ṣ-ṣirāṭa l-mustaqīma.
    Show us the straight path.
Anatoli T. (обсудить/вклад) 22:52, 23 January 2024 (UTC)
In Quranic texts this initial kasra is consistently skipped: all other vocalization is complete, but this one is always omitted. I wonder why. See here: https://ar.wikipedia.org/w/index.php?title=صلاة_الجنازة LinguisticMystic (talk) 22:59, 23 January 2024 (UTC)
@LinguisticMystic: For humans familiar with the grammar, many things are obvious and make sense, and even when they provide vocalisation, they don't always do it 100% and skip cases that are obvious (to humans). The module doesn't know the grammar needed to make such inferences, and trying would create lots of false positives. Besides, you will find plenty of evidence of stricter spellings, including your case with ‏اِهْدِنَا (ihdinā).
I just recalled something else. When Arabic was only starting to appear on the Internet as text, not as images, Arabic learners complained about the display of diacritics, and they specifically had issues with fatha and kasra over/under alif. Anatoli T. (обсудить/вклад) 23:31, 23 January 2024 (UTC)
I know this is grammatically incorrect, but on the Arabic Wikipedia there is a whole module for the Quran (https://ar.wikipedia.org/wiki/وحدة:Quran), and in the data it uses (https://ar.wikipedia.org/wiki/وحدة:Quran/data_text) these kasras are consistently skipped. That's why I started to wonder, and to think that it is not a mistake but rather Quranic usage. LinguisticMystic (talk) 23:43, 23 January 2024 (UTC)
@LinguisticMystic: There are stricter conventions, even for Quranic Arabic writing, and there is more than one style.
  1. ى is used in the final positions instead of ي
  2. ـٰ is rarely used, etc.
There is no bug here; if you want to provide quotations, please use a stricter vocalisation.
There are different opinions on this. Calling on the main Arabic editor @Fenakhay. Anatoli T. (обсудить/вклад) 00:09, 24 January 2024 (UTC)
Search for better results, don't seek to avoid our rules - اِهْدِنَا in Google Books: https://www.google.com/search?q="اِهْدِنَا"&sca_esv=600915497&rlz=1C1GCEB_enAU994AU994&tbm=bks&sxsrf=ACQVn0_NkFp1fHsTrKW1ZKFQ1ftU0xIfWQ:1706056911706&source=lnms&sa=X&ved=2ahUKEwjb98Ch5fSDAxUgbGwGHaTkBbMQ_AUoAXoECAEQCw&biw=1536&bih=831&dpr=1.25 Anatoli T. (обсудить/вклад) 00:46, 24 January 2024 (UTC)
This issue could be resolved by doing this substitution: text = rsub(text, "^ā([bdḍḏfḡhḥjkḵlmnqrsšṣtṭṯzẓʕ])", "i%1") LinguisticMystic (talk) 15:07, 24 January 2024 (UTC)
@LinguisticMystic, Atitarev: I am against a default vocalisation. اُكْتُب اِكْتِئاب اَلْكِتَاب، وَٱكْتُب Fenakhay (حيطي · مساهماتي) 15:44, 24 January 2024 (UTC)
You are right. It is probably not a good idea to generalize, as it would lead to many other unwanted errors. LinguisticMystic (talk) 15:49, 24 January 2024 (UTC)
@LinguisticMystic: Of course. I am surprised that, with your piece of code, you actually thought that an initial alif is always followed by a kasra. Anatoli T. (обсудить/вклад) 22:46, 24 January 2024 (UTC)
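
The objection to the proposed substitution can be reproduced with a small stand-alone sketch. Plain Lua has no `rsub`, which in the module is presumably a Unicode-aware substitution helper, so the rule is re-implemented here with byte-wise prefix checks. The function name `default_kasra` is hypothetical, and the inputs are assumed intermediate transliterations in which a bare initial alif has come out as "ā".

```lua
-- Consonants from the proposed pattern, as UTF-8 strings
local CONSONANTS = {
    "b", "d", "ḍ", "ḏ", "f", "ḡ", "h", "ḥ", "j", "k", "ḵ", "l", "m",
    "n", "q", "r", "s", "š", "ṣ", "t", "ṭ", "ṯ", "z", "ẓ", "ʕ",
}
local LONG_A = "ā"

-- Equivalent in intent to: rsub(text, "^ā([consonant])", "i%1")
local function default_kasra(text)
    if text:sub(1, #LONG_A) ~= LONG_A then
        return text
    end
    local rest = text:sub(#LONG_A + 1)
    for _, c in ipairs(CONSONANTS) do
        if rest:sub(1, #c) == c then
            return "i" .. rest
        end
    end
    return text
end

print(default_kasra("āhdinā")) -- the case from this thread
print(default_kasra("āktub"))  -- an unvocalized imperative "uktub"
```

The first call yields the intended "ihdinā", but the second turns what should be "uktub" into "iktub", which is the kind of false positive the examples with damma, kasra and fatha on the initial alif point to.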