Module:sa-convert/documentation

Documentation for Module:sa-convert.

This module converts Sanskrit text in Devanagari to other scripts. It is used mainly by Template:sa-alt, and its function tr is exposed through Template:sa-convert.

Example

ॐ त्र्यम्बकं यजामहे सुगन्धिं पुष्टिवर्धनम् । उर्वारुकमिव बन्धनान् मृत्योर् मुक्षीय माऽमृतात् ॥ कः खगौघाङचिच्छौजा झाञ्ज्ञोऽटौठीडडण्ढणः। तथोदधीन् पफर्बाभीर्मयोऽरिल्वाशिषां सहः॥

Unresolved Issues

  • Burmese:
    • Round AA also needs to be replaced with tall AA in some situations.   Done
    • Some conjuncts need to be cleaned up, such as -y-, -r-, -v- when they occur together.
    • NGA floating င္ → င်္   Done
    • RA repha ရ္ → ရ်္ (This never happens in Pali.)   Done
    • NYA + virama + NYA → great NYA   Done
    • SA + virama + SA → great SA   Done
    • Final virama → asat   Done
  • Lao:
    • Lao does not have characters for ऋ ॠ ऌ ॡ so it uses equivalent ຣິ ຣີ ລິ ລີ instead.   Done
      • Evidence? I've read that it uses ຣຶ ຣື ລຶ ລື, which would eliminate the ambiguity.
      • The "Lanexang Mon4" font already has invented characters ຤(=ฤ) ຦(=ฦ) at unassigned codepoints, but their use is not attested anywhere.
  • Khmer:
    • RA repha រ្ → robat over next consonant ៌ (This never happens in Pali.)   Done
    • Final virama → viriam   Done
  • Javanese: ꦨꦹꦂꦨꦸꦮꦃꦱ꧀ꦮꦃꦠꦠ꧀ꦱꦮꦶꦠꦸꦂꦮꦫꦺꦟꦾꦁ꧉꧇꧑꧇꧉
    • no spaces in the script (need to remove the ones that enter the module); also causes the following two issues
    • ꦾ and ꦿ for word medial conjuncts, but ꦪ and ꦫ for conjuncts that cross word boundaries, e.g.
    • ꦂ for aksaras that end with r, but aren't aksara initial, e.g.
    • enclosing numbers around ꧇ (꧇꧑꧙꧇ = 19). Test: त्र्य०६म्बकं -> ꦠꦿꦾ꧇꧐꧖꧇ꦩ꧀ꦧꦏꦁ
    • ꦘ should be used for the conjunct ज्ञ, not ꦗ꧀ꦚ. Test: ज्ञ ->
  • Balinese:
    • also no spaces, and causes the following issue
    • ◌ᬃ for syllables that begin with r
    • enclosing numbers around ᭞ (᭞᭑᭞ = 1). Test: त्र्य०६म्बकं: ᬢ᭄ᬭ᭄ᬬ᭞᭐᭖᭞ᬫ᭄ᬩᬓᬂ
  • Bengali:
  • Assamese:
  • Sinhala:
    • For Sanskrit, conjuncts are formed not by simply using the virama (U+0DCA) but either by abutting the consonants, encoded by the sequence <U+200D, U+0DCA>, or by forming a ligature, encoded by <U+0DCA, U+200D>. (The extra character is ZWJ.) Which is used depends on the consonants. As a general rule, (ra) forms a ligature with a consonant on either side (much like Devanagari repha and rakar), while (ya) formally ligates with a preceding consonant, although in fact the glyph simply changes shape. There is some evidence for geminate (ya) being ය‍්ය in Sanskrit rather than ය්‍ය as in Pali. Finally, at least one pair forms a separately encoded ligature: (ja) plus (ña) becomes (gna). My best estimate so far for the combinations has been encoded in Module:sa-utilities/translit/SLP1-to-Sinh, and ultimately I believe this module and that module should share common code for fixing up naive transliteration that just uses U+0DCA.   Done
    • Additionally, for the Pali and Sanskrit I can find, /e/ and /o/ do not have their length marked, but use the same symbols as the Sinhalese language uses for the short vowels.   Done
    • I have just (18/19 December 2023) added some evidence-based test cases to Module:sa-convert/testcases. Research continues to plod along.
  • Tamil:
    • Final nasals.
    • Final visarga - the Grantha visarga is used.   Done
    • Encoding of superscript digits and vowels.
    • Syllabic consonants
    • Rules for /n/: (na) versus (ṉa).
    • Alternative forms, e.g. subscript digits.
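
The Burmese fix-ups listed above (great NYA, great SA, final virama to asat, and NGA or RA before a stacked consonant) could be applied to a naive transliteration roughly as follows. This is only a sketch in Python, not the module's actual Lua code: the function name fix_burmese, the consonant range, and the order of the rules are assumptions made for illustration.

```python
import re

VIRAMA = "\u1039"     # MYANMAR SIGN VIRAMA (invisible, stacks the next consonant)
ASAT = "\u103A"       # MYANMAR SIGN ASAT (visible vowel killer)
NGA = "\u1004"
RA = "\u101B"
NYA = "\u1009"
GREAT_NYA = "\u100A"  # MYANMAR LETTER NNYA
SA = "\u101E"
GREAT_SA = "\u103F"   # MYANMAR LETTER GREAT SA

# Assumed consonant set: KA..A plus GREAT SA.
CONSONANT = "[\u1000-\u1021\u103F]"


def fix_burmese(text: str) -> str:
    """Hypothetical clean-up pass over a naive Devanagari-to-Burmese mapping."""
    # Geminates that have a dedicated letter.
    text = text.replace(NYA + VIRAMA + NYA, GREAT_NYA)
    text = text.replace(SA + VIRAMA + SA, GREAT_SA)
    # A virama with no consonant after it cannot stack: use asat instead.
    text = re.sub(VIRAMA + "(?!" + CONSONANT + ")", ASAT, text)
    # Any virama left is followed by a consonant; NGA and RA then take
    # asat + virama so that they float above the next consonant.
    text = re.sub("([" + NGA + RA + "])" + VIRAMA, r"\1" + ASAT + VIRAMA, text)
    return text
```

Running the final-virama rule before the NGA/RA rule means a word-final NGA receives a plain asat rather than the floating form.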
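
The two Sinhala conjunct encodings described above can be illustrated with a short Python sketch. It is not taken from Module:sa-utilities/translit/SLP1-to-Sinh. The helper names are hypothetical, the rule set is a simplification of the note, and the fallback for all other pairs is a placeholder. It assumes the general rule refers to ra, and that the pair yielding the separately encoded letter is ja plus ña.

```python
ZWJ = "\u200D"        # ZERO WIDTH JOINER
AL_LAKUNA = "\u0DCA"  # Sinhala virama
RA = "\u0DBB"
YA = "\u0DBA"
JA = "\u0DA2"
NYA = "\u0DA4"        # ña
JNYA = "\u0DA5"       # separately encoded letter for ja + ña


def touching(first: str, second: str) -> str:
    """Abutting consonants: <U+200D, U+0DCA> between the two letters."""
    return first + ZWJ + AL_LAKUNA + second


def ligature(first: str, second: str) -> str:
    """Ligated consonants: <U+0DCA, U+200D> between the two letters."""
    return first + AL_LAKUNA + ZWJ + second


def join_pair(first: str, second: str) -> str:
    """Hypothetical chooser following the tentative rules in the note."""
    if first == JA and second == NYA:
        return JNYA
    if first == YA and second == YA:
        # Tentative evidence for Sanskrit; Pali reportedly uses the ligature.
        return touching(first, second)
    if RA in (first, second) or second == YA:
        return ligature(first, second)
    # Placeholder default only; the real choice depends on the consonants.
    return touching(first, second)
```

For instance, join_pair("\u0D9A", "\u0DBB") (ka plus ra) yields the ligature sequence, while two arbitrary consonants fall through to the placeholder default.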
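
The digit-enclosing rule noted for Javanese and Balinese could be handled by converting each run of Devanagari digits and wrapping it in the script's enclosing mark. The Python sketch below is one possible approach, not the module's code: the function name enclose_digits and the script keys "Java" and "Bali" are assumptions. It handles digits only and leaves every other character untouched.

```python
import re

DEVA_ZERO = 0x0966  # DEVANAGARI DIGIT ZERO

# Code point of digit zero and the enclosing mark for each script
# (marks copied from the examples in the list above).
SCRIPTS = {
    "Java": (0xA9D0, "꧇"),
    "Bali": (0x1B50, "᭞"),
}


def enclose_digits(text: str, script: str) -> str:
    """Convert runs of Devanagari digits and wrap each run in the mark."""
    zero, mark = SCRIPTS[script]

    def repl(match: re.Match) -> str:
        # Shift each digit from the Devanagari block to the target block.
        digits = "".join(chr(zero + ord(c) - DEVA_ZERO) for c in match.group())
        return mark + digits + mark

    return re.sub("[\u0966-\u096F]+", repl, text)
```

With the test input from the list, enclose_digits("०६", "Java") gives ꧇꧐꧖꧇ and enclose_digits("०६", "Bali") gives ᭞᭐᭖᭞.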