Module talk:pi-decl/noun/Laoo
Alphabetic Writing Systems
I'm starting by trying to clean up some jumbled talk here.
Note for Outsiders
Thailand and Laos have two basic ways of writing Pali in their everyday script - as an abugida, and as an 'alphabet', i.e. with no implicit vowels. Pali support on Wiktionary has hitherto been geared solely towards the abugida. Now it so happens that almost all Lao script Pali I have to hand is written alphabetically. (The same goes for printed Thai at the moment!)
Quick Fix
We know Thai and Lao have another way of writing, the monosyllabic version (which has ะ everywhere and drops the virama). Instead of adding a new set of declensions, I suggest making new submodules for them so they do not get jumbled. Or just convert them at the end. --Octahedron80 (talk) 17:13, 18 May 2019 (UTC)
@Octahedron80: As to explaining the 'ah' set, there was a change comment at its introduction at "21:56, 17 May 2019" (UTC, I believe), namely "Added m/n with explicit vowels.". In retrospect, "m/n" for "masculine/neuter" may have been too cryptic. RichardW57 (talk) 18:31, 18 May 2019 (UTC)
Until such time as we have sorted them out, I am restoring the 'ah' declension to handle the Lao declension in ະ; the declension of alphabetic a-stems is a lot of work to handle by a list of exceptions. The other vocalic declensions are not too bad. For full support of these orthographies, there is more work to do, and inflection can then exploit transliteration from Latin. RichardW57 (talk) 17:39, 18 May 2019 (UTC)
Way Forward
We have at least two editors (me and Octahedron80) who are happy with the idea of supporting both writing systems. RichardW57 (talk) 18:31, 18 May 2019 (UTC)
I believe the proper way forward is to
- Create transliterations from Latin to these writing systems. Problems:
- Transliteration machinery seems to be geared to the idea of one writing system per script per language. RichardW57 (talk) 18:31, 18 May 2019 (UTC)
- These writing systems have not been registered with IANA. RichardW57 (talk) 18:31, 18 May 2019 (UTC)
- IANA registration may help if we need access to protected data. If we don't, then we could technically extend the 'script' names, e.g. with Laoo_full for the fully vowelled systems as opposed to the abugida. Are there accepted names for the two systems of writing Pali? What of Pali in Thai script? I feel we need to have a consensus for the names we use. The writing system needs to be an option to the inflection templates - automatic detection may sometimes fail and may sometimes get it wrong, just as script is often an optional argument. The script categories for a language's lemmas may merit subdivision by writing system; perhaps we can keep that division private to Pali without involving {{Module:headword}}. RichardW57 (talk) 22:40, 18 May 2019 (UTC)
- {{pi-alt}} should probably group forms on the basis of scripts. Some words, e.g. vidū, are the same in both writing systems. RichardW57 (talk) 18:31, 18 May 2019 (UTC)
- We can also consider how to support Tai Tham and possible Sinhalese variations.
- We can then build most of the declension tables from Latin, as has been done for the consonant declension and is also done for the conjugation of verbs. The per-script tests can then also be added simply. They currently rely on transliteration. RichardW57 (talk) 18:31, 18 May 2019 (UTC)
- I propose adding three optional arguments to pi-decl-noun (plus another unconnected, optional argument for presentation):
- @Octahedron80: 'liap' will specify the consonant(s) to be used for the Lao-script Instrumental/Ablative Plural corresponding to -bhi; it will be a string built of 'b', 'bh' and 'b.' with the obvious meaning. I will also allow the values 'all' and 'none'. For now, the default will be as at present, which avoids breaking the Lao script declension test.
- 'aa' will select the form of aa in the Tai Tham (and Burmese?) affixes where variation is seen. The values will be 'round', 'tall', 'both' and 'default'. It will be relevant for stems in 'a' and possibly some of the consonant stems. We may need to discuss whether to apply it to the -avo ending of 'u' stems - first I need to collect the evidence.
- 'full' will select whether 'a' has a written vowel - 'yes', 'no' or 'both'. If implementing 'both' is too difficult, I will not implement it. Apart from error conditions, it will only take effect for the Lao and Thai scripts. It will have no effect for 'a' stems - their stems are always different in the two writing systems. I'm not sure if 'full=both' will be useful; it may be better to create two declension tables. This optional argument will also be passed to the transliteration function. -- RichardW57m (talk) 16:57, 21 May 2019 (UTC)
- @Octahedron80: I'm going to use 'impl' instead. It's much easier to explain than 'full'. Anyone from an alphabetic background who can read an abugida should know what an implicit vowel is. -- RichardW57 (talk) 16:00, 24 May 2019 (UTC)
- I will use 'label' to supply an alternative for display in the heading - sometimes the formal 'stem' is not the best base for generating the declension table.
- I will also prevent manual entry of forms from causing forms to be displayed twice. I already do this for verbs. -- RichardW57m (talk) 16:57, 21 May 2019 (UTC)
- A-stem declensions may be worth encoding separately. RichardW57 (talk) 18:31, 18 May 2019 (UTC)
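Editorial sketch of how the optional arguments proposed above ('liap', 'aa', and 'impl', formerly 'full') might be validated. It is written in Python rather than Scribunto Lua so that it runs standalone, and it is not the module's code. Assumptions: the 'liap' tokens are concatenated without a separator and read longest-first; 'impl' keeps the values 'yes', 'no' and 'both' that were listed for 'full'; the function and constant names are hypothetical.

```python
# Illustrative only: names, defaults and the token syntax are assumptions.
LIAP_TOKENS = ("bh", "b.", "b")  # longest first, so "bh" is not read as "b" + "h"
AA_VALUES = ("round", "tall", "both", "default")
IMPL_VALUES = ("yes", "no", "both")  # assumed to carry over from 'full'


def parse_liap(value):
    """Return the list of -bhi consonant choices named by a 'liap' value."""
    if value is None:
        return None  # argument absent: the caller keeps the present default
    if value == "all":
        return ["b", "bh", "b."]
    if value == "none":
        return []
    tokens, i = [], 0
    while i < len(value):
        for tok in LIAP_TOKENS:
            if value.startswith(tok, i):
                if tok not in tokens:  # ignore repeated tokens
                    tokens.append(tok)
                i += len(tok)
                break
        else:  # no token matched at position i
            raise ValueError(f"bad 'liap' value: {value!r}")
    return tokens


def check_choice(name, value, allowed, default):
    """Validate an enumerated argument, falling back to a default when absent."""
    if value is None:
        return default
    if value not in allowed:
        raise ValueError(f"{name} must be one of {allowed}, got {value!r}")
    return value
```

For example, `parse_liap("bbh")` yields the two choices 'b' and 'bh', while an unrecognised value raises an error rather than silently falling back to the default.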
Nuktas
AFAIK the underdot (virama) is not used to stand in for an alternative consonant whose correct glyph is not available (e.g. ພ຺ is not intended to stand for ຠ). Please see http://www.unicode.org/L2/L2017/17106-lao-for-pali.pdf (Unicode Proposal with unrelocated nya) --Octahedron80 (talk) 19:32, 18 May 2019 (UTC)
- Follow the quotes, e.g. for ສາຣະຖິ (sārathi), and learn that they have been so used. The evidence that I have says that your changes should be reverted. The date suggests that the additional letters had been rejected at the time of the book (1943 AD) I am working from. RichardW57 (talk) 19:58, 18 May 2019 (UTC)
- With my limited knowledge, I had never seen an alternative consonant written with an underdot. Now I know, and the new form is not wrong either. So there is one more form of it. This will make it harder for my modules to return both forms. BTW, your original file at laomanuscripts.net is now missing. --Octahedron80 (talk) 20:14, 18 May 2019 (UTC)
- Do you have evidence of the additional letters being used with the alphabetic system? What 'permanent form' is it in? RichardW57 (talk) 19:58, 18 May 2019 (UTC)
- It is already on page 17 of the PDF. Have you seen it? --Octahedron80 (talk) 20:04, 18 May 2019 (UTC)
- Curious! I looked through that PDF (and have cited it for ພຸທ຺ຘ (buddha)), but could only find three words. I have looked again, and now I see an example in the alphabetic system (i.e. the system where vowels are all explicit, not implicit) on page 17 (page 20 in second issue). This is where life gets complicated. I don't believe extra letters and nuktas will be used in the same word. We have the added complication of writing that could be described as dropping the nuktas, e.g. ພະຄະວາ for bhagavā - there's plenty of that on the Internet. I had planned to propose treating the nuktas like macrons in Latin - we don't treat the absence as a difference in spelling. The Romans had a period of marking long vowels (using the apex), but gave it up. I intend to restore ພ຺ to the words I have cited from Maha Sena; there are several options for what we do with the general inflections. RichardW57 (talk) 21:16, 18 May 2019 (UTC)
- @Octahedron80: Options for alphabetic writing systems:
- Deliver -ຠິ, -ພ຺ິ and -ພິ for all stems. The editor must override the inflection if any of them is inappropriate. At present, he would use the 'replace' option and specify the ones he accepts. Perhaps I need to add options such as insp_del, insp_del2 to specifically prohibit certain inflected forms.
- Check for compatibility with stem, and only deliver the compatible forms. This is probably the easiest for editors. -- RichardW57 (talk) 23:46, 18 May 2019 (UTC)
- However, this won't stop *ພະຄະວັນເຕຠິ; if it has PALI BHA in the ending, it should have it for the initial letter! -- RichardW57 (talk) 01:07, 19 May 2019 (UTC)
- Only deliver -ຫິ; the editor must provide any other forms.
- Perhaps we need to add a simpler, more specific option to {{pi-decl-noun}} rather than only use a general purpose mechanism to tailor inflections.
- I've looked at how entries for Latin work. They'd break the mechanism that allows us to invoke {{pi-decl-noun}} without any arguments.
- I don't know whether the problem occurs with the abugida writing systems. -- RichardW57 (talk) 23:46, 18 May 2019 (UTC)
- The word ທັມມະ (damma) gives an example of invisible etymological constraints on the spelling of the -bhi case ending. -- RichardW57 (talk) 15:23, 19 May 2019 (UTC)
PS Does IANA really matter?--Octahedron80 (talk) 19:53, 18 May 2019 (UTC)
- Answered where I raised the issue. -- RichardW57 (talk) 23:46, 18 May 2019 (UTC)
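Editorial sketch of the second option discussed above ("check for compatibility with stem, and only deliver the compatible forms"), in Python so that it runs standalone. It is an illustration under stated assumptions, not the module's behaviour: the function name is hypothetical, the ending is assumed to be spelled consonant, then nukta, then vowel, and a stem with neither a nukta nor one of the added Pali letters is assumed to be compatible with all three spellings. It applies only to the alphabetic system, where the underdot serves as a nukta rather than as a virama.

```python
# Pali consonants added to the Lao block in Unicode 12.0.
PALI_LETTERS = frozenset(
    "\u0e86\u0e89\u0e8c\u0e8e\u0e8f\u0e90\u0e91"
    "\u0e92\u0e93\u0e98\u0ea0\u0ea8\u0ea9\u0eac"
)
NUKTA = "\u0eba"     # the underdot, used as a nukta in the alphabetic system
PHO = "\u0e9e"       # plain letter used for b/bh
PALI_BHA = "\u0ea0"  # dedicated Pali letter for bh
VOWEL_I = "\u0eb4"


def bhi_endings(stem):
    """Return the spellings of -bhi that match the stem's visible spelling style."""
    endings = []
    if any(ch in PALI_LETTERS for ch in stem):
        endings.append(PALI_BHA + VOWEL_I)     # stem already uses the added letters
    if NUKTA in stem:
        endings.append(PHO + NUKTA + VOWEL_I)  # stem already uses nuktas
    if not endings:
        # Nothing in the stem shows its style, so nothing can be ruled out.
        endings = [PALI_BHA + VOWEL_I, PHO + NUKTA + VOWEL_I, PHO + VOWEL_I]
    return endings
```

The last branch is exactly the limitation raised in the thread: a stem such as ພະຄະວັນເຕ shows no marker, so this check alone still lets the PALI BHA ending through.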
Copy and Paste Liability
In the meantime, why have you duplicated the i- and u- declensions for abugidic Lao? They were working just fine by conversion of the affixes only from Latin script. Entering manual conversions means that if a change is required - and I wouldn't rule it out - we have to change every submodule containing the data. RichardW57 (talk) 17:39, 18 May 2019 (UTC)
I generated an explicit short a-declension table for Lao because the prescript vowels would have required too much thought. (It's not insoluble - I had to deal with it for regular optatives.) RichardW57 (talk) 17:39, 18 May 2019 (UTC)
- I did not originally develop the modules from Latin script. I created them from Thai script at Thai Wiktionary some time ago and then imported them here. The data tables for every script were the first thing I could think of. Later, automatic script conversion was introduced by people here. It may be better if it could work from a single data table, as in the new approach. IMO, Deva is the easiest model to manage, but due to policy, they use Latin script in the first position. Octahedron80 (talk) 18:43, 18 May 2019 (UTC)
- I recently made a test module that can convert traditional text into monosyllabic form: Module:User:Octahedron80/test. The result is at User:Octahedron80/sandbox. I think it is ready to use. Please give me some time to think about where to put it. --Octahedron80 (talk) 18:43, 18 May 2019 (UTC)
- @Octahedron80: One sensible home would be as an exported function of pi-Latn-translit. -- RichardW57 (talk) 16:43, 19 May 2019 (UTC)
- At Thai Wiktionary, we have Module:pi-alt (Thai-based) as the backbone of Template:pi-alt, instead of Module:pi-Latn-translit (Latin-based). Since they have different logic, and I must also develop at my own site, I will leave those Pali modules mainly for you to manage. --Octahedron80 (talk) 08:08, 25 May 2019 (UTC)
- @Octahedron80: Now implemented here for Lao. For readability, I have used carriers for the non-spacing vowels etc; function dc() deletes them. (How do you manage to read such expressions?) I found two bugs:
- You overlooked consonants at the end of the string - your test case does not check this environment well. I added the line
- result = mw.ustring.gsub(result, "([ກ-ຮ])$", "%1ະ")
- Thanks. I have also added this rule for Thai. (I use BabelMap, which shows every symbol.) --Octahedron80 (talk) 08:50, 25 May 2019 (UTC)
- When deleting mai kan because of a vowel immediately before it, you forgot the virama. That was almost impossible to see the way you wrote the regular expression. The corrected line reads
- result = mw.ustring.gsub(result, dc("([ກ-ຮ])([າອິອີອ຺ອຸອູ])ອັ"), "%1%2")
- I think I already added it. My mistake. --Octahedron80 (talk) 08:52, 25 May 2019 (UTC)
- -- RichardW57 (talk) 08:26, 25 May 2019 (UTC)
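Editorial sketch of the two fixes described above, in Python so that it runs standalone; the module itself uses mw.ustring.gsub in Lua. It covers only these two rules, not the rest of the abugida-to-alphabetic conversion, and the function names other than dc() are hypothetical. Instead of typing the combining marks literally, the sketch names each code point and builds the carrier-prefixed character class from those names, which is a different route to the readability that the carriers were introduced for.

```python
import re

CARRIER = "\u0ead"      # Lao letter O, written before a mark only to make it visible
CONSONANT = "[\u0e81-\u0eae]"  # the same range as the Lua class from KO to HO TAM
SARA_A = "\u0eb0"       # the explicit short a
MAI_KAN = "\u0eb1"
# Marks after which mai kan is dropped: AA, I, II, the Pali virama, U, UU.
MARKS = ["\u0eb2", "\u0eb4", "\u0eb5", "\u0eba", "\u0eb8", "\u0eb9"]


def dc(pattern):
    """Delete carriers from a pattern. It removes every carrier character,
    so a pattern must not rely on a literal O consonant surviving."""
    return pattern.replace(CARRIER, "")


def add_final_a(text):
    """First fix: a consonant at the very end of the string gets an explicit a."""
    return re.sub("(" + CONSONANT + r")\Z", r"\1" + SARA_A, text)


def drop_mai_kan(text):
    """Second fix: delete mai kan when a vowel or the virama immediately precedes it."""
    marks = MARKS[0] + "".join(CARRIER + m for m in MARKS[1:])
    pattern = dc("(" + CONSONANT + ")([" + marks + "])" + CARRIER + MAI_KAN)
    return re.sub(pattern, r"\1\2", text)
```

A test case for the first fix needs an input that ends in a bare consonant, which is the environment the original test data did not exercise.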
Testing with Explicit Vowels
Module:pi-decl/noun/Laoo/testcases now tests inflection for both systems - implicit vowels plus 'Pali virama' and explicit vowels. There is as yet no regression test for alternative handlings of the -bhi ending. -- RichardW57 (talk) 08:33, 25 May 2019 (UTC)