Module talk:sa-translit

Until a better method is found (anusvara, candrabindu) ...


At Wiktionary_talk:Votes/pl-2014-06/Romanization_of_Sanskrit#What_if_the_word_is_attested_using_a_transliteration_different_from_our_module.3F, I mentioned the issue with anusvara and candrabindu transliterations. If this is fixed, other Indic transliteration modules can be fixed as well. --Anatoli (обсудить/вклад) 01:19, 10 June 2014 (UTC)

Useless


This module is useless since we must provide accent marks manually in transliterations, as they are unpredictable. Also, elements of compounds must be manually specified. --Ivan Štambuk (talk) 12:26, 29 June 2014 (UTC)

@Ivan Štambuk "Useless" is a strong word here and I disagree. If accent marks are not present in the native script, they obviously won't appear in the transliteration. Likewise, unaccented Cyrillic won't show accents in the transliteration either. The same goes for compounds. It is possible to employ tricks, such as accents, which are not visible in the native script but visible in the transliteration. This is done in the Korean, Japanese and Mandarin Chinese transliteration modules (most developed by User:Wyang), where word spaces (ja, cmn), hyphens and capitalisation are used to improve readability. Of course, a dedicated Sanskrit editor is required, and someone with good Lua skills. --Anatoli (обсудить/вклад) 23:41, 13 July 2014 (UTC)

The problem is that this module was being used as a replacement for transcriptions with accents specified. God knows how many Sanskrit entries were "improved" that way.

Any invisible characters for accents or compound separation would break Devanagari spellings and would require an extra preview to ensure that the output is generated properly, as is currently done with Sanskrit inflection templates (which could be largely automated in Lua now, but the principle is the same). --Ivan Štambuk (talk) 07:37, 14 July 2014 (UTC)

Om


It appears that the module cannot handle ॐ (oṃ). This should produce whatever the preferred translit is (I would add it, but I don't know if what's currently in the entry is best). @DerekWinters, Atitarev, Wyang —Μετάknowledge discuss/deeds 04:38, 16 September 2016 (UTC)

Okay, I've added it as oṃ and fixed the entry. I've also added the om character to MOD:hi-translit and MOD:bo-translit. As usual, revert if I did anything wrong. —Μετάknowledge discuss/deeds 04:55, 16 September 2016 (UTC)

@Metaknowledge I think we should transliterate it as auṃ, even if "au" is monophthongized as "o". --Anatoli T. (обсудить/вклад) 04:58, 16 September 2016 (UTC)

I've not studied Sanskrit, so you guys can decide that (Wikipedia gives both as IAST, so I'm none the wiser as to which is better). —Μετάknowledge discuss/deeds 06:21, 16 September 2016 (UTC)

@Metaknowledge I haven't studied it either, but I self-studied Hindi a little bit. I got it wrong; your version was correct. ॐ is a ligature of ओ (o) + ँ (m̐), so it should be oṃ, not auṃ. Sorry! --Anatoli T. (обсудить/вклад) 06:36, 16 September 2016 (UTC)

Prakrit


@Kutchkutch: I added some missing lines from Module:inc-pra-Deva-translit to this module, and I think it should now be able to handle Prakrit transliteration too (as the difference between their transliterations is small). We could make this Prakrit's Devanagari transliteration module and delete Module:inc-pra-Deva-translit. Svartava2 (talk) 14:24, 17 October 2021 (UTC)

@Svartava2: Done. Observe if there are any errors. Kutchkutch (talk) 12:26, 18 October 2021 (UTC)

@Kutchkutch: thanks, I'll be on the lookout. Svartava2 (talk) 13:48, 18 October 2021 (UTC)

@Kutchkutch: Why did you change it back and create Module:inc-pra-Deva-translit? —Svārtava [t•c•u•r] 10:42, 14 December 2021 (UTC)

@Svartava2: This module is used for several languages, and the following may only be needed for Prakrit:

eCC → ĕCC & oCC → ŏCC

ऎ, ऒ, य़

Are these to be used for other languages as well? Kutchkutch (talk) 10:51, 14 December 2021 (UTC)

@Kutchkutch: Just because a certain part of a module is only for Prakrit, I don't think the whole module should be duplicated. Adding it to this module would at most mean that if these characters occur in any word of the other languages using this module, they will be transliterated as well. What's the harm? —Svārtava [t•c•u•r] 13:12, 14 December 2021 (UTC)

Svartava2 There may be no harm regarding ऎ, ऒ, य़. However, with a single module with the rules eCC → ĕCC & oCC → ŏCC, Sanskrit हेक्का would be hĕkkā instead of hekkā, and Sanskrit योद्धृ would be yŏddhṛ instead of yoddhṛ. Is this appropriate for Sanskrit? Kutchkutch (talk) 14:48, 17 December 2021 (UTC)

Kutchkutch It won't be appropriate for Sanskrit. However, I still can't see es as ĕs in Prakrit: for example at अभिसेअ (abhisea). —Svārtava [t•c•u•r] 15:08, 17 December 2021 (UTC)

Svartava2 What eCC → ĕCC & oCC → ŏCC means is that e & o before two consonants become ĕ & ŏ, respectively. So, अभिसेअ (abhisea) does not qualify since e does not come before two consonants. See Module:inc-pra-Deva-translit/testcases. Kutchkutch (talk) 15:21, 17 December 2021 (UTC)
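As an illustration of the rule just described, here is a minimal Python sketch. It is not the code of either module: it works on romanised text rather than Devanagari, the consonant inventory is simplified, and the function name `mark_short_vowels` and its `enabled` switch (standing in for a per-language test) are hypothetical.

```python
import re
import unicodedata

# One consonant of the romanisation. An aspirated stop such as "kh" counts as
# a single consonant, so a plain stop only matches when no "h" follows it.
CONSONANT = r"(?:[kgcjṭḍtdpb]h|[kgcjṭḍtdpb](?!h)|[ṅñṇnmyrlvśṣsh])"
# "e" or "o" immediately followed by two consonants (eCC, oCC).
SHORT_BEFORE_CLUSTER = re.compile(rf"[eo](?={CONSONANT}{{2}})")
BREVE = {"e": "ĕ", "o": "ŏ"}


def mark_short_vowels(translit: str, enabled: bool = True) -> str:
    """Apply eCC -> ĕCC and oCC -> ŏCC; `enabled` is a hypothetical language switch."""
    translit = unicodedata.normalize("NFC", translit)
    if not enabled:
        return translit
    return SHORT_BEFORE_CLUSTER.sub(lambda m: BREVE[m.group()], translit)


print(mark_short_vowels("hekkā"))                  # rule applies before "kk"
print(mark_short_vowels("abhisea"))                # no cluster after "e"
print(mark_short_vowels("hekkā", enabled=False))   # rule switched off
```

With the switch off, the Sanskrit forms quoted above stay as they are, which is the behaviour a shared module would need.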

@Kutchkutch, Svartava, Svartava2, RichardW57 Are ĕ & ŏ even appropriate for Prakrit? What transliteration schemes are they used in? The documentation for Prakrit says to use IAST, which doesn't know these symbols, and the Library of Congress claims not to make any such distinction. While the short vowels might occur in open syllables (as metrical evidence indicates for early Pali), what orthography makes the distinction? If we should keep the breve, a simple test on the language is all it needs for the modules to be reunited, as in Module:Brah-translit. --RichardW57m (talk) 13:23, 11 May 2022 (UTC)

@RichardW57, RichardW57m: The transliteration schemes of the following primary Prakrit resources distinguish short e and short o from long e and long o in closed syllables:

{{R:inc:Pischel}}

{{R:inc:Woolner}}

http://prakrit.info/prakrit/grammar.html?r=phonology

Since this is a documented phonological distinction that is entirely predictable, it would be useful to indicate it in the transliteration of Prakrit with the breve diacritic as in Pischel and Woolner (prakrit.info transliterates the long vowels as ō and ē). prakrit.info makes the distinction in the orthography using ऎ and ऒ for the short vowels, which can be used in the |head= parameter instead of page titles. The addition of the breve diacritic (and the dot above y as ẏ) is not as drastic of a change as other phenomena such as homorganic nasal assimilation at MOD:hi-translit, which changes the character entirely (ṃ → [ṅ, ṇ, ñ, n, m]). The documentation could be changed to IAST with some modifications for Devanagari along with a more detailed description of the Brahmi and Kannada scripts.

(see User_talk:Kutchkutch#ಬೋಲ್ಲಇ, Talk:रोकड़, Template talk:pra-noun)

(see WT:Beer_parlour/2021/January#Use_ISO_15919_for_Hindi_transliteration).

Kutchkutch (talk) 00:58, 12 May 2022 (UTC)
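For comparison with the breve rule, the homorganic nasal assimilation mentioned above (ṃ → [ṅ, ṇ, ñ, n, m]) can be sketched as follows. This is only an illustration on romanised text under the usual place-of-articulation classes; it is not taken from MOD:hi-translit, and the function name is hypothetical.

```python
import re
import unicodedata

# (pattern, replacement) pairs: anusvara becomes the nasal that matches the
# place of articulation of the following stop. Aspirates need no extra case,
# since only the first letter of the stop is inspected.
HOMORGANIC = [
    (re.compile(r"ṃ(?=[kg])"), "ṅ"),   # velar
    (re.compile(r"ṃ(?=[cj])"), "ñ"),   # palatal
    (re.compile(r"ṃ(?=[ṭḍ])"), "ṇ"),   # retroflex
    (re.compile(r"ṃ(?=[td])"), "n"),   # dental
    (re.compile(r"ṃ(?=[pb])"), "m"),   # labial
]


def assimilate_anusvara(translit: str) -> str:
    """Replace ṃ before a stop by the homorganic nasal; leave it elsewhere."""
    translit = unicodedata.normalize("NFC", translit)
    for pattern, nasal in HOMORGANIC:
        translit = pattern.sub(nasal, translit)
    return translit


print(assimilate_anusvara("hiṃdī"))
print(assimilate_anusvara("saṃsār"))  # no stop follows, so ṃ is kept
```

Here the character itself is replaced, whereas the breve rule only adds a diacritic to an existing letter.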

@Kutchkutch, Svartava: The claim "Since this is a documented phonological distinction that is entirely predictable, it would be useful to indicate it in the transliteration of Prakrit" makes no sense to me, because if it is entirely predictable, there is no need to complicate matters by marking the distinction. Spelling is spelling, pronunciation is pronunciation.

A different justification is that, if Wiktionary is to be believed, then the Kannada script now makes the distinction when used for Prakrit. (I'd like to see evidence without having to force it via RfV.) As the orthographic distinction of short and long 'e' and 'o' is fairly recent, dating back I believe to Western European missionary effort in India, we're going to have to work out what to do for the forms in inscriptions. Or was this an independent orthographic reform? I've seen the Sanskrit vowels marked as long in a Kannada script version of the Bhagavad Gita. --RichardW57m (talk) 12:51, 12 May 2022 (UTC)

As a point of information, Pischel reckons that the length of Prakrit final <e> and <o> after long vowels is metrically determined; in this environment, [ĕ] and [ŏ] may be written <e> and <o> or <ĭ> and <ŭ> in the old sources. It looks as though our automated transliteration mechanisms will fail for some quotations in verse. --RichardW57m (talk) 10:13, 23 May 2022 (UTC)

Vedic accents


Getting Text

@RichardW57m Hi again. I am trying to implement the Devanagari Vedic accents, but there's one irritating issue: the bold formatting (3 apostrophes) apparently splits the text into separate strings, so for example in the second line of the quote at विष्णु (viṣṇu) there would incorrectly be an accent on the first syllable of 'stavate' (because it is at the start of a string).

There was a somewhat similar issue at the Hindi transliteration (using template 'hi-x'), but there I could solve it because the 'bold' apostrophes were simply part of the string. Any ideas to fix this? Exarchus (talk) 16:05, 29 February 2024 (UTC)

@Exarchus: 'Automatic' transliteration is not suitable for text with Devanagari Vedic accents - Module:languages#Language:transliterate. If you believe the documentation, it's probably not suitable for many writing systems, e.g. most Mainland SE Asian Indic writing systems. However, the problem cases usually don't occur.

The solution is to bypass the standard mechanism (probably including {{xlit}}) and have a template to perform {{#invoke:sa-translit|tr}} directly to deliver the transliteration. As there are multiple ways of writing the Vedic accent, you may anyway need a more elaborate entry point to specify the system to be transliterated from, just as Module:pi-translit has an extra entry point trwo to enable one to specify which Thai- or Lao-script writing system is being used. One can also find that conflicts between word and syllable boundaries can force one to provide a complete manual transliteration.

Pinging @Theknightwho in case he knows a better solution. --RichardW57 (talk) 19:36, 29 February 2024 (UTC)

@RichardW57 I don’t understand why you say automatic transliteration isn’t suitable while also saying we should use direct invocations. @Benwing2 can you make more sense of this? Theknightwho (talk) 21:51, 29 February 2024 (UTC)

@Theknightwho I don't understand either. @RichardW57 can you clarify? Benwing2 (talk) 01:46, 1 March 2024 (UTC)

@Benwing2: I was assuming that the referenced text by @Theknightwho was both clear enough and relevant: "This function assumes tr(s1) .. tr(s2) == tr(s1 .. s2). When this assertion fails, wikitext markups like ''' can cause wrong transliterations". Automatic transliteration breaks text at the triple quote needed for emboldening the word that is being quoted, transliterating text either side of a triple quote independently. However, the Vedic accentuation notation is stateful, so one cannot (TBC) always transliterate chunks independently. Certainly the reading rules that @Exarchus knows don't work.

On the other hand, a direct invocation would not split the string, so for text from the Rig Veda, the rule that a pada-initial unmarked syllable is udatta can be applied because the whole text, including mark-up, is passed to the Lua function tr. (I don't know how the segmentation into padas would be done.) --RichardW57 (talk) 13:15, 1 March 2024 (UTC)
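The quoted assumption, tr(s1) .. tr(s2) == tr(s1 .. s2), can be checked mechanically. The Python sketch below uses a deliberately trivial stand-in, `toy_tr`, which treats the first character of whatever string it receives as special (much as a pada-initial rule would); both `toy_tr` and `is_chunk_safe` are hypothetical names, and nothing here reproduces Module:languages.

```python
def toy_tr(text: str) -> str:
    """Toy stateful 'transliterator': flags the first character of its input."""
    return text[:1].upper() + text[1:]


def is_chunk_safe(tr, chunks) -> bool:
    """True if transliterating chunk by chunk equals transliterating the whole."""
    return "".join(tr(chunk) for chunk in chunks) == tr("".join(chunks))


# Splitting at the bold markup ''' gives three independent chunks.
chunks = "pra tad'''viṣṇuḥ''' stavate".split("'''")

print(is_chunk_safe(str.lower, chunks))  # a context-free mapping passes
print(is_chunk_safe(toy_tr, chunks))     # a start-sensitive mapping fails
```

Any rule that depends on what came earlier in the line behaves like `toy_tr` here, which is why passing the whole text to one call avoids the problem.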

Segmentation of padas can be done by detecting a danda (। ॥) or "<br\>" (without the backslash). Exarchus (talk) 13:21, 1 March 2024 (UTC)

@Exarchus: Is this some property of accented verse? I've often noted pairs of padas separated only by white space, and sometimes not even that, e.g. the first verse at https://sa.wikisource.org/wiki/%E0%A4%AD%E0%A4%97%E0%A4%B5%E0%A4%A6%E0%A5%8D%E0%A4%97%E0%A5%80%E0%A4%A4%E0%A4%BE/%E0%A4%85%E0%A4%B0%E0%A5%8D%E0%A4%9C%E0%A5%81%E0%A4%A8%E0%A4%B5%E0%A4%BF%E0%A4%B7%E0%A4%BE%E0%A4%A6%E0%A4%AF%E0%A5%8B%E0%A4%97%E0%A4%83. --RichardW57 (talk) 13:36, 1 March 2024 (UTC)

I don't think the first lines there are part of a pāda; they look like an introductory prayer. I'd think the first pada starts at धर्मक्षेत्रे. Exarchus (talk) 13:46, 1 March 2024 (UTC)

@Exarchus The point is that the first line "धर्मक्षेत्रे कुरुक्षेत्रे समवेता युयुत्सवः ।" contains two padas, "धर्मक्षेत्रे कुरुक्षेत्रे" and "समवेता युयुत्सवः". Of course, this is not an accented text. --RichardW57 (talk) 14:36, 1 March 2024 (UTC)

@RichardW57 Maybe I have been using 'pada' incorrectly, as the example at विष्णु (viṣṇu) then also has 4 padas. What I meant was semiverse, delimited by (double) danda. Exarchus (talk) 15:25, 1 March 2024 (UTC)

I have been looking at this manuscript (first Rigveda hymn) and it uses danda to actually delimit words, but it does seem to use the rule that anything between dandas is a unit with the accent rules that I've been talking about, as it has a lot more anudatta marks than would be needed in a text with fewer dandas. Exarchus (talk) 15:43, 1 March 2024 (UTC)
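The danda-or-line-break segmentation proposed above could look roughly like this in Python. It is a sketch only: the name `split_units` is hypothetical, verse numbers between double dandas are not handled, and, as noted in the replies, the units it returns are semiverses rather than individual padas.

```python
import re

# A unit ends at a danda (।), a double danda (॥) or an HTML line break.
UNIT_BREAK = re.compile(r"[।॥]|<br\s*/?>")


def split_units(text: str) -> list:
    """Split text into accent units, dropping empty pieces and outer spaces."""
    return [piece.strip() for piece in UNIT_BREAK.split(text) if piece.strip()]


# The unaccented line quoted above stays in one piece although it holds two padas.
line = "धर्मक्षेत्रे कुरुक्षेत्रे समवेता युयुत्सवः ।"
print(len(split_units(line)))
```

An accent decoder would then be restarted at the beginning of each returned unit.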

@Benwing2, Theknightwho: I think the confusion may be in what is invoked by direct invocation. I was referring to direct invocation of a transliteration module, not of the method, which is an extremely roundabout route. --RichardW57 (talk) 15:04, 1 March 2024 (UTC)

@RichardW57 About the bold formatting, I notice that when I use Hindi text in the 'Q' or 'usex' templates, the apostrophes are not part of the string. In 'hi-x', they are. So either some 'sa-x' template is needed, or ... the known bug at Module:languages#Language:transliterate should be fixed (if possible). Or maybe create specific rules for Sanskrit at the quotations module, as it would be useful to keep that functionality. Exarchus (talk) 19:30, 1 March 2024 (UTC)

I disagree with your conclusion. Note that a request to delete {{hi-usex}} has been raised. A better solution would be a working transliteration template sa-tr, which could then be shared with all the quotation modules. I've opened a general discussion at Wiktionary:Grease pit/2024/March#Access to Raw Transliteration. --RichardW57 (talk) 21:35, 1 March 2024 (UTC)

Ambiguity of Fragments

@Exarchus: In this particular case, स्तवते वी॒र्ये॑ण (stáváté vīryèṇa), wouldn't the lack of any accent marks before the anudatta at the start of vīryèṇa be incompatible with an accent on stavate? --RichardW57 (talk) 19:47, 29 February 2024 (UTC)

@RichardW57 The thing is that the first syllable of a pāda is udatta (acute accent) unless otherwise indicated. Actually, 'stavate' would have three accents.

I have been looking at https://www.evertype.com/standards/iso10646/pdf/vedic/Vedic_accents_doc.pdf and yes, there are multiple ways of indicating the accent, but I think the one for Rigveda is by far the most commonly known. To incorporate the system for the Sāmaveda (with superscript numbers) would be easy.

There is of course the question of whether one wants to 'split' the words combined by external sandhi; that would have to be done manually. A further question is whether you want to metrically restore the Rigveda text (so no independent svarita).

I'll look at the suggestions to evade this issue with bold formatting. Exarchus (talk) 20:54, 29 February 2024 (UTC)

@Exarchus: But if the 3rd syllable of stavate was udatta, wouldn't the first syllable of vīryeṇa have to be svarita or udatta rather than anudatta? What you have to do in the standard transliteration environment is to work out whether the first syllable of the string is the first syllable of the pada. I'm sure it can't always be done; I just don't know how often it can be done. --RichardW57 (talk) 21:46, 29 February 2024 (UTC)

@RichardW57 udatta + anudatta + udatta is a possibility, see rule 5 on page 2 here

There's another problem with the example at विष्णु (viṣṇu), as the string "प्र तद्" (pra tad) has no accent indicated, so it would be skipped as being unaccented text, but it actually has two udattas. Exarchus (talk) 21:58, 29 February 2024 (UTC)

@Exarchus: For anudatta, anudatta, the non-initial marking would be none, anudatta.

For udatta, anudatta, the non-initial marking would be none, svarita.

For anudatta, udatta, the non-initial marking would be anudatta, none.

Therefore, we must have two udattas, and there is no ambiguity. --RichardW57 (talk) 13:29, 1 March 2024 (UTC)

My point about "प्र तद्" (pra tad) was that I obviously don't want to put acute marks on all Sanskrit words without accents, and in this string there is no indication of it being accented text.

The non-initial mark for anudatta, anudatta could also be none, none if no svarita or udatta follows. Exarchus (talk) 13:41, 1 March 2024 (UTC)

@Exarchus But the relevant text starts प्र तद्विष्णुः॑. The fourth syllable is marked with STRESS SIGN UDATTA, which on that syllable surely must be a dependent svarita. --RichardW57 (talk) 14:16, 1 March 2024 (UTC)

Yes, but I thought we were talking about the issue of the bold formatting splitting up text into strings, and as it is now "प्र तद्" is one string. Without this formatting, the module as I have it now works fine (I just need to add the case of independent svarita + independent svarita). Exarchus (talk) 14:26, 1 March 2024 (UTC)

@Exarchus: Fair point! RichardW57 (talk) 14:28, 1 March 2024 (UTC)

@Exarchus: In the sequence udatta, anudatta, udatta, wouldn't the anudatta be marked as such? --RichardW57 (talk) 14:27, 1 March 2024 (UTC)

Yes, always before udatta, but you can't know whether the first syllable is udatta or anudatta without knowing what comes before (first syllable of pada → udatta; other udatta before it → udatta; svarita or anudatta before it → anudatta). Exarchus (talk) 14:48, 1 March 2024 (UTC)

But you can sometimes work out what comes before by looking at what comes afterwards. --RichardW57 (talk) 15:11, 1 March 2024 (UTC)
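The reading rules discussed in this thread can be written as a small state machine. The Python sketch below is one reading of those rules, not the module's code: syllabification is assumed to have been done already, the written marks are abstracted to the strings "anudatta" and "svarita", and the names `decode_accents` and `initial_high` are hypothetical. An unmarked syllable is read as udatta while the state is "high" (at the start of a unit, after another udatta, or after a marked anudatta) and as unaccented after a svarita.

```python
ANUDATTA = "anudatta"  # written stroke below the syllable
SVARITA = "svarita"    # written stroke above the syllable
UNMARKED = ""


def decode_accents(syllables, initial_high=True):
    """Map (syllable, written mark) pairs to (syllable, spoken accent) pairs.

    initial_high=True assumes the input starts a unit (or follows an udatta).
    """
    high = initial_high
    decoded = []
    for text, mark in syllables:
        if mark == ANUDATTA:
            accent = ""          # low syllable; the next unmarked one is udatta
            high = True
        elif mark == SVARITA:
            accent = "svarita"   # following unmarked syllables are unaccented
            high = False
        else:
            accent = "udatta" if high else ""
        decoded.append((text, accent))
    return decoded


# The line discussed above, as (syllable, written mark) pairs.
LINE = [("pra", UNMARKED), ("tad", UNMARKED), ("vi", UNMARKED), ("ṣṇuḥ", SVARITA),
        ("sta", UNMARKED), ("va", UNMARKED), ("te", UNMARKED),
        ("vī", ANUDATTA), ("rye", SVARITA), ("ṇa", UNMARKED)]

print(decode_accents(LINE))      # whole line: "stavate" comes out unaccented
print(decode_accents(LINE[4:]))  # fragment alone: "stavate" gets three udattas
```

Decoding the fragment from "sta" onward on its own reproduces the spurious stáváté, while passing `initial_high=False` for that fragment gives the same result as decoding the whole line. Knowing the state at the start of a fragment is exactly the missing information.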

Validity of Quotation

@Benwing2, Exarchus Is the quotation used for विष्णु (viṣṇu) valid? The page author has split the 'dv' conjunct, so we get an extra orthographic syllable in the text! (Human memories are not durably archived, so far as I am aware.) --RichardW57 (talk) 14:55, 1 March 2024 (UTC)

This was obviously done to be able to put "विष्णुः" in bold; I don't have a strong opinion on whether this is acceptable or not. What I have noticed is that the page on wikisource does not have the visarga in "तद्विष्णु". This page also has 'víṣṇu'. Exarchus (talk) 15:18, 1 March 2024 (UTC)

I think the dropping of visarga is here an allowed sandhi: -ḥ st- > -sst- > -st-. It's footnote 3 at http://spell.psychology.wustl.edu/sandhi-WCCFL/WCCFL-sandhi.html.en.utf8 . --RichardW57 (talk) 04:06, 2 March 2024 (UTC)
