Module talk:ru-translit

Adding word stress

Will {{#invoke:ru-translit|tr|сло́во}} (with an accent) return "slóvo"? --Anatoli (обсудить/вклад) 02:09, 14 March 2013 (UTC)Reply

You can test it yourself. Look here: [MODULE CALL REDACTED] —Μετάknowledgediscuss/deeds 02:19, 14 March 2013 (UTC)Reply
I see now how you can test it! Does it work for longer texts?
Testing on a random news text:
TEST: [MODULE CALL REDACTED] --Anatoli (обсудить/вклад) 02:34, 14 March 2013 (UTC)Reply

The code is too opaque, I don't understand it!

I am afraid that this code was written to be so clever that I can't understand it. The variable names mean nothing, there are barely any comments to explain what each step does and why. How does the module actually approach the problem? What is the flag parameter for? I would really like this module to be cleaned up and made more readable. This is a wiki after all, so everyone with enough knowledge of Lua should be able to easily edit this, and the last thing we need is more arcane code that nobody except the creator can maintain. —CodeCat 03:22, 14 March 2013 (UTC)Reply

Hard to follow indeed.
  1. "if not mw.ustring.match(flag,"г") then word=mw.ustring.gsub(word,"([ое][́̀]?)го([́̀]?)$","%1vo%2")" romanises "-ого"/"-его" as "-ovo"/"-(j)evo"
  2. "word = mw.ustring.gsub(word,"([АОУЫЕЯЁЮИЕаоуыэяёюиеъь][́̀]?)е","%1je");" romanises Cyrillic "е" as "je", not "e" after any of "АОУЫЕЯЁЮИЕаоуыэяёюиеъь".
  3. "word = mw.ustring.gsub(word,"([жшчщЖШЧЩ])ё","%1o");" romanises "ё" as "o" after any of жшчщЖШЧЩ. --Anatoli (обсудить/вклад) 06:17, 14 March 2013 (UTC)Reply
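For anyone who finds the Lua patterns hard to read, the three quoted substitutions can be approximated in Python (used here instead of Lua because mw.ustring only exists inside Scribunto). This is a minimal sketch, not the module's code: the function name apply_quoted_rules and the genitive switch are invented for illustration, and the result is left half-Cyrillic on the assumption that a later letter-by-letter mapping finishes the job.

```python
import re

# Optional acute (U+0301) or grave (U+0300) stress mark, as in the quoted patterns.
ACCENT = "[\u0301\u0300]?"


def apply_quoted_rules(word: str, genitive: bool = True) -> str:
    """Apply the three quoted substitutions; other letters stay Cyrillic."""
    if genitive:
        # 1. Word-final -ого/-его: the г is written as Latin "v".
        word = re.sub(f"([ое]{ACCENT})го({ACCENT})$", r"\1vo\2", word)
    # 2. "е" after a vowel letter, ъ or ь is iotated and becomes "je"
    #    (the character class is copied from the quoted code as-is).
    word = re.sub(f"([АОУЫЕЯЁЮИЕаоуыэяёюиеъь]{ACCENT})е", r"\1je", word)
    # 3. "ё" after a hushing consonant becomes plain "o".
    word = re.sub("([жшчщЖШЧЩ])ё", r"\1o", word)
    return word
```

Note that rule 1 also fires on много unless it is switched off, which is the kind of false positive discussed in this thread.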
I think I am starting to understand the general idea. But how does the code know when to transliterate г as v? Does it have something to do with the flag parameter? Personally, I don't think the module should be capable of handling such irregular exceptions. It should provide a sensible default, but the default should be able to be overridden if necessary. That is what the tr= parameter would be for, after all... —CodeCat 14:01, 14 March 2013 (UTC)Reply
That beats me, we need Ignatus to reply.
Romanising "-ого"/"-его" as "-ovo"/"-(j)evo" should NOT be done via the tool. Russian has words "много", "ого" where "г" is pronounced as expected or /h/ in ого as a variant. Override manually.
Same with "Чч" as "š", in что, чтобы, конечно. Override manually.
Consistent change - "Ё,ё" as "o" after жшчщЖШЧЩ - OK.
Consistent change - "Е,е" as "je" after АОУЫЕЯЁЮИЕаоуыэяёюиеъь - OK. Add ALL capitals Ъ, Ь to the list.
Don't use "ɛ" at all! It's reserved for foreign words where a consonant (бвгдзклмнпрстфх, excluding жшчщЖШЧЩ and "цЦ") stays hard before "е". I don't follow the logic of the code, but override manually. In short, "Э, э" is always "e"; "Е, е" is "e", or "je" after АОУЫЕЯЁЮИЕЪЬаоуыэяёюиеъь.
This should make the code simpler. Please ask if it's confusing. --Anatoli (обсудить/вклад) 22:34, 14 March 2013 (UTC)Reply
Well, let me reply. Yes, maybe my idea with flags was not good. The simplifications you described can be accepted, except that it's better to handle -ого/-его by default: most words ending in them are genitives, so there should just be a switch-off for cases when they are definitely not, e.g. for {{ru-verb}}. "Что" should be listed as an exception since it appears very often; other words with ч=ш may be transliterated manually. Exceptions with е=э are frequent taken together, but each individual word is not, so they will commonly need manual input. Maybe we should use another way to denote special letter readings for translit and inflection, like marking them in place once in the template (see my talk page for a suggestion). And, OK, I now don't like that the module has different functions for single words and phrases; we should rename phr to tr, and use the current tr internally, if it is needed at all. Ignatus (talk) 14:13, 15 March 2013 (UTC)
I would prefer it if exceptions are not added to the module at all, but are just supplied with the tr= parameter. So the module is only used to provide a default. —CodeCat 14:22, 15 March 2013 (UTC)Reply
  • The module was rewritten. It transliterates any phrase with the function tr in the simplest manner, except that words starting with что and ending in ого and его are always treated specially; if genitives are obviously not expected, as for {{ru-adv}}, the parameter nogen= with any value can be sent to #invoke. Please restore the doc subpage and document this; I'm now going to make the changes to the affected templates. Ignatus (talk) 13:02, 16 March 2013 (UTC)

There are a few problems

The most obvious one first:

Thank you for your efforts.
I suggest we should remove lines 17 to 23 altogether:
    --handle genitive endings, which are spelled -ego but transliterated -evo
    if not frame.args['nogen'] then
        word = mw.ustring.gsub(word, "([ое][́̀]?)го([́̀]?)$","%1vo%2")
        word = mw.ustring.gsub(word, "([ое][́̀]?)го([́̀]?%A)","%1vo%2")
    end
    --Handle common exception words with ч
    word = mw.ustring.gsub(word, "[А-ЯЁа-яё][А-ЯЁа-яё́̀]*",function (w) return w:gsub("^Что","Što"):gsub("^что","što") end)
Let's have a manual override for these kinds of exceptions. The adjective declension tables will have a note on pronunciation of "-ого/-его". As for "что", "чтобы", "что-то", "ничто", "конечно". It's easier to add manual override than rely on the list of exceptions. --Anatoli (обсудить/вклад) 10:17, 17 March 2013 (UTC)Reply
I agree. Also, are there words in Russian where the stem ends in -g- and they receive -o as an inflectional ending? Like neuter adjectives? —CodeCat 14:39, 17 March 2013 (UTC)Reply
The stem ending in "г" has to be preceded by "о" or "е" for this test. That would be "строго" (both an adverb and an adjective form).
More arguments in favour of removing "что"'s special treatment:
The word "что", pronounced "što" is not always at the beginning of the word, e.g. "кое-что", "ничто". The string "-что-" is pronounced by the rules ("čto") in words like "ничтожный", "ничтожество".
There are other words where "ч" is not pronounced as "š" or there are variant pronunciations. --Anatoli (обсудить/вклад) 22:34, 17 March 2013 (UTC)Reply
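To make the objection concrete, here is a rough Python rendering of the quoted что line (Python rather than Lua, since mw.ustring is not available outside Scribunto). It is only a sketch: the name respell_initial_chto is made up, and the word pattern is copied from the quoted code.

```python
import re

# A Cyrillic word: one letter, then letters or stress marks (U+0301, U+0300).
WORD = re.compile("[А-ЯЁа-яё][А-ЯЁа-яё\u0301\u0300]*")


def respell_initial_chto(text: str) -> str:
    """Rewrite word-initial Что/что as Što/što, as the quoted line does."""
    def fix(match):
        word = match.group(0)
        word = re.sub("^Что", "Što", word)
        return re.sub("^что", "što", word)
    return WORD.sub(fix, text)
```

Because the test is a prefix match, чтобы is caught but ничто and конечно are not, so a manual override is still needed for those.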

How can this be used from another Lua module?

It currently requires a frame, which means it can't be used from Lua. Can this be fixed please? —CodeCat 02:03, 11 April 2013 (UTC)Reply

I don't think I can help but Module:ko-translit (by Ruakh) is written differently and uses calls to another module - Module:ko-hangul. --Anatoli (обсудить/вклад) 03:36, 11 April 2013 (UTC)Reply
I see. If I were to change this module to work like that one, then all the current uses of the "tr" function from within templates will break. That isn't a bad thing necessarily as long as someone is on standby to fix them all. Would you be so kind? —CodeCat 12:36, 11 April 2013 (UTC)Reply
Why are you changing? To convert Russian verb templates to Lua? I can change while there are not so many calls from templates at the moment, judging by [1]. Will we still be able to call the module from templates? --Anatoli (обсудить/вклад) 12:44, 11 April 2013 (UTC)Reply
Yes, it should still be callable from templates. I could also decide not to make this module work the same as Module:ko-translit but then that would be a bit inconsistent. I will ask Ruakh what he thinks. —CodeCat 12:56, 11 April 2013 (UTC)Reply
Yeah, it's annoying that it's so difficult, in the general case, to make a function that works smoothly from both Lua and templates. My general preference is to make functions that work smoothly from Lua, and then if needed to make a wrapper that can be called from templates, since the opposite approach is so obviously bad. But in this specific case, it's easy to make this function work both ways, so I've gone ahead and done so. —RuakhTALK 14:24, 12 April 2013 (UTC)Reply

Module:ru-translit's signature and a call match a bunch of other transliteration modules. You could probably change the code but leave the signature as is? The difference with Module:ko-translit is that it's invoked {{#invoke:ko-translit|main|tr|한국어}} (also with optional params) --Anatoli (обсудить/вклад) 13:10, 11 April 2013 (UTC)Reply

I have changed both the code and the signature (this entire discussion is about changing the signature to make it callable from Lua, so your suggestion that "You could probably change the code but leave the signature as is?" doesn't really make sense), but in a compatible way that shouldn't break existing uses. —RuakhTALK 14:24, 12 April 2013 (UTC)Reply
@Ruakh. Thank you. My comment about signature was to CodeCat. By signature I meant the module name, the function name, number and type of parameters - in other words what you just did. --Anatoli (обсудить/вклад) 14:41, 12 April 2013 (UTC)Reply

Transliterate ё as jó instead of jo?

When there is an accent mark on some forms of a word, it is a bit strange when there is none on this one. So, блую́ appears as blujú but блуёт appears as blujot without any accent. That seems a bit inconsistent. —CodeCat 16:36, 12 April 2013 (UTC)Reply

You probably mean блюю́ and блюёт? There's a comment on WT:RU TR: The vowel “ё” is normally stressed in native Russian words, but occasionally it may be necessary to show the stress for this letter: “ё́”. A few exceptions are when multipart words with ё have stresses on other syllables (трёхме́стный - three-seater (adj)) and some rare loanwords. It looks a bit ugly with a stress and Russians never put accent on it. No template stresses it here either. The dots serve as a pronunciation indicator, since most of the time "ё" is written as "е" causing confusion. --Anatoli (обсудить/вклад) 16:46, 12 April 2013 (UTC)Reply

I'm not sure how to solve that, then... —CodeCat 16:48, 12 April 2013 (UTC)Reply
Let's change jo to jó in the module. --Anatoli (обсудить/вклад) 23:31, 12 April 2013 (UTC)Reply
But if it's like you said, then that might cause a word to have two accents in it, if it contains two ё's. —CodeCat 23:33, 12 April 2013 (UTC)Reply
It's OK to transliterate трёхме́стный as trjóxméstnyj and четырёхугольник as četyrjóxugólʹnik with a second stress, at least for the module. --Anatoli (обсудить/вклад) 23:37, 12 April 2013 (UTC)Reply

@CodeCat. I like your idea of transliterating ё selectively, as you suggested on WT:RU TR. So, monosyllabic пёс/чёрт would become pjos/čort (no accent); polysyllabic пёстрый/жёлтый would become pjóstryj/žóltyj; and in polysyllabic words with ё and an acute accent on another syllable, only the syllable with the accent would be marked: чёрно-белый - čorno-bélyj? This might take a bit of coding, but it would be great if you could do it, please. --Anatoli (обсудить/вклад) 05:40, 16 April 2013 (UTC)

I have realised that as well... the module would have to split the text into words first, and then put it back together again later. I have looked into a way to make it work, but I'm not really sure how to write the code. Something like чёрно-белый would come out as čórno-belyj, but what should чёрно-бе́лый become? čórno-bélyj or čorno-bélyj? In other words, does the - separate words that have individual stress, or not? And if so, is that for all words or only some? I'm beginning to think that this may not be as easy as it seemed at first. —CodeCat 12:39, 16 April 2013 (UTC)Reply
It would be easier if it was designed for single words, wouldn't it? :) Let's treat words with "-" as solid words with one accent, so "чёрно-бе́лый" (black and white) should become "čorno-bélyj", but I'll use better examples without "-":
"трёхме́стный" (three-seated), "четырёхуго́льный" (quadrangular) ideally should become "trjoxméstnyj", "četyrjoxugólʹnyj"
четырёхколёсный (two "ё"), no stress at all (četyrjoxkoljosnyj) or two stresses (četyrjóxkoljósnyj), whatever is easier.
Please let me know if you have questions or suggestions. These situations are rare, so it's not critical. Even "trjóxméstnyj", "četyrjóxugólʹnyj" do not look terrible, one might consider the words as having two accents, they are compound words, anyway. --Anatoli (обсудить/вклад) 13:38, 16 April 2013 (UTC)Reply
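The selective rule proposed in this thread can be sketched in a few lines of Python (not the module's Lua). Everything here is an assumption made for illustration: the name mark_yo_stress, the choice to work on a single Cyrillic word before transliteration, and the choice to accent every ё when a word has two of them and no other stress mark (the "two stresses" option mentioned above).

```python
import re

ACUTE = "\u0301"
VOWELS = "аеёиоуыэюяАЕЁИОУЫЭЮЯ"


def mark_yo_stress(word: str) -> str:
    """Add an acute after ё in polysyllabic words with no explicit stress."""
    syllables = sum(ch in VOWELS for ch in word)
    # Monosyllables stay bare; an explicit acute elsewhere wins over ё.
    if syllables < 2 or ACUTE in word:
        return word
    return re.sub("([ёЁ])", r"\1" + ACUTE, word)
```

A hyphenated word passed in as one string is treated as a solid word, so a stress mark on either half suppresses the accent on ё.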

Problems continued

Words with hyphen and passing head argument have problems --user:Dixtosa 18:43, 31 May 2013 (UTC)Reply

This should be fixed by adding {{delink}} to all templates. But I wonder why this can't be done at the level of the module. DTLHS (talk) 18:52, 31 May 2013 (UTC)Reply

Capital "Е" - Е́сли, Если

Capital "Е" without a stress mark is not transliterated properly: [MODULE CALL REDACTED] currently gives "Jésli, Esli", it should be "Jésli, Jesli". For some reason in делать из мухи слона the stressed "Е́сли" is "Ésli". --Anatoli (обсудить/вклад) 00:02, 17 July 2013 (UTC)Reply

Thanks for fixing, Z! --Anatoli (обсудить/вклад) 22:49, 25 July 2013 (UTC)Reply

Ѣ

This edit.

   word = mw.ustring.gsub(word, "^Ѣ","Jě")
   word = mw.ustring.gsub(word, "^ѣ","jě")
   word = mw.ustring.gsub(word, "([^Ѐ-ӿ])Ѣ","%1Jě")
   word = mw.ustring.gsub(word, "([^Ѐ-ӿ])ѣ","%1jě")

What is going on here? Michael Z. 2013-10-21 15:36 z

I don't understand? What are you asking? See ѣсть for an example of an entry that is affected by the change. —CodeCat 15:41, 21 October 2013 (UTC)Reply
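For reference, the four quoted lines can be read as "iotate ѣ at the start of the text or after any non-Cyrillic character". A Python approximation follows (Python because mw.ustring is Scribunto-only). The function name is hypothetical, and the sketch covers only these four substitutions, not whatever the module does with ѣ elsewhere.

```python
import re

# Ѐ-ӿ in the quoted patterns is the Unicode Cyrillic block, U+0400 to U+04FF.
NON_CYRILLIC = "[^\u0400-\u04FF]"
E_CARON = "\u011b"  # Latin small letter e with caron


def iotate_initial_yat(text: str) -> str:
    """Write ѣ as (J/j) + e-caron at the start or after a non-Cyrillic char."""
    text = re.sub("^Ѣ", "J" + E_CARON, text)
    text = re.sub("^ѣ", "j" + E_CARON, text)
    text = re.sub(f"({NON_CYRILLIC})Ѣ", r"\1J" + E_CARON, text)
    text = re.sub(f"({NON_CYRILLIC})ѣ", r"\1j" + E_CARON, text)
    return text
```

A ѣ in the middle of a word, as in хлѣбъ, is left alone by these lines.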

Discussion leading up to making -го as -vo and что as što be the default

(moved from Template talk:ru-ux)

@Atitarev, Cinemantique, Wikitiki89 I created this template along with {{ru-xlit}} to make it easier to create long usage examples in Russian without having to specify manual transliteration to handle adjectival -го, что, and other such things. It is like {{ux|ru}} but supports three extra parameters: (signature at top in case ping isn't sent: Benwing2 (talk) 22:47, 6 January 2016 (UTC))Reply

  1. |adj=: Transliterate -го as -vo
  2. |shto=: Transliterate что as što
  3. |sub=: Apply arbitrary Lua pattern substitutions to the Cyrillic text, esp. to handle cases where е should be transliterated as ɛ.
  • I'm thinking maybe adj= and shto= should be made the default, so if you don't want them you need to turn them off with adj=n or shto=n. What do you think?
  • Also, I'm thinking of adding support for this to templates like ru-phrase and ru-adj. Sound good?

Benwing2 (talk) 22:47, 6 January 2016 (UTC)Reply

    • I don't mind. Please be aware that чтобы, кое-что, что-нибудь, что-то, что-либо, ничто, etc. also should use "š". These words are derivations but are pronounced regularly - нечто, ничтожество, ничтожный, etc., should use "č". Unrelated words with "-что-" - уничтожать, почтовый, etc. should use "č". It's not so straightforward but probably feasible for you.
    • I would like Russian references to use "russkovo" instead of "russkogo" in Category:Russian reference templates to make it consistent but Vahagn will probably oppose. BTW, "russkogo" is used more often in referenced books but "russkovo" is also present, also in book titles. --Anatoli T. (обсудить/вклад) 23:22, 6 January 2016 (UTC)Reply
Currently the code for что substitution checks to see if there is a word boundary at both ends, so it will also apply to кое-что, что-нибудь, что-то, что-либо, etc. but not to чтобы or ничто, which I can special-case. Any other such words? Benwing2 (talk) 23:25, 6 January 2016 (UTC)Reply
I would need a list of Russian words containing "что". There won't be too many. enwiki words should be sufficient. --Anatoli T. (обсудить/вклад) 23:32, 6 January 2016 (UTC)Reply
OK, your list confirms what I said, no other words, words with final -его/-ого will also need a list of exceptions ("g", not "v") - много, немного, лого, лего, сого, ого, possibly some loanwords but сегодня and its derivations also use "v".--Anatoli T. (обсудить/вклад) 23:56, 6 January 2016 (UTC)Reply
Here it is (this includes expressions with что):

Benwing2 (talk) 23:41, 6 January 2016 (UTC)Reply

@Wikitiki89 Did you see the above discussion? I am thinking of making adj=y and shto=y the default for ru-ux and maybe other things like ru-phrase after accounting for words like много, so you'd have to turn them off with noadj=y or noshto=y. The purpose is to avoid having to have lots of manual transliterations. Benwing2 (talk) 00:33, 12 January 2016 (UTC)Reply

Possibility of making special-casing for -го and что the default in transliteration

@Atitarev, Cinemantique, Wikitiki89 I've now implemented special-casing for -го and что in {{ru-ux}} and made it the default. The special-casing for -го (in genitives) is carefully written: It applies specifically to -ого/-его/-аго at the end of a word or followed by -ся, and it also catches сегодня and words beginning with сегодняшн-, and per Anatoli it has exceptions to ensure that много, немного, лого, лего, сого, ого don't get modified. The special-casing for что is also careful to apply only to что, чтобы, чтоб, ничто as whole words. (Note that "end of word" allows for a following hyphen, so cases like кого-либо, что-нибудь will be handled correctly.) Benwing2 (talk) 20:21, 15 January 2016 (UTC)
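The rules described above can be sketched as follows, in Python rather than the module's Lua. This is an illustration under stated assumptions, not the actual implementation: all names are invented, the text is respelled in Cyrillic (г to в, ч to ш) before any transliteration, and the exception list is only the one given in this post.

```python
import re

ACCENTS = "\u0301\u0300"
# Words are runs of Cyrillic letters and stress marks, so a hyphen ends a word.
WORD = re.compile(f"[а-яёА-ЯЁ{ACCENTS}]+")
GO_EXCEPTIONS = {"много", "немного", "лого", "лего", "сого", "ого"}
SHTO_WORDS = {"что", "чтобы", "чтоб", "ничто"}
# -ого/-его/-аго at the end of a word, optionally followed by -ся.
GENITIVE = re.compile(f"([оеа][{ACCENTS}]?)г(о[{ACCENTS}]?(?:ся)?)$")
# сегодня and anything built on it, such as сегодняшний.
SEGODNYA = re.compile(f"^(се)г(о[{ACCENTS}]?дня)", re.IGNORECASE)


def strip_accents(word: str) -> str:
    return word.replace("\u0301", "").replace("\u0300", "")


def respell_word(word: str) -> str:
    bare = strip_accents(word).lower()
    if bare in SHTO_WORDS:
        # Whole-word match only, so ничтожный is untouched.
        return word.replace("ч", "ш", 1).replace("Ч", "Ш", 1)
    if bare in GO_EXCEPTIONS:
        return word
    word = SEGODNYA.sub(r"\1в\2", word)
    return GENITIVE.sub(r"\1в\2", word)


def respell(text: str) -> str:
    """Respell every Cyrillic word in a phrase."""
    return WORD.sub(lambda m: respell_word(m.group(0)), text)
```

Keeping the exceptions in a set makes later additions a one-line change.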

What do people think about making this the default for all transliteration? This would solve a lot of issues that come up currently in various places, e.g. in бомж, the expansion is rendered лицо́ без определённого ме́ста жи́тельства (licó bjez opredeljónnovo mjésta žítelʹstva) with -ogo instead of -ovo. It could still be overridden using tr=, if necessary. The special-casing shouldn't slow things down due to the way it's written. Benwing2 (talk) 20:21, 15 January 2016 (UTC)Reply

Того

(moved from Talk:Того)

@Benwing2 I missed this one. Does it need a manual transliteration "Tógo" to distinguish from того́ (tovó), which can also be capitalised at the beginning of a sentence? --Anatoli T. (обсудить/вклад) 20:40, 18 January 2016 (UTC)Reply

The stress is consistently different, though, can this be used in your logic for the automatic xlit? --Anatoli T. (обсудить/вклад) 20:42, 18 January 2016 (UTC)Reply
I added stressed То́го to the exceptions. Conceivably I could add unstressed Того there as well, but that would fail if того́ ever occurs at the beginning of a sentence and written without an accent (and того́ is much more common than То́го). Can того ever occur sentence-initially? Benwing2 (talk) 22:30, 18 January 2016 (UTC)Reply
Yes, it can. I think for ambiguous cases like sentence-initial "Того" without a stress mark (unknown sense), we should use "g" in translit and [ɡ] in IPA. Adding a stress mark would fix it. (I may have missed other loanwords with final "-ого" or "-его" where it should be "g" but I can't think of others at the moment). --Anatoli T. (обсудить/вклад) 22:44, 18 January 2016 (UTC)Reply
OK, I'll implement that. Benwing2 (talk) 22:46, 18 January 2016 (UTC)Reply
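The outcome of this thread amounts to a small decision table, sketched here in Python. The function name togo_consonant is hypothetical and the sketch only covers the spellings discussed above.

```python
ACUTE = "\u0301"


def togo_consonant(word: str) -> str:
    """Return "g" or "v" for the г in a form spelled того/Того."""
    if word == "То" + ACUTE + "го":
        return "g"  # stressed on the first syllable: the country name
    if word == "Того":
        return "g"  # capitalised with no stress mark: ambiguous, default to g
    return "v"  # the genitive pronoun, stressed on the ending or lowercase
```

Adding a stress mark to a sentence-initial pronoun (stress on the ending) therefore restores the v reading.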

short forms of adjectives in -го

(moved from Talk:дорого)

@Atitarev This should be another exception to the /v/ pronunciation right? Benwing2 (talk) 18:27, 10 April 2016 (UTC)Reply

@Benwing2 Yes, please! --Anatoli T. (обсудить/вклад) 20:25, 10 April 2016 (UTC)Reply
@Benwing2 Please also add недо́рого (nedórogo). --Anatoli T. (обсудить/вклад) 20:53, 10 April 2016 (UTC)Reply
@Benwing2 There are more - (не)стро́го, убо́го, поло́го, short neuter adjectives длинноно́го, коротконо́го, кривоно́го. --Anatoli T. (обсудить/вклад) 21:16, 10 April 2016 (UTC)Reply
OK thanks. Benwing2 (talk) 21:35, 10 April 2016 (UTC)Reply
@Atitarev Done. The following should all be handled correctly:
Benwing2 (talk) 02:07, 11 April 2016 (UTC)Reply
@Benwing2 Thank you. I am sorry I missed some terms earlier. I wonder if the search for string "ого" in the final position can be done in ruwikt, so that we could find more (potential) examples? All words with "-legged" suffix (like "длинноногий" - "long-legged") are affected. Need to check all -огий adjectives, if they have short forms, then they will need [ɡ] in pronunciation. --Anatoli T. (обсудить/вклад) 02:39, 11 April 2016 (UTC)Reply
@Cinemantique I'm not sure how to search ruwikt but maybe Cinemantique can help. Benwing2 (talk) 02:50, 11 April 2016 (UTC)Reply

I have tried this but this gives too many results. --Anatoli T. (обсудить/вклад) 03:04, 11 April 2016 (UTC)Reply

@Atitarev Here's the list of pages that are words (mostly adjectives) in -огий. Haven't checked which ones have short forms. Benwing2 (talk) 06:37, 11 April 2016 (UTC)Reply
BTW in -огой are only the following:
None in -егий or -егой. Benwing2 (talk) 06:39, 11 April 2016 (UTC)Reply
@Benwing2 Yes, all of these need the same treatment, if they have short forms. In the -егий-group there's an adjective пе́гий (pégij, piebald, skewbald (esp. of horses)), which can have short forms. --Anatoli T. (обсудить/вклад) 07:51, 11 April 2016 (UTC)Reply
@Benwing2 The short neuter in пегий shows "pévo", it should be "pégo". Pls add it to the exceptions. --Anatoli T. (обсудить/вклад) 02:37, 13 April 2016 (UTC)Reply
@Atitarev Will do. Benwing2 (talk) 03:02, 13 April 2016 (UTC)Reply

итого́

@Atitarev should this be another exception? Benwing2 (talk) 03:25, 29 April 2016 (UTC)Reply

No, it's pronounced "итово́", from и + того́. --Anatoli T. (обсудить/вклад) 03:55, 29 April 2016 (UTC)Reply

apostrophes are occasionally used for ъ

@Benwing2: Hi. This is not a current standard but an alternative spelling, which was quite common at various periods. ' can replace ъ in all non-final positions and it should produce the same results, especially with a following Cyrillic "е". So, подъе́зд (podʺjézd) can be spelled as под’е́зд (pod’ézd) (currently pod'ézd, but it should be transliterated as podʺjézd or pod'jézd; not sure about the actual apostrophe - ʺ or '?) and it is pronounced the same way. This also applies to other iotated vowels, but there's no problem with the transliteration of the other letters at the moment. I hope it's not hard to implement. Please let me know what you think. --Anatoli T. (обсудить/вклад) 03:07, 11 April 2018 (UTC)

@Atitarev I think the module can be fixed to insert a j as you request between apostrophe and е. Should the apostrophe map to an apostrophe or to ʺ? I think maybe the latter as it is standing in for a hard sign. If so I can just make a change to replace apostrophes with hard signs before transliterating, maybe only in certain circumstances (e.g. after a consonant and before a palatal vowel). Benwing2 (talk) 03:25, 11 April 2018 (UTC)Reply
@Benwing2: Yes, replace apostrophes with hard signs, ..., only in certain circumstances... - sounds good to me! --Anatoli T. (обсудить/вклад) 03:34, 11 April 2018 (UTC)Reply
@Benwing2: Certain circumstances should only be non-final positions. Please do the same treatment with all other combinations, i.e. ' = ъ. --Anatoli T. (обсудить/вклад) 04:00, 11 April 2018 (UTC)Reply
And non-final and non-initial positions. --WikiTiki89 15:42, 11 April 2018 (UTC)Reply
@Atitarev, Cinemantique, Wikitiki89, Wanjuscha, Per utramque cavernam Done. Benwing2 (talk) 19:33, 15 April 2018 (UTC)Reply
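The change agreed on here, replacing an apostrophe with ъ when it sits between a consonant and a vowel, might look roughly like this in Python. It is a sketch only: the function name and the letter lists are assumptions, and the real change was made in the module's Lua.

```python
import re

CONSONANTS = "бвгджзйклмнпрстфхцчшщБВГДЖЗЙКЛМНПРСТФХЦЧШЩ"
VOWELS = "аеёиоуыэюяАЕЁИОУЫЭЮЯ"
# Straight or typographic apostrophe, after a consonant and before a vowel;
# the two lookarounds rule out word-initial and word-final positions.
APOSTROPHE_AS_HARD_SIGN = re.compile(
    f"(?<=[{CONSONANTS}])['\u2019](?=[{VOWELS}])"
)


def apostrophes_to_hard_signs(text: str) -> str:
    """Replace qualifying apostrophes with ъ before transliteration."""
    return APOSTROPHE_AS_HARD_SIGN.sub("ъ", text)
```

With this rule a name like Кот-д’Ор is also rewritten, since its apostrophe sits between a consonant and a vowel too.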

translation of terms starting with -е

(moved from Talk:-езжать)

@Benwing2: Hi. I think terms starting with a hyphen and letter е (je) should transliterate as "-je" by default, without having to manually transliterate them. --Anatoli T. (обсудить/вклад) 09:50, 28 December 2019 (UTC)Reply

@Atitarev This can be done. But are we sure? The reason the module transliterates them as -e is so that common suffixes like -ев (-ev), -ение (-enije), -еть (-etʹ), -ец (-ec), etc. get transliterated with -e instead of -je, because the suffixes normally follow consonants, and would normally be rendered with -e in running text. -езжать is an exception in this regard as it normally follows prefixes ending in a vowel. Benwing2 (talk) 15:58, 28 December 2019 (UTC)Reply
@Benwing2: Good point about suffixes. The correct translit for them would probably be "-(j)e...". Thanks, let's leave it as is. --Anatoli T. (обсудить/вклад) 02:43, 29 December 2019 (UTC)Reply

щ

Why isn't <щ> transliterated as <śś>? AleksiB 1945 (talk) 17:23, 27 February 2022 (UTC)

Transliterations of pre-1918 letters

I don't have strong opinions on this, but should we use the ISO 9 standard as a baseline for transliterations of pre-1918 letters? We already do this for ѣ (), with an initial "j" where appropriate, but not for the others:

  • і (i) would be і (ì)
  • ѳ (f) would be ѳ ()
  • ѵ (i) would be ѵ ()

If we don't want to use the grave accent (due to stress confusion), we could use the older versions:

Pinging @Atitarev and @Benwing2. Theknightwho (talk) 16:37, 7 May 2023 (UTC)Reply

@Theknightwho, @Benwing2: We are already deviating from the ISO 9 standard, since it makes sense to have a more phonetic transliteration, you can see all the notes in WT:RU TR. Currently:
  1. і (i) or ѵ (i) = и (i) = "i". All archaic uses of "і" and "ѵ" are replaced with "и".
  2. ѳ (f) = ф (f) = "f". All archaic uses of "ѳ" are replaced with "ф".
Note that ъ in the final positions after consonants per pre-1918 orthography is transliterated as nothing, making миръ (mir) and міръ (mir) transliterated as the modern мир (mir).
My preference is to leave it as is. Anatoli T. (обсудить/вклад) 00:48, 8 May 2023 (UTC)
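The current behaviour described above (і and ѵ read as и, ѳ read as ф, final ъ after a consonant dropped) can be pictured as a normalisation to modern spelling. Here is a minimal Python sketch; the name modernise is hypothetical and the module itself may well map these letters directly instead.

```python
import re

ARCHAIC_TO_MODERN = str.maketrans({
    "\u0456": "и", "\u0406": "И",  # і, І
    "\u0475": "и", "\u0474": "И",  # ѵ, Ѵ
    "\u0473": "ф", "\u0472": "Ф",  # ѳ, Ѳ
})
CONSONANTS = "бвгджзйклмнпрстфхцчшщБВГДЖЗЙКЛМНПРСТФХЦЧШЩ"
# ъ after a consonant with no Cyrillic letter following, i.e. word-final.
FINAL_HARD_SIGN = re.compile(f"(?<=[{CONSONANTS}])[ъЪ](?![\u0400-\u04FF])")


def modernise(text: str) -> str:
    """Map pre-1918 letters to modern ones and drop word-final ъ."""
    return FINAL_HARD_SIGN.sub("", text.translate(ARCHAIC_TO_MODERN))
```

Both миръ and міръ come out as мир, matching the note above, while the ъ inside подъезд is kept.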

Iotated vowels after й

I've noticed official Russian sources will commonly transliterate umlauted vowels in foreign names with iotated vowels, even when this means they come straight after й. This leads to weird spellings like Йё́нчёпинг (Jjónčoping, Jönköping), Йю́тербог (Jjúterbog, Jüterbog), Йю́вяскюля (Jjúvjaskjulja, Jyväskylä), Йя́мся (Jjámsja, Jämsä), Ва́тнайёкюдль (Vátnajjokjudlʹ, Vatnajökull), which seems to be a pretty recent phenomenon since many of these can easily be attested without the й (e.g. Ювяскюля, Ямся etc.).

Leaving aside е, the only other situation I can think of where this happens is when it immediately follows a stress (e.g. аллилу́йя (allilújja, allelujah), пайде́йя (pajdéjja), ва́йю (vájju)), which means the [j] is geminated, so it makes sense to use "jj". That doesn't apply to any of these examples. Should we change the transliteration so as to only give one "j" if the previous syllable isn't stressed? i.e. Йё́нчёпинг (Jónčoping), Йю́тербог (Júterbog) etc.

With е, the situation is the other way around, since we always merge "jj" into "j": e.g. аллилу́йе (allilúje), even though this is still pronounced with a geminated [j]. Should we apply the same rule here (which would give allilújje)? This wouldn't affect words where gemination is optional/unpredictable, like фойе́ (fojé). Pinging @Atitarev @Benwing2. Theknightwho (talk) 22:44, 15 January 2024 (UTC)Reply

@Theknightwho, @Benwing2: Yes, the easiest approach/solution is to allow a geminated "jj" in these cases: allilújje, fojjé, etc. Anatoli T. (обсудить/вклад) 03:13, 16 January 2024 (UTC)Reply
@Atitarev It would only be allilújje, since I'm proposing this only if it immediately follows a stress (i.e. when it's predictably geminated). фойе́ (fojé) has optional gemination since it doesn't follow this pattern, meaning it wouldn't be caught by this. Theknightwho (talk) 03:17, 16 January 2024 (UTC)Reply
@Theknightwho: It also makes sense to use "fojjé" to reflect the spelling. хокке́й (xokkéj) and Хокка́йдо (Xokkájdo) also have ungeminated [k] but they are spelled with two к's. Same with [l] in алле́я (alléja), etc. Anatoli T. (обсудить/вклад) 04:58, 16 January 2024 (UTC)Reply
@Theknightwho, @Benwing2: Our translit system for Russian is only partially phonetic and only when readings are irregular. (Non)-geminations and vowel reductions are excluded. Anatoli T. (обсудить/вклад) 05:01, 16 January 2024 (UTC)Reply
@Atitarev That's true, but I was thinking of examples like жю (žu), шю (šu) where we already modify this. After all, the reason I raised this was my concern with things like Йё́нчёпинг (Jjónčoping), with the initial "jj" that isn't reflected in the pronunciation (since it's about consistent orthographic transcription from - in this case - Swedish), as distinct from situations where it has a genuine impact on the pronunciation. Theknightwho (talk) 05:09, 16 January 2024 (UTC)Reply
@Theknightwho: жю and шю are tricky but maybe we have to review this exception for consistency? The spelling is "still trying" to make Russians pronounce [ʑʉˈrʲi] (soft), not [ʐʊˈrʲi] for жюри́ (žurí). Жюль (Žulʹ, Jules (French name)) can also go both ways. However, парашю́т (parašút) has only one standard and common pronunciation [pərɐˈʂut].
Йё́нчёпинг (Jjónčoping) can still be spelled Ё́нчёпинг (Jónčoping). Anatoli T. (обсудить/вклад) 05:45, 16 January 2024 (UTC)Reply
@Atitarev That's true. I suppose we could change it to be "jj" in all cases, though it'll need some bugfixes to Module:ru-common first, because it handles manual transliterations in a special way that always reduces "jje" to "je" in inflection templates, even where it shouldn't: e.g. foreign names that contain ййе. I guess that would be "jjje" after the change.
Theknightwho (talk) 06:55, 16 January 2024 (UTC)Reply

Apostrophes

Even though they're rare, apostrophes are a bit overloaded in Russian. They've got three uses:

  1. As a stand-in for the hard sign, e.g. с’езд (s’ezd) (= съезд (sʺjezd)).
  2. Transliterated foreign names, e.g. Кот-д’Ор (Kot-d’Or, Côte-d'Or).
  3. As a separator for Russian suffixes attached to terms in the Latin script, e.g. c-moll’ный (c-moll’nyj, in C minor).

Until now, we'd been treating any apostrophe that came after a consonant and before a vowel as a hard sign. I've just changed that slightly so that it has to be followed by a lowercase vowel, which avoids issues for many French and Italian borrowings: Кот-д’Ор (Kot-d’Or), Жанна д'Арк (Žanna d’Ark, Joan of Arc), Романо-д’Эццелино (Romano-d’Ɛccelino, Romano d'Ezzelino) etc. I also intend to add a check to make sure the preceding consonant is Cyrillic, since that should catch all terms of type 3. I guess it's still possible to get a false positive of type 2 (possibly in a rare Palladius term), but I think this should be the best balance.

I've also modified the logic for plain Es, so that they ignore the preceding apostrophe if it's actually an apostrophe (i.e. not a hard sign). You can see examples of both kinds above: с’езд (s’ezd) (iotated E, since it's really a hard sign), Романо-д’Эццелино (Romano-d’Ɛccelino) (plain E, since it's just an apostrophe). However, **О'Эган (O’Egan) isn't affected, since there's a vowel before the apostrophe. Theknightwho (talk) 13:04, 17 January 2024 (UTC)Reply

Do let me know if you think any of this should be changed at all. @Benwing2, Atitarev, Cinemantique, Wikitiki89, Wanjuscha, Per utramque cavernam Theknightwho (talk) 12:52, 17 January 2024 (UTC)Reply
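The narrowed heuristic described in this post can be sketched in Python as below. The names are invented, and the Cyrillic-consonant check is included even though the post only says it is intended, so treat this as a picture of the proposal rather than of the module's code.

```python
import re

CONSONANTS = "бвгджзйклмнпрстфхцчшщБВГДЖЗЙКЛМНПРСТФХЦЧШЩ"
LOWERCASE_VOWELS = "аеёиоуыэюя"
# Apostrophe counts as ъ only after a Cyrillic consonant and before a
# lowercase vowel.
HARD_SIGN_APOSTROPHE = re.compile(
    f"(?<=[{CONSONANTS}])['\u2019](?=[{LOWERCASE_VOWELS}])"
)


def apostrophes_to_hard_signs(text: str) -> str:
    return HARD_SIGN_APOSTROPHE.sub("ъ", text)
```

The uppercase test spares Кот-д’Ор, but a lowercase derivative such as кот-д’ивуарский is still rewritten.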

@Theknightwho:
Please use the "ʺ" symbol only if the apostrophe is a stand-in for the hard sign, which you seem to have worked out. We can probably ignore all cap spellings like С’ЕЗД (S’JeZD) = СЪЕЗД (SʺJeZD). Anatoli T. (обсудить/вклад) 13:02, 17 January 2024 (UTC)Reply
@Atitarev Yeah, I agree - it's also necessary for names like Д’Ибервиль (D’Ibervilʹ). I've just come across ко̀т-д’ивуа́рский (kòt-d’ivuárskij), which should be kòt-d’ivuárskij - not sure what to do about that. Theknightwho (talk) 16:05, 17 January 2024 (UTC)Reply
@Theknightwho: Just stop the special handling for apostrophes. Withdrawing my old request. Thanks. Anatoli T. (обсудить/вклад) 20:14, 17 January 2024 (UTC)Reply
@Atitarev As in, don't ever treat them like the hard sign? Theknightwho (talk) 20:16, 17 January 2024 (UTC)Reply
@Theknightwho: Yes. Anatoli T. (обсудить/вклад) 20:44, 17 January 2024 (UTC)Reply
@Atitarev Alright. We could handle entries like с’езд (s’ezd) in the same way we do alternative jo entries, which would automate the correct transliteration on the entry itself. Maybe {{ru-alt-ъ}}? Theknightwho (talk) 20:50, 17 January 2024 (UTC)Reply
@Theknightwho, @Benwing2: Yeah, I was going to suggest soft redirects. These spellings are not standard, anyway, even if attested. Anatoli T. (обсудить/вклад) 21:01, 17 January 2024 (UTC)Reply
@Theknightwho I am inclined to agree with Anatoli here; we should simply treat apostrophes in translit as apostrophes and handle the uses as hard signs through some sort of manual marking; soft redirects possibly with your suggestion of {{ru-alt-ъ}} seem like a good idea. Benwing2 (talk) 06:02, 18 January 2024 (UTC)Reply
@Benwing2 Alrighty. I expect the number of pages this applies to is very low, in any case. Theknightwho (talk) 06:16, 18 January 2024 (UTC)Reply
@Theknightwho, @Benwing2: It will (potentially) be equal to the number of post-1918 reform lemmas with ъ. (This replacement was much less common before 1918.) Not interested in making those entries, though. Cases with hyphens and other symbols also exist. Anatoli T. (обсудить/вклад) 06:25, 18 January 2024 (UTC)Reply
@Atitarev @Benwing2 Makes sense. At the moment, с’езд (s’ezd) seems to be the only entry we have (going by Category:Russian terms spelled with ').
To avoid things getting really messy (especially when you consider въёбывать (vʺjóbyvatʹ) has four possibilities), my preferred option would be to remove the "-ё" from the end of the alt-ё head templates, leaving us with {{ru-noun-alt}}, {{ru-proper noun-alt}}, {{ru-verb-alt}}, {{ru-adj-alt}} and {{ru-pos-alt}}, and to handle all of them with a new Module:ru-alt. This would compare the first parameter (the main spelling) with the page title, to determine which of the four alternative spelling categories the lemma should go in. It would then call the main module for that part of speech.
The other advantage of this is that it would stop the alt-ё templates from being as hacky as they are at the moment, since they're done entirely via wikitext (so g2, g3, g4 etc. have to be manually specified, and there's a good chance they could get out of sync if parameters are changed, since they'll probably get forgotten about).
Theknightwho (talk) 03:37, 19 January 2024 (UTC)Reply
@Atitarev @Benwing2 Now that Category:Russian terms spelled with ' has had time to populate, it seems that it's only used as a hard sign in one entry + its inflections. For the moment, I think it's safe to remove the special handling from Module:ru-translit and use manual transliterations on those pages, but in future it would be good to integrate it properly into the Russian infrastructure in the same way we handle alt-ё forms. Theknightwho (talk) 08:45, 25 January 2024 (UTC)Reply
@Theknightwho Sounds good to me. If you feel like implementing your proposed solution, go ahead. Benwing2 (talk) 09:44, 25 January 2024 (UTC)Reply
@Benwing2 Great - will do. Theknightwho (talk) 09:46, 25 January 2024 (UTC)Reply
@Theknightwho: OK, thanks. Anatoli T. (обсудить/вклад) 13:18, 25 January 2024 (UTC)Reply
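
The Module:ru-alt idea proposed in this thread (compare the main spelling with the page title to work out which kind of alternative spelling the page is) could look roughly like the sketch below. It is a standalone Python illustration, not the module's Lua; the function name `classify_alt_spelling` and the labels it returns are hypothetical, and it assumes the two spellings differ only by one-to-one substitutions (ё written as е, ъ written as an apostrophe).

```python
import unicodedata

# Hypothetical sketch: the names and labels below are illustrative only
# and are not taken from any existing Wiktionary module.
APOSTROPHES = {"'", "\u2019"}      # straight and curly apostrophe
STRESS_MARKS = {"\u0301", "\u0300"}  # combining acute and grave accents


def _normalise(text):
    """Compose ё consistently and drop stress marks before comparing."""
    text = unicodedata.normalize("NFC", text)
    return "".join(ch for ch in text if ch not in STRESS_MARKS)


def classify_alt_spelling(main, title):
    """Return the set of substitutions that turn `main` into `title`.

    An empty set means the page title is the main spelling itself.
    """
    main, title = _normalise(main), _normalise(title)
    if len(main) != len(title):
        raise ValueError("spellings differ by more than simple substitutions")
    kinds = set()
    for m, t in zip(main, title):
        if m == t:
            continue
        if m in "ёЁ" and t in "еЕ":
            kinds.add("ё→е")
        elif m in "ъЪ" and t in APOSTROPHES:
            kinds.add("ъ→apostrophe")
        else:
            raise ValueError(f"unexpected difference: {m!r} vs {t!r}")
    return kinds
```

With a main spelling of въёбывать, the four possible page titles (unchanged, ё as е, ъ as an apostrophe, or both) map to the empty set, one of the two single labels, or both labels together, which a head template could then turn into categories.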

curly apostrophe

@Theknightwho, @Benwing2: Hi. diff. I thought curly apostrophes were specifically disallowed, no? I don't remember specific discussions but most French, etc. entries use regular apostrophes. Anatoli T. (обсудить/вклад) 03:57, 18 January 2024 (UTC)Reply

@Atitarev They're disallowed in page titles. I've done what we do for quite a few languages, which is to display curly apostrophes, but they don't go in the page name and it's all handled automatically. It's based entirely on what the usual stylistic choice is for the language, but I don't mind switching it to be the other way around. So long as it's consistent. Theknightwho (talk) 04:05, 18 January 2024 (UTC)Reply
@Theknightwho Yeah I don't agree with normalizing on curly apostrophes like this. I don't know where you've changed it for other languages but I don't agree with those either. Benwing2 (talk) 05:50, 18 January 2024 (UTC)Reply
@Benwing2 I haven't - I was referring to Finnish, French etc., which had nothing to do with me. Theknightwho (talk) 05:52, 18 January 2024 (UTC)Reply
@Theknightwho I see, OK. Benwing2 (talk) 05:59, 18 January 2024 (UTC)Reply
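
For readers unfamiliar with the arrangement described in this thread (curly apostrophes in displayed text, straight apostrophes in page names), here is a minimal Python sketch of the two conversions. Both function names are hypothetical, and the sketch says nothing about which direction should be the default, which is the point under dispute above.

```python
# Hypothetical helpers; they do not correspond to any real module function.
CURLY = "\u2019"   # right single quotation mark
STRAIGHT = "'"     # ASCII apostrophe


def page_name_from_display(display):
    """Page names keep the straight apostrophe."""
    return display.replace(CURLY, STRAIGHT)


def display_from_page_name(page_name):
    """Displayed text may use the curly apostrophe instead."""
    return page_name.replace(STRAIGHT, CURLY)
```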

Middle Russian

Pinging @Atitarev and @Benwing2. I recently changed things so that none of the special transformations (e.g. -его (-evo), что (što) etc.) are applied if a langcode is explicitly provided which isn't "ru"; jo accents are also not applied. The main use of this is for Russian Cyrillisations such as the Palladius transcriptions of Mandarin, but it also affects Russian pidgins, where it's not clear that these transformations actually make sense.

However, this change also affects the handling of Middle Russian (etym-only code zle-mru), since none of the transformations are now applied to it. I'm reasonably sure that not all of them make sense for that period, but we should still have a discussion about it. Theknightwho (talk) 15:19, 1 March 2024 (UTC)Reply
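
To make the behaviour under discussion concrete, here is a minimal Python sketch of the gating: the special transformations run only when no language code is given or the code is "ru". It is not the module's actual Lua. The function name `translit`, the tiny letter table and the two regular expressions are assumptions for illustration, and stress marks, iotation of е and jo handling are left out entirely.

```python
import re

# Deliberately tiny letter table, just enough for the examples in the
# usage note. This is an illustrative assumption, not the real table.
LETTERS = {
    "с": "s", "и": "i", "н": "n", "е": "e", "г": "g",
    "о": "o", "ч": "č", "т": "t", "в": "v",
}


def translit(text, lang=None):
    """Transliterate `text`; special cases apply only to Russian proper."""
    apply_special = lang is None or lang == "ru"
    if apply_special:
        # что is pronounced with [ʂ], so it is romanised as "što".
        text = re.sub(r"\bчто\b", "što", text)
        # Word-final -ого/-его is pronounced with [v].
        text = re.sub(r"([ое])го\b", r"\1vo", text)
    # Anything not in the table, including Latin letters produced by the
    # substitutions above, passes through unchanged.
    return "".join(LETTERS.get(ch, ch) for ch in text)
```

Under these assumptions, `translit("синего")` gives "sinevo" and `translit("что")` gives "što", while passing `lang="zle-mru"` gives "sinego" and "čto", which is the change in Middle Russian handling raised above.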

Middle Russian writes the genitive endings with a -в- and it probably writes the precursor of что as pronounced, too. Thadh (talk) 15:27, 1 March 2024 (UTC)Reply
Thanks @Thadh. @Theknightwho, I don't know enough about Middle Russian to answer this but hopefully @Atitarev knows and can answer. Benwing2 (talk) 06:21, 2 March 2024 (UTC)Reply
@Theknightwho, @Thadh, @Benwing2: Hi,
Middle Russian should transliterate "г" as "g" and "ч" as "č". I am no expert on Middle Russian. Maybe @AshFox (formerly ZomBear)?
BTW, could we have the links to resources somewhere - corpus or dictionary? I don't remember where they are. Anatoli T. (обсудить/вклад) 23:47, 3 March 2024 (UTC)Reply