Links to Thesaurus:vagina#Portuguese from mainspace

It would be ideal if you would undo the changes your bot has done in mainspace, resulting in Portuguese entries no longer pointing to Thesaurus:vagina#Portuguese. Example: diff. If you don't do it, someone else will have to fix it. --Dan Polansky (talk) 07:33, 4 January 2023 (UTC)Reply

@Dan Polansky That's because you removed the Portuguese stuff from the translations page and moved it to Thesaurus:vagina#Portuguese, after I made the bot changes. IMO this is not my problem. I see you only moved Portuguese, not any of the other languages, which doesn't make a lot of sense. Benwing2 (talk) 04:31, 5 January 2023 (UTC)Reply
diff created Thesaurus:vagina#Portuguese, 9 October 2022
diff changed the link via a bot, 19 November 2022
Moving content from /Translations to separate pages is the right thing to do, and can be helped by those who want to help. --Dan Polansky (talk) 07:09, 5 January 2023 (UTC)Reply
@Dan Polansky Completely disingenuous. The Portuguese text was there in Thesaurus:vagina/translations at the time I did the bot edits (and I disagree it's the "right" thing to do to do the sort of thesaurus move you did, esp. when done in a half-assed fashion). You have your own pseudo-bot account (something-Maid, I forget the name), why don't you clean it up yourself? I've spent a LOT of time cleaning up your mistakes and questionable decisions. Benwing2 (talk) 07:19, 5 January 2023 (UTC)Reply
I do not know which mistakes of mine you cleaned up; any examples? Portuguese was in Thesaurus:vagina/translations because someone else has restored it after I had removed it; this restoration was a bad idea and was unjustified. --Dan Polansky (talk) 07:23, 5 January 2023 (UTC)Reply
The move that I did was not "half-assed" (incomplete or badly done); it was complete within the unit of Portuguese. There is no requirement that all languages need to be moved out from /translations in one fell swoop. --Dan Polansky (talk) 07:36, 5 January 2023 (UTC)Reply
I restored it because you had effectively vandalised the entry by removing the vast majority of the content with no justification, and had made no effort to adequately replace the info that you had removed. You then edit-warred over this, and even after realising your own mistake, you have the temerity to indirectly blame me anyway? Theknightwho (talk) 07:53, 6 January 2023 (UTC)Reply
As for Portuguese, I moved the content from one place to another, improving the state of affairs. As for other languages, one can justly charge me with removing content, to which my defense is that the loss was very little and that the lists are often unsubstantiated by mainspace and probably unverifiable inventions, as a cursory look discovers.
Be that as it may, I can restore the links to Thesaurus:vagina#Portuguese from mainspace myself provided the above editors promise not to undo such a correction; I hate to labor in vain and have my efforts thwarted by meddlers with no thesaurus contribution. --Dan Polansky (talk) 13:16, 6 January 2023 (UTC)
Are you intentionally trying to be rude? Because talking about people you are responding to in the third person gives the impression that you are. Theknightwho (talk) 14:09, 6 January 2023 (UTC)Reply
I will trust the rudeness hypothesis after the above programmer traces the hypothesis to an authoritative source. I don't consider it to be rude at all, just taking some distance to the dear honorable gentlemen. --Dan Polansky (talk) 14:17, 6 January 2023 (UTC)Reply

Appendix:Russian Adverbs - Frequency List?

Hiya. Do you happen to have access to Russian adverb lists as well? Not sure if that short discussion helps: User_talk:Benwing2/2012-2019#Russian_adjectives_-_frequency_list?. Anatoli T. (обсудить/вклад) 23:56, 17 January 2023 (UTC)Reply

@Atitarev I generated the adjective list based on a frequency list of words of all parts of speech, so it should be possible to generate an adverb list as well, let me see what I can come up with. Benwing2 (talk) 00:58, 18 January 2023 (UTC)Reply
@Atitarev I generated a list in Appendix:Russian Adverbs - Frequency List based on my largest (32,600-word) frequency list. I have a couple of other frequency lists that might give slightly different results but this should get you started. Benwing2 (talk) 01:54, 18 January 2023 (UTC)Reply
Great, thank you! Anatoli T. (обсудить/вклад) 02:05, 18 January 2023 (UTC)Reply

Italian heads

Thanks for cleaning up those plurals in Italian headwords. Could you also clean up the heads of pages like numero di telefono? They're redundant but presumably missed by your bot because not every word was linked; I see lots of pages like this with just prepositions not linked. Also, something went wrong at capsa. Ultimateria (talk) 20:55, 25 January 2023 (UTC)Reply

@Ultimateria Thanks. I will do a run to handle missing preposition links. What happened with capsa is I thought I generalized the # support so that a # anywhere in the plural stands for the lemma; I just did that. Benwing2 (talk) 22:27, 25 January 2023 (UTC)Reply
@Ultimateria Done. Benwing2 (talk) 08:31, 27 January 2023 (UTC)Reply

Weird bug

This edit (diff) seems to have introduced text inadvertently. Gabbe (talk) 09:21, 4 February 2023 (UTC)Reply

Negativizzare and negativizzarsi

(@Catonif, too) Negativizzare must be quite recent, I can't find it on any of the major Italian dictionaries, i.e. Zingarelli, Devoto Oli, De Mauro, Treccani... On the other hand, they all have negativizzarsi, apparently attested since 1986 (Zingarelli). Technically speaking, negativizzarsi is the "reflexive" of negativizzare, but all evidence points to negativizzarsi existing prior to negativizzare. I feel we should make this clear in those entries, somehow... — Sartma 𒁾𒁉𒊭 𒌑𒊑𒀉𒁲 09:10, 6 February 2023 (UTC)Reply

@Sartma Thanks! I've seen several cases where terms that were created after the 1970's or so are missing in Treccani and Hoepli but are present in Internazionale (=De Mauro?), which seems to have better coverage of such terms. For terms like this I often check context.reverso.net for attestation; this is automatically scraped so it's somewhat spotty in quality but I've found it a good check for recent usage. Benwing2 (talk) 18:15, 6 February 2023 (UTC)Reply

пересекший vs пересёкший

You moved пересёкший to пересекший, with a comment that it was a misspelling. I believe this to be incorrect.

Google Translate pronounces it пересёкший regardless of whether the ё is there or not. Additionally, a native speaker I talked to was only aware of пересёкший in modern speech.

Here is a Russian Stack Exchange post where someone cites some dictionaries stating that both are acceptable, although the ё form is considered the less preferred choice[1] 190.194.221.97 03:04, 9 February 2023 (UTC)

The second-to-last discussion section of Module talk:ru-verb touches on this concerning засе́чь (zaséčʹ), which is a very similar verb. In fact, Zaliznyak's grammar says both meanings of пересе́чь (pereséčʹ) are conjugated like the two meanings of засе́чь (zaséčʹ), and says both meanings have засе́кший. Since apparently засёкший and пересёкший are also used in modern Russian, we can create these two terms as alternative forms of засе́кший/пересе́кший. I'll fix the module to generate both forms; but there are lots of other similar verbs, and I'm not sure if all of them allow for -сёкший participles: насе́чь (naséčʹ), обсе́чь (obséčʹ), надсе́чь (nadséčʹ), подсе́чь (podséčʹ), пресе́чь (preséčʹ), осе́чь (oséčʹ), посе́чь (poséčʹ), просе́чь (proséčʹ), ссе́чь (sséčʹ), рассе́чь (rasséčʹ), иссе́чь (isséčʹ), отсе́чь (otséčʹ), усе́чь (uséčʹ), вы́сечь (výsečʹ). User:Atitarev can you comment on this as a native speaker? Do all these verbs allow for past active participles in -сёкший? Benwing2 (talk) 03:30, 9 February 2023 (UTC)Reply
@Benwing2, 190.194.221.97: Hi. (Just back from a long Pacific cruise.) Without a long thought or check I would immediately say "пересёкший" sounds more correct and modern (like I said in that discussion) but since resources say that "пересе́кший" is the correct form, I think we should allow both. Confirmed by gramota.ru (Орфографический словарь). --Anatoli T. (обсудить/вклад) 05:16, 9 February 2023 (UTC)Reply
@Atitarev Thanks and glad you had a good, hopefully relaxing trip! Presumably all the other verbs mentioned above are the same? Benwing2 (talk) 06:04, 9 February 2023 (UTC)Reply
@Benwing2: Thanks! Yes, treat them the same way. I checked some and they all have the same feel. Anatoli T. (обсудить/вклад) 06:06, 9 February 2023 (UTC)Reply
@Benwing2@Atitarev I believe вы́сечь (výsečʹ) would be the exception because the emphasis is on `вы` so ё doesn't make sense. 190.194.221.97 01:24, 10 February 2023 (UTC)Reply
@190.194.221.97, Benwing2: Good pickup. Yes, definitely. Hope Benwing2 would figure that out. Anatoli T. (обсудить/вклад) 01:27, 10 February 2023 (UTC)Reply
@Atitarev Yes, definitely. Benwing2 (talk) 01:39, 10 February 2023 (UTC)Reply
@Benwing2: Re: diff: I couldn't find anything about forms with "-сёкши(й)" being non-standard or proscribed. They are equally correct and common, as far as I can tell, not just attestable but included in dictionaries, modern, at least. Anatoli T. (обсудить/вклад) 09:36, 10 February 2023 (UTC)Reply

|translator=

I find that the quotation templates render this parameter in a way that makes it completely unattractive to use. In my opinion it should behave the same as |author=: appear before the title, just with the specification ‘(translator)’. As it is now, not even the examples on the documentation page (the Andersen and Milne quotations) use it.

Or am I mistaken and the present rendering of translator information is the standard? ―Biolongvistul (talk) 21:42, 13 February 2023 (UTC)Reply

There was some discussion about this in the past which led to the current configuration. If it is to be changed, a discussion at the Beer Parlour is probably required. — Sgconlaw (talk) 22:19, 13 February 2023 (UTC)Reply
@Biolongvistul I have no particular opinions on this; please do start a Beer Parlour discussion about changing this if you think the current display is wrong. Benwing2 (talk) 23:41, 13 February 2023 (UTC)Reply

Replacement of unnecessary redirects and templates

Hi, could you please carry out the following replacements?

#* {{RQ:Barrow Sermon|The Consideration of our Latter End}}
#*: No virtue is '''acquired''' in an instant, but by degrees, step by step.

to:

#* {{RQ:Barrow Works|sermonname=The Consideration of Our Latter End|passage=No virtue is '''acquired''' in an instant, but by degrees, step by step.}}

Thank you! — Sgconlaw (talk) 19:53, 16 February 2023 (UTC)Reply

@Sgconlaw I went to implement this but then I suspected you mean to convert all sermons in {{RQ:Barrow Sermon}} to {{RQ:Barrow Works}}, not just 'The Consideration of Our Latter End', is that right? Benwing2 (talk) 04:35, 20 February 2023 (UTC)Reply
Yes, as {{RQ:Barrow Sermon}} is a really bad quotation template. It doesn’t even actually refer to any published work, but merely asserts “this is a sermon by Isaac Barrow”. — Sgconlaw (talk) 10:50, 20 February 2023 (UTC)Reply
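The kind of replacement requested in this thread can be sketched in Python as follows. This is only an illustration, not the bot script that was actually run: the regular expression and the `convert` function are assumptions, and the sketch copies the sermon name verbatim into |sermonname= without correcting its capitalisation.

```python
import re

# Matches a quotation line using {{RQ:Barrow Sermon|<name>}} followed by a
# passage line with the same list prefix plus ":" (for example "#*" then "#*:").
# Hypothetical pattern for illustration; it assumes the passage sits on one line.
PATTERN = re.compile(
    r"^(#+\*) \{\{RQ:Barrow Sermon\|([^|{}]+)\}\}\n\1: (.+)$",
    re.MULTILINE,
)


def convert(wikitext):
    """Fold the template call and the following passage line into one call."""
    return PATTERN.sub(
        r"\1 {{RQ:Barrow Works|sermonname=\2|passage=\3}}",
        wikitext,
    )


if __name__ == "__main__":
    before = (
        "#* {{RQ:Barrow Sermon|The Consideration of our Latter End}}\n"
        "#*: No virtue is '''acquired''' in an instant, but by degrees, step by step.\n"
    )
    print(convert(before))
```

A passage containing an unescaped pipe outside a nested template would need extra handling, which this sketch does not attempt.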

Arabic spellings in Northern Kurdish headword templates

Hi, I've recently taken an interest in adding Northern Kurdish entries with the aid of the Ferhenga Birûskî: Kurmanji–English Dictionary, wherein I often find entries with multiple Arabic spellings (e.g. the noun aciz, spelled as either ئاجز (aciz)‎ or عاجز ('aciz)), but the templates seem to only allow one entry for the ar= parameter.
Since you created the template, I thought I'd ask you. Is there a way to enter multiple spellings that I'm not aware of? — GianWiki (talk) 19:57, 18 February 2023 (UTC)Reply

@GianWiki Hi. Use |ar2= for the second Arabic spelling, |ar3= for the third, etc. Benwing2 (talk) 20:21, 18 February 2023 (UTC)Reply
Thanks! GianWiki (talk) 21:05, 18 February 2023 (UTC)Reply

Updated vowel length in nū̆ptum: bot help?

Hello, I'm wondering if you could help me out with a Latin vowel length update. I recently found out that a number of sources show a short vowel in the first syllable of nuptiae and all other words built on the supine stem of nūbō. On the other hand, Lewis 1891 shows the vowel as long. There are potential arguments based on etymology or analogy for either quantity. So I updated the main pages of nubo, nuptus, nuptiae to mark the vowel as ū̆, but I don't want to have to go through all the other inflected and derived forms manually. Would this be something you could carry out with your bot? Here's a list of words I think would be affected: all forms of nū̆pta nū̆ptia nū̆ptiae nū̆ptiālis nū̆pturiō nū̆ptus; supine forms of prefixed derived verbs such as innūbō, transnūbō, obnūbō. Urszag (talk) 22:15, 18 February 2023 (UTC)Reply

{{ping|Urszag} Yes, I have a script to do this, let me run it. Benwing2 (talk) 22:25, 18 February 2023 (UTC)Reply
@Urszag Oops ... Benwing2 (talk) 22:25, 18 February 2023 (UTC)Reply
@Urszag BTW Bennett [2] with corrections by Michelson and Allen has long nūptus; this seems primarily by analogy with nūbō (compare scrīptus from scrībō, where the length is clear, e.g. from Italian scritto). Benwing2 (talk) 22:31, 18 February 2023 (UTC)Reply
Yes, I mentioned in the note I added to nūbō that a long vowel can be supported by analogy to scrībō, scrīptum; but a paradigm with a short vowel in just this verb part is also theoretically possible as in dūcō, dūxī, ductum. (Lachmann's law isn't expected to apply here since the stem-final consonant is an original aspirate.) Combined with the existence of sources that do mark the vowel as ŭ (Gaffiot 2016 version V. M. Komarov gives "nŭptum" in its entry for nūbō), and De Vaan who implicitly describes the vowel as short by listing its forms as nūbō, nūpsī, nuptum, I think it's appropriate for us to use ū̆ to mark uncertainty in vowel length.--Urszag (talk) 22:39, 18 February 2023 (UTC)Reply
@Urszag Agreed. Benwing2 (talk) 22:50, 18 February 2023 (UTC)Reply
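For illustration, the vowel-length update discussed in this thread amounts to a simple string substitution. The sketch below is an assumption about how such a script might look, not the script that was actually run; `mark_anceps` is a hypothetical name. It normalises to NFC first so that decomposed input (u + combining macron) is handled too.

```python
import unicodedata

LONG = "n\u016bpt"          # "nūpt" with precomposed ū
ANCEPS = "n\u016b\u0306pt"  # ū followed by U+0306 COMBINING BREVE


def mark_anceps(form):
    """Rewrite nūpt- as nū̆pt- anywhere in a form, including prefixed verbs."""
    form = unicodedata.normalize("NFC", form)
    form = form.replace(LONG, ANCEPS)
    # Capitalised variant, in case a form starts a sentence or a name.
    return form.replace("N\u016bpt", "N\u016b\u0306pt")


if __name__ == "__main__":
    for word in ["n\u016bptiae", "inn\u016bptum", "n\u016bb\u014d"]:
        print(word, "->", mark_anceps(word))
```

The function is idempotent: once the breve sits between ū and p, the sequence "nūpt" no longer occurs, so running it twice changes nothing.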

References > Further reading

Hi, can I ask why you changed 'References' sections into 'Further reading' ones? Catonif (talk) 19:36, 28 February 2023 (UTC)Reply

There was a discussion in the Beer Parlour a while ago about what goes in 'References' vs. 'Further reading'. WT:ELE isn't very clear on this. I and some others (e.g. Rua) argued that only footnotes should go in 'References', and other references tied to the page as a whole rather than a specific piece of text should go under 'Further reading'. There was no consistency in the Italian entries about what went where and IMO it looks bad if both footnotes and other references go into the same section, so I standardized on putting non-footnote references under 'Further reading'. Benwing2 (talk) 23:33, 28 February 2023 (UTC)
I see. For me, at least, there was a reason for which header the links were in: "Further reading" = "works consulted" and "References" = "works cited", as Ultimateria says in the discussion. Does this mean I should start using References only for footnotes now? And should WT:ELE be amended? I also see some other changes like changing all occurrences of {{ng}} to {{n-g}}; are we deprecating the shortcut {{ng}}? Should I not use it? Catonif (talk) 15:37, 2 March 2023 (UTC)
@Catonif The issue with {{ng}} vs. {{n-g}} etc. is that there are a bunch of equivalent redirects (see [3]; there appear to be 6 aliases of {{non-gloss definition}}), and this was getting in the way of some searching-and-replacing I was doing, so I standardized them all on {{n-g}}, without prejudice towards any of the other aliases (although I'd like to get rid of some of them eventually). As for the References vs. Further reading, essentially what I did is standardize on "works actually cited using <ref>" in ==References== and all others in ==Further reading==, which is apparently similar to what you've been doing except you've been putting works in ==References== that you cited conceptually without actually citing using <ref>, right? (If not, let me know where I'm confused ...) I don't really see how the idea of "conceptually citing" a work can be implemented in practice, and I'm almost positive other Italian editors haven't been observing this, but just randomly putting references under one or the other header. For this reason I'd recommend going with the practice I've established, but this is just a recommendation. As for amending WT:ELE, that's a whole can of worms; I think it would be nice to standardize the "==References== only for footnotes" practice but there may be objections since the Beer Parlour discussion you cited had no real consensus. Benwing2 (talk) 07:51, 3 March 2023 (UTC)
Yes I've been using Refs/FR how you described, sorry for not being clear, and thanks for clarifying the {{ng}} thing. I understand that bureaucracy is stressful, and I'm fine with editors having their editing preferences (I'm not Thadh, lol), but per WT:BOT, there should be some consensus before changing everything. We want to make a change that involves how two of our most used headers are used? That's cool, but in that case it should be done site-wide and by updating our written guides (like WT:EL), instead of removing it only from Italian entries, which looks like we're trying to be sneaky about it. Catonif (talk) 19:00, 3 March 2023 (UTC)Reply
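As an aside on the alias standardisation mentioned above, collapsing template aliases onto one canonical name can be sketched as below. The alias list is an assumption: it contains only the two aliases named in this thread, whereas the real set is said to have six. `canonicalize` is a hypothetical helper, not code from the bot.

```python
import re

# Only the aliases named in the discussion; the full list is not given there.
ALIASES = ("ng", "non-gloss definition")
CANONICAL = "n-g"

# Match "{{alias" only when the name is complete, i.e. followed by "|" or "}".
ALIAS_RE = re.compile(
    r"\{\{\s*(?:%s)\s*(?=[|}])" % "|".join(map(re.escape, ALIASES))
)


def canonicalize(wikitext):
    """Rewrite known alias invocations to the canonical template name."""
    return ALIAS_RE.sub("{{" + CANONICAL, wikitext)


if __name__ == "__main__":
    print(canonicalize("# {{ng|used to express surprise}}"))
```

The lookahead keeps a longer template name that merely begins with an alias (for example a hypothetical {{ngd}}) from being rewritten by accident.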

regex help?

I was wondering how to limit a pattern match from the search box to a single heading, eg L2 or L3/4. My regex and general programming knowledge is very elementary. Can you point to any examples of good regexes that do what I want or to a kind of regex capability that might work. In the simple cases I've tried, I seem to have run afoul of the fact that the regex search is "greedy".

I am optimistic that I could accomplish what I want using Perl on the XML dump, but that is less convenient for many purposes. DCDuring (talk) 18:12, 6 March 2023 (UTC)

@DCDuring I am very familiar with regexes but unfortunately not so much with the limitations of the Cirrus search box because I don't normally use it. (I typically use regex searching through the contents of either a category or all references to a given template, and if that won't work, I search through the dump.) I can definitely help you with Perl or Python regexes applied to the dump file and might be able to help you with the search box if you give me some more details: What exact pattern were you using, what did you expect to happen and what actually happened? Benwing2 (talk) 08:01, 7 March 2023 (UTC)Reply
I've had some trouble interpreting the various fragments of documentation at mediawiki.
I was trying to show how easy it was to find HTML comments using the search box, using "insource=" and 'filters'. I thought it would be handy to show a search focused on one L2 and one L3/4.
My search line is "Pronunciation incategory:"English nouns" insource:/[=]+Pronunciation[=]+[^<]+\<!--.+--\>/"
The regex pattern is what follows "insource". I tried many variations. This search finds any HTML comment in an entry that is in Cat:English nouns that has a pronunciation section, not limited to a section. DCDuring (talk) 13:51, 7 March 2023 (UTC)Reply
@DCDuring: I tried to come up with a regex to find "bor" only within an Etymology section (insource:"Etymology" incategory:"Greek lemmas" insource:/Etymology *[0-9]* *=+((?![^ -􏿿]=).)*bor/) and it didn't work. I guess the negative lookahead syntax ((?!)) is disabled in insource:// though it exists in PHP regex. Negative lookahead is the only way I know to really restrict the search to be within a section. (All this assumes the headers are not commented out and don't have HTML comments interspersed in them, which is legal MediaWiki syntax but not allowed by the style guide.) [^ -􏿿] matches ASCII control characters U+0000-U+001F, which is only newline (U+000A) and tab (U+0009) in wiki pages because all other ASCII control characters are replaced with a replacement character (�). (\n matches a literal n in insource://.) So I think it's impossible to match text only within a section with CirrusSearch. — Eru·tuon 22:51, 7 March 2023 (UTC)Reply
What I feared, not what I hoped, but it's wonderful to be able to stop wasting time. Thanks. DCDuring (talk) 01:47, 8 March 2023 (UTC)Reply
@DCDuring, Erutuon It seems to me it should be possible without negative lookahead. When I created a Python regex to find 'confer' within an Etymology section, I wrote this:
Etymology( [0-9]+)?==*\n((?!=).*\n){0,20}.*[Cc]onfer\b.*
which uses negative lookahead, but you should be able to rewrite it without the negative lookahead like this:
Etymology( [0-9]+)?==*\n([^=\n].*\n|\n){0,20}.*[Cc]onfer\b.*
That is, I'm searching for 0 through 20 occurrences of a line inside an Etymology section, which consists of either (a) a character that's not an equal sign or newline followed by any number of non-newlines followed by a newline, or (b) just a newline. You don't necessarily need the {0,20}, you can use * if it doesn't choke. You have to figure out how to avoid the use of \n but it seems like you've figured that out. Benwing2 (talk) 07:02, 8 March 2023 (UTC)Reply
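The two patterns quoted above can be compared directly in Python. The sample wikitext below is invented for illustration; the sketch only shows that both patterns keep the match inside the Etymology section when run with Python's `re` against raw page text. It says nothing about CirrusSearch, where `\n` reportedly does not match a newline.

```python
import re

WITH_LOOKAHEAD = re.compile(
    r"Etymology( [0-9]+)?==*\n((?!=).*\n){0,20}.*[Cc]onfer\b.*"
)
WITHOUT_LOOKAHEAD = re.compile(
    r"Etymology( [0-9]+)?==*\n([^=\n].*\n|\n){0,20}.*[Cc]onfer\b.*"
)

# "confer" appears inside the Etymology section: both patterns should match.
SAMPLE_HIT = (
    "==English==\n\n===Etymology===\n"
    "From Latin; confer the French form.\n\n"
    "===Pronunciation===\nSome text.\n"
)

# "confer" appears only under Pronunciation: neither pattern should match,
# because the repeated group cannot step over a line starting with "=".
SAMPLE_MISS = (
    "==English==\n\n===Etymology===\n"
    "From Latin.\n\n"
    "===Pronunciation===\nHere we confer.\n"
)

if __name__ == "__main__":
    for name, text in [("hit", SAMPLE_HIT), ("miss", SAMPLE_MISS)]:
        for pattern in (WITH_LOOKAHEAD, WITHOUT_LOOKAHEAD):
            print(name, bool(pattern.search(text)))
```

Note that the final `.*[Cc]onfer` segment is not itself barred from starting on a header line, so a header containing the search word would still match; for ordinary section names that is not a practical concern.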
Thanks, I hope. I'll try it when I can. DCDuring (talk) 15:14, 8 March 2023 (UTC)Reply

Changes to Module:languages

Flagging this so you're aware of what I'm doing: I'm making a minor change to Module:languages and Module:languages/data, which should allow me to eliminate much of the additional code in Module:links which you flagged with your comment. This also has the advantage of keeping everything self-contained in the transliterate method, which then subdivides the text where necessary itself.

For context: Module:languages/data/patterns has a series of patterns, which are used to find things like formatting, URLs etc. so that they can be converted into PUA characters (i.e. text which we definitely don't want to be changed). These are stored in a table, and reconverted at the end. However, to avoid feeding PUA characters through a bunch of modules which may be unequipped to handle them, the text is then subdivided using mw.text.split and fed through in chunks. Conveniently, this is also a useful model for page-scraping modules like (the now-consolidated) Module:zh-translit, which makes it possible to feed through terms with embedded links without requiring a page for the whole term. For example, 香港語言學學會粵語拼音方案/香港语言学学会粤语拼音方案 (Xiānggǎng yǔyánxué xuéhuì yuèyǔ pīnyīn fāng'àn).
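The protect, split and restore approach described here can be sketched in Python. This is not the Lua code in Module:languages: the `PROTECTED` pattern, the function names and the use of `str.upper` as a stand-in for a transliteration function are all assumptions made for the example.

```python
import re

PUA_START = 0xE000  # first code point of the BMP Private Use Area
PUA_CHAR = re.compile(r"[\uE000-\uF8FF]")

# Toy set of spans that must not be altered: bold/italic markup and URLs.
PROTECTED = re.compile(r"'''|''|https?://\S+")


def protect(text):
    """Replace each protected span with one PUA character; return text and table."""
    stored = []

    def stash(match):
        stored.append(match.group(0))
        return chr(PUA_START + len(stored) - 1)

    return PROTECTED.sub(stash, text), stored


def transform_chunks(text, transform):
    """Split on PUA characters and apply transform only to the chunks between them."""
    parts = re.split(r"([\uE000-\uF8FF])", text)
    return "".join(
        part if PUA_CHAR.fullmatch(part) else transform(part) for part in parts
    )


def restore(text, stored):
    """Put the original protected spans back in place of the PUA characters."""
    return PUA_CHAR.sub(lambda m: stored[ord(m.group(0)) - PUA_START], text)


def process(text, transform):
    protected, stored = protect(text)
    return restore(transform_chunks(protected, transform), stored)


if __name__ == "__main__":
    print(process("'''abc''' see https://example.org/x ok", str.upper))
```

Because the transform never sees a PUA character, it does not need to know that placeholders exist, which is the point made above about modules that are unequipped to handle them. The sketch ignores the case of more protected spans than there are PUA code points.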

I haven't documented this yet, because the ultimate aim is to have the functionality of {{zh-x}}, which uses spaces to achieve the same effect (but isn't capable of handling links). I'll explain this in more detail in my userspace. Theknightwho (talk) 21:28, 6 March 2023 (UTC)

@Theknightwho Great, thank you for the message! Apologies, some RL stuff came up today and I haven't had a chance to look through Wiktionary pings or discussions. Will do that tomorrow (Tuesday). Definitely have some questions about the use of PUA chars and such, maybe your userspace docs will clarify this. Benwing2 (talk) 07:56, 7 March 2023 (UTC)Reply
@Benwing2 Sorry for the delay - a combination of real life stuff and procrastination. Will have things in my userspace this weekend. Theknightwho (talk) 18:17, 9 March 2023 (UTC)Reply

suízhe

Reporting bug: currently displays "隨著/随著, 随着". ---> Tooironic (talk) 21:54, 7 March 2023 (UTC)Reply

@Theknightwho: Character 著 has 着 as a variant or as a simplified form. What module needs updating?
The display should be 隨著/随着 as @Tooironic pointed out. Anatoli T. (обсудить/вклад) 22:08, 7 March 2023 (UTC)Reply
@Tooironic @Atitarev This isn't a bug - you can do a manual override using // as a divider (e.g. 隨著//随着), or alternatively make sure the traditional/simplified correspondence is correct in Module:zh/data/ts. Someone has been making a lot of changes to that module recently, which may explain the error. Theknightwho (talk) 22:12, 7 March 2023 (UTC)
@Theknightwho, @Tooironic: Thanks. I made a manual edit. I forgot about the // trick. Anatoli T. (обсудить/вклад) 22:15, 7 March 2023 (UTC)Reply
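The manual override described in this thread can be pictured with a small Python sketch. It is an illustration only: the one-entry conversion table is invented and does not reflect the contents of Module:zh/data/ts, and `forms` is a hypothetical function name.

```python
# Invented toy table; the real traditional-to-simplified data is much larger.
TOY_T2S = {"隨": "随"}


def forms(term):
    """Return (traditional, simplified), honouring a manual '//' override."""
    if "//" in term:
        traditional, simplified = term.split("//", 1)
        return traditional, simplified
    simplified = "".join(TOY_T2S.get(char, char) for char in term)
    return term, simplified


if __name__ == "__main__":
    print(forms("隨著"))        # automatic conversion from the toy table
    print(forms("隨著//随着"))  # explicit simplified form supplied by the editor
```

With an explicit divider the table is bypassed entirely, so a wrong or missing table entry cannot affect the displayed simplified form.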

On Northern Kurdish

Hi, there.
Since you seem to be an active enough user on the subject of Northern Kurdish, I wanted to try and ask you: do you have any idea if and how I can get answers to the questions I had here and here on the subject?
Thanks in advance for your time. — GianWiki (talk) 16:00, 10 March 2023 (UTC)Reply

Plus template conversion

Hey, I think Old Polish, Kashubian, and Silesian should use plus templates; could you run your script on them? Also I think some will have {{l|en|to [[define]]}} or {{l|en|definition}}; those should be bare links. Can we also templatize categories (using {{C}} and {{cln}}) and quotes? And any femeq's should be applied to Silesian and Kashubian. Basically all of the changes you first proposed to Polish entries :) Vininn126 (talk) 23:51, 13 March 2023 (UTC)

@Vininn126 Sure, I can do that. Benwing2 (talk) 00:00, 14 March 2023 (UTC)Reply
@Vininn126 I have done this for Old Polish, the others still to come. Benwing2 (talk) 07:37, 18 March 2023 (UTC)Reply

Quotation template replacements

Hi, could you please do a bot run to carry out the following replacements?

Thank you. — Sgconlaw (talk) 18:59, 15 March 2023 (UTC)Reply

@Sgconlaw This is done and also your request from Feb 19/20 (apologies for the delay, I had to rewrite the script that handles these requests to handle placeholder values in from-params that get copied to to-params, so in that case 1= got copied to sermonname=). Benwing2 (talk) 09:28, 17 March 2023 (UTC)
Thank you! — Sgconlaw (talk) 15:14, 17 March 2023 (UTC)Reply

Macrons in Classical Persian transliteration

Hi, it’s not appropriate to change macrons to circumflexes in Classical Persian romanizations. It might be right for Dari (since circumflexes suggest a qualitative rather than quantitative vowel difference), but macrons are both correct and absolutely standard for Classical Persian.—Saranamd (talk) 08:52, 17 March 2023 (UTC)Reply

Also @Atitarev.—Saranamd (talk) 08:53, 17 March 2023 (UTC)Reply
@Atitarev I discussed this and lots of other issues with User:Atitarev, where it was agreed to use circumflexes. I can switch the circumflexes to macrons specifically for Classical Persian if there's consensus to do so. Benwing2 (talk) 08:55, 17 March 2023 (UTC)Reply
Macrons are used in both Iranianist sources (e.g. Cheong, Etymological Dictionary of the Iranian Verb) and in the IJMES system, which is the standard “Orientalist” (for lack of a better word) transcription scheme used by English-language academic journals.
This is not universal, e.g. Thackston uses a circumflex in his Introduction to Persian and Millennium of Classical Persian Poetry, but it is certainly the minority for Classical transliterations and the decision should not have been made cursorily.—Saranamd (talk) 09:05, 17 March 2023 (UTC)Reply
Additionally, in cases such as عنقا (anqa), I consider the transliteration of initial ع undesirable because Persian has an initial glottal stop for all vowel-initial words, not just Arabic loans. We know that this has been the case since Early New Persian.
If we are including this Arabic orthographic feature not reflected in any stage of Persian phonology, why not go the full way and use e.g. <ż> for ض?—Saranamd (talk) 09:15, 17 March 2023 (UTC)Reply
@Atitarev, Saranamd Let's see what Anatoli says. It's a little late to make wholesale changes like you're suggesting to the translits since I just did a big run trying to clean them up and Anatoli is in the middle of handling the cases that couldn't be done automatically; would have been nice if you had taken part in the long discussion that Anatoli and I had before making the changes, see User talk:Atitarev#Persian questions. Benwing2 (talk) 09:26, 17 March 2023 (UTC)Reply
I wasn’t aware of the discussion (not having been pinged), my apologies. It does feel a little bad, since I have been the main (only?) regular contributor to Persian over the past year.—Saranamd (talk) 09:28, 17 March 2023 (UTC)Reply
@Saranamd My apologies, I didn't realize that you have been contributing. I have added you to the Persian workgroup data so future workgroup pings will reach you. See also the Beer Parlour discussion at Wiktionary:Beer parlour/2023/March#Cleaning up Persian templates. Benwing2 (talk) 16:14, 17 March 2023 (UTC)
@Saranamd, Atitarev BTW I have thought about it and I believe that all quotes and other terms from Classical Persian should use the etymology code fa-cls in place of fa. (Note, Dari also has an etymology code prs.) In general it's not sustainable to have two translit schemes for a given language so separating by etymology language is a way forward. We will need to modify the Module:links and translit code so that etymology-only languages can have their own translit modules (or at least, the etymology-only variant gets retained and passed into the language's translit module as an additional param rather than canonicalizing all etymology-only codes to their parent code, attention User:Theknightwho).
@Saranamd, Atitarev Fuck, forgot to sign so ping won't go through. Benwing2 (talk) 16:20, 17 March 2023 (UTC)Reply
@Theknightwho See just above. Benwing2 (talk) 16:21, 17 March 2023 (UTC)Reply
@Benwing2 This actually fits neatly with a broader idea I had about etymology-only languages: essentially, replacing them with “variants”. From a technical perspective, a variant object would basically start off as a clone of the parent language object, but we could use data modules to optionally vary them however we see fit. That would make things like this really straightforward (see also the Prakrits etc). It would also make conversions from language to variant/variant to language simpler, and would introduce opportunities for using them in other ways (e.g. it feels silly to have BrEng/AmEng as “etymology-only” languages, as it’s rarely relevant, but it might be useful to have variants to handle differing spellings or pronunciations). Theknightwho (talk) 17:24, 17 March 2023 (UTC)Reply
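The "variant as a clone of the parent with optional overrides" idea can be sketched with a Python dataclass. Everything here is hypothetical: the `Language` class, `make_variant`, and the toy transliteration functions are assumptions for illustration and do not mirror the structure of Module:languages.

```python
from dataclasses import dataclass, replace
from typing import Callable, Optional


@dataclass(frozen=True)
class Language:
    code: str
    name: str
    translit: Callable[[str], str]
    parent: Optional["Language"] = None


def make_variant(parent, code, name, **overrides):
    """Clone the parent, then apply only the fields the variant changes."""
    return replace(parent, code=code, name=name, parent=parent, **overrides)


def circumflex_style(text):
    # Toy scheme: render long a with a circumflex.
    return text.replace("ā", "â")


def macron_style(text):
    # Toy scheme: leave macrons untouched.
    return text


fa = Language(code="fa", name="Persian", translit=circumflex_style)
fa_cls = make_variant(fa, "fa-cls", "Classical Persian", translit=macron_style)
fa_plain_variant = make_variant(fa, "fa-x", "Hypothetical variant")

if __name__ == "__main__":
    print(fa.translit("ā"), fa_cls.translit("ā"), fa_plain_variant.translit("ā"))
```

A variant that overrides nothing behaves exactly like its parent, which is what would make converting between a full language and a variant cheap in this model.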
@Theknightwho Sounds good, can you write up a little more on your thoughts about this? I feel we should hash out how variants work before diving into an implementation. BTW didn't even realize British and American English have etymology codes. Benwing2 (talk) 18:10, 17 March 2023 (UTC)Reply
@Benwing2 Yep - will do that this evening. On a side note, I’d like us to get rid of nonstandard etym-only codes like "LL." if at all possible (or at least bar new ones), but I don’t know if that’ll be very popular. Theknightwho (talk) 18:13, 17 March 2023 (UTC)Reply
@Theknightwho Yeah people seem to like those codes but definitely at the very least we should prohibit new ones. Benwing2 (talk) 18:14, 17 March 2023 (UTC)Reply
In the case of Classical Persian, it might be justifiable to use modern Iranian translit since that is how Iranians pronounce Classical texts, much like how Mandarin speakers pronounce Classical Chinese with Mandarin readings.
But a dedicated Classical translit would be more desirable for many reasons:
  1. Iranian pronunciation should not be overprivileged over Dari and other regional pronunciations, especially when Dari is closer to the actual pronunciations that Rumi or Sa’di would have used.
  2. It is usually trivial to get Iranian pronunciation from a Classical translit, but not vice versa due to phonemic mergers in Iranian.
  3. Some Classical rhymes do not rhyme in Iranian (e.g. خورد (xward/xord, he ate) and درد (dard, pain)), and some non-rhymes are now rhymes. Classical texts, both prose and verse, tend to be heavily rhymed.
Saranamd (talk) 05:49, 18 March 2023 (UTC)Reply
@Benwing2, Saranamd Thank you both! I was "out of action" since Friday - real life got in the way and it's going to be a busy week. I'll try to catch up on pings and outstanding things. --Anatoli T. (обсудить/вклад) 22:31, 19 March 2023 (UTC)
@Benwing2, Saranamd We're now in a position to create a Classical Persian transliteration module, as etymology-only languages now support full customisation. Theknightwho (talk) 03:42, 1 April 2023 (UTC)Reply
@Atitarev, Theknightwho Great! This can go on the list of Persian-related things to do (of which there are a lot :) ... the current code is messy and incomplete). Benwing2 (talk) 04:15, 1 April 2023 (UTC)Reply
@Theknightwho, @Benwing2 I do not think it is possible to automate Classical transliterations without making up our own vowel marks for the majhul vowels ē and ō.
In Classical/premodern dictionaries, pronunciations are indicated by poetic quotations where the meter and rhyme indicate the pronunciation by reference to a more widely known word, or by explicit designations (e.g. the dictionary might say that a given word has واوِ فارسی “the Persian و” for ō, etc.).
So I don’t think it is possible to transliterate CP, even when fully marked with diacritics.—Saranamd (talk) 04:54, 1 April 2023 (UTC)Reply
In late Indian sources, word-final majhul ē is distinguished from ī in the same way as modern Urdu, so that the critical difference between آمدی (āmadī, you came) and آمدے (āmadē, he/she was coming) can be represented. But I don’t think this is generally the case in premodern manuscripts.—Saranamd (talk) 05:07, 1 April 2023 (UTC)Reply
@Saranamd There are tricks that could be used in these cases. For example, we could implement special chars that are inserted after the majhūl vowels, which are passed onto the translit but removed before displaying the text or creating links. This exact thing is currently done for Korean with hyphens to mark prefixes, suffixes, etc.; the hyphens show up in translit but I'm pretty sure not elsewhere. I implemented this at the behest of User:Tibidibi. When I did this it was done in a very bespoke fashion for efficiency purposes, but it's possible that User:Theknightwho has generalized this mechanism by now. Benwing2 (talk) 05:30, 1 April 2023 (UTC)Reply
@Benwing2 @Saranamd Yep - you can make sure they're hidden with makeDisplayText in Module:languages. Theknightwho (talk) 05:46, 1 April 2023 (UTC)Reply
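A minimal sketch of the marker idea discussed above, written in Python rather than Lua for brevity. It is not the code in Module:languages: the marker character `^`, the function names, and the two-letter vowel table are all assumptions made for illustration, and the sketch ignores the fact that ی and و can also be consonants.

```python
# Hypothetical marker typed after a long-vowel letter to flag it as majhul.
MAJHUL_MARK = "^"

# Letter -> (default value, majhul value). Simplified: consonantal uses of
# these letters (y, w) are ignored in this sketch.
LONG_VOWELS = {
    "\u06cc": ("\u012b", "\u0113"),  # ی -> ī, or ē when marked
    "\u0648": ("\u016b", "\u014d"),  # و -> ū, or ō when marked
}


def display_text(text):
    """Return the text as it should be shown and linked: markers removed."""
    return text.replace(MAJHUL_MARK, "")


def translit_long_vowels(text):
    """Transliterate only the long-vowel letters, honouring the marker.

    All other characters are passed through unchanged.
    """
    out = []
    i = 0
    while i < len(text):
        ch = text[i]
        if ch in LONG_VOWELS:
            marked = i + 1 < len(text) and text[i + 1] == MAJHUL_MARK
            out.append(LONG_VOWELS[ch][1 if marked else 0])
            i += 2 if marked else 1
        else:
            out.append(ch)
            i += 1
    return "".join(out)
```

The point of the design is that the marker reaches the transliteration step but never the displayed text or the link target, so readers see ordinary Persian spelling.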

Listing of Periodic XML dump runs? etc.

  1. I was going to ask you about a specific kind of error in our entries (mismatch between inflection-line template and L2 header).
  2. But that got me thinking that it would be handy for each such run to be listed with a link to its code, the date of the run, and the date of the dump on which it was run, as well as a description comprehensible by BP readers, rather than GP readers.
  3. Relatedly, at one time years ago, either Ullman or Visviva did runs of L2 sections that were not linked-from, but had links-to other entries with the same L2. (I think it was called "Linkeration".) I don't remember exactly how it treated alternative and inflected forms, but such a run among lemmas has value and may lead to reduction of the number of orphaned entries.
  4. Also, several of the special pages are no longer useful, probably beyond saving. Occasional runs duplicating the original intent of such pages (probably with more selectivity, eg, excluding user pages, or talk pages, or all spaces other than NS0) might be useful, even if the runs are infrequent (ie, annual, quarterly). Maybe you already do some of these things or have reasons for not doing them, but that gets back to my second point above. DCDuring (talk) 22:38, 17 March 2023 (UTC)Reply
@DCDuring Sure. It is pretty easy for me to write scripts to do various things with the XML dump. Can you clarify what some of the things above mean, with examples? E.g. which sorts of "runs" are you referring to in #2? By #3 you mean a given term in a given language that no other term links to (an orphan)? What is the purpose of restricting to only such terms that link to other same-language terms? In #4 by "special pages" you mean things in the Appendix and Wiktionary space, or my userspace pages, or ...? I agree there is probably a lot of junk there in any case, although how would we identify such junk other than maybe by looking at the date of the last non-bot commit? Benwing2 (talk) 23:40, 17 March 2023 (UTC)Reply
2. Any periodical processing of the XML dumps that identified recurring entry defects, whether formatting problems, misapplied templates, mismatches between Language and templates, labels, categories, etc.
3. AFAICR, in "Linkeration", considering only L2s in the same language, if entry A had a link to an entry B, but B did not have a link to A, that state of affairs appeared in a linkeration listing.
4. Special pages contains numerous pages with a maximum number displayed of 5,000, though there may actually be hundreds of thousands of pages that have the characteristics. For example, when I try to use such pages to find "wanted" items that are "Translingual" or "mul", I usually find that there are huge numbers of inflected forms that fill the displayed pages. Since that situation hasn't changed in 15 years, I believe it would be useful to have more selective replacements for such special pages, limited to, say, lemma pages, grouped by language in descending order of, say, "want".
HTH. DCDuring (talk) 00:05, 18 March 2023 (UTC)Reply
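The "Linkeration" check described in point 3 can be sketched roughly as follows. This is only an illustration in Python, not Visviva's original code: the function name and the input shape (a dict mapping each entry title to the set of same-language titles it links to) are assumptions.

```python
def unreciprocated_links(links):
    """Return (A, B) pairs where entry A links to entry B but B does not link back.

    `links` maps an entry title to the set of titles it links to, restricted
    to L2 sections in the same language.
    """
    missing = []
    for source, targets in links.items():
        for target in sorted(targets):
            # An entry absent from `links` is treated as having no outgoing links.
            if source not in links.get(target, set()):
                missing.append((source, target))
    return missing
```

For example, with `{"big": {"large", "huge"}, "large": {"big"}, "huge": set()}` the only pair reported is `("big", "huge")`.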
@DCDuring I actually don't normally do such periodic runs. The only things I do on a regular basis are create the {{auto cat}}-able categories in Special:WantedCategories every time it's refreshed (every 3 days) and periodically delete the empty categories in Category:Empty categories. I have done various runs fixing problems (e.g. a few months ago I did a run over almost all languages fixing misindented headers, and twice now I've done runs to remove unnecessary Unicode BIDI chars, particularly U+200E LEFT-TO-RIGHT MARK) but on a one-off basis. I don't currently have any infrastructure set up to automate recurring runs, although it's probably a good idea to do so, using toolforge or wmflabs or whatever.
  • As for Linkeration, looking for cases where "entry A had a link to an entry B, but B did not have a link to A" is very different from looking for orphaned pages; do you have an example of the output of any such run?
  • For #4, I see, and I think you've mentioned this before. I agree that a lot of the current Special: pages are filled with crap; it's too bad we don't have the ability to set custom filters to weed out that crap. I thought maybe there's an editor from Germany who produces the lists you're looking for? If not, and you can create a detailed requirements doc outlining what you are looking for, I can see about implementing it for a one-time run and later maybe automating it. Benwing2 (talk) 00:41, 18 March 2023 (UTC)Reply
2. Periodical runs would have value applied to new or recently revised L2 sections. I doubt that filters and patrols can prevent various outrages from being inflicted on our entries, though maybe I underestimate our defenses.
3. Ullman is dead, Visviva has been inactive since 2022, but might be reachable. I hope he was the guy who did "Linkeration".
4. I can do that in a month or two.
Thanks. DCDuring (talk) 00:54, 18 March 2023 (UTC)Reply
3. See User:Visviva/Linkeration. DCDuring (talk) 02:03, 18 March 2023 (UTC)Reply
@DCDuring I see. It is looking for situations where A links to B using a relationship that "ought" to be reciprocated but isn't (synonym, antonym, related term, derived term reciprocated as related term?, homophone). I guess my question is, how important is this to worry about, vs. e.g. finding the non-junky "wanted" pages in a given language or finding orphaned pages? I can certainly implement this but each script takes some work and I'd like to do this in a priority order. Benwing2 (talk) 02:10, 18 March 2023 (UTC)Reply
Priorities are good, but very hard to decide on. Is quantity more important than quality? Are some errors more important than others? One class of priority items are those that have fundamental defects like inflection-line templates for a language other than that of the L2 in which they appear. Such defects lead to miscategorization, which makes them hard to find using searchbox searches to discover other defects, which are all too likely if there is a fundamental error. Generally, searching for and manually correcting picky errors leads one to entries that have multiple errors or, at least, weaknesses. In the English backbone, quality is now more important than quantity. Many quality problems need manual correction, using searches for defects likely to co-occur with multiple other, hard-to-detect defects. If defects need to be found using regex searches, then the "filters" (basic language and PoS categories, inflection-line templates) needed to allow them to run to completion are fundamentals that need priority. Sorry if this seems like a TLDR rant and isn't clear. DCDuring (talk) 08:34, 18 March 2023 (UTC)Reply
@DCDuring I do understand what you're saying and I agree esp. for English (and several other well-covered languages) we don't need more entries for ever-more-obscure words, but we need to improve the quality of existing entries. One thing I've been particularly meaning to focus on with English is pronunciation, which is often missing or inconsistent; but that takes a good deal of work, either to create a pronunciation module (which will require a lot of thought given how inconsistent English spelling is) or to use some free source for English pronunciations (when I looked a few years ago, there were two of them and both had significant issues). As for things like wrong-language inflection templates, these are actually easy to find by bot (and I suspect many of the other fundamental defects you're thinking of can also be found pretty easily by bot). So if you want I can start with this and generate a list of such wrong-language entries. Benwing2 (talk) 19:45, 18 March 2023 (UTC)Reply
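As a rough illustration of how wrong-language inflection-line templates could be found by bot, here is a minimal Python sketch. It is not Benwing2's script: the `LANG_CODES` table is a tiny hypothetical stand-in for the real language data, and the template pattern only covers simple names such as `fr-noun`.

```python
import re

# Hypothetical, heavily truncated mapping from L2 header name to language code.
LANG_CODES = {"English": "en", "French": "fr", "Russian": "ru"}

# Matches only level-2 headers such as "==English==", not "===Noun===".
L2_RE = re.compile(r"^==([^=\n]+)==[ \t]*$", re.M)

# Matches the start of simple headword templates such as "{{fr-noun".
HEAD_RE = re.compile(r"\{\{([a-z]{2,3})-(noun|verb|adj|adv)\b")


def mismatched_headwords(wikitext):
    """Return (language name, template name) pairs whose codes disagree."""
    problems = []
    # split() with one capturing group yields:
    # [preamble, name1, body1, name2, body2, ...]
    parts = L2_RE.split(wikitext)
    for name, body in zip(parts[1::2], parts[2::2]):
        expected = LANG_CODES.get(name.strip())
        if expected is None:
            continue  # unknown language: nothing to compare against
        for match in HEAD_RE.finditer(body):
            if match.group(1) != expected:
                problems.append((name.strip(), match.group(0)[2:]))
    return problems
```

Run over each page in a dump, this would yield a list of candidate entries for manual review; templates like {{head}} would need to be handled separately.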
Can't help on pronunciation. I'd like to see whether we have very many of those wrong-language inflection templates. I've stumbled across ten or so lately, which I've corrected. Maybe there aren't as many as I fear. If they are few, we can wait a year or two before doing another run. Also, there are some untemplated "form of"-type definitions, which are easy to find if they have literally "form of" in the text. With all searches for defective entries I usually start with correcting taxonomic entries and English organism-name entries before attacking English as a whole. DCDuring (talk) 21:04, 18 March 2023 (UTC)Reply

Implementation of removing horizontal rule separators

Hi, I am an editor from Chinese Wiktionary. As the community in English Wiktionary decided to remove the ---- separator, we want to follow up as well. I tried accomplishing this task with AWB, but it won't give the whole list of Special:AllPages, nor will it generate the list of entries with the separator. There are other more powerful tools (like Pywikibot), but I am not familiar with them. Would you mind sharing your code, or the approach your bot used? --TongcyDai (talk) 19:09, 18 March 2023 (UTC)Reply

@TongcyDai Hi. What I did essentially is to download the English Wiktionary dump from dumps.wikimedia.org and use some existing scripts I've written that make use of Pywikibot. My scripts are here: [4]. The first thing I did was this:
bzcat enwiktionary-20230301-pages-articles.xml.bz2 | python find_regex.py --stdin -e '(^.*\n)?^--+$(\n.*)?' --all --namespaces 0 > find_regex.enwiktionary-20230301-pages-articles.xml.bz2.separator.out.1
This looks through the dump file for pages matching the given regular expression.
Then I used this:
python rewrite.py --pagefile <(extract_pagename.sh < find_regex.enwiktionary-20230301-pages-articles.xml.bz2.separator.out.1) --from '\n+---+\n+==' --to '\n\n==' --from '\n+---+\n*\Z' --to '' --diff --comment 'remove horizontal rule separators per [[Wiktionary:Votes/2023-02/Removing the horizontal rule]]'
This does the actual change. There were about 478,000 pages needing changing so I actually did ten separate invocations of this command like this:
#!/bin/zsh

SAVE="--save"
cmd="/opt/local/bin/python rewrite.py --pagefile <(extract_pagename.sh < find_regex.enwiktionary-20230301-pages-articles.xml.bz2.separator.out.1) --from '\n+---+\n+==' --to '\n\n==' --from '\n+---+\n*\Z' --to '' --diff --comment 'remove horizontal rule separators per [[Wiktionary:Votes/2023-02/Removing the horizontal rule]]'"
sleep 0 && eval "$cmd $SAVE 1 25000 > rewrite.remove-horizontal-rule.1-25000.out.2" &
sleep 3 && eval "$cmd $SAVE 25001 75000 > rewrite.remove-horizontal-rule.25001-75000.out.1" &
sleep 6 && eval "$cmd $SAVE 75001 125000 > rewrite.remove-horizontal-rule.75001-125000.out.1" &
sleep 9 && eval "$cmd $SAVE 125001 175000 > rewrite.remove-horizontal-rule.125001-175000.out.1" &
sleep 12 && eval "$cmd $SAVE 175001 225000 > rewrite.remove-horizontal-rule.175001-225000.out.1" &
sleep 15 && eval "$cmd $SAVE 225001 275000 > rewrite.remove-horizontal-rule.225001-275000.out.1" &
sleep 18 && eval "$cmd $SAVE 275001 325000 > rewrite.remove-horizontal-rule.275001-325000.out.1" &
sleep 21 && eval "$cmd $SAVE 325001 375000 > rewrite.remove-horizontal-rule.325001-375000.out.1" &
sleep 24 && eval "$cmd $SAVE 375001 425000 > rewrite.remove-horizontal-rule.375001-425000.out.1" &
sleep 27 && eval "$cmd $SAVE 425001 478000 > rewrite.remove-horizontal-rule.425001-478000.out.1" &

wait
This may not be needed for the Chinese Wiktionary, which I assume is smaller. My scripts still use Python 2.7 and a somewhat old version of Pywikibot so you might have a bit of difficulty getting them running; you should also be able to write your own Pywikibot script, and I'm pretty sure Pywikibot already comes with a built-in script to do regex substitutions like my rewrite.py script, so you just have to figure out how to use it.
Benwing2 (talk) 20:43, 18 March 2023 (UTC)Reply
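For anyone writing their own script, the two substitutions passed to rewrite.py above can be sketched in plain Python as follows. The regular expressions are the ones quoted in the command; the function name is an assumption, and fetching and saving pages (e.g. with Pywikibot) is left out.

```python
import re


def remove_horizontal_rules(text):
    """Apply the two --from/--to substitutions quoted above to one page's text."""
    # A rule followed by the next language header: leave exactly one blank line.
    text = re.sub(r"\n+---+\n+==", "\n\n==", text)
    # A trailing rule at the very end of the page: drop it entirely.
    text = re.sub(r"\n+---+\n*\Z", "", text)
    return text
```

Note that the pattern `---+` requires at least three hyphens, so a stray `--` on a line by itself is left alone.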

Some horizontal rule separators are still there

Did your bot complete the removal of horizontal rule separators already? I'm asking because I still see some, e.g. iBhayi, i.p.v., monoloog, Baai, toepassing, knorrig, pretpark, du Plessis, tikfout, inkus, speelkaart. Thanks! tbm (talk) 07:31, 19 March 2023 (UTC)Reply

@tbm That's because my bot used the Mar 1 2023 dump to figure out which pages need fixing. Any pages where the separators were only added afterwards will still have them. When the Mar 20 dump comes out (in two days or so), I will do another run, which should eliminate the remaining ones. Benwing2 (talk) 07:32, 19 March 2023 (UTC)Reply
Thanks for the explanation! tbm (talk) 07:33, 19 March 2023 (UTC)Reply

Module:cel-verbs

Thanks for reminding me to use the preview-page-with-template button. Since you intervened in my vain attempts to fix a problem in this module, I'll explain.

What I was attempting to do (and obviously struggled with) is to get the suffixal ablaut in the present stems to work properly. The issue is that Rua hard-coded the thematic e/o-conjugation by default via having "stem_e" and "stem_o" variables, which obviously doesn't work when the nasal infix on laryngeal-final roots gets involved. Most of my troubles have been attempting to work within this hardcoding instead of removing it entirely. I have since un-hardcoded this after you intervened (but this effort added around 2000 chars to the code). — Ceso femmuin mbolgaig mbung, mellohi! (投稿) 23:28, 19 March 2023 (UTC)Reply

@Mellohi! I see, thanks for the message. Yeah you should work on removing the hard-coding, although it will probably take some work; Proto-Celtic verbs, like other old Indo-European languages, are very complicated. I have created several conjugation modules and they can run to several thousand lines when all variants are supported, even with clever coding. Benwing2 (talk) 23:37, 19 March 2023 (UTC)Reply
BTW I wouldn't worry about adding lines to the code; this is necessary in any case and we can work later on eliminating the redundancy. Benwing2 (talk) 23:38, 19 March 2023 (UTC)Reply

Context objects

Hiya - just as an FYI, I've created Module:User:Theknightwho/contexts for "context objects", which provide a straightforward way to manage complex combinations of context flags. In essence, it's a dressed-up version of bitwise operations, with a few convenience measures thrown in. Once you've given it a list of context names, they can be toggled/checked in various different ways. It also handles aggregate contexts (i.e. contexts as sets), and has a mechanism for adding further contexts/removing those which are no longer needed. There's also a way to save/load the context state, which remains compatible throughout the lifetime of the context object (even if certain contexts get added/removed in the meantime).

This was developed as part of my work on the wikitext parser, but I feel like it could probably come in handy elsewhere. Theknightwho (talk) 02:47, 21 March 2023 (UTC)Reply
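To give a feel for the idea, here is a minimal Python sketch of a bit-flag context object. It is not the API of Module:User:Theknightwho/contexts: the class name, the method names, and the rule of never reusing a bit after removal are all assumptions made for illustration.

```python
class Contexts:
    """Named context flags stored as bits of a single integer."""

    def __init__(self, names):
        self._bits = {}   # context name -> bit value
        self._next = 0    # next unused bit position (never reused)
        self.state = 0
        for name in names:
            self.add(name)

    def add(self, name):
        if name not in self._bits:
            self._bits[name] = 1 << self._next
            self._next += 1

    def remove(self, name):
        # The bit is retired rather than reused, so saved states stay valid.
        self.state &= ~self._bits.pop(name)

    def mask(self, *names):
        """Combine several contexts into one aggregate bit mask."""
        combined = 0
        for name in names:
            combined |= self._bits[name]
        return combined

    def set(self, *names):
        self.state |= self.mask(*names)

    def clear(self, *names):
        self.state &= ~self.mask(*names)

    def toggle(self, *names):
        self.state ^= self.mask(*names)

    def has_all(self, *names):
        wanted = self.mask(*names)
        return (self.state & wanted) == wanted

    def has_any(self, *names):
        return bool(self.state & self.mask(*names))

    def save(self):
        return self.state

    def load(self, saved):
        # Ignore bits belonging to contexts that have since been removed.
        known = 0
        for bit in self._bits.values():
            known |= bit
        self.state = saved & known
```

Typical use would be `c = Contexts(["template", "link"])`, then `c.set("template")` on entering a template and `c.has_any("template", "link")` to test an aggregate.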

@Theknightwho Cool, I will take a look. BTW are you up early or late? :-) Seems like it's around 3am in England now. As for parsing, I didn't respond to your Grease Pit comment from a few days ago but I agree that the path forward is probably to write a parser that punts the edge cases to frame:preprocess(). As long as the edge case detector is robust (which shouldn't be too hard to implement), this should work quite well, as the edge cases (such as nested templates and mismatched/misplaced brackets or braces) should occur rarely. Benwing2 (talk) 03:18, 21 March 2023 (UTC)Reply
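A minimal sketch of the kind of edge-case detector meant here, in Python for brevity (the real thing would be Lua, handing flagged input to frame:preprocess()). The function name and the exact set of conditions are assumptions; it flags nested templates, unbalanced double braces, and any triple braces.

```python
def needs_fallback(text):
    """Return True if the wikitext should be punted to the full preprocessor."""
    # Triple braces (template parameters, or a nested template closing
    # together with its parent) are always treated as an edge case.
    if "{{{" in text or "}}}" in text:
        return True
    depth = 0
    i = 0
    while i < len(text):
        pair = text[i:i + 2]
        if pair == "{{":
            depth += 1
            if depth > 1:
                return True   # nested template
            i += 2
        elif pair == "}}":
            depth -= 1
            if depth < 0:
                return True   # closing braces with nothing open
            i += 2
        else:
            i += 1
    return depth != 0         # unclosed template
```

A simple call such as `{{l|en|word}}` passes through, while `{{l|en|{{m|en|x}}}}` or `{{l|en|word` is flagged.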
@Benwing2 I mostly work from home and have a lot of control over my own schedule, so I guess it's very late! To be honest, I get most of my best work done at night, as there's nothing to distract me.
Re the nested templates, I think there are actually quite a lot of them lurking in other templates, which is where things get a bit tricky. I've made some decent progress on transpiling a general purpose parser, but I'm honestly unsure what the performance is going to be like. However, it'll be easier to know what I can cut out once it's more complete. The very basic one I made a couple of months ago showed massive memory gains very quickly, but that's also because it was very compact (and fragile): on a page with a few link and head templates, the memory usage went down from 8.32MB to 5.86MB with loading time increasing from 0.266 to 0.281 seconds, going from what I wrote down at the time. I also noticed that the memory savings were accelerating as the page got more complex, while the time increases were decelerating. Theknightwho (talk) 04:06, 21 March 2023 (UTC)Reply
@Theknightwho Yeah I am also a night owl, although more recently I've tried to avoid staying up past 2-3am. It's great that even a simple parser showed significant memory gains. I still think though before you go down the path of implementing nested templates, see how often these actually occur; and even if you handle them, I wouldn't bother trying to handle more than one level. Because of wiki syntax weirdness and a wiki parser where (AFAIK) the "spec" is simply how the code runs, things can get really complex really quick when you have nesting. Benwing2 (talk) 04:16, 21 March 2023 (UTC)Reply

Bad bot edit

Probably a one-off, but just FYI: https://en.wiktionary.org/w/index.php?title=tusk&diff=prev&oldid=72178172 JeffDoozan (talk) 14:45, 22 March 2023 (UTC)Reply

@JeffDoozan Thanks, yeah my script to move synonyms inline apparently doesn't work perfectly in the presence of HTML comments. Fixed manually. Benwing2 (talk) 19:17, 22 March 2023 (UTC)Reply

Updating module message.

The things I don't know:

  1. Logic of module
  2. Where message is displayed
  3. Why we need taxon "rank" (or other type) in the message
  4. How to pass a parameter from template to message

With this level of ignorance I can't responsibly edit the module. DCDuring (talk) 11:55, 28 March 2023 (UTC)Reply

@DCDuring OK, my apologies and thanks for the message. An example is here: Category:Entries using missing taxonomic name (group) The message has two parts, a "description" (the first line) and an "additional" (the second line). Can you simply write out what you think the text of the category should be, including the taxon type if you think it should be present? For example, maybe the message should read this:
Entries that link to wikispecies because there is no corresponding Wiktionary entry for the taxonomic name group in the template {{taxlink}}.
instead of this:
Entries that link to wikispecies because there is no corresponding Wiktionary entry for the taxonomic name in the template {{taxlink}}.

Benwing2 (talk) 18:16, 28 March 2023 (UTC)Reply

Entries that link to Wikispecies because there is no corresponding Wiktionary definition for it as a taxonomic group.
The underlining highlights how my wording differs from yours. DCDuring (talk) 18:28, 28 March 2023 (UTC)Reply

Polish Adjective Module

I'd like to add an obsolete form to the instrumental plural (feminine/neuter nouns could take -emi in hard stems or -iemi in soft), I'm not sure what is needed to do that. (I'd also like to eventually get acceleration on adjectives and nouns, but the code is so spaghetti-like I'm not sure that'd be easy.) Up to the task? Vininn126 (talk) 21:26, 11 April 2023 (UTC)Reply

I also realized that our change to olddat= should be bot-changed to 1 instead of the absorbed form (some of the masculine virile nominative forms are set as the old dative). Vininn126 (talk) 21:26, 11 April 2023 (UTC)Reply
@Vininn126 I can definitely take a look after some more work on Czech nouns. Probably we should just rewrite the adjective module; the Czech adjective module only took a couple of days to write and hopefully Polish isn't much more complex. Benwing2 (talk) 21:39, 11 April 2023 (UTC)Reply
Thanks. Plus the module is... done in a way, at least. Maybe that will speed things up. We can probably even rename it to something simpler. Vininn126 (talk) 21:40, 11 April 2023 (UTC)Reply

Declension of Russian ха́нец

Hiya - Russian ха́нец (xánec) doesn't seem to have its declension accounted for by {{ru-noun-table}}, so at the moment it all has to be entered manually. It's a reducible stem with an emergent ь. Also pinging @Atitarev. Theknightwho (talk) 23:09, 17 April 2023 (UTC)Reply

@Theknightwho: This word is irregular, in terms of it having the "ь" in the stem. By Palladius System rules (hàn) is transliterated as хань (xanʹ), hence ха́ньцы (xánʹcy). The singular form must be back-formed from plural. @Benwing2: I think a manual input is required, at least for some forms, not sure. I don't know another word with such declension.
Entry created by @Tetromino. Anatoli T. (обсудить/вклад) 23:35, 17 April 2023 (UTC)Reply
Actually, тайва́нец (tajvánec, Taiwanese male person) belongs in the same boat. E.g. nominative plural can be both тайва́нцы (tajváncy) or тайва́ньцы (tajvánʹcy). The reason is the same. 臺灣/台湾 (Táiwān) is Тайва́нь (Tajvánʹ) per Palladius Cyrillisation system. Anatoli T. (обсудить/вклад) 23:40, 17 April 2023 (UTC)Reply
@Atitarev Thanks. If this is down to Palladius rules, it would apply to (edit: words backformed from) any Mandarin borrowing ending in (pinyin) "n", which includes a lot of place names such as Сиань (Sianʹ, Xi'an). I know the system, as I added Palladius to {{zh-pron}}. Theknightwho (talk) 23:43, 17 April 2023 (UTC)Reply
@Theknightwho: Oh, thanks. I didn't pay attention. Now, I can see that Module:cmn-pron uses Palladius.
тайва́нец (tajvánec) can be created regularly (with -нц-) but for cases with "ь" (before ц), a manual override would currently be required. Anatoli T. (обсудить/вклад) 23:49, 17 April 2023 (UTC)Reply
@Atitarev, Theknightwho This issue came up before and I remember either implementing something for it or saying I would implement something. Let me see whether I actually did anything. Benwing2 (talk) 02:04, 18 April 2023 (UTC)Reply
@Benwing2 Thanks for implementing this - it works great on ха́нец (xánec). Should we add a category for these? It's sufficiently weird/interesting that it's probably worth it, especially given that it's not predictable. Theknightwho (talk) 23:26, 18 April 2023 (UTC)Reply

Replacement of unnecessary redirects and templates

Here is another batch:

Thank you. — Sgconlaw (talk) 13:04, 22 April 2023 (UTC)Reply

@Sgconlaw Done, apologies for the delay as I was overseas. Benwing2 (talk) 17:10, 27 April 2023 (UTC)Reply
No worries at all. Thanks! — Sgconlaw (talk) 17:18, 27 April 2023 (UTC)Reply

{{fa-IPA}}

Hello, have there been any updates to the template? Thanks in advance.—Saranamd (talk) 06:10, 24 April 2023 (UTC)Reply

@Saranamd Not recently; I got partway through and then shifted to Czech. I will get back to Persian soon. Benwing2 (talk) 06:40, 24 April 2023 (UTC)Reply

More Latin vowel length changes

Hi, I made a few more changes to some Latin vowel lengths in words with "hidden" quantities: hirtus, hirsutus, luxus, luctor but I still need to get all inflected forms and derivatives updated. I'd appreciate any help you can spare. Also, I saw your comment to Brutal Russian about Bennett and changes from that source. I think while it's a great place to start, there is definitely more recent work that has been done that in some cases contradicts Bennett's conclusions; I've added citations to all of the above entries saying which authors give which lengths. Even if it isn't immediately clear why, Brutal Russian probably had some reason for each edit, so I'd be cautious when changing any entries back; I think it would be a good idea to first check De Vaan, and if there's an entry for it, Wartburg's Französisches Etymologisches Wörterbuch and Buchi and Schweickard's Dictionnaire Étymologique Roman as well to get more recent perspectives on vowel lengths. Urszag (talk) 15:09, 29 April 2023 (UTC)Reply

@Urszag Thanks, I'll work on this soon. I haven't changed any entries back but I would definitely like to hear from User:Brutal Russian; when there is any question about hidden length it's important to add sources so I'm glad you are doing this. Benwing2 (talk) 15:15, 29 April 2023 (UTC)Reply
@Urszag You pinged me on ullus about footnotes. There's no current way of specifying footnote symbols or numbers in template arguments. Probably the easiest way of adding this is to use Module:table tools when displaying the forms; I'll see about implementing it. Benwing2 (talk) 21:53, 30 April 2023 (UTC)Reply
Oh, thank you!--Urszag (talk) 23:49, 30 April 2023 (UTC)Reply

Replacement of unnecessary redirects and templates

Hi, a new batch:

Thank you. — Sgconlaw (talk) 15:51, 7 May 2023 (UTC)Reply

@Sgconlaw Should be done. I also renamed {{RQ:Fitzgerald Gatsby}} → {{RQ:Fitzgerald Great Gatsby|year=1953}}. Apologies for the delay. Benwing2 (talk) 21:59, 13 May 2023 (UTC)Reply
No worries. Thanks! — Sgconlaw (talk) 22:16, 13 May 2023 (UTC)Reply

Russian transliteration scraper

Also tagging @Atitarev. I've developed a Russian transliteration scraper which is able to grab manual transliterations from headword templates, meaning that they only need to be entered in one place. That means, for example, that {{l|ru|атеисти́ческий}} would output атеисти́ческий (atɛistíčeskij) instead of атеисти́ческий (ateistíčeskij). It's currently in Module:User:Theknightwho/ru-translit as export.scrape_tr. Here is my idea of how it should work:

  • It checks for the exact stress pattern, and ignores anything else, which reduces the chance of ambiguity if there are multiple headwords with different stress patterns available.
  • If no stress is given and the term includes ё, then we treat it like a stress accent and check for an exact match only.
  • Otherwise, if no stress is given, then we grab all available possibilities and remove the stress accents, and then list them. That avoids terms like атеистический falling back on automatic transliteration, which would be wrong, because the entry shows that atɛističeskij is the only valid transliteration. However, in other cases where multiple transliterations are available, then either that's correct (e.g. terms like бел (bɛl, bel) where both are valid), or it's a prompt that the user should be more specific.
  • In some cases, transliteration is ambiguous even when the stress is given. I suggest that we give multiple transliterations as a comma-separated list in those situations, too: e.g. гэ́канье (hɛ́kanʹje, gɛ́kanʹje), for the exact same reason.
  • However, I think there should be an exception for alternative spellings where е is substituted for ё. Unless it's the only one given, they shouldn't be taken into account: e.g. берет (beret) and not берет (berjót), even though the latter is the only unstressed headword on the page. The reason being that these are marginal terms that rarely need to be linked to.
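The first three rules in the list above can be sketched in Python as follows. This is only an illustration, not the code in Module:User:Theknightwho/ru-translit: the input shape (a list of (headword, transliteration) pairs already scraped from the entry) and the helper names are assumptions, and the exception for е/ё alternative spellings is not handled. An empty result means the caller falls back on automatic transliteration.

```python
import unicodedata

ACUTE = "\u0301"  # combining acute accent, used as the stress mark
YO = "\u0451"     # Cyrillic ё


def nfc(s):
    return unicodedata.normalize("NFC", s)


def strip_stress(s):
    """Remove acute accents only, keeping other diacritics (ё, й, č, ...)."""
    decomposed = unicodedata.normalize("NFD", s)
    return nfc(decomposed.replace(ACUTE, ""))


def has_stress(s):
    return ACUTE in unicodedata.normalize("NFD", s)


def unique(items):
    """Drop duplicates while keeping the original order."""
    seen = []
    for item in items:
        if item not in seen:
            seen.append(item)
    return seen


def scrape_tr(term, headwords):
    """Return the manual transliterations that apply to `term`."""
    term = nfc(term)
    if has_stress(term) or YO in term:
        # Stress given (or ё acting as a stress mark): exact match only.
        return unique(nfc(tr) for head, tr in headwords if nfc(head) == term)
    # No stress given: collect every headword that matches once stress is
    # ignored, and list the transliterations without stress marks.
    return unique(
        strip_stress(tr) for head, tr in headwords if strip_stress(head) == term
    )
```

With a single headword атеисти́ческий (atɛistíčeskij), the unstressed input атеистический yields `["atɛističeskij"]`; with two headwords for бел, both transliterations are returned.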

I've done my best to keep things as efficient as possible, and I haven't noticed a significant performance hit. For comparison, some of the Chinese entries scrape over 2,000 pages in a similar way. Do let me know your thoughts. Theknightwho (talk) 00:48, 8 May 2023 (UTC)Reply

@Theknightwho: Sounds good.
Will it work on inflected forms as well or only on exact same forms? E.g. атеисти́ческое (nom. and acc. neuter, sg.) is currently undefined.
гэ́канье differ by senses. "hɛ́kanʹje" and "gɛ́kanʹje" (or the verb it is derived from) are actually opposite perspectives of the pronunciation of letter г (g) in Russian (standard/regional). Overall, I am not sure, if pairs like |head2= and |tr2= are better than comma-separated transliteration2 but the former looks neater.
It's alright to fail the transliteration if the ambiguity exists. Anatoli T. (обсудить/вклад) 01:00, 8 May 2023 (UTC)Reply
@Theknightwho Overall this sounds good but I need to think about some of the special cases. For example, how are multiword terms handled? Also, the current transliteration module has special cases to handle ё occurring along with a stress (трёхэта́жный (trjoxɛtážnyj)) and even occurrences of two ё in a single word (трёхколёсный (trjoxkoljósnyj)). Do you handle these correctly? I will take a look at your code when I have a chance. Benwing2 (talk) 01:06, 8 May 2023 (UTC)Reply
@Atitarev It'll work if the page has been created, but otherwise it's forced to fall back on automatic transliteration. In theory it might be possible, but it would involve reversing the inflection algorithm, and I doubt it's possible to do that unambiguously in a way that's efficient enough. Inflections aren't linked very often, though, and if you notice that the scrape isn't working then it's simple to just create the inflection page.
I would rather avoid transliteration fails, as unlike Chinese they will be rare, meaning they're much more likely to confuse the user. Giving multiple possibilities feels like a better hint to the user that they should clarify, and in some cases it's simply better to show that more than one transliteration is possible. We should probably cap it at two, though, if we do take that approach.
@Benwing2 Multiword terms are handled differently depending on the headword template. For {{ru-verb}}, {{head}} etc., it's simply a case of grabbing what's in the transliteration parameter. If there are multiple possibilities, they're already given as a comma-separated list. This is cross-checked against the relevant head parameter first, so that only the correct stress pattern gets used. For {{ru-noun+}} and {{ru-proper noun+}}, the template parses the list of arguments, creating a list of all possible combinations (for the cases where multiple transliterations are provided). Manual transliterations are substituted into the text, and then that text is fed through the main transliteration function. This works on the assumption that it's safe to input Latin text to the translit function, but I need to confirm with you that that's actually the case. As an example, {{m|ru|арифмети́ческая прогре́ссия}} would give арифмети́ческая прогре́ссия (arifmetíčeskaja progréssija, arifmetíčeskaja progrɛ́ssija) from the input {{ru-noun+|[[арифметический|арифмети́ческая]]|+|_|[[прогре́ссия]]|or|[[прогре́ссия]]//progrɛ́ssija}}, which matches what's shown on the headword line.
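The "all possible combinations" step for multiword terms amounts to a Cartesian product over the per-word options. A minimal Python sketch, assuming the per-word transliterations have already been worked out (the function name and input shape are hypothetical):

```python
from itertools import product


def combine_translits(word_options, sep=" "):
    """Join one transliteration per word, for every possible combination.

    `word_options` is a list with one entry per word; each entry is the list
    of transliterations available for that word.
    """
    return [sep.join(combo) for combo in product(*word_options)]
```

For the example above, `combine_translits([["arifmetíčeskaja"], ["progréssija", "progrɛ́ssija"]])` gives the two transliterations shown on the headword line, in the same order.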
For cases like трёхэта́жный (trjoxɛtážnyj), I want to double-check how these should be handled. Using it as an example:
  • If трёхэта́жный and трёхэтажный were separate headwords on the page, then not including the stress mark would output the transliteration for трёхэтажный, and it would not be treated as a stressless version of трёхэта́жный. e.g. {{m|ru|трёхэта́жный}}: трёхэта́жный (trjoxɛtážnyj); {{m|ru|трёхэтажный}}: трёхэтажный (trjóxɛtažnyj).
  • However, if трёхэта́жный is the only headword (as is the case), then not entering the stress mark means it should be treated as a stressless version of трёхэта́жный: {{m|ru|трёхэтажный}}: трёхэтажный (trjoxɛtažnyj). This avoids misleading the user with the wrong stress.
The overall point is making sure that the output matches the transliterations as given on the page, while making sure to include/exclude stress marks depending on whether the user has included them.
I still need to implement the removal of stress marks for unstressed input, as I wanted to check with both of you first as to how it should be handled. I'll let you know once that's done (which will probably be tomorrow). Theknightwho (talk) 01:35, 8 May 2023 (UTC)Reply
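As a rough illustration of the stress-mark handling described above, here is a minimal Python sketch. It is not the scraper's actual code: the function names and the `page_translits` mapping (headword as given on the page, mapped to its transliteration) are hypothetical, and it assumes the combining acute accent is used only as a stress mark.

```python
import unicodedata
from typing import Optional

ACUTE = "\u0301"  # combining acute accent, assumed to be used only as a stress mark


def nfc(text: str) -> str:
    return unicodedata.normalize("NFC", text)


def strip_stress(text: str) -> str:
    """Remove acute stress marks but keep other diacritics (ё, й, ž).

    Caveat: letters such as Macedonian ѓ also decompose to a base letter
    plus U+0301, so this is only safe where the acute always marks stress.
    """
    decomposed = unicodedata.normalize("NFD", text)
    return nfc(decomposed.replace(ACUTE, ""))


def scraped_translit(user_input: str, page_translits: dict) -> Optional[str]:
    """Pick the transliteration matching the input (hypothetical helper)."""
    wanted = nfc(user_input)
    heads = {nfc(head): tr for head, tr in page_translits.items()}
    if wanted in heads:
        # The input is itself a headword on the page: use it as given.
        return heads[wanted]
    if ACUTE in unicodedata.normalize("NFD", wanted):
        return None  # stressed input that matches no headword
    # Unstressed input: treat it as a stressless version of a stressed headword.
    candidates = {
        strip_stress(tr) for head, tr in heads.items() if strip_stress(head) == wanted
    }
    return candidates.pop() if len(candidates) == 1 else None


page = {"трёхэта́жный": "trjoxɛtážnyj"}
print(scraped_translit("трёхэта́жный", page))
print(scraped_translit("трёхэтажный", page))
```

With only the stressed headword on the page, the unstressed input falls through to the last branch and the stress is stripped from the transliteration as well. If both spellings were headwords, the exact-match branch would return the second headword's own transliteration unchanged.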
@Theknightwho Can you explain more why you think showing no transliteration ("transliteration fail") is confusing? It seems to me it may be better to show nothing if there's ambiguity (not simply a single headword with two possibilities, but two separate headwords with different stresses), or show some explicit indication of ambiguity, not just put two translits. In any case the page should be added to a tracking category.
As for {{head}}, there may be multiple transliterations in |tr2=, |tr3=, etc. so we need to handle that.
As for Latin in the translit function, yes that should be safe.
It sounds like you're doing the right thing for трёхэта́жный (trjoxɛtážnyj). In any case I think that only words with the prefixes трёх- and четырёх- work this way, although I'm not completely sure. Benwing2 (talk) 02:33, 8 May 2023 (UTC)Reply
@Benwing2 At the moment it cross-checks head2 with tr2 and so on. I've realised I haven't accounted for when the number of tr params exceeds the number of head params, so I'll need to add something to handle that. I'd be very surprised if more than a couple of pages actually have that (as it could only happen on pages with ё), but there's no harm in handling it.
The reason why I think showing no transliteration is confusing is because it's not something that Russian editors are used to, and it would only happen for a very small number of terms. I think it's something that we could do, but only in situations when two identical headwords have different (sets of) transliterations, as opposed to where two transliterations are given for the same headword (where we can just show both): the former changes depending on the sense, while the latter doesn't. Because of its rarity, I think my preference would be for it to show [transliteration needed], in the same way headword templates do. Theknightwho (talk) 03:02, 8 May 2023 (UTC)Reply
@Theknightwho I think showing [transliteration needed] in place of the transliterations is fine, or maybe [ambiguous transliterations; manual transliteration needed] or something. Benwing2 (talk) 08:37, 8 May 2023 (UTC)Reply
Good idea. I've also found a couple of terms with four transliterations for the same headword: конгрессме́н (kongressmén, kongressmɛ́n, kongrɛssmén, kongrɛssmɛ́n) and сенеша́ль (senešálʹ, sɛnɛšálʹ, sɛnešálʹ, senɛšálʹ), which both seem a bit excessive. These are both single-word terms, but things can quickly get out of hand if a multiword term has two or three words like this, as the headword template gives every combination. Should we introduce a cap for the scraper? Theknightwho (talk) 13:29, 8 May 2023 (UTC)Reply
@Theknightwho For inflections I have the concept of a "variant code" that helps ensure that e.g. if the instrumental singular of a noun has either -ой or -ою and the corresponding adjective has the same variation, you get the -ой's lining up and the -ою's lining up rather than all combinations; but I'm not sure if that's feasible for transliterations. But normally you shouldn't see cases like this; it would seem extremely rare to have two such multi-translit terms in the same expression. However, just in case it might make sense to have a cap of 4. Benwing2 (talk) 19:50, 8 May 2023 (UTC)Reply
That makes sense. I’ve had the thought that this is more likely when you account for usage examples as well, so we should probably do something to prevent things spiralling. It should be relatively straightforward to match alternation between e and ɛ, which will catch most instances of exponential growth in outputs.
In practical terms I will work on the test cases today, as it would be good to have a lot of them before rolling this out. Theknightwho (talk) 14:37, 13 May 2023 (UTC)Reply
@Theknightwho That sounds like a good idea. Benwing2 (talk) 21:44, 13 May 2023 (UTC)Reply
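To make the combination growth and the proposed cap concrete, here is a small Python sketch. It assumes the per-word transliterations have already been extracted; `combine_translits` and its `cap` parameter are hypothetical names, and the default of 4 simply follows the figure suggested above.

```python
from itertools import islice, product


def combine_translits(per_word, cap=4):
    """Join every combination of per-word transliterations, keeping at most `cap`."""
    combos = (" ".join(parts) for parts in product(*per_word))
    return list(islice(combos, cap))


# One transliteration for the first word, two for the second.
per_word = [
    ["arifmetíčeskaja"],
    ["progréssija", "progrɛ́ssija"],
]
for combo in combine_translits(per_word):
    print(combo)

# Two words with four transliterations each would give 16 combinations uncapped.
four_each = [["a1", "a2", "a3", "a4"], ["b1", "b2", "b3", "b4"]]
print(len(combine_translits(four_each, cap=100)), len(combine_translits(four_each)))
```

A plain cap truncates arbitrarily. Lining up the e/ɛ alternation across words, as suggested above, would instead cut the number of combinations at the source.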

autoelegirse

autoeligió is in CAT:E with an unfixable module error, but that seems to be just a symptom of the main problem: acceleration fails for a large part of the paradigm at autoelegirse: pretty much all of the finite forms in the inflection table are redlinks. elegir doesn't have this problem. My only guess is that it might have something to do with the lack of an entry for autoelegir (I'm not sure if we would want one, since the object is always the same as the subject, but I don't normally edit Spanish entries).

Not that this very narrow edge case needs to be fixed immediately, but deleting autoeligió will take it off the radar and I wanted to make someone aware of the issue before I do that. Chuck Entz (talk) 16:00, 13 May 2023 (UTC)Reply

@Chuck Entz It looks like the module needed a little help to know how to conjugate the verb (it doesn't look up conjugations on other pages). Benwing2 (talk) 19:33, 13 May 2023 (UTC)Reply

Reflexive verbs

Could you also run that script on Old Polish, modern Polish, Silesian, and Kashubian to switch reflexive to reflexive-PRONOUN? I've updated their respective labels data modules to include it. Vininn126 (talk) 11:25, 14 May 2023 (UTC)Reply

@Vininn126 How do I know which pronoun to use? (And can you enumerate the possible pronouns for each language?) Benwing2 (talk) 17:30, 14 May 2023 (UTC)Reply
It should be in the module, hopefully. It should be one since these pronouns are more like unchanging particles. Vininn126 (talk) 17:39, 14 May 2023 (UTC)Reply
@Vininn126 I guess what I mean is, e.g. Czech has both se (accusative) and si (dative); similarly for Bulgarian. Does any such thing exist for any of the Lechitic languages? Benwing2 (talk) 18:06, 14 May 2023 (UTC)Reply
Yes, however these are generally lemmatized separately, see wmawiać sobie or radzić sobie. This is how WSJP does it. I think there might be one or two with (reflexive, with sobie) or some variant thereof. This should only be in Polish, no other Lechitic language. Vininn126 (talk) 18:15, 14 May 2023 (UTC)
@Vininn126 I am currently pushing changes for Polish; will do the others shortly. Benwing2 (talk) 04:38, 18 May 2023 (UTC)Reply

Changes to the headword module.

Hello. Some edit to Module:headword is probably causing the pagename to be reproduced over and over again when a page has more than one transliteration. Case in point: राज्य and पक्ति where |tr2= and |tr3= are used, causing the headword to appear multiple times. It was of course not like this before. Could you look into this issue? Thanks. -- 𝓑𝓱𝓪𝓰𝓪𝓭𝓪𝓽𝓽𝓪(𝓽𝓪𝓵𝓴) 04:33, 15 May 2023 (UTC)Reply

@Bhagadatta Do you know how long this has been like this? Benwing2 (talk) 04:35, 15 May 2023 (UTC)Reply
For at least a month, as I noticed it in April. -- 𝓑𝓱𝓪𝓰𝓪𝓭𝓪𝓽𝓽𝓪(𝓽𝓪𝓵𝓴) 04:46, 15 May 2023 (UTC)
@Bhagadatta OK, this may be due to my headword rewrite in March, which generally aligns headwords and the corresponding translit. I'll add a special case so if there's only one headword and multiple translits, the headword will display once. But I'm not sure what to do if e.g. there are 2 distinct headwords and 3 translits, as if I only display two headwords, it will be ambiguous which translit goes with which headword. Benwing2 (talk) 05:24, 15 May 2023 (UTC)Reply
Okay, as I understand it, in case of multiple headwords, |tr1 takes care of headword 1, |tr2 takes care of headword 2 and so on, with there being an uncertainty with regards to what the module will display if there is a third translit but no third headword. Looks like this issue was discussed as early as 2014.[5]
Could a parameter be added so that a user can enter (for instance) |h1tr1, |h1tr2 and so on for all transliterations of the first headword and |h2tr1 and so on for the second headword? Is such a change feasible? -- 𝓑𝓱𝓪𝓰𝓪𝓭𝓪𝓽𝓽𝓪(𝓽𝓪𝓵𝓴) 05:45, 15 May 2023 (UTC)Reply
Alternatively users could manually enter |headN=[TERM] corresponding to the transliterations in case the headwords are distinct. Svartava (talk) 05:48, 15 May 2023 (UTC)Reply
@Bhagadatta I don't think these extra params are needed, because the module can look for adjacent headwords that are the same and collapse them. The issue however is how to display these; currently we display all headwords together followed by all translits, which makes for difficulties in the situation described above. We'd have to totally restructure the display and somehow show headwords along with corresponding translits; this is not conceptually hard to do but would require some discussion and consensus. Benwing2 (talk) 05:49, 15 May 2023 (UTC)Reply
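The collapsing of adjacent identical headwords could look roughly like this in Python. This is only a sketch of the idea, not Module:headword's logic: `group_headwords` is a hypothetical name, the inputs are placeholder strings, and it assumes the Nth transliteration belongs to the Nth head parameter, with the head repeated when a headword has several transliterations.

```python
from itertools import groupby


def group_headwords(heads, translits):
    """Pair head N with translit N, then merge runs of identical heads."""
    if len(heads) != len(translits):
        raise ValueError("expected one transliteration per head parameter")
    pairs = zip(heads, translits)
    return [
        (head, [tr for _, tr in group])
        for head, group in groupby(pairs, key=lambda pair: pair[0])
    ]


# Placeholder data: the first headword has two transliterations, the second has one.
print(group_headwords(["HEAD-A", "HEAD-A", "HEAD-B"], ["tr-a1", "tr-a2", "tr-b1"]))
```

Because each headword is kept together with its own transliterations, the grouping does not settle how the headword line should be displayed. That is still the part needing discussion.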
Addressing just Benwing2's display concern: Does differential punctuation help? Eg, semi-colon between the sets of transliterations for headwords, commas within the sets. When I do this kind of thing for hyponyms of taxonomic names with two ranks (eg, tribe and genus) of taxa being displayed and attempting to show the membership of the genera in their tribes, I find that the punctuation difference is a little hard to notice. Could it be emboldened? DCDuring (talk) 17:08, 15 May 2023 (UTC)Reply
@DCDuring Yes, we can change the fonts and punctuation of the headword line. If you have the energy, maybe you could create a few mockups, and then I'll bring this to the Beer Parlour. Benwing2 (talk) 04:37, 18 May 2023 (UTC)Reply
Can you point me to an example of two headwords and three or more transliterations? DCDuring (talk) 17:58, 18 May 2023 (UTC)Reply
To illustrate "enhanced" semi-colons to group items, see under Hyponyms at Amygdaloideae. The semicolons are both emboldened and embiggened. Alternative approaches would have something like " - genera in Sorbarieae" after each group of members of a tribe and/or put each group of genera on a separate line, or place parentheses or other bracketing punctuation around each group.
In my case, I could also put the type genus for each tribe first in each grouping of genera, the tribe name almost always being derived from the type genus. But, in this case, the type genus of Amygdaleae is its sole genus Prunus. DCDuring (talk) 18:33, 18 May 2023 (UTC)Reply
@DCDuring Hmm. The boldfaced and upsized semicolons aren't so noticeable to me and when I look closely at them they look a bit strange; I wonder if it would be better to use oversize slashes with a space on each side. As for headword lines with multiple headwords, each with multiple translits, I can't think of an example but I'm almost sure there are Russian examples with two or more possible stresses, each of which can have a palatalized or nonpalatalized consonant before /e/. User:Atitarev, can you think of such an example? Benwing2 (talk) 19:39, 18 May 2023 (UTC)Reply
деоккупация, деактивированный, деактивировать. Different parts of speech are implemented differently, and their input is somewhat different.
The term роженица has multiple stresses.
Check also Arabic بروليتاريا, which looks different. There are also words with multiple vocalisations, which look different still, but I can't think of any ATM. Anatoli T. (обсудить/вклад) 23:14, 18 May 2023 (UTC)
I think you're after something like роженица in this discussion. Anatoli T. (обсудить/вклад) 23:21, 18 May 2023 (UTC)Reply
It looks like there is at present a one-to-one relationship of stressed-headword-form to transliteration in the case of the entry for роженица (roženica). I thought we were looking for how to handle the situation where there are multiple transliterations for the same stressed-headword-form. The inflection lines already seem too complicated. Is there evidence that one of the stress patterns is significantly more common? Perhaps only that one could appear on the inflection line, with the others concealed under a show-hide bar? With a simpler inflection line the complication of the palatalized/nonpalatalized consonant might not be too much for the inflection line. DCDuring (talk) 00:44, 19 May 2023 (UTC)
@DCDuring: деоккупа́ция (deokkupácija) is an example of the same stressed-headword-form. The transliterations are comma-separated, which may be imperfect but what's better?
What kind of evidence do you require for роженица (roženica)? The most common pronunciation ро́женица (róženica) is proscribed. Anatoli T. (обсудить/вклад) 01:13, 19 May 2023 (UTC)Reply
I'm losing track of what the problem is. DCDuring (talk) 01:35, 19 May 2023 (UTC)Reply
@DCDuring: To be honest, me too. @Bhagadatta complained about the display issues. Currently, I see no issues at राज्य (rājya). Multiple transliterations look fine to me. I actually prefer "or" to other delimiters. Anatoli T. (обсудить/вклад) 02:23, 19 May 2023 (UTC)
The biggest problem from my non-Russian perspective is that some Russian inflection lines seem too busy, but what do I know? The problem I started out thinking I was addressing seems relatively rare, but is made worse by the busy-inflection-line problem. DCDuring (talk) 12:15, 19 May 2023 (UTC)Reply
I was looking for a minimal change. I have not implemented what I did at Amygdaloideae elsewhere, because I was not happy with it. It is asking a lot for users to notice and ascribe meaning to this kind of unconventional extension of the interpretation of a semicolon vs. a comma. People have enough trouble with the simplest nesting: lists, separated by semicolons, of lists of comma-separated items even where the hierarchical structure does not require reference to a list on a different line. We can forget about my problem. I'll try to find some way to exploit the nature of the content to resolve it - or I'll stop trying to overload the entry with data. DCDuring (talk) 00:56, 19 May 2023 (UTC)
@DCDuring Those semicolons don't look great to me, and they also make the wikitext absolutely awful. Are you really sure that's the best approach? Theknightwho (talk) 17:01, 19 May 2023 (UTC)Reply
We could clean up the wikitext with a template (or by other means?). I assume the HTML would still be a mess, unless CSS could save the day. If I really were sure it was a good idea, I'd have implemented it. DCDuring (talk) 14:25, 20 May 2023 (UTC)
See Scombrinae#Hyponyms for another approach to the two-level grouping problem. I'm not sure about this one either. DCDuring (talk) 18:17, 21 May 2023 (UTC)Reply
@DCDuring Seems a bit better. Note that I implemented two-level grouping in synonyms, antonyms and the like using semicolons for the outer grouping and commas for the inner grouping; but in that case I think it might be a bit clearer. Benwing2 (talk) 19:12, 21 May 2023 (UTC)Reply
In the end it is a tradeoff between space and user convenience. It would be easy to argue that the two-level hyponym content overloads the user, the formatting problem just highlighting the overload problem. I would need to come up with some criterion about Hyponyms to decide whether to have two levels or one and, if one, which one. I think I have more degrees of freedom than you do. DCDuring (talk) 21:15, 21 May 2023 (UTC)Reply
@DCDuring Yes, agreed. The reason I implemented the semicolon in synonyms etc. is that people often listed a whole load of synonyms of different sorts in the same line, and I wanted to make it easier to logically separate them and in particular to allow people to specify a single qualifier for all synonyms in the group and make it fairly clear that the qualifier applied to the whole group. An alternative in the case of synonyms is to list the different groups on different lines; maybe you can do that with Hyponyms as well. Benwing2 (talk) 22:30, 21 May 2023 (UTC)Reply
As we speak, I am working on the entry for Brassicaceae, with many important genera. I am just separating the wheat from the chaff at the moment, but obviously the two-level hyponyms are a problem. I may only present the tribes on the same line as the genera therein. I should have come up with that before, but I've no one who takes an interest, not even ChuckE. DCDuring (talk) 22:39, 21 May 2023 (UTC)Reply
@DCDuring Hmm, definitely I would break up the long Hyponyms line in Brassicaceae into several. Using multiple lines with different indents seems natural for phylogenetic data because of the tree structure. Benwing2 (talk) 22:41, 21 May 2023 (UTC)Reply
I don't like the tree structure for our entries because it takes up even more space, but you may be right. Probably by some time next week, I'll have this done as yet another alternative presentation structure. Then I will do the tree structure on another, similar entry. Of all the comprehensive taxonomic databases, none includes all the ranks, especially below family level. We aren't and can't be comprehensive, but we try to include such ranks. I need to reduce the clutter they cause, without eliminating them. DCDuring (talk) 22:53, 21 May 2023 (UTC)Reply
@DCDuring: I took a stab at a different formatting for Brassicaceae. I've only done the first few tribes as a proof of concept. I used the Wikispecies version of the tribes because no one has gotten around to the Wikipedia version: the tribes are listed and the genera are listed, but there's no information in most cases as to which genera are in which tribes. There are also a few genera that don't have Wikipedia articles yet. For instance, Irania is a redlink on a disambiguation page.
At any rate, when you have a huge list like that, I think you're better off doing something to visually mark the groups so you don't have to scan through dozens and dozens of identically formatted taxonomic names looking for them. Chuck Entz (talk) 01:15, 22 May 2023 (UTC)
I believe the best source for Brassicaceae taxonomy is Brassibase at Heidelberg. {{R:Brassibase}} is how I plan to reference the links to the data item, the most important feature of which is tribe membership for species and genera. DCDuring (talk) 02:54, 22 May 2023 (UTC)Reply

Wingerbot edit - one full stop too many

In this edit, Wingerbot added an extraneous full stop after {{pedia}}. (Only within the etymology; later usages were correctly ignored.)— Pingkudimmi 11:45, 19 May 2023 (UTC)Reply

@Pingku Thanks. My script has a long list of templates that include a period in their output but it missed this one; I have added it. Benwing2 (talk) 00:06, 21 May 2023 (UTC)Reply
Thanks.— Pingkudimmi 12:10, 21 May 2023 (UTC)Reply

Serbocroatian femeqs

@Anarhistička Maca @Stujul I propose we use {{femeq}} on Serbocroatian entries. Vininn126 (talk) 20:21, 23 May 2023 (UTC)

@Vininn126 This is fine with me. Benwing2 (talk) 23:26, 23 May 2023 (UTC)Reply
Yes, I agree, as long as the definitions are still shown. Stujul (talk) 12:48, 24 May 2023 (UTC)Reply
Yes. I also like how @Stujul has started to separate diminutive senses from the others, like on verižica, which seems a good practice. Anarhistička Maca (talk) 07:28, 25 May 2023 (UTC)
@Stujul, Anarhistička Maca I am confused about that particular entry. Why are the three definitions indented under the "diminutive of" definition? If this is because the base word has three senses, and you can form a diminutive of each, the definitions at verižica should make this clear, e.g. "small pothook" rather than just "pothook". If these senses aren't simply diminutives of the base word senses, but have taken on a life of their own, they shouldn't be indented under the "diminutive of" line. Benwing2 (talk) 07:47, 25 May 2023 (UTC)Reply
As for including the definition of {{femeq}} forms, we do that for Russian as well. Benwing2 (talk) 07:48, 25 May 2023 (UTC)Reply
@Benwing2 I guess they should be described more fully with qualifiers. Notwithstanding, I like how it appears visually. I think it's a good use of subsenses. Anarhistička Maca (talk) 07:51, 25 May 2023 (UTC)Reply
I think most cases are not going to need that, since most will only have one definition, in which case they will look like Czech entries, with a |t= providing a definition. Vininn126 (talk) 08:49, 25 May 2023 (UTC)
I wanted to use the {{diminutive of}} template in the definition for categorisation, but I don't like to use it multiple times. I thought this would be the best alternative. I've seen other people use "small" in front of the definition as you said, but then what's the point of the template? That would just be giving the same information twice. Stujul (talk) 12:24, 25 May 2023 (UTC)Reply
@Stujul The thing is, you're already duplicating the entire set of definitions of the base term at the diminutive. Generally we want to avoid such duplication, so I'd recommend the same approach as User:Vininn126, which is to use the |t= param on the {{diminutive of}} template to summarize the definitions of the base term (use semicolons to separate the different definitions). Benwing2 (talk) 23:39, 25 May 2023 (UTC)Reply

Unicode Collation Algorithm - ideas?

Hiya - I've created a reasonably efficient implementation of the Unicode Collation Algorithm: I took the data in Unicode's allkeys.txt, stored it in a human-readable form in Module:User:Theknightwho/sortkey, and then serialised it in Module:User:Theknightwho/sortkey/serialized. That last module is read by Module:User:Theknightwho/sort, which is able to sort tables of inputs. This seems like a decent solution for the column template, and it's relatively straightforward to add language-specific tweaks.

Unfortunately, category sort is a bit more tricky, because categories are divided into sections by first letter: the sortkeys produced by the UCA are completely arbitrary from that perspective (e.g. the sortkey for "dictionary" is Ồήẽ⃚ή‡´ẉ⁸⅓ plus a bunch of secondary + tertiary weighting). As far as I can tell, our options are:

  • Putting up with this, so we'd have a bunch of perfectly sorted categories with arbitrary section headers.
  • Coming up with a bunch of "default" start letters: e.g. the primary weight for ᴅ is 20C3, and the next lowest standard Latin letter is D (with a primary weight of 20BF). We could then add D before the sortkey if a term starts with ᴅ. This is still imperfect, and would likely end up being really inefficient.
  • Getting $wgCategoryCollation changed to uca-default in the site's LocalSettings.php file, but I don't know how we'd do that. Possibly a Phabricator ticket? If so, they'd probably want to see some kind of vote in favour first.

Just wondering if you have any thoughts on this. Also tagging @Erutuon and @Surjection. Theknightwho (talk) 18:11, 25 May 2023 (UTC)

@Theknightwho This is somewhat technical; can you clarify what the difference is between the UCA and the current sorting algorithm(s), and what implementing the UCA gets us? As for category sorting, if we want the UCA to apply there then IMO changing LocalSettings.php is definitely the way to go. But don't we already have language-specific sorting keys, which are what category sorting uses? Benwing2 (talk) 23:37, 25 May 2023 (UTC)Reply
@Benwing2 The current sorting algorithm uses the Unicode codepoint, so is pretty arbitrary. The UCA is systematically designed to provide a sophisticated default sort order that is (reasonably) language-neutral, which can then be tailored on a language-by-language basis where further changes need to be made. It's far superior, but obviously requires considerably more work. There are features like secondary and tertiary weightings (used as tiebreakers), and it also provides a baseline for sorting nonstandard characters within a given language.
We do already have language-specific sortkeys, but they're of variable quality, because they all have to create a "fake" sortkey using codepoint tricks to fool the MW software's algorithm into doing the correct sort order. They're also a lot cruder, because they generally don't handle nonstandard characters for the language well (e.g. diacritics in English are just stripped, which means sort order is not predictable). In some cases like Tibetan they've become very complex due to the need to produce a MW-compatible sortkey, whereas Module:Mymr-sortkey doesn't even try, so is restricted to use in columns. Theknightwho (talk) 23:46, 25 May 2023 (UTC)Reply
By the way - just as a comparison:
Codepoint order:
  • Ѐ, Ё, Ђ, Ѓ, Є, Ѕ, І, Ї, Ј, Љ, Њ, Ћ, Ќ, Ѝ, Ў, Џ, А, Б, В, Г, Д, Е, Ж, З, И, Й, К, Л, М, Н, О, П, Р, С, Т, У, Ф, Х, Ц, Ч, Ш, Щ, Ъ, Ы, Ь, Э, Ю, Я, а, б, в, г, д, е, ж, з, и, й, к, л, м, н, о, п, р, с, т, у, ф, х, ц, ч, ш, щ, ъ, ы, ь, э, ю, я, ѐ, ё, ђ, ѓ, є, ѕ, і, ї, ј, љ, њ, ћ, ќ, ѝ, ў, џ, Ѡ, ѡ, Ѣ, ѣ, Ѥ, ѥ, Ѧ, ѧ, Ѩ, ѩ, Ѫ, ѫ, Ѭ, ѭ, Ѯ, ѯ, Ѱ, ѱ, Ѳ, ѳ, Ѵ, ѵ, Ѷ, ѷ, Ѹ, ѹ, Ѻ, ѻ, Ѽ, ѽ, Ѿ, ѿ, Ҁ, ҁ, ҂, Ҋ, ҋ, Ҍ, ҍ, Ҏ, ҏ, Ґ, ґ, Ғ, ғ, Ҕ, ҕ, Җ, җ, Ҙ, ҙ, Қ, қ, Ҝ, ҝ, Ҟ, ҟ, Ҡ, ҡ, Ң, ң, Ҥ, ҥ, Ҧ, ҧ, Ҩ, ҩ, Ҫ, ҫ, Ҭ, ҭ, Ү, ү, Ұ, ұ, Ҳ, ҳ, Ҵ, ҵ, Ҷ, ҷ, Ҹ, ҹ, Һ, һ, Ҽ, ҽ, Ҿ, ҿ, Ӏ, Ӂ, ӂ, Ӄ, ӄ, Ӆ, ӆ, Ӈ, ӈ, Ӊ, ӊ, Ӌ, ӌ, Ӎ, ӎ, ӏ, Ӑ, ӑ, Ӓ, ӓ, Ӕ, ӕ, Ӗ, ӗ, Ә, ә, Ӛ, ӛ, Ӝ, ӝ, Ӟ, ӟ, Ӡ, ӡ, Ӣ, ӣ, Ӥ, ӥ, Ӧ, ӧ, Ө, ө, Ӫ, ӫ, Ӭ, ӭ, Ӯ, ӯ, Ӱ, ӱ, Ӳ, ӳ, Ӵ, ӵ, Ӷ, ӷ, Ӹ, ӹ, Ӻ, ӻ, Ӽ, ӽ, Ӿ, ӿ, Ԁ, ԁ, Ԃ, ԃ, Ԅ, ԅ, Ԇ, ԇ, Ԉ, ԉ, Ԋ, ԋ, Ԍ, ԍ, Ԏ, ԏ, Ԑ, ԑ, Ԓ, ԓ, Ԕ, ԕ, Ԗ, ԗ, Ԙ, ԙ, Ԛ, ԛ, Ԝ, ԝ, Ԟ, ԟ, Ԡ, ԡ, Ԣ, ԣ, Ԥ, ԥ, Ԧ, ԧ, Ԩ, ԩ, Ԫ, ԫ, Ԭ, ԭ, Ԯ, ԯ, ᲀ, ᲁ, ᲂ, ᲃ, ᲄ, ᲅ, ᲆ, ᲇ, ᲈ, ᴫ, ᵸ, Ꙁ, ꙁ, Ꙃ, ꙃ, Ꙅ, ꙅ, Ꙇ, ꙇ, Ꙉ, ꙉ, Ꙋ, ꙋ, Ꙍ, ꙍ, Ꙏ, ꙏ, Ꙑ, ꙑ, Ꙓ, ꙓ, Ꙕ, ꙕ, Ꙗ, ꙗ, Ꙙ, ꙙ, Ꙛ, ꙛ, Ꙝ, ꙝ, Ꙟ, ꙟ, Ꙡ, ꙡ, Ꙣ, ꙣ, Ꙥ, ꙥ, Ꙧ, ꙧ, Ꙩ, ꙩ, Ꙫ, ꙫ, Ꙭ, ꙭ, ꙮ, ꙳, ꙾, ꙿ, Ꚁ, ꚁ, Ꚃ, ꚃ, Ꚅ, ꚅ, Ꚇ, ꚇ, Ꚉ, ꚉ, Ꚋ, ꚋ, Ꚍ, ꚍ, Ꚏ, ꚏ, Ꚑ, ꚑ, Ꚓ, ꚓ, Ꚕ, ꚕ, Ꚗ, ꚗ, Ꚙ, ꚙ, Ꚛ, ꚛ, ꚜ, ꚝ
UCA order:
  • ꙳, ꙾, ҂, а, А, ӑ, Ӑ, ӓ, Ӓ, ә, Ә, ӛ, Ӛ, ӕ, Ӕ, б, Б, в, ᲀ, В, г, Г, ѓ, Ѓ, ґ, Ґ, ғ, Ғ, ӻ, Ӻ, ҕ, Ҕ, ӷ, Ӷ, д, ᲁ, Д, ԁ, Ԁ, ꚁ, Ꚁ, ђ, Ђ, ꙣ, Ꙣ, ԃ, Ԃ, ҙ, Ҙ, е, Е, ѐ, Ѐ, ӗ, Ӗ, ё, Ё, є, Є, ж, Ж, ӂ, Ӂ, ӝ, Ӝ, ԫ, Ԫ, ꚅ, Ꚅ, җ, Җ, з, З, ӟ, Ӟ, ꙁ, Ꙁ, ԅ, Ԅ, ԑ, Ԑ, ꙃ, Ꙃ, ѕ, Ѕ, ꙅ, Ꙅ, ӡ, Ӡ, ꚉ, Ꚉ, ԇ, Ԇ, ꚃ, Ꚃ, и, И, ѝ, Ѝ, ӥ, Ӥ, ӣ, Ӣ, ҋ, Ҋ, і, І, ї, Ї, ꙇ, Ꙇ, й, Й, ј, Ј, ꙉ, Ꙉ, к, К, ќ, Ќ, қ, Қ, ӄ, Ӄ, ҡ, Ҡ, ҟ, Ҟ, ҝ, Ҝ, ԟ, Ԟ, ԛ, Ԛ, л, Л, ᴫ, ӆ, Ӆ, ԯ, Ԯ, ԓ, Ԓ, ԡ, Ԡ, љ, Љ, ꙥ, Ꙥ, ԉ, Ԉ, ԕ, Ԕ, м, М, ӎ, Ӎ, ꙧ, Ꙧ, н, Н, ᵸ, ԩ, Ԩ, ӊ, Ӊ, ң, Ң, ӈ, Ӈ, ԣ, Ԣ, ҥ, Ҥ, њ, Њ, ԋ, Ԋ, о, ꙭ, ꚙ, ꙮ, ꚛ, ꙫ, ꙩ, ᲂ, О, Ꙫ, Ꚙ, Ꙩ, Ꚛ, Ꙭ, ӧ, Ӧ, ө, Ө, ӫ, Ӫ, п, П, ԥ, Ԥ, ҧ, Ҧ, ҁ, Ҁ, р, Р, ҏ, Ҏ, ԗ, Ԗ, с, ᲃ, С, ԍ, Ԍ, ҫ, Ҫ, т, ᲅ, ᲄ, Т, ꚍ, Ꚍ, ԏ, Ԏ, ҭ, Ҭ, ꚋ, Ꚋ, ћ, Ћ, у, У, ў, Ў, ӱ, Ӱ, ӳ, Ӳ, ӯ, Ӯ, ү, Ү, ұ, Ұ, ꙋ, ᲈ, Ꙋ, ѹ, Ѹ, ф, Ф, х, Х, ӽ, Ӽ, ӿ, Ӿ, ҳ, Ҳ, һ, Һ, ԧ, Ԧ, ꚕ, Ꚕ, ѡ, Ѡ, ѿ, Ѿ, ꙍ, Ꙍ, ѽ, Ѽ, ѻ, Ѻ, ц, Ц, ꙡ, Ꙡ, ꚏ, Ꚏ, ҵ, Ҵ, ꚑ, Ꚑ, ч, Ч, ӵ, Ӵ, ԭ, Ԭ, ꚓ, Ꚓ, ҷ, Ҷ, ӌ, Ӌ, ҹ, Ҹ, ꚇ, Ꚇ, ҽ, Ҽ, ҿ, Ҿ, џ, Џ, ш, Ш, ꚗ, Ꚗ, щ, Щ, ꙏ, Ꙏ, ꙿ, ъ, ᲆ, Ъ, ꚜ, ꙑ, Ꙑ, ы, Ы, ӹ, Ӹ, ь, Ь, ꚝ, ҍ, Ҍ, ѣ, ᲇ, Ѣ, ꙓ, Ꙓ, э, Э, ӭ, Ӭ, ю, Ю, ꙕ, Ꙕ, ꙗ, Ꙗ, я, Я, ԙ, Ԙ, ѥ, Ѥ, ѧ, Ѧ, ꙙ, Ꙙ, ѫ, Ѫ, ꙛ, Ꙛ, ѩ, Ѩ, ꙝ, Ꙝ, ѭ, Ѭ, ѯ, Ѯ, ѱ, Ѱ, ѳ, Ѳ, ѵ, Ѵ, ѷ, Ѷ, ꙟ, Ꙟ, ҩ, Ҩ, ԝ, Ԝ, ӏ, Ӏ
Theknightwho (talk) 00:21, 26 May 2023 (UTC)Reply
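A quick way to see the difference on a handful of letters is the Python snippet below. Python compares strings by code point, so a plain `sorted` reproduces the first ordering. The second ordering is not a UCA implementation: it just ranks letters by their position in a tiny excerpt hand-copied from the UCA list above (`UCA_EXCERPT` is a made-up name for this example).

```python
letters = ["я", "Ё", "а", "ё", "Я", "А"]

# Code point order: Ё (U+0401) sorts before А (U+0410), and ё (U+0451) after я (U+044F).
by_codepoint = sorted(letters)

# Relative order of these six letters as they appear in the UCA list above.
UCA_EXCERPT = ["а", "А", "ё", "Ё", "я", "Я"]
by_uca_excerpt = sorted(letters, key=UCA_EXCERPT.index)

print(by_codepoint)
print(by_uca_excerpt)
```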
@Theknightwho I see. Overall this sounds good but I'd like to see a bit of info presented on how much work it will be to maintain the code (and in particular the language-specific stuff) once written vs. the benefit that comes from it (esp. since I haven't seen anyone else complain about the current sorting mechanism). Benwing2 (talk) 06:49, 26 May 2023 (UTC)Reply
@Benwing2 The code as I've written it is very easy to maintain, as it just relies on plugging in the Default Unicode Collation Element Table (DUCET) data, which is published by Unicode at [6].
To explain: each codepoint is given weightings between square brackets corresponding to one or more characters. For example, Latin "A" has the weighting [.2075.0020.0008], which shows its primary, secondary and tertiary weights. The initial . means it has a fixed weight (i.e. it's not context-specific), while some characters have *, which means they have a variable weight: this generally applies to punctuation, where it may be desirable to downgrade the weighting in some contexts.
In simple terms, sortkeys are calculated by collating all the primary, then secondary, then tertiary weights, while disregarding any zero-weights (so "word" would be 2343 221D 2275 20BF 0020 0020 0020 0020 0002 0002 0002 0002). Diacritics are usually only given secondary weights (i.e. a primary weight of 0000), while tertiary differences tend to only come into play for things like capitalisation or equivalent hiragana/katakana. Some characters have multi-character weights: these usually correspond to their decomposed forms, but not always (e.g. ). There's a lot more info about this here, and the algorithm can optionally use even lower-order weightings if needed.
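The collation step for "word" can be reproduced with a few lines of Python. This sketch follows the simplified description above and nothing more: the `WEIGHTS` table only holds the four primaries quoted in the example, with the secondary 0020 and tertiary 0002 shown there, and it leaves out the level separators and variable-weight handling of the full algorithm.

```python
# Collation elements as (primary, secondary, tertiary); values taken from the
# "word" example above, not looked up in the DUCET.
WEIGHTS = {
    "w": [(0x2343, 0x0020, 0x0002)],
    "o": [(0x221D, 0x0020, 0x0002)],
    "r": [(0x2275, 0x0020, 0x0002)],
    "d": [(0x20BF, 0x0020, 0x0002)],
}


def sort_key(text):
    """Collect all primary, then secondary, then tertiary weights, skipping zeros."""
    levels = ([], [], [])
    for char in text:
        for element in WEIGHTS[char]:
            for level, weight in zip(levels, element):
                if weight:  # zero weights are disregarded
                    level.append(weight)
    return levels[0] + levels[1] + levels[2]


def as_hex(key):
    return " ".join(f"{weight:04X}" for weight in key)


print(as_hex(sort_key("word")))
print(sort_key("do") < sort_key("word"))  # lists compare element by element
```

Comparing the resulting lists element by element is what gives the primary weights priority over secondary and tertiary ones.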
Module:User:Theknightwho/sortkey is a table, which is a lightly rearranged version of the DUCET data that took me about 25 minutes to create (and could probably be automated):
  • Key [1] is the codepoint as a string. Occasionally, strings of more than one character are assigned a specific weight: sometimes where decomposable parts are equivalent to a composed character (e.g. U+0418 + U+0306 Й ≡ U+0419 Й), and sometimes not (e.g. U+0E40 U+0E2E เฮ). Where that occurs, key [1] is a table of codepoints.
  • Key [2] is "." or "*".
  • Keys [3], [4] and [5] are the primary, secondary and tertiary weights, respectively. Where there are multi-character weights, keys [6] to [9], [10] to [13] etc. correspond to characters 2 and 3 etc.
The serialize function iterates down the list, and generates a string in the following format (taking "A" as an example):
"A" .. "\254" .. "₩\20\2" .. "\255"
  1. The character(s).
  2. "\254" (= ".") or "\253" (= "*").
  3. Three characters, corresponding to the codepoints for each weight (we could concatenate the codepoints as 5-character strings with leading zeroes, but this keeps the length down).
  4. For multi-character weightings, any further "\253" or "\254" followed by the three weights.
  5. Final "\255", marking end of sequence.
These strings are then collated, and every byte converted into \XXX format for the sake of convenience, as the output string needs to be pasted into Module:User:Theknightwho/sortkey/serialized. So the above example becomes \65\254\226\130\169\20\2\255.
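For anyone who wants to check the byte layout, here is a Python sketch of the serialized entry format. The function names are invented for illustration, and the three weights (0x20A9, 20, 2) are simply read off the example string as written above rather than checked against the DUCET.

```python
FIXED = b"\xfe"     # "\254", stands for "."
VARIABLE = b"\xfd"  # "\253", stands for "*"
END = b"\xff"       # "\255", end of entry


def serialize_entry(chars, elements):
    """Serialize one entry: the character(s), then marker + three weights per element."""
    out = chars.encode("utf-8")
    for marker, weights in elements:
        out += FIXED if marker == "." else VARIABLE
        # Each weight is stored as the UTF-8 encoding of the code point with that
        # value (weights in the surrogate range would need special handling).
        out += "".join(chr(weight) for weight in weights).encode("utf-8")
    return out + END


def escaped(data):
    """Render every byte in decimal \\XXX form."""
    return "".join(f"\\{byte}" for byte in data)


entry = serialize_entry("A", [(".", (0x20A9, 20, 2))])
print(escaped(entry))
```

None of the three marker bytes can occur inside valid UTF-8, which is what makes them safe as delimiters.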
The sortkey algorithm in Module:User:Theknightwho/sort takes the serialized data and matches "\255(" .. char .. "[^\255]+)\255" (i.e. end + character + everything until the next end). It then iterates over the string with gmatch, matching "([\253\254])(" .. UTF8_char .. ")(" .. UTF8_char .. ")(" .. UTF8_char .. ")". Weights are collated in primary, secondary and tertiary tables, which are then concatenated as a hexadecimal string at the end. The sort function memoizes these as it goes, as it's fast and keeps memory use down. This is quite a crude implementation, as it doesn't yet take into account some of the more sophisticated things that are possible with the UCA, but it's certainly a major improvement over what we have at the moment.
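The lookup described above translates fairly directly into Python regular expressions over bytes. This is a sketch under assumptions, not a port of the module: `lookup` is a hypothetical name, the sample data is made up apart from the "A" entry from the earlier example, and the data is assumed to start with a leading "\255" so that the first entry can be matched.

```python
import re

# One UTF-8 encoded character: a lead byte followed by any continuation bytes.
UTF8_CHAR = rb"[\x00-\x7f\xc2-\xf4][\x80-\xbf]*"
ELEMENT = re.compile(
    rb"([\xfd\xfe])(" + UTF8_CHAR + rb")(" + UTF8_CHAR + rb")(" + UTF8_CHAR + rb")"
)


def lookup(serialized, char):
    """Return the (primary, secondary, tertiary) weights stored for `char`."""
    key = re.escape(char.encode("utf-8"))
    # end marker + character + everything up to the next end marker
    # (naive: a multi-character key starting with `char` could match first)
    entry = re.search(rb"\xff(" + key + rb"[^\xff]+)\xff", serialized)
    if entry is None:
        return []
    return [
        tuple(ord(weight.decode("utf-8")) for weight in element.groups()[1:])
        for element in ELEMENT.finditer(entry.group(1))
    ]


# Made-up sample: "A" as in the earlier example, plus an entry with two elements.
data = b"\xff" + b"A\xfe\xe2\x82\xa9\x14\x02\xff" + b"X\xfe\x01\x02\x03\xfd\x04\x05\x06\xff"
print(lookup(data, "A"))
print(lookup(data, "X"))
```

Because every entry's closing "\255" doubles as the opening marker of the next one, only a single extra byte is needed at the very start of the data.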
In terms of maintenance, we'd just need to update it whenever the DUCET changes (about once a year).
In terms of language-specific needs, Unicode publish a big list of language-specific tailorings. These are often quite complex (particularly Arabic and East Asian scripts), so would need some work to implement - especially as they follow a different format to the DUCET. However, they have the major advantage of being produced by experts, and there is a Lua implementation of the UCA that we could probably use as a starting point. There's also nothing stopping us producing our own tailorings, too. Theknightwho (talk) 15:09, 26 May 2023 (UTC)Reply
If there should be a shortage of those willing and able to do the maintenance or if users in a particular language are unhappy, how hard is it to opt out and fall back to something generic, possibly just until someone willing and able does the maintenance or improvement work? DCDuring (talk) 15:41, 26 May 2023 (UTC)Reply
@DCDuring The UCA is meant to be the generic fallback (and ideally it's what we'd use as the default), so language-specific stuff goes on top of it. If language users are unhappy, we'd just need to change whatever is needed. It wouldn't be hard to turn it off, but I can't really envision any situations where that would be necessary, as specific problems should be dealt with on a case-by-case basis. The current default is a much more arbitrary order, as codepoint order isn't designed to be used for sorting.
@Benwing2 I've just written Module:User:Theknightwho/DUCET, which can generate the serialized data directly from Unicode's text file (stored at User:Theknightwho/DUCET). I removed the character names for the sake of space, but it doesn't make a difference either way. Theknightwho (talk) 16:21, 26 May 2023 (UTC)Reply
What is not easy to deal with on a case-by-case basis is the absence of people willing and able to address problems or dissatisfaction. DCDuring (talk) 16:24, 26 May 2023 (UTC)Reply
@DCDuring Sure, but that goes for whatever method we choose. Like I said: it would not be difficult to turn on or off. Theknightwho (talk) 16:26, 26 May 2023 (UTC)Reply
@Theknightwho Thank you for writing the automation module Module:User:Theknightwho/DUCET. IMO, what we need now is a good plan describing how to implement language-specific sorting on top of this, how to make sure we can transition bit-by-bit from using codepoint sorting as the fallback to using UCA sorting as the fallback, and how to switch off UCA sorting for a given language if for whatever reason this is deemed necessary (e.g. the editors of a given language don't like the results or find it too complex to implement or maintain the UCA version of the sorting). Then, we need to write how-to documentation on this plan (e.g. how to write a language-specific UCA sorting module and how to turn off language-specific UCA sorting and go back to codepoint sorting), along with how-to-documentation on how to run Module:User:Theknightwho/DUCET when a new DUCET version comes out. Essentially, you want to future-proof Wiktionary against the situation where you end up leaving the project or don't have the time to help with implementing or maintaining a given language-specific sorting module. (I am putting on my "tech company software engineer" hat here.) We also need to think about what happens if the change to $wgCategoryCollation in LocalSettings.php goes through; how does this impact language-specific sorting and how can we make sure it works correctly (or at least passably well) with language-specific sort keys written with codepoint sorting in mind? Benwing2 (talk) 04:04, 27 May 2023 (UTC)Reply
@Theknightwho Another thing, very important: how will adopting UCA sorting affect overall memory usage? I see around 23 entries now in CAT:E due to memory errors. The memory errors keep appearing and I assume it's due to functionality you keep adding, as AFAIK no one else is making changes to core modules. Are we ever going to be able to reduce the usage of {{*-lite}} templates and do you have a long-term strategy here? Benwing2 (talk) 08:48, 27 May 2023 (UTC)Reply

Hyphens in Catalan

Headword linking in porto-riqueny and costa-riqueny does not make sense. It is an orthographic convention instead of rr. Compare with novaiorquès not hyphenated. I know it can be avoided with a parameter. Is there any way to find "-r" or "-s" in Catalan headwords? Vriullop (talk) 06:45, 2 June 2023 (UTC)Reply

@Vriullop Hmm. I can turn off the hyphenation by default but there are a lot of words where the hyphenation does make sense. Using |nolinkhead=1 turns off linking of hyphenated components. I can look for words with -r or -s following a hyphen but if I turn off hyphenation in just those cases I suspect it will also turn off cases that should be hyphenated. Let me take a look at how many words occur where hyphenation makes sense vs. when it doesn't. Benwing2 (talk) 06:48, 2 June 2023 (UTC)Reply
OK, there are 447 lemmas with hyphens in them (not counting prefixes and suffixes), of which 171 have a hyphen followed by r or s. The vast majority of these do need hyphenation, e.g. penya-segat, barba-serrat, quaranta-set (and lots of other numbers), mata-rates, porta-revistes, quaranta-sis (and again lots of other numbers), cap-roig, etc. The only other one I can find that is somewhat like porto-riqueny and costa-riqueny is mont-realès. So I think the current solution along with |nolinkhead=1 is best. Benwing2 (talk) 06:59, 2 June 2023 (UTC)Reply
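A rough sketch of the kind of search described above, in Python for illustration: given a list of headwords, keep those with an internal hyphen directly followed by r or s, skipping prefixes and suffixes (entries that begin or end with a hyphen). The function name and the sample list are hypothetical; the real count was presumably made against the full lemma list.

```python
import re

HYPHEN_RS = re.compile(r"-[rs]")


def hyphen_before_r_or_s(headwords):
    """Return headwords containing '-r' or '-s', ignoring affix entries."""
    return [
        word
        for word in headwords
        if not word.startswith("-")
        and not word.endswith("-")
        and HYPHEN_RS.search(word)
    ]


sample = ["porto-riqueny", "novaiorquès", "cap-roig", "-ista", "mata-rates"]
matches = hyphen_before_r_or_s(sample)
```

The result still has to be reviewed by hand, since the pattern cannot tell an orthographic hyphen (porto-riqueny) from a genuine compound boundary (mata-rates).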
Thanks for checking it. Vriullop (talk) 12:25, 6 June 2023 (UTC)Reply

User:Conrad.Irwin/creation.js/intro

You deleted User:Conrad.Irwin/creation.js/intro. The MediaWiki:Gadget-AcceleratedFormCreation.js gadget still contains a link to that page. It's from the edit button in the notice displayed in the warning that says "Please ensure that the information is both complete and correct before clicking Publish changes. If you don’t speak this language, then be exceedingly careful not to propagate mistakes that exist in the source entry—a redlink is better than a wrong entry! edit". That broken link should be fixed. Daniel.z.tg (talk) 00:47, 9 June 2023 (UTC)Reply

@Daniel.z.tg Are you sure? I changed the code of that gadget in March to refer to the new location. Benwing2 (talk) 01:14, 9 June 2023 (UTC)Reply
@Benwing2: I checked again and it's still there. It affects both the old and the visual editor. In the old editor, the edit link shows the outdated URL on hover and its HTML source contains the following:
:: <div id="mw-content-text" class="mw-body-content"><div class="mw-editintro"><big>Please ensure that the information is both <b>complete</b> and <b>correct</b> before clicking Publish changes.</big> ::<p>If you don’t speak this language, then be exceedingly careful not to <a href="/wiki/propagate" title="propagate">propagate</a> mistakes that exist in the source entry—a redlink is better than a wrong entry! ::</p> ::<span style="font-size:85%;"><small class="editlink"><a class="external text" href="https://en.wiktionary.org/w/index.php?title=User:Conrad.Irwin/creation.js/intro&action=edit">edit</a></small></span></div><div id="wikiPreview" class="ontop" style="display: none;"><div lang="en" dir="ltr" class="mw-content-ltr"></div></div><form class="mw-editform" id="editform" name="editform" method="post" action="/w/index.php?title=pomeridianae&action=submit" enctype="multipart/form-data"><input type="hidden" value="ℳ𝒲♥𝓊𝓃𝒾𝒸ℴ𝒹ℯ" name="wpUnicodeCheck"><div id="antispam-container" style="display: none;"><label for="wpAntispam">Anti-spam check. ::
Daniel.z.tg (talk) 01:23, 9 June 2023 (UTC)Reply
@Daniel.z.tg OK I found the issue; the page I moved has a link to itself in it that wasn't fixed. Please check again; you might need to reload the page. Benwing2 (talk) 20:44, 9 June 2023 (UTC)Reply
@Benwing2: It's fixed now. Thanks! Daniel.z.tg (talk) 21:30, 9 June 2023 (UTC)Reply

Belarusian + Templates

I think we can also safely use + templates in Belarusian, I don't think we're gonna piss anyone off. Vininn126 (talk) 10:28, 9 June 2023 (UTC)Reply

Yes you are!!! PUC 11:40, 9 June 2023 (UTC)Reply
@PUC Uh oh... I just think the Slavic languages should resemble each other where they can (are you actually voting against this or...?) Vininn126 (talk) 11:43, 9 June 2023 (UTC)Reply
No, that's ok. I don't like these templates but won't kick up a fuss. PUC 11:47, 9 June 2023 (UTC)Reply
@Vininn126, PUC OK I will get to this at some point. Benwing2 (talk) 20:38, 9 June 2023 (UTC)Reply

Manual conversion of Dari and Classical

Hi @Benwing2,


I have begun manually converting terms specific to Dari and Classical Persian (i.e. terms not applicable to Iranian Persian) to their respective transliteration at Persian transliteration/Dari. However, before I go all out and convert all Dari & Classical terms, I just want to ask whether you think it's best to wait until the modules are complete or if it's fine to continue now. If you plan to use a bot to check transliterations again, it may cause some issues (though Dari- and Classical-specific terms already use a different transliteration, so those issues would probably happen regardless). Do you think it's a good idea to add the Classical-Dari transliteration on non-Iranian terms now? Or would it be best to wait until the support for multiple transliterations in one entry is ready, and add both transliteration schemes to every page (where possible)?


Let me know what you think the best course of action is, and if you still have questions about the formatting/modules atitarev and I will try to help you wherever we can.


If you're fine with me manually converting them prematurely, is there some way you want me to mark the transliterations to prevent issues? سَمِیر | sameer (talk) 22:59, 9 June 2023 (UTC)Reply

Also I should clarify I don't intend to rush you, I understand you are very busy with other projects. I just want to make sure my conversions don't make it harder for you later on. سَمِیر | sameer (talk) 23:00, 9 June 2023 (UTC)Reply
@Sameerhameedy It is probably OK because my script was already ignoring Dari and Classical terms whenever it could identify them as such. But links to these terms will have to be updated as well; and such links should definitely use the appropriate etymology language codes and not just fa (specifically, prs for Dari, fa-cls for Classical). User:Atitarev do you have any comments? Benwing2 (talk) 05:46, 10 June 2023 (UTC)Reply
@Benwing2 Could we also have any clarification on phonetic IPA for Iranian Persian? Thanks in advance.--Saranamd (talk) 07:14, 10 June 2023 (UTC)Reply
@Benwing2 In that case I'll continue converting Classical and Dari entries, but maybe I'll hold off on adding more Dari and Classical translations to English entries. The translations are currently only marked with "fa", since "fa-cls" and "prs" are not compatible with the translation template. It would probably cause problems if a bunch of translations were unmarked. سَمِیر | sameer (talk) 09:01, 10 June 2023 (UTC)Reply
@Benwing2, @Sameerhameedy: It's fine as you suggest. I wonder if the Persian headword can also use labels (parameters) to mark and categorise varieties without making completely new language codes for Persian entries and usage examples. Please see User_talk:ZxxZxxZ#Option_3 (it's duplicated in a few places) for pronunciation, headword and usage examples.
دُولَت‎ (dowlat) [defaults to Iranian Persian if unlabelled]
دَوْلَت‎ (dawlat) (Classical Persian, Dari Persian)
Copying usage examples from Sameerhameedy's posts:
  1. هِنْدوسْتانی را می‌فَهْمی [Iranian Persian]
    hendustani râ mi-fahmi?
    "Can you understand Hindustani?"
    آیا زَبان هِنْدُوسْتانِی‌را خوش دارِی؟ [Dari]
    āyā zabān-i hindūstānī-rā xōš dārī?
    "Do you like the Hindustani language?"
Anatoli T. (обсудить/вклад) 09:27, 10 June 2023 (UTC)Reply
@Atitarev, Sameerhameedy When you say the translations don't support etymology-only codes, do you mean {{t}}? If so I think it's better to fix the module code to support these codes instead of using a special way of marking the Dari/Classical uses. Benwing2 (talk) 19:17, 10 June 2023 (UTC)Reply
Although maybe the special way can be used as well when it's needed to make the distinction visible. Benwing2 (talk) 19:18, 10 June 2023 (UTC)Reply
@Saranamd I'll focus next on Persian stuff. Benwing2 (talk) 21:50, 10 June 2023 (UTC)Reply
@Benwing2 No the special labeling is needed for Persian entries not English translations. I've been writing translations similarly to how they're written for Chinese so the translation for "fan" would be written as
  • Persian:
    • Classical Persian: پنکه (panka)
    • Dari Persian: پکه (paka)
    • Iranian Persian: فن (fan)
But because the {{t}} template does not support the codes fa-cls or prs, these are all marked as "fa", so a bot cannot properly check these transliterations. So the {{t}} template doesn't need to show the transliteration; it just needs support for the other language codes, since the label will be written manually (as it is for other languages).
The need to show special markings is for example sentences and headers (depending on what format you choose to show transliterations, one of the other options was to move all transliterations to the pronunciation section). سَمِیر | sameer (talk) 22:48, 10 June 2023 (UTC)Reply
@Benwing2 Also, is there any way to activate auto-transliteration for Classical terms using the module {{fa-cls-translit}}? I get that auto-transliteration in entries is not feasible yet, but in etymologies it can be useful, since most terms borrowed from Persian are from Classical Persian. سَمِیر | sameer (talk) 05:40, 13 July 2023 (UTC)Reply
@Sameerhameedy Yes, as long as the etymology code fa-cls is used, this should be possible I think; User:Theknightwho can comment on how ready this is for prime time as they wrote the foundational code in question, but I think it's ready. If so, I'll look into it; I've been meaning to get back to doing some coding for Persian and it should happen in a few days. Benwing2 (talk) 05:58, 13 July 2023 (UTC)Reply
@Benwing2 Yep - it's ready. You can simply set the transliteration module in Module:etymology languages/data as you would in regular language data modules. Theknightwho (talk) 06:09, 13 July 2023 (UTC)Reply
@Benwing2 sounds great, thank you! and thank you @Theknightwho. سَمِیر | sameer (talk) 20:36, 13 July 2023 (UTC)Reply
And just to let you know all example entries for Persian referenced in this discussion have been moved to this page, for consistency. So instead of the examples being repeated on multiple pages they're all on one page. سَمِیر | sameer (talk) 02:29, 30 July 2023 (UTC)Reply

Edit at ty vogo

Hello. Your edit here was not useful. Thyself be knowne (talk) 14:10, 11 June 2023 (UTC)Reply

@Thyself be knowne Fixed. Benwing2 (talk) 16:39, 12 June 2023 (UTC)Reply
@Benwing2 There are more! See on the page User:Jberkel/lists/wanted/20230601/cs under the sections "See" (39 instances) and "also" (29). These aren't true synonyms, they have different shades of meaning. Thyself be knowne (talk) 08:12, 14 June 2023 (UTC)Reply
@Thyself be knowne Well, that is Dan Polansky's mistake. Synonyms shouldn't be formatted like that; no other language uses "See" or "also" notes under Synonyms. (And in any case it's common for listed synonyms to express different shades of meaning; Dan should have put qualifiers in that case but I don't think he bothered.) However, I'll take a look at the lists you mention. Benwing2 (talk) 08:14, 14 June 2023 (UTC)Reply
@Thyself be knowne In the examples I looked at, the term listed after the "see" or "see also" appears to be a true synonym, even if the synonyms listed under that term aren't. So I think it's safe to just remove the "see" or "see also" wording, as I did above for ty vogo. Benwing2 (talk) 08:55, 14 June 2023 (UTC)Reply

Request for cleanup

Hi
I stumbled upon the page preliare, which I had edited some time ago, and found a Request for cleanup banner (put there by WingerBot (talk • contribs), which is why I'm contacting you):
The definition(s) may be wrong or misleading, and important senses may be missing. The specified auxiliary may also be wrong. The remainder of the conjugation is probably correct for -are verbs but may be wrong in some particulars for -ire verbs (especially the present participle).
I'm not exactly sure what the problem is here. Could you please shed some light on that? — GianWiki (talk) 11:18, 12 June 2023 (UTC)Reply

@GianWiki I wanted to eliminate {{it-verb-old}} and make the conjugation argument mandatory for {{it-verb}}. Many existing verbs had missing or incorrect conjugations, definitions and/or auxiliaries, and what I did for these verbs was look up and verify only the vowel quality and position for -are verbs, and for -ire verbs I only checked whether they took the -isc infix. I didn't verify the auxiliaries or definitions or the -ire present participle (which is often irregularly -iente instead of -ente) to the standard I would like, because there were too many verbs to do that, so instead I left a cleanup banner. If you're confident that the auxiliary and definitions are correct and complete, you can remove the banner. Benwing2 (talk) 16:57, 12 June 2023 (UTC)Reply
@Benwing2 I see. Thank you very much for clarifying. — GianWiki (talk) 19:47, 12 June 2023 (UTC)Reply

French etymologies

Hey, just wanted to draw your attention to this bot edit which accidentally mangled the formatting of an {{etydate}} by separating it to a separate sentence without turning off nocap and adding an unnecessary full stop. Not sure if any other edits in that batch might have problems, I fixed this one manually. (Few weeks old but only just saw it since I'm not very active at the moment, sorry!) —Al-Muqanna المقنع (talk) 12:36, 17 June 2023 (UTC)Reply

repetition repitition

Hi. Can you clean up these instances of edition edition and page page? No hago griego (talk) 17:56, 17 June 2023 (UTC)Reply

Functions to get the current page section from a module

Hi - I've seen your new comments on my page and will get to them shortly. I just wanted to share something I've been working on, which are a pair of functions that can both calculate the current page section of the calling template from within the module (i.e. they know where on the page they've been invoked from). This is particularly useful for Japanese, where terms are sorted by how they're read, which means that sortkeys for templates like {{lb}} need to change automatically based on where they are on the page (unless we do tens of thousands of sort=). However, I imagine they could have many other uses, too. See Module:User:Theknightwho/get header.

I've put detailed notes, as they're both very hacky, and both have drawbacks: the first is likely to be patched out by the devs at some point, and the second is pretty cumbersome (but may actually be workable in the medium-term, pending possible performance/maintenance issues). This exact functionality has been requested in the Community Wishlist Survey 2023, so hopefully it'll only be a stopgap. Theknightwho (talk) 09:35, 19 June 2023 (UTC)Reply

@Theknightwho Thank you for writing this code. You know a lot more about the internals of MediaWiki than I do. I think we should rely on the strip markers for now, both because this method is more efficient and because the strip markers actually seem less likely to me to change than the other method: they've been around quite a while and I think a fair amount of code relies on them. I also think it would be worth reaching out to the MediaWiki developers if possible to see what their plans are, i.e. if and when they are planning to implement the above requested functionality, and if they have any plans to change the implementation of strip markers. I have definitely heard bad things about trying to contact the MediaWiki devs, but I think if you can figure out who the relevant people are and reach out to them personally rather than through Phabricator or whatever, you might get better results. Benwing2 (talk) 00:20, 21 June 2023 (UTC)Reply
@Benwing2 Thanks - the first method was first written by Huhu9001, but I'm less optimistic than you are about it as they patched out something very similar involving the Cite extension, which is the reason it's only possible to unstrip nowiki tags and not any other strip markers.
If we do decide to go ahead with this, one use could be to deprecate {{l-self}}, {{m-self}} and so on. Page parsing is memory-cheap (because it's stored as a string), and the best way to do it is to have a gmatch call in Module:links/data which builds a table of the L2 indexes for each language on the page, which can then be accessed by any link template via mw.loadData. That way, it only needs to be done once. Theknightwho (talk) 14:59, 22 June 2023 (UTC)Reply
@Theknightwho I see. Since we have two ways of doing things we can always use the strip marker functionality and switch over if/when they break it. Benwing2 (talk) 18:28, 22 June 2023 (UTC)Reply
@Benwing2 Cool - that works. We should probably discuss it, but I did a quick test of self-links in {{l}} and it works as expected. Based on the current setup, I feel we should use strong for self-links if: (a) it's under the language's L2 and (b) no id= is given.
Because it's useful for any module that does page scraping on the current page, I've also added parsed headword info for the page to Module:headword/data (which seems to work fine - no new additions to CAT:E). With a bit of work it could be turned into something quite sophisticated, but for now it just gives the first and last indexes for strings and headings for each L2 on the page.
Module:headword/data is becoming a bit of a repository for general info about the current page, as it's a convenient way to make sure things like this are only done once per parse. We might want to split it out, though. Theknightwho (talk) 19:14, 22 June 2023 (UTC)Reply
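A minimal sketch of the idea discussed above, in Python for illustration (the real code would be Lua using gmatch and mw.loadData): scan the page text once to record where each L2 (language) section starts and ends, then decide whether a self-link should be shown in bold. The function names, the `position` argument (the offset of the calling template in the page text) and the exact heading pattern are assumptions; the bold rule follows the two conditions proposed above.

```python
import re

# Matches "==Language==" but not "===Noun===" or deeper headings.
L2_HEADING = re.compile(r"^==([^=\n]+)==[ \t]*$", re.MULTILINE)


def l2_spans(wikitext):
    """Map each L2 heading to the (start, end) offsets of its section."""
    matches = list(L2_HEADING.finditer(wikitext))
    spans = {}
    for i, match in enumerate(matches):
        end = matches[i + 1].start() if i + 1 < len(matches) else len(wikitext)
        spans[match.group(1).strip()] = (match.start(), end)
    return spans


def use_bold(spans, language, position, target, page_title, sense_id=None):
    """Bold a self-link only under the language's own L2 and with no id."""
    if target != page_title or sense_id is not None:
        return False
    span = spans.get(language)
    return span is not None and span[0] <= position < span[1]


page = "==English==\n===Noun===\nfoo\n\n==French==\n===Noun===\nfoo\n"
spans = l2_spans(page)
pos = page.index("foo")  # a call site inside the English section
```

Because the table of spans is built once per page, each link template only needs a dictionary lookup and a range comparison.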
@Theknightwho Can you explain what you mean by "once per parse"? Also your algorithm for using bold in place of a link seems fine to me. Benwing2 (talk) 19:47, 22 June 2023 (UTC)Reply
@Benwing2 It's because Module:headword/data should be accessed via mw.loadData, which means it's only run once for the whole page. Edit: my mistake - I wrote "parse" not "page".
I'll do some performance testing on the self-links to see what the overhead is. Theknightwho (talk) 19:52, 22 June 2023 (UTC)Reply
@Theknightwho Right, sorry, I was thinking this wasn't possible because of the code in Module:headword/data, but I realize that's only the case when the data file contains closures in the returned data. BTW my experience with the {{place}} data has made me realize that loadData() doesn't always decrease memory usage; in the case of {{place}} it actually increased it probably due to all the table wrapping. Benwing2 (talk) 19:56, 22 June 2023 (UTC)Reply
@Benwing2 I think it's because (a) the parser adds metatables to every subtable, and (b) it'll never get garbage collected. The things I've added are mostly expensive calculations that we need the headword template to handle, which means large pages might be accessing it tens or hundreds of times at a minimum.
In theory, we could pre-parse (e.g.) every link template or headword on the page and store the output as strings, which means the "real" calls just need to access the data module to get their output. It could be a more flexible way to get the benefits of {{multitrans}}, with the added benefit of not needing to modify the wikitext. Theknightwho (talk) 20:12, 22 June 2023 (UTC)Reply
@Benwing2 I may have found a viable long-term solution to the memory issue - Module:User:Theknightwho/preparser is an implementation of the pre-parsing idea I mentioned above: it uses Module:User:Theknightwho/preparser/core to build a memoised table of template outputs, which can then be accessed by each template call via mw.loadData.
To adapt a module, it needs to be added to the templates table in the core module, which lists the module name, function name and any arguments. A small piece of code also needs to be added to the module itself, which is used to pull down any precalculated outputs.
The bulk of the code is actually for modified frame objects, which means they can be reused hundreds of times by simply swapping out the arguments tables. child_frame also carries a secret argument that tells a module whether it's being called by the core module, which is necessary to prevent an infinite loop. It's deleted immediately after the check, so shouldn't have any impact. By using pcall, we can simply fail a given memoisation if it throws an error, meaning that the error message remains localised to the template in question (unlike multitrans, where an error takes down the whole section).
Currently I've only put {{t}} and {{t+}} in, and a quick test shows it to be about 30% as efficient as multitrans - however, (a) it can be generalised to the whole page, and (b) it can be combined with multitrans anyway. Theknightwho (talk) 00:55, 23 June 2023 (UTC)Reply
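The pre-parsing idea described above can be sketched roughly as follows, in Python for illustration rather than Lua. All names here (`preparse`, `expand`, `render_t`) are hypothetical and do not reflect the actual preparser code. The sketch computes each distinct template call once, skips any call that raises (the analogue of a failed pcall), and lets the later "real" call fall back to direct evaluation so that an error stays local to that one template.

```python
def preparse(calls, renderers):
    """Build a memoised table of outputs for every distinct (name, args) call."""
    cache = {}
    for name, args in calls:
        key = (name, args)
        if key in cache or name not in renderers:
            continue
        try:
            cache[key] = renderers[name](*args)
        except Exception:
            # Analogue of a failed pcall: leave this call uncached so the
            # error surfaces only where the template is actually used.
            continue
    return cache


def expand(name, args, cache, renderers):
    """The 'real' call: reuse the precalculated output when there is one."""
    key = (name, args)
    if key in cache:
        return cache[key]
    return renderers[name](*args)


def render_t(lang, term):
    """Hypothetical stand-in for a translation template."""
    if not term:
        raise ValueError("missing term")
    return f"[{lang}] {term}"


RENDERERS = {"t": render_t}
CALLS = [("t", ("fr", "chat")), ("t", ("fr", "chat")), ("t", ("fr", ""))]
CACHE = preparse(CALLS, RENDERERS)
```

In this toy run the duplicate call is rendered only once, and the malformed call is absent from the cache rather than aborting the whole pass.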
It also occurs to me that you can use this method to pass arbitrary info between invokes: you just store whatever you need in the modified frame object, and access it again from the later call. The "real" invocation might not be able to see it, but that doesn't matter because it's just copy-pasting the result from the preparse. Theknightwho (talk) 01:26, 23 June 2023 (UTC)Reply
@Theknightwho Cool, I will take a look. BTW what is the best way to parse a page and figure out what section a given template is in? I want to replace {{zh-syn-list}} and {{zh-ant-list}} with {{col3}}, which will work fine as long as I can figure out what section (usually called Synonyms or Antonyms) that the template is within. Module:templateparser doesn't provide this functionality. Is there an existing way of doing this or do I have to roll my own? Benwing2 (talk) 02:13, 23 June 2023 (UTC)Reply
@Benwing2 There isn't one at the moment, but it would be useful to have. Thanks for all the work you've been doing. Theknightwho (talk) 02:17, 23 June 2023 (UTC)Reply
@Theknightwho You're welcome. BTW I hacked up an implementation of {{saurus}}; you can see examples in User:Benwing2/test-saurus. This should make it possible to replace {{zh-syn-list}} and {{zh-ant-list}} (redirects to {{zh-der}}) as well as {{zh-syn-saurus}} and {{zh-ant-saurus}} (which call {{zh-der}} internally). Benwing2 (talk) 03:23, 23 June 2023 (UTC)Reply
@Benwing2 Fab - thanks for that. By the way - I've just linked Module:translations into Module:preparser as a trial, and I'm seeing results that are about 50% of the effectiveness of multitrans (e.g. it just brought teacher down from memory errors to 40MB, while multitrans manages about 30MB). With some optimisations, I'm sure we can do a lot better than that. Theknightwho (talk) 03:28, 23 June 2023 (UTC)Reply
@Theknightwho Sounds good. You might want to try this on something other than translations, since translations are more or less "solved"; e.g. on some of the templates where we currently have to use "lite" versions. Benwing2 (talk) 03:34, 23 June 2023 (UTC)Reply
@Theknightwho Ignoring Module:translations/multi-nowiki, which has its own problem, Template:hu-conj-ok, which is a template-include-size error due to its documentation subpage calling it too many times, the appendixes, which are Lua timeout errors, and the two Etymology scriptorium pages that are due to the preparser having issues with hyperlinks in |pos= parameters in other instances of {{m}} in other parts of those pages, that still leaves 39 pages with memory errors. It seems like just a few days ago we were down to about half of that. Chuck Entz (talk) 22:04, 23 June 2023 (UTC)Reply

Template sl-pr

Hi,

I just want to ask if you have found some time to start updating the {{sl-IPA}} template we talked about in January. I am planning to become a bit more active during the summer break and it would be great to have it ready. Perhaps I was requesting too much, but please at least make it work for Standard Slovene (you can forget about Natisone Valley and Resian standards if you have a lot of work). It would however be pretty important to include all those distinctions and SNPT pronunciation (only phonologists use IPA in Slovenia, even other linguists use SNPT). It would be great to also add that it also automatically generates rhymes and hyphenation. For rhymes, only /ɪ/ is included of the allophones as it is the allophone of more phonemes, otherwise the phonemic IPA is used. For hyphenation, if the number of consonants between two syllables is odd, the following syllable should take the additional one (e.g. ses∙tra), except sequences lj and nj should stick together if followed by a consonant. For the noun-inflection templates, I already made a 70kB large template {{sl-infl-noun}}, which declines the noun more or less automatically. Thank you again for helping. Garygo golob (talk) 15:31, 19 June 2023 (UTC)Reply
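The hyphenation rule requested above can be sketched as follows, in Python for illustration. This is only one reading of the rule, under several assumptions: the input is plain lower-case spelling with the vowels a, e, i, o, u, each consonant cluster between two vowels is split in half with the extra consonant going to the following syllable, and lj or nj counts as a single unit when the next letter is a consonant. Syllabic r, accent marks and adjacent vowels are not treated specially, and the function names are hypothetical.

```python
VOWELS = set("aeiou")


def split_units(word):
    """Split a word into letters, keeping lj/nj together before a consonant."""
    units = []
    i = 0
    while i < len(word):
        pair = word[i:i + 2]
        if pair in ("lj", "nj") and i + 2 < len(word) and word[i + 2] not in VOWELS:
            units.append(pair)
            i += 2
        else:
            units.append(word[i])
            i += 1
    return units


def hyphenate(word, sep="∙"):
    """Insert syllable breaks; an odd cluster gives its extra unit to the right."""
    units = split_units(word.lower())
    vowel_idx = [i for i, unit in enumerate(units) if unit in VOWELS]
    breaks = set()
    for left, right in zip(vowel_idx, vowel_idx[1:]):
        cluster = right - left - 1           # consonant units between the vowels
        breaks.add(left + 1 + cluster // 2)  # left syllable keeps the smaller half
    return "".join((sep if i in breaks else "") + unit for i, unit in enumerate(units))


example = hyphenate("sestra")
```

With these assumptions, "sestra" comes out as "ses∙tra", matching the example given in the request.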

@Garygo golob Not sure I will have time to work on this, but I will take a look at the prior discussion; can you point me to where it is? Benwing2 (talk) 02:16, 23 June 2023 (UTC)Reply
@Benwing2 It can be found on this link. Thank you. Garygo golob (talk) 07:11, 26 June 2023 (UTC)Reply

Spanish plural issue

Hiya - I noticed this diff by WingerBot broke the plural of Spanish teacher, as the plural displays as "#s". Theknightwho (talk) 19:58, 23 June 2023 (UTC)Reply

@Theknightwho Thanks. I meant to fix Module:es-headword to account for that usage; will fix now. Benwing2 (talk) 20:02, 23 June 2023 (UTC)Reply

Tagalog pronunciation module sandbox

This is months in the making, but can you look at the Tagalog pronunciation module sandbox? I am adding functionality based on the Spanish template you started and expanded, but I have a hard time excising unnecessary code such as with pronunciation styles. I removed the pronunciation styles as not relevant for Tagalog, but the module should handle the occasional dialectal pronunciations as well. You may have received my ping into the module talk page, but that was at midnight. TagaSanPedroAko (talk) 20:55, 23 June 2023 (UTC)Reply

@TagaSanPedroAko Hi. I don't know much about Tagalog pronunciation so I'm not sure I can help you here. You might start with a version that doesn't deal with any dialectal variations and add that later once you have the standard pronunciation working. You definitely should create some test cases; see Module:es-pronunc/testcases for how to do this. Benwing2 (talk) 23:57, 24 June 2023 (UTC)Reply
For info on Tagalog pronunciation, there's an appendix on Tagalog pronunciation; also, there's Help:IPA/Tagalog in Wikipedia. I already have test cases for the sandbox version. TagaSanPedroAko (talk) 19:47, 25 June 2023 (UTC)Reply
Can this be followed up? Test cases for the expanded module adding support for automatic syllabification and rhymes currently give an error (table, no pronunciation generated). It's been months. I'm having a hard time making it work.
TagaSanPedroAko (talk) 21:21, 27 June 2023 (UTC)Reply
@TagaSanPedroAko I think you're asking me to fix the module for you. Writing these modules is a lot of work and I don't have a lot of time right now; I have my hands full with existing requests. Benwing2 (talk) 21:23, 27 June 2023 (UTC)Reply
I understand. Also I'm editing from mobile. It's the complex nature of the original Spanish pronunciation module that makes it hard to adapt to Tagalog: there's this support for pronunciation styles. I stripped the adaptation of style support but the original is far too complex to begin with. TagaSanPedroAko (talk) 21:26, 27 June 2023 (UTC)Reply
I've just created an experimental Tagalog pronunciation template run by the sandbox module and able to trace some errors due to omissions regarding temporary substitutions for syllable dividers (which I just fixed). Hope this should work now, but other issues are deep in the module.
TagaSanPedroAko (talk) 21:51, 27 June 2023 (UTC)Reply
Sandbox version is nearly working, removing what remains of the code for handling Spanish pronunciation styles, but still far from stable. There's still issues, and it shows on use cases for the experimental {{tl-pr}} I just created.
@TagaSanPedroAko Sounds good, I would keep going in the same vein. Benwing2 (talk) 03:27, 28 June 2023 (UTC)
Appreciated. Thank you! TagaSanPedroAko (talk) 03:29, 28 June 2023 (UTC)Reply
Just noticed this es-pron derivative for Basque: Module:eu-pron/new. Possibly there's the key to what fixes are needed for the Tagalog pronunciation template. TagaSanPedroAko (talk) 19:55, 29 June 2023 (UTC)Reply
Unfortunately, the Basque one actually has some dialectal considerations, so holding off on this. Have tried to remove code triggering error messages, but only made the problem worse by rendering {{tl-pr}} useless. Please fix, thank you. TagaSanPedroAko (talk) 20:42, 29 June 2023 (UTC)Reply
@TagaSanPedroAko You can't just remove error messages; that only masks the underlying problem and doesn't fix it. Benwing2 (talk) 20:48, 29 June 2023 (UTC)Reply
I mean cutting out the code triggering the error.
TagaSanPedroAko (talk) 20:53, 29 June 2023 (UTC)Reply
@TagaSanPedroAko In general this leads to the same problem. Benwing2 (talk) 20:56, 29 June 2023 (UTC)Reply
Ah ok, holding off from editing the module directly.
Most of the module works fine already, but the problem is in line 1654; I think it lies somewhere around there. I stripped out more of the code from the original Spanish module that handles regional pronunciations, but there could be more to remove, such as what will be used to implement {{tl-IPA}}, the current template used to generate Tagalog pronunciation.
TagaSanPedroAko (talk) 21:03, 29 June 2023 (UTC)Reply

Changes made to Chinese thesaurus

Do you know what's going on with the Chinese thesaurus at the moment? The pinyin is not displaying (randomly it seems) for certain terms, e.g. at Thesaurus:了解, 清楚, 知道, 覺, etc. Also, I note that the head word is now showing up on the synonyms/antonyms list in the respective entries. Can this be disabled? ---> Tooironic (talk) 23:41, 24 June 2023 (UTC)Reply

@Tooironic I switched the Thesaurus to use generic templates instead of Chinese-specific ones. I'm not sure why the pinyin isn't showing up for certain terms, maybe User:Wpi and/or User:Theknightwho can debug this as I'm not sure how that code works. Also, what do you mean by "the head word is now showing up on the synonyms/antonyms list in the respective entries"? Can you give an example? Benwing2 (talk) 23:44, 24 June 2023 (UTC)Reply
For example, at 提供, the word itself (提供) appears in the synonyms list. This was not happening before the changes were made. ---> Tooironic (talk) 23:50, 24 June 2023 (UTC)Reply
Ahh right, yeah that is a bug; I'll look into fixing it. Benwing2 (talk) 23:51, 24 June 2023 (UTC)Reply
Thank you. ---> Tooironic (talk) 00:07, 25 June 2023 (UTC)Reply
@Tooironic OK, this issue should be fixed. Benwing2 (talk) 00:23, 25 June 2023 (UTC)Reply
Thanks again! ---> Tooironic (talk) 00:45, 25 June 2023 (UTC)Reply
@Tooironic All the terms where the pinyin isn't showing up seem to have two pinyins listed on their respective pages, which is probably what's going on. However, in most cases where there are two pinyins, one of them is just a variant of the other with neutral tone for the second syllable, so maybe we can handle these specially. Benwing2 (talk) 23:47, 24 June 2023 (UTC)Reply
@Tooironic Previously the zh-specific templates were using Module:zh/extract, which returns the first translit listed on the page (which is not always correct) and has some weird and inefficient code in it. It's now handled by Module:zh-translit, which shows nothing when there are multiple readings. You can find a list of links in this situation at Special:WhatLinksHere/Template:tracking/links/manual-tr/zh.
The reasoning behind this is that always displaying a pinyin encourages people to not check them and leads to incorrect information. It's pending User:Theknightwho to add a functionality to do this more smartly by indicating a desired reading on the entry, but we've not gone into the details yet. TBH I think a better way would be using {{etymid}}s, but it seems the discussion or progress is going nowhere at the moment. – Wpi (talk) 07:47, 25 June 2023 (UTC)Reply
I think it's fine to display the most common pinyin reading by default. Careful editors like me will always check and modify readings where necessary. ---> Tooironic (talk) 08:11, 25 June 2023 (UTC)Reply
@Tooironic: yes, there are careful editors, but there are also people who don't speak Mandarin fluently (e.g. myself), or who don't know Chinese at all (e.g. people who need to mention Chinese terms in the etymologies of other languages); these groups of people are, and will be, the sources of such errors. – Wpi (talk) 18:38, 26 June 2023 (UTC)Reply
@Tooironic, Wpi I am inclined to agree with Wpi; it seems for every careful editor out there, there are 10 non-careful editors who either won't notice the error or won't care about fixing it. Benwing2 (talk) 22:24, 26 June 2023 (UTC)Reply
But there are soooo many 多音字. It will just create more hassle with little benefit. ---> Tooironic (talk) 00:20, 27 June 2023 (UTC)Reply
If there are really two readings, what good does it do to pick one at random when it may be wrong? IMO better to show both than pick one. Benwing2 (talk) 01:32, 27 June 2023 (UTC)Reply
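A minimal Python sketch of the display rule discussed in this thread: show a reading only when it is unambiguous, and treat two readings that differ only by a neutral tone as a special case. This is not the code of Module:zh-translit; the function names are hypothetical, readings are assumed to be given with syllables separated by spaces, and showing both variants in the special case is only one possible choice.

```python
import unicodedata

# Combining marks for the four Mandarin tones in pinyin:
# macron (1st), acute (2nd), caron (3rd), grave (4th).
TONE_MARKS = {"\u0304", "\u0301", "\u030c", "\u0300"}


def has_tone(syllable: str) -> bool:
    """True if the syllable carries a pinyin tone mark."""
    return any(ch in TONE_MARKS for ch in unicodedata.normalize("NFD", syllable))


def strip_tones(syllable: str) -> str:
    """Remove pinyin tone marks; the diaeresis of ü is kept."""
    decomposed = unicodedata.normalize("NFD", syllable)
    kept = "".join(ch for ch in decomposed if ch not in TONE_MARKS)
    return unicodedata.normalize("NFC", kept)


def neutral_tone_variants(a: str, b: str) -> bool:
    """True if two readings differ only in that a syllable is toned in one
    reading and toneless (neutral) in the other."""
    syl_a = unicodedata.normalize("NFC", a).split()
    syl_b = unicodedata.normalize("NFC", b).split()
    if len(syl_a) != len(syl_b):
        return False
    found_difference = False
    for x, y in zip(syl_a, syl_b):
        if x == y:
            continue
        if strip_tones(x) != strip_tones(y):
            return False  # genuinely different syllables
        if has_tone(x) and has_tone(y):
            return False  # same letters but two different full tones
        found_difference = True
    return found_difference


def displayable_readings(readings):
    """Return the readings that are safe to display automatically.

    One reading: show it. Two neutral-tone variants: show both (assumption).
    Anything else: show nothing, so that an editor has to choose.
    """
    unique = list(dict.fromkeys(readings))  # drop exact duplicates, keep order
    if len(unique) == 1:
        return unique
    if len(unique) == 2 and neutral_tone_variants(*unique):
        return unique
    return []
```

With this rule, a pair such as "zhī dào" and "zhī dao" would still be displayed, while two unrelated readings of the same character would leave the transliteration empty.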

Incorrect form of German noun

Hello, could you please delete the page “Nerden” (see talk page)? I have already requested the deletion of the page here a while ago. --Latisc (talk) 10:30, 25 June 2023 (UTC)Reply

Duplicate Swedish participles

Hi! First of all: thanks for the recent changes to the Swedish participles. The new format makes so much more sense, and the old system has been bugging me forever. However, I see that in some instances, the bot seems to have created duplicate "Participles"-headers. See this edit. I suspect this is a relatively easy fix. For all pages that have two identical ===Participle=== {{head|sv|past participle}} # {{past participle of|sv|...}}, just remove the second one. Again, thanks! Gabbe (talk) 11:50, 26 June 2023 (UTC)Reply

@Gabbe Should be fixed. Thanks for your well wishes; the old situation was an utter mess and I'm glad you appreciate the changes. In the case of vederbörande, it had entries for both past and present participle, which seems wrong, so I deleted the past participle entry. Benwing2 (talk) 22:22, 26 June 2023 (UTC)Reply
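The cleanup suggested above (drop the second of two identical ===Participle=== sections) can be sketched in Python as follows. This is only an illustration, not the code WingerBot actually ran; the function name is hypothetical, and it assumes that two sections count as duplicates only when header and body match exactly after trimming surrounding whitespace.

```python
import re

# Matches level-2 (==Language==) and level-3 (===Participle===) headers only;
# the backreference makes sure both sides use the same number of "=" signs.
HEADER_RE = re.compile(r"^(={2,3})[^=\n]+\1[ \t]*$", re.MULTILINE)


def remove_duplicate_sections(text: str) -> str:
    """Drop any level-3 section that exactly repeats an earlier level-3
    section within the same language (level-2) section."""
    matches = list(HEADER_RE.finditer(text))
    if not matches:
        return text
    pieces = [text[: matches[0].start()]]  # anything before the first header
    seen = set()
    for i, match in enumerate(matches):
        end = matches[i + 1].start() if i + 1 < len(matches) else len(text)
        chunk = text[match.start():end]
        if match.group(1) == "==":
            seen.clear()  # a new language section starts; forget earlier ones
        else:
            key = chunk.strip()
            if key in seen:
                continue  # exact repeat of an earlier section: skip it
            seen.add(key)
        pieces.append(chunk)
    return "".join(pieces)
```

Sections that differ in any way are left alone, so cases like vederbörande, where the two entries were for different participles, would still need a manual look.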

Slovak and Old Czech modules

Hello, I would like to thank you for your work at Czech modules. But Slovak has no noun or adjective declension modules, so I'd like to ask whether you're going to do something there. If not, I could possibly try to create “Module:sk-adjective” or “Module:sk-noun” the same way the Czech ones are. I will probably give up soon, but in case you aren't going to create them, can I try to create those two modules using your Czech modules as a pattern? And Old Czech has literally no modules. Can I try to create them using the existing Czech modules and just adjust them? I am worried about your and other authors' copyrights and I don't even know whether I have the right to create modules. Zhnka (talk) 07:09, 27 June 2023 (UTC)Reply

@Zhnka I don't have any current plans to work on Slovak inflection modules. I still need to finish the Czech verb module, which takes precedence. So go ahead and work on creating Slovak declension modules. Nouns are much more complex than adjectives so I would start with adjectives. Yes you have the right to create modules and you don't need to worry about the copyright issues for Wiktionary module code that you're copying into new module code. All Wiktionary module code is automatically released under two licenses: (1) the CC BY-SA 4.0 license (i.e. you can use the code as you see fit as long as you keep the attribution and the existing licensing conditions) and (2) the GFDL (GNU Free Documentation License, which is generally considered a pretty awful license so you can more or less ignore it). You can actually see the mention of these licenses just above the "Publish changes" button: It says (for me at least)
By clicking the “Publish changes” button, you are agreeing to the Terms of Use and the Privacy Policy, and you irrevocably agree to release your contribution under the CC BY-SA 4.0 License and the GFDL. You agree that a hyperlink or URL is sufficient attribution under the Creative Commons license.
Benwing2 (talk) 07:52, 27 June 2023 (UTC)Reply
I only added missing pronoun forms to the template cs-noun. I just don't know how to put two footnotes at once to "něho". I hope I helped. Zhnka (talk) 15:16, 6 July 2023 (UTC)Reply

Spanish gender-neutral terms

At amigue and related entries, I've noticed the WingerBot has replaced the n (neuter/neutral) marker with "mf by sense" which isn't really accurate for how the terms work inherently. This change should be reverted; I'd revert it myself, but I'm unsure of how wide-reaching this change was. AG202 (talk) 04:16, 5 July 2023 (UTC)Reply

@AG202 I made the change on all such gender-neutral terms, which is probably around 10 or so. Can you explain why this isn't accurate? It seems to me that a term like amigue can be used to refer to both male and female friends, hence el amigue or la amigue, very similar to a term like terrorista that does not have an inherent gender and can be either male or female depending on the reference. The issue I have with n is that (a) there's not really any neuter gender in Spanish, and (b) to the extent there is, it occurs with terms like ello that are very different conceptually from amigue. Benwing2 (talk) 04:22, 5 July 2023 (UTC)Reply
@Benwing2 "Amigue", by nature of being a gender-neutral neologism, can't and shouldn't be used with the gendered el or la but instead the gender-neutral neologism article le, so le amigue or une amigue simpátique. It doesn't follow the norms of "standard" Spanish grammar to begin with. It is not the same thing as words like terrorista or artista. As for the usage of "neuter", if we don't feel comfortable using n since it's used for things like ello, then it shouldn't have a gender marker at all (or maybe a new one). AG202 (talk) 04:40, 5 July 2023 (UTC)Reply
@AG202 I see, so maybe in this case we should use "c" = common gender. This is normally used for languages such as Danish and Dutch where masculine and feminine are merged but it could be coopted for this use. Benwing2 (talk) 04:51, 5 July 2023 (UTC)Reply
@Benwing2 Hmmm I'm still not sure if that'd be best. I'll ping the other Spanish editors: (Notifying Ungoliant MMDCCLXIV, Metaknowledge, Ultimateria, Koavf): AG202 (talk) 20:11, 5 July 2023 (UTC)Reply
@AG202 What is your preferred solution? Benwing2 (talk) 20:34, 5 July 2023 (UTC)Reply
To the extent that I've seen (e.g.) latine in the wild, it's been with neo-articles, not el/la articles, so in my head as a non-native, it seems like it's neuter, not bigender. —Justin (koavf)TCM 21:15, 5 July 2023 (UTC)Reply
Maybe then we should create a new gender gender-neutral; using n will auto-classify the nouns in CAT:Spanish neuter nouns, which feels wrong to me. With a new gender, we'd get CAT:Spanish gender-neutral nouns. Benwing2 (talk) 21:19, 5 July 2023 (UTC)Reply
That works for me honestly. AG202 (talk) 02:07, 6 July 2023 (UTC)Reply
Sounds good. Neuter or common seem out of place here. Ultimateria (talk) 20:02, 6 July 2023 (UTC)Reply
@AG202, Koavf, Ultimateria OK, I added gneut as a new gender abbreviation for gender-neutral, which displays as gender-neutral. Currently the tooltip just says gender-neutral as well, but maybe we can change it to something more explanatory. You can see an example of this in amigue. Let me know if this abbreviation works, and I'll fix up the remaining pages. (Note also, currently amigue is getting added both to CAT:Spanish gender-neutral terms and the new category CAT:Spanish gender-neutral nouns; I'll clean this up.) Benwing2 (talk) 22:09, 6 July 2023 (UTC)Reply
I was thinking about this thread and remembered that we had this discussion a couple of years back: Wiktionary:Beer_parlour/2021/November#Gender-neutral_Spanish_neologisms_(amigx,_maestrx,_etc.). Not sure if this is just redundant, but I figured I would alert/remind you. —Justin (koavf)TCM 07:22, 18 July 2023 (UTC)Reply
@Koavf Thanks. I was waiting for comments/go ahead from one of you, which no one gave, but I'll assume y'all are OK with it. I'll take a look at the BP thread tomorrow; now it's sleepy time. Benwing2 (talk) 07:25, 18 July 2023 (UTC)Reply
Oh yeah, I have no objection: silence is approval here. Sweet dreams. —Justin (koavf)TCM 07:27, 18 July 2023 (UTC)Reply

Need your input on a policy impacting gadgets and UserJS

Dear interface administrator,

This is Samuel from the Security team and I hope my message finds you well.

There is an ongoing discussion on a proposed policy governing the use of external resources in gadgets and UserJS. The proposed Third-party resources policy aims at making the UserJS and Gadgets landscape a bit safer by encouraging best practices around external resources. After an initial non-public conversation with a small number of interface admins and staff, we've launched a much larger, public consultation to get a wider pool of feedback for improving the policy proposal. Based on the ideas received so far, the proposed policy now includes some of the risks related to user scripts and gadgets loading third-party resources, best practices for gadgets and UserJS developers, and exemptions requirements such as code transparency and inspectability.

As an interface administrator, your feedback and suggestions are warmly welcome until July 17, 2023 on the policy talk page.

Have a great day!

Samuel (WMF), on behalf of the Foundation's Security team 23:02, 7 July 2023 (UTC)Reply

Just to save you some time:

The module error at Wiktionary:Grease_pit/2019/April is caused by this edit. It's hard to find because it's literally hidden at the very bottom of the page. Even after I narrowed it down to the section by previewing one section at a time, it took a while to spot the "Click to show or hide list" markup that hides the error message. I don't think you could have hidden it better if you tried. Chuck Entz (talk) 00:52, 8 July 2023 (UTC)Reply

The error at Template:pi-nr-inflection of/documentation is caused by the same thing. Chuck Entz (talk) 00:56, 8 July 2023 (UTC)Reply
@Chuck Entz Apologies, not my intention to make it difficult to find. Benwing2 (talk) 01:13, 8 July 2023 (UTC)Reply
No need to apologize. I found it amusing, which is the only reason I mentioned what I went through to find it. Thanks for cleaning it up. Chuck Entz (talk) 01:46, 8 July 2023 (UTC)Reply

Early Medieval Latin (EML.)

Any chance that this could be added as an etymology-only language, similar to LL. or VL.? Nicodene (talk) 12:54, 12 July 2023 (UTC)Reply

@Nicodene We should definitely have Early Medieval Latin, but we're generally moving away from nonstandard etym-only codes as they add unnecessary complexity. Theknightwho (talk) 14:16, 12 July 2023 (UTC)Reply
@Nicodene, Theknightwho I am not opposed to adding an etymology-only language but I agree it would be better to just use la-eme or similar as the code. Benwing2 (talk) 18:13, 12 July 2023 (UTC)Reply
I'm fine with any code. Nicodene (talk) 18:44, 12 July 2023 (UTC)Reply
@Nicodene I added Early Medieval Latin with code la-eme; but hold off for a bit in using it until User:Theknightwho weighs in because they might prefer a different code (a more canonical code might be la-med-ear but that's four more chars to type). Benwing2 (talk) 19:13, 12 July 2023 (UTC)Reply
@Benwing2 No objections from me - I prefer being practical about it. Theknightwho (talk) 19:15, 12 July 2023 (UTC)Reply
Thanks. Nicodene (talk) 19:15, 12 July 2023 (UTC)Reply

Deletion tags

Why did you remove the deletion tags without resolving the problem?

For example, you now claim that a letter which may only exist in Aiton is "translingual". Why intentionally add what appears to be false information?

In other cases, you claim that letters which are not confirmed to exist in any language are [mul] translingual rather than [und] undetermined. kwami (talk) 04:19, 15 July 2023 (UTC)Reply

@Kwamikagami I asked you a few days ago to convert the deletion tags to {{rfd}} tags. Did you not get that ping? You can't just tag things with a speedy deletion tag when there is any controversy at all about whether to delete them. In many cases furthermore you deleted all the content and then tagged the page with a speedy deletion tag with the message "no content", which is just bizarre. Benwing2 (talk) 04:26, 15 July 2023 (UTC)Reply
In general you need to start a discussion on these issues *BEFORE* simply going and making all the changes. Benwing2 (talk) 04:27, 15 July 2023 (UTC)Reply
Also I see in some cases other admins reverted your changes and then you restored them (sometimes more than once). Please don't edit war any more. Benwing2 (talk) 04:30, 15 July 2023 (UTC)Reply
I need to confirm it's acceptable to correct a particular error before I'm allowed to fix the error? And I need to request permission to tag a spurious article for deletion before tagging it for deletion? Could you point me to where it says that in our policy? Because that would mean I can create all sorts of bogus content on Wiktionary, and it would take weeks to months for people to get rid of it. kwami (talk) 04:35, 15 July 2023 (UTC)Reply
I didn't delete any content. I deleted nonsense, and without the nonsense, there was nothing left. But even if I'd left the nonsense in, there would still not be any dictionary content, so the deleting reason would still be valid. kwami (talk) 04:48, 15 July 2023 (UTC)Reply
@Kwamikagami Speedy deletion tags are for non-controversially bogus content, which isn't the case here; many of these articles have been around for years before you came along. As for "correcting an error" and "nonsense", these are your opinions and not clearly the case. Benwing2 (talk) 04:51, 15 July 2023 (UTC)Reply
Calling Burmese "translingual" is clearly nonsense. That's not just my opinion, but the opinion expressed at the Beer Parlor -- language entries should be under the appropriate language header, and translingual sections should not be about specific languages. (Though a list of applicable languages would of course be acceptable.)
Also, there has been consensus that, when someone creates an entry for a Unicode character and blindly copies the Unicode name as the "definition" of the character, the article has no content and should be deleted. Unicode names are often inaccurate, after all, and in any case a name is not a definition.
If someone comes across a Unicode character and wonders what it means or what it's used for, sees that Wiktionary has an article on that character and comes here for clarification, and all we do is repeat the Unicode name as if that meant something, then the entry is useless. kwami (talk) 04:58, 15 July 2023 (UTC)Reply
@Kwamikagami I haven't been following the BP discussion closely so I can't say whether there is consensus but I do see some people disagreeing with you. Furthermore many of your changes made the entries ill-formatted and generally worse off, and you still seem to be missing the fundamental point that speedy deletion is inappropriate in these cases. Benwing2 (talk) 05:03, 15 July 2023 (UTC)Reply
Okay, but in many cases you're still purposefully restoring bogus material that I had fixed (at least provisionally) and not just deleting the tags.
As for empty articles, I'll take an example from a zoom call I had today. What if the only "content" for the article ɿ was "a reversed r with a fishhook". Considering that the letter is not a reversed r and does not have a fishhook, should that really be acceptable as a Wiktionary definition?
As for tagging these all for more substantial discussion, there are hundreds of such articles. Do we really want to clog up the discussion board with them, and have to wait weeks to months for resolution, when there are admins who are willing to delete articles that have no actual content without wasting everyone's time? kwami (talk) 05:09, 15 July 2023 (UTC)Reply
P.S. I see no indication that User:RichardW57m is an admin. Is he really? He sure doesn't act like it. kwami (talk) 05:17, 15 July 2023 (UTC)Reply

Take Ʈ. Can you honestly tell me that this letter is used across multiple languages? Because I've looked, and haven't been able to find a single one. Not saying they don't exist, but isn't it a bit premature to call it "translingual"? kwami (talk) 05:19, 15 July 2023 (UTC)Reply

@Kwamikagami OK, you're writing a lot. To respond to your points one by one:
  1. Your attitude is often hostile and confrontational; I ask you to tone it down.
  2. I restored the old material because you made a lot of ill-advised changes (which are often even contradictory to your arguments above), such as changing Translingual to Undetermined, deleting *all* the content in many cases, tagging with speedy deletion rather than trying to add explanations (if that's what you believe is lacking) and disingenuously tagging with "no content" after deleting the content. I don't have a strong opinion about whether these articles should be Translingual, but these sorts of changes are unwelcome regardless.
  3. Yes this needs to go through RFD; it's not wasting people's time, and you can RFD all the articles together if you prefer. How many times do I have to tell you that speedy deletion is only intended for entirely uncontroversial changes? This *IS* a policy, also on Wikipedia, so surely you know it.
  4. RichardW57 is not an admin but I never said he was, and I didn't see any evidence he reverted your changes; it appears to be admins who reverted you.

Benwing2 (talk) 06:06, 15 July 2023 (UTC)Reply

It was mostly RichardW57 who reverted me, insisting that the Burmese Army makes Burmese superior to other languages, so Burmese must be "translingual" (and that it's a "ban-worthy" edit to deny that), while all other languages of Burma are properly pronounced as Burmese. In other words, he's an admitted bigot, and yes, that does make me rather hostile. kwami (talk) 06:18, 15 July 2023 (UTC)Reply
@Kwamikagami OK, I missed that, my apologies. I saw reverts by Surjection, Thadh and TheDaveRoss, all of whom are admins. Honestly no one seems to like RichardW57 much so don't get upset about his garbage. Benwing2 (talk) 06:46, 15 July 2023 (UTC)Reply
All right, I'll try to cool down.
At first I thought maybe he didn't recognize how 'script' and 'alphabet' are used on the WP pages he linked to, but no, he clarified that he means them in exactly that sense, and actually said that it's the army that makes Burmese superior to other languages. kwami (talk) 06:50, 15 July 2023 (UTC)Reply
Bigotry is not tolerated around here. I have not had a chance to read the thread in question but bigotry is a blockable offense. Benwing2 (talk) 07:52, 15 July 2023 (UTC)Reply
He even deleted entries for Burmese, arguing that they're redundant because they duplicate the translingual entry, which is specifically for Burmese, with the sorting order of the Burmese alphabet (and a link to the WP article on the Burmese alphabet), IPA transcription as Burmese, audio recording in Burmese, etc., and this is deliberate rather than due to misunderstanding of the scope of "Burmese" in this context (ethnic Burmese, not the country of Burma), and that it's justified by the Burmese army. He clarified this when I tried fixing it to the Mon-Burmese script (with a link to that article) or did anything else to remove specifically Burmese elements from a supposed translingual section. kwami (talk) 08:28, 15 July 2023 (UTC)Reply
@Benwing2: What is the Wiktionary part of the process for extracting the £20,000 (includes prompt payment discount) compensation from user Kwamikagami for these libellous statements? --RichardW57 (talk) 02:02, 29 July 2023 (UTC)Reply
@RichardW57 there isn't any, and anything that can be construed as a legal threat is always a bad idea. Chuck Entz (talk) 02:28, 29 July 2023 (UTC)Reply

Replacement of unnecessary redirects and templates

Hello, here is another batch of redirects and templates to replace:

Thank you. — Sgconlaw (talk) 16:13, 15 July 2023 (UTC)Reply

@Sgconlaw Should be done. Benwing2 (talk) 03:57, 17 July 2023 (UTC)Reply
Thanks very much! — Sgconlaw (talk) 17:25, 17 July 2023 (UTC)Reply

Customising the Lua error function to mitigate errors

Hiya - while working on the parser, I was thinking that it would be good to have a customised Lua error function so that we can catch errors in very large Lua invokes without causing the whole thing to grind to a halt. As an example, imagine if {{multitrans}} was able to catch errors caused by individual translations, and to display them as though they were in their own separate invocation. This would greatly reduce the risk of whole-section or even whole-page failures. It's probably possible using a combo of xpcall and debug.traceback, but I just wanted to get your insight. Theknightwho (talk) 15:41, 18 July 2023 (UTC)Reply

@Theknightwho I think it depends on whether adding a bunch of xpcall or similar function invocations will increase memory usage. I found when I implemented language-specific form-of data that I needed to have a list of the languages with modules to avoid having to pcall() the module invocation, which seemed to dramatically increase memory usage (at least when no module existed for a given language). Benwing2 (talk) 18:59, 18 July 2023 (UTC)Reply
@Benwing2 I think it should be okay - the wikitext parser makes tens of thousands of calls of xpcall, because when it sees (e.g.) {{{ it assumes it's an argument opening, and then uses errors to unwind the stack and fail out if it encounters a failure condition (meaning it'll then try parsing it as { followed by a template opening, and so on). Because everything between the layer start and the failure trigger needs to be unwound and rebuilt, the same bit of text can be reparsed many times under different sets of assumptions, each layer of which involves a call into xpcall. Caching massively speeds up this process.
I've just done a test on teacher, which is 20,506 bytes, and parsing it uses xpcall 34,943 times and uses 13.5MB of memory.
I'm not sure why pcall causes memory problems with failed module invocations. Does it happen if you try it with xpcall as well? Theknightwho (talk) 19:40, 18 July 2023 (UTC)Reply
@Theknightwho Haven't tried it, might try it at some point. Benwing2 (talk) 19:47, 18 July 2023 (UTC)Reply
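The idea raised at the top of this thread, catching an error in one item of a large invocation so that the remaining items still render, can be sketched as follows. The sketch is in Python rather than Lua, with try/except standing in for xpcall and traceback.format_exc standing in for debug.traceback; all names are hypothetical, and it says nothing about the memory cost discussed above.

```python
import traceback


def render_all(items, render_one):
    """Render every item independently.

    A failure in one item becomes an inline error marker, and its traceback
    is collected, instead of aborting the whole batch.
    """
    rendered = []
    errors = []
    for item in items:
        try:
            rendered.append(render_one(item))
        except Exception as exc:
            # Keep the traceback for debugging, show a short message inline.
            errors.append((item, traceback.format_exc()))
            rendered.append(f"<error: {exc}>")
    return rendered, errors


def demo_render(item):
    """Hypothetical per-item renderer that fails on one input."""
    if item == "bad":
        raise ValueError("unknown language code")
    return f"OK:{item}"
```

Calling render_all(["a", "bad", "c"], demo_render) would render the first and third items normally and replace only the second with an error marker, which is the behaviour described for individual translations inside {{multitrans}}.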

Deprecated templates in Accelerated forms

Hi, you recently deleted {{Template:past tense of}} which has broken the accelerated forms for Danish. Presumably it's not too difficult to switch it over to instead use the current {{Template:inflection of}} but I don't know how, would you mind giving some pointers as to how I might go about it? Helrasincke (talk) 10:28, 19 July 2023 (UTC)Reply

@Helrasincke Sorry I will fix it. All you really have to do is specify the inflection tags in the accelerator form value in the link, and delete any special code in MOD:accel/da that uses {{past tense of}}. Benwing2 (talk) 18:25, 19 July 2023 (UTC)Reply
@Benwing2 Thank you - I don't yet understand enough to really contribute much to the module programming. For some reason it is now working on some verbs e.g. oplyse and not others e.g. tale (verb) (missing forms talende, talen). You need the OrangeLinks gadget enabled to notice though. On a side note, it also doesn't seem to work at all on nouns (see also tale (noun), missing form talen) - how simple would this be to get working? Do you need any extra info to get this working or am I better off asking at the grease pit? P.s. I've now given pønse a work over - I presume that's what you were after? Helrasincke (talk) 20:50, 19 July 2023 (UTC)Reply
@Helrasincke This shouldn't be hard to get working and I can fix it, just let me know what isn't working. I tried clicking on talende and talen and it seems to work for me. Benwing2 (talk) 22:02, 19 July 2023 (UTC)Reply
BTW the only difference I can see is that the forms talende and talen are already defined in another language. In cases like that you have to have the Orange Links gadget enabled I think, otherwise the acceleration won't work. Benwing2 (talk) 22:04, 19 July 2023 (UTC)Reply
@Benwing2 Thanks for that. I'm not sure what's going on with the OrangeLinks - I have the gadget enabled but perhaps there's something else going on on my end because I've definitely been able to use the accelerated generation in similar situations before, for instance with Ukrainian links where there was already a Russian entry present. As to your changes, it's now working properly for most forms but I'm still getting the wrong templates generated for a few forms (from what I can see it's mostly participles), for example at oplyse, the present and past participles are still using the old templates. Helrasincke (talk) 08:15, 20 July 2023 (UTC)Reply
@Helrasincke I actually made that change intentionally, to use the more specific {{present participle of}} and {{past participle of}}, on the theory that those two templates aren't being deprecated. However, I'm not sure of the wisdom of it; maybe we should just use {{infl of}} like for everything else. What do you think? Benwing2 (talk) 19:13, 20 July 2023 (UTC)Reply
I see, that makes more sense. Yeah I think it'd be best to use {{infl of}} for everything other than the lemma entry. Helrasincke (talk) 19:17, 20 July 2023 (UTC)Reply
@Helrasincke OK I removed the special use of {{present participle of}} and {{past participle of}} and switched things to use {{infl of}} instead of {{inflection of}}. Benwing2 (talk) 19:28, 20 July 2023 (UTC)Reply

Module "bg-pronunciation" needs several fixes

Hi Benwing2,

I've noticed that the "bg-pronunciation" module deviates in several places from canonical treatments of Bulgarian phonology. As a native Bulgarian speaker and Wiktionary contributor, the number of resultant entries with incorrect IPA pronunciation truly bothers me.

What is the best way to point out issues and encourage their speedy resolution? The module's talk page doesn't seem to be getting much traction, and I don't know OTOH who the approved module admins would be so that I could escalate to them. I'd really appreciate your help!

Thanks,

Chernorizets (talk) 02:43, 21 July 2023 (UTC)Reply

@Chernorizets Thanks for pinging me. Apologies, I don't always follow the talk pages of various modules and templates, so in general I need to be pinged. Can you make a list of what needs to be fixed? Also maybe we can engage User:Kiril kovachev, who is also a native Bulgarian speaker. Benwing2 (talk) 02:51, 21 July 2023 (UTC)Reply
@Benwing2 no worries, and I'd be happy to try to come up with a good list. Where would it be best for me to provide that list?
Thanks,
Chernorizets (talk) 02:55, 21 July 2023 (UTC)Reply
@Chernorizets You can put it on the module talk page but make sure to ping me. Benwing2 (talk) 03:20, 21 July 2023 (UTC)Reply
Done. Thanks! Chernorizets (talk) 04:45, 21 July 2023 (UTC)Reply
@Benwing2 I believe @Kiril kovachev is done with the set of IPA fixes I had identified in the thread on the module talk page. I know the discussion got long and unwieldy - sorry about that - but, per Kiril's summary, the scope has been brought back to IPA improvements, and anything else we touched upon will get its own discussion when the time is right. What's next for getting Kiril's changes applied to the module? Those wrong pronunciations keep poking my eyes esp. now that I'm adding entries more regularly :-) No rush by the way, I just don't want this to fall by the wayside in view of newer Bulgarian-related discussions on BP such as anagrams, the Banat dialect, etc. Thanks! Chernorizets (talk) 09:08, 1 August 2023 (UTC)Reply
@Chernorizets Sorry, I will respond on the Module talk page. Benwing2 (talk) 19:45, 1 August 2023 (UTC)Reply

Something broke

Seems like something broke: Reconstruction:Proto-Indo-European/plew-. --{{victar|talk}} 06:30, 21 July 2023 (UTC)Reply

@Victar Is it still broken? Benwing2 (talk) 06:31, 21 July 2023 (UTC)Reply
I think this is because I made the Latinx -> Latnx change slightly out of sync. It should be all fixed now and I'm clearing the errors in CAT:E. Benwing2 (talk) 06:36, 21 July 2023 (UTC)Reply
I figured it was related to that. Looks to be loading now, thanks. --{{victar|talk}} 06:37, 21 July 2023 (UTC)Reply

New abuse filter - WingerBot help?

Hi - I noticed that WingerBot ran into the new abuse filter, which is supposed to prevent any language nesting under Malay or Indonesian in translation tables (with a couple of carve-outs for Jawi and Rumi, as they're legitimate script subheadings for Malay). We have a perennial problem of (mostly IPs) nesting Indonesian under Malay or vice-versa, or in some cases simply grouping any languages spoken in either country under the one heading.

Annoyingly, it's not possible to only run it on the newly added text (as we do with most filters), because if someone moves language X under language Y, only the moved lines are considered "added lines", and so the filter has to be run on the whole of the saved text instead. This means if a page already has the problem and someone edits an unrelated part of it, it still triggers the filter.

Would it please be possible for WingerBot to fix all of the affected pages? It would help to prevent random editors from unfairly running into the filter. Pinging @Chuck Entz, who may be interested to read this as well. Theknightwho (talk) 07:11, 21 July 2023 (UTC)Reply

@Theknightwho This should be possible although I need to look at the filter to see exactly how it works. In the meantime you might want to make it a non-blocking filter, i.e. the second time someone saves the page, it's allowed. As is, if there's an existing problem of this nature anywhere on the page, you can't save the page at all without fixing it, and the abuse filter name isn't so clear on exactly what needs to be done. Benwing2 (talk) 07:27, 21 July 2023 (UTC)Reply
@Benwing2 Thanks. I've changed it to only warn, and have created a specific notice that explains the issue (MediaWiki:Abusefilter-Malayic). Theknightwho (talk) 08:08, 21 July 2023 (UTC)Reply
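A minimal sketch of why the check described above has to see the whole saved text rather than only the added lines. It is written in Python rather than in the abuse filter's own rule language, and the line patterns (`* Language:` for a top-level entry, `*: Language:` for a nested one) and all names are assumptions for illustration, not the actual filter.

```python
import re

# Hypothetical sketch; not the real abuse filter rule.
PARENTS = {"Malay", "Indonesian"}      # headings nothing else may nest under
ALLOWED_CHILDREN = {"Jawi", "Rumi"}    # script subheadings carved out for Malay

TOP_LEVEL = re.compile(r"^\*\s*([^:*]+):")   # e.g. "* Malay: {{t|ms|air}}"
NESTED = re.compile(r"^\*:\s*([^:]+):")      # e.g. "*: Jawi: {{t|ms|...}}"


def find_bad_nesting(wikitext):
    """Return (parent, child) pairs where a language is nested under Malay or Indonesian."""
    problems = []
    parent = None
    for line in wikitext.splitlines():
        nested = NESTED.match(line)
        if nested:
            child = nested.group(1).strip()
            if parent in PARENTS and child not in ALLOWED_CHILDREN:
                problems.append((parent, child))
            continue
        top = TOP_LEVEL.match(line)
        # Any line that is not a top-level entry resets the nesting context.
        parent = top.group(1).strip() if top else None
    return problems
```

Run on a full table, the sketch sees the parent line and reports the nesting. Run on the moved line alone (all that counts as "added"), it has no parent to compare against and reports nothing, which is the limitation described above.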

attesting to Ʈ

Answered on my talk page. I finally found attestation of use. I don't see how correcting errors (e.g. you claimed Ʈ is used in Serer -- AFAICT it's not -- and that the capital form is Ƭ, also an obvious error) or providing an instance of actual use (I can give you the citation if you like) would count as "messing" with an article, or how it could possibly be a blockable offense. kwami (talk) 07:11, 22 July 2023 (UTC)Reply

This discussion should happen in one place, either here or on your talk page but not both. You keep saying *I* claim such and such, e.g. Ʈ is used in Serer. By reverting your changes to the preceding stable version I'm not making any particular claims, but simply undoing the damage you caused. Benwing2 (talk) 07:18, 22 July 2023 (UTC)Reply
Should I revert myself? kwami (talk) 07:28, 22 July 2023 (UTC)Reply
Do you mean should you revert your changes to Ʈ? In this case someone else already did. Benwing2 (talk) 07:29, 22 July 2023 (UTC)Reply

Descriptions

I'm not trying to be disingenuous, but is it okay to move descriptions of letters to a 'Description' section? kwami (talk) 08:29, 22 July 2023 (UTC)Reply

@Kwamikagami There's no Description header here, see WT:ELE. But there's a 'Usage notes' section that you can use for arbitrary comments such as what the purpose of a letter is, which languages use which letters and how, etc. Benwing2 (talk) 19:46, 22 July 2023 (UTC)Reply
Hmm, I guess there is in fact such a header per WT:ELE although I haven't seen it used commonly. I would put such text under 'Usage notes'; see ʍ for an example. Benwing2 (talk) 19:48, 22 July 2023 (UTC)Reply
OK, I guess you put those usage notes there, but I'm fine with it. Benwing2 (talk) 19:51, 22 July 2023 (UTC)Reply
BTW it would be much better IMO for you to engage in discussions in the BP than ask me what to do and do it without discussions. Benwing2 (talk) 20:11, 22 July 2023 (UTC)Reply

Hani sortkey - mass data module deletion request

Hi - since the data for Module:Hani-sortkey was serialized, there's no point in having it split between 196 different modules anymore, which makes it totally unwieldy. Instead, I've consolidated it in a single (huge) table in Module:Hani-sortkey/data, which can be edited as necessary, and pointed the serialiser at that instead. This doesn't matter from a memory perspective, because it's not directly accessed. It might be that we decide it's better to split the massive table up (e.g. by Unicode block), but that still wouldn't necessitate hundreds of data modules like we have now.

Would it please be possible for you to mass delete Module:Hani-sortkey/data/001 to Module:Hani-sortkey/data/196 and their associated documentation pages? Theknightwho (talk) 14:47, 24 July 2023 (UTC)Reply

@Theknightwho Sure. BTW in general, for cases like this where a set of related data modules are split up (e.g. Module:languages/data/3/LETTER and Module:languages/data/3/LETTER/extra), we shouldn't be creating identical doc pages for each one; instead, add an entry to Module:documentation to auto-generate doc pages for all of them. Benwing2 (talk) 19:34, 24 July 2023 (UTC)Reply
@Benwing2 Cheers - thanks. I think these were created by Erutuon and pre-date that, but I'm not sure. Theknightwho (talk) 19:38, 24 July 2023 (UTC)Reply
These are deleted. Benwing2 (talk) 07:16, 25 July 2023 (UTC)Reply
Great - thanks. Theknightwho (talk) 11:33, 25 July 2023 (UTC)Reply

Some issues with sorting prefix categories

Hi Ben - I've noticed that Module:affix sorts "X terms prefixed with Y" categories by ignoring the prefix. On one level this makes sense, as the assumption is that all the terms start with the same prefix, so ignoring it means that you can group terms under different headings depending on what comes after the prefix. However, this leads to a few problems, one language-specific and one general:

  1. Japonic languages sort by reading, instead of by orthography. Ignoring the prefix means that 雌鳥 (めんどり, mendori) (雌 + 鳥) is sorted as とり instead of めんどり, which is confusing and unintuitive (note the missing vocalisation mark on the stem). This is because Module:affix is ultimately telling Module:Jpan-sortkey that it wants to sort 鳥, not 雌鳥. In some cases this could be much worse, if the first kanji reading on the page happens to be totally unrelated.
  2. This is also a lesser problem for any other languages which scrape sortkeys, if the stem happens to be a redlink. Currently I think this only applies to Japonic languages, but I bet there are other languages where this would come in useful.
  3. More generally, it creates problems if we want to (e.g.) group together English anthrop- and anthropo-, because ignoring the epenthetic -o- in sorting is likely just going to lead to confusion. Even if English doesn't, I know that some languages do group together affixes like this (e.g. Mongolian groups vowel harmonic variants together, though I can't think of any prefixes which vary by vowel harmony off the top of my head).
  4. If you look at a category like Category:English terms prefixed with ab-, you can see that some terms are sorted by including the prefix (e.g. abenteric), which has happened because the page has {{af|en|ab-||t1=away from}}. This inconsistency is especially confusing.

While I completely understand why this was implemented, I just don't think the minor benefit (i.e. category headings) outweighs the issues this causes, and given points 3 and 4 I don't think this can be solved by simply exempting certain languages. Theknightwho (talk) 15:15, 24 July 2023 (UTC)Reply

@Theknightwho Hello. I don't think I actually implemented this particular hack, although I'd have to look at the history to make sure. As a result this probably needs a wider discussion. However, I do think it may be possible to make this work. For issues #1 and #2, we could treat languages that scrape sortkeys differently (e.g. by not ignoring the prefix or whatever). More generally, I recently implemented language-specific affix mappings so that you can (and should) specify the surface form of the prefix or suffix and it will categorize it according to the "base" form (for example, currently for Latin, il-, im- and ir- are mapped to in-, so that a word like illēgālis can be written as {{af|la|il-|lēgālis}} and will be categorized under Category:Latin terms prefixed with in-). The same could be done with Mongolian vowel-harmonic prefixes, if any exist. If we have the affix mappings in hand, it should be possible to do sensible things with variants like anthrop- vs. anthropo- (although you'll need to explain more what the issue is here). As for (4), such specifications shouldn't exist but if they do, we can subtract the prefix from the headword to get the remainder. I understand your concerns and this is definitely a hack but the alternative is to have all words sorted under the same letter, which doesn't seem super helpful. Benwing2 (talk) 19:31, 24 July 2023 (UTC)Reply
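The two ideas in the reply above (mapping a surface prefix to its "base" form for categorization, and subtracting the prefix from the headword to get a sort key) can be sketched roughly as follows. This is illustrative Python, not the Lua in Module:affix; the table and function names are hypothetical, and only the Latin il-/im-/ir- → in- mapping is taken from the discussion.

```python
# Hypothetical sketch; names and structure are assumptions, not Module:affix itself.
AFFIX_MAPPINGS = {
    "la": {"il-": "in-", "im-": "in-", "ir-": "in-"},  # mapping cited in the thread
}
LANGUAGE_NAMES = {"la": "Latin", "en": "English"}


def base_prefix(lang, prefix):
    """Map a surface prefix to its base form, or return it unchanged."""
    return AFFIX_MAPPINGS.get(lang, {}).get(prefix, prefix)


def prefix_category(lang, prefix):
    """Build the category name from the base form of the prefix."""
    return f"{LANGUAGE_NAMES[lang]} terms prefixed with {base_prefix(lang, prefix)}"


def sort_key(headword, surface_prefix):
    """Subtract the surface prefix from the headword; fall back to the full headword."""
    stem = surface_prefix.rstrip("-")
    if headword.startswith(stem):
        return headword[len(stem):]
    return headword
```

Under these assumptions, `prefix_category("la", "il-")` yields the in- category while `sort_key("illēgālis", "il-")` sorts the term under its remainder, so the surface form drives sorting and the base form drives categorization.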

odd Unicode 'macron'

Hi Benwing.

This redirect is problematic. For one thing, it's not covered at the target. For another, I'm not sure it should be: although called "macron" in Unicode, it's not a macron in the normal sense of the word and won't be used/defined the same way. I think it should probably link to the same place as the characters it joins to on either side (forming one long macron/overline diacritic) and defined according to how that diacritic is used. I don't know what that is, or I would have done it myself. But linking it to the macron article can only confuse people. kwami (talk) 09:09, 26 July 2023 (UTC)Reply

@Kwamikagami Which char is this? I can't extract it from the link above. BTW I'm trying to stay out of the discussions involving these chars; as with other chars, I merely restored what was there before. You need to engage the relevant people, e.g. User:RichardW57 and whoever created this page, and get consensus with them, not with me. Benwing2 (talk) 22:26, 26 July 2023 (UTC)Reply
Sorry, it should link now, if you can stop it from automatically redirecting. kwami (talk) 04:09, 27 July 2023 (UTC)Reply
It's U+FE26. It forms a diacritic with FE24 and FE25; it means nothing on its own. If anything, it should link to ◌͞◌, as it is similar but used for 3+ base characters, but according to Unicode it's intended specifically for Coptic. kwami (talk) 04:14, 27 July 2023 (UTC)Reply
@Kwamikagami Yeah I see your point, it is some special diacritic used for Coptic and should not be conflated with the regular Unicode macron. BTW I would strongly recommend once the block expires that you not make any changes to character entries (which includes adding new ones) until you've outlined in the BP what you want to change and make sure that other people are in agreement. The problem is that you have strong views on how these pages should look that are not what anyone else seems to feel, so if you just make changes without getting consensus to make them, you're going to run into trouble and probably get blocked again. Does this make sense? (I feel like I've told you this about 10 times now and you're not getting it.) I would also strongly suggest you think of this as a "spirit of the law" thing; your prior actions show that you act according to the "letter of the law" and do whatever you feel you can get away with within this, which is not going to work. Benwing2 (talk) 04:36, 27 July 2023 (UTC)Reply
Whoop whoop pull up created that RD, so I contact them.
It's not "what I feel I can get away with"; I'm simply trying to improve entries and cut the amount of nonsense. If you could point out where you see me trying to game the system against the spirit of the law, I'd appreciate it, because I don't see it.
As for that not being "not what anyone else seems to feel", it's what quite a few editors seem to feel. There are multiple editors who agree that articles with no content should be deleted (indeed, in the edit history you'll sometimes see multiple deletions with edit summaries such as "no content apart from the Unicode definition", which is where I got that wording from), that translingual entries should be translingual and not about a specific language (with the attitude that of course they should, why are we even asking about it), that the information we provide should be verified and, if we cannot verify it, should be deleted, etc. None of these are controversial ideas apart from Richard who wants to delete individual language entries and to promote Burmese above other languages, and a couple editors who are under the impression that the Unicode Registry is an adequate source for dictionary entries.
Today hasn't been a good day for starting a new BP discussion on the recreated/unverified character articles, but I'll try to get to it tomorrow. kwami (talk) 05:07, 27 July 2023 (UTC)Reply
@Kwamikagami OK sounds good. What I mean by "letter of the law" is sometimes you've interpreted what I've said very literally, e.g. when I said there should be a moratorium on changes to single-char entries, at first you didn't respect that and then you respected it only for changes to existing entries and not additions of new entries of the same sort, which logically should be part of "changes". When I said you should discuss your changes and get consensus on the BP, you engaged only with the "Kwami block" discussion and not with the other discussions related to this issue. Also when you assert that none of your ideas are controversial apart from Richard, that's clearly belied by several people stating (in the "Kwami block" section, e.g. User:Theknightwho, User:AG202, User:Sameerhameedy) that you are going against consensus and not properly engaging the relevant editing communities before making far-reaching changes. I don't want to get in an argument over whether your ideas have consensus; they clearly don't at this point, and you need to establish that before further changes. Benwing2 (talk) 05:52, 27 July 2023 (UTC)Reply
Part of not commenting at other threads was having the time to do it, part was whether I thought I had anything to add. kwami (talk) 06:23, 27 July 2023 (UTC)Reply
@Kwamikagami If you don't have time right now, that is fine, but in that case changes should wait until you have the time to carry out the discussion. Benwing2 (talk) 06:27, 27 July 2023 (UTC)Reply
I will need to travel out of town for an indefinite period sometime soon because a friend is having medical problems, and I have financial and other stuff to take care of before I leave, so I don't know how much time I'll have in the near future. It might be a matter of getting done what I need to and then seeing if I have spare time for WT before the procedures are scheduled, or maybe waiting until I'm up there and things have settled down. kwami (talk) 05:41, 28 July 2023 (UTC)Reply
@Kwamikagami Wow, I am sorry to hear that. I hope your friend's medical procedures turn out well and everything goes according to plan. Take all the time you need for your friend and yourself; one's health and well being (including financial matters) always come first. Benwing2 (talk) 06:12, 28 July 2023 (UTC)Reply
@Kwami, Benwing2, RichardW57m: U+FE26 may be peculiar to the Coptic script, though I wouldn't be amazed to find it used on Latin script letters, but it might not be peculiar to the Coptic language. It might be used in Old Nubian - have you checked? Typing hurriedly, I think the correct approach would be to raise an {{rfc}} on the macron page. Would discussing the matter in U+FE26's talk page meet the spirit of the moratorium? A link there in the Beer Parlour might not be inappropriate, perhaps under 'Moratorium Avoidance'. The macron's page completely omits the Semiticists' use as a fricativisation marker, which should be familiar to students of Biblical Hebrew. --RichardW57 (talk) 08:10, 27 July 2023 (UTC)Reply
At least typing [[talk:&#xfe26;]] works to access the talk page; to access the page itself I had to use a URL ending "%ef%b8%a6?redirect=no", and I couldn't work out how to link to it from Wikitext. Also, the redirect between OVERLINE and COMBINING OVERLINE is the wrong way round. We will need a privileged person to do the swap so that attribution is not lost. (This is a legal requirement.) Once that is sorted out, I think U+FE26 should redirect to the page for OVERLINE. The left and right half overlines (U+FE24 COMBINING MACRON LEFT HALF and U+FE25 COMBINING MACRON RIGHT HALF) should, I think, also redirect there. 'LEFT' and 'RIGHT' refer to their positions in the combined overline, not to the positions of their inking! RichardW57m (talk) 11:42, 27 July 2023 (UTC)Reply
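For reference, the URL ending quoted above is just the percent-encoded UTF-8 form of U+FE26. A small Python sketch using the standard library shows the derivation; the helper name is made up for illustration, and the `?redirect=no` query still has to be appended by hand.

```python
from urllib.parse import quote, unquote


def title_to_url_path(title):
    """Percent-encode a page title as UTF-8 bytes, leaving no character unescaped."""
    return quote(title, safe="")


# U+FE26 COMBINING CONJOINING MACRON encodes to the three bytes EF B8 A6.
encoded = title_to_url_path("\ufe26")
decoded = unquote("%ef%b8%a6")  # percent-escapes are case-insensitive when decoded
```

`quote` emits uppercase hex digits, so the result differs from the lowercase form quoted above only in letter case.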
@RichardW57 Sure, discussions don't have to be on the BP, although if you put them on some random talk page they might be lost in the future. Might be good to have them centralized somewhere. Benwing2 (talk) 19:21, 27 July 2023 (UTC)Reply
For discussions about a page, the talk page is the centralised place. Discussions on the BP are at some random month, and can be hard to find later. I'll go link to this discussion from the talk page. --RichardW57 (talk) 19:35, 27 July 2023 (UTC)Reply
BTW if you need some pages swapped like this and there's no complaints about it, let me know and I can do it. Benwing2 (talk) 19:22, 27 July 2023 (UTC)Reply
OK, can you please swap OVERLINE and COMBINING OVERLINE round so that the entry for COMBINING OVERLINE is a hard redirect to OVERLINE in accordance with WT:CFI#Combining characters. --RichardW57 (talk) 19:30, 27 July 2023 (UTC)Reply
The problem with following CFI for the combining overline is that the supposedly non-combining character is also a combining overline in many fonts. If a reader uses such a font, they're going to have a hard time clicking on links to the titular character, just as people here have been having a hard time getting to the rd. With the official combining character, it can at least be combined with a null carrier so that a link to it can be clicked on, which isn't really an option for the supposedly non-combining character. kwami (talk) 02:54, 28 July 2023 (UTC)Reply
But after a swap, clicking on the combining line will take you to the nominally spacing character, if only via a hard redirect. I think we need a navigation tool to get to the entries of
  1. Unidentified combining characters.
  2. Characters by codepoint.
The first is the more important. That is something to discuss on the Grease Pit. We might already have the needed bits, and just need to add or publicise the links. --RichardW57m (talk) 08:44, 28 July 2023 (UTC)Reply
I don't know enough about Coptic use to know if it should be rd to the overline. But Greek also has this convention, the two are essentially the same script, and Greek use is covered at ◌̅. If that's the correct place for it, then yes, I think the Coptic characters should rd there too. We should add info boxes for them, though. kwami (talk) 02:59, 28 July 2023 (UTC)Reply
Actually, no, it's not the same as Greek, but your conception justifies treating them together. Overline for abbreviation should be handled by combining overline as in Greek. Overlines for names start and end at the middles of consonants, so doing them by character encoding requires a left half at the start and a right half at the end. I'm not sure why there's a separate CONJOINING MACRON. Perhaps it works at a different height above the characters, or perhaps the two can be used on the same bit of word and one needs to know which overline to join the ends to. It seems fairly horrendous, but I've seen Andrew Glass do quadrates in cartouches, which he says is as complicated as it seems. TUS recommends mark-up for getting the effect, which means it was seen as hard work for poor dumb 'smart fonts'. Incidentally, the Unicode Standard puts Coptic, surely an African script, in the middle of the first chapter of European scripts. --RichardW57m (talk) 08:57, 28 July 2023 (UTC)Reply
The left and right are for the ends of the overline, or for two-letter abbreviations. The conjoining macron is for the middle letters where there are three or more. kwami (talk) 11:31, 28 July 2023 (UTC)Reply
@Kwamikagami: For two-letter abbreviations, that's not what we read in TUS (Section 7.3, Supralineation) or see in Lesson 19 at https://www.suscopts.org/ssc/wp-content/uploads/2021/04/CopticLessons.pdf. (They clearly had font trouble preparing the viewgraphs.) Capital letters have undocumented special treatment. The mid-letter to mid-letter overlines are also reportedly used 'to distinguish words' - I couldn't find any examples of them in that quick introduction or in Wiktionary. --RichardW57m (talk) 13:42, 28 July 2023 (UTC)Reply
Could you link to the TUS section?
That is what we see in the Unicode proposal that was accepted.[7] The difference is that they expected the left and right characters to join with the combining overline; it looks like the UTC changed this by adding a dedicated conjoining macron.
For example, R + left-joining + N + right-joining is a two-letter abbreviation, whereas R + overline + N + overline is gematria (the number 140).
I don't know if with the addition of the conjoining macron, it or the overline is intended for use with gematria. kwami (talk) 22:19, 28 July 2023 (UTC)Reply
Presumably this is a message for User:RichardW57? Benwing2 (talk) 22:33, 28 July 2023 (UTC)Reply
@Kwamikagami: I don't know how to link to the subsections of TUS. I navigate to TUS by going to https://unicode.org/main.html , clicking on 'Latest Version', and selecting the chapter from the list on the left. For the current latest edition, the chapter link is https://www.unicode.org/versions/Unicode15.0.0/ch07.pdf . I then look for the table of contents on the left, click on the relevant script section, and scroll down, in this case looking for the in-text heading 'Supralineation'.
From the text there, I believe the use of letters as numbers is flagged by the use of the combining overline. A line above whole letters is encoded by combining overline. A line extending from the middle of one letter to the middle of another is encoded by the half macrons at the ends and the conjoining macron in the middle.
You've misread 'M' as 'N'. Weird as it may seem, ⲣ︤ⲙ︥ is apparently a vowelless prefix, probably easy for a Pole to pronounce, not an abbreviation, and appearing in Coptic ⲣⲙⲃⲉⲕⲉ (rmbeke). I'm afraid I'm not sure that our Coptic supralineation is reliable - I can't find any. --RichardW57 (talk) 00:19, 29 July 2023 (UTC)Reply
@Kwamikagami: Supralineation gets stripped from links, which is why I couldn't find any in the categories. I still haven't found any, though. --RichardW57 (talk) 00:32, 29 July 2023 (UTC)Reply
It would seem we need to keep the conjoining macrons distinct from the overline, then. Possibly it could be conflated with the generic macron, but it may be best to keep a distinct article. kwami (talk) 04:41, 31 July 2023 (UTC)Reply
@Kwamikagami: I think not:
  1. Spacing and non-spacing diacritics are to be put in the same entry, so I don't think we can keep conjoining macrons in a separate article to combining overline. Usage notes should probably labour the differences.
  2. At the human level, they're almost the same thing, with just the stopping places differing.
I'd put the half 'macrons' in the same article as the conjoining macron. Can't do it immediately because of the moratorium.
I see different rendering systems have the left and right halves different ways round. The Emacs I use couldn't get them to join either way round - I think there may be some rather complex shaping behaviour that needs them to be in the same rendering run, which doesn't work well with the Emacs line-wrapping algorithm. U+FE24 COMBINING MACRON LEFT HALF should go above the right half and vice versa (source: https://www.unicode.org/charts/PDF/UFE20.pdf), so "ⲣ︤ⲙ︥" above is the correct encoding. --RichardW57m (talk) 09:59, 31 July 2023 (UTC)Reply
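The encoding described in this thread for a mid-letter to mid-letter overline (left half on the first letter, conjoining macron on any middle letters, right half on the last) can be sketched in Python as below. The function name is hypothetical, and the sketch assumes each base letter is a single code point with no other combining marks.

```python
LEFT_HALF = "\ufe24"    # COMBINING MACRON LEFT HALF
RIGHT_HALF = "\ufe25"   # COMBINING MACRON RIGHT HALF
CONJOINING = "\ufe26"   # COMBINING CONJOINING MACRON


def supralineate(letters):
    """Attach half macrons to the end letters and conjoining macrons to the middle ones."""
    if len(letters) < 2:
        raise ValueError("a joined overline needs at least two base letters")
    marks = [LEFT_HALF] + [CONJOINING] * (len(letters) - 2) + [RIGHT_HALF]
    # Each combining mark follows the base letter it sits on.
    return "".join(letter + mark for letter, mark in zip(letters, marks))
```

For the two-letter prefix discussed above, this produces the sequence U+2CA3, U+FE24, U+2C99, U+FE25. Whether a given font actually joins the pieces is a separate rendering question, as noted in the thread.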

More WF audios

Hey. Here is another WF account that they uploaded a lot of audios under labeled as "RP". Might wanna have the bot change them to "Southern England". lattermint (talk) 16:57, 26 July 2023 (UTC)Reply

I think RP is probably okay for them, as someone who speaks in RP, but I’ve not listened to them in large numbers. Theknightwho (talk) 10:55, 27 July 2023 (UTC)Reply
@Theknightwho OK thanks. Maybe User:Sgconlaw or User:Equinox can comment. The issue with the Vealhurl audios was especially those that claimed to be from a specific region, it seems. (As an American I can't make these fine judgments about English accents, need some British people to help.) Benwing2 (talk) 19:19, 27 July 2023 (UTC)Reply
@Benwing2, Theknightwho: I think if the audio files were indicated as RP it’s fine to leave them unchanged. Personally all the WF files sound like RP to me, but at some stage WF claimed he did not speak with RP so we settled on “Southern England” as a compromise. — Sgconlaw (talk) 01:30, 28 July 2023 (UTC)Reply

Template:new es demonym

Hey. Can you recreate Template:new es demonym? Just as relevant as Template:new en noun, methinks. BeirutGirlXX (talk) 21:55, 26 July 2023 (UTC)Reply

@BeirutGirlXX OK, dude who is not from Beirut and not a girl, I have restored {{new es demonym}} using {{demonym-adj}} and {{demonym-noun}} for use with gendered demonyms, and also added {{new es demonym mf}} for use with non-gendered demonyms (i.e. male and female are the same, such as terms in -ense). I know you hate templates but you'll have to learn to use {{demonym-adj}} and {{demonym-noun}}, which are not hard to learn. Benwing2 (talk) 22:22, 26 July 2023 (UTC)Reply
Thanks a lot! Good luck with the 'crat vote. I'll try to vote 6letter acronym (talk) 22:38, 26 July 2023 (UTC)Reply

URL quotation fixes

Your ‘correct errors in {{quote-*}} templates (manually assisted)’ bot edits have created errors where there were none (example)—parameters after |url= were shifted to the left, leaving the translation parameter void. Perhaps this is up to me for using an atypically condensed quotation format. ―⁠Biolongvistul (talk) 13:31, 30 July 2023 (UTC)Reply

@Biolongvistul Shit, you are right. I should have only added the url param when there was an equal sign. Will fix. Benwing2 (talk) 16:46, 30 July 2023 (UTC)Reply
@Biolongvistul OK, everything should be fixed. Benwing2 (talk) 18:44, 30 July 2023 (UTC)Reply

Italian multiword verbs headword

I'm assuming you changed t:it-verb with the "a/@" parameter to show conjugations. Is there a particular reason (e.g., a prior discussion) for this for the multiword Italian verbs? Imetsia (talk) 16:02, 2 August 2023 (UTC)Reply

@Imetsia Yeah I did this. It is parallel to how we do things with Spanish and English verbal expressions. In Spanish, in particular, we tend to put the multiword principal parts in the headword but not include a full conjugation table (although this is definitely doable). I think it's useful because it shows how the words of the verbal conjugation interact with the remaining words of the expression. Benwing2 (talk) 08:32, 3 August 2023 (UTC)Reply

Quotation templates again

Sorry for the second inquiry this week. What is the reason for adding named parameters to quotation templates? Is there any instability with usage of unnamed ones? Or is it to make the syntax clearer? I agree that it may be a little opaque at first, especially when there are empty parameters, but the practice can be easily learnt and it is a feature for a reason. Also, economy helps. ―⁠Biolongvistul (talk) 21:27, 4 August 2023 (UTC)Reply

@Biolongvistul Hi. There was a discussion recently concerning this, in the BP or GP (I forget which one). The problem with numbered params is that (a) they're opaque as you mention (someone not familiar with the template will have a problem interpreting them), but even more, (b) they're very fragile, esp. when there are a large number of them, as with many of these quote templates. (For example, {{quote-hansard}} had 10 numbered params, and no one was actually using this functionality.) In addition, (c) each quotation template has its own interpretation of the numbered params, different from {{m}}, {{bor}}, etc., which follow one or two standard patterns; this increases the fragility even more. There are lots of mistakes I've found where people have messed up the numbered params, and lots more mistakes where people insert a value that has an embedded unescaped vertical bar in it (typically either in a URL or a title, but sometimes in the quoted text, the author, etc.), which accidentally results in part of the param being interpreted as a numbered param. (User:JeffDoozan and I worked to fix such issues with URLs recently, and found over 1,000 such instances.) If there were no numbered params, such errors could be caught by implementing checking for unrecognized params (which I'd like to do eventually), but this isn't possible if numbered params are allowed. The replacement named params aren't that long (see the discussion I cited; there are very few more than 6 chars), and if this is an issue, we can shorten the common named params to make it easier to type. My actual plan is to replace and deprecate/eliminate the functionality for the templates where it isn't used that much, and leave it for the few templates where it's in common use (probably {{quote-book}}, {{quote-journal}}, maybe {{quote-web}}).
Quotation templates take significant work to enter in any case since there are typically a lot of params to put in, and the extra effort of using named params seems very small in comparison, different e.g. from {{m}}, {{bor}}, etc. Benwing2 (talk) 21:59, 4 August 2023 (UTC)Reply
Well, if {{quote-book}} and {{quote-journal}} stay the way they are, everything’s fine by me. Thank you for the reply. ―⁠Biolongvistul (talk) 22:04, 4 August 2023 (UTC)Reply
If we can't deprecate numbered params completely, is it possible to have the template throw an error if there are numbered params and named params? For example {{quote-book|en|1999|author|title}} would be valid, but {{quote-book|en|year=1999|author=homer|title=This title has ||bars||}} would throw a warning? That would allow the "purely numbered" params to keep working while hopefully generating a warning on any non-expected use. JeffDoozan (talk) 23:27, 4 August 2023 (UTC)Reply
@JeffDoozan Do you mean if there is an occurrence of a given numbered param and the corresponding named param? We could throw an error (a warning is unlikely to be noticed). That will catch most but not all the cases, e.g. if someone uses purely numbered params and accidentally puts an unescaped vertical bar in a numbered param. Benwing2 (talk) 00:08, 5 August 2023 (UTC)Reply
I'd throw an error if |2= is defined (even with an empty value) plus any other parameter except |3= through |8= (or whatever the maximum is for the given template). That would allow simple usage of numbered parameters, but if a quote is too complex to fit the numbered parameters, then it should use only named parameters. JeffDoozan (talk) 13:45, 5 August 2023 (UTC)Reply
@Biolongvistul Just FYI, there are at least 1,557 cases of errors in numbered params in {{quote-book}} and {{quote-journal}} alone. These are just those that a script of mine was able to catch by looking for places where a numbered param and its corresponding named param both exist. This is out of 13,000 or so total uses. Benwing2 (talk) 04:59, 10 August 2023 (UTC)Reply
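The check mentioned above (flagging calls where a numbered param and its corresponding named param both exist) could look roughly like the Python sketch below. This is not the actual script: the numbered-to-named mapping is an assumption based only on the {{quote-book|en|1999|author|title}} example in this thread, and the naive split on "|" deliberately ignores nested templates and piped links.

```python
# Hypothetical mapping, inferred from the example call in the thread.
NUMBERED_TO_NAMED = {2: "year", 3: "author", 4: "title"}


def split_params(call):
    """Split a flat template call into (positional, named) parameter dicts."""
    call = call.strip()
    if not (call.startswith("{{") and call.endswith("}}")):
        raise ValueError("not a template call")
    parts = call[2:-2].split("|")  # naive: assumes no nested {{...}} or [[...|...]]
    positional, named = {}, {}
    index = 0
    for part in parts[1:]:         # parts[0] is the template name
        name, sep, value = part.partition("=")
        name = name.strip()
        if sep and name.isdigit():
            positional[int(name)] = value   # explicit numbered param, e.g. 2=1999
        elif sep:
            named[name] = value
        else:
            index += 1                      # unnamed params are numbered in order
            positional[index] = part
    return positional, named


def find_conflicts(call):
    """List (number, name) pairs where both the numbered and the named param occur."""
    positional, named = split_params(call)
    return [(num, name) for num, name in sorted(NUMBERED_TO_NAMED.items())
            if num in positional and name in named]
```

On the two example calls given earlier in the thread, the purely numbered one produces no conflicts, while the one whose title contains unescaped vertical bars is flagged because the stray bars create numbered params 2 to 4 alongside year=, author= and title=.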

Updates on fa-IPA and transliterations

Edit: I shortened this because it was too much reading.

  • Since Saranamd asked you about adding phonetic transcriptions to {{fa-IPA}} a while ago, I think I could handle the basic character mapping. (finished) But I would still need your help exporting it.
  • I think you should consider this layout, which uses {{fa-IPA}} to generate transliterations. That would merge your work on fa-IPA with the transliteration stuff and, since entering non-standard characters would no longer be possible, solve the issues with inconsistent transliterations.
    • for the "phonetic Persian" spelling, if you think it's too much we can cut it. I could make a transliteration module for it, but I can't export it. So it's entirely up to you.

سَمِیر | sameer (مشارکت‌هابا مرا گپ بزن) 22:09, 5 August 2023 (UTC)Reply

@Sameerhameedy Hi, sorry for not responding already, I will respond tomorrow as it's my bedtime. Benwing2 (talk) 08:21, 6 August 2023 (UTC)Reply
It's fine. I'll try to do as much as I can and ask you about the rest later. سَمِیر | sameer (مشارکت‌هابا مرا گپ بزن) 00:08, 9 August 2023 (UTC)Reply
Hi, Ben. I managed to complete all of the changes to fa-IPA that Saranamd and I requested from you. Whenever you finish your current projects and have free time, could you fix the export of Classical Persian so it exports as phonemic // rather than phonetic []? Thank you, سَمِیر | sameer (مشارکت‌هابا مرا گپ بزن) 05:51, 22 August 2023 (UTC)

<syntaxhighlight> tag edit

Hi, why did you replace <syntaxhighlight> tags with <pre>? Wikitext with highlighted syntax is much easier to comprehend. JWBTH (talk) 02:53, 6 August 2023 (UTC)Reply

@JWBTH It didn't seem to me to matter much and it was a pain in the ass to maintain the syntax highlighting tags when changing all the text around. Benwing2 (talk) 05:09, 6 August 2023 (UTC)Reply
Well, it's just another tag with an extra attribute lang="wikitext". As for "a pain in the ass", Wiktionary could make use of a template like w:Template:Demo and w:Template:Nowiki template demo. That way, what we write as
 <pre># ({{plural of|eo|kiu}}) [[which]] {{gloss|relative}}
#: {{ux|eo|La elementoj '''kiuj''' troviĝis sur piratflagoj estis skeletkapo, la simbolo por la morto, skeleto indikis turmentan morton, sablohorloĝo indikis ke la tempo venis, sanganta koro malrapida kaj dolora morto kaj ponardo aŭ maĉeto signifis impulso por batali.
|The elements '''which''' were found on pirate flags were: skull, the symbol for death, skeleton indicated torturous death, hourglass indicated that the time came, bleeding heart &rarr; slow and painful death and dagger or machete signified impulse for battle.
|ref=<sup>[//eo.wikipedia.org/wiki/Jolly_Roger]</sup>}}
</pre> gives

# ({{plural of|eo|kiu}}) [[which]] {{gloss|relative}}
#: {{ux|eo|La elementoj '''kiuj''' troviĝis sur piratflagoj estis skeletkapo, la simbolo por la morto, skeleto indikis turmentan morton, sablohorloĝo indikis ke la tempo venis, sanganta koro malrapida kaj dolora morto kaj ponardo aŭ maĉeto signifis impulso por batali.
|The elements '''which''' were found on pirate flags were: skull, the symbol for death, skeleton indicated torturous death, hourglass indicated that the time came, bleeding heart &rarr; slow and painful death and dagger or machete signified impulse for battle.
|ref=<sup>[//eo.wikipedia.org/wiki/Jolly_Roger]</sup>}}
could be expressed in half the length:
{{demo|sep=gives|<nowiki># ({{plural of|eo|kiu}}) [[which]] {{gloss|relative}}
#: {{ux|eo|La elementoj '''kiuj''' troviĝis sur piratflagoj estis skeletkapo, la simbolo por la morto, skeleto indikis turmentan morton, sablohorloĝo indikis ke la tempo venis, sanganta koro malrapida kaj dolora morto kaj ponardo aŭ maĉeto signifis impulso por batali.
|The elements '''which''' were found on pirate flags were: skull, the symbol for death, skeleton indicated torturous death, hourglass indicated that the time came, bleeding heart &rarr; slow and painful death and dagger or machete signified impulse for battle.
|ref=<sup>[//eo.wikipedia.org/wiki/Jolly_Roger]</sup>}}</nowiki>}}
The template will produce the code snippet and its rendering itself. JWBTH (talk) 05:21, 6 August 2023 (UTC)Reply
@JWBTH We already have {{temp demo}} for this purpose, but as the documentation for the Wikipedia equivalents shows, you have to wrap the example template code in <nowiki>...</nowiki>, which adds a lot of verbiage. I'd rather not have to type all that; you're welcome to fix the documentation to use syntax highlighting tags if you want (but please do NOT import any templates from Wikipedia as we already have {{temp demo}} for this purpose). Benwing2 (talk) 06:24, 6 August 2023 (UTC)
Cool that you have {{temp demo}}. This template has several flaws though:
  1. It can only show the output of a template, not of generic code. So, if there is some code around the template code (like in Template:ux/documentation), we can't use it.
  2. It renders the markup (like links, bolding, and other templates) inside the code snippet and messes up the order of parameters. For example,
    {{temp demo|quote-book|fr|year=1973|author={{w|Claude Simon}}|title=Tryptique|publisher=Éditions de Minuit|page=12|passage=Les sons de la cloche '''égrenant''' les quarts, les demies et les heures {{...}}|translation=The sounds of the clock '''rattling off''' the quarters, the halves and the hours {{...}}}}
    
    shows
    {{quote-book|fr|author=Claude Simon|page=12|passage=Les sons de la cloche égrenant les quarts, les demies et les heures [] |publisher=Éditions de Minuit|title=Tryptique|translation=The sounds of the clock rattling off the quarters, the halves and the hours [] |year=1973}}
    1973, Claude Simon, Tryptique, Éditions de Minuit, page 12:
    Les sons de la cloche égrenant les quarts, les demies et les heures []
    The sounds of the clock rattling off the quarters, the halves and the hours []
  3. And no markup highlighting, of course.
Several years ago, other people and I developed an analogue of {{temp demo}}, w:ru:Template:Example. It allows adding prefixes and postfixes, alleviating problem 1 (but not solving it), and allows using {{=}} instead of = to keep the parameter order. This is inferior to w:Template:Demo in terms of overall capabilities, but still much better than {{temp demo}}.
So, I would still advise importing w:Template:Demo or w:Template:Nowiki template demo as the most straightforward solution that lacks the issues that other solutions have. JWBTH (talk) 00:22, 7 August 2023 (UTC)
@JWBTH The problem is, unless you know Lua well and also know the Wiktionary modules well, you'll end up importing a bunch of crappy dependencies that the code depends on in Wikipedia but which are duplicative of code that already exists here. That's why I'm opposed to importing stuff from Wikipedia; we end up with all this extra garbage that we then have to maintain. So I would only tolerate importing this if you rewrite the dependency invocations to use the equivalents here. People (probably including you) have done this in the past, and I've had to spend a bunch of time cleaning up after them, which I don't want to do. Benwing2 (talk) 00:46, 7 August 2023 (UTC)Reply
To me, as an interface admin at Russian Wikipedia, this is a familiar concern. We've been cleaning up our codebase for a long time, but in many cases we dropped our solutions in favor of English Wikipedia's and others', because, as a rule, they are better maintained, if only because there are more people there to maintain them. The fact that Wikimedia projects are so fragmented and unsynchronized is actually a huge problem that eats up a lot of people's effort, and measures have been suggested and implemented to address this. See mw:Multilingual Templates and Modules, for example.
The fact that people can't use templates that they've got used to in other Wikimedia projects (even if with local specifics) isn't contributing to their user experience either. First of all, I'm talking about utility templates that have no project-related specifics, like {{uses lua}}.
That said, I have no problem with integrating imported modules/templates with Wiktionary's codebase as long as this codebase is well-supported or has some unique features of Wiktionary, so that it makes sense to support it, and not just a poorly supported duplicate, often with extremely limited functionality.
> probably including you
I've only got involved in English Wiktionary this year, so it's unlikely. JWBTH (talk) 02:21, 7 August 2023 (UTC)Reply
@JWBTH My apologies, several people have been copying modules from Wikipedia, I can't keep track of them all. I would say the situation with Wiktionary is not like the Russian Wikipedia. In many ways Wiktionary is fundamentally different from Wikipedia since it's much more templatized. This means many things have to be done in a fundamentally different way, which is why the infrastructure is different. If you are referring to my rename of the {{Lua}} template to {{uses lua}}, this was because we already have a template called {{lua}} and in general Wiktionary templates don't begin with a capital letter. (Wiktionary has been around long before Lua came on the scene, and I suspect the use of an initial capital in templates on Wikipedia postdates Wiktionary's creation.) From what I've seen, many of Wiktionary's general modules do things better than English Wikipedia due to the need to support the more templatized wikicode. Benwing2 (talk) 03:39, 7 August 2023 (UTC)Reply
> If you are referring to my rename of the {{Lua}} template to {{uses lua}}
No, that rename is OK. (But I thought using lowercase letters for templates has to do primarily with the case-sensitivity of Wiktionary, so it shouldn't matter much which case the letters are in as long as their usage is consistent, and "Lua" is a proper name, so it could make a certain sense? But of course having a {{Lua}}/{{lua}} collision is undesirable.)
> From what I've seen, many of Wiktionary's general modules do things better than English Wikipedia due to the need to support the more templatized wikicode.
I've already noticed this, and was pleasantly surprised that Wiktionary has such a developed module architecture. But still, there are things that Wikipedia modules would do better. Speaking of which, Wikipedia for one has a great architecture for template testing described at w:Wikipedia:Template sandbox and test cases, and I haven't seen anything like that here. Having such an architecture could really come in handy for template/module development, especially given that Wiktionary is so heavily templatized, as you say. JWBTH (talk) 09:05, 7 August 2023 (UTC)
@JWBTH Almost all templates use lowercase initial letters, so for consistency we should do the same here. As for template sandboxes and testing, yeah this would be good to have; I test by putting a copy of the relevant modules in my userspace, but maybe there's a better way. I know User:Erutuon uses some special functionality for this, but I don't know how it works. Benwing2 (talk) 01:40, 8 August 2023 (UTC)Reply
So, after all, why don't we import w:Module:Demo? Its only dependency is Module:Yesno, and Wiktionary already has that. I like working with template documentations, and I'm not a fan of copypasting stuff. JWBTH (talk) 09:18, 9 August 2023 (UTC)Reply
@JWBTH Module:Yesno is an example of a crappy copied module. Whoever copied it was unaware of Module:yesno, which already existed. You can copy w:Module:Demo if you want but please rename it to use a lowercase letter Module:demo, delete the p.module function (which is used to invoke other modules that don't exist in Wiktionary), make sure all non-exported functions are local and fix it to use Module:yesno. Benwing2 (talk) 04:48, 10 August 2023 (UTC)Reply
Also, the invoking template should be named {{demo}} with a lowercase initial letter. Benwing2 (talk) 04:50, 10 August 2023 (UTC)Reply
OK.
> which is used to invoke other modules that don't exist in Wiktionary
Nah, it's actually used to reduce server load to show examples of module usage.
> Whoever copied it was unaware of Module:yesno, which already existed.
Could a filter for mw:Extension:AbuseFilter be created to prevent this from happening? It could at least warn users that the first lowercase letter is the convention. It could also check for the existence of the page with the first lowercase letter and direct the editor there if it exists.
The problem with Module:Yesno is that it is currently specified as the interlanguage link for w:Module:Yesno (it was even purposefully changed from Module:yesno, see the edit; @Kwamikagami hi there, we believe your action was wrong, see above). Unless there is some meaningful difference in the behavior of Module:yesno and Module:Yesno (I see only the treatment of t and f values which can easily be added to Module:yesno without plausibly breaking anything), I believe they could be merged. JWBTH (talk) 09:38, 10 August 2023 (UTC)Reply
Yes, I would think the two should be merged. And yes, if there were a warning that the opposite capitalization already exists, that might help. (Doesn't the mainspace warning work in template space?)
Currently our 'yesno' template has the comment "It works similarly to the template {yesno}," so that should presumably be removed.
14 other wikt's have duplicate templates. These and the WD items should probably also be merged. kwami (talk) 16:14, 10 August 2023 (UTC)Reply
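As a rough illustration of the capitalization check proposed above (warn when a page differing only in the case of its first letter already exists), here is a minimal Python sketch. It is not AbuseFilter rule syntax, and the function names and the set of existing titles are hypothetical stand-ins for a real page-existence lookup.

```python
def opposite_case_title(title):
    """Return the title with the case of the page name's first letter flipped.

    Returns None when the page name does not start with a letter.
    """
    namespace, sep, name = title.partition(":")
    if not sep:  # no namespace prefix, so the whole title is the page name
        namespace, name = "", title
    if not name or not name[0].isalpha():
        return None
    flipped = name[0].lower() if name[0].isupper() else name[0].upper()
    prefix = namespace + ":" if sep else ""
    return prefix + flipped + name[1:]


def capitalization_warning(new_title, existing_titles):
    """Return a warning string if the opposite-case page exists, else None."""
    other = opposite_case_title(new_title)
    if other is not None and other in existing_titles:
        return f"{other} already exists; consider editing it instead of creating {new_title}."
    return None
```

With `existing_titles = {"Module:yesno"}`, creating Module:Yesno would produce a warning pointing at Module:yesno, while creating Module:demo would not.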

Blocked from logging into the English Wikipedia

Hello Benwing. I hope I don't trouble you by asking for your help: I just tried logging into the English Wikipedia using this username (0DF) and my password, but it won't let me, seemingly because I'm subject to an IP-range block (instituted by w:User:Yamaguchi先生 and running until 2025). Any idea what I can do to fix this, please? 0DF (talk) 01:00, 8 August 2023 (UTC)Reply

@0DF I thought IP range blocks don't apply to accounts, only to anonymous IP's, but I may be wrong; maybe it depends on how the block was instituted. I suspect User:Chuck Entz would have a better idea, although neither of us can do anything about Wikipedia; you'd presumably have to contact an admin there (although I don't know how you'd do that if you are blocked). Benwing2 (talk) 01:10, 8 August 2023 (UTC)Reply
To the last question, I think the 'intended' route is to e-mail the admin who implemented the block. (If the block also disables sending e-mail, I guess you might try pinging—on your talk page on Wiktionary—the admin who implemented the block, and/or pinging someone like Thryduulf who's an admin and active user on both sites and could pass your situation along.) I don't actually spot where Yamaguchi has implemented any long blocks recently, though. (Are you being hit by the range block on 2a02:c7c::/30?) - -sche (discuss) 01:32, 8 August 2023 (UTC)Reply
The block menu has an option to apply the block to logged-in users, which I almost never use except on very short-term blocks. The fact that this came to light during a login attempt might indicate it has something to do with account creation being blocked, in which case it would only apply to the first visit to Wikipedia with the account. As for what to do: I believe any Wikipedia admin can make an account IP-block exempt locally. I don't think anyone at Wikipedia would object, because it only applies while logged in to that one account and doesn't stop them from blocking the account itself in the event of any abuse. Chuck Entz (talk) 05:11, 8 August 2023 (UTC)Reply
Thank you all for your responses.
@Benwing2: Yes, I tried to make contact with Yamaguchi and others on the English Wikipedia (via e-mail and automated forms), but without success.
@-sche: Yes, it's this range-block that's affecting me. Per your advice: @Yamaguchi先生, Thryduulf, could you help me with this issue, please?
@Chuck Entz: I think you're right. I was able to log into the German Wikipedia without issue. If I could only log into the English Wikipedia just once, I'm sure the block would no longer cause me problems.
0DF (talk) 15:52, 8 August 2023 (UTC)Reply
@0DF I see there have been complaints about this block on Yamaguchi's talk page. I posted about the effect it's having on you; hopefully they will respond. Benwing2 (talk) 19:39, 8 August 2023 (UTC)Reply
@Benwing2: I've just now been able to log in to the English Wikipedia. Thank you very much for your intervention. 0DF (talk) 23:38, 8 August 2023 (UTC)Reply

WingerBot changing {{bg-phrase}} to {{head|bg|phrase}}

Hi,

I've just noticed you deleted {{bg-phrase}} - what was the issue with it? Should the following code from Module:bg-headword be removed too?

pos_functions["phrases"] = function(postype, def, args, data)
	local params = {
		[1] = {required = true, list = "head", default = def},
		["id"] = {},
	}

	local args = require("Module:parameters").process(args, params)

	data.heads = args[1]
	data.id = args.id
end

In the future, it would be nice to get a heads-up - not to me necessarily, but to some Bulgarian editor. E.g. I was planning on adding more phrases to the Bulgarian phrasebook, which is how I noticed the template was gone.

Thanks,

Chernorizets (talk) 06:13, 9 August 2023 (UTC)Reply

@Chernorizets My apologies, I just rewrote Module:bg-headword to add various features and to support comparatives and superlatives in adjectives. In the process I noticed {{bg-phrase}}; the general thinking now is to avoid having templates like this that are a trivial wrapper around {{head}} (note for example we have no {{bg-interj}}, {{bg-con}} [for conjunctions] or the like). Here instead you can write {{head|bg|phrase|head=...}} or its shorter equivalent {{h|bg|phr|head=...}}. Most other Slavic (and non-Slavic) languages don't have a {{LANG-phrase}} template, preferring to use {{head}} directly, and for the ones that used to, I recently deleted them after converting the uses to use {{head}}. The logic behind this is that having all these extra templates (each of which invariably ends up working a little differently from the others) adds up to a big maintenance headache in the aggregate. In the future I'll let you know if I'm going to make any such deletions. Benwing2 (talk) 06:18, 9 August 2023 (UTC)
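The conversion described above can be sketched with a small Python function. This is a guess at the general shape, not the bot's actual code: it assumes simple calls with no nested templates or links inside the parameters, and it maps extra positional heads to head2=, head3= and so on, which is an assumption based on the list = "head" declaration in the Lua snippet quoted earlier.

```python
import re

# Matches {{bg-phrase}} with simple parameters only (no nested braces or pipes).
BG_PHRASE = re.compile(r"\{\{bg-phrase((?:\|[^{}|]*)*)\}\}")


def convert_bg_phrase(wikitext):
    """Rewrite {{bg-phrase|...}} calls as {{head|bg|phrase|head=...}}."""

    def repl(match):
        parts = match.group(1).split("|")[1:]  # drop the empty piece before the first "|"
        out = ["head", "bg", "phrase"]
        head_count = 0
        for part in parts:
            if "=" in part:
                out.append(part)  # named parameter such as id=..., kept as-is
            elif part:
                head_count += 1
                suffix = "" if head_count == 1 else str(head_count)
                out.append(f"head{suffix}={part}")
        return "{{" + "|".join(out) + "}}"

    return BG_PHRASE.sub(repl, wikitext)
```

For example, `convert_bg_phrase("{{bg-phrase|добър ден}}")` gives `{{head|bg|phrase|head=добър ден}}`; other templates are left untouched.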
@Benwing2 do your changes affect how {{bg-adj}} and {{bg-adv}} work? We already had comparatives and superlatives for both adjectives and adverbs, and those two templates are widely used.
As for {{bg-phrase}}, the reasoning makes sense. It didn't seem to be doing much beyond {{head}}. Thanks for explaining. And just to be clear, you don't have to ping me specifically for template or module edits - any Bulgarian editor would do. I'm sure you're thorough, but IMO at least one other person should be clued in just to reduce the surprise factor.
Thanks,
Chernorizets (talk) 06:33, 9 August 2023 (UTC)Reply
@Chernorizets My changes affect headwords; {{bg-adj}} didn't support comparatives, while {{bg-adv}} did (and does). As for changes, see Template talk:bg-adecl where Kiril requested some changes, and I explained what changes I made. The most important one besides supporting comparatives for {{bg-adj}} is that {{bg-adv}} no longer defaults to displaying comparatives; instead you need to request them using |comp=+ or |2=+. The reason for this is that many people (esp. people who are more occasional contributors) won't expect that just writing e.g. {{bg-adv}} would generate a default comparative and you have to say {{bg-adv|HEAD|-}} to get no comparative, and would just go ahead and write {{bg-adv|HEAD}} by itself regardless of whether there's actually a comparative or superlative. So now, {{bg-adv}} by itself doesn't make any assumptions about comparatives; you have to request them explicitly as I mentioned above, or say {{bg-adv|-}} to specifically indicate that there's no comparative. Benwing2 (talk) 06:57, 9 August 2023 (UTC)Reply
@Benwing2 sorry, I guess I was thinking about {{bg-adecl}} but wrote {{bg-adj}}. Could either you or Kiril update the documentation of {{bg-adv}} and {{bg-adj}} to reflect the changes? Esp. for {{bg-adv}} since IIRC it used to show comparative & superlative by default.
Thanks for making improvements and the automated bot runs!
Chernorizets (talk) 07:16, 9 August 2023 (UTC)Reply
@Chernorizets Yup, I'll update the docs. Benwing2 (talk) 07:42, 9 August 2023 (UTC)Reply
@Benwing2 you may already be aware of this if you look at CAT:E, but your change inadvertently broke {{bg-part form}} by obsoleting its g parameter. This is evident even on the template's documentation page, and it affects ~1400 verb form articles. Chernorizets (talk) 07:44, 9 August 2023 (UTC)Reply
@Chernorizets Oops, thanks for letting me know. I thought participle forms didn't use the g= param and should have checked (my excuse is it's late and time for bed :) ...). I fixed this and am running a purge bot job on Category:Pages with module errors. Benwing2 (talk) 07:53, 9 August 2023 (UTC)Reply
@Benwing2 thanks for the quick turnaround! I'm on PST (WA) so I know what you mean :-) Chernorizets (talk) 08:10, 9 August 2023 (UTC)Reply

Reporting unwelcome edits that don't quite rise to the level of vandalism

Hi @Benwing2, is there a mechanism for doing that? RE: Special:Contributions/Djkcel - as of 21:06 on 8/12/2023, their last two edits are to remove portions of Bulgarian etymology sections that mention the PIE roots for the respective lemmas. At least according to their user page, they're not a Bulgarian speaker. I've asked about гъдулка (gǎdulka) on their talk page, and then I noticed a similar edit on свиня (svinja).

Thanks,

Chernorizets (talk) 01:02, 13 August 2023 (UTC)Reply

@Chernorizets Looks like this user has a long history of editing in languages they don't know and refusing to change. They have been blocked several times for this. Generally in cases like this you post on the user's talk page like you did, and if necessary bring it up in the Beer Parlour. It may be necessary to institute a longer block, depending on (e.g.) what the user says. Benwing2 (talk) 01:11, 13 August 2023 (UTC)Reply

Mixed numbered/named parameters

I've only spotted this once so far, but it looks like there was a small problem replacing numbered parameters in quote-* when the template also used some named parameters.

It fixed four templates that had numbered parameters only, but skipped two that had named month= in addition to numbered year.

Happy editing, Cnilep (talk) 01:55, 14 August 2023 (UTC)Reply

@Cnilep I haven't yet converted numbered params in {{quote-journal}} or {{quote-book}} (which are the only two templates still supporting numbered params). Benwing2 (talk) 01:56, 14 August 2023 (UTC)Reply

One more

bit: Etymology 2: Adjective. Chuck Entz (talk) 02:32, 14 August 2023 (UTC)Reply

@Chuck Entz If you see other such cases that you think I might miss, let me know. The quote templates are a mess because there are a zillion params and historically there hasn't been any param checking. I have a script to check for unrecognized params in the 522,000 or so quote-* templates; I'm down to about 400 cases of completely unrecognized params (from maybe 5,000 to begin with) but then there are all these other issues that appear when I start checking for duplicate aliased params and such. Benwing2 (talk) 03:27, 14 August 2023 (UTC)Reply
blanc-manger and kuiperoid (in case you're still up). Chuck Entz (talk) 04:45, 14 August 2023 (UTC)Reply

Brazilian nasalization of vowels before nasal consonants

The pronunciation module for Brazilian Portuguese by default makes all vowels before nasal consonants nasal as well, i.e. /ˈkɾẽ.mi/ instead of /ˈkɾe.mi/ for creme, and /ˈkɐ̃.nɐ/ instead of /ˈkɐ.nɐ/ for cana, but as a Brazilian I can confirm this is utterly wrong. I've never heard anyone pronounce those words in such a way. What is that based on? - Munmula (talk) 10:53, 14 August 2023 (UTC)

@Munmula It does this with stressed vowels but not unstressed vowels. What part of Brazil are you from? I have definitely heard stressed vowels pronounced nasal before a nasal consonant; this is why the vowel is raised/centered in cana and cama, otherwise why would this pronunciation happen? Although granted my experience with Brazilian Portuguese is mostly from Salvador and maybe this is a Northeast thing. I am basing this off of Wikipedia, which says this:
Another difference between Northern/Northeastern dialects and Southern/Southeastern ones is the pattern of nasalization of vowels before ⟨m⟩ and ⟨n⟩. In all dialects and all syllables, orthographic ⟨m⟩ or ⟨n⟩ followed by another consonant represents nasalization of the preceding vowel. But when the ⟨m⟩ or ⟨n⟩ is syllable-initial (i.e. followed by a vowel), it represents nasalization only of a preceding stressed vowel in the South and Southeast, as compared to nasalization of any vowel, regardless of stress, in the Northeast and North. A famous example of this distinction is the word banana, which a Northeasterner would pronounce [bɐ̃ˈnɐ̃nɐ], while a Southerner would pronounce [baˈnɐ̃nɐ].
This pronunciation of banana rings true in my ears, but again my experience is Salvador in the late 1990's/early 2000's and maybe things have changed since then, especially in the South/Southeast? Benwing2 (talk) 19:31, 14 August 2023 (UTC)
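To make the quoted rule concrete, here is a minimal Python sketch of it. It mirrors only the quoted Wikipedia passage (which is disputed in this thread), not the actual pronunciation module. The input format (a list of orthographic syllables plus the index of the stressed one) is a hypothetical simplification, and the digraph nh is skipped because the passage does not address it.

```python
NASALS = ("m", "n")


def nasalized_syllables(syllables, stressed, northern):
    """Return one bool per syllable: is its vowel nasalized under the quoted rule?

    syllables: orthographic syllables, e.g. ["ba", "na", "na"] (hypothetical format)
    stressed:  index of the stressed syllable
    northern:  True for the North/Northeast pattern, False for South/Southeast
    """
    flags = []
    for i, syl in enumerate(syllables):
        nxt = syllables[i + 1] if i + 1 < len(syllables) else ""
        # (a) m/n closing the syllable and followed by another consonant:
        #     nasalizes the vowel in all dialects and all syllables.
        coda_nasal = syl.endswith(NASALS) and nxt != ""
        # (b) m/n opening the next syllable: nasalizes only a stressed vowel
        #     in the South/Southeast, any vowel in the North/Northeast.
        onset_nasal = nxt.startswith(NASALS) and not nxt.startswith("nh")
        flags.append(coda_nasal or (onset_nasal and (northern or i == stressed)))
    return flags
```

For banana (`["ba", "na", "na"]`, stress on index 1) this gives `[True, True, False]` with `northern=True` and `[False, True, False]` with `northern=False`, matching [bɐ̃ˈnɐ̃nɐ] and [baˈnɐ̃nɐ] in the passage. For creme (`["cre", "me"]`, stress on index 0) the rule marks the first vowel as nasalized in both settings.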
That passage is unsourced so I wouldn't give too much credibility to it. I'm from São Paulo city and in my experience the nasalization happens when the vowel is followed by M or N and then another consonant (as in the words banco, sombra, tinta) or by another vowel but in a different word (as in the phrases bem alto and vim a pé, which are pronounced as if they were written benhalto and vinhapé respectively). But I've never seen nasalization when it is a vowel, then M or N, and then another vowel in the same word, as in creme, which is pronounced /ˈkɾe.mi/ but not /ˈkɾẽ.mi/.
Maybe we should ask other Brazilian editors about their stance? Besides @Daniel Carrero and @Ungoliant MMDCCLXIV, both of whom I already know from this wiki, I'm pinging some recently active pt-N users who are or might be Brazilian: @Cpt.Guapo, @Bezwzględny, @Baudelairesantos, @Capmo, @Gmestanley, @Holodwig21, @Jesielt, @Junglk, @LearningFromTheCradleToTheGrave, @OweOwnAwe, @Protegmatic, @Psi-Lord, @Vocênãosabeenemeu - Munmula (talk) 12:07, 15 August 2023 (UTC)
I'm no expert in phonetics, but doesn't this sound nasalized? [8] Jesielt (talk) 12:58, 15 August 2023 (UTC)Reply
It sounds more like [ˈkɾeːmi] to me. Gmestanley (talk) 03:39, 3 September 2023 (UTC)Reply
I feel like /ˈkɾẽ.mi/ would fit northeastern Brazilian Portuguese, but I don't think that anyone south of São Paulo speaks like that. I personally pronounce it /ˈkɾe.mi/. Maybe the last vowel isn't quite [i] either, but I don't know how to properly represent it.
Northeastern Brazilian Portuguese is known for its nasal vowels. The Northeast is also a region made up of many different states, and a northeastern friend of mine told me that they have many different dialects, so I wouldn't be able to give you guys more details about it, as I don't live in that region. That's what I know. LearningFromTheCradleToTheGrave (talk) 13:15, 15 August 2023 (UTC)
I am from the Northeast (specifically Pernambuco) and [ˈkɾẽ.mi] is absolutely not what we do for creme. I pronounce it like [ˈkɾe.mi]. Gmestanley (talk) 03:35, 3 September 2023 (UTC)Reply
Seriously? I'm from Maranhão and I pronounce this as [ˈkɾẽ.mi] Stríðsdrengur (talk) 01:18, 1 January 2024 (UTC)
Despite being from the São Paulo State, I also nasalise the first syllable of banana, as well as cama and cana. But not creme, so I guess it only happens with the vowel A. Capmo (talk) 15:28, 15 August 2023 (UTC)
I unfortunately do not know much about the situation since I’m not a speaker of Brazilian Portuguese. I know that Brazilian Portuguese has dialects that differ in pronunciation; however, my knowledge of the language is not as extensive. From the little that I know, I haven’t heard any speaker of Brazilian Portuguese say creme and cana with nasalized vowels. 𐌷𐌻𐌿𐌳𐌰𐍅𐌹𐌲𐍃 𐌰𐌻𐌰𐍂𐌴𐌹𐌺𐌹𐌲𐌲𐍃 (talk) 16:58, 15 August 2023 (UTC)
@Munmula OK let's see if we can gather some more info. We have the ability to generate multiple dialects and already do this in fact, but it's tricky because (a) there is insufficient info out there in many cases, (b) for the Northeast it seems there are a lot of different dialects so I'm not really sure how to handle them. Does anyone have any info on how the vowels are pronounced in creme, cama and banana in Rio? This is one of the dialects we specifically highlight. Benwing2 (talk) 19:41, 15 August 2023 (UTC)Reply
@Munmula BTW I take it from your statement about vim a pé being pronounced like vinhapé that both are pronounced /vĩj̃aˈpɛ/? This is approximately what we already generate with regards to written nh so I'm glad to get confirmation. Benwing2 (talk) 19:45, 15 August 2023 (UTC)Reply
Yes, that is the case. - Munmula (talk) 11:51, 18 August 2023 (UTC)Reply
I'm Brazilian, from São Paulo, and I do think that the nasalization of a vowel (especially of /e/, /i/, /o/ and /u/) in this case isn't necessarily phonemic. As for me, I would pronounce creme as /ˈkɾe.mi/; if there's any nasalization of /e/ in this case, at least in my accent, I wouldn't say it is that perceptible. I've just watched some videos of people pronouncing such words on [9] YouGlish, and it seems to me that /ɐ/ might be more often nasalized than other vowels in this case. It's also important to keep in mind that /ẽ, ĩ, õ, ũ/ are very often diphthongized ([ẽj̃, ĩj̃, õw̃, ũw̃]), which doesn't happen with /ɐ̃/; therefore, I think that the distinction of nasalization in the latter is less perceptible than in the former (pente = [ˈpẽj̃.t͡ʃi], but pena = [ˈpe.nɐ]). OweOwnAwe (talk) 23:31, 15 August 2023 (UTC)
Thanks for your comments. The issue of phonemicity is complicated by minimal pairs like caminha "small bed" with /ɐ̃/ vs. caminha "he/she walks" with /a/ (BTW how do you speakers from São Paulo pronounce these two words?), so I'd rather not get too hung up on whether the nasalization is phonemic; it's more important IMO (at first at least) to figure out where it exists at all. Benwing2 (talk) 23:44, 15 August 2023 (UTC)Reply
Most people would pronounce them [kɐˈmĩ.j̃ɐ] and [kaˈmĩ.j̃ɐ], respectively. Pronouncing the second word as [kɐˈmĩ.j̃ɐ] is also possible, although I think it is less common in São Paulo. OweOwnAwe (talk) 01:00, 16 August 2023 (UTC)
Hello, sorry for taking this long to reply to this thread. I don't have notifications enabled for Wiktionary (yet). I am in fact Brazilian, and as I said in another reply, from the Northeast; specifically Pernambuco. I have never seen that kind of nasalization for a root word with a vowel before M or N and then another vowel either, nor do I see it with phrases like bem alto and vim a pé. For those, I say /ˈbeĩ ˈautu/ [ˈbeĩ.ˈäu.t̪u] and /ˈvĩ.a.ˈpɛ/ [ˈvĩ.ä.ˈpɛ]; benhalto and vinhapé contrast for me, as they would be /beˈj̃au.tu/ [beˈj̃äu.t̪u] and /vij̃a.ˈpɛ/ [vij̃ä.ˈpɛ]. I've also heard cana being pronounced as /ˈkɐ.na/ instead of /ˈkɐ̃.nɐ/, and not every Northeasterner pronounces banana as [bɐ̃ˈnɐ̃nɐ]; we just do it as /ba.ˈnɐ.na/ here. Gmestanley (talk) 03:57, 3 September 2023 (UTC)
Just checked my settings, and I do actually have notifications via email enabled, I've just not been seeing them on my email provider. I thought Wiktionary had some sort of push notification system that I just had disabled. Gmestanley (talk) 04:05, 3 September 2023 (UTC)Reply
My two cents: the nasalization of vowels that precede nasal consonants is said to have been an African influence (Africanismos no português do Brasil, Maria do Socorro Silva de Aragão, Rev. de Letras - Vol. 30 - 1/4 - jan. 2010/dez. 2011, Universidade Federal do Ceará). This maybe explains the pronunciation in Salvador.
I was born in Bahia (Recôncavo region) and lived in Salvador for 8 years and I can confirm that this type of nasalization is very common. The same cannot be said of the State of Santa Catarina, where I now live. 2804:1790:83AC:D200:2C7E:5128:78AC:55FA 06:42, 24 August 2023 (UTC)
Thank you! I will be fixing this module up soon, probably according to what User:Munmula said below. Benwing2 (talk) 07:12, 24 August 2023 (UTC)Reply
Thanks for your attention. If I'm not too late or off-topic, there are also other changes I would like to propose to the module to represent Brazilian pronunciations:
1 - The inclusion of the alveolar trill (/r/) for initial and double Rs (as in rádio and arroz) as an alternative or old-fashioned pronunciation in Brazil. I don't know about Portugal and other countries, but in Brazil some older people, or even younger people imitating a more "vintage" style, may still pronounce words that way, and it can be found in old videos. Take for example this subtitled speech of president Getúlio Vargas in 1951. You may find that pronunciation in stuff from the first half of the 20th century until about 1970.
2 - A similar old-fashioned pronunciation of Ls in the end of words as /l/ instead of /w/, such as /na.si.oˈnal/ instead of /na.si.oˈnaw/ for the word nacional. This can also be heard in the video linked above or in virtually any other old stuff.
3 - The occasional pronunciation of que in some words as /kwe/ instead of /ke/. Former president Jair Bolsonaro, for example, is known for pronouncing the word questão as /kwesˈtɐ̃w̃/ instead of the more common /kesˈtɐ̃w̃/. This is also reminiscent of a dated pronunciation of "que".
4 - The Caipira dialect and the accent/dialect of Northeastern Brazil, two of the most distinguishable in the country, seem to be missing from the module, being present only in some words. People from the Northeast would pronounce disto as /ˈdiʃ.tu/, just like in European Portuguese, while Caipiras would pronounce words like amor and porta with a retroflex R (/ɹ/), similar to how dark is pronounced in American English. - Munmula (talk) 16:18, 24 August 2023 (UTC)Reply
Thank you for acknowledging that the Northeastern dialect is seldom seen on words. It is astounding how ignored it is when it really is one of the more distinguishable dialects in the country.
I've also noticed the characteristic of /ke/ being realized as [kwe], and also figured that maybe it was a thing with older people. Gmestanley (talk) 03:31, 3 September 2023 (UTC)Reply
@Gmestanley @Munmula I am not intentionally ignoring the Northeastern dialect, it's rather that I don't have a good reference on it (esp. on the occurrence of low-mid vowels in unstressed pretonic syllables) and it seems there are lots of different Northeastern dialects, so we'd have to figure out which one(s) to display. (As I mentioned above, the accent I'm most familiar with is the one of Salvador, which is a Northeastern accent, notable for luz /lujs/ and such.) If you could either list the relevant sound changes to generate this dialect or point me to a good reference outlining the sound changes, I can implement it. Same for the Caipira dialect. As for old-fashioned pronunciations, I'd rather hold off on including them for now as the code and output is already quite complex and niche pronunciations like this seem like they will just add noise. (Similarly, there are still people who say /ˈʒẽti/ instead of /ˈʒẽtʃi/, and in fact this used to be characteristic of São Paulo accents, but we don't display this variant to avoid added noise and undue weight placed on less common pronunciations.) Benwing2 (talk) 04:17, 3 September 2023 (UTC)Reply
Thank you everyone. Judging by what was said in this discussion, it seems to me that keeping the nasalization in As but removing it from other vowels would be a reasonable change. How does it sound to you guys? Perhaps we would also include phonetic Brazilian pronunciations of "en" and "in" as /ẽj̃/ and /ĩj̃/ respectively. - Munmula (talk) 11:51, 18 August 2023 (UTC)Reply
@Munmula What is the rule for pronunciation of en and in? Currently the module generates /ẽj̃/ for word-final -em but not word-internally. Should it also happen word-internally? Benwing2 (talk) 18:05, 18 August 2023 (UTC)Reply
I think /ẽj̃/ and /ĩj̃/ are more appropriate as phonetic transcriptions, but we should rather check a reliable source before changing it. - Munmula (talk) 22:02, 18 August 2023 (UTC)Reply
@Munmula Just a small add-on to the discussion; Celso & Cintra's Nova Gramática do Português Contemporâneo does not list any nasalization in standard Brazilian Portuguese (São Paulo and Rio, for them). That doesn't mean it doesn't occur, though. Also, another thing, though this would be another discussion, is that they note that contrary to final "o" that is realized as "u", final "a" is not realized as "â" (like in European Portuguese) in standard Brazilian. Is that the case? - Sarilho1 (talk) 16:18, 8 September 2023 (UTC)Reply
Hmm does that book mention an exception for the 'an' preceding either vowels or consonants? As to that final a/â distinction in Brazilian Portuguese, that is something tricky I always wondered about. I think I don't have enough knowledge on phonetics to properly distinguish between them and say how exactly we pronounce final unstressed As. - Munmula (talk) 11:06, 11 September 2023 (UTC)Reply

Section

Great work fixing all these quotations! diff raises a problem I have run into before. It shunts the material to the end of the cite, and I think what that means is that "Section" means something like "Sports Section" or "Local News Section", not 'the parts within the article'. No idea what a "solution" would be here, just noting that this is strange. --Geographyinitiative (talk) 11:46, 14 August 2023 (UTC)Reply

@Sgconlaw Can you help with this? I have also seen this before. The |section= param puts its value after the volume and especially the publisher, when it seems it ought to go before them. What does the APA say about this? Benwing2 (talk) 19:23, 14 August 2023 (UTC)Reply
The intent of |section= was to have a catch-all parameter that could be used to indicate various minor subdivisions of a work, such as parts, paragraphs, stanzas, etc. Where a journal article or chapter of a book has been divided into lettered or numbered sections, I have sometimes indicated such a section like this: |section=section 1.3 (Name of Section). However, if this type of section is not lettered or numbered, and I've felt it is useful to indicate the section, I have sometimes added it to the article title or book chapter name like this: "Article Title. [Section Name.]" I'm afraid I'm not familiar with the APA Style, so I don't know if what I have done is in accordance with that guide. — Sgconlaw (talk) 20:06, 14 August 2023 (UTC)Reply
@Sgconlaw OK thanks, I think I'm gonna move it after the title rather than put it at the end. Benwing2 (talk) 21:02, 14 August 2023 (UTC)Reply
FWIW, I use the |section= parameter in the same way @Sgconlaw describes. 0DF (talk) 22:52, 15 August 2023 (UTC)Reply
Well let me give you another example of how there's some kind of problem in this area: diff. Here, the section is Taiwan News (section of the newspaper). And the title is the title at the top of the page. The journal is the newspaper. But! There's a subsection of the article "Temperatures to rise" which is pretty darn relevant to the quotation, but yet I just omit it because there's no parameter for it. And then: on diff, I would add "section=Tung lay-up plans", but it would be strange to see that at the end of the citation (to my eye). --Geographyinitiative (talk) 10:53, 18 August 2023 (UTC)Reply
@Geographyinitiative: Do you have a different presentation in mind? 0DF (talk) 11:23, 18 August 2023 (UTC)Reply
Maybe there should be a "subheading" parameter for quote-journal? Or something like that? Idk. --Geographyinitiative (talk) 11:24, 18 August 2023 (UTC)Reply
@Geographyinitiative: That would be good. And maybe |subheadingn= for further subdivisions, if that were considered desirable. Personally, I'd also appreciate |footnote= and |endnote= parameters for quoting from foot- and endnotes. 0DF (talk) 12:53, 18 August 2023 (UTC)Reply
@0DF You can quote from footnotes and endnotes using something like |page=15 + |line_plain=footnote 3. Let me see what I can do about journal article sections; User:Sgconlaw do you have thoughts? Benwing2 (talk) 18:04, 18 August 2023 (UTC)Reply
OK, sure; I'll use |line_plain= for foot- and endnotes in future. 0DF (talk) 00:54, 19 August 2023 (UTC)Reply
@Benwing2: I have been indicating footnote numbers like this: |section=footnote †. As for journal article sections, see my comment above on 14 August 2023. — Sgconlaw (talk) 20:59, 18 August 2023 (UTC)Reply
@Sgconlaw The problem is you're overloading |section= to indicate lots of different things, which makes proper formatting very difficult. We need to use different parameters for semantically different information, and not do things like bundle the section name or number with the article title. That's why I recommend using |line_plain=, because a footnote is much like a line and not much like a chapter. In my view, |section= is for subdivisions of a work similar to chapters, if "chapter" doesn't make sense, e.g. "act II, scene 2"; this is what the documentation specifically says. For a journal article, |section= should be for sections of the article. If we need to indicate information such as sections of the overall collection (i.e. journal; for books the collection and work are the same), we need a separate param for that. Benwing2 (talk) 21:54, 18 August 2023 (UTC)Reply

Related: [10] I plan to move Chinese author names like this; of course, both this version and my original version are abuses/work-arounds. Let me know if you have any solutions on this for me. --Geographyinitiative (talk) 22:11, 16 August 2023 (UTC)Reply

@Geographyinitiative You should use the inline modifier support that I recently added, as shown in the example. It doesn't work with {{lw}} because of the language tagging that {{lw}} adds; I'll come up with a workaround. Benwing2 (talk) 00:21, 17 August 2023 (UTC)Reply

Polish headword adjustments

Would it be possible to 1) add the relational adjective parameter to the Polish headword noun and 2) bot convert pages with the relational adjectives in the derived/related section to the headword? and 3) add "indeclinable" as a categorizing parameter to {{pl-adj}}? Vininn126 (talk) 22:35, 14 August 2023 (UTC)Reply

Also perhaps we can do this for other Slavic languages, at least Czech, and maybe we can also remove the declension from the headword like we were discussing in that thread? Vininn126 (talk) 09:38, 15 August 2023 (UTC)Reply
@Vininn126 Yup, all in good time :) ... I can do (1) and (3) easily and have a script where I did (2) for Russian so I might be able to leverage this. For removing the declension from the headword I'd like to keep the declensions in irregular cases, so this will take a bit of work. Benwing2 (talk) 19:38, 15 August 2023 (UTC)Reply
Thanks a ton! Vininn126 (talk) 19:41, 15 August 2023 (UTC)Reply
I think the way (2) might have to be handled is to check etylines and also deflines of adjectives... But I'm afraid a lot are missing etylines... Vininn126 (talk) 22:03, 15 August 2023 (UTC)Reply
@Vininn126 I took a look at the Russian script I wrote. It attempts to add both relational adjectives and diminutives to noun headwords. For relational adjectives it looks at etymologies and assumes that if an adjective is defined as noun + suffix and has a relational label, it's a relational adjective of that noun; and for diminutives it looks for {{diminutive of}} and similar. But rather than just automatically convert such cases, it has a verification step: it outputs a list of potential cases (along with the definitions of the nouns and potential relational adjectives), which are manually edited to remove the bogus ones, and then in a separate pass it applies them, which involves adding them to the noun headword and removing the added term from ==Derived terms== or ==Related terms== of the base noun. Note that I also have a separate script for Russian that attempts to add etymologies for relational adjectives and such, essentially by trying, for each noun, to construct possible relational adjectives using certain suffixes (-ный, -ной, -ский, -ской, -овый, -евый, -ёвый, etc.) and palatalization rules, and seeing if such an adjective exists. If so, it's output for manual review (again with the definitions of noun and adjective included), and the etymologies that remain after editing are added in a separate step. (The script also handles adjective-noun derivations using -ство, verb-noun and noun-noun derivations using -ник, etc.) I think I ran the second script before the first one, so that the added etymologies could feed the relational adjective "snarfing". Unfortunately all of this takes time. You might be able to help me in the manual review steps; essentially I would give you a file containing possible cases and you would remove the lines that are bogus and leave the remainder. Benwing2 (talk) 00:04, 16 August 2023 (UTC)Reply
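The propose-review-apply workflow described above can be sketched roughly as follows. This is a minimal illustration, not the actual script: the suffix list comes from the comment, but the function names, the crude stem rule (drop one final vowel, ь or й) and the in-memory `existing_lemmas` set are assumptions, and the palatalization rules mentioned above are left out entirely.

```python
# Suffixes named in the comment above; everything else here is illustrative.
SUFFIXES = ["ный", "ной", "ский", "ской", "овый", "евый", "ёвый"]
FINAL_LETTERS_TO_DROP = "аяоеьй"


def stem_of(noun):
    """Very rough stem: drop one final vowel, ь or й (no palatalization)."""
    if noun and noun[-1] in FINAL_LETTERS_TO_DROP:
        return noun[:-1]
    return noun


def candidate_adjectives(noun):
    """Build every stem + suffix combination for one noun."""
    stem = stem_of(noun)
    return [stem + suffix for suffix in SUFFIXES]


def propose(nouns, existing_lemmas):
    """First pass: list (noun, adjective) pairs for a human to review."""
    proposals = []
    for noun in nouns:
        for adjective in candidate_adjectives(noun):
            if adjective in existing_lemmas:
                proposals.append((noun, adjective))
    return proposals


def apply_reviewed(proposals, rejected):
    """Second pass: keep only the pairs the reviewer did not strike out."""
    return [pair for pair in proposals if pair not in rejected]
```

In a real run the proposals would be written to a file together with the definitions of both words, a reviewer would delete the bogus lines, and only the remaining pairs would be applied to the entries.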
Thanks for the explanation. Of course I'd be willing to help with the manual steps. Vininn126 (talk) 08:55, 16 August 2023 (UTC)Reply
Turns out the parameter already exists, so it's just a matter of bot converting. Vininn126 (talk) 16:48, 17 August 2023 (UTC)Reply

Next WingerBot open source commit

The "out of date" part of User:WingerBot's "Source code (currently out of date) is available on github" has reached like half a decade now. I'm curious about what changed in the meantime. When will the next open source publication be? Daniel.z.tg (talk) 20:13, 19 August 2023 (UTC)Reply

@Daniel.z.tg See [11]. This is the latest code. It needs some cleanup and splitting into separate directories as it's over the github limit for # of files in the top-level dir. If you're seriously interested in looking at the code I'll put some time in to clean it up. Benwing2 (talk) 20:24, 20 August 2023 (UTC)Reply
Thank you. I didn't know the code was already available and pushed frequently. No, I've written ML code, so I don't need any cleanup to read it. I'm happy in the spirit of open source to just look at what's going on in the community around me. Wow, this new repository is 300 kLOC instead of the 30k in the old one. Thank you for sharing your knowledge and providing a comprehensive example of a Wiktionary bot! Daniel.z.tg (talk) 20:38, 20 August 2023 (UTC)Reply
@Daniel.z.tg You're welcome. Yeah there is quite a lot of code I've written over the years. Some of the scripts have very specific purposes and are more or less one-offs that I keep around because occasionally they can be recycled for some other purpose. Some of the scripts are very general and I use them all the time (e.g. find_regex.py lets you download sets of pages according to all sorts of criteria; often I download a set of pages, edit the resulting file using vim, and push the changes using push_find_regex_changes.py, which facilitates doing quick semi-manual changes across a large set of pages). It was only a few months ago that I finally got around to upgrading to Python 3; previously the code was using Python 2 and an old version of pywikibot. I was procrastinating because I thought it would be painful given all the Unicode stuff in the code, but it turns out it only took a few hours. Benwing2 (talk) 20:46, 20 August 2023 (UTC)Reply

Overemphasis of Other Language?

Hey, thanks for your work on quotations and thank you for reaching out to me. When I made my recent edits, I was definitely thinking about doing the in-line like you were talking about here diff. But I thought it was an overemphasis of Chinese characters to put them first and the English in brackets, since this book is totally in English except for the last page of the book. (1) Tech Question: Is there a new way to put English first for that book, with Chinese characters second? (I tried to do an in-line that would have this effect based on your model at Quwo, but I couldn't immediately figure out what I should do after a few failed attempts. I don't know much about coding.) (2) Policy Question: Whether or not that's possible under the new ways, which is more appropriate for this book: English first with Chinese characters in brackets, or the opposite? I'm not in favor of one over the other, but I think that since the book is almost entirely in English, I didn't want to misrepresent what the reader would see if they went to find the source. The next question might be: what's the value of the Chinese characters? I would say that it's very valuable info for a bilingual reader to see and potentially explore if they are interested in this subject. Also, I want Chinese-speaking people who are looking for this book in an online search to be able to see the book on Wiktionary- they could thereby learn what the English words they are looking at refer to or similar. I am very happy that someone is taking an interest in cleaning up quotations and making things nice and professional. (I had done it for some quotations one by one.) There are all kinds of questions and ideas I have for quotations, but I don't want to overload you. (For instance: here I know the Chinese characters behind Shaodian and Wuying, and I want to show that to Wiktionary readers, but I don't know how. See also [12]. Also: Check out "et al." on this page: Citations:Qibin. 
Of course I avoid et al., but sometimes I don't have access to the names of other authors. I guess those should be author2=?) Thanks again! --Geographyinitiative (talk) 11:23, 20 August 2023 (UTC) (Modified)Reply

@Geographyinitiative I would like to eliminate |author2= entirely; instead you should use semicolon-separated values in |author=. I will fix all the issues regarding "et al.", some others have also pointed them out and there's no reason to avoid it. As for whether to put the English or Chinese first, that's a good question. It isn't currently really possible to put the Chinese in brackets as the code assumes that transliterations and translations are in Latin characters. I can add support for this using a new inline modifier but before doing that we should start a BP discussion to see what people think should be done in this circumstance; I am genuinely not sure. Benwing2 (talk) 18:58, 20 August 2023 (UTC)Reply
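As a rough illustration of what semicolon-separated values in |author= imply for processing, here is a minimal sketch. It is not the code in Module:quote; the function names, the handling of a trailing "et al." and the comma-joined display format are all assumptions made for the example.

```python
def split_authors(author_param):
    """Split a semicolon-separated author value into names and an 'et al.' flag."""
    names = [part.strip() for part in author_param.split(";")]
    names = [name for name in names if name]  # drop empty pieces
    has_et_al = False
    # Treat a final "et al." (with or without the period) as a marker, not a name.
    if names and names[-1].lower().rstrip(".") == "et al":
        has_et_al = True
        names.pop()
    return names, has_et_al


def format_authors(author_param):
    """Join the names for display; the exact display format is an assumption."""
    names, has_et_al = split_authors(author_param)
    text = ", ".join(names)
    return text + " et al." if has_et_al else text
```

Keeping every author in one parameter means a single split handles any number of names, so numbered parameters such as |author2= become unnecessary.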
Just do your best, I am on board for your changes, and I'm glad these things are getting cleaned-up. If you have to drop some Chinese characters or delete some work I did, no problem- I always worked on the assumption that a lot of things I was doing were probably going to be changed/deleted/etc. I will adapt to the changes as fast as I can. --Geographyinitiative (talk) 19:25, 20 August 2023 (UTC)Reply
@Geographyinitiative All good. I did post to the BP and we'll see what the outcome is. Benwing2 (talk) 19:36, 20 August 2023 (UTC)Reply
@Benwing2: Re: "use the format with inline modifiers and a single author= arg instead of first=/last="
Are you going to coordinate this removal with Wikipedia? I sometimes copy and paste references between Wikipedia and Wiktionary, and I would like that to remain functional. Daniel.z.tg (talk) 20:15, 20 August 2023 (UTC)Reply
@Daniel.z.tg It's not realistic to do that. Wikipedia requires the use of last=/first= in all circumstances because they want to display the name in LAST, FIRST format, while we display the name in FIRST LAST format. In general we have a lot of differences in our {{quote-*}} templates from Wikipedia's {{cite *}} templates; they serve different purposes and it's not realistic to expect us to be forced to conform to however Wikipedia does things. I have no immediate plans to deprecate the first=/last= params but I reserve the right to deviate from Wikipedia's structure. In general, blindly copy/pasting between Wikipedia and Wiktionary is a bad idea. Benwing2 (talk) 20:23, 20 August 2023 (UTC)Reply
If our templates already diverged from Wikipedia, then my original point is moot. I just noticed the cite button was added to Wiktionary's visual editor (or was I blind the whole time). It nicely displays the author field and is visual, which is one of the things I want. Daniel.z.tg (talk) 20:41, 20 August 2023 (UTC)Reply
@Daniel.z.tg Hmm. I don't use the visual editor so I'm not sure how this cite button works. If there are problems with it, let me know and I'll see if it can be fixed. I wonder if it uses the TemplateData stuff that is stuck at the bottom of some doc pages (which I hate, BTW; it is horribly designed and not automatable). Benwing2 (talk) 20:49, 20 August 2023 (UTC)Reply
@Daniel.z.tg Yes, a lot of things have diverged from Wikipedia because there are fundamental differences in how the two projects work, which is necessitated by their different goals. Benwing2 (talk) 20:50, 20 August 2023 (UTC)Reply
Here is the result of me filling in the visual form to recreate the example in your diff:
1989, 车慕奇, 丝绸之路今昔, →ISBN, page 293:
Passing through Qira County on our way, we were asked to stay by Wang Yijun, Director of the Office of the County Party Committee. He said he was an amateur archaeologist and an old acquaintance of Li Yuchun’s. In 1978 the two men had gone together to the desert in northern Qira County to survey a buried ancient city.
It's good that it's automatically a quotation, not a citation. The other fields should show up if I select them in the left panel's list view. The only thing I had to do was remove the <ref></ref> tags. Daniel.z.tg (talk) 20:58, 20 August 2023 (UTC)
@Daniel.z.tg OK, the wikicode of that quotation looks fine to me. The only thing I could see being done is adding translations of the author and title using inline modifiers, something like this:
1989, 车慕奇 [Che Muqi], 丝绸之路今昔 [Silk Road, Past and Present], →ISBN, page 293:
Passing through Qira County on our way, we were asked to stay by Wang Yijun, Director of the Office of the County Party Committee. He said he was an amateur archaeologist and an old acquaintance of Li Yuchun’s. In 1978 the two men had gone together to the desert in northern Qira County to survey a buried ancient city.
Benwing2 (talk) 21:04, 20 August 2023 (UTC)Reply
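To make the idea of inline modifiers concrete, here is a small sketch of how a value carrying a translation modifier might be turned into the bracketed display shown above. The `<t:...>` notation and both function names are assumptions for illustration; this thread does not show the exact syntax, and the real processing lives in the quotation module.

```python
import re

# Assumed notation: modifiers follow the value as <name:text>.
MODIFIER = re.compile(r"<(\w+):([^<>]*)>")


def parse_inline_modifiers(value):
    """Return (base_text, {modifier_name: modifier_text})."""
    modifiers = {name: text for name, text in MODIFIER.findall(value)}
    base = MODIFIER.sub("", value).strip()
    return base, modifiers


def render(value):
    """Show the base text, followed by the translation in brackets if present."""
    base, modifiers = parse_inline_modifiers(value)
    gloss = modifiers.get("t")
    return f"{base} [{gloss}]" if gloss else base
```

Under these assumptions, `render("车慕奇<t:Che Muqi>")` yields the author display in the example quotation, and a value without modifiers is returned unchanged.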
Do check out diff if you have a chance. --Geographyinitiative (talk) 08:09, 2 September 2023 (UTC)Reply
I want to apologize to you, I have made several massive mistakes over the course of several years. It was I that was doing the systemic, unintentional racism the whole time my man. Thanks for your work! --Geographyinitiative (talk) 11:25, 2 September 2023 (UTC)Reply
@Geographyinitiative OK sure, I am confused though by what this is in reference to. Benwing2 (talk) 13:17, 2 September 2023 (UTC)Reply

Template:es-pr

Hi! There's an error in this with "gua" spellings. The /g/ should be silent, so guapo, guarra and Guatemala are like /wapo/, [warra] and /watemala/ and agua like /awa/. Medved Karol (talk) 07:24, 21 August 2023 (UTC)Reply

@Medved Karol I think it's more complex than that. Normatively, the /g/ at the beginning of a word and after /n/ should be [g]. In agua, normatively it's a very soft approximant, which we indicate by [ˈa.ɣ̞wa]; the same applies when guapo, guarra and Guatemala occur after a vowel or any consonant other than /n/. I think informally what you're describing is correct. Requesting comments from User:AdrianAbdulBaha User:AugPi User:MiguelX413 User:Rodrigo5260 User:Ser be etre shi User:Vivaelcelta who may be somewhat active and are native speakers. Benwing2 (talk) 07:45, 21 August 2023 (UTC)Reply
I only pronounce the g in these words when I want to sound pedantic (or occasionally after an utterance [and always after an /n/] in the case of these words starting with /gw/), when I speak naturally I barely pronounce them. Rodrigo5260 (talk) 12:10, 21 August 2023 (UTC)Reply
Well, my dialect (like much of the rest of Central America, and much of Colombia) uses the plosive versions of /b d g/ after /l ɾ/ too, e.g. El Salvador [el.sal.baˈðoɾ] besides the more widespread [-l.β-], so for me there are more instances where this "g" is pronounced as a plosive: el guaro [elˈgwa.ɾo] 'the booze', traer guaro [tɾaˈeɾ ˈgwa.ɾo] 'to bring booze'. And after a nasal or a pause it's [g] too of course: guaro [ˈgwa.ɾo], traen guaro [ˈtɾa.eŋ ˈgwa.ɾo] 'they bring booze'. And as Benwing2 said, I do generally "pronounce the g" intervocalically, it just happens to be with the approximant allophone: verse guapo [ˈbeɾ.se ˈɣwa.po]. I can say [ˈbeɾ.se ˈwa.po] but that already has a very informal connotation.--Ser be être 是talk/stalk 18:38, 21 August 2023 (UTC)Reply
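The distribution described in the last two comments can be summarized as a small decision rule. The sketch below is only a restatement of what was said in this thread, not the logic of the es-pr module: the function name, the `plosive_after_liquids` switch and the sound sets are assumptions, the approximant is written plain [ɣ], and the informal variant where the /g/ is dropped altogether is not modeled.

```python
NASALS = {"m", "n", "ɲ", "ŋ"}
LIQUIDS = {"l", "ɾ"}


def g_allophone(previous_sound, plosive_after_liquids=False):
    """Pick the realization of /g/ from the sound before it.

    previous_sound is None after a pause (utterance-initial position).
    plosive_after_liquids models dialects that also keep [g] after /l/ and /ɾ/.
    """
    if previous_sound is None or previous_sound in NASALS:
        return "g"  # plosive after a pause or a nasal
    if plosive_after_liquids and previous_sound in LIQUIDS:
        return "g"  # dialect-specific plosive after /l ɾ/
    return "ɣ"  # approximant elsewhere, e.g. between vowels
```

For example, guaro after a pause and traen guaro both get the plosive, verse guapo gets the approximant, and el guaro differs depending on the dialect switch.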
For the people with silent /g/, does this mean there are no minimal pairs between words with gu and words with hu? Would huaca and guaca be perfect homonyms? Does a faint /g/-like sound ever appear in words where it is not spelled? Soap 12:03, 23 August 2023 (UTC)
@Soap Yes, that is the case. I've heard people pronounce Wii (the Nintendo console) and Wi-Fi as [gwi] and [ˈgwaifai], for example.— This unsigned comment was added by Ser be etre shi (talkcontribs) at 22:32, 2 September 2023 (UTC).Reply
I think w vs gu are allophonic in a dialect continuum in the Romance languages (see Guadal#Etymology for it changing across Spanish, Arabic, English, and French). W vs gu is probably mutually intelligible and has dialectal variation within Spanish, but please definitely do put the more common or standard pronunciation if you know it. Daniel.z.tg (talk) 23:07, 23 August 2023 (UTC)Reply

Hi there! Since you all are talking about es-pr and /ɡ/, let me mention that the Spanish pronunciation given by es-pr in the article for exactitud (and related words, e.g. exacto, exactamente) does not seem (to my ear) to sound quite right; it says /eɡsaɡtiˈtud/, whereas if you check the corresponding page in es.wiktionary, it gives [ek.sak.tiˈtuð]. That would be the normative pronunciation in Spanish for it, whereas the pronunciation with the /ɡ/'s replacing the /k/'s sounds anglified. Also, you can check this by going to, say, https://translate.google.ca/?sl=es&tl=en&text=exactitud&op=translate and clicking on the Listen button. Or you can go to es.wikipedia's article for Exactitud and try a Mac computer's system voice in Spanish on its text. Also, /eɡsaɡtiˈtud/ might be theoretically less likely since /g/ is voiced whereas /s/ is unvoiced; /gz/ would seem more likely since they are both voiced, but /z/ does not belong to Spanish; and also /ks/ would seem more likely since they are both unvoiced. AugPi 03:05, 23 August 2023 (UTC)

@Ser be etre shi Can you comment on this? Wikipedia is very specific that x is /gs/ pronounced [ɣs], and this is not the first citation I've seen that says the same thing, yet multiple people seem to think that [ks] is correct. Is this a Latin America vs. Spain or an old vs. new thing? Benwing2 (talk) 03:18, 23 August 2023 (UTC)Reply
we seem to only allow voiced stops in the coda when before another stop. check, for example, óptimo where even what is spelled as p is pronounced as a voiced approximant /β̞/. Soap 12:08, 23 August 2023 (UTC)Reply
@Benwing2 I think people normally say [ɣs] just as linguistics publications report (although for sure less-than-standard pronunciations exist too, notably [ts] which I associate with Caribbean speakers but it's not necessarily just them), but in conscious, careful pronunciation it does come out as [ks], so unfortunately I don't think we'll ever stop seeing complaints about it needing to be [ks]. Perhaps we can compare this with English -rt- as in "quarters" or "Marty", for which I've met North American speakers who deny the existence of the [ɹɾ] pronunciation ([ˈmɑɹɾi]), claiming people only say [ˈk(w)ɔɹtʰɚz, ˈmɑɹtʰi], because that's what they have as a conscious pronunciation.--Ser be être 是talk/stalk 22:32, 2 September 2023 (UTC)Reply
@Ser be etre shi Thanks! Benwing2 (talk) 20:46, 8 September 2023 (UTC)Reply

q= and qq= in {{quote-book}}

Hi. These parameters are not working. Vahag (talk) 18:55, 21 August 2023 (UTC)Reply

@Vahagn Petrosyan They currently work as inline modifiers attached to authors, titles, etc. but not yet as parameters. The reason for this is I'm not sure where to put them. Suggestions? Should they go on the same line as the quotation, before and after respectively? Or on their own lines? Or somewhere else? Benwing2 (talk) 19:20, 21 August 2023 (UTC)Reply
I prefer on the same line as the quotation, after it, like in {{usex}}.
I intend to use q= for dialect labels. But maybe we should have a tag= instead, which will fetch data from label modules like in {{desc}}. Vahag (talk) 19:30, 21 August 2023 (UTC)Reply
@Vahagn Petrosyan OK sounds good, I can implement that. Benwing2 (talk) 19:31, 21 August 2023 (UTC)Reply
Can |tag= be added to {{usex}}? Vahag (talk) 16:41, 13 September 2023 (UTC)Reply
@Vahagn Petrosyan Sure, I will implement that. Benwing2 (talk) 19:25, 13 September 2023 (UTC)Reply
Hello again. The parameter |mainauthor= does not display in {{Template:R:xcl:Beekes:2003}}. Am I doing something wrong? Vahag (talk) 18:49, 29 September 2023 (UTC)Reply
@Vahagn Petrosyan Hi Vahag. You're using {{cite-book}} rather than {{quote-book}}. I have plans to harmonize the two so that {{cite-book}} works more like {{quote-book}} but currently the code is entirely different. Benwing2 (talk) 19:36, 29 September 2023 (UTC)Reply
I see. I was misled by the documentation of {{cite-book}}. Vahag (talk) 19:39, 29 September 2023 (UTC)Reply
@Vahagn Petrosyan Oops. It looks like User:Sgconlaw added this in Apr 2021 and User:Victar undid the changes a month later, which broke the support. See [13]. Let me see if I can fix it. Benwing2 (talk) 19:45, 29 September 2023 (UTC)Reply
The edit changed the format of references, which I and others didn't consider preferable. There is a discussion if you want to search for it. --{{victar|talk}} 23:25, 30 September 2023 (UTC)

Template:quote-av season parameter not working

Hiya - see above. Could you please take a look? Theknightwho (talk) 20:33, 21 August 2023 (UTC)Reply

[[Category:Sassarese terms derived from Classical Latin]]

Hi. I've recently added the Sassarese entry abi, and I noticed the category Sassarese terms derived from Classical Latin has been deleted by you—if I understand correctly—on the grounds that it was empty, except for the {{autocat}} template. Do you think it would be a problem if I were to recreate it, so that the entry can link to it? Thanks in advance for your time. —— GianWiki (talk) 04:00, 23 August 2023 (UTC)Reply

There's no problem re-creating such categories. They get automatically deleted periodically when empty, that's all. Benwing2 (talk) 04:36, 23 August 2023 (UTC)Reply
Ok. Thank you very much for your time. —— GianWiki (talk) 02:08, 25 August 2023 (UTC)Reply

Congrats on your promotion ;-)

Pretty neat that it was unanimous, too. Good luck! Chernorizets (talk) 09:37, 24 August 2023 (UTC)Reply

Very cool! --Geographyinitiative (talk) 10:18, 24 August 2023 (UTC)Reply

@Chernorizets @Geographyinitiative Thank you both! Benwing2 (talk) 05:57, 25 August 2023 (UTC)Reply

Bug in modifiers in {{quote-av}}

Hi, it seems that the inline modifiers for |actor= and |role= in {{quote-av}} do not work as intended, since both values are plugged into |section= of Module:quote, and the inline modifiers are processed only after that. See for example diff where this issue occurs. – Wpi (talk) 15:06, 25 August 2023 (UTC)

@Wpi Yup this is high on my list of things to fix; several templates have this issue. Benwing2 (talk) 18:23, 25 August 2023 (UTC)Reply
@Wpi This should work now. Benwing2 (talk) 18:07, 27 August 2023 (UTC)Reply

{{quote-book}} numbered parameter rightward shift

The seventh and eighth params, that is, text and translation, have somehow become 8 and 9 despite what the docu says. See călugăriță for a demonstration. Everything lines up if I add an empty numbered parameter between page and text. ―⁠Biolongvistul (talk) 22:23, 27 August 2023 (UTC)Reply

Just so you know:

There are 48 of these: [14]. And some of them also have |journal=. Chuck Entz (talk) 02:04, 28 August 2023 (UTC)Reply

@Chuck Entz Thanks. These were all created by User:Vininn126 and appear to have a lot of issues. I am trying to fix them but I'm hindered by not being able to read Polish. Benwing2 (talk) 02:06, 28 August 2023 (UTC)Reply
See pong for an example of "=" in Google URLs being taken as delimiting parameters. Chuck Entz (talk) 02:08, 28 August 2023 (UTC)Reply
Actually, it's pipes in the URL that are to blame. Chuck Entz (talk) 02:12, 28 August 2023 (UTC)Reply
I'll take a look; sorry everyone! Vininn126 (talk) 08:42, 28 August 2023 (UTC)Reply
@Vininn126 I tried to clean them all up. Mostly they need (a) review of the authors, all of whom I converted to |editor= or |editors=, to make sure this is correct; (b) English translations of the titles and journal names; (c) review of the pings I made to you. Benwing2 (talk) 08:45, 28 August 2023 (UTC)Reply
@Benwing2 Are there any others I need to take a look at aside from Chuck's list and the pings? Vininn126 (talk) 08:51, 28 August 2023 (UTC)
Also thank you for doing what you can; it seems you managed to make a lot of accurate corrections. Vininn126 (talk) 08:56, 28 August 2023 (UTC)Reply
Finally, I deleted the unneeded templates from Chuck's list as well as one other template - any without documentation is likely unneeded and was created in error. Please let me know what else I need to fix so that I may! Vininn126 (talk) 09:04, 28 August 2023 (UTC)
@Benwing2 If it's not botable, please make a list of all templates that need updating. Vininn126 (talk) 17:50, 28 August 2023 (UTC)Reply
@Vininn126 I fixed all the ones in CAT:E. There may be others needing updating that aren't throwing errors; we'll see. Benwing2 (talk) 21:00, 28 August 2023 (UTC)Reply

quote-book

Is it necessary for the normalisation parameter to be visible, or can it somehow be hidden and used only for the transliteration? نعم البدل (talk) 20:25, 29 August 2023 (UTC)

@نعم البدل There isn't any current way to hide the normalization. I'm not sure this is a good idea to implement. Can you point me to some of the pages in question? It seems to me the version marked up with vowel diacritics should be visible one way or another, either in the normalization or the usex itself. Benwing2 (talk) 20:44, 29 August 2023 (UTC)Reply
@Benwing2: Example being لَن٘گْھݨا (laṉghṇā), and lemmas which previously used the quote-book-ur template. It just looks a bit weird and unorthodox. I'm not sure what the normalisation parameter is exactly meant to be used for, but I'm essentially looking for a parameter which can let me modify the transliteration, while being able to retain the source formatting, without manually having to enter the transliteration. نعم البدل (talk) 20:52, 29 August 2023 (UTC)Reply
@نعم البدل The purpose of the normalization param is to present a normalized version of text that may be written in a nonstandard form. Here it would seem reasonable to use to present vocalized text. If you don't want to do that, an alternative is just to call {{xlit}} yourself in the translit field on the vocalized text. This seems a very niche use case that you're striving for and I don't want to burden the code with an extra parameter just for this purpose. Benwing2 (talk) 20:57, 29 August 2023 (UTC)Reply
@Benwing2: Perhaps it is, it's just because vocalised Urdu (Punjabi/Sindhi etc.) just looks weird. I might just take your advice and use xlit, or just utilise the tr parameter. نعم البدل (talk) 21:01, 29 August 2023 (UTC)Reply
@نعم البدل FYI we are starting to present Persian text in vocalized form as well; cc User:Sameerhameedy, User:Atitarev. So maybe it isn't so weird after all. Benwing2 (talk) 21:03, 29 August 2023 (UTC)Reply
@Benwing2: Vocalised Urdu, when it comes to dictionaries, is fine. Vocalised Urdu in actual usage, or quotes, is quite unorthodox. At most you'll have one or maybe two diacritics on a word to clear ambiguity, not on every single word. In any case would it be possible for you to utilise one of your bots, and just convert the norm parameters on the Urdu lemmas with quotes, and replace them with the tr parameter, with the transliteration that |norm gives, or would it be better for me to do it? نعم البدل (talk) 21:07, 29 August 2023 (UTC)
Actually never mind. I forgot Urdu doesn't have a transliteration module, and quote-book-ur used to call Module:pa-Arab-translit, so I'm going to have to render the transliteration manually anyways, for the Urdu lemmas with the norm parameter. نعم البدل (talk) 21:13, 29 August 2023 (UTC)Reply
@نعم البدل Are you sure? The normalization params are getting transliterated correctly, which means Urdu must have a translit module. Benwing2 (talk) 21:15, 29 August 2023 (UTC)Reply
They're fine for Punjabi lemmas, but for Urdu lemmas, the transliteration isn't there, only the normalised text. It's fine I'll sort them out. I did ask Kutchkutch to add Module:pa-Arab-translit as the transliteration module for Urdu, but I think he's not been active for some time. Thanks for the list. نعم البدل (talk) 21:21, 29 August 2023 (UTC)Reply
@نعم البدل Yes, we are a dictionary, that's why I think it's reasonable to make the fully vocalized text visible. If you want to convert the norm parameters, there were only 16 or so uses of {{quote-book-ur}}, which were the following:
So it shouldn't be hard to convert them by hand using {{xlit}}; I'd keep the vocalized text in the wikitext in case we change our mind about whether to show it. Benwing2 (talk) 21:14, 29 August 2023 (UTC)Reply
@نعم البدل: vocalised Urdu, Persian, etc. are totally fine for a dictionary. https://rekhtadictionary.com/ uses quite extensively vocalised Urdu. Check also mod:ur-translit and new Urdu lemmas where vocalisations are used in the headword. Anatoli T. (обсудить/вклад) 22:58, 30 August 2023 (UTC)Reply

author field wikipedia links

Thank you for cleaning up the oftentimes messy {{quote-book}} templates, some of which I admit are my own additions. I notice also that the bot is replacing some of the author names with links to their Wikipedia articles. I may worry too much, but it occurred to me that it's possible that in just a tiny number of cases, the bot may be adding a link to the wrong person. Suppose that there is a relatively obscure fantasy author named Kevin Harvick .... would the bot see this and then add a link to Kevin Harvick, a racecar driver? Or does the bot know, perhaps through Wikidata, that in this case the person with the Wikipedia article is not a published author? Best regards, Soap 06:36, 30 August 2023 (UTC)Reply

@Soap Thanks for the concern. The code to clean up quote templates has grown to over 1,500 lines and it's easily possible for there to be bugs in this code. In this case, the links to the Wikipedia article are being added only when there's an existing |authorlink= field, which is supposed to point to the appropriate Wikipedia article. If the author name and |authorlink= field are the same, you get an author that looks like w:Kevin Harvick; otherwise you get something like [[w:Kevin Harvick (author)|Kevin Harvick]], because the w: prefix doesn't (or didn't, until today) support two-part links following it. So hopefully it won't be adding any bad Wikipedia links. Benwing2 (talk) 06:48, 30 August 2023 (UTC)Reply
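For readers following along, the rule described in the reply above can be sketched roughly as follows. This is only an illustration in Python, not the bot's actual cleanup code (which is not shown in this thread); the function name format_author and its arguments are made up for the example.

```python
from typing import Optional


def format_author(author: str, authorlink: Optional[str]) -> str:
    """Return the value for |author= following the rule described above.

    Hypothetical helper: a Wikipedia link is produced only when an
    existing |authorlink= value is supplied, never guessed from the name.
    """
    if not authorlink:
        # No |authorlink= present: leave the author name untouched.
        return author
    if authorlink == author:
        # Same name and article title: the short w: prefix form suffices.
        return "w:" + author
    # Different article title: fall back to an explicit two-part link.
    return "[[w:" + authorlink + "|" + author + "]]"


print(format_author("Kevin Harvick", None))
print(format_author("Kevin Harvick", "Kevin Harvick"))
print(format_author("Kevin Harvick", "Kevin Harvick (author)"))
```

The point of the sketch is that nothing is looked up: an incorrect link can only come out if an incorrect |authorlink= went in.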
Okay, thank you for the prompt reply. To be honest I don't think I even knew about authorlink= and I'm pretty sure I've never used it except if copypasting an existing citation from another entry where the quote happened to contain both words. I'd come to think of these entries as "mine" even though I didn't type out the templates for some of them. But this is good to know ... the only cases in which there would be an incorrect link to Wikipedia are those where there was an incorrect link already, so what I'm worried about isn't going to happen. Again, thanks for all of your hard work. Soap 06:56, 30 August 2023 (UTC)
I have not noticed any problems so far, but I will let you know if I see something. --Geographyinitiative (talk) 10:09, 30 August 2023 (UTC)Reply

New Polish pronunciation module

Hey Ben, the new module is ready. I'm writing to check the best course of action - I think the name will change to {{pl-pr}} and we'll make {{pl-pronunciation}} still the main, but I suppose we have to orphan the old module first in order to reclaim the name. I suppose I could make {{pl-pr}}, write a docu and give examples of how to bot convert, then we orphan the old code? Incidentally, could we make a few other easy bot changes? Vininn126 (talk) 12:51, 1 September 2023 (UTC)

@Vininn126 I am pretty much ready to work on this now. Where is the module code? Benwing2 (talk) 19:02, 1 September 2023 (UTC)Reply
MOD:pl-szl-IPA. I'm aware that Polish and Silesian are merged, however, I had grand dreams of using similar code for tons of Slavic languages anyway :p Vininn126 (talk) 19:06, 1 September 2023 (UTC)Reply
@Vininn126 @Catonif I'd like to switch the module code to use inline modifiers instead of the current way with separate params, similar to how {{es-pr}} and {{it-pr}} work. Do you mind if I make the change? We now have a library to assist with parsing inline modifiers so it's pretty easy to make use of them. Benwing2 (talk) 19:15, 1 September 2023 (UTC)Reply
You mean for example using <> instead of a pipe? I have no strong feelings one way or the other. Vininn126 (talk) 19:16, 1 September 2023 (UTC)Reply
@Vininn126 Right, so instead of entering |q3=foo along with the third pronunciation, you write <q:foo> after it. The way that {{es-pr}} works, you can specify multiple pronunciations either in a single comma-separated param or in multiple params; the difference is that all pronunciations given in a comma-separated param end up on the same line, whereas different params go on different lines. This lets you group sets of pronunciations that differ only slightly but separate those that differ more significantly. Benwing2 (talk) 19:23, 1 September 2023 (UTC)Reply
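A rough sketch of the scheme just described, in Python rather than Lua and not taken from Module:es-pronunc or the inline-modifier library: each template parameter becomes one line, each comma-separated item within it becomes one pronunciation on that line, and <key:value> modifiers attach to the item they follow. All function names here are hypothetical, and real modules may treat commas and nesting differently.

```python
import re

# A modifier looks like <q:foo>; the key and value syntax is an assumption.
MODIFIER = re.compile(r"<([a-z]+):([^<>]*)>")


def split_top_level(text):
    """Split on commas that are not inside angle brackets."""
    parts, current, depth = [], [], 0
    for ch in text:
        if ch == "<":
            depth += 1
        elif ch == ">":
            depth = max(depth - 1, 0)
        if ch == "," and depth == 0:
            parts.append("".join(current).strip())
            current = []
        else:
            current.append(ch)
    parts.append("".join(current).strip())
    return parts


def parse_item(item):
    """Return (respelling, modifiers) for one pronunciation."""
    modifiers = dict(MODIFIER.findall(item))
    respelling = MODIFIER.sub("", item).strip()
    return respelling, modifiers


def parse_params(params):
    """One output line per parameter, one entry per comma-separated item."""
    return [[parse_item(item) for item in split_top_level(p)] for p in params]


# Two pronunciations grouped on the first line, a third on its own line.
print(parse_params(["foo<q:rare>,bar", "baz"]))
```

With this shape there is no need to count positions as with |q3=; the qualifier travels with the pronunciation it belongs to.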
I do find using <> more "intuitive" in many ways, as it reduces your need to "count" various inputs - I wonder if it'd be possible to have our cake and eat it, too? I.e. have both? If not, having inline params is fine with me. Vininn126 (talk) 19:26, 1 September 2023 (UTC)Reply
@Vininn126 Having both is possible if you want that; this is how {{syn}} and {{alt}} work as well. Benwing2 (talk) 19:30, 1 September 2023 (UTC)Reply
I don't see why not. It gives editors some freedom if they prefer one over the other. Vininn126 (talk) 19:37, 1 September 2023 (UTC)Reply
@Benwing2 Yes, I also had planned on implementing your angle bracket syntax but it's good I can leave you to it. The affected parameters would be qual, q (its alias) and ref in both export.IPA (that handles {{szl-pr}} and will handle {{pl-pr}}) and export.mpl_IPA (that will handle {{zlw-mpl-IPA}}, i.e. the standalone template for MidPolish-only words, not MidPolish transcriptions in general, that are handled by {{pl-pr}}). I'm not sure if you want to handle mp_q(ual) and mp_ref (params that will be given to {{pl-pr}}) as, e.g., |mpN=...<q:...>, rather than equivalent |mpN=...|mp_qN=... that would be used otherwise. Catonif (talk) 19:50, 1 September 2023 (UTC)
Please let me know what I can do to help. Vininn126 (talk) 10:31, 4 September 2023 (UTC)Reply
@Vininn126 Don't worry, I'm working on it :) ... I am visiting my mother but will be spending some time cleaning up the module. Benwing2 (talk) 20:36, 4 September 2023 (UTC)Reply
No sure, I'm just trying to be helpful! Vininn126 (talk) 21:46, 4 September 2023 (UTC)Reply
Can I tack on a few other bot changes?
  1. Remove etydates from Polish terms inherited from Old Polish (unless there is doubt, i.e. in rocznik)
  2. Remove :* {{R:pl:NFJP}} and * {{R:pl:NFJP}} from references section
  3. Change <references/> to {{reflist}} Vininn126 (talk) 11:04, 5 September 2023 (UTC)
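Items 2 and 3 of the list above are plain text substitutions, which could be sketched as below. This is a hypothetical Python illustration, not the script that was actually run; it assumes the tag meant in item 3 is the self-closing <references/>, and it takes the reference template's name as an argument rather than hard-coding one.

```python
import re


def clean_references(wikitext, template):
    """Apply items 2 and 3 to a page's wikitext (illustrative only).

    `template` is the reference template name without braces.
    """
    # Item 2: drop list lines ("* ..." or ":* ...") holding only that template.
    line_re = re.compile(
        r"^:?\*[ \t]*\{\{" + re.escape(template) + r"\}\}[ \t]*\n?",
        re.MULTILINE,
    )
    wikitext = line_re.sub("", wikitext)
    # Item 3: replace the self-closing references tag with {{reflist}}.
    return re.sub(r"<references\s*/>", "{{reflist}}", wikitext)


sample = (
    "===References===\n"
    "<references/>\n"
    "* {{R:example}}\n"
    ":* {{R:example}}\n"
    "* {{R:other}}\n"
)
print(clean_references(sample, "R:example"))
```

Item 1 is not sketched because it depends on a per-entry judgement (whether the dating is in doubt) that a simple substitution cannot make.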
@Vininn126 Can you clarify what needs to be done for #1 and give me some examples? The other two look easy. Benwing2 (talk) 19:59, 5 September 2023 (UTC)Reply
@Vininn126 Did you mean {{R:pl:NKJP}}? I don't see {{R:pl:NFJP}} in the References section. Also can you explain why this needs to be removed? Should it instead be moved to Further Reading? Benwing2 (talk) 04:37, 6 September 2023 (UTC)Reply
I did mean that one, sorry! This is the corpus I use for collocations and quotes. Do you think it makes sense to keep in FR?
For #1, I mean for example zapowiedź, where it should be removed, but rocznik where it shouldn't. Vininn126 (talk) 08:31, 6 September 2023 (UTC)Reply
Thank you for replacing references with reflist! I forgot to mention there might be some empty references sections once we remove the etydate from the Polish ety. Vininn126 (talk) 16:11, 6 September 2023 (UTC)Reply
@Vininn126 thanks! What is the reason for removing #2? Is it that it’s useless or just in the wrong section? Benwing2 (talk) 17:30, 6 September 2023 (UTC)Reply
It's just a corpus - it'd be a bit like sending people to g-books or something as further reading. I sometimes use it for etydates, collocations, and a lot for drawing quotes, but those are their specific tools. So I guess its current use as a template would be for use in {{etydate}}. Vininn126 (talk) 17:34, 6 September 2023 (UTC)Reply
Also sorry for the spam - for the new Polish code we should also consider that we will change {{zlw-mpl-IPA}} to use it and we might have to change its respellings as well. Vininn126 (talk) 17:36, 6 September 2023 (UTC)Reply
@Vininn126 Thanks and no problem. I will go ahead and remove {{R:pl:NKJP}}. As for the pronunciation module, I am rewriting it to use the multidialect code from Module:es-pronunc, which is very general, including with groups of dialects, phonemic/phonetic and optional show/hide, so it should be possible to use it to handle Middle Polish, Northern Borderlands, Southern Borderlands, etc. Benwing2 (talk) 18:15, 6 September 2023 (UTC)Reply
Cool, great! Thank you very much. Vininn126 (talk) 18:29, 6 September 2023 (UTC)Reply
@Vininn126 I have removed {{R:pl:NKJP}}. Removing the etydates is a bit more tricky. One thing I notice is the {{etydate}} in the Polish entry often has a reference that is missing from the Old Polish entry. Should we instead move the reference to the Old Polish entry? It would be a shame to lose it. BTW what is the end date of Old Polish? For example, aby has an Old Polish entry with etydate of 15th century, but I might expect that to be Middle Polish based on the periodization of Russian. Benwing2 (talk) 07:08, 7 September 2023 (UTC)Reply
Thanks!
Can you give an example? Chances are we shouldn't remove it. Old Polish ends at 1500. Vininn126 (talk) 07:55, 7 September 2023 (UTC)Reply
@Vininn126 Nearly all pages work this way. See a, abociem, aboć, abojem, just to name the first few. Benwing2 (talk) 07:58, 7 September 2023 (UTC)Reply
@Vininn126 In that case when does Middle Polish end? Benwing2 (talk) 07:59, 7 September 2023 (UTC)Reply
Ah, I see! If the reference is using {{R:zlw-opl:SPJSP}} or rarely {{R:zlw-opl:SSP1953}} then the information isn't lost but is in the Old Polish entry :) And the Old Polish entry has the etydate, which is based on those templates as well. Middle Polish is from 1500-1780, more or less, but Middle Polish is an etycode-only lang with labels and such, so it's an extension of Polish. Vininn126 (talk) 08:01, 7 September 2023 (UTC)
Sorry, I just realized one last thing - if etydate has two references and one of them is {{R:pl:NFJP}} then that should be moved to further reading, at the bottom. I think that would be my last change. Vininn126 (talk) 08:05, 7 September 2023 (UTC)Reply
@Vininn126 Was traveling today. I see in e.g. abociem the same template is found in the |ref= param of {{etydate}} in Polish and in the References section only in Old Polish; should it be moved from the Old Polish References section to the |ref= param, like you did for Polish? Benwing2 (talk) 05:53, 8 September 2023 (UTC)Reply
It doesn't need to be there! That is exactly the kind of entry where I'm trying to remove it. Please see how it looks now. szerokiej drogi! Vininn126 (talk) 07:46, 8 September 2023 (UTC)Reply
@Vininn126 I guess what I mean is, is it better for the Old Polish {{R:zlw-opl:SSP1953|abociem|9|1}} to be hanging out by itself in the Old Polish ==References== section, like it is now, or should it be in the Old Polish {{etydate}} |ref= param? Benwing2 (talk) 07:57, 8 September 2023 (UTC)Reply
BTW thanks for the well wishes! I just drove from Tucson AZ to Austin TX -- about 14 hours, long drive. Benwing2 (talk) 07:58, 8 September 2023 (UTC)Reply
It should be non-inline, I believe, as the work showing its first attestation is usually in the entry itself - words starting with a excluded, but my plan is to change that.
Ah yes, the classic American road trip, I remember hours upon hours of staring at corn and fields. Vininn126 (talk) 08:01, 8 September 2023 (UTC)Reply
@Vininn126 I see, thanks! Benwing2 (talk) 08:17, 8 September 2023 (UTC)Reply
I was thinking about your plan to implement better dialect infrastructure and how it might influence Middle Polish - my original plan was to have dialectal stuff alongside modern stuff automatically, and to have Middle Polish be printed manually (i.e. as a manually supplied parameter), as I need complete control of 1) if it shows up at all and 2) what era it needs to show up in. I'm not sure if that affects your plans. Vininn126 (talk) 11:17, 10 September 2023 (UTC)Reply
@Vininn126 I think between the existing Spanish and Portuguese modules, all these cases are already covered, esp. the Portuguese module, where there are several dialects, grouped into dialect groups (e.g. Brazilian vs. European), and you can turn on or off individual dialects or groups through parameters as well as specify dialect and group-specific respellings. Brazilian and European Portuguese differ significantly in pronunciation and some differences aren't predictable (and in addition there's no normative pronunciation anywhere in Brazilian Portuguese, so the best we can do is list the pronunciations of several cities). What I did for this was to implement several special characters that say e.g. "such and such a vowel is pronounced X in Brazil and Y in Portugal"; this is all documented in {{pt-IPA}}, see Template:pt-IPA#Symbols indicating other Brazil-Portugal differences (vowel raising, glides, epenthesis). Benwing2 (talk) 19:32, 10 September 2023 (UTC)Reply
I'm not sure why I even bring up these things anymore! Of course you've thought all this through. Thanks for responding, I'm just bringing it up as someone who deals with it regularly. Vininn126 (talk) 19:35, 10 September 2023 (UTC)Reply
@Vininn126 Please feel free to mention things! It's always helpful. Benwing2 (talk) 19:42, 10 September 2023 (UTC)Reply

adding Dingal language

I tried adding it, but there seem to be errors. Can you correct Module:labels/data/lang/inc-din and Category:Dingal data modules (this one shows an error)? कालमैत्री (talk) 14:24, 1 September 2023 (UTC)

@कालमैत्री You can't create new languages like this. In general in order to create a new language you should post in the Beer parlour (WT:Beer parlour/2023/September) requesting the language to be added. Some people may have opinions on how this should be done. Benwing2 (talk) 19:04, 1 September 2023 (UTC)Reply

Module:ur-headword

I wasn't the one who created the module. I've only fixed it for prod. — Fenakhay (حيطي · مساهماتي) 06:19, 7 September 2023 (UTC)Reply

@Fenakhay Oops! Sorry about that. My message to you should then be directed to User:AryamanA: "please follow the way Hindi already does it; these are not phrasal verbs but compound verbs, and the light verb in the construction is not a particle but a verb". This is in regards to categories like Category:Urdu phrasal verbs with particle (کرنا). Benwing2 (talk) 06:23, 7 September 2023 (UTC)Reply

Reconstruction:Old English/Woðen

Hello Benwing2,

In regards to Woðen, this is a valid reconstruction because "Wothen" is attested in the Ethelwerdi Chronicorum. You can see it yourself at this link here: https://books.google.ca/books?id=N1v31IRyUdsC&newbks=0&printsec=frontcover&pg=RA1-PA995&dq=West-Saxonia&hl=en&redir_esc=y#v=onepage&q=Wothen&f=false

Thank you. Leornendeealdenglisc (talk) 10:55, 8 September 2023 (UTC)Reply

@Leornendeealdenglisc First of all, that was four years ago. Secondly, a vague spelling in Latin doesn't indicate that we can reconstruct an Old English word like this. Do you understand how Old English historical phonology works? PGmc /d/ would show up as d in Old English and not as ð. If you want to reopen the failed RFV, you need to post to WT:RFVN. Benwing2 (talk) 20:45, 8 September 2023 (UTC)Reply

Another thought that's been on my mind

This might be too drastic, but I think we might be able to use + templates for all Slavic languages. Everyone actively editing Slavic languages who opposed them now actively uses + templates! Furthermore, I see other active editors for various languages frequently using them. Sorry to dump a bunch of stuff on you, it's just been on my mind a while! Vininn126 (talk) 12:37, 8 September 2023 (UTC)

@Vininn126 This is fine with me as long as you think it's OK. It is easy to make this switch. Benwing2 (talk) 20:41, 8 September 2023 (UTC)Reply
Sounds great! Vininn126 (talk) 20:43, 8 September 2023 (UTC)Reply
I see you ran the script as well as a few other cleanup scripts. Thanks! Vininn126 (talk) 10:05, 11 September 2023 (UTC)Reply

Bot errors

Your current script has problems: first of all, it's only running the check for unnecessary script parameters on the first of the multiple terms you're merging. More importantly, {{alter}} apparently doesn't allow numbered script parameters, even if the documentation says it does. The combination of the two means there are a number of entries in CAT:E with the error "The parameter "sc2" is not used by this template." Chuck Entz (talk) 20:52, 10 September 2023 (UTC)Reply

@Chuck Entz Thanks. The removal of unnecessary script params was actually done semi-manually through me editing the total set of lemmas in a text editor, which is why the sc2= things didn't get removed. I think the script that merges {{alt}} is OK but we should fix {{alt}} to accept numbered script params; I will do that and also remove the unnecessary sc2= params. Benwing2 (talk) 20:56, 10 September 2023 (UTC)Reply

Bot mistakes

Hello, I noticed your bot adds text around {{suffixsee}} templates as if they were some sort of etymologies, e.g. here, here and here. — Phazd (talk|contribs) 23:15, 10 September 2023 (UTC)

@Phazd Whoops, will fix. Benwing2 (talk) 23:23, 10 September 2023 (UTC)Reply

Handling escapes in character links

I've generally favoured the idea of using \ as an escape character in links, to override the special behaviour of characters like ^ etc. I think we should probably define a consistent behaviour for it going forward, because it's the kind of thing that can get really confusing if done badly.

My inclination is that we should handle it in the same way as Lua and regex, which is that it always results in the next character being treated as a literal, irrespective of whether that character is special in any way. This avoids users having to guess whether a character needs to be escaped or not, because that could get really confusing when dealing with links to emoticons - especially given that some special characters only have special behaviour in certain situations.

Related to this, there's also the question of how we should handle links to titles that use unsupported characters: \000-\031, #, <, >, [, ], {, |, }, \127 and (depending on the context) %, &, ., / and ;. At the moment, if you enter something like {{l|mul|]}} the module will assume you want to link to the character ]: ]. However, when you get to more complex examples, we start getting lots of ambiguity when we don't know which characters to treat as literal: {{l|mul|[[[]][[]]]}} becomes [[[]]], when intuitively it should be [] (i.e. a link to [ then a link to ]). This is unlikely to occur in a standard link entered by a user, but it definitely could turn up in an automatically-generated headword line for an emoticon, for example.

I think the most sensible solution is to require that any bad characters be escaped, or else the parser will simply treat the wikitext in the default way. Doing it this way means that the first set of bad characters will always require it, while the second set will only require it if they'd otherwise cause the link to fail: e.g. {{l|en|5:2 diet}} will work, but {{l|mul|:}} will need to be corrected to {{l|mul|\:}}; the above example would be {{l|mul|[[\[]][[\]]]}}. Any unnecessary escapes would simply make no difference, since \ will always treat the next character as a literal anyway (e.g. {{l|en|5\:2 diet}} would also work). Theknightwho (talk) 15:43, 12 September 2023 (UTC)Reply

@Theknightwho Sounds good to me. This is how Perl regexps work (which is where this first came from), except that backslashed ASCII letters and numbers have special meanings, which we should probably keep because people are familiar with this. And yes it can get very confusing if done wrongly, just look at shell quoting conventions. Benwing2 (talk) 20:16, 12 September 2023 (UTC)Reply
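The proposed rule (a backslash always makes the next character literal, whether or not it is special) can be sketched as below. This is a simplified Python illustration, not code from Module:links; the function names are invented, and it deliberately ignores the point raised above about backslashed ASCII letters and digits keeping special meanings.

```python
def tokenize(text):
    """Return (char, is_literal) pairs; a backslash escapes the next char."""
    tokens, i = [], 0
    while i < len(text):
        if text[i] == "\\" and i + 1 < len(text):
            tokens.append((text[i + 1], True))
            i += 2
        else:
            tokens.append((text[i], False))
            i += 1
    return tokens


def literal_text(text):
    """Drop the escapes: unnecessary ones make no difference."""
    return "".join(ch for ch, _ in tokenize(text))


def link_targets(text):
    """Collect targets of [[...]] links, honouring escaped brackets."""
    tokens = tokenize(text)

    def is_pair(pos, ch):
        # Two consecutive unescaped copies of ch starting at pos.
        return tokens[pos:pos + 2] == [(ch, False), (ch, False)]

    targets, current, i = [], None, 0
    while i < len(tokens):
        if current is None and is_pair(i, "["):
            current, i = [], i + 2
        elif current is not None and is_pair(i, "]"):
            targets.append("".join(current))
            current, i = None, i + 2
        else:
            if current is not None:
                current.append(tokens[i][0])
            i += 1
    return targets


print(literal_text(r"5\:2 diet"))
print(link_targets(r"[[\[]][[\]]]"))
```

Because escaped brackets never count as delimiters, the ambiguous example from the thread resolves to a link to [ followed by a link to ].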

Bot non-mistakes

Hey. I actually found a handful of good edits by your bot, for a change. Jewle V (talk) 20:35, 13 September 2023 (UTC)Reply

Splitting se

Yeah, I knew that was gonna be reverted. I was trying to be the WT hero and save memory issues. We should put a sign at the top of the page saying something like "due to memory issues, this page is too fricking huge to display properly, so we have to split this page in two. Please see subpage 1 and subpage 2" Jewle V (talk) 22:45, 16 September 2023 (UTC)Reply

Or make a template. Something like {{massive page}} Jewle V (talk) 22:47, 16 September 2023 (UTC)Reply
The problem is that if we split pages like this, we have to make infrastructure changes so that e.g. links go to the right place, which will take some thinking; we can't just do what you're trying to do. I think User:Theknightwho is close to having a parser solution that should reduce memory usage, so hold tight for a bit. Benwing2 (talk) 22:48, 16 September 2023 (UTC)Reply
We don't need infrastructure changes. The guy reading the page just needs to do one extra click, and problem is solved. You know, TKW is never gonna find parser solutions. The Lua memory issues are gonna plague him all his life and he'll end up a bitter old man as a result. Jewle V (talk) 22:53, 16 September 2023 (UTC)Reply
The parser does work lol. It just needs to be a bit faster for the extreme edge cases (but not much). Theknightwho (talk) 22:57, 16 September 2023 (UTC)Reply
Well, if it makes you happy to work on it, then there are no complaints from me Jewle V (talk) 07:55, 17 September 2023 (UTC)Reply

Forcing fonts for Kyrgyz and Kazakh

Hi, this is a bit of a minor issue but I noticed that we (seemingly) force a specific font for Arabic, Persian, and Uyghur language links, I wonder if we could do the same for Kazakh and Kyrgyz language links?

I say this because my system font (and most Arabic fonts) don't support some rare characters in Kyrgyz and Kazakh, however when I put a Kyrgyz word such as ۉيچۉل in with the lang|ar or "ug" or "fa" ۉيچۉل it forces "Noto Naskh Arabic", a font which does support rare characters. And Kyrgyz and Kazakh words will display correctly.

Do you know how Arabic, Persian, and Uyghur links are able to force a specific font? And if you do, can we force Kyrgyz and Kazakh to display in the same fonts? If not, that's fine, I can always just change my system font. But since we're already forcing fonts for some languages we should do it for these languages too, since so few fonts fully support them. سَمِیر | Sameer (مشارکت‌هاکتی من گپ بزن) 21:16, 21 September 2023 (UTC)Reply

this problem is most prominent for me on mobile btw. On my computer it seems to force the font Noto Naskh Arabic and displays properly. سَمِیر | Sameer (مشارکت‌هاکتی من گپ بزن) 21:17, 21 September 2023 (UTC)Reply

Replacement of unnecessary redirects and templates

Hi, another batch for your attention:

Thank you. — Sgconlaw (talk) 15:52, 30 September 2023 (UTC)Reply

@Sgconlaw Done and apologies for the long delay. Benwing2 (talk) 09:30, 24 October 2023 (UTC)Reply
Thanks! No worries about the delay—it gives me a chance to add other requests to the list, heh heh. — Sgconlaw (talk) 10:26, 24 October 2023 (UTC)Reply

FYI re new modules and increased memory usage

Hiya - from doing some experimenting, it seems that splitting modules increases memory use if they're loaded via mw.loadData, but that doesn't apply if they have to be loaded via require (where there's no measurable difference). This has been useful for the wikitext parser, because it means I can fully modularise it so that handlers are only loaded into memory when they need to be, but it's also useful to keep in mind generally. Theknightwho (talk) 16:24, 2 October 2023 (UTC)Reply

Another thing is that it probably means we should consolidate the general-purpose data modules that are likely to be loaded on most pages (e.g. Module:headword/data and Module:links/data for a start). Theknightwho (talk) 16:30, 2 October 2023 (UTC)Reply
@Theknightwho This is good to hear, but note that I saw the opposite when I tried to split Module:gender and number into a "basic" version and a full version; see Module:gender and number/basic for this experiment. When I tried to implement this, several pages that I previewed went over the memory limit when they hadn't before. Benwing2 (talk) 18:40, 2 October 2023 (UTC)
Granted, this was not a comprehensive experiment. Benwing2 (talk) 18:41, 2 October 2023 (UTC)Reply
@Benwing2 It's difficult to say - I think mw.loadData is the biggest cause of unpredictable memory use, but it surprises me that use would go up in those cases to be honest (given the size of the basic version).
Having a look at Module:gender and number/data, I suspect that mw.loadData might be actually making things worse: when a module is loaded, it creates a (unique) metatable with 6 keys every time the data is loaded, and also for every subtable inside the data. That means if there are 50 subtables, 51 new metatables are created every time it's loaded. When a data module has lots of small subtables like that one (with 2 or 3 keys each), each additional load via mw.loadData actually uses more memory than simply loading it via require in the first place. Theknightwho (talk) 19:13, 2 October 2023 (UTC)Reply
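The arithmetic in the comment above (50 subtables means 51 metatables per load) can be modelled with a small counting sketch. This is Python standing in for Lua and only mirrors the behaviour as described in the comment; it does not measure Scribunto or mw.loadData itself, and the data shape is invented.

```python
def count_tables(data):
    """Count a table plus every table nested anywhere inside it."""
    if not isinstance(data, dict):
        return 0
    return 1 + sum(count_tables(value) for value in data.values())


def wrappers_created(data, loads):
    """Wrappers created if each load wraps every table once (assumed model)."""
    return count_tables(data) * loads


# A data module shaped like the one described: 50 small subtables.
module_data = {"code%d" % i: {"a": 1, "b": 2} for i in range(50)}

print(count_tables(module_data))           # one top-level table plus 50 subtables
print(wrappers_created(module_data, 60))   # e.g. 60 loads on one page
```

Under this model the overhead scales with the number of subtables rather than with the amount of data, which is why many tiny subtables are the unfavourable case.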
@Theknightwho This makes sense. Note that one experiment I did was to split Module:place/shared-data into data-only and code modules so I could use loadData() on the former, and when I tested a page with 60 invocations of {{place}}, the memory went up from 25M to 29M when using loadData(). This surprised me a lot (and still surprises me, given the large number of invocations on that test page), but the data in Module:place/shared-data has lots of little subtables. (I think we could optimize the data there to reduce the number of subtables and replace them with strings in many cases, but that's another story.) Benwing2 (talk) 19:25, 2 October 2023 (UTC)Reply
@Benwing2 That makes sense. I wonder if it's worth using Module:languages/data/all, too: the pages with the most memory problems are already loading most of the language modules anyway, and loading them together like that would avoid the overhead of running mw.loadData on 27 different modules. Theknightwho (talk) 19:52, 2 October 2023 (UTC)Reply
@Theknightwho It's definitely worth trying IMO. I think what we'd want to do is have a list of high-memory pages where we load Module:languages/data/all in place of individual modules, and see if it helps. Benwing2 (talk) 20:55, 2 October 2023 (UTC)Reply
@Benwing2 Definitely. Another idea might be loading data via require and then forcing Lua to deallocate the memory before returning from the function. It's possible to force this by explicitly setting all the keys to nil, then allocating a single key to the table before returning from the function. Lua won't deallocate memory for objects when you remove keys, but does check if they can be resized when allocating, so this is a hacky way to force an emergency garbage collect. It's slow, but probably worth it for any potential savings. Theknightwho (talk) 21:40, 2 October 2023 (UTC)

Design question re wikitext parser

Hi - just to give some background, the output from the postprocessing parser is an array of UTF8 characters with tokens representing various formatting (e.g. [[wikilink#fragment|alt text]] or [https://example.com/ alt text]). The intention is so that we can have method(s) for iterating over the array in a way that ensures certain elements remain untouched, along with an options parameter that allows this to be varied (e.g. only iterating over display text, only iterating over link targets etc). They also get normalised, so repeated spaces become single spaces, HTML entities (or percent-encoding in link targets) get decoded etc.

In the case of HTML tags, a single token represents the entire tag, so a standard opening tag for an English link (<span class="Latn" lang="en">) would contain:

{
	tag = "span",
	class = "Latn",
	lang = "en"
}

(Note that "tag" isn't treated as a valid HTML attribute by MediaWiki and would simply be ignored, so there's no reason to bother with an attribute subtable since there's no risk of a clash.) Tokens also have simple class methods, to make identifying or editing them straightforward.

In some cases, the same output can be achieved using wikitext or HTML:

  1. ''foo'' is <i>foo</i>.
  2. \n* foo is <ul><li>foo</li></ul>.

I think it makes sense to normalise these, which simplifies any processing that needs to be done by onward functions. My inclination is to normalise wikitext inputs to HTML. For example, ''foo'' becomes:

{
	HTML Open Tag {tag = "i"},
	"f",
	"o",
	"o",
	HTML Close Tag {tag = "i"}
}

Alternatively we could try normalising to wikitext, since in certain situations (e.g. lists) it's more straightforward most of the time. However, I'd be less keen on that since wikitext gets pretty funky in the edge-cases in a way that HTML does not. What do you think? Theknightwho (talk) 16:59, 4 October 2023 (UTC)Reply
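As a rough illustration of normalising to HTML, here is a minimal Python sketch in which ''foo'' and <i>foo</i> produce the same token array. The OpenTag and CloseTag classes and the tokenise_italics function are hypothetical names for this example, not the parser's real API, and the sketch deliberately ignores bold ('''), nesting and every other wikitext edge case.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class OpenTag:
    tag: str


@dataclass(frozen=True)
class CloseTag:
    tag: str


def tokenise_italics(text):
    """Turn ''...'' and <i>...</i> into the same token sequence."""
    tokens = []
    italic_open = False
    i = 0
    while i < len(text):
        if text.startswith("''", i):
            # Wikitext italics toggle: open if closed, close if open.
            tokens.append(CloseTag("i") if italic_open else OpenTag("i"))
            italic_open = not italic_open
            i += 2
        elif text.startswith("<i>", i):
            tokens.append(OpenTag("i"))
            italic_open = True
            i += 3
        elif text.startswith("</i>", i):
            tokens.append(CloseTag("i"))
            italic_open = False
            i += 4
        else:
            # Everything else is stored one character per array slot.
            tokens.append(text[i])
            i += 1
    return tokens
```

Downstream code then only has to recognise one representation of italics, which is the main argument for normalising at all.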

@Theknightwho I think it's good to have some specific use cases in mind to use as test cases, but my instinct is that you are right about normalizing to HTML because of the edge case problem. I don't think normalizing to wikitext is feasible, but a third alternative is to leave things unnormalized and represent HTML and wikitext separately. Again it depends on what the use cases are for postprocessing the parser output. Benwing2 (talk) 03:56, 5 October 2023 (UTC)Reply
@Benwing2 It'll probably be easiest to decide once it's in an alpha state, because at that point I'll be able to start experimenting with redesigns of some of the core modules to see what works best. I did consider the third option of not normalising, and it would be the simplest approach, but I think it would be a mistake since it just creates more work for any downstream modules using the output.
Btw, the performance seems pretty good at this point: 10,000 invocations at User:Theknightwho/sandbox12 use 2.6MB and take 6 or 7 seconds to process; trying that number with {{l}} doesn't even get halfway through before timing out, even with a short English word. I'm sure I can get it down a bit further, too. Theknightwho (talk) 09:56, 5 October 2023 (UTC)Reply
@Theknightwho All sounds good. Benwing2 (talk) 01:22, 6 October 2023 (UTC)Reply
@Benwing2 I was wondering how best we should parse links to unsupported titles: anything containing #, <, >, [, ], _, {, }, |, , \000-\031 or \127, plus a few disallowed patterns. This also applies to links to pages using our own special characters as well, like \ and ^. I'll list some options using plain links, but assume that [[foo]] is equivalent to {{l|xx|foo}}, or part of something like {{l|xx|[[foo]] bar}}. The wikitext parser + existing modules will automatically handle what the "real" output is, but that's not important here.
  1. Match the default parser, so links fail if they contain these or anything that resolves to them. Unsupported titles can only be linked to using backslash notation: e.g. [[f\#\#k]] as a link to f##k.
  2. Treat the literal characters as valid in links where possible, and only require escapes when it's actually needed. This seems like a bad idea, since it would make things unpredictable for users, and could mean any change to the special-character logic (e.g. introducing new features) might have a bunch of unintended consequences if a special character suddenly stopped/started being treated as a literal. We'll always have that problem if we want to introduce a new special character, but mandating escapes keeps things sane when it comes to the existing ones.
  3. Treat HTML entities as escaped forms as well. By default, these are resolved to their equivalents before parsing, so &num; (#) is treated as a fragment marker, for example. [, ] and | are exceptions, so entities for them simply fail the link outright. Instead, we could treat HTML entities as escaped forms: e.g. [[&gt;&gt;&gt;]] as a link to >>>. I'm in favour of this one since it's intuitive to editors, but the main disadvantage is the added complexity this causes for third-party bots, since they wouldn't be expecting this. It also makes the consequences of accidental double-decoding worse, but that would only happen if something got run through the wikitext parser twice in the same invocation (which should never happen). This wouldn't be a problem for nested link templates, though.
  4. Treat percent-encoding as an escape method as well. By default, these only resolve in link targets + fragments, but work in exactly the same way as HTML entities. Again, we could treat them as escaped forms: e.g. [[%5F(.%5F.)%5F]] as a link to _(._.)_. I'm less keen on this, since percent-encoding is a lot more opaque to human editors. There's also the fact that MW double-decodes any HTML entities that have themselves been percent encoded (e.g. [[%26%79%65%6E%3B]][[&yen;]][[¥]]: ¥), so I don't know if there'd be some weird side-effects if we started treating these as literals. The other disadvantages are the same as those for HTML entities.
  5. Only use HTML entities and/or percent-encoding as escape forms. This would potentially mean there's less for editors to learn, but at the expense of wikitext readability.
On balance, my favoured approach is number 3. Since the parser tokenises wikilinks in one pass, there's no measurable difference between any of these approaches in terms of performance, so I'm purely interested in what you think the most practical approach is from a usability/entry maintenance standpoint. Theknightwho (talk) 16:59, 12 October 2023 (UTC)Reply
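To make option 3 concrete, here is a minimal Python sketch of how a link target might be split into title and fragment when both backslash escapes and HTML entities count as escaped (literal) forms. The function name parse_link_target and the entity regex are assumptions for illustration only; the real parser tokenises in one pass and handles far more cases (pipes, percent-encoding, disallowed patterns).

```python
import html
import re

# Named, decimal or hex character references such as &gt; or &#35;.
ENTITY = re.compile(r"&#?[0-9A-Za-z]+;")


def parse_link_target(raw):
    """Return (title, fragment); escaped characters are taken literally."""
    title, fragment = [], None
    current = title
    i = 0
    while i < len(raw):
        ch = raw[i]
        entity = ENTITY.match(raw, i)
        if ch == "\\" and i + 1 < len(raw):
            # Backslash escape: the next character is a literal.
            current.append(raw[i + 1])
            i += 2
        elif entity:
            # Entity as an escaped form: decode it, but never treat the
            # result as a special character.
            current.append(html.unescape(entity.group()))
            i = entity.end()
        elif ch == "#" and fragment is None:
            # Only a bare, unescaped # starts the fragment.
            fragment = []
            current = fragment
            i += 1
        else:
            current.append(ch)
            i += 1
    return "".join(title), None if fragment is None else "".join(fragment)
```

Under this sketch [[f\#\#k]] and [[&gt;&gt;&gt;]] both resolve to literal titles, while [[foo#bar]] still splits into a title and a fragment.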

A QOL addition to quotation templates

A |title_plain= parameter that doesn’t force <cite> would be of much use to me. Not all quotations have titles that need italics. As it stands, I’m forced to use <span style="font-style: normal;">. ―⁠Biolongvistul (talk) 07:58, 5 October 2023 (UTC)Reply

Alternative forms

Hey Benwing, while going through a bunch of articles (mostly Galician ones), I noticed some links in the alternative forms version were using Template:l instead of Template:alt as they should. Do you think you could use your bot to solve that? MedK1 (talk) 21:41, 6 October 2023 (UTC)Reply

es-pr audio bug

Apparently, when you specify the accent of an audio the template no longer shows it. Rodrigo5260 (talk) 03:41, 9 October 2023 (UTC)Reply

@Rodrigo5260 Hi, I saw your message about this but I'm not sure what you mean. Can you give me an example? Benwing2 (talk) 04:23, 9 October 2023 (UTC)Reply
@Benwing2 Look at Colombia for example, there is an audio link on the code, but it isn't shown outside of it. Rodrigo5260 (talk) 04:35, 9 October 2023 (UTC)Reply

Topic & Set categories

On the off chance you weren't already aware: it looks like your recent module edits broke all the topic and set category pages for 5 language codes:

  1. fa
  2. gsw
  3. pl
  4. tr
  5. zlw-ocs

None of the other language codes seem to have been affected, but for these language codes "all sets", "list of sets", "all topics", "list of topics", and everything under them on the tree all seem to have the same module error about labdata being nil. The fact that only 5 languages were affected would make it pretty hard to detect the problem via spot checks, so I can see how you might have missed this, but there are now about 6,000 categories in CAT:E due to it. Thanks! Chuck Entz (talk) 19:56, 9 October 2023 (UTC)Reply

@Chuck Entz Thanks, and my apologies, I've been running around today. It looks like those five language codes have bad alias settings in their language-specific label files, i.e. they are aliasing something to something else that doesn't exist. I will fix the aliases and also make the underlying code more robust to this. Benwing2 (talk) 22:17, 9 October 2023 (UTC)Reply
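For illustration, the kind of robustness check described here could look like the following Python sketch. The label and alias tables are invented examples, and find_dangling_aliases and resolve_label are hypothetical helpers, not functions in the actual category-tree modules.

```python
def find_dangling_aliases(labels, aliases):
    """Return every alias whose target is not a defined label."""
    return {
        alias: target
        for alias, target in aliases.items()
        if target not in labels
    }


def resolve_label(name, labels, aliases):
    """Look up a label, following one alias; return None instead of raising."""
    canonical = aliases.get(name, name)
    return labels.get(canonical)


# Made-up language-specific label data with one bad alias.
labels = {"Architecture": {"parents": ["Art"]}}
aliases = {"Buildings": "Architecture", "Towers": "Tall structures"}

dangling = find_dangling_aliases(labels, aliases)
```

Running the first check over every language-specific label file would flag the bad aliases up front, while the second keeps a single bad alias from surfacing as a nil error on thousands of category pages.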

it-noun error at giovane

Hi! I just noticed that giovane currently has a bug with displaying the plural and diminutive forms, which were added to the it-noun template by this WingerBot edit in February. I wanted to check with you because I'm not sure if it's just one-off formatting typo for this page, or a case of the it-noun template not showing expected behavior (maybe due to a more recent change to an underlying module such as it-headword?). Urszag (talk) 03:56, 18 October 2023 (UTC)Reply

@Urszag Thanks for letting me know. It's probably an issue with the module. Benwing2 (talk) 04:32, 18 October 2023 (UTC)Reply

"Common" modules

I'm thinking of making some declension modules for Kashubian and Silesian (and maybe redoing the Polish ones...) and I was wondering about these "common" modules you seem to make. Could you tell me a little bit more about them? Vininn126 (talk) 11:04, 19 October 2023 (UTC)Reply

@Vininn126 They hold common functionality used in more than one module, e.g. between pronunciation and conjugation modules or between conjugation and declension modules. BTW I know you are waiting for me to finish up the changes to the Polish pronunciation module and I am going to work on it next. I've had a lot of interruptions recently for various reasons, my apologies. Benwing2 (talk) 08:45, 21 October 2023 (UTC)Reply
What are some of the common functionalities for Czech?
Also I understand. I'm sure the list is huge :P Vininn126 (talk) 10:24, 21 October 2023 (UTC)Reply
This question came up while looking at adjective modules - I believe when you added old_dat to the Polish adjective module and bot converted many entries, some entries received old_dat=scy or something similar, which should just be removed from entries entirely, and old_dat should always be set to one and only end in -u. Could you have your bot remove old_dat=scy? There's no need to convert them. To be honest the Polish adjective module could use a rewrite. Vininn126 (talk) 09:41, 24 October 2023 (UTC)Reply
@Vininn126 Apologies for the delay. Module:cs-common has the following in it, approximately:
  1. Various regexes for specific vowels, consonants and specific subsets, e.g. all velars, all labials.
  2. A function to apply vowel alternations, such as between "long" and "equivalent short" vowels, e.g. ů and o, which could be used by both noun and verb modules.
  3. Functions to handle the first palatalization, the second palatalization and iotation. Possibly shared between noun and adjective modules.
  4. Functions to handle reduction and dereduction (i.e. insertion or deletion of a fleeting vowel) of nouns and short adjectives.
  5. Functions to convert between "plain" and "paired palatal" versions of certain consonants (d vs. ď, t vs. ť, n vs ň, etc.) depending on the ending.
  6. A function to combine a stem with an ending, taking care of things like the plain vs. paired palatal form of the consonant at the end of the stem.
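A minimal Python sketch of item 6, combining a stem with an ending while switching between the paired palatal and plain consonant. This is not the code in Module:cs-common (which is Lua and handles many more cases); the function name and the rule set are assumptions based only on standard Czech orthography, where ď, ť, ň are written d, t, n before ě, i and í.

```python
# Paired palatal consonants and their plain counterparts.
PAIRED_TO_PLAIN = {"ď": "d", "ť": "t", "ň": "n"}


def combine_stem_ending(stem, ending):
    """Join stem and ending, respelling a final paired palatal if needed."""
    if stem and ending and stem[-1] in PAIRED_TO_PLAIN:
        plain = PAIRED_TO_PLAIN[stem[-1]]
        first = ending[0]
        if first == "e":
            # Palatal + e is written plain consonant + ě.
            return stem[:-1] + plain + "ě" + ending[1:]
        if first in "iíě":
            # Before i, í, ě the palatal quality is shown by the vowel.
            return stem[:-1] + plain + ending
    return stem + ending
```

For example, loď plus -i gives lodi and loď plus -e gives lodě, while a stem such as hrad is simply concatenated with its ending.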
As for the Polish adjective module, yeah probably it wouldn't be too hard to rewrite to follow the form of the Czech and Ukrainian adjective modules; adjectives in general aren't so hard in most Slavic languages and I can't imagine Polish is all that different. As for removing old_dat=scy, yes that can be done. I forget the purpose of the scy, can you elaborate? Are we losing information here that we'll later have to put back by hand? Benwing2 (talk) 05:52, 25 October 2023 (UTC)Reply
BTW I am going through the pl-szl-IPA module now. There is a big impedance mismatch between the structure of Catonif's code and the structure of the Spanish code that I wrote and want to use as the front end (the issue is not so much Spanish vs. Polish but different data structures). So I am figuring out how Catonif's code works and gradually adapting it to my use case. Usually this goes a bit slow at first and then speeds up dramatically as I figure things out. Benwing2 (talk) 05:57, 25 October 2023 (UTC)Reply
Also BTW the term "impedance mismatch" is not defined properly in Wiktionary or Wikipedia. Google's generative AI stuff actually comes up with a better definition: "Impedance mismatch is a problem that arises when two systems or components that are supposed to work together have different data models, structures, or interfaces. This can make communication difficult or inefficient." Benwing2 (talk) 06:00, 25 October 2023 (UTC)Reply
@Benwing2 Thank you very much for all the explanations! And yes, my aim was to model it on the Czech module (but change the table format, add acceleration, and the only other missing parameter would be something for an optional old form). scy was used for virile plural forms, which are handled automatically. And stop apologizing! Vininn126 (talk) 08:25, 25 October 2023 (UTC)Reply
@Vininn126 So the Spanish front end supports multiple dialects, each of which can be grouped into a dialect group. Currently Spanish implements two dialect groups (Spain and Latin America). Spain has two dialects ("lleismo", where written ll and y are distinguished, and the dominant "yeismo", where written ll and y are pronounced the same). Latin America has four dialects ("lleismo", "yeismo", "Buenos Aires and environs" and "elsewhere in Argentina and Uruguay"). There are in addition some special-case labels in case some dialects are the same as others; e.g. if the two Rioplatense dialects ("Buenos Aires and environs" and "elsewhere in Argentina and Uruguay") have the same value, they show up as "Argentina and Uruguay", and if the two non-Rioplatense dialects have the same value they show up as "everywhere but Argentina and Uruguay". You can see an example of all six dialects being different in cebolla. A very different example is hielo, where both Spain and both non-Rioplatense Latin America dialects all have the same pronunciation but the two Rioplatense dialects each have a distinct pronunciation. Note that the normal display of dialect groups is to hide the individual dialects by default and display the "dominant" pronunciation of that dialect group when hiding the individual dialects. Portuguese uses a similar framework but divides things into dialect groups Brazil and Portugal, each of which has several dialects; the "dominant" pronunciation for Brazil displayed when the individual dialects are hidden is actually a separate supra-dialectal "general Brazil" dialect that tries to represent the most common features cross-dialectally. A further complication found in Portuguese but not Spanish is variants within a given dialect. 
You can see a complex example of this under experienciar, where each dialect has at least two possible pronunciations depending on the handling of the initial e- as /i/ or /e/ in Brazil and as /(i)/ or /ɐj/ in Portugal, and in addition for Brazil there is a split based on the pronunciation of i in hiatus, which is /i/ in slower pronunciation in Brazil and /j/ in faster pronunciation (with a qualifier added to this effect), but always /j/ in Portugal.
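The special-case labels for the Latin America group can be sketched in Python as follows. This is only an illustration of the collapsing rule described above, not the actual Lua front end; the function name is hypothetical, and the pronunciations in the usage example are placeholders rather than real IPA.

```python
def label_latin_america(prons):
    """Collapse the four Latin America dialects into display rows.

    prons maps each dialect name to its pronunciation string.
    """
    rows = []
    # The two non-Rioplatense dialects merge when they agree.
    if prons["lleismo"] == prons["yeismo"]:
        rows.append(("everywhere but Argentina and Uruguay", prons["yeismo"]))
    else:
        rows.append(("lleismo", prons["lleismo"]))
        rows.append(("yeismo", prons["yeismo"]))
    # The two Rioplatense dialects merge when they agree.
    buenos_aires = prons["Buenos Aires and environs"]
    elsewhere = prons["elsewhere in Argentina and Uruguay"]
    if buenos_aires == elsewhere:
        rows.append(("Argentina and Uruguay", buenos_aires))
    else:
        rows.append(("Buenos Aires and environs", buenos_aires))
        rows.append(("elsewhere in Argentina and Uruguay", elsewhere))
    return rows


# Placeholder values: non-Rioplatense dialects agree, Rioplatense differ.
rows = label_latin_america({
    "lleismo": "A",
    "yeismo": "A",
    "Buenos Aires and environs": "B",
    "elsewhere in Argentina and Uruguay": "C",
})
```

The sketch leaves out the case where all four dialects agree, and the hiding of individual dialects behind the dominant pronunciation, since both are display decisions made elsewhere.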
For Polish, we currently have three "dialects": current Polish, Middle Polish and Silesian. In addition, each one can actually have two variants: current Polish has "prescribed" and "casual", Middle Polish has "early and late" and Silesian has "standard" and "Opolskie". Potentially we could handle the "variants" using qualifiers, the way that Portuguese handles variants such as slower and faster pronunciation, or we could treat them as their own dialects coming under "Polish", "Middle Polish" and "Silesian" dialect groups. My instinct is to treat "Polish" and "Middle Polish" as distinct dialect groups, and if/when we add nonstandard Polish dialects (e.g. Mazurzenie, Northern Kresy and/or Southern Kresy), they can become dialects under the Polish dialect group. As for the variants, we can treat them either as separate dialects, as qualifier variants (like for Portuguese faster and slower pronunciation), or a mixture. My instinct here is to treat the "prescribed" and "casual" variants of Polish as qualifier variants, but the "early" and "late" variants of Middle Polish, and the "standard" and "Opolskie" variants of Silesian, as distinct dialects. But you may disagree; please let me know your opinion. In any case, the difference is mostly in display: variants with qualifiers show on the same line whereas different dialects show on different lines. Also, since Silesian is being treated as a separate language, I assume there will never be a situation where Polish and Silesian pronunciations come under the same ==Pronunciation== header, so effectively there will be (up to) two dialect groups for Polish (current Polish and Middle Polish), but only one for Silesian.
Thoughts? Benwing2 (talk) 09:04, 25 October 2023 (UTC)Reply
@Benwing2 I think your instinct is more or less what I'd agree with. Specifically with Mazurzenie those forms would actually get their own pagename, as Mazurzenie is usually reflected in spelling, but having it as a separate dialect would be best. As far as having Middle Polish be a dialect as well as Opolskie for Silesian, that's also a good idea. Same goes for having prescribed/casual as labels within the same dialect, your intuition is correct here. Vininn126 (talk) 09:13, 25 October 2023 (UTC)Reply
Thinking about it, I think putting Opole on a different line is a good idea, I'm unsure about separating Middle Polish by time. However on that note when the conversion is done I will need a list of Middle Polish terms with IPA spelled with rz so I can set any potential time parameters. Vininn126 (talk) 12:55, 25 October 2023 (UTC)Reply

es-pr bug

Hey. A friend of mine would like to report a bug in Template:es-pr. Apparently the Peru bit doesn't show up in edits like this. Anything we can do about it? Probs best to assume we are both too dumb to fix it ourselves P. Sovjunk (talk) 19:51, 28 October 2023 (UTC)Reply

descodificado

The accelerated Portuguese entry gives a "verb" header and template rather than "participle". Can you take a look? Ultimateria (talk) 02:52, 1 November 2023 (UTC)Reply

@Ultimateria Yeah, Spanish probably has the same issue. I'll take a look soon. Benwing2 (talk) 02:53, 1 November 2023 (UTC)Reply

Errors in recent bot run

I fixed diff and diff before I noticed that there was a pattern. According to my list of pages with open templates, only zonal has a similar error, but you're in a better position to know if there may be other pages affected that didn't end up with an open template. JeffDoozan (talk) 21:11, 3 November 2023 (UTC)Reply

what's the deal with placenames?

Hi. So, what's the deal with placenames? If you remove the categorization, you remove information. It's just a placename. Are you saying there isn't a superset category of placename? Ish ishwar (talk) 05:47, 6 November 2023 (UTC)Reply

@Ish ishwar There are a lot of placename categories, but 'LANG:Place names' is obsolete and I am trying to clean up remaining uses. There is a 'LANG:Places' category but it's not supposed to contain entries directly; they should be categorized as e.g. 'LANG:Towns in New Mexico, USA'. {{place}} will auto-categorize correctly but I need to know where and what sort of place this is so I can create the right description. Where did you find this information? Benwing2 (talk) 05:50, 6 November 2023 (UTC)Reply
Well, let's converse in a single place? (Wiktionary:Requests for verification/Non-English#hìwkwíalto) Ish ishwar (talk) 06:01, 6 November 2023 (UTC)Reply

Bot run for {{RQ:Marlowe Tamburlaine}}

Hi, could you please do a bot run and do the following replacement, which is necessitated by an update to {{RQ:Marlowe Tamburlaine}}?

  • {{RQ:Marlowe Tamburlaine|1|II|iii|page=34|Meete with the foole, and rid your royall ſhoulders<br>Of ſuch a burden, as outweighs the ſands<br>And all the '''craggie''' rockes of Caſpea.}} → {{RQ:Marlowe Tamburlaine|part=1|scene=iii|page=34|passage=Meete with the foole, and rid your royall ſhoulders<br>Of ſuch a burden, as outweighs the ſands<br>And all the '''craggie''' rockes of Caſpea.}}

The second parameter (with the value "II" in the above example) should be removed, as the template should now be able to determine the act number when the page number is specified. (The parameter is now giving rise to ParserFunction errors.) Thank you. — Sgconlaw (talk) 15:45, 7 November 2023 (UTC)Reply
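The requested replacement could be sketched in Python roughly as below. This is not the bot script that was actually run; convert_call is a hypothetical helper, and it assumes the template call contains no nested templates or piped links (so splitting on | is safe) and that the positional parameters are part, act, scene and passage in that order.

```python
import re

PREFIX = "{{RQ:Marlowe Tamburlaine|"
# A parameter is treated as named only if it starts with an identifier and "=".
NAMED = re.compile(r"[A-Za-z_][A-Za-z0-9_]*=")
# New names for positional parameters 1-4; None means "drop" (the act number).
POSITIONAL_NAMES = ["part", None, "scene", "passage"]


def convert_call(call):
    """Rewrite positional parameters as named ones and drop the act."""
    if not (call.startswith(PREFIX) and call.endswith("}}")):
        return call
    out = []
    position = 0
    for param in call[len(PREFIX):-2].split("|"):
        if NAMED.match(param):
            out.append(param)
            continue
        if position >= len(POSITIONAL_NAMES):
            # Unexpected extra positional parameter: leave the call alone.
            return call
        name = POSITIONAL_NAMES[position]
        position += 1
        if name is not None:
            out.append(f"{name}={param}")
    return PREFIX + "|".join(out) + "}}"
```

Calls that do not match the expected shape are returned unchanged, so they can be reviewed by hand instead of being silently mangled.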

@Sgconlaw Done (a few hours ago). Benwing2 (talk) 08:17, 8 November 2023 (UTC)Reply
Much obliged! — Sgconlaw (talk) 10:00, 8 November 2023 (UTC)Reply

Deleted template redirect

Why did you delete {{afo}} which redirected to {{alternative form of}}? I created it to be a handy shorthand initialism for the long template name, and I read nothing on WT:REDIR#Other namespaces arguing against it. Gaioa (talk) 09:12, 8 November 2023 (UTC)Reply

@Gaioa We already have {{altform}}, {{alt form}} and several other shortcuts. Having lots of random extra redirects creates a maintenance burden for people like me who run bot scripts. Benwing2 (talk) 09:31, 8 November 2023 (UTC)Reply
So because we have "several" already, we cannot have one more? I don't really see how going from 3 shortcuts to 4 is that much of additional "burden" and percentual increase. But sure I won't argue. Gaioa (talk) 08:05, 13 November 2023 (UTC)Reply
@Gaioa The problem is that everyone starts thinking like you, "oh it's just one more" and you get hundreds more across dozens of different templates. Benwing2 (talk) 08:08, 13 November 2023 (UTC)Reply

{{transclude sense}}

I see you are applying this template to more entries - I generally support the use of this template (though in some ways it still bodges things, it has a hard time with adverbs and verbs, not to mention the technical issues you've noticed as well). What is the scope of deployment you intend? I wouldn't mind including it in more Slavic entries if possible. Vininn126 (talk) 15:47, 9 November 2023 (UTC)Reply

@Vininn126 So far I've only been using it for {{place}}, i.e. proper nouns. If you have specific issues when you use it with other parts of speech, please let me know and I'll try to fix them. I definitely plan to add the ability to transclude from a source word that's in a language other than English, although that has its own issues. It would be great to be able to use a similar approach to reduce the duplication between Latin and Cyrillic entries in Serbo-Croatian (which would require having the Serbo-Croatian-specific ability to convert between Latin and Cyrillic). Benwing2 (talk) 20:29, 9 November 2023 (UTC)Reply
@Vininn126 I would like to create a shortcut for {{transclude sense}}. What do you think it should be? I'm actually thinking we should get rid of the existing {{transclude}} (which is only used in {{Navbar}} and a few navigation boxes that invoke it; it appears on a lot of quotation pages but not directly) and rename {{transclude sense}} to {{transclude}}. Maybe it could then be shortcutted as {{tcl}}. Thoughts? Benwing2 (talk) 09:28, 10 November 2023 (UTC)Reply
@Benwing2 That sounds good to me. I've also wanted to improve some functionalities of it, such as allowing for a shorter gloss. Vininn126 (talk) 09:46, 10 November 2023 (UTC)Reply

In case you haven't noticed

There are 44 pages in CAT:E. Aside from the 3 due to your reversion of Fenakhay's edits, there seems to be a common theme of Module:place throwing an error without checking for things like full-form prefixes. Please fix. Chuck Entz (talk) 16:44, 12 November 2023 (UTC)Reply

@Chuck Entz My apologies. I added more error-checking to Module:place, which flushed out the various errors. They are all cleared now. Benwing2 (talk) 20:33, 12 November 2023 (UTC)Reply

Greek verbs

Thanks for dealing with that!   — Saltmarsh🢃 07:47, 16 November 2023 (UTC)Reply

@Saltmarsh You're welcome! Benwing2 (talk) 07:49, 16 November 2023 (UTC)Reply

Creating a general pronunciation module framework

Hiya - so I’ve been doing a lot of work on the English pronunciation module, and eventually came to the conclusion that although I like the design of eSpeak, the actual pronunciation rules themselves need a total rewrite, since they’re far too tailored towards specific words, and often fail unpredictably when given words not anticipated for (e.g. many place names).

That being said, it wouldn’t be too difficult to use the design with other languages, and in most cases the ruleset would be a hell of a lot simpler, while still being much more sophisticated than most of the modules we use at the moment. Would you mind taking a look at the documentation for eSpeak and letting me know what you think? In particular, this document on how the rules work, and you can also find the dictionaries for a bunch of other languages on there as well. Some of it’s only relevant to text-to-speech (e.g. the various different pause lengths), but the special rule characters are worth looking at. There’s also this document, which details how phonemes work in English, since it shows how accent differences are handled. Theknightwho (talk) 17:14, 19 November 2023 (UTC)Reply

@Theknightwho I'll definitely take a look. I'm not surprised about your conclusions regarding the pronunciation rules; I think I mentioned earlier that the rules looked too specific and too numerous, meaning that it wouldn't be possible for humans to reason about them -- this is a key requirement for people being able to write respellings, and given the irregularity of English spelling, most words will require respelling. (Also, in order to handle multiword terms effectively, we'll probably have to implement scraping of individual-word pronunciations.)
As for rewriting the modules for other languages, a lot of them work pretty well at this point (e.g. the French, Russian and Italian modules have been stable for awhile, as has the Spanish module although it needs some work on obstruent clusters) so it might not be worth it for these; and for the ones that still have issues among the ones I've worked on, the issues are often not related to the structure of the rules themselves. For example, the Portuguese module has some issues but these are more due to the lack of a single standard accent esp. in Brazil, along with insufficient documentation for Brazilian Portuguese pronunciations as well as native speaker disagreements as to what is "normal" or not in different speech registers. (User:MedK1 has given me a list of issues to work through that I'll get to at some point.) But it might be relevant to languages that don't yet have good modules (although insufficient documentation on language phonetics may be an issue in many cases). Benwing2 (talk) 22:54, 19 November 2023 (UTC)Reply
@Benwing2 Thanks - yeah, that was pretty much my conclusion. I think we'll probably be able to reach ~95% accuracy, and having native intuition is helpful: e.g. "tion", "sion", "cion", "tian" (etc.) type endings are actually pretty regular, but the rules around them are intricate. One thing that it really excels at is stress, and (as is common in English) you can set it so that certain strings cause primary stress to go on the preceding syllable: e.g. "-ity", "-ical" etc. It's possible to set language-specific stress rules, too.
One of the trickiest areas with English is where rules are regular but unintuitive to native speakers, meaning that they don't always translate well into newer coinages: e.g. a stressed "war" is /wɔː/ in most words where it appears, but recent coinages like warg buck the trend, and lesser-known place names like Wark - which is near where I live - have unstable pronunciations. The same goes for "wa" before most consonants: e.g. archaic wabble is pronounced the same as wobble, by analogy with swab, but a newer borrowing like nawab is unstable.
I think German and Dutch are two obvious candidates for this approach, particularly given that they already have stable rulesets that we can borrow/tweak. Theknightwho (talk) 23:18, 19 November 2023 (UTC)Reply
@Theknightwho Makes sense. Keep in mind that I have a mostly complete module for German already in Module:User:Benwing2/de-pron with extensive test cases in Module:User:Benwing2/de-pron/testcases/prefixes, Module:User:Benwing2/de-pron/testcases/suffixes and Module:User:Benwing2/de-pron/testcases/misc, so it probably makes more sense for me to fix the remaining issues than start over. But Dutch might be a good use case. Benwing2 (talk) 23:25, 19 November 2023 (UTC)Reply
By the way - the Portuguese accent issue might be solvable in the same way as English: it works with the "deep" phonemes, which are then translated into the surface phonemes depending on which accent you want. This means that we need to make more distinctions than actually exist in any given accent (e.g. it uses aa to represent the "a" in bath, which is /ɑː/ in RP and /æ/ in GA; that's distinct from a, which is /æ/ in both, or A:, which is /ɑː/ in both). Sometimes the deep phonemes differ only in how they behave under stress: e.g. the "eau" in bureau is /əʊ/ normally, /ə/ when diminished (e.g. bureaucrat) and /ɒ/ when stressed (e.g. bureaucracy). You get the same thing with gastro-, astro- etc. That's distinct from the /əʊ/ in a word like condone, which is still /əʊ/ when stressed. Theknightwho (talk) 23:31, 19 November 2023 (UTC)Reply
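As a small illustration of the deep-to-surface idea, the three "a" phonemes mentioned here can be written as a per-accent lookup in Python. The table layout and the realise function are assumptions for this sketch and do not reflect how eSpeak or any Wiktionary module actually stores its phoneme data; only the three mappings come from the comment above.

```python
# Deep phoneme -> surface phoneme per accent, using the three examples above.
SURFACE = {
    "aa": {"RP": "ɑː", "GA": "æ"},   # the "a" in bath
    "a": {"RP": "æ", "GA": "æ"},     # /æ/ in both accents
    "A:": {"RP": "ɑː", "GA": "ɑː"},  # /ɑː/ in both accents
}


def realise(deep_phonemes, accent):
    """Map each deep phoneme to its surface form for the given accent.

    Phonemes without an entry are passed through unchanged.
    """
    return [SURFACE.get(p, {}).get(accent, p) for p in deep_phonemes]
```

A single deep respelling then yields both accents, at the cost of editors having to know which of the three "a" phonemes a given word takes.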
@Theknightwho Well, we already handle this in Portuguese in various ways, e.g. a grave accent indicates an unstressed but unreduced vowel in European Portuguese but has no effect in Brazilian Portuguese, where non-final unstressed vowels are normally unreduced. We also have vowels written with a ^, ^^ or * after them to indicate vowels pronounced one way in Brazil but a different way in Portugal; see Template:pt-IPA/documentation#Symbols indicating other Brazil-Portugal differences (vowel raising, glides, epenthesis). But there are also many unpredictable differences, and for this reason we support separate respellings for Brazil vs. Portugal and for individual dialects. You will undoubtedly need to support the same in English, to handle words with differing stress like laboratory, advertisement, controversy, etc. as well as words with differing vowel qualities like methyl, tomato, etc. I would not recommend trying to create a single, unified respelling mechanism to handle all such differences; it would be way too complicated. Benwing2 (talk) 04:04, 20 November 2023 (UTC)Reply
@Theknightwho Also check out Wikipedia's "diaphonemic" handling of English IPA. It uses (or used to use?) a pan-dialectal representation that tries to account for most of the systematic vowel differences across different dialects, e.g. they have (or used to have? I don't see it any more in w:Help:IPA/English) a symbol for the unstressed /o/ in words like motoric and omission, which sometimes sounds like /o/ and sometimes more like a schwa; the symbol as they used it looked sort of like a theta, I think it was /ɵ/, which per the IPA is a close-mid central rounded vowel. Benwing2 (talk) 04:13, 20 November 2023 (UTC)Reply
@Theknightwho See also International Phonetic Alphabet chart for English dialects. However, I'm not convinced a diaphonemic approach will be viable. Consider, for example, the cot-caught distinction, where American accents like mine that have this distinction also have the lot-cloth split, which is no longer present in RP. A diaphonemic approach would have to have a separate phoneme for such words (in general the CLOTH/CAUGHT vowel occurs before /f/, /s/, /θ/, /g/, /ŋ/ in monosyllabic words, but in multisyllabic words it is complex and variable whether the LOT or CLOTH vowel occurs, hence I usually have foster with the CLOTH vowel /ɒ/, but roster with the LOT vowel /ɑ/). Words like palm and calm for me have the CLOTH vowel e.g. /pɒm/ /kɒm/, but no dictionary validates this pronunciation, claiming that I "ought" to say these words with LOT vowel /ɑ/, which sounds very wrong to me; and younger American speakers usually pronounce these words with merged cot-caught and pronounced /l/. These sorts of complexities (and myriad others, cf. the bad-lad split in Australian English, so-called Canadian raising in North American [not just Canadian!] English, /æ/ raising in various US East Coast dialects, etc.) are IMO best handled with dialect-specific respellings along with appropriate qualifiers as needed. Benwing2 (talk) 04:30, 20 November 2023 (UTC)Reply
@Benwing2 I think it will probably end up being a mix - the good thing about a diaphonemic approach is that we're not restricted to the phonemes of any one accent, so we can make finer distinctions where necessary (e.g. the bath vowel). However, some of the mergers are non-predictable, which is where we'll be forced to take a respelling approach.
One thing we'll definitely need is a pos= parameter: it's possible to specify that certain prefixes/affixes are attached independently from the rest of the phonemes in the word, and sometimes this only occurs with certain parts of speech: e.g. -er (agent noun) is an example of this, so silent final consonants remain silent (e.g. bomber, condemner), but we don't want this to affect verbs like slumber or clamber. Even then, there will need to be overrides (e.g. amber). Obviously this also becomes relevant with stress patterns, too: accent, affect etc. Theknightwho (talk) 05:05, 20 November 2023 (UTC)Reply
@Theknightwho Hmm. I am wary of using a |pos= parameter for this. I think it will be better to use a symbol to separate morphemes when this is necessary for the pronunciation, e.g. 'bomb-er' or 'bomb+er' or whatever. Note that slumber can be a noun, just like bomber. I also think the stress patterns of nouns vs. verbs are far too irregular to use a |pos= param for this; better to have a default stress on the second or third from last syllable (depending on the heaviness of the second from last syllable, similar to Latin) and require final-accented words to be marked explicitly. Benwing2 (talk) 05:15, 20 November 2023 (UTC)Reply
@Benwing2 Hmm - I think we could probably use it for the stress, but you’re probably right when it comes to -er.
Default stress is the first syllable, but it’s overridden by various Greek/Latin suffix patterns like -ion or -ic, which push the stress onto the relevant prior syllable. This accounts for words like antithesis versus antithetical, for example, and it’s also capable of vowel elision, meaning ocean and oceanic both work correctly. Prefixes may need to be manually separated in some cases, if they’re supposed to be unaffected by the normal stress rules. Theknightwho (talk) 05:41, 20 November 2023 (UTC)Reply
@Theknightwho One issue with an actual |pos= parameter is if you have a multiword term, it's hard to indicate the part of speech of each word. This could be solved by inline modifiers, but in general the use of a parameter like this takes a lot of characters compared with a simple symbol like an accent mark. As for default stress on the first syllable, I think we'd need to examine various long words to see whether that makes sense. In my experience, English words are rarely stressed farther back than the fourth-from-last syllable, with foreign words tending to get penultimate or antepenultimate stress. Cf. for example, Sebástopol and Vladivóstok, in accordance with the Latinate light/heavy-syllable rule and in both cases not in accordance with the source-language terms (Russians would stress these words as Sevastópol' and Vladivostók). Also cf. Tokyo, Osaka with antepenultimate stress, Nagasaki, Mitsubishi, Kyoto, etc. with penultimate stress and Hiroshima with either. Stressing any of these four-syllable terms on the first syllable would be wrong, which suggests to me that penultimate/antepenultimate stress is the underlying default with ante-antepenultimate stress occurring only in specific contexts (cf. ádmirable, réputable, where somehow or other -able/-ible causes a leftward shift of stress or at least doesn't draw the stress forward; and contrast defénsible with a heavy syllable before the -ible). But maybe best would be to do a simple count of four-syllable-and-longer words with penultimate, antepenultimate and ante-antepenultimate stress to see what occurs most; you said you had a large set of words with RP pronunciations that you could use for this. Benwing2 (talk) 06:03, 20 November 2023 (UTC)Reply
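For illustration only, here is a minimal Python sketch of the Latin-style default described above: stress the penultimate syllable if it is heavy, otherwise the antepenultimate. The pre-syllabified input, the function names and the crude weight test (closed syllable or two adjacent vowel letters) are all assumptions made for this sketch, not anything taken from an existing module.

```python
VOWELS = set("aeiou")


def is_heavy(syllable: str) -> bool:
    """Crude weight test (an assumption of this sketch): a syllable counts as
    heavy if it ends in a consonant or contains two adjacent vowel letters."""
    if syllable[-1] not in VOWELS:
        return True
    return any(a in VOWELS and b in VOWELS for a, b in zip(syllable, syllable[1:]))


def default_stress(syllables: list[str]) -> int:
    """Return the index of the syllable that receives default primary stress."""
    n = len(syllables)
    if n <= 2:
        return 0  # one or two syllables: stress the first
    if is_heavy(syllables[-2]):
        return n - 2  # heavy penultimate attracts the stress
    return n - 3  # otherwise fall back to the antepenultimate


# Examples from the discussion: Sebástopol, Vladivóstok, defénsible
print(default_stress(["se", "bas", "to", "pol"]))   # index of "bas"
print(default_stress(["vla", "di", "vos", "tok"]))  # index of "vos"
print(default_stress(["de", "fen", "si", "ble"]))   # index of "fen"
```

Words such as ádmirable or Nagasáki would still need explicit marking under this rule, which is consistent with the exceptions mentioned above.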
@Benwing2 Yeah, it might end up being necessary to change the default stress, since the Latin/Greek suffixes are present in well over half of the vocab anyway. I think historically English definitely did have first syllable stress by default, but the overwhelming majority of long polysyllabic words are Greek/Latinate, which I guess might be why we apply similar stress patterns to other long words - especially when they exhibit similar patterns (e.g. Hiroshima, Chattanooga, ending in "a", compared to something like Witherington, which is first-syllable stress). That being said, so far I’m finding the stress rules are pretty accurate. Secondary stress is actually pretty straightforward once the primary stress has been identified, too, since English usually alternates starting from the first syllable, but will accept two unstressed syllables before the primary stress, and won’t usually give the final syllable secondary stress: àntĭdìsĕstàblĭshmĕntárĭ.ăn (with -ism being a suffix tacked on after the main stress has been determined). Theknightwho (talk) 06:30, 20 November 2023 (UTC)Reply
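The secondary-stress pattern described above can be sketched roughly as follows in Python. This is a simplified illustration, not the actual implementation: it assumes the primary stress index is already known, alternates secondary stress from the first syllable, never puts a secondary stress directly before the primary one (so up to two unstressed syllables may precede it), and assigns no secondary stress after the primary, which trivially keeps the final syllable unstressed. The function names and the 0/1/2 encoding are invented for the sketch.

```python
def secondary_stresses(primary: int) -> list[int]:
    """Indices of secondary-stressed syllables: every second syllable from the
    start, stopping before the syllable adjacent to the primary stress."""
    return list(range(0, primary - 1, 2))


def stress_pattern(n_syllables: int, primary: int) -> str:
    """Encode stress per syllable: 1 = primary, 2 = secondary, 0 = unstressed."""
    secondary = set(secondary_stresses(primary))
    digits = []
    for i in range(n_syllables):
        if i == primary:
            digits.append("1")
        elif i in secondary:
            digits.append("2")
        else:
            digits.append("0")
    return "".join(digits)


# an-ti-dis-es-tab-lish-men-tar-i-an (before -ism is attached), primary on "tar"
print(stress_pattern(10, 7))
```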
@Theknightwho Well, any default stress rule is fine as long as it works in most cases and isn't too complicated. I agree that English (specifically Old English) once had initial stress with the exception of unstressed prefixes like ge-, be- and a few others, for some of which it depended on whether the word was a noun or a verb (echoed in modern alternations between "to record" and "a record", etc.); I wrote the Old English pronunciation module and it uses this principle. Benwing2 (talk) 06:38, 20 November 2023 (UTC)Reply

gl-conj

Just passing by to thank you for the incredible work in this template :-) Froaringus (talk) 10:25, 20 November 2023 (UTC)Reply

@Froaringus You're welcome! Benwing2 (talk) 00:01, 22 November 2023 (UTC)Reply

mhr-conj

Hi Benwing, I was wondering if you'd be interested in helping with a project (module), if that's OK. Stríðsdrengur (talk) 21:33, 20 November 2023 (UTC)Reply

@Stríðsdrengur This is for Eastern Mari? Unfortunately I know nothing of this language. I'm also pretty busy with various other projects right now. I might be able to help you after dealing with the Polish pronunciation module rewrite, but I'd need some reference tables that I could work off of, so I have some idea how the verb conjugations work. Benwing2 (talk) 00:05, 22 November 2023 (UTC)Reply
No problem, friend, as soon as you can, I'll explain it better Stríðsdrengur (talk) 09:54, 22 November 2023 (UTC)Reply

Template parser live

Hiya - I've rolled out the template parser as a complete rewrite of Module:templateparser. In practical terms, the output is pretty much the same (except for the fact it can now cope with much more difficult inputs), but it shows the design is stable and workable. There's no urgent need to expand template parsing at the moment, but I thought it was a useful first step, given that the wikitext parser works in a similar way: they're both underpinned by Module:parser.

I'll get working on the documentation. Theknightwho (talk) 04:04, 24 November 2023 (UTC)Reply

@Theknightwho OK sounds good. Have you done any tests to compare the memory (especially) and also the speed of the new vs. old parsers? Benwing2 (talk) 04:54, 24 November 2023 (UTC)Reply
@Benwing2 Yes - the worst I've seen has been a ~1MB increase. The speed seems to be the same. Theknightwho (talk) 05:06, 24 November 2023 (UTC)Reply
OK great! Benwing2 (talk) 05:15, 24 November 2023 (UTC)Reply

لآلی

Your bot changed the transcription "la'âli" to "la-âli" which is wrong --2A01:CB09:B031:3F29:3518:E51B:A412:41A3 09:41, 24 November 2023 (UTC)Reply

I see, this was back in March. The rules used by the bot script are very complicated and took a lot of tuning so I'm not surprised there are mistakes. I'll see if there are others of this nature. Benwing2 (talk) 09:56, 24 November 2023 (UTC)Reply

link synchronically to help users other than linguists

With this searchbox search: "synchronically insource:/[s|S]ynchronically /", I found about 2,053 unlinked instances of this term, principally in Etymologies. Occurrences in definitions and in etymologies should definitely be linked unless we are, in fact, only a resource for linguists and data-miners, which would be sad for me. Could you do this, possibly making sure that occurrences in citations do not get so linked? DCDuring (talk) 18:41, 25 November 2023 (UTC)Reply

Same for diachronically. Theknightwho (talk) 19:45, 25 November 2023 (UTC)Reply
Yes. But diachronically is much less used (~1%) than synchronically. I guess there is a presumption that etymology is historical. DCDuring (talk) 21:13, 25 November 2023 (UTC)Reply
The link should probably be to the linguistic sense of synchronic. I think users can get the implication of the -ly. DCDuring (talk) 21:00, 25 November 2023 (UTC)Reply
@DCDuring @Theknightwho Most of these say "synchronically analyzable as" followed by one of the affix templates, which I think should use {{surf}}, because it's designed exactly for this purpose and already links to the glossary. What do you think? Benwing2 (talk) 22:19, 25 November 2023 (UTC)Reply
Sounds good to me. Theknightwho (talk) 22:22, 25 November 2023 (UTC)Reply
I don't think so. The wording in the glossary doesn't seem very attractive (Why "at a later point in time" rather than "in contemporary [your language here]"? Is someone trying to cover all possible situations with one set of words?) The link-to entry for surface analysis is horrible as a dictionary entry or as 'help'. When I say that we seem to be narrowing our user base to linguists and data miners, this is a better illustration than I expected. DCDuring (talk) 01:10, 26 November 2023 (UTC)Reply
Is the desire to cover all possible cases behind using surface analysis rather than surface etymology, which are, according to our entries, supposed to be synonymous? DCDuring (talk) 01:26, 26 November 2023 (UTC)Reply
@DCDuring We can fix the wording of the glossary, if that is your concern. Benwing2 (talk) 02:47, 26 November 2023 (UTC)Reply
That's only part of my concern on this narrow issue. Is {{surf}} ever used outside of Etymology sections? Shouldn't it display as "surface etymology", which gives users less need to go to the glossary? Shouldn't the glossary link to our entry for surface etymology? Shouldn't the definition of surface analysis be shortened by 60-75%?
My larger concern is that we seem to be aiming at a user base that consists of users who don't need Wiktionary. Some of our linguistic- and computer-term definitions seem to be written for the satisfaction of the contributor, to show an ability to precisely define something, as if to one's teacher, especially something a little difficult. I think our need is to make it possible for a user to get a superficial grasp of the term and know where to go for more depth (usually WP). If we can't do this, we need to recruit contributors who can. DCDuring (talk) 17:21, 26 November 2023 (UTC)
@DCDuring I'm still not completely understanding what you want changed. I think changing "surface analysis" to "surface etymology" is fine personally; it's intended specifically for etymology sections and not likely to be used outside of them. Can you propose new wording for the definition of "surface analysis" or "surface etymology"? Benwing2 (talk) 22:18, 26 November 2023 (UTC)Reply
Quercus solaris has changed the wording of both so the previous six-line definition is now reduced to two lines. The excess verbiage was simply transferred to an examples box, where it does much less harm, but should be simplified.
Other than that I'd merely like to change every contributor's mindset toward the needs of normal human users. DCDuring (talk) 00:25, 27 November 2023 (UTC)Reply
@DCDuring I did a pass cleaning up "Synchronically analy[sz]able as", "Synchronically derived from", "Synchronically equivalent to" and the like followed by an affix template to use {{surf}}. Hopefully this is an improvement or at least not a degradation; the words now displayed ("by surface analysis") are linked to the glossary, which is better than before. I have posted to the Beer Parlour about switching the wording from "surface analysis" to "surface etymology"; we'll see if it gets consensus. Benwing2 (talk) 09:06, 27 November 2023 (UTC)Reply
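As a rough illustration of the kind of cleanup pass described above, the following Python sketch rewrites the listed phrases when they are directly followed by an affix template. It is not the bot's actual code: the regular expression, the function name, and the assumption that {{surf}} accepts the same positional parameters as {{affix}}/{{af}} are all assumptions made for this example.

```python
import re

# Phrases named in the discussion, followed directly by {{affix|...}} or {{af|...}}.
# Only these two template names are handled, since other affix-type templates
# may treat their parameters differently.
PATTERN = re.compile(
    r"[Ss]ynchronically (?:analy[sz]able as|derived from|equivalent to) "
    r"\{\{(?:affix|af)\|"
)


def use_surf(wikitext: str) -> str:
    """Replace 'Synchronically ... {{af|' with '{{surf|' and leave the
    template parameters untouched."""
    return PATTERN.sub("{{surf|", wikitext)


print(use_surf("Synchronically analyzable as {{af|en|happy|-ness}}."))
```

Because the pattern requires a following template, plain-text uses of "synchronically" (for example inside citations) are left alone.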
Thanks. Was the wording of {{surf}} subject to a vote or BP discussion at the time of its creation? DCDuring (talk) 15:19, 27 November 2023 (UTC)Reply
@DCDuring Not sure as I didn't create the template. I doubt there was a vote but there may have been a discussion. Benwing2 (talk) 20:19, 27 November 2023 (UTC)Reply
There was an RFDO discussion two years ago with "no consensus". The discussion included various bits of mild dissatisfaction with the wording. I participated and questioned the wording. DCDuring (talk) 23:10, 27 November 2023 (UTC)Reply
@DCDuring OK, thanks. Benwing2 (talk) 23:28, 27 November 2023 (UTC)Reply

WingerBot adding bor=1 to calques

In this edit, WingerBot added bor=1 to the descendant templates for calques. The effect of this is to change the tooltip on the arrow from "calque" to "borrowing". I believe this to be incorrect, since a calque is not a loanword. However, this was a semi-manual edit, and you wrote a substantial portion of the code for the descendant template, which gives me pause. Is this edit good for some reason I don't know about? Integeryielded (talk) 09:49, 26 November 2023 (UTC)Reply

Oops, that should be "borrowed" in the previous paragraph, not "borrowing". Integeryielded (talk) 10:21, 26 November 2023 (UTC)Reply
@Integeryielded This is probably a mistake; this was several years ago but it looks like I wrote code to add |bor=1 whenever there was a borrowing, not taking into account whether it's also a calque. Probably the easiest thing is to fix the descendant code to prefer 'calque' over 'borrowed' in the tooltip if both exist. Benwing2 (talk) 22:21, 26 November 2023 (UTC)Reply
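The proposed fix amounts to a precedence rule, which could be sketched like this in Python. The real descendant code is a Lua module; the function name and the `calque` flag name here are hypothetical, and only `bor` is taken from the discussion.

```python
from typing import Optional


def arrow_tooltip(bor: bool = False, calque: bool = False) -> Optional[str]:
    """Pick the tooltip for the descendant arrow, preferring 'calque'
    over 'borrowed' when both flags are set."""
    if calque:
        return "calque"
    if bor:
        return "borrowed"
    return None  # no arrow tooltip for plain inherited descendants


print(arrow_tooltip(bor=True, calque=True))
```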

Borrowing module es-pronunc for Spanish Wiktionary

Hello, I would like to borrow your module {{es-pronunc}} for an implementation on Spanish Wiktionary. Unless you have any objection, I will go ahead. Regards, Tmagc (talk) 14:46, 3 December 2023 (UTC)

@Tmagc That is fine. Benwing2 (talk) 15:52, 3 December 2023 (UTC)Reply
@Benwing2 Well, the module is implemented. Many thanks, it works fine.
Here are some (constructive) criticisms that I think will be useful for the English implementation:
  1. There's no such thing as [z] in Spanish, even if some books or sites affirm the opposite. “s” will always be either [s] or [θ] (ceceante, in some places of Spain), and "c/z" will always be either [θ] or [s].
  2. There's no such thing as [v]; your substitution of [f] with [v] in phonetic mode doesn't make sense.
  3. The “hi” from hielo is always [j], there isn't any yeísmo or sheísmo there.
  4. Changing “nb” and “np” for “mb” and “mp” is fine, but “nm” should not be changed to “mm” (eg: “inmolar” -> /in.mo'laɾ/).
  5. You used jokers for “ch” and “ʝ”, but you should also consider adding another one for “thr” (maps to pronunciation [tʴ])
  6. I wouldn't consider “y” a vowel, because it breaks the accentuation rules. Words without a written accent (´) that end in a consonant other than n or s should be stressed on the final syllable (agudas), but “caray” would come out as /'ka.ɾai/ rather than /ka'ɾai/. Moreover, I would simplify your line 486 to if #syllables > 1 and rfind(word, "[^" .. vowel .. "ns#]#") then; your other two conditions don't seem to make any sense: which word ending in consonant + n/s would be considered aguda, and which monosyllable doesn't have vowels?
  7. Missing “les” in unstressed object pronouns.
  8. I wouldn't put any stress mark on monosyllables. Actually, they fall outside the aguda/llana/esdrújula classification, and stressing them by default can cause more confusion (for example, “que” and “qué” would have exactly the same pronunciation, while only the latter is stressed).
  9. Your syllabification algorithm is fine, but it lacks a “prefix table” or something similar for spotting known prefixes before doing the rest. Otherwise, there are a few words with bothersome prefixes that are impossible to syllabify properly. So, “transatlántico” should be “trans-at-lán-ti-co” and not “tran-sat-lán-ti-co”, but “transar” should be “tran-sar”. There's a well-known prefix prominence issue, but implementing this seems quite difficult and a “can wait, non-urgent” problem.
  10. There are some variables that are or should be common to all functions, for example the final_conversions table, or the Unicode code points for all the wildcards you used. But that is secondary, and my implementation and invocation strategy is different because the format, templates and modules of the Spanish site are different. So it's difficult to give more advice on code formatting and structure.
If you wait patiently, I can make the most important corrections to the English module. But anyway, I'm writing in case you want to check it by yourself. Regards and thanks again, will try to borrow your modules to make pronunciation for other languages. Tmagc (talk) 03:01, 21 December 2023 (UTC)Reply
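To make the concern in point 6 concrete, here is a small Python sketch of the default Spanish stress rule (a written accent wins; otherwise words ending in a vowel, n or s are stressed on the penultimate syllable, and all others on the final syllable). It is an illustration under stated assumptions, not the module's code: the input is already split into syllables, and the `y_is_vowel` switch is a hypothetical parameter added only to show how treating final “y” as a vowel would change the result for “caray”.

```python
ACCENTED = set("áéíóú")
PLAIN_VOWELS = set("aeiou")


def stressed_syllable(syllables: list[str], y_is_vowel: bool = False) -> int:
    """Return the index of the stressed syllable under the default rules."""
    # A written accent always marks the stressed syllable.
    for i, syl in enumerate(syllables):
        if any(ch in ACCENTED for ch in syl):
            return i
    if len(syllables) == 1:
        return 0
    last = syllables[-1][-1]
    vowels = PLAIN_VOWELS | ({"y"} if y_is_vowel else set())
    if last in vowels or last in "ns":
        return len(syllables) - 2  # llana: penultimate stress
    return len(syllables) - 1  # aguda: final stress


print(stressed_syllable(["ca", "ray"]))                   # final syllable
print(stressed_syllable(["ca", "ray"], y_is_vowel=True))  # penultimate syllable
```

Whether the actual module behaves like the second call depends on how it handles final “y” elsewhere, which this sketch does not model.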
@Tmagc Hi again. Glad to hear you were able to get things working, and thank you for your comments. It looks like your module is quite different from mine; I assume you cut out a lot of stuff. Nonetheless I think we should strive to keep at least the output of the two the same as much as possible. In regards to your specific issues, I'd like to see what other Spanish editors say. (Notifying Ungoliant MMDCCLXIV, Ultimateria, Koavf, AG202): and also pinging @Nicodene and native speakers @Ser be etre shi, Rodrigo5260, AugPi, Vivaelcelta, at least some of whom are hopefully still active. My comments on the numbered issues:
  1. Are you sure about this? I have definitely seen Spanish phonology papers that say there is a [z] in desde, mismo, etc.
  2. This seems quite possible and is probably a holdover from an earlier version of the module.
  3. I'm not sure I completely understand. hi in hielo, hierba is indeed always /j/, distinctive from yema, yerba, etc. But the module should correctly handle this already. What is your concern?
  4. Would like native speaker input about /nm/; I'm not sure here but it's possible I made the change based on some prior native speaker input.
  5. What do you mean by "joker"? Do you mean some sort of special rule? Can you give some examples of words with thr that have the pronunciation [tʴ]?
  6. The inclusion of y as a vowel is commented as follows: "include y so we get single-word y correct and for syllabifying from spelling". The latter I think means that it helps in unadapted borrowings where y is indeed a vowel, e.g. rally. Are there specific issues you're seeing with this? I'm pretty sure the code handles y correctly.
  7. You are right, I will fix it.
  8. The idea of putting stress marks on monosyllables is that some of them are stressed and some are unstressed, so we might as well indicate this. Maybe we can remove the stress mark from monosyllables in single-word terms, but I think in multiword terms they're helpful to identify where the sentence stresses occur. As for que and qué, if que without an accent is unstressed we should add it to unstressed_words (this is done in the corresponding portion of the Portuguese module, where it's more obvious because unstressed e gets raised to /i/, so que is /ki/ but quê is /ke/).
  9. Yes, the correct handling of prefixes is a problem. This issue occurs also in Slavic languages, for example. Some modules I have written or worked on have lists of prefixes, but as you point out, there are cases that are impossible to handle automatically and one way or another will require manual indication of the syllable boundary using a period (.).
  10. Yes, the code could definitely be cleaned up, although for now let's focus on the output issues.
Again, your comments are greatly appreciated.
I also have questions about clusters involving stops. Currently, written x gets phonologized as /gs/, and similarly written ct becomes /gt/, written ps becomes /bs/, written pt becomes /bt/, etc. In the phonetic representation, the /g/ and /b/ get converted to approximants. Several people have complained about this, saying that there should be some subset of /ks/ /kt/ /ps/ /pt/ used instead, although which of these should be used seems to vary from speaker to speaker. Can you comment on this? Should we change the output accordingly, or should both pronunciations be produced? Benwing2 (talk) 04:10, 21 December 2023 (UTC)Reply
@Benwing2 Hi again:
  1. I'm sure. The root of the problem is a confusion between IPA symbols and RAE symbols. The RAE does not follow IPA conventions, and if you look at the RAE phonetic chart you can see there's no /θ/; instead they use /z/ to represent that sound. But that's different from the IPA /z/ sound; the latter is like “ts”. We had something similar to that sound, but it was represented by ç (c with cedilla), which is an archaism. The only site where I found this stated explicitly is Spanish Wikipedia itself, but if you agree that IPA's /θ/ is the “c/z” and that this phoneme is missing from the RAE's site, then you have good proof. Your papers were actually talking about the use of /θ/ for “s”. That does exist and it's called “ceceo”.
  2. -
  3. text, word_initial_hi = rsubb(text, "#h?i(" .. V .. ")", rioplat and "#j%1" or "#ɟ%1"), the last part should be just "#j%1", without the ?:-like conditional for using your ɟ yeísmo joker.
  4. -
  5. Joker, wildcard, I don't know what to call it. You reserved some special characters (ɟ, ĉ, and the unassigned Unicode characters) to represent specific phonemes. You should do the same for “thr”, for example es:thrauna and es:pithrel. Most of these words are of Mapuche origin, but they are part of Spanish as well.
  6. OK, it's somewhat obscure, at least to me. I wouldn't have considered “y” a vowel and would have respelled rally as “rallí”, but still I didn't find any mistake. It would be impossible to change this because it would require handling all the potentially affected pages. Forget it.
  7. -
  8. Ok, you're right.
  9. -
  10. -
  11. I agree with the complainers that should be /ks/ /kt/ /ps/ /pt/ rather than /gs/ /gt/ /bs/ /bt/. Thanks, I will change it in Spanish module, too.
Regards. Tmagc (talk) 04:56, 21 December 2023 (UTC)Reply
@Tmagc It is very easy to find references that indicate that /s/ is [z] before a voiced consonant. So says the prestigious Journal of the IPA [15] So says the Collins Dictionary [16] Here is an entire dissertation on this subject [17] The very first sentence of the abstract says this:
In Spanish, the phoneme /s/ has two variants: [z] occurs in the coda when preceding a voiced consonant, and [s] occurs elsewhere.
So I think you are mistaken in your belief that there is no [z]. However, I'll fix the other issues. Benwing2 (talk) 05:07, 21 December 2023 (UTC)Reply
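The allophonic rule quoted from the dissertation (coda /s/ realized as [z] before a voiced consonant, [s] elsewhere) can be sketched in Python as a single substitution. This is an illustrative sketch only: the set of voiced consonants, the use of "." as a syllable boundary and the function name are assumptions, and it ignores dialects with aspiration of /s/.

```python
import re

# Voiced consonants that trigger voicing of a preceding /s/ (assumed inventory).
VOICED = "bdgmnlrɾβðɣʝɲŋ"


def voice_s(phonemic: str) -> str:
    """Turn s into z when the next segment, optionally across a syllable
    boundary '.', is a voiced consonant."""
    return re.sub(rf"s(?=\.?[{VOICED}])", "z", phonemic)


for word in ["des.de", "mis.mo", "es.to", "ka.sa"]:
    print(word, "->", voice_s(word))
```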
@Benwing2 Yes, I agree, @Tmagc I think you're mixing up phonology & phonetics. It's established in pretty much all of Spanish phonetic analyses that /s/ becomes [z] before voiced consonants. It's hard for a Spanish native to pinpoint due to its allophonic nature, but it's there for the majority of speakers.
  • In terms of [v], though, I agree, it should not be used. Sonidos en contexto (Terrell Morgan, 2010) page 182 states:

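Común en el inglés, la fricative [v] es labiodental, no bilabial. En español, el uso de la labiodental es esporádico y no sistemático; es por eso que no se emplea en la pronunciación normativa. [Common in English, the fricative [v] is labiodental, not bilabial. In Spanish, the use of the labiodental is sporadic and not systematic; that is why it is not used in normative pronunciation.]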

"Sporadic" usage that's not systemic should not be used on the module, let alone an automatic conversion for it.
  • As for [ʴ], I strongly disapprove of its usage. That's just completely incorrect IPA, and it should not be following a consonant to begin with, see: Wikipedia's page for it. I have never seen it before in any legitimate source of Spanish phonology. It should just be /tɾ/ [t̪ɾ]. I'm very concerned that es.wikt is using it in their entries.
  • As for the case of #3, what we have currently is correct. Argentine Spanish is known for having a distinction between /ʝ/ (or really /ʒ ~ ʃ/), spelled as <y>, and /j/, spelled as <hi>. This leads to the minimal pair of <yerba> /ˈʒerba ~ ˈʃerba/ vs <hierba> /ˈjerba/. Outside of that region, it's pronounced as /ʝ/ in both cases.
  • Yes, /ks/ and the rest should be used instead.
Overall, Tmagc, I would really brush up on Spanish phonology and proper usage of IPA symbols. AG202 (talk) 07:52, 21 December 2023 (UTC)Reply
@AG202
- I still doubt that [z] is widely used. At least, listening to the phoneme by itself, it sounds like someone trying to imitate a mosquito or something like that. What I often hear is the aspiration of [s] into [h], but I have never identified [z], neither in my city nor in my country nor in other countries.
- No, “thr” is not the same as “tr”. This particular sound has many spellings, as explained [here (CHR section)]. As I said before, it is Mapuche, but some Spanish words were borrowed and the original pronunciation is still in use. I think it's an affricate and could be [t̠ɹ̠̊˔] if you don't like [tʴ]. Here's a sample.
- I don't think there's a difference in the pronunciation of the “hi” of “hierba”. What is true is that outside the Cono Sur, “yerba” is an alternative spelling of “hierba”, so they might mix /j/ with /ʝ/. But I don't think they mix these two phonemes in the rest of the words starting with “hi”; do you have any source that states the opposite? But even if it's true that they do that, why do you replace “hi” with ɟ instead of ʝ? That's the wildcard for ʝ/ʒ/ʃ, so you only run the risk of the ɟ symbol being misinterpreted and confused with ʒ/ʃ. There isn't any sheísmo there. Tmagc (talk) 17:42, 21 December 2023 (UTC)
  • It's very well used. Looking at the forvo pronunciations and going through several videos at Youglish, I hear several (though not all) speakers pronounce the /s/ as [z]. We could also show aspiration to [h] as a dialectal variation, but in terms of the base level phonetic realization, [z] is the most commonly given analysis, regardless of anecdotal experiences.
  • (I don't have access to that doc fyi) Regardless of if the original pronunciation is still in use by some speakers, it should not be included unless a critical mass of Spanish speakers truly make that distinction. It'd be as if we started including [v] in vaso because some Spanish speakers in contact with Catalan or English pronounce it with a [v] to distinguish it from baso. We don't do that, so we're not going to do the other one either.
  • Yes, they have a consistent difference with <hi> vs <y>, see: the forvo pronunciations from Argentines for hielo. Also this excerpt from The Routledge Handbook of Spanish Phonology (2019, page 10):

This is most clear in Argentinian Spanish, in which the palatal consonant under scrutiny has undergone a process of strengthening toward obstruents, either [ʒ] or [ʃ] (e.g. yema [ˈʒema] or [ˈʃema]). However, orthographic (h)i is pronounced [j] or [ʝ] (e.g. yerba [ˈʒeɾβa] ‘yerba mate’ but hierba [ˈjeɾβa] ~ [ˈʝeɾβa] ‘grass’).

It's a systematic change.
AG202 (talk) 19:22, 21 December 2023 (UTC)Reply
  • Yes, now I remember that. But I only remember it from some accents of Spain; I still doubt that the change of [s] to [z] is made by the majority of speakers. Do as you wish, I won't put it in the Spanish module yet.
  • Same as the previous point, do as you wish.
  • Who are “they”? I think you mixed everything up. Let's start again: your quotation does NOT say that Argentines pronounce “hi” differently from the rest. “orthographic hi is pronounced [j] or [ʝ]”, of course, because they are allophones. I agree there's a distinction between the <y> and <hi> sounds in Argentina compared to the rest. But I said something different: there's no clear evidence that “hi” is always pronounced [j] in Argentina and always [ʝ] in the rest of the world. On your Forvo page, I only heard [ʝ] from the speaker from Mexico; all the rest pronounced [j]. Further evidence is needed.
Tmagc (talk) 19:42, 21 December 2023 (UTC)Reply
You're misunderstanding what I said. I did not say that Argentines pronounce <hi> different from the rest. I was saying that they have a distinction between <hi> & <y> that the other dialects do not have, which is why they pronounce hielo with /j/ unlike the stereotypical [ʒ] or [ʃ] that they use in words like yerba. Hence, you have to have a special case for them in IPA representation. AG202 (talk) 19:47, 21 December 2023 (UTC)Reply
@AG202 Ha ha, of course “hielo” is pronounced with the “i” (i.e. [j] or [ʝ]); /ʃe.lo/ would obviously be nonstandard in any region. So, we are in agreement. Then the code of the module is wrong, because it generates [je.lo] for Rioplatense and [ʝe.lo] for the rest of the world, and it's not clear that this distinction even exists. Tmagc (talk) 20:03, 21 December 2023 (UTC)
No, we're not. The reason that Argentine Spanish is separated is because they have that distinction. We can't have /'ʝelo/ (or /'ʃelo~'ʒelo/) represent Argentine Spanish for hielo as that's not how it's pronounced. On the other hand, we can't use /j/ for the rest of Spanish as that's not a phoneme in the rest of the world. Hence that's why we have that check and list both.
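The check being discussed is the word-initial "hi" + vowel rule quoted earlier from the module. A Python approximation, for illustration only, might look like the following. The "#" word-boundary marker and the ɟ placeholder are taken from the quoted Lua line; the vowel set, function name and boolean parameter are assumptions for this sketch and do not reproduce the module's actual definitions.

```python
import re

V = "aeiouáéíóú"  # assumed vowel class for this sketch


def word_initial_hi(text: str, rioplat: bool) -> str:
    """Rewrite word-initial (h)i before a vowel: plain j for Rioplatense,
    the placeholder ɟ (later realized as the palatal consonant) elsewhere."""
    replacement = r"#j\1" if rioplat else r"#ɟ\1"
    return re.sub(rf"#h?i([{V}])", replacement, text)


print(word_initial_hi("#hielo#", rioplat=True))
print(word_initial_hi("#hielo#", rioplat=False))
```

Words spelled with <y>, such as yerba, are untouched by this rule and are handled separately, which is what keeps the two spellings distinct in the Rioplatense output.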
As for the other ones, I'm rather concerned that you're deciding to go against linguistic consensus based on personal & anecdotal experiences, as that's not how a wiki is supposed to work. It's supposed to work on consensus and scholarly resources. But I won't personally go to es.wikt to change anything as I do not edit entries there. AG202 (talk) 20:15, 21 December 2023 (UTC)Reply
@AG202 One more time:
  • First, it's not clear that [z] is a general phenomenon.
  • The words with “thr” are rather few and of Mapuche origin, and most of the people who know them are the ones from Patagonia/Araucanía; they are in contact with Mapuches and will pronounce them with [t̠ɹ̠̊˔] rather than [t̪ɾ].
  • There's a blunder either in your interpretation or in your authors'. In fact, the sequence occurred in exactly the reverse order, not as you say. No decree or consensus in Argentina ever stated that “there shall be a distinction between <y> and <hi>”. What happened is simply that the sound for <y> evolved into /ʃ~ʒ/. The so-called “y/hi distinction” is just an obvious consequence of that process. But in the few places in the Argentinian North-West that kept /ʝ/ for <y>, there is no clear distinction from <hi>; they use either /ʝ/ or /j/ for both <y> and <hi>. You should consider the converse implication, which is the only valid one. The idea that the /ʝ/ vs. /j/ distinction is a necessary consequence of a supposed “y/hi distinction maxim” is just a figment. Anyway, I didn't find other sites or books that say explicitly that “<hi> is /j/ in rioplatense, and /ʝ/ in any other place”, and even if some site says that, it will always be questionable, because native experts treat them as allophones.
Of course, all of these points came from consensus within the Spanish community and are not my imagination. FYI, English Wiktionary is not the only Wiktionary that exists. If you're so worried about my rebellion against the Spanish and IPA phonetic conventions, then why don't you come to the Spanish Café page and present your arguments? I rest my case here, regards. Tmagc (talk) 21:48, 21 December 2023 (UTC)
@Tmagc Just to comment here, the pronunciation of [z] seems supported by lots of reliable sources, so I'm not sure why you're questioning it. (It may be different in Spanish speakers with aspiration of /s/; what I have read about this is that they may say [ɦ] or even pronounce the following consonant geminate.) I also think we should not strive to imitate Mapuche phonetics in Spanish. It is similar to French words borrowed into English; people familiar with French may say lingerie with a proper French accent including nasal vowels and uvular r, but IMO we shouldn't list pronunciations like that as the normative ones. As for the <y> vs <hi>, I'm not quite sure what you are arguing; are you saying you think that <hi> should be /ʝ/ not /j/ in Rioplatense Spanish? Not sure about that, but I have heard some speakers elsewhere may also make a difference between <y> and <hi> when following <n>, where the former ends up sounding like an affricate but the latter stays an approximant. Benwing2 (talk) 22:50, 21 December 2023 (UTC)Reply
@Benwing2 What I just said is: “it's not clear that in rioplatense Spanish <hi> is /j/ while <hi> is /ʝ/ in the rest of the world”. You can choose either /ʝ/ or /j/ for “hi”, I don't care about that. But don't put /j/ for rioplatense and /ʝ/ for the rest of the variants, because it is not clear that such a difference in the pronunciation of <hi> even exists.
I don't agree with the sources that say [z] is found in the vast majority of speakers. It's just in speakers from the North of Spain, and the aspiration of [s] into [h] is a much more common phenomenon. But it won't be my business if you do the opposite of what I said before. Same with the “thr” issue. Regards and have a Merry Christmas. Tmagc (talk) 23:10, 21 December 2023 (UTC)Reply
P.S. 12. The difference between /wi/ and /ui/ (or /uj/) is not handled. For example, the “ui” in es:chuico is different from the one in es:cuico (audio sample), but I don't know if there's a pattern or rule. We are still discussing this on the Spanish site. If we reach some conclusion I will tell you, but if you (or someone here) know about this I'd appreciate hearing it. Tmagc (talk) 05:08, 21 December 2023 (UTC)Reply
This is an issue of /ui/ [wi] vs /u.i/ [u.i], and similar syllabification issues with /i/ & /u/, as Spanish orthography is not clear. This leads to words like "cuico" being pronounced either as /'kuiko/ ['kwi.ko] or /ku'iko/ [ku'i.ko]. Another example is with hiato which can be pronounced as /'ʝato/ or /i'ato/ (at this Forvo link, speakers 1 & 3 do the former, while the others do the latter). However, this is something that depends entirely on the speaker and is not a systematic issue that differs with certain words, as the RAE states in DPD - diptongo, section 2. Honestly, I'd recommend listing both pronunciations, giving preference to the diphthong pronunciation since that's reflected in the orthography. AG202 (talk) 08:14, 21 December 2023 (UTC)Reply
@AG202 But I'm not talking about diptongo vs hiato, I'm talking about /wi/ and /ui/ (but not /u.i/); both are diphthongs. Besides, I don't agree with the RAE that it is something completely at the free will of the speaker depending on their region. They are treating “muy” as equivalent to “cuidado”, while the first is always pronounced /mui/ and the second /kwi'da.do/. So, I think listing both pronunciations is a bad idea, but I don't know if it's irregular or there's some pattern. Tmagc (talk) 17:58, 21 December 2023 (UTC)Reply
/w/ is almost universally said to not be a phoneme in Spanish. Note: I am NOT talking about /w̝/ found in words like huaca. It sounds like you're talking about hearing [wi] vs [uj], but outside of the muy case, which I'll address later, they're analyzed, respectively, as /ui/ vs /'u.i/ (or /u.'i/) phonemically. It is an issue that's been of much debate, but that's the scholarly consensus right now. The main articles I've found against that analysis, like The phonemic status of Spanish semivowels (2006), hinge on morphophonemic analyses, which are beyond our scope. I've scoured many many papers and books, but Glides and High Vowels by Ellen M. Kaisse in The Routledge Book of Spanish Phonology summarizes the arguments and confidently states that this issue is one of syllabification. She shows how it varies depending on speaker and on the word depending on where the stress is. For example, on pages 159-160:

Several authors (Roca 1986; Simonet 2005; Chitoran and Hualde 2007) have investigated the factors favoring desyllabification of those vocoids that are typically pronounced with hiatus in isolation or in careful speech. One factor that these authors have discovered is that the closer an unstressed hiatal vowel lies to the main phrasal stress of an utterance, the more likely it is to remain syllabic and the less likely it is to be realized as a glide. Thus, the word /pi.ano/, with a marked nuclear /i./, is frequently realized as [pi.ˈa.no] in the phrase me compré un piano ‘I bought myself a piano’ since the /i./ is right next to the phrasal stress (bolded), which falls on the stressed syllable of the last word of the utterance, in this case, the [a] of [pi.ˈa.no]. We are more likely to find the desyllabified variant [ˈpja.no] when the [i] is far from the end of the sentence and thus far from the phrasal stress, as in me compré un piano de cola carísimo ‘I bought myself a very expensive grand piano.’ Another factor, identified by Aguilar (1999), is the sociolinguistic context. Speakers conversing normally in a map task produced shorter, less syllabic vocoids than they did in a reading task.

I would seriously give it a read if you're able to get access.
  • The issue of muy is an interesting one. Harris, James (1969), Spanish Phonology and major subsequent publications argue that it's /'mui/ with a falling diphthong ['muj], which would make sense to me. It does create an exception, but seeing as there are no minimal pairs (if "mui" [mwi] existed for example), there's no need to create another phoneme /j/. (Kaisse (2020) argues the same.) We may need to create a better exception for it and names that end in <uy> in the module, so that /w/ is not in the phonemic analysis. Side note, @Benwing2, why do we list /j/ & /w/ as separate phonemes? That analysis is very much in a small minority and puts us at odds with other sources. As The Linguistics of Spanish states: "Finally, the semivowels [j] and [w] can be assigned to the phonemes /i/ and /u/, as there are no minimal contrasts in Spanish between the sounds [i] and [j] or between [u] and [w]. Thus the semivowels can be seen as the forms taken by /i/ or /u/ when they occur in a diphthong with another vowel."
Overall, I can come up with whatever analyses I want, but until I write a paper with approval and/or find ample evidence to back me up, I can't just use it willy-nilly. And even then, if it completely goes against consensus (like with Altaicism), it shouldn't be prioritized on websites such as Wiktionary. As I've stated, I would really recommend reading multiple scholarly papers on this topic rather than relying on anecdotal and personal analyses of audios as they can be unreliable and at risk of misanalysis. AG202 (talk) 02:30, 22 December 2023 (UTC)Reply
@AG202 Hmm, does the module have phonemic /w/ and /j/ in it? If so, I can fix that, but then how do we phonemically notate the difference between muy and fui? Saying "there are no minimal pairs so it's not phonemic" seems a cop-out because it requires making arbitrary lexically determined exceptions, which AFAIK isn't normally done except for interjections that can reasonably be argued to be outside the normal phonemic system (cf. English uh-huh (yes) and uh-uh (no), which have nasal vowels and/or mandatory glottal stops). And the argument that posits phonemic syllabic divisions (not at morpheme boundaries) to explain the difference between [u] and [w] seems rather questionable to me as well; essentially it just transfers the phoneme from a glide to a syllabic division, which doesn't increase the parsimony of the system and feels a bit arbitrary (why are there mandatory unpredictable syllable divisions only in this particular situation)? Benwing2 (talk) 03:55, 22 December 2023 (UTC)Reply
@Benwing2: Yes, the module has phonemic /w/ & /j/ in it, see: cuico for an example. Honestly, those are good questions, but I'm not the best person to answer them. Kaisse's article goes into it better than I can try to summarize (if you haven't read it already). The analyses I've seen just posit <muy> as /mui/ [muj] as an exception since it's not really said as [mwi]. So the phonemes are then said to be the same in fui /fui/ as well, but it's realized as [fwi] in that case. It's not the perfect argument, but that's the consensus I've been seeing. One could argue instead that it's /'mu.i/, but I haven't really seen that analysis yet.
Also, with syllabic analyses, we kind of already do something similar with huir, hui, & huido in which the hiatus is shown by placing the stress on the /i/, even though orthographically they're diphthongs. And at construir, we even list both pronunciations already. AG202 (talk) 04:12, 22 December 2023 (UTC)Reply
@AG202 Well, in that case I would rather keep things the way they are, at least for now, because I think it will be very confusing for the non-technical reader to find both /mui/ and /fui/ with clearly non-rhyming pronunciations and distinguished in spelling. Also it's not similar to the case of Altaic because the alternative analyses with /w/ and /j/ can still be found in the literature (e.g. I remember now reading in Harris and Vincent's book The Romance Languages about both analyses). It really feels to me like the people arguing against /w/ and /j/, consensus or not, are trying to stretch an argument beyond its natural limit to avoid having to posit these two phonemes. Just my two cents though. Benwing2 (talk) 04:23, 22 December 2023 (UTC)Reply
@Benwing2: (Altaic is still found in literature very often fyi and argued for) But outside of “muy” and its surrounding debate, we should not put /w/ & /j/ in the phonemic transcription as it goes against the consensus at this time. There are also not any minimal pairs to posit. Additionally, the sources that I’ve seen that do argue for /w/ & /j/ do not argue for separate /w̝/ & /ʝ/ phonemes and instead place them as allophones under the former pair. We now have: /u/, /i/, /w/, /j/, /w̝/ & /ʝ/, which I’ve yet to come across anywhere else. Overall we have two real options:
  • Vocalic & semiconsonantal pairings: /i/ & /u/, /j/ & /w/, with [ʝ] & [w̝] being allophones of the latter pairing
  • Vocalic & consonantal pairing: /i/ & /u/, /ʝ/ & /w̝/, with [j] & [w] being allophones of the former.
I’d strongly suggest going with the latter option as that’s the current consensus, and it gives us more room for issues like the Argentine pronunciation of <hi> discussed earlier. AG202 (talk) 05:22, 22 December 2023 (UTC)Reply
@AG202 That is fine, I am not arguing that /j/ and /w/ are separate from /ʝ/ & /w̝/, which seems unlikely except maybe in Rioplatense. I will fix this soon; there are various issues needing fixing in the module. (BTW what I mean about Altaic is that those who don't believe in it consider it a fringe opinion and a settled matter, despite the fact that there are indeed people who argue in favor of it, e.g. Starostin, whereas this issue in Spanish does not seem settled or arouse the sort of antipathy that the Altaic hypothesis arouses. You only need a cursory reading of Wikipedia to get this idea, e.g. the page on Altaic says "highly controversial" in the infobox.) Benwing2 (talk) 05:33, 22 December 2023 (UTC)Reply
@AG202 Actually are you suggesting that I analyze both muy and fui as having /ui/? I'm not sure I agree with that; I would rather use something like /muʝ/ and /fui/, even though that looks a bit strange. Benwing2 (talk) 05:34, 22 December 2023 (UTC)Reply
@Benwing2 Sorry for the late response, but yes thanks for the future change, it'll be very helpful for accurate representation. In terms of muy, I'd be okay with /muʝ/, though I'd prefer /mui/ with an exception made, but I understand why you'd be opposed to that. I do wish, nonetheless, that other Spanish editors would've replied by now :-// AG202 (talk) 20:00, 29 December 2023 (UTC)Reply
@AG202 @Benwing2 I think you are making things more complex than they actually are. With a few exceptions, “ui” maps to /wi/ and “uy” maps to /uj/. The last one could also be /ui/, but I don't think /uʝ/ is an accurate representation. Be that as it may, you should compare your module with my version of the module, especially lines 454 and 455. You make a big mistake when you replace “y” with “i”, because you lose the difference between the two kinds of diphthongs, “ui” and “uy”. In addition, look at the commented-out lines 512 to 516; that step was nonsense, as it was generating /bs/ and /gs/ instead of /ps/ and /ks/. Thus, es:cuico is correctly transcribed as /kwiko/, and es:chuico should be respelled as “cuyco”. Check the unstressed words array; some others were missing, like “mas” and “so”. May you have a happy new year; make a wish to the Magi. Tmagc (talk) 21:40, 29 December 2023 (UTC)Reply
This will be the last time I'll argue about this, but the scholarly consensus is that /w/ & /j/ are not independent phonemes. The folks that do argue that they are, nevertheless, also argue that /w̝/ & /ʝ/ are allophones of the above. I'm very aware of the pronunciations of [wi] & [uj], but again, there's a difference between phonemic & phonetic representations. So again, we have two options:
  • Vocalic & semiconsonantal pairings: /i/ & /u/, /j/ & /w/, with [ʝ] & [w̝] being allophones of the latter pairing
  • Vocalic & consonantal pairing: /i/ & /u/, /ʝ/ & /w̝/, with [j] & [w] being allophones of the former.
The latter is the consensus and the most logical representation. Having all 6 phonemes isn't an option. I'd again recommend that you do the proper reading for this and become more familiar with the proper distinctions between phonemic & phonetic representations in Spanish linguistics. Nonetheless, if you choose to do what you want in es:wikt, go ahead, but for here and the standards of this Wiktionary project, we aim to follow the scholarly consensus, unless meaningful and proven alternatives have been shown, which they haven't yet been in this case. Happy New Year to you as well. AG202 (talk) 22:21, 29 December 2023 (UTC)Reply
@AG202 Look, I don't understand anything about that sort of thing; it's too complex and impractical, and I don't have enough time for reading. But what I'm really sure about is that the spellings "ui" and "uy" represent those two different pronunciations very well, with a few exceptions like *chuico*. If you understand that, then we've made some progress. Use the phonemes you think will be most convenient; I don't think digging too deep into the nerdish distinctions is worth it. For me, /uʝ/ is inaccurate. But do what you want. Happy new year. Tmagc (talk) 22:38, 29 December 2023 (UTC)Reply
P.S. 13, I wouldn't call “lleísmo” the opposite of “yeísmo”. As it says here, “lleísmo” means that you pronounce both <y> and <ll> as /ʎ/. I think what you did is keep the distinction between <y> and <ll> in what you called “lleísmo”, which is different. I would call it “distinción”, but then you get a collision with the other name for the s/c/z distinction. So maybe it would be better to rename both the s/c/z and y/ll distinctions as “no seseante” and “no yeísta”; that way you make it clear that they are different from “ceceante” or “lleísta”. Tmagc (talk) 23:46, 21 December 2023 (UTC)Reply
@Tmagc Interesting, maybe I can call it "y-ll distinction" or something. Benwing2 (talk) 23:49, 21 December 2023 (UTC)Reply

et al.

Hope you are well. In this diff, I got the result "Lili Yuan et al." when I typed "Lili Yuan; et al." in the 'author' parameter of quote-journal. I believe the correct result should be "Lili Yuan, et al." (that is, with a comma). Let me know if you have any thoughts on this. --Geographyinitiative (talk) 18:44, 3 December 2023 (UTC)Reply

@Geographyinitiative User:Sgconlaw Do you have any thoughts about this? I don't have a strong opinion; my choice to not include the comma before et al. in this circumstance is based on its literal meaning of "and others". I don't know what the MLA or APA says about it. Benwing2 (talk) 18:47, 3 December 2023 (UTC)Reply
Wow wow wow-- I didn't realize that this kind of formatting could be intentional. But now I see that the journal article itself uses this formatting, saying "Yuan et al" at the top of each page. You all do what you think is correct-- I was merely going off a barbaric understanding that there ought to be a comma between two different things. --Geographyinitiative (talk) 18:55, 3 December 2023 (UTC)Reply
I’ve seen et al. used both with and without a comma. I’ve been using it without a comma but enclosed in brackets, thus: “[et al.]”. I think it’s something I picked up from legal referencing. — Sgconlaw (talk) 18:59, 3 December 2023 (UTC)Reply
We still have {{,}} for the "Oxford comma". I'm sure we could do some fancy regex (beyond my paygrade) to identify and replace all possibly offensive commas with {{.}} and insert them where they are omitted. Admittedly it might be a long run for a short slide. DCDuring (talk) 13:14, 4 December 2023 (UTC)Reply
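The two renderings discussed in this thread can be sketched as follows. This is only an illustration in Python, not the actual quote-journal code: the function name, the `comma_before_et_al` flag, and the choice to join ordinary names with commas are all assumptions made for the example.

```python
def format_authors(raw, comma_before_et_al=False):
    """Join a semicolon-separated author list (hypothetical helper).

    A trailing "et al." is attached with a space by default, or with a
    comma when comma_before_et_al is True.
    """
    names = [part.strip() for part in raw.split(";") if part.strip()]
    if names and names[-1].lower() == "et al.":
        head = ", ".join(names[:-1])
        if not head:
            return "et al."
        separator = ", " if comma_before_et_al else " "
        return head + separator + "et al."
    # Assumption: ordinary names are simply joined with commas.
    return ", ".join(names)
```

With the default, `format_authors("Lili Yuan; et al.")` gives the comma-less form reported above; passing `comma_before_et_al=True` gives the form with a comma.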

Phixing Photius

Hello Benwing. I need some help with Module:Quotations/grc/data and I hope you won't mind that I come to you looking for it.

I am trying to get Module:Quotations/grc/data to generate correct links to Photii Lexicon, given logical inputs to {{Q}}. For example, I would like {{Q|grc|Phot.||1|255}} (where grc is the ISO code for Ancient Greek, Phot. is the bibliographical abbreviation for Photius, the null argument denotes his Lexicon, 1 denotes the first volume thereof, and 255 denotes the page of that volume) to generate:

a. A.D. 893, Photius, Lexicon 1.255

At the moment, however, my choice is between a link like n265 (correct link but incorrect display) and a link like 1.255 (correct display but incorrect link).

I've tried fixing this problem by adding the details for converting from the work's physical pagination to the Internet Archive's digital pagination for the work to Module:Quotations/grc/data, but I can't work out how to make if–then functions work in the module. Would you be able to effect what I could not? Here are the conversions in this table:

physical pagination | {{Q}} input (v.p) | URL fragment
volume 1, pages 1–25 | 1.1–1.25 | 1–25
volume 1, pages 26–299 | 1.26–1.299 | n36–n309
volume 1, pages 300–303 | 1.300–1.303 | 300–303
volume 1, pages 304–339 | 1.304–1.339 | 904–939
volume 1, pages 340–397 | 1.340–1.397 | n350–n407
volume 1, page 398 | 1.398 | 998
volume 1, pages 399–448 | 1.399–1.448 | n409–n458
volume 1, pages 449–458 | 1.449–1.458 | 449–458
volume 2, pages 1–25 | 2.1–2.25 | n475–n497
volume 2, pages 26–299 | 2.26–2.299 | 26–299
volume 2, pages 300–329 | 2.300–2.329 | 800–829
volume 2, pages 330–339 | 2.330–2.339 | n802–n811
volume 2, pages 340–397 | 2.340–2.397 | 940–997
volume 2, pages 398–448 | 2.398–2.448 | 398–448
volume 2, pages 449–454 | 2.449–2.454 | n921–n927

The URL fragment goes where PAGENUMBER is in this URL: https://archive.org/details/photiipatriarch00nabegoog/page/PAGENUMBER/mode/1up
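One way to express the conversion is a range table that is scanned for the matching row, which avoids a long chain of if–then branches. The sketch below is in Python purely for illustration (the real module is Lua, and how Module:Quotations/grc/data would consume such a table is not shown here). The names `RANGES`, `photius_fragment` and `photius_url` are hypothetical. It assumes each range maps by a constant offset, which matches both endpoints of every volume 1 row; volume 2 is left out because a single fixed offset does not reproduce the listed endpoints of every volume 2 row (for example 1–25 against n475–n497).

```python
# (volume, first_page, last_page, prefix, offset): volume 1 rows only.
# Assumption: within a range, fragment = prefix + str(page + offset).
RANGES = [
    (1, 1, 25, "", 0),
    (1, 26, 299, "n", 10),
    (1, 300, 303, "", 0),
    (1, 304, 339, "", 600),
    (1, 340, 397, "n", 10),
    (1, 398, 398, "", 600),
    (1, 399, 448, "n", 10),
    (1, 449, 458, "", 0),
]

URL_TEMPLATE = (
    "https://archive.org/details/photiipatriarch00nabegoog"
    "/page/{fragment}/mode/1up"
)


def photius_fragment(volume, page):
    """Return the URL fragment for a physical volume and page."""
    for vol, first, last, prefix, offset in RANGES:
        if vol == volume and first <= page <= last:
            return prefix + str(page + offset)
    raise ValueError("no range covers volume %d, page %d" % (volume, page))


def photius_url(volume, page):
    """Build the full link from the fragment."""
    return URL_TEMPLATE.format(fragment=photius_fragment(volume, page))
```

Under these assumptions, volume 1, page 255 resolves to the fragment n265, the link mentioned earlier, while the display text can still be produced separately as 1.255.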

What do you think? Can you help me with this, please? Thanks in advance for whatever you can do.

0DF (talk) 00:37, 4 December 2023 (UTC)Reply

@0DF Module:Quotations is rather complicated and I haven't had a chance to look at it at all. I'll take a look but I can't promise it will happen right away. Benwing2 (talk) 00:41, 4 December 2023 (UTC)Reply
Yes, that's fine. Thank you. Of course, if you'd rather I ask someone else, I can do that, too. (If you have someone to suggest, that would be great.) 0DF (talk) 13:30, 4 December 2023 (UTC)Reply

Module:accel/la

Hi Benwing. I just created the submodule Module:accel/la to help with Latin non-lemma entry generation. At the moment, it just adds {{la-IPA}}, but that at least works well. I would like it, in the case of nouns, to copy the lemma's gender (and, for singularia and pluralia tantum, its number) from the lemma's headword template to the transclusions of {{head}} in its non-lemma forms' entries. That sounds simple, but I suspect it is more complicated in practice. As I wrote in this edit summary, I think that will require (inter alia) the addition of .accel.num to Module:la-nominal and of origin_number = {list = true, allow_holes = true}, to Module:accel. Rather than boldly mess around with high-use templates like those, however, I thought I'd check with you first, given that you created the former. Could you give me some guidance on this, please? Thanks a bunch. 0DF (talk) 03:01, 8 December 2023 (UTC)Reply

@0DF There is no .accel.num in the accelerator code. In general you should include that info in the gender, hence use e.g. m-p for masculine pluralia tantum and f-s for feminine singularia tantum. A couple of comments, though: (1) there is a school of thought that we shouldn't include the gender in the headword of non-lemma forms because it's redundant to the gender specified in the lemma (User:Rua was of this belief, for example, and would actually write bot scripts to remove genders specified in non-lemma headwords); (2) aside from this, I'm not sure of the wisdom of including singularia tantum markings in particular in the non-lemma form, since it gets complicated when there are multiple meanings and some are countable and some not (and becomes problematic if a noun is defined as singulare tantum and then a new non-singular-tantum definition is added, because then the singulare tantum indications in the non-lemma forms are no longer accurate but don't get automatically updated). Benwing2 (talk) 03:12, 8 December 2023 (UTC)Reply
Understood re .accel.num; I just figured it would be easy and more reliable to take the information about a lemma's grammatical number from its declension table.
Re point 1: Isn't everything (except pronunciation and such) in the entry for a non-lemma form redundant to the information given in the entry for its lemma? By the same logic, shouldn't all such non-lemma entries be given a single basic definition (namely "declined form of lemma" or "conjugated form of lemma", as applicable), rather than any number of detailed grammatical glosses? The information given in a non-lemma entry is useful only insofar as it gives a user enough information to understand his/her definiendum's grammatical role in the sentence without recourse to its lemma's entry; in the case of Latin nouns and adjectives, the salient pieces of information are number, case, and gender. The rationale for removing this information seems poor to me.
Re point 2: The rationale for number-marking is similar to the above. For example, seeing f sg or f pl in an entry for a noun form ending in -ae for a Latin noun ending in -a tells the user his/her definiendum's number, that it is feminine, and that it could only be either in the genitive, dative, or locative case (for f sg) or in the nominative or vocative case (for f pl), rather than potentially masculine and in any of those five cases. But the number would (or at least should) only be marked in those non-lemma entries where the corresponding lemma has a declension table that is singular- or plural-only. How often do declension tables get changed from sg- or pl-only tables to both-number tables? How many of those changes need to be made because of user error? And would it really be that much effort to make the necessary minor changes to (at most) five non-lemma entries per table-change in order to propagate that change?
0DF (talk) 03:56, 8 December 2023 (UTC)Reply
@0DF There is a similar issue with animacy in Russian, which we've taken to putting in the headword of the non-lemma forms, and it turns out that there are quite a lot of animacy changes that have happened in Russian nouns as existing nouns get expanded and senses that are forgotten get included, leading to e.g. animacy changing from animate-only to both animate/inanimate, which in turn needs to be propagated to all the inflections. I think the issue would be even worse in Latin because a lot of supposedly singularia tantum can actually be used in some senses in the plural; certainly, the majority of English terms you'd think would be singularia tantum are in fact "countable and uncountable" depending on the sense. For abstract nouns, for example, it might even be difficult to determine if they are truly singularia tantum; it depends on whether they have any concrete senses, and if they do, whether the plural happens to be attested. Furthermore I imagine a lot of Latin nouns that are in fact singularia tantum aren't properly indicated as such currently; this is certainly the case with Russian, for example, where most dictionaries don't even indicate whether a given abstract noun is a singulare tantum. So in general I would be opposed to including this info in the non-lemma forms. As for point #1, the issue is that determining the gender is fairly trivial (just click through to the lemma), but determining the pronunciation and relevant inflections is much less so. Benwing2 (talk) 23:24, 11 December 2023 (UTC)Reply
@Benwing2 I'd be more open to this if the information could be scraped from the lemma entry, but that would be tricky to generalise. Theknightwho (talk) 16:39, 12 December 2023 (UTC)Reply

A string library for wikitext objects

So the wikitext parser is now at the stage where it could feasibly be used: it outputs a wikitext object (in the form of a node tree) which can be traversed in different ways, depending on what you want to do. So far, the only two options are to iterate over the display characters (e.g. with the object representing "ab[[cd]] ef" you'd iterate over "abcd ef"), or the raw wikitext itself (e.g. "ab[[cd]] ef"), but there's plenty of scope to do more. Wikitext objects (and their specialised counterparts like wikilinks, external links etc) can be nested inside each other to any degree (though obviously not every combination actually occurs), but this complexity is abstracted from the user by creating a proxy object for a given loop: reading/writing to the proxy object will read/write to the relevant part of the node tree, without the user needing to know/care about what object is nested inside what; using the i returned on each iteration makes it trivial to read/write the current character, the previous one, the next one etc, though obviously the value of i doesn't actually represent a real key - it's just a way of telling the proxy object what you want to read/write. From the user's perspective, tree traversal is as simple as iterating over a list, with the iteration function providing the context in which that traversal takes place.
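To make the two traversals concrete, here is a toy sketch in Python. It is not the actual parser (which is a Lua module): the `Wikilink` class and both generator functions are hypothetical, only plain text and simple wikilinks are modelled, and the read/write proxy object is left out.

```python
from dataclasses import dataclass


@dataclass
class Wikilink:
    """Toy node: a [[...]] link whose children form its display content."""
    children: list


def display_chars(nodes):
    """Yield only the characters a reader would see."""
    for node in nodes:
        if isinstance(node, str):
            yield from node
        else:
            yield from display_chars(node.children)


def raw_chars(nodes):
    """Yield the raw wikitext, including the link brackets."""
    for node in nodes:
        if isinstance(node, str):
            yield from node
        else:
            yield from "[["
            yield from raw_chars(node.children)
            yield from "]]"


# The example from the text: "ab[[cd]] ef"
EXAMPLE_TREE = ["ab", Wikilink(["cd"]), " ef"]
```

Joining `display_chars(EXAMPLE_TREE)` gives the display string and joining `raw_chars(EXAMPLE_TREE)` gives back the wikitext; in both cases the caller just loops over characters without knowing how the nodes are nested.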

From my experimentation, this design is both fast and low memory (so long as you don't try anything dumb like parsing the whole of the entry a into one object). However, the one big drawback is that it's not possible to use the string or ustring libraries since you're dealing with arrays; that means we need to write our own string library. My main question is: do we want an implementation that's faithful to the current ustring library, or should we try something more ambitious? In theory, there's nothing stopping us doing a full regex implementation; that's probably too much work, but there are definitely some features that would come in handy.

The other issue is that we need a way to handle certain character sets like %w without tanking memory usage, since it consists of thousands of characters. Assuming everything goes to plan, we should be saving enough resources in other areas to give us a buffer, but it's something we'll need to tread carefully with.

Do let me know your thoughts. Theknightwho (talk) 17:42, 17 December 2023 (UTC)Reply

@Theknightwho Hmm. Still skeptical that going character by character is fast enough but I agree we do need a string engine. Maybe just start with the existing Lua patterns since people are familiar with them; might not be too hard to find a pure-Lua implementation of Lua patterns and just port it (OTOH there are certainly pure-Lua implementations of a full regex library and it might be the same amount of work to port that). For character classes like %w, just use the existing support in patterns, I think. Benwing2 (talk) 19:19, 17 December 2023 (UTC)Reply
@Benwing2 It'll be impossible to match speed if we use the existing support for %w since it'll be O(n^2). Let's see how it works in practice. Theknightwho (talk) 19:38, 17 December 2023 (UTC)Reply
@Theknightwho Can you explain why it's O(n^2)? Benwing2 (talk) 22:14, 17 December 2023 (UTC)Reply
@Benwing2 It isn't - I was overthinking it: it's linear, but callbacks into PHP are so slow that the actual effect is like comparing linear time with constant time. For example, there is no measurable difference between mw.ustring.match("a", "%w") and mw.ustring.match("abcdefghij", "%w+"), since both run just under 2 million times in 8 seconds (measuring with the sandbox). The fastest way to use pre-existing support is to use mw.ustring.match, meaning the best possible result for new_match("abcdefghij", "%w+") would be 10 times slower, since mw.ustring.match would be run 10 times per call. This is before we take into account loopbacks etc. Theknightwho (talk) 23:03, 17 December 2023 (UTC)Reply
@Theknightwho Well, I wouldn't sweat that too much until it actually becomes a speed issue. Benwing2 (talk) 23:38, 17 December 2023 (UTC)Reply
@Benwing2 Realistically, the only way we're going to know is once we start testing, but gsub, match and find are collectively contributing about 1 second to a at the moment. 10 times slower and we're timing out, and that's only with a 10 character string with a best-case match. Theknightwho (talk) 23:48, 17 December 2023 (UTC)Reply
@Theknightwho OK sure, but premature optimization will kill you. Not everything on a will pass through the parser, much less involve %w. Benwing2 (talk) 00:06, 18 December 2023 (UTC)Reply
@Benwing2 True, and memoisation should help to an extent. Theknightwho (talk) 00:16, 18 December 2023 (UTC)Reply
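As a rough illustration of the memoisation idea, the sketch below caches the result of an expensive per-character class check so that each distinct character pays the cost only once. It is written in Python rather than Lua, `slow_is_word_char` is merely a stand-in for a costly callback (Python's `str.isalnum` only approximates %w), and all names are hypothetical.

```python
from functools import lru_cache

# Counts how often the slow classifier actually runs.
CALL_COUNT = {"slow": 0}


def slow_is_word_char(ch):
    """Stand-in for an expensive per-character check."""
    CALL_COUNT["slow"] += 1
    return ch.isalnum()


@lru_cache(maxsize=None)
def is_word_char(ch):
    """Memoised wrapper: each distinct character is classified once."""
    return slow_is_word_char(ch)


def match_word_prefix(chars):
    """Return the longest prefix made up only of word characters."""
    matched = []
    for ch in chars:
        if not is_word_char(ch):
            break
        matched.append(ch)
    return "".join(matched)
```

The cache only helps when characters repeat, and it grows with the number of distinct characters seen, so it trades some memory for fewer slow calls.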

Faroese Conjugation Table Mishap

How could you describe this bug of the Faroese conjugation tables? -- Apisite (talk) 04:04, 19 December 2023 (UTC)Reply

@Apisite Please bring this up in the Grease pit. I'm not sure anything can be done about it. Benwing2 (talk) 04:06, 19 December 2023 (UTC)Reply

Korean hanja are not lemmata

Hi,

I'd like to undo changes done by WingerBot for Korean hanja, so that they don't get into topical categories as in diff. Anatoli T. (обсудить/вклад) 23:24, 20 December 2023 (UTC)Reply

@Atitarev Totally fine with me. I wasn't sure whether to include hanja forms and chu nom Vietnamese forms in topic categories. I think, sometimes I did, sometimes I didn't. So feel free to remove those changes. Benwing2 (talk) 23:45, 20 December 2023 (UTC)Reply
BTW I may be able to make a list of all the forms that got added in this fashion. Benwing2 (talk) 23:46, 20 December 2023 (UTC)Reply
Thanks for that. To be honest, I sometimes don't know how to label and define Korean and Vietnamese character entries, so I just copy e.g. "(historical)" [[TERM]] without any wikilinks when a Japanese or Chinese equivalent might have {{lb|ja|historical}}. Anatoli T. (обсудить/вклад) 23:56, 20 December 2023 (UTC)Reply
@Atitarev Here is a list of hanja terms that use {{tcl}}:
Here is a similar list for 'han tu' forms that use {{tcl}}:
Here is a list of Vietnamese terms that use {{tcl}} and are labeled 'dated', 'rare', 'historical', 'obsolete' or 'archaic':
Benwing2 (talk) 01:34, 21 December 2023 (UTC)Reply
The last list is OK, it's only hanja/Han tu that need to be reverted. Anatoli T. (обсудить/вклад) 03:00, 21 December 2023 (UTC)Reply

pt-IPA

Hi Benwing, it's me again. I'd like to ask if you intend to continue expanding the Portuguese pronunciation module; I feel it lacks some very useful features, such as automatic syllabic separation and rhymes too. Stríðsdrengur (talk)

You forgot to add your signature :P. Rodrigo5260 (talk) 17:11, 22 December 2023 (UTC)Reply
@Rodrigo5260 @Stríðsdrengur Those are indeed good features. First though, User:MedK1 has a whole list of fixes to the pronunciation that I need to look into. Benwing2 (talk) 22:41, 22 December 2023 (UTC)Reply
Ok thanks for the answer :) Stríðsdrengur (talk) 22:49, 22 December 2023 (UTC)Reply

WingerBot adds Catalan before English

See Special:Diff/77365789. Per Wiktionary:Entry layout § Language, Translingual should be first, then English, and other languages in alphabetical order. J3133 (talk) 22:13, 22 December 2023 (UTC)Reply

@J3133 Fuck, you're right, I forgot about this when ordering the sections. Benwing2 (talk) 22:39, 22 December 2023 (UTC)Reply
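The ordering rule cited above (Translingual first, then English, then everything else alphabetically) can be sketched as a sort key. This is a Python illustration only, not WingerBot's code; the function names are hypothetical, and plain string comparison is assumed for the alphabetical part, which may differ from the collation actually used for names with diacritics.

```python
def language_sort_key(name):
    """Rank Translingual first, English second, the rest alphabetically."""
    if name == "Translingual":
        return (0, "")
    if name == "English":
        return (1, "")
    return (2, name)


def order_language_sections(names):
    """Return language section names in entry-layout order."""
    return sorted(names, key=language_sort_key)
```

For the case in the diff, a Catalan section would sort after English under this key even though "Catalan" precedes "English" alphabetically.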

May I borrow a conjugation table?

Hello, I would like to take your Spanish conjugation module and modify it to use for Armenian. Do you mind if I do so? (Also, if I do so, what would the authorship tag look like?) RagingPichu (talk) 21:04, 26 December 2023 (UTC)

@RagingPichu Sure, go ahead. For an authorship comment, you might say something like "Written by RagingPichu, modified from Module:es-verb by Benwing" or something like that. Benwing2 (talk) 21:40, 26 December 2023 (UTC)
Perfect, thanks! RagingPichu (talk) 21:48, 26 December 2023 (UTC)

ipairs

Hi - minor thing, but it could help with really hot loops: I've done some time tests comparing for i, v in ipairs(t) do end with for i = 1, #t do local v = t[i] end using the Scribunto Lua binary, and the upshot is that ipairs was about 2-2.5x slower than the numeric for loop irrespective of table length (I tested 0, 1 and multiples of 10 up to 100m). Other than being slightly more compact, I don't see any upside to using it, either.

From what I've read, with short loops the issue is the extra time taken to handle the ipairs function, and with long loops it's slowed down by calling #t every iteration, instead of only once at the beginning. Theknightwho (talk) 13:27, 29 December 2023 (UTC)

@Theknightwho Good to know. I'm not actually sure that ipairs() is slowing us down much; it depends on what percentage of typical inner loops are spent in the loop end-condition checking, which I assume isn't much even with ipairs(). Also, for most of the stuff I work on, I generally don't have big arrays of stuff to iterate through. But for your parser it could definitely make a difference if you're implementing it character-by-character with lots of loops. Benwing2 (talk) 09:25, 30 December 2023 (UTC)
@Benwing2 Yeah, I noticed ipairs was using ~400ms on , which I think is the page that calls the template parser the most due to the massive number of terms in the derived terms section which all use it to scrape the Chinese transliterations. Changing it over did reduce the loading time by a small amount, since ipairs was being called a few million times. I wish there was a straightforward way to install Scribunto in a sandbox so I could do some in-depth profiling of our most heavily used modules, but that would involve setting up a local MediaWiki installation for it to interface with, which isn't straightforward. Theknightwho (talk) 18:20, 30 December 2023 (UTC)
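
For anyone wanting to repeat the comparison discussed in this thread, a rough micro-benchmark might look like the sketch below. It is not the test harness that was actually used: make_array, sum_ipairs, sum_numeric and time_it are hypothetical names, the array size and repetition count are arbitrary, and it is assumed to be run under a standalone Lua 5.1 interpreter rather than inside MediaWiki. No timings are shown because they vary by machine and Lua build.

```lua
-- Hypothetical helper: build an array holding the integers 1..n.
local function make_array(n)
    local t = {}
    for i = 1, n do
        t[i] = i
    end
    return t
end

-- Variant 1: iterate with ipairs.
local function sum_ipairs(t)
    local total = 0
    for _, v in ipairs(t) do
        total = total + v
    end
    return total
end

-- Variant 2: numeric for loop with explicit indexing.
-- The loop limit #t is evaluated once, before the first iteration.
local function sum_numeric(t)
    local total = 0
    for i = 1, #t do
        local v = t[i]
        total = total + v
    end
    return total
end

-- Time `reps` calls of f(t) using CPU time from os.clock().
local function time_it(f, t, reps)
    local start = os.clock()
    for _ = 1, reps do
        f(t)
    end
    return os.clock() - start
end

local t = make_array(1000)
local reps = 2000
print(string.format("ipairs:  %.4f s", time_it(sum_ipairs, t, reps)))
print(string.format("numeric: %.4f s", time_it(sum_numeric, t, reps)))
```

Both variants do the same work per element, so any difference in the printed times should come from the iteration mechanism itself. Varying the array size and reps shows whether the gap depends on table length.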

Hats off

Thank you for your immense dedication to the project! Had a quick look at the Bulgarian declension templates, admirably thorough and versatile. Catonif (talk) 13:48, 31 December 2023 (UTC)

@Catonif Thank you and Happy New Year! Those were among the first sets of inflection templates I created, so they are somewhat idiosyncratic in implementation. Benwing2 (talk) 19:12, 31 December 2023 (UTC)