Wiktionary:Grease pit

Welcome to the Grease pit!

This is an area to complement the Beer parlour and Tea room. It is specifically for discussing the future development of the English Wiktionary, both as a dictionary and thesaurus and as a website.

The Grease pit is a place to discuss technical issues such as templates, Lua modules, CSS, JavaScript, the MediaWiki software, extensions to it, Toolforge, etc. It is also the second-best place, after the Beer parlour, to think in non-technical ways about how to make the best, free, open online dictionary of “all words in all languages”.

Others have understood this page to explain the “how” of things, while the Beer parlour addresses the “why”.

Permanent notice

  • Tips and tricks about customization or personalization of CSS and JS files are listed at WT:CUSTOM.
  • Other tips and tricks are at WT:TAT.
  • Find information and helpful links about modules, Lua in general, and the Scribunto extension at WT:LUA.
  • Everyone is encouraged to expand both pages, or to come up with more such stuff. Other known pages with “tips-n-tricks” are to be listed here as well.

Grease pit archives
2024 · 2023
Earlier years: 2022 · 2021 · 2020 · 2019 · 2018 · 2017 · 2016 · 2015 · 2014 · 2013 · 2012 · 2011 · 2010 · 2009 · 2008 · 2007 · 2006

March 2024

Japanese kanji appear as orange links

Hello, adjacent to my other post about accelerated editing: the orangelink gadget seems to be acting up when linking to some Japanese kanji. I noticed the problem maybe a month or two ago but am only now reporting it; see e.g. 倒す, 不倒, 押掛ける, 怖気づく, 圧し殺す for some examples. I don't know why it affects some kanji and not others; I assume that if the gadget works by detecting headings, something has recently interfered with the way the page text is parsed. If I'm reading the gadget source correctly, the content is considered absent if the entry doesn't belong to a category like "Japanese lemmas" or "Japanese non-lemma forms", but no recognition appears to be made if "Japanese" is followed by "kanji" (despite "logograms" and "Han tu" being checked for); could this be the culprit? Kiril kovachev (talk · contribs) 20:13, 1 March 2024 (UTC)

@Kiril kovachev When you say it's "acting up somehow" can you clarify what the issue is? I looked at those examples but I'm not sure what the desired behavior is vs. what you're actually seeing. Benwing2 (talk) 02:04, 2 March 2024 (UTC)
I believe the problem was that the kanji link in {{ja-kanjitab}} in 倒す#Japanese was orange (or green in my case because of my CSS) because the kanji page doesn't have Category:Japanese lemmas but does have Category:Japanese Han characters, and the gadget doesn't recognize the latter as a lemma-like category. Adding "Han characters" to the regex for lemma-like categories seems to have fixed the problem. — Eru·tuon 03:25, 2 March 2024 (UTC)
@Benwing2 Sorry, I didn't make it clear what I meant, but as Erutuon pointed out, the kanji link in that kanjitab should not be orange; it should just be blue, because the linked page does have a Japanese entry. Up until now those kanji were displayed as orange inside the respective kanjitabs despite definitely existing on the linked pages. @Erutuon, thanks a lot for making the change. Looks good now. Kiril kovachev (talk · contribs) 15:21, 2 March 2024 (UTC)

Access to Raw Transliteration

For some languages, the transliterate method fails because of a workaround to the problem that some transliteration modules fail when the text to be transliterated includes mark-up. A formal statement of the problem with the current solution is given in the link above. One solution to the failure of the method is to bypass the workarounds in the method and access the transliteration modules directly. We have been discussing the issue for Sanskrit at Module talk:sa-translit#Getting_Text, where the context for interpreting accentuation marks is usually larger than the scope of mark-up, such as pairs of triple ASCII apostrophes for mandatory emboldening of words.

Should we have a generic template, analogous to {{xlit}}, and a generic Lua method or function to bypass the workarounds, or should we use ad hoc language-specific templates (e.g. sa-tr) to do the bypassing? I believe the major use of such templates would be to generate 'manual' transliteration strings for quotation templates. --RichardW57 (talk) 21:23, 1 March 2024 (UTC)

@RichardW57 IMO neither approach you're suggesting is good. The issue you're running into has been a point of contention between me and User:Theknightwho; the decision he made to chop up transliteration into parts and not pass formatting such as apostrophes through to the translit method seems to be causing problems in several languages. AFAIK this is only needed for certain languages with complex transliteration methods so I would recommend we switch it to be opt-in on a per-language basis, and pass the unmodified source text by default. User:Theknightwho do you have any objections to this approach and can you let me know which languages should opt into the chop-up functionality? Benwing2 (talk) 02:00, 2 March 2024 (UTC)
@Benwing2 It would potentially take a lot of work to determine that, and the current method is not something I want to keep around much longer, as obviously this is a major shortcoming. For now, I'd really prefer that we simply add languages to the opt-out list as necessary. Theknightwho (talk) 02:03, 2 March 2024 (UTC)
@Theknightwho Is there an opt-out list? I remember our discussion for Thai concluding that there wasn't a simple way to opt out entirely. Also when you say "not something I want to keep around much longer" are you planning on reworking the code? Benwing2 (talk) 02:05, 2 March 2024 (UTC)
@Benwing2 For what Richard needs, the opt-out list in Module:languages/data should be sufficient. In terms of replacing it, the wikitext parser should make that possible, since it knows how to work around formatting without needing to split up the string into chunks. Theknightwho (talk) 02:09, 2 March 2024 (UTC)
@Theknightwho: Is the opt-out list the table contiguous_substitution? --RichardW57 (talk) 02:45, 2 March 2024 (UTC)
@RichardW57 Yes. Theknightwho (talk) 02:48, 2 March 2024 (UTC)
@RichardW57 @Theknightwho Then I suggest adding Sanskrit to the list, so I can check if the Vedic accents work fine (and perhaps check for other bugs). Exarchus (talk) 12:04, 2 March 2024 (UTC)
@Exarchus I have added this. I doubt it will work for your purposes, though, as the opt-out is (IMO) badly designed in that it passes munged versions of the formatting characters rather than the formatting characters themselves. Let me know if you have issues; if so I'll create a second opt-out that does the right thing and opts out entirely of all processing. Benwing2 (talk) 12:19, 2 March 2024 (UTC)
@Benwing2 It doesn't work perfectly now, although there is a difference. Exarchus (talk) 12:40, 2 March 2024 (UTC)
If I can somehow know what characters these munged versions consist of, that might work too. Exarchus (talk) 12:47, 2 March 2024 (UTC)
If I rewrite the code somewhat (by including whatever characters that aren't Devanagari vowels etc.), then it seems to work as intended.

So a workaround can be found with the current opt-out; the code would just be a bit weird. Exarchus (talk) 12:55, 2 March 2024 (UTC)

One thing that doesn't work is using <br\> (without backslash) to detect the start of a new prosodical unit. So people should use danda । there (which is normal practice). Exarchus (talk) 14:03, 2 March 2024 (UTC)
I think the module is working fine now. I don't think there's currently a use case for adding different accentuation schemes than the Rigvedic one (I could try adding the Samaveda one, or the Atharvaveda symbol for independent svarita, U+1CE1). Exarchus (talk) 16:53, 2 March 2024 (UTC)
If there's no danda in the source, the only way to add it is as an explicit emendation of the source, which is supported using |norm= with {{quote-book}}. Or are fraudulent quotations acceptable now? --RichardW57 (talk) 23:13, 2 March 2024 (UTC)

This is currently in CAT:E with the message 'The language or etymology language name "Hindustani languages" is not valid.' Looking at entries in this category, I can find at least one where {{translit|en|inc-hnd}} has been in place since October, but this category was only created today. That leads me to think that something has changed in the modules recently. The code "inc-hnd" must have already been in existence since October, or its absence would have resulted in module errors. I can only conclude that one or more of the following has changed:

  1. The behavior of {{translit}}
  2. The settings for the code "inc-hnd"
  3. The behavior of one or more of the modules called by {{auto cat}}

It seems reasonable to me to have a category for transliterations where the exact language within a group isn't specified, so how do we fix this? @Theknightwho. Chuck Entz (talk) 21:49, 2 March 2024 (UTC)

@Chuck Entz I can't find anything that would explain why this category would've suddenly appeared recently, so it may be that no-one got round to creating it until now. I've certainly refrained from creating categories in the past if I notice the preview throws an error.
I agree that this should be allowed, though, so it's worth updating the category tree to permit families for this type of category.
Theknightwho (talk) 22:11, 2 March 2024 (UTC)
@ChuckEntz: Unless we are publishing lies again, Hindustani is not a *group* of languages, but another name for Hindi and Urdu. Again, always explaining a language designation by its Wikipedia entry is a bad idea, especially for Indian languages. --RichardW57 (talk) 23:27, 2 March 2024 (UTC)
@RichardW57: There's reality, and then there's the way the modules work. This is the Grease pit, so I was talking about the latter. We have Hindi and Urdu as separate language codes, each with their own infrastructure. Combining them into one language code would cause massive disruption and require a huge amount of work. You would have to ask at the Beer parlour about whether there's a consensus to make that change. I'm just trying to fix something that's broken. Chuck Entz (talk) 00:00, 3 March 2024 (UTC)

5,705 errors

What is going on here? Whatever caused the error has been fixed but not before completely trashing CAT:E. This happened yesterday, too, just not as extreme. Whoever is editing core modules here needs to be more careful. Benwing2 (talk) 02:10, 3 March 2024 (UTC)

@Benwing2 It's really odd - the number keeps climbing, but I can't find any pages which are actually throwing the error. Theknightwho (talk) 02:41, 3 March 2024 (UTC)
@Theknightwho: that's not uncommon. More often than not, I never see more than a page or two still displaying the error. I believe that's because the category updates are a separate process from the page updates. The display goes back to normal fairly quickly, while the propagation of the category updates can take as much as a week. When I'm around, I do my best to clear everything using the API Sandbox link. In this case I had it going in two tabs for what seemed like an hour, starting with well over 8,000 down to the current 3. I'm sure I wasn't the only one working on it, though. I expect there to be a few every few minutes or so for an hour or more. Chuck Entz (talk) 04:19, 3 March 2024 (UTC)
@Chuck Entz Yeah, I also spent about 30 minutes trying to clear it before I gave up and went to bed. Thanks for sorting it. Theknightwho (talk) 10:10, 3 March 2024 (UTC)

Gothic script

The Wikipedia link on the page Category:Gothic script is to w:Gothic script which is a disambiguation page. It should link to w:Gothic alphabet instead. 212.179.254.67 08:14, 4 March 2024 (UTC)

Fixed. Theknightwho (talk) 20:27, 5 March 2024 (UTC)

der2

On the page cut off, the der2 template is used to make a show more / show less list of derived terms. But "show more" reveals only one line, so it doesn't save any space; you may as well just show the one hidden line (and save the user a click). 212.179.254.67 12:08, 4 March 2024 (UTC)

FYI: Major romanization change coming in Japan

May impact modules, etc. if this is a change we want to adopt as well: https://languagelog.ldc.upenn.edu/nll/?p=62827

Tangentially related: Wiktionary:Grease_pit/2024/February#Japanese_accel_module Justin (koavf)TCM 19:08, 4 March 2024 (UTC)

@Koavf The change in question is a switch from Kunrei to Hepburn romanization. It looks like we already use Hepburn romanization, e.g. the page 松下 is transliterated Matsushita not Matusita. BTW there's some category breakage on that page; User:Theknightwho it looks like something to do with sort key generation, do you have any idea what's going on? Benwing2 (talk) 23:23, 5 March 2024 (UTC)
@Benwing2 I think it's related to this diff @Erutuon. Theknightwho (talk) 23:39, 5 March 2024 (UTC)
Sorry about that! I should have checked a few more cases in {{tracking category}}. Fixed, I think. — Eru·tuon 23:49, 5 March 2024 (UTC)
@Erutuon Thanks. What does this template do? Can you add a bit of documentation? Benwing2 (talk) 23:51, 5 March 2024 (UTC)
@Benwing2: Okay, done. — Eru·tuon 00:02, 6 March 2024 (UTC)
Nice. Thanks. I'm glad it at least surfaced this issue that got swiftly fixed. —Justin (koavf)TCM 00:27, 6 March 2024 (UTC)

Time range with time ranges

This is about {{quote-book}}. If, for example, a work of literature was started somewhere between 1900 and 1905 (something to indicate using |startyear=), and finished somewhere between 1915 and 1920 (something to indicate using |year=), considering date ranges use an en dash (–), I would think to simply type:
{{quote-book|[LANGUAGE]|startyear=1900–1905|year=1915–1920|[...]}}
which would produce:
1900–19051915–1920 []
I'm wondering: is this too confusing, or is it a good enough way of rendering it? —— GianWiki (talk) 16:16, 5 March 2024 (UTC)

@GianWiki Definitely a very edgy edge case. Does this ever actually happen? I think the display form 1900–19051915–1920 is not going to be interpreted correctly. Maybe we could add some code to change the display if there are en dashes or em dashes in the |startyear= or |year= parameters but I think it's probably better to just put the appropriate explanatory text in the |year= param, something like |year=from '''1900–1905''' to '''1915–1920''' (exact years unknown). The code should not boldface the year if there's already boldface in the param value. Benwing2 (talk) 23:18, 5 March 2024 (UTC)
@GianWiki why not just put c. 1900–1920? Ioaxxere (talk) 22:20, 6 March 2024 (UTC)
@GianWiki, Benwing2, Ioaxxere: I think this kind of case is usually presented as 1900/05–1915/20 or, more rarely, 1900–1905—1915–1920. 0DF (talk) 23:03, 21 April 2024 (UTC)
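
For illustration only, the display logic floated in this thread could be sketched as below. This is not the actual {{quote-book}} module code: the helper names (is_range, bold, format_years) are hypothetical, and the "from ... to ..." wording is simply the phrasing suggested above.

```lua
-- Hypothetical helpers; not the real quotation module.
local EN_DASH = "\226\128\147" -- U+2013 as UTF-8 bytes
local EM_DASH = "\226\128\148" -- U+2014 as UTF-8 bytes

-- True if the value is itself a range (contains an en dash or em dash).
local function is_range(value)
    return value:find(EN_DASH, 1, true) ~= nil
        or value:find(EM_DASH, 1, true) ~= nil
end

-- Wrap in bold wikitext unless the value already contains bold markup.
local function bold(value)
    if value:find("'''", 1, true) then
        return value
    end
    return "'''" .. value .. "'''"
end

-- startyear may be nil; year is assumed to be present.
local function format_years(startyear, year)
    if not startyear then
        return bold(year)
    end
    if is_range(startyear) or is_range(year) then
        -- Spell the outer range out so the inner dashes stay unambiguous.
        return "from " .. bold(startyear) .. " to " .. bold(year)
    end
    return bold(startyear .. EN_DASH .. year)
end

print(format_years("1900", "1920"))
print(format_years("1900" .. EN_DASH .. "1905", "1915" .. EN_DASH .. "1920"))
```

The plain-text find calls (fourth argument true) avoid treating the dash bytes as pattern characters; a real implementation would also have to decide how to treat "c." prefixes and slash forms such as 1900/05.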

Fix Module:place and Module:place/data for bugs

The function unpack is wrongly used in Module:place and Module:place/data.

Example code:

local prev_qualifier, this_qualifier, bare_placetype = unpack(split)

In cases where split is a table where index 1 and 2 are nil (i.e. a sparse table, e.g. { [3] = 'continent' }), this will not work as expected (all 3 variables will be nil). The code should be corrected to:

local prev_qualifier, this_qualifier, bare_placetype = unpack(split, 1, 3)

I do not have the rights to fix it myself, but this should be fixed. Dodecaplex (talk) 19:17, 5 March 2024 (UTC)

@Dodecaplex You are right about that; it's unfortunate the unpack function was implemented in a broken fashion. Whether this correction needs to happen depends on whether the values can ever be nil; I'll need to take a look at the code in question. Benwing2 (talk) 23:13, 5 March 2024 (UTC)
For instance, on the page Mexico, the template is called for a continent holonym (e.g. in the first definition), which has no qualifier. So, yes, it is nil in many places. As I extract all pages using an alternate Lua environment, I got at least 151849 errors of this kind. Dodecaplex (talk) 08:22, 6 March 2024 (UTC)
I think it will be better to specify the start and end indices. What happens when we don't specify the start and end indices, and split[1] or split[2] is nil, is that unpack sets the end index to basically #split, and the length operator (ultimate implementation found here) gives oddball results based on undocumented implementation details of tables. unpack and the length operator are only designed to work properly for sequence tables, because they don't traverse all keys in the table to find the actual maximum integer key. So the only way to ensure that this_qualifier and bare_placetype are always set to the values of split[2] and split[3] is to set the start and end indices ourselves. — Eru·tuon 22:16, 6 March 2024 (UTC)
I've gone ahead and added the start and end indices to unpack in Module:place and Module:place/data. — Eru·tuon 22:37, 7 March 2024 (UTC)
Thanks! Dodecaplex (talk) 17:39, 8 March 2024 (UTC)
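
As a minimal standalone sketch of the fix discussed in this thread (not the actual Module:place code), the snippet below unpacks the sparse table from the example with explicit start and end indices. The first line is only a compatibility shim, added here on the assumption that the snippet may be run outside Scribunto on a newer Lua.

```lua
-- Lua 5.1 (as used by Scribunto) has a global unpack;
-- Lua 5.2 and later provide table.unpack instead.
local unpack = table.unpack or unpack

-- A sparse table like the one in the example: only index 3 is set.
local split = { [3] = "continent" }

-- With explicit indices, exactly split[1], split[2] and split[3] are
-- returned, regardless of what the length operator reports for the table.
local prev_qualifier, this_qualifier, bare_placetype = unpack(split, 1, 3)

print(prev_qualifier, this_qualifier, bare_placetype)
```

Here the first two variables are nil and bare_placetype is "continent". The two-argument-free form unpack(split) is deliberately not shown with an expected result, since for non-sequence tables its outcome depends on the implementation details described above.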

Disappearing text

My talk history page on Βικιλεξικό here shows edits which do not appear on the talk page itself here. There is obviously an explanation — I hope that it isn't me!!   — Saltmarsh🢃 19:25, 5 March 2024 (UTC)

@Saltmarsh, the user was blocked and the edit at your talk page was reverted. It was a text by Shāntián Tàiláng, who is blocked (at en.wikt, and now also at el.wikt), asking these questions: Request for English Wiktionary. Hello, I have noticed that όρισμα (modern Greek) may be derived from modern Greek ορισμός (from ancient Greek ὁρισμός). I do know that English orismology needs an etymology section added; that section should state that it derives from ancient Greek ὁρισμός and {{suffix|en||logy}}.
Also, Category:grc:Woodworking should be created, because πρίσμα needs that same category added to it.
Incidentally, tenpenny nail really needs "w:" placed just before "The Old Curiosity Shop" in its first quotation. Shāntián Tàiláng (talk) 20:18, 27 February 2024 (UTC)
1) After that, he was blocked by me (2024.02.28, @el.wikt#Block) for continuing to annoy admins with questions.
2) After that, he repeated the text as an IP, and another admin reverted it and changed its visibility. HistoryOfYourTalk
3) He is trying again to apply for unblocking at en.wikt, and asked me at meta how he could apply. ‑‑Sarri.greek  I 20:05, 5 March 2024 (UTC)
Dear @Saltmarsh, tell me if you wish me to unblock. ‑‑Sarri.greek  I 20:12, 5 March 2024 (UTC)
@Sarri.greek I strongly advise not unblocking ST on el.wikt. Pinging @Surjection, who is most familiar with them. Theknightwho (talk) 20:23, 5 March 2024 (UTC)
ST should not be unblocked under any circumstance. — SURJECTION / T / C / L / 20:53, 5 March 2024 (UTC)

Deleting and moving of public sandbox submodules

User:Theknightwho and User:Benwing2 have been getting rid of /sandbox submodules by moving them to Module:User:Erutuon/ and deleting them. I'm uneasy about this idea, but haven't cared enough to complain before today, when my module sandbox subpages are filling up with various sandbox modules I've created in the past. (At least they're more discoverable in my module sandbox subpages than when they're deleted.)

I recall some sort of discussion (Wiktionary:Grease pit/2022/July#Sandboxes in CAT:E I guess) a while ago, but I'm not aware of a vote that says that these public sandbox modules are banned.

I think it's counterproductive to remove the sandbox modules. Ideally we'd have a whole set of sandbox modules and lots of testcases in the main modules and sandbox modules, so casual users could just test a change in the sandbox and see what happens, without causing thousands of module errors. IP users don't have a place (Module:User:IPAddress/) to put sandbox modules in, and casual users who notice an error are also probably not going to know or bother to copy over modules to Module:User:whatever/modulename and test changes. User sandbox modules are hard to find and it's tedious to ask if User:whatever will mind you editing them, if you do find them. So I think it's good to have "public" sandbox modules.

Granted also that sandbox modules are not very useful when they are not in sync with the main module, which is very likely to be the case when production modules are being edited often. And we don't have very extensive testcases for main modules, much less sandbox modules, so it's currently hard for editors of sandbox modules to see what their edit actually does. It takes valuable time to add testcases for new changes. So I don't know how realistic my reasoning actually is.

To solve Wiktionary:Grease pit/2022/July#Sandboxes in CAT:E, I've expanded Template:tracking category so that it identifies all the types of sandbox modules listed in Template talk:tracking category#Identifying sandbox modules and templates. I did a bunch of regex on the list of titles in the dump to figure out all the formats of titles of sandbox modules, and then I ran some JavaScript code to make sure that the new version of the template identifies all the sandbox modules I listed. Now MediaWiki:Scribunto-common-error-category should put all the usual types of sandbox modules out of sight in Category:Pages with module errors/hidden rather than in CAT:E. — Eru·tuon 21:58, 5 March 2024 (UTC)

@Erutuon Hi. I moved some of them yesterday. The ones I moved were almost exclusively years old, almost exclusively yours, and usually not worked on by anyone else. My logic is that sandbox modules should not be cluttering the mainspace. In practice I have never found a need to use mainspace sandbox modules and I definitely believe that such modules should be in userspace. Mainspace sandbox modules by their nature don't support more than one person working on them at a time and there's no mechanism provided for multiple people to synchronize their edits to a given sandbox module. In general, all sorts of problems can potentially arise with mainspace sandbox modules. In addition they get out of date quickly since production modules do get edited fairly often. In practice, anyone working on sandbox modules has to copy over the latest production modules anyway, so I don't see how there's any benefit to having the sandbox modules in mainspace vs. in your own userspace. I understand that theoretically they could help IP users but I'm not sure how commonly this ever actually happens. Also, given the reality that testcases take effort to maintain that most people don't want to spend, I think it's unlikely we'll ever have a reasonable sandbox testcase infrastructure. That said, I won't move any more modules for the time being but I do hope you'll consider switching to userspace sandbox modules. Benwing2 (talk) 22:28, 5 March 2024 (UTC)
Just to chime in to say the same thing: the ones I deleted were in all cases hopelessly out of date, and none had been edited within the last year; many hadn't been edited since before 2020. Theknightwho (talk) 23:49, 5 March 2024 (UTC)

change to module categorization

FYI I made a change to Module:documentation so that modules are categorized even when documentation is present, as long as there is no <includeonly> section present on the page. I also made the module categorization smarter. Benwing2 (talk) 02:17, 6 March 2024 (UTC)

CJK Compatibility Ideographs in ranges for Hani script

I haven't run the bot that converts between {{t}} and {{t+}} since December, because when I tried, I ran into a problem: the entry-name rules for Korean (ko) contain a pattern whose Perl analogue is invalid, causing my code to blow up with Invalid [] range "豈-舘" in regex.

I took some time to investigate this yesterday, and I believe I now understand how to fix it (so no real action is required), but I figured (some) people might be interested in what I found, because it involves some MediaWiki tech stuff that we don't usually think about but does have user-facing effects.

Some background:

So anyway, the issue turns out to be with the character range from U+F900 to U+FA6D, which ends up as a character range in a Lua pattern in the Korean entry-name rules [link · link].

The problem is that U+F900 and U+FA6D are CJK Compatibility Ideographs, and MediaWiki applies Unicode Normalization Form C (NFC) to inputs and outputs, so by the time my bot sees the range, it's become the range from U+8C48 to U+8218, which Perl rejects because the greatest character in the range would be less than the least character. And that's actually kind of good luck; the range immediately below it, from U+FA70 to U+FAD9, gets normalized to the range from U+4E26 to U+9F8E, which includes a whole bunch of characters that it's not intended to, but is valid so far as Perl can tell, so I would never have noticed it.

For purposes of the translation-bot, I plan to fix this by just changing its server-side component to escape non-ASCII characters in some way, and the bot proper to de-escape them. That should completely circumvent MediaWiki's application of NFC.
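
As a rough sketch of the escaping idea (not the bot's actual Perl or server-side code), the snippet below rewrites every non-ASCII character as a \u{XXXX} escape so that only ASCII crosses the wire and NFC has nothing to change. The \u{...} format and the function names decode_utf8 and escape_non_ascii are assumptions made here for illustration; the input is assumed to be valid UTF-8.

```lua
-- Decode one multi-byte UTF-8 sequence (2 to 4 bytes) into a code point.
local function decode_utf8(seq)
    local lead = seq:byte(1)
    local cp
    if lead >= 240 then
        cp = lead - 240 -- 4-byte sequence
    elseif lead >= 224 then
        cp = lead - 224 -- 3-byte sequence
    else
        cp = lead - 192 -- 2-byte sequence
    end
    for i = 2, #seq do
        cp = cp * 64 + (seq:byte(i) - 128)
    end
    return cp
end

-- Replace each non-ASCII character with an ASCII-only \u{XXXX} escape.
local function escape_non_ascii(s)
    -- A lead byte followed by its continuation bytes.
    return (s:gsub("[\194-\244][\128-\191]*", function(seq)
        return string.format("\\u{%X}", decode_utf8(seq))
    end))
end

-- The range from the Korean entry-name rules, written as raw UTF-8 bytes
-- for U+F900 and U+FA6D so that this source file itself stays ASCII.
print(escape_non_ascii("[\239\164\128-\239\169\173]"))
```

The client would apply the inverse transformation after fetching the data. Inside a Scribunto module one would more likely use the mw.ustring library than hand-rolled decoding; plain string functions are used here only so the sketch runs standalone.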

More broadly, it may be worth asking if we really want ranges of characters that MediaWiki literally won't even let be saved; I can see arguments either way. Feel free to discuss. :-) (FYI @Theknightwho.)

RuakhTALK 08:27, 7 March 2024 (UTC)

@Ruakh Thanks for doing the investigation! I know about the conversion to NFC form but I didn't suspect it would affect CJK chars in this fashion. The current code is probably OK since it doesn't store the characters literally but rather as numbers, and constructs the ranges on the fly (hence they never get saved and converted to NFC form). Whether the ranges are OK depends on whether there are any characters in the middle of the range that aren't canonicalized out of existence during the NFC conversion, and that I don't know. User:Theknightwho will hopefully comment on this. Benwing2 (talk) 22:23, 7 March 2024 (UTC)
@Benwing2 I did this for a couple of reasons:
  1. I wanted to cover any edge-cases which involved these compatibility ideographs, since I didn't know if they were used anywhere (e.g. in the Unicode modules).
  2. There are actually 12 CJK characters in the "compatibility ideographs" range which aren't actually compatibility ideographs, and don't get normalised to other characters in NFC (which I assume got added to that range by mistake many years ago, or have since been disunified for some reason): 﨎, 﨏, 﨑, 﨓, 﨔, 﨟, 﨡, 﨣, 﨤, 﨧, 﨨, 﨩. They don't form a continuous range, so it was slightly more efficient to simply include the whole block.
Theknightwho (talk) 22:32, 7 March 2024 (UTC)
@Erutuon Just letting you know about this approach of Ruakh's, since we were discussing something similar a while ago (which I have not had time to revisit). This, that and the other (talk) 11:47, 8 March 2024 (UTC)
I should say that if I had to do the bot over from scratch, given the current state of Wiktionary and given what I know now, I probably would not implement it this way. I think a better approach would involve some degree of asking the server-side to do transformations, plus aggressive client-side caching (storing previously-computed transformations in timestamped files and reusing them for an extended period, e.g. six months), a bunch of client-side special-casing for high-volume cases (e.g. "if the language code is [foo] and the translation matches [simple pattern] then compute the entry-name by [simple function] and don't bother querying the server"), and various other such optimizations. In fact, even though I have all the code/etc. for my current approach, I'm still considering migrating to an approach like that at some point.
So if you're planning on writing something from scratch, I think that's what I'd recommend. (But if you're comfortable with Perl, and would rather just reuse my code than write something from scratch, let me know and I can try to get it into a shareable state.)
RuakhTALK 10:01, 9 March 2024 (UTC)

T:km-xi got worse

At ថៃ (thay), T:km-xi is linking words to #English, rather than Khmer.

E.g. ព្រះរាជាណាចក្រថៃ  ―  prĕəh riəciənaacak thay  ―  Kingdom of Thailand

Not asking for any improvement, just fixes. Anatoli T. (обсудить/вклад) 05:21, 8 March 2024 (UTC)

I suspect a general problem. The same symptoms are showing with {{th-x}}, which invokes {{#invoke:th|usex}}, and earlier this week I found the same problem with plain double square brackets that link to translingual words rather than English words in glosses. Experimentation suggests that it only shows up on lines formatted by '#', so afflicting quotations and glosses. --RichardW57m (talk) 10:07, 8 March 2024 (UTC)
It occurred to me that Module:th should be corrected to specify that the linked-to words are Thai, because Thai entries are usually the last on their page, and got as far as changing Line 223 of Module:th from
table.insert(exSet, "[[" .. thaiWord .. "]]")
to
table.insert(exSet, "[[" .. thaiWord .. "#Thai|"..thaiWord.."]]")
, but then I realised that that wouldn't handle normal numbers or even idiomatic ones - Thai 555 bears no relation to Translingual 555, so I abandoned the edit. Test cases were อมฤต (à-má-rít) and โควิด-19 (with '14' as a normal number). More thought is needed on that one. (Notifying Alifshinobi, Octahedron80, YURi, Judexvivorum, หมวดซาโต้, Atitarev, GinGlaep, RichardW57, Noktonissian.) --RichardW57m (talk) 12:15, 8 March 2024 (UTC)
The cause of this is a JavaScript change mentioned in Wiktionary:Beer parlour/2024/February § Use of T:lang. I can fix this, but I think the templates should link to the correct language section. If ASCII numbers shouldn't be linked to the Thai section, that would be easy to fix in the module with if thaiWord:match("^%d+$") .... Granted I suppose that won't be the only thing that you don't want linked to the Thai section. Generally bare links to no section should be avoided when you know the probably correct language section to link to (which is wrong in the case of 14 here). {{th-x}} probably needs some way to link to 14#Translingual, and to disable linking. [[14#Translingual|1{สิบ} 4{สี่}]] doesn't work.
However, I've prevented the link-changing code in MediaWiki:Common.js from running within lang="..." text other than lang="en(-...)". — Eru·tuon 16:06, 8 March 2024 (UTC)
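
A minimal sketch of the digit check mentioned above, combined with the #Thai link form from the abandoned edit. This is not the actual Module:th code: the function name thai_usex_link is hypothetical, and leaving pure ASCII digit sequences as bare links is just one possible policy.

```lua
-- Hypothetical helper; not the real Module:th implementation.
local function thai_usex_link(thaiWord)
    if thaiWord:match("^%d+$") then
        -- ASCII digits only (e.g. "14"): keep a bare link rather than
        -- forcing the Thai section.
        return "[[" .. thaiWord .. "]]"
    end
    -- Everything else is pointed at the Thai section of the page.
    return "[[" .. thaiWord .. "#Thai|" .. thaiWord .. "]]"
end

print(thai_usex_link("14"))
print(thai_usex_link("โควิด-19"))
```

Note that โควิด-19 is not all digits, so it still gets the #Thai target, while an idiomatic number such as 555 would be left bare under this rule; handling that case would need a parameter or an explicit exception list.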
@Erutuon Thank you, that change seems to have removed most of the problems. However, I'm still confused how taxonomic names as definitions should be linked to. @DCDuring. It seems that {{l|mul}} isn't the recommended way.
The Thai ASCII numbers are now acting tolerably again, though they obviously can't all be translingual - we hit a limit at 101, though I would expect that one to have specific semantics in Thailand as a place name. I'm fairly happy with treating them mostly as semantically digit sequences, though I think there may be lurking chauvinistic problems, and possibly trouble with line-breaking. Roman script acronyms (CD, DVD, VDO and OT come to mind, though the last one may be overseas Thai and it's a word I've heard, rather than seen) and taxonomic names may cause problems for {{th-usex}}, though I've mostly seen the latter as definitions in Thai dictionaries. Again, nationalism may have stored up problems. --RichardW57m (talk) 17:46, 8 March 2024 (UTC)
@RichardW57m. I agree. Thai usex templates have the same problem now but the formatting colours don't reveal the problem. With Khmer, I am sure colours were right before but I can't say when exactly this problem occurred.
Pinging @Theknightwho, @Benwing2: Are you able to fix the language in the links? Anatoli T. (обсудить/вклад) 05:41, 12 March 2024 (UTC)
@Atitarev What's the problem you're seeing now? Erutuon's fix of 8 March seems to have removed the problem you were talking about. The outstanding issue with the {{th-usex}} is that there doesn't seem to be a mechanism to specify the language of the elements in the quotation, which causes at the least a colouring problem with translingual elements if we try tagging the elements for language. (I noted this problem nearly 4 years ago.) --RichardW57m (talk) 09:52, 14 March 2024 (UTC)
@RichardW57m:
I think the displaying colour is now fixed. When I posted, the linked terms showed in orange for the Khmer template. Which edit on which module was it? Can you ping me the {{diff|}}, please?
However, please compare the output by hovering over the word components. Only the last line shows expected links, like [[王國#Chinese]], the first two just show [[រាជាណាចក្រ]] without the language. So, if any of the words were shared by multiple languages, the links wouldn't connect to the correct ones.
  1. ព្រះរាជាណាចក្រថៃ  ―  prĕəh riəciənaacak thay  ―  Kingdom of Thailand
  2. ราชอาณาจักรไทย  ―  râat-chá-aa-naa-jàk tai  ―  Kingdom of Thailand
  3. 王國王国  ―  Tài wángguó  ―  Kingdom of Thailand
Anatoli T. (обсудить/вклад) 23:34, 14 March 2024 (UTC)Reply
@Atitarev: I believe the fixing diff is Special:diff/78355928. The problem, as I said above, is tagging the entries in the quotation correctly - a quotation in Thai is not always composed only of Thai elements. The Chinese quote template seems to make the assumption that all the entries are in the same variety of Chinese; I don't know how well it handles translingual words within the quotation. For Thai and Khmer, the links actually connect to the page, rather than the first entry, which is correct, but not very helpful if the Thai or Khmer entry is not the first entry. (At least Khmer occurs before Pali.) --RichardW57m (talk) 10:17, 15 March 2024 (UTC)Reply
@RichardW57m: Thanks. The Chinese template works like other language templates when the words are wikified (linked), in case you're not familiar, e.g.
В чужо́й монасты́рь со свои́м уста́вом не хо́дят (proverb)V čužój monastýrʹ so svoím ustávom ne xódjatwhen in Rome, do as the Romans do (literally, “You don't go to another monastery with your own charter”)
All the words above link to Russian entries.
You can also unlink foreign words in a Chinese usex:
  1. X什麼意思X什么意思  ―  X shì shénme yìsī?  ―  What does X mean?
As for varieties, of course, it's linking to "Chinese", since the varieties are merged under the "Chinese" L2 header. It defaults to Mandarin transliterations. It works with other varieties too, using parameters, e.g. |C= for Cantonese:
  1. X乜嘢意思 [Cantonese, trad. and simp.]
    X mat1 je5 ji3 si1 aa3? [Jyutping]
    What does X mean?
Delinking should work in the Thai and Khmer usexes as well; the trouble is, nobody seems to be able to make sense of, let alone fix or enhance, these language-specific modules since Wyang left. Anatoli T. (обсудить/вклад) 08:15, 16 March 2024 (UTC)Reply
@Atitarev: Can you make a list of things that are broken and what the correct behavior should be, with an example for each issue? I am going to sleep now but when I get up I will take a look and see about fixing them. Benwing2 (talk) 08:49, 16 March 2024 (UTC)Reply
@Benwing2: Thanks.
If User:Atitarev/Khmer translit test cases and User:Atitarev/Thai translit test cases are still on your watchlist, you can start there. I will start with simple fix requests, since I don't know if you guys still plan to make it work like the Chinese counterparts.
  1. I made a comment about ។ symbol problem (and other punctuation symbols, foreign symbols) on the Khmer page.
  2. Khmer is behind Thai in handling ៗ (repetition symbol). Thai ๆ can at least repeat the last full word.
  3. The Khmer, unlike the Thai template, demands an English translation parameter, it should be optional but can ask for it, like regular templates.
  4. As above, it's desirable to delink certain words with @ without making the output fail.
  5. Delinked foreign words (e.g. English words, numerals) should transliterate as they are, without trying to "transliterate" from Thai/Khmer. E.g. โควิด-19
  6. A harder fix. Please see @Erutuon's example above regarding numerals for re-spelling of numerals. I pinged you on re-spelling numerals but I have to find that topic. It's a harder fix. Remind me if you still have the motivation later.
(You can move/split this discussion, if you wish.) Anatoli T. (обсудить/вклад) 09:20, 16 March 2024 (UTC)Reply
@Benwing2:
Here's the numeral respelling topic: Wiktionary:Grease_pit/2024/January#Transliterating_foreign_language_usage_examples_with_numerals
Chinese templates can respell numerals. Thai or Khmer can't.
หนองคายอยู่ห่างจากกรุงเทพฯ ๖๑๔ กิโลเมตร  ―  nɔ̌ɔng-kaai yùu hàang jàak grung-têep 614 · gì-loo-méet  ―  Nong Khai is 614 kilometers from Bangkok.
Delinking @๖๑๔ doesn't work either. ๖๑๔ (614) doesn't need to be linked in the usex.
๖๑๔ (614) is pronounced (hòk rɔ́ɔi sìp sìi)
respelling "6{หก ร้อย} 1{สิบ} 4{สี่}" doesn't work.
In words: หกร้อยสิบสี่  ―  hòk rɔ́ɔi sìp sìi  ―  six hundred fourteen. Anatoli T. (обсудить/вклад) 10:13, 16 March 2024 (UTC)Reply
@Benwing2, @Atitarev: But
{{th-xi|หนองคาย อยู่ ห่าง จาก กรุงเทพฯ  6{หก-ร้อย} 1{สิบ} 4{สี่}  กิโลเมตร|Nong Khai is 614 kilometers from Bangkok.}}
หนองคายอยู่ห่างจากกรุงเทพฯ 614 กิโลเมตร  ―  nɔ̌ɔng-kaai yùu hàang jàak grung-têep hòk-rɔ́ɔi sìp sìi · gì-loo-méet  ―  Nong Khai is 614 kilometers from Bangkok.
does work.
Apart from irrelevantly fixing the punctuation errors - the number in the parameter should be flanked by double spaces so as to give visible spaces - the trick is not to have a space in the Thai phonetic spelling. Join the components with hyphens. The original example, which is an odd form of Thai, can be achieved by using Thai digits.
Of course, the documentation needs improvement. --RichardW57m (talk) 09:09, 18 March 2024 (UTC)Reply
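Why the hyphen matters can be sketched with a toy parser. This is an assumption about how a whitespace-delimited reading of the parameter would behave, not the actual logic of the Thai module; the names `TOKEN` and `parse_respelled` are hypothetical. Each chunk is read as display text followed by a respelling in braces.

```python
import re

# Hypothetical token shape: display text, then {respelling}, with no
# whitespace or braces inside either part.
TOKEN = re.compile(r"([^{}\s]+)\{([^{}\s]+)\}")


def parse_respelled(text: str) -> list[tuple[str, str]]:
    """Split on whitespace, then read each chunk as display{respelling}.

    Chunks that do not fit the pattern come back with an empty respelling.
    """
    pairs = []
    for chunk in text.split():
        match = TOKEN.fullmatch(chunk)
        if match:
            pairs.append((match.group(1), match.group(2)))
        else:
            pairs.append((chunk, ""))
    return pairs


# Hyphen-joined respelling: three clean (digit, respelling) pairs.
print(parse_respelled("6{หก-ร้อย} 1{สิบ} 4{สี่}"))
# Space inside the braces: the first token is torn into two broken chunks.
print(parse_respelled("6{หก ร้อย} 1{สิบ} 4{สี่}"))
```

Under this reading, a space inside the braces splits one token into two fragments that no longer match, which is consistent with the failure reported for "6{หก ร้อย} 1{สิบ} 4{สี่}".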
@Atitarev: Actually, the second error may matter. If one omits all spaces before the last word, it disappears. That's a problem with lax parsing. --RichardW57m (talk) 09:29, 18 March 2024 (UTC)Reply
@RichardW57m, thanks. I see.
We need to have the ability to use both Arabic and Thai numerals (the example I provided earlier used Thai numerals, even if it's less common, not sure).
They need to be simply displayed, transliterated (if no respelling is provided) or transliterated with respellings - both Thai and Arabic numerals.
Does your example or Thai orthography require any VISIBLE space with numerals?
The example you gave also works with the Thai numerals!:
{{th-xi|หนองคาย อยู่ ห่าง จาก กรุงเทพฯ  '''๖{หก-ร้อย} ๑{สิบ} ๔{สี่}'''  กิโลเมตร|Nong Khai is '''614 '''kilometers from Bangkok.}}
หนองคายอยู่ห่างจากกรุงเทพฯ ๖๑๔ กิโลเมตร  ―  nɔ̌ɔng-kaai yùu hàang jàak grung-têep hòk-rɔ́ɔi sìp sìi · gì-loo-méet  ―  Nong Khai is 614 kilometers from Bangkok.
In my book the text appears exactly with this spacing, including the Bangkok spelling : หนองคายอยู่ห่างจากกรุงเทพ ฯ ๖๑๔ กิโลเมตร
Hope it all makes sense, @Benwing2, at least we know there is a way to work with numerals. Anatoli T. (обсудить/вклад) 23:52, 18 March 2024 (UTC)Reply
Both Thai and Khmer modules need fixes and enhancements but Khmer modules are in a worse state than Thai.
This is failing with an error: {{demo|{{km-xi|វា ជា ភាសា មួយ ដ៏ ចំណាស់ ដែល ប្រហែល ជា មាន ដើម កំណើត តាំង តែ ពី '''២០០០''' ឆ្នាំ មុន មក ម្ល៉េះ '''។'''|It is an ancient language that probably dates back to 2000 years ago.}}}}
{{km-xi|វា ជា ភាសា មួយ ដ៏ ចំណាស់ ដែល ប្រហែល ជា មាន ដើម កំណើត តាំង តែ ពី  '''2000'''  ឆ្នាំ មុន មក ម្ល៉េះ|It is an ancient language that probably dates back to 2000 years ago.}}
វាជាភាសាមួយដ៏ចំណាស់ដែលប្រហែលជាមានដើមកំណើតតាំងតែពី 2000 ឆ្នាំមុនមកម្ល៉េះ  ―  viə ciə phiəsaa muəy dɑɑ cɑmnah dael prɑhael ciə miən daəm kɑmnaət tang tae pii · chnam mun mɔɔk mleh  ―  It is an ancient language that probably dates back to 2000 years ago. Anatoli T. (обсудить/вклад) 00:26, 19 March 2024 (UTC)Reply
@Atitarev: I've seen the statement that numbers need to be separated from words by white space, and turning to a Thai newspaper web site, e.g. https://www.thairath.co.th/home, that's what I see. On the other hand, at least on price tags, the baht symbol (฿‎) tended to be written without any separation from the digits. The space after "๖๑๔" was missing from your statement, but you've shown that your source had it. --RichardW57m (talk) 09:48, 19 March 2024 (UTC)Reply
@RichardW57m: Thanks for pointing out and explaining the common usage. The correct examples would have spaces on both sides (I will correct later).
With Khmer only Arabic numerals work, as you can see in the failure above. The punctuation, especially the important ។ (used to mark sentence ending), fails all the time.
In my test cases in User:Atitarev/Khmer translit test cases I had to remove ។ but the original text is on top.
@Benwing2. Anatoli T. (обсудить/вклад) 04:12, 20 March 2024 (UTC)Reply

──────────────────────────────────────────────────────────────────────────────────────────────────── Thanks Anatoli. I will take a look. I am still planning on fixing the scraping of Thai and Khmer, it's just that it requires some non-trivial changes and I have some other things I'm also working on :) ... but let me see if I can make number handling work better. Benwing2 (talk) 22:00, 16 March 2024 (UTC)Reply

How are we supposed to link to a page rather than an entry in lines formatted with '#'? --RichardW57m (talk) 12:15, 8 March 2024 (UTC)Reply
@RichardW57m: What do you mean? What page? Anatoli T. (обсудить/вклад) 00:27, 19 March 2024 (UTC)Reply
@Benwing2: Hi. Any luck? Let me know if you need any clarifications. Anatoli T. (обсудить/вклад) 07:39, 18 March 2024 (UTC)Reply
@Atitarev Apologies, I was dealing with Chinese stuff today. Heading to bed now but I'll definitely take a look when I wake up. Benwing2 (talk) 07:43, 18 March 2024 (UTC)Reply
@Atitarev: please fix your edit above so it doesn't have a module error. CAT:E is for emergencies. @Benwing2: Any progress? Chuck Entz (talk) 21:43, 22 March 2024 (UTC)Reply
@Chuck Entz: All right. I converted it to use {{tl}}, so it doesn't add to CAT:E. --Anatoli T. (обсудить/вклад) 22:31, 22 March 2024 (UTC)Reply
@Atitarev @Chuck Entz Let me take a look. Benwing2 (talk) 23:20, 22 March 2024 (UTC)Reply
@Benwing2: Hi. It looks like you lost motivation to try and fix this issue. I am almost sure you can add transliterations for Khmer number and critical punctuation symbols without some major efforts. You developed much more complex modules than this. It's OK if you did, just say so, don't promise, if you won't do it. :) Unfortunately, I am clueless there. I have tried but failed miserably.
Also calling @Octahedron80 who's got some interest in Khmer and some knowledge of Lua. Hi. Are you able to check, if it's possible to fix the Khmer transliteration module for numbers and ។ symbol without breaking it? Anatoli T. (обсудить/вклад) 23:54, 25 March 2024 (UTC)Reply
@Atitarev My apologies, I have not lost motivation but I was traveling in Puerto Rico up through yesterday and had difficulty finding a contiguous chunk of time long enough to look into this. I haven't forgotten about this, though. I should be able to look into this in the next couple of days, as soon as I finish the current effort I'm doing cleaning up Chinese lect categories, but like you noted, I won't make any promises because I don't want to end up making a promise I can't follow through on. Benwing2 (talk) 00:02, 26 March 2024 (UTC)Reply

Taxon linking

split from "#T:km-xi got worse"
For taxonomic-name linking there are now two distinct templates: {{taxlink}} (Now with more Lua!!!), which is for taxonomic names for which enwikt DOES NOT have an entry, used as before, eg, {{taxlink|Rosa noentry|species}}, and {{taxfmt}} (New!!!, with Lua!!!), to be used for taxonomic names for which enwikt DOES have an entry, used just as {{taxlink}}, eg, {{taxfmt|Rosa multiflora|species}}. I hope that "we" (@User:JeffDoozan, @User:AutoDooz) will soon (months) have applied {{taxfmt}} automagically to all taxonomic names that currently have some link and eventually (many months) even to all now-unlinked taxonomic names. At present this just addresses formatting (various configurations of italics) and makes searches easier. In the more distant future it may make other changes (improvements???) easier. The formatting should be the same for both templates, but categorization will be different, mostly affecting only me or someone else with an active interest in taxonomic names. DCDuring (talk) 19:10, 8 March 2024 (UTC)Reply
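The choice between the two templates can be sketched as below. This is only an illustration in Python of the rule described above; the helper name `taxon_template` and the `has_entry` flag are hypothetical, and how one actually finds out whether an entry exists is left out.

```python
def taxon_template(name: str, rank: str, has_entry: bool) -> str:
    """Build the wikitext call for a taxonomic name.

    has_entry is assumed to be supplied by the caller: True when the
    wiki already has an entry for the name, False otherwise.
    """
    template = "taxfmt" if has_entry else "taxlink"
    return "{{" + template + "|" + name + "|" + rank + "}}"


print(taxon_template("Rosa multiflora", "species", has_entry=True))
print(taxon_template("Rosa noentry", "species", has_entry=False))
```

Because the two templates take the same arguments, switching an instance from one to the other once an entry is created is just a change of template name.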
@DCDuring: Thank you for the clarification. Are there any plans to document the user interface of {{taxfmt}}? --RichardW57 (talk) 13:54, 9 March 2024 (UTC)Reply
On the input side {{taxfmt}} is identical to {{taxlink}}. I have always accepted that contributors may have trouble determining taxonomic rank (as taxonomists also seem to), especially at generic and suprageneric ranks (eg, homonyms, uncertain and changing placement, changes in nomenclature rules and fashions). The purpose of having two templates is that it be easy to count instances of missing taxonomic names ({{taxlink|Taxon name|rank}}) and that it be easy to rename the instances to {{taxfmt|Taxon name|rank}}. Further, each instance of {{taxfmt}} should not necessarily have to test for existence of an entry at each loading of the page it is on. Finally, categorization needs for taxa in {{taxfmt}} should be more modest than for those in {{taxlink}}. Not all of this is fully settled. DCDuring (talk) 15:30, 9 March 2024 (UTC)Reply
I have added 'temporary' documentation for {{taxfmt}}. DCDuring (talk) 15:42, 9 March 2024 (UTC)Reply
@DCDuring: While an improvement, it implies that {{taxfmt}} should not be used! Is the only difference most editors need to know whether an appropriate multilingual entry exists? --RichardW57 (talk) 21:46, 9 March 2024 (UTC)Reply
Should there be |id= for linking to taxonomic names with homonyms in the {{senseid}} and {{etymid}} systems? That might apply to clades, and will apply to generic names used in different kingdoms, and also for some taxons that have changed greatly, e.g. Hominidae and Reptilia. --RichardW57 (talk) 21:46, 9 March 2024 (UTC)Reply
Quite likely, at least for homonyms from different kingdoms (or, rather, different current taxonomic codes). We now have some 300 taxonomic entries with distinct homonyms, but a good number of them include an archaic or obsolete definition, many being synonyms of current taxa. For now, most readers would get one of the appropriate definitions without the help an id parameter would offer. Trying to follow the twists and turns of taxonomic history in terms of circumscription and placement is not something I have seen any taxonomic database do. They just leave breadcrumbs. Their breadcrumbs are more complete than ours, which is why I believe we need links to multiple other taxonomic databases. When WP articles try to follow twists and turns, it is limited in scope to 'recent' (< or <<20 years) changes and can be quite confusing, often because article contributors don't seem to understand how ambiguous English can be. Wikispecies just lays out 'systems' (with dates and authors) of higher taxa on the same page (See species:Holozoa for a short example of a recent (2002) name.). I always try to update to the latest accepted term, circumscription, and placement to be found in the better current databases, and retain any older taxon in our entry as a synonym.
Our coverage will probably always be limited compared to the comprehensive taxonomic databases. (Would we want to have a million taxonomic entries?) Our value added is in etymology (at least potentially), gender, vernacular names/translations, linkage to multiple taxonomic databases, images, and (potentially) definitions that address relevance (location, economic value, use for food, medicine, etc). I doubt that imprecise linking to definitions is our biggest deficiency, though it should and, I'm sure, will be addressed. DCDuring (talk) 23:01, 9 March 2024 (UTC)Reply
Given the massive instability in taxonomic names, it would be very useful to record older meanings, especially those of or as polyphyletic taxa. There are also dictionaries that have tried to anchor themselves in the sand of taxonomic names. Even now, I'm not sure that usages of 'crustacean' are usually intended to include butterflies, let alone in works from the 1980's. --RichardW57 (talk) 12:02, 10 March 2024 (UTC)Reply
We can give it a try. Century 1911, MW 1913, MW Intl. 2d would be reasonable sources for relatively common, older names. Beyond those, we can leave breadcrumbs. DCDuring (talk) 14:33, 10 March 2024 (UTC)Reply
@DCDuring: I think you are confusing names and meanings. To quote from the equivalent vernacular, when I was a young man, one would not say that a chimpanzee was a hominid, but would say that a mammal-like reptile (such as Dimetrodon) was a reptile. These changes don't reflect a change in knowledge, but a rejection of the notion that we are not fish. (And objectively, a dimetrodon was closer kin to a Jurassic allosaur than to us.) I don't see how 'breadcrumbs' help with such shifts in meaning. --RichardW57m (talk) 09:54, 11 March 2024 (UTC)Reply
Perhaps you would like to take a run at multiple definitions for a taxon so I could see what you mean? It would be interesting to keep track of the degree of acceptance of names and their circumscription and placement by date. DCDuring (talk) 13:52, 11 March 2024 (UTC)Reply
@DCDuring: I've got to do some work on taxonomic examples, but to get an idea before then, you might find it helpful to look at velociraptor. --RichardW57m (talk) 09:21, 18 March 2024 (UTC)Reply
@User:RichardW57 Generally, I don't think the taxonomic part of any etymology of a 'vernacular' word derived from a taxon, like velociraptor, belongs at the vernacular name, rather than at the taxon, eg, Velociraptor. A definition like the second one would seem hard to justify in an English vernacular-name entry, but this may be an exceptional case.
The definition at velociraptor seems encyclopedic. As we have an encyclopedia as a sister project just a link away, there is little justification for encyclopedic material here. Therefore, stylistically, a definition shouldn't need more than one phrase, possibly with a subordinate clause or absolute if there is particularly relevant information. For a taxon or a vernacular name of an organism, such information might be location, use to humans, disease, scientific importance, or other cultural significance (like use in Jurassic Park), etc. DCDuring (talk) 12:26, 18 March 2024 (UTC)Reply
The relevance is that there are two different meanings of velociraptor. The first one, with, as you complain, a rather encyclopaedic definition, is the one that is a popular synonym of Velociraptor, and is the meaning normally found in documentaries. The second one is actually Deinonychus, and is the one found in the context of Jurassic World, and probably toy shops.
In this particular case, I am not confident that the meaning of Velociraptor having Deinonychus as a hyponym actually meets CFI. Perhaps I am setting too high a bar for independence, but I have little confidence of finding two independent usages of the second sense of Velociraptor. This is not typical of evolving meanings of taxonomic names; G.S. Paul's proposed merger of the genera has not been accepted.
I think we should make it clear that 'velociraptor' may actually refer to Deinonychus. Likewise, we should not hide the fact that 'hominid' may be used to exclude Sivapithecus. RichardW57m (talk) 14:23, 18 March 2024 (UTC)Reply
I'm skeptical that there is such a meaning in actual English usage. In any event, defining velociraptor as "a member of the genus Velociraptor" addresses the matter adequately, IMHO. DCDuring (talk) 14:54, 18 March 2024 (UTC)Reply
It addresses the first meaning. It doesn't address the meaning used in association with Jurassic Park. --RichardW57m (talk) 16:14, 18 March 2024 (UTC)Reply
You are so right and I so wrong. I am interested in how you would address the problem of multiple referents (or placements or circumscriptions) of a taxon, especially how they change over time. Taxonomic databases just leave breadcrumbs, of various kinds. DCDuring (talk) 00:28, 19 March 2024 (UTC)Reply
@DCDuring: I've added an archaic meaning for 'Hominidae' as an example of the sort of meaning shift I had in mind. Sometimes there are redefinitions, but I don't know how well they are recorded in the databases, and I don't know that there is one for Hominidae. Strictly, they're not adequate on Wiktionary for well-documented languages, as they're mere mentions, so I'm inclined to treat them like any other shift in meaning.
It looks as though we Wiktionarians need to do some research on the meaning of 'pongid' - w:Ape implies that it once included gibbons.
For interpreting Felis, it looks as though w:Felidae#Classification does the work for extant species of felid. The usage note at Felis is quite helpful.
There may be unexpected problems sorting out carnosaur - clade definitions seem to have been used lately, and membership of a clade can be a difficult question. There may be a lot of hard work for botany - updating a translation as the name of a species of flowering plant can itself be non-trivial. --RichardW57m (talk) 15:32, 19 March 2024 (UTC)Reply
@DCDuring: I've now created two extra senses for Pongidae, with quotations. I've added one of them as a synonym of Ponginae using {{syn}}. I couldn't use {{taxfmt}} for the link as it does not support sense-specific fragments. --RichardW57m (talk) 17:11, 20 March 2024 (UTC)Reply
I have added links to external databases using {{R:PaleoDB}} and {{R:Mammals}} to sow further ambiguity or food for more definitions. DCDuring (talk) 02:08, 21 March 2024 (UTC)Reply
Re: "hard work". I find it hard just to make basic entries for taxa that we are linking to, assuming that the most sought-after definitions are for the currently accepted names. I'm not doing much for fossil species either. DCDuring (talk) 02:13, 21 March 2024 (UTC)Reply

aWa not working

Our archiving gadget, aWa, is broken. It is getting confused by the "[subscribe]" link which is now present on discussions, which makes it try to archive on the wrong page.

I disabled the gadget for now until it can be fixed (I haven't tried to debug the issue yet). Ping @Erutuon who last edited the gadget. This, that and the other (talk) 23:08, 8 March 2024 (UTC)Reply

It's because of the changes to headers. The gadget was interpreting the "[subscribe]" link at the beginning of the header (in the HTML, though it displays as if it's after the header) as the link to the page to archive at. Also, the gadget wasn't going to the next HTML elements after the header correctly because they've added another layer of HTML elements in the header. I haven't fixed the fact that the "[subscribe]" link is interpreted as part of the header, which is a bug apparently tracked in phab:T13555#9592945 and due to be fixed soon. Not sure if the gadget works (because I don't really know where to test it), but give it a try and let me know. — Eru·tuon 01:18, 9 March 2024 (UTC)Reply
@Erutuon it looks like you've fixed it. I just tested it and, although it displayed the [subscribe] text in its UI as part of the header, it didn't actually make a difference to the archival itself. See [1]. This, that and the other (talk) 03:25, 9 March 2024 (UTC)Reply

Attempted to create a legitimate entry for "chmobik", tripping vague anti-spam measures

The specific abuse rule that was tripped was 'various specific spammer habits'. I'm not sure what that means, and the entry I wrote up has nothing I can find wrong with it.

Ishiura (talk) 10:27, 9 March 2024 (UTC)Reply

I'm not sure exactly what it is, but my first instinct is that it's the Reddit/Twitter links. Those aren't considered durably archived sources for quotations either way. — SURJECTION / T / C / L / 10:33, 9 March 2024 (UTC)Reply
OK. I actually modelled the "chmobik" entry on the "mobik" one, which uses pretty extensive Twitter quotations.
Ishiura (talk) 10:36, 9 March 2024 (UTC)Reply

Derived terms tool

As I'm useless with programming, I asked AI to make a tool to quickly add Derived terms. It is stored at User:Denazz/Derived Terms Tool. Is it complete crap, as I suspect? Denazz (talk) 15:58, 9 March 2024 (UTC)Reply

Lol, it looks very incomplete. Equinox 19:17, 9 March 2024 (UTC)Reply

husband's

How should I resolve the red link on husband's stitches? Andy Mabbett (Pigsonthewing); Talk to Andy; Andy's edits 18:50, 9 March 2024 (UTC)Reply

What red link? Vininn126 (talk) 19:21, 9 March 2024 (UTC)Reply
That fixes the immediate problem, but doesn't address the difference between the lemma and the plural entries in the way linking is handled in the headword. Chuck Entz (talk) 19:40, 9 March 2024 (UTC)Reply

Template/deletion/inclusion error

Wiktionary:Beer parlour/2006/October and Wiktionary:Beer parlour/2006/August are showing up in pages for speedy deletion. Equinox 20:37, 9 March 2024 (UTC)Reply

  resolved by deleting Template:zh-hanzi-box. This, that and the other (talk) 02:40, 10 March 2024 (UTC)Reply

removing cruft from Module:labels/data/regional

Heads up, I am planning on moving quite a bit of stuff from Module:labels/data/regional to language-specific modules. There are over 4,000 lines of stuff in this module and 662 entries. Most of the entries are limited to one or two languages, but having them in the lang-independent data means that any use of the labels for any language will add the corresponding category. Hence we get CAT:French Translingual (with 5 current entries), CAT:Austrian English (with one current entry, which does not belong), CAT:Finland English (with one current entry, probably likewise), CAT:French Chinese (with one current entry, debatable), CAT:French Catalan (with one current entry that belongs rather in CAT:Northern Catalan), etc. I wrote a script to find all the existing per-language categories for each label in Module:labels/data/regional, which I am planning on using as a basis to move most entries out. There is a slight disadvantage to doing this in the case of a regional label that corresponds to several languages, in that the aliases and Wikipedia fields will get duplicated. For example, the current entry for France defines French as an alias with a link to the Wikipedia entry for France, and corresponds to six lang-specific categories: CAT:French French, CAT:French Ladino, CAT:French Latin, CAT:French Norman, CAT:French Vietnamese, CAT:French Yiddish. If we care enough about this, one way to minimize duplication is to support a field containing a list of allowed languages; I may do this. (OTOH the Wikipedia links should maybe be customized on a per-language basis. For example, rather than just linking to the Wikipedia entry on France, which is of questionable usefulness here, we could imagine linking to the Western Yiddish article for French Yiddish, the European French article for French French, etc.) Benwing2 (talk) 02:30, 11 March 2024 (UTC)Reply

FYI I have written a function in Module:alternative forms to convert lang-specific labels data modules to {{alt}} data modules. I will eventually be merging the two sets of data modules so that all the info is found in the labels data and the separate dialectal information in Module:CODE:Dialects disappears. For now I have done Maltese and Albanian. @Catonif, Fenakhay Benwing2 (talk) 04:21, 12 March 2024 (UTC)Reply
Nice, it's good to see this is gaining traction. Catonif (talk) 05:06, 12 March 2024 (UTC)Reply
OK, I have written most of the necessary code. Tomorrow I will run some of the code. The plan is as follows:
  1. Allow a list of language restrictions to be added to lang-independent labels, esp. those in Module:labels/data/regional (DONE).
  2. Move most regional labels to lang-specific modules. The current criterion is as follows: A label remains in Module:labels/data/regional if either (a) it concerns more than 3 languages, or (b) it has more than 1 alias and concerns more than 1 language. This means something like Congo with 8 aliases (Democratic Republic of the Congo, Democratic Republic of Congo, DR Congo, Congo-Kinshasa, Republic of the Congo, Republic of Congo, Congo-Brazzaville, Congolese) and 3 languages (yom, fr, avu) remains, as does Nigeria with 6 languages and 1 alias, as does Erzincan with 3 aliases (Yerznka, Erznka, Erzinjan) and 2 languages (tr, hy). OTOH, Lānaʻi with 4 aliases (Lanaʻi, Lanai, Lāna'i, Lana'i) gets moved because it concerns only one language haw (Hawaiian). Overall this moves 591 out of 662 entries out, spreading them over 108 lang-specific modules, of which 34 are new. Sometimes there are clashes between a lang-independent and lang-specific label; in that case the code adds the moved lang-independent version in a Lua comment, for later manual fixing.
  3. Fix up the clashes noted in the previous step; needs to be done manually.
  4. Convert the existing {{alt}} dialectal data modules to label modules (I have a script to do this), and integrate them into existing label modules (I have another script to do this). I wanted to do this step after step (2) because there may be clashes between labels in a lang-specific {{alt}} data module and a lang-independent label data module (specifically Module:labels/data/regional), and I'd like to have as few of those as possible as they need manual handling. The integration/merging of the two modules may introduce clashes when there are conflicting specs; as in step 2, the code generates comments for later manual fixing.
  5. Fix up the clash comments generated in the previous step.
  6. Convert all the {{alt}} dialectal data modules to auto-convert from the corresponding {{lb}} data modules; or better, just use the latter directly in {{alt}}. Benwing2 (talk) 06:46, 13 March 2024 (UTC)Reply
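The criterion in step 2 can be written out as a small predicate. This is a sketch in Python for illustration; the function name `stays_in_regional` is hypothetical and this is not the script that was actually run. The counts in the checks are the ones quoted in the plan.

```python
def stays_in_regional(num_languages: int, num_aliases: int) -> bool:
    """Apply the stated rule for keeping a label lang-independent.

    A label stays if (a) it concerns more than 3 languages, or
    (b) it has more than 1 alias and concerns more than 1 language.
    """
    more_than_three_languages = num_languages > 3
    shared_with_aliases = num_aliases > 1 and num_languages > 1
    return more_than_three_languages or shared_with_aliases


# Figures taken from the examples in the plan above.
examples = {
    "Congo": (3, 8),      # 3 languages, 8 aliases
    "Nigeria": (6, 1),    # 6 languages, 1 alias
    "Erzincan": (2, 3),   # 2 languages, 3 aliases
    "Lānaʻi": (1, 4),     # 1 language, 4 aliases
}
for label, (languages, aliases) in examples.items():
    print(label, stays_in_regional(languages, aliases))
```

Congo and Erzincan stay through clause (b), Nigeria through clause (a), and Lānaʻi is moved because it concerns only one language.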
Great work @Benwing2!. ... French French? haaa ha ha ha. I cannot stop laughing. Why not placenames. France French. Belgium French. USA English. British English (a! an exception, ok say Britain...). I haven't seen them all, but the repetition is ... ‑‑Sarri.greek  I 07:04, 13 March 2024 (UTC)Reply
FYI I have carried out steps 1, 2 and 3. Step 6 won't be necessary because {{alt}} now directly reads the label data modules. Currently working on steps 4 and 5. Benwing2 (talk) 07:08, 6 April 2024 (UTC)Reply
Finished steps 4 and 5. After a bit of time to verify that nothing is amiss, I will delete all the dialectal data modules. Benwing2 (talk) 03:49, 9 April 2024 (UTC)Reply
Thanks. This was a monumental task. Seems to work correctly. —Justin (koavf)TCM 06:21, 9 April 2024 (UTC)Reply
@Koavf Thank you. Yes, it took a lot of coding plus several days of manual effort fixing up merge conflicts. Benwing2 (talk) 06:44, 9 April 2024 (UTC)Reply

Missing category

@Benwing2: Category:Places in Baja California seems to have been overlooked; there is one for Category:Places in Baja California Sur. A bit confusing, I know - anyway it's needed for English and Spanish. DonnanZ (talk) 17:41, 12 March 2024 (UTC)Reply

@Donnanz Baja California Norte is in the place data (the erstwhile state of Baja California was split into two states some time ago). Benwing2 (talk) 20:20, 12 March 2024 (UTC)Reply
@Donnanz NVM, I see that Baja California is the official name. I changed the place data and set up Baja California Norte as an alias. Benwing2 (talk) 20:25, 12 March 2024 (UTC)Reply
@Benwing2: More confusing than I realised. I see it's coming up as a red link for now. Thanks. DonnanZ (talk) 20:37, 12 March 2024 (UTC)Reply

Sicilian vowels

The last time this came up Cato and I found ourselves bogged down in consonant difficulties. No need to let the perfect be the enemy of the good, however.

It is uncontroversial that Sicilian has five, and only five, phonemic monophthongs: /i ɛ a ɔ u/. So let's simply start there. Could we run a bot to identify, and hopefully fix some of, the phonemic transcriptions featuring nonsense like /ɪ i̞ ɨ ɛ̃ ɐ̠ ɐ ʊ Vː/? I can clean up the rest manually. Might as well remove the various full-stops while we're at it, as there is no /./ phoneme in Sicilian. Nicodene (talk) 00:48, 14 March 2024 (UTC)Reply

Keeping the vowel transcriptions to those 5 phonemes sounds good. I don’t find “there is no /./ phoneme” a compelling rationale for omitting syllable divisions in phonemic transcriptions.
Many languages have phonological contrasts that are not normally analyzed as reducible to the presence or absence of a specific phoneme in a sequence; e.g. there is a contrast in Spanish between the pronunciation of ame and amé, which we transcribe (properly, I think) as /ˈame/ vs. /aˈme/. I have never seen it argued, nor would I argue, that /ˈ/ in this transcription is a phoneme, but I don't think these should both be transcribed as /ame/.
There might be other good reasons to omit syllable divisions. E.g. in the case of English, a lot of the time there isn’t even consensus between phoneticians about where syllable divisions fall. In some languages, syllable divisions might be completely predictable just from the sequence of phonemes in a word. (In other languages, this is mostly but not entirely the case, with morphology also affecting syllabification in some circumstances: e.g. in Latin, Catalan, and Spanish, heteromorphemic /bl/ can only be syllabified as heterosyllabic /b.l/ (as in Latin sublātus) but morpheme-initial or internal /bl/ can or must be syllabified as an onset.)--Urszag (talk) 01:27, 14 March 2024 (UTC)
It was a tongue-in-cheek way of saying that syllabification is not phonemic in Sicilian (unlike stress in Spanish).
Perhaps I should state my concern more plainly. At the moment our transcriptions claim syllabification as a phonemic feature of Sicilian. That is an extremely bold claim, and one made accidentally by editors unaware of what phonemes are. It should be removed on those grounds alone. In the event that a groundbreaking paper surfaces to prove that Sicilian syllabification happens to be phonemic after all, then we will apply its findings carefully, and systematically, to our transcriptions. The chances that our current transcriptions would have got all the details right are nil.
As for morphology affecting pronunciation - that is a concern that applies to many if not most of the languages we have here. That is, morpheme (or word) boundaries often have consequences on the phonetic level. The solution would be adding morphophonemic transcriptions, if we reach such a level. Nicodene (talk) 02:38, 14 March 2024 (UTC)Reply
@Nicodene I'm not sure that just the presence of periods/full stops between slashes asserts anything about their phonemicity. It is fairly common to include syllable dividers in phonemic representations, esp. if the phonemic representation is all we have (yet another reason, I think, to prefer broad phonetic representations between brackets; it lets you include relevant info without having to worry about whether such-and-such a distinction is phonemic). In any case I can easily do a bot run to find occurrences of the non-phonemic vowels you mention above; correcting them automatically is a bit trickier as it depends both on having rules to do the conversion (which might not be so hard to work out) and making sure the rules are correct in all cases (which might be harder, as people might be doing surprising things with these non-phonemes). Benwing2 (talk) 03:49, 14 March 2024 (UTC)Reply
@Nicodene Here: User:Benwing2/bad-sicilian-vowels. There are about 600 instances. If you give me a list of replacement rules I'll see about implementing them. Benwing2 (talk) 04:03, 14 March 2024 (UTC)Reply
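A bot pass of the kind described here could start from something like the sketch below. It is only an illustration in Python, not the script that was actually run: the name `find_suspect_vowels`, the regular expressions, and the assumption that phonemic transcriptions show up as `/.../` spans in the wikitext are all hypothetical. It flags the vowel symbols Nicodene listed, plus any of the five phonemic vowels followed by a length mark.

```python
import re
import unicodedata

# Symbols flagged in the thread: ɐ̠, i̞, ɛ̃, ɪ, ɨ, ɐ, ʊ and long vowels (Vː).
# Sequences with a combining diacritic come first so they match as a unit.
SUSPECT = re.compile("ɐ\u0320|i\u031e|ɛ\u0303|[ɪɨɐʊ]|[iɛaɔu]ː")

# Assumption: a phonemic transcription is any non-empty text between slashes.
# Real wikitext also contains slashes in URLs, so expect false positives.
PHONEMIC = re.compile(r"/([^/]+)/")


def find_suspect_vowels(wikitext):
    """Return (transcription, [suspect symbols]) pairs found in /.../ spans."""
    hits = []
    # NFD splits precomposed letters so diacritics are seen as separate marks.
    for span in PHONEMIC.findall(unicodedata.normalize("NFD", wikitext)):
        found = SUSPECT.findall(span)
        if found:
            hits.append((span, found))
    return hits
```

Calling `find_suspect_vowels` on the text of each Sicilian entry and collecting the non-empty results would give a work list similar in spirit to the one linked above.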
I've gone through the list. (/fʷ/ was really quite something.) These are all straightforward:
/ɪ i̞ ɨ/ → /i/
/ɛ̃/ → /ɛ/
/ɐ̠ ɐ ä aː/ → /a/
/ʊ/ → /u/
As for the long vowels, most of them are spurious, but some are indicated in the spelling with a circumflex and derive from contractions of /VV/. I'll see if I can find a paper discussing these before I do anything with them.
As for the full-stops – well, to place something in a phonemic transcription is to indicate that it is phonemic, inevitably and by definition. This is something that is often not grasped, for instance by (I would estimate) more than half of the contributors here, including otherwise very knowledgeable ones, but that is just how it is. One can either use phonological notation correctly or not use it at all.
I'm in favour of phonetic transcriptions as well, so long as we actually know enough to do them properly for the language in question. From what I have seen perusing the existing transcriptions, that is not the case for Sicilian. Someone has to put together a properly sourced and cited Wiki page on Sicilian phonology. Maybe I'll do it if I can find it in me to. Nicodene (talk) 05:17, 14 March 2024 (UTC)Reply
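The four replacement rules listed in this comment can be sketched in Python as follows. This is a hedged illustration, not the bot code that was run: the names `RULES`, `fix_transcription` and `fix_phonemic_spans` are made up for the example, and it assumes the rules should apply only inside `/.../` spans, leaving bracketed phonetic transcriptions alone.

```python
import re
import unicodedata

# (old, new) pairs from the list above. "aː" is applied first, so only a
# literal aː in the input is shortened; "ɐ̠" comes before bare "ɐ" so the
# diacritic is removed together with its base letter.
RULES = [
    ("aː", "a"),
    ("ɐ\u0320", "a"),  # ɐ̠
    ("ɐ", "a"),
    ("a\u0308", "a"),  # ä, in decomposed form
    ("i\u031e", "i"),  # i̞
    ("ɪ", "i"),
    ("ɨ", "i"),
    ("ɛ\u0303", "ɛ"),  # ɛ̃
    ("ʊ", "u"),
]

# Assumption: phonemic transcriptions are the text between two slashes.
PHONEMIC = re.compile(r"/([^/]*)/")


def fix_transcription(ipa):
    """Apply the vowel replacement rules to one transcription string."""
    text = unicodedata.normalize("NFD", ipa)  # expose combining diacritics
    for old, new in RULES:
        text = text.replace(old, new)
    return unicodedata.normalize("NFC", text)


def fix_phonemic_spans(line):
    """Rewrite only the /.../ spans of a line; [...] spans are untouched."""
    return PHONEMIC.sub(
        lambda m: "/" + fix_transcription(m.group(1)) + "/", line
    )
```

Other long vowels are deliberately left alone here, since the comment notes that some of them may be legitimate and need checking by hand.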
@Nicodene OK thanks. Are you sure about converting /aː/ to /ɛ/? That seems odd, while the others look totally fine. Benwing2 (talk) 05:22, 14 March 2024 (UTC)
OK, I see you changed it. Benwing2 (talk) 05:23, 14 March 2024 (UTC)
Yes. As for the 'legitimate' long vowels, we have the following (if the information in the entries is accurate):
Very interesting. Nicodene (talk) 05:37, 14 March 2024 (UTC)
@Nicodene Done. Benwing2 (talk) 06:05, 14 March 2024 (UTC)
Thank you. Nicodene (talk) 06:07, 14 March 2024 (UTC)
I don't mean to be a wet blanket but I have to add that your claim that "more than half of the contributors here, including otherwise very knowledgeable ones" are confused about what "phonemic" means (by implication, this includes anyone who disagrees with you, including me and User:Urszag) is a very strong statement. I also see you are going ahead and removing the existing syllable breaks in the phonemic notation despite there being no consensus for this (since the two other people in this discussion both disagree with it). Benwing2 (talk) 07:19, 14 March 2024 (UTC)Reply
@Benwing2 Sorry I didn't mean at all to imply that. I ultimately disagree but your reasoning has to do, respectively, with morphological complications and acceptance of a common practice, not any kind of basic misunderstanding.
My point, which I could have conveyed better, was that the common practice itself comes ultimately from that misunderstanding. Sicilian transcriptions like /çɪɾɪ(ɨ)ˈv(ʲ)ɛɖːu/, Galician transcriptions like /baˈβuʃa̝/, and Neapolitan ones like /ʃkuŋˈtʃi.ʝʝə/ were all common practice until recently, and this sort of thing is self-reinforcing: the more such transcriptions there are, the more they come off as a legitimate model to emulate, and so they can spread and take on a life of their own. Which is what I think happened with syllable divisions in phonemic transcriptions becoming a sort of Wiktionary canon, across languages.
I've not seen a phonology paper with syllable divisions in phonemic transcriptions unless the author is really proposing that they are phonemic. And (aside from my finding the concept itself unlikely) I've not seen a proposal to that effect gain widespread acceptance, e.g. for English, or seen one made at all for Romance languages or Latin.
As for removing syllable breaks - since I was already there cleaning up the long vowels, I fixed other issues with the same transcriptions, such as /e̞ u̞/. In my view /./ is also incorrect but I didn't intend to edit any entries just for that reason. I'll now simply leave it as-is. Nicodene (talk) 14:02, 14 March 2024 (UTC)Reply
@Nicodene OK, my apologies as I think the tone of that message was stronger than I intended. What you say makes sense (although I'm pretty sure syllable breaks are in fact phonemic in English, cf. the classic minimal pair nitrate vs. night rate, unless you make morpheme boundaries in compounds be phonemic, which is six of one vs. half a dozen of the other). I still think it's helpful to include syllable breaks. Again this leads to my conclusion that for practical purposes (given that our foreign-language entries are not meant for a linguistics paper but as a learner's dictionary for English speakers) we should abandon a purely "phonemic" transcription in favor of a broad phonetic one, which allows us to pick and choose which level of detail to show. This is already done, for example, in Russian, where e.g. we are choosing to show broad /l/ as [ł] and notate some of the more important vowel allophones such as [æ] between palatalized consonants. A pure phonemic representation is a theoretical construct and sticking with such a thing can often put us in a straitjacket, sometimes leading to bizarre results, e.g. per User:AG202 the Spanish terms fui [fwi] and muy [muj] should be notated phonemically as /fui/ and /mui/, where the very salient vowel differences between the two are considered non-phonemic and lexically determined and hence not displayed. (Now, I don't really believe it makes sense to have lexically determined phoneme -> allophone rules like this, but per AG202, this is the consensus view among linguists working on Spanish.) Benwing2 (talk) 00:39, 15 March 2024 (UTC)Reply
There is a boundary in night-rate, and that boundary is what causes people to pronounce it differently from nitrate. I agree. But it is surely more economical to explain that as a word boundary, given that night-rate is plainly recognizable to a native speaker as night plus rate — and given also that we know speakers have a mental model that can treat words as fundamentally distinct units (or else language wouldn't be possible I think) — than it is to posit that speakers have a mental model which, in addition to that, treats t.r and .tr as fundamentally distinct units. What does the additional assumption contribute?
I agree about promoting [] over // at any rate so long as the phonetic details are known. It would save a lot of headaches, for more than one reason.
As for the thing about Spanish - it sounds by definition impossible. Has AG202 cited a paper to that effect? Perhaps there is some other factor involved, like regional differences which have been shoved into one phonemic representation. Nicodene (talk) 04:26, 15 March 2024 (UTC)Reply
See the discussion here: User talk:Benwing2/2023 § Borrowing module es-pronunc for Spanish Wiktionary. Particularly the part citing The Routledge Book of Spanish Phonology when it comes to syllabification. It specifically lists "muy" as an exception and phonemically represents it as /mui/, which is what I've seen for the most part elsewhere too from authors that don't list /j/ & /w/ as separate phonemes (which is the consensus). There's no minimal pair with a hypothetical "mui" [mwi] as well. There's an argument that could be made that it's instead /'mu.i/ though. AG202 (talk) 04:51, 15 March 2024 (UTC)Reply
@AG202 Thanks for the response. Keep in mind there are other terms in -uy. Looking through the lemmas we have produces the following: ababuy, Chuy, cocuy, cuy, espumuy, Esteguy, huy, Jujuy, Luy, muy, pijuy, Ruy, tepuy, uy, Yaracuy. If there are no minimal pairs with words in -ui, it seems a random gap not an inherent feature of language. (And in fact cf. huy and hui, both Spanish words.) Benwing2 (talk) 05:21, 15 March 2024 (UTC)Reply
Why is it not simply /kui'dado/, /ku'iko/, /fu'i/, /'mui/? I don't follow any of this. Nicodene (talk) 06:00, 15 March 2024 (UTC)Reply
Because, as Urszag stated below, /fu'i/ implies a disyllabic word, when in fact it's one syllable. AG202 (talk) 06:27, 15 March 2024 (UTC)Reply
So you're saying /fuí/ can only be [fwí], and not [fuí], while /kuíko/ can be both [kwíko] and [kuíko]? Nicodene (talk) 06:29, 15 March 2024 (UTC)Reply

The contrast between Spanish fui and muy can be analyzed as a matter of the position of the stress (like the contrast between ame vs. amé). The problem with the standard IPA stress notation (aside from the fact that the stress mark is not a phoneme) is that the IPA stress symbol is supposed to go at the start of the stressed syllable, which calls for /'fui/, /'kuiko/, etc. Some phonologists use the acute instead (/fuí/ vs. /múi/) to avoid that issue. "Quasi-Phonemic Contrasts in Spanish", by José Ignacio Hualde (2004:5), cites Quilis and Fernández 1985 as giving transcriptions like "[bjénto] /biéNto/; [porfiában] /poRfiábaN/; [kwál] /kuál/;[fwérte]". Ralph Penny, in A History of the Spanish Language, also makes use of the acute to mark stress in phonemic transcriptions e.g. /kantáis/. I agree with Benwing that broad phonetic transcriptions can often be preferable to phonemic transcriptions. Linguists discussing Spanish glides and syllabification seem to usually use broad phonetic transcriptions, but I've also seen a few uses of slashed transcriptions that the authors don't seem to have obsessed over getting perfectly theoretically accurate. E.g. "The Syllable", Alfonso Morales-Front (The Cambridge Handbook of Spanish Linguistics, 2018, pp. 190-210) gives a number of phonetic transcriptions such as [suβ.li.mi.ˈnal], [su.ˈβli.me], [ˈpje.ðɾa], [gwe.βo], but also gives in slashes the transcriptions /uebo/, /-ecito/, /ˈaman/. There's no explanation of why the stress symbol is included in the last but not the first two, or why the symbol /c/ was used in /-ecito/.--Urszag (talk) 06:10, 15 March 2024 (UTC)Reply

Thanks! You explained it better than I could, and I agree that it looks to be a matter of stress like Routledge also posits. I'm a bit wary of using the acute accent though as it's usually used for tone. I'm not sure how else we can show it though. AG202 (talk) 06:28, 15 March 2024 (UTC)Reply
The "calls for /'fui/, /'kuiko/" part doesn't follow for me. I understand specific languages can have some 'home-brew' IPA practices, to an extent, but this just seems misleading. To anyone else this reads as if /u/ is stressed, then stress migrates rightwards in every surface realization. And it causes a clash with the actually stressed /u/ in muy. Nicodene (talk) 06:42, 15 March 2024 (UTC)Reply
To be clear, I wasn't recommending the transcriptions "/'fui/, /'kuiko/". My point was that these (also /'fiesta/, /'fuerte/, etc.) would fall as a natural but undesirable consequence of the convention of placing IPA stress marks before the onset of the stressed syllable. Then again, I can't find that principle explicitly stated anywhere in the online IPA chart or in the 1999 handbook (just implicitly conveyed by the examples), so maybe it doesn't even technically have official status anyway--I know some phoneticians have violated it and instead adopted the convention of placing the stress marker directly before the stressed vowel, but we don't generally do that on Wiktionary (e.g. we don't transcribe floro as /flˈoɾo/).--Urszag (talk) 07:41, 15 March 2024 (UTC)Reply
Is it true that Spanish phonologists agree on phonemic representations like /'kuiko/ in a phonology that contains the vowels /i/ and /u/ and no phonemic diphthongs? I ask because it isn't clear to me how that would work. Given the phonology as described, if I've not missed something, that transcription could only stand for a phonemically stressed /u/.
Also, does the pronunciation [uˈi] occur? The linked discussion suggests so, at least for cuico, whereas the comments here seem to suggest otherwise. Nicodene (talk) 08:52, 15 March 2024 (UTC)Reply
I also am not saying that Spanish phonologists generally recommend using the transcription /'kuiko/. But it is a possible phonemic transcription of the disyllabic pronunciation ['ku̯i.ko]. Stress is analyzed as a suprasegmental feature, so the placement of /'/ relative to other symbols in a phonemic transcription is a matter of convention. One convention is to put it directly before the stressed syllable. If you think that convention doesn't seem to work very well in this context, you're not alone, but as a convention, it isn't something that can be true or false: it isn't a fact about Spanish phonology or the position of Spanish phonemes (since /'/ is not a phoneme and doesn't actually come before or after any phoneme in the phoneme sequence). Here are some relevant transcriptions and commentary from José Ignacio Hualde's chapter "Spanish", in Gabriel, Christoph; Gess, Randall; Meisenburg, Trudel (eds.), Manual of Romance Phonetics and Phonology, 2022:790: "/ˈbiaxe/ [ˈbi̯a.xe]", "/ˈbaile/ [ˈbai̯.le]", "/liˈana/ [liˈa.na]", "/ˈioɡa/ [ˈʝo.ɣa]", "/iˈato/ [iˈa.to]" "/ˈkon.iuxe/ [ˈkonʲ.ʝu.xe]" (these are given in the context of explaining the analysis where [ʝ] is treated as a positional allophone of /i/). In footnote 1, Hualde notes: "In yoga the vowel /o/ is the phonologically stressed element, not the initial /i/, which becomes a consonant as it does not receive the stress on itself in this context, although the initial syllable is stressed. The lack of clarity introduced by the IPA convention of marking the stress at the beginning of the syllable in sequences like /io/ without a preceding consonant is the reason why in Hualde (2005) stress is indicated directly on the stressed vowel instead." I can't confirm whether cuico is potentially trisyllabic, but I have no reason to doubt it.--Urszag (talk) 10:21, 15 March 2024 (UTC)Reply
There isn't anything in the phonemic representation /ˈfui/ to convey that it is one syllable as opposed to two like /ˈmio/ and /'tea/. If we're to assume that /u/ here is inherently non-syllabic, then what we are really saying is that it is the phoneme /u̯/ or /w/ and the transcription has to be revised.
If we attempt an allophonic rule turning /u/ in that context to [w], we'll have to find a way to make sure it doesn't affect the /u/ in /ˈmui/ or any of the other -uy words mentioned by Benwing earlier. Most difficult of all, we would have to differentiate /ˈui/ from /ˈui/, namely the pair huy/hui. Nicodene (talk) 11:48, 15 March 2024 (UTC)Reply

Autocloseable.close

All use of the {{tl}} template is now rendering as Autocloseable.close for some unknown reason. That's all I know. Thanks, Soap 01:59, 14 March 2024 (UTC)

It probably began when an IP editor changed it from a redirect to {{temp}} into a standalone page with the error. The error was actually appearing as plain text in the diff, so maybe this was just an odd form of vandalism? Either way, if {{tl}} is supposed to be a redirect to {{temp}}, it should be fine now. If not, we need to work out what the IP was trying to do. Soap 02:09, 14 March 2024 (UTC)
@Soap Thanks. Yes, {{tl}} is just supposed to redirect to {{temp}}. Benwing2 (talk) 03:42, 14 March 2024 (UTC)

Interesting failure (8,782,141,951 IDs)

Sure, this edit is just vandalism, but I'm intrigued by the effect it had: instead of just making the one use of {{af}} fail, it made every instance of a Lua-using template on the page fail, saying "The time allocated for running scripts has expired." Why? Was the module thinking that "id8782141951=agent noun" meant it should keep looking through the other parameters trying to find "id8782141950=", "id8782141949=", etc? (If I change the parameter to e.g. "testbadparameter=agent noun", only that one instance of {{af}} fails.) Do I gather the module supports arbitrarily many id= parameters, even 8,782,141,951 of them, and times out when it thinks there are that many? Would it make sense to set any kind of sanity-check/sanity-limit, like more than 50 id= per template makes it spit out an error so that only the one instance of {{af}}, and not the whole page, breaks? - -sche (discuss) 06:53, 15 March 2024 (UTC)Reply

@-sche Yes, that's more or less what's going on. More specifically, I think what's happening is that it checks the maximum index of all numbered parameters and iterates from 1 up to that index, processing arguments. The reason for doing this is that potentially e.g. the tr could be supplied but not the term or display, etc. Yes, it probably should have some sanity checks in it, although it's not especially high priority because (a) it only breaks one page, (b) properly for this to be fairly robust we'd have to add sanity checks in lots of places, which is both a big undertaking and could backfire if we set the limits too low. Generally when I add sanity checks it's to prevent errors from swamping CAT:E, e.g. things like an alias loop in a label module used to cause all sorts of pages to get errors; now (if I remember aright) it only causes errors on certain pages (so we do get a few pages in CAT:E to alert us of the problem) and has some sort of fallback behavior on the rest (so they don't swamp the category). Benwing2 (talk) 07:07, 15 March 2024 (UTC)Reply
That makes sense (re why not to bother implementing sanity checks for this). - -sche (discuss) 14:51, 19 March 2024 (UTC)
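For readers who want to see the failure mode concretely, here is a small Python sketch of the behaviour described in this thread: find the highest index among numbered parameters, then walk from 1 up to it. It is not the module's actual code; `collect_indexed`, `MAX_INDEX` and the limit of 50 (the figure floated above) are assumptions for illustration only.

```python
import re

MAX_INDEX = 50  # hypothetical sanity limit, taken from the suggestion above


def collect_indexed(args, prefix, max_index=MAX_INDEX):
    """Gather prefix1, prefix2, ... from args into a list, keeping gaps as None.

    Without the limit check, a single parameter such as id8782141951 would
    make the final loop run billions of times.
    """
    pattern = re.compile(re.escape(prefix) + r"(\d+)")
    found = {}
    for key, value in args.items():
        match = pattern.fullmatch(key)
        if match:
            found[int(match.group(1))] = value
    if not found:
        return []
    highest = max(found)
    if highest > max_index:
        raise ValueError(
            f"{prefix}{highest}: index exceeds the limit of {max_index}"
        )
    # Iterate from 1 up to the highest index, as the thread describes.
    return [found.get(i) for i in range(1, highest + 1)]
```

With a check like this, the bad parameter would raise an error for the one template call instead of exhausting the page's script time, though as noted above the limit has to be chosen carefully so that legitimate uses are not rejected.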

Mohawk stems

Many nouns in Mohawk contain noun stems that are useful for things like noun incorporation and also historical linguistics (kéntsion is much more easily seen to be from Proto-Iroquoian *-tsjõɁt- when you can see that the stem is -itsion-), so I wanted to create a template moh-stem, but I'm not sure how to do that or whether I should be doing that. If anyone can help: I've been trying to change the etymology on the page for Mohawk onón:tsi to say "Noun stem -nontsist- from Proto-Iroquoian *-nõːtsiː-", but I'm not sure how to do that. ChromeBones (talk) 07:52, 15 March 2024 (UTC)

@ChromeBones do you need a template for this? Can't you just write "Noun stem {{m|moh||-nontsist-}}"? This, that and the other (talk) 09:53, 20 March 2024 (UTC)Reply

uh oh, script timeouts

@Theknightwho semen, laven and kennen are now running out of time halfway through the page. This has only happened in the last hour or two. Could you have made a recent change (e.g. your bug fix to Module:parameters or some other change) that inadvertently slowed things down? If not, any ideas? Benwing2 (talk) 08:15, 15 March 2024 (UTC)Reply

@Theknightwho It is indeed this change, because when you preview the pages in question without it, you don't get timeout errors. Interestingly they happen only with Middle English verb conjugations; Module:enm-conj must be doing something strange with parameters that is triggering an edge-case bug in Module:parameters. Benwing2 (talk) 08:39, 15 March 2024 (UTC)Reply
@Benwing2 I've fixed this. In essence, {{enm-conj}} was relying on the old way that defaults were handled for list parameters, where if item 1 of a list was empty then the default value would get used as the first item. This applied even if the list contained higher values, such as (in this case) class2= etc. This is only relevant when lists are allowed to contain holes, as in this case, so the solution was twofold:
  1. Revert to the old method of handling defaults, so that they're always added if item 1 of a list parameter is empty.
  2. Move the handling of default values so that it comes after the handling of holes in lists. This therefore means that item 1 of a list can only be empty at that point if allow_holes = true.
There might still be some other module which relies on lists not having holes while also relying on the old default handling, so it might be worth tracking any instances where allow_holes hasn't been specified and an input list contains a hole at item 1, since that should hopefully flush out the possible instances where it could occur.
Going forward, we might want to change the spec so that defaults can either be (a) inserted only if the list has 0 items, or (b) inserted if item 1 is empty. Theknightwho (talk) 16:45, 15 March 2024 (UTC)
@Theknightwho Great, thank you for looking into this and fixing it! I think ideally we should have disallow_holes = true as the default but that might require a lot of work. Benwing2 (talk) 19:35, 15 March 2024 (UTC)Reply
@Benwing2 That should be the default at the moment (or rather, allow_holes = true has to be set manually), but the issue is if a template relies on holes being removed automatically except for item 1, which is set as the default if empty. Theknightwho (talk) 20:28, 15 March 2024 (UTC)Reply
@Theknightwho There are actually three states with regard to holes: allow holes (allow_holes = true), compress holes (the default) and disallow holes (disallow_holes = true). What I mean is probably "disallow holes" should be the default and the "compress holes" state should have to be requested explicitly using compress_holes = true. I think the behavior where holes can be present and are compressed away is surprising, esp. with named parameters. Benwing2 (talk) 20:42, 15 March 2024 (UTC)Reply
@Benwing2 You're right - I'd forgotten (and it's not in the documentation, so I should update that). Theknightwho (talk) 20:46, 15 March 2024 (UTC)Reply
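The three hole-handling states and the ordering of the fix (holes first, defaults afterwards) can be illustrated with a short Python sketch. This is a simplified model and not the code of Module:parameters: `process_list`, the `mode` strings and the dict-based input are assumptions made for the example.

```python
def process_list(items, mode="compress", default=None):
    """Model of list-parameter handling with holes and a default.

    items: dict mapping 1-based index -> value; a missing index or an
    empty string counts as a hole.
    mode:  "allow" (keep holes as None), "compress" (drop holes) or
           "disallow" (raise an error if there is any hole).
    """
    highest = max(items, default=0)
    values = [items.get(i) or None for i in range(1, highest + 1)]

    # Step 1: deal with holes.
    if mode == "disallow":
        if None in values:
            raise ValueError("hole in list parameter")
    elif mode == "compress":
        values = [v for v in values if v is not None]
    elif mode != "allow":
        raise ValueError(f"unknown mode: {mode}")

    # Step 2: only now apply the default, whenever item 1 is empty.
    # After compression, item 1 can be empty only if the list is empty.
    if default is not None:
        if not values:
            values = [default]
        elif values[0] is None:
            values[0] = default
    return values
```

For example, an input with only item 2 set gives `["x"]` in "compress" mode but `[default, "x"]` in "allow" mode, which is the distinction that the Middle English conjugation template was relying on.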

Unicode 15.1 update for Appendix:Unicode

I just updated the Indonesian Wiktionary's version of Appendix:Unicode to Unicode 15.1 at Lampiran:Unicode. Here's the list of the relevant changes if anyone wants to update the Appendix:Unicode to Unicode 15.1 since I don't have permission to edit the modules:

Also slightly unrelated, I created a name rule for Lampiran:Unicode/Variation_Selectors so it doesn't need a name module anymore:

Thank you! Ekirahardian (talk) 20:30, 17 March 2024 (UTC)Reply

Pinging @Erutuon who has edit-access to said modules. Ekirahardian (talk) 23:52, 18 March 2024 (UTC)Reply

Remove users with foo-0 from foo's Babel cats

I noticed (by looking at VGPaleontologist, who has apparently tried to indicate every language he doesn't speak) that users who declare "egy-0" nonetheless get put into Category:User egy, and likewise for other languages. Can we change this so they're not, so that Category:User egy (etc) only contains users who've indicated knowledge of egy? (Also, is anyone working on bot-removing / re-sorting inactive users, or am I just doing that manually when I think about it?) - -sche (discuss) 15:37, 19 March 2024 (UTC)Reply

@-sche Should be fixed in [2]. This, that and the other (talk) 09:32, 20 March 2024 (UTC)
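The requested behaviour amounts to a one-line rule, sketched here in Python. It is not the change that was actually made: `babel_categories` is a hypothetical helper, and the "Category:User code-level" naming for the per-level category is an assumption based on the category names mentioned above.

```python
def babel_categories(code, level):
    """Return the user categories for one Babel entry such as egy-0 or egy-2.

    Level "0" means the user does not know the language, so they are left
    out of the general "Category:User <code>" category.
    """
    per_level = f"Category:User {code}-{level}"  # assumed naming scheme
    if level == "0":
        return [per_level]
    return [f"Category:User {code}", per_level]
```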

typo, twice

Appendix:Glossary is edit-protected. Appendix:Glossary#ablaut and Appendix:Glossary#voice have 2 full stops instead of 1 after they mention Wikipedia. (Looks like someone put a period at the end of the sentence/fragment, unaware that the {{pedia|template}} automatically adds a period.)

Happy editing!

--173.67.42.107 15:56, 19 March 2024 (UTC)Reply

Fixed. Vininn126 (talk) 16:08, 19 March 2024 (UTC)Reply

Can anyone help with this problem I'm having in creating categories pls?

I'm trying to add a category in Maltese regarding word stems. For context, Maltese has words derived from Arabic in the form of roots (e.g. k-s-r; related to breaking --> kiser; 'he broke') and other words derived mainly from Italian in the form of stems or 'morphemic stems' (e.g. -komunika-; related to communicating --> with added suffix; komunikat; 'communicated').

So, since "Maltese terms by root" already exists in the "Maltese Categories", I wish to create a "Maltese terms by stem". However, I am having an increasingly hard time trying to do so... Firstly, I can't seem to create a page with a word in the format '-***-', which is needed to indicate affixes, and I just can't find a way to create a category or template or any link of some sort, even with auto cat... Can anyone suggest any solutions to this please? Melithius (talk) 22:08, 19 March 2024 (UTC)

@Melithius I can help with the first part of your question. To create a page that starts with "-", you either need to edit a page to include a red link to the desired entry (as in [[-something-]]) and click the link, or manually go to the page by typing it in your web browser's address bar, like https://en.wiktionary.org/wiki/-something-
As for the categorisation issue, judging by the absence of Category:Terms by stem by language, it seems like categorisation by stem is not something we currently do at Wiktionary. It would need to be implemented via Lua modules, probably after a community discussion at the Beer Parlour. This, that and the other (talk) 09:03, 20 March 2024 (UTC)Reply
@Melithius @This, that and the other I should add, the difference between roots and stems only makes sense in certain languages. Potentially we could categorize by stem for Maltese only; however, I'd be concerned about the number of stems involved (a ton), and the resulting likely sparsity of the coverage. Also I suspect that many of these "stems" only exist in a single or a limited number of words; since Maltese is a Semitic language and Semitic languages don't usually have "stems" per se, all of the stems in question (including the one you cited) are borrowed, usually in a single word. Benwing2 (talk) 03:28, 21 March 2024 (UTC)Reply
@Melithius In addition, something like -komunika- is not the normal way we do things at Wiktionary. Things with hyphens on both sides are infixes or interfixes; roots and stems of the sort you're referring to would only have a hyphen at the end (or at least this is how Proto-Indo-European roots are handled). Benwing2 (talk) 03:30, 21 March 2024 (UTC)Reply
No no, you are wrong actually, if I'm understanding your confusion correctly... Maltese is a language of Semitic origin but greatly influenced by Italian and even English. The statistics were 40% Arabic, 40% Italian and the rest English and other possible languages, last time I checked. Hence, yes, there are many borrowed terms from Italian that are greatly used in everyday speech. So I think it's only fair that such a system for showing the many possible formations of these roots exists. For example, from -komunika- you can add -tur 'communicator', -zzjoni 'communication', -r 'communicating (noun)' and so on for many other stems. However, I do see that such a system would only be used by a handful of languages including Maltese... So I don't have many hopes of such a template being introduced.
Also yes, I understand how it may be interpreted as an infix, but many Maltese sites display them this way to show how you can add both prefixes and suffixes. But something like KOMUNIKA is enough, just something that doesn't show it as an actual word. Melithius (talk) 08:12, 21 March 2024 (UTC)
My point is that words like komunikatur and komunikazzjoni were not formed in Maltese by adding a suffix -tur or -zzjoni to a stem komunika, but were borrowed as whole words from Italian. Benwing2 (talk) 02:04, 28 March 2024 (UTC)Reply
Ah okay, yes, that is indeed the case most of the time, but we shouldn't completely disregard such a system of 'adding' affixes to stems simply because of that, no? Yes, the words exist in their own right, but they still need to be learnt, and the way to do that in our own language is by classifying such a system (as such a system is what we had in the first place before Italian). We do it with English as well in education, to teach what certain affixes mean and form. You are being very technical, as with your logic we can go further and say Italian doesn't do it either, everything's preserved from Latin, and the same with English and any other Latin-derived languages, just to have at least a few rules to go by. It's just analysis and reinterpretation for better use and learning within the language itself, which in Maltese education was officialized long ago, and I want to extend that part of the Maltese education system and the language as a whole onto here. Melithius (talk) 17:00, 28 March 2024 (UTC)

Requests for verification/Reconstruction

@This, that and the other, why again was it made so {{rfv}} no longer works for reconstructions? -- Sokkjō 05:57, 20 March 2024 (UTC)Reply

The RFV process is based on the goal of attesting words by finding the required number of usage examples. By definition, a reconstructed form won't have any usage examples.--Urszag (talk) 08:26, 20 March 2024 (UTC)Reply
Exactly. For relevant discussions, see this one from January 2023 and this one from February 2023. This, that and the other (talk) 08:51, 20 March 2024 (UTC)Reply
I suppose that makes sense, but what if I want to put in a verification request for a declension table, or a word sense? -- Sokkjō 19:58, 20 March 2024 (UTC)Reply
As far as I can make out from the links, {{rfd-sense}} will do for a word sense. --RichardW57m (talk) 13:42, 21 March 2024 (UTC)Reply
For specific declension tables, I've been experimenting with {{rfv}} next to the table and explaining the scope of the challenge in the discussion section. --RichardW57m (talk) 13:42, 21 March 2024 (UTC)Reply
@RichardW57m: {{rfv}} on reconstructions forwards to {{rfd}} at the moment, as does {{rfd-sense}}, which is not what I want to do. Another example is Reconstruction talk:Proto-West Germanic/hą̄han, which I started on a talk page, but it's not going to get the eyes that an RfV would. -- Sokkjō 19:31, 21 March 2024 (UTC)
I would recommend using WT:ES or maybe WT:TR for discussing specific details of reconstruction entries. The right people hang out at ES, even if the question is not specifically about etymology. This, that and the other (talk) 22:38, 21 March 2024 (UTC)Reply

Languages with entries in fr.Wikt but not en.Wikt

I'd like a list of languages for which fr.Wikt has entries for terms in the language (see fr:Wiktionnaire:Statistiques and fr:Wiktionnaire:Statistiquesb) but en.Wikt does not (Wiktionary:Statistics). The difficulty is that both sites' stats pages index by language name (which obviously differ between English and French) rather than code; for English I suspect I could reasonably easily isolate all the names from our table and plug them into {{#invoke:languages/templates|getByCanonicalName|English}} and get a list of the language codes we have entries in, but I don't know what the equivalent function for fr.Wikt is: fr:Module:langues seems to only mention a function for getting the language name from the code but not vice versa. - -sche (discuss) 14:20, 20 March 2024 (UTC)Reply

@-sche An additional issue is that the codes used on fr.wikt and en.wikt may differ, esp. in lesser-used languages without ISO 639-3 codes (which are the ones you would be interested in). I don't know anything about fr.wikt though. Maybe @Noé would have some idea. Benwing2 (talk) 03:22, 21 March 2024 (UTC)Reply
Hello, French Wiktionary stats are provided by Unsui, based on the dumps. He may be able to generate a list with language codes instead of language names. As Benwing2 said, some languages have a locally made code when ISO doesn't provide one, and it is often the name of the language itself in French. Pamputt and Otourly may also be interested in this conversation, and may like to do the reverse operation to get a list of entries to create in French Wiktionary 🙂   Noé 09:42, 21 March 2024 (UTC)
Hello, with the Cognate Dashboard and some manipulations I was able to do something like that, manually using the #ifexist magic word. But Cognate is broken, and I don’t know if it will be repaired. Otourly (talk) 19:16, 21 March 2024 (UTC)
You can also try to use the kaikki JSON dumps for the French Wiktionary, they have the language codes: https://kaikki.org/dictionary/rawdata.html . The only remaining problem is likely a lack of standardization. MrBeef12 (talk) 08:39, 24 March 2024 (UTC)Reply
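A minimal sketch of the comparison being asked for, assuming each site's data is available as JSON lines with a language-code field. The field name `lang_code` and the sample rows are assumptions for illustration, not verified against the actual dumps, and differing local codes between the two sites would still need manual reconciliation.

```python
import json

def language_codes(jsonl_lines):
    """Collect the set of language codes seen in an iterable of JSON lines."""
    codes = set()
    for line in jsonl_lines:
        line = line.strip()
        if not line:
            continue
        entry = json.loads(line)
        code = entry.get("lang_code")  # assumed field name
        if code:
            codes.add(code)
    return codes

def missing_from(target_codes, source_codes):
    """Codes present in the source site but absent from the target site."""
    return sorted(source_codes - target_codes)

# Made-up sample rows; "xx-hypothetical" is not a real language code.
fr_sample = [
    '{"word": "chat", "lang_code": "fr"}',
    '{"word": "kot", "lang_code": "pl"}',
    '{"word": "exemple", "lang_code": "xx-hypothetical"}',
]
en_sample = [
    '{"word": "cat", "lang_code": "en"}',
    '{"word": "chat", "lang_code": "fr"}',
    '{"word": "kot", "lang_code": "pl"}',
]

only_in_fr = missing_from(language_codes(en_sample), language_codes(fr_sample))
print(only_in_fr)
```

Swapping the two arguments of `missing_from` would give the reverse list that Noé mentions.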

uh oh, timeouts once more

@Theknightwho Suddenly we have several pages in CAT:E that are running out of time. Cf. Milton, which exists only for a few languages but nonetheless has timeout errors. I see that User:Chuck Entz already pinged you about this, and you said there was a bug in the template parser that you fixed, but there still seem to be issues. What are the changes you've been making lately to the template parser module? Benwing2 (talk) 03:35, 21 March 2024 (UTC)Reply

@Benwing2 They've cleared up on their own with no intervention from me. I have no idea what the issue was. Theknightwho (talk) 03:50, 21 March 2024 (UTC)Reply
I think it may be something to do with this diff which restored the old punctuation-removing pattern in Module:languages. If I preview water/translations with the old version, it takes about 8.5 seconds, whereas the new version times out. That may just be down to random chance, though, since I don't see any obvious issues with the new pattern. @Erutuon, Benwing2. Theknightwho (talk) 04:04, 21 March 2024 (UTC)Reply
@Theknightwho @Erutuon Interesting. I suspect this is not chance. There is clearly variation on how long page saves take but we've never before this had a bunch of pages periodically appearing in CAT:E due to timeouts. If we've seen them once, they'll be back, and the fact that you were able to pinpoint the likely cause means we should focus on optimizing the pattern in question. The obvious thing that pops out is the two .- operators; depending on the implementation esp. given the need to work with Unicode, this could easily turn into an N^2 operation, whereas the previous operation, with only star operator (not counting the %s* operator, which shouldn't in most cases match anything, so should be O(C)), could be O(N). You might consider trying to split up the operation into various operations, e.g. separating the punctuation splitting and trimming in their own operations, so that at the end all you need to do is check for [^%p%s] in the punctuation-stripped and whitespace-trimmed string; this is guaranteed to be O(N). Benwing2 (talk) 04:24, 21 March 2024 (UTC)Reply
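The splitting idea above might look roughly like the following, written in Python rather than Lua purely as an illustration. It is a sketch under assumptions: the real pattern in Module:languages is not reproduced here, and Unicode category "P" is only an approximation of what `%p` matches. Each helper makes a single pass over the string, so the work stays linear in its length.

```python
import unicodedata

def is_punctuation(ch):
    """Approximate %p with the Unicode punctuation categories (an assumption)."""
    return unicodedata.category(ch).startswith("P")

def strip_punctuation(text):
    """Drop punctuation in one pass, then trim surrounding whitespace."""
    kept = "".join(ch for ch in text if not is_punctuation(ch))
    return kept.strip()

def has_content(text):
    """True if anything other than punctuation or whitespace is present."""
    return any(not (ch.isspace() or is_punctuation(ch)) for ch in text)

print(strip_punctuation("  «water», "))
print(has_content("...!? "))
```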
@Benwing2 As a side point, I've just noticed a bug in Module:translations where every translation is being tracked as having "no term", which is adding about 4 seconds to water/translations. Obviously that won't fix the other pages, but it may explain why certain big translation pages have been sluggish. Theknightwho (talk) 04:29, 21 March 2024 (UTC)Reply
The quantifier - in the pattern could certainly take some time. However, I tried replacing it with *, and then removing the line entirely, and didn't see Lua take significantly less time in Milton, and there's so much variation that I have no idea how to even time it so I've given up. Ultimately there's a translation to PHP regex and I don't know what the difference is between - and * in the PHP translation, but if the translation of * is faster in general, it would be fine to use it here. — Eru·tuon 19:29, 21 March 2024 (UTC)Reply
@Benwing2 @Erutuon Yeah, the variation in times is enormous. I've been doing quite a lot of profiling with the template parser today, as I was concerned that it was a major contributor to the time-outs on . After a lot of work, it now contributes about 0.5 seconds to the page load time (remembering that that's the time taken to parse several hundred pages; not just the raw content of ). That's totally dwarfed by the variation in some of the MediaWiki functions, and I really don't know what we can do about it: the profile shows mw.ustring.gsub varying between 0.8 to 1.5 seconds, and getContent (which is necessary for page scraping) takes anywhere between 1.5 to 3.5(!) seconds. I assume it's down to whichever machine happens to do the processing server-side. Theknightwho (talk) 23:03, 21 March 2024 (UTC)Reply
Just to add: the massive variation only seems to affect anything that calls back into PHP. The template parser is mostly built with Lua's native libraries, and I've noticed the times are pretty consistent between page loads. Theknightwho (talk) 23:06, 21 March 2024 (UTC)Reply
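On the question of how to time something when individual runs vary this much, one common approach is to repeat the measurement and compare the minimum and median rather than trusting any single run. The sketch below uses Python's `timeit` on a made-up workload; it does not measure Scribunto or MediaWiki, and `sample_workload` is a hypothetical stand-in.

```python
import statistics
import timeit

def summarize_timings(func, repeat=7, number=1000):
    """Time func several times and report per-call min, median and max."""
    runs = timeit.repeat(func, repeat=repeat, number=number)
    per_call = [total / number for total in runs]
    return {
        "min": min(per_call),
        "median": statistics.median(per_call),
        "max": max(per_call),
    }

def sample_workload():
    """Hypothetical stand-in for the operation being profiled."""
    return "".join(ch for ch in "water, eau; Wasser!" * 20 if ch.isalnum())

stats = summarize_timings(sample_workload, repeat=5, number=200)
print(stats)
```

The minimum is usually the least noisy figure for comparing two versions of the same code, while the spread between minimum and maximum gives a feel for how much of the variation comes from the environment.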
@Theknightwho: Entries keep popping in and out of CAT:E all the time. This used to happen with an entry or two every week. Now it's several at a time, every few minutes to an hour or two. It makes it harder to spot the real errors. I managed to completely clear CAT:E, but in the time it's taken to write this, another one has popped up. I cleared it again- we'll see how long that lasts. Chuck Entz (talk) 15:16, 27 March 2024 (UTC)Reply
@Chuck Entz I'm not completely sure, but it seems that module load times are longer just after they've been recently changed, but drop back down again after a short while. Presumably it's something to do with caching. Theknightwho (talk) 22:00, 27 March 2024 (UTC)Reply
@Theknightwho It may well be related to the change mentioned just above by User:Erutuon. I think we should consider reverting it. I don't think it has anything to do with caching. Something has definitely raised the average time that large pages take, which is why they're timing out a lot more often. Benwing2 (talk) 22:06, 27 March 2024 (UTC)Reply
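For readers unfamiliar with the `-` quantifier discussed in this thread: in Lua patterns it matches as few repetitions as possible, whereas `*` matches as many as possible. The closest equivalents in Python regular expressions are `*?` and `*`. The example below only illustrates the difference in what gets matched; it says nothing about how Scribunto translates either form or how fast each runs.

```python
import re

text = "(a) and (b)"

# Greedy: take as much as possible, then back off to the last ")".
greedy = re.search(r"\((.*)\)", text).group(1)

# Lazy: take as little as possible, stopping at the first ")".
lazy = re.search(r"\((.*?)\)", text).group(1)

print(greedy)
print(lazy)
```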

Template requiring date or year

For some reason, the template here at the bottom is requesting a date or year, even though two dates are given. The same template does not request anything when used in template namespace. Can anybody tell me what’s going on? MuDavid 栘𩿠 (talk) 09:52, 21 March 2024 (UTC)

@MuDavid: it's a strange template that uses reference templates inside the quotation template. I'll have to take a closer look at it later. — Sgconlaw (talk) 11:39, 21 March 2024 (UTC)Reply
@MuDavid: the issue is that you aren't supposed to squeeze a citation template inside |2ndauthor=. If you use |newversion= then the module requires you to provide a value for |date2= or |year2=. Thus, you can't just use {{cite-book}}, etc., inside |2ndauthor= but have to split up all the parameters using |title2=, |location2=, |publisher2=, etc. You can do this for the Allen and Stigand sources, but it won't work if you allow editors to insert a reference in the form of a {{cite-*}} template using |trans_from=. — Sgconlaw (talk) 13:25, 21 March 2024 (UTC)Reply
However, a possible workaround is to avoid using |newversion= and to put the citation templates into |section= instead. — Sgconlaw (talk) 13:29, 21 March 2024 (UTC)Reply
Okay, thanks for the hint. I edited the template and it seems to work as desired. MuDavid 栘𩿠 (talk) 02:05, 22 March 2024 (UTC)Reply

Ancient Greek conjugation template labels contracted forms as uncontracted

Template:grc-conj, when used to show a contract verb, is supposed to give two tables, one uncontracted and the other contracted. But when I use {{grc-conj|fut-con-a|...}}, the contracted table is also labeled as "Uncontracted". I noticed this on ἐλαύνω, where {{grc-conj|fut-con-a|ἐλ|dial=att}} produces

LaetusStudiis (talk) 15:19, 21 March 2024 (UTC)Reply

It's best to post problems with {{grc-conj}} in Module talk:grc-conj. They may not be fixed anytime soon, but at least they'll be in a central location. — Eru·tuon 18:36, 21 March 2024 (UTC)Reply
Actually, there appears to be a post about this already at Module talk:grc-conj § {{grc-conj|fut-ln|ἀγγελ|ἀγγελθ|dial=att}}, deux conjuguaisons non contractées. — Eru·tuon 18:38, 21 March 2024 (UTC)Reply
Finally fixed it. Just a single-character mistake but took a long time to find. — Eru·tuon 19:45, 21 March 2024 (UTC)Reply

Percent-encoded pipe and square brackets in T:rfv-sense (AE)

Look at AE, where the {{rfv-sense|de|regarding gender}} tag displays (Can we verify(gender%7cAE%5D%5D +) this sense?), and {{rfv-sense|de|also regarding gender}} displays (Can we verify(regarding gender%7cAE%5D%5D +) this sense). Why is it eating (not displaying) the first word, and why is it displaying the % stuff? - -sche (discuss) 04:23, 22 March 2024 (UTC)Reply

This can be fixed by adding additional URL encoding to the template code, but note that this template parameter is not intended to accept a reason. It is meant to take a unique "topic" identifier to distinguish RFVs for different senses under the same language on the same entry. I'm actually reluctant to fix the URL encoding issue as it would paper over the real problem (confusing parameters). I think we should retire unnamed parameter 2 and force the explicit use of |topic=. This, that and the other (talk) 06:41, 22 March 2024 (UTC)Reply
@This, that and the other What does the documentation sentence "If given, the specified text will be included at the end of a CSS span id contained in the request message." mean? This is confusing to me. Also it looks like you repurposed the old "topic" parameter, which was more open-ended. Was this intentional? Benwing2 (talk) 03:16, 23 March 2024 (UTC)Reply
@Benwing2 I didn't write that wording. Either way, it means that the text of |topic= or |2= will be appended to the "anchor" (id parameter) generated by the template. For example, {{rfv-sense|en}} will generate an anchor #rfv-sense-notice-en-, while {{rfv-sense|en|1 2 3}} or {{rfv-sense|en|topic=1 2 3}} will generate #rfv-sense-notice-en-1 2 3.
Yeah I don't know why I added support for {{{2}}} [3]. Moment of madness I guess. Probably {{{2}}} should simply contribute a pre-filled reason to the RFV section creation link. This, that and the other (talk) 07:15, 23 March 2024 (UTC)Reply
Fixed that problem, but {{rfv-sense|de|probably that's Translingual {{m|mul|AE}}}} still is breaking the template output. — Eru·tuon 04:12, 23 March 2024 (UTC)Reply
Thanks, all. I actually initially thought the issue was that this is not input the template was intended to accept, and was going to post here just asking out of curiosity why it failed in this odd way (why was it eating the first word? what is the percent encoding coming from?), but the documentation seemed to suggest this use of 2 was OK. I note that T:rfv accepts a reason as 2, so it seems understandable that people would expect to be able to give a reason in T:rfv-sense too, but I think having that reason only be present in the wikicode (not displayed), and then auto-loaded when adding the section to WT:RFV (as suggested above), would be a reasonable solution. If it's easy to make it also handle {{rfv-sense|de|probably that's Translingual {{m|mul|AE}}}} at that point, great, but if not I see no problem with just updating the documentation to tell people not to do that. - -sche (discuss) 14:27, 23 March 2024 (UTC)
@Erutuon @-sche It's fixable with {{ANCHORENCODE:string}}, which is specifically designed to generate anchor text from inputs containing links etc. Theknightwho (talk) 14:02, 27 March 2024 (UTC)Reply
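As a rough illustration of where sequences like `%7C` and `%5D` come from, the sketch below percent-encodes a string containing a pipe and closing brackets, and builds an anchor in the shape described above (`rfv-sense-notice-<lang>-<topic>`). It uses Python's standard `urllib.parse.quote` and is not the template's actual code; both function names are made up for the example.

```python
from urllib.parse import quote

def rfv_anchor(lang, topic=""):
    """Build an anchor in the shape described in the thread (illustrative only)."""
    return f"rfv-sense-notice-{lang}-{topic}"

def encode_for_url(value):
    """Percent-encode every reserved character, including '|' and ']'."""
    return quote(value, safe="")

raw = "regarding gender|AE]]"
encoded = encode_for_url(raw)

print(rfv_anchor("en"))
print(rfv_anchor("en", "1 2 3"))
print(encoded)
```

Characters such as `|` and `]]` have their own meaning in wikitext, so text containing them has to be encoded exactly once before it is placed inside a link; encoding at the wrong stage leaves the raw percent sequences visible, as in the output quoted at the top of this section.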

Proto-Brythonic template

Instead of linking to Brittonic languages, it might make more sense to link more specifically to Common Brittonic. I don't know how to tweak this. Shoshin000 (talk) 13:30, 22 March 2024 (UTC)

@Shoshin000 Which template are you referring to? Benwing2 (talk) 03:12, 23 March 2024 (UTC)Reply
Click on the "Proto-Brythonic" link on aneval, for instance. Shoshin000 (talk) 09:34, 23 March 2024 (UTC)
@Shoshin000 Fixed. Benwing2 (talk) 22:16, 27 March 2024 (UTC)Reply

{{desctree|non|ok|id=yoke}} throws a module error

Lua error in Module:descendants_tree at line 39: Could not find the correct senseid template in the entry ok (with language non and id 'yoke')

but {{desc|non|ok|id=yoke}} links to the correct sense with no error:

Old Norse: ok

@Theknightwho. Chuck Entz (talk) 22:34, 22 March 2024 (UTC)Reply

@Chuck Entz Someone had accidentally copied a <noinclude> tag onto the page in the Elfdalian section, which the template parser was dutifully respecting by ignoring everything after it (since it had no closing tag). Theknightwho (talk) 23:21, 22 March 2024 (UTC)Reply
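A stray opening tag like this is easy to miss by eye. The following is a minimal sketch of a check that counts opening and closing inclusion tags in a page's wikitext; it is a hypothetical helper, not part of the template parser, and it ignores edge cases such as tags inside nowiki or HTML comments.

```python
import re

INCLUSION_TAGS = ("noinclude", "includeonly", "onlyinclude")

def unclosed_tags(wikitext):
    """Return {tag: surplus} for tags whose open and close counts differ."""
    problems = {}
    for tag in INCLUSION_TAGS:
        opens = len(re.findall(rf"<{tag}\s*>", wikitext, flags=re.IGNORECASE))
        closes = len(re.findall(rf"</{tag}\s*>", wikitext, flags=re.IGNORECASE))
        if opens != closes:
            problems[tag] = opens - closes
    return problems

page = "==Elfdalian==\n<noinclude>\n==Old Norse==\n{{senseid|non|yoke}}"
print(unclosed_tags(page))
```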

Template:R:cu:ESJS

This has been throwing a module error since the beginning of the week. It's pretty tricky because it only occurs on the template page itself, and is in a module invocation in a parameter that's not displayed. The only way to tell on the page that there's an error is by the Category:Pages with module errors link at the bottom of the page. Since it occurred at the same time as some edits to {{cite-book}} that JeffDoozan had just posted about on Theknightwho's talk page, I posted a reply there:

@JeffDoozan: {{R:cu:ESJS}} started throwing an invisible module error at about the same time you did this, and I suspect these changes are somehow involved. I don't really understand what's going on, but tinkering with html comments has narrowed it down to {{interval}} in the |entryurl= code throwing an error when |2= for the main template is missing, and |entryurl= not being displayed when |1= is missing. I have no clue why the module error didn't show up until now, since neither {{R:cu:ESJS}} nor {{interval}}/Module:interval have been edited recently and I didn't see anything about your edits that should have affected the |entryurl= parameter in {{cite book}} as used in this template. I'm obviously missing something. Chuck Entz (talk) 22:15, 17 March 2024 (UTC)

No one seems to have read it who had the time and/or expertise to fix this, so I'm bringing it here. It only happens when the first two positional parameters are empty, which is true on the template page itself. The template has no provision for doing things differently there and has the parameter references scattered throughout the template, so it's not something I could fix easily with "noinclude" or "includeonly" tags. It's true that the template has had this deficiency all along, but I brought it to JeffDoozan's attention because it's only after the recent edits that {{cite-book}} responded to it with a module error. I don't really care whose fault this is, but we can't have it stay in CAT:E forever. Thanks! Chuck Entz (talk) 23:14, 22 March 2024 (UTC)

@Chuck Entz I wrapped the whole thing in <includeonly> ... </includeonly> tags. This should always work in cases like this. One issue for sure is things like {{#ifexpr:{{{2|}}}>45|+4}}; in template space, |2= is undefined and the expression {{{2|}}} evaluates to a blank string, making the thing inside of #ifexpr: expression look like {{#ifexpr:>45|+4}}, which is malformed and results in Expression error: Unexpected > operator. I'm not sure if this counts as a module error (probably not) but it's certainly not good. The module error may come from the recent parameter checking added to {{cite-book}}. Benwing2 (talk) 03:10, 23 March 2024 (UTC)Reply
Those usually show up in CAT:PFE, which I also patrol. Chuck Entz (talk) 03:18, 23 March 2024 (UTC)Reply
I looked at this when Chuck first posted it but didn't see any way the param checking in cite-book could have caused the error unless it was exposing some sort of deep, weird interaction between the templates so I left it as-is in case someone else wanted to take a deeper look. JeffDoozan (talk) 18:41, 23 March 2024 (UTC)Reply
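The empty-parameter expansion described in this thread can be imitated with a toy substitution, shown below. This is only a simplified model of how `{{{name|default}}}` is filled in (it handles no nesting and is not MediaWiki's parser), but it makes the malformed expression visible.

```python
import re

# Matches {{{name|default}}} with no nested braces (a simplification).
PARAM = re.compile(r"\{\{\{([^{}|]+)\|([^{}]*)\}\}\}")

def expand_params(wikitext, params):
    """Replace each {{{name|default}}} with the supplied value or its default."""
    def substitute(match):
        name, default = match.group(1), match.group(2)
        return params.get(name, default)
    return PARAM.sub(substitute, wikitext)

snippet = "{{#ifexpr:{{{2|}}}>45|+4}}"

on_template_page = expand_params(snippet, {})        # |2= undefined
in_an_entry = expand_params(snippet, {"2": "50"})    # |2=50 supplied

print(on_template_page)
print(in_an_entry)
```

With no value for `2`, the comparison loses its left-hand side, which matches the "Unexpected > operator" message quoted above. Wrapping the template body in includeonly tags, as was done here, avoids evaluating it on the template page at all.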

Rhymes in Template:pl-pronunciation

This template does not produce the correct rhyme categories if the stress is anything other than penultimate (such as at matematyka and Jujuy). İʟᴀᴡᴀ–Kᴀᴛᴀᴋᴀ (talk) (edits) 16:39, 23 March 2024 (UTC)Reply

@Ilawa-Kataka There is going to be a new module where this is handled. Vininn126 (talk) 17:03, 23 March 2024 (UTC)Reply
@Ilawa-Kataka @Vininn126 Yes, my apologies, I have this partly finished. Benwing2 (talk) 18:57, 23 March 2024 (UTC)Reply
BTW to give credit where it's due, the module in question was originally created by User:Catonif. Benwing2 (talk) 18:58, 23 March 2024 (UTC)Reply
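To make the reported problem concrete, here is a small sketch of deriving a rhyme from an IPA transcription by starting at the stressed syllable rather than assuming penultimate stress. It assumes the rhyme runs from the stressed vowel to the end of the word, that stress is marked with ˈ and syllables with a full stop, and it uses a simplified vowel set; it is not the logic of the existing template or of the new module.

```python
# Simplified set of Polish vowel symbols (an assumption for this sketch).
POLISH_VOWELS = set("aɛiɔuɨ")

def rhyme_from_ipa(ipa):
    """Return the part of the word from the stressed vowel onward."""
    stress = ipa.find("ˈ")
    # Without a stress mark, treat the whole word as the stressed stretch.
    tail = ipa[stress + 1:] if stress != -1 else ipa
    tail = tail.replace(".", "")
    for index, symbol in enumerate(tail):
        if symbol in POLISH_VOWELS:
            return tail[index:]
    return None

# matematyka, with antepenultimate stress
print(rhyme_from_ipa("ma.tɛˈma.tɨ.ka"))
```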

Latin entries incorrectly containing M&A template

Some Latin entries contain the template {{R:M&A}} even though they are not actually in the phrasebook (e.g. Hesperus, occiduus, valentulus), and so are wrongly in Category:Latin words in Meissner and Auden's phrasebook. The template is able to detect this using Module:R:M&A and displays "[0 phrases]". Can the template be removed from these pages by a bot? Weylaway (talk) 19:27, 23 March 2024 (UTC)

I now have a full list of the words and there are only about 30, but I would appreciate it if I could be given AutoWikiBrowser permission so I can use JavaScript Wiki Browser to fix this and other things in the future. For instance, I would also like to fix pages that use the {{Q}} template but don't use the "thru" parameter to properly specify a range of lines. Weylaway (talk) 00:00, 25 March 2024 (UTC)
@Weylaway: Done. Benwing2 (talk) 19:53, 25 March 2024 (UTC)Reply
Thank you. Weylaway (talk) 21:24, 26 March 2024 (UTC)Reply

names template requires grammar fix

e.g. at Lída: {{given name|cs|female|dim=Lidmila|dim2=Ludmila}}, it says "a diminutive of the female given names Lidmila or Ludmila". It should be either "the name X or Y", or "the names X and Y". Equinox 05:15, 26 March 2024 (UTC)Reply

Honestly this sounds fine to me as written but we could change the conjunction in this case to "and" ("the name X or Y" sounds strange to me). However, the conjunction "or" is used in a lot of places, e.g. in the masculine and feminine equivalents (|m=, |f=) and it's not clear to me it could be switched in those cases to "and" without sounding strange. Another possibility is changing the text of diminutives to read more like "a female given name, diminutive of Lidmila or Ludmila", avoiding the singular/plural issue entirely. Benwing2 (talk) 02:01, 28 March 2024 (UTC)Reply
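One of the options raised above, choosing "name" or "names" by count and joining multiple names with "and", could be sketched as follows. This is an illustration of the agreement logic only, not the code behind {{given name}}, and the function names are made up.

```python
def join_names(names, conjunction="and"):
    """Join one or more names into a readable list."""
    if not names:
        raise ValueError("at least one name is required")
    if len(names) == 1:
        return names[0]
    if len(names) == 2:
        return f"{names[0]} {conjunction} {names[1]}"
    return ", ".join(names[:-1]) + f", {conjunction} {names[-1]}"

def diminutive_text(gender, names):
    """Pick singular or plural 'name' to agree with the number of names."""
    noun = "name" if len(names) == 1 else "names"
    return f"a diminutive of the {gender} given {noun} {join_names(names)}"

print(diminutive_text("female", ["Lidmila", "Ludmila"]))
print(diminutive_text("female", ["Lidmila"]))
```

The alternative wording suggested above ("a female given name, diminutive of X or Y") sidesteps the agreement question entirely and would not need the plural branch.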

Module:Quotations

Could this be changed so that it allows separators to be specified on a work level as well as an author level? Quotes from the Vulgate are currently displayed like "Genesis.1.1". Weylaway (talk) 21:26, 26 March 2024 (UTC)Reply
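A work-level override with an author-level fallback might be structured along these lines. Everything here is hypothetical: the layout of the configuration, the key names and the example separators are invented for illustration and do not reflect how Module:Quotations stores its data.

```python
def separators_for(config, author, work):
    """Prefer work-level separators, then author-level, then a plain dot."""
    author_cfg = config.get(author, {})
    work_cfg = author_cfg.get("works", {}).get(work, {})
    return work_cfg.get("separators") or author_cfg.get("separators") or ["."]

def join_with(parts, separators):
    """Join parts, using the i-th separator for the i-th gap (last one repeats)."""
    out = str(parts[0])
    for index, part in enumerate(parts[1:]):
        out += separators[min(index, len(separators) - 1)] + str(part)
    return out

# Hypothetical configuration: author-level default plus one work-level override.
author_only = {"Vulgate": {"separators": ["."]}}
with_override = {
    "Vulgate": {
        "separators": ["."],
        "works": {"Genesis": {"separators": [" ", ":"]}},
    }
}

parts = ["Genesis", 1, 1]
print(join_with(parts, separators_for(author_only, "Vulgate", "Genesis")))
print(join_with(parts, separators_for(with_override, "Vulgate", "Genesis")))
```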

Insertion of undocumented {{auto doc}}

Why has seemingly ineffective invocation of undocumented template {{auto doc}} been added to documentation page Module:RQ:pi:Sai Kam Mong/testcases/documentation? All it seems to achieve is the addition of the red text, "Unable to auto-generate documentation for this module page.", which is just confusing when displayed on Module:RQ:pi:Sai Kam Mong/testcases. I am minded to undo this addition. It was added on 9 March 2024, so it doesn't look like a temporary feature. --RichardW57m (talk) 10:41, 27 March 2024 (UTC)Reply

More techno-imperialism. Probably preparing for AI-generated entries to dispense with pesky manual contributors. DCDuring (talk) 12:54, 27 March 2024 (UTC)Reply
@RichardW57m Hi. Please ping me in the future when you see my bot has made a change you question, so I will make sure to see it. I forgot to document this template but it's used on pages where Module:documentation will autogenerate the documentation if no documentation is present; it explicitly requests Module:documentation to autogenerate the documentation. The idea is that if you want to put manual text on a doc page but you also want the autogenerated documentation, you use {{auto doc}} to explicitly request the latter. The way it's set up, it normally works when you view the module page itself but not when you directly view the documentation page. I should fix the message to make this clearer. I'm not sure why I put it on the page in question because I don't think there's any autogenerated module documentation available for that page; you can go ahead and take it out on that page. Benwing2 (talk) 18:33, 27 March 2024 (UTC)Reply
@Benwing2: I asked here because I thought this instance might be part of a general pattern. It seems that I need to add some categorisation. How in general should test case modules be categorised? As parent plus cat:Testcase modules and, for example, where applicable, cat:Pali testcase modules? Copying the parent categories just looks like clutter to me, but seems to be happening with automatic categorisation - but this could just be happening by oversight. --RichardW57 (talk) 08:10, 28 March 2024 (UTC)Reply
@RichardW57 You should probably use {{module cat}} (which should have good documentation) and follow the example of another test page. They do currently copy the parent categories but I'm not sure that is the best; I could be persuaded to change this to work in some other way. The advantage of using {{module cat}} is changes like this can easily be made in one place and propagate everywhere. Benwing2 (talk) 08:15, 28 March 2024 (UTC)Reply
@Benwing2: OK, I'll use that. The automation could be enhanced by recognising standard (TBC) prefixes such as 'RQ:pi:' as indicating language Pali and type 'Quotation and usage example' (which is a misnomer - 'and' should be replaced by 'or' to make the description true), though the yield may be fairly small. --RichardW57m (talk) 10:09, 28 March 2024 (UTC)Reply
@Benwing2: Done. --RichardW57 (talk) 11:36, 29 March 2024 (UTC)Reply

the template for adding Set-not-Topic categories is T:topic

I notice that e.g. CAT:en:Pinks advises (correctly) that it's a Set and not a Topic cat: "NOTE: This is a set category. It should contain terms for pinks, not merely terms related to [the topic of] pinks." I also notice the template which adds e.g. Mexican pink to it is T:topic. This seems like it could be confusing.
I wonder if we should change the main name of T:topics to something more indicative of its function, like "catlangcode" with a shortcut like "clc" mirroring catlangname and cln? (Could keep "topic"/"topics" as redirects too so as not to disrupt people who are used to them.) Or if we split the naming systems of topic vs set categories so that the scope of a category is discernible from its name and doesn't require users to click through to read the category description (or did we decide against that?), then maybe we'd just need a separate T:setcat (or something) at that point. - -sche (discuss) 14:44, 28 March 2024 (UTC)Reply

@-sche So the way I have handled this so far is to rename what formerly were "topic" categories to be "related-to" categories, preserving the name "topic" for the union of "related-to" and "set" categories. (Actually there are other types beyond just related-to and set categories; see Module:category tree/topic cat/data/documentation#Category types.) The term "related-to" is a bit awkward but I think it conveys pretty well what the purpose is, more than "topic" does. An alternative is to use a template like {{group}} or {{groups}} or {{groupcat}} or similar. As for splitting related-to and set categories, I don't think we decided against it but the discussion didn't come to a conclusion; there were some issues that we haven't yet resolved. Benwing2 (talk) 01:10, 30 March 2024 (UTC)Reply

Alphabeticisation of subcategories of Lithuanian terms suffixed with -mas

cat:Lithuanian terms suffixed with -mas has three subcategories, those suffixed by -imas, -umas and -ymas. These are sorted in English order, under alphabetic heads 'I', 'U' and 'Y'. Shouldn't they be sorted by Lithuanian alphabetic order, so in the order -imas, -ymas and -umas, and probably under initial letters 'I' and 'U'? (I don't think we can thoroughly 'Y' anyway as a header for Lithuanian ordering.) Pinging @Benwing2, Fay Freak. --RichardW57m (talk) 15:07, 28 March 2024 (UTC)Reply

Yeah, this is interesting, as in the superordinate category Category:Lithuanian terms by suffix y is sorted at the place customary for Lithuanian again. Fay Freak (talk) 15:34, 28 March 2024 (UTC)Reply
@Fay Freak: You mean by the Wiktionary sort order for Lithuanian. In the customary Lithuanian sort order, -yba comes before -izmas because 'b' comes before 'z' and 'y' only orders after 'i' as a tie-break. --RichardW57 (talk) 01:47, 29 March 2024 (UTC)Reply
@RichardW57 They are sorted that way because the parent categories are manually specified on the child category pages using raw Wikicode. I think if they used a template to do the categorization, things would work better; the sort order for Lithuanian in Module:languages/data/2 does indeed sort y with i. Pinging User:Theknightwho who may have thoughts about this. Benwing2 (talk) 21:47, 29 March 2024 (UTC)Reply
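The ordering described in this thread, where y sorts together with i and only ranks after it as a tie-break, can be expressed as a two-part sort key. The sketch below covers only the i/y relationship and ignores every other letter of the Lithuanian alphabet, so it is not a full collation and not the key that Module:languages/data/2 generates.

```python
def lithuanian_sort_key(term):
    """Primary key treats y as i; the original spelling breaks ties."""
    lowered = term.lower()
    return (lowered.replace("y", "i"), lowered)

suffixes = ["-umas", "-ymas", "-imas"]
ordered = sorted(suffixes, key=lithuanian_sort_key)
print(ordered)

print(sorted(["-izmas", "-yba"], key=lithuanian_sort_key))
```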

the label "color"

I was cleaning up entries which were in both the top-level "Colors" cat and the relevant subcat, e.g. Yale blue (already in "Blues"), and I notice it's {{lb|en|color}} that adds the redundant "Colors" category. IMO this label is not useful, it just tells you that "A dark azure colour" is a color, which the definition already tells you. (Should we label saleswoman {{lb|en|woman}}?) So I am inclined to remove the label altogether. (I removed it from a few entries following this, but upon realizing the scope of the issue, am coming here.) But if we don't remove the label, I am inclined to at least remove the categorization, because double-categorizing things into both parent and child categories is generally undesirable.
Thoughts? (It'd be good to ensure any entries the label is removed from are already in a subcat of "Colors" or, in rare cases where something is a non-visible color like octarine, the top-level "Colors" category.) - -sche (discuss) 19:13, 28 March 2024 (UTC)Reply

Yes, this doesn't seem to be a correct use of a label. — Sgconlaw (talk) 19:20, 28 March 2024 (UTC)Reply
Agreed. So long as it is clear from the definition that it is a color, the label adds nothing. It's worse than the "(anatomy) elbow" stuff you see sometimes, because at least that doesn't involve needless repetition of a word on a sense line. This, that and the other (talk) 21:40, 28 March 2024 (UTC)Reply
Agreed, feel free to correct. Benwing2 (talk) 21:49, 29 March 2024 (UTC)Reply
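For the cleanup discussed above, a list of entries that sit in both the top-level category and one of its subcategories could be produced with a simple set intersection. The sketch assumes the category memberships have already been fetched by some other means; the sample data below is made up apart from the entry names mentioned in this thread.

```python
def redundant_parent_members(parent_members, subcategories):
    """Entries that are in the parent category and in at least one subcategory."""
    in_children = set().union(*subcategories.values())
    return sorted(set(parent_members) & in_children)

# Illustrative memberships only.
colors = {"Yale blue", "octarine", "red"}
subcats = {
    "Blues": {"Yale blue", "navy"},
    "Reds": {"crimson"},
}

print(redundant_parent_members(colors, subcats))
```

Entries left only in the parent after such a pass, like octarine in this sample, are the ones that would need a manual look before the label is removed.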

Template:table:colors

I notice that Template:table:colors also categorizes any entries it's on into CAT:foo:Colors, e.g. lime green. I am inclined to change it to not apply that category, thus requiring the relevant subcategories to be applied manually (like "CAT:en:Greens" in the case of lime green). Alternatively, if the template is supposed to be used on all and only those pages which are "top-level" primary colors we feel are 'worthy' of double-categorization into both the relevant subcats and the top-level Colors cat, then many things like "lime green" and "mint green" are clearly not regarded as fundamentally different colors from "green" in English and would need to be removed from the table, no? Thoughts? - -sche (discuss)

Agreed on removing the top-level category. —Justin (koavf)TCM 22:21, 31 March 2024 (UTC)Reply
@-sche Also agreed. I think maybe theoretically this table is supposed to be used only on basic colors, but I checked the usage of the Spanish variant and it's also used for the equivalent of cream, fuchsia, cobalt blue and several other random colors. Benwing2 (talk) 22:32, 31 March 2024 (UTC)Reply
I'm definitely not a fan of this template. The colors weren't well chosen to begin with ("lime green?", "mint green?", "magenta"?), and different languages divide up the color space differently. For a language that uses the same word for blue and green, what color do you show? The idea of choosing a single hue to show what a given color name depicts is particularly bad for proto-languages and dead languages. I helped to get the Proto-Indo-European one deleted by pointing out that Latin flavus (yellow) and English blue are from the same PIE root, but there are no doubt others that deserve the same fate. Chuck Entz (talk) 23:37, 31 March 2024 (UTC)Reply
I think the template could be useful if set up and used correctly. I'm under the impression that different languages dividing the color space differently is intended to be handled by modifying the template for that language, the way Template:table:colors/egy does, and certainly, I think Template:table:colors/egy is useful. But this probably does make the top-level, language-nonspecific template a bad idea, because having it encourages people to just use its values. The way the template is currently set up, people's desire to have a table with no empty cells results in many things being separated as fundamentally different colors which should not be, which (I agree) severely reduces the value of the template. It's impossible to discern that the reason the Russian template separates light and dark blue is that those are regarded as different colors in Russian (like pink vs red in English), when the English template turns around and separates two "blue" fields with virtually the same colors, and separates out three green fields, even though the only fundamental colors are one "green" category with various shades and one "blue" category with various shades. Maybe I'll try and clean up the English table later to have its own values like the Egyptian table does. - -sche (discuss) 19:12, 1 April 2024 (UTC)Reply
I revamped the English table. It could use more swatches to illustrate more shades of brown, grey, etc, but I tried to remove things that weren't separate 'core' colours, or subsume them under the relevant core colour. But as Chuck says, a lot of other tables also need revising, and maybe the idea of having a base Template:table:colors is bad because people just use that, and include everything it includes, rather than creating a table based on actually-recognized colours... e.g. the /ja table seems to just have translated the base table... - -sche (discuss) 01:28, 2 April 2024 (UTC)Reply
@-sche Thanks a million, I actually raised a very similar concern at the tea room in January. A Westman talk stalk

Phrasal verbs in Welsh and Irish

English phrasal verbs are subcategorised by the particle that the verb occurs with, e.g. Category:English phrasal verbs with particle (aback). Phrasal verbs are extremely common in Welsh, so it would be a good idea to have similar subcategories such as Category:Welsh phrasal verbs with particle (allan) - but many phrasal verbs are actually formed with multiword "particles", such as i fyny and i lawr.

I can see that Irish does indeed have subcategories like Category:Irish phrasal verbs with particle (faoi iamh) but it doesn't seem quite right to call these "particles". Any suggestions on a better term to use? Arafsymudwr (talk) 01:27, 29 March 2024 (UTC)Reply

@Arafsymudwr Ideally any such term would apply cross-linguistically, because currently these categories are all handled at the cross-linguistic level. Benwing2 (talk) 21:50, 29 March 2024 (UTC)Reply
@Benwing2 I'd be tempted to suggest simply saying English phrasal verbs formed with aback or Welsh phrasal verbs formed with i fyny in that case. But I realise that must be a lot of editing work, which is why I was hoping for a term that might cover multiword "particles" so the existing categories other than a few Irish ones can be left alone. Arafsymudwr (talk) 00:25, 30 March 2024 (UTC)Reply
@Arafsymudwr It's not actually so hard to make such a change as it can be done by bot; I've done similar category renames before. Benwing2 (talk) 00:39, 30 March 2024 (UTC)Reply
I actually think this suggestion made by @Arafsymudwr is a good one. Anyone else want to weigh in? Benwing2 (talk) 01:11, 30 March 2024 (UTC)Reply
  Support from me. This, that and the other (talk) 09:49, 30 March 2024 (UTC)Reply
  Support from me. Arafsymudwr (talk) 10:25, 30 March 2024 (UTC)Reply

merging lect info

Pinging a few people who might be interested: @Theknightwho, -sche, Surjection, Vininn126 It occurs to me we have info on different language lects/varieties in a whole shitload of places:

  1. Label data, e.g. for English: Module:labels/data/lang/en; for Chinese: Module:labels/data/lang/zh;
  2. {{alt}} "dialect" data, e.g. for English: Module:en:Dialects; for Chinese: Module:zh:Dialects (not currently defined);
  3. Module:etymology languages/data;
  4. the "varieties" and "aliases" fields in language data, e.g. Module:languages/data/2/extra;
  5. dialect synonyms data, e.g. for Chinese: Module:dialect synonyms/zh;
  6. category pages for individual lects, e.g. Category:Polari and Category:Jiaoliao Mandarin, which have parameters specified to {{auto cat|dialect=1}}.

This scattering and duplication of info is a real problem because inevitably the different sources get out of sync. I have been thinking of how to merge some of this data. My thoughts:

  1. I already proposed eliminating #2 (the {{alt}} "dialect" data) in WT:Grease pit/2024/March#removing cruft from Module:labels/data/regional, and I have written the code to do this, so that the dialect data modules can read and convert label data modules.
  2. I just added support to Module:category tree/poscatboiler/data/language varieties (which implements #6, the language variety category pages), so that Wikipedia and Wikidata information can be pulled out of label data modules automatically, and the support was already there to automatically pull this info out of Module:etymology languages/data when present. An example of this in action is Category:Jiaoliao Mandarin, where the links to the English and Chinese Wikipedia articles in the upper right-hand corner come from the Wikidata item listed for label Jiaoliao Mandarin in Module:labels/data/lang/zh.
  3. I am thinking of further moving info currently specified on individual category pages into the label data modules, perhaps into "extra data" modules similar to Module:languages/data/2/extra (e.g. Module:labels/data/lang/zh/extra) so they don't bloat the label data modules themselves.

I am soliciting thoughts for how to centralize lect information. Either we can continue augmenting the label data information, as I've been doing, or we can create a separate set of language-variety modules that contain all the info needed for the various applications mentioned above. Benwing2 (talk) 03:03, 30 March 2024 (UTC)Reply

I agree that most of the same labels need to be used in different places and that scattering them is problematic. As I am not a programmer, I am unsure which option would be best. I have an inkling that having a separate language module might be preferred by some. Vininn126 (talk) 07:41, 30 March 2024 (UTC)Reply
I agree with trying to consolidate as much of this as possible. I don't think I even knew/remembered Module:en:Dialects even existed (and now that I do, I'm unsure why it does exist as something different from the others). I'm unsure whether Module:labels/data is the best place for it to get consolidated to, though.
On one hand, I understand that because many (most?) of these will occur as {{label}}s, putting them in Module:labels avoids that module (and its human users) having to look somewhere else to process some labels. OTOH, (A) the label module seems like a less expected place to look for "language/lect data as such", compared to a language/lect module, particularly for any lect data that isn't used in a {{label}}, because (B) putting something in Module:labels/data strongly suggests that we think it is (or will/should be) used in {{label}}s, but AFAIK at least a few "etymology-only languages" really are "etymology-only" (the substrate codes for sure; are there others?), so does it make sense to have some things which aren't used in labels, or where we've added them to the module without regard for whether / intention that they be used in labels, be in the labels-data module?
But I understand that if we decide to consolidate things to a separate Module:subsumed language varieties (or whatever name) instead, the question is then, is that confusing to users, for certain labels to be in Module:labels while lect-y ones are in another module? So I'm unsure what's best.
BTW, it occurs to me that e.g. "en-CA" "Canadian English" and "fr-CA" "Canadian French" exist as etymology languages that can be used in etymologies, but you can also deploy either of them (or at least, their categories) as {{label}}s via {{lb|en|Canada}} and {{lb|fr|Canada}}, so if we centralize lect info to Module:labels, I guess it needs to be able to account for "Canada" being a lang=fr-specific alias of (or at least, adder of the category of) "Canadian French" while also being a lang=en-specific alias of "Canadian English"...? - -sche (discuss) 14:04, 2 April 2024 (UTC)Reply
@-sche Thanks for your thoughts. Yeah there are indeed some issues with centralizing into the labels modules, as you point out, although there are also issues with not doing this (as you also point out). So far what I've been doing is putting descriptions and parent label info in Module:labels/data/lang/zh and pulling it out in Module:category tree/poscatboiler/data/language varieties (which implements language variety categories such as Category:Wuhan Mandarin). This avoids the need to put this information in the call to {{auto cat}} itself (although this can still be done). Note also that the way the above module distinguishes "lect" labels from "non-lect" labels is by the presence of the parent field; I thought of introducing a specific nolect field to indicate non-lect labels but it seems unnecessary if all lects have a parent field (which is set to true for top-level lects). The basic problem is that many of the different data modules are used for slightly different purposes, so it's difficult to merge them all. The issue with the Canada label in particular is that it's in Module:labels/data/regional and is used by several languages; this can potentially be solved e.g. by moving such labels into the language-specific modules if they have language-specific info attached. Benwing2 (talk) 21:20, 2 April 2024 (UTC)Reply
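The lect/non-lect distinction described above (a label counts as a lect if it has a parent field, with parent set to true for top-level lects) can be sketched roughly as follows. This is only an illustrative Python model, not the Lua in Module:labels/data/lang/zh or the poscatboiler module: the table contents, the "humorous" entry and the function names are all made up for the example.

```python
# Hypothetical, cut-down label data. The real data lives in Lua tables
# with many more fields; only the "parent" field matters for this sketch.
LABELS_ZH = {
    "Mandarin": {"parent": True},                 # top-level lect
    "Jiaoliao Mandarin": {"parent": "Mandarin"},  # lect with a parent lect
    "humorous": {},                               # no parent field: not a lect
}


def is_lect(labels, name):
    """A label is treated as a lect if and only if it has a parent field."""
    return "parent" in labels[name]


def lect_ancestry(labels, name):
    """Return [name, parent, grandparent, ...] up to the top-level lect."""
    chain = [name]
    parent = labels[name]["parent"]
    while parent is not True:  # True marks a top-level lect
        chain.append(parent)
        parent = labels[parent]["parent"]
    return chain


print(is_lect(LABELS_ZH, "humorous"))
print(lect_ancestry(LABELS_ZH, "Jiaoliao Mandarin"))
```

With this convention no separate nolect flag is needed, which is the design choice mentioned above; the cost is that every lect entry must carry a parent field.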

Request for list of pages by time usage

Could someone with the technical knowledge make or teach me how to make a list of the, say, top 1000 pages by their "CPU time usage" or "Real time usage" or "Lua time usage"? Some pages hop in and out of Cat:E sometimes and I think such a list would be helpful to see which pages are in the "gray area". --kc_kennylau (talk) 11:51, 30 March 2024 (UTC)Reply

@Kc kennylau This is a very good question and I honestly don't know how to do it. It would be great if MediaWiki exported a page showing this but AFAIK they don't. In order to do this, then, you'd first have to figure out how to get the usage stats on a given page, then run this on the pages most likely to be taking up lots of time (which would probably be some combination of pages with lots of template calls and pages that have a lot of Wikitext). To get the usage stats, ideally there would be an API exposed by MediaWiki to get the usage stats but I looked and I can't find one; the alternative is to scrape the page when previewing but that would be rather painful to write, I think. You might want to search Phabricator and/or contact a MediaWiki developer like Tim Starling for this. Benwing2 (talk) 20:46, 30 March 2024 (UTC)Reply
@Benwing2, Kc kennylau: it's easier than you think. There's a parser profile report embedded as a comment in the HTML source. I use it all the time. I don't know if it's the same for bots or AWB, but it wouldn't take that long to just "view source" in your browser and save it for automated extraction later. Chuck Entz (talk) 21:40, 30 March 2024 (UTC)Reply
@Chuck Entz Right, that's the option I mentioned of scraping the page when previewing. Maybe not so hard to write but not ideal. Benwing2 (talk) 21:52, 30 March 2024 (UTC)Reply
@Benwing2 I'm not sure what you mean by "previewing". If you mean clicking "Edit", then "Preview", that's not necessary. All it takes is going to the page (not viewing the diffs, but just going to the page).
For instance, if I click on the link for a, right-click on the page, then select "View Page Source" from the menu, then page up from the bottom a bunch of times, I see:

<!-- NewPP limit report
Parsed by mw‐web.eqiad.main‐78d6c98b98‐tjckl
Cached time: 20240330220709
Cache expiry: 2592000
Reduced expiry: false
Complications: [vary‐revision‐sha1, show‐toc]
CPU time usage: 14.922 seconds
Real time usage: 17.072 seconds
Preprocessor visited node count: 144535/1000000
Post‐expand include size: 1873158/2097152 bytes
Template argument size: 163519/2097152 bytes
Highest expansion depth: 25/100
Expensive parser function count: 72/500
Unstrip recursion depth: 0/20
Unstrip post‐expand size: 36525/5000000 bytes
Lua time usage: 9.702/10.000 seconds
Lua memory usage: 72225617/104857600 bytes
Lua Profile:
recursiveClone <mwInit.lua:41> 1200 ms 12.8%
 ? 920 ms 9.9%
MediaWiki\Extension\Scribunto\Engines\LuaSandbox\LuaSandboxCallback::gsub 700 ms 7.5%
pcall 640 ms 6.9%
MediaWiki\Extension\Scribunto\Engines\LuaSandbox\LuaSandboxCallback::match 380 ms 4.1%
MediaWiki\Extension\Scribunto\Engines\LuaSandbox\LuaSandboxCallback::getAllExpandedArguments 340 ms 3.6%
MediaWiki\Extension\Scribunto\Engines\LuaSandbox\LuaSandboxCallback::redirectTarget 300 ms 3.2%
MediaWiki\Extension\Scribunto\Engines\LuaSandbox\LuaSandboxCallback::toNFD 280 ms 3.0%
<mw.title.lua:50> 280 ms 3.0%
(for generator) 260 ms 2.8%
[others] 4040 ms 43.3%
Number of Wikibase entities loaded: 0/400
-->

<!-- Transclusion expansion time report (%,ms,calls,template)
100.00% 14458.411 1 -total
8.85% 1279.244 400 Template:head
8.37% 1210.145 184 Template:inh
7.39% 1068.371 334 Template:no_deprecated_lang_param_usage
6.55% 946.334 536 Template:l-self
6.01% 868.748 24 Template:audio
5.84% 844.661 27 Template:catlangname
5.74% 829.833 69 Template:cite-book
4.92% 711.522 369 Template:l
4.86% 702.181 87 Template:cite-meta
-->

<!-- Saved in parser cache with key enwiktionary:pcache:idhash:106923-0!dateformat=mdy and timestamp 20240330220709 and revision id 78696257. Rendering was triggered because: page-view
-->

I escaped the comments and added line breaks to make it readable, but otherwise, that's it. If you know how to extract everything between "Lua time usage: " and "/10.000 seconds" you can get the processor time completely painlessly. Chuck Entz (talk) 22:22, 30 March 2024 (UTC)Reply
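A minimal sketch of that extraction in Python, assuming the HTML source of each page has already been saved locally. The function names lua_time_usage and rank_by_lua_time are invented for illustration, and the sample text is cut down from the report quoted above.

```python
import re

# Matches the "Lua time usage: used/limit seconds" line of the NewPP limit report.
LUA_TIME = re.compile(r"Lua time usage: ([0-9.]+)/([0-9.]+) seconds")


def lua_time_usage(html):
    """Return (used, limit) in seconds, or None if the report line is absent."""
    match = LUA_TIME.search(html)
    if match is None:
        return None
    return float(match.group(1)), float(match.group(2))


def rank_by_lua_time(pages):
    """pages maps page title -> saved HTML source; slowest pages come first."""
    timed = []
    for title, html in pages.items():
        usage = lua_time_usage(html)
        if usage is not None:  # pages without a Lua report line are skipped
            timed.append((title, usage[0]))
    return sorted(timed, key=lambda item: item[1], reverse=True)


sample = """<!-- NewPP limit report
CPU time usage: 14.922 seconds
Real time usage: 17.072 seconds
Lua time usage: 9.702/10.000 seconds
-->"""

print(lua_time_usage(sample))
```

The same pattern, with a different label, would pick up the "CPU time usage" and "Real time usage" lines, which have no limit after a slash.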
@Chuck Entz: Thanks for the insight. However, generating the source code still seems to be an expensive operation. Does en.wikt save the data somewhere in the "page" as returned by a pagegenerator? --kc_kennylau (talk) 22:35, 30 March 2024 (UTC)Reply
@Kc kennylau I don't think so. As I said above, maybe there's an API you can use to request this info, but if so I don't know it. I would suggest searching Phabricator [4], and if you can't find anything, opening a ticket about how to do this. Benwing2 (talk) 22:39, 30 March 2024 (UTC)Reply
@Kc kennylau: I doubt it, since it's different for every page load. I don't think there's any automated way to do this, so it would require visiting each page and working with the generated source. It's very easy compared to other things you might do without a bot, but there's no comparison to anything done by bot. The only consolation is that there aren't that many candidates and it could be optimized to less than a minute per page. Look at User:Chuck Entz/Memory and subpages to see what I pieced together using such crude techniques when I first started trying to figure out what was going on with all the Lua errors that were popping up at the time. I used English frequency lists as a crude way of narrowing things down. If I were doing it now, I would start with single-character entries for Latin and Han scripts, and short (c)V(c) sequences with uncommon letters like "q", "x" and "z" excluded. You could alternatively use the number of templates or the number of L2 sections in the wikitext per page as tests that could be determined from the dumps, just to narrow things down. Chuck Entz (talk) 00:12, 31 March 2024 (UTC)Reply
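The last suggestion, counting templates and L2 sections in the wikitext from a dump, could be sketched like this in Python. It is only a crude filter under stated assumptions: every "{{" is counted as one template call (so parser functions and magic words are counted too), and an L2 heading is taken to be a line of the form ==Name==. The function name crude_weight is made up.

```python
import re

# A level-2 heading: exactly two "=" on each side, alone on its line.
L2_HEADING = re.compile(r"^==[^=\n]+==[ \t]*$", re.MULTILINE)


def crude_weight(wikitext):
    """Return rough size indicators for one page's wikitext."""
    return {
        "templates": wikitext.count("{{"),  # every "{{" counted as one call
        "l2_sections": len(L2_HEADING.findall(wikitext)),
    }


sample = """==English==
===Noun===
{{en-noun}}
# {{lb|en|informal}} A thing.

==French==
===Noun===
{{fr-noun|m}}
"""

print(crude_weight(sample))
```

Pages scoring high on either count would then be the ones worth opening to read the limit report.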
(@Chuck Entz there was no "reply" button for this because you used ~~~ instead of ~~~~) This method would miss what I was trying to look for in the first place. The Coptic inflection tables had a lot of links and took a lot of parsing time (which I have since optimised). I am trying to find pages in similar situations. --kc_kennylau (talk) 23:52, 30 March 2024 (UTC)Reply
Oops. I manually added the date and time to fix that. As for other cases like Coptic: Coptic was exceptional. All the pages I've seen in CAT:E with timeouts have been due to large numbers of templates or due to bugs that caused things like partial recursion. That's not to say that there aren't potential cases that just haven't gotten bad enough yet. I'm not discouraging you from pursuing other options like those offered above- I just wanted to add a low-tech option in case high-tech ones aren't available. I only elaborated because I wasn't sure if everyone understood what I was talking about. Chuck Entz (talk) 00:51, 31 March 2024 (UTC)Reply

aggressive GC patch is rolled out

It appears that the more aggressive garbage collection patch [5] is rolled out. I notice that the memory on a has reduced to 71MB; I'm not sure if it's related but quite possibly. Benwing2 (talk) 02:06, 31 March 2024 (UTC)Reply

fixing Reply gadget on the Grease pit

@This, that and the other Whatever you did for the Beer Parlour works in that I can reply to posts directly from WT:BP. However, I can't do that in the Grease Pit. Could you apply the same change here as well? Benwing2 (talk) 05:18, 31 March 2024 (UTC)Reply

@Benwing2 I'm leaving this reply from WT:GP itself using the Reply tool, without having changed anything. However, I've also noticed that it doesn't work all the time. There seems to be some kind of intermittent fault, whether on our end (depending on something within the wikitext of the discussions?) or on the server end. Not really sure where to start tbh. This, that and the other (talk) 08:29, 31 March 2024 (UTC)Reply
Having reloaded the page a few times, I can now only use the reply tool on discussions in the top half of the page (February 2024) and not the bottom half (March 2024). This, that and the other (talk) 08:33, 31 March 2024 (UTC)Reply

Postal Romanization in derivation categories

Right now, Category:Tagalog terms derived from Postal Romanization is in CAT:E because {{auto cat}} doesn't know what to do with "Postal Romanization". Category:English terms derived from Postal Romanization exists, but it doesn't use {{auto cat}}. We do have romanizations such as "Wade-Giles" in Module:etymology languages/data, but I'm not sure which language code to attach it to. My understanding is that the Postal Romanization may not be strictly or solely Mandarin, being based on a Nanjing dialect- I'm not sure which language. Pinging (Notifying Atitarev, Benwing2, Fish bowl, Frigoris, Justinrleung, kc_kennylau, Mar vin kaiser, Michael Ly, ND381, RcAlex36, The dog2, Theknightwho, Tooironic, Wpi, 沈澄心, 恨国党非蠢即坏): for input from people who would know. Chuck Entz (talk) 15:27, 31 March 2024 (UTC)Reply

Apart from "Amoy" being from Hokkien, the names seem to be from "southern (Nanjing) Mandarin" as the Wikipedia article Chinese postal romanization suggests. Is there no option to simply attach it to "Chinese"? --kc_kennylau (talk) 15:37, 31 March 2024 (UTC)Reply
@Chuck Entz Postal Romanization is not a recognized etymology language; we could add it but I'm not sure it's really needed. Benwing2 (talk) 20:31, 31 March 2024 (UTC)Reply
@Benwing2 look at Category:English terms derived from Postal Romanization. Most of the Wade-Giles spellings for place names like Pep'ing, Ssu-ch'uan and Nan-ching are completely obsolete, but Postal Romanization ones like Peking, Szechuan, and Nanking are still recognizable, even if the Pinyin spellings like Beijing, Sichuan and Nanjing are currently prescribed. Chuck Entz (talk) 00:08, 1 April 2024 (UTC)Reply
@Chuck Entz OK, I added code zh-postal for this and cleaned up the pages that referred to it. Benwing2 (talk) 01:57, 1 April 2024 (UTC)Reply

Invalid params in call to Template:lv-decl-noun-1

Whilst editing the Latvian entry autors, I noticed the following error in the declension template:

Invalid params in call to Template:lv-decl-noun-1: 6={{{6}}}; 7={{{7}}}; 3=1st; drop-v=; 5={{{5}}}.

This error appears to apply to all instances of the template which I've checked, where the general format is {{lv-decl-noun|autor|s|1st|extrawidth=-60}}. I've read the documentation for Template:lv-decl-noun, Template:lv-decl-noun-1, and have tried removing the extrawidth parameter, but nothing stood out to me and I'm assuming that it's a fundamental issue with the template design. From what I can see, none of the original contributors to the templates appear to be still active, so I'm wondering if there's anyone in the general audience who would be able to look into this? Helrasincke (talk) 18:19, 31 March 2024 (UTC)Reply

This is quite a problematic situation, as the general format in which the template is called is completely incompatible with the parameters that are actually used by the template, and there is no template documentation to tell us the intended way of calling the template... --kc_kennylau (talk) 19:09, 31 March 2024 (UTC)Reply
Edit: the documentation is located at {{lv-decl-noun}}, and searching through the old versions, I still have completely no idea why |extrawidth=-60 got there in the first place. I can't find it in the old versions at all. --kc_kennylau (talk) 19:20, 31 March 2024 (UTC)Reply
(I have struck through my previous comments as I have investigated further.) @Helrasincke: Basically {{lv-decl-noun}} is a central hub that calls other inflection table templates depending on the 3rd parameter, and in this case it calls {{lv-decl-noun-1}}, and it also passes on parameters to the sub-templates. However, certain parameters are used only for other declensions, so the parameter checker puts a warning in the preview page (but not the actual page) that there are unused parameters. The extrawidth parameter is used to adjust the width of the table. Since the warning only appears in the preview, I suppose one can just ignore it. --kc_kennylau (talk) 19:33, 31 March 2024 (UTC)Reply
@Kc kennylau: I think it's less stressful, and possibly less error prone, if similar templates have the same set of parameters, even though some of them may not be used in some cases. The downside is that {{#invoke:checkparams|warn}} then has to be told that the redundant parameters are allowed. If the extra parameter is one that is easy for a human to generate, it makes sense to allow it. If a parameter is one that would frequently be omitted, like the 'alt' parameter in {{link}}, then it does make sense to warn if it is supplied with a non-blank value. @JeffDoozan and I have already had this discussion (Module talk:checkparams#Gaps in Positional Parameters) about Lithuanian inflection templates, where the general pattern is {{name|stem-with-no-accent|stem-with-accent|...}}, which is easier to commit to memory even if the first parameter is then unused for accentuation pattern 1. Greek has a similar template pattern for recessive accents. --RichardW57m (talk) 14:00, 2 April 2024 (UTC)Reply
This warning was showing up because I added parameter checking to some of the templates called by {{lv-decl-noun}}. I fixed a bug on {{lv-decl-noun}} and adjusted the list of allowed parameters on each of the templates called by {{lv-decl-noun}} so they will no longer display a warning. JeffDoozan (talk) 00:02, 1 April 2024 (UTC)Reply
@JeffDoozan: I believe we should really be checking that the extra parameters (to the template ..-noun-1, such as {{{5}}}) are empty, and that there are no "real" extra parameters (such as, say, {{{9}}}). Unfortunately the current module doesn't allow for this, but I think I can still do this "manually". --kc_kennylau (talk) 10:21, 1 April 2024 (UTC)Reply
Actually, it seems that the module does not count empty number parameters. I have changed the main template to not pass {{{3}}} (the declension type) to the sub-templates. I have also used a bit of a hacky method to ensure in the sub-templates that the other named parameters are empty. In the long run we would preferably convert the templates to Lua. --kc_kennylau (talk) 10:55, 1 April 2024 (UTC)Reply
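For readers following along, the behaviour being discussed can be modelled roughly as below. This is a hypothetical Python sketch, not the code of Module:checkparams: the function name, the allowed set and the argument values are assumptions made for illustration, based only on the observations in this thread that unlisted parameters are reported and that empty numbered parameters are not counted.

```python
def unexpected_params(args, allowed):
    """Return the names of passed parameters that are not in the allowed set.

    Assumption taken from the discussion above: numbered parameters that
    are passed but empty are ignored rather than reported.
    """
    bad = []
    for name, value in args.items():
        if name in allowed:
            continue
        if name.isdigit() and value == "":
            continue  # empty numbered parameter: not counted
        bad.append(name)
    return bad


# Hypothetical call: the hub template forwards everything to a sub-template
# that (in this example) only declares 1, 2 and extrawidth.
args = {"1": "autor", "2": "s", "3": "1st", "5": "", "extrawidth": "-60"}
allowed = {"1", "2", "extrawidth"}

print(unexpected_params(args, allowed))
```

In this model the only complaint is about parameter 3, which is why no longer passing the declension type to the sub-templates removes the warning.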
@Kc kennylau, JeffDoozan, Helrasincke: I've just found a way to easily allow 2 currently unused parameters when another 27 are actually used. Just use the two without effect, e.g. in the condition of a #if test that does nothing either way. There are a lot of Sanskrit declension templates that used to use the 3rd and 4th positional parameters for transliteration when inflected forms were wrapped in {{lang}}. The forms are now wrapped in {{l}} or similar, so those parameters became redundant, but they are mentioned all over the place, including in templates for adjectives that build on templates for nouns. I think the proper long term way forward is to replace these templates by less specific ones, but even that's a lot of effort for mostly little gain. --RichardW57m (talk) 15:43, 3 April 2024 (UTC)Reply
@RichardW57m: The problem is that we want to ensure that the parameters that are not used are actually empty. --kc_kennylau (talk) 15:53, 3 April 2024 (UTC)Reply
@kc_kennylau: Why? Let sleeping dogs lie. --RichardW57m (talk) 16:14, 3 April 2024 (UTC)Reply
Isn't that the whole point? It's a preview warning and it gets added to a hidden category. --kc_kennylau (talk) 17:02, 3 April 2024 (UTC)Reply
@kc_kennylau: Then I'd better raise an RfD on the module. I thought the main purpose was to catch typos and mistaken names in calls; it also catches attempts to use, for example, the {{|cat2}} parameter of {{head}} in language-specific headword templates that happen not to support it. I've used it to fix about half a dozen uses of the wrong name such as 1 for tr, g for 1, head v. entry (in dictionary references), and some of the latter type I've mentally noted as requests for enhancement. --RichardW57m (talk) 17:33, 3 April 2024 (UTC)Reply
@RichardW57m If the module is raising warnings about unused parameters, and you think those parameters should be allowed, then the obvious solution is to add support for those parameters. Deleting the module is like throwing out an alarm instead of doing something about whatever caused it to go off. Theknightwho (talk) 22:35, 3 April 2024 (UTC)Reply
@Theknightwho: Or using blinkers to stop a horse panicking. Or leaving shrapnel in a wound rather than run severe risks in removing it. In some cases, the solution is to substitute better values invocation by invocation and enable their use - but getting better values takes significant effort. In another case, the better solution is actually to replace the templates by probably no worse templates that already exist - but that is not a higher priority, and I would check in each case that the new templates don't introduce new errors - they have generated erroneous outputs in several instances recently, and they don't have any testcases. The only issue caused by the unused parameters is slightly larger and presumably slower code. --RichardW57m (talk) 15:04, 4 April 2024 (UTC)Reply
Incidentally, I've not been deleting the module's invocation; I've been telling it what doesn't matter. However, I do remember you saying that it's not the sort of thing we should be using! --RichardW57m (talk) 15:04, 4 April 2024 (UTC)Reply
@RichardW57m I specifically said that because it's less efficient than rewriting the template in Lua, because it means the template's wikitext has to be parsed. It's certainly doable, but it would be better if we didn't have to. Theknightwho (talk) 15:07, 4 April 2024 (UTC)Reply
@Theknightwho: This was for a dumber version of the concept, which had to be told all the allowable parameters, just like call_quote_template in Module:quote. No automated template parsing was required. I think you were worried about the potential for parallel processing and all the Lua troubles that's been giving us over the years. --RichardW57m (talk) 15:19, 4 April 2024 (UTC)Reply
You are aware, are you not, that chopping and changing positional parameters is a recipe for disaster? --RichardW57m (talk) 15:04, 4 April 2024 (UTC)Reply

April 2024

Magic words appearing in WhatLinksHere

I noticed that WT:Todo/Lists/Entries using nonexistent templates had suddenly filled up with spurious transclusions of MediaWiki-implemented magic words like {{!}} and {{PAGENAME}}. This can also be seen in WhatLinksHere: [6]. These templates don't exist and are known by our Lua code to be magic words (well, at least {{temp}} itself treats them specially), so there should be no reason to attempt to transclude them.

Something has changed in the last week. It's only happening on this wiki, so it's coming from our Lua modules rather than MediaWiki itself. I'm pinging @Theknightwho as a starting point. This, that and the other (talk) 01:26, 1 April 2024 (UTC)Reply

@This, that and the other. I noticed this last week: if you look at the March 24 revision you'll see there are already some of those, so it had to have happened before then- not as many, so probably not long before. Chuck Entz (talk) 02:38, 1 April 2024 (UTC)Reply
After doing some spot-checking it appears that all of these have the magic words (as well as things like "!" and "=") wrapped in {{ }}. These were inserted as part or all of parameters in templates, in interwikis, and in categories. The use of {{PAGENAME}} in filenames makes me nervous, since they'll have to be fixed if the page is moved (those should have been subst:ed).
Come to think of it, either subst:ing all the {{PAGENAME}}s or replacing them with the pagenames themselves looks like a perfect job for a bot. Chuck Entz (talk) 03:47, 1 April 2024 (UTC)Reply
@Chuck Entz Yeah, some people systematically insert {{PAGENAME}} into Wikitext. I think it's a bad idea. Benwing2 (talk) 05:03, 1 April 2024 (UTC)Reply
@This, that and the other @Chuck Entz @Benwing2 This was down to an older version of the template parser which didn’t handle parser variables (i.e. magic words which don’t take any parameters), so it was still grabbing the title object. This was fixed about a week ago, but clearly hasn’t propagated through everywhere yet. Some parser variables can also act like magic words (e.g. {{PAGENAME}} vs {{PAGENAME:title}}), but many can’t (e.g. {{!}} and {{=}} will default to templates if you try), and some of them are case-sensitive while others aren’t, so I had to make sure it knew how to handle all the various possible inputs. As a side point, it is actually possible to use templates with those names by using (e.g.) {{msg:PAGENAME}}, which {{temp}} is also aware of, and is on my to-do list for the template parser. Theknightwho (talk) 15:15, 1 April 2024 (UTC)Reply

Der3, Rel3, Col3

Please replace derx, relx in all Philippine languages (especially Tagalog) to colx templates. Thank you. Ysrael214 (talk) 02:36, 1 April 2024 (UTC)Reply

Welsh word 'hambon'

I'm trying to add the Welsh word 'hambon'. I've given various sources for the word used in context, but I'm given a

"This action has been automatically identified as harmful, and therefore disallowed. If you believe your action was constructive, please start a new Grease pit discussion and describe what you were trying to do. A brief description of the abuse rule which your action matched is: various specific spammer habits"

Could someone help resolve this? Wemblydumblediddle (talk) 08:58, 2 April 2024 (UTC)Reply

Your entry does not follow the correct formatting. Please see WT:EL and existing Welsh noun entries for examples. The "Cultural Significance" should perhaps at best be a "Usage notes" section. The most likely reason for the filter though is the usage of external links to e.g. YouTube. — SURJECTION / T / C / L / 09:03, 2 April 2024 (UTC)Reply
Thanks for getting back so quickly, I'm new to this. I can't think of a way around this, there isn't much in writing about the word. The only decent written source I could find is that Guardian article. Is there a way around the automatic filtering? Does someone have the authority to verify the entry, modifying it if necessary? Wemblydumblediddle (talk) 09:12, 2 April 2024 (UTC)Reply
Try publishing the entry again, but without the YouTube links (and ideally also with formatting changes you can gather from the two links I posted). — SURJECTION / T / C / L / 09:48, 2 April 2024 (UTC)Reply
Thanks, that worked. It's a shame I can't include the video, though. That Hansh video is probably the best example of the word in use; it features hambons explaining what it means to be a hambon, and it's very entertaining for West Wales Welsh speakers. Wemblydumblediddle (talk) 10:41, 2 April 2024 (UTC)Reply
Wiktionary doesn't seem to block youtube links per se, because I successfully added a youtube link in the third quotation of this revision via Template:quote-av in order to confirm pronunciation and the stressed syllable. Does this go against the WT:CFI#Durably_archived policy? --Ssvb (talk) 20:15, 2 April 2024 (UTC)Reply
"Durably archived" is only a policy requirement when it comes to proving the existence of the word (as a WT:RFV/WT:ATTEST question); it's OK to add links to (ideally reliable or at least representative and inoffensive) youtube videos to show pronunciation, and we regularly provide References or Further reading links to various reliable online dictionaries. Brand-new users are currently prevented from doing so, because most such users are spammers, but we do get feedback like this maybe once a year(?) from legitimate users whom the filter has stopped... it's a question of whether the (large) amount of spam that gets stopped is worth the (small) amount of valid edits which get stopped. - -sche (discuss) 05:13, 3 April 2024 (UTC)Reply
@Ssvb: Generally, abuse filters are much stricter on new accounts: vandals, spammers and self-promoters almost always get blocked long before they stop being new. As for YouTube: it shouldn't be used to meet WT:CFI, but it can be used occasionally for other purposes. In general we try to avoid linking to anything commercial or promotional, so it's best to be as judicious and selective as possible. Chuck Entz (talk) 05:17, 3 April 2024 (UTC)Reply
@Chuck Entz: Thanks, that's good to know. The documentation of the quote-av template says "Do not link to any webpage that has content in breach of copyright" and this is very useful, but other than this, the information is pretty scarce and maybe it could be improved? I think that the new contributors would appreciate that.
In my quotation I provided a link to a fragment of a news report published by a news agency on their own official youtube channel, so it should be okay from the copyright standpoint. As for the "avoid linking to anything commercial or promotional" guideline, I'm afraid that even a quotation from a book of a modern author may be potentially twisted as a commercial promotion of that particular author. I guess, "don't quote the same legit source too often and don't quote any shady sources at all" could be a good plan, though the distinction between legit and shady sources may be subjective in some cases. --Ssvb (talk) 07:23, 3 April 2024 (UTC)Reply
@Ssvb These are guidelines, and you should use your common sense when it comes to things like "avoid linking to anything commercial or promotional". Copyright infringement could lead to legal consequences for Wikimedia concerning Wiktionary, which is why it says "Do not" rather than "Avoid". Benwing2 (talk) 07:57, 3 April 2024 (UTC)Reply

alternative forms respect labels

I've fixed {{alt}} so if the tags specified after || can't be found in the "dialect data", they are looked up as labels. This respects omit_preComma and similar flags, so you can say something like

  • {{alt|en|Shi-jia-zhuang||also from|_|Pinyin|rare}}

and it correctly displays as

Here, the tag rare is a recognized label so it automatically links to the glossary; Pinyin is a label that normalizes to Hanyu Pinyin and links to Wikipedia; and the underscore prevents a comma from appearing. Benwing2 (talk) 03:36, 3 April 2024 (UTC)Reply

Ethiopic Letter Kurk

I cannot add the Ethiopic Letter Kurk. 2A09:BAC3:378F:D2:0:0:15:1B5 06:37, 3 April 2024 (UTC)Reply

I don't think that's a thing. See the letter names at w:Geʽez script#Geʽez abugida. --kc_kennylau (talk) 00:44, 4 April 2024 (UTC)Reply
@Kc kennylau|2A09:BAC3:378F:D2:0:0:15:1B5 That's not a complete list, though. But unless the IP can show us why he thinks it exists, we probably can't help any further. If it hasn't been encoded in Unicode (either as one character or a sequence), it can't be added. --RichardW57m (talk) 16:38, 8 April 2024 (UTC)Reply
google:"Ethiopic Letter Kurk" turns up exactly one hit: this thread. I suspect that this is not the right name for an Ethiopic letter. ‑‑ Eiríkr Útlendi │Tala við mig 18:19, 8 April 2024 (UTC)Reply

Category:Terms written in multiple scripts

I notice that entries are categorized into this category manually. It seems like {{head}} et al could detect multiple scripts and add the category automatically, at least in most cases. No? Is the issue that checking would be too 'expensive'? Would it be more expensive than the code that adds the "Terms spelled with..." categories? - -sche (discuss) 14:53, 3 April 2024 (UTC)Reply

@-sche: I think there may be some complex cases because Wiktionary scripts may overlap, e.g. the Beng and as-Beng scripts for Sanskrit, and I'm not sure that Arabic script variants don't overlap even for some varieties of Arabic. It gets worse if one considers scripts not recorded as being used for the language of the text they're found in. --RichardW57m (talk) 16:04, 3 April 2024 (UTC)Reply
Sure, some cases could still have to be added manually, but it seems like most cases could be handled automatically. Re "scripts not recorded as being used for the language of the text they're found in": isn't that orthogonal, or what am I missing? A Sanskrit term written (e.g.) partly in Beng and partly in Arab is a "term written in multiple scripts", regardless of whether the language has used both scripts, or neither script, or only one script, and regardless of whether our modules record either, both, or neither script as being used for the language, isn't it? The headword template/module just has to look at the characters in the pagetitle/head, determine if they're from more than one script, and add the category if so. We only need to fall back on manually adding the category if a pair of characters appear to be from the same ISO- or Wiktionary-code-having script, but actually represent different scripts (like might've been the case for subvarieties of Mong until we split Mong and gave them their own sub-codes, and like might be the case for subvarieties of Egyh if Egyh ever becomes computer-encodable and font-supported). - -sche (discuss) 16:20, 3 April 2024 (UTC)Reply
@-sche: Consider Sanskrit কামো (kāmo). It's correctly recorded as being in both the Assamese and the Bengali scripts. A dumb algorithm could consider it to be written in a mixture. It's also a Pali word. Now, Pali is currently recorded as using the Bengali script but not the Assamese script, so there is no ambiguity.
Now consider Pali ৰরো (varo). We don't have a record of an attestation yet, but I think it's only a matter of time before it turns up. The word's currently treated as being in the Bengali script, but the first letter belongs to the writing system used for Assamese, but not Bengali, while the second letter belongs to the writing system used for Bengali, but not Assamese. If you don't like this word, look at the last word of Example 20 on page 8, the 20th displayed page at https://archive.org/details/pali-grammar/Ucchatar%20Pali%20Bhasha%20Shikkha%20by%20Karunabangsha%20Bhikkhu/page/n19/mode/2up. That word also has both the letters in it. To keep things clean, we might need to declare a new script (pi-Beng) for Bengali script Pali, and prevent the analysis from considering the other scripts. So far I've preferred to avoid the complication of doing that, and put up with the inconveniences occasioned by the word , which is written entirely in a letter from as-Beng, which shows up in Example 4 (the second example on that same page).
Now, we may be able to do a reasonable job if we partition the scripts as Unicode does, and ignore the 'inherited' and 'common' characters. We might miss some interesting examples in Burmese script Pali, where different local groups have rather different sets of characters, and for Pali, I'm not talking about the difference between NGA and MON NGA, which are distinguished only by the encoding in real Pali words. --RichardW57m (talk) 17:18, 3 April 2024 (UTC)Reply
@RichardW57m @-sche Category:Chinese terms written in multiple scripts is autogenerated by simply looking for terms that have both Hanzi and non-Hanzi characters in them. I don't see why we can't automate this everywhere by simply taking whatever is the autodetermined script (which is based on which script has the most characters in the term) and looking to see whether all characters belong to that script. There's no problem in this approach if two scripts share some characters. Benwing2 (talk) 21:13, 3 April 2024 (UTC)Reply
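A rough sketch of the majority-script check described above, written in Python rather than the Lua the modules actually use. Python's standard library does not expose the Unicode Script property, so `rough_script` approximates it from the first word of each character's Unicode name and skips every non-letter (digits, punctuation, spaces, combining marks, format characters). Both function names are made up for illustration; this is not the code in the headword module.

```python
import unicodedata
from collections import Counter

def rough_script(ch):
    """Approximate a character's script from the first word of its Unicode name.

    Returns None for non-letters, which are ignored instead of being
    counted as belonging to any script. This is a heuristic, not the real
    Unicode Script property.
    """
    if not unicodedata.category(ch).startswith("L"):
        return None
    name = unicodedata.name(ch, "")
    return name.split(" ")[0] if name else None

def is_multiple_scripts(term):
    """True if some letter does not belong to the term's majority script."""
    scripts = Counter(s for s in map(rough_script, term) if s is not None)
    if not scripts:
        return False
    majority, _count = scripts.most_common(1)[0]
    return any(s != majority for s in scripts)

print(is_multiple_scripts("Borel σ-algebra"))  # Latin letters plus a Greek one
print(is_multiple_scripts("Area 51"))          # digits are skipped
```

Because the name-based heuristic treats kanji and kana as different scripts, a real implementation would still need per-language rules and exclusions of the kind discussed in this thread.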
And worst-case scenario, if Indian scripts are actually problematic, just exclude those from being auto-categorized (so people still have to add entries in those scripts to the category manually, just like they currently do: they're no worse off). - -sche (discuss) 21:41, 3 April 2024 (UTC)Reply
@-sche I implemented this. It started having false positives with spaces and hyphens, so I excluded them from consideration. However, there's still an issue with things like Area 51, where numbers aren't considered part of Latn. What do you think we should do here? Should we consider numbers as Latn, so that e.g. a Greek term with numbers in it still gets considered a "term written in multiple scripts", or should we exclude numbers entirely, or do nothing? Benwing2 (talk) 22:26, 3 April 2024 (UTC)Reply
Also issues with apostrophes (devil's advocate), slashes (K/S), etc. Thoughts? Maybe all ASCII chars should be considered Latn? Benwing2 (talk) 22:28, 3 April 2024 (UTC)Reply
@RichardW57m There are no terms so far in Category:Pali terms written in multiple scripts, and only one in Category:Sanskrit terms written in multiple scripts, which is उपेक्षिन्द्रिय​. Do you know why that term is there? Benwing2 (talk) 23:12, 3 April 2024 (UTC)Reply
NVM, the term wrongly contained a U+200B (zero-width space) at the end. Benwing2 (talk) 23:17, 3 April 2024 (UTC)Reply
@Benwing2, -sche: My first cut solution would be to ignore all characters in the Unicode script Common, aka Zyyy, and Inherited, aka Zinh. See https://www.unicode.org/Public/UCD/latest/ucd/Scripts.txt for definitions. The first includes ASCII non-letters. Note that many Thai abbreviations end in full stops - just look at Category:Rhymes:Thai/ɔː - and they're being assigned to Category:Thai terms written in multiple scripts. --RichardW57 (talk) 23:54, 3 April 2024 (UTC)Reply
There is at least one term in Category:Pali terms written in multiple scripts, but you have to look at the categories of এৰ to see it. These two categories could conceivably take a week for all the members to be recorded in the category views. --RichardW57 (talk) 23:54, 3 April 2024 (UTC)Reply
Is Thai โควิด-19 written in multiple scripts? --RichardW57 (talk) 23:54, 3 April 2024 (UTC)Reply
@RichardW57 I have already excluded periods (full stops) from consideration for all scripts, along with commas, hyphens and spaces. I would argue that โควิด-19 contains multiple scripts; certainly it looks that way on first glance. Benwing2 (talk) 23:56, 3 April 2024 (UTC)Reply
Can you tell me what's going on with এৰ? Does this legitimately have two scripts? If not, why not? Benwing2 (talk) 23:57, 3 April 2024 (UTC)Reply
@Benwing2: I didn't design the Wiktionary script concept. As far as Unicode is concerned, it's in a single script, the Bengali script, and from its usage, it would seem that at least some Bengalis think it is. There's a relevant discussion at Template talk:pi-alt#ৰ. For script determination, it's the same as ৰরো (varo) discussed above. Pali in Bengali script is a mixture of Beng (uses র) and as-Beng (uses ৰ, but for /v/, not /r/). We could put it in a single script pi-Beng created by adding 'ৰ' to Beng. --RichardW57 (talk) 00:16, 4 April 2024 (UTC)Reply
@RichardW57 If it's just a single char (or a fixed set of chars), I can add an exclusion for it, just like I've done for things like apostrophes in Cyrillic. Benwing2 (talk) 00:17, 4 April 2024 (UTC)Reply
And U+200C in fa-Arab. Benwing2 (talk) 00:18, 4 April 2024 (UTC)Reply
And U+200D in Sinh, and in whatever we use for the Bengali script for Pali. (It's needed in the latter to stop 'vy' being rendered with a repha.) --RichardW57 (talk) 00:28, 4 April 2024 (UTC)Reply
I think U+200C may be needed in Deva to support pedantic Hindi and also the faking of Sanskrit quotations (though possibly the latter doesn't matter for this application). Also needed in Tham for some Lao words where the ᨶᩣ ligature was deliberately not used. Possibly also for some odd-looking Tham-script Pali. --RichardW57 (talk) 00:34, 4 April 2024 (UTC)Reply
Also, some cases where the ligature isn't formed in Northern Thai when the consonant and vowel are in different syllables might not be errors --RichardW57 (talk) 01:17, 4 April 2024 (UTC).Reply
@Benwing2: COVID-19 clearly contains a heathen (Arabic to be precise) number in it - shouldn't that similarly be categorised as mixed script? --RichardW57 (talk) 00:20, 4 April 2024 (UTC)Reply
@RichardW57 Maybe; but Arabic numerals are the native numeral set for Latin script whereas Thai script has Thai numerals natively. Benwing2 (talk) 00:21, 4 April 2024 (UTC)Reply
@Benwing2: What? Roman numerals are the native set for the Latin script, not these newfangled (Western) Arabic numerals, which incidentally are the usual set for Maghribi Arabic. And Thais do their arithmetic in European-style Western Arabic numerals, and may convert the results to use Thai digits. --RichardW57 (talk) 00:38, 4 April 2024 (UTC)Reply
There may be an exception if an abacus is used, but I think that would be by Chinese Thai, and I've never seen a Thai use an abacus. --RichardW57 (talk) 00:44, 4 April 2024 (UTC)Reply
Serious books in English often start their page numbering using Roman numerals; less commonly, serious books in Thai start with numbering in letters. Contrariwise, magazines in English and in Thai generally use Western Arabic digits for their page numbering throughout. --RichardW57 (talk) 01:07, 4 April 2024 (UTC)Reply
١٢٣٤٥٦٧٨٩٠ ≠ 1234567890 Chuck Entz (talk) 05:41, 4 April 2024 (UTC)Reply
The first set are the near eastern digits ('ARABIC-INDIC' digits in Unicode parlance), not the Western Arabic digits nor, in Unicode parlance, the EXTENDED ARABIC-INDIC digits (a slightly dodgy concept). --RichardW57 (talk) 20:36, 4 April 2024 (UTC)Reply
@Benwing2: Lithuanian is going to have a problem with U+0301 COMBINING ACUTE ACCENT and U+0303 COMBINING TILDE not being included in Latn. For future-proofing, we should also include U+0300 COMBINING GRAVE and U+0307 COMBINING DOT ABOVE. You're probably better off ignoring characters from the combining diacritics block altogether - there are issues with Romanian (combining comma below) and Thai-Script Patani Malay. I'll dig into them on request. --RichardW57 (talk) 01:41, 4 April 2024 (UTC)Reply
@RichardW57 I agree; done. Benwing2 (talk) 01:50, 4 April 2024 (UTC)Reply
I am thoroughly confused by this category. I went to see what could be an example in English other than terms that have numbers in them, which is a pretty suspect inclusion, and I saw Holy Wednesday is in Category:English terms written in multiple scripts. 1.) Why? 2.) How??? That category does not appear when I look at the entry, it is not a hidden category, and I assumed that it must have been in the entry as a manual addition that was recently removed, so there was just a lag time in the MediaWiki software generating the category, but it hasn't been edited in a year! How is "Holy Wednesday" multiple scripts??? Note that this is just a random example but there are many more that seem to have no clear reason for inclusion. —Justin (koavf)TCM 00:33, 4 April 2024 (UTC)Reply
@Koavf That is because of MediaWiki lag. When I first added the category, I forgot to exclude spaces from consideration, so some terms with spaces got added. They will clear in time. Benwing2 (talk) 00:35, 4 April 2024 (UTC)Reply
So it seems like most of the legit entries are letters-with-numbers, letters-with-@, and Roman-and-Greek-letters mixes, which is more-or-less sensible. As noted above, Arabic numbers are the native numeral system in English, so it's maybe arguable that this is "multiple scripts", but other typographic characters like "@" are definitely not a standalone "script", but perfectly normal parts of English-language writing. An entry like Borel σ-algebra seems legitimate. —Justin (koavf)TCM 00:40, 4 April 2024 (UTC)Reply
Letters-with-numbers and letters-with-@ aren't considered multiple scripts; I exclude all non-letter ASCII symbols from consideration when the script is Latin. Any of this nature that you see are due to MediaWiki lag. Benwing2 (talk) 00:53, 4 April 2024 (UTC)Reply
@Koavf the category is now clear of all stray (lagged) entries.
@Benwing2 we still need to dismiss en rules (Einstein–de Haas effect) from the category. Also not so sure about superscript numerals like I²C. This, that and the other (talk) 04:55, 4 April 2024 (UTC)Reply
The Unicode rules say they no more count for script determination than do ASCII digits. --RichardW57 (talk) 05:06, 4 April 2024 (UTC)Reply
FWIW, I would also not consider B♭ to be "multiple scripts". Would it work to (a) only categorize entries if they use multiple code-having scripts (so, using one script like Latn + using characters that are not script-specific won't get categorized, only the use of 2+ scripts like Latn + Arab would get categorized), and (b) also exclude any non-script scripts that need to be excluded, like if ♭ or ' (etc) is in Zsym, then have things in Zsym count as "not script-specific" for this purpose. ? - -sche (discuss) 05:33, 4 April 2024 (UTC)Reply
@-sche Sort of. I think your idea is a good one but there are still some special cases, e.g. I just had to add a case for Cyrillic ъ and ь used in Proto-Slavic Latin terms, and it is a bit trickier to implement than what I'm doing so far. Benwing2 (talk) 06:04, 4 April 2024 (UTC)Reply
@Benwing2: A case could be made for exempting the entire Reconstruction namespace, since they're in effect not spelled so much as notated. Chuck Entz (talk) 06:14, 4 April 2024 (UTC)Reply
@Chuck Entz I agree, and have added this exemption. Benwing2 (talk) 06:43, 4 April 2024 (UTC)Reply
Well, it's clearing, but it still has (e.g.) in hysterics on my end. It went from 443 to 233, so MediaWiki is doing its magic, so thanks to whoever (BW?) did that. I reckon we will soon have it whittled down to the 60 or so semi-legitimate entries.
I would think that "letters-plus-numbers" terms are actually much more reasonable to put in Category:English terms with numerals or somesuch (note that Category:English terms containing Roman numerals exists), as that could plausibly be something that someone is searching. And I don't think that someone who wants to see "Latin-characters-with-Greek-characters" also wants to see COVID-19 or A♭. Since it seems like a substantial majority are actually entries with Greek characters, I could give a weak support to "Category:English terms with Greek characters" or somesuch. —Justin (koavf)TCM 08:13, 4 April 2024 (UTC)Reply
@Koavf I manually purged the whole category but some things have crept in afterwards. Benwing2 (talk) 08:23, 4 April 2024 (UTC)Reply
I wasn't joking when I suggested it might take a week. I've certainly waited the best part of a week for a change to Pali categories to converge, and Pali is only a small part of Wiktionary. --RichardW57m (talk) 15:11, 4 April 2024 (UTC)Reply
@-sche @RichardW57 I have redone the algorithm and made it simply elide the difference between e.g. Beng and as-Beng (in general ignoring the language-specific component of a script), which should fix the issue with এৰ. A side effect of this is that โควิด-19 no longer is considered to have multiple scripts (and wouldn't even if it mixed Thai characters with e.g. Devanagari numerals, I think). Benwing2 (talk) 03:27, 5 April 2024 (UTC)Reply
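The idea of ignoring the language-specific component of a script code can be sketched as follows. This is a minimal Python illustration, assuming script codes follow the `as-Beng` / `Beng` pattern mentioned in this thread; `base_script` and `has_multiple_scripts` are hypothetical names, and how the per-character codes are obtained is left out.

```python
def base_script(code):
    """Drop a language-specific prefix: 'as-Beng' -> 'Beng'; 'Latn' is unchanged."""
    return code.rsplit("-", 1)[-1]

def has_multiple_scripts(char_scripts):
    """char_scripts holds one script code per character, or None for
    characters that should be ignored (punctuation, digits, and so on)."""
    return len({base_script(c) for c in char_scripts if c is not None}) > 1

# A word mixing Beng and as-Beng letters collapses to a single script.
print(has_multiple_scripts(["Beng", "as-Beng"]))      # False
print(has_multiple_scripts(["Latn", None, "Grek"]))   # True
```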
Thanks. I'll defer to people who edit Thai, but my impression is that Thai uses Arabic numerals so normally that a text using them would not strike speakers as mixing scripts the way a mixture of Thai and Arabic letters would; certainly, I see that many languages like Chinese use Arabic numerals regularly enough that they don't seem to be part of a different script. So I think โควิด-19 not being considered to have multiple scripts is appropriate. - -sche (discuss) 04:30, 5 April 2024 (UTC)Reply

Why is .nato in Category:Translingual terms written in multiple scripts ?

Equinox 08:46, 4 April 2024 (UTC)Reply

Because it has "." Note that this will be purged and no longer appear in said category soon. E.g. I do not see it on my end. Justin (koavf)TCM 08:56, 4 April 2024 (UTC)Reply

This is a good idea but there are still several terms being falsely categorized, including (within the English category) 5′ cap, Ger⁺⁶, H₂O, ni🅱️🅱️a, o͝o, and others. Now I realize that I've been criticized for the same thing, but in this case there really was a severe lack of testing before making a change. I think a much more conservative approach is required, where two scripts (e.g. Latin and Greek) are explicitly set as "different". It might even have to be done on a per-language basis, since Japanese being written using Chinese characters is clearly different from the other way around. By the way, @Koavf, your idea would exclude the entries い-adjective and な-adjective, which are definitely the most interesting of the bunch. Ioaxxere (talk) 19:24, 4 April 2024 (UTC)Reply

@Ioaxxere I agree in general about testing, but this kind of stuff is difficult to test completely beforehand and the effect of getting things a bit wrong is fairly minor (just a false positive in a category). But I am going to implement User:-sche's approach of excluding all symbols and anything not a proper "script" from consideration; just had to get some sleep :) ... Benwing2 (talk) 19:44, 4 April 2024 (UTC)Reply

Does anyone know how to do this?

Does anyone know how to check for changes on a language as a whole? So say I wanted to keep an eye on what changes are made on English as a whole, including entries, categories and what else, is there a way to easily view them instead of having to see the ‘newest changes’ table of every category? Melithius (talk) 10:07, 4 April 2024 (UTC)Reply

@Melithius This is kind of possible: Go to Category:English lemmas and click "Related changes" on the left sidebar. For completeness, you would also need to monitor Category:English non-lemma forms' related changes page too. All English entries are in one or other category.
The big drawback, which will become obvious as soon as you attempt this, is that all changes for the entries concerned will be shown, even those relating to other language sections of the entry. But it may still be workable for you depending on what you want to do. It is likely to be very workable for languages written in scripts other than Latin. This, that and the other (talk) 11:41, 4 April 2024 (UTC)Reply
Ah ok yes it worked, especially with the other languages I wanted to view, as you mentioned. Thanks! Melithius (talk) 13:02, 4 April 2024 (UTC)Reply

Horizontal toclimit2

Would you like to test e.g. at te or a something like {{Template:User:Sarri.greek/toc2-hor}}
If you think it looks better than the vertical toclimit, could a real programmer take a look? (my amateurish Module:User:Sarri.greek/toc2-hor, style.css, Template:User:Sarri.greek/toc2-hor) alert programmers  MM Benwing2, Surjection PS Would editors of 3phased languages like something like wikt:el:Tempalte:test-ol? Thank you ‑‑Sarri.greek  I 05:13, 5 April 2024 (UTC)Reply

Template:ja-new some changes

Accelerated Japanese entry creation {{subst:ja-new|へん-のう|s|returning|to return}} didn't work on creation 返納. Anatoli T. (обсудить/вклад) 08:20, 5 April 2024 (UTC)Reply

@Atitarev What went wrong? It looks OK to me, although maybe I missed something. Benwing2 (talk) 08:54, 5 April 2024 (UTC)Reply
@Benwing2: To reproduce, paste the full code above on an empty line in the same entry and preview.
I didn’t generate the entry, I made it manually. The code above is supposed to create a verbal noun and verb entry simultaneously. Anatoli T. (обсудить/вклад) 09:52, 5 April 2024 (UTC)Reply
@Atitarev Hmm, I tried it and it seems to work fine for me. What is the error you're seeing? Benwing2 (talk) 20:16, 5 April 2024 (UTC)Reply
@Benwing2: Thanks for checking. Something happened between yesterday and today, I was getting some string concatenation error. Anyway, it's working now. Anatoli T. (обсудить/вклад) 23:06, 5 April 2024 (UTC)Reply
@Benwing2: Hi. It happened again on 返納金(へんのうきん) (hennōkin): Lua error in Module:template_parser at line 402: bad argument #1 to 'find' (string expected, got nil)
I used {{subst:ja-n|へん-のう-きん||refund, repayment}}
It fixed itself on the 2nd edit but I saved this revision. Anatoli T. (обсудить/вклад) 05:22, 6 April 2024 (UTC)Reply
Also calling @Theknightwho. It's your module. Anatoli T. (обсудить/вклад) 05:27, 6 April 2024 (UTC)Reply
@Atitarev Hmm, I took a look at the error but I'm not sure why it happened. Usually this would mean someone accidentally introduced a bug and then quickly fixed it, but I don't see evidence of this. The error is in Module:template parser, which has been edited recently by User:Theknightwho but not in the last few minutes (and he hasn't contributed anything in a few hours). Benwing2 (talk) 05:27, 6 April 2024 (UTC)Reply
@Benwing2: I think it's the same as yesterday. It fails on the preview or first edit on a NEW page. Then it can be fixed by a new edit with the same code. Anatoli T. (обсудить/вклад) 05:33, 6 April 2024 (UTC)Reply
@Atitarev Hmm. Does it always happen on a new page? If so I may be able to fix it. Benwing2 (talk) 05:36, 6 April 2024 (UTC)Reply
@Benwing2: Yes, on a new page. I don't know when it started to occur but I only noticed yesterday. It may have been a few weeks since I made new Japanese entries. Anatoli T. (обсудить/вклад) 05:39, 6 April 2024 (UTC)Reply
@Atitarev Yes, I can reproduce this, but I can't figure out how to get a full stack trace due to the substing that's going on. Hopefully User:Theknightwho should be able to fix this; I imagine it is a simple fix. Benwing2 (talk) 05:47, 6 April 2024 (UTC)Reply
@Atitarev @Benwing2 I’ll need to check when I’m on my laptop, but that error suggests that something is feeding nil into the parser instead of the page content. I know that subst sometimes causes a page to need to be saved twice to fully take effect, so I wonder if that’s a relevant factor here. Theknightwho (talk) 12:49, 6 April 2024 (UTC)Reply
@Theknightwho, @Benwing2: Thanks, please do check.
I've made a three language (including four Chinese varieties) entry on 再起 with:
{{subst:zh-n|v|to rise again, to make a comeback||resurgence, comeback|k=재기}}
{{subst:ja-new|さい-き|s|resurgence, comeback|to rise again, to make a comeback}}
Only the Japanese entry failed, you can see in the edit history. The error was different this time.
The only sort of strange behaviour with "subst" I observed before was when something was reliant on the entry's existence and it hadn't been created yet: it showed some temporary errors, e.g. in Thai readings in a usex or even the headword, but that behaviour has changed for the better.
Please fix. It may discourage users from making new accelerated Japanese entries, they will just think it's not working at all. Anatoli T. (обсудить/вклад) 00:03, 7 April 2024 (UTC)Reply
For experimenting, you can try creating a new entry on e.g. 才気(さいき) (saiki, wisdom) with this:
{{subst:ja-new|さい-き|n|wisdom}} Anatoli T. (обсудить/вклад) 00:07, 7 April 2024 (UTC)Reply
@Atitarev This should be fixed. Let me know if you're still having issues. Benwing2 (talk) 07:28, 8 April 2024 (UTC)Reply

Why do some Wikipedia images not show up when used on Wiktionary?

e.g. the cartoon I just added at Colonel Blimp. Equinox 13:32, 6 April 2024 (UTC)Reply

@Equinox Non-free images are uploaded to Wikipedia directly rather than Commons (where they’re not allowed). You could do the same, but we don’t really have any infrastructure for it. Theknightwho (talk) 13:39, 6 April 2024 (UTC)Reply
I see. Had noticed it seemed to happen with commercial-ish stuff like screenshots and comics. Equinox 14:06, 6 April 2024 (UTC)Reply
@Equinox If you do decide to reupload here, one other thing to be careful of is that permission to use non-free images is sometimes only given to Wikipedia by the copyright-holder. Theknightwho (talk) 15:02, 6 April 2024 (UTC)Reply
@Equinox, Theknightwho: A related discussion is Wiktionary:Beer_parlour/2023/August#Image_upload_rights, where people seemed to oppose including fair use images. I still think that Wiktionary is being seriously hampered by copyright paranoia. Ioaxxere (talk) 22:07, 6 April 2024 (UTC)Reply
It's not the culprit in this case, but FWIW another reason I've seen some images not display (anymore) here recently is that we added a bunch of images to our blacklist recently (because vandals started to put a few of them on irrelevant entries), and it turns out we were using at least one of them (to correctly illustrate nipple). (Perhaps someone could check whether any of the other images on MediaWiki:Bad image list are actually being used.) - -sche (discuss) 15:53, 6 April 2024 (UTC)Reply
There is a protocol for allowing the use of an otherwise banned image on an appropriate page, though I don't know the procedure offhand. bd2412 T 16:23, 6 April 2024 (UTC)Reply
In the nipple case, I just removed the image from the blacklist (it had been added as part of a mass import of WP's blacklist and not because anyone was specifically misusing it; I think we have abuse filters which stop most bad-image addition anyway). — This unsigned comment was added by -sche (talkcontribs) at 19:36, 6 April 2024 (UTC).Reply
I am willing to provide a free, tasteful image of my nipple. Equinox 22:09, 6 April 2024 (UTC)Reply
Only the one? DCDuring (talk) 23:36, 6 April 2024 (UTC)Reply
@Equinox we currently have a single non-free image at thagomizer. Indeed, we have a policy specifically to allow this file: WT:NFCC. If you want to upload a non-free file in the same vein as the Far Side strip we already have, you would need to ensure that "its presence significantly increases readers' understanding of the topic" (per point 5 of that policy). I'm not sure that a picture of Colonel Blimp would qualify. This, that and the other (talk) 02:54, 7 April 2024 (UTC)Reply
@Ioaxxere also. This, that and the other (talk) 02:54, 7 April 2024 (UTC)Reply
@This, that and the other See also Wiktionary:Beer_parlour/2024/April#Modify/deprecate_NFCC_or_request_re-enabling_Special:Upload_for_all_users? Liuxinyu970226 (talk) 04:08, 24 April 2024 (UTC)Reply
As noted above, we have a very restrictive media upload policy and only four pieces of local media, two of which are basically required by MediaWiki software, one kept as a redundant copy in case there is some vandalism to the item at c:, and a single fair-use file. While these are the only files, there are several discussions of deleted and moved ones as well and those could also be instructive about what the requirements are to upload locally. —Justin (koavf)TCM 07:45, 7 April 2024 (UTC)Reply

Automatic acute stress addition to Belarusian (and possibly also Russian/Ukrainian) words in book quotations.

A somewhat relevant old discussion: https://en.wiktionary.org/wiki/Wiktionary:Beer_parlour/2014/February#Should_quotations_be_normalized?. Ping @Benwing2, Atitarev, Insaneguy1083.

The current (unwritten?) rule is to add acute diacritics to mark stressed syllables in the quotations taken from the Belarusian, Russian and Ukrainian books (ex. дзот, бревно, завдовжки). However this is an annoying and time consuming chore for the native speakers and possibly a much more challenging and error prone task for the others. Not to mention possible typos. Also touching the original spelling just doesn't feel right.

I think that the stress marks can be added automatically for the majority of words. And I have created two proof-of-concept modules: Module:User:Ssvb/be-autostress-simple and Module:User:Ssvb/be-autostress-bloom-filter. The former is simple and doesn't scale. But the latter makes it possible to squeeze up to ~30-40K lemmas and all their inflected forms (~200-300K words total) into a ~2MB Lua module without becoming a resource hog. It's possible to use data from https://github.com/Belarus/GrammarDB for the Belarusian words. And for the Russian language it's possible to just extract the word inflection and stress information from the Wiktionary dump (~53K lemmas). May I integrate it into the transliteration module? Does anyone see any pitfalls or have objections? --Ssvb (talk) 01:49, 7 April 2024 (UTC)Reply
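For readers unfamiliar with the technique, here is a minimal Python sketch of how a Bloom filter could be used for stress lookup. It is not the code in Module:User:Ssvb/be-autostress-bloom-filter; the class, the hashing scheme, and the idea of storing stressed word forms and probing every possible stress position are all assumptions made for illustration. A Bloom filter never misses a stored item but can occasionally report one that was never stored, so the bit-array size and hash count have to be chosen to keep that rate low.

```python
import hashlib

ACUTE = "\u0301"          # combining acute accent
VOWELS = "аеёіоуыэюя"     # Belarusian vowel letters

class BloomFilter:
    def __init__(self, num_bits, num_hashes):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray((num_bits + 7) // 8)

    def _positions(self, item):
        # Derive several bit positions from salted SHA-256 digests.
        for i in range(self.num_hashes):
            digest = hashlib.sha256(f"{i}:{item}".encode("utf-8")).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, item):
        for pos in self._positions(item):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def __contains__(self, item):
        return all(self.bits[pos // 8] & (1 << (pos % 8))
                   for pos in self._positions(item))

def stress_candidates(word):
    """Yield the word with an acute accent after each vowel in turn."""
    for i, ch in enumerate(word):
        if ch.lower() in VOWELS:
            yield word[:i + 1] + ACUTE + word[i + 1:]

def add_stress(word, known_forms):
    """Return the stressed form if exactly one candidate is known, else the word unchanged."""
    hits = [c for c in stress_candidates(word) if c.lower() in known_forms]
    return hits[0] if len(hits) == 1 else word
```

With this design a homograph such as "гады", whose two stressed forms would both be stored, yields two hits and is left unmarked instead of being guessed.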

@Ssvb I don't have major conceptual objections to this but there are a large number of considerations and edge cases that should be worked out *BEFORE* you integrate this into any transliteration module. I actually wrote an offline script awhile ago [7] to add automatic accents as well as lemma links to Russian terms, and it runs to 1,200 lines and took weeks of development effort to work out the kinks. Benwing2 (talk) 06:40, 7 April 2024 (UTC)Reply
@Benwing2: Thanks for the interesting link. I'm curious, how is this offline script used in practice? For example, К.Артём.1 has been adding some nice Russian quotations recently, but without annotating stressed syllables in them. Do you periodically run a bot to fix such quotations? How is this process organized? --Ssvb (talk) 13:53, 7 April 2024 (UTC)Reply
@Benwing2: As for the stress annotation in my Lua module, I want to keep it very simple and reliable without any extra bells and whistles. Your offline script has a lot more features, which are nice, but don't seem to be strictly necessary. Right now the Belarusian transliteration module already automatically annotates stress for the letter "o" and this doesn't seem to cause any problems. This algorithm guesses correctly in more than 90% of cases. But it isn't perfect and makes mistakes, because compound words like "мовазна́ўства" or "штодня́" don't fit this model. This problem can be addressed by adding a small dictionary of these few problematic compound words. Once this is implemented, we just get a better user experience and no disadvantages at all! And once we have a dictionary framework up and running, nothing stops us from adding even more words to it. Conceptually this is still just an extension of the already existing letter "о" stress auto-guesser functionality.
As for the edge cases, the obvious ones are "гады́" vs. "га́ды". Also some capitalized proper nouns are tricky, such as "Та́ні" (genitive form of a girl's name) vs. "тані́" (imperative form of "to drown") or "Я́на" (genitive form of a boy's name) vs. "яна́" ("she"). The module needs testcases with a good coverage for such things, but handling them is pretty straightforward. At least that's how I see it right now. --Ssvb (talk) 14:24, 7 April 2024 (UTC)Reply
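The edge cases above can be sketched as a case-sensitive dictionary lookup that annotates a word only when exactly one stressed variant is known. This is a minimal Python illustration, not the actual module: `build_index` and `annotate` are hypothetical names, and the word list holds only the examples from this discussion.

```python
ACUTE = "\u0301"  # combining acute accent


def strip_stress(word: str) -> str:
    """Remove stress marks to obtain the plain spelling used as the lookup key."""
    return word.replace(ACUTE, "")


def build_index(stressed_forms):
    """Map each plain spelling (case-sensitive) to the set of its stressed variants."""
    index = {}
    for form in stressed_forms:
        index.setdefault(strip_stress(form), set()).add(form)
    return index


def annotate(word: str, index) -> str:
    """Add stress only when the spelling has exactly one known stressed variant."""
    variants = index.get(word, set())
    if len(variants) == 1:
        return next(iter(variants))
    return word  # unknown or ambiguous: leave for a human editor


INDEX = build_index([
    "га\u0301ды", "гады\u0301",   # same spelling, two stress patterns: ambiguous
    "Та\u0301ні", "тані\u0301",   # capitalized name vs. lowercase imperative
    "Я\u0301на", "яна\u0301",     # capitalized name vs. pronoun
    "штодня\u0301",
])

for word in ["гады", "Тані", "тані", "штодня"]:
    print(word, "->", annotate(word, INDEX))
```

A sentence-initial capital would defeat the case distinction (a capitalized "Тані" could still be the imperative), so a real implementation would need a rule for the first word of a sentence; the testcases mentioned above are where such cases would be pinned down.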
I'm not really a coder at least when it comes to Wiktionary, so I'm probably not one to answer here. I'm perfectly happy doing the stresses by hand, although as you mentioned, it's error-prone for non-native speakers like myself. Insaneguy1083 (talk) 11:40, 7 April 2024 (UTC)Reply
@Insaneguy1083: Thanks for your response. I can handle Lua coding myself and I'm primarily interested in your feedback as a user. I think that the Belarusian part of English Wiktionary needs a lot more editors to add a lot of the currently missing content, but the learning curve unfortunately seems to be too steep for many potential contributors. --Ssvb (talk) 14:37, 7 April 2024 (UTC)Reply
Adding accent marks to the first form of the quotation is deeply wrong. If you want to add editorial opinion to the line, there are {{quote-book}} options such as |norm= for this. While I understand why we don't do transliteration for Thai, it bothers me that there is no necessary relationship between the apparent transcription and how the original utterer would have intended the sentence to be said. For comparison, imagine transcribing "the ignominy of either economic controversy". --RichardW57 (talk) 17:21, 7 April 2024 (UTC)Reply
@RichardW57: I'm completely ignorant about Thai, do you mean that you would prefer |norm= instead of |transliteration= for Thai word quotations, such as the quotation used for "ระกาศก"?
I agree that it seems natural for |text= to precisely reproduce the original spelling of the quoted book, but these things are rather loosely documented in WT:QUOTE#Spelling_and_typography ("Generally, the original spelling of the word or phrase should be kept in the citation. In practice, however, this doesn't always happen") and new contributors tend to mimic the formatting of the existing entries. The language-specific guidelines in WT:ARU could potentially provide clarifications specifically for the Russian entries, but currently it has no clear explanations for book quotations.
I propose the following:
  • In a Russian quotation like |text=Мама мыла раму|t=Mom was washing a window frame, the Lua module can automatically create its normalization |norm=Ма́ма мы́ла ра́му using a dictionary and then the template can create transliteration |tr=Máma mýla rámu from this normalization. But if |text= already contains acute stress marks like it is done now, then the generation of normalization can be suppressed.
  • In a Belarusian quotation, Cyrillic normalization can be automatically created even from Łacinka and automatically stress annotated using a dictionary: |text=Ulezła ŭ chatu jak sztodnia|norm=Уле́зла ў ха́ту як штодня́|tr=Uljézla ŭ xátu jak štodnjá|t=Sneaked into the house like it was a daily routine.
The downside is that having both |text= and |norm= adds extra visual clutter, so I understand why the existing practice of replacing text with its normalization in Russian quotations has its appeal. --Ssvb (talk) 02:44, 8 April 2024 (UTC)Reply
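The proposed flow for the Russian example (derive |norm= from |text=, derive |tr= from the normalization, and skip normalization when |text= already carries stress marks) might look roughly like this in Python. The stress dictionary and the transliteration table are toy data covering only the example sentence, and `make_norm`, `transliterate` and `quotation_params` are hypothetical names rather than existing Wiktionary functions.

```python
import re
import unicodedata

ACUTE = "\u0301"  # combining acute accent

# Toy data, hand-written for the example sentence only (assumption).
STRESS = {"Мама": "Ма\u0301ма", "мыла": "мы\u0301ла", "раму": "ра\u0301му"}
TRANSLIT = {"М": "M", "м": "m", "а": "a", "ы": "y", "л": "l", "р": "r", "у": "u"}


def make_norm(text: str, stress=STRESS):
    """Return a stress-annotated normalization, or None if text is already accented."""
    if ACUTE in text:
        return None  # the editor marked stress by hand; do not second-guess it
    return re.sub(r"\w+", lambda m: stress.get(m.group(0), m.group(0)), text)


def transliterate(text: str, table=TRANSLIT) -> str:
    """Map letters one by one, then compose letter + acute into single characters."""
    latin = "".join(table.get(ch, ch) for ch in text)
    return unicodedata.normalize("NFC", latin)


def quotation_params(text: str) -> dict:
    """Build the text/norm/tr triple described in the proposal."""
    norm = make_norm(text)
    source = norm if norm is not None else text
    return {"text": text, "norm": norm, "tr": transliterate(source)}


print(quotation_params("Мама мыла раму"))
```

The toy dictionary hard-codes one reading per word. With a full dictionary, "мыла" would probably be ambiguous (мы́ла vs. мыла́), in which case the unambiguous-only rule discussed elsewhere in this thread would leave it unmarked.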
@Benwing2: I just noticed that the |norm= parameter and the https://en.wiktionary.org/wiki/Wiktionary:Beer_parlour/2023/July#Adding_a_normalization_param_to_{{ux}},_{{quote}},_etc. discussion about it was a relatively new development. Is there a framework and some sort of standardized Lua modules naming convention planned for hooking the automated conversion from |text= to |norm=? I mean something similar to the Module:languages#Language:transliterate functionality. --Ssvb (talk) 04:49, 8 April 2024 (UTC)Reply
For example, ภรรยา (pan-yaa, wife) may also be pronounced pan-rá-yaa. When we transliterate it in a quotation, we unintentionally attribute the 2-syllable pronunciation to the author. With โควิด-19, we have no idea whether the number part would have been pronounced as in Thai or (approximately) as in English. This problem is inherent in unfaithful transliteration, and I think it's rampant in Japanese with its multiple readings. With such systems, reason for scepticism increases as one goes from text to normalisation to 'transliteration' to translation. --RichardW57m (talk) 17:07, 8 April 2024 (UTC)Reply
@Ssvb: I didn't have a chance to review it thoroughly but to me, it seems like an almost, if not completely impossible task. You will get a lot of false positives, especially for Russian and Belarusian Ukrainian. For users not familiar with East Slavic languages, I wouldn't recommend "guessing" the main stress or stress and inflection pattern without consulting dictionaries. Anatoli T. (обсудить/вклад) 05:37, 8 April 2024 (UTC)Reply
@Atitarev: The essence of my suggestion is to incorporate a small dictionary with the most common ~30K lemmas and all their inflections into a Lua module. So that the stressed vowels can be marked automatically when generating the transliterated text. And do this automatically only for those words, where this can be done unambiguously. Stress markup for the remaining words can be handled with the |subst= parameter. Alternatively, User:RichardW57 also mentioned the |norm= parameter, which could be used too. --Ssvb (talk) 06:01, 8 April 2024 (UTC)Reply
@Ssvb: I see, thanks. If you're going to browse through the list of entries to be updated, as me and @Benwing2 did with the Russian accentuating effort, then that's fine. Cases like (Russian) тра́ктора (tráktora) and трактора́ (traktorá) should always be kept in mind, as well as words with multiple possible stress patterns. Anatoli T. (обсудить/вклад) 22:09, 9 April 2024 (UTC)Reply
@Atitarev: I have actually imported all Russian words from English Wiktionary and managed to fit them into a ~6MB Lua module. It's split into three ~2MB chunks due to module size limits. And now the Module:User:Ssvb/ru-autoaccent module can automatically mark stress and recover ё letters where it's possible. Feel free to experiment with its Module:User:Ssvb/ru-autoaccent/testcases to see if you can come up with something that this module can't handle correctly. --Ssvb (talk) 22:52, 9 April 2024 (UTC)Reply
I'm confused (as apparently is RichardW57) why it would be the norm to add acute accents to quotations in this context, any more than we would add accents to mark stress position in English, Italian, etc. quotations. It seems preferable to keep the original spelling.--Urszag (talk) 05:45, 8 April 2024 (UTC)Reply
@Urszag: The pronunciation of the Belarusian, Russian and Ukrainian words can be relatively easily deduced from their spelling in most cases. Except for the positions of the stressed syllables, which happen to be unmarked in books (other than the children's textbooks used for learning the language). I'm myself in favor of keeping the original spelling intact as far as the book quotations are concerned. And I would prefer to only add accents to the romanized transliterations. But the existing practice is to add acute accents to the original Cyrillic text. Maybe User:Atitarev can provide much better explanations about the reasons for doing it this way. --Ssvb (talk) 06:21, 8 April 2024 (UTC)Reply
We can also add Latin. For our head words and inflected forms, we mark long vowels, but not in quotations. --RichardW57m (talk) 17:12, 8 April 2024 (UTC)Reply

pre-phab question (categorically track uses of nonexistent templates)

If you use a module that doesn't exist, like {{#invoke:foobarbaz/templates|foobar}}, the page is categorized into Category:Pages with module errors. If you use an image or audio file that doesn't exist, the page goes in Category:Pages with broken file links. But if you use a template that doesn't exist, no categories are added AFAICT: the situation is not tracked, so unless someone looks at the page and sees the red "Template:foobar" link (thanks for catching this one, Chuck), it won't be noticed. I don't think this is something we can change, I think we'd have to ask the devs, so: is there already a Phabricator task about this? If not, who wants to start one? I can do it, but it might be better if someone with more technical expertise / sense of what would need to be done did it. (I poked around on Phabricator looking to see if there was already a task about this, and just saw tasks about turning non-existent template links red; that seems to work, so if the devs can do that, maybe they can also make such links categorize...?) - -sche (discuss) 17:21, 7 April 2024 (UTC)Reply

There's a list of "Entries using nonexistent templates" at Wiktionary:Todo/Lists JeffDoozan (talk) 17:29, 7 April 2024 (UTC)Reply
Great! But it would be useful if this were automatically generated (not requiring someone to run a bot), and on all wikis (not just this one), right? - -sche (discuss)
Yeah, definitely. I think it's weird that template redlinks aren't tracked or categorized automatically, which is why I was happy when TTO built a tool to do it for us. JeffDoozan (talk) 22:39, 7 April 2024 (UTC)Reply
Special:WantedTemplates does something like what seems to be asked for the 5,000 most common "wanted" templates, but it has been flooded with "wanted templates" like Template:tracking/inflection of/tag/Attic‏‎ (14 links), which is the least "wanted" of the 5,000. The most "wanted" is Template:tracking/parameters/empty parameter‏‎ (4,386,357 links). This does seem to be the detritus of some effort to eliminate the need to use dump processing to track "problems". I know of no documentation or explanation for this use. I don't know who uses this. If someone is using it, they seem to like being anonymous. DCDuring (talk) 00:08, 8 April 2024 (UTC)Reply
I once used this category to identify missing taxonomic and other templates. Now, of course, the name "WantedTemplates" is a misnomer. I doubt that there are even 50 actually "wanted" templates among the 5,000. I wonder about the value of this amount of tracking. Perhaps we are tracking non-problems or tracking real problems in overly fine detail. DCDuring (talk) 00:26, 8 April 2024 (UTC)Reply
@DCDuring The template tracking mechanism is incredibly useful but has as a side effect that Special:WantedTemplates gets filled up with tracking categories. It would be great if Wikimedia provided a way to specify prefixes to ignore in Special:WantedTemplates. @-sche maybe you can file a Phabricator request to this effect? This is equivalent to adding a simple blacklist, which is usually very easy to do programmatically. A conceivable alternative would be to use userspace (or maybe some other namespace that's ignored by the code that generates Special:WantedTemplates) for the template tracking mechanism. Maybe a Phabricator ticket could request more info on how Special:WantedTemplates works. Benwing2 (talk) 00:33, 8 April 2024 (UTC)Reply
Is it possible to have a completely invisible link like the ExpandTemplates trick (for others: see Module:debug/track) when linking to other namespaces? I can definitely do an invisible ping, but I have to use a space to avoid the pipe trick. As for which namespace, I would recommend subpages of WT:tracking, since we can make sure that it doesn't conflict with anything else. Chuck Entz (talk) 01:23, 8 April 2024 (UTC)Reply
@Chuck Entz I think this is a great suggestion. If no one complains in a couple of days, I will make the change. Benwing2 (talk) 02:03, 8 April 2024 (UTC)Reply
Sorry, I'm trying to follow along: would a link being invisible (in the manner of that ping) make it not show up on Special:WantedTemplates, or what is making it invisible like that accomplishing...? It seems like the existing tracking links are already invisible; at least, they don't seem to generate visible redlinks in entries. As for changing "tracking" links to be links to some namespace the Special: list ignores, that sounds like a great idea. - -sche (discuss) 02:38, 8 April 2024 (UTC)Reply
The way the tracking mechanism works is by calling the expandTemplate API call, as if a template call to Template:tracking/inflection of/tag/Attic‏‎ or similar had been inserted; this causes the page to show up in e.g. Special:WhatLinksHere/Template:tracking/inflection of/tag/Attic, but also makes the tracking page show up in Special:WantedTemplates. However, templates don't actually need to be in the Template: namespace. In fact, you can transclude any page (including mainspace pages) into another page using the template calling syntax; see w:Help:Transclusion. It appears, however, that only pages in the Template: namespace show up in Special:WantedTemplates; if you look at w:Special:WantedTemplates on the English Wikipedia, for example, there are only 68 pages listed, all of which are in the Template: namespace. So if we use a different namespace for the tracking pages, the Special:WhatLinksHere/... trick should still work, but the pages won't pollute Special:WantedTemplates. Benwing2 (talk) 02:49, 8 April 2024 (UTC)Reply
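The namespace point can be illustrated with a small Python model of how a transclusion target is resolved: a bare name defaults to the Template: namespace, a leading colon means mainspace, and an explicit prefix selects that namespace. The namespace list is a partial, hypothetical one, and `wanted_templates` only models the observation above that Special:WantedTemplates appears to list Template: pages alone; it is not MediaWiki's actual code.

```python
# Partial list for illustration; a real wiki has more namespaces and aliases.
KNOWN_NAMESPACES = {"Template", "Wiktionary", "User", "Module", "Help", "Category"}


def resolve_transclusion(target: str) -> tuple[str, str]:
    """Return (namespace, title) for the page named inside {{...}}."""
    if target.startswith(":"):
        return ("", target[1:])  # leading colon: main namespace
    prefix, sep, rest = target.partition(":")
    if sep and prefix in KNOWN_NAMESPACES:
        return (prefix, rest)  # explicit namespace prefix
    return ("Template", target)  # default for bare names


def wanted_templates(missing_transclusions):
    """Keep only the missing targets that resolve to the Template: namespace."""
    return [t for t in missing_transclusions if resolve_transclusion(t)[0] == "Template"]


missing = [
    "tracking/inflection of/tag/Attic",             # old-style tracking page
    "Wiktionary:tracking/inflection of/tag/Attic",  # same page under another namespace
    "foobar",                                       # a genuinely missing template
]
print(wanted_templates(missing))
```

Under this model the Wiktionary:-prefixed tracking page is still transcluded, so a WhatLinksHere lookup would keep working, but it no longer counts as a wanted template.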
@-sche: my point was that outputting a space isn't quite invisible. There's no redlink, but if the template is before other text, it will move it to the right. If I understand what Benwing2 is saying, specifying another namespace won't change the way the current code works- which is truly invisible. That's what I was asking about- whether it was possible to do something like that rather than my imperfect method. Chuck Entz (talk) 03:14, 8 April 2024 (UTC)Reply
@Chuck Entz @-sche @DCDuring @Theknightwho I went ahead and switched the template tracking mechanism to use Wiktionary:tracking/.... Please let me know if anything goes wrong, and revert if so. Benwing2 (talk) 05:50, 8 April 2024 (UTC)Reply
Specifically, my change to Module:debug/track. Benwing2 (talk) 05:52, 8 April 2024 (UTC)Reply
after e/c: @User:Benwing2 How exactly is, say, Template:tracking/parameters/empty parameter used? Is there no other way to track 4.4 million empty parameters in templates, if that is indeed what is being tracked? What is the point of tracking them? I don't see how this (empty parameters) is even a problem at all. It isn't really even diagnostic of anything of substance. Do we need another user space for this kind of thing or just to rethink this tracking with more respect for the tools MW software provides and users use or at least used when they were usable? DCDuring (talk) 01:29, 8 April 2024 (UTC)Reply
@DCDuring: Special:WhatLinksHere/Template:tracking/parameters/empty parameter, not to mention with hastemplate:"tracking/parameters/empty parameter" in searches. There may be some way for bots to use it, too. Chuck Entz (talk) 01:50, 8 April 2024 (UTC)Reply
@Chuck Entz I can find many thousands of instances of pages with that pseudo-template and Template:taxon, but what is the actual problem that category membership is diagnostic of? In Animalia the offending template is {{der}}, the documentation for which says that parameter 4 is optional. If 2/3 of all entries "want" the pseudo-template, how useful is it? We have lots of templates that have documented use of such optional numbered parameters. Is this kind of shoddiness to be found often in this morass? This seems like a case of boiling the ocean to me. DCDuring (talk) 02:13, 8 April 2024 (UTC)Reply
@DCDuring It should be possible to exclude instances like that, which we already do elsewhere, since it's clearly a legitimate way to use empty parameters.
The reason why it's good to track this kind of thing is so you can see if things are likely to blow up if you turn on unused parameter checking for a particular template (which is helpful to catch mistakes), but that's done in a somewhat more intelligent way, and the two clearly should be aligned. Theknightwho (talk) 05:57, 8 April 2024 (UTC)Reply
@-sche @JeffDoozan I'm pretty sure this is possible to do, but it may take some time for me to work out the kinks. Theknightwho (talk) 05:49, 8 April 2024 (UTC)Reply
OK. FWIW, I was thinking it'd be useful if a category was generated "server/software-side" like Category:Pages with broken file links, which gets updated automatically in near-real time, so if you're thinking of generating such a category "locally" (with the parser, etc) instead, then I would say that at least in my opinion, that's not a priority, if you have other things to work on, or if it would take resources (Lua memory, etc), or if it'd be liable to have unintended side effects like auto-categorizing multi-script terms initially turned out to, since we have TTO's lists, and (hopefully) will soon have Special:WantedTemplates again. - -sche (discuss) 13:46, 8 April 2024 (UTC)Reply
@-sche I was thinking of integrating it into the parse which is already done by Module:headword/page, but @Benwing2's solution makes more sense. Theknightwho (talk) 15:10, 8 April 2024 (UTC)Reply
  • How can normal contributors who are unaware of the existence and names of all the tracking templates (obviously many more than 5,000 of them) make use of them for any purpose? For me, a useful bi-weekly MW run to populate one of the special pages has been rendered useless. Can we be assured that any need we feel for a list of entries that have some kind of problem will be quickly responded to by our techno-mavens? I think not. This seems like a reversion of capabilities.
Can those who can explain how useful these things are do whatever it takes to get MediaWiki to do/permit whatever is needed to restore the reversion and make room for technical innovation? Can @User:-sche explain it to them? If not, can someone else (BW, TKW, JD, TTO, ?) do it? DCDuring (talk) 16:20, 8 April 2024 (UTC)Reply

How can normal contributors who are unaware of the existence and names of all the tracking templates (obviously many more than 5,000 of them) make use of them for any purpose?

I don't think you can. They are for Lua developers. Now that the tracking system has been moved to a different namespace, Special:WantedTemplates will gradually start to become more and more meaningful, and once MediaWiki finishes processing the changeover, it's unlikely you will come across or notice the tracking templates ever again.
How do Lua developers keep track of them?
What is your best guess as to when these will be gone from Special:WantedTemplates? That listing is updated, I believe, every 2 weeks. I wonder whether I will be alive to see it emptied of these. DCDuring (talk) 12:30, 9 April 2024 (UTC)Reply
At the current rate of several hundred per minute, it will probably be a week or two. Template:tracking/parameters/empty parameter is currently at 3,450,164 Chuck Entz (talk) 13:40, 9 April 2024 (UTC)Reply
3,418,788 now. ~300/minute, which implies 8 days to completion at linear pace. DCDuring (talk) 15:25, 9 April 2024 (UTC)Reply
It has always seemed to me that the amount of decrease per hour for such background updating diminished over time, but not quite exponential decay. Exponential decay would imply months or longer for total emptying. Does anyone understand the specifics of this updating process? DCDuring (talk) 14:12, 9 April 2024 (UTC)Reply
3,113,477 at 01:21, 2024 April 10. But the least-frequent item is wanted only 4 times vs. 14 times when the process began. DCDuring (talk) 19:41, 10 April 2024 (UTC)Reply
2,234,457 now, nearly halfway to being excluded from the page. Least frequent items are reported at 3, but some have 2, 1, or 0! links remaining, which means that many truly "wanted" templates will become visible the next time the 5,000-item list is generated. DCDuring (talk) 00:56, 12 April 2024 (UTC)Reply
767,747 now. Clearly not simply linear. If it were, there would be none left. The pace of removal is much slower. DCDuring (talk) 11:59, 20 April 2024 (UTC)Reply
731,379 now. DCDuring (talk) 01:00, 22 April 2024 (UTC)Reply
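The counts reported in this thread can be turned into removal rates to check the "not simply linear" observation. The Python sketch below assumes each count was read at the timestamp of the comment reporting it (only the 3,113,477 figure has an explicitly stated reading time), so the rates are rough.

```python
from datetime import datetime

# (time of reading in UTC, remaining links to the tracking template), taken from the comments above.
OBSERVATIONS = [
    (datetime(2024, 4, 9, 13, 40), 3_450_164),
    (datetime(2024, 4, 9, 15, 25), 3_418_788),
    (datetime(2024, 4, 10, 1, 21), 3_113_477),
    (datetime(2024, 4, 12, 0, 56), 2_234_457),
    (datetime(2024, 4, 20, 11, 59), 767_747),
    (datetime(2024, 4, 22, 1, 0), 731_379),
]


def removal_rates(observations):
    """Average links removed per minute between consecutive readings."""
    rates = []
    for (t0, n0), (t1, n1) in zip(observations, observations[1:]):
        minutes = (t1 - t0).total_seconds() / 60
        rates.append((n0 - n1) / minutes)
    return rates


def days_to_empty(remaining: int, per_minute: float) -> float:
    """Days needed to clear the backlog if the rate stayed constant."""
    return remaining / per_minute / (60 * 24)


for (start, _), rate in zip(OBSERVATIONS, removal_rates(OBSERVATIONS)):
    print(f"from {start:%Y-%m-%d %H:%M}: {rate:7.1f} links/minute")
print(f"{days_to_empty(3_418_788, 300):.1f} days at a constant 300/minute")
```

The first interval reproduces the ~300/minute figure and the roughly 8-day linear estimate, while the last interval comes out far lower, which is consistent with the slowdown described above.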
@DCDuring: Try it now. I figured out how to change the parameters in the API Sandbox setup linked to from CAT:E so that it purges pages that transclude this. I have it set to purge 75 pages at a time, which takes about 20-30 seconds from when you click "Make request" (it times out if it hits 30 seconds). Of course, after a while you feel like one of those lab rats they trained to press a lever for a food pellet, so it's not practical to clear it all at once that way. I would recommend setting it up in a separate browser tab or window so you can click on it every once in a while and go do something else. Chuck Entz (talk) 17:44, 20 April 2024 (UTC)Reply
Oops! Forgot the link: [8]Chuck Entz (talk) 18:41, 20 April 2024 (UTC)Reply
More usefully, you can click on generator=embeddedin and replace the geititle field with the tracking template you want to remove from Wanted templates, then click "Make request". If it has fewer than 75 links, running this once or twice will clear it. Chuck Entz (talk) 19:11, 20 April 2024 (UTC)Reply
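For anyone who prefers to see the request spelled out, the Python sketch below builds the kind of parameter set described above: a purge driven by the embeddedin generator, 75 pages at a time. The exact parameters behind the CAT:E link are not quoted in this thread, so including forcelinkupdate and JSON output is an assumption. The code only constructs the request body and sends nothing; an actual purge has to be submitted as a POST.

```python
from urllib.parse import urlencode

API_ENDPOINT = "https://en.wiktionary.org/w/api.php"


def purge_request_params(tracking_title: str, batch_size: int = 75) -> dict:
    """Parameters for purging pages that transclude the given tracking page."""
    return {
        "action": "purge",
        "format": "json",
        "forcelinkupdate": "1",          # assumed: refresh link tables as well
        "generator": "embeddedin",       # select pages by what they transclude
        "geititle": tracking_title,      # the page whose transclusions are listed
        "geilimit": str(batch_size),     # 75 per request, as in the comment above
    }


params = purge_request_params("Template:tracking/parameters/empty parameter")
body = urlencode(params)  # form-encoded body for a POST to API_ENDPOINT
print(API_ENDPOINT)
print(body)
```

Repeating the request works through the backlog batch by batch, which is the manual "click every once in a while" loop described above.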
@DCDuring I assume that what happens is that when you make a change to a module, it computes the graph of all pages needing recomputing and queues them up for recomputation. That means eventually all the pages will get recomputed. But a strict first-in-first-out approach wouldn't be fair, because in a case like this, where a change is made that touches a very large number of pages, it would block all other recomputation requests until all the large number of pages get recomputed. So those other changes must get prioritized ahead esp. if smaller, which means over time the remaining pages to compute will get more and more diluted by other requests. It can't be strictly exponential in its decay but there clearly is a decay. In a case like this I can do a purge operation to force this to speed up, but it's not clear it's worth it as all wanted templates are now represented in the list (the max-5000 list goes up only to 4,528 wanted templates). Benwing2 (talk) 01:12, 22 April 2024 (UTC)Reply
On top of that, I've done hundreds of null edits on the 1-link and 2-link items, so I'm confident that the tracking templates will only number in the hundreds on the next list- mostly clustered at the top of the list, but fewer in number than some other types. For instance, old discussions dating to the era when language codes were separate templates are probably responsible for hundreds of items in Wanted templates.
As for the pattern of removals: my take on it is that the automated part is very slow, but the fact that these are so widely distributed means that the higher-traffic pages have been cleared by unrelated edits first. Thus the quick ones get done quickly, leaving the rarely-edited, slow ones to make up a higher and higher proportion of the remainder. Also, the more of these there are on one page, the higher the number that gets cleared by a single edit, and the higher the likelihood that a null edit or a purge will be performed on it. That means that the thinly distributed and out-of-the-way ones will be the last to go. Chuck Entz (talk) 02:55, 22 April 2024 (UTC)Reply
An algorithm that was seeking to shorten the list would only work on the least frequent items that appear on the list or the least frequent items that do not appear on the list and leave the items that needed millions of changes for last. There might be some other ordering principle, like last edit date/time. In my ignorance, I would consider it random until authoritatively told otherwise. The effort to create an ordered list of items by frequency would probably not be worth doing multiple times, as evidenced by the relatively infrequent updates of many of the special pages. DCDuring (talk) 12:08, 22 April 2024 (UTC)Reply
What I was talking about assumes random ordering. When anyone edits a page, all the waiting changes are processed independently of the automated processes during the saving of their edit. That's why we do null edits. These unrelated, unscheduled edits by normal editors are responsible for the faster pace at the beginning- the pages more likely to be edited are more likely to be cleared ahead of schedule, but once they're edited, they drop off the list. That leaves the ones that aren't edited to be processed by the slower, scheduled automated process. In other words, you have the completely random scheduled process combined with the work by people who just coincidentally edit one of these pages. The choice of pages edited by humans isn't random- no one bothers to edit a form-of entry that's already in good shape, but translations get added to English entries for basic concepts all the time. Chuck Entz (talk) 12:56, 22 April 2024 (UTC)Reply
Do we get 100,000 edits per day? (There have been only 100,000,000 edits since enwikt began.) At 100,000 edits per day, we would have only had 1.5MM since the change. I expect that would mean many fewer than 1.0MM (~0.5MM?) entries have been edited. That would mean that the automated process is doing the bulk of the work. Do bot edits count as edits for these purposes? DCDuring (talk) 17:18, 23 April 2024 (UTC)Reply

Can we be assured that any need we feel for a list of entries that have some kind of problem will be quickly responded to by our techno-mavens?

I'm always open for requests for a new WT:Todo/Lists list, although my time is finite. This, that and the other (talk) 11:57, 9 April 2024 (UTC)Reply
Thanks for the offer. I'll try not to take up your time frivolously, but I am not good at guessing at what is hard and what is easy. Also, JeffDoozan has waded into taxonomy, so working with him to perfect and extend what he has already done is probably my best course. I can also do a lot with Cirrus regex searches. DCDuring (talk) 12:30, 9 April 2024 (UTC)Reply
It occurs to me... as the Special pages update, won't moving tracking "templates" from the Template: namespace to the Wiktionary: namespace just mean they're going to swamp Special:WantedPages instead of Special:WantedTemplates? Pages in the Wiktionary namespace do seem to show up in Special:WantedPages, e.g. Wiktionary:Ushojo transliteration. Will something stop the tracking "templates" from showing up there? Perhaps it would be prudent to evaluate how many tracking templates we actually need, and whether we are actually getting use out of 5,000+...? (Alternatively, do they have to be wanted pages, redlinks? Could we clean them out of the Special pages by just mass-creating the pages, having a bunch of empty subpages of "Wiktionary:Tracking/..."?) - -sche (discuss) 23:17, 13 April 2024 (UTC)Reply
@-sche I don't think that's going to happen. By now there should be lots of such pages in Special:WantedPages but in fact there are none, so something in the way the tracking mechanism works must not be triggering Special:WantedPages (I think it's because they're only linked to and not actually transcluded). Benwing2 (talk) 23:34, 13 April 2024 (UTC)Reply
@Benwing2 vice versa actually: WantedPages only tracks links. WantedTemplates only tracks transclusions. This, that and the other (talk) 05:10, 14 April 2024 (UTC)Reply
@This, that and the other I see. I guess what's happening then is that expandTemplates is transcluding the tracking page and throwing away the result, and Special:WantedTemplates only includes pages in Template space, so the tracking pages don't end up anywhere (as is desired). Benwing2 (talk) 05:13, 14 April 2024 (UTC)Reply
@User:Benwing2 Special:WantedTemplates is, even now, dated 11 April 2024, so we should probably be waiting for its next update before celebrating victory. DCDuring (talk) 11:53, 20 April 2024 (UTC)Reply
@DCDuring Are you sure? It's dated 00:27 19 April 2024 for me. Benwing2 (talk) 19:38, 20 April 2024 (UTC)Reply
@User:Benwing2 Sorry, I meant to type Special:WantedPages, which still has 11 April 2024 as date of last update. I was thinking of -sche's concern about merely shifting the problem from WantedTemplates to WantedPages. Conclusive evidence may not be in until next update. DCDuring (talk) 20:19, 20 April 2024 (UTC)Reply
@DCDuring I see. I think this page must update 3x/month (whereas the others update every 3 days); this presumably means there should be an update tomorrow. Benwing2 (talk) 20:21, 20 April 2024 (UTC)Reply
@-sche: There was an incident not that long ago when someone created one of these tracking templates and used it to insert objectionable material on all of the pages that transcluded it. There's now an abuse filter to prevent that recurring. I don't know if there's any difference between a page with no content and a redlink as far as the "expand templates" trick is concerned, but I would want to be real sure that it wouldn't have unwanted side effects before doing what you're suggesting. Chuck Entz (talk) 23:37, 13 April 2024 (UTC)Reply
For any admins who don't know what I'm talking about, I managed to find the deleted page: Template:tracking/links/redundant wikilink. I'm not going to undelete it because it was an attack page about someone offwiki that the perpetrator wanted to force every site visitor to see. It does show the need to be careful about this kind of thing.Chuck Entz (talk) 00:29, 14 April 2024 (UTC)Reply

suppress talk page links to old templates

As tracking templates clear out of Special:WantedTemplates, I notice that another source of cruft filling it up is links to old templates on talk pages and the like: Template:ru-noun-old, Template:onym, Template:pos_vi, etc are not "wanted" anywhere in mainspace, but are mentioned/linked on some talk pages, so they seem to be most of what's on the Special page apart from tracking templates. Do we want to (1) systematically unlink these, and/or (2) request a category after all, with capacity to "hide" "wantings" outside of mainspace, the way module errors in unimportant namespaces are in a subcategory rather than the main CAT:E? On one hand, 2 is how broken file links, module errors, parser function errors, etc are handled; on the other hand, 1 would address the fact that it doesn't really make sense to leave Special:WantedTemplates in a permanently unusable or cruft-filled state. - -sche (discuss) 15:16, 10 April 2024 (UTC)Reply

Clearly a good idea. If they can't be suppressed by changing how WantedTemplates works, perhaps we could have a mass replacement of the template link instances wherever they occur by something that doesn't link, essentially a nowiki wrapper. In the event an admin-archaeologist wants to find the old template, they could, after all, type the template name into the search box. DCDuring (talk) 02:30, 14 April 2024 (UTC)Reply
My concern about this is that it would essentially render some old discussions unintelligible (though this is already the case where templates have been broken etc). It might be better to replace the templates in those discussions with the current equivalents (where that's possible to do). Theknightwho (talk) 03:45, 14 April 2024 (UTC)Reply
I have to question the intent of this push. Isn't it less effort to simply use Wiktionary:Todo/Lists/Entries using nonexistent templates, which in my mind exactly reflects the Special:WantedTemplates output that is ultimately desired? Of course, I acknowledge -sche's point that this list is dependent on an external system, unlike the native WantedTemplates, but I don't plan on disappearing any time soon. This, that and the other (talk) 05:18, 14 April 2024 (UTC)Reply
@User:Theknightwho One implication of the unintelligibility argument is that we should stop deleting templates and restore all those that were actually implemented and have some functionality. The discussions that are the locations of many of the links are already unintelligible to someone who doesn't remember the templates because the templates have been deleted or never implemented and their invocation cannot display their former functionality. Maybe we could link them to an archived copy of the modules and/or other templates (and CSS, etc.) they used when they were functional. Try looking at the links for a few of these.
@User:This, that and the other The vast majority of the 'wanted templates' (other than the thousands of "Template:tracking templates" that remain) that we are once again beginning to see are templates that have been deleted or were proposed and not implemented, at least not under the redlinked name. Many Talk, Wiktionary, User talk, User pages (eg, sandbox), and some others have discussions that mention them, usually in discussions of their deletion and/or replacement or of the proposed functionality. They are (almost?) entirely gone from principal namespace and those remaining instances in principal namespace are exactly what we would hope the WantedTemplates would help us find, once we are able to see the very-low frequency-of-occurrence templates.
Admittedly, the value to admin-archaeologists of these discussions is slightly diminished in that there is no direct link to the deleted template page, where they could try to begin to reconstruct the functionality should they wish to do so. They would have to type the template name into the search box.
Special:WantedTemplates will be in its ideal state when it is empty of templates wanted in principal namespace. It is unlikely to reach that state when it is filled with the crufty, irremediable 'wants' that are still the vast majority of the items in that page. DCDuring (talk) 14:37, 14 April 2024 (UTC)Reply

Another maintenance category

@Benwing2: Where LANG is a dummy substring, can we please have Category:Pages using bad params when calling LANG templates made a subcategory of Category:LANG_entry_maintenance. The contents of the contents of the former will often be of interest to those who review entries. --RichardW57m (talk) 11:55, 8 April 2024 (UTC)Reply

@RichardW57m Done. Benwing2 (talk) 21:45, 8 April 2024 (UTC)Reply
@Benwing2: Thank you. And thank you for making the language the chief part of the key within Category:Pages using bad params when calling a template. --RichardW57 (talk) 06:32, 9 April 2024 (UTC)Reply

Toki Pona auto hyphenation

I'm creating Template:tok-IPA (based on Template:eo-IPA in function) and the only missing piece is hyphenation, as @Spenĉjo and I don't know any Lua. For anyone who does, this should hopefully be simple because of the regular (C)V(n) syllable structure (list of letters; possible test cases: a‧nu, an‧pa, si‧te‧len, ki‧je‧te‧san‧ta‧ka‧lu). Thanks in advance if anyone can help! AgentMuffin4 (talk) 01:34, 9 April 2024 (UTC)Reply
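As a rough illustration of the request above, here is a minimal Python sketch of (C)V(n) syllabification (not the Lua module that was eventually written). It assumes lowercase input, the letter inventory a e i o u / j k l m n p s t w, and that an "n" counts as a coda only when no vowel follows it; the function names are made up for this example.

```python
import re

# One syllable: optional consonant, one vowel, and an optional coda "n"
# that is kept only when the next character is not a vowel.
SYLLABLE = re.compile(r"[jklmnpstw]?[aeiou](?:n(?![aeiou]))?")


def hyphenate_word(word, sep="‧"):
    """Split a single lowercase word into (C)V(n) syllables."""
    syllables = SYLLABLE.findall(word)
    # If the matches do not cover the whole word, it does not fit the pattern.
    if "".join(syllables) != word:
        raise ValueError(f"not a (C)V(n) word: {word!r}")
    return sep.join(syllables)


def hyphenate(term, sep="‧"):
    """Hyphenate each space-separated word of a term independently."""
    return " ".join(hyphenate_word(word, sep) for word in term.split())
```

Calling `hyphenate("kijetesantakalu")` walks the word left to right, so the coda "n" in "san" is kept because a "t" follows it, while the "n" in "anu" starts the next syllable.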

@AgentMuffin4 Hey, has anyone reached out about doing this yet? Chernorizets already wrote a very sensible syllabifier for Bulgarian last summer, which I ported over to Lua, and I think we can very nicely adapt it to Toki Pona if no one's done it yet! In fact, it could be even easier than needing to port it, since having that exact syllable structure makes it even more uniform than Bulgarian. Kiril kovachev (talkcontribs) 21:03, 17 April 2024 (UTC)Reply
@AgentMuffin4 Update, I just went ahead and did it lol. Please check out Module:User:Kiril kovachev/tok-hyph. We can change the name to "tok-syllab" or something else, but this is basically how it works. If you want to integrate it into the tok-IPA template, you can just use {{tok-hyph}} or invoke the module directly.
I'm now in the process of porting this to the mainspace - hope this is okay! Kiril kovachev (talkcontribs) 22:51, 17 April 2024 (UTC)Reply
Excellent, thanks! I managed to use it to fix an oversight with the template, as well (the IPA stress marker appearing for monosyllables). AgentMuffin4 (talk) 00:17, 18 April 2024 (UTC)Reply
@AgentMuffin4 Nice one. Do you think we should try to summarize the existing pronunciation sections using this template instead? I noticed last night there were only about 4 entries using it, but you look to have proliferated it a bit onto some more — do we want this on all the Toki Pona entries? Kiril kovachev (talkcontribs) 20:55, 18 April 2024 (UTC)Reply
I think on all the one-word entries at least, unless we want to refactor the whole thing to auto-process multiword terms, which doesn't seem urgently needed. (Also, for reference, we're similarly replacing the giant sitelen pona images with {{tok-sitelen}} under ===Glyph origin===.) AgentMuffin4 (talk) 22:04, 18 April 2024 (UTC)Reply
@AgentMuffin4 Okay, that's good — I don't know sufficient template code to work on any upgrades to the IPA part (is there anything else that needs to change for multiword terms?), but fortunately the syllabification logic already does work for multiword terms, in case we do ever feel the need to deploy it for them as well. Kiril kovachev (talkcontribs) 17:39, 19 April 2024 (UTC)Reply
It just adds a leading stress marker if the syllabification has any hyphenation point. So on mi tawa, it returns ˈmi tawa instead of mi ˈtawa, checking the whole string instead of iterating over each word.
I guess if this were Lua'd, then for the default secondary transcription (as on nanpa), that output could be fed into another function that replaces np with mp, nk with ŋk, and nj with ɲ(j), since that part of the template code is currently messy. Then, the rest could conceivably be handled at the template level, with an {{#ifeq:}} to check whether the broad and narrow transcriptions are actually different.
I expected there to be other problems with using the template on multiword terms, but I suppose if the other lines work, and if you're willing to write those extra functions, we might as well equip it for them. I'm still fine either way. AgentMuffin4 (talk) 20:47, 19 April 2024 (UTC)Reply
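The per-word stress placement and the narrow-transcription replacements described above could be sketched in Python roughly as follows. This is only an illustration, not the template's actual logic: it treats the spelling as the broad transcription, counts one syllable per vowel, and all function names are hypothetical.

```python
VOWELS = "aeiou"

# Replacements for the narrow transcription, as listed in the comment above.
NARROW_RULES = [("np", "mp"), ("nk", "ŋk"), ("nj", "ɲ(j)")]


def stress_each_word(term):
    """Prefix the stress mark to every word that has more than one syllable."""
    marked = []
    for word in term.split():
        syllable_count = sum(ch in VOWELS for ch in word)
        marked.append("ˈ" + word if syllable_count > 1 else word)
    return " ".join(marked)


def narrow(broad):
    """Apply the assimilation replacements to a broad transcription."""
    for old, new in NARROW_RULES:
        broad = broad.replace(old, new)
    return broad


def transcriptions(term):
    """Return (broad,) or (broad, narrow), dropping the narrow form if identical."""
    broad = stress_each_word(term)
    detailed = narrow(broad)
    return (broad,) if detailed == broad else (broad, detailed)
```

Iterating over the words, instead of checking the whole string once, is what moves the stress mark in mi tawa onto the second word. The final comparison plays the role of the {{#ifeq:}} check mentioned above.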

Lua error

It seems most Wiktionary pages display Lua errors for some reason. Kwékwlos (talk) 23:17, 9 April 2024 (UTC)Reply

This was because Module:utilities and Module:utilities/data got into an infinite loop for a brief period, because each would try to unconditionally load the other when first loaded. I've changed Module:utilities so that it only loads Module:utilities/data when it's actually needed for something. Theknightwho (talk) 00:04, 10 April 2024 (UTC)Reply
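The difference between unconditional and on-demand loading can be illustrated with a small Python model. It is only a sketch: the `make_require` loader, the dictionaries and the loop detection are invented for this example and do not reflect how Scribunto loads modules.

```python
class LoadLoopError(RuntimeError):
    """Raised when a module is required again while it is still loading."""


def make_require(definitions):
    """Build a toy `require` over a dict of name -> loader(require)."""
    loaded = {}
    loading = set()

    def require(name):
        if name in loaded:
            return loaded[name]
        if name in loading:
            raise LoadLoopError(name)
        loading.add(name)
        try:
            loaded[name] = definitions[name](require)
        finally:
            loading.discard(name)
        return loaded[name]

    return require


# Eager: each module requires the other as soon as it is loaded.
eager = {
    "utilities": lambda require: {"data": require("utilities/data")},
    "utilities/data": lambda require: {"utils": require("utilities")},
}

# Lazy: "utilities" defers the require until the data is actually needed.
lazy = {
    "utilities": lambda require: {"get_data": lambda: require("utilities/data")},
    "utilities/data": lambda require: {"utils": require("utilities")},
}


def loads_cleanly(definitions, name):
    """Return True if `name` can be loaded without hitting a load loop."""
    try:
        make_require(definitions)(name)
        return True
    except LoadLoopError:
        return False
```

In the eager version neither module can finish loading before the other one starts; in the lazy version "utilities" finishes first, so the later request from "utilities/data" finds it already loaded.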

triple brace abuse filter

Just received notice of this. There has been previous discussion, but nothing current. At the time of writing, the editing guidelines recommend the use of triple braces as the "currently preferred method." https://en.wiktionary.org/wiki/Wiktionary:Templates#Formatting_the_headword 203.158.37.134 09:03, 10 April 2024 (UTC)Reply

Wow. This wasn’t current ten years ago, and I don’t see how it ever was. I think the documentation author tried to say that one can pass a modified headword there; he should have used var tags to express this parameter. Fay Freak (talk) 09:58, 10 April 2024 (UTC)Reply
I am too chill to rewrite the documentation page today, and another option, moving for its deletion, I am not gonna pursue since evidently people use the page, which contains references to necessary templates, as it is general to up-to-date pages for particular languages like WT:About Arabic. Somebody needs to go through it with the bulldozer, like I want to delete the whole section about headwords and also the same about inflections implying that one would add inflections outside of template, so I encourage some similar action. Fay Freak (talk) 10:07, 10 April 2024 (UTC)Reply
That help page is targeted towards people writing templates, not for dictionary entries. — SURJECTION / T / C / L / 13:13, 10 April 2024 (UTC)Reply
We should probably get rid of the "older method" and "still older method", which are both totally unacceptable in headword templates as they lack any kind of categorisation. The only thing the page has to say is that the second is "deprecated and discouraged", which isn't enough. Theknightwho (talk) 17:13, 10 April 2024 (UTC)Reply
Yep, bulldozer still; on the other hand, which template authors does this page speak to? Nobody can do anything about infrastructure as complicated as that behind {{head}} without reading lots of modules; this page won’t teach a template author a damn thing, only divert his attention, so you can argue for deletion. It is redundant to categories, some sections should be moved (e.g. etymology templates to → Wiktionary:Etymology, others already refer to other project pages or categories), and it is misunderstood. Cumulatively there is a lot against this page. Fay Freak (talk) 17:32, 10 April 2024 (UTC)Reply

User talk pages in CAT:E

I'm used to the occasional appearance of userspace pages in CAT:PFE due to the mysteries of Wikimedia transclusion updating, but those go away with a null edit. What I'm seeing now is user talk pages showing up in Category:Pages with module errors instead of Category:Pages with module errors/hidden even after a null edit. What has changed? Chuck Entz (talk) 14:16, 10 April 2024 (UTC)Reply

@Theknightwho: I see you've been changing things in the MediaWiki namespace. If nothing else, having a Lua module decide which category module errors go in is asking for trouble- what happens if your Lua module has a module error? Chuck Entz (talk) 15:09, 10 April 2024 (UTC)Reply
@Chuck Entz This is caused by an annoying bug in MediaWiki: the module checks whether the current namespace is a talk namespace by checking whether title.nsText (the current namespace) is the same as title.talkNsText (the talk namespace for that particular namespace), but for user talkpages, the first is "User talk" and the second is "User_talk", so it fails the equality check. The reason I did it like this is that that was how the old template worked, but it turns out in Lua there's a very simple title.isTalkPage check you can do instead, which is all we really care about.
In terms of using a Lua module: the reason I converted this template to Lua is because (a) it gives us much, much finer control over which pages go into the "hidden" category versus those which don't, (b) it gives us a Lua interface to call it from other modules, which Module:headword/page is now using, (c) I did check beforehand, and it seems to be exempt from things like out-of-memory errors when called automatically after things like that happen - presumably because it's done in a special way, and (d) the old version was just as prone to parser function issues anyway. However, to make sure we never run into that situation, I'll integrate some kind of error-catching mechanism so that it never ends up throwing raw errors directly when called from the template, since that's how the automatic error system uses it. Theknightwho (talk) 16:59, 10 April 2024 (UTC)Reply
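The namespace comparison described above fails purely because of a space versus an underscore. A small Python stand-in shows the effect; the `Title` class and its field names are hypothetical mirrors of the properties named in the comment, not the real Scribunto title object.

```python
from dataclasses import dataclass


@dataclass
class Title:
    """Hypothetical stand-in for a title object."""
    ns_text: str        # current namespace, e.g. "User talk"
    talk_ns_text: str   # talk namespace for it, e.g. "User_talk"
    is_talk_page: bool  # dedicated flag, analogous to title.isTalkPage


def is_talk_naive(title):
    """The equality check that fails for user talk pages."""
    return title.ns_text == title.talk_ns_text


def is_talk_normalised(title):
    """Workaround: treat spaces and underscores as equivalent before comparing."""
    def norm(text):
        return text.replace("_", " ")
    return norm(title.ns_text) == norm(title.talk_ns_text)


user_talk = Title("User talk", "User_talk", True)
plain_talk = Title("Talk", "Talk", True)
```

Here `is_talk_naive(user_talk)` is false while `is_talk_naive(plain_talk)` is true, which matches the symptom that only namespaces with a space in the name were miscategorised. Reading the dedicated flag avoids the string comparison altogether.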
@Chuck Entz, Benwing2, Erutuon, This, that and the other, Surjection I've come across an underlying bug in MediaWiki's error handling. Currently, error handling is determined by two kinds of special pages in the MediaWiki namespace:
  1. Pages like MediaWiki:Scribunto-common-error-category, which contain the name of a category, and the page is categorised when a certain event is triggered. These pages simply contain {{maintenance category|category name}}, which determines whether it should be the hidden category or not.
  2. Pages like MediaWiki:Pfunc expr unrecognised word, which contain an error message, and that error message also happens to contain a category. These ones use {{maintenance category|category name|cat=1}}, which returns the category as a standard category link.
The first type of page works fine, but the second will treat the current page as "Special:BadTitle/Missing" unless an error which works in the first way already exists further up the page. This means that, for example if you put {{#expr:foo}} at the top of a talkpage (which gives an "unrecognized word" error of the second kind), it will be categorised in the unhidden category, since we don't hide errors in the Special namespace. If you then put {{#invoke:bar}} above it (which causes a Scribunto error; an error of the first type), it then gets categorised in the hidden category. However, if you move that invoke below, then it's unhidden again.
Importantly, this bug is not related to using the new Lua module: it affects all the magic words like {{PAGENAME}} as well. Theknightwho (talk) 18:34, 10 April 2024 (UTC)Reply
Pinging @Umherirrender who might know about this type of thing. The offending MediaWiki code appears to be here. This, that and the other (talk) 23:38, 10 April 2024 (UTC)Reply
PAGENAME only works correctly for real tracking categories added by the software (like those of Scribunto or others listed on Special:TrackingCategories), not for categories added with the help of messages (like pfunc here). Task T25959 exists for using real tracking categories in the ParserFunctions extension.
The title used for PAGENAME is set on the parser when the message is transformed, but it seems that it sometimes gets reused; I created T362364 for a solution. Der Umherirrende (talk) 21:36, 11 April 2024 (UTC)Reply

clear out Special:WantedPages next

Following up on the partial cleaning-out of Special:WantedTemplates (above), I notice that Special:WantedPages is currently filled with a lot of things like "Module:labels/data/lang/mk/functions"; can we clean those out? I also notice there are a lot of links to SOP-seeming strings like administrative atolls, regional units, unincorporated communities, autonomous islands, and that these are coming from pages like Category:Chemical elements for no clear reason (maybe they're linked in the modules that generate category boilerplate, so every category is treated as linking to them??), similarly Unsupported titles/`lcub``lcub``lcub`1`rcub``rcub``rcub` is supposedly linked-to from 1,800 pages such as Template:sv-noun-irreg-c; can we clear up why that's happening? - -sche (discuss) 15:27, 10 April 2024 (UTC)Reply

  •   Support Overdue. Many are not at all correctable manually. Seems to be some kind of artifact of some module generating some kind of inherited link. Whenever there are 500K+ members of such, it's a good bet that some automagical process of the black variety is generating it. {{auto cat}} is often involved. DCDuring (talk) 16:23, 10 April 2024 (UTC)Reply
    Some of the problem seems to simply be that fewer contributors are motivated to add the pages than are motivated to add the wants (bright shiny objects). Generating many of the pages would seem almost as automatable as generating the wants. DCDuring (talk) 16:46, 10 April 2024 (UTC)Reply
@-sche The reason why that's happening is that that's the procedurally-generated title for the hypothetical page {{{1}}}, which means that someone has probably put that into a link template somewhere. Theknightwho (talk) 17:06, 10 April 2024 (UTC)Reply
So, how does one find the problem and eradicate it? If it occurs in Module space, few of us can be trusted to correct it. DCDuring (talk) 17:46, 10 April 2024 (UTC)Reply
Looking for "unincorporated communities" in Module space one finds
  • Module:place/shared-data
"[[census-designated place]]s", ["unincorporated communities"] = "[[w:unincorporated community|unincorporated communities]]", ["places"] = "places of all... 128 KB (13,792 words) - 16:24, 2024 March 12
  • Module:category tree/topic cat/data/Places
[[city]]"}, {"towns", {"polities"}}, {"townships", {"polities"}}, {"unincorporated communities", {"places"}}, {"valleys", {"places", "water"}}, {"villages"... 28 KB (3,599 words) - 20:53, 2023 November 12
Also some in Module:User. None have square double brackets, so some code must add the brackets. Who other than our technomavens can save us from this kind of problem? DCDuring (talk) 17:57, 10 April 2024 (UTC)Reply
@-sche @Theknightwho regarding Unsupported titles/`lcub``lcub``lcub`1`rcub``rcub``rcub` and friends, the todo list WT:Todo/Lists/Entries linking to raw template syntax is relevant. Unfortunately the entries on this todo list are very difficult to clean up, because the large majority of entries relate to inflection templates that have not been filled in, and these require knowledge of the language in question. Moreover, the todo list is a bit of a mess due to the limitations of the way it's generated. I have ideas to improve this list so it's easier to work through – showing the template name, for instance. This, that and the other (talk) 22:49, 10 April 2024 (UTC)Reply
A lot of the pages I noticed "wanting" such things were in userspace, and sometimes quite old; perhaps we could HTML-comment-out or otherwise suppress 'linking' on such pages (old userspace and sandbox pages). - -sche (discuss) 22:55, 10 April 2024 (UTC)Reply
I've done that to the most repetitive lists on my own user subpages. The easiest cases should be the ones that have dates on them, for which we could comment out or nowiki all but the most recent page. We could also solicit the views of the user. DCDuring (talk) 00:51, 11 April 2024 (UTC)Reply
Intermittently, I go to these special pages reports to clear them out and there are so many pieces of old cruft that have been hanging around for a decade-plus. —Justin (koavf)TCM 04:27, 11 April 2024 (UTC)Reply
@-sche @DCDuring @Theknightwho The things like Module:labels/data/lang/mk/functions are because the code in Module:labels checks for the existence of a .../functions module and loads it if so, to get a postprocessing function. Currently the only one that exists is Module:labels/data/lang/zh/functions. This doesn't occur with e.g. nonexistent versions of Module:labels/data/lang/LANG because we have a manually curated list of all the languages with data modules (this exists because I thought it would reduce memory and/or be faster but I don't think it is). We could create such a manually curated list for functions modules as well but it would be extra manual effort for no gain other than keeping Special:WantedPages cleaner. As for the things like unincorporated communities, that is coming from the {{place}} code and *might* be fixable, I'd have to take a look at the code (but it might not be; the code might check for the existence of a plural page before falling back to the singular version, and the check for a plural automatically generates a link that gets added to Special:WantedPages, and there's no way to tell the code to not count this particular check). In general though I think the ideal solution would be a customizable Lua function or template that can be run to determine whether to include a given page in Special:WantedPages, or failing that, a blacklist containing regexes listing the pages we don't want included. (If the blacklist could be generated on the fly, it would be essentially as good as the ideal solution.) So you might want to file a Phabricator ticket requesting this functionality. Benwing2 (talk) 09:00, 11 April 2024 (UTC)Reply
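The regex blacklist suggested above could work along these lines. This Python sketch only illustrates the filtering idea: no such hook exists for Special:WantedPages as far as this thread establishes, and the patterns and the `filter_wanted` name are invented for the example.

```python
import re

# Hypothetical blacklist: titles matching any pattern are dropped from the report.
BLACKLIST = [
    r"^Module:labels/data/lang/[^/]+/functions$",
    r"^Unsupported titles/",
]


def filter_wanted(titles, blacklist_patterns=BLACKLIST):
    """Return the wanted titles that match none of the blacklist regexes."""
    compiled = [re.compile(pattern) for pattern in blacklist_patterns]
    return [
        title for title in titles
        if not any(regex.search(title) for regex in compiled)
    ]
```

Anchoring the patterns with `^` and `$` keeps the module rule from also hiding unrelated titles that merely contain "functions".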
But why does the software check for the existence of hundreds or thousands of modules when only one exists? Who are developing them? What is the schedule for their development? Why are resources gobbled up far in advance of need? DCDuring (talk) 13:56, 11 April 2024 (UTC)Reply
@DCDuring That isn't what's happening: it's checking whether such a module exists, and uses it if it does. Theknightwho (talk) 14:01, 11 April 2024 (UTC)Reply
@User:Theknightwho But @User:Benwing2 stated above "Currently the only one that exists is Module:labels/data/lang/zh/functions." Why make it harder for the humans who use such pages to clean up errors that are beneath the notice of our error-filtering and -correcting software when there is so little benefit? DCDuring (talk) 14:36, 11 April 2024 (UTC)Reply
@DCDuring Because presumably there will be more in the future. It's brand new. Theknightwho (talk) 14:52, 11 April 2024 (UTC)Reply
@DCDuring I've changed Module:labels to use a different method to check whether the module exists which I think will stop these showing up, but I'm not certain. It's also a more efficient way of checking this anyway, so it was worth doing. Theknightwho (talk) 15:07, 11 April 2024 (UTC)Reply
Thanks. I appreciate your doing it even before you knew it would be more efficient. DCDuring (talk) 15:21, 11 April 2024 (UTC)Reply
Unfortunately, this hasn't worked - they're still showing up. Theknightwho (talk) 15:10, 13 April 2024 (UTC)Reply
The other alternative is for me to reimplement Special:WantedPages as a todo list. This has the advantage that it can be customised to exclude unneeded pages and potentially be updated more frequently. This, that and the other (talk) 23:50, 11 April 2024 (UTC)Reply
@User:This, that and the other Now that you mention it, an advantage of having our own version of this would be that we could make sure that only redlinks from principal namespace (and possibly others) were included. Getting rid of User space was mentioned by -sche, but Appendices, Wiktionary pages, Rhymes, talk pages, and others are of lesser importance. IMO, principal namespace most merits frequent updating, but any namespace might merit a run if someone was committed to work on it. Sorting by script and, to the extent possible, by language might make the lists much easier to work with. Cleaning out an entire list for a language (or anything else) one is interested in can be motivating. DCDuring (talk) 00:54, 12 April 2024 (UTC)Reply

Template garbage in drag noun senses 9 and 10

Red error text says "Template:tracking/defdate/hyphen". Equinox 19:49, 11 April 2024 (UTC)Reply

@Equinox Fixed. Benwing2 (talk) 22:24, 11 April 2024 (UTC)Reply

-ment cleanup

Hello,

The following categories do not distinguish between terms suffixed with -ment (forms adverbs) and -ment (forms nouns):

Would it be possible to have a bot run through these categories, check a given term's part of speech, and assign id2=nominal or id2=adverbial to its etymology? The senseid's are all set up and the categories are ready to be populated. Nicodene (talk) 02:12, 12 April 2024 (UTC)Reply

One can probably create a good list of the would-be members of such categories easily using Cirrus Search, depending only on uniform use of inflection-line templates. Do we really need permanent categories? DCDuring (talk) 14:27, 12 April 2024 (UTC)Reply
For readers who do not know how to do that, a proportion just a hair under 100%. Nicodene (talk) 20:55, 12 April 2024 (UTC)Reply
I was thinking that the categories are mostly useful to contributors and regular users, both classes of which might come to learn Cirrus Search (esp. regexes). I'd bet relatively few others use our categories at all. (I'd love it if we had facts about such matters.) DCDuring (talk) 14:52, 13 April 2024 (UTC)Reply
@Nicodene I implemented this but haven't run it yet because I notice under Category:Catalan terms suffixed with -ment we have both the empty categories Category:Catalan terms suffixed with -ment (nominal) and Category:Catalan terms suffixed with -ment (adverbial) that you created, as well as partly-filled categories Category:Catalan nouns suffixed with -ment‎ and Category:Catalan adverbs suffixed with -ment‎ that predate your latest changes. What do you think should be done here? Should we adopt your new naming, the old naming, or something else? Benwing2 (talk) 07:43, 14 April 2024 (UTC)Reply
@Benwing2 Thank you very much.
That’s interesting- I hadn’t noticed that about Catalan. I would favour the new naming as consistency across the languages is nice to have. Nicodene (talk) 18:01, 14 April 2024 (UTC)Reply
@Nicodene Should be done except for Middle French estrangement, which needs cleanup. Benwing2 (talk) 02:55, 15 April 2024 (UTC)Reply
Done. I suppose the confusion was because there are/were two different estrangement's. Nicodene (talk) 00:56, 16 April 2024 (UTC)Reply

Character U+0486: COMBINING CYRILLIC PSILI PNEUMATA breaks Old Cyrillic transliteration

For whatever reason it seems that any Old Cyrillic quotations that contain the character U+0486: COMBINING CYRILLIC PSILI PNEUMATA fail to render a transliteration; I've been unable to discover why. Anyone have any ideas? The relevant module is at Module:Cyrs-translit; examples of broken quotes are at даждь (daždĭ) and кънигꙑ (kŭnigy). — Vorziblix (talk · contribs) 19:06, 12 April 2024 (UTC)Reply

I believe this is because the transliterate function in Module:languages, if it finds any characters of the original script in the transliteration, removes all Latin characters from the transliteration and then checks if the language-agnostic majority script of the remaining characters is not equal to None. But it was checking if a script object (table) was equal to a string, so I changed it to compare the script code. I guess this problem has occurred for a while, but because transliteration functions usually convert all characters in the original script, the problem wasn't very prominent. — Eru·tuon 04:12, 13 April 2024 (UTC)Reply
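The kind of bug described above, comparing an object with a string, can be reproduced in a few lines of Python. This is an analogy only: the `Script` class and `get_code` method are stand-ins invented here, not the code in Module:languages.

```python
class Script:
    """Minimal stand-in for a script object that carries a script code."""

    def __init__(self, code):
        self._code = code

    def get_code(self):
        return self._code


def has_real_script_buggy(script):
    # Bug: an object never equals a string, so this is always True.
    return script != "None"


def has_real_script_fixed(script):
    # Fix: compare the script's code, which is a string, with "None".
    return script.get_code() != "None"
```

With `Script("None")`, the buggy check still reports a real script, while the fixed check does not.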
@Erutuon: Many thanks for the fix! — Vorziblix (talk · contribs) 13:37, 16 April 2024 (UTC)Reply

Exempt Template:REEHelp from CAPTCHA confirmation?

Could the REEHelp template be exempted from needing anti-robot/anti-spam CAPTCHA confirmation for adding external links? Or is there a realistic risk of this being hijacked with usage like famous.celebrity@private-emailaddress.co - OneLook - Google (BooksGroupsScholar) - WP Library or www.self-promotion-for-my-own-website.com - OneLook - Google (BooksGroupsScholar) - WP Library? —DIV (1.145.112.83 09:10, 13 April 2024 (UTC))Reply

I don't think this risk is worse than various other simple ways of accomplishing more or less the same thing. DCDuring (talk) 16:11, 13 April 2024 (UTC)Reply
Apparently we can add the relevant URLs as regexes to MediaWiki:Captcha-addurl-whitelist.
There is a workaround in the meantime though: you could create an account for yourself. That would save you a lot of trouble! This, that and the other (talk) 22:44, 13 April 2024 (UTC)Reply
@This, that and the other: Well, then, add quran.com and sunnah.com, as chosen for {{RQ:Qur'an}} and {{RQ:Sunna}}, for a beginning. It will greatly improve our closure rates and quotation coverage, you know those pesky IPs, being socialized muslim, dealing with the Qurʔān and the Sunna every day, so there will be a low-threshold motivation to expand our dictionary. I would add even more frequently needed domains, but I may have an unusual risk profile. However it be, at least some websites will be supported in the long run and do not maintain user-supported content nor ads. Fay Freak (talk) 22:57, 13 April 2024 (UTC)Reply
Thanks for the input, This, that and the other.
It sounds like it would be worthwhile to consider a moderate expansion in that whitelist. (By the way, the whitelist is currently unpopulated??)
For comparison,
likewise triggers a CAPTCHA confirmation; whilst
  • cross-references to W:Wikipedia use a different syntax and don't trigger a CAPTCHA confirmation.
Trouble? Some might say that's my middle name ;-)
Following Fay Freak's comments, I suggest that IP editors are underrepresented among correspondents on the WT Community pages.
—DIV (1.145.112.83 23:34, 15 April 2024 (UTC))Reply
Rather than us having to whitelist the URLs from templates that you use on a regular basis, we would all have an easier time if you created an account. WikiDIV is not taken, for instance.
Whitelisting REEHelp is probably worth doing either way, but preparing the regexes will take some effort. This, that and the other (talk) 23:45, 16 April 2024 (UTC)Reply
It's up to you (all), of course. But I strongly recommend that you don't think of it as doing me a personal favour. Think of it as whether it's worthwhile for the broad population of editors. For instance, I don't think I had ever used the KJV template until I posted the example in this discussion. —DIV (1.145.112.83 06:03, 18 April 2024 (UTC))Reply

XFAIL feature for Module:UnitTests?

Would it make sense to be able to label "expected failures" in Lua module tests? Sometimes a new feature is still under development and doesn't work yet or there's a workaroundable bug that doesn't need urgent attention. So having failing tests for these corner cases is useful, but they should not affect the overall module test verdict and there's no need to list the module in Category:Failing_testcase_modules. --Ssvb (talk) 07:36, 14 April 2024 (UTC)Reply

@Ssvb Yes, absolutely, this should be present. Benwing2 (talk) 07:37, 14 April 2024 (UTC)Reply
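One way to picture the proposed behaviour is a tiny test runner in Python. It is a sketch of the idea only, not Module:UnitTests: the `run_tests` function, the result labels and the sample checks are assumptions made for illustration.

```python
def run_tests(tests):
    """Run (name, callable, xfail) triples; return (overall_ok, results)."""
    results = {}
    overall_ok = True
    for name, func, xfail in tests:
        try:
            func()
            passed = True
        except AssertionError:
            passed = False
        if xfail:
            # Expected failures are reported but never affect the verdict.
            results[name] = "XPASS" if passed else "XFAIL"
        else:
            results[name] = "PASS" if passed else "FAIL"
            overall_ok = overall_ok and passed
    return overall_ok, results


def check_ok():
    assert 1 + 1 == 2


def check_broken():
    assert "a" == "b"
```

Reporting an unexpected pass as "XPASS" is a design choice borrowed from other test frameworks: it flags that the expected-failure label can be removed without failing the module.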

Checkparams related cat:E flood

This morning, at about 9:30 UTC, I made an edit to {{lt-noun-m-is-1}}, a template that nowadays invokes function error from Module:checkparams, and I thereby unleashed a flood of 'module errors' because, since I first looked at it, it uses parameter {{{2}}}, but not {{{1}}}, for which callers have typically provided a value to be consistent with other Lithuanian noun and adjective inflection templates. I suspect someone was experimenting with a key part of the module's functionality, for it had completely failed to report an attempt to provide a then unsupported parameter, namely |n=. I then added support for that parameter, and cat:E then started to flood. At about 9:45 UTC I then switched off the unhelpful reporting of supplies of non-blank values for {{{1}}}, mostly relevant for pages that have not been edited recently, and cleaned up cat:E.

I don't know what was going on. Module:checkparams has not been edited for days, and @Theknightwho and @JeffDoozan had not been active for hours. Also, the change to {{lt-noun-m-is-1}} by 'AutoDooz' to invoke Module:checkparams has change history comment, "no existing calls with bad parameters, throw error instead of warning to avoid future misuse", but page truputis has been passing non-blank {{{1}}} to the template for years.

Are there any other lurking issues like this, or was it just a one off? --RichardW57m (talk) 12:52, 15 April 2024 (UTC)Reply

It's the <onlyinclude> tags. The old version had them so the call to checkparams never actually ran and the category Category:Pages using bad params when calling Template:lt-noun-m-is-1 stayed empty. The bot interpreted the empty category as a sign that there were no bad calls and switched from 'warn' to 'error'. When you removed the <onlyinclude> tags in your edit, it caused checkparams to start running and throwing errors on the pages with bad calls. I'm sure there are more templates using <onlyinclude> where this is lurking, but now that we know it's a problem, it should be pretty easy to find and fix this automatically. Thanks! JeffDoozan (talk) 13:43, 15 April 2024 (UTC)Reply
@JeffDoozan: Well done reading past my error - it was {{lt-noun-m-tis-1}} that I changed. I don't understand your explanation - <onlyinclude>...</onlyinclude> didn't bracket the call into Module:checkparams. Have you got some logic that says 'Don't check if no parameters are used'?
There are several Lithuanian declension templates that have this construct, and I find it detracts from the documentation page. It may explain why the checking seemed not to be working in some other cases. --RichardW57 (talk) 21:08, 15 April 2024 (UTC)Reply
@RichardW57: The existence of <onlyinclude>...</onlyinclude> caused the parser to treat only the text inside the tags as code and everything outside the tags as documentation, including the call to checkparams. When you removed the <onlyinclude>...</onlyinclude>, it reversed the logic and made the parser treat everything on the page as code except the documentation inside <noinclude>...</noinclude>. See here for the MediaWiki documentation that might explain it better than I can. I manually adjusted the ~10 other templates that had checkparams with <onlyinclude> (and switched them from 'error' to 'check' to avoid flooding CAT:E) so AFAIK everything should be working as expected. I'm not sure what you mean by logic that says 'Don't check if no parameters are used' or the construct used by the other Lithuanian templates so if I haven't answered your question, please give me an example or a link to the other templates. JeffDoozan (talk) 23:27, 15 April 2024 (UTC)Reply
@JeffDoozan: I'd misunderstood the tag. What's weird is that text inside the tags was not displaying when viewing the template - or there was something else going on that was suppressing the output when viewing template pages. --RichardW57 (talk) 07:49, 16 April 2024 (UTC)Reply
@RichardW57 It sounds like you're confusing <includeonly> and <onlyinclude>. The three tags are:
  • <noinclude>: this text won't be transcluded.
  • <includeonly>: this text will only be transcluded.
  • <onlyinclude>: only the text between these is allowed to be transcluded.
In other words, <onlyinclude> effectively determines what is treated as the page for the purpose of transclusion (so you can imagine the default position to be the whole page being between a pair of <onlyinclude> tags), and then the other two sets of tags are then applied on top of that. Theknightwho (talk) 13:52, 16 April 2024 (UTC)Reply
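The tag rules described above can be sketched in a few lines of Python. This is only a simplified model of the behaviour as explained in this thread, not the MediaWiki parser itself; the function name `transcluded_text` and the two sample template sources are made up for illustration.

```python
import re


def transcluded_text(source: str) -> str:
    """Approximate which part of a template's source is used on transclusion.

    Simplified model: no nesting, no nowiki handling, no parameters.
    """
    # If any <onlyinclude> pair exists, only the text inside those pairs counts.
    only = re.findall(r"<onlyinclude>(.*?)</onlyinclude>", source, flags=re.DOTALL)
    if only:
        source = "".join(only)
    # <noinclude> blocks are dropped when the page is transcluded.
    source = re.sub(r"<noinclude>.*?</noinclude>", "", source, flags=re.DOTALL)
    # <includeonly> tags are stripped, but their content is kept.
    source = re.sub(r"</?includeonly>", "", source)
    return source


# Hypothetical template sources: the checkparams call sits outside <onlyinclude>.
old_source = (
    "{{#invoke:checkparams|error}}"
    "<onlyinclude>TABLE</onlyinclude>"
    "<noinclude>{{documentation}}</noinclude>"
)
new_source = (
    "{{#invoke:checkparams|error}}"
    "TABLE"
    "<noinclude>{{documentation}}</noinclude>"
)

print(transcluded_text(old_source))  # the checkparams call is not transcluded
print(transcluded_text(new_source))  # the checkparams call now runs on every caller
```

In this model the old source transcludes only the table, which matches the explanation of why the tracking category stayed empty until the tags were removed.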

Should "the" be linked?

Template:en-noun#Other_parameters gives a way to put "the" in the inflection template for an entry, which is: "def=1". I recently saw that removed in favor of adding "the" (linked)- see [9]. Under "def=1", the "the" is not linked. Is this intentional? I have no opinion on the matter. --Geographyinitiative (talk) 19:54, 15 April 2024 (UTC)Reply

@Geographyinitiative I don't know why User:LlywelynII made that change, which seems counterproductive. I have no strong opinions either on whether to link the word "the". Benwing2 (talk) 21:25, 15 April 2024 (UTC)Reply
@Geographyinitiative Ditto. No strong feeling. Nearly anyone able to make sense of the entry presumably already knows how the English definite article works. Two minor caveats are (a) future improved machine translation might change that for some users and (b) it's just better to have all the headwords linked for the curious imo. You'll notice the of was already linked in the previous version anyway. It's a very minor point and I know where the coder was coming from, but my own preference is that the utility of online dictionaries is linking and people who just have an aesthetic aversion to blue links can always tell their browser not to display them. Extra weight to the link for being part of a headword, even though I probably wouldn't link it in the definition of some sense. Ofc will defer to standing policy if there is one. — LlywelynII 08:26, 16 April 2024 (UTC)Reply
Two questions, then.
What is the ultimate explanation of why "Fitz" can mean "Fitzwilliam College" while "the Fitz" means "the Fitzwilliam Museum"? (We currently lack entries for both of them.) A popular meme was students accidentally misdirecting tourists looking for the Fitz to Fitz - they're at opposite ends of the city.
How does a Wiktionary user switch off blue links on the definite article, but not other words? --RichardW57m (talk) 12:09, 16 April 2024 (UTC)Reply
I really don't like linking the in this way, for the same reason we don't link every word in a definition. I think @Fay Freak has expressed similar opinions on this. LlywelynII seems to display the same misunderstanding of the problem now as he has in the past, which is that linking everything removes the prominence of the words which are important to link; it's not that people dislike the colour blue. Theknightwho (talk) 13:41, 16 April 2024 (UTC)Reply
We could, theoretically, have a separate wikilink to each individual character, but, like this, it would just waste the time of anyone clicking the links. Chuck Entz (talk) 13:54, 16 April 2024 (UTC)Reply
  Oppose linking "the". Ioaxxere (talk) 15:47, 16 April 2024 (UTC)Reply

Parameters type2 and journal2 of Template quote-journal

The examples of {{quote-journal}} show |journal2=, but its use in zacusi was causing the module error "Lua error in Module:quote at line 2660: Parameter "journal2" is not used by this template.". I have just fixed it by inserting |type2=journal, but neither |type= nor |type2= is documented in the lists of parameters. When is it needed? When is it allowed? What are its values? --RichardW57m (talk) 11:36, 16 April 2024 (UTC)Reply

Its mentions are a bit more helpful in {{quote-book}}, but the same complaints apply. --RichardW57m (talk) 11:43, 16 April 2024 (UTC)Reply
Sorry, I need to document this. The use of |type2= is correct here; I did this and made it default to book because it was more common to have journal articles quoting book entries than quoting another journal article. Benwing2 (talk) 18:53, 16 April 2024 (UTC)Reply

MediaWiki Common.css issues

Several things:

  1. There are three errors in MediaWiki:Common.css. These are on lines last edited by User:Erutuon. Are these real errors or just cases where the CSS editor isn't up to date?
  2. How do you really force changes to MediaWiki:Common.css to take effect? The instructions say to "Reload" for Chrome but this does nothing. Eventually (10 minutes?) it seems to take hold, but that's a long time to wait.
  3. Opinions on how to best display deprecated labels like color ((color)). Currently I made them show as green (same as deprecated templates) and struck-through, although I don't (yet?) see the strike-through; maybe I have to wait awhile.

Pinging User:This, that and the other as our resident CSS expert. Benwing2 (talk) 02:22, 17 April 2024 (UTC)Reply

@Benwing2
  1. The three errors relate to the use of :has(), which is a very recent addition to CSS. Indeed, Firefox only gained support for it in December 2023. The MediaWiki CSS editor must not be up to date with this new feature.
  2. I believe the site CSS is cached for 5 minutes. After making a change, wait 5 minutes, then press Ctrl+Shift+R or Cmd+Shift+R in your browser to witness the effects. It's always a good idea to test changes in an incognito/private browsing window too, just to make sure they work for readers who do not have any gadgets enabled.
  3. On the word "color" I see strike-through and a green color which is very subtle, although that may just be my poorly adjusted monitor.
This, that and the other (talk) 03:14, 17 April 2024 (UTC)Reply
@This, that and the other OK thanks. I suppose there's no way to avoid waiting the 5 minutes? As for the color, it is set to darkgreen which maybe isn't the best choice. When not a link it appears as olivedrab, which looks like this: olivedrab; maybe we should make links that way too? Benwing2 (talk) 03:33, 17 April 2024 (UTC)Reply
@Benwing2 The 5 minutes is in place to ease load on the servers. You can attempt to load the page with ?debug=true at the end of the URL, which bypasses all caches (both on the WMF server-side end and your client-side end), but this (intentionally) loads the page's JS and CSS very slowly, so may not actually be faster in the end. As for colors, I don't particularly have an opinion other than to say that dark green is a very difficult colour to "get right". On many displays it can be practically indistinguishable from black. This, that and the other (talk) 07:42, 19 April 2024 (UTC)Reply
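For anyone testing CSS changes repeatedly, adding the `debug=true` parameter mentioned above can be scripted. The following is a minimal sketch using only the Python standard library; the helper name `with_debug` is hypothetical, and it simply appends the query parameter without checking that the site honours it.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def with_debug(url: str) -> str:
    """Return the URL with debug=true set in its query string."""
    parts = urlsplit(url)
    # Keep existing parameters, dropping any earlier debug value.
    query = [
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if key != "debug"
    ]
    query.append(("debug", "true"))
    return urlunsplit(parts._replace(query=urlencode(query)))


print(with_debug("https://en.wiktionary.org/wiki/color"))
print(with_debug("https://en.wiktionary.org/w/index.php?title=color&action=history"))
```

Going through `urlsplit` rather than string concatenation means the parameter is joined with `?` or `&` as appropriate.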

What put a bunch of Albanian entries into the "Latin terms with quotations" category?

All of the 'Newest pages ordered by last category link update' in Category:Latin terms with quotations are Albanian. I looked at the pages e.g. majth but couldn't notice what was causing it (most don't even seem to have quotations of any kind). They also weren't edited recently, so I assume the categorization change was caused by a bug somewhere else. Urszag (talk) 13:24, 17 April 2024 (UTC)Reply

@Urszag: It's the use of {{R:sq:Bardhi:1635}}, which includes a |passage= from a Latin-Albanian dictionary and passes |lang=la to {{cite-book}}, plus my recent change to {{cite-book}}'s programming to use Module:quote instead of {{cite-meta}} to make it work like {{quote-book}}. I adjusted {{R:sq:Bardhi:1635}} to use |worklang=la,sq instead of |lang=la, which will avoid classifying the passage as a Latin quote. JeffDoozan (talk) 13:53, 17 April 2024 (UTC)Reply

Aramaic and Nesting Dialects in English Translations

When translating words in English entries, I see some users add dialects of Aramaic (e.g. "Assyrian Neo-Aramaic", "Syriac", "Turoyo", "Mandaic", etc.) without nesting them under the banner "Aramaic". Is there a way to get a bot or something to do that automatically (like in this edit)? Would there be a technical page somewhere that has a master list of which dialects would be nested under which language? --334a (talk) 22:09, 17 April 2024 (UTC)Reply

@334a Yes, this is possible, although I'd like to hear from other Aramaic editors to verify they are on board with this @Rhemmiel, Shuraya, Fay Freak. Benwing2 (talk) 23:17, 17 April 2024 (UTC)Reply
Yes, this seems like a good idea instead of all the Aramaic languages being spread throughout. Shuraya (talk) 04:52, 18 April 2024 (UTC)Reply
I too prefer this nesting. It must be like this because the labels are quite idiosyncratic and most usually the general interest of someone seeking translations is just hopefully getting anything in any Aramaic anyway. Fay Freak (talk) 23:53, 17 April 2024 (UTC)Reply
OK, I modified my existing sort-translation-lines script to indent Aramaic lects. Just verifying however that we want all such lects indented. See the family tree under Category:Aramaic language. This includes not only lects ending in "Aramaic" but also "Mlahsö", "Turoyo", "Classical Syriac", "Hulaulá", "Hértevin", "Koy Sanjaq Surat", "Lishana Deni", "Lishanid Noshan", "Lishán Didán", "Senaya", "Classical Mandaic" and "Mandaic". Benwing2 (talk) 02:07, 18 April 2024 (UTC)Reply
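The nesting described above could look roughly like the sketch below. It is not the sort-translation-lines script itself, which is not shown in this thread. It assumes translation lines of the form `* Language: ...`, assumes nested lines are written with a `*:` prefix, and uses plain code-point ordering. The lect list is copied from the post above; the function names and the sample terms are made up.

```python
# Lects named in the post that do not end in "Aramaic".
EXTRA_ARAMAIC_LECTS = {
    "Mlahsö", "Turoyo", "Classical Syriac", "Hulaulá", "Hértevin",
    "Koy Sanjaq Surat", "Lishana Deni", "Lishanid Noshan", "Lishán Didán",
    "Senaya", "Classical Mandaic", "Mandaic",
}


def is_aramaic_lect(name: str) -> bool:
    """True for lects that should be nested under the 'Aramaic' line."""
    return name != "Aramaic" and (name.endswith("Aramaic") or name in EXTRA_ARAMAIC_LECTS)


def nest_aramaic(lines: list[str]) -> list[str]:
    """Move top-level Aramaic lect lines under a single '* Aramaic:' line."""
    lects, rest = [], []
    for line in lines:
        name = line[2:].split(":", 1)[0].strip() if line.startswith("* ") else ""
        if is_aramaic_lect(name):
            lects.append("*: " + line[2:])  # demote to a nested line
        else:
            rest.append(line)
    if not lects:
        return rest
    lects.sort()
    # Reuse an existing "* Aramaic:" line if there is one.
    header = next((i for i, l in enumerate(rest) if l.startswith("* Aramaic:")), None)
    if header is None:
        # Otherwise insert one before the first top-level line that sorts after it.
        header = next(
            (i for i, l in enumerate(rest) if l.startswith("* ") and l[2:] > "Aramaic"),
            len(rest),
        )
        rest.insert(header, "* Aramaic:")
    return rest[: header + 1] + lects + rest[header + 1 :]


sample = [
    "* Arabic: {{t|ar|term1}}",
    "* Armenian: {{t|hy|term2}}",
    "* Assyrian Neo-Aramaic: {{t|aii|term3}}",
    "* Turoyo: {{t|tru|term4}}",
]
print("\n".join(nest_aramaic(sample)))
```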

cleanup run on Nordic language lemmas

I am planning on doing a cleanup run on the lemmas in Swedish, Danish, Norwegian Bokmål, Norwegian Nynorsk and Icelandic. Hopefully these changes are noncontroversial. I have done similar runs on several languages before without complaints. The cleanups are:

  1. Templatize raw links occurring in list format in certain sections (e.g. ==Derived terms==, ==Related terms==); e.g. * [[meio ambiente]] occurring in a ==Derived terms== section of a Portuguese lemma would turn into * {{l|pt|meio ambiente}}.
  2. Detemplatize English links occurring in definitions, e.g. {{l|en|ambient}} -> [[ambient]]; but the opposite change happens when the English term is spelled the same as the pagename (because raw links to the same page turn into unlinked bolded terms). Note that since the JavaScript change of User:This, that and the other, raw links automatically link to the English section, so the extra templated linking has no effect except to make the Wikicode harder to read and edit.
  3. Templatize raw category references to use {{C}} (for topical categories) or {{cln}} (for poscat categories), if possible; other categories are left alone. Also standardize category references using different aliases (e.g. {{topics}}) to use these names.
  4. Convert synonyms and antonyms in ==Synonyms== and ==Antonyms== sections into inline synonyms and antonyms specified using {{syn}} and {{ant}}, when it is safe to do so. (Approximately, either (a) there's only one definition, or (b) there are {{sense}} tags associated with each synonym or antonym and all of them can be uniquely matched up with definitions.)
  5. Convert raw links and {{l}} links in ==Alternative forms== sections into {{alt}} links.
  6. Put Wikipedia boxes in a standard position. (Approximately, if there's only one part of speech in a given Etymology section, the Wikipedia box goes at the top of the Etymology section. If there's more than one part of speech, the box stays where it is because it might be associated with that part of speech.) Note, this only affects Wikipedia boxes, not inline Wikipedia links (using {{w}} or similar) or single-line Wikipedia links (using {{pedia}} or similar).

Benwing2 (talk) 05:11, 18 April 2024 (UTC)Reply

@Benwing2: I suggest you keep {{l|en}} links with disambiguating parameters, especially |id= but also |pos=. You should probably also keep those with the alternative parameter. --RichardW57m (talk) 09:38, 18 April 2024 (UTC)Reply
@Benwing2: just wanted to point out that I do use {{l}} in definitions when I need to provide a gloss for a term. — Sgconlaw (talk) 09:39, 18 April 2024 (UTC)Reply
@Benwing2, Sgconlaw: Probably best to only detemplatise calls of {{l}} only when its only parameters are |1= and |2=. --RichardW57m (talk) 11:26, 18 April 2024 (UTC)Reply
@Sgconlaw @RichardW57m My current script only replaces links of the form {{l|en|foo}} -> [[foo]] and {{l|en|foo|bar}} -> [[foo|bar]]; I should have clarified this. If there are any other params, the template is left alone. It's also smart enough to replace e.g. {{l|en|olive tree|olive trees}} with [[olive tree]]s. Benwing2 (talk) 19:52, 18 April 2024 (UTC)Reply
I support 1,2,3,5,6. I would not touch the synonyms/antonyms. Thadh (talk) 11:14, 18 April 2024 (UTC)Reply
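The narrow rule Benwing2 describes for cleanup 2 (only `{{l|en|foo}}` and `{{l|en|foo|bar}}` are touched, and `olive trees` becomes a link plus trailing `s`) can be sketched as follows. This is an illustration under stated assumptions, not the actual bot script: the regex, the function name `detemplatize` and the lowercase-ASCII link-trail rule are assumptions, and the reverse step of templatizing raw self-links is not covered.

```python
import re

# Matches {{l|en|target}} or {{l|en|target|display}} only; any named parameter
# (containing "=") or extra positional parameter prevents a match.
L_EN = re.compile(r"\{\{l\|en\|([^{}|=]+)(?:\|([^{}|=]+))?\}\}")


def detemplatize(text: str, pagename: str) -> str:
    """Replace simple {{l|en|...}} calls in a definition with raw links."""

    def repl(match: re.Match) -> str:
        target, display = match.group(1), match.group(2)
        if target == pagename:
            # A raw link to the page itself would render as bold text, so keep it.
            return match.group(0)
        if display is None or display == target:
            return f"[[{target}]]"
        suffix = display[len(target):] if display.startswith(target) else ""
        if suffix and re.fullmatch(r"[a-z]+", suffix):
            # Use the link trail: [[olive tree]]s displays as "olive trees".
            return f"[[{target}]]{suffix}"
        return f"[[{target}|{display}]]"

    return L_EN.sub(repl, text)


print(detemplatize("# {{l|en|ambient}}", "ambiente"))
print(detemplatize("# {{l|en|olive tree|olive trees}}", "oliveiras"))
print(detemplatize("# {{l|en|bank|id=river}}", "margem"))
```

Templates with `|id=`, `|pos=` or other extra parameters never match the pattern, so they are left untouched, in line with the suggestions earlier in this thread.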

Derived terms tool

We really need a tool to quickly add all these unlinked Derived terms. Doing it manually destroys my soul P. Sovjunk (talk) 18:49, 18 April 2024 (UTC)Reply

I can't help you with a JavaScript tool but you might be able to make use of a new-entry creation template similar to the ones that exist for Japanese, Thai, etc. if that is what you're looking for. Benwing2 (talk) 22:10, 18 April 2024 (UTC)Reply
Maybe. Whatcha got? P. Sovjunk (talk) 22:24, 18 April 2024 (UTC)Reply
@P. Sovjunk Nothing yet but I could maybe be persuaded to write something if you'd actually use it. Take a look for example at the documentation of {{ja-new}} and {{th-new}} and tell me if something along these lines would be helpful. Benwing2 (talk) 22:41, 18 April 2024 (UTC)Reply
If what needs to be done is "make amiability contain a link (in the Derived terms section) to unamiability", it seems like a bot could do that, at least for cases where amiability has only one part of speech (which is probably a large percentage of cases). I couldn't write such a bot, but it seems like the sort of thing a bot could be written to do, working from that list. - -sche (discuss) 23:06, 18 April 2024 (UTC)Reply
@-sche Hmmm, you are right, somehow I assumed the terms in question needed to be created but I see they already exist. Benwing2 (talk) 23:08, 18 April 2024 (UTC)Reply
Yeah, useful though those templates might be, not really what's needed. -sche hit the nail on the head with the desired function: Quickly add term fooable to Derived terms section of foo. I'd like to point out that after 20 years here I still am useless at the computing side of things (and to be fair, only slightly better at the lexi-stuff and equally as lame with the social side, TBH). However, I do love making my way through a big juicy cleanup list. P. Sovjunk (talk) 06:14, 19 April 2024 (UTC)Reply
Yes, it was pretty lame not to mention that these were only English derived terms. --RichardW57 (talk) 08:04, 19 April 2024 (UTC)Reply
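The function requested in this thread ("quickly add term fooable to the Derived terms section of foo") might be sketched like this for the simple case. It is a rough illustration, not an existing tool: it assumes the page has exactly one Derived terms section laid out as a bullet list, and the function name and sample page text are made up. Pages that use column templates or have several parts of speech would need more care.

```python
import re


def add_derived_term(wikitext: str, term: str, lang: str = "en") -> str:
    """Add a bullet for `term` to the first Derived terms section, if missing."""
    lines = wikitext.split("\n")
    start = next(
        (i for i, l in enumerate(lines)
         if re.fullmatch(r"=+\s*Derived terms\s*=+", l.strip())),
        None,
    )
    if start is None:
        return wikitext  # no section to add to; leave the page alone
    # The section ends at the next heading (or the end of the page).
    end = next(
        (i for i in range(start + 1, len(lines)) if lines[i].startswith("=")),
        len(lines),
    )
    section = lines[start + 1 : end]
    if any(f"|{term}}}}}" in l or f"[[{term}]]" in l for l in section):
        return wikitext  # already listed
    last_bullet = max(
        (i for i in range(start + 1, end) if lines[i].startswith("*")),
        default=start,
    )
    lines.insert(last_bullet + 1, f"* {{{{l|{lang}|{term}}}}}")
    return "\n".join(lines)


page = (
    "==English==\n\n===Noun===\n# quality\n\n"
    "====Derived terms====\n* {{l|en|amiableness}}\n\n"
    "====Translations====\nstuff"
)
print(add_derived_term(page, "unamiability"))
```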

Template:lq

I am thinking of creating a template {{lq}} that would be a combination of {{lb}} and {{q}}; essentially it works like {{lb}} but doesn't categorize. The idea is it could be used in cases where {{q}} is currently used but with proper linking of lects as well as terms like archaic, dated. Does this seem like a good idea?

I should add that if we add a language code to {{a}} and change it to accept labels, there might not be a need for this; or we could have two templates, one that takes a language code and one that doesn't, both of which process labels but the latter one only processing language-independent labels (things like archaic and dated, but not Southern US, Louisiana or the like). Thoughts? Benwing2 (talk) 23:17, 18 April 2024 (UTC)Reply

A good idea.
A change of {{a}} would of course be massive, I figure you already ballpark over a week to execute multiple bot-runs.
You seem to have no clear idea yet though, just as I, where else than in pronunciation sections this {{lq}} would be used, though I remember the feeling that I wanted to use such a thing, somewhere in the past already, which wasn’t necessarily beside pronunciations. Fay Freak (talk) 00:28, 19 April 2024 (UTC)Reply
@Fay Freak Examples would be ==Derived terms==, ==Synonyms== sections and the like. Benwing2 (talk) 01:01, 19 April 2024 (UTC)Reply
Also ==Translations== sections. Benwing2 (talk) 08:08, 19 April 2024 (UTC)Reply
I'm not opposed, but will caution that the more templates that do similar things (especially if used in the same places), the more likely people will not grasp the distinction and will use one where we want the other. (We already see people use T:q, T:a, T:lb or bare formatting in place of each other, e.g. T:a for T:q in dust, fright; T:ib/T:italbrac used to also be in that mix.) If neither T:q nor T:lq (even in translations sections) categorize, I guess that's not really a problem for anything but our sense of "this should be x, not y", if the only difference is "oops, sometimes 'Hakka' isn't a link". And if we want to have a {{q}}-like thing that links, adding T:lq is easier and less disruptive to people's habits than requiring every use of T:q include a language code.
Iff we add a language code to T:a, it does sound like it could become the same thing as this, but I suppose we could always create this now and, if we later add a langcode to T:a and make it use labels, reduce it to an alias of this [or vice versa] at that time.
BTW, T:a asserts it should only be used for {{a|UK|rare}} but that {{a|rare}} should use {{q}}, which is evidently too arbitrary a distinction because I often see entries use T:a even for non-accent labels, like horned, devil, ; if we add a langcode to T:a (which would undoubtedly take some getting used to, but again, not opposed), or even if we don't, maybe we can also abandon that "{{a|UK|rare}} but {{q|rare}} {{IPA}}" distinction? (BTW I guess bots cleaning up uses of "wrong langcode to be using in this L2" will need to know that T:lq in a ====Derived terms==== section uses the L2's langcode but in a ====Translations==== section uses the translation's?) - -sche (discuss) 16:17, 19 April 2024 (UTC)Reply
@-sche Yeah I get your point and I appreciate your thoughtful responses. I agree that the current idea that {{a|UK|rare}} is OK but not {{a|rare}} is silly. One possibility is to add a lang code to {{a}} and repurpose it as a general alternative to {{q}} for label-like qualifiers (although it might require a bit of thought to figure out what it ought to stand for :) ...). It has the advantage of being one character shorter than {{lq}}. In any case the current state of Module:accent qualifier/data is super messy and needs cleaning up and merging with the label data. Benwing2 (talk) 20:39, 19 April 2024 (UTC)Reply
This could be useful, or at least something similar to it. Vininn126 (talk) 16:18, 19 April 2024 (UTC)Reply

Reduplicated emoji in citation about said emoji triggered "emoji spam" abuse rule

I tried adding an extra definition to the "🍅" entry as it is also often used (particularly reduplicated) to express disapproval, like when booing, to mimic the act of audiences throwing tomatoes at bad performances. Found a citation for it, but it seems it got auto-flagged for emoji spam (and really, any citation I would've found/used would've triggered it due to the way this emoji is used in this sense):

  • 2024 January 15, @cragmites, Twitter[10], archived from the original on 2024-04-20:
    BOOOOOOOO [five tomato emoji in a row]

Big Sprinkler (talk) 15:44, 20 April 2024 (UTC)Reply

Seems legitimate enough, so done. This, that and the other (talk) 10:02, 21 April 2024 (UTC)Reply

Translation adder langname to langcode functionality

I'm not sure when this broke (for all I know, it could've been broken for years), but it used to be possible to type a canonical language name (rather than only a code) into the "Add translation:" field, and some javascript(?) would automatically convert the name (right there in the field, before you preview or post the translation) to the corresponding code. Now, it only converts language names to "languages/javascript-interface at [[Module". Not sure how easy this is to fix, or how much of a priority it is. - -sche (discuss) 07:07, 22 April 2024 (UTC)Reply

Latin-script footer not responding to expansion

I may be editing in the wrong place, but I expanded the list of letters with palatal and retroflex hooks at Template:mul-script/Latn-list, and they're not visible on pages. For retroflex what displays is ᶏ ᶒ ᶖ ɭ ɳ Ʈʈ ᶙ ʐ ᶚ, which isn't even in the same order, and for palatal it's just ᶀ ᶁ, which is missing the most common letters. If the list is kept elsewhere, should the one in the template be replaced with a note on its current location? kwami (talk) 10:05, 22 April 2024 (UTC)Reply

@Kwamikagami: If I'm reading the code right, the main template is just for choosing which group in {{Template:mul-script/Latn/groups-list}} is displayed. Chuck Entz (talk) 10:33, 22 April 2024 (UTC)Reply
Ah, I figured it out. Thanks. It's at Template:mul-script/Latn/groups-list. kwami (talk) 10:45, 22 April 2024 (UTC)Reply

Wrong warning by categorisation

It appears that gėlė is being put in categories:

because it contains [[:Category:lt:Flowers|Flowers in Lithuanian]]. The categories are for where sorting within other categories may go wrong, so isn't this categorisation wrong? I don't know which code needs correcting, and suspect I might not be able to edit anyway. --RichardW57m (talk) 14:01, 22 April 2024 (UTC)Reply

@RichardW57m Yes, that's wrong, since it's just a regular link which should be ignored. I'll do a fix. Theknightwho (talk) 14:19, 22 April 2024 (UTC)Reply
Fixed. The issue was in Module:headword/page, which parses categories on the page at line 676. It now ignores category links where a colon precedes "category". Theknightwho (talk) 14:31, 22 April 2024 (UTC)Reply
@Theknightwho: Thank you for fixing it, and thank you for telling us how you fixed it. --RichardW57m (talk) 15:37, 22 April 2024 (UTC)Reply
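The distinction behind this fix, a real category assignment versus a link to a category page written with a leading colon, can be illustrated with a short Python sketch. The real code is Lua in Module:headword/page and is not reproduced here; the regex and the function name `real_categories` are assumptions for illustration only.

```python
import re

# Group 1 captures an optional leading colon, group 2 the category name.
CATEGORY_LINK = re.compile(r"\[\[\s*(:?)\s*[Cc]ategory\s*:\s*([^\]|]+)")


def real_categories(wikitext: str) -> list[str]:
    """Return category names the page is placed in, ignoring [[:Category:...]] links."""
    return [
        match.group(2).strip()
        for match in CATEGORY_LINK.finditer(wikitext)
        if not match.group(1)  # a leading colon means "link to", not "put in"
    ]


text = "[[Category:lt:Plants]] See [[:Category:lt:Flowers|Flowers in Lithuanian]]."
print(real_categories(text))
```

Aliases such as `CAT:` are not handled in this sketch.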

Template junk at human being

Template:tea room sense seems to be producing junk. Equinox 15:51, 22 April 2024 (UTC)Reply

@JeffDoozan This is because of a bad parameter, but the warning message is corrupted; can you take a look? Benwing2 (talk) 22:02, 22 April 2024 (UTC)Reply
This seems to be caused by @Theknightwho's addition of some very clever stuff to extract and format additional error details: diff. I don't completely understand what's going on with that, so I'm kicking this over to knight. JeffDoozan (talk) 22:39, 22 April 2024 (UTC)Reply
@JeffDoozan @Benwing2 I'll take a look. The reason I made that change is because Scribunto error messages have standard wiki formatting applied to them (e.g. multiple spaces are compressed into one space etc.), which is a problem when you want to accurately display which argument is causing the problem: e.g. if a template contains {{{some arg}}}, it won't work if you accidentally put
|some  arg=
as there are two spaces. Module:checkparams (correctly) identifies this as a problem, but if you try to display that in a standard error message it'll get normalised to a single space, which is really confusing for the user since that looks identical to the correct input.
The normal solution to this would be to use <pre></pre> tags, but they don't work if you put them in a Scribunto error message. I also tried preprocessing the pre tags before throwing the error, but if you do that it simply displays the raw strip marker, which is even worse. Another alternative would be to display a manual error message (which can be formatted however we like), but that loses the benefit of things like automatic categorisation in CAT:E, traceback and so on (i.e. it's not considered a "real" error by the MediaWiki software).
To get the best of both worlds, the module (effectively) preprocesses {{#invoke:checkparams|placeholder_error}} using pcall, where placeholder_error simply throws an error with a placeholder string. This generates a real Scribunto error (i.e. it's automatically categorised in CAT:E, traceback works properly etc.), but because of the pcall the error block is caught and returned as a string to the main Scribunto instance. The placeholder can then be swapped out for the real message, which contains preprocessed <pre></pre> tags, and returned. Since the main module is simply returning a string, the strip markers for the pre tags expand into the desired output; it just so happens that output is a Scribunto error message.
At some point, I'll probably add a formatted_error function to Module:debug to handle this, since I expect Module:parameters (and a few other modules) would benefit as well. Theknightwho (talk) 14:16, 23 April 2024 (UTC)Reply
Also, more specifically to the issue at hand, it seems to be some kind of weird interaction between the "catch my attention" tag and wiki list formatting. Theknightwho (talk) 14:27, 23 April 2024 (UTC)Reply
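The placeholder trick described above can be mimicked outside Scribunto. The sketch below is only a Python analogy and not the Lua code in Module:checkparams: `render_error_block` is a made-up stand-in for the wiki's own error renderer (which collapses whitespace), the placeholder string and the HTML markup are invented, and `formatted_error` simply borrows the name floated in the thread.

```python
import html
import re

PLACEHOLDER = "\x7fERROR-PLACEHOLDER\x7f"  # hypothetical marker, never shown to users


def render_error_block(message: str) -> str:
    """Stand-in for the built-in error renderer: it normalises whitespace."""
    collapsed = re.sub(r"\s+", " ", message)
    return f'<strong class="error">{collapsed}</strong>'


def formatted_error(message: str) -> str:
    """Build a real-looking error block, then swap in a preformatted message."""
    block = render_error_block(PLACEHOLDER)  # the "real" error, with a dummy text
    preformatted = "<pre>" + html.escape(message) + "</pre>"
    return block.replace(PLACEHOLDER, preformatted)


message = "bad parameter |some  arg="
print(render_error_block(message))  # the double space is lost
print(formatted_error(message))     # the double space survives inside <pre>
```

The point of the two-step approach is that the wrapper still comes from the ordinary error path, while the message inside it bypasses the whitespace normalisation.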

Chiromantis

Clearly Greek and not Italian; chir and mantis are both Greek. 2A02:587:471B:F472:B659:C76E:8411:604 21:38, 22 April 2024 (UTC)Reply

Not sure what this is about: Chiromantis is a Translingual entry with no etymology section. Perhaps they meant chiromante, where the etymology we give is a valid surface analysis but probably not the true original formation of the term. This, that and the other (talk) 07:59, 23 April 2024 (UTC)Reply

Styling error with ئ

Hey all. I couldn't find another place to report this.

In the entry ئ (also عرب, among others), the character is displayed in a specifically Nastaliq font style, even though the entry is not solely for Urdu.

The character is used in Arabic, too, and therefore should not be marked with any style.

Marking font-style in the header is very excessive and unnecessary. Headers should remain consistent.

font-family: 'Noto Nastaliq Urdu', Tahoma, 'Arial Unicode MS', 'UT Cairo', 'UT Naskh', sans-serif;

font-family: 'Noto Naskh Arabic', 'Iranian Sans', 'Segoe UI', Tahoma, 'Microsoft Sans Serif', 'Arial Unicode MS', sans-serif;

The previous are the font specification for the header.

For those unfamiliar with the topic, w:Nastaliq is a specific rendering style commonly used for Urdu and Persian, but not other languages which use Arabic script.

Thanks. --Esperfulmo (talk) 13:19, 23 April 2024 (UTC)Reply

@Esperfulmo Yes this is a known issue. It is a side effect of the current implementation of the code, which formats the title according to each language processed in turn. This is necessary in general to get the correct fonts for all sorts of different characters, but it has the weird side effect you've noticed when Urdu is the last language on the page (which is usually the case). I have proposed adding a special check which looks to see if there is more than one Arabic script language on a given page and if so disables the code mentioned above for Urdu. But I haven't gotten around to implementing it. Benwing2 (talk) 20:26, 23 April 2024 (UTC)Reply
Sounds like a proper proposal. Let's wait and see. -Esperfulmo (talk) 23:30, 23 April 2024 (UTC)Reply
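The check Benwing2 proposes (skip the Urdu-specific title styling when more than one Arabic-script language is on the page) could be modelled as below. This is purely a sketch of the idea as stated in this thread and has not been implemented anywhere: the function name, the language-code inputs and the "last language processed wins" loop are assumptions based on the description above.

```python
def title_font_language(page_langs, arabic_script_langs):
    """Return the language code whose font styling ends up on the page title.

    page_langs: language codes in the order their sections are processed.
    arabic_script_langs: codes assumed to be written in Arabic script.
    """
    arabic_count = sum(1 for code in page_langs if code in arabic_script_langs)
    skip_urdu = arabic_count > 1  # the proposed special check
    chosen = None
    for code in page_langs:
        if code == "ur" and skip_urdu:
            continue  # do not let Urdu's Nastaliq styling override the title
        chosen = code  # otherwise each language restyles the title in turn
    return chosen


ARABIC_SCRIPT = {"ar", "fa", "ur"}  # illustrative subset only
print(title_font_language(["ar", "fa", "ur"], ARABIC_SCRIPT))
print(title_font_language(["ur"], ARABIC_SCRIPT))
```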

Strange unwanted linking of Japanese transliterations

Somebody introduced linking of Japanese transliterations. I oppose it, even if it worked correctly. It happens with multipart terms

  1. (はだ, hada) - OK
  2. 肌の色 (はだのいろ, hada no iro) - OK but it's a sum of parts, split below
  3. (はだのいろ, hada no iro) - wrong, nothing should be linked. The current link is on はだのいろ, hada no iro

Anatoli T. (обсудить/вклад) 21:51, 23 April 2024 (UTC)Reply

@Theknightwho: Hi. It must be to do with your work on Module:ja. Please undo the linking. Anatoli T. (обсудить/вклад) 21:58, 23 April 2024 (UTC)Reply
@Atitarev It wasn't caused by those changes, but I'm not sure why this has happened. Theknightwho (talk) 22:02, 23 April 2024 (UTC)Reply
@Theknightwho: Thanks for replying. Could you please try fixing it? I see it wasn't intentional but it worked until recently and Module:ja is the module that does it. Anatoli T. (обсудить/вклад) 22:07, 23 April 2024 (UTC)Reply
I can have a look, but not right this minute. I can see that it definitely wasn't caused by any of the recent changes to Module:ja, though, since it still happens if I preview old versions of it. Theknightwho (talk) 22:17, 23 April 2024 (UTC)Reply
@Theknightwho: Thanks, it must be some other recent module change (not necessarily yours). Also calling @Benwing2 for help. Anatoli T. (обсудить/вклад) 23:04, 23 April 2024 (UTC)Reply
@Atitarev: I looked around but I can't see any recent changes that would have triggered this. Do you know when this happened approximately? Benwing2 (talk) 23:15, 23 April 2024 (UTC)Reply
@Benwing2: I can't tell you exactly but it's rather recent. No more than a month ago. Anatoli T. (обсудить/вклад) 23:19, 23 April 2024 (UTC)Reply
Another case is with Roman letters: UNICEF (Yunisefu). Anatoli T. (обсудить/вклад) 23:27, 23 April 2024 (UTC)Reply
@Atitarev link_tr is set to true in Module:languages/data/2, which normally triggers Japanese transliteration linking. But this has been the case since this diff [11] in Aug 2023. User:Theknightwho will have to look into this more as I don't know the ins and outs of how Japanese transliteration is handled. Benwing2 (talk) 23:43, 23 April 2024 (UTC)Reply
@Benwing2 @Atitarev Yeah, it definitely post-dates that change by quite a long time; that was added because it made it simpler to link transliterations in Japanese headwords. Theknightwho (talk) 23:55, 23 April 2024 (UTC)Reply
@Benwing2, @Theknightwho: Thanks. Weird linking of SoP terms happened not so long ago, I would have noticed.
Calling (Notifying Eirikr, TAKASUGI Shinji, Fish bowl, Poketalker, Cnilep, Marlin Setia1, Huhu9001, 荒巻モロゾフ, 片割れ靴下, Onionbar, Shen233, Alves9, Cpt.Guapo, Sartma, Lugria, LittleWhole, Chuterix, Mcph2): Hi. Would someone know who made the change, and why? Anatoli T. (обсудить/вклад) 01:41, 24 April 2024 (UTC)Reply