Wiktionary:Grease pit/2023/November

Chinese fallback

I was wondering whether this can be applied to Wiktionary: when I'm reading about Mandarin 水 (shuǐ) or Hokkien 水 and click the links, I want to be taken to the Chinese 水 (shuǐ) definition right away. Sure, I can just scroll down a bit, but can I (or any other user) be taken straight to the Chinese section instead of scrolling? I think this would improve the user experience. This could be done when the terms from the Chinese "dialects" are written in Chinese characters rather than in Latin script (like Mandarin shuǐ or Hokkien chúi), so that the Chinese definition can serve as a fallback. Ysrael214 (talk) 01:20, 1 November 2023 (UTC)[reply]

Lingua Libre: Is it okay to just directly link to .WAV files for pronunciation without any .OGG reuploading gymnastics?

I found some earlier discussion on this topic. The instructions about Lingua Libre in Help:Audio_pronunciations are very short and unclear about what we are supposed to do with the resulting .WAV files. I also see that the {{audio|pl|LL-Q809 (pol)-Poemat-spartaczyć.wav|Audio}} template from spartaczyć conveniently turns into the following information in kaikki.org's exported JSON:

{"audio": "LL-Q809 (pol)-Poemat-spartaczyć.wav", "text": "Audio", "ogg_url": "https://upload.wikimedia.org/wikipedia/commons/transcoded/b/b2/LL-Q809_%28pol%29-Poemat-spartaczy%C4%87.wav/LL-Q809_%28pol%29-Poemat-spartaczy%C4%87.wav.ogg", "mp3_url": "https://upload.wikimedia.org/wikipedia/commons/transcoded/b/b2/LL-Q809_%28pol%29-Poemat-spartaczy%C4%87.wav/LL-Q809_%28pol%29-Poemat-spartaczy%C4%87.wav.mp3"}

So the autoconverted .OGG and .MP3 files are nicely referenced from it. Are there no downsides or have I missed something important? Ssvb (talk) 16:46, 1 November 2023 (UTC)[reply]
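For anyone consuming that export programmatically, here is a minimal sketch of picking a playable URL from such a record. The record is the one quoted above; the helper name `pick_audio_url` and the preference order (OGG first, then MP3) are assumptions made for illustration, not something kaikki.org prescribes.

```python
import json

# Record copied from the kaikki.org export quoted in this thread.
RECORD = json.loads('''{"audio": "LL-Q809 (pol)-Poemat-spartaczyć.wav", "text": "Audio", "ogg_url": "https://upload.wikimedia.org/wikipedia/commons/transcoded/b/b2/LL-Q809_%28pol%29-Poemat-spartaczy%C4%87.wav/LL-Q809_%28pol%29-Poemat-spartaczy%C4%87.wav.ogg", "mp3_url": "https://upload.wikimedia.org/wikipedia/commons/transcoded/b/b2/LL-Q809_%28pol%29-Poemat-spartaczy%C4%87.wav/LL-Q809_%28pol%29-Poemat-spartaczy%C4%87.wav.mp3"}''')


def pick_audio_url(record, preferred=("ogg_url", "mp3_url")):
    """Return the first transcoded URL found in the record, or None.

    `preferred` is a hypothetical preference order, not part of the export format.
    """
    for key in preferred:
        url = record.get(key)
        if url:
            return url
    return None


if __name__ == "__main__":
    print(pick_audio_url(RECORD))
```

A record that carries only the `audio` field would yield `None`, so a caller can tell when no transcoded file is referenced.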

BTW, I'm trying to use the audio recording data from Lingua Libre this way. Please let me know if this is considered to be inappropriate. Ssvb (talk) 16:51, 1 November 2023 (UTC)[reply]

@Ssvb: yes, that's fine. — Sgconlaw (talk) 19:23, 1 November 2023 (UTC)[reply]

@Sgconlaw: Thanks! I can indeed see many other examples of .WAV file usage. Should Help:Audio_pronunciations be updated to make it less misleading? The way it is now, it's probably scaring away many potential pronunciation audio contributors. — Ssvb (talk) 08:59, 5 November 2023 (UTC)[reply]

@Ssvb: please indicate which parts of that page you think require updating. — Sgconlaw (talk) 11:55, 5 November 2023 (UTC)[reply]

@Sgconlaw: Please try to put yourself in the shoes of a new user. You visit some article and see the following message from the {{rfap|en}} template:

This entry needs audio files. If you are a native speaker with a microphone, please record some and upload them. (For audio required quickly, visit WT:APR.)

Coincidentally, you are a native speaker with a microphone and actually want to help. The link "record some" brings you to the Help:Audio_pronunciations page. What kind of ideas do you get after reading it? The emphasis there is on using the .OGG file format for storage efficiency. It also states "We recommend that you download Audacity" as if it were the current best practice. Then there are walls of text about the file naming conventions and some tips about the cumbersome process of renaming/uploading many files. A small notice about Lingua Libre is not even immediately visible, because it's just sandwiched somewhere in the middle of the page. But let's suppose that you found Lingua Libre and recorded a few hundred pronunciation audio files using it. Now what? Lingua Libre uses the .WAV format instead of .OGG and does not follow the prescribed file naming conventions. How do we convert to the .OGG format, rename and upload the files? The article mentions that "a bot was made for adding the audios onto different wikis (e.g. Wiktionary in French)" without going into any details. Having no other information, a reasonable assumption is that the automated process of converting/renaming/uploading the resulting .OGG files is available only to the lucky French Wiktionary users via some sort of bot. But we are on the English Wiktionary! Do we need to download Lingua Libre files one by one, manually rename them to comply with the required naming conventions and upload them manually too? That's the idea that any newcomer would get after reading the current, not very helpful Help:Audio_pronunciations page.

My suggestion is to explicitly recommend Lingua Libre as a convenient unified solution, which already takes care of both recording and uploading. Mention that the .WAV files created by it are okay and can be linked directly. Maybe mention User:DerbethBot and what it does. The bits about Audacity surely have some archaeological value and may be kept as a fallback option, but they shouldn't be dumped on the unsuspecting new users the way they are now. Do I need a special permission and consensus to edit Help:Audio_pronunciations myself? — Ssvb (talk) 05:36, 6 November 2023 (UTC)[reply]

@Ssvb I'd encourage you, as someone with recent experience of this process, to consider updating as much of Help:Audio pronunciations as you can. I see that you already made a start, but explaining exactly how to get the pronunciations added to the page after recording them using Lingua Libre seems like a critical detail that we are missing. Honestly our entire Help namespace is in a dire state of outdatedness... please just go ahead and pitch in. This, that and the other (talk) 09:56, 10 November 2023 (UTC)[reply]

@This, that and the other: Done. At least content-wise I have nothing more to add. Now the others are welcome to proofread it for grammar/spelling/formatting issues and maybe give Lingua Libre a test drive to see whether the instructions are comprehensive enough. —Ssvb (talk) 14:21, 10 November 2023 (UTC)[reply]

Wikitable syntax gets mangled when using "Reply" functionality on Discussion pages

Wikitable syntax gets mangled when using the "Reply" functionality on Discussion pages, as a consequence of prepending a colon (:) to every line of 'code'.

Example at Talk:lay#To_lay_and_to_lie.

I'm not sure whether this is already a widely known problem. I'm also uncertain about how to resolve it: my first thought was that a parser could watch for key 'code phrases' to know when to forgo the prefixing of a colon, but I'm not sure that will be robust to all user intentions.

—DIV (1.145.44.122 07:42, 2 November 2023 (UTC))[reply]
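The effect described above can be reproduced with a small sketch. It only imitates the colon-prefixing step and flags lines whose row, cell or closing table markup no longer starts the line; it does not model the MediaWiki parser or the actual Reply tool, and both function names are hypothetical.

```python
def naive_reply_indent(wikitext, depth=1):
    """Prefix every line with colons, imitating reply-style indentation."""
    prefix = ":" * depth
    return "\n".join(prefix + line for line in wikitext.split("\n"))


def displaced_table_lines(indented):
    """Return lines where '|' or '!' table markup was pushed away from line start."""
    displaced = []
    for line in indented.split("\n"):
        stripped = line.lstrip(":")
        # Only lines that gained a colon prefix and begin with table markup count.
        if stripped != line and stripped.startswith(("|", "!")):
            displaced.append(line)
    return displaced


if __name__ == "__main__":
    table = "{|\n|-\n| lie || lay\n|}"
    indented = naive_reply_indent(table)
    print(indented)
    print(displaced_table_lines(indented))
```

A check of this kind is roughly what "watching for key code phrases" would amount to, which also shows why it is fragile: any ordinary line that happens to begin with `|` or `!` would be flagged too.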

Yes I have seen this when trying to use <pre> in Reply responses. BTW in your table about lie vs. lay I don't think the reflexive use of "lie" is correct; cf. Bridge Over Troubled Water, "I will lay me down"; saying "I will lie me down" sounds quite wrong to me. Benwing2 (talk) 21:03, 2 November 2023 (UTC)[reply]

"Now I lay me down to sleep, / I pray the Lord my soul to keep." On the other hand, "lie myself" sounds wrong to me. Equinox ◑ 21:19, 2 November 2023 (UTC)[reply]

Seems like a use of lay ("to put, to place") rather than lie ("to be placed horizontally"). — Sgconlaw (talk) 21:30, 2 November 2023 (UTC)[reply]

Table syntax doesn't work properly after : (definition list syntax), and pre tags just don't work at all if there's a newline inside. The second table in Talk:lay#To lay and to lie is in an entirely different definition list tag (<dl>) from the text above it. It's even worse trying to put a table after * or #: no table generates at all. Apparently nobody who was designing wikitext figured out how to make list syntax cooperate with these other syntactic features. So it kind of makes sense that the discussion feature wouldn't properly support pre tags with newlines at least, and maybe tables as well. They already don't work properly in discussions. I use syntaxhighlight tags instead of pre tags because they actually work in definition lists. — Eru·tuon 21:27, 17 January 2024 (UTC)[reply]

Double derived terms

There are plenty of examples where the same term has been added to a table twice, e.g. in Derived terms with {{col-auto}}. Any way we can generate a list of them and/or delete them all? P. Sovjunk (talk) 22:30, 2 November 2023 (UTC)[reply]

Can you give some examples? Benwing2 (talk) 10:03, 3 November 2023 (UTC)[reply]

I don't remember any, so I made my own error! |foolproof appears twice at fool. P. Sovjunk (talk) 10:10, 3 November 2023 (UTC)[reply]
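As a rough illustration of how such duplicates could be listed, the sketch below splits a flat column-template call on `|` and counts repeated terms. It assumes the first argument is the language code and that named parameters contain `=`; it does not handle nested templates, links or qualifiers, and `duplicate_terms` is a hypothetical helper, not an existing tool.

```python
from collections import Counter


def duplicate_terms(template_call):
    """Return terms listed more than once in a flat {{col-auto|lang|...}} call."""
    inner = template_call.strip()
    if inner.startswith("{{") and inner.endswith("}}"):
        inner = inner[2:-2]
    parts = [part.strip() for part in inner.split("|")]
    # parts[0] is the template name and parts[1] the language code (assumed);
    # named parameters such as sort=0 are skipped.
    terms = [part for part in parts[2:] if part and "=" not in part]
    counts = Counter(terms)
    return sorted(term for term, n in counts.items() if n > 1)


if __name__ == "__main__":
    print(duplicate_terms("{{col-auto|en|foolproof|foolish|foolproof}}"))
```

Run over a dump, a function like this would only produce a list of candidate pages; the terms would still need review before anything is removed.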

A few appear among multi-word derived terms of many one-word English vernacular names of plants, mammals, birds, fish. Also alternative forms are shown separately in such lists. It is tedious to try to edit the duplicates out manually and group the alternative forms because the edit window does not show them alphabetized. This is one of the things that I hold against automatic alphabetization in column templates. If both removal of duplicates and combining alternative forms on a single line were automated for all content in column templates, that would go a long way toward making autoalphabetization desirable for editors. DCDuring (talk) 13:00, 3 November 2023 (UTC)[reply]

@DCDuring How would turning off automatic alphabetisation help with these issues in any way? Theknightwho (talk) 22:23, 3 November 2023 (UTC)[reply]

It would probably double the speed with which I can find duplicates and alt form near-duplicates and make me more willing to do so. It might even cause other contributors to do so. DCDuring (talk) 23:37, 3 November 2023 (UTC)[reply]

@DCDuring So what you're really saying is that we should inconvenience everyone else for the sake of pointlessly sorting the wikicode. No. We can certainly eliminate duplicates automatically, but we don't need to waste everyone's time for the sake of making an unnecessary job a bit more convenient - it would be much more helpful if you spent time doing things that can't be done via automation. Theknightwho (talk) 23:39, 3 November 2023 (UTC)[reply]

But the point is that they have not been done by automation and are not being done by automation. DCDuring (talk) 00:00, 4 November 2023 (UTC)[reply]

@DCDuring The display form is what actually matters. If you need to turn off automatic sorting for yourself, just use sort=0 in a preview. Theknightwho (talk) 00:18, 4 November 2023 (UTC)[reply]

Speaking of alphabetization, I've been dumping loads of Derived terms with blatant disregard for alphabetical order. The idea is that one day a bot is gonna 'betize 'em anyway. P. Sovjunk (talk) 22:08, 3 November 2023 (UTC)[reply]

How hard would it be to alphabetize using a text editor, word-processor, or text-sorting utility before dumping? DCDuring (talk) 23:37, 3 November 2023 (UTC)[reply]

I don't think anyone else wants what you want here, @DCDuring. Asking everyone to sort everything just for a particular case is somewhat selfish. Vininn126 (talk) 23:50, 3 November 2023 (UTC)[reply]

What I've asked for all along is that the auto-alphabetizing column templates not be applied in an automated way. IOW, I've asked for less work from others than they insist on doing. DCDuring (talk) 00:00, 4 November 2023 (UTC)[reply]

@DCDuring Forcing everyone else to manually sort things for your personal convenience is extremely selfish. Most of us don't want to do that, and the convenience issues you raise are very straightforward for you to get around. Theknightwho (talk) 00:49, 4 November 2023 (UTC)[reply]

@DCDuring It sounds like you actually want auto-alphabetisation, but only if "removal of duplicates and combining alternative forms on a single line were automated" as well? I think any reasonable editor would want those things too in the long run (although maybe I should speak for myself). So I am sure we will get there. Wiktionary is a work in progress on every front, including the technical fronts. This, that and the other (talk) 01:09, 4 November 2023 (UTC)[reply]

The draft template {{derived terms}} removes duplicates, just to prove that it is possible. However, that template is only a draft. On my to-do list is to fix it up so that it can be put into production. This, that and the other (talk) 01:01, 4 November 2023 (UTC)[reply]

@This, that and the other It's something that can certainly be integrated into the main column templates, but the issue is making sure that it can account for various oddities like language code differences, qualifiers etc. Theknightwho (talk) 01:05, 4 November 2023 (UTC)[reply]

There's probably a way to fiddle with a template and add a parameter to avoid alphabetization in a template, like alpha=no. I certainly can't do it, but I'm sure it's doable. P. Sovjunk (talk) 10:37, 4 November 2023 (UTC)[reply]

template include size doubles when transcluded??

@Erutuon, Theknightwho Can you help me understand why the template include size more than doubles between Template:label/documentation and Template:label? I added a table to the former that shows all the defined labels. It's rather large at 1,026,442 bytes, but well below the 2M limit. However, it exceeds the 2M limit when transcluded into Template:label. I checked other pages with large documentation tables in them, and e.g. on Template:inflection of/documentation the size is 100,907 bytes, but it increases to 282,012 bytes when transcluded into Template:inflection of. Is there a way to avoid this? Benwing2 (talk) 05:10, 3 November 2023 (UTC)[reply]

@Benwing2 Have a look at w:WP:Post-expand include size, which has some info about it. There are multipliers that apply in various situations, so that’s almost certainly what’s going on here. Theknightwho (talk) 05:21, 3 November 2023 (UTC)[reply]

(e/c) I "fixed" this by moving the table directly to the noinclude portion of {{label}}, but it's a nasty hack. Benwing2 (talk) 05:21, 3 November 2023 (UTC)[reply]

@Theknightwho There are no if statements or anything in the transclusion; {{documentation}} directly calls Lua. Apparently this is a known bug from way, way back, marked as "will not fix"; see [1]. Benwing2 (talk) 05:24, 3 November 2023 (UTC)[reply]

@Benwing2 Very frustrating. I guess we could use the template parser which would get around this, but it’s not finished yet. Theknightwho (talk) 05:27, 3 November 2023 (UTC)[reply]

Bot request: Thesaurus language code tidying

The Thesaurus namespace is stuck in an awful timewarp. One of the most severe issues is that the main templates used, {{ws sense}} and {{ws}}, do not require a language code. This means that thesaurus entries are not properly categorised according to language, and per-language/script text formatting and automatic transliterations are missing on the listed terms.

{{ws header}} takes a |lang= parameter, but this doesn't make sense, as that template occurs once on each page, while there may be multiple L2s for different languages (just as for our regular entries). Over 3500 thesaurus entries do not have this parameter set, so counting the thesaurus entries in a particular language is difficult. Moreover, it seems impossible to truly be sure of how many different languages are represented in the thesaurus without using dumps.

It's clear that the community wants to retain non-English content in this namespace (Wiktionary:Votes/pl-2017-11/Restricting Thesaurus to English), so to assist with categorisation, could I please request a friendly bot owner to help? We would need to add a language parameter (corresponding to the L2) as the first parameter of every occurrence of at least {{ws sense}}, to allow for proper categorisation, and ideally also {{ws}} as well, to allow for automatic transliterations on pages like Thesaurus:അമ്മ. We may as well also remove the lang= parameter of {{ws header}} while we're there. This, that and the other (talk) 11:56, 3 November 2023 (UTC)[reply]

@This, that and the other Given we now have a much larger buffer when it comes to memory issues, it might be worth reconsidering whether we can integrate the thesaurus into the mainspace instead. It's a lot more likely to get attention that way. Theknightwho (talk) 23:03, 3 November 2023 (UTC)[reply]

Content-wise, the namespace isn't especially neglected; it gets occasional, but reasonably consistent, contributions, although I doubt many of the edits are patrolled that closely by experienced contributors. It's more the technical side that is in a state of almost complete neglect, probably not helped by the fact that one of the main and most prolific thesaurus contributors was an editor who was notoriously difficult to work with (but is no longer editing Wiktionary). This bot work is necessary to gain a proper understanding of the content that's in the namespace and I'd say it would be premature to consider any suggestions to integrate the content elsewhere before this work is done. This, that and the other (talk) 00:57, 4 November 2023 (UTC)[reply]

@This, that and the other I have a script to do this. Can you create new sandbox versions of {{ws}} and {{ws header}}, where the first takes a mandatory langcode param in |1= and the second doesn't take a langcode param? Once you do that, I will do the following:

Rename {{ws}} to {{ws-old}}, leaving the former name as a redirect.
Do a bot run to replace all uses of {{ws}} with {{ws-old}}.
Replace the definition of {{ws}} with the new sandbox one.
Do a bot run to rename {{ws-old}} back to {{ws}}, in the process adding the lang code in |1= and moving all numbered params up by one.
Delete {{ws-old}}.
In the process, insert the lang code in |1= for {{ws sense}}.
Do a bot run to remove the |lang= param from {{ws header}}.

The reason for this process is because {{ws}} is being changed in an incompatible way and it doesn't appear possible to distinguish the old calling convention from the new one, since the old calling convention allows for a variable number of numbered params. {{ws sense}} seems to already take a language code and handle both new and old calling conventions, so we don't need to do the same thing for it. Benwing2 (talk) 06:55, 6 November 2023 (UTC)[reply]
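The parameter shift in that plan can be pictured with a short sketch. It is not the script mentioned above: the regex only matches the literal opening `{{ws|`, the example arguments are placeholders, and `add_lang_code` is a hypothetical name.

```python
import re

# Matches only the opening of {{ws|...}}; {{ws sense|...}} and {{ws header|...}}
# are left alone because a pipe must directly follow "ws".
WS_OPEN = re.compile(r"\{\{ws\|")


def add_lang_code(wikitext, lang):
    """Insert `lang` as the new first numbered parameter of every {{ws|...}} call."""
    return WS_OPEN.sub(lambda match: "{{ws|" + lang + "|", wikitext)


if __name__ == "__main__":
    page = "* {{ws|happy|glad}}\n{{ws sense|joyful}}"
    once = add_lang_code(page, "en")
    print(once)
    # Running the same pass again shifts the parameters a second time.
    print(add_lang_code(once, "en"))
```

Because inserting the code at the front pushes every existing numbered parameter up by one, the rewrite is not idempotent: a second pass cannot tell an already converted call from an unconverted one. That is the same ambiguity that motivates the temporary {{ws-old}} name in the steps above.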

@Benwing2 thanks for pitching in! I was considering making a vote to give my bot TTObot the flag, but I would be much more likely to make a silly mistake!

It would be possible to potentially reduce the number of edits required at the cost of marginal additional complexity. Consider that the vast majority of thesaurus entries (97% by my reckoning) link only to terms that are longer than two characters. We could implement temporary logic in {{ws}} that treats parameter 1 as a language code if it is two characters, otherwise, as the term to be linked. That way, the only entries needing to be touched by step 1 would be those containing instances of {{ws}} with two-character terms (and three-character language codes, I guess), which would have an optional, temporary |lang= parameter added.

If you think this is worth doing, let me know and I'll code up a version of {{ws}} with extra smarts. Otherwise I'll just do a basic one where all the parameter numbers are increased by 1. This, that and the other (talk) 09:41, 6 November 2023 (UTC)[reply]
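A minimal sketch of the temporary two-character heuristic proposed above, written in Python only to make the edge cases visible (the real logic would live in the template). The function name and return shape are assumptions.

```python
def split_first_param(params):
    """Treat a two-character first parameter as a language code (temporary heuristic).

    Returns (lang_or_None, remaining_terms).
    """
    if params and len(params[0]) == 2:
        return params[0], list(params[1:])
    return None, list(params)


if __name__ == "__main__":
    print(split_first_param(["en", "happy"]))    # new convention, recognised
    print(split_first_param(["happy", "glad"]))  # old convention, recognised
    print(split_first_param(["go", "leave"]))    # two-character term read as a code
    print(split_first_param(["ttj", "word"]))    # three-character code read as a term
```

The last two calls are the cases that would need the temporary |lang= parameter: a two-character term is mistaken for a language code, and a three-character code is mistaken for a term.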

@This, that and the other I think this could get complicated because there are longer language codes, e.g. ine-pro and ine-bsl-pro and zlw-ocs and such. So it might not be worth the extra work to save some edits given the much higher possibility of mistakes and the fact that there are only about 3,500 pages that employ {{ws}} on them currently. Benwing2 (talk) 09:54, 6 November 2023 (UTC)[reply]

@Benwing2 I doubt there is too much Proto-Indo-European in the thesaurus :) Anyway, sure thing, it would have got pretty complex the way I suggested even without accounting for fancy language codes.

I've made {{ws/new}}, but I don't think any changes are required to {{ws header}}, as the |lang= parameter is optional, so it can just be removed from all the invocations. This, that and the other (talk) 10:15, 6 November 2023 (UTC)[reply]

@This, that and the other Can you add support for |pos= to {{ws sense}}? It was formerly present in {{ws header}}, and I just removed the support but it's used e.g. on Thesaurus:प्रसिद्धि and some other Sanskrit pages. Benwing2 (talk) 03:27, 7 November 2023 (UTC)[reply]

@Benwing2 thanks for your awesome work on this! I wonder what would be the impact of removing the |lang= parameter from {{ws header}}? It seems redundant now that {{ws sense}} is doing categorisation. This, that and the other (talk) 04:29, 7 November 2023 (UTC)[reply]

@This, that and the other Once you add support for |pos= and fix up the few places it's used, the impact should be none. Benwing2 (talk) 04:30, 7 November 2023 (UTC)[reply]

@Benwing2 I'm not really sure that |pos= belongs in {{ws header}}; it is a property of the language section, not of the thesaurus entry itself. The param was only used for categorisation, and since the categorisation logic has moved to {{ws sense}}, I guess the |pos= parameter and associated logic should be moved to that template... although I'm not totally convinced that it is needed at all... This, that and the other (talk) 04:34, 7 November 2023 (UTC)[reply]

Enabling Module:form of/lang-data/ttj

Please add ["ttj"] = true to Module:form of in order to enable the language specific tags for Tooro located at Module:form of/lang-data/ttj. Thank you. Ahiise2 (talk) 16:46, 5 November 2023 (UTC)[reply]

@Ahiise2:

Done. — Fenakhay ^{(حيطي · مساهماتي)} 17:01, 5 November 2023 (UTC)[reply]

Thank you! Ahiise2 (talk) 20:42, 5 November 2023 (UTC)[reply]

I can't create a sandbox?

The bot is flagging me as a spammer for trying to create a sandbox page. Is it because my previous user page was deleted? Saph668 (talk) 23:04, 6 November 2023 (UTC)[reply]

Not sure. User:Saph668/Sandbox has now been created. Please make sure that your content is on topic. Sandboxes and user pages in general are held to lower scrutiny than the main dictionary, but we don't provide free hosting for just any material. —Justin (koavf)❤T☮C☺M☯ 23:06, 6 November 2023 (UTC)[reply]

Thanks Saph668 (talk) 23:09, 6 November 2023 (UTC)[reply]

Finding the diacritic-stripped link target for a page title

I am trying to fix a problem in {{ws}} that concerns Arabic entries. Thesaurus:أحزن has an antonym, أَبْهَجَ (ʔabhaja). This word has a corresponding thesaurus entry, Thesaurus:أبهج, so the [⇒ thesaurus] cross-reference should be shown. However, the {{ws}} template does not recognise the existence of Thesaurus:أبهج because it is looking for Thesaurus:أَبْهَجَ, which contains diacritics that are not used in page titles.

I'd like to adjust the parameter to {{#ifexist:}} so that it looks for the correct title, but I cannot find an existing template that will generate this title. Essentially I just want the target of the link generated by {{l}} or {{m}}, not the link itself. It looks like Module:links contains a function export.getLinkPage that would do this, but I don't know enough Lua to make it callable from a template. This, that and the other (talk) 01:21, 8 November 2023 (UTC)[reply]

@This, that and the other You can do it with {{entryname|ar|أَبْهَجَ}}, which gives أبهج. In all honesty, it's probably easier to rewrite it in Lua, because otherwise you'll keep running into issues like this. Theknightwho (talk) 02:45, 8 November 2023 (UTC)[reply]
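To illustrate what the diacritic stripping amounts to for this Arabic example, here is a standalone sketch. The set of characters removed (tatweel, the vowel marks U+064B to U+0652, and the dagger alif) is an assumption chosen for the example; it is not taken from the module behind {{entryname}}, which may handle more cases.

```python
import re

# Assumed set of marks to drop: tatweel (U+0640), the vowel marks
# U+064B..U+0652 (tanwin, fatha, damma, kasra, shadda, sukun) and the
# dagger alif (U+0670).
ARABIC_MARKS = re.compile("[\u0640\u064B-\u0652\u0670]")


def strip_arabic_diacritics(text):
    """Return `text` without the optional marks that page titles omit."""
    return ARABIC_MARKS.sub("", text)


if __name__ == "__main__":
    vocalised = "\u0623\u064e\u0628\u0652\u0647\u064e\u062c\u064e"  # أَبْهَجَ
    print("Thesaurus:" + strip_arabic_diacritics(vocalised))
```

The hamza-bearing alif (U+0623) is a base letter here, not a combining mark, so it is deliberately left untouched; stripping every Unicode combining mark after decomposition would remove the hamza as well and point to the wrong title.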

@Theknightwho Ah, thanks, exactly what I was after. I've put it in a category so hopefully it can be found more easily by others in the future. This, that and the other (talk) 02:51, 8 November 2023 (UTC)[reply]

Tabbed languages?

Is the Tabbed languages gadget failing for anyone else who usually uses it? —Mahāgaja · talk 18:52, 8 November 2023 (UTC)[reply]

Yes, it suddenly died. I'll look. This, that and the other (talk) 22:28, 8 November 2023 (UTC)[reply]

Should be back now. I have no idea why this suddenly broke. I guess something must have changed in the latest MediaWiki deployment. Nor do I have any idea why the code I commented out was present in the TabbedLanguages script in the first place. This, that and the other (talk) 00:13, 9 November 2023 (UTC)[reply]

Reported at phab:T350080#9318075. This, that and the other (talk) 00:17, 9 November 2023 (UTC)[reply]

Thanks! It's back for me now. —Mahāgaja · talk 07:31, 9 November 2023 (UTC)[reply]

Why does Template:senseid use HTML li tag?

@Erutuon, This, that and the other The {{senseid}} template sticks its anchors inside of <li .../> by default. This is highly problematic because it prevents anything from occurring to the left of the {{senseid}}. In particular, {{transclude sense}} puts a {{senseid}} at the left of the generated text, and if the user quite reasonably inserts a label before that, you get an unexpected blank line. Why is 'li' necessary? Why not just use 'span'? That's what was there originally. Benwing2 (talk) 06:22, 9 November 2023 (UTC)[reply]

@Benwing2 It's so the following CSS in MediaWiki:Common.css works:

/* senseids */
.senseid:target { background-color: #DEF; }

The :target pseudoclass applies to the element whose HTML id attribute matches the URL hash. If an empty span were to be used, this pure-CSS approach would need to be replaced by JS code. Once the :has CSS selector is fully supported and becomes established, it would be easy to switch to using an empty span instead. This, that and the other (talk) 07:31, 9 November 2023 (UTC)[reply]

@This, that and the other Thanks. Does this mean that the link to the sense ID is somehow colored? Presumably the text of the sense ID itself isn't colored because it's empty. Can you give me an example where this works? For {{place}} and {{transclude sense}} in particular, I'm going to hack it to use <span>, because having missing background color (in links to sense ID's that probably are never going to be linked to) is better than a highly visible extra newline. Can you go ahead and add the :has selector code to MediaWiki:Common.css so it works correctly with <span> wherever it's supported? (Which browsers are these?) Benwing2 (talk) 07:53, 9 November 2023 (UTC)[reply]

@Benwing2 sorry, I didn't really explain myself! I started writing something on my phone and then switched to my computer, but forgot to re-write the stuff I wrote on my phone.

This CSS rule powers the effect you see when you follow a link like {{m|en|sun|id=Q525}} sun, where the sense linked to is highlighted in blue. Does that help to explain?

I'll look at adding the extra CSS rule in a bit. This, that and the other (talk) 08:05, 9 November 2023 (UTC)[reply]

@This, that and the other Ahh, I see. So for example if I define Picardie in Norman using {{transclude sense}} to transclude the English definition of Picardy, and it generates a sense ID for that definition, someone who links to the Norman definition of Picardy e.g. {{m|nrf|Picardie|id=Q1249603}} will (ideally) see that definition highlighted in blue. You can see on that definition how the auto-generated {{senseid}} doesn't work well with {{lb}}. I think I'd rather lose the blue highlighting than have the extra newline always inserted, esp. if you can add the extra CSS rule so that it works with spans on newer browsers. Thanks! Benwing2 (talk) 09:10, 9 November 2023 (UTC)[reply]

@Benwing2: The tag can be changed in Module:transclude/sense. That requires directly calling the function in Module:senseid rather than expanding {{senseid}}, because {{senseid}} only permits li and p tags, which will generate highlighted text without JavaScript, so it is only a solution because {{transclude sense}} is implemented in Lua, rather than wikitext. Currently no extra processing is done in {{senseid}}, so no features are missed by calling directly into Module:senseid. — Eru·tuon 15:49, 9 November 2023 (UTC)[reply]

@Benwing2 I added the CSS, but I seemed to have some trouble getting it to work - I'd be interested to learn of your results. This, that and the other (talk) 01:11, 10 November 2023 (UTC)[reply]

@This, that and the other It seems to work for me, thanks! See User:Benwing2/test-senseid-link. This is a link using {{m|nrf|Picardie|id=Q1249603}} to the Norman entry for Picardie, using the sense ID added by {{transclude sense}}. User:Erutuon already changed the code in Module:transclude/sense to use a "span" instead of "li". This is using Chrome version 119.0.6045.105 on Mac OS Ventura 13.3. Benwing2 (talk) 01:30, 10 November 2023 (UTC)[reply]

@Benwing2 works for me too on Chrome. My testing was admittedly somewhat artificial, so it's good to see that a real example works.

In Firefox, as expected, only the content of {{transclude sense}} is highlighted, not the whole list item. This, that and the other (talk) 01:35, 10 November 2023 (UTC)[reply]

@This, that and the other Hmm, which is more correct? Presumably the Firefox behavior? Does "as expected" refer to Firefox's generally better implementations of W3 standards? Benwing2 (talk) 01:51, 10 November 2023 (UTC)[reply]

@Benwing2 it refers to the fact that MDN's compatibility table (which I linked in my first, somewhat incomprehensible, reply) says that Firefox currently doesn't support the :has selector. The :has selector is in the standard so I suppose Firefox is yet to catch up. This, that and the other (talk) 03:03, 10 November 2023 (UTC)[reply]

@This, that and the other Aha, OK; thanks for implementing it! Benwing2 (talk) 03:41, 10 November 2023 (UTC)[reply]

@Benwing2 {{transclude sense}} has a parameter for labels. (But I agree it should be changed). Vininn126 (talk) 10:13, 9 November 2023 (UTC)[reply]

I'm not 100% sure whether the changes made here have affected this, but I believe redirects to senseid anchors also used to result in blue highlighting, e.g. clicking on the talk used to result in the relevant sense of talk being highlighted upon landing on that page, but there is no longer any highlighting for me in Firefox or Chrome (although the Picardie link above does result in highlighting on that page). Highlighting was helpful due to the known bug that the screen can "jump" when tables collapse (see Wiktionary:Tea_room/2023/November#wrong_link_on_tetigere for a recent description), so it'd be nice if we could find a way to restore the functionality. (If not, I agree with Benwing that if we have to choose, not having the newlines seems like a higher priority than having the blue.) - -sche (discuss) 06:28, 10 November 2023 (UTC)[reply]

@-sche By "used to" do you mean up until the last 12 hours or so, or some time back? Benwing2 (talk) 06:58, 10 November 2023 (UTC)[reply]

I don't encounter redirects to senseid senses often, so I can't be sure whether the changes made here are what removed the blueness, but Internet Archive's most recent archive, from 26 September, had blue highlighting. (Were there other changes made between then and now to how senseid links work?) - -sche (discuss) 07:25, 10 November 2023 (UTC)[reply]

I can still see the blue on that redirect. This, that and the other (talk) 09:53, 10 November 2023 (UTC)[reply]

@This, that and the other I don't see it (Chrome version 119.0.6045.105 on Mac OS Ventura 13 as above); but strangely, I do see it upon refresh. Same behavior with Safari version 16.4 (I don't have Firefox installed). On which browser are you running? Benwing2 (talk) 10:08, 10 November 2023 (UTC)[reply]

The blue highlight works consistently on Edge 119.0.2151.44 and Chrome 118.0.5993.120, both on Windows, when I click the link [[the talk]], even if I have to scroll up a bit to see the actual blue-highlighted definition. When browsing directly to the URL https://en.wiktionary.org/wiki/the_talk it seems intermittent.

On Firefox the highlight only appears after refreshing the page.

Not sure why this would happen: I find it hard to believe that my recent change is to blame, but I could undo it for an experiment if it would help.

When clicking {{m|en|the talk}} the talk it does not work in any browser, probably because Tabbed Languages interferes with the hash component of the URL. This, that and the other (talk) 10:22, 10 November 2023 (UTC)[reply]

{{m|en|the talk}} translates to the talk#English. I don't have Tabbed Languages enabled, and when I click that, I go to talk#English. It seems that the fragment #English overrides the fragment #English:_the_talk specified in the redirect page. — Eru·tuon 15:52, 10 November 2023 (UTC)[reply]
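The observation above can be written down as a tiny model. It only encodes the behaviour described in this thread (a fragment on the clicked link wins over the fragment stored in the redirect); it is not taken from MediaWiki's code, and `landing_fragment` is a hypothetical name.

```python
from urllib.parse import urldefrag


def landing_fragment(link_url, redirect_target):
    """Model of the observed behaviour: the link's own fragment, if any,
    takes precedence over the fragment given in the redirect target."""
    link_fragment = urldefrag(link_url).fragment
    target_fragment = urldefrag(redirect_target).fragment
    return link_fragment or target_fragment


if __name__ == "__main__":
    target = "https://en.wiktionary.org/wiki/talk#English:_the_talk"
    print(landing_fragment("https://en.wiktionary.org/wiki/the_talk#English", target))
    print(landing_fragment("https://en.wiktionary.org/wiki/the_talk", target))
```

Under this model a plain [[the talk]] link keeps the sense anchor from the redirect, while a link generated with an added #English fragment lands on the language section instead.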

Fascinating, refreshing the page causes the blue highlighting to appear for me, too, in both Firefox and Chrome, but going to the talk and being redirected for a second time does not cause any blue. I'm not sure why the blue only appears when refreshing the page (not when navigating from the talk) in this case, but appears straight away (without requiring any refreshing) when navigating from Benwing's test page to Picardie, or when clicking a {{l}} link to get to talk, like the one I inserted here, or even just using a direct URL link like the one I inserted here. So, it seems like a soft redirect that made the user click through to talk would result in the sense of talk being highlighted, but a hard redirect doesn't. This is probably a bad idea, but FWIW if we had e.g. some javascript that would notice when a user had been redirected to an anchor and refresh the page after a second, it would solve both the "there is no blue until refresh" issue and the "page jumps, so link appears to go to wrong section" (tetigere) issue. But it would probably annoy people who'd started reading or scrolling in the meantime. - -sche (discuss) 16:18, 10 November 2023 (UTC)[reply]

In Firefox I also don't see the "customary conversation" definition highlighted on clicking the talk and redirecting to talk#English_the_talk. This, that and the other didn't change the CSS rule that highlights the definition, and I can't remember visiting a redirect where the target was a sense ID anchor, so I can't verify whether the definition used to be highlighted. The archive link is a direct link to the sense ID target, so there is no redirect involved and it has a highlighted definition just as talk#English:_the_talk does. It seems like it's a browser bug where the element that's identified by the URL fragment #English_the_talk isn't being selected by the :target CSS selector when the page talk has been reached by a redirect. That's assuming the web standards say :target selector is meant to work after a redirect, or at least don't say it's not supposed to work. — Eru·tuon 15:42, 10 November 2023 (UTC)[reply]

ditto in it-verb

On prudere#Verb, the conjugation reads

prùdere (first-person singular present prùdo, first-person singular past historic (rare) prudétti or (ditto, traditional) prudètti, no past participle)

Can it just say rare, traditional instead? "Ditto" seems unacademic and breaks the readers' train of thought. And it might be somehow mis-parsed if the code changes and someone is looking at an old diff. Thanks, —Soap— 18:39, 9 November 2023 (UTC)[reply]

@Soap The reason I put in the ditto notation was to avoid repeating long sentences that sometimes occur. This can definitely be made smarter, so that e.g. it only does this if the qualifier(s) in question are longer than a certain length, and the word itself can be changed (any ideas?). Benwing2 (talk) 20:41, 9 November 2023 (UTC)[reply]
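
The length-threshold idea above could be sketched roughly as follows. This is only an illustration in Python, not the actual headword module code: the function name `format_forms`, the `min_ditto_len` threshold and the sample long qualifier are all assumptions made up for the example.

```python
def format_forms(forms, min_ditto_len=20, ditto_word="ditto"):
    """Render (form, [qualifiers]) pairs, abbreviating repeated qualifiers
    only when the repeated text is long enough to be worth abbreviating."""
    parts = []
    prev = []
    for form, quals in forms:
        shown = list(quals)
        # Abbreviate only if this form starts with all of the previous qualifiers.
        if prev and quals[:len(prev)] == prev:
            repeated = ", ".join(prev)
            if len(repeated) >= min_ditto_len:
                shown = [ditto_word] + quals[len(prev):]
        label = "(" + ", ".join(shown) + ") " if shown else ""
        parts.append(label + form)
        prev = list(quals)
    return " or ".join(parts)


# Short qualifier: repeated in full, as requested for prudere.
print(format_forms([("prudétti", ["rare"]), ("prudètti", ["rare", "traditional"])]))
```

With a threshold like this, a short qualifier such as "rare" is simply repeated, and the abbreviation only kicks in for long notes. The replacement word is a parameter, so it can be swapped for something other than "ditto".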

@Benwing2 The footnotes that Russian headwords have works quite well:

трюм • (trjum) m inan (genitive трю́ма, nominative plural трю́мы or трюма́^*, genitive plural трю́мов or трюмо́в^*) (* In professional speech.)

Theknightwho (talk) 00:33, 13 November 2023 (UTC)[reply]

Sorting mutated form categories

Can someone (e.g. @Benwing2) edit either the "mutated form of" templates ({{aspirate mutation of}}, {{eclipsis of}}, {{h-prothesis of}}, {{hard mutation of}}, {{lenition of}}, {{mixed mutation of}}, {{nasal mutation of}}, {{soft mutation of}}, {{t-prothesis of}}) or Module:form of/templates (which they invoke) so that the corresponding categories are sorted alphabetically by the base form (i.e. |2= of the form-of template) rather than by the page name? Thanks! —Mahāgaja · talk 20:37, 10 November 2023 (UTC)[reply]

Should be added to all of these templates. Sorry for the multiple pings. Benwing2 (talk) 22:38, 11 November 2023 (UTC)[reply]

Great, thanks! —Mahāgaja · talk 08:01, 12 November 2023 (UTC)[reply]

Request for deletion of "persón", not a Catalan word.

Request for deletion of "persón", not a Catalan word. Such a word does not exist in Catalan, and even if it did, it would not be written with "ó". Esberginia (talk) 16:36, 11 November 2023 (UTC)[reply]

This is the Grease Pit, for technical matters, not substantive language matters. Remove {{rfd|ca}} and add {{rfv|ca}} (Request for Verification) at "persón", and use the small "+" to add this to the right page. DCDuring (talk) 18:07, 11 November 2023 (UTC)[reply]

Ping @Esberginia in case you missed DCDuring's message. Thanks for your contributions! This, that and the other (talk) 00:22, 13 November 2023 (UTC)[reply]

Thanks, definitely missed previous reply.

Also, could you direct me to the place where to ask questions on grammatical templates and the like?

Esberginia (talk) 12:05, 13 November 2023 (UTC)[reply]

@Esberginia This is the place to ask for technical advice on templates. If your question is more linguistic in nature you could try the Tea Room, but I'm not sure if we have many (any?) active Catalan contributors. This, that and the other (talk) 23:02, 13 November 2023 (UTC)[reply]

@Esberginia: What's your question regarding the template? If it's a question about a specific template, you can also try starting a discussion on its talk page, although editors don't always watch these pages and might miss it, so best to {{ping}} someone in particular. Check the history to see who's worked on the template. There's also a talk page for general language level discussions, Wiktionary talk:About Catalan (almost empty, though). Jberkel 00:04, 14 November 2023 (UTC)[reply]

Hard-coded Albanian reference links

While cleaning up hard-coded URLs linking to Wikipedia, I came across a whole bunch of references to a certain Albanian-Latin dictionary, added without any templates by someone who was later blocked for other reasons. For example, at Albanian pak:

<ref>[https://archive.org/details/fialuurivoghels00junggoog/page/n112/mode/2up Fialuur i voghel Sccyp e ltinisct (Small Dictionary of Albanian and Latin), page 94], by P. Jak Junkut, 1895, [https://en.wikipedia.org/wiki/Shkodër Sckoder]</ref>

Click on Special:Search/insource:"Wikipedia.org/wiki/Shkodër" to see all 68 of them. I don't really want to just convert the Wikipedia part to {{w|Shkodër|Sckoder}} in all of these, since it seems like a lot of work that will only address one aspect of the problem. I'm not sure whether we want to create a template for these, given that no one else will probably ever link to this source, or convert them to something like {{quote-book}}, or just remove them- but whatever we do, this looks like bot work. Anyone interested? Chuck Entz (talk) 21:49, 11 November 2023 (UTC)[reply]
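
For whoever picks this up, here is a rough Python sketch of the two mechanical steps a bot would need: finding these references and converting the hard-coded Wikipedia link. It only uses the standard `re` module. The function names are invented for illustration, and the patterns are tuned to the single example quoted above, so they would need checking against all 68 pages.

```python
import re

# External link to English Wikipedia: [https://en.wikipedia.org/wiki/TITLE LABEL]
WIKIPEDIA_LINK = re.compile(
    r"\[https?://en\.wikipedia\.org/wiki/([^\s\]]+) ([^\]]+)\]"
)

# One hard-coded reference to the 1895 dictionary, as in the example at "pak".
JUNGG_REF = re.compile(
    r"<ref>\[(?P<url>https://archive\.org/details/fialuurivoghels00junggoog/\S+) "
    r"[^\]]*?page (?P<page>\d+)\][^<]*</ref>"
)


def wikipedia_links_to_w(text):
    """Replace hard-coded en.wikipedia links with {{w|TITLE|LABEL}}."""
    def repl(m):
        return "{{w|%s|%s}}" % (m.group(1).replace("_", " "), m.group(2))
    return WIKIPEDIA_LINK.sub(repl, text)


def extract_jungg_refs(text):
    """Return (archive_url, page_number) for each matching reference."""
    return [(m.group("url"), int(m.group("page"))) for m in JUNGG_REF.finditer(text)]
```

The extracted URL and page number would be enough to fill in either a dedicated reference template or {{quote-book}}, whichever is preferred; the sketch deliberately does not pick one.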

Template:R:nan:thcwd parameter update

Every entry in this dictionary is assigned a unique ID, and the reference template takes the entry's ID as the main parameter. However, due to substantial updates to the dictionary this year, nearly all entry IDs have been modified. If feasible, I think we should use a bot to rectify this issue. --TongcyDai (talk) 01:34, 12 November 2023 (UTC)[reply]

Fixing up Portuguese pre-AO forms with bots

@WingerBot The Portuguese entries for spellings before the orthographic agreements are a mess. They use like three hundred templates and for the agreements made before 1990, they're absurdly inconsistent. Sometimes a word that was standard until the 70s was flagged as "obsolete", etc. etc. For the past couple of days, I've been endeavoring to straighten up every single old form, and to this end, after talking to people on Discord, I've created {{pt-pre-reform}}.

I need it to be applied to pages using the previous templates (there are about 1k of them), and I heard this would be really really easy with bots. I don't know how to do it though. If someone could apply these changes for me, I'd be really really grateful. Here are the exact replacements that need done:

{{pt-pre-1911|WORD}} -> {{pt-pre-reform|WORD|br=43|pt=11}}
{{pt-archaic-hellenism|WORD}} -> {{pt-pre-reform|WORD|br=43|pt=11}}
{{pt-archaic-latinism|WORD}} -> {{pt-pre-reform|WORD|br=43|pt=11}}
{{pt-obsolete-hellenism|WORD}} -> {{pt-pre-reform|WORD|br=43|pt=11}}
{{pt-obsolete-silent-letter-1911|WORD}} -> {{pt-pre-reform|WORD|br=43|pt=11}}
{{pt-archaic-silent-letter-1911|WORD}} -> {{pt-pre-reform|WORD|br=43|pt=11}}
{{pt-dated-differential-accent|WORD}} -> {{pt-pre-reform|WORD|br=71|pt=45}}
{{pt-obsolete-differential-accent|WORD}} -> {{pt-pre-reform|WORD|br=71|pt=45}}
{{pt-dated-secondary-stress|WORD}} -> {{pt-pre-reform|WORD|br=71|pt=73}}
{{pt-obsolete-secondary-stress|WORD}} -> {{pt-pre-reform|WORD|br=71|pt=73}}
{{pt-superseded-silent-letter-1990|WORD}} -> {{pt-pre-reform|WORD|br=43|pt=90}}
{{pt-superseded-ü|WORD}} -> {{pt-pre-reform|WORD|br=90|pt=45}}
{{pt-obsolete-ü|WORD}} -> {{pt-pre-reform|WORD|br=90|pt=45}}
{{pt-superseded-paroxytone|WORD}} -> {{pt-pre-reform|WORD|br=90|pt=90}}
{{pt-superseded-diacritic-Brazil|WORD}} -> {{pt-pre-reform|WORD|br=90|pt=0}}
{{pt-obsolete-ôo|WORD}} -> {{pt-pre-reform|WORD|br=90|pt=0}}
{{pt-superseded-ôo|WORD}} -> {{pt-pre-reform|WORD|br=90|pt=0}}
{{pt-superseded-éia|WORD}} -> {{pt-pre-reform|WORD|br=90|pt=0}}
{{pt-obsolete-éia|WORD}} -> {{pt-pre-reform|WORD|br=90|pt=0}}
{{pt-superseded-hyphen|WORD|dialect=Brazil}} -> {{pt-pre-reform|WORD|br=90|pt=0}}
{{pt-superseded-hyphen|WORD|dialect=Portugal}} -> {{pt-pre-reform|WORD|br=0|pt=90}}
{{pt-superseded-hyphen|WORD}} -> {{pt-pre-reform|WORD|br=90|pt=90}}
{{pt-archaic-sc|WORD}} -> {{pt-pre-reform|WORD|br=43|pt=45}}
{{pt-obsolete-sc|WORD}} -> {{pt-pre-reform|WORD|br=43|pt=45}}

I've tested and re-tested this in Wiktionary:Sandbox, so it should be working 100% as intended... It's a lot of lines, but that's because a bunch of these are redirects to each other; I'd been trying different solutions and learning about each reform deeply as I went. I hope this isn't too big an ask. MedK1 (talk) 04:35, 13 November 2023 (UTC)[reply]
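
In case it helps whoever runs the bot, the table above could be applied along these lines. This is a simplified Python sketch using only the standard `re` module, not the script that was actually run: the names `REPLACEMENTS`, `HYPHEN_BY_DIALECT` and `convert` are made up, only a few rows of the table are included, and templates with unexpected extra parameters are left untouched and reported.

```python
import re

# A few rows from the table above; the full mapping would list every old template.
REPLACEMENTS = {
    "pt-pre-1911": "br=43|pt=11",
    "pt-dated-differential-accent": "br=71|pt=45",
    "pt-superseded-ü": "br=90|pt=45",
    "pt-superseded-paroxytone": "br=90|pt=90",
    "pt-archaic-sc": "br=43|pt=45",
}
# {{pt-superseded-hyphen}} depends on its dialect= parameter.
HYPHEN_BY_DIALECT = {None: "br=90|pt=90", "Brazil": "br=90|pt=0", "Portugal": "br=0|pt=90"}

# Matches simple (non-nested) templates: {{name|params}}
TEMPLATE = re.compile(r"\{\{([^{}|]+)\|([^{}]*)\}\}")


def convert(text):
    """Return (new_text, warnings) after rewriting old templates."""
    warnings = []

    def repl(match):
        name = match.group(1).strip()
        word, *extra = match.group(2).split("|")
        if name == "pt-superseded-hyphen":
            dialects = [p[len("dialect="):] for p in extra if p.startswith("dialect=")]
            extra = [p for p in extra if not p.startswith("dialect=")]
            new_params = HYPHEN_BY_DIALECT.get(dialects[0] if dialects else None)
        else:
            new_params = REPLACEMENTS.get(name)
        if new_params is None:
            return match.group(0)  # not one of the old templates
        if extra:
            warnings.append("%s: unrecognized param: %s" % (name, extra[0]))
            return match.group(0)  # leave for manual review
        return "{{pt-pre-reform|%s|%s}}" % (word, new_params)

    return TEMPLATE.sub(repl, text), warnings
```

Skipping anything with extra parameters is the cautious choice here, since parameters such as a gloss would otherwise be silently dropped.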

I enjoy how you pinged the bot itself... maybe it is only a matter of time before our bots are hooked up to LLMs and can carry out tasks like this autonomously...

I'm not sure how I feel about the numbers in this template. I suppose there is a good reason why they have been used, but at the very least, they must be documented at Template:pt-pre-reform/documentation. This, that and the other (talk) 05:43, 13 November 2023 (UTC)[reply]

Yeah, of course! Making documentation is the plan. It should be there by tonight. The TL;DR though is that they mark what reform got rid of what kinds of words (there was more than one reform and they weren't exactly synchronous). I went for numbers rather than years because 1) it means there are less bits you have to replace/write when adding the template to a page or changing how a specific word is classified; and 2) the 1943 reform was technically only applied in 1946, and the 1990 one wasn't applied instantly either; if someday people decide to change the dates in the template to reflect that, no pages would need edits to go along with it. 2804:1B0:1903:FF5F:6580:2887:B51E:2729 10:21, 13 November 2023 (UTC)[reply]

@MedK1 Unfortunately I don't get pings addressed to my bot. I can carry out this change but I definitely think you should replace the numbers with something less opaque: Either named abbreviations of the reforms in question or years, and use country codes for the different countries referenced. Benwing2 (talk) 22:58, 13 November 2023 (UTC)[reply]

@Benwing2 My bad, I wasn't aware you didn't actually get pinged; I was imagining you had to log into the account in order to start applying the changes. Guess it just shows how little I know about this haha!

I was thinking about listing the reforms in the documentation, with a short summary of what each one does (and a few examples of which words fall where) similar to what I did in {{pt-archaic-sc}}. The numbers aren't random, they refer to the order in which the reforms happened: Portugal had 4 reforms (1911, 1945, 1973, 1990) and Brazil had 3 (1943, 1971, 1990). So doing "2" for the Portugal parameter would get you the "1945" description and "3" for Brazil, the 1990 description. Would this solution work?

As for the country codes, I actually wish I thought of that! I did consider that there could be confusion a la "does Brazil come first in the template or does Portugal?". All I could think about was putting it in alphabetical order and calling it a day. Making, for example, {{pt-pre-reform|WORD|br=1|pt=1}} is a much more elegant solution and I'm applying it right away! MedK1 (talk) 00:54, 14 November 2023 (UTC)[reply]

@MedK1 I use the pywikibot library to implement my bot code, and it logs into my bot account automatically (except when I need to do bot actions that need admin privileges, like deleting pages; for these I have it log into my admin account). I still think it would be better to use years rather than numbers; it seems especially confusing that the numbers refer to different reforms for Brazil vs. Portugal. In general, it's better to use human-parsable/memorable abbrevs rather than numbered or lettered ones unless the items in question are well-known by their numbers. So for example the seven classes of Germanic strong verbs are usually identified by number, same with the declensions and conjugations of Latin, but Germanic noun declensions don't have well-known numbers so we refer to them by name (e.g. i-stem, a-stem, ō-stem, etc.). I don't think the issue with the Brazilian 1943 reform being adopted in 1946 is a big issue with using years; Wikipedia, for example, identifies the reforms by years and refers to the "orthographic reform of 1943". Benwing2 (talk) 02:22, 14 November 2023 (UTC)[reply]

@Benwing2 There! I've coded it so you can use years; br=1971|pt=1945 instead of br=2|pt=2. I thought it'd be nice if br=71|pt=45 worked as well, so it does! I'm gonna work on the documentation now, but other than that, it should be all set. MedK1 (talk) 02:55, 14 November 2023 (UTC)[reply]
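
For anyone reading along, the way the three accepted spellings of a reform could be reconciled looks roughly like this. It is a Python sketch rather than the template's real code. The reform years are the ones listed earlier in this thread; the function name is invented, and treating 0 as "no value" is an assumption, since the meaning of 0 in the table is not spelled out here.

```python
# Reform years per country, in chronological order (from the thread above).
REFORMS = {
    "pt": [1911, 1945, 1973, 1990],
    "br": [1943, 1971, 1990],
}


def normalize_reform(country, value):
    """Map an ordinal (2), a two-digit year (45) or a full year (1945) to a full year."""
    years = REFORMS[country]
    n = int(value)
    if n == 0:
        return None  # assumption: 0 is passed through as "no reform given"
    if 1 <= n <= len(years):
        return years[n - 1]  # ordinal: 1 = first reform in that country
    if n in years:
        return n  # already a full year
    for year in years:
        if year % 100 == n:
            return year  # two-digit year such as 71 or 45
    raise ValueError("unknown reform %r for %r" % (value, country))
```

Normalizing everything to a full year before computing categories keeps the later logic from having to care which spelling was used.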

@MedK1 Thanks! Let me see about coding up a bot script. Benwing2 (talk) 04:33, 14 November 2023 (UTC)[reply]

Can you change the table above to reflect the new template calling syntax? Benwing2 (talk) 04:37, 14 November 2023 (UTC)[reply]

Ah, of course.

It's done, @Benwing2! MedK1 (talk) 14:19, 14 November 2023 (UTC)[reply]

Btw @Benwing2, I've made the documentation as well! MedK1 (talk) 02:11, 15 November 2023 (UTC)[reply]

@MedK1 I applied the above changes except to the following, which have extraneous params:

Page 344 proêmio: WARNING: Unrecognized param: from=Brazilian Portuguese form
Page 95 metempsychose: WARNING: Unrecognized param: 2=
Page 53 metempsychose: WARNING: Unrecognized param: 2=
Page 88 abobada: WARNING: Unrecognized param: t=vault, arched ceiling

If you can fix these last three, I'll delete the old templates. Benwing2 (talk) 07:41, 15 November 2023 (UTC)[reply]

@MedK1 In addition to what Benwing has noted, I noticed that sciencia is in Cat:Portuguese forms superseded by AO1990, whose description states it contains terms current from 1971 to 2008, but the entry itself says the term was obsolete by 1943. Can you check what's going on here? This, that and the other (talk) 12:17, 15 November 2023 (UTC)[reply]

The change from 1/2/3 to actual years was affecting the calculations for what categories to apply. I've since fixed that and the pages in @Benwing2's list; every page should be working alright now. MedK1 (talk) 16:08, 15 November 2023 (UTC)[reply]

Maybe improve Template:rfap to add a link to the existing pool of Lingua Libre records?

Let's take a look at the Russian word молот (molot) as an example. Right now it has a pronunciation audio created by @Ivnadur, which is nice. Except that the humming noise in the background is a bit annoying. Do we have any possible replacements for it? Yes! Going to https://commons.wikimedia.org/wiki/Category:Lingua_Libre_pronunciation-rus?from=молот allows us to easily see 4 existing pronunciation records of the same word "молот" from different speakers. I think that it would be useful if the Template:rfap template could generally give the users a hint to check the Lingua Libre records pool with an appropriate link. Currently it looks like this:

This entry needs audio files. If you are a native speaker with a microphone, please record some and upload them. (For audio required quickly, visit WT:APR.)

I suggest to modify it like this:

This entry needs audio files. If you are a native speaker with a microphone, please record some and upload them. But there may be even some existing Lingua Libre records here. (For audio required quickly, visit WT:APR.)

Maybe the other parts of the message could be changed too (is WT:APR still relevant?). And on the technical side, the two-letter language code needs to be substituted with a three-letter code in the link. Ssvb (talk) 11:38, 13 November 2023 (UTC)[reply]

Great idea. I've always been confused by the phrase "For audio required quickly" - who on earth "requires" an audio pronunciation "quickly"? @Ssvb check it out now. This, that and the other (talk) 00:59, 16 November 2023 (UTC)[reply]

I can think of many reasons why one might need an audio recording of a pronunciation quickly, but they're not going to get it here. If you need an audio of a pronunciation right now, you go to YouTube and find someone talking about an issue that includes the word in question. —Mahāgaja · talk 07:17, 16 November 2023 (UTC)[reply]

@Mahagaja: I think that right now the presence of the {{rfap|en}} template in an article already effectively means "this pronunciation is needed more urgently than the others". Because the lists of words for Lingua Libre (such as this one) are constructed automatically regardless of the presence or absence of the rfap template in Wiktionary articles. —Ssvb (talk) 12:34, 16 November 2023 (UTC)[reply]

@This, that and the other Thanks, this looks better. Though I believe that it would be useful to have a link to a more precise location: not to the whole Lingua Libre category, but to the right language and the right word itself. A Lingua Libre bot automatically maintains the list of English words lacking pronunciation audio here. In the latest update of this list, the bot removed the words decoction, lawlessness and maidservant from the list, because some contributors already recorded pronunciation samples for these words. Now let's look at the lawlessness Wiktionary article. If somebody adds a pronunciation section with an rfap template to it, then it would be useful to precisely link to https://commons.wikimedia.org/wiki/Category:Lingua_Libre_pronunciation-eng?from=lawlessness from the template notice banner, rather than sending people just in a general direction. —Ssvb (talk) 12:16, 16 November 2023 (UTC)[reply]
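
To make the proposed link concrete, here is a small Python sketch that builds the URL from a three-letter language code and the page name. The function name is made up; the URL pattern is the one from the two examples in this thread, and the word is percent-encoded with the standard library's `urllib.parse.quote`.

```python
from urllib.parse import quote

BASE = "https://commons.wikimedia.org/wiki/Category:Lingua_Libre_pronunciation-"


def lingua_libre_url(lang3, word):
    """Link to the Lingua Libre category for a language, starting at the given word."""
    return BASE + quote(lang3) + "?from=" + quote(word)


print(lingua_libre_url("eng", "lawlessness"))
print(lingua_libre_url("rus", "молот"))
```

Inside the template itself the same thing would presumably be done with wikitext URL encoding, but the structure of the link is the same.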

@Ssvb The reason I didn't add the language name is I'm not aware that we maintain a list of three-character ISO codes anywhere on this wiki, and I couldn't be bothered to start one... This, that and the other (talk) 22:30, 16 November 2023 (UTC)[reply]

@Ssvb, This, that and the other: I can make a module for this based on List of ISO 639-1 codes on Wikipedia, but I need to know which three-letter codes are used by Lingua Libre, as the Wikipedia article lists two codes for some languages (e.g. 'sqi' and 'alb' for Albanian). [I am guessing that it's the ISO 639-2/T codes (e.g. 'sqi' not 'alb'), because these match ISO 639-3.] Benwing2 (talk) 23:08, 16 November 2023 (UTC)[reply]

@Benwing2 I wasn't able to identify a list of languages in their GitHub organisation, but the list of categories c:Cat:Lingua Libre pronunciation might suffice. It does appear that "sqi" is used. This, that and the other (talk) 23:23, 16 November 2023 (UTC)[reply]

@Ssvb, This, that and the other: I created Module:ISO 639 to do the conversion, and used it to add the language code to the link. Benwing2 (talk) 23:38, 16 November 2023 (UTC)[reply]
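
As a rough picture of what such a conversion involves (this is a Python illustration, not the contents of Module:ISO 639), a lookup table with a pass-through for codes that are already three letters long would look like the following. The table holds only a handful of entries, and the pass-through rule is an assumption.

```python
# Tiny excerpt of a two-letter to three-letter mapping (ISO 639-2/T style,
# so Albanian is "sqi" rather than "alb").
TWO_TO_THREE = {
    "en": "eng",
    "pl": "pol",
    "ru": "rus",
    "sq": "sqi",
}


def to_three_letter(code):
    """Return a three-letter code, or None if the two-letter code is unknown."""
    if len(code) == 3:
        return code  # assumption: three-letter codes are used as they are
    return TWO_TO_THREE.get(code)
```

Returning None for unknown codes lets the caller fall back to linking the top-level Lingua Libre category instead of producing a broken link.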

@This, that and the other, @Benwing2: I have updated Template:rfap to link to the exact word and reworded the message. Now the lawlessness article's rfap banner has a direct link to the two Lingua Libre pronunciation candidate records to choose from. Or more than two if additional pronunciation records show up later. Hopefully the article editors will find this convenient. —Ssvb (talk) 18:43, 17 November 2023 (UTC)[reply]

@Benwing2 It would be better to use Module:wikimedia languages. Theknightwho (talk) 17:14, 18 November 2023 (UTC)[reply]

@Theknightwho How should I use that? It appears to map *from* Wikimedia codes *to* Wiktionary codes, whereas the table I created maps *from* Wiktionary two-letter codes *to* Lingua Libre 3-letter codes (which may or may not be the same as Wikimedia language codes). In general our handling of mappings between Wikimedia and Wiktionary languages is a mess, with data in four (now five, with my module) different places. Benwing2 (talk) 08:01, 19 November 2023 (UTC)[reply]

Should `{{etydate}}` categorize?

As the title says. It might be useful to look up first attestations by time period. As a result we'd probably have to add a langcode parameter to it. Vininn126 (talk) 16:25, 13 November 2023 (UTC)[reply]

I think that'd be nice. Similarly, I feel that having access to citation/quotation dates by time period (by century, perhaps?) would be useful too. MedK1 (talk) 02:57, 14 November 2023 (UTC)[reply]

@MedK1 FYI User:DCDuring asked awhile ago for better versions of {{timeline}} and {{en-timeline}} that didn't just group by century but intelligently chopped up the attested range of years. I agree with this but it's a matter of finding the time to implement it ... Benwing2 (talk) 04:35, 14 November 2023 (UTC)[reply]

@Benwing2 I think making {{etydate}} categorize should be relatively easy: the categorization probably just needs to take the output and generate the text "X language terms attested in OUTPUT". Perhaps the markup would be harder to write since we include parameters such as {{{r}}} and text input. Vininn126 (talk) 10:19, 14 November 2023 (UTC)[reply]

The documentation for this says it has to do with senses.

If so, why does the documentation say that it appears in the etymology section?
Why should sense-specific content appear in the etymology section?
A basic indication for the time period of a sense is now addressed at each definition line by {{defdate}}.
If all or some of the citations for a definition appear on the citations page, can the template include all the sense's citations?
If we are to do this, shouldn't the template be deployed on individual definition lines, hidden by default?

Also, is it not misleading to suggest that our typical array of citations, resulting from RfVs, reflects first usage or the relative frequency of usage of the sense? The major need for this to be useful would be to have many more attested definitions and much more attestation per definition. I doubt that there will be sufficient additional attestation in the foreseeable future to make this more than rarely useful. DCDuring (talk) 14:34, 14 November 2023 (UTC)[reply]

@DCDuring It says that because a while ago I started a thread where we all agreed it should be for etylines and I either forgot to change it or someone changed it back after. Furthermore I have no idea what you're talking about. It's very possible to find first attestations, you're just being rather negative as usual. I have plenty of Polish entries with this, and I have seen interest from many users to include this information. Your complaints are simply that and I'm getting tired of them. Vininn126 (talk) 15:02, 14 November 2023 (UTC)[reply]

@Vininn126 I don't think making it categorize would be hard, but we'd have to add the language code everywhere (to about 7,000 uses). Benwing2 (talk) 22:47, 14 November 2023 (UTC)[reply]

@Benwing2 Sounds like a botjob? Vininn126 (talk) 08:49, 15 November 2023 (UTC)[reply]

@Vininn126 Yes, assuming there is consensus to do this. Benwing2 (talk) 09:25, 15 November 2023 (UTC)[reply]

I was the one who came up with this in the Discord, and thus agree with some method of being able to search for words by attestation date, however that is best implemented. Akaibu1 16:16, 15 November 2023 (UTC)[reply]

@Benwing2 There is mild consensus to do this, and it seems fairly harmless. Can we go ahead with this? Vininn126 (talk) 18:07, 29 November 2023 (UTC)[reply]

@Vininn126 Sure, although I won't be able to get to it right away, as I have several other projects underway. Benwing2 (talk) 07:29, 30 November 2023 (UTC)[reply]

@Benwing2 Would it be possible to take a look now? Vininn126 (talk) 11:59, 16 January 2024 (UTC)[reply]

@Vininn126 I'd need a scheme for how to categorize a given year. Presumably it should not be more granular than a century since {{etydate}} often contains a century rather than a specific year. Benwing2 (talk) 06:13, 17 January 2024 (UTC)[reply]

@Benwing2 Do you think using just the given input text would be too granular? Vininn126 (talk) 08:18, 17 January 2024 (UTC)[reply]

@Vininn126 Yes, absolutely, because (a) the text can be free-form, (b) it is sometimes a year, and we'd have an enormous number of specific-year categories. I think it needs to be something like 0-500 AD, 500-1000 AD, 1000-1200 AD, 1200-1400 AD, 1400-1600 AD, 1600-1700 AD, 1700-1800 AD, 1800-1900 AD, 1900-2000 AD, 2000-2100 AD. It might need to vary depending on the language in question; that isn't too hard to implement with a data module, along with a generic fallback similar to the one I just enumerated. Benwing2 (talk) 09:06, 17 January 2024 (UTC)[reply]
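
A generic fallback along those lines could be sketched like this in Python. This is only an illustration, not module code: the boundaries are the ones enumerated above, the function names and the category wording are invented, and because the listed ranges share their end points the sketch assumes each bucket includes its start year and excludes its end year.

```python
import bisect

# Generic fallback boundaries; a per-language data table could override these.
BOUNDARIES = [0, 500, 1000, 1200, 1400, 1600, 1700, 1800, 1900, 2000, 2100]


def bucket_for_year(year, boundaries=BOUNDARIES):
    """Return the (start, end) bucket containing year, or None if out of range."""
    if not boundaries[0] <= year < boundaries[-1]:
        return None
    i = bisect.bisect_right(boundaries, year) - 1
    return (boundaries[i], boundaries[i + 1])


def category_name(lang_name, year):
    """Hypothetical category title for a first-attestation year."""
    bucket = bucket_for_year(year)
    if bucket is None:
        return None
    return "%s terms attested in %d-%d AD" % (lang_name, bucket[0], bucket[1])


print(category_name("Polish", 1350))
```

Keeping the boundaries in a plain list makes a per-language override a matter of passing a different list.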

@Benwing2 Do you think to the quarter-century makes sense, or perhaps to the half-century? Vininn126 (talk) 09:16, 17 January 2024 (UTC)[reply]

@Vininn126 Maybe for specific languages only. The problem is what to do with {{etydate|pl|14th century}} in that case; assign it to the latest possible date? Benwing2 (talk) 09:24, 17 January 2024 (UTC)[reply]
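
To illustrate the problem with century-only input, here is a small Python sketch that turns the two easy kinds of value into a range of years. The function name and the accepted formats are assumptions; anything free-form is simply returned as None.

```python
import re

CENTURY = re.compile(r"^(\d{1,2})(?:st|nd|rd|th) century$", re.IGNORECASE)


def year_range(text):
    """Return (earliest, latest) year for '1592' or '14th century', else None."""
    text = text.strip()
    if re.fullmatch(r"\d{1,4}", text):
        year = int(text)
        return (year, year)
    match = CENTURY.match(text)
    if match:
        century = int(match.group(1))
        return ((century - 1) * 100 + 1, century * 100)
    return None  # free-form text: cannot be categorized automatically


earliest, latest = year_range("14th century")
print(earliest, latest)
```

Assigning the "latest possible date" then just means using the second element of the range. With quarter-century or half-century categories, a bare century would still span several of them, which is the ambiguity raised above.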

@Benwing2 Perhaps we should have a way to regularly input centuries and half-centuries...? Vininn126 (talk) 09:28, 17 January 2024 (UTC)[reply]

@Vininn126 If you want to do that it sounds like you'll need to put some thought into overhauling the structure of the argument to {{etydate}}. Please do do this if you have time; I am happy to implement a well-thought-out plan but I don't really have time to think through a plan myself, esp. as I'm not the primary user of the template. Benwing2 (talk) 09:57, 17 January 2024 (UTC)[reply]

Adding Luganda terms with noun classes

Newcomer Elizabeth Nakiwu (talk • contribs) has been adding Luganda terms in the sections for translations, but the section for noun classes ought to be visible when Luganda terms are added. -- Apisite (talk) 11:43, 17 November 2023 (UTC)[reply]

@Atitarev, Elizabeth Nakiwu P.S. I mean checkboxes for the noun classes when adding, for example, Swahili terms in the section for translations. --Apisite (talk) 21:27, 17 November 2023 (UTC)[reply]

Romanisation of っ in Template:ja-suru-tsu

{{ja-suru-tsu}} romanises っ as ' even in the middle of the verb forms shown in the conjugation tables it generates, so that 接(せっ)する (sessuru) is romanised incorrectly as *se'suru, and so on. っ is normally romanised as the following consonant word-medially to represent gemination, and ' word-finally to represent a glottal stop, so this might be because the template romanises the verb root (the second parameter in the template) and the verb endings separately. Could this be fixed? Mcph2 (talk) 06:20, 18 November 2023 (UTC)[reply]

@Mcph2 It's because {{ja-verbconj-row}} adds a dot between the stem and the suffix, so your example is being transliterated as せっ.する, which gives "se'suru". Presumably this is so that stems ending in お transliterate correctly (i.e. "ou" instead of "ō"), but the implementation is really crude. The Japanese conjugation templates need a total rewrite, in my opinion. Theknightwho (talk) 16:15, 18 November 2023 (UTC)[reply]
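
The effect of the dot can be reproduced with a toy romaniser. This is a Python sketch with a deliberately tiny kana table. It is not Module:Hrkt-translit, it ignores long vowels entirely, and the `join_stem` helper at the end is just one conceivable way a conjugation template could avoid the problem.

```python
# Minimal kana table, just enough for the example.
KANA = {"せ": "se", "す": "su", "る": "ru"}


def romanize_chunk(chunk):
    """Romanise one dot-delimited chunk; っ doubles the next consonant,
    or becomes an apostrophe when nothing follows it in the chunk."""
    out = []
    for i, ch in enumerate(chunk):
        if ch == "っ":
            if i + 1 < len(chunk):
                out.append(KANA[chunk[i + 1]][0])
            else:
                out.append("'")
        else:
            out.append(KANA[ch])
    return "".join(out)


def romanize(kana):
    """Each chunk between dots is romanised independently."""
    return "".join(romanize_chunk(chunk) for chunk in kana.split("."))


def join_stem(stem, suffix):
    """Hypothetical fix: skip the separator dot when the stem ends in っ."""
    return stem + ("" if stem.endswith("っ") else ".") + suffix


print(romanize("せっ.する"))                 # stem and suffix romanised separately
print(romanize(join_stem("せっ", "する")))   # no dot after っ
```

The first call treats っ as word-final because the dot ends the chunk, which is exactly the se'suru result reported above.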

These things worked previously, and when I rewrote the romanization code, I kept the old code alongside to see what new failures would come up. I guess that care wasn't taken this time. —Fish bowl (talk) 23:09, 22 November 2023 (UTC)[reply]

@Fish bowl: As you correctly added this test case to Module:Hrkt-translit/testcases, せっする (se'suru) should still produce "sessuru", not "se'suru". @Theknightwho: FYI. Anatoli T. (обсудить/вклад) 02:38, 26 November 2023 (UTC)[reply]

@Atitarev @Fish bowl The issue is the dot: せっ.する. I'm not actually convinced that we should change the module or that the testcase is helpful, since this amounts to ignoring the dot in certain situations, and I don't know if this could be relevant for any of the Ryukyuan languages. Seems more sensible to simply update the conjugation template instead. Theknightwho (talk) 02:43, 26 November 2023 (UTC)[reply]

@Theknightwho: Maybe you're right, I am not sure now. A dot in kana is inserted to mark phoneme boundaries, as in {{ja-r|小馬座|こ.うまざ}}, which gives 小馬座(こうまざ) (koumaza) instead of the wrong こうまざ (kōmaza) that you would get without the dot. Anatoli T. (обсудить/вклад) 02:55, 26 November 2023 (UTC)[reply]

@Atitarev Precisely - that's why it's being transliterated as though it were せっ + する. Theknightwho (talk) 02:58, 26 November 2023 (UTC)[reply]

@Theknightwho: Aha, thanks. Then your case, @Fish bowl, may be wrong and the template should be fixed, not the translit module. Do you agree? Anatoli T. (обсудить/вклад) 03:05, 26 November 2023 (UTC)[reply]

In this case I don't have any particular opinion, but I believe we need to look for other missed cases of transliterations changing after the new module code, as I did in the past. —Fish bowl (talk) 03:52, 27 November 2023 (UTC)[reply]

@Fish bowl, @Theknightwho: Yes, enabling tracking seems like a good idea. Anatoli T. (обсудить/вклад) 03:58, 27 November 2023 (UTC)[reply]

Putting entire etymologies inside the gloss or pos parameter

I've seen the kind of thing this edit fixed in a number of other entries myself, I've even seen multi-sentence etymologies presented in the gloss parameter (although not often). Should we scan a database dump for and make a list of all the places where people have stuck e.g. other ety templates like {{bor}} or {{der}} or {{af}}, or multiple other {{m}}s, or the like, inside the gloss or pos= parameters of an {{m}}, {{bor}}, {{af}}, etc? - -sche (discuss) 16:02, 18 November 2023 (UTC)[reply]

Just looking at the pos parameter on pages that use {{m}}, there are probably thousands of instances of something other than part of speech on the right side of "pos=", including grammatical info. The underlying causes would seem to be that 1. folks believe information should be in templates whenever possible, 2. there is no input control on "pos=", and 3. there is no parameter for non-conforming information. Perhaps an input filter for non-conforming material following "pos=" would be useful to prevent one class of erroneous input. Appearance in a gloss seems much harder to cleanup and prevent, probably requiring lots of searches for particular patterns. DCDuring (talk) 16:55, 18 November 2023 (UTC)[reply]
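
A dump scan for the pos= case could start from something like the following Python sketch. It is a rough illustration, not an existing script: the function names are invented, only the templates named in this thread are checked, and it does not handle pipes inside wikilinks or triple-brace parameters. Gloss parameters would need a separate check.

```python
import re

ETY_TEMPLATES = {"m", "bor", "der", "af"}
POS_PARAM = re.compile(r"^pos\d*=")


def iter_templates(text):
    """Yield the inner text of every {{...}} template, innermost first."""
    stack = []
    i = 0
    while i < len(text) - 1:
        pair = text[i:i + 2]
        if pair == "{{":
            stack.append(i + 2)
            i += 2
        elif pair == "}}" and stack:
            yield text[stack.pop():i]
            i += 2
        else:
            i += 1


def split_params(inner):
    """Split on pipes that are not inside a nested template."""
    parts, current, depth, i = [], [], 0, 0
    while i < len(inner):
        pair = inner[i:i + 2]
        if pair == "{{" or pair == "}}":
            depth += 1 if pair == "{{" else -1
            current.append(pair)
            i += 2
        elif inner[i] == "|" and depth == 0:
            parts.append("".join(current))
            current = []
            i += 1
        else:
            current.append(inner[i])
            i += 1
    parts.append("".join(current))
    return parts


def suspicious_pos(text):
    """Return pos=/posN= parameters that themselves contain a template."""
    hits = []
    for inner in iter_templates(text):
        params = split_params(inner)
        if params[0].strip() not in ETY_TEMPLATES:
            continue
        for param in params[1:]:
            if POS_PARAM.match(param.strip()) and "{{" in param:
                hits.append(param.strip())
    return hits
```

Run over each page of a dump, the non-empty results would give a first cleanup list; whether every hit actually needs fixing is a separate question.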

It would probably be better to just replace pos with ng (for "non-gloss"). Theknightwho (talk) 17:12, 18 November 2023 (UTC)[reply]

Don't links ever use the pos parameter to generate section links? That would work for at least the first Etymology section of the first L2. Is there no other use for it? DCDuring (talk) 00:11, 19 November 2023 (UTC)[reply]

@Theknightwho I am inclined to agree that we should consider replacing pos with ng, since the pos parameter is frequently used for arbitrary non-gloss information that people don't want to appear inside of quotes. User:-sche I don't think the pos param is used to generate section links; that's what id is for. pos is ideally for clarifying what the POS is in case of ambiguity but in reality as pointed out above is for arbitrary non-gloss, non-quoted info. Benwing2 (talk) 08:18, 19 November 2023 (UTC)[reply]

To me, changing pos= to ng= seems at best orthogonal to the issue (and at worst undesirable, if it promotes things like what Al-Muqanna cleaned up); regardless of what we call the parameter, I don't think the whole rest of the etymology belongs as a parameter of the first {{bor}}, does it? It would be tedious to go back and find examples, but I've seen long strings of "from language X term {{m|foo|Y}}, from {{der|language A||term B}}, from language C term {{af|foo|D|E}}" inserted as the gloss or as the pos= of a {{der}} or similar template at the start or even in the middle of an etymology, and I have thought this was substandard; in the past, I've silently fixed such entries to be formatted more like 'normal' entries, just like Al-Muqanna's edit that I linked; I raised the topic here thinking it would lead to a cleanup list. If people instead think such formatting is fine, I am surprised but will stop 'fixing' it, then! :o - -sche (discuss) 08:49, 19 November 2023 (UTC)[reply]

I myself often put brief etymologies inside |pos1= in {{af}} to show the etymology of the first morpheme. For example, I might write the etymology of tragelaphic as "From {{af|en|tragelaphus|-ic|pos1=from {{der|en|grc|τραγέλαφος}}}}" to yield tragelaphus (from Ancient Greek τραγέλαφος (tragélaphos)) +‎ -ic. That's easier than typing "From {{m|en|tragelaphus}} (from {{der|en|grc|τραγέλαφος}}) + {{af|en|-ic}}". If the etymology is of a non-English word, it also yields a tidier result since the gloss and interior etymology are inside the same set of parentheses rather than two different sets. —Mahāgaja · talk 09:54, 19 November 2023 (UTC)[reply]

I think this could be helpful for more convoluted etymologies that jump back and forth between different morphemes but in the Latin tragelaphus case I cleaned up the parentheses don't serve much purpose when the etymology is a linear chain, irrespective of any technical considerations. I also use the pos parameter for glosses occasionally and I agree with Benwing that "ng" would make sense. —Al-Muqanna المقنع (talk) 09:59, 19 November 2023 (UTC)[reply]

Using "ng" as a substitute for "pos" would seem a little confusing as {{ng}} involves italicizing and this proposed use does not. Maybe it doesn't matter if we have separate parameter dialects and folks can context-switch easily, but it seems bad for newer and occasional users. Keeping "pos" seems wrong, as is the documentation for the parameter's use in the various templates. Any other suggestions? How about "misc" or "other" or "oth"? I'm told that this kind of mass change can be readily done automagically. DCDuring (talk) 17:48, 19 November 2023 (UTC)[reply]

@-sche No, I agree with you that etymologies (and in general nested templates) should not be stuffed inside of |pos=. I've used it for things like "all meanings" that aren't parts of speech but aren't glosses either. I've also seen it used for random notes like "with nasalization". Benwing2 (talk) 23:17, 19 November 2023 (UTC)[reply]

How does `{{t+|fa-ira|...}}` figure out what page to link to?

I'm trying to figure out how {{t+|fa-ira|کارْبَر}} knows to link to [[کاربر]] and [[fa:کاربر]] rather than to [[کارْبَر]] and [[fa:کارْبَر]], so that I can take that into account in my bot that updates between {{t}} and {{t+}}. (Currently my bot selects {{t}} in that case,^[diff] because it thinks the relevant page is [[fa:کارْبَر]], which doesn't exist; but it should select {{t+}}, because the relevant page is actually [[fa:کاربر]], which does exist.)

One thing that seemed potentially relevant is that fa-ira is mapped to fa.wikt in the interwiki_langs table at [[Module:translations/data]]; but looking through [[Module:translations]]'s source, I'm confident that that table is only used in deciding which language's Wiktionary to link to, not which page to link to within that Wiktionary.

Another thing that seems potentially relevant is that fa-ira is also defined as an etymology language, in [[Module:etymology_languages/data]]; and indeed, [[Module:translations]] explicitly supports etymology-language codes (by specifying "allow etym" in its call to require("Module:languages").getByCode(...)); but the settings in [[Module:etymology_languages/data]] don't include information about diacritic removal, and although they do indicate that the fa-ira belongs to the fa family, I don't see code that would cause family-level diacritic-removal information to be used.

I'm sure if I spent a bunch more time I could eventually figure it out, but I'm hoping that someone just already knows how this works, and can tell me? :-)

Thanks in advance!
—Ruakh_TALK
20:07, 18 November 2023 (UTC)[reply]

@Ruakh Unfortunately this code is super messy. But 'fa' is not a family; it's a full language, and full languages have diacritic removal info associated with them, which etymology languages fall back to if necessary. I am assuming that makeEntryLink() (which does diacritic stripping) is called somewhere by the code in Module:translations, and removes the diacritics appropriately. If I have a chance I'll look into this in more detail. Benwing2 (talk) 08:05, 19 November 2023 (UTC)[reply]

@Ruakh Module:translations is calling language_link() in Module:links to generate the actual link, and that (through a few calls in a call chain) calls makeEntryLink(). Benwing2 (talk) 08:10, 19 November 2023 (UTC)[reply]

Actually I think it's happening through the call to getLinkPage() in Module:links, which directly calls makeEntryLink(). Benwing2 (talk) 08:12, 19 November 2023 (UTC)[reply]

One final thing; the fallback logic from etymology to full languages happens due to an inheritance-like mechanism implemented a few months ago by User:Theknightwho; if you have questions about this in particular, you might ask them. Benwing2 (talk) 08:14, 19 November 2023 (UTC)[reply]

Thank you so much! I'd actually found the various things that you mentioned, but your comments were nonetheless very helpful, both in reassuring me about what I already thought I understood, and in helping me focus on the right things to get over the hump. :-)

The specific big piece that I was missing, that I've found now, is that all of the data in the source code of [[Module:etymology_languages/data]] gets transformed (on the last two lines) before being returned; so the reason I couldn't find where data[3] (family code, in this case fa) was being used for entry-name substitutions is that the 'finalizeEtymologyData' function does a switcheroo where data[3] is moved to data[5] (parent code). (I guess you tried to tell me this when you said that 'fa' is not a family but rather a full language, but I thought you were just stating a fact I already knew — yes, obviously 'fa' is actually a full language, but [[Module:etymology_languages/data]] nonetheless seemed to use it as a family code — so I didn't grok what you were really trying to say.)

Armed with that fact, I now see how it fits into the inheritance mechanism, and I think I can implement it in my bot now.

Thanks again!

—Ruakh_TALK 07:53, 21 November 2023 (UTC)[reply]
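For anyone else writing a bot that has to mirror this, the fallback described above can be sketched roughly as follows. This is only a minimal illustration in Python, not the actual Lua in Module:languages: the table layout, the key names `parent` and `remove_diacritics`, and the list of stripped characters are all assumptions made for the example.

```python
# Hypothetical data: NOT the real contents of Module:languages/data or
# Module:etymology_languages/data. Only the shape of the lookup matters here.
LANGUAGES = {
    # Full language: carries its own diacritic-removal information.
    "fa": {
        "parent": None,
        # Arabic-script harakat U+064B..U+0652 (assumed list, for illustration).
        "remove_diacritics": "\u064B\u064C\u064D\u064E\u064F\u0650\u0651\u0652",
    },
    # Etymology language: no removal info of its own, only a parent code.
    "fa-ira": {"parent": "fa"},
}


def find_removal_chars(code):
    """Walk up the parent chain until some language defines removal info."""
    while code is not None:
        data = LANGUAGES[code]
        if "remove_diacritics" in data:
            return data["remove_diacritics"]
        code = data.get("parent")
    return ""  # nothing defined anywhere in the chain: strip nothing


def entry_name(code, text):
    """Return the page title to link to for `text` in language `code`."""
    chars = find_removal_chars(code)
    return "".join(ch for ch in text if ch not in chars)


# The vocalised spelling from the question, written with escapes so that the
# sukun (U+0652) and fatha (U+064E) are visible in the source.
vocalised = "\u06A9\u0627\u0631\u0652\u0628\u064E\u0631"
print(entry_name("fa-ira", vocalised))
```

The point of the sketch is only that the etymology-language code never needs its own removal table: the lookup keeps following the parent code until it reaches a full language that has one.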

Success! user?diff=76743752 —Ruakh_TALK 08:53, 21 November 2023 (UTC)[reply]

@Ruakh Great, thank you! Benwing2 (talk) 09:02, 21 November 2023 (UTC)[reply]

What happened to the translation section of cup?

The translation section looks glitchy in this page. Screenshot. 64.224.132.49 09:36, 19 November 2023 (UTC)[reply]

Fixed — SURJECTION ^{/ T / C / L /} 09:45, 19 November 2023 (UTC)[reply]

The onomatopoeia template

Sorry to bother, but could someone please help me with this? It's been over two months. I'm not so lazy that I choose to simply sit back and watch everyone else do my work for me ... I just don't trust my coding skills to handle even such a simple task as this. Please help, as this is affecting hundreds of pages, and the fix surely can't be that difficult. Thanks, —Soap— 16:42, 19 November 2023 (UTC)[reply]

And also apparently the templates are protected, so whoever wants to help would need to be an admin or a template-editor. —Soap— 16:45, 19 November 2023 (UTC)[reply]

@Soap I think what you're proposing is to have a list of terms that are auto-linked to the glossary. Can you make a list of such terms? Benwing2 (talk) 00:01, 20 November 2023 (UTC)[reply]

I would say that Appendix:Glossary#imitative and Appendix:Glossary#onomatopoeic are pretty much the same thing, but that there may still be good reason to preserve them both as separate entries in the glossary, and therefore I would prefer the template to be able to link to both of them as well. We also list Appendix:Glossary#sound_symbolism and Appendix:Glossary#ideophone as separate entries, which I think are slightly different, and I can't remember coming across an instance of the {{onom}} template that was linked to either of those. So my request would be for this:

Please enable the linking of the onom template to Appendix:Glossary#onomatopoeic when no title= parameter is entered, or when the title= parameter begins with the text onom. (Though this may be trivial because of my last line; see below.)

Please enable the linking of the onom template to Appendix:Glossary#imitative when the title= parameter begins with the text imit. This is because it is sometimes more convenient to write "imitation", as on plas#Dutch, than to always write imitative.

If people feel that ideophones and sound symbolism should also be covered by this template, and thus also explicitly linked to the Glossary, I'd support that as well, but I haven't yet come across any examples of etymologies written this way.

Ideally, all other parameters fed into the {{onom}} template should link to Appendix:Glossary#onomatopoeic, to cover specific languages whose linguistic tradition might prefer to use a word such as expressive or echoic for such words. I know I've at least seen "of echoic origin" here and there.

Please let me know if there are other questions I need to answer. Thanks, —Soap— 07:55, 20 November 2023 (UTC)[reply]
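The begins-with logic requested above could look something like the sketch below. It is written in Python purely for illustration (the real template would need wikitext or Lua), and the function name `glossary_anchor` is made up for the example; only the two glossary anchors come from the request itself.

```python
def glossary_anchor(title=None):
    """Pick the glossary target for the displayed word of the onom template.

    Hypothetical helper: mirrors the rules in the request, not any
    existing template or module code.
    """
    if title is None or title.strip() == "":
        # No title= given: the default display word links to "onomatopoeic".
        return "Appendix:Glossary#onomatopoeic"
    word = title.strip().lower()
    if word.startswith("imit"):
        # Covers "imitative", "imitation", "Imitative", etc.
        return "Appendix:Glossary#imitative"
    # "onom..." and every other word (echoic, expressive, ...) fall back here.
    return "Appendix:Glossary#onomatopoeic"


for t in (None, "imitation", "Imitative", "onomatopoeia", "echoic"):
    print(repr(t), "->", glossary_anchor(t))
```

Because every non-"imit" input ends up at the same anchor, the separate "begins with onom" rule becomes redundant, which matches the remark that it may be trivial.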

Just posting here again before this gets scrolled off the page. —Soap— 15:13, 21 December 2023 (UTC)[reply]

@Soap I will take a look at this soon. I've been reluctant to take action because what you're asking for is very hacky but maybe there's a clean way to go about it. Benwing2 (talk) 09:25, 17 January 2024 (UTC)[reply]

Okay, thank you. I honestly thought this was one of the simplest of all requests, but it's good to know the reason why it has so far gone unanswered. Should it be not possible to use the begins-with logic, we can just make invisible anchors on the glossary page so that all reasonable variations of the word imitative will point to Appendix:Glossary#imitative , and likewise for the other terms. Then, any other input would point to the onomatopoeia glossary entry as a fallback. This assumes that it is at least possible to make the template point to the title= word itself. If this is not possible either, I'm not sure what the reason would be, but anything would be better than the current state of the template, where it actually de-links the title= word, making it worse than nothing. Thanks for your hard work, —Soap— 10:51, 17 January 2024 (UTC)[reply]

@Soap, Benwing2

Done Ioaxxere (talk) 16:47, 5 May 2024 (UTC)[reply]

Thank you so much. Checking some words, it seems that you've made it work with at least imitation, imitative, and with no parameter, which serves the purpose for onomatopoeic. I'm sorry to ask for more, but is it possible to also have a fall-back behavior for when we want a different word to be displayed, so that it will still link to something? Probably the best fall-back is onomatopoeia. I know the term expressive is used in Southeast Asian linguistics, ideophone in Japanese linguistics, and I've seen echoic as well. There's also the concept of sound symbolism, which I've seen used to cover concepts like this, but which I didn't mention since the anchor we have for it now is much narrower in scope and probably shouldn't be changed. Thanks, —Soap— 05:11, 6 May 2024 (UTC)[reply]

Inline for Template:th-usex and Template:km-usex

Can we please have an inline version or parameter for Template:th-usex and Template:km-usex?

I can imagine adding Thai and Khmer transliterations as in this template is complex and far away but adding inline capability shouldn't be that hard. Anatoli T. ^{(обсудить}/^вклад) 08:54, 20 November 2023 (UTC)[reply]

Thank you for addressing this, @Benwing2!

Hi @Octahedron80, @Alifshinobi: notifying you of the change. We have too many multiline very short usage examples. We can convert them over time.

@Benwing2: Would it be hard to allow {{th-l}} to work on the same principle as {{th-usex}}, so that it allows multiword terms separated by spaces and using the same tricks for re-spellings? The only difference from {{th-usex}}, if it makes it easier, is that no translation would be required - just display (with no spaces), link and transliterate (two spaces mean a real space).

I'd like to have {{km-l}} as well, even if it means making another language specific template. These languages work differently in terms of transliterations. Please add it to your to-do list, if you agree. Anatoli T. ^{(обсудить}/^вклад) 22:32, 21 November 2023 (UTC)[reply]

BTW, these may be a temporary solution to transliteration difficulties for Thai and Khmer, until a method similar to the current Mandarin/Cantonese, etc. is found. Anatoli T. ^{(обсудить}/^вклад) 22:43, 21 November 2023 (UTC)[reply]

@Atitarev Maybe we should just put the scraping translit functionality of {{th-usex}} and {{km-usex}} in their transliteration modules, so we can avoid the need for language-specific copies of general templates? It seems to me that if it's possible in {{th-usex}} it should be possible generally. User:Theknightwho any comments/thoughts? Benwing2 (talk) 23:33, 21 November 2023 (UTC)[reply]

@Benwing2 @Atitarev Agreed - that’s what Module:zh-translit does. If the scraping needs to be used for other purposes, it might make sense for it to remain in its own dedicated module which is called by the transliteration module (and any others that need it), but transliteration, sortkeys etc should always be a black box from the perspective of any modules which use them, such as Module:links. The special handling we have at the moment was always a terrible design choice. Theknightwho (talk) 23:40, 21 November 2023 (UTC)[reply]

Thanks. Whatever you do, pls consider if it's possible to respell individual words, rather than the whole sentence. Anatoli T. ^{(обсудить}/^вклад) 23:45, 21 November 2023 (UTC)[reply]

@Benwing2, @Theknightwho: Guys, please make this happen. You can make a module with test cases. I am happy to provide test cases. In corner cases we can ask native speakers but at the moment, it's just a technical solution, very similar to the Chinese lects, almost no language knowledge is required. Anatoli T. ^{(обсудить}/^вклад) 05:59, 22 November 2023 (UTC)[reply]

@Atitarev Can you give me some requirements and test cases? Do you e.g. want it to support the {...} notation that is currently supported for {{th-usex}}? Any other requirements? Benwing2 (talk) 06:55, 22 November 2023 (UTC)[reply]

@Benwing2: Hi, they are the same as in Wiktionary:Grease pit/2023/April or other discussions.

Input words separated by a space or [[ ]]. {{th-x}} or {{th-xi}} (also {{km-x}}, {{zh-x}}) require spaces.
1. เขา เป็น เพื่อน ของ ผม ― kǎo bpen pʉ̂ʉan kɔ̌ɔng pǒm ― He is my friend, compare with the Chinese: 他是我的朋友 ― tā shì wǒ de péngyou ― He is my friend
Since this is based on the scraper, only DEFINED terms with {{th-pron}} will work. Monosyllabic undefined terms also work, but only those without a consonant cluster, even if it's only in the spelling (false cluster). เป็น (bpen) does not contain a cluster: "bp" is one consonant, for the letter ป (bpɔɔ). กว่า (gwàa) has a "true" cluster and สร้าง (sâang) has a "false cluster". The latter is spelled with a cluster "sr" but pronounced with an initial "s". (Not sure if details are important).
Undefined terms (no entry or no {{th-pron}}) or terms with multiple readings need a respelling {} or |subst= or |p= (as in {{th-l}}). My preference, for entries where the pronunciation is |- or comma-separated, is to read the first occurrence by default, or respell if it's different. The onus is on the editor to provide a respelling.
Respelling should be in Thai/Khmer way, including the conventional symbols: - (hyphens) or ' (apostrophes). If I were to respell the term อักษร ควบ แท้ ― àk-sɔ̌ɔn kûuap tɛ́ɛ ― 'true' consonant cluster as one undefined word, I would use อักษรควบแท้ (àk-sɔ̌ɔn-kûuap-tɛ́ɛ) (using {{th-l}} with |p=อัก-สอน-ควบ-แท้, a phonetic Thai respelling).

I am flexible in terms of spaces vs square brackets. I think transliterators should expand to add substitutes for any language. E.g. 爸爸媽媽的房間／爸爸妈妈的房间 ― bàbà māmā de fángjiān ― parents' room works but 爸爸媽媽的房間／爸爸妈妈的房间 fails because the module is confused about the pronunciation of 的. I'd like to be able to say that 的=de, as it is in 95% of cases. I would tweak the transliteration to make it more natural: 爸爸媽媽的房間／爸爸妈妈的房间 ― bàba māma de fángjiān ― parents' room (note how I made the neutral tones with 爸{ba} and 媽{ma} in the wikicode). Same with Thai. E.g. เพลา (pee-laa) vs เพลา (plao) - two Thai homographs with different readings. Anatoli T. ^{(обсудить}/^вклад) 11:39, 22 November 2023 (UTC)[reply]
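To make the requirements above concrete, here is a rough sketch of how the space-separated input with trailing {...} respellings might be tokenised before any transliteration is looked up. It is Python rather than Lua and is only an illustration under stated assumptions: the function name `parse_respelled` is invented, a single space is taken as a word boundary that is not displayed, two spaces as a real visible space, and {...} as a respelling attached directly to the preceding word. The scraping of {{th-pron}} itself is not modelled.

```python
import re

# One token: a word, optionally followed (without a space) by {respelling}.
TOKEN = re.compile(r"(?P<word>[^{}\s]+)(?:\{(?P<resp>[^{}]*)\})?")


def parse_respelled(text):
    """Split input into display text and (word, respelling-or-None) pairs.

    Hypothetical helper, not part of any existing module.
    """
    display_chunks = []
    words = []
    for chunk in text.split("  "):      # two spaces = one visible space
        shown = []
        for token in chunk.split(" "):  # one space = invisible word boundary
            if not token:
                continue
            m = TOKEN.fullmatch(token)
            if m is None:
                raise ValueError("malformed token: %r" % token)
            shown.append(m.group("word"))
            words.append((m.group("word"), m.group("resp")))
        display_chunks.append("".join(shown))
    return " ".join(display_chunks), words


# Usage with the phrase discussed above; the per-word respelling of the first
# word is taken from the |p= value quoted in the comment.
display, words = parse_respelled("อักษร{อัก-สอน} ควบ แท้")
print(display)
for word, resp in words:
    print(word, "->", resp if resp is not None else "(look up entry)")
```

A word with no respelling would then be looked up in its entry, and a word with one would be transliterated from the respelling instead, which is where homographs like the two readings of เพลา would be told apart.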

@Benwing2: There was some comment re brackets not working when they are at the end or something. It's the best tool we have, so we have to use what's available. Anatoli T. ^{(обсудить}/^вклад) 23:43, 21 November 2023 (UTC)[reply]

@Theknightwho: If I were to try to support the {...} notation in a translit module, am I going to run into problems with your "chop-it-up-and-pass-parts-through-the-translit-module" approach? Benwing2 (talk) 08:53, 22 November 2023 (UTC)[reply]

If so can we disable the chop-it-up functionality at a per-language level? Benwing2 (talk) 08:54, 22 November 2023 (UTC)[reply]

@Benwing2, @Theknightwho: Or use |subst= or |p= for each word in need of transliteration? Whatever works, really. :) (|p= is a native language method, takes Thai/Khmer respelling for undefined terms or words with multiple readings). Anatoli T. ^{(обсудить}/^вклад) 11:47, 22 November 2023 (UTC)[reply]

@Benwing2 There's an exceptions list in Module:languages/data called export.contiguous_substitution. Now that we've got some buffer room with memory, I'll see what I can do about implementing something more robust, since this is a major flaw with the current implementation. Theknightwho (talk) 20:24, 22 November 2023 (UTC)[reply]

@Theknightwho Just make sure it stays efficient, otherwise we'll end up rapidly using up the buffer. Benwing2 (talk) 00:58, 23 November 2023 (UTC)[reply]

@Benwing2, Theknightwho Case in point: a is currently up to 77.48 MB- that's half of the added memory right there. Chuck Entz (talk) 01:21, 23 November 2023 (UTC)[reply]

@Chuck Entz True, but that's not down to anything specific - it's simply the reason why the page was such a nightmare to deal with until recently. Theknightwho (talk) 01:36, 23 November 2023 (UTC)[reply]

@Theknightwho: of course it's an outlier- an extreme one- but it shows that we can't just assume that the memory monster has been defeated once and for all and we can do whatever we want happily ever after. Chuck Entz (talk) 02:01, 23 November 2023 (UTC)[reply]

@Chuck Entz @Benwing2 True. One of the major issues with Lua 5.1's memory use is that it fluctuates unpredictably within a margin of about 2-3MB, which is one of the reasons why it was proving such a problem before, since pages close to the limit would start throwing errors over changes that should have been insignificant. So long as we don't have any pages pushing close to the limit, things should be okay. Theknightwho (talk) 02:18, 23 November 2023 (UTC)[reply]

@Atitarev I think the use of {...} in general is clearer than using subst= so if we can implement it everywhere it would be good. An alternative would be something like {.../...} or {...//...} inserted inline where substitutions are needed. Benwing2 (talk) 01:00, 23 November 2023 (UTC)[reply]

@Benwing2: Thanks, it's very good, if it can be done. {...} notation is used already, which follows without a space a word or character (Chinese) to be substituted. @Theknightwho: please consider that enhancement for Chinese transliteration, if it's doable(?), so that e.g. {{t}} can take the notation. For example, character 的 is especially problematic for Mandarin transliteration because of multiple readings.

Question: will the multiword transliteration for Thai take spaces like the current {{th-x}} or {{zh-x}} or square brackets like the current transliteration for Chinese lects? That is {{m|th|อักษร ควบ แท้}} (spaces) or {{m|th|อักษรควบแท้}} (square brackets)? Anatoli T. ^{(обсудить}/^вклад) 01:14, 23 November 2023 (UTC)[reply]

@Atitarev: Personally, spaces seem like the way to go because they reduce typing (one space character per word vs. four bracket characters), but User:Theknightwho maybe there's some technical reason for using brackets? BTW my plan for the {.../...} notation is to have it surround the word needing respelling rather than follow; this uses almost the same number of characters but avoids any issues in figuring out where the beginning of the word needing respelling is. Benwing2 (talk) 04:48, 23 November 2023 (UTC)[reply]

Another advantage of spaces (or hyphens or something else besides brackets) is that they allow brackets to be used for their intended use. In particular, I can see cases where a translation table for a given English phrase might have an undefined Thai phrase in it, and depending on whether the phrase is SOP, we either do or don't want to link the individual words with double brackets, but we want correct translit regardless. Using double brackets to indicate where the translit boundaries occur makes this impossible. Benwing2 (talk) 04:57, 23 November 2023 (UTC)[reply]

@Atitarev One thing that would help a lot is if you could give me a bunch of testcase examples (e.g. 25-50 would be a good start, as varied as possible) using your proposed wikicode syntax for various standard templates, e.g. {{l}}, {{t}}, {{ux}}, along with the desired term to link to and the desired translit. This will both help iron out the syntax and serve as test cases when I start the implementation. Benwing2 (talk) 05:00, 23 November 2023 (UTC)[reply]

You can start with either Thai or Khmer, whichever one seems easier or more useful. Benwing2 (talk) 05:01, 23 November 2023 (UTC)[reply]

@Benwing2, @Theknightwho:

I've made these small Thai test cases in User:Atitarev/Thai translit test cases.

Please let me know if they make sense. I used {{m}} to show what is wanted to work and demonstrated {{th-xi}} what is already working.

I used spaces, double-spaces (for actual visible single spaces in the Thai text). Thai templates produce · instead of a space. {} are used for respellings.

(I am more familiar with Thai than Khmer, which is still basic. I have been working more with Khmer only because the current transliterations are in a bigger mess.

Modern loanwords are notoriously difficult to respell but when they are already defined, we lose a test case. I normally ask native speakers or search my limited resources for actual respellings.) Anatoli T. ^{(обсудить}/^вклад) 05:59, 23 November 2023 (UTC)[reply]

@Benwing2 I'm not familiar enough with the module to say, sorry. Theknightwho (talk) 05:03, 23 November 2023 (UTC)[reply]

@Benwing2, @Theknightwho: Guys, I did what Benwing2 asked me to get you started. Please let me know if anything is not clear, badly formatted or missing and if you're planning or have any interest to work on the enhancement, one of you or both? I can imagine it's not easy but in my understanding the method is going to be similar to that one used on Chinese topolects. Anatoli T. ^{(обсудить}/^вклад) 03:03, 26 November 2023 (UTC)[reply]

@Atitarev Thank you. I have been looking into what needs to be done but it's a significant amount of work so it's going to take a bit of time. In particular I am still figuring out what User:Theknightwho did to Module:languages and whether I need to change anything in that module. Benwing2 (talk) 03:30, 26 November 2023 (UTC)[reply]

@Benwing2: Thanks. I've added a second batch in User:Atitarev/Thai_translit_test_cases#Batch_2. It's probably better this way (compared to batch 1). Anatoli T. ^{(обсудить}/^вклад) 03:36, 26 November 2023 (UTC)[reply]

R:ru:BTS

The website featured by the reference template {{R:ru:BTS}} has been recently revamped. -- Apisite (talk) 22:20, 21 November 2023 (UTC)[reply]

About numbered lists

Is there a method to transform a several-levelled numbered list into one-levelled list with sub-numbering? From this:

1. ma
2. me
3. mi
    1. mia
    2. mio
    3. miu
4. mo
    1. moa
5. mu

into this:

1. ma

2. me

3. mi

3.1. mia

3.2. mio

3.3. miu

4. mo

4.1. moa

5. mu

I guess there can be something to do with CSS (or HTML) but I don't know so much. Why and what for?

I don't want to use it in Wiktionary itself for everyone, but only for my way of visualising the text. (Isn't that what Custom CSS in Preferences is for? To change only the way I see it when logged in?)
It is not to be used in actual wikt pages but in the Sandbox.
I will copy the parsed wikicode into a narrow column width of my text editor (you can see in the resulting file how easily indentation of pages 40 and 50 can become like in pages 57, 73, 82 and even 23, 46, 61, 94ss), so the fourth level indentation is a mess. And as I'm gonna copy a lot (a very lot) of lists, working one-by-one in the editor is ruled out.

※Sobreira ◣◥ 〒 @「parlez」 09:20, 22 November 2023 (UTC)[reply]
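Since the end goal is text pasted into an editor rather than a rendered page, one alternative to CSS is to renumber the list source before copying it. The sketch below is Python and only an illustration: it assumes the list is available as wikitext lines using # markers (one # per nesting level), and the function name `flatten_numbered` is made up for the example.

```python
def flatten_numbered(lines):
    """Turn '#'-style nested list lines into flat lines with dotted numbers.

    '# ma' -> '1. ma', '## mia' under the third item -> '3.1. mia'.
    Lines that do not start with '#' are passed through and reset numbering.
    """
    counters = []  # counters[i] is the current number at nesting depth i+1
    out = []
    for line in lines:
        text = line.lstrip("#")
        depth = len(line) - len(text)
        if depth == 0:
            out.append(line)
            counters = []
            continue
        del counters[depth:]           # leaving deeper levels forgets them
        while len(counters) < depth:   # entering a deeper level starts at 0
            counters.append(0)
        counters[depth - 1] += 1
        out.append(".".join(str(n) for n in counters) + ". " + text.strip())
    return out


source = ["# ma", "# me", "# mi", "## mia", "## mio", "## miu",
          "# mo", "## moa", "# mu"]
print("\n".join(flatten_numbered(source)))
```

Every output line starts at column zero, so the fourth-level indentation problem does not arise in the editor at all. A list that skips a level (for example # followed directly by ###) would show a 0 in the skipped position; the sketch does not try to repair that.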

@Sobreira this is apparently possible with some rather advanced CSS: https://developer.mozilla.org/en-US/docs/Web/CSS/CSS_counter_styles/Using_CSS_counters#example_of_a_nested_counter. Are you a CSS novice or is this enough for you to work from? This, that and the other (talk) 12:11, 22 November 2023 (UTC)[reply]

@This, that and the other: Sorry, forgot to mention that I already asked in the en:wikipedia:Wikipedia:Help_desk (they recommended the same page) and I thought I would just copy the code you suggest:

ol {counter-reset: section; list-style-type: none;}
li::before {counter-increment: section; content: counters(section, ".") " ";}
ul {margin-left: 0; padding-left: 0;}

into User:Sobreira/monobook.css but they say no and sent me here, so I guess a novice, a noob, a noovice. ※Sobreira ◣◥ 〒 @「parlez」 14:37, 22 November 2023 (UTC)[reply]

What about User:Sobreira/common.css? common.css is where I keep my personalized formatting stuff. —Mahāgaja · talk 14:49, 22 November 2023 (UTC)[reply]

Thanks all, I got some kind of a solution from [2], because applying li::before added numbers to all lists, so I restricted to ordered lists by ol li::before. I found however that nested OL levels continue the numbering of higher levels. I may fix it erasing the top-bottom/colN-bottom/hcolN-bottom templates but I don't know. Another solution would be no margin-left/padding-left for second and subsequent levels, but I couldn't find how. ※Sobreira ◣◥ 〒 @「parlez」 21:18, 22 November 2023 (UTC)[reply]

@Sobreira try:

.mw-parser-output ol ol {
    margin-left: 0;
}

The .mw-parser-output is needed to make sure the rule has sufficient specificity and isn't overridden by the built-in MediaWiki margin rule. If you want all ordered lists to have no spacing at the left, use .mw-parser-output ol instead. This, that and the other (talk) 23:00, 22 November 2023 (UTC)[reply]

@This, that and the other: Thanks, ☺ 🎊 With that (and when I saw that I couldn't copy-paste the numbering or bulleting into the text editor, I don't know why not this time), I had the (basic) idea of set all indentings to 0 in the editor and it worked great (and I discovered errors on my parsing). And it works great in common.css too, I may leave it as is. Wouldn't you know how to fix the numbering problem? In User:Sobreira/Irish, the last AB Norwegian Bkm number (5) is followed by the first ACCIPITER Aragonese (5.1), but the last ACCIPITER 5.12.1 Spanish is not followed, but restarted with the first AEGER (1) English aeger. Also if two Descendant templates are followed, the numbering restarts the second level but not the first: the first QUARTA in 4.1, and the first QUINTA French antérieur is 5.1 again. I think it may have to do in HTML with < h2> closing and opening between two consecutive < /h4>...< h4> not considered, or not being nested, or with no < h3> altogether in between, or with that the second OL list doesn't restart numbering after a heading (maybe an h2+h4?). ※Sobreira ◣◥ 〒 @「parlez」 09:56, 23 November 2023 (UTC)[reply]

Crux critica †

Something for the long-term lists of obscure and merely aesthetic sorrows. We would have to figure where to place it parametrically, without bidi problems and transcriptional repetition as on مورسرج. Perhaps a special character like asterisks in front of terms linking to the reconstruction namespace. Fay Freak (talk) 19:11, 23 November 2023 (UTC)[reply]

Template:com not taking "fa-cls", "fa-ira" and "prs" language codes

{{com}} is not taking "fa-cls", "fa-ira" and "prs" - the new language codes. I think they should but please correct me if I am wrong. @Sameerhameedy, @Benwing2. Anatoli T. ^{(обсудить}/^вклад) 00:53, 24 November 2023 (UTC)[reply]

Thanks, @Benwing2. It's now fixed. Anatoli T. ^{(обсудить}/^вклад) 01:08, 24 November 2023 (UTC)[reply]

@Atitarev if prs is to be used, it looks like WT:LT needs updating. This, that and the other (talk) 03:37, 24 November 2023 (UTC)[reply]

@This, that and the other: Probably not. "fa-cls", "fa-ira" and "prs" are still used under the same L2 header ==Persian== and nested into Persian/*** in translations. It just makes it easier to use the specific codes in various situations, since the varieties use somewhat different vocalisations and transliterations. For example at (šebh-e jazire) I used classical Persian in the etymology. Anatoli T. ^{(обсудить}/^вклад) 03:44, 24 November 2023 (UTC)[reply]

codespill bug in yi-noun

I think there is a bug in {{yi-noun}}. On pages like חומוס#Yiddish, there is a spurious and nonfunctional [[Category:|חומוס]] immediately after the noun gender, not even separated by a space. Perhaps it's missing a word, or perhaps that's not meant to be there at all. If I add a transliteration, the codespill disappears, even if the transliteration is wrong. But it seems that there is no need to add a transliteration to some words because the template is able to do that on its own. So presumably something needs to be fixed to allow the yi-noun template (and maybe some related templates) to work properly even if no transliteration is given. Thanks, —Soap— 11:04, 24 November 2023 (UTC)[reply]

I note, though, there are some pages where I would expect the bug to appear, such as אַגרימאָניע, which also does not have a transliteration. Yet these pages appear normal. So perhaps this is more complex than I thought at first. Best regards, —Soap— 11:07, 24 November 2023 (UTC)[reply]

I reported a similar issue here in relation to {{t}}. Not sure if it’s related or if it’s been fixed. — Sgconlaw (talk) 11:42, 24 November 2023 (UTC)[reply]

I don't know what's causing the bug, but the difference between חומוס (khumus) and אַגרימאָניע (agrimonye) is that חומוס (khumus) contains a letter found only in Hebrew words and therefore requires a manual transliteration (otherwise it gets put in CAT:Requests for transliteration of Yiddish terms with Hebrew-only letters), while אַגרימאָניע (agrimonye) does not contain a Hebrew-only letter and therefore does not require a manual transliteration. As it happens, transliteration of חומוס (khumus) is actually unnecessary because the automatic transliteration works, but whatever gadget detects Hebrew-only letters doesn't know that, and so puts it in that category. I'm pretty sure that it's that category that's causing the bug somehow or other. —Mahāgaja · talk 23:33, 24 November 2023 (UTC)[reply]

@Soap, Mahagaja, Sgconlaw This should be fixed. The code in Module:yi-translit that added the Hebrew-only letter category used a weird format for the category that seems to have caused the issue. Benwing2 (talk) 05:10, 25 November 2023 (UTC)[reply]

Nepali Verb Forms

I don't think I have the condensed listing of forms quite right for Nepali जले (jale):

{{inflection of|ne|जल्नु|t=to burn||positive|simple|past|1s//3sm mid-respect|;|positive|injunctive|2|m//f|low-respect//mid-respect}}

inflection of जल्नु (jalnu, “to burn”):

1. positive degree simple past first-person singular/3sm mid-respect
2. positive degree injunctive second-person masculine/feminine low-respect/mid-respect

(Notifying Kushalpok01): @Benwing2 What should I have? As what is shown as '+' in the conjugation table (contrasting with '-') has nothing to do with degrees for an adjective, we clearly have an issue with the tags for forms, so I think some code changes are needed in a template or module somewhere. (The originating problem is that Pali and Nepali जले should not link to a Hindi word form.) --RichardW57 (talk) 11:06, 24 November 2023 (UTC)[reply]

@RichardW57 There seem to be at least two issues here, maybe more. One is the format of the tags; a second is Pali and Nepali linking to a Hindi form. Maybe there's a third one; not sure as some of what you have written is obscure. As for the tags, 3sm isn't a recognized tag shortcut and in general you shouldn't try to combine lists of tags like this as it leads to ambiguity; e.g. is it "1s" along with "3sm mid-respect" or "1s mid-respect" along with "3sm mid-respect"? I would recommend something like this:

{{inflection of|ne|जल्नु|t=to burn||positive|spast|1|s|;|positive|spast|3|s|m|mid-respect|;|positive|injunctive|2|m//f|low-respect//mid-respect}}

Basically I split the first of your tag sets into two (and used spast as a shortcut for simple|past, but you don't have to do that). As for Pali and Nepali linking to Hindi, I'm not sure what is going on here; can you explain further? Is this something in the Nepali code or in the implementation of {{inflection of}}? Benwing2 (talk) 05:37, 25 November 2023 (UTC)[reply]
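To make the tag-set structure concrete, here is a small Python sketch. It is not the actual {{inflection of}} code; the function names are made up for illustration, and the "every combination" reading of //-joined alternatives is only an assumption about how such tags might be interpreted.

```python
from itertools import product

def split_tag_sets(tags):
    """Split a flat list of inflection tags into tag sets on ';'."""
    tag_sets, current = [], []
    for tag in tags:
        if tag == ";":
            if current:
                tag_sets.append(current)
            current = []
        else:
            current.append(tag)
    if current:
        tag_sets.append(current)
    return tag_sets

def expand_multipart(tag_set):
    """Expand 'a//b' alternatives into every combination (one possible reading)."""
    return [list(combo) for combo in product(*(t.split("//") for t in tag_set))]

# The tags from the suggested template call, after the lemma and t= arguments.
tags = ["positive", "spast", "1", "s", ";",
        "positive", "spast", "3", "s", "m", "mid-respect", ";",
        "positive", "injunctive", "2", "m//f", "low-respect//mid-respect"]

tag_sets = split_tag_sets(tags)
for combo in expand_multipart(tag_sets[2]):
    print(" ".join(combo))
```

With the first two tag sets kept separate there is nothing left to expand in them, so the question of what "mid-respect" attaches to no longer arises.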

@Benwing2: My mistake with using '3sm'; it's so common a combination for Afroasiatic that I had expected it to be implemented, but now I see that it perhaps shouldn't be because of the ambiguous expansion of Hebrew '1s//2sf' (though '2sf//1s' wouldn't be ambiguous). There are still three problems left even with your suggestion

{{inflection of|ne|जल्नु|t=to burn||positive|spast|1|s|;|positive|spast|3|s|m|mid-respect|;|positive|injunctive|2|m//f|low-respect//mid-respect}}

inflection of जल्नु (jalnu, “to burn”):

1. positive degree simple past first-person singular
2. positive degree simple past third-person singular masculine mid-respect
3. positive degree injunctive second-person masculine/feminine low-respect/mid-respect

The problems are:

'positive' expands to 'positive degree', and that links to a term for adjectives (though I can imagine some language including degrees of comparison in a stative verb). Perhaps we can duck the issue by having a convention that Nepali verb forms are positive unless otherwise stated.
I feel that 'low-respect/mid-respect' should be contracted to 'low/mid-respect'.
I am not sure that 'respect' is the correct expansion of the 'resp.' that we see in the inflection tables. I've now seen that Wikipedia uses 'grade' for 'resp.' in our conjugation tables.

As for Pali and Nepali linking to Hindi, that came from links to HTML fragments जले#pi and जले#ne simply taking the user to जले (in which the only entry was for Hindi) without warning unless the user is logged in and has enabled orange links. It's a very common problem for inflection tables, which I have raised in the past. Some inflection tables have rivers of orange links for me; for others, they will be blue. --RichardW57 (talk) 10:33, 25 November 2023 (UTC)[reply]

@RichardW57: Hmm, what does 'positive' mean here for verbs? Does it just mean it's not a negative verb? If so maybe it can be omitted; otherwise we need a separate abbreviation for this use of 'positive' (cf. "comc" or "comparative case", which display as "comparative"). The display of "low-respect/mid-respect" is definitely possible; we have display handlers for this purpose. As for whether "respect" is correct, I can't answer that as I don't know Nepali. As for the links, not sure we can do anything about that; if an HTML anchor isn't found on a page, it always links to the top of the page. Benwing2 (talk) 23:05, 25 November 2023 (UTC)[reply]

@Benwing2: The Nepali inflection tables give a contrast '+' v. '-', so I would assume that '+' just means 'not negative'. I wouldn't be surprised if things got complicated when there were other negative words in the sentence. I had hoped for some input from an editor of Nepali. --RichardW57 (talk) 23:20, 25 November 2023 (UTC)[reply]

In principle we or a Phabricator task could enable orange links for everyone; that would go a long way to dealing with the problem of mislinking. Apparently orange links are 'expensive' - the implementation is rather complex, and I get the feeling it's too clever by half. --RichardW57 (talk) 23:20, 25 November 2023 (UTC)[reply]

Adding trivia/facts about a word's historical usage

This may not be the best place to ask but I couldn't find anywhere else to ask.

Is it okay to add in facts about a history of a word like how it was originally used, became obsolete then was reborrowed into the language again and if so how would I go about doing this? Traumnovelle (talk) 02:55, 25 November 2023 (UTC)[reply]

Yes, that would be fine to include, as long as it was fairly brief. The best place to put it would be the etymology section. As an example, you could say something along the lines of "Found in Early Modern English with the meaning "definition" but fell out of usage until it was reborrowed into contemporary English with the current meaning." Andrew Sheedy (talk) 04:59, 25 November 2023 (UTC)[reply]

@Traumnovelle Welcome! The best place to ask this kind of question is at the Information Desk. I'll post a welcome message to your talk page with some more links you might find helpful. This, that and the other (talk) 09:36, 25 November 2023 (UTC)[reply]

Splitting script code Hani

"Han script" Hani has two child script codes "Traditional Han" Hant and "Simplified Han" Hans, but this is hardly representative of the reality, especially in the area of font tagging/support. Generally speaking it can be split according to borders of countries, so there are the following variants, each with different glyphs for the same character and therefore requiring different fonts, as listed in Template:Han char/documentation#Notes:

~~G Mainland China+Singapore~~ this would be represented by C and S
C Mainland China: currently as .Hans
S Singapore: shares most glyphs with Mainland China
T Taiwan: currently as .Hant
H Hong Kong: some glyphs different from Taiwan, has its own set of fonts (e.g. Noto Sans HK) but is not listed in MediaWiki:Gadget-LanguagesAndScripts.css
M Macau: shares most glyphs with Hong Kong
J Japan: Kanji, currently as .Jpan
K Korea: Hanja, currently as .Kore
V Vietnam: Chu Han, currently as .Hani:lang(vi)

Note that there are a number of flaws with the existing setup, namely Hong Kong does not have its own script code, which means the correct fonts cannot be displayed anywhere on the site, and Vietnam uses a language selector, but there are other languages in Vietnam that also use Hani and thus do not get the proper font treatment.

I therefore think we should split them up into the following script codes:

code	usage	languages	fonts
`Hani`	"generic" use of the script, such as in the Translingual section or for non-standardized Han scripts	mul, aih, bca, bfc, bfs, bje, byo, dta, eee, lay, pcc, sa, swi, xct, xng, za, zal, zkr, zkt, tuw-kkl, tbq-plg, qfa-xgx-rou, qfa-xgx-tuh, qfa-xgx-tuo, qfa-xgx-wuh, qfa-xgx-xbi	same as existing `.Hant`??
`Hans`	Mainland China + Singapore	zh (cdo, cjy, cmn, cmp, cpx, czo, czh, dng, gan, hak, nan, wuu, wxa, yue, zhx-sht, zhx-teo, zhx-tai)	same as existing `.Hans`
`Hant`	a generic catch-all term for traditional Chinese(-related) scripts	lzh, och, ltc, cpi, crp-mpp, zh (cdo, cjy, cmn, cmp, cpx, czo, czh, dng, gan, hak, nan, wuu, wxa, yue, zhx-sht, zhx-teo, zhx-tai)	same as existing `.Hant`??
`Hantw`	Taiwan	zh, cmn, hak, nan	same as existing `.Hant`
`Hanhk`	Hong Kong	zh, cmn, yue	`font-family: 'PingFang HK', 'Source Han Sans HC', 'Source Han Sans HK', 'Noto Sans CJK HK', 'Chiron Hei HK', 'Source Han Serif HK', MingLiU_HKSCS, MingLiU_HKSCS-ExtB, /* fallback fonts starting from here */'PingFang TC', 'Source Han Sans TC', 'Source Han Sans TW', 'Noto Sans CJK TC', 'Microsoft Jhenghei', PMingLiU, PMingLiU-ExtB, MingLiU, MingLiU-ExtB, Ming, 'Heiti TC', HanaMinA, HanaMinB, sans-serif;`
`Hanjp`	Japan, as a subset of `Jpan`	ja, ojp, ams, kzg, mvi, okn, ryn, rys, ryu, tkn, xug, yoi, yox	same as `.Jpan`
`Hankr`	Korea, as a subset of `Kore`	ko, ko-ear, jje, okm, oko, pkc, xpy, zkg	same as `.Kore`
`Hanvn`	Vietnam	vi, mlc, nut, pcc, tyz, mkh-mvi	replaces `.Hani:lang(vi)`

Besides font support, the glyph differences mean that the radical-stroke number is sometimes different for certain characters, so splitting them will allow more accurate sortkey results. This could, for example, eliminate the use of |rs= in {{ja-kanji}}, which is hardcoded separately on every character entry. – wpi (talk) 10:22, 25 November 2023 (UTC)[reply]

Can we just invent our own script codes? The ones we're currently using are all ISO 15924 codes, which have four letters (one capital followed by three lowercase). —Mahāgaja · talk 10:51, 25 November 2023 (UTC)[reply]

Incorrect, see Wiktionary:Beer parlour/2023/April#Let's get rid of exceptionally formatted etym lang codes and Wiktionary:Grease pit/2023/July#Script codes harmonized which results in non-ISO script codes having five letters, a rule I've specifically followed here. – wpi (talk) 11:34, 25 November 2023 (UTC)[reply]

@Wpi This generally sounds fine with me; I'm not opposed to inventing new script codes if needed. If we're splitting on country, can we just create language/country-type script codes like 'vi-Hani' or something? Does this make sense? Also I think User:Theknightwho knows a lot more about this stuff and might have thoughts. Benwing2 (talk) 23:12, 25 November 2023 (UTC)[reply]

@Benwing2 - the issue with vi-Hani is that it creates problems if the script is also used by other languages, and there are 5 others listed under the Vietnam umbrella there. Theknightwho (talk) 23:17, 25 November 2023 (UTC)[reply]

@Benwing2, Theknightwho: I've also considered using the more commonly-used ICU locales zh-Hant-HK, yue-Hant-HK, zh-Hant-TW, vi-Hani, etc., but these only apply to one language and it would be dumb to repeat dozens of script codes in the CSS when they only differ in the language but are otherwise identical.

We could also use something like Hant-HK, Hant-TW, Hani-JP, Hani-KR, Hani-VN but these are not valid ICU locale codes AFAICT, though I'm also fine with using these ones if you think they are clearer in representation. Either way it would be inventing a new script code. – wpi (talk) 05:00, 26 November 2023 (UTC)[reply]

@Wpi Yeah, IMO there's no particular need to use something like Hani-JP unless you think it's clearer than Hanjp (since the former explicitly mentions Hani in it). BTW I think User:-sche was the only other person who commented when I harmonized the script codes back in July; any comments here? Benwing2 (talk) 05:19, 26 November 2023 (UTC)[reply]

@wpi: They're perfectly fine ICU language tag fragments, just not atomic subtags. The only problem as locale codes is that they lack languages. --RichardW57m (talk) 10:46, 30 November 2023 (UTC)[reply]

Support - this is something I've considered before with Vietnamese. I appreciate that Hani would still exist as a fallback, but I think Sawndip (for Zhuang) and possibly Bowen (for Bai) should probably also have script codes, too, given that they're recognised as being separate (to a degree). Theknightwho (talk) 23:17, 25 November 2023 (UTC)[reply]

Interwiki links and display title

How is the split even going to be practically feasible when, in 中國／中国 (Zhōngguó), 中國 is traditional and 中国 is simplified, while in 日本 (Rìběn) it's both?! Unless we start splitting each instance, even if traditional = simplified: 日本／日本 (Rìběn) (this example method only works with {{zh-l}})

A more complicated example is 臺灣／台灣／台湾 (zh) (Táiwān). 臺灣 and 台灣 are traditional, 台湾 is simplified.

The first two are traditional, the last one is simplified.

(Interwiki links are wrong, should link to zh:臺灣) Anatoli T. (обсудить/вклад) 06:29, 26 November 2023 (UTC)[reply]

@Atitarev Normally a term is associated with a language, and for that language there should be one (or at most two, with Traditional and Simplified) scripts. I think you are asking what happens for uses of terms not associated with languages, e.g. DISPLAYTITLE for the page as a whole? This is a good question, maybe it falls back to Hani? But problems happen with other scripts too, e.g. for pages with an Urdu lemma on them, the page title at the top shows up in the weird Urdu font (Nastaliq?) instead of in the more readable normal Arabic font, cf. تاج, بیت. @Erutuon, Theknightwho, This, that and the other I'm not sure how to handle this correctly but the current situation is non-ideal. User:Theknightwho I think this might be because Urdu comes last on the page so its DISPLAYTITLE setting overrides the ones for the other languages. Benwing2 (talk) 06:56, 26 November 2023 (UTC)[reply]

@Atitarev I'll see if I can fix the interwiki links; however, in general if there are multiple slash-separated terms, how should we know which term to link to in the interwiki link? Do we have to query the other Wiktionary to see which terms exist? Benwing2 (talk) 06:58, 26 November 2023 (UTC)[reply]

@Benwing2: Thanks, it should be the first term in order (the one we consider the lemma), regardless of whether it exists in the other wiki or not. @Ruakh: made it possible to determine if zh:wiki had one or the other (trad. or simp.). I think he mentioned HTTP redirects being in place in zh:wikt. You asked about links in some other discussion (about Persian links to determine {{t+}} vs {{t}}), not sure if you got a satisfactory answer. Anatoli T. (обсудить/вклад) 07:05, 26 November 2023 (UTC)[reply]

Indeed. zh.wikt has server-side software in place that automatically does an HTTP redirect from one version to another if the latter exists and the former does not. That behavior can be queried via the API, by specifying the query parameter converttitles=true when retrieving page info via action=query. (I don't have an example of an actual title conversion handy, but it looks similar to https://en.wiktionary.org/w/api.php?format=jsonfm&action=query&titles=hot_dog, just with "converted" instead of "normalized".) For the translation-adder Gadget, the relevant client code is now in MediaWiki:Gadget-TranslationAdder-Data.js; search there for converttitles. Note that it currently does only a single title lookup at a time, but the API does support retrieving multiple titles at once, by using | as a separator. Also note that it doesn't actually care what the converted form is — just that one exists — because the whole point is that the conversion happens magically after the user clicks the link. —Ruakh_TALK 17:37, 26 November 2023 (UTC)[reply]
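As a rough illustration of the query described above, here is a Python sketch that builds such a request for several titles at once and reads the "converted" list out of a parsed response. The zh.wiktionary endpoint URL, the function names and the sample response are assumptions for illustration; the response shape is assumed to mirror "normalized" (a list of from/to pairs), and the gadget itself is written in JavaScript, not like this.

```python
from urllib.parse import urlencode

# Assumed endpoint; adjust if the wiki's API lives elsewhere.
ZH_WIKT_API = "https://zh.wiktionary.org/w/api.php"

def build_convert_query(titles, api=ZH_WIKT_API):
    """Build an action=query URL with converttitles for one or more titles."""
    params = {
        "action": "query",
        "format": "json",
        "converttitles": "true",
        "titles": "|".join(titles),  # the API accepts | as a title separator
    }
    return api + "?" + urlencode(params)

def converted_titles(response):
    """Map each requested title to its converted form, if the API reported one."""
    converted = response.get("query", {}).get("converted", [])
    return {item["from"]: item["to"] for item in converted}

url = build_convert_query(["臺灣", "一丟點兒"])

# Hand-written sample in the assumed shape, not real API output.
sample_response = {"query": {"converted": [{"from": "一丟點兒", "to": "一丢点儿"}]}}
print(url)
print(converted_titles(sample_response))
```

For the {{t}} versus {{t+}} decision only the presence of an entry in the mapping would matter, not the converted form itself.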

@Ruakh: Thank you! (pls note your original post was reverted, I think you have removed some other edits). @Benwing2: Just pinging you, perhaps you'll find this helpful. Anatoli T. (обсудить/вклад) 21:49, 26 November 2023 (UTC)[reply]

@Ruakh Interesting, I wonder how they managed to get that special-purpose software put in place; presumably some sort of Phabricator request that was actually honored? Benwing2 (talk) 22:36, 26 November 2023 (UTC)[reply]

@Benwing2, @Ruakh:

An example of an old style entry (at the moment) would be 一丟點兒／一丢点儿 (zh). You see I am using a trad. version "一丟點兒" on the input but it links to the simplified form on zh:wikt zh:一丢点儿 without any regular redirect we know about, so it must be using an HTTP redirect. (I chose a random entry before it gets converted or deleted).

On the Chinese Wiktionary they have adopted our way to lemmatise entries on the traditional forms, display both, making soft-redirects for the simplified forms but the process is slow. If working with HTTP redirects turns out to be hard, I am happy to drop that request and requirement but it would still be good to use the regular method to determine whether an entry exists in the other wiki for {{t}} and {{t+}} distinctions. Anatoli T. (обсудить/вклад) 23:27, 26 November 2023 (UTC)[reply]

@Benwing2: I don't know how they got that feature, but it's probably relevant that zh.WP has the same feature. Wikipedia presumably has an easier time getting things prioritized than Wiktionary does, and I'm sure it's easier to get a feature for Wiktionary if it's already been built and just needs to be enabled here. (Actually I don't know if zh.wikt even asked for the feature, or if it was just turned on for all zh projects, or what.) —Ruakh_TALK 01:41, 27 November 2023 (UTC)[reply]

@Ruakh, @Benwing2: The Chinese Wikipedia has a quick way of switching between varieties - mainland China, Hong Kong, Taiwan, etc..

For example, at w:zh:中华人民共和国, search for the 大陆简体 (dàlù jiǎntǐ, “mainland simplified form”) drop-down box at the top to switch to e.g. 臺灣正體／台湾正体 (Táiwān zhèngtǐ, “Taiwan traditional form”). The whole page changes to traditional Chinese. It's not 100% reliable but they must be working on improving the functionality. Anatoli T. (обсудить/вклад) 01:53, 27 November 2023 (UTC)[reply]

I think they got it because the issue of traditional vs. simplified characters is a fundamental part of their basic user interface: if you want to have a website that a wide range of Chinese-speakers can use, you have to deal with it. It's probably analogous to the auto-redirect that happens here when there's a lowercase equivalent of a missing uppercase search term. Chuck Entz (talk) 02:15, 27 November 2023 (UTC)[reply]

@Chuck Entz Makes sense. I am glad they chose to lemmatize on traditional characters rather than the other way around; it would be much messier to lemmatize on simplified characters, and not friendly to languages other than Mandarin. Benwing2 (talk) 02:29, 27 November 2023 (UTC)[reply]

@Benwing2: It's because lemmatising on the traditional forms is technically easier, among other historical or emotional reasons (not without challenges when the conversion is one-to-many in that direction and there are multiple variants). We do, the Chinese Wiktionary also does now, following the new styles with modules and templates. Anatoli T. (обсудить/вклад) 03:13, 27 November 2023 (UTC)[reply]

@Benwing2: Do we have to query the other Wiktionary to see which terms exist? In short we do, to determine the type of translation template to use. Anatoli T. (обсудить/вклад) 07:23, 26 November 2023 (UTC)[reply]

@Benwing2: Yes, I am talking about the display. Adding |sc=Hant or |sc=Hans to translations has been very error-prone and useless. I have been removing all occurrences of |sc= in translations. If you fall back to "Hani" then you may get a mixed script display. Anatoli T. (обсудить/вклад) 07:00, 26 November 2023 (UTC)[reply]

@Benwing2 I suspect you’re right about why the Urdu font overrides. It might be desirable to have a default script if there’s a conflict, somehow? Theknightwho (talk) 07:19, 26 November 2023 (UTC)[reply]

@Theknightwho, @Benwing2: On the phone, templatised Urdu always looks different (Nastaliq) than on the laptop. Yes, it's less readable but I thought it was intentional, since Nastaliq is popular in Pakistan. Anatoli T. (обсудить/вклад) 07:26, 26 November 2023 (UTC)[reply]

@Atitarev It's definitely intentional to use Nastaliq for Urdu terms, but not so much for page titles when there are multiple languages. Not sure why you don't see Nastaliq fonts on your laptop, as I do see it there. Maybe your version of Windows doesn't have any Nastaliq fonts available, or they're under a different name than the ones encoded in our CSS? As for translations, yes you shouldn't have to add |sc=anything to translations; it should be autodetected. What User:Wpi is proposing should be all automatic and under the hood. BTW thanks for the info about using the first term when constructing interwiki links, that makes it very easy. Benwing2 (talk) 08:17, 26 November 2023 (UTC)[reply]

@Theknightwho Yes we should use generic Arabic script fonts when there's a conflict between Urdu and some other language's fonts. But how can we know this? Each headword gets processed independently. It seems like each call to {{head}} for an Arabic script language would have to fetch the page contents and look at all the languages present. Is that feasible? Benwing2 (talk) 08:19, 26 November 2023 (UTC)[reply]

@Benwing2 Putting the logic in Module:headword/data would probably work, since it only gets run once for the whole page. Theknightwho (talk) 20:59, 26 November 2023 (UTC)[reply]
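A minimal Python sketch of what a once-per-page decision could look like, assuming the page-level code can see every language section on the page. The function name, the language-to-script table and the script codes in it are hypothetical placeholders, not what Module:headword/data actually contains.

```python
# Hypothetical mapping from language code to the script class used for its headwords.
EXAMPLE_SCRIPTS = {"ar": "Arab", "fa": "fa-Arab", "ur": "ur-Arab"}

def title_script(page_languages, script_for, generic="Arab"):
    """Pick one script class for the page title.

    If every language on the page agrees, use that class; otherwise fall
    back to the generic one instead of letting the last language win.
    """
    scripts = {script_for[lang] for lang in page_languages}
    if len(scripts) == 1:
        return next(iter(scripts))
    return generic

print(title_script(["ur"], EXAMPLE_SCRIPTS))
print(title_script(["ar", "fa", "ur"], EXAMPLE_SCRIPTS))
```

The point of the sketch is only that the choice is made from the full set of languages rather than by whichever headword template runs last.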

It does seem to be relevant that Urdu is the last language on pages where the top of the page displays in Nastaliq, as that is not the case on pages like بار and زبان where Urdu isn't the last language listed. (Incidentally, زبان is an amusing false friend between Yemeni Arabic and the other languages on the page.) —Mahāgaja · talk 21:22, 26 November 2023 (UTC)[reply]

@Mahagaja Thanks, that pretty much confirms what's going on. Benwing2 (talk) 22:23, 26 November 2023 (UTC)[reply]

@Theknightwho Thanks, makes sense. Benwing2 (talk) 22:37, 26 November 2023 (UTC)[reply]

@Theknightwho Is there an exported function in Module:links or elsewhere to split a page on //, taking into account escaping using backslashes? I looked at the code in full_link() and it seems the escaping and unescaping functionality is not properly extracted into a function that can be used elsewhere. Benwing2 (talk) 02:10, 27 November 2023 (UTC)[reply]

@Benwing2 What do you mean by splitting a page? Theknightwho (talk) 02:24, 27 November 2023 (UTC)[reply]

@Theknightwho Sorry, I mean splitting a link that contains // in it into its components. I need this functionality to fix the handling of interwiki links when given a multipart (double-slash-separated) link. Benwing2 (talk) 02:30, 27 November 2023 (UTC)[reply]

@Benwing2: I reckon you'd want to do that at Module:translations#L-70 instead, because the problem is only present with the translation templates but not normal ones. – wpi (talk) 04:47, 27 November 2023 (UTC)[reply]

@Wpi Yes, that is where I need to add the code, but it needs to split out the //-separated components, and I'm asking for a function to do that (it's not as simple as splitting on // because of the possibility of escaping with backslashes). Benwing2 (talk) 04:52, 27 November 2023 (UTC)[reply]

@Benwing2 There isn't one - sorry. It'll be part of the wikitext parser once that's in a state to be rolled out. Theknightwho (talk) 23:55, 27 November 2023 (UTC)[reply]

@Theknightwho Can you make one in Module:links by exposing the escape and unescape functions? I don't want to wait until the wikitext parser is out (which could be months or more). Benwing2 (talk) 00:06, 28 November 2023 (UTC)[reply]

@Benwing2 Sure. Theknightwho (talk) 00:09, 28 November 2023 (UTC)[reply]

@Theknightwho I notice if there's a component of the //-separated term that is blank, it gets converted to nil in split_on_slashes(). That makes iteration over it inconvenient; how about using false instead? Benwing2 (talk) 03:59, 28 November 2023 (UTC)[reply]

@Benwing2 Sure - it's so that alt text works properly: e.g. {{l|en|link//link2|//alt}} gives link／alt. Theknightwho (talk) 05:13, 28 November 2023 (UTC)[reply]
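For readers following along, here is a Python sketch of the kind of splitting being discussed: split on //, treat a backslash as escaping the next character, and return False for blank components so the list stays easy to iterate. It is an illustration only; the real split_on_slashes() in Module:links is written in Lua and its exact escaping rules are assumed here, not copied.

```python
def split_on_slashes(text):
    """Split text on '//' while honouring backslash escapes.

    Blank components come back as False rather than None, so that the
    position of each component is preserved during iteration.
    """
    parts, buf, i = [], [], 0
    while i < len(text):
        ch = text[i]
        if ch == "\\" and i + 1 < len(text):
            buf.append(text[i + 1])  # keep the escaped character literally
            i += 2
        elif text.startswith("//", i):
            parts.append("".join(buf))  # close the current component
            buf = []
            i += 2
        else:
            buf.append(ch)
            i += 1
    parts.append("".join(buf))
    return [part if part else False for part in parts]

print(split_on_slashes("link//link2"))
print(split_on_slashes("//alt"))
print(split_on_slashes(r"a\//b"))
```

Pairing the components of "link//link2" with those of "//alt" by position gives no alternative text for the first term and "alt" for the second, which matches the behaviour described for {{l|en|link//link2|//alt}}.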

@Benwing2, @Theknightwho: Thank you for fixing the complex interwiki links! Anatoli T. (обсудить/вклад) 21:43, 28 November 2023 (UTC)[reply]

question about language tags in CSS

@-sche, Erutuon, This, that and the other I have a question about language tags in CSS and HTML. https://www.rfc-editor.org/rfc/bcp/bcp47.txt seems to imply that it's OK to have etym language codes like 'fa-cls' (Classical Persian) specified in the 'lang' attribute of <span ...> or the like. I *think* that means that if I put lang='fa-cls' there, it will automatically match CSS that just specifies lang=fa, but I'm not sure. I'm trying to fix up Module:script utilities and Module:headword to properly handle etymology-only languages, and I want to make sure I do the right thing here. This is not just academic because e.g. MediaWiki:Gadget-LanguagesAndScripts.css has some language restrictors on lines 180-182 as well as lines 291, 493, 820, 825, 930 and 935. The first three in particular reference ja/ko/zh, and I'm pretty sure we do have etymology-only variants of Korean and Chinese at least. Benwing2 (talk) 09:38, 27 November 2023 (UTC)[reply]

@Benwing2 it does appear that the :lang(...) selector does what you want: User:This, that and the other/langcodes This, that and the other (talk) 10:03, 27 November 2023 (UTC)[reply]

@This, that and the other: Wouldn't this depend on the language code 'cls' not being understood?

@Benwing2: From a quick search, it seems that [lang=fa] just does a very dumb comparison of the attribute value, so there would be no connection between values 'fa' and 'fa-cls'. This was hinted at in the previous reply. --RichardW57m (talk) 10:42, 27 November 2023 (UTC)[reply]

@This, that and the other Thank you! Can you check using [lang=fa]? Benwing2 (talk) 10:52, 27 November 2023 (UTC)[reply]

@Benwing2 the [lang=...] selector does an exact match. You can use [lang^=fa-] to match language codes starting with fa-, but I'm not sure why you wouldn't just use the :lang(...) selector?

@RichardW57m I put an example with the code en-fa on User:This, that and the other/langcodes, and you can see that it does not get matched by the :lang(fa) selector. Is this what you had in mind? This, that and the other (talk) 23:33, 27 November 2023 (UTC)[reply]

@This, that and the other There are references to [lang=...] in MediaWiki:Gadget-LanguagesAndScripts.css (lines 180-182). Can you see if they can be rewritten using :lang(...)? If so I can change Module:script utilities to include the etym code in the tagged text. Benwing2 (talk) 23:36, 27 November 2023 (UTC)[reply]

Almost. However, extlang codes have to be 3 characters long, so in "en-fa", the 'fa' should be interpreted as an undefined region code! An additional test would be on the matching of fa-pes, a valid code for Iranian Persian. --RichardW57m (talk) 10:53, 28 November 2023 (UTC)[reply]

Actually, "fa-cmn" would be better - a CSS rule for "fa" might match plain language subtag "pes". --RichardW57m (talk) 11:56, 28 November 2023 (UTC)[reply]

Second revision - try with 'fa-gom'. One BCP 47 validator (https://schneegans.de/lv/) chooses to interpret this as Goan Konkani, recommending the use of 'gom' on its own. It does deem it invalid, as the correct form with that extlang would be 'kok-gom'. --RichardW57m (talk) 13:21, 28 November 2023 (UTC)[reply]

@Benwing2, This, that and the other: Ho hum. We get different results on different browsers! On Safari and modern MS Edge (which is based on Chromium), :lang(fa) matches 'fa' (Persian macro-language), 'fa-cls' (the extlang has no meaning), 'fa-lR' (Liberian Persian) and 'fa-pes' (Persian language macro-language, Iranian Persian language - except that pes is not also an ext-lang). However, if the language tag is not valid according to BCP 47, Firefox derives a language value of 'invalid' (source: https://developer.mozilla.org/en-US/docs/Web/HTML/Global_attributes/lang), and, accordingly, only 'fa' and 'fa-lR' match.

The results with *[lang=fa] and *[lang^=fa-] are the same for all three browsers. --RichardW57 (talk) 20:08, 28 November 2023 (UTC)[reply]

@Benwing2 @RichardW57 Interesting! That behaviour has changed between Firefox 112 (which I was using prior to updating - it delivered identical results to Chrome) and Firefox 119. It seems that, to be safe, we need to use the attribute-based syntax, even though it requires writing every selector twice... This, that and the other (talk) 22:58, 28 November 2023 (UTC)[reply]
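To show what "writing every selector twice" amounts to, here is a Python sketch that generates the exact-match and prefix-match attribute selectors for a code and mimics their combined matching as plain string comparison. The function names are made up for illustration, and the matching function is a simplification of what browsers do.

```python
def attribute_selectors(code, prefix="*"):
    """Return the exact-match and prefix-match attribute selectors for a code."""
    return f'{prefix}[lang="{code}"], {prefix}[lang^="{code}-"]'

def attribute_match(code, lang_value):
    """Mimic the pair above: exact value, or value starting with 'code-'."""
    return lang_value == code or lang_value.startswith(code + "-")

print(attribute_selectors("fa"))
for value in ["fa", "fa-cls", "fa-pes", "en-fa", "fas"]:
    print(value, attribute_match("fa", value))
```

CSS also defines the [lang|="fa"] attribute selector, which matches a value that is exactly "fa" or that begins with "fa-", so it may cover both cases in a single selector.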

@Benwing2, RichardW57: A safe alternative would be to convert our language codes to the nearest BCP 47 code! If we need to export finer distinctions than provided by BCP 47, we may be able to use the private use extension, e.g. 'awd-x-mar' for our 'awd-mar' for Marawan. --RichardW57m (talk) 09:33, 29 November 2023 (UTC)[reply]

@Benwing2, This, that and the other: Just to clarify, I'm suggesting that we only put values consistent with BCP 47 in the lang attribute. --RichardW57m (talk) 10:04, 29 November 2023 (UTC)[reply]

That's a good point. Presumably Firefox's validation only checks that the language code follows a certain set of syntax rules, not that it is composed of valid/known/standardised elements. So we could adapt the output of the lang= attributes to fit those rules if we so wished. However, the question needs to be asked: for what purpose are these lang= attributes used, besides applying appropriate fonts using CSS? I'm not convinced it's worth the effort to change the format we use for those attributes. This, that and the other (talk) 10:28, 29 November 2023 (UTC)[reply]

@This, that and the other: The failure to see language fa in 'fa-cls' or 'fa-pes' suggests Firefox is using the types of element for 3-letter subtags, but we haven't enough tests to investigate the behaviour with extlangs. For xx-yyy to be a valid code, as opposed to a well-formed code, yyy must be an 'extlang', and that may be why 'fa-cls' and 'fa-pes' weren't recognised. (I may very easily be wrong.)

The lang attribute may be used to select glyphs, as with the Padauk font for the Burmese script. This is usually done behind the scenes by the browser, which must deduce OpenType (sometimes AAT) language codes from locales. This is only likely to be done for standardised language codes.

The locale may also be used if the pages are converted to speech. Again, this will only work for standardised language codes. (This is one argument against using private use language codes; another is that there are only 520 such codes and we have 652 non-standard non-etymology language codes with hyphens, though possibly a few could be converted to adequate BCP 47 codes.)

@Benwing2 I would suggest that we store an (ambiguous) BCP 47 code as part of the language data, with it defaulting to the Wiktionary code. Thus this data need only be present in Module:languages/data/exceptional and Module:etymology languages/data. --RichardW57m (talk) 13:40, 29 November 2023 (UTC)[reply]

@RichardW57 What is BCP-47 and how should I store the code? What I'm currently doing is just storing the parent language code but this is clearly non-optimal (e.g. it will store 'sh', which is no longer a recognized ISO 639 code). Benwing2 (talk) 02:08, 30 November 2023 (UTC)[reply]

The Internet standard for language tags - BCP 47, which has its freely accessible registry. As for storage, I would store it in a field bcp47 of the arrays m of submodules of language/data. The default value of 'nil' would mean that the Wiktionary code may be used. The language code 'sh' may not be part of ISO 639 any more, but it remains in the BCP 47 registry. Incidentally, should a language with a 3-letter code suddenly get a 2-letter code, it is the 3-letter code that will remain in the BCP 47 standard, and reassignments of codes will not be recognised. I'm not sure how well this system will be working in 4026 AD. --RichardW57 (talk) 19:32, 30 November 2023 (UTC)[reply]
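A minimal Python sketch of the lookup this implies: an optional bcp47 field per language, with the Wiktionary code used when the field is absent. The table and its entries are hypothetical examples (the real data lives in Lua modules), and mapping 'fa-cls' to 'fa' is just one possible "nearest code" choice.

```python
# Hypothetical stand-in for the language data; None plays the role of Lua's nil.
LANGUAGE_DATA = {
    "fa-cls": {"bcp47": "fa"},         # etymology-only code mapped to its parent
    "awd-mar": {"bcp47": "awd-x-mar"},  # private-use form mentioned earlier in the thread
    "sh": {},                           # no override: the Wiktionary code is used as-is
}

def lang_attribute(code, data=LANGUAGE_DATA):
    """Return the value to emit in lang="", defaulting to the Wiktionary code."""
    override = data.get(code, {}).get("bcp47")
    return override if override is not None else code

for code in ["fa-cls", "awd-mar", "sh", "fr"]:
    print(code, "->", lang_attribute(code))
```

Because the field defaults to the Wiktionary code, only languages whose codes differ from a suitable BCP 47 tag would need an entry.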

@RichardW57: It would be nice to use BCP 47 language tags in lang="" attributes. That could allow browsers to more often choose correct fonts and might help screen readers. We currently insert our own ad-hoc language codes, which browsers can't recognize (and shouldn't because they're unique to English Wiktionary). BCP 47 language tags would also allow us to be more specific about which orthographical system a bit of text belongs to, in a way that browsers could use if they want. For instance, "a Chinese language written in some kind of standardized romanization" is apparently prescribed to be lang="zh-Latn-pinyin". Wiktionary uses lang="cmn-Latn" to mark automatically retrieved Mandarin transliterations (Standard Chinese written in Hanyu Pinyin, as in {{m|cmn|中國}} → 中國／中国 (Zhōngguó)), and that is a valid BCP 47 language tag, though strictly it's only as specific as "some kind of Mandarin Chinese written in some kind of romanization". I think other script-suffixed language attributes are valid BCP 47 language tags if the language code and the script code are valid subtags listed in the registry and if the script is not the primary one used for the language. (Polyt is a script code not in the registry, so lang="el-Polyt" would be invalid and lang="el-polyton" would have to be used instead, if we wanted to distinguish Modern Greek written in polytonic Greek script.)

The language tagging module wouldn't be able to always select the most specific BCP 47 language tag, and sometimes it wouldn't be able to generate a valid tag at all. All the modules have is Wiktionary language code (including sometimes an etymology language code) and script code, which is not always enough information. For instance, there is a specific BCP 47 language tag (lang="ru-Petr1708") that is apparently used for Russian spellings with archaic letters like yat (ѣ), but I think Wiktionary uses the script code Cyrl for that as well as for modern spellings. There are probably other distinct orthographical systems that share the same combination of language code and script code. And some Wiktionary languages don't have official BCP 47 language subtags for them at all. For instance, I believe there are no official language subtags for unattested proto-languages, like Proto-Indo-European. Template:lang on Wikipedia uses idiosyncratic tags such as lang="ine-x-proto" for what we call lang="ine-pro". (-x-proto is a private-use subtag suffix. lang="ine-x-proto" doesn't help browsers or screen-readers, but we could use it in our CSS.)

Still, we could make more language attributes compliant with the BCP 47 system with a bit of work and parsing of the language subtag registry. Wikipedia has some data derived from the subtag registry (the suppressed scripts module would tell the script tagger that it should write lang="ru", not lang="ru-Cyrl"), but we'd also need to write out a bunch of mappings from our idiosyncratic language and script code combinations to BCP 47 language tags, and I guess come up with some private-use tags (like ine-x-proto) for the languages that aren't listed there. — Eru·tuon 00:01, 1 December 2023 (UTC)[reply]

@Erutuon @RichardW57 I agree that it would be better to put more standard codes in the lang= attribute. As a first approximation we could just have a mapping from Wiktionary language codes (full and etym-only) to BCP 47 ones; if someone comes up with such a list, it wouldn't be hard to add the codes into the language data modules and modify the language tagging code to read them. Benwing2 (talk) 06:09, 1 December 2023 (UTC)[reply]

@Erutuon: I don't think we need the most precise language tag; just one that will do the job. One of the reasons for a precise language tag is to specify the language of text that is to be generated; I don't believe we are interested in that. So for example, we would only need "ru-Petr1708" if text to speech conversion were being applied to the text and we had to choose based on whether the ability to handle yat was needed. I'm also struggling to think of a context where we need to specify a BCP 47 script tag - that can almost always be generated from the context. What is generally more useful is applying a class depending on the Wiktionary script - and that can be much finer grained than the BCP 47 script codes.

Unattested proto-languages were not accepted for ISO 639-3. (I don't know if a recent consolidation of ISO 639 has changed their acceptability. I am assuming not.) A suitable emergency conversion would be to change family-etc-pro to family-x-etc-pro, at least when 'family' is a recognised language family or language collection.

It's conceivable that we could get a 6 or 8-character language code from BCP 47, e.g. by suffixing or prefixing 'pro' or 'proto' to the family name without any hyphens. Quite honestly, registering a general variant proto to stick on to family names appeals, but I suspect such a request would be rejected, much as ine-proto would be semi-intelligible to humans and conforms to the BCP 47 syntax. (Semantics might be a bit iffy.)

It's conceivable that we may need to distinguish versions of proto-languages by their orthographies, but we could pile the variants up if we need to so they could all be uttered by text-to-speech converters, but I think that's a pipe-dream.

For other natural languages, including etymology-only languages, we could try the following process:

If Wiktionary name is family-etc, use emergency code of family-x-etc.
If there's a BCP 47 name, use it.
Request ISO 639(-3) name.
If refused on the grounds that it's a dialect, and that is too close to the truth to refute, request a 5-8 character variant name from the BCP 47 registry.
If that fails, we're stuck with family-x-etc.

If a name is refused on the grounds of insufficient use of the language, we might be able to get a 5 to 8 character name from BCP 47. --RichardW57m (talk) 17:38, 1 December 2023 (UTC)[reply]
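The emergency conversion described above (family-etc to family-x-etc) can be sketched as follows. This is an illustration in Python, not module code; the function name `emergency_tag` and the `known_primary` set (standing in for a check against the BCP 47 registry) are assumptions.

```python
import re

# Each private-use subtag after "x" must be 1 to 8 ASCII letters or digits.
_SUBTAG = re.compile(r"[A-Za-z0-9]{1,8}")


def emergency_tag(wikt_code, known_primary):
    """Turn 'family-etc' into 'family-x-etc'; return None if that is not possible.

    known_primary is a hypothetical set of primary subtags that are already
    known to be registered (languages, families or collections).
    """
    primary, sep, rest = wikt_code.partition("-")
    if primary not in known_primary:
        return None
    if not sep:
        # No hyphen: the code is already a bare registered subtag.
        return wikt_code
    subtags = rest.split("-")
    if not all(_SUBTAG.fullmatch(s) for s in subtags):
        return None
    return primary + "-x-" + "-".join(subtags)
```

For example, `emergency_tag("ine-pro", {"ine"})` gives `ine-x-pro`. Everything after `x` is private use, so the result is well-formed but carries no meaning for browsers or screen readers.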

@this, that and the other: I've looked at the latest CSS Selectors specification and the HTML5 spec, and I think Firefox is non-compliant. Also, the definition of 'same language' has been tightened up, and formally zh-cmn and cmn are not the same language, despite both meaning Mandarin Chinese. I expect Firefox behaviour will revert - let's see what's in Version 121. --RichardW57 (talk) 01:23, 30 November 2023 (UTC)[reply]

Swapped order of parameters in some quote templates

The templates {{quote-song}} and {{quote-journal}}, and possibly others, place the variables in an unexpected order when used with some combinations of parameters. For example, with the song template, if both composer and artist are given, the output looks like this:

1984, “Ode to Joy (traditional)”, Beethoven (music), performed by John Singer:

Freude, Freude, Freude schöner Götterfunken Tochter aus Elysium

Which seems normal. But if I add an album title, which most songs will have, it changes to

1984, “Ode to Joy (traditional)”, in Beethoven (music), Sensory Overload, performed by John Singer:

Freude, Freude, Freude schöner Götterfunken Tochter aus Elysium

As if the album and composer parameters were swapped in place.

There's a similar bug with {{quote-journal}} that probably doesn't come up that often. I noticed it when quoting print newspaper comic strips a few months back, for which I've switched to {{quote-book}}, which is less than ideal. The journal template, if given a certain set of parameters, produces output such as

1991 September 19, Tim Sniffen, “Tangelo Pie”, in The Massachusetts Daily Collegian‎^[3] (comic), University of Massachusetts Amherst, page 13:

Excuse me, but is this huge application necessary? Listing job experience — what experience is crucial for being a stock boy?

Where it seems as if The Massachusetts Daily Collegian, rather than Tangelo Pie, is the name of the comic strip, because that's where the word (comic) goes.

I am not sure if these two bugs are the same underlying problem or if it's just that the same sort of swap-in-place error happened twice. Is it possible, perhaps even easy, to fix? Thanks, —Soap— 18:22, 27 November 2023 (UTC)[reply]

As discussed on Discord, I've also run across this issue with the "editor" tag in quote-journal. A magazine I wanted to cite had no author attributed to the relevant section, so I swapped out the "author" tag:

1948 September, “Air Force Day”, in Earl N. Findley, editor, U.S. Air Services‎^{[googlebooklink]}, page 6:

Something about a drone.

For future reference, in case of template correction, the above output is currently displayed as:

1948 September, “Air Force Day”, in Earl N. Findley, editor, U.S. Air Services^{[googlebooklink]}, page 6:

Something about a drone.

Qwertygiy (talk) 18:43, 27 November 2023 (UTC)[reply]

@Qwertygiy @Soap Let me take a look. The code in Module:quote is rather complicated and it changes the order under certain circumstances. It would help if you could give me examples of what the output ought to look like for the problematic cases you've highlighted. Benwing2 (talk) 00:17, 28 November 2023 (UTC)[reply]

Errors with Bengali lemmas ending in -য়

There seems to be an issue with -য় being parsed as য (jo) with an underdot in links, instead of as a separate letter য় (ẏo). This causes problems when using the {{R:bn:DDSA}} template, and also with links using Bengali script in the URL.

What fixes and workarounds are there for this?

Michael Ly (talk) 18:41, 27 November 2023 (UTC)[reply]

@Michael Ly It's because MediaWiki normalises the input to separate out the nukta, for whatever reason. In theory the two forms should be treated as the same thing according to the Unicode standard, but if an external website requires the atomic character then the template will need to be updated to account for this. Theknightwho (talk) 23:49, 27 November 2023 (UTC)[reply]

@Theknightwho BTW, the reason is "composition exclusion". Basically, precomposed Indian letters with nukta are not allowed in canonically normalised text. --RichardW57m (talk) 12:05, 28 November 2023 (UTC)[reply]
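The effect of composition exclusion can be reproduced with Python's standard `unicodedata` module. The helper `atomic_yya` is a hypothetical workaround of the kind a template might apply before building an external URL; whether any given site actually needs the atomic character is an assumption.

```python
import unicodedata

YYA = "\u09df"             # BENGALI LETTER YYA, the precomposed (atomic) form
YA_NUKTA = "\u09af\u09bc"  # BENGALI LETTER YA + BENGALI SIGN NUKTA

# NFC decomposes and then recomposes, but it never recomposes a character
# listed in the composition exclusions, so the atomic letter does not survive.
normalised = unicodedata.normalize("NFC", YYA)


def atomic_yya(text: str) -> str:
    """Restore the precomposed letter in already-normalised text."""
    return text.replace(YA_NUKTA, YYA)
```

Here `normalised` is the two-character sequence `YA_NUKTA`, and `atomic_yya(normalised)` gives back the single character.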

Words are not linked in headwords

@Benwing2 (E.g., for the Word of the Day, square peg in a round hole.) J3133 (talk) 00:00, 28 November 2023 (UTC)[reply]

Fixed. Benwing2 (talk) 00:13, 28 November 2023 (UTC)[reply]

Add Category:Dobrujan Tatar language to the relevant language-related modules if appropriate

I just discovered this now while watching Recent Changes and it clearly does not exist in Wiktionary's module system. As such, it should be added to the system if it is to be considered a valid language or else the entries created using this as an L2 header should be updated. Acolyte of Ice (talk) 14:37, 28 November 2023 (UTC)[reply]

@-sche Can you comment? I don't know the first thing about this, and Wikipedia says it's a "dialect" but it's not clear what of. Benwing2 (talk) 23:12, 28 November 2023 (UTC)[reply]

@Benwing2: You don’t know the first thing of it because the whole Wikipedia page and its presence on Wiktionary is an idiosyncrasy pushed by the same editor this year, with further edits that made the article less clear; other people couldn’t get through that amount of editing to a relatively recondite topic. This is deffo Crimean Tatar, I had touched upon Crimean Tatar entries a few times before and stumbled upon Romania-specific material, which differs mostly by the spelling. The Ceaușescu regime of course made things a bit special and people always feared to respond to outside occurrences, but the Dobruja community is a result of the sixth–ninth Russo-Turkish wars. Fay Freak (talk) 00:21, 29 November 2023 (UTC)[reply]

I can only find limited information about this lect(s?), which may be because at least on Wikipedia it was—as Fay Freak says—only recently renamed to this by the same user who added these terms. What I can find, e.g. Filiz Tutku Aydın, Émigré, Exile, Diaspora, and Transnational Movements of the Crimean Tatars (2021), seems to consider it Crimean Tatar. The user who wrote the Wikipedia entry and added these entries cast "Crimean Tatar and Nogai" as "dialects" of Dobrujan Tatar, whereas from the point of view of Ethnologue/ISO and our current setup, Crimean Tatar and Nogai are separate languages spoken not only in Romania but also elsewhere (not dialects of one language in Romania). Among the sources the user cited on Wikipedia is an apparently Altaist website, which is not the most inspiring sign of reliability (although not as bad as an Altaic website). Analele Universității București: Limbi și literaturi străine (2009), page 50, says "The two varieties of language spoken by the Tatars in Dobruja, Crimean Tatar proper (Qırımtatar tılı) and Noghai (Noğay tılı), belong, both of them, to the Crimean Tatar branch, […] ". In the absence of good evidence that Romania Crimean Tatar and Romania Nogai have become one new language, I think the conservative approach would be to switch the entries to ==Crimean Tatar== and crh with a "Romania" or "Dobruja" label. - -sche (discuss) 16:16, 30 November 2023 (UTC)[reply]

I'm in the process of switching them to ==Crimean Tatar== labeled {{lb|crh|Dobrujan}}, putting them into CAT:Dobrujan Crimean Tatar, which at least gets them out of an invalidly named category. I'll let other people make the decision as to whether that's the best name or whether they should instead be ==Nogai== (nog) —Mahāgaja · talk 11:38, 4 December 2023 (UTC)[reply]

Template:R:ur:UDB

This reference template acts weirdly on the Urdu term خُوبْصُورَت (xūbsūrat), and possibly on other terms. It keeps adding the word to the URL indefinitely. It may have worked in the past but not now. Anatoli T. (обсудить/вклад) 04:42, 29 November 2023 (UTC)[reply]

Weird: I don't see that. I see a completely normal URI of "http://udb.gov.pk/result.php?search=%D8%AE%D9%88%D8%A8%D8%B5%D9%88%D8%B1%D8%AA", that loads just fine, not some infinitely recursive, malformed URI. (Note that I cannot read Urdu to save my life, but the page loads just fine). Can you explain more or maybe share a screenshot online? —Justin (koavf)❤T☮C☺M☯ 05:21, 29 November 2023 (UTC)[reply]

@Koavf: Oh, it's working fine now. It may have been a glitch. Thanks for checking. Anatoli T. (обсудить/вклад) 05:58, 29 November 2023 (UTC)[reply]

It happens again on entry شَہَنْشاہ (śahanśāh). Re-opening the case. @Koavf. --Anatoli T. (обсудить/вклад) 03:48, 13 December 2023 (UTC)[reply]

I'm not seeing a problem. —Justin (koavf)❤T☮C☺M☯ 03:55, 13 December 2023 (UTC)[reply]

@Koavf: It doesn't happen any more for me either. It seems to happen when the site is temporarily down: the URI then goes crazy, trying to refresh every few milliseconds. Anatoli T. (обсудить/вклад) 04:16, 13 December 2023 (UTC)[reply]