Wiktionary:Grease pit/2023/May

Thai multiword terms

I've put this topic here rather than the Beer Parlour because I suspect the answer will relate to problems with algorithms rather than dictionary policy.

Why does Module:headword decline to categorise any Thai phrases or idiomatic terms as multiword terms? Examples of such failure include ไข่ดาว (kài-daao, fried egg, literally star egg) and คารมเป็นต่อ รูปหล่อเป็นรอง (kaa-rom-bpen-dtɔ̀ɔ rûup-lɔ̀ɔ-bpen-rɔɔng). From the change history, @Theknightwho seems to have excluded them on 4 March 2023. As a side-effect, Pali paradāraṃ gacchati is classified as a multiword term in the Roman and Singhala scripts, but not the Thai script. --RichardW57m (talk) 10:27, 2 May 2023 (UTC)

@RichardW57m You already know the reason why - because scripts without spaces are excluded - and you also know that that means it isn’t specific to Thai.
Doesn’t that mean a much better question would be “what is the best way to detect multiword terms in Thai?” Theknightwho (talk) 10:53, 2 May 2023 (UTC)
@Theknightwho: So does this mean that you downgraded the performance from, say, 3% detection to 0%? Note that the second example has a space in it, so it could at least be detected as a multiword term if you didn't exclude terms written only in the Thai script.
A quick way of detecting multiword terms is to look for the sequence ']][[', or more precisely for multiple links in the headword. Is there a problem with that? Why don't you accept quick wins such as spaces in terms? There's also a backup exclusion via data.no_multiword_cat, defined in Module:headword/data, which lists languages that don't separate words with spaces - I don't understand the logic of that, either. Looking at the history, I see that @Benwing2, Atitarev, Octahedron80 may be able to explain what is going on. --RichardW57m (talk) 11:48, 2 May 2023 (UTC)
I do see a different, related problem with Thai - the repetition mark, for which a preceding space is generally prescribed, though where it is a problem it could be handled with |nomultiwordcat=. --RichardW57m (talk) 12:00, 2 May 2023 (UTC)
@RichardW57m Thai was already excluded, and has been since October 2020: Special:Diff/60868413. You must already know this, as you refer to it in your comment, so I suggest you stop trying to imply I made a unilateral decision to do this.
You also haven’t addressed the main point: how can we improve detection of Thai multiword terms? Theknightwho (talk) 12:36, 2 May 2023 (UTC)
When I started this topic, I hadn't realised that Thai was specifically excluded. My original question is even more appropriate - why is the concept of Thai multiword terms specifically excluded? What was your edit targeted at? Surely you weren't trying to create issues for Pali and, if what I can find is still relevant, Northern Khmer! There seems to have been no recent discussion - is there some relevant old discussion, and if so where? --RichardW57m (talk) 13:34, 2 May 2023 (UTC)
I've already offered a suggestion - see ']][[' above. --RichardW57m (talk) 13:34, 2 May 2023 (UTC)
My original aim was to remove unreliable concepts from Module:scripts/data, such as a script being intrinsically consistently used or consistently not used as scriptio continua. For example, there was a tradition of writing Gurmukhi script as scriptio continua up until at least the 1970s, though there was also a long-standing alternative tradition. --RichardW57m (talk) 13:34, 2 May 2023 (UTC)
@Theknightwho: What's a 'script without spaces'? Does it include the Roman script? It is rather older than word-separating spaces are in Europe. --RichardW57m (talk) 14:00, 2 May 2023 (UTC)
I think two tasks have been confounded:
  1. Categorising multiword terms as such.
  2. Splitting multiword terms up without external input. For Thai, this is worse than automatic transcription - Thais' decisions are generally not consistent. --RichardW57m (talk) 14:48, 2 May 2023 (UTC)

I think the multiword terms category is useless for Thai, Lao, Shan, Khmer and similar languages, because every compound word could be considered a multiword term; these languages just don't use spaces. Even when a term has a space, it does not mean each part is a valid entry. However, it is not always the case that one syllable is one word.

Pali and Sanskrit deserve to have multiword terms because they use lots of spaces (they are Indo-European) in whatever script they are written. So how spaces are used is not an issue of the script; it's an issue of the language. --Octahedron80 (talk) 15:33, 2 May 2023 (UTC)

Dittoing @Octahedron80.
@RichardW57m: Originally many Thai entries contained ]][[ to separate words, but that was done mostly for etymological reasons. ]][[ may separate true multiword components or just word components. I have abandoned the practice of using square brackets to separate components in Thai or Khmer entries, since it wasn't approved by the community. I wouldn't rely on that delimiter, or even on spaces in entries, to categorise multiword terms. Unfortunately (from a technical point of view), it's not possible to determine word boundaries, or even syllable boundaries, automatically with 100% certainty. That's why most words need to be respelled or read from entries (with or without respellings). Anatoli T. (обсудить/вклад) 23:57, 2 May 2023 (UTC)
How, prithee, does respelling help determine word boundaries?
It is the editors' responsibility to show invisible word boundaries using the wiki markup [[..]]. If a lot of entries are mismarked, editors will just have to correct the markup. A lot of Latin phrases in English are not marked up correctly. Of course, the investigation is difficult in languages where ZWSP is little-used, and Lao, with line-breaking on syllable boundaries, may be very difficult. Traditionally, Lao, at least in the Tai Tham script, also broke lines after preposed vowels. On the other hand, one does occasionally encounter Thai with word boundaries consistently marked up. --RichardW57m (talk) 09:00, 3 May 2023 (UTC)
Can the compounds สัตวแพทย์ (sàt-dtà-wá-pɛ̂ɛt, vet), ธนบัตร (tá-ná-bàt, bank note) and ผลไม้ (pǒn-lá-máai, fruit) really be considered as multiword terms? How acceptable is breaking them between lines without inserting a hyphen? --RichardW57m (talk) 09:35, 3 May 2023 (UTC)
Word-separating spaces were still getting established in Indian languages in the 20th century. It's probably one reason that @Octahedron80 and @Theknightwho appear to disagree on whether Ahom (in the Ahom script) uses spaces, and why I have recently said that even language and script are not enough for looking up the use of scriptio continua. --RichardW57m (talk) 09:35, 3 May 2023 (UTC)
The only cases I know of where a space in a Thai term does not mark a multiword term are spaces before the repetition mark mai yamok. For example, คารมเป็นต่อ รูปหล่อเป็นรอง (kaa-rom-bpen-dtɔ̀ɔ rûup-lɔ̀ɔ-bpen-rɔɔng) is currently marked up as four words. However, though the space reveals it as a multiword term, it is insufficient data for automatic decomposition. Using such words as an argument is a case of the confusion I was talking of above. --RichardW57m (talk) 09:35, 3 May 2023 (UTC)
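The detection heuristic floated in this thread (count multiple links such as ']][[' in the headword, treat a space as a word break, but ignore the conventional space before mai yamok) can be sketched as below. This is a hypothetical Python illustration, not Module:headword's actual Lua logic, and the function names are invented. As Atitarev notes above, ']][[' in older Thai entries sometimes separates word components rather than words, so the link test alone would over-count.

```python
import re

# U+0E46 THAI CHARACTER MAIYAMOK, the repetition mark discussed above.
MAI_YAMOK = "\u0e46"

def link_count(headword: str) -> int:
    """Count [[...]] wikilinks in a headword's wikitext."""
    return len(re.findall(r"\[\[[^\[\]]+\]\]", headword))

def looks_multiword(headword: str) -> bool:
    """Hypothetical heuristic: a space that is not just the one before
    mai yamok, or more than one link, marks a multiword term."""
    # Reduce links to their display text ([[a|b]] -> b, [[a]] -> a).
    plain = re.sub(r"\[\[(?:[^\[\]|]*\|)?([^\[\]]*)\]\]", r"\1", headword)
    # A space before mai yamok does not separate words, so drop it first.
    plain = plain.replace(" " + MAI_YAMOK, MAI_YAMOK)
    return " " in plain.strip() or link_count(headword) > 1
```

With this sketch, ไข่ดาว on its own is not flagged, while คารมเป็นต่อ รูปหล่อเป็นรอง is flagged by its space; a repeated word written with a space before ๆ is not.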

A relevant example from English is a priori, which is, correctly in my opinion, categorised as a multiword term but whose components are not linked. (I do think, though, that there are English morphemes there, but that is another topic.) --RichardW57m (talk) 09:53, 3 May 2023 (UTC)

Out of the 15 comments in this section (not counting this one), you have made 11 of them. Please stop barraging others, because it means I no longer want to respond at all. Judging by the lacklustre response from others, I suspect they feel similarly. Theknightwho (talk) 01:26, 4 May 2023 (UTC)

Unable to create FWOTD

After I typed the following code into the FWOTD page and clicked "Publish page", I received a message titled "WOTD Protection", which stopped me from creating the page. What should I do to get around this? Can I create an FWOTD directly if I'm not an admin?

{{FWOTD|yue|阿媽都唔認得||aa3 maa1 dou1 m4 jing6 dak1|{{lb|zh|Cantonese|figuratively|humorous|with {{m|zh|到|tr=-}}}} [[awfully]]; [[extremely]]; [[terribly]]|pos=phrase|comment=This phrase could be literally translated to "even one's mother could not recognize [them]".}} Beefwiki (talk) 15:35, 2 May 2023 (UTC)

False English Multiword Terms

There is a clutch of English lemmas that are being miscategorised as multiword terms because they contain digraph-breaking hyphens, e.g. re-entry and co-operate. If the headword template were {{head}}, one could fix this by adding |nomultiwordcat= to the call of the headword template. Unfortunately, these words need Module:en-headword, which is protected against correction, so could someone please add the parameter to the module and let us know when the lemmas can be fixed? --RichardW57m (talk) 14:39, 3 May 2023 (UTC)

@RichardW57 This is implemented for English. Let me know if there are other languages needing similar treatment. Benwing2 (talk) 09:42, 17 May 2023 (UTC)

multitrans formatting question

In [[plate]], when {{multitrans|data= is placed inside the first translation table, and its closing brackets are inside the second table (so it encompasses the second table's trans-top but not the first table's, and the first table's trans-bottom but not the second table's), is this the intended format? (It seems ... unintuitive. Unaesthetic. But as long as it works, I suppose it's fine.)

=====Translations=====
{{trans-top|silver coin}}{{multitrans|data=
* Finnish: {{tt+|fi|hopearaha}}
{{trans-bottom}}

{{trans-top|heraldic charge: roundel of silver}}
* Swedish: {{tt|sv|[[bysantin]] [[av]] [[silver]]|c}}, {{tt|sv|[[rundel]] [[av]] [[silver]]|c}}
}}
{{trans-bottom}}

- -sche (discuss) 20:37, 3 May 2023 (UTC)

(Is it so there's a balanced pair of trans-top and trans-bottom outside the multitrans? Aha.) - -sche (discuss) 20:54, 3 May 2023 (UTC)
@-sche I typically place the call to {{multitrans}} after the first {{trans-top}} so it doesn't interfere with the translation adder, which seems to have difficulties if you place the call outside the {{trans-top}}. This all happens to work because the result of expanding {{trans-top}} and {{trans-bottom}} is HTML code, which is passed through unchanged by {{multitrans}} (it just looks for certain special Unicode chars that are output by {{tt}} and {{tt+}}, and processes them into translation entries). Ideally we should fix the translation adder so it isn't confused by {{multitrans}} calls. Benwing2 (talk) 04:09, 4 May 2023 (UTC)
@-sche @Benwing2 This was previously the case, but @Erutuon fixed it so that it's no longer an issue. It would probably be a good idea to run a small bot job to correct any of these so that multitrans starts above the first {{trans-top}} and ends after the last {{trans-bottom}}. Theknightwho (talk) 11:53, 4 May 2023 (UTC)

CFI requested edit

In March I posted a thread at Wiktionary:Grease_pit/2023/March#cat's_pyjamas_and_CFI. Since it's no longer visible from the main Grease Pit page, and because it attracted little attention, I made the edit myself and the redirects are now flipped around. I am still requesting an edit for the CFI page, however. Even the person who wanted to keep the pages the way they had been agreed with me that bomb vs the bomb is a good example to use on the CFI page in this section: Wiktionary:Criteria_for_inclusion#Articles. Could someone please make that change? Thank you, Soap 10:00, 4 May 2023 (UTC)

Damn, we need a much better process for requesting minor edits to protected pages...   Done this one. This, that and the other (talk) 11:23, 4 May 2023 (UTC)

Government of prepositions, verbs, nouns, etc.

We need a stable way of handling the government/syntax of words. The templates {{+preo}} and {{+obj}} exist, but they are sometimes rather cluttered, and having two different ones is frustrating.

One solution would be to have {{government}} and {{gov}}, put them in usage notes and pair them with something like {{sense}} or {{senseid}}/{{senseno}}, and alongside that maybe have {{govi}} for inline government. This template could take multiple prepositions and objects if they are equivalent in meaning (see mówić#Old Polish def 1). I am open to other ideas, but I'd like to set up something more uniform: having it sometimes in the label, sometimes in the headword, and sometimes in these templates is frustrating. Vininn126 (talk) 08:47, 5 May 2023 (UTC)

Agree that there should be something more uniform.
I think it's best not to put them in usage notes since that disrupts the reader's viewing experience, especially when the definition list gets longer. A better idea would be either (1) to treat {{gov}} similarly to {{co}} in the inline position, noting that {{afex}} etc. serves a similar purpose (or under a -nyms header if the inline stuff gets too crowded), or (2) to put them after the definition with spacing and maybe some background colour (to distinguish the actual definition from {{gov}}), similar to how {{zh-mw}} does it. – Wpi31 (talk) 10:37, 5 May 2023 (UTC)
I think some people might complain about crowding, so two options might be better. How would we feel about another header? Vininn126 (talk) 10:41, 5 May 2023 (UTC)
What English reference works use a dictionary-type entry to convey this kind of information? Other languages?
Could we have sample entries? DCDuring (talk) 11:31, 5 May 2023 (UTC)
WSJP, a Polish monolingual dictionary, gives, where appropriate, a section under the definitions headed "Składnia" (syntax, government). SXVII, a monolingual Middle Polish dictionary, has a section called rekcja, meaning more or less the same thing. I do not look at English dictionaries as often, but you see this a lot in at least the Slavic languages, plus things like Latin, Ancient Greek, or really any languages with... nouns, verbs, or adjectives. Currently we often use a note before the definition (with "to"), one of the templates in the OP, or sometimes the headword ( + dative). Vininn126 (talk) 11:35, 5 May 2023 (UTC)
I think this is a good idea, but I'd prefer that we do it as a label. Standardising how those labels appear would be good practice, though.
Just as an aside, I think inline nyms are too hidden to be useful at the moment (unlike quotations, where it's the only real option), so I'd really rather not add to that problem. I don't really think there's an issue in stuff being a little bit more spread out sometimes. Theknightwho (talk) 11:49, 5 May 2023 (UTC)
So basically how preo and obj work, but merged into one template? Would you place it left or right of the definition? Some people find it crowds the definition line. Vininn126 (talk) 11:50, 5 May 2023 (UTC)
I'd say to the left. It can be annoying when there are a large number of labels, but that's usually because it feels like patronising overkill: e.g. (derogatory, offensive, vulgar, nonstandard). On the other hand, a label like (nonstandard, used with "by" or "with") is just as long, but doesn't irk me anywhere near as much. Not saying it has to look exactly like that, but I'm just illustrating the point. Theknightwho (talk) 11:55, 5 May 2023 (UTC)
I disagree with the labels approach, because (1) that is too wordy, as an example: {{lb|en|used with "foo" to form a plural dative}}: (used with "foo" to form a plural dative), compared with the current display in {{+preo|en|foo|p|dat}}: [+ foo (plural dative)]; (2) labels are already sometimes very long in some cases that aren't really overkill; and (3) it would complicate things for the {{lb}} code since you'd also have to deal with abbreviations like the ones in the example, and probably there'll be the need for some really convoluted syntax to allow this to happen. – Wpi31 (talk) 13:38, 5 May 2023 (UTC)
I agree, and I think having it be separate would also be easier to manage in terms of changing settings in the future or for scrapers. Vininn126 (talk) 13:49, 5 May 2023 (UTC)
PS: I suppose we could make something like {{indtr}} happen, but that already feels complicated when it only has a limited use case, i.e. intransitive verbs in Portuguese. – Wpi31 (talk) 14:32, 5 May 2023 (UTC)
That makes sense. So long as there's one way of doing it that we all agree on, any decisions on how we want to lay things out will be reasonably straightforward to implement. Theknightwho (talk) 14:42, 5 May 2023 (UTC)
This is interesting, adding transitivity there. I agree that it might be too specific. Vininn126 (talk) 14:46, 5 May 2023 (UTC)
On further thought I have reservations. I think a template combining valency and government could be useful sometimes, but government can get really long. I think there needs to be a separate way to handle long lists of cases and prepositions + case as well. We should also keep in mind that we can add government to nouns, adjectives, and prepositions too, which don't have valency!
I bring this back to my initial proposal, which is to have a new template and merge the old ones into this new template, which can be set inline. Vininn126 (talk) 07:49, 7 May 2023 (UTC)
@Vininn126 See User:Benwing2/test-obj, an attempt at implementing a better version of {{+obj}} to support government, prepositions, etc. I have some concerns about the look which is why I haven't pushed this live. User:Surjection gave me some comments in another thread a month or so ago on this topic. Any comments you have about how it should appear are welcome. Benwing2 (talk) 22:01, 16 May 2023 (UTC)
@Benwing2 This is definitely much better imo. Is it able to handle conjunctions and/or tenses and the like? I'm thinking of something like чтобы which governs either past tense or the infinitive, and some verbs govern чтобы or contrastively the infinitive. Vininn126 (talk) 22:06, 16 May 2023 (UTC)
@Vininn126 Yes, sometimes the formatting gets a bit cluttered if the governance is really complicated but this is definitely possible; e.g. there's no reason the "preposition" has to be an actual preposition as opposed to a conjunction. There are some examples among the page I linked that do this. Benwing2 (talk) 23:02, 16 May 2023 (UTC)
@Benwing2 The clutter is also something many people have mentioned - do you have any thoughts on whether the template, when used inline, should generate a button that can collapse or expand it, for example? Vininn126 (talk) 23:04, 16 May 2023 (UTC)
@Vininn126 Most of the time the governance is pretty simple so I wouldn't advocate always collapsing it. Maybe it can be collapsed optionally through a flag or something, although in that case it might make sense to put the collapsed stuff on its own line. Benwing2 (talk) 23:23, 16 May 2023 (UTC)
@Benwing2 Yes, I agree! Perhaps after a certain number it should be collapsed. Vininn126 (talk) 10:07, 17 May 2023 (UTC)
@Benwing2 Does this template take things like {{{q}}} for obsolete government? Vininn126 (talk) 22:04, 22 May 2023 (UTC)

@Vininn126 Can you give me an example of what you're looking for? Benwing2 (talk) 22:41, 22 May 2023 (UTC)

odpowiedni#Polish definition one. Vininn126 (talk) 22:41, 22 May 2023 (UTC)
@Benwing2 Were you able to make that change? Does anyone else see anything missing? @Surjection @Anarhistička Maca @Wpi31 @Theknightwho Vininn126 (talk) 10:54, 1 June 2023 (UTC)
@Benwing2 Is it possible to make the template print in superscript to offset the government from the sense? Something like this:
  1. (transitive) to get, to cause (someone to do something) [+ zum (nominalized verb); or + dazu + zu (infinitive); or + dazu + dass (clause); or + (demonstrative) dazu]
Anarhistička Maca (talk) 00:50, 2 June 2023 (UTC)

Trashing the HTML Page Cache

My understanding, largely derived from Wikipedia usage, is that Wikimedia servers keep a cache of page derivations. These derivations get invalidated whenever a page used in making the HTML page gets changed. Apparently cached pages are not served to logged-in users, which seems to be at odds with the Wikimedia policy of encouraging visitors to set up accounts. Should we attempt to reduce the amount of HTML page regeneration by bundling up minor, non-bug-fixing changes to widely-used templates and modules, or is it not worth the effort? I'm asking because I'm slowly working on some enhancements in ease of use that will invalidate the pages for almost all Pali nouns, verbs and adjectives; possibly this is too small a part of Wiktionary to matter. --RichardW57m (talk) 10:31, 5 May 2023 (UTC)

What is the incentive for doing this? Theknightwho (talk) 11:19, 5 May 2023 (UTC)
What do you mean by 'this'? When I said 'invalidate the pages', I meant 'invalidate the cached HTML pages'; no source pages will be invalidated by the changes. Within 2 minutes of replying, I will have clarified part of what I wrote, which might remove the reason for your question. --RichardW57m (talk) 12:30, 5 May 2023 (UTC)
I’ll rephrase: what advantage is there to taking this into account? This seems to fall firmly under WP:DWAP. It’s one thing to optimise template loading times, but I don’t see why we as editors should be concerned about the servers recaching pages, as I can’t see how it would have any measurable impact on user experience. Theknightwho (talk) 12:52, 5 May 2023 (UTC)
Response time depends on how long it takes for a page to be updated. On Wikipedia, it seems to take days for some pages to be 'speculatively' updated, or at least for categories to be updated; during the wait, it will take longer for the unregenerated pages to be delivered to users, because the HTML has to be regenerated. A lot will depend on the scheduling scheme; I can certainly imagine a request having to wait for a speculative HTML job to complete. I don't know how Wikimedia servers are allocated between projects; the effects will be much smaller if resources are shared centrally rather than allocated to projects. --RichardW57 (talk) 06:48, 6 May 2023 (UTC)
It looks as though the answer is to bunch edits as most convenient until chidden. --RichardW57 (talk) 12:40, 8 May 2023 (UTC)
The Wikimedia servers are shared, and you are the only person who seems to have any concerns about this. You’re welcome to change your editing patterns, but I oppose any implication that others should do the same, as it’s a completely unnecessary thing for any of our editors to have to take into account. Theknightwho (talk) 12:43, 8 May 2023 (UTC)

sms:a and forms

I found sms:ar in Special:DeadendPages because the linking template in the entry tries to treat it as a link to an sms message rather than to a Wiktionary page. I tried substituting &colon; and wrapping it in <nowiki></nowiki>. The latter changes the display, but the link stays the same. The lemma entry and its forms (see [1]) all have the same problem. Of course, it could all be mocked up with plain text and wikilinks, but that defeats the purpose of having templates. Chuck Entz (talk) 02:03, 6 May 2023 (UTC)

{{l|sv|:sms:a}} works: sms:a – Wpi31 (talk) 07:39, 7 May 2023 (UTC)
@Chuck Entz, Wpi31 These now work. Theknightwho (talk) 17:34, 27 May 2023 (UTC)

{{l}} isn't handling links to :3 — even when I write e.g. {{l|mul|Unsupported_titles/:3|&#58;3}}, which links to the right page, it erroneously strips the colon from the displayed form leaving just "3" (see: :3, :3). I fixed it by using an untemplated link, but possibly there is a better solution. Same issue on owo btw. (Possibly this is related to the preceding section - was there a change to how colons are handled?) - -sche (discuss) 17:02, 8 May 2023 (UTC)

@-sche You can use a backslash to escape the colon: {{l|mul|\:3}} gives :3. You don't need to manually enter the unsupported title link, either - that's all handled automatically.
The colon isn't being erroneously stripped, by the way - that's a standard feature of links, which you'll get if you put [[:3]] as well, as it's how the colon trick works (e.g. [[Category:X]] adds a page to category X, while [[:Category:X]] is a link to the category instead). It's also useful for files, interwikis etc.
The link templates are designed to exhibit the same behaviour for the sake of consistency, though they go slightly further, as the initial colon also stops diacritics being stripped. This is occasionally necessary: e.g. {{l|la|:&̄}} links to Latin &̄. This is an extension of the logic behind the colon trick. Theknightwho (talk) 17:15, 8 May 2023 (UTC)
Edit - hang on, you're right. The colon is being erroneously stripped, as it's being entered as an HTML entity. That'll need to be fixed, though it's essentially obsolete due to the simpler method I've outlined. Theknightwho (talk) 17:18, 8 May 2023 (UTC)
Ah, thanks for pointing me to the escape method. I did know about using : at the start of a link to make it a visible link to a category, interwiki, etc; when I said erroneously stripped from the displayed form, I was referring to the stripping of it from parameter 3 (rather than parameter 2); for example, if I type {{l|mul|foo|:bar}}, it displays :bar ("bar") instead of ":bar". Is there a reason it's stripped there, instead of being displayed as entered? - -sche (discuss) 19:58, 8 May 2023 (UTC)
@Theknightwho Is the removal of the colon from the displayed form (parameter 3, not just the link / parameter 2: i.e. {{l|mul|foo|:bar}} displaying "bar" instead of ":bar") intended behaviour? Testing just now, manually writing things like [[Category:English nouns|:Category:English nouns]] (without invoking any template or module that could strip the colon) works as expected i.e. identically to writing [[Category:English nouns]], so it doesn't seem like the colon needs to be suppressed from the display form in links out of any concern for breaking the "colon trick". - -sche (discuss) 14:29, 12 May 2023 (UTC)
@-sche No - that isn't right. It's because the link template always splits links into target/display text, even if they're the same. I'll have a look. Theknightwho (talk) 20:14, 12 May 2023 (UTC)
@-sche I've fixed this. The function for generating the display text already had a parameter to prevent the removal of interwiki prefixes (which is being used by the headword template), so it was just a case of using it in the instances when the display text is explicitly given in links as well. Theknightwho (talk) 21:16, 12 May 2023 (UTC)
Thanks! - -sche (discuss) 00:17, 13 May 2023 (UTC)
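The behaviour settled on in this thread (a leading colon is the colon trick and is dropped from the target, a backslash turns it into a literal colon, and display text given explicitly is shown exactly as entered) can be summarised in a small sketch. This is hypothetical Python, not Module:links code: `parse_link_term` and `Link` are invented names, and the sketch does not model the automatic mapping to unsupported titles or the diacritic-stripping rules mentioned above.

```python
from typing import NamedTuple, Optional

class Link(NamedTuple):
    target: str        # page title to link to
    shown: str         # text displayed to the reader
    colon_trick: bool  # whether the term began with an unescaped ':'

def parse_link_term(term: str, display: Optional[str] = None) -> Link:
    """Hypothetical split of a link-template term into target and display."""
    if term.startswith("\\:"):
        # Backslash escape: the colon belongs to the title itself (e.g. ":3").
        target, trick = term[1:], False
    elif term.startswith(":"):
        # Colon trick: drop the colon from the title but remember it, so that
        # prefixes such as "Category:" or "sms:" are not treated specially.
        target, trick = term[1:], True
    else:
        target, trick = term, False
    # Explicit display text is shown as entered, leading colon included;
    # only the implicit display (copied from the term) loses the colon.
    shown = display if display is not None else target
    return Link(target, shown, trick)
```

Under this sketch, `{{l|mul|foo|:bar}}` corresponds to `parse_link_term("foo", ":bar")`, whose display text keeps the colon, which is the behaviour -sche asked for.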

Wikidata item in {{named-after}}

It would be useful if {{named-after}} could take a Wikidata item as the second parameter, in a manner similar to {{coinage}}. Does this seem feasible? Thanks, Einstein2 (talk) 14:54, 9 May 2023 (UTC)

Wiktionary:Requests for verification/Non-English includes a list of "Requests for verification by language" categories. The total list is too long to display. On my screen it is cut off after Category:Requests for verification in Old Saxon entries. Most of these categories are empty. Some of them refer to languages that belong in other RFV subpages. Can Module:request category page list be changed to make the list more relevant? Vox Sciurorum (talk) 18:22, 9 May 2023 (UTC)

It's difficult to do anything about the category tree, but it would be easy enough to delete old empty categories from the list. This, that and the other (talk) 08:34, 11 May 2023 (UTC)
@Benwing2, can you have your category bot remove the empty children of Category:Requests for verification by language? Vox Sciurorum (talk) 13:07, 17 May 2023 (UTC)

Syllabification/hyphenation for Korean romanization

Hello, can romanized Korean words be written with syllabification/hyphenation, even though the entries use Hangul and Hanja? Yuliadhi (talk) 23:38, 10 May 2023 (UTC)

@Yuliadhi Did you ask this question before? I remember someone (maybe you) asking a similar question before. If so can you clarify what you mean and how it differs from the previous question? In any case it might help to give examples of what you're looking for. Benwing2 (talk) 23:25, 16 May 2023 (UTC)
@Yuliadhi, @Benwing2: If it makes sense and is required, it is achievable by analysing the hangeul and romaja (the RR transliteration in Roman letters). Korean hanja (Sino-Korean characters) should not be hyphenated, but they are one syllable per character, the same as hangeul (undecomposed); I'm not aware of any exceptions, apart from abbreviations and Internet memes.
In 한국(韓國) (Han'guk), the 1st syllable is 한, 韓 or "Han", and the 2nd syllable is 국, 國 or "guk". The ' (apostrophe) is not part of any syllable, just a separator to avoid misreading. So the hyphenated form is "Han-guk", IMO. The phonetic changes shouldn't affect the syllabification, IMO. E.g. 합니다 (hamnida) is still "ham-ni-da", not "hap-ni-da" or "hab-ni-da", based on the RR standard. Anatoli T. (обсудить/вклад) 00:49, 17 May 2023 (UTC)
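Because each precomposed Hangul block or hanja is one written syllable, a proposed hyphenated romanization can at least be checked against the number of segments it should have. The sketch below is hypothetical Python, not existing Wiktionary code. It relies on the standard Unicode layout of precomposed Hangul (U+AC00 to U+D7A3, 19 initials × 21 vowels × 28 finals), assumes the basic CJK Unified Ideographs block is enough to cover hanja, and does not produce RR romanization or handle the abbreviation and meme exceptions noted above.

```python
# Precomposed Hangul syllables: U+AC00..U+D7A3 (19 initials x 21 vowels x 28 finals).
HANGUL_FIRST, HANGUL_LAST = 0xAC00, 0xD7A3
# Assumption: the basic CJK Unified Ideographs block covers the hanja we meet.
CJK_FIRST, CJK_LAST = 0x4E00, 0x9FFF

def is_syllable(ch: str) -> bool:
    cp = ord(ch)
    return HANGUL_FIRST <= cp <= HANGUL_LAST or CJK_FIRST <= cp <= CJK_LAST

def syllables(word: str) -> list[str]:
    """One syllable per Hangul block or hanja; other characters are skipped."""
    return [ch for ch in word if is_syllable(ch)]

def decompose(block: str) -> tuple[int, int, int]:
    """Return (initial, vowel, final) jamo indices of a Hangul syllable block."""
    s = ord(block) - HANGUL_FIRST
    if not 0 <= s <= HANGUL_LAST - HANGUL_FIRST:
        raise ValueError(f"not a precomposed Hangul syllable: {block!r}")
    # 588 = 21 vowels * 28 finals per initial consonant.
    return s // 588, (s % 588) // 28, s % 28

def hyphenation_matches(word: str, hyphenated: str) -> bool:
    """Check that a hyphenated romanization has one segment per syllable."""
    return len(syllables(word)) == len(hyphenated.split("-"))
```

For example, `decompose("합")` gives final index 17 (ㅂ) even though RR writes that syllable as "ham" before 니, which fits the point above that sound changes move letters but not syllable boundaries.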

Template:ja-usex - make inline possible

Hopefully it is not too complex to allow this template to produce inline usage examples. Even a single word goes over multiple lines, which is ridiculous. Anatoli T. (обсудить/вклад) 06:20, 11 May 2023 (UTC)

  Oppose @Atitarev: Should we even have such short examples? Please give an example of where a one-word usage example makes sense. --RichardW57m (talk) 09:47, 11 May 2023 (UTC)
For example at 車#Affix, though arguably these should use {{ja-co}} (collocations) instead. – Wpi31 (talk) 09:59, 11 May 2023 (UTC)
@Wpi31: That template is missing. People use {{ja-usex}}, since it has the ruby (furigana) and multi-word transliteration functionality, which can't be currently achieved with {{ux}} or {{uxi}}. Anatoli T. (обсудить/вклад) 10:06, 11 May 2023 (UTC)
Yes, I'm aware of the fact that the template is missing, it should be similar to {{zh-co}} which is based on {{zh-x}} but categorises as collocations rather than examples. – Wpi31 (talk) 10:45, 11 May 2023 (UTC)
Yes, we should. There could be endless examples, as with any other language where {{uxi}} is used; also in usage notes, discussions etc. This example doesn't need to be on multiple lines if it's used in notes: ちょっと待(ま)ってください
chotto matte kudasai
please wait
The usage notes at パーセント (pāsento) spread over multiple lines for no real reason, just because the template doesn't allow otherwise.
You can object away; this functionality is essential and is available for most languages. Anatoli T. (обсудить/вклад) 09:59, 11 May 2023 (UTC)
  Support - the objection that all usage examples must be long enough not to need this is absurd. Theknightwho (talk) 11:13, 11 May 2023 (UTC)
@Atitarev: Is t:ja-usex-inline what you are looking for? -- Huhu9001 (talk) 15:48, 13 May 2023 (UTC)
@Huhu9001: That's exactly right! Somehow I missed that old template. Anatoli T. (обсудить/вклад) 22:56, 13 May 2023 (UTC)

Template:th-usex and Template:km-usex - make inline possible

Similar to the above, please allow Template:th-usex and Template:km-usex to produce inline usage examples. I didn't combine this with the Japanese request, since the nature and functionality of these templates are different. Anatoli T. (обсудить/вклад) 06:24, 11 May 2023 (UTC)

  Oppose @Atitarev: As for {{ja-usex}} above. --RichardW57m (talk) 09:47, 11 May 2023 (UTC)
  Support. It makes no sense to exclude this functionality. Theknightwho (talk) 11:13, 11 May 2023 (UTC)
  Support per User:Theknightwho. Benwing2 (talk) 22:10, 16 May 2023 (UTC)

Usage example not in italics

@Theknightwho: See ·: {{ux|mul|(1,2,5) '''·''' (3,4,−1) {{=}} 6}}. All other usage examples are in italics. J3133 (talk) 07:47, 11 May 2023 (UTC)

This seems to have been fixed. J3133 (talk) 03:58, 13 June 2023 (UTC)

I learned of this page, or was reminded of it, by Chuck's comment above, and perhaps he has already pointed this out somewhere, but if not: a lot of entries here could be improved (and removed from the list) by a bot going through them and whenever a definition consists of a single lowercase unlinked word, wikilinking it thus. - -sche (discuss) 00:25, 13 May 2023 (UTC)
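A minimal sketch of the bot pass suggested above, in Python for illustration only. It assumes definitions are wikitext lines beginning with "#" and reads "a single lowercase unlinked word" as one alphabetic, all-lowercase token with nothing else on the line. Subsenses (##), usage-example lines (#:) and anything containing punctuation or brackets are left alone. The function name is hypothetical, and fetching and saving pages is not shown.

```python
import re

# A definition line: "#", optional spaces, then a single non-space token.
DEF_LINE = re.compile(r"^(#\s*)(\S+)\s*$")


def link_bare_definitions(wikitext: str) -> str:
    """Wikilink definitions that consist of one lowercase, unlinked word."""
    def fix(line: str) -> str:
        m = DEF_LINE.match(line)
        # isalpha() rejects [[links]], punctuation and digits; islower() rejects capitals.
        if m and m.group(2).isalpha() and m.group(2).islower():
            return f"{m.group(1)}[[{m.group(2)}]]"
        return line

    return "\n".join(fix(line) for line in wikitext.split("\n"))
```

For example, "# cat" becomes "# [[cat]]", while "# Cat", "# [[dog]]" and "#: a cat" are returned unchanged.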

Add the language codes nan-tw and hak-tw to templates

There are a number of terms in Taiwanese Mandarin that are derived from terms that exist only in Taiwanese Hokkien and not in Mainland Hokkien, e.g. 搓湯圓/搓汤圆 (cuō tāngyuán). Also, some words, especially those from Japanese, exist only in the Chinese dialects of Taiwan. Mahogany115 (talk) 07:41, 13 May 2023 (UTC)

Support in principle, assuming that these will be etymology-only codes.
Though I think this is a problem in the wider picture of Chinese, where one language code often represents an entire branch or group of mutually unintelligible lects, and some of the more prominent lects (e.g. Jianghuai Mandarin and, until recently, Sichuanese) don't even have an etymology-only code. I do have some ideas for such a plan, but it'll take some time to formulate a proposal. – Wpi31 (talk) 13:41, 13 May 2023 (UTC)

dates and the aWa archiver

The aWa archiver (used on RFV, RFD, etc.) used to show, beside the "archive" buttons next to section headers, how many days it had been since the last update to the discussion; this was useful because one could search for discussions that had been open for N days and close them. I haven't seen that for at least a week. Is it broken? Was it removed? (Was it too 'expensive'?) - -sche (discuss) 18:00, 13 May 2023 (UTC)

The archiver has been broken for me for a while. It seems like the "archive" links appear for a few of the sections at the top of the page, and then they just disappear. — Sgconlaw (talk) 19:47, 13 May 2023 (UTC)

Would it be possible for someone to add a parameter like |see=1 to the module responsible for {{desc}} that would generate the text "see there for further descendants", so that we could get rid of {{see desc}}? —Mahāgaja · talk 16:09, 14 May 2023 (UTC)

I'd support this. One drawback would be the inability to place a <ref> between the form and the "(see there for further descendants)" note, which could be solved by having a |refN= parameter; that would also make it possible to place a ref between one form and another, which is currently impossible in a single template call. Catonif (talk) 13:28, 15 May 2023 (UTC)
That sounds like a good idea too, although we only use {{see desc}} with existing entries, and if an entry exists, then the place to put references is in that entry's Descendants section. —Mahāgaja · talk 16:35, 15 May 2023 (UTC)
In some cases, a reference might be needed not for the existence of the descendant, but for its derivation from the ancestor.--Urszag (talk) 16:43, 15 May 2023 (UTC)
@Urszag: In that case, the reference could be given in the Etymology section of the entry. —Mahāgaja · talk 08:13, 16 May 2023 (UTC)
@Mahagaja How does {{see desc}} differ from {{desctree}}, which also avoids duplication of descendants? I mean, when should one be used vs. the other? Benwing2 (talk) 23:29, 16 May 2023 (UTC)
@Benwing2 {{see desc}} tells the user to go to the entry to see the remaining descendants, but doesn't reveal them on the page where it's used. {{desctree}} transcludes all the descendants from the other page onto the page where it's used. {{see desc}} is appropriate when there are a lot of descendants that would take up a lot of space, and {{desctree}} is appropriate when there aren't too many descendants. So for example, if you go to Proto-Indo-European *wer- (heed) and scroll down to Unsorted formations, you'll see a listing for Old Irish feraid followed by its Middle Irish and modern Irish descendants. That's generated by {{desctree}}, and is fine because there are only two lines being added after feraid, so it doesn't take up much space. Then below that you see a listing for Proto-Germanic *warjaną with the note (see there for further descendants) after it, which is generated by {{see desc}}. The descendants of *warjaną are not listed on the page for the Indo-European root because there are so many of them, and listing them all could make the page hard to navigate. So {{desctree}} and {{see desc}} are in complementary distribution: the former is used when you do want to list all the descendants on the page, and the latter when you don't. —Mahāgaja · talk 07:19, 17 May 2023 (UTC)

Some simplified Chinese forms are not added to lemmas

I am curious why the simplified form 可怜 (kělián) is currently not added to Chinese PoS categories and 可憐/可怜 (zh) (kělián) shows in orange (as if a Chinese entry doesn't exist). It works OK for others. Anatoli T. (обсудить/вклад) 00:17, 15 May 2023 (UTC)

For comparison, check the simplified form 定义 (dìngyì) (of 定義/定义 (zh) (dìngyì)). Both the display and the categorisation are fine there. (The orange links are visible only if you have the right settings in your preferences, if I'm not mistaken.)

A number-system-agnostic conversion template

Over the years, I've cleaned up lots and lots of module errors caused by people using Roman numerals in parameters that are supposed to take Arabic numerals and vice versa. This generally happens in quotation templates, where the templates for specific works plug values converted from their parameters into the parameters of the master quote template. Since the older works that the templates are designed for tend to use mostly Roman numerals for volumes, chapters, acts and scenes, that mostly means feeding the parameter through {{R2A}} and then using template code to decide what numbers to feed into the master template's parameters. The worst part of this is that the functions used by templates are designed to be robust and not fail on bad input, so it can be really hard to figure out where the module error is. It seems to me that this approach relies too much on template users reading the documentation and using the correct numbering system for a parameter.

Proposal

I would like to propose a smarter conversion template that will take either Roman numerals or Arabic numbers and always convert them to the desired numbering system. Since there's zero overlap in the characters used by the two number systems, this should be pretty easy to implement for anyone who knows Lua coding.

The easiest method would be to look at the characters in the input and use our existing {{R2A}} code for Roman numerals, our {{A2R}} code for Arabic numerals, or use error-handling logic otherwise.

We could have one template that outputs Arabic numerals and another that outputs Roman numerals, or we could have a single template that takes a parameter to specify which numbering system to output.

Design criteria

Since this is for quotation templates, I think it's safe to limit ourselves to positive (decimal) integers.

For the Arabic numeral version, we can boil it down to four possibilities:

  1. Input: Arabic numerals >0 (no decimals). Possible characters: "0123456789"
    1. Output: Arabic numerals>0
  2. Input: Roman numerals >0. Possible characters: "IiVvXxLlCcDdMm"
    1. Output: Arabic numerals>0
  3. Input: Nothing
    1. Output: Nothing
  4. Input: Anything else (negative numbers, decimals, partly or wholly non-numeric text)
    1. Output: A negative number (or zero?) to signal that there's an error
    2. either that or Output: Nothing if we don't want to bother with error-handling (this seems like the only option for Roman-numeral output)

Possible other parameters:

  1. Roman/Arabic switch for output
    1. Arabic-numeral output is by far the most common, so we could just have a parameter for Roman-numeral output, and default to Arabic otherwise
  2. Negative or Nothing switch for output when there's an error
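The criteria above translate fairly directly into code. Below is a minimal Python sketch, for illustration only: on Wiktionary this would be a Lua module behind a template, and the names used here (convert_numeral, roman, blank_on_error) are hypothetical. It returns strings, as a template would, and covers the four input cases plus both switches. One design choice to flag is that Roman input is accepted only in canonical form. The sketch checks this by converting the value back and comparing, so variants such as "IIII" are treated as errors. A more lenient version would drop that round-trip check.

```python
import re
from typing import Optional

# Value/symbol pairs for greedy Roman encoding, largest first.
ROMAN_PAIRS = [
    (1000, "M"), (900, "CM"), (500, "D"), (400, "CD"),
    (100, "C"), (90, "XC"), (50, "L"), (40, "XL"),
    (10, "X"), (9, "IX"), (5, "V"), (4, "IV"), (1, "I"),
]
ROMAN_DIGITS = {"I": 1, "V": 5, "X": 10, "L": 50, "C": 100, "D": 500, "M": 1000}


def to_roman(n: int) -> str:
    """Encode a positive integer as an upper-case Roman numeral."""
    out = []
    for value, symbol in ROMAN_PAIRS:
        count, n = divmod(n, value)
        out.append(symbol * count)
    return "".join(out)


def from_roman(s: str) -> Optional[int]:
    """Decode a Roman numeral (any case); None unless it is in canonical form."""
    s = s.upper()
    if not s or any(c not in ROMAN_DIGITS for c in s):
        return None
    total = 0
    for i, c in enumerate(s):
        value = ROMAN_DIGITS[c]
        # Subtractive notation: a symbol followed by a larger one counts negatively.
        if i + 1 < len(s) and ROMAN_DIGITS[s[i + 1]] > value:
            total -= value
        else:
            total += value
    # The round trip rejects malformed input such as "IIV" or "VX".
    return total if total > 0 and to_roman(total) == s else None


def convert_numeral(text: str, roman: bool = False, blank_on_error: bool = False) -> str:
    """Convert Arabic or Roman input to Arabic (default) or Roman output."""
    s = (text or "").strip()
    if not s:
        return ""  # case 3: nothing in, nothing out
    if re.fullmatch(r"[0-9]+", s):
        n = int(s)  # case 1: Arabic input
    else:
        n = from_roman(s)  # case 2: Roman input (None if not a valid numeral)
    if not n:
        # case 4: zero, negatives, decimals, other text
        return "" if (roman or blank_on_error) else "-1"
    return to_roman(n) if roman else str(n)
```

For example, convert_numeral("xiv") gives "14", convert_numeral("12", roman=True) gives "XII", and convert_numeral("4.5") gives "-1".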
I think the {{R2A}} extension should also allow 0; it sometimes occurs as a full field in a part numbering. --RichardW57m (talk) 08:57, 15 May 2023 (UTC)
@RichardW57 I am in favor of this and I think User:Chuck Entz has been asking for this for a long time. Benwing2 (talk) 22:05, 16 May 2023 (UTC)
I would agree cautiously on the assumption that all our current use cases can be catered for. — Sgconlaw (talk) 00:49, 17 May 2023 (UTC)

New categories or a bug? (...terms spelled with...)

Since three or four days ago, new categories (Macedonian terms spelled with [letter]) have appeared in many Macedonian entries, for instance in блудница, дарче and калабалак, but not in подароче. Gorec (talk) 16:05, 15 May 2023 (UTC)

@Горец: Fixed. — Fenakhay (حيطي · مساهماتي) 18:28, 15 May 2023 (UTC)
@Fenakhay, Горец: Categories appeared in орудие as well. There are a few more, but I forgot in which words they were. What could this problem be? Andrew012p (talk) 23:39, 15 May 2023 (UTC)
@Andrew012p: It is normal since it takes time to clear the cache. Give it a few days. — Fenakhay (حيطي · مساهماتي) 23:41, 15 May 2023 (UTC)
@Fenakhay: You are right, it's fixed now. Thanks. Andrew012p (talk) 23:42, 15 May 2023 (UTC)

Wonky alphabetization in category

Here's a snippet of the contents of CAT:en:Heraldry (archive):

  • impresa
  • in chief
  • in full chase
  • in full course
  • in piety
  • in trian aspect
  • in bend
  • inclave
  • increment
  • indented
  • inescutcheon
  • inescutcheoned
  • in fruit
  • in glory

At first, I thought the failure of the "in (x)" terms to be grouped together was due to a change in how spaces are handled, which I recall being mentioned somewhere recently; I dislike the dispersal of "in ..." terms among unspaced inclave, ingulphant, etc., but wasn't going to complain because I figured either approach has downsides (it's unhelpful to disperse "in ..." terms here, but in other cases it might make more sense to group subfoobar, sub-foobar and sub foobar together than to have them separated). But then I realized: something's wrong independent of that; why is it "in full chase, ... inclave, ... in fruit"? - -sche (discuss) 00:28, 16 May 2023 (UTC)

I look forward to seeing the answer on this one! --Geographyinitiative (talk) 00:55, 16 May 2023 (UTC)
I could get a similarly wonky sort by using &nbsp; as the space. I would have thought that all the possible different spaces would be folded together in sorting and/or filtered out of entry titles. DCDuring (talk) 01:16, 16 May 2023 (UTC)
@-sche It looks fixed now. I suspect this is because a change was made in the handling of spaces vis-a-vis the sort key, and it takes a while for such changes to propagate because they only happen when the pages in question are regenerated. Benwing2 (talk) 22:04, 16 May 2023 (UTC)
Fascinating: for me, and for the Wayback Machine(!), when I just now, upon seeing your comment, cleared my cache and reloaded the page, tried loading the category in a different browser, and got the Wayback Machine to re-archive it, it still had the order I listed above, with "impresa, in full chase, in full course, in bend, in chief, inclave, [...] inescutcheoned, in fruit". But when I null-edited in full chase and in fruit and then null-edited the category, they did show up in a consistent order. - -sche (discuss) 00:25, 17 May 2023 (UTC)
How many different code points for spaces appear in our headwords? DCDuring (talk) 12:31, 17 May 2023 (UTC)
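The effect DCDuring describes is easy to reproduce: a no-break space (U+00A0, i.e. &nbsp;) has a higher code point than any ASCII letter, so a title using it sorts after "inclave" instead of with the other "in ..." terms. Below is a Python sketch of the two ideas in this thread: counting which space code points occur in a list of titles, and folding all of them to U+0020 when building a sort key. It is an illustration only and says nothing about how MediaWiki's category collation actually works. The lowercasing step and the function names are assumptions.

```python
import unicodedata
from collections import Counter


def space_codepoints(titles):
    """Count each Unicode space separator (general category Zs) used in the titles."""
    counts = Counter()
    for title in titles:
        for ch in title:
            if unicodedata.category(ch) == "Zs":
                counts[f"U+{ord(ch):04X}"] += 1
    return counts


def sort_key(title: str) -> str:
    """Fold every Zs character to an ordinary space, then lowercase."""
    return "".join(" " if unicodedata.category(ch) == "Zs" else ch for ch in title).lower()


titles = ["inclave", "in\u00a0fruit", "in full chase", "in chief"]
print(sorted(titles))                # the NBSP title sorts after "inclave"
print(sorted(titles, key=sort_key))  # all "in ..." terms now sort together
print(space_codepoints(titles))
```

Run over a full list of headwords, space_codepoints would give one answer to DCDuring's question.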

{{desc}} |alt= functionality

Can we restore the functionality of {{desc}} so that {{desc|en||term}} produces "English: term" instead of "English: [Term?], term"? I don't see the value of the current output, and suppressing the link in {{desc}} is a common use. Additionally, if someone actually wants that output, they can use {{desc|en|3=term}}. @Erutuon, Surjection, Benwing2 -- {{victar|talk}} 21:01, 16 May 2023 (UTC)

@Victar The problem is this would be a special-case hack that wouldn't work well with all the other params. What would happen for example if someone does {{desc|ru||term|tr=TRANSLIT}}? Or should it be {{desc|ru||term|tr2=TRANSLIT}}? IMO it would add significant complexity to the code. Benwing2 (talk) 21:38, 16 May 2023 (UTC)
@Benwing2: Preferably {{desc|ar||term|tr=translit|ts=transcript}}, yes. I presume it would function like if terms[1] == "" and terms[2] == true then triggering separate parameter lists. It could also be done inside the template instead of the module, but I don't know which would be faster. --{{victar|talk}} 00:28, 17 May 2023 (UTC)
@Victar It could not be done in the template and it would be a big pain in the ass in the code to save all of 3 chars of typing. For example, what happens if there is {{desc|en||term|alt=ALT}}? That is currently allowed and does something reasonable; you'd have to account for this special case. Also what if someone writes {{desc|en||term|term2}} possibly with additional params? What then? In addition it prevents someone from easily adding another descendant later in the same call to {{desc}}. Overall, lots of pain, little gain. Benwing2 (talk) 05:03, 17 May 2023 (UTC)
@Benwing2: Something like if #(terms) == 2 and terms[1] == "" then alts[1] = terms[2] end? Or maybe as basal as if #(arg) == 3 and arg[2] == "" then return table.concat{initial_arrow, label, ":", arg[2]} end --{{victar|talk}} 06:25, 17 May 2023 (UTC)
These are huge nasty hacks that won't work and will cause a lot of issues. I am opposed to this change. Benwing2 (talk) 20:03, 17 May 2023 (UTC)
@Benwing2: What issues do you foresee them causing? What alternatives would you recommend? --{{victar|talk}} 20:19, 17 May 2023 (UTC)
See above, I enumerated some of the issues. Honestly I don't see why just using |alt= is that bad. The only feasible alternative is to use some special character at the beginning of a given term to indicate that it shouldn't be linked, although I don't know what character should be used. I have to ask though, why do you want to put the terms without links? Benwing2 (talk) 20:11, 18 May 2023 (UTC)
Going through the module, I was able to put together this more accurate code: if #terms == 0 and terms[1] == nil then alts[1] = terms[2] terms[2] = nil end. I'm not familiar enough with it to reset the index to one item; see User:Victar/Sandbox101.
The need for unlinked terms usually comes up when a reconstruction shouldn't have an entry, due to uncertainty, etc.
--{{victar|talk}} 00:22, 19 May 2023 (UTC)
You are completely ignoring all the concerns and reasons why such a simplistic approach won't work. Please trust me (as a professional software developer) when I say this is a nasty hack that will lead to a lot more problems than are worth correcting. Benwing2 (talk) 03:38, 19 May 2023 (UTC)

I had some more time to work on it. Here is what I have thus far:

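-- Flag any numbered per-term parameter such as tr2, alt2 or g10 (indices 2-99).
local nth_params = false
for k, _ in pairs(parent_args) do
	if type(k) == "string" and (k:match("%a[2-9]$") or k:match("%a%d%d$")) then
		nth_params = true
		break
	end
end

-- Only {{desc|LANG||term}} with no alt for the first term and no numbered
-- params is rewritten: the lone term becomes unlinked display text.
if terms.maxindex == 2 and terms[1] == nil and not alts[1] and not nth_params then
	alts[1] = terms[2]
	terms.maxindex = 1
end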

@Benwing2, what specific concerns of yours do you feel I am ignoring? Here are the examples you gave with their outcomes:

  • {{desc|ru||term|tr=TRANSLIT}} => Russian: term (TRANSLIT)
  • {{desc|ru||term|tr2=TRANSLIT}} => Russian: [Term?], term (TRANSLIT)
  • {{desc|en||term|alt=ALT}} => English: ALT, term
  • {{desc|en||term|term2}} => English: [Term?], term, term2

What other scenarios do you think need to be accounted for? --{{victar|talk}} 08:04, 22 May 2023 (UTC)

I haven't had a chance to think more carefully about this but it is a nasty hack that gives different meanings to numbered params depending on other numbered params, which shouldn't be done. I gave you an alternative suggestion of using a special char to indicate that a term shouldn't be linked, which is much cleaner; what is your concern with this solution? Benwing2 (talk) 05:13, 25 May 2023 (UTC)
@Benwing2: For all intents and purposes, it's no different than how |3=term in {{l|en||term}} moves to |2=. My reservation is introducing a brand new mechanism for a common functionality. --{{victar|talk}} 06:11, 25 May 2023 (UTC)
I just don't see the need for this, and it has the disadvantage of creating a confusing syntax. Theknightwho (talk) 22:21, 25 May 2023 (UTC)
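To make the behaviour of the Lua heuristic above easy to check, here is a rough Python port for illustration only. The real code is Lua in the module behind {{desc}**, and the function name and the way arguments are represented are assumptions. It also assumes |alt= is an alias of |alt1=. On the four calls listed above, it gives the same split between linked terms and display text as the stated outcomes. It does not address the objection that the meaning of |3= then depends on which other params are present.

```python
import re
from typing import Dict, List, Optional, Tuple, Union

# Per-term params with index 2-99, e.g. tr2, alt2, g10 (same idea as the Lua patterns).
NTH_PARAM = re.compile(r"[A-Za-z](?:[2-9]|\d\d)$")


def desc_terms(args: Dict[Union[int, str], str]) -> List[Tuple[Optional[str], Optional[str]]]:
    """Return (link target, alt text) pairs for a {{desc}} call under the proposed rule.

    args mimics the template's arguments: 1 is the language code, 2 and up are
    terms (an empty string means "no term"), and string keys are named options.
    """
    positions = [k for k in args if isinstance(k, int) and k >= 2]
    maxindex = max(positions) - 1 if positions else 0
    terms = [args.get(i + 2) or None for i in range(maxindex)]
    alts = [args.get(f"alt{i + 1}") or None for i in range(maxindex)]
    if alts:
        alts[0] = alts[0] or args.get("alt") or None  # assumed: alt= aliases alt1=
    nth_params = any(isinstance(k, str) and NTH_PARAM.search(k) for k in args)

    if maxindex == 2 and terms[0] is None and not alts[0] and not nth_params:
        # {{desc|LANG||term}}: the lone term becomes unlinked display text.
        return [(None, terms[1])]
    return list(zip(terms, alts))
```

For example, desc_terms({1: "ru", 2: "", 3: "term", "tr": "TRANSLIT"}) yields a single unlinked "term". With tr2 instead of tr, the empty first term is kept, which corresponds to the "[Term?], term" output.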
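For comparison, here is a minimal sketch of the alternative Benwing2 suggests: mark an individual term as unlinked with a leading special character. The thread did not settle on a character, so the "!" below is purely a placeholder, and the function names are hypothetical. Because each term is parsed on its own, numbered params keep their usual meaning, and another descendant can still be added later in the same call.

```python
from typing import List, Optional, Tuple

UNLINKED_MARKER = "!"  # placeholder only; no character was agreed in the discussion


def parse_term(raw: str) -> Tuple[Optional[str], str]:
    """Split one positional term into (link target, display text).

    A leading marker means "show the term but do not link it"; the marker is
    removed from the displayed text. Other terms link to themselves.
    """
    if raw.startswith(UNLINKED_MARKER):
        return None, raw[len(UNLINKED_MARKER):]
    return raw, raw


def parse_terms(raws: List[str]) -> List[Tuple[Optional[str], str]]:
    """Parse every term independently, so positions never change meaning."""
    return [parse_term(r) for r in raws]
```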

Links don't direct to sequence within brackets

I created articles for the following underlined (nasal) vowel letters of Choctaw.

a̱ A̱ i̱ I̱ o̱ O̱ 

However, any link to those letters, including to the alt case forms within the articles themselves, takes the reader to the articles on the plain vowels instead. I don't know if this is a bug or a feature for some other function, but something isn't working right. kwami (talk) 07:36, 17 May 2023 (UTC)

Module:languages/data/3/c is set to automatically strip the macron-below diacritic (as well as the dot-below and the acute accent) from links to Choctaw entries, suggesting that entry names should not actually include those diacritics. We do this when diacritics are used primarily in dictionaries and pedagogical materials but not in running texts intended for adult fluent readers (like acute accents in Russian or macrons in Latin). Is that true of the macron-below diacritic in Choctaw? Is it used only in dictionaries and pedagogical materials? If not, we should change the module's behavior and possibly move some Choctaw entries to new names. If so, and the diacritic-stripping is appropriate, then you can use bare links like [[a̱#Choctaw|a̱]] rather than {{l|cho|a̱}}. —Mahāgaja · talk 08:14, 17 May 2023 (UTC)
I really don't know. The texts I've seen have it, and it's for nasalization, which is written 'm' or 'n' before certain consonants and with an underscored vowel otherwise.
Linking directly doesn't work, though. The underscores still get stripped out. kwami (talk) 08:50, 17 May 2023 (UTC)
@Kwamikagami Put : at the start of the term to get a literal entry name. Theknightwho (talk) 13:00, 17 May 2023 (UTC)
@Kwamikagami: After looking around on the Internet a bit, I'm going to undo the diacritic stripping of the macron-below diacritic, but keep it for the dot-below and the acute accent as those don't seem to be used much at all. But I will have the module ignore the macron-below diacritic for alphabetization within categories, as "a̱ i̱ o̱" are alphabetized exactly as if they were "a i o" at The Choctaw Dictionary. Also, our lemmas seem to use ⟨v⟩ for /ə/; how do we feel about that? It looks like ⟨ʋ⟩ is actually the canonically correct letter and ⟨v⟩ is just used for typographical convenience. (The Choctaw Wikipedia at Incubator uses ⟨ʊ⟩, which feels doubly wrong to me, since it's neither the canonical letter nor being used with its IPA value.) —Mahāgaja · talk 20:46, 17 May 2023 (UTC)
I don't have much problem with ⟨v⟩. That's the convention for Cherokee, so it should be familiar to people. But yes, ideally we should follow accepted orthography. ⟨Ʋ⟩ is used in the Choctaw constitution.
Then we've got the lax vowel ⟨u⟩ and, apparently, ⟨u̱⟩. But I don't see an underlined lax ⟨ʋ⟩, so maybe ⟨u⟩ is sometimes for an [u]-like allophone of /o/. There's no separate letter for a lax ⟨i⟩.
We're using the traditional orthography, with ⟨hl⟩ varying with ⟨lh⟩. But AFAICT the acute accent is just Choctaw Bible orthography, so yes, I'd think we'd want to strip those out just as we would for Russian.
Those Incubator entries are odd. I'd prefer to either substitute for convenience or not, but not invent yet another orthography. kwami (talk) 22:14, 17 May 2023 (UTC)
For what it's worth, in the alphabet here, there's a V but no upsilon. And some of the article titles use V as well. kwami (talk) 22:50, 17 May 2023 (UTC)
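A rough sketch of the behaviour Mahāgaja proposes: keep the macron below in entry names, continue stripping the dot below and the acute accent, and ignore the macron below only when building category sort keys. This is Python for illustration only. The real settings live in Module:languages/data/3/c, and the function names here are hypothetical. The sketch assumes the underline is encoded as U+0331 COMBINING MACRON BELOW. Text typed with U+0332 COMBINING LOW LINE would need to be folded to it first.

```python
import unicodedata

MACRON_BELOW = "\u0331"
DOT_BELOW = "\u0323"
ACUTE = "\u0301"


def strip_marks(text: str, marks: set) -> str:
    """Remove the given combining marks, decomposing first so precomposed letters are covered."""
    decomposed = unicodedata.normalize("NFD", text)
    kept = "".join(ch for ch in decomposed if ch not in marks)
    return unicodedata.normalize("NFC", kept)


def entry_name(term: str) -> str:
    """Page name used for links: keeps the macron below that marks nasal vowels."""
    return strip_marks(term, {DOT_BELOW, ACUTE})


def sort_key(term: str) -> str:
    """Category sort key: a vowel with a macron below sorts like the plain vowel."""
    return strip_marks(term, {DOT_BELOW, ACUTE, MACRON_BELOW}).lower()
```

For example, a precomposed "á" links to "a", while "a" plus U+0331 keeps its underline in the entry name and sorts as "a".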

short form template

A short-form template would be useful for Polish adjectives, which often have a short form (cf. pewien, and the soon-to-exist Middle Polish cał, from cały). This is a common feature in many Slavic languages, so I think there'd be use in having a template that can be used for many languages and that also categorizes these forms into something like Category:Short adjective forms by language. The question is, what should we call this, or is there a better way to handle this? I know we already include Russian short forms, but they are uncategorized. @Benwing2, @PUC, thoughts? Vininn126 (talk) 20:16, 17 May 2023 (UTC)

@Vininn126 Not sure how short forms work in Polish, but in Russian and Czech they aren't lemmas (except in one or two exceptional cases), so at least in those languages they shouldn't be specified as lemmas or have their own declension table. I have been using {{infl of}} with the short inflection tag, e.g. {{infl of|cs|křepký||short|m|an|p}} would be used for křepci. Note that adjectives with short forms do get categorized in Category:Czech adjectives with short form (which probably should have plural 'forms' instead), and likewise Category:Russian adjectives with short forms (including subcategorization by short accent form, see Category:Russian adjectives by short accent pattern and e.g. Category:Russian adjectives with short accent pattern c'). We could make the short forms themselves get categorized by adding an entry to Module:form of/cats but typically we don't do that for non-lemma forms. Benwing2 (talk) 22:45, 17 May 2023 (UTC)
@Benwing2 Having them as just soft redirects would be best in 99% of cases; however, I would specifically like a category for the short forms. I am not sure a category of "words WITH short forms" would be better for Polish, to be honest. Vininn126 (talk) 22:48, 17 May 2023 (UTC)
Furthermore, they are much rarer in Polish, being more common in Middle Polish and nowadays used only with a few set adjectives. Vininn126 (talk) 22:50, 17 May 2023 (UTC)
@Vininn126 Polish is like Czech in this respect; only a few Czech adjectives have short forms (except for passive participles, all of which have short forms that are routinely used to form the passive). However, for the ones that do, they are inflected for number, gender and animacy, which is why (among other things) it didn't seem best to categorize the short forms themselves. Is this how Polish works or do they exist only in the masculine singular (like Ukrainian)? Benwing2 (talk) 22:58, 17 May 2023 (UTC)
Basically they only exist for the masculine singular, so if you think they should be categorized, I'd be fine with that. Wherever we end up listing them on the lemma, there needs to be a way to mark them as potentially only Middle Polish. Vininn126 (talk) 23:05, 17 May 2023 (UTC)
@Vininn126 I am going to clean up Polish short-form adjectives and make them non-lemma forms; just want to make sure that's OK. I'll also add them to the declension table of the corresponding long adjectives. Benwing2 (talk) 20:10, 18 May 2023 (UTC)
@Benwing2 Sure, but they should be optional and manual. Vininn126 (talk) 20:10, 18 May 2023 (UTC)
@Vininn126 Do you mean the short forms should be manually specified in the declension table? Yes, that's my plan; that's how they're also handled in Ukrainian. (For Czech, there's more logic involved because the short forms can be inflected in various ways, but that doesn't apply here.) Benwing2 (talk) 20:13, 18 May 2023 (UTC)

In Modern Standard Mandarin Chinese, when a labial consonant (namely b, p, m or f) is followed by the vowel o, [u̯] occurs after the consonant, but we don't write it out in either Pinyin or Zhuyin. However, the module outputs a redundant ㄨ when generating the Zhuyin form (bo → ˙ㄅㄨㄛ; it should be ˙ㄅㄛ instead). --TongcyDai (talk) 20:18, 20 May 2023 (UTC)

@TongcyDai This is fixed. Theknightwho (talk) 22:28, 8 June 2023 (UTC)
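A minimal sketch of the rule TongcyDai describes, limited to the -o and -uo finals. This is Python for illustration only, not the module's code. The initials table covers only the initials listed, and tone marks (such as the neutral-tone dot ˙) are left out. The point is that pinyin writes bo, po, mo and fo without u, and Zhuyin does the same (ㄅㄛ), while other initials take -uo, which is written ㄨㄛ.

```python
LABIAL_INITIALS = {"b": "ㄅ", "p": "ㄆ", "m": "ㄇ", "f": "ㄈ"}
OTHER_INITIALS = {
    "zh": "ㄓ", "ch": "ㄔ", "sh": "ㄕ", "d": "ㄉ", "t": "ㄊ", "n": "ㄋ", "l": "ㄌ",
    "g": "ㄍ", "k": "ㄎ", "h": "ㄏ", "r": "ㄖ", "z": "ㄗ", "c": "ㄘ", "s": "ㄙ",
}
ALL_INITIALS = {**LABIAL_INITIALS, **OTHER_INITIALS}


def o_syllable_to_zhuyin(syllable: str) -> str:
    """Convert a toneless pinyin syllable in -o or -uo to Zhuyin."""
    # Try two-letter initials (zh, ch, sh) before one-letter ones.
    for initial in sorted(ALL_INITIALS, key=len, reverse=True):
        if syllable.startswith(initial):
            final = syllable[len(initial):]
            break
    else:
        raise ValueError(f"no known initial in {syllable!r}")
    if initial in LABIAL_INITIALS and final == "o":
        # The u glide after labials is not written, so no ㄨ here.
        return LABIAL_INITIALS[initial] + "ㄛ"
    if initial in OTHER_INITIALS and final == "uo":
        return OTHER_INITIALS[initial] + "ㄨㄛ"
    raise ValueError(f"{syllable!r} is outside this sketch (only -o/-uo)")
```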

[Template bug] The wrong output of template root.

Input:

{{root|en|ine-pro|*legʰ-}}<br> {{inh|en|ine-pro|*legʰ-}}

But the output is missing the first line.

Dušan Kreheľ (talk) 21:42, 20 May 2023 (UTC)

@Dušan Kreheľ: see the documentation at {{root}}. It's designed to not display anything. It adds the page to the appropriate root category, in this case Category:English terms derived from the Proto-Indo-European root *legʰ-. Chuck Entz (talk) 22:07, 20 May 2023 (UTC)

"Terms spelled with" issues

I posted these on my talk page in a section created by Theknightwho, who did not reply.

J3133 (talk) 09:32, 21 May 2023 (UTC)

@J3133 I've reverted my recent change which caused this, as I've realised that automating the handling of these also requires modifications to the headword module. Theknightwho (talk) 13:05, 21 May 2023 (UTC)
@Theknightwho: The other two issues I mentioned have different causes. J3133 (talk) 14:18, 21 May 2023 (UTC)
@J3133 I implemented something that should have solved the underscore issue, but it seems that it’s being overridden by JavaScript.
@Erutuon would you please be able to assist with this? By default, the JS gadget is adding underscores for every space, and only does something different on certain pages (which must be specified somewhere). At the moment, we seem to deal with unsupported titles in 3 different places: Module:links/data, Module:unsupported titles/data and some third location I’m unsure of that’s used by the gadget. Module:headword/data accesses the first of those, and generates certain page-wide data based on it, such as the canonical page name (i.e. what it would be if it were allowed). It should be very straightforward to fold the display title into that module as well, as it doesn’t depend on input arguments, and would save duplication by multiple headword templates anyway.
Would it be possible for the gadget to access Module:headword/data to get the canonical display title instead of what it’s doing at the moment? That would mean everything is ultimately working from one table. Not only would it work for pages like ┬─┬ノ( º _ ºノ), where conventional {{DISPLAYTITLE:}} works fine (except for removing "Unsupported titles/"), but it would also help the gadget in situations like # #, where {{DISPLAYTITLE:}} wouldn’t work at all. Theknightwho (talk) 12:07, 22 May 2023 (UTC)
@Theknightwho Could it be MediaWiki:Gadget-UnsupportedTitles.json? BTW it is definitely possible for JavaScript to read module data; it seems the translation adder does this. There is Module:languages/javascript-interface and MediaWiki:Gadget-LanguageUtils.js among other things; it seems we need a Lua function to convert the module data to JSON, which can then be accessed through the expandtemplates API and the JSON parsed. Benwing2 (talk) 01:29, 23 May 2023 (UTC)

Tagalog replace hyphenation with hyph

Please replace all cases of {{hyphenation|tl|<stuff here>}} with {{hyph|tl|<stuff here>}} in code. Thank you. Ysrael214 (talk) 10:07, 21 May 2023 (UTC)

Why? {{hyph}} is just a hard redirect to {{hyphenation}}. —Mahāgaja · talk 10:53, 21 May 2023 (UTC)
@Mahagaja just for consistency Ysrael214 (talk) 03:19, 22 May 2023 (UTC)
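If someone does want to make the cosmetic change Ysrael214 asks for, the text substitution itself is simple. A minimal Python sketch follows. Fetching and saving pages, e.g. with a bot framework, is not shown, and the function name is hypothetical. As Mahāgaja notes, {{hyph}} redirects to {{hyphenation}}, so the rendered output would not change.

```python
import re

# {{hyphenation|tl|...}}, allowing optional whitespace around the name and the code.
HYPHENATION_TL = re.compile(r"\{\{\s*hyphenation\s*\|\s*tl\s*\|")


def use_hyph(wikitext: str) -> str:
    """Rewrite Tagalog {{hyphenation}} calls as {{hyph}}; other languages are untouched."""
    return HYPHENATION_TL.sub("{{hyph|tl|", wikitext)
```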

Proverbs as a subcategory of Phrases

Emm, "hello"???

For some reason, Proverbs are not categorized at all, probably the only such set from Figures of speech. Maybe someone just forgot to add a category at some point xd I think we should treat them as a subcategory of Phrases because that's what they are. @Surjection, maybe you could do that, you're smart after all 🤔🤔 Shumkichi (talk) 18:47, 23 May 2023 (UTC)

I agree that proverbs should probably be under phrases, but most other figures of speech don't seem to be categorized under anything other than it either. — SURJECTION / T / C / L / 06:21, 24 May 2023 (UTC)
@Surjection CLICK - true, but what I mean is that e.g. idioms are not a grammatical category, idiomatic phrases are different parts of speech, so each of them has their own individual category such as a noun, verb, etc. The same can be said about euphemisms, similes (usually adjectives and adverbs), etc. But proverbs have no category whatsoever, but, unlike idioms etc., they are uniform and can all be easily categorised as phrases. It would also increase the number of phrases btw. (we would finally beat Russia hehe) Shumkichi (talk) 17:23, 24 May 2023 (UTC)
Wait, do you mean that proverb categories should be under phrases or that we shouldn't consider "proverb" a part of speech at all and instead consider them to be phrases? — SURJECTION / T / C / L / 17:51, 24 May 2023 (UTC)
@Surjection The former, I think we should make proverbs a subcategory of phrases, that's it. And if there's a way to make them automatically counted as phrases without having to overtly mark them as such, then gooooood. Shumkichi (talk) 18:03, 24 May 2023 (UTC)
I'm not sure the latter is doable that easily, but the former should very much be by just modifying the category data. I can have a look. — SURJECTION / T / C / L / 20:29, 24 May 2023 (UTC)

Issue with bracketed link in quotation

I have used “EA &#91;{{w|Electronic Arts}}&#93;” at dorktastic. When I created this entry yesterday, it worked. Today, it displays “EA [[[w:Electronic Arts#English|Electronic Arts]]]”. @Theknightwho: I suppose some change to a module broke it. J3133 (talk) 10:22, 24 May 2023 (UTC)

@J3133: I fixed it by putting <nowiki/> between the bracket and the link. I would recommend moving all those citations to the Citations: page, though. —Mahāgaja · talk 14:29, 24 May 2023 (UTC)
@J3133 I will do a proper fix. Theknightwho (talk) 15:02, 24 May 2023 (UTC)
@J3133 @Mahagaja Now fixed - any changes to add <nowiki> tags can (and probably should) be rolled back, so as to keep the wikitext more user friendly. Theknightwho (talk) 15:55, 24 May 2023 (UTC)

Square brackets + links broken

Until a while ago, left and right square brackets could be represented with the &lsqb; (&#91;) and &rsqb; (&#93;) entities in cases where they enclosed links, to avoid wrong formatting. These links are now broken. I fixed a few yesterday (diff, diff, diff), but there are more occurrences (e.g. built different sense 1, call#Noun sense 8). The bug only appears in certain templates ({{quote-book}}, {{uxi}} and similar). (Maybe {{lsqb}} and {{rsqb}} could be created to avoid bothering with nowiki tags in the future?) Einstein2 (talk) 15:25, 24 May 2023 (UTC)

No new templates please. I am working on a fix. Theknightwho (talk) 15:29, 24 May 2023 (UTC)
Oops, I've just seen this issue has already been raised in the preceding section. Einstein2 (talk) 15:31, 24 May 2023 (UTC)

Urdu headword issues with two Hindi equivalents

At Urdu اِنْسان (insān) the display is messed up (shows two) when there is |hi2=, e.g. {{ur-noun|g=m|head=اِنْسان|hi=इन्सान|hi2=इंसान}}

Produces:

اِنْسان or انسان • (insān or insān)

Expected:

اِنْسان • (insān) Anatoli T. (обсудить/вклад) 06:19, 25 May 2023 (UTC)

@Atitarev If |hi2= is given, the code attempts to convert it to Urdu and insert it into the second transliteration param, which is why this is happening. I assume this is wrong? What is the right behavior? Benwing2 (talk) 23:44, 25 May 2023 (UTC)
@Benwing2: Thanks. I think neither |head2= nor |hi2= should produce additional equivalents. I mean |head2= shouldn't make new Hindi spellings and |hi2= shouldn't assume additional Urdu spellings. Anatoli T. (обсудить/вклад) 22:58, 28 May 2023 (UTC)

alternative form of alternative forms

There are plenty of alternative forms of alternative forms out there, like telangiectasy, which until just now was an alt form of telangiectasis, itself an alt form of telangiectasia. I imagine it wouldn't be too hard to generate a list at, for example, WT:Todo/Alt Alt forms, if anyone fancies it. Wonderfool69 (talk) 08:19, 25 May 2023 (UTC)

@Wonderfool69 A splendid notion! I usually don't encounter this situation because I view alternative forms as alternatives of the main form, but @LlywelynII has proposed something that produced this kind of Alt-Alt situation at Chung-nan-hai. I don't necessarily know if it is "right", but it's definitely 100% plausible. --Geographyinitiative (talk) 12:04, 25 May 2023 (UTC)
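A minimal Python sketch of how such a to-do list could be generated. It assumes that someone has already extracted a mapping from each page to the single target of its {{alternative form of}}-style definition, for instance from a database dump. That extraction step, the function name `find_alt_of_alt`, and the simplification of one target per form are all assumptions and do not describe an existing Wiktionary tool.

```python
from typing import Dict, List, Tuple


def find_alt_of_alt(alt_of: Dict[str, str]) -> List[Tuple[str, str, str]]:
    """Return (form, target, final_target) triples in which the target of an
    alternative form is itself listed as an alternative form."""
    chains = []
    for form, target in sorted(alt_of.items()):
        if target in alt_of:  # the target is itself an alt form
            chains.append((form, target, alt_of[target]))
    return chains


if __name__ == "__main__":
    # Hypothetical input mirroring the telangiectasy example, before it was fixed.
    alt_of = {
        "telangiectasy": "telangiectasis",
        "telangiectasis": "telangiectasia",
    }
    # Emit one wikitext list line per chain, ready to paste into a Todo page.
    for form, target, final in find_alt_of_alt(alt_of):
        print(f"* [[{form}]] -> [[{target}]] -> [[{final}]]")
```

With the sample data, the script prints `* [[telangiectasy]] -> [[telangiectasis]] -> [[telangiectasia]]`. Pages that list several alternative-form targets would need the mapping widened to a list of targets.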

Multilevel list enumeration redux

Previously I asked here and there about:

  • the existence of current guidelines on the formatting of multilevel lists;
  • whether there is a practical technical means of using different numbering styles for each level, to create a hierarchy like, say, 1(c)(ii) instead of 1 3 2.

Since then it has been clarified by Surjection, Chuck Entz & Erutuon — see above-linked discussions — that:

  • the prevailing practice on WT (albeit nowhere formally documented‽) is to use the most basic default-HTML-style numbering; but also
  • there are practical technical solutions to enable alternative numbering formats.

I understand that the practice won't and shouldn't change until a consensus is expressed. To that end, I would like to get wider feedback on the æsthetically preferred formatting. (Site-wide, not as a user customisation.) As a case in point, I put forward the list numbering at stroke (as at 20 May 2023), which I have previously cited as what I consider to be an egregious example.
Feel free to also add ancillary remarks about possible pros & cons besides the æsthetics — that might be about, say, technical implementation, or vision-impaired users making use of screen readers.

As a follow-on, I suggest again that whatever the intended multilevel list numbering format is, it should be formally documented.

—DIV

P.S. I wasn't sure whether this was more suitable for the Beer Parlour, but decided to stick here for now, rather than risk the perception of forum (s)hopping.

(1.145.63.208 14:32, 25 May 2023 (UTC))
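For comparison, the 1(c)(ii) style can be sketched as a Python labelling function over a sense's position in the list. This only illustrates the numbering scheme under discussion. On the site itself, the numbering would more likely be done in CSS, for example with `list-style-type: lower-alpha` and `lower-roman` on nested lists. The function names here are hypothetical.

```python
from typing import Callable, List, Sequence


def to_alpha(n: int) -> str:
    """1 -> 'a', 26 -> 'z', 27 -> 'aa' (bijective base 26, like CSS lower-alpha)."""
    letters = ""
    while n > 0:
        n, rem = divmod(n - 1, 26)
        letters = chr(ord("a") + rem) + letters
    return letters


def to_roman(n: int) -> str:
    """Lower-case Roman numerals for 1..3999."""
    table = [(1000, "m"), (900, "cm"), (500, "d"), (400, "cd"), (100, "c"),
             (90, "xc"), (50, "l"), (40, "xl"), (10, "x"), (9, "ix"),
             (5, "v"), (4, "iv"), (1, "i")]
    out = ""
    for value, numeral in table:
        while n >= value:
            out += numeral
            n -= value
    return out


def label(path: Sequence[int]) -> str:
    """Turn a position such as [1, 3, 2] (sense 1, subsense 3, sub-subsense 2)
    into the label '1(c)(ii)'."""
    styles: List[Callable[[int], str]] = [str, to_alpha, to_roman]
    parts = []
    for depth, n in enumerate(path):
        text = styles[depth % len(styles)](n)  # cycle styles past three levels
        parts.append(text if depth == 0 else f"({text})")
    return "".join(parts)


if __name__ == "__main__":
    print(label([1, 3, 2]))
```

Cycling back to Arabic numerals after the third level is an arbitrary choice made for this sketch.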

Can we get more accelerated creation support for Swedish?

I guess this isn't a huge deal, especially for me as I don't speak Swedish, but I've noticed that some Swedish declension templates have ACCEL support and others don't, and I just thought it'd be nice if most templates worked with ACCEL. Unfortunately, I can't remember the last entry I saw without ACCEL support, but I see that {{sv-infl-noun-c-ar}} does generate ACCEL links when the entry doesn't exist. Acolyte of Ice (talk) 10:41, 26 May 2023 (UTC)

UPDATE: I have just noticed that, by the looks of it, {{sv-infl-noun-c-er}} does not generate ACCEL links. Acolyte of Ice (talk) 13:56, 29 May 2023 (UTC)

Unsupported title categories displayed as text

@Theknightwho: See Unsupported titles/Enclosing less than greater than: “<􀀃 􀀄Unsupported titles/Greater than􀀅>[[Category:Unsupported titles|]][[Category:Pages with DEFAULTSORT conflicts|]]” and Unsupported titles/HTML comment: “<!-- -->[[Category:Unsupported titles|]][[Category:Pages with DEFAULTSORT conflicts|]]”. J3133 (talk) 11:37, 26 May 2023 (UTC)

@J3133 Yep - this is a known issue with terms involving enclosing angle brackets. I’ll have a look today. Theknightwho (talk) 12:00, 26 May 2023 (UTC)

Weird behavior on nonspacing diacritic entries

On some entries for nonspacing diacritics, such as ◌̇ and ◌́, some (but not all) language entries have, in addition to the normal "Foobar lemmas" category, an additional category where the s of the word "lemmas" is adorned with the diacritic in question. For example, ◌̇ is in CAT:Catalan lemmaṡ and CAT:Irish lemmaṡ, but the other two languages on the page don't create those weird category names. Likewise, ◌́ is in CAT:Translingual lemmaś, CAT:Bulgarian lemmaś, and CAT:Irish lemmaś, and probably some others, but some languages don't do that. I assume there's something at Module:diacritical mark that's causing this behavior that needs to be fixed. Pinging @Theknightwho as the primary editor of that module. —Mahāgaja · talk 16:12, 26 May 2023 (UTC)

@Mahagaja Yep - I'm looking into this. Theknightwho (talk) 16:14, 26 May 2023 (UTC)
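The following is one plausible mechanism, offered as an assumption rather than a diagnosis of Module:diacritical mark. Suppose the bare combining mark from the page title ends up directly after a category name. Unicode NFC normalization then fuses the mark with the final "s" into a single precomposed letter. MediaWiki normalizes titles to NFC, so the result would be a distinct category. The Python sketch below reproduces the effect with the standard `unicodedata` module. The name `leak_mark` is hypothetical.

```python
import unicodedata

DOT_ABOVE = "\u0307"  # COMBINING DOT ABOVE, the mark shown as ◌̇
ACUTE = "\u0301"      # COMBINING ACUTE ACCENT, the mark shown as ◌́


def leak_mark(category: str, mark: str) -> str:
    """Hypothetical failure mode: a bare combining mark is appended to a
    category name and NFC normalization fuses it with the last letter."""
    return unicodedata.normalize("NFC", category + mark)


if __name__ == "__main__":
    print(leak_mark("Catalan lemmas", DOT_ABOVE))  # Catalan lemmaṡ (U+1E61)
    print(leak_mark("Bulgarian lemmas", ACUTE))    # Bulgarian lemmaś (U+015B)
```

Because the fused letter is a single code point, the result no longer matches "lemmas" in any plain string comparison. Under this assumption, that would explain why the stray categories look like ordinary category names rather than obviously broken ones.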

Fix Tagalog hyphenation for "sy" combination

Basically, change hyphenations with the pattern (vowel)|sy(vowel) to (vowel)s|y(vowel).

Example: {{hyph|tl|kom|bi|na|syon}} to {{hyph|tl|kom|bi|nas|yon}}

Here's the regular expression I crafted; I hope it speeds up the process:

insource:/(\{\{hyph(enation)?\|tl\|[a-zA-Z |\-]+[AEIOUaeiou])(\|)([sS])([yY][aeiouAEIOU][a-zA-Z |\-]*\}\})/

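From %1%3%4%5, change to %1%4%3%5. (The optional (enation) is group 2, nested inside group 1, so the bar, the s and the rest are groups 3, 4 and 5.)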

Thank you! Ysrael214 (talk) 13:39, 27 May 2023 (UTC)
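For anyone applying this from a script rather than a wiki tool, here is a minimal Python sketch of the same rewrite. It uses Python's `re` module in place of whatever tool the %1-style replacement was written for. It makes the "enation" group non-capturing, so the four remaining groups line up with the proposed swap. It loops because the greedy prefix only catches the last "sy" in each template on each pass. The function name `fix_sy` is hypothetical, and the sketch implements the rule exactly as proposed above, including the original pattern's limits. For example, parameters containing "=" are skipped, and a single-vowel first syllable is not matched.

```python
import re

# The pattern from the insource: search above, with "(enation)" made
# non-capturing so that the groups are numbered 1-4.
HYPH_SY = re.compile(
    r"(\{\{hyph(?:enation)?\|tl\|[a-zA-Z |\-]+[AEIOUaeiou])"  # 1: template up to a vowel
    r"(\|)"                                                    # 2: syllable break
    r"([sS])"                                                  # 3: the s
    r"([yY][aeiouAEIOU][a-zA-Z |\-]*\}\})"                     # 4: y + vowel ... }}
)


def fix_sy(wikitext: str) -> str:
    """Rewrite '(vowel)|sy(vowel)' as '(vowel)s|y(vowel)' in Tagalog {{hyph}} calls."""
    while True:
        new = HYPH_SY.sub(r"\1\3\2\4", wikitext)  # swap the break and the s
        if new == wikitext:  # nothing left to move
            return new
        wikitext = new


if __name__ == "__main__":
    print(fix_sy("{{hyph|tl|kom|bi|na|syon}}"))
```

Each pass moves one break per template and never creates a new "(vowel)|sy" sequence, so the loop terminates.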

Weird interwiki links on page defective

For some reason the interwiki links there are not as usual; for example, the ru.wiktionary link points to ru:дефектный. 46.132.9.203 18:42, 27 May 2023 (UTC)

I cannot duplicate this issue. For me, the interwiki link to ru-wikt points to ru:defective. —Mahāgaja · talk 20:33, 27 May 2023 (UTC)
It's changed since this was posted. I certainly saw it. My guess is that the answer is somewhere at Wikidata, perhaps an item with incorrect data that was reverted or deleted. Chuck Entz (talk) 20:58, 27 May 2023 (UTC)

Gadget-Editor is covered by the sticky header in Vector 2022

The interface of MediaWiki:Gadget-Editor.js (z-index: 10) is covered by the sticky header in Vector 2022 (z-index: 200). 05:36, 29 May 2023 (UTC)

Tracking fossilized terms

Is there a way to track fossil words, e.g. in a category? I think it would be a useful/interesting category to have. It would most likely be used on multiword entries, but there might be a need for it in terms formed by affixation from fossilized terms. Vininn126 (talk) 17:10, 29 May 2023 (UTC)

@Vininn126: That sounds like a great idea! I would propose the following two categories: Category:Fossil words for words like ado, fro, or yore; and Category:Terms containing fossil words for terms like without further ado, to and fro, or of yore. Tc14Hd (talk) 20:43, 3 June 2023 (UTC)
I agree. I think it could be handled by an etymology template; we'd need to set up the categories. I'm going to wait a little longer before I do anything and see what people think. Vininn126 (talk) 22:00, 3 June 2023 (UTC)

rfv-sense formatting is broken

See accordion for a current example. Equinox 13:38, 30 May 2023 (UTC)

I think it's because the {{senseid}} template needs to always be the first thing on the line. If we can fix that, that would be great, but I'd say it's more a problem with senseid than with the RFV templates. Soap 13:41, 30 May 2023 (UTC)
@Soap, Equinox: But in at least the short term, it's T:rfv-sense/documentation that needs fixing, to say that {{senseid}} comes before {{rfv-sense}}. --RichardW57m (talk) 14:39, 30 May 2023 (UTC)
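Existing entries with the wrong order could be cleaned up mechanically. Here is a minimal Python sketch. It assumes a definition line contains at most one {{senseid}} and that there are no nested templates inside it. `senseid_first` is a hypothetical helper, not an existing bot or module function. Example (#:) and quotation (#*) lines are left alone.

```python
import re

# A {{senseid|...}} call plus any trailing whitespace (no nested templates assumed).
SENSEID = re.compile(r"\{\{senseid\|[^{}]*\}\}\s*")
# A definition line: one or more '#', optional space, then text that does not
# start with ':' or '*' (which would make it an example or quotation line).
DEFINITION = re.compile(r"(#+\s*)([^:*].*)$")


def senseid_first(line: str) -> str:
    """Move a {{senseid}} that follows other content (e.g. {{rfv-sense}})
    to the start of the definition text."""
    m = DEFINITION.match(line)
    if not m:
        return line  # not a definition line
    prefix, body = m.groups()
    found = SENSEID.search(body)
    if not found or found.start() == 0:
        return line  # no senseid, or it is already first
    senseid = found.group(0).rstrip()
    rest = (body[:found.start()] + body[found.end():]).strip()
    return f"{prefix}{senseid} {rest}".rstrip()


if __name__ == "__main__":
    print(senseid_first("# {{rfv-sense|en}} {{senseid|en|instrument}} A musical instrument."))
```

For that sample line, the function returns `# {{senseid|en|instrument}} {{rfv-sense|en}} A musical instrument.`, which is the order the documentation fix above would prescribe.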

Tagbanwa fonts

Please add the fonts used on Wikipedia to the Tagbanwa class in MediaWiki:Common.css. Currently, Noto Sans Tagbanwa is missing.

/* Tagbanwa */

.Tagb {
	font-family: "Noto Sans Tagbanwa","Tagbanwa","Quivira",Code2000;
	font-size: 1.1em;
}

.Tagb, .Tagb * {
	font-style: normal;
}

Ysrael214 (talk) 06:08, 31 May 2023 (UTC)