Wiktionary:Grease pit/2022/August

Automatic redirects when on an uncreated page

I was on this page after deleting it because I wanted to protect it when I was suddenly automatically redirected to wastage. Is there a way to disable this behavior? Automatic redirects are among the most annoying things on the internet. — Fytcha〈 T | L | C 〉 10:07, 6 August 2022 (UTC)[reply]

@Fytcha I don't know how to disable it, but adding &redirect=no to the url will prevent it while you're on that page. This is added automatically to redlinks, so clicking on any of the self-redlinks on that page or in the "redirected from" message will take you to a stable version. Chuck Entz (talk) 15:45, 6 August 2022 (UTC)[reply]

Do we have a way of telling how many pageviews nonexistent pages get? I would expect that because other wikis are not case-sensitive, noobs end up on uppercase pages and are helped by the redirects, probably more often than adept editors like us are annoyed by them (and we know to look for and click the small link at the top of the page we're redirected to, to go back to the page we want, whereas a noob may not read the wall of text and see to go to the other page, and/or may create an entry at the capitalized page). But it'd be useful to have data. Perhaps someone could make an opt-in gadget to automatically add &redirect=no to URLs, for users who want to never be redirected? - -sche (discuss) 19:41, 6 August 2022 (UTC)[reply]

There used to be a preference for disabling automatic redirection, and if someone even mildly capable with JavaScript wanted to re-make that preference as a Gadget, that would be welcome. I think it is literally just appending the string Chuck mentioned above to applicable links. - TheDaveRoss 13:02, 8 August 2022 (UTC)
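
A minimal sketch of the string manipulation such a gadget would need, assuming the standard URL API is available. The function name addRedirectNo is hypothetical, and a real gadget would still have to decide which links on the page to rewrite.

```javascript
// Hypothetical helper: return the given URL with redirect=no set.
// Uses the standard URL API (available in browsers and in Node.js).
function addRedirectNo(href) {
  const url = new URL(href);
  // set() overwrites an existing value instead of adding a duplicate parameter
  url.searchParams.set('redirect', 'no');
  return url.toString();
}

console.log(addRedirectNo('https://en.wiktionary.org/w/index.php?title=Wastage'));
```

Using searchParams.set rather than plain string concatenation avoids having to check whether the URL already has a query string.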

The redirect from Wastage to wastage is the "did you mean" redirect and it's handled by MediaWiki:Common.js. It's supposed to be possible to disable it by going to WT:PREFS and checking "Disable the javascript redirect between pages that differ only in case", but saving preferences on that page doesn't seem to work any longer. You can disable it with window.disableAutoRedirect = true; in your common.js instead. — Eru·tuon 21:50, 8 August 2022 (UTC)[reply]

Thanks all, the JS solution works for me. — Fytcha〈 T | L | C 〉 13:02, 9 August 2022 (UTC)[reply]

@Erutuon: Speaking of WT:PREFS, I wonder if it is time to finally kill that page off. Is anyone still using them? If so, what features, and can those be moved to gadgets? - TheDaveRoss 13:34, 9 August 2022 (UTC)[reply]

Bot request

I see 2547 Italian adjective forms using {{head|it|adjective form|}} and a 4th parameter used for gender but not starting with g=. The output looks weird and is inconsistent. Can someone add g= to any of these whose 4th parameter is a gender? Some also use e.g. |f|p}} and need to be corrected. Ultimateria (talk) 19:30, 6 August 2022 (UTC)[reply]
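
The kind of replacement being requested could look roughly like the sketch below. It is not WingerBot's code: the function name addGenderParam is hypothetical, and joining gender and number as "f-p" is an assumption about the desired output, which the request does not spell out.

```javascript
// Hypothetical sketch: add g= to a bare gender in the 4th parameter of
// {{head|it|adjective form|...}}. Writing gender plus number as "f-p" is an
// assumption about the desired output, not something stated in the request.
function addGenderParam(wikitext) {
  return wikitext.replace(
    /\{\{head\|it\|adjective form\|([mf])(?:\|([sp]))?\}\}/g,
    (match, gender, number) =>
      '{{head|it|adjective form|g=' + gender + (number ? '-' + number : '') + '}}'
  );
}

console.log(addGenderParam('{{head|it|adjective form|f}}'));
console.log(addGenderParam('{{head|it|adjective form|f|p}}'));
```

Calls that already use g= do not match the pattern and are left untouched.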

Relevant search. 70.172.194.25 19:33, 6 August 2022 (UTC)[reply]

@Benwing2 this was as the result of WingerBot's important (but evidently, in this case, flawed) work. This, that and the other (talk) 05:16, 8 August 2022 (UTC)[reply]

@Ultimateria, This, that and the other In the process of fixing. Benwing2 (talk) 05:49, 8 August 2022 (UTC)[reply]

Module:number list/data/en creates bad red links

e.g. the red link "five times" (sum of parts) appears at quintary. Can this be changed to not be a link please? Equinox ◑ 17:23, 7 August 2022 (UTC)[reply]

@Equinox Fixed to use separate per-word links. Benwing2 (talk) 05:49, 8 August 2022 (UTC)[reply]

Common Turkic declension

I was making a template for the declension of Common Turkic words. When I try to publish, it says the page is harmful. What is happening? Is there a mistake in the template? 191.95.173.91 18:45, 7 August 2022 (UTC)

Sound file question

Hi

How could I download sound files with English pronunciation?

I need it to build my own Anki database.

For example

There is a word 'car' with a section 'Pronunciation'

https://en.wiktionary.org/wiki/car#Pronunciation

It has 2 audio files

Audio (UK): (file)
Audio (US): (file)

How can I build a URL to download these files?

Cheers GT Gt4dev (talk) 14:56, 8 August 2022 (UTC)[reply]

You can find out more about the files, including the locations and URLs for downloading on the Commons description pages (e.g. https://commons.wikimedia.org/wiki/File:En-uk-a_car.ogg). - TheDaveRoss 15:34, 8 August 2022 (UTC)[reply]
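
One way to build a download URL from a file name is the Special:FilePath page, which redirects to the file itself. The sketch below assumes you already know the Commons file name (here the one from the example above); the helper name commonsFileUrl is hypothetical.

```javascript
// Build a download URL for a Commons file name via Special:FilePath,
// which redirects to the actual file.
function commonsFileUrl(fileName) {
  // Drop an optional "File:" prefix and use underscores, as MediaWiki does
  const name = fileName.replace(/^File:/, '').replace(/ /g, '_');
  return 'https://commons.wikimedia.org/wiki/Special:FilePath/' + encodeURIComponent(name);
}

console.log(commonsFileUrl('File:En-uk-a_car.ogg'));
```

The file names for a given entry still have to be looked up, for example from the (file) links in the Pronunciation section.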

"Latina" (at "In other languages") vs. Latine

Like Wiktionary talk:Main Page#Čeština vs. český, compare Latinus#Usage notes. --Ufila (talk) 17:35, 8 August 2022 (UTC)[reply]

Italics as alt in Greek

There are problems with alternative output (bold or italics) for Latin and other scripts (especially Greek) in various link templates. Main problem: I cannot write italics in an alt= param.
@Benwing2 tells me it has to do with sc (script) for languages like Greek (Grek) and the CSS here. Notifying @Erutuon
Control examples:

{l en / el {{l|en|word}} word {{l|el|λέξη}} λέξη (léxi) & grc λέξις (léxis)
{m en / el {{m|en|word}} word {{m|el|λέξη}} λέξη (léxi) & grc λέξις (léxis)
{w {{w|word}} word {{w|lang=el|λέξη}} λέξη
{head} oops _??1 why 2nd param required? {{head|en|head=word}} Lua error in Module:headword/templates at line 52: The parameter "2" is required.
OK we can write 'PAGENAME', but perhaps we only need a gender {f} of something
Or, a param |pos=-
{head en / el (default bold, but not for other scripts?) {{head|en|head=word|noun}} word {{head|el|head=λέξη|noun}} λέξη • (léxi)

Trying bold. (I now observe in the examples that the fonts for Greek are weird and have a very faded bold or no bold.) _??2 Why special fonts for el Modern Greek? (OK, some other fonts for Ancient grc, but even there, bold has to be balanced.)
_??3 Here, the transliterations are bold (I think they should not be), but not the word; or is it some faded bold? I cannot tell. I presume that for some elaborate calligraphic scripts bold has to be changed a little bit, but Grek has no such problems.

{l en / el {{l|en|word|'''word'''}} word {{l|el|λέξη|'''λέξη'''}} λέξη (léxi) & grc λέξις (léxis) Try russian слово (slovo)
{m en / el {{m|en|word|'''word'''}} word {{m|el|λέξη|'''λέξη'''}} λέξη (léxi)
{w {{w|word|'''word'''}} word {{w|lang=el|λέξη|'''λέξη'''}} λέξη Here greek font & bold is correct.
try to unbold {{head|en|head='''word'''|noun}} word {{head|el|head='''λέξη'''|noun}} λέξη • (léxi) Trying russian слово • (slovo)

_??4 Trying italics. For taxonomic species and genera we need italics. (A side problem: please note that apart from the translingual New Latin terms, there are equivalent academic terms in other languages; Latin is not universal. {{taxon}} does not provide lang=. But never mind.)

By the way, _??5 what is the template like {{title|italics}}? Example at Homo sapiens
{l en / el {{l|en|word|''word''}} word {{l|el|λέξη|''λέξη''}} λέξη (léxi) Try russian слово (slovo)
{m en / el {{m|en|word|''word''}} word, not needed here, {{m|el|λέξη|''λέξη''}} λέξη (léxi)
{w {{w|word|''word''}} word {{w|lang=el|λέξη|''λέξη''}} λέξη
{head or alt=? {{head|en|head=''word''|noun}} word {{head|el|head=''λέξη''|noun}} λέξη • (léxi)
This is an extraordinary attempt by my administrator @Saltmarsh to help write italics at HEAD with
''''Κίτροι'''''{{head|el|proper noun form|g=f|head= |tr=Kítroi}}

It would be very nice, if all things were unified for all languages, and it would feel very comfortable for editors-not-so-able like myself, to know and trust that every time we use |alt= (hopefully also available along the positional parameter) we get the result we need at whatever Template. Thank you, and thank you Benwing2, for your effort to unify hundreds of templates, thanks.
PS, Benwing, could {l} and {m} give the name of the Languages like {{cog}} does? with some extra parameter? Something like {l|el|λέξη|lang=1} meaning showlanguage. It would be very handy at etymologies when we mention some word. ‑‑Sarri.greek ^♫ I 14:12, 9 August 2022 (UTC)[reply]

@Erutuon The issue here is that there's CSS in MediaWiki:Common.css to disable italics and bold for Greek text wrapped in class=Grek; similarly for Russian and other languages. Do you know why this is there? If not, I will remove it. Benwing2 (talk) 02:44, 10 August 2022 (UTC)[reply]

@Sarri.greek Check out {{m+}}, this does exactly what you are looking for in terms of something that works like {{m}} but includes the language name. Benwing2 (talk) 04:45, 10 August 2022 (UTC)[reply]

It's because {{m}} uses <i> and we don't want it to italicize non-Latin scripts because they are already distinguished from English by not being in Latin script. '' also becomes <i>, so both end up not being italic. — Eru·tuon 05:56, 10 August 2022 (UTC)

Thank you @Erutuon, Trying {{m+|el|λέξη}} Greek λέξη (léxi). {{m+|el|λέξη|''λέξη''}} Greek λέξη (léxi). Yok.

Why distinguish a script from English in matters of style? What is wrong with the Grek script, or the Cyrillic? They are very normal scripts. They are not ideographic or something. ... I do not see why the discrimination. I do not know how many scripts are affected here.

If not possible, then why not extract el from the Grek script, orrrrr add a script = Default and put us under there. Orrr, make a thing that says: #if el, then script = blah. Thank you. ‑‑Sarri.greek ^♫ I 10:59, 10 August 2022 (UTC)

PS which is what we do at el.wikt for arabic etc., because we do NOT have scripts, or an interface administrator, we cannot change CSS, so we do all kinds of manoeuvres at Modules for {links} ‑‑Sarri.greek ^♫ I 11:01, 10 August 2022 (UTC)[reply]

The matter here is not what to call scripts linguistically (Grek, Cyrl, etc.). CSS does not care about that, only about fonts. So, wikt can call Xscripts the ones that need handling? ‑‑Sarri.greek ^♫ I 11:18, 10 August 2022 (UTC)

@Sarri.greek: yes, if you use {{head}} without a POS parameter, you'll get a module error. This is like flushing large objects down the toilet to show the mess that results. Would you be so kind as to fix it so we don't have this page in CAT:E for eternity? Chuck Entz (talk) 15:25, 10 August 2022 (UTC)[reply]

@Erutuon Why don't we want <i> to italicize Greek? The logic here seems a bit strange. Benwing2 (talk) 01:45, 11 August 2022 (UTC)

@Benwing2: To expand on what I said above, when mentioning Greek in English, we don't italicize because it's distinguished by script already. It's a Wikipedia convention as well (w:MOS:BADITALICS). Not sure if it's a convention in books. For Cyrillic, it actually makes the script hard to read for people who are only or mostly familiar with un-italicized Cyrillic (which includes me) because some letters dramatically change their shape when they are italicized (т, for instance, looks like a Latin m in italics). I'm not familiar with the other uses of italics in Greek that User:sarri.greek is mentioning above, though. Maybe there is some way to accommodate them. — Eru·tuon 02:36, 11 August 2022 (UTC)

@Erutuon I agree with you about Cyrillic; I also find italicized Cyrillic hard to read. But italicized Greek looks much like non-italicized Greek and if the community of regular editors in Greek prefers to allow italicized Greek, I'm not sure why we should prevent it. Benwing2 (talk) 02:39, 11 August 2022 (UTC)[reply]

@Erutuon Likewise we are currently preventing bold Greek; not sure why. Benwing2 (talk) 02:40, 11 August 2022 (UTC)[reply]

I should also add, the apparent point of the idea that foreign scripts are "sufficient unto themselves" in being distinguished presupposes that everything is normally written in Latin script. For Wikipedia this may make sense because all the articles are in English and text in foreign scripts is only found when interspersed in English text; but in Wiktionary this makes a lot less sense when all terms are written in their native scripts and may not be found near English text. Benwing2 (talk) 02:53, 11 August 2022 (UTC)[reply]

I had always understood that the reason for italicising Latin-script words in this dictionary is to clearly distinguish mentions from uses. But because the dictionary is written in English, there should be no uses of non-Latin-script terms - they are all mentions. Therefore italic text in other scripts should, in theory, never be required. @Sarri.greek you say "It would be very handy at etymologies when we mention some word" - but the etymology is written in English, so I'm not sure if I understand the problem. Does the use of non-italic text in this situation look odd to a native Greek speaker? If you can explain (and demonstrate) more clearly what you're trying to do, it might be easier to imagine how to solve the problem. This, that and the other (talk) 09:45, 11 August 2022 (UTC)[reply]

@This, that and the other: this 'showing' was for 'showing names of languages', an extra request. E.g. {{l|nl|xxx|lang=1}} could give: Dutch xxx.

Thank you @Benwing2, Erutuon. So, what I gather is that italicising is an option decided by the editor (easy-type {{m}} exclusively for English editors), but all templates should have an option alt= in which all kinds of styles are allowed, bold or italics, if the editor chooses.

Most importantly, the transcriptions are affected (they become italics, or bold too; I think they should remain steady)?

Russians never write with italics? I didn't know that. Have we asked someone? OK. Then a Russian editor would never choose alt=xxx, unless he is writing a taxonomic genus; there, he needs to italicise. It should be available.

As for bad outputs in certain scripts, e.g. also in Greek two ττ look like π. I would love if a hairspace was added between ττ. But it is something we live with, in greek internet, everybody knows what to read. Thank you ‑‑Sarri.greek ^♫ I 11:18, 11 August 2022 (UTC)[reply]

PS, some thoughts while reading w:Use–mention distinction... The reason electronic texts do not use quotation marks as printed texts do is that mentioned words are linked, and so are blue already. What if they are not linked? The rules should apply there too, linked or not, for the sake of uniformity.

So, a different marking was needed in electronic texts, and italics was chosen as a solution instead of quotation marks (because a text would look bad with quotations all over the place)? (in el.wikt we use quotation marks blah blah «mentionedword», or nothing: blah blah: mentionedword. If in English text, blah blah "mentionedword")

Two practices: A) when the mentioned.term is within a same.language.text. and B) when the mentioned.term does not belong to the language of the text. Further issue for A and for B: When the mentioned.word belongs to a different language but with similar script. And a 3rd issue for A and B, when it belongs to a non.greek.latin.cyrillic script like Arabic, Han characters etc. Then, the native tradition should be consulted. ‑‑Sarri.greek ^♫ I 11:56, 11 August 2022 (UTC)

Hah, yes I see you were talking about language names, not italics, when you wrote the sentence about it being "very handy in etymologies". Sorry for the confusion!

Anyway, what I was trying to say is, let's see an example so that we can focus our thinking. I can see that not even {{quote}} allows italic Greek text, which is a problem if a book/journal quotation happens to contain an italic word. But can you provide an example of a scenario where it will be important to see {{m}} displaying a Greek word in italics? This, that and the other (talk) 13:51, 11 August 2022 (UTC)[reply]

Maybe we can avoid changing the Greek-mentioned-in-English-text rule, but still accommodate italicization in quotations and taxonomic names with CSS rules? That is, make Greek italicized in cases where we should be following Greek style and not mentioned word style. It might be as simple as editing the CSS so that it only turns off Greek italicization when the class=mention is present along with class=Grek. That would target {{m}} (and other templates that use the same module code to format words, like {{der}}), but not '' or other cases of <i>. — Eru·tuon 22:11, 11 August 2022 (UTC)

@Erutuon Seems reasonable to me. Italics should work in {{head}} as well. Benwing2 (talk) 06:55, 12 August 2022 (UTC)[reply]

Regarding the .Grek font (Modern Greek), I'm not sure why we would want any special fonts (font families). Modern Greek is well supported by mainstream fonts. Ancient Greek (.polytonic) is the one that is best in New Athena Unicode so that the length-breathing-accent vowel letters (ᾱ̓́) are rendered correctly. Those combinations of diacritics probably aren't ever used in Modern Greek text on Wiktionary, even in polytonic orthography. If this is correct, I will change the CSS for that and if it causes any problems for individual languages, they can be switched to the polytonic script code. — Eru·tuon 22:34, 12 August 2022 (UTC)[reply]

Okay, I made it so that Greek italics are only disabled at the top level of {{m}} and such (diff). In any other location, Greek should be able to be italicized. I'm not positive this is correct in all cases, so please describe any odd effects here. — Eru·tuon 22:47, 12 August 2022 (UTC)[reply]

Small bot jobs

Would a bot owner be so kind as to perform a few bot jobs on Polish entries?

I regret naming {{R:pl:SJPSBL}} that and would like to change it to {{R:pl:SJP1807}}, and we would need to change all instances of that on pages to the second template.
Many pages have many instances of {{col3}}; this is because we separate them by part of speech. Could we organize them alphabetically by title? So adjectives, then adverbs, etc. (There are also tons of pages where all the parts of speech are crammed into one col3, and I'd like to get those separated, but that's most likely an issue for another day.)
Could we standardize usage of {{wp}} on Polish pages so that if there is a pedia link, it's {{wp|lang=pl}} (plus any other parameters) and under the L2? Some are under further reading, but the vast majority are under the L2, and it would be nice to have that uniform. Vininn126 (talk) 09:18, 11 August 2022 (UTC)[reply]

@Vininn126 First one is done. Benwing2 (talk) 03:42, 12 August 2022 (UTC)[reply]

@Vininn126 Second one is running, with 8,791 pages to change. Note also, my script issued 4,406 warnings on {{col3|pl}} invocations without a title, probably too many to fix by hand. Benwing2 (talk) 06:42, 12 August 2022 (UTC)[reply]

@Benwing2 Thanks for #1. As to #2 - yeah there are a lot. The only way I could see fixing it by bot is by having a bot check the POS of each word within the title, then creating a separate col3 for it. Vininn126 (talk) 08:39, 12 August 2022 (UTC)[reply]

Quiet Quentin puts invalid language codes

As I noted in January, the Quiet Quentin gadget sometimes uses invalid (or incorrect) language codes, most recently pt-BR here. I assume it gets the codes from the Google Books metadata, so it'd probably be difficult to fix cases where the language is wrong (French mislabelled "en", etc), but it should be easier to fix the issue of invalid codes: where the language code fetched from Google is "pt-BR", always output "pt" instead, always replace "un" with "und", and probably we can find the full list of codes Google uses and identify others to replace. - -sche (discuss) 05:01, 12 August 2022 (UTC)[reply]
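
The replacement step suggested above can be sketched as a small lookup table. This is not Quiet Quentin's actual code; the names CODE_FIXES and normalizeLangCode are hypothetical, and the table holds only the two cases named in this thread.

```javascript
// Replacement table with only the two cases named above; further entries
// would have to come from the actual list of codes Google uses.
const CODE_FIXES = { 'pt-BR': 'pt', 'un': 'und' };

// Return the corrected code if one is listed, else the code unchanged.
function normalizeLangCode(code) {
  return Object.prototype.hasOwnProperty.call(CODE_FIXES, code) ? CODE_FIXES[code] : code;
}

console.log(normalizeLangCode('pt-BR'), normalizeLangCode('un'), normalizeLangCode('fr'));
```

A table keeps the fix explicit: codes that are merely wrong for the book (French labelled "en") pass through unchanged, as the comment above expects.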

Another: [1]. - -sche (discuss) 18:46, 8 June 2024 (UTC)[reply]

T:en-plural noun with two singulars?

An IP has requested a sg2= (or whatever it should be called) parameter in T:en-plural noun for tranx (tranquilizers). But possibly it's better to either view trank and tranq as words for "tranquilizer" that are separate from (and not 'forms of') tranx (given the difference in spelling), or conversely to redefine tranx as a simple plural of one or both of those. - -sche (discuss) 21:03, 13 August 2022 (UTC)[reply]

I added a |sg2= parameter if you (or the IP) want to use it. This, that and the other (talk) 12:18, 14 August 2022 (UTC)[reply]

Preventing emoji display in titles

Since we generally do not want entries for emojis, is there a way of preventing emoji display for Unicode characters that have dual identity? The variation selector U+FE0E will force text display, for characters such as ♀, ♂ and the signs of the zodiac, that may display as either text or emoji depending on which fonts the user has installed, or which OS they're using. DISPLAYTITLE doesn't seem to work with this though because it doesn't recognize the text variant as being the same character. kwami (talk) 03:52, 14 August 2022 (UTC)[reply]

If we want to do this (I say if because we do have several entries for emojis), would it work to use the same method as in MediaWiki:UnsupportedTitles.js? Some tweaking might be needed, as I think it currently expects the pages it's changing to be subpages of Unsupported titles, whereas these could be at "regular" main-namespace locations, but it seems like it ought to work. - -sche (discuss) 05:36, 14 August 2022 (UTC)[reply]

@-sche I think that should work just fine. For readers who see the characters in their text forms anyway, they wouldn't even notice the difference.

It would be most convenient if this were enabled as a stand-alone option, so we don't need to make a protected-edit request every time we notice a problem in an article title. But I imagine it should be functional to add a line to the js for e.g. '♉': '♉︎',.

I'm not opposed to articles for emojis that are notable in themselves. I'm thinking of articles that are intended to be text, but whose titles will display as emojis for some readers. For example, ♀︎ and ♂︎ have been used for centuries as text characters. That's the basic form. The emoji variants ♀️ and ♂️ are very recent and I've never seen them used in a biology or astronomy text. I think it's good to have the emojibox showing the two variants, but the reason for the articles is the original text characters.

For me, ♀︎ and ♂︎ display as text, so the emoji problem is not something I'm generally aware of. But the zodiac symbols do display as emojis on my system, e.g. ♉️ instead of ♉︎, and the majority of the time they shouldn't. The Wiktionary articles are not about these characters as emojis, but about their occurrence in the lit, which is almost entirely in their text form. kwami (talk) 21:23, 15 August 2022 (UTC)[reply]
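
To make the mapping idea concrete, here is a small sketch that builds entries of the '♉': '♉︎' kind by appending U+FE0E. The helper name forceTextPresentation is hypothetical, and the shape of the title map is assumed only from the single entry quoted above.

```javascript
const TEXT_SELECTOR = '\uFE0E'; // VARIATION SELECTOR-15: request text presentation

function forceTextPresentation(char) {
  // Remove any existing variation selector (U+FE0E or U+FE0F) first
  return char.replace(/[\uFE0E\uFE0F]/g, '') + TEXT_SELECTOR;
}

// Build a title map for a few zodiac signs (Aries, Taurus, Gemini)
const titleMap = {};
for (const sign of ['\u2648', '\u2649', '\u264A']) {
  titleMap[sign] = forceTextPresentation(sign);
}

console.log(titleMap);
```

Whether the text form is actually displayed still depends on the reader's fonts; the selector only expresses a preference.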

I don't think it can (or should) be un-protected, I think all MediaWiki: pages and especially .js pages are inherently restricted to being edited by admins or interface admins because the potential for vandalism is too vast.
As a test, I added '♉': '♉︎', but as I suspected it only works on Unsupported_titles/♉; someone with slightly more time than me will need to add a little code to let the script change the titles of main namespace pages and not just subpages of Unsupported titles, but I've given proof of concept. - -sche (discuss) 22:02, 15 August 2022 (UTC)[reply]

Oh, I agree that the js probably shouldn't be unprotected. That's why I think that a stand-alone version would be best.

I created the page Unsupported_titles/♉ and it does indeed work with your addition to the js list. kwami (talk) 02:22, 16 August 2022 (UTC)[reply]

User:Erutuon, can you tweak MediaWiki:UnsupportedTitles.js so it can modify titles of pages in the main namespace and not just subpages of Unsupported titles? - -sche (discuss) 23:43, 16 November 2022 (UTC)[reply]

@Erutuon, -sche Just a nudge. kwami (talk) 22:43, 19 February 2023 (UTC)[reply]

I've written the code to do this today, but haven't activated it because it needs to be a gadget. MediaWiki:Common.js used to be able to determine it can have an effect and only load it then, but now it can have an effect on any mainspace page so it needs to be loaded as a gadget so that we aren't making visitors do two more web requests on every page. I'll try to get to gadgetifying it soon. — Eru·tuon 07:01, 23 February 2023 (UTC)[reply]

It's loaded as a gadget and is working for me. — Eru·tuon 01:37, 24 February 2023 (UTC)[reply]

Forgot to mention, the unsupported title data is now in MediaWiki:Gadget-UnsupportedTitles.json. — Eru·tuon 21:28, 24 February 2023 (UTC)[reply]

Thanks. I started a thread on the gadget talk page for the remaining symbols. kwami (talk) 05:12, 26 February 2023 (UTC)[reply]

Translation tables are gone haywire

{{trans-mid}} is no longer splitting the table in two. Anatoli T. ^{(обсудить}/^вклад) 02:46, 15 August 2022 (UTC)[reply]

See Wiktionary:News for editors#August 2022 for an explanation of what has changed. 24.137.99.97 02:54, 15 August 2022 (UTC)[reply]

@Atitarev: see Wiktionary:Grease_pit/2022/July#Finally_killing_off_{{trans-mid}} Chuck Entz (talk) 03:06, 15 August 2022 (UTC)[reply]

Oh, no!

A few questions:

Are you seeing any problems besides the table no longer being split in two?
If you "zoom out" in your browser (decreasing the text size), does the table eventually split? Or does it stay un-split even when the text is extremely tiny?
If you visit Special:Preferences#mw-prefsection-rendering, which 'skin' does it tell you you're using?
What browser and browser version are you using?
How big is your screen?

Thanks in advance!

—Ruakh_TALK 03:32, 15 August 2022 (UTC)[reply]

@Ruakh, Chuck Entz: I've got the same concerns as Ruakh. Also, it doesn't seem to follow alphabetical orders any more, e.g. "Catalan" follows "Oriya" in culture#Translations. Was it even tested before released?! --Anatoli T. ^{(обсудить}/^вклад) 03:37, 15 August 2022 (UTC)[reply]

@Ruakh, Chuck Entz, Atitarev The alphabetical order is now broken for me (Chrome 67.0.3396.87 on Mac OS). For example, on the link Anatoli posted above (culture#Translations), I see Afrikaans through Sanskrit in the left-hand column, and the right-hand column has Dhivehi through Latin, followed by a blank line, then Scots through Zhuang. If I make the text bigger, it switches to one column, which is alphabetized. If I make the text smaller, I have Afrikaans through Scots in the left-hand column and Dhivehi through Zhuang in the right-hand column. Benwing2 (talk)

@Ruakh, Chuck Entz, Atitarev Signature got messed up, not sure if the ping went through. Benwing2 (talk) 04:03, 15 August 2022 (UTC)[reply]

@Ruakh My skin is Vector legacy (2010). Benwing2 (talk) 04:05, 15 August 2022 (UTC)[reply]

@Ruakh To clarify what is happening, the text goes in alphabetical order halfway down the left column, then continues in the right column, then jumps down to the lower half of the left column, then continues in the lower half of the right column. Benwing2 (talk) 04:09, 15 August 2022 (UTC)

@Ruakh Also, the CSS class user-testing--this-that-and-the-other does not sound like something that belongs in production code. Benwing2 (talk) 04:14, 15 August 2022 (UTC)[reply]

Thanks! I understand the problem now; {{trans-mid}} doesn't add anything to the page, but merely *having* it there means that we have two separate lists (since {{trans-mid}} is on its own line in the wikitext, which becomes a blank line), and so each list gets column-ized. This seems fixable to me, but it'll require thought and testing. For now I'll revert the changes.

Re: the CSS class that mentions 'testing': Yeah, I just left it there for now because it provided a backward-compatibility bridge to the old CSS (since the CSS sometimes gets cached in browsers). I figured we could remove it eventually.

—Ruakh_TALK 04:22, 15 August 2022 (UTC)[reply]

@Ruakh, Benwing2 I'm really sorry about this; a bad oversight.

It looks like setting {{trans-mid}} to contain * on its own line might work? See my latest change to User:This, that and the other/subject: I inserted {{User:This, that and the other/trans-mid}} between French and Galician in the "in grammar" translation table, with no perceptible impact on the display of the translations.

Compare this to the "main topic" translation table, which contains a blank template (just like {{trans-mid}} was before the change was reverted) between Hindi and Hungarian. This messes with the order of translations. This, that and the other (talk) 04:50, 15 August 2022 (UTC)[reply]

@This, that and the other The "in grammar" section looks good to me, while the "main topic" looks bad. If this is intended, then your fix should work. Benwing2 (talk) 06:00, 15 August 2022 (UTC)[reply]

That's good to know. Yes, I made "main topic" look bad on purpose as a point of comparison. We should set {{trans-mid}} to * on a line by itself. This, that and the other (talk) 06:40, 15 August 2022 (UTC)[reply]

Off-topic remark: I'm not sure what has ruined the translation tables of the Turkish and Armenian editions of Wiktionary. --Apisite (talk) 07:17, 15 August 2022 (UTC)[reply]

tr:sulamak looks fine to me - in fact, it looks like they have got in ahead of us and implemented CSS columns already. I found it hard to locate a translation table on Armenian Wiktionary - can you point to an example? This, that and the other (talk) 08:57, 15 August 2022 (UTC)[reply]

senseno not working in reconstructions

It is asking me to add * before the word, but we are not giving words in this template at all, it just links to senseid on the same page. Error:

Lua error in Module:links at line 56: The specified language Proto-Slavic is unattested, while the given word is not marked with '*' to indicate that it is reconstructed

Sławobóg (talk) 09:29, 15 August 2022 (UTC)[reply]

Should work now. 24.137.99.97 14:07, 15 August 2022 (UTC)[reply]

It is working now. However it is bugged in real time preview (pic). After saving, it works fine, but it might be worth fixing. t:reconstructed had the same problem. Sławobóg (talk) 14:22, 15 August 2022 (UTC)[reply]

Impossible to fix given the way the module works: it reads the page content and scans for the relevant senses. If the senses aren't on the saved page yet, it has nothing to work with. It could be changed to output a less scary error message, I suppose. 24.137.99.97 14:34, 15 August 2022 (UTC)[reply]
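
A rough illustration of why the preview fails, written in JavaScript rather than the module's Lua. The function findSenseIds is hypothetical and only mimics the idea of scanning saved page text for {{senseid}} calls.

```javascript
// Collect the ids of all {{senseid|<lang>|<id>}} calls in the saved wikitext.
function findSenseIds(savedWikitext, lang) {
  const ids = [];
  const pattern = /\{\{senseid\|([^|{}]+)\|([^|{}]+)\}\}/g;
  let m;
  while ((m = pattern.exec(savedWikitext)) !== null) {
    if (m[1] === lang) ids.push(m[2]);
  }
  return ids;
}

// During preview of a new edit, the saved text does not contain the new senses yet
console.log(findSenseIds('', 'en'));
console.log(findSenseIds("# {{senseid|en|child's toy}} a toy horse", 'en'));
```

Because the scan runs on the saved revision, anything added in the current edit is invisible until the page is saved.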

Maybe this helps. 24.137.99.97 14:54, 15 August 2022 (UTC)[reply]

It's good now. IMO, it would be a good idea if the sense would highlight when you hover the cursor over senseno (clicking should still be possible). Sławobóg (talk) 15:00, 15 August 2022 (UTC)[reply]

That would have to be done using JavaScript. The following snippet seems to do the job:

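// Highlight the sense that a same-page senseid link points to while hovering over the link.
function setSenseHighlight(link, color) {
  var target = document.getElementById(decodeURIComponent($(link).attr('href').substring(1)));
  if (target) target.style.backgroundColor = color; // skip links whose anchor is missing
}
$("a[href^='#'][href*=':_']").hover(
  function() { setSenseHighlight(this, "#DEF"); },  // mouse enters the link
  function() { setSenseHighlight(this, ""); }       // mouse leaves the link
);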

142.166.21.76 19:59, 15 August 2022 (UTC)[reply]

(1) Where would this JavaScript go? (2) What is the issue with Module:links that needs fixing? Your change to Module:senseno seems to be hacking around a bug elsewhere, which is bad coding practice; we should fix the bug at the source. (3) Please please create an account. There are several knowledgeable IPs around here and it's highly annoying dealing with them, as it's impossible for me to sort out how many of them there are or which ones are the same person as which other ones. If you're worried about anonymity, you surely know that accounts are more anonymous than IPs, since an IP address can be geolocated (in your case to St John's, Newfoundland). Benwing2 (talk) 04:40, 16 August 2022 (UTC)

(1) MediaWiki:Common.js seems like it would work, I suppose, but I can't test that. The only issue I could foresee is that it requires jQuery to be loaded by ResourceLoader, but it looks like the code in Common.js already assumes it's loaded. The code basically just implements the feature Sławobóg requested where mousing over a senseno link highlights the sense, instead of having to click on it. I can't comment as to the desirability of the feature.

(2) The entry hobby horse is a good example. On this entry one of the senseids is {{senseid|en|child's toy}}. If you try to link to this sense using any of the linking templates, e.g. {{l|en|hobby horse|id=child's toy}}, it generates the link [[hobby horse#English:_child%27s toy]]. That looks reasonable, but when actually visiting the link the sense is not highlighted. This is because the anchor ID itself is already escaped as English:_child%27s_toy, implying that the correct link would have to be doubly escaped as [[hobby horse#English:_child%2527s_toy]]. 142.166.21.76 05:02, 16 August 2022 (UTC)[reply]
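
The double escaping can be checked with a few lines of JavaScript. This only demonstrates the string behaviour described above; the helper name escapeForFragment is hypothetical and is not how Module:links builds its links.

```javascript
// The anchor id as it exists on the page (already percent-escaped once)
const anchorId = 'English:_child%27s_toy';

// Escape the percent sign itself so that one round of decoding yields the id
function escapeForFragment(id) {
  return id.replace(/%/g, '%25');
}

const fragment = escapeForFragment(anchorId);
console.log(fragment);
console.log(decodeURIComponent(fragment) === anchorId);
```

One round of decoding, as a browser applies when resolving a fragment, turns %2527 back into %27, which matches the id on the page.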

→ before senseid

I want to mention another bug here, with a related template.

# → {{senseid|en|smth}} [[asdf]]

It renders as:

→
asdf

It makes it impossible to add senseid for "derived meanings". Although perhaps it would be good if "derived meanings" had some better solution, e.g. a template. Sławobóg (talk) 13:31, 30 August 2022 (UTC)[reply]

@Sławobóg: The documentation for {{senseid}} says "The template must be placed at the beginning of a definition line". So you need to put {{senseid}} before the arrow.

# {{senseid|en|smth}} → [[asdf]]

→ asdf

That said, I'm not even sure that the '→' notation is standard or would be well-understood by readers. It should probably be in the Symbols section of Appendix:Glossary at least. 98.170.164.88 23:54, 30 August 2022 (UTC)[reply]

Ok, I'm blind, sorry. It is not a standard thing, but it would be nice to have it. Surjection made a test here. Sławobóg (talk) 10:48, 31 August 2022 (UTC)[reply]

Collation severely messing things up in Template:akk-sign values

Look at 𒊨#Sign values, ‘Phonetic values’ rubric. What’s happening there? Looking at the module code I reckon that collation is at fault. ―Biolongvistul (talk) 10:05, 17 August 2022 (UTC)[reply]

~~What collocations even exist on the page?~~ I can't read. Vininn126 (talk) 11:23, 17 August 2022 (UTC)[reply]

@Biolongvistul: The template doesn't support qualifiers, so I don't know why you are expecting them to work. — Fenakhay ^{(حيطي · مساهماتي)} 11:32, 17 August 2022 (UTC)[reply]

Well, 𒁲 is working just fine, how is that? --Biolongvistul (talk) 14:34, 17 August 2022 (UTC)[reply]

@Biolongvistul: Working fine in some cases and being supported are two different things. The module wasn't intended to support qualifiers. I can see it is not working because you are passing multiple qualifiers (the extra |). I don't have time to add this feature at the moment. — Fenakhay ^{(حيطي · مساهماتي)} 14:43, 17 August 2022 (UTC)[reply]

This issue arises because the module splits phonetic values at commas (Module:akk-sign_values#L-27), which are also used between multiple qualifiers. It would be possible to change the splitting to be more sophisticated, but I think it might be better to add separate qualifier parameters instead of trying to jam everything into one string which then has to be parsed. 142.166.21.76 15:04, 17 August 2022 (UTC)[reply]
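The clash between comma splitting and comma-separated qualifiers can be sketched in JavaScript as follows. The input format (qualifiers in parentheses after a value) and all names here are hypothetical, chosen only to illustrate the point; the real module is written in Lua and its qualifier syntax may differ.

```javascript
// Hypothetical input: two values, the first carrying two qualifiers.
const raw = "aa (q1, q2), bb";

// Naive approach: split on every comma.
// The qualifier list is cut in half, giving three pieces instead of two.
const naive = raw.split(",").map((piece) => piece.trim());

// Alternative suggested above: keep values and qualifiers in separate
// parameters, so nothing has to be parsed back out of one string.
const separate = [
  { value: "aa", qualifiers: ["q1", "q2"] },
  { value: "bb", qualifiers: [] },
];

// Join the structured entries back into display text.
function formatValues(entries) {
  return entries
    .map((e) =>
      e.qualifiers.length > 0
        ? `${e.value} (${e.qualifiers.join(", ")})`
        : e.value
    )
    .join(", ");
}

console.log(naive);
console.log(formatValues(separate));
```

With separate parameters the display string can still be produced, but the module never has to guess which commas separate values and which separate qualifiers.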

The module was only designed to display a list of text in an alphabetical order, as there was no need for qualifiers. And personally I don't know why we need them. If you want to add support for qualifiers, go ahead and implement it. — Fenakhay ^{(حيطي · مساهماتي)} 15:08, 17 August 2022 (UTC)[reply]

Rhymes and Category:Rhymes

I noticed we have both Rhymes and Category:Rhymes pages here, which serve basically the same scope, the latter objectively more efficiently than the former. My experience with Rhymes pages has been either (1) it's a redlink or (2) it's a bluelink, but it has only a portion of the entries present in its Cat counterpart, and also might have some entries which in the meantime have become red or yellow links. In general the fact that rhymes must be inserted manually through a textbox in a project where everything is automated with templates and categories sounds a bit uncharacteristic.

It's true that Rhymes pages have some information that the Cat pages lack, like a Pronunciation section (for people who can't read the title of the page), Spelling section (for people who can't look at the entries contained in the Category), mergers information (probably the only useful thing), etc (?).

My proposal is thus:

add the (valuable) information to the Cat pages (easily automated)
make {{rhymes}} link to the Cat pages and not the Rhymes pages

We could probably also make good use of better subcategorizing, for example Category:Rhymes:Italian/ate could be a subcategory of Category:Rhymes:Italian/a–e, which would also contain Category:Rhymes:Italian/are and Category:Rhymes:Italian/arne, but that's a postponable proposal.

Catonif (talk) 15:10, 17 August 2022 (UTC)[reply]

I think, or at least hope, this is the goal, to slowly move everything in Rhymes: to the categories. - -sche (discuss) 18:15, 17 August 2022 (UTC)[reply]

That would be nice. Vininn126 (talk) 19:59, 17 August 2022 (UTC)[reply]

The combining-diacritic section of the IPA/enPR character-input menu is borked

The Wiktionary character-input menu, with the borked combining diacritics indicated by the black arrow.

In the "IPA and enPR" edit-page character-input menu, the latter part of the diacritics section, containing combining diacritics, is borked and unusable, with the combining diacritics zalgoing on top of each other and unclickable (see first image to the right; you'll need to view the full-size image to see the problem clearly).

The Wikipedia character-input menu, with the combining diacritics, this time non-borked, indicated, again, by the black arrow.

This appears to be a Wiktionary-specific problem (rather than a bug with the MediaWiki software itself), as the combining-diacritic section of the Wikipedia IPA character-input menu is not borked (see second image to the right; again, you'll need to look at the full-size image to get a clear view of the situation there).

Could an interface admin please fix this? Whoop whoop pull up ^{Bitching Betty ⚧️ Averted crashes} 17:25, 17 August 2022 (UTC)[reply]

They display correctly for me in Firefox (whether I'm logged in or out), but display incorrectly in Chrome (whether logged in or out). There's been no change to MediaWiki:Edittools recently. Maybe Chrome has recently changed how it handles something we're doing; Wikipedia seems like it might be doing something different from us since their characters are all in their own boxes. - -sche (discuss) 18:11, 17 August 2022 (UTC)[reply]

Looking at the source for the (working) Wikipedia edittools and comparing it to the source for our own (borked) edittools, I can only find two significant differences:

The Wikipedia edittools automatically loads the relevant input menus from a separate JavaScript gadget, while ours keeps everything in the edittools itself.
The IPA combining diacritics in their edittools are mounted on regular spaces, while the borked ones in our edittools are mounted on non-breaking spaces inserted as character references.

I have no idea if either of these could actually explain this problem (and the latter one, at least, seems a bit unlikely, given that combining-character-on-nbsp works fine in, e.g., headword-line templates).

And it also turns out the problem is worse than I originally thought. Nearly all of the combining characters in our edittools are borked, not just the ones in the IPA menu. The only combining characters anywhere in our edittools that actually work at the moment, at least on Chrome, are the ones in the dedicated modifiers-and-combining-diacritics menu and in the Arabic, Cyrillic, Greek, and Hebrew menus (all of these menus, unlike the others, enclose their combining diacritics in their own individual span tags, interestingly enough), plus a few isolated examples from the other menus: the Myanmar nga-killer-virama complex (at least, that's what the codepoint readout says) from the Burmese menu, the candrabindu from the Devanagari menu, and the superscript alaph from the Syriac menu. This leaves the Burmese, Devanagari, and Syriac menus with almost all their combining characters borked, as well as every single combining character from the IPA/enPR, Khmer, Lao, and Sinhala menus. In addition, a number of spacing characters are also borked and unclickable in the edittools: the sof pasukh from the Hebrew menu, plus around half of the vowel nuclei from the Lao menu. Pardon my French, but merde. Whoop whoop pull up ^{Bitching Betty ⚧️ Averted crashes} 19:41, 17 August 2022 (UTC)[reply]

I use Firefox, but looked at the IPA diacritics in Chrome as well and the problem seems to be that MediaWiki:Gadget-Edittools.js replaces &nbsp; with ASCII space in the displayed text of combining characters in MediaWiki:Edittools, and ASCII space plus combining character isn't clickable in Chrome. No-break space plus combining character is clickable in my Chrome (though it's narrow), so editing the JavaScript should fix the problem. (In Firefox the ASCII space plus rhotic hook U+02DE is unclickable... not sure why.)

I think Chrome must have changed their font rendering because both Firefox and Chrome render the IPA diacritics in Arial, but they are unclickable in Chrome. — Eru·tuon 08:20, 18 August 2022 (UTC)[reply]
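The idea of keeping a no-break space as the base for combining characters could look roughly like the sketch below. The function names are hypothetical and this is not the actual code of MediaWiki:Gadget-Edittools.js; it only shows the principle of mounting combining marks on U+00A0 in the displayed text.

```javascript
const NBSP = "\u00A0";

// True if the text starts with a combining mark (Unicode general category M).
function startsWithCombiningMark(text) {
  return /^\p{M}/u.test(text);
}

// Text to display for one insertable character: combining marks are
// mounted on a no-break space, everything else is shown unchanged.
function displayText(insertText) {
  return startsWithCombiningMark(insertText) ? NBSP + insertText : insertText;
}

// U+0303 COMBINING TILDE gets a no-break space as its base.
console.log(JSON.stringify(displayText("\u0303")));
console.log(JSON.stringify(displayText("a")));
```

Whether the resulting link is wide enough to click comfortably is a separate styling question, as the follow-up below notes.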

I fixed the unclickable diacritics and made diacritics a bit wider so they are easier to click on. Unfortunately, this also increases the spacing between other characters in the Edittools and I don't know how to fix that. — Eru·tuon 22:53, 18 August 2022 (UTC)[reply]

At least it works now, though, so thanx! Whoop whoop pull up ^{Bitching Betty ⚧️ Averted crashes} 01:05, 21 August 2022 (UTC)[reply]

Something wrong with sections on mobile

Looks like there's been something going wrong with sections in mobile, which doesn't show contents when I tap to expand them. Temporary issue?-TagaSanPedroAko (talk) 19:05, 17 August 2022 (UTC)[reply]

Oh, it just went back to normal. -TagaSanPedroAko (talk) 19:07, 17 August 2022 (UTC)[reply]

Suddenly all section links deliver me to the top of the page

e.g. so#Spanish goes to the Spanish subheader for just a fraction of a second and then brings me back to the top of the page. Copy/pasting the URL also does that, and refreshing the page doesn't change anything either. Recent code update? Thanks, —Soap— 22:13, 18 August 2022 (UTC)[reply]

It also happens on Wikipedia so this may be outside our purview but I want to leave this section up in case anyone else comes here to see what's going on. Hopefully it's fixed soon. —Soap— 22:14, 18 August 2022 (UTC)[reply]

I'm experiencing this too, whether in Firefox or Chrome or Edge, whether logged in or out, and also if I type a section link into the search bar, or e.g. when I save an edit to this section of this page. Yesterday I was experiencing the page jumping, but not all the way to the top or bottom (I don't recall whether it was jumping up or down). - -sche (discuss) 22:44, 18 August 2022 (UTC)[reply]

Yes, at el.wikt too! I thought I'd done something very wrong!! ‑‑Sarri.greek ^♫ I 23:07, 18 August 2022 (UTC)[reply]

Fixed !! ‑‑Sarri.greek ^♫ I 01:10, 19 August 2022 (UTC)[reply]

Section links seem to be working as normal for me now... - -sche (discuss) 20:35, 19 August 2022 (UTC)[reply]

Q fouling up suppression of quotations

{{Q}} seems to be fouling up the suppression of quotations. While the citation information seems to disappear, the quotation itself is remaining even when one hides quotations. There's an example in the first sense of book. The presence or absence of the parameter |notes= seems to affect the behaviour, at least in User namespace, but this may be a parallel bug. --RichardW57m (talk) 14:50, 19 August 2022 (UTC)[reply]

Belay that. The problem I'd been getting elsewhere was {{Q}} having a surprising sensitivity to line breaks before pipes, particularly the one of the |quote=. The problem in book is something else - a mixture of quotations and usage examples. --RichardW57m (talk) 15:17, 19 August 2022 (UTC)[reply]

Request for a mass-move of entries containing եւ

I am asking a bot-owner to mass-move the 300+ entries at [2] containing the sequence <եւ> to a page title where <եւ> is substituted with <և>, i.e. արեւ should be moved to արև. Please leave a redirect. No need to move the page եւ.

<եւ> and <և> are the orthographical variants of the same thing. The Armenian Wiktionary uses the second one, we use the first one; interwikis are not generated between us for that reason.

I have taken care of linking. Vahag (talk) 18:01, 19 August 2022 (UTC)[reply]
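The title mapping requested above can be sketched in JavaScript as follows. This covers only the string substitution; the function name is hypothetical, and the actual move (including leaving a redirect) would be done by whatever bot framework the operator uses.

```javascript
const OLD_SEQ = "\u0565\u0582"; // եւ (two letters)
const NEW_SEQ = "\u0587";       // և (ligature)

// Returns the new title for a page, or null if the page should not be moved.
function newTitle(title) {
  if (title === OLD_SEQ) return null;         // the page եւ itself stays put
  if (!title.includes(OLD_SEQ)) return null;  // nothing to substitute
  return title.split(OLD_SEQ).join(NEW_SEQ);  // replace every occurrence
}

console.log(newTitle("\u0561\u0580" + OLD_SEQ)); // արեւ becomes արև
```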

I'm not sure if a bot would have the necessary privileges to move a page over a redirect. If not, I or another bureaucrat can certainly grant them temporarily. Chuck Entz (talk) 18:15, 19 August 2022 (UTC)[reply]

Done Make sure all necessary templates work with the new titles. — SURJECTION ^{/ T / C / L /} 19:12, 19 August 2022 (UTC)[reply]

Thanks! --Vahag (talk) 19:14, 19 August 2022 (UTC)[reply]

Request for new non-lemma part of speech: clitic form

Hi - Mongolian has a feature called vowel harmony, where certain morphemes inflect differently depending on the surrounding vowels. In Mongolian, this impacts a large number of inflectional suffixes. Due to some idiosyncrasies in the conversion from the traditional to the Cyrillic script, a small number of suffixes are written as clitics in Cyrillic: most notably the directive/directional case markers руу (ruu) and луу (luu). These have the counterparts рүү (rüü) and лүү (lüü), which are currently classed as having unrecognised parts of speech.

Could I therefore please request "clitic form" be added to the list of non-lemmas? Theknightwho (talk) 17:05, 21 August 2022 (UTC)[reply]

Just to add that this also seems to affect Finnish. For example, -kö. Theknightwho (talk) 18:19, 21 August 2022 (UTC)[reply]

Aren't рүү (rüü) and лүү (lüü) clitics (i.e. lemmas)? This seems to be how Finnish and Turkish handle suffixes etc which alternate based on surrounding vowels, when it comes to what part of speech they are, even if the definitions define them as alternative forms as far as what they mean; on the face of it, that seems reasonable to me. I would think a non-lemma "clitic form" would be something more akin to how -ōrum is a "suffix form" because it is an inflected form of -or. Granted, Turkish seems to handle these by completely duplicating all information, e.g. between -lık, -lik, -luk, and -lük, which seems less than ideal, although maybe it's fine and comparable to English -head vs -hood; but even if we centralize the definitions like is done in Finnish, that doesn't necessarily change the POS. I suppose it depends on whether рүү is considered more like an inflected form of руу, or a (conditional) alternative form; on the face of it it seems more like the latter. - -sche (discuss) 22:04, 21 August 2022 (UTC)[reply]

@Theknightwho, -sche The closest analogous situation I could come up with that is well-defined in Wiktionary is Celtic mutations. Cf. Cornish deg (“ten”) with mutated variants teg and dheg, which are categorized as non-lemma forms but not as numeral forms; instead they show up under Category:Cornish mutated numerals. The question here is whether there is reason to privilege Mongolian руу (ruu) over рүү (rüü). If so, the former should be the lemma and the other a non-lemma form, but not necessarily a "clitic form"; if not, they should both be variant lemmas. However, I question whether we need the "clitic" lemma category at all; why can't "suffix" suffice? Benwing2 (talk) 00:16, 22 August 2022 (UTC)[reply]

The reason for lemmatising руу (ruu) is because that's the default form you'll encounter in the literature. The common shorthand would be руу². The same goes for the four-way sets, where -аас (-aas), -оос (-oos), -ээс (-ees) & -өөс (-öös) would be given together as -аас⁴.

The primary reason for wanting these to be forms rather than lemmas is because we don't want to arbitrarily divide the "words suffixed with..." categories in two or four. Fundamentally, it's the same suffix (or clitic). Theknightwho (talk) 00:24, 22 August 2022 (UTC)[reply]

Ah. I would say "should the categories be merged or split?" and "what part of speech are these / are these lemmas?" are separate questions, and we could merge the categories while still viewing рүү as equally a lemma alongside (rather than an inflected form under) руу. I mean, from time to time it has been proposed that not only {{af}} but also {{compound}} should categorize, and I'm inclined to think that if we created a *"Category:English compounds containing favo(u)r", it should contain both spellings, but that doesn't mean favour is a non-lemma form or "noun form" of favor or vice versa. But I'll defer to editors of languages like Finnish or Turkish or Mongolian that have such vowel harmony variations in suffixes/clitics if they think this should be considered a kind of inflection / creating non-lemma forms. - -sche (discuss) 00:57, 22 August 2022 (UTC)[reply]

@Theknightwho Tagging рүү (rüü) as a clitic form of руу (ruu) won't fix the issue of categories being split (I know this definitively because I've done a lot of work on Module:compound, which does the categorization). The only auto-merging of categories in that code happens as a result of the entry-name mappings in the language data modules, which do things like remove macrons and acute accents. We'd need a whole new mechanism (unless all editors are willing to manually use a two-part link or similar in the {{affix}} call, which IMO isn't a good solution because it's easy to mess up). Benwing2 (talk) 04:01, 22 August 2022 (UTC)[reply]

@Benwing2 That is a good point. I suppose what I was getting at was that it's a way of clearly demarking that they're the same lexeme, even though there might be morphological differences (which sometimes only exist on paper, anyway). Re why we have clitics, I think the best example of a term that is neither a word nor affix is Latin -que.

@-sche I think the difference between alternative forms and inflectional forms is that inflectional forms are determined based on contextual rules, rather than simply being at the discretion of the speaker (e.g. English spelling differences). Fundamentally, we always choose one inflectional form to lemmatise over the other(s), which is usually chosen so as to be consistent with dictionaries in that language, etc., rather than for any linguistic reason. Mongolian linguists happen to lemmatise back vowels over front vowels and unrounded vowels over rounded, so that's what I've gone with. Theknightwho (talk) 10:39, 22 August 2022 (UTC)[reply]

@Theknightwho -que is a clitic because it attaches to arbitrary parts of speech; does руу (ruu)/рүү (rüü) behave in the same way, or does it always attach to the last noun/adjective in a noun phrase (in which case it would be a suffix with Suffixaufnahme applying)? As for руу (ruu)/рүү (rüü), I continue to believe these are not "clitic forms". Fundamentally, they are more similar to Celtic mutations (as I mentioned above) or to Italian apocopic forms (e.g. anch' vs. anche). The basic commonality of Celtic mutations, "Altaic" vowel-harmonic forms and Italian apocopic forms is that they are all *phonological* variants, whereas "POS forms" is intended specifically for *morphological* variants. Otherwise in Italian we'd need Category:Italian preposition forms, Category:Italian adverb forms and such, which seems very weird. Apocopic forms are treated as lemmas in their own right, whereas mutations are treated as non-lemma forms, but not inflectional variants. I would argue that vowel-harmonic variants are all lemmas. Benwing2 (talk) 02:17, 23 August 2022 (UTC)[reply]

@Benwing2 Re whether руу (ruu) is a clitic, perhaps a better example is аа⁴ (aa⁴), which denotes the focus of a sentence, and is very roughly equivalent to English prosodic stress (e.g. "I am going to do that."). It's inherently reliant on the presence of another term, but not fundamentally connected to any part of speech. The point is that a clitic which obeys vowel harmony undoubtedly exists.

As for whether clitic forms exist: the four-way division is an artificial construct of the Cyrillic script. The Mongolian script only has two forms for that same term: аа (aa) and оо (oo) both represent ᠠ (a), while ээ (ee) and өө (öö) both represent ᠡ (e). I could buy the argument that they're similar to mutations, but it seems implausible that they're genuinely separate lemmas. Theknightwho (talk) 04:00, 23 August 2022 (UTC)[reply]

Using |altN= in affix templates is exactly what is already done with Finnish entries. Mistakes can definitely happen, but it's not like this is the only thing where mistakes can happen (IDs in my experience for example result in a lot more categorization mistakes than this). — SURJECTION ^{/ T / C / L /} 07:09, 26 August 2022 (UTC)[reply]

Just noting that the entire notion of split categories should not arise with clitics. If something is a clitic, it has no business having a "words derived with" category assigned to it. Equivalently, if something definitely needs a category, it should not be a clitic but an affix. --Tropylium (talk) 09:20, 26 August 2022 (UTC)[reply]

Category:Finnish words suffixed with -kin does appear to exist, even though -kin is considered a clitic. — SURJECTION ^{/ T / C / L /} 09:50, 26 August 2022 (UTC)[reply]

Leaving aside the question of inflections, you're forgetting about lexicalised terms that derive from clitics such as Latin unusquisque. Theknightwho (talk) 11:16, 26 August 2022 (UTC)[reply]

The |altN= solution is also used in Mongolian. It would be good to have a more automated solution, though. Theknightwho (talk) 11:19, 26 August 2022 (UTC)[reply]

I know different languages might conventionally treat their vowel-harmony-variants differently, but because we should at least consider cross-linguistic consistency and Finnish has been brought up above, I'm pinging some Finnish speakers @Brittletheories, Hekaheka, Mölli-Möllerö, Surjection, Tropylium in a bid to get more input: this is an esoteric question, but if you have an opinion, at least for Finnish, do you think vowel harmony variants like -ko ~ -kö are better considered as context- (vowel-harmony-) dependent lemmas ("clitics" or "suffixes"), like say English a vs an or Italian anche vs anch' are all lemmas, or would it be better to consider only one of them the lemma and to consider the others to be non-lemma forms of it ("clitic forms" or "suffix forms"), the way past-tense verb forms or plural noun forms are non-lemma forms of base verbs/nouns? - -sche (discuss) 17:47, 25 August 2022 (UTC)[reply]

I have no opinion on whether the alternative forms are considered lemmas or not (as in whether they are categorized under lemma categories or nonlemma categories), but I stand by the notion that we pick one form as the representative form under which we document the suffix and the other forms should be soft redirects. This is how such cases are usually handled. More grammatical sources tend to use shorthand notation to cover all variants, such as using A, O, U for a/ä, o/ö, u/y respectively (-ko, -kö would be represented as -kO). Any sources that list either form or both forms always either list the back-vowel form (a, o, u) or list it first. As a pertinent side note, I don't think even "clitic" is currently accepted under our current configuration (it's considered an unrecognized part-of-speech). — SURJECTION ^{/ T / C / L /} 18:38, 25 August 2022 (UTC)[reply]

I completely agree with this, yes, and it sounds like Finnish and Mongolian take similar approaches.

I do think we should have "clitic" as a part of speech - you could argue that they're often other parts of speech (e.g. Latin -que is a conjunction and English 've is a verb etc.), but we should still recognise them as what they are. Plus those that would otherwise be classed as particles (e.g. Mongolian аа (aa)) are better referred to as clitics outright, since "particle" is just the miscellaneous category of parts of speech. Theknightwho (talk) 18:47, 25 August 2022 (UTC)[reply]

Hungarian palindrome bug

gazság is categorized as a palindrome, which is wrong, since the <zs> in the middle is not one letter ж, but two letters зш, straddling a morpheme boundary. I didn't see in the page's code how it's getting called a palindrome. The pronunciation has "gaz#ság", where the number sign shows that <zs> is not a digraph; can some similar annotation prevent it from being called a palindrome? PierreAbbat (talk) 05:42, 22 August 2022 (UTC)[reply]

@PierreAbbat This category is added in Module:headword. This uses Module:palindromes, which can be hacked not to treat <zs> as a digraph, but then no words with <zs> in the middle would be palindromes. If this needs to be contextual, we'd need to add a flag to {{head}} to disable the palindrome category, and thread it through {{hu-noun}} and all other Hungarian head templates. Benwing2 (talk) 02:21, 23 August 2022 (UTC)[reply]

@PierreAbbat, Adam78 (the latter as a recently-active Hungarian-speaking editor): Is the pronunciation given, /ˈɡɒʃːaːɡ/, correct? If correct, the pronunciation suggests that the -zs- is being treated as one sound regardless of the etymology, and so could also be considered one thing for palindrome purposes; OTOH, the entry was created by a now-indef-blocked user, so it could be wrong.
Adam, can you speak to whether this word (or -zs- words in general) should be considered palindromes? - -sche (discuss) 20:53, 31 August 2022 (UTC)[reply]

@-sche The pronunciation is correct: there is a z followed by an s, the latter pronounced [ʃ] as usual in Hungarian (rather than a digraph zs), and the first sound (z) undergoes total assimilation to become identical with the subsequent sound, hence the long [ʃː]. Therefore it is the result of two sounds that assimilate, rather than one sound, as the written form might suggest. (There are other words as well where the spelling may be misleading; I used to write a Wikipedia article on them.) In my opinion it cannot be called a palindrome, not only because z+s is not the same as s+z (which it would be backwards) but also because a is not the same as á; they are distinct phonemes. @Panda10 just for your information. Adam78 (talk) 21:57, 31 August 2022 (UTC)[reply]

@Adam78 As far as I can tell, vowel length is ignored in checking for palindromes, e.g. nőkön, szemész. PierreAbbat (talk) 02:34, 1 September 2022 (UTC)[reply]

So, would you say that no words with zs should be considered palindromes? That'd be easier to fix/implement than if only some words where 'z' and 's' belong to different morphemes need to be special cased. What about sz: szusz is currently categorized as a palindrome, is that right? - -sche (discuss) 03:30, 1 September 2022 (UTC)[reply]

@-sche No, on the contrary. I think all such words with digraphs should be considered palindromes (including e.g. szusz), except those few (!) where the digraph is a false digraph / pseudo-digraph (compare Category:English terms with consonant pseudo-digraphs), such as in gazság. I wonder if the latter group could be excluded on the basis of having a hashmark in the pronunciation field or perhaps with a template like {{nopalindrome}}. Adam78 (talk) 09:49, 1 September 2022 (UTC)[reply]

@Adam78 The problem is that the palindrome code is in Module:headword and so the module code only sees the headword template that was passed to it, unless it decides to scan the entire page for special templates or marks, which is considered an "expensive" operation and in fact will eat up significant memory and time. Benwing2 (talk) 02:43, 3 September 2022 (UTC)[reply]

@Benwing2, Adam78: Would you be open to adding a new parameter to {{head}}? Something like nopalindrome=y or whatever makes sense. Panda10 (talk) 16:41, 4 September 2022 (UTC)[reply]

@Panda10, Adam78 Yeah this should be possible. There's already noposcat, nomultiwordcat and nogendercat so maybe it should be nopalindromecat. Benwing2 (talk) 17:15, 4 September 2022 (UTC)[reply]

@Benwing2 It would be great, thanks. Panda10 (talk) 17:17, 4 September 2022 (UTC)[reply]

@Panda10, Adam78 Try it now. You will have to add a |nopalindromecat= param to {{hu-noun}} or whatever and pass it into {{head}}. Benwing2 (talk) 17:21, 4 September 2022 (UTC)[reply]

@Benwing2, Adam78 I'm sorry I don't know how to do this. There are five templates that call {{head}}: {{hu-noun}}, {{hu-verb}}, {{hu-adj}}, {{hu-pron}}, {{hu-adv}}. I'd appreciate your help. Panda10 (talk) 17:33, 4 September 2022 (UTC)[reply]

@Benwing2: Thank you very much for updating the templates and cleaning up the documentation in {{hu-noun}}. It was extremely helpful. @Adam78: We might want to document the rules for Hungarian palindromes in the category itself. Do we take this strictly or not so much? I don't understand how szusz can be a palindrome, unless sz is treated as sz backwards and not zs. Same for the other digraphs. I agree that a ≠ á, e ≠ é, i ≠ í, o ≠ ó, ö ≠ ő, u ≠ ú, ü ≠ ű but some would interpret them as the same in palindromes. I could not find any official rules for this. Panda10 (talk) 17:26, 5 September 2022 (UTC)[reply]

@Panda10 The rules for Hungarian palindromes are language-specific and are treating digraphs like single characters, and mapping accented chars to non-accented chars. They are here: Module:palindromes/data. This can be changed. Benwing2 (talk) 17:43, 5 September 2022 (UTC)[reply]
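The two rules just described (digraphs treated as single characters, accented characters mapped to unaccented ones) can be sketched in JavaScript as below. This is an illustration only, not the contents of Module:palindromes/data, which is Lua. The placeholder characters and the exact accent mapping are assumptions, and the left-to-right digraph matching is deliberately naive: it cannot tell a real digraph from a pseudo-digraph across a morpheme boundary, which is exactly the gazság problem.

```javascript
// Assumed placeholder table: each digraph/trigraph becomes one character.
const DIGRAPHS = {
  dzs: "\u01C6", cs: "\u010D", dz: "\u010F", gy: "\u01F0", ly: "\u013E",
  ny: "\u0148", sz: "\u0161", ty: "\u0165", zs: "\u017E",
};

// Assumed folding of long vowels onto their short counterparts.
const ACCENTS = {
  "\u00E1": "a",      // á
  "\u00E9": "e",      // é
  "\u00ED": "i",      // í
  "\u00F3": "o",      // ó
  "\u0151": "\u00F6", // ő -> ö
  "\u00FA": "u",      // ú
  "\u0171": "\u00FC", // ű -> ü
};

function fold(word) {
  const replaced = word
    .normalize("NFC")
    .toLowerCase()
    // "dzs" comes first so it wins over "dz" at the same position.
    .replace(/dzs|cs|dz|gy|ly|ny|sz|ty|zs/g, (d) => DIGRAPHS[d]);
  return Array.from(replaced, (ch) => ACCENTS[ch] || ch).join("");
}

function isPalindrome(word) {
  const chars = Array.from(fold(word));
  return chars.join("") === [...chars].reverse().join("");
}

// Mirrors the nopalindromecat flag discussed above: the entry opts out manually.
function palindromeCategory(word, nopalindromecat = false) {
  return !nopalindromecat && isPalindrome(word);
}

console.log(isPalindrome("szusz"));
console.log(isPalindrome("gazság"));
console.log(palindromeCategory("gazság", true));
```

Under these assumptions gazság is still detected as a palindrome by the automatic check, so only a per-entry flag such as nopalindromecat keeps it out of the category.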

@Benwing2 This is very useful. Thank you. Panda10 (talk) 17:28, 6 September 2022 (UTC)[reply]

@Benwing2: I thank you too!! @Panda10 Apparently, this notion of palindromes is based on the ideas that digraphs form letters on their own (as fundamental elements of the alphabet) and they cannot be split up into their constituting letter elements (betűjegy). I don't know if it's a good idea as far as Hungarian is concerned, but e.g. the well-known palindrome sentence Indul a kutya s a tyúk aludni can't work unless we keep the order of letter elements within ty (and this sentence also disregards vowel length). The case is the same for gy, ly, ny, and generally for cs and dz too, so it's only sz and zs that could be reversed: only these latter two digraphs could be viewed as exceptions when creating palindromes. – It's a pity the software can't collect words that are different when spelled backwards but both forms are meaningful, e.g. ingovány and nyávogni (mind you, the ny is treated as a single letter here as well). Anyway, in the current system we can have many more palindromes than otherwise, due to the many (predominantly) useless letter sequences like yg, yl, yn, yt, and sc, which would arise. On the other hand, the language-specific treatment of accents like ó and ő could be changed as far as Hungarian is concerned, even if length is disregarded, as it is done in Hungarian dictionaries and, incidentally, in crossword puzzles (o = ó and ö = ő); the same way for ú vs. ű as opposed to u/ú and ü/ű. A and á as well as e and é are treated as the same in dictionaries, while (as far as I remember) they are treated as distinct entities in crossword puzzles. Adam78 (talk) 18:12, 5 September 2022 (UTC)[reply]

@Adam78: Thank you for the clarifications. I hope I understand this correctly: You are saying that the current settings are adequate and no more changes are necessary. The few exceptions such as gazság will be handled manually with the new parameter. Panda10 (talk) 16:47, 6 September 2022 (UTC)[reply]

@Panda10: I'm sorry I wasn't clear about the suggested conclusion. Benwing said these two features can be switched on or off for individual languages:

treating digraphs like single characters
mapping accented chars to non-accented chars.

What I meant to say was that (in my opinion) the first is okay the way it is now, as far as Hungarian is concerned, and the second should be modified to distinguish o/ó from ö/ő, as well as u/ú from ü/ű, and probably a and á should be distinguished too, just like e and é, while the other five vowel pairs that differ only in length (i/í, o/ó, ö/ő, u/ú, and ü/ű) should be treated as the same. I tend to think that this practice would best reflect Hungarian native speakers' notion of how palindromes actually work, what is viewed as acceptable and what is not. What do you think about it? – PS: I went through a collection of palindromes I once saved from deletion and I noticed that sz/zs and dz/zd are actually reversed in palindromes (i.e. treated like separate letters), as opposed to the other digraphs that are irreversible (unswappable). Maybe these two digraphs should be taken as exceptions. Otherwise, I confirmed that a/á and e/é are treated as different letters, while the other vowel pairs are treated as identical; I checked them with automatic replacements in Excel. Adam78 (talk) 11:02, 7 September 2022 (UTC)[reply]

@Adam78: Thank you for double checking and summarizing the rules. Looking at the code in Module:palindromes/data, I think the following three mappings should be removed from the ["hu"] section: á to a, é to e and dz to ď. (I hope I understand the mappings correctly). The rest of the mapping is according to the rules. Would you agree? Panda10 (talk) 17:15, 7 September 2022 (UTC)[reply]

@Panda10 First I thought sz and zs should be removed too, but when I looked at the list again (sorry, I misformatted the link in my previous message; now it should work), I found only 11 instances where sz is to be read as zs when read backwards (e.g. Szúr a rúzs; Darázs eledele szárad; Anna, szuszogó Zsuzsanna; Zselatin, Anita, lesz?) and considerably more where they are to be read as unswapped (e.g. Szuezi Zeusz; Mára a Zsuzsa arám; E rébusz a szú bére; Erőszakos kannak sok a szőre) – so yes, the latter two digraphs should be treated the way they are now, i.e. similarly to most others, keeping their internal letter order. No need to remove them from the list of replacements. – On the other hand, I think dzs should be replaced with dž (this way we would probably gain some palindromes, as opposed to the practically zero that we would lose otherwise), so this could be added to this list (that is, we'd treat it as d+zs, similarly to dz being treated as d+z as shown by some examples). – One more thing: I think the doubled digraphs should be added as well, otherwise some phrases will not be recognized (e.g. Deres asszír issza sered; Lesz szérum, Ádám úr, ésszel; Ennyi dinnye! and ággyá, meggyem, áccsá, giccsig, meccsem, eddze etc. among single words). So ccs should be treated in the module as čč, ggy as ǰǰ etc., doubling their existing codes. Thank you in advance! Adam78 (talk) 18:50, 7 September 2022 (UTC)[reply]

@Adam78 I agree with the additions of the double digraphs. I'm not sure about the replacement characters, though. Is doubling them OK? Also, dz is treated as a single character in the current code. I'd have to remove the dz to ď mapping to treat them separately. And how to handle ddz? Here is the current code:

["hu"] = { from = {"á", "é", "í", "ó", "ú", "ő", "ű", "cs", "dz", "gy", "ly", "ny", "sz", "ty", "zs", "dzs"}, to = {"a", "e", "i", "o", "u", "ö", "ü", "č", "ď", "ǰ", "ľ", "ň", "š", "ť", "ž", "ǯ"},

And here is the proposed code (unsure about dz and ddz inclusion and about ddzs mapping):

["hu"] = { from = {"í", "ó", "ú", "ő", "ű", "cs", "ccs", "dz", "ddz", "gy", "ggy", "ly", "lly", "ny", "nny", "sz", "ssz", "ty", "tty", "zs", "zzs", "dzs", "ddzs"}, to = {"i", "o", "u", "ö", "ü", "č", "čč", "ď", "ďď", "ǰ", "ǰǰ", "ľ", "ľľ", "ň", "ňň", "š", "šš", "ť", "ťť", "ž", "žž", "dž", "ddž"},

@Benwing2 Would you please confirm that the planned changes are acceptable in the module? Panda10 (talk) 20:15, 7 September 2022 (UTC)[reply]

@Panda10 Indeed, I didn't notice the inconsistency. Eddze as a palindrome calls for "ddz" treated as "ďď"; however, gazdamadzag (and some sentences listed on the page linked above, e.g. Te, kérődző főz dőréket? if it makes sense) as a palindrome calls for "dz" treated as two separate letters. I think I'd go for the latter (because there are more examples that rely on this analysis), while eddze (and any other such terms, if there are any) could be added to the category of palindromes manually. Adam78 (talk) 23:51, 7 September 2022 (UTC)[reply]

@Panda10, Adam78 The above substitutions will work except you need to put the double digraphs before the corresponding single ones, as the code proceeds from left to right; otherwise it will first replace dz with ď including in ddz trigraphs, and then the following ddz will never match. Benwing2 (talk) 01:11, 8 September 2022 (UTC)[reply]
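The ordering point can be sketched in Python (an illustration only; the module itself is Lua, and the two pair lists below are hypothetical cut-down stand-ins for the `from`/`to` tables). Replacements are applied one pair at a time in list order, so a shorter pattern listed first consumes part of the longer one.

```python
D_CARON = "\u010f"  # the single-character code used above for dz

# Hypothetical two-entry tables, differing only in order.
SINGLE_FIRST = [("dz", D_CARON), ("ddz", D_CARON * 2)]
DOUBLE_FIRST = [("ddz", D_CARON * 2), ("dz", D_CARON)]


def substitute(word, pairs):
    """Apply each (source, target) replacement in list order."""
    for source, target in pairs:
        word = word.replace(source, target)
    return word


def is_palindrome(word, pairs):
    """Check the word after substitution against its reverse."""
    folded = substitute(word.lower(), pairs)
    return folded == folded[::-1]
```

With `SINGLE_FIRST`, "eddze" has its inner dz replaced first, so the ddz pattern never matches and the result is not symmetric; with `DOUBLE_FIRST`, the doubled code is produced and the word reads the same both ways.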

@Benwing2, Adam78 The changes are completed. Thank you all for your help. Panda10 (talk) 17:32, 8 September 2022 (UTC)[reply]

Mongolian script space problem

Background

The Mongolian script treats inflectional suffixes in a special way, in that it leaves a (usually shorter) gap between the stem and the suffix. This is represented in Unicode by the narrow no-break space character (NNBSP; U+202F). This does not apply to derivational suffixes, and is not replicated in the Cyrillic script. For example, ᠬᠣᠲᠠ ᠶᠢᠨ (qota-yin, “of the city”), which Cyrillicizes as хотын (xotyn). These can be agglutinated with compound cases etc. Important: this should not be confused with the Mongolian vowel separator, which denotes a tiny pronunciation gap before a vowel (like an apostrophe sometimes does in English), and has nothing to do with inflections. Although both characters are invisible, they cause different display forms in surrounding characters.

Problem

Wikimedia software normalises the NNBSP to an ordinary space. This is nonstandard from Unicode's perspective, and causes proper implementations (e.g. Windows) to perceive a word break between the stem and suffix. This is a problem when it comes to page titles, which you can see from the fact that the above link goes to ᠬᠣᠲᠠ ᠶᠢᠨ (qota yin). Note the different display format and romanisation, which carries over into the headword by default, but this can be manually overridden with head=. However:

While the title can also be overridden with {{DISPLAYTITLE}}, doing this on Mongolian script entries triggers a warning about overriding another use of {{DISPLAYTITLE}}, which is presumably used in the headword module somewhere. I can't figure out where, but I get the impression that it's to do with making sure the title displays as vertical text.
It also means {{l-self}} doesn't work properly, because linking to ᠬᠣᠲᠠ ᠶᠢᠨ (qota-yin) from the page ᠬᠣᠲᠠ
ᠶᠢᠨ (qota yin) still creates a blue link. This is an issue for {{mn-variant}}, which is a standard template for Mongolian entries.

Solution

If I'm right about the headword template already using {{DISPLAYTITLE}}, it would be useful for it to notice the presence of a NNBSP in the head= field, and to modify the title display accordingly.
{{l-self}} should be modified so that it treats a NNBSP as a normal space for the purposes of links (so that it recognises when a page is linking to itself), but to treat it as a NNBSP for the purposes of display and transliteration etc.

This should be done on a per script basis, rather than a language one, as this issue will apply to any language which uses (or should use) the Mongolian script code (e.g. Classical Mongolian, Kalmyk, Manchu, Sanskrit). Theknightwho (talk) 18:12, 22 August 2022 (UTC)[reply]
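The behaviour requested for {{l-self}} can be sketched as follows. This is a Python model of the idea only, not MediaWiki's or the link module's actual code; the function names are hypothetical, and Latin placeholders stand in for the Mongolian text.

```python
NNBSP = "\u202f"  # NARROW NO-BREAK SPACE, used before inflectional suffixes


def normalize_title(text):
    """Model the title normalisation described above: NNBSP becomes a plain space."""
    return text.replace(NNBSP, " ")


def is_self_link(link_target, page_title):
    """Compare link target and page title after normalisation,
    so a target written with NNBSP still counts as the current page."""
    return normalize_title(link_target) == normalize_title(page_title)


def display_text(link_target):
    """Keep the NNBSP for display and transliteration purposes."""
    return link_target
```

Under this model, a link written as `"qota\u202fyin"` on the page titled `"qota yin"` is recognised as a self-link, while the text shown to the reader keeps the NNBSP.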

@Theknightwho I got rather confused trying to follow the issue. For example, when you say "Wikimedia software normalizes NNBSP to an ordinary space", what does this mean? However, you are right that it's the Mongolian script auto-triggering the addition of {{DISPLAYTITLE}}. See lines 567 and following of Module:headword. This happens for several scripts including Mongolian; I tracked down the initial addition of this code to this diff: [3] by User:Erutuon, which was initially done for Mongolian only. Can you comment on this change and why it was done? It seems a bit hacky to do this as it results in the above warning in conjunction with user-specified {{DISPLAYTITLE}}. Benwing2 (talk) 02:33, 23 August 2022 (UTC)[reply]

@Benwing2 Mediawiki converts NNBSP (U+202F) to SPACE (U+0020) for the purposes of page titles. For example, ᠬᠣᠲᠠ ᠶᠢᠨ (qota-yin) and ᠬᠣᠲᠠ
ᠶᠢᠨ (qota yin) link to the same page, even though the first one uses NNBSP. This means that it's impossible for NNBSP to be part of the page title, which is why I'm having to request this fudge. It's reasonably important if we're going to have pages for Mongolian inflectional forms, as the presence of a NNBSP not only ensures that suffixes behave correctly, but also ensures that the initial letter of the suffix has the correct form, as it sometimes changes (which applies to this example, in fact - though it's quite subtle at this font size).

From having checked, the current {{DISPLAYTITLE}} puts the title in a span with class "Mong", which is necessary for a vertical display form.

On a related note, it would be good if page titles worked in the same way as {{l}} and {{m}}, by adding a newline for each space (but not for any NNBSP). Some page titles can be quite long (e.g. ᠭᠠᠰᠠᠯᠠᠩ ᠠᠴᠠ
ᠨᠥᠭᠴᠢᠭᠰᠡᠨ (ɣasalang-ača nöɣčiɣsen)). Theknightwho (talk) 03:21, 23 August 2022 (UTC)[reply]

FWIW, the thing with Mongolian automatically using {{DISPLAYTITLE}} was / is indeed to effect verticalization, see Wiktionary:Grease pit/2017/May#Display_of_vertically_written_languages_(Mongolian,_Manchu). - -sche (discuss) 03:27, 23 August 2022 (UTC)[reply]

@Theknightwho This might take a day of work to sort out all the issues and make sure nothing breaks. Can you add an entry to User:Benwing2/todo pointing to this page? Benwing2 (talk) 02:19, 24 August 2022 (UTC)[reply]

Thanks! Added. Theknightwho (talk) 03:30, 24 August 2022 (UTC)[reply]

@Benwing2 It seems the same thing happens with the Mongolian Vowel Separator (MVS; U+180E) as well. This is a different space character that's equally important, and should be treated in the same way (i.e. it needs to be in the displayed title etc.). Compare ᠪᠠᠶᠢᠨ᠋᠎ᠠ (bayin-a) (correct) and ᠪᠠᠶᠢᠨ᠋
ᠠ (bayin a) (incorrect). Both links go to the second one. Theknightwho (talk) 00:42, 28 August 2022 (UTC)[reply]

Module:pag-pron: issues

I had just created this module for Pangasinan, and also intended it to be a testbed for a potential pronunciation module for Ilocano and revisions to existing ones for Tagalog, Bikol and Cebuano, but I'm seeing problems with the handling of accented letters: letters with accents aren't broken down and remain on the generated result. It's fine in most of the other aspects like support for multi-word terms and automatic replacement of most consonants and digraphs. I already created the template that deploys it, but with the problems as mentioned, I can't deploy it for now.-TagaSanPedroAko (talk) 00:10, 23 August 2022 (UTC)[reply]

@TagaSanPedroAko Usually the reason for difficulties with accented characters is that you have to apply Unicode decomposition, but I see you are doing that on line 71, so I don't quite know what's up. However, I'd recommend moving this module into your user space, i.e. move it to Module:User:TagaSanPedroAko/pag-pron and the same for the subpages, and move Template:pag-IPA to User:pag-IPA; that way you don't leave a broken module in the production space. Benwing2 (talk) 02:39, 23 August 2022 (UTC)[reply]

@Benwing2: By the way, I already have a sandbox version of it. I can leave a note to the IPA template.

FYI, the module in question is based on the Spanish IPA module you created, and an Asturian adaptation of it, which decomposes things around the approximate line you pointed to. What I don't know is why the accents don't separate. The same goes for a sandbox version of the Tagalog IPA template. Also, I have little experience with Lua. -TagaSanPedroAko (talk) 02:57, 23 August 2022 (UTC)[reply]

@TagaSanPedroAko What I'm saying is you should not leave partly-finished modules in production space and expect someone else to fix them. Use your user space as a sandbox and only put them in production space when they are working. Benwing2 (talk) 03:13, 23 August 2022 (UTC)[reply]

@Benwing2: Here's the problematic code in question:

function export.IPA(text, phonetic)
	local debug = {}

	text = ulower(text or mw.title.getCurrentTitle().text)
	-- Decompose everything except ñ and ë
	text = mw.ustring.toNFD(text)
	text = rsub(text, ".[" .. TILDE .. DIA .. "]", {
		["n" .. TILDE] = "ñ",
		["e" .. DIA] = "ë",
	})

And the one on the Asturian module. The code in the Spanish is similar, except for the extra lines for styles:

function export.IPA(text, phonetic)
	local debug = {}

	text = ulower(text or mw.title.getCurrentTitle().text)
	-- decompose everything but ñ and ü
	text = mw.ustring.toNFD(text)
	text = rsub(text, ".[" .. TILDE .. DIA .. "]", {
		["n" .. TILDE] = "ñ",
		["u" .. DIA] = "ü",
	})

I can put a sandbox version in my userspace, but I think the issue is with complex coding (I know you can help with these; again, I have little experience with Lua). I had the module marked as experimental given the existing issue. While the Asturian module works, the Pangasinan one doesn't, even with similar code. Similar issues also happened with the first version of the Tagalog pronunciation module, which I created and based on an earlier version of Module:es-pronunc (which supported individual words only back then). -TagaSanPedroAko (talk) 03:19, 23 August 2022 (UTC)[reply]

@Benwing2: Still have no idea why character decomposition doesn't work in this case, a derivative of Module:es-pronunc. TagaSanPedroAko (talk) 08:37, 23 August 2022 (UTC)[reply]

@TagaSanPedroAko What exactly doesn't work? Do you have testcases? Benwing2 (talk) 02:16, 24 August 2022 (UTC)[reply]

@Benwing2: The acutes, graves and circumflexes aren't correctly decomposed (the Ñ and Ë, the latter used for a schwa-like sound, are correctly decomposed into N and tilde and E and diaeresis but are composed again as they represent their own phonemes), and here are some test cases. The accented letters aren't decomposed in the resulting IPA; all the other things are fine such as the alphabet-to-phoneme, syllable division, support for multi-word terms and phrases, etc.-TagaSanPedroAko (talk) 03:26, 24 August 2022 (UTC)[reply]

@TagaSanPedroAko You never call the function accent_word in your code. Benwing2 (talk) 04:14, 24 August 2022 (UTC)[reply]

@Benwing2: Added it, and things are now fine.-TagaSanPedroAko (talk) 04:57, 24 August 2022 (UTC)[reply]
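For readers unfamiliar with the decompose-then-recompose step quoted above, here is a rough Python equivalent (an illustration, not the module's Lua; `decompose_except` is a hypothetical name). It decomposes everything to NFD and then puts back only ñ and ë, leaving other accents as separate combining marks for later code to handle.

```python
import re
import unicodedata

TILDE = "\u0303"  # combining tilde
DIA = "\u0308"    # combining diaeresis

# Sequences that are recomposed because they stand for their own phonemes.
RECOMPOSE = {
    "n" + TILDE: "\u00f1",  # ñ
    "e" + DIA: "\u00eb",    # ë
}


def decompose_except(text):
    """Lowercase, decompose to NFD, then recompose only the listed sequences."""
    text = unicodedata.normalize("NFD", text.lower())
    pattern = "." + "[" + TILDE + DIA + "]"
    # Unlisted matches (e.g. u + diaeresis) are returned unchanged.
    return re.sub(pattern, lambda m: RECOMPOSE.get(m.group(0), m.group(0)), text)
```

Note that this step only separates the accents; as the thread found, something still has to consume them afterwards (here, the `accent_word` function that was never called).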

Wrapping of hyphen in affixes

I noticed a suffix with the hyphen on a different line in an etymology because of wrapping and it seemed odd. Is it supposed to be this way?

E.g., etymological + -
ly. J3133 (talk)

@J3133 Can you give me an example page where this is happening? Benwing2 (talk) 02:11, 25 August 2022 (UTC)[reply]

@Benwing2 Any page that has an affix in the etymology; e.g., happily: you need to make your browser smaller. J3133 (talk)

@J3133 I see, it only happens at exactly the wrong browser size. This seems a fairly minor bug but maybe we can fix it by inserting a ZWNBSP or similar after the hyphen. Benwing2 (talk) 07:08, 25 August 2022 (UTC)[reply]

@Benwing2: I think we should use the word joiner instead: Wikipedia states that ZWNBSP is deprecated for this use. Will you add it to the affix module? J3133 (talk) 07:13, 25 August 2022 (UTC)[reply]

@J3133 I've added an entry to my todo list at User:Benwing2/todo. I don't think this will take too long but I don't have time this evening and there may be some edge cases in the handling of hyphens. Should be able to get to this tomorrow. Benwing2 (talk) 06:23, 26 August 2022 (UTC)[reply]

@Benwing2 @J3133 A better solution is the non-breaking hyphen (U+2011). Theknightwho (talk) 06:58, 26 August 2022 (UTC)[reply]

@J3133, Theknightwho Wikipedia:Non-breaking hyphen seems to indicate that using a non-breaking hyphen will interfere with searching for the affix, and suggests using the word joiner wrapped in a span that sets the font-width to 0 to work around bugs in the Android Wikipedia app. This change is very easy to make; it just requires changing default_display_hyphen() in Module:compound. But it requires some testing to make sure it doesn't cause breakage. An alternative is to wrap the hyphen + affix in .... Benwing2 (talk) 00:15, 27 August 2022 (UTC)[reply]

Perhaps a silly question, but wouldn't it be better to prevent the wrapping using CSS rather than adding an invisible character that is liable to get copy-pasted into people's documents, searches, and Wiktionary entries? This, that and the other (talk) 06:25, 29 August 2022 (UTC)[reply]

Hah, I see that Benwing actually just suggested that right above... This, that and the other (talk) 06:26, 29 August 2022 (UTC)[reply]
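The two approaches discussed in this thread can be compared with a small Python sketch. This is illustration only: the function names are hypothetical, the real change would live in the Lua of Module:compound, and the exact markup Benwing2 had in mind is not preserved above, so the `white-space: nowrap` span is an assumption.

```python
import html

WORD_JOINER = "\u2060"  # U+2060 WORD JOINER, prohibits a line break at its position


def with_word_joiner(affix):
    """Insert an invisible word joiner after a leading hyphen."""
    if affix.startswith("-"):
        return "-" + WORD_JOINER + affix[1:]
    return affix


def with_nowrap_span(affix):
    """Wrap the whole affix in a span that forbids wrapping via CSS,
    leaving the visible text free of invisible characters."""
    return '<span style="white-space: nowrap">' + html.escape(affix) + "</span>"
```

The first variant changes the text itself, so the invisible character travels with copy-paste; the second leaves the text as plain "-ly" and puts the no-wrap behaviour in the markup.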

Why was I blocked from editing?

I don't know what just happened. I'm not adding misinformation, jokes or swearing to the articles, but I need to know why.

Can't know unless we know what exactly you were trying to add. Vininn126 (talk) 09:11, 24 August 2022 (UTC)[reply]

You can look at the Abuse log here: link. - TheDaveRoss 12:51, 24 August 2022 (UTC)[reply]

Unfortunately, there's someone in Vietnam who has been making tons of bad entries in a number of languages for many years. When they were blocked, they kept on making bad edits from a large number of different internet protocol addresses, which made it impossible to block them without blocking all of Vietnam. This is the least damaging compromise we could come up with. The easiest way around this is to create an account, since this doesn't target logged-in edits at all. Chuck Entz (talk) 14:57, 24 August 2022 (UTC)[reply]

Action automatically identified as harmful (Microsoft support spam)

I'm completely new to this. I tried to create an up-to-date Japanese Wikipedia word frequency list page, but my edit keeps getting identified as harmful, with "Microsoft support spam" given as the reason. I'm linking to a GitHub project for the morphological analyzer I used; maybe that's why it thinks it's some kind of Microsoft spam? I generated the list for myself, since I was using the 2015 one to mine words for language learning into an SRS, but I realized the list was somewhat old (7 years), so I figured I'd generate an up-to-date one. Since I already have it, I might as well share it.

Latest list available on Wiktionary: https://en.wiktionary.org/wiki/Wiktionary:Frequency_lists/Japanese2015_10000
Page I tried to create: https://en.wiktionary.org/wiki/Wiktionary:Frequency_lists/Japanese2022_10000 Nickitolas (talk) 21:27, 25 August 2022 (UTC)[reply]

@Nickitolas Looks like a false positive because your account is new and the title has what looks (sort of) like a phone number in it. But this is just a guess; the abuse filter in question is rather complex and I haven't gone through it in detail. Maybe User:Chuck Entz can comment more. Benwing2 (talk) 01:14, 26 August 2022 (UTC)[reply]

It looks like @Surjection has worked a lot more on this filter than I have. Chuck Entz (talk) 06:13, 26 August 2022 (UTC)[reply]

Could you try again now? — SURJECTION ^{/ T / C / L /} 06:28, 26 August 2022 (UTC)[reply]

It works now, thank you! Nickitolas (talk) 00:57, 27 August 2022 (UTC)[reply]

Parameter for Author's Native Language Name

The utility of the way I intentionally misused the "first=" parameter in this diff is obvious to me. Is there a way I can do this without misusing the "first=" parameter? Is a parameter for an author's native language name needed? Does it already exist? --Geographyinitiative (talk) 15:24, 26 August 2022 (UTC)[reply]

I don't see why it's worth having. It seems like just one of an indefinite number of factors involved in understanding the context of the work. What does Joseph Conrad's native language tell us about Heart of Darkness? Chuck Entz (talk) 17:16, 26 August 2022 (UTC)[reply]

Additional script code for Middle Mongolian

Hello - Middle Mongolian (xng) used four scripts: Mongolian, Phags-Pa, Arabic and Chinese. However Module:languages/data3/x only gives the first three, which is a problem for lemmas such as 合温. Could someone please amend the module to add Hani as another script? Many thanks. Theknightwho (talk) 18:20, 26 August 2022 (UTC)[reply]

@Theknightwho Done. Benwing2 (talk) 00:19, 27 August 2022 (UTC)[reply]

Sorry @Benwing2 - could you please also add Soyo (Soyombo) and Zanb (Zanabazar Square) for Classical Mongolian (cmg) and Latn for Mongolian (mn)? The first two are rare ceremonial scripts (Soyombo is on the national flag of Mongolia, for example). Mongolia also officially used the Latin alphabet from 1930 to 1941, before adopting Cyrillic. I don't really plan on using these for the time being, but it would be good to have the infrastructure in place. Really appreciate your help with everything on these. Theknightwho (talk) 10:29, 28 August 2022 (UTC)[reply]

@Theknightwho No problem at all; done. Benwing2 (talk) 18:42, 28 August 2022 (UTC)[reply]

Thanks! Theknightwho (talk) 21:43, 28 August 2022 (UTC)[reply]

`{{trans-mid}}`, again: a slower approach

Early last week, I attempted to apply some changes developed by This, that and the other (talk • contribs), whereby {{trans-mid}} would be a no-op, and the splitting of translation-lists into columns would be handled by CSS. (See [[Wiktionary:Grease pit/2022/July#Finally killing off {{trans-mid}}]].) It did not go smoothly, however, due partly to a bug, and I think partly due to various layers of caching that caused users to see versions of pages that had some of the changes and not others; so I ended up rolling the changes back. (See [[Wiktionary:Grease pit/2022/August#Translation tables are gone haywire]].)

I think we still want to make these changes, but I think it might be better to take a slower approach that's more amenable to testing by more folks, and more amenable to rolling back if issues are found.

I'm thinking that we should do it like this:

Add CSS to [[MediaWiki:Common.css]] to support CSS columns; but instead of attaching styles to existing HTML classes for translations-tables, this CSS would attach styles to new HTML classes. That means, firstly, that this would be a no-op change (totally safe, because nothing belongs to those HTML classes), and secondly, that this could also be used in other places where we do similar things, such as derived terms.
Wait a while for browser caches to clear. (I'm not sure how long this can take; does anyone know?)
Create forks of {{trans-top}} and {{trans-mid}} with the proposed new versions. (These new versions will be slightly different from what we did before — {{trans-top}} needs to use the new HTML classes, and {{trans-mid}} needs to allow the lists above and below it to be a single list — but, same concept.)
Edit a small number of pages to use those forked versions.
Ask various folks — especially the folks who've participated in these conversations and/or reported issues with the prior attempt — to try out those pages and make sure everything looks OK.
Apply the changes to {{trans-top}} and {{trans-mid}}.
Update [[MediaWiki:Gadget-TranslationAdder.js]] as before.
After the changes have been in place for a while and we no longer think a rollback is likely, we can start eliminating {{trans-mid}} from the wikitext.

Thoughts?

—Ruakh_TALK 20:28, 27 August 2022 (UTC)[reply]

Incidentally, I should mention that while looking at this, I noticed that {{der-top}} actually uses a simpler approach to CSS columns. That one isn't as fancy as the proposed changes for {{trans-top}} — in particular, it doesn't support adjusting the number of columns based on the width of the viewport — but it has the advantage that it doesn't seem to depend on any changes to [[MediaWiki:Common.css]], so we don't need to worry about CSS caching and so on. Might be worth considering? —Ruakh_TALK 20:28, 27 August 2022 (UTC)[reply]

@Ruakh this sounds like a great plan. I think we should keep {{der-top}} separate for now, and in the long run it too should transition to use width-based columns (and possibly a maximum width for the whole table!) rather than a fixed number of columns. We will need to deal with the CSS caching issue at some point whatever we do - I don't think it's worth putting off. I haven't been able to find any info on the CSS caching duration, but waiting a few days should be safest. This, that and the other (talk) 03:43, 28 August 2022 (UTC)[reply]

I've done some experiments, looks like the CSS cache time is only 5 minutes! This, that and the other (talk) 04:28, 28 August 2022 (UTC)[reply]

OMG, nice! It used to be something like 30 days, and I knew that it was shorter now, but I had no idea it was so short. That makes this much easier. :-) —Ruakh_TALK 06:55, 29 August 2022 (UTC)[reply]

It didn't work to make {{trans-mid}} empty. This change creates two separate HTML lists in translation boxes:

* Aari: ...
{{trans-mid}}
* Zuni: ...

when {{trans-mid}} is empty becomes

* Aari: ...

* Zuni: ...

I was going to say that {{trans-mid}} has to be removed from translation sections altogether to get the translations to be a single list, but it may work to edit {{trans-mid}} so that it outputs * (an empty list item). That seems to be parsed into <li class="mw-empty-elt"></li>, which is invisible (at least on desktop). Then all the translations in the box will form a single list across the {{trans-mid}} and they should be able to be styled with columns.

I think the cause of the first attempt failing was just the empty {{trans-mid}}. I've generally gotten my global CSS edits pushed to me within ten minutes.

I'd suggest putting the old and new versions of the translation boxes on the same page and verifying that they both work correctly at the same time before making the edits to {{trans-top}} and {{trans-mid}} and CSS. Having a new class would make that possible. — Eru·tuon 06:31, 28 August 2022 (UTC)[reply]
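The difference between an empty {{trans-mid}} and one that outputs `*` can be seen with a toy model in Python. This is a deliberate simplification, not the MediaWiki parser: it only assumes that consecutive lines starting with `*` form one list and that any other line ends the list.

```python
def expand(wikitext, trans_mid_output):
    """Substitute the given output for every {{trans-mid}} call."""
    return wikitext.replace("{{trans-mid}}", trans_mid_output)


def count_lists(wikitext):
    """Count runs of consecutive lines that start with '*'."""
    lists = 0
    in_list = False
    for line in wikitext.split("\n"):
        if line.startswith("*"):
            if not in_list:
                lists += 1
            in_list = True
        else:
            in_list = False
    return lists


SOURCE = "* Aari: ...\n{{trans-mid}}\n* Zuni: ..."
```

In this model, `count_lists(expand(SOURCE, ""))` gives two lists because the blank line splits them, whereas `count_lists(expand(SOURCE, "*"))` gives one, which is the property the column styling relies on.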

Yes, that was the problem with {{trans-mid}}, and the single * is also the solution we were planning to use. See User:This, that and the other/trans-mid, transcluded in the "in grammar" translation box at User:This, that and the other/subject for a demo. This, that and the other (talk) 06:37, 28 August 2022 (UTC)[reply]

Ah, good. I hadn't read the previous conversation where you figured out the solution. I somehow never noticed that an empty list item was invisible before. An odd wikitext quirk but useful. — Eru·tuon 06:45, 28 August 2022 (UTC)[reply]

@Ruakh: It seems like there was a consensus to move forward with this according to your plan, are you still willing to re-open this can of worms? If not, maybe @This, that and the other: can use their new admin powers to make the changes. JeffDoozan (talk) 17:27, 28 October 2022 (UTC)[reply]

Yes I was considering this myself. @Chuck Entz does the community hold votes for interface-admin status or is it simply given to administrators on request? If the latter, I'm happy to expand on my credentials if you need convincing. This, that and the other (talk) 01:41, 30 October 2022 (UTC)[reply]

Sorry for the delay! I've now done the first several steps, and am soliciting testing/feedback at: Wiktionary:Grease_pit/2022/October#New approach to columnization of translations; please test free, dictionary. (But yes, This, that and the other (talk • contribs) should certainly feel free to take this over. :-) ) —Ruakh_TALK 03:00, 30 October 2022 (UTC)[reply]

Middle Mongolian and ʼPhags-pa

Hi - sorry for (yet another) Mongolian-related request. The ʼPhags-pa script is one of the scripts used for Middle Mongolian - primarily in the form of edicts. It was also used for Chinese, and (occasionally) Tibetan. There are two issues, however:

As it is written vertically, the headword template needs to be adapted to amend the title to the vertical display form, as it currently does with the Mongolian script.
When used for Middle Mongolian, it adopts some of the quirks of the Mongolian script, such as the narrow gap between stems and suffixes that cannot be used for a line break. This involves the NNBSP, as it does with Mongolian. As such, the issues highlighted in #Mongolian script space problem also apply here.

The best page to use as an example is ꡢꡖ ꡋꡟ (which, for anyone curious, is ᠬᠠᠭᠠᠨ ᠤ (qaɣan-u, “of the khan”) - the second word of the Secret History of the Mongols). Currently, I'm forcing the correct display form with {{DISPLAYTITLE:}}. Theknightwho (talk) 09:15, 28 August 2022 (UTC)[reply]

@Theknightwho I added Phag to Module:headword/data, which should fix the first issue. For the second issue, can you link this section to the appropriate todo entry in User:Benwing2/todo? Benwing2 (talk) 18:46, 28 August 2022 (UTC)[reply]

Cheers - have done. Theknightwho (talk) 22:00, 28 August 2022 (UTC)[reply]

Distinguishing the Latin suffixes -īcius and -icius in categories

Hello. At the moment, all Latin words with either suffix automatically go into Category:Latin words suffixed with -icius. How can we make it so that the ones with -īcius go into Category:Latin words suffixed with -īcius?

Nicodene (talk) 21:43, 30 August 2022 (UTC)[reply]

I was going to ask whether i vs ī in the category names was the best way to go about distinguishing the categories, since glancing at Category:Latin words by suffix nothing else jumps out as using a macron and typically categories for homographs are distinguished via parentheticals added by id= like in agripeta. But I see they're both first/second-declension adjective-forming suffixes, so it's nonobvious what qualifier would distinguish them; is that why you want to use vowel length? Hmm. But it seems like it might require quite a revision to the module, to make it not strip the macron in just this one situation while still removing it from links to the entry on the suffix, etc, and it'd be inconsistent to only mark vowel length in this category name and not others, so then we might be looking at renaming a lot of categories, including the -as (although maybe we should rename them all to mark vowel length, IDK). Maybe the entries should just use the macroned vs macronless form as the id= parameter, generating Category:Latin words suffixed with -icius (-icius) vs Category:Latin words suffixed with -icius (-īcius)? I admit that looks a little odd, but it'd work without requiring big changes or specialcasing. (But no objection if people think it's better to do the work of making Category:Latin words suffixed with -īcius work.) - -sche (discuss) 23:42, 30 August 2022 (UTC)[reply]

@-sche, Nicodene So we could potentially add a param to the categorizing templates ({{affix}} etc.) to make them not strip macrons. But I'd in general be opposed to that as it seems too hacky and usually stripping macrons and accents works fine. IMO either we should just not bother distinguishing the two, or we should use the existing |id= param in some fashion. In order to allow having categories with macrons/accents/etc. in them, we'd need significant evidence that this makes sense in a large number of cases, not just one or two. Benwing2 (talk) 02:17, 31 August 2022 (UTC)[reply]

Why not id=attaching to past participles and id=attaching to adjectives/nouns? Sure, it's wordy, but it would work within the existing system. @Urszag might have some input here. This, that and the other (talk) 03:43, 31 August 2022 (UTC)[reply]

I'd propose as ids 'long vowel' and 'short vowel', since the distinction between 'attached to ...' is kinda finicky, though acceptable. On the other hand, I would oppose having macrons in Latin page titles; categories should be no exception. For this last reason I would also oppose '-icius (-īcius)'. Catonif (talk) 12:57, 31 August 2022 (UTC)[reply]

I agree that ids "long vowel" and "short vowel" seem like the simplest way to unambiguously distinguish -icius from -īcius within the currently existing categorization system.--Urszag (talk) 14:24, 31 August 2022 (UTC)[reply]
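How the id-based naming would keep the two suffixes apart can be sketched in Python. This is a model under assumptions, not the affix module's code: the helper names are hypothetical, and the "(id)" suffix format follows the pattern described in the comments above.

```python
import unicodedata

MACRON = "\u0304"  # combining macron


def strip_macrons(term):
    """Remove macrons, as is done for Latin page and category titles."""
    decomposed = unicodedata.normalize("NFD", term)
    return unicodedata.normalize("NFC", decomposed.replace(MACRON, ""))


def suffix_category(language_name, suffix, sense_id=None):
    """Build the category name, appending the id in parentheses when given."""
    name = "Category:" + language_name + " words suffixed with " + strip_macrons(suffix)
    if sense_id:
        name += " (" + sense_id + ")"
    return name
```

With the ids agreed on here, -īcius would map to "Category:Latin words suffixed with -icius (long vowel)" and -icius to "Category:Latin words suffixed with -icius (short vowel)", so no macron is needed in either title.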

~~I have added senseid's for 'long' and 'short'. How can we use them for categorization?~~ Solved, thanks Catonif. Nicodene (talk) 20:42, 31 August 2022 (UTC)[reply]

Request for `{{wikipedia}}`

Hi - would it please be possible to add the option of multiple language codes for the {{wp}} template? It's sometimes useful to link to both the English WP page and the language-specific one, and it feels silly to have multiple boxes. This is usually the case when we don't have an English entry, but there is an English WP article - very common for place names that are hard to attest in English. It does occasionally happen for other POS, too (e.g. German Schnellbomber). Theknightwho (talk) 14:26, 31 August 2022 (UTC)[reply]

And what would the box heading say? "English and German Wikipedia have articles on…"? Potentially very confusing. Why not use multiple {{pedia}} links? They're not as wasteful with space as {{wp}}. – Jberkel 16:19, 31 August 2022 (UTC)[reply]

Wikipedia has articles on:

例子 (Written Standard Chinese^?)
例子 (Cantonese)

Not confusing at all - we could do something similar to {{zh-wp}}, which users don't seem to have struggled with. The problem with {{pedia}} is that it creates cluttered bullet-points that are an inefficient use of space, and which don't draw the eye of the reader. There's a reason we're generally moving away from that format (not specifically with WP links, but synonym sections, derived terms etc.). Theknightwho (talk) 17:18, 31 August 2022 (UTC)[reply]

Derived terms are still formatted using bullet points. The main reason for moving synonyms etc. to {{syn}} was to bring them closer to the definitions. Boxes are visible, but float somewhere on the other side of your screen (or on top, on mobile). Anyway, it's a lost cause. The boxes have won. – Jberkel 19:45, 31 August 2022 (UTC)[reply]

Derived terms use {{der3}} or similar in many cases, and bullet points are messier. Anyway, the point is that this change is not confusing. Theknightwho (talk) 20:21, 31 August 2022 (UTC)[reply]

{{der3}} renders lists with bullet points. How messy. –Jberkel 08:33, 1 September 2022 (UTC)[reply]

How disingenuous. Theknightwho (talk) 17:28, 4 September 2022 (UTC)[reply]

Button to generate vocabulary words

Many people use Wiktionary to enrich their English vocabulary, but it's often difficult to find interesting words to learn, especially particularly obscure ones. The online OED has a "Lost for Words?" button which randomly chooses a few words from the dictionary (Screenshot), which I think would be an excellent feature to add to Wiktionary. Unlike Special:Random, it would filter out:

Any non-English words
Terms with uppercase letters, numbers, or symbols (which are unlikely to be interesting)
Alternative forms/misspellings/etc.
Offensive terms and slurs
Obsolete terms (not entirely sure about this one)

Does anyone know how hard this would be to implement?

Ioaxxere (talk) 16:08, 31 August 2022 (UTC)[reply]

It would probably be a pain to implement on the site itself, but using category membership (start with Category:English lemmas and then exclude undesirable categories) it wouldn't be very hard to do as a tool or app. - TheDaveRoss 16:27, 31 August 2022 (UTC)[reply]
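As a rough illustration of the category-based approach, here is a minimal Python sketch of the filtering step only. It assumes the entry titles and their category sets have already been fetched somehow (for instance from a dump or a category query); the excluded category names, `is_candidate` and `pick_words` are hypothetical names chosen for the example, not an existing tool.

```python
import random

# Illustrative category names; the real exclusion list would need checking
# against the actual category tree.
EXCLUDED_CATEGORIES = {
    "English alternative forms",
    "English misspellings",
    "English offensive terms",
    "English obsolete terms",
}


def is_candidate(title, categories):
    """Return True if an entry passes the filters from the proposal."""
    # Start from English lemmas only.
    if "English lemmas" not in categories:
        return False
    # Reject titles with uppercase letters, digits, or symbols.
    # Simplification: this also rejects multi-word and hyphenated terms.
    if not (title.isalpha() and title.islower()):
        return False
    # Reject anything that sits in an undesirable category.
    return not (categories & EXCLUDED_CATEGORIES)


def pick_words(entries, count, rng=None):
    """Pick up to `count` random titles from a {title: set_of_categories} map."""
    if rng is None:
        rng = random.Random()
    pool = sorted(t for t, cats in entries.items() if is_candidate(t, cats))
    return rng.sample(pool, min(count, len(pool)))
```

Calling `pick_words(entries, 5)` on such a mapping would return up to five titles that survive all the filters. Whether obsolete terms belong in the exclusion set is left open, as in the proposal.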

You can play with petscan, combining different categories, and setting the output to random. Here's an example: https://petscan.wmflabs.org/?psid=22747127. – Jberkel 16:33, 31 August 2022 (UTC)[reply]

Thanks, I didn't know about that site. User:TheDaveRoss, would this be difficult to integrate into the main page?

Ioaxxere (talk) 22:44, 31 August 2022 (UTC)[reply]

Requesting a bot edit for Swedish nouns

Where is the appropriate place to request bot edits? Specifically, I would like a bot to make edits if all of the following three conditions apply.

The page is in Category:Swedish lemmas
The page title ends with the three letters "are"
The page has the {{sv-noun-c-zero|stem=}} template with the page title (without the final "e") in the "stem=" parameter. For example, for dansare this would be {{sv-noun-c-zero|stem=dansar}}

If the three conditions above hold, the template in question should be replaced with {{sv-infl-noun-c-are}}. At the moment, these bot edits should produce no apparent change to any page as it appears to users. Arguably, however, the {{sv-infl-noun-c-are}} template should be modified in the future to also show the colloquial short form without the final "e" in the definite form (i.e. bagarn as opposed to proper bagaren). Making such a change consistently would be much easier if all pages ending with "are" had the same inflection template. Gabbe (talk) 17:35, 31 August 2022 (UTC)[reply]
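The three conditions can be sketched as a small Python function. This is only an illustration of the replacement logic, not the bot that ends up being run: `convert_are_noun` is a hypothetical name, the page title, category set, and wikitext are assumed to be supplied by whatever bot framework is used, and only the exact form `{{sv-noun-c-zero|stem=...}}` with no other parameters or extra whitespace is matched.

```python
NEW_TEMPLATE = "{{sv-infl-noun-c-are}}"


def convert_are_noun(title, categories, wikitext):
    """Return the wikitext with the old template swapped, or unchanged."""
    # Condition 1: the page is in Category:Swedish lemmas.
    if "Swedish lemmas" not in categories:
        return wikitext
    # Condition 2: the page title ends with "are".
    if not title.endswith("are"):
        return wikitext
    # Condition 3: stem= must equal the title without its final "e".
    stem = title[:-1]
    old_template = "{{sv-noun-c-zero|stem=" + stem + "}}"
    # Calls with a different stem or with extra parameters do not match
    # this exact string, so they are left alone.
    return wikitext.replace(old_template, NEW_TEMPLATE)
```

For dansare, a page containing `{{sv-noun-c-zero|stem=dansar}}` would come back with `{{sv-infl-noun-c-are}}` in its place; pages failing any condition come back untouched and could be logged for manual review.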

@Gabbe I can try doing this bot job. I don't know if there will be any cases where a {{sv-noun-c-zero}} will have parameters other than stem=, but I think I'll just make the bot ignore those cases (or maybe list them somewhere). — SURJECTION ^{/ T / C / L /} 09:48, 3 September 2022 (UTC)[reply]

@Surjection: Great! If {{sv-noun-c-zero}} has parameters other than "stem=", the bot should just ignore those cases. Alternatively, it could list them on my talk page and I'll handle them manually. In addition, {{sv-infl-noun-c-zero}} is a redirect. If the bot could handle that template the same way as well, that would be splendid. Gabbe (talk) 10:34, 3 September 2022 (UTC)[reply]

The cases that it couldn't handle were byggnadsarbetare and segrare. — SURJECTION ^{/ T / C / L /} 10:42, 3 September 2022 (UTC)[reply]

Outstanding. Thanks! Gabbe (talk) 11:12, 3 September 2022 (UTC)[reply]