Partial synonyms

Many languages that actively build compounds have the issue of compounds that technically mean the same thing as one of their base words, but are built to focus on one of that word's attributes. For instance, Ingrian emämaanporoda is such a compound (since emämaa is already a type of poroda), but this compound focuses on that attribute. Think of it as the compound equivalent of "the land of England" or "the city of London".

So my question is: should we have a template for such compounds, maybe something similar to {{synonym of}}, so that these nuances in interpretation can be explained without having to resort to duplicating a lot of information between entries? Thadh (talk) 21:33, 1 December 2022 (UTC)[reply]

Could be handy as a type of syn of template, linking to a glossary term. I think you might have to propose some terminology and a glossary write-up. Vininn126 (talk) 14:51, 6 December 2022 (UTC)[reply]
@Vininn126: I can find two terms for this in use in literature: "tautological compound" ([1], [2], [3]) and "clarifying compound" ([4], [5], [6], [7]). The template would probably just say "tautological/clarifying compound of" and link to the glossary. I personally prefer the latter term, but it seems to be less frequent. Thadh (talk) 14:28, 8 December 2022 (UTC)[reply]
I also prefer the latter term; it seems to occur often enough to justify its use, even if it's less frequent. Vininn126 (talk) 14:31, 8 December 2022 (UTC)[reply]
I can think of many times I thought this too, e.g. vicino di casa neighbour, literally house neighbour, since vicino alone also means near. I also would like something like {{syn of}}, but I agree with Vininn that we need to find a proper, clear and unambiguous term (I have no suggestions...). This issue seems to have been addressed by Chinese editors with the creation of {{zh-div}}: see 上海 (Shànghǎi) for an example. Chinese doesn't have spaces, which alters the perception of what is SOP, and a lot of technical Chinese stuff here has grown to be pretty much isolated and self-dependent, so we're clearly talking of very distinct modi operandi, but it's still worth mentioning. Catonif (talk) 15:50, 6 December 2022 (UTC)[reply]
I take it that this would include groups like plane tree, planetree, & plane, which are common for vernacular names of organisms. DCDuring (talk) 17:38, 6 December 2022 (UTC)[reply]
@DCDuring: I'd say definitely yes for planetree, but plane tree sounds like it should be split in an alt spelling entry and the mathematical sense. Thadh (talk) 18:56, 6 December 2022 (UTC)[reply]
Ugh, once again Wiktionary wrongly favors the closed compound. 'plane tree' is far more common; 'planetree' looks wrong to me and Chrome actually puts a red line under it. Google ngrams agrees; 'plane tree' is 10x more common. Benwing2 (talk) 03:36, 8 December 2022 (UTC)[reply]
The closed compound planetree is evidence needed by some to show that plane tree is not SoP. My concern is that the proto-proposal under discussion not do violence to this kind of situation. [[Plane tree]] should be the main entry, plane is its "partial synonym" under this proto-proposal, with planetree and plane-tree (?) being alternative forms. DCDuring (talk) 16:21, 8 December 2022 (UTC)[reply]
Sure, I don't have any objections to plane tree being the main entry, I was merely referring to the fact that the mathematical and botanical senses should imho be split. However, under this proposal, plane tree would actually be the clarifying compound (see above), whereas plane would be the main entry. If that doesn't work in this case, I would seek to use another template (probably just {{synonym of}}). Thadh (talk) 16:31, 8 December 2022 (UTC)[reply]
Interestingly, plane in the tree sense seems much less common than plane tree, but Oriental plane seems much more common relative to Oriental plane tree.
I don't see how plane can be the main entry for the vernacular name of the organisms. Each definition line has its own relative frequency and, therefore, its own "main entry". DCDuring (talk) 16:38, 8 December 2022 (UTC)[reply]
@DCDuring, Thadh I agree with DCDuring that plane tree should be the main entry. Thadh, maybe you're thinking of oak vs. oak tree or pine vs. pine tree? BTW my main concern with a "partial synonym of" template is that it's adding yet another template to express a fine shade of detail and which will probably end up being underused. We already have a lot of such cases and having more of them just increases the confusion for non-expert editors. Benwing2 (talk) 04:33, 12 December 2022 (UTC)[reply]
@Benwing2: Yeah, those are probably better examples. I also find it frustrating to have to add yet another template, but I can't think of a better solution to address the nuance inherent here. Thadh (talk) 08:49, 12 December 2022 (UTC)[reply]
If we recognize it to be part of grammar or pragmatics, it doesn't belong in the lexicon. It doesn't seem to me to be so much a question of fine distinctions of meaning ("nuance") as of different context-determined expressions that express the same thing in a relatively predictable way. DCDuring (talk) 12:28, 12 December 2022 (UTC)[reply]

Following this discussion and much complaining on the Discord channel, I've created the above-cited template. It doesn't have language sorting, and overall functions pretty roughly, as I'm not a template editor, and I believe that if I were to implement language sorting I would need to change something as well in {{auto cat}}'s script, which is way out of my reach. Catonif (talk) 11:05, 2 December 2022 (UTC)[reply]

I've added language sorting to the template. I've hopefully not gone against our categorization naming policies. Catonif (talk) 16:42, 2 December 2022 (UTC)[reply]

Looks good to me! This should be useful. 70.172.194.25 19:33, 6 December 2022 (UTC)[reply]

Slavic verbs and derived terms

It would be handy if there were an easier way to list Slavic prefixed verbs with pairs next to each other. I find the Russian system (i.e. ять) to be too clunky and big. I have a preference for what's on Polish jąć in terms of aesthetics, but the write-up is clunky as hell. I think some solutions could include modifying things like the col templates, or making a special template for Slavic verbs in the style of {{hu-verbpref}}, though we'd need to modify it for this purpose. Other solutions? Vininn126 (talk) 09:48, 4 December 2022 (UTC)[reply]

@Vininn126 I agree the Russian prefixed verb lists are clunky, but at least they used to line up; this seems to have gotten broken with changes to {{top2}}. Benwing2 (talk) 01:06, 5 December 2022 (UTC)[reply]
@Benwing2: Unfortunately the column solution wasn't reliable, because if one line of text in either column wrapped a different number of times than the corresponding line in the other column, the lines below that wouldn't line up. A table like this will keep corresponding lines aligned. I threw together a template {{ru-derived verbs}} for it to clean up the wikitext, though it's you and the other Russian editors' decision whether to use that, or rename or restyle. — Eru·tuon 04:17, 5 December 2022 (UTC)[reply]
Something like this could be handy if it were also collapsible. It'd also be nice if it could be modified to fit in with the col aesthetic. Vininn126 (talk) 05:41, 5 December 2022 (UTC)[reply]
@Erutuon Thanks, once this settles into final form I'll consider changing the current tables to use it; most of them were generated from source specs I have in my bot code. Benwing2 (talk) 10:56, 5 December 2022 (UTC)[reply]
@Vininn126: I added collapsibility in the ять (jatʹ) list, but am not sure what else to change style-wise to make it similar to {{col2}}. There are a lot of differences. Or do you mean an entirely different layout that replicates the list in jąć? Probably not too hard to implement. — Eru·tuon 17:22, 5 December 2022 (UTC)[reply]
That looks so much nicer! And I think it would be nice if we could make a similar template that mimics the style of col2 or something. Vininn126 (talk) 17:49, 5 December 2022 (UTC)[reply]
@Vininn126: I added a demo of a jąć-style list format to Module:ru-derived terms/documentation. — Eru·tuon 14:44, 7 December 2022 (UTC)[reply]
@Erutuon This is excellent! I might copy it and make a separate template for Polish in that case, because something about using "ru-verbs" on Polish entries doesn't sit right with me. Vininn126 (talk) 14:53, 7 December 2022 (UTC)[reply]
@Vininn126: I should probably just rename the module and make it work for Polish as well. — Eru·tuon 15:17, 7 December 2022 (UTC)[reply]
@Erutuon Well, that would be even better. Maybe something like sla-derived terms. Vininn126 (talk) 15:20, 7 December 2022 (UTC)[reply]
@Vininn126: Okay, I did that and made {{pl-derived verbs}} with the list format as the default (demo: jąć). — Eru·tuon 16:04, 7 December 2022 (UTC)[reply]
Thank you so much! Vininn126 (talk) 16:05, 7 December 2022 (UTC)[reply]
@Erutuon Can we supply a null parameter using - for verbs that don't have a pair? See jąć: both wziąć and powziąć are being printed as an impf/pf pair. It would also be nice to have a bot switch instances of col3|title=verb to using this. Can we also have it print "verbs" as a title? Finally, there are also times where there might be two verbs of one aspect, making the pair lopsided, e.g. on analiza there are analizować, prze- and z-. If the template can't handle that, I suppose what is there now is fine, but if it could, that would be ideal. Vininn126 (talk) 16:17, 7 December 2022 (UTC)[reply]
@Vininn126: The template already could do that... I had messed up my find-and-replace converting the manually formatted list to the template. Fixed. — Eru·tuon 16:52, 7 December 2022 (UTC)[reply]
@Benwing2 The other issues are solved; could we get working on this module? And would it be possible to convert things with the text "title=verb" or "title=verbs" to work like on jąć#Polish and this diff on analiza? Vininn126 (talk) 17:25, 7 December 2022 (UTC)
@Vininn126 Before you get too into using the new templates, I am probably going to change their format a bit; in particular, instead of using two parameters to specify the imperfective and perfective, there will be one param and a slash separating them. This will allow some abbreviations, for example:
{{ru-derived verbs
|-/еда́ть<q:iterative>
|-
|*е́сть/еда́ть
|за
|на
|пере
|надо
|поднадо-/-
|недо
|по-/-
|-/по-
|про
|объ
|надъ
|подъ
|разъ
|изъ
|съ
|отъ
|у
|вы́
|*е́сться/еда́ться
|за
|на
|при
|про
|объ
|въ
|разъ
}}

A line prefixed with a * establishes a template of suffixes which will be appended to prefixes (marked by trailing hyphens) on succeeding lines. As a special case, a line consisting of only a prefix (with or without a trailing hyphen) adds the prefix to the current set of suffixes. So this expands to

{{ru-derived verbs
|-/еда́ть<q:iterative>
|-
|зае́сть/заеда́ть
|нае́сть/наеда́ть
|перее́сть/перееда́ть
|надое́сть/надоеда́ть
|поднадое́сть/-
|недое́сть/недоеда́ть
|пое́сть/-
|-/поеда́ть
|прое́сть/проеда́ть
|объе́сть/объеда́ть
|надъе́сть/надъеда́ть
|подъе́сть/подъеда́ть
|разъе́сть/разъеда́ть
|изъе́сть/изъеда́ть
|съе́сть/съеда́ть
|отъе́сть/отъеда́ть
|уе́сть/уеда́ть
|вы́есть/выеда́ть
|зае́сться/заеда́ться
|нае́сться/наеда́ться
|прие́сться/приеда́ться
|прое́сться/проеда́ться
|объе́сться/объеда́ться
|въе́сться/въеда́ться
|разъе́сться/разъеда́ться
}}

but with a lot less typing. Here the hyphen on a line by itself separates "sorting groups"; within a given group, the verbs will automatically be sorted lexicographically. Note here that I put the perfective first; this is consistent with the fact that normally the prefixed imperfectives are derived from the prefixed perfectives and not the other way around, but if you definitely believe that imperfectives should be listed first, that change can be made. Benwing2 (talk) 03:19, 8 December 2022 (UTC)[reply]
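The expansion rules described above can be sketched in a few lines of Python. This is only an illustration of the abbreviation scheme as described in this thread, not the actual module code behind {{ru-derived verbs}}. The names `expand_side` and `expand_derived_verbs` are hypothetical, and the sketch deliberately leaves out stress handling (which a stressed prefix such as вы́ would need) as well as sorting within groups.

```python
def expand_side(term, suffix):
    """Expand one side of a 'perfective/imperfective' pair against a suffix."""
    if term == "-":
        return "-"  # no verb of this aspect
    if term.endswith("-"):
        if suffix is None:
            raise ValueError(f"prefix {term!r} used before any '*' template line")
        return term[:-1] + suffix  # prefix plus the current suffix
    return term  # a fully spelled-out verb passes through unchanged


def expand_derived_verbs(lines):
    """Expand abbreviated argument lines into full 'perfective/imperfective' pairs."""
    pf_suffix = impf_suffix = None
    expanded = []
    for raw in lines:
        line = raw.strip()
        if line == "-":
            expanded.append(line)  # a lone hyphen separates sorting groups
        elif line.startswith("*"):
            # a '*' line sets the suffixes used by the lines that follow
            pf_suffix, impf_suffix = line[1:].split("/", 1)
        elif "/" in line:
            left, right = line.split("/", 1)
            expanded.append(
                expand_side(left, pf_suffix) + "/" + expand_side(right, impf_suffix)
            )
        else:
            # a bare prefix, with or without a trailing hyphen, takes both suffixes
            prefix = line.rstrip("-") + "-"
            expanded.append(
                expand_side(prefix, pf_suffix) + "/" + expand_side(prefix, impf_suffix)
            )
    return expanded


spec = [
    "-/еда́ть<q:iterative>",
    "-",
    "*е́сть/еда́ть",
    "за",
    "по-/-",
    "-/по-",
    "*е́сться/еда́ться",
    "при",
]
for line in expand_derived_verbs(spec):
    print(line)
```

Each abbreviated line is resolved against the most recent `*` line, so switching from the plain verb to the reflexive one only takes a new `*` line rather than retyping every pair.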

@Benwing2 I have only put it on one page, plus the two Eru put it on, so changing this will not be hard. The important thing is that the slashes need to be able to take multiple impfs or pfs, as the template can now. As to the order: I believe I saw impf first in other places, so that's what I followed, but the fact that imperfectives are usually derived from perfectives is a good reason to put perfectives first. Let me know when you make these changes. Perhaps we could make clones or something for other Slavic languages and create some uniformity on pages.
Actually wait; there are times when the imperfective is first, e.g. anarchizować vs zanarchizować; do we still want to list the perfective first? Vininn126 (talk) 08:16, 8 December 2022 (UTC)[reply]
@Vininn126 The general rule is base verb (impf) -> prefixed verb (pf) -> prefixed verb with special suffix (impf). The reason I advocate for putting the perfective first is that there are far more cases of (prefixed verb, prefixed verb with special suffix) than there are of (base verb, prefixed verb), and these derived terms are usually placed under the base verb, so the (base verb, prefixed verb) pair often won't appear among the derived terms in any case. Benwing2 (talk) 02:49, 9 December 2022 (UTC)[reply]
Also, the changes I'm making will preserve the ability to specify multiple comma-separated verbs of a given aspect (as well as multiple comma-separated suffix templates). Benwing2 (talk) 02:51, 9 December 2022 (UTC)[reply]
@Benwing2 Sounds good to me. I say go ahead. Vininn126 (talk) 08:07, 9 December 2022 (UTC)[reply]
@Vininn126 I have implemented this; see User:Benwing2/test-ru-derived verbs and User:Benwing2/test-pl-derived verbs. Will push to production shortly. Benwing2 (talk) 07:22, 10 December 2022 (UTC)[reply]
BTW I fixed up the existing uses of {{pl-derived verbs}}. I see you are using it on nouns as well as verbs; I didn't think about this use case. When used on nouns like analiza, I agree it looks a bit strange to put the prefixed perfectives before the unprefixed imperfective. If you think it won't be confusing, I could add an argument (e.g. |impf_first=1) so that the imperfective goes first (IMO probably best to do this both in the arguments and the output). We could even do this for individual arguments if you'd prefer that; or do nothing, your choice. Benwing2 (talk) 07:54, 10 December 2022 (UTC)[reply]
@Benwing2 I think such an argument would be useful. What's funny is your changes are actually what I had in mind for the template a long time ago when I first imagined its existence. I am unsure if I want individual control or mass control - there are times where the impf is first, then perf, then a SEPARATE impf. However the symmetry is nice. Individual would add a lot of typing, however. Vininn126 (talk) 11:55, 10 December 2022 (UTC)[reply]
There is also a secondary issue where the verbs displayed on mobile behave oddly: the list shows the same six as desktop when collapsed, and they awkwardly jump around when uncollapsed (on mobile). Vininn126 (talk) 11:58, 10 December 2022 (UTC)[reply]
@Vininn126 I implemented |impf_first=1. For implementing this on a per-line basis, I would make it so that if you explicitly specify the aspect on a term using <g:impf> or <g:pf>, the remaining terms are handled appropriately (those on the same side of the slash get the same aspect and those on the other side get the opposite aspect). If you think this is useful, let me know. As for the mobile issue, can you clarify what goes wrong? Maybe User:Erutuon can help; this seems likely to be a CSS issue or an issue in Module:columns. Benwing2 (talk) 22:46, 11 December 2022 (UTC)[reply]
@Vininn126 I created documentation for {{ru-derived verbs}} and {{pl-derived verbs}}. The main documentation is at {{ru-derived verbs}}; {{pl-derived verbs}} works essentially the same way (except with no special handling of diacritics) and its documentation refers to {{ru-derived verbs}}. Benwing2 (talk) 00:55, 12 December 2022 (UTC)[reply]
@Benwing2 Cheers! And have you tried opening it on mobile yourself? Vininn126 (talk) 09:15, 12 December 2022 (UTC)[reply]
@Vininn126 I have made a change to the derived-verbs templates so that if you specify the "wrong" aspect on the left side using e.g. <g:impf>, the remaining terms on the same side take the same aspect and the terms on the other side automatically take the opposite aspect. That way if you want to reverse the order for a given derived verb pair, you only need to specify the aspect explicitly on one of the left-side terms rather than on all terms. Benwing2 (talk) 02:44, 14 December 2022 (UTC)[reply]
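The aspect-inference rule described here can be illustrated with a small Python sketch. It is a reading of the behaviour as stated in this thread, not the template's real implementation: the function name `assign_aspects` is hypothetical, and the sketch assumes that the left side defaults to perfective (or imperfective with `impf_first`), that terms on one side are comma-separated, and that an explicit tag on either side of the slash determines both sides.

```python
import re

ASPECT_TAG = re.compile(r"<g:(impf|pf)>")
OPPOSITE = {"pf": "impf", "impf": "pf"}


def assign_aspects(line, impf_first=False):
    """Return (verb, aspect) tuples for one 'left/right' derived-verb line."""
    left, right = line.split("/", 1)
    left_aspect = "impf" if impf_first else "pf"  # default order
    tag = ASPECT_TAG.search(left)
    if tag:
        # an explicit tag on the left overrides the default for that side
        left_aspect = tag.group(1)
    else:
        tag = ASPECT_TAG.search(right)
        if tag:
            # an explicit tag on the right fixes the left as its opposite
            left_aspect = OPPOSITE[tag.group(1)]
    right_aspect = OPPOSITE[left_aspect]

    def terms(side, aspect):
        # split comma-separated verbs, drop the tag, skip the '-' placeholder
        cleaned = (ASPECT_TAG.sub("", t).strip() for t in side.split(","))
        return [(t, aspect) for t in cleaned if t and t != "-"]

    return terms(left, left_aspect) + terms(right, right_aspect)


print(assign_aspects("anarchizować<g:impf>/zanarchizować"))
print(assign_aspects("analizować/przeanalizować,zanalizować", impf_first=True))
```

Only one term needs the tag; every other term on the line inherits its aspect from the side it sits on, which is what keeps the per-line override cheap to type.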
@Erutuon Can you take a look at the mobile issue called out by Vininn126? I am unfortunately not familiar enough with CSS to know how to fix it. Benwing2 (talk) 02:45, 14 December 2022 (UTC)[reply]
@Benwing2: Mobile has CSS column count set to 1, but still has the data-column-count attribute set to 2 or whatever. The script was using data-column-count to (imperfectly) calculate which items to hide, which means it was showing items that would probably be at the top of two columns if there were two columns. Hopefully fixed by using getComputedStyle to get the true number of columns. I don't feel like it's a good solution, because it retrieves every CSS property value when it only needs one, but it should work. — Eru·tuon 05:10, 14 December 2022 (UTC)[reply]
@Erutuon Thank you! Sorry, I have never worked much with front end code so I have little idea what's going on here. When you say it retrieves every CSS property value, can that lead to OOM errors or is this not a realistic concern? Benwing2 (talk) 05:47, 14 December 2022 (UTC)[reply]
@Benwing2: I don't know, though I thought about it a bit and it's probably lazy-loaded to whatever degree the developers have thought fit to improve performance. I had been looking at the returned object in the console, which loads all its properties. Probably not a problem or someone would have put up a warning to be careful with it on MDN. — Eru·tuon 14:54, 14 December 2022 (UTC)[reply]

Is the HathiTrust Digital Library down?

I've not been able to access the HathiTrust Digital Library, which some of our quotation templates use, for some days now. However, websites like Isitdownorjust.me are reporting that it is up and running. Is anyone else experiencing issues with this website? — Sgconlaw (talk) 19:41, 4 December 2022 (UTC)[reply]

It at least loaded for me. I was also able to open a random page. Vininn126 (talk) 20:21, 4 December 2022 (UTC)[reply]
@Vininn126: hmmm. It may be a geographical thing. I cannot access the site at all, whether on my laptop or a mobile device. I get a "cannot connect to server" error. This has happened for several days already. — Sgconlaw (talk) 15:39, 5 December 2022 (UTC)[reply]
FWIW, {{R:mnw:Stevens1896}} links both to that site for a page and to archive.org for the whole book. The site is accessible to me in the UK. --RichardW57 (talk) 00:08, 7 December 2022 (UTC)[reply]
@RichardW57: I’m still completely unable to use the site due to a “server stopped responding” error. It’s pretty annoying. Since you can access it, are you able to find an email address that I can use to report the issue? Thanks. — Sgconlaw (talk) 01:48, 7 December 2022 (UTC)[reply]
support@hathitrust.org looks promising. RichardW57m (talk) 15:09, 7 December 2022 (UTC)[reply]
@RichardW57m: thanks! — Sgconlaw (talk) 15:10, 7 December 2022 (UTC)[reply]

Join the Coolest Tool Award 2022: Friday, Dec 16th, 17:00 UTC

The fourth edition of the Wikimedia Coolest Tool Award will happen online on Friday 16 December 2022 at 17:00 UTC!

This award is highlighting software tools that have been nominated by contributors to the Wikimedia projects. The ceremony will be a nice moment to show appreciation to our tool developers and maybe discover new tools!

Read more about the livestream and the discussion channels.

Thanks for joining! -Komla


MediaWiki message delivery 18:53, 5 December 2022 (UTC)[reply]

This makes no sense (niner)

Discussion moved to WT:TR.

Creation of {{displaced}}

I have gone ahead and updated our glossary to include displaced, and @Catonif made the template {{displaced}} (the history says I did because they helped me create it on the Discord server). I wonder how possible it would be to check etymology sections for the word "displaced" or "displaced native" and replace this.

There's another issue: when it displaced a word from an ancestor of a language (say, an Old or Middle English word), we should use this, but what about using it for obsolete terms within the modern language? This question needs to be answered regardless of whether we want the template or not. Vininn126 (talk) 17:06, 6 December 2022 (UTC)[reply]

Certainly a newer term can displace an older term all within one (stage of a) language, like miscegenation displaced amalgamation (at least in the US). Right? "displaced earlier", "replaced earlier" finds other examples. (Another term to look for btw is supplanted, as in orchid.) - -sche (discuss) 19:24, 6 December 2022 (UTC)[reply]
Sure, that makes sense. We might be able to modify the template to include the word "earlier" if it's a same-stage displacement. Vininn126 (talk) 19:38, 6 December 2022 (UTC)[reply]
Update - that feature was added when I wasn't looking. Vininn126 (talk) 10:06, 7 December 2022 (UTC)[reply]
We've added a {{{by}}} parameter to the template for displaced terms. When this is turned on, should the template categorize and add the term to a category "displaced terms by language"? Vininn126 (talk) 20:56, 8 December 2022 (UTC)[reply]
Hmm... if it'll categorize, and not just serve to templatize a certain text to save having to type it out repeatedly, then I wonder about the current parameter setup. The documentation says 1= is "the language code [...] of the term that was displaced", and doesn't seem to take a parameter for the language of the entry it's in. So I can see which parameters would make it display ēaþmōdlīċe as an "Old English displaced term", but ... suppose only Scots and not English has replaced an Old English term, do we want to indicate that it's an "Old English term displaced in Scots" only, and do we want to put the Scots replacement word into a category of any kind? Perhaps the template should take the language code of the entry it's in (e.g. en in humbly) as 1=, for consistency with other etymology templates like {{bor}}, {{doublet}}, {{initialism}}, etc, and then the language of the other term (e.g. ang when mentioning ēaþmōdlīċe in humbly) as 2= like in {{bor}}, etc? - -sche (discuss) 01:43, 9 December 2022 (UTC)[reply]
It is built off of {{ncog}}, which means we would have to change that. That is, of course, if we found the idea of a category useful. Vininn126 (talk) 08:08, 9 December 2022 (UTC)[reply]

The heading at WT:RFV says: "At least a week after a request has been closed, if no one has objected to its disposition, the request should be archived to the entry's talk page. This is usually done using the aWa gadget, which can be enabled at WT:PREFS." There doesn't seem to be any mention of aWa at WT:PREFS. WT:aWa directs you to Special:Preferences, but I can't seem to find any reference to it there either. Does anyone know if this gadget is still available? Graham11 (talk) 03:31, 7 December 2022 (UTC)[reply]

It's there in gadgets under Miscellaneous. First one. Vininn126 (talk) 10:07, 7 December 2022 (UTC)[reply]
Aha, I think I may see the issue: WT:aWa says "Only autopatrolled users can enable the gadget in their preferences", and checking his user rights page, Graham doesn't seem to be in that user group. (He's been around quite a while; should he be added to that group?) - -sche (discuss) 10:45, 7 December 2022 (UTC)[reply]
I am not familiar with this user but I don't see why not, they have around 1300 edits. Benwing2 (talk) 03:26, 8 December 2022 (UTC)[reply]
@-sche Ah, that makes sense. What would be the process to be added to that group? Graham11 (talk) 04:55, 9 December 2022 (UTC)[reply]
Looks like Wiktionary:Whitelist. Benwing2 (talk) 05:48, 10 December 2022 (UTC)[reply]

Standardising surname templates

I noticed when adding an English surname entry to Vass that in Hungarian, for the surname it appears as "a surname", in contrast to "A surname" for the English entry. I know it's a minor thing, but shouldn't "A surname" be standardised for all languages? DonnanZ (talk) 19:47, 7 December 2022 (UTC)[reply]

This appears to be intentional, since most English definition lines begin with a capital letter, whereas non-English definitions tend to begin with a lowercase letter. See the code at Special:Permalink/70323366#L-715. I think the relevant guideline is Wiktionary:Style guide#Definitions, but that doesn't talk about whether this capitalization rule applies for non-gloss definitions like the ones generated by {{surname}}. 98.170.164.88 19:58, 7 December 2022 (UTC)[reply]
Personally, I feel as though non-gloss definitions should always be capitalised (with a full stop, too). I know that some people feel it's more consistent to treat all entries the same in a particular language, but I feel like it misses the point of why we don't follow ordinary English capitalisation rules with glosses in the first place. What's worse, the lack of capitalisation/full stop can make non-gloss definitions resemble glosses at first glance, given that sometimes italics get wrongly omitted. Theknightwho (talk) 01:02, 8 December 2022 (UTC)[reply]
I think full sentences with capitalization look awful for non-English language definitions, whether gloss or non-gloss definition, and the italicization should be enough to distinguish gloss from non-gloss definition (it works fine in English, what makes other languages different?). Benwing2 (talk) 03:32, 8 December 2022 (UTC)[reply]
I agree, and would like us to change e.g. {{alternative form of}} to also be uncapitalized outside English, personally. I wonder if we could make a PREF where users could opt in to (or out of) seeing non-English {{surname}}s etc in sentence case, or whether this would be bad because then they would want to also see the definitions/glosses in sentence case, too... - -sche (discuss) 04:18, 8 December 2022 (UTC)[reply]
Like @Theknightwho, my preference is for all glosses regardless of language to follow the usual sentence capitalization and punctuation rules. However, I don’t have skin in the game because I seldom edit non-English entries. — Sgconlaw (talk) 04:29, 8 December 2022 (UTC)[reply]
OK, it looks like the verdict is "no change". A word about full stops/periods included in some templates: it is better to eliminate them from the template and add them manually. They are a nuisance when an editor wants to add further text, with {{former name of}} for example. Yes, I know it's surmountable. DonnanZ (talk) 10:36, 8 December 2022 (UTC)[reply]
While we're at it can we make them all agree on punctuation? I prefer no punctuation for non-gloss non-English defs. Vininn126 (talk) 11:51, 8 December 2022 (UTC)[reply]
By that do you mean full stops/periods? Regardless of what we decide on capitalisation, my preference is that capitalisation should end with a full stop, and no capitalisation should end without one. Either way, I want to be able to use things like commas and semicolons. Theknightwho (talk) 00:07, 9 December 2022 (UTC)[reply]
Yeah, uncapitalized glosses/non-glosses should have no dot, like in Vass, and capitalization should go with a dot by default that can be turned off by nodot=1 if someone needs to add more text. It's annoying that some templates don't include the dot + option to turn it off, e.g. {{altform}}, so far, far, far more entries have "Alternative form of foobar" with a capital letter and no dot nor any further gloss/non-gloss, than actually have any following text. I think it would be simple to write a script that found all entries where a template like {{altform}} had a manually-written following dot, or comma or other text, so they could be cleaned up or have nodot=1 added, and then the template could be made to add a dot whenever its initial letter is capitalized (which IMO should be: for English but not for non-English). - -sche (discuss) 02:08, 9 December 2022 (UTC)[reply]
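The kind of cleanup script suggested above could start from something like the following Python sketch. It is an assumption-laden illustration, not an existing bot: the names `FORM_OF_CALL` and `classify_trailing_text` are hypothetical, only the template names {{altform}} and {{alternative form of}} are covered, and the regular expression does not handle templates nested inside the call.

```python
import re

# Matches a form-of template call without nested templates, plus the rest of the line.
FORM_OF_CALL = re.compile(
    r"(?P<call>\{\{(?:altform|alternative form of)\|[^{}]*\}\})(?P<rest>[^\n]*)"
)


def classify_trailing_text(wikitext):
    """Return (template call, kind) pairs describing what follows each call."""
    results = []
    for match in FORM_OF_CALL.finditer(wikitext):
        rest = match.group("rest").strip()
        if not rest:
            kind = "bare"  # nothing follows: a template-supplied dot would be safe
        elif rest == ".":
            kind = "manual dot"  # the hand-typed dot could be dropped
        else:
            kind = "further text"  # a candidate for |nodot=1
        results.append((match.group("call"), kind))
    return results


sample = (
    "# {{altform|en|foobar}}\n"
    "# {{altform|en|foobar}}.\n"
    "# {{alternative form of|en|foobar}}, used in poetry\n"
)
for call, kind in classify_trailing_text(sample):
    print(kind, call)
```

Sorting entries into these three buckets first would show how many pages actually need `nodot=1` before any default in the template is changed.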
@-sche It is indeed a mess currently. There is a relatively up-to-date table at Category:Form-of templates documenting all the templates and the state of initial capital and final period with them. The problem is that there is no consensus about how to standardize them, as can be seen from this very discussion (e.g. I would prefer to have no default periods at all; it is easier to type a final period than to add |nodot=1). So they have tended to stay in their current muddled state (reminds me strangely of Puerto Rico ...). Benwing2 (talk) 02:57, 9 December 2022 (UTC)[reply]
@Benwing2: A muddled state indeed. A language-specific template like {{en-third-person singular of}} has a capital letter and no full stop, {{plural of}} has neither. I suggest you take the bull by the horns. Is there any objection to removing the full stop from {{former name of}}? DonnanZ (talk) 11:11, 9 December 2022 (UTC)[reply]
I do mean periods. And I agree with your preference. Vininn126 (talk) 08:10, 9 December 2022 (UTC)[reply]
@Benwing2, "it is easier to type a final period than to add |nodot=1" in one entry, perhaps, but AFAICT the number of entries where someone needs to suppress the dot for most of these ({{altform}}, etc) is small compared to the number where they don't, so requiring people to add the dots manually makes for more work overall than just having the template put the dot (and the work goes undone, so entries like kilikinick just start off with a capital and trail off without a dot).
I feel like the way votes require 2/3rds to change the status quo works for situations where there is a status quo, but for situations like this where there is no consistent status quo approach, but most of us agree there should be a consistent approach (and just disagree on what it should be), we might benefit from being able to run a vote between the two options and say whichever of them gets the majority of the votes gets implemented so at least things are consistent. - -sche (discuss) 09:48, 9 December 2022 (UTC)[reply]
@-sche I am inclined to agree with you although there appear to be more than two options here: should we capitalize form-of templates always or only in English, and should we add a final period always, only in English or never? My preference is probably to capitalize and add a period for English but not otherwise, consistent with how glosses are treated. Benwing2 (talk) 05:44, 10 December 2022 (UTC)[reply]
I agree here. Vininn126 (talk) 11:48, 10 December 2022 (UTC)[reply]
Since there’s probably no consensus on extending the English position to other languages, I agree too. — Sgconlaw (talk) 11:51, 10 December 2022 (UTC)[reply]
I'm not so sure. DonnanZ (talk) 15:41, 10 December 2022 (UTC)[reply]

Another unsupported title not displaying correctly

& ; needs to be added to MediaWiki:UnsupportedTitles.js. Binarystep (talk) 09:23, 8 December 2022 (UTC)[reply]

Added. I also mostly alphabetized the list. - -sche (discuss) 01:55, 9 December 2022 (UTC)[reply]

{{Han ref}} threw an error when I tried to add an apostrophe in |dkj= here. For reference, see the kIRGDaiKanwaZiten index at the Unihan Database. Rdoegcd (talk) 17:26, 8 December 2022 (UTC)[reply]

The template {{Han ref}} includes the code {{#expr:{{{dkj}}}+0}}. The only possible reasons for using that code instead of just {{{dkj}}} on its own are to validate that the input is a number, or potentially to remove leading zeros. If the assumption that {{{dkj}}} should always be a number without any extra punctuation is wrong, then that extra code should be removed or changed. 98.170.164.88 23:41, 8 December 2022 (UTC)[reply]
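For illustration, a more permissive check than the `#expr` arithmetic might look like the Python sketch below. It rests on an assumption, namely that a valid index is a run of digits optionally followed by a single apostrophe, as the attempted edit suggests; the name `normalize_dkj` is hypothetical and this is not the template's code. Like adding zero, it drops leading zeros, but it keeps the apostrophe instead of failing on it.

```python
import re

# Assumption: ASCII digits, optionally followed by exactly one apostrophe.
DKJ_PATTERN = re.compile(r"([0-9]+)('?)")


def normalize_dkj(value):
    """Validate an index value and strip leading zeros, keeping any apostrophe."""
    match = DKJ_PATTERN.fullmatch(value.strip())
    if match is None:
        raise ValueError(f"not a valid index: {value!r}")
    digits, mark = match.groups()
    return str(int(digits)) + mark


print(normalize_dkj("01234"))
print(normalize_dkj("01234'"))
```

Anything else, such as letters or a second apostrophe, still raises an error, so the validation role of the original expression is preserved.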

Trying to add a citation but get blocked as "spam"

This is the citation I was trying to add to the "docuseries" web page

https://www.bbc.co.uk/news/uk-wales-63903853 "The stars also worked on a Disney+ docuseries chronicling the purchase and stewardship of the club, who currently play in the National League, which was being filmed when the King and Queen Consort visited" Petermab (talk) 17:34, 9 December 2022 (UTC)[reply]

To be specific, the abuse filter prevented you from creating the page Citations:docuseries, perhaps because your account is new and the text contains some phrase the filter doesn't like. Dunno, since I can't examine the log details. Anyway, I added the quotation to the main entry docuseries. 98.170.164.88 19:59, 9 December 2022 (UTC)[reply]

Someone please add Standard Babylonian to Module:etymology languages/data.

That would be akk-sb, together with and akin to the other Akkadian varieties. Biolongvistul (talk) 21:18, 9 December 2022 (UTC)[reply]

This has been created and it would be nice to have it be a type of "universal" template for derived/related terms sections: it reduces the number of templates people have to learn, creates uniformity, and makes things easier for readers, since the template automatically adjusts to the reader's screen. I think we should at least replace instances of colX with this. I think it would be more controversial to replace instances of {{l}} in those sections, although many of those sections use bare links (as on Czech pages, due to one Mr Dan Polansky et al.); those should at LEAST have {{l}}. Vininn126 (talk) 15:21, 13 December 2022 (UTC)[reply]

Support, and preferably moved to {{col}}. — Fenakhay (حيطي · مساهماتي) 15:28, 13 December 2022 (UTC)[reply]
@Fenakhay Could you maybe not go around replacing {{col4}} with {{col-auto}} without consensus? Thanks. Thadh (talk) 23:20, 13 December 2022 (UTC)[reply]
I am replacing it only in Maltese entries :). — Fenakhay (حيطي · مساهماتي) 23:21, 13 December 2022 (UTC)[reply]
diff As long as you are, it's fine. Thadh (talk) 23:27, 13 December 2022 (UTC)[reply]
Weak oppose: I like how, for instance, {{col4}} puts everything in one line unless n > 4, whereas {{col-auto}} automatically puts them under each other. Also, I'm not sure {{col-auto}} is equipped to deal with other scripts, cf. Yiddish יאָדלע (yodle) which would become quite cluttered if placed in a {{col4}} or {{col5}}. Thadh (talk) 23:31, 13 December 2022 (UTC)[reply]
Oppose for now, I'm not a fan of the simplistic way that Module:columns/auto determines the number of columns and actually it's not obvious to me that higher n = more columns is the way to go about it—the primary factor for me is probably the visual length of each item, as Thadh suggests, and not n beyond a certain small number. —Al-Muqanna المقنع (talk) 23:48, 13 December 2022 (UTC)[reply]
I agree; length of each item should be the main factor, not total number of items. I tried implementing a proof-of-concept (hacky) way of determining column count from the number of visible characters per line. See Module:columns/auto/sandbox and Special:Permalink/70395376. I'm not very satisfied with it, though maybe it could be improved. I wonder whether letting the browser determine the number of columns automatically using CSS column-width would be a more elegant solution. 98.170.164.88 02:00, 14 December 2022 (UTC)[reply]
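For illustration, a minimal Python sketch of the general idea of deriving a column count from item length rather than from item count. This is not the code in Module:columns/auto/sandbox; the function name, the 120-character line budget, the gutter and the 5-column cap are all assumptions.

```python
def column_count(items, line_budget=120, max_columns=5, gutter=2):
    """Pick the largest column count whose columns can hold the longest item.

    len() counts code points, which is only a crude proxy for rendered width.
    """
    if not items:
        return 1
    longest = max(len(item) for item in items)
    # Each column needs room for the longest item plus a gutter.
    fit = line_budget // (longest + gutter)
    # Never use more columns than items, and always at least one.
    return max(1, min(max_columns, fit, len(items)))
```

Under these assumptions a short list of short words gets one column per item, while a list of long items collapses to one or two columns regardless of how many items it has.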
Support, though based on Thadh and al-Muqanna's comments the template apparently needs to be worked on. Ultimately uniformity is needed: there are too many templates (2, 3, 4, 5...) and no clear rule that dictates what should be used in what context, with the result that everyone uses whatever suits their aesthetic preference. Catonif (talk) 23:59, 13 December 2022 (UTC)[reply]
Prefer using column width: I would like to have fewer column templates, but column width is better than column count to adapt to people's screens. Currently {{col-auto}} (and templates based on Module:columns in general) use column count. As Al-Muqanna says, the length of list items is as important as how many items there are. The way Module:columns/auto does it, if there are many items, long items can be squished into multiple lines (or they'll protrude across columns) on narrower screens. I'm not sure how many people this will affect because we're disabling multiple columns in mobile mode using MediaWiki:Mobile.css. But the four-column list on Template:col4/documentation displays badly on my phone if I switch to the desktop site by clicking the link at the bottom of the page. I suppose if I made my desktop browser narrower, the lists would display badly as well. It's not feasible to reliably determine the length of list items in a module, so it's better to let people choose column widths in entries. We would have to permit one of several widths to be chosen as proposed for translations. — Eru·tuon 01:00, 14 December 2022 (UTC)[reply]
@Erutuon What do you mean exactly by "it's better to let people choose column widths in entries"? What are you proposing? BTW it's definitely possible to approximate the width using the number of characters displayed, and this might be good enough. This is done currently in {{pt-IPA}} and various other pronunciation templates to determine how big of a box to place the entries in. Benwing2 (talk) 02:37, 14 December 2022 (UTC)[reply]
@Benwing2: I mean we could have a number of classes each with a different em width in a CSS file and the template could have a parameter to choose between them. So if the template received |width=widest it could format the list with column-width: 40em;, |width=narrowest with column-width: 10em;, or whatever widths or parameter values we chose. I don't think calculating approximate character width in general is really feasible. It might be sort of possible for IPA characters, but doing it for all characters requires going from code units to code points to graphemes to calculating how wide the graphemes will be in the fonts readers happen to have. Latin script has variable width characters and some scripts like Javanese have much wider characters in the fonts my browser uses. We could parse graphemes and words in Lua, but we don't have the last link in the chain, character width information for fonts. So manually choosing from a set of possible widths seems a saner solution. — Eru·tuon 03:59, 14 December 2022 (UTC)[reply]
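The preset-width idea described above could be sketched roughly as follows (in Python for brevity, not as module code). Only the "widest" (40em) and "narrowest" (10em) values come from the comment; the intermediate names, their values, the default and the function name are hypothetical.

```python
# Only "narrowest" (10em) and "widest" (40em) are taken from the comment above;
# the other names and values are hypothetical placeholders.
COLUMN_WIDTHS_EM = {
    "narrowest": 10,
    "narrow": 15,
    "medium": 20,
    "wide": 30,
    "widest": 40,
}


def column_style(width="medium"):
    """Map a |width= value to a CSS declaration, rejecting unknown values."""
    if width not in COLUMN_WIDTHS_EM:
        raise ValueError(f"unknown width: {width!r}")
    return f"column-width: {COLUMN_WIDTHS_EM[width]}em;"
```

Restricting the parameter to a fixed set of names keeps the choice in a stylesheet-sized list of classes, and leaves the actual number of columns to the browser.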
@Erutuon I would prefer to use an automatic algorithm but allow it to be overridden. Requiring a manual param is going to lead to too much error and bad results as new items are added. Benwing2 (talk) 04:03, 14 December 2022 (UTC)[reply]
@Benwing2: Well, the automatic algorithm would have to pick from a number of preset column widths in any case because MediaWiki list syntax provides no way to specify an arbitrary column width. A MediaWiki list with * or # or : or ; has the parser insert an implicit list element (ul, ol, dl), which can't be styled with style="column-width: whatever;". Manually converting MediaWiki list syntax to HTML, so that there's a list element to add the CSS to, would work, but could increase Lua memory because it would create more string objects. An alternative to trying to figure out the rendered width of words in list items in Lua would be to set a reasonable default width and let people override it. — Eru·tuon 20:42, 21 December 2022 (UTC)[reply]
Proposal I would like to suggest something hopefully noncontroversial in the meantime, which is to convert the {{col}} template so that the number of columns is a named param |n= rather than a numbered param |2= coming before the terms. That way if/when we implement autosizing of the number of columns in a way that is generally acceptable, we can switch {{col}} to be autosizing by simply allowing |n= to be omitted. BTW there are < 170 current uses of {{col}} so this conversion should be easy. Benwing2 (talk) 02:40, 14 December 2022 (UTC)[reply]
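A rough Python sketch of what such a conversion might look like for simple invocations. It assumes |1= is the language code and |2= the column count, and it deliberately skips any {{col}} call containing nested templates or links; the function name is hypothetical and a real bot run would need a proper wikitext parser.

```python
import re

# Matches only "flat" {{col|...}} calls: no nested templates or wikilinks inside.
COL_CALL = re.compile(r"\{\{col\|([^{}\[\]]*)\}\}")


def convert_col(wikitext: str) -> str:
    """Rewrite the second positional parameter of {{col}} as |n=."""

    def rewrite(match):
        params = match.group(1).split("|")
        positional_seen = 0
        for index, param in enumerate(params):
            if "=" in param:
                continue  # named parameter, leave untouched
            positional_seen += 1
            if positional_seen == 2:
                value = param.strip()
                # Only convert when the second positional param is a number.
                if re.fullmatch(r"[0-9]+", value):
                    params[index] = "n=" + value
                break
        return "{{col|" + "|".join(params) + "}}"

    return COL_CALL.sub(rewrite, wikitext)
```

Because an already converted call has a word rather than a number as its second positional parameter, running the sketch twice leaves the text unchanged.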
@Thadh@Al-Muqanna We can change how col-auto works; the main issue that Catonif brings up is the sheer amount of choice and the only difference being the preference of the aesthetic of the given editor, which can cause headache and annoyance for the reader. By having one set it's more of a compromise for everyone. If people prefer things in lines, then sure, we can make that change, the proposal is more about trying to reduce template bloat and create a standard. Would you support if col-auto was changed to include things in a single line when possible? Vininn126 (talk) 10:27, 14 December 2022 (UTC)[reply]
The choice of how many columns to use is an aesthetic one and the template will need a smarter way to determine that if it's to be preferable to a manual decision, i.e. based on visual width and not n. I would support if such a method is figured out and the change to the underlying module is made, or at minimum if a manual override is added as Benwing mentions above. —Al-Muqanna المقنع (talk) 10:31, 14 December 2022 (UTC)[reply]
Sure, that's fine - but the problem with basing your vote on purely an aesthetic choice is that it leaves way too much room open to interpretation. What you find aesthetically unpleasing someone else will prefer, and you can make whatever arguments you want for why your choice is better but it's a bit like arguing about food preference. What someone prefers is just that. It's better to make a compromise with as many people here when one can. Vininn126 (talk) 10:35, 14 December 2022 (UTC)[reply]
And to add upon this - back when I had Polish entries switch to col 3 (there used to be a completely different style), I didn't like the aesthetic. But I also recognized the function of it. Vininn126 (talk) 11:32, 14 December 2022 (UTC)[reply]
But that's the whole point isn't it? You can either have a lot of choice and be aesthetically pleasing or you can have uniformity and not be. It seems Al-Muqanna and I prefer the former, whereas you and Catonif prefer the latter. Of course we all want uniformity and aesthetics, but as Al-Muqanna has pointed out, {{col-auto}} currently doesn't provide that. Thadh (talk) 11:33, 14 December 2022 (UTC)[reply]
Yes, but then readers looking at your entries may find them absolutely NOT pleasing, or other people won't. So while maybe you're happy, a whole bunch of other people aren't. And that said, what if we made col-auto put them on the same line when possible? Vininn126 (talk) 11:37, 14 December 2022 (UTC)[reply]
If the template can do that and factor width in, I'm fine with using it. Thadh (talk) 11:38, 14 December 2022 (UTC)[reply]
IMHO, the number of columns should be an aesthetic matter, as long as "aesthetic" is a synonym of "ease of use/comprehension". And it should vary by the width of the window/frame in which the content appears. Tables that have items with lengths bimodally distributed can be particularly ugly, but I'm guessing that AI would be required to resolve the ugliness. DCDuring (talk) 14:30, 14 December 2022 (UTC)[reply]
Yeah, there seems to be a presumption that "aesthetic" implies "arbitrary", but it doesn't in this case: we're just talking about the look, and in terms of reader understanding or general UX there are some aesthetic choices that are objectively better than others. Thadh's point above about the Yiddish is an obvious example: entries in a foreign script are generally longer because they also contain a transliteration, and it's preferable for these not to be chopped into multiple rows on ordinary screen sizes. Uniformity should not be prioritised over this. —Al-Muqanna المقنع (talk) 16:12, 14 December 2022 (UTC)[reply]
We can make those changes and still make it the default. Vininn126 (talk) 16:23, 14 December 2022 (UTC)[reply]

What about letting the browser pick the number of columns using the same solution as was recently implemented by User:This, that and the other for translation tables? Maybe there's some technical reason why this works for translation tables and not elsewhere, but if so I'm not aware of it. Benwing2 (talk) 01:30, 15 December 2022 (UTC)[reply]

This page needs to be removed. It shows up in lower-end search engine results as looking to be the correct spelling. And when you visit this website (that page), it is not immediately clear that it is intended as a placeholder for a pronunciation.

Was trying to add the following to the top of the page but it would not let me, just kept stating that I was trying to post something harmful.

http://pastehere.net/gzsG98H/ VirileLeo (talk) 17:29, 13 December 2022 (UTC)[reply]

If you want to remove an English entry you should either propose deleting it at WT:RFDE or ask for verification at WT:RFVE rather than here. I'd note, though, that I would not support deleting this. It looks decently attested as a deliberate pronunciation spelling, just like the entry says, not a misspelling, so your edit would have been incorrect. A pronunciation spelling is not just a "placeholder for a pronunciation", it can be used e.g. in fiction to represent characters' accents or as eye dialect. —Al-Muqanna المقنع (talk) 23:26, 13 December 2022 (UTC)[reply]

Yet another unsupported title needing some MediaWiki:UnsupportedTitles.js love

It's [sic] this time. Whoop whoop pull up Bitching Betty ⚧️ Averted crashes 09:26, 16 December 2022 (UTC)[reply]

Hmm, I'm not sure why we should have an entry for this. Putting brackets or parentheses around a word doesn't change it into a different word. 98.170.164.88 09:29, 16 December 2022 (UTC)[reply]
RFDed. Equinox 09:45, 16 December 2022 (UTC)[reply]

I have one idea and one question regarding this module. 1) Is it possible to modify this template to allow a page parameter? (This is why I am posting this in the Grease Pit as opposed to say, the Tea Room) 2) Would it make sense to make a bibliography for Middle Polish/obsolete Polish and also based off of public domain dictionaries? I know it's mostly for extinct languages, and Middle Polish is treated as an extinct label of sorts, but I also see Q's for some living languages, albeit they are "smaller". Vininn126 (talk) 13:21, 16 December 2022 (UTC)[reply]

@Vininn126 (1) Yes, I can see about adding that. (2) Sure, I don't see why not. Benwing2 (talk) 03:45, 18 December 2022 (UTC)[reply]
@Benwing2 Thanks. Vininn126 (talk) 08:19, 18 December 2022 (UTC)[reply]
Also, what is the difference between using this and, say, creating an RQ template? Vininn126 (talk) 10:46, 18 December 2022 (UTC)[reply]
@Vininn126 It seems that {{Q}} was originally created for Latin and Ancient Greek primary sources, where the author and work names are fairly standardized, and has the effect of standardizing the formatting across works. In all other cases I think RQ templates are better because they allow for all relevant bibliographic information to be displayed (albeit at the possible expense of some consistency of formatting, which can be alleviated by using one of the {{cite-*}} templates in the RQ definition). Benwing2 (talk) 20:43, 18 December 2022 (UTC)[reply]

Haha, that's not right, is it? Equinox 17:33, 16 December 2022 (UTC)[reply]

I thought mi was the language code for Maori, but obviously not in this case. Anagrams are a pain in the wotsit. DonnanZ (talk) 23:21, 16 December 2022 (UTC)[reply]
  Fixed, though I don’t know if it will be re-added by a bot. I didn’t realize that entries with slashes aren’t actually subpages of entries! — Sgconlaw (talk) 04:56, 17 December 2022 (UTC)[reply]
It wouldn't surprise me if it was re-added. DonnanZ (talk) 11:19, 17 December 2022 (UTC)[reply]
Do we know who runs the anagram bot? — Sgconlaw (talk) 11:40, 17 December 2022 (UTC)[reply]
@Sgconlaw: NadandoBot (talkcontribs). DonnanZ (talk) 13:52, 17 December 2022 (UTC)[reply]
@DTLHS: could you update your bot so that it avoids creating spurious anagrams like the one discussed here? Thanks. — Sgconlaw (talk) 13:54, 17 December 2022 (UTC)[reply]
Is DTLHS still active? The bot hasn't been fired up since 9 February this year. DonnanZ (talk) 21:55, 19 December 2022 (UTC)[reply]
Oh, gosh. — Sgconlaw (talk) 19:34, 23 December 2022 (UTC)[reply]

Main space does not have the subpages feature enabled. It is counted as a valid entry, but it is not a legit one. It should be moved with some namespace prefix, e.g. "Appendix:Translations/mi". --Vriullop (talk) 11:56, 17 December 2022 (UTC)[reply]

@Vriullop: however, at the moment it doesn't seem to be the practice to do so when creating "subpages" of entries to avoid module errors on entry pages. — Sgconlaw (talk) 13:57, 17 December 2022 (UTC)[reply]
I was thinking that we needed a Translations namespace, but using appendices is a good idea, especially since we also have fake subpages for derived terms (se/derived terms; see a search) and we'd need a new namespace for everything that we want to move out of entry pages. — Eru·tuon 18:36, 17 December 2022 (UTC)[reply]
I recall a few particularly large, CAT:E-prone Chinese entries having to have even more content than that moved to subpages, so yes, it's not just translations. 口/derived terms actually contains the entry's synonyms, compounds, and descendants. - -sche (discuss) 06:14, 18 December 2022 (UTC)[reply]
Okay, yes, we'd better not change the anagram bot while we still have actual slashed entries like he/she. Equinox 19:36, 23 December 2022 (UTC)[reply]
Ah. — Sgconlaw (talk) 11:35, 24 December 2022 (UTC)[reply]

|catN= parameter for {{af}}

Italian a (style) works very much like a prefix, but since it rarely forms univerbed terms, it cannot be lemmatized as a-, and thus isn't recognized by {{af}} as a prefix. As a result, a category like Category:Italian compounds with a (style) can only be populated by manual categorization, as is done for example with Category:Ancient Greek compounds with ποιέω, which is unsupported by both {{af}} and {{auto cat}}.

I suggest the creation of a |catN= parameter that takes a boolean value and controls whether the entry is categorized for the Nth morpheme, so that categorization can be enabled for just one of them. So for example the etymology of a cappella would be:

From {{af|a|cappella|cat1=1|id1=style|t2=choir of a church}}.

This could also be useful in the case in which multiple affixes are given and one wants to disable the categorization of only one of them with |catN=0, while |nocat= doesn't allow that.
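One possible reading of the proposal, sketched in Python: an explicit |catN= value overrides whatever the template would otherwise decide for the Nth part. The hyphen test used as the default is a simplification assumed for this example, not how {{af}} actually recognizes affixes, and the function name is hypothetical.

```python
def parts_to_categorize(parts, cat_flags=None):
    """Return the parts that should get a category.

    parts: list of morphemes as passed to the template.
    cat_flags: {1-based position: bool}, standing in for |catN= values.
    """
    cat_flags = cat_flags or {}
    selected = []
    for index, part in enumerate(parts, start=1):
        # Assumed default: only hyphenated parts are treated as affixes.
        looks_like_affix = part.startswith("-") or part.endswith("-")
        # An explicit |catN= wins over the default.
        if cat_flags.get(index, looks_like_affix):
            selected.append(part)
    return selected
```

Under this reading, |cat1=1 on a cappella would categorize for a even though it has no hyphen, and |cat2=0 on a term with several affixes would suppress only the second one.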

Following the change it would also need to be implemented in {{auto cat}}, and some sort of {{compoundsee}} would need to be created. Catonif (talk) 20:59, 19 December 2022 (UTC)[reply]

Apparently {{compoundsee}} already exists. Catonif (talk) 22:19, 19 December 2022 (UTC)[reply]

Automatic import from Palestinian Arabic dictionary

The largest open machine-readable dictionary for Palestinian Arabic [ajp, South Levantine Arabic] has just been released by the CAMeL Lab at New York University Abu Dhabi: website / paper. The paper was accepted at The Annual Workshop on Arabic Natural Language Processing (WANLP 2022). It was praised by Noam Chomsky, Hamid Dabashi, Abdelkader Fassi Fehri, Ilan Pappé, Walid Saif and Clive Holes (see here).

With 36k entries from 17k lemmas, and 3.7k roots, it is probably the largest dictionary ever made for any Arabic variety.

It is released under CC BY-SA 4.0 and available in TSV. All entries include:

  • {{ajp-root}}
  • part of speech (NOUN:FS, NOUN:MS, NOUN:P, etc.)
  • diacritized Arabic orthography
  • phonological transcription
  • English glosses

~30% also include an MSA translation. 50% contain examples. Some have context notes ("archaic", "borrowed from Turkish", "medicine", "offensive", etc.).

Given the quality of this free dictionary and our poor coverage of South Levantine Arabic (2,816 lemmas) I'd like to automatically add its content to Wiktionary. How could we do that?

I created the template {{R:ajp:Maknuune}} and added two entries:

(please note that I've already asked the question last month, before the release and before we were sure about the license) A455bcd9 (talk) 13:32, 20 December 2022 (UTC)[reply]

Thanks @Fenakhay for improving the formatting of the above examples. I'm sure you'll notice that Maknuune uses CAPHI instead of our guidelines (Wiktionary:About Levantine Arabic). If we decide to import Maknuune, we should convert CAPHI to "our" transliteration: I assume we could do that automatically as well (e.g., aa => ā)? A455bcd9 (talk) 14:30, 20 December 2022 (UTC)[reply]
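A minimal Python sketch of the kind of automatic conversion meant here. Only the pair aa => ā comes from the comment above; the other pairs are hypothetical placeholders, and the sketch assumes the CAPHI transcription arrives as one unsegmented string, which may not match the actual data.

```python
# Only "aa" -> "ā" is taken from the comment above; the other pairs are
# hypothetical and would have to be replaced by the real correspondence table.
CAPHI_TO_TRANSLIT = {"aa": "ā", "ii": "ī", "uu": "ū"}


def convert(caphi: str) -> str:
    """Replace known CAPHI sequences, trying longer keys first."""
    keys = sorted(CAPHI_TO_TRANSLIT, key=len, reverse=True)
    out = []
    i = 0
    while i < len(caphi):
        for key in keys:
            if caphi.startswith(key, i):
                out.append(CAPHI_TO_TRANSLIT[key])
                i += len(key)
                break
        else:
            # No key matches at this position: copy the character unchanged.
            out.append(caphi[i])
            i += 1
    return "".join(out)
```

A greedy scan like this cannot tell a genuine long vowel from two identical vowels that merely meet at a boundary, so its output would still need to be checked by hand.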
After a mere examination of the legal sufficiency of your claims but less so the content quality of the material – already vouchsafed by experts – you want to populate Wiktionary with, I answer that you may, and your observation that their transcription scheme has to be converted is also correct (it is insufferable by our standards), and it is mostly but not wholly correct that it can be adapted automatically since it is difficult to implement for a bot to know whether something is a digraph or a morpheme boundary (as well as for humans in some cases, which is the most important reason why transcriptions in digraphs are frowned upon)—you probably have to convert the transcriptions in your data first and look through the results yourself then add it by bot, provided our community has authorized your bot by formal vote, for which purpose you would do a test run of it adding a low two-digit number of entries. If I am correct to observe that your data does not tell the genders of nouns explicitly your system should also make a reasonable guess of the genders of them as the reader is expected to do lest it leave too many requests for manual edits. Fay Freak (talk) 17:41, 21 December 2022 (UTC)[reply]
@Fay Freak: thanks for your answer. I've just had a call with @Christioscay, one of the co-authors, who confirmed:
  • Data tells genders explicitly (in both the PDF and the TSV files, for instance NOUN:MS for a masculine singular noun and NOUN:FS for feminine singular)
  • There's a 1:1 correspondence between the CAPHI transcription and IPA (it was used to generate IPA in the PDF dictionary, based on the TSV file). Converting to our transcription system shouldn't be a problem either.
  • Christioscay already has a script to convert the TSV into a PDF and will publish it on GitHub. The script could be adapted to populate Wiktionary.
Before starting a formal vote, I'd like to get more feedback from the community, and especially from Arabic contributors. What is the best way to do so besides posting here? A455bcd9 (talk) 17:51, 21 December 2022 (UTC)[reply]
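As an illustration of the first point above, the part-of-speech tags could be split into part of speech, gender and number roughly like this. The tags NOUN:MS, NOUN:FS and NOUN:P are from this thread; the assumption that every other tag follows the same letter scheme, and the function name, are mine.

```python
GENDERS = {"M": "masculine", "F": "feminine"}
NUMBERS = {"S": "singular", "P": "plural"}


def parse_pos_tag(tag: str) -> dict:
    """Split a tag such as 'NOUN:MS' into part of speech, gender and number."""
    pos, _, features = tag.partition(":")
    gender = None
    number = None
    # Assumed scheme: each feature letter encodes either a gender or a number.
    for letter in features:
        if letter in GENDERS:
            gender = GENDERS[letter]
        elif letter in NUMBERS:
            number = NUMBERS[letter]
    return {"pos": pos.lower(), "gender": gender, "number": number}
```

Tags without a gender letter, such as NOUN:P, simply come back with the gender unset, which an import script would then have to handle explicitly.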
If this is done, add a less alarming variant of {{Webster 1913}} to newly created entries so readers are aware that they were added in bulk. I don't have a strong feeling in general. The only time I look up ajp words is when I see a photo of a sign and want to know what the words mean. I'm not even sure if the words are typically in MSA or dialect. Vox Sciurorum (talk) 20:03, 21 December 2022 (UTC)[reply]
Hi @Vox Sciurorum: The template I initially created looked like {{Webster 1913}} (see here). @Fenakhay changed it to a reference (see {{R:ajp:Maknuune}}). Which format should we use "so readers are aware that they were added in bulk"? A455bcd9 (talk) 22:33, 21 December 2022 (UTC)[reply]
@A455bcd9 {{R:ajp:Maknuune}} is the correct name; not sure what's the best appearance though. Benwing2 (talk) 07:54, 22 December 2022 (UTC)[reply]
Hi @Benwing2, yes I wondered about the appearance mainly. (pinging @ShahdDibas, the main author, who's just created an account.) A455bcd9 (talk) 08:35, 22 December 2022 (UTC)[reply]
(@Benwing2, e/c) Well, {{R:ajp:Maknuune}} is the correct name and appearance for a reference template, but I think Vox's point is that we should also have a "this was added en masse without being checked for completeness" notice, like how T:Webster 1913 (the warning for words added from Webster without checking for missing senses) is not the same as T:R:Webster 1913 (the reference template for words Webster has relevant content on but which have been checked to be up to date etc, or were entered without initially using Webster), and we're gradually going through entries with the former and converting them to the latter once they've been checked. A455bcd9's initial content was good as a banner, though I don't know what it should be called. If we already have ajp entries we could add Maknuune as a reference for (but which we didn't import from Maknuune), or we check individual entries imported from Maknuune and find them to be complete, that's what {{R:ajp:Maknuune}} is for, whereas the banner is for flagging bot-imported, unchecked entries, so we need both. - -sche (discuss) 08:37, 22 December 2022 (UTC)[reply]
@-sche Agreed but I don't like the former template name {{Maknuune}}; it needs to have a more standardized name. One possibility is to use the same template for both but add a flag like |dump=1 or |banner=1 to display the banner and indicate that the entries were dumped en masse. When they're cleaned up you can just remove that flag. Benwing2 (talk) 09:04, 22 December 2022 (UTC)[reply]
@A455bcd9, -sche, Benwing2, Fay Freak, Vox Sciurorum: I've created {{bulk import}} as a generic template.
In my opinion, the template should be placed under the language header temporarily (as in User:Fenakhay/أكزخانة). Once the entry has been checked and amended, the aforementioned template should be deleted and a references section added with {{R:ajp:Maknuune}}. — Fenakhay (حيطي · مساهماتي) 22:40, 22 December 2022 (UTC)[reply]
@A455bcd9 All this needs to be done carefully so we don't end up adding a bunch of low-quality entries to Wiktionary. For example, the dictionary specifically covers Palestinian Arabic but Wiktionary classifies this variant as part of South Levantine Arabic, which also includes Jordan (and parts of Syria). Not all Palestinian terms will exist in Jordanian speech, and some that do may have different meanings. Yet adding a "Palestine" tag to all such entries may be misleading if the term also exists in Jordan. In addition there are certainly a ton of corner cases that will arise once someone starts writing the conversion script; whoever does the script needs to be both a good programmer and a good linguist. IMO it's unlikely an existing script to convert TSV to PDF will be at all useful; the conversion needs to be into Wiktionary templates, which have nothing to do with PDF. (Also, as for the size of this dictionary, it is probably similar in size to the Richard Harrell dictionaries; he did at least Moroccan and Syrian dictionaries, both of which I have in storage somewhere.) Benwing2 (talk) 01:16, 23 December 2022 (UTC)[reply]
Hi @Benwing2: the main author, Shahd, is a native speaker and PhD in linguistics at the University of Oxford. The article and the dictionary are published by the CAMeL at NYU Abu Dhabi which is one of the best research labs in Arabic natural language processing. @Christioscay (Christian (Khairallah) Cayralat / NYU profile) did the script to convert the TSV to PDF. You can see he's both a good programmer and a good linguist. And of course the script to convert TSV to PDF is useful because it's the exact same coding blocks and functions. Only the output format changes (Wiktionary templates instead of PDF templates), but that's the easiest part of the script.
I don't think the absence of a tag "Palestine" is an issue to start with. That's already how most entries are added. (And Jordanian and Palestinian are very similar to each other)
I think the Syrian dictionary was done by Stowasser (part of the Harrell series though?). It contains 15,000 entries and subentries over 288 pages, compared to 36k entries over 1,293 pages in Maknuune. I assume Harrell's 1983 Moroccan dictionary also has about 13k entries based on the 2019 update and the number of pages (528 for English-Moroccan and Moroccan-English so about 264 pages for each "direction"). And of course, neither Stowasser's nor Harrell's dictionaries are licensed under a free license. A455bcd9 (talk) 07:45, 23 December 2022 (UTC)[reply]
@A455bcd9 I am not questioning the skills or credentials of the authors of this dictionary but I think you fail to appreciate the complexities involved in dumping 30000+ entries into Wiktionary. Given that this is a relatively obscure language, it's unlikely very many people will be fixing up the entries unless you do it, so it's likely whatever gets dumped will remain, and I am concerned about ending up with low-quality entries that hang around for years. The creator of the script needs to understand Wiktionary templates well, and I still maintain that the existing TSV to PDF script is unlikely to be easily convertible into what you need. Unless you're a programmer yourself you might not understand the magnitude of the task required here. Benwing2 (talk) 08:28, 23 December 2022 (UTC)[reply]
@Benwing2: I'm a programmer myself. Yes it's a big task, but we don't have to (and we probably shouldn't) dump the 36k entries all at once. What I had in mind was to start with ~10 random entries then ~100 then ~1000 (etc.) and get feedback along the way. A455bcd9 (talk) 08:33, 23 December 2022 (UTC)[reply]
@A455bcd9 IMO a better approach is to start writing the script and running it on random entries, but don't actually dump the entries into Wiktionary until you are pretty happy with the results. Only then get feedback and iterate. Otherwise you may end up wasting people's time. Also I took a look at the TSV and I see several typos in the English glosses ("ostritch", "complian", "idiomatice", ...), which you will have to deal with; I suspect similar typos in the Arabic and IPA fields. Also text like "It_is_an_idiomatic_expression_that_means_that_sb_does_not_want_to_tell_his_destination" cannot be dumped as-is but needs to be templatized, etc. Benwing2 (talk) 08:39, 23 December 2022 (UTC)[reply]
@Benwing2 yes sure, we shouldn't dump in the mainspace directly. I agree with your approach. Shahd and Christian told me they focused on Arabic and pronunciation, where they think there are no typos, whereas the English glosses have some as you noticed. Anyway, there's some work to do! A455bcd9 (talk) 08:42, 23 December 2022 (UTC)[reply]
Also: what's the easiest way to check whether a word is already in Wiktionary for ajp? Dump the whole ajp category and check its content? Or check the HTML of each entry/form and look for ajp in it? A455bcd9 (talk) 08:45, 23 December 2022 (UTC)[reply]
@A455bcd9 Try [8]. Good luck! This, that and the other (talk) 11:50, 23 December 2022 (UTC)[reply]
@This, that and the other: thanks a lot! A455bcd9 (talk) 12:17, 23 December 2022 (UTC)[reply]
If nothing else, we could dump it to subpages of Appendix:Palestinian Arabic or Appendix:Maknuune dictionary of Arabic or something (and then move entries to mainspace as they're checked), but given the scholarly quality of the overall project, even with typos (which we and Webster also have), I say we take it while the data and the people willing to do the legwork of importing it are available, lest it later go down or switch to a more restrictive licence, and then check it at leisure. - -sche (discuss) 17:17, 23 December 2022 (UTC)[reply]
perfect is the enemy of good... A455bcd9 (talk) 21:34, 23 December 2022 (UTC)[reply]
Exactly. - -sche (discuss) 07:42, 31 December 2022 (UTC)[reply]
Hi all, I'm Christian, one of the authors of Maknuune. Thanks for your collective interest in expanding this project, and your willingness to integrate it into this platform! I am willing to help with the integration process, but I need to know the following thing first: is it possible to get a dump of the wiki code that is used to generate the individual entries (for all entries) in an easy way (i.e., without having to web scrape)? I would like to be able to re-integrate Wiki changes to our own dictionary. Can someone please advise on this? Thanks! Christioscay (talk) 18:44, 4 January 2023 (UTC)[reply]
@Christioscay: welcome! Thank you for all of the work you put into creating this dictionary and for your willingness to share and integrate it here.
You can get a dump of all pages here. It's a huge compressed XML file and I'm not sure it will be particularly helpful unless you just want to see how all of the existing entries are formatted. Don't worry about the specifics of merging the SLA into pages that may already exist; there are people here with experience doing that who can help when the data is ready to be merged. Instead, it's much more important to ensure that the data to be merged fit with our existing style and, most importantly, make extensive use of our templates wherever possible.
See the pages مداس and أكزخانة that User: A455bcd9 created for examples of well-formatted SLA entries that make good use of the templates. As mentioned in the discussion above, it's very important to use our templates wherever possible for etymology, part of speech headlines, usage examples, and so on. This will help categorize everything and ensure that the entries fit with our formatting. If you have questions about which templates to use or how/when/where to use them, this is a good place to ask. Good luck! JeffDoozan (talk) 20:29, 4 January 2023 (UTC)[reply]
Thanks for the warm welcome! This is exactly what I need, I need to have easy access to all entry codes (i.e., formatting) and be able to identify Maknuune entries from the dump easily (using some label) in order to salvage useful edits that might be made by users on Wiktionary and integrate them back into the original Maknuune for future releases. I think I have found a way to do it as you said using the dump in conjunction with a Python library (Wiktextract). Converting from our TSV format to the Wikicode format should not be an issue, although I foresee some double counting instances especially due to different spelling conventions, but if you say that people have the expertise to deal with that here, then this is good. Christioscay (talk) 07:34, 5 January 2023 (UTC)[reply]
@Christioscay It should be possible to include a template in the page text of all entries originating with Maknuune, so that it's possible to automatically identify such entries. Benwing2 (talk) 01:38, 7 January 2023 (UTC)[reply]
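A minimal sketch of how such a marker template could be used to pull Maknuune-derived pages out of the XML dump. It assumes the standard MediaWiki export layout (page, title and text elements); the marker name {{hypothetical-maknuune-marker}} and the sample dump are invented for illustration, since no actual template has been chosen in this discussion.

```python
import io
import xml.etree.ElementTree as ET

# Hypothetical marker; the real template name has not been decided.
MARKER = "{{hypothetical-maknuune-marker}}"

# Tiny stand-in for the real dump, which is a very large compressed file.
SAMPLE = b"""<mediawiki xmlns="http://www.mediawiki.org/xml/export-0.10/">
<page><title>A</title><revision><text>==South Levantine Arabic==
{{hypothetical-maknuune-marker}}</text></revision></page>
<page><title>B</title><revision><text>==English==</text></revision></page>
</mediawiki>"""


def local_name(tag):
    """Strip the XML namespace, e.g. '{ns}title' -> 'title'."""
    return tag.rsplit("}", 1)[-1]


def pages_with_marker(dump_file, marker=MARKER):
    """Return titles of pages whose wikitext contains the marker."""
    title, hits = None, []
    # iterparse streams the file, so the whole dump is never held in memory.
    for _event, elem in ET.iterparse(dump_file, events=("end",)):
        name = local_name(elem.tag)
        if name == "title":
            title = elem.text
        elif name == "text":
            if marker in (elem.text or ""):
                hits.append(title)
        elif name == "page":
            elem.clear()  # free the finished page
    return hits


print(pages_with_marker(io.BytesIO(SAMPLE)))
```

For the real dump, a file object from bz2.open(path, "rb") could be passed in place of the BytesIO sample.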
@A455bcd9 Apologies if I came across as prickly in my responses to you; now that I reread it I realize I may not have been the most welcoming. My original assumption was that you did not understand the complexities of such a project; this was based on the fact that most new users tend to think of Wiktionary as more like Wikipedia and don't realize that the Wikicode here is far more structured. Please feel free to ping me if you need someone to review auto-created candidate entries. Benwing2 (talk) 01:43, 7 January 2023 (UTC)[reply]
@Benwing2: no worries at all! I'll try to find some time to play a bit with pwb this weekend ( https://github.com/a455bcd9/wikiMaknuune , nothing done yet...). Thanks for offering to help. I'll ping you if needed. Best, A455bcd9 (talk) 08:10, 7 January 2023 (UTC)[reply]
@Christioscay: pwb may be useful for you as well (to read ajp entries directly in wikicode). A455bcd9 (talk) 08:14, 7 January 2023 (UTC)[reply]
@Benwing2 @Christioscay @Fenakhay, I created the first 10 entries under my user page (list here) using this script. I focused on "simple" entries: singular nouns derived from a Semitic root with only one gloss.
The only issue is the transliteration scheme: we should use "ours". How can we easily convert Buckwalter (BW) to ours? Once done, our template will automatically display the correct IPA pronunciation as well. A455bcd9 (talk) 15:05, 8 January 2023 (UTC)[reply]
@A455bcd9: here is the mapping. The transliteration used in Maknuune is the BW (Buckwalter) transliteration. For the transcription, we are using CAPHI++ (a slight variant on CAPHI; you can check out the paper for more about the ++ part: https://arxiv.org/pdf/2210.12985.pdf). Christioscay (talk) 15:14, 8 January 2023 (UTC)[reply]
Thanks @Christioscay. Our format is actually a transcription, not a transliteration so it's a bit more complex. Any idea @Fenakhay on how to do that? A455bcd9 (talk) 15:45, 8 January 2023 (UTC)[reply]
Here are the correspondences between our format and BW. Unsolved issues: vowels and learned borrowings. A455bcd9 (talk) 21:02, 8 January 2023 (UTC)[reply]
Actually, using CAPHI++ instead should solve the above problems. I'll try to build the function f(CAPHI++) -> WiktionaryTranscription tomorrow. A455bcd9 (talk) 21:21, 8 January 2023 (UTC)[reply]
Transliteration (using our format) and IPA pronunciation done on the random entries: feedback welcome! A455bcd9 (talk) 17:59, 9 January 2023 (UTC)[reply]
@Assem Khidhr @Fay Freak you might be interested in this related discussion (although I've paused my efforts because of nonwiki work...). A455bcd9 (talk) 14:06, 30 January 2023 (UTC)[reply]

How come the Vietnamese translation took a walk on the wild side, and is now sitting inside the translation table title, invisible to editing?
Update + crazy: I'd visited this one approx. a month ago and there was no such thing; something is wreaking havoc out there, and most likely there are more such pages. Can an admin find out whether it is a bot or whatnot? Flāvidus (talk) 16:00, 21 December 2022 (UTC)[reply]

@Flāvidus: I don’t see interference inside translation table titles. Looks like rendering error on your system. Fay Freak (talk) 20:43, 21 December 2022 (UTC)[reply]
If you click "Select preferred languages" at the top of a Translation section and then star "Vietnamese", it will show Vietnamese translations in the titlebar of the Translations table. Click it again and unstar "Vietnamese" if you don't want to see it in the titlebar. JeffDoozan (talk) 21:23, 21 December 2022 (UTC)[reply]
@JeffDoozan: I don’t see anything particular to Vietnamese. All preferred languages sit in the title bar and are not editable there. (I have never used the “preferred languages” before either.) Fay Freak (talk) 22:33, 21 December 2022 (UTC)[reply]
@Fay Freak: @JeffDoozan: Thank you all for responding. I took your time with my faux pas. Yes, starring "Vietnamese" in preferred languages caused the change on my system. Flāvidus (talk) 13:19, 22 December 2022 (UTC)[reply]

Prefix μ- not working

See μm: “From {{prefix|mul|μ|t1=micro-|m|t2=metre}}.” Result: “From μ (“micro-”) + m (“metre”).” Category: Category:Translingual terms prefixed with μ. It worked previously and the correct category (Category:Translingual terms prefixed with μ-, which I created) was deleted today by @Benwing2 because it became empty. μ (correctly) does not have the “micro-” sense: only μ- does. J3133 (talk) 09:07, 24 December 2022 (UTC)[reply]

Did a bit of mw.log debugging in Module:compound, and this line incorrectly identifies the script of the prefix μ as Hani, because of this if statement in Module:scripts/findBestScript. I gather from the edit summary on the previous edit that it fixes something with regard to identifying Han script but don't really understand how, so User:theknightwho, who added it, will probably have to explain and help find a solution. Adding |sc1=Grek to the template fixes the problem, but ideally we wouldn't have Greek script identified as Han in translingual templates. — Eru·tuon 01:53, 25 December 2022 (UTC)[reply]
@Erutuon @J3133 The issue's now fixed.
To give a quick summary of why that if-statement is there:
  • By default, findBestScript will pick the script matching the greatest number of characters in the given string.
  • This is a problem for Chinese, because Hani (the general code) will always match an equal or greater number of characters to Hant (traditional) or Hans (simplified).
  • With most situations like this, the function will choose the first script in the event of a tie, so it's just a matter of ordering them correctly in the languages data module. However, the characters in Hant and Hans are determined on a special-case basis by Module:zh/data/ts and Module:zh/data/st, whereas Hani just covers the entire CJK set en masse. The reason for doing it this way is mostly because those tables are edited frequently (which prevents duplication), but also due to the very large number of characters (~5,000), which makes the usual way of specifying characters totally impractical.
  • Importantly, these tables do not contain characters that don't need to be converted (i.e. characters used in both traditional and simplified).
  • That means that there are situations which would cause Hani to match more characters than either Hant or Hans, even though it is actually possible to determine one way or the other. For example a term with a traditional character + an either-way character.
  • That if-statement activates a subroutine if findBestScript is currently checking Hani, and causes it to output bestscript (if it exists), or otherwise Hani. Because Hant and Hans are checked before Hani (for Chinese), bestscript will exist if and only if either of them matched at least one character. If that's the case, that script is output, not Hani - so in the example situation above, we'd get Hant.
  • As such, Hani is only output for Chinese if no characters are exclusively traditional or simplified.
  • However, this is only supposed to happen if Hani itself matches at least one character.
The reason we were getting this error looks to be down to a typo on my part. It was reaching Hani in the list, and then activating the subroutine because it had matched at least 0 characters (i.e. it was activating every time). I've forgotten which, but it was almost certainly supposed to read >= 1 or > 0, because otherwise it's a totally pointless check.
I've now corrected this, so the prefix is being categorised correctly once again. Theknightwho (talk) 03:07, 25 December 2022 (UTC)[reply]
As an addendum: I've made a small modification to Module:scripts to ensure that Hani is always the last script to be checked, which is necessary to ensure the above logic works correctly. Theknightwho (talk) 03:54, 25 December 2022 (UTC)[reply]
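The logic described above can be sketched in Python as follows. This is only an illustration, not the actual Lua in Module:scripts: the function name, the buggy flag and the tiny character sets are invented stand-ins for the real data modules. The buggy flag reproduces the ">= 0" typo, and the second script list puts Hani before Grek, which is the ordering under which the typo would return Hani for μ.

```python
# Toy character sets standing in for the real data tables (assumptions).
HANT = {"語"}              # traditional-only
HANS = {"语"}              # simplified-only
HANI = {"語", "语", "人"}  # the whole CJK set; 人 is used in both
GREK = {"μ"}


def find_best_script(text, scripts, buggy=False):
    """Pick the script matching the most characters; first wins a tie.

    scripts is an ordered list of (code, characters) pairs.
    """
    best_code, best_count = None, 0
    for code, chars in scripts:
        count = sum(1 for ch in text if ch in chars)
        if code == "Hani":
            # The typo made this threshold 0, so the branch always fired.
            threshold = 0 if buggy else 1
            if count >= threshold:
                # Prefer an earlier Hant/Hans match, otherwise Hani.
                return best_code or "Hani"
        if count > best_count:
            best_code, best_count = code, count
    return best_code


CHINESE = [("Hant", HANT), ("Hans", HANS), ("Hani", HANI)]
HANI_FIRST = [("Hani", HANI), ("Grek", GREK)]

print(find_best_script("語人", CHINESE))               # Hant
print(find_best_script("人", CHINESE))                 # Hani
print(find_best_script("μ", HANI_FIRST, buggy=True))   # Hani (the bug)
print(find_best_script("μ", HANI_FIRST))               # Grek
```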

Invalid ISSN on Gerstmanngate

Can someone figure out why this ISSN is coming up as invalid? It seems to be right, as it's the ISSN given in the periodical itself (pg. 2, toward the bottom), and the WorldCat link works. If all else fails, replacing it with |oclc=311055794 would be an acceptable alternative. 70.172.194.25 22:53, 24 December 2022 (UTC)[reply]

I think I figured this out. The real ISSN is probably 0140-0711, and both the paper itself and WorldCat are probably wrong to give it as 1040-0711. 70.172.194.25 00:28, 25 December 2022 (UTC)[reply]
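The two candidates can be checked with the standard ISSN check-digit rule: the first seven digits are weighted 8 down to 2, the weighted sum is taken modulo 11, and the check digit is 11 minus that remainder (written X when it is 10, and 0 when the remainder is 0). A minimal Python sketch; the function names are just illustrative.

```python
def issn_check_digit(first_seven):
    """Compute the ISSN check character from the first seven digits."""
    weights = range(8, 1, -1)  # 8, 7, 6, 5, 4, 3, 2
    total = sum(int(d) * w for d, w in zip(first_seven, weights))
    check = (11 - total % 11) % 11
    return "X" if check == 10 else str(check)


def is_valid_issn(issn):
    """True if the eighth character matches the computed check digit."""
    chars = issn.replace("-", "").upper()
    if len(chars) != 8 or not chars[:7].isdigit():
        return False
    return issn_check_digit(chars[:7]) == chars[7]


print(is_valid_issn("0140-0711"))  # True
print(is_valid_issn("1040-0711"))  # False
```

For 0140-071 the weighted sum is 54, giving check digit 1; for 1040-071 it is 55, giving check digit 0, so 1040-0711 fails.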

Fula regional terms

Can somebody explain why Fula terms with the template (Adamawa) added in the definition are added to a category called Adamawa Fulfulde, yet none of the other regional varieties of Fula have a page, and the comparable templates (e.g. (Pular), (Maasina Fulfulde)) do not link to anything? Adamawa (ISO=fuc) is the only dialect of Fula that appears under the subcategory Regional Fula. I am not experienced enough to understand how data modules really work, but this information should be edited somewhere to include the other eight major dialects of Fula. Can anybody elucidate how this works or help me make the necessary edit? Thanks!

P.S. The other major dialects and their ISO codes are: Pulaar - fuc; Pular - fuf; Maasina Fulfulde - ffm; Borgu Fulfulde - fue; Western Niger Fulfulde - fuh; Central-Eastern Niger Fulfulde - fuv; Nigerian Fulfulde - fub; and Bagirmi Fulfulde - fui. Hk5183 (talk) 22:06, 28 December 2022 (UTC)[reply]

@Hk5183: You can add the missing dialects to Module:labels/data/lang/ff. — Fenakhay (حيطي · مساهماتي) 22:07, 28 December 2022 (UTC)[reply]
Thank you very much! Hk5183 (talk) 22:43, 28 December 2022 (UTC)[reply]

Removing automatic cuneiform from Akkadian inflection tables.

Can anyone who has the technical knowledge please remove all automatically created cuneiform from appearing in Akkadian inflection tables? We use normalised forms in the Latin alphabet for Akkadian entries. We only use cuneiform in the Cuneiform spellings section, or when linking, mentioning or giving Akkadian words in Etymologies, but after making sure the spelling is attested or at least plausible. Cuneiform spellings in Akkadian cannot be automatised for the very nature of the script itself. — Sartma 𒁾𒁉𒊭 𒌑𒊑𒀉𒁲 21:29, 29 December 2022 (UTC)[reply]

@Erutuon, @Benwing2: Can you help with this? I wouldn't know where to start... — Sartma 𒁾𒁉𒊭 𒌑𒊑𒀉𒁲 23:37, 29 December 2022 (UTC)[reply]
@Sartma Can you clarify what you are referring to and how you propose to replace it? I looked at the random verb ḫalāqum and its conjugation table; I assume the cuneiform in there is auto-generated. Are you suggesting removing the auto-generation of cuneiform forms from inflection tables? If so, presumably the module needs to have a way of manually specifying the cuneiform when it is attested? Can any other Akkadian editors chime in? Benwing2 (talk) 03:44, 30 December 2022 (UTC)[reply]
@Benwing2: Hi! Sorry for the lack of info. Yes, I'm referring to all templates that generate inflection tables for Akkadian, so {{akk-conj}}, {{akk-decl-noun-m}} (like in šarrum), and the like. All auto-generated cuneiform should be removed, and only the Latin romanization left (at the moment there are both). It was introduced by an individual user together with the templates (that besides this have other issues of their own and tbh should not be used anyway...) without any previous discussion or agreement. It's not a good idea to add cuneiform to the tables to begin with. It's already not a priority to give any cuneiform spelling at all, since Akkadian lemmas are given in the Latin alphabet anyway, and there are very few people who would be able to find attested inflected forms anyway (especially for verbs), but because of how cuneiform spelling works (more often than not there will be more than one possible/attested spelling), it would be impossible to manage this inside an inflection table. Unfortunately I'm the only currently active Akkadian editor. — Sartma 𒁾𒁉𒊭 𒌑𒊑𒀉𒁲 10:00, 30 December 2022 (UTC)[reply]
@Benwing2: Hi! Sorry for the lack of info. I'm referring to all Akkadian inflection tables ({{akk-conj}}, {{akk-decl-noun-m}}, etc.). The cuneiform should not be replaced, but removed. We give Akkadian lemmas in their Latin Romanisation, and those are the only forms that should appear in inflection templates. We already give attested cuneiform spellings in the Cuneiform spellings section. It doesn't make much sense to make inflection tables about "spelling", instead of keeping them focused on inflections. Moreover, if you have a look at šarrum, for instance, you can easily see how it would be completely impractical to give all attested spellings in the inflection table. Just considering the logograms would mean you should give at least 7 cuneiform spellings for each inflected form (logograms don't show inflections, but are nonetheless used for any inflected form), to which one should then add the attested phonetic spellings with case endings (and keep in mind there might be more than those we give, since being exhaustive of all cuneiform spellings would be far too time-consuming and it's not a priority anyway). Akkadian inflection tables have been created and added by an individual user without any previous discussion or agreement on what they should contain or what they should look like; they are still in "beta" (incomplete) and have quite a few issues of their own. I'm always tempted to delete them from entries, but I guess that something (even if incomplete and possibly wrong) is better than nothing. Unfortunately I'm your only currently active Akkadian editor... — Sartma 𒁾𒁉𒊭 𒌑𒊑𒀉𒁲 11:44, 30 December 2022 (UTC)[reply]
@Benwing2: Sorry for the double-post, I had issues with refreshing the page and thought the first didn't go through. — Sartma 𒁾𒁉𒊭 𒌑𒊑𒀉𒁲 22:29, 30 December 2022 (UTC)[reply]

Spanish "entonces" shouldn't be in Category:Spanish 1-syllable words

The Spanish word "entonces" shouldn't be included in Category:Spanish 1-syllable words as it currently is (it has 3 syllables). 37.11.122.76 02:15, 30 December 2022 (UTC)[reply]

The standard pronunciation is 3 syllables, but we also give reduced pronunciations that are monosyllabic, e.g., [tõ̞ns], or bisyllabic, e.g., [ˈtõ̞nse̞s]. I can see why this would be confusing, though. 70.172.194.25 02:20, 30 December 2022 (UTC)[reply]
IP 37.*: This is a bug; I will fix it. Benwing2 (talk) 03:39, 30 December 2022 (UTC)[reply]

translation tables are now one column

@This, that and the other, Ruakh, Erutuon All translation tables are now showing up as one column for me as of the new code pushed a day or two ago. The new code seems to have worked before; I specifically remember reviewing free and dictionary during the testing phase and seeing two columns. So something must have changed between then and now. I am using Chrome 67.0.3396.87 on Mac OS X 10.9 (sorry my OS is way out of date) on a 13-inch MacBook Pro with the browser window zoomed all the way out. Benwing2 (talk) 05:12, 31 December 2022 (UTC)[reply]

Update: If I make the font slightly smaller by pressing Command-minus one time, it reverts to two columns but now the font is a bit too small to read comfortably. Benwing2 (talk) 05:15, 31 December 2022 (UTC)[reply]
It’s still working OK for me viewed using Safari on an iPad, but I have my CSS set with a column width of 20em so that I see two columns. — Sgconlaw (talk) 05:18, 31 December 2022 (UTC)[reply]
FWIW, under Preferences I have the font size set to "Medium (Recommended)" and the Zoom to 100%. My fonts are Standard font Times, Serif font Times, Sans-serif font Helvetica, Fixed-width font Courier and I am using the old Vector skin (2010). I don't recall ever tweaking the font settings. Benwing2 (talk) 05:18, 31 December 2022 (UTC)[reply]
I guess my point is that we should not require users to customize their settings to get a good appearance; things should look right out of the box. Is there a min-width setting determining how many columns are shown that maybe can be tweaked a bit? Benwing2 (talk) 05:20, 31 December 2022 (UTC)[reply]
I agree. Like Sgconlaw, I had to set column width to 20em in my css to get the tables to display well. I believe the concern with setting the default to 20em was that extremely long words will display badly? (For example, if you change the Afrikaans word for dictionary from woordeboek to abcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyzabcdefghijklm, when two columns are displayed with a column width of 20em the sixty-fifth and final letter m is smushed into the text in the next column.) But that seems like a rare tail wagging a large dog, to make the display bad for almost all entries to accommodate the handful which have unspaced words that long. Perhaps we should either manually look for words that long, or have Module:translations automatically notice unspaced words that long, and add alt= display forms with soft hyphens every twenty letters or so...? - -sche (discuss) 07:38, 31 December 2022 (UTC)[reply]
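A rough sketch of the soft-hyphen idea, in Python for illustration only (Module:translations itself is Lua, and the function name here is made up). It inserts U+00AD every twenty letters into any unspaced word longer than that, which is the kind of string that could be supplied as an alt= display form.

```python
SOFT_HYPHEN = "\u00ad"  # invisible unless the browser breaks the line there


def add_soft_hyphens(term, every=20):
    """Insert a soft hyphen after every `every` letters of long words."""
    words = []
    for word in term.split(" "):
        if len(word) > every:
            chunks = [word[i:i + every] for i in range(0, len(word), every)]
            word = SOFT_HYPHEN.join(chunks)
        words.append(word)
    return " ".join(words)


long_word = "abcdefghijklmnopqrstuvwxyz" * 2 + "abcdefghijklm"  # 65 letters
display = add_soft_hyphens(long_word)
print(display.count(SOFT_HYPHEN))      # 3
print(add_soft_hyphens("woordeboek"))  # unchanged
```

Fixed twenty-letter chunks ignore syllable boundaries, so the break points may look odd; this only keeps the text from overflowing the column.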
What skin are you using? On my wider screen (17 inches or so) the Timeless skin in the entry for dictionary doesn't provide enough width for there to be more than one column, but the other desktop skins (Minerva, Monobook, Vector legacy) do. The mobile site doesn't have necessary CSS rules, so it doesn't have columns.
I'm surprised that the Vector 2022 skin (which I currently use) has enough space for two columns because <div class="mw-content-container"> is only up to 60em wide and the translation columns are set to a maximum of 30em wide with a 20px gap between them. I'd've imagined 2 × 30em + 20px was greater than 60em, but according to Firefox 107 it's less than or equal to 54em. When I select mw-content-container in inspect element, I have to set the max-width to less than 54em in the CSS window to get one column. — Eru·tuon 20:22, 31 December 2022 (UTC)[reply]
@Erutuon I suppose this is because translation tables use a smaller font than the rest of the page, and the em unit is relative to the font size of the element.
@Benwing2 I tend to agree that we should make the default column width smaller. 20em may be a little small - some translation tables start to look a bit cramped at this size: https://imgur.com/a/ZUnQF4d How about, say, 24em or 25em as a compromise?
@-sche you can use {{trans-top|column-width=wide}} on these exceptional pages. This, that and the other (talk) 09:18, 1 January 2023 (UTC)[reply]
@This, that and the other I did some experimentation. On my 13-inch laptop, any setting <= 28em results in two columns. >= 29em results in one column. So 24em sounds fine to me. Benwing2 (talk) 09:32, 1 January 2023 (UTC)[reply]
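A cut-off like this is what CSS multi-column layout produces when only column-width is set: the pseudo-algorithm in the CSS Multi-column Layout spec gives floor((available width + gap) / (column width + gap)) columns, with a minimum of one. A minimal Python sketch; the 820px available width and 14px font size are made-up values chosen only so that the boundary falls between 28em and 29em, not measurements from any actual setup.

```python
def column_count(available_px, column_width_em, em_px, gap_px=20.0):
    """Columns used when column-width is set and column-count is auto."""
    column_px = column_width_em * em_px
    return max(1, int((available_px + gap_px) // (column_px + gap_px)))


# Hypothetical laptop-like numbers (assumptions, see above).
AVAILABLE_PX = 820.0
EM_PX = 14.0

for width_em in (24, 25, 28, 29, 30):
    print(width_em, column_count(AVAILABLE_PX, width_em, EM_PX))
```

With these numbers, widths up to 28em give two columns and 29em or more give one, so a default of 24em or 25em leaves some margin.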
@Benwing2 thanks, this is useful info. Although I don't have one myself, this laptop configuration is quite familiar to me so I understand the need for adjustments.
@Erutuon as an interface-admin, please feel free to make the necessary change to MediaWiki:common.css. Personally I lean more towards 25em, to better avoid a cramped appearance. This, that and the other (talk) 09:40, 1 January 2023 (UTC)[reply]
I've made the change, though it's a bit odd now because the narrow column width is 22em, only slightly narrower than the default, 25em. — Eru·tuon 16:18, 1 January 2023 (UTC)[reply]
@Erutuon With the new widths, on one of my devices, normal translation boxes have 4 columns but narrow ones have 5, while on another device, 3 columns increase to 4. So there is still a benefit in maintaining the distinction. This, that and the other (talk) 11:07, 2 January 2023 (UTC)[reply]

Misdivision of syllables in Korean

This is pretty small, but it was bothering me. On the page 법#Suffix, we have the syllables parsed as sa.yon.gbeop instead of what I assume it should be, sa.yong.beop. Can this be easily fixed? Is it because the ng digraph confuses the template into thinking that it's actually a cluster of /n/ and /g/? Thank you, Soap 16:54, 31 December 2022 (UTC)[reply]

@Soap It shouldn't be hard to fix this provided that gb isn't a valid onset in Korean. (This is based on writing the syllabification algorithms for many languages here.) However, I'm not familiar with the code that generates the syllabification for Korean; maybe User:Tibidibi or User:Fish bowl can comment. Benwing2 (talk) 01:46, 7 January 2023 (UTC)[reply]
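One way to sidestep the ng ambiguity, sketched here in Python, is to take syllable boundaries from the Hangul blocks themselves rather than from the romanised string. This is not how the existing Korean module works (that code is not shown in this thread); it is a simplified illustration that decomposes each precomposed syllable by Unicode arithmetic, romanises each block in isolation, and ignores sound changes across syllable boundaries.

```python
# Jamo in Unicode order, romanised per Revised Romanization (finals as
# pronounced in isolation).
LEADS = ["g", "kk", "n", "d", "tt", "r", "m", "b", "pp", "s",
         "ss", "", "j", "jj", "ch", "k", "t", "p", "h"]
VOWELS = ["a", "ae", "ya", "yae", "eo", "e", "yeo", "ye", "o", "wa", "wae",
          "oe", "yo", "u", "wo", "we", "wi", "yu", "eu", "ui", "i"]
TAILS = ["", "k", "k", "k", "n", "n", "n", "t", "l", "k", "m", "l", "l", "l",
         "p", "l", "m", "p", "p", "t", "t", "ng", "t", "t", "k", "t", "p", "t"]

HANGUL_BASE = 0xAC00  # first precomposed syllable
HANGUL_LAST = 0xD7A3  # last precomposed syllable


def romanise_block(ch):
    """Romanise one precomposed Hangul syllable on its own."""
    index = ord(ch) - HANGUL_BASE
    lead, rest = divmod(index, 21 * 28)
    vowel, tail = divmod(rest, 28)
    return LEADS[lead] + VOWELS[vowel] + TAILS[tail]


def syllabify(word, sep="."):
    """Join per-block romanisations so each boundary follows the Hangul."""
    parts = []
    for ch in word:
        if HANGUL_BASE <= ord(ch) <= HANGUL_LAST:
            parts.append(romanise_block(ch))
        else:
            parts.append(ch)  # leave anything else untouched
    return sep.join(parts)


print(syllabify("사용법"))  # sa.yong.beop
```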

Labels used in wrong L2s

I searched a database dump for pages with {{lb|en| but no ==English==, and fixed them; then fixed {{lb|mul| with no ==Translingual==, then {{lb|de| with no ==German==. But it's impractical to do this one language at a time (and misses pages which have the right L2 but have the T:lb in a different L2). Anyone want to write a script to check for uses of each language code in a label outside its corresponding language-name section, or if that's too difficult, simply: on a page that doesn't contain the relevant L2? Simply automatically changing the codes to match the L2 would be correct in almost all cases, though in a few cases the {{lb}} should be {{q}} instead.
Another issue I spotted while doing this is pages with two back-to-back {{lb}}s which it might also be good to search for and fix. - -sche (discuss) 21:53, 31 December 2022 (UTC)[reply]

@-sche sounds like you want Wiktionary:Todo/Template language code doesn't match header. I agree that this can largely be automated, although manual supervision is needed - exactly the kind of task AutoWikiBrowser is built for. I have a Python script designed to be used in conjunction with AutoWikiBrowser that I'm willing to share with anyone who wishes to run it. This, that and the other (talk) 04:56, 2 January 2023 (UTC)[reply]
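For anyone wanting to try the simpler page-level check on raw wikitext, here is a minimal Python sketch. It is not This, that and the other's script; the function name is invented, and the code-to-name table holds only the three languages mentioned above, whereas a real run would need the full language data.

```python
import re

# Only the codes mentioned in this thread; a real run needs the full list.
CODE_TO_NAME = {"en": "English", "de": "German", "mul": "Translingual"}

# Level-2 headers only: "==English==" matches, "===Noun===" does not.
L2_RE = re.compile(r"^==([^=\n]+)==[ \t]*$", re.MULTILINE)
LB_RE = re.compile(r"\{\{lb\|([^|}]+)\|")


def mismatched_labels(wikitext):
    """Return (section name, label code) pairs where the two disagree."""
    results = []
    headers = list(L2_RE.finditer(wikitext))
    for i, header in enumerate(headers):
        name = header.group(1).strip()
        end = headers[i + 1].start() if i + 1 < len(headers) else len(wikitext)
        body = wikitext[header.end():end]
        for match in LB_RE.finditer(body):
            code = match.group(1).strip()
            expected = CODE_TO_NAME.get(code)
            # Unknown codes are skipped rather than reported.
            if expected is not None and expected != name:
                results.append((name, code))
    return results


page = (
    "==English==\n===Noun===\n# {{lb|en|slang}} foo\n"
    "==German==\n===Noun===\n# {{lb|en|informal}} bar\n"
)
print(mismatched_labels(page))  # [('German', 'en')]
```

The output is only a list of candidates: as noted above, some hits should become {{q}} rather than have the code changed, so manual review is still needed.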