Wiktionary:Beer parlour/2023/February

FYI: January 2023 Unicode newsletter

https://mailchi.mp/f8faa6f0371c/unicode-in-6222562 Justin (koavf)TCM 20:37, 1 February 2023 (UTC)

Bolding of years in reference templates

Considering we had a whole discussion about whether or not to end reference templates with a full stop, which is arguably a much more subtle matter, I want to pose another stylistic question: should years in reference templates be in bold, like they are for quotation templates? I personally don't think so. It looks out of place and there's not really any reason to highlight the year of publication instead of the author's name, the title of the book, etc. For quotations it makes sense to present the year in bold because it's the very first piece of information presented, and because at a glance it helps to show when a term was in use.

For examples of reference templates that do this, see T:R:alv, T:R:lt:Safarewicz1967, and this search (I can't guarantee there are no false positives or false negatives). There are also non-template-based hardcoded references that put the year in bold, as on čoms, and see this search. 70.172.194.25 22:58, 1 February 2023 (UTC)

I agree—bold years are great for citations/quotes but seem much less useful for reference list entries. In a similar vein, I find it especially pointless how w:Template:cite journal sets the volume number in bold. A typical example of how it looks in a ref list is at w:Radium#References. But I defer if anyone knows any great reasons for such bold that I am ignorant of. Quercus solaris (talk) 08:04, 2 February 2023 (UTC)
Putting a volume number in bold is standard practice in some citation styles. —Justin (koavf)TCM 08:27, 2 February 2023 (UTC)
I agree, I think it looks weird. Vininn126 (talk) 08:22, 2 February 2023 (UTC)
I think the rationale for having the year in bold is to show usage over time, which I can see some value in. —Justin (koavf)TCM 08:27, 2 February 2023 (UTC)
For quotations, yes, not for references. —Al-Muqanna المقنع (talk) 08:55, 2 February 2023 (UTC)
Agree; not seeing much point in references. — Sgconlaw (talk) 17:59, 2 February 2023 (UTC)
I can see an argument made for consistency, but I do agree: citations bolded, references not. brittletheories (talk) 10:31, 3 February 2023 (UTC)

Too big

Wiktionary includes many rare and obscure words, which is great, but gets in the way of fulfilling the function of a concise or learner's dictionary, where you would want to learn common words first, and to know which words are recognized by most native speakers. Would there be a way to list only the top N thousand words, some kind of category or appendix? Drapetomanic (talk) 03:10, 2 February 2023 (UTC)

@Drapetomanic See Category:Basic word lists by language. Coverage is a bit uneven but it's a good place to start. Benwing2 (talk) 04:00, 2 February 2023 (UTC)
Thank you, this is awesome Technicalrestrictions01 (talk) 08:50, 3 February 2023 (UTC)
@Benwing2 I have actually never seen this before. Should we promote it on the front page, as we do with appendices and frequency lists? Or would it be better to incorporate this material into WT:FREQ? brittletheories (talk) 10:34, 3 February 2023 (UTC)
@Brittletheories If you can incorporate it into WT:FREQ that would be great. Benwing2 (talk) 20:14, 3 February 2023 (UTC)

Removing horizontal rule ---- between language sections

Also brought up in October 2005, February 2006, June 2011, May 2013 and likely elsewhere as well.

Though it's been here for 20 years, it's maybe time to rethink whether we really need it. One often sees it either missing in places where it should be or extraneous at the end of an entry: it is overall confusing for new editors and even experienced editors can occasionally slip up. It serves no purpose, besides being aesthetically pleasing to some, and it's arguably a misuse of wikitext syntax. We could start a formal vote. Catonif (talk) 15:31, 2 February 2023 (UTC)

I like it. People often screw up indentation levels, but this separator makes it much harder to do so in a way that would merge two languages. Useful for bots too. Equinox 15:55, 2 February 2023 (UTC)
(As always, the elephant in the room is the refusal [or rather technical difficulty] of using/introducing a real markup language like XML. I suppose someone, by this point, must have written a Wikt-to-XML converter, but it must be an absolute piece of horror. On the other hand, it's totally unreasonable to expect users, even power users, to write XML entries: anyone who has written commercial code since 1990 will have enjoyed the whole "&amp;" thing, where one side does or doesn't decode or encode, or does it twice. But wikitext is shit and we know it, and never mind the Band-Aids.) Equinox 02:35, 3 February 2023 (UTC)
Has value in dump-processing and regex-type searches. DCDuring (talk) 02:14, 3 February 2023 (UTC)
Could you describe how you use the horizontal rules? I generally ignore horizontal rules and just use level-2 headers to find language sections because the presence of horizontal rules can't be relied on and I often want to know the language name anyway. — Eru·tuon 20:23, 3 February 2023 (UTC)
I agree with Equinox, it makes for a helpful division in the wikicode (and displayed page). I don't see removing it as offering any benefit. There are a lot of areas where our content is uncompact and wastes horizontal and vertical space, but I don't see the tiny amount taken up by this line as a problem. - -sche (discuss) 07:04, 3 February 2023 (UTC)
Yes, I don’t see a compelling reason for doing away with it. — Sgconlaw (talk) 07:09, 3 February 2023 (UTC)
Same. Whoop whoop pull up Bitching Betty ⚧️ Averted crashes 12:34, 3 February 2023 (UTC)
The problem with horizontal rules isn't visual. We can generate the same lines with CSS if horizontal rules are removed. The problem is that they take up space in the wikitext and they don't give any extra information, and code has to be written so that it works whether they are there or not. — Eru·tuon 20:23, 3 February 2023 (UTC)

I find the horizontal rule above and below the language name confusing. I would prefer to do away with the rule below the language name. --RichardW57m (talk) 12:57, 3 February 2023 (UTC)

I wonder whether there might be custom CSS or JS that would address this. DCDuring (talk) 15:02, 3 February 2023 (UTC)

About bots and regex searches, wouldn't /^==[^=]/m or something of the like work as well? I can see how it might help in reading the wikitext, though I wonder whether syntax highlighting could also do the job (making L2 headers a particular colour). Also, we should be thinking whether it is necessary enough to be kept, rather than bad enough to be removed. It is undeniably confusing for new users (there are many things much more confusing here, yes, but those are essential) and seems overall redundant. Catonif (talk) 18:16, 3 February 2023 (UTC)
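Here is a minimal Python sketch of the L2-header approach mentioned above. It is an illustration added here, not code from any participant's bot. It splits a page's wikitext into language sections using only the headers, so the result does not depend on whether a `----` separator is present. It assumes headers of the form `==English==` on a line of their own. The name `split_language_sections` is hypothetical.

```python
import re
from typing import Optional

# An L2 (language) header such as "==English==" on a line of its own.
# The (?!=) lookahead plays the role of [^=] in /^==[^=]/m, so L3+ headers
# like "===Noun===" are not matched. [ \t]* keeps each match on one line.
L2_HEADER = re.compile(r"^==(?!=)[ \t]*(.+?)[ \t]*==[ \t]*$", re.MULTILINE)


def split_language_sections(wikitext: str) -> list[tuple[Optional[str], str]]:
    """Return (language_name, section_text) pairs, one per L2 header.

    Each section runs from its header to the next L2 header (or the end of
    the page). A ---- separator, if present, stays at the end of the
    preceding section. Any text before the first L2 header (e.g. an
    {{also}} line) is returned with the name None.
    """
    matches = list(L2_HEADER.finditer(wikitext))
    sections: list[tuple[Optional[str], str]] = []
    first_start = matches[0].start() if matches else len(wikitext)
    if first_start > 0:
        sections.append((None, wikitext[:first_start]))
    for i, m in enumerate(matches):
        end = matches[i + 1].start() if i + 1 < len(matches) else len(wikitext)
        sections.append((m.group(1), wikitext[m.start():end]))
    return sections
```

With this approach, removing a `----` line from a page changes only the tail of the preceding section. It does not change which sections are found.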

As someone who codes bots on the regular, the horizontal rule is actually much more of a nuisance than an aid. — SURJECTION / T / C / L / 18:21, 3 February 2023 (UTC)
I completely agree with User:Surjection here. My bot scripts never use the horizontal rule to identify language sections since it's not reliable (sometimes users don't include it, etc.). Instead I look for L2 sections, like User:Catonif mentioned, and I have to take care to split off the horizontal rule (and categories ...) before doing certain sorts of transformations, and then put the stuff back at the end, and worry about properly inserting the horizontal rule (or not) if I insert a new L2 section. Benwing2 (talk) 19:54, 3 February 2023 (UTC)
It's similar with me. In dump-related activities, I only use level-2 headers to parse language sections. Horizontal rules (---- or <hr>) are useless to me, and they are sometimes a nuisance because I have to remove them from any search results that show the contents of language sections, and write the code such that it works if they are there or not. I would prefer removing horizontal rules from the wikitext and generating horizontal lines with CSS targeting level-2 headers instead. — Eru·tuon 20:14, 3 February 2023 (UTC)
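The strip-and-reinsert step described in the two comments above could look roughly like this Python sketch. It is an illustration under stated assumptions, not anyone's actual bot code. It assumes each input string is one language section running from its L2 header to the next. It treats a `----` or `<hr>` on its own line at the end of a section as the separator, and it always rejoins sections with exactly one `----`. `strip_rule` and `join_sections` are hypothetical names.

```python
import re

# Assumed separator: a horizontal rule on its own line at the very end of a
# section. This is "----" (four or more hyphens) or an <hr> / <hr/> tag, plus
# surrounding blank lines. \Z anchors it to the end, so rules elsewhere are kept.
TRAILING_RULE = re.compile(r"\n\s*(?:-{4,}|<hr\s*/?>)\s*\Z", re.IGNORECASE)


def strip_rule(section: str) -> str:
    """Drop a trailing separator (if any) and trailing whitespace."""
    return TRAILING_RULE.sub("", section).rstrip()


def join_sections(sections: list[str]) -> str:
    """Rejoin language sections with exactly one ---- between each pair."""
    bodies = [strip_rule(s) for s in sections]
    return "\n\n----\n\n".join(bodies) + "\n"
```

Because `join_sections` emits every separator itself, a newly added L2 section appended to the list gets its `----` without special handling. Input with or without existing rules produces the same output.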
Hmm, OK, I find the line helpful (when editing pages the traditional way i.e. not using visual editor) and am used to it, but if it's inconveniencing our bot runners, then that's an actual harm (where previously I hadn't realized there was one) which must be weighed against its benefit in visually separating language sections in the wikicode. Is the fact that people sometimes forget to include it a unique challenge as compared to any number of other things people do wrong in entries, like typoing section names, using imbalanced numbers of equals signs, forgetting a # in a sea of definitions and indented quotes, etc? - -sche (discuss) 00:56, 4 February 2023 (UTC)
@-sche No, all of the things you mention cause problems, and things like typos in section names actually cause more problems because they can't be handled automatically. But I don't see much benefit in the visual separation of the line; at least in my browser there's also a horizontal line directly below the L2 language name, so the line above it caused by the ---- seems superfluous. And if it can be displayed automatically (as User:Erutuon mentions), that seems a better approach. Benwing2 (talk) 03:14, 4 February 2023 (UTC)
To clarify, when I say I find it useful for "visually separating language sections in the wikicode", "when editing pages the traditional way i.e. not using visual editor", I mean I find the presence of ---- a useful separator when looking at the actual wikicode in the edit window, so whether we automatically display a line in the displayed text of the page with CSS is less important (although I do find it useful there, too, as it keeps the L2 from seeming like it belongs to the section above it). Nonetheless, I don't feel strongly about retaining the actual wikicode ----. - -sche (discuss) 20:58, 8 February 2023 (UTC)
For text editing, I find it useful for manually extracting a language section for cloning inflected forms for terms that have homographs in other languages. Its absence will require greater attention to cursor placement for copy and paste. --RichardW57 (talk) 09:02, 9 February 2023 (UTC)
If it causes problems, then I support getting rid of it. I have thought at times that it was redundant, although I don't mind it.--Urszag (talk) 07:59, 4 February 2023 (UTC)
Given the comments from users who actually run bots above I don't personally see the value of keeping it. —Al-Muqanna المقنع (talk) 09:48, 4 February 2023 (UTC)

Should we start a formal vote? @Benwing2, Erutuon, etc. Catonif (talk) 19:34, 7 February 2023 (UTC)

@Catonif Sounds good to me. Benwing2 (talk) 21:08, 7 February 2023 (UTC)

Vote created. Feel free to edit it in this buffer week. Catonif (talk) 20:24, 8 February 2023 (UTC)

I missed the important vote. :( BTW, I wanted to remove it too. You can add bgcolor in heading like Thai does if you want some "distinction" among languages. --Octahedron80 (talk) 15:59, 25 March 2023 (UTC)

Decreasing Dan's Ban

If you missed it, Dan Polansky is two days into a month-long ban. The immediate trigger was this section, but there is a wider context that can be read here.

I disagree with the duration of the ban, so I therefore propose to decrease the ban. I have included three options and a separate "safety" measure. You may also float and discuss alternative sanctions for Dan, such as restrictions on certain namespaces. ←₰-→ Lingo Bingo Dingo (talk) 20:15, 3 February 2023 (UTC)

I find this proposal rather abrupt and inappropriate. You've already made it clear that you disagree with the ban with the vote to desysop, yet you've made yet another vote to decrease his block. There are users that have found Dan's behavior problematic in the past, and it's not like TheKnightWho is the first person to block Dan either. I don't even know if there's precedent for undoing someone's block with a vote, let alone one posed at Beer Parlour, which decreases visibility. There are better ways to go about this. Also, when would this vote even end? There aren't enough details to begin with. AG202 (talk) 20:44, 3 February 2023 (UTC)
It is entirely normal to bring up a block for discussion in the Beer Parlour. It is something that has happened many times and yes, it has resulted in partial reversals before. There is nothing untoward about this procedure. ←₰-→ Lingo Bingo Dingo (talk) 20:56, 3 February 2023 (UTC)
Bringing up a block for discussion, yes, I have seen that before. But immediately proposing a vote? I really do not think that's the best way. AG202 (talk) 21:07, 3 February 2023 (UTC)
@-sche's comment at Wiktionary:Votes/sy-2023-02/Desysop_Theknightwho is extraordinarily pertinent here as well. AG202 (talk) 21:09, 3 February 2023 (UTC)
-sche has had disputes with Dan for a long time. That comment was entirely expected and, truth be told, not very pertinent at all. ←₰-→ Lingo Bingo Dingo (talk) 21:44, 3 February 2023 (UTC)
That comment also specifically addresses the argument you are making: Dan gets involved with everyone who warns him about his behaviour, which makes any attempt to step in after giving a warning look biased. It's bog standard manipulation. Theknightwho (talk) 21:55, 3 February 2023 (UTC)
@AG202: The decrease of the ban and the de-sysoping are two separate issues and should thus be handled separately. Thadh (talk) 22:51, 3 February 2023 (UTC)
@Thadh Yes, they should, but yet I don't feel that two votes in two different locations is the best way to go. I've already stated that there are better ways to do this and am not opposed to having the discussion as stated already as well. AG202 (talk) 22:54, 3 February 2023 (UTC)
I have discovered that Dan has continued writing his rants about individual users on his talkpage on the Czech Wiktionary, including within the last 24 hours. Given that he explicitly refers to comments left on the desysop vote page, and therefore knows that these are a major contributing factor towards his blocks, it is impossible to see how he could be doing this in good faith. His future on the project feels untenable. Theknightwho (talk) 04:13, 4 February 2023 (UTC)
Wow, just wow. Knowing this, I'd be more inclined to vote for increasing his block length than decreasing it (although I'm not seriously proposing this). For the record, I don't have any particular grudge against Dan. His Thesaurus and Czech contributions seem valuable, and I even think I would side with him in one of the battles of this silly drama war (his closure of the Named roads RfD seemed relatively reasonable to me), but going on critical rants about en.wiktionary.org users on another WMF site after being blocked here for exactly that doesn't seem like the behavior of someone who has learned their lesson and is trying to improve. 70.172.194.25 04:41, 4 February 2023 (UTC)
This is grounds for an indefinite ban for me. Completely unacceptable. @Vininn126, you need to see this. AG202 (talk) 04:45, 4 February 2023 (UTC)
This comment in particular stands out to me: Nicméně protože jsem jistým obnoxiózním neférovým hnusákem, který se tváří jako Brit ale dost možná je nějaký despotický Asiat (edituje mongolštinu, proč?), na anglickém Wikislovníku zablokován... (However, since a certain obnoxious unfair bastard who pretends to be British but is quite possibly some despotic Asian (he's editing Mongolian, why?) blocked me from the English Wiktionary...) Dan has a history of making dubious racially-charged comments. Particularly given that he said Dělají ostudu anglosaské kultuře ("They are a disgrace to Anglo-Saxon culture") in reference to the English Wiktionary admins just over 2 weeks ago. Theknightwho (talk) 05:20, 4 February 2023 (UTC)
Oh. Oh dear. 😦 🤦‍♀️ Whoop whoop pull up Bitching Betty ⚧️ Averted crashes 17:18, 4 February 2023 (UTC)
I just checked his Czech page again and he's updated it with past examples of inappropriate behavior from other users (both directed against him and other targets). To be fair, I think a lot of those examples are far more egregious than anything I've seen Dan write in his critical user reviews. But I think the logical syllogism goes the other way than what I think he's trying to argue. It's not that "X was able to say something terrible without any sanction, so Y should be able to get away with mild attacks." Rather, I think X and Y should both be sanctioned proportionally to their wrongdoing. And there should of course be some notion of forgiving and forgetting things from long ago, especially if apologies have been issued and behavior has changed. That said, some of the comments in question are recent and from the admin involved here; so maybe commenters on the desysop vote should review those. I don't really want to get involved in this further, the discussion is already making my blood pressure rise, and I regret participating in it. I guess I have a naively optimistic view that smart people with a shared mission should be able to get along and work collaboratively, but that doesn't seem to always be the case. 70.172.194.25 06:26, 4 February 2023 (UTC)
"far more egregious than anything I've seen Dan write in his critical user reviews"—Does that apply to any other than Romanophile's rant on his talk page? Most of it seems pretty mild to me, and none of it implies bigotry against entire groups which seems significant to me when considering the climate this kind of thing creates for other users beyond the people having an argument. I might be missing some though. —Al-Muqanna المقنع (talk) 11:01, 4 February 2023 (UTC)
@Al-Muqanna Speaking of which, Dan has also penned this delightful essay, which is one of the most tone-deaf pieces that I’ve ever read. Probably one of the best examples of the way Dan feigns objectivity and detachment while being highly selective in how he frames things in order to push an ulterior motive. It honestly doesn’t matter whether this is down to intentional manipulation or a lack of awareness of his own emotions; it’s antisocial either way, and yet another example of something he does that is highly offputting to anyone actually affected by it. Theknightwho (talk) 06:17, 5 February 2023 (UTC)
Speaking as someone who's trans herself, good frelling riddance! Whoop whoop pull up Bitching Betty ⚧️ Averted crashes 18:16, 6 February 2023 (UTC)
I wrote the above before I realized the extent of his comments. 70.172.194.25 19:06, 6 February 2023 (UTC)
Today, Dan wrote this rant about bigotry. In it, he says (in English):
  1. Any Asian user account presents an objectively existing cultural risk, as anecdotally confirmed by the behavior of Wyang in the English Wiktionary.
  2. If someone wants to accuse me of thinking of Asians as anti-democratic and despotic (not each of them, of course; I know a very kind Chinese I had worked with I would have never thought of a despotic; we are talking tendencies), that accusation is correct, and I believe that thinking is supported by solid evidence and analysis. One may even argue that Slavic people tend to be despotic as well (think of the current debacle with Russians, who ought to revolt better against their semi-crazed leader), yet I am nominally Slavic.
  3. Transgenderism or transgender ideology seems to be the cultural norm, although, objectively, it is not the completely dominant cultural norm even in the U.S., where this dangerous form of denial of objective reality has taken root.
I consider these kinds of statements fundamentally incompatible with the ethos of Wiktionary. They’re grounds for an indefinite block. Theknightwho (talk) 16:40, 5 February 2023 (UTC)
The admins really need to get a grip and act on this because if overt white supremacy doesn't merit removal from this project then I imagine I'm not the only editor who'd have difficulty continuing to participate. —Al-Muqanna المقنع (talk) 17:15, 5 February 2023 (UTC)
@Theknightwho, Benwing2, Lingo Bingo Dingo We should probably warn the Czech Wiktionary admins about what DP's doing there (if we haven't already). Whoop whoop pull up Bitching Betty ⚧️ Averted crashes 19:19, 6 February 2023 (UTC)
I got the impression that they're already aware. If they aren't, then they soon will be without our intervention, I think! Theknightwho (talk) 19:21, 6 February 2023 (UTC)
Just thought I'd bring it up, seeing as DP's still active there, and still has much of their vitriol against various English Wiktionarians on their talkpage there (although they've deleted the very worst of the racist ranting and raving in an apparent attempt to seem more respectable / hide what they'd written earlier, that, too, can still be seen in all its glory in their talkpage's history; judging from the tenor of those of DP's talkpage comments that've been translated here, at least some of what was on their talkpage would likely warrant revdeletion). Whoop whoop pull up Bitching Betty ⚧️ Averted crashes 19:31, 6 February 2023 (UTC)
I was going to say this yesterday in response to Dan's previous statements, but overt racism of any sort is completely beyond the pale and should automatically lead to an indefinite ban. Benwing2 (talk) 19:55, 5 February 2023 (UTC)
OK, as an uninvolved admin I have blocked Dan indefinitely. Benwing2 (talk) 20:01, 5 February 2023 (UTC)
Thank goodness! Whoop whoop pull up Bitching Betty ⚧️ Averted crashes 18:25, 6 February 2023 (UTC)
Okay, that's quite more than bad enough, reversing my vote. This can be snowballed, if desired. ←₰-→ Lingo Bingo Dingo (talk) 20:45, 5 February 2023 (UTC)
Concerning this behaviour of Dan that has just come to light – wow, just wow. — Sgconlaw (talk) 20:49, 5 February 2023 (UTC)
I don’t know about this infinite bad requiring an infinite ban. Being wowed is not unjustified, but myself I am not triggered here in the least but my impressions and those of others who felt the block not too short are confirmed. These are the grievances you write if you sit on the internet too much; it’s as easy to lose one’s marbles as to appear ideologized, both are in the end the having primitive ideas that someone has not succeeded to think through. Lingo Bingo Dingo warned about his being “not in the best mental state”, and specifically for this I believed he needed a time off. Understanding bans also as a preventive measure, less so than a penalty for being evil, which I still find hard to believe, understanding the probabilities: you couldn’t just tally the block to the evil action without wider context, different persons always have required different measures even and specifically on this wiki, where there are various pathological patterns of editing; unfortunately we can’t put in much effort to convert him from his wrong. Fay Freak (talk) 21:21, 5 February 2023 (UTC)

Completely Undo Dan's Ban

Proposal: Immediately unban Dan Polansky upon the end of this straw vote (the condition being overwhelming support), if it hasn't been undone through another proposal. If this proposal and another reduction proposal pass, this one has priority.

Rationale: Posting that section was not ban-worthy.

Support

  Support ←₰-→ Lingo Bingo Dingo (talk) 20:15, 3 February 2023 (UTC)
  1.   Support I don't agree with many of Dan's contentious points but two or three statements which arguably show signs of low-level racism shouldn't get him banned. Free speech is too precious for that. --Overlordnat1 (talk) 09:08, 6 February 2023 (UTC)
    🤦‍♂️ AG202 (talk) 13:32, 6 February 2023 (UTC)
    "Two or three"? "Arguably"? "Low-level racism"? Seriously? 🤦‍♀️ There're a LOT more statements than that, many of them filled with obvious, pretty-high-level racism (and since when has racism been OK as long as it's only "low-level", anyways?). Not to mention the overt transphobia that's appeared more recently, and doubtless yet more insults and bigotry that I skimmed over. 🤦‍♀️ Whoop whoop pull up Bitching Betty ⚧️ Averted crashes 18:24, 6 February 2023 (UTC)
    My point is that Dan hasn't done things like use racial slurs or issue death threats to people because of their race. I think a lifelong ban is excessive. --Overlordnat1 (talk) 19:03, 6 February 2023 (UTC)
    He has yet to even apologize and continues writing these essays, not only being racist and transphobic (also he uses derogatory terminology in his "essay" on trans folks, yet tries to academic his way around it), but also directly attacks editors here, including with the racist commentary. There's no way that people will ever feel comfortable with him being active here with that type of vitriol. And for the love of everything above, please don't use the "free speech" argument again (this is precisely what the deleted sense at free speech was talking about). AG202 (talk) 23:44, 6 February 2023 (UTC)
    Please don't try to dictate to me what phrases I can and cannot use. I was using the phrase in a manner consistent with definition 1 at free speech. My opinion is irrelevant in any case as I'm clearly outvoted in this instance. I've made my stand and have nothing further to say. --Overlordnat1 (talk) 01:03, 7 February 2023 (UTC)
    IMO Dan deserved to be blocked for trying to hijack our entire deletion process and filibustering any attempts to stop him. That's what he was blocked for in the first place. When the indefinite sitewide block was reduced, he responded by intensifying the disruptive behavior- not a good sign. As for the talk-page stuff, it just showed that any attempt at compromise was a waste of time. The details of what he said about whom aren't as important as the fact that he was still trying to justify his actions by vilifying anyone who disagreed with him. The offensive content that's come to light since is just icing on the cake. Chuck Entz (talk) 09:13, 7 February 2023 (UTC)

Oppose

  1.   Oppose Theknightwho (talk) 21:53, 3 February 2023 (UTC)
  2.   Oppose. The desysop vote surprised me, and it seemed a bit underinformed to look only at Dan's most recent comment in isolation from context. LBD's harsh reply to me there and comments here surprise me even more (apparently I am not great at noticing when someone has a problem with me? I'm sorry!), and suggest this is not coming from a place of being underinformed about the problem. Unfortunately, to be knowingly arguing over whether a specific comment would violate the letter of one selected standard if it were viewed in isolation from context, ignoring the issue of the user being persistently disruptive and even (as pointed out on the vote page by another editor) ignoring even the letter of other standards which would prescribe an outcome the arguer doesn't like as much, such as the BLOCK policy which specifically prescribes blocks of this length for persistent or repeat offenders, kind of seems a bit like wikilawyering...? - -sche (discuss) 00:59, 4 February 2023 (UTC)
  3.   Oppose - Dan's talkpage conduct completely justifies the ban. Whoop whoop pull up Bitching Betty ⚧️ Averted crashes 01:06, 4 February 2023 (UTC)
  4.   Oppose Vininn126 (talk) 08:47, 4 February 2023 (UTC)
  5.   Oppose, the issue at stake is clearly not one specific comment. To be honest, to me at least, the track record of racially charged and other exclusionary remarks noted both here and on the vote page, particularly against Asians, makes it very hard to understand the impetus for protecting Dan without any evidence of him amending his behaviour. I'll add, in that respect, another item that has basically gone without comment to my knowledge: Dan's parting shot at Talk:antimuslim, where he apparently calls other editors insane for believing that people with Asian names could be native English-speakers (cf. Citations:antimuslim, where the O'Brien quote was missing at the time). —Al-Muqanna المقنع (talk) 09:27, 4 February 2023 (UTC)
  6.   Oppose AG202 (talk) 15:25, 4 February 2023 (UTC)
  7.   Oppose Fenakhay (حيطي · مساهماتي) 21:54, 4 February 2023 (UTC)
  8.   Oppose - Correct me if I am wrong, but Wiktionary policy states that your third block due to rudeness must be a month long, and this is Dan's fourth. Three citations, for all senses. (talk) 19:07, 5 February 2023 (UTC)
  9.   Oppose, switched. ←₰-→ Lingo Bingo Dingo (talk) 20:45, 5 February 2023 (UTC)

Abstain

Comment

Decrease Dan's Ban to One Week

Proposal: Reduce the total length of Dan's ban to one week. If this proposal and another reduction proposal pass, the shorter one has priority.

Rationale: Posting that section may not have been ban-worthy by itself, but considering the context a slap on the wrist is justified.

Support

  Support ←₰-→ Lingo Bingo Dingo (talk) 20:15, 3 February 2023 (UTC)
  1.   Support --Overlordnat1 (talk) 09:08, 6 February 2023 (UTC)
    My commentary on the previous proposal is equally-applicable here. 🤦‍♀️ Whoop whoop pull up Bitching Betty ⚧️ Averted crashes 18:41, 6 February 2023 (UTC)

Oppose

  1.   Oppose Theknightwho (talk) 21:53, 3 February 2023 (UTC)
  2.   Oppose, as posting that section was banworthy - the context merely makes it much more so. Whoop whoop pull up Bitching Betty ⚧️ Averted crashes 01:07, 4 February 2023 (UTC)
  3.   Oppose Vininn126 (talk) 08:47, 4 February 2023 (UTC)
  4.   Oppose Al-Muqanna المقنع (talk) 09:27, 4 February 2023 (UTC)
  5.   Oppose AG202 (talk) 15:25, 4 February 2023 (UTC)
  6.   Oppose Fenakhay (حيطي · مساهماتي) 21:54, 4 February 2023 (UTC)
  7.   Oppose Three citations, for all senses. (talk) 00:58, 5 February 2023 (UTC)
  8.   Oppose, switched. ←₰-→ Lingo Bingo Dingo (talk) 20:45, 5 February 2023 (UTC)

Abstain

Comment

Decrease Dan's Ban to Two Weeks

Proposal: Reduce the total length of Dan's ban to two weeks. If this proposal and another reduction proposal pass, the shorter one has priority.

Rationale: Posting that section may not have been ban-worthy by itself, but considering the context a slap on the wrist is justified.

Support

  Support ←₰-→ Lingo Bingo Dingo (talk) 20:15, 3 February 2023 (UTC)
  1.   Support --Overlordnat1 (talk) 09:08, 6 February 2023 (UTC)
    🤦‍♀️ Whoop whoop pull up Bitching Betty ⚧️ Averted crashes 18:42, 6 February 2023 (UTC)

Oppose

  1.   Oppose Theknightwho (talk) 21:53, 3 February 2023 (UTC)
  2.   Oppose, as posting that section was banworthy by itself. Whoop whoop pull up Bitching Betty ⚧️ Averted crashes 01:07, 4 February 2023 (UTC)
  3.   Oppose Vininn126 (talk) 08:47, 4 February 2023 (UTC)
  4.   Oppose Al-Muqanna المقنع (talk) 09:27, 4 February 2023 (UTC)
  5.   Oppose AG202 (talk) 15:25, 4 February 2023 (UTC)
  6.   Oppose Fenakhay (حيطي · مساهماتي) 21:54, 4 February 2023 (UTC)
  7.   Oppose - Three citations, for all senses. (talk) 00:59, 5 February 2023 (UTC)
  8.   Oppose, switched. ←₰-→ Lingo Bingo Dingo (talk) 20:45, 5 February 2023 (UTC)

Abstain

Comment

Restrict Dan from voting on Desysop Theknightwho

Proposal: Bar Dan Polansky from voting here if he is unbanned.

Rationale: Very recently banned people should not vote so soon on the staff who banned them.

Support

  1.   Support ←₰-→ Lingo Bingo Dingo (talk) 20:15, 3 February 2023 (UTC)
  2.   Support - well, duh! Whoop whoop pull up Bitching Betty ⚧️ Averted crashes 01:08, 4 February 2023 (UTC)
  3.   Support - see 2. Three citations, for all senses. (talk) 00:00, 5 February 2023 (UTC)
  4.   Support - Obvious conflict of interest. — Ceso femmuin mbolgaig mbung, mellohi! (投稿) 05:27, 5 February 2023 (UTC)
  5.   Weak support It doesn't make much difference as Dan is but one editor but this would be a conflict of interest. --Overlordnat1 (talk) 09:08, 6 February 2023 (UTC)

Oppose edit

  1.   Oppose If an editor is not banned, they should be allowed to vote. The idea that someone who has had a negative experience with an admin should not be allowed to vote based on that experience is absurd. People can vote for whatever they want based on whatever criteria they want. - TheDaveRoss 15:03, 7 February 2023 (UTC)[reply]

Abstain edit

Comment edit

Comment edit

You can add general comments here. ←₰-→ Lingo Bingo Dingo (talk) 20:15, 3 February 2023 (UTC)[reply]

@Lingo Bingo Dingo Do the recent developments here make the ongoing desysop vote against TKW moot, or are we going to leave that vote to run its course (not that it's likely to matter either way, given that there seems to be an overwhelming consensus not to desysop them, but still)? Whoop whoop pull up Bitching Betty ⚧️ Averted crashes 18:29, 6 February 2023 (UTC)[reply]
Yes, I'll cut it short. There is no point to letting it continue. ←₰-→ Lingo Bingo Dingo (talk) 21:29, 6 February 2023 (UTC)[reply]

Sad to learn that this user was eventually permabanned. This user made some good points about the existing problems with Wiktionary's administration. -- Huhu9001 (talk) 13:05, 26 February 2023 (UTC)

Premature archiving

In January 2022, I commented on Wiktionary talk:Requested entries (English) that suggestions on that page had been prematurely archived. There was no response.

The same thing has happened again: even a suggestion I posted in January 2023 has been moved to the 2022 archive. Andy Mabbett (Pigsonthewing); Talk to Andy; Andy's edits 21:08, 3 February 2023 (UTC)

Alternative forms

Here's a question that was occasioned by a particular word but that has more general applicability. What is the preferred entry layout for an inflected form of an alternative form?

Specifically:

screweyes is the plural, of course, of screweye.

It is also an alternative form of screw eyes.

Around which of these two ways of conceptualizing the word should the headings under the entry screweyes be structured? I know how to build the entry either way, but I'm not sure which would better fit uniformity across Wiktionary in this sort of situation. --HelpMyUnbelief (talk) 01:42, 4 February 2023 (UTC)

My understanding is that the norm is that each inflected form is defined as an inflection of its own corresponding singular, and then the less-common singular is defined as an alternative form of the other, 'main' spelling. This does mean that someone who looks up e.g. mockups has to click twice, instead of just once, to arrive at mock-up, but this is not so onerous. Sometimes people do add both things to the definition line, i.e. "plural of ___, alternative spelling of ____" or the other way around. To define mockups only as an alternative form of mock-ups and not mention its singular mockup at all would be wrong IMO. - -sche (discuss) 02:09, 4 February 2023 (UTC)
Thanks. Now that I've re-pondered the issue in light of your answer, I'm having one of those "Of course! What was I thinking?" epiphanies. --HelpMyUnbelief (talk) 07:35, 4 February 2023 (UTC)
Agree. At minimum "plural of ___". Optimally "plural of ___, alternative spelling of ____". Quercus solaris (talk) 22:39, 4 February 2023 (UTC)
Agree as well. I usually write “{{plural of|en|mockup}} ({{alternative form of|en|mock-ups|nocap=1}})”. I would also add an “Alternative forms” heading to mock-ups. — Sgconlaw (talk) 22:49, 4 February 2023 (UTC)

I like the idea of listing it both ways in the definition; and leading with "plural of", I see now, is the only order that makes sense. And of course I'll always add an "Alternative forms" section to the 'main' entry. Thanks for the input, everyone. — HelpMyUnbelief (talk) 01:33, 6 February 2023 (UTC)

Is it time to look at Toki Pona again?

I'm very much not a Wiktionary editor by any stretch of the term, so if I'm way off base in any of this, feel free to poke me about it.

After another quick search of the Toki Pona appendix (it's simply the only good Toki Pona-to-English dictionary around), I realized... wait, why is it in the Appendix, exactly? After some digging, I found the conlang inclusion guidelines and saw that the inclusion criterion for mainspace was essentially "some very old IALs". Which, yeah, those are the ones that have a lot of use. Of course, there is a conlang that has a lot of use that isn't a very old IAL (or from a book/movie): Toki Pona.

I did a quick search of WT:VOTES and found that the last time Toki Pona was discussed in any depth for inclusion appears to have been this 2010 discussion, and that this 2017 discussion appears to me to reach a consensus that the inclusion criterion for conlangs is essentially "the community says so". Since then, a few constructed languages have been removed from the main dictionary, but none have been added. I think it would likely be useful to add Toki Pona. For example, here are the numbers of speakers of the four currently included constructed languages and of Toki Pona:

Language      Speakers    Estimate date
Esperanto     ~60,000     c. 2017
Ido           200         c. 1999
Interlingua   ~1,500      c. 1999
Volapük       20[1]       c. 2000
Toki Pona     ~1,400      c. 2022
  1. ^ Claimed 1,000,000 in 1889. Dubious.

I'm almost certainly comparing apples to oranges by using the vastly different dates for Toki Pona and Esperanto compared to the rest of them, but it's surprisingly difficult to find data on how many people speak the other three. While I understand that Volapük makes sense to include for the same reason Wiktionary includes, say, Latin, it seems to me that the number of Toki Pona speakers is comparable to the number of Interlingua and Ido speakers. For that reason, I ask: why is Toki Pona relegated to the Appendix? Unlike in 2010, when Toki Pona had a small-to-nonexistent community, Toki Pona is nowadays actively spoken and written by many people, including, for what it's worth, a handful of English Wikipedia users.

Toki Pona also recently received an ISO 639-3 code. The application for it can be found here, and I must say it does a better job of explaining its worthiness for inclusion than I do. Although "second-most used conlang" is a bold claim: have these jan never heard of Interslavic?

Because of my lack of experience with Wiktionary, I don't really know whether it's a discussion actually worth having, but here you go regardless. Casualdejekyll (talk) 02:29, 4 February 2023 (UTC)

Personally I think all conlangs other than Esperanto should be moved to the Appendix. Benwing2 (talk) 03:20, 4 February 2023 (UTC)
I have to agree. brittletheories (talk) 12:26, 4 February 2023 (UTC)
I would much prefer this option. Volapük could also possibly stay due to its historical significance, but I wouldn't shed any tears over it being moved to the appendix too. — SURJECTION / T / C / L / 20:02, 4 February 2023 (UTC)
I agree, too. - -sche (discuss) 20:13, 4 February 2023 (UTC)
Agreed as well. All other conlangs are insignificant compared to Esperanto. Toki Pona may become a big thing, but it's not there yet. Ioaxxere (talk) 20:45, 4 February 2023 (UTC)
I would also like to propose criteria for a conlang to be included in mainspace:
  1. Has an ISO 639-3 code
  2. Has or had at one point a significant community of native speakers
  3. Has a significant body of original literature (not translations) on a wide range of topics
Ioaxxere (talk) 20:54, 4 February 2023 (UTC)
Agreed. Vininn126 (talk) 22:55, 4 February 2023 (UTC)
Those other languages didn't get here by virtue of their number of speakers... they got here because they're fully featured languages capable of doing anything a natural language can do, and they have demonstrated such use through a body of works written in the language. Importantly, they can be used to translate material from another language. Quenya, Klingon, and other languages meant for fictional works are often very well made, but they were made for a specific purpose and cannot be used to express concepts outside the fictional world of the work. Therefore they cannot be used to translate written works the way Esperanto and the others can. Toki Pona is an experimental language even more restricted in vocabulary than the languages used for fictional works... by design, it isn't capable of expressing concepts outside its scope. Any translation from English into Toki Pona and back again would result in a distorted message. Therefore it isn't useful to our readers to put Toki Pona words and translations into mainspace. Much better to use the appendix, where they're all together, since people looking for one Toki Pona word are most likely looking for others. Soap 08:01, 4 February 2023 (UTC)
Agree with Soap on this. There are methodological issues with treating it the same way as any natural language. —Al-Muqanna المقنع (talk) 19:30, 4 February 2023 (UTC)
You can say whatever you want in Toki Pona. You just have to use a bunch of words to express what English expresses with a single concept. I am not an expert in this conlang, however. Three citations, for all senses. (talk) 01:12, 5 February 2023 (UTC)
IIRC, one problem with including Toki Pona would be multi-word strings and whether or not they are SOP. Since the definitions of base words are very loose, that would be hard to determine, and it would also be more difficult to say which such phrases have fully lexicalized and which are nonce (this is a problem with other languages, too, but it is exacerbated in Toki Pona). Vininn126 (talk) 12:33, 4 February 2023 (UTC)

As I see it, Wiktionary aims to include all words in all languages, with the footnote that they should be in use by a language community. This excludes made-up words that hobbyists dabble in. (It does not exclude made-up words that are actually in use, such as coinages by the Académie française.) Whether a language is a natlang or not is, therefore, irrelevant. (Standard French is a conlang created by the Académie française.) What matters is the language community and actual use (as opposed to hobbyists who merely dabble). Many hobbyists do not actually use the language.

Our rule of “use in permanently recorded media, conveying meaning, in at least three independent instances spanning at least a year” is how we’ve been establishing use (in conlangs among others) for a long time. It’s flawed, but I think we should abide by it until we have a better alternative. When Lojban was moved to an appendix, several people brought up that most (all?) Lojban entries could not possibly meet this requirement. I don’t think discussing the inclusion of conlangs in main space can be fruitful until that argument is out of the way.

So you say that over a thousand people use Toki Pona. Sweet, but do they publish durably archived works? Can Toki Pona entries meet our criteria for inclusion? (If so, please add quotes! Three independent durably archived quotes on every single sense of every single Toki Pona entry is the one argument I want to hear.) Or is there any other proof that there’s a language community (rather than a thousand hobbyists with too much time on their hands) that we could discuss? If so, I’m with you; if not, sorry. MuDavid 栘𩿠 (talk) 02:37, 6 February 2023 (UTC)

This is roughly the way we've approached the issue in the past. Volapük and Esperanto are used in a large number of durably archived books, so lots of words are attestable. (Volapük has gone out of fashion nowadays, but books in it were published in the late 1800s and early 1900s.) On that basis I support keeping Volapük and Esperanto in the mainspace. Ido and Interlingua don't have as large a durably archived corpus, but I would guess they probably still have enough. For what it's worth, I've seen an actual physical book in Interlingua before, which is more than I can say for Lojban or, for that matter, Toki Pona. —Granger (talk · contribs) 03:50, 6 February 2023 (UTC)
I disagree with Soap; there are many modern messages Sumerian can't translate ("The Communists moved their tanks over the steel bridge, and ICBMs controlled by computers in Moscow were ready to launch."). If we had a body of text in Toki Pona, then fine. Klingon has a handful of printed books (http://klingon.wiki/En/PhysicalBooks), most of them pretty short; The Wizard of Oz, for example, is one of the longer ones, at 40,000 words. Still, I'd argue for Klingon, if it weren't for the copyright issues. Volapük has a decent collection of works, even if they're mostly century-old. Vo.Wikisource.org is sparser than I'd like to see, but it's still got a decent amount of text. There doesn't seem to be a single work printed in Toki Pona that's not about Toki Pona. --Prosfilaes (talk) 02:20, 23 February 2023 (UTC)
When I looked into this last year, I was able to find only two such works, and they were by the same author. I don’t know whether that has changed. 70.172.194.25 06:28, 25 February 2023 (UTC)

Quotes missing translations

These should categorize under their own categories, not under "Requests for translations of X usage examples". Quotes practically always have surrounding context, while usage examples do not, so translating them is a different kind of task. — SURJECTION / T / C / L / 13:32, 5 February 2023 (UTC)

Agree. Vininn126 (talk) 14:49, 5 February 2023 (UTC)
  Done SURJECTION / T / C / L / 07:37, 6 February 2023 (UTC)

User rights

I've been curating our lists of users with certain rights (through the use of Special:ListUsers), and have removed rollback rights from the accounts of long inactive users.

I suggest a bureaucrat (@Chuck Entz, Surjection) do the same for administrator rights.

Also, since administrators automatically have rollback rights, I intend to remove them from this list as well, so that only people without administrator rights appear in it. This would concern @Benwing2, Mnemosientje, Rua, SemperBlotto, Surjection. Any objection? PUC 14:24, 5 February 2023 (UTC)

@PUC Hi, you need to keep rollback rights on User:Benwing2 since this is a non-admin account. You can remove them from User:Benwing. Benwing2 (talk) 19:43, 5 February 2023 (UTC)
Ah yes, I forgot about that, sorry. Okay, I won't touch your account. PUC 19:47, 5 February 2023 (UTC)

Normalisation in Old English entries

@Skiulinamo:, @Hundwine:, @Hazarasp:, @Leornendeealdenglisc: Hello, all. I wanted to start a discussion to address an important issue regarding our Old English entries, and you all seem to be the editors consistently contributing to the language (please forgive me if I've left anyone out). I want to see if we can establish a consensus regarding the issue of spelling normalisation. I recently saw an edit to dēorling where the main entry was moved to dīerling (a very rare or possibly unattested form), but the move makes sense given that we have the stem's entry at dīere (itself a rare but attested spelling). In my own view, dīere would be the etymologically "expected" form, being inherited from *diurī, which later became dȳre. I know there are other forms just as valid, often dialectal or temporal variations, but what are your thoughts on how situations like this should be treated? I think we could get Old English terms added faster and more efficiently if we all align our efforts and work together as a single unit. Personally, I prefer the normalised spelling (even though I am not currently editing that way), as it allows end users to more readily see how derived terms relate to the root. But I am flexible and will support whatever we decide as a group. And of course, we can choose to decide nothing and simply continue to do as we do now. What say ye? Leasnam (talk) 17:56, 5 February 2023 (UTC)

I'm not flexible at all. Normalizing makes it way easier to edit Wiktionary. I'm not going to hunt everywhere to determine whether a normalized spelling is attested when it's often impossible to tell, since there is no database listing every spelling for every word. It's much more convenient to just allow the use of normalized spellings, as most dictionaries do; that's been the de facto policy for Old English entries forever without causing any problems. We already sacrifice a little bit of purity for usability when we replace wynn and ⟨uu⟩ with ⟨w⟩, which creates an enormous number of unattested spellings by itself. I'll go into more detail if you like, but to me the benefits of allowing these spellings vastly outweigh the costs. Hundwine (talk) 20:49, 5 February 2023 (UTC)
This has already been brought up in a few other places, right? E.g. User_talk:Hundwine#Hyrsum, Wiktionary:Requests_for_deletion/Non-English#hiersum. I agree that it seems like a good idea to come to a general consensus rather than bringing it up on each applicable word. I am not generally involved in editing Old English entries on Wiktionary; it seems to me that this issue mainly involves the diphthong "ie", a spelling "which is virtually restricted to Early West Saxon" ("Late West Saxon palatal diphthongization", CORE), but which apparently is useful, per Hundwine, as a normalized version of the more common alternatives found in its place. --Urszag (talk) 18:37, 5 February 2023 (UTC)
Yes, and in a way, also at stīele (steel). Leasnam (talk) 18:57, 5 February 2023 (UTC)
If you're going to do this, please at least add {{normalized}} and make it clear in some way what the attested form(s) are. 70.172.194.25 18:47, 5 February 2023 (UTC)
Perhaps greater use of dialect labels would help? Currently, "dēorling" is only marked as "Alternative form of dīerling", and there is no dialect label on either. But if the normalization is implicitly to an Early West Saxon standard, perhaps it would be more appropriate to make that explicit by including an "Early West Saxon" dialect label on the normalized entry, and it certainly seems it would be helpful to include the labels for the dialects that use the form "dēorling" on that page. --Urszag (talk) 18:57, 5 February 2023 (UTC)
I agree with User:Urszag here about dialect labels. Keep in mind that {{alt form}} supports a |from= parameter to specify a dialect label; see its documentation as well as Category:Form-of templates. So we can add the appropriate dialect labels to indicate e.g. that a spelling is 'Late West Saxon' or whatever. ('Late West Saxon' is in fact one of the already-supported labels in Module:labels/data/lang/ang, meaning that if you use it, you'll get appropriate links.) Benwing2 (talk) 22:01, 5 February 2023 (UTC)

Changing our glossary definition of neologisms

@CitationsFreak @Al-Muqanna, as people involved in the discussion on Discord: I believe our current definition of neologism in the glossary is laughably bad. Neologism is more a type of marking, like slang, indicating a "perceived" newness, and not "anything new". Vininn126 (talk) 18:51, 5 February 2023 (UTC)

Yeah, neologisms have to have a "new-word" scent to them to count. "Cinemanic" has it, "spongy moth" doesn't. Three citations, for all senses. (talk) 18:59, 5 February 2023 (UTC)
I agree the glossary definition is problematic, and should be more specific to account for how we use the term. When discussing neologisms people are generally thinking of something more organic than e.g. officially decided scientific names (so SARS-CoV-2 is not a neologism, but Covidtide is). It's also not useful to add a "neologism" context label to every word that happens to originate after, say, 2010. I also note WT:Neologisms (linked at the glossary) says that we label words as neologisms when they're not in other dictionaries, which is a bit barmy in my opinion (are we labelling early modern English terms that happen not to be in other dictionaries "neologisms" too?) and not how the label is used in practice. (I see @DCDuring complained about this way back in 2009 on the talk page too!) —Al-Muqanna المقنع (talk) 19:04, 5 February 2023 (UTC)
I agree. New word formation happens continually in living languages with large speaker populations, and it isn't useful in a non-linguistics-specific/exclusive context to apply the label of "neologism" to all of the developments (the example given above is a good one). It may be fine for linguists who are using the term advisedly with an agreed operational definition to use it in that sense amongst themselves (such as "any word less than 10 years old" or whatever), but for a wider audience, it is problematic because the public takes it to be a label of casualism or slang, which is a distinct sense from the stricter technical sense. (Polysemy strikes again.) Regarding other dictionaries, one thing about them for sure is that they fail to enter countless words that ought to have lexicographic coverage, and most of those have nothing to do with casualism or slang but rather are simply scientific or technical words that aren't common outside of particular semantic contexts. In the pre-web era, a valid excuse was page count containment. Today for the online versions (of any general dictionary for adults) there is no excuse except lack of budget to pay people to enter them and curate the collection. For Wiktionary at Appendix:Glossary#neologism perhaps a short and clear explanation something like: "neologism: A newly coined term or meaning. Wiktionary does not label new words or senses with this label unless their acceptance in formal register is incomplete or contentious." Something along those lines. Quercus solaris (talk) 19:34, 5 February 2023 (UTC)
I apply the (neologism) label if a term has achieved widespread recognition in a short time but is not generally recognized as a "real" word, essentially acting as a warning label. Does anyone have objections to these criteria? Ioaxxere (talk) 19:44, 5 February 2023 (UTC)
I agree that that is essentially the same spirit/theme. The adjective "real" is problematic for this purpose because nonce words and slang words are definitely real words; the true distinction is more about register (formality or absence thereof). But yes, nonetheless, I agree that that is the same idea. Quercus solaris (talk) 19:51, 5 February 2023 (UTC)
That is more or less what I am trying to say with the OP, and I agree with Q that it's less about "realness" (though I recognize that it's meant to refer to how it's perceived by speakers) and more about speakers feeling that a word is new when they hear it. Vininn126 (talk) 19:55, 5 February 2023 (UTC)
Yeah, I'm not sure how exactly to define it, but agree we should define it better. I think there are at least two criteria: firstly what Vininn called "perceived newness" (I understand what Ioaxxere is getting at with "achieved widespread recognition in a short time", but can't a term be a neologism and also rare / not widely recognized? so we need to be careful how we word this), and secondly actual newness (in previous discussions, people have said a word may take a generation to establish itself, so anything older than 20 years, maybe even just 15 or 10 years, isn't a neologism). I think the definition needs to include both, since IMO a name like Margaery or pronoun like ve or singular they or thon or any other word which is actually 100+ years old can't be a "neologism" even if people mistakenly perceive it as new (though it can be "rare", "nonstandard", etc, and in exceptional cases we might want to go into detail in a usage note). I agree it doesn't make sense to call something like SARS-CoV-2 or Delta variant or e.g. tennessine a "neologism" just because they were coined within the last eight years. (I wouldn't necessarily mind entirely replacing the label with defdates or other indications, e.g. in the etymology, of when a word was first used, but certainly as long as we're using the label we should define it well.) - -sche (discuss) 20:36, 5 February 2023 (UTC)
I think your point about ACTUAL newness is valid - the Polish term dlaczemu is often perceived as a neologism, despite being over 100 years old. Vininn126 (talk) 20:57, 5 February 2023 (UTC)
Moving it to the etymology section was my first instinct as well, but I think the point about perception and not just reality of newness is solid and supports its use as a context label. I think defining it as a combination would make sense. —Al-Muqanna المقنع (talk) 21:19, 5 February 2023 (UTC)
By "achieved widespread recognition", I'm talking about coverage by major news sites. In absolute terms, I agree that hardly anyone is using words like tipflation or tripledemic. A neologism that hasn't been established this way should be called a protologism. Ioaxxere (talk) 21:15, 5 February 2023 (UTC)
  • It is not a good idea to redefine for label purposes the word neologism, which almost all other OneLook dictionaries define as "A new word or phrase, or a new use of a word.", as we did before this change (Nov 18, 2021) and still do in Appendix:Glossary. I don't think we have bothered to look for attestation to support the changes in our definition. Even if we found such attestation, it still seems questionable to use the word in ways that are novel to our users, no matter how we define it in our glossary. I suppose that we could restrict the application of the label to only some neologisms ("new word or phrase, or new use of an existing one"), using some clearly stated criteria (which could be somewhat subjective) presented tersely in Appendix:Glossary#N and at greater length at Wiktionary:Neologisms.
If someone would like to take a crack at a first draft of a substitute or addition to Wiktionary:Neologisms, I am sure many would be happy to suggest changes. DCDuring (talk) 01:09, 6 February 2023 (UTC)

Improvement & expansion of Wiktionary:Frequency lists

I've been thinking about how Wiktionary:Frequency lists might be improved. Firstly, I was thinking of creating subpages for wikilinked frequency lists using either the word lists based on www.opensubtitles.org or those included as part of the Leipzig Corpora Collection. Both collections (though particularly the latter) provide a staggering range of languages and could really help to improve our coverage of smaller languages and identify missing words. Best of all, the content is available under CC licences: the former under CC BY-SA 4.0 and the latter under CC BY (this applies only to the corpora available for download), with no version specified. I then hope to use these wikilinked lists to expand the frequency lists used on some other language projects, for instance the Danish, Norwegian and German Wiktionaries. If anyone would like to coordinate or collaborate, that would be awesome, but I'm equally fine to proceed on my own as and where time permits, if this is deemed a useful contribution. To see an example of what I'm thinking of, you can check out this WIP. Enable OrangeLinks.js for best results.

This brings up the second aspect of my question, regarding the organisation of that page, something which (at least from what I can see) doesn't seem to have been discussed much previously. In any case, I think the whole thing could do with some cleanup before being enlarged, with perhaps some of the information being moved into subpages, more so than is already the case. Since it looks like this could be quite a big job, and a lot of the decisions somewhat arbitrary, I'd rather get some input and perhaps find a consensus before starting on it, unless no one really cares either way. I'm unsure whether everything should be moved into subpages, keeping only the links to each individual language's subpage here, or whether we should aim to make this more an index of the frequency lists, much as it is now but with all of the actual list content (where applicable, for example much of the English section) moved into subpages. Any feedback or suggestions welcome. Helrasincke (talk) 08:28, 6 February 2023 (UTC)

@Helrasincke The frequency lists are in desperate need of some love. IMO they are clearly important, e.g. there was just a Beer Parlour discussion a few days ago (see #Too big) in reference to these lists, where I brought up Category:Basic word lists by language (there's also Category:Basic word lists by family), which aren't integrated into the frequency lists. Yet I don't know of anyone who actively works on them, so I'd suggest you just go ahead and start cleaning things up. Benwing2 (talk) 18:24, 6 February 2023 (UTC)
Great, thank you for your input @Benwing2. I've worked through to German; at the moment I'm focusing on moving any content to subpages and tidying lists, and later I'll go through everything again for the deep clean. But I'm loath to make any rash or unilateral decisions on that front, since my idea would involve deleting much of what is there and replacing it with the larger lists mentioned in my first post. In general, I'd like to draft (ideally with community input) some written guidelines as to the scope of these pages and how they might best be used in the context of the project. As it stands, there is a lot of content which IMO is of minimal value to the task of increasing our coverage of words from many languages (granted, that's my understanding of their main purpose). For instance, 'A list of the top 13 words in language X', '200 most common sentences', 'Most common letters in language Y', etc. seem to fall short of that goal, when there are now lists of up to 1M words available for the most common languages. I'm quite open to others' suggestions as to how to proceed, and there may be good arguments for their inclusion (here or elsewhere) which I can't yet see, but IMO this content should at least be moved to a separate appendix (perhaps one targeted at learners or other use cases) or else removed entirely. It's hard to argue we should be providing a general directory service in the era of the search engine.
Furthermore, I think an argument could be made to remove all of the numeric content from the lists, since this information is a) not in itself a criterion for inclusion (or exclusion, for that matter), b) somewhat meaningless outside the context of the specific choice of source material, and c) cluttering many of the pages. The same goes for content in tables.
So, clearly I have my views, but I would like to see if we can get some more general input, because there are bound to be different ways of seeing this. Is there anywhere else I could/should be asking these questions? Helrasincke (talk) 00:26, 8 February 2023 (UTC)
@Helrasincke This is the right place to ask these questions. I think the reason you're not getting a lot of input from other people is that most editors focus more on the mainspace; the subspaces and appendices get relatively neglected in comparison, except for some appendices that document language grammar or general editing guidelines for particular languages. Maybe if I specifically ping some long-time editors such as @-sche, DCDuring, Equinox, Chuck Entz, they will comment (or at least tell me to F off :) ...). Personally I think there are multiple purposes for frequency lists: one is definitely to aid in adding missing words (and that's how I've primarily used them), but another is to help learners identify words to focus on, and yet another is to make sure that when there are multiple synonyms in a given language for a given word, the most common one is the one showing up in translation tables. Also, there's IMO definitely value in having Wiktionary curate frequency lists (at least picking the most high-quality and useful ones) rather than just picking some at random: in my experience with various languages, some frequency lists are garbage because (a) they use weird non-representative corpora, (b) they do a bad job lemmatizing non-lemma forms, and/or (c) they include proper names and abbreviations without properly identifying them as such (so that e.g. you can filter them out). In general, just running a word-count algorithm on a bunch of randomly chosen text will yield poor results, and that's pretty much all that some frequency lists consist of. As for some of the more random lists currently listed among the frequency lists, IMO you should feel free to move them to an appendix or link them from a different page than the main frequency-list page; they shouldn't be deleted for now unless their content is wrong, since they might be useful e.g. to learners. Benwing2 (talk) 03:47, 8 February 2023 (UTC)
English word-frequency lists are an important input for "defining vocabulary" lists, which would help us make definitions more instantly accessible (i.e., the user doesn't need to go to another page to understand the definition) and less vacuous (e.g., dormitive principle). Longman published a list of some 2,000+ words in which its definitions were written. Any words used in a definition that were not in that list were highlighted. We can do them one better by using wikilinks for the highlighted words and not for the words in the defining vocabulary. DCDuring (talk) 15:20, 8 February 2023 (UTC)
We have an excessive number of wikilinks in our definitions. IMHO, plant is one of those excessively wikilinked terms, with more than 5,000 links. DCDuring (talk) 15:25, 8 February 2023 (UTC)
I'd love to see Wiktionary include some high-quality frequency lists. As you've mentioned, what we currently have is pretty uneven, so if you're motivated to take this on, go ahead!
I think there are two separate and important questions here:
  1. Where do we get good frequency lists?
    • I've done some work building a Spanish frequency list for my own purposes, and I can say the best option is to find a good corpus and use it to build the frequency lists yourself. As Benwing2 mentioned above, many existing frequency lists have limitations that make them inappropriate for generating a lemma frequency list. For example, the OpenSubtitles "FrequencyWords" wordlists mentioned above convert everything to lowercase and include many names from the credits, so valid Spanish words like "lisa", "vera", and "vegas" appear much more often than they should. The original corpus does not suffer from the same problems but requires you to parse a bunch of XML files to get the original data. It's a good corpus for finding colloquial expressions but otherwise limited in the depth of vocabulary. I would avoid any source that doesn't include full sentences or, at least, case-sensitive 5-gram slices of the text, so that the same corpus can later be used for finding collocations or identifying multi-word lemmas.
  2. How do we incorporate good frequency lists into Wiktionary?
    • I think Wiktionary:Frequency lists (or wherever we decide to put it) should contain only a list of curated frequency lists that meet whatever requirements we set for #1, perhaps with an explanation of the corpus and how the list was generated.
    • Ideally every mature language would include both a "word frequency" and a "lemma frequency" list. The former is useful for finding new words to add to Wiktionary, and the latter is helpful for learners and also for categorizing "common" lemmas in each language.
JeffDoozan (talk) 17:04, 8 February 2023 (UTC)
Routledge publishes a series of "frequency dictionaries" for English and ten or so other languages. The English one has 5,000 terms (some hyphenated, none spelled open) and can have a single homograph under as many as 5 PoSes, constituting 5 items on its list. The English one was published in 2010. DCDuring (talk) 22:41, 8 February 2023 (UTC)
I notice we have frequency lists from a number of sources whose inclusion could be seen as problematic from the standpoint of w:WP:C, some of which appear particularly brazen, and here I do very specifically include the (currently 20 or so) above-mentioned Routledge sources, all of which have been uploaded by one user. Also of concern are the HSK & JLPT lists, and the Khmer.info and Sanskrit lists. Despite their clear desirability, I would prefer we didn't host such sources at all without explicit permission or a clear explanation from the initial uploader of why they should be excepted, in light of the clear risk. See more here. Are these grounds for speedy deletion? Helrasincke (talk) 23:10, 9 February 2023 (UTC)
The copyrighted lists are valuable for reference and as feed for development of superior lists. I don't think it is a good idea for them to be generally accessible on our site. DCDuring (talk) 01:36, 10 February 2023 (UTC)
@JeffDoozan My responses to your points below:
  1. Although I agree that lemmatised lists are satisfying and nice to have, IMO for our purposes the bigger lists with inflected forms are also quite useful: a) their licences are unambiguously compatible with this project, unlike many of the alternatives; b) they are more representative of words as they're likely to be encountered, and thus reflect the range of forms which will be used for lookup; and c) if we're doing our work, we'll end up including the lemmas as part of the process anyway, since the inflected entries invariably lead there. There is, however, obviously the issue of homographs, so source sentences are important. The Wortschatz Leipzig Corpora, which include frequency lists, and the OpenSubtitles results are OK on this front, but they do have some clear downsides. I agree the cleaning is a big sticking point, and the Wortschatz ones have had no cleaning done. I'm still trying to get an acceptable workflow for this, and my coding skills are not really up to the task. If you have any suggestions here, I'm all ears. Perhaps @Hermitd would like to weigh in here: how would you set up a refined cleaning process, in light of your experience generating the OpenSubtitles wordlists? Are there any intentions for a further iteration or provision of uncleaned lists for manual processing?
  2. I agree that Wiktionary:Frequency lists should be curated, but it is already quite long even after my clean-up, so the question becomes: how do we make that work once the list of languages grows even further? Wortschatz Leipzig provides lists for over 250 languages (although getting them clean is a different issue). I've got a draft of an alternative organisation here (with a worst-case scenario also here). It's just one idea for a direction we could take this, with a subpage indexing lists per language.
Any feedback is appreciated.
Helrasincke (talk) 00:13, 10 February 2023 (UTC)
(Replying only because I was pinged.) I haven't checked recently, but I know what most of our appendices and non-mainspace are like. (Usually: so ill-judged or outdated as to be almost deletable without a vote.) Whether it's worth having these freq lists at all I would question (who uses them? can we find anyone who uses them?) but certainly if we can "bot it" then let's do it. Equinox 06:28, 14 February 2023 (UTC)
@Equinox Well, I invite you to have a look at the changes I've made and judge for yourself if it's an improvement. My next step will be to go through the lists one by one and tidy the formatting to make sure wikilinking is set up properly, and to improve the overall readability & usability so that you don't feel compelled to speedy delete them :P IMO they are definitely worth having (if done right), as I think they are a great tool to prioritise efforts, especially for neglected languages and lesser-resourced language editions. In a project the size of English Wiktionary this is probably not so critical, as a lot of stuff just gets done through the sheer diversity of editor interests & skills. For example, now that I've tidied them up, I'll be importing them to the Danish and German Wiktionaries to help boost the coverage of high-priority foreign-language entries for my contribution languages. It's a long-term goal, but I think one of the reasons Danish Wiktionary is so quiet is that it's so empty, even in comparison to the Norwegian edition, which has a similar-sized maximum theoretical user base (~5M speakers). So hopefully if we can improve the utility we can attract more users, and eventually more contributors. Helrasincke (talk) 07:31, 5 March 2023 (UTC)

Global ban for PlanespotterA320/RespectCE

Per the Global bans policy, I'm informing the project of this request for comment: m:Requests for comment/Global ban for PlanespotterA320 (2) about banning a member from your community. Thank you.--Lemonaka (talk) 21:40, 6 February 2023 (UTC)

@Lemonaka This user has only contributed 3 edits to this wiki, which were in 2017 and entirely non-problematic, so I don't have any particular views on this and I doubt anyone else does either. Benwing2 (talk) 21:03, 7 February 2023 (UTC)
Gonna miss you Planey. Equinox 06:29, 14 February 2023 (UTC)
It may be topical for people who edit in (Crimean) Tatar or Uyghur, in view of the user's politically motivated editing. ←₰-→ Lingo Bingo Dingo (talk) 17:41, 16 February 2023 (UTC)

Minor change in the treatment of Cantonese

Currently the language treatment page states that Wiktionary's Cantonese is based on the Guangzhou dialect. However, the prestige dialect has somewhat shifted to Hong Kong in the past few decades due to the influence of Hong Kong's media industry and the decline of Cantonese in Guangzhou. The romanisation system used on Wiktionary, jyutping, was created by the Linguistic Society of Hong Kong; it is based on the phonology of HK, which has merged the high-level and high-falling tones, while GZ still distinguishes them AFAIK. Also, the majority of the contributors and contributions on Cantonese come from HK, and almost all our knowledge of GZ Cantonese itself is based solely on dictionaries.

I therefore suggest that we instead treat Cantonese based on the shared parts of GZ Cantonese and conservative (or laan5-jam1-less, prescribed) HK Cantonese, or simply on conservative HK Cantonese alone. These two dialects are virtually identical in terms of pronunciation (except for the high-level/high-falling distinction mentioned above, which Wiktionary already disregards by using jyutping), while the vocabulary (the terms labelled as Cantonese) is shared, with several subtle differences in preference as to which terms are used. Independent innovations in vocabulary in GZ or HK would not be labelled as Cantonese itself, but with the relevant dialect instead.

I believe that this change better reflects the reality of the language and eases the complications of needing to crosscheck multiple sources. It also eliminates the need to deal with the high-level/high-falling tones, as we would (or could) assume they are merged in Standard Cantonese. This should have minimal effect on the content of the entries, as this is already how some of us subconsciously operate; for example, GZ-specific words are often already labelled as Guangzhou Cantonese.

Pinging @Justinrleung, RcAlex36, Fish bowl, Mahogany115, and perhaps other Cantonese editors that I've missed. – Wpi31 (talk) 17:25, 7 February 2023 (UTC)

@Wpi31: I think I agree with what you propose, but I'm not exactly sure what needs to be changed other than what the language treatment page says to reflect this particularity. — justin(r)leung (t...) | c=› } 20:26, 7 February 2023 (UTC)
As far as I'm aware, the only other thing that needs to be changed is the label under the Cantonese section in {{zh-pron}}. – Wpi31 (talk) 02:28, 8 February 2023 (UTC)
@Wpi31: I've updated WT:AZH and {{zh-pron}} to reflect this. — justin(r)leung (t...) | c=› } 22:00, 10 March 2023 (UTC)

Participles

My recollection from when I started serious editing on Wiktionary was that 'participle' was not an approved part of speech. When it became accepted, was there any guidance on how editing communities should decide how to handle participles? The issue has arisen with regard to Pali, for I had created a verb form headword template {{pi-verb form}}, only to discover that @Svartava had already created an undocumented {{pi-vf}} with the same nominal function. On investigating its usage, I find that it was used very differently: it has been used 6 times, each time for a past participle (to be precise, for the unmarked past participle, which can be active or passive in meaning). My practice has been to treat the past participle as a derived lemma worthy of mention in the conjugation table, for past participles often have meanings not automatically derived from the verb. Svartava's approach has led to two homonymous terms, a non-lemma for the participle and a lemma for the more derived meanings. By some recommendations, this would lead to two identical 1648-cell declension tables. Unless dissuaded, I will replace {{pi-vf}} by {{pi-adj}}, change the terms' headings to 'Adjective', merge them with the cognate homonymous adjective where appropriate, and have {{pi-vf}} deleted or reduced to a hard redirect. --RichardW57m (talk) 13:14, 8 February 2023 (UTC)

Of some relevance, yet others, notably @ВМНС, but also @aryamanA, have been creating Pali terms categorised as participles. --RichardW57m (talk) 13:14, 8 February 2023 (UTC)

Can you point to the particular conversation establishing that participles were not approved? Vininn126 (talk) 13:18, 8 February 2023 (UTC)
They were clearly non-standard in this old version of an appendix to WT:EL. --RichardW57m (talk) 13:34, 8 February 2023 (UTC)
@RichardW57m Participles are listed as a nonlemma part of speech in Module:headword/data. It is widely used in Latin, for instance. I also don’t see any reason why these wouldn’t be worth recording as a nonlemma - particularly when you say you’ve only been adding adjectival entries where the meaning has evolved (suggesting this won’t happen for every participle). Given the tables are collapsible, I see no problem in having it twice anyway. Space is cheap. Theknightwho (talk) 15:25, 9 February 2023 (UTC)
What I've been doing with Pali participles is to record them as derived lemmas when I come across attestation of their form or a perceived need to record them, and add in meanings that don't automatically derive from their being participles, while observing copyright. Each participle is a single term, and I record it as an adjective. What @Svartava was doing was recording participles under the PoS word 'verb' and their derived meanings as separate terms under 'adjective'. Others have been adding participles under the PoS word 'participle'; I've not seen them handling additional meanings. I'm not sure what would be a good forum for thrashing this out.
There seem to have been quite a few arguments over whether some English words are adjectives as well as present participles, and we have the benefit of native speakers. I don't believe we have native speakers of Pali, and certainly not native speakers of the Pali of the Canon. It's so much simpler if we can collapse them to multiple senses of a single term!
Space is not cheap - it imposes a burden on users. Note the complaints this month about the size of the record of the sources for quotations. Collapsibility partially alleviates the burden, but selective expansion is also a pain. It is fortunate that the declension of participles, compared with one another, is overwhelmingly regular; otherwise having two tables would mean having to document (and maintain) a set of irregularities twice for each script, as well as ultimately for each writing system. I think there may be a whole bunch of misbehaviour still to document for the oblique cases of the feminine plural of the present participle, bedevilled by a small data set. RichardW57m (talk) 17:35, 9 February 2023 (UTC)
So are you saying that participles are just a form of adjective? That seems controversial. Theknightwho (talk) 17:51, 9 February 2023 (UTC)
In Slavic languages they are adjectival or adverbial. Vininn126 (talk) 17:57, 9 February 2023 (UTC)
It seems to work for Pali. They inherit the ability to have non-subject factors of the action, but other adjectives also have this ability. The absolutive (= gerund = independent participle = converb) is most simply treated as a verb form, though it has no marking for person or number. There is merit in categorising participles as participles, though one will have to remember not to use the categorising participle templates when defining case forms, such as "# {{inflection of|pi|pacant||loc|s}}, ''which is'' {{inflection of|pi|pacati||pres|part|t=to cook}}" for pacante, which is a case form of a participle rather than a participle. --RichardW57 (talk) 20:52, 9 February 2023 (UTC)
IMO participles should be placed under a Participle header and use {{head|LANG|participle}}. I've corrected all the places I could find in Spanish, Portuguese and Italian where they were placed under a Verb header. In general, some but not all participles have also evolved into adjectives, and where this has happened, you should use a second L3 heading ===Adjective=== after the participle heading (or an L4 heading if there happen to be two distinct etymologies). If the participle and adjective have exactly the same declension, I suppose it's possible to put that declension under a separate L3 heading ===Declension=== rather than under two L4 headings, but my normal practice is to use two L4 headings. The duplication doesn't seem a big deal to me in most cases. Benwing2 (talk) 23:52, 9 February 2023 (UTC)
Inform me. What is the test for distinguishing a Pali adjective from a participle? Is there anyone here capable of applying the test? --RichardW57 (talk) 08:13, 10 February 2023 (UTC)
The true horror case for duplication is santa (true) - just look at the number of overrides in the page's source code for the declension! I have found an isolated textbook which claims a monosyllabic nominative singular saṃ exists - I think this ought to be verified before inclusion. It would definitely tilt the stem from santa to sant. So far I have only needed to transliterate it for Devanagari (potentially automatable) and alphabetic Lao (four varieties jammed into one table). Its inclusion will now require maintenance on 6 (and rising) tables instead of 3 (and rising) as at present. Each script's entry will of course become even more visibly confusing, with three duplicates for the utterly homonymous past participle and adjective santa (exhausted). --RichardW57 (talk) 08:13, 10 February 2023 (UTC)
Remind me. Would the nice translation of Pali santa (tranquil) go to RfD or RfV after I split the corresponding term into (past) participle and adjective? The question will be whether the adjective as opposed to past participle exists. --RichardW57 (talk) 08:13, 10 February 2023 (UTC)
Non-English words are supposed to have translations, rather than definitions. (It does seem that a lot of people, quite reasonably, ignore that rule.) Is the adjective v. participle decision to be based on the language in question or on the translations? This question is also relevant to the verb v. adjective distinction in some languages; I've seen edits where someone has objected to translating a 'verb' by an English adjective. --RichardW57 (talk) 08:18, 10 February 2023 (UTC)

OK, you are getting snarky here. No need for that. As for Pali, I don't know anything about it, but I'm sure you are aware of the tests for distinguishing adjectives from participles in general; why can't they apply to Pali? And I have no idea why Pali santa (tranquil) would be sent to RfD or RfV; has this happened before? Benwing2 (talk) 00:37, 11 February 2023 (UTC)

@Benwing2 As it happens, I'm not aware of a language-independent test to distinguish participles from other adjectives. I couldn't tell you why English lovable is not a participle, and I can only guess why the Latin gerundive is a participle but the semantically corresponding Ancient Greek verbal adjectives such as Ancient Greek λῠτέος (lutéos) are adjectives. --RichardW57 (talk) 05:00, 11 February 2023 (UTC)[reply]
The Pali lemma santa (tranquil) currently contains the senses 'tranquil' and past participle of sammati (to be calmed). A mechanical partitioning of the senses of Pali participles will result in the 'tranquil' sense being tagged as an adjective lemma, distinct from the lemma for the participle. That separation may be challenged. --RichardW57 (talk) 05:00, 11 February 2023 (UTC)
I've converted participles with L3=verb, headword={{pi-verb form}} to L3=verb, headword={{pi-vf}}, making the rough breakdown of Roman-script participles:
  • 12 with L3=verb, headword={{pi-vf}}
  • 3 with L3=...participle
  • 30 present active participles with L3=adjective
  • 5 with form_of=participle (excludes {{inflection_of}}, which has no visible effect for Pali)
That makes L3=adjective the majority approach. I'm the only member of the Pali editor community to have spoken here. @Benwing2's suggestion to distinguish participles and adjective has no practical advice on when to separate adjective and participle, and will therefore also be ignored for that reason. I will standardise Pali participles to have L3='Adjective', and will move to using cat:Pali form-of templates (sorry, @Theknightwho) to categorise them as participles, except that the gerundive (aka future passive participle) remains TBD. Notifying @Octahedron80, Apisite. --RichardW57m (talk) 13:57, 13 February 2023 (UTC)
@RichardW57m "suggestion to distinguish participles and adjective has no practical advice on when to separate adjective and participle, and will therefore also be ignored for that reason" is obtuse and you know it. There are lots of ways to distinguish participles from adjectives: (1) adjectives have unpredictable meanings that are not transparently derivable from the verb; (2) adjectives lack the verbal meaning inherent in participles; (3) adjectives can (often) form the comparative and superlative, while participles cannot; (4) adjectives can (often) be modified by adverbs such as 'very', 'somewhat', etc. while participles cannot; etc. Use your judgment, obviously. Benwing2 (talk) 18:43, 13 February 2023 (UTC)
@RichardW57 I'm going to have to agree with Benwing here, and I'm not really sure where all this confusion is stemming from. Participles usually have a slightly different function than adjectives, even if they sometimes behave like them. E.g. in English they are used periphrastically to create the continuous constructions. They also rarely take degrees of comparison unless they are fully adjectivized, e.g. "I saw the more reading boy." (?) There are clear differences between the two, even if they are similar. Vininn126 (talk) 18:58, 13 February 2023 (UTC)
"E.g. in English" isn't much use for other languages. "'Rarely' take degrees of comparison" means that the test cannot be relied upon. Now, I could only find one comparative and one superlative built on a Pali present participle, santatara and sattama, and that is consistent with the sense "good" of santa being for a non-participial adjective. For the past participle, I could find two comparatives in the PTS, built on kanta and paṇīta, and also one on duggata (wretched), which is rather a compound of a past-participle. That is not a very powerful test.
As to the periphrasis test, what do we make of the example sentence for svākkhāta (well-preached), where the adjective appears to be being used to form a past 'passive' sentence, rather as in Latin? Have I mistranslated it, or is it evidence for an unattested verb svākkhāti (to expound well)? (The simplex, akkhāti (to preach) does exist.) RichardW57m (talk) 14:51, 14 February 2023 (UTC)
Not returning a 100% rate does not a weak test make, and disregarding information like that means you're still disregarding insight. Vininn126 (talk) 15:51, 14 February 2023 (UTC)
@Benwing2: So what is going on with Latin valēns? The PoS is given as 'Participle', but it has a comparative and superlative (though no attestation is accessible from the entry). There is no back-link from this 'participle' to the verb which gives it as a participle, namely valeō (to be strong). I think the answer is that comparative and superlative are available for Latin verbs with a stative sense, whereas there are very few English stative verbs that semantically allow for degrees of applicability.
I asked how to distinguish participle and mere adjectives at a Pali forum, and was told that in Thomas Oberlies' new book, what are generally called participles are simply described as 'verbal adjectives'! (See my remarks on lovable above.) I was also asked for a sentence where not knowing would cause confusion. My correspondent told me that the distinction really applies to the English translation rather than the Pali words themselves.
An example that comes to mind is Pali mata (dead) from marati (to die). If we redefine the verb as 'to become dead', then the discrepancy between the verb and the adjective becomes a matter of
  1. whether one can be dead without ever becoming dead; and
  2. whether being dead is permanent.
As 'having died' generally implies 'dead', the verbiness of the sense depends on the translation, which is not right.
@Vininn126: So what is the purpose of the distinction? Is it to aid parsing? Is it a filter for use when constructing new sentences in the language? Some adjectives (e.g. arguably English near) govern noun phrases without inheriting this from non-existent parent verbs, so parsing feels like a weak argument to me.
I now have a related issue. In the example sentence at Pali ඡින්‍දති (chindati), there is a long list of present participles each translated as 'he who...'. I have treated these as participles used absolutely. Should I separate these usages out as agent nouns? Normally, depending on the language, we allow adjectives used as the cores of noun phrases to be recorded as adjectives. In English, this makes very good sense, because the adjective does not behave exactly like a noun. --RichardW57m (talk) 12:42, 15 June 2023 (UTC)
@RichardW57 Latin valēns is pretty clearly an adjective as well as a participle, just mislabeled in Wiktionary. You're supposed to use your judgment and intuition; you seem to refuse to do so. Benwing2 (talk) 18:24, 15 June 2023 (UTC)
Its meaning is exactly what one would expect given the meaning ascribed to the verb. The only surprise is that it has degrees of comparison; however, merely given its meaning, their existence is unsurprising.
My judgement is that trying to separate adjectives and participles for Pali is not worth the effort. --RichardW57 (talk) 19:01, 15 June 2023 (UTC)

Page headings

I notice Wikipedia has introduced page headings that stay at the top when the page is scrolled down, for example Britannia Bridge. I think it's a great idea, and wonder if it can be introduced in Wiktionary. I think it would be helpful for longer pages. DonnanZ (talk) 12:10, 9 February 2023 (UTC)

@Donnanz But it will be a nuisance for a single screen displaying multiple windows. It may be impossible to slide the window so that the headings disappear off the top of the screen. --RichardW57m (talk) 14:05, 9 February 2023 (UTC)
@RichardW57m: I don't use that system, but I still have multiple windows open, two for Wiktionary, displaying one at a time. I'm interested in other users' views too. DonnanZ (talk) 14:14, 9 February 2023 (UTC)
You can check the new skin with the URL https://en.wiktionary.org/wiki/foo?useskin=vector-2022 or you can change your preferences/appearance to the Vector 2022 skin. Anyway, the new skin will soon be the default one. It's worth checking it to find any problems in advance. Vriullop (talk) 07:03, 10 February 2023 (UTC)
@Vriullop: Oh right! It works on my widescreen monitor quite well when I scroll down. The main criticism I have is with the sidebar, which I feel is now too wide, and this applies to Wikipedia too (I notice it doesn't appear on my user page there). You now have to scroll down to find the table of contents (languages etc.) underneath everything else. I welcome its move from the very top, but I feel it should be at the top of the side bar. DonnanZ (talk) 09:35, 10 February 2023 (UTC)
@Donnanz You can hide the sidebar with the << icon, or you can hide the TOC, and then it is available in the page heading even when scrolling down. There are some improvements on Wikipedia, not yet available here, splitting the sidebar with page tools in a new bar on the right. This affects some gadgets that will need to be updated. Vriullop (talk) 11:01, 10 February 2023 (UTC)
@Vriullop: OK, we can wait to see the new skin "in the flesh". DonnanZ (talk) 11:58, 11 February 2023 (UTC)
Is this when it is the topmost vertically stacked window, for which one can afford only a small vertical scan? Not everyone with a separate monitor has a widescreen. I suppose one may have to start picking and choosing skins. --RichardW57 (talk) 13:18, 11 February 2023 (UTC)
@RichardW57: I had to buy a new monitor last year after my old monitor with a narrower screen conked out. On top of that, the hard drive wore out, and I had a new solid-state drive installed by my local computer shop. An expensive year, but it was worth it. DonnanZ (talk) 15:32, 11 February 2023 (UTC)

Lemmas that are not words

Where do we mention that a lemma is not a word when it is listed under a part of speech that is normally associated with a word? For example, I believe it is unnecessary to mention it for lemmas categorised as prefixes. I can think of three examples from Pali alone:

  1. Adjectives in -nt. Perhaps it is obvious from none of the inflected forms ending thus. In general, we also have all or almost all nouns ending in a consonant other than niggahita.
  2. orimo, an alternative citation form of orima, an adjective that only occurs in the neuter gender.
  3. varati Etymology 1, which lacks a present tense and I strongly suspect also a present active participle.

Thoughts? --RichardW57m (talk) 13:35, 9 February 2023 (UTC)

Disallowing mass closures

Today, @Ioaxxere closed a very large number of nominations at WT:RFVE. This is an example of one of several large edits they made, closing many threads at once. The great majority were fails. I also don't see any evidence that they had actually attempted to look for citations themselves (though I may be wrong), but they did suggest that they thought the deadline for citations was a month (which is not the case; as far as I know, that's just the minimum).

While it's obviously a problem that we tend to end up with a large backlog at RFV and RFD, I don't think that means we should just start closing things en masse, as it seems very unlikely that each term would have been given proper consideration. Plus, if the closer isn't even attempting to cite the term themselves, then this just amounts to a fail (without warning) after an arbitrary period of time.

I propose that we don't allow these kinds of mass closures, as I think there are less unilateral ways to clear the backlog, which don't rely on a single person's understanding of what the consensus or relevant policy is (e.g. posting about it on the Beer Parlour). Before his ban, Dan Polansky also had a habit of doing mass closures at RFD, and the issue was the same: it gave one person far too much influence. Theknightwho (talk) 23:30, 9 February 2023 (UTC)

I'd also like to point out the issue of starting "CFI mandated" votes for their specific entries or entries that are found on Twitter or Reddit, even though we had a vote that showed the consensus about those sites and how we did not want to allow en masse words with no checks. It seems even more so now that this user is almost circumventing the WT:DEROGATORY policy that was voted in. They even started a "CFI-mandated vote" for entries that didn't even have cites, such as y'all'd'nt've, yet closed other entries as RFV-failed that didn't have any other cites either. I'm acutely aware of the problem of RFV backlogs and was in strong support of splitting WT:RFVNE out, but this is not the solution at all. AG202 (talk) 23:34, 9 February 2023 (UTC)
Also:
  • Deleting citations without saving them on the citations subpage.
  • Failing entries with 2 durably archived cites, seemingly without looking for more (which, while not against policy, is low effort and likely to mean we delete entries which are citable).
Theknightwho (talk) 23:39, 9 February 2023 (UTC)
If the nominations were closed improperly, I'd be all in favor of undoing them. User:Ioaxxere should really undo them themselves, or if they don't want to, give a very good reason for this. Benwing2 (talk) 23:46, 9 February 2023 (UTC)
I pointed this out yesterday, and largely agree with what Theknightwho and AG202 have said. The idea of helping reduce our badly backlogged request pages is wonderful, but I think care and research is needed, if only to confirm that the terms really are uncitable/hard to find. Mass-failing entries is not the best solution. I’d rather have a large backlog than indiscriminate closures, honestly, although I see why others might evaluate the tradeoff differently. 70.172.194.25 23:48, 9 February 2023 (UTC)

This proposal is problematic for a couple of reasons:

  1. You never gave a definition for "mass closures". 10 a day? Then I'll just be able to do 9 a day.
  2. What is "proper consideration"? Scouring the entire Internet? If a brief Google Books search is acceptable, then it's hard to believe that doing that would find quotations that the nominator missed.
  3. RFV fails are not "without warning": there's a big RFV flag over the entry that I guess people have been trained to ignore at this point.
  4. I'm not trying to gain "influence" over RFVE, and I would really like it if people closed RFVs more often.

As for the points raised about moving citations: I'm open to doing that, but that was never a requirement.

In my view, the solution to the backlog in RFVE is to enforce a one-month deadline and therefore create a sense of urgency to cite entries. Ioaxxere (talk) 23:48, 9 February 2023 (UTC)

  1. If you just look for ways to subvert the policy by maximising the number of allowed closures, then you'll probably just get told not to close anything. Closing a thread needs to be done when either (1) there is clear consensus, (2) the term has been cited, or (3) it seems unlikely that (further) citations are forthcoming anytime soon. That entails putting in a reasonable amount of effort for each one.
  2. "Proper consideration" means that you've looked in all the usual places, and still can't find anything.
  3. They were without warning. It doesn't matter that there's a big warning on the entry: I have already explained to you that the issue is that you failed them arbitrarily, without doing anything to give a sense of urgency (which might encourage people to find cites).
  4. I never said you were trying to gain influence. I'm saying you were exercising too much influence. It doesn't need to be intentional to be a problem; I'd have an issue if anyone did it, however well-meaning.
You seem to be coming at this from the perspective of what is technically allowed, but the overriding concern should be what is best for the project. These are not necessarily the same thing. Theknightwho (talk) 23:56, 9 February 2023 (UTC)
I feel that a well-functioning and strict RFVE process is what's best for the project, but it seems like a lot of people want more effort to be put into each closure. To address your points, would you prefer a message like "I haven't found any quotations on [list of places I've looked]. If three quotations aren't added by [date in a few days], I'll mark this as RFV Failed."? Ioaxxere (talk) 00:05, 10 February 2023 (UTC)
That's fine. Could you please undo your closures from today and yesterday? I think we need to start over with them. Theknightwho (talk) 00:07, 10 February 2023 (UTC)
I think I'll take a break from closing RFVs until we reach consensus on a new policy... (edit: I assume there's no controversy on closing passes) Ioaxxere (talk) 00:27, 10 February 2023 (UTC)
I've rolled out this new approach on some newer RFVs: [3] Ioaxxere (talk) 02:18, 10 February 2023 (UTC)
I like the "new approach" much better. Thanks! 70.172.194.25 22:54, 16 February 2023 (UTC)
1. I find it ironic that a new order of obstructionists has emerged alongside the old order of cranky deletionists. The basic argument seems to be "we can't cite Twitter or Reddit because people say terrible things there." Yes, the toxicity of social media was a problem before, and that shadow has been Elon-gating in the current climate. But Wiktionary's mission is to document "all words in all languages," which by definition includes hateful, offensive, and stupid terms. There are only so many tools for documenting the bleeding edge and hidden underbelly of a language, and in 2023 no one is posting on Usenet or mimeographing zines in their kitchen. I draw the line at platforming hate sites. That's a hill I've died on before, and would die on again. But Twitter, Reddit, and the like do not deserve to be treated the same as The Daily Stormer. That's another hill on which I'm willing to die as far as wiki-participation goes.
2. CFI has never explicitly disallowed online sources. It was commonly interpreted that way due to unclear wording: "Where possible, it is better to cite sources that are likely to remain easily accessible over time, so that someone referring to Wiktionary years from now is likely to be able to find the original source. As Wiktionary is an online dictionary, this naturally favors media such as Usenet groups, which are durably archived by Google." Over time, "no online sources except Usenet" became de facto policy, as it aligned with many users' personal sensibilities. Similarly, the updated CFI text does not mandate a two-week-long discussion and vote for every RfV nomination involving online sources. It says "Other online-only sources may also contribute towards attestation requirements if editors come to a consensus through a discussion lasting at least two weeks," without specifying what form this discussion should take, what the subject of discussion should be, or how consensus is to be reached. I understood it as codifying the consideration of websites on a case-by-case basis, creating a mechanism to approve useful sites like Reddit, Twitter, and news sites while shutting out fringe and extreme sites. Others have seemingly decided to interpret the vague wording in ways more favourable to their interests. And so we're right back to square one, where critical parts of CFI are unwritten rather than actual codified and explicit policy.
3. The RfV closing procedure has never required that closers attempt to attest nominations themselves. This would be an unreasonable expectation even if there wasn't a substantial backlog. WordyAndNerdy (talk) 01:09, 10 February 2023 (UTC)
I don't think any policy prohibiting "mass closures" would be wise. Almost every major contributor to RFV, myself included, has at times closed a high volume of requests (because, as noted, the pages have quite a backlog). It's hard to see how an entry failing RFV after the allotted time is "without warning"; the terms are listed on a central page everyone can watchlist and have big banners in their entries warning everyone that cites are needed. If we want to make posting "warning: this term is two weeks away from failing" a requirement, we could have a bot do that, but it's hard to see how that'd represent an improvement over people just knowing how time works: terms are listed chronologically on WT:RFV and time progresses in a linear fashion (outside the TARDIS), so terms at the top of the list will be deleted after the allotted time. - -sche (discuss) 01:32, 10 February 2023 (UTC)
While I agree on the need to handle the backlog, and that closers should not be forced to attempt to attest nominations, I think mass closures are generally not a good idea—I agree with TKW that a single person shouldn't be seen to overly dominate the process, and I'd add that borderline cases really demand more attentive treatment, and ideally I do think closers should make at least a cursory check to see whether the issue with a term is actually poor attestability as opposed to just lack of interest. -sche's point about expecting terms at the top of the list to go is well taken, but then the natural expectation is really that with a large backlog terms further down aren't going to be on the chopping block just yet when there's uncertainty surrounding them. (As one example, I was meaning to get around to xheart/xliver myself at some point when I have more time if nobody else did it, though I accept it's long past one month, and the closing remark mentions that the discussion at RFV had had no obvious result yet...) —Al-Muqanna المقنع (talk) 02:37, 10 February 2023 (UTC)
I think the problem lies not in the mass nature of the closures, but rather in their prematurity. If there had to be some rule, I would prefer disallowing closing an RFV that only one person has participated in, though I don't think such a rule is really that much of a necessity. – Wpi31 (talk) 05:08, 10 February 2023 (UTC)
I would be happy to give a once-over to an RfV that had been open without comment for 30 days to help satisfy any rule requiring more than one contributor be involved before removal can proceed. DCDuring (talk) 16:02, 10 February 2023 (UTC)
I also agree that the issue is the lack of participation rather than the 'massness' of the closure. For example, on Talk:Cel-Liberation Day there is no indication that anyone other than myself ever even tried to find citations for the term. I have to admit that the policy was not violated, but I'd rather have seen at least a tiny bit of confirmation that it seemed hard to cite, as this is a term that is mentioned in various places online. 70.172.194.25 22:54, 16 February 2023 (UTC)
I don't find mass closures to be more of a problem than having an overwhelming number of entries that attract RfVs. As long as citations (whether durably archived or not) are saved to the appropriate citations page, and any discussion (or lack thereof) is saved to the talk page, little effort is wasted. After all, any admin can restore the old entry and any contributor with sufficient new evidence can make a new entry. My own practice is to work only on RfVed entries that strike me as worth taking the time away from other contributions to Wiktionary. Over time, fewer and fewer of the RfVed items have seemed worth it to me. I suspect that others operate in the same way. DCDuring (talk) 15:59, 10 February 2023 (UTC)
I wouldn't disallow mass closures and, in fact, I think it's good that Ioaxxere has reduced the backlog. The way I see it, if something gets tagged as closed and the header gets struck out at RFV/RFD and people are given a week to object before it gets archived and the entry is deleted then they have ample time to object and the closure can be reversed. The only issue is if things get archived before the week is up. --Overlordnat1 (talk) 16:06, 10 February 2023 (UTC)
We shouldn't be reducing the backlog by rushing things. Theknightwho (talk) 16:13, 10 February 2023 (UTC)
Are you saying that we should lengthen the required minimum dwell time for an RfV to more than 30 days? If so, how much more? DCDuring (talk) 16:48, 10 February 2023 (UTC)

Passing terms with no or insufficient quotations

Examples: Mozella (there’s only one quote in this spelling on Mozela), praecognita (Al-Muqanna noted all of the easily findable hits on Google Books seem to be Latin code-switching; the OED reference is paywalled so I can’t see if the same applies there). I don’t support this practice, at least when finding uses is non-trivial, which applies to most terms sent to RfV. 70.172.194.25 19:09, 20 February 2023 (UTC)

Added permalinks for the entries above and below, to show how they looked at the time of passing. 70.172.194.25 21:28, 24 February 2023 (UTC)
@Ioaxxere, I thought that you were going to take a break from closing RFVs until a consensus was reached. There are terms that are being prematurely closed and archived, and it's much harder to reverse it once they're archived. There are way too many entries to check before they're archived as well. AG202 (talk) 19:28, 20 February 2023 (UTC)
Mozella seems to be easily attested on Google Books. I'd still prefer having three citations (I hadn’t checked prior to writing the above). antijapanese is a better example of the phenomenon. 70.172.194.25 20:09, 20 February 2023 (UTC)
I was under the impression that alternative forms are counted together with the main entry, so two quotations for color and two quotations for colour are sufficient. This is how every other dictionary (including far less inclusionist ones) counts alternative forms.
As for praecognita: it seems pretty silly to make a distinction between a quotation and a link to a quotation. If the issue is the paywall, would you prefer a link to the free OED2 [4]? (although there are fewer quotations than in OED3 [5]) Ioaxxere (talk) 20:11, 20 February 2023 (UTC)
When alternative forms are sent specifically to RFV, they must have 3 cites on their own. This is standard practice. Same with having cites specifically on their entry per WT:ATTEST. AG202 (talk) 20:22, 20 February 2023 (UTC)
This was definitely my understanding as well. 70.172.194.25 20:27, 20 February 2023 (UTC)
Is it the same thing for inflected terms then? Ioaxxere (talk) 21:11, 20 February 2023 (UTC)
Use common sense. Lots of words aren't attested in all of their possible forms, so we don't want to fail the first person singular of a completely regular verb just because it's only attested in the third person. On the other hand, if someone creates an entry for an archaic second-person singular form of a rare modern technical word used only in medical journals, we want to be able to challenge that.
If it's the word as a whole, and not just a given inflected form, we don't want to fail it because it doesn't have the complete paradigm, or even the principal parts, attested. If the lemma form is unattested and can't be conclusively derived from the attested forms, it can get tricky, but that will probably never happen with modern English. Chuck Entz (talk) 21:56, 20 February 2023 (UTC)
@Chuck Entz The issue here is that our common senses don't agree... my "common sense" is that if alternative form quotations get to count towards the lemma, then the lemma's quotations should count towards the alternative form (assuming that there is at least one quotation for each specific form), and that's the point of disagreement. Ioaxxere (talk) 22:34, 20 February 2023 (UTC)
It’s commonly accepted that inflected form quotations count for the lemma. The question is whether quotations for one spelling (whether lemma or inflected) count toward another spelling.
I think what you’re proposing would basically allow the creation of any alt spelling with at least one citation, and you even say as much, which definitely hasn’t been our standard practice as I’ve observed it. For precedents, see Talk:beat'emest, Talk:canican, Talk:gamahauch, and many others. 70.172.194.25 00:53, 21 February 2023 (UTC)
I agree that including every alternative spelling ever would be excessive (although the OED often actually does that). On the other hand, having a strict "three cites for the exact spelling" leads to some (in my opinion) absurd results. Do you think covid cut with five quotations should be deleted just because people are capitalizing it inconsistently?
By the way: I was trying to find a policy or a vote for this "inflected form quotations count for the lemma" rule. WT:CFI has nothing. Ioaxxere (talk) 04:40, 21 February 2023 (UTC)
While I don’t think that covid cut should be deleted (and I feel that that question to 70 was too much of a leading question considering it’d have to go to RFV first anyways), I do doubt that it should be the main lemma looking at the entry considering that it only has one cite for that spelling. AG202 (talk) 04:44, 21 February 2023 (UTC)
Er, I guess you might be disappointed by the fact that I already passed this entry in RFV a few days ago. But since you already think it should stay, I would like to ask why this doesn't conflict with your rule of "they must have 3 cites on their own" given that no particular capitalization reaches 3 cites.
And @AG202 I do try to follow policy as much as I can, but it feels like there is so much left unsaid, that unless I create a vote every week I have to rely on (subjective) "common sense" which has clearly not made everyone happy. If you genuinely think that everything is clear-cut then please let me ask for advice on a few difficult RFVs. Ioaxxere (talk) 04:51, 21 February 2023 (UTC)
@Ioaxxere I don't think everything is clear-cut, but this is where practice, expertise, and overall more time with the project comes in. I get wanting to move things faster and stuff like that, I was the same, but after getting corrected by folks like @BD2412, I realized that I should take a step back and watch more than act, until I had a good grasp as to what the policies and practices are, especially with very wide-reaching issues like RFVE. I got more used to pinging specific people who I knew were more experienced with the issue at hand who've been working on the project for much longer, rather than just proceeding forward based on my own sole interpretation. That's what I'd really recommend, honestly. Part of the problem, though, is that there just aren't that many admins anymore and the ones that are left are fairly new, which can lead to the disconnect that we've seen very recently (and honestly part of why culture here is important to me, as clearly something is going on to where people are leaving and not joining as much, leading to areas that have been left untouched like RFVE), and it's something that needs to be discussed more.
As for covid cut, this is one of those things where honestly I didn't even know that it was at RFV nor that it had passed. If I had seen the discussion, I would've pushed at the very least for the lemma to be moved and for more cites to be added, but I'm only one person and I simply can't proofread/check every RFV, which is part of why mass closures can be very ehh. AG202 (talk) 05:27, 21 February 2023 (UTC)
Well, after I finish looking at the November-December RFVs (about 80) I'll be going a lot slower, no more mass closures. Ioaxxere (talk) 06:35, 21 February 2023 (UTC)

@Ioaxxere, AG202, Chuck Entz: Another example where I would object is Falklands fritillary. I think it is citable, but the citations provided on the page aren't good enough for reasons described at Wiktionary:Requests_for_verification/English#Falklands_Fritillary_Butterfly. I understand that this was a case of "Cited" and not "RFV-passed", but I still find it to be problematic. With rare exceptions, citations should include the term in question as a grammatically separable unit, not just as a sequence of words that doesn't parse as one unit. Especially when the issue raised by the user who sent a term to RfV was specifically whether the term exists as a separable unit! 70.172.194.25 03:12, 23 February 2023 (UTC)

Changed link to a permalink, showing the state of the entry when it was called "Cited", because they have since added better quotations. 70.172.194.25 04:33, 23 February 2023 (UTC)

"CFI votes" don't turn rfv into rfd

While I haven't read the vote itself or the discussion there, my impression is that the idea of the vote on exceptions to the "durably archived" principle was strictly about allowing or disallowing sources, not entries.

As I understand it from the discussions here and on the other fora before, during and after the vote, the idea was not to overturn the status quo, but to allow for exceptions where the consensus was that it made sense to do so. That would mean that websites in general are still disallowed, but that either specific web sites would be allowed always if the community approved, or that certain web sites could be used for certain rfvs if a consensus to do so was reached in a discussion lasting at least 2 weeks.

In other words, we might decide that site X is always worthy of being treated as if it were durably archived. Or we might say that it doesn't make sense for a term found all over the place online to fail just because it's never made its way into print or usenet- so we should allow site Y or site Z to count for this particular term.

There are definitely sites that are utterly useless for attestation purposes for any number of reasons, and should never be even considered. But there are also sites where we don't want to give them carte blanche because someone could use them to game our processes- but where it's obvious that nothing of the sort is going on, we can allow them with a clear conscience. Chuck Entz (talk) 07:14, 10 February 2023 (UTC)

At any rate, I think the "CFI-mandated discussion" for a term should consist of first deciding whether this is a real term that is only failing because of distortions caused by our choice of sources, and then deciding what sources can safely be allowed in this case in order to correct for those distortions. Chuck Entz (talk) 07:14, 10 February 2023 (UTC)

If we agree that a term has 'clearly widespread use', then we don't need to agree that any quotation is valid for CFI. As far as I am aware, quotations that are not resilient enough for CFI are allowed, though adding a dated HTML comment that it was disallowed for CFI would be useful. --RichardW57m (talk) 09:54, 10 February 2023 (UTC)
@Chuck Entz: if I understand correctly, your proposal is a series of votes: one, where people decide between Real or Fake, and subsequent votes for Accept/Reject X Quotations, Accept/Reject Y Quotations, etc.? If so, I don't see the utility of such a process over a simple Keep/Delete. Ioaxxere (talk) 23:36, 10 February 2023 (UTC)

Voting "delete" or "keep" doesn't make sense. RFV should be strictly about whether the term actually exists, and whether the tools we're using to determine whether it exists are right for the job. If they aren't, what should we be using? Chuck Entz (talk) 07:14, 10 February 2023 (UTC)

It especially doesn't make sense to vote "delete" based on whether there are enough cites or not. It totally distorts the discussion, because obviously more cites may come along, and it can always be failed anyway if not enough turn up. Theknightwho (talk) 07:17, 10 February 2023 (UTC)
@Theknightwho you have repeatedly demanded "CFI-mandated discussions" on terms which don't have three quotations. In your opinion, what should such a discussion be about? Ioaxxere (talk) 12:56, 10 February 2023 (UTC)
It's helpful to have single-word votes for a quick tally, though, and it's not clear what they ought to be if not delete/keep. Allow/forbid (the citations)? Support/oppose? —Al-Muqanna المقنع (talk) 08:59, 10 February 2023 (UTC)
It would make sense to have two-word votes for citations, which presumably should get their own paragraphs for clarity. 'Forbid' looks wrong for quotations - or are you proposing that quotations that don't count for CFI shall be turned into usage examples or deleted? Accept/reject (scilicet 'as evidence') would be better, though I think 'accept citation' would be much clearer. Not everyone who participates in an RfV discussion will be familiar with the procedure. --RichardW57m (talk) 10:08, 10 February 2023 (UTC)
@Chuck Entz I think you're overly paranoid about people who "game our processes". We're not that big of a deal, IMHO Emmett Lathrop Doc Brown (talk) 22:07, 10 February 2023 (UTC)

Deprecating {{zh-l}}

This follows on from my post on the Grease Pit about specifying alternative forms in a single link (e.g. color / colour), which can be used in any link template by using the delimiter //. The motivation behind this was to make it possible to deprecate {{zh-l}}, which is the specialised link template for Chinese. The most obvious difference with {{l}} is that it automatically generates simplified forms (e.g. (tán)). It also generates pinyin, which it manages by scraping pre-existing entries; something other link templates don't currently do for Chinese.

To replicate this, I've also created a way to generate forms automatically on a language-specific basis, which can be done by specifying a module in the language data using the generate_forms key. In the case of Chinese, simplified forms would be generated by Module:zh-generateforms. In addition, I've also created Module:cmn-translit, which automatically generates pinyin in a similar fashion to {{zh-l}}. Neither of these have been turned on yet, but they do mean that it's now feasible to start replacing {{zh-l}} with {{l}}, {{m}} and similar. In particular, it also means we can get rid of bodges in etymology sections involving Chinese, which frequently look like {{bor|en|cmn|-}} {{zh-l}}. What's worse, that bodge is the only way to give traditional/simplified side-by-side when specifying a specific language such as Cantonese.

Before turning either of these on, though, I just wanted to bring this up at the Beer Parlour to gauge any concerns. I've not noticed any memory issues in testing, but there are likely to be entries where doing this would lead to duplicated simplified forms being shown (as these have often been entered manually). I would also guesstimate that {{zh-l}} has been invoked a couple of hundred thousand times, so any conversion would need to be done by bot. This is not likely to be straightforward, because it has a somewhat more flexible syntax (which just isn't possible to port over to the main templates, because it would cause problems for other languages). Naturally, there is also the concern about whether Mandarin pinyin should be given by default for Chinese links as a whole, but that's a concern that also applies to {{zh-l}} itself, so I don't really want to tackle it here.

Overall, though, it would be really good to start sweeping away these sorts of language-specific templates wherever possible, because they're often not written very well, and they lead to walled gardens of badly written modules that end up being massively inefficient and incompatible with everything else. Not to put too fine a point on it, but the Chinese modules are a shitshow at the moment, and unpicking them is inherently going to involve growing pains such as this. Theknightwho (talk) 21:47, 11 February 2023 (UTC)
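As a rough illustration of the mechanism described in this post (alternative forms in a single link separated by `//`, plus an optional per-language form generator), here is a minimal Python sketch. It is not the Lua in Module:zh-generateforms or the link modules: the function names, the `generate_forms` argument (named after the language-data key mentioned above, but with a made-up signature) and the three-entry traditional-to-simplified table are all hypothetical. It also deduplicates, which is one way to avoid showing a simplified form twice when it was already entered manually.

```python
from typing import Callable, Optional

# Hypothetical toy mapping; a real table would be far larger and would need to
# handle characters whose simplification depends on context.
TRAD_TO_SIMP: dict[str, str] = {"車": "车", "語": "语", "馬": "马"}


def split_alternatives(link_text: str) -> list[str]:
    """Split a link target on the '//' delimiter, dropping empty pieces."""
    return [form for form in link_text.split("//") if form]


def generate_simplified(form: str) -> str:
    """Map each character through the table; unmapped characters pass through."""
    return "".join(TRAD_TO_SIMP.get(ch, ch) for ch in form)


def forms_for_display(
    link_text: str,
    generate_forms: Optional[Callable[[str], str]] = None,
) -> list[str]:
    """Return the given forms plus any generated forms, in order, without duplicates.

    `generate_forms` stands in for a per-language generator (hypothetical
    signature); languages without one just get the forms as written.
    """
    result: list[str] = []
    for form in split_alternatives(link_text):
        candidates = [form]
        if generate_forms is not None:
            candidates.append(generate_forms(form))
        for candidate in candidates:
            if candidate not in result:
                result.append(candidate)
    return result
```

For example, `forms_for_display("color//colour")` keeps both spellings unchanged, while `forms_for_display("語言//语言", generate_simplified)` yields the traditional and simplified forms once each instead of listing the simplified form twice.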

  1. I believe this should also imply deprecating most of the other Chinese templates, namely {{zh-syn}}, {{zh-ant}}, {{zh-cot}}, {{zh-hyper}}, {{zh-hypo}}, {{zh-also}}, {{zh-synonym}} (plus {{zh-altterm}} and {{zh-altname}}, whose standard-template equivalents were deprecated long ago), {{zh-alt form}}, {{zh-misspelling of}}, {{zh-short}}, and potentially some others that I've missed. Some of these have slightly different displays or input from the standard templates, but that should not be a significant hurdle.
  2. As I've mentioned numerous times before, I strongly oppose automatic pinyin. Sorry, Knight, I know you don't want to deal with this here, but that's exactly my concern with this change, since it codifies the status quo of having pinyin into something discussed and passed with a "consensus" in BP.
It introduces many errors and inaccuracies in how we present the information. Many characters have multiple readings: your example could be tán or dàn – both are equally common, but the template (including the existing {{zh-l}}) does not accommodate this and simply outputs one of them. I'm not fluent in Mandarin at all (and there are many other Chinese editors who similarly do not speak Mandarin fluently), so I could never tell which is the correct reading – what I would do is let the template do its job and not care about whether the output is correct, or more recently I would simply manually turn it off. I imagine that most fellow Mandarin editors wouldn't always check the correctness of the pinyin either. It appears to me that having automatic pinyin creates more maintenance than not having it.
Also, fuck unified Chinese, the template is called {{zh-l}} not {{cmn-l}}, and Chinese ≠ Mandarin, so it is totally absurd to impose Mandarin onto every Chinese entry. I believe the various problems arising from this have been mentioned to death everywhere, so I'm not repeating them here unless someone asks me to.
Wpi31 (talk) 04:51, 12 February 2023 (UTC)
Yep - that’s fair. These two features are disconnected from each other, so I can turn on automatic simplification without automatic pinyin. Theknightwho (talk) 16:12, 12 February 2023 (UTC)
@Wpi31 I’ve been having a think about how to mitigate this issue: it’s possible to turn off automatic pinyin if multiple different pronunciations are detected. This would therefore retain it for the majority of links, but prevent it anytime there’s ambiguity. It wouldn’t catch all false positives, but I think it’d bring them down to an acceptable level. Theknightwho (talk) 17:30, 12 February 2023 (UTC)
Thanks. I think that would be a reasonable compromise. – Wpi31 (talk) 17:39, 12 February 2023 (UTC)
Addendum: On the issue of zh, we could only turn on automatic pinyin like that for cmn, which would make this switch-over a good excuse to start disambiguating lots of Chinese links. Similar semi-automatic systems could also be put in place for yue, nan and so on. We could also use categories to flag any ambiguous links with no romanisation. — This unsigned comment was added by Theknightwho (talk • contribs) at 17:36, 12 February 2023 (UTC).
Addendum 2: I’m unsure about how practical this is, but it would also be possible for {{l|zh}} to show all the pronunciations. This might even be a good way to avoid pointless repetition when something applies to all/multiple varieties, while encouraging people to be more specific when possible, too. Theknightwho (talk) 17:48, 12 February 2023 (UTC)
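The compromise discussed above (suppress automatic pinyin whenever more than one reading is detected, and flag such links with a category instead) could look roughly like the following Python sketch. It is only an illustration under stated assumptions, not the logic of Module:cmn-translit: the reading tables are toy stand-ins for data gathered from existing entries, and the function names and the tracking-category name are hypothetical. Whole-word readings are checked first because a polyphonic character such as 彈 (tán/dàn) usually has a fixed reading inside a given word.

```python
from typing import Optional

# Hypothetical toy data standing in for readings gathered from existing entries.
WORD_READINGS: dict[str, list[str]] = {
    "彈琴": ["tánqín"],  # fixed reading inside this word
}
CHAR_READINGS: dict[str, list[str]] = {
    "語": ["yǔ"],
    "言": ["yán"],
    "彈": ["tán", "dàn"],  # polyphonic: the ambiguity discussed in this thread
    "琴": ["qín"],
}

# Hypothetical tracking-category name, for illustration only.
AMBIGUOUS_CATEGORY = "Chinese links with ambiguous or missing romanisation"


def auto_pinyin(term: str) -> Optional[str]:
    """Return pinyin only if the reading is unambiguous; otherwise None."""
    if term in WORD_READINGS:
        word_readings = set(WORD_READINGS[term])
        return word_readings.pop() if len(word_readings) == 1 else None
    syllables = []
    for ch in term:
        readings = set(CHAR_READINGS.get(ch, []))
        if len(readings) != 1:
            # Unknown (0 readings) or polyphonic (2+): suppress automatic pinyin.
            return None
        syllables.append(readings.pop())
    # Naive concatenation; ignores apostrophes and other pinyin orthography rules.
    return "".join(syllables)


def link_romanisation(
    term: str, manual_tr: Optional[str] = None
) -> tuple[Optional[str], Optional[str]]:
    """Return (romanisation, tracking category) for a Mandarin link."""
    if manual_tr:
        # A manually supplied romanisation always wins.
        return manual_tr, None
    tr = auto_pinyin(term)
    if tr is None:
        return None, AMBIGUOUS_CATEGORY
    return tr, None
```

The per-character fallback will still over-suppress some words that are missing from the word-level table, which matches the expectation above that this would not catch every case but would cut the false positives down.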

Tagging @Justinrleung, @RcAlex36, @MSG17, @ND381, @Octahedron80, @Fish bowl, @LibCae, @沈澄心 for comment. Theknightwho (talk) 17:14, 14 February 2023 (UTC)

I broadly agree with both you and Wpi31 in this matter. This would be in line with other template deprecations in terms of proposal, though I do think some more testing would be needed to ensure a smooth transition and show that other templates can handle the zh syntax. As to showing pronunciations, I think that while a change is needed with romanizations (particularly with entries showing pinyin automatically, but not, say, POJ for Min Nan-only entries), implementing it might be problematic for users if either automatic pinyin is removed or all the romanizations are shown (which could lead to overly long listings, inconsistent display of entries with different romanizations, and/or general display changes). In any case, however, if a template can handle romanization of non-zh entries, then it should be able to handle character differences for zh entries. MSG17 (talk) 02:29, 15 February 2023 (UTC)
@Theknightwho Let me know if you need bot work done. I've written a lot of bot scripts to do rather complex things and I imagine converting {{zh-l}} to {{l}} shouldn't be so hard. If there are cases it can't handle automatically, it will leave them alone to be done manually in a separate pass. Benwing2 (talk) 03:33, 16 February 2023 (UTC)
@Benwing2: Thank you - much appreciated. I think it would be a good idea to spend a month or so doing manual replacements (on an as-and-when basis), which should hopefully identify most scenarios. This will also give time to identify/discuss/solve any formatting changes this would cause (e.g. I’ve noticed that Chinese link templates don’t bolden terms in non-gloss definitions, and there may not be consensus to change that). Theknightwho (talk) 16:48, 17 February 2023 (UTC)
Regarding the romanisations for ambiguous cases, would it be possible to supply the parameter with unformatted text and have the module output the formatted form (i.e. mainly automatic superscript) – Wpi31 (talk) 07:19, 16 February 2023 (UTC)[reply]
I strongly oppose the deletion of the template. Like {{zh-x}}, it requires care and knowledge (wich comes from careful checking), the generic templates are not able to transliterate Chinese, Japanese, Thai and Khmer terms or whole phrases, the way some language-specific templates do. Offer a good, working alternative before suggesting deletion. The incorrect transliterations will come from incorrect usage. --Anatoli T. (обсудить/вклад) 01:47, 17 February 2023 (UTC)[reply]
@Atitarev: That's exactly the reason why this is brought up here. Please read the entire discussion before commenting in such an aggresive manner. All features of {{zh-l}} are already replicated (though they are not enabled yet), except for the guessing with the |2= parameter which should always be discouraged (nevertheless it shouldn't exist in the first place). The automatic simplified forms are handled by Module:zh-generateforms; the * disabling translit/simplified can be done by {{zh-l|車//|tr=-}}; the automatic pinyin is done through Module:cmn-translit which, as Knight has said, is doing the exact same thing as {{zh-l}} does, so any incorrectness arising from the new module(s) already exists with {{zh-l}} itself. What we have been discussing above is simply trying to further eliminate the incorrect outputs by disabling pinyin for only the ambiguous cases and to support auto-transliteration for other lects. Wpi31 (talk) 05:04, 17 February 2023 (UTC)[reply]
@Wpi31: I wasn't aggressive, I just have a strong opinion. I saw your aggressive comments about unified Chinese, though. If you need coverage for other dialects, {{zh-x}} (for usexes) already uses parameters for other varieties, a similar template for e.g. Min Nan would require similar work and even more suppressions or manual overrides. Thanks for doing the work but I haven't seen it anywhere near completion (correct me if I'm wrong) or used in action with a template. The usage is very big and it seems too early to deprecate. Mandarin Chinese transliteration can stay largely automated, even if care should be taken, hope it will stay so. Anatoli T. (обсудить/вклад) 05:15, 17 February 2023 (UTC)[reply]
@Atitarev I am offering a good, working alternative. I don't really understand what your objection is. Theknightwho (talk) 05:37, 17 February 2023 (UTC)
@Theknightwho: I welcome the work, I don't welcome the deprecation (yet). Anatoli T. (обсудить/вклад) 05:41, 17 February 2023 (UTC)
@Atitarev: I’m not suggesting we deprecate it immediately, but I do want to start the process of migrating away from it. This will help to identify any further difficult cases, too. The fact that other lects will have exceptions and difficult cases is fair enough, but automating those would be a new feature anyway. Right now, I suggest we turn on these features for {{l}} (and by extension, all the other standard link templates). Then we can go from there. I expect {{zh-l}} will take several months (if not over a year) to fully replace. Theknightwho (talk) 16:38, 17 February 2023 (UTC)
I’m going to turn these features on in a day or two if there are no further objections (with the change that Mandarin pronunciations only work automatically if only one pronunciation is given on the main page). Tagging @Wpi31, @MSG17, @Benwing2, @Atitarev, who have participated in the discussion. Theknightwho (talk) 20:01, 21 February 2023 (UTC)
@Benwing2, @Atitarev (who may not be aware of this) - I've turned on automatic simplification for zh and cmn. Automatic pinyin is still to come, as there are some bits to iron out. Ben - would it please be possible for you to run a bot job removing any duplicated translations? Up until now, it's been necessary to add traditional and simplified separately - usually with traditional first, but with the pinyin given in the simplified template (to avoid duplication). In the great majority of cases, the automatically generated simplified form will be correct, meaning that it'll now be displayed twice (e.g. at noodle#Translations). The pinyin "transliterations" will need to be moved to the traditional template, too. Many thanks. Theknightwho (talk) 05:29, 24 February 2023 (UTC)
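A rough Python sketch of the clean-up being requested: when a traditional template is immediately followed by a simplified one carrying the pinyin, and the simplified form is exactly what automatic simplification would produce, the pair collapses into one template with the pinyin moved onto the traditional form. The three-character conversion table and the function names are hypothetical, the regex only covers the common "traditional first, simplified with tr=" layout described here, and mismatches are left alone for manual review. This is not the bot script that was actually run.

```python
import re

# Hypothetical, tiny traditional -> simplified table, for illustration only.
TRAD_TO_SIMP = {"鐘": "钟", "時": "时", "紀": "纪"}


def simplify(trad: str) -> str:
    """Toy character-by-character simplification."""
    return "".join(TRAD_TO_SIMP.get(ch, ch) for ch in trad)


# "{{T|cmn|TRAD}}, {{T|cmn|SIMP|tr=PINYIN}}", where T is t, t+, tt or tt+.
PAIR = re.compile(
    r"\{\{(tt?\+?)\|cmn\|([^|}]+)\}\}, "
    r"\{\{tt?\+?\|cmn\|([^|}]+)\|tr=([^|}]+)\}\}"
)


def collapse_pairs(line: str) -> str:
    """Merge a trad/simp pair into one template when SIMP is the automatic form."""
    def repl(m):
        name, trad, simp, tr = m.groups()
        if simplify(trad) != simp:
            return m.group(0)  # mismatch: leave it for manual review
        return "{{" + name + "|cmn|" + trad + "|tr=" + tr + "}}"
    return PAIR.sub(repl, line)
```

For example, `{{t+|cmn|世紀}}, {{t+|cmn|世纪|tr=shìjì}}` becomes `{{t+|cmn|世紀|tr=shìjì}}`, while a pair whose simplified form the table cannot confirm is returned unchanged.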
@Theknightwho Sure, although I'll need more guidance on exactly what to do. Specifically, can you give me examples of various templates as they look now and what they ought to look like? Benwing2 (talk) 05:33, 24 February 2023 (UTC)[reply]
Also, I see 27 entries in CAT:E. Only some of them are memory-related but I'm wondering if at the end of this, the removal of dead code will result in memory decrease from the current state. Benwing2 (talk) 05:35, 24 February 2023 (UTC)[reply]
@Benwing2 No problem. To use the same example:
There will be some that don't follow this format, but this should catch about 95% of them. I'm hoping you're right about the removal of dead code - there is a lot of code in the Chinese modules that we would do well to get rid of. Theknightwho (talk) 05:38, 24 February 2023 (UTC)
@Theknightwho: Thank you for the efforts! Will you be able to turn on features (eventually) for sentence transliterations and simplified forms?
At Template:zh-x/documentation#Tricks (for Mandarin only), the list describes typical situations when dealing with Mandarin (about 10%) where automated simplifications and transliterations are not right. Ignore the delinking, but help with the desired simplified forms and corrected pinyin is what is typically required with this automation. Anatoli T. (обсудить/вклад) 05:40, 24 February 2023 (UTC)
@Atitarev I should think so, yeah. You can use // to separate traditional/simplified if you need to do it manually (in exactly the same way as / works for {{zh-l}}). Theknightwho (talk) 05:43, 24 February 2023 (UTC)
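The // convention can be modelled as a small parser. This Python sketch is a hypothetical illustration, not the module's code: text before // is the traditional form, text after it the simplified form, an empty right-hand side (as in 車//) suppresses the simplified display, and a term without // falls back to automatic simplification, represented here by a caller-supplied function.

```python
from typing import Callable, Optional, Tuple


def parse_trad_simp(
    term: str, auto_simplify: Callable[[str], str]
) -> Tuple[str, Optional[str]]:
    """Split a link term into (traditional, simplified-or-None).

    "A//B" -> traditional A, simplified B given explicitly.
    "A//"  -> traditional A, simplified display suppressed.
    "A"    -> simplified generated automatically; None if it is unchanged.
    """
    if "//" in term:
        trad, simp = term.split("//", 1)
        return trad, (simp or None)
    simp = auto_simplify(term)
    return term, (simp if simp != term else None)


# Usage with a one-character toy converter standing in for real simplification.
def toy(s: str) -> str:
    return s.replace("車", "车")


print(parse_trad_simp("車", toy))    # ('車', '车')
print(parse_trad_simp("車//", toy))  # ('車', None)
```

The explicit form is what makes hard correspondences such as 著火//着火 possible without relying on the converter at all.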
@Theknightwho: Great. Forgot to mention (in case you haven't implemented it) that "^" is already used for capitalisation of romanisations in Japanese, Korean and by {{zh-l}} and {{zh-x}}.
It opens the door for fully automating Japanese transliterations. That's why I always provided the full kana in all Japanese translations. {{ja-r}} and {{ja-x}} show similar tricks and challenges.
The same thing can be done for Thai and Khmer; see {{th-x}} and {{km-x}} for reference. Anatoli T. (обсудить/вклад) 05:49, 24 February 2023 (UTC)
@Theknightwho Is there a bot-callable function to convert traditional Chinese to simplified, and one to generate the default transliteration for a string of Chinese characters? ({{xlit}} will probably work for the latter; not sure of the correct function for the former.) It needs to be either a template call or an instance of {{#invoke:...}}. Benwing2 (talk) 06:01, 24 February 2023 (UTC)
Also the example you gave has 'cmn' in it. Do you only want/need 'cmn' transliterations converted or also 'zh' transliterations (and do the latter exist at all)? Benwing2 (talk) 06:03, 24 February 2023 (UTC)
@Atitarev Yes - I’ve enabled ^ as a way to capitalise transliterations for all scripts which don’t have capitalisation - it’s more flexible, too, as you can put it anywhere in the term.
@Benwing2 Using lang:generateForms(text) will return a table of forms. Here, it’ll contain two forms if it’s made a conversion, but only one if not.
In terms of Chinese translations, there shouldn’t be any using zh, as it should be divided by lect. In reality, I know plenty do - but they’re usually bullet-pointed by lect (which makes determining the correct langcode trivial). Theknightwho (talk) 06:20, 24 February 2023 (UTC)
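On the bot side, the return shape described above (one form when nothing changes, two when a conversion was made) could be consumed roughly as follows. This Python sketch only mimics that shape with a toy table; `generate_forms` and `is_redundant_simplified` are hypothetical names, and in practice the bot would have to obtain the forms from the Lua method through a template or {{#invoke:...}} call.

```python
from typing import List

# Toy traditional -> simplified table; a stand-in for the real conversion data.
TOY_TABLE = {"紀": "纪", "羅": "罗", "馬": "马", "數": "数"}


def generate_forms(text: str) -> List[str]:
    """Mimic the described return shape: one form if unchanged, two if converted."""
    converted = "".join(TOY_TABLE.get(ch, ch) for ch in text)
    return [text] if converted == text else [text, converted]


def is_redundant_simplified(trad: str, simp: str) -> bool:
    """True if `simp` is exactly the form automatic simplification would display."""
    forms = generate_forms(trad)
    return len(forms) == 2 and forms[1] == simp
```

A manually added simplified translation is only safe to drop when this check passes; anything else should become a warning.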
OK, to make things more concrete, here is part of a dump of searching through the Jan 20 dump file for the regex \{tt?\+?\|(cmn|zh)\|.* (although the Feb 20 file should be similar):
Page 900 Roman numeral: Found match for regex: {t+|cmn|羅馬數字}}, {{t+|cmn|罗马数字|tr=Luómǎ shùzì}}
Page 901 letter: Found match for regex: {t+|cmn|字母|tr=zìmǔ}}, {{t+|cmn|字|tr=zì}}, {{t+|cmn|文字|tr=wénzì}}
Page 904 decrypt: Found match for regex: {t+|cmn|解密|tr=jiěmì}}, {{t+|cmn|解碼}}, {{t+|cmn|解码|tr=jiěmǎ}}, {{t+|cmn|解讀}}, {{t+|cmn|解读|tr=jiědú}}
Page 906 Irish: Found match for regex: {t+|cmn|愛爾蘭語}}, {{t+|cmn|爱尔兰语|tr=ài'ěrlányǔ}}
Page 909 second: Found match for regex: {tt+|cmn|第二|tr=dì'èr|sc=Hani}}
Page 910 century: Found match for regex: {t+|cmn|世紀}}, {{t+|cmn|世纪|tr=shìjì}}
Page 911 clock: Found match for regex: {tt+|cmn|鐘}}, {{tt+|cmn|钟|tr=zhōng}}, {{tt+|cmn|時鐘}}, {{tt+|cmn|时钟|tr=shízhōng}}, {{tt+|cmn|鐘錶}}, {{tt+|cmn|钟表|tr=zhōngbiǎo}}
Page 912 millisecond: Found match for regex: {t+|cmn|毫秒|tr=háomiǎo|sc=Hani}}
Page 913 polytheism: Found match for regex: {t+|cmn|多神教|tr=duōshénjiào}}
Page 914 Japan: Found match for regex: {tt+|cmn|日本|tr=Rìběn}}
Page 915 computer science: Found match for regex: {t|cmn|電腦科學|sc=Hani}}, {{t|cmn|电脑科学|tr=diànnǎo kēxué|sc=Hani}}, {{t+|cmn|計算機科學|sc=Hani}}, {{t+|cmn|计算机科学|tr=jìsuànjī kēxué|sc=Hani}}
Page 917 few: Found match for regex: {t+|cmn|少|tr=shǎo|sc=Hani}}, {{t+|cmn|一些|tr=yīxiē|sc=Hani}}
Page 918 meat: Found match for regex: {tt+|cmn|肉|tr=ròu}}
Page 919 I love you: Found match for regex: {t+|cmn|我愛你}}, {{t+|cmn|我爱你|tr=wǒ ài nǐ}}
Page 920 beer: Found match for regex: {tt+|cmn|啤酒|tr=píjiǔ}}, {{tt+|cmn|麥酒}}, {{tt+|cmn|麦酒|tr=màijiǔ}} {{q|rare or regional}}
Page 922 encrypt: Found match for regex: {t+|cmn|加密|tr=jiāmì}}
Page 925 ASAP: Found match for regex: {t+|cmn|盡快|sc=Hani}}, {{t+|cmn|尽快|tr=jìnkuài|sc=Hani}}, {{t+|cmn|及早|tr=jízǎo|sc=Hani}}
Page 929 pseudo-: Found match for regex: {t+|cmn|偽|alt=偽-|sc=Hani}}, {{t+|cmn|伪|alt=伪-|tr=wěi-|sc=Hani}}, {{t+|cmn|假|alt=假-|tr=jiǎ-|sc=Hani}}
Page 934 trade union: Found match for regex: {t+|cmn|工會}}, {{t+|cmn|工会|tr=gōnghuì}}
Page 937 umbrella: Found match for regex: {t+|cmn|傘}}, {{t+|cmn|伞|tr=sǎn}}, {{t+|cmn|雨傘}}, {{t+|cmn|雨伞|tr=yǔsǎn}} {{qualifier|rain}}
Page 939 white-collar: Found match for regex: {t+|cmn|白領|sc=Hani}}, {{t+|cmn|白领|tr=báilǐng|sc=Hani}}
Page 941 chairman: Found match for regex: {t+|cmn|主席|tr=zhǔxí}}, {{t+|cmn|議長}}, {{t+|cmn|议长|tr=yìzhǎng}}
Page 943 bit: Found match for regex: {t+|cmn|馬銜|sc=Hani}}, {{t+|cmn|马衔|tr=mǎxián|sc=Hani}}
Page 946 BCE: Found match for regex: {t+|cmn|公元前|tr=gōngyuán qián}}
Page 947 BC: Found match for regex: {t+|cmn|公元前|tr=gōngyuánqián}}, {{t|cmn|主前|tr=zhǔqián}} {{qualifier|Christian}}, {{t+|cmn|紀元前}}, {{t+|cmn|纪元前|tr=jìyuánqián}}
Page 949 point: Found match for regex: {t+|cmn|點}}, {{t+|cmn|点|tr=diǎn}}
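Each match above begins with a single "{" because the pattern starts with one escaped brace: the first brace of "{{" is not followed by t, so the search matches from the second one, and `.*` then runs to the end of the line. A small Python reproduction using the `re` module; the sample line and the helper name are illustrative.

```python
import re

# The search pattern quoted above: one literal "{", then t/tt with optional "+",
# then a cmn or zh language code; ".*" runs to the end of the line.
PATTERN = re.compile(r"\{tt?\+?\|(cmn|zh)\|.*")


def find_chinese_translations(line: str):
    """Return (lang_code, matched_text) for the first match, or None."""
    m = PATTERN.search(line)
    if m is None:
        return None
    return m.group(1), m.group(0)


print(find_chinese_translations("*: Mandarin: {{t+|cmn|世紀}}, {{t+|cmn|世纪|tr=shìjì}}"))
```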

Some questions here:

  • Page 900 Roman numeral has the translit 'Luómǎ shùzì' which will not be what's auto-generated since it has a capital letter and a space. I'm pretty sure spaces should be preserved but do we want to map capital letters to lowercase in translit? Also is there a way to specify in the Chinese characters that there should be a space in translit? Some sort of specially-handled character which doesn't show up in the link or the Chinese display but does show up in translit. I implemented something of this nature for hyphens in Korean, but it was special-cased in Module:script utilities. I take it maybe you've implemented a generalized version of this?
  • Page 904 decrypt: I take it the third translit is the simplified equivalent of the second. If 'jiěmǎ' is the default translit, do you want the bot run to detect this and remove it, so it's auto-generated?
  • Page 909 second: Do you want the bot run to remove |sc=Hani?
  • Page 914 Japan: Another capital letter in translit.
  • Page 929 pseudo-: There's an |alt= param in both the traditional and simplified equivalent, from what I can tell. How should the bot handle this?
  • Page 946 BCE and page 947 BC: The same expression occurs in both places but with differences in placement of spaces. Presumably we should eventually fix this (not by bot)?

Benwing2 (talk) 06:18, 24 February 2023 (UTC)

@Theknightwho I checked some examples where I removed the manual translit and it appears auto-translit isn't ever getting generated. Is this correct? Are there plans to change this? Benwing2 (talk) 07:42, 24 February 2023 (UTC)
@Benwing2 Sorry for the misunderstanding - automatic pinyin hasn’t been turned on yet, as we’re ironing out the specifics on how best to go about it. The consensus so far is that it’s going to be semi-automated, with ambiguous situations requiring manual input. As things stand, that means it’s best to keep all the transliterations we have at the moment, and then we can handle those later on if they need to be removed. They’re lower priority, as they don’t have the immediate visual/usability problem that the duplicates have. Theknightwho (talk) 12:50, 24 February 2023 (UTC)
@Benwing2, Theknightwho: ^ is used for capitalisation and space is space in Module:zh-usex. Both are invisible.
I’d like |sc=Hani or Hant to be removed. Anatoli T. (обсудить/вклад) 12:17, 24 February 2023 (UTC)
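To make the two "invisible" markers concrete, here is a hypothetical Python sketch: ^ capitalises the next syllable of the transliteration and a space inserts a space into it, while neither appears in the displayed Chinese. The five-character pinyin table is a toy, and the function is not taken from Module:zh-usex. It also shows how the BCE/BC spacing difference noted above (gōngyuán qián vs gōngyuánqián) could be controlled from the Chinese text.

```python
from typing import Tuple

# Toy per-character pinyin; a real module also handles polyphonic characters,
# apostrophes before a/e/o syllables, erhua, etc.
TOY_PINYIN = {"日": "rì", "本": "běn", "公": "gōng", "元": "yuán", "前": "qián"}


def render(term: str) -> Tuple[str, str]:
    """Return (displayed Chinese, transliteration) for a term with ^ and space markers."""
    display, translit = [], []
    capitalise_next = False
    for ch in term:
        if ch == "^":
            capitalise_next = True  # invisible: affects only the next syllable
        elif ch == " ":
            translit.append(" ")    # invisible in the Chinese, a space in the pinyin
        else:
            display.append(ch)
            syllable = TOY_PINYIN[ch]
            if capitalise_next:
                syllable = syllable[:1].upper() + syllable[1:]
                capitalise_next = False
            translit.append(syllable)
    return "".join(display), "".join(translit)
```

With this scheme `^日本` displays as 日本 with the transliteration Rìběn, and `公元 前` displays as 公元前 with gōngyuán qián.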
|alt= should be fine, e.g. 韓國的韩国的 (zh) (Hánguó de) Anatoli T. (обсудить/вклад) 12:22, 24 February 2023 (UTC)
@Benwing2 I’ll leave the various transliteration issues for now. However, please remove all script codes (from all lects, if that’s not too much trouble). At the moment, these are wrongly overriding the traditional/simplified detection, and for zh & cmn they will potentially be causing issues for the generation of simplified forms (as that only works if the script is detected as Hant). Plus, it’s likely automatic simplification will be turned on for some of the other lects at some point, too. Theknightwho (talk) 12:58, 24 February 2023 (UTC)
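A bare-bones Python sketch of the script-code removal: it drops |sc=... from translation templates whose language code is a Chinese lect and leaves everything else untouched. The lect set is an illustrative subset, the regex assumes no nested templates inside the call, and none of this is the actual bot script.

```python
import re

# Illustrative subset of Chinese lect codes (hypothetical; not the bot's real list).
CHINESE_LECTS = {"zh", "cmn", "yue", "nan", "hak", "wuu"}

# {{t|...}}, {{t+|...}}, {{tt|...}}, {{tt+|...}} or {{t-check|...}} with no nested braces.
TEMPLATE = re.compile(r"\{\{(t-check|tt?\+?)\|([a-z-]+)\|([^{}]*)\}\}")
SC_PARAM = re.compile(r"\|sc=[^|]*")


def strip_script_codes(wikitext: str) -> str:
    """Remove |sc=... from translation templates whose language is a Chinese lect."""
    def repl(m):
        name, lang, params = m.groups()
        if lang not in CHINESE_LECTS:
            return m.group(0)
        # Prefix "|" so the pattern also matches if sc= were the first parameter.
        params = SC_PARAM.sub("", "|" + params)[1:]
        return "{{" + name + "|" + lang + "|" + params + "}}"
    return TEMPLATE.sub(repl, wikitext)
```

For instance, `{{tt+|cmn|第二|tr=dì'èr|sc=Hani}}` becomes `{{tt+|cmn|第二|tr=dì'èr}}`, and a Japanese `{{t+|ja|日本|sc=Jpan}}` is left as it is.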
{{re|Benwing2}} I've checked some translations with warnings above. They are apparently already addressed. Do you still have any queries outstanding? I understand you're not checking for how to do spaces/capitalisations yet? It's not ready yet. Anatoli T. (обсудить/вклад) 05:46, 27 February 2023 (UTC)

based pro-standard templates chad, good work and ily, and {{l|zh|車//|tr=-}} is excellent syntax 👍️ —Fish bowl (talk) 21:54, 24 February 2023 (UTC)

Happy to see things up and running, and it's definitely a good step to make templates less divergent for Chinese. I'm wondering whether simplified should still be suppressed in {{zh-dial}}, which has been relying on the backend of {{l}}. I know that has been the status quo ever since the conception of {{zh-dial}}, but I think it's time to make it more accessible for people who are used to simplified (which tbh is probably the majority of Chinese learners). The downsides are that it would make the template a little clunkier (which I don't see as a big issue) and that it's going to take up a bit of memory because of the sheer amount of data we process with {{zh-dial}}. Thoughts from @Fish bowl, RcAlex36, Wpi31? — justin(r)leung (t...) | c=› } 22:46, 24 February 2023 (UTC)
IIRC it was decided to remove simplified to reduce visual clutter, which I think is reasonable (陰莖#Synonyms), but I can't find the conversation. (also I'd personally like to migrate {{zh-dial}} to my more language-agnostic {{dial syn}} too) Fish bowl (talk) 23:38, 24 February 2023 (UTC)
@Fish bowl: I believe it's Wiktionary talk:About Chinese#Simplified Chinese in all templates and modules. We might want to revisit this since I think the decision was mostly Wyang's. (I guess you were kind of also supporting it if you think it's cluttered in zh-dial.) — justin(r)leung (t...) | c=› } 05:21, 26 February 2023 (UTC)
@Theknightwho My bot script is running. There are 22,776 pages to process so it will run overnight. Besides removing redundant translations, it removes script codes from translation templates for all Chinese lects and replaces 'zh' with the correct lect code. When run on the Feb 20 dump it produced 165 warnings of various sorts; these need to be fixed up by hand. See User:Benwing2/remove-redundant-chinese-translations-warnings. Benwing2 (talk) 04:05, 25 February 2023 (UTC)
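Replacing 'zh' with the right lect code is tractable because, as noted earlier in the thread, such translations are usually already bullet-pointed by lect. A minimal Python sketch under that assumption: the label before the colon picks the code, and lines whose label is not recognised (such as a bare "Chinese:") are returned unchanged so they can be reported as warnings. The label table is a hypothetical subset and the function is not the actual bot code.

```python
import re

# Hypothetical subset mapping translation-line labels to lect codes.
LABEL_TO_CODE = {
    "Mandarin": "cmn", "Cantonese": "yue", "Hakka": "hak",
    "Min Nan": "nan", "Wu": "wuu",
}

LABEL = re.compile(r"^\*:*\s*([^:{]+):")
ZH_TEMPLATE = re.compile(r"\{\{(t-check|tt?\+?)\|zh\|")


def fix_zh_codes(line: str) -> str:
    """Replace |zh| with the lect code implied by the line's label, if recognised."""
    m = LABEL.match(line)
    code = LABEL_TO_CODE.get(m.group(1).strip()) if m else None
    if code is None:
        return line  # e.g. a bare "Chinese:" line: leave it for a warning/manual fix
    return ZH_TEMPLATE.sub(r"{{\1|" + code + "|", line)
```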
@Benwing2 Thanks for this - I’ll take a look at the warnings. Theknightwho (talk) 17:29, 25 February 2023 (UTC)
@theknightwho: I noticed that {{och-l}} and {{ltc-l}} are also using Module:zh/link (which is totally unnecessary, and could be done using the standard link modules). Can you look into that? (I don't have the time for that now) – Wpi31 (talk) 14:46, 25 February 2023 (UTC)
I agree - the two templates never really had any good reason to exist in the first place, and can be fairly easily replaced. One thing I’m considering is whether the language objects should have a pronunciation method, as that might be more applicable for MC and OC. Theknightwho (talk) 17:22, 25 February 2023 (UTC)
@Theknightwho See also User:Benwing2/remove-redundant-chinese-translations-warnings-from-to. These are the remaining 114 warnings in a slightly different format. The lines in question are in the form <from> LINE <to> LINE <end>; if you correct the part *after* the <to>, and leave the part before it alone, I can run a bot script to update all the pages in question. Some of the pages need to be edited directly, in particular the ones with junk after the Chinese: part, but this should make it easier to fix things up. Benwing2 (talk) 21:47, 25 February 2023 (UTC)
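The workflow described here (hand-edit only the text after <to>, then let a bot push the result) could be sketched in Python as below. It assumes each warning sits on one line and that the text before <to> is an exact copy of a line on the page; the function names are hypothetical and this is not the actual script.

```python
import re
from typing import List, Tuple

# One warning per line: "... <from> ORIGINAL <to> EDITED <end>"
WARNING = re.compile(r"<from> (.*?) <to> (.*?) <end>")


def parse_fixes(warnings: str) -> List[Tuple[str, str]]:
    """Collect (original, edited) pairs, skipping warnings nobody has edited yet."""
    fixes = []
    for m in WARNING.finditer(warnings):
        original, edited = m.group(1), m.group(2)
        if original != edited:
            fixes.append((original, edited))
    return fixes


def apply_fixes(page_text: str, fixes: List[Tuple[str, str]]) -> str:
    """Replace page lines that exactly match an original; other lines are untouched."""
    lookup = dict(fixes)
    return "\n".join(lookup.get(line, line) for line in page_text.split("\n"))
```

Warnings whose two halves are still identical are simply ignored, so the list can be worked through incrementally.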
@Benwing2 The link templates now seem to also be generating simplified forms for the rest of the Chinese lects. (I don't remember them doing that a few days ago.) Can you run the bot job again for them? – Wpi31 (talk) 13:58, 28 February 2023 (UTC)
@Wpi31 Sure. Can you give me a couple of examples where this is happening? Also User:Theknightwho can you verify what Wpi31 says? Benwing2 (talk) 22:32, 28 February 2023 (UTC)
@Benwing2: book/translations#Noun, pen (writing tool), horse, pig, apple, wind/translations#Etymology_1. I think this needs to be put on hold for now, since something needs to be ironed out after seeing these pages – it looks like the simplified forms are generated only for the larger Chinese lects, i.e. yue, hak, nan, wuu? @theknightwho – Wpi31 (talk) 03:04, 1 March 2023 (UTC)
PS: hsn, gan and zhx-teo also generate simplified forms, but cdo, cjy, cpx, czh and czo don't (probably someone forgot to press save?) – Wpi31 (talk) 03:22, 1 March 2023 (UTC)
@Wpi31, Theknightwho It looks like the c... lects are fixed. I am ready to run the bot to fix up the non-Mandarin lects, let me know if that's OK. Benwing2 (talk) 21:22, 1 March 2023 (UTC)
@Benwing2 @Wpi31 I agree this sounds good. The lects which now use automatic simplification are: cdo, cjy, cmn, cpx, czh, czo, dng, gan, hak, hsn, mnp, nan, wuu, wxa, yue, zhx-sht, zhx-tai & zhx-teo. When turning these on, I forgot to check Module:languages/data/3/c for other lects because Mandarin (cmn) had already been enabled, which was just an oversight; nothing to do with the size of the lects. Theknightwho (talk) 21:42, 1 March 2023 (UTC)
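A bot run that removes redundant simplified translations has to respect this list: for a lect not on it, no simplified form is generated, so a manually added one must stay. A minimal Python sketch of that check using the codes listed above (the function name is hypothetical); Literary, Middle and Old Chinese (lzh, ltc, och) fall on the skip side.

```python
from typing import List, Tuple

# Lects listed above as now using automatic simplification.
AUTO_SIMPLIFIED = {
    "cdo", "cjy", "cmn", "cpx", "czh", "czo", "dng", "gan", "hak",
    "hsn", "mnp", "nan", "wuu", "wxa", "yue", "zhx-sht", "zhx-tai", "zhx-teo",
}


def partition_lects(codes: List[str]) -> Tuple[List[str], List[str]]:
    """Split lect codes into (safe to de-duplicate, must be skipped)."""
    process, skip = [], []
    for code in codes:
        (process if code in AUTO_SIMPLIFIED else skip).append(code)
    return process, skip
```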

The generate_forms key is not in the documentation of Module:languages/data/2. Is it a new feature? Anyway I suggest the documentation page be updated. -- Huhu9001 (talk) 13:11, 26 February 2023 (UTC)

@Huhu9001 I'll add it shortly. I'm still in two minds as to whether there should be some way to specify different modules depending on the script(s) of the forms submitted by the user, which explains the delay. For example, Dungan uses automatic simplification, but would also benefit from having Cyrillic + Han displayed together as well (which would need to be facilitated separately). Theknightwho (talk) 21:45, 1 March 2023 (UTC)
@Wpi31, Theknightwho More warnings (204 of them): User:Benwing2/remove-redundant-chinese-translations-warnings-2. Benwing2 (talk) 23:15, 1 March 2023 (UTC)
@Benwing2 There seem to be some translations that the bot never got to clean up, for example ten thousand and time/translations. Wpi31 (talk) 06:56, 2 March 2023 (UTC)
@Wpi31 I suspect those are cases where the simplified and traditional don't match according to our tables. In such a situation, the bot currently silently ignores the mismatch; I'll do a run outputting warnings for these cases to make sure this is the issue. Benwing2 (talk)
@Wpi31 It was actually caused by an off-by-one error that resulted in skipping Translations sections that were the last section on the page. I also fixed some other issues; will rerun. Benwing2 (talk) 10:06, 2 March 2023 (UTC)
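This kind of off-by-one bug is easy to introduce when a page is split into sections by looking at the *next* header to find where the current section ends: if the loop stops one short, the final section (often Translations) is silently dropped. This Python sketch is a generic illustration, not the actual bot code; it handles the last section by falling back to the end of the text.

```python
import re
from typing import List, Tuple

# A wikitext header such as "====Translations====".
HEADER = re.compile(r"^(=+)[ \t]*(.*?)[ \t]*\1[ \t]*$", re.M)


def sections(text: str) -> List[Tuple[str, str]]:
    """Split wikitext into (title, body) pairs, including the final section."""
    matches = list(HEADER.finditer(text))
    result = []
    for i, m in enumerate(matches):
        # The body runs to the next header or, for the last section, to the end of the text.
        end = matches[i + 1].start() if i + 1 < len(matches) else len(text)
        result.append((m.group(2), text[m.end():end]))
    return result
```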
Also, a few cases where lects in translations were skipped due to not using automatic simplification (this concerns Literary Chinese, Old Chinese, Middle Chinese):
  • Page 179 of: Skipping lect Literary Chinese (lzh) not using automatic simplification: <from> *: Literary Chinese: {{t|lzh|之|tr=zhī}} <to> *: Literary Chinese: {{t|lzh|之|tr=zhī}} <end>
  • Page 299 foot: Skipping lect Literary Chinese (lzh) not using automatic simplification: <from> *: Literary Chinese: {{tt|lzh|足}} <to> *: Literary Chinese: {{tt|lzh|足}} <end>
  • Page 361 eat: Skipping lect Literary Chinese (lzh) not using automatic simplification: <from> *: Literary Chinese: {{tt|lzh|餔}} <to> *: Literary Chinese: {{tt|lzh|餔}} <end>
  • Page 467 dark: Skipping lect Literary Chinese (lzh) not using automatic simplification: <from> *: Literary Chinese: {{tt|lzh|黲}} <to> *: Literary Chinese: {{tt|lzh|黲}} <end>
  • Page 599 lithium: Skipping lect Literary Chinese (lzh) not using automatic simplification: <from> *: Literary Chinese: {{t|lzh|鋰}} <to> *: Literary Chinese: {{t|lzh|鋰}} <end>
  • Page 798 Jesus: Skipping lect Middle Chinese (ltc) not using automatic simplification: <from> *: Middle Chinese: {{tt|ltc|移鼠}} <to> *: Middle Chinese: {{tt|ltc|移鼠}} <end>
  • Page 914 homosexuality: Skipping lect Literary Chinese (lzh) not using automatic simplification: <from> *: Literary Chinese: {{t|lzh|同性戀}} <to> *: Literary Chinese: {{t|lzh|同性戀}} <end>
  • Page 1448 fragrance: Skipping lect Literary Chinese (lzh) not using automatic simplification: <from> *: Literary Chinese: {{t|lzh|馥}}, {{t|lzh|馨香}}, {{t|lzh|馝}}, {{t|lzh|馞}} <to> *: Literary Chinese: {{t|lzh|馥}}, {{t|lzh|馨香}}, {{t|lzh|馝}}, {{t|lzh|馞}} <end>
  • Page 2098 longan: Skipping lect Literary Chinese (lzh) not using automatic simplification: <from> *: Literary Chinese: {{t|lzh|龍目}} <to> *: Literary Chinese: {{t|lzh|龍目}} <end>
  • Page 2124 Wikipedia: Skipping lect Literary Chinese (lzh) not using automatic simplification: <from> *: Literary Chinese: {{t-check|lzh|維基大典}} <to> *: Literary Chinese: {{t-check|lzh|維基大典}} <end>
  • Page 2403 sun/translations: Skipping lect Literary Chinese (lzh) not using automatic simplification: <from> *: Literary Chinese: {{tt|lzh|日}}; {{tt|lzh|陽}}, {{tt|lzh|阳}}; {{tt|lzh|太陽}}, {{tt|lzh|太阳}} <to> *: Literary Chinese: {{tt|lzh|日}}; {{tt|lzh|陽}}, {{tt|lzh|阳}}; {{tt|lzh|太陽}}, {{tt|lzh|太阳}} <end>
  • Page 2410 you/translations: Skipping lect Old Chinese (och) not using automatic simplification: <from> *: Old Chinese: {{t|och|你|tr=nɯʔ}} <to> *: Old Chinese: {{t|och|你|tr=nɯʔ}} <end>

Benwing2 (talk) 01:54, 2 March 2023 (UTC)

@Theknightwho Any objection to cleaning up {{pinyin reading of}} to remove the simplified readings? Benwing2 (talk) 02:49, 2 March 2023 (UTC)
@Benwing2 Thanks for all this. Cleaning up {{pinyin reading of}} sounds like a good idea, along with {{yue-jyutping of}} as well. Theknightwho (talk) 03:59, 2 March 2023 (UTC)
@Benwing2: Good idea. Anatoli T. (обсудить/вклад) 04:30, 2 March 2023 (UTC)
@Benwing2: But please check cases where the automated form differs from the manual one, if you can. Anatoli T. (обсудить/вклад) 04:37, 2 March 2023 (UTC)
@Benwing2: I think the "Literary Chinese" code is often misused in translations and elsewhere when literary written Chinese is meant, especially with pinyin transliteration, e.g. {{t|lzh|之|tr=zhī}}. This should be {{t+|cmn|之|tr=zhī}} instead, with {{qualifier|literary}}, but that just complicates things. This is equally applicable to other Chinese varieties like Cantonese, just with a different translit, e.g. {{t|yue|之|tr=zi1}}. Anatoli T. (обсудить/вклад) 04:46, 2 March 2023 (UTC)
Personally, I think we should rename "Literary Chinese" to "Classical Chinese" anyway (which would clear up this problem, among other issues). Theknightwho (talk) 04:56, 2 March 2023 (UTC)
@Theknightwho, Atitarev Yes I'll check to make sure the simplified form given is actually the simplified equivalent of the traditional form given. Also I'm thinking of renaming {{pinyin reading of}} to {{cmn-pinyin of}} in the process; this will make it consistent with {{yue-jyutping of}}, with the corresponding headword template {{cmn-pinyin}}, and with other language-specific form-of templates. Benwing2 (talk) 05:00, 2 March 2023 (UTC)
@Benwing2 Agree - sounds good. Theknightwho (talk) 05:02, 2 March 2023 (UTC)
@Theknightwho, Atitarev, Wpi31 I am doing a test run now. A lot of warnings are coming out; the first 1,800 pages processed produced 275 warnings concerning mismatched simplified vs. traditional. Some of them are resolved by assuming the params are reversed, but a lot still remain. See User:Benwing2/clean-pinyin-jyutping-of-warnings-first-1800. Can you comment on a handful of these and let me know if there's some automatic way of resolving some of them? Note that the issues appear front-loaded, i.e. the first 13,000 pages only produced 353 warnings, not much more than the 275 warnings coming from the first 1,800 pages. Benwing2 (talk) 06:33, 2 March 2023 (UTC)
Oh yeah, one other issue: definitions in {{pinyin reading of}}. Some template invocations have definitions in them, e.g. dǎngùchún ("cholesterol"), àihù ("to cherish"), yìxuéxí ("e-learning"), Sìchuān ("Sichuan"). The |def= param is currently ignored by {{pinyin reading of}} but my script flags the unrecognized param. Should we just remove this param entirely or should we modify {{pinyin reading of}} to display the definition when provided? Benwing2 (talk) 06:38, 2 March 2023 (UTC)
@Benwing2: Yeah, remove the definitions. Anatoli T. (обсудить/вклад) 06:53, 2 March 2023 (UTC)
A number of these are listing the variant/alternative forms alongside the standard trad/simp characters, and sometimes they list them separately. For example, on fēng, is a variant form of (the main entry, which has no simplified); is (trad), with simplified and alt form ; wěi has (main entry, in traditional) and variant , but the simplified is listed on a separate line. It should be easy to fix them if one could tell which case it falls under, but sadly that's the complicated part which I think requires manual work. – Wpi31 (talk) 06:45, 2 March 2023 (UTC)
Regarding the front-loaded problem, I believe that is due to the facts that the earlier pages were probably created manually, and that the character correspondences are more complicated for the more common characters, which tend to be created first. – Wpi31 (talk) 06:47, 2 March 2023 (UTC)
@Benwing2:
  1. ba - IS: Hanyu Pinyin reading of 吧, 罷/罢, 罢. SHOULD BE: 吧, 罷/罢. Not sure about the warning.
  2. me - remove line with 么 on its own. Some tools fail to recognise 么 as a simplified form. Treat cases with 么 the same way.
  3. è - just remove the last simplified 鳄 as with ba above. It looks good otherwise.
Anatoli T. (обсудить/вклад) 06:52, 2 March 2023 (UTC)
@Wpi31, Atitarev Thank you! One more thing is that currently {{pinyin reading of}} supports up to 10 numbered params, including two trad/simp pairs (1/2, 3/4) and 6 more additional forms. Very few pages use these params but there are a few:
I'm thinking in the new {{cmn-pinyin of}} we don't need this many params, and they should maybe instead be converted to multiple lines. Does this make sense? If so, how many params should be supported at the max? (There won't be simplified equivalents supported, just a set of numbered params to be displayed using {{l}}.) Benwing2 (talk) 07:05, 2 March 2023 (UTC)
@Benwing2: Personally, I don't mind if they are split into all multiple lines. It sort of makes sense to keep together alternative traditional forms, e.g. Táiwān:
vs
It's not a big deal, though, if they are split (IMO!). Anatoli T. (обсудить/вклад) 10:30, 2 March 2023 (UTC)
Of course, simplified forms need to be removed carefully, taking care of cases where simplified forms can be both trad. and simp. such as
Anatoli T. (обсудить/вклад) 10:34, 2 March 2023 (UTC)

@Wpi31, Theknightwho See User:Benwing2/clean-pinyin-jyutping-of-warnings. These are all the warnings generated when converting {{pinyin reading of}} (471 warnings) and {{yue-jyutping of}} (34 warnings); total of 505 warnings. I went ahead and created {{cmn-pinyin of}} and will be doing a run tomorrow (= Thursday, US time) to convert the templates appropriately. Benwing2 (talk) 10:19, 2 March 2023 (UTC)

@Atitarev Thanks. For the moment I'm not splitting any lines. I wrote {{cmn-pinyin of}} to take up to five variants; we can expand it if more are needed. The conversion script is running; there are about 61,000 entries so it may take most of a day to finish. It doesn't delete any lines so cases like 后 that are both a traditional character in its own right and a simplified variant character won't be messed up. Benwing2 (talk) 01:13, 3 March 2023 (UTC)

Warnings galore

@Wpi31, Theknightwho, Atitarev I finished another run of my script to remove redundant translations. User:Wpi31 pointed out that my previous runs missed lots of pages; this run gets all these pages and also checks and outputs warnings for disagreement between traditional and equivalent simplified forms, rather than just silently skipping those cases. Unfortunately a lot of warnings got output (1,508 of them when processing 12,478 pages). I have split the warnings into three categories:

For the first and third categories, you should be able to speed up processing the warnings by editing the portion of the line after the <to> tag rather than directly fixing up the page in question. If you do that, let me know and I can do a bot run to push those changes to the appropriate pages. For the second category (junk after 'Chinese:' header), it may be necessary to actually edit the page to add the appropriate line; but it may be possible to speed up processing as well. Specifically, if you edit the portion of the line after the <to> tag and make it contain the appropriate text for a 'Mandarin:' line, I can write a bot script to insert that text in its own 'Mandarin:' line at the appropriate position.

Apologies for all the warnings; it seems things are often messy currently. Benwing2 (talk) 05:42, 3 March 2023 (UTC)

@Benwing2: Some comments on the 1st list.
  • kòngxián - 空閒 (trad), 空閑 (variant trad), 空闲 (simp)
  • GDP - not pinyin
  • liùyuè - 六月 or 6月 are variants. IMO, should not be liùyuè but 6yuè.
  • zháohuǒ - 著火 (trad), 着火 is both variant trad and simp
  • yù, ào - 隩 trad, 奧 is both variant trad and simp
English abbreviations or Arabic numerals probably shouldn't get pinyin entries, but people do make them. Anatoli T. (обсудить/вклад) 10:10, 3 March 2023 (UTC)
@Atitarev, Wpi31, Theknightwho Thanks User:Atitarev for the detailed comments! It sounds like we need some more thinking around variant forms; the current trad//simp display might be insufficient. For example we might need tables mapping canonical traditional forms to their variant forms, as well as auto-display of variant forms under some circumstances (maybe a flag of some sort to {{l}} and {{m}}, which can be triggered automatically by {{t}}, {{cmn-pinyin of}}, etc.). Benwing2 (talk) 10:41, 3 March 2023 (UTC)
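One possible shape for the variant-form tables floated here, purely as a hypothetical Python sketch: each canonical traditional form carries its simplified form and a list of variant traditional forms, and a flag (standing in for one that {{t}} or {{cmn-pinyin of}} might set) controls whether variants are shown. The two entries follow the kòngxián and zháohuǒ comments above; the class, table and function names are invented for illustration.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class FormSet:
    traditional: str
    simplified: str
    variants: List[str] = field(default_factory=list)  # variant traditional forms


# Hypothetical table keyed by canonical traditional form.
VARIANTS: Dict[str, FormSet] = {
    "空閒": FormSet("空閒", "空闲", ["空閑"]),
    "著火": FormSet("著火", "着火", ["着火"]),
}


def display(trad: str, show_variants: bool = False) -> str:
    """Render "trad/simp", optionally followed by the variant traditional forms."""
    entry = VARIANTS.get(trad)
    if entry is None:
        return trad  # no table entry: fall back to the bare form
    out = entry.traditional
    if entry.simplified != entry.traditional:
        out += "/" + entry.simplified
    if show_variants and entry.variants:
        out += " (variant: " + ", ".join(entry.variants) + ")"
    return out
```

Note that 着火 shows up twice for 著火, because it is both a variant traditional form and the simplified form; whether to suppress such duplicates is one of the details such a design would still have to settle.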
Regarding the variant traditional forms (which are standard in some regions and not to be confused with variant forms), they are usually ignored. It may be possible to autogenerate variant traditional forms, but for characters with complicated correspondences like 著/着 or 檯/枱/臺/台 it's better to not overcomplicate it and simply use the manual // syntax, which is what I did when fixing the translation templates. Wpi31 (talk) 11:08, 3 March 2023 (UTC)
Thanks for the bot job. I'll look into cleaning it up next week when I have time. – Wpi31 (talk) 11:02, 3 March 2023 (UTC)
PS Question: I assume the errors generated from the earlier runs are also included here? – Wpi31 (talk) 11:10, 3 March 2023 (UTC)
@Wpi31 Yes. They are separate from the warnings generated when converting {{pinyin reading of}} but subsume all previous translation-table-related errors/warnings. Benwing2 (talk) 19:20, 3 March 2023 (UTC)
BTW conversion of {{pinyin reading of}} to {{cmn-pinyin of}} is done. For invocations that couldn't be cleaned up properly, I went ahead and renamed the template and added |attn=1, which causes the page to categorize into CAT:Requests for cleanup in Hanyu Pinyin entries. Once the invocation is cleaned up, just remove the |attn=1. Thanks! Benwing2 (talk) 20:07, 3 March 2023 (UTC)
@Benwing2: I checked a couple of entries in your 2nd list of warnings. There were problems with translations: no nesting (Chinese/Mandarin), using only the simplified forms, etc. I fixed, e.g., wind_instrument#Translations and acclamation#Translations. --Anatoli T. (обсудить/вклад) 01:11, 6 March 2023 (UTC)
@Benwing2: Please bypass {{not used|zh}} or {{not used|cmn}}. It's to show that a term is not used in Chinese, e.g. "the". Anatoli T. (обсудить/вклад) 01:23, 6 March 2023 (UTC)
A large number of the unrecognised lects listed there are Teochew (and a few Taishanese), which have their own language codes but are still subvarieties of Min Nan and Cantonese respectively. Thus they are sub-subindented under them, but I don't think that is the best way to display them, given that their treatment has since changed. Before I go and make any further changes (the codes need to be changed in any case), can we agree on how to format Teochew and Taishanese in translations? @Justinrleung, RcAlex36 – Wpi31 (talk) 13:35, 6 March 2023 (UTC)
PS: I also see other lects, including Hokkien and Sichuanese, appearing in the translations even though we treat the entire thing as Min Nan. – Wpi31 (talk) 14:02, 6 March 2023 (UTC)
@Wpi31: I'm not entirely sure how it should be done. I kind of like the setup in {{zh-pron}}, where nesting only occurs if both Hokkien and Teochew appear, not if only one of them occurs. This might be tricky to deal with in translations. — justin(r)leung (t...) | c=› } 22:17, 10 March 2023 (UTC)
Minnan is not a language, and I believe we try to organize translations by language. Indenting Teochew and Hokkien under Minnan would be like indenting Russian and Ukrainian under East Slavic. ISO is considering a proposal to break up Minnan into languages, which will hopefully solve the problem for us by next year, but meanwhile IMO we shouldn't follow the ISO breakdown for Chinese. kwami (talk) 09:24, 11 March 2023 (UTC)

I just created Wiktionary:Quotations/Resources as a guide for finding quotations, aimed at both new and experienced users. If you have other resources that you like to use, please add them to the page. (pinging @CitationsFreak, who I assume loves citations) Ioaxxere (talk) 02:33, 12 February 2023 (UTC)

I've used Issuu and Genius in the past.
Issuu has mostly modern magazines, but I've seen some books and newspapers and manuals there. Here's an example: https://issuu.com/search?q=example .
Genius has mostly songs. This means that it has modern slang[note 1] (as anyone can make a song, publish it, and then write their own lyrics). It also has some non-song stuff (like Atticus Finch's closing speech in To Kill a Mockingbird). I think they even have scores for football (soccer) games, and even guides on how to create your own Genius lyrics! Of course, I (personally) just use it for checking song lyrics[note 2]. Here's an example: https://genius.com/search?q=example
[note 1] And slang from any time since records became popular, as anyone could transcribe the lyrics to a novelty song that only sold, like, 10 copies back during WWII.
[note 2] Counting albums with only people talking as "songs". Three citations, for all senses. (talk) 03:30, 12 February 2023 (UTC)
Thanks for reminding me about Genius; I've been using it to cite "urban slang" like pept and crodie. Ioaxxere (talk) 03:37, 12 February 2023 (UTC)
@CitationsFreak I've added Issuu and Genius. If you like, check that the info is accurate. Ioaxxere (talk) 04:50, 12 February 2023 (UTC)
I’m not sure we should list Reddit and Twitter. The last time there was a vote on this there was no consensus that these sites should be used for quotes. Better to focus on non-controversial resources, I think. — Sgconlaw (talk) 06:13, 12 February 2023 (UTC)
I think we can cite Twitter now. I've seen cases in RFV where a term was passed with three Twitter cites. Three citations, for all senses. (talk) 20:25, 12 February 2023 (UTC)
Only when it is specifically agreed on a case-by-case basis, hence the recent "CFI votes" etc. —Al-Muqanna المقنع (talk) 21:44, 12 February 2023 (UTC)
I think Genius could only technically be considered durably archived if the lyrics have appeared in print, or if the song was released as a single or in an album using physical media (vinyl record, CD, etc.). There are a lot of songs on Genius.com that seem to only exist on YouTube and the like. These can still be cited, but only under the caveats for online media. Please correct me if my understanding is wrong. 70.172.194.25 23:24, 16 February 2023 (UTC)
That's what I was thinking. I would say that a song that only sold 1 copy during the Great Depression would be considered more durably archived than a song with a million views that only exists on YouTube, all other things being equal. Three citations, for all senses. (talk) 23:49, 16 February 2023 (UTC)

Link for absolutive in template inflection_of

Could someone please correct the reference given for 'absolutive' in {{inflection of}}. It currently misdirects, for Pali, to w:absolutive, which is an article on the absolutive case, which is singularly inappropriate for verbs. This may also be an issue for Sanskrit. absolutive#Noun would be a good reference. --RichardW57m (talk) 14:30, 13 February 2023 (UTC)

@RichardW57m There is no current support for making a given tag display in different ways for different languages. However, we've already run into the issue you describe and the way it's currently handled is by making the inflection tag be written differently but display the same, e.g. we have an inflection tag 'terminative case' and another 'terminative aspect' and they both display as 'terminative'. We have an 'absolute' tag that links to Appendix:Glossary#absolute; if this isn't the same as absolutive#Noun then I can create another tag 'absolutive participle' or something, with appropriate display and abbreviation ('absp'?). Benwing2 (talk) 18:59, 13 February 2023 (UTC)
That's the advantage of using an explanation on Wiktionary - the entry for the noun 'absolutive' already covers both meanings. The tag 'absolute' doesn't link to Appendix:Glossary#absolute - and there's no such fragment! I found I'd been using the tag 'abs', without realising that it didn't lead to anything very obvious. (These have already been converted to 'absolutive'.) I don't like calling it a participle - etymologically it appears to be the instrumental case form of a verbal noun, and it undergoes no further inflection. For the absolutive, I'd prefer a display form of 'absolutive', and for the inflection tags I'd prefer 'absvf', with long form, if needed, of 'absolutive verb form'. --RichardW57m (talk) 13:49, 14 February 2023 (UTC)
If you're planning to do the conversions yourself, note that the inflection tag 'absolutive' is used by both {{inflection of}} and its non-Roman Pali front end, {{pi-nr-inflection of}}. It's probably simpler to let me know when the new tag is available, and I can do the change myself - there aren't many entries for the Pali absolutives. --RichardW57m (talk) 13:49, 14 February 2023 (UTC)
Could you please set up maintenance categories (perhaps just one) to catch the use of the inflection tags 'abs' and 'absolutive' in Pali inflections, so that inappropriate uses can be found? --RichardW57m (talk) 13:49, 14 February 2023 (UTC)
At the second attempt, I think I've found the relevant code at Module:form of/data2. I'll see if I can implement this tonight as @RichardW57. Of course, another solution would have been to change the label to 'gerund', though a Slavic and an Indic 'gerund' are quite different adverbs. --RichardW57m (talk) 15:27, 20 February 2023 (UTC)
The new inflection tag (sane input: 'absvf') has now been created. I haven't yet tried setting up the maintenance categories, which I think are needed, as the use of the wrong tag is not blindingly obvious and in practice reading documentation is usually a last resort. I've fixed the 23 absolutive terms to use 'absvf'. --RichardW57 (talk) 08:44, 21 February 2023 (UTC)
Test words: disvā and ປຫາຍ (pahāya). RichardW57m (talk) 15:52, 14 February 2023 (UTC)

Universal Code of Conduct revised enforcement guidelines vote results

The recent community-wide vote on the Universal Code of Conduct revised Enforcement Guidelines has been tallied and scrutinized. Thank you to everyone who participated.

After 3097 voters from 146 Wikimedia communities voted, the results are 76% in support of the Enforcement Guidelines, and 24% in opposition. Statistics for the vote are available. A more detailed summary of comments submitted during the vote will be published soon.

From here, the results and comments collected during this vote will be submitted to the Board of Trustees for their review. The current expectation is that the Board of Trustees review process will complete in March 2023. We will update you when their review process is completed.

On behalf of the UCoC Project Team,

Mervat (WMF) (talk) 21:21, 14 February 2023 (UTC)

Discussion moved to WT:RFDO.

Simplification

There is a proposal on Meta that would substantially simplify page structure and reduce the risk of mistakes. Taylor 49 (talk) 18:07, 15 February 2023 (UTC)

That would be a very useful feature, thank you for raising that issue. JeffDoozan (talk) 18:32, 15 February 2023 (UTC)

en.wikt's options for lemmatization approach for prevocalic forms of prefixes

Seeking people's preferences on the following options, regarding lemmatization approach for prevocalic alternative forms of prefixes:

Right now en.wikt largely follows a pattern whereby the following pages are separate, and the cats have hyperlinked cross-references to each other:

I have been proceeding according to the approach above. Recently I noticed that another editor had deleted one of the prevocalic categories and changed the etymology sections' {{confix}} or {{affix}} parameters (for example, for |en|rhiz-|-ome to become |en|rhizo-<alt:rhiz->|-ome), most likely because they feel that all derived forms from what is (in lemmatic essence) the selfsame prefix should fall into a unified category. That desire is laudable; the only question is which approach en.wikt would settle on as its standard, if any consensus exists. To me it seems appropriate that if en.wikt has separate entries for the alt forms at all (many dictionaries do not; some do things such as headword "rhiz(o)-"), then it makes sense to retain separate categories for each and cross-reference them to each other. The advantage is that it is transparently clear at a glance which derived terms come from which alt form, which has a certain small philologic value. But if a consensus develops to retain separate entries but have the etymology sections link to a lemma form for categorization, I will follow suit and will make changes in that direction in future.

Thanks for any thoughts or tips. Regards, Quercus solaris (talk) 04:09, 16 February 2023 (UTC)

For Sanskrit, compounds are given in terms of morphemes, and explaining the assimilation is generally (perhaps always) left as an exercise for the reader. For example, no explanation is given of why the suffix -अन (-ana) often surfaces as -अण (-aṇa), though the explanation may be given when the latter variant, which at least gets a mention under the lemma form, gets an entry. (In this case the rule is so pervasive that one can't decline Sanskrit nouns without knowing it.) For Pali, treatment mostly follows the same pattern, though we seem mostly to be allowing a decent collection of Pali terms to build up before documenting them. (I don't entirely trust the text books - most of them are aimed at teaching the understanding of Pali.) --RichardW57m (talk) 11:54, 16 February 2023 (UTC)
For these English prefixes, I would select the longer form as the lemma and treat the forms with elision as variants, and call out any examples where expected elision fails as exceptions in the etymology of the examples. --RichardW57m (talk) 11:54, 16 February 2023 (UTC)
Sounds good, thanks. In the case of en.wikt's handling of EN's use of ISV prefixes, it is good that it is already the case that the longer form (i.e., the -o- form) is the main entry, and the prevocalic form points to it via {{alternative form of}}. The remaining question is how to handle the etymology of each derived term (for example, for |en|rhiz-|-ome versus |en|rhizo-|alt1=rhiz-|-ome) and thus how the autocats will be handled. TBD whether any consensus will materialize here/now (in this thread). If not, then perhaps for now this aspect will simply remain unstandardized (notwithstanding an inconsistency that is fairly venial anyway). In the meantime I will aim at least to finish ensuring that each sibling pair of autocats consistently cross-references (between each other). Someday, as I could envision, someone might impose a consistent method on the categorization aspect (whether me or someone else; the biggest theme regarding "who" is "whoever would bother to implement it, either way, scut-wise"); the rationale at that point (for which method to impose) would be "no one else had a strong preference, so flip a coin, then stick with that result afterward." Quercus solaris (talk) 18:52, 16 February 2023 (UTC)

Whether IPs can participate in CFI discussions

Bringing this up since I'm not sure about the best solution myself.

This is the relevant part of WT:CFI:

  • "Other online-only sources may also contribute towards attestation requirements if editors come to a consensus through a discussion lasting at least two weeks."

The issue is what we count as an "editor". Under Wiktionary:Voting policy#Voting eligibility, an account is a requirement for voting. Below that, we have:

  • "Where the consensus of editors is required for discussions other than formal votes at Wiktionary:Votes (for example, in discussion rooms such as Wiktionary:Beer parlour and on discussion pages such as Wiktionary:Requests for deletion and Wiktionary:Requests for verification), the support of at least two-thirds of the editors taking a supporting or opposing stance in a discussion on an issue is a hint for the threshold for consensus, but it is not set in stone. As a result, the consensus determination is somewhat indeterminate and can take into account considerations other than pure tallying. Tallying does play a role."

Since using the literal definition of "editor" meaning "anyone who has ever edited, including vandals" would be awful, the assumption would be to equate "editor" with "user who can vote".

However, a proposal to forbid IPs from participating in RFDs, which are similar to CFI votes, failed 6-8.

Arguments in favour

  • IP editors who establish themselves in the community are essentially equivalent to regular users, and often have a better knowledge of policy.

Arguments against

  • Many [6] CFI votes take place in WT:VOTES, which is inaccessible to IP editors. It would be easy to game the system by forum shopping (choosing where to hold the votes to try to get a better result).
  • It would be easy to hijack RFV discussions with a swarm of editors, e.g. by linking a vote to 4chan.
Come on, why is it always 4chan you people get on about? Actually, 4chan has a policy against brigading and they'll delete your links; Reddit doesn't, and Discord doesn't enforce any rules and is entirely deep-net. Fishing Publication (talk) 14:31, 28 February 2023 (UTC)

Possible solutions

  • Allow any IP to participate
  • Allow some IPs to participate, maybe with edit requirements
  • Allow only users to participate
  • Hold all CFI votes in WT:VOTES in the future (equivalent to above)

Ioaxxere (talk) 21:48, 16 February 2023 (UTC)

Regarding the argument in favour, how can it be ascertained whether a particular IP address represents a single individual? — Sgconlaw (talk) 22:33, 16 February 2023 (UTC)
The approach of sending these terms to WT:VOTES was an initiative of one user and was mostly disliked by the community. At least five of the abstentions on Wiktionary:Votes/2022-05/elfism validation pointed out that (in their view) it was a misuse of the formal vote process, and you can find similar comments on Wiktionary:Votes/2022-05/melanoheliophobia validation, etc. I don't think that would be a widely accepted solution. People wanted these discussions to be held on the RfV page itself.
I'm not sure why we need a specific policy on this. There's no specific policy saying unregistered users can't comment on RfDs or RfVs in general. Why would CFI discussions on whether to accept online sources be special?
It might be the final motivation I needed to create an account if the community decides I can't fully participate in RfV... or maybe I'll be content to watch from the sidelines and work on other things. IDK. 70.172.194.25 23:05, 16 February 2023 (UTC)
I agree with the IP here. I don't see the point of holding formal votes for CFI attestation discussions; it seems overkill. Furthermore this seems like a solution in search of a problem. We have one well-known IP editor who seems to have a static IP address and contributes to RFD/RFV discussions, and I haven't seen very many (if any) other IP's contributing like this. If and when we get unknown IP editors contributing to RF* discussions en masse, we can revisit the issue (or just ignore the unknown IP's). Benwing2 (talk) 06:04, 18 February 2023 (UTC)

Proto-Romance pronunciations in attested words

@Kwékwlos has been systematically adding Proto-Romance pronunciations to attested Late Latin words such as ceresia, cisorium, and rasorium. Similarly, Proto-Italo-Western pronunciations for portaticum, campania and Proto-Gallo-Romance for missaticum.

I can't say that I find the idea fundamentally incorrect. These pronunciations do seem more plausible for the period(s) in question—as far as popular speech is concerned, that is—than one based on Cicero's contemporaries, or one based on modern Italian clergy. I worry, however, that this may be a bit too 'adventurous' for the purposes of Wiktionary. A step too far, as it were. Thoughts?

Pinging @Ser be etre shi, Al-Muqanna, Catonif, Hazarasp, Ultimateria, Fay Freak as potentially interested parties.

Nicodene (talk) 02:05, 17 February 2023 (UTC)

  • If the issue is that these are reconstructions: Our Classical Latin pronunciations could be called reconstructions too, even if we can be pretty confident in them based on various forms of evidence, so I'm not sure that's enough of a reason to exclude these (not to mention that we also include reconstructed IPA for Old Chinese, Ancient Egyptian, and so forth in mainspace).
  • If the issue is that the reconstructions for Proto-Romance and the other branches are highly uncertain or contentious, I could see that as a potential problem.
  • If the issue is rather that Proto-Gallo-Romance */meˈsad͡ʒo/ shouldn't be treated as belonging to the same chronolect as Latin missaticum, then I can also see that as a potential problem. However, it seems like we do treat Proto-Gallo-Romance as a form of Latin in the reconstruction namespace, and even include IPA when doing so: e.g. Reconstruction:Latin/leviarium. If it's appropriate there, why would it not be appropriate in mainspace? Maybe it shouldn't be the only IPA we provide, but I wouldn't personally have an issue with including it alongside the Classical or Ecclesiastical pronunciations (as relevant). That said, I could also see a case for splitting Proto-Gallo-Romance off from Latin, like how Proto-West Germanic is split off from Proto-Germanic, but that's apparently not what the Romance editing community has chosen to do so far.
  • I don't know much about Latin or the Romance languages so these are mostly just intuitive off-the-cuff responses, feel free to ignore. 70.172.194.25 03:07, 17 February 2023 (UTC)
    The 'proto-pronunciations' are based mainly on the comparative method as applied to Romance (albeit with nebulous support from contemporary spelling mistakes), whereas the reconstructed Classical pronunciation is based mainly on evidence from the relevant period. Hence the former are 'at home' in entries reconstructed from Romance data, whereas the latter is so in attested entries, at least for time periods where a Classical pronunciation makes sense. I suppose that is the main difference in the end.
    I have thought about splitting up the proto-languages, actually, but thought that would be rather contentious. It isn't always clear which proto-language a given reconstruction should belong to, and there are a fair number of scholars who reject concepts such as 'Proto-Gallo-Romance' or 'Proto-Italo-Western Romance' (which entails also rejecting the branch model for Romance). How does all this go over in the Germanic community, I wonder. Nicodene (talk) 05:01, 17 February 2023 (UTC)
I generally support adding such reconstructed pronunciations where appropriate—it makes no sense, IMO, to only add them when a term isn't attested, i.e. when it's in the Reconstructions space. My ideal preference would actually be for {{la-IPA}} to generate a full suite of pronunciations over time similar to the pronunciation module for Ancient Greek. —Al-Muqanna المقنع (talk) 20:27, 17 February 2023 (UTC)
In my opinion, the main reason it’s helpful to display a reconstructed Classical Latin pronunciation and an Ecclesiastical Latin pronunciation on Latin entries is because both are common pronunciation styles used by present-day learners and speakers of Latin. That motivation doesn't apply to Proto-Romance, Proto-Italo-Western Romance, Proto-Gallo-Romance, Proto-Iberian-Romance etc. reconstructed pronunciations: it would be very fringe for any contemporary user of Latin to pronounce Latin words in that manner. As the list above indicates, there are many steps along the way from Latin to modern Romance. The best way for interested parties to get a complete and accurate picture of how their pronunciation evolved is to use a source that comprehensively describes the relevant sound changes. I think the normalized Latin spelling provides all of the important lexicographical information about the form of words of this type that had a phonetically regular development; the detailed history of their pronunciation is a matter better covered by other types of resources than dictionary entries. (Furthermore, there are disagreements among scholars about details.) For comparison, we don’t include additional pronunciations on Old English entries showing stages between Old English and Middle English, or on Sanskrit entries showing stages between it and its descendants.
At the same time, I do think that it can be misleadingly anachronistic to include the reconstructed Classical Latin pronunciation on pages for medieval Latin words or Late Latin words that may never have been pronounced that way. To me, it seems fine to just omit pronunciation information for words of this type. But I’m not strongly opposed to the practice of including reconstructions like */meˈsad͡ʒo/ in place of the reconstructed Classical Latin pronunciation in such cases, with an appropriate label describing the language stage that the reconstruction is supposed to represent (as in the examples).
I’m pretty strongly opposed to adding later pronunciations like this to the la-IPA template based on how it worked out in practice when we had a “Vulgar Latin” pronunciation in that module: it was included inappropriately on miscellaneous pages (at one point, it was inexplicably added to the article Status Uniti Americae) and the actually implemented pronunciation had several bugs that could not be fixed because there were no clear criteria in the first place for what it was meant to represent. Templates do have advantages, in particular consistency, so I would say having a separate template specifically for Late Latin or Proto-Romance pronunciation might be helpful; my main question would be if we’re dealing with a small enough number of nodes on the Romance tree to make a template of that kind feasible.--Urszag (talk) 02:34, 18 February 2023 (UTC)
I tend to agree with User:Urszag here. I don't really see the point of most reconstructed pronunciations, e.g. I'm generally opposed to Proto-Germanic pronunciations since (a) the spelling indicates pretty clearly the pronunciation, (b) any type of narrow pronunciation will not represent a scholarly consensus. Benwing2 (talk) 06:07, 18 February 2023 (UTC)
If we had a phonemic spelling for Reconstruction:Latin, as we do for Proto-Germanic, there wouldn't really be a reason to provide phonemic transcriptions. We're stuck, however, with Classical spellings (as a matter of policy), even for terms that might as well have originated in a community of Balkan shepherds in the eleventh century.
I don't recall seeing a narrow reconstructed pronunciation. Except our Classical ones. Nicodene (talk) 06:42, 18 February 2023 (UTC)
Inappropriate inclusion is always going to be an issue regardless. I would consider the Classical pronunciation currently listed at Status Uniti Americae to be equally inexplicable, and the "Ecclesiastical Latin" pronunciations currently given are really a 19th-century (Italianate) style that's not universal in the present-day Catholic Church, let alone trans-historical or generally appropriate for post-Classical Latin as the label by itself might imply. If the main purpose of the pronunciation template is the utility of modern speakers of Latin then that probably needs to be indicated somewhere, since it conflicts with what I'd take as the more intuitive understanding that the Classical pronunciation simply describes how the word was pronounced in the Classical era. A formally reconstructed Classical pronunciation will not necessarily be the same as a conventional "classicising" pronunciation by a modern speaker. —Al-Muqanna المقنع (talk) 09:30, 18 February 2023 (UTC)
Are you suggesting that the ‘Ecclesiastical’ label should be removed entirely and replaced by ‘Italianate’? I’d say that it’s definitely standard to pronounce words like caeli in an Italian fashion in Anglo-Catholic churches at least (though ‘Italianate’ would do as a label instead, I suppose). Overlordnat1 (talk) 10:25, 18 February 2023 (UTC)
Yes but it is not standard in e.g. Latin services in Germany. Its current use in England is a product of 19th-century ultramontanism which subsequently also bled over to Anglo-Catholicism. (There is some info on Wikipedia about the traditional English pronunciation preceding it if you're curious.) Italianate is the correct name for the pronunciation we give, “Ecclesiastical Latin” is far too broad and implies it is much older and more generally adopted than it is. So it should ideally be changed. —Al-Muqanna المقنع (talk) 10:28, 18 February 2023 (UTC)
This is a very acute point. Our pronunciation information about “Ecclesiastical Latin” is a virtual concept, essentially created by the Anglo-centrism of Wikipedia, which does not correspond to our own use of “Ecclesiastical Latin”, e.g. the Medieval Spanish one I had in mind when deriving Spanish mantel. I always found it odd, because in Germany, or Poland or wherever, this pronunciation is not used in any context, and I have only ever heard it in the Modern Latin scene attached to an Anglo-centric social media sphere, which of course often travelled to Italy and connected with its churchpeople (in Germany, where this pronunciation is unused in church, it would be labelled as Italian!); that explains its distribution. Fay Freak (talk) 14:05, 18 February 2023 (UTC)
After considering the above, I think I won't actively oppose others' including 'proto-pronunciations' in the mainspace, so long as they are properly done, chronologically plausible, and marked with {{a}} rather than {{lb}} (to avoid putting attested terms in reconstruction categories).
Urszag brings up a fair point about anachrony. It should be clarified that what Wiktionary simply labels as 'Ecclesiastical' is in fact 'modern Roman' or similar.
For words that are specifically Late or Medieval Latin, my inclination would be to show no pronunciation. Informed readers wanting to read a given word out loud in some modern pronunciation will have no difficulty using the spelling with macrons to do so (all modern pronunciations of Latin are, fundamentally, based on this type of reading). On the other hand, an uninformed reader, seeing a Classical pronunciation on an entry for, say, a term that originated in tenth-century England, would be easily misled to believe that that is how the term was actually pronounced at the time.
Agreed that it would be desirable to have something equivalent to our system for Greek, showing for instance 'fifth-century Roman' and other (properly researched) pronunciations. Preferably phonemic, as, incidentally, our Classical pronunciations also should be. Nicodene (talk) 18:51, 18 February 2023 (UTC)

Zhomron sockpuppetry

Greetings, is this the correct place to bring behavioral, user-related problems for administrator attention?

If any admin here has CheckUser ability, or would simply like to investigate this on behavioral evidence, I'd appreciate it if you had a look soon. Thanks, and kind regards. Elizium23 (talk) 18:53, 17 February 2023 (UTC)

This seems to be resolved thanks to Chuck Entz. 70.172.194.25 02:54, 19 February 2023 (UTC)

splitting WT:RFVN again (this time for real)

Last June in Wiktionary:Beer parlour/2022/June#The State of WT:RFDN it was proposed to split out Romance and/or Romance+Latin (=Italic) and/or Romance+Latin+Greek from WT:RFVN. It was also proposed recently to split out reconstructed terms. I only see 19 reconstructed terms currently in WT:RFVN and many more Romance terms. What do people think? My instinct is to leave Latin and Greek out of Romance, but maybe someone disagrees. Benwing2 (talk) 06:49, 18 February 2023 (UTC)

I'm skeptical. There's a large body of Latinate vocabulary in all of the languages of Europe- not just the Romance languages. There's also taxonomic nomenclature, which is explicitly (for plants and animals, anyway) based on Latin, and musical terminology based on Italian. Then there are the pidgins, creoles and mixed languages- some would even include Middle English as one of those. Anglo-Norman is more clearly fuzzy in that respect, though. Yes, there was Vietnamese to complicate things for the CJK(V) split, but the dividing lines for Romance are a lot blurrier. As for excluding Latin: we can't even decide what to treat as Latin vs. Proto-Romance half of the time. Chuck Entz (talk) 07:53, 18 February 2023 (UTC)[reply]
These seem like a lot of straw-man arguments. The definition of Romance is pretty clear. Proto-Romance is a special case that should go wherever Latin goes. Latinate vocabulary in non-Romance languages has no bearing on Romance language RFV's; the idea of splitting out Romance is that we have a lot of Romance-language RFV's and people who know one Romance language often know another and can assist. As for creoles etc., it's not like we have very many RFV's involving creoles or pigeons, and if we do, and it's not clear where to put them, it doesn't matter so much as long as it goes somewhere reasonable. Benwing2 (talk) 08:09, 18 February 2023 (UTC)[reply]
Can we still split off reconstructions... please? Vininn126 (talk) 09:04, 18 February 2023 (UTC)
@Vininn126 Sure, no objections here. Should it be reconstructions only or include dead languages in general? Benwing2 (talk) 11:04, 18 February 2023 (UTC)
(There was an earlier discussion on that, and it had overwhelming support.)
I think it's an interesting idea to merge them, but reconstructions are inherently different since there are no quotes. I think they should be separate. Vininn126 (talk) 11:10, 18 February 2023 (UTC)
@Vininn126 OK, I'll see if anyone else comments and if not I'll go ahead and execute the split in a couple of days. Benwing2 (talk) 20:22, 18 February 2023 (UTC)
Wiktionary:Beer_parlour/2023/January#A_verification_page_for_the_Reconstruction_namespace (just for reference). Vininn126 (talk) 07:57, 19 February 2023 (UTC)
I think I'd prefer to split out Italic (including both Romance and Latin but not Greek), which would avoid any arguments about Proto-Romance. As for reconstructions, someone pointed out in some recent discussion that they cannot really follow the normal RFV process anyway (by definition, there aren't any attestations for reconstructed words) so it seems like reconstructed terms would more properly go on a special RFD subpage, not an RFV subpage.--Urszag (talk) 03:01, 19 February 2023 (UTC)
I don't particularly care for whether Greek is included but would prefer to have Romance and Latin together. I'm perhaps a more 'extreme' case of overlap between the two than is typical but am far from an exception in that regard. Nicodene (talk) 04:20, 19 February 2023 (UTC)
Apologies for not responding to you on the prior thread, but I support the proposal as you've proposed it. AG202 (talk) 10:00, 19 February 2023 (UTC)
@Vininn126 I split out RFVR. But what about reconstructed terms in WT:RFDN? Should we have a separate WT:RFDR or combine both into WT:RFVR? Also, I have split out Italic, which leads to the question of what to do about reconstructed Italic-language terms in Latin, Old Portuguese, etc. Do those go into Italic or Reconstruction? Benwing2 (talk) 06:24, 23 February 2023 (UTC)
@AG202, Urszag, Nicodene Benwing2 (talk) 06:49, 23 February 2023 (UTC)
I'd like to get these moved as well. In the original thread it was suggested that we call it RFD Reconstructions or keep it as RFV. Since reconstructions will never have quotes, I think it makes more sense to have them all under an RFD page.
I can see the counterargument for having two, since some pages are OR and need cognates to be established. Vininn126 (talk) 08:58, 23 February 2023 (UTC)
@Vininn126 OK I'll put them all under an RFDR page. Question: What about reconstructed terms in extant languages? Cf. Wiktionary:Requests for deletion/Non-English#Reconstruction:Latin/consutura, Wiktionary:Requests for deletion/Non-English#Reconstruction:Latin/retina, Wiktionary:Requests for deletion/Non-English#Reconstruction:Old Portuguese/africão, etc. Do they go under RFDR or RFDI? Benwing2 (talk) 19:52, 23 February 2023 (UTC)
Since the reasoning for splitting out reconstructions applies whether or not the language in general is attested, they should be under RFDR, I think. —Al-Muqanna المقنع (talk) 19:58, 23 February 2023 (UTC)
I believe RFDR. Vininn126 (talk) 20:01, 23 February 2023 (UTC)
OK, I have split out reconstructed terms into WT:RFDR and Italic terms into WT:RFDI. All that remains is to fix up Wiktionary:Request pages. Benwing2 (talk) 23:59, 23 February 2023 (UTC)
It's beautiful. A man could cry. Vininn126 (talk) 00:03, 24 February 2023 (UTC)
@Benwing2, Fenakhay: Just a heads-up that the links created on pages by the template {{rfv}}, and subsequently its documentation, have to be adjusted. Fay Freak (talk) 09:32, 1 March 2023 (UTC)

THUB amendment

I've created a vote. Feel free to suggest other changes. PUC 11:25, 18 February 2023 (UTC)

What’s the rationale for the proposed change? — Sgconlaw (talk) 20:38, 18 February 2023 (UTC)
@PUC I agree with Sgconlaw. I'm in support of a change, but I can't bring myself to vote on it if I don't know the rationale. It seems rather sudden. AG202 (talk) 00:29, 19 February 2023 (UTC)
@Sgconlaw, AG202: I've added a rationale on the vote page. Please reword it if necessary. PUC 12:36, 19 February 2023 (UTC)
You might want to explain why smoked salmon is useful. Drapetomanic (talk) 14:24, 19 February 2023 (UTC)
Yeah, the justification looks circular to me: it seems to just say that some people have been ignoring the policy in delete discussions so the policy should be changed. The ongoing RFD discussion for smoked salmon also doesn't show much consensus for its usefulness as an entry at this stage. Is the idea to force a vote for their benefit and leave the rationale to them? I'd oppose the change in its current form since I don't think we should be opening the floodgates to entries for every English noun phrase that happens to be written as one word in German and Swedish. —Al-Muqanna المقنع (talk) 15:43, 19 February 2023 (UTC)
Bella Coola would allow whole sentences through. For German and Swedish, and for that matter Sanskritic languages, we're relying on self-denial already to keep the floods out. But don't we want translations for 'horses, elephants and men'? RichardW57 (talk) 23:21, 19 February 2023 (UTC)
Well, the sentence "The attested English term has to be common" gets us out of that pit... sure, in theory we could create a page for a word in a polysynthetic language that translates to "with my car, I drove her to the airport", but it would still automatically fail the criterion of being a common phrase in English. Soap 07:45, 20 February 2023 (UTC)
I'd rephrase this as "some people have been ignoring the policy in delete discussions, so we should consider whether the policy still has the support of the community or should be changed". Megathonic (talk) 22:19, 20 February 2023 (UTC)
The part about deletion discussions was removed, which is better, but the new list of examples is a bit of a grab-bag and not wholly relevant to the point: the newly created water shortage does not fall under this criterion, since at least the Japanese, Polish and Russian translations are not closed compounds identical per component to the English; the same goes for the languages with cognates of sideropenia at iron deficiency; breast cancer was kept for other reasons (see the discussion at Talk:breast cancer). —Al-Muqanna المقنع (talk) 23:59, 19 February 2023 (UTC)
@Al-Muqanna: I've removed the offenders and added some new examples fetched from Category:RFD result (failed). Please comment on these. heat-resistant is imo a particularly compelling case. PUC 19:53, 20 February 2023 (UTC)
@PUC There are already two non-SOP translations given for heat-resistant, so this would qualify regardless. Benwing2 (talk) 21:42, 20 February 2023 (UTC)
I just want to make sure I'm understanding one thing in particular. We currently have the qualification "At the very least, two qualifying translations must support the English term. Editor judgment can require a higher number, on a case-by-case basis." Does this mean that to create a THUB entry, there must be two different languages that have a translation for that term? Or is this a reference to our general-purpose CFI guidelines, meaning that two uses of the term in a single language are enough to qualify for each entry, and that only one language is needed? This is established policy and perhaps I should know this already, but for my benefit and others' I'd like to make sure it's clear to all of us what this line means. Thanks, Soap 07:53, 20 February 2023 (UTC)
Your first interpretation is right (“two different languages that have a translation for that term”). Technically, one could argue that one language with two qualifying translations would also work under the literal wording, but that’s not how this has generally been understood and applied AFAIK. Number of quotations is not directly relevant, although of course the qualifying translations should be attestable. 70.172.194.25 19:28, 20 February 2023 (UTC)
Okay, thanks. I think this is good, and I also oppose the recent change in wording that moves the barrier from two to ten languages... really, I don't see any reason to worry about a flood of spurious words. I've posted on the talk page of the vote since I think that it might spawn a conversation of its own that would outlast this thread. Soap 23:22, 20 February 2023 (UTC)
Personally I don't see the slightest problem with phrases such as "animal rights activist" being proliferated. These are typical word combinations in the English language, which in and of itself makes them useful (you wouldn't say "animal rights advocater"), and they're also useful for looking up how one would articulate that phrase in another language. It's not like we have a lack of space for entries, and this won't change the fact that an English multi-word term can't be added if it's not common in the language. Thus there will be no "animal garden" entries for Tiergarten or words which are translatable as entire sentences. Megathonic (talk) 22:31, 20 February 2023 (UTC)
For my part, I don't see why we should have "car key" just because there exists "Autoschlüssel", so I wouldn't support this change (but if other people do, then sure, let's vote on it). - -sche (discuss) 23:09, 20 February 2023 (UTC)

Header when an abbreviation does not correspond to a part of speech

I see that the 'abbreviation' heading is deprecated, because we should use the part of speech instead. But what should we do when there is no set part of speech? I'm adding stenoscript abbreviations, and often they're for a word and its derivations, or for homophones, or for both independent words and affixes, and so may be for multiple parts of speech. Please ping. kwami (talk) 04:14, 19 February 2023 (UTC)

@Kwamikagami: can you provide some examples? — Sgconlaw (talk) 05:32, 19 February 2023 (UTC)
In the book I'm checking at Internet Archive, only two of the more lexically specific abbreviations are expanded in any detail. One of these is 'og', which they list as organize, organized, organizes, organizing, organizable, organization, organizational. That's an illustrative example for the rest. That is, the more lexically specific abbreviations stand for a root and any derivations of that root, rather than for a fixed set of words, and the specific derivation (and part of speech) is determined from context. I suppose we could list 'og' four times for noun, verb, adjective and adverb (organizationally is understood; it just wasn't mentioned), but the rest of the abbreviations are defined by analogy, so it would be OR for us to try to list all the possibilities or even all the parts of speech.
The monoliteral abbreviations generally have more exhaustive definitions. E.g. 'b' is be, been, being, by, bye, buy, but. That would be a bit more straightforward to list multiple times for different parts of speech, but it would be more helpful to the person looking it up if they were all listed together. kwami (talk) 05:51, 19 February 2023 (UTC)
Shouldn't the senses be grouped at the L3 level by an 'Etymology' header? --RichardW57 (talk) 14:22, 19 February 2023 (UTC)
I've seen that, but I've also seen etymology sections that group all abbreviations together. If an abbreviation has many expansions, granting a separate etymology section to each could get out of hand. 70.172.194.25 15:55, 19 February 2023 (UTC)
The second type is what I was suggesting. For example, an obvious etymology for 'b' is "Formed as the first letter of a short word; for be, also assisted by being homophonous." (One would need to check whether the second clause is true.) The structure is made to serve man, not man to serve the structure. --RichardW57 (talk) 16:03, 19 February 2023 (UTC)
Ah, sorry for the confusion. I agree that this is a reasonable approach. 70.172.194.25 16:40, 19 February 2023 (UTC)
What are you thinking of for the definition line under an Abbreviations header? —Al-Muqanna المقنع (talk) 15:46, 19 February 2023 (UTC)
I've been separating words, affixes and sound/letter sequences as 3 different definitions, and occasionally listing the words under more than one. But there's also the question of which template to use under the header to replicate the lemma. If I use 'abbreviation', I get a warning that that template is deprecated, but I'm still able to use it. Presumably someone will get pinged that I did so and will come to review it, so I suppose that should be acceptable. I've also been using 'symbol', but that's not good practice. kwami (talk) 21:51, 19 February 2023 (UTC)
@Kwamikagami I'd prefer a POS other than 'abbreviation', otherwise we will have to try to distinguish "acceptable" from "deprecated" uses of 'abbreviation', which will be hard for bots to handle. Maybe 'polysemous abbreviation'? The abbreviations you're working with are of a special sort and I'd like to avoid people thinking it's OK to go back to using 'abbreviation' for regular abbreviations. Benwing2 (talk) 03:24, 20 February 2023 (UTC)
@Benwing2 Makes sense to me. It looks like we would need to add that as an option for the 'head' and 'head-lite' templates. kwami (talk) 04:45, 20 February 2023 (UTC)
I would put all the separate parts of speech. After all, organization and organizational and organize are not the same word and not used the same way in a sentence, so lumping them on one line doesn't really work. I think these cases are quite rare. Equinox 03:26, 20 February 2023 (UTC)
Thinking about it, I agree with Equinox. In most cases there would only be a few part-of-speech headings anyway (for example, adjective and noun). I’m guessing it would be quite rare for a particular abbreviation to use multiple headings, and even if that were the case we’re probably only talking about four or five different headings. — Sgconlaw (talk) 04:23, 20 February 2023 (UTC)
Actually, these cases are the norm. Nearly all stenoscript abbreviations would need more than just a couple POS headings, often lots more. It seems overkill to have half a dozen headings for a single abbreviation. kwami (talk) 05:01, 20 February 2023 (UTC)
I went through 18 of the 85 abbreviations in my list (got too bored to continue). Of those 18, the majority would need 5 or more headings apiece: 6 of them would need 5, 4 would need 6, one would need 7, and one would need ten (adjective, adverb, conjunction, determiner, interjection, noun, pronoun, suffix, verb, and as a sound). I really think that listing the same stenoscript abbreviation under ten separate headings is excessive. kwami (talk) 05:30, 20 February 2023 (UTC)
Personally, I don't think we should add a new part of speech like "polysemous abbreviation". I tend to agree with Equinox that we should add whatever parts of speech are attested; I don't see why a page listing a lot of parts of speech would be a bad thing when the string is attested in a lot of parts of speech: that's how it works, that's why we have a verb sense at verb and a noun at Schwimmen, etc. But if we do want to lump these together, surely an existing (or broader) POS would be better... I thought "Particle" was the catch-all for (most) things that didn't fit in other POS? "Symbol" might also work, if one is taking the view that organize, organization, organizational etc. are not to be considered separate parts of speech but just different roles the 'symbol' og can be used in. - -sche (discuss) 23:03, 20 February 2023 (UTC)
I've used 'symbol' for some articles. But they're not symbols, they're abbreviations. The problem with separate POS's is that it will often be OR: we would need to invent those derivatives, yet it would be false to claim that something like adm "administer and derivations of that word" is a verb. kwami (talk) 23:22, 20 February 2023 (UTC)
I don't follow: if a POS (e.g. the verb administer) is attested, it meets CFI and we add it (it's only "OR" to the extent that our whole project of descriptivism entails OR); if the POS is not attested, we can't add it to mainspace (not even under a "polysemous abbreviation" header, or any other header), even if some other list of abbreviations asserts it exists; we can only add attested parts of speech (e.g. the noun administration). - -sche (discuss) 23:36, 20 February 2023 (UTC)
We'd be misrepresenting sources to say that e.g. adm is a verb. Very few of these abbreviations have a fixed part of speech, and that's inherent in the design of stenoscript. This differs from regular English orthography, where it's reasonable to make the assumption that e.g. jump is a verb until we attest that it's also a noun. kwami (talk) 23:46, 20 February 2023 (UTC)
If we'd be misrepresenting sources to say that adm is a verb, then we don't need to discuss whether it needs a ===Verb=== header or just to be mentioned under a ===Polysemous abbreviation=== or ===Particle=== or ===Symbol=== header, do we? If the use of adm to mean the verb administer is not attested, then we can't list it under any header, and the question of which header to use seems moot. If it is attested in a place in a sentence where it must be a verb (specifically, if it's attested in three such uses spanning a year), then it's a verb. If it's attested in places where it must be a noun, and also in some ambiguous places where it could instead be a verb, or adjective, or whatever else, but it's not attested in any places where it's unambiguously a verb/adjective/etc., ... we already have that exact problem and deal with it by only positing POS for which we have unambiguous evidence. - -sche (discuss) 00:03, 21 February 2023 (UTC)
But we do have unambiguous evidence: the manuals for stenoscript. By listing adm as simply a verb, even though we can adequately attest to it as a verb, we make a false claim. We should not be providing false information just because a legalistic reading of our guidelines tells us we need to lie to our readers. All rules on Wikt. are subject to common sense and not providing misinformation. As above, the rules serve us, we don't serve the rules. kwami (talk) 00:40, 21 February 2023 (UTC)
For example, according to Avancena, Stenoscript ABC Shorthand (1950, 1951, 1956, 1958, 1959, 1961, 1962, 1963, 1964, 1965, 1967 and presumably later editions), "there are 24 brief forms. [Each] applies to any form of the word it represents. ak would represent not only acknowledge but also acknowledgement and acknowledging."
kwami (talk) 01:10, 21 February 2023 (UTC)
The problem is that you're not really telling what "ak" means. You're saying it's short for some word that starts with "acknowledg-", but which one, exactly, will have to be deduced from the context. I'm not sure if that's all that much different from saying that "Mr. J." refers to someone, but who, exactly, will have to be deduced from the context. If on the other hand, you're saying that it can be short for one of a limited set of things, then each of those things has at least one part of speech inherent in it. You're not saying that "ak is a verb", you're saying that "when ak is short for 'acknowledge'/'acknowledges'/etc., it's a verb". Chuck Entz (talk) 01:36, 21 February 2023 (UTC)
Yes, but ak is not always a verb. It's also short for 'acknowledgement' etc., and 'acknowledging' isn't specifically verbal. I've found another manual that may spell out these 'brief' forms more explicitly. E.g. 'adm' "administer" is also "administrative" and "administrator". Basically, all derivatives of the stem "administer". That's a potentially open-ended list, and it would be OR for us to claim that it's restricted to a specific list of words. kwami (talk) 03:00, 21 February 2023 (UTC)
Re "But we do have unambiguous evidence: the manuals for stenoscript", "it would be OR for us to claim that it's restricted to a specific list of words", "our guidelines tells us we need to lie to our readers": this seems to be the core of the difference in views, because from my perspective that's backwards: it would be "lying to readers" if we said adm stands for something which in fact no-one can be shown to have used it to stand for. Refraining from positing unattested words is not OR, but basic application of the spirit (as well as letter) of the rules of WT:CFI; it's not "legalistic", it's how descriptivism works. For English and other WT:WDLs, manuals which mention but don't use words, the lists of phobias new users sometimes try to add entries from, etc, count for nothing: if there's no work using adm as a verb (or whatever), no-one will run across it and want to know what it means, so it's fine to not have a mainspace entry. Quite possibly we could have an appendix of stenoscript abbreviations regurgitating those manuals for people who do start from the other direction, not running across adm and wanting to know what it means but rather wanting to know how to encode administered into stenoscript. - -sche (discuss) 17:45, 21 February 2023 (UTC)
...but I think I see where you're coming from; I wouldn't want to define e.g. Kymber as "First part of the full names of [list of individuals named Kymber]" (or "Last part of the full names of [list]"), I would generalize that it is a first/last name, and you want to generalize from cites of adm (etc, etc) in various POS to "adm stands for any word starting administ-". Hmm, this requires more thought... (I'm still not fond of "Polysemous abbreviation" as a header, but I admit something like "Stenoscript abbreviation" might be worse because then it'd be narrow and couldn't handle if we wanted to include similar cases in another system.) - -sche (discuss) 17:59, 21 February 2023 (UTC)
The stenoscript manuals are instructional rather than exhaustive; for the 24 "brief forms" they don't provide a full list but rather instruct the reader to use them for a base word "and related forms". I found a short stenoscript dictionary that supplies many additional words that these abbreviations can be used for, and some of those surprised me (I wouldn't have thought of them myself), but even so they're just the most common of a potentially open-ended list.
It's quite possible that two of the brief forms, bz for 'business' (and also regularly for 'busy' etc.) and co for 'company', are only used for those words and their plurals, but it would be OR for us to claim that without a supporting source. kwami (talk) 20:35, 21 February 2023 (UTC)
The concept of OR is a Wikipedianism; it doesn't function here (and wouldn't prevent only providing attested senses and rejecting unattested senses, anyway). The main project of Wiktionary is "OR", we specifically don't just copy what other dictionaries, manuals, etc say (at least for English, the language under discussion here). It would violate CFI and fail RFV if we claimed that bz stood for something we couldn't find anyone using it to stand for. Things from manuals can't be added here unless they're attested in use; we exclude things that are in dictionaries but not in use; they fail RFV (yllanraton, gorget hummer, etc). - -sche (discuss) 22:53, 21 February 2023 (UTC)
I see those as two very different things: a dictionary records what people already use. If we can't confirm, it's possible that the dictionary made an error, or that the word is hopelessly obscure. Stenoscript manuals do the opposite: they instruct people on what to use. Whether we can confirm the words outside the manuals is rather beside the point. If the constitution of a small country declares that X is the national anthem, it would be appropriate to report that X is the national anthem even if we couldn't attest to it actually being played anywhere. kwami (talk) 01:22, 22 February 2023 (UTC)
But we are ourselves a dictionary that records what people already use, hence whether words are used is not beside the point, it is the point. For well-attested/documented languages like English, we don't include words that some manual instructs are to be used, if they're not actually used. - -sche (discuss) 23:00, 22 February 2023 (UTC)
But these aren't words, they're spellings, and the manuals are attestation that they're used, because thousands of people use stenoscript. Direct attestation is difficult because stenoscript is not commonly published or digitized. It's rather like signal flags or Morse code: difficult to attest to directly, because they tend to be ephemeral, but we know they're used because people follow the manuals that define their use. kwami (talk) 00:06, 23 February 2023 (UTC)
However, if we haven't encountered it in our research, but someone else comes across it and wants to know what it means, we won't help them. Not so good. That's one of the reasons for allowing regular inflections regardless of the lack of evidence. --RichardW57 (talk) 18:11, 21 February 2023 (UTC)
Yes, it's very much like regular but unattested inflections.
Stenoscript is usually in ms form and almost never published. For normal English orthography, we can search electronic DB's like GBooks. Not so with stenoscript. Also, there's no overarching design in English vocabulary, but there is in stenoscript. Comparing them is comparing an organic to an artificial inventory. So IMO it's a very different situation. kwami (talk) 20:22, 21 February 2023 (UTC)
Also, we would sometimes need more than one header for the same part of speech. For example, co 'company' is already a noun, plural cos. But the stenoscript co does not have that plural; plural marking is optional, but when present it is a dot below the last letter, not an s. So we'd need two 'noun' headings for co, one with plural cos and one with optional plural cọ (or whatever). kwami (talk) 23:40, 20 February 2023 (UTC)
Since we already face that issue with the plural of Chinese being Chinese, obiectūs being the genitive inflected form of obiectus, etc, we can handle it the same way. - -sche (discuss) 23:46, 20 February 2023 (UTC)
So two 'noun' headers or two 'verb' headers? kwami (talk) 23:49, 20 February 2023 (UTC)
That would work, although I notice Chinese gets by with only one noun header that says "plural Chinese or Chineses" (the entry doesn't actually have a separate POS or definition for "plural of Chinese" at all), and likewise obiectus has no separate POS section or definition for obiectūs as the genitive of obiectus (although I think I have seen Latin entries which do give homographic inflected forms their own POS section and definition line, and that also seems reasonable enough). - -sche (discuss) 00:03, 21 February 2023 (UTC)
I generally don't create separate headers for stuff like -ūs vs. -us and ablative -ā vs. -a since it seems redundant to me when it'll already be bolded in the inflection table (ideally the separate pronunciations should still be given where appropriate though that's tangential), but either way works. —Al-Muqanna المقنع (talk) 17:50, 21 February 2023 (UTC)
Separate from the question of whether to lump "abbreviation for any word starting organiz-" under one sense and POS or split it by POS, it occurs to me that iff we do lump, a header like "Polysemous abbreviation" still looks like it applies to any abbreviation that stands for multiple things (e.g. SB or EG), so I'd expect new users would just add new (non-stenoscript) abbreviations using that header if they perceive we're replacing "Abbreviation" but allowing "Polysemous abbreviation". So iff we want to lump, perhaps we'd be better off just sticking with "Abbreviation" as the header, using a template for stenoscript abbreviations which puts them in a stenoscript category, and then our "cleanup" list of "bad" uses of the "Abbreviation" header would be "entries using the POS header 'Abbreviation' but not in the 'stenoscript abbreviations' category"? (Or use a specific header like "Stenoscript abbreviation"?) - -sche (discuss) 22:53, 21 February 2023 (UTC)
I think either of those last two options would work. I'm not sure what to think about a 'stenoscript abbreviation' header. My first reaction is that I don't like it, but I don't have a good reason for opposing it. Maybe I just feel it's granting too much importance to stenoscript.
What I've been doing in the meantime, based on what I've seen on Wikt, is adding a new etymology header, writing "Abbreviations" as the first line, and then listing the stenoscript uses without a POS header. In some cases, there's already an etymology for abbreviations, and I've added the stenoscript uses above the POS sections. Sometimes there's already an etymology section for abbreviations and the stenoscript abbreviation is reasonably close to a single POS, so I've just added it under 'verb' or whatever. E.g. O for the preposition 'out', though 'O' is also used for the sound of 'out' in any word, so listing it under 'preposition' is really too limited.
I'll clean these up when we decide how they should be organized, but meanwhile I wanted to get the info in while I have the chance. kwami (talk) 01:31, 22 February 2023 (UTC)
Re "granting too much importance to stenoscript": I have to admit I'd never heard of it before, and we have no entry for stenoscript on Wiktionary (though WP does have an article). Equinox 01:35, 22 February 2023 (UTC)
You can take it in adult ed courses in the US. I used it for taking notes in college, so wanted to see it in Wikt. But I don't know that the user population is very large. kwami (talk) 01:42, 22 February 2023 (UTC)

desysop -sche???

User:Gnosandes created what I believe is a frivolous vote to desysop User:-sche, see Wiktionary:Votes/2023-02/Desysop -sche. This was based on blocking User:Shumkichi. I have no context whatsoever on what happened with User:Shumkichi but IMO User:-sche is one of our most respected editors and admins. Anyone object to nuking this vote? Benwing2 (talk) 03:18, 20 February 2023 (UTC)

I think Gnosandes was sincere, so it wasn't frivolous in that respect. I'm reluctant to nuke it on those grounds, since even a highly respected admin has to be accountable. Having people oppose us is just part of the job. The only reason for stopping the vote would be the equivalent of a w:WP:SNOWBALL close: if there's no way the vote could possibly succeed and it's all a waste of time and unnecessary drama, I could support that. I'm not sure about the legalities, though. Chuck Entz (talk) 03:49, 20 February 2023 (UTC)
OK. My personal opinion is that this really does have a snowball's chance in hell of passing, and it will in fact create unnecessary drama, but we can let it play out for a few days before nuking it. Benwing2 (talk) 03:54, 20 February 2023 (UTC)
I voted; no problem with deleting my vote if you nuke the request. There's a tag that the voting hasn't started, but according to the schedule it started 29 hours ago. kwami (talk) 05:00, 20 February 2023 (UTC)
Shumkichi has a history of aggressive outbursts. Vininn126 (talk) 05:23, 20 February 2023 (UTC)
I would have preferred letting it play out. I don't know any specifics, but it would take a lot of evidence of especially bad behavior to offset -sche's record of exemplary contributions and handling of controversy and make me vote to desysop. DCDuring (talk) 18:33, 20 February 2023 (UTC)
Agreed. I'm not sure it's a good idea in principle to prevent votes just because they're unlikely to succeed, though there's not much doubt of the outcome in this case. Having said that, IIRC Gnosandes has said before that he believes every admin should be desysopped and has voted to that effect in the past so it might be considered frivolous purely on that basis in any case. —Al-Muqanna المقنع (talk) 18:44, 20 February 2023 (UTC)
I would love to see the quality of an admin-less Wiktionary. Imagine how it would look after a decade or so. One big mudball of SEO spam and nonsense. Equinox 18:53, 20 February 2023 (UTC)
Somehow I don't think it would take nearly that long. ‑‑ Eiríkr Útlendi │Tala við mig 16:49, 21 February 2023 (UTC)
I see. In that case, it does seem frivolous. But we sysops still look a bit cliquish when we suppress a vote to limit "one of our own". I don't want to extend the frivolity. DCDuring (talk) 18:54, 20 February 2023 (UTC)
He's also basing it on nothing - even Shumkichi, the person who caused the vote, voted against desysopping, and indeed things transpired. In my opinion much of what Gnosandes does is to be contrarian or inflammatory, and many users can testify to that. Vininn126 (talk) 19:00, 20 February 2023 (UTC)[reply]
I don't mind whether the vote runs or is deleted, though if kept it should be listed on WT:V and in the watchlist box, which it wasn't. I appreciate the words of support above. :) For those who asked, the context was: Shumkichi nominated himself for adminship, other users said he shouldn't be an admin because of attacks on other editors, Shumkichi replied with attacks against opposers (vote, archive), I saw he'd been blocked a dozen times by half a dozen admins for that since 2021, and so as an AFAICT/R uninvolved observer of more such behaviour, I imposed a block of double the most recent block (3 months by Fenakhay last September). Gnosandes objected, but since I was aware Gnosandes had a history of tendentious editing (being blocked himself by several different admins for that) and he was only one of the people Shumkichi insulted on the vote page, I left the block in place. (Another user who'd been criticized by Shumkichi on the vote page later reduced his block, which is, as I said with regard to my block of Dan, how I think that should go: if I impose a block I think is appropriate, or any admin does, other users should discuss and other admins should modify it if they think that appropriate.) - -sche (discuss) 21:30, 20 February 2023 (UTC)[reply]
@-sche Thanks for the context. I completely agree that the correct response if you think a block is overly long is to reduce it (if you're an admin) or get an admin to reduce it (if you're not an admin); not to try and desysop or otherwise retaliate against the blocking admin. Benwing2 (talk) 21:38, 20 February 2023 (UTC)[reply]
@-sche, @Benwing2, Sorry, but I must ask this. Why haven't both of these users been permablocked? I've mentioned this before, but what's the purpose of these short term blocks if they don't do anything to make the users change? I'm glad that we were able to finally decide to permablock Dan Polansky, but like I don't understand the enforcement mechanism here. As @Vininn126 linked to, for Gnosandes, "He has been blocked several times in fact", and many users have talked about how he's been impossible to work with and causes havoc. Is this not grounds for a permablock? I just feel like this emboldens users to continue their same behavior when they know that they'll eventually be allowed back in after pulling the same crap. AG202 (talk) 22:54, 20 February 2023 (UTC)[reply]
@AG202 Shumkichi, contrary to appearances, does slowly improve, and definitely has good intentions for the project, and does indeed do good work. Vininn126 (talk) 22:57, 20 February 2023 (UTC)[reply]
@AG202 Personally I do think Gnosandes should be perma-blocked if he causes any more problems. Benwing2 (talk) 23:35, 20 February 2023 (UTC)[reply]
Seconded. Vininn126 (talk) 23:36, 20 February 2023 (UTC)[reply]
That’s good to hear at least, thanks for the clarification. The other user though… at least to me doesn’t seem to have any redeeming qualities. AG202 (talk) 23:32, 20 February 2023 (UTC)[reply]
I don't think this was an entirely frivolous vote proposal; it was a mistake to preemptively nuke it. While Gnosandes might wish for a wiki without admins, it's not like they're just willy-nilly nominating admins to be desysopped. We just had a vote to desysop an admin for the allegedly excessively long block of 1 month, so it's not out of the question for someone to see a 6 month block (which ended up being reduced to just 10 days) as an abuse of admin power. Whether that merits desysopping or not is a question for the community to answer, as it is the community who admins answer to. If it weren't for the fact that I intended to vote "oppose" just now and don't think the vote had the slightest chance of passing, I'd recreate the vote myself out of principle. Megathonic (talk) 22:16, 20 February 2023 (UTC)[reply]
@Al-Muqanna: What you have written is not entirely correct. I wrote this in the context of the fact that administrators cross all possible boundaries by blocking users for nothing for a very long time or forever. It's just not normal. Therefore, Chuck Entz's words look very reasonable.
@Vininn126: This is called finding the first guilty, and then everyone will consider him guilty. User Sławobóg is not going to tell about himself, obviously. Too one-sided view. I may disagree and I may not follow your point of view. But because of this, I will not be wrong, it will not be called that it is impossible to work with me, &c. If person Sławobóg doesn't want to have a discussion, then that's not my problem. Centralization does not work in Polabistics (in Wiktionary), there have been no discussions, recent studies (Kortlandt 2011; Rachwał 2022, &c.) do not show what user Sławobóg says.
@-sche: Say directly that you have blocked user Shumkichi forever. Since I participated in the voting, I have not seen any attacks from this user. It is likely that user Fay Freak wanted to tell you the same thing that I did. But you didn't answer him. Then I personally wrote to you, but you also ignored it. You did not respond, although you are an admin. You just ignored two users. I also see that the administrator (that is, you) do not know what an insult is, because user Shumkichi did not insult me, as you claim. Obviously, after you ignored me (and not only ignored me) I waited a little longer in the hope that you would write and created a vote to attract attention. No one forbade me to create a vote. History has shown that your words in parentheses about Dan and so on don't work because you ignore users. Since I look at all this not like you or Benwing2, but I look at it objectively and impartially, I am not interested in whether you are a good admin or a very good one, but I see that you do not know what an insult is, which means you can incorrectly assign blocks.
As a result, I am glad that Shumkichi has been unblocked. Although I have a bad relationship with him. Gnosandes ❀ (talk) 11:24, 21 February 2023 (UTC)[reply]
I don't know about Polabistics but it's a bit bizarre to claim that users are being blocked for nothing. One can disagree with the reasoning but, to me at least, there doesn't appear to be any systemic problem of admins roaming around blocking people arbitrarily, and in any case as Shumkichi's case shows (and Dan's on previous occasions) what one admin does is often undone by another. —Al-Muqanna المقنع (talk) 11:31, 21 February 2023 (UTC)[reply]
@Al-Muqanna: But why does he ignore users who disagree with this block, when even Shumkichi himself emailed him that he was joking? -sche was watching my blocks, so obviously he was interested in me, which means the ignoring was intentional. But my blocks do not change anything, because administrators Rua, Vahagn Petrosyan and -sche also had blocks in the past. Gnosandes ❀ (talk) 11:56, 21 February 2023 (UTC)[reply]
Yes, users with long histories of berating users should not be punished for berating users. Truly such grounds for blocking are unjustified and senseless. And no, an email saying "it was just a joke bro" does not excuse the behavior. Surely you can't believe this. Vininn126 (talk) 11:58, 21 February 2023 (UTC)[reply]
@Gnosandes thanks for standing up for me i guess but i really don't need it. listen, i also find some ppl here rather annoying and i think they can be a little pretentious and certainly oversensitive as they can't take some innocent trolling like a man, but it doesn't mean that we should block them or take away their privileges. not taking into account their contributions to the project and focusing only on their activity as admins/mods/whatever (so blocking, technical things, you know, all that boring stuff) is something i've always opposed. adding and editing entries >>>>> technical stuff and petty drama (even though i also sometimes like to take part in it hehe). ignoring this user's contributions as an active editor is actually not impartial at all, it's unnecessarily retaliatory.
so plsss, can we stop discussing all of this already? it's getting rather boring zieeew ;__; go drink some tea (but not coffee, i hate its taste bleeeee how can you even swallow it and not suffocate immediately?) and go back to editing :3 or if you really want to keep arguing, why can't you do it privately? i suggest you do it via email or on Discord as I've done several times, it's a really great way to exchange insults! and once you stop calling each other names, you can go back and add more entries to vent your frustration. Shumkichi (talk) 13:49, 21 February 2023 (UTC)[reply]
Since I like to ask a lot of questions, this defense will be an experiment that will bring me some data. I just don't go and say that this user should be blocked forever and this one is on the minus forever, &c. It is obvious that these administrators itch their hands, they splash out on other people their despair in lives. They delete votes just like that, having bothered to write two unintelligible words in the reason for this deletion. Without discussion. WP:SNOWBALL is not the main rule, so it is written.
Take the case of accentology. How many times have I said that many reconstructions are wrong, even the system itself is wrong, I wanted to change it and wrote discussions. No one answered me, because no one understands this. Question. How should I change the system if it can only be changed through discussion, but no one discusses it because no one knows this section of science? Then I start changing it myself, but people come and say that you have disruptive edits. Gnosandes ❀ (talk) 14:41, 21 February 2023 (UTC)[reply]
maybe your edits were simply incorrect? Shumkichi (talk) 15:16, 21 February 2023 (UTC)[reply]
User:Gnosandes: the basic issue is that you have strongly held fringe beliefs (both politically and in terms of Slavic accentology) and think it's appropriate to try to force your views into Wiktionary; and furthermore you show no interest in working *with* or cooperating with others (in fact you think everyone but you is clueless, based on your comment just above "because no one understands this"). I'm giving you a warning now that I will perma-block you if I hear any more substantiated complaints about future behavior of this nature. You probably think this is unfair, but it's necessary to protect the community and project as a whole. Benwing2 (talk) 08:05, 22 February 2023 (UTC)[reply]
@Benwing2: (@Thadh: Help me, please.) Do you really think that the works on the accentology of Potebnia, Saussure, Fortunatov, Stang, Illich-Svitych, Dybo, Redkin, Zalizniak, Nikolaev, Schallert, Bulatova, Zamiatina, Kapović, Oslon, Bolotov, Lashin, Shrager, Šekli, &c. formed fringe beliefs in me? The question was, why isn't anyone discussing this? You just didn't answer and wrote some nonsense that I was imposing something. Further. It is impossible to draw such a conclusion from my comment that you made. Because I made my conclusion on the fact that no one answers me. Thadh wrote to me many times that either there is no specialist, or no one is interested in it. Why did you decide that I don't show interest and cooperation? Of course this is unfair, because these are unfounded complaints, or should I start complaining too? Gnosandes ❀ (talk) 08:57, 22 February 2023 (UTC)[reply]
@Shumkichi: Obviously, they will not look right in this system. Gnosandes ❀ (talk) 09:05, 22 February 2023 (UTC)[reply]
You can add my name to the list of people who think this should've been left to play out. I would have opposed a desysop, of course, and there's probably no point in recreating the vote now but on principle the vote shouldn't have been struck. --Overlordnat1 (talk) 12:57, 21 February 2023 (UTC)[reply]
While we're on the subject, is it possible to have a de-troll vote? Nicodene (talk) 15:24, 21 February 2023 (UTC)[reply]
-sche desysopped, is this an alternative timeline in some Marvel universe? – Jberkel 09:28, 22 February 2023 (UTC)[reply]

Words + Clitics

Combinations of words plus a clitic can look very like a word, for example when part of the word plus the clitic looks like an inflection. What is the policy on recording these wordsthings? I'm thinking that they should be included when manual stemming is very likely to fail. The thing I'm currently pondering is Pali uttareyyanti, which is uttareyyaṃ (1s optative active) + ti (end quote), which looks as though it should be a 3p optative active of uttarati, but can't be. --RichardW57m (talk) 15:01, 21 February 2023 (UTC)[reply]

Given the silence, the only relevant policy I could find was 'be bold', and so I've added the term. --RichardW57m (talk) 13:08, 22 February 2023 (UTC)[reply]

(of a law, etc.)

For example, the entry of take effect reads

2. (of a law, etc.) To come into force, to come into effect, to inure.

However, a single example is not enough to infer what other similar elements are being referred to. At least an example sentence or citation with a different term is needed, as in its first meaning:

1. (of a drug, etc.) To become active; to become effective.
The medication won't begin to take effect for 3-4 hours. 

JMGN (talk) 18:29, 21 February 2023 (UTC)[reply]

Community feedback-cycle about updating the Wikimedia Terms of Use starts

You can find this message translated into additional languages on Meta-wiki.

Hello everyone,

Wikimedia Foundation Legal Department is organizing a feedback-cycle with community members to discuss updating the Wikimedia Terms of Use.

The Terms of Use (ToU) are the legal terms that govern the use of websites hosted by the Wikimedia Foundation. We will be gathering your feedback on a draft proposal from February through April. The draft will be translated into several languages, with written feedback accepted in any language.

This update comes in response to several things:

  • Implementing the Universal Code of Conduct
  • Updating project text to the Creative Commons BY-SA 4.0 license
  • Proposal for better addressing undisclosed paid editing
  • Bringing our terms in line with current and recently passed laws affecting the Foundation, including the European Digital Services Act

As part of the feedback cycle two office hours will be held, the first on March 2, the second on April 4.

For further information, please consult:

On behalf of the Wikimedia Foundation Legal Team, Mervat (WMF) (talk) 18:35, 21 February 2023 (UTC)[reply]

Prohibition of Valence Theory[1]

Since the ideas within the framework of this theory, created by the Moscow Accentological School and expanded by other non-“Soviet” accentologists, are considered fringe/marginal/pseudo-scholarship (see Fringe theory) and do not meet the standards of Wiktionary, I propose that it be prohibited and not used in the future. These ideas have been described as such by people like Rua, Victar, Benwing2 and Thadh.

According to the work of Thomas Olander (2015) Det baltoslaviske problem: Accentologien, pages 38–40:

The so-called "Moscow school" has, since the beginning of the 1960s (inspired above all by Stang's 1957 book), developed its own approach to Baltic and Slavic accentology. The early accounts contain reinterpretations of certain parts of Stang's system, but otherwise stay relatively close to it. Since then, the Moscow school has followed an increasingly independent course and gradually built up a fundamentally different system.

In accordance with this, I propose prohibiting the use of works on accentology from the early 1960s onwards by the following authors:

  1. Vladislav Illich-Svitych;
  2. Vladimir Dybo;
  3. Andrey Zaliznyak;
  4. Sergei Nikolaev;
  5. Joseph Schallert;
  6. Mate Kapović;
  7. Mikhail Oslon;
  8. Sergei Bolotov;
  9. Svetozar Lashin;
  10. Miriam-Maria Shrager;
  11. The list can be expanded (little known).

Special cases.

  1. The work of Thomas Olander (2001) Common Slavic accentological word list is mostly based on the works of the above-mentioned personalities.
  2. All works by Frederik Kortlandt that use valence theory (and in the case that the valence theory itself is used incorrectly by him).

All relevant references to these works should apparently be deleted, and the data distorted by these works brought into proper form.

References

  1. ^ Kapović, Mate (2019) Shortening, Lengthening, and Reconstruction: Notes on Historical Slavic Accentology, DOI, page 126

Gnosandes ❀ (talk) 14:07, 22 February 2023 (UTC)[reply]

What the hell are you talking about? Thadh (talk) 14:09, 22 February 2023 (UTC)[reply]
As 2p from the peanut gallery, as it were -- I offer my observations as an uninvolved editor.
I don't have anything to do with Slavic languages at all: no experience, no study, no exposure. And even in my ignorance, I do note that Gnosandes has come across as a pot-stirrer. I have so far avoided interaction, partly out of no general need for any, given my focus on Japanese, and partly out of a desire to avoid drama. At my remove, I had hoped that my initial impressions were mistaken, and/or that Gnosandes would learn more about how the Wiktionary community operates and adjust their approach.
The above suggests that no quarter is being given.
I do not see evidence of someone who plays well with others, or who has much interest in doing so.
‑‑ Eiríkr Útlendi │Tala við mig 18:35, 22 February 2023 (UTC)[reply]
As someone with exposure to Balto-Slavic languages, but who is not an expert in their proto-languages or accentology, I think that the above is a very technical matter, but still potentially a valid discussion that would mainly be of interest to those working in Proto-Slavic, Proto-Balto-Slavic, and perhaps even Proto-Indo-European. I would tend to defer to Derksen and the Leiden school, if only because his materials on PSl. and PBsl. seem the most accessible and modern, but I don't have any view on the actual substance of that school's theory vis-à-vis the Moscow school or the traditionalists, and I've barely read anything on the topic.
That said, even if accentology is a valid subject of debate, I cannot vouch for the point of view or proposal being advocated, which for all I know may be way off-base. What sticks out to me is the stringency of the proposal. Calling an opposing theory "fringe" or "pseudo-scholarship" may be valid in certain cases, but I'd only use those terms if they are widely rejected by the vast majority of experts, and I'm not sure enough evidence has been given here to come to that conclusion. Further, it seems to me that even if Wiktionary were to stick to one school or another for consistency, that doesn't mean we can't cite works by members of other schools, especially when the issue isn't specifically accentology. For example, we would be silly to prohibit the citation of all works by members of the Leiden school ({{R:grc:Beekes}}, {{R:bsl:EDBIL}}, {{R:itc:EDL}}, etc.), which tend to be very high-quality, even if we use the vowel a in our PIE reconstructions (they avoid it), or they have other idiosyncratic views we don't want to follow.
Wiktionary:About Proto-Slavic and Wiktionary:About Proto-Balto-Slavic already include discussion of related matters. The main consequence of all of this, AFAICT, is basically just what diacritics to put on reconstructed headwords, which might not matter much to the majority of readers, even those interested in etymology. That's not to be completely brushed aside though as Balto-Slavic accentology is a big part of Balto-Slavic linguistics, and the relative preservation of the BSl. accent compared to other IE branches is one of the reasons for the importance of BSl. in reconstructing PIE. 70.172.194.25 00:12, 23 February 2023 (UTC)[reply]
From what I gather, the stringency is because it's bad-faith point-making and Gnosandes is upset about pushback on some edits he's made which were sourced to these authors, e.g. diff (Nikolaev, listed above). Cf. the previous discussion here last month at Wiktionary:Beer parlour/2023/January#User:Gnosandes and links there. —Al-Muqanna المقنع (talk) 20:11, 23 February 2023 (UTC)[reply]
Al-Muqanna Zalizniak, Dybo, Snoj, Nikolaev*. Besides, it was from the very beginning. Gnosandes ❀ (talk) 22:53, 23 February 2023 (UTC)[reply]
So he says "Since then, the Moscow school has followed an increasingly independent course and gradually built up a fundamentally different system." But that does not mean it is a fringe theory. It means it is just an alternative view of Slavic accentology from a different linguistic school. Or maybe you mean Zaliznyak is not trustworthy b'cuz he is from the Soviet Union? You must be kidding. He was neither a supporter of the Soviets nor of any cringe linguistic stuff from the Soviet Union. Zaliznyak based his work on Stang's research. So do you mean Stang is a pseudoscience dude? How??

Editing news 2023 #1

This newsletter includes two key updates about the Editing team's work:

  1. The Editing team will finish adding new features to the Talk pages project and deploy it.
  2. They are beginning a new project, Edit check.

Talk pages project

 
[Image: Some of the upcoming changes]

The Editing team is nearly finished with this first phase of the Talk pages project. Nearly all new features are available now in the Beta Feature for Discussion tools.

It will show information about how active a discussion is, such as the date of the most recent comment. There will soon be a new "Add topic" button. You will be able to turn them off at Special:Preferences#mw-prefsection-editing-discussion. Please tell them what you think.

 
[Chart: Daily edit completion rate by test group: DiscussionTools (test group) and MobileFrontend overlay (control group)]

An A/B test for Discussion tools on the mobile site has finished. Editors were more successful with Discussion tools. The Editing team is enabling these features for all editors on the mobile site.

New Project: Edit Check

The Editing team is beginning a project to help new editors of Wikipedia. It will help people identify some problems before they click "Publish changes". The first tool will encourage people to add references when they add new content. Please watch that page for more information. You can join a conference call on 3 March 2023 to learn more.

Whatamidoing (WMF) (talk) 23:25, 22 February 2023 (UTC)[reply]

Gujarati declension template titles don't align with WT:Templates

I originally asked this over at The Grease Pit, but I feel like that discussion might be moving away from a technical question and becoming more appropriate for WT:BEER. The Gujarati noun declension template titles are a bit of a mess and don't really follow the guidelines laid out in WT:Templates. Per those guidelines, the templates that build the tables should be titled something like {{gu-decl-noun-table}}, then the templates which provide the information to those tables should be titled like {{gu-decl-noun-f}} or {{gu-decl-noun-m}}. However, this is not the case with the current templates at all, and most of the current titles look more like headword template titles. Here are the current declension table titles:

Just wanted to get these thoughts out there to see other users' thoughts before creating new template pages (I also don't have template removal permissions...). In the other discussion it was mentioned that renaming a template wouldn't cause issues because invocations of the old titles would automatically redirect to the new ones... although since {{gu-noun-f}} should be renamed to {{gu-decl-noun-f}} and the latter already exists, that's going to cause redirect problems. For that one, the template callout in the individual entries may need to be changed, which may require a bot to do efficiently...
I don't want to sound too pedantic with this, but it was enough to confuse me quite a bit while I was trying to figure out what all the templates do, and I just want to limit the confusion of anybody trying to figure it out in the future. – Guitarmankev1 (talk) 19:57, 24 February 2023 (UTC)[reply]

@Guitarmankev1 Yeah when things are messed up like this you might need a bot to straighten it out. Template renames are really easy to do by bot so I can help you if you need all uses of certain templates renamed. In general when implementing declension templates for new languages my choice is to have a single template called {{LANG-ndecl}} for noun declensions and another single template {{LANG-adecl}} for adjective declensions, although to implement this you need to do it in Lua (cf. {{hi-ndecl}}, {{hi-adecl}}). One thing you could do is call your templates {{gu-ndecl-m}}, {{gu-ndecl-f}}, etc.; those names shouldn't be taken currently so it should be easy to implement the renames. For templates that should be deleted, you can add the {{delete}} template to the top of the template definition (with an attached reason, e.g. {{delete|unused template}}), and it will (eventually) get deleted by an admin. Benwing2 (talk) 05:03, 25 February 2023 (UTC)[reply]
@Benwing2 renaming the Gujarati templates to the {{gu-ndecl-X}} convention to align with the Hindi {{hi-ndecl}} template looks like a good idea. Although others like Sanskrit {{sa-decl-noun}} use the guideline-recommended convention, the "ndecl" way would at least be clearer than the existing setup and it avoids having to coordinate with a bot to make sure the entries which reference {{gu-decl-noun-f}} get re-routed right away... Would it be best practice to leave all of the existing entries referencing the old templates which would redirect to the new titles, or to still use a bot to update the entries then delete the old templates altogether? The latter would save a little space and keep Category:Gujarati noun inflection-table templates clean, while the former might cause less confusion to current editors familiar with the current titles (there doesn't appear to be too many active Gujarati editors). – Guitarmankev1 (talk) 16:58, 27 February 2023 (UTC)[reply]
@Guitarmankev1 I think we should use a bot to switch all entries to the new names, esp. given the lack of active Gujarati editors. In the long run it will be much less confusing that way. Benwing2 (talk) 21:06, 27 February 2023 (UTC)[reply]
@Benwing2 Per your suggestion I added {{delete}} to the templates requiring removal and I made some new templates for renaming the ones requiring that. Also some minor cleanup & improvement of the templates themselves. Here's what could be accomplished efficiently by bot:
  • {{gu-noun-m-c}} to {{gu-ndecl-m}}: Replace all instances on entry pages. Keep all parameters.
  • {{gu-noun-n-c}} to {{gu-ndecl-n}}: Same as above.
  • {{gu-noun-f}} to {{gu-ndecl-f}}: Replace all instances on entry pages. Any specified parameters in existing instances should be identical to the pagename, and can now be removed. If any existing entries have a parameter which doesn't match the pagename we should take a closer look at that entry, but I'm not predicting that to happen. Are you able to include that check in the bot programming?
  • {{gu-decl-noun-f}} to {{gu-ndecl-f-table}}: I don't believe that this template is directly called out in any entry pages since it's just used to build the table for the other templates... After the other templates are replaced, we can double-check if any entries still link to it.
  • {{gu-decl-noun}} to {{gu-ndecl-table}}: Same as above.
  • {{gu-noun-um-v}} to {{gu-ndecl-um-v}}: I've already handled these manually, since there weren't many instances.
  • {{gu-noun-um-c}} to {{gu-ndecl-um-c}}: Same as above.
After that's done, we can add {{delete}} to those old pages as well, after it's verified that nothing links to them anymore.
Also, I was noticing that not all of those new templates are showing up on Category:Gujarati noun inflection-table templates, but I'm not quite sure why...
I also want to eventually remove the need for parameters in the gu-ndecl-m/n templates by auto-removing the predictable endings, and also merge the gu-ndecl-um-c/v templates by auto-detecting if the word ends in a vowel or consonant, but I need to learn a little more about templates & modules first. I'll be sure to let you know once I do since that might require the bot again. Thanks! – Guitarmankev1 (talk) 01:32, 1 March 2023 (UTC)[reply]
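A minimal Python sketch of the bot pass described in the rename list above, assuming each entry is available as a plain wikitext string together with its pagename. The function `rename_templates`, the `RENAMES` table and the regex are illustrative only (not the code Benwing2's bot actually runs), and the simple regex does not handle templates nested inside template parameters.

```python
import re
from typing import List, Tuple

# Old -> new names taken from the rename list above (only the bot-handled ones).
RENAMES = {
    "gu-noun-m-c": "gu-ndecl-m",
    "gu-noun-n-c": "gu-ndecl-n",
    "gu-noun-f": "gu-ndecl-f",
}

# Matches a non-nested template call: {{name}} or {{name|params...}}.
TEMPLATE_RE = re.compile(r"\{\{\s*([^|{}]+?)\s*(\|[^{}]*)?\}\}")


def rename_templates(wikitext: str, pagename: str) -> Tuple[str, List[str]]:
    """Rename old Gujarati declension templates; return new text plus problems to review."""
    problems: List[str] = []

    def replace(match: re.Match) -> str:
        old = match.group(1)
        params = match.group(2) or ""
        if old not in RENAMES:
            return match.group(0)  # not one of ours: leave untouched
        new = RENAMES[old]
        if old == "gu-noun-f":
            # Parameters should just repeat the pagename and can be dropped;
            # anything else is kept as-is and flagged for a closer look.
            args = [arg.strip() for arg in params.split("|")[1:]]
            if any(arg != pagename for arg in args):
                problems.append(f"{pagename}: unexpected {old} args {args}")
                return "{{" + new + params + "}}"
            return "{{" + new + "}}"
        return "{{" + new + params + "}}"  # keep all parameters

    return TEMPLATE_RE.sub(replace, wikitext), problems


text, problems = rename_templates("{{gu-noun-f|રાત}}", "રાત")
print(text)  # {{gu-ndecl-f}}
```

Flagging rather than silently rewriting mismatched {{gu-noun-f}} arguments matches the "take a closer look at that entry" step; the table-building templates ({{gu-decl-noun-f}}, {{gu-decl-noun}}) are deliberately left out, since they are meant to be checked by hand once the other renames are done.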
@Guitarmankev1 I made the renames and deleted the old templates. {{gu-decl-noun-f}} is used on one page so I didn't delete it yet. Benwing2 (talk) 06:52, 1 March 2023 (UTC)[reply]
Thanks @Benwing2!! I forgot about the irregularity in રાત/રાત્રે... I manually replaced that instance with the new {{gu-ndecl-f-table}} just now. – Guitarmankev1 (talk) 14:30, 1 March 2023 (UTC)[reply]

A recent change to Template:link is causing problems to Chinese entries

Previously {{l|zh|頭}} gave just "頭". Now it also produces the simplified form, giving "頭/头". However this is not always helpful.

For example, the "glyph origin" of the entry previously read:

From cursive script of 偉. Simplified from 偉 (韋 → 韦).

But now it reads:

From cursive script of 偉/伟. Simplified from 偉/伟 (韋/韦 → 韦).

After the change this sentence does not make sense. 恨国党非蠢即坏 (talk) 03:33, 25 February 2023 (UTC)[reply]

You can disable automatic simplification by using (e.g.) {{l|zh|頭//}}, which gives 頭. Compare with {{l|zh|頭}}, which gives 頭/头. Theknightwho (talk) 03:49, 25 February 2023 (UTC)[reply]
Is this syntax documented anywhere? 70.172.194.25 05:06, 25 February 2023 (UTC)[reply]
Not yet (but I will do so very soon). We're essentially in the beta stage of rolling this out, where we're trying to find any issues (like this). This syntax is a side-effect of the manual // divider for forms, because (a) it overrides automatic forms, and (b) empty forms are not shown. Manual simplification is also possible (e.g. default: 稜鏡/棱镜 (léngjìng); nonstandard: 稜鏡/稜镜 (léngjìng); both: 稜鏡/棱镜/稜镜 (léngjìng) etc.). Theknightwho (talk) 23:08, 25 February 2023 (UTC)[reply]
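The two rules Theknightwho gives ((a) manual forms after the // divider override automatic ones, (b) empty forms are not shown) can be illustrated with a toy Python model. This is only a sketch of the described display behaviour, not the actual Module code: the function `display_zh` and the five-character `T2S` table are hypothetical stand-ins for the real conversion data.

```python
from typing import List

# Hypothetical, tiny traditional-to-simplified table for illustration only;
# the real Wiktionary modules use their own, much larger conversion data.
T2S = {"頭": "头", "稜": "棱", "鏡": "镜", "偉": "伟", "韋": "韦"}


def display_zh(term: str) -> str:
    """Toy model of the {{l|zh|...}} display rules described above (not Module code)."""
    if "//" in term:
        # Manual forms separated by "//" override automatic simplification;
        # empty forms (e.g. the trailing one in "頭//") are not shown.
        forms: List[str] = [form for form in term.split("//") if form]
    else:
        # No divider: append an automatically simplified form, character by character.
        simplified = "".join(T2S.get(ch, ch) for ch in term)
        forms = [term, simplified]

    # Show each distinct form once, so an already-simplified form
    # such as "韦" is not displayed as "韦/韦".
    shown: List[str] = []
    for form in forms:
        if form not in shown:
            shown.append(form)
    return "/".join(shown)


print(display_zh("頭"))          # 頭/头
print(display_zh("頭//"))        # 頭
print(display_zh("稜鏡//稜镜"))   # 稜鏡/稜镜
```

Under this model, the "//" trick works simply because an explicit (empty) simplified form suppresses the automatic one, which is why it also fixes glyph-origin text such as "Simplified from 偉 (韋 → 韦)".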
This is a recent change discussed above. I've made the changes disabling the generation of simplified forms to the template accordingly (diff). For some reason, was using {{l|zh|}} rather than {{zh-l}}, which has caused the issue here as the latter was expected, then we slowly convert it to {{l|zh}} with the method that Knight just discussed.
CC @theknightwho. – Wpi31 (talk) 03:50, 25 February 2023 (UTC)[reply]

Symbolism

If a leek is symbolic of Wales, is it wrong to mention that at leek? (It's in category:Wales but not in the body of the entry) @Sgconlaw Drapetomanic (talk) 18:12, 25 February 2023 (UTC)[reply]

This is related to "Wiktionary:Requests for deletion/English#God Defend New Zealand" where the argument sought to be made is that the names of national anthems should not be deleted because they are symbolic of patriotism. I don't think this is sufficient; a term still needs to be idiomatic in some way. White flag is idiomatic as it indicates surrender; for example, "She waved the white flag and said no more." If a term, including the name of a national anthem, is idiomatic in that sense, then at least three qualifying quotations evidencing that sense need to be found. Otherwise, I don't think we go around adding, for example, at rose the sense "A symbol of love", or at magnifying glass "A symbol for conducting a search". There was a previous discussion where it was argued that born on the Fourth of July meant "patriotic about the United States"; a search for quotations was conducted, but not enough unambiguous quotations were found to support such an idiomatic sense. — Sgconlaw (talk) 18:23, 25 February 2023 (UTC)
OK, I'm quite happy that rose of Sharon says "a national emblem of South Korea", but you presumably aren't. Drapetomanic (talk) 18:43, 25 February 2023 (UTC)
We often ignore uses of a word in expressions that are incomprehensible or misunderstood using the definitions in our entries if the meaning that would eliminate the incomprehensibility or misunderstanding is deemed "symbolic", or for other whimsical reasons. Symbolic/cryptic meanings exist for many words, such as leek, bald eagle*, rose, red*, red pill*, black* ("anarchistic"), purple*. For some (marked with *) we include them; for others we don't. Don't ask for any consistency: we don't believe in it, at least in semantics. DCDuring (talk) 20:12, 25 February 2023 (UTC)
It's not defined as a national symbol; that fact is merely mentioned. That part could be removed and it would still mean the same thing. Symbolism is a very complex phenomenon that doesn't reduce very well to lexicographic terms. For one thing, there's a whole language of flowers that has, in the past, been used to encode personal messages in bouquets. For another, common, widespread species have been used as symbols for various things in various cultures, places and historical periods. It's better to leave most of that to encyclopedias. Chuck Entz (talk) 20:29, 25 February 2023 (UTC)
Another similar example is tiki torch as a (recent) symbol of the far-right or racism, which failed RfV. 70.172.194.25 20:16, 25 February 2023 (UTC)
We have nearly 300 English noun lemmas that include "symbol of", usually in a definition. I think we need to see whether we want to extirpate "symbol of". Perhaps some of the offending definitions are instances of metonymy or poor wording. The treatments of blue pill and red pill are exemplary in their exclusion of any reference to symbolism. DCDuring (talk) 20:22, 25 February 2023 (UTC)
The 65 English noun lemma entries with the collocation "emblem of" also seem worthy of extirpative review. DCDuring (talk) 20:28, 25 February 2023 (UTC)
Does not belong here. Mention it in an encyclopaedia (Wikipedia), not in a dictionary (Wiktionary). Equinox 13:47, 26 February 2023 (UTC)
It depends how the term is used. As Sgconlaw said, something like white flag has become lexical: you can fly, raise, run up (etc) the / a white flag meaning signal desire to surrender / have a truce without any actual flag. And many cites of hammer and sickle only make sense if you know (or our entry explains) the connection to communism. I'm not aware of cites where a leek being the national symbol of Wales, or e.g. the California poppy being the state flower of California, is lexically significant, though if you think there are such cites, let's discuss them. (Similarly, I think it makes sense to define Paris as "the capital of France" or at least "a city in France", but less sense to bother mentioning that it's "the birthplace of Emma Watson": both are factually true, but cites probably only rely on someone to already know the first thing as part of the definition/meaning of the term.) For the language of flowers, I think it would again come down to cites / how terms were used... if enough works refer to someone sending (someone else) violets with no explanation, relying on the reader to know the meaning of that, that might suggest the language-of-flowers meaning had become a part of the meaning of the term. For bald eagle, it seems wrong to have "This bird as a national symbol of the United States." as a separate second sense; it seems like it should at least be folded into the first sense, or perhaps dropped entirely, no? - -sche (discuss) 20:14, 26 February 2023 (UTC)
I'm going to RFD that second sense of bald eagle; it does seem strange. Drapetomanic (talk) 08:07, 28 February 2023 (UTC)
Yes, I agree with Chuck Entz and -sche. In entries about actual symbols (like asterisk) there will naturally be a sense that describes the referent as a symbol. In entries about some other thing, there will be a definition of the thing, and it is often unnecessary to mention that the thing is also a symbol of something. Description of that sort of symbolism is best left to Wikipedia. If by "symbolism" what is meant is that a thing has gained some idiomatic or figurative sense, then this can be added as a sense only if it can be attested by our usual rules requiring at least three qualifying quotations. — Sgconlaw (talk) 21:29, 26 February 2023 (UTC)
Though Wiktionary is not an encyclopedia, it is the sort of thing that would be well worth mentioning in the trivia section. Otherwise, we might have to find that "the land of the bald eagle" is a synonym of 'the USA'. --RichardW57 (talk) 00:10, 5 March 2023 (UTC)

Bokmål and Danish

Hi, I'm new here and I ain't finding any reason why most Bokmål words here get their etymology directly from Old Norse, and not through Danish. It is not even possible to use the "inh|nb|da" template, because Danish is not set as an ancestor of Norwegian Bokmål (for some weird reason). In practice, there are very few Bokmål words which are not inherited from Danish (even the ones that look very similar to Nynorsk ones: e.g. å væra is from vera with the East Norwegian split infinitive, while Bokmål være is from Danish). Could this chaotic etymology situation for Bokmål on Wiktionary be the result of some early discussion on Wiktionary between the experts? Or could it just be a cruel mistake? What should be done in this sad situation? Tollef Salemann (talk) 13:24, 26 February 2023 (UTC)

We have discussed this about four or five times and most editors involved with these languages have agreed the ancestor of Bokmål should be changed to Danish. As for why this still has not been done, it's up to someone with administrator permissions to do it. ᛙᛆᚱᛐᛁᚿᛌᛆᛌProto-NorsingAsk me anything 13:35, 26 February 2023 (UTC)
Oh, thanks! Glad to know that nobody's gonna be banned for their edits on Bokmål word etymology. Tollef Salemann (talk) 13:42, 26 February 2023 (UTC)
@Mårtensås Can you give links to these discussions? This, that and the other (talk) 00:35, 27 February 2023 (UTC)
Yes I am a bit skeptical of making a change of this nature; it is quite far-reaching in its implications. Benwing2 (talk) 05:57, 27 February 2023 (UTC)
Is it because of the heavy structural impact of the change? Or do you mean the difference between Bokmål and Danish is not so big? If you look at the NAOB dictionary, you'll easily find that the share of Danish "borrowings" in Bokmål is roughly 99% (if you include the original Riksmål spellings). It's extremely weird to treat Bokmål and Nynorsk as sister languages. Neither is a spoken language; both are written forms of Norwegian, based on two different written traditions. Nynorsk was built from spoken Norwegian put into writing (established by Ivar Aasen and developed by people like Vinje, Garborg and Duun); on the other side is Bokmål, which is based on written Danish with some spelling/grammar changes influenced by spoken Norwegian. And most of these spelling/grammar changes met an aggressive reaction from users of the Danish forms in the larger cities. Tollef Salemann (talk) 06:41, 27 February 2023 (UTC)
Two previous ones are linked here. ᛙᛆᚱᛐᛁᚿᛌᛆᛌProto-NorsingAsk me anything 07:42, 27 February 2023 (UTC)
Well, "most editors agree the ancestor of Bokmal is Danish" has lately become an assertion of some editors, but it's doubtful that it's numerically accurate since another significant faction of editors think we shouldn't be duplicating Norwegian-language content across three L2 headers at all. But a vote to unify Norwegian content under one header a la simplified vs traditional and Mandarin vs Cantonese Chinese also failed several years ago; it seems like neither side has a consensus-level majority behind it, and the opinions of the two about what direction to go in are quite opposite, so we're at something of an impasse, having the current degree of separation between Nynorsk and Bokmal and neither widening it (acting like they are completely separate, only relatively distantly related through separate ancestors) or lessening it (unifying under one header). I am one of those who would prefer to unify the content; giving only two of the orthographic standards their own L2s but leaving purely dialectal forms and Riksmal with nowhere to go but under ==Norwegian== (as if those are the only true Norwegian forms, the only forms deserving of being called simply "Norwegian") seems to me like a confusing state of affairs. - -sche (discuss) 19:02, 27 February 2023 (UTC)
I agree with User:-sche about merging all Norwegian varieties under a 'Norwegian' header. Benwing2 (talk) 19:21, 27 February 2023 (UTC)
As do I. Thadh (talk) 19:23, 27 February 2023 (UTC)
Why not merge all the North Germanic languages into one language then? :) Bokmål and Nynorsk aren't even in the same branch. Tollef Salemann (talk) 19:50, 27 February 2023 (UTC)
Let's turn that slippery slope argument in the other direction: why doesn't Riksmal get its own L2? And each Norwegian dialect that has forms not found in one of the Bokmal/Nynorsk 'standards'? (Why not give American and British English their own L2s? That would solve a lot of fussing over whether to spell things with flavor or flavour, whether to inflect things in a way that fulfills or fulfils, and what suspenders are, etc...) - -sche (discuss) 20:11, 27 February 2023 (UTC)
As I understand it, your logic is that if Bokmål and Nynorsk have the status of two different written languages of the same spoken language, they must be considered the same language? Sounds fair. But oh Lord, it's gonna be a horrifying mess if we try to merge them under the same L2 category. What about etymology, grammar, the alternative feminine plural, the split verb infinitive and so on? That's why I think it is much more logical and easier just to accept that the bulk of Bokmål words is not (directly) related to Nynorsk, but rather to Danish. And that's also why they must have two different L2 categories. Tollef Salemann (talk) 20:34, 27 February 2023 (UTC)
Doesn’t that also apply to any language with multiple scripts? Shall we have two Mongolian L2s? Theknightwho (talk) 09:35, 28 February 2023 (UTC)
Mongolian has two scripts, but not two versions of spelling and grammar with two different language sources. Tollef Salemann (talk) 10:28, 28 February 2023 (UTC)
It has two very different versions of the spelling: they do not correspond at all. As for grammar, there are terms which are SOP in one script but not the other, due to the differences. I really don’t see how it differs. Theknightwho (talk) 14:29, 28 February 2023 (UTC)
Interesting! But are they derived from two different spoken languages? Or is it because of some spelling reform (like in Russian or Tibetan)? Tollef Salemann (talk) 14:50, 28 February 2023 (UTC)
The adoption of Cyrillic for Mongolian constituted a massive spelling reform. The pronunciation had moved on. --RichardW57m (talk) 15:19, 28 February 2023 (UTC)
OK, so it's not the same situation as in Norwegian. Tollef Salemann (talk) 15:23, 28 February 2023 (UTC)
But the spelling in the Mongolian script is derived from Middle Mongol (and was little-changed from the 14th century), while the Cyrillic is specifically from the Khalkha dialect at the beginning of the 20th century. These would not be mutually intelligible in spoken form, and they derive from different languages (as we use a separate language code for MM), making the situation very, very similar. Theknightwho (talk) 16:02, 28 February 2023 (UTC)
That's what often happens in spelling reforms. The old spelling is based on an older form of the language. --RichardW57m (talk) 16:10, 28 February 2023 (UTC)
Yes, which is precisely why it doesn’t make sense to separate Norwegian like this based on something very similar. Theknightwho (talk) 16:28, 28 February 2023 (UTC)
Bokmål and Nynorsk arose as a result of spelling reforms from two different languages. Their sources are not related. Middle Mongolian is at least an ancestor of Modern Mongolian. Danish is not an ancestor of Nynorsk. Nynorsk words like "vite" and "munn" may look similar to Bokmål "vite" and "munn", but they have a different word evolution. Tollef Salemann (talk) 16:53, 28 February 2023 (UTC)
I don’t see that it’s relevant that MM is an ancestor. It’s not mutually intelligible. Theknightwho (talk) 16:55, 28 February 2023 (UTC)
For Pali in Lao and in Thai script, I now generally give separate inflection tables for abugidic and alphabetic writing systems even when the citation forms are the same. (I used to merge them, but I've now decided that that was a bad idea.) For the Lao alphabetic system, I even give separate tables for the verb forms with 'y', for which there are two writing conventions. There's a rich set of conjugations at ເຊຕິ (jeti). On the other hand, for senses, the multiple senses are hiding in the Latin script entry. We seem not yet to have hit the stress of regional variations - a chronicle from Northern Thailand written in the script of Sri Lanka might induce some stresses. --RichardW57m (talk) 15:40, 28 February 2023 (UTC)
See also what is done with Karelian, e.g. kieli, hyvä, atraadra etc. Thadh (talk) 16:03, 28 February 2023 (UTC)
Tver Karelian and White Sea Karelian have a common source (Old Karelian?), whereas Bokmål and Nynorsk are derived from two separate languages but co-exist as spelling systems for the same language. I'm not sure how it works in Pali, but this seems more like the Pali situation with Sanskrit loans than the Karelian example. Tollef Salemann (talk) 16:45, 28 February 2023 (UTC)
If you don’t consider the origin languages having separate language codes to be enough evidence that two spelling systems are derived from different languages (which would mean admitting Mongolian is in the same situation), then it’s not clear that you are using coherent reasoning that can actually be applied consistently. In any event, you haven’t actually explained why it’s relevant that the origins are separate, either. That seems totally arbitrary, to be quite frank, as it’s something we can clearly recognise with proper labelling under a unified L2 header. Plus, you’ve ignored the fact that the orthography of Cyrillic and Mongoljin are completely unrelated - which is directly analogous to your point! Theknightwho (talk) 16:55, 28 February 2023 (UTC)
Ehm, ok, what about this Pali-ish solution?
1) merge Bokmål, Riksmål, Nynorsk, Høgnorsk, and maybe Jamtish into one united L2-category called "One Big Happy Norwegian"
2) in "Etymology", give two different etymologies for identically spelled forms if they have different ancestors
3) never derive the dialectal split infinitive forms of verbs from Bokmål
4) maybe add some runic spellings from the 19th century
5) make Danish, Old Norse and Middle Norwegian into ancestors
PS: At the start of this topic I was talking only about this last point, so I'm also not supporting any merge unless this Danish-Bokmål etymological connection is made clear. Tollef Salemann (talk) 17:33, 28 February 2023 (UTC)
I'm genuinely considering whether we need to revamp how we deal with orthographic etymologies as a whole: I'm not disputing that Danish is an ancestor to Bokmål, but having that would - at least in some sense - imply that Russian is an ancestor of Cyrillic Mongolian. We already run into this issue with Chinese, where there are separate "Etymology" and "Glyph origin" sections, so I'm thinking we need to generalise that, because the word origin and the spelling origin are two different things. Theknightwho (talk) 17:54, 28 February 2023 (UTC)
THIS is a very good point!
(But I still think Mongolian is a bad example.)
I was thinking about this before, because Nynorsk "vite" and Bokmål "vite" correspond to the same pronounced word [²ʋɪːtə] (but not variants like vita and våtå). At the same time, they have different spelling origins (Danish vide versus spoken Norwegian ʋɪːtə). How in the world should examples like this be described etymologically? Tollef Salemann (talk) 18:00, 28 February 2023 (UTC)
Unless there are some cryptic abbreviations, Russian is the ancestor of Cyrillic Mongolian only so far as the letters go. The analogy I think you want is with Japanese and its use of Chinese characters for native words. --RichardW57 (talk) 08:58, 1 March 2023 (UTC)
Yeah, many Norwegians I know like to compare Bokmål to kanji. Tollef Salemann (talk) 09:05, 1 March 2023 (UTC)
Both Norwegian and Danish come from Old Norse, so regardless of whether you call Bokmål Norwegian-based or Danish-based, they share a common ancestor. North Karelian and South Karelian aren't necessarily more closely related to each other than to other neighbouring lects in the Finno-Veps dialect continuum, they just culturally form a union. And there is no "Old Karelian", they just both derive from Proto-Northern Finnic (which we handle as Proto-Finnic here). Thadh (talk) 18:00, 28 February 2023 (UTC)
I've heard two or three other versions of this from some senior Karelian and Russian university people, but I myself know almost no Karelian, so I'm the wrong person to discuss the Finno-Veps-Karelian continuum with. But I get your point. Tollef Salemann (talk) 18:10, 28 February 2023 (UTC)
A little late to the party, but I've been meaning to bring this up too. I support merging Norwegian under one unified header as well as implementing Danish as an ancestor for Bokmål (although I am undecided as to whether it would be appropriate to include Jämtish just at this point). Rather than reinventing the wheel, it would be instructive to inspect the practice followed at https://no.wiktionary.org to resolve a lot of the theoretical/technical objections. In my opinion this would be the most elegant solution, see for example the following: vite / vita, hus, heim / hjem, bein / ben. For what it's worth, I would also support extending such a merger to the entire Scandinavian dialect continuum - though I'm fairly confident I'll be in a minority on that one! Helrasincke (talk) 07:08, 5 March 2023 (UTC)
I ain't totally against this solution, as I have no clear position on merging, but Nynorsk "vite" and "bein" have not had the same spelling evolution as Bokmål "vite" and "bein", as you'll see if you look up the different "old" texts or just look at the conjugation (which is also spoken). In case of merging, we're gonna need two different etymologies in one place. I'm sure that most people just won't give a f* about it and will derive every Bokmål/Nynorsk word from the same source (directly from Old Norse), ignoring the Nynorsk and dialectal split verb infinitive, like what happened on Bokmål Wiktionary. But connecting forms like "våtå" and "beina" to Bokmål "vite" and "beinene" seems very wrong to me. That's why I'm sceptical of any "elegant" solutions. Every Bokmål word has Riksmål form(s) and "Samnorsk" forms (like "kjerke"), and every Nynorsk word has old and new variants, as well as different conjugations and dialectal variations. "Elegant solution of Norwegian" sounds very shady to my ears, but maybe I'm wrong. Tollef Salemann (talk) 10:38, 5 March 2023 (UTC)
Re how to handle "grammar, alternative feminine plural, splitted verb infinitive and so on": is there any of that which can't be handled in the way we already handle other languages that have inflected forms (etc) that differ in different orthographies/dialects/eras/etc, by listing both forms on the headword line with the appropriate qualifiers (as in level), and/or having both a Nynorsk inflection table and a Bokmal inflection table (like we have different inflection tables in balneum), and separate pages like fulfil and fulfill when the headword itself differs (ideally, as in that linked example, soft-redirecting one spelling to the other)? - -sche (discuss) 10:10, 1 March 2023 (UTC)
Yeah, this solution is also possible. The example of Pali was also mentioned. But my question was about Danish-Bokmål relations, not about merging. If y'all need to merge all the hundreds of forms of Norwegian, go ahead. I ain't gonna make a big deal out of this, and I can help as well. But I'm not sure the other nn users are gonna support your solution at all.
PS: I think it would be stupid to merge all the Norwegian stuff while Scanian and Westrobothnian are not merged into Swedish. Elfdalian is not fully recognized as a language in Sweden either. Jamtish is kind of in between Swedish and Trøndersk (according to many scholars), so I'm supporting Jamtish as an L2. But not the other dialects. Tollef Salemann (talk) 10:27, 1 March 2023 (UTC)
The issue here is that the Scandinavian standard languages and the dialects of their respective countries often don't have much to do with each other. Traditional Scanian and Bornholmsk are closer to each other than to standard Swedish and Danish respectively, yet as it is they would be listed as Swedish and Danish dialects.
The so-called "Elfdalian" (traditional Dalecarlian dialects are spoken in a larger area than just Älvdalen) is not a uniquely divergent dialect; traditional Westrogothic, Westrobothnic and other dialects are just as far from the standard language. Maybe having a 'Dialectal Scandinavian' header could be a solution, though that is far from conventional.
As for the Norwegian issue, spoken Bokmål (which is real and exists) is primarily Danish-derived. Listing forms like hjem as inherited directly from Old Norse is just inexcusable, and so is calling them borrowings. They are remnants of the Danish writing system, but have at no point been borrowed. Can a language not have several ancestors on Wiktionary?
Finally, I think it's notable that most of the Scandinavian editors are in favour of this change, while editors with little knowledge of the situation are the ones most strongly opposing it. ᛙᛆᚱᛐᛁᚿᛌᛆᛌProto-NorsingAsk me anything 16:45, 1 March 2023 (UTC)
Agreed. Why are people suddenly talking about merging stuff? I was asking "what was the problem with setting Danish as an ancestor of Bokmål". Language/dialect merging/splitting is more of a political question. For me, it's all the same whether Norwegian shares an L2 with Swedish and Danish or we use a different L2 for every single dialect. I wasn't asking about merging/splitting at all. Tollef Salemann (talk) 17:03, 1 March 2023 (UTC)
@Mårtensås I want to be clear that I'm not opposing the initial suggestion (Danish being an ancestor of Bokmål) - I just think it's too heavily intertwined with the issue of having two Norwegian L2s for that to be ignored. I think we should take this as an opportunity to consider how we deal with orthographic ancestors in general, as this problem is not unique to Norwegian, and there is no consistency with how it's dealt with between languages at the moment. Theknightwho (talk) 19:17, 1 March 2023 (UTC)
I should add, this is starting to be a repeat of the discussion last October. In that discussion I said "please assume good faith" and don't insult people with a different view; statements like "editors with little knowledge of the situation" are hardly helpful. Those with "little knowledge" might in fact simply be those who don't have a visceral reaction towards Nynorsk and didn't grow up with the "Spynorsk mordliste" shoved in their faces. (Compare Serbo-Croatian.) Benwing2 (talk) 19:28, 1 March 2023 (UTC)
Based on etymology, it would be more natural not to merge Norwegian. I understand examples like kanji and Sanskrit-derived Pali, but kanji is not sound-based spelling, and Sanskritised Pali can't really be equated with Bokmål until the Pali experts have properly voted on this subject based on scholarly views. At the same time, having the Swedish dialects as separate L2 categories is also very weird if y'all are gonna merge Norwegian. The status of Jamtish etc. is also not clear.
Also, as things stand I'm totally against merging Norwegian, though I see that it may be possible (and even logical) to merge it in the very distant future; but I think it's just a huge waste of time. Tollef Salemann (talk) 19:51, 1 March 2023 (UTC)
My input here is that unless we merge Bokmål and Nynorsk, I would support setting Danish as the ancestor of the former (and volunteer to help with this process if nobody else will). Diachronic factors matter a lot more with language classification (which is largely etymological in nature) than any synchronic ones. — SURJECTION / T / C / L / 21:49, 3 March 2023 (UTC)
There are just 315 Bokmål words with the template "derived" instead of "inherited", but there are ca. 15000 Bokmål words with etymology from Old Norse instead of Danish. So it's a huge job. But maybe it's still not as dramatic as the situation with the Church Slavonic borrowings in Russian :) Tollef Salemann (talk) 22:07, 3 March 2023 (UTC)
Inheritance from Old Norse isn't technically wrong for Bokmål (because we treat OWN and OEN as varieties of the same language), it's just skipping one step of the inheritance chain. So while these etymologies would need to be updated to fill in the missing information, it's not something that needs to be fixed before adjusting the ancestors in the language data. — SURJECTION / T / C / L / 22:39, 3 March 2023 (UTC)
Wouldn't a bot-free solution be to set up dual ancestry? (We have dual ancestry set up for Michif.) I am confused by the current descriptions. Is it the case that often the pronunciation derives from Middle Norwegian but the spelling from Danish, not unlike the mess seen with English bury and busy? --RichardW57 (talk) 22:43, 3 March 2023 (UTC)
Kinda. At least the modern Danish pronunciation is never used in Bokmål, even if it may have a strong influence in many (or some) cases. Tollef Salemann (talk) 22:49, 3 March 2023 (UTC)
Bokmål grammar and spellings (especially the older ones) are almost identical to Danish, but the pronunciation and the modern spelling are heavily influenced by Nynorsk and spoken Norwegian. Tollef Salemann (talk) 22:55, 3 March 2023 (UTC)
Let's please not change the ancestor of Bokmål without consensus. Several people are clearly opposed to it and it would be needlessly antagonizing to unilaterally make such a change. Benwing2 (talk) 01:45, 4 March 2023 (UTC)
My understanding is that there is consensus to change the ancestor, at least when it comes to the editing communities of these languages. I haven't seen any convincing oppose arguments other than "it just complicates things further for no reason". — SURJECTION / T / C / L / 09:45, 4 March 2023 (UTC)
The main argument, namely that spelling and vocab do not constitute a language and that Bokmål is readily analysable as West Norse written with Danish (and then diverged even in the written form), does make sense to me.
Imagine a Karelian speaker who moved to Finland and learned standard Finnish while still speaking Karelian. Over time his spoken and written languages converged. In my opinion, he still speaks a descendant of Karelian, as we document the spoken language by means of the written language, not the other way around. Thadh (talk) 10:27, 4 March 2023 (UTC)
But what about stuff like beinene and jenten? These forms are used in speech. It is clearly not a Norwegian structure, but directly inherited from Danish. Tollef Salemann (talk) 10:32, 4 March 2023 (UTC)
Converging one's grammar with other languages is not unheard of, doesn't make it a different language. Compare for instance Kamassian sometimes using -лар plurals in accordance with nearby Turkic languages, doesn't make it a Turkic language. Thadh (talk) 11:09, 4 March 2023 (UTC)
I ain't saying Bokmål is a different language. I'm saying that 99% of Bokmål is derived from Danish, with strong influence from spoken Norwegian and written Nynorsk, and is used as a form of spelling of Norwegian. The L2 classification of Bokmål is a different subject. I have no position on the L2 status of any language; it's usually a question of national self-identity and political interests. Etymology, on the other hand, is more about mathematical statistics, reconstruction and comparison between all the languages and dialects, regardless of their status. That's why it is important to make Danish the ancestor of Bokmål. Tollef Salemann (talk) 11:29, 4 March 2023 (UTC)
I think that's a mischaracterization. Bokmål is, from what I have read, much closer to a (hypothetical) case where Karelian speakers would've had standard written Finnish imposed on them, causing the Karelian-speaking elites to adopt Finnish as their written language and start adopting Finnish elements into their spoken language as well, and only later did they start mixing in some Karelian features to bring it closer to the actual Karelian dialects, thus creating a "kniigakieli" which is Finnish with some Karelian elements. — SURJECTION / T / C / L / 12:25, 4 March 2023 (UTC)
That's pretty much what I said isn't it? Thadh (talk) 12:39, 4 March 2023 (UTC)
No, there is a crucial difference, and that is that Bokmål is not representative of spoken Norwegian (as in the descendant of Old West Norse), much like my hypothetical kniigakieli is not representative of spoken Karelian, so treating Bokmål as a descendant of OWN is nonsensical. — SURJECTION / T / C / L / 20:56, 4 March 2023 (UTC)
But we don't record the written language, we record the spoken varieties associated with the written language, which is a whole different story. I'm not saying Bokmål is a descendant of OWN, but the language that is spoken by those who write Bokmål is. Thadh (talk) 22:18, 4 March 2023 (UTC)
So you're saying that spoken Bokmål is not a creole because the Norwegian state says so? Or do you mean that the language of a Trønder or a Traveller who uses Bokmål in writing has anything to do with Bokmål? If I write Latin in medieval Norway, can I say that my Latin is a kind of Norwegian? Tollef Salemann (talk) 22:40, 4 March 2023 (UTC)
That doesn't sound like a fair comparison, nobody writing Latin thought they were writing in their own language. People writing in Bokmål still consider it Norwegian. Thadh (talk) 23:20, 4 March 2023 (UTC)
That's why it's called Norwegian Bokmål. What does that have to do with its Danish origin? Tollef Salemann (talk) 23:44, 4 March 2023 (UTC)
Bokmål has developed a spoken variety (Standard Østnorsk) that is the closest there is to a Standard Spoken Norwegian, and from what I can tell, it's based more on Bokmål than the actual local Norwegian dialects (although there is a considerable gradient that exhibits major idiolectic variations). — SURJECTION / T / C / L / 22:47, 4 March 2023 (UTC)
Yeah. But Standard Østnorsk isn't the same as the Eastern dialects. Most Eastern dialects are similar to Nynorsk (in both grammar and spelling). Toten and especially Gudbrandsdal have some sound shifts. Many (most?) areas in Norway aren't using the second feminine gender and the split verb infinitive. Trøndelag and the North have very heavy sound shifts, but are close to Nynorsk grammar, even though they/we usually don't write Nynorsk nowadays. Tollef Salemann (talk) 23:11, 4 March 2023 (UTC)
Old East Slavic literary tradition arose from adopting the Church Slavonic alphabet and often grammar for East Slavic lects. Modern Russian has more Church Slavonicisms than you can count. By the above standards, should we call it a descendant of Church Slavonic now? Thadh (talk) 23:25, 4 March 2023 (UTC)
Again, not a good comparison: Unless my understanding of how Modern Russian has developed is completely mistaken, the Church Slavonic based tradition of the written language began to die out in the 18th century, but left a considerable influence in the modern language, which is still an East Slavic variety with Church Slavonic elements, while Bokmål is Danish with Norwegian elements. — SURJECTION / T / C / L / 23:32, 4 March 2023 (UTC)[reply]
The Church Slavonic based tradition is Old East Slavic, which did not "die out" at any time; it just went through reforms, much like Bokmål has. The introduction of northern elements (like ц > к) did help in quickly changing the language into something completely different. And some southern features are very visible in the grammar, like the fact that the present participles are borrowed from OCS, or the numerous productive suffixes, or the comparatives... Thadh (talk) 16:51, 5 March 2023 (UTC)
But you're clearly talking about Church Slavonic features implanted on top of an East Slavic base, while Bokmål is the opposite (Norwegian features on Danish, not the other way around). — SURJECTION / T / C / L / 17:18, 5 March 2023 (UTC)
In both cases you have a foreign written language imposed upon a different spoken language. We record the spoken language (assuming it coexists with the written variety), and the only way to do this sanely is by lemmatising at the written language. In Norwegian's case, this spoken language has always been and still is West Norse, but it has often been influenced significantly by the written language. So the question here is whether or not it's fair to call Norwegians who write Bokmål speakers of a Danish descendant, which in my opinion it's not. Thadh (talk) 19:49, 5 March 2023 (UTC)
It's more a political and ethnic question. Norwegians who speak Bokmål without using the feminine gender speak a Norwegian variety of Danish (at least some linguists say so, and, of course, many non-linguists). Others may not agree with this at all. I'm in the first group, but I'm not a linguist, nor a politician. It's cool to hear different opinions on this subject. To me, there's no reason to exclude Danish from the ancestors of Bokmål (just as it is wrong to say that Bokmål isn't Norwegian at all). But Bokmål isn't directly related to Nynorsk, so having two L2s is not as stupid as it seems to many. Tollef Salemann (talk) 20:02, 5 March 2023 (UTC)
The closest "spoken language" (as in, a spoken standard) of Bokmål is, as I mentioned, a variety that is likewise based on Danish through being based on Bokmål. It is absurd to claim that we should not consider Bokmål a descendant of Danish just because there are speakers who use a Norwegian variety descended from West Norse but write Bokmål, when the linguistic argument about Bokmål being descended from Danish enjoys wide agreement. There is a clear distinction between registers here. — SURJECTION / T / C / L / 20:34, 5 March 2023 (UTC)
Riksmål (old Bokmål) had a somewhat Norwegian lexical base, and Russian OCS sometimes has a Russian lexical base as well. The spoken language behind Riksmål and Russian OCS was native (the old Oslo dialect and Old Russian, respectively). So I do see some parallels here. And please don't think I'm a Trediakovsky fan who says that Russian and OCS are the same language. Riksmål and Danish, on the other hand, are clearly the same language, both written and spoken. What is the situation with the Old Russian manuscripts from medieval times? Did the monks of Suzdal and Belozersk write almost the same language as OCS? Maybe Zaliznyak had an opinion on its classification/registration as well? (I'm a noob when it comes to most Russian texts before the generation of Avvakum Petrov and Yerofey Khabarov.) Tollef Salemann (talk) 19:31, 5 March 2023 (UTC)
@Thadh I know that, but some spelling, grammar and word borrowings from medieval times in a modern written language that is heavily based on spoken East Slavic are not the same thing as an entire spelling, grammar and lexical tradition created within 100 years directly from another language. By the way, we need to work on the Russian OCS borrowings as well. There are at least several thousand of them on Wiktionary, not just a few hundred. Tollef Salemann (talk) 23:33, 4 March 2023 (UTC)
For example, Janne Bondi Johannessen clearly states that spoken Bokmål derives from Danish, because most of the Danish grammatical forms have replaced the traditional Oslo Norwegian ones. Tollef Salemann (talk) 23:16, 4 March 2023 (UTC)
Where is the dogma that we record the spoken language rather than the written language laid down? It sounds like the garbage of the linguists who completely ignore the written language. Literacy is very significant in some cultures. --RichardW57 (talk) 00:00, 5 March 2023 (UTC)
I agree with this. This is one reason why that argument doesn’t work too well here, but another is that Bokmål isn’t even entirely written. It’s based on a language that developed from Norwegians speaking Danish (compare Gøtudanskt), one that’s been in continuous use since, and the written language has later included innovations in pronunciation. Forms like myk (< Danish myg, cf. Norwegian mjuk) would never have existed had it not been spoken. And even without these innovations, that would still not mean Bokmål derives from Middle Norwegian. The Bokmål entries on Wiktionary describe both a written and a spoken tradition, both derived from Danish. Eiliv / ᛅᛁᛚᛁᚠᛦ (talk) 14:02, 7 March 2023 (UTC)
Consensus is important, but what do you mean by several? I haven't found more than three people in the previous discussions who oppose it. The others are talking about merging or about technical issues, but give no clear arguments on the subject itself (the ancestry of Bokmål). Tollef Salemann (talk) 09:50, 4 March 2023 (UTC)
I agree with you that there definitely seems to be consensus for setting Danish as an ancestor of Bokmål. Despite the concerns about the current split of Norwegian into two L2s (which is separate, albeit related), I see only one person raising serious objections on the ancestry issue. Theknightwho (talk) 17:23, 5 March 2023 (UTC)
For everyone's information, I've set up dual ancestry for Bokmål for now - this can be seen in Category:Norwegian Bokmål language, but I maintain that we should just have da as the ancestor (even if that means we'd need some way to mark "Norwegian" borrowings into Norwegian Bokmål; tagging them as Nynorsk probably wouldn't be that helpful). — SURJECTION / T / C / L / 20:55, 5 March 2023 (UTC)
Thanks! I agree that most of the "Nynorsk borrowings" actually came from an oral source. I'll try to fix them in a more proper way. See also Russenorsk. I have no clue how to handle it there, because the Norwegian words in Russenorsk came from neither Bokmål nor Nynorsk, but from Northern Norwegian dialect. This is one of the reasons why I kind of see the logic in merging Nynorsk with Bokmål. Tollef Salemann (talk) 01:36, 6 March 2023 (UTC)
The best solution I can think of is allowing the use of no in etymologies, but letting it point to the Nynorsk entry. Compare Frankish pointing to Proto-West Germanic. nn is already used more generally for all Middle Norwegian-derived forms of Norwegian, with all reforms and dialects. The issue is just that “Norwegian Nynorsk” has a much more specific meaning in English than the word nynorsk does in Norwegian. If used in the etymology, it would look as if something was borrowed from the written standard (post-1873), and not from the general language. My solution so far has been to write {{bor|XX|no|-}} {{m|nn|word}}, for instance when a word was borrowed into Danish in the early 1800s or earlier (e.g. fos). This would be easier if no simply pointed to Nynorsk. Eiliv / ᛅᛁᛚᛁᚠᛦ (talk) 01:11, 7 March 2023 (UTC)
I'm not sure whether it has been mentioned in this long thread that Norway was under Danish rule for a long time, hence the similarity between the languages. Once Norway regained its independence, Norwegians set about revising Danish spellings, probably to reflect pronunciation in Norway. Another major change was the adoption of -sjon noun endings instead of -tion as in Danish. Riksmål, based on Danish, was once the standard and was replaced by Bokmål, so Riksmål is not recognised as an official language in Norway now, but it still lingers. On the other hand, Nynorsk is official, but mainly spoken in the south-west around Bergen, Stavanger etc. So if Nynorsk is an official language in Norway, Wiktionary should respect that. A major problem that would be encountered by anybody attempting to merge Bokmål and Nynorsk is the inflections, a good reason why they should be kept separate. DonnanZ (talk) 11:57, 4 March 2023 (UTC)
@Donnanz See my comment above regarding the way no.wiktionary handles variant forms and inflections. It should be doable here too. Helrasincke (talk) 07:41, 5 March 2023 (UTC)
To be fair, whether or not a lect is official doesn’t determine whether it’s a separate language; see Serbo-Croatian and (formerly official) Moldovan. The inflections can be handled like @-sche mentioned. Having 3 separate Norwegian L2s is the most notable example of duplication I’ve seen here. AG202 (talk) 02:38, 7 March 2023 (UTC)
Serbo-Croatian is a language combining two (or more) related languages from the same dialect continuum. Moldovan is a Cyrillic variety of Romanian; it has official language status in Transnistria, and had it in all of Moldova during Soviet times. How is that like the situation with the Norwegian language? Bokmål is derived from Danish, which came from Old East Norse. Nynorsk is derived from spoken Norwegian, which came from Old West Norse. They don't share the same grammar or spelling. Tollef Salemann (talk) 10:24, 7 March 2023 (UTC)
@Tollef Salemann I did not say that it was the same situation. I was specifically responding to the point stating "So if Nynorsk is an official language in Norway, Wiktionary should respect that." I focused purely on the official-language point, as that has little bearing on whether or not a language is included. AG202 (talk) 14:17, 7 March 2023 (UTC)
Oh OK, sorry. So you're saying that official status doesn't mean that much? I very much agree. If the USA suddenly got an official language (English) and called it 'Amurican', it would still be de facto English. Or not; maybe it depends on the situation. Norwegian law (if I understand it correctly) doesn't say that Norway has two different Norwegian languages, just one Norwegian language with two equal written varieties. The same goes for the "Sami language" (they actually use this singular form in government documents). So if we merge Norwegian just because of the name, we could also merge Sami, but that seems very wrong if you ask me. And I'm not sure it is right to treat Moldovan as part of Romanian either, even if they are more or less identical. Tollef Salemann (talk) 15:12, 7 March 2023 (UTC)

Use of {{suffixsee}} alongside manual derived terms list

See -ware#Etymology 2 for an example, though that's not the only one I've come across. This might be useful to list redlinks, but for the entries that do exist is there any point to this? Should the blue links be removed? —Al-Muqanna المقنع (talk) 18:06, 26 February 2023 (UTC)

It would certainly be good to remove blue links IFF they also appeared in the list generated by {{suffixsee}}. I would not remove any redlinks. A counsel of perfection would be to make sure that all the blue-linked items had appropriate etymology content that would place them in the {{suffixsee}} list. DCDuring (talk) 20:23, 26 February 2023 (UTC)
Aren't prefix and suffix entries only supposed to use {{suffixsee}} and not to have manually added derived terms? (One problem is that if there are many words using the prefix or suffix, {{suffixsee}} only provides a truncated list. I wonder if that can be fixed.) — Sgconlaw (talk) 21:21, 26 February 2023 (UTC)
Don't remove blue links that display as orange (an optional setting); treat them as red. --RichardW57m (talk) 14:36, 28 February 2023 (UTC)
How we handle etymology and sense IDs ought to be documented in the interface templates {{suffixsee}} and friends and also {{rootsee}}. Do we really want to hardwire in main space the way term and ID are combined? --RichardW57m (talk) 11:19, 28 February 2023 (UTC)
I've been grouping the derivatives of Pali roots functionally, as in gah, based on what I'd seen for a Sanskrit root. There seems to be no consistency for even Sanskrit roots. Are roots' derivatives required to be presented as a disordered jumble? --RichardW57m (talk) 11:19, 28 February 2023 (UTC)
I second the principles summarized above by DCDuring. One way to recap the situation is as an iterative path toward eventual perfection. Yes, it is true that the ideal state is that {{prefixsee}} and {{suffixsee}} are the only things needed under "Derived terms", but in practice, in the meantime (i.e., while still short of the ideal state), whether that call alone produces comprehensive coverage (i.e., all derived terms) depends in turn on whether each derived term's Etymology section yet adequately covers all the etymology (i.e., both diachronic and synchronic; both historical and surface). In practice, in the meantime, it is better to retain any manually entered redlinks (as prompts for development that needs to happen later), and it is also better to retain the manual bluelinks until someone bothers to verify which ones are covered by the template call and can thus be deleted without losing any information: a trail of breadcrumbs to be fully digested as soon as anyone gets around to each one. Quercus solaris (talk) 20:04, 1 March 2023 (UTC)
Example: I just checked all the manually entered bluelinks that had been at -rrhea and handled the remaining "holdouts" among them, so that {{suffixsee}} is now the sole element needed there. Quercus solaris (talk) 04:10, 2 March 2023 (UTC)
That took you about 20 minutes for eight etymologies, based on the contributions log. DCDuring (talk) 22:46, 2 March 2023 (UTC)
Is that a neutral observation of elapsed time, meant to provide a gauge of the time required to complete a representative example, or a critique of my speed specifically? Those 20 minutes included an interruption for a lookup regarding GCU and NGU ABX indications, so the elapsed time in that instance is not representative of how long it would take someone racing the clock on that task in isolation. That is often true of any elapsed time with me: I often address the sidetrack threads before returning to the main weave du jour, as opposed to leaving them loose or tagging them for later. Quercus solaris (talk) 17:45, 3 March 2023 (UTC)
Certainly not a critique. I sometimes do boring, minor changes as a way of patrolling entries of interest, with the result that the ostensible main task is often interrupted. In this case, I'd expect one would discover etymologies that need cleanup. DCDuring (talk) 00:22, 4 March 2023 (UTC)
I agree, quite true. A pattern that I find is that the surface analysis, which from some viewpoints might be considered the lowest-hanging fruit for entry, is nonetheless sometimes absent, even when the historical origin and/or ultimate/ancient etymons are already covered. And it occurs to me (now) to acknowledge that point explicitly here, in case anyone reading this thread later wonders, "how can those multiple etymology sections be confidently backfilled so fast when no philological deep-digging (for each term) is being done?" It's because the surface analysis can usually be backfilled without historical research, and it is valid (per se) without it, that is, "for what it's worth", and for most people it is worth plenty (in fact it is the chief value), since they need perspicacity into ISV compounds (discerning the building blocks) more than they need the historical philology (although that is nice too). Quercus solaris (talk) 06:34, 4 March 2023 (UTC)

2022 ISO 639-3 language code changes

Koavf's post above about Unicode reminded me to check, and I noticed that the 2022 changes to ISO 639-3 have also been posted. They:

  • retired zkb (Koibal) as a duplicate of kjh (Khakas); several of our reconstruction pages mention Koibal (as part of a Kamass-Koibal group of related languages).
  • retired tpw (which they call Tupi, and we call Old Tupi and have several hundred entries in) as a duplicate of tpn (Tupinamba), but we seem to use those two codes for separate lects, and Wikipedia considers that tpw can be either tpn or tpk rather than only tpn, so some consideration is required. (codes left as-is)
  • retired kgm (Karipuna) as a duplicate of plu (Palikur); we seem to have already retired (or in any case don't have) code kgm so we're good on that front.
  • retired slq (Salchuq), ostensibly related to Azeri, as nonexistent.

They also:

  • merged tmk (Northwestern Tamang) into tdg (Western Tamang).
  • merged ajp (South Levantine Arabic) into apc (North Levantine Arabic); see discussion here and also recall that some users are working on importing ajp language data, so please ensure any change to these codes here has the support of the editors who are working on Levantine Arabic and won't disrupt that work.
  • merged pmk (Pamlico) into crr (Carolina Algonquian)
  • merged prp (Parsi) into guj (Gujarati); we already removed prp
  • merged xss (Assan) into zko (Kott)
  • merged szd (Seru) into uki (Ukit)
  • merged nom (Nocamán) into cbr (Cashibo-Cacataibo)

They also:

  • split plj (Polci) into Pesse [pze], Dir-Nyamzak-Mbarimi [nzr], Zul [zlu], and Buli [uly]
  • split ksa (Shuwa-Zamani) into Rishiwa [rsw] and [izm]
  • split zua (Zeem) into Tulai [tvi], Dyarim [dyr], Dokshi [dsk], Cha'ari [cxh], and Zeem [zem]

And added several new codes/languages:

  • lgs (Guinea-Bissau Sign Language)
  • vjk (Bajjika)
  • lvl (Lwel)
  • ykh (Khamnigan Mongol), for which we currently use the exceptional code xgn-kha, so we should be able to switch that over (and move any categories which use the old code in their names). Done.
  • ycr (Yilan Creole); added, see Wiktionary:Beer parlour/2023/March#Adding_Yilan_Creole_Japanese
  • wtb (Matambwe)
  • ikh (Ikhin-Arokho)
  • eud (Eudeve)
  • dzd (Daza)

(They also changed the names of wnb, krp, loh, and lag.)
If there are any changes that would require consideration before implementing (or rejecting), besides tpw, which requires consideration of whether to change existing uses and to what, and ajp, which requires care so as not to disrupt ongoing work, please comment. Pinging User:Tropylium, who might know of or have access to resources about zkb Koibal, and User:Ungoliant MMDCCLXIV, who might have knowledge/resources/opinions about Tupi. - -sche (discuss) 20:01, 27 February 2023 (UTC)

Yeah it's been agreed to merge ajp and apc but this requires some care given how many entries there currently are. I will be working on this at some point; haven't gotten there yet. Benwing2 (talk) 21:01, 27 February 2023 (UTC)
As for Koibal, someone has fucked up here: the Koibal / zkb which we mention in entries like *pajmå is an extinct Uralic language from the 18th century, not a dialect of Khakas / kjh (that its speakers have since then shifted to). If it's SIL that has fucked up (and not the people I've seen using zkb for Samoyedic Koibal; would include e.g. Glottolog and Multitree), you could expect this to be reverted by the next update as long as someone gets around to complaining to them. --Tropylium (talk) 23:24, 27 February 2023 (UTC)
Khamnigan Mongol is now done. Theknightwho (talk) 18:32, 28 February 2023 (UTC)
More generally, please be careful when adding new language codes. I am about half-way through revamping the language data modules (currently up to K), so please check that what you are adding is consistent with the other codes in the same module. Theknightwho (talk) 18:32, 28 February 2023 (UTC)
Glottolog identifies [tpw] with w:Lingua Geral Paulista, so there's that consideration.
They also ID [zkb] as the Samoyed lect, but consider that to be a dialect of [xas], so either way there's no need for a separate code and I don't see this as likely to be reversed.
Oh wow, Multitree is up again. I figured it was simply defunct. kwami (talk) 04:53, 1 March 2023 (UTC)
Right, I actually don't recall seeing [zkb] for any real entries; do we even link to any anywhere? Merging it with [xas] instead would be doable fine. --Tropylium (talk) 16:30, 1 March 2023 (UTC)
Wait, sorry, I didn't understand. If I want to add words from Samoyedic Koibal, will they be registered as Khakas words? Tollef Salemann (talk) 17:24, 1 March 2023 (UTC)
You will need to determine whether your "Koibal" words are Kamassian (ISO xas) or Khakassian (ISO kjh), and choose one of those ISO codes. I wouldn't recommend using the defunct code zkb because it's ambiguous between sources, and it's possible that which code it's merged into here on Wikt will change in the future and mess up your edits: ISO has determined that [zkb] is a duplicate of Khakas [kjh], but Glottolog and Multitree treat it as a dialect of Kamas [xas]. Best IMO to avoid [zkb] altogether. kwami (talk) 23:19, 1 March 2023 (UTC)

Your wiki will be in read only soon

Trizek (WMF) (Discussion) 21:21, 27 February 2023 (UTC)[reply]

Now this is really a shocking headline, if I've ever seen one. It sounds almost like a scam email: no period, nothing; you do the exclamation all by yourself. Yeah, well, thanks for using your funding, resources and fat checks to actually do something on-wiki for once. Based WMF??? Fishing Publication (talk) 14:01, 28 February 2023 (UTC)

VGPaleontologist

(Notifying Benwing2, Cinemantique, Useigor, Guldrelokk, Fay Freak, Tetromino, PUC, Brutal Russian): A lot of bad edits (especially grammatical ones) by User:VGPaleontologist. I have checked and cleaned up edits back to "07:59, 24 February 2023" (the earliest); earlier edits still need to be checked. @Benwing2, could you perhaps also run a bot to update his entries to use automated IPA? He was copying from the Russian Wiktionary.

I have blocked the user for two weeks. Anatoli T. (обсудить/вклад) 06:46, 28 February 2023 (UTC)

@Atitarev I've had it with users like this; I think this user should be perma-blocked given their repeated warnings and total lack of response to those warnings. Maybe call it a "3 strikes and you're out rule"; this user has had far more than 3 strikes against them. BTW I've been thinking of writing a bot script to mass-undo all changes made by a given user; this may give me the motivation to do it. Also I don't see a lot of manual IPA added before Feb 24; most of the older commits are adding translations. But let me see if I can find some. Benwing2 (talk) 07:05, 28 February 2023 (UTC)
@Benwing2: Thanks, I filtered by page creation. I don’t object if new entries are nuked in the future. I felt like fixing them today. Anatoli T. (обсудить/вклад) 07:18, 28 February 2023 (UTC)
@Benwing2 If you do develop such a script, that would be a great way to undo everything by Rajkiandris. Theknightwho (talk) 09:31, 28 February 2023 (UTC)
@Theknightwho Yes, this same user was the original motivation for writing the script. Benwing2 (talk) 21:33, 28 February 2023 (UTC)
I've been monitoring this person, also known as StuckInLagToad (talkcontribs), for a few months. It seemed to me that they were improving (for example, they're adding genders to nouns now), but I may be missing things I can't see simply because I don't know Russian or much of any other language outside English, Spanish, and their close relatives. Maybe they would have better luck here if they communicated with us more. It seems their only communication with the rest of us is in response to blocks or warnings, and they've proven unreliable, with the many times they've said they understand us and then kept up the same behavior. Still, even though they have had a lot of extra chances, I'm glad this isn't an eternal block, because I hate to see hard work answered with punishments. I wonder if they would agree to edit in English only from now on, or accept some other kind of restriction that others may come up with. Soap 11:02, 28 February 2023 (UTC)
@Soap The problem is that this user doesn't communicate or respect warnings, so whether or not they're getting better is mostly irrelevant IMO, and I doubt they will agree with (or even respond to) a request to edit only in English. Benwing2 (talk) 22:28, 28 February 2023 (UTC)
@Soap, Benwing2: He learned a few tricks, like adding genders and stresses, because otherwise it generates errors, but he is completely unaware of declensions or doesn't care enough. He made multiword terms declined as a single word (at мускусная утка), adjectival nouns as regular nouns, animate nouns as inanimate, indeclinable nouns as declinable, etc., which made a mess.
If someone blocks him for longer, I won't object. Anatoli T. (обсудить/вклад) 23:34, 28 February 2023 (UTC)
This user has been given too many chances and hasn't (even once!) responded on their talk page to numerous requests to stop and change their behavior, so I extended the block to one year. I don't think two weeks has enough teeth for someone like this. In the block message I specified how they can request a reduction in the block. Benwing2 (talk) 08:49, 1 March 2023 (UTC)

Sabir language

At the moment, eleven of the thirteen entries at Category:Sabir lemmas are cited to the 19th-century Dictionnaire de la Langue Franque, ou Petit Mauresque. However, there is a good deal of controversy in the relevant academic literature over the reliability of the Dictionnaire as a source for Sabir (AKA the Mediterranean Lingua Franca or MLF), and even over whether Sabir existed as a durable language in the first place. In light of this I'm not sure it's a good idea for us to present these entries without some sort of warning label.

Joshua Brown, "On the Existence of a Mediterranean Lingua Franca and the Persistence of Language Myths" (2022, in this book), is very sceptical of the evidence in the Dictionnaire. Brown finds that only 1% of the terms in the Dictionnaire are not derived in some way from Italian (68% of the entries are simply Tuscan terms or at most modified only orthographically, a further 18% represent senses transferred to a different Tuscan lexeme, and the rest correspond in more or less obvious ways to an Italian root)—but compare how we've classified them. On this basis he rejects that the Dictionnaire represents anything more than a variant of Tuscan, and indeed rejects that MLF was ever a distinct language with a durable grammar as opposed to a set of ad hoc patterns arising from language contact in specific contexts.

Joanna Nolan, The Elusive Case of Lingua Franca (2020), is much less sceptical than Brown, but she also accepts that the lexicon of the Dictionnaire is heavily biased towards Italian. She points out, however, that the book also includes a section of dialogues which contradict the lexicon and seem to reflect a more heavily Spanish-influenced vocabulary and orthography ("these phrases offer additional Lingua Franca vocabulary, and indeed reveal a more Spanish influence than the predominantly Italian bias of the wordlist"), possibly the work of a different author. Despite accepting MLF as a real phenomenon that's reflected in some way by the Dictionnaire (primarily its dialogues), she accepts that it's an unreliable and frustrating source and mentions that various scholars have adopted a more sceptical attitude.

I don't think we need to go as far as Brown and the other radical sceptics, but we should probably add some kind of note of caution to these entries. I'm not sure if we have a ready-made template we can add, or if there's something else we can do? —Al-Muqanna المقنع (talk) 14:05, 28 February 2023 (UTC)

{{LDL}} looks suitable. --RichardW57m (talk) 16:17, 28 February 2023 (UTC)
I don't think it's relevant. The issue is the reliability of the source (and dispute over the existence of the language itself) rather than just the number of attestations. —Al-Muqanna المقنع (talk) 18:55, 28 February 2023 (UTC)
@Al-Muqanna I think we need to work up a custom template for this. Maybe for example we can use {{Webster 1913}} as a model. Benwing2 (talk) 22:47, 28 February 2023 (UTC)
And the problem here is that we have just one dictionary, apparently exemplifying the problem that relying on a single dictionary is vulnerable to any problems with its compilation. --RichardW57 (talk) 08:47, 1 March 2023 (UTC)

Arabic-script declension template for nouns in Northern Kurdish

Hi, everybody
I was thinking that Arabic-script Northern Kurdish entries might benefit from the addition of a dedicated noun declension table template, and I was wondering if anyone else felt the same way. — GianWiki (talk) 20:27, 28 February 2023 (UTC)

@GianWiki I personally think this should not be a priority. Northern Kurdish is primarily written in the Latin script and a LOT of work remains in getting declension tables for nouns and adjectives as well as conjugation tables for verbs. We should not focus on Arabic-script terms until the Latin-script terms are in good shape. Benwing2 (talk) 22:40, 28 February 2023 (UTC)
I see. Thank you very much for your input. – GianWiki (talk) 16:48, 1 March 2023 (UTC)