Wiktionary:Beer parlour

Welcome to the Beer Parlour! This is the place where many a historic decision has been made, and where important discussions are being held daily. If you have a question about fundamental aspects of Wiktionary—that is, about policies, proposals and other community-wide features—please place it at the bottom of the list below (click on Start a new discussion), and it will be considered. Please keep in mind the rules of discussion: remain civil, don’t make personal attacks, don’t change other people’s posts, and sign your comments with four tildes (~~~~), which produces your name with timestamp. Also keep in mind the purpose of this page and consider before posting here whether one of our other discussion rooms may be a more appropriate venue for your questions or concerns.
Sometimes discussions started here are moved to other pages for further development. In particular, changes to a major policy or guideline may be discussed on the corresponding talk page and “simple votes” (as opposed to drawn-out discussions) can be conducted on our votes page.
Questions and answers typically remain visible on this page for one to two months, but they can always be found in the appropriate monthly archive (based on the date discussion was initiated). While we make a point to preserve all discussions that were started here, talk that is clearly not appropriate for this page may be deleted. Enjoy the Beer parlour!
February 2023
FYI: January 2023 Unicode newsletter
https://mailchi.mp/f8faa6f0371c/unicode-in-6222562 —Justin (koavf)❤T☮C☺M☯ 20:37, 1 February 2023 (UTC)
Bolding of years in reference templates
Considering we had a whole discussion about whether or not to end reference templates with a full stop, which is arguably a much more subtle matter, I want to pose another stylistic question: should years in reference templates be in bold, like they are for quotation templates? I personally don't think so. It looks out of place and there's not really any reason to highlight the year of publication instead of the author's name, the title of the book, etc. For quotations it makes sense to present the year in bold because it's the very first piece of information presented, and because at a glance it helps to show when a term was in use.
For examples of reference templates that do this, see T:R:alv, T:R:lt:Safarewicz1967, and this search (I can't guarantee there are no FPs or FNs). There are also non-template-based hardcoded references that put the year in bold, as on čoms, and see this search. 70.172.194.25 22:58, 1 February 2023 (UTC)
- I agree—bold years are great for citations/quotes but seem much less useful for reference list entries. In a similar vein, I find it especially pointless how w:Template:cite journal sets the volume number in bold. A typical example of how it looks in a ref list is at w:Radium#References. But I defer if anyone knows any great reasons for such bold that I am ignorant of. Quercus solaris (talk) 08:04, 2 February 2023 (UTC)
- Putting a volume number in bold is standard practice in some citation styles. —Justin (koavf)❤T☮C☺M☯ 08:27, 2 February 2023 (UTC)
- I agree, I think it looks weird. Vininn126 (talk) 08:22, 2 February 2023 (UTC)
- I think the rationale for having the year in bold is to show usage over time, which I can see some value in. —Justin (koavf)❤T☮C☺M☯ 08:27, 2 February 2023 (UTC)
- For quotations, yes, not for references. —Al-Muqanna المقنع (talk) 08:55, 2 February 2023 (UTC)
- I can see an argument made for consistency, but I do agree: citations bolded, references not. brittletheories (talk) 10:31, 3 February 2023 (UTC)
Too big
Wiktionary includes many rare and obscure words, which is great, but gets in the way of fulfilling the function of a concise or learner's dictionary, where you would want to learn common words first, and to know which words are recognized by most native speakers. Would there be a way to list only the top N thousand words, some kind of category or appendix? Drapetomanic (talk) 03:10, 2 February 2023 (UTC)
- @Drapetomanic See Category:Basic word lists by language. Coverage is a bit uneven but it's a good place to start. Benwing2 (talk) 04:00, 2 February 2023 (UTC)
- Thank you, this is awesome Technicalrestrictions01 (talk) 08:50, 3 February 2023 (UTC)
- @Benwing2 I have actually never seen this before. Should we promote it on the front page, as we do with appendices and frequency lists? Or would it be better to incorporate this material into WT:FREQ? brittletheories (talk) 10:34, 3 February 2023 (UTC)
- @Brittletheories If you can incorporate it into WT:FREQ that would be great. Benwing2 (talk) 20:14, 3 February 2023 (UTC)
Removing horizontal rule ---- between language sections
- Also brought up in October 2005, February 2006, June 2011, May 2013 and likely elsewhere as well.
Though it's been here for 20 years, it's maybe time to rethink whether we really need it. One often sees it either missing in places where it should be or extraneous at the end of an entry: it is overall confusing for new editors and even experienced editors can occasionally slip up. It serves no purpose, besides being aesthetically pleasing to some, and it's arguably a misuse of wikitext syntax. We could start a formal vote. Catonif (talk) 15:31, 2 February 2023 (UTC)
- I like it. People often screw up indentation levels, but this separator makes it much harder to do so in a way that would merge two languages. Useful for bots too. Equinox ◑ 15:55, 2 February 2023 (UTC)
- (As always, the elephant in the room is the refusal [or rather technical difficulty] of using/introducing a real markup language like XML. I suppose someone, by this point, must have written a Wikt-to-XML converter, but it must be an absolute piece of horror. On the other hand, it's totally unreasonable to expect users, even power users, to write XML entries: anyone who has written commercial code since 1990 will have enjoyed the whole "& amp;" thing, where one side does or doesn't decode or encode, or does it twice. But wikitext is shit and we know it, and never mind the Band-Aids.) Equinox ◑ 02:35, 3 February 2023 (UTC)
- I agree with Equinox, it makes for a helpful division in the wikicode (and displayed page). I don't see removing it as offering any benefit. There are a lot of areas where our content is uncompact and wastes horizontal and vertical space, but I don't see the tiny amount taken up by this line as a problem. - -sche (discuss) 07:04, 3 February 2023 (UTC)
- Yes, I don’t see a compelling reason for doing away with it. — Sgconlaw (talk) 07:09, 3 February 2023 (UTC)
- Same. Whoop whoop pull up Bitching Betty ⚧️ Averted crashes 12:34, 3 February 2023 (UTC)
- The problem with horizontal rules isn't visual. We can generate the same lines with CSS if horizontal rules are removed. The problem is that they take up space in the wikitext and they don't give any extra information, and code has to be written so that it works whether they are there or not. — Eru·tuon 20:23, 3 February 2023 (UTC)
I find the horizontal rule above and below the language name confusing. I would prefer to do away with the rule below the language name. --RichardW57m (talk) 12:57, 3 February 2023 (UTC)
- I wonder whether there might be custom CSS or JS that would address this. DCDuring (talk) 15:02, 3 February 2023 (UTC)
About bots and regex searches, wouldn't /^==[^=]/m or something of the like work as well? I can see how it might help in reading the wikitext, though I wonder whether syntax highlighting could also do the job (making L2 headers a particular colour). Also, we should be thinking whether it is necessary enough to be kept, rather than bad enough to be removed. It is undeniably confusing for new users (there's many things much more confusing here, yes, but those are essential) and seems overall redundant. Catonif (talk) 18:16, 3 February 2023 (UTC)
- As someone who codes bots on the regular, the horizontal rule is actually much more of a nuisance than an aid. — SURJECTION / T / C / L / 18:21, 3 February 2023 (UTC)
- I completely agree with User:Surjection here. My bot scripts never use the horizontal rule to identify language sections since it's not reliable (sometimes users don't include it, etc.). Instead I look for L2 sections, like User:Catonif mentioned, and I have to take care to split off the horizontal rule (and categories ...) before doing certain sorts of transformations, and then put the stuff back at the end, and worry about properly inserting the horizontal rule (or not) if I insert a new L2 section. Benwing2 (talk) 19:54, 3 February 2023 (UTC)
- It's similar with me. In dump-related activities, I only use level-2 headers to parse language sections. Horizontal rules (---- or <hr>) are useless to me, and they are sometimes a nuisance because I have to remove them from any search results that show the contents of language sections, and write the code such that it works if they are there or not. I would prefer removing horizontal rules from the wikitext and generating horizontal lines with CSS targeting level-2 headers instead. — Eru·tuon 20:14, 3 February 2023 (UTC)
- Hmm, OK, I find the line helpful (when editing pages the traditional way i.e. not using visual editor) and am used to it, but if it's inconveniencing our bot runners, then that's an actual harm (where previously I hadn't realized there was one) which must be weighed against its benefit in visually separating language sections in the wikicode. Is the fact that people sometimes forget to include it a unique challenge as compared to any number of other things people do wrong in entries, like typoing section names, using imbalanced numbers of equals signs, forgetting a # in a sea of definitions and indented quotes, etc? - -sche (discuss) 00:56, 4 February 2023 (UTC)
- @-sche No, all of the things you mention cause problems, and things like typos in section names actually cause more problems because they can't be handled automatically. But I don't see much benefit in the visual separation of the line; at least in my browser there's also a horizontal line directly below the L2 language name, so the line above it caused by the ---- seems superfluous. And if it can be displayed automatically (as User:Erutuon mentions), that seems a better approach. Benwing2 (talk) 03:14, 4 February 2023 (UTC)
- To clarify, when I say I find it useful for "visually separating language sections in the wikicode", "when editing pages the traditional way i.e. not using visual editor", I mean I find the presence of ---- a useful separator when looking at the actual wikicode in the edit window, so whether we automatically display a line in the displayed text of the page with CSS is less important (although I do find it useful there, too, as it keeps the L2 from seeming like it belongs to the section above it). Nonetheless, I don't feel strongly about retaining the actual wikicode ----. - -sche (discuss) 20:58, 8 February 2023 (UTC)
- For text editing, I find it useful for manually extracting a language section for cloning inflected forms for terms that have homographs in other languages. Its absence will require greater attention to cursor placement for copy and paste. --RichardW57 (talk) 09:02, 9 February 2023 (UTC)
- If it causes problems, then I support getting rid of it. I have thought at times that it was redundant, although I don't mind it.--Urszag (talk) 07:59, 4 February 2023 (UTC)
- Given the comments from users who actually run bots above I don't personally see the value of keeping it. —Al-Muqanna المقنع (talk) 09:48, 4 February 2023 (UTC)
Should we start a formal vote? @Benwing2, Erutuon, etc. Catonif (talk) 19:34, 7 February 2023 (UTC)
- Would it be possible to put the rule generated automagically above the L2 (language) header, so that it served as a divider between the sections? DCDuring (talk) 01:02, 8 February 2023 (UTC)
- We can generate a line with CSS over all headers but the first, yeah. I worked out the rough CSS code that's needed, though I didn't save it anywhere. — Eru·tuon 05:54, 8 February 2023 (UTC)
- — SURJECTION / T / C / L / 19:01, 8 February 2023 (UTC)
body.ns-0 #mw-content-text > .mw-parser-output > h2:not(:first-child) {
    border-top: 2px solid #777; /* or something else */
    padding-top: 0.5em;
}
- Thanks. Should it be tested? (Can I test it myself by inserting the snippet into my custom CSS?) If it works, I'd support getting rid of the four dashes. DCDuring (talk) 22:28, 8 February 2023 (UTC)
- Does this code work if the header is preceded by {{also}}? Or if it doesn't, would we just accept a rule between the similar writings and the first L2 header? --RichardW57m (talk) 10:53, 9 February 2023 (UTC)
- @DCDuring, RichardW57m I tested it at User:Catonif/common.css. Had to replace :first-child with :first-of-type to account for the TOC, so it is now unbothered by {{also}}s. Catonif (talk) 10:03, 12 February 2023 (UTC)
Vote created. Feel free to edit it in this buffer week. Catonif (talk) 20:24, 8 February 2023 (UTC)
I missed the important vote. :( BTW, I wanted to remove too. You can add bgcolor in heading like Thai does if you want some "distinction" among languages. --Octahedron80 (talk) 15:59, 25 March 2023 (UTC)
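The approach several bot operators describe in this thread (locating language sections by their level-2 headers instead of relying on ----, and discarding any rule lines found along the way) could be sketched roughly as follows. This is only an illustration in Python, not code from any actual bot: the names `L2_HEADER`, `RULE_LINE` and `split_language_sections` are hypothetical, and the header pattern is a stricter variant of the /^==[^=]/m idea mentioned above.

```python
import re
from typing import Dict

# Level-2 header: a line such as "==English==" (exactly two leading equals signs).
# Hypothetical pattern; a line starting with "===" is rejected by [^=\n].
L2_HEADER = re.compile(r"^==[ \t]*([^=\n].*?)[ \t]*==[ \t]*$", re.MULTILINE)

# A horizontal rule on a line of its own: four or more hyphens, or an <hr> tag.
RULE_LINE = re.compile(r"^[ \t]*(?:-{4,}|<hr[ \t]*/?>)[ \t]*$", re.MULTILINE)


def split_language_sections(wikitext: str) -> Dict[str, str]:
    """Map each level-2 header name to the text of its section.

    Anything before the first level-2 header (for example {{also}}) is ignored,
    and horizontal-rule lines are dropped, so the result is the same whether or
    not the page uses them.
    """
    sections: Dict[str, str] = {}
    matches = list(L2_HEADER.finditer(wikitext))
    for i, match in enumerate(matches):
        # A section runs from the end of its header to the start of the next one.
        end = matches[i + 1].start() if i + 1 < len(matches) else len(wikitext)
        body = RULE_LINE.sub("", wikitext[match.end():end])
        sections[match.group(1)] = body.strip()
    return sections


sample = "{{also|Chat}}\n==English==\n===Noun===\n# a talk\n\n----\n\n==French==\n===Noun===\n# cat\n"
for name, body in split_language_sections(sample).items():
    print(name, "->", repr(body))
```

Because the rule lines are stripped rather than used as delimiters, a sketch like this would behave identically on entries with or without ----, which is the property the bot operators say they already have to code for.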
Decreasing Dan's Ban
If you missed it, Dan Polansky is two days into a month-long ban. The immediate trigger was this section, but there is a wider context that can be read here.
I disagree with the duration of the ban, so I therefore propose to decrease the ban. I have included three options and a separate "safety" measure. You may also float and discuss alternative sanctions for Dan, such as restrictions on certain namespaces. ←₰-→ Lingo Bingo Dingo (talk) 20:15, 3 February 2023 (UTC)
- I find this proposal rather abrupt and inappropriate. You've already made it clear that you disagree with the ban with the vote to desysop, yet you've made yet another vote to decrease his block. There are users that have found Dan's behavior problematic in the past, and it's not like TheKnightWho is the first person to block Dan either. I don't even know if there's precedent for undoing someone's block with a vote, let alone one posed at Beer Parlour, which decreases visibility. There are better ways to go about this. Also, when would this vote even end? There aren't enough details to begin with. AG202 (talk) 20:44, 3 February 2023 (UTC)
- It is entirely normal to bring up a block for discussion in the Beer Parlour. It is something that has happened many times and yes, it has resulted in partial reversals before. There is nothing untoward about this procedure. ←₰-→Lingo Bingo Dingo (talk) 20:56, 3 February 2023 (UTC)
- Bringing up a block for discussion, yes, I have seen that before. But immediately proposing a vote? I really do not think that's the best way. AG202 (talk) 21:07, 3 February 2023 (UTC)
- @-sche's comment at Wiktionary:Votes/sy-2023-02/Desysop_Theknightwho is extraordinarily pertinent here as well. AG202 (talk) 21:09, 3 February 2023 (UTC)
- -sche has had disputes with Dan for a long time. That comment was entirely expected and, truth be told, not very pertinent at all. ←₰-→Lingo Bingo Dingo (talk) 21:44, 3 February 2023 (UTC)
- That comment also specifically addresses the argument you are making: Dan gets involved with everyone who warns him about his behaviour, which makes any attempt to step in after giving a warning look biased. It's bog standard manipulation. Theknightwho (talk) 21:55, 3 February 2023 (UTC)
- @AG202: The decrease of the ban and the de-sysoping are two separate issues and should thus be handled separately. Thadh (talk) 22:51, 3 February 2023 (UTC)
- I have discovered that Dan has continued writing his rants about individual users on his talkpage on the Czech Wiktionary, including within the last 24 hours. Given that he explicitly refers to comments left on the desysop vote page, and therefore knows that these are a major contributing factor towards his blocks, it is impossible to see how he could be doing this in good faith. His future on the project feels untenable. Theknightwho (talk) 04:13, 4 February 2023 (UTC)
- Wow, just wow. Knowing this, I'd be more inclined to vote for increasing his block length than decreasing it (although I'm not seriously proposing this). For the record, I don't have any particular grudge against Dan. His Thesaurus and Czech contributions seem valuable, and I even think I would side with him in one of the battles of this silly drama war (his closure of the Named roads RfD seemed relatively reasonable to me), but going on critical rants about en.wiktionary.org users on another WMF site after being blocked here for exactly that doesn't seem like the behavior of someone who has learned their lesson and is trying to improve. 70.172.194.25 04:41, 4 February 2023 (UTC)
- This is grounds for an indefinite ban for me. Completely unacceptable. @Vininn126, you need to see this. AG202 (talk) 04:45, 4 February 2023 (UTC)
- This comment in particular stands out to me: "Nicméně protože jsem jistým obnoxiózním neférovým hnusákem, který se tváří jako Brit ale dost možná je nějaký despotický Asiat (edituje mongolštinu, proč?), na anglickém Wikislovníku zablokován..." (However, since a certain obnoxious unfair bastard who pretends to be British but is quite possibly some despotic Asian (he's editing Mongolian, why?) blocked me from the English Wiktionary...) Dan has a history of making dubious racially-charged comments. Particularly given that he said "Dělají ostudu anglosaské kultuře" ("They are a disgrace to Anglo-Saxon culture") in reference to the English Wiktionary admins just over 2 weeks ago. Theknightwho (talk) 05:20, 4 February 2023 (UTC)
- Oh. Oh dear. 😦 🤦♀️ Whoop whoop pull up Bitching Betty ⚧️ Averted crashes 17:18, 4 February 2023 (UTC)
- I just checked his Czech page again and he's updated it with past examples of inappropriate behavior from other users (both directed against him and other targets). To be fair, I think a lot of those examples are far more egregious than anything I've seen Dan write in his critical user reviews. But I think the logical syllogism goes the other way than what I think he's trying to argue. It's not that "X was able to say something terrible without any sanction, so Y should be able to get away with mild attacks." Rather, I think X and Y should both be sanctioned proportionally to their wrongdoing. And there should of course be some notion of forgiving and forgetting things from long ago, especially if apologies have been issued and behavior has changed. That said, some of the comments in question are recent and from the admin involved here; so maybe commenters on the desysop vote should review those. I don't really want to get involved in this further, the discussion is already making my blood pressure rise, and I regret participating in it. I guess I have a naively optimistic view that smart people with a shared mission should be able to get along and work collaboratively, but that doesn't seem to always be the case. 70.172.194.25 06:26, 4 February 2023 (UTC)
- "far more egregious than anything I've seen Dan write in his critical user reviews"—Does that apply to any other than Romanophile's rant on his talk page? Most of it seems pretty mild to me, and none of it implies bigotry against entire groups which seems significant to me when considering the climate this kind of thing creates for other users beyond the people having an argument. I might be missing some though. —Al-Muqanna المقنع (talk) 11:01, 4 February 2023 (UTC)
- @Al-Muqanna Speaking of which, Dan has also penned this delightful essay, which is one of the most tone-deaf pieces that I’ve ever read. Probably one of the best examples of the way Dan feigns objectivity and detachment while being highly selective in how he frames things in order to push an ulterior motive. It honestly doesn’t matter whether this is down to intentional manipulation or a lack of awareness of his own emotions; it’s antisocial either way, and yet another example of something he does that is highly offputting to anyone actually affected by it. Theknightwho (talk) 06:17, 5 February 2023 (UTC)
- Speaking as someone who's trans herself, good frelling riddance! Whoop whoop pull up Bitching Betty ⚧️ Averted crashes 18:16, 6 February 2023 (UTC)
- I wrote the above before I realized the extent of his comments. 70.172.194.25 19:06, 6 February 2023 (UTC)
- Today, Dan wrote this rant about bigotry. In it, he says (in English):
- Any Asian user account presents an objectively existing cultural risk, as anecdotally confirmed by the behavior of Wyang in the English Wiktionary.
- If someone wants to accuse me of thinking of Asians as anti-democratic and despotic (not each of them, of course; I know a very kind Chinese I had worked with I would have never thought of a despotic; we are talking tendencies), that accusation is correct, and I believe that thinking is supported by solid evidence and analysis. One may even argue that Slavic people tend to be despotic as well (think of the current debacle with Russians, who ought to revolt better against their semi-crazed leader), yet I am nominally Slavic.
- Transgenderism or transgender ideology seems to be the cultural norm, although, objectively, it is not the completely dominant cultural norm even in the U.S., where this dangerous form of denial of objective reality has taken root.
- I consider these kinds of statements fundamentally incompatible with the ethos of Wiktionary. They’re grounds for an indefinite block. Theknightwho (talk) 16:40, 5 February 2023 (UTC)
- The admins really need to get a grip and act on this because if overt white supremacy doesn't merit removal from this project then I imagine I'm not the only editor who'd have difficulty continuing to participate. —Al-Muqanna المقنع (talk) 17:15, 5 February 2023 (UTC)
- @Theknightwho, Benwing2, Lingo Bingo Dingo We should probably warn the Czech Wiktionary admins about what DP's doing there (if we haven't already). Whoop whoop pull up Bitching Betty ⚧️ Averted crashes 19:19, 6 February 2023 (UTC)
- I got the impression that they're already aware. If they aren't, then they soon will be without our intervention, I think! Theknightwho (talk) 19:21, 6 February 2023 (UTC)
- Just thought I'd bring it up, seeing as DP's still active there, and still has much of their vitriol against various English Wiktionarians on their talkpage there (although they've deleted the very worst of the racist ranting and raving in an apparent attempt to seem more respectable / hide what they'd written earlier, that, too, can still be seen in all its glory in their talkpage's history; judging from the tenor of those of DP's talkpage comments that've been translated here, at least some of what was on their talkpage would likely warrant revdeletion). Whoop whoop pull up Bitching Betty ⚧️ Averted crashes 19:31, 6 February 2023 (UTC)
- I was going to say this yesterday in response to Dan's previous statements, but overt racism of any sort is completely beyond the pale and should automatically lead to an indefinite ban. Benwing2 (talk) 19:55, 5 February 2023 (UTC)
- OK, as an uninvolved admin I have blocked Dan indefinitely. Benwing2 (talk) 20:01, 5 February 2023 (UTC)
- Thank goodness! Whoop whoop pull up Bitching Betty ⚧️ Averted crashes 18:25, 6 February 2023 (UTC)
- Okay, that's quite more than bad enough, reversing my vote. This can be snowballed, if desired. ←₰-→Lingo Bingo Dingo (talk) 20:45, 5 February 2023 (UTC)
- Concerning this behaviour of Dan that has just come to light – wow, just wow. — Sgconlaw (talk) 20:49, 5 February 2023 (UTC)
- I don’t know about this infinite bad requiring an infinite ban. Being wowed is not unjustified, but myself I am not triggered here in the least but my impressions and those of others who felt the block not too short are confirmed. These are the grievances you write if you sit on the internet too much; it’s as easy to lose one’s marbles as to appear ideologized, both are in the end the having primitive ideas that someone has not succeeded to think through. Lingo Bingo Dingo warned about his being “not in the best mental state”, and specifically for this I believed he needed a time off. Understanding bans also as a preventive measure, less so than a penalty for being evil, which I still find hard to believe, understanding the probabilities: you couldn’t just tally the block to the evil action without wider context, different persons always have required different measures even and specifically on this wiki, where there are various pathological patterns of editing; unfortunately we can’t put in much effort to convert him from his wrong. Fay Freak (talk) 21:21, 5 February 2023 (UTC)
Completely Undo Dan's Ban
Proposal: Immediately unban Dan Polansky upon the end of this straw vote (the condition being overwhelming support), if it hasn't been undone through another proposal. If this proposal and another reduction proposal pass, this one has priority.
Rationale: Posting that section was not ban-worthy.
Support
Support ←₰-→Lingo Bingo Dingo (talk) 20:15, 3 February 2023 (UTC)
- Support I don't agree with many of Dan's contentious points but two or three statements which arguably show signs of low-level racism shouldn't get him banned. Free speech is too precious for that. --Overlordnat1 (talk) 09:08, 6 February 2023 (UTC)
- 🤦♂️ AG202 (talk) 13:32, 6 February 2023 (UTC)
- "Two or three"? "Arguably"? "Low-level racism"? Seriously? 🤦♀️ There're a LOT more statements than that, many of them filled with obvious, pretty-high-level racism (and since when has racism been OK as long as it's only "low-level", anyways?). Not to mention the overt transphobia that's appeared more recently, and doubtless yet more insults and bigotry that I skimmed over. 🤦♀️ Whoop whoop pull up Bitching Betty ⚧️ Averted crashes 18:24, 6 February 2023 (UTC)
- My point is that Dan hasn't done things like use racial slurs or issue death threats to people because of their race. I think a lifelong ban is excessive. --Overlordnat1 (talk) 19:03, 6 February 2023 (UTC)
- He has yet to even apologize and continues writing these essays, not only being racist and transphobic (also he uses derogatory terminology in his "essay" on trans folks, yet tries to academic his way around it), but also directly attacks editors here, including with the racist commentary. There's no way that people will ever feel comfortable with him being active here with that type of vitriol. And for the love of everything above, please don't use the "free speech" argument again (this is precisely what the deleted sense at free speech was talking about). AG202 (talk) 23:44, 6 February 2023 (UTC)
- Please don't try to dictate to me what phrases I can and cannot use. I was using the phrase in a manner consistent with definition 1 at free speech. My opinion is irrelevant in any case as I'm clearly outvoted in this instance. I've made my stand and have nothing further to say. --Overlordnat1 (talk) 01:03, 7 February 2023 (UTC)
- IMO Dan deserved to be blocked for trying to hijack our entire deletion process and filibustering any attempts to stop him. That's what he was blocked for in the first place. When the indefinite sitewide block was reduced, he responded by intensifying the disruptive behavior- not a good sign. As for the talk-page stuff, it just showed that any attempt at compromise was a waste of time. The details of what he said about whom aren't as important as the fact that he was still trying to justify his actions by vilifying anyone who disagreed with him. The offensive content that's come to light since is just icing on the cake. Chuck Entz (talk) 09:13, 7 February 2023 (UTC)
Oppose
- Oppose Theknightwho (talk) 21:53, 3 February 2023 (UTC)
- Oppose. The desysop vote surprised me, and it seemed a bit underinformed to look only at Dan's most recent comment in isolation from context. LBD's harsh reply to me there and comments here surprise me even more (apparently I am not great at noticing when someone has a problem with me? I'm sorry!), and suggest this is not coming from a place of being underinformed about the problem. Unfortunately, to be knowingly arguing over whether a specific comment would violate the letter of one selected standard if it were viewed in isolation from context, ignoring the issue of the user being persistently disruptive and even (as pointed out on the vote page by another editor) ignoring even the letter of other standards which would prescribe an outcome the arguer doesn't like as much, such as the BLOCK policy which specifically prescribes blocks of this length for persistent or repeat offenders, kind of seems a bit like wikilawyering...? - -sche (discuss) 00:59, 4 February 2023 (UTC)
- Oppose - Dan's talkpage conduct completely justifies the ban. Whoop whoop pull up Bitching Betty ⚧️ Averted crashes 01:06, 4 February 2023 (UTC)
- Oppose Vininn126 (talk) 08:47, 4 February 2023 (UTC)
- Oppose, the issue at stake is clearly not one specific comment. To be honest, to me at least, the track record of racially charged and other exclusionary remarks noted both here and on the vote page, particularly against Asians, makes it very hard to understand the impetus for protecting Dan without any evidence of him amending his behaviour. I'll add, in that respect, another item that has basically gone without comment to my knowledge: Dan's parting shot at Talk:antimuslim, where he apparently calls other editors insane for believing that people with Asian names could be native English-speakers (cf. Citations:antimuslim, where the O'Brien quote was missing at the time). —Al-Muqanna المقنع (talk) 09:27, 4 February 2023 (UTC)
- Oppose AG202 (talk) 15:25, 4 February 2023 (UTC)
- Oppose — Fenakhay (حيطي · مساهماتي) 21:54, 4 February 2023 (UTC)
- Oppose - Correct me if I am wrong, but Wiktionary policy states that your third block due to rudeness must be a month long, and this is Dan's fourth. Three citations, for all senses. (talk) 19:07, 5 February 2023 (UTC)
- Oppose, switched. ←₰-→Lingo Bingo Dingo (talk) 20:45, 5 February 2023 (UTC)
Abstain
Comment
Decrease Dan's Ban to One Week
Proposal: Reduce the total length of Dan's ban to one week. If this proposal and another reduction proposal pass, the shorter one has priority.
Rationale: Posting that section may not have been ban-worthy by itself, but considering the context a slap on the wrist is justified.
Support
- Support ←₰-→Lingo Bingo Dingo (talk) 20:15, 3 February 2023 (UTC)
- Support --Overlordnat1 (talk) 09:08, 6 February 2023 (UTC)
- My commentary on the previous proposal is equally-applicable here. 🤦♀️ Whoop whoop pull up Bitching Betty ⚧️ Averted crashes 18:41, 6 February 2023 (UTC)
Oppose
- Oppose Theknightwho (talk) 21:53, 3 February 2023 (UTC)
- Oppose, as posting that section was banworthy - the context merely makes it much more so. Whoop whoop pull up Bitching Betty ⚧️ Averted crashes 01:07, 4 February 2023 (UTC)
- Oppose Vininn126 (talk) 08:47, 4 February 2023 (UTC)
- Oppose —Al-Muqanna المقنع (talk) 09:27, 4 February 2023 (UTC)
- Oppose AG202 (talk) 15:25, 4 February 2023 (UTC)
- Oppose — Fenakhay (حيطي · مساهماتي) 21:54, 4 February 2023 (UTC)
- Oppose Three citations, for all senses. (talk) 00:58, 5 February 2023 (UTC)
- Oppose, switched. ←₰-→Lingo Bingo Dingo (talk) 20:45, 5 February 2023 (UTC)
Abstain
Comment
Decrease Dan's Ban to Two Weeks
Proposal: Reduce the total length of Dan's ban to two weeks. If this proposal and another reduction proposal pass, the shorter one has priority.
Rationale: Posting that section may not have been ban-worthy by itself, but considering the context a slap on the wrist is justified.
Support
- Support ←₰-→Lingo Bingo Dingo (talk) 20:15, 3 February 2023 (UTC)
- Support --Overlordnat1 (talk) 09:08, 6 February 2023 (UTC)
- 🤦♀️ Whoop whoop pull up Bitching Betty ⚧️ Averted crashes 18:42, 6 February 2023 (UTC)
Oppose
- Oppose Theknightwho (talk) 21:53, 3 February 2023 (UTC)
- Oppose, as posting that section was banworthy by itself. Whoop whoop pull up Bitching Betty ⚧️ Averted crashes 01:07, 4 February 2023 (UTC)
- Oppose Vininn126 (talk) 08:47, 4 February 2023 (UTC)
- Oppose —Al-Muqanna المقنع (talk) 09:27, 4 February 2023 (UTC)
- Oppose AG202 (talk) 15:25, 4 February 2023 (UTC)
- Oppose — Fenakhay (حيطي · مساهماتي) 21:54, 4 February 2023 (UTC)
- Oppose - Three citations, for all senses. (talk) 00:59, 5 February 2023 (UTC)
- Oppose, switched. ←₰-→Lingo Bingo Dingo (talk) 20:45, 5 February 2023 (UTC)
Abstain
Comment
Restrict Dan from voting on Desysop Theknightwho
Proposal: Bar Dan Polansky from voting here if he is unbanned.
Rationale: Very recently banned people should not vote so soon on the staff who banned them.
Support
- Support ←₰-→Lingo Bingo Dingo (talk) 20:15, 3 February 2023 (UTC)
- Support - well, duh! Whoop whoop pull up Bitching Betty ⚧️ Averted crashes 01:08, 4 February 2023 (UTC)
- Support - see 2. Three citations, for all senses. (talk) 00:00, 5 February 2023 (UTC)
- Support - Obvious conflict of interest. — Ceso femmuin mbolgaig mbung, mellohi! (投稿) 05:27, 5 February 2023 (UTC)
- Weak support It doesn't make much difference as Dan is but one editor but this would be a conflict of interest. --Overlordnat1 (talk) 09:08, 6 February 2023 (UTC)
Oppose
- Oppose If an editor is not banned, they should be allowed to vote. The idea that someone who has had a negative experience with an admin should not be allowed to vote based on that experience is absurd. People can vote for whatever they want based on whatever criteria they want. - TheDaveRoss 15:03, 7 February 2023 (UTC)
Abstain
Comment
Comment
You can add general comments here. ←₰-→ Lingo Bingo Dingo (talk) 20:15, 3 February 2023 (UTC)
- @Lingo Bingo Dingo Do the recent developments here make the ongoing desysop vote against TKW moot, or are we going to leave that vote to run its course (not that it's likely to matter either way, given that there seems to be an overwhelming consensus not to desysop them, but still)? Whoop whoop pull up Bitching Betty ⚧️ Averted crashes 18:29, 6 February 2023 (UTC)
- Yes, I'll cut it short. There is no point to letting it continue. ←₰-→Lingo Bingo Dingo (talk) 21:29, 6 February 2023 (UTC)
Sad to know this user is eventually permabanned. This user made some good points on the existing problems of Wiktionary's administration. -- Huhu9001 (talk) 13:05, 26 February 2023 (UTC)
Premature archiving
In January 2022, I commented on Wiktionary talk:Requested entries (English) that suggestions on Wiktionary talk:Requested entries (English) had been prematurely archived. There was no response.
The same thing has happened again, with even a suggestion I posted in January 2023 moved to the 2022 archive. Andy Mabbett (Pigsonthewing); Talk to Andy; Andy's edits 21:08, 3 February 2023 (UTC)
Alternative forms
Here's a question that was occasioned by a particular word but that has more general applicability. What is the preferred entry layout for an inflected form of an alternative form?
Specifically:
screweyes is the plural, of course, of screweye.
It is also an alternative form of screw eyes.
Around which of these two ways of conceptualizing the word should the headings under the entry screweyes be structured? I know how to build the entry either way, but am not sure which would be a better fit for uniformity across Wiktionary in this sort of situation. --HelpMyUnbelief (talk) 01:42, 4 February 2023 (UTC)
- My understanding is that the norm is that each inflected form is defined as an inflection of its own corresponding singular, and then the less-common singular is defined as an alternative form of the other, 'main' spelling. This does mean that someone who looks up e.g. mockups has to click twice, instead of just once, to arrive at mock-up, but this is not so onerous. Sometimes people do add both things to the definition line, i.e. "plural of ___, alternative spelling of ____" or the other way around. To define mockups only as an alternative form of mock-ups and not mention its singular mockup at all would be wrong IMO. - -sche (discuss) 02:09, 4 February 2023 (UTC)
- Thanks. Now that I've re-pondered the issue in light of your answer, I'm having one of those "Of course! What was I thinking?" epiphanies.--HelpMyUnbelief (talk) 07:35, 4 February 2023 (UTC)
- Agree. At minimum "plural of ___". Optimally "plural of ___, alternative spelling of ____". Quercus solaris (talk) 22:39, 4 February 2023 (UTC)
I like the idea of listing it both ways in the definition; and leading with "plural of", I see now, is the only order that makes sense. And of course I'll always add an "Alternative forms" section to the 'main' entry. Thanks for the input, everyone. — HelpMyUnbelief (talk) 01:33, 6 February 2023 (UTC)
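A minimal wikitext sketch of the layout discussed above, assuming the usual English headword and form-of templates ({{head}}, {{plural of}}, {{alt form}}). The choice and ordering of templates here is illustrative only, not a statement of policy:

```
==English==

===Noun===
{{head|en|noun form}}

# {{plural of|en|screweye}}; {{alt form|en|screw eyes}}
```

The definition line leads with "plural of" and then adds the alternative-form pointer, matching the order preferred in the thread.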
Is it time to look at Toki Pona again?
I'm very much not a Wiktionary editor by any stretch of the term, so if I'm way off base in any of this, feel free to poke me about it.
After another quick search of the toki pona appendix, because it's simply the only good toki pona to english dictionary around, I realized... wait, why is it in the Appendix, exactly? Some digging later I found the conlang inclusion guidelines, and saw that the inclusion criteria for mainspace was essentially "some very old IALs". Which, yeah, those are the ones that have a lot of use. Of course, there is a conlang that has a lot of use that Isn't a very old IAL (or from a book/movie): toki pona.
I went on a quick search of WT:VOTES and found that the last time that toki pona was discussed in any depth for inclusion appears to be this 2010 discussion, and that this 2017 discussion appears to me to be a consensus that the inclusion criteria for conlangs is essentially "the community says so". Since then, a few constructed languages have been removed from the main dictionary, but none have been added. I think it would likely be useful to add Toki Pona. For example, here's the number of speakers of the four currently included constructed languages, and Toki Pona.
Esperanto | Ido | Interlingua | Volapük | Toki Pona |
---|---|---|---|---|
~60,000 (c. 2017) | 200 (c. 1999) | ~1,500 (c. 1999) | 20 (c. 2000)[1] | ~1,400 (c. 2022) |
- ^ Claimed 1,000,000 in 1889. Dubious.
I'm almost certainly comparing apples to oranges by using the vastly different dates for Toki Pona and Esperanto compared to the rest of them, but it's surprisingly difficult to find data on how many speak the other three. While I understand that Volapük makes sense to include for the same reason Wiktionary includes, say, Latin, it seems to me like the number of Toki Pona speakers is comparable to the number of Interlingua speakers and Ido speakers. For that reason, I ask: why is Toki Pona relegated to the Appendix? Unlike in 2010, when Toki Pona had a small-to-nonexistent community, nowadays Toki Pona is actively spoken / written by many people - including, for what it's worth, a handful of enwikipedia users.
Toki Pona also recently received an ISO 639-3 code. The application for it can be found here, and I must say it does a better job of explaining its worthiness for inclusion than I do. Although "second-most used conlang" is a bold claim - have these jan never heard of Interslavic?
Because of my lack of experience with Wiktionary, I don't really know whether it's a discussion actually worth having, but here you go regardless. Casualdejekyll (talk) 02:29, 4 February 2023 (UTC)
- Personally I think all conlangs other than Esperanto should be moved to the Appendix. Benwing2 (talk) 03:20, 4 February 2023 (UTC)
- I have to agree. brittletheories (talk) 12:26, 4 February 2023 (UTC)
- I would highly prefer this option. Volapük could also possibly stay due to its historical significance, but I wouldn't shed any tears over it being moved to the appendix too. — SURJECTION / T / C / L / 20:02, 4 February 2023 (UTC)
- I agree, too. - -sche (discuss) 20:13, 4 February 2023 (UTC)
- Agreed as well. All other conlangs are insignificant compared to Esperanto. Toki Pona may become a big thing but it's not there yet. Ioaxxere (talk) 20:45, 4 February 2023 (UTC)
- I would also like to propose criteria for a conlang to be included into mainspace:
- Has an ISO 639-3 code
- Has or had at one point a significant community of native speakers
- Has a significant body of original literature (not translations) on a wide range of topics
- Ioaxxere (talk) 20:54, 4 February 2023 (UTC)
- Agreed. Vininn126 (talk) 22:55, 4 February 2023 (UTC)
- Those other languages didn't get here by virtue of their number of speakers .... they got here because they're fully featured languages capable of doing anything a natural language can do, and have demonstrated such use through a body of works written in the language. Importantly, they can be used to translate material from another language. Quenya, Klingon, and other languages meant for fictional works are often very well made, but they were made for a specific purpose and cannot be used to express concepts outside the fictional world of the work. Therefore they cannot be used to translate written works the way Esperanto and the others can. Toki Pona is an experimental language even more restricted in vocabulary than the languages used for fictional works .... it isn't capable of expressing concepts outside its scope by design. Any translation from English into Toki Pona and back again would result in a distorted message. Therefore it isn't useful to our readers to be putting Toki Pona words and translations into mainspace. Much better to use the appendix, where they're all together, since most likely, people looking for one Toki Pona word are looking for others. —Soap— 08:01, 4 February 2023 (UTC)
- Agree with Soap on this. There are methodological issues with treating it the same way as any natural language. —Al-Muqanna المقنع (talk) 19:30, 4 February 2023 (UTC)
- You can say whatever you want in Toki Pona. You just have to use a bunch of words to express the same as a concept in English. I am not an expert in this conlang, however. Three citations, for all senses. (talk) 01:12, 5 February 2023 (UTC)
- IIRC, one problem with including Toki Pona would be multi-word strings and if they are SOP or not, since definitions of base words is very loose, it would be hard to determine, plus it would be more difficult to say which such phrases have fully lexicalized and which are nonce (this is a problem with other languages, too, but exacerbated in Toki Pona). Vininn126 (talk) 12:33, 4 February 2023 (UTC)
As I see it, Wiktionary aims to include all words in all languages, with the footnote that they should be in use by a language community. This excludes made-up words which hobbyists dabble in. (It does not exclude made-up words that are actually in use, such as coinages by the Académie française.) Whether a language is a natlang or not is, therefore, irrelevant. (Standard French is a conlang created by the Académie française.) What matters is the language community and the use (vs. the hobbyists who dabble). Many hobbyists do not make a use.
Our rule of “use in permanently recorded media, conveying meaning, in at least three independent instances spanning at least a year” is how we’ve been establishing use (in conlangs among others) for a long time. It’s flawed, but I think we should abide by it until we have a better alternative. When Lojban was moved to an appendix, several people brought up that most (all?) Lojban entries could not possibly meet this requirement. I don’t think discussing the inclusion of conlangs in main space can be fruitful until that argument is out of the way.
So you say that over a thousand people use Toki Pona. Sweet, but do they publish durably archived works? can Toki Pona entries meet our criteria for inclusion? (If so, please add quotes! Three independent durably archived quotes on every single sense of every single Toki Pona entry is the one argument I want to hear.) Or is there any other proof that there’s a language community (rather than a thousand hobbyists with too much time on their hands) that we could discuss? If so, I’m with you; if not, sorry. MuDavid 栘𩿠 (talk) 02:37, 6 February 2023 (UTC)
- This is roughly the way we've approached the issue in the past. Volapük and Esperanto are used in a large number of durably archived books, so lots of words are attestable. (Volapük has gone out of fashion nowadays, but books in it were published in the late 1800s and early 1900s.) On that basis I support keeping Volapük and Esperanto in the mainspace. Ido and Interlingua don't have as large of a durably archived corpus, but I would guess probably still enough. For what it's worth, I've seen an actual physical book in Interlingua before, which is more than I can say for Lojban or, for that matter, Toki Pona. —Granger (talk · contribs) 03:50, 6 February 2023 (UTC)
- I disagree with Soap; there are many modern messages Sumerian can't translate ("The Communists moved their tanks over the steel bridge, and ICBMs controlled by computers in Moscow were ready to launch.") If we had a body of text in Toki Pona, then fine. Klingon has ( http://klingon.wiki/En/PhysicalBooks ) a handful of printed books, with most of them being pretty short; The Wizard of Oz, for example, is one of the longer ones, at 40,000 words. Still, I'd argue for Klingon, if there weren't the copyright issues. Volapük has a decent collection of works, even if they're mostly century-old. Vo.Wikisource.org is sparser than I'd like to see, but it's still got a decent amount of text. There doesn't seem to be a single work printed in Toki Pona that's not about Toki Pona.--Prosfilaes (talk) 02:20, 23 February 2023 (UTC)
- When I looked into this last year, I was able to find only two such works, and they were by the same author. I don’t know whether that has changed. 70.172.194.25 06:28, 25 February 2023 (UTC)
Quotes missing translations
These should categorize under their own categories, not under "Requests for translations of X usage examples". Quotes practically always have surrounding context, while usage examples do not, so translating them is a different kind of task. — SURJECTION / T / C / L / 13:32, 5 February 2023 (UTC)
- Agree. Vininn126 (talk) 14:49, 5 February 2023 (UTC)
- Done — SURJECTION / T / C / L / 07:37, 6 February 2023 (UTC)
User rights
I've been curating our lists of users with certain rights (through the use of Special:ListUsers), and have removed rollback rights from the accounts of long inactive users.
I suggest a bureaucrat (@Chuck Entz, Surjection) do the same for administrator rights.
Also, since administrators automatically have the rollback rights, I intend to remove them from this list as well, so that only people without administrator rights appear in it. This would concern @Benwing2, Mnemosientje, Rua, SemperBlotto, Surjection. Any objection? PUC – 14:24, 5 February 2023 (UTC)
- @PUC Hi, you need to keep rollback rights on User:Benwing2 since this is a non-admin account. You can remove them from User:Benwing. Benwing2 (talk) 19:43, 5 February 2023 (UTC)
Normalisation in Old English entries
@Skiulinamo:, @Hundwine:, @Hazarasp:, @Leornendeealdenglisc: Hello, all. I wanted to start a discussion to address an important issue regarding our Old English entries, and you all seem to be the editors consistently contributing to the language (please forgive me if I've left anyone out). I want to see if we can establish a consensus regarding the issue of spelling normalisation. I recently saw an edit to dēorling here where the main entry was moved to dīerling (a very rare or possibly unattested form [?]) but the move makes sense given that we have the stem's entry as dīere (itself a rare, but attested spelling). In my own personal view, dīere would be the etymologically "expected" form, being inherited from *diurī, that later became dȳre. I know there are others just as valid, and these are often dialectal or temporal variations, but what are your thoughts on how situations like this should be treated? I think we could get Old English terms added faster and more efficiently if we all align our efforts as a single unit and work together. Personally, I prefer the normalised spelling (even though I am not currently editing that way), as it allows end users to more readily see how derived terms relate to the root. But I am flexible and will support whatever we decide as a group. And of course, we can choose to decide nothing and simply continue to do as we do now. What say ye? Leasnam (talk) 17:56, 5 February 2023 (UTC)
- I'm not flexible at all. Normalizing makes it way easier to edit wiktionary. I'm not hunting everywhere to determine if a normalized spelling is attested when it's often impossible to tell, since there is no database listing every spelling for every word. Much more convenient to just allow the use of normalized spellings like most dictionaries do; that's been the de facto policy for Old English entries forever without causing any problems. We already sacrifice a little bit of purity for usability when we replace wynn and ⟨uu⟩ with ⟨w⟩, which creates an enormous amount of unattested spellings just by itself. I'll go into more detail if you like, but to me the benefits of allowing these spellings vastly outweigh the costs. Hundwine (talk) 20:49, 5 February 2023 (UTC)
- This has already been brought up a few other places, right? E.g. User_talk:Hundwine#Hyrsum, Wiktionary:Requests_for_deletion/Non-English#hiersum. I agree that it seems like a good idea to come to a general consensus rather than bringing it up on each applicable word. I am not generally involved in editing Old English entries on Wiktionary; it seems to me that this issue mainly involves the diphthong "ie", a spelling "which is virtually restricted to Early West Saxon" ("Late West Saxon palatal diphthongization", CORE), but which apparently is useful per Hundwine as a normalized version of the more common alternatives found in its place.--Urszag (talk) 18:37, 5 February 2023 (UTC)
- If you're going to do this, please at least add {{normalized}} and make it clear what the attested form(s) are in some way. 70.172.194.25 18:47, 5 February 2023 (UTC)
- Perhaps greater use of dialect labels could be of use? Currently, "dēorling" is only marked as "Alternative form of "dīerling" and there is no dialect label on either, but if the normalization is implicitly to an Early West Saxon standard, perhaps it will be more appropriate to make that explicit by including an "Early West Saxon" dialect label on the normalized entry, and it certainly seems it would be helpful to include the labels for the dialects that use the form "dēorling" on that page.--Urszag (talk) 18:57, 5 February 2023 (UTC)
- I agree with User:Urszag here about dialect labels. Keep in mind that {{alt form}} supports a |from= parameter to specify a dialect label; see its documentation as well as Category:Form-of templates. So we can add the appropriate dialect labels to indicate e.g. that a spelling is 'Late West Saxon' or whatever. ('Late West Saxon' is in fact one of the already-supported labels in Module:labels/data/lang/ang, meaning that if you use it, you'll get appropriate links.) Benwing2 (talk) 22:01, 5 February 2023 (UTC)
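As a rough illustration of the |from= parameter mentioned above, a definition line on an alternative-form page might look like the following. The language code ang and the label 'Late West Saxon' are taken from the comment above; attaching that label to dēorling is only a placeholder here, since the thread does not establish which dialects actually use that form.

```
# {{alt form|ang|dīerling|from=Late West Saxon}}
```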
Changing our glossary definition of neologisms
@CitationsFreak @Al-Muqanna as people involved in the discord I believe our current definition of neologisms in the glossary is laughably bad. Neologisms is more a type of marking like slang, a "perceived" newness, and not "anything new". Vininn126 (talk) 18:51, 5 February 2023 (UTC)
- Yeah, neologisms have to have a "new-word" scent to them to count. "Cinemanic" has it, "spongy moth" doesn't. Three citations, for all senses. (talk) 18:59, 5 February 2023 (UTC)
- I agree the glossary definition is problematic, and should be more specific to account for how we use the term. When discussing neologisms people are generally thinking of something more organic than e.g. officially decided scientific names (so SARS-CoV-2 is not a neologism, but Covidtide is). It's also not useful to add a "neologism" context label to every word that happens to originate after, say, 2010. I also note WT:Neologisms (linked at the glossary) says that we label words as neologisms when they're not in other dictionaries, which is a bit barmy in my opinion (are we labelling early modern English terms that happen not to be in other dictionaries "neologisms" too?) and not how the label is used in practice. (I see @DCDuring complained about this way back in 2009 on the talk page too!) —Al-Muqanna المقنع (talk) 19:04, 5 February 2023 (UTC)
- I agree. New word formation happens continually in living languages with large speaker populations, and it isn't useful in a non-linguistics-specific/exclusive context to apply the label of "neologism" to all of the developments (the example given above is a good one). It may be fine for linguists who are using the term advisedly with an agreed operational definition to use it in that sense amongst themselves (such as "any word less than 10 years old" or whatever), but for a wider audience, it is problematic because the public takes it to be a label of casualism or slang, which is a distinct sense from the stricter technical sense. (Polysemy strikes again.) Regarding other dictionaries, one thing about them for sure is that they fail to enter countless words that ought to have lexicographic coverage, and most of those have nothing to do with casualism or slang but rather are simply scientific or technical words that aren't common outside of particular semantic contexts. In the pre-web era, a valid excuse was page count containment. Today for the online versions (of any general dictionary for adults) there is no excuse except lack of budget to pay people to enter them and curate the collection. For Wiktionary at Appendix:Glossary#neologism perhaps a short and clear explanation something like: "neologism: A newly coined term or meaning. Wiktionary does not label new words or senses with this label unless their acceptance in formal register is incomplete or contentious." Something along those lines. Quercus solaris (talk) 19:34, 5 February 2023 (UTC)
- I apply the (neologism) label if a term has achieved widespread recognition in a short time but is not generally recognized as a "real" word, essentially acting as a warning label. Does anyone have objections to these criteria? Ioaxxere (talk) 19:44, 5 February 2023 (UTC)
- I agree that that is essentially the same spirit/theme. The adjective "real" is problematic for this purpose because nonce words and slang words are definitely real words; the true distinction is more about register (formality or absence thereof). But yes, nonetheless, I agree that that is the same idea. Quercus solaris (talk) 19:51, 5 February 2023 (UTC)
- That is more or less what I am trying to say with the OP, and I agree with Q that it's less about "realness" (though I recognize that it's meant to refer to how it's perceived by speakers), and more about that when they hear the word, they feel it's new. Vininn126 (talk) 19:55, 5 February 2023 (UTC)
- Yeah, I'm not sure how exactly to define it, but agree we should define it better. I think there are at least two criteria: firstly what Vininn called "perceived newness" (I understand what Ioaxxere is getting at with "achieved widespread recognition in a short time", but can't a term be a neologism and also rare / not widely recognized? so we need to be careful how we word this), and secondly actual newness (in previous discussions, people have said a word may take a generation to establish itself, so anything older than 20 years, maybe even just 15 or 10 years, isn't a neologism). I think the definition needs to include both, since IMO a name like Margaery or pronoun like ve or singular they or thon or any other word which is actually 100+ years old can't be a "neologism" even if people mistakenly perceive it as new (though it can be "rare", "nonstandard", etc, and in exceptional cases we might want to go into detail in a usage note). I agree it doesn't make sense to call something like SARS-CoV-2 or Delta variant or e.g. tennessine a "neologism" just because they were coined within the last eight years. (I wouldn't necessarily mind entirely replacing the label with defdates or other indications, e.g. in the etymology, of when a word was first used, but certainly as long as we're using the label we should define it well.) - -sche (discuss) 20:36, 5 February 2023 (UTC)
- I think your point about ACTUAL newness is valid - the Polish term dlaczemu is often perceived as a neologism, despite being over 100 years old. Vininn126 (talk) 20:57, 5 February 2023 (UTC)
- Removing it to the etymology section was my first instinct as well, but I think the point about perception and not just reality of newness is solid and supports its use as a context label. I think defining it as a combination would make sense. —Al-Muqanna المقنع (talk) 21:19, 5 February 2023 (UTC)
- By "achieved widespread recognition", I'm talking about coverage by major news sites. In absolute terms, I agree that hardly anyone is using words like tipflation or tripledemic. A neologism that hasn't been established this way should be called a protologism. Ioaxxere (talk) 21:15, 5 February 2023 (UTC)
- It is not a good idea to redefine for label purposes the word neologism, which almost all other OneLook dictionaries define as "A new word or phrase, or a new use of a word.", as we did before this change (Nov 18, 2021) and still do in Appendix:Glossary. I don't think we have bothered to look for attestation to support the changes in our definition. Even if we found such attestation, it still seems questionable to use the word in ways that are novel to our users, no matter how we define it in our glossary. I suppose that we could restrict the application of the label to only some neologisms ("new word or phrase, or new use of an existing one"), using some clearly stated criteria (which could be somewhat subjective) presented tersely in Appendix:Glossary#N and at greater length at Wiktionary:Neologisms.
- If someone would like to take a crack at a first draft of a substitute or addition to Wiktionary:Neologisms, I am sure many would be happy to suggest changes. DCDuring (talk) 01:09, 6 February 2023 (UTC)
Improvement & expansion of Wiktionary:Frequency lists
I've been thinking about how Wiktionary:Frequency lists might be improved upon. Firstly, I was thinking of creating subpages for wiki-linked frequency lists using either the word lists based on www.opensubtitles.org or those included as part of the Leipzig Corpora Collection. Both collections (though particularly the latter) provide a staggering range of languages and could really help to improve our coverage of smaller languages and identify missing words. Best of all, the content is available under CC, the former with CC BY-SA-4.0 and the latter under CC BY (only applies to the corpora available for download), with no version specified. I hope to then later use these wikilinked lists to expand the frequency lists used on some other language projects, for instance in the Danish, Norwegian and German wiktionaries. If anyone would like to coordinate or collaborate, that would be awesome, but I'm equally fine to proceed on my own as and where time permits, if this is deemed a useful contribution. To see an example of what I'm thinking, you can check out this WIP. Enable OrangeLinks.js for best results.
This brings up the second aspect of my question regarding the organisation of that page, something which - at least from what I can see - doesn't seem to have been discussed much previously. In any case, I think the whole thing could do with some cleanup before being enlarged, with perhaps some of the information being moved into subpages - more so than is already the case. Since it looks like this could be quite a big job, and a lot of the decisions somewhat arbitrary, I'd rather get some input and perhaps find a consensus before starting on it - unless no one really cares either way. I'm unsure whether everything should be moved into subpages and we just keep the links to each individual language's subpage here, or whether we aim to make this more an index of the frequency lists, much as it is now but with all of the actual list content (if applicable, for example much of the English section) moved into subpages. Any feedback or suggestions welcome. Helrasincke (talk) 08:28, 6 February 2023 (UTC)
- @Helrasincke The frequency lists are in desperate need of some love. IMO they are clearly important, e.g. there was just a Beer Parlour discussion a few days ago (see #Too big) in reference to these lists, where I brought up Category:Basic word lists by language (there's also Category:Basic word lists by family), which aren't integrated into the frequency lists. Yet I don't know of anyone who actively works on them; so I'd suggest you just go ahead and start cleaning things up. Benwing2 (talk) 18:24, 6 February 2023 (UTC)
- Great, thank you for your input @Benwing2. I've worked through to German, at the moment I'm focusing on moving any content to subpages and tidying lists, later I'll go through everything again for the deep clean - but I'm loath to make any rash or unilateral decisions on that front, since my idea would involve deleting much of what is there and replacing it with the larger lists mentioned in my first post. In general, I'd like to draft — ideally with community input — some written guidelines as to the scope of these pages and how they might be best used in the context of the project. As it stands, there is a lot of content which IMO is of minimal value to the task of increasing our coverage of words from many languages (granted that's my understanding of their main purpose). For instance 'A list of the top 13 words in language X', '200 most common sentences', 'Most common letters in language Y', etc. seem to fall short of that goal, when there are now lists available up to 1M for most common languages. I'm quite open to others' suggestions as to how to proceed, and there may be good arguments for their inclusion (here or elsewhere) which I can't yet see, but IMO this should at least be moved to a separate appendix (perhaps one targeted at learners or other use cases) or else removed entirely. It's hard to argue we should be providing a general directory service in the era of the search engine.
- Furthermore, I think there could be an argument made to remove all of the numeric content from the lists, since this information is a) not in itself a criterion for inclusion (or exclusion for that matter) b) somewhat meaningless outside the context of the specific choice of source material and c) cluttering many of the pages. Same goes for content in tables.
- So, clearly I have my views, but I would like to see if we can get some more general input, because there are bound to be different ways of seeing this. Is there anywhere else I could/should be asking these questions? Helrasincke (talk) 00:26, 8 February 2023 (UTC)
- @Helrasincke This is the right place to ask these questions. I think the reason you're not getting a lot of input from other people is that most editors focus more on the mainspace; the subspaces and appendices get relatively neglected in comparison, except for some appendices that document language grammar or general editing guidelines for particular languages. Maybe if I specifically ping some long-time editors such as @-sche, DCDuring, Equinox, Chuck Entz, they will comment (or at least tell me to F off :) ...). Personally I think there are multiple purposes for frequency lists; one is definitely to aid in adding missing words (and that's how I've primarily used them), but another is to help learners in identifying words to focus on, and yet another is in making sure that when there are multiple synonyms in a given language for a given word, the most common one is the one showing up in translation tables. Also, there's IMO definitely a value in having Wiktionary curate frequency lists (at least picking the most high-quality and useful ones) rather than just picking some at random: in my experience with various languages, some frequency lists are garbage either because (a) they use weird non-representative corpora, (b) they do a bad job lemmatizing non-lemma forms, and/or (c) they include proper names and abbreviations without properly identifying them as such (so e.g. you can filter them out). In general, just running a word-count algorithm on a bunch of randomly chosen text will yield poor results, and that's pretty much all that some frequency lists consist of. As for some of the more random lists currently listed among the frequency lists, IMO you should feel free to move them to an appendix or link them from a different page than the main frequency-list page; they shouldn't be deleted for now unless their content is wrong, since they might be useful e.g. to learners. Benwing2 (talk) 03:47, 8 February 2023 (UTC)
- English word-frequency lists are an important input for "defining vocabulary" lists, which would help us make more instantly accessible (ie, user doesn't need to go to another page to understand the definition), less vacuous (eg, dormitive principle) definitions. Longman published a list of some 2,000+ words in which definitions were written. Any words used in a definition not in that list were highlighted. We can do them one better by using wikilinks for their highlighted words and not for the words in the defining vocabulary. DCDuring (talk) 15:20, 8 February 2023 (UTC)
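The wikilinking idea in the comment above could be sketched roughly as follows. This is only an illustration under stated assumptions: the tiny `DEFINING_VOCAB` set, the function name `link_outside_vocab` and the sample definition are all hypothetical, and a real defining vocabulary would hold around 2,000 words derived from a frequency list.

```python
import re

# Hypothetical defining vocabulary (assumption): a real one would be far larger.
DEFINING_VOCAB = {"a", "small", "animal", "that", "has", "long", "ears"}

# Plain ASCII words only, to keep the sketch short.
WORD = re.compile(r"[A-Za-z]+")

def link_outside_vocab(definition, vocab):
    """Wrap every word that is NOT in the defining vocabulary in [[...]]."""
    def repl(match):
        word = match.group(0)
        # Words in the defining vocabulary stay unlinked; the rest get a wikilink.
        return word if word.lower() in vocab else "[[" + word + "]]"
    return WORD.sub(repl, definition)

linked = link_outside_vocab(
    "A small herbivorous animal that has long ears", DEFINING_VOCAB
)
print(linked)
```

Here only "herbivorous" falls outside the vocabulary, so it is the only word that gets linked. Inflected forms and multi-word terms are not handled.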
- I'd love to see Wiktionary include some high quality frequency lists. As you've mentioned what we currently have is pretty uneven so if you're motivated to take this on go ahead!
- I think there are two separate and important questions here:
- Where do we get good frequency lists?
- I've done some work building a Spanish frequency list for my own purposes and I can say the best option is to find a good corpus and use it to build the frequency lists yourself. As Benwing2 mentioned above, many existing frequency lists have some limitations that make them inappropriate for generating a lemma frequency list. For example, the opensubs "FrequencyWords" wordlists mentioned above convert everything to lowercase and include many names from the credits, so valid Spanish words like "lisa", "vera", and "vegas" appear much more often than they should. The original corpus does not suffer from the same problems but requires you to parse a bunch of XML files to get the original data. It's a good corpus for finding colloquial expressions but otherwise limited in the depth of vocabulary. I would avoid any source that doesn't include full sentences or, at least, 5-gram case sensitive slices of the text so that the same corpus can later be used for finding collocations or identifying multi-word lemmas.
- How do we incorporate good frequency lists into Wiktionary?
- I think Wiktionary:Frequency lists (or wherever we decide to put it) should contain only a list of curated frequency lists that meet whatever requirements we set for #1, perhaps with an explanation of the corpus and how the list was generated.
- Ideally every mature language would include both a "word frequency" and a "lemma frequency" list. The former being useful for finding new words to add to Wiktionary and the latter being helpful for learners and also for categorizing "common" lemmas in each language.
- JeffDoozan (talk) 17:04, 8 February 2023 (UTC)
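The lowercasing problem described above can be illustrated with a minimal sketch. The three sample sentences are invented for illustration (they are not taken from OpenSubtitles), and `count_tokens` is a hypothetical helper, not part of any existing tool.

```python
from collections import Counter
import re

# Invented sample sentences (assumption); a real list would come from a full corpus.
SENTENCES = [
    "Lisa vive en Las Vegas.",
    "La pared es lisa.",
    "Lisa y Vera son amigas.",
]

# Runs of letters only; Unicode-aware, so accented letters are kept.
TOKEN = re.compile(r"[^\W\d_]+")

def count_tokens(sentences, lowercase):
    """Count word tokens, optionally folding everything to lowercase."""
    counts = Counter()
    for sentence in sentences:
        for token in TOKEN.findall(sentence):
            counts[token.lower() if lowercase else token] += 1
    return counts

folded = count_tokens(SENTENCES, lowercase=True)
cased = count_tokens(SENTENCES, lowercase=False)

# Case folding merges the name "Lisa" with the adjective "lisa".
print(folded["lisa"], cased["lisa"], cased["Lisa"])
```

With case folding, "lisa" is counted three times; the case-sensitive count keeps the single adjective apart from the two occurrences of the name. Sentence-initial capitals (such as "La") still need separate handling, which is one reason full sentences are more useful than bare word counts.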
- Routledge publishes a series of "frequency dictionaries" for English and ten or so other languages. The English one has 5,000 terms (some hyphenated, none spelled open) and can have a single homograph under as many as 5 PoSes, constituting 5 items on its list. The English one was published in 2010. DCDuring (talk) 22:41, 8 February 2023 (UTC)
- I notice we have frequency lists from a number of sources whose inclusion could be seen as problematic from the standpoint of w:WP:C, some of which appear particularly brazen, and here I do very specifically include the (currently 20 or so) above mentioned Routledge sources, all of which have been uploaded by one user. Of concern are also the HSK & JLPT lists, the Khmer.info, Sanskrit. Despite their clear desirability, I would prefer we didn't host such sources at all without explicit permission or a clear explanation from initial uploader of why they should be excepted, in light of the clear risk. See more here. Are these grounds for speedy deletion? Helrasincke (talk) 23:10, 9 February 2023 (UTC)
- @JeffDoozan My responses to your points below:
- Although I agree that lemmatised lists are satisfying and nice to have, IMO for our purposes the bigger lists with inflected forms are also quite useful: a) their licences are unambiguously compatible with this project, unlike many of the alternatives; b) they are more representative of words as they're likely to be encountered and thus reflect the range of forms which will be used for lookup; and c) if we're doing our work, we'll end up including the lemmas as part of the process anyway, since the inflected entries invariably lead there. There is however obviously the issue of homographs, so source sentences are important. The Wortschatz Leipzig Corpora which included frequency lists and the OpenSubtitles results are ok on this front, but they do have some clear downsides. I agree the cleaning is a big sticking point - and the Wortschatz ones have had no cleaning done. I'm still trying to get an acceptable workflow for this and my coding skills are not really up to the task. If you have any suggestions here, I'm all ears. Perhaps @Hermitd would like to weigh in here - how would you set up a refined cleaning process, in light of your experience generating the OpenSubtitles wordlists? Are there any intentions for a further iteration or provision of uncleaned lists for manual processing?
- I agree that Wiktionary:Frequency lists should be curated - but it is already quite long even after my clean-up, so the question becomes: how do we make that work once the list of languages grows even further? Wortschatz Leipzig provides lists for over 250 languages (although getting them clean is a different issue). I've got a draft for an alternative organisation here (with a worst-case scenario also here). It's just one idea for a direction we could take this - with a subpage indexing lists per language.
- Any feedback is appreciated.
- Helrasincke (talk) 00:13, 10 February 2023 (UTC)
- (Replying only because I was pinged.) I haven't checked recently, but I know what most of our appendices and non-mainspace are like. (Usually: so ill-judged or outdated as to be almost deletable without a vote.) Whether it's worth having these freq lists at all I would question (who uses them? can we find anyone who uses them?) but certainly if we can "bot it" then let's do it. Equinox ◑ 06:28, 14 February 2023 (UTC)
- @Equinox Well I invite you to have a look at the changes I've made and judge for yourself if it's an improvement - my next step will be to go through list by list and tidy the formatting to make sure wikilinking is set up properly and improve the overall readability & usability so that you don't feel compelled to speedy delete them :P IMO they are definitely worth having (if done right), as I think they are a great tool to prioritise efforts, especially for neglected languages and lesser-resourced language editions. In a project the size of English wiktionary this is probably not so critical, as a lot of stuff just gets done through sheer diversity of editor interests & skills. For example, now that I've tidied them up, I'll be importing them to the Danish and German wiktionaries for help boosting the coverage of high-priority foreign language entries for my contribution languages. It's a long term goal, but I think one of the reasons Danish wiktionary is so quiet is because it's so empty — even in comparison to the Norwegian edition which has similar-sized maximum theoretical user base (~5M speakers). So hopefully if we can improve the utility we can attract more users, and finally more contributors. Helrasincke (talk) 07:31, 5 March 2023 (UTC)
Global ban for PlanespotterA320/RespectCE
Per the Global bans policy, I'm informing the project of this request for comment: m:Requests for comment/Global ban for PlanespotterA320 (2) about banning a member from your community. Thank you.--Lemonaka (talk) 21:40, 6 February 2023 (UTC)
- @Lemonaka This user has only contributed 3 edits to this wiki, which were in 2017 and entirely non-problematic, so I don't have any particular views on this and I doubt anyone else does either. Benwing2 (talk) 21:03, 7 February 2023 (UTC)
- Gonna miss you Planey. Equinox ◑ 06:29, 14 February 2023 (UTC)
- It may be topical for people who edit in (Crimean) Tatar or Uyghur, in view of the user's politically motivated editing. ←₰-→Lingo Bingo Dingo (talk) 17:41, 16 February 2023 (UTC)
Minor change in the treatment of Cantonese
Currently the language treatment page states that Wiktionary's Cantonese is based on the Guangzhou dialect. However, the prestige dialect has somewhat shifted to Hong Kong in the past few decades due to the influence of Hong Kong's media industry and the decline of Cantonese in Guangzhou. The romanisation system used on Wiktionary, jyutping, is created by the Linguistic Society of Hong Kong; it is based on the phonology of HK, which has merged the high-level and high-falling tones, while GZ still distinguishes them AFAIK. Also, the majority of the contributors and contributions on Cantonese come from HK, and almost all our knowledge on GZ Cantonese itself is solely based on dictionaries.
I therefore suggest that we instead treat Cantonese based on the shared parts between GZ Cantonese and conservative (or laan5-jam1-less, prescribed) HK Cantonese, or simply just conservative HK Cantonese. These two dialects are virtually identical in terms of pronunciation (except for the high-level/high-falling distinction mentioned above, but Wiktionary already disregards it by using jyutping), while the vocabulary (the ones labelled as Cantonese) are shared but with several subtle differences in terms of preference on which terms are used. Independent innovations in vocabulary in GZ or HK would not be labelled as Cantonese itself, but instead the relevant dialect.
I believe that this change better reflects the reality of the language and eases the complications of the need to crosscheck multiple sources. This also eliminates the need for dealing with the high-level/high-falling tones, as we would (or could) assume they are merged in Standard Cantonese. This should have minimal effect on the content of the entries, as this is already how some of us subconsciously operate in, for example GZ-specific words often are already labelled as Guangzhou Cantonese.
Pinging @Justinrleung, RcAlex36, Fish bowl, Mahogany115, and perhaps other Cantonese editors that I've missed. – Wpi31 (talk) 17:25, 7 February 2023 (UTC)
- @Wpi31: I think I agree with what you propose, but I'm not exactly sure what needs to be changed other than what the language treatment page says to reflect this particularity. — justin(r)leung { (t...) | c=› } 20:26, 7 February 2023 (UTC)
Participles
My recollection from when I started serious editing on Wiktionary was that 'participle' was not an approved part of speech. When it became accepted, was there any guidance on how editing communities should decide how to handle participles? The issue has arisen with regard to Pali, for I had created a verb form headword template {{pi-verb form}}, only to discover that @Svartava had already created undocumented {{pi-vf}} with the same nominal function. On investigating its usage, I find that it was used very differently - it has been used 6 times, each time for a past participle (to be precise, for the unmarked past participle, which can be active or passive in meaning). My practice has been to treat the past participle as a derived lemma worthy of mention in the conjugation table, for past participles often have meanings not automatically derived from the verb. Svartava's approach has led to two homonymous terms, a non-lemma for the participle and a lemma for the more derived meanings. By some recommendations, this would lead to two identical 1648-cell declension tables. Unless dissuaded, I will replace {{pi-vf}} by {{pi-adj}}, change the terms' headings to 'Adjective', merge with the cognate homonymous adjective where appropriate, and have {{pi-vf}} deleted or reduced to a hard redirect. --RichardW57m (talk) 13:14, 8 February 2023 (UTC)
Of some relevance, yet others, notably @ВМНС, but also @aryamanA, have been creating Pali terms categorised as participles. --RichardW57m (talk) 13:14, 8 February 2023 (UTC)
- Do you have the particular conversation informing that participles were not approved? Vininn126 (talk) 13:18, 8 February 2023 (UTC)
- They were clearly non-standard in [this] old version of an appendix to WT:EL. --RichardW57m (talk) 13:34, 8 February 2023 (UTC)
- @RichardW57m Participles are listed as a nonlemma part of speech in Module:headword/data. It is widely used in Latin, for instance. I also don’t see any reason why these wouldn’t be worth recording as a nonlemma - particularly when you say you’ve only been adding adjectival entries where the meaning has evolved (suggesting this won’t happen for every participle). Given the tables are collapsible, I see no problem in having it twice anyway. Space is cheap. Theknightwho (talk) 15:25, 9 February 2023 (UTC)
- What I've been doing with Pali participles is to record them as derived lemmas when I come across attestation of their form or a perceived need to record them, and add in meanings that don't automatically derive from their being participles, while observing copyright. Each participle is a single term, and I record it as an adjective. What @Svartava was doing was recording participles under the PoS word 'verb' and their derived meanings as separate terms under 'adjective'. Others have been adding participles under the PoS word 'participle'; I've not seen them handling additional meanings. I'm not sure what would be a good forum for thrashing this out.
- There seem to have been quite a few arguments over where some English words are adjectives as well as present participles, and we have the benefit of native speakers. I don't believe we have native speakers of Pali, and certainly not native speakers of the Pali of the Canon. It's so much simpler if we can collapse them to multiple senses of a single term!
- Space is not cheap - it imposes a burden on users. Note the complaints this month about the size of the record of the sources for quotations. Collapsibility partially alleviates the burden, but selective expansion is also a pain. It is fortunate that the declension of participles, compared with one another, is overwhelmingly regular, otherwise having two tables would mean having to document (and maintain) a set of irregularities twice for each script, as well as ultimately for each writing system. I think there may be a whole bunch of misbehaviour still to document for the oblique cases of the feminine plural of the present participle, bedevilled by a small data set. RichardW57m (talk) 17:35, 9 February 2023 (UTC)
- So are you saying that participles are just a form of adjective? That seems controversial. Theknightwho (talk) 17:51, 9 February 2023 (UTC)
- In Slavic languages they are adjectival or adverbial. Vininn126 (talk) 17:57, 9 February 2023 (UTC)
- It seems to work for Pali. They inherit the ability to have non-subject factors of the action, but other adjectives also have this ability. The absolutive (=gerund, = independent participle =converb) is most simply treated as a verb form, though it has no marking for person or number. There is merit in categorising participles as participles, though one will have to remember not to use the categorising participle templates when defining case forms, such as "# {{inflection of|pi|pacant||loc|s}}, ''which is'' {{inflection of|pi|pacati||pres|part|t=to cook}}" for pacante, which is a case form of a participle rather than a participle. --RichardW57 (talk) 20:52, 9 February 2023 (UTC)
- IMO participles should be placed under a Participle header and use {{head|LANG|participle}}. I've corrected all the places I could find in Spanish, Portuguese and Italian where they were placed under a Verb header. In general, some but not all participles have also evolved into adjectives and in the case this has happened, you should use a second L3 heading ===Adjective=== after the participle heading (or an L4 heading if there happen to be two distinct etymologies). If the participle and adjective have exactly the same declension, I suppose it's possible to put that declension under a separate L3 heading ===Declension=== rather than put two L4 headings, but my normal practice is to use two L4 headings. The duplication doesn't seem a big deal to me in most cases. Benwing2 (talk) 23:52, 9 February 2023 (UTC)
- Inform me. What is the test for distinguishing a Pali adjective from a participle? Is there anyone here capable of applying the test? --RichardW57 (talk) 08:13, 10 February 2023 (UTC)
- The true horror case for duplication is santa (“true”) - just look at the number of overrides in the page's source code for the declension! I have found an isolated text book which claims a monosyllabic nominative singular saṃ exists - I think this ought to be verified before inclusion. It would definitely tilt the stem from santa to sant. So far I have only needed to transliterate it for Devanagari (potentially automatable) and alphabetic Lao (four varieties jammed into one table). Its inclusion will now require maintenance on 6 (and rising) tables instead of 3 (and rising) as at present. Each script's entry will of course get even more confusing visibly with three duplicates for utterly homonymous past participle and adjective santa (“exhausted”). --RichardW57 (talk) 08:13, 10 February 2023 (UTC)
- Remind me. Would the nice translation of Pali santa (“tranquil”) go to RfD or RfV after I split the corresponding term into (past) participle and adjective? The question will be whether the adjective as opposed to past participle exists. --RichardW57 (talk) 08:13, 10 February 2023 (UTC)
- Non-English words are supposed to have translations, rather than definitions. (It does seem that a lot of people, quite reasonably, ignore that rule.) Is the adjective v. participle decision to be based on the language in question or on the translations? This question is also relevant to the verb v. adjective distinction in some languages; I've seen edits where someone has objected to translating a 'verb' by an English adjective. --RichardW57 (talk) 08:18, 10 February 2023 (UTC)
OK you are getting snarky here. No need for that. As for Pali, I don't know anything about it, but I'm sure you are aware of the tests for distinguishing adjectives from participles in general; why can't they apply to Pali? And I have no idea why Pali santa (“tranquil”) would be sent to RfD or RfV; has this happened before? Benwing2 (talk) 00:37, 11 February 2023 (UTC)
- @Benwing2 As it happens, I'm not aware of a language-independent test to distinguish participles from other adjectives. I couldn't tell you why English lovable is not a participle, and I can only guess why the Latin gerundive is a participle but the semantically corresponding Ancient Greek verbal adjectives such as Ancient Greek λῠτέος (lutéos) are adjectives. --RichardW57 (talk) 05:00, 11 February 2023 (UTC)
- The Pali lemma santa (“tranquil”) currently contains the senses 'tranquil' and past participle of sammati (“to be calmed”). A mechanical partitioning of the senses of Pali participles will result in the 'tranquil' sense being tagged as an adjective lemma, distinct from the lemma for the participle. That separation may be challenged. --RichardW57 (talk) 05:00, 11 February 2023 (UTC)
- I've converted participles to L3=verb, headword={{pi-verb form}} to L3=verb headword={{pi-vf}}, making the rough breakdown of Roman script participles:
- 12 of L3=verb headword={{pi-vf}}
- 3 of L3=...participle
- 30 present active participles with L3=adjective
- 5 with form_of=participle (excludes {{inflection_of}}, which has no visible effect for Pali)
- That makes L3=adjective the majority approach. I'm the only member of the Pali editor community to have spoken here. @Benwing2's suggestion to distinguish participles and adjective has no practical advice on when to separate adjective and participle, and will therefore also be ignored for that reason. I will standardise Pali participles to have L3='Adjective', and will move to using cat:Pali form-of templates (sorry, @Theknightwho) to categorise them as participles, except that the gerundive (aka future passive participle) remains TBD. Notifying @Octahedron80, Apisite. --RichardW57m (talk) 13:57, 13 February 2023 (UTC)
- @RichardW57m "suggestion to distinguish participles and adjective has no practical advice on when to separate adjective and participle, and will therefore also be ignored for that reason" is obtuse and you know it. There are lots of ways to distinguish participles from adjectives: (1) adjectives have unpredictable meanings that are not transparently derivable from the verb; (2) adjectives lack the verbal meaning inherent in participles; (3) adjectives can (often) form the comparative and superlative, while participles cannot; (4) adjectives can (often) be modified by adverbs such as 'very', 'somewhat', etc. while participles cannot; etc. Use your judgment, obviously. Benwing2 (talk) 18:43, 13 February 2023 (UTC)
- @RichardW57 I'm going to have to agree with Benwing here, and I'm not really sure where all this confusion is stemming from. Participles usually have a slightly different function than adjectives, even if they sometimes behave like them. E.g. in English they are used periphrastically to create the continuous constructions. They also rarely take degrees of comparison, unless they are fully adjectivized, e.g. "I saw the more reading boy." (?) There are clear differences between the two, even if they are similar. Vininn126 (talk) 18:58, 13 February 2023 (UTC)
- "E.g. in English" isn't much use for other languages. "'Rarely' take degrees of comparison" means that the test cannot be relied upon. Now, I could only find one comparative and one superlative built on a Pali present participle, santatara and sattama, and that is consistent with the sense "good" of santa being for a non-participial adjective. For the past participle, I could find two comparatives in the PTS, built on kanta and paṇīta, and also one on duggata (“wretched”), which is rather a compound of a past-participle. That is not a very powerful test.
- As to the periphrasis test, what do we make of the example sentence for svākkhāta (“well-preached”), where the adjective appears to be being used to form a past 'passive' sentence, rather as in Latin? Have I mistranslated it, or is it evidence for an unattested verb svākkhāti (“to expound well”)? (The simplex, akkhāti (“to preach”) does exist.) RichardW57m (talk) 14:51, 14 February 2023 (UTC)
Page headings
I notice Wikipedia has introduced page headings that stay at the top when the page is scrolled down, for example Britannia Bridge. I think it's a great idea, and wonder if it can be introduced in Wiktionary. I think it would be helpful for longer pages. DonnanZ (talk) 12:10, 9 February 2023 (UTC)
- @Donnanz But it will be a nuisance for a single screen displaying multiple windows. It may be impossible to slide the window so that the headings disappear off the top of the screen. --RichardW57m (talk) 14:05, 9 February 2023 (UTC)
- @RichardW57m: I don't use that system, but I still have multiple windows open, two for Wiktionary, displaying one at a time. I'm interested in other users' views too. DonnanZ (talk) 14:14, 9 February 2023 (UTC)
- You can check the new skin with the URL https://en.wiktionary.org/wiki/foo?useskin=vector-2022 or you can change your preferences/appearance to the Vector 2022 skin. Anyway, the new skin will soon be the default one. It's worth checking it to find any problems in advance. Vriullop (talk) 07:03, 10 February 2023 (UTC)
- @Vriullop: Oh right! It works on my widescreen monitor quite well when I scroll down. The main criticism I have is with the sidebar, which I feel is now too wide, and this applies to Wikipedia too (I notice it doesn't appear on my user page there). You now have to scroll down to find the table of contents (languages etc.) underneath everything else. I welcome its move from the very top, but I feel it should be at the top of the side bar. DonnanZ (talk) 09:35, 10 February 2023 (UTC)
- @Donnanz You can hide the sidebar with the << icon, or you can hide the TOC, after which it is available in the page heading even when scrolling down. There are some improvements on Wikipedia, not yet available here, splitting the sidebar with page tools in a new bar on the right. This affects some gadgets that will need to be updated. Vriullop (talk) 11:01, 10 February 2023 (UTC)
- Is this when it is the topmost vertically stacked window, for which one can afford only a small vertical scan? Not everyone with a separate monitor has a widescreen. I suppose one may have to start picking and choosing skins. --RichardW57 (talk) 13:18, 11 February 2023 (UTC)
- @RichardW57: I had to buy a new monitor last year after my old monitor with a narrower screen conked out. On top of that, the hard drive wore out, and I had a new solid-state hard drive installed by my local computer shop. An expensive year, but it was worth it. DonnanZ (talk) 15:32, 11 February 2023 (UTC)
Lemmas that are not words
Where do we mention that a lemma is not a word when it is listed under a part of speech that is normally associated with a word? For example, I believe it is unnecessary to mention it for lemmas categorised as prefixes. I can think of three examples from Pali alone:
- Adjectives in -nt. Perhaps it is obvious from none of the inflected forms ending thus. In general, we also have all or almost all nouns ending in a consonant other than niggahita.
- orimo, an alternative citation form of orima, an adjective that only occurs in the neuter gender.
- varati Etymology 1, which lacks a present tense and I strongly suspect also a present active participle.
Thoughts? --RichardW57m (talk) 13:35, 9 February 2023 (UTC)
Disallowing mass closures
Today, @Ioaxxere closed a very large number of nominations at WT:RFVE. This is an example of one of several large edits they made, closing many threads at once. The great majority were fails. I also don't see any evidence that they had actually attempted to look for citations themselves (though I may be wrong), but they did suggest that they thought the deadline for citations was a month (which is not the case; as far as I know, that's just the minimum).
While it's obviously a problem that we tend to end up with a large backlog at RFV and RFD, I don't think that means we should just start closing things en masse, as it seems very unlikely that each term would have been given proper consideration. Plus, if the closer isn't even attempting to cite the term themselves, then this just amounts to a fail (without warning) after an arbitrary period of time.
I propose that we don't allow these kinds of mass closures, as I think there are less unilateral ways to clear the backlog, which don't rely on a single person's understanding of what the consensus or relevant policy is (e.g. posting about it on the Beer Parlour). Before his ban, Dan Polansky also had a habit of doing mass closures at RFD, too, and the issue was the same: it gave one person far too much influence. Theknightwho (talk) 23:30, 9 February 2023 (UTC)
- I'd also like to point out the issue of starting "CFI mandated" votes for their specific entries or entries that are found on Twitter or Reddit, even though we had a vote that showed the consensus about those sites and how we did not want to allow en masse words with no checks. It seems even more so now that this user is almost circumventing the WT:DEROGATORY policy that was voted in. They even started a "CFI-mandated vote" for entries that didn't even have cites, such as y'all'd'nt've, yet closed other entries as RFV-failed that didn't have any other cites either. I'm acutely aware of the problem of RFV backlogs and was in strong support of splitting WT:RFVNE out, but this is not the solution at all. AG202 (talk) 23:34, 9 February 2023 (UTC)
- Also:
- Deleting citations without saving them on the citations subpage.
- Failing entries with 2 durably archived cites, seemingly without looking for more (which, while not against policy, is low effort and likely to mean we delete entries which are citable).
- Theknightwho (talk) 23:39, 9 February 2023 (UTC)
- If the nominations were closed improperly, I'd be all in favor of undoing them. User:Ioaxxere should really undo them themselves, or if they don't want to, give a very good reason for this. Benwing2 (talk) 23:46, 9 February 2023 (UTC)
- I pointed this out yesterday, and largely agree with what Theknightwho and AG202 have said. The idea of helping reduce our badly backlogged request pages is wonderful, but I think care and research is needed, if only to confirm that the terms really are uncitable/hard to find. Mass-failing entries is not the best solution. I’d rather have a large backlog than indiscriminate closures, honestly, although I see why others might evaluate the tradeoff differently. 70.172.194.25 23:48, 9 February 2023 (UTC)
This proposal is problematic for a couple of reasons:
- You never gave a definition for "mass closures". 10 a day? Then I'll just be able to do 9 a day.
- What is "proper consideration"? Scouring the entire Internet? If a brief Google Books search is acceptable, then it's hard to believe that doing that would find quotations that the nominator missed.
- RFV fails are not "without warning": there's a big RFV flag over the entry that I guess people have been trained to ignore at this point.
- I'm not trying to gain "influence" over RFVE, and I would really like it if people closed RFVs more often.
As for the points raised about moving citations: I'm open to doing that, but that was never a requirement.
In my view, the solution to the backlog in RFVE is to enforce a one-month deadline and therefore create a sense of urgency to cite entries. Ioaxxere (talk) 23:48, 9 February 2023 (UTC)
- If you just look for ways to subvert the policy by maximising the number of allowed closures, then you'll probably just get told not to close anything. Closing a thread needs to be done when either (1) there is clear consensus, (2) the term has been cited, or (3) it seems unlikely that (further) citations are forthcoming anytime soon. That entails putting in a reasonable amount of effort for each one.
- "Proper consideration" means that you've looked in all the usual places, and still can't find anything.
- They were without warning. It doesn't matter that there's a big warning on the entry: I have already explained to you that the issue is that you failed them arbitrarily, without doing anything to give a sense of urgency (which might encourage people to find cites).
- I never said you were trying to gain influence. I'm saying you were exercising too much influence. It doesn't need to be intentional to be a problem; I'd have an issue if anyone did it, however well-meaning.
- You seem to be coming at this from the perspective of what is technically allowed, but the overriding concern should be what is best for the project. These are not necessarily the same thing. Theknightwho (talk) 23:56, 9 February 2023 (UTC)
- I feel that a well-functioning and strict RFVE process is what's best for the project, but it seems like a lot of people want more effort to be put into each closure. To address your points, would you prefer a message like "I haven't found any quotations on [list of places I've looked]. If three quotations aren't added by [date in a few days], I'll mark this as RFV Failed."? Ioaxxere (talk) 00:05, 10 February 2023 (UTC)
- That's fine. Could you please undo your closures from today and yesterday? I think we need to start over with them. Theknightwho (talk) 00:07, 10 February 2023 (UTC)
- I think I'll take a break from closing RFVs until we reach consensus on a new policy... (edit: I assume there's no controversy on closing passes) Ioaxxere (talk) 00:27, 10 February 2023 (UTC)
- I've rolled out this new approach on some newer RFVs: [3] Ioaxxere (talk) 02:18, 10 February 2023 (UTC)
- I like the "new approach" much better. Thanks! 70.172.194.25 22:54, 16 February 2023 (UTC)
- 1. I find it ironic that a new order of obstructionists has emerged alongside the old order of cranky deletionists. The basic argument seems to be "we can't cite Twitter or Reddit because people say terrible things there." Yes, the toxicity of social media was a problem before, and that shadow has been Elon-gating in the current climate. But Wiktionary's mission is to document "all words in all languages," which by definition includes hateful, offensive, and stupid terms. There are only so many tools for documenting the bleeding edge and hidden underbelly of a language, and in 2023 no one is posting on Usenet or mimeographing zines in their kitchen. I draw the line at platforming hate sites. That's a hill I've died on before, and would die on again. But Twitter, Reddit, and the like do not deserve to be treated the same as The Daily Stormer. That's another hill on which I'm willing to die as far as wiki-participation goes.
- 2. CFI has never explicitly disallowed online sources. It was commonly interpreted that way due to unclear wording: "Where possible, it is better to cite sources that are likely to remain easily accessible over time, so that someone referring to Wiktionary years from now is likely to be able to find the original source. As Wiktionary is an online dictionary, this naturally favors media such as Usenet groups, which are durably archived by Google." Over time, "no online sources except Usenet" became de facto policy, as it aligned with many users' personal sensibilities. Similarly, the updated CFI text does not mandate a two-week-long discussion and vote for every RfV nomination involving online sources. It says "Other online-only sources may also contribute towards attestation requirements if editors come to a consensus through a discussion lasting at least two weeks," without specifying what form this discussion should take, what the subject of discussion should be, or how consensus is to be reached. I understood it as codifying the consideration of websites on a case-by-case basis, creating a mechanism to approve useful sites like Reddit, Twitter, and news sites while shutting out fringe and extreme sites. Others have seemingly decided to interpret the vague wording in ways more favourable to their interests. And so we're right back to square one, where critical parts of CFI are unwritten, not actually codified and explicit policy.
- 3. The RfV closing procedure has never required that closers attempt to attest nominations themselves. This would be an unreasonable expectation even if there wasn't a substantial backlog. WordyAndNerdy (talk) 01:09, 10 February 2023 (UTC)
- I don't think any policy prohibiting "mass closures" would be wise. Almost every major contributor to RFV, myself included, has at times closed a high volume of requests (because, as noted, the pages have quite a backlog). It's hard to see how an entry failing RFV after the allotted time is "without warning"; the terms are listed on a central page everyone can watchlist and have big banners in their entries warning everyone that cites are needed. If we want to make posting "warning: this term is two weeks away from failing" a requirement, we could have a bot do that, but it's hard to see how that'd represent an improvement over people just knowing how time works: terms are listed chronologically on WT:RFV and time progresses in a linear fashion (outside the TARDIS), so terms at the top of the list will be deleted after the allotted time. - -sche (discuss) 01:32, 10 February 2023 (UTC)
- While I agree on the need to handle the backlog, and that closers should not be forced to attempt to attest nominations, I think mass closures are generally not a good idea—I agree with TKW that a single person shouldn't be seen to overly dominate the process, and I'd add that borderline cases really demand more attentive treatment, and ideally I do think closers should make at least a cursory check to see whether the issue with a term is actually poor attestability as opposed to just lack of interest. -sche's point about expecting terms at the top of the list to go is well taken, but then the natural expectation is really that with a large backlog terms further down aren't going to be on the chopping block just yet when there's uncertainty surrounding them. (As one example, I was meaning to get around to xheart/xliver myself at some point when I have more time if nobody else did it, though I accept it's long past one month, and the closing remark mentions that the discussion at RFV had had no obvious result yet...) —Al-Muqanna المقنع (talk) 02:37, 10 February 2023 (UTC)
- I think the problem lies not in the massness of the closure, but rather in its prematureness. If there had to be some rule, I would prefer disallowing closing an RFV that only one person has participated in, though I don't think such a rule is really that much of a necessity. – Wpi31 (talk) 05:08, 10 February 2023 (UTC)
- I would be happy to give a once-over to an RfV that had been open without comment for 30 days to help satisfy any rule requiring more than one contributor be involved before removal can proceed. DCDuring (talk) 16:02, 10 February 2023 (UTC)
- I also agree that the issue is the lack of participation rather than the 'massness' of the closure. For example, on Talk:Cel-Liberation Day there is no indication that anyone other than myself ever even tried to find citations for the term. I have to admit that the policy was not violated, but I'd rather have seen at least a tiny bit of confirmation that it seemed hard to cite, as this is a term that is mentioned in various places online. 70.172.194.25 22:54, 16 February 2023 (UTC)
- I don't find mass closures to be more of a problem than having an overwhelming number of entries that attract RfVs. As long as citations (whether durably archived or not) are saved to the appropriate citations page, and any discussion (or lack thereof) is saved to the talk page, little effort is wasted. After all, any admin can restore the old entry and any contributor with sufficient new evidence can make a new entry. My own practice is to work only on RfVed entries that strike me as worth taking the time away from other contributions to Wiktionary. Over time, fewer and fewer of the RfVed items have seemed worth it to me. I suspect that others operate in the same way. DCDuring (talk) 15:59, 10 February 2023 (UTC)
- I wouldn't disallow mass closures and, in fact, I think it's good that Ioaxxere has reduced the backlog. The way I see it, if something gets tagged as closed and the header gets struck out at RFV/RFD and people are given a week to object before it gets archived and the entry is deleted then they have ample time to object and the closure can be reversed. The only issue is if things get archived before the week is up. --Overlordnat1 (talk) 16:06, 10 February 2023 (UTC)
- We shouldn't be reducing the backlog by rushing things. Theknightwho (talk) 16:13, 10 February 2023 (UTC)
Passing terms with no or insufficient quotations
Examples: Mozella (there’s only one quote in this spelling on Mozela), praecognita (Al-Muqanna noted all of the easily findable hits on Google Books seem to be Latin code-switching; the OED reference is paywalled so I can’t see if the same applies there). I don’t support this practice, at least when finding uses is non-trivial, which applies to most terms sent to RfV. 70.172.194.25 19:09, 20 February 2023 (UTC)
- Added permalinks for the entries above and below, to show how they looked at the time of passing. 70.172.194.25 21:28, 24 February 2023 (UTC)
- @Ioaxxere, I thought that you were going to take a break from closing RFVs until a consensus was reached. There are terms that are being prematurely closed and archived, and it's much harder to reverse it once they're archived. There are way too many entries to check before they're archived as well. AG202 (talk) 19:28, 20 February 2023 (UTC)
- Mozella seems to be easily attested on Google Books. Still would prefer having three citations (I hadn’t checked prior to writing the above.) antijapanese is a better example of the phenomenon. 70.172.194.25 20:09, 20 February 2023 (UTC)
- I was under the impression that alternative forms are counted together with the main entry, so two quotations for color and two quotations for colour are sufficient. This is how every other dictionary (including far less inclusionist ones) counts alternative forms.
- As for praecognita: it seems pretty silly to make a distinction between a quotation and a link to a quotation. If the issue is the paywall, would you prefer a link to the free OED2 [4]? (although there are fewer quotations than in OED3 [5]) Ioaxxere (talk) 20:11, 20 February 2023 (UTC)
- When alternative forms are sent specifically to RFV, they must have 3 cites on their own. This is standard practice. Same with having cites specifically on their entry per WT:ATTEST. AG202 (talk) 20:22, 20 February 2023 (UTC)
- This was definitely my understanding as well. 70.172.194.25 20:27, 20 February 2023 (UTC)
- Is it the same thing for inflected terms then? Ioaxxere (talk) 21:11, 20 February 2023 (UTC)
- Use common sense. Lots of words aren't attested in all of their possible forms, so we don't want to fail the first person singular of a completely regular verb just because it's only attested in the third person. On the other hand, if someone creates an entry for an archaic second-person singular form of a rare modern technical word used only in medical journals, we want to be able to challenge that.
- If it's the word as a whole, and not just a given inflected form, we don't want to fail it because it doesn't have the complete paradigm, or even the principal parts, attested. If the lemma form is unattested and can't be conclusively derived from the attested forms, it can get tricky, but that will probably never happen with modern English. Chuck Entz (talk) 21:56, 20 February 2023 (UTC)
- @Chuck Entz The issue here is that our common senses don't agree... my "common sense" is that if alternative form quotations get to count towards the lemma, then the lemma's quotations should count towards the alternative form (assuming that there is at least one quotation for each specific form), and that's the point of disagreement. Ioaxxere (talk) 22:34, 20 February 2023 (UTC)
- It’s commonly accepted that inflected form quotations count for the lemma. The question is whether quotations for one spelling (whether lemma or inflected) count toward another spelling.
- I think what you’re proposing would basically allow the creation of any alt spelling with at least one citation, and you even say as much, which definitely hasn’t been our standard practice as I’ve observed it. For precedents, see Talk:beat'emest, Talk:canican, Talk:gamahauch, and many others. 70.172.194.25 00:53, 21 February 2023 (UTC)
- I agree that including every alternative spelling ever would be excessive (although the OED often actually does that). On the other hand, having a strict "three cites for the exact spelling" leads to some (in my opinion) absurd results. Do you think covid cut with five quotations should be deleted just because people are capitalizing it inconsistently?
- By the way: I was trying to find a policy or a vote for this "inflected form quotations count for the lemma" rule. WT:CFI has nothing. Ioaxxere (talk) 04:40, 21 February 2023 (UTC)
- While I don’t think that covid cut should be deleted (and I feel that that question to 70 was too much of a leading question considering it’d have to go to RFV first anyways), I do doubt that it should be the main lemma looking at the entry considering that it only has one cite for that spelling. AG202 (talk) 04:44, 21 February 2023 (UTC)
- Er, I guess you might be disappointed by the fact that I already passed this entry in RFV a few days ago. But since you already think it should stay, I would like to ask why this doesn't conflict with your rule of "they must have 3 cites on their own" given that no particular capitalization reaches 3 cites.
- And @AG202 I do try to follow policy as much as I can, but it feels like there is so much left unsaid, that unless I create a vote every week I have to rely on (subjective) "common sense" which has clearly not made everyone happy. If you genuinely think that everything is clear-cut then please let me ask for advice on a few difficult RFVs. Ioaxxere (talk) 04:51, 21 February 2023 (UTC)
- @Ioaxxere I don't think everything is clear-cut, but this is where practice, expertise, and overall more time with the project comes in. I get wanting to move things faster and stuff like that, I was the same, but after getting corrected by folks like @BD2412, I realized that I should take a step back and watch more than act, until I had a good grasp as to what the policies and practices are, especially with very wide-reaching issues like RFVE. I got more used to pinging specific people who I knew were more experienced with the issue at hand who've been working on the project for much longer, rather than just proceeding forward based on my own sole interpretation. That's what I'd really recommend, honestly. Part of the problem, though, is that there just aren't that many admin anymore and the ones that are left are fairly new, which can lead to the disconnect that we've seen very recently (and honestly part of why culture here is important to me, as clearly something is going on to where people are leaving and not joining as much, leading to areas that have been left untouched like RFVE), and it's something that needs to be discussed more.
- As for covid cut, this is one of those things where honestly I didn't even know that it was at RFV nor that it had passed. If I had seen the discussion, I would've pushed at the very least for the lemma to be moved and for more cites to be added, but I'm only one person and I simply can't proofread/check every RFV, which is part of why mass closures can be very ehh. AG202 (talk) 05:27, 21 February 2023 (UTC)
@Ioaxxere, AG202, Chuck Entz: Another example where I would object is Falklands fritillary. I think it is citable, but the citations provided on the page aren't good enough for reasons described at Wiktionary:Requests_for_verification/English#Falklands_Fritillary_Butterfly. I understand that this was a case of "Cited" and not "RFV-passed", but I still find it to be problematic. With rare exceptions, citations should include the term in question as a grammatically separable unit, not just as a sequence of words that doesn't parse as one unit. Especially when the issue raised by the user who sent a term to RfV was specifically whether the term exists as a separable unit! 70.172.194.25 03:12, 23 February 2023 (UTC)
- Changed link to a permalink, showing the state of the entry when it was called "Cited", because they have since added better quotations. 70.172.194.25 04:33, 23 February 2023 (UTC)
"CFI votes" don't turn rfv into rfd
While I haven't read the vote itself or the discussion there, my impression is that the idea of the vote on exceptions to the "durably archived" principle was strictly about allowing or disallowing sources, not entries.
As I understand it from the discussions here and on the other fora before, during and after the vote, the idea was not to overturn the status quo, but to allow for exceptions where the consensus was that it made sense to do so. That would mean that websites in general are still disallowed, but that either specific web sites would be allowed always if the community approved, or that certain web sites could be used for certain rfvs if a consensus to do so was reached in a discussion lasting at least 2 weeks.
In other words, we might decide that site X is always worthy of being treated as if it were durably archived. Or we might say that it doesn't make sense for a term found all over the place online to fail just because it's never made its way into print or usenet- so we should allow site Y or site Z to count for this particular term.
There are definitely sites that are utterly useless for attestation purposes for any number of reasons, and should never be even considered. But there are also sites where we don't want to give them carte blanche because someone could use them to game our processes, but where it's obvious that nothing of the sort is going on, we can allow them with a clear conscience. Chuck Entz (talk) 07:14, 10 February 2023 (UTC)
At any rate, I think the "CFI-mandated discussion" for a term should consist of first deciding whether this is a real term that is only failing because of distortions caused by our choice of sources, and then deciding what sources can safely be allowed in this case in order to correct for those distortions. Chuck Entz (talk) 07:14, 10 February 2023 (UTC)
- If we agree that a term has 'clearly widespread use', then we don't need to agree that any quotation is valid for CFI. As far as I am aware, quotations that are not resilient enough for CFI are allowed, though adding a dated HTML comment that it was disallowed for CFI would be useful. --RichardW57m (talk) 09:54, 10 February 2023 (UTC)
- @Chuck Entz: if I understand correctly, your proposal is a series of votes: one, where people decide between Real or Fake, and subsequent votes for Accept/Reject X Quotations, Accept/Reject Y Quotations, etc.? If so, I don't see the utility of such a process over a simple Keep/Delete. Ioaxxere (talk) 23:36, 10 February 2023 (UTC)
Voting "delete" or "keep" doesn't make sense. RFV should be strictly about whether the term actually exists, and whether the tools we're using to determine whether it exists are right for the job. If they aren't, what should we be using? Chuck Entz (talk) 07:14, 10 February 2023 (UTC)
- It especially doesn't make sense to vote "delete" based on whether there are enough cites or not. It totally distorts the discussion, because obviously more cites may come along, and it can always be failed anyway if not enough turn up. Theknightwho (talk) 07:17, 10 February 2023 (UTC)
- @Theknightwho you have repeatedly demanded "CFI-mandated discussions" on terms which don't have three quotations. In your opinion, what should such a discussion be about? Ioaxxere (talk) 12:56, 10 February 2023 (UTC)
- It's helpful to have single-word votes for a quick tally, though, and it's not clear what they ought to be if not delete/keep. Allow/forbid (the citations)? Support/oppose? —Al-Muqanna المقنع (talk) 08:59, 10 February 2023 (UTC)
- It would make sense to have two word votes for citations, which presumably should get their own paragraphs for clarity. 'Forbid' looks wrong for quotations - or are you proposing that quotations that don't count for CFI shall be turned into usage examples or deleted? Accept/reject (scilicet 'as evidence') would be better, though I think 'accept citation' would be much clearer. Not everyone who participates in an RfV discussion will be familiar with the procedure. --RichardW57m (talk) 10:08, 10 February 2023 (UTC)
- @Chuck Entz I think you're overly paranoid about people who "game our processes". We're not that big of a deal, IMHO Emmett Lathrop Doc Brown (talk) 22:07, 10 February 2023 (UTC)
This follows on from my post on the Grease Pit about specifying alternative forms in a single link (e.g. color/colour), which can be used in any link template by using the delimiter //. The motivation behind this was to make it possible to deprecate {{zh-l}}, which is the specialised link template for Chinese. The most obvious difference with {{l}} is that it automatically generates simplified forms (e.g. 彈/弹 (tán)). It also generates pinyin, which it manages by scraping pre-existing entries; something other link templates don't currently do for Chinese.
To replicate this, I've also created a way to generate forms automatically on a language-specific basis, which can be done by specifying a module in the language data using the generate_forms key. In the case of Chinese, simplified forms would be generated by Module:zh-generateforms. In addition, I've also created Module:cmn-translit, which automatically generates pinyin in a similar fashion to {{zh-l}}. Neither of these has been turned on yet, but they do mean that it's now feasible to start replacing {{zh-l}} with {{l}}, {{m}} and similar. In particular, it also means we can start getting rid of bodges in etymology sections involving Chinese, which frequently look like {{bor|en|cmn|-}} {{zh-l}}. What's worse, that bodge is the only way to give traditional/simplified side-by-side when specifying a specific language such as Cantonese.
Before turning either of these on, though, I just wanted to bring this up at the Beer Parlour to gauge any concerns. I've not noticed any memory issues in testing, but there are likely to be entries where doing this would lead to duplicated simplified forms being shown (as these have often been entered manually). I would also guesstimate that {{zh-l}} has been invoked a couple of hundred thousand times, so any conversion would need to be done by bot. This is not likely to be straightforward, because it has a somewhat more flexible syntax (which just isn't possible to port over to the main templates, because it would cause problems for other languages). Naturally, there is also the concern about whether Mandarin pinyin should be given by default for Chinese links as a whole, but that's a concern that also applies to {{zh-l}} itself, so I don't really want to tackle it here.
Overall, though, it would be really good to start sweeping away these sorts of language-specific templates wherever possible, because they're often not written very well, and they lead to walled gardens of badly written modules that end up being massively inefficient and incompatible with everything else. Not to put too fine a point on it, but the Chinese modules are a shitshow at the moment, and unpicking them is inherently going to involve growing pains such as this. Theknightwho (talk) 21:47, 11 February 2023 (UTC)
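The behaviour described in this post (manual forms separated by //, otherwise forms generated per language) can be illustrated with a minimal Python sketch. This is not the Lua code of Module:zh-generateforms, which is not shown in this discussion; the table `TRAD_TO_SIMP` and the function names `generate_forms` and `display_forms` are hypothetical, and the handling of a trailing // follows the `車//` example given later in the thread.

```python
# Hypothetical excerpt of a traditional-to-simplified table (illustration only).
TRAD_TO_SIMP = {"彈": "弹", "車": "车", "馬": "马"}


def generate_forms(text):
    """Return [text], or [text, simplified] if any character was converted."""
    simplified = "".join(TRAD_TO_SIMP.get(ch, ch) for ch in text)
    return [text] if simplified == text else [text, simplified]


def display_forms(term):
    """Join the forms of a link with '/', preferring manually given forms."""
    if "//" in term:
        # Manual forms win; an empty part (as in "車//") suppresses generation.
        forms = [part for part in term.split("//") if part]
    else:
        forms = generate_forms(term)
    return "/".join(forms)


print(display_forms("彈"))             # generated pair
print(display_forms("車//"))           # generation suppressed
print(display_forms("color//colour"))  # manual alternative forms
```

Keeping the generator separate from the // parsing mirrors the idea of a per-language hook: a language without such a hook would simply return the text unchanged.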
- I believe this should also imply deprecating most of the other Chinese templates, namely {{zh-syn}}, {{zh-ant}}, {{zh-cot}}, {{zh-hyper}}, {{zh-hypo}}, {{zh-also}}, {{zh-synonym}} (plus {{zh-altterm}} and {{zh-altname}}, for which the standard templates were deprecated long ago), {{zh-alt form}}, {{zh-misspelling of}}, {{zh-short}}, and potentially some others that I've missed. Some of these have slightly different displays or input from the standard templates, but that should not be a significant hurdle.
- As I've mentioned numerous times before, I strongly oppose automatic pinyin. Sorry Knight I know you don't want to deal with this here, but that's exactly my concern with this change, since it codifies the status quo of having pinyin into something discussed and passed with a "consensus" in BP.
- It introduces many errors and inaccuracies in how we present the information. Many characters have multiple readings, your example 彈/弹 could be tán or dàn – both are equally common, but the template (including the existing {{zh-l}}) does not accommodate this and simply outputs one of them. I'm not fluent in Mandarin at all (and there are many other Chinese editors who similarly do not speak Mandarin fluently), so I could never tell which is the correct reading – what I would do is letting the template do its job and not caring about whether the output is correct, or more recently I would simply manually turn it off. I imagine that most of the fellow Mandarin editors wouldn't always check the correctness of the pinyin either. It appears to me that having automatic pinyin creates more maintenance than without it.
- Also, fuck unified Chinese, the template is called {{zh-l}} not {{cmn-l}}, and Chinese ≠ Mandarin, so it is totally absurd to impose Mandarin onto every Chinese entry. I believe the various problems arising from this have been mentioned to death everywhere, so I'm not repeating them here unless someone asks me to.
- – Wpi31 (talk) 04:51, 12 February 2023 (UTC)
- Yep - that’s fair. These two features are disconnected from each other, so I can turn on automatic simplification without automatic pinyin. Theknightwho (talk) 16:12, 12 February 2023 (UTC)
- @Wpi31 I’ve been having a think about how to mitigate this issue: it’s possible to turn off automatic pinyin if multiple different pronunciations are detected. This would therefore retain it for the majority of links, but prevent it anytime there’s ambiguity. It wouldn’t catch all false positives, but I think it’d bring them down to an acceptable level. Theknightwho (talk) 17:30, 12 February 2023 (UTC)
- Addendum: On the issue of zh, we could turn on automatic pinyin like that only for cmn, which would make this switch-over a good excuse to start disambiguating lots of Chinese links. Similar semi-automatic systems could also be put in place for yue, nan and so on. We could also use categories to flag any ambiguous links with no romanisation. —This unsigned comment was added by Theknightwho (talk • contribs) at 17:36, 12 February 2023 (UTC).
- Addendum 2: I’m unsure about how practical this is, but it would also be possible for {{l|zh}} to show all the pronunciations. This might even be a good way to avoid pointless repetition when something applies to all/multiple varieties, while encouraging people to be more specific when possible, too. Theknightwho (talk) 17:48, 12 February 2023 (UTC)
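The mitigation proposed above (give pinyin automatically only when a single pronunciation is found, and flag the rest) could look roughly like the following Python sketch. It is an illustration under stated assumptions, not the logic of Module:cmn-translit: the table `ENTRY_READINGS` stands in for pronunciations scraped from existing entries, and `auto_pinyin` and `needs_manual_romanisation` are hypothetical names.

```python
from typing import Optional

# Hypothetical stand-in for the Mandarin readings listed on each entry.
ENTRY_READINGS = {
    "彈": ["tán", "dàn"],  # two readings: ambiguous
    "雨傘": ["yǔsǎn"],     # one reading: safe to show automatically
}


def auto_pinyin(term: str) -> Optional[str]:
    """Return pinyin only when the entry lists exactly one reading."""
    readings = ENTRY_READINGS.get(term, [])
    if len(readings) == 1:
        return readings[0]
    return None  # ambiguous or unknown: leave it to manual input


def needs_manual_romanisation(term: str, manual_tr: Optional[str] = None) -> bool:
    """True for links that would be flagged (e.g. by a tracking category)."""
    return manual_tr is None and auto_pinyin(term) is None


print(auto_pinyin("雨傘"))
print(auto_pinyin("彈"))
print(needs_manual_romanisation("彈"))
```

As noted in the thread, a check like this would not catch every wrong reading, since an entry may list only one pronunciation while the linked sense uses another.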
Tagging @Justinrleung, @RcAlex36, @MSG17, @ND381, @Octahedron80, @Fish bowl, @LibCae, @沈澄心 for comment. Theknightwho (talk) 17:14, 14 February 2023 (UTC)
- I broadly agree with both you and Wpi31 in this matter. This would be in line with other template deprecations in terms of proposal, though I do think some more testing would be needed to ensure a smooth transition and show that other templates can handle the zh syntax. As to showing pronunciations, I think that while a change is needed with romanizations (particularly with entries showing pinyin automatically, but not say POJ for Min Nan only entries), implementing it might be problematic for users if either automatic pinyin is removed or all the romanizations are shown (which could lead to overly long listings, inconsistent display of entries with different romanizations, and/or general display changes). In any case, however, if a template can handle romanization of non-zh entries, then it should be able to handle character differences for zh entries. MSG17 (talk) 02:29, 15 February 2023 (UTC)
- @Theknightwho Let me know if you need bot work done. I've written a lot of bot scripts to do rather complex things and I imagine converting {{zh-l}} to {{l}} shouldn't be so hard. If there are cases it can't handle automatically, it will leave them alone to be done manually in a separate pass. Benwing2 (talk) 03:33, 16 February 2023 (UTC)
- @Benwing2: Thank you - much appreciated. I think it would be a good idea to spend a month or so doing manual replacements (on an as-and-when basis), which should hopefully identify most scenarios. This will also give time to identify/discuss/solve any formatting changes this would cause (e.g. I’ve noticed that Chinese link templates don’t bolden terms in non-gloss definitions, and there may not be consensus to change that). Theknightwho (talk) 16:48, 17 February 2023 (UTC)
- Regarding the romanisations for ambiguous cases, would it be possible to supply the parameter with unformatted text and have the module output the formatted form (i.e. mainly automatic superscript) – Wpi31 (talk) 07:19, 16 February 2023 (UTC)
- I strongly oppose the deletion of the template. Like {{zh-x}}, it requires care and knowledge (which comes from careful checking), the generic templates are not able to transliterate Chinese, Japanese, Thai and Khmer terms or whole phrases, the way some language-specific templates do. Offer a good, working alternative before suggesting deletion. The incorrect transliterations will come from incorrect usage. --Anatoli T. (обсудить/вклад) 01:47, 17 February 2023 (UTC)
- @Atitarev: That's exactly the reason why this is brought up here. Please read the entire discussion before commenting in such an aggressive manner. All features of {{zh-l}} are already replicated (though they are not enabled yet), except for the guessing with the |2= parameter which should always be discouraged (nevertheless it shouldn't exist in the first place). The automatic simplified forms are handled by Module:zh-generateforms; the * disabling translit/simplified can be done by {{zh-l|車//|tr=-}}; the automatic pinyin is done through Module:cmn-translit which, as Knight has said, is doing the exact same thing as {{zh-l}} does, so any incorrectness arising from the new module(s) already exists with {{zh-l}} itself. What we have been discussing above is simply trying to further eliminate the incorrect outputs by disabling pinyin for only the ambiguous cases and to support auto-transliteration for other lects. Wpi31 (talk) 05:04, 17 February 2023 (UTC)
- @Wpi31: I wasn't aggressive, I just have a strong opinion. I saw your aggressive comments about unified Chinese, though. If you need coverage for other dialects, {{zh-x}} (for usexes) already uses parameters for other varieties, a similar template for e.g. Min Nan would require similar work and even more suppressions or manual overrides. Thanks for doing the work but I haven't seen it anywhere near completion (correct me if I'm wrong) or used in action with a template. The usage is very big and it seems too early to deprecate. Mandarin Chinese transliteration can stay largely automated, even if care should be taken, hope it will stay so. Anatoli T. (обсудить/вклад) 05:15, 17 February 2023 (UTC)
- @Atitarev I am offering a good, working alternative. I don't really understand what your objection is. Theknightwho (talk) 05:37, 17 February 2023 (UTC)
- @Theknightwho: I welcome the work, I don't welcome the deprecation (yet). Anatoli T. (обсудить/вклад) 05:41, 17 February 2023 (UTC)
- @Atitarev: I’m not suggesting we deprecate it immediately, but I do want to start the process of migrating away from it. This will help to identify any further difficult cases, too. The fact that other lects will have exceptions and difficult cases is fair enough, but automating those would be a new feature anyway. Right now, I suggest we turn on these features for {{l}} (and by extension, all the other standard link templates). Then we can go from there. I expect {{zh-l}} will take several months (if not over a year) to fully replace. Theknightwho (talk) 16:38, 17 February 2023 (UTC)
- I’m going to turn these features on in a day or two if there are no further objections (with the change that Mandarin pronunciations only work automatically if only one pronunciation is given on the main page). Tagging @Wpi31, @MSG17, @Benwing2, @Atitarev, who have participated in the discussion. Theknightwho (talk) 20:01, 21 February 2023 (UTC)
- @Benwing2, @Atitarev (who may not be aware of this) - I've turned on automatic simplification for zh and cmn. Automatic pinyin is still to come, as there are some bits to iron out. Ben - would it please be possible for you to run a bot job removing any duplicated translations? Up until now, it's been necessary to add traditional and simplified separately - usually with traditional first, but with the pinyin given in the simplified template (to avoid duplication). In the great majority of cases, the automatically generated simplified form will be correct, meaning that it'll now be displayed twice (e.g. at noodle#Translations). The pinyin "transliterations" will need to be moved to the traditional template, too. Many thanks. Theknightwho (talk) 05:29, 24 February 2023 (UTC)
- @Theknightwho Sure, although I'll need more guidance on exactly what to do. Specifically, can you give me examples of various templates as they look now and what they ought to look like? Benwing2 (talk) 05:33, 24 February 2023 (UTC)
- Also, I see 27 entries in CAT:E. Only some of them are memory-related but I'm wondering if at the end of this, the removal of dead code will result in memory decrease from the current state. Benwing2 (talk) 05:35, 24 February 2023 (UTC)
- @Benwing2 No problem. To use the same example:
- There will be some that don't follow this format, but this should catch about 95% of them. I'm hoping you're right about the removal of dead code - there is a lot of code in the Chinese modules that we would do well to get rid of. Theknightwho (talk) 05:38, 24 February 2023 (UTC)
- @Theknightwho: Thank you for the efforts! Will you be able to turn features (eventually) for sentence transliterations and simplified forms?
- At Template:zh-x/documentation#Tricks (for Mandarin only) - the list describes typical situations when dealing with Mandarin (about 10%), where automated simplifications and transliterations are not right. Ignore the delinking but helping with desired simplified forms and corrected pinyin is what is typically required with this automation. Anatoli T. (обсудить/вклад) 05:40, 24 February 2023 (UTC)
- @Atitarev I should think so, yeah. You can use // to separate traditional/simplified if you need to do it manually (in exactly the same way as / works for {{zh-l}}). Theknightwho (talk) 05:43, 24 February 2023 (UTC)
- @Theknightwho: Great. Forgot to mention (in case you haven't implemented) that "^" is already used for capitalisation of romanisations in Japanese, Korean and by {{zh-l}} and {{zh-x}}.
- It opens the door for fully automating Japanese transliterations. That's why I always provided the full kana in all Japanese translations. {{ja-r}} and {{ja-x}} show similar tricks and challenges.
- Same thing can be done for Thai and Khmer. {{th-x}} and {{km-x}} for reference. Anatoli T. (обсудить/вклад) 05:49, 24 February 2023 (UTC)
- @Theknightwho Is there a bot-callable function to convert traditional Chinese to simplified, and one to generate the default transliteration for a string of Chinese characters? ({{xlit}} will probably work for the latter; not sure the correct function for the former). It needs to be either a template call or an instance of {{#invoke:...}}. Benwing2 (talk) 06:01, 24 February 2023 (UTC)
- Also the example you gave has 'cmn' in it. Do you only want/need 'cmn' transliterations converted or also 'zh' transliterations (and do the latter exist at all)? Benwing2 (talk) 06:03, 24 February 2023 (UTC)
- @Atitarev Yes - I’ve enabled ^ as a way to capitalise transliterations for all scripts which don’t have capitalisation - it’s more flexible, too, as you can put it anywhere in the term.
- @Benwing2 Using lang:generateForms(text) will return a table of forms. Here, it’ll contain two forms if it’s made a conversion, but only one if not.
- In terms of Chinese translations, there shouldn’t be any using zh, as it should be divided by lect. In reality, I know plenty do - but they’re usually bullet-pointed by lect (which makes determining the correct langcode trivial). Theknightwho (talk) 06:20, 24 February 2023 (UTC)
- OK, to make things more concrete, here is part of the output from searching the Jan 20 dump file for the regex \{tt?\+?\|(cmn|zh)\|.* (although the Feb 20 file should be similar):
Page 900 Roman numeral: Found match for regex: {t+|cmn|羅馬數字}}, {{t+|cmn|罗马数字|tr=Luómǎ shùzì}}
Page 901 letter: Found match for regex: {t+|cmn|字母|tr=zìmǔ}}, {{t+|cmn|字|tr=zì}}, {{t+|cmn|文字|tr=wénzì}}
Page 904 decrypt: Found match for regex: {t+|cmn|解密|tr=jiěmì}}, {{t+|cmn|解碼}}, {{t+|cmn|解码|tr=jiěmǎ}}, {{t+|cmn|解讀}}, {{t+|cmn|解读|tr=jiědú}}
Page 906 Irish: Found match for regex: {t+|cmn|愛爾蘭語}}, {{t+|cmn|爱尔兰语|tr=ài'ěrlányǔ}}
Page 909 second: Found match for regex: {tt+|cmn|第二|tr=dì'èr|sc=Hani}}
Page 910 century: Found match for regex: {t+|cmn|世紀}}, {{t+|cmn|世纪|tr=shìjì}}
Page 911 clock: Found match for regex: {tt+|cmn|鐘}}, {{tt+|cmn|钟|tr=zhōng}}, {{tt+|cmn|時鐘}}, {{tt+|cmn|时钟|tr=shízhōng}}, {{tt+|cmn|鐘錶}}, {{tt+|cmn|钟表|tr=zhōngbiǎo}}
Page 912 millisecond: Found match for regex: {t+|cmn|毫秒|tr=háomiǎo|sc=Hani}}
Page 913 polytheism: Found match for regex: {t+|cmn|多神教|tr=duōshénjiào}}
Page 914 Japan: Found match for regex: {tt+|cmn|日本|tr=Rìběn}}
Page 915 computer science: Found match for regex: {t|cmn|電腦科學|sc=Hani}}, {{t|cmn|电脑科学|tr=diànnǎo kēxué|sc=Hani}}, {{t+|cmn|計算機科學|sc=Hani}}, {{t+|cmn|计算机科学|tr=jìsuànjī kēxué|sc=Hani}}
Page 917 few: Found match for regex: {t+|cmn|少|tr=shǎo|sc=Hani}}, {{t+|cmn|一些|tr=yīxiē|sc=Hani}}
Page 918 meat: Found match for regex: {tt+|cmn|肉|tr=ròu}}
Page 919 I love you: Found match for regex: {t+|cmn|我愛你}}, {{t+|cmn|我爱你|tr=wǒ ài nǐ}}
Page 920 beer: Found match for regex: {tt+|cmn|啤酒|tr=píjiǔ}}, {{tt+|cmn|麥酒}}, {{tt+|cmn|麦酒|tr=màijiǔ}} {{q|rare or regional}}
Page 922 encrypt: Found match for regex: {t+|cmn|加密|tr=jiāmì}}
Page 925 ASAP: Found match for regex: {t+|cmn|盡快|sc=Hani}}, {{t+|cmn|尽快|tr=jìnkuài|sc=Hani}}, {{t+|cmn|及早|tr=jízǎo|sc=Hani}}
Page 929 pseudo-: Found match for regex: {t+|cmn|偽|alt=偽-|sc=Hani}}, {{t+|cmn|伪|alt=伪-|tr=wěi-|sc=Hani}}, {{t+|cmn|假|alt=假-|tr=jiǎ-|sc=Hani}}
Page 934 trade union: Found match for regex: {t+|cmn|工會}}, {{t+|cmn|工会|tr=gōnghuì}}
Page 937 umbrella: Found match for regex: {t+|cmn|傘}}, {{t+|cmn|伞|tr=sǎn}}, {{t+|cmn|雨傘}}, {{t+|cmn|雨伞|tr=yǔsǎn}} {{qualifier|rain}}
Page 939 white-collar: Found match for regex: {t+|cmn|白領|sc=Hani}}, {{t+|cmn|白领|tr=báilǐng|sc=Hani}}
Page 941 chairman: Found match for regex: {t+|cmn|主席|tr=zhǔxí}}, {{t+|cmn|議長}}, {{t+|cmn|议长|tr=yìzhǎng}}
Page 943 bit: Found match for regex: {t+|cmn|馬銜|sc=Hani}}, {{t+|cmn|马衔|tr=mǎxián|sc=Hani}}
Page 946 BCE: Found match for regex: {t+|cmn|公元前|tr=gōngyuán qián}}
Page 947 BC: Found match for regex: {t+|cmn|公元前|tr=gōngyuánqián}}, {{t|cmn|主前|tr=zhǔqián}} {{qualifier|Christian}}, {{t+|cmn|紀元前}}, {{t+|cmn|纪元前|tr=jìyuánqián}}
Page 949 point: Found match for regex: {t+|cmn|點}}, {{t+|cmn|点|tr=diǎn}}
Some questions here:
- Page 900 Roman numeral has the translit 'Luómǎ shùzì' which will not be what's auto-generated since it has a capital letter and a space. I'm pretty sure spaces should be preserved but do we want to map capital letters to lowercase in translit? Also is there a way to specify in the Chinese characters that there should be a space in translit? Some sort of specially-handled character which doesn't show up in the link or the Chinese display but does show up in translit. I implemented something of this nature for hyphens in Korean, but it was special-cased in Module:script utilities. I take it maybe you've implemented a generalized version of this?
- Page 904 decrypt: I take it the third translit is the simplified equivalent of the second. If 'jiěmǎ' is the default translit, do you want the bot run to detect this and remove it, so it's auto-generated?
- Page 909 second: Do you want the bot run to remove |sc=Hani?
- Page 914 Japan: Another capital letter in translit.
- Page 929 pseudo-: There's an |alt= param in both the traditional and simplified equivalent, from what I can tell. How should the bot handle this?
- Page 946 BCE and page 947 BC: The same expression occurs in both places but with differences in placement of spaces. Presumably we should eventually fix this (not by bot)?
Benwing2 (talk) 06:18, 24 February 2023 (UTC)
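For readers who want to reproduce a listing like the one above, here is a minimal Python sketch of the search. Only the regex is taken from the post; the shape of the input (an iterable of `(number, title, text)` tuples) and the function name `find_matches` are assumptions, since the dump format and the actual bot script are not shown here. Because the pattern begins with a single `\{`, each match starts at the second brace of `{{`, which is why the listed matches open with one brace.

```python
import re

# The regex quoted in the post above.
TRANSLATION_RE = re.compile(r"\{tt?\+?\|(cmn|zh)\|.*")


def find_matches(pages):
    """Yield one report line per wikitext line that contains a match.

    `pages` is assumed to be an iterable of (number, title, text) tuples.
    """
    for number, title, text in pages:
        for line in text.splitlines():
            match = TRANSLATION_RE.search(line)
            if match:
                yield f"Page {number} {title}: Found match for regex: {match.group(0)}"


sample = [(918, "meat", "* Mandarin: {{tt+|cmn|肉|tr=ròu}}")]
for report in find_matches(sample):
    print(report)
```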
- @Theknightwho I checked some examples where I removed the manual translit and it appears auto-translit isn't ever getting generated. Is this correct? Are there plans to change this? Benwing2 (talk) 07:42, 24 February 2023 (UTC)
- @Benwing2 Sorry for the misunderstanding - automatic pinyin hasn’t been turned on yet, as we’re ironing out the specifics on how best to go about it. The consensus so far is that it’s going to be semi-automated, with ambiguous situations requiring manual input. As things stand, that means it’s best to keep all the transliterations we have at the moment, and then we can handle those later on if they need to be removed. They’re lower priority, as they don’t have the immediate visual/usability problem that the duplicates have. Theknightwho (talk) 12:50, 24 February 2023 (UTC)
- @Benwing2, Theknightwho: ^ is used for capitalisation and space is space in Module:zh-usex. Both are invisible.
- I’d like |sc=Hani or Hant to be removed. Anatoli T. (обсудить/вклад) 12:17, 24 February 2023 (UTC)
- |alt= should be fine, e.g. 韓國的/韩国的 (zh) (Hánguó de) Anatoli T. (обсудить/вклад) 12:22, 24 February 2023 (UTC)
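The convention mentioned in these replies (^ capitalises the following syllable, a space is kept in the transliteration, and neither appears in the displayed term) can be sketched in Python as follows. This only illustrates the idea; it is not the code of Module:zh-usex. The per-character table `READINGS` and the function `split_markup` are hypothetical, and a real implementation would have to deal with characters that have several readings.

```python
# Hypothetical single-reading table, just enough for the two examples below.
READINGS = {"羅": "luó", "馬": "mǎ", "數": "shù", "字": "zì", "日": "rì", "本": "běn"}


def split_markup(marked):
    """Return (display_text, transliteration) for text marked up with ^ and spaces."""
    display, translit = [], []
    capitalise_next = False
    for ch in marked:
        if ch == "^":
            capitalise_next = True      # affects the next syllable only
        elif ch == " ":
            translit.append(" ")        # kept in the translit, hidden in display
        else:
            display.append(ch)
            syllable = READINGS[ch]     # raises KeyError for unknown characters
            if capitalise_next:
                syllable = syllable[0].upper() + syllable[1:]
                capitalise_next = False
            translit.append(syllable)
    return "".join(display), "".join(translit)


print(split_markup("^羅馬 數字"))
print(split_markup("^日本"))
```

With markup of this kind, the "Luómǎ shùzì" and "Rìběn" transliterations raised in the questions above would not need a manual tr= value.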
- @Benwing2 I’ll leave the various transliteration issues for now. However, please remove all script codes (from all lects, if that’s not too much trouble). At the moment, these are wrongly overriding the traditional/simplified detection, and for zh & cmn they will be potentially causing issues for the generation of simplified forms (as that only works if the script is detected as Hant). Plus, it’s likely automatic simplification will be turned on for some of the other lects at some point, too. Theknightwho (talk) 12:58, 24 February 2023 (UTC)
- @Benwing2: I've checked some translations with warnings above. They are apparently already addressed. Do you still have any queries outstanding? I understand you're not checking for how to do spaces/capitalisations yet? It's not ready yet. Anatoli T. (обсудить/вклад) 05:46, 27 February 2023 (UTC)
based pro-standard templates chad, good work and ily, and {{l|zh|車//|tr=-}} is excellent syntax 👍️ —Fish bowl (talk) 21:54, 24 February 2023 (UTC)
- Happy to see things up and running, and it's definitely a good step to make templates less divergent for Chinese. I'm wondering whether simplified should be suppressed in {{zh-dial}}, which has been relying on the backend of {{l}}. I know that was the status quo ever since the conception of {{zh-dial}}, but I think it's time to make it more accessible for people who are used to simplified (which tbh is probably the majority of Chinese learners). The downsides are that it would make the template a little clunkier (which I don't see as a big issue) and that it's going to take up a bit of memory because of the sheer amount of data we process with {{zh-dial}}. Thoughts from @Fish bowl, RcAlex36, Wpi31? — justin(r)leung { (t...) | c=› } 22:46, 24 February 2023 (UTC)
- IIRC it was decided to remove simplified to reduce visual clutter, which I think is reasonable (陰莖#Synonyms), but I can't find the conversation. (also I'd personally like to migrate {{zh-dial}} to my more language-agnostic {{dial syn}} too) —Fish bowl (talk) 23:38, 24 February 2023 (UTC)
- @Fish bowl: I believe it's Wiktionary talk:About Chinese#Simplified Chinese in all templates and modules. We might want to revisit this since I think the decision was mostly Wyang's. (I guess you were kind of also supporting it if you think it's cluttered in zh-dial.) — justin(r)leung { (t...) | c=› } 05:21, 26 February 2023 (UTC)
- @Theknightwho My bot script is running. There are 22,776 pages to process so it will run overnight. Besides removing redundant translations, it removes script codes from translation templates for all Chinese lects and replaces 'zh' with the correct lect code. When run on the Feb 20 dump it produced 165 warnings of various sorts; these need to be fixed up by hand. See User:Benwing2/remove-redundant-chinese-translations-warnings. Benwing2 (talk) 04:05, 25 February 2023 (UTC)
- @Benwing2 Thanks for this - I’ll take a look at the warnings. Theknightwho (talk) 17:29, 25 February 2023 (UTC)
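As a rough illustration of the clean-up described above (dropping the now-redundant simplified template, moving its tr= onto the traditional one, and removing script codes), here is a hedged Python sketch. It is not Benwing2's bot script, which is not shown in this discussion; `TRAD_TO_SIMP` is a tiny hypothetical table, the function names are invented for the example, and anything that does not fit the simple pattern (for instance a traditional template that already has parameters such as alt=) is left unmerged, much as the real run left its warnings for manual review. The replacement of zh by the correct lect code is not covered.

```python
import re

# Hypothetical excerpt of a traditional-to-simplified table (illustration only).
TRAD_TO_SIMP = {"傘": "伞", "盡": "尽", "會": "会"}

# Matches {{t|..}}, {{t+|..}}, {{tt|..}} and {{tt+|..}} with no nested templates.
TEMPLATE_RE = re.compile(r"\{\{(tt?\+?)\|([^|{}]*)\|([^|{}]*)((?:\|[^|{}]*)*)\}\}")


def simplify(text):
    return "".join(TRAD_TO_SIMP.get(ch, ch) for ch in text)


def parse(match):
    """Split a match into (name, lang, term, params), dropping any |sc= code."""
    name, lang, term, rest = match.groups()
    params = [p for p in rest.split("|") if p and not p.startswith("sc=")]
    return name, lang, term, params


def render(name, lang, term, params):
    return "{{" + "|".join([name, lang, term] + params) + "}}"


def is_redundant_pair(prev, cur, separator):
    """True if cur is just the simplified form of a bare traditional prev."""
    return (
        prev is not None
        and separator.strip() == ","
        and prev[1] == cur[1]          # same lect code
        and not prev[3]                # traditional template has no params
        and cur[2] != prev[2]
        and simplify(prev[2]) == cur[2]
    )


def merge_redundant(line):
    out, pos, prev = [], 0, None
    for m in TEMPLATE_RE.finditer(line):
        separator = line[pos:m.start()]
        cur = parse(m)
        if is_redundant_pair(prev, cur, separator):
            # Keep the traditional term and move the params (e.g. tr=) onto it.
            prev = (prev[0], prev[1], prev[2], cur[3])
            out[-1] = render(*prev)
        else:
            out += [separator, render(*cur)]
            prev = cur
        pos = m.end()
    out.append(line[pos:])
    return "".join(out)


print(merge_redundant("{{t+|cmn|傘}}, {{t+|cmn|伞|tr=sǎn}}"))
```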
- @theknightwho: I noticed that {{och-l}} and {{ltc-l}} are also using Module:zh/link (which is totally unnecessary, and could be done using the standard link modules). Can you look into that? (I don't have the time for that now) – Wpi31 (talk) 14:46, 25 February 2023 (UTC)
- I agree - the two templates never really had any good reason to exist in the first place, and can be fairly easily replaced. One thing I’m considering is whether the language objects should have a pronunciation method, as that might be more applicable for MC and OC. Theknightwho (talk) 17:22, 25 February 2023 (UTC)
- @Theknightwho See also User:Benwing2/remove-redundant-chinese-translations-warnings-from-to. These are the remaining 114 warnings in a slightly different format. The lines in question are in the form <from> LINE <to> LINE <end>; if you correct the part *after* the <to>, and leave the part before it alone, I can run a bot script to update all the pages in question. Some of the pages need to be edited directly, in particular the ones with junk after the Chinese: part, but this should make it easier to fix things up. Benwing2 (talk) 21:47, 25 February 2023 (UTC)
- @Benwing2 The link templates now seem to be also generating simplified forms for the rest of the Chinese lects. (I don't remember it doing that a few days ago) Can you run the bot job again for them? – Wpi31 (talk) 13:58, 28 February 2023 (UTC)
- @Wpi31 Sure. Can you give me a couple examples where this is happening? Also User:Theknightwho can you verify what Wpi31 says? Benwing2 (talk) 22:32, 28 February 2023 (UTC)
- @Benwing2: book/translations#Noun, pen (writing tool), horse, pig, apple, wind/translations#Etymology_1. I think this needs to be put on hold for now, since something needs to be ironed out after seeing these pages – it looks like the simplfied forms are generated only for the larger Chinese lects, i.e. yue, hak, nan, wuu? @theknightwho – Wpi31 (talk) 03:04, 1 March 2023 (UTC)
- PS: hsn, gan, zhx-teo, also generates simplified forms, but not for cdo, cjy, cpx, czh, czo
(probably someone forgot to press save?)– Wpi31 (talk) 03:22, 1 March 2023 (UTC)- @Wpi31, Theknightwho It looks like the c... lects are fixed. I am ready to run the bot to fix up the non-Mandarin lects, let me know if that's OK. Benwing2 (talk) 21:22, 1 March 2023 (UTC)
- @Benwing2 @Wpi31 I agree this sounds good. The lects which now use automatic simplification are: cdo, cjy, cmn, cpx, czh, czo, dng, gan, hak, hsn, mnp, nan, wuu, wxa, yue, zhx-sht, zhx-tai & zhx-teo. When turning these on, I forgot to check Module:languages/data/3/c for other lects because Mandarin (cmn) had already been enabled, which was just an oversight; nothing to do with the size of the lects. Theknightwho (talk) 21:42, 1 March 2023 (UTC)
The generate_forms key is not in the documentation of Module:languages/data/2. Is it a new feature? Anyway I suggest the documentation page be updated. -- Huhu9001 (talk) 13:11, 26 February 2023 (UTC)
- @Huhu9001 I'll add it shortly. I'm still in two minds as to whether there should be some way to specify different modules depending on the script(s) of the forms submitted by the user, which explains the delay. For example, Dungan uses automatic simplification, but would also benefit from having Cyrillic + Han displayed together as well (which would need to be facilitated separately). Theknightwho (talk) 21:45, 1 March 2023 (UTC)
- @Wpi31, Theknightwho More warnings (204 of them): User:Benwing2/remove-redundant-chinese-translations-warnings-2. Benwing2 (talk) 23:15, 1 March 2023 (UTC)
- @Benwing2 There seem to be some translations that the bot never got to clean up, for example ten thousand and time/translations. Wpi31 (talk) 06:56, 2 March 2023 (UTC)
- @Wpi31 I suspect those are cases where the simplified and traditional don't match according to our tables. In such a situation, the bot currently silently ignores the mismatch; I'll do a run outputting warnings for these cases to make sure this is the issue. Benwing2 (talk)
- @Wpi31 Actually caused by an off-by-one error resulting in skipping Translations sections that were the last section on the page. Also fixed some other issues; will rerun. Benwing2 (talk) 10:06, 2 March 2023 (UTC)
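For readers unfamiliar with this class of bug, the following Python sketch shows one way a section splitter can avoid dropping the final section of a page. It is not the bot's actual code; the function name `split_sections` and the heading regex are assumptions made for illustration.

```python
import re

# Matches wikitext headings such as ==English== or ====Translations====.
# This pattern is an assumption made for the sketch.
HEADING = re.compile(r"^=+[^=\n]+=+[ \t]*$", re.MULTILINE)


def split_sections(text):
    """Return (heading, body) pairs, including the final section."""
    matches = list(HEADING.finditer(text))
    sections = []
    for i, m in enumerate(matches):
        # The last section has no following heading, so it runs to the
        # end of the page. Looping only to len(matches) - 1, or requiring
        # a next heading, would silently skip it.
        if i + 1 < len(matches):
            end = matches[i + 1].start()
        else:
            end = len(text)
        name = m.group(0).strip("= \t")
        sections.append((name, text[m.end():end].strip()))
    return sections
```

With a page whose Translations section comes last, the final pair returned is that Translations section, which is exactly the case described above.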
- Also, a few cases where lects in translations were skipped due to not using automatic simplification (this concerns Literary Chinese, Old Chinese, Middle Chinese):
- Page 179 of: Skipping lect Literary Chinese (lzh) not using automatic simplification: <from> *: Literary Chinese: {{t|lzh|之|tr=zhī}} <to> *: Literary Chinese: {{t|lzh|之|tr=zhī}} <end>
- Page 299 foot: Skipping lect Literary Chinese (lzh) not using automatic simplification: <from> *: Literary Chinese: {{tt|lzh|足}} <to> *: Literary Chinese: {{tt|lzh|足}} <end>
- Page 361 eat: Skipping lect Literary Chinese (lzh) not using automatic simplification: <from> *: Literary Chinese: {{tt|lzh|餔}} <to> *: Literary Chinese: {{tt|lzh|餔}} <end>
- Page 467 dark: Skipping lect Literary Chinese (lzh) not using automatic simplification: <from> *: Literary Chinese: {{tt|lzh|黲}} <to> *: Literary Chinese: {{tt|lzh|黲}} <end>
- Page 599 lithium: Skipping lect Literary Chinese (lzh) not using automatic simplification: <from> *: Literary Chinese: {{t|lzh|鋰}} <to> *: Literary Chinese: {{t|lzh|鋰}} <end>
- Page 798 Jesus: Skipping lect Middle Chinese (ltc) not using automatic simplification: <from> *: Middle Chinese: {{tt|ltc|移鼠}} <to> *: Middle Chinese: {{tt|ltc|移鼠}} <end>
- Page 914 homosexuality: Skipping lect Literary Chinese (lzh) not using automatic simplification: <from> *: Literary Chinese: {{t|lzh|同性戀}} <to> *: Literary Chinese: {{t|lzh|同性戀}} <end>
- Page 1448 fragrance: Skipping lect Literary Chinese (lzh) not using automatic simplification: <from> *: Literary Chinese: {{t|lzh|馥}}, {{t|lzh|馨香}}, {{t|lzh|馝}}, {{t|lzh|馞}} <to> *: Literary Chinese: {{t|lzh|馥}}, {{t|lzh|馨香}}, {{t|lzh|馝}}, {{t|lzh|馞}} <end>
- Page 2098 longan: Skipping lect Literary Chinese (lzh) not using automatic simplification: <from> *: Literary Chinese: {{t|lzh|龍目}} <to> *: Literary Chinese: {{t|lzh|龍目}} <end>
- Page 2124 Wikipedia: Skipping lect Literary Chinese (lzh) not using automatic simplification: <from> *: Literary Chinese: {{t-check|lzh|維基大典}} <to> *: Literary Chinese: {{t-check|lzh|維基大典}} <end>
- Page 2403 sun/translations: Skipping lect Literary Chinese (lzh) not using automatic simplification: <from> *: Literary Chinese: {{tt|lzh|日}}; {{tt|lzh|陽}}, {{tt|lzh|阳}}; {{tt|lzh|太陽}}, {{tt|lzh|太阳}} <to> *: Literary Chinese: {{tt|lzh|日}}; {{tt|lzh|陽}}, {{tt|lzh|阳}}; {{tt|lzh|太陽}}, {{tt|lzh|太阳}} <end>
- Page 2410 you/translations: Skipping lect Old Chinese (och) not using automatic simplification: <from> *: Old Chinese: {{t|och|你|tr=nɯʔ}} <to> *: Old Chinese: {{t|och|你|tr=nɯʔ}} <end>
Benwing2 (talk) 01:54, 2 March 2023 (UTC)
- @Theknightwho Any objection to cleaning up {{pinyin reading of}} to remove the simplified readings? Benwing2 (talk) 02:49, 2 March 2023 (UTC)
- @Benwing2 Thanks for all this. Cleaning up {{pinyin reading of}} sounds like a good idea, along with {{yue-jyutping of}} as well. Theknightwho (talk) 03:59, 2 March 2023 (UTC)
- @Benwing2: Good idea. Anatoli T. (обсудить/вклад) 04:30, 2 March 2023 (UTC)
- @Benwing2: But please check cases where the automated form <> manual, if you can. Anatoli T. (обсудить/вклад) 04:37, 2 March 2023 (UTC)
- @Benwing2: I think "Literary Chinese" is often misused in translations and elsewhere for literary written Chinese that belongs under a modern lect's language code, especially with pinyin transliteration, e.g. {{t|lzh|之|tr=zhī}}. This should be {{t+|cmn|之|tr=zhī}} instead with {{qualifier|literary}}, but that just complicates things. This is equally applicable to other Chinese varieties like Cantonese, just with a different translit, e.g. {{t|yue|之|tr=zi1}}. Anatoli T. (обсудить/вклад) 04:46, 2 March 2023 (UTC)
- Personally, I think we should rename "Literary Chinese" to "Classical Chinese" anyway (which would clear up this problem, among other issues). Theknightwho (talk) 04:56, 2 March 2023 (UTC)
- @Theknightwho, Atitarev Yes I'll check to make sure the simplified form given is actually the simplified equivalent of the traditional form given. Also I'm thinking of renaming {{pinyin reading of}} to {{cmn-pinyin of}} in the process; this will make it consistent with {{yue-jyutping of}}, with the corresponding headword template {{cmn-pinyin}}, and with other language-specific form-of templates. Benwing2 (talk) 05:00, 2 March 2023 (UTC)
- @Benwing2 Agree - sounds good. Theknightwho (talk) 05:02, 2 March 2023 (UTC)
- @Theknightwho, Atitarev, Wpi31 I am doing a test run now. A lot of warnings are coming out; the first 1,800 pages processed produced 275 warnings concerning mismatched simplified vs. traditional. Some of them are resolved by assuming the params are reversed, but a lot still remain. See User:Benwing2/clean-pinyin-jyutping-of-warnings-first-1800. Can you comment on a handful of these and let me know if there's some automatic way of resolving some of them? Note that the issues appear front-loaded, i.e. the first 13,000 pages only produced 353 warnings, not much more than the 275 warnings coming from the first 1,800 pages. Benwing2 (talk) 06:33, 2 March 2023 (UTC)
- Oh yeah, one other issue: definitions in {{pinyin reading of}}. Some template invocations have definitions in them, e.g. dǎngùchún ("cholesterol"), àihù ("to cherish"), yìxuéxí ("e-learning"), Sìchuān ("Sichuan"). The |def= param is currently ignored by {{pinyin reading of}} but my script flags the unrecognized param. Should we just remove this param entirely or should we modify {{pinyin reading of}} to display the definition when provided? Benwing2 (talk) 06:38, 2 March 2023 (UTC)
- @Benwing2: Yeah, remove the definitions. Anatoli T. (обсудить/вклад) 06:53, 2 March 2023 (UTC)
- A number of these list the variant/alternative forms alongside the standard trad/simp characters, and sometimes they list them separately. For example, on fēng, 峯 is a variant form of 峰 (the main entry, which has no simplified); gè is 個 (trad), with simplified 个 and alt form 箇; wěi has 偽 (main entry, in traditional) and variant 僞, but the simplified 伪 is listed on a separate line. It should be easy to fix them if one could tell which case each falls under, but sadly that's the complicated part, which I think requires manual work. – Wpi31 (talk) 06:45, 2 March 2023 (UTC)
- @Benwing2:
- Anatoli T. (обсудить/вклад) 06:52, 2 March 2023 (UTC)
- @Wpi31, Atitarev Thank you! One more thing is that currently {{pinyin reading of}} supports up to 10 numbered params, including two trad/simp pairs (1/2, 3/4) and 6 more additional forms. Very few pages use these params but there are a few:
- guó: {{pinyin reading of|國|国|圀|囻|囶|囯}}
- lián: {{pinyin reading of|聯|联|聮|䏈|聫|聨|𦕱}}
- luó: {{pinyin reading of|騾|骡|驘|臝|䯁|𩦻|𩧣}}
- I'm thinking in the new {{cmn-pinyin of}} we don't need this many params, and they should maybe instead be converted to multiple lines. Does this make sense? If so, how many params should be supported at the max? (There won't be simplified equivalents supported, just a set of numbered params to be displayed using {{l}}.) Benwing2 (talk) 07:05, 2 March 2023 (UTC)
- @Benwing2: Personally, I don't mind if they are all split into multiple lines. It sort of makes sense to keep together alternative traditional forms, e.g. Táiwān:
- vs
- It's not a big deal, though, if they are split (IMO!). Anatoli T. (обсудить/вклад) 10:30, 2 March 2023 (UTC)
- Of course, simplified forms need to be removed carefully, taking care of cases where simplified forms can be both trad. and simp. such as
- Anatoli T. (обсудить/вклад) 10:34, 2 March 2023 (UTC)
@Wpi31, Theknightwho See User:Benwing2/clean-pinyin-jyutping-of-warnings. These are all the warnings generated when converting {{pinyin reading of}} (471 warnings) and {{yue-jyutping of}} (34 warnings); total of 505 warnings. I went ahead and created {{cmn-pinyin of}} and will be doing a run tomorrow (= Thursday, US time) to convert the templates appropriately. Benwing2 (talk) 10:19, 2 March 2023 (UTC)
- @Atitarev Thanks. For the moment I'm not splitting any lines. I wrote {{cmn-pinyin of}} to take up to five variants; we can expand it if more are needed. The conversion script is running; there are about 61,000 entries so it may take most of a day to finish. It doesn't delete any lines so cases like 后 that are both a traditional character in its own right and a simplified variant character won't be messed up. Benwing2 (talk) 01:13, 3 March 2023 (UTC)
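The traditional/simplified check discussed in this thread (accept a matching pair, accept a reversed pair, otherwise warn) can be sketched in Python as follows. This is only an illustration under stated assumptions: `TRAD_TO_SIMP` is a four-character toy table standing in for the real conversion tables, and `to_simplified` and `check_pair` are hypothetical names, not the script's own.

```python
# Toy table for illustration only; it stands in for the much larger
# conversion tables mentioned in the discussion.
TRAD_TO_SIMP = {"國": "国", "聯": "联", "騾": "骡", "個": "个"}


def to_simplified(text):
    """Convert character by character, leaving unknown characters alone."""
    return "".join(TRAD_TO_SIMP.get(ch, ch) for ch in text)


def check_pair(first, second):
    """Classify a (traditional, simplified) pair of template params."""
    if to_simplified(first) == second:
        return "ok"
    if to_simplified(second) == first:
        # The params look swapped: simplified first, traditional second.
        return "reversed"
    return "mismatch"
```

A per-character table like this cannot decide cases such as 后, 著/着 or 台, where one character is both a traditional form in its own right and a simplified or variant form of another; those are the cases that end up as warnings for manual review.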
Warnings galore
- @Wpi31, Theknightwho, Atitarev I finished another run of my script to remove redundant translations. User:Wpi31 pointed out that my previous runs missed lots of pages; this run gets all these pages and also checks and outputs warnings for disagreement between traditional and equivalent simplified forms, rather than just silently skipping those cases. Unfortunately a lot of warnings got output (1,508 of them when processing 12,478 pages). I have split the warnings into three categories:
- User:Benwing2/remove-redundant-chinese-translations-warnings-3-trad-simp-disagreement (777 warnings): Those where the purported simplified equivalent does not match the traditional version according to our tables. In some cases the traditional and simplified were just reversed; those are fixed automatically. For the remainder, there are occasional false positives where the two versions are really just two different translations, but most of them look to be genuine warnings.
- User:Benwing2/remove-redundant-chinese-translations-warnings-3-junk-after-chinese-line (531 warnings): Those where the Chinese: line has something following it (usually a translation of some sort, sometimes {{t-needed|zh}} or {{not used|zh}}). There are a few false positives here (4, I think) for Chinese Pidgin English that can be ignored.
- User:Benwing2/remove-redundant-chinese-translations-warnings-3-misc (200 warnings): All other warnings. Mostly either an unrecognized lect or a case where the traditional and equivalent simplified have mismatching transliterations.
For the first and third categories, you should be able to speed up processing the warnings by editing the portion of the line after the <to> tag rather than directly fixing up the page in question. If you do that, let me know and I can do a bot run to push those changes to the appropriate pages. For the second category (junk after 'Chinese:' header), it may be necessary to actually edit the page to add the appropriate line; but it may be possible to speed up processing as well. Specifically, if you edit the portion of the line after the <to> tag and make it contain the appropriate text for a 'Mandarin:' line, I can write a bot script to insert that text in its own 'Mandarin:' line at the appropriate position.
Apologies for all the warnings; it seems things are often messy currently. Benwing2 (talk) 05:42, 3 March 2023 (UTC)
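To make the <from> ... <to> ... <end> workflow concrete, here is a minimal Python sketch of how an edited warning line could be read back and applied to a page. The line layout is inferred from the warnings quoted earlier in this thread (Page NUMBER TITLE: MESSAGE: <from> OLD <to> NEW <end>); the regex and the names `parse_warning` and `apply_fix` are assumptions for illustration, not the bot's actual code.

```python
import re

# Layout inferred from the warnings quoted in this thread; the exact
# format used on the warning pages may differ. A page title containing
# ": " would be split incorrectly by this simple pattern.
WARNING_LINE = re.compile(
    r"^Page (?P<index>\d+) (?P<title>.+?): (?P<message>.*?): "
    r"<from> (?P<old>.*?) <to> (?P<new>.*?) <end>$"
)


def parse_warning(line):
    """Return the fields of a warning line as a dict, or None."""
    m = WARNING_LINE.match(line.strip())
    return m.groupdict() if m else None


def apply_fix(page_text, warning):
    """Replace the <from> text with the hand-edited <to> text."""
    old, new = warning["old"], warning["new"]
    if old == new or old not in page_text:
        # Nothing was edited, or the page has changed since the run.
        return page_text
    return page_text.replace(old, new, 1)
```

Because only the text after <to> is edited by hand, the text after <from> still matches the live page and can serve as the search key.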
- @Benwing2: Some comments on the 1st list.
- kòngxián - 空閒 (trad), 空閑 (variant trad), 空闲 (simp)
- GDP - not pinyin
- liùyuè - 六月 or 6月 are variants. IMO, should not be liùyuè but 6yuè.
- zháohuǒ - 著火 (trad), 着火 is both variant trad or simp
- yù, ào - 隩 trad, 奧 is both variant trad or simp
- English abbreviations or Arabic numerals shouldn't probably get pinyin entries but people do make them. Anatoli T. (обсудить/вклад) 10:10, 3 March 2023 (UTC)
- @Atitarev, Wpi31, Theknightwho Thanks User:Atitarev for the detailed comments! It sounds like we need some more thinking around variant forms; the current trad//simp display might be insufficient. For example we might need tables mapping canonical traditional forms to their variant forms, as well as auto-display of variant forms under some circumstances (maybe a flag of some sort to {{l}} and {{m}}, which can be triggered automatically by {{t}}, {{cmn-pinyin of}}, etc.). Benwing2 (talk) 10:41, 3 March 2023 (UTC)
- Regarding the variant traditional forms (which are standard in some regions and not to be confused with variant forms), they are usually ignored. It may be possible to autogenerate variant traditional forms, but for characters with complicated correspondences like 著/着 or 檯/枱/臺/台 it's better to not overcomplicate it and simply use the manual // syntax, which is what I did when fixing the translation templates. Wpi31 (talk) 11:08, 3 March 2023 (UTC)
- Thanks for the bot job. I'll look into cleaning it up next week when I have time. – Wpi31 (talk) 11:02, 3 March 2023 (UTC)
- PS Question: I assume the errors generated from the earlier runs are also included here? – Wpi31 (talk) 11:10, 3 March 2023 (UTC)
- @Wpi31 Yes. They are separate from the warnings generated when converting {{pinyin reading of}} but subsume all previous translation-table-related errors/warnings. Benwing2 (talk) 19:20, 3 March 2023 (UTC)
- BTW conversion of {{pinyin reading of}} to {{cmn-pinyin of}} is done. For invocations that couldn't be cleaned up properly, I went ahead and renamed the template and added |attn=1, which causes the page to categorize into CAT:Requests for cleanup in Hanyu Pinyin entries. Once the invocation is cleaned up, just remove the |attn=1. Thanks! Benwing2 (talk) 20:07, 3 March 2023 (UTC)
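The rename-and-flag step described above might look roughly like the Python sketch below. It covers only the renaming and the optional |attn=1 marker, not the removal of simplified forms, and the regex and the name `convert_invocations` are assumptions rather than the actual conversion script.

```python
import re

# Matches a flat {{pinyin reading of|...}} invocation with no nested
# templates inside it (an assumption that keeps the sketch simple).
OLD_TEMPLATE = re.compile(r"\{\{pinyin reading of\|([^{}]*)\}\}")


def convert_invocations(wikitext, needs_attention=False):
    """Rename the template and optionally append |attn=1."""
    def repl(m):
        suffix = "|attn=1" if needs_attention else ""
        return "{{cmn-pinyin of|" + m.group(1) + suffix + "}}"
    return OLD_TEMPLATE.sub(repl, wikitext)
```

For example, calling `convert_invocations("# {{pinyin reading of|國|国}}", needs_attention=True)` yields an invocation of {{cmn-pinyin of}} with |attn=1 appended, which an editor would later remove by hand once the entry is cleaned up.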
- @Benwing2: I checked a couple of entries in your 2nd list of warnings. There were problems with translations. No nesting (Chinese/Mandarin), using only the simplified forms, etc. I fixed e.g. in wind_instrument#Translations and acclamation#Translations. --Anatoli T. (обсудить/вклад) 01:11, 6 March 2023 (UTC)
- A large amount of the unrecognised lects listed there are Teochew (and a few Taishanese), which have their own language codes but are still a subvariety of Min Nan or Cantonese respectively. Thus they are sub-subindented under them, but I don't think that is the best way to display them given that their treatment has since then changed. Before I go and make any further changes (but the codes need to be changed in any case), can we agree on how to format Teochew and Taishanese in translations? @Justinrleung, RcAlex36 – Wpi31 (talk) 13:35, 6 March 2023 (UTC)
- PS: I also see other lects, including Hokkien and Sichuanese, appearing in the translations even though we treat the entire thing as Min Nan. – Wpi31 (talk) 14:02, 6 March 2023 (UTC)
- @Wpi31: I'm not entirely sure how it should be done. I kind of like the set up in {{zh-pron}} where nesting only occurs if both Hokkien and Teochew appear, not if only one of them occurs. This might be tricky to deal with in translations. — justin(r)leung { (t...) | c=› } 22:17, 10 March 2023 (UTC)
- Minnan is not a language, and I believe we try to organize translations by language. Indenting Teochew and Hokkien under Minnan would be indenting Russian and Ukrainian under East Slavic. ISO is considering a proposal to break up Minnan into languages, which will hopefully solve the problem for us by next year, but meanwhile IMO we shouldn't follow the ISO breakdown for Chinese. kwami (talk) 09:24, 11 March 2023 (UTC)
I just created Wiktionary:Quotations/Resources as a guide for finding quotations, aimed at both new and experienced users. If anyone has other resources that you like to use, please add them to the page. (pinging @CitationsFreak, who I assume loves citations) Ioaxxere (talk) 02:33, 12 February 2023 (UTC)
- I've used Issuu and Genius in the past.
- Issuu has mostly modern magazines, but I've seen some books and newspapers and manuals there. Here's an example: https://issuu.com/search?q=example .
- Genius has mostly songs. This means that it has modern slang[note 1] (as anyone can make a song, publish it, and then write their own lyrics). It also has some non-song stuff (like Atticus Finch's closing speech in To Kill A Mockingbird). I think they even have scores for football (soccer) games, and even guides on how to create your own Genius lyrics! Of course, I (personally) just use it for song-lyrics checking[note 2]. Here's an example: https://genius.com/search?q=example
- [note 1] And slang from any time since records became popular, as anyone could transcribe the lyrics to a novelty song that only sold, like, 10 copies back during WWII.
- [note 2] Counting albums with only people talking as "songs". Three citations, for all senses. (talk) 03:30, 12 February 2023 (UTC)
- Thanks for reminding me about Genius, I've been using it to cite "urban slang" like pept and crodie. Ioaxxere (talk) 03:37, 12 February 2023 (UTC)
- @CitationsFreak I've added Issuu and Genius. If you like, make sure the info is accurate. Ioaxxere (talk) 04:50, 12 February 2023 (UTC)
- I’m not sure we should list Reddit and Twitter. The last time there was a vote on this there was no consensus that these sites should be used for quotes. Better to focus on non-controversial resources, I think. — Sgconlaw (talk) 06:13, 12 February 2023 (UTC)
- I think we can cite Twitter now. I've seen cases in RFV where a term was passed with three Twitter cites. Three citations, for all senses. (talk) 20:25, 12 February 2023 (UTC)
- Only when it is specifically agreed on a case-by-case basis, hence the recent "CFI votes" etc. —Al-Muqanna المقنع (talk) 21:44, 12 February 2023 (UTC)
- I think Genius could only technically be considered durably archived if the lyrics have appeared in print, or if the song was released as a single or in an album using physical media (vinyl record, CD, etc.). There are a lot of songs on Genius.com that seem to only exist on YouTube and the like. These can still be cited, but only under the caveats for online media. Please correct me if my understanding is wrong. 70.172.194.25 23:24, 16 February 2023 (UTC)
- That's what I was thinking. I would say that a song that only sold 1 copy during the Great Depression would be considered more durably archived than a song with a million views that only exists on YouTube, all other things being equal. Three citations, for all senses. (talk) 23:49, 16 February 2023 (UTC)
Link for absolutive in template inflection_of
Could someone please correct the reference given for 'absolutive' in {{inflection of}}? It currently misdirects, for Pali, to w:absolutive, which is an article on the absolutive case, which is singularly inappropriate for verbs. This may also be an issue for Sanskrit. absolutive#Noun would be a good reference. --RichardW57m (talk) 14:30, 13 February 2023 (UTC)
- @RichardW57m There is no current support for making a given tag display in different ways for different languages. However, we've already run into the issue you describe and the way it's currently handled is by making the inflection tag be written differently but display the same, e.g. we have an inflection tag 'terminative case' and another 'terminative aspect' and they both display as 'terminative'. We have an 'absolute' tag that links to Appendix:Glossary#absolute; if this isn't the same as absolutive#Noun then I can create another tag 'absolutive participle' or something, with appropriate display and abbreviation ('absp'?). Benwing2 (talk) 18:59, 13 February 2023 (UTC)
- That's the advantage of using an explanation on Wiktionary - the entry for the noun 'absolutive' already covers both meanings. The tag 'absolute' doesn't link to Appendix:Glossary#absolute - and there's no such fragment! I found I'd been using the tag 'abs', without realising that it didn't lead to anything very obvious. (These have already been converted to 'absolutive'.) I don't like calling it a participle - etymologically it appears to be the instrumental case form of a verbal noun, and it undergoes no further inflection. For the absolutive, I'd prefer a display form of 'absolutive', and for the inflection tags I'd prefer 'absvf', with long form, if needed, of 'absolutive verb form'. --RichardW57m (talk) 13:49, 14 February 2023 (UTC)
- If you're planning to do the conversions yourself, note that the inflection tag 'absolutive' is used by both {{inflection of}} and its non-Roman Pali front end, {{pi-nr-inflection of}}. It's probably simpler to let me know when the new tag is available, and I can do the change myself - there aren't many entries for the Pali absolutives. --RichardW57m (talk) 13:49, 14 February 2023 (UTC)
- Could you please set up, for maintenance purposes, maintenance categories (perhaps just one) to catch the use of the inflection tags 'abs' and 'absolutive' for Pali inflections so as to catch their inappropriate use. --RichardW57m (talk) 13:49, 14 February 2023 (UTC)
- At the second attempt, I think I've found the relevant code at Module:form of/data2. I'll see if I can implement this tonight as @RichardW57. Of course, another solution would have been to change the label to 'gerund', though a Slavic and an Indic 'gerund' are quite different adverbs. --RichardW57m (talk) 15:27, 20 February 2023 (UTC)
- The new inflection tag (sane input: 'absvf') has now been created. I haven't yet tried setting up the maintenance categories, which I think are needed as the use of the wrong tag is not blindingly obvious and in practice reading documentation is usually a last resort. I've fixed the 23 absolutive terms to use 'absvf'. --RichardW57 (talk) 08:44, 21 February 2023 (UTC)
- Test words: disvā and ປຫາຍ (pahāya). RichardW57m (talk) 15:52, 14 February 2023 (UTC)
Universal Code of Conduct revised enforcement guidelines vote results
The recent community-wide vote on the Universal Code of Conduct revised Enforcement Guidelines has been tallied and scrutinized. Thank you to everyone who participated.
After 3097 voters from 146 Wikimedia communities voted, the results are 76% in support of the Enforcement Guidelines, and 24% in opposition. Statistics for the vote are available. A more detailed summary of comments submitted during the vote will be published soon.
From here, the results and comments collected during this vote will be submitted to the Board of Trustees for their review. The current expectation is that the Board of Trustees review process will complete in March 2023. We will update you when their review process is completed.
On behalf of the UCoC Project Team,
Mervat (WMF) (talk) 21:21, 14 February 2023 (UTC)
- Discussion moved to WT:RFDO.
Simplification
There is a proposal on meta that would substantially simplify page structure and reduce risk for mistakes. Taylor 49 (talk) 18:07, 15 February 2023 (UTC)
- That would be a very useful feature, thank you for raising that issue. JeffDoozan (talk) 18:32, 15 February 2023 (UTC)
- FYI there is another proposal seeking to solve the same problem (and also maybe others) in a different way. — excarnateSojourner (talk · contrib) 19:02, 15 February 2023 (UTC)
en.wikt's options for lemmatization approach for prevocalic forms of prefixes
Seeking people's preferences on the following options, regarding lemmatization approach for prevocalic alternative forms of prefixes:
Right now en.wikt largely follows a pattern whereby the following pages are separate, and the cats have hyperlinked cross-references to each other:
I have been proceeding according to the approach above. Recently I saw where another editor deleted one of the prevocalic categories and changed the etymology sections' {{confix}} or {{affix}} parameters (for example, for |en|rhiz-|-ome to become |en|rhizo-<alt:rhiz->|-ome), most likely because they feel that all derived forms from what is (in lemmatic essence) the selfsame prefix should fall into a unified category. That desire is laudable; the only question is which approach en.wikt would convene upon as its standard, if any consensus exists. To me it seems appropriate that if en.wikt has separate entries for the alt forms at all (many dictionaries do not; some do things such as headword "rhiz(o)-"), then it makes sense to retain separate categories for each and cross-reference them to each other. The advantage is that it is transparently clear at a glance which derived terms come from which alt form, which has a certain small philologic value. But if a consensus develops to retain separate entries but have the etymology sections link to a lemma form for categorization, I will follow suit and will make changes in that direction in future.
Thanks for any thoughts or tips. Regards, Quercus solaris (talk) 04:09, 16 February 2023 (UTC)
- For Sanskrit, compounds are given in terms of morphemes, and explaining the assimilation is generally (perhaps always) left as an exercise for the reader. For example, no explanation is given of why the suffix -अन (-ana) often surfaces as -अण (-aṇa), though the explanation may be given when the latter variant, which at least gets a mention under the lemma form, gets an entry. (In this case the rule is so pervasive that one can't decline Sanskrit nouns without knowing it.) For Pali, treatment mostly follows the same pattern, though we seem mostly to be allowing a decent collection of Pali terms to build up before documenting them. (I don't entirely trust the text books - most of them are aimed at teaching the understanding of Pali.) --RichardW57m (talk) 11:54, 16 February 2023 (UTC)
- For these English prefixes, I would select the longer form as the lemma and treat the forms with elision as variants, and call out any examples where expected elision fails as exceptions in the etymology of the examples. --RichardW57m (talk) 11:54, 16 February 2023 (UTC)
- Sounds good, thanks. In the case of en.wikt's handling of EN's use of ISV prefixes, it is good that it is already the case that the longer form (i.e., the -o- form) is the main entry, and the prevocalic form points to it via {{alternative form of}}. The remaining question is how to handle the etymology of each derived term (for example, for |en|rhiz-|-ome versus |en|rhizo-|alt1=rhiz-|-ome) and thus how the autocats will be handled. TBD whether any consensus will materialize here/now (in this thread). If not, then perhaps for now this aspect will simply remain unstandardized (notwithstanding an inconsistency that is fairly venial anyway). In the meantime I will aim at least to finish ensuring that each sibling pair of autocats consistently cross-references (between each other). Someday, as I could envision, someone might impose a consistent method on the categorization aspect (whether me or someone else; the biggest theme regarding "who" is "whoever would bother to implement it, either way, scut-wise"); the rationale at that point (for which method to impose) would be "no one else had a strong preference, so flip a coin, then stick with that result afterward." Quercus solaris (talk) 18:52, 16 February 2023 (UTC)
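For reference, the two markups being compared can be sketched as follows. The parameter strings are the ones quoted in this thread. The choice of {{affix}} rather than {{confix}}, and the comments about which category results, are assumptions drawn from the discussion and have not been checked against the live templates.

```wikitext
<!-- Approach 1: cite the prevocalic form directly, so the derived term
     is categorized under rhiz- (as described in this thread). -->
{{affix|en|rhiz-|-ome}}

<!-- Approach 2: link and categorize under the lemma rhizo- while
     displaying rhiz-. Two spellings of this were quoted above; which
     one the template actually accepts is not verified here. -->
{{affix|en|rhizo-<alt:rhiz->|-ome}}
{{affix|en|rhizo-|alt1=rhiz-|-ome}}
```

Under approach 1 the sibling categories need cross-references to each other; under approach 2 the derived terms would collect in a single category for the lemma form.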
Whether IPs can participate in CFI discussions
Bringing this up since I'm not sure about the best solution myself.
This is the relevant part of WT:CFI:
- "Other online-only sources may also contribute towards attestation requirements if editors come to a consensus through a discussion lasting at least two weeks."
The issue is what we count as an "editor". Under Wiktionary:Voting policy#Voting eligibility, an account is a requirement for voting. Below that, we have:
- "Where the consensus of editors is required for discussions other than formal votes at Wiktionary:Votes (for example, in discussion rooms such as Wiktionary:Beer parlour and on discussion pages such as Wiktionary:Requests for deletion and Wiktionary:Requests for verification), the support of at least two-thirds of the editors taking a supporting or opposing stance in a discussion on an issue is a hint for the threshold for consensus, but it is not set in stone. As a result, the consensus determination is somewhat indeterminate and can take into account considerations other than pure tallying. Tallying does play a role."
Since using the literal definition of "editor" meaning "anyone who has ever edited, including vandals" would be awful, the assumption would be to equate "editor" and "user who can vote".
However, a proposal to forbid IPs from participating in RFDs, which are similar to CFI votes, failed 6-8.
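As a worked check against the two-thirds guideline quoted above, assuming "6-8" means six supporting and eight opposing stances (the thread does not spell out which number is which):

```latex
\[
\frac{6}{6+8} = \frac{3}{7} \approx 0.43 \;<\; \frac{2}{3} \approx 0.67
\]
```

Under that reading, at least 10 of the 14 stances would have had to be supportive to reach the two-thirds hint, and the quoted policy treats even that figure as a guide, not a fixed rule.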
Arguments in favour
- IP editors who establish themselves in the community are essentially equivalent to regular users, and often have a better knowledge of policy.
Arguments against
- Many [6] CFI votes take place in WT:VOTES, which is inaccessible to IP editors. It would be easy to game the system by forum shopping (choosing where to hold the votes to try to get a better result).
- It would be easy to hijack RFV discussions with a swarm of editors, e.g. by linking a vote to 4chan.
- come on why is it always 4chan you people get on.. actually 4chan has a policy against brigading, they'll delete your links, reddit doesn't, and discord doesn't enforce any rules + is entirely deep-net Fishing Publication (talk) 14:31, 28 February 2023 (UTC)
Possible solutions
- Allow any IP to participate
- Allow some IPs to participate, maybe with edit requirements
- Allow only users to participate
- Hold all CFI votes in WT:VOTES in the future (equivalent to above)
Ioaxxere (talk) 21:48, 16 February 2023 (UTC)
- Regarding the argument in favour, how can it be ascertained whether a particular IP address represents a single individual? — Sgconlaw (talk) 22:33, 16 February 2023 (UTC)
- The approach of sending these terms to WT:VOTES was an initiative of one user and was mostly disliked by the community. At least five of the abstentions on Wiktionary:Votes/2022-05/elfism validation pointed out that (in their view) it was a misuse of the formal vote process, and you can find similar comments on Wiktionary:Votes/2022-05/melanoheliophobia validation, etc. I don't think that would be a widely accepted solution. People wanted these discussions to be held on the RfV page itself. I'm not sure why we need a specific policy on this. There's no specific policy saying unregistered users can't comment on RfDs or RfVs in general. Why would CFI discussions on whether to accept online sources be special?
- It might be the final motivation I needed to create an account if the community decides I can't fully participate in RfV... or maybe I'll be content to watch from the sidelines and work on other things. IDK. 70.172.194.25 23:05, 16 February 2023 (UTC)
- I agree with the IP here. I don't see the point of holding formal votes for CFI attestation discussions; it seems overkill. Furthermore this seems like a solution in search of a problem. We have one well-known IP editor who seems to have a static IP address and contributes to RFD/RFV discussions, and I haven't seen very many (if any) other IP's contributing like this. If and when we get unknown IP editors contributing to RF* discussions en masse, we can revisit the issue (or just ignore the unknown IP's). Benwing2 (talk) 06:04, 18 February 2023 (UTC)
Proto-Romance pronunciations in attested words
@Kwékwlos has been systematically adding Proto-Romance pronunciations to attested Late Latin words such as ceresia, cisorium, and rasorium. Similarly, Proto-Italo-Western pronunciations for portaticum, campania and Proto-Gallo-Romance for missaticum.
I can't say that I find the idea fundamentally incorrect. These pronunciations do seem more plausible for the period(s) in question—as far as popular speech is concerned, that is—than one based on Cicero's contemporaries, or one based on modern Italian clergy. I worry, however, that this may be a bit too 'adventurous' for the purposes of Wiktionary. A step too far, as it were. Thoughts?
Pinging @Ser be etre shi, Al-Muqanna, Catonif, Hazarasp, Ultimateria, Fay Freak as potentially interested parties.
Nicodene (talk) 02:05, 17 February 2023 (UTC)
- If the issue is that these are reconstructions: Our Classical Latin pronunciations could be called reconstructions too, even if we can be pretty confident in them based on various forms of evidence, so I'm not sure that's enough of a reason to exclude these (not to mention that we also include reconstructed IPA for Old Chinese, Ancient Egyptian, and so forth in mainspace).
- If the issue is that the reconstructions for Proto-Romance and the other branches are highly uncertain or contentious, I could see that as a potential problem.
- If the issue is rather that Proto-Gallo-Romance */meˈsad͡ʒo/ shouldn't be treated as belonging to the same chronolect as Latin missaticum, then I can also see that as a potential problem. However, it seems like we do treat Proto-Gallo-Romance as a form of Latin in the reconstruction namespace, and even include IPA when doing so: e.g. Reconstruction:Latin/leviarium. If it's appropriate there, why would it not be appropriate in mainspace? Maybe it shouldn't be the only IPA we provide, but I wouldn't personally have an issue with including it alongside the Classical or Ecclesiastical pronunciations (as relevant). That said, I could also see a case for splitting Proto-Gallo-Romance off from Latin, like how Proto-West Germanic is split off from Proto-Germanic, but that's apparently not what the Romance editing community has chosen to do so far.
- I don't know much about Latin or the Romance languages so these are mostly just intuitive off-the-cuff responses, feel free to ignore. 70.172.194.25 03:07, 17 February 2023 (UTC)
- The 'proto-pronunciations' are based mainly on the comparative method as applied to Romance (albeit with nebulous support from contemporary spelling mistakes), whereas the reconstructed Classical pronunciation is based mainly on evidence from the relevant period. Hence the former are 'at home' in entries reconstructed from Romance data, whereas the latter is so in attested entries, at least for time periods where a Classical pronunciation makes sense. I suppose that is the main difference in the end.
- I have thought about splitting up the proto-languages, actually, but thought that would be rather contentious. It isn't always clear which proto-language a given reconstruction should belong to, and there are a fair number of scholars who reject concepts such as 'Proto-Gallo-Romance' or 'Proto-Italo-Western Romance' (which entails also rejecting the branch model for Romance). How does all this go over in the Germanic community, I wonder. Nicodene (talk) 05:01, 17 February 2023 (UTC)
- I generally support adding such reconstructed pronunciations where appropriate—it makes no sense, IMO, to only add them when a term isn't attested, i.e. when it's in the Reconstructions space. My ideal preference would actually be for {{la-IPA}} to generate a full suite of pronunciations over time similar to the pronunciation module for Ancient Greek. —Al-Muqanna المقنع (talk) 20:27, 17 February 2023 (UTC)
- In my opinion, the main reason it’s helpful to display a reconstructed Classical Latin pronunciation and an Ecclesiastical Latin pronunciation on Latin entries is because both are common pronunciation styles used by present-day learners and speakers of Latin. That motivation doesn't apply to Proto-Romance, Proto-Italo-Western Romance, Proto-Gallo-Romance, Proto-Iberian-Romance etc. reconstructed pronunciations: it would be very fringe for any contemporary user of Latin to pronounce Latin words in that manner. As the list above indicates, there are many steps along the way from Latin to modern Romance. The best way for interested parties to get a complete and accurate picture of how their pronunciation evolved is to use a source that comprehensively describes the relevant sound changes. I think the normalized Latin spelling provides all of the important lexicographical information about the form of words of this type that had a phonetically regular development; the detailed history of their pronunciation is a matter better covered by other types of resources than dictionary entries. (Furthermore, there are disagreements among scholars about details.) For comparison, we don’t include additional pronunciations on Old English entries showing stages between Old English and Middle English, or on Sanskrit entries showing stages between it and its descendants.
- At the same time, I do think that it can be misleadingly anachronistic to include the reconstructed Classical Latin pronunciation on pages for medieval Latin words or Late Latin words that may never have been pronounced that way. To me, it seems fine to just omit pronunciation information for words of this type. But I’m not strongly opposed to the practice of including reconstructions like */meˈsad͡ʒo/ in place of the reconstructed Classical Latin pronunciation in such cases, with an appropriate label describing the language stage that the reconstruction is supposed to represent (as in the examples).
- I’m pretty strongly opposed to adding later pronunciations like this to the la-IPA template based on how it worked out in practice when we had a “Vulgar Latin” pronunciation in that module: it was included inappropriately on miscellaneous pages (at one point, it was inexplicably added to the article Status Uniti Americae) and the actually implemented pronunciation had several bugs that could not be fixed because there were no clear criteria in the first place for what it was meant to represent. Templates do have advantages, in particular consistency, so I would say having a separate template specifically for Late Latin or Proto-Romance pronunciation might be helpful; my main question would be if we’re dealing with a small enough number of nodes on the Romance tree to make a template of that kind feasible.--Urszag (talk) 02:34, 18 February 2023 (UTC)
- I tend to agree with User:Urszag here. I don't really see the point of most reconstructed pronunciations, e.g. I'm generally opposed to Proto-Germanic pronunciations since (a) the spelling indicates pretty clearly the pronunciation, (b) any type of narrow pronunciation will not represent a scholarly consensus. Benwing2 (talk) 06:07, 18 February 2023 (UTC)
- If we had a phonemic spelling for Reconstruction:Latin, as we do for Proto-Germanic, there wouldn't really be a reason to provide phonemic transcriptions. We're stuck, however, with Classical spellings (as a matter of policy), even for terms that might as well have originated in a community of Balkan shepherds in the eleventh century.
- I don't recall seeing a narrow reconstructed pronunciation. Except our Classical ones. Nicodene (talk) 06:42, 18 February 2023 (UTC)
- Inappropriate inclusion is always going to be an issue regardless. I would consider the Classical pronunciation currently listed at Status Uniti Americae to be equally inexplicable, and the "Ecclesiastical Latin" pronunciations currently given are really a 19th-century (Italianate) style that's not universal in the present-day Catholic Church, let alone trans-historical or generally appropriate for post-Classical Latin as the label by itself might imply. If the main purpose of the pronunciation template is the utility of modern speakers of Latin then that probably needs to be indicated somewhere, since it conflicts with what I'd take as the more intuitive understanding that the Classical pronunciation simply describes how the word was pronounced in the Classical era. A formally reconstructed Classical pronunciation will not necessarily be the same as a conventional "classicising" pronunciation by a modern speaker. —Al-Muqanna المقنع (talk) 09:30, 18 February 2023 (UTC)
- Are you suggesting that the ‘Ecclesiastical’ label should be removed entirely and replaced by ‘Italianate’? I’d say that it’s definitely standard to pronounce words like caeli in an Italian fashion in Anglo-Catholic churches at least (though ‘Italianate’ would do as a label instead, I suppose). Overlordnat1 (talk) 10:25, 18 February 2023 (UTC)
- Yes but it is not standard in e.g. Latin services in Germany. Its current use in England is a product of 19th-century ultramontanism which subsequently also bled over to Anglo-Catholicism. (There is some info on Wikipedia about the traditional English pronunciation preceding it if you're curious.) Italianate is the correct name for the pronunciation we give, “Ecclesiastical Latin” is far too broad and implies it is much older and more generally adopted than it is. So it should ideally be changed. —Al-Muqanna المقنع (talk) 10:28, 18 February 2023 (UTC)
- This is very acute of us. Our pronunciation information about “Ecclesiastical Latin” is a virtual concept essentially created by the Anglo-centrism of Wikipedia which does not correspond to our use of “Ecclesiastical Latin”, e.g. as that Medieval Spanish one I thought of when deriving Spanish mantel. I always found it odd because in Germany, or Poland or whatever, this pronunciation has been used in no context, and I have only ever heard it in the Modern Latin scene attached to an Anglo-centric social media sphere, which of course often travelled to Italy and connected with its churchpeople (as in Germany, this pronunciation unused in church would be marked as Italian!), so this turns out the reasoning pertinent to this distribution. Fay Freak (talk) 14:05, 18 February 2023 (UTC)
- After considering the above, I think I won't actively oppose others' including 'proto-pronunciations' in the mainspace, so long as they are properly done, chronologically plausible, and marked with {{a}} rather than {{lb}} (to avoid putting attested terms in reconstruction categories).
- Urszag brings up a fair point about anachrony. It should be clarified that what Wiktionary simply labels as 'Ecclesiastical' is in fact 'modern Roman' or similar.
- For words that are specifically Late or Medieval Latin, my inclination would be to show no pronunciation. Informed readers wanting to read a given word out loud in some modern pronunciation will have no difficulty using the spelling with macrons to do so (all modern pronunciations of Latin are, fundamentally, based on this type of reading). On the other hand, an uninformed reader, seeing a Classical pronunciation on an entry for, say, a term that originated in tenth-century England, would be easily misled to believe that that is how the term was actually pronounced at the time.
- Agreed that it would be desirable to have something equivalent to our system for Greek, showing for instance 'fifth-century Roman' and other (properly researched) pronunciations. Preferably phonemic, as, incidentally, our Classical pronunciations also should be. Nicodene (talk) 18:51, 18 February 2023 (UTC)
Zhomron sockpuppetry
Greetings, is this the correct place to bring behavioral, user-related problems for administrator attention?
- Zhomron (talk • contribs • global account info • deleted contribs • nuke • edit filter log • page moves • block • block log • active blocks) is a globally-locked LTA account which has been correlated back to BedrockPerson (talk • contribs • global account info • deleted contribs • nuke • edit filter log • page moves • block • block log • active blocks). Locally, @Chuck Entz has blocked these sockpuppets when they pop up. I am concerned this month that BP has created another account to use in abusing Wiktionary and other projects:
- Itobh (talk • contribs • global account info • deleted contribs • nuke • edit filter log • page moves • block • block log • active blocks). Their first three edits were done to undo mine, where I had implemented a denial-of-recognition strategy on BP's incompetent contributions. I have no other explanation as to why a brand-new user would choose to involve himself in these very obscure linguistic corners, right off the bat.
If any admin here has CheckUser ability, or would simply like to investigate this on behavioral evidence, I'd appreciate it if you had a look soon. Thanks, and kind regards. Elizium23 (talk) 18:53, 17 February 2023 (UTC)
- This seems to be resolved thanks to Chuck Entz. 70.172.194.25 02:54, 19 February 2023 (UTC)
Last June in Wiktionary:Beer parlour/2022/June#The State of WT:RFDN it was proposed to split out Romance and/or Romance+Latin (=Italic) and/or Romance+Latin+Greek from WT:RFVN. It was also proposed recently to split out reconstructed terms. I only see 19 reconstructed terms currently in WT:RFVN and many more Romance terms. What do people think? My instinct is to leave Latin and Greek out of Romance, but maybe someone disagrees. Benwing2 (talk) 06:49, 18 February 2023 (UTC)
- I'm skeptical. There's a large body of Latinate vocabulary in all of the languages of Europe- not just the Romance languages. There's also taxonomic nomenclature, which is explicitly (for plants and animals, anyway) based on Latin, and musical terminology based on Italian. Then there are the pidgins, creoles and mixed languages- some would even include Middle English as one of those. Anglo-Norman is more clearly fuzzy in that respect, though. Yes, there was Vietnamese to complicate things for the CJK(V) split, but the dividing lines for Romance are a lot blurrier. As for excluding Latin: we can't even decide what to treat as Latin vs. Proto-Romance half of the time. Chuck Entz (talk) 07:53, 18 February 2023 (UTC)
- These seem like a lot of straw-man arguments. The definition of Romance is pretty clear. Proto-Romance is a special case that should go wherever Latin goes. Latinate vocabulary in non-Romance languages has no bearing on Romance language RFV's; the idea of splitting out Romance is that we have a lot of Romance-language RFV's and people who know one Romance language often know another and can assist. As for creoles etc., it's not like we have very many RFV's involving creoles or pidgins, and if we do, and it's not clear where to put them, it doesn't matter so much as long as it goes somewhere reasonable. Benwing2 (talk) 08:09, 18 February 2023 (UTC)
- Can we still split off reconstructions... please Vininn126 (talk) 09:04, 18 February 2023 (UTC)