Wiktionary:Beer parlour/2022/January

Headers ==Chalita Conjugation== and ==Sadhu Conjugation== in Bengali

(Notifying AryamanA, Kutchkutch, Bhagadatta, Inqilābī, Msasag, Svartava2): Sorry to be pinging the Prakrit editors but there is no wgping entry for Bengali. Several Bengali pages have ==Chalita Conjugation== and ==Sadhu Conjugation== headers. Some example pages: অনুবাদ করা (onubad kora), উল্কি আঁকা (ulki ãka), চালানো (calanō), করা (kora). I don't know what the difference is but these headers are nonstandard. I propose instead putting both conjugation variants under a single ==Conjugation== header and either preceding them with a header beginning with a semicolon (which generates boldface text), or (probably better) just putting the words "Chalita" and "Sadhu" in the respective conjugation table headers. Thoughts? If people agree, I can make this change by bot. Benwing2 (talk) 01:58, 1 January 2022 (UTC)[reply]

@Benwing2 Sadhu is a standardised version of Middle Bengali (14th-early 19th c). I think it's better if we add Sadhu forms in Middle Bengali rather than the modern one. Sadhu Bengali used to be the official literary form of Bengali during colonization but it wasn't a spoken language, which was the Cholito and other forms. Msasag (talk) 10:42, 1 January 2022 (UTC)[reply]
@Msasag: We already treat Sādhu Bhāṣā as Modern Bengali. You are right that this register is derived from Middle Bengali, but it was employed right during the Modern Bengali period as a standard, literary register. There were even instances of code-switching, with the narration being written in Sādhu Bhāṣā while the dialogues were in the contemporary colloquial language, as seen in 19th to early 20th century Bengali literature. Some 19th century works were even written in a mixture of the classical and the colloquial form! We already have a precedent for treating a learned, archaizing register as part of the modern language itself: cf. Category:Katharevousa, which is the Greek equivalent of Sādhu Bhāṣā. Hope you understand. ·~ dictátor·mundꟾ 18:13, 1 January 2022 (UTC)[reply]
@Inqilābī So Sadhu is an archaized form of Modern Bengali (which followed or was inspired by Middle Bengali forms) rather than being a standardized form of Middle Bengali? Msasag (talk) 01:48, 1 January 2022 (UTC)[reply]
@Benwing2: Yea, we should get rid of those nonstandard headers. And it’s enough to specify only Sādhu Bhāṣā in the conjugation table header, because the other one is the same as the contemporary Modern Standard Bengali. ·~ dictátor·mundꟾ 18:13, 1 January 2022 (UTC)[reply]
I agree that it is non-standard and support the proposed changes if the editors involved in Bengali presently viz. Inqilabi and Msasag both agree. -- 𝓑𝓱𝓪𝓰𝓪𝓭𝓪𝓽𝓽𝓪(𝓽𝓪𝓵𝓴) 09:28, 2 January 2022 (UTC)[reply]

Codifying certain rules for RFD

I think it is time for us to set in stone certain rules for RFD; it is one of the most important processes of Wiktionary, deciding the fate of various entries, pages and senses. For now, I'm thinking of a few policies that may be helpful to codify:

  1. Banning IPs from voting.
    This was a matter of discussion at Talk:opinions are like assholes. Also, given our current lack of policy on this matter, any IP, even one without constructive edits, may vote, which isn't always desirable.
  2. Defining the required consensus: 3/5 or 2/3 (per whatever the consensus is).
    Earlier discussions on this include Wiktionary:Beer parlour/2021/September#Policy on deletion consensus, Talk:Real Academia Española, Reconstruction talk:Proto-Indo-European/kr̥snós. This might be the most important requirement, since currently it's totally up to the closer of the RFD to decide how much consensus is required, and we must define the supermajority needed.
  3. Time period for closing an RFD as no-consensus or no-objection: 1 or 2 or 3 months (per whatever the consensus is).
    There have been previous instances of deleting by no-objection and keeping by no-consensus, but like the others, this is also a grey area. "No-consensus" would mean that the required consensus (which would be clearly defined, see the second point) hasn't been reached in a certain time period and the status quo would be maintained; "no-objection" would mean that apart from the nomination, there hasn't been any keep/delete vote. The time period here, in my opinion, should be more than when there have been votes and consensus in either direction — and I support 2 or 3 months — but this needs to be set in stone too, whatever the agreed upon time period.

This would be a major change, so it would most likely require a policy vote to be implemented and enforced. All the proposed points would be analogously applicable to undeletion requests as well. I would like to call some other users to analyse this proposal and share their thoughts (Metaknowledge, Surjection, Chuck Entz, PUC, Lambiam, Fytcha, Sgconlaw, DAVilla, Donnanz, Inqilābī, Imetsia, Equinox, PseudoSkull, Lingo Bingo Dingo, BD2412, SemperBlotto, Fay Freak, Vininn126), but anyone should feel free to comment. —Svārtava [tcur] 17:38, 1 January 2022 (UTC)[reply]

In most cases I am not a fan of copying the ways of Wikipedia, but in this case I think we can use some inspiration from how article deletion proposals are handled over there, as described in Wikipedia:Deletion process. The most important thing is that the decision depends on a mysterious phenomenon called consensus, which is not based on a tally of votes, but on reasonable, logical, [Wikipedia] policy-based arguments. If an IP presents a reasonable case here in line with our policies and guidelines, then it should by all means be considered seriously. Conversely, if a veteran editor casts a "vote" Keep here for a blatantly sum-of-parts term without presenting an argument, or merely the argument that the term is common or useful or some other variant of "I like it", this expression of their sentiment in the matter can be duly noted but should in my opinion further be ignored. --Lambiam 18:18, 1 January 2022 (UTC)[reply]
I definitely agree with the first proposed change, as it is impossible to tell in some cases whether an IP is a sock of a logged in user or of other IPs who have already voted. While I think it's unlikely anyone would vote that dishonestly here at RFD, if such a thing were ever suspected it would be a huge waste of time to have to sort through evidence to support or contradict the suspicions. It is also not too hard to vote a second time with an IP such that it wouldn't arouse suspicion at all. These are important reasons that formal votes don't allow IPs to cast votes.
I think each of the 3 items should get their own sections in the BP and/or formal votes, because I have no comment on the other 2 at this time. PseudoSkull (talk) 19:55, 1 January 2022 (UTC)[reply]
I oppose this. You have mistaken discussion pages for votes, but they are not votes — they are discussions. We allow for grey areas because we trust closers (usually admins) to be able to use their best judgement in the interests of the dictionary. You may remember back when Dentonius was flooding RFD with keep votes, but you may not have noticed that I simply discounted his vote when closing those RFDs. That is to say, we already function a bit more like what Lambiam described than you realise, and if we were to codify our procedures, it should be in that direction. I concur with PseudoSkull that your first item regarding IPs has merit, but I challenge you to find even a single time that an IP has ever swayed an RFD discussion. If it came to a vote, I would support that item, but it seems like a solution in search of a problem. —Μετάknowledgediscuss/deeds 20:09, 1 January 2022 (UTC)[reply]
@Metaknowledge: If Lambiam's proposed solution is adopted, we would need better and more clearly defined policies, especially those dealing with SOPs. Similarly, there would be many SOPs that would be deleted going by our current policy but have been kept with community consensus. Also, at a discussion, there was disagreement whether hypallage could save a term from being SOP, so I find it a bit confusing which one would be deemed correct (in other words, would the entry be deleted or kept) without others' votes ⇒ comes back to the same thing. There have been other users apart from Dentonius, particularly I have noticed, SemperBlotto and Donnanz who have voted keep on multiple occasions without giving any rationale or justification for the term not being SOP; but their votes have been counted. To add, per CFI: “a phrase that is arguably unidiomatic may be included by the consensus of the community, based on the determination of editors that inclusion of the term is likely to be useful to readers” so I don't believe we need to fully adopt Wikipedia's policy and abolish the voting process (again, it's good to argue against the given justification of any user and try to convince them with the counter-arguments, but maybe we can retain the voting process). Regarding banning IP voting, I can't find for now any such instances of IPs swaying RFDs, but I think we need to do it since it's extremely lax and can very easily be exploited (an editor could just switch their network and IP and double-vote). Regarding setting the time period for no-consensus/objection, what are your thoughts? —Svārtava [tcur] 04:43, 2 January 2022 (UTC)[reply]
You agree that our system works, and you can produce no examples of IPs ever having swayed an RFD in the entire history of Wiktionary. I think the time period for RFDs is fine as well; we should (and usually do) grant more laxness at RFV, of course. You are still fishing for problems with poorly conceived solutions. —Μετάknowledgediscuss/deeds 06:15, 2 January 2022 (UTC)[reply]
@Metaknowledge: I do not have any problems with the time period for RFDs where there is consensus (in favour of keeping or deleting). However, the time period for closing as no-objection and no-consensus seems unclear. I am proposing that it be fixed and set in stone: be it 1 month, 2 months or 3 months. I think this would require more time, since in "no-consensus" it is okay to wait and see if any consensus is observed in near future and similarly with "no-objection" it may be a better option to wait and let some votes be cast and some thoughts expressed. Some might think that 1 month period is enough in this case also, so if a vote is created regarding this, it would have multiple options. The proposal seeks to codify certain things (which don't have any policy whatsoever), rather than change any existing policy/rule. I'm waiting for a week or so to see where this discussion leads us to and where the consensus is; accordingly, after that, this may be voted upon. —Svārtava [tcur] 06:28, 2 January 2022 (UTC)[reply]
@Svartava2 You have pinged me. I don't normally spend a lot of time participating in deletion votes but I do think the general concept of consensus is right. Obviously there is a lot of judgment involved but I think it's better than simply using a blanket voting rule, as it e.g. allows admins to ignore people who consistently vote keep without any rationale. Benwing2 (talk) 04:59, 2 January 2022 (UTC)[reply]
@Lambiam, Metaknowledge, Benwing2: I urge you 3 to read Imetsia's argument below: “[E]ven if we put this proposal in place, we could keep discounting unmeritorious votes (such as those by Dentonius or other noted inclusionists). This vote would do nothing to change or challenge this practice.” Our RFD system is primarily based on votes (a fact that can't be ignored), so it's helpful to “solidif[y] the consensus standard” if we count votes at all (which I think we do, even if we ignore some particular votes). —Svārtava [tcur] 06:19, 4 January 2022 (UTC)[reply]
@Svartava2 The issue with voting is that in my experience there usually aren't enough participants for the voting to be meaningful. Benwing2 (talk) 06:44, 4 January 2022 (UTC)[reply]
If nothing else, it’s useful to codify the IP ban because it’s a sensible standard that could patch up possible holes in our policies. I’m less concerned about manipulating votes via socks (although this proposal would be helpful in that regard too). My real problem is that anonymous editors are less likely to have a history on Wiktionary, and therefore don’t have the knowledge, experience, or judgment to cast ballots at RFD. Of course there are exceptions (e.g., PUC often votes when not logged in, logged-in editors are sometimes newbies), but this would go a long way in addressing the issue. By the way, the phrase “a solution in search of a problem” has become so routinely used at the BP that it’s lost all meaning.
Solidifying the consensus standard makes sense on the basis of both consistency and precedent. The problem with leaving it up to admin discretion is that it inevitably produces inconsistent results. One admin might count the votes and interpret a consensus to delete, but another would see the same votes and argue there’s no consensus. This makes it so that the existence or deletion of an entry hinges entirely on the arbitrary fact of which user closed the RFD. Secondly, we’ve already pinned down a more precise consensus standard for actual votes, and it makes sense to do so again for this other category of votes. It’s, once again, more consistent.
Note also that, even if we put this proposal in place, we could keep discounting unmeritorious votes (such as those by Dentonius or other noted inclusionists). This vote would do nothing to change or challenge this practice.
Lastly, the proposal to close by no-objection is useful simply because activity at RFD is often stagnant/lacking. Imetsia (talk) 20:29, 2 January 2022 (UTC)[reply]
@Metaknowledge: On IP votes, I would almost agree that it is a solution searching for a problem, except that this is a system that is so easy to exploit as I've pointed out. Take this for example: Each user with an account is easily identifiable as a separate identity because of the list of previous contributions, and in cases of doubt (which I'm pretty sure has existed before even in RFD) it is relatively easy to spot sockpuppeteering and easier to prove. However, in the case of IP accounts, those are rarely kept longer than a day, and some have no contributions at all to speak of, at least for individual IPs not considering ranges. I can definitely see double-voting going completely unnoticed, even by closing administrators, because admins are just people after all, which is why we can't come up with even one example of it (which I admit, is probably going to make this harder to pass as a rule in a formal vote). Disguises are pretty easy to use if they don't stand out as such. Every vote that is counted in an RFD discussion does sway the vote one direction or the other, if the closing admin counts the vote. Of all the years I've been here I didn't even know that IP votes were countable until recently, and it honestly shocked me to find that out. I see it as an easy exploit to a system based on contemporary consensus, which is already inherently far from perfect as it is in determining consensus consistently. PseudoSkull (talk) 10:05, 2 January 2022 (UTC)[reply]
  • I don't personally see a strong need to change a great deal about how RFD works. Yes, some listings do drag on for a very long time, and should ideally be resolved more quickly, but there aren't huge numbers of these. If there was a lot of disruptive or unhelpful activity by unregistered users then I would support curbs on that, but I don't see much of this (unless a lot is deleted very quickly and I never see it). If an unregistered user "votes" a certain way on the basis of a solid argument that is useful for others to see, then that seems fine to me. Mihia (talk) 12:04, 2 January 2022 (UTC)[reply]
@Mihia In your view then, IPs being able to comment or even vote aside, should their votes therefore be counted in the end? PseudoSkull (talk) 15:00, 2 January 2022 (UTC)[reply]
My understanding, as has also been mentioned above, is that a decision at RFD isn't, or shouldn't be, merely a case of totting up votes. It should be based on a judgement of the cases made for retaining or deleting. So, in theory, even if five registered users vote "delete" with no explanation of why, while one unregistered user makes a compelling case why the entry should be kept, then the RFD can be closed as kept, in my understanding. So basically it's the argument that matters, not who makes it. Mihia (talk) 18:03, 2 January 2022 (UTC)[reply]
@PseudoSkull: The matter of IPs vs. accounts isn't quite that simple: as long as you're using the same Internet Service Provider from the same geographical area, your IP address will generally be in the same range, and there are geolocation services that tell what that geographical area probably is. I have created abuse filters that have very effectively kept certain IP editors from making certain types of edits for years at a time. It's also not uncommon for someone with hardwired internet to have the same IP address for years. An account is uniquely identifiable, but has none of the information that's available for IPs- unless you're a checkuser and have cause to run a check. What's more, there's no way for a non-checkuser to tell if two accounts are the same person. You, of all people, should know how easy it is to create a new account. As for evidence of IPs cheating on RFDs: there was a recent case in RFDO where someone voted with their account and with multiple IPs. All the votes were struck and the master account has been blocked permanently. Of course, it was pretty obvious what was going on, and reasonable suspicion of double-voting is clear justification for a checkuser check. Chuck Entz (talk) 17:02, 2 January 2022 (UTC)[reply]
The listings that “drag on for a very long time” are IMO a greater problem than you’re suggesting. There just seem to be so many entries at RFDN that suffer from this. There are essentially two problematic scenarios:
  1. An entry is voted on, a result is clear, but no one closes it for an unreasonably long time.
  2. No one votes on an entry at all, so the entries are stuck at RFD for months and months.
So I think this is a real issue, and the proposals would go a long way in alleviating them. Imetsia (talk) 20:29, 2 January 2022 (UTC)[reply]
Sorry, I should have made it clear that my comments apply only to WT:RFDE. I do not ever participate at WT:RFDN. Mihia (talk) 20:36, 2 January 2022 (UTC)[reply]
I've given it some thought and I'm adding in my two cents, namely on voting criteria. I'm not sure that the problem is whether IPs can vote or not, as that's not something that's come up very often, but rather who can vote at all. The current rule is "the first English wikt vote must have been made 1 week before the given vote, 50 edits in total, and no sockpuppeting". I can't really think of a way to stop sockpuppeting any more than we are now. I also don't believe that votes cast by non-regular editors have really affected the outcomes - however I do wonder if some of the criteria and procedures should be slightly more standardized. Is a week and 50 votes enough for a given IP or account? Would a better rule be "the first edit must predate the CREATION of the vote by one week", rather than the start time? Is 50 edits enough? Are people with only 50 edits really going to be all that aware of votes and how to even cast them and such?
On a similar note, in light of recent admin votes, we should consider implementing a standard that all votes begin and end at 23:59, and instead of "a month, give or take", perhaps set lengths for types of votes. I propose a tiered system - quick votes (i.e. bot votes) can last 15 days, standard votes 30. Vininn126 (talk) 20:30, 3 January 2022 (UTC)[reply]
@Vininn126: You're clearly mistaken there; this is for WT:RFD, not for actual votes (of which the rules are pretty clearly defined and I don't propose to change that). —Svārtava [tcur] 06:22, 4 January 2022 (UTC)[reply]
oop, brainfart Vininn126 (talk) 07:30, 4 January 2022 (UTC)[reply]
I want to point out that, if a vote fails, it doesn't mean that the negation of the vote's content is implemented but rather that the status quo prior to the vote is maintained. Concretely: If the vote on "IPs/Anonymous editors cannot vote" fails, it merely means that closing admins are not forced to disregard IPs' votes, not that all IPs' votes must be respected (as that's not the status quo). Fytcha (talk) 03:33, 5 January 2022 (UTC)[reply]
I think Wonderfool should be banned from voting too Br00pVain (talk) 14:54, 5 January 2022 (UTC)[reply]
@Br00pVain: Why from RFD? I have anyways not seen him vote much here. —Svārtava [tcur] 15:00, 5 January 2022 (UTC)[reply]
For a late record: "We allow for grey areas because we trust closers (usually admins) to be able to use their best judgement in the interests of the dictionary": this has not been our practice and I do not trust closers to use their best judgment. --Dan Polansky (talk) 20:42, 30 August 2022 (UTC)[reply]

User:B2V22BHARAT's Korean entries

(Notifying TAKASUGI Shinji, Atitarev, HappyMidnight, Tibidibi, B2V22BHARAT, Quadmix77, Kaepoong): An IP user has just aired their discontent with the above user's entries (diff), stating that this is a repeating problem. A quick glance at the user's recent contributions reveals that a considerable part has been reverted already. I think manually looking over their entries would be a wise choice. --Fytcha (talk) 03:45, 2 January 2022 (UTC)[reply]

@Fytcha: More checks may be required by native speakers. They haven't been very active lately. User:Tibidibi (or his previous accounts) pointed and corrected some edits in the past. I have RFD'ed one entry - 발음법(發音法) (bareumbeop) as an SoP. --Anatoli T. (обсудить/вклад) 10:11, 2 January 2022 (UTC)[reply]
@Atitarev, @Fytcha, Tibidibi will unfortunately be out until 2023 due to mandatory conscription. I can look into some of the entries (though I'm not a native), and I maybe could ask some folks who aren't as active to chime in as well. Looking into B2V22BHARAT's history and entries, it seems like there's a lot to fix nonetheless, so it'll for sure take time. Also, perhaps they should be removed from the Korean working group for pings? AG202 (talk) 21:38, 2 January 2022 (UTC)[reply]

Administrator intervention requested

I would be grateful if an administrator could please look at Wiktionary:Beer_parlour/2021/November#WT:ATTEST_proposal and either make the suggested change or rule that this has to go to a formal vote. Thank you. Mihia (talk) 18:09, 2 January 2022 (UTC)[reply]

I'm afraid I don't feel competent enough to adjudicate on the further steps to be taken in this case. I do agree however that you should be given guidance on how to proceed seeing that you have garnered community support for your idea. Pinging @Chuck Entz. Fytcha (talk) 17:21, 5 January 2022 (UTC)[reply]

Entry formatting of one-character Chinese entries e.g.

(Notifying Atitarev, Tooironic, Fish bowl, Justinrleung, Mar vin kaiser, RcAlex36, The dog2, Frigoris, 沈澄心, 恨国党非蠢即坏, Michael Ly): I have a script to clean up misindented sections but it currently doesn't work right on single-character Chinese entries like . These have a very strange format with e.g. a ==Definitions== header that promiscuously mixes parts of speech. This one puts ==See also== underneath the Definitions header instead of at the same level as is more normal, but conversely puts Descendants at the same level when it normally would be indented underneath a POS header. It also uses a Compounds header instead of a Derived terms header, and puts that at the same level as Definitions instead of indented underneath it. Questions:

  1. Is this standardized? If so, is there a page documenting the standards?
  2. Does this apply to all Chinese entries or only one-character ones?
  3. Does it apply to any other languages? If so, which ones, and if it applies to a subset of entries (e.g. only one-character entries), what is that subset?

Thanks, Benwing2 (talk) 08:14, 3 January 2022 (UTC)[reply]

@Benwing2: Wiktionary:About_Chinese#Basic_headers_for_single_characters has information.
And {{zh-see}} and {{ja-see}} can be found under ==Chinese==/==Japanese== (), ===Etymology=== if there is no actual etymology written (), or ===Definitions=== ().
Otherwise, ===Definitions=== should not be used elsewhere, especially not Japanese single characters.
Fish bowl (talk) 08:17, 3 January 2022 (UTC)[reply]
@Fish bowl Thanks. There isn't info there though about header indentation or whether and how much this applies to multicharacter entries. Benwing2 (talk) 08:20, 3 January 2022 (UTC)[reply]
I don't know concretely about indentation level, but would support indentation for all; HOWEVER for ====Compounds==== sometimes it is a lazy cop-out where there are multiple Etymologies but no one has sorted the words into each Etymology.
As for the naming of ====Compounds====, IIRC it is used to skirt the question of whether a term is actually a derived term or technically the other way around ().
Fish bowl (talk) 08:29, 3 January 2022 (UTC)[reply]
@Benwing2 It's not a good idea to mess with header levels in Chinese character entries: The Module Is Watching You. See Module:zh-forms after about line 330 for details... Chuck Entz (talk) 09:41, 3 January 2022 (UTC)[reply]

Words used solely by non-native speakers

Moved to WT:TR#Words used solely by non-native speakers --Fytcha (talk) 23:39, 4 January 2022 (UTC)[reply]

New phrasebook rules

Following up Wiktionary:Beer parlour/2021/October#The_phrasebook_is_in_dire_need_of_rules., I've decided to create a formal vote: Wiktionary:Votes/2022-01/New phrasebook regulations.

Suggestions strongly encouraged! Fytcha (talk) 21:25, 3 January 2022 (UTC)[reply]

I would greatly appreciate some more input before the vote begins. Most importantly regarding the two points raised on the talk page. — Fytcha T | L | C 13:02, 15 January 2022 (UTC)[reply]

ordering of languages

@DTLHS I have a script to correct various misformatting issues that I've been running. I recently added support for reordering languages. This brings up an issue: What should the order of non-ASCII characters in language names be? User:NadandoBot seems to sort strictly by Unicode codepoint, possibly ignoring case; hence on A, Võro comes after Votic, Xârâcùù comes after Xhosa, and Yámana comes after Yoruba. On the other hand, Yámana comes before Yoruba on ala and several other pages not recently touched by User:NadandoBot. From looking at various pages, I see that 'Are'are comes before Acehnese on ma, and ǃKung (which despite appearances does not contain an exclamation point but the Unicode codepoint U+01C3) generally comes at the end, e.g. after Zulu on m. Furthermore, Indonesian comes after Indo-Portuguese on a (a change made by User:NadandoBot in [1]; formerly it was the other way around). Rather than sorting strictly by Unicode codepoint, I propose instead to sort by Unicode codepoint but ignore case distinctions and combining diacritics; this would place Võro before Votic, Xârâcùù before Xhosa, and Yámana before Yoruba, but put 'Are'are before Acehnese (apostrophe is not a combining diacritic but a spacing character), Indo-Portuguese before Indonesian (hyphen is likewise not a combining diacritic) and ǃKung after Zulu. Other possibilities are to ignore hyphens and apostrophes (i.e. act as if they aren't present) or even to ignore any character that isn't A through Z, after removing combining diacritics. (The latter would alphabetize ǃKung like Kung, Zo'é like Zoe, and Indo-Portuguese like Indoportuguese.) Benwing2 (talk) 02:28, 5 January 2022 (UTC)[reply]

@Erutuon, Surjection, This, that and the other who might be interested in this topic. Benwing2 (talk) 02:29, 5 January 2022 (UTC)[reply]
I'm on board with a sort order that ignores diacritics. As an extreme example, I noticed recently that Önge entries are placed at the very bottom of pages, which is definitely illogical.
How do the languages such as 'Are'are and ǃKung sort the special letters in the context of their alphabet? I can't seem to find any info on this. But it might be best to sort them in a place that speakers and scholars of that language would expect to find them. This, that and the other (talk) 02:45, 5 January 2022 (UTC)[reply]
@This, that and the other Thanks. Another possible issue has to do with spaces. E.g. on sa, South Slavey comes after Southern Ndebele (effectively ignoring spaces); this was added by User:Thadh on Dec 19, 2021. Benwing2 (talk) 02:51, 5 January 2022 (UTC)[reply]
That's yet another illogicality (and ironically, one where sorting strictly by Unicode codepoint would result in a better outcome). In any sane sort order, South Zzz would come before Southern Aaa. This, that and the other (talk) 03:01, 5 January 2022 (UTC)[reply]
I don't think that this is something that we should bother actual editors about, so I support whatever the person who decides to run a script to order languages thinks is best. DTLHS (talk) 03:13, 5 January 2022 (UTC)[reply]
@Benwing2: Don't we just follow WT:STATS' order? Thadh (talk) 08:41, 5 January 2022 (UTC)[reply]
@Thadh This has the issue of putting Önge and Àhàn at the very end, and Záparo after Zuni, which seems contrary to what most people expect; so I am following what DTLHS said above and using the ordering I described above. Benwing2 (talk) 08:54, 5 January 2022 (UTC)[reply]
Oh right, sorry, I misunderstood the issue. Thadh (talk) 09:02, 5 January 2022 (UTC)[reply]
Whichever ordering is the standard (or rather most common) for English should be used. So if the language names were entries on some dictionary, that order should be used. — SURJECTION / T / C / L / 12:10, 5 January 2022 (UTC)[reply]
Another option that should handle the diacriticked letters would be sorting by the decomposed version of the language names (unicodedata.normalize('NFD', language_name) in Python). That would split letters with diacritics into sequences of the base letter and a combining diacritic wherever possible. I'm not sure what we should do for apostrophes or click letters. I imagine a typical English speaker would just ignore them in sorting. — Eru·tuon 15:27, 5 January 2022 (UTC)[reply]
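For illustration, the ordering proposed at the top of this thread (compare by codepoint, but ignore case and combining diacritics) can be sketched in a few lines of Python using the NFD decomposition mentioned just above. This is only a minimal sketch, not the script actually being run: the function names `sort_key` and `letters_only_key` and the sample list are hypothetical, and the language names are assumed to be spelled with a plain ASCII apostrophe and hyphen.

```python
import unicodedata


def sort_key(language_name: str) -> str:
    """Hypothetical key: ignore case and combining diacritics, nothing else."""
    # NFD splits e.g. "õ" into "o" plus a combining tilde where possible.
    decomposed = unicodedata.normalize("NFD", language_name)
    # Drop combining marks (characters with a nonzero canonical combining class).
    without_marks = "".join(
        ch for ch in decomposed if not unicodedata.combining(ch)
    )
    # casefold() removes case distinctions; spaces, hyphens, apostrophes and
    # the click letter U+01C3 are left in place and compared by codepoint.
    return without_marks.casefold()


def letters_only_key(language_name: str) -> str:
    """Hypothetical stricter variant: additionally drop anything outside a-z."""
    return "".join(ch for ch in sort_key(language_name) if "a" <= ch <= "z")


# Sample names taken from the discussion; "\u01c3Kung" starts with U+01C3.
names = [
    "Votic", "V\u00f5ro", "Xhosa", "X\u00e2r\u00e2c\u00f9\u00f9", "Yoruba",
    "Y\u00e1mana", "Zulu", "\u01c3Kung", "Acehnese", "'Are'are",
    "Indonesian", "Indo-Portuguese", "Southern Ndebele", "South Slavey",
]
ordered = sorted(names, key=sort_key)
print(ordered)
```

Under these assumptions the first key places Võro before Votic, 'Are'are before Acehnese, Indo-Portuguese before Indonesian, South Slavey before Southern Ndebele, and ǃKung after Zulu, while the stricter key files ǃKung under "kung" and Zo'é under "zoe". Neither key is a full collation algorithm; it only mirrors the options described in the thread.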

{{plural of}} redlinks

(Following up the request for speedy deletion of emiratis and envolvers by User:This, that and the other)

What is the community consensus on plural-of stubs without the corresponding lemma? I'm not particularly a fan of them but at least they provide possible redlinks for article creators. On the other hand, some of these plural entries are ancient; if the lemma has not been added in 15 years, it will probably never be added.

Pinging also User:Equinox and User:Apisite as two users I see adding plural-only stubs somewhat regularly. Fytcha (talk) 18:20, 5 January 2022 (UTC)[reply]

I am trying not to speedy those which were intentionally created by humans and where the lemma has not failed RFV or RFD. The two entries you mention were pure bot creations and never had a human eye look over them. All the redlinked plurals I've looked at that were created by people like Equinox seem legitimate, and someone with the time could very well create the singular form. I haven't been speedying those. This, that and the other (talk) 21:56, 5 January 2022 (UTC)[reply]

Half collapsed boxes look like garbage

For example: B.1.617. All of these synonyms / coordinate terms / derived terms sections look like shit when they're piled on top of each other. Especially with the title formatting. Please let's just go back to having collapsed boxes like translation sections have. DTLHS (talk) 01:14, 6 January 2022 (UTC)[reply]

Particularly bad with mixture of red and blue links. The pale blue background makes it look extra busy. The overall look draws attention away from the definition, which may be all that many users want and which is needed by other users to confirm that the other material is relevant to their needs. DCDuring (talk) 16:34, 6 January 2022 (UTC)[reply]
I prefer col4 or top4's to collapsible boxes in derived and related terms sections, but I agree that using it in 'nym sections is distracting. Thadh (talk) 17:07, 6 January 2022 (UTC)[reply]

Unattested translations and {{not used}}

(Following up on diff) What should be added in cases where a potential translation does not meet WT:ATTEST but is still clearly and obviously correct and is also the form that's used by any native? Using {{not used}} is misleading: A term not meeting WT:CFI is different from a term not being used in a language (like "the" in many languages). — Fytcha T | L | C 02:45, 7 January 2022 (UTC)[reply]

@Fytcha: If you mean the cases where a valid translation is an SoP, e.g. the Russian translation of time-consuming is {{t|ru|тре́бующий мно́го вре́мени}}, producing тре́бующий мно́го вре́мени (trébujuščij mnógo vrémeni), I have described this case at Wiktionary:About_Russian#Translations_into_Russian.
A translated term may be vague or ambiguous or narrow, may require additional words to mean exactly the same as the English term. You can use {{qualifier}} for clarifications.
{{not used}} can also be used for abbreviations if the target language doesn't use them. --Anatoli T. (обсудить/вклад) 03:20, 7 January 2022 (UTC)[reply]
@Atitarev: I didn't mean SOP translations, I more meant cases like this. The way I understand it, translating to SOPs is fine in all languages as long as the SOP is the commonly used one (also codified here: Template:t). — Fytcha T | L | C 03:29, 7 January 2022 (UTC)[reply]
@Fytcha: That usage is not expected. I noticed some users link only native words in a multiword translation, the unlinked word being a name. E.g. {{t|fi|Streisand-ilmiö}} in Streisand effect translation into Finnish. Wouldn't "panzerfaust" (lower case?) still be a valid translation into Romanian? You can mark it as {{qualifier|rare}}. --Anatoli T. (обсудить/вклад) 03:57, 7 January 2022 (UTC)[reply]
@Atitarev: Turns out, I was just really bad at searching. It is now added as both a translation as well as entry. Though the point still remains, what should I do if a translation is unattested (per WT:ATTEST) but exists in real language usage? {{no entry}} gives off the wrong signal in my opinion, which it seems you agree on.
Not related to this issue but I'm not sure I agree with how that Finnish translation is handled. If the whole term is attestable, it should be linking to that because it's definitely not a SOP. — Fytcha T | L | C 04:15, 7 January 2022 (UTC)[reply]
@Fytcha Panzerfaust even has an entry in the Romanian Wikipedia: [2]. I think it's very strange to use {{not used}} when translating any non-function word (except maybe an abbreviation). Speakers in a language must be able to refer to a concept in some way, if nothing else by code-switching or using an unadapted borrowing. Rather than {{not used}}, you can add a qualifier explaining what native speakers actually do; e.g. since essentially all Catalan speakers also speak Spanish, they might well use Spanish words to refer to certain concepts when speaking Catalan. (But there are plenty of monolingual Romanian speakers so I can't see this applying to Romanian.) Benwing2 (talk) 05:09, 7 January 2022 (UTC)[reply]
@Benwing2: Yes, that's exactly my point; I agree that using {{not used}} is strange and I'm hereby asking what exactly I am supposed to add. What native speakers would do: Use an easily understood term that fails WT:ATTEST (see Wiktionary:Translations#Sources: "clashing with the fact that words added to translation tables are subject to attestation requirements as well." Emphasis not mine). Hence, I don't think recording the correct term using {{q}} instead of {{t}} is even a true loophole. See also Schläfli symbol, it also has a Romanian Wikipedia entry but that one is actually impossible to attest (surprise me!). — Fytcha T | L | C 05:22, 7 January 2022 (UTC)[reply]
@Fytcha I would say, use {{q}} to provide an explanation (e.g. (found in Wikipedia as simbol Schläfli; otherwise unattested)). {{not used}}, as you added, seems simply wrong. {{not used}} implies an intentional gap, when this is clearly an accidental gap. Benwing2 (talk) 05:41, 7 January 2022 (UTC)[reply]
@Benwing2: What do you think about creating something like {{no attested translation}} in the spirit of {{no equivalent translation}}? — Fytcha T | L | C 04:01, 8 January 2022 (UTC)[reply]
@Fytcha That is fine with me. However, I still think in a case like Schläfli symbol, where we have a living non-LDL language and where there is a translation in a source that does not pass WT:ATTEST, it's worth mentioning in a qualifier. I would not bother doing so for dead languages like Gothic, Old English or Latin, or in an LDL (low-documentation language), because in all these cases the people creating the Wikipedia entries are likely to be non-native speakers. Benwing2 (talk) 02:40, 9 January 2022 (UTC)[reply]

(Following up on User_talk:Fytcha#Template:rfv-t)

I have created this template today and was advised to start a discussion with the wider community. I will share my rationale for creating this template:

Being quite active in translating rather specialized or rare English terms, I have come across a fair share of translations that seemed at least a little iffy to me. Keep in mind that, while translations are not subject to the entirety of WT:CFI (notably, they are exempt from our idiomaticity policy), they are still subject to WT:ATTEST as per WT:TRANS: " [] clashing with the fact that words added to translation tables are subject to attestation requirements as well." (emphasis again not mine). If, for a WDL, I can offhand only find, say, one valid quotation then there needs to be some kind of action that I can undertake to ensure that the term is properly verified in accordance with WT:ATTEST. For lemmas that have an entry, this would be using {{rfv}}. For lemmas that don't have an entry, there was no infrastructure, hence the template. {{t-check}} is unsatisfactory because there is no time limit so it may as well remain there unchecked for another decade and secondly because the term can not be easily listed on WT:RFVN where it belongs. On the other hand, creating an article for it and immediately RFVing it is also an unsatisfactory solution because it is unnecessarily time-consuming and makes RFVing terms (especially in languages one is not familiar with) unnecessarily complicated. — Fytcha T | L | C 01:56, 8 January 2022 (UTC)[reply]

@Fytcha I'm not really sure about this. I simply remove any unattested translation (if red link), and anyone is free to revert me if they can provide the required citations. In the aforementioned scenario, if there is one valid quotation for a WDL, it should be considered unattested, and hence the translation removed. What's wrong? —Svārtava [tcur] 03:34, 8 January 2022 (UTC)[reply]
@Svartava2: I don't trust my attestation skills in languages I don't speak: I don't know what the inflected forms are and I don't know where to look specifically. The fact that I can't attest a word doesn't mean by a long shot that it is unattestable. Therefore I would never remove such a translation, especially not if I was in fact able to find one attestation. Isn't this the whole reason why we do RFVs in the first place instead of directly deleting entries we can't ourselves find attestations for? — Fytcha T | L | C 03:42, 8 January 2022 (UTC)[reply]
@Fytcha Deleting an entry and removing a mention of it are totally different. Editors may remove translations when they're sure and the translation is in a language they know. The criteria for RFV are similar: usually editors send terms in their own language to RFV (some exceptions being if someone tagged a term but did not list it), ideally when they were not able to verify them themselves. There are tons of red links in translations, and tons of entries without quotations, which is okay IMO; how pedantic would it be if editors of other languages started RFVing them? That's why only the editors of a particular language should deal with its translations. —Svārtava [tcur] 04:12, 8 January 2022 (UTC)[reply]
@Svartava2, Fytcha I tend to agree with Fytcha here. I don't think it's a good idea to remove translations in languages that aren't your native language, even if you think the term is unattested; you could easily be wrong and then you may have removed good info. Depending on someone else to revert you is likely to fail because there aren't enough editors out there on Wiktionary to properly police everyone's changes. There are of course exceptions; e.g. if the translation is into a dead language and smells funny, or into an LDL language where you don't trust the competence of the person who added it, or in similar circumstances where you have reason to believe the translation is likely to be wrong. Even in that case, unless you're pretty sure the translation is garbage, I would comment it out rather than remove it outright. Benwing2 (talk) 02:48, 9 January 2022 (UTC)[reply]
“listed on WT:RFVN where it belongs” – I thought there was a principle to not RFV entries that have not even been created. Or you would need a separate list for these, or a separate category for these links would suffice: The current category Requests for verification in langname entries hardly applies since there is no entry.
And this new template created after seeking translations for “rather specialized or rare English terms”, won’t it be abused to hunt for or displace words that are correct translations but fail CFI (or CFI fails for them—like internet slang)?
Also the sentence “words added to translation tables are subject to attestation requirements as well” does not mean that the attestation requirements for these words are the same as for words with entries, 😄, it says they are subject to some (unwritten?) attestation requirements, thus the problem “where a potential translation does not meet WT:ATTEST but is still clearly and obviously correct” is a chimera. Just only include good translations, mkay? If it is a complicated Gewissensfrage then you can write a few lines about what corresponds and what is attested how, as one had to do with some legal terms such as tortious interference and negligence per se, and if you have told how it is then it can’t be wrong. Fay Freak (talk) 08:37, 8 January 2022 (UTC)[reply]
Thanks for bringing the category stuff to my attention. That certainly needs some kind of change; maybe something like Category:Requests for verification of langname translations will do.
I should have cited the entire sentence because the first part actually directly links to WT:ATTEST; as such, it is clear to me that translations are subject to that specific attestation policy, which is also in use for standalone entries. — Fytcha T | L | C 15:21, 8 January 2022 (UTC)[reply]
I personally   Support this. I've seen a ton of translations that have made me scratch my head, but with most of them, since they're not in my primary languages, I've just let them be, so having a RFV template would help a ton to be able to have them verified somewhere. However, I am a bit worried about continuously adding more to the non-English RFV, which is still backlogged. AG202 (talk) 05:06, 10 January 2022 (UTC)[reply]
I think it's a great concept, but rather than listing the terms at the backlogged RFV page, it might be better to just let the entries be categorised into language-by-language categories of questioned translations, which editors in that language can go through and check as required, similar to the {{attention}} categories. This, that and the other (talk) 04:16, 12 January 2022 (UTC)[reply]

Transparent "law of X" entries

(Following up on Wiktionary:Requests_for_deletion/English#law_of_conservation_of_energy)

Do we want such entries? The only worthwhile information in these articles is perhaps the translations. German, for one, is not always predictable regarding which term is used for a mathematically proven statement. In other words, there is no mapping between English {"theorem", "law", "lemma", "corollary"} and German {"Theorem", "Satz", "Hilfssatz", "Gesetz", "Lemma", "Korollar"}. I think a BP discussion might be better suited for such a general class of terms, rather than proposing them one by one as I encounter them. — Fytcha T | L | C 04:34, 9 January 2022 (UTC)[reply]

This seems very appendix-y to me. —Justin (koavf)TCM 04:36, 9 January 2022 (UTC)[reply]

Abuse filter to block "Pronunciation 1"

I am thinking of creating an abuse filter to block any addition of ==Pronunciation 1== headers. It has never been agreed to allow them and in practice entries created with them have all sorts of weird formatting issues, because there's no standard for how to handle them. On top of this, it's really not necessary to have such a header at all. Instead, you split by etymology, and if a given etymology section has multiple lemmas in it with different pronunciations (or a single lemma with multiple pronunciations, or whatever), you list the pronunciations in a Pronunciation subsection at the top of the etymology section, appropriately tagging the pronunciations so it's clear which lemma goes with which pronunciation.

Probably I will make an exception for entries with Chinese characters in their title, since the use of Pronunciation 1 headers for Chinese (and Japanese kanji terms) seems to occur fairly frequently. Benwing2 (talk) 08:06, 9 January 2022 (UTC)[reply]

  Support
I don't like most usage in Chinese, and usage in Japanese should be eliminated AFAICT. —Fish bowl (talk) 08:10, 9 January 2022 (UTC)[reply]
  Strong oppose: See Afar awka: The term obviously has only one etymology, and the only difference between the two senses is the shift in stress to accommodate a different gender. Thadh (talk) 09:27, 9 January 2022 (UTC)[reply]
Perhaps coming up with a more standard formatting would solve this issue. Something like Lang -> Etymology -> pronun1 -> header/definition -> pronun 2 -> header/definition, and so on. Vininn126 (talk) 10:03, 9 January 2022 (UTC)[reply]
@Benwing2: So if I understand you correctly, your solution for entries like German unumgänglich is to merge the pronunciation sections into one while qualifying them with the senses using {{sense}}? — Fytcha T | L | C 11:59, 9 January 2022 (UTC)[reply]
See also îndoi#Etymology_2. If I understand your proposal correctly, it would introduce quite a lot of redundancy into this article. — Fytcha T | L | C 16:26, 9 January 2022 (UTC)[reply]
@Thadh You misunderstand. I am not saying these must be separate etymologies. See diff. (We need some slight changes to {{aa-IPA}} to make this cleaner-looking.) The problem with Pronunciation 1 is there is absolutely no standardization, making editing the pages by bot virtually impossible. Sometimes you find Etymology N under Pronunciation N, sometimes vice-versa, sometimes they are randomly threaded together in a non-nesting fashion. Furthermore, there is no mention in WT:ELE of such headers at all, but many past discussions that they are undesirable. I can dig these up if you need proof. There is actually a tag {{rfc-pron-n}} added by some past bot indicating that such entries need to be cleaned up. Benwing2 (talk) 23:25, 9 January 2022 (UTC)[reply]
@Benwing2: Perhaps a better example was baxa. In any case, what are we going to do, add all the senses to the pronunciation section? And adding the header (which would actually work for Afar) is also less than ideal, because that makes it harder to distinguish between one word pronounced in different ways and two words having the same etymology.
What about deciding on one standard solution for entries with multiple pronunciations instead? I think using pronunciation sections after etymologies whenever there's more than one seems doable. Thadh (talk) 23:40, 9 January 2022 (UTC)[reply]
@Fytcha I cleaned up îndoi appropriately. I don't see a lot of redundancy and in fact the page got 103 bytes smaller. Benwing2 (talk) 23:29, 9 January 2022 (UTC)[reply]
@Benwing2: It's a bit misleading to say that it got smaller when it getting smaller had nothing to do with getting rid of numbered pronunciation sections and especially when bringing back numbered pronunciation made it 18 bytes smaller. As to redundancy, it obviously is there (the argument of {{s}} must necessarily be a redundant copy of the corresponding senses) which is to be disliked by default. In the end it probably just comes down to whether we as a community have a stronger dislike for the redundancy or the more difficult / less reliable parsing. — Fytcha T | L | C 01:14, 10 January 2022 (UTC)[reply]
@Fytcha You are right about the size difference, my apologies. However, I don't see why this claimed "redundancy" is a big issue; we're talking about on average maybe four or five words. And it's not just "less reliable parsing"; it's that (a) these headers are essentially disallowed by policy (WT:EL); (b) there is absolutely no standard for how to do this. If there are multiple Pronunciation N headings and multiple Etymology N headings, does that mean we end up with L6 headings? And which goes under what? Existing entries that use Pronunciation 1 are all over the place. What happens if the pronunciations cross-cut the etymologies, which often happens e.g. in Tagalog? Editors have tried to interleave the headings in such cases, which just doesn't work. If you really don't like putting all pronunciations in an etymology section in one place, I would much rather see a Pronunciation section placed *under* the corresponding POS header, similar to a Conjugation/Declension header, than introduce Pronunciation N headers. This is easy to follow and does not complexify the entry structure, which is already complex enough with single etymologies vs. multiple etymology sections, different possible places for Alternative forms, nested vs. non-nested Derived terms/Related terms/Descendants/..., etc. Benwing2 (talk) 02:29, 10 January 2022 (UTC)[reply]
I should also add, others besides me have been cleaning up Pronunciation N headers; I have seen User:Jberkel do this, for example. Benwing2 (talk) 23:30, 9 January 2022 (UTC)[reply]
I am okay with removing these headers, never a fan of those, though yesterday I added one case, Baumheide (Eastern Westphalian place names have strange stresses). I’d rather link the different senses or etymologies from a joint pronunciation section, although I don’t know now by which template: we have used {{sense}} for this a few times (mostly Arabic pages where an etymology is just a root and the vocalizations differ for noun and verb or the like) but how to additionally link IDs instead? (Of course we won’t reintroduce {{jump}}.) Fay Freak (talk) 00:16, 10 January 2022 (UTC)[reply]
@Fay Freak IMO having "etymologies" that are just roots is lazy. Properly, different vocalizations of noun vs. verb, form I vs. form II, etc. *are* different etymologies; either they have distinct Proto-Semitic (or Proto-West Semitic/Proto-Central Semitic/etc.) forms, which could be listed, or they are post-Proto-Semitic creations, in which case the etymology could/should specify this. I know that this is difficult in practice since Semitic etymology is such a shambles; but theoretically Arabic is no different from any other language in this regard. Benwing2 (talk) 02:33, 10 January 2022 (UTC)[reply]
Surely, but even if they are separate etymologies then it impedes readers if there are separate etymologies only for that reason, without there being anything ever said as specific etymology. (Or as you often did: put a new reference section under each part of speech section with a Steingass template referring to the same location. The use of space has to look efficient—this is for me more the point to forgo numbered pronunciation sections than your bot logics; these won’t prompt us to split up into etymology sections because “actually” there are etymologies, which can’t be actualized–if the text is old you don’t see whether you have stem I or stem II save for the imperative or verbal noun, so really never, even discounting any possible human work.) Fay Freak (talk) 02:48, 10 January 2022 (UTC)[reply]
@Benwing2: I forgot something I had in my mind and mentioned at various times: If the inflection tables had a “switch” to swap Semitistic transcriptions for IPA transcriptions we would rarely seek pronunciation sections in the first place. Of course in the cases that there are audio files they might have to be added to particular forms with form-specific parameters as we use in Russian tables anyway—currently I even have audio files between POS headers and POS templates not to create noisy pronunciation sections or structure the whole page after pronunciations. Example of an Arabic page: حبن. Fay Freak (talk)
  Weak oppose per Thadh. A general comment though: I think we seriously need to review our entry layout. I know there was the vote a while back related to prioritizing definitions that unfortunately failed, but I still feel that the etymology-first approach is tougher for languages where etymologies aren't as clear or close to non-existent. I remember when I first started with Yoruba entries, I actually got suggestions to use the Pronunciation headers similar to how Akar & Arabic use them actually, see: User:Smashhoof/Sandbox/bi#Yoruba, though in the end I went with a different approach, see: ọkan, odo, and bi, the last of which I know has the overarching "Pronunciation" header that some folks aren't fans of. Thus, I understand why Thadh and others would use them, as {{sense}} definitely can be messy and unclear at times. AG202 (talk) 05:00, 10 January 2022 (UTC)[reply]
  Support If there's a need for this it should be discussed instead of making up non-standard headers. – Jberkel 12:57, 13 January 2022 (UTC)[reply]
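
For anyone who wants to see the proposed filter logic spelled out, here is a rough Python sketch. It is not AbuseFilter syntax, and the function name, the header pattern and the Han-character ranges are all assumptions made for illustration; the real filter would need its own conditions.

```python
import re

# Matches a wikitext header of any level whose title is "Pronunciation N".
NUMBERED_PRON_HEADER = re.compile(
    r"^=+\s*Pronunciation\s+\d+\s*=+\s*$", re.MULTILINE
)

# Rough test for Han characters in a title (CJK Unified Ideographs and
# Extension A only; an assumption, not a complete definition).
HAN_CHAR = re.compile(r"[\u3400-\u4dbf\u4e00-\u9fff]")


def should_block(page_title: str, added_text: str) -> bool:
    """Return True if the edit adds a numbered Pronunciation header
    to a page whose title contains no Han characters."""
    if HAN_CHAR.search(page_title):
        return False  # proposed exception for Chinese-character titles
    return NUMBERED_PRON_HEADER.search(added_text) is not None


print(should_block("awka", "===Pronunciation 1===\n* IPA"))
print(should_block("awka", "===Pronunciation===\n* IPA"))
print(should_block("\u884c", "===Pronunciation 1===\n* IPA"))
```

Only the numbered header on a non-Han title is flagged here; an ordinary ===Pronunciation=== header passes, as does any title containing a character in the assumed Han ranges.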

Wiki Loves Folklore is back!

Please help translate to your language

 

You are humbly invited to participate in Wiki Loves Folklore 2022, an international photography contest organized on Wikimedia Commons to document folklore and intangible cultural heritage from different regions, including folk creative activities and many more. It is held every year from the 1st till the 28th of February.

You can help enrich the folklore documentation on Commons from your region by taking photos, recording audio and video, and submitting them to this Commons contest.

You can also organize a local contest in your country and support us in translating the project pages to help us spread the word in your native language.

Feel free to contact us on our project Talk page if you need any assistance.

Kind regards,

Wiki Loves Folklore International Team

--MediaWiki message delivery (talk) 13:14, 9 January 2022 (UTC)[reply]

(Following up on @Svartava2's edit)

The above user has been doing daily {{etyl}}->{{der}} substitutions in Serbo-Croatian lemmas for almost 4 months now. They have been on my radar before actually (diff, diff, many more) but I got bored so I stopped cleaning up after them at some point. Evidence of automated scripting is relatively low I guess (https://imgur.com/u1WBj9P) but I haven't spent a lot of time trying to reverse-engineer their distribution. We would need a Serbo-Croatian editor to clean up more thoroughly; I can only decide for the obvious cases like late and direct borrowings from German.

Please discuss what to do; such indiscriminate mass-substitutions are considered bannably disruptive by precedent: [3] Fytcha T | L | C 15:04, 9 January 2022 (UTC)[reply]

@Fytcha: I suppose we can block them right away, since this substitution is all that they ever do. Another precedent: Donnanz was also banned from etyl-cleanup, see Wiktionary:Beer_parlour/2021/September#User:Donnanz’s_etyl_clean-up_methods. —Svārtava [tcur] 15:36, 9 January 2022 (UTC)[reply]
★ I was never banned, I stopped. As I stated at the time "I am washing my hands of Category:etyl cleanup". Get your facts right, please. DonnanZ (talk) 16:45, 13 January 2022 (UTC)[reply]
@Svartava2: I've made them aware on their talk page. Let's see if it continues. — Fytcha T | L | C 16:10, 9 January 2022 (UTC)[reply]
@Fytcha: this is continuing. I think a block is in order, and also, since they really aren't making any constructive edits, we lose nothing by blocking them. —Svārtava [tcur] 12:22, 10 January 2022 (UTC)[reply]
@Fytcha: Still they're making those edits. Block them for a month to stop it, and as above, “since they really aren't making any constructive edits, we lose nothing by blocking them”. It's an awkward situation because if I don't know the language well and the case is not obvious enough, neither can I revert it boldly nor can I say that it is correct. —Svārtava [tcur] 08:34, 12 January 2022 (UTC)[reply]
@Svartava2: Blocked them for a week; I hope they get the memo this time. The true pity is that they could be really productive if only they replaced {{etyl}} with the right substitute. — Fytcha T | L | C 13:44, 12 January 2022 (UTC)[reply]
I reduced it to a /64 block. All the edits in question are within the same /64 range, so /48 was overkill. Most ISPs (except mobile providers) assign a /64 IPV6 block to each single customer account, so the default should be a /64 block. Chuck Entz (talk) 20:41, 14 January 2022 (UTC)[reply]
See my comment above. The pettiness of this action, targeting some poor user, beggars description. DonnanZ (talk) 16:57, 13 January 2022 (UTC)[reply]
@Donnanz: Why is it petty? — Fytcha T | L | C 01:58, 14 January 2022 (UTC)[reply]
The blocking of that user doesn't actually achieve anything. Since I stopped doing etyl cleanups, the rate of cleanup has dropped to less than a snail's pace, the speed is now glacial. Apart from some insignificant minor languages included in the cleanup but not listed separately, the only languages totally cleaned up in the last few months are German and Japanese. The tally of major languages where cleanup is still outstanding is 23. DonnanZ (talk) 10:32, 14 January 2022 (UTC)[reply]
It achieves keeping them from making disruptive edits. I'm not going to discuss the point whether such edits are disruptive or not. That has been discussed extensively and there's precedent. It is also, frankly, completely obvious. — Fytcha T | L | C 12:53, 14 January 2022 (UTC)[reply]
A world war could begin and end in the time it has taken to clean them up. If Armageddon came... DonnanZ (talk) 14:27, 14 January 2022 (UTC)[reply]
Unfortunately these geezers are obsessed about this topic - they are welcome to check my contributions if they don't do that already, but I very much doubt that they will find anything of interest in the last few months apart from new etymology, which doesn't count. DonnanZ (talk) 21:29, 14 January 2022 (UTC)[reply]
@Chuck Entz: Thank you for clarifying this. I must say though that it strikes me as weird that being blocked on an account, logging out and editing on an IP is perma-bannable whereas being blocked on an IP, logging in and editing on an account is totally fine. — Fytcha T | L | C 22:02, 14 January 2022 (UTC)[reply]
My user a/c has never ever been blocked. Moreover, I have absolutely no idea what IP # I would have if I wanted to edit when logged out. DonnanZ (talk) 23:34, 14 January 2022 (UTC)[reply]
What I said (above) about the glacial speed of etyl cleanups seems to have made an impact on a certain user, as there has been some frenzied activity in Romanian. DonnanZ (talk) 10:37, 18 January 2022 (UTC)[reply]
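
On the /64 versus /48 point raised in this thread, a small sketch with Python's standard `ipaddress` module shows how one might check that a set of edits falls within a single /64 before choosing the block range. The addresses are hypothetical and come from the 2001:db8::/32 documentation range, not from the blocked user.

```python
import ipaddress

# Hypothetical editing addresses (documentation range, not real users).
edit_ips = [
    "2001:db8:abcd:12:1111:2222:3333:4444",
    "2001:db8:abcd:12:aaaa:bbbb:cccc:dddd",
]


def covering_network(address: str, prefix_length: int) -> ipaddress.IPv6Network:
    """Return the network of the given prefix length that contains address."""
    return ipaddress.IPv6Network(f"{address}/{prefix_length}", strict=False)


# If every edit maps to the same /64, a /64 block already covers them all.
ranges_64 = {covering_network(ip, 64) for ip in edit_ips}
same_64 = len(ranges_64) == 1

block_64 = covering_network(edit_ips[0], 64)
block_48 = covering_network(edit_ips[0], 48)

# A /48 contains 2**(64-48) separate /64 ranges.
subnets_in_48 = 2 ** (64 - 48)

print(same_64, block_64, block_48, subnets_in_48)
```

A /48 spans 65,536 separate /64 ranges, which is why it is overkill when all the edits in question sit in one /64.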

Community Wishlist Survey 2022

 

The Community Wishlist Survey 2022 is now open!

This survey is the process where communities decide what the Community Tech team should work on over the next year. We encourage everyone to submit proposals until the deadline on 23 January, or comment on other proposals to help make them better. The communities will vote on the proposals between 28 January and 11 February.

The Community Tech team is focused on tools for experienced Wikimedia editors. You can write proposals in any language, and we will translate them for you. Thank you, and we look forward to seeing your proposals! SGrabarczuk (WMF) (talk) 18:10, 10 January 2022 (UTC)[reply]

Adpositional phrase

In 2010, @Ruakh proposed adopting prepositional phrase as a POS header, but thought that 'neither "Adposition" nor "Postposition" is a standard POS header: so I (Ruakh) see no need to consider "Adpositional phrase" and "Postpositional phrase" at this time.' However, entries like at the earliest or at the latest show the need for that header to avoid the confusion arising from overgeneralizing prepositions as adpositions. Notably, @DCDuring opined that preposition is already a misnomer, meaning that it's already in established use to refer to adpositions in general and that only a minority of people object to its heterological nature. However, neither our definition here nor that of MW reflects such a convention. I wanted to see the general position in the project. Assem Khidhr (talk) 14:32, 12 January 2022 (UTC)[reply]

Aren't they called prepositional phrases because they contain a preposition, rather than because they act prepositionally (a convention I really hate because it breaks consistency with noun phrase etc.)? By that token, the two examples you've mentioned don't qualify as postpositional phrases, phrases that contain a postposition and its object. — Fytcha T | L | C 14:57, 12 January 2022 (UTC)[reply]
@Fytcha Oh, I think I fell victim to surface analysis. Thank you! Assem Khidhr (talk) 15:15, 12 January 2022 (UTC)[reply]

Deprecating Usenet

As Usenet becomes less relevant, less distributed, and less easy to search, it no longer has the value it did in the early days of Wiktionary. People have rightly complained that its mention in WT:CFI is out of line with our general attitude towards online sources. As it does have some value for recording 20th century usage I propose the following more explicit policy for just that one source:

Usenet posts from 2005 or earlier are considered durable if they are still findable.

I have suggested a more general rule that 20 year old web pages with fixed content should be citable, but nobody seems interested in that. People unhappy with the current rules seem to want the hottest ephemera, not established words.

I'm OK with grandfathering any existing post-2005 Usenet citations. This proposal is not meant to delete any existing entries. Thoughts? Vox Sciurorum (talk) 14:39, 12 January 2022 (UTC)[reply]

I'd love to be able to quote webpages if there was an easy way to create an accessible, archived version of it. I think this isn't a bad idea, a step in the right direction, perhaps. Vininn126 (talk) 14:59, 12 January 2022 (UTC)[reply]
Source for Usenet being harder to search now? I've never found it difficult using Google Groups. And you can even link directly to the post in the quotation, which IMO should be required. 70.172.194.25 22:34, 12 January 2022 (UTC)[reply]
  • Okay, true, Google Groups' search does include a lot of results that are not from Usenet and there does not appear to be any built-in way to filter these out. However, it takes less than a second to look at the group name and tell if it fits the usenet.newsgroup.name.format (it would even be trivial to write a script to filter results in this manner). If you want to be careful, you can also verify that the group really is from Usenet by searching its name in combination with Usenet, or by looking up a newsgroup hierarchy listing.
  • If we're talking about being inaccessible, though, I feel like this doesn't hold a candle to old journal articles behind a paywall or books with no/limited preview on Google Books, which are seemingly still allowed as long as they are durably archived in some format. Meanwhile, a link to a Usenet post via Google Groups poses no barriers to access; it is only the process of finding such quotations that some find challenging (although IMO it is not that hard once you get used to it).
  • The point about Usenet not being as relevant anymore is certainly true, but I don't think we generally require that sources be relevant, only that they be durably archived. There are a lot of works in the Internet Archive, Google Books, or Google Scholar that have probably only ever been read in their entirety by a dozen people and are not culturally relevant, but we still count them as long as they meet the criteria.
  • Overall, I fail to see how adopting this change to policy would benefit the project. It would just make it harder to attest certain slang, jargon, or non-standard forms that actually do exist but don't often appear in print. Are there any examples of words that you think should not be included where post-2005 Usenet postings pushed it just over the edge of CFI? 70.172.194.25 02:02, 13 January 2022 (UTC)[reply]
    • Since this comment, some examples of terms that barely squeak by based on Usenet cites have been discussed, and I may have to reconsider my position on this. And we now have a policy that allows for including Web content under certain conditions, which reduces the need to rely on Usenet. I still think Usenet should be allowed in some form to cite old terms, though. 70.172.194.25 04:37, 14 February 2022 (UTC)[reply]
  • I don't see the point of "deprecating" Usenet. Is it no longer durably archived? I know that it is no longer possible to use Google Groups to get Usenet cites. But if someone, using any means, finds a valid Usenet cite and provides a link thereto, why shouldn't we accept it? DCDuring (talk) 22:49, 12 January 2022 (UTC)[reply]
    I consider it less durable than it was 20 years ago. It's definitely harder to get than it used to be. I don't think it is so superior to Facebook, Twitter, or MySpace to deserve special treatment. It's a worse source than the archives of a major newspaper. It should be handled based on the same policies as electronic sources in general, and as unedited sources in general. (Note that we do not currently distinguish professionally edited documents from keyboard spew, but I think we should.) Vox Sciurorum (talk) 12:12, 13 January 2022 (UTC)[reply]
As a side note, not gonna lie, I'm still a bit confused as to why Usenet has so much power here? I've seen many complaints about how a word widely used on Twitter shouldn't be included because it's not "durably archived", but yet a word can appear in the far reaches of Usenet only three times and be included? It's similar to any other forum nowadays, just that it's much older, so I really wonder if there's anything that can be done with it to actually put it in line with the other guidelines and policies. No reason why there should be this discrepancy. (And this isn't the first time me or anyone else has brought this up.) AG202 (talk) 04:08, 13 January 2022 (UTC)[reply]
  • I'm in favor of downgrading Usenet. As a corpus, it's greatly biased towards a certain type of speaker/language (tech-related, predominately male, etc.) Useful to document the language/slang of that particular group (and time), but not representative of the language community of today. – Jberkel 12:34, 13 January 2022 (UTC)[reply]
    • As a corpus, journal articles are biased towards highly-educated speakers and technical language, but that doesn't seem to be a problem. I'd be in favor of including more sources that we deem as durably archived that encompass a wider range of lects, but I don't see how removing Usenet as a source would help. It's currently the one form of non-print source we deem as acceptable, which is wildly out of line with the trends of the day, and removing it would in essence be relegating ourselves to only dead-tree sources (which admittedly have a longer shelf life than most internet sources; with the notable exception of Usenet!). 70.172.194.25 20:40, 13 January 2022 (UTC)[reply]
I think maybe the question shouldn't be about deprecating Usenet, but specifically about the ability to upgrade other web citations, which might be controversial. However, there has been talk of using something like the Internet Archive, which would make these sources much more durable. Vininn126 (talk) 11:39, 14 January 2022 (UTC)[reply]
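
The newsgroup-name check mentioned earlier in this thread (looking at a group name to see whether it fits the usenet.newsgroup.name.format) could be scripted roughly as follows. This is only a minimal sketch: the hierarchy list is an illustrative assumption (the "Big 8" plus alt), the function names are hypothetical, and it works on a plain list of group names rather than on any real Google Groups output.

```python
import re

# Assumption: a short, illustrative set of top-level hierarchies
# (the "Big 8" plus alt). Real Usenet has many more (regional,
# language-specific, etc.), so this set would need extending.
KNOWN_HIERARCHIES = {
    "comp", "humanities", "misc", "news", "rec", "sci", "soc", "talk", "alt",
}

# Two or more dot-separated components made of lowercase letters,
# digits, "+", "-" or "_".
NEWSGROUP_SHAPE = re.compile(r"^[a-z0-9+_-]+(\.[a-z0-9+_-]+)+$")


def looks_like_newsgroup(name: str) -> bool:
    """Return True if the name has newsgroup shape and a known hierarchy."""
    if not NEWSGROUP_SHAPE.match(name):
        return False
    return name.split(".", 1)[0] in KNOWN_HIERARCHIES


def filter_usenet(group_names):
    """Keep only the names that look like Usenet newsgroups."""
    return [g for g in group_names if looks_like_newsgroup(g)]
```

A name that passes this filter is only a candidate; as noted above, it would still be worth confirming against a newsgroup hierarchy listing.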

Spanish redundant accent category

Should we have a category for Spanish words with an accent that doesn't affect pronunciation such as té and mí? Dngweh2s (talk) 00:00, 13 January 2022 (UTC)[reply]

In both examples the acute accent indicates an aspect of pronunciation, namely stress. The corresponding forms te and mi are atonic (and yes, also different words). That this is so is more obvious in cases like que/qué, quien/quién, etc. Nicodene (talk) 01:03, 13 January 2022 (UTC)[reply]
@Nicodene What about solo/sólo and este/éste? Dngweh2s (talk) 01:09, 13 January 2022 (UTC)[reply]
Éste serves to indicate the pronoun, and so a stressed form, whereas the determiner este forms the first part of a noun phrase, where the stress will instead be on the head. It is true that the pronoun may also be spelled without the diacritic (for typographical convenience), but, as far as I am aware, the reverse is not true, because an acute accent would not be appropriate for the atonic form.
With solo/sólo I do not see a clear-cut difference in stress. Perhaps not surprising, then, that the latter spelling has been retired. Nicodene (talk) 01:36, 13 January 2022 (UTC)[reply]
Both pairs (este/éste & solo/sólo) fall under the same guidelines, and the tilde should only be added in cases of ambiguity, per la RAE, sections 3.2.1 & 3.2.3. AG202 (talk) 04:11, 13 January 2022 (UTC)[reply]
If the official practice is now to drop the diacritic in general, but to require it where ambiguity is possible, then the description on our entry for sólo should be changed accordingly, because it simply states the spelling is deprecated.
It seems, in any case, that sólo is a genuine example of what the OP was describing. Nicodene (talk) 05:33, 13 January 2022 (UTC)[reply]

Call for Feedback about the Board of Trustees elections is now Open

You can find this message translated into additional languages on Meta-wiki. The Call for Feedback: Board of Trustees elections is now open and will close on 7 February 2022.

With this Call for Feedback, the Movement Strategy and Governance team is taking a different approach. This approach incorporates community feedback from 2021. Instead of leading with proposals, the Call is framed around key questions from the Board of Trustees. The key questions came from the feedback about the 2021 Board of Trustees election. The intention is to inspire collective conversation and collaborative proposal development about these key questions.

There are two confirmed questions that will be asked during this Call for Feedback:

  1. What is the best way to ensure more diverse representation among elected candidates? The Board of Trustees noted the importance of selecting candidates who represent the full diversity of the Wikimedia movement. The current processes have favored volunteers from North America and Europe.
  2. What are the expectations for the candidates during the election? Board candidates have traditionally completed applications and answered community questions. How can an election provide appropriate insight into candidates while also appreciating candidates’ status as volunteers?

There is one additional question that may be presented during the Call about selection processes. This question is still under discussion, but the Board wanted to give insight into the confirmed questions as soon as possible. Hopefully if an additional question is going to be asked, it will be ready during the first week of the Call for Feedback.

Join the conversation.

Best,

Movement Strategy and Governance --Mervat (WMF) (talk) 09:29, 13 January 2022 (UTC)[reply]

Allow adding unattested translations?

(Following up on #Unattested_translations_and_{{not_used}})

WT:TRANS clearly states: "[...] words added to translation tables are subject to attestation requirements as well." Where does the community stand on relaxing this requirement? As I've also stated in the above discussion, I see it as a possibility to create a template {{no attested translation}} (in the spirit of and styled equivalently to {{no equivalent translation}}), after which unattested translations are allowed to be added. There needs to be something one can add to such entries. {{not used}} (as seen here) is just not satisfactory because there is a term that is used and readily understood; it just doesn't conform to our specific attestation criteria.

This discussion made me think of Alemannic where there are potentially tens of thousands of commonly used and understood terms (mainly technical or scientific in nature) that can absolutely not be attested (I guess because science is in general not conducted in the L language in a diglossic environment). These terms are usually calques, nativized loans, or loan renderings of the German equivalent.

I do acknowledge that this makes the dictionary less verifiable. What else do we have other than the respective editor's word? — Fytcha T | L | C 12:39, 13 January 2022 (UTC)[reply]

  • If a translation is challenged the supporter should be able to demonstrate that it is used. Personally, I am willing to accept weaker evidence for a translation than an entry but I would still demand some evidence. Others might insist on full citation compliance. I had some translations challenged around late 2020 and I went to the effort of digging up quotations. I do not think the challenger needed to accept my word for it. One of my translations was said to be code switching so I deleted it. Vox Sciurorum (talk) 21:55, 13 January 2022 (UTC)[reply]
    @Vox Sciurorum: I must say, even finding any shred of evidence at all will be hard for the majority of scientific terms in Alemannic. coordinate chart is Alemannic German Charte (likely a loan meaning from German Karte) but this would be absolutely impossible to attest in any way, shape or form. Then again, we probably don't care about scientific Alemannic terms anyway. — Fytcha T | L | C 13:42, 14 January 2022 (UTC)[reply]
    As a general policy, if it hasn't been written down it does not exist as far as Wiktionary is concerned. We are not in the business of researching spoken languages. Vox Sciurorum (talk) 14:30, 14 January 2022 (UTC)[reply]
  • The mention of Alemannic reminds me of hearing technical conversations in foreign languages. English words appear regularly, but they are still English words. We consider them code-switching rather than borrowings. A similar issue has been brought up with Scots and English. Almost any English word can be dropped into a conversation. Vox Sciurorum (talk) 01:41, 14 January 2022 (UTC)[reply]
@Fytcha: Since I add a lot of translations, I find Category:Requests for translations by language both useful and bothersome, since a few users carelessly add {{t-needed}} for rare languages or for terms that are unlikely to be known even to speakers of a given language. My request to reduce the usage of {{t-needed}} was always rebutted with the "all words all languages" motto. I agree with the motto but it has to be reasonable. What are all these requests doing at 610 Office, contact tracing or Doukhobor?
I am not sure if {{no equivalent translation}} is an ideal solution but it's better than nothing or if you want to quickly close an annoying request to reduce the request category. Also pinging @Benwing2: who contributed in the other discussion. --Anatoli T. (обсудить/вклад) 02:11, 14 January 2022 (UTC)[reply]
@Atitarev: I think it would be valuable to distinguish between {{no equivalent translation}} and something like a hypothetical {{no attested translation}}. The first one states that there exists no equivalent when, in fact, one does exist (it can't be claimed that Romanian mathematicians simply lack the vocabulary to refer to Schläfli symbols, right? Even ro.wiki has an entry: ro:simbol Schläfli) but just doesn't fit our policies (yet). On the other point I agree: is there really a need to request a translation of 610 Office into Armenian and Georgian, considering those languages don't even have a translation for doormat? Request the important stuff first! contact tracing seems more reasonable though. — Fytcha T | L | C 13:35, 14 January 2022 (UTC)[reply]
Is there a mechanical test to find out if a word in spoken language is code switching or not? I honestly think code-switching to High German is rarely the case in Alemannic, though it happens sometimes, e.g. if a High German idiom that has grammatical features that Alemannic lacks is used (then, Swiss High German phonology is usually used accordingly). Apart from that, I'd call them calques / nativizations for morpho-/phonological reasons (the exact categorization is maybe a bit hard; compare also Wiktionary:Etymology_scriptorium/2021/October#Romanian_asexualitate; it may be that Wiktionary currently doesn't document this class of "borrowings" correctly). — Fytcha T | L | C 13:21, 14 January 2022 (UTC)[reply]
We also have the CFI criterion "clearly widespread use", which may be helpful in this case, as Alemannic is rarely written down. But of course you should make sure that it's not codeswitching, like Vox says. Thadh (talk) 09:49, 14 January 2022 (UTC)[reply]
@Fytcha: I don't quite understand this. You have to explain so that people without a good grasp of Romanian make sense of this too. Is jeton nefungibil an SoP? Perhaps it should be broken up like jeton nefungibil? Is "jeton nefungibil" identical to non-fungible token? It's unattestable but how is it a correct translation? Having both "no attested translation in Romanian, but see" and a translation next to it, doesn't make much sense to me. Perhaps splitting into parts like this jeton nefungibil would be sufficient? --Anatoli T. (обсудить/вклад) 23:38, 18 January 2022 (UTC)[reply]
@Atitarev: It is not SOP for the same reason the English term is not SOP: The specific cryptocurrency-related meaning is not deducible from the parts. Therefore, I don't think linking to the parts is the correct way to go about it. It is "obviously correct" because it is the term that is actually used by Romanian media ([4], [5]) and ro.wikipedia ([6]). The reason I didn't add {{not used}} or {{no equivalent translation}} is because there actually is an equivalent term in use, so those templates would be wrong (the latter of the two also requires attestation from my understanding, so that wouldn't fly anyway). The reason I didn't just use {{t}} is because WT:TRANS states that WT:ATTEST applies to translations too. In conclusion, we're in quite a weird spot here from what I can tell, which is why I created this template. Tell me if this makes it clearer! — Fytcha T | L | C 23:58, 19 January 2022 (UTC)[reply]
@Fytcha: I see your point now, thanks. Perhaps the WT:ATTEST criteria are too strict and should include media usage? Or perhaps there should be some distinction between "entry-worthy" and "translation-worthy" (correct, and perhaps the only way to say it, but unattested)?
As in the previous discussion (I know you disagreed), I think {{t|fi|Streisand-ilmiö}} is a good Finnish translation of Streisand effect for cases where you don't think it's worth creating the entry (there is a Wikipedia article for that) but wish to show users how the term is translated into the target language.
Perhaps fully de-linked templatised translations should also be allowed. For just "jeton nefungibil", you can achieve this with a template call: {{t|ro||n|alt=jeton nefungibil}} -> jeton nefungibil n.
See also how I dealt with the Russian translation of [[boat people]]. «лю́ди с ло́дки» m pl («ljúdi s lódki») (i.e. "people from a/the boat") is not idiomatic in Russian (even if attestable) but it's a correct translation, using "quotes" to highlight that's what is used by media as a translation only. So, maybe it would be correct to use "jeton nefungibil" in your case? --Anatoli T. (обсудить/вклад) 00:37, 20 January 2022 (UTC)[reply]
@Fytcha If it's an "obviously correct" translation because it's found in actual usage in the media, it should be included, not hidden behind {{no attested translation}}. If it can't be attested per our attestation guidelines, either we need to revise them, or you should just ignore the attestation rules in this particular case. To me this is a clear case where the letter of the law is contrary to the spirit of the law, in which case the spirit should prevail. (Compare Wikipedia's "ignore all rules" policy.) Benwing2 (talk) 06:04, 20 January 2022 (UTC)[reply]
@Benwing2, Atitarev: I see your points and I agree that our CFI need revision (most uncontroversial opinion on this platform probably), though I don't feel up to the task of proposing a revision. From what I see, Romanian jeton nefungibil might be includible if the current proposal passes (second bullet point).
Differentiating between entry-worthy and translation-worthy is already being done with respect to idiomaticity (translations are allowed to be non-idiomatic; should be linked word by word if so). Further differentiating with respect to attestability is an interesting idea, especially coupled with un-linking the translations. One difficulty I see is that closing RFVs as failed is harder now, because non-English RFVs would have to go through two phases: one where entry-attestation is considered and one where translation-attestation is considered. We also don't have any good way of documenting translation-attestation really.
This is getting more and more complicated, so I suggest we just do one of the following:
  1. I delete {{no attested translation}} and create {{t-unattested}} as a hard-redirect to {{t}} (so that we still document whether a translation is WT:ATTESTed or not because I look this up every time (and so should everybody else)) while we revise WT:TRANS to be a bit laxer (can be done without a vote; see header).
  2. I delete {{no attested translation}} and just keep adding such translations as usual with plain {{t}} and we will RFV-pass any RFVs of terms like jeton nefungibil using "clearly widespread use".
  3. We keep {{no attested translation}}.
Tell me what you think! I don't have any strong feelings about this, I just want the obviously correct translations to be included one way or another. — Fytcha T | L | C 15:04, 20 January 2022 (UTC)[reply]

Trivial English present participles

(See Wiktionary:Votes/2022-01/Excluding_trivial_present_participal_adjectives. I will probably temporarily retract the vote until the exceptional case brought up by AG202 on the talk page has been addressed satisfactorily.)

Where does the wider community stand on trivial (in the sense as defined in the vote as a combination of three criteria) adjectival conversions of present participles? I generally disagree with their separate inclusion (i.e. by using an adjective header in addition to a verb header), see the vote page for the majority of my reasoning and arguments. However, the point has been brought up that it would be potentially confusing to lose the overwhelmingly common second sense of e.g. interesting along with its many good translations. I strongly agree with the translations part so I am currently in search of a good criterion to pick out the trivial but keep-worthy adjectives (like interesting, annoying) while getting rid of the "That VERBS." glossed nonsense (and equivalents, of course) like falling. See Wiktionary_talk:Votes/2022-01/Excluding_trivial_present_participal_adjectives#Present_participles_that_do_act_like_true_adjectives for two ideas on how to potentially discriminate those two cases. Most importantly:

 

1. Have the community collaboratively define a set of exceptional adjectives (via the BP which also can always be updated with a BP consensus). This is still an improvement compared to the status quo as it shifts the default position from inclusion to deletion, which is how it should be.
2. Try to come up with something along the lines of a WT:THUB criterion, where a present participial adjective may be entered if it has a certain number of interesting translations. What constitutes interesting exactly would still have to be hammered out, but I think "having N translations that are not the present participle equivalents of the respective translated base verb or any verb that is synonymous (in this specific sense)" would be a starting point. What should and shouldn't be counted towards these N would, as is the case for WT:THUB, be subject to appeal; we don't want e.g. the collective of all Arabic lects to be able to unilaterally meet this criterion.

 

Pinging @Lambiam, Vininn126, Sgconlaw, AG202, Donnanz, Eirikr as some of the more involved parties in this discussion. — Fytcha T | L | C 21:13, 13 January 2022 (UTC)[reply]

I think that the vast majority of participles should NOT have adjectival definitions except a small handful, i.e. the ones brought up by AG202. I think it will be hard to come up with a hard and fast rule, unfortunately, as most of these adjectival participles are considered such only because they're more idiomatic. Vininn126 (talk) 21:18, 13 January 2022 (UTC)[reply]
I'm not happy with the use of "trivial"; it should be replaced with something else. And on the subject of worthy -ing adjectives, I bet most users wouldn't delete f***ing if it's part of their vocabulary. DonnanZ (talk) 21:38, 13 January 2022 (UTC)[reply]
F***ing is a good example actually of a word that doesn't fall under the typical adjective tests and also has a bunch of translations, so @Fytcha we still would probably need more clarity, though I'm not sure where to start with it. AG202 (talk) 22:24, 13 January 2022 (UTC)[reply]
@AG202: That word is not concerned by my vote (just like becoming and eating aren't) because the semantics are not 100% transparent (it doesn't mean "that VERBS"). — Fytcha T | L | C 22:41, 13 January 2022 (UTC)[reply]
Ahhhh alright that makes sense, I wasn't sure if it could be construed to fit under one of the verb meanings at f***. Thanks! AG202 (talk) 00:21, 14 January 2022 (UTC)[reply]
Strike the language "may be deleted by any user on sight and may not be re-entered" and simply say they are not to be included as separate parts of speech. We don't need to say that something not meeting CFI can be deleted or should not be entered and controversial cases will end up in a forum discussion anyway. More specifically, add to Wiktionary:About_English a paragraph "adjectives which are simply present participles used adjectivally in the sense of 'which VERBS' are not included as separate parts of speech; they should be listed as verb forms using {{present participle of}}." Vox Sciurorum (talk) 21:49, 13 January 2022 (UTC)[reply]
I was going to object to that anyway. If that happened, perish the thought, it could be extended (in a nightmare) to SoP terms and other PoS. Definitely not on. But this whole proposed vote deserves to fail. DonnanZ (talk) 22:16, 13 January 2022 (UTC)[reply]
@Vox Sciurorum, Lambiam: I agree with both of your objections regarding the wording so I've updated the wording now. Tell me what you think! — Fytcha T | L | C 12:48, 14 January 2022 (UTC)[reply]
I’m on board with the general idea, even though I have some problems with the current wording of the proposal. Also, I think it can be smoothly extended to past participles. Even more generally, if in some language it is a property of its grammar that terms with some given primary POS assignment can also be used with a second POS role (like adjective → adverb in e.g. German; see German adverbial phrases § Adverbial forms of adjectives on Wikipedia), there needs to be a specific reason beyond such routine use for the inclusion of such a term under that second POS.  --Lambiam 23:41, 13 January 2022 (UTC)[reply]
Completely agree with Lambiam’s extension. MuDavid 栘𩿠 (talk) 01:22, 14 January 2022 (UTC)[reply]
@Lambiam: I agree, I've also mentioned German and Romanian in the vote's rationale (both of which feature trivial adj->adv conversion). I actually started out writing the vote to pertain to all languages but then in the middle of it I thought to myself that I don't want the vote to fail only because of some unforeseen corner case in a language I'm not familiar with, so I've changed it to English. Judging by the languages I know, however, I totally agree with what you're saying: there needs to be some kind of distinguishing feature for inclusion apart from the predictable and trivial conversion. This is already de-facto policy for adj->adv conversions in the two languages I've mentioned. — Fytcha T | L | C 01:24, 14 January 2022 (UTC)[reply]
Another adj → adv example is Turkish grammar, which moreover has several finite verb forms with predictable secondary roles as participles, such as [third-person singular present simple indicative] → [aorist participle] and [third-person singular future] → [future participle].  --Lambiam 10:36, 14 January 2022 (UTC)[reply]
Lambiam's proposed extension to past participles is also flawed, judging by Fytcha's RFD of pressurized. That one has an antonym, unpressurized. DonnanZ (talk) 10:58, 14 January 2022 (UTC)[reply]
I do not see the argument. Are you suggesting that the verb forms abetted, abolished, abraded, abrased, abrogated, absolved, ... merit a second entry as an adjective merely by dint of having a derived term with un-? Are there any past participles of transitive verbs that, in your opinion, do not deserve a separate inclusion as adjective? And then, what of adjectives as nouns (the dispossessed, the strong, and so on and so forth)?  --Lambiam 14:07, 14 January 2022 (UTC)[reply]
The fate of usexes, quotes, and translations included in entries covered by this proposal is also unclear. Deleting those would be vandalism, considering the effort of editors adding them. DonnanZ (talk) 12:11, 14 January 2022 (UTC)[reply]
Agreed, Vininn126 (talk) 12:53, 14 January 2022 (UTC)[reply]
You agree to what? DonnanZ (talk) 14:31, 14 January 2022 (UTC)[reply]
To the comment above mine by Lambiam ;) That's why I replied to that comment. Vininn126 (talk) 14:34, 14 January 2022 (UTC)[reply]
I'm still confused. DonnanZ (talk) 14:47, 14 January 2022 (UTC)[reply]
"I’m on board with the general idea, even though I have some problems with the current wording of the proposal. Also, I think it can be smoothly extended to past participles. Even more generally, if in some language it is a property of its grammar that terms with some given primary POS assignment can also be used with a second POS role (like adjective → adverb in e.g. German; see German adverbial phrases § Adverbial forms of adjectives on Wikipedia), there needs to be a specific reason beyond such routine use for the inclusion of such a term under that second POS.  --Lambiam 23:41, 13 January 2022 (UTC)" Vininn126 (talk) 14:52, 14 January 2022 (UTC)[reply]
@Lambiam I don't think I'd support a sweeping proposal like this one. There've already been issues with proposals that don't take into account more minority languages and communities, so I'd really just keep it to English for now. If it should be applied to German and there's no major opposition to it in the German editor community, then it should just be added to Wiktionary:About_German, after some discussion in Beer Parlour. Re: past participles, I still think that it'd be best to just run the typical adjective tests found on Wiktionary:English adjectives as that'd weed out the ones that should be weeded out (ex: opened would not pass but frightened or broken would), and to be honest, the more I think about it, the more I feel that a formal vote on this is becoming less necessary. AG202 (talk) 22:13, 14 January 2022 (UTC)[reply]
  • Could someone explain to me why an English word ending in 'ing' that passes the tests for adjectivity (gradability/comparability/modification by 'very' or 'too', use after copulas other than forms of 'is', meaning distinct from that of a verb it is derived from) should not retain an adjective heading? DCDuring (talk) 14:59, 14 January 2022 (UTC)[reply]
    @DCDuring: No, but that is not the concern of this vote anyway. I only want to get rid of (most; see AG202's point) adjective entries that just mean "that VERBs" such as falling (among other criteria), i.e. ones that have no "meaning distinct from that of a verb [they are] derived from". My arguments for that are on display on the vote's page. — Fytcha T | L | C 15:07, 14 January 2022 (UTC)[reply]
@Fytcha Can you give a larger list of "trivial present participles" besides just falling which currently have adjective entries? Benwing2 (talk) 01:55, 16 January 2022 (UTC)[reply]
@Benwing2 Not sure if I did this right, but if I did, this should be the search for all the English present participles that have an adjective header, so it'd include both the "trivial" entries and the ones that I've mentioned, as a starting point. Also as a side note, I think that a category like "English present participles" would be very helpful to have. I'm not sure exactly why it was deleted, and looking at the RFD "discussion", it seems that it was deleted without direct discussion about it. AG202 (talk) 02:09, 16 January 2022 (UTC)[reply]
@Benwing2: growling, accusing, reigning, curving, quivering, improving, defining, resulting, tickling, contracting, differing, tinkling (sense 1), inducing, wetting, discouraging, thieving, widening, musing, rousing, arousing, all meaning "That VERBs". For some of these, like improving, it could be argued that they are not trivial because they additionally contain some semantics about habituality, which would bar them from my proposal, something I didn't think of before. — Fytcha T | L | C 03:11, 16 January 2022 (UTC)[reply]
@Fytcha Thank you. I would argue that some of these deserve to be adjectives as they can be qualified by words like extremely, somewhat or quite: extremely discouraging, somewhat curving, quite arousing. Benwing2 (talk) 03:21, 16 January 2022 (UTC)[reply]
@Benwing2: Aren't those simply the comparable ones? — Fytcha T | L | C 03:24, 16 January 2022 (UTC)[reply]
@Fytcha My point is that being comparable is IMO one clear test of a term being an adjective rather than just a participle. Another IMO is when a term describes a state rather than a result; this is the point User:Chuck Entz made in his description about pressurized. So it's not enough just to say it can be defined as "that VERBs" because the adjective-y terms have additional semantics, even if not explicitly captured in the definition. Benwing2 (talk) 03:56, 16 January 2022 (UTC)[reply]
@Benwing2, AG202: From what I see, the discussion seems to circle mainly around comparability for the two of you. I want to remind you, however, that comparability is neither a sufficient nor a necessary condition for adjectivality, the former of which I learned only recently in this discussion. It could be the case, however, that it is indeed sufficient to discriminate within the class of present participles and merely fails for nouns; that has to be shown, though.
Even if this proposal of mine leads nowhere (which is what it currently looks like), I at least hope that we can come up with better tests that can be applied razor-sharply. The non-be copula test that AG202 uses could be one. Then again, where do we draw the line? spiring, for instance, has exactly one attested use with become; does this merit an adjective entry now? Once the dust has settled, something should be put into WT:CFI. — Fytcha T | L | C 04:15, 16 January 2022 (UTC)[reply]
I propose a second test - can the given participle be used predicatively without giving the continuous form (i.e. that was exciting). This would give two possible tests. I wonder if there's a participle that would fail both and still be considered an adjective. If yes, then perhaps we need one more test. Vininn126 (talk) 04:17, 16 January 2022 (UTC)[reply]
Out of that list, I'd personally probably keep "discouraging" & "arousing" just based on the same tests I used for "pressurized". AG202 (talk) 03:21, 16 January 2022 (UTC)[reply]

──────────────────────────────────────────────────────────────────────────────────────────────────── I feel quite doubtful about the proposal in general. I worry that it will lead to a lot of wrangling over what is regarded as “trivial” and what isn’t. Moreover, if common -ing forms are not marked as adjectives, are readers supposed to assume that all such words can be used as adjectives? Is this generally a correct assumption? If so, where will readers learn about this? Do we put quotations showing adjectival use interspersed among verb uses, organized chronologically in the usual way – will this be confusing to readers? (Or do we somehow list such quotations in a separate block?) Why should we ignore the lemming principle here (many other major dictionaries mark such words as adjectives)? — SGconlaw (talk) 04:21, 16 January 2022 (UTC)[reply]

The general consensus so far is that a few will be kept. And yes, readers should be expected to understand that participles are adjectives, we're a dictionary, not a grammar. I don't think we should ignore the lemming principle, just because OED lists them doesn't mean we should. Vininn126 (talk) 04:40, 16 January 2022 (UTC)[reply]
@Vininn126: what should we do about adjective quotations then? Mix them together with quotations showing verb uses? — SGconlaw (talk) 05:16, 16 January 2022 (UTC)[reply]
You mean like on growing? Potentially. Or under the participle usage, proving that, indeed, the participle exists, and it is functioning as participles do in English. Vininn126 (talk) 05:21, 16 January 2022 (UTC)[reply]
We may not be a grammar, but word-class membership is essentially a matter of syntax, not semantics. I fundamentally disagree with the proposition that "trivial" wording in a participle's definition is a sufficient reason to delete an adjective PoS section for any English participle. Further, I believe that no semantic consideration is a sufficient reason for deletion. Membership in the adjective word class is governed by syntactic criteria for inclusion: attributive use [a necessary condition], gradability/comparability (eg, modification by too, very) [sufficient], use in a predicate after a copula other than be (eg, seem, become) [sufficient]. Having a distinct definition not immediately transparent from the definitions of the verb from which it is derived is also a sufficient reason for inclusion.
I'm sorry that contributors are looking for shortcuts to deletion of PoS sections they object to instead of doing the work involved in determining whether a word behaves like a member of a word class. Flooding the RfV process with definitions one objects to to achieve deletion is scarcely better. DCDuring (talk) 06:03, 16 January 2022 (UTC)[reply]
@DCDuring: Not sure why you keep going back to that "wording in a participle's definition" line; it is self-evident that my criterion is aimed at the actual semantics of the word, not whether its definition on Wiktionary matches some kind of regex; in my vote I also wrote "a 100% transparent meaning". Even if you misunderstood or I explained poorly, this would be the obvious steelman.
Also, could you please address the concerns @Mihia brought up there? As it stands, it looks like you're saying "New York" is an adjective, for it passes the sufficient condition proposed by you of being gradable/comparable. — Fytcha T | L | C 13:11, 16 January 2022 (UTC)[reply]
New York is no adjective nor is New Zealand, but both are used attributively. The adjective was deleted, see Talk:New York. DonnanZ (talk) 13:55, 16 January 2022 (UTC)[reply]
I am saying that semantics is often irrelevant to word-class membership and is especially so in the case of participles. Not even a completely transparent meaning trumps the syntactic evidence that a word is a member of a specific word class.
As to proper nouns being adjectives, no one says they are worth including as adjectives merely because they are used attributively. In fact, we don't even take as sufficient that a proper noun can be shown to be sometimes forced into adjective-like use with too or very (eg, "That tweet was very Trump"). Virtually any noun can be used attributively. Your reliance on semantics for some kind of rule about word-class membership is completely misguided. DCDuring (talk) 18:42, 16 January 2022 (UTC)[reply]
So we shouldn't keep adjectival definitions of nouns, as it's 100% predictable how they're used, but we should keep adjectival participles such as "growing", because it's... 100% predictable? Huh? Vininn126 (talk) 13:21, 18 January 2022 (UTC)[reply]
@DCDuring: "no one says [proper nouns] are worth including as adjectives" I think you are by presenting a sufficient condition ("gradability/comparability (eg, modification by too, very)") that applies to them as well. Either your sufficient condition is not sufficient or you think these words are adjectives as well and deserve an adjective PoS header. Which one is it? You're free to walk back your claim that gradability/comparability is a sufficient condition, but then I'd again ask you to present sufficient conditions to test adjectivality of a word. — Fytcha T | L | C 13:28, 18 January 2022 (UTC)[reply]
We made the explicit decision not to accept attributive use alone as evidence of adjectivity for any English noun because such attributive use is possible for virtually any English noun, both as a matter of syntax and as a matter of actual usage. For English proper nouns we decided that we wanted more than minimal evidence of adjective-type usage like "It was not a very White House way of communicating" to support adjectivity.
We have some 1,300 entries that are in both Category:English proper nouns and Category:English adjectives. Most of them are demonyms or glossonyms, for which we believe it likely that evidence could be found of true adjectivity. Hardly any have citations supporting their adjectivity. Many of them could use some cleanup, eg, moving derived and related terms to the bottom of the L2. DCDuring (talk) 14:56, 18 January 2022 (UTC)[reply]
The whole concept of removing these adjectives should die a death. I notice that Fytcha has taken down the proposed vote - for now. DonnanZ (talk) 10:50, 16 January 2022 (UTC)[reply]

Non-English entries that don't meet WT:CFI#Numbers,_numerals,_and_ordinals

See Wiktionary:Requests_for_deletion/Non-English#Uzbek_SOP_numbers and Wiktionary:Requests_for_deletion/Non-English#өч_йөз: Would anybody mind if I instagibbed all such entries whenever I encounter them? I would of course take care of proper relinking etc. I ask this because RFD-tagging them, listing them and then revisiting them after a month etc. can be so tiring for such a large number of essentially equivalent entries. — Fytcha T | L | C 22:55, 13 January 2022 (UTC)[reply]

@Fytcha: No I wouldn't mind. And to the contrary, there's precedent for deleting all such entries. Imetsia (talk) 18:22, 17 January 2022 (UTC)[reply]
Seconded. Ultimateria (talk) 21:45, 18 January 2022 (UTC)[reply]
@Imetsia, Ultimateria: Thanks. I deleted like a couple of dozen Malay SOPs. — Fytcha T | L | C 02:28, 21 January 2022 (UTC)[reply]

Why the heck is "m*nstr*l" word of the day?

Discussion moved from Wiktionary talk:Word of the day/Nominations.

The following comment was posted at the above location. Minstrel was WOTD on 12 January 2022 but it may be useful to hear some views on it:

"It is offensive to BIPOC. Especially without proper contextualization in the bottom for words relevant to today!" — This unsigned comment was added by 72.76.95.136 (talk) at 08:27, 12 January 2022.

I assume that the original poster was referring to sense 2.2: "(US, historical) One of a troupe of entertainers, often a white person who wore black makeup (blackface), to present a so-called minstrel show, being a variety show of banjo music, dance, and song."

Now as far as I can tell, this sense of the word itself is not derogatory, but the practice of people who are not black performing in blackface is nowadays regarded as inappropriate. Should that disentitle an entry from appearing as WOTD? (I feel that if an explanation is needed, then it would probably not be feasible to feature such an entry as a WOTD as the comment line at the bottom of the WOTD is not a very suitable place for a lengthy discourse.) Thoughts? — SGconlaw (talk) 13:01, 14 January 2022 (UTC)[reply]

If the term is already considered offensive by itself, then what about bl*ckf*c*, or sl*v* tr*d* for that matter?  --Lambiam 14:14, 14 January 2022 (UTC)[reply]
Some people are too thin-skinned or deniers. DonnanZ (talk) 14:21, 14 January 2022 (UTC)[reply]
I know right, we allow minstrel but Sgconlaw (talkcontribs) didn't accept my nomination of proctorrhea - what double standards! Br00pVain (talk) 14:24, 14 January 2022 (UTC)[reply]
Oh well, if you all think we should have proctorrhea on the Main Page, please say so now … ha, ha. — SGconlaw (talk) 14:34, 14 January 2022 (UTC)[reply]
Minstrel is definitely more acceptable. SGconlaw is the boss, mate. Hard cheese. DonnanZ (talk) 14:42, 14 January 2022 (UTC)[reply]
The word minstrel is not offensive in and of itself. Buidhe (talk) 11:40, 15 January 2022 (UTC)[reply]
IMO the blurb should have included a note of some sort in sense 2.2 that such shows are considered racist today; cf. the lede in the Wikipedia article minstrel show, which says "The minstrel show, also called minstrelsy, was an American form of racist entertainment developed in the early 19th century." Benwing2 (talk) 01:48, 16 January 2022 (UTC)[reply]
I suppose we could update the definition to reflect this in some way, though the OED does not. — SGconlaw (talk) 04:24, 16 January 2022 (UTC)[reply]
I went ahead and updated the definition and the image caption. — SGconlaw (talk) 05:34, 16 January 2022 (UTC)[reply]
Those who erase history are doomed to repeat it, ain't they. Equinox 17:31, 17 January 2022 (UTC)[reply]
What do people gain from all their laborious study of history? The sun rises and the sun sets, and hurries back to where it rises. What has been will be again, what has been done will be done again; there is nothing new under the sun. :(  --Lambiam 00:30, 18 January 2022 (UTC)[reply]
I suppose that we could adopt the practice of avoiding entries that could be offensive to someone, but which we cannot outright condemn. I view minstrel as such an entry. Can one actually use the word minstrel on its own offensively?
BTW, would we even need the allegedly offensive definition if we did not have its uncited subsense, which I have now RfVed? It would be much easier to have an apologetic usage note for the entry at minstrel show. Even there we could delegate the apology/condemnation to WP by including their article as further reading. DCDuring (talk) 14:55, 21 January 2022 (UTC)[reply]
@DCDuring: I have a feeling sense 2.2.1 (amphetamine tablet) is not the sense objected to. (It appears in the OED, by the way.) It's probably sense 2.2, but in my view it's not the sense itself that is offensive to the OP but the concept of someone performing in blackface. I can't see how Wiktionary could be regarded as a reliable dictionary if sense 2.2 were omitted.
What I find a bit difficult to accept is the idea that entries with concepts that people might find objectionable should be excluded from WOTD even if the entries themselves are not vulgar, etc. For example, should capitalism or socialism be excluded on this basis, that some people object to the concepts and not because the words themselves are rude? — SGconlaw (talk) 15:37, 21 January 2022 (UTC)[reply]
I didn't think it was sense 2.2.1 that was the source of the offensiveness, but rather that it was the source of the need to have sense 2.2, also not fully cited. OTOH, I note that MW Online has five definitions for minstrel, though they do not have the amphetamine sense. One of their definitions ("a member of a type of performance troupe caricaturing Black performers that originated in the U.S. in the early 19th century") has a longish apologetic note.
How much time do you want to spend in this kind of discussion? If people are easily offended by this kind of entry, then we can spend less time apologizing and fretting by not drawing attention to such entries. (BTW, when I scanned my watchlist and saw the heading there, I thought that the word being challenged was menstrual.) DCDuring (talk) 16:09, 21 January 2022 (UTC)[reply]
Not only many of our users, but also some of our contributors, including a few of our veterans, have trouble with the distinction between words and concepts. They seem to think of Wiktionary as a short-attention-span version of WP. DCDuring (talk) 16:36, 21 January 2022 (UTC)[reply]
I don't see sense 2.2.1 as the reason why sense 2.2 is needed; even if sense 2.2.1 fails verification we'd need to have sense 2.2, otherwise we'd be missing a key sense of the word. Anyway, I think that briefly mentioning in the definition of sense 2.2 that blackface is now regarded as racist addresses the OP's concern.
I just wanted to get a sense of what other editors thought of the OP's comment. It seems like most people think it was slightly over-sensitive. If we had to avoid causing all possible offence, it might be quite hard to predict what terms someone might object to. We've featured proletarianization and Ugandan affairs before; conceivably people opposed to Marx or feeling that Uganda has been insulted might take offence. Where do we draw the line? — SGconlaw (talk) 17:24, 21 January 2022 (UTC)[reply]

“Red” links shown black in inflection-tables

The use of class="inflection-table" in inflection tables has the effect that entries in the table with a wikilink that leads nowhere are initially not shown in red but in black, even though the link has action=edit&redlink=1. After following the link, it turns red. Is this intentional? I find it awkward.  --Lambiam 13:44, 15 January 2022 (UTC)[reply]

Why do you find it awkward? Also, having a lot of redlinks really looks bad in most inflection-table templates. Thadh (talk) 13:48, 15 January 2022 (UTC)[reply]
Interesting. I would like to see some examples, especially verb entries incorporating "one's", "someone's" etc. DonnanZ (talk) 14:08, 15 January 2022 (UTC)[reply]

"Adjectival noun" header in Japanese

(Notifying Eirikr, TAKASUGI Shinji, Atitarev, Fish bowl, Poketalker, Cnilep, Marlin Setia1, Huhu9001, 荒巻モロゾフ, 片割れ靴下, Onionbar, Shen233, Alves9, Cpt.Guapo): This is not a standard header but shows up in several pages, e.g. 大人気 (very popular), 誇大 (exaggeration), 小柄 (small build, short stature), 無礼 (impolite, rude). There is a Wikipedia page Adjectival noun (Japanese) that says "Adjectival nouns constitute one of several Japanese word classes that can be considered equivalent to adjectives." The last three words link to the Japanese adjectives page, which describes these words as "na-adjectives". They are categorized under Category:Japanese adjectives. All of this makes me think the header should just read "Adjective". Even the two definitions above that are nouns seem suspect; 誇大 as shown in the cited examples seems better glossed as "exaggerated", and 小柄 could easily be glossed as "short of stature" or "of small build". Another point here is the following, from the Wikipedia article: "In their attributive function, Japanese adjectival nouns function similarly to English noun adjuncts, as in 'chicken soup' or 'winter coat' – in these cases, the nouns 'chicken' and 'winter' modify the nouns 'soup' and 'coat', respectively." What is being described here is exactly equivalent to what we call "relational adjectives" in Slavic languages (as well as in Latin, Ancient Greek and most Romance languages). Such adjectives are often glossed as nouns in English, but that does not change their status in the source language. Similarly, Japanese したい is translated as "want to do" but is an adjective. Benwing2 (talk) 01:45, 16 January 2022 (UTC)[reply]

These seem to be remnants of an obsolete entry style. Japanese entries used to distinguish "adjective nouns" (-na) from "adjective" (-i), but no longer do so. -- Huhu9001 (talk) 03:03, 16 January 2022 (UTC)[reply]
Right, and I have discussed the issue before with Eirikr. https://en.wiktionary.org/wiki/User_talk:Eirikr/2020#%E9%87%8D%E7%AE%B1%E8%AA%AD%E3%81%BF Shen233 (talk) 18:41, 17 January 2022 (UTC)[reply]
I went ahead and renamed ==Adjectival noun== to ==Adjective== in all Japanese entries. Benwing2 (talk) 02:09, 18 January 2022 (UTC)[reply]
Thank you @Benwing2! As others have noted, this "adjectival noun" terminology was a stale holdover from past style.
One key difference between this class of Japanese adjectives and the English construction of attributive nouns is that a Japanese -na adjective cannot be used as a standalone noun -- it cannot be used as the patient or agent. Some of these words are also nouns, but not many of them. The majority that cannot be used as nouns as-is can be turned into nouns with the addition of the nominalizing suffixes さ (-sa, -ness, objective degree or amount) or み (-mi, -ness, subjective experience).
Due to this distinct non-noun-ness of Japanese -na adjectives, I have cringed whenever I run across an English-language text describing these as "nouns". I am happy that you've cleared out the last of these stale headers.  :) ‑‑ Eiríkr Útlendi │Tala við mig 18:50, 18 January 2022 (UTC)[reply]

Hebrew conjugation accent

I think indicating accent somehow in the Hebrew conjugation tables would be useful, even if it is possible to figure it out without it. Dngweh2s (talk) 16:06, 17 January 2022 (UTC)[reply]

@Dngweh2s I agree, although doing so is non-trivial (to say the least) given the complexity of Hebrew verb conjugation. I am almost done writing a module to do something similar for Italian conjugation (which is currently missing marking of stress and vowel quality), and the module is over 3,000 lines of Lua code. Benwing2 (talk) 20:14, 17 January 2022 (UTC)[reply]
@Benwing2 It would also be useful for the conjugation table to say what subconjugation/pattern it is. This must be possible given that they are generated automatically. Dngweh2s (talk) 20:39, 17 January 2022 (UTC)[reply]

Lojban cleanup

Just FYI (not sure if anyone cares), I have done some work cleaning up Lojban lemmas. I think the main contributor to these lemmas is User:Jawitkien, although certain other users have added entries, e.g. User:DefinitionFanatic, User:Sarefo, User:Brantmeierz, maybe others. User:Kc kennylau did some bot work on these entries. None of these users are currently active AFAICT. The only thing potentially controversial that I did is to replace Lojban headers with English ones, consistent with our handling of other languages. I am not familiar with Lojban but from looking at the Wikipedia article, and from the fact that some headers are compounds of Lojban and English (e.g. ==Gismu | Root word==), I made the following substitutions:

  • Lujvo -> Predicate
  • Brivla -> Predicate
  • Rafsi -> Affix
  • Gismu -> Root
  • Cmavo -> Particle
  • Vlakra -> Etymology (Pronunciation in one case)
  • Noun -> Predicate (in one case), Particle (in one case)

Except for the Lujvo/Brivla merger, this doesn't lose information. In that case, there seemed to be no consistency in whether the terms "Lujvo" or "Brivla" were used as headers. Furthermore, apparently a "Lujvo" is just a "Brivla" that is also a compound word, and generally we don't make a header distinction between compound and non-compound words.
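For anyone wanting to repeat or audit this kind of cleanup, the unconditional substitutions in the list above could be sketched roughly as follows. This is only an illustration, not the bot code that was actually run: the names `HEADER_MAP` and `replace_headers` are made up for the example, it assumes headers are plain `==...==` lines, and it leaves "Vlakra" and "Noun" untouched because those were decided case by case.

```python
import re

# Unconditional substitutions from the list above.
# "Vlakra" and "Noun" are omitted on purpose: they were handled case by case.
HEADER_MAP = {
    "Lujvo": "Predicate",
    "Brivla": "Predicate",
    "Rafsi": "Affix",
    "Gismu": "Root",
    "Cmavo": "Particle",
}

# Matches a wikitext header line such as "===Gismu===" or "==Gismu | Root word==".
HEADER_RE = re.compile(r"^(=+)\s*(.*?)\s*(=+)\s*$")


def replace_headers(wikitext: str) -> str:
    """Return wikitext with Lojban-language POS headers replaced by English ones."""
    out = []
    for line in wikitext.split("\n"):
        m = HEADER_RE.match(line)
        # Only touch well-formed headers (same number of "=" on both sides).
        if m and m.group(1) == m.group(3):
            # For compound headers, look only at the Lojban part before "|".
            name = m.group(2).split("|")[0].strip()
            if name in HEADER_MAP:
                line = f"{m.group(1)}{HEADER_MAP[name]}{m.group(3)}"
        out.append(line)
    return "\n".join(out)
```

Anything not in the mapping (including the language header itself) passes through unchanged, so the case-by-case headers can be reviewed by hand afterwards.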

Now, granted, "Predicate" isn't quite a standard header (although we do have "Predicative", which is a part of speech e.g. in Russian). But I think it's a lot easier to make sense of than "Lujvo" or "Brivla". I originally thought of using "Verb" but the definitions of these entities are usually more like nouns than verbs, so "Verb" seemed too confusing. Benwing2 (talk) 00:18, 18 January 2022 (UTC)[reply]

BTW if anyone thinks it's important to include the Lojban-language POS's in the entry, IMO the correct way to do that is in the headword. Since there are already headword templates like {{jbo-lujvo}}, this should be easy to do. Benwing2 (talk) 00:22, 18 January 2022 (UTC)[reply]
I thank you again! A previous discussion about Lojban POS headers went nowhere useful. ‑‑ Eiríkr Útlendi │Tala við mig 19:11, 18 January 2022 (UTC)[reply]

Talk to the Community Tech

 

Hello

We, the team working on the Community Wishlist Survey, would like to invite you to an online meeting with us. It will take place on 19 January (Wednesday), 18:00 UTC on Zoom, and will last an hour. This external system is not subject to the WMF Privacy Policy. Click here to join.

Agenda

  • Bring drafts of your proposals and talk to a member of the Community Tech Team about your questions on how to improve the proposal

Format

The meeting will not be recorded or streamed. Notes without attribution will be taken and published on Meta-Wiki. The presentation (all points in the agenda except for the questions and answers) will be given in English.

We can answer questions asked in English, French, Polish, Spanish, and German. If you would like to ask questions in advance, add them on the Community Wishlist Survey talk page or send to sgrabarczuk@wikimedia.org.

Natalia Rodriguez (the Community Tech manager) will be hosting this meeting.

Invitation link

We hope to see you! SGrabarczuk (WMF) (talk) 00:21, 18 January 2022 (UTC)[reply]

Nivkh should be split

Nivkh should be split into two languages. The Amur, West Sakhalin, and North Sakhalin varieties are indisputably distinct enough from the East Sakhalin, Central Sakhalin, and (extinct) South Sakhalin dialects to consider them separate languages. The two varieties could be called "Amur Nivkh" and "Sakhalin Nivkh", or "Nivkh" and "Nighvng" (which is "Nivkh" in Sakhalin Nivkh), though the latter is a rarely used distinction. Within these two, we (by which I mean mostly "I") can continue specifying finer dialectal distinctions, like Amur vs. North Sakhalin, and East vs. South Sakhalin. The distinction is often made, it's just often (but not always) made as a dialectal difference. This, however, is simply for the sake of simplicity and tradition.

See the following sources: This talk by Ekaterina Gruzdeva; Austerlitz 1985 ("South-Sakhalin Gilyak... vs. North-Sakhalin and Amur dialects"); Gruzdeva & Janhunen 2018 ("East Sakhalin Nivkh, also known as Nighvng, is spoken in eastern and central Sakhalin"); Gruzdeva 2016 ("...expressing epistemic modality in the Amur (A) and East Sakhalin (ES) dialects of Nivkh"); Gusev 2015 ("Two main dialects of Nivkh are Amur and East-Sakhalin (Krejnovich 1934: 182–183 and Shiraishi 2006: 10–11), cf. the scheme based on Shiraishi (2006: 11): Amur dialect group: Continental Amur, West-Sakhalin, North-Sakhalin; (East-)Sakhalin dialect group: East Sakhalin, Southeastern. Some authors treat poorly documented Nivkh idioms on North Sakhalin (Panfilov 1962: 3) and also on South(eastern) Sakhalin (Gruzdeva 1998: 7) as separate dialectal units"); Shiraishi & Botma 2016 ("In the Amur dialect (incl. West Sakhalin), accent falls..."); Shiraishi 2006 ("Nivkh has two dialect groups, the Amur dialect group and the Sakhalin dialect group. ... Within each group, there are numerous sub-dialects, some of which have not been described or documented to date. The best-described dialects are the dialects spoken on the lower reaches of the Amur River (Kreinovich 1934, 1937, Panfilov 1962, 1965, 1968, etc.) and the dialects on Sakhalin spoken in Nogliki (Gruzdeva 1998, etc.) and Poronaisk (Hattori 1955, 1962a,b, Austerlitz 1956, etc.)").

  • Austerlitz, Robert. 1985. Etymological Frustrations (Gilyak). International Journal of American Linguistics 51(4). 336–39.
  • Gruzdeva, Ekaterina. 2016. Epistemic modality and related categories in Nivkh. Studia Orientalia 117. 171–98.
  • Gruzdeva, Ekaterina & Juha Janhunen. 2018. Revitalization of Nivkh on Sakhalin. In Leanne Hinton, Leena Huss & Gerald Roche (eds.), The Routledge handbook of language revitalization, ch. 45. New York: Routledge.
  • Gusev, Valentin. 2015. Some parallels in grammar between Nivkh and Tungusic languages. Journal of the Center for Northern Humanities 8. 63–75.
  • Shiraishi, Hidetoshi. 2006. Topics in Nivkh phonology. University of Groningen.
  • Shiraishi, Hidetoshi & Bert Botma. 2016. Asymmetric distribution of vowels in Nivkh. Studia Orientalia 117. 39–46.

— This unsigned comment was added by Dylanvt (talkcontribs).

Wikipedia agrees with you, although this does not mean we necessarily need to follow suit. Dialect distinctions could be handled with labels, which categorize appropriately, and the various varieties could be added as etymology-only languages, for example. Benwing2 (talk) 06:07, 18 January 2022 (UTC)[reply]
If we do keep Nivkh as one language, how do we deal with things like the differing plurals? For instance, in the entry няӽ, I included the (East) Sakhalin plural няӽкун, but in Amur Nivkh the plural is няӽку. Dylanvt (talk) 18:49, 26 January 2022 (UTC)[reply]
We should split them if they are better treated separately. The existence of dialectal plural forms is not in any way a hurdle; compare {{cop-noun}}, for how this is solved in Coptic. —Μετάknowledgediscuss/deeds 18:57, 26 January 2022 (UTC)[reply]
I am, however, clueless about the technical aspects of Wiktionary. I don't know how to do anything with templates or splitting languages, etc. All I can do is express my view as someone (and probably the only someone on English Wiktionary) knowledgeable about the language. Dylanvt (talk) 03:01, 12 February 2022 (UTC)[reply]

Subscribe to the This Month in Education newsletter - learn from others and share your stories

Dear community members,

Greetings from the EWOC Newsletter team and the education team at Wikimedia Foundation. We are very excited to share that the Education Newsletter (This Month in Education) is in its tenth year, and we invite you to join us by subscribing to the newsletter on your talk page or by sharing your activities in the upcoming newsletters. The Wikimedia Education newsletter is a monthly newsletter that collects articles written by community members using Wikimedia projects in education around the world, and it is published by the EWOC Newsletter team in collaboration with the Education team. These stories can bring you new ideas to try and valuable insights into the successes and challenges of our community members in running education programs in their context.

If your affiliate/language project is developing its own education initiatives, please remember to take advantage of this newsletter to share your stories with the wider movement that shares your passion for education. You can submit newsletter articles in your own language or submit bilingual articles for the education newsletter. For the month of January, the deadline to submit articles is 20 January. We look forward to reading your stories.

Older versions of this newsletter can be found in the complete archive.

More information about the newsletter can be found at Education/Newsletter/About.

For more information, please contact spatnaik wikimedia.org.


About This Month in Education · Subscribe/Unsubscribe · Global message delivery · For the team: ZI Jony (Talk), Wednesday 0:56, 01 May 2024 (UTC)

Suggestion: Allowing soft-redirected entries of vocalized words of languages spelt in an abjad

Now, searching, for example, the Persian word "مِسواک" ("toothbrush") with the kasra returns search results. I suggest allowing entries for such vocalized words that are soft-redirected to the main entry (in this case مسواک#Persian), in a similar manner to the Japanese alternatively spelt words (example: はブラシ, which is soft-redirected to 歯ブラシ). This applies to all languages spelt in an abjad, where short vowels are not usually spelt out, such as Arabic and Hebrew. This would make Wiktionary more convenient to language learners. Jonashtand (talk) 16:20, 18 January 2022 (UTC)[reply]
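To make the suggestion concrete, here is a minimal sketch of how the target of such a soft redirect could be worked out from a vocalized title. It is only an illustration under stated assumptions: the function names are invented for the example, the set of code points treated as vowel points is a deliberately small one chosen here (Arabic-script harakat and the basic Hebrew niqqud), and a real implementation would need per-language rules.

```python
# Assumed, deliberately small set of vowel points to drop.
# Arabic-script harakat: U+064B (fathatan) through U+0652 (sukun).
ARABIC_HARAKAT = {chr(cp) for cp in range(0x064B, 0x0653)}
# Hebrew niqqud: U+05B0 (sheva) through U+05BC (dagesh), plus shin/sin dots
# and qamats qatan.
HEBREW_NIQQUD = {chr(cp) for cp in range(0x05B0, 0x05BD)} | {"\u05C1", "\u05C2", "\u05C7"}
POINTS = ARABIC_HARAKAT | HEBREW_NIQQUD


def unvocalized(title: str) -> str:
    """Return the title with the assumed vowel points removed."""
    return "".join(ch for ch in title if ch not in POINTS)


def soft_redirect_target(title: str):
    """Return the unvocalized main-entry title, or None if the title has no points."""
    bare = unvocalized(title)
    return bare if bare != title else None


# Persian example from the post: meem + kasra + seen + waw + alef + keheh.
vocalized = "\u0645\u0650\u0633\u0648\u0627\u06A9"
print(soft_redirect_target(vocalized) == "\u0645\u0633\u0648\u0627\u06A9")
```

A title without any of these points returns None, meaning it would be an ordinary entry rather than a soft redirect.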

Undo informal/colloquial merge

Following earlier discussion over at User talk:Surjection#Colloquial and informal, this topic seeks to reverse the decision to merge "informal" and "colloquial" terms in Wiktionary:Requests for moves, mergers and splits#Category:Colloquialisms by language and Category:Informal terms by language for the simple reason that a distinction between "informal" and "colloquial" (spoken language) does exist in many languages, including Finnish and Welsh, and merging the two is counterproductive. Informal terms may "fly" in higher registers while colloquialisms will not and strictly belong to a lower, more vernacular, register. — SURJECTION / T / C / L / 15:41, 18 January 2022 (UTC)[reply]

  Support keeping them separate for the reasons given by Surjection. I think the question "Could this appear (without quotation marks) in the running text of a newspaper?" is a decent (but not definitive) test to discriminate between colloquial and informal. To give an example in another language, I could very well see German den Stecker ziehen be used in the main body of a newspaper article but not Dingsda or behindert (2). — Fytcha T | L | C 16:04, 18 January 2022 (UTC)[reply]
Even in English there is a distinction. Collins COBUILD (1995) gives a full explanation for its labels and specifically makes it quite clear by reducing the number of syllables in a key label from 4 ("colloquial") to 2 ("spoken"). The spoken label indicates "used mainly in speech rather than in writing: e.g. school kids, whoops". In contrast, the written label indicates "used mainly in writing rather than in speech: e.g. animus, bespectacled". The informal label indicates "used mainly in informal situations, conversations, and personal letters: e.g. decaf, elbow room." In contrast, the formal label indicates "used mainly in official situations, or by political and business organizations, or when speaking or writing to people in authority: e.g. belated, demonstrable." In COBUILD most terms do not have any of these labels. They also have many specialized labels, including legal, literary, technical, journalism, medical, which are narrower than the previously mentioned labels, though sometimes overlapping or marking a subset of them (eg, legal).
Sadly, we lack the full range of the COBUILD corpus and, more importantly the annotation and software they have to support the labels. We can, however, make do with what we have. DCDuring (talk) 16:19, 18 January 2022 (UTC)[reply]
I struggle to perceive much difference between "school kids" and "elbow room", although my perspective as an Australian might colour this a little. This, that and the other (talk) 01:12, 19 January 2022 (UTC)[reply]
Their database was UK based and the quoted material is from a 1995 print edition. I also would put them both in informal. In our ever-more-democratic times it may be that informal and colloquial speech can be used in what were once formal settings. I would have once called "Fuck you" colloquial (as well as derogatory); now I'm not so sure. DCDuring (talk) 01:51, 19 January 2022 (UTC)[reply]
  Support, it seems weird we even merged them in the first place. Vininn126 (talk) 16:31, 18 January 2022 (UTC)[reply]
Strongly   Support. Our Finnish entries and the sources they draw upon distinguish two informal registers, and I'm sure Finnish is not the only lect suffering from this decision. The issue of merging the two categories is something that should be tackled separately for each language. If the Anglophone editors decide to support the merger for English terms, I won't oppose, but their language-specific decision should not damage the other languages within this project. As such, the site-wide merger should be reverted and further discussion moved to pages like Category talk:English informal terms. brittletheories (talk) 17:23, 18 January 2022 (UTC)[reply]
  Support. This is part of why I maintain that decisions like those should take all language communities in mind, not just the ones with more editors. It's frustrating to see that happen over and over again. AG202 (talk) 17:28, 18 January 2022 (UTC)[reply]
  Oppose for English, at least. I think the distinction is too fine, and there isn't a convenient way for editors to analyse whether a term is more commonly used in speech or writing. We will just end up with people slapping on either label pretty much randomly. (If there is consensus for restoring "colloquial", then a very clear explanation of when each label is used must be added at "Appendix:Glossary".) — SGconlaw (talk) 18:08, 18 January 2022 (UTC)[reply]
English to my knowledge does not exhibit a similar form of diglossia, but that's not a reason to merge the two for every other language as well. — SURJECTION / T / C / L / 18:55, 18 January 2022 (UTC)[reply]
If use of a term is almost exclusively found in Google Books and News in dialog or quoted speech (or the sports section?) I think that is good evidence that it is "spoken"/"colloquial". We probably have more trouble finding empirical support for the formal label. In any event I don't see why we couldn't analyze what lemmings say. DCDuring (talk) 00:50, 19 January 2022 (UTC)[reply]
Strongly   Oppose a blanket unmerger. I was the one who merged them, based on the fact that for a large number of languages, the distinction is unclear and the terms were (and still are) being used promiscuously. By unmerging them, we'll end up in the same situation as before, where the largest languages will have an artificial distinction made between "informal" and "colloquial" that really doesn't mean anything. Rather than just unmerging, we need a different solution. One possibility is to make the unmerger conditional only in specific languages, but this requires hacking in the label and category code. Another possibility is to come up with a different term for either the "informal" or "colloquial" register. If the idea is that "colloquial" is lower-register than "informal", I'd propose something like "vernacular" in place of "colloquial"/"informal". Russian has a maybe-similar distinction, termed просторе́чный (prostoréčnyj), which we translate as "low colloquial" but "vernacular" (or "popular") would probably work as well. Either way I would prefer seeing terms that are consistent across languages. The nice thing about "vernacular" or "popular" is that neither term is really used to describe English (except AFAIK in the context of AAVE, which is something different altogether), so we are free to define the terms for use with other languages. Benwing2 (talk) 01:47, 19 January 2022 (UTC)[reply]
@Benwing2 "Either way I would prefer seeing terms that are consistent across languages." As much as we can try, I don't think that there's a solution for labels regarding formality usage across all languages. Some languages have up to a 5-or-more way distinction with politeness/formality when it comes to vocabulary, and those should be shown with our labels (ex: French has up to six registers depending on whom you ask, and currently the registers familier and populaire, even some from jargon and argot for some reason, are all grouped together "informal French", which is borderline inaccurate). Also, just because we have the labels doesn't mean that every language has to use them, while at the moment no one can use them properly. Stuff like that should be left up to the communities of editors rather than a blanket solution that stops everyone. AG202 (talk) 02:14, 19 January 2022 (UTC)[reply]
@AG202 Whatever terms we use, I oppose splitting informal and colloquial the way it is proposed, because time has shown people cannot use these terms consistently. "Familiar", "popular" and "vernacular" are all reasonable ways of showing a register that is considered inappropriate for written language and somewhat nonstandard, if that is what is going on in Finnish (as in Russian). Benwing2 (talk) 02:34, 19 January 2022 (UTC)[reply]
@Benwing2 Time has shown that English contributors can't use the terms right. As such, the solution should be to merge the English categories. And sure, there could be more languages like English that barely experience diglossia, and a merger is needed with those too. Still, editorial freedom should be the default. brittletheories (talk) 07:21, 19 January 2022 (UTC)[reply]
This is not a problem just for Finnish. The entire reason for the merge seems to boil down to "English editors cannot use these consistently so no language can use them". If this discussion dies down, you can be assured that I will simply undo the blanket merge and let editors for each language decide how to use the labels, because this merge, as it was carried out, was destructive, plain and simple. — SURJECTION / T / C / L / 10:09, 19 January 2022 (UTC)[reply]
The distinction clearly does mean something to some lexicographers. I quoted Collins COBUILD above. Note that they use spoken instead of colloquial (which has two definitions in its entry here). A large number of "interjections" would probably deserve the "spoken" label. DCDuring (talk) 03:08, 19 January 2022 (UTC)[reply]
"vernacular" is an ambiguous term, and there is a risk of confusion; what is being meant is our def 2, but people could likewise think it is supposed to mean our def 3. As someone else already pointed out, I'd argue there's less consistency between the uses of formal and literary, which no one has proposed merging, than there was between informal and colloquial, a split which made complete sense in some languages and discarding of which has been a terrible mistake. — SURJECTION / T / C / L / 09:45, 19 January 2022 (UTC)[reply]
@Surjection I know you feel strongly about this but please do *NOT* simply take unilateral action once "this discussion dies down". That in itself would be very destructive unless there is consensus, and I may well undo you on these grounds. I have proposed some alternatives to address your concerns. You seem to have rejected them out of hand in your zeal to implement your preferred solution, but let me repeat them. Specifically: (1) My preferred solution is to adopt other terminology for the lower-register distinction. Any of "spoken" (as User:DCDuring mentions), "vernacular", or "popular" would do as well, possibly also "familiar". The problem with forcing an artificial distinction between "informal" and "colloquial" is exactly that it is artificial: these two terms are essentially synonymous in English, and we are the *ENGLISH* Wiktionary, so we need to choose terms that correspond to the way that speakers of English use them. I gather that in languages like Finnish and Russian where this register distinction exists, it is between an informal register that is acceptable in some writing, and a lower register that is allowed only in speech (or in quoted dialogue) and is considered in some sense outside the standard language. To me, both "popular" and "vernacular" connote this sort of register quite well, whereas "colloquial" does not at all, considering its normal usage in English. (2) Another possible solution is to hack the label code that handles the labels "colloquial" and "informal" and make them categorize differently in certain languages (Finnish, Russian, Welsh, ...), but the same in other languages. That is technically doable but ugly in a way that I'd very much disprefer having.
Let me ask you again: Why are you so wedded to the specific terms "informal" and "colloquial"? Please take it from a native speaker that these terms do not have a clear enough distinction in English. Benwing2 (talk) 05:53, 20 January 2022 (UTC)[reply]
I'm not. It's just that better alternatives simply do not appear to exist. "familiar" is already taken for something else. "vernacular" is ambiguous, as it is also used to refer to a particular group's vernacular, i.e. slang, jargon or any other kind of idiolect. "spoken" is bad because it implies these terms cannot be written down, which is nonsensical because how else would we document them? "colloquial" is the only term I've seen used to describe this kind of register in an English dictionary of any kind. DCDuring's messages seem to suggest that yes, English dictionaries do indeed use these two terms and maintain a distinction between them. If you have a better proposal, I'm all ears - but all of the "alternatives" suggested thus far are simply worse. — SURJECTION / T / C / L / 10:44, 20 January 2022 (UTC)[reply]
Another point about "spoken": it implies the other standard, i.e. actual standard Finnish, is not spoken (but only written), which is patently false. It is widely spoken in formal contexts. — SURJECTION / T / C / L / 20:17, 20 January 2022 (UTC)[reply]
@Surjection "colloquial" is just as ambiguous and, worse, is a synonym for "informal" in English, which conveys completely the wrong sense. Again, why are you so wedded to this particular word? BTW I also suggested "popular", which I think works quite well for this purpose. As for "vernacular", yes it has a meaning as a noun, referring to a particular group's speech, but this is a usage as an adjective, which is unambiguous. "colloquial" has multiple ambiguous uses *as an adjective*. Benwing2 (talk) 07:32, 21 January 2022 (UTC)[reply]
@Benwing2, if you're aiming for clarity, then "popular" is the least clear for everyday readers (emphasis on English speakers). I mean, if I weren't familiar with how French uses populaire, then I'd be like, "popular to whom? Does popular mean that everyone uses it?" Just look at our definitions at popular, we don't even have a clear definition that really fits what we're aiming for right now. AG202 (talk) 08:47, 21 January 2022 (UTC)[reply]
All terms suffer from some kind of ambiguity, but out of the options presented thus far, "colloquial" is still the least worst option; it was in use for years on Wiktionary and is commonly used (see below, where I posted some statistics) for this kind of phenomenon. Only "spoken" is more common, but carries dangerous ambiguity. "vernacular" is much less common than either of those two and tends to see use mostly as a noun, not as an adjective. (Funnily enough, our adjective entry for vernacular lists "colloquial" as a synonym but not "informal".) It's simply a matter of picking one's poison since it appears most here can agree that there is a distinction worth documenting here, just not on the terms (Russian has low colloquial whatnot). — SURJECTION / T / C / L / 11:16, 21 January 2022 (UTC)[reply]
Another synonym to throw into the mix is conversational, though I don't know if it's ever been used in this context in dictionaries. Chuck Entz (talk) 04:16, 22 January 2022 (UTC)[reply]
  Oppose per Benwing2, Sgconlaw. —Svārtava [tcur] 04:40, 19 January 2022 (UTC)[reply]
  Support per Surjection. --Rishabhbhat (talk) 07:25, 19 January 2022 (UTC)[reply]
  Oppose per Surjection. For the alleged distinction is exactly how I would see the distinction but reversely: Colloquial terms may “fly” in higher registers while informalisms will not and strictly belong to a lower, more vernacular, register. 👏 Therefore I have a few times even labelled “formal colloquial” or intended to do so for colloquial technical terms of jurists, e.g. abheben that they may write in the most formal contexts; it is not informal because it fits the form, and it is not formal because it does not make speech formal, and it will not be believed to be jargon as in “police jargon” (will jurists ever admit to using jargon?). You have also in Finnish “law, colloquial”, todistus. English: dissental. If you can’t lastingly make it clear why one label is wrong but not the other, this shows again that the distinction doesn’t exist or is artificial, the same way the non-existence and artificiality of gods is shown by believers varying their understanding. Can’t pinpoint a signification without ambiguity, then don’t make points from the exact distinction which is under dispute.
The vagary uniaxial definition, rejuggling the alleged distribution of the terms’ “informal” and “colloquial” usage between a mere high and low, playing with connotations of other underdefined words like “casual” or “common” has no reliability to satisfy.
I originally suggested to keep the labellings intact to not manipulate the usage as well as make the decision adaptable if a definition comes to light but to merge the categories by reason that without contexts from a bird's-eye view there is no sense in the distinction even if you feel it well enough to distinguish in individual cases of labelling senses or terms. A gloss can be a feeling, a category cannot. Fay Freak (talk) 02:18, 20 January 2022 (UTC)[reply]
@Fay Freak: I think you meant to support Surjection's proposal, not oppose it. Thadh (talk) 02:25, 20 January 2022 (UTC)[reply]
@Thadh: No, I come to the opposite conclusion by swapping the terms within her premises, which then afford an equally plausible distinction, “disproving Surjection per Surjection”. If a distinction and its opposite are both true then it makes no sense. Fay Freak (talk) 03:01, 20 January 2022 (UTC)[reply]
All you did was flip my two terms, which does nothing to disprove the fact that there is a distinction worth documenting here. The rest of the message is a mix of disingenuous argumentation and pointing out individual mistakes in entries (todistus should be informal, not colloquial). Clearly if individual entries get the two mixed up then no distinction may exist, except for the fact that when "colloquial" briefly even displayed as "informal", it instantly made thousands of Finnish entries sound incorrect and misleading. — SURJECTION / T / C / L / 10:44, 20 January 2022 (UTC)[reply]
Exactly that’s all I needed, and it did all to disprove the fact that you all are unable to document a distinction here. Even if “it is worth”, then you don’t know it, and if you know it then you are unable to communicate it to be understood more than by an esoteric circle of readers. (Man, I would have liked to read Gorgias of Leontinoi! Too bad the idealist world conspiracy of the Christians hid the facts.) Funny that you continue to argue for keeping them merged: “Clearly if individual entries get the two mixed up then no distinction may exist.” (If it’s irony it has not been successfully purported. But of course it’s I who is disingenuous.) Fay Freak (talk) 16:23, 20 January 2022 (UTC)[reply]
You're free to call an axe a spade and call a spade an axe. None of that means anyone else than you should now consider spades and axes to be synonymous. Just because you choose to not understand what the distinction is does not mean it should not be documented. Otherwise, we should've already tossed away the distinction between formal and literary because some people don't know the difference, and next we might as well throw away the distinction between derogatory and vulgar because I can bet I can find at least one person on the street that doesn't know what the difference is between the two. — SURJECTION / T / C / L / 17:40, 20 January 2022 (UTC)[reply]
I think Benwing makes the salient point that even though there are languages with more than one non-formal register, using the English terms "informal" and "colloquial" to denote this is attempting a distinction the English words don't easily make (looking at how various dictionaries define them, they frequently define colloquial as informal). I also question whether Finnish, Welsh, Russian, etc have the same two non-formal registers as each other, and I think the fact that even people who perceive a distinction in the English terms don't agree on what it is—besides being the reason the labels were merged (because it meant whether an entry was categorized as one or the other was haphazard)—calls into question whether applying those English terms to the other languages is the best approach, as opposed to using more language-specific terminology. But I certainly agree we shouldn't gloss over distinctions other languages make just because English doesn't make the same distinctions. I do think, in general, it'd be helpful if there were a way to tell the module to interpret a label differently for some languages, so that e.g. "Doric" can be interpreted differently if language = Greek vs if language = Scots, and likewise for other cases where we've been frustrated by needing the same label-term for different things in different languages. The same functionality could make "colloquial" or "vernacular" categorize as "informal" if lang = en (since people definitely will use {{lb|en|vernacular}} just as haphazardly), but make it categorize separately if lang = fi. (Obvious question is would this use a lot of Lua memory?) - -sche (discuss) 07:01, 20 January 2022 (UTC)[reply]
Maybe we should stop using “vulgar” in a “distasteful”, ”grossly corporeal” sense. We already use it differently in “Vulgar Turkish”, “Vulgar Arabic”, “Vulgar Latin”. Maybe that’s what they want in Finnish. Fay Freak (talk) 16:23, 20 January 2022 (UTC)[reply]
  • I thought that most English-language dictionaries used colloquial in the way that I do, as a near synonym of spoken. I have not yet reviewed very many, but of the four print dictionaries on premises here, plus two online ones:
    MW 2nd (1934) uses "slang", "colloquial", and "formal"
    MW 3 (1993) uses "slang"; also "substandard" and "nonstandard"
    Longmans DCE (1987) uses "slang", "informal", and "formal"
    Collins COBUILD (1995) uses "spoken", "informal", and "formal"
    MW Online (2022) uses "slang", relegates other matters to usage notes.
    AHD Online (2022) uses "nonstandard", "slang", "informal", etc.

I couldn't find such information for the online dictionaries of Oxford/Lexico, Collins, Cambridge, Macmillan, WNW/YourDictionary, Wordsmyth, or Infoplease.

Perhaps "colloquial" is now considered too ambiguous in English to be useful as a label, as Benwing says. Later, I'll try a bookstore or library to see what other print dictionaries use for such labels. Personally, I would like the label "spoken" to be available, as it is somewhat useful for English.
-sche's approach seems good to support the distinctions in other languages. Perhaps labels in other languages should be the standard terminology used in English-language works (scholarly linguistics, language-learner material [texts, grammars, English-FL dictionaries]) to characterize the style/register differences under discussion in each language. Personally, I don't see the value of standardization of this kind of label across languages. DCDuring (talk) 17:09, 20 January 2022 (UTC)[reply]
"Colloquial Finnish" is by far the most common term used in literature. Based on some searching, "colloquial Welsh" is likewise the case for Welsh. There are many books written about "colloquial Russian". It seems a logical conclusion that colloquial is the standard term for this phenomenon. — SURJECTION / T / C / L / 18:08, 20 January 2022 (UTC)[reply]
Logical conclusions don't follow from statistical empirical data, let alone anecdotes. DCDuring (talk) 18:33, 20 January 2022 (UTC)[reply]
Sure. Actual data: "colloquial Finnish": 667 hits on Google Scholar. Unlike the next two, this term is unambiguous. "spoken Finnish": 1750 hits on Google Scholar, many of which are about Finnish as an oral language, not about the register. I'd estimate the two to be equally common, contrary to my earlier comment. "vernacular Finnish": 88 hits, practically all of which use vernacular as a noun. Truth be told, even though I prefer "colloquial" over "spoken", I'm fine with the latter. The most important thing is to have the distinction. — SURJECTION / T / C / L / 18:45, 20 January 2022 (UTC)[reply]
I thought you were trying to generalize across languages, since you hadn't responded favorably to -sche's suggestion above. DCDuring (talk) 20:26, 20 January 2022 (UTC)[reply]
"colloquial Welsh": 421 hits, "spoken Welsh": 836 hits, "vernacular Welsh": 140 hits.
"colloquial Russian": 2090 hits, "spoken Russian": 3780 hits, "vernacular Russian": 366 hits.
Even considering ambiguity, colloquial and spoken are about as common, the latter probably even more common for some (most?) affected languages, but "spoken" comes at a risk of ambiguity as I stated earlier (it implies all spoken language is like it, when in reality it is simply a lower register that is common when speaking). Generalizing across all languages is impossible, since not every language exhibits the same phenomenon, but there is still something to generalize here. — SURJECTION / T / C / L / 21:43, 20 January 2022 (UTC)[reply]
@Surjection Collocations don't prove anything here, as you note with "spoken". "Colloquial Russian" probably doesn't refer in most cases to просторе́чный. The hits for "colloquial Russian" in Google (the top hits of which all refer to the book "Colloquial Russian: A Complete Course for Beginners") make it clear they mean "informal" not просторе́чный. Even if you search through Google Scholar, it's hard to find the equation "colloquial = просторе́чный". In fact, the first reference I found that mentions both is [7], which makes it clear that просторе́чие is not the same as "colloquial Russian". This article refers to просторе́чие as "urban substandard", and defines "colloquial Russian" in opposition as "that variety of the standard language spoken by educated city dwellers". This is exactly the point I'm trying to make when I say "colloquial" doesn't cut it. I don't know much about Finnish, but do the terms "substandard" or "low colloquial" apply to the register you are referring to by "colloquial"? Benwing2 (talk) 04:15, 22 January 2022 (UTC)[reply]
BTW I am fine with "spoken" or a further qualified variant being used for a low-register spoken language. Benwing2 (talk) 04:17, 22 January 2022 (UTC)[reply]
I would say no. "substandard Finnish" gets at best a few uses (thus significantly more marginal), while I can hardly find any uses of "low colloquial" when talking about Finnish.
My order of preference remains colloquial > vernacular > spoken >> (no distinction) out of the three main proposals. — SURJECTION / T / C / L / 10:16, 22 January 2022 (UTC)[reply]
@Surjection We seem to be at loggerheads. You are obstinate about wanting "colloquial", my point is that this does not seem to work for languages outside of Finnish. I have proposed a lot of alternatives, you have essentially rejected them all. (And I question whether "colloquial" even applies to Finnish. Can you clarify what exactly are the characteristics of "colloquial Finnish" that clearly distinguishes it from "informal Finnish"? I am still not clear on this, and knowing this clearly might help come up with an alternative term that would be acceptable to you.) BTW please ping me on your responses, otherwise I have a hard time finding them among all the side conversations. Benwing2 (talk) 05:39, 23 January 2022 (UTC)[reply]
Another simple possibility is "low-register"; I see that someone recently added "higher-register" as a label that maps to categories like Category:German higher-register terms. Benwing2 (talk) 08:12, 23 January 2022 (UTC)[reply]
@Benwing2 Colloquial Finnish is the lower colloquial register. Informal vocabulary in Finnish tends to be used in more casual texts that would otherwise be written in standard language, while colloquial Finnish has a more varied vocabulary using colloquial terms, informal terms as well as grammar and morphology that is somewhat different from standard Finnish. It is commonly mixed in with dialectal features and indeed came about mostly through dialectal diffusion, albeit more naturally than standard Finnish which was intentionally conceived mostly from features that were deemed to be archaic. Using colloquial Finnish in books is rare outside of lines spoken by characters or direct quotations, but yet it is commonly written down in e.g. chatrooms. Standard Finnish is what most books are written in, what newspapers and news websites tend to use and is spoken in formal contexts, such as speeches, but also news broadcasts and so on. Colloquial Finnish tends to be the spoken variety in most everyday situations, but there exists a continuum between standard and colloquial that every speaker tends to unconsciously have their natural point on, yet they shift to a higher, more standard register when they know they should be speaking more "properly". — SURJECTION / T / C / L / 10:17, 23 January 2022 (UTC)[reply]
I'll also reiterate that I'd much rather have any of the terms proposed up to this point than to not have this distinction just because we can't agree on what term to use. — SURJECTION / T / C / L / 13:24, 23 January 2022 (UTC)[reply]
@Surjection Thank you for pinging me. What you're describing reminds me of what I've read about Common Czech, and is indeed similar to просторе́чный speech in Russian; all three are based on dialectal features instead of a standard that was created from a literary register that is somewhat archaic in speech. Although maybe the Colloquial Finnish you're describing is less-nonstandard in native speakers' eyes than Common Czech or просторе́чный speech; I don't know. For this sort of thing, any label such as "vernacular", "spoken colloquial", "lower colloquial", "lower-register", etc. seems OK to me. The fact that it's written in chatrooms doesn't (in my eyes) detract from the "spoken" label. However, I see now what User:-sche wrote below and I'm OK with essentially what is proposed. We would create a special language-specific label that maps to a category such as Category:Colloquial Finnish. The label should not be colloquial but it could be Colloquial (capitalized) or Colloquial Finnish or similar, and displays as "Colloquial Finnish" which links to the appropriate Wikipedia article. The lowercase "colloquial" would remain as a generic label that is equivalent to "informal", categorizing under Category:Finnish informal terms. The same solution can be adopted for other languages with special lower-register vocabulary/morphology/pronunciation/etc. such as Common Czech. For Russian I'm thinking we should call the label "Low Colloquial" or something, on the principle that we avoid native-language grammatical terms like просторе́чный (which literally seems to mean "simple spoken" or similar). Benwing2 (talk) 18:46, 23 January 2022 (UTC)[reply]
@Benwing2 The lower-case "colloquial" would be meaningless for Finnish (when sources discuss "colloquial" in terms of Finnish, it is practically always used for what I'm referring to) so I don't see why the Colloquial Finnish label could also not be in lowercase, as variety labels are per-language to begin with. It'd also mean that there would be no need for a bot job. — SURJECTION / T / C / L / 19:05, 23 January 2022 (UTC)[reply]
@Surjection That means carving out a special exception for Finnish in the generic label code, where "colloquial" and "informal" are synonyms everywhere but Finnish, which I'd really rather not do. Furthermore, since this is a specific sociolect, IMO it should be capitalized to make that clear. Benwing2 (talk) 19:17, 23 January 2022 (UTC)[reply]
@Benwing2 There shouldn't be any special exceptions. The module should support per-language labels and prioritizing them over generic labels - if it doesn't, that just means it needs to be refactored, which is probably the case either way. The "colloquial" in "colloquial Finnish" is not usually capitalized. — SURJECTION / T / C / L / 19:21, 23 January 2022 (UTC)[reply]

──────────────────────────────────────────────────────────────────────────────────────────────────── @Surjection I'm sorry, I oppose this because IMO it is too confusing. Almost all subvarieties in Module:labels/data/subvarieties are capitalized, including the label Colloquial Singaporean English (an alias for Singlish). I really don't want to go down the path of having individual languages hijacking generic terms for their own uses. Benwing2 (talk) 19:29, 23 January 2022 (UTC)[reply]

@Benwing2 I don't think it's that much less confusing for editors than having Colloquial (as you proposed) be different from colloquial. As I said, if we distinguish the two, any uncapitalized colloquial labels on Finnish entries would simply be errors, which I would argue is even more confusing. — SURJECTION / T / C / L / 19:38, 23 January 2022 (UTC)[reply]
@Surjection Apologies, I didn't get your ping. (Did you add or edit the ping after adding your signature?) We disagree over what is considered confusing. My reasoning is this: If we are treating "colloquial" = "informal", we should do that everywhere and not make a special exception of this for Finnish. If we extend what you want to other languages, we'd end up with a mishmash of special per-language exceptions where certain labels are treated specially in certain languages and can't be used the way they are used in other languages. This seems a bad idea to me. After all, there is a colloquial (= informal) standard Finnish, which is different from capital "Colloquial Finnish". There is no support in Module:labels for making special language-specific exceptions for particular labels, and I argue this is for good reason. If you need me to do a bot run to change lowercase "colloquial" to capital "Colloquial", or to some label even clearer like "Colloquial Finnish" or an abbreviation thereof (compare the "AAVE" label), I can easily do that. Benwing2 (talk) 06:35, 25 January 2022 (UTC)[reply]
@Benwing2 "colloquial Finnish" is "Colloquial Finnish". As I already wrote, the term is not usually capitalized. Basically all uses of "colloquial Finnish", whether capitalized or not, refer to "Colloquial Finnish". This is why it'd be confusing for editors if colloquial and Colloquial somehow mean different things with only the latter actually being correct. As for the per-language label support, that is no longer true after I refactored the label data to split the subvariety data down by language. Since subvariety labels are supposed to have higher priority than regional labels (as the documentation stated even before I did my refactoring), it was necessary to make per-language labels have higher priority than general ones. The only difference now is that there are two tables for general and language-specific labels, while in the past they were all in one table. Now it would be perfectly possible to override colloquial for one language only. — SURJECTION / T / C / L / 10:53, 25 January 2022 (UTC)[reply]
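For readers trying to picture the lookup order described in the comment above, here is a rough sketch in Python. It is not the actual Module:labels code (which is written in Lua); the table names, field names, and the `resolve_label` function are all made up for illustration. It only assumes what the comment says: there is one table of general labels and one of language-specific labels, and the language-specific table is consulted first.

```python
# Hypothetical data, NOT the real Module:labels tables.
# General labels: "colloquial" and "informal" share one category (the merged state).
GENERAL_LABELS = {
    "informal": {"display": "informal", "category": "informal terms"},
    "colloquial": {"display": "colloquial", "category": "informal terms"},
}

# Language-specific labels, keyed by language code. The Finnish override
# below is the disputed proposal, shown only to illustrate the mechanism.
LANGUAGE_LABELS = {
    "fi": {
        "colloquial": {"display": "colloquial", "full_category": "Colloquial Finnish"},
    },
}


def resolve_label(lang_code, lang_name, label):
    """Return (display text, category name or None) for a label.

    Language-specific entries take priority over general ones.
    """
    per_language = LANGUAGE_LABELS.get(lang_code, {})
    if label in per_language:
        entry = per_language[label]
        return entry["display"], entry["full_category"]

    entry = GENERAL_LABELS.get(label)
    if entry is None:
        # Unknown label: display as typed, no category.
        return label, None

    # General categories are prefixed with the language name.
    return entry["display"], f"{lang_name} {entry['category']}"


if __name__ == "__main__":
    print(resolve_label("en", "English", "colloquial"))
    print(resolve_label("fi", "Finnish", "colloquial"))
    print(resolve_label("fi", "Finnish", "informal"))
```

Under these assumptions, the same label text categorizes differently depending on the language, which is exactly the behaviour the two sides disagree about: the mechanism is simple, but whether a generic label should be overridable this way is an editorial question the sketch does not settle.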
@Surjection Unfortunately you haven't convinced me of the correctness of overriding colloquial (or any similar label) for a single language. I thought about it some more and I don't really like capital Colloquial either as a label; I much prefer something like Colloquial Finnish that is unambiguous. But Colloquial capitalized to me is a lot less bad than overriding a common cross-linguistic label like colloquial and giving it a special language-specific meaning. As I mention elsewhere, this is opening a serious can of worms, whether or not you have implemented the support for it. At this point we have a difference of opinion that I don't see how we can easily resolve; you don't seem willing to budge from your viewpoint. Benwing2 (talk) 07:05, 26 January 2022 (UTC)[reply]
Yes, I want to make it clear that I'd still like an unmerger of informal & colloquial, and would personally be ~okay~ with "vernacular". "Spoken" isn't horrible, but isn't used enough imho. @Benwing2 AG202 (talk) 19:26, 23 January 2022 (UTC)[reply]
@AG202 And I am still opposed to an unmerger of these terms. Perhaps low-register or low register is the least confusing term if you need a generic additional register term. Benwing2 (talk) 19:32, 23 January 2022 (UTC)[reply]
"Substandard" would be misleading, as being colloquial does not mean that it's not standard. Also, see the dated label at substandard. AG202 (talk) 15:28, 22 January 2022 (UTC)[reply]
If Finnish in particular needs "colloquial" specifically, for "Colloquial Finnish", one idea is to have "(Colloquial Finnish)" as a label (specific to Finnish like "Doric" is Greek-specific). If Wikipedia is correct that "Colloquial Finnish" is "the standard colloquial dialect" and regional-dialect terms that are "colloquial" in the general sense are not "Colloquial Finnish", a label linking specifically to that article might be more appropriate than the general "colloquial" label would be. For Russian "low colloquial", maybe {{lb|ru|low colloquial}} should be the label, again linking to an explanation (that label too could be made language-specific if necessary). For other languages, like French, if it's not possible to find good English terms to denote one or more registers, we could always consider just using the language's own term(s) like populaire, linked to an explanation. We already do this in several situations, like Hebrew verbs are labelled pa'al and entire categories of entries are called by the Japanese terms Category:Wasei kango, Category:Wasei eigo. While I completely agree with AG202 (below) people shouldn't enforce a merger of registers in other languages they don't speak, I also think when it comes to how to label the registers, we should try not to invent idiosyncratic distinctions between English terms. - -sche (discuss) 06:54, 23 January 2022 (UTC)[reply]
I'll gladly create a "colloquial Finnish" label. Since subvariety labels have higher priority, this shouldn't even require mass-replacing all labels in Finnish entries, but I'd rather have a cross-language solution. w:Colloquial Finnish using the word "standard" in "standard colloquial dialect" is somewhat misleading, as colloquial Finnish is by no means standardized and varies by region, even by speaker, and regional and dialectal features are commonly mixed in to everyone's own variety of it. — SURJECTION / T / C / L / 10:17, 23 January 2022 (UTC)[reply]
Fair point about "colloquial" being the most common term used to denote some other languages' registers, despite the ambiguity in English. Bleh, I don't want to stop another language from making a customary distinction just because English doesn't make it, if we can't come up with a better term... but if we do split these labels (for any language) our glossary entries need to make the difference clear. (Perhaps attempting to write distinct definitions now would let us check whether the distinction we imagine really fits all the different languages it's proposed to apply it to, and (ideally) meets our criteria for inclusion in the main-namespace entries on colloquial and informal.) Maybe once we define a distinction, we would benefit from systematically reviewing English terms which use one or the other, too. (It's not like it's the only case where labels are liable to be haphazardly interchanged on English or other entries: there's not a hard-and-fast distinction between "uncommon" and "rare", and there's no maintained distinction between "regional" and "dialectal", so we're always going to have to monitor the usage of labels, whether that includes these labels or not.) - -sche (discuss) 02:18, 21 January 2022 (UTC)[reply]
No point, since it was only about what attribute is used together with the term for the whole language rather than in reference to lexical items. The more so since the matter is about titles of books and catchwords: at that point it is a meme. Language learning books are just titled a certain way. It’s not different from YouTube clickbait, spammers know what they have to use, and academia is also rigged at least in so far as the titles of studies are concerned—gotta get the scarce attention somehow, and perhaps “colloquial” is the more high-falutin term that makes you sound more intelligent, while informal seems like something unscientific and not förderwürdig. Of course this is an argument that it is all the same ting, like scent and perfume and stuff. Fay Freak (talk) 03:38, 21 January 2022 (UTC)[reply]
Most good English single-language print dictionaries say what they mean by the labels, symbols, abbreviations they use. (Online dictionaries, not so much.) That, in turn, means that they expect their lexicographers to apply all those consistently. They also seem to avoid items subject to confusion in dictionary use. They do take the trouble to try to provide users with this kind of information despite the fuzziness of category boundaries. Unfortunately, we don't have any good means of enforcing standards, even within each language we cover. The more rules we have the less likely we are to get contributors. We have the extra problem of trying to accommodate multiple languages. Fortunately we have technology to help us. DCDuring (talk) 14:11, 21 January 2022 (UTC)[reply]

I oppose unmerging English at least. Also my preferred term for English is "informal". I agree with -sche's suggestion to define the terms we're going to use and update the glossary before implementing anything. In Romance languages we could combine informal/colloquial/popular into one category but maintain a formal-informal distinction for verb forms and pronouns, chiefly. Ultimateria (talk) 04:06, 22 January 2022 (UTC)[reply]

@Ultimateria I'd oppose a merger for French for the reasons I've mentioned above. They're distinct registers. AG202 (talk) 04:25, 22 January 2022 (UTC)[reply]
@AG202: The Trésor de la langue defines (roughly) familier as language used among family and friends, and populaire as language used by the lower classes. The distinction is interesting but probably meaningless in practice. Does the bourgeoisie avoid popular words with their friends and family? Do poor French people not use formal language if they go to university? IMO it all amounts to formal-informal situations. Ultimateria (talk) 17:40, 22 January 2022 (UTC)[reply]
@Ultimateria It's not meaningless if they're generally considered and taught as different registers, and actually yes, the bourgeoisie are often said to avoid popular words even with friends and family; I remember distinctly reading about that in my French Literature courses and the dynamics that it can create. I'd recommend the book Diversité : La nouvelle francophone, if you want some examples of the differences. For more info on the registers, niveaux de langue, Les registres (ou niveaux) de langue, the French Wiktionary, LES NIVEAUX DE LANGUE, and more. Sure some words might fluctuate between the two depending on the source, but it's a distinction that's French Sociolinguistics 101, so I'd really recommend reading into it more before making a sweeping policy and take the suggestions of editors more familiar with the language. We gotta stop bringing assumptions from the languages we're comfortable with into policies that cover other languages, as that's what got us here in the first place. AG202 (talk) 18:27, 22 January 2022 (UTC)[reply]
I created Wiktionary:Votes/2022-01/Label for lower register and set it to start two weeks from now (rather than one week) to allow discussion on the voting procedure I went for. — SURJECTION / T / C / L / 09:58, 24 January 2022 (UTC)[reply]
@Surjection, Fytcha, Benwing2: A question regarding the vote. If it does not pass, would the status quo obtained by the RFM discussion of the two labels being merged be reinstated fully rather than the half-half status (labels unmerged, categories merged), along with language specific label(s) like "Colloquial Finnish", etc.? —Svārtava (t/u) • 04:35, 2 March 2022 (UTC)[reply]
@Svartava, Surjection What is the current status quo? I haven't been looking to see exactly what changes Surjection unilaterally made. Cross-linguistically, informal and colloquial should stay merged, but I am OK with adding language-specific labels to express low-register terms, although I don't think colloquial is the right word for reasons I've expressed before. Benwing2 (talk) 07:23, 4 March 2022 (UTC)[reply]
@Benwing2: I was referring to Special:Diff/65378133/65378896 where the colloquial-informal labels were unmerged, reverting to the state of inconsistency. The "current status quo" should be the one we got from RFM (i.e. to merge them), and should this vote fail to change it, I think those labels should be re-merged. —Svārtava (t/u) • 08:01, 4 March 2022 (UTC)[reply]

Should we have entries for Turkish predicative forms?

A brief lesson in Turkish grammar. As is well known, Turkish is a very synthetic and highly agglutinative language. Pinker gives the example şehir +‎ -li +‎ -leş +‎ -tir +‎ -eme +‎ -dik +‎ -ler +‎ -imiz +‎ -den +‎ -siniz.[8] While probably constructed for the purpose of illustration, this word is not unnatural. It is what one would expect a Turkish speaker to utter when saying, in Turkish, “you are one of those whom we can’t turn into a town dweller”. The message may be contrived; its expression as a sentence is not. Clearly, we should not attempt to list all possible Turkish words that can be synthesized. Listing those that can be attested would result in a completely haphazard collection.

Turkish has no copular verb like the English be. Any noun phrase or adjectival phrase can be used as a predicate, and then assumes a predicative form (which can be the same as the bare predicate: “makine bozuk” is an acceptable complete sentence for saying “the machine is out of order”). Enclitics added to the predicate indicate the person, but the third person allows a null form. In the film İmparatorluk Geri Dönüyor, the Turkish version of The Empire Strikes Back, Darth Vader says, “Hayır, ben senin babanım” – “No, I am your father”. Here, babanım is the first-person predicative form of baban (“your father”).

We do not have an entry for baba +‎ -n +‎ -ım = “father” + “of you” + “I am”, nor should we; the possible cases are endless, like ben dünyanın en kötü babasıyım - “I am the world's worst father”, which should be parsed as [ben dünyanın en kötü babası ] + -(y)ım, not as [ben dünyanın en kötü] [babasıyım]. Pinker’s şehirlileştiremediklerimizdensiniz is a predicative form. The question now is:

Should we have entries for Turkish predicative forms, and if so, which ones (and why those)?

(The question arose at RfD. For comparison, for Italian we do not list bevimi = bevi +‎ -mi and mangiami = mangia +‎ -mi (and countless other similar imperative + enclitic forms), even though attestable.[9] We do list some, though, like amami, but not amalo,[10] so the selection appears to be haphazard.)  --Lambiam 13:17, 19 January 2022 (UTC)[reply]

  • We should allow them at least in cases of homographs within Turkish, and probably in cases of homographs with other languages. I recall seeing a mention of not allowing Latin forms suffixed with -que, forcing readers of the Aeneid to split virumque on their own. Vox Sciurorum (talk) 18:36, 23 January 2022 (UTC)[reply]
    For all nouns ending in consonants, the first person singular predicative form is always a homograph with the first person singular possessive, but I guess you meant more like "homograph with a different lemma or form of a different lemma". — Fytcha T | L | C 19:16, 23 January 2022 (UTC)[reply]
I think the perfect solution on a perfect Wiktionary would be to have something like a word decompiler built into the search engine so that, when searching for a word (word here (incorrectly) defined as that which is separated by spaces in the predominant orthography) like babanım, the user is presented with a virtual page containing all possible decompositions and maybe even synthesized English equivalents. The decomposition software alone would actually not be that hard to write, but I think the MediaWiki software itself doesn't support such virtual pages.
For the time being, I agree that we shouldn't allow entries from a limitless class of words to be added ad nauseam. This extends even further than just predicative forms, I also don't want to include words such as çalışıldığımı etc.
Pinging also @İtidal, Sabri76. — Fytcha T | L | C 19:31, 23 January 2022 (UTC)[reply]
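To give a rough idea of what the decomposition step of such a "word decompiler" might look like, here is a minimal sketch in Python. It is a toy under stated assumptions, not a real Turkish morphological analyser: the tiny stem and suffix tables and the `decompose` function are hypothetical, it only matches surface forms, and it knows nothing about vowel harmony or which suffix orders are grammatical.

```python
# Toy lexicon for illustration only (surface forms with rough glosses).
STEMS = {"baba": "father", "şehir": "town"}
SUFFIXES = {"n": "your", "ım": "I am / my", "m": "my", "li": "-dweller"}


def _split_suffixes(rest):
    """Yield every way to split `rest` into a sequence of known suffixes."""
    if not rest:
        yield []
        return
    for suffix in SUFFIXES:
        if rest.startswith(suffix):
            for tail in _split_suffixes(rest[len(suffix):]):
                yield [suffix] + tail


def decompose(word):
    """Return all stem + suffix segmentations of `word` found in the toy lexicon."""
    results = []
    for stem in STEMS:
        if word.startswith(stem):
            for tail in _split_suffixes(word[len(stem):]):
                results.append([stem] + tail)
    return results


if __name__ == "__main__":
    for pieces in decompose("babanım"):
        print(" + ".join(pieces))
```

A real tool would need a full lexicon and the phonological rules; the point here is only that exhaustively enumerating segmentations is a small recursive search.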
The Perseus website has such a word decompiler for Latin and Greek, called a “word study tool”. Here is its application to the word ὑδάτων. If the MediaWiki software does not allow this, we may perhaps host this elsewhere and make it a selection option next to MediaWiki search, Google, Wikiwix, Bing and Yahoo.  --Lambiam 12:04, 26 January 2022 (UTC)[reply]
The more I think about it, the more I like my idea with the virtual pages. They could not only be used for such agglutinated "words" but also, say, for 15763, which would then present the user with the spelled-out forms in various languages. I'm not sure about realizing this as a new search engine; my thinking was more along the lines that they should seamlessly integrate into the rest of Wiktionary as though they were actual, real pages. It could be implemented as a global Lua module that would be invoked as a fallback if a search query doesn't return a page. Unfortunately, I just about missed the deadline of the community wishlist with this, but honestly, my hopes for them implementing this are 0 anyway. When/if I finally write my custom Wiktionary front-end (because the web interface lacks a lot of very basic features), I'd also implement a proof of concept for this idea. — Fytcha T | L | C 15:48, 27 January 2022 (UTC)[reply]
How about this? "Predicative and possessive forms of Turkish nouns and adjectives may be included as separate non-lemma entries when (1) they are homographs of an existing term in any language, meaning the page would exist anyway, or (2) the form is especially commonly used, like arasında and canım. Otherwise they may be listed in an inflection table on the lemma form." Vox Sciurorum (talk) 16:08, 25 January 2022 (UTC)[reply]
Also if the form is irregular (benim, suyum), or one of a set of homographs of inflected forms of terms with different etymologies, such as elması.  --Lambiam 12:17, 26 January 2022 (UTC)[reply]
Is there consensus to add a paragraph along these lines to WT:ATR? Also pinging @Djkcel, Vox Sciurorum.  --Lambiam 16:18, 27 January 2022 (UTC)[reply]
Yes, I think that looks good. When in doubt, would they take the non-lemma to the Tea Room? DJ K-Çel (contribs ~ talk) 17:38, 27 January 2022 (UTC)[reply]
I added some advice to Wiktionary:About Turkish. I hope we don't get to arguing over individual words. I wouldn't say anything unless there was a pattern, like a bot on a mission to turn black links blue. Vox Sciurorum (talk) 21:46, 27 January 2022 (UTC)[reply]
There is some current arguing (or perhaps lack of arguing) over individual words at WT:RFDN § taklitçi olmak and WT:RFDN § sözcüyüm.  --Lambiam 09:06, 28 January 2022 (UTC)[reply]
I propose to add to Wiktionary:About Turkish "Words formed by adding the suffixes -dir (is), -ki (possessive marker), and -le (and, with), in the senses given, and their alternative forms should not normally be included." This will cover taklitçidir. Vox Sciurorum (talk) 14:33, 28 January 2022 (UTC)[reply]

Rhymes and hyphenation in affix entries

What are your thoughts about providing rhymes and hyphenations at affix entries (especially prefixes and infixes)? Thadh (talk) 15:47, 19 January 2022 (UTC)[reply]

Anything other than a final suffix that always takes the accent is incompatible with our system. That said, hemidemisemiquaver comes to mind as an example of a word with rhyming prefixes. Chuck Entz (talk) 15:56, 19 January 2022 (UTC)[reply]
What if someone wants to write poetry about prefixes, treating them as if they were words in and of themselves? A bit esoteric I know, but plausible. Vininn126 (talk) 16:07, 19 January 2022 (UTC)[reply]
I’m sure if a poet wanted to do that they wouldn’t need the Wiktionary’s help… — SGconlaw (talk) 11:28, 20 January 2022 (UTC)[reply]
Perhaps not, I was just trying to propose a potential situation in which such information might be useful. Vininn126 (talk) 18:17, 20 January 2022 (UTC)[reply]
As far as hyphenation, I see no reason to treat affixes which regularly hyphenate a certain way differently than anything else that does, so I think it'd be just as fine to indicate hyphenation on e.g. -phobia as on phobia. For rhymes, as Chuck says, including rhyme info for anything other than a final suffix that always takes the stress/accent won't normally be possible (just like we have trouble notating in etymologies when a suffix includes a feature like "umlauts the stem", which a few dictionaries notate like -̈er). Including it for prefixes, eh... it's low utility, but is it actually wrong to say demi-, semi-, demi, and semi rhyme? ... - -sche (discuss) 02:36, 21 January 2022 (UTC)[reply]
@Thadh, -sche Why not implement alliteration? For Finnish, I'd see that as being as important as rhyme info because the vast majority of traditional Balto-Fennic poetry is alliterative. brittletheories (talk) 09:25, 21 January 2022 (UTC)[reply]
I actually think that we need to make a system for languages which don't use rhymes in the "traditional" (i.e. Germano-Italic) way, but that's a story apart. Thadh (talk) 12:14, 21 January 2022 (UTC)[reply]
For reference, here's the first verse of the first poem in Kalevala:
Mieleni minun tekevi, aivoni ajattelevi
lähteäni laulamahan, saa'ani sanelemahan,
sukuvirttä suoltamahan, lajivirttä laulamahan.
Sanat suussani sulavat, puhe'et putoelevat,
kielelleni kerkiävät, hampahilleni hajoovat.
brittletheories (talk) 09:29, 21 January 2022 (UTC)[reply]
In languages with phonetic spelling at least, alliterations can already be found by using the category pages, which have built-in alphabetical sorting. (I don't think this is that good of an argument against listing alliterations, although there may be other ones. I'm just trying to provide some context on how this goal can already be achieved.) 70.175.192.217 09:37, 21 January 2022 (UTC)[reply]
For Old Norse, too, stave-rhyme/alliteration was more used (and so would be more useful) than end-rhyme, and for my part I'd be fine with adding it, but it does seem like a separate issue (doesn't just affect affixes). There are interesting language-specific quirks to it, too, like in Old Norse all vowels stave-rhyme with each other, but st and sk don't stave-rhyme with each other. - -sche (discuss) 21:24, 24 January 2022 (UTC)[reply]

Attestation of verbs vs. participles

In general, if Xing and Xed are both attested, is that sufficient to justify the creation of a verb X? If not, can uses of participles count in some way toward a verb, or must all citations of the verb be in unambiguously non-participle-based forms? What about cases where it's unclear whether the participles are being used in verby or adjectival ways? 70.175.192.217 01:22, 22 January 2022 (UTC)[reply]

I don't remember that we've formally addressed this. Obviously it would be nice to show that, say, Xs and to X usage existed. But, if X is also a noun, then it could be tedious separating the noun instances from the verb instances. I would be inclined to accept attestation of both Xing and Xed as sufficient, especially if there are three citations for each definition and three citations for both of those forms among all the definitions and neither was attestable as a true adjective in their early use. DCDuring (talk) 02:44, 22 January 2022 (UTC)[reply]

Question about the Affiliates' role for the Call for Feedback: Board of Trustees elections

Hi All,

Thank you to everyone who participated in the Call for Feedback: Board of Trustees elections so far. The Movement Strategy and Governance team suggested another question was still under discussion. As of today, we announce the last key question:

How should affiliates participate in elections?

Affiliates are an important part of the Wikimedia movement. Two seats of the Board of Trustees due to be filled this year were filled in 2019 through the Affiliate-selected Board seats process. A change in the Bylaws removed the distinction between community and affiliate seats. This leaves the important question: How should affiliates be involved in the selection of new seats?

The question is broad in the sense that the answers may refer not just to the two seats mentioned, but also to other, Community- and Affiliate-selected seats. The Board is hoping to find an approach that will both engage the affiliates and give them actual agency, and also optimize the outcomes in terms of selecting people with top skills, experience, diversity, and wide community’s support.

The Board of Trustees is seeking feedback about this question especially, although not solely, from the affiliate community. Everyone is invited to share proposals and join the conversation in the Call for Feedback channels. In addition to collecting online feedback, the Movement Strategy and Governance team will organize several video calls with affiliate members to collect feedback. These calls will be at different times and include Trustees.

Due to the late addition of this third question, the Call will be extended until 16 February.

Join the conversation.

Best,

Movement Strategy and Governance --Mervat (WMF) (talk) 15:11, 22 January 2022 (UTC)[reply]

Betawi: language or dialect?

@-sche, Metaknowledge I hope I have pinged the right people. I came across errors in the page kerupuk and tried to clean them up. In the process I noticed we have essentially duplicate entries under two languages, Betawi and Indonesian. The Indonesian entry includes two not-obviously-related definitions, which are the same two definitions listed under Betawi, and the Indonesian entry is furthermore tagged with the "Betawi" label. This suggests we have confusion over whether Betawi is a language or a dialect of Indonesian. The Wikipedia entry (Betawi language) is similarly vague. I have no opinion here other than to note that both options of handling Betawi as a separate L2 entry or under Indonesian with the appropriate label are viable, and the latter option does not necessarily take a hard stand on whether Betawi is a language or a dialect (compare what we do with "Chinese"). Benwing2 (talk) 21:15, 22 January 2022 (UTC)[reply]

@Benwing2 and @Austronesier, Xbypass. The mess at kerupuk was perpetrated by @Indigenouswikicom, so the current state of the entry shouldn't be read as indicative of anything resembling normal practice. They were making a mess of lots of entries up to the time they were blocked from mainspace (see WT:Etymology scriptorium/2021/January#langit for details), but Betawi has been treated as a language here for a decade and a half. That part, at least, isn't their fault. Chuck Entz (talk) 22:10, 22 January 2022 (UTC)[reply]
@Chuck Entz Thanks. I can't find any reference to langit or User:Indigenouswikicom in Jan 2021 ES, is there some other page you meant? Benwing2 (talk) 22:54, 22 January 2022 (UTC)[reply]
@Benwing2: The correct link is WT:Etymology scriptorium/2022/January#langit. —Μετάknowledgediscuss/deeds 00:01, 23 January 2022 (UTC)[reply]
And it's an elucidating discussion. I'd say continue to treat Betawi as a separate language (as we were doing, as Chuck says) until there's a reasoned argument / evidence to do otherwise from someone who knows more about what they're talking about than Indigenouswikicom. - -sche (discuss) 22:12, 23 January 2022 (UTC)[reply]
@Benwing2, Metaknowledge, Austronesier This was my question during earlier proposals of this kind (e.g. this 2018 proposal) about the definition of the Malay language, which is sometimes politically influenced, as these languages form a dialect continuum (as I explained in this picture). The status quo is that those languages are treated as separate languages.--Xbypass (talk) 02:01, 24 January 2022 (UTC)[reply]
I would prefer to keep Betawi as a distinct language, since the basilect as spoken by ethnic Betawis is lexically and structurally quite distinct from Standard Malay/Indonesian. This whole thing is however tricky because Betawi also serves as basilect for Jakarta-based colloquial Indonesian, forming a register continuum that goes up all the way to standard Indonesian. There is some vocabulary that is largely confined to the Betawi community, so these should only get a Betawi lemma, while other words (and also pronunciation variants) that are used in colloquial Indonesian should further be listed as Indonesian lemmas, labelled Jakarta (if only used in Greater Jakarta) or colloquial if widely used and accepted in all other parts of Indonesia. It's a bit like Scots lemmas which also are listed as English lemmas with label Scotland. –Austronesier (talk) 12:13, 24 January 2022 (UTC)[reply]
Yes, I prefer to keep all as distinct languages for easier management. –Xbypass (talk) 01:47, 25 January 2022 (UTC)[reply]

“rare” versus “uncommon”

I do not understand why we have two separate categories for rare or uncommon terms/senses: Category:Terms with rare senses by language, Category:Terms with uncommon senses by language. What’s the rationale for keeping different labels? ·~ dictátor·mundꟾ 21:21, 23 January 2022 (UTC)[reply]

Rare is even rarer-er than uncommon. Rare means not many speakers would know it, uncommon means that people might know it, but don't often use it. Vininn126 (talk) 22:04, 23 January 2022 (UTC)[reply]
Before ~2012, "uncommon" redirected to "rare"; I don't recall if I was the one who split them or if someone else did, but see Wiktionary:Beer parlour/2012/March#What_does_rare_mean? (there was also another discussion, with Dan Polansky IIRC, which I can't find offhand). Theoretically, "rare" is stronger than just "uncommon", but in practice both and ad-hoc qualifications like {{lb|en|very|rare}} / {{lb|en|somewhat|rare}} are applied without a consistent standard, and I see things labelled "rare" that I might label "uncommon". Would it be better to merge them or to set quantitative criteria, like "[for English / WDLs / major languages] a word with only ≤N cites is rare", "≤X cites is uncommon", "a spelling 1/Yth as common as the main spelling is very rare"? - -sche (discuss) 22:08, 23 January 2022 (UTC)[reply]
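For what it's worth, the "quantitative criteria" option floated here is easy to state as code. The sketch below is in Python and is only an illustration: the `frequency_label` function is hypothetical, and the cutoffs are deliberately left as parameters, since no actual values for N or X have been proposed in this discussion.

```python
def frequency_label(cite_count, rare_max, uncommon_max):
    """Map a citation count to a frequency label using two caller-supplied cutoffs.

    rare_max and uncommon_max are placeholders for the N and X of the
    proposal; they are not established Wiktionary thresholds.
    """
    if rare_max >= uncommon_max:
        raise ValueError("rare_max must be smaller than uncommon_max")
    if cite_count <= rare_max:
        return "rare"
    if cite_count <= uncommon_max:
        return "uncommon"
    return None  # no frequency label needed
```

Whether citation counts are a sensible proxy for frequency at all (as opposed to corpus data or editor judgement) is a separate question that a function like this does not answer.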
I like them separated and I'd even be in favor of creating a third category for very rare terms. Compare (uncommon) Lybien, (rare) eulenspiegelhaft and (very rare) Umdreherin. The two most important digitally-available German monolingual dictionaries (DWDS and Duden) contain much more granular frequency data than we do. — Fytcha T | L | C 22:26, 23 January 2022 (UTC)[reply]
@Fytcha Definitely opposed to creating any more such categories, and I'd actually rather merge "rare" and "uncommon", since it's clear that they aren't being used at all consistently. Benwing2 (talk) 05:43, 24 January 2022 (UTC)[reply]
No objection to a merger of “uncommon” and “rare”. Another thing I’d point out is that the rarity of a word changes over time. A word that was common in the 17th century can become rare in the 21st century. However, I would like to see “rare” confined to terms that have consistently been of low usage; a word which has declined in popularity should be labelled either “archaic” or “obsolete” as the case may be. We should thus not use “now rare”. — SGconlaw (talk) 10:32, 24 January 2022 (UTC)[reply]
@Benwing2: I oppose the argument that features should be removed because some editors misapply them. As I pointed out, German dictionaries record this information very meticulously ([11], [12] ("Häufigkeit:")) so if anything, we should improve our coverage of frequency data / categorization instead of doing away with it. It's also trivial to define some hard cutoffs (these could even be applied by bot) but I prefer leaving the editors some discretion (because the dictionaries likely underestimate the frequency of colloquial terms or neologisms for instance). — Fytcha T | L | C 13:43, 24 January 2022 (UTC)[reply]
If some language editors feel that the distinction is noteworthy in certain languages, then I think those languages could be exempted from the merge of the labels. ·~ dictátor·mundꟾ 15:55, 24 January 2022 (UTC)[reply]
@Fytcha By your argument, let's make 10 shades of uncommonness, each with its own separate category. Benwing2 (talk) 04:21, 26 January 2022 (UTC)[reply]
@Benwing2: Not really, I've argued for 2->3, not for N->N+1. Of course, they will cease being meaningful at a certain point (Duden only provides 5 frequency levels for instance; the graphs on DWDS would probably allow a bit more than that but then we'd have to define hard cutoffs, which I dislike as stated), but having just 3 less-than-common frequency categories is not only well within the reach of standard references (as I've sufficiently proven), it's also what easily tracks with Sprachgefühl. That said, I won't go out of my way to create the very rare category (I agree that people already misuse / underutilize the two we have), I just oppose the merger. — Fytcha T | L | C 04:46, 26 January 2022 (UTC)[reply]
@Fytcha Arguably (*maybe*), we could define different numbers of frequency levels for different languages, but IMO this sort of thing is a can of worms. Before accepting extra frequency levels in German, I'd need to see evidence that *all* dictionaries (not just Duden) are consistent and in agreement on these levels. In Italian, for example, dictionaries make various distinctions like "rare" vs. "uncommon" vs. "less common" but different dictionaries are in total disagreement as to which label to apply to a particular word, making the distinction basically meaningless. Another data point: dictionaries in many languages (e.g. Russian, Italian) do not consistently distinguish "archaic" from "obsolete" from "dated", even though in my mind there is a clear distinction between "archaic" and "obsolete" that is much less murky than "rare" vs. "uncommon" (archaic terms are still in use in specialized registers and are generally recognized by native speakers, while obsolete terms are completely out of use and generally not recognized by native speakers). So the fact that we have this distinction is a pain in the ass for these languages; unless you are a native speaker you more or less have to guess which label applies. Benwing2 (talk) 04:57, 26 January 2022 (UTC)[reply]
What Vininn126 said. The distinction applies to every language and is so with a consistent meaning; the quality of application varies by acquaintance of editors with the language, which is natural and acceptable, as they wouldn’t excel more at maths and corpus linguistics, and the distinction between these labels is frankly not a core issue of our dictionary content, actually we want quotes to show usage but we can’t pay the effort.
What I find more complicated is if rarity mixes with foroldedness. The distinction which Benwing2 sees for “archaic” and “obsolete” I see too – it seems also similar to the frequency distinction in so far as it uses to be defined by people possibly knowing, which is actually a statistical likelihood in either case: even possession as defined by law is a statistical likelihood of a person being able to exercise power over a thing, and one can’t avoid debatable gray areas. But now the two blurs combine and one sees the question arise which historical corpora are to be considered for something being deemed rare. It may be that a word is more frequent in a very specific corpus which consists of obsolete texts which one wouldn’t read if not specialized in a historiographic field so that the term rare is if the general corpus of obsolete texts is considered, while one is less likely to call a term rare if one is sure that currently a very specific field drops the term with regularity because one does not want to other the very specific field as outside general use, because general use doesn’t really exist while general reading of literature of the past exists, also as a function of progressing specialization of individuals over the history of humans, everyone being more and more in his own rabbithole and bubble and echochamber unlike at times when people went out and conversed with everyone in the village.
This is all a complicated way to say that the concept of rarity undergoes a paradigm shift in its meaning depending on the times, with their language access or media exposure, to which the label is applied, and it also raises the question of whether it is constructed from an ex ante or an ex post view (e.g. from what has remained and is prominent from a time): was the term actually rare, as perceived by people from that time, or is it rare in the kinds of texts that remain? The smaller the corpus of a dead language is, the more rarely we will be certain enough about either, of course, to ever use the rare or uncommon label.
Then again, in those dead languages, one wants to see “rarity” as relative to other terms; this is how I understand Category:Akkadian terms with rare senses. Currently it has four entries, four of which are rare spellings; these should actually be in Category:Akkadian rare forms, which failed to happen as you could not use {{sumerogram of}} and {{rare spelling of}} combined, so we have to add to the modules a label “rare spelling” displaying as “rare” but categorizing differently. But this labelling of spellings as rare shows how detached the uses of the “rare” label can be from actual, absolute usage, only having meaning relative to attestations in the few available texts, which is distinct from absolute rarity in a corpus in so far as the corpus of that dead language is representative of no field.
In sum, rarity within a corpus can be devised intertextually or not, and then it can be posited as having been a property independent of the corpus, and this judgment principally has an ex ante or ex post modifier, plus a modification of the meaning of the rarity label due to the fact that we apply a word of today to circumstances of a past where people had different exposure. This is all in addition to the fact that the rarity or uncommonness label was vague in the first place, i.e. for the well-known languages of today, which we do not see through filters that much but even give original interpretation to. Fay Freak (talk) 05:59, 26 January 2022 (UTC)[reply]

"Norwegian" lemmas

We have 1,739 lemmas in Category:Norwegian lemmas. Personally I think Wiktionary should merge Bokmål and Nynorsk, but the Norwegian editors seem opposed to it, so what do we do with these lemmas? Most of them are proper names created a while ago, but some have been created recently. Examples:

This leads to these questions:

  1. Why is there a "no" language code if it shouldn't be used?
  2. Should we create an abuse filter to block creation of new L2 Norwegian entries?

BTW it might be possible to bot-convert all the proper name entries to Bokmål and/or Nynorsk, if that is desired; someone would have to specify exactly how to do that, however.

Benwing2 (talk) 05:41, 24 January 2022 (UTC)[reply]

I don't know enough to select the written language. Perhaps those Norwegian editors can put the correct codes and templates on. The language my entry was for was the same as whatever is used in https://no.wikipedia.org/wiki/Stiftamt Graeme Bartlett (talk) 07:34, 24 January 2022 (UTC)[reply]
When the forms are entirely spoken/dialectal, I’ve used “no”, but if it shouldn’t be used at all, I suggest moving them to “nn”. Merging Nynorsk and Bokmål wouldn’t make much sense as the two written standards have different origins. One (Nynorsk) is closely tied to the spoken language and the dialects, while the other (Bokmål) continues a centuries old Danish written tradition. The etymology of the same word may be different in Nynorsk and Bokmål. Eiliv (talk) 08:46, 24 January 2022 (UTC)[reply]
Editing so-called dialectal entries, I've practiced differently across time. Lately, however, I've been using “nn” rather than “no”. The National Library of Norway categorizes most dialectal works, e.g. texts of Alf Prøysen. But more practically, it has to do with attestation and other criteria for inclusion. Words considered dialectal usually occur in passages considered Nynorsk as a whole. With some searching (www.nb.no is a good option), “entirely spoken” forms are often found in writing (Bokmål or Nynorsk). Ofkosinn (talk) 10:49, 24 January 2022 (UTC)[reply]
I think that the “no” tag serves its purpose as a kind of Translingual “mul” tag for Bokmål and Nynorsk, and that proper noun entries are a good fit for this purpose. Therefore, I disagree on both points. Still, I think the “no” tag should be used sparingly. Many of these entries are short and they would usually also be identical between Nynorsk and Bokmål (e.g. Andeby or Per). I do, however, see issues with this:
  1. In texts, proper nouns are usually attested either in Bokmål or Nynorsk. This point is also somewhat contrary to the one in my comment above.
  2. Linking to other entries: For the etymology, is Andeby a compound using the Bokmål or the Nynorsk entry of and (“duck”)? Should a common “no” entry of Frankrike (“France”) use the “nb” or “nn” tag of fransk (“French”)? At the moment, Andeby evades this problem because of its brevity.
Entries such as Per do not run into the second problem. Also, my understanding generally is that a lot of proper nouns, especially given names, are widely considered neither Bokmål nor Nynorsk, merely Norwegian. Ofkosinn (talk) 10:49, 24 January 2022 (UTC)[reply]
Even personal names have some differences in the two written languages, mainly when addressing several people with the same name (e.g. dei to Perane). I’ll admit it doesn’t have much relevance here though, as the Norwegian entries don’t show declension of personal names. Eiliv (talk) 15:50, 24 January 2022 (UTC)[reply]
I think however, that the “no” tag is sometimes left by what are probably just editing mistakes. I fixed trivelig. Ofkosinn (talk) 11:01, 24 January 2022 (UTC)[reply]
There are some Norwegian entries (proper nouns for personal names, surnames and place names) which are common to both Bokmål and Nynorsk and which will have to stay in no. On the other hand, these are the two official languages in Norway; the same word can occur in both languages but inflections differ between the two, so entries for each language are necessary. There is a third unofficial language, Riksmål, which is closer to Danish but replaced by Bokmål - it must be remembered that Norway was under Danish rule for a long time. But we don't make a habit of recording it. The character aa was replaced by å over 70 years ago, so any occurrences of aa are obsolete. DonnanZ (talk) 15:27, 24 January 2022 (UTC)[reply]
Re "names which are common to both Bokmål and Nynorsk which will have to stay in no": to be devil's advocate for a second, if they're common to both, why would they have to be no; couldn't they just be listed as both nn and nb? I mean, I think (like Benwing) Norwegian should be merged with Norwegian, but as long as Norwegian and Norwegian are being treated as separate languages, I don't see why a name being attested in both would prevent it from being assigned to nn and/or nb. - -sche (discuss) 21:47, 24 January 2022 (UTC)[reply]
If you want to do it that way in order to eliminate "no", by all means do so. I didn't know how to tackle them. DonnanZ (talk) 00:34, 25 January 2022 (UTC)[reply]

Revisiting the Bokmål vs. Nynorsk separation

I would like to get an unofficial poll. Who actually still opposes merging the two, vs. who favors merging the two? The fact that we have merged "Chinese", which represents lects far more different than Bokmål vs. Nynorsk, shows that the arguments for keeping them apart because "they have different declensions" (or genders, or what have you) are spurious. If a word exists in both, and has different declensions, we can easily show both declension tables together. If meanings are different, or a word exists in one but not the other, we can tag using the appropriate labels. Benwing2 (talk) 04:26, 26 January 2022 (UTC)[reply]

Updating CFI for the Internet and Offensive Material

Following up on the two previous discussions as well as the numerous CFD's, I think we should maybe consider a vote on two things, and discuss what exactly we can vote on here, however controversial they may be.

Problem 1:

There is a growing desire to update the CFI for internet content so that it covers more than just Usenet, which has a very niche user group and as such niche language, while we currently exclude certain popular sites.

Potential solutions

1) Downgrade usenet's citability

2) Upgrade other sites' citability by

2a) Allowing for the usage of something like Internet Archive or
2b) Somehow making our own with Wikisource? Though this seems like rather a stretch.

Problem 2: There has been quite a bit of talk over how we should handle offensive content; see the BP discussion with m*nstr*l as word of the day and a CFD of D*rky C*ntin*nt, which also connects to our CFI discussion. The latter is technically citable, and perhaps there is a need for some way to warn the reader of incredibly offensive citations, asking for approval before showing the quotations? The alternative solution would be to move the quotations to the citations page. One problem with this is: where is the line for adding such a filter? Vininn126 (talk) 10:26, 24 January 2022 (UTC)[reply]

For point two:
I'm assuming "d*rky c*ntin*nt" stands for "darky continent" (so Africa?) and not, for example, "dorky cuntinent". Before saying anything else, I must ask you to use recognisable names for what you discuss, as your current conduct is Anglo-centric and noninclusive.
As for the problem itself, I'd only like to remind those who will participate in future discussion on the topic that Wiktionary is a dictionary, and that as such her aim is to provide the reader with the necessary semantic information to understand what is said or written. This should always come first. According to our CFI "[a] term should be included if it's likely that someone would run across it and want to know what it means". All manner of self-censorship should be avoided, as it only lessens the worth of our project as a dictionary. Also, even if we do decide to bowdlerize ourselves, we should be wary of where to begin. We, a group of online strangers, do not have the coherence or professionalism to decide what use is too offensive for the public. brittletheories (talk) 10:57, 24 January 2022 (UTC)[reply]
Are you referring to the fact I added asterisks?
Also, I'm not saying we shouldn't include offensive terms - we absolutely should. I'm just saying we might want to consider how we present it to the reader. Vininn126 (talk) 11:01, 24 January 2022 (UTC)[reply]
Yes, I'm referring to the asterisks. Having spent time on the anglophone side of the internet, I could figure it out, but I'm sure my mother, who does speak English well enough to read a novel or use a dictionary, would probably have trouble deciphering "d*rky c*ntin*nt". Censorship of this sort does not only lessen the pragmatic impact of a phrase but can also conceal its semantic information. I understand that you don't advocate for such censorship in the mainspace, but I think it's a good parallel. No reduction in semantic information is acceptable for the purpose of passing cultural taboos or political whims. brittletheories (talk) 15:51, 24 January 2022 (UTC)[reply]

As a process thing, I would recommend splitting these two topics into two distinct posts but that said, I definitely agree that there should be some kind of principle of least shock or offense of some kind when it comes to what may be deemed offensive content. Yes, cultural standards shift and obviously what is taboo is different from place to place, etc. but I think it's pretty obvious that we don't need citations for "frying pan" or "bird" from a place like Stormfront or a completely nude person in a photo to supplement "armpit". When it comes to what is intended to be vulgar and offensive speech, gore, needless or excessive sexualization, etc., it is actually totally appropriate and keeping in line with other dictionaries to not include this material at all or to only include it where strictly relevant and necessary (e.g. on entries about sex organs or racial slurs, and even then, probably not linking to or citing Nazi forums). I know this sort of conversation opens up a kind of endless parade of fighting about the moving target that is "offense", but if we can avoid being coy or acting in bad faith, it's not that hard to agree that some content should either not be included/cited or should only be included/cited where strictly relevant for an academic purpose and that anyone inserting the kinds of outrage-producing thought hypotheticals from my previous sentences should be dealt with by banning. —Justin (koavf)TCM 11:12, 24 January 2022 (UTC)[reply]

That’s interesting. Personally, I would not mind a rule along the lines of requiring a term likely to be regarded as offensive to reasonable people (to be defined; perhaps limited to certain categories such as racist, sexist, LGBT-phobic, etc., terms) to be attested by more independent quotations (five? ten?) over a longer period (ten, maybe even 20, years). That might be enough to eliminate fringe terms that pop up from the sewer a few times and then disappear, while allowing for the retention of terms like the n-word which are offensive but also of academic and historical interest. — SGconlaw (talk) 11:39, 24 January 2022 (UTC)[reply]
This is a terrible idea. Have you forgotten "all words in all languages"? —Μετάknowledgediscuss/deeds 18:00, 24 January 2022 (UTC)[reply]
Not gonna lie, the applicability of that phrase here is rather ironic considering prior discussions and votes. AG202 (talk) 18:10, 24 January 2022 (UTC)[reply]
While I would like to see some consideration given to what criteria allow inclusion, and what cites are sufficient to show that a term meets those criteria, I have no interest in the policies being dependent on the meaning or register of the term in question. I think that terms which are more likely to spawn euphemisms (e.g. drug-related, sex-related, bigotry-related, or otherwise taboo) are more likely to have lots of nonces, and therefore will be much more common in these types of discussions, but I see no reason why a niche term in computer science needs to have a different set of criteria than a niche term in Y'all Qaeda lingo. All words in all languages continues to be the right aim for the project, but let's make sure a word has actually entered the language before we include it. - TheDaveRoss 15:29, 24 January 2022 (UTC)[reply]
I'll add my comment from the RFD discussion here: "I agree with either having a content warning and hiding the quotes, because if I'm being 100% honest, coming across these recently-created articles (cw: racial slurs, including Niggeria & Niggerian) made me [feel sick] and have to take a breather with how physically uncomfortable I got. This is exactly why in a previous RFD discussion I said that Usenet definitely appeals to a certain demographic, and it's really really disheartening to see these terms be included and Usenet be given a pedestal while we're excluding AAVE slang terms because they're not in 'durably archived' sources." Now on that front, I would definitely like to see more stringent regulations on what can and can't be included from Usenet. It's extraordinarily frustrating that folks are voting against policies that'd include more online sources based upon the fears that three folks on Twitter might come up with "AG202ism" over a year, while simultaneously refusing to engage in discussion about how Usenet already does this (I've pinged said people multiple times about this). How do you expect editors like myself to want to continue contributing to this project at this rate? In terms of the proposals at hand, though @Vininn126, I would love to downgrade the citability of Usenet, but I don't think the Wiktionary community as a whole would want to do so. And then, unfortunately 2a seems like it won't pass without some kind of impossible nuance based on this vote. I genuinely don't know what to do at this rate, as change on this front doesn't feel likely here. AG202 (talk) 15:52, 24 January 2022 (UTC)[reply]
What AAVE terms (that are not very recent neologisms) are we not currently able to include? —Μετάknowledgediscuss/deeds 18:00, 24 January 2022 (UTC)[reply]
@Metaknowledge Talk:I'm not the one is the most recent example. AG202 (talk) 18:09, 24 January 2022 (UTC)[reply]
@AG202: In that example, you claimed it was used in songs and TV shows. Both of these are widely considered to be durably archived (unless they are independently published). Why don't you actually try to cite it? —Μετάknowledgediscuss/deeds 18:14, 24 January 2022 (UTC)[reply]
@Metaknowledge Because as stated in that discussion, I wasn't sure if they'd count under WT:CFI as it just states "durably archived", and as we've seen in prior discussions, that's very vague, and I didn't want to go through the time-consuming process of digging through tv shows and music only to be told that "that's not durably archived" (I'm pretty disillusioned right now if I'm being honest). Nonetheless, if I have the time and energy, I'll go through and find some cites to add at a later point. AG202 (talk) 18:21, 24 January 2022 (UTC)[reply]
My suggestion of requiring more citations over a longer period for offensive terms would not require any change to the current policy of allowing references to Usenet but disallowing references to sources that are not durably archived, leaving the policy to be revisited on another occasion. Just saying. — SGconlaw (talk) 18:31, 24 January 2022 (UTC)[reply]
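For illustration only, the kind of rule suggested above could be sketched as a small check. This is a rough sketch, not an existing Wiktionary tool: the `Cite` record, the function name, and the use of distinct authors as a stand-in for "independent" quotations are all assumptions, and the thresholds are just the figures floated in this thread (five cites over ten years versus the usual three over one).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Cite:
    """Hypothetical record for one quotation."""
    author: str  # assumption: distinct authors approximate independence
    year: int


def meets_attestation(cites, min_cites=3, min_span_years=1):
    """Return True if the cites come from at least `min_cites` distinct
    authors and span at least `min_span_years` years."""
    if not cites:
        return False
    distinct_authors = {c.author for c in cites}
    years = [c.year for c in cites]
    span = max(years) - min(years)
    return len(distinct_authors) >= min_cites and span >= min_span_years


# A term with three cites passes the ordinary threshold but would
# fail a stricter one such as five cites over ten years.
cites = [Cite("a", 2001), Cite("b", 2005), Cite("c", 2012)]
ordinary = meets_attestation(cites)                                  # True
stricter = meets_attestation(cites, min_cites=5, min_span_years=10)  # False
```

The hard part, deciding which terms the stricter thresholds apply to, is deliberately left out; that is the judgment call being debated here.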
Perhaps something like this would be better suited for internet citations in general. Vininn126 (talk) 18:42, 24 January 2022 (UTC)[reply]
(edit conflict) Maybe we can't stop these potentially offensive entries being created by certain editors, but these very entries betray the creator's point of view. I think that other editors who are shocked by any inflammatory quotations included, dredged up from questionable internet sources, should have the right to hide those quotations, in an effort to lessen the entry's impact. I have already done this to one entry, but I won't disclose it. DonnanZ (talk) 15:57, 24 January 2022 (UTC)[reply]
@Donnanz, stop harassing Fytcha. The creation of offensive entries tagged as such that a creator believes in good faith to pass CFI is not a window into that editor's personal beliefs, and your repeated insistence on that in multiple fora is inappropriate. —Μετάknowledgediscuss/deeds 18:00, 24 January 2022 (UTC)[reply]
@Metaknowledge: I, too, have been harassed before now. DonnanZ (talk) 18:15, 24 January 2022 (UTC)[reply]
We're not going to rehash your grievances here. And if you really do think you've been harassed, then you should know better than to do it to others. —Μετάknowledgediscuss/deeds 18:22, 24 January 2022 (UTC)[reply]
No, that's water gone under the bridge. It's a reminder to you. I have been careful in not mentioning or discussing that user here. DonnanZ (talk) 18:42, 24 January 2022 (UTC)[reply]
(edit conflict) I've questioned Fytcha's judgment once in the last couple of weeks (the insistence on putting "Big Red" through RfD). But entry creations alone shouldn't be taken as an indicator or expression of ideological sentiment. Have a look at my contribution history. My most recent entry creations and attestations have been a bunch of incel slang. I am, to put it mildly and succinctly, not aligned with that particular movement. And it is possible to be pulled into a kind of research spiral as Fytcha describes. I recently set out to create a couple of entries to help populate Category:en:Libertarianism, and in attesting those terms through Usenet, I encountered a bunch of other terms, and ended up creating like a dozen entries. WordyAndNerdy (talk) 22:26, 24 January 2022 (UTC)[reply]
@WordyAndNerdy: Thanks for posting this. Regarding the first thing you've mentioned, I just wanted to add (in my defense) that my decision to convert the speedy into an RFD was heavily influenced by the fact that the article was created by a sysop. It is maybe also worth mentioning that I concluded the RFD with deletion a mere 6 hours after opening it (as opposed to waiting for 30 days). — Fytcha T | L | C 21:04, 25 January 2022 (UTC)[reply]
Yeah, fair point. To me, it was an obvious candidate for speedy deletion, given the subject isn't a public figure and has been harassed/stalked on account of the meme. However, though the incident made national news here in Canada, it doesn't seem to have been reported internationally. So it's probable that many editors wouldn't be familiar with it. WordyAndNerdy (talk) 23:24, 25 January 2022 (UTC)[reply]

(As Justin says, I think "update CFI re the internet" and "how to handle offensive material" are two different issues, but...responding to the second one...) As OP notes, we can (and do) already move un-illustrative, including unnecessarily offensive, quotes to Citations: pages if they're nonetheless needed for WT:ATTEST. (If they're not needed for that, like someone is adding racist screeds as cites of and, just replace them with normal cites and block the user as needed.) This does indeed lack a reader-facing warning of why the cites aren't in the entry, but eh, that probably reduces the amount of bad-faith or even good-faith debates over whether a moved quote is "really offensive" that a content warning would attract. And most readers are only looking for a definition and don't click through to the Citations page anyway; in the last 30 days, the n-word got 3,037 pageviews and its citations page got 4; I suspect a "for citations, see this page, but note that the quotes are offensive" notice would attract more attention. As Justin gets at, we'll always have to be ready to deal with bad-faith trolls, of course. We already see users try to add "offensive ethnic slur" labels to anything that mocks white privilege; there are folks who claim white and/or cis is a slur, or that they're not white supremacists they're white nationalists (or vice versa); etc... - -sche (discuss) 22:30, 24 January 2022 (UTC)[reply]

Given that the majority of discussion has been on topic two, I'm going to sum up what the general thoughts seem to be: Offensive terms should obviously be included; however, some should be put under more scrutiny, with citations moved to someplace less visible (albeit accessible). The citations page seems to be the best location for that. The reason I brought up the internet citations credibility discussion is because it is directly tied to this - the vast majority of these lexicographically controversial terms seem to be hidden away in dark recesses of the internet and not in general use, making them more niche, which is why a higher level of scrutiny may be preferred. Whether we apply that scrutiny to web-sources or to specifically more politically extreme terms (speaking as objectively as possible) is still up for debate. Vininn126 (talk) 22:44, 24 January 2022 (UTC)[reply]

I would rather see limitations on derogatory terms than offensive terms. Routine name-calling can be de-emphasized by (1) more aggressively deleting spaced terms that don't add much to the sums of their parts, (2) avoiding descriptive phrases that usually appear in a context making them clear, and (3) requiring greater attestation for low register terms in general. Vox Sciurorum (talk) 00:00, 25 January 2022 (UTC)[reply]
"hidden away in dark recesses of the internet" Are you running for elected office in the Bible Belt? DCDuring (talk) 00:05, 25 January 2022 (UTC)[reply]

@-sche has raised a salient concern. I'm not opposed to either proposal that's been put forward to minimize offensive content. But I worry that implementing any sort of "show offensive quotations" mechanism in entries might serve as catnip to bad-faith actors. It's the same story with trigger warnings. The goal of trigger warnings is simply to provide a mechanism by which people can ascertain whether they wish to be exposed to certain content. But certain types perceive content notices as a moral diktat – e.g., "you shouldn't read/watch/look at this" – and treat them as censorship or a threat to free speech. And, in doing so, they amplify the original content. The offensive content is unintentionally exposed to a wider audience than it would've reached otherwise. It's a kind of Streisand effect, I guess.

I favour the citations page quarantine model because it's passive. Citations pages aren't an entry point for most readers. There's less of a chance of anyone unwittingly stumbling upon inflammatory material on one. If we link to the citations page in the main entry, readers can reference the quotations there if they wish. There's no "show offensive quotations" to impart a possible value judgment on that choice.

I also think we should be cautious about trying to codify any standard of "offensiveness" into policy. There would likely be broad consensus that slurs based on race, gender, sexuality, disability, etc, are all offensive in a way that merely derogatory terms like nincompoop or Aberzombie are not. But there are, as -sche points out, those who believe that mansplain and white privilege are slurs, or are willing to argue so as bad-faith provocateurs. I don't think there's anything productive to be gained from opening the door to such arguments. I'd hope it would be sufficient to craft policy provisions that allow for the selective inclusion/exclusion of citations from entries:

Care should be taken when selecting citations and usage examples to be included in entries. Low-quality or provocative material may add little informational value while causing unnecessary offense to readers. In-entry citations and usexes should ideally strike a balance between documenting the term as it is used and keeping Wiktionary accessible, accurate, and relevant to a wide range of users. Citations deemed low-quality or offensive may be moved to the appropriate citations page. They can still meet CFI and "count" toward attesting the term.

And, no, this isn't "censorship." Every reputable dictionary in existence exercises editorial discretion in how it constructs and presents entries. There is a reason that Merriam Webster, Collins, etc. generally don't feature things like "women are the worst drivers" or "the thug driver crashed the stolen car" as usage examples of driver. There is a reason the image montage at the top of the Wikipedia article for the colour brown doesn't include a photo of human feces. "Nothing must be censored ever" is a terrible ethos on which to found a dictionary or encyclopedia with aspirations of being anything more than an edgy joke. The reality is that Wiktionarians have been exercising varying individual standards of editorial discretion for as long as Wiktionary has existed. We all make editorial choices informed more by an internal sense of what is "best" than any existing policy. I do this all the time when selecting three cites from the citations page to include in an entry. WordyAndNerdy (talk) 00:59, 25 January 2022 (UTC)[reply]

I should also note that I strongly oppose any proposal to deprecate Usenet as a source of citations or selectively impose higher CFI standards on Usenet. This wouldn't simply minimize the creation of marginal entries by provocateurs. It would do significant splash damage to our coverage of language in certain areas, especially fandom slang, regional slang, early Internet slang, early video games slang/terminology, and early TCG and tabletop gaming slang. There are terms that were once widely-used in certain online subcultures but never made it into print. Sometimes Usenet is the last surviving digital record of their existence because everything else has disappeared.
And I wouldn't call Usenet the "dark recesses of the Internet." At least not historically. Usenet predates the Internet as we know it. It was once a popular network for online discussion. In the 1990s, one could talk about everything from Star Trek and popular music to cooking and regional politics through Usenet. But its usage has declined significantly in the last decade, and it seems like the surviving userbase overlaps with 4chan to an extent. WordyAndNerdy (talk) 01:33, 25 January 2022 (UTC)[reply]
To point 1) That is why above I mentioned that the citations page would be used. To point 2) I think the general desire is for a better way to include internet quotations in general, and not to put Usenet on some pedestal while ignoring other sites. It's less about downgrading Usenet and more about upgrading other sites. Vininn126 (talk) 09:27, 25 January 2022 (UTC)[reply]
I agree with you on that front. I've always found it odd that Wiktionary, as a digital dictionary not bound by the limitations of print, has favoured physical media to the almost total exclusion of digital media. I've been arguing that the status quo needs to change for a long time. I think Twitter has a lot of promise as a modern corollary to Usenet. It's a widely-used, openly-accessible platform, and has advanced search functionality. My only resistance toward recent proposals to modernize the "durably-archived" part of CFI is the concern it might open the door to the insertion of self-promotional or fringe quotations into entries (CW: reference to Daily Stormer - this is a thing that happened). WordyAndNerdy (talk) 23:10, 25 January 2022 (UTC)[reply]
Wow, someone actually used that? I referenced a Nazi site as a hypothetical based on something that I recently saw on a (non-Nazi) message board. Incredible. —Justin (koavf)TCM 23:51, 25 January 2022 (UTC)[reply]
There are a few concepts that I feel need to be included in this discussion:
  1. "Undue weight". Terms that are only used by very small and isolated groups of people shouldn't be listed as synonyms along with terms that anyone would be able to produce without thinking hard. This is another reason I didn't think about when discussing the idea for smarter synonym templates: if you have a template that takes all the synonyms from all the synonym lists in all the synonymous entries, you end up with everything but the kitchen sink repeated in every entry- it's not just garbage-in-garbage-out, but garbage-in-garbage-everywhere. In triplicate.
  2. "Directionality". It's one thing to have an entry for an uncommon or unrepresentative variant that's definitely in use - as CFI says, someone may encounter it and want to look it up. And we should make it easy to go from such terms to the main entry, which is where we have all the important information. It's another, however, to link from the main entry to the variant if there's nothing much there to look at. An obsolete form such as advauntage, for instance, is definitely dictionary material and we definitely want to include there a link to advantage. On the other hand, there's no good reason I can think of to have a link in the advantage page to advauntage, except in the etymology to make a point about the history of the term. That means we should distinguish between:
    1. "Inbound" links, from minor spelling variants and non-lemma forms to the main, lemma entry
    2. "Outbound" links, from the main lemma entry to selected alternative forms and synonyms
The idea is to have fringe material included, but without giving it any outbound links, just inbound ones. If you're looking specifically for such terms, you can find them. If you aren't, we won't go out of our way to show them to you.
Likewise, fringe material should be kept out of translation boxes for non-fringe terms: in effect, such translations are outbound links.
How does this square with "Wiktionary is not censored" and "Neutral point of view"? As far as inclusion goes, we're not excluding anything. All the content should be there, and it should reflect a neutral point of view. You should not be able to tell from the wording of the definitions how the definition-writers feel about the subject. The editorial discretion comes in strictly with regard to navigation- there's a difference between not hiding something and going out of your way to make sure people see it.
I'm not sure that I've explored this as thoroughly as I should- but it's past my bedtime, so I'll leave it at that. Chuck Entz (talk) 07:37, 25 January 2022 (UTC)[reply]
You raise a very good point about the inclusion or exclusion of links, as the person who was thinking of making synonyms smart. I deal primarily with non-fringe terms, so this hadn't crossed my mind. Vininn126 (talk) 09:25, 25 January 2022 (UTC)[reply]
A few points on this. First, as I noted in the RfD on "Darky Cuntinent", derogatory terms are not true synonyms, since they import a difference in meaning. They are more like related terms, but in most cases not etymologically related. Second, I would question the value of presenting as a synonym, for dictionary purposes, a word or phrase attested by only a handful of Usenet uses. Like internet fora generally, anyone in the world can post there, and one person can easily post from multiple accounts. If you were to go on Usenet and say, "from now on I'm referring to pizza as 'tomatacozita'," and either use this phrase with multiple accounts over some period or get some other people in that limited sphere to use that phrase, Wiktionary would not be serving the preservation of language to indicate at pizza that tomatacozita is a synonym. bd2412 T 21:05, 25 January 2022 (UTC)[reply]
@BD2412: I have no issue with removing Darky Cuntinent as a synonym of Africa (for reasons related to how extremely rare it is) but, as I already laid out in the corresponding RFD discussion, I strongly oppose that we interpret the term synonym in a way such that equivalence of referents is not sufficient anymore. Commonly understood, synonyms are allowed to differ with respect to register/formality, vulgarity, offensiveness, datedness, etc. (all of which "import a difference in meaning"). This is also the definition followed by every monolingual dictionary that I am aware of; for instance, see what Merriam-Webster Online provides as synonyms for to die: [13] Note also that we have qualifier parameters for the template {{syn}}. — Fytcha T | L | C 21:30, 25 January 2022 (UTC)[reply]
Agreed: synonyms have a match in connotation and denotation, so we shouldn't consider an ethnic slur as equivalent to a proper, appropriate name for an ethnic group because the slur includes an extra layer of connotation, even if it denotes the same phenomenon. —Justin (koavf)TCM 21:30, 25 January 2022 (UTC)[reply]
@Koavf: Just to be clear, "agreed" refers to my statement, and not Fytcha's response to it? bd2412 T 21:36, 25 January 2022 (UTC)[reply]
@BD2412: Correct: I hope the indents make it clear and as per my comment below, I've had a couple of edit conflicts. —Justin (koavf)TCM 21:53, 25 January 2022 (UTC)[reply]
@Fytcha: We had overlapping edits: as I propose below, a true synonym has the same connotations and denotations. The example synonyms you showed for "die" are colloquial or slang, etc. but they still all mean "dying". They don't include anything like "moved on to the ethereal realm", which would imply some kind of assumptions about what constitutes dying or somesuch. In other words, I think all of these "die" synonyms are true synonyms with the same denotation and no extra connotation. —Justin (koavf)TCM 21:51, 25 January 2022 (UTC)[reply]
@Koavf: There is no difference in connotation, in suggested meaning to you between "The man passed away." and "The man croaked."? No implied difference in how the speaker feels about this death? — Fytcha T | L | C 21:59, 25 January 2022 (UTC)[reply]
I'm saying that there is a clear etymological distinction between informal terms and derogatory terms. If I say "I'm going to New York City", a friend responding, "oh, The Big Apple" is merely reciting an informality, whereas a friend responding, "oh, New York Shitty" is making a clear value judgement about the quality of the city. Whether "croaked" is intended to convey that value judgment is not clear. bd2412 T 22:20, 25 January 2022 (UTC)[reply]
This is quickly moving away from the main point of discussion. I think synonyms, like most dictionaries treat it, can include terms of varying affection by the speaker, positive and negatives, and it actually makes more sense to list them because of that, with the right qualifiers. It would make sense to know how to list both the positive and negative terms for the same thing. Vininn126 (talk) 23:39, 25 January 2022 (UTC)[reply]
@BD2412: I think we just fundamentally disagree here; I understand New York Shitty to be a synonym and I maintain that other monolingual dictionaries are in agreement with my position. But even if we were to decide that such terms are not synonyms, where should they be listed instead? New York Shitty can be listed under derived terms I guess, but the same hack is not possible for e.g. Kraut as a synonym of German or chink as a synonym of Chinese. This is invaluable information and having to check the "What links here" page (that cannot even be filtered by language) to find out about the no-no words is unacceptable for me. I hope you can see where I'm coming from with this concern. — Fytcha T | L | C 23:34, 27 January 2022 (UTC)[reply]
At least as I understand it, what's being discussed is that the entry [[foobar]] will list inoffensive synonyms and then have a link to Thesaurus:foobar, where the offensive synonyms are sent. So, [[Chinese]] could link to Thesaurus:Chinese and list Chink there, like [[Jew]] links to Thesaurus:Jew. (But at least for my part, I'm mainly concerned about long lists of slurs based on a few qualities like race, gender, or sexuality; I don't know if we need to be concerned about listing derogatory synonyms of New York City or die, as long as everything's labelled.) - -sche (discuss) 23:11, 28 January 2022 (UTC)[reply]
I don't think offensiveness plays a role though. Having Thesaurus:Jew or Thesaurus:eat is good because there are so many of them, not because they are offensive. I don't think that you're in favor of creating Thesaurus:Niger for that one lone offensive synonym. Perhaps a solution would be to create {{l-offensive}} which renders the term in a custom CSS class that users can opt out from seeing, similar to {{,}}. — Fytcha T | L | C 00:23, 29 January 2022 (UTC)[reply]
I like the idea of creating a collapsible template, though if it's like "show offensive terms" on the main entry as it does for quotations, then it brings up the issue of highlighting them more. It's for sure a tough issue to get a comprehensive solution for, but I'm glad that it's at least being discussed. AG202 (talk) 00:43, 29 January 2022 (UTC)[reply]
In that case wouldn't we have to remove all the slang and religious-overtoned words from Thesaurus:die etc.? Really I think you are wrong about synonyms. Equinox 21:38, 25 January 2022 (UTC)[reply]
I don't think we're talking about the Thesaurus: namespace, which expressly allows for a much broader range of terms with differing connotations. bd2412 T 21:50, 25 January 2022 (UTC)[reply]
Yeah, while I'm not at all convinced that synonyms should in general be required to have the same connotations (or even that we should mind listing derogatory synonyms for die), I think we can (and already do) treat Thesaurus: pages, like Citations:, as less prominent than entries when it comes to putting slurs / offensive/derogatory terms for people somewhere. I moved the list of slurs for Jew to the Thesaurus: space after a suggestion at Talk:Jew#List_of_slurs?, as someone else did for Muslim. (You at one point suggested moving slurs even out of the Thesaurus, to a single Appendix of All Slurs; that would probably attract attention, but I don't know if we care: if someone who specifically wants to find slurs finds them, fine.) - -sche (discuss) 23:36, 25 January 2022 (UTC)[reply]
I had forgotten that I had that excellent idea. I should congratulate my past self for being so forward-thinking. bd2412 T 07:48, 29 January 2022 (UTC)[reply]
I don't think this is an accurate analysis of synonyms. I think a better solution would be to list the non-offensive synonyms on the offensive term, and to exclude the more offensive terms from the "main" page. Vininn126 (talk) 21:49, 25 January 2022 (UTC)[reply]
IMO we should not list offensive terms and slurs as "synonyms" of non-offensive terms, even if tagged with the appropriate qualifier. Benwing2 (talk) 04:35, 26 January 2022 (UTC)[reply]
Formulated thus, this rule would exclude too much that is interesting. The most harmless concepts have pejorative and elevating variants, for example: neutral German gesprächig (“talkative”), pejorative geschwätzig, appreciative redselig.
Apart from the fact that they are still synonyms in a broader sense.
I note that in the original vote that officially introduced {{syn}} and company the old method of synonyms sections was not deprecated, with one reason being something like that some synonyms are remote and not as usable as a basic word, e.g. for the word for “to eat” in each language you will have a lot of synonyms, many of which are vulgar or pejorative et cetera, so in such cases it does not make too much sense to put the synonyms too close to the definition (although in the case of the word for eating, the sheer bulk of synonyms discommends it, and for the large amount we are able to have a Thesaurus page to which we then link in the definition line, but you get the idea).
For most of what has been said above, I don’t like discriminating against very small and isolated groups of people. Wiktionarians are a small and isolated group of people, many important businesses in society are represented by small and isolated groups of people, so lots of traders cater to specific markets, and most software projects have specific groups which ultimately use and disseminate them; the majority always will nag for it is dull and comprehends not the vastness of our plans, and I don’t trust it, but we can expect loyal behaviour of minorities if they aren’t treated with double standards.
They should be able to discern where they would place inappropriate value judgements, which is less likely if we induce reactance, and it is a bit tricky how to sneak out of these: to present the loaded but not hide it. The important thing is that we preserve the intention of impartial as well as accurate description. Fay Freak (talk) 00:26, 27 January 2022 (UTC)[reply]
@Fytcha If possible, I'd like to kindly ask that we pause on adding terms like N*gger to even derived term sections on pages like Niger until something tangible comes out of this. No point in talking about limiting the visibility of terms like that on those pages, which is why I removed it from the synonyms section, if we're just going to throw them into the derived terms section. AG202 (talk) 02:16, 28 January 2022 (UTC)[reply]
@AG202: With all due respect, while there are some people that hold the opinion that the former word is not a synonym of the latter, I don't think anybody disagrees that it is a derived term. As such, it should clearly be included on that page as such. That said, I don't care about it as much as to re-add it if you were to remove it now. I won't add any more of these for the time being. — Fytcha T | L | C 00:15, 29 January 2022 (UTC)[reply]
Not saying that it's not a derived term, but the visibility is the concern for me. AG202 (talk) 00:40, 29 January 2022 (UTC)[reply]
@Fytcha Let me make it perfectly clear: we shouldn't avoid linking to things in the main entry merely because they're offensive. That strategy should only be used for things that are unimportant and meaningless in relation to the main term. A term that we only include because a few individuals used it in some out-of-the-way place is of very little interest to people who are learning or learning about the language in question. Please note that I'm not talking about little-known regional lects and the like, but about isolation for non-linguistically-interesting reasons, or just random chance. A term like "Darky Cuntinent" isn't part of the vocabulary of any but an infinitesimal speck in the vast ocean of English speakers. There's nothing of historical or linguistic interest to make its rarity worth explaining.
The point here is that there are always a few anomalous examples in any corpus, and we can't possibly show the full range of variation in every entry, nor would we want to if we could. As dictionary writers, we're trying to educate people about a given language, not to showcase every possible permutation of it in the same place. To use an analogy: it's an indisputable fact that a certain percentage of any given species are anomalous- they may be missing limbs or other body parts, there may be discoloration due to contact with various substances, there may be lesions due to various diseases or parasites. If you go to a museum, don't expect to see the occasional rabbit that's lost a paw to a lawn-mower. They definitely exist, and they might have such a specimen in a drawer somewhere, but they're not going to put it on display- it would distract from educating people about rabbits as a whole. Chuck Entz (talk) 05:48, 29 January 2022 (UTC)[reply]
So while we should have such rare terms, we don't need to put them front and center? Just available upon request? That feels appropriate. Vininn126 (talk) 11:21, 29 January 2022 (UTC)[reply]
  • I just want to report that I find this line of discussion is making me sick, sick enough not to wish to participate in discussion of the nuances of suppression. The eagerness with which we are looking to suppress terms is quite like the way Facebook, Google, NextDoor, Twitter (each with headquarters and main operations in the erstwhile land of the free), etc. are suppressing the presentation of certain kinds of speech. I always had hopes for the US as a polity of freedom and for the Internet as a vehicle for free expression, especially for the WMF projects, and, most especially, for English Wiktionary. At best we seem to be edging into a new Victorian Age. DCDuring (talk) 19:51, 29 January 2022 (UTC)[reply]
    How do you think I feel? I specifically mentioned how seeing these terms, and especially the quotations, in the wild made me feel extraordinarily uncomfortable as a Black Nigerian, yet I still didn't even vote to delete the term in RFD, because I know that it doesn't directly violate CFI + it's a word that has been used in the past however unfortunately, and am still willing to participate in the discussion. I am quite literally the target of the terms Nig*erian & Darky Cuntinent after all! Deciding whether or not a word should be considered a synonym of another on the entry page vs a thesaurus page or deciding to move quotations to a citations tab, is not suppression. Dictionaries do this all the time, seriously, and it's frankly insulting that you'd even label this as moving into a new Victorian Age. Seriously, I would recommend looking into actual examples of censorship and suppression of speech from developing countries before you compare it to moving some of the most racist and offensive quotations to a Citations tab (not even deleting it at the moment!) or adding a warning for blatantly offensive content. Remember that we should be appealing to all users, not just the ones saying the offensive content, otherwise we wouldn't even have the offensive label to begin with. AG202 (talk) 20:05, 29 January 2022 (UTC)[reply]
    I would like everyone to enjoy the privilege of all the freedoms of a liberal polity and society. One of the costs of the privilege is the need for toleration, not only of those with whom one disagrees, but even of the abusive. I note your aside above: "not even deleting it at the moment!" That makes it seem that deletion is your ultimate goal. DCDuring (talk) 21:39, 29 January 2022 (UTC)[reply]
    I'd like to point out the Paradox of Tolerance, and no, if it were my ultimate goal then I would've voted to delete it on RFD, as I explicitly stated already. And then on your first point, I should then be able to enjoy the freedoms of not seeing violently racist quotations openly on an entry nor seeing N*ggeria be openly displayed as a synonym for my birthplace. There's frankly no way to be completely tolerant towards the intolerant without being intolerant towards another group; otherwise, as mentioned, we wouldn't even have the {{lb|offensive}} label as the groups that use them could say they don't see them as offensive and that that goes against their "freedoms". But that's why we're having this discussion in the first place, to see how best to find that line. Nonetheless, if you don't want to participate, that's perfectly fine with me, but I'd really suggest that you avoid making comparisons like the one you made in your initial comment, if you actually want anything meaningful to come out of these interactions. AG202 (talk) 21:53, 29 January 2022 (UTC)[reply]

Desktop Improvements update and Office Hours invitation

Hello. I wanted to give you an update about the Desktop Improvements project, which the Wikimedia Foundation Web team has been working on for the past few years.

The goals of the project are to make the interface more welcoming and comfortable for readers and useful for advanced users. The project consists of a series of feature improvements which make it easier to read and learn, navigate within the page, search, switch between languages, use article tabs and the user menu, and more.

The improvements are already visible by default for readers and editors on 24 wikis, including Wikipedias in French, Portuguese, and Persian.

The changes apply to the Vector skin only. Monobook or Timeless users are not affected.

Features deployed since our last update

  • User menu - focused on making the navigation more intuitive by visually highlighting the structure of user links and their purpose.
  • Sticky header - focused on allowing access to important functionality (logging in/out, history, talk pages, etc.) without requiring people to scroll to the top of the page.

For a full list of the features the project includes, please visit our project page. We also invite you to our Updates page.

The features deployed already and the table of contents that's currently under development


How to enable the improvements

 
Global preferences
  • It is possible to opt-in individually in the appearance tab within the preferences by unchecking the "Use Legacy Vector" box. (It has to be empty.) Also, it is possible to opt-in on all wikis using the global preferences.
  • If you think this would be good as a default for all readers and editors of this wiki, feel free to start a conversation with the community and contact me.
  • On wikis where the changes are visible by default for all, logged-in users can always opt-out to the Legacy Vector. There is an easily accessible link in the sidebar of the new Vector.

Learn more and join our events

If you would like to follow the progress of our project, you can subscribe to our newsletter.

You can read the pages of the project, check our FAQ, write on the project talk page, and join an online meeting with us (27 January (Thursday), 15:00 UTC).

How to join our online meeting

Thank you!!

On behalf of the Wikimedia Foundation Web team, SGrabarczuk (WMF) (talk) 22:11, 24 January 2022 (UTC)[reply]

Using "transliteration" vs. "borrowed" template

What makes a foreign, apparently transliterated term qualify for {{transliteration}} as opposed to simply a {{borrowed}} in the Etymology section? I suppose perhaps naturalization but, for example, are these truly the only transliterated Arabic terms where the former template is justified, as opposed to all these borrowed Arabic terms? Or are these the only transliterated Russian terms, as opposed to all these borrowed Russian terms?

I'm asking because I hesitate which template to use out of the above two for Hungarian terms like transliterated names of certain national capitals. (Many of them are either Arabic or Russian/Cyrillic in origin, but of course the question applies to some two dozen other national languages as well that need romanization, since they are to be transliterated from the source language directly, unless another form has become completely established.) Adam78 (talk) 15:46, 25 January 2022 (UTC)[reply]

My gut reaction is transliterated is when it's a borrowing from the same alphabet. Vininn126 (talk) 15:50, 25 January 2022 (UTC)[reply]
That'd be {{unadapted borrowing}}. I think transliteration doesn't have to conform to the target language's phonology, while borrowings should. Also, transliterations would be {{learned borrowing}}s, wouldn't they? Thadh (talk) 16:08, 25 January 2022 (UTC)[reply]
@Adam78 In Russian, at least, "transliterations" are used *only* for personal names. IMO these are true transliterations, not borrowings; common nouns are always borrowings unless they don't actually exist in the target language at all (in which case IMO they shouldn't have entries). Hence national capitals are borrowings, not transliterations. Benwing2 (talk) 04:29, 26 January 2022 (UTC)[reply]
@Benwing2: цо́кор (cókor) is a common noun and is a transliteration, but in that case of a transliteration of Russian!
Else I find it suspicious that people try to suppose a separation of graphic systems and phonological systems of a language during borrowing: both are considered while borrowing a word, or, among the less literate, only the sound systems are effective.
Currently it is used in almost half a thousand of entries and one seems to want to hint at there being something peculiar in the spelling due to particular consideration of the source spelling and its transliteration systems during the borrowing process, both nothing essential. I have never felt a need to use it. Fay Freak (talk) 00:19, 30 January 2022 (UTC)[reply]

@Benwing2 Thank you! (And @Vininn126, Thadh, thank you, too.) I understand your point but not exactly your reasoning. I thought a borrowing was a term that gained some currency in the target language and/or underwent some changes as compared to the original. The names of little-known capitals (let alone other toponyms) may not qualify for either, do they? Or a "transliteration" only applies when it can be contrasted with another, more common or altered form of the same origin? What about the "transliterated" Arabic and Russian terms I linked above, are all of them truly transliterations? In many cases the existing examples clarify the point and help to proceed, but not (yet) here… Adam78 (talk) 14:42, 26 January 2022 (UTC)[reply]

@Adam78 You make a good point. I guess it's a judgment on my part that toponyms should be considered borrowings. With names a distinction can be made between those that are "transliterations" of foreign names and those that are naturalized and hence borrowings. For example, Americans who aren't immigrants from Russia or children of such immigrants are unlikely to be named Anatoly, so IMO "Anatoly" is a transliteration of a Russian name, not a borrowing, which is different e.g. from Olga or Lara. Maybe you could make such a distinction with toponyms, but it's harder. Maybe you could make a list of likely factors influencing the decision, e.g. whether there is a conventional pronunciation in the target language. Benwing2 (talk) 01:32, 30 January 2022 (UTC)[reply]

How to mark Bahai orthography in Wiktionary

Theknightwho has been helpful adding entries and renaming categories related to the religion Baháʼí Faith. For those who don't know, that religious group has several Persian terms that have been adapted into other languages, including English and so they have special preferences in transliterating, known as Baháʼí orthography. My question is this: should we have some method of marking or noting terms written in their preferred orthography? E.g. we have Baháʼí Faith (preferred) as well as Bahai Faith and Baha'i Faith, etc. If so, how should these be noted? I can think of three approaches: An appendix with a table (linked in "See also"s), a standardized Usage notes section (a la this) with a tracking category, or an addition to {{lb}} with a tracking category. Thoughts? If necessary, I can make a proper vote, but I don't know that this is so controversial or impactful at the moment to require one. —Justin (koavf)TCM 20:07, 25 January 2022 (UTC)[reply]

Thanks! I think an addition to {{lb}} would probably be the easiest way to do things, if possible. Do we already have a way to distinguish orthographies that are and aren't approved by a recognised authority? For example, the Académie Française with French. Theknightwho (talk) 20:22, 25 January 2022 (UTC)[reply]
{{lb|proscribed}} is the one that comes to mind first, but it'd have to be clear as to whom it's proscribed by, so probably a Usage Notes section could be added as well. AG202 (talk) 20:47, 25 January 2022 (UTC)[reply]

Proper use of dashes in titles

We had a discussion here: Wiktionary:Beer_parlour/2019/January#Hyphens_and_dashes_in_entry_titles but it did not result in any change to Wiktionary:Entry titles. I have moved Creutzfeldt-Jakob disease to the typographically correct Creutzfeldt–Jakob disease and recreated Category:English terms spelled with – to have a conversation on the topic. If necessary, we can have a proper vote. I think it's pretty obvious that "Creutzfeldt–Jakob disease" and countless other terms with an en dash (not a hyphen) are completely valid English terms. We should have hyphenated forms be alternative spellings that direct users to the dashed forms. Whatever we decide should be memorialized at Wiktionary:Entry titles, no matter what that is (in fact, what is written there now seems to imply something consistent with what I am suggesting [emphasis added]: "In most languages, the HYPHEN-MINUS is used for the hyphen, not any of the dashes."). —Justin (koavf)TCM 02:21, 26 January 2022 (UTC)[reply]

@Koavf Blah. Please don't do this in general. We are not Wikipedia and we have a tradition of using hyphen-minus, not en-dash. I would actually ask you to undo your changes until/unless you get consensus to have en-dashes in main lemma entries. Benwing2 (talk) 04:31, 26 January 2022 (UTC)[reply]
@Benwing2: Wiktionary:Entry titles says basically the opposite and we have all kinds of titles with all kinds of characters. Why is it English can have "@" in a term but not "–"? I'll undo as a good faith request, but my point still stands: these are legitimate terms. —Justin (koavf)TCM 04:44, 26 January 2022 (UTC)[reply]
@Koavf Thank you. I am just going by how things appear to actually be, rather than what a particular policy can be read to say. Having @ in a term is not controversial and there's no obvious replacement for it, but the use of hyphens vs. en-dashes, and straight vs. curly quotes, in entry titles is quite controversial to many people, and IMO it's asking for trouble to try and implement a change to the status quo without getting consensus. Benwing2 (talk) 04:48, 26 January 2022 (UTC)[reply]
I would agree that we should use the typographically correct en dash in such cases, and that there should be redirects with hyphens to those entries. — SGconlaw (talk) 04:51, 26 January 2022 (UTC)[reply]
I've been in favour of hyphens, like we use straight apostrophes in don't etc even when some books have curly ones. In part this is because deciding between en and em and quotation dashes is a source of perennial disputes on Wikipedia which using only the typeable hyphen has so far helped us avoid. (Reading the previous BP discussion, it seems a lot of users share this preference!) I think you're misreading WT:Entry titles, though I can see why: but "In most languages, the HYPHEN-MINUS is used for the hyphen, not any of the dashes" is accurate if taken as a description of the current practice that for hyphens of whatever sort, the hyphen-minus is used, not any of the dashes (i.e.: for hyphens of whatever kind, none of the dashes are used, only the hyphen-minus is). We have been seeing a slow uptick in the (still small) number of users who want to use dashes, and I concede Wiktionary is not the most consistent place, e.g. using click-ǃ and not exclamation-! for clicks. Meh. Obviously, whichever we lemmatize, we should have redirects from all the others. Another thing to consider is: what do other Wiktionaries do, and would moving our pages break the interwiki links or oblige the other Wiktionaries to add a bunch of redirects, like we and fr.Wikt used to have to use to link entries with apostrophes (since we lemmatize straight ones and they lemmatize curly ones)? - -sche (discuss) 21:37, 26 January 2022 (UTC)[reply]
  • I object to editorializing in the header. I could have opened a discussion entitled as follows:

Fussy use of dashes in titles

Wouldn't that be annoying? DCDuring (talk) 22:57, 26 January 2022 (UTC)[reply]

I think there's a difference between using "proper" typography in running text and in titles, and I'd argue that it's less important in titles. The titles/lemmata here are already to some degree normalized (as it happens with Latin macrons etc.). The headword template could take care of replacing hyphens with dashes automatically. – Jberkel 13:26, 28 January 2022 (UTC)[reply]
I suppose folks can be as fussy as they want if it doesn't interfere with either convenient typing or searching. DCDuring (talk) 23:35, 28 January 2022 (UTC)[reply]
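As a rough illustration of the "redirects from all the others" idea discussed above, here is a minimal Python sketch of how a hyphen-minus spelling could be derived from a dashed title so that one form can point at the other. It takes no position on which form should be the lemma. The names `DASHES`, `to_hyphen_minus` and `variant_pair` are hypothetical and do not correspond to any existing Wiktionary bot, module or template.

```python
# Hypothetical helpers for illustration only; not an existing Wiktionary tool.
HYPHEN_MINUS = "-"

# Dash-like characters that may appear in a typographically styled title.
DASHES = {
    "\u2010": "HYPHEN",
    "\u2012": "FIGURE DASH",
    "\u2013": "EN DASH",
    "\u2014": "EM DASH",
}


def to_hyphen_minus(title: str) -> str:
    """Return the title with every dash-like character replaced by HYPHEN-MINUS."""
    return "".join(HYPHEN_MINUS if ch in DASHES else ch for ch in title)


def variant_pair(title: str):
    """Return (hyphen_minus_form, dashed_form), or None if the title has no dash.

    Which of the two becomes the lemma and which the redirect is a policy
    question that this sketch deliberately leaves open.
    """
    plain = to_hyphen_minus(title)
    if plain == title:
        return None
    return (plain, title)


if __name__ == "__main__":
    print(variant_pair("Creutzfeldt\u2013Jakob disease"))
```

Titles that contain no dash-like character (for example, ones that only use a straight apostrophe) yield `None`, so nothing would be created for them.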

How to RFV emojis?

We currently have WT:RFVN#🦀 and WT:RFVN#💜 open, both of which appear somewhat real to me but they are of course going to fail RFV. What should we do? The crab one could at any given moment be closed and deleted, which would be a shame if it is real. Then again, I don't think any non-memey dictionary defines these symbols. Please discuss. — Fytcha T | L | C 04:11, 27 January 2022 (UTC)[reply]

@Fytcha There's not much to do de jure without changing CFI. Point blank period. It was mentioned in the RFV for 🦀, and is part of why there needs to be a serious update to actually allow for common internet slang of today's day and age. Hopefully the vote going on right now passes so that we have a starting point, but unfortunately I don't have that much hope for it. I really feel that we keep going in circles on this issue, and I haven't even been here that long, so I can only imagine how frustrated some folks are that've been here longer. AG202 (talk) 04:55, 27 January 2022 (UTC)[reply]

Before I go too crazy removing these {{swp}} links...

I noticed that User:Samubert96 has created a large number of surname pages with {{swp}} links to Wikipedia articles for people who have the surname, but are not the originator of the surname, and appear to me to have no etymological significance to it. For example, "Fawbush", which linked to Wayne H. Fawbush (born 1944), a local politician in Oregon, "Allinson", which linked to Adrian Allinson (1890–1959), a British landscape painter, and "Battenfield", which linked to Peyton Battenfield (born 1997), a minor league baseball player. I immediately started removing these, as I can see no reason for their inclusion, but it looks like this is going to run into the hundreds, perhaps thousands of edits, and I want to be absolutely sure before I dig in that deeply. bd2412 T 05:56, 28 January 2022 (UTC)[reply]

One thing you might consider is if there is a redirect, and perhaps a different page like SURNAME (Surname) with the states, and relinking there. And of course we should only delete the truly dead links, the ones that have no surname pages on WP. Vininn126 (talk) 11:40, 28 January 2022 (UTC)[reply]
I don't see the interest linking to random people on Wikipedia with that surname and agree they should be deleted. – Jberkel 12:17, 28 January 2022 (UTC)[reply]
I will help you delete these links if you find them unnecessary, but next time warn me as soon as possible, so we don't have to remove hundreds of links. Samubert96 (talk) 12:37, 28 January 2022 (UTC)[reply]
I only came across this issue myself because some of the pages containing these links were listed on User:This, that and the other's subpage, User:This, that and the other/broken interwiki links/2022-01-20/wikipedia, due to the targeted individuals having been deleted from Wikipedia as non-notable. bd2412 T 18:55, 28 January 2022 (UTC)
At any rate, I think this task is done. bd2412 T 18:09, 30 January 2022 (UTC)

A list of suggestions

Dear colleague,

I am a contributor from French Wiktionary and also part of a User Group that gathers Wiktionarians. On the French Wiktionary homepage, you can find a list of entries to be created, and the list changes every week. It is a way to motivate the community to look at blank spots and a way to inspire newcomers to start something that will be looked over by the others. It is often a list of five to seven words related to a topic (dog names, winds, dances, etc.) or words with a shared feature (starting with a prefix, in use in a specific area, synonyms of a term, etc.). I just made a list of all past suggestions and I made it translatable, so it is also in English (you are welcome to fix my mistakes and add other languages).

This list was inspired by a discussion with LA2. It is thematic and then in chronological order, to show the evolution from general to more specific topics. My idea is that some volunteers may use those lists here or in other Wiktionaries to spark a similar dynamic. I know most of the contributors are loners and they do not want any editorial constraint, but these are only suggestions, and we have seen very good results, with new entries and a gain in motivation for several veterans. For example, we suggested names of musical genres and a contributor created a thousand new pages, far more than the six initial suggestions. Sometimes it brings new discussions about how we describe some concepts or strange grammatical situations. Well, it was not that easy to find new ideas for each week in the past years, so I also hope other people will be inspired by our list and propose new ideas that can be an inspiration for us. I am convinced that contributing to Wiktionaries could be more fun if we have mutual support across languages. Noé 23:51, 28 January 2022 (UTC)

New language proposal - Proto-Bunun language

  • Name: Proto-Bunun
  • Wikidata item: does not exist
  • Type: reconstructed
  • Script: Latin (Latn)
  • Ancestor: Proto-Austronesian (map-pro)
  • Descendant: Bunun (bun)
  • Proposed language code: bun-pro
  • Reference: Shibata, Kye (2020), A Reconstruction of Proto-Bunun Phonology and Lexicon (thesis), National Tsing Hua University

-- 13:38, 29 January 2022 (UTC)

Movement Strategy and Governance News – Issue 5

Hey all,

As you may know, I'm a facilitator with a team called Movement Strategy and Governance. We've recently re-designed the Universal Code of Conduct News, and it will now appear as the Movement Strategy and Governance News.

I've added some direct links in the shortened version below, if you want to skip right to the subjects for this issue. Please let me know if you have any questions. The Movement Strategy and Governance team is inviting input about the newsletter (past, present, future) at m:Talk:Movement Strategy and Governance/Newsletter. --Mervat (WMF) (talk) 19:06, 31 January 2022 (UTC)

Welcome to the fifth issue of Movement Strategy and Governance News (formerly known as Universal Code of Conduct News)! This revamped newsletter distributes relevant news and events about the Movement Charter, Universal Code of Conduct, Movement Strategy Implementation grants, Board elections and other relevant Movement Strategy and Governance topics.

This Newsletter will be distributed quarterly, while more frequent Updates will also be delivered weekly or bi-weekly to subscribers. Please remember to subscribe here if you would like to receive these updates.

QQ guessing what language a quote is

Recently, Quiet Quentin was updated to output {{quote-book}} instead of wikicode. {{quote-book}} takes a language code, so the gadget supplies one...but it's wrong a lot, e.g. in the entry for hoguine, I noticed that it took Pierre Larousse's Grand Dictionnaire Universel to be English. It's also inconsistent; when I look at the set of results for the search "les hoguines", it takes different editions and copies of Jean Nicot's Dictionnaire françois-latin to be fr, la or the module-error-producing invalid code un. I didn't initially notice it was doing this, and even after I did, I found myself forgetting to check for and fix the codes. I can't imagine less-involved users will remember to check every time, either. (And it's not as simple to monitor as e.g. periodically checking that codes match L2 sections, since I can cite a monolingual French dictionary in an English etymology section, etc, and the gadget's mis-assumption that the dictionary was English would scan as fine.) I wonder if the template should default to und or (better, IMO) a dedicated cleanup code like the un it already tries to use for some quotes (which we would need to add to the module so it doesn't throw an error). - -sche (discuss) 20:39, 31 January 2022 (UTC)

Find all red links in a passage?

Is there a presently existing tool along the lines of the following? Suppose I have a block of text, and I wish to find all words within the text that do not have entries on Wiktionary (or do not have entries with a specific language section). Does anyone know a good way of accomplishing this goal? 70.172.194.25 04:49, 1 February 2022 (UTC)

Copy the text into a word document, replace all the spaces with opposed pairs of double brackets (e.g., [[replace]] [[all]] [[the]] [[spaces]] [[with]] [[double]] [[brackets]]), then create a Wikimedia account so you can have a userspace sandbox on Wiktionary and copy that block of text into your sandbox and see what turns red (e.g., nothing in the phrase: replace all the spaces with double brackets). bd2412 T 07:06, 1 February 2022 (UTC)
Thank you for the suggestion. 70.172.194.25 08:02, 1 February 2022 (UTC)
You are also welcome to use User:Fish bowl/m#ad-hoc. I use it a lot with "Show preview". Underscores are a quick way to link to an entry with a space in the name. Fish bowl (talk) 07:46, 1 February 2022 (UTC)
Thanks, I did try it in preview mode. It seems to work great, even when punctuation is included. 70.172.194.25 08:02, 1 February 2022 (UTC)
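
For longer passages, the manual find-and-replace step suggested above can be scripted. The following is only a minimal sketch in Python, not an existing Wiktionary tool: the names `wikilink_words` and `unique_words` are made up for illustration, and the definition of a "word" (a run of letters, optionally joined by apostrophes or hyphens) is an assumption that will not suit every language. The output is wikitext to paste into a sandbox or a preview, where missing entries show up as red links.

```python
import re

# Assumption: a "word" is a run of letters (accented letters included),
# optionally joined by internal apostrophes or hyphens.
WORD_RE = re.compile(r"[^\W\d_]+(?:['’\-][^\W\d_]+)*")


def wikilink_words(text: str) -> str:
    """Wrap every word in [[...]], leaving spacing and punctuation untouched."""
    return WORD_RE.sub(lambda m: f"[[{m.group(0)}]]", text)


def unique_words(text: str) -> list[str]:
    """Return each distinct word once, in order of first appearance."""
    seen: dict[str, None] = {}
    for match in WORD_RE.finditer(text):
        seen.setdefault(match.group(0), None)
    return list(seen)


if __name__ == "__main__":
    sample = "replace all the spaces, with double brackets."
    # Paste the printed wikitext into a sandbox and use "Show preview".
    print(wikilink_words(sample))
    # A de-duplicated list keeps the preview short for long passages.
    print(" ".join(f"[[{w}]]" for w in unique_words(sample)))
```

Entry titles are case-sensitive, so a sentence-initial capital will link to the capitalized title; lowercasing the text first is one possible workaround, at the cost of breaking proper nouns.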