Wiktionary:Beer parlour


Welcome to the Beer Parlour! This is the place where many a historic decision has been made, and where important discussions are being held daily. If you have a question about fundamental aspects of Wiktionary—that is, about policies, proposals and other community-wide features—please place it at the bottom of the list below (click on Start a new discussion), and it will be considered. Please keep in mind the rules of discussion: remain civil, don’t make personal attacks, don’t change other people’s posts, and sign your comments with four tildes (~~~~), which produces your name with timestamp. Also keep in mind the purpose of this page and consider before posting here whether one of our other discussion rooms may be a more appropriate venue for your questions or concerns.

Sometimes discussions started here are moved to other pages for further development. In particular, changes to a major policy or guideline may be discussed on the corresponding talk page and “simple votes” (as opposed to drawn-out discussions) can be conducted on our votes page.

Questions and answers typically remain visible on this page for one to two months, but they can always be found in the appropriate monthly archive (based on the date discussion was initiated). While we make a point to preserve all discussions that were started here, talk that is clearly not appropriate for this page may be deleted. Enjoy the Beer parlour!


February 2024

Unencoded Quotations

It has been claimed that if a spelling cannot be expressed in Unicode, it should not be entered into Wiktionary.

How does this apply to quotations? If the quotation cannot be expressed in Unicode, is it inadmissible, or do we simply use the best approximation available? I may be about to be in this situation for some Tamil-script Sanskrit. --RichardW57m (talk) 12:25, 1 February 2024 (UTC)

How would it even be added? What would you type in your web browser in order to get it to render here? —Justin (koavf)TCM 12:36, 1 February 2024 (UTC)
One might use images instead of text. Or one could do something along the lines of using 'oe' instead of 'ö', or tricks like using a following 'z' to indicate a combining squiggle below for Romanian. Sometimes the text could simply be transposed to another script or otherwise transliterated. --RichardW57m (talk) 14:32, 1 February 2024 (UTC)
I'm opposed to adding these hacks, but at the very least, they should be contained within some kind of template with a tracking category like "Entries with Unicode hacks" that can be fixed as subsequent versions of Unicode are published. —Justin (koavf)TCM 18:29, 1 February 2024 (UTC)

Partially unadapted borrowings

There are categories for "unadapted borrowings", but what about partially adapted borrowings? For instance, in Romanian, we have yankeu, which is not unadapted (it has the ending changed so that it can fit into Romanian grammar), but it's not fully adapted to the Romanian phonetic alphabet, as that would be iancheu. Bogdan (talk) 09:06, 2 February 2024 (UTC)

Changes from /ə(ɹ)/ or /əɹ/ to /ɚ/

@Cpeng2 has edited multiple entries, changing /ə(ɹ)/ or /əɹ/ in the pronunciation to /ɚ/. I've left a message on his talk page requesting that they not do so, in order to comply with "Appendix:English pronunciation". Should the edits be reverted? — Sgconlaw (talk) 22:49, 2 February 2024 (UTC)

IMO, yes, if there is an appendix giving a standard, and they are going against it. Benwing2 (talk) 08:04, 3 February 2024 (UTC)
Wait, did we stop using /ɚ/ at some point? Or is there a distinction between /ɚ/ and /əɹ/ that I wasn't aware of? Andrew Sheedy (talk) 01:55, 4 February 2024 (UTC)
It appears that @Kwamikagami implemented the change in June 2023 following a discussion on the talk page (this one, I guess). Perhaps such discussions ought to take place here at the Beer Parlour. — Sgconlaw (talk) 05:25, 4 February 2024 (UTC)
@Sgconlaw Hmm, yeah, agreed. I actually think /ɚ/ is better for American English. Benwing2 (talk) 05:34, 4 February 2024 (UTC)
I assume this topic should be included if this vote ever actually happens.--Urszag (talk) 05:45, 4 February 2024 (UTC)
For GA, it's just /ɹ/. There's no phonemic distinction. kwami (talk) 05:49, 4 February 2024 (UTC)
I don't think there is any single objectively correct phonemic transcription of GA. There is no phonemic distinction between /æw/ and /aʊ/ or /ɛj/ and /eɪ/, but we still use the latter transcriptions rather than the former.--Urszag (talk) 05:57, 4 February 2024 (UTC)
Sure, but we don't use both, because that would imply that they are distinct phonemes. kwami (talk) 06:01, 4 February 2024 (UTC)
In that case we should decide on one transcription and go with it. I am not an expert on phonetics, but it may be that laypersons would find /əɹ/ more intuitive than /ɚ/. — Sgconlaw (talk) 06:01, 4 February 2024 (UTC)
I think that's probably true. kwami (talk) 06:06, 4 February 2024 (UTC)
At the very least, I don't think we should enforce things like this without a Beer Parlour discussion. However, the arguments made above do make sense to me. Andrew Sheedy (talk) 19:34, 4 February 2024 (UTC)
User talk:Your future self was doing the same thing (a student of Cpeng2). Changes from /ə(ɹ)/ to /ɚ/ are problematic, as I explained on YFS's talk page, as they make a (lazy) pan-dialectal transcription rhotic only, sometimes while keeping UK-only vowels (it'd be better to clean up instances of /ə(ɹ)/ into separate rhotic/nonrhotic or GA/UK pronunciations). In favour of /əɹ/, I see several arguments: if there's not a distinction between /ɚ/ and /əɹ/ (and hence we shouldn't be using one in some entries and the other in other entries), and we're already using /ə/ and /ɹ/, then using /əɹ/ means using fewer phonemes/symbols, and is consistent with how we notate e.g. /kɑɹt/ not /kɑ˞t/, and /woɹt/ (~/wɔɹt/) not /wo˞t/~/wɔ˞t/. I think /əɹ/ is more accurate in situations where there is a following vowel (to me, something like /əˈdʌltɚaɪn/ looks bad—there's a consonant in the pronunciation which is missing from that IPA), so I understand the appeal of using it across the board. OTOH, in cases where there isn't a following vowel, I understand the habit/appeal of using /ɚ/, like many other dictionaries do. My edits aren't consistent in using one or the other, but I do agree it'd be good to pick one as the standard to try to use consistently. - -sche (discuss) 00:14, 5 February 2024 (UTC)
@-sche You make a lot of good points although I think /əˈdʌltɚaɪn/ (what word is this BTW?) is probably actually accurate for American English; I don't hear a consonant /ɹ/ in the onset of the syllable following the rhotic schwa in such words. Benwing2 (talk) 00:22, 5 February 2024 (UTC)
adulterine, chosen only because it was the first word I spotted in the "has IPA" category that had the sequence we're discussing [in whatever notation] followed by another vowel - -sche (discuss) 01:26, 5 February 2024 (UTC)
@User:Cpeng, and @User:Your future self who has been making the changes now that Cpeng was asked to stop, it would be helpful if you could discuss here why you think Wiktionary should use /ɚ/ instead of /əɹ/. Changing a few entries at a time, as you have been doing, is unwise: the number of entries which contain this sound is so large that it is surely more sensible for everyone to reach consensus here about which notation to use, and then we can standardize things in one direction or the other (either changing all relevant cases to /ɚ/ or changing all relevant cases to /əɹ/) systematically, with the help of automated and semi-automated tools.
YFS, I appreciate you heeding what I said on your talk page about tagging rhotic pronunciations as GenAm, but please pay attention to the whole pronunciation and not only the rhotacized schwa bit; for example, here your edit resulted in the GenAm pronunciation being said to have /ɒ/ (GenAm uses /ɑ/ in that word, although it has come up in past discussions that some dialects in your area—I'm referring to the fact that you both list yourselves as CUNY folks—do use /ɒ/ in various words, but GenAm is not normally analysed as having that phoneme). - -sche (discuss) 02:43, 9 February 2024 (UTC)
@-sche FWIW, my more-or-less GA accent does have /ɒ/ but it is the sound of caught not of cot, which uses /ɑ/ or maybe a centralized [ä]. As for the wider issues, we need an English pronunciation module to reduce the Wild-West nature of our current pronunciations. User:Theknightwho has mentioned interest in doing this but has also said they might not be able to get to it any time soon. If not, I might try to create something. Thoughts? Benwing2 (talk) 05:39, 9 February 2024 (UTC)
(This ties in to what I was saying earlier on the talk page of the nurse-vowel vote, and I've commented there so that this discussion doesn't get too far off the topic of /ɚ/~/əɹ/...) - -sche (discuss) 08:27, 9 February 2024 (UTC)
@-sche Hello! Thank you for bringing up some good points! My apologies for the mistakes regarding the General American pronunciations that used the incorrect vowels. This was due to my own lack of knowledge and to (incorrectly) assuming that the lazy pan-dialectal transcription corresponded to an American pronunciation. I will be more careful about this from now on. I also agree that we ought to choose either /ɚ/ or /əɹ/ and stick with it consistently. Which one we end up choosing doesn't particularly matter to me (although I tend to lean towards /ɚ/ because I think it better captures how it is a single phoneme); however, /ə(ɹ)/ just isn't valid IPA. Parentheses are not documented as a convention anywhere. It is not clear that they indicate an "optional" phoneme, and even if that *is* understood, it is completely opaque that one version is supposed to represent GenAm while the other is supposed to represent RP. Whatever symbol we do land on, we should at least stop the practice of transcribing this sound as /ə(ɹ)/. - Your future self (talk) 21:10, 16 February 2024 (UTC)

Failure to employ Descriptivism: Implied bias against Alternative forms

On the homepage of Internet Archive, one of the most important websites, is the following sentence: "Internet Archive is a non-profit library of millions of free books, movies, software, music, websites, and more." That sentence employs the word "non-profit". Wiktionary classifies this word as an alternative form of "nonprofit". The "alternative form" treatment of "non-profit" is the standard treatment of any alternative form, sure. But it in fact treats the alternative form as a "lesser form" or a "discouraged form", because the entry is a hollow shell. Now, frankly: I'll take what I can get! You know, Wiktionary is very open to all kinds of alternative forms that most dictionaries would shun and discourage. But I want to challenge you that treating "non-profit" as a second-class citizen compared to "nonprofit", in the way Wiktionary impliedly does by making "non-profit" so skeletal, is not what descriptivism is all about. The average reader will see the "non-profit" entry and think: "ah, this is the bullshit form I shouldn't use." For Wiktionary to give off this impression is against Wiktionary's core descriptivist ethic. I don't have a solution so much as a nagging problem. --Geographyinitiative (talk) 00:27, 3 February 2024 (UTC)

Can you cite any major dictionary that does not do that? Nicodene (talk) 01:08, 3 February 2024 (UTC)
It is not obvious to me that the effect is of any importance.
If we had anything intelligent and fact-based to say about alternative forms, that would reduce this effect. Examples would be usage notes that noted usage contexts, {{defdate}}, citations, more common use with one definition rather than others (for MWEs). It would also help if the determination of what the main form was could be based on current (say, this millennium or over longer periods for words without much usage) relative frequency. DCDuring (talk) 15:30, 3 February 2024 (UTC)
Your idea of what the "average user" would or should do is also a non-descriptive imposition. Where are your statistics and proof? Equinox 16:00, 3 February 2024 (UTC)
Geographyinitiative assumes a user with low ambiguity tolerance for which the user is responsible himself. He might well appreciate not having a stance. Fay Freak (talk) 16:39, 3 February 2024 (UTC)

Thank you all for your comments. Thanks for your help. I'm thinking the same thoughts you are saying. I'm uneasy about it, but I can't think of a way out. --Geographyinitiative (talk) 17:20, 3 February 2024 (UTC)

I'm missing the original point, but would like to state that we don't (yet) have a Template:bullshit form of template. We'd do well to add it to wonderfool. Demonicallt (talk) 23:38, 4 February 2024 (UTC)

"the" in head=

A lot of English terms (with parallels in other languages) include an article in the headword when they are habitually used with the article. For example, ankle express is defined using |head=the [[ankle]] [[express]]. The next ten in alphabetical order after this that do the same are:

  1. Antiochian Orthodox Church
  2. Antlered God
  3. Antonine Wall
  4. Apennine Mountains (but Apennine Peninsula and Italian Peninsula don't get this treatment)
  5. Apostle Islands
  6. Apostolic Age
  7. Apple Isle
  8. Arab Emirates
  9. Arabian Peninsula
  10. Arab League

Some other interesting examples:

  1. 10/40 Window
  2. art of the possible
  3. aseptic technique
  4. Asian six-pack
  5. Asian songbird crisis
  6. book world
  7. calm before the storm
  8. great imitator
  9. imperial system
  10. kitty's titties (but cat's meow doesn't get this treatment)

Should we maintain this? Intuitively, it does convey some info, but (a) I'm not sure this is the best way to present it, (b) I'm not sure whether it's derivable automatically by a reasonably competent non-native speaker. I'm sure there are many cases where the "the" is missing, and indeed I found a couple of examples above. Note that I'm about to push some improvements to English headwords, one of which is to greatly improve the default linking behavior and another is to allow post-processing of the default-linked form to change specific links or add text at the beginning or end, which could be used to add "the" at the beginning without having to repeat the entire head. However, if people think the current system is the best, another possibility is a special param |def=1 to indicate that the term normally takes the definite article; there is something similar already in {{de-noun}} and {{de-proper noun}}, and it affects the entire declension table (in this case, we have no declension table, but we may have headword forms). Thoughts? Benwing2 (talk) 08:23, 3 February 2024 (UTC)

This was discussed at Wiktionary:Beer parlour/2017/March § Where to record usage of the form "the X" (I support the view that it should be indicated somewhere). J3133 (talk) 08:52, 3 February 2024 (UTC)
It's useful, and not derivable automatically. Greenpeace could have been named "the Greenpeace", but wasn't. Equinox 15:24, 3 February 2024 (UTC)
I agree with J3133 and Equinox - it conveys useful information. Theknightwho (talk) 18:12, 3 February 2024 (UTC)
I agree. Shows useful information. CitationsFreak (talk) 23:19, 3 February 2024 (UTC)
It does seem premature to have our desire for technical perfection proceed without attending to the issue of what is or is not part of the lexicon. The fact that we have divergent treatment of apparently similar headwords is not necessarily a sign of laziness or ignorance, but rather of disagreement over what should be presented. Articulating the rules for normal application of "the" is principally the job of the entry for "the". Alternating usage ((the) Gambia, (the) Ukraine, etc.) needs to be noted in the entry, probably in a usage note. Usage labels and examples accomplish most of what we need if we want to act as if the matter is lexical. Burdening headword templates with this kind of thing seems likely to waste technical resources and risks making templates/modules even harder to maintain. DCDuring (talk) 15:25, 4 February 2024 (UTC)
I don't see how this is a burden or a maintenance issue. Theknightwho (talk) 17:40, 4 February 2024 (UTC)
Complication always is. This is an instance of needless complication. I am not confident that we will be able to call on the authors of the modules when things change in the WM environment. People leave for all kinds of reasons, often unexpectedly, sometimes totally. DCDuring (talk) 18:43, 4 February 2024 (UTC)
I say that "the" should only be used if the term is very often used with it, not counting attributive uses (eg cat's meow, White House). CitationsFreak (talk) 19:44, 4 February 2024 (UTC)
I have added |def=1 and converted cases that use 'the' in the head to use it. I'm not sure how it adds complication; it's only about 6 lines of code and the principle is very simple. Benwing2 (talk) 21:45, 4 February 2024 (UTC)
I recall that 10 years ago or so our technical folks were already concerned about the complexity of {{en-noun}}. Maybe Lua makes it easier to use conventional programming methods to add complexity/features without limit. DCDuring (talk) 23:36, 4 February 2024 (UTC)
@DCDuring IMO Lua makes complexity much more manageable compared with template coding. Template code gets impossible to understand after a certain point due to the proliferation of braces and the need for duplication (because of missing features in the template syntax), but this doesn't happen with Lua. Benwing2 (talk) 00:14, 5 February 2024 (UTC)
I'm sure |def=1 makes things easier for the programmers, but from the user perspective the difference is:
Keystrokes: 4 (t + h + e + [space]) vs. 6 (| + d + e + f + = + 1)
Wiktionary-specific knowledge required: none vs. yet another named parameter; I'm not even sure how many there are so far, and other templates each have their own sets of parameters.
Output: No difference whatsoever
This strikes me as something like having Template:the, powered by Module:the, whose only output is "the ". It may not add much to the system overhead, but from a user's perspective it's a pointless waste of the user's time and learning capacity.
Mind you, I'm not exactly a technophobe. I appreciate all the wiki-bells and wiki-whistles, and spend a lot of my time adding things like headword templates and educating new users about them. It just seems like this is an outlier in the usefulness vs. complexity continuum. Chuck Entz (talk) 01:36, 5 February 2024 (UTC)
@Chuck Entz It's not 4 keystrokes, because the old way you had to use |head= and spell out the entire lemma following the "the ", but when using |def=1 you can mostly avoid that, especially since I've made the default headword-linking algorithm a lot smarter. For example, formerly snake-in-the-box problem would use something like {{en-proper noun|head=the [[snake]]-[[in]]-[[the]]-[[box]] [[problem]]}} when now you write {{en-proper noun|def=1}}. Benwing2 (talk) 01:50, 5 February 2024 (UTC)
Point taken. Is there any reason not to call it |the=1, then? That would be easier to remember. Chuck Entz (talk) 01:57, 5 February 2024 (UTC)
No reason, I just didn't think of that. I'll add that as an alias. Benwing2 (talk) 02:00, 5 February 2024 (UTC)
@Benwing2 Would it be possible to add something like def=~ for optional use of the definite article? e.g. Old National Pronunciation. Theknightwho (talk) 23:40, 10 February 2024 (UTC)
@Theknightwho Sure. How should it display? As two heads, or with (the)? Benwing2 (talk) 23:41, 10 February 2024 (UTC)
@Benwing2 I'm inclined towards displaying both separately. To an unfamiliar user, it might seem like the brackets are there because it's not part of the pagename. Theknightwho (talk) 23:44, 10 February 2024 (UTC)

What's a "multiuse collective" supposed to be?

The term is used at Module:number_list/data/en and therefore appears in some number entries. The term doesn't seem to be used outside Wiktionary. Equinox 11:06, 3 February 2024 (UTC)

It might be read as "multiuse, collective", ie, as two adjectives. This would be bad form as most of the other column heads are NPs. DCDuring (talk) 16:41, 3 February 2024 (UTC)
I might have created this but if so I have no idea what I meant. It sounds like a (poor) attempt to distinguish various sorts of collective terms. Benwing2 (talk) 21:37, 3 February 2024 (UTC)
It could be worse. Sometimes I'm reading some old discussion here and find something that makes me think 'Who does this guy think he is?' only to find that it was me. But I don't let that slow me down. DCDuring (talk) 22:47, 3 February 2024 (UTC)
I think I get it now. It means a single entity consisting of n parts (e.g. triplet, as opposed to a threesome which is three separate entities). I don't know why that is "multiuse" though. Equinox 22:42, 4 February 2024 (UTC)
@Equinox What do you think of referring to terms like threesome, foursome as "group collectives" ("collective groups"?) and terms like triplet as "multipart collectives"? Benwing2 (talk) 00:17, 5 February 2024 (UTC)
@Benwing2: That seems clearer! Equinox 09:44, 5 February 2024 (UTC)
I am the one who named it "multiuse collective"; I needed a term to distinguish it from the other collective series, especially the Germanic "-some" series. The "multiuse" series is really the "tuplet" series, which blends Latinate number prefixes and the Germanic diminutive "-et". At the time I named it "multiuse", I think one of the tuples had a multiplicative meaning, or some confusing wording that made me think it had a meaning other than collective or elemental, so I named the series "multiuse" since it had multiple uses; I figured "Germanic-Latinate collective" would be off-putting.
I agree, the term is inapt. I am glad others have changed it, but honestly the difference is only etymological and not semantic. I think the current "group" and "multipart" terms are misleading, since they imply some sort of semantic difference. Is "tuple collective" better? PhalanxDown (talk) 17:41, 7 March 2024 (UTC)
I don’t think either “multipart collective” or “tuple collective” is clear. It may be that there is no brief way of expressing this clearly, in which case just pick one of the terms and add a footnote explaining at greater length what it means. — Sgconlaw (talk) 18:10, 7 March 2024 (UTC)

Search options

In case someone wants to find, say, English words beginning with st-, are there any tools to do such a search? Pirhayati (talk) 22:06, 4 February 2024 (UTC)

@Pirhayati: For some options, go here and click to expand the Advanced Search section: [1]. Or go to Category:English lemmas (or any category) and browse alphabetically. Equinox 22:44, 4 February 2024 (UTC)
Thanks. What if I want to find the words ending with -st or with a middle -st-? Pirhayati (talk) 08:54, 5 February 2024 (UTC)
As far as I recall, you can't search by suffix; see e.g. this discussion from 2015: [2]. If you are techy, one option is to download the periodic "titles only" database dump, and parse it with a simple script. Equinox 09:47, 5 February 2024 (UTC)
You can use incategory:"English lemmas" intitle:"*st*" in Special:Search for the middle -st-, but "*st" will show you things that have "st" at the end of any word, not just the last one. There might be some way to use a regex with the intitle keyword, but I wasn't able to make it work. Chuck Entz (talk) 15:59, 5 February 2024 (UTC)
You can search for words with a certain ending using this tool; set the category to "English lemmas" (or whatever language you're interested in); if you also want non-lemmas, do a second search for "English non-lemma forms". It may time out or truncate the results if you search for something extremely common. If you're techy, the best option is as Equinox said. If you're not techy, you can download AutoWikiBrowser, download a database dump, and use AWB to search the database dump for all pages with st in the title and e.g. ==English== in the content; AWB will not allow you to edit Wiktionary unless you've been approved by the community, but AFAIK you can use it to search database dumps and categories and generate lists. It stops after some high number of results, though, so if you want a list of all words with something extremely common like e, you need to learn to parse a dump with your own script as Equinox said. - -sche (discuss) 15:38, 5 February 2024 (UTC)
incategory:"English lemmas" prefix:st works for me. JeffDoozan (talk) 16:37, 5 February 2024 (UTC)
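For anyone trying Equinox's suggestion of parsing the "titles only" dump with a simple script, here is a minimal Python sketch. It assumes the dump lists one title per line with underscores in place of spaces (the layout of the gzip-compressed "all titles in main namespace" files on the Wikimedia dumps site); the helper names matching_titles and dump_titles are illustrative, not an existing tool. Note that the titles dump covers every language and includes non-lemma forms, so results still need filtering against a category such as English lemmas.

```python
import gzip
import re
from typing import Iterable, Iterator


def matching_titles(lines: Iterable[str], pattern: str) -> Iterator[str]:
    """Yield titles whose text matches the regular expression `pattern`.

    Assumes one title per line, with underscores standing in for spaces
    (as in MediaWiki titles-only dumps); underscores are turned back
    into spaces before matching.
    """
    regex = re.compile(pattern)
    for line in lines:
        title = line.rstrip("\n").replace("_", " ")
        if title and regex.search(title):
            yield title


def dump_titles(path: str) -> Iterator[str]:
    """Stream lines from a gzip-compressed titles-only dump (path is up to you)."""
    with gzip.open(path, "rt", encoding="utf-8") as handle:
        yield from handle


# Demo on an in-memory sample instead of a real dump file.
sample = ["stand", "forest", "mistake", "cast_iron", "best"]
print(list(matching_titles(sample, r"st$")))   # ['forest', 'best']
print(list(matching_titles(sample, r".st.")))  # ['mistake', 'cast iron']
```

The pattern r"st$" finds titles ending in -st, r".st." finds a medial -st- (the surrounding dots require at least one character on each side), and r"^st" reproduces the prefix search. To run it on a real dump, pass dump_titles with the path of whatever file you downloaded.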

Listing West Proto-Slavic Descendants

@Sławobóg @Thadh @AshFox @ɶLerman

Would the following descendants section format be too busy? An example would be:

  • West Slavic:
    • Czech-Slovak:
      • Old Czech: bez
      • Slovak: bez, baza
    • Lechitic:
    • Sorbian:

Vininn126 (talk) 13:34, 5 February 2024 (UTC)

I'm highly skeptical of Czech-Slovak as a valid branch. Once you get to dialectal Slovak the number of shared developments is extremely low, and it is well known that Old Slovak was under influence of Old Czech for many years. Thadh (talk) 13:41, 5 February 2024 (UTC)
I take it from your lack of comments that you don't see any problems with grouping Lechitic like this? Vininn126 (talk) 13:45, 5 February 2024 (UTC)
I don't think it's problematic. Thadh (talk) 13:54, 5 February 2024 (UTC)
I am for it! A long time ago I proposed adding a Czech-Slovak subgroup to the tree of Slavic languages. But then Thadh also voiced his skepticism about the Czech-Slovak subgroup. AshFox (talk) 05:04, 6 February 2024 (UTC)
I really ask you not to combine Czech with Slovak, since Czech dialects influenced the Western Slovak dialects, and those in turn influenced the Central and Eastern dialects. ɶLerman (talk) 12:14, 6 February 2024 (UTC)
Also, Polabian does not need to be designated as Lechitic. ɶLerman (talk) 12:15, 6 February 2024 (UTC)

Some Issues with the Way Hebrew Verbs (and Forms Thereof) Are Displayed

Hi,

I’ve noticed some glaring issues with the way Hebrew verbs are handled that I think need to be addressed:

1. Different binyanim with same etymology listed with different "Etymology" headings

Verbs (lemma forms and forms-of) with the same headword spelling (thus listed together on the same page) are often separated under different "Etymology" headings in order to separate/disambiguate different binyanim, even when the words in question have exactly the same etymology. Take the entry נשלח, for example, which contains נִשְׁלַח (nishlákh), the 3ms past/perfect (lemma) form, and נִשְׁלָח (nishlákh), the ms participle/present form, of the nif'al/"N-stem" binyan under "Etymology 1", as well as נִשְׁלַח (nishlákh), the 1cp future/imperfect form of the qal/pa'al/"G-stem" binyan, under a second heading of "Etymology 2".

However, in reality, both of these binyanim and all three forms share the exact same etymology, i.e. the same triliteral root שׁ־ל־ח (š-l-ḥ). Different binyanim do not constitute different etymologies, and such incorrect and misleading usage of the "Etymology" heading could actually create significant confusion in cases where there are actually different etymologies under the same headword/entry page, either where two different words just happen to have the same spelling despite having different roots, or where there are, in fact, two different roots with the same or similar spelling. For an example where this could be the case, one need look no further than words with the roots פ־ר־שׂ (p-r-ś) or any of the three (by Hebrew Wiktionary’s reckoning—see earlier link) separate roots with the literals פ־ר־שׁ (p-r-š) (Hebrew Wiktionary labels these different roots with the exact same spelling as פ־ר־שׁ א, פ־ר־שׁ ב, and פ־ר־שׁ ג, respectively).

Now, if one were to use the strategy utilized on נשלח of using different "Etymology" headers for different binyanim, in addition to the already potentially confusing plurality of headers needed for all of the different actual etymologies on פרש, things would get out of control pretty quickly. In fact, our entry for פרש actually does suffer from this malady. It is only (partially) redeemed from total confusion by the fact that the entry contains multiple {{HE root}} templates, which serve to group the actual etymologies together—unfortunately, such a strategy is not possible on a page like נשלח, which will only ever have a single {{HE root}} template. Importantly, it will only have a single {{HE root}} template precisely because all of the words in the entry only have one etymology, making the case by itself that "Etymology" headers should not be used to separate or classify different binyanim whose etymologies are, in fact, identical. Again, a binyan is not an etymology, it is effectively a higher-level kind of conjugation, a pattern which provides information about how a verb is used (i.e. voice—active/passive/reflexive, causativity, etc), which vowels should be used to conjugate its various parts, and whether any affixes are required, but it usually has no purely semantic information in itself. I suppose one could make the argument that "D-stem" or pi'el verbs often carry "extra" semantic content, usually in the form of "intensification" of the "G-stem"/qal/pa'al verb, although more rarely with somewhat different meaning altogether, but this still doesn’t change the fact that the word's etymology and primary semantic content is ultimately derived from the root, regardless of the binyan. 
And because most binyanim do not, in fact, contain any purely semantic content in and of themselves, the peculiarities of the "D-stem"/pi'el binyan cannot on their own justify this usage of the "Etymology" heading, especially given all the issues such usage raises, as discussed above.

One solution that I can think of to this problem would be to add a parameter to the {{he-verb form}} template that lets editors specify the binyan, in the same way that {{he-verb}} already does. This parameter would use the same shortcodes as {{he-verb}} for the various binyanim, and the result would be the appending of something like "(qal construction)" to the end of the {{he-verb form}} template, clearly disambiguating between it and other binyanim in the same entry, and eliminating the need for editors to abuse the "Etymology" header, leaving it for the task for which it should actually be used.

2. No good way to specify alternative spellings of verbs on same entry

I really think we need a way to specify alternative spellings in entries that use the {{he-verb}}/{{he-verb form}}/{{he-verb form of}} templates—for example, the verb שִׁלֵּחַ (shiléakh), with alternative spelling שִׁלַּח (shilákh). Currently, the only way to show the alternative would be to list it as a separate verb in the entry and then write "alternative form of…" in the definition (this is what I, myself, did on the aforementioned entry). While this particular combination of verb/tense/person/number only alternates between two spellings, many Hebrew verb conjugations have as many as four alternative spellings. We have a special parameter for pausal forms in {{he-verb}}, so IMO alternative forms should also be supported by template parameters (and by the way, we also may want to have a way to specify paragogic nun forms of verbs, so that they may be added and properly notated, in a way similar to pausal forms and my proposal here for alternative spellings).

Note that this is also a problem in the {{he-verb form}}/{{he-verb form of}} entries for the conjugated parts of such verbs, such as with יְשַׁלֵּחַ (y'shaléakh)/יְשַׁלַּח (y'shalákh). On those particular entries it is not quite as much of a problem to list the different spellings under different {{he-verb form}} templates, because they are already "forms of" anyway, but it still isn't ideal.

Note: I also posted about the above issue (#2) here, under the discussion/talk page for {{he-verb}}, but I wanted to bring it to the wider attention of this community here, since discussions at such parochial talk pages can easily be overlooked for long periods of time.

Please let me know what you think is the best way to solve either of these problems, and whether either of the solutions I have proposed makes sense.

Thanks,

Hermes Thrice Great (talk) 00:16, 6 February 2024 (UTC)

Regarding the first issue, I agree with your proposal. To give an example from another language group: when Romance languages have a noun that has one gender with some meanings and a different gender with other meanings, the practice is to simply have a second noun header (e.g. meia has two noun headers under one etymology). This would be a better approach for Hebrew verbs with different binyanim, I think. There would be a single etymology, but two (or more) verb headers, which would specify the stem. I have no thoughts on the second issue, for the moment. Andrew Sheedy (talk) 06:13, 6 February 2024 (UTC)
@Hermes Thrice Great For the first issue, I proposed last July a solution involving Etymology sections with numbers like 1.3 and 2.1, which are used to group the equivalent of terms from different binyanim in Arabic vs. terms from different roots. See WT:Beer parlour/2023/July#Etymology sections like 1.3, 2.1. There are issues with stuffing different binyanim under a single root in a single Etymology section, e.g. it becomes trickier to indicate the different pronunciations of the different forms (as well as the fact that they do indeed have different etymologies, logically speaking). For the second issue, we now have fairly general support for listing multiple variants of a term in a single link, courtesy of User:Theknightwho. The Hebrew templates haven't been reworked to use this support, however. As for your example of שִׁלֵּחַ (shiléakh) vs. שִׁלַּח (shilákh), however, these aren't just different spellings but also have different pronunciations, so I think some form of {{alt form of}} is correct. Benwing2 (talk) 23:58, 6 February 2024 (UTC)
Thanks for your response. I think both of these solutions make a lot of sense. I should like to see if anyone else wants to chime in, but I can definitely work with these solutions.
Hermes Thrice Great (talk) 08:05, 7 February 2024 (UTC)

Parameters for citing translations in {{quote}} templates

I think we should add a few parameters for citing where we got an English translation of a foreign-language text, when a published English translation of the quoted text is readily available (i.e. not translated by our own editors). What do people think? Pinging @RcAlex36, who gave me the idea. — justin(r)leung (t...) | c=› } 04:48, 6 February 2024 (UTC)

@Justinrleung: you could use |footer= to indicate the source of the translation and, if desired, format the citation using {{cite-book}}, etc. I don’t think it’s necessary to have new parameters for this purpose. — Sgconlaw (talk) 04:53, 6 February 2024 (UTC)
@Sgconlaw: I'm thinking of Chinese quotations, where we use {{zh-q}} rather than |text= and |t= in the {{quote}} templates, but still the rest of the template for the bibliographic info. Using |footer= would not have the desired effect in this case, right? — justin(r)leung (t...) | c=› } 16:42, 6 February 2024 (UTC)
@Justinrleung: I'm not familiar with Chinese entries. Can you provide an example? — Sgconlaw (talk) 16:54, 6 February 2024 (UTC)
@Sgconlaw: Something like 另起爐灶. — justin(r)leung (t...) | c=› } 17:19, 6 February 2024 (UTC)
@Justinrleung Can you give an existing Chinese example that cites where an English translation came from? Your example of 另起爐灶 doesn't do that; it just seems to use {{zh-q}} for formatting the quotation itself, for some sort of technical reason that I don't completely understand (IMO we should strive to eliminate things like {{zh-q}}). Or alternatively, a hypothetical example with the syntax you propose. Benwing2 (talk) 22:31, 6 February 2024 (UTC)
@Benwing2: A use case would be 國君, which is currently using the even older format of |ref= in {{zh-x}} instead of having a {{quote-book}} with {{zh-q}}. What I would imagine it being with what I propose would be something like this:
{{quote-book|zh|year=c. 4th century {{BC}}|title=zh:《左傳》|trans-title=Commentary of Zuo|en-trans-title=Zuozhuan: Commentary on the "Spring and Autumn Annals"|en-trans-year=2017|en-trans-translators=Stephen Durrant; Wai-yee Li; David Schaberg|...}} — justin(r)leung (t...) | c=› } 22:45, 6 February 2024 (UTC)
@Justinrleung How would this be displayed? Also maybe there is a way of doing this with |newversion= and |foo2= params; there are already quite a lot of params to {{quote-book}}, and various ways of citing translations, and I'm reluctant to add even more. Can you take a look at Template:quote-book#Reprintings,_translations_and_quoting_one_book_in_another near the bottom of that section where some text is quoted from a translation of The Snow Queen by Hans Christian Andersen and let me know if that format works? Maybe User:Sgconlaw can comment. Benwing2 (talk) 23:48, 6 February 2024 (UTC)
It makes sense. Currently the information is given by various hacks, like on فَنَك (fanak), or for سَابُورُ بْنُ سَهْلٍ [Sābūr ibn Sahl] (a. 869), Oliver Kahl, editor, Dispensatorium Parvum (al-Aqrābādhīn al-saghīr) (Islamic Philosophy, Theology and Science. Texts and Studies; 16) (in Arabic), Leiden: Brill, published 1994, →ISBN, where I left out mention of the 2003 translation The small dispensatory by the same author because the author is already cited as editor (and I do not always adopt published translations unrevised anyway).
Benwing is also right that we have had various workarounds for sequenced editions in the past; probably nobody could even recall them all from memory. Imbiss still has the same hack as always. We tend to think up solutions for infrequent problems and then forget both. Fay Freak (talk) 00:02, 7 February 2024 (UTC)

Handling of some mathematical terms

In mathematics there are hundreds of adjectives which are used to indicate a certain mathematical object has some property (e.g. prime, normal, free). These terms usually appear next to a word indicating what kind of object they are characterizing (e.g. prime number, normal subgroup, free group), but sometimes they do not (e.g. x is prime, H is normal, G is free). Is there established policy on article/sense creation in this case? Do we create

(a) Senses under the adjective labeled (of a [number, subgroup, etc.])
e.g. local: (algebra, of a ring) Having a unique maximal (left) ideal.
(b) Separate entries of the form [adjective] [object] when the relevant sense of [adjective] only refers to mathematical structures of the type [object]
e.g. local ring: (algebra) A commutative ring with a unique maximal ideal, or a noncommutative ring with a unique maximal left ideal or (equivalently) a unique maximal right ideal.

or both? Or something else? In general I am in favor of (a) (if we can find sufficient evidence of use outside [adjective] [object]), and (b) seems justified given that textbooks read (e.g.) "a ring is called local if..." about as often as "a local ring is..."; see relevant discussion on Talk:prime number.

There's also a related problem: lots of properties in math can be seen either in a specific setting or in generality (e.g. ε-δ continuity of real-valued functions vs topological continuity). So, if we gloss free with its category-theoretic sense, should we remove free group as SOP? If we keep it, presumably we do so on the basis that the more "groupy" sense we have at free group now is meaningfully different from something like "a free object in the category of groups", but this is kind of epistemologically murky since the definitions are (mathematically) equivalent, and practically difficult because there are lots of algebraic objects which can be free.

Is there already consensus on this issue? Discussions I'm not aware of? Winthrop23 (talk) 13:03, 6 February 2024 (UTC)

Among other things, we are a historical dictionary. Thus, early use of a term often merits inclusion, i.e., with its attestable meaning and part of speech. This may not be entirely consistent with its current use. So, if free in group theory preceded free in category theory (presumably a generalization/abstraction), we would like to have the early definition. I don't know whether we would want to have every attestable definition of free that preceded its ultimate? generalization in category theory.
Generally, we try to write definitions that can be understood by English language learners and those who are not experts in a field (like mathematics). Thus, simply moving a technically correct definition often does not lead to a satisfactory definition.
@Msh210 has entered many mathematical definitions and probably can help. DCDuring (talk) 17:09, 6 February 2024 (UTC)
@Winthrop23 I'm generally in favor of (a) to avoid duplication, although maybe some particularly common cases of (b) are acceptable under some circumstances. I think User:DCDuring's concern about historical accuracy can be handled in the mathematical definition of free (maybe through the Etymology section or in a Usage Note or something) without needing to create a bunch of specific entries like free X and free Y. Benwing2 (talk) 00:06, 7 February 2024 (UTC)
@Benwing2: See WT:IDIOM on in a jiffy and jiffy. DCDuring (talk) 13:47, 7 February 2024 (UTC)
Thanks for the ping. I agree that, by and large, we should follow "(a)" and treat things like free group as sums of parts, as Benwing2 said. That said, if there was a clear predecessor (for example, free group was in use before anyone thought to separate the words (which I highly doubt is true for this particular case, but maybe it's true for some (though off the top of my head I can't think of any likely candidates))), then IMO DCDuring's argument is a good one.—msh210 (talk) 08:59, 13 March 2024 (UTC)

Classical Gaelic miscellanea

I’m preparing to start a preliminary WT:About Classical Gaelic page now that the ghc language code has been split off.

Before I do that I’d like to ask ye about some preliminary stuff and see what you think of my ideas. There’ll be a few different issues here. I wasn’t sure if it fits any general discussion spaces, so I’ve chosen Beer Parlour as the most general one – hope that’s OK!

This post is a general request for comments.

What falls under Classical Gaelic and what doesn’t

So far we’ve had Middle Irish for anything up to ~1200 and then Irish or Scottish Gaelic for anything later. The ghc language code is not treated as the ancestor of Irish and Modern Gaelic (as it was a fairly standardized literary language that at the end of the period was quite different from the vernaculars). So the question is which texts should fall under the Classical Gaelic heading and which should not.

My proposal is:

  1. treat any text composed between the 13th and 15th centuries (inclusive) in Ireland and Scotland as Classical Gaelic,
  2. treat all poetry fulfilling dán díreach requirements from Ireland and Scotland, up to the 18th century, as Classical Gaelic,
  3. and for 16th century and later prose, decide depending on diagnostic features of the text in question.

The last will be fairly easy for Scottish texts, a bit more difficult for Irish ones. If a Scottish text has consistent use of:

  1. plural verb endings (cuirid for ‘they put’),
  2. full preverbs do-, a-, ad- (do-bheir, a-tá, ad-chím, etc.),
  3. no reduction of do in past tense and relative clauses (do ghrádhuigh agas do ghlac ‘which have loved and accepted’),
  4. no reduction of do before verbal nouns (tareis an fhuar chreidimh do chur ar gcul ‘after putting away the vain faith’),
  5. use of eclipsis (a gcriochaibh for ‘in bounds / lands / countries’),
  6. use of future tense separate from present,

it’s (prose) Classical Gaelic and not Scottish Gaelic.

So Carswell’s Foirm na n-Uirrnuidheadh would be Classical Gaelic, but the 1767 Gaelic Bible would be Scottish Gaelic. As Donald Meek put it in Language and Style in the Scottish Gaelic Bible (1767–1807):

(…) Thus, a bardic poem was governed by regulations that defined its language and style to a very minute degree. Prose was less strictly controlled, but it was carefully regulated, with the verb-endings correctly used according to classical norms, and other aspects of morphology (…) closely observed. This type of prose was written in Scotland well before Keating’s time, and occurs in the first Gaelic book ever printed, namely John Carswell’s translation of the Book of Common Order, published in 1567. (…)

Carswell’s version [of a Biblical passage] is noticeably different from the form of the verse in the 1767 translation (…).

Although stylistically variable, from the highly ornate prose of John Carswell to the leaner but consciously dignified prose of Geoffrey Keating, Classical Gaelic of Type A made few concessions to the sort of language actually spoken by the ordinary people, certainly in Scotland. Nevertheless, one senses in Carswell a transparency and an occasional lightness of touch (…) which seem almost to anticipate the need for a level of language capable of connecting with the non-classical vernacular language. (…)

This balance, between ‘differentiated register’ and the vernacular, was in due time fully achieved by employing another of Classical Gaelic, which we can call ‘Type B’ for convenience. [we just call this (Early) Modern Scottish Gaelic on Wiktionary] (…)

(…) Kirk’s Bible is in the ‘Type A’ style; the later Old Testament is in ‘Type B’. (…) The most salient differences are:

  1. The pre-verb do, used to mark past tenses in the classical language, is not used in independent position in the Scottish Gaelic Old Testament, whereas it is fully preserved in Kirk; thus sheas rather than do sheas (v. 3).
  2. The Scottish Gaelic text generally employs analytic forms of the verb, and does not use verbal inflections to indicate the person of the verb; thus chruinnich na Philistich rather than do chruinnigheadar na Philistinigh (v.1).
  3. The Scottish Gaelic text also marks nasalisation (eclipsis) in the Scottish manner, rather than in the classical manner (…); thus nam Philisteach rather than na Bhphilistineach (v. 2).
  4. There are noticeable differences in vocabulary, and the Scottish translation generally reflects a register more in keeping with Scottish use; thus ghlaodh e re slòigh Israeil rather than dfúagair ar sluaghaibh Israel (v. 8), (…)

He calls the Gaelic of the later Bible translation “Classical Gaelic Type B” but honestly I don’t see why as it’s pretty much just a high register of modern Scottish Gaelic. Fairly close to the modern vernacular and quite distant from Carswell.

It’s a bit more difficult to make a list of such diagnostic features for Irish, as many of them are kept in one dialect or the other and show up in prose fairly late (even if the language is generally recognizably a modern dialect). I’d suggest a few, though:

  1. points 2–3 from the Scottish list,
  2. occasional use of s-preterite forms in the past meaning,
  3. consistent use of the -idh ending in 3rd person present absolute verbs (and only restricting -ann to dependent/conjunct forms),
  4. other distinctions between synthetic absolute / conjunct endings (cuirmíd vs ní chuiream; cuirid vs ní chuiread),
  5. subject/object distinction in 1st sg., 1st pl. and 2nd pl. pronouns ( vs mhé; sinn vs inn; sibh vs ibh),
  6. keeping the preposition re beside le (even if not distinguished in meaning),
  7. use of eclipsing ar to express perfect,
  8. use of eclipsing go in the meaning ‘with’,

and perhaps more.

I’ll need to collect more examples of Irish texts from 17th and 18th centuries and compare them to see how easy it is to classify them as classical or not.

Lemmatizing verbs

We need to choose the lemma form for the verb. We have a few options, each with its advantages, disadvantages, and some precedent.

Personally I lean towards the 3rd person present indicative, but here’s the full list:

  1. 3rd sg. pres. indicative – the same form as we use for Old and Middle Irish,
  2. 1st sg. pres. indicative – as in Dinneen,
  3. imperative – like for new dictionaries of Modern Irish,
  4. verbal noun – like in medieval tracts.

1. is used in DIL (where most of the citations are actually from classical texts; it’s more of an Early Modern Irish dictionary than an Old Irish one, despite mostly using Old Irish-like spelling), and in vocabulary lists in some editions of classical texts (see the glossary on the Léamh.org website); it’s also the form in Eoin Mac Cárthaigh’s The Art of Bardic Poetry. So I’d say it’s the standard modern practice for Classical Gaelic, and well established since the beginning of the 20th century.

2. is used by Dinneen – his dictionary deals mostly with Modern Irish, but it uses the pre-reform spelling which is often followed by the editions of classical texts (see next section), and contains a lot of historical usage (making it perhaps the most useful dictionary for reading classical texts). It’s also used in many editions of classical texts (again, see the Léamh glossary, e.g. the entry for cuirim).

Generally, editions of classical texts seem to choose either 1st sg. or 3rd sg. form.

I haven’t seen 3. used for classical vocab lists (and it isn’t used by DIL – the sole dictionary actually encompassing the language) – but it’s the form that’s generally used for modern languages (both Irish and Scottish Gaelic), except for Dinneen’s dictionary. Choosing it would be useful for people wanting to find classical forms of a verb when they know the modern Irish imperative – as the lemma would often be the same.

4. is what’s found in the actual inflection lists in grammatical tracts from the 14th–16th centuries – the bardic schools treated the verbal noun as the name of a verb. I wouldn’t choose it, though, as the verbal noun has its own separate morphology and syntax, so I’d rather keep it listed among the verb’s forms and then have it as a separate lemma with its own inflection table.

Spelling normalization of lemmata

The spelling in actual early modern manuscripts and printed books varies a lot and can be anything from early Old Irish practices to, sometimes, pretty much post-caighdeán modern Irish.

Still, modern editions generally use late (post-15th century) style spelling (using ao for the /əː/ vowel instead of áe or the like, marking lenition of b, d, g consistently, etc.) so I think we can stick to later practices too – only listing actual orthographic variants appearing in texts in the entry.

The question is which exact spelling we should choose. We have the Irish Grammatical Tracts i: Introductory, likely from the early 1500s, which gives spelling guidelines that Mac Cárthaigh follows in his The Art of Bardic Poetry; they include features such as:

  1. no a between é and a broad consonant (e.g. bél rather than béal for ‘mouth’),
  2. doubling of eclipsed voiceless stops (a ccríochaibh instead of i gcríochaibh, a ttír and not i dtír, etc.),
  3. use of sg, sb, sd over sc, sp, st.

On the other hand, many modern editions stick to forms that got popular later and are closer to (or identical with) Dinneen’s spelling (béal, i gcríochaibh, scéal).

Reconsider whether we want Classical Gaelic to be a sister of Irish and Sc. Gaelic or their ancestor

It’s generally agreed that Irish and Scottish Gaelic split during the Middle Irish period and began their own grammatical innovations at that time. But they both kept the same literary tradition wherever the Gaelic political order, with poets educated in bardic schools working as diplomats and public commentators, was present. Gaelic was still popularly considered a single language at the time, up to at least the 18th century.

If we adopt the policy I suggested above, anything from Scotland and Ireland from the period 13th–15th century would classify as Classical Gaelic. And so it basically is the ancestor form of both – a language with some dialectal differences but generally a single literary standard allowing the use of many dialectal forms.

And most Scottish, as well as Irish, forms can be regularly derived directly from classical ones (even though some cannot, where the classical standard follows some Irish innovations not present in Scotland). So it can be useful in Etymology sections to derive some words from Classical Gaelic (and perhaps interesting to users of Wiktionary to see a “descendants” list under classical entries).

But since it was a classical standard not really representing any particular spoken dialect and during later times fairly removed from them, I’m not convinced that’s what we want to do.

So I’m mentioning this also as something to discuss further in the next weeks or months. // Silmeth @talk 21:46, 6 February 2024 (UTC)

@Silmethule You might want to take this to the discussion page of WT:About Classical Gaelic. There are a lot of issues you're asking about and they seem to be something that only knowledgeable Irish and Scottish Gaelic editors would be able to discuss cogently. I'm not sure who these editors are but if you know, you can create a workgroup in Module:workgroup ping/data containing them, for ease in pinging. Benwing2 (talk) 00:01, 7 February 2024 (UTC)
@Benwing2: fair, I’ll start a stub of that page later today and move the discussion there. // Silmeth @talk 10:00, 7 February 2024 (UTC)
Moved to Wiktionary talk:About Classical Gaelic. // Silmeth @talk 20:02, 7 February 2024 (UTC)

Derived terms and surface analysis

As previous discussions have made apparent (1, 2), the precise scopes of 'derived terms' and 'surface analysis' remain uncertain. There is a clean way to fix this, as it happens.

First, note that our mainspace defines surface analysis as a 'synchronically valid analysis of a word's morphology regardless of whether it represents its diachronic etymology, that is, its historical origin'. That is a fairly clear definition and, in principle, easy to follow.

In practice, however, surface analysis has been thoroughly confused with surface etymology, to the extent that even the glossary blends the two into a synchronic-diachronic mess. In fairness, the problem isn't always apparent; the cited example earthen : earth + -en happens to work on both levels. Then one runs into any number of cases like the Bulgarian луна (luna), with its surface analysis of луч (luč) + -сна (-sna), and everything falls apart. Mind, neither *луч nor *-сна even exist synchronically, and the problems only multiply from there. If, following *lówks+neh₂, one tries to combine the actual Bulgarian noun лъ́ч m (lǎ́č) "light" and the actual suffix -ен (-en) "forms adjectives", all they could ever hope to produce is лъ́чен (lǎ́čen), a relational adjective for "light". Try as one might, there is simply no way to make anything like луна́ f (luná, moon) from the available materials.

To be sure, the more typical mistakes are of a milder type, such as dissolve : dis- + solve, where the synchronic combination may look intuitive but produces the wrong pronunciation (/dɪ(s)ˈsɒlv/ ≠ /dɪˈzɒlv/) or the wrong meaning ('dis-solve' ≠ 'dissolve').

In any case, to sort all this perhaps one might try a plan like:

1) Change {{surf}} to display a vague/ambiguous 'equivalent to'. (Accepting for the moment that SA and SE are hopelessly confused.)

2) Gradually introduce distinctive replacements:

  • {{synch|Y|Z}} – 'synchronically derivable from Y + Z'
  • {{diach|Y|Z}} – 'diachronically corresponds to Y and Z' (or 'etymologically')

3) Introduce a sorting rule (possibly automatable?) such that:

  • If a given word X has a valid etymology with {{synch|Y|...}}, then (and only then) it is put under entry Y as a derived term.
  • If X has no valid etymology with {{synch|Y|...}} but does have one using {{diach|Y|...}}, or has some longer-range etymological link with Y, then it is put under Y as a related term.
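Since step 3 is floated as possibly automatable, here is a minimal sketch of how a bot might apply the two sorting rules. It assumes each entry's etymologies have already been parsed into the template used ({{synch}} or {{diach}}) plus the components passed to it; the Entry/Etymology records and the classify helper are hypothetical illustrations, not part of any existing Wiktionary module. It also assumes Y may be any component named in the template, not only the first.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Set


@dataclass
class Etymology:
    # Hypothetical parsed form of one etymology: the template used
    # ("synch" or "diach") and the components passed to it.
    template: str
    components: List[str]


@dataclass
class Entry:
    word: str
    etymologies: List[Etymology] = field(default_factory=list)
    # Hypothetical: words linked to this one by a longer-range etymology.
    distant_links: Set[str] = field(default_factory=set)


def classify(x: Entry, y: str) -> Optional[str]:
    """Return the section of entry y under which x should be listed, if any."""
    # Rule 1: a valid {{synch}} etymology naming y makes x a derived term of y.
    if any(e.template == "synch" and y in e.components for e in x.etymologies):
        return "derived terms"
    # Rule 2: otherwise a {{diach}} etymology naming y, or any longer-range
    # etymological link to y, makes x a related term of y.
    if any(e.template == "diach" and y in e.components for e in x.etymologies):
        return "related terms"
    if y in x.distant_links:
        return "related terms"
    # No etymological connection to y: x is not listed under y at all.
    return None


murderer = Entry("murderer", [Etymology("synch", ["murder", "-er"])])
montage = Entry("montage", [Etymology("diach", ["mount", "-age"])])
print(classify(murderer, "murder"))  # derived terms
print(classify(montage, "mount"))    # related terms
```

The {{synch}} check runs first so that a word with both kinds of etymology is only ever listed once, as a derived term, which matches the "then (and only then)" wording of the first rule.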

ETA: It seems Rua had some similar thoughts back in the day. Nicodene (talk) 15:49, 7 February 2024 (UTC)

@Nicodene (For the record, at least луч (luč) is considered to exist in Bulgarian, according to RBE. If the roots indeed didn't exist, I'd find it to be a slightly absurd surface analysis and say it should be removed, but that one's fine IMO.)
Anyway, how are surface etymology and analysis different again? If "synchronic" means relating to a language in one point in its history, then it can't use forms that precede that point of the language, so we should be restricted to using only forms that exist in the modern form of the language, right?
My other question is why we would want to represent diachronically equivalent forms. What would be an example of this? Analyzing a modern English term as its Middle/Old English roots? I guess this should be used when we don't know an exact origin for a word, but want to represent it based on the component roots we know it came from? Kiril kovachev (talkcontribs) 20:36, 7 February 2024 (UTC)
@Nicodene Yeah I have the same concern as Kiril. I don't understand the difference between surface analysis and surface etymology, and unless this is made crystal clear, introducing two replacements for the one we have is just going to make things messier. Benwing2 (talk) 21:29, 7 February 2024 (UTC)
@Kiril kovachev, Benwing2:
Sorry, in trying to avoid rambling I'd cut out examples and quite a bit of context.
The gist is that for a modern Bulgarian word to have a synchronically valid morphological analysis, the morphological combination that one proposes for it by definition has to be synchronically possible - that is, the indicated elements should exist and be able to combine, within and according to the grammar of modern Bulgarian, to produce the form luná.
Suppose that a sorcerer makes everyone forget that any word like luná ever existed. Would it be possible for a Bulgarian to wake up tomorrow and recreate that word by combining the noun lăč 'light' (still the only standard form I am aware of, not that it really matters) with a suffix -sna?
No, first and foremost because there exists no suffix -sna which modern Bulgarians attach to nouns to create new nouns from them. Never mind being able to make new words with it, they wouldn't have any reason to suspect that their language had ever even had a -sna suffix unless informed of it by a linguist. I digress.
----
As for the terms 'surface analysis' and 'surface etymology' - frankly my preference would be to toss them in the bin and never look back. I wish I'd had the foresight to avoid using them at all in the earlier comment, as they (understandably) cause confusion.
Is the suggested alternative of 'synchronically derivable from' not clear? Another example: murderer is synchronically derivable from murder, because any time we like we can take a noun like murder and pop the agentive suffix -er onto it. Combining /ˈməːdə/ and /-ə/ gives us a form pronounced /ˈməːdəɹə/, and combining their senses results in one of 'he who engages in murder'. For all intents and purposes our creation is identical to the existing word murderer and we have proved that the latter is synchronically derivable. So its etymology merits the use of {{synch|murder|-er}} and we put murderer as a derived term under murder. (In principle we should likewise be able to synchronically recreate anything else found as a derived term under murder - and if something doesn't fit, then it would be moved instead under 'related terms'.)
On the other hand, we have no way to synchronically derive the English montage, even though we have the etymological components in mount and -age, because combining them results in the wrong pronunciation (/ˈmaʊntɪd͡ʒ/) and the wrong meaning ("act of mounting"). Thus in this case we could not use {{synch}} and would instead use {{diach}}, since the components mount and -age do etymologically, and diachronically, correspond to those of montage, even if they cannot actually be combined to make montage. Nicodene (talk) 01:07, 8 February 2024 (UTC)
@Nicodene I see. Yeah I have never liked "surface blah" either, and have always preferred "synchronically analyzable as" which is similar to "synchronically derivable from". I actually think the vast majority of uses of {{surf}} are simply synchronic analyses; the cases like montage are rare and probably don't merit any sort of analysis into English components. I would just say montage is borrowed from French. There are indeed occasional cases (e.g. in Russian) of languages inventing French-like terms using French roots and French affixes that aren't normally productive in the destination language along with French combining rules, but it's not clear we need a special template for this. Benwing2 (talk) 01:15, 8 February 2024 (UTC)
I'd no interest in using {{diach}} myself and rather intended it as a relaxed alternative to {{synch}}. I'd somewhat pessimistically estimated that a third or so of the current uses of {{surf}} would fail to qualify as synchronically valid and so would one day have to be deleted or switched to another template. Having just surveyed some forty uses, however, I'm surprised to report that I agree with all but three.
If nobody pipes up in favour of {{diach}} I'm happy to drop it from the proposal, leaving just {{synch}}. Incidentally, an unintended side-effect of the phrasing 'synchronically derivable from' is that it may render {{af}} and such redundant. It seems just about inevitable that the preface works for any correct example of affixation. Nicodene (talk) 03:06, 8 February 2024 (UTC)
My position has always been against using linguistic jargon where it can be avoided (I favoured binning the "proscribed" label, for instance). I'd be a lot happier to see {{surf}} generate "equivalent to". I know it's imprecise, but so what? Your average punter isn't going to have a clue what "synchronic" means. I remember when I was at uni, a friend was studying a unit on Chinese translation and we came across the words "synchronic" and "diachronic" in one of the readings. Neither of us knew what they meant - and we both had undergraduate degrees (one in arts, one in sciences)! At a stretch, we could work on improving synchronic in the Glossary. But let's not make our readers work any harder than they have to. Most will just give up. This, that and the other (talk) 02:37, 9 February 2024 (UTC)
@This, that and the other I get your point although at the same time, I'm concerned that using something vague like "equivalent to" will lead to the same mess we currently have with "surface analysis": No one knows what it means and so it just leads to endless questions and speculation. At least "synchronic" has a precise meaning. Benwing2 (talk) 05:42, 9 February 2024 (UTC)
Would "Morphologically equivalent to" work for people? Vininn126 (talk) 08:15, 9 February 2024 (UTC)
In the last thread multiple people chimed in to make, in different ways, the same very important point that I'd initially missed. I should probably have made it at the beginning of this thread as well.
Take for instance boldly. Even if the combination is attested as far back as Old English (bealdlīċe), and across the centuries thereafter, that doesn't mean that our modern boldly is ipso facto its direct, linear survivor - in every case handed down from individual speaker to speaker gaplessly for over a thousand years. Inevitably it has been reinvented from the components bold and -ly, and older forms thereof, multiple if not countless times in the intervening period, and will continue to be so long as both components remain in active use. Speakers surely do not walk around with a massive stored archive of regular adverbs (rapidly, enviously, proudly, surreptitiously ad infinitum) individually and faithfully passed down from medieval English. What we actually store is -ly 'attaches to an adjective to form an adverb', and we go around merrily applying it to adjectives as we please.
All that is to say that the etymology shouldn't be phrased as if boldly is a fossil inherited from Old English that just so happens to be the equivalent of or comparable to bold and -ly: it is inseparable from them, a sort of forever-being-reborn child. After stewing on the point for some time I came up with the phrasing 'synchronically derivable from' to capture at least some of the significance of this. If there is a more effective way I'll be glad to hear it. Nicodene (talk) 12:23, 9 February 2024 (UTC)
Yes, of course reinnovation is part of this - how is this a response to the wording "Morphologically equivalent to"? Vininn126 (talk) 12:29, 9 February 2024 (UTC)
I have answered this, or tried to at any rate, in the last paragraph of that comment, essentially its first sentence. Sorry, I was in the middle of editing when you replied. Nicodene (talk) 12:39, 9 February 2024 (UTC)
I've a bit more faith in the reader than that! Surely they could click the glossary link. Even if they don't, one exotic word won't stop everything after it from making sense. If I were to stumble across the entry for virology and find a description like 'glabuciously derivable from virus + -ology', I can't imagine walking away without a clue as to the origin of virology.
In point of fact I've been reading etymologies with "surface analysis" for years here and the phrase still doesn't make sense to me. Or really to anyone else, judging by the foregoing discussions.
If it helps, here is a mock-up glossary entry for the proposed template:
  • Synchronic derivation: the process of making a word out of other words, suffixes, etc. that current speakers know, use, and can combine in that way. The modern English bluish is synchronically derivable because English speakers are aware of and use the separate adjective blue and because they also freely add -ish to various adjectives to diminish their sense. In case there is any doubt, this can be tested by coming up with an entirely unattested combination and seeing whether using it in a sentence makes sense. At the time this was written there were zero Google results for 'Kyrgyzstani-ish', yet the writer judged the following sentence acceptable: 'The cat's owner sounds Kyrgyzstani-ish, but that there's an Uzbek cat if I've ever seen one'.
Nicodene (talk) 07:49, 9 February 2024 (UTC)
I'm for "synchronically, X + Y" or "morphologically, X + Y". PUC – 12:44, 9 February 2024 (UTC)
I agree with this conclusion. Kiril kovachev (talk · contribs) 16:33, 9 February 2024 (UTC)
Just spitballing, but if we accept the idea that "boldly" exists not solely because it was handed down from generation (Old English bealdlīċe) to generation (Middle English baldeliche, boldeliche, boldely) to generation (modern boldly) the way e.g. bold itself was, but because speakers productively use -ly, then what if we simply omit any sort of "surface analysis..." or "synchronically..." or other framing, and just start every etymology section for which a synchronic etymology is applicable with that etymology? That is, instead of "From Middle English boldeliche [...] equivalent to bold + -ly", boldly would instead just say "{{af|en|bold|-ly}}. From Middle English boldeliche...". (Cases where a synchronic etymology is not obviously applicable would not get {{af}}s, i.e. we would not say bold itself was "bell + -t" or whatever the reflexes of the parts it was composed of back in Pre-Germanic/PIE are, just as in my opinion such things should not currently be getting "Surface analysis..."es either. I'm not sure how best to handle things like linking husband to house and bond: perhaps we keep its vague "equivalent to", or perhaps we replace it with something like "The first element is cognate to house, the second element is cognate to bond"? Or "...corresponds to..."?) - -sche (discuss) 01:52, 11 February 2024 (UTC)
We don't provide an etymology deriving the English plural friends from the Middle English frendes, do we? Nor singing from singynge, nor angered from angred. It'd be mad. Friends is synchronically, and trivially, the plural of friend, singing is the gerund or present participle of sing, and angered the past participle of anger. Plain and simple.
So, I ask, what is stopping us from treating boldly as the adverb of bold? Do our readers need to know that hundreds, perhaps thousands, of regular -ly adverbs are also attested in older stages of English any more than they need to know about all the older attested -s plurals or verb forms with -ing? Of course not. To see the latter they can simply visit the Old or Middle English lemma, and that is, logically, how it should be for -ly adverbs as well.
I understand that boldly differs in part-of-speech from bold, but I don't think that is sufficient reason to have the site cluttered with ever-growing numbers of trivial etymologies for regular adverbs with -ly, or in Romance languages -ment(e), or in Georgian -ად, and so on ad infinitum.
We should seriously consider the possibility of delemmatising that which is trivially derivable on the synchronic level in all ways - that is, whenever the result is entirely predictable in pronunciation, meaning, and form. For example this could involve a definition-line template that reads regular adverb of [X], possibly linking to the suffix used to make it, and such entries will be left without etymologies that are, in the end, entirely unnecessary. Nicodene (talk) 02:07, 12 February 2024 (UTC)
I disagree with delemmatizing adverbs like this; yes they are largely predictable in form and meaning but not always, and it would be confusing as well as hard to draw the line when it comes to doing that. Benwing2 (talk) 02:15, 12 February 2024 (UTC)
And that is where the use of a definition-line template will come in handy. Truthfully, for instance, is 1. regular adverb of truthful; 2. (sentence adverb) to tell the truth; and so on. As for words unpredictable in form - that is already a problem, is it not? If all an etymology states is 'X+Y', but X and Y do not actually make Z, then the etymology is simply incorrect - or more charitably speaking, it is incomplete. The only reason the problem hasn't been apparent till now is that it's been hiding behind the vagueness of 'equivalent to', 'by surface analysis', and other nebulous phrasings. Nicodene (talk) 02:35, 12 February 2024 (UTC)
@Nicodene By delemmatizing I assume you mean either deleting the words or converting them to non-lemma forms, which I disagree with. If you are just referring to templatizing the definition, that's a different story, although I would argue that regular adverb of foo should be replaced with "in a foo manner", which is clearer. Benwing2 (talk) 02:43, 12 February 2024 (UTC)
You'll have to forgive my not looking up Wiktionary's particular definition of 'lemma'.
What I mean is de-etymologising the adverbs in question and templatising their synchronically-derivable definition - which will be the only one in ~99% of cases. The phrasing I gave for the template was just a place-holder; I do find 'in an [X] manner' much more user-friendly.
ETA: That said, the latter wouldn't work for non-English languages. Nicodene (talk) 03:06, 12 February 2024 (UTC)
I'm not a fan of this approach. While I understand that forms can be re-innovated, that also doesn't exclude them from being inherited. Vininn126 (talk) 08:51, 12 February 2024 (UTC)
Well then it should be good news that this approach doesn't exclude inheritance at all, unlike the previous one. Nicodene (talk) 09:09, 12 February 2024 (UTC)

FYI: Updates from Unicode

https://mailchi.mp/7311676d715c/testing-rickys-template-6269058 Justin (koavf)TCM 16:41, 7 February 2024 (UTC)

Is anyone going to tell Ricky that his email template works, and has in fact been working for several months?... This, that and the other (talk) 02:39, 9 February 2024 (UTC)

Setting Foreign Word of the Day

Is anybody eager to take over the task of setting Foreign Words of the Day? Then please reply below and we can work out a date from which you will assume full responsibility over it. You have to be a user in good standing with at least a modest amount of experience and an okay track record of not submitting questionable nominations (read: not having nominated offensive, vulgar or otherwise potentially controversial terms). I will explain the workings of the template to you and give you hints about certain best practices. I shall remain available for questions for some time as well. Be advised that you will often have to resort to finding or preparing suitable words by yourself.

Note that I am not insisting that someone else take over, nor am I burnt out, but I thought that the time may be right for some new blood. If it turns out to be too stressful, I remain available to reassume the task. ←₰-→ Lingo Bingo Dingo (talk) 20:25, 8 February 2024 (UTC)

We could just discontinue the FWOTD project. Demonicallt (talk) 20:40, 16 February 2024 (UTC)
Or just decide to only feature LDLs ;) Thadh (talk) 18:54, 17 February 2024 (UTC)
If you offer to set or provide them... :) ←₰-→ Lingo Bingo Dingo (talk) 23:36, 18 February 2024 (UTC)
Category:Ingrian terms with quotations is ripe for exploitation, but other than that I'm afraid I don't have enough time or enough lack of laziness to set a FWOTD every day. Thadh (talk) 23:43, 18 February 2024 (UTC)
We've also abandoned Wiktionary:Translations of the week and Wiktionary:Collaboration of the week. Demonicallt (talk) 18:47, 17 February 2024 (UTC)

alternative forms and alternative spellings

Recently, someone asked what the difference was. While looking for examples to illustrate the supposed difference (previously explained to me as: alt. spellings if pronounced the same, alt. forms when pronunciation, or sometimes etymology, differs), I noticed many/most alt forms should (by that metric) be alt spellings. There was also discussion a while ago about how to indicate when the pronunciation is the same as the lemma vs hasn't been added yet (since in most entries that don't have pronunciations, it just hasn't been added yet).

  1. Do we agree on "pronounced the same" vs "pronounced differently" as the difference between these?
  2. If so, can we make {{alt spell}} spell out "alternative spelling of foo (pronounced the same)" or something? (For want of this, I have seen some entries use, and I have used, a pronunciation section that just contains "like foo" ... because duplicating the entire pronunciation section in a way that falls out of sync when people update one entry and not the other is bad, but having no indication about the pronunciation is indistinguishable from it just not having been added yet.)
  3. Should we go through existing {{alternative spelling of}}s and {{alternative form of}}s and make sure they're matching that distinction? I can't do all 94,000+, but I'll go through some with AWB and other people could please pitch in too, if people actually want these templates to be separate and distinct...

- -sche (discuss) 01:20, 11 February 2024 (UTC)

@-sche Yes, that is my understanding of the difference. I can make the change in (2) if others agree, and as for (3), yes we should fix this but it's a long-term project. Benwing2 (talk) 02:12, 11 February 2024 (UTC)
I feel like certain instances of "alt form" being used wrongly can be detected by a simple bot program. As an example, there could be a bot program that sees that a certain spelling is defined as an "alt form of" when there is nothing but non-letter chars that separate it from another spelling. (As in fo'o-bar and foo bar.) CitationsFreak (talk) 02:26, 11 February 2024 (UTC)
@CitationsFreak For English, yes, but I wouldn't trust that more generally. Benwing2 (talk) 02:45, 11 February 2024 (UTC)
@User:Benwing Of course, of course! Still the general principle applies. CitationsFreak (talk) 03:05, 11 February 2024 (UTC)
I'm not sure I would assume "fo'o" vs "foo" to be pronounced the same even in English, but let's spot-check some examples and find out. I do think that (for English) a bot fixing all the instances that differ only in hyphenation would be great, and while a bot changing entries that differ only in spacing ("bat man" vs "batman") to "alternative spelling" might create a handful of errors (where the stress actually differs), based on my own spot-checking it seems like it would be a net positive by a wide margin because far more cases are currently wrong in the other direction (listed as "alternative forms" but not different in pronunciation). - -sche (discuss) 07:27, 11 February 2024 (UTC)
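The bot check CitationsFreak describes could be sketched in Python as follows. This is a hypothetical illustration, not an existing Wiktionary bot. It compares two titles after dropping every character whose Unicode category is not a letter, and only proposes candidates for human review. As noted above, the check is only sensible for English, and pairs such as bat man vs. batman can still differ in stress.

```python
import unicodedata


def letters_only(term: str) -> str:
    """Keep only characters whose Unicode general category is a letter (L*).
    Precomposed accented letters (à) are kept; separate combining marks
    (category Mn) would be dropped."""
    return "".join(ch for ch in term if unicodedata.category(ch).startswith("L"))


def differs_only_in_non_letters(alt: str, target: str) -> bool:
    """True if two distinct titles become identical once hyphens, spaces,
    apostrophes and other non-letters are removed (case is still compared)."""
    return alt != target and letters_only(alt) == letters_only(target)


# Hypothetical pairs; each hit is only a candidate for human review.
pairs = [("fo'o-bar", "foo bar"), ("bat man", "batman"), ("stanch", "staunch")]
candidates = [p for p in pairs if differs_only_in_non_letters(*p)]
```

Capitalization is deliberately left alone here, since the thread treats pairs such as big brother vs. Big Brother as a separate case.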
@-sche I downloaded the latest dump and fetched all occurrences of {{alternative form of}} or one of its aliases that refer to English. There are 57,005 of them. Looking through them, in addition to your suggestions I think we should also bot-change entries that differ only in capitalization, e.g. big brother vs. Big Brother, catch some Z's vs. catch some z's and cases that differ in a combination of capitalization, spacing and/or hyphenation, e.g. Middle-earth vs. Middle Earth or subsaharan vs. sub-Saharan. Also words that differ in -ise vs. -ize or -isation vs. -ization. Possibly also words differing in -ie vs. -y (leftie vs. lefty, forecaddy vs. forecaddie). Probably also differences in accent vs. no accent (a la carte vs. à la carte, chacun a son gout vs. chacun à son goût, épaulière vs. epauliere; although there are several like banishèd vs. banished where it may or may not count, depending on the intent, which is sometimes spelled out using a label poetic or poetry or even an explicit note "used to specify a disyllabic pronunciation"). In terms of apostrophes, there are cases like Bahá'í vs. Baháʼí with straight vs. curly apostrophe, and also cases like you's vs. yous, 'hood vs. hood, Shi'ite vs. Shiite, St. Paul's vs. St Pauls [here we would have to ignore the difference in punctuation, which seems a good idea to me], Tai-p'ing vs. Taiping that clearly are the same pronunciation; things like Hallowe'en vs. Halloween, Ba'ath vs. Baath, Guy Fawkes' Day vs. Guy Fawkes Day that probably represent the same pronunciation; and things like a'ight vs. aight, our'n vs. ourn that might represent the same pronunciation. There are also differing placements of apostrophes like horses' doovers vs. horse's doovers. (BTW there are some funny things, e.g. 38 alternative spellings of Gaddafi specified using {{alternative form of}}, of which 14 are labeled uncommon and 3 very rare. 
There was actually an SNL skit about this, listing a zillion alternative "spellings" including, I think, Chicago and Chewbacca.) Benwing2 (talk) 08:33, 11 February 2024 (UTC)
@-sche OK, I wrote a script to analyze the 57,005 cases to see how many could be converted:
39624 not same as to-page, can't convert to {{alt spell}}
6227 same as to-page with hyphens removed
4860 same as to-page with spaces removed
3066 same as to-page with hyphens converted to spaces
1569 same as to-page with capitalization ignored
 723 same as to-page with full canonicalization applied
 585 same as to-page with accents removed
 346 same as to-page with apostrophes removed
   5 same as to-page
This amounts to 17,381 cases, or about 30% of them. Here, "full canonicalization" means hyphens, spaces, apostrophes and accents all removed and capitalization ignored (i.e. more than one transformation was required to make the from-page and to-page the same). There are many more that are actually alt-spellings, but this is a good start. Benwing2 (talk) 09:21, 11 February 2024 (UTC)
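The script itself is not posted in the thread, so the following is only a hypothetical Python reconstruction of the kind of classification described. It first tries each single transformation on both titles, then "full canonicalization" (hyphens, spaces, apostrophes and accents removed, capitalization ignored). The order of the tries, the set of apostrophe characters and all names are assumptions.

```python
import unicodedata
from collections import Counter

APOSTROPHES = "'\u2019\u02bc"  # straight, right single quote, modifier letter apostrophe


def strip_accents(s: str) -> str:
    """Drop combining diacritics (à -> a). Letters with no canonical
    decomposition, such as ł, are left unchanged."""
    decomposed = unicodedata.normalize("NFD", s)
    kept = "".join(c for c in decomposed if not unicodedata.combining(c))
    return unicodedata.normalize("NFC", kept)


def drop_apostrophes(s: str) -> str:
    return "".join(c for c in s if c not in APOSTROPHES)


# Single transformations, tried in order; labels mirror the counts above.
TRANSFORMS = [
    ("hyphens removed", lambda s: s.replace("-", "")),
    ("spaces removed", lambda s: s.replace(" ", "")),
    ("hyphens converted to spaces", lambda s: s.replace("-", " ")),
    ("capitalization ignored", str.lower),
    ("accents removed", strip_accents),
    ("apostrophes removed", drop_apostrophes),
]


def full_canonical(s: str) -> str:
    """All of the above at once: accents, case, apostrophes, hyphens, spaces."""
    s = drop_apostrophes(strip_accents(s).lower())
    return s.replace("-", "").replace(" ", "")


def classify(alt: str, target: str) -> str:
    """Label an (alternative form, to-page) pair by the first rule that equates them."""
    if alt == target:
        return "same as to-page"
    for label, fn in TRANSFORMS:
        if fn(alt) == fn(target):
            return "same as to-page with " + label
    if full_canonical(alt) == full_canonical(target):
        return "same as to-page with full canonicalization applied"
    return "not same as to-page"


pairs = [("day-long", "day long"), ("Middle-earth", "Middle Earth"),
         ("à la carte", "a la carte"), ("stanch", "staunch")]
print(Counter(classify(a, t) for a, t in pairs))
```

Because the single transformations are tried first, "full canonicalization" only counts pairs that need more than one change, such as Middle-earth vs. Middle Earth. That is consistent with the definition given above.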
Update: With a few more canonicalizations, I got another 2,956 cases handled:
36668 not same as to-page, can't convert to {{alt spell}}
6243 same as to-page with hyphens removed
4870 same as to-page with spaces removed
3070 same as to-page with hyphens converted to spaces
1639 same as to-page with -ise/isation/isational/isable/isability -> same with -iz-
1573 same as to-page with capitalization ignored
 921 same as to-page with full canonicalization applied
 585 same as to-page with accents removed
 362 same as to-page with -ible/-eable/-ibility/-eability -> -able/-ability
 355 same as to-page with periods removed
 346 same as to-page with apostrophes removed
 325 same as to-page with -ie -> -y
  40 same as to-page with æ/œ -> e
   5 same as to-page
   3 same as to-page with ł -> l
Benwing2 (talk) 11:03, 11 February 2024 (UTC)
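The extra canonicalizations in this update are spelling rules rather than character deletions. The sketch below shows one hypothetical way to express a few of them as regular-expression rewrites applied to both titles. The patterns are guesses modelled on the labels above, not the actual script. Because each rule rewrites both titles, an over-eager pattern (e.g. -ise also catching premise) only matters if it creates a false match. The separate ł -> l rule is needed because ł has no Unicode decomposition, so accent stripping leaves it alone.

```python
import re
from typing import Optional

LIGATURES = str.maketrans({"æ": "e", "œ": "e", "Æ": "E", "Œ": "E"})
BARRED_L = str.maketrans({"ł": "l", "Ł": "L"})

# Hypothetical rule table: (label, rewrite applied to both titles).
RULES = [
    ("-ise/isation/isational/isable/isability -> same with -iz-",
     lambda s: re.sub(r"is(e|ation|ational|able|ability)\b", r"iz\1", s)),
    ("-ible/-eable/-ibility/-eability -> -able/-ability",
     lambda s: re.sub(r"(?:i|ea)b(le|ility)\b", r"ab\1", s)),
    ("periods removed", lambda s: s.replace(".", "")),
    ("-ie -> -y", lambda s: re.sub(r"ie\b", "y", s)),
    ("æ/œ -> e", lambda s: s.translate(LIGATURES)),
    ("ł -> l", lambda s: s.translate(BARRED_L)),
]


def spelling_rule(alt: str, target: str) -> Optional[str]:
    """Return the label of the first rule under which the two titles coincide."""
    if alt == target:
        return "same as to-page"
    for label, fn in RULES:
        if fn(alt) == fn(target):
            return label
    return None
```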
I think the spellings of Gaddafi with 'z' actually have /z/ in the pronunciation, and I wouldn't trust 'q' not to induce a pronunciation with /k/. --RichardW57 (talk) 13:48, 11 February 2024 (UTC)

-is- vs -iz- are supposed to use Template:standard spelling of rather than "alternative spelling of", with from=British form (strictly speaking, if being very precise, a combination of from=non-Oxford + labels for other relevant countries) for -is-, or from=American form|from2=Oxford for -iz- (+ other relevant countries if we're being precise). (Possibly we should just create labels like -is- form and -iz- form so we can just change what the labels display whenever we want to change the lists of countries, without having to update entries like we do at present.) Indeed, we should even be changing (or at least looking at, and double-checking whether or not we should be changing) cases that currently use either of "alternative spelling" or "alternative form", to use "standard spelling of".
Cases like big brother vs. Big Brother are meant to use Template:alternative case form of. Cases like catch some Z's vs. catch some z's are debatable; we could say the capitalization of Z doesn't seem to be indicating anything like greater specificity / proper-noun-ization the way it does with Big Brother or Native and so we could say catch Z's is a mere alternative spelling ... but we could also just make it an Template:alternative case form of too for consistency, and that is probably the easier thing to bot-ify.
Given the low numbers, I think we can just review and do the "5 same as to-page, 3 same as to-page with ł -> l" by hand, heh.
I think cases where the difference is an accent in a word-final -éd or -èd should be left using "alt form" rather than "alt spelling", or should be examined by humans, as that difference fairly consistently seems to correspond to "the unaccented form can be pronounced either with -ed not being a separate syllable or with it being a separate syllable, whereas the accented form is always pronounced as a separate syllable".
For cases where the difference is plural vs singular possessive, horses' doovers vs. horse's doovers (BTW, if anyone is thinking "why is there a space between the horse and the possessive there?", it's because T:' adds padding), my understanding is that various standard accents distinguish princesses' with /ɪz/ vs princess's with /əz/, so it seems like a good idea to make a list of those for a human to review, if there are only 346, although some may then be bottable because I think no accent distinguishes e.g. drivers' vs driver's. - -sche (discuss) 15:22, 11 February 2024 (UTC)

@-sche Thanks. Some questions:
  1. What about cases like -isable vs. -izeable where there's both an -ise/-ize distinction and an extra e in one of them?
  2. What about cases like Middle-earth vs. Middle Earth or antisemitism vs. anti-Semitism where there's both a capitalization and hyphenation difference?
  3. Which accents make a difference between princesses' and princess's? Are these RP accents? My more-or-less GA speech makes no difference (although I do differentiate Rosa's from roses).
BTW it should be possible to have a bot pull out and flag cases with word-final -éd and -èd, as well as cases with apostrophes involving words ending in -s, -se or -ce (should I also flag cases with -x(e), -ch(e), -sh(e)?). Benwing2 (talk) 20:37, 11 February 2024 (UTC)
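A possible shape for that flagging step is sketched below in Python. It is hypothetical: the regexes and reason labels are assumptions, not the actual script. It flags a pair when either title has a word-final accented -ed. It also flags a pair when either title has a sibilant-final stem (-s, -se, -ce, -x(e), -ch(e), -sh(e)) followed by 's, which is where, per the discussion above, princess's vs. princesses' may differ in the unstressed vowel in some accents. A pair like driver's vs. drivers' is not flagged.

```python
import re

# Word-final -èd/-éd: the accent usually marks a separately pronounced syllable.
ACCENTED_ED = re.compile(r"[èé]d\b")
# Sibilant-final stem + 's (horse's, princess's, fox's, church's, bush's).
SIBILANT_POSSESSIVE = re.compile(r"(?:s|se|ce|xe?|che?|she?)['\u2019]s\b", re.IGNORECASE)


def manual_check_reasons(alt: str, target: str) -> list:
    """Return the reasons (possibly none) why a pair should go to a human."""
    reasons = []
    if any(ACCENTED_ED.search(t) for t in (alt, target)):
        reasons.append("accented -ed")
    if any(SIBILANT_POSSESSIVE.search(t) for t in (alt, target)):
        reasons.append("possible pronounced schwa")
    return reasons
```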
Here are the only cases that my script flagged (horse's doovers was the only case with apostrophes):
Page 130420 blessèd: WARNING: Saw from-page 'blessed' same as to-page with accents removed and needs manual checking for accented -ed
Page 1117859 horses' doovers: WARNING: Saw from-page 'horse's doovers' same as to-page with apostrophes removed and needs manual checking for pronounced schwa
Page 2253669 chassed: WARNING: Saw from-page 'chasséd' same as to-page with accents removed and needs manual checking for accented -ed
Page 2716069 lovèd: WARNING: Saw from-page 'loved' same as to-page with accents removed and needs manual checking for accented -ed
Page 2747107 belovèd: WARNING: Saw from-page 'beloved' same as to-page with accents removed and needs manual checking for accented -ed
Page 2747107 belovèd: WARNING: Saw from-page 'beloved' same as to-page with accents removed and needs manual checking for accented -ed
Page 3151227 cursèd: WARNING: Saw from-page 'cursed' same as to-page with accents removed and needs manual checking for accented -ed
Page 3151233 banishèd: WARNING: Saw from-page 'banished' same as to-page with accents removed and needs manual checking for accented -ed
Page 3184364 semi-auto: WARNING: Saw from-page 'semi-auto' same as to-page
Page 3184364 semi-auto: WARNING: Saw from-page 'semi-auto' same as to-page
Page 3200801 case shot: WARNING: Saw from-page 'case shot' same as to-page
Page 6354565 struntish: WARNING: Saw from-page 'struntish' same as to-page
Page 8666150 peakèd: WARNING: Saw from-page 'peaked' same as to-page with accents removed and needs manual checking for accented -ed
Page 9046577 day-long: WARNING: Saw from-page 'day-long' same as to-page
Benwing2 (talk) 21:15, 11 February 2024 (UTC)

I will say that I have always been 100% uncomfortable and unsatisfied with the dividing line between alternative spellings (most closely allied), alternative forms (basically the same), and synonyms (seriously etymologically divergent). And then there's doublets. The theory of what these categories mean is one thing, but treating them as mutually exclusive and finding that dividing line can become absurd at the edges. I am 100% open to changing "alternative form" entries to something else- either "alternative spellings" or "synonyms" or whatever else.
Are "big brother" and "Big Brother" alternative "spellings" or alternative "forms"? Under the doctrine that "alternative spellings are pronounced the same" one might say they are alternative spellings. But the problem is that these two entries are spelled the same way! It's somewhat counterintuitive to call that pair alternative spellings. Capitalization is not a spelling change. Yes, the Spelling Bee does require you to indicate a capitalization [3], but I feel like the spelling of a word is the letters themselves and not the case of the letters. Hence this is an "alternative" "form" rather than an "alternative" "spelling". A word can be "spelled correctly but capitalized incorrectly" in my understanding. --Geographyinitiative (talk) 00:46, 12 February 2024 (UTC)

As sche said, cases like big brother and Big Brother should use "Template:alternative case form of" rather than either the alternative spelling or alternative form templates.
In regard to apostrophes, although many apostropheless forms with the same pronunciation are just alternative spellings, in some cases there is a difference in formal interpretation: e.g. lady's man and ladies' man, despite being pronounced the same and meaning the same thing, are arguably different in more than just spelling, since the first uses the genitive singular form and the second uses the genitive plural form. They (in theory at least) represent different interpretations of how the word is formed. There is also ladies man, which we now label as an 'eggcorn': while I can sort of see where this is coming from, I don't think that's the most useful way to describe why the apostrophe is missing in this case. "eggcorn" suggests to me reinterpretations between etymologically unrelated words, whereas the genitive plural "ladies'" and plain plural "ladies" are obviously etymologically extremely related.--Urszag (talk) 01:36, 12 February 2024 (UTC)
@Urszag I would say, in theory, yes there is a difference between lady's man and ladies' man but it seems too theoretical to worry about. Cf. traveler's diarrhea vs. travelers' diarrhea. I would say to simplify things, we should just use 'alternative spelling of' whenever the pronunciation is the same and the etymologies are, if not precisely the same, very closely related. I agree with you, however, that ladies man is not an eggcorn (which I would reserve for sparrowgrass vs. asparagus and eggcorn vs. acorn and such), but just a case of lazy spelling (it is very common, for example, for place names to start out with apostrophes in them which get dropped over time). BTW User:-sche I created English-specific labels for ise-form and ize-form, which you can see in action at User:Benwing2/test-standard-spelling. Benwing2 (talk) 02:01, 12 February 2024 (UTC)
Just FYI I expanded the above analysis to cover many more cases and reran it on cases that used {{alternative spelling of}} or one of its aliases, producing the following:
7149 not same as to-page, can't convert to {{alt spell}}
2950 same as to-page with hyphens removed
2344 same as to-page with spaces removed
1766 same as to-page with hyphens converted to spaces
1267 same as to-page with -ise/-iser/-ises/-ised/-is(e)ing/-is(e)ation(al)/is(e)able/is(e)ability -> same with -iz-
 759 same as to-page with æ/œ/ae/oe -> e
 671 same as to-page with accents removed
 356 same as to-page with full canonicalization applied
 349 same as to-page with capitalization ignored
 247 same as to-page with ph -> f
 185 same as to-page with -ie -> -y
 160 same as to-page with -our -> -or
 141 same as to-page with -ey -> -y
 137 same as to-page with -ible/-eable/-ibility/-eability -> -able/-ability
 128 same as to-page with apostrophes removed
 119 same as to-page with -re -> -er
 103 same as to-page with -ll/-lled/-ller/-lling -> same with -l-
  74 same as to-page with periods removed
  60 same as to-page with -or -> -er
  39 same as to-page with grey -> gray
  27 same as to-page with -yse/-yser/-yses/-ysed/-ys(e)ing/-ys(e)ation(al)/ys(e)able/ys(e)ability -> same with -yz-
  25 same as to-page with plough -> plow
  23 same as to-page with -gue -> -g
   8 same as to-page with accents removed and needs manual checking for accented -ed
   4 same as to-page with -iseing/-iseation(al)/iseable/iseability -> same with -iz- and omit extra -e-
   1 same as to-page with -yseing/-yseation(al)/yseable/yseability -> same with -yz- and omit extra -e-
   1 same as to-page

Many of those that are in the "not same as" category do not have the same pronunciation and should be moved to {{alt form}}, e.g. half-arse vs. half-ass, stanch vs. staunch, alternate energy vs. alternative energy, hylogeny vs. hylogenesis, palmate compound vs. palmately compound, aluminum hydride vs. aluminium hydride, raddleman vs. ruddleman, superturbocharger vs. turbosupercharger, wazzup vs. wassup, shunyata vs. sunyata, faclempt vs. verklempt, neurone vs. neuron, Moslemize vs. Muslimize etc. Benwing2 (talk) 04:01, 12 February 2024 (UTC)

Also there are quite a lot of misspellings tagged as "alternative spellings" such as kotower vs. kowtower, idiosyncracy vs. idiosyncrasy, guerilla vs. guerrilla, propadeutic vs. propaedeutic, incomunicado vs. incommunicado, dicotyledenous vs. dicotyledonous, etc. etc. Benwing2 (talk) 04:39, 12 February 2024 (UTC)
Rerunning the above expanded analysis on the {{alt form}} uses produces the following:
33076 not same as to-page, can't convert to {{alt spell}}
6243 same as to-page with hyphens removed
4870 same as to-page with spaces removed
3070 same as to-page with hyphens converted to spaces
2028 same as to-page with -ise/-iser/-ises/-ised/-is(e)ing/-is(e)ation(al)/is(e)able/is(e)ability -> same with -iz-
1707 same as to-page with æ/œ/ae/oe -> e
1573 same as to-page with capitalization ignored
1066 same as to-page with full canonicalization applied
 577 same as to-page with accents removed
 363 same as to-page with -ible/-eable/-ibility/-eability -> -able/-ability
 355 same as to-page with periods removed
 345 same as to-page with apostrophes removed
 325 same as to-page with -ie -> -y
 251 same as to-page with -ll/-lled/-ller/-lling -> same with -l-
 233 same as to-page with ph -> f
 215 same as to-page with -ey -> -y
 184 same as to-page with -our -> -or
 169 same as to-page with -or -> -er
 163 same as to-page with -re -> -er
  54 same as to-page with -yse/-yser/-yses/-ysed/-ys(e)ing/-ys(e)ation(al)/ys(e)able/ys(e)ability -> same with -yz-
  50 same as to-page with grey -> gray
  43 same as to-page with -gue -> -g
  21 same as to-page with plough -> plow
   8 same as to-page with accents removed and needs manual checking for accented -ed
   7 same as to-page with -iseing/-iseation(al)/iseable/iseability -> same with -iz- and omit extra -e-
   5 same as to-page
   3 same as to-page with ł -> l
   1 same as to-page with apostrophes removed and needs manual checking for pronounced schwa
Benwing2 (talk) 04:53, 12 February 2024 (UTC)
Re -isable vs. -izeable, I think forms without e are more standard/common(?), at least for the examples I could think of to check; iff this is right, we could bot-normalize accordingly. I.e., if foobariseable, foobarisable and foobarizeable are all currently listed as {{alternative form of}}s of foobarizable, we could recast foobariseable as an {{alternative spelling of}} (strictly speaking of foobarisable, but as this requires a user to click through two entries to reach the definition, maybe we just say of foobarizable, or double the templates, as I have sometimes seen people do: {{alt spelling of|en|foobarisable}}: {{alt spelling of|en|foobarizable|nocap=1}}), and recast foobarisable as a {{standard spelling of}} foobarizable, and foobarizeable as an {{alternative spelling of}} foobarizable.
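The recasting scheme in the preceding paragraph is a small lookup keyed on the stem. The Python helper below is hypothetical and returns a (template name, target) suggestion rather than wikitext, so it makes no assumption about the templates' exact parameters. It follows the "strictly speaking" option, pointing foobariseable at foobarisable rather than at the lemma.

```python
from typing import Optional, Tuple


def recast_able_variant(variant: str, lemma: str) -> Optional[Tuple[str, str]]:
    """Suggest (template name, target) for a variant of an -izable lemma,
    following the proposed scheme. Hypothetical helper for illustration."""
    if not lemma.endswith("izable"):
        return None
    stem = lemma[: -len("izable")]
    scheme = {
        stem + "iseable": ("alternative spelling of", stem + "isable"),
        stem + "isable": ("standard spelling of", lemma),
        stem + "izeable": ("alternative spelling of", lemma),
    }
    return scheme.get(variant)
```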
Re Middle-earth vs. Middle Earth, there's not really a great way to handle those AFAIK; we might just say they're alternative spellings and reserve "alternative case form of" for cases where only capitalization differs, but I'd like to know if User:Urszag or other users of "alternative case form of" (besides myself) have any better ideas.
Re stanch vs. staunch: excellent, it would be great if we could also clean up entries in that direction (things given as "alt spellings" that are actually alt forms); and I think that if we modify the "alt spelling" template to make our intended distinction more explicit by adding some text about how it's an alternative spelling "with the same pronunciation" or "(pronounced the same)" or whatever wording we decide is clearest — and who knows, maybe modify the "alt form" template to also make its own intended distinction explicit by saying "alternative form of ... (pronounced differently)" or whatever we think is clearest — that will help users maintain the distinction in the future. Re kotower, ugh; if we can fix such things, great. Re which accents distinguish princesses' vs princess's, AFAIK it's any accent that doesn't have the weak vowel merger yet, including older RP / Standard Southern English etc; in an ideal world I'd love to use {{alt form of}} on those and give them all pronunciation sections showing which accents distinguish vs merge them, but in the interim, if we're changing {{alt spelling of}} and {{alt form of}} to indicate "pronounced the same" vs "pronounced differently", maybe we should allow that text to be suppressed by parameter, and then suppress it on these entries (and eventually give them all pronunciation sections). - -sche (discuss) 16:04, 12 February 2024 (UTC)
I think the plural suffix and possessive clitic in princesses' and princess's are homophones in old RP, both with /ɪ/. What is distinct from both of those is words in -a with a plural suffix or possessive clitic (-as and -a's), with /ə/. — Eru·tuon 17:57, 12 February 2024 (UTC)

κινούβοιλα (Dacian)

I ran into this while doing cleanup in Category:Entries with incorrect language header by language. It raises some interesting technical and theoretical issues, so I'm bringing it here. There's some discussion on the talk page, but I'm not sure I agree with the outcome as seen in the entry itself.

The word in question is mentioned as the Dacian word for bryony in De Materia medica by Pedanius Dioscorides, a Greek physician in the Roman era. Not only is this work a priceless treasure for those studying ancient knowledge and usage of medicinal plants, etc., but he also gives names in other languages of the day. In this case, he writes:

βρυωνία λευκή· οἱ δὲ μάδον, οἱ δὲ ἄμπελος λευκή, οἱ δὲ ψίλωθρον, οἱ δὲ μήλωθρον, οἱ δὲ ὄφιος σταφυλή, οἱ δὲ ἀρχέζωστιν, οἱ δὲ κέδρωστιν, Αἰγύπτιοι χαλαλαμόν, Ῥωμαῖοι νότιαμ, οἱ δὲ ἕρβα κοριάρια, οἱ δὲ κουκούρβιτα ἠρράτικα, Δάκοι κινούβοιλα, Σύροι αλλαβιάρια.

My Ancient Greek is mediocre, at best, but I believe βρυωνία λευκή (bruōnía leukḗ, white bryony) is the main name (the lemma, we would call it) and all of the parts starting with οἱ δὲ are other Ancient Greek names for the same plant. Then there are the plural forms of nationalities such as Αἰγύπτιοι (Aigúptioi, Egyptians), Ῥωμαῖοι (Rhōmaîoi, Romans), Σύροι (Súroi, Syrians) and Δάκοι (Dákoi, Dacians), which are obviously a brief way of saying "the Egyptians call it...", etc.

So, anyway, Δάκοι κινούβοιλα refers to κινούβοιλα as the Dacian name for the plant. This seems to be the only attestation of the word. Note that it's an Ancient Greek spelling occurring in an Ancient Greek text, but it's obviously intended to be a representation of the Dacian word as Dacian. No one will mistake this for IPA, but it's all we've got.

The reason I'm bringing this here is the way our entry handles the bilingual aspects of this: it's under a Dacian language header, but the headword line consists of:

{{cln|xdc|lemmas|nouns|feminine nouns}}{{head|grc|noun|g=f}}

So, basically, the headword template is Ancient Greek, but the {{cln|xdc|lemmas|nouns|feminine nouns}} adds all the categories that a Dacian headword template would. As an aside: κινούβοιλα looks morphologically like an Ancient Greek first-declension noun, and therefore probably feminine, but how do we know that it would be feminine as a Dacian word?

Thus we have this categorized as both Ancient Greek and Dacian (hence its presence in Category:Ancient Greek entries with incorrect language header). In spite of it occurring in running Ancient Greek text in Ancient Greek script, I don't see why we should treat it as Ancient Greek. Here in the United States, there are a number of extinct languages that are only attested in English, French and Spanish texts where someone lists some words in the local indigenous language. There are similar issues with languages like Old Prussian in Europe, and with many other languages in Africa, Asia, and Australia, not to mention throughout the rest of the Americas. In addition, there are all the LDLs where we're getting all of our data from reference works that aren't in the languages in question.

Another issue I was thinking about: we generally don't do Reconstruction entries for attested terms, but this is a gray area. I could see having a reconstruction of the original term that's imperfectly represented by the form that's actually attested (just a thought).

Also, it might be interesting to look at the influence of the "host" language for this kind of secondhand attestation. Every language has its blind spots: English speakers have trouble with true voicing distinctions (what we tend to call voicing distinctions are a combination of aspiration and length of preceding vowels), and speakers of most Romance languages have trouble hearing aspiration. Pretty much all speakers of European languages have trouble with tones. And so on, and so on. I wonder if it would be worth the trouble of developing categories for types of attestation beyond the LDL/WDL divide.

Thank you for reading through all of this. Partly I'm just thinking out loud, so I hope this will inspire more discussion to make something actionable out of it, and that it will be worth the time you spent on it. Chuck Entz (talk) 02:00, 12 February 2024 (UTC)

@Chuck Entz Not quite sure what to make of this specific case but what you might call phonological reconstruction of an ancient language attested in an imperfect spelling system is quite common, cf. Old Chinese, Mycenaean Greek, Old and Middle Persian, etc. In those cases we tend to enter the term in its original spelling and try to reconstruct the pronunciation as best we can in the Pronunciation section. So maybe you are right that these should be considered Dacian terms written imperfectly in Greek letters. I think this comes up also in Xiongnu terms written in Old Chinese texts, where there was a debate I think in WT:RFM whether to consider these as Xiongnu or Old Chinese. Cf. User:Theknightwho, User:Thadh, User:-sche who I think participated in that debate. Benwing2 (talk) 02:12, 12 February 2024 (UTC)
@Benwing2 @Chuck Entz So the issue with Xiongnu is slightly different. We have a handful of Old Chinese transcriptions that we're pretty sure are names (or maybe titles), which could go either way. The thing is, the Xiongnu weren't illiterate: we know they had written records, but the conventional wisdom is that they're in Old Chinese. However, after some recent discoveries, Vovin argued (quite convincingly) that it's likely an analogous situation to Japan and Korea, where Chinese characters were used for their semantic values, but the texts themselves are in Xiongnu. Or at least, some of them probably are.
I think a better comparison is maybe Jie, where we have a single sentence transcribed in an Early Middle Chinese source which explicitly describes it as the language of the Jie tribe (秀支替戾岡,僕穀劬禿當). Or possibly even Bala, which is probably extinct, but is attested only in a Chinese transcription from the 1980s of a song. Admittedly, it was transcribed by a linguist, but all the same: it's an imperfect rendering, and any textual analysis has to be done by reference to the likely Manchu cognates (and sometimes Jurchen, since it retained some archaic features lost in Manchu). Theknightwho (talk) 06:00, 12 February 2024 (UTC)
We know too little about Dacian to so much as dream of a reconstructed equivalent of κινούβοιλα.
Otherwise, I agree. We can only put {{head|xdc|noun}} because it is a Dacian word and we don't know that the language had grammatical gender (though it's not unlikely). The mere fact that the word is mentioned in a passage written in Greek does not make it Greek. Dioscurides indeed cites it as the word that the Dacians use. Only if he were using the word in an ordinary way to refer to the plant could a case be made for {{head|grc|noun|g=f}}. Nicodene (talk) 06:39, 12 February 2024 (UTC)
@Nicodene @Chuck Entz There was a discussion on the Hunnic language in which this issue came up, so we've now got the (frankly ridiculous) Ancient Greek entry κάμος (kámos) (noted as a hapax!), which is very clearly described as a "native [Hunnic]" word in the original source. It's nonsense. These arguments make sense when it comes to names, but in situations like this it feels like a bizarre form of mental gymnastics to sweep the awkwardness of an imperfect non-native transcription under the rug, by simply pretending it's not an attestation at all. In that case, there's a separate question of whether they actually are Hunnic, but κάμος (kámos) is most definitely not Ancient Greek, going by the source. Theknightwho (talk) 06:54, 12 February 2024 (UTC)
That is baffling. It is surely better to put κάμος as Hunnic with a question mark, if that is the best available guess, than to put it as Greek, which we know for a fact is wrong. Safest of all, put the various words together in an appendix, as some were suggesting. Nicodene (talk) 07:25, 12 February 2024 (UTC)
The original text apparently doesn't say *κάμος but rather καμον. I would like to ask on what basis the masculine gender has been assumed, to deduce a nominative singular with -ος? Or on what basis the stress position has been assumed? Nicodene (talk) 07:43, 12 February 2024 (UTC)
Right. I have known this subject for a long time, but have not hyperfocussed on it. That does not absolve us of having entry criteria. The leading word in Dacian of this thread is at the correct place, only the header is wrong (I’d do {{head|xdc|noun|sc=Polyt}}), and so would be the Hunnic word; for Trümmersprachen we can definitely lemmatize under attested non-lemma forms, if that makes sense: “citation forms” are not really a concept for languages not systematically known.
If the languages are known enough for citation forms to make sense then slight adjustments can be made, hence for the Systemzwang I entered briginos instead of mentioned briginos (the reasoning of which someone was too simple to transcend, so he RFVed it), and Punic terms supported by comparanda were entered in Punic script: about a dozen Punic plant names mentioned in the same Dioskurides interpolations (we know they were not present in the original work but added later; for simplicity one omits this detail anyway) or only supported by Berber borrowings, only in the latter case being situated in the reconstruction namespace. In the Ugaritic entry 𐎃𐎚𐎐 (ḫtn) I created a day ago you also see that I prefer having Ugaritic items attested only in syllabic cuneiform entered in alphabetic cuneiform if possible, which in this case keeps terms of the same root at one entry, as is always a goal in Semitic dictionaries. The same holds water if a multiword term is attested, 𐤔𐤕𐤋 𐤔𐤃 (štl šd /⁠šəṯīl sad⁠/) but only one or none of the constitutive terms: these get entries in the mainspace anyway, therefore unstarred 𐤔𐤕𐤋 (šəṯīl), at least being explicit that the derived Fügung is the only attestation of this term. For Germanic languages we hold that compounds do not generally attest their simples, but this does not apply analogically to terms in the Semitic construct state.
Following this discussion I am also more inclined to think that Old Persian or whatever gangaba can only be entered as such, not presented as Latin or a reconstruction, which we know is wrong, irrespective of the inflectional ending. In the old RFD Mnemosientje ponderably intervened that such words need a place to stay and the appendix is where words go to die; we have not decided yet where we enter such a word “as such”. There are strong grounds to avoid having terms in the mainspace whose structure, in spite of systematic knowledge of the target language as in the case of Old Persian, is opaque; in addition to transcription being imperfect the textual situation is supported by nothing. It is well known that the Thracian and Dacian and Phrygian or whatever words in Greek writing have variants, though I now don’t find a list of the terms where I got this impression from, and of the Punic plant names mentioned in ancient Greek or Roman writings I could only enter about a tenth, due to their being otherwise supported or us being able to make sense out of them. Löw, Immanuel (1881) Aramæische Pflanzennamen[4] (in German), Leipzig: Wilhelm Engelmann treats all the eighty-or-so alleged Punic names in an appendix and the rest I avoided, having all created terms deliberately hiding behind advanced linguistic arguments, in fear of creating a role model (Vorbildwirkung, as we say in construction law) of wanton littering of the mainspace in ancient lay transcription. You know, those Albanians, they will be overconfident about the attestation of ancient languages of the Balkans. Fay Freak (talk) 09:15, 12 February 2024 (UTC)
For convenience, the quote from Curtius is gangabas persae vocant umeris onera portantes, loosely translatable as "the Persians refer to those charged with carrying baggage on their shoulders as gangabas". Nicodene (talk) 09:49, 12 February 2024 (UTC)
I'm torn on this issue. On one hand, there is no real difference between this attestation and any other pre-IPA description of an unwritten language.
On the other hand, there doesn't seem to be any intention here to describe the language, but rather only to give this one native name. It would be the same as an English text saying "The Russian doll, called matryoshka in Russia...", and I think we can agree that matryoshka is an English term, not a Russian one.
This isn't a matter of "imperfect spelling" or anything of the sort; this is a question of whether the author did try to follow the original word phonologically, or whether it was a borrowing (including phonological/phonotactical substitutions) that he noted to be a borrowing. I would assume the latter. Thadh (talk) 10:20, 12 February 2024 (UTC)
Well, I would agree that "matryoshka" is an English word, but I don't think this is established by sentences of that type. It's established by its use in English sentences that don't explain what the word means as a Russian word.--Urszag (talk) 10:35, 12 February 2024 (UTC)
What Urszag says. The Englishness is in other instances. Fay Freak (talk) 10:40, 12 February 2024 (UTC)
We can agree that matryoshka is an English term only because it actually happens to be one, not because the single quote you have written would have ever proved this.
There is a species of flower that Georgians call yayachura. My writing this in English, using Latin characters for the word in question, in no way makes it an English word. Nicodene (talk) 10:40, 12 February 2024 (UTC)
It doesn't make it a Georgian word, though. It is not an English word only because it doesn't have enough traction. I don't think we can use yayachura used in an otherwise English setting as an attestation of the Georgian term. Thadh (talk) 11:17, 12 February 2024 (UTC)
Of course it's a Georgian word - I've just told you it is. Writing ყაყაჩურა in another alphabet doesn't make it magically not what it is. Nor does talking about it in another language.
And yes that is how English words work. 'Globulicious' needs only to gain a little traction and it too can be one. Given that I've invented this about four seconds ago it may be a bit premature though. Nicodene (talk) 12:04, 12 February 2024 (UTC)
In "The Russian doll, called matryoshka in Russia…", matryoshka is neither a Russian nor an English word, it is an interlanguage of sorts. Such quotes are useless for the purposes of attestation. All they can do is point towards the existence of such a word, and set us on the path of looking for real quotes. PUC 12:14, 12 February 2024 (UTC)
It is an attestation, by definition, of the existence in Russian of the word matryoshka. It may not be an optimal attestation, but it is one nevertheless. And if this were the only time the word occurred in an English sentence, lemmatising it as English would be mad. Nicodene (talk) 12:22, 12 February 2024 (UTC)
No, it is an attestation of the existence of something that the English speaker identified as matryoshka. This is not the same thing. If I hear some Arabic person saying 'alhairwan, not knowing a single thing about Arabic and having no idea what it means or whether it was a word, it doesn't mean there is anything of the sort in Arabic. Thadh (talk) 12:37, 12 February 2024 (UTC)
To continue this thought, having worked with a lot of smaller languages, I have found myself seeing bullshit ad-hoc transliterations of non-existent words into Russian (or some other language) that were based on a misunderstanding of a misheard word in the target language.
An example is лайба (laiba), which allegedly means "Izhorian sailboat", with the only similar word in Ingrian being laiva (ship); there is no cultural aspect to its meaning, nor is there a form with -b- (or -p-) to be found in any resource I could find or any language in the region, but Russian ethnologists seemingly misheard and misinterpreted this word. In the end this word did make it into the language as a borrowing, but it might well not have. Thadh (talk) 12:49, 12 February 2024 (UTC)
Yes I'm sure that an author who gave the definition of a word in Dacian, in a form that just so happens to resemble other Paleo-Balkan forms, attested in exactly the same sense, had no idea it was a word, nor knew what it meant. He was just randomly scribbling letters and got really lucky.
We've moved beyond the point, which was whether it should be lemmatised as 'Ancient Greek'. Has your point now shifted to 'Dioscorides' testimony is illegitimate to begin with' and therefore the entry shouldn't exist? Nicodene (talk) 12:54, 12 February 2024 (UTC)
I'm not sure if we should lemmatise it as Ancient Greek or not record at all, but I don't see evidence of it really being Dacian. It may be a borrowing, it may be a transcription, but as long as we have no proof of the accuracy of this transcription I don't think it's fair to call this a part of the original language.
If I were to take your example, ყაყაჩურა, and say "Well, I've heard from a friend that there's this Georgian plant called yaychoora (/jeɪˈtʃʊɹə/)", I think you'll agree this isn't Georgian at this point, even though it "resemble[s]" other Kartvelian terms. Thadh (talk) 14:21, 12 February 2024 (UTC)
What there is actually zero evidence of is this somehow being a 'borrowing into Greek' when the one mention by a Greek man explicitly calls it a foreign word. A man who, let's not forget, had just finished listing all the ways that Greeks do call the plant, then went on to contrast it with how Dacians call it! I can't believe we're even having this conversation.
That, and the parallel with the Thracian makes it clear beyond any reasonable doubt that Dioscorides' testimony is not some gibberish like the example you have attempted. I'm not sure why you've decided to run it through some odd spelling-pronunciation when the route of transmission under discussion is aural, thus a starting-point /q'aq'atʃura/. Nicodene (talk) 14:57, 12 February 2024 (UTC)
You're just trying to dismiss my argument on the basis of small inaccuracies, ignoring the bigger picture. Even a transcription like "cuckchura" (/kʌkˈtʃʊɹə/) would not be acceptable as a Georgian word, or an attestation of it, only as an English attempt at recreating one. Same here: the term itself is only similar to the Thracian one in so far as it seems to have a similar stem and maybe a similar suffix, but it's not similar enough to be able to apply sound laws to it. I would not expect a Greek scholar to be able to accurately transcribe a Dacian term. Thadh (talk) 17:28, 12 February 2024 (UTC)
The bigger picture is this: once you actually produce any evidence that the word was massively distorted, besides 'but what if??', then and only then will any of this amount to an actual argument. Nicodene (talk) 18:41, 12 February 2024 (UTC)
But otherwise we get this weird stuff like when koekchuch was listed as an Itelmen word and not English. Tollef Salemann (talk) 20:47, 21 February 2024 (UTC)
I also concur. --RichardW57m (talk) 10:58, 12 February 2024 (UTC)
There's an awkward aspect to this. What do we do when foreign words have the inflections of the matrix language? Perhaps we can treat each inflected form separately, and explain the inflection in the etymology. Perhaps for pages with many languages, we should have a soft redirect. I'm not sure whether we even need a part of speech, and in many cases it might be discordant between the two languages. --RichardW57m (talk) 11:07, 12 February 2024 (UTC)
Then we drop or adapt the inflection in so far as we know the foreign language, and don’t if we don’t know anything in that direction. Because otherwise we can’t well list the term as a term in the foreign language, while we can’t list it as a term in the matrix language either since it is a false claim. The same problem actually occurs with uses in one known language: Latin farfarum being of unknown citation form but not reconstructed either; if a citation form is not attested one puts the star after the assumed citation form, in academic writing. Fay Freak (talk) 12:10, 12 February 2024 (UTC)
There's a key difference between the Dacian example and "The Russian doll, called matryoshka in Russia...": the reason the latter is not an attestation of matryoshka as either an English or a Russian word, the reason it's an "interlanguage" as PUC put it, is that for English and Russian our attestation requirement is use, and that's not a use. Even "The London palace, which Londoners call Ally Pally" would not be an attestation of Ally Pally as an English word, in the sense we mean when considering whether to have an ==English== entry for it, because it's not a use. But for extinct languages where we don't require use and mention is sufficient, we do accept a 'reliable' Roman author saying (in and about Latin) "The foobar plant, called thymum foobaricundum in Rome..." as an attestation of thymum foobaricundum sufficient to have an entry for it! We accept mentions of various Native American words in Spanish texts (including, clearly, in cases where the Native language is not known to have used the Latin alphabet). For this attestation of a Dacian word in an Ancient Greek text, I think it comes down to whether scholars regard the attestation as reliable vs unreliable (I think there are cases of ancient authors confusing which people/language used a particular word, but in this case, if the evidence is that this is correct, then my inclination, at least based on the discussion so far, is to agree with those saying we should have a Dacian and not an Ancient Greek entry). - -sche (discuss) 17:19, 12 February 2024 (UTC)
This leads us to an additional argument Thadh apparently needs: even from analogy there can only be a Dacian and not Ancient Greek entry, because for those American Indian cases we can’t have English or Spanish entries due to the requirement of use in English or Spanish. It is conceptually incorrect though to claim that there is not an attestation of the Russian word: there is, in the usual academic sense of “attestation”, but none according to our quality standards for Russian, none we make use of because we committed to make use of better occurrences. In either case the author's intent to mention a foreign word, rather than to use or mention a word in his own language, is clear. Fay Freak (talk) 17:31, 12 February 2024 (UTC)
So basically what you are saying is that we should be fine with using low-quality resources for smaller/more vulnerable/extinct languages? I take extreme issue with this; it's fine saying a language needs fewer attestations, but these attestations should still be up to the standard of our dictionary, and that standard should be the same across all languages. Thadh (talk) 17:39, 12 February 2024 (UTC)
A hungry man will be content with fast food. Fay Freak (talk) 17:42, 12 February 2024 (UTC)
Not every hungry man, and we shouldn't be such a hungry man. Thadh (talk) 18:00, 12 February 2024 (UTC)
I could turn your question around: so basically what you are saying is that we should suppress records of small, vulnerable communities' languages precisely in those cases where previous colonizers tried to suppress those languages before whoever meets your definition of a proper scientist could record them? I would take extreme issue with that. I don't think it's helpful for you or me to accuse each other of such motives. As I said, we ought to evaluate case by case whether a source is usable; in some cases, a record is confused (e.g. records of Loup, especially what is now separated out as Loup B), and we don't or shouldn't use it; in other cases, notwithstanding that the original record was by no proper scientist, there's modern scholarship devoted to even single attested words and evaluating what sound laws etc they fit and what implications that has for what family the language would've belonged to. (And many cases are in the middle somewhere.) In this case, it seems like there is modern scholarship accepting this as an attestation of a Dacian word and even speculating on what sound laws etc it indicates, e.g. centum vs satem or simply depalatization. If there is other modern scholarship arguing that this author was confused and mistook some other language's word for Dacian, please bring it to bear. - -sche (discuss) 18:34, 12 February 2024 (UTC)
I strongly agree with @-sche. Theknightwho (talk) 18:55, 12 February 2024 (UTC)
I honestly don't think we should create entries for the Native American terms attested only as mentions in non-scientific literature either. I also don't see how those would not be the same "interlanguage" as matryoshka is. Seems to me like we have one rule for well-attested languages and a completely different rule for those that aren't, and while sometimes that is justified (namely, number of quotes and/or cites in scientific literature), I don't think it is in this case. Thadh (talk) 17:32, 12 February 2024 (UTC)
You won’t really convince people with this stance. Americanists take all they get, and editors will put those mentions somewhere, interlanguage or not; the motivation is not shattered by this Cartesian abstraction. Native American communities have been welcoming or closed off to varying degrees, and you can’t even strictly define what is science and what not: it also depends on the current technological, and intellectual, capabilities at any given time. Fay Freak (talk) 17:42, 12 February 2024 (UTC)
If we get a word like koekchuch, which was once registered here as Itelmen, it is now clear that the real spelling is unknown, but we have the original spelling from a Russian source. Does it hurt Itelmen? No, because if you scroll down, you get a category reference to all the Itelmen words in English. So, even if the original Itelmen word is lost, we still have it listed here in English in the right category. The other way of doing it is stupid. It is like the people who read the Hebrew letter from the Khazars, find a tribe called "wnntit", and are sure that it is the Vyatichi tribe. Tollef Salemann (talk) 21:00, 21 February 2024 (UTC)
If you need to scrape together a lost language from every corner of the world just to get a few words written by a foreigner, you can still group these words together to find patterns you can use to compare it to other languages. But as I said, you already have the categories here for these kinds of words. The one smart idea that could be implemented is to create categories for every language containing all the words borrowed into other languages, so you can easily find an obscure term like barrabora listed under both the Itelmen and the English main topic. Now you find it only in English categories, because it is an English term. Tollef Salemann (talk) 21:13, 21 February 2024 (UTC)
Are you even sure that the spellings are OK? He gives translations into Syriac, but they are weird. For example, for lily he gives "sasa", which is very distant from all the Semitic names of this plant, not only because Greek has no "sh" letter. His Latin seems OK, though. Tollef Salemann (talk) 21:35, 21 February 2024 (UTC)

Announcing the results of the UCoC Coordinating Committee Charter ratification vote

You can find this message translated into additional languages on Meta-wiki. Please help translate it into your language.

Dear all,

Thank you everyone for following the progress of the Universal Code of Conduct. I am writing to you today to announce the outcome of the ratification vote on the Universal Code of Conduct Coordinating Committee Charter. 1746 contributors voted in this ratification vote with 1249 voters supporting the Charter and 420 voters not. The ratification vote process allowed for voters to provide comments about the Charter.

A report of voting statistics and a summary of voter comments will be published on Meta-wiki in the coming weeks.

Please look forward to hearing about the next steps soon.

On behalf of the UCoC Project team,

RamzyM (WMF) 18:24, 12 February 2024 (UTC)

Nice. Icantthinkofaname12 (talk) 20:39, 16 February 2024 (UTC)

Glossing entries based on where the referent is found

I'm a bit concerned by this user's large number of edits of this kind: [5]. See the user's talk page for my reasoning. Equinox 21:10, 17 February 2024 (UTC)

Yeah, sounds like a shit idea. PUC 21:11, 17 February 2024 (UTC)
@Equinox Yeah this is garbage. IMO if this user won't stop on their own they need to be blocked. Benwing2 (talk) 21:36, 17 February 2024 (UTC)
It doesn't fit well either in our context labels or our topical categories. The notion that the referent is to be commonly found principally in a particular region would be fine in a definition. DCDuring (talk) 23:44, 17 February 2024 (UTC)
@DCDuring, @Benwing2, @Equinox: do we want to revert this and/or other edits like this? It seems the edit still stands at the moment. Kiril kovachev (talkcontribs) 03:27, 18 February 2024 (UTC)
@Kiril kovachev Yes. Do you know how many there are? Benwing2 (talk) 03:34, 18 February 2024 (UTC)
@Benwing2 I'm afraid not, I only saw that we hadn't changed this particular one. Looking through the contributions is possible though: the diff on brook seems a bit odd (is that a word only used in the Northeastern US?), and what about skillet? There are probably quite a few judging by this... Kiril kovachev (talkcontribs) 03:43, 18 February 2024 (UTC)
The label for brook approximated what I found in DARE. He might be using DARE, but DARE may often be too detailed for us. DCDuring (talk) 04:35, 18 February 2024 (UTC)
@DCDuring I am going through the changes; most appear garbage. You are welcome to review them more specifically but there are over 500. I doubt brook for example is only Northeastern US; it is poetic everywhere. Benwing2 (talk) 04:49, 18 February 2024 (UTC)
We are more a people's dictionary than a literary one. I reworded brook to more closely follow DARE. It may be more detail than we want and may need a reference. DCDuring (talk) 04:55, 18 February 2024 (UTC)
Saying brook is Northeast-only is just wrong if it's used everywhere in poetry (and I doubt it's only poetry). Benwing2 (talk) 04:58, 18 February 2024 (UTC)
@User:Benwing2 So, add some qualification. Lots of contributors make contributions that need correction. Reversion is abusive of a good-faith user. DCDuring (talk) 16:26, 18 February 2024 (UTC)
@DCDuring Well in the case of brook, you changed the qualifier from misleading (implying it was only used in the Northeastern US without saying as much), to outright wrong (saying it originated in New England), when our first quote is from Shakespeare in the 1590s, followed by one from the KJV. Theknightwho (talk) 18:55, 18 February 2024 (UTC)
You are right. I took the DARE perspective, which ignores usage outside the US completely AFAICT. DCDuring (talk) 19:58, 18 February 2024 (UTC)
I wonder whether brook is used in India. It does seem to be used in the UK, eastern Canada, eastern Australia, and eastern US.
@User:Benwing2 He also seemed to be working from DARE for skillet. Did you check DARE or have a better source?. I think it wrong to revert first and ask questions later. DCDuring (talk) 04:51, 18 February 2024 (UTC)Reply
@DCDuring I don't have access to DARE but on first glance many of the labels are patently wrong. I am reviewing them one-by-one but making educated guesses as to whether a change is plausible. Benwing2 (talk) 04:56, 18 February 2024 (UTC)Reply
I'm sorry for troubling you, I explained my reasoning for many of my edits in my talk page and I can provide my reasoning for the more dubious choices if you like. I thought if I were doing anything wrong that it would've drawn more attention sooner. SwordofStorms (talk) 05:06, 18 February 2024 (UTC)Reply
@SwordofStorms You still don't understand the difference between usage and reference. Terms like beignet, dirty rice, jambalaya etc. are used everywhere to refer to things that may originate in specific parts of the US; but that doesn't mean they should be labeled with those parts. Terms like bitching and grody may have originated in California but have spread everywhere in the US. DARE is also quite out of date. Benwing2 (talk) 05:11, 18 February 2024 (UTC)Reply
@User:Benwing2 Early volumes of print DARE are old: volume 1 is copyright 1985, with older data, of course; there has been much loss of regional difference in the US as people move between regions, use national rather than local media, etc., but much remains. We are, among other things, a historical dictionary, so mere age should not be a disqualification of contributions (or contributors!). DCDuring (talk) 16:04, 18 February 2024 (UTC)Reply
I assume that would mean that 'bunny chow' shouldn't have a 'South Africa' label and 'bangers and mash' shouldn't have a UK label then. But they do so I do hope you understand the confusion on that part.
I'm sorry about the DARE entries but that's a source Wiktionary's sister site Wikipedia uses liberally, lol. I understood some of my entries were tenuous and I was hoping I'd be corrected if I were jumping the gun! I'm not well travelled! SwordofStorms (talk) 05:19, 18 February 2024 (UTC)Reply
OK I reviewed this user's changes (and reverted most of them). User:SwordofStorms you also don't seem to understand what doublets are; just because two terms are etymologically related doesn't make them doublets (cf. water, hydro, und). Benwing2 (talk) 06:10, 18 February 2024 (UTC)Reply
@User:Benwing2, @ChuckEntz Are User:Demonicallt and User:SwordofStorms the same? DCDuring (talk) 16:26, 18 February 2024 (UTC)Reply
@DCDuring Looks like Wonderfool to me. Benwing2 (talk) 20:36, 18 February 2024 (UTC)Reply
Definitely not WF. Wrong kind of mistakes. Chuck Entz (talk) 21:14, 18 February 2024 (UTC)Reply
@Chuck Entz I mean that Demonicallt looks like WF, not SwordofStorms. Benwing2 (talk) 22:44, 19 February 2024 (UTC)Reply
@Benwing2 Yes, Demonicallt is unquestionably WF. DCD asked whether they were the same, so it looked like you were saying "Yes, they're both WF" rather than "No, unlike SwordofStorms, Demonicallt is WF". Thank you for clarifying. Chuck Entz (talk) 23:16, 19 February 2024 (UTC)Reply
Part of the problem has to do with the fact that topic and register/restricted-context labels appear identically. How can we blame a new contributor for being confused about this? We are compounding the problem by discouraging a new contributor, reverting good-faith contributions on the basis of the subjective beliefs or personal concepts that experienced users (who know the distinction we make between topic and register/etc. labels) have of what an entry feature should be. We should take this as an indication that we need to resolve the confusion in our label/topic display. DCDuring (talk) 16:04, 18 February 2024 (UTC)Reply
The confusion is caused by taking over labelling by means of brackets from past print dictionaries. These labels look identical to additional information in parentheses such as by {{gloss}}. If one were to add some other decoration separating the labels from the header “the difference between usage and reference” would transpire more. Fay Freak (talk) 17:08, 18 February 2024 (UTC)Reply
OK, @SwordofStorms if you were relying on DARE saying the terms really were regionally restricted, I apologize for my tone on your talk page; I can't fault someone for not realizing it if DARE is not actually reliable. I would've thought it was reliable myself, unless I saw it saying something like that brook or cheesesteak was regionally restricted (does it really say that? I'm disappointed if it's so inaccurate/unreliable).
To DCDuring's point: yes, historical information, where it's accurate and useful, is worth preserving/indicating ... but we have to be careful, because as the words we're discussing demonstrate, (1) sometimes the supposed historical information is wrong, as with the claim that brook was originally New England (no, it was used already in old England), and (2) often, such "historical information" is not useful: e.g., from the perspective of sheer technical correctness, we could add "originally Britain" or "originally England" as a label on brook, and furthermore on stream and tens or hundreds of thousands of words which were first attested in Britain/England sometime before English-speakers even reached the Americas, but that's not useful, now is it? We have to be sure there was a period of time when a term was actually dialectal. - -sche (discuss) 19:59, 18 February 2024 (UTC)Reply
I found that AHD took the trouble to have the following:
Our Living Language: Traditional terms for “a small, fast-flowing stream” vary throughout the eastern United States especially and are enshrined in many place names. Speakers in the eastern part of the Lower North (including Virginia, West Virginia, Delaware, Maryland, and southern Pennsylvania) use the word run. Speakers in the Hudson Valley and Catskills, the Dutch settlement areas of New York State, may call such a stream a kill. Brook has come to be used throughout the Northeast. Southerners refer to a branch, and throughout the rural northern United States the term is often crick, a variant of creek.
Of course, they have the advantage on us of having the US (alone?) as their target market. DCDuring (talk) 20:23, 18 February 2024 (UTC)Reply
@DCDuring @-sche I may have been a bit heavy-handed on the reversions but a lot of them appeared outright wrong to me such as saying fridge, freeway and frontage road are regionally restricted in the US (possibly 'frontage road' is regionally restricted but the regions given didn't make sense to me) and terms like arroyo and caliche are Texas-only (e.g. I grew up in Arizona and we used them there; as these are Spanish-origin terms maybe they are more present in the Western US but not only Texas). Some added definitions were just bizarre, e.g. under bring this user added the following:
# {{lb|en|Florida}} To carry something inside; to have something.
@DCDuring: The main problem is that inline labels are the wrong tool for presenting DARE data. The goal of DARE is to document the variation in regional English usage in the United States. This is rarely simple and concise enough to represent in the amount of space available in front of the definition. When you use context labels, you have to leave things out, which makes what's left rather cryptic and/or ambiguous. Where it's of interest it should be presented as "Usage notes", which were designed for precisely this sort of thing. Chuck Entz (talk) 21:14, 18 February 2024 (UTC)Reply
I had earlier written "but DARE may often be too detailed for us." I'm not certain that it is, but I mostly object to the manner in which this mass reversion was carried out: shoot first and ask questions later. (And disparage evidence rather than admit you have none to support your own preferences.) DCDuring (talk) 21:22, 18 February 2024 (UTC)Reply
In general I left terms I didn't recognize except those referring to regional food items, where the confusion between usage and reference was present. My logic is that I'd rather remove incorrect info than try to correct things one by one where I don't have the resources to do so. In general this user doesn't seem to be applying common-sense tests to their labels (the "smell test") but taking some resources at face value. If this resource is DARE, then we know now that DARE is wrong or at least quite out of date in its data. Benwing2 (talk) 20:23, 18 February 2024 (UTC)Reply
@User:Benwing2 How hard would it have been to correct arroyo (only heard it in context of descriptions of SW terrain and mostly mentions) and caliche (never heard of it before)? DARE supports the labels SwordofStorms provided. What source do you have to contradict that other than personal opinion?
Well, you probably won't have to worry too much about SwordofStorms coming back.
You have made me feel less embarrassed about my own occasional dyspeptic outbursts. DCDuring (talk) 20:36, 18 February 2024 (UTC)Reply

Use of T:lang

See User talk:Benwing2#Italicising synonyms for taxonomic names for a relevant discussion.

Is there any kind of community position (for or against) on using {{lang}} like this? User:PUC objected, for no good reason, IMO. Incidentally, I object to PUC's rank-pulling and rudeness (see Special:History/schema logu and his/her talk page, respectively) in dealing with this issue, if that counts for anything. 0DF (talk) 23:12, 17 February 2024 (UTC)Reply

There's no reason to do this that doesn't apply to almost every line on millions of pages. I don't see the point. Chuck Entz (talk) 23:23, 17 February 2024 (UTC)Reply
The language is inherited from the page language. For example, it is lang="en" in the <html> element, while it is lang="de" on German Wiktionary. The template {{lang}} is for marking text as being in a language other than the inherited one, so using it this way abuses the template. Fay Freak (talk) 23:30, 17 February 2024 (UTC)Reply
@Chuck Entz: I gave the rationale in this edit summary (i.e., w:Template:Lang#Rationale, specifically points 1, 2, 3, 6, and 7) and in this post (“{{lang}} skips the left-hand table of contents, which serves all readers, not just visually-impaired ones”). That was before Fay Freak informed me that “[t]he language is inherited from the page-language” (thanks for that, F.F.). In the light of that, only the section-linking caused by {{lang}} still applies as a reason; that can just as easily be achieved by using {{l|en}}, so I'll use that instead in future. 0DF (talk) 23:52, 17 February 2024 (UTC)Reply
I've unblocked the page, but what do you have in mind exactly?
On another note, I apologize for the rudeness, but having to deal with large numbers of vandals or well-meaning but ultimately clueless editors (I'm not saying you're one of them, though) has made me a bit callous. I simply have lost the patience to explain to people how things are done here. When I see what looks like nonsense of any kind, I'm inclined to get rid of it as fast as possible so that I can go back to adding more material myself. Not the ideal mindset in an admin, but hopefully what I'm doing is still a net benefit for the project. PUC 00:04, 18 February 2024 (UTC)Reply
@PUC: Thanks for the apology. It's all water under the bridge. The section-linking I have in mind is to skip the left-hand table of contents, which pushes all the actual entry-contents down, often off the screen. 0DF (talk) 00:10, 18 February 2024 (UTC)Reply
@0DF It is not normal to use {{l|en|...}} in definitions. Just use bare links, unless the link is to the pagename itself. Benwing2 (talk) 00:04, 18 February 2024 (UTC)Reply
@0DF: You decrease source-code readability and strain server-side execution time without sufficient reason, I think. The section-linking reason you state is a reason, but a sore meagre one: consider that there is a collection of reasons why English (and translingual) sections are put at the very beginning of main-space pages. In my first months here I also thought the section-linking great and was excited enough to employ {{l}} in the same fashion; later I valued my fingers and the brain typing on them too much, and that was before I even deadlifted with the same and systematically engaged in maxxing out mental resources. This is not to say that it is never particularly desirable in individual cases to link specific senses on large target pages or entry sections, as those who read the source code will admit. Fay Freak (talk) 00:06, 18 February 2024 (UTC)Reply
@Benwing2, Fay Freak: OK, so in what cases is linking to language sections desirable (outside term-list sections, I mean)? 0DF (talk) 00:13, 18 February 2024 (UTC)Reply
Whenever linking to non-English terms, and when linking to English terms in term-list sections and the like (as you call them), and when a specialized template is required e.g. {{cog}}, {{inh}}, {{bor}}, etc. Generally yes in lists, no in running text using {{l}}. Benwing2 (talk) 00:20, 18 February 2024 (UTC)Reply
I will not give a comprehensive answer, since this question actually rarely arises as such in practice. But in general, specific links concern particular senses that do not immediately surface given the typical viewing behaviour of a reader: a slangy sense might not be likely to be remembered in context and then sought out. For example, leg day links a specific sense of split; it does so in a holonyms section with |id=, but by the same token it can be desirable to link senses in foreign-language glosses, which I cannot think of doing any other way than with {{l}}. This is for senses, however; with language sections the specific use cannot be that high, since such links are not that specific. Fay Freak (talk) 00:22, 18 February 2024 (UTC)Reply
@Benwing2, Fay Freak: So there are no cases where you would deem it desirable to link to the English-language section of an entry from a definition? No matter how long the table of contents? Is the left-hand table of contents not regarded as an issue? 0DF (talk) 00:27, 18 February 2024 (UTC)Reply
@0DF: I have not found, or at least do not remember, which is tantamount, a case empirically, where it is sufficiently desirable for one to exert oneself to link an English-language section, in seven years. Know that tables of contents can be, and also are, manipulated, as on bar. I am concerned for your end-user device, writing this on 27" QHD, high response over DisplayPort with high-end graphics card and wont mouse. But I make this value judgment even though the interphalangeal joints of my right index finger have already hurt many days during the last two years due to all the computer-work or doomscrolling. Fay Freak (talk) 00:49, 18 February 2024 (UTC)Reply
@0DF I agree with User:Fay Freak here. I think it would be better on pages with excessively long TOCs to add the appropriate settings to keep the lower levels of the TOC closed by default (as is done on bar). Benwing2 (talk) 00:54, 18 February 2024 (UTC)Reply
Also I mentioned one specific use case, which is when the link is to the page itself, because in that case a bare link shows up as unlinked boldface. Benwing2 (talk) 00:56, 18 February 2024 (UTC)Reply
@Benwing2: Yes; noted with thanks. That is also a case in which {{lang}} doesn't work; only {{l}} will do there. 0DF (talk) 01:32, 18 February 2024 (UTC)Reply
@Benwing2, Fay Freak: The table of contents for bar, despite being limited to the forty-nine level-two language headers only, still takes up two whole screens' worth of vertical space for me. That would be exactly the kind of case in which I would think that section-linking would be desirable. But, if The Community deems otherwise, so be it. 0DF (talk) 01:30, 18 February 2024 (UTC)Reply
@0DF: For me three fourths of the TOC take up the whole screen and I am zoomed in 160%–170%; with no zoom, where I can still read well, the whole table takes 100%. I sometimes (not rarely; the behaviour becomes automatic after some use) click language sections in the TOC, but yet I realize the order is pointless. I compared it to Wikipedia’s default and now prefer their TOC technique, which dates from two years ago. In your preferences → Appearance their skin is Vector 2022. I should’ve tried it before instead of maltreating my index finger as in 2021: it looks much, much better, also in comparison to Wikipedia! We might vote on making it the default on Wiktionary, though editors basically lack the energy to even check out available layout improvements, busy as they are with streamlining other parts of Wiktionary. Fay Freak (talk) 02:08, 18 February 2024 (UTC)Reply
@0DF IMO it's not worth the effort to try and have a special list of terms where we link specifically to the English section; for one thing it's definitely not maintainable, for another these are almost always common English terms so the gain of saving a couple of mouse clicks/scrolls seems hardly worth it. User:Fay Freak's suggestion of using Vector 2022 is also an alternative; for me it wastes too much whitespace on the left and right but if the TOC issue is a big concern it may be worth it. Benwing2 (talk) 02:46, 18 February 2024 (UTC)Reply
@Benwing2: I figure that the impression of waste is shaped by imagining yourself editing some Module code sections, which they don’t do that deeply at Wikipedia, hence their vote. For reading and editing in general it is of course more amenable, since it shortens the paths the fingers have to move the mouse to edit, view the history and read it again, or switch between entry and discussion; hence the main actions sit in the narrower middle column. Texts of limited width make as much sense as multiple columns in newspapers: more information with shorter eye movements. Even most other actions need no scrolling, particularly the search, which is much more important on Wiktionary; on Vector 2010 I have to scroll back to the top (or shift through the browser search bar) to search for another word when I am in some L2 header section below the very top. Maybe we can make the gaps less extreme and more in alignment with Wiktionary through CSS; otherwise I must own that the Foundation’s designers have outslicked us in balancing the psychological and physiological effects of the interface upon editors and mere readers. Fay Freak (talk) 03:32, 18 February 2024 (UTC)Reply
@Fay Freak, Benwing2: My screen resolution is 1,920 × 1,080 pixels, if that helps you to visualise. I tried the Vector (2022) skin, and that does indeed get rid of the problem, but at the acknowledged cost of instead wasting horizontal screen-space. However, I make a point of using mostly default settings, since those are the ones that all unregistered users will have, which I assume to be the vast majority of them, their being the ones for whom we should try to optimise things, other things being equal. Is there some way to apply {{col-auto}}-style columns to the table of contents in the Vector legacy (2010) skin in order to save vertical screen-space? 0DF (talk) 19:49, 18 February 2024 (UTC)Reply
@0DF Maybe User:This, that and the other knows? This would involve some CSS magic. And User:Sarri.greek was also trying to work on changes to the TOC to save space. Benwing2 (talk) 20:33, 18 February 2024 (UTC)Reply
Well, if you're asking, my personal preference would be to enable TabbedLanguages for everyone by default. And my second preference would be to enable Vector2022 for everyone by default... as DCDuring often says, we squander the width currently available to us in the vast majority of entries, and Vector2022 at least uses this space for something else.
Failing that, see User:This, that and the other/columnar toc mockup for a couple of mockups. I tend to prefer the second one. We could use media queries to ensure the columns only display on sufficiently large screens. This, that and the other (talk) 00:52, 19 February 2024 (UTC)Reply
@This, that and the other: The second one looks good. How would it look on bar, restricted to level-2 headers only? The Vector (2022) skin or an improved version thereof might eventually be the way to go, but that's no reason not to improve the TOC problem with the Vector legacy (2010) skin in the meantime. 0DF (talk) 01:05, 19 February 2024 (UTC)Reply
@0DF you can try it yourself by pressing F12 in your browser, going to the Console tab, and pasting the following code:
document.querySelector('.toc').style.display = 'flow-root'; document.querySelector('.toc > ul').style.columnWidth = '20em';
Note that, for it to work effectively, the {{TOC limit}} template should come after the {{character info}} template in the wikitext of the page. Currently these are the wrong way around at the bar page. This, that and the other (talk) 01:20, 19 February 2024 (UTC)Reply
@This, that and the other: Sorry to be dense, but what do you mean by "the Console tab"? 0DF (talk) 13:41, 19 February 2024 (UTC)Reply
@0DF When you press F12, a new panel should open up with various tabs along the top such as "Elements" (or "Inspector"), "Console", "Network" and so forth. Switch to the "Console" tab. This, that and the other (talk) 03:56, 20 February 2024 (UTC)Reply
@This, that and the other: Unfortunately, all that happens when I press F12 is that my computer's Calculator program starts. 0DF (talk) 02:10, 21 February 2024 (UTC)Reply
@0DF It may depend on your browser. Under Chrome on MacOS, for example, you need to go to View -> Developer -> JavaScript Console to get a console pane to pop up. Under Firefox on MacOS you need to go to Tools -> Browser Tools -> Browser Console. Benwing2 (talk) 02:42, 21 February 2024 (UTC)Reply
@0DF Try pressing Fn+F12 or Ctrl+Shift+I (capital "i"). This, that and the other (talk) 03:10, 21 February 2024 (UTC)Reply
@Benwing2, This, that and the other: Thank you both. Fn + F12 was the first thing I tried and it worked, so I didn't try the other methods.
@This, that and the other: The TOC on bar looked great with your code. I made the correction to the order of {{character info|㍴}} and {{TOC limit|2}} you recommended, but I prefer the four-column presentation of the TOC when those two templates are in the "wrong" order to the three-column presentation of the TOC when they're in the "right" order. Nevertheless, they are both a great improvement, and in both cases the resulting TOC fits on the first screen with plenty of space to spare.
@Benwing2, This, that and the other: I think the columnar TOC should be rolled out universally. I also think that the TOC should be limited to level-two and -three headers by default. What do you think? Does it call for a vote? 0DF (talk) 00:48, 22 February 2024 (UTC)Reply
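The three- versus four-column difference follows from how CSS multi-column layout behaves when only a column width is given: the browser fits as many columns of at least that width as the available space allows. A minimal sketch of that rule, assuming the default column gap of 1em and assuming that template order matters because it decides whether a floated box (such as the {{character info}} box) narrows the TOC's available width; the widths in the usage lines are invented, not measurements of bar:

```javascript
// Sketch of the CSS multi-column rule for column-count: auto with a fixed
// column-width, as set by the console snippet (columnWidth = '20em').
// Assumes the default column-gap ("normal", i.e. 1em in multi-column layout).
// All lengths are in em.
function columnCount(availableWidthEm, columnWidthEm = 20, gapEm = 1) {
  // N = max(1, floor((U + gap) / (column-width + gap)))
  return Math.max(1, Math.floor((availableWidthEm + gapEm) / (columnWidthEm + gapEm)));
}

// Hypothetical widths, not measurements of the bar page:
console.log(columnCount(90)); // 4 columns with the full width
console.log(columnCount(70)); // 3 columns once a floated box takes ~20em
```

The same rule is why a media query is not strictly needed to stop columns on small screens: below roughly two column widths the count falls back to 1.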
@0DF I'm not sure whether it needs a vote, but if nothing else a thorough discussion (and the two issues you mentioned should be considered separately). I would recommend starting a new BP discussion bringing up these two issues as separate subsections and soliciting feedback. Benwing2 (talk) 01:30, 22 February 2024 (UTC)Reply
@Benwing2: Done. Let's see what comes of it. I'll be focussing on responding to that discussion about Byzantine Greek for the next few days. 0DF (talk) 02:22, 25 February 2024 (UTC)Reply
My comment still stands, though the comments by others since better address the same things. If you have concerns about how definition lines are treated by browsers, you need to bring them to the attention of the community. Using a template that's not intended for the purpose on a random individual line while leaving millions of similar lines untouched that are not different in any meaningful way strikes me as misguided and hacky. As for @PUC, their explanations were straightforward and to the point. I would probably have reverted your edit myself. As for {{l|en}}, that's used for linking to English entries, and the modules would be wasting time determining that there was nothing to link to. I spend a lot of my time patrolling CAT:E, and I know that there are several entries that intermittently show up there because of excessive system overhead. If this is done systematically, it will just make things much worse. In comparison to the efforts of people like @Benwing2, This, that and the other and @Theknightwho (which have not been without criticism), such methods are piecemeal, inefficient and silly. Chuck Entz (talk) 00:34, 18 February 2024 (UTC)Reply
I think my only contribution in this particular sphere was creating the -lite templates! Still, 0DF has a point. Unmarked links in definition lines should take the reader to the English section of the linked entry. The objections to using {{l}} for this purpose are very reasonable, so perhaps the addition of #English to these links can be done via JavaScript. This, that and the other (talk) 06:15, 18 February 2024 (UTC)Reply
If you can figure out how to do this, by all means implement it. Benwing2 (talk) 06:26, 18 February 2024 (UTC)Reply
@This, that and the other: That would be a great solution if you could manage it. 0DF (talk) 19:50, 18 February 2024 (UTC)Reply
@Benwing2 @0DF   Done, see [6]. This, that and the other (talk) 01:10, 19 February 2024 (UTC)Reply
@This, that and the other Thanks for this. This causes any red links to get turned orange by the OrangeLinks gadget, because it's still adding #English to the end, and OrangeLinks obviously can't find an English section on the target page. You should be able to filter these out, since the URL targets have a different format. Theknightwho (talk) 02:07, 19 February 2024 (UTC)Reply
@Theknightwho thanks for noting this. Can you check to see if it's fixed now? This, that and the other (talk) 03:54, 19 February 2024 (UTC)Reply
@This, that and the other Thanks. One other issue: it still fails with mainspace links which contain a colon (which covers anything in Category:English terms spelled with :), and I assume you've excluded anything with a colon as a way to catch interwiki links. A better way to do that would be to check for the class extiw. On the other hand, links to titles which include ? do seem to work, since it gets escaped to %3F, so that check's fine to keep. Theknightwho (talk) 04:09, 19 February 2024 (UTC)Reply
@Theknightwho the : is mainly to catch links to other namespaces, such as the Glossary. I'm inclined to leave it as is. The number of English terms containing colons is already minuscule (cat:English terms spelled with :) and it's hard to imagine that any of these would be raw-linked from sense lines with any regularity. This, that and the other (talk) 04:40, 19 February 2024 (UTC)Reply
@This, that and the other That's fair - you could always grab the array of namespaces, called wgContentNamespaces, and simply iterate over the array to eliminate them. Theknightwho (talk) 04:45, 19 February 2024 (UTC)Reply
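To make the red-link and colon cases concrete, here is a minimal sketch of the per-link decision such a script has to make. It is hypothetical, not the deployed gadget: it checks the link's host instead of the extiw class, and it uses an explicit list of namespace prefixes instead of rejecting every colon. The list is an assumed, incomplete subset; a real script would take the wiki's own names (MediaWiki exposes them as mw.config.get('wgFormattedNamespaces')).

```javascript
// Hypothetical sketch, not the actual gadget code: decide whether an internal
// link from a definition line should get "#English" appended.
// Assumptions: article links use "/wiki/Title"; red links go through
// "/w/index.php?...&redlink=1"; the namespace list is an illustrative subset.
const EXCLUDED_NAMESPACES = new Set([
  "Wiktionary", "Appendix", "Category", "Template", "Module", "Reconstruction",
  "Thesaurus", "Citations", "Rhymes", "User", "Help", "Special", "File",
]);

function addEnglishFragment(href, base = "https://en.wiktionary.org") {
  const url = new URL(href, base);
  // Links to another host (interwiki, external) are left alone.
  if (url.origin !== new URL(base).origin) return href;
  // Red links and other index.php URLs carry a query string; skip them,
  // as well as links that already point at a section.
  if (url.search || url.hash || !url.pathname.startsWith("/wiki/")) return href;
  // Decode so that e.g. "%3F" reads as "?" and a namespace prefix is visible.
  const title = decodeURIComponent(url.pathname.slice("/wiki/".length));
  const colon = title.indexOf(":");
  if (colon > 0 && EXCLUDED_NAMESPACES.has(title.slice(0, colon))) return href;
  // Mainspace titles that merely contain ":" fall through and get the fragment.
  return url.pathname + "#English";
}
```

For example, "/wiki/water" becomes "/wiki/water#English", while "/wiki/Wiktionary:Glossary" and a red link are returned unchanged.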
FWIW, I came up with a rough version that would handle the namespaces in proper internal wikilinks and would also correctly parse some sneaky links where the URL is spelled out (like this or this) though not those with sneaky interwiki links (like this or this):
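// Keep links in definition lines (ordered lists) that point to mainspace (0) or Reconstruction (118) pages on this wiki.
Array.from(document.querySelectorAll('.mw-parser-output ol a[href]')).filter(e => {
  const url = new URL(e.href);
  // Only plain page views on this wiki: no action, or action=view.
  if (url.hostname !== mw.config.get("wgServerName")) return false;
  const action = url.searchParams.get("action");
  if (action !== null && action !== "view") return false;
  // The title comes from ?title=... (already decoded) or from the article path, e.g. /wiki/$1 (percent-encoded).
  let rawTitle = url.searchParams.get("title");
  if (rawTitle === null) {
    const articlePath = new RegExp("^" + mw.config.get("wgArticlePath").replace("$1", "(.+)") + "$");
    const match = url.pathname.match(articlePath);
    if (!match) return false;
    rawTitle = decodeURIComponent(match[1]);
  }
  // newFromText returns null for an invalid title instead of throwing.
  const title = mw.Title.newFromText(rawTitle);
  return title !== null && [0, 118].includes(title.getNamespaceId());
})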
I tested it with these links. It could use some refactoring but I have to go to bed. — Eru·tuon 06:18, 19 February 2024 (UTC)Reply
Again, I'm not convinced it's worth it given that (a) this code only affects sense lines (dodgy external links to Wiktionary in sense lines should be fixed rather than worked around) and (b) there are so few false-negative entries in existence. I'm not going to get in the way of anyone who wants to edit my code, but it's complete as far as I'm concerned. This, that and the other (talk) 06:46, 19 February 2024 (UTC)Reply
@This, that and the other: Thanks for your work on this. Unfortunately, when I clicked on those "trial" links, none of them took me to the English section of the page. Do you know why that might be? 0DF (talk) 13:37, 19 February 2024 (UTC)Reply
@This, that and the other: Actually, scratch that. It seems to work fine most of the time. Thank you! 0DF (talk) 13:40, 19 February 2024 (UTC)Reply
So now we should give taxonomic links as {{l|mul}}? @Benwing2 Do you think at least part of this transition could be bottable? Although this probably needs a separate discussion. Catonif (talk) 16:08, 19 February 2024 (UTC)Reply
@Catonif Hmm, this is an unintended consequence of this change. Yeah probably we will need to give taxonomic links using {{l|mul}}; I think it's too expensive to consider trying to automate this by looking up each link from JavaScript to see whether it has a Translingual and/or English section. This should be bottable essentially by looking for raw links in definitions that link to terms that have Translingual but not English sections. Benwing2 (talk) 22:28, 19 February 2024 (UTC)Reply
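Benwing2's description translates fairly directly into a per-link check. A minimal sketch, assuming the bot already knows which language sections each target page has (the sectionsOf lookup is hypothetical) and treating only lines that start with a bare # (not #: or #*) as definitions:

```javascript
// Hypothetical bot sketch (not an existing bot): in a definition line, rewrite
// raw links whose target has a Translingual section but no English section as
// {{l|mul|...}}. `sectionsOf(title)` stands in for a real section lookup and
// returns a Set of L2 headers, or undefined if the page is unknown.
const RAW_LINK = /\[\[([^\[\]|#:]+)(?:\|([^\[\]|]+))?\]\]/g;

function convertTaxonomicLinks(defLine, sectionsOf) {
  // Only definition lines; skip usage examples (#:) and quotations (#*).
  if (!/^#(?![:*])/.test(defLine)) return defLine;
  return defLine.replace(RAW_LINK, (whole, target, display) => {
    const sections = sectionsOf(target);
    if (!sections || !sections.has("Translingual") || sections.has("English")) {
      return whole; // leave the raw link as it is
    }
    // {{l|mul|term|alt}}: keep any piped display text as the alt form.
    return display ? `{{l|mul|${target}|${display}}}` : `{{l|mul|${target}}}`;
  });
}
```

For instance, with Canis known to have only a Translingual section and dog having both, "# A member of the genus [[Canis]]; a [[dog]]." becomes "# A member of the genus {{l|mul|Canis}}; a [[dog]].".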
@Catonif, Benwing2, DCDuring: Why exactly is {{taxlink}} removed from blue links? I don't understand the reason for that practice. 0DF (talk) 22:46, 19 February 2024 (UTC)Reply
@0DF Can you give me an example of this? Benwing2 (talk) 22:47, 19 February 2024 (UTC)Reply
@Benwing2: Yes: Special:Diff/78116436/78128518. 0DF (talk) 23:24, 19 February 2024 (UTC)Reply
@0DF: {{taxlink}} is primarily for tracking taxonomic names that don't have translingual entries yet; it adds the page to subcategories of Category:Entries using missing taxonomic names. For linking to a Translingual taxonomic name entry, it would be simpler to use {{l|mul}} or {{m|mul}}. Chuck Entz (talk) 23:49, 19 February 2024 (UTC)Reply
@Chuck Entz: I'm not sure it would, given the functionality of {{taxlink}} brought up by DCDuring in User talk:Benwing2#Italicising synonyms for taxonomic names. {{taxlink}} with |nocat=1 seems like the easiest way to go. 0DF (talk) 00:39, 20 February 2024 (UTC)Reply
@0DF I don't know much about the taxonomic templates, but this seems the wrong way to do things; instead there should be a template that implements the fancy italicizing behavior without adding a tracking category. (Maybe there already is such a template; there are several taxonomic templates in CAT:Taxonomic name templates.) Benwing2 (talk) 01:23, 20 February 2024 (UTC)Reply
If I remember correctly {{taxlink}} came first, then it was upgraded to have the proper display. After looking at {{taxlink}}, I recalled that the italicization logic was later moved to Module:italics, which also handles some titles for quotation templates. That said, I don't think it should be added to a general-purpose workhorse like {{m}}. Chuck Entz (talk) 02:22, 20 February 2024 (UTC)Reply
@Chuck Entz It could be done if we had a special langcode for taxonomic links, which imo is justified. Theknightwho (talk) 02:25, 20 February 2024 (UTC)Reply
@Theknightwho AFAIK we already have an etym code for taxonomic links although it may not be used. Benwing2 (talk) 02:44, 20 February 2024 (UTC)Reply
@Benwing2, Chuck Entz, Benwing2: The code mul-tax exists for just such a purpose, although it is currently unused. See User talk:Benwing2#Italicising synonyms for taxonomic names. 0DF (talk) 02:17, 21 February 2024 (UTC)Reply
The real question is, why is a template called {{taxlink}} being used for missing taxonomic names? Surely this template should be called something like {{taxlink-check}}, and then, when an instance is checked and found valid, it can be changed to {{taxlink}}. I'm tempted to go to RFM. This, that and the other (talk) 04:00, 20 February 2024 (UTC)Reply
{{taxlink}} evolved. I intended it to be short because I often type it. Originally, it was to simultaneously provide a way of counting incoming links to missing terms and also hide our lack of content by providing a link to WikiSpecies. (There is a variant that provides a link to WP, and there are provisions for alternative pagenames for links to Species and WP.) When we add a taxonomic entry I remove the template. Sometimes, as in the example given above, I add {{taxlink}} to all items in a list, as of hypernyms or hyponyms, and remove those for which we actually have entries.
The capability of automatically de-italicizing terms appearing within taxonomic names like "subsp.", "var.", "sect." etc. helps a lot for lists of terms (hypernyms, hyponyms). It isn't a big need for uses of taxonomic names otherwise, e.g. in definitions and etymologies, where occurrence is rare, manual wikitext is sufficient, and the "harm" of not having proper italicization is not serious. DCDuring (talk) 13:16, 20 February 2024 (UTC)Reply
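The de-italicizing DCDuring describes amounts to italicizing runs of name words while leaving rank abbreviations upright. A hypothetical sketch (not the code of {{taxlink}} or Module:italics), assuming the whole name is at a rank that is italicized and that the abbreviation list below, which is illustrative and incomplete, covers what should stay upright:

```javascript
// Hypothetical sketch: italicize a taxonomic name in wikitext ('' ... '')
// but leave rank abbreviations such as "subsp." or "sect." upright.
// The abbreviation list is an illustrative, incomplete assumption.
const RANK_ABBREVIATIONS = new Set([
  "subsp.", "ssp.", "var.", "subvar.", "f.", "sect.", "subsect.", "subg.", "ser.",
]);

function italicizeTaxon(name) {
  const words = name.split(/\s+/).filter(Boolean);
  const out = [];
  let run = []; // consecutive words that belong in one italic span
  const flush = () => {
    if (run.length) out.push(`''${run.join(" ")}''`);
    run = [];
  };
  for (const word of words) {
    if (RANK_ABBREVIATIONS.has(word)) {
      flush();        // close the current italic span
      out.push(word); // the abbreviation itself stays upright
    } else {
      run.push(word);
    }
  }
  flush();
  return out.join(" ");
}
```

For example, "Acer saccharum subsp. nigrum" becomes "''Acer saccharum'' subsp. ''nigrum''", which is the kind of output that matters most in long hyponym lists.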
@DCDuring The whole system of removing the template when a page has been created seems like a major waste of time, as we can check for this stuff automatically to determine whether to add the tracking category. It also means that systematising/improving taxonomic links is now a massive job, instead of something we could have done by using existing template calls. Theknightwho (talk) 13:27, 20 February 2024 (UTC)
The machine checking of entry existence seemed like a gross waste of machine time, leading to unnecessarily slow loading of pages with lots of instances of taxlink, or so I was told at the time. Even today, the WM documentation for ParserFunctions says:
ifexist limits
  #ifexist: is considered an "expensive parser function"; only a limited number of which can be included on any one page (including functions inside transcluded templates). When this limit is exceeded, any further #ifexist: functions automatically return false, whether the target page exists or not, and the page is categorized into Category:Pages with too many expensive parser function calls. The name of the tracking category may vary depending on the content language of your wiki.
Were I not on the defensive for daring to have ambitions to include a large number of taxa, and had I the wit, I would have created another template, say, {{taxlinky}}, which retained taxonomic information. At the time, {{taxlink}} did not have any formatting capability and taxonomic entries received no technical support whatsoever, so there was not much point. To this day, taxonomic names are included in the same "language" as CJKV characters, despite there being more than 20K instances of {{taxon}} and despite their sharing damned few commonalities with CJKV characters. ({{taxlink}} appears on more than 46K pages, very many with multiple instances.) In the process of removing {{taxlink}} instances I almost always find missing content and errors in entries, so the 'major waste of time' leads IMHO to quality improvement. DCDuring (talk) 16:12, 20 February 2024 (UTC)
@DCDuring It’s a minor check that’s carried out very easily, can be done without using the shitty parser function you’ve referred to, and does not involve someone spending many hours going around removing templates. I can only conclude you’re being defensive because of all the pointless time you’ve wasted doing it. Theknightwho (talk) 16:32, 20 February 2024 (UTC)
@Theknightwho: This seems unnecessarily combative; DC gave good reasoning and supporting evidence for doing things one way while also expressing an interest in having a second template {{taxlinky}} to preserve the information (presumably so they can manually change {{taxlink}} to {{taxlinky}} after we have an entry for the term), which sounds pretty close to what TTO suggested earlier. If it's possible to modernize the underlying code of {{taxlink}} so that it can automatically link to terms we have and use WikiSpecies for terms we don't have without causing excessive memory or CPU overhead, that seems like an ideal solution. JeffDoozan (talk) 17:06, 20 February 2024 (UTC)
@DCDuring @JeffDoozan I apologise. I get irritated when people bring up these kinds of micro-optimizations as a reason to keep doing things the same way; in this case, even using the parser function, the memory/performance impact would be a few milliseconds at most even with several of them, unless we start adding many hundreds of taxonomic links (at which point it would hit the limit of 500). It is possible to do it another way in Lua which does not have this limit, however. Theknightwho (talk) 17:12, 20 February 2024 (UTC)
@Theknightwho: I found that taxlink is already invoking Lua to handle the complicated italics, so I doubt there would be any performance impact to just convert the whole thing to Lua. I found mw.title.new(page).exists as a way of checking that a page exists, but the linked documentation mentions that it's an "expensive" function. Is there an inexpensive way to check if a page exists in Lua? Going even further, is there an inexpensive way to check that the page exists and that it contains a Translingual L2, maybe doing whatever orangelinks does to inspect the page (I haven't looked)? Thanks! JeffDoozan (talk) 16:57, 24 February 2024 (UTC)
@JeffDoozan Yes - you can use :getContent(), which is not expensive. If the page doesn’t exist it returns nil, so it functions as a workaround. We already use this in several other places. Checking for a Translingual section can be done with get_section in Module:utilities. Theknightwho (talk) 17:00, 24 February 2024 (UTC)
@Theknightwho: Perfect! I misread the documentation: the call to mw.title.new() is apparently only expensive when using id and not when using title, so mw.title.new(PAGENAME):getContent() looks like a safe way of inspecting the contents. Thank you! JeffDoozan (talk) 17:09, 24 February 2024 (UTC)
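The link decision discussed above can be sketched in Python, purely to show the logic. The callable `get_content` is a hypothetical stand-in for mw.title.new(taxon):getContent(), returning None where Lua would return nil for a missing page; the regex is a simplified substitute for get_section in Module:utilities, and the output link formats (including the "species:" interwiki prefix) are assumptions, not what {{taxlink}} actually emits.

```python
import re
from typing import Callable, Optional

# A level-2 "==Translingual==" header on its own line (level 3+ is not matched).
TRANSLINGUAL_L2 = re.compile(r"^==\s*Translingual\s*==\s*$", re.MULTILINE)

def has_translingual(content: Optional[str]) -> bool:
    """True if the page exists (content is not None) and has a Translingual L2."""
    return content is not None and TRANSLINGUAL_L2.search(content) is not None

def taxlink_target(taxon: str, get_content: Callable[[str], Optional[str]]) -> str:
    """Link locally when we have a Translingual entry, otherwise to Wikispecies."""
    if has_translingual(get_content(taxon)):
        return f"[[{taxon}#Translingual|{taxon}]]"
    return f"[[species:{taxon}|{taxon}]]"

# Usage, with a dict standing in for the wiki's page store:
pages = {"Homo sapiens": "==Translingual==\n===Proper noun===\n..."}
print(taxlink_target("Homo sapiens", pages.get))       # prints: [[Homo sapiens#Translingual|Homo sapiens]]
print(taxlink_target("Nonexistent taxon", pages.get))  # prints: [[species:Nonexistent taxon|Nonexistent taxon]]
```

In this sketch, a page that exists but lacks a Translingual section also falls back to Wikispecies; that is one possible policy, and a real implementation might instead add a tracking category there.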
I always had the feeling that taxon links should be left wrapped in some way or other, so that they can later be manipulated in bulk, if ever needed. It is clear that the current state is not ideal. This, that and the other also seemingly underestimates the amount of work put into organism names; if we assumed two templates, then calling one {{taxlink-check}} and the other {{taxlink}} is unacceptable because of the former’s length; rather, it would be {{taxcheck}} and {{taxlink}} or even {{taxl}}. I don’t know what you programmers can do, other than having two templates, but the problems can be outlined well. Fay Freak (talk) 17:14, 20 February 2024 (UTC)
See User talk:Benwing2#Italicising synonyms for taxonomic names for a relevant discussion.

Help wanted, lots of template calls with bad parameters

I built a tool to analyze the supported parameters of ~36,000 templates that don't invoke modules (except "Module:string" and "Module:ugly hacks") and then used that to validate all of the template calls in the main namespace and found nearly 100,000 calls to templates using unhandled parameters. There are a lot of errors in here that can be cleaned up by bot: typos, misnamed parameters, etc, and I made a cleanup config page where you can specify a template name, bad param name, good param name to have the bot rename or remove a parameter (this isn't automatic, I'll be verifying anything added here and running the bot manually). Additionally, there are probably some places where it might be worthwhile to modify the templates to support the parameters users are trying to use. Finally, there are places where parameters were once required by templates but are no longer used and can just be deleted. Please take a look at the list (warning: it's big and not mobile friendly), add any cleanups you know are safe to the config page, and share any thoughts you have on how else we can clean this up and what can be done to prevent similar errors. JeffDoozan (talk) 17:05, 18 February 2024 (UTC)

@JeffDoozan Thank you, this is going to be very helpful. I am looking at the Hungarian entries and I'm not sure why some of them are listed. For example, {{hu-decl-ek}} has a valid parameter ül, listed in the documentation and present in the code, but it is marked as an error in jel as follows: {{hu-decl-ek|je|l|et|acc2=t|ül=y}} bad param 'ül' on jel. My next question is: if I correct the issues manually, can I delete them from your list? Or are you planning to regenerate the list from time to time? Panda10 (talk) 17:45, 18 February 2024 (UTC)
@Panda10: I'll regenerate the list with new data automatically every few weeks. |ül= is mistakenly flagged as invalid because it contains "ü", which I had not included as a valid character for parameter names. I'll fix that bug and regenerate the list. JeffDoozan (talk) 17:51, 18 February 2024 (UTC)
@JeffDoozan Please include ő as well. There is a parameter |no-vő= in another template. Panda10 (talk) 18:01, 18 February 2024 (UTC)
The filtering by valid characters was a mistake that excluded a lot of valid parameter names. I've removed it entirely. The list should be updated with the improved parameter detection in the next 10 minutes. JeffDoozan (talk) 18:06, 18 February 2024 (UTC)

Setting Classical Latin transcriptions to phonemic only

It is entirely unnecessary for a dictionary, of all things, to attempt narrow transcriptions of millennia-old pronunciations. It is even more unnecessary to insert all sorts of silly hot-takes like:

  • Complete absence of [s] in favour of "[s̱]"
  • ⟨z⟩ as a word-initial [d̪͡z̪] and intervocalic [z̪d̪͡z̪]
  • Complete absence of [j] and [w], even word-initially
  • Syllabification claimed to be phonemic (to be fair, not the only language with this issue)
  • Short vowels before /-m/ as half-long (ludicrous levels of claimed precision) but not raised (??): [ɪ̃ˑ ɛ̃ˑ ɔ̃ˑ ʊ̃ˑ]

Mind, all of this is simply presented as a matter of fact - the outputted transcriptions come with no disclaimer like 'phonetic details uncertain' or 'take with a grain spoonful mug of salt'.

I hope it's not too much of me to say that a word like divisibilitatem should be rendered as simply /diːwiːsibiliˈtaːtem/ and that the current output [d̪iːu̯iːs̠ɪbɪlʲɪˈt̪äːt̪ɛ̃ˑ] is best suited for a conlanging forum, not a lexicographical project with any pretense of professionalism. Nicodene (talk) 12:41, 19 February 2024 (UTC)

Yes. Fay Freak (talk) 15:17, 19 February 2024 (UTC)
Great I'm glad you agree. Nicodene (talk) 15:45, 19 February 2024 (UTC)
  Support - put it in the bin. Theknightwho (talk) 20:46, 19 February 2024 (UTC)
  Support. It feels like a conlang project. — Fenakhay (حيطي · مساهماتي) 20:51, 19 February 2024 (UTC)
  Support. There's always a strong temptation to treat linguistic reconstruction as a magical portal into the past, when it's really just an educated guess based on incomplete evidence. We do have some contemporary evidence, and books have been written on the subject, but it's still just a guess. How would the pronunciation of a common foot soldier be different from that of a centurion, a member of a certain gens, or even an emperor? What about when giving an oration as opposed to when buying something in a market? Any language with that many speakers over that wide an expanse of time and space is going to have lots of variation with historical era, geography, and any number of social variables. Of course, most of those variations wouldn't make it into writing that would be preserved into modern times, but it's still like the parable of the blind men and the elephant to some extent. Chuck Entz (talk) 21:38, 19 February 2024 (UTC)
@Nicodene   Support although it might be worth rendering final -m (as well as vowel + -ns-) as a long nasal vowel since that seems to have been universal. Benwing2 (talk) 23:04, 19 February 2024 (UTC)
BTW there used to be a {{cu-IPA}} that generated even more absurd pronunciations of Old Church Slavonic terms; it was nuked with extreme prejudice. Benwing2 (talk) 23:07, 19 February 2024 (UTC)
  Support. We don't need every phonetic process that happens and we should really nuke other phonetic transcriptions when possible. Vininn126 (talk) 08:17, 20 February 2024 (UTC)
@Vininn126 Not sure I agree with the second part. It depends at what level the phonetic transcription is. Sometimes on the contrary we need to nuke the phonemic version, if it's too abstract and misleading. Benwing2 (talk) 08:19, 20 February 2024 (UTC)
I suppose with certain transcriptions, sure. My point is more that we have far too many phonetic transcriptions where phonemic ones would be better. Vininn126 (talk) 08:30, 20 February 2024 (UTC)
Again I believe broad phonetic transcriptions are usually the best. For example, a purely phonemic transcription of Catalan would render all voiced fricatives (which are actually pronounced more like approximants, as in Spanish) as stops; but this would be quite misleading for the language learner. What I think you're complaining about is narrow phonetic transcriptions, which I agree are usually unnecessary. Benwing2 (talk) 08:40, 20 February 2024 (UTC)
I don't have the same objection for modern languages since there the phonetic details are a matter of fact. I've not really found any that seem unnecessarily detailed; a transcription like [ɡɐˈrɨ] for instance is far more useful to a learner than a phonemic /ɡoˈri/.
If what you're getting at though is that having phonemic and phonetic transcriptions side-by-side is unnecessary, that I can agree with. If we're already going to show [kaˈβ̞a.ʝo], why have a preceding /kaˈbaʝo/? Nicodene (talk) 19:34, 20 February 2024 (UTC)
I don't agree with that, and would like to see a three-tier system: a broad phonetic transcription, and for people interested in more, a narrow phonetic transcription + a phonemic one, which would be hidden by default. PUC – 20:23, 20 February 2024 (UTC)
We don't actually disagree - I was referring to the default state of an entry. Putting additional information in a drop-down menu sounds like a good way of going about it. Nicodene (talk) 21:19, 20 February 2024 (UTC)
This could be a good system. Vininn126 (talk) 16:26, 21 February 2024 (UTC)
Phonemic transcriptions are useful for users to be able to infer idiolectal pronunciations of words: You can't show how every single speaker talks, but by giving a phonemic transcriptions a reader can apply a set of rules to determine it. Thadh (talk) 10:57, 22 February 2024 (UTC)
  Support. MuDavid 栘𩿠 (talk) 00:58, 21 February 2024 (UTC)
Support, but Benwing makes a good point about final nasals; I was going to make the same comment myself. This, that and the other (talk) 08:39, 21 February 2024 (UTC)
A phonemic representation of ⟨-Vm⟩ as /Ṽː/ isn't given by any scholar that I'm aware of, and it would be contradicted by the fact that a consonant is evidenced by the various spellings of the tan durum type (= tam + durum), which show final [m] assimilated to [n] in contact with the initial consonant of the following word.
That leaves the matter of length. Does any source give phonemic representations like */dekeːm/ for decem? A blanket implementation would run into mistakes like */reːm kʷeːm/ for rem, quem - contradicted by the Romance outcomes like French rien and Spanish quién. Nicodene (talk) 16:21, 21 February 2024 (UTC)
@Nicodene AFAIK final -m was pronounced as homorganic to the next consonant in cases like 'tam durum' as you mention, but as a nasal vowel when a vowel or nothing followed, as shown by elision in poetry. I thought that was universally accepted. Maybe this isn't phonemic but we run into the standard issue with phonemic representations found in so many languages, which is that there are typically multiple representations that make sense. Benwing2 (talk) 21:00, 21 February 2024 (UTC)
@Benwing2 It's not that I question that this occurred in general (though I suspect monosyllabic words were resistant), it's rather that I can't see it being phonemic. If we try to make tam = /tãː/ work, then in light of tan durum we have to tack on an allophonic rule like "nasal vowels in contact with a following plosive forwards-eject a homorganic nasal consonant", which is more or less the reverse of what all sources describe (final /-m/, retained as a consonant in that context, allophonically deleted in others). I don't know of anyone who gives Classical Latin as a language with phonemic nasal vowels. Nicodene (talk) 21:46, 21 February 2024 (UTC)
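The final -m rule Benwing2 describes can be written out as a small Python function, which may help pin down exactly what a generator would have to decide. This is a simplified sketch under stated assumptions, not a proposal for the actual Latin pronunciation module: it takes lowercase spellings, the consonant-to-nasal mapping covers stops only (other consonants keep [m]), and it applies nasal lengthening uniformly even though Nicodene notes that monosyllables may have resisted and that the phonemic status is disputed.

```python
from typing import Optional

VOWELS = set("aeiouy")
# Simplified, hypothetical place-of-articulation mapping for stops only.
HOMORGANIC = {"p": "m", "b": "m", "t": "n", "d": "n",
              "c": "ŋ", "k": "ŋ", "g": "ŋ", "q": "ŋ"}

def realize_final_m(word: str, next_word: Optional[str]) -> str:
    """Apply the final -m rule from the thread to a lowercase spelling."""
    if not word.endswith("m"):
        return word
    stem = word[:-1]
    following = next_word[0] if next_word else None
    if following is not None and following not in VOWELS and following != "h":
        # Before a consonant: homorganic nasal (tam durum -> tan durum).
        return stem + HOMORGANIC.get(following, "m")
    if stem and stem[-1] in VOWELS:
        # Before a vowel, h-, or a pause: long nasal vowel (tam -> tãː).
        return stem + "\u0303\u02d0"  # combining tilde + IPA length mark
    return word

print(realize_final_m("tam", "durum"))  # prints: tan
```

Treating h- like a vowel follows the poetic elision pattern mentioned above; whether the output belongs in a phonemic or a phonetic line is exactly the open question in this thread.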
  Mild oppose: Keep a phonetic transcription with some generally agreed-on allophones, like the nasal vowels and nasal assimilation. Though I guess what's generally agreed-on is a can of worms. Maybe there's a lot of disagreement these days. I'm not really up-to-date with the evidence on retracted or apical s in Latin (aside from it being in Old Spanish, Old French, Old Galician-Portuguese) and open-mid long vowels and such. — Eru·tuon 02:12, 22 February 2024 (UTC)
To the best of my knowledge, Latin /s/ as anything but [s] is a minority view in the scholarship; total exclusion of [s], either rare or non-existent. Low-mid /ē ō/ is Calabrese's pet theory, fringe amongst scholars but enjoying a cult following on the internet (thanks to a youtuber popularising it).
Above all, though, if you see some validity in either of those then a phonemic representation is to your advantage, since it accommodates them.
Maybe it's just me, but coda nasal assimilation is so common cross-linguistically that I take it as a given, upon encountering a new dialect/language, unless shown otherwise. I think once I found one where coda nasals were all automatically [ŋ]? Nicodene (talk) 09:35, 22 February 2024 (UTC)
@Nicodene I am with Erutuon here that we should show non-obvious phonetic information whenever possible when it's well-accepted, including in dead languages. We are aiming for language learners and I think it's doubtful it would be obvious that coda nasal assimilation would happen (since it tends not to happen in English), and there's no possible way they could figure out that a written /m/ is actually a nasal vowel before a pause or another vowel. I believe this strongly about living languages, and logically this extends to dead languages where the scholarship is sufficiently clear. All sorts of weird things happen in phonemic representations. E.g. one well-respected editor around here argued seriously that the consensus of modern scholarship is that Spanish fui pronounced [fwi] and muy pronounced [muj] have phonemic representations /fui/ and /mui/ respectively, which to me is completely bizarre because it requires a lexically-sensitive rule to determine whether to pronounce a given /ui/ as [wi] or [uj]. I'm coming more and more to the belief that we should ditch completely any pretense of generating purely phonemic pronunciations and present whatever will be most useful to the language learner. For Latin this might mean deviating from a pure phonemic representation only to the extent of indicating the actual pronunciation of -m (and maybe -ns-, if the -n- as nasal vowel is well-accepted). Or it could mean the same but also showing /l/ as [l] before <i> and <l> and [ɫ] elsewhere; again AFAIK some variant of this view is fairly universally accepted. Benwing2 (talk) 10:11, 22 February 2024 (UTC)
Believe it or not, even that is not straightforward, as there isn't a consensus that non-velarised /l/ was in fact [l] as opposed to, say, a somewhat palatalised [lʲ]. (-ns- on the other hand isn't controversial at all, I should say.)
Dead languages are very much a different beast altogether. It's trivial to find an academic source giving phonetic transcriptions for Spanish or Catalan on-par with the ones we have, some even more detailed. It is on the other hand impossible to find (and believe me I've tried) even one serious academic source which provides Latin transcriptions with anything like the level of allophonic detail you are describing. Not Allen, not Leumann, not Sen, not Cser... I'll take an actual transcription from the latter for illustration: [ĩːferus]. It does mark the nasalisation we've come to love and cherish, yet pretty much nothing else at all. And you won't find a more elaborate transcription than that in the entire study (link). If a modern work of 200+ pages dedicated to Latin phonology and morphology doesn't embark on these adventures, what are we doing here exactly?
And supposing we go broad to the extent of Cser: now the problem is that we're going to mislead anyone who knows basic IPA but isn't a specialist in Latin. If I saw, for a language I don't know, a phonetic transcription like [ĩːferus] - with that level of detail put into the first phone - I'd think the rest of the transcription is similarly accurate - yet, I know for a fact the rest is simply left in phonemic form. Meanwhile Sen, whenever he can, gives Classical Latin in phonemic transcriptions (as I am also suggesting we do). These are cutting-edge works discussing fine details of Latin phonology - by no means can either author be accused of, say, incompetence or laziness in that regard. It's simply that nobody does such a thing with a dead language, in any sort of academic context. Whimsically whilst daydreaming, maybe.
ETA: Why not just have the label 'Classical' link to the Wiki page that describes what scholars do or do not agree on, phonetics-wise? Nicodene (talk) 12:55, 22 February 2024 (UTC)

Replacement of the trill [r] with non-trill [ɹ]

There seem to be cases in general American English where the voiced alveolar approximant /ɹ/ is transcribed as the trill /r/. For example, 'brand' is transcribed as /brænd/ instead of /bɹænd/. If this sounds right, our team at CUNY will start to make corrections. Cpeng2 (talk) 22:12, 20 February 2024 (UTC)

@Cpeng2: that seems uncontroversial to me. — Sgconlaw (talk) 22:32, 20 February 2024 (UTC)
Thanks, @Sgconlaw! Other team members (@Yaejunmyung and @Your future self) and I will start the cleanup soon. Cpeng2 (talk) 00:28, 21 February 2024 (UTC)
@Cpeng2 Not sure if you intend to cover it, but this also affects RP English. Theknightwho (talk) 00:34, 21 February 2024 (UTC)
@Sgconlaw: Except that the correct symbol for broad phonemic transcription is /r/ when there's only one rhotic. --RichardW57 (talk) 13:48, 21 February 2024 (UTC)
@RichardW57: in English? Can't say I'm very familiar with this. I'm just following what is specified in "Appendix:English pronunciation". Again, if it is thought there needs to be a change to the table, then it should be discussed on this page so that consensus can be reached. — Sgconlaw (talk) 13:54, 21 February 2024 (UTC)
@Sgconlaw It’s a long-established practice to use /ɹ/, and I’m not fully convinced that /r/ is correct even in broad transcription, even if it is used by some publications. Theknightwho (talk) 14:01, 21 February 2024 (UTC)
I'm not a big fan of /ɹ/ because I think using a special IPA letter rather than /r/ makes it less obvious that the transcription is broad. (A plain alveolar approximant occurs as a lenited allophone of /r/ in languages like Italian and Greek, but sounds pretty different from the English "r" sound. The definition of "ɹ" is actually pretty vague, so it isn't improper to use it for either sound, but I think using it for English r can potentially give an air of false precision.) Neither /r/ nor /ɹ/ is "incorrect" in a holistic sense though. Notation for English R is discussed in "The Articulatory Phonetics of /r/ for Residual Speech Errors", Suzanne E. Boyce (Semin Speech Lang. 2015 Nov;36(4):257-70). Boyce writes: "Linguists agree that the rhotic liquid of English is a single phoneme and that certain articulatory movements must occur for a typical acoustic profile and an acceptable percept to occur. In the International Phonetic Alphabet’s notation, this sound is represented by /ɹ/, which specifies that the sound is an approximant with a primary constriction at a point along the palate that may range from alveolar to postalveolar to palato-velar. The American phonetic tradition, which is followed by most clinicians, is to use the Roman alphabet symbol /r/" (page 258) and "although many textbooks refer to /r/ as having an alveolar place of articulation, it is more accurate to say that it has a relatively undefined “palatal” or “postalveolar” primary place of articulation. As noted previously, this is in fact the current stance of the international phonetic association for the IPA symbol /ɹ/" (pages 261-262). In the 1999 Handbook of the IPA, Peter Ladefoged's chapter on "American English" presents "'tɹævəlɚ" as a "broad phonemic transcription" of traveler.--Urszag (talk) 14:12, 22 February 2024 (UTC)
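If the cleanup goes ahead under the current Appendix:English pronunciation convention (/ɹ/), the core substitution could look like the Python sketch below. It assumes English transcriptions sit in {{IPA|en|...}} calls with no nested templates inside, and it only touches slash-delimited phonemic transcriptions, leaving bracketed phonetic ones and named parameters alone. As the thread shows, /r/ also has defenders, and transcriptions for varieties with a genuine trill would need to be skipped, so every diff should be reviewed before saving.

```python
import re

# {{IPA|en|...}} calls without nested templates inside them.
IPA_EN_CALL = re.compile(r"\{\{IPA\|en\|[^{}]*\}\}")
# A phonemic transcription between slashes, within a single template argument.
PHONEMIC = re.compile(r"/[^/|]+/")

def rhotic_to_turned_r(wikitext: str) -> str:
    """Replace plain 'r' with 'ɹ' in phonemic English transcriptions only."""
    def fix_call(call: re.Match) -> str:
        return PHONEMIC.sub(lambda m: m.group(0).replace("r", "ɹ"), call.group(0))
    return IPA_EN_CALL.sub(fix_call, wikitext)

print(rhotic_to_turned_r("* {{a|GA}} {{IPA|en|/brænd/}}"))
# prints: * {{a|GA}} {{IPA|en|/bɹænd/}}
```

Restricting the match to {{IPA|en|...}} keeps other languages' /r/ untouched, which matters because a page can hold many language sections.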

Emoji pronunciations in English

I recently noticed that we now have an entry (complete with pronunciation) for one emoji: 🧢. I'm curious whether others think this is a good precedent. Kylebgorman (talk) 13:07, 21 February 2024 (UTC)

The entry is pretty explicit that it is for the use of the emoji as a rebus for the slang word cap. To my mind, this isn't much different from including other types of alternative spellings such as ur, i18n, and so on. As with those, it seems reasonable to include most forms of detailed information at the main entry to avoid duplication, although including a short definition may be convenient, and I guess it can be argued that including the pronunciation may be useful in cases where it isn't obvious from the spelling itself.--Urszag (talk) 13:50, 21 February 2024 (UTC)
@Urszag I don't feel that strongly about this, but I think "conventionalized English names for emojis" is rather different from conventionalized abbreviations. For one, it's not clear whether all or even most emoji have a conventional name. Kylebgorman (talk) 19:30, 21 February 2024 (UTC)
The citations don't attest conventionalized English names for the emoji 🧢. Rather, they show use of 🧢 as a written representation of the word "cap".--Urszag (talk) 19:47, 21 February 2024 (UTC)

Theknightwho changes to Dravidian tree

User:Theknightwho took it upon himself to rename and restructure the Dravidian family tree. A South Dravidian superfamily does not have unanimous support, and is in fact not supported in the Kolipakam (2018) computational models. Please undo these changes. @Benwing2, -sche, Chuck Entz, Mahagaja, Mnemosientje --{{victar|talk}} 18:09, 21 February 2024 (UTC)

Apparently Victar missed the edit summary, so I shall repeat it here:
  • Changing Dravidian to the three branch model (North, Central, South), with (former) South and South-Central changed to South I and South II, in a new superfamily called South. Although the consensus at User talk:AleksiB 1945#Proto-Dravidian entries wasn't universal, it was generally in favour and it's also the model all our entries currently follow anyway.
The only person who opposed was you, and you self-admittedly are "really only interested in Dravidian terms borrowed into Sanskrit", and given your endless obstructionism/bullshitting in the face of anything you disagree with and the fact that User:AleksiB 1945 has been waiting for months for these changes, I decided to bite the bullet and make the change. So no, I will not "undo these changes".
What Victar has also missed is that a decision on this issue had to be made at some point, because we couldn't create proto-languages for the major Dravidian subfamilies until we made a decision as to what "Proto-South Dravidian" referred to; this has also been pending for a while. Given that our entries de facto follow the model I changed it to, that seemed by far the most sensible choice, as it's no use having the data say one thing but all our entries say another. Theknightwho (talk) 18:17, 21 February 2024 (UTC)
Also I should note that I was the one who added the previous Dravidian model in the first place - it's not like it had some kind of longstanding consensus behind it. Theknightwho (talk) 18:28, 21 February 2024 (UTC)
Again, this is another example of you making executive decisions without starting a proposal discussion first. If you really feel your change is warranted, undo it and start a proposal. But you won't, just as you didn't revert the last things you were told to revert, because rules and practices clearly don't apply to you. --{{victar|talk}} 20:39, 21 February 2024 (UTC)
@Victar As I have already explained, this change was made in order to close a thread relating to proto-languages for Dravidian subfamilies which has been open for a long while now. You failed to comment on that thread. In making that decision, I consulted past threads in which you were a minority dissenter, and also noted the de facto state of our entries. This has already been debated at length, as you well know, so me opening yet another proposal would serve absolutely no purpose.
This has already been explained to you, and no rules or practices have been broken in coming to that decision. Theknightwho (talk) 20:51, 21 February 2024 (UTC)

@-sche, I could use your experience in language families on this. We've been only reconstructing Proto-Dravidian and not any subfamily branches for the following reasons:

  1. Some of these families are not agreed upon and may simply be areal, such as North Dravidian, with some scholars believing Brahui to be its own genetic branch, and the superfamily South Dravidian being highly disputed. Please see Dravidian_languages#Classification.
  2. The differences between proto reconstructions of the families, in many cases, would be none, creating redundant reconstructions.
  3. In part for the reasons above, most sources only reconstruct Proto-Dravidian in their etymologies.

To sort reconstructions that are only found in a single branch into categories, we started using labels, like is done with Proto-West-Germanic and many other languages. This seems the safer and more collated route. Pinging @Pulimaiyi, Kutchkutch. --{{victar|talk}} 21:26, 23 February 2024 (UTC)

@Victar That isn’t true, and misrepresents what @AleksiB 1945 has told you, and indeed what can plainly be seen in our reconstruction entries. You've just been reverting anyone who tried to make entries for them.
The South Dravidian branch is not “highly disputed” - I can find absolutely no evidence of that. I can see that there are different views as to whether South or South-Central should be grouped together, but recent scholarship has (strongly) trended towards doing so. Please stop misrepresenting things.
Finally, if you disagreed with adding additional proto-languages, you had ample time to object in the thread which has been open for two months, but you did not do so. It’s bizarre that suddenly you find a pressing need to claim it’s wrong now, and in this way; I wonder what the reason could possibly be.
Being frank, your conduct in this thread seems like an attempt to circumvent the consensus of discussions you don’t like through trying to impose a new one by canvassing editors you believe are going to be predisposed to your views, and by crying foul play by misrepresenting the circumstances in which the change was made. That’s made very obvious by the fact you haven’t pinged by far the largest contributor in Dravidian languages, which is very obviously because you know he won’t agree with you. It’s unacceptable. Theknightwho (talk) 22:49, 23 February 2024 (UTC)
I also note you didn’t ping @Illustrious Lock, but you did seemingly find the time to move the Proto-South Dravidian I entries they made to Proto-Dravidian. How strange. Anyone would think you’re not interested in genuine consensus. Theknightwho (talk) 23:25, 23 February 2024 (UTC)

@Theknightwho: Although the details may need to be discussed further, I agree with the administrative changes proposed by the Dravidian editors. However, I am unable to make the administrative changes myself, since I am an involved administrator in the dispute. The labels at Module:labels/data/lang/dra-pro were intended to be a temporary measure until a more sustainable conclusion is reached. Kutchkutch (talk) 03:20, 24 February 2024 (UTC)

@Kutchkutch Thank you - yes, that was my impression as well, and Victar’s argument here makes little sense to me. Theknightwho (talk) 12:43, 24 February 2024 (UTC)
For some further context, the discussion which @Victar did not participate in was explicitly pointed out to him in User:Victar#Proto-Dravidian labels at MOD:labels/data/lang/dra-pro, so it's not as though he wasn't aware. He much prefers forcing through what he wants at the expense of the other Dravidian editors. Theknightwho (talk) 20:27, 24 February 2024 (UTC)

Writing a bot for surnames and relevant statistics.

See this diff for an example of the information being added. It's a "Statistics" heading for proper-noun surnames, with information like "According to the 2010 United States Census, Gullage is the 55,121st most common surname in the United States, belonging to 373 individuals. Gullage is most common among White (61.7%) and Black (34.0%) individuals."

From what I can tell, it's been done in an ad-hoc manner, but it's on over thirty thousand pages.

The data is from the 2010 Census surnames file (there's no 2020 file available), which contains 162,253 names: all names with more than 100 people having them.

I'd like to write a bot which will, for each of these names, look for an English proper noun section, add # {{surname|en}}. if not present, and add the "Statistics" section if not present with the relevant data, possibly with a citation footnote like:

“Frequently Occurring Surnames from the 2010 Census”, in 2010 US Census, US Census Bureau, 2019-02-14, retrieved 2024-02-22
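The per-page checks described above (is there an English proper-noun section, does it already contain {{surname|en}}, is there already a Statistics heading) could look roughly like the following sketch. It works on raw wikitext with regular expressions, assumes standard ==English== and ===Proper noun=== headers, and uses hypothetical function names. It only reports what is missing; it does not try to decide where a sense belongs on pages with several etymology sections, which probably needs a human.

```javascript
// Return the body of the ==English== section (up to the next level-2 header), or null.
function englishSection(wikitext) {
  const header = /^==[ \t]*English[ \t]*==[ \t]*$/m.exec(wikitext);
  if (!header) return null;
  const rest = wikitext.slice(header.index + header[0].length);
  // A level-2 header has exactly two "=" on each side.
  const next = /^==[^=].*[^=]==[ \t]*$/m.exec(rest);
  return next ? rest.slice(0, next.index) : rest;
}

// Report which of the bot's additions a page would still need.
function surnameStatus(wikitext) {
  const en = englishSection(wikitext);
  if (en === null) {
    return { hasEnglish: false, hasProperNoun: false, hasSurname: false, hasStatistics: false };
  }
  return {
    hasEnglish: true,
    hasProperNoun: /^={3,}[ \t]*Proper noun[ \t]*={3,}[ \t]*$/m.test(en),
    // Matches {{surname|en}} as well as {{surname|en|...}}.
    hasSurname: /\{\{surname\|en[|}]/.test(en),
    hasStatistics: /^={3,}[ \t]*Statistics[ \t]*={3,}[ \t]*$/m.test(en),
  };
}

console.log(surnameStatus('==English==\n\n===Proper noun===\n{{en-proper noun}}\n\n# {{surname|en}}.\n'));
```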

I also found this Nature article about names which links to a large dataset, but it doesn't include frequency information and is just a very large list.

Is this data too encyclopedic, viz., WT:NOT? I see that it's in a lot of places, and it might be worthwhile to automate it, as well as to import a lot of names. Thoughts? This would be my first attempt at a bot, though I have plenty of programming experience. grendel|khan 16:51, 22 February 2024 (UTC)

@Grendelkhan I would like it, personally. More information like this seems good. The only thing is to be careful with automatically adding the definition, because the same spelling could have a form that's not a name, and then you wouldn't know where to put it. But I think it's a good idea. Whilst you're at it, could you also convert any usages you do find to use {{surnames-us-census}}? We had a discussion on that topic a few months ago, and I wanted to do it, but I've forgotten and it's fallen by the wayside. If you're interested, I would definitely support it :) Kiril kovachev (talkcontribs) 22:54, 22 February 2024 (UTC)
I'm not a huge fan of these, because it seems like trivia, and not particularly interesting trivia at that (beyond the most common names). Plus, it means that entries for surnames which are really rare in the US but common elsewhere end up being dominated by information about the US. Yeah, we could add census info for other countries as well, but that soon starts getting out of hand, as it's beyond the scope of a dictionary. Theknightwho (talk) 23:16, 22 February 2024 (UTC)
This is true, maybe we should limit it to a certain range so that we don't have names like the 10,000th most common name or whatever. I believe we previously removed a lot of those which had literally only one bearer, so we should also be careful not to add them back in. Kiril kovachev (talkcontribs) 21:28, 23 February 2024 (UTC)
I'm sure we had a whole discussion about the inclusion of this surname information elsewhere, which probably ended up without a consensus. — Sgconlaw (talk) 18:06, 25 February 2024 (UTC)
@Sgconlaw We did (at RFD), and it did. Theknightwho (talk) 13:35, 27 February 2024 (UTC)
@Theknightwho: right, so …? — Sgconlaw (talk) 15:30, 27 February 2024 (UTC)
Idea: could this data be hosted at Wikidata and dynamically read from there instead? —Fish bowl (talk) 23:51, 22 February 2024 (UTC)

Old Dutch lowercase toponyms

There are a bunch of Old Dutch toponyms entered using an initial lowercase letter and classified as nouns rather than proper nouns, e.g. ganipi = Gennep and budilio = Budel. These entries seem to have been created by User:Rua. This seems very strange to me, and the references given for these terms don't appear to support the usage of lowercase here. Any objections to renaming these with an initial capital letter and reclassifying as proper nouns? Benwing2 (talk) 22:02, 22 February 2024 (UTC)

Seems perfectly sound to me. Without a doubt they should at least be proper nouns, but since the references seem to cite the capitalized version, they should just be capitalized too in my opinion. Kiril kovachev (talkcontribs) 21:26, 23 February 2024 (UTC)
If the refs don't support it, spell it like they do. CitationsFreak (talk) 23:29, 23 February 2024 (UTC)
Is a capital letter used in the original contemporary source? —Rua (mew) 11:36, 27 February 2024 (UTC)
@Rua AFAICT it is, see the citations in [7] for example. But keep in mind that we capitalize Classical Latin proper names and toponyms despite there being no capitalization in the source (because the source didn't have any lowercase letters). Benwing2 (talk) 20:07, 27 February 2024 (UTC)
Why though? —Rua (mew) 17:59, 29 February 2024 (UTC)
Why do we capitalize Latin proper names? This is getting far afield of the issue but it's conventional to normalize ancient-language text, e.g. we separate U and V in Latin even though the source didn't do that, we add diacritics consistently to Ancient Greek text even though the source didn't always do that, etc. I don't know why I'm even explaining this to you; you already know it. Benwing2 (talk) 00:02, 1 March 2024 (UTC)

Wiktionary: a valuable tool in language preservation

https://diff.wikimedia.org/2024/02/23/wiktionary-a-valuable-tool-in-language-preservation/—Justin (koavf)TCM 14:52, 24 February 2024 (UTC)

New Spanish-Language Dictionary Template

Hello guys. I just made my first edit in en.Wiktionary and I'd like to share it with you. It's a new template for a Spanish-language dictionary. It's the Diccionario del Español de México (Dictionary of the Spanish of Mexico).

Template:R:es:DEM

I recommend using it instead of, or along with, Template:R:es:DRAE for words related to Mexico and Mexican culture, as its entries are usually deeper or more accurate than the DRAE's.

“ocote”, in Diccionario del español de México, Segunda edición, Academia Mexicana de la Lengua, 2019

JaimeDes (talk) 15:41, 24 February 2024 (UTC)

Many thanks, but why does it link to an article at en.wp that doesn't exist? Were you trying to link to es.wp? —Justin (koavf)TCM 17:48, 24 February 2024 (UTC)
I didn't notice that; I just modified DRAE's template. In that case, I have homework for this weekend: I'll create the article. Thanks for mentioning it! JaimeDes (talk) 22:14, 24 February 2024 (UTC)
It's always good to have more references. I think this could be added to entries where it includes senses the DRAE lacks, such as campechana, caguama, and chocolate, or where it has an entry the DRAE does not, such as chípil. I'm not convinced the senses are that much better on ocote that it should *replace* the DRAE. Similarly, popote, huipil, and ejote don't seem much better in DEM vs DRAE. JeffDoozan (talk) 19:28, 24 February 2024 (UTC)
I do agree that it was a bad example; it seems I picked the worst possible example without noticing. But I think you got the right idea about the DEM. JaimeDes (talk) 22:12, 24 February 2024 (UTC)

Two proposals concerning entries’ Tables of Contents (TOCs)

For those using the default Vector legacy (2010) skin (e.g., all unregistered users of this site), TOCs take up a lot of initial vertical space in entries with many headers. Two proposals to mitigate that follow. They derive from discussion in the section #Use of T:lang, above.

Proposal 1: Limit TOCs to displaying only level-2 and -3 sections by default

The template {{TOC limit}}, when transcluded on a page, limits which sections are displayed in the TOC for that page. The template defaults the limitation to level-2 and -3 sections (i.e., those whose headers are generated by bookended double [==] and triple [===] equals signs), although that can be varied by setting |1= or |limit= to any number other than 3. See bar for an example of a page that transcludes {{TOC limit|2}}, thereby limiting the page's TOC to displaying level-2 sections (i.e., language sections) only.
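To preview what a site-wide default would look like before anything is changed, the following browser-console sketch injects a stylesheet in the spirit of the CSS behind {{TOC limit}}. It assumes the Vector legacy TOC markup, in which MediaWiki numbers TOC levels relative to the shallowest header on the page, so ==Language== entries carry the class toclevel-1, === entries toclevel-2, and so on. The function name tocLimitCss is hypothetical, and the exact CSS this site uses may differ.

```javascript
// Return a CSS rule that hides TOC entries below header level `limit`
// (2: language sections only; 3: language sections plus level-3 sections).
// Assumes the shallowest header on the page is == (TOC class toclevel-1).
function tocLimitCss(limit) {
  if (!Number.isInteger(limit) || limit < 2) {
    throw new RangeError('limit must be an integer of at least 2');
  }
  // Hiding the nested lists under TOC depth (limit - 1) leaves headers
  // from == down to `limit` equals signs visible.
  return `.toc .toclevel-${limit - 1} ul { display: none; }`;
}

// In a browser, apply the default proposed here (levels 2 and 3).
if (typeof document !== 'undefined') {
  const style = document.createElement('style');
  style.textContent = tocLimitCss(3);
  document.head.appendChild(style);
}
```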

I propose that {{TOC limit}}'s default limitation should be made the default limitation to all TOCs (instituted at a fundamental level, not requiring the use of {{TOC limit}} on every page).

Rationale: The way TOCs are generated seems to have been designed with Wikipedia in mind. Wikipedia articles have few section headers relative to Wiktionary pages. Wikipedia TOCs are well suited to navigating their articles, whereas Wiktionary TOCs that include links to level-4, -5, and deeper sections are simply unwieldy. I propose the limitation to level-2 and -3 headers by analogy with print dictionaries' headwords, which usually give a term's part of speech (usually abbreviated, e.g. to n., sb., a., v., adv., vel sim.) and, in cases of homography, a numeral to differentiate terms with different etymologies. This limitation would result in the display of language sections, etymology sections, and (other than in cases of homography) pronunciation and part-of-speech sections, as well as some other sections (Alternative forms, Anagrams, etc.), but the hiding of other sections. (FWIW, I would happily see the limitation set to level-2 sections [language sections] only, à la bar, but I thought this proposal would be a more moderate change.)

This change would also improve the usability of TOCs for users of the Vector (2022) skin. 0DF (talk) 02:18, 25 February 2024 (UTC)

Discussion

Personally, I oppose this, as it makes the TOCs useless (people like me can't use them to navigate to the specific sections they're interested in any more); I even dislike and am inconvenienced that we suppress lower-level headings on extremely long pages like a, although I tolerate it because I recognize that other people dislike extremely long TOCs and don't want to have to click the "hide" button to collapse them. - -sche (discuss) 21:10, 25 February 2024 (UTC)

@0DF: I want to see L4 headers (such as 'Declension'), even when they're at Level 5. --RichardW57 (talk) 22:19, 25 February 2024 (UTC)

A different Proposal 1b: for pages with an excessive number of L2 language sections, it is impossible to view the L3 sections (examples: A, te). My proposal is User:Sarri.greek/notes#TOC_hor_limit2, generated by something like Module:User:Sarri.greek/toc2-hor through a template like Template:User:Sarri.greek/toc2-hor, with appropriate CSS at Template:User:Sarri.greek/toc2-hor/style.css to make it horizontal, but I do not know how to build it. I have been begging programmers to fix it. For other horizontal layouts, see the ToCs of the Chinese Wiktionary, e.g. A; the Vietnamese Wiktionary also has horizontal ones (with all sub-levels). For fewer L2s, or if all levels should be shown, see #Proposal_2b below and User:Sarri.greek/notes#For_few_languages ‑‑Sarri.greek  I 23:09, 25 February 2024 (UTC)
@Sarri.greek: I think you meant to link to zh:A, rather than zh. I like the way 维基词典 (Wéijī Cídiǎn) formats its TOCs. The default Vector(2022) skin shows just the language sections, each with an arrow to click for the drop-down submenu for that language section's headers. The 旧版Vector(2010) skin, by contrast, shows all the sections, presented in two columns, with the language sections in the left column and all the subordinate sections in the right column, each one separated from the next by one standard space followed by • (a bullet, specifically a non-selectable one identical to that generated by a line-initial asterisk on this site). Let's consider how well that latter format could be applied to the English Wiktionary. Consider Wéijī Cídiǎn's page for A, for example. Besides an {{also}} transclusion, a couple of {{character info}} boxes, and, notably, a transclusion of {{TOC limit|2}} (although it doesn't seem to make a difference), it comprises 204 language sections, the longest-named of which are the two ten-character ones, 帕胡納爾-阿舍寧卡語 (Pàhúnàěr-āshěníngkǎyǔ?, “Ashéninka Pajonal”) and 塞爾維亞-克羅地亞語 (Sài’ěrwéiyà-kèluódìyàyǔ, “Serbo-Croatian”). Of the names of the 271 sections in the right column, 239 comprise two characters, 22 comprise three characters, and 10 comprise four characters; the longest non-language section names are only four characters long each. As a result, the widest entry in that page's TOC is the first, the 跨語言 (kuàyǔ yán, “translingual word”) section, which looks like this:
1 跨語言         詞源1 •詞源2 •圖片 •另見 •拓展閲讀 •參考資料
If we were to apply the same format to the TOC line for an identical entry with section titles translated into English (with intercolumnar spacing based on that of the longest language-section name in en:A, namely “Kalo Finnish Romani”), it would look like this:
1 Translingual      Etymology 1 •Etymology 2 •Image •See also •Further reading •References
But, for readability, I think this would be better:
1 Translingual      Etymology 1 • Etymology 2 • Image • See also • Further reading • References
That looks OK, I think. Better than what we have at the moment.
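The two-column row format in the mock-ups above amounts to padding the language name to a fixed width and joining the subsections with " • ". A plain-text sketch, assuming a monospaced display; in a real skin the alignment would come from CSS rather than space padding, and formatTocRow is a hypothetical name. The width of 19 characters is the length of "Kalo Finnish Romani", the example used above.

```javascript
// Build one plain-text TOC row: number, padded language name, then its subsections.
function formatTocRow(number, language, subsections, languageWidth) {
  return `${number} ${language.padEnd(languageWidth)} ${subsections.join(' • ')}`;
}

const width = 'Kalo Finnish Romani'.length; // longest language name on en:A (19)
console.log(formatTocRow(1, 'Translingual',
  ['Etymology 1', 'Etymology 2', 'Image', 'See also', 'Further reading', 'References'], width));
```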
@-sche, RichardW57: How do you feel about this “Proposal 1b”? 0DF (talk) 20:07, 27 February 2024 (UTC)
@0DF: It lacks section numbers, which are important for navigation. --RichardW57 (talk) 22:34, 27 February 2024 (UTC)
@RichardW57: How are section numbers "important for navigation"? They aren't included in URLs to specific sections. 0DF (talk) 01:07, 28 February 2024 (UTC)
@0DF, full numbering (e.g. 2.1, 2.2, 3, 3.1) is indispensable on Wiktionaries (unlike Wikipedias) because our section titles are repetitive; especially in the subsections of multiple etymologies, the numbers help the brain navigate! (Of course, positions may change in an electronic dictionary, but still... I find them so helpful!) Thank you. ‑‑Sarri.greek  I 02:35, 28 February 2024 (UTC)
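Full numbering of the kind described here (1, 1.1, 1.2, 2, ...) follows directly from the header levels alone. A minimal sketch of the idea, assuming a page's headers never skip a level (no ==== directly under ==); numberSections is a hypothetical name.

```javascript
// Derive hierarchical section numbers from header levels,
// e.g. [2, 3, 3, 4, 2] for ==, ===, ===, ====, ==.
function numberSections(levels) {
  const base = Math.min(...levels);
  const counters = [];
  return levels.map((level) => {
    const depth = level - base;
    counters.length = depth + 1; // forget the counters of deeper levels
    counters[depth] = (counters[depth] || 0) + 1;
    return counters.join('.');
  });
}

console.log(numberSections([2, 3, 3, 4, 2, 3]));
```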
@Sarri.greek: I can't say I really get it, but OK. I suspect that neither of these proposals are going to go anywhere. 0DF (talk) 03:03, 28 February 2024 (UTC)
I think it should, @0DF. {alert/attn|bureaucrats}} The state of pages like A and te tells me, as a reader, that en.wiktionary does not care at all about how its ToCs look. Why is this? There are 3-4 desirable styles: for very many languages, for many languages, for juxtaposed languages, and for few languages. And if the __TOC__ is ever taken away (as it is forbidden in Vector2022), we should have a Wiktionary-built ToC, because the structure of the contents is the responsibility of the editor, not the publisher. I hope, 0DF, that your proposals ignite some interest. Thank you. ‑‑Sarri.greek  I 07:55, 28 February 2024 (UTC)
@Sarri.greek: Thank you for your support. I admit to feeling quite disappointed at the largely negative response to these proposals. I may write a third proposal at some point, accommodating the various criticisms of the first two proposals that have been made in this discussion. In the meantime, however, I owe you and others responses in Wiktionary:Requests for moves, mergers and splits#Medieval Greek from Ancient Greek and elsewhere, so I won't hurry back to this. 0DF (talk) 22:32, 16 March 2024 (UTC)

Proposal 2: Display the contents of all TOCs in columns

The template {{col-auto}} is widely used in entries here to sort lists of terms (in sections such as Related terms and Derived terms) into neat columns, like this:

Such presentation saves a lot of vertical space in entries that would otherwise be taken up by single unnecessarily long (but narrow) columns. That columnar arrangement can also be applied to TOCs. For a ready-made example approximating what that would look like, courtesy of This, that and the other, see User:This, that and the other/columnar toc mockup#using columns. Alternatively, you can see what that would look like on any page of your choosing. To do so, go to the page you want and, once there, open the Console tab and paste this code (courtesy, again, of This, that and the other) into it:

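// Make the TOC a block box (flow-root) so it can widen enough to hold columns
document.querySelector('.toc').style.display = 'flow-root';
// Flow the top-level TOC list into as many columns of roughly 20em as fit
document.querySelector('.toc > ul').style.columnWidth = '20em';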

To open the Console tab, press either F12, Fn + F12, or Ctrl + ⇧Shift + I. Alternatively, if you're using MacOS and wish to navigate menus, go to View → Developer → JavaScript Console (on Chrome) or go to Tools → Browser Tools → Browser Console (on Firefox). Thanks go to This, that and the other for the keyboard shortcuts and to Benwing2 for the MacOS menu directions.

I propose that all TOCs display their contents in columns (using This, that and the other's code or code with equivalent effect).
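One possible "code with equivalent effect" is a single stylesheet rather than inline styles set through querySelector, which only touches the first matching element. A sketch, assuming the same .toc markup as the console snippet above; the 20em width is carried over from it, columnarTocCss is a hypothetical name, and listing #toc alongside .toc is a precaution in case skin rules target the ID.

```javascript
// The console snippet's two declarations as stylesheet rules, so they apply to
// every TOC on the page. #toc is listed alongside .toc in case skin rules
// target the ID, which would otherwise outrank a class-only selector.
function columnarTocCss(columnWidth = '20em') {
  return [
    '#toc, .toc { display: flow-root; }',
    `#toc > ul, .toc > ul { column-width: ${columnWidth}; }`,
  ].join('\n');
}

if (typeof document !== 'undefined') {
  const style = document.createElement('style');
  style.textContent = columnarTocCss();
  document.head.appendChild(style);
}
```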

Rationale: As they are, TOCs waste a lot of otherwise unused or little-used horizontal space at the top of pages. This change would save vertical space (which is used) by using that horizontal space (which is not used currently). Just as there is no controversy about the use of {{col-auto}}, I foresee no controversy about columnar presentation in TOCs. 0DF (talk) 02:18, 25 February 2024 (UTC)

Discussion

  1. Is there any particular problem with right-hand-side ("RHS") display (by default) of the table of contents, at least on personal computers?
  2. Will implementing either of these default options impinge on RHS display selected through gadgets or preferences?
  3. Is it possible to enjoy the benefits of proposal 1 (fewer levels in ToC) with RHS display?

Obviously, I really like RHS display and have been surprised that we don't have more who opt for it. DCDuring (talk) 16:07, 25 February 2024 (UTC)

@DCDuring:
 1. It pushes down images, any transclusions of {{character info}}, {{examples}}, {{wikipedia}}, etc., and any other RHS objects. It causes basically the same problems as left-hand-side TOCs, but to a subset of a page's contents, rather than the whole thing.
 2. Good question. Can you answer this, please, This, that and the other?
 3. It should be. What does the TOC look like for you on bar? If that page shows a RHS TOC displaying only language sections, the answer to your question is "yes".
My concern with these proposals is improving the experience of unregistered users, who are stuck with defaults. 0DF (talk) 17:30, 25 February 2024 (UTC)
The proposals still force down all content, eg. etymology, pronunciation, definitions, whereas RHS ToC forces down items that are less essential, less common ({{examples}}), or easily relocated (sister-project boxes).
I don't see how an LHS ToC is better than an RHS ToC for unregistered users.
RHS ToC displays well at bar. DCDuring (talk) 17:52, 25 February 2024 (UTC)
@DCDuring: The default is the LHS TOC. Whether the RHS TOC is better is outside the scope of this proposal, in the same way that whether the Vector (2022) skin is better is outside the scope of this proposal. 0DF (talk) 18:18, 25 February 2024 (UTC)
I am making an argument against the proposal on the grounds that there is a superior alternative. DCDuring (talk) 18:21, 25 February 2024 (UTC)
@DCDuring: Then write your own proposal in favour of making RHS TOCs the default. Don't hijack this proposal; all you'll achieve by doing so is sabotaging this attempt at improving things. 0DF (talk) 18:37, 25 February 2024 (UTC)
Proposal 2b for vertical ToCs. Examples at User:Sarri.greek/notes#For_few_languages: a TOC with vertical L2 and horizontal L3, L4, L5 etc., and a TOC with horizontal L2 and vertical L3, L4, L6. For an excessive number of language Ls, see my #Proposal_1b as in User:Sarri.greek/notes#TOC_hor_limit2. These would work if a programmer undertook the burden of creating modules and CSS to handle various styles of ToCs. I also think that the style and placement of ToCs is at the discretion of editors, regardless of which skin surrounds the body text, which may or may not produce an automatic TOC. Discussed with WMF at wikt:el:Wikiacademy/2023Vector#Modifications?. I truly hope that en.wiktionary can come up with multiple solutions and styles. Thank you ‑‑Sarri.greek  I 23:55, 25 February 2024 (UTC)
Ideally, I would want items in the menus individually collapsible. That way you could have the 1st-level only version, but click a control on the right of an item to view all of its sub-items. I have my doubts as to whether it can be done with the standard browser toolkit, but I can dream... Chuck Entz (talk) 00:02, 26 February 2024 (UTC)
@Chuck Entz: Many different styles should be available. For 2 years now I have been trying, at enWP, on the pages of module programmers, and at en.wikt, to get someone to please create something like wikt:el:Module:toc-test and wikt:el:Πρότυπο:toc-test, but with output like the manual wikt:el:Πρότυπο:test-ol (I know nothing about Lua, so I cannot do it). ‑‑Sarri.greek  I 00:11, 26 February 2024 (UTC) @Chuck Entz: Individually collapsible too, like at fr.wiktionary: https://fr.wiktionary.org/wiki/table?useskin=vector And fully numbered too. ‑‑Sarri.greek  I 00:15, 26 February 2024 (UTC)

"Tasmanian language"

Category:Tasmanian language is not a single language. There were an unclear number of distinct languages spoken on the island prior to the genocide of its native inhabitants, and all we have of them are wordlists in a wide range of mostly defective orthographies. For example, one word for "buttock" is variously transcribed as <leen.her>, <leieena>, <leng.in.ner>, and <liengana>. Many of the wordlists also mix data from different locations. One wordlist supposedly of Tasmanian even turned out to be Kaurna, a mainland Australian language.

I am not sure what to do with the data from Tasmanian languages, but clearly a situation like binearrenerepare, where itho, meener, munger, and nomemene are all given as "synonyms" of this first-person singular pronoun, is not ideal.--Saranamd (talk) 18:03, 25 February 2024 (UTC)

My stance is - if we don't even know what it is, better either not record it (always a good solution), or, if someone burns with desire to do so, put it under CAT:Undetermined lemmas. Thadh (talk) 18:33, 25 February 2024 (UTC)
Not recording something is the worst solution. We should check which lemmas go with which lang, and put them there, or put them in Undetermined Lemmas if we can't. CitationsFreak (talk) 06:38, 26 February 2024 (UTC)
Hard agree - I strongly oppose any situation where we can't record something because past data is imperfect (unless there is genuine reason to doubt it even existed at all). It only serves to erase its speakers from history even more than they already have been. Theknightwho (talk) 15:44, 26 February 2024 (UTC)
Nothing is erased from history, there are plenty of papers and books on the subject. "All words in all languages" does not mean we are obligated to document everything ever recorded by anyone. Thadh (talk) 15:59, 26 February 2024 (UTC)
@Thadh We can qualify entries by stating the limitations of any sources. Simply refusing to record anything that's imperfect or uncertain undermines the whole project. Theknightwho (talk) 16:49, 26 February 2024 (UTC)
There is a difference between "imperfect" and "we don't even know what this is". We shouldn't record the Voynich manuscript or Linear A or some Uugawoogan-English wordlist published in 1630 either. Thadh (talk) 16:51, 26 February 2024 (UTC)
@Thadh The first two examples are untranslated, while the third is not. The only reason we don't know what Linear A and the Voynich manuscript are written in is because they haven't been deciphered; whereas in the third example, you just don't think the source is very good. Completely different situations, and the third clearly falls under "imperfect". Theknightwho (talk) 17:21, 26 February 2024 (UTC)
It's not about not being good, it's about not correlating to the reality we currently witness. We can't identify the language with any one language that exists now and we don't have a good corpus to give us multiple accounts of this language, and until we can or do I don't think it makes sense for us to record these. Thadh (talk) 17:31, 26 February 2024 (UTC)
Indeed, we broke up "Tasmanian" and created codes for more specific Tasmanian languages (following the literature) several years ago, although I can't find the discussion at the moment (linked in the 2020 discussion that follows), and in 2020 the ISO followed our suit and retired xtz and split it into codes for specific languages, so we were even able to upgrade our exceptional codes (e.g. aus-pee) to ISO codes (e.g. xpw). AFAIR Category:Tasmanian language only still exists because we didn't have time to fully clean up every occurrence of xtz and then retire the code; if someone can clean up the last few entries in that category (binearrenerepare, bo, itho, lia, meener, munger, narrar, nomemene), ideally assigning them to the relevant more specific codes (see the "in 2020" link for a list), that'd be great. "Tasmanian" translations in water and one also need to be changed or removed. - -sche (discuss) 21:05, 25 February 2024 (UTC)
@-sche Thanks for this. binearrenerepare is now Port Sorell, itho is Pyemmairre, meener is Paredarerme, and munger is Peerapper in line with Crowley & Dixon 1981. I do not have access to the cited source and Crowley & Dixon do not give narrar, nomemene, or bo as attested Tasmanian 1SG pronouns so not sure where they come from.--Saranamd (talk) 16:44, 26 February 2024 (UTC)
nomemene+Tasmanian gets no Google Books hits and almost no Google web hits; I've RFVed it. Narrar is piped so that it links to "he" but displays "I", which does not give me confidence in the person who created the entry; it is mentioned in Henry Ling Roth, John George Garson, The Aborigines of Tasmania (1899), page 184, as the personal pronoun corresponding to "he, she" and has a parenthetical "Norman" after it; I'll RFV it too, and bo. - -sche (discuss) 18:29, 27 February 2024 (UTC)

Lojban cleanup again

@AugPi who has worked on Lojban and is still active. I would like to de-Lojbanize the grammatical terminology here on Wiktionary. I partly did this before; you can see for example, that Appendix:Lojban/vo'i has its header specified as ==Particle== instead of ==Cmavo==, as it did previously. But the template that defines the headword is still {{jbo-cmavo}} and it uses the Lojban-only POS cmavo. Furthermore, the categories of this term (besides Category:Lojban lemmas) are Category:Lojban cmavo, Category:Lojban cmavo of selma'o KOhA and Category:Lojban pro-sumti, and the definition of this term is as follows:

Repeats the x3 sumti of the main bridi of the current sentence.

To me, this reads as total gobbledygook, and I doubt 99.9% of anyone who comes across this definition will have any idea what it means.

As a general rule, we don't use native grammatical terms in Wiktionary entries, but map them to the closest English terms, and Lojban should be no exception despite being a rather unusual language. I would like to do a more thorough cleanup where we replace Lojban grammatical terms with English ones everywhere. For example, the definition could be reworded something like this:

Repeats the third argument of the main clause of the current sentence.

assuming that argument is a good translation of sumti and clause of bridi. If necessary we can include the Lojban terms in parens following the English term, like this:

Repeats the third argument (sumti) of the main clause (bridi) of the current sentence.
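If the cleanup is done semi-automatically, the "English term (Lojban term)" pattern above could be produced by a small substitution pass. This is a naive sketch: the two glosses are only the tentative ones suggested in this post (sumti as argument, bridi as clause), the rule turning x1 to x5 into ordinals is an assumption based on the example, glossLojban is a hypothetical name, and plain whole-word matching will also fire inside compounds such as pro-sumti, so the output would still need review.

```javascript
// Tentative glosses from the proposal; the full list is still to be compiled.
const GLOSSES = { sumti: 'argument', bridi: 'clause' };
const ORDINALS = ['first', 'second', 'third', 'fourth', 'fifth'];

function glossLojban(definition) {
  // Lojban definitions label argument places x1, x2, ...: "x3 sumti" -> "third sumti".
  let out = definition.replace(/\bx([1-5]) (?=sumti\b)/g,
    (_, n) => `${ORDINALS[Number(n) - 1]} `);
  // Naive whole-word substitution: "sumti" -> "argument (sumti)".
  // Hyphenated compounds such as "pro-sumti" match too.
  for (const [lojban, english] of Object.entries(GLOSSES)) {
    out = out.replace(new RegExp(`\\b${lojban}\\b`, 'g'), `${english} (${lojban})`);
  }
  return out;
}

console.log(glossLojban('Repeats the x3 sumti of the main bridi of the current sentence.'));
```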

User:AugPi, can you help me compile a list of Lojban terms and their best English equivalents? Feel free to ping anyone you know who works on or has worked on Lojban.

For the terms needing translation, we can start with those that have infected Module:headword/data:

1. For lemmas:

cmavo
cmavo clusters
cmene
fu'ivla
gismu
lujvo

2. For non-lemma forms:

rafsi

3. Other terms appearing in categories are:

brivla
selma'o
fu'ivla cmene
lujvo cmene
pro-bridi
pro-sumti
sumti tcita

4. Other terms in Category:jbo:Grammar are:

bridi
gadri
jufra
selbri
sumtcita (=sumti tcita?)
tanru

Thanks! Benwing2 (talk) 03:39, 29 February 2024 (UTC)

Navajo category renames

Does anyone here at Wiktionary still work on Navajo these days? I am planning on renaming the Navajo verb categories to be more standard. In particular:

  1. Navajo verbs with prefix foo- will become Navajo terms prefixed with foo-; e.g. Category:Navajo verbs with prefix shó- -> Category:Navajo terms prefixed with shó-
  2. Navajo verbs with foo prefix bar- will become Navajo terms prefixed with bar- (foo); e.g. Category:Navajo verbs with disjunct prefix ʼá- -> Category:Navajo terms prefixed with ʼá- (disjunct)
  3. Category:Navajo terms with emphatic infix -x- -> Category:Navajo terms infixed with -x- (emphatic)
  4. Navajo verbs with classifier -foo- will become Navajo terms prefixed with foo- (classifier); e.g. Category:Navajo verbs with classifier -∅- -> Category:Navajo terms prefixed with ∅- (classifier)
  5. Category:Navajo verbs with peg element yi- -> Category:Navajo terms prefixed with yi- (peg element)
  6. Navajo verbs with postpositional prefix -foo will become Navajo terms prefixed with foo- (postposition); e.g. Category:Navajo verbs with postpositional prefix -aʼ -> Category:Navajo terms prefixed with aʼ- (postposition)
  7. Mainspace entries for terms that are prefixes but don't have a following hyphen, or do have a preceding hyphen, will be renamed.
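The renaming rules above are mechanical enough that a bot run could be checked against a small mapping function first. The following is a sketch only: it covers rules 1 to 6 (not the mainspace renames in rule 7), works on category names without the "Category:" namespace, uses hypothetical function names, and generalises rule 2 to any single-word prefix type, which is an assumption.

```javascript
// Move any hyphens to the end: "-∅-" -> "∅-", "-aʼ" -> "aʼ-", "shó-" -> "shó-".
function asPrefix(morpheme) {
  return morpheme.replace(/^-+|-+$/g, '') + '-';
}

// Ordered rules: the specific patterns come before the generic
// "with <type> prefix" one, which would otherwise also match "postpositional prefix".
const RULES = [
  [/^Navajo verbs with postpositional prefix (\S+)$/,
    (m) => `Navajo terms prefixed with ${asPrefix(m[1])} (postposition)`],
  [/^Navajo verbs with classifier (\S+)$/,
    (m) => `Navajo terms prefixed with ${asPrefix(m[1])} (classifier)`],
  [/^Navajo verbs with peg element (\S+)$/,
    (m) => `Navajo terms prefixed with ${asPrefix(m[1])} (peg element)`],
  [/^Navajo terms with emphatic infix (\S+)$/,
    (m) => `Navajo terms infixed with ${m[1]} (emphatic)`],
  [/^Navajo verbs with prefix (\S+)$/,
    (m) => `Navajo terms prefixed with ${asPrefix(m[1])}`],
  [/^Navajo verbs with (\S+) prefix (\S+)$/,
    (m) => `Navajo terms prefixed with ${asPrefix(m[2])} (${m[1]})`],
];

// Returns the new category name, or null if no rule applies.
function renameNavajoCategory(name) {
  for (const [pattern, build] of RULES) {
    const m = pattern.exec(name);
    if (m) return build(m);
  }
  return null;
}

console.log(renameNavajoCategory('Navajo verbs with disjunct prefix ʼá-'));
```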

The logic here:

  1. I renamed "verbs" -> "terms" in categories because the vast majority of such categories are for verbs. There are only five categories in Category:Navajo terms by prefix representing a total of 11 lemmas (mostly nouns). Navajo seems a heavily verb-centric language, so it can be assumed a given prefix is verbal, and if there are prefixes that can be both nominal and verbal and it's important to note the difference, this can be handled in the parenthetical tag.
  2. I standardized the use of hyphens in prefixes. The current etymologies are not consistent in the use of hyphens. IMO everything that's a prefix (where "prefix" means anything coming before the root) should have a following hyphen and no preceding hyphen.
  3. The use of the term "classifier" here does not follow the standard usage of this term (e.g. as in East Asian languages). For example, -ł- is defined as follows:
    The -ł- classifier or valence-change prefix, a causative-transitivizing prefix of active verbs that modifies the transitivity or valence and grammatical voice of a verb. It often transitivizes an intransitive -∅- (unmarked) verb:
    This doesn't appear to have anything to do with (e.g.) Chinese classifiers, which categorize a noun semantically, a bit like Indo-European genders. Instead it's simply a type of prefix. If "classifier" is the normal term in Navajo grammars, it's fine to maintain it as a parenthesized tag (as I have done), but it should not be placed in the Classifier POS.

Benwing2 (talk) 04:44, 29 February 2024 (UTC)

@Eirikr, who knows something about Navajo, even if he doesn't work with it. I've worked a little on other American Indian languages, and this is similar in the way the lines between inflectional morphology, derivational morphology and syntax tend to blur; I'm not sure if we can make these fit neatly into anything. When you can have a single word that means "I saw those two women walk this way out of the water", all bets are off. In the case of Cahuilla, the verb is the sentence, with subject pronouns, object pronouns, many adverbs/prepositions, etc. reduced to affixes, and separate words mostly just used to name the referents of the affixes. Chuck Entz (talk) 16:07, 29 February 2024 (UTC)
@Chuck, thanks for the ping. Much IRL is keeping me busy enough that my Wiktionary time is more limited.
@Benwing2, re: renaming the categories, I have no particular concerns. What you explain above all makes sense.
Specifically about the verb-valence morphemes, I agree that the "Classifier" part-of-speech would be a mistake. That said, these are commonly called "verb classifiers" in the literature, particularly the seminal resources by Young and Morgan, if memory serves. I'll dig up a couple of my dead-tree books later on and make sure I'm not mis-remembering.
(FWIW, I just tried googling for complex Navajo verbs, and ran across the Reddit thread "What makes Navajo considered so difficult?". Please take the "qcomplex5" poster there with a big grain of salt — they list a lot of "Navajo" examples that are patent gibberish. For example, they gloss bízhiʼ jį́ as "he/she is asleep", but as you can see from our entries, this is the two nouns "his/her name" + "day". The rest of their "Navajo" is similarly whackadoo.)
Anyway, like Chuck describes for Cahuilla, a Navajo verb incorporates many of the grammatical elements that are explicitly separate in many other languages, such that subject, object, etc. are all fused in as part of the "verb" word. Consider the relatively simple example of yishdlóósh, an intransitive verb meaning "I creep on all fours". This incorporates the first-person subject pronoun shí as that medial -sh-. Or compare ółtaʼ (s/he reads something unspecified; s/he studies, transitive, unspecified object), yółtaʼ (s/he reads it; s/he studies it, transitive, specific object), and then wóltaʼ (it is read, passive), which also involves a change in the so-called "classifier" from active / transitive -ł- to intransitive / passive / reflexive -l-.
I digress, but I hope that helps. :) ‑‑ Eiríkr Útlendi │Tala við mig 20:29, 29 February 2024 (UTC)
@Eirikr Thanks, Eirikr, yeah this makes sense to me and in some ways the use of a valence-changing "classifier" is simpler than English, where we have to use a different finite verb ("to be") to passivize and the main verb changes from a finite form to a participle. The valence change prefix reminds me a bit of se in Romance languages, which is similarly ambiguous as to whether it's intransitive, passive or reflexive. Benwing2 (talk) 23:39, 29 February 2024 (UTC)
  Done. Benwing2 (talk) 04:24, 16 March 2024 (UTC)

Sanskrit kṣ-aorist

@Dragonoid76 What is a Sanskrit kṣ-aorist meant to be? By the current categorisation rules it appears to simply be an s-aorist whose root ends in a velar before the sibilant, which at the very least invites the parallel of ts-aorists. Is it perhaps a confusion with sa-aorists, whose aorist stems all end in -kṣa and whose affix is sometimes called ksa? (Notifying AryamanA, Bhagadatta, Svartava, JohnC5, Kutchkutch, Getsnoopy, Rishabhbhat, Dragonoid76): RichardW57 (talk) 09:51, 29 February 2024 (UTC)

@RichardW57 Last I checked there was a lot of confusion in Sanskrit noun and verb classification. Benwing2 (talk) 10:11, 29 February 2024 (UTC)
@RichardW57 Yes, it is the same as the sa-aorist. Dragonoid76 (talk) 18:05, 29 February 2024 (UTC)
@Exarchus: So none of the 4 verbs in Category:Sanskrit verbs with kṣ-aorist should be there! RichardW57 (talk) 19:56, 29 February 2024 (UTC)

Template:desc-family

I created this as an experiment. It's meant to clean up cases like these, where we want to group languages together by family, but do not have a proto-form. (In practice, this does happen occasionally; any argument that there must always be a proto-form is absurd to me.) This allows 1. better formatting (with tooltips for borrowings, etc.) and 2. easier parsing. I'm bringing it up here in case anyone has any feedback on it. — SURJECTION / T / C / L / 22:11, 29 February 2024 (UTC)

@Surjection Looks good to me. I agree that trying to force a proto-form when there isn't one reconstructed isn't helpful. Benwing2 (talk) 23:35, 29 February 2024 (UTC)

March 2024

A way to more easily connect with readers

I have seen this idea thrown around some and I have had it myself: what if we had some official social media accounts where we can respond to readers, give polls, etc., that admins have access to? In theory readers can interact with us here, i.e. at the Information Desk etc., but I think that process can be a little opaque for the average person, and for some even intimidating. I also know that various Wikimedia projects have their own accounts on various platforms. Also, since it's an open project, the more input the better, theoretically. Vininn126 (talk) 08:45, 1 March 2024 (UTC)

I see you beat me to it. I have personally had a need for this on numerous occasions, finding Reddit and Twitter posts that (often in the form of a joke/curiosity) showed serious errors in our dictionary. A recent example is diff. A way to interact with the people that bring such mistakes to our attention is in my opinion very important.
I think the easiest way to maintain this is to simply create a couple of accounts and share their login information in the Admin channel on Discord, which automatically gives all admins that have joined the Discord server the means to manage these accounts. Afterward we can post various polls and/or announcements once there is consensus with the community, while also having the ability to quickly respond to feedback. Thadh (talk) 09:26, 1 March 2024 (UTC)
@Thadh I agree, but we should probably message other admins and potentially send it to their emails, so as not to have a barrier to getting access. Vininn126 (talk) 09:29, 1 March 2024 (UTC)
I would prefer we do that based on requests. Many admins are not very active and I don't want the mail to go to some years-long unchecked inbox. Thadh (talk) 09:42, 1 March 2024 (UTC)
Some platforms we should consider: Reddit, Twitter (X), Facebook. Any other suggestions? Vininn126 (talk) 09:58, 1 March 2024 (UTC)
OnlyFans? Allahverdi Verdizade (talk) 10:19, 1 March 2024 (UTC)
You wish. I ain't doing a body reveal that easily. Vininn126 (talk) 10:23, 1 March 2024 (UTC)
I hate the idea that we are, in effect, endorsing/legitimizing and making more attractive these intrusive systems, but we are effectively forced into it by user preference for them. DCDuring (talk) 13:54, 1 March 2024 (UTC)
@DCDuring One thing we could do is also promote how to engage with the site more directly. By creating accounts on these sites with wider reach, we can bridge the gap for readers who are scared to edit and also show them how to start discussions, etc., potentially increasing editorship. Vininn126 (talk) 13:57, 1 March 2024 (UTC)
@DCDuring Alternatively, we could see it as engaging with the reality that users will not always come to the site directly in order to raise issues. Ignoring that isn't going to help anyone. Theknightwho (talk) 14:13, 1 March 2024 (UTC)
It's certainly not a bad idea. But, in the interest of protecting myself from "personalization", I waste my time at MW projects, not the commercial sites. DCDuring (talk) 14:50, 1 March 2024 (UTC)
@DCDuring What do you mean by personalization? Kiril kovachev (talk · contribs) 18:24, 2 March 2024 (UTC)
Generating content tailored to me, certainly including advertising, possibly already including or soon to include price discrimination. DCDuring (talk) 20:36, 2 March 2024 (UTC)
Mastodon. CitationsFreak (talk) 16:33, 1 March 2024 (UTC)
I second this. Allahverdi Verdizade (talk) 20:53, 2 March 2024 (UTC)
VK may be a good idea to attract any potential editors and/or readers from Russia. Thadh (talk) 20:45, 10 March 2024 (UTC)
I support this idea, but would it only be admins? I might like to have access as well. Ioaxxere (talk) 20:43, 1 March 2024 (UTC)
@Ioaxxere I think without a rigorous way to add users, it might devolve into a free-for-all. Also at the beginning, I think only truly trusted users should be given access. Perhaps there'd be a process for adding trusted users in the future. Vininn126 (talk) 20:46, 1 March 2024 (UTC)
We had an unofficial Twitter account created by WF. Only admins can see it, but there's some information at the deleted page for User:Wikt Twitterer (I believe they created it when they were using that account, but kept the Twitter account going after the Wiktionary account was blocked). Chuck Entz (talk) 23:56, 1 March 2024 (UTC)
Well, considering no one has objected, I think we can probably move forward. The question is what email to use when signing up for these accounts and what username to use. Vininn126 (talk) 10:49, 8 March 2024 (UTC)
We should probably create one to go with these accounts. Best not to advertise its address anywhere on-wiki though. Thadh (talk) 17:18, 8 March 2024 (UTC)
Use a Wikimedia address. Don't go third-party like Gmail etc. Equinox 17:25, 8 March 2024 (UTC)
That's a good point. Vininn126 (talk) 17:26, 8 March 2024 (UTC)
@Equinox How might we do that? Vininn126 (talk) 17:30, 8 March 2024 (UTC)
I assume we tell Wikimedia that we want to make a social media account. CitationsFreak (talk) 20:22, 8 March 2024 (UTC)

Bengali language

I intended to create a phonelist for Bengali. Is there anyone who can guide me through bot stuff? Arundhatisgupta (talk) 18:24, 1 March 2024 (UTC)

@Arundhatisgupta What do you mean by "phonelist"? What sort of bot work are you trying to do? (Keep in mind if you plan to do page edits using a bot, you need to get permission to do so.) Benwing2 (talk) 23:45, 5 March 2024 (UTC)

Restricting {{m}} in etymology sections

Wiktionary's etymology sections are not very machine-readable, and the main issue is the {{m}} template, which can be used in a wide variety of ways:

  • Origin within a language: A {{glossary|respelling}} of {{m|en|puisne}} (in puny)
  • Listing alternative forms of an etymon: From {{inh|en|enm|hed}}, {{m|enm|heed}}, {{m|enm|heved}}, {{m|enm|heaved}} (in head)
  • Listing related terms: More at {{m|en|Tyr}}, {{m|en|day}}. (in Tuesday)
  • Listing unrelated terms: Not related to {{m|en|Romanian}} or {{m|en|Roman}}. (in Rom)

I propose that {{m}} be used only for unrelated terms and that we create new templates for the other three cases. Ioaxxere (talk) 20:41, 1 March 2024 (UTC)

In the case of "More at...", that should be {{l}} anyway, since it refers to the entry and not the term. Theknightwho (talk) 21:29, 1 March 2024 (UTC)
Just in terms of the formatting produced, I dislike the use of {{l}} when used inline in other running text: {{m}} produces italicized text, which is visually distinct from the rest of the text.
For that matter, I understood the "l" in {{l}} to stand for "list", as this template was originally intended to only be used in lists of terms, where formatting to distinguish from other running text isn't needed. ‑‑ Eiríkr Útlendi │Tala við mig 05:09, 10 March 2024 (UTC)
That is an unrealistically lofty ideal. {{m}} has many other uses and you simply cannot sequester them all into separate templates. — SURJECTION / T / C / L / 21:36, 1 March 2024 (UTC)
I wouldn't mind if it were no longer used to italicize (often mis-italicize) taxonomic names. But there are no restrictions on its use at present, it has mostly been used for formatting, and there is no incentive for users to limit use. It seems particularly hard to imagine that we could get users to comply with different rules based on the L3/4/5/6 header they were editing in. Our filters are already getting intrusive and unhelpful. DCDuring (talk) 22:50, 1 March 2024 (UTC)
This doesn't seem like a good idea... Thadh (talk) 22:53, 1 March 2024 (UTC)
Being able to mention other terms is very, very useful. Vininn126 (talk) 22:56, 1 March 2024 (UTC)
Yeah, I don't think we need a proliferation of different templates. It will just make coding much harder. As it is, it's already difficult to learn how to use templates like {{en-verb}}, {{inflection of}}, and {{Module:quote|call_quote_template}}. — Sgconlaw (talk) 22:59, 1 March 2024 (UTC)
  Oppose. I spend a lot of time fixing etymologies where someone copied the entire etymology from an entry in another language without changing the language codes. If people routinely get that wrong, they're not going to have a clue about the subtleties and intricacies proposed. You'll end up with people copying from one entry where they make sense to another where they're all wrong, or worse, partly wrong. Unlike with language codes, there's no reliable way to tell if they're being misused without knowing something about the etymology (if there were, you wouldn't need them in the first place). Basically, this proposal would give editors more ways to be wrong. Chuck Entz (talk) 23:35, 1 March 2024 (UTC)
Also, I don't know if we would want to do this in the name of machine-readability. Wiktionary is not really a database, so if we wanted it to be machine-readable, it would have had to have been one to begin with. Maybe Wikidata could handle this kind of thing instead. Kiril kovachev (talk · contribs) 00:50, 2 March 2024 (UTC)
@Kiril kovachev I think it's a balance: we need to be machine-readable to some extent, since some users rely on that to collate info from a wide array of entries (and it also helps our bots), but I'd agree that the templates suggested here would be a step too far, as I don't really see what advantage they'd provide. Theknightwho (talk) 18:16, 2 March 2024 (UTC)
That's true, I agree with your point here. It's nice to have a clearly-defined {{inh|en|grc|...}} kind of thing, but there are also ways in which our etymology section can be virtually free-form, and forcing it to be more machine-readable would kill that flexibility, such as the ways we may be using {{m}} right now. Kiril kovachev (talk · contribs) 18:23, 2 March 2024 (UTC)
It doesn't seem very useful to me. Are there plans to have machines doing something with our etymology sections anytime soon? At some point far enough in the future, improvements in machine comprehension of natural language might make it easier for machines to understand what humans write, rather than forcing humans to adjust how they write so machines can understand it. I think there are a lot of aspects of writing etymologies that are difficult to boil down to a fixed set of templates, so I'm not enthusiastic about us engaging in that project unless there's some real benefit we can point to. Simple etymologies already use templates, so this proposal seems to deal with a tail of complicated etymologies (do you know what percentage contain {{m}}?).--Urszag (talk) 01:06, 2 March 2024 (UTC)
This seems to be a good attitude/strategy for such matters in general. DCDuring (talk) 17:52, 2 March 2024 (UTC)
  Oppose. Makes things harder, would make me inclined to not do etymologies if it's such a pain, even if I know the word origin. If we must restrict codes in this way, introduce a nice GUI that creates the code from our menu choices or something. Equinox 18:27, 2 March 2024 (UTC)
Honestly, an extension of the New Entry Creator that can easily add etymologies and quotes would be nice... CitationsFreak (talk) 06:02, 3 March 2024 (UTC)
The lack of a template for this purpose has irked me in the past. We have nice templates for when a word comes from another language, but not for when it comes from another word in the same language (unless by some well-known process such as affixation).
Recently I wanted to generate a list of all English terms which are said to derive from another English term, but where that term's entry doesn't include the term derived from it in a "Derived terms" section. Such a list would help fill gaps in our "Derived terms" sections, but it's all but impossible to generate a comprehensive list like this with the current setup.
I would definitely support an effort to designate a template specifically for same-language derivations. It seems it would be possible to use {{af}} for this purpose with minor modifications to its code, and probably a new name too:
A {{glossary|respelling}} of {{af|en|puisne}}. Not related to {{m|en|some other term}}.
The other uses of {{m}} can be dealt with in other ways. This, that and the other (talk) 03:21, 4 March 2024 (UTC)
@This, that and the other what do you think it should be called? For now, we could create a template which redirects to {{affix}}. In the future, I think {{af}} should be adapted into a generic "internal derivation" template. I think @Benwing2 is on the same page on this. Ioaxxere (talk) 17:30, 4 March 2024 (UTC)
I realized that {{from}} didn't exist; I think that's a good name. Another idea is to be able to adapt {{der}} to allow a faster way of writing {{der|en|en|term|nocat=1}}. Ioaxxere (talk) 18:58, 4 March 2024 (UTC)
Yes, I recall seeing a discussion about broadening the use of, and renaming, {{af}} in the past (not sure where or with whom).
{{from}} is an excellent name - good find. This, that and the other (talk) 00:08, 5 March 2024 (UTC)
Sounds like a skill issue on the part of the machines. Nicodene (talk) 11:13, 7 March 2024 (UTC)

deprecate Template:1

I have renamed this to {{cap}} and deprecated it per the discussion in WT:RFM, but User:Equinox reverted the deprecation claiming it will save them keystrokes. I would like to see what people think about keeping this deprecated. I don't see how two keystrokes make much of a difference, and {{1}} is just about the worst alias imaginable. If keystroke savings is really a big deal, we could use something like {{ca}} or {{cp}}, both of which are currently undefined. Benwing2 (talk) 02:02, 3 March 2024 (UTC)

Looking at the RFD, I see that {{M}} was suggested as a new name for this template by User:This, that and the other. We should just switch the template's name to that. It saves the same number of keystrokes as {{1}} and is a better alias. (Plus, there was no real consensus to deprecate it in the first place.) CitationsFreak (talk) 06:24, 3 March 2024 (UTC)
Equinox's argument is weak: "cap" falls under the fingers nicely (on a QWERTY keyboard at least) and is just as typeable as "1". As for alternative names, {{M}} (for "majuscule") is just okay, because of the existence of lowercase {{m}}. The other obvious single-letter shortcuts ({{C}} for "capital" and {{U}} for "uppercase") are already taken. Another alternative would be {{^}}, implying "raising" the first letter to uppercase. This, that and the other (talk) 07:04, 3 March 2024 (UTC)
@This, that and the other @CitationsFreak {{U}} is hardly used so we could easily repurpose it. Also how about {{uc}}? Benwing2 (talk) 08:07, 3 March 2024 (UTC)
Please don't deprecate, rename, etc. An uppercase template name, like {{M}}, is a bit worse than a lowercase one, like {{1}}. I, for one, appreciate any keystroke savings for my arthritic joints. DCDuring (talk) 13:29, 4 March 2024 (UTC)
I have only known this template for a few weeks, since editors always preferred bare links. The issue here is that it looks homographic to the non-italic linking template {{l}}, whereas we don’t want confusables. For this purpose anything cV seems to be bad already, looking like {{cat}}, {{c}} and {{C}}, the parameter |nocap=, and {{caps}} and {{cx}} and what not. I suppose Benwing2 wants to clean up {{caps}} too, though, since it is only used in about 200 entries having {{he-root}}.
Intuitively I propose {{up}} and {{high}}, since the letters are close together, and high, on the keyboard. And {{}}, which is Shift + AltGr + U on my standard xkeyboard-config layout; I’d actually use that, as it looks exactly as much better than {{1}} as needed, Abloh’s 3% rule or something. I haven't seen DCDuring using the template, but the same concern can be valid for other editors, and a rename can make it better. Fay Freak (talk) 14:29, 4 March 2024 (UTC)
I don't see the point of deprecating this template if multiple editors use it to save keystrokes. However I think we should be automatically subst-ing every instance of {{1}} and {{cap}} for the sake of readability. Ioaxxere (talk) 17:36, 4 March 2024 (UTC)
In the software industry, "deprecation" usually gives you a long time to deal with something. For example, Microsoft deprecated WebClient (a class used to perform Internet downloads), but it continues to work for many years. Also, there is usually a genuine stated rationale by which the replacement is better, not just a programmer's whim. You can joke "it's not a big deal", but it is longer to type cap than 1 (especially if you create thousands of entries, like I do) and there's also muscle memory, which is really important for older people: please understand this, even if you are young: it's ableism. In this case, it costs us literally nothing to retain the 1 page as a redirect, which makes the template work fine. Removing and breaking the redirect can be nothing but either (i) punishing "old dogs" who can't learn "new tricks", or (ii) a fascist march ahead that supports developers but not users who create the project. Equinox 03:43, 5 March 2024 (UTC)
@Benwing2: Some years ago, we had a very aggressive template editor who upset many people by placing his/her software design decisions over user needs. Please don't be that person again. There are democratic discussion tools to allow you to work it out without turning off things that really matter to me, as a person who creates hundreds of entries per month and never fucks with a template. Equinox 03:47, 5 March 2024 (UTC)
@Equinox Would {{L}} work as a compromise? It looks sufficiently different that I don't find it confusable, and it's only one extra keystroke. I don't like {{1}} because it looks almost the same as {{l}} in the code. Theknightwho (talk) 19:13, 5 March 2024 (UTC)
L is better than nothing. But I really don't see why it's killing anyone to retain the working redirect. Equinox 19:18, 5 March 2024 (UTC)
@Theknightwho @Equinox I was thinking of repurposing {{u}}. It doesn't even need an extra Shift key, and it's barely used; {{U}} can be used for user mentions if anyone cares. Please note that this change is not coming out of the blue; the discussions over getting rid of {{1}} have been going on for years, most recently in WT:RFM. I'm also not sure how useful or helpful it is to accuse me of being selfish, fascist, ableist and ageist, and IMO it's definitely not helpful to demand that no template be removed once it's created (or maintained over a several-year deprecation process, which is tantamount to the same thing). Benwing2 (talk) 22:53, 5 March 2024 (UTC)
@User:Benwing2 You should have said that at the start, to be honest. I feel that mentioning that would be more productive for you, since there is the same amount of joint movement in typing both {{1}} and {{u}}, so there could be no argument based on that. CitationsFreak (talk) 23:31, 5 March 2024 (UTC)
I'm getting to the point where I just use wikitext for everything I input. If the wikitext "required" for "proper formatting" is too hard, then I get the words right and leave something that doesn't necessarily conform to WT:ELE or whatever other norms we have for cleanup by others, who seem to like that kind of thing. If the next step is to filter such input, I'm out of here. DCDuring (talk) 01:07, 6 March 2024 (UTC)
@DCDuring I don’t have any strong views on the issue raised by this thread, but this attitude isn’t fair on other users, because you’re just creating clean-up work for others. The idea of using link templates outside of definition lines isn’t new, and it’s not complicated. Theknightwho (talk) 17:40, 7 March 2024 (UTC)
@User:Theknightwho Just more keystrokes and more learning overhead. I find it hard enough to try to make and keep taxonomic and related entries useful and to correct other users' mistaken and omitted uses of {{taxlink}}, {{vern}}, and now {{taxfmt}}. I don't undertake any non-morphological etymologies, instead inserting {{rfe}} (and getting complaints about that), because that's just more learning overhead, easily forgotten. I'm sure I get lots of descendants items wrong too. DCDuring (talk) 18:43, 7 March 2024 (UTC)
@Theknightwho, Benwing2: I’m with User:Ioaxxere above: Why not just automatically subst every instance of {{1}} (perhaps by bot), making this problem vanish? This template, whatever its name, is convenient for editors adding content but bad for readability; subst-ing would keep the convenience while resolving the problems of having such a template hang around in the code. — Vorziblix (talk · contribs) 02:57, 6 March 2024 (UTC)
@Vorziblix It's not possible to automatically do this except by periodically running a bot script. We only have a few things that currently run by periodic bot scripts, and AFAIK they are all triggered manually (by me, or in the case of {{t+}}, by User:Ruakh, although I don't know whether this still runs); in general I am reluctant to add more, especially to mainspace pages, because they cause surprise for editors and are a maintenance burden. Also, for long words at least, it might be worse to have it duplicated in capitalized and lowercase forms than to have a (properly-named) template that wraps a single instance of the word. Benwing2 (talk) 03:04, 6 March 2024 (UTC)
Why aren't you 'reluctant' to do things that add more keystrokes? Is it because you aren't the one doing those keystrokes? Or do you think that our content is so good that all we have to do is pretty the dictionary up and let AI fill in the gaps? DCDuring (talk) 13:19, 6 March 2024 (UTC)
Needlessly snarky. Vininn126 (talk) 13:28, 6 March 2024 (UTC)
@DCDuring: Look, we can’t easily imagine what it is like to have arthritis, and we have to balance the concerns of everyone's joints and eyes; stop being so combative. Depending on the position of the keys, one or two extra keystrokes may be easier for you than even one: if they are close together and in the upper middle of the keyboard; 1 is at a corner, and {{1}} strains the eyes of people with both impaired and good eyesight because of its resemblance to {{l}}. That’s why I have these three suggestions here, of which we might take two: {{up}}, {{high}}, {{}}. I actually think a lot about keyboard layouts; the curly brackets are on the keys for 8 and 9 for me, and on <AD11> and <AD12> (the two to the right of O and P) in the US standard layout, so these can easily be typed with one hand. Fay Freak (talk) 13:36, 6 March 2024 (UTC)
That's not snarky. I'm really concerned about attitude.
My eyesight isn't very good either. I've weighed the difference to me.
So, I'm just supposed to roll over? I haven't objected to the {{subst}} idea.
Why don't we have a thoroughgoing consideration of keystroke minimization? Why not use {{i}} for initial capitalization, instead of wasting it as a redirect to {{qualifier}}, when {{q}} also redirects thereto? DCDuring (talk) 15:45, 6 March 2024 (UTC)
In general the concerns of easy input and easy readability for future editors both have to be considered when naming templates. There are ways for editors to configure their own machines to make entry easier, e.g. I think an AutoHotkey script could be used on Windows to convert {{1| to {{cap|, or to do something similar, on the user's end.--Urszag (talk) 02:52, 7 March 2024 (UTC)
On the one hand I support the principle of things actually making sense, and nothing about the abbreviation "1" does. On the other hand it seems fairly harmless, and if it really is saving Equinox so much trouble, why not? Nicodene (talk) 11:10, 7 March 2024 (UTC)
Mehhh. I agree it's an unintelligible name (and therefore proposed at RFM that we make the 'main' name something more intelligible), but redirects are cheap and I don't see harm in leaving {{1}} as a redirect. Some prolific editors are clearly used to using it. (In any given couple of months, we have one or two entries which use {{altcaps}} and thus just display a redlink, because I or someone else has been unable to recall what the new name for that is.) I admit {{1}} is a particularly unintelligible name, though (unlike e.g. {{altcaps}}). - -sche (discuss) 06:04, 9 March 2024 (UTC)
Badly named redirects add cognitive burden to people trying to understand the wikicode, and redirects in general (esp. badly named ones) increase the tech debt; enough of them and the site becomes unmaintainable. This is why people like me and User:Theknightwho who put time into maintaining the site (rather than just using it) push back against having random redirects littering the site. I also still don't know why User:Equinox as well as User:DCDuring (who doesn't even use the alias) are so attached to this particular alias when I have proposed a more sensible redirect {{u}} that is the same number of keystrokes. (Not to mention that using any template requires 5-6 keystrokes due to the left brace and vertical bar, so I have a hard time buying the argument that a single extra shift key makes a huge difference. I should also add that Equinox accused me of ageism and ableism knowing almost nothing about me; I am in fact older than him and have suffered my own spate of hand-related disability.) Benwing2 (talk) 06:18, 9 March 2024 (UTC)
Personally, I feel that most of the arguments that apply to {{1}} apply to {{u}} as well. Also, they will be used to it in no time (like with the mandated use of "en" in the etym and quote fields). CitationsFreak (talk) 20:50, 9 March 2024 (UTC)
{{up}} is still clearer than {{u}} and easy enough to type. My view is that single-letter templates are badly named if they might look like something else (e.g. usage templates, {{user}}), especially as there are only a little more than twenty letters available. This is not strictly comparable to terminal commands either, where we tend to have a -V synonym of a longer --word. The one-ASCII-character ones really need broad consensus, even if only tacit. I doubt that {{u}} for {{cap}} will acquire the kind of habituation that {{m}} and {{l}} have. The difference is also that these, and {{q}}, have semantics, even if it only consists in wrapping a language other than the working one, which capitalization at the beginning of English glosses lacks. These are all rationalizations of why I am uneasy about {{u}} and {{i}} for any purpose. So far there are only three one-letter template codes I use and watch out for. Fay Freak (talk) 21:49, 9 March 2024 (UTC)

Report of the U4C Charter ratification and U4C Call for Candidates now available

You can find this message translated into additional languages on Meta-wiki. Please help translate to your language

Hello all,

I am writing to you today with two important pieces of information. First, the report of the comments from the Universal Code of Conduct Coordinating Committee (U4C) Charter ratification is now available. Secondly, the call for candidates for the U4C is open now through April 1, 2024.

The Universal Code of Conduct Coordinating Committee (U4C) is a global group dedicated to providing an equitable and consistent implementation of the UCoC. Community members are invited to submit their applications for the U4C. For more information and the responsibilities of the U4C, please review the U4C Charter.

Per the charter, there are 16 seats on the U4C: eight community-at-large seats and eight regional seats to ensure the U4C represents the diversity of the movement.

Read more and submit your application on Meta-wiki.

On behalf of the UCoC project team,

RamzyM (WMF) 16:25, 5 March 2024 (UTC)

Module Breaker

User:Module Breaker should be blocked. Also, why can't I edit Wiktionary:Vandalism in progress? Avessa (talk) 15:06, 6 March 2024 (UTC)

@Avessa: Thanks. It's because it is obviously a vandalism target. You can make some useful edits and then edit that page if needed. The bar to become autoconfirmed is low. Fay Freak (talk) 17:34, 7 March 2024 (UTC)

Unlink more and most in English headwords

For example:

common (comparative commoner or more common, superlative commonest or most common)

I don't see the point of having links to more and most in this kind of entry. In my view, having excessive links makes a page less visually appealing and could invite misclicks. Would anyone oppose removing these links? Ioaxxere (talk) 22:02, 6 March 2024 (UTC)

Not really seeing why unlinking the words is necessary. Maybe learners of English would find the links helpful. — Sgconlaw (talk) 11:49, 7 March 2024 (UTC)
Because they are easy to click on touchscreens. In theory, one would find them helpful only once, and such a reader would not understand the English definitions anyway; anyone who needs to click them is in the wrong place with a monolingual dictionary. Nobody said it is necessary; it is about optimization. Fay Freak (talk) 13:35, 7 March 2024 (UTC)
For that matter why have links to commoner and commonest? (Alright, maybe commoner is a special case because of commoner#Noun.) DCDuring (talk) 16:58, 7 March 2024 (UTC)
Because the link is what’s left from WT:ACCEL, or any red link crosslinguistically when there is a comparable situation with periphrastic adjective gradation without the gadget, leaving a red link to invite creation. It would be too distressing to create a page and have no link then. The links are made not with random logics but impulses in mind. Fay Freak (talk) 17:31, 7 March 2024 (UTC)
I don't understand your last sentence. What "random logics" and what "impulses"? DCDuring (talk) 19:58, 7 March 2024 (UTC)
You find something that makes sense, harmonizes with some aesthetic equation. But we have to ponder what will be clicked, by the typical, impulse-driven behaviours of readers and editors. The contradiction against analogy logic (synthetic comparatives vs. periphrastic ones) would barely be felt. Fay Freak (talk) 20:06, 7 March 2024 (UTC)

Wikimedia Canada survey

Hi! Wikimedia Canada invites contributors living in Canada to take part in our 2024 Community Survey. The survey takes approximately five minutes to complete and closes on March 31, 2024. It is available in both French and English. To learn more, please visit the survey project page on Meta. Chelsea Chiovelli (WMCA) (talk) 00:23, 7 March 2024 (UTC)

Revoking autopatrolled status from Kwamikagami

Some background: @Kwamikagami has been autopatrolled since April 2009, and currently has just over 34,500 edits. They're sporadically active, but when they do edit they tend to make changes to large numbers of entries very quickly, and they tend to focus on single-character entries or anything relating to IPA.

I, @Benwing2, Vininn126, AG202 and others have been pretty concerned about their sloppy editing for a few months now, and their autopatrolled status makes it much harder to spot. Some examples off the top of my head (but there are literally hundreds like this):

  1. [8]: mass-adding languages with tons of mistakes: the Khoekhoe entry uses the wrong language code throughout, Bodo (India) has the wrong L2 header, and Dogri (even today) still doesn't have a headword template.
  2. [9] Deciding to merge ң and ӈ with no consensus or discussion, despite the fact this caused a bunch of issues for several languages. They also did this for a bunch of analogous letters.
  3. [10] Adding a bunch of stenoscript entries like w—O or adm with merged part of speech headers and/or no headword templates. I can see why they've done this - to avoid repetition - but the obvious and sensible thing to do would have been to start a discussion, not create ~100 more entries with the same issue ([11]). We should not be giving '''{{PAGENAME}}''' on the headword line, and anyone with autopatrolled status should know that.
  4. [12] Adding definitions like "?". No request or attention template - just "?".
  5. [13] Even looking at their most recent contributions, they've wrongly given the pronunciation IPA(key): /ɪkˈsaɪ.ən, -ɒn/ at Ixion. This should be IPA(key): /ɪkˈsaɪ.ən/, /-ɒn/ or IPA(key): /ɪkˈsaɪ.ən/, /ɪkˈsaɪ.ɒn/.

All of this creates a massive clean-up job for everyone else, and Kwamikagami has repeatedly proclaimed that they don't understand the problem, which quite frankly means I don't think they should be autopatrolled anymore. Theknightwho (talk) 22:26, 8 March 2024 (UTC)

Support. I actually believe we should block Kwami for at least a month since they refuse to acknowledge the problematic nature of many of their edits and continue doing the same thing after warnings. But revoking autopatroller status is a good start. Benwing2 (talk) 22:30, 8 March 2024 (UTC)
Yeah, I decided not to bring up the repeated refusal to understand consensus, since it didn't seem relevant to this particular issue, but that's definitely a much worse issue.
The clean-up job of their contributions is going to be huge. Theknightwho (talk) 22:33, 8 March 2024 (UTC)
Support. Vininn126 (talk) 22:31, 8 March 2024 (UTC)
Support. There's also Category:Translingual entries with incorrect language header (not all of them are Kwami's, but too many are). There are good arguments for treating language-specific characters as either the language itself or translingual, but not both. Most of the entries in this category use translingual templates and language codes under other language headers.
The problem they have in general seems to be making snap decisions without thinking things through, then sticking with those bad decisions until forced to abandon them. They know more than I do on a lot of things, but they don't make very good use of that knowledge. As for the whole stenoscript issue: they did actually ask for advice at the time, so that may not be the best example. Chuck Entz (talk) 01:44, 9 March 2024 (UTC)
@Chuck Entz I'm not sure I agree with you re the stenoscript: regardless of when they asked for advice or what the response was, they've still created ~100 entries which are in a completely unacceptable state and will need to be cleaned up by someone. Even if they got no response at all, what they did was definitely not the right thing to do, and is the kind of thing that has got some new users banned. Theknightwho (talk) 02:07, 9 March 2024 (UTC)
Support + they need a block. AG202 (talk) 04:58, 9 March 2024 (UTC)
@Theknightwho I confess that I've also used this forbidden headword line on sum#Multiple parts of speech and tht#Multiple parts of speech. I agree that it would be better as a template, but I think "multiple parts of speech" should be allowed as a POS header. Ioaxxere (talk) 06:49, 9 March 2024 (UTC)
No, it shouldn't. — SURJECTION / T / C / L / 09:53, 9 March 2024 (UTC)
It looks super objectionable. With little necessity, since at least with {{head}} you can just use |catN=. You might stretch WT:POS a bit by letting a part of speech header be followed by another part of speech header and then only the headword line, which probably contradicts basic publication logics of not having empty headers but would at least look better. For such alternative forms, to save vertical space, we could introduce headers like Pronoun · Adjective (i.e. in my example separated by middle dots). Years ago it was considered whether it would be better to have templates instead of headings, as on other Wiktionaries, only dismissed for Lua memory restrictions, to make appearance centrally manipulatable. Fay Freak (talk) 18:09, 9 March 2024 (UTC)
I don't think this would be widely accepted. It messes with categorization and not having a "head" template violates our practices. I would change it to how entries like obvi & unfort are. AG202 (talk) 20:04, 10 March 2024 (UTC)
@AG202 those entries are sensible, since each abbreviated word corresponds with a single part of speech. Compare tht, which would need to have four or five identical POS sections. Clearly a dedicated template would be preferable eventually, although for now I don't think a few missing categories are the end of the world. Ioaxxere (talk) 23:37, 10 March 2024 (UTC)
The repeated POS sections are what are required at this point per our policy. You should've brought it up with English editors at the very least if not everyone in general before creating those entries like that. It clearly violates our Entry Layout guidelines. AG202 (talk) 23:48, 10 March 2024 (UTC)
@Ioaxxere I agree with User:AG202. There are various imaginable ways of compressing repeated POS sections but (a) it needs discussion, (b) I doubt using a POS "Multiple parts of speech" is ideal in any case; certainly the actual parts of speech should be listed one way or another. Benwing2 (talk) 23:53, 10 March 2024 (UTC)
  Support, unfortunately; I think we have exhausted other options. Kwami was given multiple written warnings and blocks for EACH of these mass edits and continued regardless. They also haven't really helped clean up or shown remorse for their problematic mass edits... - سَمِیر | Sameer (مشارکت‌ها · بحث) 08:47, 9 March 2024 (UTC)
  Support + a block to fix up the entries. CitationsFreak (talk) 23:08, 9 March 2024 (UTC)
  Support Ioaxxere (talk) 23:35, 10 March 2024 (UTC)

Revoked, given:

  1. The unanimous and overwhelming support.
  2. It only takes a nomination from one admin and approval from another for a user to gain autopatrolled status.
  3. This has been open for just over 2 days, which is about 6 times longer than it took for the original nomination to get approved and actioned ([14] [15] [16]).

Theknightwho (talk) 00:18, 11 March 2024 (UTC)

@Theknightwho Thank you. Benwing2 (talk) 00:26, 11 March 2024 (UTC)

Eastern Geshiza language

User:Geshiza has been asking about adding this language, but in the meanwhile has created a walled garden of over 30 entries with their own improvised categories, but no templates and no links to or from the rest of Wiktionary.

Adding this language won't be easy, because it's hard to tell what it really is. It's apparently a sub-sublect of Horpa (language code ero), but the Wiktionary article for that language doesn't have much detail about what it describes as "a cluster of closely related yet unintelligible dialect groups/languages". In one analysis of the groupings that it cites, there are 5 "varieties", of which "Central Horpa" has 3 "dialects", one of them being "Dgebshesrtsa (Geshezha 革什扎) (non-tonal)". Whether "Geshezha" and "Geshiza" are the same thing isn't explicitly stated, but another quote in the article makes that seem likely. At any rate, there's no mention at all of "Eastern Geshiza". Does anyone have access to any sources that will make sense out of all this? Chuck Entz (talk) 02:28, 10 March 2024 (UTC)

@Chuck Entz This sort of "do it then get permission" approach was done for Belter Creole as well. I am strongly opposed to allowing this to proceed as it sets a terrible precedent. I would suggest moving the contents into that user's space until it becomes clearer whether there's any hope of supporting this variety or these varieties. Benwing2 (talk) 03:11, 10 March 2024 (UTC)
@Benwing2 I don't think they're comparable at all. Belter Creole is a constructed language, whereas Eastern Geshiza seems to be a variety of Horpa, and I can see that a published grammar exists. I would much rather that we simply put a moratorium on any new entries until we've hashed out how it should be handled, but regardless of the language code they still belong in mainspace. Theknightwho (talk) 03:16, 10 March 2024 (UTC)
@Theknightwho Ultimately maybe so, but not remotely in the current state they're in, and I doubt simply asking or telling this user to stop will make them stop. Who's gonna restructure and clean up the entries once we sort out how many varieties are involved and whether they are L2's or etymology variants? You? If you're not willing to personally commit to doing this then IMO we should move these ill-structured entries to userspace and put them back, gradually, in a properly structured form, once we add the lect codes. Benwing2 (talk) 03:36, 10 March 2024 (UTC)
@Benwing2 @Theknightwho moving the entries to their userspace is probably fine. They seem to not understand templates (but they are making an effort, as they seem to be trying to make their entries match others here). We could have them practice using templates in their userspace and, once we feel like they understand how templates work, they can move the entries back themselves. — Sameer (مشارکت‌ها · بحث)
As someone who regularly patrols Abuse Filter 68, I can tell you that creating entries with no templates is more common than you might think. Usually it's not bad faith, just cluelessness. Chuck Entz (talk) 04:23, 10 March 2024 (UTC)
@Benwing2: Well, they've already been asked, but it's too soon to tell how they'll respond. Chuck Entz (talk) 03:55, 10 March 2024 (UTC)
@Chuck Entz @Benwing2 they responded and they indicated they will wait until everything is resolved before continuing to edit. — Sameer (مشارکت‌ها · بحث) 05:19, 10 March 2024 (UTC)
@Sameerhameedy Sounds good, thanks for making the request. Benwing2 (talk) 05:21, 10 March 2024 (UTC)

Language titles with category

Could the language.titles have a clickable link to their category (main, lemmas, or whatever)? Ideally, also with a tooltip showing their code? (That would be very helpful!) On pages with many language sections, it is very difficult to scroll down to the bottom and find the language.
e.g. [:Cat:Afar language|<span title="Afar (aa)">Afar</span>] Thank you! ‑‑Sarri.greek  I 12:12, 10 March 2024 (UTC)

I'd rather not add any templates to the headings. One could implement a JavaScript gadget that automatically does this, though. — SURJECTION / T / C / L / 12:31, 10 March 2024 (UTC)
M @Surjection, Thank you. I have no idea how it could be done. I would be delighted at the output. ‑‑Sarri.greek  I 12:57, 10 March 2024 (UTC)
I have a working prototype in User:Surjection/linkLanguageHeaders.js. You can add it to your common.js to test it. Perhaps it can be turned into a gadget if there is interest. — SURJECTION / T / C / L / 13:08, 10 March 2024 (UTC)
Yes, I think it was agreed a while ago not to use templates in headings and IMO this is just as well. Benwing2 (talk) 00:25, 11 March 2024 (UTC)

I was not proposing a way to do it, I was just showing the desired result. I don't know what js is. I do not change default looks on platforms. As a reader, I would like to click language.titles, because I do not know what they are and Categories are too far away to click. Could en.wiktionary please rethink it? Thank you. ‑‑Sarri.greek  I 01:36, 14 March 2024 (UTC)

Hi, it should be available now through Special:Preferences under "Gadgets" as "Add links to language headings that point to the category of the corresponding language." — SURJECTION / T / C / L / 19:09, 14 March 2024 (UTC)
Ω! Μ @Surjection! you did this for me? Hooray! Thank you, thank you! I will find it immediately. You are too kind. I hope lots of people will like it and that it becomes standard! ‑‑Sarri.greek  I 19:57, 14 March 2024 (UTC)
It works! it is wonderful; why not for everyone? why hidden in 'gadgets'... You are a magician M @Surjection. The default should be the 'best' and the most useful. ‑‑Sarri.greek  I 20:06, 14 March 2024 (UTC)

Make default language titles with category

Great news! M @Surjection has made a Gadget and we can click the Language.Titles to go to the category! I propose it become the default, for all to use. Kiitos! kiitos Surjection! ‑‑Sarri.greek  I 20:27, 14 March 2024 (UTC)

I don't personally think it should be the default, since it can be a bit distracting and confusing to those who aren't used to it. — SURJECTION / T / C / L / 21:44, 14 March 2024 (UTC)
But, M @Surjection, you have made it so discreet and elegant! There are no colours, or anything 'loud' about it. I find it very helpful, because there are many names of languages unknown to us. I am delighted, and I wish all people could use it too. (you may not guess it, but lots of us do not go to Preferences. This was my first time, except for Global Pref. for Vector Classic for wikipedias, and fr.wikt). ‑‑Sarri.greek  I 22:27, 14 March 2024 (UTC)
Thank you @Surjection for doing this! I tried it out and it looks great. @Sarri.greek I think enabling it by default could very well be done a bit down the line but for the moment we should wait to make sure it doesn't have any unexpected interactions with anything else. Benwing2 (talk) 22:36, 14 March 2024 (UTC)
Mainio! hieno! -in honour of M Surjection, from now on, Finnish will be the language of interjections. .js will be renamed .surjs @Benwing, many wiktionaries have clickable Lang.titles. I was so longing for it. At el.wikt, the visible labels {{lb}} before definitions, link to their Cat. Where, we see on top, the word of the label in host language, and sorted on top, its translation in the target language :) Anything to facilitate readers! ‑‑Sarri.greek  I 04:38, 15 March 2024 (UTC)
I'm also worried about how it will behave in the mobile view, specifically about whether it makes the headings harder to click to expand. It does help get around the lack of categories in the mobile view, something which has always greatly irked me. — SURJECTION / T / C / L / 07:35, 15 March 2024 (UTC)

Two transliterations

A question (after endless discussions of how to transliterate Modern Greek at Module_talk:el-translit). I do not know about other languages, but at least for Modern Greek ISO offers two types of conversions.

  • TypeA = unique-conversion letter-to-letter transliteration, reversible (two-directional), used internationally: customs, machines, etc., when one-to-one transliteration is needed.
  • TypeB = slightly simplified, and pseudo-phonemic, calls it transcription (but not with IPA symbols), for national usage. For Greek, the only difference to TypeA are two macron diacritics.
  • ISO also introduces an idea of a 'level 3' mixed Type, more phonemic, for national usage, 'especially' when the above transliterations are very different from the pronunciation.

The question is: Does en.wiktionary have a rule that says: a) en.wikt is obliged to provide the official unique.conversion ISO transliterations. b) en.wiktionary also provides a more phonemic transliteration based on ISO and House Rules, through consensus.
If a) is yes, then we should have two transliterations (for some languages). Discussions would be needed only for b), saving a lot of our energy. Two translits, How? I propose

word (xxxxx© / xyyxxxyy) ...or I for ISO --please check the tooltips

Thank you. ‑‑Sarri.greek  I 12:42, 10 March 2024 (UTC)

@Sarri.greek Agreed, Persian is also running into this issue. After a discussion months ago it was agreed that Persian templates should have two transliterations (Classical + Iranian) but modules don't support that so we can't do anything rn. I believe Hebrew editors have wanted something similar as well. — Sameer (مشارکت‌ها · بحث) 18:21, 10 March 2024 (UTC)
@Sameerhameedy There is some language-specific support for this in place at the moment: the major example being Chinese (and I'm not referring to the separate languages grouped together), where several lects show two or three transliterations each in the dropdown; Cantonese has four, and Mandarin seven(!). Korean, Thai and Khmer also do this in various ways, too.
It's clear that there needs to be a language-neutral way of showing things like this, and (taking Mandarin as a benchmark) it shouldn't be limited to transliterations into the Latin script, either, given one of the systems is Zhuyin and another Cyrillic. Theknightwho (talk) 20:16, 10 March 2024 (UTC)
Thank you M @Sameerhameedy. Asking M @Theknightwho about the languages mentioned with 3 or 4 transliterations. What is the legal status of these? By 'obligatory' for wiktionary to show, we mean: ISO-assigned for international transactions like exports. Is there one and only one topping the others? The problem we have here is: because wiktionarians try to adapt ISO to something more useful to our readers, (a) the discussions never end, and (b) every 5 or so years, someone comes up with an alteration or a restoration of some letter conversion. This will never end. ‑‑Sarri.greek  I 22:43, 10 March 2024 (UTC)
@Sarri.greek As far as I can tell, they all have one system which is used for things like links (i.e. as the "transliteration" in the normal sense), and the others are only shown on the entry.
I don't think we're under any obligation to choose the ISO standard as the main transliteration, but if we don't, then it's a good idea to show it on the entry itself. Theknightwho (talk) 22:50, 10 March 2024 (UTC)
@Theknightwho, I see that such languages have boxes for transliterations = they can manage multiple solutions. I was thinking of languages that have one translit. next to PAGENAME, and disabled the option to add a second one. May I add a point:
ISOs have been criticised for poor results and unsuccessful conversions. Still, I am not proposing to reform ISO here. If ISO makes changes, we record them and update the official translit. I am proposing to free ourselves from the rigid 1st translit, which is not-to-be-debated. Also: how do wikipedians face this problem? Does en.wikt. have a liaison to en.wikipedia for questions or coordination? Thank you. ‑‑Sarri.greek  I 23:04, 10 March 2024 (UTC)
@Sarri.greek The community of Wikipedia editors who work on language entries seems much smaller than the community of Wiktionary editors, so whoever has the most stamina tends to win out. E.g. User:Mahmudmasri insisted on particular standards for transliteration and phonemic rendering of Egyptian Arabic that I disagree with, but I don't have the energy to fight him on this and he does have the energy to patrol all the relevant pages and edit-war as necessary to get his preferred system in place, so that is what Wikipedia has. Similarly for things like language names and family trees; User:Kwamikagami out-staminas everyone else. I definitely agree with User:Theknightwho that we are under no obligation whatsoever to choose ISO's or anyone else's proposed standard as the preferred/main transliteration. We need to do what's right for Wiktionary and hopefully maintain some consistency of approach across languages where feasible. Benwing2 (talk) 00:24, 11 March 2024 (UTC)
Thank you @Benwing2 About your comment (for general 'rules') >>we are under no obligation whatsoever to choose ISO's or anyone else's proposed standard as the preferred/main transliteration.<< (Also by @Theknightwho) The problem with not having some 'locked' directives is that talks are endless. Official things: (ISO, spelling directives of Academies or similar). Are not official things the first obligation of wikt? = credibility, stability, well-referenced, not subject to 'talks' and alterations. I dislike it too, but as a reader, I expect the info available. Otherwise, I would have to go elsewhere to get it. For some ISOs: Wiktionary's standards aspire to give better results than the official ISO :) That would be nice! But one has to see the comparison. ‑‑Sarri.greek  I 01:46, 11 March 2024 (UTC)
@Sarri.greek Yes, sometimes consensus is hard to achieve but we all know that some ISO standards are garbage and/or have no adoption, and many ISO standards simply have different aims than we do at Wiktionary. I think we should aim to not be gratuitously different from ISO standards where possible (e.g. we use ISO language codes whenever possible rather than incompatible ones), but at the same time not be bound by them (e.g. sometimes we merge lects that ISO considers different, and sometimes we split lects that ISO considers the same). Benwing2 (talk) 01:53, 11 March 2024 (UTC)
Ok, then @Benwing2. This is the end of this talk, so my proposal for 2 transliterations is withdrawn. ‑‑Sarri.greek  I 02:07, 11 March 2024 (UTC)
@Sarri.greek I don't think you need to withdraw your proposal just to end the conversation :) ... I do think having multiple translits is an interesting idea to be potentially considered further. After all, this is not the first or second time this idea has come up. Benwing2 (talk) 02:11, 11 March 2024 (UTC)

Holding a discussion between two in this medium is difficult, I find one between so many impossible! I will only say that Dictionaries should be accessible (understandable to the "Man on the Clapham omnibus" — Oxford dictionaries appreciated this and have changed substantive to noun in their entries). I suspect that most people, not understanding IPA, use the transliteration as a guide to pronunciation. I hope that whoever makes a decision (the cynic in me says that it will probably be changed again next year) will bear the "man from Clapham" in mind.   — Saltmarsh 06:07, 11 March 2024 (UTC)

Ωωωω! my wise mentor and administrator for Greek, @Saltmarsh! Hear, hear! Thank you. ‑‑Sarri.greek  I 06:13, 11 March 2024 (UTC)

One system, multiple transliterations

For Vedic Sanskrit, transliteration is abused to show the placement of the accent. Our policy is not to show the placement in the spelling of the word. Now, for finite verbs incorporating prefixes, there are two possible placements for the same verb, depending on the grammatical usage of the verb. Is there an approved mechanism for showing the two transliterations, and if so, what is it and where, if anywhere, is it documented? Or do we only show the accent for finite verbs for the usage where a verb without a prefix would bear an accent? The placement in the other case appears to be reliably predictable if one can identify the prefix. --RichardW57m (talk) 11:07, 11 March 2024 (UTC)

I refer to this marking of the accent as an abuse partly because transliteration-related categorisation assumes that explicit transliterations are exceptional and worthy of review, whereas it is the norm for words found in accented texts. --RichardW57m (talk) 11:07, 11 March 2024 (UTC)

How should we transliterate (into Japanese script or other scripts), romanize, and lemmatize Ryukyuan?

Previous discussions

The following previous discussions may be useful.

Information

Lately, the Ryukyuan orthography has been a mess. Various works use hiragana, katakana, or mixed script. There are vowels and syllabic consonants that cannot be transcribed cleanly/properly using Japanese orthography, so the central vowel (ɨ for example, サ行) has been variously transcribed in Japanese script as シゥ, シィ, ス, す, スィ, ス𛅤 (CJK small katakana wi () if you cannot render this character), you name it. Aspirated and unaspirated consonants are also variously referred to as plain and glottalized consonants, and only one of the two is marked in hiragana or katakana. At Wiktionary we use an ad hoc transcription of inserting dakuten into the aspirated (Amami) and unaspirated (Okinawan/Kunigami) consonants, which is not used anywhere else. We also use an ad hoc method of including kanji in Ryukyuan languages, which some people do to transliterate Okinawa songs (but I can't find an example at the moment). Thus, 送り仮名 (okurigana) is basically another ad hoc transcription. In addition, we are basically duplicating kanji information from the Japanese entry, which requires more time and effort.

For the glottalized consonants such as [⸢ʔwáː] 'pig', should we do っわー, or ’わー?

Miyako has a special vowel, variously referred to as an apical vowel, laminal vowel, or fricative vowel (it is not a central vowel), which is variously transcribed as (S)ɨ, (S)ï, ʉ, ɿ, z, ü, you also name it. In fact, there are syllabic consonants in Ogami Miyako that cannot be transcribed cleanly in Japanese kana script, although there's a possibility that some Ogami words are actually reflections of a fricative vowel, as Kaneda Akihiro's vocabulary spreadsheet (from personal communication) does.

For romanizing, take Okinawan Shuri dialect [⸢ʔútɕínáː] for example. We could variously romanize it as ucinaa, 'ucinaa, ?ucinaa, uchinaa, uchinā, 'uchinā, you name it as well. And central vowels ([ɨ] in this instance) could either be transliterated as ï or ɨ (perhaps IPA only, so the former can be more plausible?), or we can transliterate [i] as yi and [ɨ] as i, and also have a glottalized initial as qV (as in qutyinaa). For aspiration, we could include <h> for aspiration, but nothing for unaspiration (or <'>), or include <'> for aspiration but nothing for unaspiration.

Finally, do we lemmatize at the kanji, the kana, or romanization? The current situation is just a total mess.

TL:DR: Transliteration and lemmatization of Ryukyuan needs a massive overhaul; it's a mess as of right now.

(Notifying Eirikr, TAKASUGI Shinji, Atitarev, Fish bowl, Poketalker, Cnilep, Marlin Setia1, Huhu9001, 荒巻モロゾフ, 片割れ靴下, Onionbar, Shen233, Alves9, Cpt.Guapo, Sartma, Lugria, LittleWhole, Mcph2): This is an important discussion for the orthography and lemmatization of the Ryukyuan languages. Please come to a consensus. Chuterix (talk) 17:10, 11 March 2024 (UTC)

We should lemmatize at what native speakers have used the most, absent a standard orthography, regardless of if it seems inconsistent or "ad-hoc". Defective or variant orthographies are not specific to Ryukyuan, and in other cases, we list the variants as alternative forms with the "standard" or most-common form as the lemma. (Or in the case of two differently-pronounced words represented by the same orthography, we disambiguate in the etymology + pronunciation sections)
For Okinawan in particular, there are several works written in mixed script (kanji & kana), and it looks to be the traditional orthography as well, so I wouldn't support a move to solely kana, and definitely not the Latin script. The same level of research should be done for the other languages as well; if they are more often written in the Latin script or katakana, then shifts can be made, but the research needs to be done first. AG202 (talk) 17:25, 11 March 2024 (UTC)
As someone who does not read Japonic/Ryukyuan literature and cannot otherwise comment much on this, I would just like to register my (ignorant) doubts towards/concerns regarding Wiktionary constructs such as {{ryn-readings}} (the concept of on'yomi vs. kun'yomi, at least) and Category:Northern Amami-Oshima Han characters (the concept of "Ryukyuan kanji" in general). Kana orthography seems to be under-developed, let alone usage of kanji (or should it be the other way around? placenames, etc.). Are we just reapplying 標準語 kanji to Ryukyuan? (can we examine 1. Japonic dialects [using kanji seems non-problematic] 2. Chinese "dialects" [本字 debates, "unwritten", etc.] 3. Jeju [the concept of Sino-Jeju is discouraged on Wiktionary]? as a comparison point for this topic?) —Fish bowl (talk) 09:29, 14 March 2024 (UTC)

Recent change to government standard for Japanese

I broke this off into a subtopic because I do not understand Japanese (and therefore cannot check original sources) and I'm generally ignorant of CJK languages, but per Wiktionary:Grease_pit/2024/March#FYI:_Major_romanization_change_coming_in_Japan, the government standard in Japan for Japanese is now Hepburn. As AG202 notes above about "absent a standard orthography", I'm just suggesting that the feds there may have a standard for Ainu, Ryukyuan, etc. as well, and that standard may be Hepburn also. Sorry if my ignorance introduces noise. :/ —Justin (koavf)TCM 17:53, 11 March 2024 (UTC)

 

the government standard in Japan for Japanese is now Hepburn.

 
Notably, this is for romanization, which is included on various kinds of signage explicitly for foreigners, as part of the country's efforts to court tourism money. This shift to Hepburn has nothing to do with text written in Japanese or other Japonic languages, outside of this very limited context (signs for foreigners). ‑‑ Eiríkr Útlendi │Tala við mig 20:43, 12 March 2024 (UTC)

Wikimedia Foundation Board of Trustees 2024 Selection

You can find this message translated into additional languages on Meta-wiki.

Dear all,

This year, the term of 4 (four) Community- and Affiliate-selected Trustees on the Wikimedia Foundation Board of Trustees will come to an end [1]. The Board invites the whole movement to participate in this year’s selection process and vote to fill those seats.

The Elections Committee will oversee this process with support from Foundation staff [2]. The Board Governance Committee created a Board Selection Working Group, composed of Dariusz Jemielniak, Nataliia Tymkiv, Esra'a Al Shafei, Kathy Collins, and Shani Evenstein Sigalov, from Trustees who cannot be candidates in the 2024 community- and affiliate-selected trustee selection process [3]. The group is tasked with providing Board oversight for the 2024 trustee selection process and with keeping the Board informed. More details on the roles of the Elections Committee, Board, and staff are here [4].

Here are the key planned dates:

  • May 2024: Call for candidates and call for questions
  • June 2024: Affiliates vote to shortlist 12 candidates (no shortlisting if 15 or fewer candidates apply) [5]
  • June-August 2024: Campaign period
  • End of August / beginning of September 2024: Two-week community voting period
  • October–November 2024: Background check of selected candidates
  • Board's Meeting in December 2024: New trustees seated

Learn more about the 2024 selection process (including the detailed timeline, the candidacy process, the campaign rules, and the voter eligibility criteria) on this Meta-wiki page, and make your plan.

Election Volunteers

Another way to be involved with the 2024 selection process is to be an Election Volunteer. Election Volunteers are a bridge between the Elections Committee and their respective community. They help ensure their community is represented and mobilize them to vote. Learn more about the program and how to join on this Meta-wiki page.

Best regards,

Dariusz Jemielniak (Governance Committee Chair, Board Selection Working Group)

[1] https://meta.wikimedia.org/wiki/Special:MyLanguage/Wikimedia_Foundation_elections/2021/Results#Elected

[2] https://foundation.wikimedia.org/wiki/Committee:Elections_Committee_Charter

[3] https://foundation.wikimedia.org/wiki/Minutes:2023-08-15#Governance_Committee

[4] https://meta.wikimedia.org/wiki/Wikimedia_Foundation_elections_committee/Roles

[5] Even though the ideal number is 12 candidates for 4 open seats, the shortlisting process will only be triggered if there are more than 15 candidates. The 1–3 candidates who would be removed might feel ostracized, and it would be a lot of work for affiliates to carry out the shortlisting process only to eliminate 1–3 candidates from the candidate list.

MPossoupe_(WMF) 19:57, 12 March 2024 (UTC)Reply

User:GabMarquetto

Last month, this user added well over a thousand problematic Greenlandic entries over a day or two by scraping a Greenlandic dictionary site and running an unauthorized bot on their account. I blocked them from mainspace and the Reconstruction namespace as an unauthorized bot and asked for help at the Grease pit (see Wiktionary:Grease pit#Hundreds of Incomplete Greenlandic entries need to be cleaned up) on getting them up to Wiktionary standards. The consensus seemed to be that it would be best to just nuke them all, which I have since done, for the most part. Aside from copyvio concerns (compilation copyright, if nothing else), the verbatim inclusion of typos and other irregularities in the headwords showed that the bot run had been prepared with only minimal attention to the content. They have admitted that they don't speak Greenlandic at all (they're editing from Brazil).

The user responded by apologizing on their talk page and by attempting to clean up the entries using an alternate account and as an IP, both of which were blocked by others on grounds of block evasion.

We need to discuss what to do next. While their methods were wrong, their motivation was to add content to the dictionary. They have admitted their mistakes and agreed not to repeat them. I made a point of only blocking them from two namespaces so they could discuss things here and on talk pages. This should not be about punishment for anything they did, but about whether they can be trusted to edit responsibly and add worthwhile content.

Pinging participants in the Grease pit discussion: (@Benwing2, DCDuring, Thadh, Vininn126), and users I've seen editing Greenlandic entries: (@Gamren, Jakeybean, Tesco250). Chuck Entz (talk) 14:57, 13 March 2024 (UTC)Reply

The only input I can give is on admin decisions - I definitely think we should WT:Assume good faith and discuss with this user and teach them. Unfortunately when it comes to specifically Greenlandic I am very unfamiliar. I do think that they should stick to languages whose text they can at least read and understand (and not just rely on something else). Perhaps this user shouldn't be editing Greenlandic at all. Vininn126 (talk) 15:02, 13 March 2024 (UTC)Reply
Unfortunately our dictionary does suffer from a severe lack of terms in many languages. However, if we don't have any editors who know the language, there is nothing we can do about that. The best course of action in my opinion would be to simply remove all these contributions, because currently a larger problem we are facing as a dictionary is untrustworthiness, which in turn decreases the number of willing editors in these languages. Better to not have any entries in a language than to have hundreds of questionable quality and validity at best. Thadh (talk) 16:35, 13 March 2024 (UTC)Reply
Untrustworthiness is probably mostly based on English entries. Maybe we need to start over with a clean sheet of virtual paper. DCDuring (talk) 18:46, 13 March 2024 (UTC)Reply
@DCDuring: I feel like you are using some kind of tone that isn't being taken over into your writing. What are you saying? That we should re-do our whole dictionary? Also applies to the message below, I'm confused what your opinion is. Thadh (talk) 19:46, 13 March 2024 (UTC)Reply
I found the argument given spurious. If our problem is that we are thought untrustworthy, I find it hard to believe that the problem can be anywhere other than English entries. If untrustworthiness is a reason to delete content, then it is English entries that should be deleted. DCDuring (talk) 20:02, 13 March 2024 (UTC)Reply
Most of the readers I interact with don't even use the English entries, so I think you're talking about a whole other reader base. If a language's sections don't have any references and half of the time feature an incorrect translation, then this is a disservice to the readers, and we should remove or improve those sections. If you think an English entry does not fulfill our CFI, you should RFV it, too, but mostly our English entries are pretty well-formed and represent the language adequately, as they are proofread by hundreds of native speakers. Not at all the case with our other language sections. Thadh (talk) 20:51, 13 March 2024 (UTC)Reply
I've barely glanced at English entries except occasionally to make Romance etymologies that bleed into them more consistent. Nicodene (talk) 12:52, 15 March 2024 (UTC)Reply
I guess that either we don't need no stinking first-draft-level Greenlandic entries from a volunteer or we should be trying to recruit someone (from where?) to add them from scratch. DCDuring (talk) 18:44, 13 March 2024 (UTC)Reply
Perhaps we could see whether there is some other language's wiktionary that has some good Greenlandic entries. da.wikt? is.wikt? DCDuring (talk) 20:02, 13 March 2024 (UTC)Reply
@DCDuring Alas, no. da.wikt has ~ 100 Greenlandic entries and they're all extremely basic stubs with a single-word definition and nothing more. is.wikt is even worse, with only 2 basic stub Greenlandic entries. In general, many non-English Wiktionaries are slim pickings; the entries are typically OK only for the native language of the Wiktionary in question and often not even then. For many languages, en.wikt does a far better job than the corresponding language's own Wiktionary. Benwing2 (talk) 07:28, 14 March 2024 (UTC)Reply
It seems a shame that there are so many resources for Greenlandic available from the Greenland Language Secretariat to evaluate and improve the first-draft/stub entries, but we can't find the linguistic talent motivated to improve the entries. Oh well. DCDuring (talk) 14:30, 14 March 2024 (UTC)Reply
@DCDuring You are welcome to do the entries yourself. Theknightwho (talk) 00:35, 15 March 2024 (UTC)Reply
Needless to say, please do consult a grammar (or more) before doing so. Thadh (talk) 09:10, 15 March 2024 (UTC)Reply
I'll just request Greenlandic translations for the organisms that sometimes live there. Maybe I'll venture to add a Greenlandic entry for them too. DCDuring (talk) 12:48, 15 March 2024 (UTC)Reply

Hoping to convene on practice regarding natural overlap of hyponyms and derived terms

There are many nouns for which a population of hyponyms and a population of derived terms will quite naturally have a substantial overlap. For example, in English, the noun list has laundry list, punch list, and dozens more. The theme is quite generalizable. What I propose here is to codify the principle that it is not wrong to show such terms both in a hyponyms section and in a derived terms section. Earlier I had avoided doing so because I anticipated that otherwise someone else might complain that having the same term twice on the page was "clutter". But there are some good reasons, regarding w:structured data, why allowing for the natural degree of double-posting is a good idea. Does anyone strenuously object to doing so? Note that column wrappers can be used, so there is no excuse for a section with lots of content not to auto-collapse. Thus, the user will not be presented with a giant unfolded list. Thanks. Quercus solaris (talk) 17:34, 13 March 2024 (UTC)Reply

Definitely not wrong, actually encouraged inmo. Thadh (talk) 17:37, 13 March 2024 (UTC)Reply
I support this. "Laundry list" is both a derived term from "list" as well as a hyponym of "list". CitationsFreak (talk) 18:38, 13 March 2024 (UTC)Reply
It is pure clutter on taxonomic name entries. I have limited derived terms to items that are not hyponyms, i.e., accepted species names are hyponyms, not derived terms. No-longer-accepted species names that have been placed in other genera may appear as derived terms. The rule-driven, mechanical nature of the derivation of almost all tribe and family names and many order names often makes their inclusion in derived terms lists of similarly low value, insufficient to warrant the clutter. OTOH, if we really want this duplication, there is nothing to prevent a bot from doing the job thoroughly. DCDuring (talk) 20:14, 13 March 2024 (UTC)Reply
User:Sae1962 had a frequent habit of adding extremely basic hypernyms (i.e. going the other way), so he would take something like Hypertext Markup Language and add a hypernym of language. I consider this fairly unhelpful to human readers, though technically correct: it reminds me of reading Java or .NET programming documentation where you have a huge list of derived or inherited classes going all the way back to object, because everything is an object in the end. Equinox 20:20, 13 March 2024 (UTC)Reply
I thought the Derived terms section was only for terms that could not fit under Hyponyms or some other section (though I suppose it would nearly always be Hyponyms). I think there are at least some pages that have an HTML comment in the Derived terms section warning users not to add terms that could be put into a Hyponyms section or some other section. But I didn't bookmark anything and I don't see it on the policy page. Soap 20:53, 13 March 2024 (UTC)Reply
Great points everyone. Thanks. Perhaps not a firm rule to be codified now. Wiktionary could impose firm consistency retroactively, later, if it ever feels the need. So where I'll leave it is that I'll respect and obey any existing setups that don't double-post (such as taxonomy, or entries with comments discouraging it). And I'll follow the principle that whichever method is used, just make sure that auto-collapse is keeping everything nice and orderly. Quercus solaris (talk) 16:08, 14 March 2024 (UTC)Reply

Renaming "etymology-only language"

@Theknightwho, -sche I think the time has come to rename the term "etymology-only language" to something else. This term is cumbersome, and while it was accurate originally when the codes in question could be used only in etymology templates, it's long outgrown that particular use case. I would propose one of "dialect", "subvariety" or "sublect". "Dialect" is the most straightforward and arguably is exactly what these varieties are in most cases, but it's a bit of a loaded term given the longstanding language-vs-dialect controversy that happens with many language varieties. Thoughts? Benwing2 (talk) 04:57, 14 March 2024 (UTC)Reply

I'll just add we treat Middle Polish as such, and I'm not sure dialect would be the best term for it. Unless we accept "dialect" to mean "any variant of"... Vininn126 (talk) 07:21, 14 March 2024 (UTC)Reply
@Vininn126 Good point. That is why I suggested "subvariant" and "sublect". "Variant" on its own could sort of work but it feels too vague without some other qualifier, since "variant" and "lect", at least in some contexts, are generic terms covering any type of language. Benwing2 (talk) 07:25, 14 March 2024 (UTC)Reply
@Benwing2, Vininn126: May I suggest merolect? The word sees no established usage, so we can thereby avoid any undesirable connotations, and with the etymological sense of "part-language", it means exactly what we want it to mean. 0DF (talk) 14:33, 14 March 2024 (UTC)Reply
Variant is succinct, and covers all the different kinds of etym-only language: dialects, chronolects, regional varieties, written standards etc. Theknightwho (talk) 14:37, 14 March 2024 (UTC)Reply
@Benwing2, at el.wikt we mark them as 'sublang', i.e. subordinate languages. The weird thing here on en.wikt is that they can be donors but not receivers. How is this possible? At el.wikt, the 'subordinate' or 'hosted' languages/varieties/dialects/whatever have both Cat:Terms derived from this.sublang (as donors to other languages) and Cat:Sublang terms derived from X.language (as receivers), e.g. MedLat alchemia at wikt:el:alchemia has a Cat:Med.Lat terms borrowed from arabic. ‑‑Sarri.greek  I 15:03, 14 March 2024 (UTC)Reply
@Sarri.greek I'm not keen on this; I'm not sure about Greek, but in English the term "subordinate" implies a lesser status, which is likely to put some contributors off. Theknightwho (talk) 16:23, 14 March 2024 (UTC)Reply
I meant, @Theknightwho, that they are marked in the module with sublang=true. If the question is about the 'name' for all of them, it doesn't matter. But how could this title convey that these languages are not allowed what the others are? They are code-only languages that exist only in the template {{m}} and as donors, but never receivers, in etymology templates. My big surprise, worry, and question is: why are they not receivers?? Probably this is not the place to ask this. I just bring it up because it is relevant, and because I do not intend to open such a subject myself. ‑‑Sarri.greek  I 16:41, 14 March 2024 (UTC)Reply
@Sarri.greek We do often include them in descendant sections. Also, the elephant in the room is Chinese, which we already subdivide for this purpose; we simply group them all under one header. Theknightwho (talk) 16:44, 14 March 2024 (UTC)Reply
Please, please, Sir, think about it! @Theknightwho, Benwing2, why should etymologies be inaccurate? Medieval Latin alchemia and similar LaMed words are put in Cat:Latin terms derived from Arabic, which should include only a subcategory: Medieval Latin terms derived from Arabic (cf. @el.wikt.Cat.Lat.from.Ar, which has only this subcat). Shouldn't the etymologies of descendants say 'from Med.Lat.' rather than 'from Lat.', since it is a medieval word? ‑‑Sarri.greek  I 16:58, 14 March 2024 (UTC)Reply
@Sarri.greek This is an orthogonal point. I think you're asking for allowing etym-only languages in the |1= param of etym templates and categorizing under e.g. CAT:Medieval Latin terms derived from Arabic in addition to CAT:Latin terms derived from Arabic. We don't currently do this but the trend is towards allowing etym-only languages in more places (hence this renaming discussion), so potentially we could allow this. IMO though this should be a separate discussion from what we should rename "etym-only language" to. Benwing2 (talk) 20:45, 14 March 2024 (UTC)Reply
Agreed. Vininn126 (talk) 20:47, 14 March 2024 (UTC)Reply
May I suggest variety? It is the most neutral commonly-used term that comes to mind. ‘A variety of Spanish’ brings up some seven million hits on Google. Nicodene (talk) 15:41, 14 March 2024 (UTC)Reply
Yeah, this is probably a better suggestion than "variant", and is a widely-used term. Theknightwho (talk) 16:20, 14 March 2024 (UTC)Reply
So far variety is my top option. I understand the logic of merolect but I think we should avoid obtusisms if possible. Vininn126 (talk) 16:23, 14 March 2024 (UTC)Reply
@Nicodene, Theknightwho, Vininn126: I'm also happy with variety. 0DF (talk) 17:47, 14 March 2024 (UTC)Reply
"Dialect" isn't ideal, because we already have dialectal data modules. Likewise, "variety" isn't ideal, because language data has a field for "varieties" that is just a list of names (see e.g. Category:English language). — SURJECTION / T / C / L / 18:41, 14 March 2024 (UTC)Reply
@Surjection I’d argue the opposite: the two listed for English are both lects which would benefit from having a code of this type, so naming them “variety codes” makes total sense. Theknightwho (talk) 19:45, 14 March 2024 (UTC)Reply
Can you say the same of every "variety" currently specified for every language? — SURJECTION / T / C / L / 20:26, 14 March 2024 (UTC)Reply
Do you have an alternative suggestion? Vininn126 (talk) 20:27, 14 March 2024 (UTC)Reply
My point is that we should avoid adopting terminology that is already used for something else. It's only going to make everything more confusing than it is. — SURJECTION / T / C / L / 20:42, 14 March 2024 (UTC)Reply
I don't think this is something else, though - the whole point of the varieties field is to list more specific types of the main language, which is precisely what these codes are for. I've not been able to find a counter-example to that yet, since alternative names for the language itself should go under the "aliases" field instead. Theknightwho (talk) 21:24, 14 March 2024 (UTC)Reply
@Surjection I think we shouldn't worry about existing internal names. The "dialectal data modules" are probably going away in any case (see my post in the Grease pit) and we can rename the language data field. Benwing2 (talk) 20:40, 14 March 2024 (UTC)Reply
BTW since most people seem to support the term "variety", maybe we can call the internal data field "variant" or "lect". Benwing2 (talk) 20:41, 14 March 2024 (UTC)Reply
Sure, renaming that field is another option, and then we can call etymology-only language "varieties". — SURJECTION / T / C / L / 20:42, 14 March 2024 (UTC)Reply
@Surjection Can you provide an example of something which should go under the "varieties" field in the language data which shouldn't ever have an etymology-only code? Theknightwho (talk) 21:25, 14 March 2024 (UTC)Reply
I'm not saying I know of any such cases. What I am saying is that nobody knows, until the work to check them is put in, that all of the currently registered "varieties" could reasonably have their own codes. — SURJECTION / T / C / L / 21:31, 14 March 2024 (UTC)Reply
Alright - I'll do that. Theknightwho (talk) 21:56, 14 March 2024 (UTC)Reply
Just FYI I suspect that everything that qualifies as a "variety" under the variety field can reasonably have an etym-only code. We have current etym-only codes for conventional dialects (regional lects/topolects), chronolects (e.g. Early Modern English), registers/sociolects (e.g. Katharevousa), cants (e.g. Polari), even writing systems (e.g. Wade-Giles). It might be useful to set up the ability to categorize etym-only varieties by the type of lect involved; currently this info is found only in the associated category and only at the level of regional lect vs. everything else. Also, if we get serious about adding etym-only codes for all varieties, we might want to split the data into submodules the way we currently do for full languages. Benwing2 (talk) 22:32, 14 March 2024 (UTC)Reply
If we're adding etym-only codes for all varieties, do we still need a "varieties" field in Module:languages, or is it just redundant to Module:etymology languages/data and its "parent"/"3" field? (I think the/an original reason varieties were listed in Module:languages is so people searching the module for e.g. Twi would find what code covered it; we might want to retain "varieties" that have ISO codes but let Module:etymology languages/data handle all the other ones...?) - -sche (discuss) 23:10, 14 March 2024 (UTC)Reply
My (half-serious) suggestion last time this came up was "subsumed variety", since that seems to be the distinguishing characteristic (?), that these are codes that are subsumed under other codes. There are some edge cases like substrates which none of the proposed names fit, e.g. the pre-Roman substrate of the Balkans—or as it was recently (non-consensusly?) renamed, Paleo-Balkan—is not really a "dialect" or "subvariety" or "subsumed variety" of anything, it's "one or more unknown languages from place X". I agree with Surjection we shouldn't be using the same name for two nonidentical things, so if we call these "varieties", we should consider whether to rename or retire the "varieties" field in Module:languages as discussed above.
BTW, re "dialect", another issue with that term is: are things like "Classical Latin" and "Late Latin" "dialects", per se? (If they are, we need to update the entry dialect.) - -sche (discuss) 23:10, 14 March 2024 (UTC)Reply

Wiktionary really needs structured etymology

I've become convinced that Wiktionary's current etymology system, in which each entry contains the complete ancestry of a term, is creating massive problems that prevent Wiktionary from being a good etymological dictionary. Here are the problems:

  • Massive duplication: consider English puny and its earlier form puisne. We repeat the exact same information in different entries, blatantly violating the DRY principle. That's just two entries: the etymology of a widely-borrowed term like the ancestor of English sugar has to be duplicated across hundreds of languages. Often, editors don't bother and just write something like "see term#English for more details".
  • Entries falling out of sync: English nexus claims to derive from Proto-Indo-European *gned- or *gnod- through Latin necto. But necto was recently revised, claiming that its origin is "uncertain". Which entry is a reader meant to trust? This kind of inconsistency is actually encouraged by the current system, because after editing the etymology of a term, an editor has to hunt down and correct every single place where that etymology is referenced or copied. More often than not, they don't, and the result is that entries can drift out of sync and sometimes even contradict each other.
  • Redundant edits: editors spend large amounts of time expanding "derived terms" and "descendants" sections, work that is necessary only because of limitations in the current system. If we know that A is an ancestor of B, there's no point in also writing that B is a descendant of A; that's clearly implied. But we have to anyway, since there's no automated system that can make that logical step.

Structured etymologies would also let us do cool things, like create etymological trees and automatically find cognates and doublets across different languages.
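Here is a minimal Python sketch of the cognate/doublet idea. It assumes each entry records only its immediate etymons as (term, language code) pairs. The data is simplified and hand-written (the Middle English stage is skipped), and the helper names are hypothetical. "Cognate" is used loosely here to mean any term in another language that shares an ancestor.

```python
# Simplified, hand-written example data (Middle English stage skipped);
# keys are (term, language code), values are the immediate etymons.
ETYMONS = {
    ("frail", "en"): [("fraile", "fro")],
    ("fraile", "fro"): [("fragilis", "la")],
    ("frêle", "fr"): [("fraile", "fro")],
    ("fragile", "en"): [("fragile", "fr")],
    ("fragile", "fr"): [("fragilis", "la")],
}


def ancestors(node, etymons):
    """Every term reachable by following etymon links upward."""
    seen, stack = set(), list(etymons.get(node, []))
    while stack:
        cur = stack.pop()
        if cur not in seen:
            seen.add(cur)
            stack.extend(etymons.get(cur, []))
    return seen


def relatives(node, etymons):
    """Terms sharing an ancestor with `node` that are not its ancestors or descendants."""
    mine = ancestors(node, etymons)
    found = []
    for other in etymons:
        if other == node or other in mine:
            continue
        theirs = ancestors(other, etymons)
        if node not in theirs and mine & theirs:
            found.append(other)
    return found


def cognates(node, etymons):
    return sorted(o for o in relatives(node, etymons) if o[1] != node[1])


def doublets(node, etymons):
    return sorted(o for o in relatives(node, etymons) if o[1] == node[1])


print(doublets(("frail", "en"), ETYMONS))  # [('fragile', 'en')]
print(cognates(("frail", "en"), ETYMONS))  # [('fragile', 'fr'), ('frêle', 'fr')]
```

Both queries reduce to "shares an ancestor, same or different language", so they come for free once the links exist.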

Here is a simple model for creating structured etymologies:

  • Each etymology section of an entry needs to be associated with one or more etymons. An etymon is a term which is the ancestor of another with no intermediate steps. Thus, the etymon of puny is puisne, the etymon of puisne is Anglo-Norman puisné, and so on.
  • An entry can have more than one etymon. For example, English arrangement can be said to derive from English arrange, English -ment, and French arrangement.
  • There are different kinds of etymons: English fullwidth clearly derives from full +‎ width, but is also calqued from Japanese 全角 (zenkaku). The first two are morphological etymons, while the last is a semantic etymon. Another example is bullroar (sense 3) which is morphologically from bull +‎ roar but semantically from bullshit.
  • An etymon can also have a degree of certainty: the levels might be "certain", "likely", and "unlikely". This is an improvement from the current system, where something is either {{derived}} or it isn't. Sometimes, when editors aren't confident, they add |nocat=1, but this isn't a standardized practice.
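To make the model concrete, here is a hedged sketch of how an etymology section with typed, graded etymons might be represented. The class and field names are hypothetical. The three certainty levels are the ones proposed above, and the example encodes fullwidth.

```python
from dataclasses import dataclass, field
from enum import Enum


class Kind(Enum):
    MORPHOLOGICAL = "morphological"  # the form is built from this etymon
    SEMANTIC = "semantic"            # the meaning is modelled on this etymon


class Certainty(Enum):
    CERTAIN = "certain"
    LIKELY = "likely"
    UNLIKELY = "unlikely"


@dataclass(frozen=True)
class Etymon:
    term: str
    lang: str
    kind: Kind = Kind.MORPHOLOGICAL
    certainty: Certainty = Certainty.CERTAIN


@dataclass
class EtymologySection:
    term: str
    lang: str
    etymons: list = field(default_factory=list)

    def by_kind(self, kind):
        """Terms of the etymons of one kind, in the order they were listed."""
        return [e.term for e in self.etymons if e.kind is kind]


fullwidth = EtymologySection("fullwidth", "en", [
    Etymon("full", "en"),
    Etymon("width", "en"),
    Etymon("全角", "ja", kind=Kind.SEMANTIC),
])

print(fullwidth.by_kind(Kind.MORPHOLOGICAL))  # ['full', 'width']
print(fullwidth.by_kind(Kind.SEMANTIC))       # ['全角']
```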

Thus, to create a list of derived terms or descendants, all you would need to do is get a list of entries which have a particular term as an etymon.
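That inversion step can be sketched in a few lines. The sketch assumes a hypothetical mapping from each entry to its immediate etymons. Derived terms fall out of reversing the edges, and a full descendant list is a walk over the reversed graph. The data and function names are illustrative only.

```python
from collections import defaultdict

# Hypothetical structured data: each entry lists only its immediate etymons.
ETYMONS = {
    ("puny", "en"): [("puisne", "en")],
    ("puisne", "en"): [("puisné", "xno")],
    ("arrangement", "en"): [("arrange", "en"), ("-ment", "en"), ("arrangement", "fr")],
}


def invert(etymons):
    """Map each etymon to the entries that name it (its direct derivations)."""
    children = defaultdict(list)
    for entry, parents in etymons.items():
        for parent in parents:
            children[parent].append(entry)
    return {parent: sorted(kids) for parent, kids in children.items()}


def all_descendants(node, children):
    """Walk the inverted graph downward, skipping anything already visited."""
    seen, stack = [], list(children.get(node, []))
    while stack:
        cur = stack.pop()
        if cur not in seen:
            seen.append(cur)
            stack.extend(children.get(cur, []))
    return seen


CHILDREN = invert(ETYMONS)
print(CHILDREN[("arrange", "en")])                   # [('arrangement', 'en')]
print(all_descendants(("puisné", "xno"), CHILDREN))  # [('puisne', 'en'), ('puny', 'en')]
```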

The main problem to consider is how all this can be accomplished. Here are the possibilities:

  1. Use Lua data modules, which are well-established on this wiki but are fairly unintuitive for new users and might cause performance issues.
  2. Get an extension like Wikibase, which is already used on Wikidata. To be clear, I'm not saying we should turn Wiktionary into Wikidata, but rather use a Wikidata-like structure for this application. The drawback is that this would require WMF developers to get something done (not their strong suit).
  3. Use bots. A bot can essentially function as a parser which converts high-level information into wikitext. This kind of thing is already being done with our {{anagrams}} system. This is the technically simplest solution, but would require someone to continually run a bot.
  4. Do nothing and keep writing etymologies manually. This is easier for now, but probably not a great long-term solution.
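For option 3, a bot-side renderer might look roughly like this. It is a sketch under assumptions. The input tuple format is hypothetical, and emitting the existing {{af}}, {{der}} and {{cal}} templates is only one possible mapping, not an agreed convention. Anything the sketch cannot map is returned for manual review.

```python
def tmpl(*args):
    """Join arguments into a template call, e.g. tmpl("der", "en", "fr", "x")."""
    return "{{" + "|".join(args) + "}}"


def render(lang, etymons):
    """Turn (term, source_lang, kind) tuples into template calls for a bot to write.

    Returns the calls plus any etymons this sketch does not know how to render.
    """
    affixes, calls, review = [], [], []
    for term, src, kind in etymons:
        if kind == "morphological" and src == lang:
            affixes.append(term)  # same-language parts go into one {{af}} call
        elif kind == "morphological":
            calls.append(tmpl("der", lang, src, term))
        elif src != lang:
            calls.append(tmpl("cal", lang, src, term))  # foreign semantic model
        else:
            review.append(term)  # e.g. a same-language semantic source
    if affixes:
        calls.insert(0, tmpl("af", lang, *affixes))
    return calls, review


print(render("en", [("full", "en", "morphological"),
                    ("width", "en", "morphological"),
                    ("全角", "ja", "semantic")]))
# (['{{af|en|full|width}}', '{{cal|en|ja|全角}}'], [])
```

A case like bullroar (sense 3), whose semantic source is itself English, lands in the review list. That reflects the point above that some etymologies still need descriptive text.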

Another problem to consider is how this structured information should be presented to the reader. But all in all, I'm curious as to what the community thinks should be done. Ioaxxere (talk) 23:18, 14 March 2024 (UTC)Reply

I agree that the current way etymologies are handled is problematic. If there is some technical solution to that, it would be great; I don't understand that side of things so am not sure what can be done.--Urszag (talk) 00:17, 15 March 2024 (UTC)Reply
@Ioaxxere IMO none of your proposed solutions is workable. I would rather suggest a scraping solution. This is what we do for Descendants, for example, and it seems to work fairly well. (Your proposed solution #1 was tried for Descendants prior to implementing scraping, and failed, which led to the scraping solution.) This should not be too hard, but it might require the introduction of a few more templates to more clearly spell out the relations between etyms. Not sure. Benwing2 (talk) 00:46, 15 March 2024 (UTC)Reply
@Benwing2 By scraping, do you mean creating an etymology equivalent of {{desctree}}? I think the main limitation of this is that you can only go in one direction (i.e. you wouldn't be able to use the etymology data to get a list of descendants). But I would definitely consider that an improvement over the current situation.
Actually: I thought of a way to resolve this, by using the category system. We already have categories like Category:English terms suffixed with -en. If we created categories for every single "terms descended from LEMMA" (there would be millions) this could theoretically be used to encode a tree. What do you think? Ioaxxere (talk) 02:15, 15 March 2024 (UTC)Reply
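Here is a rough sketch of how such per-lemma categories could encode a tree. It assumes each "terms descended from L" category holds only the entries whose immediate etymon is L. The membership data is hypothetical and simplified. Recursing through it yields a bulleted list like the one a Descendants section shows.

```python
# Hypothetical category contents: the category for lemma L lists only the
# entries whose immediate etymon is L, so nesting the categories gives a tree.
MEMBERS = {
    "Latin fragilis": ["Old French fraile", "French fragile"],
    "Old French fraile": ["French frêle", "English frail"],
}


def desc_lines(lemma, members, depth=1, seen=None):
    """Rebuild a bulleted descendants list by recursing into member categories."""
    seen = set() if seen is None else seen  # list each term once, even with two etymons
    lines = []
    for child in members.get(lemma, []):
        if child in seen:
            continue
        seen.add(child)
        lines.append("*" * depth + " " + child)
        lines.extend(desc_lines(child, members, depth + 1, seen))
    return lines


print("\n".join(desc_lines("Latin fragilis", MEMBERS)))
# * Old French fraile
# ** French frêle
# ** English frail
# * French fragile
```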
@Ioaxxere Yes, something like {{desctree}}. It's true this wouldn't easily let you go from lists of descendants to ancestors and vice-versa, but (a) it would solve the other issues, (b) it's not clear in any case you would want to automate things in both directions in all situations; there are lots of complex cases involving etymologies that can't be neatly categorized and need descriptive text, and the Descendants lists and Etymology sections are conceptually different in their current implementations. Since the etymology sections are less structured than Descendants sections, some thought would have to go both into the conventions needed in Etymology sections so that the scraping result looks reasonable, and into how to implement the scraping itself and handle the various edge cases. Not something I have time to work on now but I agree it would be a good idea in the longer run and avoid lots of duplication and the inevitable bit rot associated with this (maybe there's a better term than "bit rot" to describe the inevitability of things getting out of sync when you have duplication). Benwing2 (talk) 03:28, 15 March 2024 (UTC)Reply
I wholeheartedly agree but think you may be underestimating the sheer scale of the undertaking. Even with bot help it will take a lot of hands-on work by knowledgeable editors. My instinct is to pare this down to the basic core: set etymologies to point only one lemma higher and add the kind of 'scraper' currently being discussed. (And then call in the clean-up crew for a massive spectrum of languages...) As someone who edits mainly ety and desc sections, this alone, if feasible, would clear up 80% of my headaches. Nicodene (talk) 05:39, 15 March 2024 (UTC)Reply
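Suppose each entry stored only a pointer to its immediate etymon. The scraper could then rebuild the full chain on demand, roughly as follows. The data and names are hypothetical, and the cycle check is an added safeguard rather than part of the proposal.

```python
# Hypothetical data: each entry stores only its immediate etymon.
PARENT = {
    "English puny": "English puisne",
    "English puisne": "Anglo-Norman puisné",
}


def chain(entry, parent):
    """Follow one-level pointers upward until an entry has no etymon recorded."""
    steps, seen = [], {entry}
    while entry in parent and parent[entry] not in seen:  # stop on accidental cycles
        entry = parent[entry]
        seen.add(entry)
        steps.append(entry)
    return steps


def etymology_text(entry, parent):
    """Render the assembled chain as a plain 'From X, from Y.' string."""
    steps = chain(entry, parent)
    return "From " + ", from ".join(steps) + "." if steps else ""


print(etymology_text("English puny", PARENT))
# From English puisne, from Anglo-Norman puisné.
```

When the pointer chain ends, the sketch simply stops. That matches the idea of pointing to the next entry that exists and leaving a dead end as-is.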
I don't disagree with 'make etymologies only point to the next level up' as a goal, but the issues that have come up when that's been discussed before, which you are probably aware of but which I want to make sure are mentioned here now that the idea itself has been mentioned, include (1) what if the next level up doesn't have an entry? e.g. I just added an etymology to chabot, but neither the Occitan chabotz nor the Latin capoceus exists to house the information that they ultimately seem to come from caput. (maybe in that case we add the full ety to chabot but also add a template that categorizes the entry as needing Occitan and Latin editors to help by creating those entries and moving the information thither?) (2) someone who's only interested in e.g. the etymology of English or French words now has to watchlist Occitan and Latin (etc) pages to see if that etymology gets changed; in some cases where minor languages see vandalism, this would make it less likely to be spotted (e.g. people try to add all kinds of weirdness to Kamboja, and if they were adding it to some less-watched Indian language instead, it'd get noticed less). (Also 3: if I want to know how many Old English words survive in English, and our English entries only point to Middle English, I'm stymied... but that one we could solve if the scraper/bot mass-adds that template that people currently use to add categories to etymologies.) - -sche (discuss) 06:39, 15 March 2024 (UTC)Reply
For 1) I meant the next entry up which exists, and if it's a dead end, the etymology is left as-is. The proposed change wouldn't affect the chabot situation one way or another. For 2) I don't have an answer. Is vandalism that bad of a problem? Maybe I've just not noticed since I'm not the one dealing with it. 3) I wouldn't do this without including some kind of automatic categorization that runs through the etymology chain, or maybe regular bot sweeps that add/fix dercat. Not that I know how feasible that is, now that you mention it. Nicodene (talk) 09:44, 15 March 2024 (UTC)Reply
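For point 3, categorization could be recomputed by walking the chain, along the lines of this sketch. It assumes hypothetical (language, term) pointers and only reuses the existing "X terms derived from Y" naming pattern. Whether this runs in a template or in a bot sweep is left open above.

```python
# Hypothetical one-level pointers: (language name, term) -> immediate etymon.
PARENT = {
    ("English", "word"): ("Middle English", "word"),
    ("Middle English", "word"): ("Old English", "word"),
    ("Old English", "word"): ("Proto-West Germanic", "*word"),
}


def derived_categories(node, parent):
    """One 'X terms derived from Y' category per other-language ancestor in the chain."""
    lang = node[0]
    cats, seen = [], {node}
    while node in parent and parent[node] not in seen:
        node = parent[node]
        seen.add(node)
        cat = f"{lang} terms derived from {node[0]}"
        if node[0] != lang and cat not in cats:
            cats.append(cat)
    return cats


print(derived_categories(("English", "word"), PARENT))
# ['English terms derived from Middle English', 'English terms derived from Old English',
#  'English terms derived from Proto-West Germanic']
```

Counting Old English survivals would then mean checking which English entries receive "English terms derived from Old English", even though each entry only points to Middle English.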
For point 2 with Wikibase, the model is not Wikidata but c:Commons:Structured data. It's a new tab with data that bots can populate from key templates. Lua can access it recursively to create new and more powerful templates. Other tools like SPARQL can query the data. It's all about how to model metadata. Vriullop (talk) 09:33, 15 March 2024 (UTC)Reply
I generally support this idea but I have no strong opinions on what the exact solution should be. I like being able to point to a specific etymon to generate structure from there, however that structure looks. Vininn126 (talk) 10:11, 15 March 2024 (UTC)Reply
I've been thinking about this, and I'm starting to feel that a category system is the only sensible way to implement this. Here's how it would work:
  1. Start at an entry (say biology)
  2. Add an etymon template to the top of the etymology section. It might be formatted like this: {{etymons|en|id=life science|Biologie#German: biology|bio-#English: life|-logy#English: study}}. The |id= parameter defines the {{etymid}}, while the subsequent parameters link to etymons by their etymids.
  3. The {{etymons}} template adds the category Category:ety:biology (English: life science), which represents a node in the etymology tree (although the naming scheme isn't final).
  4. A bot creates the category with {{auto cat}}.
  5. {{auto cat}} scrapes the page biology and discovers the {{etymons}} template. Using this information, it adds Category:ety:biology (English: life science) into the categories Category:ety:Biologie (German: biology), Category:ety:bio- (English: life), and Category:ety:-logy (English: study).
Now, getting the descendants or derived terms of biology is as simple as seeing what entries are in Category:ety:biology (English: life science). There might be subpages, like Category:ety:biology (English: life science)/uncertain or Category:ety:biology (English: life science)/semantic, to include cases I discussed in my original post. But overall, the concept is essentially {{prefixsee}} or {{suffixsee}}, just for every term. @Benwing2, would you support implementing this template? It could coexist in parallel with the current system for now until we figure out a way forward.
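A minimal sketch of steps 2 to 5, written in Python rather than the Lua a real Wiktionary module would use. It assumes the template call has already been split into its parameters; `parse_etymons`, `build_tree` and the `LANG_NAMES` table are hypothetical names for illustration, and the category names follow the proposal's (explicitly not final) scheme. Filing each node's category under its parents' categories, which {{auto cat}} would do in the proposal, is modelled here as a plain dictionary.

```python
from collections import defaultdict
from dataclasses import dataclass

# Hypothetical code-to-name table; the real names live in Wiktionary's language data.
LANG_NAMES = {"en": "English", "de": "German", "fr": "French", "la": "Latin"}


@dataclass(frozen=True)
class Node:
    term: str
    lang: str    # language name as written in the parameter, e.g. "German"
    etymid: str

    def category(self) -> str:
        # Naming scheme from the proposal, which it describes as not final.
        return f"Category:ety:{self.term} ({self.lang}: {self.etymid})"


def parse_etymons(title, params):
    """Parse an already-split {{etymons}} call into (this node, parent nodes).

    params mirrors the call without braces and pipes, e.g.
    ["en", "id=life science", "Biologie#German: biology", ...].
    """
    lang_code, *rest = params
    named, parents = {}, []
    for p in rest:
        if "#" not in p and "=" in p:
            # Named parameter such as id= or q1=; qualifiers are collected but unused here.
            key, value = p.split("=", 1)
            named[key] = value
        else:
            term, _, target = p.partition("#")          # "term#Language: etymid"
            lang, _, parent_id = target.partition(": ")
            parents.append(Node(term, lang, parent_id))
    if "id" not in named:
        raise ValueError("{{etymons}} needs an id= parameter")
    return Node(title, LANG_NAMES[lang_code], named["id"]), parents


def build_tree(pages):
    """Map each parent category to the set of node categories filed under it."""
    children = defaultdict(set)
    for title, params in pages.items():
        node, parents = parse_etymons(title, params)
        for parent in parents:
            children[parent.category()].add(node.category())
    return children
```

With this structure, the descendants of biology are whatever sits under `tree["Category:ety:biology (English: life science)"]`, which is the dictionary equivalent of listing that category's members.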
To answer a few others:
  • @Nicodene: Yes! Having etymologies point only one lemma higher is the entire purpose of this proposal. Because if we have a chain A -> B -> C, there's no reason why C needs to "know" that it comes from A. It's implicit. The problem is that editors spend lots of time writing out the entire chain on every entry, and this is done in an inconsistent way. But there's no rush to implement this on a massive scale right away. As stated above, we should have this coexist with the current system.
  • @-sche: Those are honestly good questions. In the case of A -> B -> C, if B doesn't exist, it might be reasonable to create it as a "dummy entry" with an etymology section and nothing else. Another possibility is to just link A -> C. For the second point, I don't think we should be designing our systems with the expectation of vandalism. But yes, an editor would have to watch a variety of pages to follow an entire etymological chain. However, someone who's really only interested in English etymology wouldn't care about, say, which PIE root an entry comes from, because that's not English etymology. In the case of French chabot, we might do {{etymons|fr|id=fish|caput#Latin: head|q1=uncertain}}, which would add it into Category:ety:caput (Latin: head)/uncertain.
  • @Benwing2: I think the term you're looking for might be entropy.
Ioaxxere (talk) 14:43, 15 March 2024 (UTC)
@Ioaxxere I think we need a solution that can work with existing etymologies. That probably means accessing a chain of data if possible from a single entry, and if that doesn't work, falling back to accessing from multiple entries in a chain. I don't think having an entirely new system in place in addition to the old system will work very well. Benwing2 (talk) 19:32, 15 March 2024 (UTC)
@Benwing2: What you're asking for seems impossible. The current system is ambiguous in that we link to an entry without specifying its etymid, meaning that "going up the chain" is rarely possible to do in an automated way. If your plan then involves specifying etymids for every etymology section, then we might as well just overhaul everything, because it's the same amount of work anyway. Ioaxxere (talk) 20:53, 15 March 2024 (UTC)
@Ioaxxere I think we'd have to examine some actual use cases before deciding it's impossible. In many cases there's only one Etymology section, for example. Benwing2 (talk) 21:00, 15 March 2024 (UTC)
@Benwing: the problem with this heuristic is that a) there's no guarantee that the single-etymology entry is actually the correct one (maybe the actual ancestor hasn’t been added yet) and b) it could break unpredictably if someone adds a new etymology section later. The basic use case, in my view, is to replace our current "from X, from Y, from Z" with just "from X" and have the rest be automatically filled in. Those on the Discord will have seen my struggles in trying to do just this. Ioaxxere (talk) 21:29, 15 March 2024 (UTC)
@Ioaxxere Sure but realistically I don't think trying to implement a completely new system will work. We need to find a solution that leverages what's already there. Benwing2 (talk) 21:35, 15 March 2024 (UTC)
@Benwing2: We can leverage our current data by using bots to convert etymology sections into a structured format, but this can only be done in situations where we are certain that errors won't be propagated. For example: if A is listed as an ancestor of B#Etymology_2, and we find B listed as a derived term or descendant at A#Etymology_2, we can fairly confidently connect A#Etymology_2 and B#Etymology_2. I have implemented this heuristic in my own script and it works very well. Ioaxxere (talk) 03:26, 16 March 2024 (UTC)
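The two-way check described above could look roughly like this. It is not Ioaxxere's script but a hypothetical sketch: it assumes each "Etymology N" section has already been scraped into the set of pages it links as ancestors and the set of pages it lists under Descendants or Derived terms. The extra rule that exactly one section of the ancestor page must confirm the link is an added assumption, meant to leave ambiguous cases to a human.

```python
from dataclasses import dataclass, field


@dataclass
class Section:
    page: str
    etym: int                                      # N in "Etymology N"
    ancestors: set = field(default_factory=set)    # pages linked as ancestors (no section info)
    descendants: set = field(default_factory=set)  # pages under Descendants / Derived terms


def confirmed_links(sections):
    """Return ((ancestor page, N), (child page, M)) pairs supported in both directions."""
    by_page = {}
    for s in sections:
        by_page.setdefault(s.page, []).append(s)
    links = []
    for child in sections:
        for anc_page in sorted(child.ancestors):
            # Sections of the ancestor page that list the child page back.
            matches = [a for a in by_page.get(anc_page, []) if child.page in a.descendants]
            if len(matches) == 1:  # skip anything unconfirmed or ambiguous
                a = matches[0]
                links.append(((a.page, a.etym), (child.page, child.etym)))
    return links
```

With toy sections where only fader's Etymology 1 lists father back, the result is a single link from fader (Etymology 1) to father (Etymology 1); if both fader sections listed father, nothing would be linked.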
@Ioaxxere Just FYI, before you set off to radically restructure etymologies, you need to (a) get consensus, (b) keep in mind what will be workable for the typical editor; ideally the system should be as little different as possible from what we have already. It's also better to do this stuff dynamically through scraping if at all possible, vs. requiring a bot to run periodically. Benwing2 (talk) 04:16, 16 March 2024 (UTC)
@Benwing2 It seems as though there's consensus for a change of some kind, but no agreement as to how it should be implemented. And that's something I'm also still thinking about... Ioaxxere (talk) 07:32, 17 March 2024 (UTC)
Ben's idea on something like {{desctree}} would imply that {{bor}}/{{inh}} would be able to check the pointed-at etymon and print information from there, and potentially go several pages back. If such a system were implemented, I think {{af}} should obviously be excluded; imagine printing the information for all the morphemes! It also wouldn't work for redlinked pages, just as {{desctree}} doesn't. Vininn126 (talk) 07:42, 17 March 2024 (UTC)
But, as mentioned, every word would need etymids and etymology sections, of course... Vininn126 (talk) 07:48, 17 March 2024 (UTC)
@Benwing2, Vininn126 I created a mockup of my concept at User:Ioaxxere/under. I created a module, Module:User:Ioaxxere/etymon, which can recursively go backwards through various entries to build a chain of etymons (no categories are involved). Currently, it can only handle "From X, from Y"-type etymologies, so we'll need more complex parameters to represent stuff like {{af}}. Please let me know what you think! Ioaxxere (talk) 05:15, 18 March 2024 (UTC)
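The recursive walk described above can be sketched as follows. This is a hypothetical Python illustration, not the Lua in Module:User:Ioaxxere/etymon: it assumes each (language code, term, etymid) key points to at most one immediate etymon, which only covers "From X, from Y"-type etymologies. The data is modelled on the chain for English father, the etymid "parent" is invented for the example, and the walk refuses cycles rather than recursing forever.

```python
# Hypothetical code-to-name table; real names come from Wiktionary's language data.
LANG_NAMES = {
    "en": "English", "enm": "Middle English", "ang": "Old English",
    "gmw-pro": "Proto-West Germanic", "gem-pro": "Proto-Germanic",
    "ine-pro": "Proto-Indo-European",
}

# Each (language code, term, etymid) key points to its immediate etymon only.
# Illustrative data modelled on English "father"; the etymid "parent" is invented.
PARENT = {
    ("en", "father", "parent"): ("enm", "fader", "parent"),
    ("enm", "fader", "parent"): ("ang", "fæder", "parent"),
    ("ang", "fæder", "parent"): ("gmw-pro", "*fadar", "parent"),
    ("gmw-pro", "*fadar", "parent"): ("gem-pro", "*fadēr", "parent"),
    ("gem-pro", "*fadēr", "parent"): ("ine-pro", "*ph₂tḗr", "parent"),
}


def etymon_chain(key, parent_of, seen=None):
    """Follow parent links recursively, returning [key, parent, grandparent, ...]."""
    seen = set() if seen is None else seen
    if key in seen:
        raise ValueError(f"cycle in etymology data at {key}")
    seen.add(key)
    parent = parent_of.get(key)
    if parent is None:  # top of the chain, or no further entry
        return [key]
    return [key] + etymon_chain(parent, parent_of, seen)


def render_chain(chain, names=LANG_NAMES):
    """Render 'From X, from Y, ...' for every etymon above the starting entry."""
    parts = [f"{names[lang]} {term}" for lang, term, _ in chain[1:]]
    return "From " + ", from ".join(parts) + "." if parts else ""
```

The father entry itself would then only need to name Middle English fader; the rest of the "From X, from Y" text is generated from the chain.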
@Ioaxxere I took a brief look. I really don't want to be a party pooper but my sense is the scraping needs to be a lot more sophisticated and more able to work with existing entries (I feel I've said this before). Manual or even manual-assisted conversion of large numbers of entries to a new system/template just won't be happening any time soon. The reason {{desctree}} works is that it works with existing entries without requiring everything to be converted to a new format (and to the extent things have been converted, like when I changed {{desc}} to accept multiple terms, it's been in a completely automated fashion). Benwing2 (talk) 05:22, 18 March 2024 (UTC)
@Benwing2 Take the example of father. Let's say I want the etymology to be synced up with its etymon, Middle English fader. But wait, do we want fader (Etymology 1) or fader (Etymology 2)? A human would obviously realize that the correct section is Etymology 1. An automated scraping template could easily figure this out as well if we added heuristics like "Etymology 1 is a lot longer" and "Etymology 1 is on top" and "Etymology 1 links to father in its descendants section" and "Etymology 1 and father list the same ancestors" and "Etymology 1 is defined as father". The problem is that these heuristics can get arbitrarily complex and break in unpredictable ways. That's why we should be working towards using etymids.
Also, I have a question about {{desctree}}: how would I get it to scrape bar#Descendants_2? The entry doesn't have etymids, so this doesn't seem to be possible.

Manual or even manual-assisted conversion of large numbers of entries to a new system/template just won't be happening any time soon.

Would you be opposed to trying out a new system on a few entries, such as father and its five ancestors? Like {{desctree}}, this would have no effect on any other entry. Ioaxxere (talk) 06:11, 18 March 2024 (UTC)
@Ioaxxere: I see a number of possible problems. Can you assure me that they aren't?
1. Intermediate steps may be unattested.
2. Uncertainty as to the borrowing route. The OED has this problem with terms that may come from French or some form of Latin, and Thai has many words for which the ultimate source is Pali or Sanskrit, and indeed some which are blends of the two. A further problem is that many of these words were probably (but not certainly) borrowed via Old Khmer, where the spelling is chaotic. The path of mainland SE Asian loans from Pali or Sanskrit may be very uncertain.
3. Clusters of 'obvious' cognates, but for which there is no authoritative proto-form. Tai languages often show this.
4. Word A is derived in language X, and word B in language Y may be inherited from A in X or independently formed in Y. RichardW57m (talk) 16:51, 18 March 2024 (UTC)
@RichardW57m Here are my proposed resolutions.
  1. User:-sche highlighted this issue with entries like French chabot, which a reference suggests derives from Latin caput through Vulgar Latin *capoceus (which is unlikely to ever be created). This could be written as: {{etymon|id=fish|der|unc|la>caput>head|text=perhaps from <1> (via Occitan, from unattested {{m+|VL.|*capoceus}})}}.
    In natural language, this represents: French chabot (etymid: fish) may be derived from Latin caput (etymid: head), but this is uncertain. Also, the entry should display the text "perhaps from Latin caput (via Occitan, from unattested Vulgar Latin *capoceus)".
    The template would also be able to automatically fetch the ancestors of Latin caput (etymid: head), although we probably don't want that in this case.
  2. One example of this is in English crusado, which is partially borrowed from Spanish cruzado as well as Portuguese cruzado. This could be written as: {{etymon|id=crusader|bor|es>cruzado>cross|pt>cruzado>cross}}
    In natural language, this represents: English crusado (etymid: crusader) is borrowed from either/both Spanish cruzado (etymid: cross) and Portuguese cruzado (etymid: cross). The template would automatically generate the text "Borrowed from Spanish cruzado and/or Portuguese cruzado." in the entry (the |text= parameter could be used to change the display text).
  3. If there is no proto-form, the {{etymon}} template wouldn't have any ancestors listed and wouldn't be very useful. If for some reason the only thing we knew about English king was that it was cognate with German König, the entry might have: {{etymon|id=monarch}} Cognate with {{m+|de|König}}. (Note: for now, I'm not sure if it's possible to automatically get cognates as we would have to go up and then down the etymology tree, although it might be possible with category stuff).
  4. This one's simple: English unlock (for example) is from Middle English unloken but also equivalent to un- +‎ lock. This could be written as: {{etymon|id=open lock|enm>unloken>unlock|afeq|un->inverse|lock>mechanism}}.
    In natural language, this represents: English unlock (etymid: open lock) is inherited from Middle English unloken (etymid: unlock), and is also equivalent to English un- (etymid: inverse) + English lock (etymid: mechanism). The template would automatically generate the text "From Middle English unloken, equivalent to un- +‎ lock." in the entry.
Ioaxxere (talk) 18:32, 18 March 2024 (UTC)
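A rough sketch of how the proposed {{etymon}} parameters might be parsed and turned into display text. Everything here is an assumption reconstructed from the four examples above: the keyword set (bor, der, unc, afeq), the `lang>term>etymid` syntax with the language code omitted for the entry's own language, `<1>` as a placeholder inside text=, and the "perhaps" wording used for unc when no text= is given. The function names are hypothetical. The sketch takes already-split parameters, so nested templates inside text= are passed through untouched, and it joins affixes with a plain " + ".

```python
import re
from dataclasses import dataclass, field
from typing import List, Optional, Set

# Hypothetical code-to-name table; real names come from Wiktionary's language data.
LANG_NAMES = {"en": "English", "enm": "Middle English", "es": "Spanish",
              "pt": "Portuguese", "la": "Latin", "fr": "French"}
KEYWORDS = {"bor", "der", "unc", "afeq"}  # bare keywords used in the examples


@dataclass
class Etymon:
    lang: str
    term: str
    etymid: str


@dataclass
class EtymonCall:
    etymid: Optional[str] = None
    flags: Set[str] = field(default_factory=set)
    sources: List[Etymon] = field(default_factory=list)      # etymons before "afeq"
    equivalents: List[Etymon] = field(default_factory=list)  # etymons after "afeq"
    text: Optional[str] = None


def parse_etymon(params, entry_lang):
    """Parse an already-split {{etymon}} call; entry_lang is the entry's own code."""
    call = EtymonCall()
    for p in params:
        named = re.fullmatch(r"(\w+)=(.*)", p, re.S)
        if named:
            key, value = named.groups()
            if key == "id":
                call.etymid = value
            elif key == "text":
                call.text = value
        elif p in KEYWORDS:
            call.flags.add(p)
        else:
            parts = p.split(">")
            if len(parts) == 2:  # "term>etymid" means the entry's own language
                parts = [entry_lang] + parts
            if len(parts) != 3:
                raise ValueError(f"cannot parse etymon {p!r}")
            target = call.equivalents if "afeq" in call.flags else call.sources
            target.append(Etymon(*parts))
    return call


def render(call, names=LANG_NAMES):
    """Generate display text, or substitute <1>, <2>, ... into text=."""
    def show(e):
        return f"{names[e.lang]} {e.term}"

    if call.text is not None:
        out = call.text
        for i, e in enumerate(call.sources, start=1):
            out = out.replace(f"<{i}>", show(e))
        return out
    pieces = []
    if call.sources:
        phrase = "borrowed from" if "bor" in call.flags else "from"
        if "unc" in call.flags:
            phrase = "perhaps " + phrase  # assumed wording for uncertain links
        pieces.append(phrase + " " + " and/or ".join(show(e) for e in call.sources))
    if call.equivalents:
        pieces.append("equivalent to " + " + ".join(e.term for e in call.equivalents))
    if not pieces:
        return ""  # e.g. {{etymon|id=monarch}} with free text after it
    out = ", ".join(pieces)
    return out[0].upper() + out[1:] + "."
```

Under these assumptions, the crusado call renders as "Borrowed from Spanish cruzado and/or Portuguese cruzado." and the unlock call as "From Middle English unloken, equivalent to un- + lock."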

Adding "Língua Geral" as a new language

It's a fairly well-documented language, and represents a roughly 150-year stage between Old Tupi (tpw) and Nheengatu (yrl). Língua Geral has some developments that appear in both late Portuguese borrowings and Nheengatu and are hard to show without it, such as some changes in pronunciation (kunumĩ > kurumĩ) and meaning (paranã (“sea”) > paraná (“river”)). Língua Geral is also present in a good number of Brazilian toponyms, like Botuverá, and having an Etymology section saying "From Old Tupi" would just be wrong.

The cut between Língua Geral and Nheengatu is set at 1853 by most scholars, when the word "Nheengatu" was first used with the current meaning. The cut between Old Tupi and Língua Geral is a bit more nebulous. Navarro used 1700 for his dictionary, so I'd go with that. For the code, I suggest <tpw-lg>, as it comes from Old Tupi.

There existed two varieties, Língua Geral Amazônica and Língua Geral Paulista, but I think they could just be pointed out using {{lb}} when needed, rather than given two separate L2 headings (if this ever becomes a new L2). What do you think? Trooper57 (talk) 20:31, 16 March 2024 (UTC)

@Trooper57 I think what you are proposing is a full (L2 header) language. I don't know much about the differences between Old Tupi, Nheengatu and Língua Geral, but 150 years seems rather narrow a window for a full language; unless things evolved really fast, this could also (maybe better) be handled as an etym-only language variant of either the preceding or following stages. At least to me, the changes in pronunciation you give (kunumĩ > kurumĩ and paranã -> paraná with a semantic shift) do not seem indicative of a radical transformation in the language. Also, for a code I'd suggest maybe tpw-lig as we try to make the second component of a two-component language code have three chars. Benwing2 (talk) 21:57, 16 March 2024 (UTC)
Maybe an etym-only variety would suffice. The trouble I was having was how to state that a word came from a later stage of Tupi, and not from the 16th century.
Also, as I understand it, the fast pace of change in LG comes from its marginalization: the Marquis de Pombal prohibited any language besides Portuguese, so it ended up as an unscripted, non-standardized language. Trooper57 (talk) 00:26, 17 March 2024 (UTC)
@Trooper57 If we add tpw-lig as an etym-only language variant of Old Tupi, then all you need to do is use the code in place of tpw and it will show as "From Língua Geral ..." with the appropriate link to the Língua Geral category (which doesn't seem to exist but can be created). Benwing2 (talk) 00:32, 17 March 2024 (UTC)
@Trooper57 BTW "Língua Geral" and "Geral" are listed as other names for Nheengatu in our data, which is consistent with how Wikipedia describes things (i.e. Língua Geral being an older stage of Nheengatu rather than a later stage of Old Tupi). Benwing2 (talk) 00:39, 17 March 2024 (UTC)
There are authors that call everything from 1500 onwards "Língua Geral". It gets really confusing sometimes. Trooper57 (talk) 01:23, 17 March 2024 (UTC)
@Trooper57 OK. Let's wait a couple of days for anyone else to weigh in who might be knowledgeable about this topic (please ping anyone you think might be able to contribute), and then we can create an etym-only lang for Língua Geral, either tpw-lig or yrl-lig, whatever you think most appropriate. We can also create subvarieties *-lga (Língua Geral Amazônica) and *-lgp (Língua Geral Paulista) if you think this would be helpful (e.g. if you think these varieties will ever need to be cited in an etymology). Benwing2 (talk) 01:53, 17 March 2024 (UTC)
@RodRabelo7 and @NoKiAthami are the other Old Tupi editors I can think of. There's also @Arthur botelho, but he has been inactive since 2019. Trooper57 (talk) 02:22, 17 March 2024 (UTC)