Wiktionary:Beer parlour/2023/March


I have the impression that HAKHSIN pushes through made-up terms. I don't know Persian well enough, but his words are not even searchable on Google. He has been adding Persian translations to terms he makes entries for in the Persian Wiktionary. (Notifying Ariamihr, Dijan, Mazsch, Qehath, ZxxZxxZ): Please check if you care.

I had one little conversation. My last question/comment was left unanswered. Anatoli T. (обсудить/вклад) 04:32, 1 March 2023 (UTC)

@Atitarev I blocked this user for 3 days, since they're not responding to questions about the likely made-up terms and this isn't the first time this user has come up in connection with problematic edits. Benwing2 (talk) 06:50, 1 March 2023 (UTC)
I am a native Persian speaker and I can confirm that his terms were made up and meaningless. Karen kalantari (talk) 06:41, 16 March 2023 (UTC)


Reminder: Office hours about updating the Wikimedia Terms of Use

You can find this message translated into additional languages on Meta-wiki.

Hello everyone,

This is a reminder that the Wikimedia Foundation Legal Department is hosting office hours with community members about updating the Wikimedia Terms of Use.

The office hours will be held on March 2, from 17:00 UTC to 18:30 UTC. See more details on Meta.

Another office hours session will be held on April 4.

We kindly invite you to participate in the discussion. Please note that this meeting will be held in English and led by members of the Wikimedia Foundation Legal Team, who will take and answer your questions. Facilitators from the Movement Strategy and Governance Team will provide the necessary assistance and other meeting-related services.

On behalf of the Wikimedia Foundation Legal Team, Mervat (WMF) (talk) 18:19, 1 March 2023 (UTC)

Unverifiable derogatory term of a specific modern time person

@Justinrleung, Thadh added info about a specific political figure to 影帝 (lit. acting emperor) (changes), stating that it is a derogatory label for that political figure. There is indeed a book whose title labels that political figure as such, but the said users did not give any reliable source to verify whether this usage is widespread. Does Wiktionary policy allow such info even with an rfv-sense template? Sameboat (talk) 22:41, 1 March 2023 (UTC)

@Sameboat: We are not adding something, but putting something removed out of process back. The RFV process requires the relevant senses to be on the page until official failure of the process. — justin(r)leung (t...) | c=› } 22:43, 1 March 2023 (UTC)Reply[reply]
Yes, we do. Any information that is added and hasn't gone through an RFV yet should go through RFV before being removed. If there is reason to assume that a sense is a fabrication or vandalism, an RFV may be speedied, but I don't think that's applicable here. Thadh (talk) 22:43, 1 March 2023 (UTC)Reply[reply]
I'd agree if this did not involve a derogatory label against a modern-day person. Sameboat (talk) 22:45, 1 March 2023 (UTC)
I don't know if this will make it more acceptable to you, but the verification of derogatory terms is already expedited compared to other terms (WT:DEROG). — justin(r)leung (t...) | c=› } 22:48, 1 March 2023 (UTC)
I understand your concerns, but if we start going around removing unflattering nicknames of public figures we lose all credibility as an unbiased source. As Justin said, we do handle this kind of term more quickly than other words. Thadh (talk) 22:51, 1 March 2023 (UTC)Reply[reply]
Please read our rules at the top of Wiktionary:Requests for verification/CJK and our WT:CFI. Vininn126 (talk) 22:47, 1 March 2023 (UTC)Reply[reply]
How long does the rfv take? Sameboat (talk) 22:53, 1 March 2023 (UTC)
Depends on the individual request. Usually at minimum a month, unless there is reason to speedy a request. Vininn126 (talk) 22:59, 1 March 2023 (UTC)
This one in particular should take two weeks since it's derogatory. — justin(r)leung (t...) | c=› } 23:08, 1 March 2023 (UTC)Reply[reply]
@Justinrleung, Vininn126: I don't see what the point of removing Sameboat's etymology for the common noun is, the article can list both until the RFV is resolved. —Al-Muqanna المقنع (talk) 23:17, 1 March 2023 (UTC)Reply[reply]
@Al-Muqanna: It's general practice in Chinese entries to not have elaborations on compounds in the etymology section because it is kind of redundant to the {{zh-forms}} box. — justin(r)leung (t...) | c=› } 23:28, 1 March 2023 (UTC)Reply[reply]
Yeah, I'm just not sure the significance of 影 would be obvious to someone not familiar with the language with the automatic "picture, image, reflection" gloss. I see it's been clarified in the box now though, which works for me. —Al-Muqanna المقنع (talk) 23:49, 1 March 2023 (UTC)Reply[reply]

Translations sections in non-English terms

See കൂപമണ്ഡൂകം. Created by User:Vis M. Normally I'd just delete the section but this term is a language-specific idiom (see Kupamanduka) that doesn't appear to have an equivalent lemma in English, and the translations are of equivalent terms in other languages. What is our policy in such cases? Benwing2 (talk) 05:14, 3 March 2023 (UTC)Reply[reply]

As I understood it, translation sections are only for English entries, but I do see the dilemma here. I don't see why we don't just move these to the etymology section using {{cog}} saying "compare". Vininn126 (talk) 09:27, 3 March 2023 (UTC)
For this particular entry, it seems that "frog in a well" / "frog in the well" is used in English. It should be possible to create an English entry to house the translations. – Wpi31 (talk) 15:54, 3 March 2023 (UTC)Reply[reply]
I'm having difficulty finding examples of idiomatic usage in English: in the uses I can find it's either in translations from Chinese/Sanskrit/etc. or given an explicit explanation (with the understanding the reader won't otherwise be familiar with it). Regardless, even if it's unverifiable as an English idiom I think it would make pragmatic sense to create frog in a well as a translation hub. Given how widespread the idiom is in Asia (there are also at least Japanese and Korean versions) it wouldn't make sense to arbitrarily host it at the Malayalam entry, where people who come across the term in other languages won't find it. —Al-Muqanna المقنع (talk) 17:28, 3 March 2023 (UTC)Reply[reply]
Yeah, let's create a translation hub. Some other sets of non-English terms that all mean the same thing but lack an English translation are at Appendix:Terms considered difficult or impossible to translate into English; perhaps the various frog in a well phrases in different languages should be listed there, and/or the other sets of terms listed there should have translation hub entries. In a few similar cases, I've resorted to ===See also=== to link such things (although this is not ideal as it doesn't clarify the nature of the connection and some people think "See also" should only link same-language terms), or the etymology section as Vininn suggests. - -sche (discuss) 21:23, 3 March 2023 (UTC)Reply[reply]
@Vininn126, Wpi31, -sche, Al-Muqanna Thanks for the suggestions; I created frog in a well as a translation hub. Benwing2 (talk) 23:28, 3 March 2023 (UTC)Reply[reply]

Ban Donnanz from participating in RFD

Time after time, Donnanz has proved woefully incapable of participating constructively in RFD discussions. He very rarely, if ever, makes any attempt at presenting cogent arguments. When someone asks him for clarification, all they're met with is dismissiveness.

Therefore, I hereby propose that Donnanz be banned from taking part in RFD debates. PUC – 20:36, 3 March 2023 (UTC)

Can you provide a few examples? Ioaxxere (talk) 21:08, 3 March 2023 (UTC)
For example:
The commonality is that the user often gives either no argument at all or arguments that (given our CFI) are completely irrelevant, while refusing to acknowledge the relevant CFI rules. The user’s dismissiveness of others’ arguments at times gets a bit vitriolic.  --Lambiam 06:36, 4 March 2023 (UTC)
I have to agree with this. It's frustrating and offputting. Theknightwho (talk) 09:11, 4 March 2023 (UTC)
To be fair, the "beyond repair" one has a valid argument, the "this is a set phrase" part, and he does reply afterwards, so I don't think that specific example is the best. AG202 (talk) 13:47, 4 March 2023 (UTC)
Oppose. While I find it irritating, banning a specific user seems like the wrong end of the stick—if RFDs are being resolved on the basis of comments like this then there's a problem with the policy for how RFDs are resolved; if they aren't, then it seems pointlessly punitive. —Al-Muqanna المقنع (talk) 15:20, 4 March 2023 (UTC)
Well I never. Apart from not receiving any notification of this (why?), many of the "reasons" quoted by PUC and Lambiam are quite trivial, there appears to be prejudice against anyone voting keep without giving a reason, and I am often outvoted anyway. Does this mean deletionists want entries deleted without contest? It doesn't bother me if I'm banned from RFD (even if I just comment, and don't vote at all?), I have other things to concentrate on. Re the Dickens point, another way around that would be the definition: "A surname, notably that of Charles Dickens". Then you could remove the separate definition. DonnanZ (talk) 20:52, 4 March 2023 (UTC)Reply[reply]
Banning individual editors from specific tasks is clearly not the way forward, as the recent I-am-annoyed-about-Wonderfool's-admin-nominations vote evinces. I wouldn't worry, Donnanz; you and I both know there's not a snowball's chance in hell this dumb suggestion will come to anything. You have my 100% support, and we appreciate your hard work here. Van Man Fan (talk) 21:09, 4 March 2023 (UTC)
It is unseemly to label editors as “deletionists” for trying in earnest to apply our criteria (which do not necessarily correspond to their personal preferences). A manifest contemptuous disregard of this policy is not merely annoying but interferes with the discussions.  --Lambiam 08:07, 5 March 2023 (UTC)Reply[reply]
Using such a label doesn't give me any pleasure. I don't vote on every RFD that comes along, some I am indifferent to. I have been accused of trolling by another "dodgy" admin, I'm still awaiting an explanation of that. I have been editing here for 9½ years now, and in that time PUC in his various guises has been enthusiastic (obsessed?) about removing what he sees as contraventions of CFI, SoP policy or whatever. I am the opposite, and now PUC is an admin he thinks he can ban me from RFD, which is rather draconian. Semi-related to this is the permanent ban of Dan Polansky, which seems to be a fate worse than death. I seem to get on better these days adding to Wikipedia, mainly surname disambiguation pages. DonnanZ (talk) 13:56, 5 March 2023 (UTC)Reply[reply]
The only reason you call me “dodgy” is because you liked the fact Dan Polansky voted “keep” on pretty much everything - and it’s no surprise that you brought that up, really.
What you have pointedly not done is actually address any of the concerns raised here. It’s disappointing. Theknightwho (talk) 17:13, 5 March 2023 (UTC)Reply[reply]
I don't think it's best to compare yourself to Dan Polansky as he was permabanned primarily for racist attacks towards other editors... AG202 (talk) 17:37, 5 March 2023 (UTC)Reply[reply]
No, I am not comparing myself to Dan, I was referring to his ban, which is rather harsh.
I have addressed one concern re Dickens, withdrawing my vote and suggesting a solution. DonnanZ (talk) 17:47, 5 March 2023 (UTC)Reply[reply]
God Defend New Zealand dealt with. I am an NZer, BTW. DonnanZ (talk) 22:15, 5 March 2023 (UTC)Reply[reply]
North Atlantic Treaty Organization: I naturally prefer British spellings, but I know that it would have the "z" spelling if reinstated. I think I was misunderstood by my critic, but I revised my comment anyway. DonnanZ (talk) 21:45, 6 March 2023 (UTC)Reply[reply]
Any editor could have proposed this ban. Being an admin has nothing to do with it.  --Lambiam 19:29, 5 March 2023 (UTC)Reply[reply]
I accept that point if it's true, PUC was the only one motivated though. DonnanZ (talk) 21:03, 5 March 2023 (UTC)Reply[reply]
Oppose. Just ignore his vote if it's nonsensical, but banning him from any voting makes it impossible for him to give any arguments at all, even if they do make sense. Thadh (talk) 19:57, 5 March 2023 (UTC)Reply[reply]
Oppose. He's far from the only person to vote without providing a justification (or providing an unconvincing one). --Overlordnat1 (talk) 22:08, 5 March 2023 (UTC)Reply[reply]
Oppose. For all the reasons already given by others above. — Sartma 𒁾𒁉𒊭 𒌑𒊑𒀉𒁲 00:16, 13 March 2023 (UTC)
Oppose. No adequate reason given for unprecedented selective ban, let alone any broader action. DCDuring (talk) 15:46, 13 March 2023 (UTC)Reply[reply]
Oppose. While I share your frustrations with DonnanZ's inability to explain his decision or adhere to CFI, I don't think that warrants a ban. Though I really do wish discussions were less... frustrating. Vininn126 (talk) 15:57, 13 March 2023 (UTC)Reply[reply]
Oppose. A ban is far too harsh of a response. Plenty of people vote "keep" without giving a good reason (or any reason at all); providing one is not mandatory. If someone makes "illogical" arguments, they can be ignored and outvoted. When outvoted, DonnanZ accepts the outcome, even if they don't personally agree with it. Thus they have my support to continue participating in RFD. Megathonic (talk) 20:17, 16 March 2023 (UTC)Reply[reply]
I was hoping that User:PUC would see which way the wind is blowing and withdraw this proposal, but no such luck. I am very grateful for the support given to me by many editors. DonnanZ (talk) 11:22, 23 March 2023 (UTC)Reply[reply]
I would ask you kindly to take some of the comments to heart and consider things like CFI and explaining yourself more in the future. Vininn126 (talk) 11:39, 23 March 2023 (UTC)Reply[reply]
@Vininn126: I have already. As for CFI, it has shortcomings and isn't perfect, so won't satisfy every user. DonnanZ (talk) 12:11, 23 March 2023 (UTC)Reply[reply]

Proposed change to CFI

Through my experiences closing RFVs I've noticed that CFI doesn't always align with actual practice. My suggestion is to add this into Wiktionary:Criteria for inclusion § Inflections:

Regular inflections of terms, such as English feats or crossed, may be cited with only one attestation in a durably archived source even if they are not part of a limited documentation language. What counts as a "regular" inflection should be decided for each individual language. This does not apply if the regular inflection is nonstandard or uncommon relative to the lemma, such as beed (regular past tense of English be).

Inflections, including irregular ones, count towards attesting their lemma form. For example, three quotations for soars, soaring, and soared are sufficient to attest soar.

Any unattested entries should be noted as such in a usage note.

Would anyone be interested in voting for this? If so, what about if the first paragraph was changed to "zero attestations"? That would mean three quotations for soar would be sufficient to create soars, soaring, and soared.

Ioaxxere (talk) 20:41, 3 March 2023 (UTC)

I feel like it should be zero attestations. For example, if a Spanish speaker were to read "123ar", they would know that if they wanted to make the verb first-person plural present, they would say "123amos", and likewise for the other forms. Three citations, for all senses. (talk) 21:04, 3 March 2023 (UTC)
In practice we already don't require attestation of regularly formed inflections of attested lemmas unless there is some question about whether the form is attestable (e.g. if there is an irregular inflection of the same meaning, like 'fungi', or if the term is singular-only, etc.). The main exception is dead languages, esp. those with low attestation like Old Irish; and even then in well-attested dead languages (e.g. Latin) we often create entries for non-lemma inflections without attestation. Benwing2 (talk) 21:36, 3 March 2023 (UTC)
I also wonder how we should handle regular spelling variation. For example, converbialisation is not attestable while converbialization is, but it feels very silly to delete the latter. It's just not a very common word. Theknightwho (talk) 22:02, 3 March 2023 (UTC)Reply[reply]
We have only been documenting attested Middle and Old Polish forms, mostly due to a lack of uniformity; however, for obsolete terms we generate full inflection tables. Vininn126 (talk) 22:27, 3 March 2023 (UTC)
I would support the version without attestations. There was an RFV in October where we agreed that the rare English verb soccer was considered cited by two uses of soccer and one of soccered, and that this also extended to the inflected form soccers which was the form that was originally sent to RFV. So my understanding is that it is already our policy that we don't need to find attestations for every inflected form of a word, at least in English. I would hope not, anyway ... that could generate a lot of spurious RFV's. It is only good when the forms are irregular or unpredictable .... for another rare English verb, coorie, I had trouble finding uses of the -ing spelling, but eventually was able to turn up coorying ... another editor later added coorieing, so it seems that this verb can take both spellings. Soap 04:36, 4 March 2023 (UTC)Reply[reply]
I’m fine with zero for regular inflections in living languages, which I think reflects current practice. 70.172.194.25 21:41, 4 March 2023 (UTC)Reply[reply]
I agree with Soap and Benwing2 that we should keep credible, regular forms, even without dredging for instances. (There's a good case for considering ephemeral evidence.) --RichardW57m (talk) 16:31, 6 March 2023 (UTC)Reply[reply]
Do we yet have a mechanism for challenging the content of inflection tables? Dan Polansky tried to RfV an English comparative lurking in an inflection line, but I never managed to find a record of the challenge later. I have doubts about some of the forms in Pali inflection tables, but they've stayed for a lack of a consensus. (I would welcome advice on how to publicly accumulate evidence for endings as 'regular'.) --RichardW57m (talk) 16:31, 6 March 2023 (UTC)Reply[reply]
With the proposed rule, what would happen to inflected forms in a table if they failed RfV when they ventured onto their own page? Many table-generators seem to lack a mechanism to remove a form. I loathe orange links in inflection tables because they appear blue to visitors. Fixing them looks non-trivial. --RichardW57m (talk) 16:31, 6 March 2023 (UTC)Reply[reply]
As far as I know there are no policies as to what should be in a table. Generally they should be whatever is most helpful to readers, I guess. Ioaxxere (talk) 21:58, 6 March 2023 (UTC)Reply[reply]
Not on this subject, I noticed that entries for names of species (botanical, zoological) aren't mentioned in CFI. They are accepted by default, I guess. DonnanZ (talk) 13:49, 19 March 2023 (UTC)Reply[reply]

Proposal to expose and clean up HTML comments

There are currently about 26,000 sections with HTML comments scattered throughout the main Wiktionary namespace. Here's a random sample of 1000 comments.

The comments seem to fall into a few categories:

  1. Comments left to save future editors from duplicating an effort
    • all but:English:Adverb Note: Do not add the sense "all except", as in "all but three of them were left", as it is not a set phrase and its meaning can be derived from "all" and "but"
    • cẩu:Tày:Etymology 2 The Vietnamese word is never used as a neutral word for "dog" and its humorous connotation is probably recent, so unless the Tày word is also humorous, it's probably a direct Sinitic loan.
  2. Comments that raise questions or doubts
  3. Comments urging some action
  4. Comments that disable something
  5. Comments that could be categories
  6. Comments that could have been part of the edit summary
    • basis:Latin:Noun my own translation, since all the ones I found online either didn't include Sirach or didn't translate 'bases virtutis' literally.
  7. Comments that may be better on the talk page
    • bat-fowling:English:Noun Had a description from a Robert Graves short story here previously, but what he was describing was in Majorca and seemed a little different from this Cyclopaedia description. Believe Graves was describing only something similar rather than being authoritative on the term.

I think in many of these cases, there's a better option than using HTML comments, ideally a template like {{attention}} that would automatically categorize the page for cleanup and expose comments to users who want to see them.

I propose creating a new template, possibly {{rfc-comment}}, and adding it by bot to the beginning of each section containing an HTML comment, excluding Translations, which already uses {{t-check}} for most of the lines with comments. The new template could generate a banner visible at the beginning of each section alerting users that the section contains HTML comments and needs manual cleanup. The banner could have a link to a page describing how best to handle the various types of HTML comments and possibly even the HTML comment itself (or the first non-blank line, if it's a multi-line comment). Additionally, the template would categorize the page into categories like "Category:Requests for cleanup in LanguageName" and "Category:Requests for cleanup in LanguageName SectionName".
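The tagging step proposed above could be sketched roughly as follows in Python. This is only an illustration under stated assumptions, not an actual bot: {{rfc-comment}} is the hypothetical template name from the proposal, the function name is made up, and a "section" is assumed to mean the text between one heading and the next heading of any level, so a parent section is not tagged for comments that sit in its subsections.

```python
import re

# Matches a wikitext heading such as "==English==" or "===Noun===".
HEADING = re.compile(r"^(={2,6})[ \t]*(.+?)[ \t]*\1[ \t]*$", re.MULTILINE)
TAG = "{{rfc-comment}}"  # hypothetical template name taken from the proposal


def tag_sections_with_comments(wikitext, skip=("Translations",)):
    """Insert TAG right after the heading of each section that contains "<!--".

    Sections named in `skip` are left alone, and a section that already
    carries TAG is not tagged a second time.
    """
    headings = list(HEADING.finditer(wikitext))
    out = []
    pos = 0
    for i, h in enumerate(headings):
        body_start = h.end()
        body_end = headings[i + 1].start() if i + 1 < len(headings) else len(wikitext)
        body = wikitext[body_start:body_end]
        out.append(wikitext[pos:body_start])
        if "<!--" in body and h.group(2) not in skip and TAG not in body:
            out.append("\n" + TAG)
        pos = body_start
    out.append(wikitext[pos:])
    return "".join(out)
```

Running the function twice gives the same result as running it once, which would matter for a bot that revisits pages.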

This wouldn't have to be done all at once; we could start by tagging only specific sections within a few languages that have editors interested in this type of cleanup and iterate from there.

I'm interested in feedback, suggestions, and specific ideas for how to categorize the pages and how best to handle the various classes of comments. JeffDoozan (talk) 02:39, 4 March 2023 (UTC)

@JeffDoozan: None of them regularly seeks attention to go to the article, which a category would be for. You just raised mine by a row of bot edits (5 in my watchlist) none of which justified the insertion of {{attn}}. Comments are there for the mere eventual attention of those who want to edit the entry–otherwise man also knows {{attn}} but intentionally did not use it. In the most often cases I know the comments are there to give an exact reasoning behind a definition that could later help decipher vague wording but does not need to do anything.
So in tahulla someone said why it is a superseded spelling, in alarguez I added a number of vernacular names typically equated in reference works from which I could map unto modernly appropriate botanical glosses, álabe contains technical terms in German found in various dictionaries to make it reconstructible how the English glosses came about—we sometimes feel that we can be transparent that our definitions are based on translating German, Latin, Russian ones etc.—, similarly even for pastry stuff like alcorza the German versions felt much lighter—also note that people translate my dictionaries entries into Chinese etc., surely it can make a difference: it still is just an exception for suspected irresolvable inexact or ambiguous mapping between languages—, segur contains the source of a descendant otherwise not easy to find since we cluttering descendants with footnotes.
Or of course they just disable something or warn so something dubious is not added. Your number three is very telling as the comment exactly urges to not take the particular action, of reinserting the hidden text, since it belongs elsewhere.
So none of them needs category or cleanup or a template (RAM and unworthy attention seeking). It is also unheard of that reasonings in the source-codes of FOSS are purged without the codes they comment themselves; of course we use HTML comments just like in any programmed app, how could one think it different? I also frequently confuse “edit summaries” with commit messages, no difference in my opinion, and sometimes one uses commit messages and sometimes source comments and in either case one assumed that the text should gain no activity. Fay Freak (talk) 04:20, 4 March 2023 (UTC)Reply[reply]
@JeffDoozan From the example comments, it appears only some of them need addressing; my concern about adding an {{rfc-comment}} banner at the top of every section with an HTML comment is that it would be a lot of noise. Also you'd potentially be duplicating the functionality of {{attn}}; maybe better to just fix {{attn}} as needed. For example, it might make sense to have {{attn}} comments displayed by default; having them hidden makes them invisible to someone browsing the page, and hence much less likely that they will be addressed. HTML comments in a sense are like {{attn}} but even more invisible, and can be used in place of visible {{attn}} to leave an invisible comment.
BTW I have absolutely no idea what Fay Freak's comment means, which is par for the course.
Benwing2 (talk) 05:34, 4 March 2023 (UTC)Reply[reply]
The display of comments from {{attn}} was disabled a few months ago, based on an earlier discussion. It might be handy to have an invisible-by-default template that marked the presence of html comments to facilitate the use of Cirrus search to find comments (using insource=, which is not usable without a filter such as hastemplate=). Alternatively, the xml dump could be processed to yield a complete list of entries with HTML comments, preferably grouped by the language of the L2 section(s) in which they appear. DCDuring (talk) 19:34, 4 March 2023 (UTC)Reply[reply]
As is, one can identify HTML comments in even large categories of entries by using searches such as "incategory:"English adverbs" insource:/\<\!\-\-/". This search would have a number of false positives (ie, the HTML comments could be in non-English L2s), but is very practical, requiring no investment of technical or other resources beyond learning basic Cirrus search, which has many other applications. DCDuring (talk) 19:45, 4 March 2023 (UTC)Reply[reply]
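The regex in that search simply matches the literal string "<!--" anywhere in the page source, which is why comments in other L2 sections show up as false positives. For anyone working from a dump instead, a minimal Python sketch of grouping comments by L2 section might look like this. The function name is hypothetical, and it assumes L2 headings are written as "==Language==" on their own line.

```python
import re
from collections import defaultdict

# A closed HTML comment, possibly spanning several lines.
COMMENT = re.compile(r"<!--(.*?)-->", re.DOTALL)
# A level-2 heading only ("==English=="), not "===Noun===".
L2_HEADING = re.compile(r"^==[ \t]*([^=\n]+?)[ \t]*==[ \t]*$", re.MULTILINE)


def comments_by_language(wikitext):
    """Map each L2 language name to the HTML comments found in its section."""
    found = defaultdict(list)
    headings = list(L2_HEADING.finditer(wikitext))
    for i, h in enumerate(headings):
        end = headings[i + 1].start() if i + 1 < len(headings) else len(wikitext)
        section = wikitext[h.end():end]
        for c in COMMENT.finditer(section):
            found[h.group(1)].append(c.group(1).strip())
    return dict(found)
```

A page with an English adverb section and a commented Latin section would then be reported under "Latin" only, so it could be dropped from an English-only cleanup list.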
I think your Category 6 example would be lost in a change history. It's telling an editor that other translations may be unsuitable, and giving criteria to use in assessing them as replacements. --RichardW57m (talk) 17:35, 6 March 2023 (UTC)Reply[reply]
On the topic of HTML comments, let me briefly also link to here, where there is a list of potentially problematic (unclosed) HTML comments, and where I suggested we might benefit from identifying what pages have the longest HTML comments (e.g. someone commented out four language sections and no-one has noticed); see this for how some long comments on Wikipedia were found by HaeB. - -sche (discuss) 03:19, 13 March 2023 (UTC)Reply[reply]
Unclosed HTML comments seem to be more clearly problematic than HTML comments in general. DCDuring (talk) 16:04, 13 March 2023 (UTC)
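Both checks mentioned in this thread (unclosed comments and unusually long ones) can be done in a single pass over the page text. The following Python sketch is an assumption about how one might do it, not a description of how the linked lists were actually produced; the function name is made up.

```python
def scan_comments(wikitext):
    """Return (lengths, unclosed_at).

    lengths     -- character length of every properly closed comment,
                   counting the "<!--" and "-->" delimiters
    unclosed_at -- offset of a "<!--" with no matching "-->", or None
    """
    lengths = []
    pos = 0
    while True:
        start = wikitext.find("<!--", pos)
        if start == -1:
            return lengths, None
        end = wikitext.find("-->", start + 4)
        if end == -1:
            # Everything from here to the end of the page is hidden.
            return lengths, start
        lengths.append(end + 3 - start)
        pos = end + 3
```

Sorting pages by max(lengths) would surface cases like a whole language section that has been commented out.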

Retiring derivative subpages

I have been going through the 50-something derivative subpages and moving their contents into the main entry, while making changes to ensure it does not go over the Lua memory limit. <technical stuff> Most of the entries only need changes such as replacing inefficient templates (e.g. {{ja-r}}, {{ko-l}}, {{col3-u}}) with more efficient ones like {{ja-r/multi}}/{{ja-r/args}} and {{der-top}} etc., and the use of some lite templates. Some entries required using manual transliteration in {{zh-x}}, which otherwise uses a large amount of memory when loading the pronunciation tables. The few especially problematic pages are , , , and (in particular its descendants section), which although I have successfully made them go under the 50MB limit, there is not much leeway to further reduce the memory usage if more content is added. </technical stuff>

With these changes, it means that the derivative subpages are no longer needed, but they do include some useful history (which I have attributed when copying the contents over), so should they be deleted? Please also suggest any other concerns if the derivative subpages are to be retired. Wpi31 (talk) 06:52, 4 March 2023 (UTC)Reply[reply]

@Wpi31 I don't have any particular concerns with deleting unneeded derivative subpages. However, it would be great if you could write up a guide to optimizing memory usage, including such things as which inefficient templates to replace with which others and how, and which {{*-lite}} templates exist and (as far as you know) what their limitations are. A lot of this stuff is currently just tribal knowledge, and having it written up would go a long way towards helping other editors figure out how to do this stuff. Maybe User:Theknightwho, User:Surjection and/or IP 70.* can help augment the guide as they also have done significant work reducing memory usage. Benwing2 (talk) 07:20, 4 March 2023 (UTC)Reply[reply]
Good idea. In fact sometimes I have no idea how some of the lite templates work and miss out a parameter. The guide is already partly covered by Wiktionary:Lua memory errors#Tactics, but it's just skimming over the topic on the surface, and I think it's better to split it into a separate page. – Wpi31 (talk) 07:28, 4 March 2023 (UTC)Reply[reply]
@Wpi31 I've been seeing a few pages returning to CAT:E with memory errors after your edits. The problem is that the trend toward increasing memory usage is still going on, so what works now may not keep working for long. Be careful not to overdo it, and keep an eye on CAT:E. Chuck Entz (talk) 01:32, 6 March 2023 (UTC)Reply[reply]

Template editor permission request

I want to edit some protected modules (such as Module:bo-pron and Module:languages/data*). Please grant me template editor permission. Thanks. -- 14:49, 4 March 2023 (UTC)

Seeking feedback on trying to add a kind of usage for terms like "fifty" as in "I was going 63 in a 50".

Is there any way to include these kinds of uses, which are common in the United States? These usages refer to the speed limit in a given region, such as "You will get a reckless endangerment charge if you're going 70 (miles per hour) in a 30 (mile per hour speed limit zone)". My suspicion is that there is no real way to catalogue these, but I'm open to feedback on if there's some way to include this usage. —Justin (koavf)TCM 09:16, 5 March 2023 (UTC)Reply[reply]

Not sure I understand. Are you thinking of something other than just adding a third usage under fifty#Noun? I'd think we'd also want to add an adverbial entry for the 70 in your example. kwami (talk) 10:09, 5 March 2023 (UTC)Reply[reply]
I am thinking of that, yes. Do you think that would be an appropriate usage to add? In theory, there could be an infinite number of these added, but in practice, they would only be 25 to 70 in multiples of five. Good point on the adverb as well. —Justin (koavf)TCM 10:31, 5 March 2023 (UTC)Reply[reply]
You would have to put up with 20mph speed limits around here, whether it's a suburban cul-de-sac or a main road. DonnanZ (talk) 14:39, 5 March 2023 (UTC)Reply[reply]
The phenomenon seems to me to be an example of context-specific shortening by omission of what would be the head of an NP. This seems to me to be simple pragmatics, not something lexical. Other examples using fifty:
I got a fifty(-dollar bill) from my grandma for my birthday.
He looked like he was in his fifties. (years of age)
I don't want anyone to hear that 'fifty(-caliber machine gun) at night, understand? Just cover it and leave that fat barrel sticking out for show.
6IH and 6CFE combined stations and are using a "fifty"(-watt transmitter).
[] with Herman Hyde and John Stoker captains of fifties (units of fifty men)
He had an office on Lexington in the fifties. (from 50th to 59th Street)
By the 10th century it had become the custom to divide the Psalter into three books of fifties (fifty psalms)
Thus each fifties selector has [] access to every terminal selector of each fifty. (telephone circuits)
These large and fairly constant yields during the forties and fifties, moreover, seem not to have been caused by high prices [] (1850s)
The differences among these are whether the usage context is widely experienced (first two) or not and in what time period it was experienced (last one), not attestability. DCDuring (talk) 19:13, 5 March 2023 (UTC)
No other OneLook dictionary has fifty as an adverb. Does OED have it as one?
I'm not sure that we should have this kind of adverbial use of a noun, but we do, eg, home#Adverb. DCDuring (talk) 19:25, 5 March 2023 (UTC)
Thank you for clarifying my thinking on this: this really puts it into perspective. I was driving around thinking about how I'm "going from a 50 to a 40" and wondering how I would explain it to someone with poor English skills and whether or not it would fit here. Seems like there's no real way to capture this as a kind of definition, per the examples you gave above. —Justin (koavf)TCM 19:27, 5 March 2023 (UTC)
I agree with DCDuring. This seems to be just an omission of certain words rather than a different part of speech. — Sgconlaw (talk) 01:39, 6 March 2023 (UTC)
I'm slightly in two minds about it. The meaning is dependent on context, if you were talking about your speed in continental Europe you might use a number (say 'X') to mean 'X km/h' or 'X km/h zone' rather than mph, for example, but we do have things like both forty and 40 being used to refer to a 40 fl oz bottle and 40 referring to a score in tennis. On what basis are we deciding to keep these and reject the meaning of '40 units of speed' (which could even be 40 knots) and 'a zone with a speed limit of 40 units of speed'? --Overlordnat1 (talk) 02:44, 6 March 2023 (UTC)
Beside the size of a beer bottle, we list a monetary denomination with "A banknote or coin with a denomination of 50."
So, if we wish to be consistent, we need to decide how to handle those entries as well: Should we have no examples at all, and delete the current currency and beer-bottle entries; list only a consensus list of the most common (e.g. currencies, speed limits, ages), or have open-ended and potentially indefinite lists for whatever someone happens to find attestation for?
Or, should we perhaps have a single entry for "ellipsis of a noun phrase containing a numeral", and restrict the specific definitions of age, currency, bottle-size etc. to an illustrative but non-exhaustive list of examples under that single entry? We could then argue that the definition is complete even if some particular usage is not listed. That and the original context should hopefully then be enough for a non-native speaker to understand any instances that they come across. kwami (talk) 03:54, 6 March 2023 (UTC)
I think something like that is the best approach: have a definition that is a clipping of [number] with [unit] and usage examples "Can you lend me a 20?" and "He was going 70 in a 40", etc. But do we add them to all numbers? Multiples of five? :/ —Justin (koavf)TCM 04:38, 6 March 2023 (UTC)

Signs of a potentially problematic IP

Some of you may remember Fête, who was well known to live in Quebec and who combined poor English skills with a very peculiar way of looking at things to create lots of bad edits. That and their annoying habit of hanging out on people's talk pages and constantly asking questions got them (and their socks) globally banned. I just noticed an IP geolocating to Quebec, Special:Contributions/65.92.244.151, spending a lot of time on the entry for Phung. Based on the name of one of their sock accounts, that may very well be Fête's name in real life. I've noticed this IP for at least a few months working a lot on given names and surnames, but had no reason to connect them with anyone.

I would note that there's another Quebec IP who has been systematically adding entries on technical subjects for many years. In spite of my misgivings based on their wideranging subject matter and closeness to Fête's location, their edits have checked out every time I looked into it, and I don't think they're the same person.

All of this is very circumstantial. It's also true that Fête was fairly young when they were active a decade or so ago, so they might have grown out of their problematic stage. Nonetheless, this makes me nervous and I'd appreciate others looking at this IP's edits. Chuck Entz (talk) 04:31, 6 March 2023 (UTC)

For context: There are 8.5 million people in Quebec and it's two times larger than Texas. WordyAndNerdy (talk) 05:39, 6 March 2023 (UTC)
I'm aware of that, but there are certain IP editors associated with certain areas, whether it's the south end of Long Island in New York, St. Louis, Missouri (not that far from where my mother was born and raised), Occidental College in Glendale, California not far from me, Philadelphia, Pennsylvania, etc., as well as the north end of London, the Pays de Loire in France, Land Berlin in Germany, Thailand and Vietnam. That's not even getting into the IPs I run into in my checkuser work. You may not be aware of them, because your focus is on other things, but they do exist and they do have distinctive editing signatures that one learns to spot after years of patrolling IP edits. Chuck Entz (talk) 06:21, 6 March 2023 (UTC)
I don't trust that Fête has grown up one bit. — SURJECTION / T / C / L / 07:59, 6 March 2023 (UTC)

Arabic presentation forms

I've come across links from Wikipedia to ﯗ, ﯞ, and ﻯ. These are isolated presentation forms of ۇ, ۋ, and ى, respectively, whose entries include character info boxes listing those forms. The first two links, however, give 404. Seeing that presentation forms for some Arabic letters have redirects like the third one, and that there even seems to be an rcat template {{R character variation}} exactly for this kind of redirect, I thought I'd create the missing two for ﯗ and ﯞ, but it turns out page titles matching .*[\x{FB50}-\x{FBB1}\x{FBD3}-\x{FDC7}\x{FE70}-\x{FEFC}].* are disallowed. Yet, some redirects matching it like ﻯ clearly exist. I think it would be natural for any single-letter entry with more than one character info box on it to have them all redirecting to it. What is the policy on Arabic presentation forms? –mwgamera (talk) 07:39, 6 March 2023 (UTC)
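For readers who want to check a title against the ranges quoted above, here is a minimal sketch. It assumes the character class in the post is the whole restriction; the function name `is_disallowed_title` is hypothetical, and the PCRE `\x{...}` escapes are rewritten as Python `\u....` escapes.

```python
import re

# Character class copied from the title restriction quoted in the post above,
# with PCRE \x{...} escapes rewritten as Python \u.... escapes.
PRESENTATION_FORM = re.compile("[\uFB50-\uFBB1\uFBD3-\uFDC7\uFE70-\uFEFC]")


def is_disallowed_title(title: str) -> bool:
    """Hypothetical helper: True if the title contains a character in the quoted ranges."""
    return PRESENTATION_FORM.search(title) is not None


if __name__ == "__main__":
    # The three presentation forms from the post, then their ordinary letters.
    for ch in ["\uFBD7", "\uFBDE", "\uFEEF", "\u06C7", "\u06CB", "\u0649"]:
        print(f"U+{ord(ch):04X}", is_disallowed_title(ch))
```

On this reading, all three presentation forms fall inside the restricted ranges, which would be consistent with the existing redirect predating the restriction or having been created by someone who could bypass it.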

@MwGamera:
Wiktionary:Beer parlour/2020/May § why are the unicode Arabic Pedagogical symbols blacklisted?
Wiktionary:Beer parlour/2021/August § Deleting "Hangul syllable" entries
There doesn’t really need to be a policy since we cover the language and are not a Unicode database. In the latter linked thread, RichardW57 (talkcontribs) accurately called Arabic presentation forms dead waste, not actually part of any language but a technical nuisance you would need a specific reason for to include—there wasn’t any reason other than this consideration to delete existing redirects either, so it is logical that we are inconsistent. Unresolved wilderness is also natural, often, rather than consistency. Fay Freak (talk) 14:13, 6 March 2023 (UTC)
So I understand it works as intended? Some consistency would be nice. Dictionaries should sort out the mess nature creates, not multiply it ;) And sure, I can't imagine these being anything else than redirects, but it's not that obvious they shouldn't exist at all as we have single-letter entries which prominently list their alternative encodings and calling these entries parts of a language is already a bit of a stretch. Thanks for the links and explaining the current status quo anyway!
Btw, where do I vote on removal of red links from the character info that was previously mentioned? Red links are an invitation to contribute and shouldn't point to names that are not supposed to exist. mwgamera (talk) 00:59, 7 March 2023 (UTC)
It would be possible to redirect terms containing presentation forms with JavaScript if we had a map from presentation form to regular shape-changing letters. I haven't found such a thing myself, but it's probably out there somewhere. — Eru·tuon 01:14, 2 April 2023 (UTC)
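One possible source for such a map is the Unicode Character Database itself: presentation forms carry compatibility decompositions tagged `<isolated>`, `<initial>`, `<medial>` or `<final>`. The sketch below derives a map from Python's standard `unicodedata` module, restricted to the ranges quoted earlier in this thread. The names `build_presentation_map` and `normalize_title` are hypothetical, and this is an illustration in Python, not the JavaScript gadget suggested above; the resulting table could presumably be exported for one.

```python
import unicodedata

# Decomposition tags that mark positional presentation forms in the UCD.
POSITIONAL_TAGS = {"<isolated>", "<initial>", "<medial>", "<final>"}

# Ranges taken from the title restriction quoted earlier in the thread.
RANGES = [(0xFB50, 0xFBB1), (0xFBD3, 0xFDC7), (0xFE70, 0xFEFC)]


def build_presentation_map() -> dict:
    """Map each presentation form to the ordinary letter(s) it decomposes to."""
    mapping = {}
    for start, end in RANGES:
        for cp in range(start, end + 1):
            # decomposition() returns e.g. "<isolated> 06C7", or "" if none.
            parts = unicodedata.decomposition(chr(cp)).split()
            if parts and parts[0] in POSITIONAL_TAGS:
                mapping[chr(cp)] = "".join(chr(int(h, 16)) for h in parts[1:])
    return mapping


PRESENTATION_MAP = build_presentation_map()


def normalize_title(title: str) -> str:
    """Replace presentation forms in a title; leave every other character alone."""
    return "".join(PRESENTATION_MAP.get(ch, ch) for ch in title)
```

Unlike a blanket NFKC normalization, this only touches characters with a positional tag, so unrelated compatibility characters in a title would stay as they are.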

Inclusion of italics in head

Should they be added? Some examples are “abc conjecture”, “ad valorem tax”, “EverQuester or EverQuester”, “in terrorem clause”, “k-cell”, “Palko test”, “p-adic order”, “r/K selection theory”, “uno flatu”, “Zelda-like or Zelda-like (sense 2)” (which started this discussion: Talk:Zelda-like). J3133 (talk) 10:04, 6 March 2023 (UTC)

“Head” here means {{en-noun}}, etc., which displays it in bold. J3133 (talk) 11:13, 6 March 2023 (UTC)
Wikipedia does this in articles on certain topics, but I have found this can't be repeated on their disambiguation pages. DonnanZ (talk) 10:54, 6 March 2023 (UTC)
Were the entries only in English, then yes, at the very least for creative work terms. The use of italics for foreign terms that are not entirely localized is inconsistent, but the use of italics for book or movie or video game titles is very standard. That said, a term could be multi-lingual and the conventions of that language may not be to italicize a certain thing, so I'm inclined against it for the page title, but in favor of using it in the text of entries as appropriate. —Justin (koavf)TCM 11:04, 6 March 2023 (UTC)
None of those entries have a double header (except for EverQuester which you created two days ago), which is what I think looks strange. Ioaxxere (talk) 13:23, 6 March 2023 (UTC)
@Ioaxxere: I do not think it looks strange. Wade-Giles has “Wade-Giles or Wade–Giles”. Perhaps the strangeness is that the words have more than one form, which itself is unusual (therefore both forms are included); otherwise, I do not see another way, unless you have a suggestion? J3133 (talk) 13:45, 6 March 2023 (UTC)
I've removed the second head from Wade-Giles. It's a typographic difference, not a legitimate alternative form (which belong at other pages anyway). Ultimateria (talk) 01:10, 15 March 2023 (UTC)
@Ultimateria: The page after your edit does not indicate that the dash is a valid alternative to the hyphen. The dash form was the only one included from 6 April 2022 (@Chuck Entz) to 5 March 2023—now it is the opposite. Also pinging @Geographyinitiative. J3133 (talk) 01:20, 15 March 2023 (UTC)
I have no effin clue what should be done on this question, not even .0001% of a clue. Wikipedia uses Wade–Giles, Wikimedia Commons uses Wade–Giles, but Wiktionary's cites show that Wade-Giles is the form in actual use, lulz. --Geographyinitiative (talk) 01:24, 15 March 2023 (UTC)
@Geographyinitiative: Wikipedia follows its own Manual of Style, which mandates the en-dash when the two linked terms aren't a morphological compound or blended into one (i.e., it's a system ascribed to Wade and to Giles, not by one person with the surname Wade-Giles). In practice Wikipedia follows this rule rather more strictly than most publishers; for example their en-dash in Polish–Lithuanian Commonwealth is very rare in published works. It's not difficult to find Wade–Giles printed with the en-dash, though, this example was on the second page of Google Books results for an intext search for "Wade-Giles" for me. —Al-Muqanna المقنع (talk)
The problem with including it as an alternative is that it's not specific to this term but potentially any hyphenated term. I don't find it necessary on this or any other entry to "indicate that the dash is a valid alternative to the hyphen". Al-Muqanna's point about the two surnames is interesting, but that's covered in the etymology of this page at least. If someone thinks it's important, feel free to revert. Ultimateria (talk) 02:06, 15 March 2023 (UTC)
@Ultimateria: I have changed Wade–Giles from a redirect to an alternative form per other terms in Category:English terms spelled with –: e.g., Trans–New Guinea was changed from a redirect by Equinox. If you think there should be a vote, then I will make one. J3133 (talk) 12:02, 15 March 2023 (UTC)
Also, this is consistent with the other entries using italics such as “Palko test”. Providing only one form would be incorrect (e.g., if “Palko test” was also used). J3133 (talk) 13:49, 6 March 2023 (UTC)
I.e., excluding the form using italics would be inconsistent if the italics were kept in other entries, because it would be misleading—indicating that italics are not used. J3133 (talk) 14:00, 6 March 2023 (UTC)

Dates

What is our policy on dating quotations and what do we think it should be? Should it be in a YYYY-MM-DD format or in either the standard British or American formats and should we be including the month and day at all? I ask because the dates for the citations for knob were changed yesterday so that the months and days were removed, though when I brought it up in a user’s talk page they did partially change it back to include the months. I don’t want to single any particular editor out, though anyone reading this can easily find out the specifics for themselves, but I think that a clear and official policy on the issue would be welcome as we should aim to be consistent. Overlordnat1 (talk) 14:26, 6 March 2023 (UTC)

I just do it arbitrarily. Where does it end? You could ask about a policy to order the parameters of quotation templates, and then templates in general, and at some point it is no fun any more. I think one can’t make a reliable rule on it, people would vote on it arbitrarily and subjectively too and nothing would be gained plus one would waste time to correct rule violations. Fay Freak (talk) 14:30, 6 March 2023 (UTC)
Personally, I provide whatever information is available in or about the source. If a full date is known I provide it, and if a month and year is known I provide them. As for the date format, the quotation templates currently display the year, followed by the month and date if provided. It’s helpful to have the month spelled out as a word, I think, to avoid ambiguity. — Sgconlaw (talk) 14:36, 6 March 2023 (UTC)
I agree. Vininn126 (talk) 14:47, 6 March 2023 (UTC)
I also agree. Perhaps we should create a policy that states that more specific dates should never be altered to less specific ones unless there is very real doubt as to the accuracy of the more specific date? --Overlordnat1 (talk) 14:54, 6 March 2023 (UTC)
I should also add that full dates are very useful, if not essential, when trying to verify quotations from serial publications such as magazines and newspapers. It makes little sense to indicate an article from a daily newspaper as having been published in "1950". — Sgconlaw (talk) 15:00, 6 March 2023 (UTC)
@Overlordnat1 forgot to mention that this was specifically about {{quote-book}} dates. I agree that for regularly published material (magazines, newspapers etc) we should always include the full date, I simply question the usefulness of specifying the day of publication of books. Jberkel 15:04, 6 March 2023 (UTC)
I think full dates are less critical for books. Nonetheless, if the information is provided I generally just add it. It can help in arranging quotations chronologically if there are two works published in the same year, but again this isn't a biggie. If full dates aren't available in such a situation, I just arrange them alphabetically by the authors' surnames. — Sgconlaw (talk) 15:08, 6 March 2023 (UTC)
There are occasions where a new edition is printed in the same year, I suppose. Vininn126 (talk) 15:09, 6 March 2023 (UTC)
My personal policy is to only add publishing info as stated in the frontmatter, since Google's metadata is frequently wrong, and it's pretty rare for a month or day to be listed there for a book. I wouldn't remove it if someone's added it though. —Al-Muqanna المقنع (talk) 15:15, 6 March 2023 (UTC)
Yes, I generally also use data that is published in works and don’t use Google’s metadata either, but I have sometimes accepted what is stated in a Wikipedia article about a work in good faith. — Sgconlaw (talk) 22:08, 6 March 2023 (UTC)
Virtually all the quotes I add now use {{quote-journal}}, where dates are critical. They are from magazines both new and old, the old ones are ones I bought second-hand years ago; finding their contents on the Internet is highly unlikely. DonnanZ (talk) 09:14, 7 March 2023 (UTC)
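The practice described in this thread (show the year, then the month spelled out and the day when known; order quotations chronologically and fall back to the author's surname) can be sketched roughly as follows. This is an illustration only, not how the quotation templates are implemented: the `Quotation` class, `format_date` and `sort_key` are hypothetical names, and placing a year-only quotation before dated ones from the same year is an assumption.

```python
from dataclasses import dataclass
from typing import Optional

MONTHS = ["January", "February", "March", "April", "May", "June", "July",
          "August", "September", "October", "November", "December"]


@dataclass
class Quotation:
    surname: str
    year: int
    month: Optional[int] = None  # 1-12 if known
    day: Optional[int] = None    # only meaningful when month is known


def format_date(q: Quotation) -> str:
    """Year first, then the month as a word and the day, as far as they are known."""
    parts = [str(q.year)]
    if q.month is not None:
        parts.append(MONTHS[q.month - 1])
        if q.day is not None:
            parts.append(str(q.day))
    return " ".join(parts)


def sort_key(q: Quotation):
    """Chronological order; unknown month/day sort first (an assumption), ties go by surname."""
    return (q.year, q.month or 0, q.day or 0, q.surname.casefold())


quotes = [
    Quotation("Young", 1950),
    Quotation("Adams", 1950),
    Quotation("Brown", 1950, 3, 6),
    Quotation("Clark", 1949, 12),
]
ordered = sorted(quotes, key=sort_key)
```

Keeping month and day as separate optional fields means a more specific date is never silently reduced to a year, which is the concern raised at the start of the thread.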

RFVE Mass Closures (Again)

A continuation of Wiktionary:Beer parlour/2023/February § Disallowing mass closures. After being told by multiple folks to slow down on the mass closures due to multiple instances of not following CFI, @Ioaxxere is only continuing and keeps closing entries against CFI. The most egregious example that was brought to my attention was with bigenital surgery, where they passed an entry with two links to white supremacist/fake news websites; two sites that have never ever been accepted for CFI. This made me go through several RFVs yet again, only to see that they've passed entries like MAMAA, and have had multiple back-and-forths with entries like antijapanese, praecognita, and Falklands Fritillary Butterfly. They've also been closing RFVs almost as soon as the entries hit the month point, which, while it's the written guideline, is usually pushed out so that folks have enough time to go through them (though the warning is appreciated). This is especially concerning because of the high number of RFVs that they've closed and then archived, leaving possible entries that would've passed or failed RFV if given enough proper review out in the wind. An example of this is y'all'd'nt've, which still does not have cites, even though I explicitly pointed out that it'd need cites. (Resolved) At first I had thought that this is just an example of inexperience, but the fact that they keep continuing and these problems keep coming up is very problematic, especially with the bigenital surgery one. Something needs to be done. Pinging @Theknightwho, @-sche, @WordyAndNerdy, @Benwing2, @Chuck Entz AG202 (talk) 15:42, 6 March 2023 (UTC)

Okay, I'm experienced enough to admit my own inexperience. I will stop passing RFVs (for at least the rest of the month), and if I think an RFV should pass I'll ping you (or any other editor who closes RFVs). By the way, the cites for y'all'd'nt've are on the lemma entry. Ioaxxere (talk) 16:09, 6 March 2023 (UTC)
I'll move those cites and edit my initial comment, apologies for that, and thanks. AG202 (talk) 16:10, 6 March 2023 (UTC)
"Lemma" doesn't mean "the main spelling of several alternate spellings": it means the main form of an inflected word, like dog for dogs. Alt spelling citations should go at the correct spelling, the one they actually provide evidence for. Equinox 16:18, 6 March 2023 (UTC)
@Equinox Some people argue that alternative spellings are not themselves lemmas, even though they may get inflected. Personally, I think there's a hierarchy of lemmas, but I can envisage a demand to categorise non-Roman script Pali words and stems as forms rather than lemmas. --RichardW57m (talk) 16:13, 10 March 2023 (UTC)
@RichardW57m: I think this is a foolish argument because we could create either "color" or "colour" as an alt form of the other one, depending on whether we felt more British or American (or other places). Clearly a "base word" is a lemma, even if it isn't our favourite one. But a conjugated form like "coloring, colouring" is not. Equinox 04:27, 13 March 2023 (UTC)
I wouldn't try giving the community a deadline ("for at least the rest of the month"). Clearly some experienced users find your standards for closing RfVs questionable. IMHO, although the size of RFVE is a problem, it is not more of a problem than premature closure of RfVs. DCDuring (talk) 16:46, 6 March 2023 (UTC)
It's not a deadline, I just thought it wouldn't be believable to declare "I will never do X for as long as I live". By the way, it's not the size of RFV that's the problem but rather the fact that RFVs are created and immediately forgotten about. Before I started going through them, there were literally hundreds of uncited terms (including hoaxes) in the mainspace with no one interested in clearing them out. Ioaxxere (talk) 17:15, 6 March 2023 (UTC)
Also, what's the time I need to wait before failing an RFV if not a month? I am following the guideline "After a discussion has sat for more than a month without being “cited”, or after a discussion has been “cited” for more than a week without challenge, the discussion may be closed." Ioaxxere (talk) 17:27, 6 March 2023 (UTC)
The time to close a difficult RfV is a matter of judgment. Judgment is largely achieved by learning from relevant experience. Also one needs a repertoire of means of resolution of RfVs. Sometimes the best thing to do is to try to find cites. Sometimes one should see what other dictionaries do (by using {{R:OneLook}} or OED, the latter by asking at RfV for help). Maybe the definition could be reworded to be more citable. Maybe it would be good to ask for help in finding cites from durably archived sources other than Google Books and News. Maybe one could guess at who might have an interest in the particular definition.
Maybe it just isn't that big a deal to let the RfV go for a while longer. DCDuring (talk) 17:38, 6 March 2023 (UTC)
What about the time to fail an "easy" RFV (no hits anywhere)? Ioaxxere (talk) 18:27, 6 March 2023 (UTC)
"No hits anywhere"? There are many sources that are not found by general Google web searches. Examples are regional terms and dated terms, but there are many types of terms that need a lot of love to find support. If no one is interested enough to do the work after some longish period, then I suppose we have to let it go, saving the record to the entry's talk page and any cites (even those not found durably archived or supporting a definition different from the challenged one) to the citations page. DCDuring (talk) 19:04, 6 March 2023 (UTC)
It occurs to me that perhaps you could ask for advice from User:Kiwima who worked hard at citing and closing RfVs for 4-5 years. DCDuring (talk) 17:44, 6 March 2023 (UTC)
When I had taken on the task of keeping the RFV list up-to-date, I never failed an RFV without personally doing a search for cites. It's time consuming, but tends to avoid closing entries that are simply uncited either because they are difficult (e.g. too many false hits because of a more common alternate definition) or that no one is particularly interested in. Kiwima (talk) 18:46, 6 March 2023 (UTC)
There’s also the fact that you need to judge whether you’re the best person to be doing the closure. For example, regionalisms or historical terms may be something that another user is much more knowledgeable about. Theknightwho (talk) 19:11, 6 March 2023 (UTC)
Yeah, agreed, this also tends to be a reason why RFVs are pending for a long time: with early modern English, for example, there are a fair number of cases where a word or sense is probably attestable, but needs legwork that most people don't have the time or inclination for. It's true that a term that failed RFV can be re-created, but failing the RFV removes the term from to-do lists and it becomes much less likely anyone will ever put in that work. —Al-Muqanna المقنع (talk) 01:04, 7 March 2023 (UTC)
Exactly. Much of Hansard (100m+ words) is not directly searchable from Google - though is easily searchable from the official website - and the same goes for a lot of legislation or law reports, where it’s simply a matter of knowing where you need to search. Never mind the fact that archives of material are often (unfortunately) behind a paywall that another user may have access to. I’m reminded of the case of plantage, where I was only able to cite it because I have access to Westlaw.
In some cases, these archives will have tens-if-not-hundreds of millions of words of material, and it’s a shame that we don’t have our own archive of resources as to where to start looking, frankly. Perhaps we should use this as the impetus to make one, because this stuff all counts as being durably archived! Theknightwho (talk) 17:19, 7 March 2023 (UTC)
FWIW, there is a list at Wiktionary:Searchable external archives which could be expanded; unfortunately, I don't know what we could do to publicize it any further, it's already mentioned in the header of WT:RFV, but I don't recall noticing it until many years after I first started editing, and evidently you didn't read or see it in that boilerplate either, heh. Maybe we could link to it in a notice that displays when someone goes to edit an RFV page (like the notices Wikipedia uses on certain contentious pages), but I don't know if people who use visual editor would ever see that. I suppose if we wanted to make it really prominent we could expand the {{rfv}} and {{rfv-sense}} templates to say something like "(check these sources)". - -sche (discuss) 23:34, 7 March 2023 (UTC)
I think adding something to {{rfv}} and maybe also {{rfv-sense}} that links to Wiktionary:Searchable external archives or similar (personally, I think a design akin to Wiktionary:Corpora is superior, though admittedly, not user friendly) is a very good idea. Kinda like how {{rfi}} links to Wikimedia Commons. Another thing is mentioning the Wikipedia Library which can let you access great corpuses like the British Newspaper Archive, Newspapers.com, and NewspaperARCHIVE.com. —The Editor's Apprentice (talk) 00:43, 8 March 2023 (UTC)
For the record, another instance of a bad closure in mind is cephalophore as a term related to fungi. 98.170.164.88 explicitly mentioned that they thought the term was citable and linked to a Google Scholar search with a few good hits. In my own search, I found these two papers as well: [1], [2]. Nonetheless, Ioaxxere closed the discussion with the determination that the term failed, noting no results at all for "cephalophore mushroom" etc., which didn't really address the previous discussion, such as 98.170.164.88's search, nor the fact that the term's context is around fungus/fungi rather than mushroom. In this case it has been a few months since the discussion started, so the timing wasn't the problem, but instead the approach.
I'll also say I'm still glad to have Ioaxxere as a fellow editor responsible for entries like grounders or senses like climber (structure on a playground designed to be climbed on) though those could probably benefit from a larger time span of cites/some non-web cites. I think Kiwima's statement about not closing RfVs without first giving it a real go yourself as well as the other comments under that are good advice for how to improve things going forward. —The Editor's Apprentice (talk) 00:35, 8 March 2023 (UTC)
@The Editor's Apprentice In that discussion User:Chuck Entz and 98.* noted that the term might be attestable in a different sense, but the RFV was only for that specific sense. By the way, if you like non-web cites check out the quotations on devilfish. Ioaxxere (talk) 04:19, 9 March 2023 (UTC)
It's (or at least it was) fairly common for editors to change the definition after discussions like those, so that the cites can fall under one definition. AG202 (talk) 12:37, 9 March 2023 (UTC)
Thanks for the reply @Ioaxxere I get your reasoning and will echo what AG202 said. On another note, I want to apologize for the note about web cites. It was an unnecessary comment, especially given your work with other terms like devilfish, as you point out. Overall, it made the message a backhanded compliment. I'll leave minor and irrelevant criticism aside in the future when I'm trying to show my appreciation to you and other editors. —The Editor's Apprentice (talk) 19:54, 10 March 2023 (UTC)

We were blessed with a new frequency list today. This one includes collocations, as well as lots of unwanted crap, but there are plenty of missing entries that we would be delighted to include. I made a list of a small selection of semi-glaring omissions at Wiktionary talk:Frequency lists/English/Wikipedia (2016)/10001-20000 Van Man Fan (talk) 20:58, 6 March 2023 (UTC)

A lot of these are not entry-worthy, but it might be interesting to build a "collocation suggester" with this data. – Jberkel 21:09, 6 March 2023 (UTC)
@Jberkel I suspected as much, I was kinda disappointed with the amount of dross - I might try again with more of a mix of sources, which has worked better for some languages than others.
P.s. if you are thinking of building a collocation suggester, I recommend going to the original source, since a lot of the work has already been done - though it wasn't directly relevant to what I am using the data for. Even better, it's under a suitable licence. Helrasincke (talk) 09:53, 19 March 2023 (UTC)

Coverage of Sign Languages

Though it was thriving at one point, our coverage of Sign Languages has fallen to the wayside in recent times. There's a strong need for more robust entries, but the barrier to entry is very high currently. Sign language entries either follow very complex entry names like 5@NearInsideNosehigh-PalmBack-5@NearInsideNosehigh-PalmBack 5@NearInsideNeckhigh-PalmBack-5@NearInsideNeckhigh-PalmBack or use Signwriting in entry names like 𝡌𝪛𝨒𝤆 or 𝣷𝪜 𝤃𝪜 𝣜𝪜 which isn't encoded the way it should be in Unicode (should also be vertical, which I've attempted to recreate in my own common.css). I myself have been trying to create the ASL entry for BODY as seen at BODY at Handspeak, but being that it's a multi-move sign, it gets very complex. The first entry type would be OpenB@Chest-PalmBack-OpenB@Chest-PalmBack Contact Contact OpenB@Abdomen-PalmBack-OpenB@Abdomen-PalmBack based on WT:AASE, and then the second type with Signwriting would be even more complex with something like 𝡚𝪧𝡚𝪡𝤅𝤅𝤪𝪛𝪤𝤪𝪤𝤅𝤅, but it doesn't feel the best. This is before we start getting into nonmanual markers and differences between for example, SNOW vs ¡SNOW!(chhh) (which can mean "blizzard/snowstorm", nonmanual marker depending on the speaker). Thus, there needs to be more clarity on it, especially since a lot of main contributors haven't been active in years.

On a second point, there's also a lack of proper family/etymology coverage for Sign Languages. (Old) French Sign Language is almost universally accepted as an ancestor of ASL, but it's not set as an ancestor here, nor is Old French Sign Language even an etymology-only language. Category:French Sign Languages only has three sign languages, and is missing many more. This makes it more difficult for future sign language coverage (as we can't show signs coming from ancestor languages). For these changes, I plan on going through at least the ASL subfamily on my own at some point, though support would be appreciated. Pinging @Rodasmith, @Numberguy6, @Msh210 AG202 (talk) 05:39, 7 March 2023 (UTC)

Thinking out loud a bit, this seems like a topic that would really benefit from specialists (even more so than most languages), and would probably be a good application for a grant. —Justin (koavf)TCM 07:39, 7 March 2023 (UTC)
@AG202, Koavf Agree on all counts with the points both of you are making. I have passing familiarity with ASL and I know it's very different from your typical spoken language due to the way signs work: hands cannot move as fast as the mouth, so to make up for this the individual signs encode a lot more information than phonemes do. Sign-language research seems much less developed than spoken-language research and most linguists do not focus on sign languages, with the result that (AFAIK) there isn't even a universally accepted way of symbolically representing signs. Benwing2 (talk) 23:45, 12 March 2023 (UTC)Reply[reply]
If you're motivated to work on it, I'm a grant writer and I have a personal interest in sign language documentation, so I'd be happy to collaborate on asking for funds, reporting, etc. —Justin (koavf)TCM 23:47, 12 March 2023 (UTC)Reply[reply]
Alas, I don't have time to focus on sign languages currently but maybe someone else will; I didn't realize you are a grant writer. Benwing2 (talk) 05:04, 13 March 2023 (UTC)Reply[reply]
I believe the records indicate that ASL was partly relexified with FSL, but not actually descended from it. But in general the evidence for genealogical relationships among sign languages is extremely poor, and often family proposals are little better than guesswork. If we don't expect "FSL family" to mean anything more than "contains significant FSL vocab" (the way for example English would be a Romance language, and Japanese both a Sinitic and a Germanic language), then listing ASL in the FSL family would probably make navigation easier. kwami (talk) 05:40, 13 March 2023 (UTC)Reply[reply]
@Kwamikagami My understanding was that ASL descends directly from Old LSF (Old French Sign) with a mix of native languages. At least that's what the works that I've looked at say. For example, A Historical and Etymological Dictionary of American Sign Language (2015, published by Gallaudet Uni Press) (intro linked here), states, "Since some of the first generations of ASD students were from the island, it is likely that a number of Martha's Vineyard Sign Language (MVSL) signs were incorporated into ASL, though probably less than is typically assumed." And estimates that 20% of the 300 MVSL signs documented were cognates with ASL signs, but it's unclear which loaned to which one. But it also makes it clear that there's a link from Old LSF to ASL and that the latter inherits from the former. Thus, with this + other sources, I'd say that it's clear enough to have ASL have Old LSF as an ancestor and be a part of its language family. AG202 (talk) 23:05, 14 March 2023 (UTC)Reply[reply]
From my understanding, Clerc couldn't understand the students and needed to learn something of their language. The original students mostly spoke 3 village SL's, with MVSL numerically dominant. I don't know if the result was a mixture, but the basis of ASL was not FSL. Especially since Clerc would have been mostly teaching them vocab: nearly all the grammar would have been whatever the students converged on, as they were already fluent, and there was only one speaker of FSL but dozens of MVSL. They certainly adopted a lot of FSL words, but for oral languages that would be considered secondary. kwami (talk) 00:15, 15 March 2023 (UTC)Reply[reply]
Is it possible that you could provide sources? Not denying what you’re saying, but from what I’ve seen, there’s more evidence to prove that it does in part come from Old FSL, at least what we have access to. And re: MVSL, the above says: “No more than four students from Martha’s Vineyard were present at ASD at the same time until the 1850s and 1860s, when their attendance peaked at around twelve students (Annual Report 1887)”. This is not as much as expected, and if there are only truly 60 cognates documented cognates between MVSL & ASL vs the many many cognates of LSF, then I’d be hard-pressed to see how the grammar + syntax would be as heavily impacted as well by MVSL. AG202 (talk) 00:24, 15 March 2023 (UTC)Reply[reply]
I don't recall the ref for Clerc not being able to understand the students. Something somewhere in what he himself wrote, if I remember correctly, rather than from Gallaudet.
May've been wrong about the number of MVSL-speakers. But when you bring deaf children together, they develop their own language. Yes, FSL will be the lexifier, bus as with a creole, not likely to be the basis of the grammar, because the students are already fluent in their own grammar. FSL wasn't transplanted here and then impacted by contact with other SLs, rather, children speaking those SLs were taught FSL in by a single teacher but most of their interaction, reinforcing the grammar they used, would've been with each other. This is quite a common occurrence with SL's, and in general SL 'families' are not going to be structured the same way as oral-language families, where it's native speakers who diverge from each other over time. kwami (talk) 00:34, 15 March 2023 (UTC)Reply[reply]
Thank you so much for the info. I don't have any personal experience with grant writing, but I'd be happy to collaborate where I can. AG202 (talk) 13:12, 13 March 2023 (UTC)Reply[reply]

Requesting rollback

I recently started checking Special:RecentChanges regularly to revert vandalism. I'd like to request the rollback right since some people make several edits that should be reverted in one go. I'm generally conservative and only revert if I'm sure. I sometimes point questionable edits out on Discord. Thanks! -- tbm (talk) 07:41, 8 March 2023 (UTC)

Approved by @Fenakhay. Vininn126 (talk) 09:03, 8 March 2023 (UTC)

Northern Kurdish alphabet

Hi, everybody.
I wanted to ask: what is the policy on Latin-script Northern Kurdish entries? Are we supposed to employ the base Hawar alphabet? the version including ⟨ḧ ẍ '⟩ for /ħ ɣ ʕ/? Should we follow the system listed at Wiktionary:Kurdish transliteration? Thanks in advance for any input on the subject. — GianWiki (talk) 17:28, 8 March 2023 (UTC)

@GianWiki AFAIK, Northern Kurdish is by default written in Latin script, so we need to follow whatever the actual usage is. If you have a Northern Kurdish dictionary that lists terms in Arabic script, it's undoubtedly outdated, and you should use a different one if possible. You may have to do some research to find out what the current usage is. I do see Wikipedia's article Kurdish alphabets, which mentions the base Hawar alphabet and in addition the ⟨ḧ ẍ '⟩ that Celadet Alî Bedirxan proposed using. I would (a) look to see what current entries do; (b) look at the "Kurdî" Wiktionary (which is Kurmanji/Northern Kurdish) at [3]; (c) consult with native Northern Kurdish speakers. I don't know if any are very active currently at Wiktionary but I know I spoke with some when I split "Kurdish" into Northern Kurdish and Central Kurdish. You can find the discussion of this split in Wiktionary:Beer parlour/2020/September#Kurdish and Wiktionary:Beer parlour/2020/October#Remaining Kurdish lemmas, where I spoke with @Balyozxane, Calak, Şêr and also with @Vahagn Petrosyan (who is Armenian but may be able to help with Kurdish). Benwing2 (talk) 23:33, 12 March 2023 (UTC)
@GianWiki: there is no formally agreed policy, but in practice we use the base Hawar alphabet. <ḧ ẍ '> as well as the aspiration symbol <’> should not be included in the pagename, but they should be shown in the headword line using the parameter head= as in antêx. Vahag (talk) 15:00, 13 March 2023 (UTC)
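A minimal sketch of the pagename convention described above, for anyone wanting to check entries mechanically. It assumes that the base Hawar spelling is obtained simply by writing ḧ as h and ẍ as x and by dropping the ' and ’ marks; that mapping is an assumption, not agreed policy. The function name `hawar_pagename` is hypothetical, and the sample strings are made up for illustration, not verified entries.

```python
import re
import unicodedata


def hawar_pagename(headword):
    """Derive a base-Hawar pagename from a headword form (assumed mapping)."""
    # Decompose so that ḧ/ẍ become h/x followed by a combining diaeresis.
    decomposed = unicodedata.normalize("NFD", headword)
    # Drop the diaeresis only after h and x, leaving ê, î, û, ç, ş untouched.
    decomposed = re.sub("([hxHX])\u0308", r"\1", decomposed)
    # Drop the headword-only marks: ASCII apostrophe and U+2019.
    decomposed = decomposed.replace("'", "").replace("\u2019", "")
    # Recompose so that ê and similar letters come back as single characters.
    return unicodedata.normalize("NFC", decomposed)


# Illustrative strings only (not checked against real entries).
for form in ("\u1e27eft", "p\u2019\u00eanc"):
    print(form, "->", hawar_pagename(form))
```

The full form would still go in head=, with the stripped form used as the page title.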
I see. Thank you very much for your help! — GianWiki (talk) 16:50, 13 March 2023 (UTC)
There's something I forgot to ask: aside from ⟨ḧ ẍ ' ’⟩, should the head= parameter also show a particular character for the trilled /r/? I saw the transliteration chart uses ⟨ř⟩, while the Ferhenga Birûskî: Kurmanji–English Dictionary uses ⟨r̄⟩. Is one of them preferable to the other? — GianWiki (talk) 17:31, 13 March 2023 (UTC)
You should show the trilled r in the headword. I don't think we ever discussed which symbol is preferable. Vahag (talk) 18:47, 13 March 2023 (UTC)
Many thanks for the advice; you've been extremely helpful! — GianWiki (talk) 16:51, 13 March 2023 (UTC)

Proto-Italic/Proto-Hellenic IPA

Would anyone be opposed to a mass removal of unsourced IPA from Proto-Hellenic and Proto-Italic entries? These have been added by one IP-hopping anonymous editor and the quality is questionable. Are there any other protolanguages with this same problem? — SURJECTION / T / C / L / 21:05, 8 March 2023 (UTC)

(Yes, kill, they're wrong, thank you.) Catonif (talk) 21:09, 8 March 2023 (UTC)
I don't oppose that. In general, I see no justification for including IPA on proto-language entries: since these words are not attested in writing, their spelling itself is typically phonemic. IPA or narrow phonetic transcriptions are unnecessary and often debatable.--Urszag (talk) 04:10, 9 March 2023 (UTC)
Yeah, I would support not having reconstructed IPA pronunciations without a source (as some PIE entries have, like *h₂éwis). If there are no objections within the next few days, I'll have a bot job remove ita-pro and grk-pro IPAs. — SURJECTION / T / C / L / 12:18, 10 March 2023 (UTC)
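For illustration only, here is a rough sketch of what the text-processing step of such a cleanup could look like. It is not the actual bot job. It assumes the pronunciation lines have the shape `* {{IPA|ita-pro|/.../}}`, and it treats any line with something after the closing braces (such as a `<ref>`) as sourced and leaves it alone. The function name `strip_unsourced_ipa` is hypothetical.

```python
import re

# Language codes named in the thread; extend as needed.
TARGET_LANGS = ("ita-pro", "grk-pro")

# Assumed line shape: optional bullet, one {{IPA|code|...}} call, nothing after it.
_LANGS = "|".join(re.escape(code) for code in TARGET_LANGS)
IPA_LINE_RE = re.compile(r"^\*?\s*\{\{IPA\|(?:" + _LANGS + r")\|[^{}]*\}\}\s*$")


def strip_unsourced_ipa(wikitext):
    """Drop bare IPA lines for the target languages; keep everything else."""
    kept = [line for line in wikitext.splitlines()
            if not IPA_LINE_RE.match(line)]
    return "\n".join(kept)


sample = "\n".join([
    "===Pronunciation===",
    "* {{IPA|ita-pro|/.../}}",
    "* {{IPA|ita-pro|/.../}}<ref>source</ref>",
])
print(strip_unsourced_ipa(sample))
```

A real run would also need to remove any Pronunciation header left empty, which this sketch does not attempt.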
Remove them for Proto-Celtic and Proto-Brythonic as well. --– Sokkjō 06:12, 16 March 2023 (UTC)
Absolutely, and block the IP if they revert any deletions. --– Sokkjō 05:25, 12 March 2023 (UTC)
@Surjection Yes, please remove the IPA. I'm generally opposed in any case to IPA attached to reconstructed languages for the reasons enumerated by User:Urszag. I would even argue we should remove the IPA from Proto-Germanic, because it doesn't seem to add a lot compared with the spelling and may not represent a consensus (and if it's kept, it should DEFINITELY be generated by a pronunciation module rather than hard-coded manually). Benwing2 (talk) 23:39, 12 March 2023 (UTC)
I don't think that IPA is a good idea for reconstructed languages, unless the reconstructions actually use IPA. People are going to read IPA transcriptions as indicating pronunciation, which will be misleading. Any orthography is likely to mislead people that way, but using IPA is likely to make it worse. Certainly if it's appended to the reconstruction, as if to say "this is what it really sounded like," that would be problematic. kwami (talk) 05:48, 13 March 2023 (UTC)
I think having a phonemic transcription for proto-languages that use an obscure orthography is not a bad idea. Karen kalantari (talk) 05:31, 16 March 2023 (UTC)
What you're suggesting is that we add a second orthography. It won't be any more phonemic than the first one. At that point, IMO it would be simpler to use a reformed orthog that is more accessible. kwami (talk) 05:52, 16 March 2023 (UTC)
That's a good solution, or at least talk about the orthography in the "about: proto-language" section. Some languages like Proto-Turkic lack this. Karen kalantari (talk) 06:38, 16 March 2023 (UTC)
If an orthography is obscure, i.e. limited to one or a few researchers, I think it might be reasonable to transliterate it into a more international system. But if it's the norm in its field, then we will presumably want to stick with convention because that's what most RS's will be using. (That would be different from the IPA, where we generally normalize transcriptions, because the IPA has internationally accepted values. Reconstructions in general do not.) I'm doubtful about the utility of having multiple transcriptions. I suspect in many cases that would only cause confusion.
In some cases, reconstructions are intentionally agnostic. (E.g. capital letters that could mean almost anything.) Also, researchers may disagree as to what sound *G was. In such cases, using IPA could give a wrong impression of phonetic precision or of agreement among scholars. kwami (talk) 06:45, 16 March 2023 (UTC)

POS headers / headword lines

I'm currently in the process of recreating the entry stats for Wiktionary, and came across some inconsistencies while working on the parser: According to WT:EL, there's always one headword line per POS header.

Each entry has one or more POS sections. In each, there is a headword line

For two nouns, this would look like:

===Noun===
{{head|xx|noun}}

# def

===Noun===
{{head|xx|noun}}

# def

However, some entries group headwords under a single POS header:

===Noun===
{{head|xx|noun}}

# def

{{head|xx|noun}}

# def

Am I interpreting WT:EL correctly? What should be the standard formatting? From a parsing perspective, the first option is easier. Jberkel 10:54, 10 March 2023 (UTC)

Can you give concrete examples? In cases like English denier and Turkish melemen, the two meanings have different etymologies and are in different Etymology sections. Are there nouns that are homographs and have a common etymology, yet are in some sense distinct rather than a single noun with two distinct senses?  --Lambiam 00:08, 12 March 2023 (UTC)
@Jberkel I have encountered the second style above occasionally and I view it as purely an error, and always correct it by duplicating the header above the second headword to make it look like the first style given above. The second style is not common and I think it occurs either because people aren't familiar with WT:EL, or because it's a holdover from several years ago before WT:EL got solidified. Benwing2 (talk) 23:16, 12 March 2023 (UTC)
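The two layouts shown above can be told apart mechanically, and the correction described (repeating the header above the second headword line) is easy to sketch. The following is a simplified illustration, not the stats parser itself. It assumes that only generic `{{head|...}}` lines count as headword lines, whereas real entries often use language-specific templates. The function names are hypothetical.

```python
import re

# Any wikitext header from level 2 to 6, e.g. ===Noun===.
HEADER_RE = re.compile(r"^(={2,6})\s*(.+?)\s*\1\s*$")
# Assumption: only generic {{head|...}} lines are treated as headword lines.
HEADWORD_RE = re.compile(r"^\{\{head\|")


def headword_counts(wikitext):
    """Return (header name, number of headword lines) per section, in order."""
    sections = []
    for line in wikitext.splitlines():
        match = HEADER_RE.match(line)
        if match:
            sections.append([match.group(2), 0])
        elif HEADWORD_RE.match(line) and sections:
            sections[-1][1] += 1
    return [(name, count) for name, count in sections]


def split_grouped_headwords(wikitext):
    """Repeat the preceding header above every extra headword line."""
    out = []
    last_header = None
    seen_headword = False
    for line in wikitext.splitlines():
        if HEADER_RE.match(line):
            last_header = line
            seen_headword = False
        elif HEADWORD_RE.match(line):
            if seen_headword and last_header is not None:
                out.append(last_header)  # duplicate the header
            seen_headword = True
        out.append(line)
    return "\n".join(out)


grouped = "===Noun===\n{{head|xx|noun}}\n\n# def\n\n{{head|xx|noun}}\n\n# def"
print(headword_counts(grouped))
print(headword_counts(split_grouped_headwords(grouped)))
```

Any section reported with a count above 1 is an instance of the second style.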
@Lambiam: Yes, at least given what we have on Wiktionary at the moment.
  1. There are dhātu f and dhātu m (root of a word), though personally I think only the masculine form is borrowed rather than inherited. Even if the feminine is chiefly inherited, it is also used to mean 'root of a word', as well as in a range of other meanings.
  2. More securely, there are palāsa m or n (leaf), palāsa n (foliage) and palāsa m (bastard teak). The latter is so called because of its red petals; in Sanskrit the word includes the meaning 'petal'. The first two senses would benefit from an assembly of quotations to confirm the associations of gender, number and meaning. There's also an adjective sense under the same etymology, palāsa (green).
For English, we might wind up with monosyllabic cafe in the same section as disyllabic cafe. However, the pronunciation difference might lead to separate etymologies! --RichardW57m (talk) 17:40, 13 March 2023 (UTC)

Frequency information

Discussion moved from Wiktionary:Tea_room/2023/March.

I recently found (and made a template for) {{R:pl:SFPW}}, and I am thinking about using this somehow. Sadly, it's from 1990, which is a little dated, so some things might have changed, but it should still be interesting for people. I am considering making a template, something like {{pl-freq 1990}}, which when given certain parameters would print various information about the frequency of the given word automatically.

Question 1: Has frequency information like this ever been documented in the mainspace? The closest I've seen is the information on surnames.
Question 2: What section should this go under? Currently surname information is listed under the non-standard header "Statistics", but I think this and frequency information should be put under the header "Trivia", which is a header listed on WT:ELE. Vininn126 (talk) 00:06, 10 March 2023 (UTC)

I have included an example of this on the page sprawa. If anyone thinks it should be done differently, please let me know. Otherwise I would like to set this as the standard for such things in the future. Vininn126 (talk) 11:26, 10 March 2023 (UTC)
This is how I would rewrite the template for clarity and concision:
The Słownik frekwencyjny polszczyzny współczesnej (1990) found sprawa to be the 47th most common word in Polish, appearing 77 times in scientific texts, 243 times in news, 335 times in essays, 114 times in fiction, and 114 times in plays, totaling 883 uses.
I've moved the frequency up, removed the factoid "one of the top 10,355" (obviously if it's 47th), and removed the size of the corpus since it didn't seem relevant to the frequency of any particular word. I recognize that the latter two changes are a matter of personal taste. I think the implementation is fine; the entry looks great overall. Ultimateria (talk) 00:58, 15 March 2023 (UTC)
I think the size of the corpus is hugely important when presenting these numbers, so people can do the math themselves. Vininn126 (talk) 11:12, 15 March 2023 (UTC)
Why does it still say "one of the top 10,355"? This isn't useful information.
The breakdown of numbers by genre is also quite useless, as we don't know the breakdown of the totals.
If anything, only parameters 6 and 7 should be included. This will also help ease of reading, as a list of numbers is a little bit hard to digest. Something like "in a corpus of ... words, ... appeared ... times, making it the ... common word" etc. 85.255.237.74 04:28, 2 June 2023 (UTC)
1) How is that not useful?
2) How is this not useful either? Vininn126 (talk) 09:19, 2 June 2023 (UTC)
1) As already mentioned by Ultimateria, if you say a word is the 147th most common word, then that it is in the top 1234567 words is already known, hence not useful.
2) When I say that "in a corpus of Y words, word A appears X times", this gives me useful information about the frequency of the word appearing: I can expect the word to appear X/Y of the time, as a fraction. But notice that if you omit Y, X loses all meaning. Similarly, you might be interested in the ranking "word A is the 147th most common word". At the moment there is information like "word A appears 3874 times in science etc.", omitting the total number of words in the science corpus (!). This only makes sense when you make some assumptions about what proportion of the corpus of Y words is made up of science words. 82.46.123.120 14:19, 2 June 2023 (UTC)
I support having this information in general, but saying it is in the XXXX most common words is completely useless if you then say what actual rank this has. The way the template is worded doesn't even tell me that the top ten thousand-whatever words were analyzed, because for all I know (and this would have been my interpretation if the number was a round number), there is a second category that goes to 20,000 words. So I'd either say, "Of 10,XXX words analyzed" or drop that information. And if they only counted the most common words, I would drop it in that case as well, since then the rank of the word tells you all you need to know. Andrew Sheedy (talk) 14:28, 2 June 2023 (UTC)
I have updated the template to just take that part out of the wording. Vininn126 (talk) 14:57, 2 June 2023 (UTC)
I find that first argument somewhat compelling and I can probably omit that. As far as the genres are concerned, I feel it's important because if you look at some words, you'll see for example no#Polish is more popular in different areas. It's similar to labels, etc. I could explain how big each corpus is (they are all equal in size, equally dividing the entire corpus). Vininn126 (talk) 14:30, 2 June 2023 (UTC)
Ok, makes sense. In that case each subject corpus is a round 100,000 words.
If you want to keep all the info in, maybe something like this:
According to ..., ... is the Xst most common word in a corpus of 500,000 words, appearing A times in scientific texts, B times in news, C times in essays, D times in fiction, and E times in plays, each out of a corpus of 100,000 words, totaling A+B+C+D+E times, making it the Zst most common word in a corpus of 500,000 words. 82.46.123.120 15:00, 2 June 2023 (UTC)
Oops, removed the duplication I left in:
According to ..., ... is the Xst most common word, appearing 191 times in scientific texts, 161 times in news, 128 times in essays, 199 times in fiction, and 169 times in plays, each out of a corpus of 100,000 words, totaling 848 times, in a corpus of 500,000 words. 82.46.123.120 15:02, 2 June 2023 (UTC)
Sure. Vininn126 (talk) 15:08, 2 June 2023 (UTC)
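The arithmetic behind the wording above can be sketched as follows, using the sprawa counts quoted earlier in this thread. It assumes five genre subcorpora of 100,000 words each, as stated in the thread. The function names and the exact sentence wording are hypothetical and are not the template's actual output.

```python
# Assumption from the thread: five genre subcorpora of 100,000 words each.
SUBCORPUS_SIZE = 100_000
GENRES = ("scientific texts", "news", "essays", "fiction", "plays")


def frequency_summary(counts):
    """Return (total count, corpus size, occurrences per million words)."""
    if len(counts) != len(GENRES):
        raise ValueError("expected one count per genre")
    total = sum(counts)
    corpus_size = SUBCORPUS_SIZE * len(GENRES)
    per_million = total * 1_000_000 / corpus_size
    return total, corpus_size, per_million


def describe(word, rank, counts):
    """Build a sentence that states both the counts and the corpus sizes."""
    total, corpus_size, _ = frequency_summary(counts)
    parts = [f"{n} times in {genre}" for n, genre in zip(counts, GENRES)]
    listing = ", ".join(parts[:-1]) + ", and " + parts[-1]
    return (f"{word} is ranked {rank}, appearing {listing}, "
            f"each out of {SUBCORPUS_SIZE:,} words, "
            f"totaling {total} times in a corpus of {corpus_size:,} words.")


# Counts for sprawa as quoted earlier in the thread.
sprawa = [77, 243, 335, 114, 114]
print(frequency_summary(sprawa))
print(describe("sprawa", 47, sprawa))
```

Stating the corpus size alongside each count is what lets a reader turn a raw count into a rate, which is the point made about X/Y above.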
  • A long long time ago, Wiktionary contained word frequency for English words, using Template:rank. We decided later on it was dumb. Van Man Fan (talk) 03:10, 18 March 2023 (UTC)
    Interesting. I don't see any arguments why in that thread, but I'm willing to hear them. Vininn126 (talk) 10:18, 18 March 2023 (UTC)
    @Vininn126 I'm not huge on the idea of including frequency here mostly because it's a very vague concept (in what linguistic domain? encompassing which register/s, time periods, geographic groupings?) and we don't have the resources or access to the kind of high quality data which are required for these statistics to really be reliable. Relative frequency rank is only really valid if you have a truly representative corpus. That said, if you are interested in working on this anyway with the resources we do have access to, there's no need to reinvent the wheel. Here it is in action. Maybe these numbers could be incorporated somehow if you want to build a test-case. Helrasincke (talk) 11:43, 20 March 2023 (UTC)
    @Helrasincke I agree providing context is incredibly important with this. I am basing this off a frequency dictionary printed some time ago, but have tried to include all the relevant information. I wouldn't be opposed to including other sources, of course. If you actually look at the implementation, I think you'll see I've tried to explain everything needed. Vininn126 (talk) 11:47, 20 March 2023 (UTC)
@Vininn126: As someone who relies heavily on frequency lists for language learning, I'm in favor of using frequency lists to improve coverage, but I'm not sold on including it as a Trivia section for the reasons mentioned by User:Helrasincke. A quick look at our frequency lists shows that the top 10 most common English words in English Wikipedia are "the of and to in a is was that for" while the top 10 words used in [4] are "the I to and a of was he you it". They don't even agree on the top two most common words, but at least 9 words do appear on both lists. The 2000th most common fiction word is "teen", which appears at position 17,159 in the Wikipedia list. Telling readers that "teen" is either the 2000th most common word or the 17,159th most common word really doesn't give them any useful information and presenting only one could be unintentionally misleading. JeffDoozan (talk) 16:40, 2 June 2023 (UTC)
@JeffDoozan So how should that be presented? I also don't see how it could be misleading. Vininn126 (talk) 16:42, 2 June 2023 (UTC)
@Vininn126: I don't think it should be presented to the reader at all. It could be misleading because if I'm unfamiliar with the word "teen" and am therefore searching for its definition on Wiktionary, and it tells me that it's the 17,159th most common word, I might think that it's a relatively uncommon word (on par with interspersed, microcode, or socioeconomic according to the Wikipedia list). JeffDoozan (talk) 16:48, 2 June 2023 (UTC)
So how would you word it? I'm just trying to present the information in this specific dictionary as flatly and clearly as possible - i.e. giving the year, specific dictionary, etc. Vininn126 (talk) 17:04, 2 June 2023 (UTC)

More on alternative forms

This time it's the several variants of inscripturated. All of them, best I can tell from informal research, are of relatively recent coinage, probably by theologians, who are the main users of these words. There is inscripturate, which (like its coordinate term incarnate) can be either a verb or an adjective, and clearly deserves its own entry. Then there's enscripturated, which appears to be merely an alternative spelling (much less common) of "inscripturated". Last and least, a day or so ago the ongoing compulsion to dig further here got the better of me, and I found that enscriptured is yet another form (even less common, but still attested in reputable sources). How should it be listed? As simply yet another alternative form of "inscripturate"? That doesn't quite seem to fit. Surely not as a full-fledged word in its own right, a synonym of "inscripturate" that just happens to look similar. Or are we allowed to label a word as an alternative form of an alternative form? That is what this seems to be. – HelpMyUnbelief (talk) 13:26, 10 March 2023 (UTC)

I think the argument can be made either way really: whether certain forms ultimately represent a single word is a qualitative judgement and your example seems borderline. I've seen some people suggest that altforms have to be purely orthographic, but I imagine few people disagree with aluminum being listed as an altform of aluminium and that's not a purely orthographic variation, they're pronounced differently too. The guideline at WT:FORMS is simply that altforms are "variants of a single word" that should be identical in meaning, not errors, and satisfy the CFI. —Al-Muqanna المقنع (talk) 14:02, 10 March 2023 (UTC)

On our list of English one-letter words

The list for English one-letter words only has three, the standard a, I, and O. However, we list far more one-letter words than that. Here's a list, separated by how iffy they are:

Words

  • A - "London euph. for arsehole"
  • c - "Alt. form of c., as in circa"
  • C - "One hundred dollars"
  • D - "Slang for dick"
  • d - "Abbr. for down, in the crossword sense"
  • E - "Slang for ecstasy"
  • e - A Spivak pronoun
  • F - "Fahrenheit"
  • f - "Euph. for fuck"
  • G/g - "Unit of gravitational acceleration"
  • H - "Abbr. for heroin"
  • h - "Internet filler response"
  • J/j - "A marijuana cigarette"
  • K/k - "OK"
  • L - "Slang for loss"
  • n - "Shortening of and"
  • o - "Zero"
  • p - "pretty"
  • Q - "QAnon, anon. person on message boards"
  • R/r - "radius"
  • T/t - "time"
  • U - "Char. of the upper class, as in language"
  • v - "Abbr. of versus, in the name of a case"
  • W - "Slang for win."
  • X - "Obscene, as in a film"
  • x - "Ship indicator"
  • Y - "Facility run by the YMCA/YWCA"
  • Z - "Z-drug"

Iffy

I suggest that we include at least some of the ones of the upper list, and leave a note at the top of the category saying that common ones are a, I, and maybe O. Three citations, for all senses. (talk) 20:15, 11 March 2023 (UTC)

What is more iffy about those on the second list? Most I recognize as being commonly used abbreviations.  --Lambiam 23:52, 11 March 2023 (UTC)
If we include cases where the "word" is just the name of the letter itself used with a particular sense (e.g. D, H, Q) then there seems to be no reason to not just include all letters, as any letter can be used as a word to refer to the letter itself.-Urszag (talk) 23:55, 11 March 2023 (UTC)
I don't understand your criteria for iffiness. I'm also going to RFV "h"! Equinox 00:04, 12 March 2023 (UTC)
Mainly "would a reasonable person say the letter as itself (eg "U" as "yoo" or "W" as "dub") or as something else (eg "b" for "born")?" I also put V on the Iffy list as it's just a subsense of the shape. Although maybe D isn't as iffy. Three citations, for all senses. (talk) 00:08, 12 March 2023 (UTC)
All v iffy. – Sokkjō 05:31, 12 March 2023 (UTC)
A = arsehole, c = circa, q = question, b = billion, p = pretty I've all heard pronounced as the letter. —Al-Muqanna المقنع (talk) 14:41, 12 March 2023 (UTC)
'V' is not a word, just as '3', '@' and '♃' are not words. 'V' is just a letter. The name of that letter is vee, not 'V'.
Looks to me that the only one-letter words here, including letter names, are 'a', 'e', 'i', 'I', 'o', 'O', 'u' and 'n', though usually 'n' is written with an apostrophe. There are going to be some interjections as well, such as 'm' indicating something tastes good, or as a variant of 'hm' or 'um'. kwami (talk) 05:30, 13 March 2023 (UTC)
Only if you exclude words that aren't spelled phonetically, which would be an arbitrary restriction. Nobody writes e.g. "bee" for "billion"; the spelling "b" is standard, so if treated as a separate word it's a word with one letter. CitationsFreak's suggestion makes somewhat more sense for distinguishing "separate words" in that independent pronunciation is a relatively standard way of distinguishing mere abbreviations (i.e., ones that function only as written representations of other words) from forms with some independent lexical character. —Al-Muqanna المقنع (talk) 10:19, 13 March 2023 (UTC)
I agree about following the pronunciation, but by that standard, '3' (three), '@' (at), '+' (plus) and '♃' (Jupiter) are all words, and the restriction to letters is arbitrary. kwami (talk) 10:34, 13 March 2023 (UTC)
Yeah, it's arbitrary, like any other starting principle, but it's the choice that happens to have been selected ("English one-letter words"). The restriction to pronunciation spellings doesn't follow as a necessity. I don't consider '@' etc. to be in the same category anyway, by the same principle that they're mere representations of words and not lexically independent (there isn't some special pronunciation of '@' as opposed to 'at')—though there might perhaps be cases where a symbol has its own special pronunciation, in which case it's more interesting. —Al-Muqanna المقنع (talk) 12:23, 13 March 2023 (UTC)
Like many others have said above, most of the 'iffy' ones are no such thing. A search for 'b.1756 d.1791' unsurprisingly yields many hits about Mozart, for example[5]. 'A' for 'arsehole' seems deeply iffy though. Even if someone does insult someone by calling them an 'A' (does this actually happen?) then how do we know they're not calling them an 'arse' rather than an 'arsehole'? That would be more consistent with how people say 'A-hole' to mean arsehole/asshole. --Overlordnat1 (talk) 13:22, 13 March 2023 (UTC)
All of those would just seem to be the letters standing in for the word, not words of their own. 'B' for 'bitch' is common, and it's clearly 'bitch' as opposed to anything else, but again it's the letter as a euphemism, not a distinct word. Similarly with 'F you!'.
And if we're going to accept all the letters of the English alphabet, why not the Greek alphabet too? A muon is often called a 'μ', a photon a 'γ', etc. These aren't written out 'mu' or 'gamma' any more than 'b' for 'born' is written out 'bee'. kwami (talk) 14:22, 13 March 2023 (UTC)
I would consider those to be translingual. A photon is still a γ particle in another language, even a language which doesn't include a /g/ phoneme and normally transliterates /g/ by /k/ or some other sound. So, they would be out of place in this category. Soap 14:56, 13 March 2023 (UTC)
A letter that's being said as a euphemism is a word, at least according to my gut. Also, "γ" (as in "photon") is totally a Trans. one-letter word, as it's pronounced like "gamma" in the various tongues that use it (and is just a shortening of "gamma ray"). CitationsFreak: Accessed 2023/01/01 (talk) 22:53, 13 March 2023 (UTC)
Euphemisms are distinct words, yes. That shouldn't be problematic. —Al-Muqanna المقنع (talk) 22:56, 13 March 2023 (UTC)
I think your average user is going to expect the category to contain words that are pronounced either as the name of the letters or the letter's sound. I think that's where you're going with the "iffy" list. I would exclude all abbreviations that are pronounced as the full word (for instance, if I see "q." I read "question" not "queue/cue"). Unless there's evidence that it's pronounced the latter way, I wouldn't want to see it in the category, because it would be a purely typographical convention, not a single-letter "word". Andrew Sheedy (talk) 14:35, 2 June 2023 (UTC)
That was my intent. The "iffy" was me having no evidence for people ONLY pronouncing, say "50 L" as "50 Liter[s]". (And V is just a reference to its shape.) CitationsFreak: Accessed 2023/01/01 (talk) 14:40, 2 June 2023 (UTC)

I saw this discussion, and I just want to state the obvious: it is wicked hard to cite some of these words, and it's not hard merely because it's actually rare, it's just hard because you aren't sure how to look for it. I'm so proud of my citations for the ancient Chinese kingdom of E, also romanized as O. But I feel I only found the third clear cite for 'O' just a minute ago, despite doing cites for years. You sometimes have to think outside the box to find these things. But I tell you, just as much as a, I and O are words, I have fully confirmed that E and O are English language proper noun terms: a name for that ancient state. --Geographyinitiative (talk) 14:57, 2 June 2023 (UTC)Reply[reply]

Category:Administration

I propose deleting Category:Administration, since most of the language subcats are empty except for the category "Public administration". Even Category:en:Administration only has one entry. --Numberguy6 (talk) 18:39, 12 March 2023 (UTC)

@Numberguy6 Agreed, although this should probably be discussed at WT:RFDO. Benwing2 (talk) 23:55, 12 March 2023 (UTC)

Wikimania 2023 Welcoming Program Submissions

 

Do you want to host an in-person or virtual session at Wikimania 2023? Maybe a hands-on workshop, a lively discussion, a fun performance, a catchy poster, or a memorable lightning talk? Submissions are open until March 28. The event will have dedicated hybrid blocks, so virtual submissions and pre-recorded content are also welcome. If you have any questions, please join us at an upcoming conversation on March 12 or 19, or reach out by email at wikimania@wikimedia.org or on Telegram. More information on-wiki.

"Someone must have slandered Josef K., for one morning, without having done anything truly wrong, he was arrested." edit

Time after time my edits have been reverted by the admin @Fenakhay. I have time after time tried to fish out of him what rule I have broken on his talk page, to no avail.

This culminated in me being blocked by him for "refusing to learn" God knows what. I appealed the block, but no one noticed.

Each time my "crime" was to point out similarities between Hebrew and Arabic, under the guise of "duplicated entries", whatever that means. Of course, I have not seen Fenakhay make a problem out of this for any other set of languages than Hebrew and Arabic. I've also pointed out similarities between Dutch and German; here for example he edited right after me, and it seemed to be no problem to him.

I demand to concretely know, finally, what rule I have broken, and whether this welcoming behavior is to be expected from other Wiktionary admins. Synotia (talk) 09:39, 14 March 2023 (UTC)

Your edits were irrelevant where placed and badly formatted. Rule of the whole WWW: Only post relevant content. Relevancy may also be affected by duplicate nature. Fay Freak (talk) 10:32, 14 March 2023 (UTC)
How are they irrelevant? According to what criteria? Synotia (talk) 10:34, 14 March 2023 (UTC)
You have already been told the considerations by various people, yet refuse to learn. Fay Freak (talk) 10:43, 14 March 2023 (UTC)
@Synotia The reason why your additions are duplicative is that you can simply click on the Proto-Semitic ancestor to see all the descendants. We try to avoid duplication like this because it tends to lead to errors. (Also, referring to Hebrew as the "language of the Zionists" sounds pejorative and is best avoided.) Benwing2 (talk) 02:59, 15 March 2023 (UTC)
@Synotia: I am with @Fenakhay, @Fay Freak and @Benwing2 on this and I have explained it to you on Fenakhay's page.
Imagine if I added all Slavic cognates to the Ukrainian term вода́ (vodá, water). It's way more than just a couple, there is no point in this and it would be a huge duplication if *voda exists and lists all descendants. Anatoli T. (обсудить/вклад) 03:30, 15 March 2023 (UTC)
I wonder how many site visitors do this. Synotia (talk) 20:53, 15 March 2023 (UTC)

Cleaning up Persian templates

(Notifying Ariamihr, Dijan, Mazsch, Qehath, ZxxZxxZ, Sameerhameedy): Seems not too many active Persian editors, but User:Atitarev and I have been discussing cleaning up the Persian-specific templates, which are in a messy state currently. IMO, e.g. {{fa-adv}}, {{fa-conjunction}}, {{fa-interjection}}, {{fa-phrase}}, {{fa-preposition}}, {{fa-pronoun}} don't really accomplish anything and should be eliminated in favor of directly calling {{head}}, and some of the other templates have weird and non-standard param usages that could stand to be cleaned up and standardized. There's also things like {{fa-verb/new}}, {{fa-IPA/old}} etc. that are in a halfway state. We also have 61 (?!) verb conjugation templates, which are certainly in an awful state (although cleaning that up will take significant effort). Any Persian editors have any thoughts on this? Benwing2 (talk) 03:05, 15 March 2023 (UTC)Reply[reply]

Thanks. {{fa-proper noun}} should be kept, IMO, and it needs an optional |g= for pluralia tantum, such as قرون وسطی (qorun-e vostâ, the Middle Ages).
{{fa-IPA/old}} can possibly be converted to {{fa-IPA}} by a bot, which does the work. Anatoli T. (обсудить/вклад) 03:22, 15 March 2023 (UTC)
@Atitarev Agreed on {{fa-proper noun}}; I'm only proposing removing the 6 templates I mentioned above, which take only a head= and tr= param. Probably possible to convert {{fa-IPA/old}} to {{fa-IPA}} by bot, although I haven't looked into the details, and likewise for {{fa-verb}} vs. {{fa-verb/new}}. Benwing2 (talk) 03:28, 15 March 2023 (UTC)
@Atitarev FYI even after deleting some old templates there are 123 remaining fa-* templates. It will take some time to clean these all up. Benwing2 (talk) 04:52, 17 March 2023 (UTC)
@Benwing2: Thank you! Anatoli T. (обсудить/вклад) 04:57, 17 March 2023 (UTC)

Entries for bird names

I just wanted to double-check that this is worthwhile doing. I've been adding Welsh names for birds and I'm finding that many of the English translations don't have their own entry. Some entries like pileated woodpecker have been around since 2010, but others like downy woodpecker or bright-rumped attila were missing until I recently added them.
I was thinking that these probably qualified for inclusion based on existing entries, how a non-native speaker might approach these terms and look them up (I wouldn't fault anybody for not knowing that a bright-rumped attila was even a bird), and that they aren't SOP since they refer to a specific species of bird and not to any bird which merely fits the characteristics of the name. But I just wanted to confirm and hear the community's thoughts first before making too many more entries and ending up potentially wasting my time. – Guitarmankev1 (talk) 15:05, 15 March 2023 (UTC)

I think they are fine, as long as they are attestable. (See WT:ATTEST.) Some bird names are hard to find in use, though they often appear principally in synonym listings in wildlife books, i.e., in mentions, not uses. DCDuring (talk) 15:33, 15 March 2023 (UTC)
I tend to agree with DCDuring tbh, I don't see any other reason to exclude them honestly. User: The Ice Mage talk to meh 19:00, 15 March 2023 (UTC)

@Sgconlaw Can you explain what the purpose of this category is and why it's needed? It seems very strange to me esp. given that it only ever contains one subcategory, 'Carbon'. Benwing2 (talk) 06:34, 16 March 2023 (UTC)Reply[reply]

It was created by @Solomonfromfinland, so I just added it to the module. It is the parent category of “Category:en:Categories named after chemical elements” which has several entries in it. — Sgconlaw (talk) 11:36, 16 March 2023 (UTC)Reply[reply]
@Sgconlaw, Chuck Entz This seems highly questionable. It appears that User:Solomonfromfinland created a zillion element-specific categories each of which is a grab bag of junk; e.g. Category:en:Iron contains cast iron, ductile iron, etc. but also ferrous, iron-sick, blacksmith (??), irony (totally wrong), and other randomness. Chuck, this is pushing the limits of the topics-as-sets vs. topics-as-related-terms issue; should we consider trying once and for all to solve this e.g. by renaming the 'related terms' categories to something like 'Iron-related'? Benwing2 (talk) 04:47, 17 March 2023 (UTC)Reply[reply]
@Benwing2 Re: " irony (totally wrong)" irony#Etymology 2 is correctly associated with the element. DCDuring (talk) 15:43, 17 March 2023 (UTC)
@DCDuring Hmm, thanks, never heard that usage but you are right. Benwing2 (talk) 15:46, 17 March 2023 (UTC)Reply[reply]
Reminds me of liver#Etymology 2. Is there a name for words like that? Theknightwho (talk) 18:06, 17 March 2023 (UTC)Reply[reply]
@Theknightwho You mean "someone who lives"? I would call that an agent noun. Benwing2 (talk) 18:11, 17 March 2023 (UTC)Reply[reply]
@Benwing2 I meant situations where adding an affix to one word creates an uncommon homograph(?) of a much more common word that’s unrelated, like iron + -y and irony or live + -er and liver. The kind of thing that would trip up a non-native speaker. Theknightwho (talk) 18:21, 17 March 2023 (UTC)Reply[reply]
@Theknightwho: Oh. I bet there's an obscure term for this but I don't know it. Benwing2 (talk) 18:27, 17 March 2023 (UTC)Reply[reply]
@Theknightwho: Only the 25,000th time I've done that. Benwing2 (talk) 18:27, 17 March 2023 (UTC)Reply[reply]
@Benwing2 I meant they'd trip up doing it the other way round haha. I guess the rule is if native speakers think the derived term probably doesn't exist (because I had the same thought as you tbh). Theknightwho (talk) 18:29, 17 March 2023 (UTC)Reply[reply]
I don't know if there's a term for it, but Granger and some other users and I have a list of them here, in the Anteroom of Silliness, along with some silly definitions. - -sche (discuss) 01:44, 18 March 2023 (UTC)Reply[reply]
detail is also often used in crosswords to indicate removing the last letter of a word, it's de-tail but cheekily written without the hyphen (though I doubt this could be attested as an actual word). --Overlordnat1 (talk) 02:03, 18 March 2023 (UTC)Reply[reply]
@-sche Thanks for this - exactly what I was looking for. Theknightwho (talk) 21:59, 18 March 2023 (UTC)Reply[reply]
@Benwing2: they simply don't understand the abstract structures behind categorization (see their talk page), but our category organization has some real problems, too. This mess was a response to a particular oddity: Iron is an Ossetian language. Category:Iron Ossetian was mistakenly moved to Category:Iron and a category redirect was left behind when it was moved back. Instead of asking someone what to do about this, they improvised a hacky workaround. I would just delete the current Category:Iron, replace it with a daughter category of Category:Chemical elements, then orphan this category by moving everything else to its proper place under that category.
As for the contents of these categories: I have yet to see a workable way to deal with the overlap of topical and set categories. That leaves situations where the intersection of the topical structure and the set structure is artificially narrow, but there are enough such intersections to bloat one or the other if no subcategories are made. I experimented a little with categories for such overlaps in the case of maize, which has a wide body of terminology that makes even narrow categories workable in some languages. See Category:Maize (crop), Category:Maize (food) and Category:Maize (plant). This is just the main problem that occurred to me- I'm sure there are others. Chuck Entz (talk) 00:22, 18 March 2023 (UTC)Reply[reply]
@Chuck Entz. Thanks. Can you give me some examples of what you mean by this sentence:
Topical categories for specific things often only fit into the same conceptual framework as that used by the set categories for those things, but there are plenty of cases where they fit better into other conceptual frameworks, with the distribution of which is which not predictable from the specifics of either framework.
Also do you think it's worth trying to separate actual chemical-element categories (see Category:en:Categories named after chemical elements, User:Solomonfromfinland manually created a bunch of them) from subcategories like "Halogens", "Chalcogens", etc.? Or do you think we should just move halogen elements under "Halogens", chalcogen elements under "Chalcogens" etc.? The name Category:en:Categories named after chemical elements is terrible and needs to go. Any suggestions for rearranging the subhierarchy under CAT:Chemical elements are welcome. Benwing2 (talk) 00:30, 18 March 2023 (UTC)Reply[reply]
I created the remaining group categories needed to cover all of the elements (except hydrogen, which is unique), then added all of the chemical elements that are likely to need categories to the module as subcategories of the group categories and finally converted all of the chemical element categories to {{autocat}}. As of now, the main category contains nothing but empty subcategories. This particular edifice of coat hangers and duct tape is ready to be deleted. I'm sure there are more of them, though. Chuck Entz (talk) 07:19, 3 April 2023 (UTC)Reply[reply]

Ottoman borrowings of Arabic adverbial accusatives

@Itidal, Fay Freak, Rd1978, Ardahan Karabağ. Many Turkish adverbs were originally Arabic adverbial accusatives. These often end in -en in modern Turkish. An IP editor has defined Ottoman Turkish and Turkish suffixes ـاً and -en. I do not think these suffixes exist. Terms like اولا (evvela), تماماً (tamamen), and kısmen were formed long ago from Arabic nouns by the rules of Arabic grammar, mostly in Arabic as far as I can tell. I propose to delete any mention of the supposed Turkish suffixes. In general, these are borrowings from Arabic that happen to end in the same sound (plus or minus a nasal). Some of them may be pseudo-loans. Vox Sciurorum (talk) 16:16, 16 March 2023 (UTC)

Yes. It would have to exist in native Turkish words (even pseudo-Arabisms with it might not be enough if only extraordinarily occurring). Fay Freak (talk) 17:25, 16 March 2023 (UTC)Reply[reply]
I agree with deleting, actually. These suffixes don't exist in the native Turkish lexicon and entered our language directly. If there were words ending in the suffix -an/-en that occur in Turkish but have no Arabic counterpart, I wouldn't see a problem. For example, Tr. tamamen < Ar. تماما.
As you can see there is an Arabic form. Ardahan Karabağ (talk) 17:42, 16 March 2023 (UTC)Reply[reply]
there are some cases that Turkish speakers actually coin adverbs utilizing -en, such as tekniken ("technically"), (there's probably more which I can't remember). categorization of the Arabic adverbial accusative derivatives is functional, as an average speaker will mostly have easy time analyzing those adverbs from its stem and the suffix at issue. imho deletion is unnecessary. what are the criteria that decides whether a turkish word is "native" or not? Itidal (talk) 21:52, 16 March 2023 (UTC)Reply[reply]
If there are cases where -an/-en is a genuine suffix, then it's maybe OK to keep it but IMO no words borrowed whole from Arabic should mention it. Benwing2 (talk) 04:51, 17 March 2023 (UTC)Reply[reply]
we already mention those terms with surface etymology template. we obviously don’t consider them as terms that are genuinely coined in turkish. ex: müttefiken, hakikaten, binaen. Itidal (talk) 10:14, 17 March 2023 (UTC)
"but IMO no words borrowed whole from Arabic should mention it" as I wrote because surface etymology categorizes. Benwing2 (talk) 15:47, 17 March 2023 (UTC)Reply[reply]
I didn't know the suffix was productive. I checked {{R:tr:OTK}} and indeed it is there. It must be a modern Turkish innovation. It is not listed as belonging to Ottoman Turkish. Educated Ottomans would have recognized -en applied to Turkish roots as a barbarism. I will delete mentions of the Ottoman suffix ـاً and leave the modern suffix alone. Vox Sciurorum (talk) 12:28, 22 March 2023 (UTC)Reply[reply]

Medieval Greek

from Sarri.greek: notifying @Al-Muqanna who initiated the discussion Koine/Byzantine and @Benwing2. Also @Mahagaja, Erutuon, JohnC5 directors of ancient greek, and, although inactive, the 'fathers' of grc section @Atelaes, ObsequiousNewt. Mr @A. T. Galenitis has shown great interest on the subject during our discussions.
Subject: Applying to create Medieval Greek (gkm) as a language section; currently an etymology language (2016, BeerParlour) Category:Byzantine Greek, resulting in categories like Category:Ancient Greek terms derived from Old Anatolian Turkish. Three issues are raised, also concerning the periodization of Greek as described at WT:About Ancient_Greek#Divisions of the Greek language

  • 1) Could the term 'Byzantine' be changed to Medieval Greek (a term used in many of our contemporary sources)? It is also visible at {{grc-IPA}}.
  • 2) Would en.wiktionary agree to make Medieval Greek an autonomous language section? Not many lemmata would be added, but I feel that a gap of 1,000 years (6th to 17th century) is a somewhat serious omission. WT hosts many languages with very few lemmata; I wonder if this one could be added too.
  • and, 3) revisiting and updating texts about Greek language periodization, especially for Koine and Medieval Greek, as in appendixes, templates and lemmata.

The basic sources for the documentation of Med.Gr. by period: EarlyMed: Iustinianus up to 1100, learned, extension of Late Koine {{R:LBG}} dictionary. Main period, vulgar texts 1100 to 1453 and LateMed or EarlyModern (they coincide) 1453‑1699 i.e. 1500‑1700. Dictionaries: {{R:Kriaras Medieval}} & {{R:Kriaras Medieval2}} (22 printed volumes, up to τέως (téōs)), {{R:Dimitrakos 1964}}. Grammar, the 2019 Cambridge Grammar [6]. No inflection tables are required for this language; IPA is already included at {{grc-IPA}}. Texts are available on the internet.
I realise that a lot of technical interventions are needed to add a new language, and unfortunately I do not have the capacity to make them. Still, I hope that en.wiktionary will look favourably on this proposal. Thank you ‑‑Sarri.greek  I 18:41, 17 March 2023 (UTC)

PS: Of course, what I can do is review and update all existing lemmata of the category. ‑‑Sarri.greek  I 19:23, 17 March 2023 (UTC)
1) No, because Byzantine Greek can be from the 4th century CE. Wikipedia treating “Byzantine Greek” and “Medieval Greek” as synonyms (the latter is a subset of the former) and hence restricting the former to begin from “c. 600” makes them stupid, but they don’t know because they disregard primary sources for terminology usage and hence contrary evidence. I have frequently used it that way for borrowings into the Arabic language, which was spoken before the frontiers of the Byzantine Empire from the 4th century CE up to the 7th century CE (to when the Arabs expanded and a new era began); the most detailed treatment is perhaps at س ج ن (s-j-n), thence the categories bring you to other cases, often military and food terms.
2) As I have understood it the language is however shorter at the end, marked by Islam’s capture of Constantinople in 1453.
I do not deny that there were significant sound and grammar changes over the whole period (which sometimes have to be known for borrowings), but nothing hinders us to specify periods and variants to be more exact under a language name.
It is easiest if we don’t split that hard, I don’t see the problem. Fay Freak (talk) 01:26, 18 March 2023 (UTC)Reply[reply]
They are treated as synonyms on Wiktionary too at the moment: Byzantine Greek occupies the ISO language code gkm = Medieval Greek. So if they aren't in fact being used as synonyms then they will need to be split. In general, though, they have been used synonymously in the literature, so we would probably need a better justification to use a bespoke definition. AFAIK most contemporary Anglophone specialists would not use the term "Byzantine" for the empire before around the 6th century (if at all), which happens to coincide with the relevant linguistic shift.
The recent Cambridge Grammar of Medieval and Early Modern Greek rejects the term "Byzantine Greek" and suggests a periodisation of Greek into Early Medieval from 500 to 1100 and Late Medieval from 1100 to 1500 (p. xix) based on linguistic turning points. —Al-Muqanna المقنع (talk) 12:14, 18 March 2023 (UTC)Reply[reply]
Does not look like it if we have a recent work Byzantium and the Arabs in the Fourth Century. But I have a sinister suspicion that correct philological usage and fashionable historians’ usage that can be expected differs. Anglophone scientists appear to speak the same language, to the laymen, but they don’t, specialized interests make echo chambers. In this fashion you can be as convinced that “most” specialists (wherein?) use the term for one range as I am that it is for another. Wikipedia opts for the loudest, most circular echo chamber. Fay Freak (talk) 13:42, 18 March 2023 (UTC)Reply[reply]
The Byzantium and the Arabs ... series started in the 1980s so the title's not recent, but in any case I said "most" for a reason—there isn't a decisive consensus in usage, so it's a somewhat opaque term if the Constantinian period is meant (as opposed to "late Koine", which is clear and used in the literature). It also has no bearing on what gkm is labelled, sounds like it ought to be renamed from "Byzantine Greek" to "Medieval Greek" either way. —Al-Muqanna المقنع (talk) 13:52, 18 March 2023 (UTC)Reply[reply]
Cool, then we actually agree. The implication of the language code is of course a problem if it is different to that of the language name assigned to it when somebody uses it: another of the casual inexactitudes of those language databases (they didn’t go that deep into chronolects at those standardization gremiums that much, did they, they copied together overviews but did not investigate the actual usage of the terminology). It is possible to have separate codes or separate concepts “Byzantine Greek” and “Medieval Greek” with intersection, as for Latin “Renaissance Latin” and “Medieval Latin” overlap and somewhere in between the latter as a register we have “Ecclesiastical Latin” (unfair comparison since those are not merely chronolects). If we have “Late Koine” then the meaning of “Koine” would also be affected, I tended to view “Koine Greek” as of the time just before it but of course it is true that my pre-Medieval Byzantine Greek is also Koine. Two bad, ambiguous names for a particular era we have right now: I could arbitrarily have chosen the codes for “Koine Greek” and “Byzantine Greek” for those borrowings imagined to have taken place from the 4th to 7th century.
I don’t know, we could have Byzantine Greek with subcategories Late Koine and Medieval Greek and Koine Greek with subcategories Late Koine and Middle and Early Koine (somewhere between the two latter also the term Hellenistic Greek ends, Sarri had fun using this epithet); for etymologies “Byzantine Greek” might be seen as a cleanup category and for labels in “Ancient Greek” entries it would make sense if editors want to express that the terms are of that era from the 4th century to the 15th; they do anyway, depending on how exact they want to be, we won’t forbid people to use lect names whereby they do the job, T:defdate is too ideal to oust it in reality. Fay Freak (talk) 18:33, 18 March 2023 (UTC)

Medieval or Byzantine

I have no linguistic training; none. My notes here reflect what we read as wiktionary editors. I have no arguments or opinions. But still, I would like to present the case of Med.Greek, as a language lover. Allow me then, to rephrase point 1 from above. The reason we asked en.wikt to use 'Medieval' instead of 'Byzantine' is

  • a) to avoid using historical terms and events as boundaries for language change. Boundaries are always conventional. Lexicographers and linguists of the previous millennium tended to adopt as boundaries the historical turning points even if they did not have an impact on language change.
  • b) because of the Cambridge Grammar, p. xix: «The system of periodization that we have used is not based on external criteria, which might relate to historically significant dates, such as wars, conquest or independence. For this reason we do not employ the term “Byzantine Greek”: for almost the whole of the period that we are concerned with, a substantial part of the Greek-speaking world was not “Byzantine” in a political sense. Our criteria are instead internal ones, based on clusters of important linguistic changes that we see as occurring around 1100, 1500 and 1700»
  • c) because Kriaras, and Greek lexicography, prefer the term Medieval (supposing that the lexicographic tradition of each language is taken into account here, in en.wiktionary).

Thank you ‑‑Sarri.greek  I 19:37, 18 March 2023 (UTC)

Strange to refer to the seventeenth century as 'medieval'. Nicodene (talk) 18:52, 19 March 2023 (UTC)
Yes, @Nicodene, it might (note: some lexicographers extend it to 1800). Of course it is about language, not history. Also, the 'medievalism' both historically and linguistically had to do with the resurrection of the Greek state in 1821 after some 400 years of occupation. So, the term is not imprecise but describes a rare delay of renaissance. Thank you ‑‑Sarri.greek  I 19:03, 19 March 2023 (UTC)

Periodization Koine & terminus

Let me elaborate on point 3 from above for the ending of Koine and beginning of Medieval Greek.

  • Koine. The 330 terminus was chosen by lexicographers of the previous century to coincide with the historical beginning of Byzantium: the founding of Constantinople. But this had nothing to do with language. The official language of the empire was Latin, and people spoke Late Koine. The 6th century terminus makes sense because it is the first time we have official legal texts in Greek, so conventionally, we can make it a boundary for passing from Koine to Medieval. Note, Koine was used by authors for many centuries afterwards. E.g. Eustathios of the 12th century is quoted in the Koine section under Ancient Greek for his Scholia, not under Medieval. (Not to mention atticists)... So, we have
  • Koine1. 3rd, 2nd, 1st BCE centuries,
  • Koine2. 1st, 2nd, 3rd CE centuries,
  • Koine3 is Late Koine. 4th, 5th, 6th centuries (as included at {{R:DGE}}, TLG, and recent lexicography for Ancient Greek), and for some writers, up to ... 1970s.

Thank you ‑‑Sarri.greek  I 19:37, 18 March 2023 (UTC)

Periodization Medieval Greek & terminus

Let me explain point 3 from above for the ending of Medieval.

  • Med starts (conventionally) in the 6th century with the Iustinianus Novellae (Νεαραί). Greek is now the official language in the Byzantine Empire. So we have
  • Med1 7th-11th centuries. An extended Late Koine, because we have only learned texts surviving.
  • Med2 1100-1453 the main Medieval period, vulgar texts in abundance. Not only, but mainly, in the Byzantine empire ++Venetian and Frankish occupations. It is much more like Modern Greek than Koine. Language change was significant.
  • Med3 1453-1669 (or 1500-1700 if you like) is LateMedieval, interchangeably, EarlyModernGreek. Up to 1669, the year of the Fall of Crete coincides with the ending of Cretan literature, theatre and poetry, sung and popular even today. As Kriaras explains it in his 1100-1669 dictionary of «the last byzantine and first postbyzantine centuries» in his own words (cover)
    Quote, p. XI: Kretchmer preferred as boundaries the basic limit-chronologies of Byzantine history (324, 1453). We have placed 1669 […] because a substantial portion of Greek literary output echoes Byzantine tradition. […] Indeed, these Cretan texts, in spite of their individual characteristics, are placed in the linguistic atmosphere of the most-vulgar texts of the last Byzantine centuries. The Cambridge Grammar uses the term EarlyModernGreek. We could place this Med3 either under the Modern or the Medieval section. The reason why el.wiktionary places it under Medieval is a) because of the Kriaras Dictionary and his rationale and b), for a technical reason: because we study it in polytonic script.

Thank you all ‑‑Sarri.greek  I 19:37, 18 March 2023 (UTC)

Comments on Medieval Greek

I support all of the points immediately above (renaming "Byzantine Greek" to "Medieval Greek", standardising the periodisation of Koine as 3rd century BC to 6th century AD, and distinguishing the 600–1100, 1100–1500, and 1500–1700 periods). I also lean towards supporting splitting Medieval Greek as a distinct language: this is no different in principle from distinguishing e.g. Old French / Middle French / French as three distinct languages, or many other medieval European "Old" languages that we've added with no problems (see @Vininn126's work on Old Polish, which has no ISO code at all even in ISO 639-3). The one problem is that the grc code is currently formally defined as referring to Greek up to 1453, but as the ayin transliteration debate shows this is something we can choose whether to follow at our leisure. —Al-Muqanna المقنع (talk) 00:46, 19 March 2023 (UTC)Reply[reply]

I like the idea of renaming "Byzantine Greek" to "Medieval Greek" and taking it up to 1669 or 1700, as the Greek sources do. Now, when I see a μσν. = μεσαιωνική ελληνική term quoted in Kriaras or {{R:DSMG}}, I have to check if it is attested before the arbitrary (and sad) date of 1453, which is difficult. I am undecided on splitting Medieval Greek from grc. That will lead to a lot of duplication. This unsigned comment was added by Vahagn Petrosyan (talk · contribs) at 09:13, 19 March 2023 (UTC).

@Vahagn Petrosyan, {{R:Kriaras Medieval}} is for vulgar language. The date is stated, if known (e.g. λέξη του 11ου αιώνα ("word of 11th century")), or we understand it from the authors (here is the guide pdf). Note that some writers continue to write in Koine. Kriaras and the Cambridge Grammar do not deal with them at all. Their vocabulary, up to the 11th century, is covered at {{R:LBG}}, where just getting a password will show you that it is very similar to Koine, like we see it at LSJ.
No inflectional endings are given in any med.greek lemma of either LBG or Kriaras, because of the variety.
But, Vahagn, how many centuries and millennia would you need to make it an autonomous section? Etymological and other categories, the lemmata themselves are so weird under the title Ancient Greek. How can a word of the 11th century CE be under a title with the word 'Ancient'? Ancient means ancient. Thank you, and especially for all your great work for Greek dialects! ‑‑Sarri.greek  I 14:26, 19 March 2023 (UTC)
@Sarri.greek: so would a word first attested in a Koine text written after the 6th century be classified as gkm or grc? Vahag (talk) 15:10, 19 March 2023 (UTC)Reply[reply]
@Vahagn Petrosyan if it is a new word, a neologism of its era, but in Koine style, {{R:LBG}} has it. Also {{R:Dimitrakos 1964}} (a bit difficult to read this dictionary, he does each definition diachronically from grc to el). LBG has the authors (so, we get the dates of their lifetime). A 10th century new word found at LBG: if you cannot find it at LOGEION, you know, it is not Koine, but Med. It would be labelled either as Late Koine (Scholia to ancient texts) or as learned medieval (like religious texts, laws etc), just as we label learned for any such case.
Had it been vulgar, people's language, LBG would not have it, Kriaras would.
We do not ask what the dating of the text/word was, but who the author was. The high prestige of Ancient Greek resulted in a continuous diglossia. Take an extreme example: Anna Comnena, who had this dream of becoming the female Thucydides, wrote her Alexias in the Attic dialect. No medieval dictionary would deal with the words or inflectional forms of a revived Attic. Perhaps they would only include placenames, people's names, or vocabulary of things of her era.... ‑‑Sarri.greek  I 15:31, 19 March 2023 (UTC)
@Vahagn Petrosyan example wikt:el:βάμβαξ. But the ending, is like Koine and Ancient. el:ἀκουμπῶ / Cat. of Early Med words ‑‑Sarri.greek  I 15:43, 19 March 2023 (UTC)Reply[reply]
If I understood you correctly, a Koine-style neologism after the 6th century should be put under ==Medieval Greek==. What would the etymology section of such a word look like? For example, ὀφθαλμοπονία (ophthalmoponía, eye pain) in LBG. Would you say {{affix|gkm|ὀφθαλμός|πονέω}} and then create Medieval Greek entries for ὀφθαλμός (ophthalmós) and πονέω (ponéō)? Would the Medieval Greek entry for ὀφθαλμός (ophthalmós) contain the definition "eye" or just the two new senses that are attested in the Medieval period, namely "a kind of stone; water intake of a mill"? I am asking because I haven't figured out how to handle the medieval period of another language with diglossia — Armenian. I now regret having code axm for "Middle" Armenian. Vahag (talk) 16:41, 19 March 2023 (UTC)
@Vahagn Petrosyan I do not write blind etymologies. I would have to find a dictionary that states: {{l|gkm|ὀφθαλμοπόν(ος)}} (< Koine {{l|grc|ὀφθαλμοπόνος}}) + {{af|gkm|-ία}}. http://stephanus.tlg.uci.edu/lbg/#eid=51338
In general, I would treat the word, just as any other language. ‑‑Sarri.greek  I 16:50, 19 March 2023 (UTC)Reply[reply]
I may be showing my ignorance, but that reminds me a lot of Katharevousa- is the way we deal with that relevant here? Even with English we have Edmund Spenser, who wrote what sometimes looked like Middle English in what we consider to be the Early Modern English period. Sometimes you just have to come up with an arbitrary cutoff date and stick with it. Chuck Entz (talk) 00:24, 20 March 2023 (UTC)Reply[reply]
Yes, it is a common theme in Greek @Chuck Entz, the revival (an artificial one) of old styles. But my question here is: is Medieval Greek recognized at en.wiktionary as an existing and documented language period, similar to all other Middle and Medieval periods of other languages covered here? Do the diglossic literary styles overshadow its existence? Thank you. ‑‑Sarri.greek  I 00:43, 20 March 2023 (UTC)

Conclusion for Medieval Greek

[by Sarri.greek] I thank you all for contributing to this discussion, and helping to clarify.
During the last 15 days, I have contacted the three administrators for Ancient Greek at their talk pages and two of them responded: @JohnC5.talk and @Mahagaja.talk. I thank JohnC5 and @Mahagaja for their positive responses to the above changes. I think there is no objection, except one about Wikipedia's different periodization (w:en:Template:Greek language periods already has the correct Koine up to c.600). I could notify WP that a period 1453-1669, as either Late Medieval Greek or Early Modern Greek, is studied at en.wiktionary under Medieval Greek language (in coordination with dictionaries for Med.Greek; w:en:Template:Greek language has not been updated yet).
If administrators agree that there are no other objections, could the necessary changes start being implemented?

I thank you in advance, ‑‑Sarri.greek  I 11:05, 31 March 2023 (UTC)Reply[reply]

@Sarri.greek: Before we do anything, I must point out that the code for Medieval Greek (or Byzantine Greek) is gkm, not byz, which is the code for Banaro, a language of Papua New Guinea. —Mahāgaja · talk 11:16, 31 March 2023 (UTC)Reply[reply]
@Mahagaja:, yes of course. byz is used at IPA. Thank you. ‑‑Sarri.greek  I 11:20, 31 March 2023 (UTC)Reply[reply]
Actually, I just noticed that gkm is not actually an official ISO 639-3 code. It was requested back in 2006, but no decision has been made yet. That being the case, is it in keeping with Wiktionary policy to use gkm for Medieval Greek, or do we have to use an explicitly Wiktionary-only code like grk-gkm until such time as the code is made official? —Mahāgaja · talk 11:22, 31 March 2023 (UTC)
@Mahagaja -if the question is for me-, I don't know your policy. Or if you add at a list such language-codes as 'under trial use' or something like that. ‑‑Sarri.greek  I 11:30, 31 March 2023 (UTC)
The question isn't for you but for other admins. Reading Wiktionary:Languages § Language codes, I think we do have to use grk-gkm and list it at Module:languages/data/exceptional, not at Module:languages/data/3/g. —Mahāgaja · talk 11:36, 31 March 2023 (UTC)Reply[reply]
Yes, @Mahagaja. I have read the ISO.proposal2006 for gkm, which is very old. I hope a proposal would be renewed with updated sources besides 'Robert Browning', and perhaps from some official institution -I'll try to find out what is available-. By the way, at el.wikt we already use labels for dialect codes: gkm-cyp (Medieval Greek Cypriot) and gkm-crt (..Cretan). ‑‑Sarri.greek  I 11:59, 31 March 2023 (UTC)
Could @Benwing2 help or perhaps recommend an admin for {alert|languages} to assist with the code gkm or ...? Thank you. ‑‑Sarri.greek  I 18:47, 31 March 2023 (UTC)Reply[reply]
@Sarri.greek Apologies for not keeping up with this discussion, as it's long and technical and I don't know enough about Greek periodization. What is the request exactly? Is it to convert 'gkm' to a full language from an etymology-only language? Anything else? Also, User:Mahagaja you are suggesting a different code 'grk-gkm'? I don't think there's any rule here that says we can't use non-official ISO 639-3 codes for languages, although User:-sche can correct me if I'm wrong. Definitely it would be better to stick with 'gkm' if possible as it's four fewer characters to type. Benwing2 (talk) 18:57, 31 March 2023 (UTC)
BTW I don't think it will be difficult to make this conversion (going in the other direction, from full to etym-only language, is harder). When you create the new 'Medieval Greek' lemmas, you'll have to use {{head}} for the moment until we have new gkm-specific headword templates. User:Sarri.greek maybe you can specify how you want the templates to behave, and someone who knows Greek well (User:Erutuon, if you have time?) can help implement them? Benwing2 (talk) 19:01, 31 March 2023 (UTC)Reply[reply]
@Benwing2: Are there any examples of non-ISO codes in use at Wiktionary that have the form "xxx" rather than "xxx-xxx"? I thought we kept them carefully separate. —Mahāgaja · talk 19:08, 31 March 2023 (UTC)Reply[reply]
Yeah, if gkm has never been an official ISO code for Medieval/Byzantine Greek, then for internal clarity and to avoid problems if the ISO assigns gkm to another (newly-encoded) language (which they would be free to do!), it should be formatted as an exceptional code xxx-yyy where xxx is the nearest family code and yyy is some string, as described on WT:LANG. To my knowledge the only time we use ISO-like but non-ISO codes is when something used to be an official ISO code and the ISO retired it but we didn't, like sh and some minor languages with three-letter codes that they've split or merged but which we haven't (yet). Because we use that ISO-like three-letter code for it, I didn't realize that gkm wasn't an official ISO code! For a time, we used LinguistList's qot for Sahaptin (which was actually less of a problem since IIRC that's within the range the ISO allots for private use), but when that was noticed we re-coded it, too. Perhaps someone can prod the ISO to approve gkm, but until then, let's use grk-gkm. Probably it would be beneficial to check ISO's code list against ours and see which ISO codes are absent from our modules and which two- or three-letter codes we use are absent from the current ISO standard. I did this years ago and noticed quite a few discrepancies where we needed to either add a code or record on WT:LANGTREAT that we were intentionally excluding it. - -sche (discuss) 19:51, 31 March 2023 (UTC)Reply[reply]
1) Thank you very much @Benwing2. Template {{head|gkm|POS}} looks fine. Perhaps a label might be used (learned or formal). Bibliography templates already exist. No declensions are needed. Inflectional forms are discussed in dictionaries as attested.
It would be nice if gkm would anticipate a renewal of proposed ISO language (hopefully soon). ‑‑Sarri.greek  I
2) Thank you @-sche for your help and explanations. I hope someone will renew the gkm ISO proposal soon. At the moment, any code would be fine! Is grk-gkm = Hellenic language, Medieval OK? ‑‑Sarri.greek  I 20:33, 31 March 2023 (UTC)
@-sche, +parent = Ancient Greek grc. descendants = Modern Greek el, Pontic pnt, and Cappadocian cpg... ‑‑Sarri.greek  I 20:40, 31 March 2023 (UTC)Reply[reply]
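The check -sche proposes above (comparing ISO's code list against Wiktionary's, and telling ISO-like codes apart from exceptional xxx-yyy codes) could be sketched roughly as follows. This is only an illustration: the two sets are toy samples, not the real ISO 639 registry or the contents of Wiktionary's language data modules, and the xxx-yyy pattern is a simplification of the format described in the thread; the function names are hypothetical.

```python
import re

# Toy sample data (assumption): NOT the real ISO list or Wiktionary's modules.
iso_sample = {"grc", "el", "pnt", "cpg", "byz"}
wiktionary_sample = {"grc", "el", "pnt", "cpg", "gkm", "grk-gkm"}

# A bare two- or three-letter code looks like an ISO code.
ISO_LIKE = re.compile(r"^[a-z]{2,3}$")
# Simplified shape of an exceptional code as described in the thread: xxx-yyy.
EXCEPTIONAL = re.compile(r"^[a-z]{3}-[a-z]{3}$")


def is_exceptional(code):
    """True if the code has the xxx-yyy shape of a Wiktionary-only code."""
    return EXCEPTIONAL.match(code) is not None


def compare_code_lists(iso, ours):
    """Return (ISO codes absent from ours, ISO-like codes of ours absent from ISO)."""
    missing_from_ours = sorted(iso - ours)
    iso_like_but_not_iso = sorted(c for c in ours - iso if ISO_LIKE.match(c))
    return missing_from_ours, iso_like_but_not_iso


missing, suspicious = compare_code_lists(iso_sample, wiktionary_sample)
print("ISO codes we lack:", missing)
print("ISO-like codes not in ISO:", suspicious)
```

With real data, each code in the second list would either need recoding to an exceptional code or a note that the deviation is intentional, as described for WT:LANGTREAT above.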

@Sarri.greek, you did not answer how we should deal with Ancient Greek words developing new senses in the Medieval period. For example, ὀφθαλμός (ophthalmós) has the new meanings "a kind of stone; water intake of a mill" according to LBG. Should we create ==Medieval Greek== with just those two senses? --Vahag (talk) 20:01, 31 March 2023 (UTC)Reply[reply]

Yes, @Vahagn Petrosyan. We would treat the senses as we do for every lang. Also in Modern Greek, senses may be identical to ancient ones, or have additional new senses, or be different. It is no problem: we follow our dictionaries' definitions. ‑‑Sarri.greek  I 20:21, 31 March 2023 (UTC)
But we don't follow dictionaries. We include what is attested. Obviously, the bulk of Ancient Greek lexicon is attested also in the Medieval Period, in the same form and meaning. You are proposing to duplicate the whole of Ancient Greek lexicography under ==Medieval Greek==. Note that LBG and Kriaras are dictionaries of differences from Ancient Greek, namely of words, wordforms and meanings unattested in Ancient Greek. For example, they do not include κύων (kúōn, dog). They are supposed to complement Ancient Greek dictionaries, not duplicate them.
Another problem besides duplication is that if gkm is separate from grc we will have to determine if each borrowing in other languages (Old Armenian, Old Georgian, Coptic, Aramaic etc.) happened before or after 500 AD. That is difficult. Vahag (talk) 20:46, 31 March 2023 (UTC)
@Vahagn Petrosyan, no, the dictionaries of Medieval deal with all words attested in the particular periods without 'duplicating the whole of Ancient'. No, {{R:LBG}} and {{R:Kriaras Medieval}}, do not include only 'different' senses. They have all senses that are attested and found in medieval texts of their scope. κύων (kúōn) is an ancient word. Greeks of today may use it too if they wish. That does not make it a Modern Greek word. In Greek, you may use, reuse, quote, any word or inflectional form from any period. Dictionaries do not deal with such references, but with real usages.
For the borrowings: the Medieval Greek has been overlooked, but a borrowing might happen through ancient texts too. Not necessarily through contact with post 7th century speakers of Med.Greek. You do not have to change anything from your sources. Thank you, I hope this covers your question. ‑‑Sarri.greek  I 21:00, 31 March 2023 (UTC)Reply[reply]
I will give you an example. LBG has the word βρομερός meaning "stinky" attested in the phrase βρωμερός κύων "stinky dog" in a 6–7th century medieval text. LBG has no entry for κύων because there is nothing different about that word compared to Ancient Greek. You see that it does not include all words and senses attested in the texts of its scope (it's the same text). If you split code gkm, then according to Wiktionary's principles both βρωμερός and κύων will be eligible to be entered under ==Medieval Greek== because they are attested in the medieval period: see WT:CFI. Κύων is attested in modern Greek texts too so it is eligible to be entered also under ==Greek==.
Regarding the borrowings in other languages, if gkm is split from grc, then all oral borrowings happening after the cut-off date are by definition not from grc. I can't rely on my sources anymore. I will have to figure out if the borrowing of ղենջակ (łenǰak) is pre- or post- the cut-off date. That may be impossible for non-literary languages like Laz. Vahag (talk) 21:36, 31 March 2023 (UTC)Reply[reply]
@Vahagn Petrosyan, it is normal to use an ancient word. Also, Med.Greek has always been present at English.wikt, either as a Category under grc or as grk-gkm. It did not alter your etymologies. ‑‑Sarri.greek  I 21:42, 31 March 2023 (UTC)
As Vahag says, entries will have to be based on cites, not copying other dictionaries (although presumably dictionaries will be of help in figuring out what stage a term is attested in). That said, and while I can't contribute much to the question of whether to split Medieval Greek off as a separate language, it's probably best for that decision to be based on 'Greek-internal' factors (whether the stages are as different as other stages we would typically consider different lects, etc) and not on what's easiest for borrower-languages' etymology sections [regarding the point about Laz, above], because even for languages as well-attested and well-written-about in reference works as English and French, it can require more resources than I am able to track down, to figure out whether a particular word for e.g. armor or heraldry was borrowed from Modern or Middle (or possibly even Old) French or even Anglo-Norman, but it does not follow that having these as separate languages is bad; in such cases all I can do is spell out the uncertainty like "from modern or Middle French foo". In other situations it can be hard to figure out which of several closely (or even not closely!) related languages a term was borrowed from, e.g. bensin. - -sche (discuss) 22:28, 31 March 2023 (UTC)
I completely agree with User:-sche here. It sounds like there is still some discussion to be had; I'll wait to make any changes until the issues are resolved. Benwing2 (talk) 22:34, 31 March 2023 (UTC)Reply[reply]
@Benwing2, issues? I agree too with -sche. That it is based on Greek internal factors and that having these as separate languages is not bad. ‑‑Sarri.greek  I 22:41, 31 March 2023 (UTC)Reply[reply]
@Sarri.greek User:Vahagn Petrosyan seems not to support this, and User:-sche just said that the decision should not be made based on how easy it is for etymologies to be created (which I agree with), but didn't explicitly support the change. I think we should try to resolve Vahag's concerns. Benwing2 (talk) 22:59, 31 March 2023 (UTC)Reply[reply]
@Benwing2, Vahagn Petrosyan. I hope so too. But I have not studied other languages except Greek, and I cannot comment on their etymologies. Our concern here is to correct the title 'Ancient' over medieval words, and keep en.wiktionary updated with reference and bibliography published in the last decade, thus correcting the empty gap of the 7th century to modern times, covered by 'Ancient'. Thank you. ‑‑Sarri.greek  I 23:07, 31 March 2023 (UTC)
I don't understand what the Greek internal factor is that makes ὀφθαλμός (ophthalmós, a kind of stone; water inlet) a word in a different language stage than ὀφθαλμός (ophthalmós, eye). Did it undergo a sound change? No. It just developed yet another figurative meaning. Before we move ahead with this momentous change, @Sarri.greek can you please create a sample ==Medieval Greek== page in a sandbox in your userspace for ὀφθαλμός (ophthalmós) and another one for ὀφθαλμόπονος (ophthalmóponos) to see what they will look like? Vahag (talk) 23:07, 31 March 2023 (UTC)
Normally I need more time to study words and date all authors where they are attested, but here User:Sarri.greek/gkm-test is a sample page of your requested terms, Vahagn Petrosyan. ‑‑Sarri.greek  I 00:00, 1 April 2023 (UTC)
@Sarri.greek: thank you for creating the samples. These look like normal Ancient Greek words that can be presented under ==Ancient Greek== with a label {{lb|grc|Medieval Greek}}, using the vast infrastructure that has already been developed for Ancient Greek.
Note that there are no gaps. All Greek words until 1453 can currently be entered under code grc (we can raise the cut-off date to 1669). All Greek words after that can currently be entered under code el, with a label {{lb|el|obsolete}} if not used anymore. I understand that Medieval Greek under Ancient Greek is somewhat oxymoronic, but language names are conventional, so we don't need to pay attention to that formal contradiction.
If we are to carve gkm out of grc and el, then there should be practical benefits in terms of organizing and presenting information. One benefit I can think of is allowing the normalizing of polytonic spellings to monotonic for gkm, like Kriaras does (that is not acceptable for grc). That way we can freely use the monotonic headwords and quotations found in Kriaras without having to find and restore the polytonic spelling. Another benefit is presenting the vulgar alternative spellings found in medieval texts in the ===Alternative forms=== section of ==Medieval Greek==, like αφθαλμός, εφταλμός, ουφθαλμός for ὀφθαλμός. I assume most of our users would not appreciate seeing those barbarities in the Ancient Greek section. Vahag (talk) 14:42, 1 April 2023 (UTC)Reply[reply]
Yeah, it's not clear to me what the reason they would have to be split is. Sarri, you said in your initial post that if we "make Medieval Greek an autonomus language section[,] not many lemmata would be added, but I feel that a gap of 1,000 years (6th to 17th century) is a somehow serious omission", but that seems like a misunderstanding: there shouldn't be any omissions at present, since everything is entered as either Ancient or modern Greek depending on when it's attested relative to the cutoff between the two. And it seems from Vahag's comments like many duplicate entries would be added if we split, since all the words that just continued to be used from ancient through medieval times would be duplicated (since, as Vahag said, we add what's attested, not only what other dictionaries have, and hence not only words first attested in the medieval period). It is unfortunate we don't have many active editors familiar with Ancient Greek who could weigh in; most of the discussion seems to be about what to call the lect (even if it's just an etymology-only language), but other than the proposer and Al-Muqanna is anyone else supporting a split as opposed to just commenting, like me and Mahagaja, that they aren't in the best position to judge whether it's necessary or not? And if so, what's the rationale for the split? (My comment above was just to say that I don't think "it's hard to tell when another language borrowed a term" is an impediment to splitting the lects, but I'm not seeing what internal changes would necessitate a split.) From Vahag's comments it seems like a lot of the vocabulary is unchanged, just in some cases with semantic changes which could be handled via labels like they should already be. - -sche (discuss) 02:01, 1 April 2023 (UTC)

@-sche. I am not a linguist. If the dictionaries {{R:Kriaras Medieval}} for 1100-1669 and {{R:LBG}} for the early period, 9th to 11th century, and the 2019 Cambridge Grammar [7] do not suffice as justification for a separate language section, what could I, a little editor, answer to you? Or what could I say to all the administrators of en.wikt, which prides itself on its accuracy and its plurality, covering thousands of languages? ‑‑Sarri.greek  I 02:10, 1 April 2023 (UTC)

To the future

I am certain that gkm will inevitably become an autonomous language one day, which is the correct thing. A hellenist interested in it may arrive here in some years, perhaps in some decades. α! φίλε μου, I apologise for being so inadequate! ‑‑Sarri.greek  I 14:49, 1 April 2023 (UTC)

Change Proto-Mon-Khmer to Proto-Austroasiatic

Proto-Mon-Khmer is deprecated. The name of Category:Proto-Mon-Khmer language needs to be changed to Category:Proto-Austroasiatic language, just like how we have Category:Proto-Sino-Tibetan language rather than Category:Proto-Tibeto-Burman language. See the Wikipedia article on Austroasiatic languages to get an idea of why Mon-Khmer is no longer valid, because Munda and Nicobarese are simply regular branches that are sisters of the other so-called Mon-Khmer languages. So how can this name change be done? Ngôn Ngữ Học (talk) 21:51, 18 March 2023 (UTC)Reply[reply]

@Ngôn Ngữ Học Normally this would be handled at Wiktionary:Requests for moves, mergers and splits.
Thanks, I have just moved this discussion to Wiktionary:Requests for moves, mergers and splits. Ngôn Ngữ Học (talk) 22:19, 18 March 2023 (UTC)Reply[reply]
@-sche, PhanAnh123, Patnugot123 This suggests we need to rename the Proto-Mon-Khmer lemmas to Proto-Austroasiatic. Can they simply be renamed or do they need updates to the reconstructed forms? I don't know a damn thing about these languages but I can help with bot stuff. Benwing2 (talk) 22:07, 18 March 2023 (UTC)Reply[reply]
@Benwing2 They can simply be renamed. Category:Proto-Sino-Tibetan language is a perfect example of this. The Proto-Sino-Tibetan lemmas are actually all Proto-Tibeto-Burman reconstructed forms by James A. Matisoff, who considers Tibeto-Burman to be a branch of Sino-Tibetan. Now, more scholars are thinking that Chinese is simply another regular sister branch of the various Sino-Tibetan languages out there, rather than its own special branch. Same goes for Mon-Khmer. Ngôn Ngữ Học (talk) 22:18, 18 March 2023 (UTC)

Remove language name from reconstructed entries' title

I propose we move our reconstructed entries from Reconstruction:Langname/entry to simply Reconstruction:entry (and merge consequential homographs). Note how we currently have an L2 header repeating what the title already says. Nowhere else in the project (mainspace, citations, thesaurus, ...) are language names in the title of an entry. This would make it easier to look up the terms in the search bar, and setting R: as shorthand for Reconstruction: would help even more. The current situation feels like a remnant of when these entries used to be in the Appendix namespace. Catonif (talk) 09:25, 19 March 2023 (UTC)Reply[reply]

  • Support, though I'll defer to editors who regularly edit within the namespace. — excarnateSojourner (talk · contrib) 04:25, 12 April 2023 (UTC)Reply[reply]
  • Support. Not only is it inconsistent, but it means special logic has to be used in (e.g.) Module:links, and - much more of a headache - in the lite templates. We’re fast approaching some of the non-Lua page limits on certain pages with a lot of lite templates, and one reason for that is the process of adding the language name for every reconstruction link. Removing that would reduce load time and add a buffer against the limits on those pages. Theknightwho (talk) 16:34, 12 April 2023 (UTC)
Support if no issues result from this. Nicodene (talk) 16:43, 12 April 2023 (UTC)Reply[reply]
Oppose, reconstruction namespaces are generally very large already and a reconstruction like *a would have to host so many entries it will quickly follow the fate of its attested counterpart. If anything, we should think about splitting the mainspace. Thadh (talk) 17:01, 12 April 2023 (UTC)Reply[reply]
Are they, compared to pages like para? Theknightwho (talk) 17:20, 12 April 2023 (UTC)Reply[reply]
Proto-Sino-Tibetan *ŋa, Proto-Indo-European *bʰer-, Proto-Athabaskan *tuˑ; Yes. Thadh (talk) 17:51, 12 April 2023 (UTC)Reply[reply]
@Thadh None of those even come close. Theknightwho (talk) 11:11, 13 April 2023 (UTC)Reply[reply]
To one section? Thadh (talk) 11:36, 13 April 2023 (UTC)Reply[reply]
They are just one section, but I’m sceptical that any combined reconstruction pages would have the 1,000+ template calls that we see on the mainspace pages that are causing us headaches. Theknightwho (talk) 11:43, 13 April 2023 (UTC)Reply[reply]
Oppose, per Thadh. I don't see why we'd merge them. — Fenakhay (حيطي · مساهماتي) 17:10, 12 April 2023 (UTC)Reply[reply]
Oppose. Vininn126 (talk) 17:16, 12 April 2023 (UTC)Reply[reply]
Abstain. I kind of like being able to search for talk pages of only specific reconstructed languages (e.g. all Reconstruction talk:Proto-Algonquian/), but that's a minor issue. If I recall correctly, this was proposed years ago and one of the objections raised then was that reconstruction entries' orthographies are somewhat arbitrary so we'd be putting e.g. Proto-Algonquian terms with θ or x on the same pages as Proto-Germanic (etc.) terms with those letters as if those two languages both had a term spelled or pronounced that way, when in fact neither term was written that way historically and they represent completely different sounds. I don't consider that overly persuasive, since with e.g. cag, g likewise represents something completely different in English vs Hmong (where it's not a sound at all but just indicating the tone of the preceding letters), but the argument was that at least there all the listed languages really do spell their terms that way. - -sche (discuss) 17:25, 12 April 2023 (UTC)Reply[reply]
Oppose. There would be more language ambiguity if we remove the language name and more confusion than having a single language per reconstruction entry. Kwékwlos (talk) 18:04, 12 April 2023 (UTC)Reply[reply]
But we do this in mainspace already... It creates more confusion to have two different ways of doing it. Theknightwho (talk) 18:21, 12 April 2023 (UTC)Reply[reply]
This is likely something that'd need to go to a full vote. I'm abstaining for now, but I do think that the Lua issues should be taken into account, along with readability issues of entries like a. AG202 (talk) 18:14, 12 April 2023 (UTC)Reply[reply]
Abstain for now. I would like to see some data on how many reconstruction entries would be merged due to this (and hence may lead to readability/Lua memory issues). Wpi31 (talk) 18:33, 12 April 2023 (UTC)Reply[reply]
Oppose Would mess with my reconstruction page tracking and scraping, and I think it would hurt SEO results, for whatever that's worth. Also, don't want to give people any Proto-World ideas, LMFAO, and what Thadh said. -- Sokkjō 19:27, 12 April 2023 (UTC)Reply[reply]
Oppose. The technical argument is not convincing - the real way to get rid of this 'special case' would be to retire the Reconstruction namespace entirely and have e.g. Reconstruction:Langname/entry directly under *entry, but I cannot support that either. — SURJECTION / T / C / L / 20:01, 16 April 2023 (UTC)Reply[reply]
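The data Wpi31 asks for above (how many reconstruction entries would end up merged) could be estimated with a sketch along these lines. It assumes a plain list of page titles of the form Reconstruction:Langname/entry is available from some dump; the first three titles are taken from examples in this thread, while "Proto-A" and "Proto-B" are hypothetical placeholders, and the function names are made up for illustration.

```python
from collections import defaultdict

PREFIX = "Reconstruction:"


def split_title(title):
    """Split 'Reconstruction:Langname/entry' into (langname, entry)."""
    if not title.startswith(PREFIX):
        raise ValueError(f"not a reconstruction title: {title!r}")
    # Split on the first slash only, in case the entry itself contains one.
    langname, sep, entry = title[len(PREFIX):].partition("/")
    if not sep or not entry:
        raise ValueError(f"no entry part in title: {title!r}")
    return langname, entry


def find_collisions(titles):
    """Map each proposed merged title to its languages, keeping only real merges."""
    groups = defaultdict(list)
    for title in titles:
        langname, entry = split_title(title)
        groups[PREFIX + entry].append(langname)
    return {new: sorted(langs) for new, langs in groups.items() if len(langs) > 1}


sample_titles = [
    "Reconstruction:Proto-Sino-Tibetan/ŋa",
    "Reconstruction:Proto-Indo-European/bʰer-",
    "Reconstruction:Proto-Athabaskan/tuˑ",
    "Reconstruction:Proto-A/a",  # hypothetical placeholder language
    "Reconstruction:Proto-B/a",  # hypothetical placeholder language
]

for new_title, langs in find_collisions(sample_titles).items():
    print(new_title, "would host", len(langs), "language sections:", langs)
```

Counting language sections per merged page this way says nothing about template calls or Lua memory, which would have to be measured separately.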

To hyphen or not to hyphen

In Bantu languages, there are many morphemes that are technically prefixes (they go before the content-word stem), but cannot be at the beginning of a word. All sources that I know of write these with hyphens both before and after, as -me-. It seems, however, that the editors of the Nguni languages have lemmatised many such morphemes with only a hyphen after, such as sa- (etymology 2). What is the “canonical” way of doing this here at Wiktionary? The entry layout page is vague (“where it links with other words”; these morphemes link with other affixes, though, not always with actual words). I’d prefer two hyphens, as these morphemes must attach to something both in front and after, and that’s what many (all?) sources do. But there’s a lot of precedent against it. MuDavid 栘𩿠 (talk) 07:18, 21 March 2023 (UTC)

@MuDavid If all sources use hyphens on both sides, we should do the same (and these should potentially be reclassified as interfixes). Benwing2 (talk) 22:28, 23 March 2023 (UTC)
I agree, if they can't go at the start of the word then it makes sense for there to be hyphens before and after. —Al-Muqanna المقنع (talk) 23:48, 23 March 2023 (UTC)Reply[reply]
Thank you for the answers!
@Benwing2 According to both Wiktionary and Wikipedia (the sources of all knowledge ☺) an interfix is a meaningless morpheme, which is not the case here. I’m still looking for a better term; I’m tempted to make -me- a particle like Vietnamese đã, given that they function exactly the same except for the presence of whitespace. MuDavid 栘𩿠 (talk) 01:57, 24 March 2023 (UTC)Reply[reply]
The term in this case is infix, which often gets confused with interfix for understandable reasons but is also an admissible part of speech (see WT:POS, Category:English infixes etc.). —Al-Muqanna المقنع (talk) 02:05, 24 March 2023 (UTC)Reply[reply]
The word infix normally means a morpheme that goes inside another morpheme, which is not the case here either: -me- goes between different morphemes. Many sources call it an infix and we also used to, but I’m not sure we should. MuDavid 栘𩿠 (talk) 02:34, 24 March 2023 (UTC)Reply[reply]
There is prescriptive disapproval of this application, for sure, but ultimately Wiktionary decides for itself how to use particular terms when other sources differ in their usage, and our own glossary definition atm is that an infix is a morpheme inserted inside a word, not specifically in another morpheme. In that case the IP's justification for changing the POS header there ("infixes go inside stems/root not simply inside words") was simply ignoring how the term's been defined for in-house use. I can also think of counterexamples I've come across in reference grammars for other languages myself, e.g. Chinese 得 being defined as an infix in constructions like 聽得懂听得懂 (tīngdedǒng). "Prefix" or "Infix" are both comprehensible in this case IMO (and I don't think "Prefix" has to mean no hyphen at the front). My only suggestion would be to avoid a bespoke option not used in the sources like the "Particle" idea, which is likely to be opaque to readers. —Al-Muqanna المقنع (talk) 03:07, 24 March 2023 (UTC)Reply[reply]
I agree with Al-Muqanna regarding the scope of infix: perhaps some works prescribe a narrow definition, but in practice (things described as) infixes are clearly not limited to being inside a morpheme, even in works about grammar. (All of Wikipedia's examples of English infixes are not limited in that way, e.g. hiphop hizouse or Homeric edumacation.) To the immediate question about Bantu, though... how do works about Bantu describe these? As prefixes? It's apparently possible for something to be a prefix even if it can't appear at the beginning of a word: compare Wiktionary:Tea room/2023/March#-nil-, and various Category:Navajo prefixes like -ł-, -∅-, and -ba. (OTOH it is also conceivable that some of the Navajo affixes are mislabeled.) - -sche (discuss) 22:02, 24 March 2023 (UTC)Reply[reply]
Trask defines infix as “An affix which occupies a position in which it interrupts another single morpheme”. If we use a different meaning, we should definitely say so more clearly in our glossary.
I’ve searched some works on Bantu grammars, and it seems some conspicuously avoid calling those anything but “markers”, but the term infix is also common, see for example here. MuDavid 栘𩿠 (talk) 02:37, 27 March 2023 (UTC)Reply[reply]
"Infix" is common in Bantu linguistics, or at least it used to be, but that usage conflicts with normal usage of the term. AFAIR, and pace Trask, in general an infix does not necessarily appear inside a morpheme, but must appear within a stem. That is, if you derive one word by adding an affix between morphemes of an existing word. (Generally this would also allow appearance within a morpheme, but there may be exceptions.) But a sequence of affixes does not turn the internal affixes into infixes. I'd oppose labeling these "infixes" because (1) it is technically incorrect, and (2) it may be passing out of favor even within in-house Bantuist convention.
I agree however that we should write two hyphens. That does not imply that the element is an infix, only that it is affixed to something on both sides. kwami (talk) 08:41, 30 March 2023 (UTC)
If “infix” is passing out of favor among Bantuists, what do they call it instead then? For example, what do they call it when a morpheme is inserted in between two different stems? (It happens in Swahili, where the -o of reference is suffixed to auxiliary verbs, and some of these grammaticalized and ended up attached in front of their main verb.) MuDavid 栘𩿠 (talk) 09:39, 30 March 2023 (UTC)Reply[reply]
As for passing out of favor, you'll have to ask the people above who said that. The older lit I'm familiar with usually does call them infixes, but this has long been criticized by linguists who work on languages that actually have infixes. IMO Wikipedia should not intentionally misuse technical terminology. We could call them "Bantuist infixes", I suppose, so as to not mislead the reader into thinking they're actually infixes.
Yes, the suffix to the compounded auxiliary is an interesting complication. I don't know that there is a term for it, but it's certainly not an infix, as that would mean the auxiliary had been compounded to the main verb first, to form a functional word of its own, and then that that verb was further derived by inserting the relative suffix. kwami (talk) 23:26, 30 March 2023 (UTC)Reply[reply]
Nobody “above” said that infix is passing out of favor. I found some sources that call them markers and concords, which is what they are; that does not mean they are called prefixes now.
Words can have different meanings in different fields of study. If you cannot accept that, stay out of language, of all things. “Infix” may be limited to what you say in non-Bantu linguistics, that does not mean it cannot have other meanings in other fields.
If you’re unable to come up with a solution for infixed suffixes, I’ll stick with the solution that is generally accepted and continue calling them infixes. And to “not mislead the reader into thinking they're [what pedants call] infixes”, that’s only a question of expanding our glossary definition. MuDavid 栘𩿠 (talk) 03:44, 4 April 2023 (UTC)Reply[reply]
Would anyone else like to weigh in here? (The only other user who comes to mind as knowing about Bantu is @Metaknowledge, who has sadly been inactive of late.) Benwing is OK with 'infix' (User_talk:MuDavid#Swahili_infixes), MuDavid is going with 'infix', and although I have no strong feelings, I'd go with whichever of our POS categories Bantu literature treats these as, which is apparently 'infix'. Kwami unilaterally changed the entries and categories to 'prefix', claiming there is a lack of consensus to continue using 'infix', and ignoring the rather more obvious lack of consensus for his own abrupt change away from the Bantuist-literature-standard term we've been using for years to the one he (alone?) prefers. Unless anyone else wants to weigh in in favour of 'prefix', I intend to undo that change on principle. (This is not the first time Kwami has tried to unilaterally implement his own preferences, claiming there was a lack of consensus to stop him, even if there was.) - -sche (discuss) 21:56, 4 April 2023 (UTC)Reply[reply]
The infix entries were just made while this discussion was going on.
Anyway, if you have evidence that a Bantu language actually has infixes, then sure. Hadza is in the area, and has both infixes and suffixes. But if you're calling suffixes "infixes" because that's the local convention, we have a problem: Wiktionary is not about local conventions. So we will have a conflict between RS's that they're both infixes and affixes, and the consequent possibility of duplicate entries and requests for merging, when we could simply limit the label "infix" to infixes in the first place. I mean, if we had a RS that called verbs "action words" and nouns "thing words", we wouldn't add those as POS labels and have the same word twice, once as a "verb" and once as an "action word". kwami (talk) 22:03, 4 April 2023 (UTC)
David Crystal (2008) A dictionary of linguistics and phonetics, Wiley-Blackwell pub., defines an "infix" as:
A term used in morphology referring to an affix which is added within a root or stem. The process of infixation (or infixing) is not encountered in European languages, but it is commonly found in Asian, American Indian and African languages (e.g. Arabic).
Hadumod Bussmann (1999) Routledge Dictionary of Language and Linguistics defines an "infix" as:
Word formation morpheme that is inserted into the stem, e.g. -n- in Lat. iungere ‘to tie’ vs iugum (‘yoke’) or the -t- in the reflexive function between the first and second consonants of the root in the eighth binyan of classical Arabic, cf. ftarag ‘to separate,’ ʕtarad ‘to place before oneself.’ Ablaut and umlaut are often considered infixes.
and under "affix":
infixes are inserted into the stem (e.g. -m- in Lat. rumpo ‘I break’ vs ruptum ‘broken’).
These are standard linguistic definitions, and Bantu agreement prefixes do not match. Indeed, Bussmann in summarizing Bantu languages says,
Complex verb morphology (agreement prefixes, tense/mood/polarity prefixes, voice-marking suffixes)
that is, labeling the morphemes in question as "prefixes".
I could easily find more, including professional descriptions of individual Bantu languages. If we insist on labeling the Swahili perfect morpheme -me- an "infix", then following RS's we would need a second perfect morpheme -me- labeled a "prefix". kwami (talk) 22:22, 4 April 2023 (UTC)Reply[reply]
You still have not provided any reference work (let alone sufficiently many to establish consensus) describing the -o of reference as a “prefix” when this is suffixed to an auxiliary and followed by some other verb complex.
And don’t lie, Kwami. The “infix entries [] just made while this discussion was going on” were made after the discussion died down and before you revived it, the Category:Lingala words infixed with -el- was created in 2018, and @Metaknowledge created -me- with the “infix” header in 2016. MuDavid 栘𩿠 (talk) 01:12, 5 April 2023 (UTC)Reply[reply]
Were made without any consensus or much discussion at all, in addition to being demonstrably wrong.
Finding a morphological classification for the -o- won't be easy. It's obviously not an infix; I suspect it will just be a suffix to a compounded aux (that is, AUX-o+VERB, not -o-), but it will probably take some time to find something. kwami (talk) 02:18, 5 April 2023 (UTC)Reply[reply]
@MuDavid, can you give me specific examples of the AUX-o-VERB construction in question, or the equivalent in other Bantu languages? It's hard to address this without something concrete to base it on. kwami (talk) 03:37, 5 April 2023 (UTC)
Wait wait wait, you mean you made those edits to -ye- and its ilk without having the faintest idea of what it is or how it works? You had all this discussion without the slightest idea of what you’re talking about? That’s, erm, audacious is the only polite word I can think of. Do you even know anything of Bantu languages at all besides terminology nitpicking?
Well, before my patience runs out:
  • watakao kula 'they who want to eat'
  • watakaokula 'they who will eat'
How is -o- prefixed to -la here? Or:
  • watakacho kukisoma 'they who want to read it'
  • watakachokisoma 'they who will read it'
How is -cho- prefixed to -soma? MuDavid 栘𩿠 (talk) 01:50, 6 April 2023 (UTC)
I was wondering if maybe you meant -taka-, but that is not an auxiliary. It was one historically, but now it is simply a TAM prefix. You said "sometimes". "Sometimes" doesn't mean "all the time", which is what we have here: -taka- is always used for the future tense: it forms a paradigm with wanaokula and waliokula. So, yes, historically this derives from -o as a suffix to an auxiliary verb, watakao kula. But that's -o, not -o-. -o- is the same morpheme in wa-taka-o-kula as it is in wa-li-o-kula. The fact that the historical origin of -taka- from "want" is more transparent than that of -na- or -li- doesn't change the analysis of the -o-.
BTW, -li- and -na- have the same stress assignment as -taka-, which is why you will frequently see these written as wanao kula and walio kula, both in Latin and in Arabic script.
So, if you wish to analyze wanao kula as two words, AUX and lexical verb, then -o is a suffix to the AUX. If you wish to analyze it as a single word, then -o- is one of a string of prefixes to the root. -taka- is more complicated semantically, because if you analyze the above as two words, wanao kula, then you have watakao kula as two lexical verbs (those who want to eat) vs watakao kula as AUX + lexical verb (those who will eat). But that's no different in principle from English "will" as AUX vs lexical verb, or "going to" as verb of motion vs lexicalized as intention or a prospective aspectual marker, and doesn't affect the status of -o.
Anyway, -o- cannot be an infix in watakaokula, because there is no *watakakula for it to be infixed into. That's assuming that, like me, you accept the more lenient definition that an infix can appear between morphemes in a stem; the standard definition requires an infix to be placed inside a root. There are, BTW, true infixes in Bantu languages -- you can find examples in Nurse & Philippson. For example, David Odden says of Rufiji-Ruvuma,
Polysyllabic roots infix the vowel i, as in tukuta → tukwiite 'run'.
That's an infix. kwami (talk) 02:07, 6 April 2023 (UTC)
Did you even read what I wrote above? I’m perfectly aware that -taka grammaticalised and became a TAM marker (to use better terminology). The details of the grammaticalisation depend on the stress: without a suffix it becomes -ta-, with a suffix it remains -taka-. (And the etymology of -na- and -li- is not less transparent if you know basic Swahili. That’s just the preposition na and the stem -li of -wa.)
Anyway, you’re saying that the “analysis” depends on white space. More than one linguist I have interacted with in the past said that speech is primary, which means white space may not change your “analysis”. Swahili doesn’t put white space everywhere, Tsonga does. So what?
You still haven’t produced even a single reference work that explicitly “analyses” -ye- and its ilk as prefixes, while you did admit several times Bantu works in general call them infixes. MuDavid 栘𩿠 (talk) 02:36, 6 April 2023 (UTC)Reply[reply]
Of course speech is primary. The white space reflects stress assignment, which is speech. Or at least that's Schadeberg's take on it. I was just pointing out that the way speakers write these constructions shows that they see them as being parallel, and that -taka- isn't an auxiliary that just "sometimes" happens to compound with the following verb.
Yes, some Bantuist works call them "infixes". But that's not specific to -ye-/-o-: all non-initial prefixes are called "infixes". -taka- is also an "infix". Watakaokula is a prefix wa- followed by two infixes, -taka- and -o-, and a root kula (unless you want to count the -ku- as a third infix). In Nurse and Philippson, there are places where they call something an "infix (prefix)", presumably because the material they're using calls the prefixes "infixes". There is plenty of discussion in the lit that, for cross-linguistic usage, calling non-initial prefixes "infixes" is not useful, because now you need a new word for infix. That terminology may function acceptably if your language has no infixes, but here on Wikt we have words from languages which do have infixes, and if we call prefixes and suffixes "infixes", what do we call the infixes? This is a terminological distinction, not a difference in analysis. kwami (talk) 02:52, 6 April 2023 (UTC)Reply[reply]
Robert Botne ("Lega" in N&P) speaks of "relative prefixes". E.g., "In object relative clauses, the relative prefix occurs in the SUB slot. In subject relative clauses, the prefix occurs in the SP slot", and "Object relative constructions require two agreement prefixes, a relative prefix determined by the object noun, followed by a subject prefix determined by the agentive subject."
Here it's the relative that occurs initially and the subject that follows, so it's the subject prefix that would be the "infix" in the tradition you're referring to. The phrase "relative infix" does not occur once in the entire volume.
You keep speaking of the relative prefixes, as if they were morphologically distinct from the object and TAM prefixes that are also often called "infixes" in the Bantu tradition. Can you cite anyone that distinguishes them, object "prefix" -o- vs relative "infix" -o-? If not, then you have no RS for the distinction you draw, and we're reduced to the oft-noted fact that non-initial prefixes have often been called "infixes" in Bantu linguistics.
BTW, that used to be the case more generally in linguistics, but once true infixes started to be found, people left off calling prefixes and suffixes "infixes". That development came late to Bantu linguistics, presumably because infixes are marginal among Bantu languages (and don't occur at all in most). kwami (talk) 03:12, 6 April 2023 (UTC)Reply[reply]
Sigh. Could you pleeease read the context there? The “SUB” slot is the very first one. The “SP” slot is the first one that’s required. So Lega actually has relative prefixes. Sweet, but has no bearing on Swahili. You’re just wasting my time. MuDavid 栘𩿠 (talk) 03:28, 6 April 2023 (UTC)Reply[reply]
Could you please read what I wrote about the context? Yes, we have relative prefixes here. But then we have subject prefixes, which by your definition are infixes. To repeat myself, do you have a RS that Swahili relative infixes are more infixy than Swahili object and TAM infixes? That somehow they're the real infixes, so even if the object and TAM infixes are actually prefixes, the relative infixes remain infixes? That's the gap in your argument I should have started with. kwami (talk) 03:32, 6 April 2023 (UTC)Reply[reply]
I just came across this in an older source:
[Swahili] reciprocal verbs are usually followed by "na" (with) reminding us of the frequent English prefix (or infix) "con".
with reconciled as an example of -con- as an infix in English. This is the same tradition as that of the Bantu 'infix'. Shall we create an entry for the English infix -con-? These are not infixes by modern linguistic definition, and do not fit the definition of 'infix' either here on wikt or on WP, or in modern treatments of morphology or linguistic dictionaries. kwami (talk) 03:38, 6 April 2023 (UTC)Reply[reply]
Okay, a couple things I've found so far, specifically for Swahili and specifically for the relative prefixes (though AFAICT there is no relative–non-relative distinction in the use of 'infix'):
Joan Maw (1999) Swahili for Starters
"Notice that any object prefix comes after the relative prefix."
"In the case of a compound verb, the object prefix occurs only in the main verb, in contrast to the relative prefix, which occurs only in the auxiliary."
Example: kitabu ninachokisoma 'the book which I am reading' vs kitabu nilichokuwa ninakisoma 'the book which I was reading'
Ultimate Swahili Notebook (2020)
"Verb structure in Swahili: Subject prefix - TAM prefix - Relative prefix - Object prefix - Verb stem - Extension(s)"
"Relatives are verbs used as adjectives by being relativised using a relative prefix (or suffix) which agrees with the noun's class."
Example: ndege aliyekufa.
Edward Steere & Augustine Hellier (1934) Swahili Exercises
"The use of the relative prefix in the verb will no doubt be found difficult."
Mühlhäusler, Ludwig & Pagel (2019) Linguistic Ecology and Language Contact. CUP.
"Sheng appears to be losing or to have already lost the subsystems of the object prefix and the nominal relative prefix."
[U.S.] Foreign Service Institute, Earl W. Stevick (1966) Swahili: An Active Introduction
"This form is characterized by a 'relative prefix,' which stands between the tense prefix and the object prefix (if any). The relative prefixes all contain /-o-/, except for the third person singular personal relative prefix, /-ye-/."
Examples: wanaokaa 'those who live', anayekaa 'he/she who lives'
"The relative of the /ta/ tense has /taka/ plus the relative prefix."
"The relative prefix /po/ (Class 16) is often used without any special Class 16 word before it."
Example: Utakapofika penye mto ... 'When you arrive at a stream ...'
So there you have a relative 'prefix' even after the TAM prefix -taka-. And neatly using the double hyphen that was the original point of this thread. kwami (talk) 04:08, 6 April 2023 (UTC)Reply[reply]

@-sche, I’m tired of this pointless discussion. Care to judge? Or how do you suggest we move on from here? MuDavid 栘𩿠 (talk) 02:35, 11 April 2023 (UTC)

Agreed. I provided you with exactly what you asked for, which makes any further argument pointless. kwami (talk) 02:43, 11 April 2023 (UTC)
I really hoped more people would weigh in here, but in the absence of that, as said above, there was no consensus for the one editor to change these to prefix, against other editors saying infix, so yes, I will revert the changes. - -sche (discuss) 23:09, 11 April 2023 (UTC)Reply[reply]
We have multiple sources that these "infixes" are really prefixes. I concentrated on the specific case of the relative prefixes, because that's what MuDavid was most concerned about, but there are additional RS's for the Swahili concord prefixes in general. If an "infix" on Wiktionary is a non-initial prefix simply because that's what some sources call them, then for consistency we need to add a duplicate entry for English con- as an "infix" because that's what some sources call it in words like reconcile. Shall I start a proposal on a mass creation of English "infixes"? kwami (talk) 23:17, 11 April 2023 (UTC)Reply[reply]
I'd think even the scholars who consider -con- in that word to be an infix would agree that there's a difference between it and the more traditional infixes of some other languages, which occupy inflectional slots, not derivational ones, and which in most cases can be omitted and still leave behind a grammatical word. By contrast, there is no *recile. Soap 16:43, 12 April 2023 (UTC)