Wiktionary:Beer parlour

Welcome to the Beer Parlour! This is the place where many a historic decision has been made, and where important discussions are being held daily. If you have a question about fundamental aspects of Wiktionary—that is, about policies, proposals and other community-wide features—please place it at the bottom of the list below (click on Start a new discussion), and it will be considered. Please keep in mind the rules of discussion: remain civil, don’t make personal attacks, don’t change other people’s posts, and sign your comments with four tildes (~~~~), which produces your name with timestamp. Also keep in mind the purpose of this page and consider before posting here whether one of our other discussion rooms may be a more appropriate venue for your questions or concerns.

Sometimes discussions started here are moved to other pages for further development. In particular, changes to a major policy or guideline may be discussed on the corresponding talk page and “simple votes” (as opposed to drawn-out discussions) can be conducted on our votes page.

Questions and answers typically remain visible on this page for one to two months, but they can always be found in the appropriate monthly archive (based on the date discussion was initiated). While we make a point to preserve all discussions that were started here, talk that is clearly not appropriate for this page may be deleted. Enjoy the Beer parlour!

Plural uncountable nouns

I've just noticed many plural nouns such as cattle or arms ('warfare') are never labeled as uncountable (in the sense of numbers/numerals) as different from police or staff (compare: three (members of the) crew are... vs an item of clothing/*clothes).

Is there any particular reason for that? JMGN (talk) 10:57, 1 April 2025 (UTC)

I just left a comment at Appendix_talk:Glossary to this effect, but you can find more than a handful of examples of cattle used directly after a numeral, which makes it inaccurate to call it "uncountable" without giving further qualifications to that label.--Urszag (talk) 11:24, 1 April 2025 (UTC)
Could you clarify what you mean by "plural uncountable nouns"? Arms might well be in that category, taking a plural verb and usable with much, but clothing takes a singular verb.
To try to answer your question: I think the reason is that different contributors focus on different aspects of the grammar of English nouns. Some focus on countability/uncountability, some on number agreement with a verb, some on the form(s) of the plural, and others on the semantics of number. It is not obvious to me how one can consistently present the relevant information in an English noun entry. Further complication of already complicated {{en-noun}} may be necessary.
To start, almost any usually countable noun can be used uncountably with a predictable meaning relative to the countable use. Similarly, uncountable nouns can be used countably. A test for uncountability is use of the noun with a determiner like much ("much arms" can be found) (vs. many with a plural form for countability). Whether an English noun is (un)countable does not transparently answer the question of number agreement with a verb. DCDuring (talk) 14:49, 1 April 2025 (UTC)
"Cattle" is a countable word. You say "some cattle are", not "some cattle is". 2A00:23C5:FE1C:3701:F5D8:C7C2:FAB5:4BC6 15:55, 1 April 2025 (UTC)
Cattle is an uncountable singulare tantum as a mass noun. You say “some cattle are” because notional agreement is mandatory in English. Fay Freak (talk) 19:20, 1 April 2025 (UTC)
@Fay Freak, DCDuring: According to the CGEL,
Uninflected pl-only: [37] i) cattle livestock police poultry1 vermin ii) folk people1 [38] {These cattle belong - *This cattle belongs} to my uncle.
[37i] cannot be used with low numerals, but are found with high round numerals (‘quasi-count’). Their denotation is thought of en masse, with none of the individuation into atomic entities that a low numeral implies. Genuine count nouns (usually of more specific meaning) must be substituted in order for this individuation to take place:
[39] i a thousand cattle vs seven cows / *seven cattle
ii two hundred police vs four policemen / *four police / four police officers.
An alternative, in cattle, is to use a quantificational noun: seven head of cattle. JMGN (talk) 19:27, 1 April 2025 (UTC)
They are non sequitur: their denotation is thought of en masse means it is a mass noun thought in one number only and thus not being counted. The agreement is no evidence for it being plural, for the said empirically observed English agreement behaviour. Plural or singular is a morphological category, which is not marked in cattle, grammatical number is a feature […] that expresses count distinctions, while agreement of number within the subject and the predicate is not required, but a syntactic rule hinging on various environment factors including (most commonly) semantics and word order, e.g. in Arabic PSO sentences always have the predicate in the singular when the same sentence in SPO requires agreement, while in turn English has a more loose requirement of subject and predicate numbers agreeing (because agreement in sense trumps the ideal of sentence constituents). Fay Freak (talk) 19:43, 1 April 2025 (UTC)Reply
@Fay Freak: Take into account etymologies (dia- vs synchronic):
< Medieval Latin capitāle, 'holdings, funds' < neuter of Latin capitālis
https://ahdictionary.com/word/search.html?q=cattle JMGN (talk) 19:57, 1 April 2025 (UTC)
@JMGN: Yeah, the Latin is singular. Translations of course do not even need to be the same POS, let alone number.
See also people your CGEL mentions. Wiktionary labels it as uncountable, e contrario from not labelling any sense but one as countable – which gained abstraction to substitute nation, in the same fashion as United States has gone used with singular predicates due to no certain plurality of individual persons (as states have been personhoods, by analogy to clubs consisting of natural persons) being felt or conceptualized, ousting notional agreement, but United States always stayed a plural. It would not work and be confusing not to. The sense of people called by Wiktionary “plural of person” is lexical suppletion to express plural senses of person by means of a singular. Fay Freak (talk) 20:08, 1 April 2025 (UTC)Reply
CGEL is a good resource, of course, but when it stars *seven cattle and *four police, it is only describing the usage (or judgements) of one group of speakers rather than universal facts about English. Low numerals can in fact be found used with both of these nouns ("four police", "seven cattle").--Urszag (talk) 20:53, 2 April 2025 (UTC)
@Urszag: There's always some dialect somewhere, African or Asian, because after all, who owns English? JMGN (talk) 21:02, 2 April 2025 (UTC)
Both of those links go to texts published in the United States, although I haven't looked up the details of the authors. Of course, time can also be a component.--Urszag (talk) 21:09, 2 April 2025 (UTC)
@Urszag Anyway, it was never about whether that particular word can be used with numerals, but a systematic lack of description in our current labels. JMGN (talk) 21:11, 2 April 2025 (UTC)

en.wikt transl locations for jabonera and sabonera

Consider: a soap factory in the soap industry

una fábrica jabonera

Am I right in perceiving that there is currently a gap in en.wikt because there ought to be some Translations section where jabonera (adj) and sabonera (adj) can go? There is not yet any provision made for them for the noun adjunct use of 'soap' at soap#Noun → Translations, and they obviously don't belong at soapy#Adjective → Translations. It seems to me that there needs to be a noun adjunct sense/use listed at soap#Noun → Translations, and it simply hasn't been added yet. But I have never had occasion to ponder this particular pattern before, and there must be thousands of instances, so I may be missing something. Thus asking here. Quercus solaris (talk) 23:32, 1 April 2025 (UTC)Reply

There've been a few small discussions of this over the years: DCDuring and I discussed it back in 2011, after which I mocked up two ways of handling this, in brass/spruce (give other languages' adjectival translations of English attributive use of regular noun senses their own translation table) and cork (put the translations into the "main" table, with qualifiers highlighting the POS mismatch). It came up again in 2016 (mentioned again in 2017), but I've always been hesitant that there was never a big discussion in which a large number of people formalized either approach. Perhaps we can decide here and now which approach (either of those? or something else?) is best. Then again, maybe entries having taken such approaches since 2011 without objection — indeed, many editors and many entries (e.g. racial) have been taking the cork approach (reasonably listing Rassen- as a German translation despite the POS mismatch) for as long as Wiktionary has existed — means that is the accepted way to handle this. - -sche (discuss) 15:32, 3 April 2025 (UTC)Reply

Batch of new ety-only languages

I'd like to add most of the custom language codes defined in Module:Babel/data as proper etymology-only languages in Module:etymology languages/data. Those are:

  • acf Saint Lucian Creole
  • azb Iranian Azerbaijani
  • be-tarask Taraškievica
  • bs Bosnian
  • cnr Montenegrin
  • crh-RO Dobrujan Tatar
  • diq Zazaki
  • hr Croatian
  • hyw Western Armenian
  • kiu Kirmanjki
  • ko-KP North Korean
  • ku Kurdish
  • nan-PH Philippine Hokkien
  • nan-TW Taiwanese Hokkien
  • sr Serbian
  • zh-CN Mainland Chinese
  • zh-HK Hong Kong Chinese
  • zh-MO Macau Chinese
  • zh-MY Malaysian Chinese
  • zh-SG Singaporean Chinese
  • zh-TW Taiwanese Chinese

(I've excluded two of them: cs-u-sd-cz642 because it's unused and I have a feeling it will break something if added; and hu-formal because it's not really necessary IMO; I'm probably going to remove both of these from Babel as well soon.)

Aside, I think all of these codes should already be etymology-only codes anyway, and adding them will mean that the proficiency categories generated for them are defined in the category tree. Saph (talk) 17:59, 2 April 2025 (UTC)

Absolutely not for Western Armenian, Kurdish, Zazaki and Yugoslav. That would create a mess. Vahag (talk) 18:30, 2 April 2025 (UTC)
None of that looks sensible; even in the names little thought has been expended: Kirmanjki, Dobrujan Tatar? Database monsters without looking into the linguistic material will not be nursed. For personal identity in babel boxes okay, otherwise begone. Fay Freak (talk) 19:23, 2 April 2025 (UTC)
OK I'm not too set on this, so never mind. acf probably should be added as a fully-fledged language though - I'll open a new discussion about that later. Saph (talk) 19:28, 2 April 2025 (UTC)
The rest can always be handled as labels. Nicodene (talk) 19:58, 2 April 2025 (UTC)
I agree with what's been said. For example, Kurdish is a family, not a language, and there is strong consensus not to add etym variants for individual countries where Serbo-Croatian is spoken. Any country-specific variants of "Chinese" or Hokkien need consultation with the Chinese editors before adding. In general these are all tricky cases as shown for example by the previous discussion on Dobrujan Tatar, which appears not to exist. Benwing2 (talk) 02:24, 3 April 2025 (UTC)

Pronunciation of heteronyms

this discussion may have happened many times in the past, but it seems unresolved due to there not being a proper agreed method to handle these. how should multiple pronunciations for heteronyms be handled if the terms share an etymology? see entries like coop, compound, reject, toot, and, somewhat related, ghoti, among many others. Juwan (talk) 22:33, 2 April 2025 (UTC)

Err, I think you answered your own question. Any of the ways we display them in the entries you listed handles them decently. I have a feeling, though, that you're looking for a Wiktionary: namespace page that states pls do heteronym pron this way. Father of minus 2 (talk) 22:42, 2 April 2025 (UTC)
it seems so! or at the very least an unofficial official way that is used across entries, as often Wiktionary guidelines are left unstated. Juwan (talk) 22:51, 2 April 2025 (UTC)
I believe you're referring to how it should be laid out in a Pronunciation section and I believe that the examples at caught and cot show best practice and what is generally done across Wiktionary. I don't want to be pedantic, but if you haven't already, see Wiktionary:Entry_layout#Pronunciation. —Justin (koavf)TCM 00:36, 3 April 2025 (UTC)Reply
I am referring to how it should be laid out, yes, but these examples and the policy don't give a good practice for these specific cases. Juwan (talk) 10:56, 3 April 2025 (UTC)

Disabling Babel categorisation for inactive users

@benwing2, -sche (since Benwing mentioned |nocat= was something you wanted):

There was a vote which ended in a consensus for this, but it was in 2017, so I wanted to reaffirm that this was still the consensus. I've already put together a script for this, and I've done 15 test edits under a bot account and all of them except the first worked fine. I can post the script on GitHub if anyone would like to see it. Saph (talk) 00:57, 3 April 2025 (UTC)

No objection from me. Benwing2 (talk) 02:25, 3 April 2025 (UTC)
Strong support 0DF (talk) 12:28, 3 April 2025 (UTC)
Should have been implemented. Probably fine to do it now, even though consensus is old. Vininn126 (talk) 12:57, 3 April 2025 (UTC)
I just noticed, the vote specifies that the users would be moved to categories appended with (inactive); is there any preference between this and, as |nocat= is currently set up, outright disabling categorisation? Saph (talk) 13:04, 3 April 2025 (UTC)
I personally think I prefer the inactive cat as opposed to a total lack of categorization, not just per consensus. Vininn126 (talk) 13:07, 3 April 2025 (UTC)
@Saph: Ceteris paribus, (inactive) is probably better than no categorisation, but it's a minor issue, so if you've already written the script, I would say go with the standard |nocat= function. 0DF (talk) 13:08, 3 April 2025 (UTC) so I second what Vininn126 wrote. 0DF (talk) 13:09, 3 April 2025 (UTC)
Adding (inactive) would be pretty trivial to implement, it would just be a simple edit to MOD:Babel and the category tree, so I'm fine with either way. Saph (talk) 13:55, 3 April 2025 (UTC)
I've implemented this on the Babel end, but adding it to the category tree wasn't as easy as I thought it would be and I've asked on the Discord if someone else more knowledgeable than me can do it. For now I'm going to do an AWB run and convert the |nocat=1 usages to |inactive=1. Saph (talk) 14:51, 3 April 2025 (UTC)
I would prefer something to fetch user contributions and determine that if the last edit is > 1 year ago the categorization is disabled. It is less intrusive than every page being edited for categorization – as most people deem their user page personal, and by extension the editing of others suspect to be inappropriate –, though of course this can also be a regular bot job, unnecessarily stressing rate limits to flood the recent changes to the project. On the other hand I may be wrong and that operation would be more expensive. Fay Freak (talk) 13:11, 3 April 2025 (UTC)
Unfortunately checking user contributions is not possible in Scribunto, as far as I know at least. Saph (talk) 13:53, 3 April 2025 (UTC)
Yes, Wikimedia categories tend to get updated when something changes, but inactive users, by definition, aren't changing anything. Anything system-based is going to be seriously useless. My main concern is with what happens when someone starts editing again after being away. How long does it take before the categorization catches up? At least something that edits a user page is going to be easily detectable by the user. Then there's the matter of alternate accounts... Chuck Entz (talk) 14:08, 3 April 2025 (UTC)
Support (e/c). If it's easy to implement recategorization into "inactive" categories rather than un-categorization, that seems like a great idea. There was also a brief discussion of this in 2023, associated with which I (for lack, at the time, of a better method) commented out a few inactive users' Babels; if we start tackling this in a proper way, I can probably locate the relevant batch of edits and either revert them (putting the inactive users back into Babel categories for the bot to find) or update them in the same manner as the bot. - -sche (discuss) 14:55, 3 April 2025 (UTC)
Yeah, I saw some of those edits, the bot ran into a few - if you could go through them that would be great. The script handles pages already having |inactive=1 fine, so I would just update it in the same way as the bot. Saph (talk) 14:58, 3 April 2025 (UTC)
OK, I've undone those old edits (except on the page of User:BAICAN XXX, who is blocked anyway). - -sche (discuss) 15:43, 3 April 2025 (UTC)
There seem to be no objections (except from FF, maybe? I can't really tell) so I've gone ahead and created the vote. Saph (talk) 15:08, 3 April 2025 (UTC)
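
For anyone following the mechanics discussed in this thread, here is a minimal Python sketch of the two pieces mentioned: deciding that an account is inactive when its last edit is more than a year old, and rewriting a user page's Babel call so that |nocat=1 becomes |inactive=1. This is not Saph's script, which was not posted here. The function names, the 365-day threshold (taken loosely from the "> 1 year ago" suggestion), the template name "Babel" and the simplified regex (which assumes no nested templates inside the call) are all assumptions for illustration. How the last-edit timestamp is obtained is deliberately left out.

```python
import re
from datetime import datetime, timedelta, timezone

# Assumed threshold, based on the "> 1 year ago" suggestion in the thread.
INACTIVITY_THRESHOLD = timedelta(days=365)

# Simplifying assumption: the Babel call contains no nested templates.
BABEL_CALL = re.compile(r"\{\{\s*[Bb]abel\s*\|[^{}]*\}\}")
# A |nocat=<value> parameter, up to the next pipe or the closing braces.
NOCAT_PARAM = re.compile(r"\|\s*nocat\s*=\s*[^|{}]*")
INACTIVE_PARAM = re.compile(r"\|\s*inactive\s*=")


def is_inactive(last_edit: datetime, now: datetime) -> bool:
    """Return True if the last edit is older than the assumed threshold."""
    return now - last_edit > INACTIVITY_THRESHOLD


def mark_inactive(wikitext: str) -> str:
    """Hypothetical helper: flag every Babel call in the text as inactive."""

    def fix(match: re.Match) -> str:
        call = match.group(0)
        if INACTIVE_PARAM.search(call):
            # Already flagged; leave the page untouched.
            return call
        if NOCAT_PARAM.search(call):
            # Convert an existing |nocat=... into |inactive=1.
            return NOCAT_PARAM.sub("|inactive=1", call, count=1)
        # No flag present: append |inactive=1 before the closing braces.
        return call[:-2] + "|inactive=1}}"

    return BABEL_CALL.sub(fix, wikitext)


if __name__ == "__main__":
    now = datetime(2025, 4, 3, tzinfo=timezone.utc)
    last_edit = datetime(2023, 1, 1, tzinfo=timezone.utc)
    page = "{{Babel|en|fr-2|nocat=1}}"
    if is_inactive(last_edit, now):
        print(mark_inactive(page))
```

Because pages that already carry |inactive=1 are returned unchanged, running such a conversion twice over the same page would not produce a second edit, which matches the behaviour described above for the real script.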

"folk medicine" vs. "alternative medicine"?

Should we categorize them differently or is there too much overlap? I think of "folk medicine" as the traditional medical practices of (especially non-Western) cultures, e.g. Ayurveda, Traditional Chinese Medicine, etc. whereas "alternative medicine" is Western-invented non-evidence-based practices such as chiropractic, homeopathy and aromatherapy. I know it gets fuzzy in that practices like acupuncture and moxibustion are TCM in origin but adopted widely by Westerners. I ask because I just moved 6 Cebuano terms from Category:ceb:Folk medicine (not defined in the category tree) to Category:ceb:Alternative medicine and moved 蒙醫 / 蒙医 (méngyī) (defined as "traditional Mongolian medicine") from Category:zh:Medicine to Category:zh:Alternative medicine, but I don't know if these are the best categories. (Is "traditional Mongolian medicine" in essence the same as Traditional Chinese Medicine? If so do we need to rename the latter?) Benwing2 (talk) 05:31, 3 April 2025 (UTC)Reply

I agree with your thought process: there is overlap that does not lend itself to pure dichotomization. Thus, there is no perfect answer about how to categorize them in a dichotomous way. Fuzzy logic could apply, with differing percentages depending on the term, such as X%-weighting-for alternative medicine and Y%-weighting-for folk medicine. The same theme is also true regarding alternative medicine versus quackery, although the quackiest kind of quackery is when the doctor consciously knows that the treatment is useless and sells it anyway, as opposed to instances when the doctor mistakenly believes in the treatment. But there is also the layer on which it is true that if the placebo effect brings peace of mind or other customer satisfaction to the patient (whether blinded or even nonblinded), then the placebo cannot be said to be useless, because even though it has zero efficacy, it has more than zero effectiveness. Quercus solaris (talk) 06:19, 3 April 2025 (UTC)Reply
Maybe folk medicine is a part of alternative medicine? (at least if used in modern society which sees the difference between alternative and scientific methods). Also, I will not consider stuff like bustein as alternative medicine, but it is still a part of old folk medicine, as long as you don’t use it today (then it’s gonna be alternative medicine). Tollef Salemann (talk) 06:30, 3 April 2025 (UTC)
This seems a good idea; maybe folk medicine is a subcategory of alternative medicine that includes all manner of traditional medical practices, so for example, Ayurveda and TCM have folk medicine as the parent category, which in turn has alternative medicine as a parent category, while chiropractic and homeopathy directly have alternative medicine as the parent category. Benwing2 (talk) 06:34, 3 April 2025 (UTC)
You have it backwards. Alternative medicine is a single modern culture of "traditional" medicine, from an anthropological or historical lens. The fact that we call it traditional medicine rather than something not time-specific is what's likely misleading you. — Ganjabarah (talk) 02:06, 23 April 2025 (UTC)Reply
Agreed (with what Tollef Salemann said). Etic versus emic worldviews; variable ontologic construal. The folk medicine of pre-Columbian Amerindian peoples was just plain medicine within their worldview; and there was no alternative medicine. Among people today who are willing to use alternative medicine, many would not bother with a magic stone, because they are seeking things that "actually work" (as far as they know or believe), so to them, the stone is ancient folk medicine but nothing else; but some of them will embrace the stone and love it, and if they do, then it is alternative medicine in their hands. Quercus solaris (talk) 06:39, 3 April 2025 (UTC)Reply
OK this all sounds good but I'm not understanding how you're proposing to structure the categories, or even if you're making any proposal at all. Benwing2 (talk) 06:42, 3 April 2025 (UTC)
Unfortunately I mean that any strictly hierarchical categorization (versus a fuzzy logic one) is fine (good enough) but is also incapable of modeling the reality 100% accurately. Which doesn't mean that a method can't be chosen for it! Nor that it is futile. Just that it is capable at most of being an approximation, a stylized representation. With that being true, I am agnostic as to which option to choose for its design. Whether impressionism or cubism, both could be nice, even though neither is photorealism. Quercus solaris (talk) 06:49, 3 April 2025 (UTC)
Trying to sound less insane about it (lol), it's like email folders (like Outlook traditionally uses) versus email labels (like Gmail uses): each treatment could have any label applied, or more than one (i.e., alternative med, folk med, or both, or quackery), but which labels apply to each treatment is not the same yes/no value for all people (it varies by who is judging it). Thus, trying to put one label totally inside another one (as if subfolder into bigger folder) doesn't apply. Quercus solaris (talk) 07:10, 3 April 2025 (UTC)
It’s not Western invention versus Eastern tradition, instead the West and the East have vaguely evidence-based tradition, as we had pharmacopoeiae before one knew how to do clinical studies or how cell physiology worked, based on anecdotal evidence like even today medical experience often has to make conclusions from observational studies, and later due to the abuse of media there came to be belief systems vaguely built on tradition and quackery in the fashion of conspiracy theories, woo.
You of course pick existing vocabulary to sound superficially reasonable and attract supporters that aren’t reliably trained in critical thinking, since to some degree it always presupposed academicism still nowhere attained in the masses and often not even after a college diploma: not everyone is equally attentive. Interestingly we have an extensive article about pseudolegal practice and advice espoused by Reichsbürger and freemen on the land; perhaps we will add Category:Pseudolaw.
That being said it is correct that we have Category:Alternative medicine sorted under Category:Pseudoscience, as we could as well call it Category:Pseudomedicine, while Category:Folk medicine could be in another sense alternative medicine, that which is still practised in spite of not being confirmed, and could be tried to go beyond medical school medicine; but Category:Pharmacology or Category:Drugs, and the other intervention types, are actually quite sufficient categories, since internet visitors, even the reasonable ones in turn, are confused enough not to be relied upon in distinguishing pure folk medicine, though it be greatly enough defined in the dictionary. Fay Freak (talk) 10:32, 3 April 2025 (UTC)Reply
You might say that "alternative medicine is folk medicine with an army and a navy". The word "folk", like "dialect", tends to imply something rural and backward, while "alternative" tends to imply that sophisticated/normal people have decided to try something different. Chuck Entz (talk) 13:56, 3 April 2025 (UTC)
Uhh how about "traditional medicine" then instead of "folk medicine"? Benwing2 (talk) 19:17, 3 April 2025 (UTC)
@Fay Freak — Not sure whether I read it right at "You of course pick existing vocabulary to sound superficially reasonable and attract supporters that aren’t reliably trained" — but I'm not personally espousing any woo-woo or hoo-haw at all. I'm mentioning the epistemologic view of physicians who reject reductionism as constituting all of science rather than just a part of it. Scientists with a maximally reductive view believe that even having any agency like NCCIH exist, at all, equals promoting pseudoscience. But many scientists disagree with that assertion, because of factors such as (1) they view studying the placebo effect as scientifically valid, as to whatever extent a placebo helps the patient (even when nonblinded) it is not useless to the purpose of health care, and (2) there are some things that science so far hasn't fully understood but may come to understand better, and then a particular treatment that many people were labeling as "alternative" meaning "against science" will be viewed as "not against science but rather valid per the concept of effectiveness as differentiable from efficacy (distal causality rather than proximal causality)." An appeal to authority saying basically that "the plebs should shut up and not even try to understand what science is except to accept whatever a high priest of science tells them it is" is counterproductive to those high priests' own self-interests in the end, anyway. A thumbnail example of why is that in 1960 or thereabouts your doctor would prescribe you some thalidomide for your morning sickness and recommend that you smoke Chesterfields instead of Marlboros because they're better for your lungs. Oops, guess today's current state of science isn't a last word forevermore, huh? The plebs aren't willing to take the high priests' word without the high priests being willing to discuss and defend and debunk. 
The fact that JAQing off and sealioning are endlessly tiresome and are often done disingenuously by shysters doesn't negate the fact that the poor little plebs demand to be included epistemologically and will throw a pitchfork revolt if they think they're being disrespected and discounted by the high priests. It is therefore in the high priests' own self-interest to somehow live with and deal with the burden of explaining and discussing and defending and debunking. I well realize how hard of a problem it is. But the alternatives are even worse, though. Quercus solaris (talk) 15:53, 3 April 2025 (UTC)
@Quercus solaris: This was great to read 😂. I did not imagine anyone to espouse woo-woo or hoo-haw here, but this illustrates how any field of study, scientific or not, is practiced and enforced on the basis of reproduction of previous treatments compared with empirical reality and statistical assumptions and interwoven with personal stakes in entering a science and maintaining positions therein, dragging down the recognition of health causality in a stream of tradition representing individual custom and habit, in Hume’s terms.
The question is then at which point it becomes belligerent enough to earn the label “alternative”, “fringe” or “minority” view, with its army and navy, and when a former majority-accepted view has gathered enough dust and blows to contrast with the state of art due to its subterfugial evidence base, never going with time in marketing itself. The difference is that of two genres of art, one very old, largely oblivial as much as it is maladaptive, and recent ventures one can still inflate as a consequence of little risk-aversion and attractiveness of new products, and how can so many buyers be wrong? Like cryptocurrencies popular enough to take off, by the volatility of which some people lost their fortunes, whereas for gold coins you don’t have to argue, everybody knows what he gets from them, though they be of limited direct practical use. These are all things targetting communities or communal identities. Fay Freak (talk) 19:36, 3 April 2025 (UTC)Reply

Final proposed modifications to the Universal Code of Conduct Enforcement Guidelines and U4C Charter now posted

The proposed modifications to the Universal Code of Conduct Enforcement Guidelines and the U4C Charter are now on Meta-wiki for community notice in advance of the voting period. This final draft was developed from the previous two rounds of community review. Community members will be able to vote on these modifications starting on 17 April 2025. The vote will close on 1 May 2025, and results will be announced no later than 12 May 2025. The U4C election period, starting with a call for candidates, will open immediately following the announcement of the review results. More information will be posted on the wiki page for the election soon.

Please be advised that this process will require more messages to be sent here over the next two months.

The Universal Code of Conduct Coordinating Committee (U4C) is a global group dedicated to providing an equitable and consistent implementation of the UCoC. This annual review was planned and implemented by the U4C. For more information and the responsibilities of the U4C, you may review the U4C Charter.

Please share this message with members of your community so they can participate as well.

-- In cooperation with the U4C, Keegan (WMF) (talk) 02:05, 4 April 2025 (UTC)

@Vilipender: UCCEG? It would be funnier if they were the Supreme Universal Code of Conduct Enforcement Guidelines. 0DF (talk) 22:33, 8 April 2025 (UTC)
UCOC, you cock. Supreme Universal Extract of Code of Conduct - SUX COC. Vilipender (talk) 08:15, 9 April 2025 (UTC)
Well don't I look like an innocent little wilting violet? 0DF (talk) 01:06, 10 April 2025 (UTC)
Long live Wiktionary's hatred of the corporate overlords who make the rules but never buy us dinner. (We know what they are spending the donations on.) 2A00:23C5:FE1C:3701:F050:4AD7:86D3:BA99 01:11, 10 April 2025 (UTC)

Direct object omitted in ditransitive verbs

He does a bit of painting, but he doesn't like to show people.

What is the best way to reflect this behavior by some verbs? JMGN (talk) 12:06, 4 April 2025 (UTC)

@JMGN We might be justified to add an extra sense for this behavior since it seems to be limited to a few ditransitive verbs.
You can say:
Can you show me how you did that? ― Yes, I'll show you.
But you can't say:
Can you bring me a sandwich? ― Yes, I'll bring you.
By the way, the entry for show doesn't even mention that it can be ditransitive, so maybe we should fix that too. Here is my draft:
  1. (transitive, ditransitive) To display (something), to have (somebody) see (something).
    The car's dull finish showed years of neglect.
    I showed him my brand new computer yesterday.
  2. (transitive) To have (somebody) see.
    I do some sculpturing, but I don't like to show anybody.
Tc14Hd (aka Marc) (talk) 22:29, 5 April 2025 (UTC)
@Tc14Hd
Many trivalent verbs in the fields of communication and giving/transfer show alternation between a construction with object + to PP and one with two objects: She showed the new draft to her tutor ∼ She showed her tutor the new draft
[43] IO or 'to': with such verbs as tell, read, show, teach, IO and DO are aligned with less central cases of recipient and theme: the IO-referent comes to hear, see, or learn what is expressed by DO, rather than to have it.
Transitive/intransitive contrasts [IV]: They fined us ($100).
The single object of the monotr corresponds to the IO of the ditr (i.e., us), but other verbs such as charge allow both types of omission:
[52] Compare:
They fined/charged us ($100)
They charged/*fined $100.
The verbs in [53] follow the pattern of charge:
[53] bet, cost, envy, excuse, forgive, refuse, SHOW, teach, tell. Yet there is a distinction in the understood element, which may be either definite or indefinite:
I asked him the price but he wouldn’t tell me (sc. “the price”: definite)
He tells lies / dirty jokes (addressee indefinite).
If he were surprised, he didn’t show it.
Cost only in informal style (That’ll cost you, i.e., with “a lot” understood) or in the idiom to cost sb dear (where the syntactic analysis of dear is unclear).
Factive and entailing governors (i.e., the content clause complement is normally presupposed):
[42] (ii) entailing and non-factive a. happen, prove, SHOW, turn out.
Licensing of subordinate interrogatives:
[17] (iv) TELLING: tell, inform, point out, show
Huddleston, R., & Pullum, G. (2002). The Cambridge grammar of the English language. JMGN (talk) 22:45, 5 April 2025 (UTC)Reply
@JMGN Okay, seems like this affects more verbs than I assumed. We should probably just use one sense definition which lists example sentences for all three cases. Tc14Hd (aka Marc) (talk) 01:47, 6 April 2025 (UTC)Reply

names of cities in India

I am expanding the list of cities in Module:place/shared-data to include at least all the cities over 1,000,000 people as of the most recent (2011) census (the 2021 census has been repeatedly delayed and has not yet been conducted). Per w:List of cities in India by population there are 46 cities over 1,000,000 people (city proper, not metro area) as of the 2011 census; by now there are certainly more. I would like to solicit opinions on some issues:

  1. Bangalore vs. Bengaluru. Wikipedia just renamed their article earlier this year, but (a) Wikipedia tends to lean towards official names and endonyms rather than common names, despite the "common name" policy; (b) per the move request closer, this was a close call. @-sche has proposed using Google Scholar as a first-line source; a search using only >= 2024 sources shows 17,900 occurrences of Bangalore vs. 17,300 of Bengaluru. I am inclined to keep using "Bangalore" for now with Bengaluru as an alias, but am open to suggestions.
  2. Cities renamed for political reasons. We have at least the cases of Aurangabad vs. Chhatrapati Sambhajinagar (a red link!) and Allahabad vs. Prayagraj. Wikipedia has the former at w:Aurangabad but the latter at w:Prayagraj. The Google Scholar test for the latter using only >= 2024 sources shows 10,700 Allahabad vs. 7,100 Prayagraj. I think the case for Aurangabad is obvious and I am inclined to use Allahabad over Prayagraj.
  3. Metro areas anchored by multiple smaller cities rather than a single large city. There are three of them in the list: Pimpri-Chinchwad, Kalyan-Dombivali/Kalyan-Dombivli and Vasai-Virar. Should we use the hyphenated names for categories or should we use the component cities, and if we go with the latter option should we include all 6 cities or only some? One possibility is to use the hyphenated names but add the components as aliases that are recognized but categorize under the combined name.
  4. The second hyphenated city is further confused by the variants Kalyan-Dombivali vs. Kalyan-Dombivli. Wikipedia puts the city at Kalyan-Dombivli with no mention whatsoever of the variant Kalyan-Dombivali, but the above list article uses Kalyan-Dombivali, and so do we. What is the deal here and which variant should we use? The lack of context in the Wikipedia article makes it hard for me to judge what to do.
  5. Metro areas over 1,000,000 inhabitants. w:List of million-plus urban agglomerations in India lists 52 such metro areas per the 2011 census and 65 according to the 2023 report of Demographia (which uses 2023 estimates). Maybe I should go off one of these two lists instead of the city-proper estimates; this also eliminates the issue with Pimpri-Chinchwad, Kalyan-Dombiv(a)li and Vasai-Virar, which are all satellite cities that get included in a larger metro area. (But on the other hand it introduces new issues with Durg-Bhilainagar and Hubli-Dharwad -- both red links for us.)

Benwing2 (talk) 03:01, 6 April 2025 (UTC)Reply
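For anyone who wants the Google Scholar figures quoted above as proportions, here is a minimal sketch of the arithmetic. The hit counts are the ones given in the post; the function and variable names are made up for illustration, and raw hit counts are of course only a rough proxy for common usage.

```python
# Hit counts quoted in the post above (Google Scholar, sources from 2024 onward).
COUNTS = {
    ("Bangalore", "Bengaluru"): (17_900, 17_300),
    ("Allahabad", "Prayagraj"): (10_700, 7_100),
}


def share(first: int, second: int) -> float:
    """Return the first name's share of the combined hits, as a percentage."""
    return 100 * first / (first + second)


for (old, new), (old_hits, new_hits) in COUNTS.items():
    print(
        f"{old}: {share(old_hits, new_hits):.1f}% "
        f"vs. {new}: {share(new_hits, old_hits):.1f}%"
    )
```

On these numbers Bangalore/Bengaluru is close to an even split (about 51% to 49%), while Allahabad leads Prayagraj by roughly 60% to 40%.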

OK I've decided to go with the Demographia list of 65 metro areas. The Wikipedia article on Durg-Bhilainagar redirects to Bhilai so I've used that as the category name, with Durg, Durg-Bhilai, Durg-Bhilainagar and Bhilainagar all existing as aliases. Hubli-Dharwad per the Wikipedia article uses that name with Hubli and Dharwad as aliases. Comments welcome. Benwing2 (talk) 03:36, 6 April 2025 (UTC)Reply
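The alias arrangement described above can be sketched roughly as follows. This is only an illustration in Python, not the actual data layout of Module:place/shared-data (which is a Lua module); the table and function names are hypothetical, and only the aliases named in the comment above are included.

```python
# Hypothetical alias table; NOT the real structure of Module:place/shared-data.
# Each key is the category name, each value the aliases that map onto it.
CANONICAL_ALIASES = {
    "Bhilai": ["Durg", "Durg-Bhilai", "Durg-Bhilainagar", "Bhilainagar"],
    "Hubli-Dharwad": ["Hubli", "Dharwad"],
}


def build_lookup(table: dict[str, list[str]]) -> dict[str, str]:
    """Invert the table so every alias (and the name itself) maps to its category name."""
    lookup: dict[str, str] = {}
    for canonical, aliases in table.items():
        for name in (canonical, *aliases):
            # An alias claimed by two different categories would be ambiguous.
            if name in lookup and lookup[name] != canonical:
                raise ValueError(f"{name!r} is claimed by {lookup[name]!r} and {canonical!r}")
            lookup[name] = canonical
    return lookup


LOOKUP = build_lookup(CANONICAL_ALIASES)


def category_name(city: str) -> str:
    """Return the name to categorize under; unknown names pass through unchanged."""
    return LOOKUP.get(city, city)


print(category_name("Durg"))    # Bhilai
print(category_name("Hubli"))   # Hubli-Dharwad
```

The point of the design is that aliases are recognized on input but entries are always categorized under one combined name.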
I support going with official names like Bengaluru and Prayagraj which are well-established and dominant by now. Aurangabad to Chhatrapati Sambhajinagar is a recent rename and the new name is longer, so it isn't as popular as others. Maybe in some years Sambhajinagar (without the honorific) would become popular and usable. Svārtava (tɕ) 09:59, 6 April 2025 (UTC)Reply
@Benwing2: Cities renamed for political reasons
  • The renaming of Indian cities is certainly politically motivated to replace Mughal- or European-sounding names.
  • Therefore, there may be prescriptivist pressure towards using the new official names, because they are argued to be a ‘purer name’ or the ‘original name’.
  • Except in well-known cases such as the renaming of ‘Bombay’ to ‘Mumbai’, determining which name is more common may require some investigation.
I am inclined to keep using "Bangalore" for now with Bengaluru as an alias
  • Although Indian English may have accepted ‘Bengaluru’, non-Indian English still prefers the anglicised form ‘Bangalore’.
  • So, I can agree to keep using ‘Bangalore’.
I think the case for Aurangabad is obvious
  • I agree. As @Svartava implied, not all Indian English may be accustomed to using the new name of Aurangabad/Sambhajinagar, since it only occurred in June 2022, and it is quite lengthy.
  • If the new name of Aurangabad/Sambhajinagar catches on, ‘Chhatrapati’ would likely be dropped in ordinary usage but still maintained in purist usage. For comparison, Mumbai’s airport is officially called ‘Chhatrapati Shivaji Maharaj International Airport’, but is almost never referred to as such in ordinary speech.
  • Perhaps something to investigate would be how strong the prescriptivism is for ‘Chhatrapati Sambhajinagar’ outside the state of Maharashtra and the Marathi language.
Regarding Dombiv(a)li,
  • Whether or not to retain schwas as ‘a’ in English renderings of Indian names from schwa-dropping languages is a widespread issue.
  • It should be noted that the first ‘a’ representing an orthographic schwa in the Indian state of ‘Gujarat’ is often deleted in ordinary speech but may be retained in careful speech. On the other hand, the underlyingly identical term (with a different referent) is spelled in English as ‘Gujrat’ for the city in Pakistan.
  • The ‘a’ in ‘Dombiv(a)li’ could be dropped as per Wiktionary’s transliteration policy of dropping orthographic schwas in both Marathi and Hindi terms, since the case retention is not as strong as it is for the Indian state of ‘Gujarat’.
the issue with Pimpri-Chinchwad, Kalyan-Dombiv(a)li and Vasai-Virar, which are all satellite cities that get included in a larger metro area
  • These satellite cities in Maharashtra are large suburbs of their respective main city that are differentiated for administrative and legal reasons.
  • Kalyan, Dombiv(a)li, Vasai, Virar, Navi Mumbai and Thane are large suburbs of Mumbai. Of these satellite cities of Mumbai, Navi Mumbai and Thane would be the strongest cases for being separate cities altogether.
  • Pimpri and Chinchwad are essentially suburbs of Pune.
  • In any case, using hyphenated compound forms, including Hubli-Dharwad in Karnataka, is very awkward.
Kutchkutch (talk) 08:53, 23 April 2025 (UTC)Reply
@Benwing2: I concur with Svārtava in favouring official names in the vast majority of cases, but, my word, Chhatrapati Sambhajinagar is unwieldy. I took it upon myself to etymologise Aurangabad and to create Chhatrapati Sambhajinagar. The latter literally means “Parasol-lord Sambhaji’s city”, and has the same number of syllables if you don't drop your schwas. I can't imagine anyone would say the whole thing every time he referred to the city. By contrast, a name analogous to *Sambhajiton sounds a lot more manageable, so maybe we can lemmatise Sambhajinagar if and when that catches on. 0DF (talk) 01:37, 16 May 2025 (UTC)Reply

Redirected derivations categories

On the 22nd of March, Ahsan Mahim Ʒaaz moved Category:Bengali terms borrowed from Persian to Category:Bengali terms borrowed from Iranian Persian, such that the former now redirects to the latter. Is this desirable? I suspect it's not. 0DF (talk) 22:58, 6 April 2025 (UTC)Reply

@0DF Definitely not. Persian includes at least Dari as well as Iranian Persian. Benwing2 (talk) 06:32, 7 April 2025 (UTC)Reply
I undid the move. Benwing2 (talk) 08:34, 7 April 2025 (UTC)Reply
@Benwing2: Thank you. As a general point, is redirecting derivations categories ever a good idea? I notice that pages added to a redirecting category do not get added to the redirected-to category. 0DF (talk) 12:19, 7 April 2025 (UTC)Reply
No it's not; in general redirecting categories doesn't really work. Benwing2 (talk) 19:15, 7 April 2025 (UTC)Reply
@Benwing2: I thought not; thanks. I'll undo any such redirections I see in the future. 0DF (talk) 22:30, 8 April 2025 (UTC)Reply

@-sche @Ioaxxere I've now ended up creating CAT:Terms for fingers as a set category to avoid conflicting with CAT:Fingers (a related-to category), and now CAT:Individual buildings (see CAT:zh:Individual buildings for a specific notable library in Huzhou) for notable individual buildings vs. CAT:Buildings for types of buildings. We're getting to the point where we need to solve this properly. I almost thought of naming the category CAT:zh:Notable buildings but that seems like it wouldn't generalize; e.g. deserts beyond a certain very small size are inherently notable, so CAT:Notable deserts seems counterproductive, and CAT:Notable airports semi-so. (FWIW CAT:Deserts and CAT:Airports are name categories.) Benwing2 (talk) 06:32, 7 April 2025 (UTC)Reply

I maintain that we need to name every (type of) category in a way that explicitly spells out the intended scope, leaving no category at all named just "CAT:Deserts" or "CAT:Waterfalls" etc, because it is demonstrated all around Wiktionary that if any category is named just "CAT:Waterfalls" (etc), each different person is liable to use it for something different (as discussed here and elsewhere). Perhaps "CAT:Individual buildings" (or "CAT:Names of individual buildings"?), "CAT:Types of buildings", and (hypothetically) "CAT:Terms related to buildings"? (In the case of buildings, that last one might not exist, but replace "buildings" with e.g. "rivers".) It is conceivable that people might want to make exceptions for specific kinds of category, e.g. to let CAT:Cities continue to be named that (as opposed to renaming it CAT:Individual cities), but then again... the ambiguity of "CAT:Cities" means people have put pomerium (term related to the topic of cities) and capital city (type of city) into CAT:en:Cities... - -sche (discuss) 16:29, 7 April 2025 (UTC)Reply

Old Sundanese spelling

Should lemmas in Old Sundanese be spelt with the modern Sundanese Latin orthography, or with a modified version of it (e.g. with ṅ, ñ, ĕ), as seen in Old Javanese lemmas? Yes, there are already 83 lemmas in total (all of them using the modern Sundanese spelling), but I think a new spelling for Old Sundanese would be interesting. Udaradingin (talk) 09:06, 7 April 2025 (UTC)Reply

@Udaradingin: Which forms are actually attested? 0DF (talk) 12:14, 7 April 2025 (UTC)Reply
For Old Sundanese? I mean they use Old Sundanese script, Kawi, or Pallava. What I meant was romanization of OS (sorry for not clarifying earlier), like how lemmas in Old Javanese use romanization as the main entry (for example abhiṣeka, aḍaṅ, etc.). And these romanizations use one letter to represent one sound (e.g. ṅ for /ŋ/ instead of ng, ñ for /ɲ/ instead of ny, differentiation between e and ĕ, etc.). I'm asking your opinions on whether the Old Sundanese entries should be created following the Old Javanese romanizations (or at least based on them). What do you think? Udaradingin (talk) 13:57, 7 April 2025 (UTC)Reply
Here's an example of the proposed spelling:
"Ini silokana: mas, pirak, komala, hintĕn, ya ta saṅhyaṅ catur yogya ṅara(n)na. Ini kaliṅana. Mas ma ṅaranna sabda tuhu tĕpĕt byakta pañcāksara. Pirak ta ma ṅaranya ambĕk rahayu. Komala ma ṅaranya gĕi(ṅ)na padaṅ caaṅ lĕga loganda. Hintĕn ma ṅaranya caṅciṅ sĕri sĕmu imut rame ambĕk. Ya ta sinaṅguh catur yogya ṅarana."
(Sanghyang Siksakandang Karesian) Udaradingin (talk) 14:14, 7 April 2025 (UTC)Reply
@Udaradingin: Well, I believe we should lemmatise the forms in the Old-Sundanese/Pallava and/or Buda/Kawi scripts. The important thing in Romanisation, as far as I'm concerned, is that the original-script form be reconstructable from the Romanisation. That is usually facilitated by a bijective glyph-to-glyph correspondence. If the original script embodies the “one dedicated letter for each sound” principle, then so should the Romanisation. That is a convoluted way of saying that I agree that the Romanisation of Old Sundanese terms should be like those of Old Javanese rather than like those of modern Sundanese, but that the Romanisations should not be the main entries for Old Sundanese. 0DF (talk) 15:35, 7 April 2025 (UTC)Reply
@0DF I totally understand your point and I agree, especially with the last part. But, there are several reasons why I think making Old Sundanese/Kawi/Buda/Pallava script as the main entry of OS would be difficult:
  1. The number of sources providing the original Old Sundanese texts in said scripts is very limited, with most sources giving only the romanization or a modern-Sundanified version (read: heavily edited so that modern Sundanese people are able to comprehend the text). A counterpoint is the existence of photographs and rubbings of inscriptions and/or manuscripts, especially this image of the Astana Gede inscriptions and this facsimile of Carita Waruga Guru. But then again, these are very limited.
  2. Old Sundanese is (in my opinion) generally less researched and less documented than Old Javanese (this might be related to the 1st reason).
  3. Much like Old English, the Old Sundanese orthography wasn't standardized as it is in the modern one, let alone other scripts like Kawi, Buda, and Pallava.
  4. Because we are trying to follow the examples of Old Javanese lemmas, I think it would be easier and more consistent if we only change spelling to be like those of OJ (still being in Latin alphabet) rather than changing the writing system into OS/Kawi/Buda/etc. This would bring a balance between a sense of familiarity (comprehension) and novelty (spelling system).
However, I do take your input, so thanks! Udaradingin (talk) 16:53, 7 April 2025 (UTC)Reply
@0DF However, I'm thinking of a middle path here. We could put the OS/Kawi script as a soft redirect to the main entry in Latin, similar to how Sundanese or Pali entries are structured (examples for Sundanese and Pali entry). Do you have anything in mind about this? Udaradingin (talk) 01:02, 8 April 2025 (UTC)Reply
@Udaradingin: You say that most sources for Old Sundanese only give the Romanisation. What do you mean by "sources" here? Do you mean dictionaries, grammars, and other sources that discuss Old Sundanese, or do you mean texts from the Old Sundanese corpus are usually published in Romanised form? Pali was not originally written in the Latin script, but the editions of the Pāli Text Society mean that a lot of the Pali corpus exists in Romanised form; is that the case for Old Sundanese? This has a bearing on the issue of attestability of forms.
You wrote that “[m]uch like Old English, the Old Sundanese orthography wasn't standardized”. My understanding of Old English writing is that there is variation that reflects dialectal differences in pronunciation and inflection. Is this the case with Old Sundanese? I don't see why this would be a problem, since any Romanised system would also need to reflect such differences. We would, of course, need to choose lemmata from the range of attested forms, but that's a problem no matter the writing system, surely. 0DF (talk) 00:58, 10 April 2025 (UTC)Reply
@0DF Yes, what I meant by "sources" is that dictionaries, books, etc. generally publish Old Sundanese using the Latin script. This trend dates back to at least the 53rd edition of Tijdschrift voor Indische Taal- Land- en Volkenkunde (1913), where they used the Dutch-influenced spelling of the time. Even today, most scholarly materials follow this convention. This suggests that what we often work with are editorial spellings or diplomatic transliterations rather than reproductions of the original manuscripts.
I think I agree with your opinion on the 'unstandardized orthography' of the Sundanese system being unproblematic. But I think we should take into consideration that not only is OS less standardized than Old English, but the corpuses also had varying practices in different regions and periods, so it would be difficult to establish a uniform system if we were not to use a single system of writing.
Kawi, while recently added to Unicode, still cannot be rendered properly on some devices, and the Buda script isn't in Unicode at all (as of yet). This poses a technical problem. Given that Romanized texts are more readily available, Romanization gives a more accessible point of reference, especially for public audiences in Wiktionary. This is also how we can select the lemmata from the attested forms. Udaradingin (talk) 12:03, 29 April 2025 (UTC)Reply
@Udaradingin: It sounds like we should be using the Dutch-influenced orthography, in that case, if most texts in the corpus are published in it. 0DF (talk) 01:18, 10 May 2025 (UTC)Reply
@0DF I wasn’t saying that most Old Sundanese texts are written in Dutch-influenced (Van Ophuijsen) spelling. I cited the 1913 edition of the TITLV just to show that Latin orthography has been in use for Old Sundanese since the early 20th century. The Dutch-based spellings were shaped by the orthographic conventions of the time and were intended for a colonial-era readership. For example, the term laṅṅit was spelled as langngit. The same goes for purasani (poerasani), ja (dja), cikal (tjikal), and so on. Today, modern Indonesian spelling has shifted toward, or has at least been influenced by, the EYD system. However, like the Dutch-influenced spelling, it often merges or obscures phonemic distinctions important for understanding Old Sundanese (like ng vs. ṅ, ny vs. ñ, or e vs. ĕ). One cannot tell whether the word dunya is spelled/pronounced as "du-nya" (duña, /duɲa/) or "dun-ya" (dunya, /dunja/) until one checks the original text.
So rather than adopting a 20th-century Dutch-influenced system entirely, it makes more sense to follow a modified Latin orthography that reflects current understanding of the language, just like in Latinized Old Javanese entries. Adapt rather than adopt, I'd say. Udaradingin (talk) 19:09, 15 May 2025 (UTC)Reply
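The dunya ambiguity mentioned above can be shown with a small sketch. It assumes a toy digraph respelling (ṅ to ng, ñ to ny, ĕ to e) purely for illustration; it is not a complete or attested rule set for any published orthography, and the function names are invented here.

```python
import unicodedata

# Toy respelling table (an assumption for illustration only).
DIGRAPHS = {
    "\u1e45": "ng",  # ṅ
    "\u00f1": "ny",  # ñ
    "\u0115": "e",   # ĕ
}


def to_digraph_spelling(word: str) -> str:
    """Respell a one-letter-per-sound romanization using digraphs."""
    # Normalize to NFC so that ṅ, ñ and ĕ are single code points.
    word = unicodedata.normalize("NFC", word)
    return "".join(DIGRAPHS.get(ch, ch) for ch in word)


def collisions(words: list[str]) -> dict[str, list[str]]:
    """Group distinct source spellings that end up with the same respelling."""
    seen: dict[str, list[str]] = {}
    for word in words:
        seen.setdefault(to_digraph_spelling(word), []).append(word)
    return {spelling: group for spelling, group in seen.items() if len(group) > 1}


print(to_digraph_spelling("laṅṅit"))            # langngit
print(collisions(["duña", "dunya", "laṅṅit"]))  # {'dunya': ['duña', 'dunya']}
```

Because two different source forms land on the same output, the respelling cannot be reversed without consulting the original text, which is the argument for a one-letter-per-sound romanization.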

Romanization?

As a part of speech, Romanization seems like a weird label. The definition is: "The act or process of putting text into the Latin (Roman) alphabet, by means such as transliteration and transcription."

But the underlying words or phrases being Romanized have their own parts-of-speech in their native language, don't they? There are about 117K entries in English Wiktionary with "romanization" for part-of-speech, so I'm not suggesting any changes. I just want to understand why this category exists as a pos and not say a "form" or "form of" or "alt" section, etc, in an entry that has the actual pos for the Romanized word.

Also, is an entry like ꜣꜥy even Romanized, according to the definition? 2/3 of those characters don't look like Roman letters to me. (Btw, I think it's amazingly cool you can look up ancient hieroglyphs like ꜥꜣꜥ. What a great tool this is!) Killeroonie (talk) 04:08, 9 April 2025 (UTC)Reply

@Killeroonie: I think the operative sense when it comes to the “Romanization” POS header is rather the countable one: “An instance (a string) of text transliterated or transcribed from another alphabet into the Latin alphabet.”
The characters ꜣ and ꜥ are letters of the Latin script, according to Unicode. 0DF (talk) 01:03, 10 April 2025 (UTC)Reply

Add Kaitag code

Please add xdq, code for Kaitag, to the language list. Make Dargwa great again (talk) 08:09, 9 April 2025 (UTC)Reply

I predicted this day would come. @Make Dargwa great again, why do you need a code for Kaitag? You can handle Kaitag forms under Dargwa (code dar), labelling them as {{tlb|dar|Kaitag}}, as I did in тамбал (tambal). Vahag (talk) 09:18, 9 April 2025 (UTC)Reply
I guess I need a code because it is a language of its own with seven dialects of its own[1]. Probably other codes are needed too. Why is it a problem to add new codes? Make Dargwa great again (talk) 11:19, 9 April 2025 (UTC)Reply
To split a language we must know there will be people who will maintain the split, otherwise there will be asynchronization and bardak, with uncared-for split varieties coexisting with unsplit Dargwa.
If you want to separate Kaitag from Dargwa, then we should also split the other 16 Dargwa varieties and you should promise us that:
1) You will go through all of Category:Dargwa lemmas and assign each lemma to a newly split variety.
2) You will go through all of Dargwa translations given in the translation tables of English terms and assign each translation to a newly split variety.
3) You will add Category:Dargwa lemmas to your watchlist and will assign new lemmas added by others to newly split varieties. You will also stick around to tell newbies that Dargwa is not a single language anymore and will teach them how to assign Dargwa words to a proper variety. They will not know it because Dargwa is treated as a single language by all dictionaries published in Russia.
4) You will split the entry Dargwa аба (aba) into 16 new sections to demonstrate proof of concept.
Do you agree? Vahag (talk) 13:21, 9 April 2025 (UTC)Reply
Fekk no. Are you splitting off a code on wiktionary or marrying off a daughter? Fekking job interview here. Make Dargwa great again (talk) 14:16, 9 April 2025 (UTC)Reply
Then you don't get a code, sorry. Vahag (talk) 14:20, 9 April 2025 (UTC)Reply
As the equality commissioner of Wiktionary, I express my harmony with this decision. Fay Freak (talk) 14:43, 9 April 2025 (UTC)Reply
I checked the first 15 translations or so, all of them are into standard northern Dargwa. I don't get why it is needed to split the other 16 Dargwa varieties at once? Make Dargwa great again (talk) 18:46, 9 April 2025 (UTC)Reply
If we split Kaitag, that would create a precedent. The activists of the other Dargwa varieties would demand a code like you do.
Understand that splitting a language is like a vasectomy: it is possible and even reversible, but it is painful. I suspect if we give you a Kaitag code, you will create like 10 entries then disappear; none of our Daghestani editors have stayed around. Then an etymologist like me will be stuck with the need to check each Dargwa word I may occasionally add to the Etymology or Descendants section to see if it is a Kaitag Dargwa or non-Kaitag Dargwa to assign the proper code (most etymological sources would not mention the variety, they would simply say "Dargwa"). This burden is worth bearing only if the new code attracts a dedicated editor. I suggest you contribute under the header ==Dargwa== using the code dar and the label {{lb|dar|Kaitag}} for a couple of months to see if you are that editor. Vahag (talk) 09:27, 10 April 2025 (UTC)Reply
There are now nearly 300 Kaitag entries on French Wiktionary. [2]. Meanwhile, here... Make Dargwa great again (talk) 18:49, 3 May 2025 (UTC)Reply
@Make Dargwa great again: Ok, you passed the first test by contributing for more than one week. You're not a mayfly. If we make a xdq code, we are required to define its relationship with dar. Would it be a descendant of dar or a sister language of dar descending from Category:Proto-Dargwa language? Vahag (talk) 09:24, 4 May 2025 (UTC)Reply
A sister-language of dar. Make Dargwa great again (talk) 09:55, 4 May 2025 (UTC)Reply
Is there a difference in alphabet from literary Dargwa alphabet? Vahag (talk) 10:21, 4 May 2025 (UTC)Reply
The Soviet orthography is more or less the same, but the IPA values are a bit different. There is also a new orthography developed by Alkaitagi[3] (it is accepted [4]). You can see the comparison here [5]. Make Dargwa great again (talk) 14:27, 4 May 2025 (UTC)Reply

Same layout for pronunciation?

I think it would be nice to have one common standard for giving pronunciations. For example, take a look at these English entries: the, of, and, to, in, and compare the "stressed" caption. You can see that all five entries use a slightly different layout.

In my opinion it's more likely that a reader wants to learn all pronunciations for one variety of English than all pronunciations of a stressed form. So my suggested layout would look like this:

British English

  • RP
    • stressed form
    • unstressed form
  • Some other British accent

American English

  • GA
    • stressed form
    • unstressed form
  • Some other American accent

Some other English variety

Even if this layout isn't perfect, I feel like some kind of standardisation and common layout is important. 185.18.68.210 22:19, 9 April 2025 (UTC)Reply

Our current layout for pronunciations is here: Wiktionary:Entry_layout#Pronunciation. If you're proposing some change to that page, it will probably have to follow a vote, since changes to that policy have site-wide impacts. I'm personally in favor of some kind of standardization, but have no strong feelings on what that would be. —Justin (koavf)TCM 22:23, 9 April 2025 (UTC)Reply
I'm proposing the style mentioned above, then, and I think our current layout leads to a lot of different styles. our has yet another arrangement. 185.18.68.210 22:30, 9 April 2025 (UTC)Reply

User:49.149.102.149 vandalism

Partly just breaking stuff, and partly xenophobically attacking certain forms as "nonstandard". Please take an eyeball. 2A00:23C5:FE1C:3701:F050:4AD7:86D3:BA99 00:22, 10 April 2025 (UTC)Reply

@49.149.102.149: Malaysianising is the present participle and gerund of Malaysianise; Malaysianizing is the present participle and gerund of Malaysianize; Malaysianise and Malaysianize are two spellings of the same word. Even if you dislike the spelling Malaysianise, it should be very clear that Malaysianising is not the present participle and gerund of Malaysianizing, as you have been asserting. 0DF (talk) 01:16, 10 April 2025 (UTC)Reply
Courtesy link: Special:Contributions/49.149.102.149. Heads up that Wiktionary:Vandalism in progress exists. No comment on if this user is actually a vandal. —Justin (koavf)TCM 01:59, 10 April 2025 (UTC)Reply

Prohibit AI-generated content

It seems that User:Jöttur has prolifically generated bad Icelandic content (especially pronunciations and usage examples) using ChatGPT or a similar system. I have been trying to delete all the bad content but there's a ton of it, and a lot still left. I would like to propose a formal prohibition on using AI chat bots and LLMs to generate any sort of content for Wiktionary. Doing this would make the user subject to escalating blocks. Thoughts? Benwing2 (talk) 07:28, 10 April 2025 (UTC)Reply

  Strong support of a ban. Vininn126 (talk) 08:00, 10 April 2025 (UTC)Reply
  • Online translators like Google Translate are effectively specialised forms of LLM, and I think those are already prohibited as the basis of entries (although I can't find the policy). If there is a policy, then we can hopefully modify it fairly simply - if not, it should cover both cases (more generally, "Do not use any automated tool to generate content that you would not be capable of writing and verifying independently." - effectively rule 1 of WT:BOT). The only reasonable exception I can think of might be for grammar checkers - I mean, if I edit Wiktionary on mobile, technically I'm already using an LLM (since the autocorrect, autocomplete and spell check/grammar check functions on phones use a low-level AI), and I can imagine someone reasonably using a manually-controlled semi-automatic grammar bot on definitions and etymologies. Smurrayinchester (talk) 08:11, 10 April 2025 (UTC)Reply
  Support. Svārtava (tɕ) 08:49, 10 April 2025 (UTC)Reply
  Strong support. But this might be subject to an official vote. AG202 (talk) 14:03, 10 April 2025 (UTC)Reply
  Strong support. Anarhistička Maca (talk) 14:15, 10 April 2025 (UTC)Reply
  Support. AI may be a useful tool in some cases, but since it is very prone to hallucinating, anything it creates must be manually reviewed by a human before any of it is contributed. In particular, asking an AI something and believing it without question must never be treated as an alternative to actual proficiency, as in adding entries or examples in a language that you do not actually speak. — SURJECTION / T / C / L / 14:52, 10 April 2025 (UTC)Reply
I think the text mentioned by Smurray summarizes best the situation and preferred approach. Vininn126 (talk) 14:55, 10 April 2025 (UTC)Reply
  Support. We have always had serious problems with people who contribute prolifically in languages they don't know. AI provides a handy tool to make such people even more prolific, and all kinds of apps are intrusively and aggressively promoting their AI services without mentioning their limitations. While it's hard to conclusively prove that a pattern of bad edits is due to use of AI, we definitely need to make it clear upfront that we do not allow AI-generated content. We should also develop a help or policy page explaining what AI is, what can go wrong, and why editors should never use it here. Basically, an AI app is hard to distinguish from a human pathological liar, and neither should be trusted. Chuck Entz (talk) 15:18, 10 April 2025 (UTC)Reply
On this note I think we should also start coming down harder on people editing languages they are not competent in. You don't need to have perfect fluency, but you do need to know enough, linguistically speaking, not to leave stubs, etc. Yes, there is WT:BOLD, but if you're leaving a mess or a bunch of request templates then you might as well not make the page. Vininn126 (talk) 16:09, 10 April 2025 (UTC)Reply
I 100% agree. AG202 (talk) 20:58, 10 April 2025 (UTC)Reply
  Strong support, I have tried ChatGPT's etymological and page-building skills "for fun": it is disappointing, to say the least, if not a win for myself/ourselves in its failure. Saumache (talk) 15:24, 10 April 2025 (UTC)Reply
  Strong support. — Sgconlaw (talk) 15:44, 10 April 2025 (UTC)Reply
  Support. I prefer human-generated gibberish. Tollef Salemann (talk) 16:58, 10 April 2025 (UTC)Reply
  • How exactly will we confirm our suspicions that content is AI-generated? Or do we act on our suspicions and require the contributor to show that contributions aren't AI-generated? A ban makes me think of King Canute. DCDuring (talk) 18:17, 10 April 2025 (UTC)Reply
    • Policy for editors: AI is forbidden. Policy for admins: Poor quality bulk submissions can be bulk deleted without going through formalities. There is no need to change the blocking policy. Vox Sciurorum (talk) 18:29, 10 April 2025 (UTC)Reply
      • What makes some submissions "bulk submissions"?
      • In effect, you propose that we use an anti-AI policy to justify eliminating "formalities" to allow "bulk submissions" deemed (by whom? i.e., what judge and jury?) of "poor quality" to be summarily deleted (by whom? i.e., what executioner?). We won't even have to have a show trial. DCDuring (talk) 18:54, 10 April 2025 (UTC)
        FWIW I asked several contributors on Discord to evaluate the quality of User:Jöttur's work and posted some details about this on their talk page. I don't think it's helpful to throw around accusations that contributors are getting railroaded into "show trials" with "summary executioners" or anything like that. In reality it's quite the opposite; we have a massive problem with people who have no idea what they're doing but think they do and contribute junk across several languages, significantly degrading the quality of the dictionary. It typically takes months, sometimes years, before these bad contributors get blocked, and often their bad contributions never get cleaned up because it's a big task especially once time has elapsed. You're not the one who is cleaning up the messes so please, do your due diligence before casting aspersions. Benwing2 (talk) 19:18, 10 April 2025 (UTC)Reply
          • @User:Benwing2 I reacted to VS's proposal: arbitrary deletion at the discretion of individual editors.
          I think that the use of the label "AI-generated content" is a canard if we don't have the ability to actually detect it. It seems to me the problem is large quantities of poor-quality content from languages for which we apparently cannot muster sufficient trusted contributors to promptly review entries or translations. Maybe AI is the source, but without any specific ability to detect whether AI is in fact the source, we may just be ignoring the root problem. Do we need to filter additions of L2 sections in certain languages for which we have no active editors, so that only qualified editors can work on them or on critical parts of them (creation? etymology? pronunciation? definition?)? Are there some technical means (joined with necessary contributor workflow) to detect and limit (ban?, quarantine?) the flow of entries and translations in certain languages for which we have no qualified contributors or reviewers? Maybe our cottage-industry approach to entry review needs technical, not rhetorical, reinforcement. DCDuring (talk) 23:15, 10 April 2025 (UTC)
        I think that bad entries made with good intentions can sometimes be improved and become good, but entries made by lazy copypaste from Google Translate with AI made code are not possible to improve, neither are they made with good intentions. I can’t say I make perfect edits all the time, but I try to at least make sure that they don’t contain false information, and use lots of time to verify stuff. Using AI is the opposite, when the form is set over the content and honest work. Tollef Salemann (talk) 20:11, 10 April 2025 (UTC)Reply
  Strong oppose. Has it not been established that trying to "detect AI generated content" is a fool's endeavor? Even if this were somehow possible, which it isn't, there is no inherent issue in using LLMs for editing help. An obvious example that comes to mind is editors whose first language isn't English using ChatGPT or similar for proofreading. If a user contributes AI slop, it should be deleted as slop and the user dealt with accordingly. Focusing on the AI part is completely unhelpful. 🌙🐇 ⠀talk⠀ ⠀contribs⠀ 22:18, 10 April 2025 (UTC)Reply
I think you may be missing the gist here. The idea is not to prevent people from using LLM's for help in proofreading or verifying the correctness of generated content. I used to do that several years ago using Google Translate when the Russian quality was so-so at best, essentially as a "second opinion" to make sure what I was doing wasn't crazy. The issue is people who mass-generate content using ChatGPT or similar and don't correct its mistakes, like the above-mentioned user. Ultimately yes there is an arms race between detecting AI-generated content and generating content to fool the detectors (that was the explicit idea of GAN's developed by Goodfellow et al several years ago ... I work in the field in fact so I'm familiar with the issues). But much current content is obvious AI slop and having a policy to explicitly prohibit such slop would make it a lot easier for admins to block editors who enter such slop. As it is, AFAIK it's not explicitly prohibited so it can be difficult to justify a lengthy block as a first offense, which just makes life that much more difficult for admins who have to repeatedly deal with problematic users who wait out their block and then continue the same behavior. Benwing2 (talk) 22:32, 10 April 2025 (UTC)Reply
"obvious AI slop" is not obvious though, as I am sure you know. And here it's not like we have something akin to WT:CheckUser that would enable admin to insist someone used AI, against their word. Wouldn't it be a much more constructive policy direction if we prohibited mass-contributed, incorrect, low-effort content, AI-generated or not? AI could be mentioned, sure, but as formal policy I fear it just doesn't provide any meaningful coverage. 🌙🐇 ⠀talk⠀ ⠀contribs⠀ 22:40, 10 April 2025 (UTC)Reply
Sure, but I am afraid such a policy against something as nebulous as "mass-contributed, incorrect, low-effort content" will prove impossible to enforce or even define. Benwing2 (talk) 23:04, 10 April 2025 (UTC)Reply
@User:Benwing2 What about capping the flow of certain contributions from editors without established track record in the language of the contribution? Such capped contributions could be quarantined pending review by a competent reviewer, should we ever get one for the language involved. DCDuring (talk) 23:20, 10 April 2025 (UTC)Reply
I still fail to see how a blanket ban against AI is any more enforceable. 🌙🐇 ⠀talk⠀ ⠀contribs⠀ 02:49, 11 April 2025 (UTC)Reply
@Lunabunn: There's more to a ban than enforceability. Sure, there are people who will use it (and get away with using it) no matter what we say or do, but there are some people who wouldn't use it if they saw that it wasn't allowed. Chuck Entz (talk) 03:51, 11 April 2025 (UTC)
And if someone were to use AI properly? DCDuring (talk) 13:25, 11 April 2025 (UTC)Reply
On @Surjection's suggestion above:
anything it creates must be manually reviewed by a human before any of it is contributed
And what do we do if a problematic user claims that they did review it? Do we punish them anyway, failing to assume good faith? What if they claim they didn't use AI at all? Do we run GPTZero on their edits? 🌙🐇 ⠀talk⠀ ⠀contribs⠀ 22:37, 10 April 2025 (UTC)Reply
I think the primary purpose of this policy is to educate editors and clearly explain to them that the unverified AI-generated content is not a good contribution. Some of the contributors acting in good faith may genuinely believe that they are doing something useful. A month ago, somebody behind an IP tried to add a bunch of Belarusian entries, which looked like a bot automatically submitting AI-generated content. I spooked them via leaving messages at their talk page and they stopped doing that (unfortunately without responding to me and without providing any explanations). But if this was somebody really malicious, then they could have continued doing damage. --Ssvb (talk) 05:00, 11 April 2025 (UTC)Reply
Now that I dissect the matter after @Lunabunn’s distinction, running unapproved bots is disallowed. The bot policy page, which appears more relevant to this matter than the offence of copying from online translators, is, surprisingly, largely unedited and stable since 2006. Help in proofreading or verifying the correctness of generated content by whichever technical means does not constitute the act of content generation per se and hence has not been disapproved of even if it be subject to automation. I figure it is more intuitive to me to assume AI in terms of complicity, disallowed as a principal but not accessory. Ssvb is right that people are innocent enough to need explication. Fay Freak (talk) 18:48, 11 April 2025 (UTC)Reply
  Strong support BABRtalk 23:05, 10 April 2025 (UTC)Reply
  Strong support Just like we forbid other kinds of automatic content generation. As long as someone checks it and ensures the content quality is there, I approve, but then it just becomes a way of typing out the content rather than mass automated content, so that's okay IMO. Kiril kovachev (talk・contribs) 00:32, 11 April 2025 (UTC)
  Support with caveats discussed above. MuDavid 栘𩿠 (talk) 01:37, 11 April 2025 (UTC)Reply
  Strong support per Surjection. – wpi (talk) 11:36, 11 April 2025 (UTC)Reply
  • I asked AI for its views...

That's a great and complex question; there are strong arguments on both sides. Here's a breakdown of the pros and cons to help frame the discussion:

✅ Arguments for allowing AI-generated content on Wiktionary:
  • Speed & Scale: AI can generate large volumes of content quickly, especially for underdeveloped entries or less commonly spoken languages.
  • Consistency: AI can follow structured templates and formatting rules, which may improve consistency across entries.
  • Augmenting, not replacing: AI-generated entries can be used as a base for human editors to refine, speeding up the editing process without replacing human judgment.
  • Accessibility & Inclusion: AI could help document languages or dialects with fewer contributors, making Wiktionary more globally representative.

❌ Arguments against allowing AI-generated content:
  • Accuracy & Reliability: AI might hallucinate or pull from inaccurate sources, especially with obscure or nuanced definitions.
  • Loss of Expertise: Human contributors bring contextual knowledge, cultural nuance, and etymological accuracy that AI might miss.
  • Vandalism & Abuse Risk: AI-generated content could be used to flood the platform with low-quality or even malicious entries if not properly controlled.
  • Community Erosion: Overreliance on AI might discourage human contributors, weakening the collaborative spirit of Wiktionary.

🤔 Possible Middle Ground:
  • AI-assisted editing only: AI suggestions require human review before publishing.
  • Flagged content: Mark AI-generated entries for transparency.
  • Pilot programs: Test AI contributions in specific languages or entry types.

Vilipender (talk) 09:28, 15 April 2025 (UTC)Reply

User:Catonif for interface administrator

Hello, I would like to request the interface admin rights to be able to edit MediaWiki:Gadget-LanguagesAndScripts.css, as until now I had to bother other interface admins with requests. I would only add fonts or maybe do minor adjustment for scripts and languages of minor importance. Catonif (talk) 19:39, 10 April 2025 (UTC)Reply

  Done as per Wiktionary:Interface administrators. — SURJECTION / T / C / L / 15:14, 11 April 2025 (UTC)Reply
Thank you! Catonif (talk) 15:38, 11 April 2025 (UTC)Reply

Wikidata and Sister Projects: An online community event

(Apologies for posting in English)

Hello everyone, I am excited to share news of an upcoming online event called Wikidata and Sister Projects, celebrating the different ways Wikidata can be used to support or enhance another Wikimedia project. The event takes place over 4 days, from May 29 to June 1, 2025.

We would like to invite speakers to present at this community event, to hear success stories, challenges, showcase tools or projects you may be working on, where Wikidata has been involved in Wikipedia, Commons, WikiSource and all other WM projects.

If you are interested in attending, please register here. If you would like to speak at the event, please fill out this Session Proposal template on the event talk page, where you can also ask any questions you may have.

I hope to see you at the event, in the audience or as a speaker, - MediaWiki message delivery (talk) 09:18, 11 April 2025 (UTC)Reply

Wiktionary search engine redirects to Wikipedia

I apologize if this is not the correct place to voice this issue. I am a casual user of Wiktionary, normally using it to check the etymology of words. Normally, I just go to the hub page of Wiktionary and use the search engine there to look for a particular word. Before, the search engine simply directed me to the Wiktionary entry of the searched word; however, since a few days ago, it has started redirecting me to Wikipedia instead. The glitch happens both on my PC and on my smartphone. Does this happen to everyone, or have I somehow botched the settings of the engine?

PS The search engine of the inner Wiktionary (Main Page) still works as normal. 2A02:6B6F:E3B5:2C00:20F7:DEEF:D366:F20E 10:50, 11 April 2025 (UTC)Reply

This has been reported numerous times and there is a phabricator ticket. https://phabricator.wikimedia.org/T391297 Vininn126 (talk) 10:53, 11 April 2025 (UTC)Reply

Formally allowing removal of Babel boxes by other users if proficiency is contradicted

Another User:Jöttur-related issue. Benwing deleted Jöttur's Babel box after I suggested the idea on Discord after Jöttur was blocked due to a consensus that he was persistently adding incorrect Icelandic information despite claiming native Icelandic fluency in his Babel box. In consequence, I would also like to add the following to {{Babel}}'s documentation and Wiktionary:Babel in case another situation like that happens in the future:

Babel boxes may be removed by other users if it is clear that the user's claimed language proficiency levels are unsubstantiated.

Ceso femmuin mbolgaig mbung, mellohi! (投稿) 05:46, 12 April 2025 (UTC)Reply

  Strong support, makes sense. Svārtava (tɕ) 07:04, 12 April 2025 (UTC)Reply
  Support – wpi (talk) 07:34, 12 April 2025 (UTC)
  Support Vininn126 (talk) 07:50, 12 April 2025 (UTC)Reply
  Support Saumache (talk) 08:51, 12 April 2025 (UTC)Reply
  Strong support, and a good call to formalise. 🌙🐇 ⠀talk⠀ ⠀contribs⠀ 09:53, 12 April 2025 (UTC)Reply
  Support Fay Freak (talk) 12:51, 12 April 2025 (UTC)Reply
  Support Benwing2 (talk) 09:56, 15 April 2025 (UTC)Reply
Claiming native Icelandic by using AI is a good reason to doubt such stuff, but Babel is pretty much subjective otherwise. Doubtful use of Babel is not really common, I remember just two cases in the last year, and they were very obvious and were soon stopped, as the contributions made by the users were so bad, so they were banned. Tollef Salemann (talk) 21:41, 12 April 2025 (UTC)Reply
We have had another contributor to Icelandic, Numberguy6, greatly overstating his knowledge of the language (the correct Babel assessment would be "is-1" instead of his claimed "is-4") adding some significant inaccuracies and mass copyright violations that will never be fixed due to his high volume of edits. But, like with Jöttur, removing the Babel box would not have changed anything, as these over-eager editors rarely listen to pleas for them to stop. The only thing that might help would be for it to be clear that these users should be reported somewhere for immediate admin intervention. 130.208.182.103 08:16, 13 April 2025 (UTC)Reply
Please don't take this personally, but my take on this is that the Icelandic language just needs competent Wiktionary editors who are willing to contribute on a regular basis. You are hiding behind an IP and have contributed very little during all these years since 2021. Of course, it isn't like you have any obligation to contribute, but I'm not surprised that imposters are filling the void.
I also don't know what to feel about the speedy lynching of Jöttur, which was based on your report and the testimony of "another Discord user", who was hopefully really a different person rather than your account there. I wonder, wouldn't it have been a good idea to ask for expert opinion of some active Icelandic Wikipedia contributors when resolving this dispute, such as perhaps @TKSnaevarr or the others? --Ssvb (talk) 10:31, 13 April 2025 (UTC)Reply
I am the above IP address. I gave up on contributing as there was no end in sight of bad Icelandic contributions to review. I tried on multiple occasions to get Numberguy6 to clean up after himself but to no avail. I do not have Discord. TKSnaevarr is welcome to review Jöttur's contributions; even though most have been deleted Jöttur's userpage is representative of his competence in Icelandic. Hvergi (talk) 11:40, 13 April 2025 (UTC)Reply
@Ssvb Nearly everything contributed by Jöttur was completely wrong, and it was repeatedly called out by others trying to clean it up. As just one example, he added the Afrikaans section on ander, and an IP later redid it with the comment
Correcting + expanding Afr. adj. inflections. They were added by someone unfamiliar with Afr. grammar, who thought all attributive forms take -e. This is quite wrong. An oversimplified rule of thumb is: (a) Polysyllables take -e unless ending in -el, -er. (b) Monosyllables take -e only when ending in -f, -d, -s, g.
This is typical of his contributions. As for the "another Discord user" possibly being Hvergi's Discord account, please assume good faith on my and Hvergi's part. In fact the user was @Anarhistička Maca, who gave me her permission (on Discord) to quote her response, and is an active Icelandic Wiktionary contributor (you specifically said "Icelandic Wikipedia contributors"; if this is intentional I don't know why it matters whether it's Wikipedia or Wiktionary). Also, depending on how bad User:Numberguy6's contributions are, I am willing to nuke them as well as I'm really out of patience with poor-quality editors who lie about their competence in a language and contribute slop. Benwing2 (talk) 09:53, 15 April 2025 (UTC)Reply
@Benwing2 Please assume good faith on my part and try to put yourself in my shoes. I posted my comment, after analyzing the information that was publicly available. And from where I stand, nothing in User_talk:Jöttur indicated that he was "repeatedly called out by others" for the issues related to the Icelandic language skills. I understand that some other communication channels could have been used for that, but yet nobody bothered to bring this topic to the user's talk page until just a few days ago. And this is strange, considering that the Jöttur's account is not exactly new. Is "nearly everything contributed by Jöttur was completely wrong" a hyperbole or somebody's objective assessment? Regarding the Icelandic language dispute that unfolded, and without having any other information, I see that you are relying on two expert opinions. One of these experts was labelled by you as "an actual Icelandic speaker" without disclosing their identity, but now upon my request, you have clarified that it was @Anarhistička Maca with "is-2" self-assessed Icelandic language skill in her Babel box. Another expert opinion came from an IP user, who later turned out to be @Hvergi, and whose Icelandic language proficiency is currently still ambiguous due to a missing Babel box. May I kindly ask Hvergi to make a statement about their self-assessed Icelandic language proficiency? You may assume that I'm not assuming good faith, but I'm merely asking for more transparency in handling this matter. And I'm surprised that the others haven't pointed out the same.
I mentioned Wikipedia in my previous comment, because it doesn't seem to be perfectly clear whether Wiktionary even has sufficient in-house Icelandic language expertise at this right moment to resolve the Icelandic language issues on its own. So active Wikipedia contributors could be possibly consulted as independent experts, of course if they don't mind. --Ssvb (talk) 10:19, 17 April 2025 (UTC)Reply
Among other things there were several pings to Jöttur made in the edit messages of commits trying to clean up his bad contributions, which he ignored, just as he ignored my and others' messages to him. The assessment also comes from me; although I am not an Icelandic speaker, I have enough linguistic background to have written the Icelandic noun and adjective declension modules (and consider that Icelandic declension is extremely complex), and I have been around on Wiktionary long enough that I can clearly identify when contributions are full of mistakes of various sorts. Anarhistička Maca also has deep linguistic knowledge of Icelandic, which I can attest based on personal conversations with her; her self assessment in Babel is probably based on her speaking ability, not based on her linguistic knowledge of Icelandic. I can spell out in gory detail all the errors but I don't see the point; ultimately either you trust my judgment or you don't. I welcome Wikipedia contributors with Icelandic knowledge to check Jöttur's contributions, but keep in mind they may not know Wiktionary's standards and rules, which are very different in many ways from Wikipedia. Benwing2 (talk) 21:32, 17 April 2025 (UTC)Reply
I don't see why commonality should necessarily be a factor. Vininn126 (talk) 08:27, 13 April 2025 (UTC)Reply
  Support Numberguy6 (talk) 17:21, 15 April 2025 (UTC) Feel free to delete/downgrade Icelandic (and all the other languages) from my box. I'm not lying, but rather misunderstanding: I've always assumed that Babel is equivalent to how well one speaks a language (and if it's not, then someone should put that on the page), and I can speak Icelandic fluently (ref). Of course, I wasn't this good when I started contributing, but I've improved a lot over time, which is why I keep thinking "I made mistakes before, but I won't make them anymore". I've also contributed in many other languages (which I haven't tried to become fluent in), and been blocked for a month over that. The problem is that it's just too hard to know how good I am at contributing (which is a problem I've been on the other side of countless times on Wikipedia: "I know you learned how to write papers in school, but this is different."). Allowing others to edit one's Babel would be a great first step towards fixing this problem. As for next steps, I'm thinking of a process similar to AfC on Wikipedia or the Test Wikidata, where new users (and existing users learning new languages) can write entries and then have them reviewed by experienced users; if they pass the review, then they can contribute.
@Numberguy6 FYI, there's the Wiktionary:Babel page, which explains what each level means and "fluent" is supposed to be level 3. Being able to speak fluently without feeling that the language skill restricts your ability to express yourself doesn't mean that what you say is always grammatically correct. And there's a foreign accent too. My English is very likely not worse than your Icelandic. But the "near native" level 4 likely requires being truly indistinguishable from a native speaker. Which might be possible if, for example, somebody relocated to a new country at a very young age. And there are bilingual countries too, where everything is much more complicated. --Ssvb (talk) 18:47, 17 April 2025 (UTC)Reply
I just realized that Wiktionary's Babel system only goes up to 4, while Wikipedia's goes up to 5. Since I've always interpreted 5 (not 4) as "indistinguishable from native", I've been setting my own Icelandic level at 4. Numberguy6 (talk) 19:26, 17 April 2025 (UTC)Reply
This doesn't seem to be documented yet, but Wiktionary's Babel system actually goes up to level 5. You can see this, for example, by using edit preview. Level 5 is defined as "professional", and I assume this level is reserved for individuals with exceptional language skills, such as professional linguists, professional translators, and authors of notable literary works – people whose proficiency far exceeds that of the average native speaker. --Ssvb (talk) 02:45, 18 April 2025 (UTC)Reply
This is correct; 5 is "professional" level which means you work with the language professionally. @Numberguy6 please set your Icelandic competency to 2 or 3 as it's clear you don't have near-native proficiency. Benwing2 (talk) 05:28, 18 April 2025 (UTC)Reply
To be frank, I think we need a way to measure one's ability to add information, i.e. whether they are aware of the various linguistic matters that are important when making an entry, not just fluency and ability to speak. I know many fluent L2 speakers of, say, English, who don't have much philological or linguistic knowledge. Vininn126 (talk) 21:04, 17 April 2025 (UTC)
I therefore weigh in analysing written texts and knowing the whole grammar and typical manners by heart more than listening comprehension, speaking ability, and writing skills, which would otherwise have to be put into the basket together with reading comprehension to formulate decorations in general society. Then again even natives cannot plead their own language like most court interpreter, so what does near-native (this is a WT:COALMINE, like non-native) even mean? For a scientist it counts; for a kind of clerk supporting a business—there was a profession of foreign language secretary popular once—the writing may be where the money is, and then our perspective is slanted to business writing to the disadvantage of academic writing and conversational writing, while strikingly different “skill-sets” are sought in a call center—but we don’t count the scammers!—, and for some general reason, not merely technicalities, e-mail and phone support is done in separate corporate departments. There is lots of material to argue, either way. Fay Freak (talk) 21:51, 17 April 2025 (UTC)Reply

Sense headings for Thesaurus namespace

In many pages in the Thesaurus namespace, editors decide to mark groups of words with a different sense (say, all vulgar vs. all neutral) with a heading. There is currently no standard way to implement these, though most use a pseudoheading created using italics or bold formatting. I wonder, then, how they should be formatted: as a pseudoheading, a level 5 heading, or something else? Juwan (talk) 14:52, 12 April 2025 (UTC)

User:Wpi for extended mover

I would like to request extended mover rights for moving several entries from IPA to something more proper. There's only 10 + 1 entries but I would rather not bother others to clean up the mess I had originally created. – wpi (talk) 16:31, 12 April 2025 (UTC)Reply

Moving archaic, obsolete, rare and uncommon meanings to the end

I guess that when someone uses Wiktionary they are probably more likely to want to see modern popular meaning first rather than archaic or rare ones. In that case, is it possible to automatically move all the obsolete and rare meanings (in all entries) to the end of the list?

So for example the first meaning in noun ghost will become the last one 185.18.68.210 21:32, 12 April 2025 (UTC)Reply

We don’t do this automatically. Against the likelihood of what someone wants to see, there are issues: we don’t always know this, or the frequencies, and it is unclear how the diachronic perspective should be weighted against synchronic views (what to do with a once common term now only used in a marginal specialist sense?). In the end we give senses some logical order to have better presentation, notwithstanding frequencies. Sorting by likelihood is specious, but it sometimes happens in place of complete arbitrariness. Fay Freak (talk) 00:27, 13 April 2025 (UTC)
I prefer chronological order when the evolution of the senses is clear. In complicated definitions I have tried to group related senses. For ghost the senses "disembodied soul" and "human soul" are closely related and I would group them together, likely as subsenses, if I cared enough to edit the page. Vox Sciurorum (talk) 13:09, 13 April 2025 (UTC)Reply
I support putting archaic senses last. It is comparatively less useful to tell the reader "this is what the word meant 500 years ago" vs "this is what the word means now". Especially when there are long lists of definitions and it's not easy to immediately single out which one is still relevant. — BABRtalk 18:34, 13 April 2025 (UTC)Reply
While we generally put archaic and obsolete senses after the current ones, I don't think this should be a strict rule because sometimes putting the archaic and obsolete senses first indicates how the meaning has evolved over time, especially when earlier etymons of a word have a certain meaning, and the current meaning of the word seems different and unconnected. — Sgconlaw (talk) 18:45, 13 April 2025 (UTC)Reply
I reverted the change a user made to ghost#Noun. I believe the sense that is once more the first has somewhat broader use than the previous labels ("dated, obsolete") wrongly (IMHO) indicated. I certainly agree that it is not the most frequent use, but I also doubt that many other than the most hasty users would be confused because it appeared before the more frequent uses. Frankly, English Wiktionary is not really a suitable online dictionary for such a user. I believe we have already discouraged such users (and probably some "normal" users, too) by the complexity of our entries, the bulk of etymologies and pronunciations appearing before definitions, etc. DCDuring (talk) 18:57, 13 April 2025 (UTC)Reply
I don't agree with that revert, especially since there was an active, ongoing discussion about it when you did so. Regardless, you have to remember that "power users", like yourself, are not representative of the average reader. Most "power users" are on desktop, but most of our readers are on mobile, for example. I don't see how it is more helpful to the reader to see dated or historical usages of a term before the most common modern usage (that they are more likely to need).
However, I agree with Sgconlaw that we probably shouldn't make a hard rule about how senses should be listed. — BABRtalk 19:57, 13 April 2025 (UTC)Reply
I don't know that it is true that a typical reader is "more likely to want to see modern popular meaning first". They might well be looking up a word with an archaic meaning because they are reading something old, and the use of the word in the old thing does not correspond with modern usage. bd2412 T 18:59, 13 April 2025 (UTC)Reply
This is a primordial debate that has been waged over the decades from the beginnings of Wiktionary. Each has its own adherents and each has good arguments behind it, so neither has prevailed. The difference is basically between having the arrangement tell a story or show the logic behind the development of the sense, on the one hand, or having the arrangement help the reader find the things that they're most likely to want to find.
The problem is that the entries are often far too complex to reduce to an algorithm. For one thing, we have things separated by etymologies. Since Wiktionary is organized by spelling, we have to deal with wound, the past tense of wind, and wound, an injury (with a verb that comes from it). Likewise, wind has the present tense of wound and the movement of air (again with a verb that comes from it). Having the most common sense of each right next to each other would be confusing, so we would have to settle for arranging the senses within the etymologies. Even there, the senses within an etymology have subsenses. The extra verbiage needed to provide the information the reader would get from the sense/subsense arrangement would add to the clutter in our already quite cluttered entries for common terms.
In the end, we can't rearrange things to completely fit either philosophy, and we're likely to make a mess of things if we try. Chuck Entz (talk) 20:56, 13 April 2025 (UTC)
Not to mention less-common subsenses in highly polysemic words. In principle we could try to selectively hide definitions based on subsense status and label, but that would be very difficult, possibly impossible (eg, subsenses that don't have an explicit substitutable supersense definition.). DCDuring (talk) 18:25, 15 April 2025 (UTC)Reply

Vote now on the revised UCoC Enforcement Guidelines and U4C Charter


The voting period for the revisions to the Universal Code of Conduct Enforcement Guidelines ("UCoC EG") and the UCoC's Coordinating Committee Charter is open now through the end of 1 May (UTC) (find in your time zone). Read the information on how to participate and read over the proposal before voting on the UCoC page on Meta-wiki.

The Universal Code of Conduct Coordinating Committee (U4C) is a global group dedicated to providing an equitable and consistent implementation of the UCoC. This annual review of the EG and Charter was planned and implemented by the U4C. Further information will be provided in the coming months about the review of the UCoC itself. For more information and the responsibilities of the U4C, you may review the U4C Charter.

Please share this message with members of your community so they can participate as well.

In cooperation with the U4C -- Keegan (WMF) (talk) 00:35, 17 April 2025 (UTC)Reply

Should poss=1 and pred=1 automatically be turned on for Turkish inflection?


Sometimes I look up an inflected word and no results show because nobody turned on poss=1, so I turn it on myself. Zbutie3.14 (talk) 21:03, 17 April 2025 (UTC)

Font legibility


I recently wasted a minute or so because I could not distinguish bumbag (bumbag) from burnbag (burnbag). My display is large and Windows scaling is at 150%. Further scaling helps, but surprisingly little. I use Vector legacy, but didn't get better results on other skins that were otherwise tolerable. I don't recall other character pairs that cause a problem.

Is there a way to select a better ("more visually accessible") font, preferably not a monospace one, or better kerning in personal CSS or JS? What is the appropriate place to whine about such an "accessibility" issue? DCDuring (talk) 12:10, 18 April 2025 (UTC)Reply

Why not just choose any font you want using either personal CSS, as you say, or browser settings? I cannot be bothered to check other themes but at least as of Vector 2022 English text just uses the default browser sans-serif font. (Which is good, as I would be very much against Wiktionary shipping its own English font.)
A good place to look for fonts that are easy to use with Wiktionary would be https://fonts.google.com/. 🌙🐇 ⠀talk⠀ ⠀contribs⠀ 00:44, 19 April 2025 (UTC)Reply
@DCDuring yes, Lunabunn is right, Wiktionary (and Wikipedia, etc) has always used your default browser font for body text. Or you can do the override in your personal CSS:
body { font-family: "Times New Roman"; }
This, that and the other (talk) 09:35, 19 April 2025 (UTC)Reply
Thanks for the help. I'll have to keep looking for a font. Times New Roman doesn't solve my problem. DCDuring (talk) 14:45, 19 April 2025 (UTC)Reply
Firefox provides for adjustment of character spacing, but I'll look for a font that addresses the problem of the specific pair r,n vs. m. DCDuring (talk) 14:51, 19 April 2025 (UTC)Reply
Lucida Sans Unicode is an improvement. Good enough for now. Thanks for getting me pointed in the right direction. DCDuring (talk) 16:18, 22 April 2025 (UTC)Reply
Windows has good font-choice features, including a good number of fonts. The Braille Institute has "Atkinson Hyperlegible Next" which doesn't fully handle my problem, but may be better overall. DCDuring (talk) 17:14, 22 April 2025 (UTC)Reply
Duplicate thread to Wiktionary:Grease_pit/2025/April#Font_legibility_problem. —Justin (koavf)TCM 01:10, 19 April 2025 (UTC)Reply
Sorry about the dupe. DCDuring (talk) 14:40, 19 April 2025 (UTC)Reply

Pinyin-derived English Language Terms: Approximate Origin Dates


Many Pinyin-derived English Language Terms originated in the "Mid/Late 20th c.", which is what I plan to put in the etymology section of any Pinyin derived word if:
the word appears in "Shabad, Theodore (1972) “Index”, in China's Changing Map‎, New York: Frederick A. Praeger, page 345".
My rationale is that the initial uses or mentions of many of the words in the Index are going to be somewhat obscure, and I might say "Late 20th c.", but if the words were known of in that 1972 Index, they could easily have been used at some point in the 1960s or even late 1950s, see especially Citations:Beijing, which clearly existed in 1958 (though I have found no evidence of it in 1957). However, as for words that do not appear in that index, I do not assume they exist at all before 1979; they are an open question to me whether they would have happened in the 1960s or early 1970s. I will leave those without any dates for the time being (unless they have a clear later date, like Xiong'an.) I will try this dating scheme, and build on the developments that grow from it; please let me know if you have any insights or comments. EDIT: No, I think I'll do something like "circa late 20th c." c. Late 20th c., for all these so I don't create an artificial distinction between words in and not in that Index, but I write circa because technically some words have a real chance to exist between 1958 and 1967, albeit hyper rare. Unless I have some other more specific info, and if I know the word existed in the 20th c. and could theoretically have been created in 1958 though probably wasn't well known until 1979 and there's no evidence of it between 1958 and 1967 (counterexample, cf. Guangzhou), I'll use this. I've done about 15 test cases just now--, see if you have any objections: Zhongsha, Xisha, Ritu, Xian, Sanmenxia, Atushi, Ningxia, Quanzhou, Shanxi, Wuxi, Jixi, Zhangzhou, Shashi, Wulumuqi, Haerbin. I'm acknowledging the possibility of very early period usage or mentions, but I'm not forcing it when it actually may not have happened. --Geographyinitiative (talk) 10:54, 20 April 2025 (UTC)Reply

We have a problem


You see, it doesn't make sense to call Old English a different language. Languages are like people, they don't become different languages. Even though it's drastically different, calling Old English a different language is like calling you when you're 15 a different person from when you're 2. Same thing with Middle English. — This unsigned comment was added by 2603:9000:e102:e587:4443:782f:4e79:f9f5 (talk).

There is no single determiner between one "language" and another "language" or between various lects within a language (see chronolect, dialect, topolect, etc.). One common and handy rule of thumb is if the two are mutually intelligible. If you personally time traveled 900 years back to the days of the Angles and Saxons in England shortly after the Norman Invasion and you said "Hello, I am from the future" in your current tongue, they would have no idea what you're talking about and you couldn't understand anything they said as well. Take a look at a copy of Beowulf (written 1,000 years ago in Old English) or Chaucer's Canterbury Tales (written 600 years ago in Middle English) and tell me if it makes sense to you as a modern English speaker. The latter will have several reasonably similar passages and you can stumble thru a lot of it. The former will be virtually like Greek. It is common and reasonable to call the "Old" versions of a language something that is either so different from the current one as to be a separate language or a clearly demarcated chronolect that a contemporary speaker could not understand or could only understand a little with great difficulty. See also, e.g., Vulgar Latin leading to Old Spanish to our current Spanish. —Justin (koavf)TCM 23:41, 20 April 2025 (UTC)Reply
By your standard, French, Spanish, Italian and Romanian are all just Latin. For that matter, all the language families would consist of a single language each- are we communicating in "Indo-European" or "English"? England has been a single nation located in the same place under basically the same name since the latter part of the Old English period, and even invasions of Old Norse and Old French speakers didn't change that. That makes it easy to call the language(s) spoken there by the same name, which leads people like you to assume that they're inherently the same thing- in some ways they are, and in others they aren't. Of course, there's room for debate as to whether it's better to treat Old, Middle and Modern English (not to mention Scots) as one or multiple languages- but not because it's impossible for them to be anything but one language. Chuck Entz (talk) 01:17, 21 April 2025 (UTC)Reply
Precisely. The fact that they all have the word "English" in the name does not imply we have to treat them as the same language. Theknightwho (talk) 01:28, 21 April 2025 (UTC)Reply
So, English is three languages? 2603:9000:E102:E587:4443:782F:4E79:F9F5 13:00, 21 April 2025 (UTC)Reply
Think about it like this: would you be requesting this merge if the name we used for Old English was "Anglo-Saxon" instead? Theknightwho (talk) 13:13, 21 April 2025 (UTC)Reply
No, but I think we should leave it. 2603:9000:8100:B539:1C1:3530:B9F5:F160 19:53, 21 April 2025 (UTC)Reply
It should be kept. 2603:9000:8100:B539:1C1:3530:B9F5:F160 19:53, 21 April 2025 (UTC)Reply
I couldn't have said it better myself. Actually, what the original poster said should be extended as far back as possible, for the proposition to reach its logical consistency, that is, to Proto-Indo-European. It doesn't really make sense to treat all its dialects as separate "languages". We should put all the different deviating and innovative senses a pIE word has developed under the pIE page, and, if needed, add appropriate labels for more recent pronunciations, appended with a specification of when and where such and such a word was pronounced this or that way. That would be an endeavor worthy of a dictionary aspiring to be called "etymological". Make Dargwa great again (talk) 20:14, 24 April 2025 (UTC)

Request to become interface administrator - User:Theknightwho


Hi - as per Wiktionary:Interface administrators, could I please be added as an interface administrator? This would mainly be to deal with languages and scripts, as well as scripts like MediaWiki:UpdateLanguageNameAndCode.js. What prompted this request is the fact that I am putting together a series of modules for keeping our Unicode data up to date, which would work by extracting the data from Unicode text files saved in raw modules (e.g. Module:Unicode data/raw/DerivedCombiningClass.txt), and constructing it into a suitable format that can be accessed by other modules. This isn't possible without JavaScript. Theknightwho (talk) 13:11, 21 April 2025 (UTC)Reply

Done. Benwing2 (talk) 21:55, 21 April 2025 (UTC)Reply
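For readers unfamiliar with the raw Unicode data files mentioned in the request, the following is a minimal sketch of the extraction step, assuming the usual Unicode Character Database layout of `range ; value # comment`. It is written in Python purely for illustration; the modules described above would be Lua driven by JavaScript, and the function names `parse_derived_property` and `lookup` are hypothetical. The sample lines are abridged.

```python
def parse_derived_property(text):
    """Parse 'XXXX..YYYY ; value # comment' lines into (start, end, value) tuples."""
    ranges = []
    for line in text.splitlines():
        # Everything after '#' is a comment in these data files.
        line = line.split("#", 1)[0].strip()
        if not line:
            continue
        code_field, value = (part.strip() for part in line.split(";", 1))
        if ".." in code_field:
            start, end = code_field.split("..")
        else:
            # A single code point rather than a range.
            start = end = code_field
        ranges.append((int(start, 16), int(end, 16), value))
    return ranges


def lookup(ranges, codepoint, default="0"):
    """Return the value for a code point, or the default if no range covers it."""
    for start, end, value in ranges:
        if start <= codepoint <= end:
            return value
    return default


# Abridged, illustrative lines in the style of DerivedCombiningClass.txt.
SAMPLE = """\
# DerivedCombiningClass (abridged sample)
0334..0338    ; 1 # Mn   [5] COMBINING TILDE OVERLAY..COMBINING LONG SOLIDUS OVERLAY
094D          ; 9 # Mn       DEVANAGARI SIGN VIRAMA
0300..0314    ; 230 # Mn  [21] COMBINING GRAVE ACCENT..COMBINING INVERTED COMMA ABOVE
"""

if __name__ == "__main__":
    table = parse_derived_property(SAMPLE)
    print(lookup(table, 0x0301))  # combining acute accent falls in 0300..0314
```

A linear scan is enough for a sketch; a real data module would more likely store sorted ranges and use a binary search.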

Defaulting {{ux}} to automatic switching


(pinging @Vininn126, Lunabunn from a Discord discussion a few weeks ago):

The current setup of usexes has three templates: {{ux}}, for multiline; {{uxi}}, for inline; and {{uxa}}, which automatically switches between the two. IMO this is too liable to be poorly used. Generally uxa is best for most situations and the default behaviour of ux should reflect that; we should not have 150 character inline usexes nor should we have 10 character multiline usexes.

I think it's best, then, that we switch ux to have the behaviour of uxa by default and retire uxa and uxi. If, for some reason, the formatting needs to be manually overridden, then we can add parameters to ux to do that.

On a technical level, as well, this is relatively trivial to do: change ux to use the same code as uxa, and run a bot job to switch calls of uxa and uxi to use ux. Saph (talk) 13:53, 21 April 2025 (UTC)Reply

  Support — @Saph: I was not previously aware of {{uxa}}, but I note that {{ux}} already has an |inline= parameter which, if the template's documentation is accurate, can be used to switch from the template's default behaviour (always-multiline) to {{uxa}}-style automatic switching or to the always-inline presentation of {{uxi}}, depending on the argument supplied. IMO, we should make {{ux}} act like {{uxa}} by default, make |inline= Boolean for switching to always-inline, and have |multiline= (or something similarly intuitive) as a Boolean parameter for switching to always-multiline. That is what I take your proposal implicitly to entail anyway. 0DF (talk) 14:48, 21 April 2025 (UTC)Reply
@0DF Isn't |inline=auto/yes/no sufficient? 🌙🐇 ⠀talk⠀ ⠀contribs⠀ 18:27, 21 April 2025 (UTC)Reply
Slightly aside, I would think we should implement shorter aliases for these: |i=a/1/0, or similar. Saph (talk) 17:44, 22 April 2025 (UTC)Reply
1 and 0 should already work; no objections to i and a, although as for a it would be default anyway. 🌙🐇 ⠀talk⠀ ⠀contribs⠀ 18:05, 22 April 2025 (UTC)Reply
@Lunabunn, Saph: Please consider accessibility for non-coders’ sake, choosing intuitive parameter names and ideally idiot-proof inputs. Shortening |inline= to |i= is a particularly bad idea, given the number of templates that use |i= to activate italicisation. 0DF (talk) 21:07, 22 April 2025 (UTC)Reply
@0DF I also don't particularly see the need for an alias FWIW because I thought the entire point is to eliminate the need to specify in most cases. In the occasional edge case where it is still needed, a few extra characters shouldn't be an issue. That being said, Saph did say aliases; I don't see her proposal hurting anyone, either. 🌙🐇 ⠀talk⠀ ⠀contribs⠀ 21:28, 22 April 2025 (UTC)Reply
  Strong support. Hard-coding one or the other is an active hazard for accessibility across different environments. I am working on a better inlining heuristic for {{uxi}} that takes into account viewport width, which should hopefully further improve UX (pun intended). 🌙🐇 ⠀talk⠀ ⠀contribs⠀ 18:30, 21 April 2025 (UTC)Reply
Said better heuristics implemented at User:Lunabunn/Sandbox, albeit with values that need more tweaking. Suggestions welcome at my talk page. 🌙🐇 ⠀talk⠀ ⠀contribs⠀ 08:43, 23 April 2025 (UTC)Reply
  Support Benwing2 (talk) 21:26, 21 April 2025 (UTC)Reply
  Support Vininn126 (talk) 17:51, 22 April 2025 (UTC)Reply
  Yes, please, just like with {{col}}. Polomo47 (talk) 00:11, 26 April 2025 (UTC)Reply
By the way, when are we having a bot change {{col1}}, {{col2}} etc. to the automated template? Polomo47 (talk) 00:12, 26 April 2025 (UTC)Reply
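The switching behaviour discussed in this thread can be sketched roughly as follows. This is an illustration in Python, not the code of {{uxa}} or of any existing module: the 50-character threshold and the function name `choose_layout` are assumptions made up for the example, and the accepted values of the override simply mirror the |inline=auto/yes/no (and 1/0) values mentioned above.

```python
# Hypothetical cutoff; the real heuristic may use different values
# or take viewport width into account.
INLINE_MAX_CHARS = 50


def choose_layout(example, translation="", inline="auto"):
    """Pick 'inline' or 'multiline' for a usage example.

    An explicit override wins; otherwise the combined length of the
    example and its translation decides.
    """
    if inline in ("yes", "1"):
        return "inline"
    if inline in ("no", "0"):
        return "multiline"
    total = len(example) + len(translation)
    return "inline" if total <= INLINE_MAX_CHARS else "multiline"


if __name__ == "__main__":
    print(choose_layout("a short usex"))
    print(choose_layout("x" * 150))
    print(choose_layout("x" * 150, inline="yes"))
```

With automatic switching as the default, the override is only needed in the occasional edge case, which is the point made in the replies above.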

Successor to User:DerbethBot


The only bot that has consistently added audio files to Wiktionary pages has been blocked for months now with no indication that the issue(s) will be resolved. Meanwhile, Wiktionary:Approved Lingua Libre users is growing slowly but steadily, and Commons users continue to upload files under the xx-foo.ogg nomenclature. Audio pronunciations are too valuable a resource to be added by hand a few at a time. To give an idea of what we're missing out on, there are more pages in commons:Category:Dutch pronunciation than results here for the search "terms with audio pronunciation". Is anyone willing to create or adapt a bot for the purpose of importing audio files? Ultimateria (talk) 19:11, 21 April 2025 (UTC)Reply

It's too bad that User:Derbeth has not been willing to do this work. On the surface this sounds simple but there are some potentially tricky issues to work out:
  1. Ensuring that we don't auto-add audios when there are multiple pronunciations specified for a given term, as we won't know which audio goes with which pronunciation.
  2. Ensuring (or trying to ensure) that we don't re-add audios that have been previously deleted.
  3. Additional language-specific tweaks; e.g. we may want to entirely exclude languages written in the Arabic script at first due to the underspecified vowels.
Also pinging @AG202 who may have thoughts. Benwing2 (talk) 21:53, 21 April 2025 (UTC)Reply
I agree. I recall one annoying issue was that pronunciations that were identified as incorrect (for example, stressed on the wrong syllable) kept getting readded by the bot. Any new bot will need to have a way to avoid this. — Sgconlaw (talk) 05:55, 22 April 2025 (UTC)Reply
I think this can be implemented by looking in the page history to see if the audio was already added. Probably the best way to do it is to include the name of the file and language in the changelog message, in a particular format such that the bot can identify just based on the changelog message that it previously added the same audio. AFAIK, it's pretty fast to retrieve a list of the last 500 commits (including commit messages) to a given page, but slower to check the contents of the commits, because each such commit has to be retrieved individually. Benwing2 (talk) 06:04, 22 April 2025 (UTC)Reply
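A minimal sketch of the edit-summary check described above, together with the rule about multiple pronunciations, might look like this in Python. The summary format `[audio-bot] added FILE for LANG` and all function names are hypothetical, and the list of summaries is assumed to have been fetched already (for example, the last 500 revisions of the page); no real bot framework calls are shown.

```python
import re

# Hypothetical machine-readable edit-summary format written by the bot.
# File names are assumed to use underscores rather than spaces.
SUMMARY_TAG = re.compile(r"\[audio-bot\] added (?P<file>\S+) for (?P<lang>[\w-]+)")


def summary_for(filename, lang):
    """Build the edit summary the bot would leave when adding an audio file."""
    return f"[audio-bot] added {filename} for {lang}"


def previously_added(summaries, filename, lang):
    """True if any earlier summary records this exact file and language."""
    for summary in summaries:
        match = SUMMARY_TAG.search(summary)
        if match and match.group("file") == filename and match.group("lang") == lang:
            return True
    return False


def should_add(summaries, filename, lang, pronunciation_count):
    """Skip pages with several pronunciations and files the bot already added once."""
    if pronunciation_count > 1:
        # We can't know which pronunciation the audio belongs to.
        return False
    return not previously_added(summaries, filename, lang)


if __name__ == "__main__":
    history = ["fix typo", summary_for("Nl-fiets.ogg", "nl"), "remove wrong audio"]
    print(should_add(history, "Nl-fiets.ogg", "nl", 1))   # already added once
    print(should_add(history, "Nl-huis.ogg", "nl", 1))    # never added
```

Because only the summaries are inspected, an audio file that was added and later removed by a human is not re-added, without the bot having to fetch the content of each revision.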

Provisions on Sicilian Entries


Catonif, Nicodene, Scorpios90, Medellia, Afc0703, Benwing2, and if you know anyone else who edits Sicilian (bonus points if they're native speakers), ping them as well. Benwing2 is pinged because this thread, aside from probably being of interest to them, also touches on templates.

I think it's necessary to make a few express decisions on Sicilian entries, and to ultimately create a Wiktionary:About Sicilian page, as Sicilian has already posed some challenges before. I'd like to reach some official decisions around 3 thoughts I had:

  1. Could vulgar/ad hoc spellings possibly be added as pages? (directing visitors to the CS spellings) (CS: Cademia Siciliana)
  2. Pointing out the official templates (especially for verb conjugations and pronunciations)
  3. Labelling narrow pronunciation transcriptions and alternative written forms by dialect?

I'd like to spend a few extra words of mine on all of these, to get the conversation going and to give everyone an idea of what my initial proposals are so you can put your ideas on the table. After you can also propose other concerns you've had so we can make as many useful decisions as possible.

Including "vulgar" or ad hoc spellings with redirects and labels You might be asking yourselves what I mean by "vulgar/ad hoc spellings". I don't have any other name for them, but as you know Sicilian is (unrightfully so!) not taught or officially used in school, and this also means effectively illiteracy, analphabetism in Sicilian, for speakers. Now, a lot of the youth use a modern Sicilian to communicate, and verbally there is no problem. On text, however, I see my peers having to eye dialect their way through the words they use, leading to what I call vulgar or ad hoc spellings. These can be found on social media, in chats, but also in places such as restaurants that may have their name in Sicilian (although often those try to use more CS-like spellings in stuff like their menus, but that's beside the point). Here are some examples of this spelling that I commonly see used in my school class group chat (Gelese Sicilian): macna for màchina, po for , itve for jìtivi, femmna for fèmmina, vene for veni, foco for focu. I'm quite confident these spellings are probably influenced by Neapolitan song names and lyrics being written in this sort of way. Examples in the wild: po culo, pizz, femmna (Gela) (P.S. the restaurant might actually be traditionally Neapolitan, from skimming their page), como vene si cunta. Now, the examples I found are by no means many and while I'm sure I could find some more (I know one place I could look), it does seem like there's less of this sort of writing on the indexed web than I thought. I'm not sure what other people who grew up in Sicily will say, whether they've had a similar experience or not, but at least in Gela, and at least in these recent years, every teenager writes in Sicilian like this. Do you think these spellings should be included in our overall project (with appropriate labelling as ad hoc spellings)? 
I would personally discard any non-popular spellings of words from being covered as basically nobody will use them (I'm talking about like, idk, spelling cunigghiu as kunigghju and stuff like that). CS should remain the "official" orthography for Sicilian entries on Wiktionary.

Pointing out the official templates (especially for verb conjugations and pronunciations) There appear to be 9 templates for Sicilian verb conjugations (Category:Sicilian verb inflection-table templates). The number of Italian ones? 1. (Category:Italian verb inflection-table templates). It would definitely be better if we could do the same with Sicilian verbs, and use only that one template. I can help with this point, if morphological help is needed!

I can also help for a uniform, phonemic, broad transcription pronunciation template, if phonemic help is necessary.

Labeling pronunciations and forms by dialect or region Pronunciations on Sicilian entries, as of current, are most often narrow transcriptions (P.S. this does not seem to actually undoubtedly be the case), of different and unspecified dialects, e.g. the pronunciations in arrè and babbaluci look to me like they're central or conservative, while I'm very sure the pronunciations in aḍḍumari and astutari are another accent. I don't think anyone would be against labelling all narrow transcriptions (and possibly also audio pronunciations) by the accent represented, either by the name of the town, e.g. "(Gela)", or by the name of the dialect/accent associated to the location, e.g. "(Gelese)", or by the type of Sicilian (Eastern, Western, South Eastern, Central...) if the exact location is unknown. Whichever way you people like. In this same bout we could also label alternative forms the same way, by accent (e.g. jattu could be labelled "predominantly Eastern Sicilian, Catania").

As a final word... this language is a mess on Wiktionary, what with obsolete spellings in page contents or even titles, stub pages, missing information and haphazard coverage. But hey, if we make some decisions, we can include them in an About Sicilian page because imo it could be a good way to promote a standard of quality and good information across Sicilian entries. What do you all think? Crunchy Cloaky Crackdown (talk) 00:53, 22 April 2025 (UTC)

(Notifying Catonif, Scorpios90, Medellia, Afc0703, Crunchy Cloaky Crackdown): Fenakhay (حيطي · مساهماتي) 05:50, 22 April 2025 (UTC)Reply
@Crunchy Cloaky Crackdown Yes, Sicilian is unfortunately a mess. @Nicodene and I did some cleanup of pronunciations, esp. the narrow ones, which were often simultaneously incorrect and over-detailed. It would be great if you're willing to clean some of the mess up. As for point #1, these sorts of spellings can be included, yes, but they should be properly sourced (which may be a bit tricky, as Facebook, Twitter and the like don't count as valid sources). As for #2, I'm pretty busy now with all sorts of requests so I can't commit time at this point to writing a Sicilian verb module, although it should definitely be possible to create one by modifying the Italian module (or, possibly better, by starting with the Spanish module; the Italian module is somewhat complex in order to handle all the irregular and obsolete forms sometimes found in standard Italian, which might not be needed for Sicilian if we only want to cover a single modern standard). As for #3, I'm pretty sure Nicodene prefers broader or even phonemic pronunciations. My general preference is for "lightly phonetic" pronunciations, which means that some salient allophonic features may be represented if they're non-obvious to a language learner, but mostly you should follow the phonemic form. Definitely if we include narrow details they need to be tagged with the appropriate accent identifier. Benwing2 (talk) 05:52, 22 April 2025 (UTC)
@Benwing2 for #1: you say sourcing for the ad hoc spellings is necessary? Does that mean having a link to a source that includes this spelling in the "References" section of the entry? (Also, why are social media (like Facebook or Twitter) not valid for this? Where else should one look?)
For #2, given what you say I might start fiddling by myself later, at least with the conjugation template.
For #3, I forgot to make this clear but yes we should definitely prioritize broader or phonemic transcriptions over specific realizations like you and @Nicodene prefer. Either way though if we make a pronunciation template like I proposed this problem will basically vanish forever anyway. Crunchy Cloaky Crackdown (talk) 12:45, 22 April 2025 (UTC)Reply
Hi. If you’re curious, here is the previous thread on Sicilian pronunciations. A major problem that @Catonif and I had was that it’s unclear which pronunciation we should take as ‘standard’. The issue could in principle be side-stepped by simply tagging all given pronunciations by location, I suppose, at the cost of leaving a backlog of some hundreds or thousands of already-added pronunciations with no such tag.
Have the Sicilian Academy published any sort of ‘orthoepic’ recommendations? That could form the basis for a pronunciation module I suppose. Alternatively, if there exists a detailed phonetic description of, say, Palermitan (or whatever speakers tend to regard as a prestige variety) we could use that as our in-house standard.
As for adding commonly occurring non-standard spellings as alternative forms: yes, I think they deserve to be documented. Nicodene (talk) 18:46, 22 April 2025 (UTC)Reply
Yes, Cadèmia Siciliana has a whole book on its orthographic proposal, Proposta di normalizzazione ortografica comune della lingua siciliana (first edition, 2017), available on the website for free—as it so happens I just started reading it last week! I'm not sure if they have worked on a second edition, but they have a periodical, some of the articles in which are orthography-related (though I haven't looked much into these). As for your question, I'm not yet sure if they actually prescribe standard pronunciations or if they're even concerned with that, but of course any preference of orthography (even if the system is for something "objective" like phonemic IPA) will inevitably favor some pronunciations over others. — Ganjabarah (talk) 22:34, 22 April 2025 (UTC)Reply
@Catonif Ya I read that thread to get informed before posting this whole thread! Personally, I don't think we should consider any pronunciation at all as 'standard'; I don't think there's any point to doing that, unless I'm missing something. Meanwhile, you also hinted in your own 2022 post that Sicilian words share the same phonemic base across all dialects: I'm in favor of always including that phonemic transcription. Any other regional pronunciations, transcribed broadly but not phonemically (so possibly containing phones absent in Sicilian phonology, like /ʔ/), might be added as well, and alongside those, in case a native speaker (or a certain researcher) wishes to add them, (accent-labelled) narrow, allophonic transcriptions local to specific areas might also be added. For example gattu could have, in its Pronunciations (pseudo printed output):
  • IPA(key): /ˈɡat.tu/ (phonemic) or (base) or similar
    • (predominantly Western Sicilian) IPA(key): /ˈʔat.tu/
    • (predominantly Eastern Sicilian) IPA(key): /ˈjat.tu/
    • (some dialects) IPA(key): /ˈɡat.tu/
    • (Gelese Sicilian) IPA(key): [ˈjɐˑt.t̩ʰʊ̯]
Or something like that (with hyperlinks too of course, not sure how that could be gotten to work). The only complaint then could be that it's a tad long, but to balance that out it's decently informative and does contain all the information one should need.

However yes, as you point out there's been a lot of narrow transcriptions that have been added in the past and we, erring on the side of caution, could never know and add the exact location for any and all of them, resulting in a painful backlog as you say. Maybe a half-solution could be to mark these pronunciations with some request for verification as to what accent it is? But I'm not too sure how that truly works and if I've got my head in the clouds saying this.

By the way I searched, and the CS has not released anything on pronunciation as far as I can see. Even then though, I should definitely be able to help a decent bit for a pronunciation module myself anyway. Sicilian phonology and phonemics aren't very complex, even with written word to phoneme conversion in mind. That being said I do have a few doubts about a few sounds and if they constitute different phonemes (off the top of my head, I have difficulty coming to terms with the "soft c", amongst [ç], [ʃ], [t͡ʃ], and even [ɕ]), but I'm probably thinking that with some research and paying attention when people speak I should be able to make sense of things. Crunchy Cloaky Crackdown (talk) 22:45, 22 April 2025 (UTC)Reply
Yes it's very possible to have appropriate hyperlinks added to accent qualifiers, and I don't have any objections to long pronunciation sections; if necessary we can simply hide some of the pronunciations by default (we follow the same approach for Spanish; see cebolla for an example). As for marking pronunciations as needing verification, that could be done too, either with an existing template like {{rfv-pron}} or a new Sicilian-specific template. In either case the term would be added to a cleanup category. My main concern with this approach, however, is that the terms may never be cleaned up; historically, cleanup categories have tended to languish unless there's someone particularly diligent about going through them. An alternative is I could generate a page listing all the existing pronunciations, and someone like you or @Nicodene or @Catonif could mark all the questionable ones, and I can have a bot go through and delete them. I'd rather have no pronunciations than pronunciations that are sketchy, questionable or clearly wrong. We followed a similar approach for Manx for several thousand bad lemmas added by Embromystic. Benwing2 (talk) 23:23, 22 April 2025 (UTC)Reply
@Benwing2 Oh that's useful info, so we can definitely make Sicilian prons look pretty and well implemented then. As for the cleanup operation, are you referring to narrow transcriptions, broad, or both? I've looked around some of the words in Sicilian that have IPA pronunciations and most seem fine to me*, apart from some that have transcriptions with sounds that I haven't heard before (might be attributable to inexperience in the language). Could you show me some examples of pronunciations you think look sketchy or questionable as you say? And if it's that many then we might consider mass deleting pronunciations like you suggest. I understand though that people adding misinformation on small languages is very common on Wiktionary so I wouldn't be surprised.
*I will have to say though, the stress on some entries actually is wrong, and I also often think some audio pronunciations sound like the person just read the Sicilian word in an Italian accent and that was it (again, maybe I'm inexperienced and people actually pronounce it like that somewhere, but I find that very, very unlikely from my overall experience). Crunchy Cloaky Crackdown (talk) 21:35, 23 April 2025 (UTC)Reply
@Crunchy Cloaky Crackdown I'm referring to narrow transcriptions, many of which on first glance looked wrong to me (in the conversation linked by @Nicodene). But I'm not actually sure whether they're wrong, I'm just guessing based on this along with what you said (that many of them are unidentified as to accent) and the fact that, as you note, it's very common for people who don't know what they're doing but think they do to add incorrect info to less-known languages. Likewise for the audio pronunciations; if there are a lot of incorrect ones and we can identify a pattern either in the contributor or the format of the audio filename, we can mass-delete them. Benwing2 (talk) 22:05, 23 April 2025 (UTC)Reply
To me those pronunciations specifically, on the old 2022 thread, look passable (although of course they're of unidentified accent) and realistic enough. As for the audio files... it turns out that most of the ones that have been added (which are still very few, consider only A, B, C, and S lists exist) were added by me... that leaves a few remaining which are the ones that rubbed me the wrong way: biḍḍizza, beḍḍu, Sicilia, which are all by the same recorder, @User:Àncilu. On their own user page their Babel says scn-4, but their recordings (I went out of my way to see some of their own Sicilian recordings on Wikimedia (scroll down)* + another page (Ctrl+F and see the last few ones, like "a fini" and "a cunnizioni i"); none of them sound convincing or even slightly believably Sicilian, and they don't even really get the sounds of Sicilian accurately either, especially ⟨ḍḍ⟩ which they treat as if it were ⟨dd⟩) make me doubt they have that level of competence at least in pronunciation. Next there is an editor that leaves me a little perplexed with the narrow pronunciations they add, also because they're a red name and as such they don't have a Babel: @User:Inqvisitor. I'm concerned about them because they generally add pronunciations that I find a little weird (personally, it's the ones having [ɑ̝] in them like ballari, or also tulimaicu with how they didn't split [äɪ̯] into [ä.ɪ], but I might just be finding reasons to be over-critical), and they've also said a very weird thing, if you look at beddu's history: the edit summary on 13 July 2023 (they've done a similar thing in biddizza, same date). The reason I find this statement contestable is mainly the "plus /ḍḍ/ is not even standardized scn orthography for [ɖː]" statement, which if we go by CS is absolutely not true? Also, ⟨dd⟩ and ⟨ḍḍ⟩ are two totally different sounds and it seems weird to not want to differentiate them.
Again I might just be trying to find reasons to be skeptical, and that can be done with anything in life, but there really is a difference between a blue name editing and openly not including a Sicilian level in their Babbel (therefore admitting they might be making mistakes editing, e.g. you guys who are very noble in this) and a red name you don't know any background about.
@User:Hyblaeorum seems to make good edits, and the narrow pronunciations they add are very un-complicated and generic.
*their Italian recordings don't sound like they have a Southern accent either (the most southern it sounds to me is Naples, but if I had to say one accent I'd say Tuscan. Possibly purposefully recording a Standard Italian accent?), and there are also some Russian recordings too which don't sound very convincing (seemingly missing palatalization sometimes, vowels not being realized as specific allophones based on context); both of these are in the same page as linked by the way Crunchy Cloaky Crackdown (talk) 16:09, 24 April 2025 (UTC)Reply
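@Crunchy Cloaky Crackdown Iu parru lu sìculu (pirchì lu mè nannu veni dâ. Nun haju pirò n'accentu/prununza sempri bona pirchì nascivi ntê Pugghî e abbitava pi 17 anni n Vaḍḍi d'Aosta, supratuttu pâ ḍḍ. Siḍḍu vuliti scancellu di ccà tutti li mè file di prununza n sicilianu. Pû russu, è na lingua ca iu studiai pirciò la mè prununza nun è pirfetta. [English: I speak Sicilian (because my grandfather comes from there). However, my accent/pronunciation is not always good, because I was born in Apulia and lived for 17 years in the Aosta Valley, especially as regards ḍḍ. If you want, I will delete all my Sicilian pronunciation files from here. As for Russian, it is a language I studied, so my pronunciation is not perfect.] Àncilu (talk) 17:08, 24 April 2025 (UTC)Reply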
Ah, capisciu. Cumunca se, lu putìssitu fari di scancillalli, ma prima vulissi sèntiri si chistu va bonu pi l'autri :)
Ah, I understand. By the way yes, you could indeed delete them, but first I'd like to hear if this is fine for the others :)
@Benwing2, @Àncilu is talking about deleting all his audios in Sicilian (either me or them could provide a translation of his reply if you need (Google Translate + slight guesswork should be enough), but I'd still prefer to keep this conversation in English) Crunchy Cloaky Crackdown (talk) 22:35, 24 April 2025 (UTC)Reply
@Crunchy Cloaky Crackdown Thanks. Google Translate does an OK job but leaves out entirely the sentence where @Àncilu says it's OK to delete his audios (I assume that's what he's saying). I'm fine with removing them; that's probably the best option as (per my earlier statement) it's better to have nothing than something wrong. As for the other users yeah there are a lot of users who think they know what they're doing but don't; it's a chronic problem and one where I'm increasingly convinced that we just need to nuke all the contributions of some users rather than trying to correct them. Some users will respond to warnings telling them not to contribute to languages they don't know, but others won't, and there are very few people who will clean up past bad contributions they've made. Benwing2 (talk) 22:43, 24 April 2025 (UTC)Reply
@Benwing2 Àncilu (talk) 22:49, 24 April 2025 (UTC)Reply
I did not mean delete from Lingua Libre but in the sense of not having it appear in the English Wiktionary. The goal of Lingua Libre is to document as many pronunciations from different speakers as possible. But if you prefer a 100% Sicilian accent, no problem, you can record it yourself if you can get a more faithful accent. This means it will remain in the French Wiktionary and that's it. But in my opinion, it is interesting that a French-speaker in Morocco would record the rural pronunciation of the word “voiture”: [vwatir] instead of [vwatyʁ] Àncilu (talk) 22:50, 24 April 2025 (UTC)Reply
@Àncilu Yes, what I meant by "delete" is to remove the audio templates from the English Wiktionary. I won't delete anything on Lingua Libre (and don't even know how). Benwing2 (talk) 22:53, 24 April 2025 (UTC)Reply
By the way, my point in the previous thread was not that phonemic transcriptions are preferable to phonetic ones, but rather that if one does want to make a phonemic transcription, one should make sure that what one puts in it really is phonemic. I would actually suggest using phonetic transcriptions for representing regional variation in Sicily, since you can simply focus on the actual sounds without having to make assumptions about the deeper sound-structure (phonology) of each dialect. Nicodene (talk) 02:57, 24 April 2025 (UTC)Reply
Oh, right, I totally see now seeing the broad transcriptions at the start of the post. Those ones aren't phonemic. And I've seen that since then, a lot of Sicilian entries have, in their histories, edits that report their Pronunciations being normalized as you call it, so that's good. Also yes being able to compare regional, phonetic pronunciations would be the ideal (as long as a base phonemic pronunciation is still present of course, but I don't think you wanted to make that optional). Crunchy Cloaky Crackdown (talk) 22:58, 24 April 2025 (UTC)Reply
It is possible to do without the phonemic level entirely. This is now the case for our Catalan and Russian pronunciation modules, for instance.
For Sicilian, a ‛pan-insular’ phonemic transcription may be possible for something like gattu but not for words reflecting a number of historical developments. For instance /ˈnɔvu/, /ˈnɔvi/, /ˈforti/ would fail to account for Mistrettese having [ˈnu̯o:vu], [ˈnu̯o:vi] with a diphthong yet [ˈfɔrti] without a diphthong (AIS 1579, 186). /ˈmɛrlu/, /parˈlassi/, /ˈtɛrra/ would fail to account for the same dialect having [ˈmi̯ellu] with [ll] yet [parˈrassi], [ˈtɛrra] with [rr] (AIS 493, 1627, 420). And so on.
Nicodene (talk) 03:50, 25 April 2025 (UTC)Reply
Oh wow, I actually didn't know there existed situations like these where you can't always predict the pronunciation for one dialect with just the phonemic transcription... and I also didn't know Catalan and Russian only used narrow transcriptions. I totally understand now. So, do you feel Sicilian phonemic transcriptions should even remain anymore at this point, if we can easily do well without? I saw Catalan has a pronunciation template, Template:ca-IPA, which generates three "macro-regional" square-bracketed pronunciations, maybe the Sicilian pronunciation template could be something similar, with narrow transcriptions for Palermitan, Catanese, and some other major dialects? I would not be able to help directly with transcribing any of those (I could only do Gelese), but I'm sure there exist studies of either dialect's strict phonetics. This site that you consulted though... I haven't fully made sense of it yet, but it seems like it allows one to fetch realizations by location? Do you feel like a hypothetical Sicilian pronunciation template could make use of that? Crunchy Cloaky Crackdown (talk) 18:10, 25 April 2025 (UTC)Reply
I do think that using phonetic transcriptions, whether relatively broad or relatively narrow, is the most practical solution here. It takes quite a bit of work to establish phonemic correspondences for a range of dialects. (See e.g. this discussion with @Jamala regarding Neapolitan.)
The website that I linked is a digitization of this linguistic atlas. It’s useful for reference but would be difficult to base a pronunciation module on.
Ideally we’d base the module on one or more varieties of Sicilian whose phonetics are described at length in multiple sources. If such exist, and you find the sources, I can help by condensing the relevant information into a rough draft for the module. Nicodene (talk) 22:56, 25 April 2025 (UTC)Reply
The real issue is the lack of recordings. Most readers unfortunately don't know or care the first thing about IPA, let alone phonemicity. What they do tend to learn is how orthography corresponds to a pronunciation, based on hearing many examples, and any IPA transcription to supplement a corresponding recording is mostly helpful to linguists or the 1% of Sicilian learners who are interested in the linguistics. In my opinion reaching out to speakers from various dialects to provide multiple audio recordings per word should be the top priority. An audio is worth a thousand IPA characters… or whatever they say. — Ganjabarah (talk) 00:23, 26 April 2025 (UTC)Reply
The two goals are compatible and complementary. Nicodene (talk) 02:18, 26 April 2025 (UTC)Reply
@Nicodene I tried to search for those in English on both normal Google and Google Scholar and nothing came up- when I tried doing the same in Italian, I was able to find three papers on Sicilian pronunciation. I'm not sure how useful they could be but I'm assuming you know where to look.
  1. Very old one from 1890, doubt it could be useful as it only seems a little superficial, and it also doesn't read well
  2. Salentinu 1, Salentinu 2
  3. Caltagironese + notes on main dialects
I'm surprised there was seemingly nothing for specifically Catanese or Palermitan. Either way I also sent an email to the Cademia Siciliana, maybe they know some more sources. Either way I hope I managed to provide :) Crunchy Cloaky Crackdown (talk) 22:21, 26 April 2025 (UTC)Reply
Thank you. I can’t seem to access the second source through that link. The third source seems fairly high-quality.
I’m digesting this article at the moment, an overview of important regional differences. Nicodene (talk) 04:04, 2 May 2025 (UTC)Reply
Good to know I was of use, and I fixed the link for the second source and even casually found another on the same dialect! Lmk when you have something Crunchy Cloaky Crackdown (talk) 10:38, 2 May 2025 (UTC)Reply

So-called preterite in Cimbrian

In Cimbrian, we apparently call the perfect tense (be/have + past participle) the "preterite". See the definitions and the inflection template at "haban". This is highly unusual and misleading.
(1.) Throughout Continental West Germanic the perfect tense stands in for the preterite, which latter is often in limited use or -- in Cimbrian, but equally in all other forms of modern Upper German -- has been lost entirely. Nevertheless the remaining composed past tense is called the "perfect" in all of these languages. More or less the same is true of various Romance languages including French and Italian.
(2.) The term "perfect" (= completed) is entirely adequate for such a tense and in line with the scope of the original Latin perfect. (If anything, the perfect tense in English is a misnomer.) The word "preterite", on the other hand, is used in Germanic specifically for the synthetic past tense.
Therefore I see no justification to deviate from the general rule in Cimbrian and call the perfect the "preterite". I ask for permission to change the Cimbrian conjugation templates and remove the term "preterite". Nothing speaks against replacing it with "perfect", but "past tense" could be used as well. 84.57.154.5 22:09, 22 April 2025 (UTC)Reply

No objections from me, but give it a couple of days to see if anyone else comments. You are right that "preterite" is usually used to indicate a synthetic past tense and not a tense formed with auxiliary + past participle. Benwing2 (talk) 23:25, 22 April 2025 (UTC)Reply

Glyph origin

This is about the (graphical) etymology section called “glyph origin” of mostly Chinese, but also Japanese, Korean, etc. glyphs.

I own quite a number of books on the topic of the development of Chinese and Japanese characters (glyphs, graphs), the best of them published in Chinese and Japanese. A recent example is the book:  漢字字形史字典【教育漢字対応版】 (Dictionary of the historical evolution of kanji forms: Edition covering all Elementary school characters) 落合淳思 Ochiai Atsushi. 東方書店 Tōhō Shoten. Tōkyō Metropolis, 2022.

From this book and others it can be learned that for a great many glyphs there is no consensus about the origin or development of a certain character. The author of this particular book deals with that by having selected ten important researchers and comparing his own analyses with these other researchers, for each glyph. Additionally, research in this field is very much ongoing, and earlier opinions are often discarded or changed.

However, sources are only very rarely cited in the “glyph origin” section.

I don’t mind that contributors only give the most commonly held view on the origin of a specific character. I would certainly not want contributors to use the elaborate method used by Ochiai, the researcher I referenced above.

However, it would be very helpful if contributors would cite their source, to show whose opinion they are giving, and from which time period.

As I indicated, giving the source is quite rare, which puzzles me. I can only assume that contributors are using a source and not making it up, so why do they normally not include their source as well?

There are also contributions that contain sentences like “An alternative theory suggests...” - and not naming the source of that alternative theory either.

In conclusion: There is no way for the reader to judge the reliability of a given explanation by noting whose opinion it is and from which period, or to seek more information by looking up the source.

I’m not an active contributor myself, so I’m wondering what is going wrong here. Perhaps contributors need a reminder to include their source? Perhaps a list of sources should be provided to the contributor, so that it is easy and not time-consuming to add the correct source? Perhaps there should be some other way to make it easier to add a source?

Thanks for your time.

Hurdsean (talk) 11:52, 23 April 2025 (UTC)Reply

There have been a lot of active discussions recently about strongly encouraging or even requiring sources. Wiktionary in the past has not required such sources, which IMO was a mistake. Cc. @Thadh @Vininn126 @AG202 as some who have participated in the discussion about sources, and @Justinrleung and @Wpi who may be able to comment specifically on the glyph origins and where the info is coming from. As for making it easier to add source info, the way to do that is to create the appropriate templates: reference templates of the form {{R:zh:...}} listing the actual sources, and parameters in the glyph origin templates to make it easy to cite specific sources (I did that, for example, for Italian pronunciations, where a lot of them are sourced to DiPI). Benwing2 (talk) 22:09, 23 April 2025 (UTC)Reply
Seems to me a no-brainer that reliable references should be provided whenever they are available. — Sgconlaw (talk) 22:16, 23 April 2025 (UTC)Reply
References definitely should be included in glyph origins, especially where they are more controversial. We could either use reference templates or {{zh-ref}}. — justin(r)leung (t...) | c=› } 06:08, 24 April 2025 (UTC)Reply

Serbo-Croatian proper nouns

Currently SC headline template for proper nouns does not support female equivalent/f= to handle nationalities. Should we add this function? Chihunglu83 (talk) 11:55, 24 April 2025 (UTC)Reply

How is it proper noun though if it has a female equivalent? In family names it can be, of course then we should have this function. You have Swede and German as nouns, only the language German as a proper noun—which is neither a proper noun in my opinion which I also have argued at some other place, there are various Englishes and Germans. So Švéđanin should also be declared a noun and not a proper noun. These terms being entered as proper nouns presumably merely has taken place due to fallacious conclusion from their capitalization. Nobody makes this mistake for Arabic script where no capital letters exist, e.g. أَلْمَانِيّ (ʔalmāniyy), also the language أَلْمَانِيَّة (ʔalmāniyya). Fay Freak (talk) 14:14, 24 April 2025 (UTC)Reply
Yeah I agree that demonyms should be common nouns not proper nouns even if capitalized, but IMO language names are fine as proper nouns even if they can sometimes be pluralized, because they usually have a single referent. Benwing2 (talk) 22:45, 24 April 2025 (UTC)Reply
@Benwing2 @Fay Freak AFAIK, SC linguistics classifies ethnonyms and demonyms as proper nouns (which I also find weird); an example here discusses the upper- and lower-case letters of proper nouns in the plural. In general, I just want to ask: how should we handle Šveđanin/Šveđanka? Previous editors put them in the derived terms section, but I think the headword would be more proper. Ideas? Chihunglu83 (talk) 11:45, 25 April 2025 (UTC)Reply
https://pravopis.tripod.com/latinica/l-velika_i_mala_slova.html - attached is the pravopis srpskog jezika on writing proper nouns. Chihunglu83 (talk) 12:15, 25 April 2025 (UTC)Reply
@Chihunglu83, Benwing2: Previous editors weren’t that equipped in coding of the templates or modules behind them. It would be more straightforward to have the female forms of demonyms and ethnonyms in the headword, since they are too necessary just to be entered as derived terms. They would have to be presented as nouns however, since even by comparison with other Slavic languages we can hardly wrap our heads around them being proper nouns.
For family names the situation is peculiar in Serbo-Croatian and Slovene, as opposed to Macedonians and any other Slavic-speaking nation; they don't print female surnames regularly. You can still form them with -ka for the wife of so-and-so and -ova / -eva for the daughter of so-and-so (in ⅔ of cases someone ending with -ić, following Serbo-Croatian naming customs), which is theoretically negligible historicizing use but apparently necessary already in journalistic reports: they are required if no female forename or the word gȍspođica or similar or anything indicating social gender precedes the surname, lest congruence with perfect verb forms be not maintained. Telegraf.rs writes it would be definitely wrong to mean a woman and write "stigla je Jovanović" ("Jovanović has arrived", with a feminine verb form), because this violates the rules of agreement, since the predicate must agree with the subject in gender (if the verb distinguishes gender), as stated in the orthography manual of the Serbian language. Also in the oblique cases only the preceding word suffers inflection but not the surname if it is a woman: pozovite gospodina Jovanovića, ali - pozovite gospođu Jovanović ("call Mr. Jovanović, but call Mrs. Jovanović"). This would have to be relegated to inflection tables as the particular surname inflection type if Ben segues to the modularization of Serbo-Croatian noun inflections. (Again I intuitively say noun inflections since the idea of their being proper nouns is repugnant.)
In sum this means we have no case left where a Serbo-Croatian proper noun head needs a |f=. (Natively, since a Bulgarian mixed-sex immigrant couple is granted two gendered forms of their surname, which has little to do with the entry language as it would even appear in English.) Fay Freak (talk) 22:10, 25 April 2025 (UTC)Reply

Should social media posts be able/enough to attest terms and "Scots problem" spellings?

(Notifying @User:Benwing2):
I have been told that, as of now, social media are still not allowed for attestation because they are not durably archived. In that case, do you think Wayback Machine archival could possibly negate that concern? As a counterpoint, however, we surely know the Internet Archive project is in a shaky state in general.

And how about underdocumented languages with no official orthographies (colloquially, presenting the "Scots problem"), like Sicilian and more? Do you think an exception could be made in the policy for these languages, where there might be no other way to encounter an "ad hoc" 'unorthodox' spelling if not on social media posts? Crunchy Cloaky Crackdown (talk) 23:34, 26 April 2025 (UTC)Reply

I wonder who told you that? We have accepted entries based on social media attestation alone, although in practice users prefer to see more attestation than the bare minimum of three uses in twelve months. The policy (WT:ATTEST) requests that an internet archiving service is used when doing this - the Internet Archive is not the only one. This, that and the other (talk) 04:52, 27 April 2025 (UTC)Reply
@This, that and the other It was me who mentioned this; I'm aware we have social media entries but I thought they were frowned on, since WT:CFI doesn't explicitly allow them but says they need community approval per source. The issue that @Crunchy Cloaky Crackdown is running into is that Sicilian isn't a well-documented language so the sources for it other than social media, esp. for "in-the-wild" spellings, are often lacking. I didn't realize there are other archiving services, but what happens if the only good social media posts aren't archived? Benwing2 (talk) 19:39, 27 April 2025 (UTC)Reply
Technically yes, we need community approval per source, but there's been little to no enforcement on that policy in CFI in the past few years, which has led to a very laissez-faire attitude, where as long as no one notices, social media-based entries have been allowed. There's just not enough editors, time, or energy to monitor something like that. @Benwing2 AG202 (talk) 19:52, 27 April 2025 (UTC)Reply
It's because by linguistic standards, we can be confident it is not anyhow irrational an attitude, this is about best practice for “languages like that”, contrasted with prestige or imperialist languages. Realistically for most creoles and pidgins this is the most likely place anything at all is written. Then again you might have heard something in a piece of music, where these languages are more often present, and just cross-check that your spelling is not utterly off the wall, by this new means of support, for you would not get any frequency data, and formally published sources also maintain their quirks and would constitute biased selection. Fay Freak (talk) 03:04, 28 April 2025 (UTC)Reply

Requested Entries

As noted at WT:RFVE#xanadu, sometimes someone adds a term to the Requested Entries list, someone else evaluates that it doesn't meet CFI and removes it, and then the same person or someone else re-adds it and someone creates it unaware of the earlier evaluation. It could be useful to have a way to track that a Requested Entries request was denied, and why.
One idea would be to give each word its own headered section so (after tweaking aWa) rejected requests could be archived to talk pages; that'd make it more likely that if the entry was created someone could notice it was previously discussed and RFV it if needed, but it'd make the prior discussion invisible to anyone (re)adding a term to the main RE page. Alternatively we could keep all requests, and people's comments of why they couldn't be created, on the RE page, rather than removing 'dead' requests, but then the page will be huge. Soliciting other ideas! - -sche (discuss) 23:41, 26 April 2025 (UTC)Reply

Here's an idea: We create a gadget, similar to the translation adder, that lets people add to REE using a simple form (I'm envisaging two fields: the term itself, and a freetext field for a comment and links to sources).
This gadget then looks up a list page, say WT:Requested entries (English)/unsuitable entries, and rejects the entry if the term is found in that list.
Or if we prefer to use entry talk pages, the gadget could notify the user if the entry's talk page contains an archived REE "discussion" (since REE uses bulleted lists we could adapt aWa to follow that structure, or create a new archiving tool specially for the page). This, that and the other (talk) 04:57, 27 April 2025 (UTC)Reply
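The lookup step of the gadget idea above could look roughly like the following. This is only a sketch of the logic in Python (an actual gadget would be written in JavaScript), and it assumes a hypothetical list-page format of one bulleted line per term, `* [[term]]: reason`; neither that format nor the function names come from the proposal.

```python
import re

# Assumed (hypothetical) line format on the list page: "* [[term]]: reason"
LINE = re.compile(r"^\*\s*\[\[(?P<term>[^\]|]+)\]\]\s*:\s*(?P<reason>.*)$")


def parse_unsuitable(wikitext: str) -> dict[str, str]:
    """Map each previously rejected term to the recorded reason."""
    rejected = {}
    for line in wikitext.splitlines():
        match = LINE.match(line.strip())
        if match:
            rejected[match.group("term").strip()] = match.group("reason").strip()
    return rejected


def check_request(term: str, rejected: dict[str, str]) -> tuple[bool, str]:
    """Return (accepted, message) for a new request.

    Matching is exact apart from surrounding whitespace, since page titles
    are case-sensitive.
    """
    term = term.strip()
    if term in rejected:
        return False, f"'{term}' was previously rejected: {rejected[term]}"
    return True, f"'{term}' can be added to the request list"
```

With placeholder data, `check_request("foo", parse_unsuitable("* [[foo]]: does not meet CFI"))` returns a rejection together with the stored reason, which is the information the requester currently never sees.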

ramifying topic categories by type

@-sche @Ioaxxere Also pinging @Theknightwho @This, that and the other for thoughts:

It's becoming increasingly impossible to avoid separating topic categories by type. The more I work with {{place}} and geographic topics, the more I run into this problem. For example, Category:en:Mountains is supposed to be a name category, but not surprisingly in fact it contains a mixture of individual (named) mountains, types of mountains and terms related to mountains. I propose we do this incrementally, something like this:

  1. There will continue to be a generic category Category:Mountains. It will have three ramified children Category:Individual mountains, Category:Types of mountains and Category:Terms related to mountains. However, the first parent of each such ramified category will not be the generic category but the corresponding ramified parent. For example, the first parent of Category:Individual mountains will be Category:Individual natural features, whose first parent will be Category:Individual places, etc. Correspondingly, the first parent of Category:Types of mountains will be Category:Types of natural features, whose first parent will be Category:Types of places, etc. The reason for this is that the breadcrumb trail is determined by the first parent, so that each of the three types of ramifications will have a parallel hierarchy that is logically the same as the current hierarchy.
  2. Sorting of categories in their parent categories will ignore the ramification prefix "Individual", "Types of", or "Terms related to"; this will happen automatically in the category tree code.
  3. The categorizing template {{C}} will accept abbreviated prefixes to indicate the ramification type: ind:, typ: or rel:, probably with shorter abbreviations i:, t: or r: (per User:This, that and the other I'm avoiding special characters for this purpose, which will be hard to remember).
  4. Now, what happens if you don't use a ramification prefix? I propose that corresponding to each generic topic category, or maybe to a subset of them, is a default ramification type, whereby if you just write {{C|en|Astrology}}, it automatically goes into Category:en:Terms related to astrology, as if you had written {{C|en|rel:Astrology}}. I say "maybe a subset" because in cases like "Mountains", the ramification type may not be obvious, but in the case of "Astrology", "Individual astrology" makes no sense and "Types of astrology", while possible, is less likely to be applicable to a given term than "Terms related to astrology". Similarly, "Musical genres" seems to be an obvious "types of ..." category, and "States of the United States" an obvious "individual ..." category. We already in essence have a default type for each topic, specified right in the category tree data modules. In the case of a topic category where we don't assign a default type, omitting the type dumps the page into the generic category.
  5. The breadcrumb tree will have some way of indicating the ramification type that doesn't take a lot of space. In particular, given the parallel hierarchies described above, typically the top few categories in the breadcrumb trail will be either non-topic categories or special grouping categories, and the remainder will all be topic categories of a specific ramification type. For example, Category:en:Mountains has the following trail: Fundamental » All languages » English » All topics » Names » Places » Natural features » Mountains. Everything starting with Category:en:Places is an "individual ..." category so maybe the breadcrumb for this category will show [individual] in a smaller font with the assumption that everything below this is of the same type.
  6. The only thing problematic about this scheme I can think of is that it may make autocompletion harder, since it works off of the beginning of a category. Someone searching for a specific ramified category related to a given topic will have to type Category:Individual ... or Category:Terms related to ... which is a fair amount of typing. The generic categories will still exist and facilitate autocompletion, but with this in mind, possibly the ramified categories should be named more like Category:en:Mountains (individual), Category:en:Mountains (types) and Category:en:Mountains (related to); and in this case maybe Category:en:Mountains (named) is better than Category:en:Mountains (individual). (Do all individual foo have names? Probably so ...)
  7. One last thing is we might want to make a naming exception for certain classes of categories. For example, any geographic category of the form PLACETYPE in/of LOCATION such as Category:Counties of Texas, USA is almost certainly an 'individual' category (for example, I *suppose* there could be types of Texas counties and terms related to Texas counties, but there are unlikely to be enough of them to reasonably warrant a category). So maybe these classes of categories can be automatically ramified without having the ramification type noted in the category name. But maybe this is more trouble than it's worth.
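To make points 2 to 4 concrete, here is a rough Python sketch of how a {{C}} argument might be resolved. The prefixes and the three example defaults are taken from the list above; the function names, the `DEFAULT_TYPE` table and the rule of lowercasing the topic's first letter are assumptions for illustration only (the latter would break for topics that start with a proper noun), and the real logic would live in the Lua category tree modules.

```python
# Long and short prefixes from point 3.
PREFIXES = {
    "ind": "individual", "i": "individual",
    "typ": "types", "t": "types",
    "rel": "related", "r": "related",
}

# Hypothetical default-type table for point 4; topics not listed have no default.
DEFAULT_TYPE = {
    "Astrology": "related",
    "Musical genres": "types",
    "States of the United States": "individual",
}

LABELS = {
    "individual": "Individual ",
    "types": "Types of ",
    "related": "Terms related to ",
}


def ramified_name(topic: str, kind: str) -> str:
    """Build e.g. 'Terms related to astrology' from 'Astrology' and 'related'."""
    return LABELS[kind] + topic[:1].lower() + topic[1:]


def resolve(lang: str, arg: str) -> str:
    """Resolve a {{C}} argument such as 'rel:Astrology' to a category name."""
    prefix, sep, rest = arg.partition(":")
    if sep and prefix in PREFIXES:
        kind, topic = PREFIXES[prefix], rest
    else:
        kind, topic = DEFAULT_TYPE.get(arg), arg
    name = ramified_name(topic, kind) if kind else topic
    return f"Category:{lang}:{name}"


def sort_key(name: str) -> str:
    """Point 2: ignore the ramification prefix when sorting within a parent."""
    for label in LABELS.values():
        if name.startswith(label):
            return name[len(label):]
    return name
```

Under these assumptions, `resolve("en", "Astrology")` and `resolve("en", "rel:Astrology")` both give `Category:en:Terms related to astrology`, while `resolve("en", "Mountains")` falls through to the generic `Category:en:Mountains` because no default is assigned.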

Benwing2 (talk) 19:24, 27 April 2025 (UTC)Reply

I like much of this; in particular (I realized I should clarify my comment in the last discussion) I like the idea of still having the top-level categories like "Category:Mountains" both to group the subcategories and potentially even to categorize entries directly into in cases where there are too few entries of any one subtype to split them. Or maybe we shouldn't categorize anything into top-level categories, maybe for consistency we should enforce always subcategorizing? It's tricky because for some things (waterfalls?) there aren't that many types vs terms-related-to (so splitting would make for lots of small categories or, if we disallow categories with few entries, result in non-categorization), whereas in other cases there probably are so many terms-related-to and so many types that splitting them makes sense. (The curse of a dictionary is to encounter every edge case and I trust we will run into edge cases and spanners here, alas...)
If the top-level categories still exist (and even if they don't), I am not sure I'm a fan of point 4; typing {{C|en|Foobar}} and having it result in "Category:Terms related to Foobar" for some values of Foobar, but "Category:Individual Foobars" for other values of Foobar (and "Category:Foobar" for some?) seems unintuitive, and like a recipe for people putting things into the wrong categories because they saw that {{C|en|Barfoo}} generated the category-type they wanted and so they assume {{C|en|Barbaz}} will too and are unaware it aliases differently. I admit that this means, as you say, people have to type long(er) category names when trying to add or find them, which is not great.
- -sche (discuss) 23:39, 27 April 2025 (UTC)Reply
Yeah we can dispense with #4 if necessary, and simply make it so that typing {{C|en|Foobar}} always dumps into Category:en:Foobar. I also think we should ideally have every term go into one of the ramified categories rather than a generic category, and treat any terms in generic categories as cleanup opportunities. Unfortunately we can't force people to subcategorize into a ramified version of a given category until we've gone and cleaned the generic category, which will be a long process; but we can definitely mark individual categories as needing ramification (potentially even on a language-by-language basis), so that people aren't allowed to add to the generic version of that category (unless they use a manually marked up category, which is difficult to prevent unless we start using edit filters to disallow this).
We also need to consider how labels interact with the new category structure; I haven't thought this through. Benwing2 (talk) 23:54, 27 April 2025 (UTC)Reply

subcats

This is probably a stupid question, but I'm not good at cat editing: Shouldn't Category:en:Cities in England and Category:en:Cities in Scotland be subcats within Category:en:Cities in the United Kingdom? And if so, how do I go about making them so? Quercus solaris (talk) 01:06, 29 April 2025 (UTC)Reply

Relatedly, Category:en:Cities in the Isle of Man is currently listed as a subcat of Category:en:Cities in the United Kingdom, but speaking precisely, the Crown Dependencies are not constituent countries of the United Kingdom, and their cities are not, precisely speaking, in the United Kingdom. Quercus solaris (talk) 01:10, 29 April 2025 (UTC)Reply
Not a stupid question. I redid the category parent system a few weeks ago so that in general 'PLACETYPES in/of FOO' goes in first parent 'FOO' and second parent 'PLACETYPES in/of BAR' where BAR is the container of FOO. So you'll see for example, Category:Cities in Arizona, USA going under Category:Cities in the United States. However, if FOO is a country or country-like entity, the second parent is instead 'PLACETYPES' so that e.g. the second parent of Category:Cities in the United States is just Category:Cities rather than Category:Cities in North America (that last category shouldn't exist but it does because of the single entry Mexko in Central Huasteca Nahuatl, which defines this explicitly as a city in North America rather than a city in Mexico; I should fix both the entry and the categorization system so it won't categorize in such a situation). Due to a decision going back into the depths of time, England, Scotland and the like are made to perform like countries rather than administrative divisions, which is why they aren't in Category:Cities in the United Kingdom; it may have been @Donnanz who made a request of this sort since he works a lot on toponyms in the UK. But this is definitely open to change. As for all the dependent territories, they're more or less grouped into their own group so I can change group properties; maybe for example, Category:Cities in the Isle of Man, Category:Cities in the Falkland Islands and the like should have second parent Category:Cities instead of Category:Cities in the United Kingdom. If I do this at the group level it will likewise affect Puerto Rico, Guam, etc. but it can be done at the individual territory level. You're welcome to take a broader look at the current category structure and make some suggestions; I am not well versed in the subtleties of dependent territories (which I imagine differ from territory to territory). Benwing2 (talk) 19:42, 29 April 2025 (UTC)Reply
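The parent rule described above can be sketched as follows. This is a simplified Python illustration, not the actual Module:place code; the `CONTAINER` and `COUNTRY_LIKE` tables hold only the examples mentioned in this thread, and the fallback for places with no known container is an assumption.

```python
# Hypothetical containment data limited to the examples in this thread.
CONTAINER = {
    "Arizona, USA": "the United States",
    "the United States": "North America",
}

# Entities treated like countries, so their second parent is the bare placetype.
COUNTRY_LIKE = {"the United States", "England", "Scotland"}


def parents(placetypes: str, prep: str, place: str) -> tuple[str, str]:
    """Return (first_parent, second_parent) for 'PLACETYPES in/of PLACE'."""
    first = place
    if place in COUNTRY_LIKE or place not in CONTAINER:
        # Country-like entity (or, as an assumption, unknown container).
        second = placetypes
    else:
        second = f"{placetypes} {prep} {CONTAINER[place]}"
    return first, second
```

Here `parents("Cities", "in", "Arizona, USA")` yields `("Arizona, USA", "Cities in the United States")`, whereas `parents("Cities", "in", "England")` yields `("England", "Cities")`, which is why the England and Scotland categories do not currently sit under Cities in the United Kingdom. Removing them from `COUNTRY_LIKE` and adding a container entry would be the analogue of the change being asked about.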
That is not a stupid question at all. And yes, you bring up a good point. The way to do this is by editing modules. As for your point about the Channel Islands, granted, but I don't know that anyone is going to change this system based on the technicality. I'll leave that up to others, but try to fix the England/Scotland thing (as well as Northern Ireland and Wales). —Justin (koavf)TCM 02:00, 29 April 2025 (UTC)Reply
It looks like in Module:place, FIXME #25 addresses this and is related to FIXME #15 (which itself is resolved). So there may be some deeper reason why this hasn't been fixed yet. I'll see what other kinds of feedback we get here before editing unilaterally. —Justin (koavf)TCM 02:05, 29 April 2025 (UTC)Reply
@Benwing2: The only cities in the UK are in the constituent countries: England, Wales, Scotland and Northern Ireland. The Isle of Man, although it has the Ordnance Survey grid system, unlike Ireland which has a different grid system, is a Crown dependency. I think treating Douglas as a capital city is erroneous, it's the capital, sure, but no more than the largest town on the IoM (I visited it years ago). Whether it has city status, similar to official cities in the UK, which have royal approval, I don't know. The Falkland Islands are a British overseas territory, the capital of Stanley is a town, not a city (2,000 or so people).
I think the Channel Islands have categories for the islands, Jersey, Guernsey etc., and none for the island group. DonnanZ (talk) 20:21, 29 April 2025 (UTC)Reply

Vote on proposed modifications to the UCoC Enforcement Guidelines and U4C Charter

The voting period for the revisions to the Universal Code of Conduct Enforcement Guidelines and U4C Charter closes on 1 May 2025 at 23:59 UTC (find in your time zone). Read the information on how to participate and read over the proposal before voting on the UCoC page on Meta-wiki.

The Universal Code of Conduct Coordinating Committee (U4C) is a global group dedicated to providing an equitable and consistent implementation of the UCoC. This annual review was planned and implemented by the U4C. For more information and the responsibilities of the U4C, you may review the U4C Charter.

Please share this message with members of your community in your language, as appropriate, so they can participate as well.

In cooperation with the U4C --

Initialisms

What is the policy on including initialisms? Do we include all that meet the general criteria, or only when we have an entry for the expanded form?

I looked for Wiktionary:Initialisms and Wiktionary:Initializms before asking here, but found nothing. Andy Mabbett (Pigsonthewing); Talk to Andy; Andy's edits 15:04, 29 April 2025 (UTC)Reply

Initialism defs are allowed to point straight to the Wikipedia entry when no Wiktionary entry exists (whether 'not yet' or 'not ever' — either one). Thus, either {{init of|en|foo bar bar}} or {{init of|en|w:foo bar bar}} or {{init of|en|[[foo]] [[bar]] [[bar]]}}, ideally in that order of preferability. I don't know where this fact is documented, if at all, but it is currently de facto true in thousands of entries. Quercus solaris (talk) 18:55, 29 April 2025 (UTC)Reply
Thank you. Is this good: LIDO?
Should it link to lido and vice versa? Andy Mabbett (Pigsonthewing); Talk to Andy; Andy's edits 19:19, 29 April 2025 (UTC)Reply
Looks good, thanks. I touched it up with an edit (diff), which shows how {{also}} can be used at top of page. The "senseid" element is optional, so don't sweat it if you don't care to. Quercus solaris (talk) 19:38, 29 April 2025 (UTC)Reply
Is that "lightweight information that describes objects" or "objects of the lightweight information-describing variety"? The quaint, dated custom of using hyphens to make life simpler for readers would help. DCDuring (talk) 22:40, 29 April 2025 (UTC)Reply
The contact details for the LIDO Working Group are on their website. No doubt they'll be glad to hear your views. Andy Mabbett (Pigsonthewing); Talk to Andy; Andy's edits 07:44, 30 April 2025 (UTC)Reply
  Done DCDuring (talk) 13:49, 1 May 2025 (UTC)Reply

I have written up the above guidance, at Wiktionary:Initialisms (to which the "z" spelling also redirects). Andy Mabbett (Pigsonthewing); Talk to Andy; Andy's edits 11:28, 30 April 2025 (UTC)Reply

Is there a reason for creating Wiktionary:Initializms? The z spelling gets all of 0 hits on Google Books and 13 hits on Google Web; is it anything other than an exceedingly rare misspelling? - -sche (discuss) 22:52, 30 April 2025 (UTC)Reply
I assumed it was the American spelling; no issue with deletion if I got it wrong. Andy Mabbett (Pigsonthewing); Talk to Andy; Andy's edits 11:21, 1 May 2025 (UTC)Reply
That spelling difference affects -ise/-ize and -isation/-ization, never -ism. 2A00:23C5:FE1C:3701:701F:B25:3261:382A 13:53, 1 May 2025 (UTC)Reply
Except z possibly productive in this use in AAVE or similar? DCDuring (talk) 14:00, 1 May 2025 (UTC)Reply

Latin capitalization

I see no policies for capitalization of Latin entries set out at Wiktionary:Latin entry guidelines or Wiktionary:Entry layout. Obviously, Classical Latin had no case distinction. It seems the earliest bicameral Latin texts may have arisen in Carolingian or even Merovingian handwritten texts, but I think it would be really tough to verify their usage. So I think there are two practical methods that we can follow: a) describe the usage that can be observed in printed Latin texts b) ignore that usage and just follow our own rules of capitalization based on logic, e.g. "Capitalize all proper nouns, lowercase all adjectives and common nouns". I'm inclined to go with following the usage of printed texts (whether editions of ancient authors, or original New Latin works), with entries for alternative case-forms when there are multiple in use. That would however mean that we would have capitalized entries for various words other than proper nouns, such as certain adjectives or common nouns referring to nationalities, ethnicities, locations, or in some cases types of mythological beings. As far as I'm aware, most English-Latin dictionaries (and at least some non-English ones) do use capitalization for words other than just proper nouns; for example, looking at Logeion, we see capitalization of Harpyia "a Harpy/harpy", Acherusius "Pertaining to Acheron", Ianuarius "January/Pertaining to January", Latinus "of Latium", for the indexed dictionaries other than the digitized Latino-Sinicum (it looks like in the original print version of this dictionary, all the entries were capitalized, and so a case distinction was not available when it was digitized). Urszag (talk) 20:29, 29 April 2025 (UTC)Reply

@Urszag I oppose having capitalised forms for adjectives, as they're trivial, represent an artificial post-Classical distinction that serves no practical benefit, and they present a maintenance burden. Theknightwho (talk) 20:37, 29 April 2025 (UTC)Reply
Having a single clear-cut rule has some advantages. I think it tends to mislead readers about the actual conventions typographers tend to follow for Latin text. Also, it would require always placing the main entry for nationality adjectives on a separate page from the main entry for nationality singular nouns, e.g. Hispānus; this is certainly doable, as in Romance languages, but I think in that case it adds a maintenance burden. (A related issue is the lemmatization of these nouns at singular vs. plural forms. Traditionally, Latin dictionaries use the masculine plural rather than the singular as the lemma of nationality/tribe-name nouns: the nominative singular forms are much less common, and I think in some cases not even attested as nouns in Classical texts, which could be related to the general avoidance in Latin of using nominalized adjectives in the masculine nominative singular form; e.g. bonus tends not to be used by itself with the sense "a good man", even though "boni" is often used with the sense "good men").--Urszag (talk) 21:00, 29 April 2025 (UTC)Reply
@Urszag Do we need to capitalise ethnonyms? That also seems to be a holdover from English capitalisation rules. Theknightwho (talk) 21:27, 29 April 2025 (UTC)Reply
Right, I keep forgetting that ethnonym nouns are not considered to be proper nouns (I think I get confused because they're capitalized in French). In that case, they would be lowercase according to the rules that you favor, which does make things simpler. I don't think "holdover from English capitalization rules" is an accurate diachronic description of how the Latin convention came to exist.--Urszag (talk) 21:37, 29 April 2025 (UTC)Reply
Following that convention, we'd write the first part of Caesar's Commentarii de Bello Gallico as follows: "Gallia est omnis divisa in partes tres, quarum unam incolunt belgae, aliam aquitani, tertiam qui ipsorum lingua celtae, nostra galli appellantur." It's not unreadable, but my opinion is that lowercase ethnonyms in Latin look strange and don't read as smoothly as using the normal capitalization. There are already cases where we make concessions to common conventions for the sake of readability (such as using punctuation, distinguishing "v" from "u", and not distinguishing "i" and "j").--Urszag (talk) 06:36, 30 April 2025 (UTC)Reply
@Urszag Well, I think we need a discussion about the i/j distinction, too. Theknightwho (talk) 09:04, 30 April 2025 (UTC)Reply
This Latin sentence caught my eye as I stumbled across this thread, and reading through it (unaware that you were discussing ethnonyms) I didn’t notice anything strange about it at all.
Lowercase demonyms may be at odds with English orthography, but they are in line with the orthography of the Romance languages (although there is some variation in French). Nicodene (talk) 15:38, 30 April 2025 (UTC)Reply
I am of two minds here. I like the logic behind @Theknightwho's suggestion of only capitalizing proper nouns (i.e. basically names), but at the same time as a general principle we should try to follow what other dictionaries do, esp. if there is a consensus. Logeion seems to show that all cited dictionaries on the site capitalize demonyms/ethnonyms and related terms (e.g. Hispanē (in a Hispanic manner)) except for Latino-Sinicum. (Du Cange shows up for terms like hispanus but that's because this dictionary writes headwords in all caps. The actual citations given for hispanus do capitalize the term.) I assume that most Latin dictionaries capitalize demonyms and derived terms because that's what Medieval and modern writers tend to do. (And I would guess that English capitalization rules for demonyms came from Latin rather than vice versa.) So on the balance I think we should go with capitalizing demonyms and derived terms even if it's somewhat illogical; if we do it the other way, we'd need soft redirects all over the place in any case from the capitalized to the lowercase versions, which would be somewhat of a pain. Benwing2 (talk) 23:58, 29 April 2025 (UTC)Reply
@Benwing2 Well, we'd need soft redirects either way. Look at the derived terms ("Translingual descendants") on aegyptiacus. Theknightwho (talk) 00:09, 30 April 2025 (UTC)Reply
As far as I know, we've been more inclined to follow editors' standards. I have moved some pages in the past (Camēnālis) and began making a list of should-be capitalized nouns and adjectives, I lacked systematicity in noting them but know for a fact it is all over the place. Saumache (talk) 06:21, 30 April 2025 (UTC)Reply
I definitely think we should have a policy that goes beyond whatever an editor feels like. It seems pretty straightforward to use the practice of other dictionaries and of the published sources used for quotations as a criterion; e.g. the linked Lewis and Short entry writes Cămēnālis, so we can say that justifies having a capitalized entry for this word.--Urszag (talk) 06:36, 30 April 2025 (UTC)Reply
I meant publishing standards, I am all for laying out and enforcing policies. Theoretically, any word derived from a capitalized proper noun should be upper case as well, that is at least the rule I have been accustomed to reading Classical and Medieval Latin works in modern editions, which Wiktionary users are more likely to be reading than manuscripts, all with differing spelling standards. It's the same old issue that keeps getting brought back in various fora for word-by-word discussion (Wiktionary:Tea room/2025/March § quirinalis). Saumache (talk) 09:01, 30 April 2025 (UTC)Reply
Ah, I misunderstood what you meant by "editors' standards". Thanks for the clarification.--Urszag (talk) 09:48, 30 April 2025 (UTC)Reply
@Urszag I strongly oppose blindly following other dictionaries. There needs to be some systematisation to it. Theknightwho (talk) 09:06, 30 April 2025 (UTC)Reply
In my opinion, using prior dictionaries as a reference and practical criterion isn't a matter of "blindly following" anything. The fundamental principle behind this strategy would be documenting the usage of modern edited publications—just as we do for capitalization in languages such as English—rather than imposing some novel scheme, invented by us, that may not be attested in any Latin text that our readers are likely to see. So if it happens that dictionaries somehow make a mistake and contain a capitalized entry for a word that isn't actually capitalized in attested Latin publications, then we should correct that mistake, but I highly doubt that errors in this regard will be any more common than errors in e.g. noun genders or definitions. There are certainly generalizations that could be made on the basis of that data, which it might be helpful to record somewhere, but the philosophy here would not be to start with rules and decide how to capitalize entries based on that, but to start with usage, which I think is easy to observe in most cases.--Urszag (talk) 09:48, 30 April 2025 (UTC)Reply
In other words, I'm proposing the primary criterion "if a word appears capitalized in Latin text, have a capitalized entry for it on Wiktionary. If it appears uncapitalized, have an uncapitalized entry." I think we can generally rely on dictionaries to accurately indicate which words are usually capitalized. As with other aspects of spelling, entries would be subject to RFV if someone suspects the indicated usage doesn't actually exist.--Urszag (talk) 09:58, 30 April 2025 (UTC)Reply
@Urszag The issue is that we are always going to be using an artificial scheme of capitalisation, because Classical Latin - in which we will find a large number of attestations - did not have a capitalisation distinction. We are also a secondary source, not a tertiary source, so we are not bound by the same limitations as Wikipedia in blindly (and yes, it is blindly) following what other publications do simply because that is what they do.
I am not opposed to us including alternative entries with capitalisations if other users feel there’s a need for that (though I’m not sure I do), but we are at liberty to make the same editorial choices as the authors of all those other dictionaries who have chosen which entries they capitalise or not. This feels like an area in which we are prioritising a somewhat arbitrary distinction at the expense of usability for the reader, who likely does not care one bit for whether we capitalise the headword or not. Theknightwho (talk) 10:13, 30 April 2025 (UTC)Reply
Also, to add to this: there is an important difference between us and print dictionaries (and their electronic versions), which is that print dictionaries are laid out with the entries one after the other, so capitalisation does not present any kind of impairment to a reader finding the entry they are looking for. By contrast, our choice of capitalisation affects findability, because a reader is far more likely to find entries by typing them into the search bar, and we lack the ribbon which lays out entries in alphabetical order to one side. Capitalisation can be expected under certain circumstances (e.g. proper nouns), but I'm not convinced that that translates over to adjectives and adverbs. An entry at Aegyptiacus is not helpful for a user looking up the second word of Spinosaurus aegyptiacus, for instance. Theknightwho (talk) 11:04, 30 April 2025 (UTC)Reply
Binomial nomenclature has its own fixed rules for capitalization. I agree that capitalization affects findability. That's why I think it is best for us to use capitalizations that match the conventions used by the majority of Latin documents, even if this is less simple than using our own bespoke rule system. I'm not certain about the best way to implement this criterion in practice, and I'm fine with having guidelines to ensure consistency with which entry we set as the main and which as the soft redirect when both case-forms are used.--Urszag (talk) 21:27, 30 April 2025 (UTC)Reply
I think even the idea to only capitalise proper nouns runs into the problem that proper nouns are defined differently in different languages. Are names of languages proper? Are names of people? Is there a difference between a name of the people and the name of a country for the speakers? Note how Ingrian does make a difference between soomi and Soomi, but that it also struggles in written text to make a distinction between soomen (Finnish) and Soomen (Finnish). Or how Polish Niemcy is plural.
Is a name really a name or is it a nickname? Do nicknames get capitalised? What about metaphors? Where's the line?
So I think this is not as easy as "proper nouns", we need to define it further. This is also partly why I personally would favour lemmatising at the all-caps no-ujg Roman script, as fewer distinctions technically give us less freedom to impose our own biases on the system. Thadh (talk) 11:32, 30 April 2025 (UTC)Reply
Given the absence of a universal modern convention, I also find myself inclined to drop the upper/lower-case distinction and revert to following Roman practice, a solution which could also finally decide the matter of j/v versus i/u. Nicodene (talk) 12:44, 30 April 2025 (UTC)Reply
We are not only a dictionary of Classical Latin, though. New Latin is covered as well, and New Latin is usually written bicamerally. Communicating capitalization norms is relevant to anyone who wants to read or write in New Latin or who wants to read modern editions of Latin texts. Using spelling that diverges from New Latin authors and editors could pose an obstacle to our readers, albeit not an insurmountable one. I think it’s safe to assume that almost none of our readers will encounter Latin primarily in the form of ancient unicameral inscriptions and manuscripts.--Urszag (talk) 21:27, 30 April 2025 (UTC)Reply
I strongly agree with @Urszag and would strongly oppose moving to unicameral Latin lemmas. Benwing2 (talk) 21:30, 30 April 2025 (UTC)Reply
I also agree with Urszag here; in particular, I would find it unintuitive and unprofessional/wrong-seeming to find (say) Romulus and Jupiter rendered unicamerally as romulus and iuppiter. (But if a significant body of texts render them that way, I would not personally have any opposition to creating soft redirects from those and other unicameral titles.) - -sche (discuss) 22:48, 30 April 2025 (UTC)Reply
I agree as well: complete unicameralism seems unhelpful, though soft redirects are a good idea. Theknightwho (talk) 01:43, 1 May 2025 (UTC)Reply
Which capitalization norms? The norm of capitalizing demonyms or the norm of not capitalizing them? What about adjectives that coincide with demonyms, titles of people or divinities, sobriquets or noms de guerre, non-proper nouns of religious significance in Christianity, days of the week or months?
I don’t see what’s so bad about writing ⟨iuppiter⟩ or ⟨ivppiter⟩, as native speakers actually did, but if modern-style capitalization is a must then I suppose we’re left to choose between:
1) Making up a set of rules ourselves.
2) Following whatever rules happen to be used in some modern source that publishes extensively and carries some kind of authority (the Vatican?)
Nicodene (talk) 04:51, 1 May 2025 (UTC)Reply
Regarding the "where's the line?" question, assuming we do go with the rule of only capitalizing proper nouns, it would be as simple as using the same criteria that we use for the part-of-speech header "Proper noun".--Urszag (talk) 22:06, 30 April 2025 (UTC)Reply
Agreed. Theknightwho (talk) 01:08, 1 May 2025 (UTC)Reply
@Urszag: have you not read the rest of the comment I posted? That line is different for different languages. We will need to invent one for Latin. Thadh (talk) 05:33, 1 May 2025 (UTC)Reply
My point is that these criteria are needed in any case, unless you're proposing that we forbid the use of the POS header "Proper noun" from Latin and convert all of its existing uses to "Noun". That would be another change to the established style for Latin entries. I'm working now to add a summary of what kinds of terms in Latin are proper nouns to Wiktionary:Latin entry guidelines.--Urszag (talk) 05:39, 1 May 2025 (UTC)Reply
@Urszag: if we went with a lack of capitalisation we could, yes. But if you're willing to make an exhaustive list of what makes a Latin proper noun, then I guess it's fine as well. Thadh (talk) 05:42, 1 May 2025 (UTC)Reply
Why would we change the part of speech? We’d just be decapitalizing the first letter (or capitalizing/small-capping the other letters) in all Latin lemmas where they, for whatever reason, are capitalized currently. Including things other than proper nouns, like Februarius, Hispanus (the adjectives). Nicodene (talk) 21:09, 1 May 2025 (UTC)Reply
@Nicodene: Thadh asked "Where's the line?" between nouns and proper nouns in Latin and suggested that the distinction is unclear. I was simply responding that per current practice, we need to draw that line to determine the part-of-speech header, so decapitalizing by itself would not eliminate the need to answer that question. If we can't provide consistent guidelines for answering that question, that would constitute a reason for not only decapitalizing, but also getting rid of the part of speech "Proper noun" in Latin entries. But I am optimistic that we can identify reasonable rules. I have edited Wiktionary:Latin_entry_guidelines#Proper_nouns to add some guidelines that I believe will not be controversial. However, some cases may be more difficult, such as names of holidays, religions, doctrines, or political movements.--Urszag (talk) 21:32, 1 May 2025 (UTC)Reply
I see.
One might try solution #2 above and choose a specific source—like publications from the Vatican, or one of the aforementioned Latin dictionaries—as a point of reference for capitalization, or for orthography overall. Then it’s just a matter of describing what kinds of words that source happens to capitalize, which may not fit into a neat grammatical rule. Nicodene (talk) 22:21, 1 May 2025 (UTC)Reply
@Nicodene I suspect we can probably fold the capitalisation issue into the multiple-spelling issue, in the sense that transclusion is probably the way to go. Theknightwho (talk) 22:31, 1 May 2025 (UTC)Reply
  • I haven't read the entire above discussion, but I've never seen a modern edition of De Bello Gallico that didn't capitalize the demonyms, regardless of the nationality of the editor. Looking at the Wikipedia articles about that work in all the major modern Romance languages, I see that Spanish, Catalan and Romanian do not capitalize the demonyms, while French, Portuguese and Italian do capitalize them, so the modern languages are split 50/50 on the issue. I would definitely find it jarring to see nouns like Hispanus, Gallus, Celta and Germanus lower case. I'd prefer the corresponding adjectives to be capitalized as well, but seeing them lower case is much less jarring than seeing the nouns that way. —Mahāgaja · talk 13:57, 6 May 2025 (UTC)Reply

The i/j distinction in Latin

The current practice in Latin entries is to make no distinction between i and j in entries; instead, we use i in all circumstances. I think we should make the distinction, for a few reasons:

  1. It leads to ambiguity. For instance, the term adjuvō currently has its main entry at adiuvō, but the lack of i/j distinction means that adiuvō is ambiguous as to whether it is 3 syllables (ad-iu-vō) or 4 syllables (ad-i-u-vō). This same ambiguity affects Iēsūs, which notes that it has both 2-syllable (Iē-sūs) and 3-syllable (I-ē-sūs) readings. This ambiguity does not arise in our entries due to the u/v distinction, where we distinguish servit (2 syllables: ser-vit) and seruit (3 syllables: se-ru-it).
  2. It is inconsistent with how we handle u and v. Classical Latin made neither distinction, meaning that u/v were represented by V, and i/j were represented by I (e.g. Venus was VENVS, and Jēsūs was IESVS). Our page WT:Latin entry guidelines states that this is because "the distinction between I and J only appears post-Classical Latin", but the same also applies to U/V, so I'm uncertain why the current editorial practice was chosen. My suspicion is that it's because this is a popular practice in modern scholarly editions.
  3. However, what makes sense for scholarly editions is not necessarily what makes sense for us, because our primary aim is not faithfulness to the original source material, but to make phonemic distinctions as clear as possible to readers. As a point of comparison, scholarly publications rarely include macrons, but that does not mean we should exclude them, because they represent an important phonemic distinction that existed in Classical Latin. Likewise, the distinction between i and j represents an important distinction that we should not be glossing over.
  4. Pronunciation sections are inadequate for making the distinction, just as they're inadequate for giving the length distinction shown by macrons. Most readers can't read IPA, and we shouldn't assume that any can.
  5. There's nothing stopping us giving soft redirects at alternative spellings anyway.

Theknightwho (talk) 11:41, 30 April 2025 (UTC)Reply

I agree that our current practice of reserving v for the consonant and u for the vowel(s), while making no corresponding distinction between j and i, is illogical. I would strongly prefer a consistent alternative, whether that means making both distinctions or neither of them. I am a bit more inclined to distinguish neither, for the reasons mentioned in the thread above this one, but your point about indicating important phonemic distinctions—in a way that readers unfamiliar with IPA can understand—is valid. I think it bears mentioning though that applying the latter principle consistently would also mean distinguishing vowel length in lemmas (not just in headwords) as well as distinguishing diphthongs from sequences of monophthongs.
ETA: given that we handle vowel length with diacritics in the headword, couldn’t we also do that with glides? For instance ⟨iam⟩ for the lemma and ⟨i̯am⟩ for the headword. Nicodene (talk) 13:55, 30 April 2025 (UTC)Reply
Are there many cases where whether something is a glide or a vowel is not clear from the phonological structure of the word though? It may be more worthwhile to include diacritics for hiatus (aï, aü?) than the other way around. Thadh (talk) 14:02, 30 April 2025 (UTC)Reply
There are indeed far fewer unpredictable cases, such as iambus with /i/ or belua with /u/, than predictable ones. Incidentally it seems iambus is already written with a diaeresis in the headword, as you suggest, but not (yet) Iason, io, or iulus.
As for the letters C and G, they may not have been distinguished in Old Latin but they were in Classical Latin (and ever after). Nicodene (talk) 14:46, 30 April 2025 (UTC)Reply
In that case as TKW says - if we do split off Old Latin, we should have no distinction there, if we don't, we don't. Thadh (talk) 15:36, 30 April 2025 (UTC)Reply
@Nicodene What would be the benefit of i̯am? I think this could be covered by my past suggestion of having a box that shows the various different orthographies. I'm not sure that making no distinction would be of benefit for the average user, however, who is likely to find the lack of distinctions to be confusing and/or actively unhelpful in a way that the macronless spelling likely wouldn't be.
My main issue with the alternative suggestions (namely, i̯ and ï) is that they represent other ways of making the same distinctions that I've suggested we make here but in a way that is less familiar to the average user (especially in the case of i̯). I'm not entirely sure what removing the distinction in favour of these would achieve. Theknightwho (talk) 14:49, 30 April 2025 (UTC)Reply
Consistency. We otherwise follow typical Roman orthography in our lemmas. It’s not clear to me why the distinction between /j w/ and /i u/ would be more deserving of a special exception than the one between long and short vowels, or the one between diphthongs and adjacent monophthongs. Or why some of these should be indicated in the lemma while others are left to the headword. I’d be inclined to treat them all the same way, whatever that may be.
I suppose not everyone knows what a diaeresis is for, but by the same token not everyone knows what a macron is for either, yet we do use them. Nicodene (talk) 15:21, 30 April 2025 (UTC)Reply
As I said in the above post, I would personally prefer going towards not distinguishing the two, and probably not distinguishing c and g, either. Thadh (talk) 14:00, 30 April 2025 (UTC)Reply
@Thadh If we were to split out Old Latin, which I think we probably should, then I agree with you. Theknightwho (talk) 14:50, 30 April 2025 (UTC)Reply
  Support, it's easier to remove, than add, information. Lemmatizing at spellings with the i/j u/v distinction, and generating the spellings without the distinction, would also be better for search results. — BABRtalk 16:20, 30 April 2025 (UTC)Reply
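To illustrate the point about removing versus adding information, here is a minimal Python sketch of deriving the less-distinguished spellings from a lemma stored with the full i-j-u-v distinction. The function names are hypothetical, and the uppercase handling for the i-u scheme (U written as V) follows the convention mentioned elsewhere in this thread; it is an assumption, not an existing Wiktionary module.

```python
import unicodedata

# Combining macron and combining breve: the length marks used in headwords.
LENGTH_MARKS = {"\u0304", "\u0306"}


def strip_length_marks(word):
    """Remove macrons and breves but keep other diacritics (e.g. diaeresis)."""
    decomposed = unicodedata.normalize("NFD", word)
    kept = "".join(ch for ch in decomposed if ch not in LENGTH_MARKS)
    return unicodedata.normalize("NFC", kept)


def spelling_variants(lemma):
    """Derive plain spellings in three schemes from an i-j-u-v lemma."""
    plain = strip_length_marks(lemma)
    # i-u-v scheme: j merges into i, the u/v distinction is kept.
    iuv = plain.translate(str.maketrans("jJ", "iI"))
    # i-u scheme: lowercase v merges into u; uppercase U is written V.
    iu = iuv.translate(str.maketrans("vU", "uV"))
    return {"i-j-u-v": plain, "i-u-v": iuv, "i-u": iu}


print(spelling_variants("adjuvō"))
print(spelling_variants("Jēsūs"))
```

The mapping only works in this direction: from adiuvo alone there is no way to tell whether the i is a glide or a vowel, which is the ambiguity described in point 1 of the proposal.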
Hello! I am happy to see interest in general infrastructure for Latin, which has not been very lucky in this regard. Nonetheless, I support the current normalisation. To address the remarks you made:
  • 1–2. "It leads to ambiguity" and "[i]t is inconsistent". Yes, though that is true for many, if not most, languages. The i-u-v scheme we are using is, in my experience at least, the most used among both modern and not-so-modern publications. The consistency and/or unambiguousness of alternative schemes does not, in my opinion, make up for their being less common. As a side note, the difference in opinions here may depend on educational background. In Italy what you are suggesting would not have been taken seriously, the scheme used is unshakably i-u-v and has been that for quite some time and I suspect that is the case for most of Europe, while it seems that the i-j-u-v scheme gained wider use in English-speaking contexts than it did here.
  • 3. "As a point of comparison, scholarly publications rarely include macrons […]" While running text may lack macrons, I do not recall seeing a modern dictionary without them. Lexicographical diacritics, a cross-linguistic concept, are not comparable to orthographical normalisation choices. A better analogy would be Nicodene's proposed i̯am, which, although I have never seen it elsewhere, does look compelling. "[O]ur primary aim is not faithfulness to the original source material, but to make phonemic distinctions as clear as possible to readers." I disagree with this. It is a legitimate viewpoint, especially among languages with a less crystallised orthography, but not a universal principle, and in this context I do not think it is the best course of action for a general-purpose resource like Wiktionary.
  • 4. "Pronunciation sections are inadequate for making the distinction." Pronunciation sections are by definition the best place to hold pronunciation information. If IPA is too technical we could find another way to show it, like has been done for English, without changing the spelling.
I think this proposal puts too much weight on historical accuracy and not enough on the way Latin has been taught and used for the last two thousand years: Latin never died. Catonif (talk) 16:42, 30 April 2025 (UTC)Reply
I suspect that the i-u-v scheme came from Italy to begin with, judging by the identical one used for writing the Italian language itself.
Perhaps this would be a good time to compare the lemmatization practices of the ‛big boys’ in Latin lexicography:
  • DMLBS: iambus, juvencus, eruere, cervix (i-j-u-v)
  • L&S: ĭambus, jŭvencus, ē-rŭo, cervix (ĭ-j-ŭ-v)
  • MLLM: (?), juvenculus, eruere, cervicatus (i?-j-u-v)
  • OLD: ĭambus, iuuencus, ēruō, ceruix (ĭ-i-u)
  • TLL: ĭambus, iuvencus, ēruo, cervīx (ĭ-i-u-v)
Nicodene (talk) 22:51, 30 April 2025 (UTC)Reply
This favours a four-way distinction. In principle, I'm not opposed to using ĭ to denote syllabic i, but it seems unwise to rely on the presence or absence of a breve, given that editors frequently omit length information when it isn't readily available. Theknightwho (talk) 23:39, 30 April 2025 (UTC)Reply
Thank you for the overview, which is exactly what the discussion should have had to begin with. It seems that the i-j-u-v scheme has a greater popularity in English-language lexicography (and even Gaffiot 2016, mentioned by Benwing!) than I imagined. And yes, the i-u-v scheme likely developed in the context of Italian, so take my opinion with a grain of salt. Nonetheless, TLL's approach would be my favourite, keeping the i-u-v orthography while indicating the phonemic distinction (Iēsus vs. Ĭēsus). As a side note, perhaps we could use the breve for other instances of hiatus as well, e.g. coĕmō instead of coëmō, although this may be less understandable. Catonif (talk) 19:37, 1 May 2025 (UTC)
I Support making the i/j distinction as proposed by @Theknightwho, consistent e.g. with Gaffiot 2016. Benwing2 (talk) 21:32, 30 April 2025 (UTC)
  Oppose: I don't have a strong personal preference, but I think it's best to continue using the i-u-v scheme because I agree with Catonif that it is the most widely used. I strongly expect therefore that most of our users will prefer it and find it the most familiar system. Alatius ran a poll around 2010 surveying 251 users of certain online Latin discussion forums, and apparently found the i-u-v scheme to be "by far the most popular", although unfortunately I think the images showing the precise poll results have been lost: "Survey of Latin orthography preferences", Alatius, 2010, Textkit Greek and Latin. I think that it is very unusual nowadays to write Ecclesiastical Latin without the u-v distinction, so the i-u scheme feels somewhat biased against this form of Latin. I know that some academics have adopted i-u in recent publications, but I think it is still rare in textbooks and introductory learning materials. Also, I think most people who use the i-u scheme in lowercase use the I-V scheme in uppercase (i.e. the uppercase counterpart of "u" is "V"), which is an added complication that would be tricky for us to handle. Therefore, my second-place preference would be i-j-u-v.--Urszag (talk) 22:32, 30 April 2025 (UTC)Reply
I would also oppose any change that eliminated the u/v distinction. However, whatever we decide, we absolutely need to find a consistent way to handle the i/j ambiguity, because the current approach is inadequate. Theknightwho (talk) 23:32, 30 April 2025 (UTC)Reply
I don't think we can always represent both conventional Latin spelling and Classical Latin pronunciation with a single headword spelling in a way that looks natural and isn't a mess of diacritics. We don't do that for other languages; I realize Latin is a little different since we can't illustrate Classical Latin pronunciation using audio files in the way that we can for e.g. English, but I still think that we should take advantage of having a separate dedicated pronunciation section, as Benwing mentioned. Even if we use i-j-u-v, there's still cases where aspects of pronunciation will not be apparent just from spelling. There are words like obiciō, where the standard spelling uses a single letter "i" to represent the consonant "j" followed by the vowel "i". There are words like biiugus, where the consonant "j" is single between two vowels because of the prefix-base boundary, in contrast to cases like eius where the consonant "j" is pronounced double. There are words like abripiō, where the "br" is always split across syllable boundaries in Classical Latin pronunciation because of the prefix-base boundary, in contrast to the "br" in a word like celebrō, where both consonants are normally pronounced together at the start of a syllable as a complex onset cluster. There are words like illūc, with stress on the final syllable. If we go with "Most readers can't read IPA, and we shouldn't assume that any can", I think the solution to that would be to present a non-IPA respelling in the pronunciation section, like Template:enPR for Latin. So we would have adiuvō on the headword line, but something like "AD-ju-vō" or "ád-ju-vō" in the pronunciation section before the IPA transcriptions (currently /ˈad.i̯u.u̯oː/, [ˈäd̪i̯uː̯oː], although I personally would prefer that we revise Latin IPA to use "j" and "w" instead of "i̯" and "u̯"; you can see that the implementation is currently buggy and incorrectly fuses sequences of vowel + semivowel in phonetic transcriptions). 
I think having Module:la-IPA generate such a non-IPA respelling would be pretty straightforward.--Urszag (talk) 00:08, 1 May 2025 (UTC)Reply
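To illustrate the kind of respelling discussed above, here is a minimal sketch in Python (not the Lua that Wiktionary modules are written in). It assumes that the syllable division and the position of the stress have already been worked out elsewhere; the function name `respell` and its parameters are hypothetical and do not correspond to anything in Module:la-IPA.

```python
def respell(syllables, stressed_index):
    """Join syllables with hyphens, upper-casing the stressed one.

    `syllables` is a list of strings in headword spelling (macrons kept);
    `stressed_index` is the 0-based position of the stressed syllable.
    Both are assumed to be supplied by the caller.
    """
    if not 0 <= stressed_index < len(syllables):
        raise ValueError("stressed_index out of range")
    return "-".join(
        syl.upper() if i == stressed_index else syl
        for i, syl in enumerate(syllables)
    )


print(respell(["ad", "ju", "vō"], 0))  # AD-ju-vō
```

Marking stress with an acute instead of uppercase would only change the line that transforms the stressed syllable; the hard part (syllabification and stress placement) is deliberately left out of this sketch.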
Actually, here's a somewhat more radical proposal: I think it might be good to eliminate IPA phonemic transcriptions from Module:la-IPA. I don’t think most readers know what the difference is between // and [] anyway: I’ve seen people elsewhere online misunderstand our entries as displaying two separate pronunciations. (For example, the author of this recent Reddit comment thought it was an alternative pronunciation: “I see these pronunciations listed for classical Latin: /kon.stan.tiːˈno.po.lis/, [kõːs̠t̪än̪t̪iːˈnɔpɔlʲɪs̠] A little googling showed me that the lowercase j in superscript position means palatalization, so I guess that was an alternative pronunciation sometime, somewhere.”) It may be clearer to use an obvious non-IPA respelling showing phonemes and syllable divisions, followed by (reasonably broad) phonetic transcriptions for Classical Latin and Ecclesiastical Latin respectively. That way, we also can avoid some tricky questions such as what the phonemic identity of word-final -m was in Classical Latin, and whether assimilations such as bs > [ps] operate on the phonemic or phonetic level. Going back to adiuvo, my proposal would be for its pronunciation to be displayed as follows: "AD-ju-vō, Classical Latin IPA(key): [ˈäd̪juwoː], modern Italianate Ecclesiastical IPA(key): [ˈäd̪juvo]"--Urszag (talk) 00:45, 1 May 2025 (UTC)Reply
@Urszag I'm all in favour of overhauling Module:la-IPA (though not everything you suggest), but there are a couple of things here:
  1. It needs its own thread, as it's a separate question to which orthography we use.
  2. I don't understand what the attraction of using these bespoke, ad hoc standards is, when we are perfectly able to handle multiple spelling conventions. Nobody is suggesting that we remove the entry at adiuvo if adjuvo is made the primary lemma. This is not an either/or situation. In fact, we can make use of transclusion to ensure that we don't even lose information on alternative entries, so all that is achieved by this is to make things more difficult from a technical perspective, requiring a higher degree of maintenance from editors to ensure that the templates are fed the correct info. This is not helpful. It's one thing to make things more accessible to users; it's entirely another to do so at the expense of users who benefit from clear phonemic information.
So far, in a proposal to be more precise about the information given in headwords, we have proposals (a) to eliminate u/v as well, and (b) to remove phonemic information from pronunciations altogether. Collectively, we seem to have forgotten what the point of a dictionary is. Theknightwho (talk) 00:57, 1 May 2025 (UTC)
My point I guess is that any attempt to indicate all pronunciation information in the headword spelling essentially turns into a bespoke non-IPA transcription system. If you aren't in favor of indicating all pronunciation information using respellings like "AD-ju-vō", I don't understand why you think it's unacceptable to omit the fairly predictable i-j distinction in this context. "Inadequate" is a strong word to use for the popular i-u-v scheme.
I'm not married to the idea of removing the IPA phonemic transcriptions, but I'm not sure how to reconcile you first arguing that we should expect them to be useless for most of our readers, and then arguing that we need to retain them. It isn't a difficult task to infer them from the spelling along with the phonetic transcriptions.--Urszag (talk) 01:12, 1 May 2025 (UTC)Reply
@Urszag But the pronunciation section is not the headword, and it is a well-established practice to include certain phonemic distinctions in Latin headwords. I'm not sure why it is necessary for me to demonstrate the purpose of headwords ab initio when proposing a relatively minor extension by analogy to a system we have used for 20 years. There are an awful lot of barriers being thrown up here, and we are losing sight of the original point of the proposal, which takes for granted the fact that headwords already include certain phonemic information that is not necessarily distinguished in running text.
And yes, the i-u-v system is inadequate for dictionary purposes. That does not mean it is pointless, that I dislike it, that we should ignore it etc. etc. It simply means that it is not adequate for our needs, and "fairly predictable" doesn't cut it for a dictionary, especially when you're proposing an ad hoc system to get around it instead of using a well-established standard that is both intuitive and widely understood. None of that precludes us having entries that use i-u-v, though, and we can make use of transclusion to get around that problem.
My point about users not necessarily understanding IPA was to drive home the point that the headword should contain as much phonemic information as possible; it was not to suggest that we should add to the confusion by only giving phonetic information, thereby making the phonemic difference between i and j even less clear. That would be a huge backwards step. Theknightwho (talk) 01:20, 1 May 2025 (UTC)Reply
@Urszag: I agree with using ⟨j w⟩ for the IPA. I also agree that displaying both // and [] is excessive, but why not remove the []? It seems rather strange for a dictionary to auto-generate purported phonetic pronunciations dated to two millennia ago. (Not to mention that this encourages all sorts of invented pseudo-precision like [d̪ t̪] for /d t/ and [ʊ̃ˑ] for /-um/.)
As for indicating pronunciation with non-IPA respellings, is it going to be any easier for readers to understand than, say, the symbols in /ˈadjuvo:/? We already expect readers to know basic IPA to access pronunciation in general on Wiktionary, or at least to follow the auto-generated link to an IPA key.
Ah, I'll start a third topic on this since Theknightwho pointed out that it's really another conversation. I think /ˈadjuwo:/ is relatively accessible (which is why I'm not sure I agree with Theknightwho's argument that "Most readers can't read IPA, and we shouldn't assume that any can"), but I think IPA stress, length, and syllable division marks are all probably less immediately intuitive to most readers than a non-IPA convention of marking stress with acutes or uppercase, length with macrons, and syllable divisions with hyphens.--Urszag (talk) 05:35, 1 May 2025 (UTC)Reply
@Theknightwho: I do like your proposed system more than the current one, but help me out here: what makes it preferable to i-u? If we prioritize reader comfort/familiarity, aren’t we better off with the status quo of i-u-v? If we prioritize making phonemic distinctions in the lemma, then why omit other important ones (/V/≠/V:/, /VV/≠/VV̯/)? Or for that matter why not leave these distinctions to the pronunciation section, since that’s what it’s for anyway? Nicodene (talk) 04:03, 1 May 2025 (UTC)Reply
@Nicodene Given the complete lack of consensus for any system (every possible system seems to have been proposed), an alternative approach might be as follows:
  1. We use transclusion to display entries at multiple different lemmatisations. This requires a nontrivial amount of work to put together a module which is capable of doing this, but given the differences are orthographic and regular, the final product should be relatively painless for the average editor to actually use.
  2. In the back-end, modules should use a maximalist distinction, on the principle (stated by @Babr) that distinctions are easier to remove than to add. As such, the actual working modules would make all relevant distinctions. Given that the system would be a technical one, it's not especially important what we use, so long as it's capable of making all of the relevant distinctions.
  3. In terms of display, this could then be converted to the relevant system, be it Classically-faithful, the standard i-u-v system, or whatever.
  4. Importantly, this would clear up the problems that we currently face with headword and inflection templates, which will currently generate nonsense like præacutae or uāgīvī.
  5. This should be a way to ensure that no particular scheme gets prioritised, from a user perspective.
Theknightwho (talk) 06:06, 1 May 2025 (UTC)Reply
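As a rough illustration of points 2 and 3 above (store the maximal distinction, derive the display form), here is a minimal sketch in Python. The scheme names `"ijuv"`, `"iuv"` and `"iu"` and the function `to_scheme` are hypothetical, not an existing module interface. The uppercase handling in the i-u scheme follows Urszag's remark that writers who use lowercase u generally use V as its uppercase counterpart; that is an assumption of this sketch, not a settled convention.

```python
# Translation tables for collapsing distinctions in a maximal (i-j-u-v) spelling.
J_TO_I = str.maketrans({"j": "i", "J": "I"})
# Assumption: in the i-u scheme lowercase is always "u" and uppercase is always "V".
V_TO_U = str.maketrans({"v": "u", "U": "V"})


def to_scheme(maximal, scheme):
    """Convert a maximal i-j-u-v spelling to the requested display scheme."""
    if scheme == "ijuv":
        return maximal
    if scheme == "iuv":
        return maximal.translate(J_TO_I)
    if scheme == "iu":
        return maximal.translate(J_TO_I).translate(V_TO_U)
    raise ValueError(f"unknown scheme: {scheme}")


print(to_scheme("juvencus", "iuv"))  # iuvencus
print(to_scheme("juvencus", "iu"))   # iuuencus
```

The conversion only ever removes information, which is the point of keeping the maximal form as the stored one: going in the opposite direction (from iuuencus back to juvencus) cannot be done by a character table alone.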
I like the idea in general of using transclusion to avoid duplication and circumvent problems of where to lemmatize. I proposed a very similar idea before with Punjabi, where there are two writing systems (Gurmukhi, a South-Asian-style abugida used in India, and Shahmukhi, a Perso-Arabic script used in Pakistan). For the most part, neither script is losslessly convertible to the other (although maybe Shahmukhi with the right vowel diacritic system can be converted to Gurmukhi, but this may not always be the case and it's difficult to enforce the correct use of diacritics in most Perso-Arabic scripts, which outside of Arabic typically have no native tradition of doing so); and lemmatizing at either script is a potential political statement that we'd like to avoid. My proposal was to lemmatize using a maximalist romanization that captures all the distinctions in both scripts, and use transclusion to generate the appropriate lemma entries in the two scripts. Something similar can and should be done for Serbo-Croatian, which currently has duplicated entries everywhere. There are numerous technical issues to work out, esp. with the case of Punjabi, e.g. where to put the underlying romanized lemmas (in an appendix?), but it's definitely feasible. If you go down this route, it will be important to design such a system with other languages than Latin in mind, so that when the time comes to use it for Punjabi or Serbo-Croatian, we don't have to start over from scratch.
That said, I'm a bit leery of adopting such an approach here, because the number of lemmas where it would be used is only a subset of the whole, and it will impose a non-trivial technical burden on anyone wanting to add lemmas that might not be worth it considering that we're talking about a letter here and there vs. a fundamentally different script. Benwing2 (talk) 06:27, 1 May 2025 (UTC)Reply
@Benwing2 I think there are a couple of things that should ease the burden for anyone adding lemmas:
  1. Assuming we use the maximalist spelling as the baseline (i.e. all the distinctions), all other spellings should be trivially-derivable in 99% of cases. Morpheme boundaries may be awkward, but these should be pretty rare. This should require very little input from the user, but I think it makes sense to integrate any alternative spelling display with the pronunciation, given that anything which affects one will affect the other (e.g. if ae straddles a morpheme boundary, {{la-IPA}} should be using a.e anyway).
  2. Generating alternative entries can be done via acceleration. The actual wikitext should be minimal, as everything should be transcluded. This also goes for any inflection templates etc etc.
  3. As a sanity check, we will want some kind of inverse-inflection capability, similar to that in {{es-verb form of}}; this goes for alternative spellings and inflections (and combinations thereof).
Theknightwho (talk) 06:36, 1 May 2025 (UTC)Reply
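The inverse direction mentioned in point 3 could be sketched roughly as follows, in Python. This is only an illustration of a reverse lookup from whatever spelling a reader typed to the maximal spellings it might stand for; it says nothing about how {{es-verb form of}} actually works, and the names `flatten`, `build_index` and `candidates` are hypothetical. It also ignores capitalisation and morpheme boundaries.

```python
from collections import defaultdict


def flatten(spelling):
    """Collapse a spelling to a bare i-u key (lowercased, no j/v distinction)."""
    return spelling.lower().replace("j", "i").replace("v", "u")


def build_index(maximal_spellings):
    """Map each flattened key to the set of maximal spellings that produce it."""
    index = defaultdict(set)
    for spelling in maximal_spellings:
        index[flatten(spelling)].add(spelling)
    return index


def candidates(index, spelling):
    """Return every known maximal spelling that the input could stand for."""
    return sorted(index.get(flatten(spelling), set()))


index = build_index(["janua", "adjuvō", "servit", "seruit"])
print(candidates(index, "ianua"))   # ['janua']
print(candidates(index, "seruit"))  # ['seruit', 'servit']
```

The second lookup shows why this can only be a sanity check: a spelling in a less distinctive scheme can map to more than one maximal form, so the result is a list of candidates, not a single answer.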
OK, I'm not quite understanding where you would put the maximally-spelled lemmas. Would they be mixed in with the regular lemmas, or segregated into an appendix or something? In the former case, how do we ensure that they don't show up in categories? (Are you planning on introducing some sort of special flag in Module:headword to mark "underlying source" lemmas like this, like we do for alternative forms?) Benwing2 (talk) 06:41, 1 May 2025 (UTC)Reply
@Benwing2 The maximal spelling would be the "real" lemma, in the sense that it would contain substantive info that users might want to edit. The other entries would simply be pointed at it.
There does need to be a plan to avoid flooding the lemma category, yes. I'll have a think about how best to do it. Theknightwho (talk) 07:04, 1 May 2025 (UTC)Reply
Do you mean something like the ‛mirror’ that I described here?
{{head}} has a parameter |altform=1 which excludes altforms from the usual part-of-speech categories and instead dumps them into ‛Category: [language name] alternative forms’. Nicodene (talk) 07:32, 1 May 2025 (UTC)Reply
@Nicodene Yes. Not necessarily the exact same specifics, but fundamentally that’s the idea. Theknightwho (talk) 08:48, 1 May 2025 (UTC)Reply
Yes I understand that, but my question was rather: (1) where will the "real lemma", as you call it, live? In the mainspace or in an appendix? (2) What counts as maximal? Is it the i/j/u/v form or does it include macrons for long vowels and maybe breves for short vowels? If yes to macrons and breves, what about unclear cases? (3) If it lives in the mainspace, what happens if the maximal spelling happens to agree with one of the externally visible spellings? Benwing2 (talk) 07:45, 1 May 2025 (UTC)Reply
@Benwing2
  1. Mainspace. It should be possible to transclude the content of the entry, cutting/amending parts as necessary.
  2. Maximal would mean i/j and u/v; any macrons or anything else can be taken from the headword. In theory, this could be taken from any entry, so long as the pronunciation section is complete, but using the maximal spelling builds in redundancy (e.g. the pronunciation and headword at the main entry should be compatible).
  3. I’m not sure I understand this point: the maximal spelling would be a real entry, and I’m not suggesting we create pages with macrons or anything like that. Theknightwho (talk) 08:45, 1 May 2025 (UTC)Reply
  Oppose as a frequent user of Wiktionary for looking up Latin. When I look up a word, I look it up using the spelling I see. Most Latin texts I've encountered, including all modern Ecclesiastical texts, maintain the u-v distinction but do not use J. The reason for this is simple: in some pronunciation schemes of Latin (like Ecclesiastical Latin) there is a more significant (and less intuitive) pronunciation difference between u and v than between i and j. In our current system, users can still look up J forms, but since they're less likely to encounter those forms in the wild, I don't see why they should become the lemmas. Andrew Sheedy (talk) 02:12, 1 May 2025 (UTC)Reply
@Andrew Sheedy It isn't an either/or. We wouldn't delete entries that use the i-u-v system. Theknightwho (talk) 02:35, 1 May 2025 (UTC)Reply
But terms with consonantal i would be made redirects, would they not? Or am I misunderstanding the proposal? Andrew Sheedy (talk) 04:33, 1 May 2025 (UTC)Reply
@Andrew Sheedy Yes; currently we have soft redirects from Latin words with j in them to the corresponding lemmas with i in them. If we switched to lemmatizing at the j, we'd essentially reverse the direction of soft redirects, and e.g. ianua would be a soft redirect to janua instead of vice-versa. Benwing2 (talk) 06:04, 1 May 2025 (UTC)Reply
@Benwing2 @Andrew Sheedy On this point, I've just proposed a system which would avoid prioritising any one system above, which should be a way to circumvent the issue of different people preferring different systems, or otherwise we'll always leave someone unhappy. Theknightwho (talk) 06:09, 1 May 2025 (UTC)Reply
I would be fine with such a system. I agree with Benwing that it would be good to look beyond Latin and try to find a solution that would be compatible with other languages with multiple orthographies. Andrew Sheedy (talk) 20:07, 1 May 2025 (UTC)Reply
I'd be happy with a technical solution that eliminates the need for soft redirects in either direction between i/j and u/v variants, assuming it has good performance and doesn't add much difficulty to creating new entries.--Urszag (talk) 20:47, 5 May 2025 (UTC)Reply
@Urszag Alright, let's move forward with this then, as there have been no objections, unless @Mahagaja wishes to object. Theknightwho (talk) 14:07, 6 May 2025 (UTC)Reply
@Theknightwho: This thread is so long and convoluted I can't tell what precisely you want to move forward with. —Mahāgaja · talk 14:13, 6 May 2025 (UTC)Reply
@Mahagaja In brief: there is no consensus between editors for which spellings they prefer; some prefer i/j, some prefer the status quo, and some prefer no u/v distinction either. In addition, users may encounter a wide variety of spelling schemes, our entries are generally poor at accommodating these outside of the most common words, and Latin is far more likely than most languages to have many one-off/occasional users with low understanding of the language, due to the historical status of Latin as a lingua franca.
To get around all these issues, and to avoid massive duplication, I have proposed that we retain one spelling as the main entry (as now), but use a special template to transclude the content of entries to entries at the other spellings (which would have minimal wikitext). This is a system that already works well in other languages, though there is no need to follow the exact same system as those. Theknightwho (talk) 14:21, 6 May 2025 (UTC)Reply
  •   Oppose. Using j in Latin just looks hopelessly old-fashioned, and (unlike u and v) they never contrast. Wiktionary would look like it was being written in 1825 instead of 2025 and there would be no benefit. —Mahāgaja · talk 13:59, 6 May 2025 (UTC)Reply
    They do contrast: Iēsūs (trisyllabic /iˈeː.suːs/); Jēsūs (disyllabic /ˈjeː.suːs/). Theknightwho (talk) 14:10, 6 May 2025 (UTC)Reply
    Those are not two different words, so that's not a contrast. —Mahāgaja · talk 14:25, 6 May 2025 (UTC)Reply
    @Mahagaja Sorry, but that's nonsense. Theknightwho (talk) 14:27, 6 May 2025 (UTC)Reply
    No, it isn't. From w:Minimal pair: "In phonology, minimal pairs are pairs of words or phrases in a particular language, spoken or signed, that differ in only one phonological element, such as a phoneme, toneme or chroneme, and have distinct meanings" (emphasis added). Iēsūs and Jēsūs don't have distinct meanings, so they're not a minimal pair. And what's the evidence for a trisyllabic pronunciation anyway? —Mahāgaja · talk 14:31, 6 May 2025 (UTC)Reply
    @Mahagaja We do not need a perfect minimal pair in order to see that there is a distinction between adjuvō and gladius, which is made evident by the fact that coadiūtor has the wrong pronunciation because it has been wrongly assumed to have syllabic i. Theknightwho (talk) 14:47, 6 May 2025 (UTC)
    The fact that {{la-IPA}} hasn't been written carefully enough to get the pronunciation of coadiūtor right doesn't prove that the distribution of /j/ and /i/ is unpredictable. Neither */a.diˈuwoː/ nor */ˈɡlad.jus/ is a possible word of Latin. —Mahāgaja · talk 14:54, 6 May 2025 (UTC)Reply
    @Mahagaja In what way is the pronunciation of coadiūtor deducible as coadjūtor from the spelling? That's aside from examples like Gāius, where the syllabification can be traced over time reducing from 3 syllables to 2 ([7]). You are repeating a common dogma that doesn't actually reflect reality.
    "Neither */a.diˈuwoː/ nor */ˈɡlad.jus/ is a possible word of Latin." Explain, because you seem to be making etymological inferences. Theknightwho (talk) 14:57, 6 May 2025 (UTC)
    I guess it's the morphology rather than spelling that clinches it. Anyway, when is Iēsūs ever trisyllabic? I've certainly never heard it pronounced that way in Church Latin in my 40 years of singing in church choirs. —Mahāgaja · talk 15:10, 6 May 2025 (UTC)Reply
    @Mahagaja No, the morphology simply isn't relevant. You run into more difficulties when you compare injūrus with paliūrus, where the distinction is clear due to the etymology, but completely opaque phonemically. I can have a look for evidence of the syllabification of Iēsūs, but you can't ignore the evidence for the two different syllabifications of Gāius. Theknightwho (talk) 15:17, 6 May 2025 (UTC)Reply
    Though for what it’s worth, I’ve been singing choral music for over 20 years and I’ve never encountered it either, so it may be limited to Late Latin. Theknightwho (talk) 15:39, 6 May 2025 (UTC)Reply
    The two different syllabifications of Gāius are irrelevant because (1) once again, it's the same word, so not a minimal pair, and (2) they belong to different time periods. It isn't a contrast. And iniūrus has a morpheme boundary that paliūrus doesn't. —Mahāgaja · talk 16:08, 6 May 2025 (UTC)Reply
    @Mahagaja The morpheme boundary is irrelevant to deriving the pronunciation from the spelling, and using that would prevent u/v from being contrastive as well: servit (serui-t) and seruit (ser-u-it). It's a specious argument, as is the argument about time periods (we aren't representing one time period) and phonemic contrast (this would prevent us representing poetic metrical differences at all). Theknightwho (talk) 16:14, 6 May 2025 (UTC)Reply
    The morpheme boundary is absolutely crucial to the question whether or not the distribution of /i/ and /j/ is predictable. As for Iēsūs, if a trisyllabic pronunciation ever existed at all, I would expect it to be early, used when the name was a relatively unfamiliar borrowing from Greek and the Greek trisyllabic pronunciation was being copied. As the name became more familiar through Christianization, the much more nativelike disyllabic pronunciation probably ousted any foreign-sounding trisyllabic one. —Mahāgaja · talk 17:06, 6 May 2025 (UTC)Reply
    @Mahagaja Yes, and it’s crucial in deciding whether it’s u or v as well, so it’s a nonsense criterion. @Nicodene @Benwing2 what are your thoughts? Theknightwho (talk) 18:12, 6 May 2025 (UTC)Reply
    Julius has three syllables with /j/ and his supposed progenitor Iulus also has three syllables with /i/. There's no way to predict this automatically. jam has /j/ but its derivative etiam has /i/ even though there's a morpheme boundary after the et- (and if you argue there isn't, your reasoning is circular). In general the only difference between the i-j distinction and the u-v distinction is that the latter has more functional load, but both are phonemic and IMO there's no good reason for making one distinction but not the other. Benwing2 (talk) 20:06, 6 May 2025 (UTC)Reply
    There are too many cases where the difference is unpredictable. Cf. [j]am versus [ˈi]ambus and the cases listed by Cser (2016: 14) such as bel[u]a versus sil[w]a. Nicodene (talk) 22:30, 6 May 2025 (UTC)Reply

Retiring dual phonemic-phonetic transcriptions for Latin

As I said at Wiktionary:Beer_parlour/2025/April#The_i/j_distinction_in_Latin, I think we should consider replacing the dual IPA transcriptions in Latin entries, because I think few of our users actually understand the distinction between phonemic and phonetic transcriptions. As a concrete example, someone on Reddit wrote: “I see these pronunciations listed for classical Latin: /kon.stan.tiːˈno.po.lis/, [kõːs̠t̪än̪t̪iːˈnɔpɔlʲɪs̠] A little googling showed me that the lowercase j in superscript position means palatalization, so I guess that was an alternative pronunciation sometime, somewhere.” I think a non-negligible amount of our readers will not notice or appreciate the difference between // and [] and will make the same mistaken assumption that these are two distinct, alternative pronunciations rather than two different types of transcription representing the same pronunciation.

I think removing the phonemic pronunciation and leaving the phonetic pronunciation is better than the reverse. Our average reader will probably have only a hazy concept of what a phoneme is. To the extent that they know IPA, it's plausible that they've been introduced to it as a "phonetic" alphabet, as per its name (despite the fact that it isn't reserved for phonetic use by linguists), and they may expect the same IPA symbols to represent more or less the same sounds across languages. This kind of expectation is one reason why we use the transcription /ɹ/ in English even though /r/ would be just as adequate as a phonemic transcription. Therefore, if they see the transcription /siɡˈnaː.tim/, they're liable to get the mistaken impression that the first syllable rhymes with league or cig and the last syllable sounds like team or Tim. The transcription [sɪŋˈnäːt̪ɪ̃ˑ] may be overly narrow, but at least it suggests that the first syllable rhymes with sing, and the last syllable doesn't rhyme well with any English word.

A phonemic transcription actually requires making more theoretical presuppositions than a phonetic transcription. There are a number of areas where, despite broad agreement about phonetics, there are different approaches to the phonemic analysis of Latin.

  • Final -m: We're reasonably sure that when the letter -m came in word-final position in Classical Latin, it was pronounced phonetically as nasalization on the preceding vowel. But there is no consensus among linguists about whether this means Classical Latin had phonemic nasalized vowels. We currently transcribe it phonemically as /m/, but that's also dubious. Cser 2016 explicitly distinguishes it from the phoneme /m/ and treats final -m in Latin as a "placeless nasal consonant" (pages 15, 28-29): we could transcribe that with a non-IPA symbol like "N", but that's unlikely to be understandable to our readers.
  • qu, su, gu: Cser 2016 argues in favor of analyzing these as biphonemic clusters /kw ɡw sw/ (pages 16-28) but acknowledges there are arguments for analyzing them as single complex phonemes /kʷ ɡʷ sʷ/.
  • ae, oe, ui, ei, au, eu, etc.: Cser 2016 argues for analyzing these as phonemic vowel-consonant sequences /aj, oj, uj, ej, aw, ew/ (pages 31-37), whereas others analyze them as phonemic diphthongs /a͜e, o͜e, u͜i, e͜i, a͜u, e͜u/.
  • Some consonants are always double between vowels within a word, e.g., [ʃ] in Ecclesiastical Pronunciation. Technically, there's no possible minimal pair between /ˈfaʃʃia/ and /ˈfaʃia/, and so I'm not sure how theoretically sound it is to indicate the doubling on the phonemic as opposed to the phonetic level.
  • Latin stress is usually predictable based on the phonemic structure of the word. There are a few exceptions, but it's not clear that means stress is defined in all cases at the phonemic level. In contrast, it's clear there's no theoretical issue with including stress in a phonetic transcription of Latin.
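
For readers unfamiliar with the rule behind the last point: the usual statement is that words of one or two syllables are stressed on the first syllable, and longer words are stressed on the penultimate syllable if it is heavy, otherwise on the antepenultimate. A minimal sketch in Python, assuming syllable weights are already known (the function name `stressed_syllable` and its input format are hypothetical, and this is not how Module:la-IPA is implemented):

```python
def stressed_syllable(heavy):
    """Return the 0-based index of the stressed syllable.

    `heavy` is a list of booleans, one per syllable, True when the syllable
    is heavy (long vowel, diphthong, or closed by a consonant).
    """
    n = len(heavy)
    if n == 0:
        raise ValueError("a word needs at least one syllable")
    if n <= 2:
        return 0            # monosyllables and disyllables: first syllable
    if heavy[n - 2]:
        return n - 2        # heavy penult takes the stress
    return n - 3            # otherwise the antepenult


# sig-nā-tim: heavy penult, so stress falls on "nā"
print(stressed_syllable([True, True, True]))   # 1
# ad-ju-vō: light penult, so stress falls on "ad"
print(stressed_syllable([True, False, True]))  # 0
```

Words such as illūc, mentioned in the earlier thread as having final stress, come out wrong under this function (it returns 0 for a disyllable), which is exactly the kind of exception the bullet above refers to.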

We already have languages where we show phonetic transcriptions without a phonemic transcription; e.g. this seems to be the norm for Catalan entries, and Nicodene made this suggestion for Sicilian.

There is only limited benefit to including both transcriptions. They would be useful only to readers whose understanding of linguistics is sophisticated enough to appreciate the difference between phonetic and phonemic transcription, but whose understanding of Latin spelling and pronunciation is superficial enough that they can't easily derive the phonemic transcriptions themselves (the way that Module:la-IPA does). I think that benefit is small enough that it's outweighed by the risk of confused readers mistakenly interpreting the phonemic transcriptions as phonetic transcriptions. Urszag (talk) 05:30, 1 May 2025 (UTC)Reply

@Urszag: Could you give some exemplar Latin words wherein ⟨ei⟩ is pronounced as /ej/ or /e͜i/, and not as /e.i/ vel sim., with hiatus, please? 0DF (talk) 15:35, 1 May 2025 (UTC)
@0DF dein(de) (afterwards), deinceps (successively), one pronunciation of ei (him), and interjections like hei and oiei / ojei. In all cases, the “diphthong” is an artefact of consonantal i occurring in a position where j would be disallowed by convention, as it’s before a consonant (e.g. *dejnde) or at the end of the word (e.g. *hej).
The same issue applies to ui, au and eu: e.g. E͡urōpa is indistinguishable from *Evrōpa etc. etc. Theknightwho (talk) 19:13, 1 May 2025 (UTC)
@Urszag I support this change and note that it's also the norm for Russian to include only a phonetic transcription (and this is likewise done in the mostly complete German pronunciation module I wrote a couple of years ago but never finished). Including a phonemic transcription, as you note, can get you into hairy discussions of what counts as a phoneme, which is problematic for German (e.g. with glottal stops and the [ŋ] sound) and especially problematic for Russian due to vowel reduction. My one reservation here is the level of detail shown in the phonetic transcription, which I think may be a bit too much, particularly as regards the diaeresis over the [a] and the dentalization diacritics on [n], [d] and [t]. It's true that technically [a] is a front vowel, but in practice an [a] without diacritics is usually interpreted as central; and it's also true that the dental obstruents are a difference from English, which mostly has alveolar n, d, t, but this sort of difference is (I would argue) not immediately apparent to an English speaker not trained in phonetics, and having all these diacritics feels maybe a bit overwhelming. I would argue, on the other hand, that the bar under the [s] to indicate retraction is useful to have, since (assuming this is meant to indicate the same sound as in Spanish apicoalveolar retracted [s]) the difference is immediately noticeable to an English speaker. In general my preference when including phonetic transcriptions is a "lightly phonetic" version, one that tries to capture the most salient aspects of the sound without overwhelming the reader with detail. Benwing2 (talk) 06:01, 1 May 2025 (UTC)
I think another major problem is that our module features a random mixture of ideas from (several) different scholars as well as purely user-invented guesses. A cleaner way to handle all this would be to choose a single principal source to use—so, either Allen or Cser—and only use other sources, if at all, to fill in a detail here or there that isn’t covered by that principal source.
There are a number of areas where, despite broad agreement about phonetics, there are different approaches to the phonemic analysis of Latin.
I’m not sure that this is necessarily less common than the reverse, namely cases where scholars broadly agree on the Classical Latin phonemes but disagree on the phones. (Not so much a problem for living languages, like Catalan, where speakers are available for reference.)
Some examples of there being no clear consensus on phonetic details:
  • /p t k/: somewhat aspirated or not?
  • The coronals: dental, alveolar, or perhaps a mix?
  • /l/ (when not velarized): somewhat palatalized or not?
If we ‛de-narrow’ our Latin phonetic transcriptions in general, however, then I suppose the above could comfortably fit within [p t k], [t d s (…)], [l].
In case some of it happens to be useful, here is an (unfinished) overview of what various scholars have to say on Classical Latin pronunciation. Most of the information is in the footnotes.
qu, su, gu: Cser 2016 argues in favor of analyzing these as biphonemic clusters /kw ɡw sw/ (pages 16-28) but acknowledges there are arguments for analyzing them as single complex phonemes /kʷ ɡʷ sʷ/.
If I’m not missing something, these cases are also difficult to decide on the phonetic level.
Edited to add: in keeping with your last two bullet-points one could also mention my old gripe against the notion of phonemic syllabification (no such objections against [ˈa.mo:]). Nicodene (talk) 10:28, 1 May 2025 (UTC)
@Nicodene I do think that [p t k t d s l] etc. would be acceptable broad transcriptions regardless of whether slight aspiration, secondary articulations, etc. are recognized or not for these sounds. Likewise, while there may be some different opinions about the phonetic realization of qu gu su, I think prevocalic [kʷ ɡʷ sʷ] and tautosyllabic [kw ɡw sw] are similar enough to make either acceptable in the context of broad phonetic transcription (the phonetician Mark Liberman notes that "the phonological distinction between a doubly-articulated consonant and a cluster is not always phonetically plain"). Whatever difficulties there may be with coming to consensus on a phonetic transcription, I think omitting phonetic transcription is not an option we should take because transcriptions like /siɡˈnaː.tim/ and /konˈfrin.ɡoː/ don't show notable features of Latin pronunciation such as [ŋ], final -m loss, and vowel lengthening before nf, and there's a real risk of readers misunderstanding these as phonetic transcriptions.--Urszag (talk) 21:18, 1 May 2025 (UTC)Reply
I’m actually a lot more amenable to that than the above comment probably made me seem. That is, given a generally broad [] (aligned, presumably, with Cser 2016) rather than what we have now. Nicodene (talk) 21:31, 1 May 2025 (UTC)
@Urszag @Benwing2 I appreciate that the line is somewhat blurry, but it sometimes feels as though our definition of "phonemic" veers too far towards morphophonology, and that some of the issues Urszag raises can be safely dealt with under a phonemic transcription.
  1. There is no need to treat m- and -m as the same phoneme, simply because they are written with the same letter. The fact that they cannot be contrastive due to position is purely morphophonemic.
  2. The question of /kʷ/ or /kw/ has essentially already been answered by the way the module has been written: analysing it as /kw/ would require a bunch of special exceptions that aren't necessary when treating it as /kʷ/, and (though I'd need to check in detail to be sure), I can't think of any instances in the other direction. By comparison, x is treated as though it were cs, and you run into the same problem in reverse if you try to analyse it as a single phoneme (not that anyone does, but it's illustrative of my point). The same goes for /ɡʷ/ and /sʷ/; e.g. compare cuiusvīs /kujˈjus.wiːs/ (penultimate stress) and */ˈkuj.ju.sʷiːs/ (initial stress).
  3. All Latin diphthongs ending in -i and -u seem to be artificial constructs to get around the fact that j and v cannot conventionally occur before consonants or at the end of a word, and I don't really see any evidence that they represent anything phonologically distinct. Regardless, I'm not sure how [uj] and [ui̯] are supposed to contrast phonetically anyway, so I don't see why we would include these at all.
  4. The gemination of /ʃ.ʃ/ is still phonemic, because /ʃ/ and gemination are both phonemic features of the language. The fact that it only occurs under certain conditions is a morphophonemic feature, though - I don't think it occurs at word boundaries.
  5. The same goes for Latin stress: it is regularly predictable, but there are exceptions (e.g. illic and istic), so we have to treat it as phonemic. Theknightwho (talk) 10:56, 1 May 2025 (UTC)Reply
For point 4, Italian /ʃ/ does geminate across word-boundaries and so, accordingly, does the Italo-Ecclesiastical counterpart (as in [kwiʃˈʃi:ret] ‛qui sciret’). Nicodene (talk) 12:11, 1 May 2025 (UTC)Reply
@Nicodene That's a good example - thanks. I'm a bit wary of extrapolating Italian to Italianate Latin, given there are notable differences (e.g. Italian fascia /ˈfaʃ.ʃa/ and Latin fascia /ˈfaʃ.ʃi.a/). That being said, I'm still inclined to call the gemination phonemic, because it's still treated like a cluster: e.g. mariscī is /maˈriʃ.ʃi/, not */ˈma.ri.ʃi/. That only works if we treat it as geminate, which is precisely how we handle z, too. Theknightwho (talk) 13:07, 1 May 2025 (UTC)Reply
It’s in part accidental. The placement of stress in general depends not on the surrounding sounds but rather on the speaker memorizing ancient (no longer pronounced) differences like ū/ŭ and ae/e, word by word, and then applying a series of ‛weights’. Synchronic rules are not allowed to involve the speaker doing historical linguistics, I think. Nicodene (talk) 14:00, 1 May 2025 (UTC)Reply
@Nicodene In this case, isn’t the ancient rule simply that it was a consonant cluster, reinforced by the digraph? That doesn’t really feel accidental. Theknightwho (talk) 16:44, 1 May 2025 (UTC)Reply
The problem is that, as far as Italianate Latin today is concerned, it’s a pattern rather than a determinative rule. Cf. discédo, nescíret (stress follows [ʃʃ]) or baptizáre, ratiónem (stress follows [ddz], [tts]). Nicodene (talk) 20:44, 1 May 2025 (UTC)Reply
@Nicodene The rule I'm referring to is that a short vowel in the penultimate syllable is treated as light, unless it is followed by a consonant cluster or geminated consonant. This is useful for a couple of reasons:
  1. qu, and consonantal gu and su, only fit the rule if analysed as /kʷ ɡʷ sʷ/.
  2. Classical z and Italianate sc must be analysed as phonemically geminate, because pulverizō and mariscī both have penultimate stress in Classical and Italianate. By comparison, *pulverisō and *marisī would both have antepenultimate stress. The lack of vowel-length distinction in Italianate isn't relevant, either, because a light syllable can never occur before sc, whereas we would expect unpredictable variation (corresponding to vowel length in Classical) if sc weren't geminate. The upshot is that (1) gemination affects stress, (2) we already established that stress is phonemic earlier, so (3) the gemination of /ʃ.ʃ/ must be noted in a phonemic transcription. Theknightwho (talk) 21:45, 1 May 2025 (UTC)Reply
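For readers following the weight argument, here is a minimal Python sketch of the penultimate rule as described above. It is not taken from Module:la-IPA: the function names are invented for illustration, the input is assumed to be already syllabified, and muta cum liquida clusters and lexical exceptions (e.g. illic) are deliberately ignored. A syllable counts as heavy if it contains a long vowel (ː) or ends in a consonant.

```python
STRESS = "\u02c8"  # IPA primary stress mark
LONG = "\u02d0"    # IPA length mark
VOWELS = set("aeiouy")


def is_heavy(syllable: str) -> bool:
    """Heavy = contains a long vowel or is closed by a consonant (assumed simplification)."""
    return LONG in syllable or syllable[-1] not in VOWELS


def stressed_index(syllables: list[str]) -> int:
    """Index of the stressed syllable under the penultimate rule."""
    n = len(syllables)
    if n <= 2:
        return 0  # monosyllables and disyllables: first syllable
    # Heavy penult takes the stress; otherwise it falls on the antepenult.
    return n - 2 if is_heavy(syllables[-2]) else n - 3


def mark_stress(syllables: list[str]) -> str:
    """Join syllables with '.', putting the stress mark before the stressed one."""
    i = stressed_index(syllables)
    out = []
    for k, syl in enumerate(syllables):
        if k == i:
            out.append(STRESS + syl)
        elif k > 0:
            out.append("." + syl)
        else:
            out.append(syl)
    return "".join(out)


# Geminate analysis closes the penult; singleton analysis leaves it light.
print(mark_stress(["ma", "riʃ", "ʃi"]))      # maˈriʃ.ʃi
print(mark_stress(["ma", "ri", "ʃi"]))       # ˈma.ri.ʃi
print(mark_stress(["kuj", "jus", "wiːs"]))   # kujˈjus.wiːs
print(mark_stress(["kuj", "ju", "sʷiːs"]))   # ˈkuj.ju.sʷiːs
```

The sketch only shows that, once syllabification is fixed, the stress placement follows mechanically; which syllabification is the right phonemic analysis is exactly the point under dispute.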
The lack of unstressed penultimate syllables before [ʃ] in Ecclesiastical Latin could be regarded as a diachronic accident, or at least analyzed in terms other than [ʃ] being two phonemes long in this position: Spanish has a similar gap with the consonants /ʎ ɲ ʝ tʃ r/, but it isn't that common to transcribe these as phonemic geminates in contemporary Spanish. The rule you mention is undeniably valid for Classical Latin, although clusters such as "pr tr cr" can be exempt, so analyzing /kʷ ɡʷ sʷ/ doesn't eliminate all of the exceptions. As Cser says, the phonemic analysis of "qu gu su" has been discussed by multiple linguists with no clear consensus emerging, so I don't think it is actually obvious. While /kw gw sw/ would be somewhat atypical clusters in Latin, /kʷ ɡʷ sʷ/ would be somewhat atypical consonants: they can't precede other consonants or come at the end of a syllable (seemingly not even in the context of a geminate consonant if the spelling "cqu" is taken at face value as /k.kʷ/), and /sʷ/ is sometimes replaced in poetry by /su/.--Urszag (talk) 22:05, 1 May 2025 (UTC)Reply
@Urszag Well, the rule isn't strange at all: it's that they must be followed by a vowel. That's it. I'm honestly not sure why Cser writes as though there is a whole laundry list of things that make /kʷ ɡʷ sʷ/ atypical, when they're all just corollaries of that. Plus, the same rule uncontroversially applies to /f/ and /h/, so it's not without precedent, either. Theknightwho (talk) 00:25, 2 May 2025 (UTC)Reply
It is certainly imaginable that /kʷ ɡʷ sʷ/ are single consonants that are required to be followed by a vowel. But this requirement would put them in the minority of Latin consonants (/f/ can be followed by a consonant in /fr/, /fl/ and /ff/). Latin /h/ also has to be followed by a vowel, but /h/ is clearly the most aberrant Latin consonant, to the point where it is not clear it is a consonant phoneme at all: there is a long history going back to ancient authors of not counting it among the consonant sounds of Latin (since it allows elision and doesn't create heavy consonant clusters). Also, if we consider phonemes to be psychologically real, it seems a bit strange that as far as I know no ancient Latin author on pronunciation describes "qu gu su" as consonants rather than as sequences composed of a "c g s" sound followed within the same syllable by a "u" sound.--Urszag (talk) 01:01, 2 May 2025 (UTC)Reply
@Urszag You know, I originally only wrote /h/, then went back and added /f/ absent-mindedly due to Cser mentioning it on page 26 as an example of a phoneme which must go in syllable-initial position. My bad. In any event, I don't really think it changes my point, especially with the possibility of treating /m/ in the same way, mentioned below.
One thing I find contradictory in Cser's reasoning is this pair of arguments:
  1. p. 19: While all stops occur as geminates in simplex forms, qu does not. Furthermore, it does not even occur in a [kkw]/[kkʷ] sequence (which could, in theory, be analysed as the phonetic representation of geminate [kw] but also as a [k] + [k] + [w] sequence). This squares neatly with the fact that geminates do not occur next to another consonant (in this case [kk] before [w]). It also squares neatly with the fact that [kkw] can emerge (though rarely does) at prefix stem boundaries, as in acquirere ‘get’ and acquiescere ‘acquiesce’ from ad+qu. It is only at such boundaries that geminates can be adjacent to consonants.
  2. p. 26: The history of English shows a parallel development of PIE *[kʷ] > (Old) English [hw] and *[ɡʷ] > English [kw], as in which and queen, respectively, where stops developed into what are analysed as clusters on phonological grounds independently of their provenance. Furthermore, the later history of Classical Latin qu is far from uniform: in Italian, for instance, it developed intervocalically into [kkw], as in acqua [akkwa] ‘water’, which can be seen as a diachronic reflection of its cluster nature (though, admittedly, in Vulgar rather than Classical Latin).
Well, which is it? Is it a cluster because gemination does not occur outside of morpheme boundaries, or is the development of gemination at a non-morpheme boundary evidence that it must be a cluster? It can't be both. Theknightwho (talk) 01:58, 2 May 2025 (UTC)Reply
I am referring to the phonology of Italianate Latin—that is, how it works synchronically and in its own right—not to the elaborate rules our module uses to derive Italianate outputs from Classical (or Classical-by-orthographic-proxy) inputs. The latter would only apply here if one regards Italianate as phonemically identical to Classical (with the full spectrum of contrasts like vowel length, etc). Geminate [ʃʃ] does not determine stress, per examples like discédo, and there is nothing about the sequence of phonemes /d/+/i/+/ʃ/+/e/+/d/+/o/ that would tell us where the stress should be. From the synchronic perspective it is unpredictable, as e.g. in English. Nicodene (talk) 22:41, 1 May 2025 (UTC)Reply
Theknightwho is referring to the absence of forms such as léntisci from Ecclesiastical Latin. The rule is not that [ʃʃ] must be preceded by a stressed vowel, but that a vowel before [ʃʃ] must be stressed if it is in the penultimate syllable of the word: the i in discedo is not in the penultimate syllable, so it isn't a counterexample to that rule. In any case, I kind of just tossed ʃ in there as one additional example and I don't think it deserves much argument; I think the other problems are more important, such as final -m and the predictable but non-contrastive vowel lengthening before nf and ns.--Urszag (talk) 23:35, 1 May 2025 (UTC)Reply
I see. My response in that case would be essentially the same as yours. Nicodene (talk) 01:10, 2 May 2025 (UTC)
Reading this thread, I don't quite understand why, for a /phonemic/ transcription, we can't have <-um> as /ũ/ and <-un> as /un/ but neutralize them both to /un/ before a consonant.
I guess arguing about their /phonemic/ values isn't necessary if we are going with a [phonetic] transcription instead. — BABRtalk 07:00, 6 May 2025 (UTC)
There’s a fine line between deciding phonemes and sliding into madness. Nicodene (talk) 08:22, 6 May 2025 (UTC)
Regarding final -m, look at the entry for etiamnum that I recently edited, as well as quamobrem and other artefacts of univerbiated spelling, where final m contrasts with regular m in unexpected ways (when spelling forces it inside a word). True m retains its power before n, as in columna, omnis, unlike etiamnum or etiamnunc -- this one is attested as etiannunc (in papyri) and is also said by Velius Longus to be unpronounceable with an m despite the spelling. So, as a practical consideration, final m in Wiktionary entries isn't even always final. For this reason I'd strongly support adding a final M phoneme, which I would spell as /M/ instead of /N/ for legacy reasons (compliance with the Roman grammarian tradition, where it is never by anyone called a type of N). If anything, /n/ contrasts with final M, and by all accounts a final [m] was an acceptable pronunciation of final M before pause, but a final [n] was not, being a realization of /n/.
In regard to diphthongs, I believe the ancient tradition recognizes only several, of them au, ae, oe, eu. "Cui" is not considered to have a diphthong. Many more diphthongs were added by modern analyses, to the point that now we want to abolish them because the category is overblown and meaningless. Personally I would pay lip service to the ancient tradition and keep the diphthongs at au, ae, oe, and eu, and convert the rest to vowel and consonant sequences (huic, cui, ei). That way we are neither doing something very revolutionary (abolishing customary diphthongs like ⟨au⟩) nor stretching the definition of diphthong to its utmost limits away from the custom (ei of eidem as a diphthong, possibly ai of aio as a diphthong, etc). In the end, it likely doesn't matter too much.
Albeit weakly I suggest keeping in a distinguishing diacritic on t and d, because I find the difference noticeable between a Romance t/d and an English one, and because the English one poses problems when one tries to say it together with a coronal r. On the IPA module talk page I already suggested simplifying some of the phonetic aspects.
On a related note, it could be wise to have the phonemic transcription at least feature vowel qualities presumed to have been at all used in Latin. I don't think Latin was ever by default spoken with narrow [e] and [o], which the phonemic transcription of /e/ /o/ would imply. This seems to be an artefact of these letters being more natural, but not their sound. Draco argenteus (talk) 23:17, 1 May 2025 (UTC)Reply
/M/ would have some mnemonic value, but it's not IPA and if we use it we can't expect anybody to know what it means without an explanation. I don't know if we can do something like the Italian "°" for syntactic gemination, where the symbol automatically gets a tooltip explanation if you hover over it, but even that wouldn't be a great solution, since it doesn't help mobile users. Furthermore, having the concept of a phoneme /M/ gets into the morphophonological issues that Theknightwho alluded to. When do we use /M/ versus /n/ versus /m/ before a consonant when phonemically transcribing prefixed, suffixed or compound words such as etiamnum, illunc, conpello, compello, cōnficiō, īnfāns, inputō, imputō, circumtrahō, circumpōnō? @Theknightwho what would be your preferred phonemic transcription for final -m and for nasal vowels before -nf- and -ns-?--Urszag (talk) 23:56, 1 May 2025 (UTC)Reply
@Urszag I've not yet considered the -nf -ns question, but don't @Draco argenteus's examples suggest that the nasal vowels were phonemic? I don't think there's any problem with treating /m/ as a positionally-restricted phoneme which can only exist in syllable-initial position. Theknightwho (talk) 00:05, 2 May 2025 (UTC)Reply
There is a distinction between "/M/" and /n/ in utterance-final position, as in cum compared to in, but this distinction is neutralized before a consonant. Before a stop, either becomes a homorganic nasal stop; before a fricative, either is deleted with nasalization and prolongation of the preceding vowel. If we treat the nasal stops as /m/ and /n/ (according to the identity of the following consonant), and the nasalized vowel as /M/, then we would transcribe these as /etiˈannuM/, /ilˈlunk/, /komˈpelloː/, /koMˈfikioː/, /ˈiMfaMs/, /ˈimputoː/, /kirˈkuntrahoː/, /kirkumˈpoːnoː/ respectively. I don't think the use of M internally before /f/ and /s/ here is very intuitive. But I also don't think the morphophonological approach (where the prefix in- is underlyingly /in/, the prefixes con- and circum are underlyingly /koM/ and /kirkuM/, etc.) will be helpful to our readers. If you're suggesting we use transcriptions like /etiˈannũː/, /kõːˈfikioː/, /ˈĩːfãːs/, I find that prettier and more legible, but it implies that /ũː/, /õː/, /ĩː/, /ãː/ are vowel phonemes rather than vowel + consonant sequences on the phonemic level, which contradicts the bisegmental analysis of nasal vowels (as a vowel phoneme + a placeless nasal consonant phoneme) that is preferred by some scholars such as Cser.--Urszag (talk) 00:26, 2 May 2025 (UTC)
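The neutralization described in this comment can be sketched in a few lines of Python. This is an illustration only, not how Module:la-IPA works: the symbol "N" for the placeless nasal, the function name, and the choice to treat a following nasal like a stop (so that etiamnum comes out with [nn]) are all assumptions made for the example. The output is a broad phonetic string, so the nasal before a velar surfaces as [ŋ]. A placeless nasal before a vowel (as in quamobrem) is left unresolved.

```python
import unicodedata

TILDE = "\u0303"  # combining tilde (nasalization)
LONG = "\u02d0"   # IPA length mark

# Homorganic nasal chosen by the following stop (or nasal, by assumption).
PLACE = {
    "p": "m", "b": "m", "m": "m",
    "t": "n", "d": "n", "n": "n",
    "k": "\u014b", "g": "\u014b", "\u0261": "\u014b",  # velar nasal before k, g
}
FRICATIVES = set("fs")


def realize_nasal(phonemic: str) -> str:
    """Replace each placeless nasal 'N' with its context-dependent realization."""
    out = []
    for i, ch in enumerate(phonemic):
        if ch != "N":
            out.append(ch)
            continue
        nxt = phonemic[i + 1] if i + 1 < len(phonemic) else ""
        if nxt in PLACE:
            # Before a stop: homorganic nasal stop.
            out.append(PLACE[nxt])
        elif (nxt == "" or nxt in FRICATIVES) and out:
            # Word-finally or before a fricative: nasalize and lengthen
            # the preceding (assumed short) vowel.
            out.append(TILDE + LONG)
        else:
            # Any other context (e.g. before a vowel) is left as-is.
            out.append(ch)
    return unicodedata.normalize("NFC", "".join(out))


print(realize_nasal("koNpelloː"))     # kompelloː
print(realize_nasal("kirkuNtrahoː"))  # kirkuntrahoː
print(realize_nasal("koNfikioː"))     # kõːfikioː
print(realize_nasal("iNfaNs"))        # ĩːfãːs
print(realize_nasal("etiaNnuN"))      # etiannũː
```

Stress marks and syllable boundaries are left out to keep the example short; the point is only that a single underlying symbol yields all the surface variants listed above.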
I don't suggest going all in on the placeless nasal except for noting it at word-end only, or in the middle of etiamnum, wherever in the spelling it is shown through m. Word internally most often in utrumque and the rest. Here the Romans recognized that the M was special and sometimes even wrote it differently (non tota littera, sed pars illius). Phonemic /m/ for such cases sort of works, but I find it just slightly unsatisfactory, as indeed it was already proposed. Importantly, a true /m/ keeps its character before /n/, /t/ and /s/ (hiems, demsi, though it often acquires an intervening allophonic p before t and s), even if its special behavior before vowels is to be neglected. Most of the rest regarding adding more of the placeless nasal in phonemic transcription I don't suggest, and I disagree with it. Draco argenteus (talk) 03:43, 2 May 2025 (UTC)
The phrase "non tota littera, sed pars illius" occurs in Velius Longus, in the context of a proposal to write word-final -m differently when followed by a word-initial vowel: "Non nulli circa synaliphas quoque observandam talem scriptionem existimaverunt, sicut Verrius Flaccus, ut, ubicumque prima vox m littera finiretur, sequens a vocali inciperet, m non tota, sed pars illius prior tantum scriberetur, ut appareret exprimi non debere." I don't think this passage gives any support to the hypothesis that the Romans heard the pre-consonantal ⟨m⟩ in utrumque as the same sound as the ⟨m⟩ in utrum + a vowel-initial word. Cser 2016 notes that ancient testimony indicates that words like numquam/nunquam were pronounced with the assimilated velar nasal consonant [ŋ], which could be spelled etymologically as ⟨m⟩ or phonetically as ⟨n⟩ (page 19); in this position, [ŋ] is typically analyzed as an allophone of /n/. I'm not a fan of using spelling as the criterion for phonemes in contexts like this: Latin spelling is sometimes morphological rather than phonetic, as in the case of urbs which is pronounced /urps/.--Urszag (talk) 04:21, 2 May 2025 (UTC)
I don't suggest that either. My minimal suggestion was for using /M/ for final M, then analogically extending it to some occurrences of it within words, mainly before consonants, sometimes before vowels as in quamobrem where the behavior differs from that of a true m, to be noted as /m/. The criterion being ⟨m⟩ appearing in spelling but being contradicted by what it really is: for example ⟨quamtus⟩ is an attested spelling for quantus (and stated to be pronounced quantus), while ⟨emtus⟩ is an attested spelling for emptus. Here we can see that denoting both as /kwamtus/ (I'm not typing the superscript w for now, but it is to be understood here instead of w) and /emtus/ will either mean that both are quamptus and emptus, or that both are quantus and entus. This can be avoided with /kwaMtus/ and /emtus/, where the use of /M/ is informed purely by spelling. However, I can see this as being unnecessary, as indeed /kwantus/ /emptus/ provide solutions to the problem, and etiamnum is successfully represented by inputting /etiannum/. Perhaps only prevocalic final M is the most useful, as in quamobrem, but even there the problem is taken care of by word separation. Ultimately I am in favour of scrapping the displaying of phonemic transcription, because of its various problems and apparent uselessness. Draco argenteus (talk) 04:41, 2 May 2025 (UTC)Reply
@Theknightwho You run into endless problems if you try to shoehorn phonetic/allophonic information into the phonemic representation. As an example, [ŋ] occurs as an allophone of <n> before <g> and <k>, and also as an allophone of <g> before <n>, but it's unquestionably non-phonemic. Benwing2 (talk) 00:27, 2 May 2025 (UTC)Reply
At this rate we’re never going to move things along. I propose an informal vote:
  • Should we set the module to stop outputting both // and []?
    • If yes, do you prefer having only // or only []?
  • Should we set the module to follow the Classical pronunciation reconstructed by a single scholar, as a baseline before further discussions/tweaks?
    • If yes, do you prefer we use Allen 1965 (Vox Latina) or Cser 2016 (Aspects of the Phonology and Morphology of Classical Latin)?
My answers:
  • strong support
  • weak preference for []
  • strong support
  • weak preference for Cser 2016
Nicodene (talk) 01:53, 2 May 2025 (UTC)
  • Support
  • Prefer only [] (I have suggested improvements to //, but I'll see the entire idea of outputting // scrapped with little regret)
  • Strong support
  • Support for Cser 2016 with possibility for later tweaks
Draco argenteus (talk) 03:49, 2 May 2025 (UTC)
My answers:
  • Support
  • Prefer []
  • I don't favor using one of these as a baseline. If we do, I'd prefer Allen 1965. Cser does provide phonetic transcriptions, but as per the title of the thesis, he deals a lot with questions of phonology and morphology. For example, I think Cser's use of [aj] as a broad transcription is informed by his favored phonemic analysis: given the change in spelling from "ai" to "ae", I think it's unlikely [aj] is a very accurate transcription of Classical Latin "ae".
--Urszag (talk) 04:21, 2 May 2025 (UTC)
  • Support
  • Prefer only []
  • I don't have enough knowledge of the two sources to say which one is better. In general I would rather that we start with a single baseline and go from there, so I guess I support this question. One possibility if the reconstructions differ significantly is to list both of them, as we do for Old Chinese reconstructions (there are at least two major ones, Baxter-Sagart and Zhengzhang Shangfang; we put both along with others in a dropdown that when closed shows the Zhengzhang reconstruction, which presumably has a bit more consensus on it these days, maybe just among Wiktionary editors, than Baxter-Sagart). My instinct is to prefer the more recent one; 50 years is a long time in historical linguistics. But I dunno if Cser's reconstruction is more of a sensible, "reflect modern consensus" type of reconstruction like e.g. Ringe for Germanic, or more of an "out there" type of reconstruction like Leiden tends to produce.
Benwing2 (talk) 04:30, 2 May 2025 (UTC)
Overall there aren’t all that many notable points on which Cser contradicts Allen. Examples include his preferring [aj aw] over [ae̯ au̯] and (very tentatively) [kw ɡw] over [kʷ ɡʷ], both of which he provides fairly detailed argumentation for. (So, it’s not on a whim.)
Instead of resetting the entire module to follow one scholar or the other, though, we could just focus on cleaning up issues like:
  • the purely user-invented [ʏ]
  • the unnecessary diacritics in [ä] and [s̪ z̪ t̪ d̪ l̪ n̪]
  • the odd/borderline-non-IPA [i̯ u̯] for the [j w] of iam, peius, evangelium, vesper
  • the strange/fake-precise [ɪ̃ˑ ɛ̃ˑ ãˑ ɔ̃ˑ ʊ̃ˑ] for what both Allen and Cser describe as [ĩ: ẽ: ã: õ: ũ:]
Nicodene (talk) 04:08, 4 May 2025 (UTC)
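As a rough illustration of what the cleanup listed above would mean mechanically, here is a small Python sketch that maps a narrow transcription string to a broader one. It is not code from the module, and the function name is invented. It makes two loud assumptions: [i̯ u̯] are replaced by [j w] everywhere (including codas, which the thread has not settled), and [ʏ] is left untouched because no replacement has been agreed on.

```python
import re
import unicodedata

DIAERESIS = "\u0308"     # centralization mark, as in the narrow [a] with diaeresis
DENTAL = "\u032a"        # bridge below, as in dental t, d, n, s, z, l
NON_SYLLABIC = "\u032f"  # inverted breve below, as in non-syllabic i, u
TILDE = "\u0303"         # nasalization
HALF_LONG = "\u02d1"
LONG = "\u02d0"

# Lax nasal vowel -> tense counterpart (only applied in the nasal half-long context).
TENSE = {"\u026a": "i", "\u025b": "e", "a": "a", "\u0254": "o", "\u028a": "u"}
NASAL = re.compile("([" + "".join(TENSE) + "])" + TILDE + HALF_LONG)


def to_broad(narrow: str) -> str:
    """Strip the over-narrow details listed above from a transcription string."""
    s = unicodedata.normalize("NFD", narrow)  # split precomposed letters
    s = s.replace(DIAERESIS, "").replace(DENTAL, "")
    # Assumption: applied in every position, onset and coda alike.
    s = s.replace("i" + NON_SYLLABIC, "j").replace("u" + NON_SYLLABIC, "w")
    # Half-long lax nasal vowels become long tense nasal vowels.
    s = NASAL.sub(lambda m: TENSE[m.group(1)] + TILDE + LONG, s)
    return unicodedata.normalize("NFC", s)


print(to_broad("i\u032fa\u0308m"))       # jam
print(to_broad("\u026b\u032a"))          # ɫ (velarization kept, dental mark dropped)
print(to_broad("\u028a\u0303\u02d1"))    # ũː
```

Restricting the [i̯ u̯] replacement to prevocalic position would be a small change to the two `replace` calls if the coda question is decided the other way.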
  • Support removing the really narrow transcription.
  • No strong opinions on / / vs [ ]: I think we're all arguing for the same kind of transcription, but we seem to have different opinions on what counts as phonemic or not, so it's probably safer to go for [ ].
  • I don't think we need to use a particular author as a baseline, but I agree with the changes Nicodene proposes above. That being said, I disagree with Cser's view on [kʷ ɡʷ], and would like to keep them.
Theknightwho (talk) 00:32, 5 May 2025 (UTC)
I would like to keep [kʷ]. In my opinion [gʷ] is more difficult to prove, isn't associated with a dedicated letter (while [kʷ] is associated with ⟨q⟩), and can be simplified to [gw], just as [sw] is. So, unusually, I propose using [gw] for simplicity. Draco argenteus (talk) 01:11, 5 May 2025 (UTC)Reply
Hard oppose using different transcriptions for them. Sorry. They're both digraphs, the use of q can be explained diachronically and has nothing to do with this question, and there is no reason at all to assume that we should default to [ɡw]. It isn't a simplification, either, and just makes our transcriptions incoherent by implying some kind of qualitative difference between qu and gu. Theknightwho (talk) 03:10, 5 May 2025 (UTC)Reply
Certainly for reading purposes you are right. I got caught up in other considerations. I support both expressed through superscripts. Draco argenteus (talk) 06:06, 5 May 2025 (UTC)
Something modest like this may be more conducive to consensus, so I’m changing my votes to:
  • Yes; only []
  • Neither; just these more lightweight changes for now, like [ä]>[a]
Nicodene (talk) 00:42, 5 May 2025 (UTC)
I would like to see the narrow transcription simplified to the basics, but while retaining the dark l and the dental diacritic under z. I can let go of the dental diacritics on t and d. I think even the lax i and u are too narrow, since no Roman before Consentius mentions them even when describing the different vowel qualities of e/ē and o/ō, which makes me think the detail was surprisingly irrelevant, as African speakers, who had the Sardinian vocalism, would not receive corrections on their i and u, despite receiving them at least on ē (as with Pompeius). But simplifying the lax i and u may be a spicy opinion. Draco argenteus (talk) 01:01, 5 May 2025 (UTC)Reply
The above would change [ɫ̪] to [ɫ], not to [l].
I’ve checked all the Latin scholars that come to mind and not been able to find one that argues for z being specifically dental. But, note that [z] as a broad transcription accommodates both possibilities (dental and alveolar) while [z̪] is inherently narrow and can only cover one. Nicodene (talk) 01:25, 5 May 2025 (UTC)Reply
@Nicodene Maybe he means retracted alveolar s? I think there is a fair amount of evidence from Romance languages for this pronunciation in Latin. I dunno about z, which I thought was pronounced more like [dz] in any case. Benwing2 (talk) 01:28, 5 May 2025 (UTC)Reply
For the manner of articulation of Classical z, the scholarly ‛votes’ are: fricative (Sturtevant, Allen, McCullagh), affricate (none) — notes, citations.
For the place of articulation of Classical s: dental (Sturtevant), alveolar (Allen/Weiss/McCullagh), apico-alveolar or undecided (Lloyd) — notes, citations. To the best of my knowledge, the only symbol that covers this range of possibilities is a broad [s]. Nicodene (talk) 01:44, 5 May 2025 (UTC)Reply
I spoke for z. I have an idiosyncratic belief where I think whether s was specifically retracted or not is hard to decide upon, but I weakly support it being apico-alveolar (which may be unretracted), but I assume that z specialized toward being dental, but this all mainly from early Romance languages. Essentially I keep putting my belief in /s/ and /z/ not actually contrasting mainly on voicing, which leads to a desire to add some distinctions to them in the transcription beyond making the reader think the main distinction is voicing and some doubling. It's probably not very citable for Latin though. Draco argenteus (talk) 06:15, 5 May 2025 (UTC)Reply
Given what can be cited, I propose [s̺] for s, as several, if not most, scholars support it, while z with a diacritic is uncitable and can be left as is. Draco argenteus (talk) 08:52, 5 May 2025 (UTC)
I’m not aware of information from Romance (or elsewhere) that would suggest Classical /z/ and /s/ had different places of articulation. I suppose you are thinking of the situation later in Iberia, but there /(d)z/ was not the surviving continuation of a Classical /z/ but rather the result of original /k/ palatalizing before front vowels and voicing intervocalically (facere > fazer) and the result of adapting ‛ecclesiastical’ pronunciation in new learned borrowings (baptizare > bautizar). Nicodene (talk) 11:36, 5 May 2025 (UTC)Reply
So that was the evidence. Okay, I can live with it. Draco argenteus (talk) 03:36, 6 May 2025 (UTC)
@Urszag, @Benwing2: any chance we could agree on cleaning up the above? For reference:
  • user-invented [ʏ]
  • unnecessary diaeresis in [ä] and dental diacritic in [z̪] etc
  • non-standard IPA [i̯ u̯] for iam, vesper, etc (not diphthongs)
  • user-invented [ɪ̃ˑ ɛ̃ˑ ãˑ ɔ̃ˑ ʊ̃ˑ] for what both Allen (p 30) and Cser (passim) describe as [ĩː ẽː ãː õː ũː]
Nicodene (talk) 03:54, 6 May 2025 (UTC)
All fine with me. Benwing2 (talk) 03:56, 6 May 2025 (UTC)
For what it's worth, I completely agree with this. The current transcription for Classical is annoyingly narrow, and the usage of [i̯ u̯], instead of [j w], is honestly nonsensical (at least in the onset). Whether [i̯ u̯] should remain in the coda, I think, depends on the phonological rules of the language. — BABRtalk 07:23, 6 May 2025 (UTC)Reply
I don't think they should, unless we want to disallow [j.C] or [w.C] (and I don't see why we would). [i̯ u̯] should certainly never appear before a vowel, in any event. Are we going to keep [e̯] as well? I appreciate that there's speculation on why the shift from ai and oi to ae and oe happened, and that it may have been for phonetic reasons, but I'm not convinced:
  1. The distinction between [ai̯] and [ae̯] is very small, and certainly did not get used contrastively. For instance, are maior (phonetically maiior) [ˈmajjɔr] and praeiacēbō [prae̯jaˈkeːboː] really distinct? This seems unlikely. The difference just seems to be an orthographic artefact of the morpheme boundary after prae-.
  2. There was orthographic pressure to distinguish [aj] and [oj] from [a.i] and [o.i], as all four are common sequences before a consonant or boundary. There are no instances of ai and only one instance of oi (proin(de), [ˈprojn(dɛ)]) being used as diphthongs in that position during the Classical period, suggesting the distinction served a practical purpose.
  3. On the other hand, [ej] and [uj] before a consonant/boundary are very rare, with dein(de) [ˈdɛjn(dɛ)], deinceps [ˈdɛjnkɛps] and huic [ˈhʊjk] being the only preconsonantal examples.
  4. Their use as evidence for Vi being qualitatively different from Ve is also undermined by proin(de) [ˈprojn(dɛ)], which is directly analogous to dein(de) [ˈdɛjn(dɛ)] etymologically, but never underwent respelling to *proen(de). This, again, suggests oe was simply an orthographic convention, because the alternative is to suggest that proin(de) [ˈprojn(dɛ)] had a unique diphthong distinct from the initial vowel of proelium [ˈproe̯lʲiʊ̃ː]. While marginal diphthongs evidently did occur under similar conditions, contrastive [oe̯] and [oj] is not plausible.
Theknightwho (talk) 13:40, 6 May 2025 (UTC)
@Nicodene I'm fine with replacing [ʏ] with [y], [ä] with [a], [z̪] with [z], [ɪ̃ˑ ɛ̃ˑ ãˑ ɔ̃ˑ ʊ̃ˑ] with [ĩː ẽː ãː õː ũː], and replacing [i̯ u̯] in the onset or as geminate consonants with [j w].
@Theknightwho I prefer using [e̯] [u̯] [i̯] for the second portion of diphthongs.
In the case of "ae", the testimony of Terentius Scaurus ("apud antiquos i littera pro ea scribebatur, ut testantur μεταπλασμοί, in quibus est eius modi syllabarum diductio, ut 'pictai vestis' et 'aulai medio' pro pictae et aulae. sed magis in illis e novissima sonat, et propterea antiqui quoque Graecorum hanc syllabam per ae scripsisse traduntur") is often taken as direct evidence for the phonetic value [ae]; thus Allen, Lindsay 1894:43, and others. This has been debated. While [aj] could be argued to be broader transcription in some respects, [ae̯] is closer to the spelling, which I consider a point in its favor since the spelling is not disputed.
If we use the transcription [aj] for "ae", we would be practically compelled to use the transcription [aj.j] for the ae + vowel sequences found in Greek loans such as iūdaeus: a pronunciation like [ˈjuː.da.jʊs] is ruled out since the second-to-last syllable scans heavy, and a pronunciation like [juːˈdaj.ʊs] has an unparalleled syllable boundary between a consonant and a directly following vowel. So that leads us to [juːˈdaj.jʊs], but that transcription implies no phonetic distinction from the sequence [aj.j] found in words spelled with ai + vowel such as maior. I'm doubtful of that conclusion. There are a few words such as Aiāx where Greek -αι- before a vowel was transliterated in Latin as -ai-, but I think the use of -ai- vs. -ae- spellings is generally stable rather than fluctuating for each individual word, which would be in line with this being a phonetic distinction rather than a purely orthographic one.
I acknowledge Cser’s point that the analysis [aj] makes it simpler to interpret and transcribe the pronunciation of words prefixed with prae- like praeacūtus, which usually scan with a light first syllable. If we assume prae- = [praj] and this is resyllabified like other consonant-final prefixes, we get [pra.jaˈkuː.tʊs]. But while this neatly accounts for the metrical facts, I think it’s not actually clear that such words were pronounced with onset [j] rather than with a diphthong that was affected by shortening in hiatus, a phenomenon that can be seen affecting long monophthongs (as in dĕūrās, derived from dē- + ūrō). I would handle this situation with a notation like [prăe̯.aˈkuː.tʊs]: this may be more awkward, but this is an edge case anyway.
The case of "oe" is basically analogous. I don't think we can rule out the possibility that proinde was pronounced [ˈproe̯ndɛ], even if it was never spelled *proende. The "oi" may simply be a morphological spelling influenced by the form of inde. So I don't think the existence of proinde requires us to transcribe "oe" as [oj]. Assuming we accept the transcription of Latin pre-consonantal short i as [ɪ], it isn't obvious that a diphthong derived from the fusion of the vowels [oː] + [ɪ] would have the phonetic outcome [oj], with [ɪ] changed to the more constricted consonant sound [j].--Urszag (talk) 21:26, 6 May 2025 (UTC)Reply
I agree with @Urszag here; since we're doing a broad phonetic representation, and given Urszag's arguments along with the stability of the spellings <ae> and <oe> and the general tendency for the second element of diphthongs to "relax", [ae̯] or even [ai̯] sounds more plausible than [aj]. Benwing2 (talk) 22:04, 6 May 2025 (UTC)Reply
@Urszag Why are you doubtful that iūdaeus could have [aj.j] when the Greek was Ἰουδαῖος (Ioudaîos), with a long vowel? In fact, doesn't that suggest that's precisely how it was pronounced? Theknightwho (talk) 22:23, 6 May 2025 (UTC)Reply
As you say, the αῖ in ancient Greek Ἰουδαῖος functioned like a long vowel: for example, it shows circumflex accentuation, which could not occur on a short vowel followed by a double consonant. The transcription [aj.j] indicates a short vowel followed by a double consonant. After thinking a bit more, I'm a bit less confident in my appeal above to spelling evidence, since most examples showing the distinction in writing between "ae" and "ai" before vowels in Greek loans come from manuscripts written by postclassical scribes. However, we do see ancient grammarians comment on the pronunciation of words such as Troia, Maius and Aiāx that confirm that these were pronounced with short vowels + double [jj]. Also, it's hard for me to tell whether any of the Romance descendants of iūdaeus are fully inherited, but if they are, they show different outcomes from Maius (e.g. Portuguese judeu vs. maio).--Urszag (talk) 22:43, 6 May 2025 (UTC)
Yes, the outcomes of iudaeus are broadly in line with those of Deus, e(g)o, meus and not gaius, maior, Maius. Nicodene (talk) 22:54, 6 May 2025 (UTC)
@Urszag I think you've misunderstood my point: the sequence αῖο would have been pronounced [aj.jo] in Greek, so it's fully expected for the Latin borrowing to do the same. Theknightwho (talk) 23:04, 6 May 2025 (UTC)Reply
I'm not sure I accept that αῖο was pronounced [aj.jo] in Ancient Greek, given that αῖ isn't accented like a short vowel + double consonant sequence. I know Allen in Vox Graeca presents the analysis of Greek pre-vocalic diphthongs as short vowels + doubled semivowels, but it seems rather speculative, and in any case the foreword of Vox Graeca says its primary aim is to describe Attic Greek of the 5th century BC, centuries before Classical Latin.--Urszag (talk) 23:27, 6 May 2025 (UTC)Reply
@Urszag My point was not to say it's definitive; it was that resting your argument on the doubtfulness of [aj.jo] is not as self-evident as it first seemed. I am happy to accept that preconsonantal diphthongs may have been more relaxed than prevocalic ones, but that still leaves us with some questions:
  1. The ongoing discussion on how to handle ae and oe in prevocalic or word-final positions.
  2. How to treat the other diphthongs:
    1. Were ei and ui also relaxed in deinde and huic?
    2. How about au and eu?
Theknightwho (talk) 00:03, 7 May 2025 (UTC)
@Theknightwho I think there's a clear basis for saying that the diphthong ae, ending in a front glide, did not evolve symmetrically to the diphthong au, ending in a back glide. Aside from the lack of a spelling change from au to ao, we see that in Romance languages ae always evolves like monophthongal ĕ (or occasionally ē), whereas au is often maintained as a diphthong ending in [w] or [u], e.g. aurum > Galician ouro [ˈow.ɾʊ], taurum > Romanian taur. That supports the asymmetry between the pronunciations [ae̯] and [au̯]. It's not as easy to give examples of eu in vocabulary inherited from Latin to Romance, but I think we can safely conclude by analogy that it was also generally pronounced as eu̯. The pronunciation of the more marginal diphthongs ending in a front glide is more troublesome, but I think the transcriptions [ei̯] [ui̯] are adequate, even if the actual values could have involved slightly different vowel qualities such as ɛɪ̯, ɛi̯ or ʊɪ̯, ʊi̯. In native words, their development by fusion of originally separate vowels is generally more recent than the development of ae and oe.--Urszag (talk) 00:47, 7 May 2025 (UTC)Reply
@Urszag I agree with your argument that the transcriptions [ei̯] [ui̯] are adequate, even if the actual values could have involved slightly different vowel qualities. What I can't agree with is that we take a pedantic approach to ae and oe, but ignore these. Theknightwho (talk) 00:51, 7 May 2025 (UTC)
I would consider all of [ae̯] [oe̯] [ei̯] [ui̯] to be broad, somewhat uncertain transcriptions, so [ae̯] might have really been pronounced more like [aɪ̯], but since we can't be certain I prefer using the same letter the Romans did. I think there's a good chance that [ae̯] [oe̯] really ended in an opener phonetic quality than [ei̯] [ui̯], so I don't find it problematic to use different letters for their offglides. But if everyone else prefers [ai̯] [oi̯] [ei̯] [ui̯], that's OK with me.--Urszag (talk) 05:48, 7 May 2025 (UTC)Reply
I agree here with @Urszag. Benwing2 (talk) 05:52, 7 May 2025 (UTC)
My stance on ui and ei is that they're not to my knowledge anciently recognized diphthongs, and since diphthongs are progressively arbitrary beyond [ae̯] and [oe̯], I like to draw an arbitrary line as well and stop the diphthongs at eu, not including ui, ei and other possible ones, which I would put as the corresponding short vowel with [j]. Draco argenteus (talk) 07:19, 7 May 2025 (UTC)Reply
@Theknightwho, I thought it may be worth exploring some of the background for this question.
Per Adams' The regional diversification of Latin (pp 78‒88), evidence points to the monophthongization of Latin ae, presumably to [ɛ:], in several regional accents of the second and first centuries BC, but not in Rome itself - at least, not among the educated. Per Adams' Social variation and the Latin language (p 75), ‛various corpora show provincials in the first three centuries of the Empire writing e for ae with such regularity that monophthongisation must have been widespread across the Empire’, but there is no clear evidence for this in Rome itself. Further (p 80), ‛what little evidence there is in grammarians suggests that in the early centuries of the Empire there was an attempt to maintain the diphthong, but that by about the fourth century the monophthong was so established that it was acceptable even to grammarians’.
I do not know of a similarly detailed survey of evidence for the Greek diphthong, but per Allen's Vox Graeca (pp 75‒6), spellings indicative of a monophthongal value, presumably also [ɛ:], are found ‛from about 100 AD’ and ‛confirmed for this period by a specific statement of Sextus Empiricus’ (fl. 2nd or 3rd century AD).
Returning to our word iudaeus, the earliest quotations in Lewis & Short are from authors of the first century AD, such as Pliny. They likely encountered the Greek word with a diphthongal αι, and they likely had a diphthongal Latin ae in their own (presumably educated) speech.
To keep the Romance side of things brief, the inherited descendants of iudaeus unambiguously indicate a Proto-Romance *[juˈdɛu] and rule out **[juˈdajju]. This does not necessarily rule out a Classical pronunciation with *[ajj] since such a pronunciation could have been superseded by a monophthongized one imported from later Greek. Still, some form of positive evidence for *[ajj] remains a desideratum. Nicodene (talk) 22:12, 7 May 2025 (UTC)Reply
I believe there is an interesting point of distinction between a Greek-originating ae and a Latin ae, where Latin ae scans as short in verses when a vowel follows it. The example I remember was praeacutus. This seems to mark some conceptual differences between Greek ae's and Latin ae's. However this is just something I've been told is a fact, so I have no further references. Draco argenteus (talk) 08:59, 8 May 2025 (UTC)Reply
@Nicodene @Urszag Given that ae and ai do seem to scan differently, would it be reasonable to interpret the difference as [aj] for ae and [ajj] for ai? The presence of gemination being indicated by ai, as compared to ae, would explain why aii was never used, since it's an odd exception otherwise. Theknightwho (talk) 23:42, 14 May 2025 (UTC)Reply
@Theknightwho Perhaps?
I suppose my main question would be what motivated the change in spelling for the native diphthong from older ⟨ai⟩ to Classical ⟨ae⟩, if not a sound-change like [ai̯] > [aɛ̯] (en route to later [ɛ:]). Nicodene (talk) 00:02, 15 May 2025 (UTC)Reply
@Theknightwho, Draco argenteus I mentioned praeacūtus earlier. Cser analyzes prae- as /praj/, resyllabified as /pra.j/ before a vowel, but I'm not confident that Cser's analysis of prae- as consonant-final is phonetically accurate, even though it gets the right results. I don't think it is likely that Romans felt ae and ai functioned to distinguish [aj] from [ajj], since they used ae in the spelling of words like iudaeus: this was certainly not pronounced as [ˈjuː.da.jʊs], and it seems improbable it was pronounced as [juːˈdaj.ʊs]. Word-final ae is often completely elided, which is not typical behavior for a VC sequence.--Urszag (talk) 00:17, 15 May 2025 (UTC)Reply
@Urszag @Nicodene For the purpose of my point, it makes no difference whether we analyse it as [aj] or [ai̯]: the issue is the gemination. Nicodene's point on why the orthography used ae is a good one, but if ae is [ae̯] and ai is [ajj], then the apparent gap does still need to be explained.
One reason I can think of is orthographic: the use of AII at a non-morpheme boundary creates too much ambiguity, due to the number of ways it could be read. We see the sporadic use of long-I to get around this, but making a distinction between ae and ai works just as well. Theknightwho (talk) 00:34, 15 May 2025 (UTC)Reply
If the phenomenon only concerns prae, it could just be that prae was subject to some form of unstressed reduction. (À la prehendo?) Nicodene (talk) 01:26, 15 May 2025 (UTC)
  • I want to push back on the notion that having both forms will confuse readers. If this is really the case, then the link pointed to by the "key" text should be edited to say what these brackets mean. As for getting rid of the dual transcriptions... is it wrong for me to say that I find both forms useful? -BRAINULATOR9 (TALK) 00:47, 4 May 2025 (UTC)Reply
I linked to one example that I think shows this confusion occurs, and I can't point to it but I think I've seen other examples. However, I don't have any way to be certain how common this problem is. Adding more explanation behind a link may do some good, but only if readers follow it, which can't be guaranteed. Could you explain more about what you find helpful about having the phonemic transcription alongside the phonetic one?--Urszag (talk) 01:11, 4 May 2025 (UTC)Reply
I don't see any reason why we would have to have either "//" or "[]"- if we're not representing our notation as completely standard IPA, we're not bound to using the standard IPA conventions. It would simply be a matter of whether they would help us get our information across to our readers. Chuck Entz (talk) 03:51, 5 May 2025 (UTC)Reply
I favor using only IPA characters in the transcription. Using brackets seems like it helps to differentiate the phonetic transcription from other parts of the entry (including respellings, such as "nichil", which Theknightwho has proposed adding to pronunciation sections for some words).--Urszag (talk) 20:41, 5 May 2025 (UTC)Reply
I think we can safely use [ ] without any problems, and I'm really not keen on reinventing the wheel by using pseudo-IPA. Phonetic respellings are useful in that they highlight irregular pronunciations in a way that clarifies the differences. The very fact that "phonetic respelling" is there at all is a flag to users that the term is odd in some way. Theknightwho (talk) 21:34, 5 May 2025 (UTC)Reply
I agree with @Urszag and @Theknightwho. Benwing2 (talk) 21:38, 5 May 2025 (UTC)
Phonemic transcriptions are generally simpler and for a language as widely spoken as Latin was, I can't help but think that not everyone would have realized certain sounds the same way. Admittedly, I know very little about the specifics of how Latin speakers spoke Latin, and my thoughts are purely speculative, but it's nice to see what people thought they were saying versus what they were actually(?) saying. -BRAINULATOR9 (TALK) 01:57, 8 May 2025 (UTC)Reply
If people are confused about what // vs [] means, they're probably not only confused about it with regard to Latin, but any language, right? I suppose we could have all our various pronunciation modules output explicit explanatory notes when they sense or are triggered to output // vs [], like instead of "/fu/, [fuː]" they could produce "(broad phonemic:) /fu/, (narrow phonetic:) [fuː]" or something, for logged-out users. (The text could have some class so that logged-in users who know the difference could 'turn it off' / make it invisible.) - -sche (discuss) 03:55, 4 May 2025 (UTC)Reply
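The labelled output suggested above can be sketched roughly as follows. This is only an illustration in Python, not the code of any existing pronunciation module; the function name `label_transcriptions` and the `show_labels` switch (standing in for the idea of letting users hide the labels) are hypothetical.

```python
def label_transcriptions(phonemic: str, phonetic: str, show_labels: bool = True) -> str:
    """Join a phonemic and a phonetic transcription, optionally with labels.

    Hypothetical helper: the names and the flag are assumptions made
    for illustration only.
    """
    broad = f"/{phonemic}/"
    narrow = f"[{phonetic}]"
    if show_labels:
        # Spell out what each bracket type means for readers who may not know.
        return f"(broad phonemic:) {broad}, (narrow phonetic:) {narrow}"
    # Readers who know the convention can get the bare forms.
    return f"{broad}, {narrow}"


print(label_transcriptions("fu", "fuː"))
# (broad phonemic:) /fu/, (narrow phonetic:) [fuː]
```

With `show_labels=False` the same call yields the unlabelled pair, which is what readers who already know the bracket conventions would see.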
  • I'm all in favor of eliminating the overly narrow transcriptions we currently have, including the unnecessary contrast between velarized and palatalized allophones of /l/ and all but the most essential diacritics. As for qu gu su, w:Latin prosody § Quantity tells us "qu counts as one consonant", i.e. it was /kʷ/, but it doesn't say anything about gu and su before a vowel. —Mahāgaja · talk 14:22, 6 May 2025 (UTC)Reply
Saying that "qu counts as one consonant" is one way of conceptualizing the fact that it doesn't make a preceding syllable heavy. Another way is to analyze it as a tautosyllabic complex onset cluster, and say only heterosyllabic clusters make the preceding syllable heavy (because the thing that makes a syllable heavy is really the presence of a consonant in its coda). Compare patrēs: the first syllable can be scanned short (pătrēs [ˈpa.treːs]) but hardly anyone considers [tr] in this context to be a single consonant phoneme. We can equally say that ăquă is pronounced [ˈa.kwa], with a light first syllable because the cluster [kw] is syllabified here as an onset rather than being split across syllables as [k.w]. I'm fine with using the transcriptions [kʷ ɡʷ sʷ], but prosody doesn't actually prove that they must have been unitary segments.
Typically, "gu" [ɡʷ] only occurs after [ŋ] (in the context -ngu-) in Latin. It can occur in other contexts in Medieval or New Latin, but I don't think that's relevant to Classical Latin transcription. So there is no direct evidence of how [ɡʷ] affects prosody in Classical Latin when it comes directly after a vowel. However, I think analogy is sufficient reason to use the same type of transcription for both "qu" and "gu".
"su" [sʷ] only occurs at the start of a morpheme. The scansion of compound words such as mălĕsuādus [ma.lɛˈsʷaː.dʊs] shows that it does not make a preceding syllable heavy (compare the scansion of words like respondet [rɛsˈpɔn.dɛt], where even though the prefix re- has a short vowel, it gets turned into a heavy syllable by resyllabification of the [s] from the initial cluster in the base word, spondeō).--Urszag (talk) 19:30, 6 May 2025 (UTC)
The only other position it ever occurs is initial, in Medieval Latin (and possibly late Late Latin, too). Theknightwho (talk) 23:14, 6 May 2025 (UTC)
We can look to spelling artefacts such as distinguuntur/distinguntur, which are similar to loquuntur/locuntur/loquntur, which suggest similar behavior of qu and ngu, in being neutralized before another u but variously retained in spelling. Cassiodorus (I think?) specifies that nguu has the first u not pronounced and is identical to ngu. Draco argenteus (talk) 07:25, 7 May 2025 (UTC)
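The weight rule used in this thread (a syllable is heavy if it has a long vowel, a diphthong, or a consonant in its coda; a tautosyllabic onset such as [kʷ] or [tr] adds no weight to the preceding syllable) can be sketched as follows. This is a rough Python illustration, assuming a transcription that marks syllable breaks with "." or a stress mark and writes diphthong offglides with the non-syllabic diacritic; the function name and the vowel inventory are assumptions, not anything taken from the module.

```python
import re
import unicodedata

VOWELS = set("aeiouyɛɪɔʊ")   # assumed vowel letters of the broad transcription
NON_SYLLABIC = "\u032f"      # combining inverted breve below, as in [e̯]
LENGTH = "\u02d0"            # IPA length mark ː


def syllable_weights(transcription: str) -> list[str]:
    """Return 'heavy' or 'light' for each syllable of a dotted transcription."""
    # Stress marks also sit on syllable boundaries, so split on them too.
    syllables = [s for s in re.split("[.\u02c8\u02cc]", transcription) if s]
    weights = []
    for syl in syllables:
        syl = unicodedata.normalize("NFD", syl)
        # Drop combining marks (tilde, non-syllabic mark) to inspect base letters.
        base = [ch for ch in syl if not unicodedata.combining(ch)]
        long_vowel = LENGTH in syl
        diphthong = NON_SYLLABIC in syl
        coda = base[-1] not in VOWELS and base[-1] != LENGTH
        weights.append("heavy" if long_vowel or diphthong or coda else "light")
    return weights


print(syllable_weights("ˈa.kʷa"))
print(syllable_weights("rɛsˈpɔn.dɛt"))
```

Because the weight is read off the syllable boundaries alone, the sketch gives the same result for [ˈa.kʷa] and [ˈa.kwa]; it says nothing about whether the onset is one segment or two.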

Coming to a conclusion

I think it's time we concluded this discussion. It seems there is consensus among everyone (User:Urszag, User:Nicodene, User:Benwing2, User:Babr, User:Draco argenteus, maybe User:Mahagaja), except maybe User:Theknightwho, to discard the phonemic pronunciation and only display a broadly phonetic one, with the following properties:

  1. [a] not [ä]
  2. [y] not [ʏ]
  3. no dental diacritics on [z t d], i.e. not #[z̪] etc.
  4. syllable onsets use [j w] not [i̯ u̯]
  5. final nasals use [ĩː ẽː ãː õː ũː] not #[ɪ̃ˑ ɛ̃ˑ ãˑ ɔ̃ˑ ʊ̃ˑ]

This leaves the following that need resolution:

  1. Syllable final diphthongs; consensus is leaning towards [ae̯] [oe̯] [au̯] [eu̯] [ei̯].
  2. Alveolar diacritic on [s]; remove it or keep it?
  3. Dark [ɫ] (or [ł]? I dunno which is more correct) vs. light [l] or [lʲ]; do we show this distinction, and if so, how? I'm personally in favor of making a distinction, and probably dark [ɫ] vs. light [l].
  4. /ɡn/ sequences: scholarly consensus seems to lean towards [ŋn] and I think we should do likewise, even if Proto-Romance evidence equivocally suggests [ɡn]; keep in mind that (a) there's no reason the prestige variant of Classical Latin that we're documenting has to be the same as Proto-Romance, (b) the evidence for Proto-Romance [ɡn] is (AFAIK) somewhat equivocal.
  5. How to represent <qu>, <gu>, <su> when the u was semivocalic. I have no strong opinions here.

Benwing2 (talk) 23:38, 6 May 2025 (UTC)

My responses:
  1. I prefer [ae̯] [oe̯] [au̯] [eu̯] [ei̯] for reasons discussed above.
  2. I prefer [s] without a diacritic. But if there’s consensus to include a diacritic, I don’t object to it.
  3. The correct IPA symbol for a velarized lateral liquid is [ɫ]. I wouldn't object to using either [ɫ] or [l] for coda l (e.g. falx [ˈfaɫks], albus [ˈaɫ.bʊs], facul [ˈfa.kʊɫ]): there's pretty consistent evidence that this was velarized throughout the history of Latin. I don't think we should use [ɫ] before vowels, that is, for /l/ in the syllable onset: the evidence for velarization here is contradictory and suggests either variation over time or between speakers. I'm not a fan of using [lʲ] for the lightest allophone of /l/ (although this does follow Sen's transcriptions): I would instead favor transcribing nūllus as [ˈnuːl.lʊs], relinquō as [rɛˈlɪŋ.kʷoː]
  4. I prefer [ŋn], and would object to [ɡn].
  5. I don’t care whether we use [kʷ ɡʷ sʷ] or [kw ɡw sw]. I would object to an inconsistent system.--Urszag (talk) 00:16, 7 May 2025 (UTC)Reply
I agree on all of these points. Nicodene (talk) 00:21, 7 May 2025 (UTC)
I agree with all of Urszag's responses. But for 5, I prefer [kʷ ɡʷ sʷ] whereas they had no preference. — BABRtalk 08:17, 7 May 2025 (UTC)Reply
OK, based on the discussion below with Draco argenteus, I'm revising my answer for 1 slightly to [ae̯ oe̯ au̯ ɛu̯ ɛi̯] (and [ʊi̯] for "ui").--Urszag (talk) 22:01, 8 May 2025 (UTC)Reply
  1. /ae̯ oe̯ au̯ eu̯ ei̯/ are fine with me.
  2. I prefer no diacritic on /s/ (and I'm not even sure what the "alveolar diacritic" is anyway, or did you mean the dental diacritic?)
  3. I remain unconvinced that it's necessary to distinguish two varieties of /l/, especially if, as Urszag says, there isn't 100% certainty of their distribution in onset position.
  4. I prefer /ŋn/.
  5. I slightly prefer /kw ɡw/ but could live with /kʷ ɡʷ/. /sʷ/, on the other hand, really rubs me the wrong way. —Mahāgaja · talk 05:39, 7 May 2025 (UTC)Reply
1. Agree until eu and ei. This [8] primary source evidence makes me support [ɛu̯] (vowel quality of regular short e) instead of [eu̯]. For ei likewise [ɛi̯] by extension (although I can't confirm right now, I think Sidney Allen suggests the vowel quality of a short e for both of these). However, I oppose all semivowel i-final diphthongs (except yi), as they are not recognized in the Roman grammarian tradition to my knowledge, and too many of them can be created to no clear benefit or purpose (ui ai ei oi? how would they be distributed? very contentious and pointless) and I prefer recognizing them as simply u/a/e/o + [j] as a consonant.
2. Prefer [s̺]. Overall s being "special" is a popular topic, and it looks to be citable as [s̺] with a majority of scholars behind it.
3. Prefer [ɫ] in syllable coda, other than as part of /ll/. Prefer [l] for clear l without any further specifications. Identity of /l/ in syllable onset is fraught with some difficulties, and the easiest way out would be to follow Allen and make the distribution of light and dark l similar or identical to that in British RP -- clear before all vowels. However I'm not opposed to having a full distribution of dark l, where it is assumed before most vowels.
4. Prefer [ŋn].
5. Prefer [ʷ] for all three.
Draco argenteus (talk) 07:10, 7 May 2025 (UTC)
@Draco argenteus I strongly dislike any scheme that mixes the symbols [j w] and [i̯ u̯] to denote offglides. I feel like the transcriptions [j w] tend to imply the sounds have more phonetic constriction than [i u] and tend to suggest the sounds function as consonants on the phonological level. I don't think there is sufficient evidence to conclude that the "i" in words like deinde, deinceps, cui, huic either functioned as a consonant or had more phonetic constriction than the vowel [i]. But if we do adopt such an analysis, I think it makes most sense to apply it equally to seu and laus, ancient doctrines about diphthongs notwithstanding, so I would prefer [ae̯s] [sɛw] [laws] [ˈdɛjndɛ] [hʊjk] over [ae̯s] [sɛu̯] [lau̯s] [ˈdɛjndɛ] [hʊjk].
Thanks for pointing out that passage in Terentianus Maurus about the pronunciation of "eu". I'm not certain about how the first element of diphthongs was pronounced and I don't object to using [ɛi̯ ɛu̯ ʊi̯] instead of [ei̯ eu̯ ui̯].--Urszag (talk) 22:57, 7 May 2025 (UTC)Reply
It's possible to read the i-final diphthongs into cases for some reason not recognized by anyone as diphthongs, e.g. eius, maius, Troia, cuius (eius and cuius are both incidentally perhaps phonetically just ei and cui with one more syllable). Here for some reason the idea of there being a diphthong is not adopted, and the ancient view is used that the consonantal i is just said geminate. For this reason I'm for general [j] to eliminate the arbitrariness (diphthong when it's cui, nuh-uh when it's cuius), while I support a diphthongal identification for au and for eu for traditional reasons. Which I understand sometimes just slavishly follow Greek, though it is on that token applicable for Latin given that eu predominantly occurs in words coming from or through Greek and is then employed in Greek-derived poetic meters. But yes, I just suggest it all because it's as good a place to draw a line in the sand where diphthongs end and vowel+consonant sequences begin as any other. Re: constriction and so on, I do not see or imply any phonetic difference whatsoever between [i̯] and [j] (these are both a non-syllabic i), except that one conceptually here marks a part of a diphthong for me and the other doesn't. Possibly significantly, eventually the so-called "ui" and "ei" diphthongs lose their diphthongal quality in versification and become sequences of two syllables, although this could be a consequence of doctrine regarding diphthongs. Whereas an analogical uncoupling of au and eu did not happen, to my knowledge, nor of any other diphthongs; in fact this makes i-final diphthongs (other than yi imported from Greek) unique in Latin for being freely dissolved, and in fact, I believe, not considered diphthongs anciently. Draco argenteus (talk) 06:24, 8 May 2025 (UTC)Reply
@Draco argenteus It's true that if we use [ɛi̯ ʊi̯], there will be alternations between [i̯] and [j]. But as Cser points out, there are also alternations between [u̯] and [w] in contexts like the conjugation of verbs like faveō, fautum. I don't see how the one is more arbitrary than the other, and to me, it seems more arbitrary to distinguish [u̯] from [w] but not [i̯] from [j] than to make both distinctions in parallel, based on the position of the sound in the syllable. Given your position of "I do not see or imply any phonetic difference whatsoever between [i̯] and [j]", do you find the phonetic transcriptions [ɛi̯ ʊi̯] acceptable, even if they are not your first choice? I feel like the variable scansion of "ui" and "ei" actually is an argument for transcribing them with two vowel symbols, rather than a vowel symbol + a consonant symbol, since the disyllabic variants unambiguously contain two vowel sounds.--Urszag (talk) 20:53, 8 May 2025 (UTC)Reply
Sure, it's not my first choice but I find it acceptable, at least for now. Maybe it can be discussed later. The variable scansion does bring one to that topic. But, at this point, monosyllable will be easier for the readers who want to scan Classical verses. Draco argenteus (talk) 21:29, 8 May 2025 (UTC)Reply
[s̺] alongside [z(z)] would suggest a difference for which we, to the best of my knowledge, do not have evidence. Nicodene (talk) 19:13, 9 May 2025 (UTC)
@Benwing2 If I'm not mistaken, the changes favoured by a majority are:
  • [a] not [ä]
  • [y] not [ʏ]
  • remove dental diacritics
  • syllable onsets use [j w] not [i̯ u̯]
  • final nasals use [ĩː ẽː ãː õː ũː] not #[ɪ̃ˑ ɛ̃ˑ ãˑ ɔ̃ˑ ʊ̃ˑ]
  • [s] without diacritic
  • [l] not [lʲ]; [ɫ] okay in syllable coda (not in the geminate [ll])
  • keep [ŋn]
Questions on which we seem to remain undecided are:
  • Representation of diphthongs
  • Representation of e.g. aqua, lingua, suavis
Perhaps we can go ahead with just making the first group of changes for now? Nicodene (talk) 03:47, 14 May 2025 (UTC)Reply
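For illustration only, the context-free symbol substitutions in the first group above could be sketched as a small post-processing step. This is a Python sketch, not the code of the Latin pronunciation module: the name `simplify_transcription` is hypothetical, it assumes the older output wrote the dental diacritic as U+032A and the /s/ diacritic as U+0320, and it deliberately leaves out the context-dependent items (onset glides, final nasal vowels, coda [ɫ]).

```python
import unicodedata

# Each pair is (old sequence, new sequence), written in decomposed (NFD) form.
SUBSTITUTIONS = [
    ("a\u0308", "a"),   # [a + diaeresis] -> [a]
    ("\u028F", "y"),    # small capital Y -> [y]
    ("\u032A", ""),     # drop the dental diacritic (assumed to be U+032A)
    ("s\u0320", "s"),   # [s] without diacritic (assumed old diacritic: U+0320)
    ("l\u02B2", "l"),   # palatalised l -> plain [l]
]


def simplify_transcription(ipa: str) -> str:
    """Apply the context-free substitutions to one phonetic transcription."""
    # Decompose first so that precomposed and combining spellings match alike.
    text = unicodedata.normalize("NFD", ipa)
    for old, new in SUBSTITUTIONS:
        text = text.replace(old, new)
    # Recompose so that untouched characters come back in their usual form.
    return unicodedata.normalize("NFC", text)
```

Working on the decomposed form means the same rule catches both the precomposed "ä" and "a" followed by a combining diaeresis.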
Sounds good to me. I'll make the changes in a day or so. Benwing2 (talk) 03:54, 14 May 2025 (UTC)Reply
To give my view:
  1. Fine with all the changes proposed.
  2. My view's hardened on preferring [kʷ ɡʷ sʷ], for all of the reasons I've given above, as well as the fact that they never underwent the post-Classical shift from [w] > [v] (which isn't direct evidence for how they should be treated in Classical, but it's supportive of the other points mentioned).
Theknightwho (talk) 23:35, 14 May 2025 (UTC)Reply
I went ahead and implemented the "changes favored by a majority" and also disabled the phonemic notation unless |include_phonemic=1 is given. I didn't touch the representation of diphthongs or of qu/gu/su, so we are for now remaining with what was there before, which writes [ae̯ oe̯ au̯ ɛu̯ ɛi̯] and [kʷ ɡʷ sʷ]. I didn't change the use of l-pinguis before non-high-front vowels; this might be up for discussion. Benwing2 (talk) 06:15, 15 May 2025 (UTC)Reply
Thank you! One immediate request that I have that follows from the disabling of the phonemic transcriptions is to put the syllabification marks in the phonetic transcriptions. They were removed on the basis that a syllable division is not a phonetic entity. This is strictly correct; however, even though "." is not a sound, syllabification has audible effects and the syllabification of words like abluō is important to the meter of Latin poetry. Of course, this request is conditional on other editors agreeing with me.--Urszag (talk) 06:29, 15 May 2025 (UTC)Reply
I completely agree with this FWIW. Benwing2 (talk) 06:31, 15 May 2025 (UTC)Reply
OK I went ahead and restored the dots. It's a one-line change so we can always undo it if there are objections. Note that praeiūdicō is not being handled correctly either at the Classical or Ecclesiastical level; we'll need an additional rule for this, maybe. Benwing2 (talk) 06:42, 15 May 2025 (UTC)Reply
NVM, it's fixable with a manual syllable break. Benwing2 (talk) 06:44, 15 May 2025 (UTC)Reply
Looks much better, thank you. Agreed about the desirability of marking syllables. For /l/ before vowels, I'm happy with either this set-up (clear before /i:/ or /i/) or the ‛Allenesque’ type (clear before all vowels). Edit: I do however have some doubts about [ɫ] after [C], as in [ˈfɫoːs].
For Ecclesiastical, some possible points to discuss:
  • [ä] → [a] and removing dental diacritics, as with Classical
  • [kʷ ɡʷ sʷ] → [kw ɡw sw], as in Italian
  • [s̬] → either [s] or [z] (for intervocalic s, as in rosa)
    • [s] is more traditional/‛proper’. [z] seems to be gaining ground, as in Italian. [s̬] seems to be an attempt to transcribe both at the same time.
Nicodene (talk) 07:04, 15 May 2025 (UTC)Reply
I just noticed all the diacritics disappeared.
  1. If we're going to simplify things, can we document all the scholarly discussion at Appendix:Latin pronunciation? The [] is where I learned about [s̠], [t̪] and stuff and now it's gone and new learners won't discover this.
  2. Our new IPA is broad so shouldn't it use //?
  3. @Nicodene, Draco argenteus: [s̺] (dental) is the opposite of the [s̠] (laminal flat postalveolar) that was there before. Speaking of which why didn't anyone mention "postalveolar," why was [s̠] there, and are there any scholars arguing for it?
174.138.213.2 01:56, 16 May 2025 (UTC)Reply
  1. Yes, we should include information about the articulation of Latin sounds at that appendix.
  2. The term "broad" doesn't have a clear definition. Sometimes it is used as a synonym of "phonemic transcription": our transcription is not "broad" in that sense since it marks non-phonemic allophones such as /l/ as [ɫ] or [l], /i/ as [i] (before vowels) or [ɪ] (before consonants). A phonemic transcription represents phonemes. (For reasons discussed above, it is difficult or controversial in some cases to determine what Latin phonemes were present in a word, as in magnus, lingua, ēnsem, or they might not be transcribable with IPA because that alphabet doesn't have letters for abstract phonemes like a "placeless nasal" segment.) The 1999 handbook of the IPA introduces "broad" as a synonym for phonemic transcription, and then differentiates between various kinds of narrow transcriptions. Our current scheme for Latin would be categorized by its criteria as "allophonic"/"systematic narrow", of the subtype "slightly narrow" as opposed to "very narrow". Some quotes: "it is possible (and customary) to be selective about the information which is explicitly incorporated into the allophonic transcription", "Narrowness is regarded as a continuum" (page 29).--Urszag (talk) 02:34, 16 May 2025 (UTC)Reply
If we're going to simplify things, can we document all the scholarly discussion at Appendix:Latin pronunciation?
One suitable place to host this information would be the article Latin phonology and orthography.
The [] is where I learned about [s̠], [t̪] and stuff and now it's gone and new learners won't discover this.
The picture of would-be precision that our transcriptions previously gave was in large part fantastical.
Our new IPA is broad so shouldn't it use //?
// is for phonemic transcription. [] is for phonetic transcription, which can vary in broadness or narrowness. Some specialists like to use ⟦⟧ to distinguish (very) narrow transcriptions.
@Nicodene, Draco argenteus: [s̺] (dental) is the opposite of the [s̠] (laminal flat postalveolar) that was there before.
[s̺] is apical (apical alveolar in this case).
Speaking of which why didn't anyone mention "postalveolar," why was [s̠] there, and are there any scholars arguing for it?
For some citations pointing to scholarly discussions about Latin /s/, see here (with a brief overview here).
Nicodene (talk) 02:54, 16 May 2025 (UTC)Reply

On a new centralized citation system for bibliographic references


Happy 1st of May, my dear fellow editors. I would like to propose adopting a new system for handling citations and bibliographies, centered on a template I've developed called {{bibref}}, which works much like Wikipedia's {{sfn}} or our {{zh-ref}}. This system addresses several longstanding issues with how we cite sources, each requiring its own reference template:

  1. Creating or editing these templates is not easy for beginners and tedious even for experienced editors.
  2. They have grown into the hundreds, making them hard to manage and standardise.
  3. Full citations are quite lengthy considering they have to be repeated on each entry, and when an entry has a good number of them it hinders readability. Some have started to hide reference sections in boxes whenever they get too unwieldy.
  4. The most common way to have |pageurl= link to Google Books or the Internet Archive is to invoke Module:ugly hacks. It is unfortunate that we are still relying on a workaround for such a basic feature.

The new system would be using {{bibref}} in reference sections (e.g. as in 𐁁𐀴𐀍𐀦 (a3-ti-jo-qo)), which makes abbreviated citations linking to a full bibliography (e.g. the Mycenaean one), itself generated starting from a JSON-like database (e.g. Module:bibliography/data/gmy). I believe this system has a good number of benefits.

  1. The citation syntax and the process of adding sources is simpler and more human-readable, with no need to learn convoluted wikitext syntax and template conventions.
  2. Entry pages remain focused on their content, while full bibliographic details are offloaded to the dedicated bibliography pages.
  3. Each language (or, where appropriate, each family) would have its own bibliography page, making it easy to see what works have been cited, as opposed to the current system of checking the template categories (e.g. the Armenian one).
  4. Centralisation also improves maintainability and consistency. It becomes easier to find errors, dispreferred formatting, or missing metadata.
  5. Although the system is definitely far from perfect at the moment (a proof of concept made with Mycenaean in mind, possibly lacking features essential for other languages), I believe it more adaptable to future technical changes. Bots (or tireless editors) will not have to update hundreds of individual templates to enforce them.
  6. All this may encourage better referencing habits. By making it easier to cite, editors are more likely to actually include proper references.

I started this with Mycenaean Greek, as the examples I made earlier show, and similar templates have had successful precedents in Chinese and other languages of the Sinosphere (Japanese, Korean, Vietnamese), as well as on Wikipedia. The template should ideally be moved to {{R}}, by analogy with {{Q}}, to save keystrokes (currently under {{bibref}} because it would not have been a good idea to create a template with that name without prior community consensus). If adopted, the transition would be slow and gradual, allowing both systems to coexist.

Catonif (talk) 19:16, 1 May 2025 (UTC)Reply

I believe this system has a good number of benefits.
I also believe so.
I am not sure whether editors will find it easier to cite or to decipher.
Some have started to hide reference sections in boxes whenever they get too unwieldy. Now you hide them on separate pages, as admittedly Chinese pages already do. In any case designs on avoiding to repeat a reference in full if you link another page in the same work are legitimate, which however often is not a need in inline references. See in قالة#References I wanted to reference Høst, Georg Hjersing … page 272 and 277 and not repeat Høst, Georg Hjersing … after 272 again. Another solution would be to separately templatize a linking mechanism and/or only access that via a template that only writes the output “page xxx (linked)”.
I cannot comment on the ugliness of the ugly hack and the fairness of the new page fetching mechanism: it would have to be smart enough to distinguish within volumes of the same work or even works from collected works as in {{R:sem-eth:Littmann}}, no? In general the abbreviated style appears more legitimate in philologies of ancient languages, especially Trümmersprachen, where but academically inclined people read and “of course” know what Nakassis 2013 is, because they have these references all the time.
Either way it sounds like fun. You present new logics many people will not wrap their heads around, or otherwise will succeed in it and then forget about it, if interested so much in Wiktionary as the steady commenters here. It is a bit like trying to convince smokers of substituting their habit with vaping. The implication that there is no need to learn, and practice, something convoluted, and that editors are more likely to actually include proper references by the presented solution, has little verisimilitude from my angle: various citation styles being employed across multiple pages, as the varying qualities and workflows disgorged in them, sound equally plausible. (Lots of anarchists here.) Fay Freak (talk) 20:31, 1 May 2025 (UTC)Reply
Fair points. We can scratch the "easier" aspect of point #1 as it is subjective, though the current system is certainly not easy either: vaping has its inevitable flaws but it is less harmful than smoking. And let's scratch point #6 as well, optimism is not evidence and I do not claim to see the future.
And you are right, this works at its maximum potential on languages with a relatively stable bibliography, such as ancient languages or obscure LDLs, but in the end, even languages with greater literature often have those few go-to sources everyone ends up of-course-knowing. Not sure if you mentioned the Høst as a point in favour or a point against, but with the new system it would be {{bibref|ar|Høst:1781|p=272|p2=277}}, handling the page urls as well. And yes it can handle Littmann, and anything it cannot handle it can still be made to handle, the infrastructure is versatile. Catonif (talk) 21:25, 1 May 2025 (UTC)Reply
If I understand correctly, there will be no easy way of finding which pages use a specific reference. Like we currently do by "What links here"? Vahag (talk) 20:43, 1 May 2025 (UTC)Reply
Right, I can set up a tracking mechanism for that. Catonif (talk) 21:26, 1 May 2025 (UTC)Reply
@Catonif Hi. Are you essentially proposing a replacement for {{Q}} that works similarly but is better written and designed? BTW as for the proliferation of reference templates, before @Vininn126 created all the 6,000 or so Old Polish ones that currently exist, I suggested incorporating them into {{Q}}, but I wasn't able to help out because I didn't have the time and didn't (and still don't) understand how that monstrosity of a module works. I would generally be in favor of that but I'd like to get some more info on the specifics, and in place of things like p=272|p2=277 I'd encourage using a single comma-separated param with inline modifiers if necessary, as it's usually a lot easier to type. Benwing2 (talk) 22:18, 1 May 2025 (UTC)Reply
@Benwing2 Hi! The template is meant for references, so it actually aims to replace the {{R:}} templates, while {{Q}} will keep being used for quotations. About the specifics, I will eventually write a more exhaustive documentation, though for now you can get a rough idea of how it works by seeing the Mycenaean data module and all its current instances, alongside the Mycenaean bibliography database and its outcome. There is still a lot that needs to be fixed and polished, of course, but thought I would go into that after getting consensus, not to waste time in case people would have disagreed. About inline modifiers, you may add that syntax to the module if you want, although it may get messy. Take for example {{bibref|gmy|DMic.|v=1|mi-ta|p=454f.|da-ra-[.]-mi-ta-qe|p2=157ab}}, resulting in DMic., vol. 1, pages 454f.: “mi-ta”, page 157ab: “da-ra-[.]-mi-ta-qe”. With your syntax you could do |mi-ta<p:454f.>|da-ra-[.]-mi-ta-qe<p:157ab>, and for |p=272|p2=277 intuitively |p=272, 277, but for |mi-ta|p=454f.|p2=157ab? I will leave it up to you if you want to meddle with the idea, although I do not recommend it. Catonif (talk) 23:21, 1 May 2025 (UTC)Reply
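As a rough illustration of the numbered-parameter convention in the DMic. example above, where |p= belongs to the first quoted term and |p2= to the second, here is a Python sketch. It is not the module's code: `parse_bibref` is a hypothetical name, the call is split naively on |, and nested templates or links inside arguments are not handled.

```python
def parse_bibref(call: str) -> dict:
    """Split a {{bibref|...}} call and pair each quoted term with its page."""
    inner = call.strip()
    if inner.startswith("{{") and inner.endswith("}}"):
        inner = inner[2:-2]
    _name, *args = inner.split("|")

    positional, named = [], {}
    for arg in args:
        key, sep, value = arg.partition("=")
        if sep:                      # "key=value" is a named parameter
            named[key] = value
        else:                        # anything else is positional
            positional.append(arg)

    # Assumed layout: language code, source ID, then zero or more terms.
    lang, source, *terms = positional

    pairs = []
    for index, term in enumerate(terms, start=1):
        page_key = "p" if index == 1 else f"p{index}"
        pairs.append((term, named.get(page_key)))

    return {
        "lang": lang,
        "source": source,
        "volume": named.get("v"),
        "terms": pairs,
    }
```

For the call quoted above this pairs "mi-ta" with 454f. and "da-ra-[.]-mi-ta-qe" with 157ab. A call without quoted terms, such as the Høst one, would need separate handling of its page parameters.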
@Catonif Thanks. In terms of inline modifiers and commas, I see you are making |p= go with the first term and |p2= go with the second. I definitely think in that case that inline modifiers are better because it gets hairy if you have several terms, although I have a module Module:parameter utilities that specifically supports both inline modifiers and separate numbered parameters for list parameters like this, which I have used for things like {{syn}} that support both syntaxes. For | I was suggesting this under the assumption that |p= and |p2= were two pages for the same term rather than page parameters for separate terms. If you do need a way of specifying two pages for the same term, definitely use comma separators (and without a following space; the principle I've used is that comma + space is used for embedded commas and the separator isn't recognized in such a case). I see no issue with |p=454f.,157ab in case we need to refer to two pages for the same term and the pages have more complicated specs like just given. I will take a look at your implementation but in general it would be nice if {{R}} and {{Q}} were synchronized rather than being two entirely different implementations and double the cognitive burden for editors. Benwing2 (talk) 23:45, 1 May 2025 (UTC)Reply
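The two conventions described above (inline modifiers attached to a term, and a comma with no following space as the list separator) can be illustrated with a short Python sketch. This is not how Module:parameter utilities is implemented; the function names are hypothetical and only the simple, non-nested case is covered.

```python
import re

# A modifier looks like <key:value>; nested angle brackets are not supported.
MODIFIER = re.compile(r"<(\w+):([^<>]*)>")


def split_list(value: str) -> list[str]:
    """Split on commas that are NOT followed by a space.

    A comma followed by a space is treated as an embedded comma and kept.
    """
    return re.split(r",(?! )", value)


def parse_term(arg: str) -> tuple[str, dict[str, str]]:
    """Separate a term from its inline modifiers."""
    modifiers = dict(MODIFIER.findall(arg))
    term = MODIFIER.sub("", arg)
    return term, modifiers
```

With these helpers, `split_list("454f.,157ab")` yields two page specifications, while `parse_term("mi-ta<p:454f.>")` yields the term together with a `p` modifier.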
Actually the templates were for Polish, not Old Polish. Vininn126 (talk) 07:11, 2 May 2025 (UTC)Reply
On this note, I've been thinking about a template to more easily organize multiple reference templates. Something akin to {{reflist}}, but for non-inline templates, and the ability to control their style and even group them by whatever categories are needed for the entry. Vininn126 (talk) 07:16, 2 May 2025 (UTC)Reply
Looks very nice. Nicodene (talk) 09:28, 2 May 2025 (UTC)Reply
I am inclined to migrating to this new system. Thanks for having worked on it, Catonif. What appeals to me most is the standardization of syntax. Currently some people give the page number with |page=, others with |1=. Some give volume number with |volume=, others with |vol=, yet others with |2=. It's a pain to remember which template uses which. I also like the new technical capabilities, such as generating separate external URLs for non-sequential pages; the usual templates link only the first page. The automatically generated bibliography list for a given language is also very valuable for researchers in and of itself.
My concerns are:
  • Looking for reference templates by typing R + language code + first letters of the author name in the search bar will not work anymore. Searching for the ID in the data module is tedious. It would be nice if the bibliography list generates an easily copypastable ID. We could then look at the bibliography to find what we need.
  • The references at the end of each article will now be cryptic, barely comprehensible collections of numbers, letters and surnames for regular users without following the link to full bibliography. That means each article alone will be incomplete. I don't mind this as I want to capture readers inside Wiktionary biosphere, force them to read several articles, follow crumbs and maybe solve a riddle before I give them the answer. But others prefer to give full etymology chains, full cognate sets, full definitions, full references on each article, creating self-contained units that can be screenshotted and shared on Twitter.
  • Filling in data modules like Module:Quotations/xcl/data is a pain. Memorizing the rules of filling in the new proposed bibliography databases is worth only if my next point is solved.
  • The most prolific reference creators will have to be brought on board voluntarily or forcibly. At least me, the Fairy Freak but also User:AshFox who favours a peculiar syntax in reference templates. If not, we will have to memorize even more ways of formatting references. More pain instead of less pain.
Vahag (talk) 12:41, 2 May 2025 (UTC)Reply
Is the transition to this system mandatory in the future? For example, I am currently actively editing Old Novgorodian and references for it... Appendix:Old Novgorodian bibliography. I'm ready to try to move all this into one module... But why, if I need to edit one specific R, should I scroll down each time, look for the necessary line and in one huge list in the conditional Module:bibliography/data/zle-ono. Is this really more convenient? AshFox (talk) 13:28, 2 May 2025 (UTC)Reply
We don't even know how one reference template belongs to one language only, so we would scroll multiple lists or use a search after trying one or more lists. Here lies the advantage of {{refcat}}, and {{quotation template cat}} and the categories these templates (formerly nude category syntax) place references in.
How is the resource hunger of the new module accessing these lists? They should only access but one line, like now language data is accessed, otherwise the current citation templates are faster also in this respect.
Still the former can't be mandatory obviously because the outlook of moving thousands of templates to then hide the complete references from the main space and gain a bibliography is not motivating at all, and even leaves the impression – though it be irrational, if somebody agrees to it, which however nobody should suffer – that those who industriously created citation templates to source well and keep the Wikicode clean are now punished. In fact Category:Reference templates by language is a bibliography. The only thing we need is |pageN= or |p=454f.,157ab, as @Benwing2 proposed, within wonted {{cite-book}} templates, and another syntax (like ! determines whether page or pages is written) for only outputting the page without even the reference, useful when the reference is used within multiple footnotes, and on talk-pages discussing pages of a work, and in tables like Appendix:English dictionary-only terms. Fay Freak (talk) 17:31, 2 May 2025 (UTC)Reply
Thank you very much for the input! I will try to tackle the points you have made, and I appreciate the opportunity to improve the system with your help.
  1. @Vahagn Petrosyan: It would be nice if the bibliography list generates an easily copypastable ID. It now does! Try going on a bibliography page (e.g. gmy, or now that AshFox made it, zle-ono) and you will notice on the side bar a link that says "Show editor utilities" (the precise wording can be changed). This shows all the IDs of the sources for easier copy-pasting and searching, alongside a link that takes you to its usage tracker.
  2. [E]ach article alone will be incomplete. That is true, and I agree this is a shift in our philosophy. But I'd argue that (1) information is not too cryptic if there is a bright blue link that shows you what it means. We could even set up the page previews gadget so that a quick hover over the link could show you the full citation. And (2) as I think we both agree, it is not each article that needs to be complete but the project in its entirety. Readers who come here just to see one entry and then clear off or see our entries via screenshots on Twitter probably do not really care about bibliographic details anyways, while for editors and researchers centralisation does pay off.
  3. @AshFox: Is the transition to this system mandatory in the future? I am not the kind of person to go out and impose what I think is the best option on such a great userbase, I was only planning this on the technical side and did not consider amending editing rules to demand this. That said, as Vahag said and as FF illustrated with the comic strip, dual systems can eventually create friction. My hope is that the transition can be gradual and community-driven, and that the system proves useful enough that it gains traction naturally.
  4. Your work on Old Novgorodian bibliography is remarkable and really does you credit, you set a high standard to compete with. What the system aims to do is to make this kind of excellence easier to replicate. Note about [...] should I scroll down each time, look for the necessary line and in one huge list [...] that the search feature of your browser (Ctrl+F) should be about as fast as looking up the template name in the search bar.
  5. @Fay Freak: [W]e would scroll multiple lists or use a search after trying one or more lists. Good point, for that reason I added the option to import sources from one bibliography to another, { import_from = "LANG" }, so sources can be shared across multiple bibliographies just as now they appear in multiple categories.
  6. [L]eaves the impression [...] that those who industriously created citation templates to source well and keep the Wikicode clean are now punished. I hope not! and hope that they rather feel relieved they do not have to do that anymore. The work that went into those templates is the foundation we are improving upon, not something we are throwing away.
Catonif (talk) 19:27, 3 May 2025 (UTC)Reply
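To make the import option in point 5 more concrete, here is one way sharing sources between bibliographies could work, sketched in Python. Everything in it is an assumption: the real data modules are Lua, the exact semantics of import_from are not spelled out above (this sketch assumes it pulls in a whole bibliography), and `resolve_bibliography` and the entry layout are hypothetical.

```python
def resolve_bibliography(lang, databases, _seen=None):
    """Return {source_id: entry} for `lang`, following import_from entries.

    `databases` maps a language code to a list of entries. An entry is either
    a source ({"id": ..., ...}) or an import directive ({"import_from": code}).
    """
    seen = set() if _seen is None else _seen
    if lang in seen:          # guard against circular imports
        return {}
    seen.add(lang)

    local, imported = {}, {}
    for entry in databases.get(lang, []):
        if "import_from" in entry:
            imported.update(
                resolve_bibliography(entry["import_from"], databases, seen)
            )
        else:
            local[entry["id"]] = entry

    # Assumed precedence: a locally defined source overrides an imported one.
    return {**imported, **local}
```

Under these assumptions a source is written once and shows up in every bibliography that imports it, much as one reference template can sit in several categories today.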

When do we create a translingual section for an orthographic element [letter, diacritic etc] and when do we list individual languages separately?


We have two competing approaches, with no clear guideline.

A look at the entry for ⟨a⟩ illustrates how ridiculous articles can become if we attempt to list every language that uses a basic alphabetic letter. And often a letter will have closely related uses in a number of languages that have influenced each other orthographically. In such cases summarizing that usage in a translingual section would make sense.

On the other hand, we've set up categories for letters and diacritics of individual languages, and have navigation templates for individual alphabets that link to those language sections.

In some cases, a letter or diacritic is used for a single language, and it would be odd to call such situations 'translingual'. Examples are some of the Arabic letters of Serer and Rohingya orthography. At the extreme is ⟨Ⱦ⟩, which doesn't even have a lowercase form because the only orthography that uses it is monocase capital. [Unicode added l.c. just in case it's ever needed, but AFAICT its only use is as a typewriter hack for phonetic symbols that the writer doesn't have available, and which wouldn't be enough for Unicode to encode it.] Another example is ⟨[b⟩, originally a hack for barred b that AFAICT is only used for Kiowa. Sometimes a letter + digraph combo is created independently for two languages, which again would be odd to call 'translingual', esp. if the glyph origins were unrelated.

So, when I come across an article on a Unicode character whose only content is a 'definition needed' tag, and I find it's unique to a single orthography, should I create a dedicated language section for it? What if it's only recorded from two? Or if there are more, and we already have an alphabet nav template for one of the languages, or a category for the letters used by that language? kwami (talk) 23:37, 1 May 2025 (UTC)Reply

2.5 thoughts: (1) I don't know if we can sensibly avoid having lots of language sections on a, W etc, (1.5) unless perhaps we move things like pronunciation (/ˈdʌbəlju/, /veː/) to appendices? (2) Maybe we could solve/avoid the question of 'how many languages counts as translingual?' by replacing ==Translingual==, in character entries, with a header like ==Character==?
- -sche (discuss) 18:17, 3 May 2025 (UTC)Reply

Category: "native english words"


This isn't entirely a serious suggestion, but it nonetheless seems interesting to talk about. I saw that "native Korean words" is a category for Korean, and while Korean etymology on this site seems to be handled inherently differently from English (IE for Korean it's usually "first attested in..." rather than "from proto...")

But a category for native English words wouldn't be a horrible thing to do, even though it's weird and unnecessary.


To make it wholly clear a "native English word" would mean this:

proto indo european --> proto germanic --> proto west germanic --> old english --> middle english --> english

Which means no Old Norse words, no Latin words that came to Old English, no words that are known to be of Celtic origin brought into Proto Germanic (such as "iron"), none of that kind of stuff. The word must be confidently accepted as having come straight through the etymology chain I've listed.

This is quite a silly suggestion, I know. Troopersho (talk) 17:24, 2 May 2025 (UTC)Reply

The other way around, "native Korean words" is a poor excuse of a category and should be nuked. — SURJECTION / T / C / L / 18:02, 2 May 2025 (UTC)Reply
@Surjection It does actually have some benefit due to Korean being a language isolate- in effect, its own family. Chuck Entz (talk) 18:50, 2 May 2025 (UTC)Reply
Hardly. For one, there's Jeju, which is more likely a closely related language than a dialect. Secondly, we have Old Korean and Middle Korean as separate languages, so any 'native' Korean term should be marked as inherited from either. Thirdly, the category has been misused numerous times, because it is added by {{ko-etym-native}} - which people have on occasion added to obviously recent compounds, some of which were even formed from obvious recent borrowings. — SURJECTION / T / C / L / 18:57, 2 May 2025 (UTC)Reply
It is a useless category because you can browse Category:English inherited terms and so on. Fay Freak (talk) 18:42, 2 May 2025 (UTC)Reply
@Troopersho The category for this is Category:English terms inherited from Proto-Indo-European, which excludes all borrowings from other Indo-European languages. Theknightwho (talk) 10:21, 9 May 2025 (UTC)Reply
Let us please get rid of the Korean category. Polomo47 (talk) 15:30, 9 May 2025 (UTC)Reply

Requested unprotection of sweet summer child


In 2021, the above page was indefinitely protected, allowing only autoconfirmed users to edit it. The user who did this, and who left the project in 2024, gave this reason, based on what they found on the talk page at the time: Excessive vandalism: people keep falsely adding the claim that this pre-dates modern Game Of Thrones books. However, as shown in (Talk:sweet summer child#Etymology) since, those claims had been largely true. We need to rewrite the etymology section, which I removed for now. We could use any help we can get, including from unregistered users. Regardless, the original reason for the protection wasn't valid, so there's no reason to keep it in place. Renerpho (talk) 04:43, 4 May 2025 (UTC)Reply

@Renerpho: It is the first citation of this sense at Citations:sweet summer child. Despite claims that it was used earlier, no citation was added (note: we already have the ones mentioned on the talk page under “poetic allusion of various meanings”). J3133 (talk) 05:01, 4 May 2025 (UTC)Reply
I have restored the etymology as your rationale, “those claims had been largely true”, is without evidence; we already mention that “isolated occurrences go back to the 1800s” (i.e., the mentioned claims which we already had). J3133 (talk) 05:06, 4 May 2025 (UTC)Reply
I think we may be stuck until (if ever) any other etymology-dictionary or scholarly/reference work looks into this. (I'm surprised by how strident the people who think GRRM either definitely did or definitely didn't coin it are.) I think any wording has to hedge, and acknowledge the prior attestations. The current wording leans towards saying he coined it, but does hedge enough, I think. (I will note that many words and names which people have held him up as coining, like Margaery, have turned out to long predate him, so I would view any unhedged statement that he definitely coined this with scepticism.) - -sche (discuss) 01:13, 7 May 2025 (UTC)Reply

Nakba as WOTD


I noticed that Nakba was set as a WOTD for next week. I have no problem with the entry itself but I'm worried that featuring it on the main page might cause controversy given the current political situation. Brexiteer was cancelled a while back for similar reasons (link to that discussion). What do we think? (@Sgconlaw) Ioaxxere (talk) 06:54, 5 May 2025 (UTC)Reply

Happy to go with whatever the consensus is. It was proposed on the WOTD nomination page. — Sgconlaw (talk) 11:24, 5 May 2025 (UTC)Reply
I don't think I have a problem with the entry or the upcoming feature, and as entries for words go, the entry looks more or less fine. In general, I think I'd enjoy WOTDs more when they are just interesting, qualifying words picked out of a hat and aren't politically topical or remembrance based; or, if there's a good entry that also happens to be political, it can be featured on any arbitrary day instead of holding it on a themed day, and that way avoids some controversy too. But I can see that the featurer takes much pride in constraining nominations into day-related themes. (Otherwise it probably gets boring.) I wonder if forcing themes for every WOTD also biases against featuring the many nominations that are just "plain" words (adverbs, interjections, case in point, blud, ouster). Anyway, there are definitely more than enough interesting and not politically controversial words to feature, for next time! (The pace of nominations is currently slower than one a day, I wonder how come there was such a large backlog previously?) Hftf (talk) 11:55, 5 May 2025 (UTC)Reply
@Hftf: on your last point, it could be because some editors like to nominate a whole raft of terms at one go. If you look at the list of nominations you’ll see some instances where there are multiple nominations all with the same timestamp. — Sgconlaw (talk) 12:43, 5 May 2025 (UTC)Reply
I'd vote against WOTD for any politically charged words like this. There might be some sufficiently aged as to have lost their ability to inflame. DCDuring (talk) 13:36, 5 May 2025 (UTC)Reply
Strategically not the best feature in view of recent US government investigations against Wikimedia questioning its nonprofit status due to alleged foreign-influenced political propaganda. Israel couldn't care less, but we have to keep the main-page innocent enough for MAGA-hats. Fay Freak (talk) 13:42, 5 May 2025 (UTC)Reply
@Fay Freak I don't think we need to yield to such poorly-motivated political threats, as long as we aren't actually doing anything wrong. Wiktionary should remain free of any government interference and as far as possible should shirk any attempts to censor it, if we care to actually be a free dictionary that embodies the values we say we do. At any rate, I just despise the idea of having to conform to some arbitrary threat like this. The good thing is that Wiktionary wasn't mentioned in that document, as far as I could see, and generally Wiktionary seems to have nearly 0 optics effect compared to Wikipedia, so I'm sure we can get away with something as small as this.
But ultimately we should decide separately whether we actually want Nakba to be featured; I think it's a relevant and topical word, and I don't think the fact that it's controversial should disqualify it from being featured, as long as there is no opinion being promoted by its inclusion. Kiril kovachev (talkcontribs) 16:16, 5 May 2025 (UTC)Reply
Given the current administration's obsession with language ("banned words" etc.) it's probably just a matter of time before Wiktionary too will get in the crosshairs. Better not poke the bear. Jberkel 12:22, 6 May 2025 (UTC)Reply
We should under no circumstances kowtow to proto-fascist chuds and bullies. —Justin (koavf)TCM 22:23, 7 May 2025 (UTC)Reply
I don't really see a problem with this entry being featured. The only strong opinion I have is that I agree with @Kiril kovachev that we shouldn't yield to political threats from the current US administration. We should only prioritize the reader's feelings, not the president's. — BABRtalk 19:04, 7 May 2025 (UTC)
I think it's best to stay away from prominently featuring politically charged terms as WOTD, and the I/P area is about as politically charged as it gets. From what I can tell, Israelis and Palestinians have diametrically opposed views of the 1948 war, and featuring the term "Nakba" on Nakba Day will almost certainly be interpreted as a political statement and attract a lot of unwanted attention. The same arguments were (IMO cogently) made for not featuring "Brexiteer" on Brexit Day, and I think the same issue would come up if, for example, we were to feature the word aliyah on Aliyah Day. This has little or nothing to do with the current US administration and any hypothetical threats they may make, and much more to do with the fact that we are a dictionary, and need to avoid any appearance of bias. Benwing2 (talk) 22:20, 7 May 2025 (UTC)Reply
I think keep it: it's a word that someone may plausibly see written or hear spoken somewhere and may want to know what it means. —Justin (koavf)TCM 22:23, 7 May 2025 (UTC)Reply
If we are to keep it I would strongly argue moving it to a non-"themed" day. Benwing2 (talk) 22:43, 7 May 2025 (UTC)Reply
Yes, agreed with this. Fine to keep it - it's a valid word - but let's put it on some other day. This, that and the other (talk) 23:36, 7 May 2025 (UTC)Reply

To sum up at this point (correct me if I'm wrong):

If we are to feature this word on another date, does anyone object to a date in May 2025? — Sgconlaw (talk) 13:54, 9 May 2025 (UTC)Reply

I think it should be on May 15 and capitulating to gross chuds is bad policy. I respect anyone who thinks that we should generally avoid contentious entries on the front page, tho and have no objection to it being on another day for that reason. —Justin (koavf)TCM 13:56, 9 May 2025 (UTC)Reply
@Sgconlaw To clarify my opinion, I would also not object to it indeed being on May 15, and in my opinion that would be the most relevant day to put it – but having it on another day may be less provocative, if that is what we are going for. Kiril kovachev (talkcontribs) 14:01, 9 May 2025 (UTC)Reply
As the emotionality of the discussion above shows: best to avoid contentious political words of all stripe, on all days. 2A00:23C5:FE1C:3701:5CD6:5C00:85E2:3C8A 14:01, 9 May 2025 (UTC)Reply
Thanks. (Just wanted to point out that 16 May is the International Day of Living Together in Peace, but I guess featuring the word one day after Nakba Day would also attract the same concerns expressed above …) — Sgconlaw (talk) 14:03, 9 May 2025 (UTC)Reply
I think it's fine to feature on the scheduled day. I think it'd be weirder to feature it on some random day. (The specter of Trump, invoked above, can and should be ignored. It's been amply demonstrated that people who comply in advance with whatever they think his likes and dislikes are, especially when, as here, he has no power to make them do anything, simply attract him to make more demands, whereas people who keep doing what they're doing succeed.) - -sche (discuss) 21:02, 9 May 2025 (UTC)Reply
@Sgconlaw My vote is that it not be featured at all, but if it is to be featured I'd prefer a date other than May 15, e.g. early June. @-sche Just curious, if we were to feature the word aliyah on Aliyah Day (see w:Yom HaAliyah) and call out (as we tend to do with "themed" words) the fact that this is a celebration of immigration to Israel, would you object? Benwing2 (talk) 21:24, 9 May 2025 (UTC)Reply
@Benwing2: I hazard that the concepts of punching up and punching down are relevant here. There is less objection to an entry that punches up—targets a group that is of greater power or status—than one that punches down. Thus, an entry that highlights an oppressed group is arguably less objectionable than one that highlights an oppressor or aggressor. — Sgconlaw (talk) 21:52, 9 May 2025 (UTC)
@-sche: This applies for Trump personally. If we can get him or one of his cabinet to leave a snarky remark about Wiktionary disseminated in the media, I would of course support putting up something offensive. If insane or random enough, then you win against the madmen. But the concerns of ourself censoring ourselves, I want to highlight the speciosity of this line of argument, are far from material. The cover is not the book. Compliance only concerns self-presentation, not delivered value—mere marketing stunts, which are cap-a-pie interchangeable like a WordPress theme. There is no legal risk or political risk if you get away with it. But I believe that we are too clumsy and amorphous a mass of secluded scholars to manage the message control ruthlessly to our satisfaction, so there is innegligibly a point in unwanted attention. Fay Freak (talk) 16:13, 10 May 2025 (UTC)Reply

Current tally:

Seems we are pretty tied at the moment. I guess it's OK if I cast a vote. @Ioaxxere, do you have a view? — Sgconlaw (talk) 21:17, 9 May 2025 (UTC)Reply

@Sgconlaw: I don't want to have this as WOTD, because it looks too much like Wiktionary is trying to promote one side of a VERY vehement dispute. How about having a dual WOTD on May 16, with both Nakba and aliyah, to show two sides of a dispute that constitutes one of the biggest challenges to realizing the goals of that day? Chuck Entz (talk) 21:49, 9 May 2025 (UTC)Reply
@Chuck Entz: I rather like that idea. Is aliyah a suitable word—is it a coordinate term to Nakba, or at least sufficiently related to be featured alongside it? — Sgconlaw (talk) 21:55, 9 May 2025 (UTC)Reply
No Hftf (talk) 22:22, 9 May 2025 (UTC)Reply
@Hftf: that's too terse. What are you responding to? — Sgconlaw (talk) 22:33, 9 May 2025 (UTC)Reply
It's not really a coordinate term (what's the hypernym?) and just kind of weird to make special exceptions because a word is in a controversial topic. Will it set a precedent that featured words related to controversial topics need to balance "camps"? This is a dictionary, it should once a day simply feature a word that is interesting and not particularly controversial, and having themes and attempts at counterpoints only increases controversialness of a feature, such that I'd rather just do neither/none, but I can't say if anyone else feels the way I do. Edit: To clarify (and repeating what I wrote above), I don't have a particular issue with the scheduled feature in the queue as-is; my vote would be to leave as-is along with a resolution to keep in mind/avoid future controversialness in word selection and theme, such as by assigning future controversial words to any day. Hftf (talk) 22:52, 9 May 2025 (UTC)Reply
My vote would also be to leave it as-is; I don't believe I said anything about the day it's featured. The only strong opinion I had was that we shouldn't self-censor because of the POTUS. — BABRtalk 05:19, 10 May 2025 (UTC)
It's hard to say: the aliyah was made possible by the Nakba, but the latter wasn't its goal. They're different kinds of things, but they're inextricably linked. It just shows that history is more complicated than the narratives of either side can explain. Chuck Entz (talk) 22:32, 9 May 2025 (UTC)Reply
@Chuck Entz: right, I get it. I'm OK with going with those terms. Does anyone still have objections? — Sgconlaw (talk) 22:34, 9 May 2025 (UTC)Reply
There was no single aliyah: there were multiple waves of Jewish immigration to the Holy Land. —Justin (koavf)TCM 22:56, 9 May 2025 (UTC)Reply
I like this option less than any of the others. Having 'topical' WOTDs is OK I guess, but I think we should avoid making it seem like we're taking a stance on the referent of the word itself. If there's too much risk of that, it seems better to use another word or day. Trying to achieve balance by including words for "both sides" of an issue on one day seems to just further establish a politicized tone to the WOTD (bothsidesism is not apolitical: it is also a political position). I agree with what Benwing's said.--Urszag (talk) 22:53, 9 May 2025 (UTC)Reply
There should only be a single word on a single day. I think it is more evil to shy away from a word due to politics than to just go ahead with it. I would support all words, I would support pushing boundaries. Wiktionary is a more beautiful and long lasting thing than this temporary wicked era. Look forward to seeing my comment in evidence in court when we fight for WMF's freedom rofl sup Judge, please support freedom of expression. Geographyinitiative (talk) 23:06, 9 May 2025 (UTC)Reply
@Geographyinitiative: we have previously featured more than one term as WOTDs on a single day, for example, when we had an anagram theme. However, this is very much an exception for the simple fact that it takes twice as long to set the WOTDs—it's a lot of work. — Sgconlaw (talk) 15:09, 10 May 2025 (UTC)Reply
@Sgconlaw: On balance, I would support option #2, since really the political edge seems to come from the description of Today is Nakba Day, which commemorates and protests the Nakba. Trying to cut out any word referring to a historical event shies too close to self-censorship in my opinion. Ioaxxere (talk) 01:05, 10 May 2025 (UTC)Reply
Updating the tallies, I think that on balance most editors do not mind the term being featured (and feel that to deliberately exclude it would be self-censorship and cowing to bullies), but agree that it should not be on Nakba Day itself. There isn't much support for @Chuck Entz's suggestion of featuring both Nakba and aliyah on 16 May 2025. Thus, I'm going to shift Nakba to a less contentious date. What about (1) 15 November (anniversary of Palestine's declaration of independence in 1988); (2) 29 November (anniversary of the date when Palestine was given observer status by the UN General Assembly in 2012); or just (3) 31 May (no particular commemorative date)? — Sgconlaw (talk) 15:09, 10 May 2025 (UTC)Reply
I agree with Benwing's "I would strongly argue moving it to a non-"themed" day." (1) and (2) are still themed days, so I vote for (3).--Urszag (talk) 16:53, 10 May 2025 (UTC)Reply
@Sgconlaw I agree with Urszag, of course, and would vote for (3). Benwing2 (talk) 10:53, 11 May 2025 (UTC)Reply

Famous people bringing attention to Wiktionary

There is a joke from the previous-to-last Reich about a fabricated German word repunsieren. A small number of blokes recline in the boozing ken when one of them designed that he shall be part of history, word history precisely, by not waiting till next day to introduce the verb, in this exact spelling, to general parlance, so he asked the buffetière whether he could do it, whereupon she was grossly offended, however in turn the alemaster—anxious as every neurotypical about the adherence to social conventions, though their origin be buried forever—had to assure his guests that of course they may, by which the first follower was gained. Soon afterhand one could read on signs in public whether said action towards the personnel was admitted, &c.

So who will get the Pope to cite Wiktionary? We are already repeat guests in DOI-literature, but hitherto only worldly and utmostly peripheral one. Fay Freak (talk) 16:49, 10 May 2025 (UTC)Reply

Enabling Dark mode for logged-out users

Hello Wikimedians,

Apologies, as this message is not written in your native language. Please help translate to your language.

The Wikimedia Foundation Web team will be enabling dark mode in this Wiki by 15th May 2025 now that pages have passed our checks for accessibility and other quality checks. Congratulations!

The plan to enable is made possible by the diligent work of editors and other technical contributors in your community who ensured that templates, gadgets, and other parts of pages can be accessible in dark mode. Thank you all for making dark mode available for everybody!

For context, the Web team has concluded work on dark mode. If, on some wikis, the option is not yet available for logged-out users, this is likely because many pages do not yet display well in dark mode. As communities make progress on this work, we enable this feature on additional wikis once per month.

If you notice any issues after enabling dark mode, please create a page: Reading/Web/Accessibility for reading/Reporting/xx.wikipedia.org in MediaWiki (like these pages), and report the issue in the created page.

Thank you!

On behalf of the Wikimedia Foundation Web team.

UOzurumba (WMF) 00:08, 7 May 2025 (UTC)Reply

Street names

(See RFD.) It feels weird to have things like A2 paper size and credit rating but not (as I understand our current CFI) A4, M4 or I-5 (roads). For that matter, I see some lexical value in entries for other (more name-ly named) roads: pronunciation; some have interesting etymologies; a few have translations.
It gives me pause that there are a lot of roads—then again, there are also a lot of personal names: it too is an open-ended class—but it feels weird to have the tiniest villages and rarest personal names, but not even the most prominent roads ... particularly since in some cases (e.g. "she lived near Foobar") it's unclear to "someone [who] run[s] across it and want to know what it means" whether a given thing is a hamlet, neighbourhood, name or other entry that they can expect to look up here, or a road we exclude. So I'm wondering if there's any appetite for changing CFI to allow at least some (or any attested—the same criterion as given names) roads.
(One idea: only allow road names that aren't just applications of personal- or place- names to roads, so no entry for Washington Street because we already have Washington the name.) - -sche (discuss) 01:03, 7 May 2025 (UTC)Reply

Roads with non-literal translations can be translation hubs, and the most prominent roads generally have figurative senses or connotations beyond just their literal meaning. Can you give some specific examples of roads you'd like included but which you think are currently excluded by CFI? If I think of Austin, for example, the "interesting roads" are the two main freeways, I-35 (which is formulaically named, so maybe not so interesting) and MoPac Expressway (technically "Texas Loop 1"; the name "MoPac" refers to the Missouri Pacific railroad and could be its own entry as a sort of abbreviation), as well as some streets with weird pronunciations: Manor /'meɪn.ǝr/, Manchaca /'mæn.ʃæk/ (recently renamed to Menchaca, supposedly closer to the original form, but the pronunciation hasn't changed), Burnet /'bɚn.ɪt/, Guadalupe /'gwɑd.ə.lup/, Brazos /'bræz.əs/; but all are the names of nearby towns or rivers, with the same pronunciation. I'm also concerned that unless we formulate a very narrow exception for roads, we'll be inundated with streets and roads from Joe Schmoe editor's home town. Benwing2 (talk) 06:12, 7 May 2025 (UTC)Reply

Drill commands Category:

Should a drill commands category be created under the military category? So far 4 languages have this type of category.

https://en.wiktionary.org/w/index.php?search=Category%3ADrill+commands&title=Special%3ASearch&ns0=1 𝄽 ysrael214 (talk) 13:34, 7 May 2025 (UTC)Reply

And if not, please delete tl:Drill commands category and its links to the pages. Thanks. 𝄽 ysrael214 (talk) 21:05, 7 May 2025 (UTC)Reply

Change "Negerhollands" to "Virgin Islands Dutch Creole"

Would it be possible to change the language name for ISO 639-3: dcr from Negerhollands to Virgin Islands Dutch Creole? The former term is rather contentious, as it contains the Dutch N-word. The latter is also just a better description of the language and is the term used by Glottolog. 92.254.93.189 19:07, 7 May 2025 (UTC)Reply

Moved from Wiktionary:Feedback for discussion. —Justin (koavf)TCM 19:10, 7 May 2025 (UTC)Reply
Wiktionary and Wikipedia's entry on neger say it's not the "Dutch N-word" (for which a different word is used), but more like the English word Negro; yet it is still increasingly charged. The main argument against the change, as far as I can see, is potential confusion with Virgin Islands Creole (ISO 639-3 code vic), which is not the same thing and is an English-based rather than Dutch-based creole. Possibly for this reason, Wikipedia still uses the term Negerhollands. But on balance I think this rename makes sense. Benwing2 (talk) 22:37, 7 May 2025 (UTC)

Inconsistencies between Persian and Tajik transliterations

(Notifying Atitarev, Benwing2, Rodrigo5260, Saranamd, SinaSabet28, Samiollah1357): also @Light hearted sam, and @स्वर्गसुख (who's recently been editing Tajik)

There are two major inconsistencies between Persian and Tajik transliterations (that don't represent pronunciation differences) that I'd like to iron out. I'd like to propose two changes for more cohesion:

1. Change Persian ğ > ġ to match Tajik, and other Arabic script languages like Urdu.

2. Change Tajik ʾ > ' to match Persian. Using ʾ in the first place is a bit weird, as it implies a distinction from ʿ, which Tajik does not even have a letter for. — BABRtalk 21:10, 7 May 2025 (UTC)Reply

I would prefer to change Tajik ġ to ğ, but I agree with your other proposal. Rodrigo5260 (talk) 21:27, 7 May 2025 (UTC)Reply
well ġ/ḡ are much more common ways of transliterating غ and ğ isn't really common outside of Turkic languages. — BABRtalk 22:06, 7 May 2025 (UTC)Reply
I also agree with the proposal generally. One could also consider ɣ instead of ġ, which is used in some sources. Samiollah1357 (talk) 22:11, 7 May 2025 (UTC)Reply
I think it's best to stick to modified versions of letters used in English, plus ɣ is already in IPA. One may consider all Arabic script languages adopting ḵ for خ and ḡ for غ, which I would like because it would match the common romanizations of 'kh' and 'gh', and would have more cohesion (for letters that are pronounced the same), though I'm not sure there would be support for such a thing. Plus, as many Indic languages use multiple scripts and try to match romanizations, we would have to bring in Indo-Aryan editors into the discussion so they can have cohesion and... it's probably not worth it. — BABRtalk 18:42, 8 May 2025 (UTC)Reply
I was wanting to propose the ḵ over the current x we have for خ since x could be confusing for someone not familiar with the transliterations here. Light hearted sam (talk) 08:02, 9 May 2025 (UTC)Reply
@Light hearted sam well, if you'd like to propose that, I would support using both ḵ and ḡ in our romanization (as a pair), but not ḵ alone. (ḵ and ġ feels weird) — BABRtalk 12:56, 9 May 2025 (UTC)
  Support. Also ğ is confusable with ǧ, which is used in some Arabic transcription systems to represent /dʒ/. Benwing2 (talk) 22:39, 7 May 2025 (UTC)Reply
  Support. Please also restore the transliteration of ع word initially as ' as it has been for ever, even if Tajik doesn't use a letter for it. Anatoli T. (обсудить/вклад) 22:59, 7 May 2025 (UTC)Reply
  Support. Seems good. Light hearted sam (talk) 08:59, 8 May 2025 (UTC)Reply
  Support. स्वर्गसुख (talk) 17:58, 8 May 2025 (UTC)Reply
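
For illustration, the two changes proposed at the top of this section could be sketched as a small post-processing step in Python. This is only a sketch: the function name `apply_proposal`, the table `PROPOSED_CHANGES`, and the use of the codes `fa` and `tg` as keys are assumptions made here, not how Wiktionary's transliteration modules are actually written.

```python
import unicodedata

# Hypothetical table, for illustration only; not Wiktionary's actual module data.
# Escapes are used so each key is guaranteed to be a single code point.
PROPOSED_CHANGES = {
    "fa": {"\u011f": "\u0121"},  # proposal 1: Persian ğ -> ġ
    "tg": {"\u02be": "'"},       # proposal 2: Tajik ʾ -> '
}


def apply_proposal(lang_code: str, translit: str) -> str:
    """Return translit with the proposed symbol swap for lang_code applied."""
    # Normalize first so a decomposed g + combining breve is also caught.
    text = unicodedata.normalize("NFC", translit)
    table = str.maketrans(PROPOSED_CHANGES.get(lang_code, {}))
    return text.translate(table)


print(apply_proposal("fa", "a\u011fa"))  # placeholder string, not a real entry
print(apply_proposal("tg", "a\u02bea"))  # placeholder string, not a real entry
```

Languages without an entry in the table pass through unchanged, so the swap stays limited to the two romanizations under discussion.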

Transliteration of Initial Ayn

(Top comment repeated for clarity)
  Support. Please also restore the transliteration of ع word initially as ' as it has been for ever, even if Tajik doesn't use a letter for it. Anatoli T. (обсудить/вклад) 22:59, 7 May 2025 (UTC)Reply
@Atitarev the change with ع that you're referring to was proposed by Saranamd on Discord, not me. I supported the proposal, but perhaps it was unfair to have the discussion on Discord and not on-wiki. — BABRtalk 23:40, 7 May 2025 (UTC)
@Babr: Yes, thanks. That's what I meant. We should have a discussion and agreement here first. Pinging @Saranamd as well. Anatoli T. (обсудить/вклад) 23:50, 7 May 2025 (UTC)Reply
For the context, the initial ع is not transliterated in:
  1. Classical Persian: عَائِلَه (ā'ila) (also Dari)
  2. Iranian Persian: عائِلِه (â'ele)
Arabic: عَائِلَة (ʕāʔila)
Urdu: عائِلَہ ('āila) Anatoli T. (обсудить/вклад) 23:57, 7 May 2025 (UTC)Reply
Yes, I would also like to add that I support Saranamd's proposal to remove it, because transliterating initial ع implies a pronunciation difference between ع and consonantal alif/alef (ا), when in reality, they are both representing glottal stops. — BABRtalk 00:02, 8 May 2025 (UTC)Reply
@Babr: A plain ا doesn’t carry any written consonant, unless it has a hamza, above or below, as in أ or إ. So, the glottal stop is not spelled, unlike the case with ع. Anatoli T. (обсудить/вклад) 03:56, 8 May 2025 (UTC)
@Atitarev What you're saying applies to Arabic, not Persian. In Persian, hamza cannot appear at the beginning of a word (and thus is never seated on an initial alif), because an initial alif is always a glottal stop. — BABRtalk 04:17, 8 May 2025 (UTC)
@Babr: There is no difference. What you're saying is that the glottal stop consonant is never written in Persian when ا is used. If it's not written, we don't transliterate it. The letter alef/alif is not a consonant, it has a special purpose.
There is no problem of NOT writing ' in the transliteration of Classical Persian اَلِف (alif) or Iranian Persian اَلِف (alef).
The first letter in Arabic words اِسْم (ism) and إِسْرَاع (ʔisrāʕ) are pronounced the same way /ʔi-/ but are transliterated differently dependent on how they are spelled. The symbol "ʔ" is used for the hamza, not for the alif and that's the same way for Persian or Urdu, even if hamzated alifs are not used often, especially or never word-initially. Anatoli T. (обсудить/вклад) 05:51, 8 May 2025 (UTC)Reply
@Atitarev No dialect of Persian (except Judeo-Tat if we consider that a dialect) has ever made a distinction between initial ا, أ, and ع, and we know this to have been the case from medieval sources. All three have always been pronounced as non-phonemic glottal stop. How it works in Arabic (which in any case is not simply a spelling artifact, because اِسْم (ism) and إِسْرَاع (ʔisrāʕ) are phonemically different beyond just the spelling in the way they interact with other morphemes in a way that is not true in Persian) is not relevant for Persian.--Saranamd (talk) 06:23, 8 May 2025 (UTC)Reply
@Saranamd thank you for explaining it better than I could. — BABRtalk 06:32, 8 May 2025 (UTC)Reply
In what world, nation or dialect is ا a consonant by grammarians? There is no equality between letters ع and ا.
@Saranamd, @Babr. I can sense stubbornness, unwillingness to undo. You haven't brought any good arguments. ع may only represent a glottal stop or nothing but should not be ignored in transliterations. It's just not right. Tajiks dropped it word-initially because of the pronunciation but that change is reflected in the spelling.
Is it because you're just trying to match Tajik оила (oyila) with Classical Persian عَائِلَه (ā'ila)? Well, you "achieved" it by dropping the essential letter for users not familiar with the Perso-Arabic script!
Each standard per Romanization of Persian uses a symbol for letter ع and so did we, until the change. Anatoli T. (обсудить/вклад) 06:50, 8 May 2025 (UTC)Reply
@Atitarev
"In what world, nation or dialect is ا a consonant by grammarians?"
In the Persian alphabet, only consonants can carry a vowel diacritic, a vowel cannot carry a vowel. By all definitions alif may act as a zero consonant or a vowel, depending on whether it is acting as the syllable onset or the nucleus.

"There is no equality between letters ع and ا."
They are both consonants representing a non-phonemic glottal stop. If initial alif is a zero-consonant, and initial ayn is pronounced the same, then they are both zero consonants in the initial position.

"Well, you "achieved" it by dropping the essential letter for users not familiar with the Perso-Arabic script!"
A wise man once said "[a] literal transliteration is not very useful" + many 'essential letter[s]' are not distinguished including ذ ز ض ظ which are all transliterated as 'z'. If we were aiming for that, then every letter would be distinguished, which isn't the case for most Arabic script languages on Wiktionary. — BABRtalk 07:19, 8 May 2025 (UTC)Reply
@Atitarev I agree with Babr and do not see why spelling-only features not part of the spoken language should be reflected in the transliteration. This is not done for Urdu or Ottoman Turkish, or indeed for Persian with other homophonous letters that are an artifact of Arabic spelling.--Saranamd (talk) 07:27, 8 May 2025 (UTC)Reply
Historically a Persian/Ottoman/Urdu transliteration scheme that had dedicated letters for each Arabic-script letter was indeed common (and remains common in academia), but my understanding is that this was traditionally (1) due to historical issues with printing Arabic script in an otherwise European-language text, (2) to the benefit of scholars unable to read Arabic script, and (3) in some academic contexts, in order to ensure consistency with Arabic. Since we now provide the Arabic script prominently in all entries and our purposes are to document the current language, none of these issues seem particularly relevant. Saranamd (talk) 07:29, 8 May 2025 (UTC)Reply
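
To make the disputed change concrete, here is a minimal Python sketch of the rule Saranamd and Babr argue for, applied to an existing romanization: a mark for ع is dropped word-initially and kept elsewhere. The function name `drop_initial_ayn` and the set of marks treated as ayn symbols are assumptions for illustration; an actual module would presumably work from the Arabic-script spelling rather than patch the romanization afterwards.

```python
# Marks assumed (for this sketch only) to stand for ayn: ', ʿ, ʕ
AYN_MARKS = ("'", "\u02bf", "\u0295")


def drop_initial_ayn(translit: str) -> str:
    """Drop an ayn mark at the start of each word; keep it everywhere else."""
    words = translit.split(" ")
    return " ".join(w[1:] if w.startswith(AYN_MARKS) else w for w in words)
```

Under this rule a string such as 'āila would lose its leading mark, while a medial mark, as in ā'ila, is left alone. Anatoli's position corresponds to not applying the function at all.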

Animals as Foods in Maltese

In Maltese most animal meats have the same name as the animal, such as baqra, tiġieġ, fenek, ħut and others, just like in English, though for distinction can be said as laħam tal-fenek etc.

What should be done to add these to the mt:Foods category since the animals themselves aren't food? Should we make new entries such as laħam tal-fenek or add a new definition like in English entries stating The meat from this animal? Melithius (talk) 00:27, 9 May 2025 (UTC)Reply

How about adding them to Category:mt:Meats? Whether or not you have separate definitions for the animal vs. the food, chicken as a food is a type of meat.
By the way, to partly answer a question you asked elsewhere: English is unusual in having words for meats that are completely different from the words for the animals they come from. That's because at one time the peasants who raised the animals spoke English, but the paying customers (and landlords who were paid with what the peasants produced) spoke French. Basically, if it was too big to hand to someone as a recognizable animal, they had to tell them what kind of meat it was in the recipient's language: the meat of a cow (from Old English cū) was beef (from Old French buef); sheep (from Old English sċēp), mutton (from Old French mouton), etc. (it's more complicated than that, but you get the idea). Chuck Entz (talk) 03:53, 9 May 2025 (UTC)
Thanks for answering that question, very interesting!
Right, but if I add them to mt:Meats, can I not as well add them to mt:Foods? That's what I did initially but I was told ‘the animal itself is not food but its meat is’ so that's why I suggested a sense for meat. English chicken and rabbit have this as sense 2: ‘meat from this animal’, so I suggested we do the same and add another sense.
I still think they should just be added to mt:Foods as most of the time the meat and the animal share exactly the same name.
So do you think they should just be added to the categories as is? Melithius (talk) 07:39, 9 May 2025 (UTC)Reply
We categorize the cultural understanding, right? German Meerschweinchen (guinea-pig) is raised by some—not outlawed like Hund (dog) and Katze (cat), don't ask me for the constitutionality—for meat, but most 18-year-olds (soon Schulabgänger) are not informed about this, so only German Schwein (swine, hog, pig, also pork) gets added. Neither is Känguru (kangaroo) even though most speakers have only immediately seen it in the restaurant, where it was my favourite. In Russian you can use свинья́ (svinʹjá, swine, pig, hog) in a meat meaning, because the homo sovieticus is a sore coarse person, but the proper way is to say свини́на (svinína, pork), in the same fashion derived terms are used for the other common meats. On a quick glance people do it intuitively right, though I admit that I aligned my lexicographic reasoning with expected intuition. Fay Freak (talk) 11:24, 9 May 2025 (UTC)
So what do you think should be done for Maltese if everyone here understands that for example 'tiġieġ' is chicken meat (assuming context is given)? Melithius (talk) 11:30, 9 May 2025 (UTC)
Add it as one of the Category:mt:Meats, as I think that few would oppose adding Schwein to Category:de:Meats, where it currently does not reside. I refer to @Fenakhay for confirmation of my belief on the Maltese side, though I wonder about the criteria for doing otherwise. I don't see a rule like “the term should specifically mean a meat and not the animal”.
There is another argument, of supporting vocabulary learning, one of our primary purposes as a bilingual dictionary; when I started Arabic, as well as any other language, I specifically made note of common grains as you should have staple-foods in your vocabulary and so meats are a separate list, where you don't need to have the term for dog or dog meat unless you learn rural Korean; the ones of animals and vegetables are longer lists, but you talk about fauna and flora in different contexts. Fay Freak (talk) 12:27, 9 May 2025 (UTC)Reply
In English, sentences like he [some man] is eating chicken can contrast with sentences like he [a giant in a fairy tale] is eating a chicken or ...is eating chickens, and since we consider the lemma in all three cases to be chicken (not e.g. *a chicken), this helps to show that the animal and meat senses are distinct. On the face of it, it seems reasonable to also have distinct definitions for the corresponding senses in Maltese, as you suggest. (If anyone thinks the senses should not be distinguished, perhaps they can articulate why.) - -sche (discuss) 20:41, 9 May 2025 (UTC)Reply
Ah okay thanks. There are similar grammatical distinctions in Maltese using the article il- Melithius (talk) 20:50, 9 May 2025 (UTC)Reply

Template:bg-conj

What a monster of a template! More than two screenfuls are dedicated to instructing the reader how to form compound tenses. This isn't even the same scenario as German or French, where verbs vary in terms of which auxiliary they use (avoir/haben vs être/sein) - it seems like a lot of the Bulgarian table's content is just hard-coded directly into the Lua module; only the past participle varies.

What's worse, some of the instructions are pretty intricate:

Use the present indicative tense of съм (leave it out in third person) and гово́рил/говори́л1 m, гово́рила/говори́ла1 f, гово́рило/говори́ло1 n, or гово́рили/говори́ли1 pl

I do not think there is any value in repeating these lines of grammatical textbook content in every single Bulgarian verb table. If you know Bulgarian grammar, it is a waste of space for you. If you don't know Bulgarian grammar, it seems to me that it is presented in too abbreviated a manner to actually be useful - you have to consult other entries anyway to construct the forms.

Our readers of Bulgarian entries would be much better served by presenting only the "atomic" forms in the verb table, and moving the instructive material to a dedicated Appendix:Bulgarian verbs.

It's worth also comparing the verb table for closely related Macedonian. The verb morphology of these two lects seems to be very similar, yet {{mk-conj-table}} (e.g. at стрела (strela)) presents the compound tenses in a much more reasonable (imo) format.

Pinging @Benwing2, @Atitarev for input. This, that and the other (talk) 12:59, 9 May 2025 (UTC)Reply
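To make concrete what the quoted instruction asks of a reader, here is a minimal Python sketch that expands it by hand. It is only an illustration, not how Module:bg-verb works: the names `SAM_PRESENT`, `PARTICIPLE` and `expand` are hypothetical, stress marks are dropped, only one stress variant of the participle is shown, and the participle-first word order is an assumption made for the example.

```python
# Present indicative of съм, keyed by (person, number).
SAM_PRESENT = {
    (1, "sg"): "съм", (2, "sg"): "си", (3, "sg"): "е",
    (1, "pl"): "сме", (2, "pl"): "сте", (3, "pl"): "са",
}

# l-participle forms of говоря from the quoted instruction
# (stress marks omitted, single stress variant only).
PARTICIPLE = {"m": "говорил", "f": "говорила", "n": "говорило", "pl": "говорили"}


def expand(person: int, number: str, gender: str = "m") -> str:
    """Apply the quoted rule: participle plus present of съм,
    with the auxiliary left out in the third person."""
    participle = PARTICIPLE["pl"] if number == "pl" else PARTICIPLE[gender]
    if person == 3:
        return participle
    # Assumption: participle-first order, chosen only for illustration.
    return f"{participle} {SAM_PRESENT[(person, number)]}"
```

Calling `expand(1, "sg")` gives the first-person singular masculine form, and `expand(3, "sg", "f")` gives the bare feminine participle. The reader needs the full paradigm of съм in addition to the four participle forms, which is the "consult other entries anyway" problem described above.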

@This, that and the other I haven't had any issue with this template over the years, but now that you mention it, there is quite a lot in there :) In my opinion, it is nice to have virtually all of the possible constructions mentioned in the table, because then it at least shows to readers that they all exist – but, you might be right that a change may be in order.
Maybe we can ask any known Bulgarian readers, to see whether they would find the table better without the grammatical instructions, or whether they've found them useful. Pinging @SimonWikt, what do you think about this?
Macedonian is definitely a very good format though. Very compact and to-the-point.
By analogy to Macedonian, I think I might make the following hypothetical suggestions to the Bulgarian format:
  • The imperatives can be moved after the base verb forms like the indicative, etc.;
  • The participles might be better to go after the imperative;
  • We can potentially give (actual examples of) the constructions of the compound tenses etc., like Macedonian seems to do as well, in the places which currently consist of instructions.
Also, I seem to remember something with a name like {{bg-conj-full}}, though I don't recall exactly what it's called. As far as I remember, it tried to fully expand the instructions into their actual forms and was generally not used on entries, probably because it was so huge. I'm not sure whether that would be helpful to reference here as well. Kiril kovachev (talkcontribs) 14:20, 9 May 2025 (UTC)
As a learner of the Bulgarian language who is not yet fully conversant with the grammar, I have found the conjugation tables with the instructions very useful; it is helpful not to have to go to other pages or appendices.
I would, however, agree that a change of layout would be helpful:
  • Imperative and possibly conditional after the indicatives
  • Participles at the end.
SimonWikt (talk) 16:31, 9 May 2025 (UTC)Reply
I don't necessarily have a problem with having tons and tons of forms but having them be collapsible and collapsed by default could be a good solution. Vininn126 (talk) 17:03, 9 May 2025 (UTC)Reply
@Kiril kovachev See съм (sǎm) for an example with the "full" table. It uses e.g. {{bg-conj|съм<irreg.impf.intr>|full=1}}. @This, that and the other Bulgarian verbs are very unlike any other Slavic language except maybe Macedonian (although I think Macedonian simplifies the verb system somewhat compared with Bulgarian). Languages like Russian drastically reduced the Proto-Slavic verb system; OTOH Bulgarian not only kept all the original complexity but (if I recall aright) added a distinction between aorist and imperfect l-participle, and much more significantly, innovated a 4-way evidential distinction that cross-cuts all other categories, essentially quadrupling the number of possible forms. The way to form all the distinct evidential categories is through various periphrastic constructions, but the mapping between category and construction is not very simple or easy to remember, hence the table that spells out the way to form each possible tense/aspect/person/number/evidential combination. As you can see by comparing съм and говоря, there are two styles, the "compressed" one used on most verbs and a "full" one used on only a few verbs. The structure of the verb tables was already in place when I rewrote them in Lua; I didn't change that. We could definitely make the tables look more like the Macedonian ones but I still think they would be bigger, as (AFAIK) Bulgarian has more distinctions than Macedonian (also Bulgarian has free stress while Macedonian has fixed stress on the antepenultimate, and some verbs have more than one possible stress, as in the example with говоря that you quote). Benwing2 (talk) 19:19, 9 May 2025 (UTC)Reply
Thanks for the input; this is incredibly insightful and I'm grateful for the constructive ideas!
It seems like there is some agreement on what can be done to improve the template. As much as the purist in me would like to get rid of the compound tenses altogether (especially the perfect tenses, as their construction seems totally predictable), it seems there is value in keeping them.
Keeping these tenses means the table will remain long. It would be possible to make the compound tenses hidden (collapsed) by default within the template, as Vininn suggested.
I'd like to propose a couple of ideas for {{bg-conj}}:
  1. Keep the template largely as is (noting that it is wider than it needs to be and this cannot be easily fixed in its current form). Use Module:roa-verb/style.css to get dark mode colors. Move infinitive up and participles down, as suggested by Kiril and Simon. Maybe un-bold the instructions to reduce loudness.
    or
  2. Rewrite the template along the lines of the Macedonian template. Rather than shouting instructions at the reader, we give them an example of how to form each compound tense, alongside a concise explanation of the rule. (Having mocked this up, I feel this is much more useful than the instructions alone.) I've made a mockup of this option at User:This, that and the other/bg-conj - very much open to input on the layout and formatting.
Thoughts on this? @Benwing2, Vininn126, SimonWikt, Kiril kovachev This, that and the other (talk) 11:54, 10 May 2025 (UTC)Reply
@This, that and the other Your mockup looks good to me. Can you complete it with the remaining evidential/tense/aspect/mood combinations and also sketch out a "full" one (like what currently is in use for съм and ща, and maybe should also be used for имам)? If you do this, I should be able to modify Module:bg-verb to follow the new table format. Benwing2 (talk) 10:52, 11 May 2025 (UTC)Reply
@This, that and the other Wow, this is really cool. Sorry I didn't respond for 5 days – looking at this now though, I like your new idea a ton. If you'd like, I can try to fill this in tomorrow with the remaining forms so we can see what it would look like fully filled in. (If you are short on time, of course. Otherwise I wouldn't want to interfere with it if you wanted to do it yourself.) Kiril kovachev (talkcontribs) 23:35, 15 May 2025 (UTC)
@Kiril kovachev Just speaking for myself (and not for TTO), if you can do that, it would be great. Definitely, the new structure is better than the old. Benwing2 (talk) 23:43, 15 May 2025 (UTC)Reply

New Persian transliteration proposal

(Notifying Atitarev, Babr, Benwing2, Rodrigo5260, Saranamd, SinaSabet28, Samiollah1357): and @स्वर्गसुख

Concluding the above thread where these suggestions were made:

  • Transliterating initial ع:   (Mostly) rejected
  • Changing Tajik ʾ to '   Agreed
  • Changing Persian ğ to ġ, matching Tajik.   Agreed

On the last point, while I originally agreed, I thought about changing x (خ) to ḵ, since it would match Arabic and avoid any confusion for those not familiar with the transliteration system. And to make it consistent (and again match Arabic), change ğ to ḡ instead of ġ. So the new system would be:

  • Persian غ and Tajik Ғ (+ fa-ira ق): From ğ/ġ/ğ to ḡ
  • Persian خ and Tajik Х: From x to ḵ

Light hearted sam (talk) 19:31, 9 May 2025 (UTC)Reply
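For anyone wanting to preview the effect on existing romanizations, the proposal above boils down to a three-entry substitution table. The following Python sketch is purely illustrative: `PROPOSED` and `migrate` are hypothetical names, not part of any Wiktionary module, it handles lowercase letters only, and it assumes the proposal were adopted as written.

```python
import unicodedata

# Hypothetical migration table for existing romanizations.
# Keys are the current letters named in the proposal; values are the proposed ones.
PROPOSED = str.maketrans({
    "\u011f": "\u1e21",  # ğ (Persian غ) -> ḡ
    "\u0121": "\u1e21",  # ġ (Tajik Ғ) -> ḡ
    "x": "\u1e35",       # x (Persian خ, Tajik Х) -> ḵ
})


def migrate(translit: str) -> str:
    """Rewrite a romanized string under the proposed scheme (lowercase only)."""
    # Normalize to NFC first so that a base letter followed by a combining
    # mark is matched by the precomposed keys in the table.
    return unicodedata.normalize("NFC", translit).translate(PROPOSED)
```

Under this sketch, a romanization such as āxor would come out as āḵor, while everything outside the three listed letters is left untouched.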

  Support, but only on the condition that they both match, so I'd like ḡ and ḵ, but I'd be opposed to, say, ġ and ḵ. Also, the fact that ḡ and ḵ match the common romanizations of gh and kh is a plus for me. Plus it would kinda match transcribing ENP ڤ and ذ as ḇ and ḏ? But honestly, I think x and ġ are already an improvement, so even though I slightly prefer ḡ and ḵ, I'd honestly be content either way. — BABRtalk 19:57, 9 May 2025 (UTC)
As for the transliteration of ع, it makes sense only when transcribing the dialectal pronunciations of places like Nishapur or Kulab, where /ʕ/ exists as a sound. E.g. عسب (ʿasb) for the dialectal form of اسب (asb). Samiollah1357 (talk) 20:41, 9 May 2025 (UTC)Reply
It is only x in Iranistik, never ḵ. Don't forget that Persian is an Iranian language, not an Arabic dialect. Vahag (talk) 20:51, 9 May 2025 (UTC)
@Vahagn Petrosyan What? Many Persian dictionaries use ḵ/k͟h and ḡ/g͟h (in fact, I see those more frequently than x and ğ), and they match the much more common lax transliterations of kh and gh. And I'm not sure why usage by Iranists matters, but the largest Iranistik encyclopedia (Iranica) exclusively uses ḵ and kh, never x — BABRtalk 21:00, 9 May 2025 (UTC)
I meant historical linguistics works, not synchronic dictionaries. Vahag (talk) 21:17, 9 May 2025 (UTC)Reply
Linguists and lexicographers of modern Persian overwhelmingly use a variation of kh in their transcriptions (as do the literal governments of Iran, Afghanistan, and Tajikistan). I don't agree with the notion that the practices of Iranic historical linguists trump the practices of literally everyone else — BABRtalk 21:55, 9 May 2025 (UTC)
Oh but to clarify: I'm only saying that I disagree with the idea that the usage of ḵ is outlandish (it's very common), or that we must use x because certain linguists do. I'm not trying to imply that we must use ḵ or anything, lol. — BABRtalk 23:27, 9 May 2025 (UTC)Reply
The transliteration or transcription of Persian should match Old Persian, the other modern and historical Iranian languages, and Proto-Iranian as *ráwxšnaH. Hence introducing ḵ is unacceptably odd. In addition we get a lot of Iran-stans who only know fictitious Iranistics including Goths in Northwestern Iran and are unbrokenly on the rag for making Persian appear to depend on Arabic.
In Arabic then, ḵ is an anomaly specific to English and we already discussed possibly dropping it, perhaps for x, otherwise for .
None of the representations of this phoneme is reliably understood, including the digraph by reason of its ambiguity, if you discount any educated understanding, so all arguments for popularity are specious. Even though we do not merely target full-time academics, we have to assume some familiarity with professional standards for international presentation of groups of languages for efficient use of the dictionary.
I agree with changing Persian ğ to ġ. Fay Freak (talk) 01:42, 10 May 2025 (UTC)Reply
@Fay Freak I do think that not wanting a random cutoff from x -> ḵ into modern Persian is fair. Though I don't think 'kh' as a digraph is odd: it's by analogy to English 'th', where t = /t/ and th = /θ/, the fricative equivalent. By the same logic, kh being /x/ makes sense. The underline is similar in concept, just with less confusion because it's clear it's not a cluster of /k/ + /h/ — BABRtalk 02:24, 10 May 2025 (UTC)
@Fay Freak To clarify, I wasn't trying to make Persian "depend" on Arabic (how would a change of translit do that?), and you do have a point. (But still, having all sounds with no letter in English be a variation of another letter would be a plus.) I'll roll with any consensus reached. Light hearted sam (talk) 09:03, 10 May 2025 (UTC)
I agree with @Fay Freak and @Vahagn Petrosyan. The lax transliteration using the digraph <kh> is of course the most common, but if we're avoiding that, <ḵ> is indeed an artifact of Semitic transliteration schemes and <x> is the standard in Iranian linguistics. Encyclopaedia Iranica has a transliteration scheme overly dependent on Arabic, e.g. they write ث as <ṯ>.--Saranamd (talk) 10:39, 10 May 2025 (UTC)Reply
Encyclopaedia Iranica too uses <x> in diachronic articles. For example here, "NPers. āxor". Vahag (talk) 10:51, 10 May 2025 (UTC)Reply

Future of the Eggcorn Database - at Wiktionary?

The following was posted on the American Dialect Society e-mail list:

"Chris Waigl

"Sat, May 10, 10:34 PM

"to ADS-L

"The Eggcorn Database has in recent months and years been rather unstable (broken security certificates, or even complete outages). The cause of this sorry state of affairs is a lack of investment of time and efforts by me, in part due to my procrastination before the rather daunting task of fixing the underlying issues of outdated, broken software.

"The site's status and future has been recently raised in a forum thread here: https://eggcorns.lascribe.net/forum/viewtopic.php?id=7470 . In it I provide some more background information as well as ruminate on options.

"Since then my thinking has come around forging a path forward in the following manner: a) back up all existing content and b) convert both the forum and the ECDB into a static site, preserving all URLs. I would also at this time move the site to a better hosting provider. However, this would be the end of new posts to the forum, and also the inability to resume EDCB entries. It could be revived in the future or the content reused with no more effort than it would take now.

"This is by way of an announcement. Any thoughts are welcome, too.

"Chris"

We've had a passing interest in this for a while. If we had more content, it might become something sustainable. We might win some new contributors as well.

Would it be worthwhile to offer to investigate hosting this? DCDuring (talk) 13:14, 11 May 2025 (UTC)Reply

What kind of eggcorns does it include? Ones that are found in many books, or ones that one person heard once? I.e., would many meet CFI or would it be akin to the List of Protologisms that was deleted? If it would not just be another List of Protologisms, but would contain at least a decent proportion of CFI-meeting things, then (to me, at least) it seems like the kind of thing we could easily spare an appendix for, or at least tolerate a userspace page being used for if the user were also making helpful edits to the rest of Wiktionary (e.g. adding the most commonly attested, CFI-meeting eggcorns to mainspace). But I don't know if the database is structured in a way that would make it easy to construct an appendix out of, or not. - -sche (discuss) 17:58, 11 May 2025 (UTC)Reply
I give a certain amount of credit to the man's active membership in the American Dialect Society. I personally have only the mildest interest in eggcorns, but thought some here might be interested. DCDuring (talk) 19:15, 11 May 2025 (UTC)Reply
Here is the database. It has 648 items. Of the few that I have looked at, all seem to have 3 or more cites, though not always from durably archived sources. Analysts or reporters on individual items include Mark Liberman, Ben Zimmer, and Arnold Zwicky. DCDuring (talk) 19:32, 11 May 2025 (UTC)Reply
Well, see it that way, Chris, there is no one to hinder you creating well-formatted and -supported entries. Most people aren't enough into the meme to be fastidious about what an eggcorn even is. Our definition of eggcorn does not really say by which logics or plausibility test a word is assumed and then declared an eggcorn, though I assume it is etymological connection, in which cases more serious people grown up in the Old World speak of misconstruction, so I guess other people than me would have entered insider baseball as an eggcorn instead of misconstruction, and for the philologies of historical centuries we are at a loss, where there are lots etymological reconnections actualized by speakers, or, if the language is educated and artificiality or lectio facilior is admitted, writers, but the fashionable linguists inventing such novel terms who are early adopters of the blogosphere don't appear to have lifetime over to think through these matters, and they always ignored me anyhow when I wrote them, very slick in not failing to appear professional so much that they would feel to be downgraded contributing here anyway, before noticing that you gain some practicality in mirroring the wilds of language by admitting for some inconsistency. Fay Freak (talk) 12:41, 12 May 2025 (UTC)Reply
I emailed Chris about this and she responded:

Thanks so much for your message and kind words about the Eggcorn Database. I have a lot of respect for Wiktionary and other Wikimedia projects, so it's great to have a personal connection.

It's an intriguing thought to integrate eggcorns into Wiktionary. I'm not sure how good a fit it would be. And I would have to give thought about the licensing issue. We clearly neglected to resolve that back then, and now of course a lot of the contributors have dispersed. On the other hand, online lexicography is certainly in flux and will hopefully be, so we should at least remain aware of each other's projects and future opportunities for coordination or collaboration.

My message to ADS-L got several private responses including from the "old" online linguistics blogging community I used to be part of, 20 years ago. Now we can add podcasting to it. One thing Wiktionary wouldn't solve is the future of the forum and residual community, and that's where several of us have been thinking about. I'm not sure where this is leading, but it's what I want to explore first. This said, a category "English Eggcorns" in Wiktionary could exist in addition to any other eggcorn site, if we can resolve the licence suitably.

I'll let you know where the conversation is going. I'll certainly not delete anything or make it impossible/harder to retrieve the content. But I may prioritise stabilizing the hosting situation and finding a home for the comunity in the forums.

Best,

Chris

Ioaxxere (talk) 17:51, 12 May 2025 (UTC)Reply
Thanks. Judging from the modest enthusiasm both here and at the Eggcorn Database, I'm skeptical about the prospects. I'm not even sure that there is any interest here in "eggcorns" as a category as opposed to "misconstructions". DCDuring (talk) 19:08, 13 May 2025 (UTC)
I mean, there's enough enthusiasm for eggcorns here that we do have Category:English eggcorns; I see no problem with looking at the eggcorns the database has found, checking which meet our own CFI, and spinning up our own entries for those. But that would indeed not solve the issue of where to host the forum. - -sche (discuss) 00:19, 14 May 2025 (UTC)Reply
The forum could be at Wiktionary talk:English eggcorns (with a relevant guideline/policy page) or Category talk:English eggcorns. —Justin (koavf)TCM 00:30, 14 May 2025 (UTC)Reply
That would be practical, but I'm not sure that our environment would measure up to a collegial academic one. Also, I don't think our license is negotiable, which may not suit at least 2 of the distinguished contributors. DCDuring (talk) 01:21, 14 May 2025 (UTC)Reply

keep yourself safe; filter-avoidance terms (vs spellings)

I don't think keep yourself safe is a filter-avoidance spelling: it's a whole different set of words, which I expect are not just spelled but pronounced differently in spoken filter-avoidance, like I hear people say sewer slide and corn aloud. My instinct is to redefine it as a {{synonym of}} like sewer slide, and mention filter avoidance in the etymology... but there seem to be a fair few entries in the same boat (we are also currently presenting grape as a "spelling", but in my experience it's also spoken differently, just like corn), so I want to check: do you agree with redefining keep yourself safe and grape as not mere spellings? and do we want a category for them, like "CAT:Filter-avoidance terms"? (We do categorize e.g. "archaic terms" as well as "archaic spellings".) - -sche (discuss) 17:42, 11 May 2025 (UTC)Reply

Filters are just one method of censorship. Before that people were writing things like "read that fine manual" to avoid sanctions from newsgroup moderators, and before that it was things like "jeepers, creepers" that were used to avoid punishment by parents, teachers, etc. And then there's the matter of how US English ended up with terms like chickadee and donkey... Chuck Entz (talk) 21:09, 11 May 2025 (UTC)Reply
So create {{filter-avoidance form of}}, abbreviated {{fa f}}. Everyone, conceding our diachronic perspective on languages, will own that this occurs at least when a spelling was only used to avoid textual filters, but later pronounced, possibly only because many people tried to be funny. The precedence of the former template will grow out of date when audio recognition in social media matures, so filter-avoidance forms will have distinct pronunciations in the first place. Fay Freak (talk) 12:15, 12 May 2025 (UTC)Reply
I agree that these are strictly not filter-avoidance spellings, and suggest it would even be bad to say that these terms are "filter-avoidance" anythings (i.e., that the users of the term were specifically intending to avoid filters) in Wiktionary voice without good evidence. They're in a class of terms I don't exactly know what to call, or even how they overlap in a Venn diagram with other types of euphemism and algospeak, but would label with something like "euphemistic" for now. Hftf (talk) 22:33, 12 May 2025 (UTC)

Advice/Help with Appendix

Arabic has an appendix for its verb forms (Appendix:Arabic verbs). I would like Maltese to have such an appendix as well, as I think it's equally necessary and helpful.

Can someone tell me how I can create this appendix please? Melithius (talk) 21:20, 11 May 2025 (UTC)Reply

@Melithius you can enter the page title "Appendix:Maltese verbs" in the search bar, do the search, and click the red link that comes up. This, that and the other (talk) 10:39, 12 May 2025 (UTC)Reply
Ah okay, thought it was more difficult than that lol. Thanks a lot :) Melithius (talk) 13:16, 12 May 2025 (UTC)Reply

Call for Candidates for the Universal Code of Conduct Coordinating Committee (U4C)

The results of voting on the Universal Code of Conduct Enforcement Guidelines and Universal Code of Conduct Coordinating Committee (U4C) Charter are available on Meta-wiki.

You may now submit your candidacy to serve on the U4C through 29 May 2025 at 12:00 UTC. Information about eligibility, process, and the timeline are on Meta-wiki. Voting on candidates will open on 1 June 2025 and run for two weeks, closing on 15 June 2025 at 12:00 UTC.

If you have any questions, you can ask on the discussion page for the election. -- in cooperation with the U4C,

Keegan (WMF) (talk) 22:08, 15 May 2025 (UTC)Reply

Dhivehi written in Devanagari?

After a bit of work, I've managed to get Category:Dhivehi terms in nonstandard scripts down to 62 entries. The rest are single-character/ligature entries in the Devanagari script (the one exception is a Thaana-script entry with a Devanagari-script alternative form).

For background: Devanagari is one of the main scripts of India, mainly due to the Hindi language. We use it for our Sanskrit entries, and there are other languages such as Nepali that use it as well. To the north into Pakistan, the Arabic script tends to predominate, and to the south the Dravidian languages there tend to have their own set of scripts.

Dhivehi is a bit of an oddball: it's out in the ocean away from the other Indo-Aryan languages, and it has its own script (Thaana) created semi-randomly from those of other languages, including Arabic. It has also used a couple of other scripts in its history, but to my knowledge, not Devanagari.

Which brings up the matter at hand: Is there any evidence of Devanagari ever being a standard script for Dhivehi? I mean, not just used to write it here and there, but having a standard alphabetical order, etc.? The entries in question have definitions like:

  1. The twelfth consonant in Dhivehi, written in Devanagari

I would also note that there is exactly one actual Dhivehi word written in Devanagari in all of Wiktionary's mainspace entries (redlinked, of course): the entries are only about the characters, not about what might be written using them. There are no references in any of these entries, though three of them have misspelled links to Omniglot's page on the Thaana script.

If Devanagari has been used for Dhivehi, we will need to add it as a standard script in the module, and we will need to provide references, not to mention some examples of actual usage. If it hasn't, we'll need to see about deleting all of these incorrect entries. A couple of them are already tagged for RFV, but I don't think they've been listed. Chuck Entz (talk) 06:33, 17 May 2025 (UTC)Reply
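As a practical aside, sorting the remaining page titles by script can be done mechanically. The Python sketch below is only a rough illustration of that triage step: `script_of` is a hypothetical helper, it relies on Unicode character names from the standard `unicodedata` module rather than on Wiktionary's own script-detection module, and it says nothing about whether Devanagari was ever actually used for Dhivehi.

```python
import unicodedata


def script_of(title: str) -> str:
    """Return 'Devanagari', 'Thaana', or 'other' based on Unicode character names.

    Titles that mix both scripts, or contain neither, are reported as 'other'.
    """
    scripts = set()
    for ch in title:
        # unicodedata.name returns the default "" for unnamed code points.
        name = unicodedata.name(ch, "")
        if name.startswith("DEVANAGARI"):
            scripts.add("Devanagari")
        elif name.startswith("THAANA"):
            scripts.add("Thaana")
    return scripts.pop() if len(scripts) == 1 else "other"
```

Run over a list of titles from the category, this would separate the single-character Devanagari entries from any Thaana-script ones, leaving the question of references and attested usage to be settled by hand.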