Wiktionary:Beer parlour


Welcome to the Beer Parlour! This is the place where many a historic decision has been made, and where important discussions are being held daily. If you have a question about fundamental aspects of Wiktionary—that is, about policies, proposals and other community-wide features—please place it at the bottom of the list below (click on Start a new discussion), and it will be considered. Please keep in mind the rules of discussion: remain civil, don’t make personal attacks, don’t change other people’s posts, and sign your comments with four tildes (~~~~), which produces your name with timestamp. Also keep in mind the purpose of this page and consider before posting here whether one of our other discussion rooms may be a more appropriate venue for your questions or concerns.

Sometimes discussions started here are moved to other pages for further development. In particular, changes to a major policy or guideline may be discussed on the corresponding talk page and “simple votes” (as opposed to drawn-out discussions) can be conducted on our votes page.

Questions and answers typically remain visible on this page for one to two months, but they can always be found in the appropriate monthly archive (based on the date discussion was initiated). While we make a point to preserve all discussions that were started here, talk that is clearly not appropriate for this page may be deleted. Enjoy the Beer parlour!



June 2021

Adding Sumerogram, Akkadogram and Determinative to standard POS

Hi! I've been working on Akkadian, Sumerian and Cuneiform Translingual entries recently. I've been using "Sumerogram", "Akkadogram" and "Determinative" as POS when needed, but I've been made aware those are not standard and could cause issues. (see 𒌉 for usage examples of Sumerograms, 𒀭 for Determinatives and 𒅆 for Akkadograms)

All three of them are necessary to structure Cuneiform entries for Akkadian and Sumerian (and Hittite, too) in a consistent way. Therefore, I'd like to propose adding them to the standard POS list. Do I have your vote? :D Sartma (talk) 09:08, 1 June 2021 (UTC)

Support. Tied to this, could "phonogram" be recognized as a header? See e.g. Old Korean .--Tibidibi (talk) 09:20, 1 June 2021 (UTC)
Per Fay Freak, revise vote to support "heterogram".--Tibidibi (talk) 16:51, 1 June 2021 (UTC)
On third thought, go back to supporting the original proposal. Oppose "logogram" because the category could lead to inconsistencies with other languages that use logograms, e.g. Japanese, and the existing POS setup should be preserved for those languages.--Tibidibi (talk) 17:49, 3 June 2021 (UTC)
@Tibidibi I was thinking last night that even if we end up choosing Logogram as POS, we would do so because it's the familiar term in Mesopotamian studies, in the same way Kanji is for Japanese, so it wouldn't really create any inconsistency with Japanese. Using Logogram for Akkadian/Hittite wouldn't necessarily mean we have to change POS for other languages. Sartma (talk) 18:05, 6 June 2021 (UTC)
@Sartma Sorry for the late response. I actually now fully agree with you, we can choose logogram as POS because these are just soft redirects where an orthographically worded header makes sense, which is not the case for the CJKV entries.--Tibidibi (talk) 07:22, 5 July 2021 (UTC)
Support. — Fenakhay (تكلم معاي · ما ساهمت) 15:41, 1 June 2021 (UTC)
Uncertain, but could you at least add some actual definitions or translations to those entries? SemperBlotto (talk) 15:49, 1 June 2021 (UTC)
Hi! Sumerogram, Akkadogram and Determinative are categories of a cuneiform sign, in the same way Noun, Verb or Adjective are categories of a word. They classify the sign and under them we give a link to all the different words that can be written with that sign, so you will not find any "actual" definition or translation there. You find all relevant information in the page of the words listed under each category. If you take a second to check the pages I linked above you can see what I mean. Try clicking on a couple of the words linked as "Sumerogram of" or "Akkadogram of" under Sumerogram/Akkadogram and you'll be redirected to those words' entries. Sartma (talk) 16:55, 1 June 2021 (UTC)
@SemperBlotto: So the Akkadian and Sumerian word mentioned at مَيْس (mays) uses the sign 𒄑 (GIŠ), which wasn’t pronounced when used as a determinative, but indicated to the reader that the name of a tree follows (you might discern that the sign looks like a tree if you have a font for it installed). For this reason one might not parse the whole cuneiform string as a word, so one looks for a separate entry for 𒄑 (GIŠ) and any such signs, categorizing them as so-called determinatives, or taxograms or semagrams.
A heterogram is when you write mlkʾ, from the Aramaic spelling of the Semitic term for “king” *malk-, but actually mean and say شاه(šāh). They did such things frequently in the Ancient Near East. Fay Freak (talk) 17:16, 1 June 2021 (UTC)
Why not Heterogram? So one can use it later for Aramaeograms in Pahlavi etc. (@Victar) The definition line template {{sumerogram of}} already says “sumerogram”.
I guess Heterogram would work too, if you really are against Sumerogram/Akkadogram. It's just not a word you would find in Mesopotamian studies that much (I never saw it before now! XD), so it would be a bit confusing/alienating to people looking up Akkadian words. It says what it is, it just doesn't paint it a familiar colour. Like, when you study Akkadian you have glossaries and dictionaries with Akkadian words and then you have lists of "Sumerograms" (that unluckily never give the cuneiform, they're just like "A = water, A.BA = father", etc.). I understand that Heterogram is more versatile, and I'm not against it in principle, but if there's no strong reason to change the labels, I would prefer to keep the more familiar ones. In the end, that's what they do in languages that use Han characters too (Japanese calling them Kanji, Korean Hanja, etc.). If we decide to go for Heterogram, then we should probably ask Japanese and Korean editors to change their entries accordingly too. Sartma (talk) 18:31, 1 June 2021 (UTC)
Actually, there's another word that's widely used in Mesopotamian studies: Logogram. That would be generic like Heterogram, including both Sumerograms and Akkadograms. What about Logogram? Again, if we choose a more general name, then for consistency we need to change also Kanji and Hanja, since they both are just Logograms. Sartma (talk) 10:56, 2 June 2021 (UTC)
@Sartma Two points:
  • There is a difference in that most people consulting Korean or Japanese entries are (hopefully) going to be casual learners, to whom "Hanja" and "Kanji" are the familiar terms, while most people consulting Akkadian entries will be people with at least some linguistics background who can be relied on to be more familiar with terms such as logogram, heterogram, etc.
@Tibidibi I don't think that we should base our decisions on the perceived or hypothetical readers of Wiktionary entries, arbitrarily discriminating by language (Japanese and Korean: ok; Akkadian: no, sorry): in other words, I'd like to be able to have a discussion based on facts and not personal feelings or perceptions. There will be a lot of casual learners of Akkadian and Sumerian consulting Wiktionary (judging by existing Akkadian and Sumerian entries, I can assure you that whoever wrote them was probably even more casually learning them than people using Wiktionary for Japanese and Korean...) to whom "Sumerogram", "Akkadogram" and "Determinative" are the most familiar terms (if not the only ones they'll ever hear). I would like to write entries for the widest possible public, but mainly for a public that's actually interested in Sumerian and Akkadian, not for general "people with some linguistics background". I'd like those entries to be useful to those who are studying those languages, not to "others" (what sense would it make to do otherwise?). Every language has its own "technical" terms. I'm not sure who we are pleasing by changing well established terms to favour others that would just make everything less clear, confusing and alienating. We don't do that with Latin, Ancient Greek or Sanskrit, where all traditional categories are maintained, whether they make sense "linguistically" or not. I'd like to see the same respect for Akkadian and Sumerian too. Sartma (talk) 15:58, 2 June 2021 (UTC)
@Sartma Okay, I take back the point about discriminating by language. But we are not removing the "Sumerogram" and "Akkadogram" terms, and they are still displayed prominently in the page. They are still on the page due to {{sumerogram of}}, only the title of the header is "heterogram". So no information is lost, and if anything information is added; people will now know from the header that these are heterograms.--Tibidibi (talk) 16:10, 2 June 2021 (UTC)
  • Logogram is an extremely broad term while, to the best of my knowledge, full-scale heterogramic systems are more-or-less exclusive to the Ancient Near East, Japanese (modern and historical), and Old Korean; Chữ Nôm does not really use Chinese characters in this way. And since heterogramic entries are not made for Old Korean (there is no point because the phonetic component is not known) while Japanese has its own system already, it seems better to use "heterogram", which would become a more precise category exclusively used for extinct languages of the Near East. Modern Hanja are not heterograms.--Tibidibi (talk) 15:18, 2 June 2021 (UTC)
@Tibidibi: Logogram is a hyperonym of heterogram. The choice between the two should therefore take relevance and pertinence into account. Is it necessary to use "heterogram" instead of its hyperonym "logogram"? Does "heterogram" add any relevant/pertinent information that "logogram" doesn't express already? I'd argue that for the use in Akkadian entries "logogram" is sufficiently clear and there's no need to choose its hyponym "heterogram". The indication of a "foreign origin of the sign" is implicit in the further indication of the logogram as a Sumerogram or Akkadogram. Moreover, "Logogram" has the advantage of also being a very familiar word for people studying Akkadian and Sumerian: that to me is one big point in favour of its use. Sartma (talk) 16:29, 2 June 2021 (UTC)
I can concur; I know next to nothing of Sumerian and that stuff, but having "Logogram" as the header tells us enough; having each definition preceded by label "sumerogram of" and "akkadogram of" is clear, since it informs me of what to look for to know more about it. I actually understand what you're talking about here, which is sufficient for an entry. "Determinative" really should be a possible header, since it's used everywhere in the past, though the explanation you give reminded me more of jukujikun and the like. Knowing that the words make intuitive sense to those unfamiliar with the standard lingo, and having them agree with the accepted in-field jargon makes for this proposed system sufficing in my eyes. 110521sgl (talk) 14:31, 3 June 2021 (UTC)
  • I will also add here that the Korean hanja entries do not represent logograms (as in the glyphs themselves) but Sino-Korean morphemes, and most are closer to full lemmas than soft redirects. If anything the "Morpheme" header would be more appropriate, except that most Korean linguists agree that many Hanja used in modern Korean are not genuinely productive morphemes in modern Korean, especially given the decline of Literary Chinese education. So Hanja is really the only header that fits.--Tibidibi (talk) 16:10, 2 June 2021 (UTC)
@Tibidibi True, hanja in modern Korean are not logograms. They're just a different way to spell Sino-Korean morphemes, as you say. So, for example, 椅子 is just a different spelling of 의자. In modern Korean it's just a question of stylistic choice. Sartma (talk) 16:48, 2 June 2021 (UTC)
“determinative” should probably be added since taxogram and semagram are much less used and classifier seems restricted for a thing that is used with numerals, though determinative has another meaning we list and I personally prefer taxogram and semagram because these elite words are unambiguous and parallel to other -grams, and I see semagram is used with another meaning by word-gamesters (the one I knew first we don’t have yet, as with heterogram, I’m finna fix it). Fay Freak (talk) 16:47, 1 June 2021 (UTC)
Here too, I'm not against it in principle, but for the same reasons I'd prefer to keep Sumerogram/Akkadogram, I'd prefer to keep Determinative too. This is the word used in every Akkadian and Sumerian reference material (Dictionaries, textbooks, essays...); it would be confusing/alienating if we used something unusual in the field. Sartma (talk) 19:22, 1 June 2021 (UTC)
Support the original proposal. Nobody actually uses heterogram when working on these languages, so we'd just be causing confusion for no gain; it's not like we have a finite number of L3s we can use. —Μετάknowledgediscuss/deeds 16:46, 3 June 2021 (UTC)
Support 110521sgl (talk) 19:52, 3 June 2021 (UTC)

definite/indefinite articles again

Hi @JoeyChen, I asked on your discussion page why you removed grammatical articles from glosses. You didn't answer and you insist on continuing to do it: Special:Diff/62639656. Then I raised the issue in BP last month and no opinions were offered in favour of your practice, but neither were any firm and clear guidelines offered against it. I think such a fundamental disagreement deserves a coherent discussion - perhaps even a vote? Surely it can't be that difficult to decide. Please engage. Brutal Russian (talk) 19:39, 2 June 2021 (UTC)

I can see why they're doing it. I'm going to look at a physical dictionary for Latin real quick. Oh. I thought I remembered there being indefinite articles in it, but apparently dictionaries don't give articles for Latin nouns. So JoeyChen's doing it right. (Woordenboek Latijn/Nederlands zevende herziene druk Amsterdam University Press, 2018) 110521sgl (talk) 14:35, 3 June 2021 (UTC)
@110521sgl: "Right" in the context of Wiktionary is what corresponds to our editing policies/guidelines. These can be influenced by what other dictionaries are doing, but what other dictionaries are doing does not determine what we consider to be "right"; moreover, if one wants to determine what other dictionaries are doing, consulting just one isn't enough. {{R:OLD}} uses articles the way our English definitions use them, definite and indefinite; it does present its definitions as sentences finished by a stop. {{R:L&S}} seems to be inconsistent, but does use them in the same word; see further "FriezeDennisonVergil" on the same website (at the top, only for words used by Virgil). It seems to me that the way English speakers choose to present English definitions is a good guide to the natural way to present them, and this would make article-less glosses aberrant. In addition, translations in templates like {{m}} generally require the use of articles to distinguish parts of speech, and it's simply better when the definitions in these templates consistently reflect the definitions in the entries. Otherwise, why not gloss verbs without the to for ultimate confusion? Finally, do you really find it desirable to gloss e.g. cantiō, pugna as "singing", "fighting"? If not, what's the point of making an exception for disambiguation instead of introducing a general rule? Brutal Russian (talk) 21:50, 3 June 2021 (UTC)

CFI for foreign languages should be spelt out

Hi. I think the CFI for foreign languages is not clearly spelt out. If I understand correctly, the present consensus followed is:

  1. English Wiktionary should have entries for all foreign natural language words that exist in the foreign natural language. The definitions and descriptions should be given in English. Title should be in the foreign script.
    • Entry layout for foreign terms is identical except that it should not have Translations sections
  2. Foreign translations of English words should be added to the translations section of English entries.

I had not contributed anything to Wiktionary for a long time, as I was unsure to what extent foreign languages can be added. Note that English WT has very little Indic language content (only 14,000 Hindi entries and just 1,400 Malayalam entries), even though their respective language versions of WT have over a hundred thousand entries (see WT:Statistics). So, please spell out the inclusion criteria for non-English languages in the CFI and other policy pages. I wish that a WT:Foreign languages page will be created. Thank you! Vis M (talk) 22:43, 3 June 2021 (UTC)

The main blocker is the nominal policy that we shouldn't just include an alleged word because some other dictionary has the word. The other is that we need usable translations into English of these words. Strictly, for non-English words, we give translations rather than meanings, though I think a lot of editors go for meanings rather than translations. There's no policy reason why English Wiktionary shouldn't include most of those lemmas - the reason is shortage of labour. On the other hand, there can be policies excluding some very obvious inflections, as for English. --RichardW57 (talk) 05:49, 4 June 2021 (UTC)
Ok, thank you very much! Vis M (talk) 00:24, 6 June 2021 (UTC)
The inclusion criteria for English are the same as for all other well-documented languages.__Gamren (talk) 23:32, 10 June 2021 (UTC)

Multiple click policy for subordinate entry

There is a policy, though I don't know that it is documented, that the collection of meanings or translations for a word is kept at the main entry rather than duplicated across alternative forms or inflections. What, then, are the allowed uses of the gloss fields in many of the linking templates, such as {{inflection of}}, {{alternative form of}} and {{sa-sc}}, or indeed {{bor}}? --RichardW57 (talk) 05:35, 4 June 2021 (UTC)

One view, promulgated by Inqilābī, is that, "We provide the meaning in nonlemma entries only when there are multiple definitions in the entry", essentially that the purpose of these short glosses is to distinguish the lemmas. He's used this view to delete one of my brief glosses, given for Pali သီလ (sīla, habit), which technically is a lemma, though some prefer to call it a soft redirect. (It actually stores script-specific information; by @AryamanA's rejection of the use of data-modules for word-specific information, irregular inflection can cause these subsidiary lemmas to involve a fair bit of work. Pali seems to have a host of irregularities.) --RichardW57 (talk) 05:35, 4 June 2021 (UTC)

I have been taking the view that these glosses can provide a one-stop service to the user who has temporarily forgotten the word; if he wants more meaning, he can click on, but if the reminder is enough, job done. So, may we attempt to be user-friendly by providing memory-jogging glosses? --RichardW57 (talk) 05:35, 4 June 2021 (UTC)

I also sometimes provide a brief gloss in non-lemma even when the lemma entry is unambiguous, especially if the main lemma is more than one click away (e.g. an alternative spelling of an inflected form or a mutation of an inflected form). —Mahāgaja · talk 15:08, 4 June 2021 (UTC)

User pages as self-contained lexicons

What do people think about User:Turkish Glossary of Untranslatable Expressions, and User:Términos de la psicología (earlier version User:Jimena rc)? While the subject matter is relevant to a dictionary, these are completely isolated from the rest of Wiktionary: none of these accounts have made any edits outside of their own user and user talk pages.

I'm probably going to delete the "psicologia" ones either way because the definitions are all in Spanish, but I think we need to discuss this: it's starting to look like a trend. Chuck Entz (talk) 21:19, 4 June 2021 (UTC)

I think they are in need of a kind of software for their vocabulary records, and are here because they have been conditioned to seek out a SaaSS. Fay Freak (talk) 02:24, 5 June 2021 (UTC)
This is an improper use of user names. As to the Turkish list: if attestable, why are these entries not simply terms/phrases in mainspace? (In fact, some are: bacanak, cümbür cemaat, ellerine sağlık, kaçıncı, ulan/lan, üşenmek.) It is not at all unusual that some term has no direct equivalent in another language, or that some idiom does not make sense when translated word for word, or needs a usage note to explain when it can be used. The list has the appearance of having been copied from elsewhere, what with the remark “also seen in the photograph” while there is no photograph.  --Lambiam 17:13, 6 June 2021 (UTC)
My recent favourite "untranslatable" Spanish term is por el culo te la hinco. I was wondering how I'd translate that if it was in a film - probably have the character sing "Ah Ah Ah Ah Number Five Number Five" Beegees-style as a relatively humorous alternative. Indian subcontinent (talk) 20:52, 6 June 2021 (UTC)
...which probably explains why I'm not a film-script translator. Indian subcontinent (talk) 20:53, 6 June 2021 (UTC)
“In yer tewel I swive” Fay Freak (talk) 21:07, 6 June 2021 (UTC)
“In yo' mom's muff I dive” Indian subcontinent (talk) 22:54, 6 June 2021 (UTC)

English nouns lacking genitive form

The following pronouns have no genitive form: there & relative which. The following common nouns are not found with a genitive form either: umbrage, sake, dint, worth, behalf, lack, basis, extent, means, stead, shrift, spate, heed, & cusp. What's a good way to deal with this?--Brett (talk) 16:35, 6 June 2021 (UTC)

Maybe I'm just dumb but how is umbrage any different from anger? I can't see how a possessive or genitive is acceptable for one but not the other. —Justin (koavf)TCM 20:40, 6 June 2021 (UTC)
I don't see anything that needs to be "dealt with", to be honest. Make a list in a subpage of your username, I guess. Indian subcontinent (talk) 20:47, 6 June 2021 (UTC)
Better corpuses? Better analysis? You ought to find that 'whose' does function as the possessive of 'which'. As a mathematician, I have no problem pondering a basis's cardinality. (It's in print as, "In fact, for any two vector spaces A and B, we can always find a vector space C, whose basis’s cardinality is big enough, such that A ⊕ C = B ⊕ C.") And googling quickly turned up, "Then we consider the case of unknown cusp's order and derive an adaptive wavelet estimator with the uniform rate slower only by a log n factor than the corresponding rate for known ffi." I also found, "Business Insider calculated that Amazon CEO Jeff Bezos made $160,000 per minute at his net worth's peak September 2018".
If such a lack were real and noteworthy, the 'Usage notes' seem a sensible place to mention such a lack. --RichardW57 (talk) 22:22, 6 June 2021 (UTC)
"the richness of sake's taste" FTW Indian subcontinent (talk) 22:59, 6 June 2021 (UTC)
That's just taunting. --RichardW57m (talk) 11:06, 7 June 2021 (UTC)
"If first extent's measurement in EAD = boxes, enter boxes in ASpace type." --RichardW57m (talk) 11:06, 7 June 2021 (UTC)
I suspect certain verb forms also happen not to have a 'genitive form', such as 'am'. Do clitic forms of verbs take the possessive clitic, or does it force the clitic to decliticise? This question seems to be more of an issue for a grammar rather than a dictionary. The clitic's realisation seems to be variable after 'is' and 'was', even amongst those who have mastered the apostrophe. (The question is whether the 'repeated morph constraint' gets applied.) --RichardW57m (talk) 11:06, 7 June 2021 (UTC)

Template:P: for "pronunciation"

I asked for opinions on creating pronunciation usage templates back in March, but didn't receive any. Since then I've only created one ({{U:la:pron-dropvowel}}) because it rubs me the wrong way to create templates with monstrously long names. These result from the need to specify what type of usage template it is, for example. In my opinion the type is best distinguished by the capital letter, and so I've just made {{P:la:4decl-neut}}, where P stands for "pronunciation". I'm not sure if Etymology needs its own letter, but no other sections that do come to mind, since the rest of the entry is basically treated as one section and the note generally appears under Usage notes. Do you think this is a good approach? Earlier-created pronunciation notes are often found in the Usage notes section, which I think is the wrong place for them, and I've been consistently putting them under Pronunciation. Brutal Russian (talk) 00:32, 7 June 2021 (UTC)

I'd go ahead and do it, and if it breaks anything someone will eventually realise. Also, don't worry about long names in templates - we have long-named stuff like Template:RQ:Chapman Mask of the Middle Temple and Lincoln's Inn and Template:RQ:Denham On the Earl of Strafford's Tryal and Death Indian subcontinent (talk) 22:35, 7 June 2021 (UTC)

Universal Code of Conduct News – Issue 1

Universal Code of Conduct News
Issue 1, June 2021
Read the full newsletter


Welcome to the first issue of Universal Code of Conduct News! This newsletter will help Wikimedians stay involved with the development of the new code, and will distribute relevant news, research, and upcoming events related to the UCoC.

Please note, this is the first issue of UCoC Newsletter which is delivered to all subscribers and projects as an announcement of the initiative. If you want the future issues delivered to your talk page, village pumps, or any specific pages you find appropriate, you need to subscribe here.

You can help us by translating the newsletter issues in your languages to spread the news and create awareness of the new conduct to keep our beloved community safe for all of us. Please add your name here if you want to be informed of the draft issue to translate beforehand. Your participation is valued and appreciated.

  • Affiliate consultations – Wikimedia affiliates of all sizes and types were invited to participate in the UCoC affiliate consultation throughout March and April 2021.
  • 2021 key consultations – The Wikimedia Foundation held enforcement key questions consultations in April and May 2021 to request input about UCoC enforcement from the broader Wikimedia community.
  • Roundtable discussions – The UCoC facilitation team hosted two 90-minute-long public roundtable discussions in May 2021 to discuss UCoC key enforcement questions. More conversations are scheduled.
  • Phase 2 drafting committee – The drafting committee for the phase 2 of the UCoC started their work on 12 May 2021. Read more about their work.
  • Diff blogs – The UCoC facilitators wrote several blog posts based on interesting findings and insights from each community during local project consultation that took place in the 1st quarter of 2021.


placement of inline synonyms

Do inline synonyms/antonyms as specified using {{syn}}, {{ant}}, etc. go before or after usage examples? Benwing2 (talk) 01:39, 12 June 2021 (UTC)

Before, I think. Imetsia (talk) 02:19, 12 June 2021 (UTC)
It was left unregulated, I discovered shortly after the vote introducing them and you probably have read: But there one has argued for before. Which I now also prefer mostly because otherwise the quotes push away the semantic relations on expansion but you would like the synonyms and company near the definition to even understand the definition or you wonder where they went. Fay Freak (talk) 03:01, 12 June 2021 (UTC)
Given that almost everyone in that discussion wanted them placed before usage examples, and I agree, and WT:ELE agrees as well, I've changed the documentation of all inline *nyms to indicate that they go before usage examples. Benwing2 (talk) 04:23, 12 June 2021 (UTC)
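For anyone wanting to audit existing entries against this ordering, here is a minimal sketch in Python. It assumes the usual wikitext layout in which a definition line starts with "#" and its sub-lines start with "#:" or "#*", and it only looks at {{syn}}, {{ant}}, {{ux}} and {{uxi}}. The function name misplaced_nyms is hypothetical and is not part of any existing bot.

```python
import re

# Inline *nym lines such as "#: {{syn|en|kitty}}" (only syn/ant are checked here).
NYM = re.compile(r"^#+:\s*\{\{(?:syn|ant)\|")
# Usage-example lines such as "#: {{ux|en|The cat sat.}}".
UX = re.compile(r"^#+:\s*\{\{(?:ux|uxi)\|")
# Any sub-line of a definition ("#:" or "#*" after one or more "#").
SUBLINE = re.compile(r"^#+[:*]")


def misplaced_nyms(lines):
    """Return 0-based indices of *nym lines that come after a usage
    example belonging to the same definition (hypothetical helper)."""
    seen_example = False
    bad = []
    for i, line in enumerate(lines):
        if NYM.match(line):
            if seen_example:
                bad.append(i)
        elif UX.match(line):
            seen_example = True
        elif not SUBLINE.match(line):
            # A new definition line (or any other line) resets the state.
            seen_example = False
    return bad
```

The check deliberately says nothing about quotation lines ("#*"), since the thread above only settles the order relative to usage examples.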

Pinyin capitalization

@Justinrleung, Suzukaze-c, Tooironic, 沈澄心 Should the pinyin of the names of ethnic groups be capitalized here on Wiktionary? In 現代漢語詞典 they are capitalized even though they are classified as nouns. An example is Hànzú for 漢族. The same is true for 漢人, 漢語 and 漢字, but not 漢姓. RcAlex36 (talk) 11:25, 12 June 2021 (UTC)

Thanks for the ping. I don't have an opinion on this matter. ---> Tooironic (talk) 11:33, 12 June 2021 (UTC)
Also @Frigoris, Bula Hailan. RcAlex36 (talk) 12:02, 12 June 2021 (UTC)
@RcAlex36 See this link 汉语拼音正字法基本规则, in section 6.3.2, the Pinyin transcription of 景頗族 is capitalized. Bula Hailan (talk) 12:10, 12 June 2021 (UTC)
I don't think 現代漢語詞典 has a separate label for proper nouns, but even so, I think these should be capitalized. — justin(r)leung (t...) | c=› } 16:04, 12 June 2021 (UTC)
@Suzukaze-c, 沈澄心 Any opinion on this? RcAlex36 (talk) 05:28, 13 June 2021 (UTC)
I personally support capitalization but do not care if someone supports the opposite. —Suzukaze-c (talk) 05:31, 13 June 2021 (UTC)
With no editor opposing capitalization I will go ahead and capitalize the pinyin of words in question. RcAlex36 (talk) 09:24, 14 June 2021 (UTC)
@Atitarev made this change; I'm pinging that editor in case they have a comment. I don't have an opinion on it, but I will note that Xiandai Hanyu Guifan Cidian has no capitalized pinyin forms, but Xiandai Hanyu Cidian does (as noted above). --Geographyinitiative (talk) 12:32, 14 June 2021 (UTC)
Note section 6.3.3 in which the pinyin for 漢語 is written as Hànyǔ. Also, Xiandai Hanyu Cidian, being the primary prescriptive standard for Standard Chinese, should be considered more authoritative than Xiandai Hanyu Guifan Cidian in my opinion. RcAlex36 (talk) 14:19, 14 June 2021 (UTC)
@RcAlex36: I failed to understand when to capitalize. 粤语 is transcribed as Yuèyǔ in 汉语拼音正字法基本规则 (2012) but yuèyǔ in Xiandai Hanyu Cidian (7th edition). -- 09:10, 22 June 2021 (UTC)

Links to English should be preferred to links to page

WingerBot is making entries worse by changing {{l|en|a}} in definitions to [[a]]. The former is superior because it links to the intended definition, not to the top of a large page that happens to contain a definition. See Special:diff/62729111. Compare {{l|en|a}} to [[a]]. I have to scroll down 22 pages to reach the English section if I follow the page link. Vox Sciurorum (talk) 11:35, 12 June 2021 (UTC)

I'm not sure the first is superior, since it also uses Lua (which can be of significance in some pages). Also, the English section is always the first or second one on the page, so you can click it in the contents box. That said, I'm not sure machine-changing these is a good idea without consensus, since I can imagine some editors preferring the former style over the latter. Pinging @Benwing2 Thadh (talk) 11:59, 12 June 2021 (UTC)
It may be tedious doing it by hand, but a bot should have no trouble using [[a#English|a]] to replace {{l|en|a}}. Chuck Entz (talk) 17:42, 12 June 2021 (UTC)
@Vox Sciurorum This has been discussed before and I think people were generally in favor of raw links for English. Generally this is how people enter the definitions anyway; it's annoying to enter templated links everywhere when creating definitions by hand (which is how it has to be done). The vast majority of pages for English words don't have large tables of contents at the top, and the English definition is almost always the top definition, so usually it's not an issue. If this is really an issue, we can use templated links only for the pages with large tables of contents. Furthermore, most of the time words like a aren't even linked in definitions; how many times do you need to check the definition of a word like this anyway? Benwing2 (talk) 18:01, 12 June 2021 (UTC)
I think plain wikilinks work in definitions, etymologies, etc, for Translingual terms as well, whether CJKV or taxonomic, except inside certain templates like those in the {{der}} family. DCDuring (talk) 18:32, 12 June 2021 (UTC)
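The replacement Chuck Entz suggests above could be sketched roughly as follows. This is an illustration only, not WingerBot's actual code: the function name to_section_link is hypothetical, and the pattern deliberately handles only the plain two-argument form {{l|en|X}}, leaving templates with extra parameters untouched.

```python
import re

# Matches only the simplest form, e.g. {{l|en|a}}.
# Anything with further parameters (e.g. {{l|en|a|b}}) is skipped on purpose.
SIMPLE_EN_LINK = re.compile(r"\{\{l\|en\|([^|{}\[\]]+)\}\}")


def to_section_link(wikitext: str) -> str:
    """Rewrite {{l|en|X}} as [[X#English|X]] (hypothetical helper)."""
    return SIMPLE_EN_LINK.sub(
        lambda m: f"[[{m.group(1)}#English|{m.group(1)}]]", wikitext
    )
```

A section link of this kind keeps the raw-link style while still landing the reader on the English section rather than the top of the page.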

Convert Italian noun plural forms to noun forms

Plurals are the only possible non-lemma forms of nouns in Italian, so there's really no point in having a category Category:Italian noun plural forms distinct from Category:Italian noun forms. For this reason, I plan to run a bot to convert all Italian 'noun plural forms' to plain 'noun forms' and remove the category Category:Italian noun plural forms. This would make Italian work like English and Spanish (which likewise have only plural non-lemma noun forms, which are placed in the 'noun forms' category directly). The same thing should be done in French. Benwing2 (talk) 01:31, 13 June 2021 (UTC)
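
For illustration, the per-page change such a bot run would make might look like the following. This is only a sketch under assumptions: it supposes the affected entries use a headword line of the form {{head|it|noun plural form|...}}, and the function name is hypothetical. Entries using other headword templates would need separate handling.

```python
import re

# Assumed headword form: {{head|it|noun plural form}} optionally followed by
# further parameters. The lookahead ensures the POS parameter ends right there.
IT_PLURAL_HEAD = re.compile(r"(\{\{head\|it\|)noun plural form(?=[|}])")


def convert_italian_plural_head(wikitext: str) -> str:
    """Change the POS in Italian headword lines from 'noun plural form' to 'noun form'."""
    return IT_PLURAL_HEAD.sub(r"\g<1>noun form", wikitext)


if __name__ == "__main__":
    print(convert_italian_plural_head("{{head|it|noun plural form|g=f-p}}"))
```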

Convert English proper noun plural forms to proper noun forms

Per above and Category talk:English noun plural forms § RFM discussion: March–May 2019. J3133 (talk) 01:45, 13 June 2021 (UTC)

How do we handle the interaction with the clitic -'s? In many ways, it still works like a case form. --RichardW57m (talk) 11:48, 14 June 2021 (UTC)

We only have a small handful of pages that are noun forms but not noun plural forms, and the change was already made for English; Italian doesn't have this issue. I support moving the Italian, French, and English proper noun categories. However there are still 179 categories in Category:Noun plural forms by language. Should they all be moved to "X noun forms" even if they have case systems? No one has complained so far about the 51,671 pages in Category:German noun forms versus the 174 in Category:German noun plural forms. Ultimateria (talk) 17:21, 14 June 2021 (UTC)
@Ultimateria I don't think there should be anything in Category:German noun plural forms. In general, "noun plural forms" doesn't really make sense for languages with case because there usually isn't a single plural noun form. German is a partial exception in that nouns with plurals in '-n' and '-s' have the same form for all cases, but I still don't see the point of a 'noun plural forms' category there. Benwing2 (talk) 04:03, 15 June 2021 (UTC)
@Ultimateria, Benwing2, J3133 It makes some sense when the plural has a separate stem, as most notably in Semitic languages. However, it is these stems that one would want to capture. --RichardW57m (talk) 11:30, 15 June 2021 (UTC)

Dutch/Afrikaans noun plural forms

I'd like to make the same change to Dutch and Afrikaans noun plural forms. These languages are similar to English and Romance languages. Dutch does have some archaic case forms, but these are all segregated into CAT:Dutch noun case forms. Benwing2 (talk) 04:45, 25 June 2021 (UTC)

Italian numbers as adjectives

It appears that all Italian cardinal numbers are listed as both numerals and adjectives. This seems to have come about because User:SemperBlotto marked all Italian numbers as both nouns and adjectives around 2008, and User:Ultimateria converted the nouns to numerals in 2020, leaving the adjectives. I don't believe "adjective" is a correct POS and am planning on deleting the adjective POS from all of the numbers. Benwing2 (talk) 00:57, 14 June 2021 (UTC)

By all means delete them. Ultimateria (talk) 01:14, 14 June 2021 (UTC)
Agreed, although I wouldn't bother for some of the larger numbers. Many of the cardinal-number entries are in the process of being deleted outright (see the category talk page). So I don't think we should waste time first editing the categories for entries that will be deleted soon anyways. Although I do invite other admins to continue the work of deleting that large mass of cardinal numbers per our previous vote. Imetsia (talk) 01:21, 14 June 2021 (UTC)
Numbers are considered adjectives in Italian though. Why would Adjective be wrong? Sartma (talk) 05:27, 14 June 2021 (UTC)

Wikimania 2021: Individual Program Submissions

Dear all,

Wikimania 2021 will be hosted virtually for the first time in the event's 15-year history. Since there is no in-person host, the event is being organized by a diverse group of Wikimedia volunteers that form the Core Organizing Team (COT) for Wikimania 2021.

Event Program - Individuals or a group of individuals can submit their session proposals to be a part of the program. There will be translation support for sessions provided in a number of languages. See more information here.

Below are some links to guide you through;

Please note that the deadline for submission is 18th June 2021.

Announcements - To keep up to date with the developments around Wikimania, the COT sends out weekly updates. You can view them in the Announcement section here.

Office Hour - If you are left with questions, the COT will be hosting some office hours (in multiple languages), in multiple time-zones, to answer any programming questions that you might have. Details can be found here.

Best regards,

MediaWiki message delivery (talk) 04:18, 16 June 2021 (UTC)

On behalf of Wikimania 2021 Core Organizing Team

Swedish common gender

Like the standard language, the Swedish noun entries on Wiktionary have two genders, common and neuter. However, since almost every (traditional) dialect has the three genders masculine, feminine and neuter, it would be better to split the common gender template into something like c/f and c/m, so that both the dialects and the standard language would be accommodated equally in this respect. There are also a few nouns that have different genders in different dialect areas. —⁠This unsigned comment was added by ASkyr (talk • contribs) at 09:40, 18 June 2021 (UTC).

This is a good suggestion, and one that I myself have been thinking of making. There are still certain noun classes that have a strong connotation with feminine or masculine gender, for instance the nouns with -a in singular and -or in plural are historically feminine, and intuitively seen by natives as such, while the nouns with -e in singular and -ar in plural are likewise, but masculine.
Not to mention that the distinction between masculine and feminine was still largely existing in the written language of the 1600s and 1700s, which counts as Swedish and thus, being attested, should be included in the dictionary. I should add that SAOB, the Dictionary of the Swedish Academy, also lists nouns as "r. l. f." (common or feminine) and "r. l. m." (common or masculine), rather than only "r." (common) Mårtensås (talk) 18:58, 20 June 2021 (UTC)
I think this would be useful information, whether we put it in the headword line or at least in declension tables, and given what's been said above about how it's necessary for describing not only dialects but the early modern language, and how another major dictionary includes it, I'm inclined to include it. German entries sometimes mention a noun's varying gender in "dialects" via usage notes, but the number of Swedish nouns where this information would be applicable seems so high that it should go somewhere more "regular", like the headword line or declension table. (I suppose there may be some modern coinages/borrowings which don't have a traditional gender besides common, though, yeah?) - -sche (discuss) 21:16, 20 June 2021 (UTC)
I already added support for it so that it looks like how SAOB handles it, see jord (it reads like " jord c or f "). Mårtensås (talk) 12:06, 21 June 2021 (UTC)

{{bor+}} and {{inh+}}

Despite the failed vote, Creation of Template:inh+ and Template:bor+, @SodhakSH (and Brutal Russian) went ahead and created the templates {{bor+}} and {{inh+}}. They should be deleted and locked to prevent them from being recreated. @Metaknowledge, Thadh, Donnanz, Fenakhay, DannyS712, Fay Freak, -sche, Jberkel, Mahagaja --{{victar|talk}} 18:46, 18 June 2021 (UTC)

I agree, but this seems like a discussion for WT:RFDO, not the Beer Parlor. —Mahāgaja · talk 19:07, 18 June 2021 (UTC)
RFD created as well. --{{victar|talk}} 20:36, 18 June 2021 (UTC)
I think lengthy case presentings shouldn't be part of RFDs, so I'm going to give mine here, where it seems to be tolerated.
These templates' function is to make editing easier for a small number of editors (by name, mostly, but probably not exclusively, SodhakSH, Inqilābī and Brutal Russian), and arguably create a regular wording for etymology sections (although I, as well as others, dispute that). The necessity of the text that is now being displayed using these templates is very much disputed to a degree that a supermajority (13 to 5) has voted to abolish giving the text within the {{bor}} template. Of course, one could argue that adding the template {{bor+}} isn't contradictory to that vote, but seeing as the vote concerning adding bor+ also failed I wouldn't be so certain of that.
Some bring up the issue that AryamanA voted just past the time and thus failed to make the difference, but seeing as PUC was also going to vote oppose (mind you, an oppose vote is worth twice a support), the vote would have failed anyway.
Now, some have brought up that a new template's creation shouldn't need any vote, but I'd argue that since this template is one in a series of arguably most used templates (after {{head}}, {{l}} and {{m}}), any creation of a template that takes over a part of or even the whole function of {{bor}} or {{inh}} should get a vote, which it did in this case, and would have even without the initiative of the template's advocates.
So, to reiterate, the proposal to create these templates was turned down in a democratic process that is of the highest form we have. An RFD discussion is of no value, since it's not as important as a vote, and as such I, and anyone who agrees with me, plead to the administrators of this project to delete these templates, lock them and either create another vote (which I personally would find absurd, since we just finished this one, and we are currently not in the season where the majority of Wiktionary editors is regularly editing), or just ban the creation of such templates until a supermajority of Wiktionary editors actually agrees that this template should be created. I thank you for your time.@Victar, SodhakSH, Inqilābī, Brutal Russian, Mahagaja, Fenakhay, Imetsia, Benwing2, PUC Thadh (talk) 11:09, 21 June 2021 (UTC)
There’s no question of having them ‘deleted & locked’. From the vote it is clear that a supermajority was in favour of the templets, only Time kept the vote from officially passing. Many/Most of the supporters (and even an opposer, Metaknowledge) agree that this vote was not needed at all: and what’s more, especially because these are harmless templets with no differing functionality, they should be kept. As Lambiam rightly pointed out, we have lots of unnecessary templets but we keep em. Now, the reasoning that Canonicalization could have also cast his vote is not a whit a good justification for the claim that the vote would have anyway not passed, forasmuch as all editors were not aware of the vote, and if the vote were to be prolonged, more people would have cast their vote. Also, {{inh+}} & {{bor+}} are overlapping templets in the sense that they would only be used initially in the etymology section, but also generally and not always, depending upon the preference of the editor. It is in fact more democratic to have overlapping templets; to not let using them is authoritarian. All etymology templets (save {{com}}) not only produce the full wording but also have the keywords linked to Glossary, thus ’tis only natural that {{inh}} & {{bor}} should do likewise; but seeing as these two templets are used very often, two new templets had to be devised (at first I was thinking of using parameters, but that idea was rejected owing to the perceived unwieldiness thereof, hence having the new templets is the best possible choice). Has any of the opposers a better solution to’t? ·~ dictátor·mundꟾ 13:52, 21 June 2021 (UTC)
The difference between {{inh}}/{{bor}} and any other of the etymology templates is that "inherited", "borrowed" and "from" are pretty self-explanatory; calque, semantic loan and onomatopoeia not so much, and so the template provides the link, so that editors don't have to link to the glossary manually, and the readers don't have to know every lexicographical term to read the dictionary. For the two templates in question, it's simply not necessary, and the new templates don't even link to the glossary.
For what it's worth, it seems a little silly to have such a discussion over two templates, I give you that, but I think that templates that are used on a day-to-day basis in all languages in a certain way shouldn't be replaced by another template just like that, especially against a held vote. Also, about the supermajority: I have already addressed that, PUC said they would have voted against, making it again a non-supermajority. Thadh (talk) 14:30, 21 June 2021 (UTC)
No, they do, I already announced that before. I said that more people would have voted if the vote had lasted beyond a month, there would have been more supporters as well, please do not use Canonicalization as a distraction. In many language families, templets like {{lbor}} are used as often as {{bor}} & {{inh}}; so the needlessness of the etymological text is unjustified. Also, your claim that ‘The necessity of the text that is now being displayed using these templates is very much disputed to a degree that [] ’ is not any premise: during that time {{etyl}} reigned supreme, and people just wanted to have consistency in line with the few other existing/utilised templets. This time again, we are advocating consistency. ·~ dictátor·mundꟾ 15:00, 21 June 2021 (UTC)
The point is that we don't know what would have happened if the vote had lasted longer. Maybe people would have flocked to support it and it would have passed with flying colours, or maybe it would have failed miserably, or maybe the result would have been exactly the same: failing by a narrow margin. The same way that my uncast vote is irrelevant, Aryaman's late vote is irrelevant. What matters is the actual result. So, by all means, make your case, but drop that argument, which weakens it. 212.224.224.150 15:12, 21 June 2021 (UTC)
You're right about the linking, I'm sorry, I vaguely remembered looking at the new templates and not seeing these, but it turns out I was wrong. I've slashed that comment. Thadh (talk) 15:41, 21 June 2021 (UTC)
@Thadh: I don't think you get it right when you say the utility of showing the text is disputed in the old vote. People voted to remove the text because it had to be constantly deleted using the parameter, because the templates are constantly used in other positions than line-initial, and this was cumbersome - not because the text was useless (ctrl+F only finds 1 'usef' and no 'util'). The two new templates have been created in order to make it less cumbersome to display full and unambiguous etymological statements line-initially, and because adding an optional parameter to {{inh}} and {{bor}} would have resulted in the same cumbersomeness as the old vote had banished. Both votes were designed along the same goal of making life easier for everyone, and I find it difficult to understand the position of people who argue against making life easier on procedural grounds of a vote that didn't pass by 1 and that wasn't even required in the first place. This seems less like democracy and more like make-yourself-feel-good bureaucracy. I will also add that a precedent for these templates was already present as {{m+}}, which takes over the functionality of {{m}} and is of very marginal utility indeed, albeit created a full 4 years ago.—When you mention "a small number of editors", naming only three, are you doing it after having read through all the support votes? Brutal (talk) 00:01, 22 June 2021 (UTC)
@Brutal Russian: Not everyone that voted support is in favour of using these templates, they're - from my understanding - just not against the creation of the templates for the people that want to use them.
If showing the text wasn't the issue, then a creation of the parameter |t=1 would've done the trick, but the thing is that not everyone likes to write "Borrowed/Borrowing" (although I always do write it) before an etymology, and it seems like almost no-one wants to systematically put "Inherited from" before every inheritance. The vote has shifted a long time ago from "Should we actually create and use these specific templates?" to "Should everyone be able to create any template without a vote?". I personally believe that the creation of these templates will lead to the standardisation of their usage (since the community always strives to a unified style for as much as possible), and since I neither like them in action nor think them useful, I oppose this development.
For {{m+}} it's different, because the usage of the template is very marginal and mostly handy for the reason that some languages' names are just too darn difficult to remember (try typing out "Xârâcùù" or "Babine-Witsuwit'en" with a plain keyboard without copy-pasting in a casual mention), but I wouldn't be too sad if it were deleted because a majority of the community doesn't like it. Thadh (talk) 10:37, 22 June 2021 (UTC)
@Thadh: ‘Not everyone that voted support is in favour of using these templates, they're [] just not against the creation of the templates [] .’: That’s clearly a misrepresentation, just behold the support votes; what you said best describes the abstain votes. The issue with the vote was hardly the text in itself but the creation of new templets. The new templets would be very helpful for {{der}} cleanup: one would not have to see the wikitext or scroll down to the categories to check if {{inh}} or {{bor}} has been used. And the display of the keywords (they are linked to the glossary) is nothing wrong: all kinds of etymologies are equally significant. If you do not like the templets, do not use em, but please allow the templets’ backers to use them. Tolerance of harmless things is the best policy. ·~ dictátor·mundꟾ 11:44, 24 June 2021 (UTC)
You may still find it dangerous (healthwise) to 'clean up' "Inherited from {{der|pi|sa|...}}". There are quite a few words where there has been a morphological change between Sanskrit and Pali, but the relationship is not one of borrowing. --RichardW57m (talk) 15:38, 24 June 2021 (UTC)
@RichardW57: (1), (2). Nothing will be ‘dangerous’ as I am against a bot normalisation. ·~ dictátor·mundꟾ 11:59, 25 June 2021 (UTC)
@RichardW57: Could you explain your example? Is it appropriate that the template and category disagrees with the text? Which should be corrected to which, or how this should otherwise be resolved? Please give an actual example word or two if possible. Brutal Russian (talk) 19:52, 26 June 2021 (UTC)

──────────────────────────────────────────────────────────────────────────────────────────────────── I'm glad someone has noticed the clash between the wording and category. But we were assured above that everyone understands 'inherited'.

  1. The example that set me thinking is the alleged inheritance of Pali candimā (moon) from Sanskrit चन्द्रमस् (candramas). Geiger sees the first components (candi- and candra-) as being different alternatives in the Caland system, so if the Pali descends from the Sanskrit form, there has been a morphological change. Given the characteristics of the Caland system, I'm now inclined to see the first element in Pali as more ancient, with a strong possibility of the Sanskrit form being the later formation, and reject the notion of the Pali form descending from the Sanskrit form, so this word might not actually be an example. If we ignore the discrepancy in the initial forms, we have the issue that apart from the nominative, the singular of the Pali word is now declined from the stem candima - it has been morphologically restructured to include a thematic vowel, rather than being a consonant stem. --RichardW57 (talk) 21:56, 26 June 2021 (UTC)
  2. At first sight, a better example is the Pali doublet sumedha (wise) and sumedhasa, which are now thematic adjectives that Geiger derives from the consonant stem seen in Sanskrit सुमेधस् (sumedhas). The former Pali word is indistinguishable from a bahuvrihi on Pali medhā (wisdom), so is arguably a morphological restructuring, disqualifying the use of {{inh}}.
  3. Another awkward case is the current etymology of Latin nurus - "From {{inh|la|itc-pro|*snuzos}}, from {{inh|la|ine-pro|*snusós}}". I'm not sure that the second 'inh' is valid in this sequence at all. The problem is that the Latin word is 4th declension, not 2nd declension. There's been some morphological restructuring there.
  4. @Brutal Russian: A good example of arguable morphological change from Sanskrit to Pali is Pali ojā f from Sanskrit ओजस् n (ójas). --RichardW57m (talk) 13:09, 28 June 2021 (UTC)

Categorization bot

I would like to create a bot to populate Category:English three-letter words (using the method outlined by User:Thryduulf) and Category:English terms with multiple etymologies (by checking for multiple etymology sections), and any other under-populated categories I find. Are these categories worth populating? Is there a better way to populate them than by using a bot?

I have experience with Python and from what I've seen I think I could make the bot using Pywikibot fairly easily. —TeragR disc./con. 02:01, 19 June 2021 (UTC)

Yeah, this seems like it could be done relatively easily. Though I am no competent expert, I'd say populating the categories could well be worthwhile (there are all kinds of categories out there, after all, many of which have less utility than this proposal), and a bot would probably be the best way to do it. Moreover, this kind of edit should be minor and easy enough to achieve, right? Have you already begun working on it? Support your idea :) Kiril kovachev (talk) 16:42, 20 June 2021 (UTC)
Yes, the edits would be simple and minor. I wanted to ensure there was consensus before writing the code, so I have not started that yet. —TeragR disc./con. 22:30, 20 June 2021 (UTC)
Personally I don't think the categories are necessary, but I don't strongly oppose the idea. In other discussions, users have expressed concern about "category bloat", i.e., scrolling to the bottom of a page and seeing dozens and dozens of categories which make it hard to find what you're looking for. I'm surprised more active editors haven't responded here, but don't be surprised if you face some opposition after you start adding these categories. Ultimateria (talk) 01:25, 23 June 2021 (UTC)
Thank you for mentioning that. IMO the difficulty in finding one category in a large block of categories sounds like a UI issue (though admittedly not one I'm volunteering to try to fix). I don't mind if consensus says the category is unuseful, but I would like to see it deleted in that case. —TeragR disc./con. 04:28, 23 June 2021 (UTC)
I'm surprised that "three-letter words" is manually populated, since it could be trivially added by {{head}} or the like (although I would also be wary of feature creep). —Suzukaze-c (talk) 02:43, 23 June 2021 (UTC)
Good to know! In that case perhaps this is not a job for a bot (but rather for a template). —TeragR disc./con. 04:30, 23 June 2021 (UTC)
@TeragR User:Suzukaze-c is right, there is no point in using a bot to add categories like "N-letter words". Category:English terms with multiple etymologies is harder to do by template and involves an expensive operation (reading the page text), which isn't warranted on all English pages, although I'm not sure we'd want it auto-added by bot to all pages (that would require some consensus). Benwing2 (talk) 04:40, 25 June 2021 (UTC)
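
For reference, the two checks discussed in this thread can be sketched in plain Python operating on a page's wikitext. This is a rough sketch under assumptions, not a working bot: the function names are hypothetical, fetching and saving pages (for example with Pywikibot) is left out entirely, and the header patterns assume the usual "===Etymology===" / "===Etymology 1===" layout inside the ==English== section.

```python
import re


def english_section(wikitext: str) -> str:
    """Return the text of the ==English== level-2 section, or '' if there is none."""
    start = re.search(r"^==English==[ \t]*$", wikitext, flags=re.MULTILINE)
    if start is None:
        return ""
    rest = wikitext[start.end():]
    # The section ends at the next level-2 header (exactly two '=' at line start).
    nxt = re.search(r"^==[^=].*?==[ \t]*$", rest, flags=re.MULTILINE)
    return rest[: nxt.start()] if nxt else rest


def has_multiple_etymologies(wikitext: str) -> bool:
    """True if the English section contains more than one Etymology header."""
    headers = re.findall(
        r"^===Etymology(?: \d+)?===[ \t]*$",
        english_section(wikitext),
        flags=re.MULTILINE,
    )
    return len(headers) > 1


def is_three_letter_word(title: str) -> bool:
    """True if the page title consists of exactly three letters."""
    return len(title) == 3 and title.isalpha()
```

As noted above, the letter count needs only the page title, which is why a template or module can do it cheaply, whereas the etymology check has to read the page text.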

Please provide input here or on Meta and during an upcoming Global Conversation on 26-27 June 2021 about the Movement Charter drafting committee

Hello, I'm one of the Movement Strategy and Governance facilitators working on community engagement for the Movement Charter initiative.

We're inviting input widely from users of many projects about the upcoming formation of the Movement Charter drafting committee. You can provide feedback here, at the central discussion on Meta, at other ongoing local conversations, and during a Global Conversation upcoming on 26 and 27 June 2021.

"The Movement Charter drafting committee is expected to work as a diverse and skilled team of about 15 members for several months. They should receive regular support from experts, regular community reviews, and opportunities for training and an allowance to offset costs. When the draft is completed, the committee will oversee a wide community ratification process." (Creating the drafting committee)

Further details and context about these questions are on Meta along with a recently-updated overview of the Movement Charter initiative. Feel free to ask questions, and add additional sub-sections as needed for other areas of interest about this topic.

If contributors are interested in participating in a call about these topics ahead of the Global Conversation on 26 and 27 June, please let me know. Xeno (WMF) (talk) 16:48, 19 June 2021 (UTC)

The three questions are:

  1. What composition should the committee have in terms of movement roles, gender, regions, affiliations and other diversity factors?
  2. What is the best process to select the committee members to form a competent and diverse team?
  3. How much dedication is it reasonable to expect from committee members, in terms of hours per week and months of work?

Change "Korean Language Family" to "Koreanic Language Family"

As mentioned on the talk page Module talk:families/data a few months ago, in line with the current nomenclature, the canonical name of the language family that includes Korean, Jeju, and extinct languages from the Korean peninsula should be "Koreanic" and not "Korean." Glottolog, Wikipedia, Wikidata, recent literature & research, along with Wiktionary's own entry for Koreanic, list it as the title for the language family; thus, the change should be made officially in Module:families/data. AG202 (talk) 18:04, 19 June 2021 (UTC)

Support --Tibidibi (talk) 13:59, 20 June 2021 (UTC)
Pinging Korean editors. (Notifying TAKASUGI Shinji, Atitarev, HappyMidnight, Tibidibi, B2V22BHARAT, Quadmix77, Kaepoong, Omgtw15): Fenakhay (تكلم معاي · ما ساهمت) 16:06, 20 June 2021 (UTC)
Support, seems like there's no reason not to change this. Kiril kovachev (talk) 16:34, 20 June 2021 (UTC)
Support Omgtw15 (talk) 18:15, 20 June 2021 (UTC)
Support: Koreanic is rare but established in the linguistic circle. — TAKASUGI Shinji (talk) 02:31, 21 June 2021 (UTC)
Pinging @幻光尘 since it was originally their suggestion in Module_talk:families/data that spurred this proposal here. Thank you! AG202 (talk) 07:48, 21 June 2021 (UTC)
Not my area but having just had a little conversation with AG202: can this be implemented without a vote? Equinox 01:27, 7 July 2021 (UTC)

{{slim-wikipedia}}

I propose that only {{swp}} be used in dictionary entries to link to the corresponding Wikipedia page. Some editors do be converting instances of {{wp}} to {{pedia}}; but the problem therewith is that the latter templet is put beneath ===Further reading===: which is but a wrong practice! The heading is meant only for references (non-inline ones), while Wikipedia (or any other sister project) cannot technically be used as a reference. Therefor, {{pedia}} should be deprecated; and so should be {{wp}}, for it has fallen out of favour with many users. Thus, {{swp}} should be the only templet available for linking to encyclopedia articles. ·~ dictátor·mundꟾ 01:54, 20 June 2021 (UTC)

Wiktionary:Entry layout § Further reading (this was voted): “This section may be used to link to external dictionaries and encyclopedias, (for example, Wikipedia, or 1911 Encyclopædia Britannica) which may be available online or in print.” The claim it is wrong is a fallacy as you redefined it (“meant only for references”); rubbish from Inqilabi is not surprising. J3133 (talk) 02:43, 20 June 2021 (UTC)
I would further note that the following paragraph from there:
This section is not meant to prove the validity of what is being stated on the Wiktionary entries (the “References” section serves that purpose).
I have frequently put sources as to the gender of words under "References", though as I have been told (it felt more like an instruction) that 'we do not use inline references', I didn't even struggle to make the source of the gender explicit. It has annoyed me to see References changed to Further reading - I now learn that these changes were actually damage. --RichardW57 (talk) 13:51, 20 June 2021 (UTC)
I have personally never used {{swp}} and always use {{wp}}, and see it being used all the time, so I don't understand what you mean by it being "fallen out of favour". Thadh (talk) 11:17, 20 June 2021 (UTC)
@J3133: By encyclopedia I was referring to Wikipedia; WP cannot be used as a reference in dictionary entries, as it is only a sister project. That’s why we have the templets {{wp}} and {{swp}} to link to the WP page. (By the way, you can make personal attacks at me, but try to spell my name aright.) ·~ dictátor·mundꟾ 14:45, 20 June 2021 (UTC)
@RichardW57: ===Reference=== is used only for inline sources, whilst ===Further reading=== for non-inline sources. Otherwise both are the same. (See this entry for an example of an approximate use of the headings.) ·~ dictátor·mundꟾ 14:45, 20 June 2021 (UTC)
@Thadh: I myself have used* both {{wp}} and {{swp}}, but as I said, it has fallen out of favour with some editors (not I) as you saw in the diff. [* But lately I have been using only the slimmer version because some editors do be substituting {{wp}} with {{pedia}} (which I see as counterproductive).] ·~ dictátor·mundꟾ 14:45, 20 June 2021 (UTC)
“WP cannot be used as a reference”: try to read again, “Further reading” is “not meant to prove the validity of what is being stated on the Wiktionary entries (the “References” section serves that purpose)” (thus “References” and “Further reading” are not “the same”). Also that was a note of you making your own rules (as when you thought you can add new templates and new parameters and insisted no one can delete them, as was proved otherwise, before your ad hominems). “It has fallen out of favour with some editors”: Irrelevant, as {{wp}} is used by more editors, thus the opposite can be argued. J3133 (talk) 14:48, 20 June 2021 (UTC)
Here is point 5 (which passed) of the vote (Wiktionary:Votes/2016-12/"References" and "External sources") that implemented the difference between “References” and “Further reading” (then “External sources”, later renamed) before the naive Inqilabi[sic] makes a fool of himself:
“Allowing the usage of "External sources" only in cases where other dictionaries and encyclopedias (including Wikipedia) are listed as suggestions of places to look, without serving as proof for specific statements in the entry.”
Wikipedia is explicitly mentioned as being allowed. J3133 (talk) 15:07, 20 June 2021 (UTC)
But in reality no one has objected to @Jberkel and others’ continual substitution of {{wp}} with {{pedia}} (I for one definitely think it’s counterproductive). You mostly edit English entries, so you may not be aware of the actual usage of ===References=== & ===Further reading===. And the vote mentions Wikipedia only because {{pedia}} was formerly used beneath ===External sources===; after this heading was renamed to ===Further reading===, the vote was not officially updated, but per our prevalent practice, WP cannot be used beneath ===Further reading===. (Note that our English entries, being the oldest ones here, are the least updated.) ·~ dictátor·mundꟾ 15:18, 20 June 2021 (UTC)
“per our prevalent practice, WP cannot be used beneath ===Further reading===”: You mean your practice, as this is not our practice for most of us. Others do not support this new practice you and presumably some others use (and thus this proposal will likely fail). J3133 (talk) 15:20, 20 June 2021 (UTC)

New iOS app based on Wiktionary - Vedaist

Hello! I wanted to introduce a new iOS English dictionary app that is based on Wiktionary data. Please check http://www.vedaist.com/ if you're interested. I currently show a very minimal meaning of a word. Noun, verb, adjective or adverb sections for English meanings are shown. Over time I'll be adding more features.

I wanted to acknowledge the great work all of you have put into building Wiktionary, and making it possible for me to build on top of your work. If there is any feedback for me, please reach out. Thanks! —⁠This unsigned comment was added by Toucanvs (talk • contribs) at 08:31, 20 June 2021 (UTC).

Shouldn't you acknowledge Wiktionary on the app pages, to conform to the CC BY-SA 3.0 License?  --Lambiam 11:29, 20 June 2021 (UTC)
I have an acknowledgement in the Settings > Dictionary section with license information there. Or did you mean something else, @Lambiam? Toucanvs (talk) 00:18, 22 June 2021 (UTC)
@Lambiam The latest version of the app now credits Wiktionary and has a CC license link on each word page. Toucanvs (talk) 10:59, 28 June 2021 (UTC)
P.S. Thanks for pointing this out, @Lambiam. I think the current version I have meets the CC license requirements. The text is "Dictionary content from Wiktionary under the Creative Commons Attribution-ShareAlike license" Toucanvs (talk) 11:13, 28 June 2021 (UTC)
I think that will do, but is it possible to hyperlink to the precise license, e.g. as follows: "Dictionary content from Wiktionary released under the <a href="https://creativecommons.org/licenses/by-sa/3.0/">Creative Commons Attribution-ShareAlike License</a>"?  --Lambiam 12:30, 28 June 2021 (UTC)
Oh yes. That's been done. Both Wiktionary and the CC license are links. Toucanvs (talk) 12:58, 28 June 2021 (UTC)
I don't have iOS, so I can't give much feedback on that front, but indeed, it would be good to credit the Wiktionary authors for your uses. ^^ I am a big fan of the adless, trackerless paradigm, though, so I hope that's something you'll never change :) Kiril kovachev (talk) 16:31, 20 June 2021 (UTC)
@Kiril kovachev I updated the attribution to be per entry in addition to the settings. See the 3rd screenshot in https://apps.apple.com/us/app/vedaist-english-dictionary/id1572821331.
I would like to add user based features that would require login in the future, but that's different from tracking for ads. Toucanvs (talk) 11:05, 28 June 2021 (UTC)
@Toucanvs Looks good to me, good luck with your development! ^^
@Toucanvs: please add it to Wiktionary:Wiktionary-supported software (don't mind the deletion notice). – Jberkel 06:50, 21 June 2021 (UTC)
done @Jberkel Toucanvs (talk) 00:18, 22 June 2021 (UTC)
General question to all editors: what would your ideal attribution look like? As an example, when exporting a text from Wikisource, the generated document contains a list of contributors ordered by edits. – Jberkel 09:52, 23 June 2021 (UTC)
@Jberkel have a look at my replies on this thread for the changed attribution per entry. Toucanvs (talk) 11:10, 28 June 2021 (UTC)
Yes, but that's the bare minimum, hence my asking what an "ideal" attribution could look like. – Jberkel 13:47, 28 June 2021 (UTC)

Request for bot consensus

Hello, I have recently been in the process of developing a bot for the purposes of auto-generating derived form entries based on Bulgarian noun declension tables. To put it simply: the bot fishes for declension tables across all noun lemmas fed to it, takes note of which derived forms come from which lemmas, and creates definitions based on that data. Once generated, the definitions are either appended to existing entries, as long as they don't already have a Bulgarian section, or a new page is created containing the entry it just generated.
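The workflow described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the bot's actual code: the function names `collect_forms` and `decide_action`, the dictionary standing in for parsed declension tables, and the plain-text check for a `==Bulgarian==` heading are all hypothetical, and the sample data is illustrative only.

```python
from collections import defaultdict


def collect_forms(declensions):
    """Map each inflected form to the (lemma, tag) pairs that produce it.

    `declensions` is a hypothetical stand-in for parsed declension
    tables, shaped as {lemma: {grammatical_tag: form}}.
    """
    forms = defaultdict(list)
    for lemma, table in declensions.items():
        for tag, form in table.items():
            if form != lemma:  # the lemma page itself is not a derived form
                forms[form].append((lemma, tag))
    return dict(forms)


def decide_action(existing_text):
    """Return 'create', 'append' or 'skip' for one target page.

    `existing_text` is the current wikitext of the page, or None when
    the page does not exist yet (an assumption of this sketch).
    """
    if existing_text is None:
        return "create"   # no page yet: make a new one
    if "==Bulgarian==" in existing_text:
        return "skip"     # a Bulgarian section is already there
    return "append"       # page exists for other languages only


# Illustrative sample only; real tables would come from the wiki.
sample = {"кос": {"indefinite singular": "кос", "count form": "коса"}}
for form, sources in collect_forms(sample).items():
    print(form, sources, decide_action(None))
```

Keeping the decision separate from the data gathering means the output for each page can be inspected before anything is saved.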

The edits I have already made under my own account using the bot can be viewed here: diff, diff, diff, diff, diff

(apologies, I don't know how to link these properly)

My GitHub repository is linked here - if you have Python knowledge, please run through if you're interested and see if you can spot any bugs. There are a few other resources in there, such as one to help to understand the program more clearly, and some sample output data to show what the edits the bot's making would look like. There are also instructions as to how to run the bot yourself if you wanna trial it out and look for problems.

I additionally wrote a few paragraphs describing the method on my bot's user page, which, if you have any objections to, please let me know once again. If all goes well, I'll post a vote to get the bot approved sometime soon.

If you have any questions or doubts, please ask me for answers, and I will do my best to respond well. One final thing - understandably, few people on here, if any, will have experience with both the Bulgarian language and Python, so - if no one cares, I'll just apply to votes within a few days. However, if there does happen to be anyone interested who could audit for any mistakes, I would be highly grateful for your help. Thanks for reading!

Kiril kovachev (talk) 16:28, 20 June 2021 (UTC)

@Kiril kovachev: I fixed your links for you. Chuck Entz (talk) 16:49, 20 June 2021 (UTC)
@Chuck Entz Thanks ^^ Kiril kovachev (talk) 17:07, 20 June 2021 (UTC)
I don't like new pages being created when there's no evidence that the word has existed in any language. Some inflected forms are best left hiding in the inflection tables. --RichardW57 (talk) 18:50, 20 June 2021 (UTC)
@RichardW57 Surely there's warrant enough in the fact that those forms theoretically can exist? If the lemmas exist in the first place, then the declension tables only show a specific use case of the word. I can guarantee you that if anyone's using the lemma, then they're surely also using the definite form (effectively the word "the"), or the plural form, or whatever. Admittedly, I have no way of gathering evidence for each form as to whether it 'exists' in the wild or not, but declension tables are created with the fact that not all words have a "vocative" or a "plural" in mind, so it won't be creating any purely theoretical forms. Though it makes no sense for completely ordinary words not to have an attested plural, no? The editors that created those tables exclude forms that are obviously non-existent by using the template.
@Kiril kovachev That should work for words that the authors know from their own experience, but I do note that телефон (telefon) is given as an example of a word without a vocative, only to find that its declension table gives it one.--RichardW57m (talk) 12:35, 23 June 2021 (UTC)
The rationale for having declined form entries at all is that it helps readers of a language discover a lemma without necessarily knowing how words in a language inflect. They can look up an inflected form and wind up on the lemma straight away like that. I felt inspired to make this, by the way, because of Latin declension tables. Check out zodiacus, for example - surely you don't mean to tell me all of these forms have been individually sourced? Kiril kovachev (talk) 16:56, 22 June 2021 (UTC)
A great many Latin word forms actually have quotes. However, I do think that Latin forms are overdone, and to some extent are counter-productive. If one looks up an inflected form, and there is no entry for it, one may yet find it by looking for a word that links to it or even doing a complete text search. However, if languages A and B have it as an inflected form, and language A has had the equivalent of your bot run, but language B has not, one will be directed to the word in language A. This provokes an unproductive race between languages to create entries for their inflected forms. --RichardW57m (talk) 12:35, 23 June 2021 (UTC)
Incidentally, does your bot handle the case of an inflected form being an inflected form of two different Bulgarian lemmas? --RichardW57m (talk) 12:35, 23 June 2021 (UTC)
@RichardW57m Now that you mention it, incidentally not really. As I wrote on the bot's userpage, the bot can trace multiple inflected forms from some given lemma, like if there are multiple senses of the same spelling of a word, but you're quite right in saying it wouldn't account for a totally different lemma declining in the same way as an existing one. It's not beyond implementation, but... would that case come up particularly often? I can't think of any cases off the top of my head where different words happen to decline to the same form like that - though I won't deny that's an oversight on my part. Do you know of any examples?
@Kiril kovachev I don't know Bulgarian, so even a single example is hard work. But, with a little effort, I see that the numerical plural of кос (kos, blackbird) is the same as the singular of коса (kosa, scythe). I noted a similar problem with the perfects of Latin circumsto and circumsisto. The bot creating entries for the perfect tenses didn't handle their sharing the perfect circumsteti. --RichardW57m (talk) 11:10, 24 June 2021 (UTC)
@RichardW57m I suppose this case isn't too much of a problem, since the behaviour of the bot would be to skip that inflection. There may still be problems on that front though, like that some inflections would only be generated for one of two different lemmas, but I believe that would also not be much of a concern. If some are left uncreated, they can be manually added. There is still the problem that they could go unnoticed, I suppose.
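The shared-form case raised above (кос and коса) could be flagged before any edit is made, so that skipped forms do not go unnoticed. What follows is a minimal sketch under stated assumptions, not the bot's actual behaviour: `find_shared_forms` and the dictionary input are hypothetical, and the sample data is illustrative only.

```python
def find_shared_forms(declensions):
    """Return the forms that belong to more than one lemma.

    `declensions` is a hypothetical stand-in for parsed declension
    tables, shaped as {lemma: {grammatical_tag: form}}. A headword
    counts as a form of its own lemma.
    """
    sources = {}
    for lemma, table in declensions.items():
        # Count the headword even if the table omits it.
        sources.setdefault(lemma, set()).add(lemma)
        for form in table.values():
            sources.setdefault(form, set()).add(lemma)
    return {
        form: sorted(lemmas)
        for form, lemmas in sources.items()
        if len(lemmas) > 1
    }


# Illustrative sample only, following the example given in this thread.
sample = {
    "кос": {"indefinite singular": "кос", "count form": "коса"},
    "коса": {"indefinite singular": "коса", "definite singular": "косата"},
}
for form, lemmas in find_shared_forms(sample).items():
    print(form, "is shared by", lemmas)
```

A list like this could be reviewed by hand, so that the affected pages get both definitions instead of only whichever lemma was processed first.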
As for your point on Latin being oversaturated with forms, I understand that is a very sensible concern. Before making this bot, I hadn't considered what effect that would have on users searching for other languages, nor perhaps the bloat it would generate across the wiki. Nevertheless, I find the convenience of being able to locate a term you're looking for directly rather than having to identify words that link to it a great utility. I would argue that the ease of simply entering a term and being directed to an entry linking to the lemma you're seeking is much more utilitarian than needing to scan through the search results or engage a full-text search. Perhaps it's just me, but I find navigating the search results and waiting through dead load times to be the more arduous part of using Wiktionary. Maybe a minor gripe, but this is part of why I feel these entries would help.
Also, though it's correct to say this kind of editing can certainly start an 'arms race', if you will, I wouldn't call it unconstructive, personally: at any rate, it's adding to the dictionary's function at least somewhat, whilst equally not detracting from users-of-other-languages' ability to search in their own. That users may find a different language's entry when searching for their own is unfortunate, but wouldn't that incentivise more of such entries to be created for those languages, too? Rather than a race, wouldn't that make up healthy 'competition' instead? It would be nice to hear more people's opinions from the community about this other than just our own. If it's really the case that people don't want all these pages to be inundated with inflected forms, it isn't a big deal to scrap that idea, but I just thought it would be a constructive direction to go down. I was under the impression that Latin declined form entries were quite helpful.
I do occasionally worry about the sheer flooding of the system. With Pali, we have some significant degree of automated support for 14 writing systems across 10 scripts. A regular masculine noun of the commonest declension has 16 case and number forms, so that's 224 forms. (Inflection isn't yet supported for two of those systems - I want to see evidence for them first, but there are another two writing systems that depend on manual support, and there are several other writing systems or variants that need a small amount of manual support.) Waiting to take off, there's Sanskrit with thirty odd scripts and its large inflection tables, though at present only nominal inflection in Devanagari is productive. (I think it should also support significant sandhi variants.) When checking that Pali inflection tables direct only to Pali entries, I frequently find that the Roman script masculine/neuter genitive/dative singular points to a Finnish inessive. Of course, Swahili verbs do quite well with just a single script. --RichardW57m (talk) 11:10, 24 June 2021 (UTC)
14 writing systems! Out of interest, just what are the 4 not listed on Wiktionary:About Pali? In a similar vein to Pali entry flooding, though, Bulgarian nouns in particular can have at most 9 inflections (with vocative - 7 without), whilst most have between 5 and 7 (masculine 7-9, feminine/neuter 5-7). In comparison, it's not nearly as bad... but for a large number of entries, admittedly still a big footprint.
It lists 10 scripts, not 10 writing systems. The Thai script has two writing systems, one with implicit vowels (thus an abugida in Daniels' terminology) and one where the historical vowels are always explicit (thus an alphabet in Daniels' terminology). Both systems are alphasyllabaries in Bright's terminology. Both of these two writing systems are fully supported. Lao uses three quite different consonant complements - that of Lao, same again with nuktas, and the full set of consonants as extended by the Buddhist Institute. All three are used as alphabets, and the last one is also used as an abugida. Only the two systems using the extended range of consonants are fully automated, but inflection is also automated for the system with nuktas. For the most restricted set, the inflection of masculine and neuter nouns needs manual tuning. There is effectively an eleventh script, the Shan script. (Formally, it is part of the Burmese script.) That comes in two flavours - one with stacked consonant clusters and one where a vowel killer symbol is used. We generate both for lists of alternative spellings, but inflection tables are not yet generated for them.--RichardW57 (talk) 01:06, 27 June 2021 (UTC)
And, insofar as authors' entering the declension tables with maybe-dubious forms... that is another flaw, to be honest. I was planning, however, on using this bot in perhaps a different way than usual, i.e. looking at its output for each entry, at least for several hundred generations, to make sure that it's working as intended, before allowing it to make an edit. That would solve the problem of there being any bugs in the code, as well as correcting any questionable inflections like you mentioned. Maybe this would be more satisfactory? Sorry for the wall. Kiril kovachev (talk) 19:12, 23 June 2021 (UTC)
Well, 'telephone' does have a vocative role in English, as in "Telephone, don't ring. I'm busy". Bulgarian may be different.
Theoretically it can happily exist in Bulgarian too, but one of the only dictionaries online refutes it. Kiril kovachev (talk) 20:29, 24 June 2021 (UTC)
Support. The code looks well documented, but I have too little experience with Pywikibot to meaningfully review it. —TeragR disc./con. 18:42, 22 June 2021 (UTC)

Pacifying User:The Nicodene

quarrel

This started when I fixed the entry capus that was a mix of mainspace Latin and Reconstructed, and that this user had edited under a fundamental misconception of how Wiktionary entries operate. This user chose to conduct a discussion via an edit war and aggressively dismissive comments on their reverts. I started a discussion which I was forced to abandon when their replies turned into accusations, rudeness and rants in reply to what an imaginary me inside their head said to them.

—Then the same repeated for fōrmāticus. This time they supplied the Latin entry with an Old French pronunciation, changed the etymology in contradiction to all the etymological references, and even redirected the attested variant form as if it wasn't attested. They replaced a conjecture that the word represents an ellipsis, supported by numerous references (#1, #2, #3), as if it was pure nonsense. They further replaced "Gaul" with "France" on the grounds that it's "entirely anachronistic" because of a single period gloss. I again took this to a discussion. In it the user shows their lack of understanding of historical linguistics, including but not limited to the difference between date of attestation and date of origination, and conflating attested evidence and conjectured forms as if they had the same strength; the concept of anachronism in adducing a single period gloss in order to condemn modern usage as anachronistic, and in appealing to nothing but the linguistic intuition of medieval Franks to define what is and isn't Latin even when specifically warned against doing so, because they're forced to in order to avoid admitting that they have no familiarity with the modern linguistic side of the question. As well as lack of knowledge of history and of any awareness of the socio-political implications that Francia and other words used to describe these territories are/were fraught with (#1, #2); finally only a complete ignorance of the fact that the word Gallia continued to be used throughout the Middle Ages and even up to today (Gallomania, Gallo-Romance etc.) can explain the categorical statements that it's anachronistic for the period.

—Now I'm perfectly aware how petty and ridiculous these matters are, and it's all the more frustrating to me that something like this regularly results in a shit-storm when interacting with this user. The reason for these squabbles is obvious to me. The user in question is in the habit of making rash edits and statements, and they have an extremely vulnerable ego. This creates an unfortunate feedback loop of constantly being open to criticism that they can't take, and of having to hide their own droppings. For example, here they directly accuse me of ignorance - poisoning the well - for revealing to them information they were ignorant of, and when they realise their folly they try to remove my replies: #1, #2. The same combination of ignorance and lack of foresight can be seen in the amusing quip "Romanistics isn't even the name of the field" - in fact, it is exactly the same thing as with Gallia and cāseus, only in the latter cases the quip is effected by editing the page, and protests are countered with a shit-storm over nothing and an edit war.

This user is a core example of the "argument is war" metaphor run out of control and into pathology. He's reasonably informed, but in his world-view "knowledge is ammunition", and he uses it to demolish the enemy's "arguments are soldiers". This person equates his own ego to the country being defended, and his persona to the commander in chief, and so the very act of challenging him is perceived to be aggression. This is one of those unfortunate cases where the more the person knows, the worse they are to interact with. The ego in question is highly vulnerable and I have good reasons to believe is highly narcissistic - something they themselves realise. It is also highly inflated due to overestimating the degree of their knowledge. The net result of the above is that they see people who possess enough knowledge to question their own conclusions and undermine their imaginary authority as highly threatening and needing to be proactively destroyed, humiliated and otherwise neutralised before they're able to deal damage to the vulnerable ego; and if the ego has suffered for whatever reason, revenge is exacted on the culprit. I've had quite a bit of experience dealing with and circumventing the "argument is war" metaphor, and I have enough of an understanding of the narcissist's mindset to be confident in my observations.

Their impulsiveness is exemplified by their editing habits which are disruptive to the website. In conversation this same emotional instability surfaces in knee-jerk spiteful replies followed by several hours of editing, often so that by the time you reply they'll have changed half of it. Take notice of the timestamps, as well as here from June 2 - this person's brain locks into a 9-hour-long pathological obsessive feedback loop and by the end of it is so fried that they cannot tell whether they're replying to your actual words, or to a strawman figure of you. This obviously makes a conversation difficult if not impossible. Another part of this is replacing offensive words with a set of less offensive expressions that are still clearly fighting words. For example, when you read "mildly absurd", understand "moronic". This is often combined with externalising misattribution: "you vehemently deny" means "I'm furious about the fact that you deny", and "apopletic rage" means they're currently experiencing just that (this expression they changed three times).

They went on to continually fixate on me confusing the names of two languages (Breton and Welsh) with no relevant consequences - they were annoyed that I wouldn't verbally admit that I confused them - and further to attribute to me a ridiculous strawman position (#1, #2) to argue against while completely ignoring my protests that this isn't my position, if not being further incited by them. This was done as a thought-blocker designed to prevent understanding what I'm saying, which was that the word's form presents no evidence as to the time of borrowing. And the nature of their last edits can be perfectly described by the expression "apopletic rage" and an all-out assault on my person, calling me a beginner in Romance linguistics and trying to discourage me from ever disagreeing with or challenging them again, in fact threatening to humiliate me if I do.

Overall, to me they seem to possess enough knowledge to reach correct conclusions, but not to possess enough humility to continually question their intermediate conclusions or even suspend them. Indeed, they possess the opposite trait that encourages them to hang onto their intermediate conclusions with utmost conviction and to dismiss anybody questioning them as both ignorant and an aggressor. Their behaviour of correcting themselves 30 times in a row is a manifestation of the same pattern as that which forces them to impulsively "correct" others - I know because I suffer from a milder variant of the same thought pattern. The difference in the outcome seems to be rooted in my constant self-doubt, factual self-checking and a much higher associated standard of certainty. And of course, my ego can take a far bigger beating.

My only other previous interaction with this user on this website also ended badly, although the start was promising of a fruitful exchange. Unfortunately I quickly understood that their purpose was to disrupt and win points against me. They tried to argue that the existence of one allophone justifies not phonetically transcribing allophony as a whole for a singular case where we have overt descriptions of Latin allophony by native speakers. Towards the end of it they resorted to arguing that the fact that a rare, impressionistic, essentially phonemic and phonetically vacuous IPA symbol [l͈] for the "fortis L" can barely be found in some linguistics handbooks should be interpreted as making it appropriate to use this symbol in an actual phonetic transcription of a dead language which hasn't been shown to possess fortis/lenis consonants even phonemically; as well as to misinterpreting a super-detailed table because its author unfortunately chose to call an 'unspecified for backness' non-velarised L - IPA [l] - "dark", with a palatalised one to the left and two full further degrees of velarisation to the right, the proper IPA [ɫ] being called "darkest". They claimed that the author calling it "dark" overrules the author's actual articulatory description and entire discussion, and the fact the author chose to use a non-IPA diacritic to mark the vowel quality associated with it ([o]) was claimed to be incompatible with simply transcribing it as IPA [l], which stands for the non-velarised (and non-palatalised) lateral approximant, is therefore wrong and therefore this user is right and has won and why won't I admit it already by submitting to their reverts.

This user uses references as binary crutches that either justify or disprove everything or nothing because they lack the tools to comprehend the matter in evidential terms and use the references to update their degree of certainty. From what I've seen they lack proper awareness of the distinction between phonological and narrow phonetic transcription. They tried "disproving" the detailed phonological study (and several specialised references) I provided in support of my transcription by citing a 3-line unsourced note on the matter in Sihler 1995's comparative grammar. This is bad enough, but the main reason I chose not to continue that discussion was because they became visibly pissed off ("brazenly denied", "recant your vehement rants") at their own inability to argue coherently and my ability to do so, and because they had clearly revealed their squabbling winner-loser mentality instead of that of a mutual search for truth, and so any willingness for a civil discussion was clearly gone. I also left their reverts in place.

In a following act of aggression in Module:la-pronunc, they replaced my Campanian reconstruction of a historical pronunciation with their Russell's teapot, which is another variation on what my reconstruction had replaced after several prior discussions. They did this despite being aware of at least some of these discussions and participating in them under the name Excelsius, and despite knowing that I've been editing the module in a different direction. Again, they feel that having condensed a number of works into a Wikipedia article justifies this behaviour - "I have the references, you lose".

I don't object to having a purely phonemic proto-Romance reconstruction (lend a hand if you can), but a phonetic proto-Romance reconstruction makes even less sense than phonetic Old or Middle English. They argue that proto-Romance represents a concrete spoken prestige variety that was common to everywhere; they believe it to have been significantly later than Classical Latin. Yet those reconstructions that use Sardinian evidence to reconstruct proto-Romance, as I understand it, are forced to postulate the break-up of the PRmc. unity no later than the 3rd (preferably 2nd) century CE, which is to say it's post-Augustan Latin, but comparatively reconstructed; and the phonological differences between attested Classical Latin and the reconstructions are of the same nature as syntactic differences (the future tense or the full system of cases cannot be reconstructed). I see no way to postulate a prestige language variety common to the whole of the empire but dating to the 5th century, for example, and nor have I seen it postulated. The loss of Dacia in the 3rd century alone precludes this, with Rumanian being intermediate between Sardinian and the rest in its conservatism. This paradox - that what you get comparatively isn't what you see attested and thoroughly described by language standardizers - has been known for as long as Romanistics has existed but is felt especially sharply today (Wright R. in Clackson J. 2011 p.64). It's understood that comparative evidence needs to be synthesized with attested Latin to arrive at likely reality, and this is what my Campanian transcription aims to do; this user mistakes pure reconstruction for a complete linguistic system that actually existed, a claim that most Romanists don't subscribe to.
In essence it's an offshoot of Vulgar Latin, which was nebulous enough but at least inextricably linked with written attestations; the reconstruction they use is an exercise in purely comparative reconstruction and makes no claim at being a language in the sense of a sociolinguistic system, even less so at having a single supra-regional standard phonology that they're attempting to imbue it with.

In any case I'm sure I could have worked out these differences with a reasonable person - there are many things that could be fruitfully discussed. But this person is not reasonable, and is not interested in working out differences. They're interested in winning and in humiliating me personally. They take rash one-sided initiative in completely undoing other people's work, and they take being challenged on it as personal offense. They will edit-war you in an attempt to defeat you, to have you give up. I'm a relatively chill person averse to conflict, and so you'll notice that in every single one of those instances I simply bail out of the discussion when I see that reason and civility no longer reigns supreme (although sometimes I do loiter too long). But I cannot do this continuously because this user interprets this as a victory and they proceed to do whatever they like. The other options are to continue feeding oxygen to the fire, which is stupid, or to silently edit-war them, which is wrong. Therefore it seems to me the optimal solution is to simply ban them.

To summarize, never before have I been in conflict with anyone at all on this website, and I enjoy it for this reason among others. I don't ever participate in website drama, here or elsewhere. I do seem to be able to magically pop over-inflated egos with a few words, but most of the time I only slightly prick them to achieve silent deflation. This is because I highly value open-mindedness, civility and consensus as conducive to the acquisition of knowledge, something which an inflated narcissistic ego opposes by nature, being a priori right. For a full disclosure, I might have ended up popping this person's ego 3 years ago or so when they stormed into our chatroom fresh from a notorious edgy meme- and teenager-infested place and tried to bring that attitude with them. It might have rendered them permanently jarred ever since: first they've tried trolling me into another conflict (the "go Skype" revenge tactic) before reappearing under several different aliases, all with the same result - roughly how this went can be gleaned from the pronunciation discussion referenced above (the last time I interacted with them anywhere must have been close to a year ago). I say nothing offensive, but merely show them that their pretences at being a Romance language authority are unfounded, and a wiser person would have learned humility from this, but at some level of volatile narcissism this seems to be ruled out. I don't hold grudges and have genuinely attempted to resolve disagreements with this user and show them where their reasoning was flawed. What I got in response was immediate aggression, deeply-rooted bad faith and general verbal military action - fully conscious, as his repeated abusive narcissist's clichés of "you made me do it" and "you started first" testify - until I too dropped any attempts to assume good faith.
At the end of that discussion they've clearly demonstrated that they aren't interested in calm and reasoned discussions, that they don't even consider me worthy of being listened to, and to put it bluntly that they're emotionally disturbed. Personally speaking, engaging with them feels like being attacked by a rabid canine. If I seem like a reasonable person to you, if you don't believe I'm simply delusional, I would ask the admins to consider taking some action to prevent The Nicodene from causing further strife on this website. If I know anything about appropriate human conduct (and Wikipedia Etiquette), I believe their combination of traits results in unique toxicity that is detrimental to this website as well as to the atmosphere on it. Mentioning @-sche, Mahagaja, Imetsia, Metaknowledge as admins I've recently interacted with - hopefully this is appropriate. It's taken me a while to decide to write this because I don't believe in heavy-handed moderation and would hate to be seen as settling a personal matter via that route, but I don't see another choice. Brutal Russian (talk) 17:25, 20 June 2021 (UTC)

I am a user given to writing long comments. I have to try to be more concise. I want to reach out and suggest to you that a comment of this length is not consistent with the mission of a volunteer dictionary website. The very length of the comment is 'detrimental to the atmosphere' of the website. That's my comment and suggestion. Thanks for your time. --Geographyinitiative (talk) 18:29, 20 June 2021 (UTC)
Imagine what it's like being the one to have to reply and address all parts of it. Gets exhausting, frankly. The Nicodene (talk) 18:32, 20 June 2021 (UTC)
Nitpick: Doesn't Daco-Romance come from Moesia? --RichardW57 (talk) 18:40, 20 June 2021 (UTC)
Yes that is one of many issues with the above. It is assumed that Romanian must have originated north of the Danube even though there exists a considerable body of evidence to the contrary. Even if it had, though, that does not mean that linguistic contact between the ancestors of the Romanians and other Latin speakers had to cease the moment that Aurelian abandoned the province: the Danube is not going to prevent ongoing trade, proselytism, etc. The collapse of Roman rule throughout the interior of the Balkans in the early seventh century, in the face of the Slavic-Avar invasion, appears to be the turning point. The Nicodene (talk) 18:53, 20 June 2021 (UTC)
@RichardW57: I'm not informed on that question, although I'm currently reading something that mentions it, and it looks to be a distinct possibility. I'm very cursorily familiar with Rumanian. If I decided to take Geographyinitiative's advice, I might have substituted that whole part with: "Hadrian (76–138 CE) is reported to have had a Spanish accent and Septimius Severus (145–211 CE) a distinct African accent (his sister couldn't even speak Latin so was sent away from Rome as a disgraziata). Clearly no supra-regional prestige proto-Romance phonology gave rise to Ibero-Romance, nor to African Romance. Augustine was mocked for his accent when in Italy, presumably in early 5th century Milan, to no surprise of mine. One possibility is that somewhere between the early 5th century and the break up of the empire, a supra-regional prestige proto-Romance phonology took root across the whole of the empire. The other is that there never existed a supra-regional prestige phonology save for the upper-class Roman speech of the 1st century BCE, aka Classical Latin." Russell's teapot continues to elude us. Brutal Russian (talk) 19:58, 20 June 2021 (UTC)
It is rather that by the fifth century at least two clear regional differences had developed in the pronunciation of Latin, as demonstrated by Adams (namely the extent of betacism and treatment of the original Latin /ĭ/). That much is not controversial.
The phonological system reconstructed by modern scholarship (and labelled as 'Proto-Romance') existed prior to that. The Nicodene (talk) 20:27, 20 June 2021 (UTC)
Septimius Severus lived in the 2nd century and had an African accent. Hadrian had a Spanish accent in the first century even despite being from a patrician Roman family settled in Iberia shortly after Scipio Africanus' conquest of Italica (206 BCE). There already existed a distinct Spanish Latin in the 1st century CE, and nothing seems to show that it ever went away. The sociolinguistic situation in the Roman empire seems to have been highly complex at every single point in time, and this clashes with the notion of a single supra-regional Late Latin/proto-Romance in much the same way it clashes with traditional Vulgar Latin. Brutal Russian (talk) 21:35, 20 June 2021 (UTC)
Variation in regional accents certainly existed at all periods, as in any sufficiently widespread language, including in the Classical period. That does not prevent linguists from reconstructing a relatively 'Late' pronunciation any more than it prevents them from reconstructing a coherent Classical pronunciation, which you have never once questioned. Note that the pronunciation of Rome was prestigious in the Late period, as Adams (2008: chapter III § 1.2) demonstrates, just as it had been in Classical times. The same author also debunks claims of an early distinctive 'Spanish Latin' (pp. 370–431), among numerous other topics relevant to this discussion. The Nicodene (talk) 23:08, 20 June 2021 (UTC)
@The Nicodene: The volatile narcissist is lying as usual. Adams debunks no such thing. There's a whole section with testimonies on "an identifiable Spanish accent" as well as vocabulary. It's only possible to disprove these testimonies by demonstrating that they're not authentic; disproving bronze tables from the 2nd c. CE would prove even more difficult, and the volatile narcissist will not succeed in recruiting Adams as a useful idiot to do their narcissistic bidding. There is no question that there existed a regional Spanish variety of Latin, complete with its own phonology, vocabulary and surely other differences as well that have left few if any attested traces. As the volatile narcissist themselves admit before getting entangled in their own lies, variation certainly existed at all periods. The raving narcissist has not shown me any linguists that claim a universal Late pronunciation. The Classical pronunciation was not universal by any means, but it has a time and place where it existed. If you spoke in the high-class Roman accent anywhere in the Empire, it would have commanded prestige. No such time or place has been demonstrated for the renamed Vulgar Latin that the volatile narcissist is attempting to promulgate. When the volatile narcissist appeals to Classical Latin to justify their folly, they're appealing to a registered satellite to claim the existence of Russell's teapot. The whole idea of proto-Romance is precisely the opposite of a local, let alone prestigious pronunciation.—Adams 2007 Regional Diversification... chapter III is "Explicit evidence for regional variation: the Republic" and has no § 1.2. What book and page? Brutal Russian (talk) 08:23, 23 June 2021 (UTC)
@Brutal Russian As someone who has tried to read through all of the megabytes of text and hundreds of edits in multiple discussions and still doesn't know who's right, I have to say that this is a new low in an already awful exchange. Please stop the gratuitous name-calling- the only damage it does is to your credibility. Chuck Entz (talk) 15:10, 23 June 2021 (UTC)
Once again I will ask you to tone down the edgy insults before you get yourself banned from this website.
All of the references below will be to Adams’ Regional Diversification of Latin.
On pages 429–431, Adams begins by speaking out against the notion of an Oscan-influenced distinctive ‘Spanish Latin’ pronunciation.
He continues by mentioning that there are “no genuine regionalisms” to be found in it other than a few terms related to mining and the single word paramus.
The earlier section about a ‘Spanish accent’ simply recounts some scattered comments by Roman authors: it does not argue, contrary to Adams’ own later conclusions, that there really existed a distinctive native Latin that characterized the Iberian Peninsula and set it against the Latin spoken anywhere else.
This can be seen on page 724, where he says that “the observations [of Roman authors] about the usage of Italians, Gauls, and so on are not to be taken as establishing that there were already standard varieties of Italian, Gallic, and African Latin [and so on], but as reflecting the perceived separate identities, easily defined in geographical terms, of the major regions of the Roman Empire […] it is an absurdity to attempt to trace the origins of the Romance languages back to the date of foundation of the provinces of the Roman world.”
The section containing testimonies from the Empire, both the earlier and later periods, about the prestige of the Roman accent is to be found in chapter four, section 1.2. (I miswrote '3' for '4'.)
In any case I am not arguing for a “universal Late pronunciation”. What I am saying is that what scholars such as the members of the DÉRom team, Thaddeus Ferguson, Robert Hall, etc. have been working on all these years is a reconstruction of a real pronunciation. The Nicodene (talk) 16:04, 23 June 2021 (UTC)
@The Nicodene: You have not cited the book and page that demonstrates "that the pronunciation of Rome was prestigious in the Late period". What is this that you continue referring to as "a real pronunciation"? What time and place is it reconstructed for? How is it that a universal, purely comparative reconstruction divorced from any attested Latin data which doesn't posit a sociolinguistic system, is given a non-universal, sociolinguistically-limited pronunciation? Cite the DÉRom publication that defines and describes it. Cite the DÉRom publication that posits its proto-Romance as an on-the-ground sociolinguistic system. Brutal Russian (talk) 17:41, 23 June 2021 (UTC)
@Brutal Russian: I have cited the book ("All of the references below will be to Adams’ Regional Diversification of Latin") and I have cited the specific section (chapter 4, §1.2). Please read pages 188–202.
From page five of Dworkin's article on the DÉRom: “[Written] Latin and Proto-Romance (in essence, spoken Latin as reconstructed through the comparative method) are in reality two different registers of the same linguistic system.”
The reconstruction is not 'purely comparative', to the point of rejecting attested data; it is actually informed by a thorough consideration of the latter.
Since the reconstruction is, inherently, synchronic, the DÉRom does not commit to dating it to a specific year. Éva Buchi, the head of the project, comments in her article Sept malentendus dans la perception du DÉRom par Alberto Vàrvaro:
"...le DÉRom ne postule nullement que la Sardaigne ait été isolée linguistiquement de manière précoce [...] autant nous n’avons pas d’idée préconçue sur le processus de fragmentation de la Romania, autant nous avons l’espoir que lorsque plusieurs centaines d’articles du DÉRom seront disponibles, il deviendra possible d’en exploiter les résultats dans le but de contribuer – modestement – à l’élucidation de ce processus."
Translation: "... the DÉRom in no way postulates that Sardinia was linguistically isolated from early on [...] Just as we have no preconceived notions about the [chronology of the] process of fragmentation of Romania, we hope that once several hundred DÉRom articles are available, it will become possible to use their results to contribute, modestly, to the elucidation of this process."
Romania (also spelled Romània) refers to the entirety of the Romance-speaking world. Later, in the same article, she comments:
"le DÉRom s’oppose à l’hypothèse du latin vulgaire en tant qu’état de langue indépendant, et c’est bien pour cela qu’il nomme son objet protoroman, signifiant ainsi que c’est par le moyen d’accès à la réalité linguistique qu’il se distingue du latin connu par le corpus littéraire, et non comme un état de langue essentiellement différent."
Translation: "The DÉRom is against the hypothesis of 'Vulgar Latin' as an independent language, and it is for this reason that it names its subject of focus 'Proto-Romance', referring to the means of accessing the linguistic reality beneath the literary corpus of Latin, and not an essentially different language."
The Nicodene (talk) 18:16, 23 June 2021 (UTC)
When you said that ‘there already existed a distinct Spanish Latin in the 1st century CE, and nothing seems to show that it ever went away’, I took it to be a claim that:
1) There existed a Latin dialect encompassing most or all of Iberia which can be shown to have been distinct from the Latin spoken anywhere else.
2) That this dialect evolved linearly from early Roman times into Ibero-Romance.
(I do not know if this is, in fact, what you thought, but this is how I understood it.)
To recapitulate Adams' comments from page 724, quoted above, he warns against using impressionistic (a word he uses elsewhere, repeatedly, in reference to them) Roman-era testimonia regarding local practices as evidence—in the absence of any ‘real’ linguistic evidence—for the existence of a distinct dialect covering an entire region like Gaul or Italy (I take his ‘standard’ to mean ‘relatively homogenous’). In the latter part of the quote he speaks out against notions along the lines of #2, listed above. All that is, at least, how I understand what he is saying.
Nowhere in the book does he mention a phonological (or grammatical or syntactic) feature that can be described as proof of such a distinct dialect covering Iberia. To recapitulate some of the conclusions he makes (pp. 428–431), after examining what evidence there exists, there are ‘no genuine regionalisms’ characterizing the Latin of the peninsula other than a few metallurgical terms or the word paramus. He also casts doubt on theories of a distinct archaic or Oscan-influenced character for the Latin spoken there.
Regarding the unreliability of testimonia (again, in the absence of other evidence) about regional usages or linguistic practices, he comments, on pg. 5:
“There is often a rhetoric to ancient observations, and such evidence cannot be used uncritically. In a recent book on regional variation in contemporary British English based on the BBC’s nationwide Voices survey it is remarked (Elmes 2005: 97–8) that people in the regions today like to claim words as their own regionalisms when in reality such terms may be scattered much more widely, even across the whole country. This is an observation that should be kept in mind as one assesses ancient testimonia. Communications were poor in the ancient world, and there is no necessary reason why someone asserting the regional character of a usage should have had any knowledge of linguistic practices much beyond his own patria.”
———————
I reiterate that I do not mean to make a strawman (in #1 and #2) of what you were saying, if it was not that. I understood it the way I did and responded accordingly.
———————
I say all this not because I want to have a discussion about whether Adams’ views are true or not, but because I am tired of being accused of lying—of even using scholars as ‘useful idiots’ to do my bidding. I have not once done that, and I am sick of the slander. The Nicodene (talk) 18:04, 25 June 2021 (UTC)
I am pinging you here, @Brutal Russian:, because I suspect you have not seen the above comment and will think I have just left this unaddressed. (And then you will, in a week or two, post another 'manifesto' somewhere on Wiktionary claiming that I lied to you about this, etc.) I want to get this over with right now.
To summarize, Adams very much does debunk "claims of an early distinctive Spanish Latin" in the cited section; the claims are those expressed by certain scholars that Spanish Latin was distinctly Oscanized or distinctly archaic relative to the Latin of other regions, and that this carried over into Ibero-Romance. I suspected that you may have had something like that in mind, especially since the latter part of your comment seemed to imply continuity of features into Ibero-Romance.
What you actually had in mind, judging by your virulent reply to that comment, was apparently two things: 1) ancient testimonia, and 2) distinct local vocabulary.
For #1, see the above reply citing Adams' own warnings against the unreliability of vague Roman-era testimonia without corroborating linguistic evidence. For the unreliability of 'Spanish accent' testimonia in particular, note Adams' comment on p. 231 that 'details are never given' in them. People from, say, Corduba in the first century could easily have had some local twang or other, and that could have been dubbed by a speaker from Rome as a 'Spanish accent'–falsely attributing to an entire region a localized feature. The problem is extrapolating from vague Roman-era comments to this effect into an entire distinct Spanish Latin phonological system, as opposed to simply, e.g., some oddity in cadence found in one part of the peninsula. (And the idea that that carried over linearly into Ibero-Romance is an assumption even more fraught with problems.) See Adams' comments, quoted above, that warn against assuming relatively uniform (internally) but still distinct (from those of other regions) dialects for large regions like Gaul or Italy–or, in this case, Spain–on the basis of vague testimonia regarding accents, and see also the comment quoted above where he warns that testimonia can easily attribute to one particular region a feature that is also found outside the region (and, I will add, may not even be found in the entirety of the region mentioned). That is why concrete linguistic evidence is needed, and that is why I cited the section where Adams discusses what evidence there exists (pp. 370–431)–including that from the bronze tablets–and dismisses it all, except for a few mining terms and paramus. So much for #2.
I reiterate that I do not want to discuss right now whether what Adams says is valid. I write all this in response to your claim that I was a liar for citing him.
By the way, I have looked through "Madeline 2016: 197+" and found in it no comment whatsoever about "modern English scholarly usage" of Gaul versus France. I will be nice enough not to reverse your own accusation or conclude that it was out of projection. The Nicodene (talk) 00:15, 27 June 2021 (UTC)
@Brutal Russian This will come as no surprise to you, but I disagree with every last paragraph of your gargantuan rant. I do not, however, want to spend the rest of the day systematically dissecting the entirety of it and citing all the contradictory evidence. I have other things to do in life as well. I suggest you pick a specific section of it, and we can start from there. Once that is done, give it at least a few days, and we can move on to the next part.
By the way, if you want to claim that someone is bullying you, it doesn't help your case when you're the one constantly writing new rants like these in 'public' discussion venues and calling the other person ignorant, a narcissist, etc. and even trying to get them banned. The Nicodene (talk) 18:43, 20 June 2021 (UTC)
@Brutal Russian Here is what I would like to say in response to your 'rabid dog' comparison.
You were toxic to me in the very first discussion we had on this website, without the slightest reason to be, and you have never stopped being so since that moment. (You even continue to insult me on this page.) How is it that you cannot see that you, the one regularly attacking me on new discussion pages and now even attempting to ban me, are the bully?
I barely have the energy to do even a quarter of the work I used to do on this website, thanks entirely to you. You have succeeded in turning my experience here into a living misery. I hate logging on here now, because I have to constantly deal with the nauseating feeling that you are about to launch yet another tirade against me in some new corner of the website. Let me be even more honest (at the very real risk of you using this as ammo to ridicule me): I suspect I am developing a form of PTSD from your constant attacks. If I put on a façade of unaffected bravado, that is entirely for the benefit of the discussion. The Nicodene (talk) 09:57, 21 June 2021 (UTC)
@The Nicodene: I don't know what you're talking about with "have never stopped since that moment". Since that moment - which was a screaming 1.5 YEARS ago - we had had no discussions on the website until you started those edit wars full of open aggression, rudeness, fighting words and outright accusations of ignorance, undermining all my efforts to foolishly convince myself - as I constantly try to do - that whatever it looks like, the other party must have good faith and will take my cues to improve their behaviour, because most people are reasonable and mostly end up being mean without intending to. This works 95% of the time, but with you it fails miserably. You're under a complete misapprehension of how reasonable people work out their differences in general or on this website. They do so by starting discussions, which they don't call tirades. Tirades are what you end up writing when you lock yourself in a 9 hour-long imaginary mental battle with a distorted image of me. You couldn't be farther from the truth in considering that that's what your interlocutors do - this is called externalization or projection (psychology).—You conduct discussions in comments to your edit wars. This is against the website's code of conduct. You become infuriated and use the most dishonest of argument tactics in response to other editors engaging with you. You attack and abuse me and try to make it look like I'm too ignorant to be worth talking to, and you threaten me with further verbal violence if I continue to engage in discussion. This is absolutely despicable behaviour that should not and will not be tolerated on this website. It destroys the atmosphere of help and cooperation for the benefit of all. You reject civil dialogue in favour of outright war. I'm convinced that the reason for your inability to maintain civility for any amount of time is as I describe in the longread.
I think it would be better for everyone if you found a different place to implement the results of your proto-Romance reconstruction efforts, one where you're the only and unquestionable authority. Brutal Russian (talk) 00:37, 22 June 2021 (UTC)
@Brutal Russian: That discussion ended eight months ago, not 'a screaming 1.5 years ago'. Our next contact came on May 15th of this year when you reverted my edits on a page that you had literally never touched up until that moment. What happened when I tried to re-institute my original edits, while providing my reasoning? You accused me (yes, me!) of "inconsiderately reverting [your] edits" on the Beer Parlour. Apparently when you wander onto a page and start reverting people's edits, it's fine, but God help them if they try to reinstate their edit! You even had the audacity to describe this as me "aggressively rever[t]ing an informed user's edits without any attempts to contact them and reach consensus first", as if you had not started exactly that yourself. (Or are you, in your mind, the only "informed user" on this website? I wouldn't be surprised.) At times I am genuinely dumbfounded at your ability to see yourself as a victim even when you are, objectively, the aggressor.
Leaving aside personal matters—which we will quite literally never agree on—we are clearly going to have to come to a sort of modus vivendi. I have already proposed a solution below. To recap, it's this:
If you make a separate sub-module for your experimental 'Campanian Latin', and if you correctly source all of the features that distinguish it from the existing Classical Latin, then you have my word that I will leave it alone.
And yes, you need to make a separate sub-module for it, because at the moment the 'vulgar' transcriptions are used primarily for reconstructed Proto-Romance lemmas. It is quite clear that a reconstructed Proto-Romance pronunciation is what is most appropriate for reconstructed Proto-Romance lemmas, rather than a (so far still incorrect) approximation of Pompeiian speech. If you really are even a third as reasonable as you proclaimed yourself to be in the middle of your latest hate-filled tirade then you will see the fairness in this.
And no, I most certainly do not regard myself as an "undisputed authority on Proto-Romance". If someone came along who was actually informed on the subject and disagreed with me on something, and they actually presented sources supporting their view, I would be happy to discuss the matter with them and potentially learn something new. The Nicodene (talk) 01:22, 22 June 2021 (UTC)
@The Nicodene: Right, so for real: if I never made this edit to capus, would all of this never have happened? Would you not have decided to edit fōrmāticus and the pronunciation module? Would peace and prosperity continue to reign on this website if I never made that fateful edit? Brutal Russian (talk) 15:09, 23 June 2021 (UTC)
@Brutal Russian: no, that is not the case. I already had the intention of editing the pronunciation module long before that (already by October of last year), and I have been working on individual entries related to Latin or Romance Linguistics longer than that. The Nicodene (talk) 16:18, 23 June 2021 (UTC)

Regarding an old discussion about allophones of /ll/

quarrel

Today I will be focusing on one topic out of the lengthy rant that @Brutal Russian has posted above.

It relates to an old discussion we had about the topic mentioned in the title of this post, which was also the first contact that this user and I had ever had on Wiktionary.

As you can see, the discussion was civilized until he said “[you’re] simply muddying the waters while milking me for knowledge/conviction points” and “In fact, let me ask you directly: had you read anything on the topic besides Vox Latina before starting this exchange?”

He insinuated, in that paragraph, that I was ignorant, and that he, the all-knowing one, was annoyed by my having a discussion with him. I had done nothing to deserve this behaviour from him, yet he threw it at me anyway, in the process rendering void any future claims of innocence and victimhood.

This attitude of his might have been, at least, slightly less egregious if he had been right about what he was saying, but in the end he was not.

He appears to challenge the basic fact, which any linguist is aware of, that narrow transcriptions vary in their degree of precision. Moreover, few such transcriptions attempt to exhaustively transcribe all of the allophonic features of a given segment; that is especially so for reconstructions of extinct pronunciations, where doing so becomes exponentially more complicated.

He claims that Classical Latin /l/ before /e/ was “non-velarised”, yet his own source (Sihler) clearly states that it was, just to a lesser degree than before /a/ or back vowels. (Just because a feature is underspecified does not mean that it does not actually exist.) This is supported by the scholars Scen and Weiss, who, like Sihler, point out that /l/ was ‘pinguis’ in the environment in question. The Nicodene (talk) 19:54, 20 June 2021 (UTC)

I said: "When you're suggesting transcriptions devoid of meaning, arguing that we shouldn't aim for precision because some transcriptions are just wrong, to me it seems like simply muddying the waters while milking me for knowledge/conviction points". You're chopping my words to misrepresent my tone and intentions. With the full quote it's obvious that I'm being duly polite and circumspect. I'm politely but firmly letting you know that your behaviour is inconsistent with what I understand to be a bona-fide discussion with the aim to reach consensus, but that you seem to have a pre-conceived agenda to disagree with me. It's silly to pretend now that I wasn't right, and that I didn't burst your pretence bubble with that remark, which made your blood boil. Nothing stings one's ego more than truth, and the sting is felt even by those who lie to themselves about what they're doing.
Your next-to-last paragraph is precisely what I call muddying the waters and milking me for conviction points. There are no facts in that paragraph - there is only whataboutism of a somewhat higher level, such as "nobody has heard recordings of Classical Latin so we don't know cēna wasn't pronounced as in Roman Ecclesiastical - at least have the humility of not claiming that it wasn't!" or "you haven't seen the Earth from space so it's difficult for you to claim it's oval - at least have the humility of not claiming that it's not flat!" - you get the gist. People who use that tactic aren't people I want to have a discussion with. I have provided you with a detailed phonological study right from the very start - Sen 2015. I did not specifically expect you to understand it without prior training in autosegmental phonology - but I did expect you to not argue about something you don't understand, a reasonable expectation I and other reasonable people apply to myself and everyone else.
Sihler can state anything he likes, but his statements cannot possibly disprove the study I cited. His statements don't have magical powers to disprove a phonological paper. Weiss treats the question of its quality before /e/ in a footnote - and that footnote doesn't disprove anything in Sen's 2015 study, the screenshot of which you took yourself. Weiss offers different conclusions that he borrowed from earlier non-phonological studies. Is Sen the same person as your Scen? If so, then one and a half years later you continue to talk about the very study from which you took the screenshot that says what I say it says, and claim that it contradicts what I say it says. This is what made me decide you were one of several things, none of which I wanted to have anything to do with, and I continue to see confirmations for this. Why are you saying that what's written is not what's written? How is one supposed to discuss such intricate matters with a person who is either unable to understand even an illustrated phonological description, or unable to agree even about what the English text says?
If the situation has improved since, and you instead mean Cser, in his 2020 book he writes "it was velarised before consonants, velar vowels and possibly [e]", presumably basing this on the same Sen 2015 which he cites elsewhere albeit not for this statement; and again his is not a phonological study of the Latin /l/ and does not disprove anything. The bottom line is that you're throwing generic references at me that make generic statements based on generic musings instead of providing anything relevant to the specialised phonological study that I cite. You provide no evidence - your own or external - that questions Sen 2015's scholarship or conclusions in any way. You seem to not understand how a scholarly exchange of knowledge works.
This is what I call throwing references at people, muddying the waters, whataboutism, having no knowledge or understanding but still disagreeing. Why? Why, to disagree of course! Once you've started a war, you don't want to just stop the barrage just because your missiles hit a brick wall, right? This is what I mean when I say that you use references as binary crutches regardless of whether or not you understand anything written inside. It's like a soldier not looking inside the rocket in their rocket launcher - as long as it destroys the opponent, it's all good and fair. Fire away and don't worry about it. And in that discussion you continued doing exactly the same with the fortis [l͈]. This is when I concluded I wasn't taking part in a discussion between two intelligent people. As is obvious by the fact that I didn't know who I was conversing with, I did that purely on the merits of your conduct, without any personal preconceptions against you, and the same happened on previous occasions with you using different aliases.
Do you still, despite my explanations on the talk page and in the longread above, not understand that the sound Sen 2015 puts between the palatalised [lʲ] and the velarised [ɫ] is good old IPA [l]? Do you not understand that it's understanding the phonological description that allows one to conclude what the author is talking about, and that once you have the phonological entity, it's irrelevant to you whether he calls that entity "dark", or comes up with adventurous transcriptions like [l°] (which I probably would have used if it was standard IPA)? Brutal Russian (talk) 21:07, 20 June 2021 (UTC)
@Brutal Russian The fact that you interlaced your rudeness and insults with polite-sounding language does not suddenly mean that they did not exist. The fact remains that you were toxic to me without the slightest provocation, and anyone reading this will be able to see that. The fact that you continue to insult me in this most recent comment by calling my ego fragile and calling me ignorant for the umpteenth time ("having no knowledge or understanding") is just further confirmation that you are incorrigibly toxic.
(I do not understand your preoccupation with my 'aliases', by the way. Previously I, like many people, perhaps most people, had different usernames on different websites. Recently I decided to harmonize them all to Nicodene. For Wikipedia that name was already taken, so I had to add 'The' before it. None of this has anything to do with you.)
Back to the matter at hand: if you believe that narrow linguistic transcriptions all must transcribe every single allophonic feature then, I am sorry, but you are simply mistaken about this. Look at any narrow phonetic transcription of any word in any source, particularly ones of dead languages, and you will see that they only transcribe a certain number of features. If they covered everything, they would have to always specify tone (even for non-tonal languages), secondary/tertiary stress, minute degrees of aspiration/palatalization/velarization/retraction/nasalization, etc. The resulting transcriptions would be, to put it simply, absolutely chock-full of diacritics: at least one on every single symbol. Such exhaustive transcriptions are occasionally done, but they are the exception, not the rule.
In point of fact, extremely narrow transcriptions are often given in double square brackets, rather than single ones. Please enlighten me about how this would be possible if all narrow transcriptions must be maximally precise.
According to the ‘super-detailed table’ that you reposted, Sen (2015: 33) is quite clearly and unambiguously saying that /l/ before long or short /e/ was “dark” and “underspecified for back”, just like /l/ was before long or short /a/ or /o/ (which you yourself had transcribed as [ɫ]), just to a lesser degree. Meanwhile you transcribed /l/ before /e/ as a simple [l], which is in direct contradiction to what Sen is saying. Before you attempt to claim that I am misreading the table, refer to the following comment he makes on page 28: "To conclude, /l/ before /e:/ was dark enough to trigger backing to /o/, whereas /l/ before /e/ was darker, triggering backing to /u/", and on page 16 he adds that "Traditional grammars disagree as to which variant appeared before /e/, but colouring indicates that /l/ was relatively dark in this environment.”
Sihler (1995: 174) says that “The distribution was as follows: l exilis was found before the vowels -i- and -ī-, and before another -l-; l pinguis occurred before any other vowel; before any consonant EXCEPT l; and in word-final position.” On page 41, in a parenthetical comment, he specifies that what he means by l pinguis is, in fact, velarized l: "Before a velarized l (that is, l pinguis, 176a)".
Weiss (2009: 82) specifically says “In Latin, l developed two allophones: a non-velar (possibly palatal) allophone called exīlis before i and when geminate […] and a velar allophone called l pinguis elsewhere […] The one slight surprise in this distribution is the fact that l is pinguis even before e, e.g., Herculēs < Hercolēs”.
There is no ambiguity here. The sources directly contradict your transcription [l]. The Nicodene (talk) 22:18, 20 June 2021 (UTC)
To be even more explicit on the last point: [l] specifically refers to clear l, not to dark l. The latter is defined as being, to a greater or lesser degree, +back/'dark'/velarized. This can be seen by looking up the definitions or even just descriptions of clear l and dark l in any scholarly work. The Nicodene (talk) 02:58, 21 June 2021 (UTC)
This, dear readers, is what I'm talking about. 1st degree: "clear" = palatalised IPA [lʲ]. 4th degree: "darkest" = [+back] velarised IPA [ɫ]. There are two degrees in-between, lacking IPA letters. 3rd degree: "darker" - contextually velarized but not [ɫ] - I chose to transcribe that as [ɫ] because [ɫ] subsumes a number of different degrees of velarisation and pharyngealisation. This leaves us with the 2nd degree: "dark", immediately to the right of [lʲ] and two degrees less velarized than [ɫ]. What is that in IPA? Oh, I think I know, it's IPA recant your vehement rant. Brutal Russian (talk) 00:55, 22 June 2021 (UTC)
Let me break this down for you in simple terms, since you still fail to grasp the point.
The sources agree that /l/ before /ē̆/ was dark. Your attempted precise transcription claimed it was clear. You were demonstrably, and unambiguously, wrong here.
And guess what the IPA representation of dark l is? It is [ɫ]. (Please open any linguistics textbook if you disagree with that.) It is unfortunate that there is no official way to represent degrees of 'darkness' in IPA, but c'est la vie. The Nicodene (talk) 01:39, 22 June 2021 (UTC)
It just occurred to me what the source of the problem may be here. You are, somehow (I assume from just glancing at Sen's chart, which admittedly might be misleading to a beginner), under the false impression that dark l and velarized l are two different things. The terms are literally synonyms, and anyone informed on basic linguistics would be aware of that. The Nicodene (talk) 02:08, 22 June 2021 (UTC)
"Dark" and "clear" are not phonological terms or descriptions, they're sounds you make with your mouth - quack-quack. Same sounds can have different meanings for different people, or for one person in different contexts, moods or just times. Sounds do not overrule the results of a phonological study, which is as I described above. "Velarized" is a sound you make with your mouth that underlies a phonetic/phonological description. The relevant phonetic/phonological statement is "/l/ before /e/ was intermediate, darker than the palatalised [lʲ] but two degrees clearer than the velarized (I think more properly called pharyngealised) [ɫ]". Such phonetic/phonological descriptions are commonly expressed in short-hand using the IPA alphabet. This particular phonetic/phonological description is best conveyed by IPA symbol [l], and is poorly conveyed by IPA [ɫ], which is the symbol that best conveys the phonetic entity that has two degrees higher velarisation. It's also poorly conveyed by [lʲ], which refers to the same phonetic entity only characterised by palatalisation.
You're incapable of understanding the phonological study, or what I'm writing, so you resort to substituting the label for the meaning. You're arguing from definitions by substituting sounds you make with your mouth for the understanding of factual phonological statements. You're attaching an all-or-nothing category label to something and proceeding to triumphantly proclaim that it doesn't possess the qualities that your conception of that category excludes. You need to be right and you need me to be wrong. My eyes are currently at the back of my head. This conversation is over. Brutal Russian (talk) 02:15, 22 June 2021 (UTC)
Sorry, but no. Dark (l) is a synonym for 'velarized (l)' and so it is, in fact, a phonological term. The fact remains, despite your attempts to 'muddy the waters', that the sources say that /l/ before /ē̆/ was dark, and that your old transcription claimed it was clear. ([l] cannot stand, in a precise phonetic transcription, like the one you had attempted, for 'somewhat dark/velarized l'. It is specifically clear l, i.e. non-dark/non-velarized.) The Nicodene (talk) 02:28, 22 June 2021 (UTC)
This is precisely what I mean when I say that you have an exasperating habit of never admitting you were wrong about anything. You apparently did not even know that dark l and velarized l are the same thing, and now you're grasping for straws by arguing that the two synonyms 'dark' and 'velarized' have different meanings (one being, according to you, a phonological term and the other not... a completely fictional difference that you just invented), and also claiming that I "didn't understand" Sen's study. I understand it perfectly: he is saying, exactly like Weiss and Sihler, that /l/ before /ē̆/ was dark/velarized. He just provides more detail than the latter two about different degrees of velarization before different vowels. He says absolutely nothing, by the way, to support your claim that [ɫ] can only stand for the most velarized l: the transcription he uses for the latter is in fact [lˠ]. You have completely invented this notion out of thin air.
I have noticed that the moment I catch a serious and unambiguous mistake on your part, which can be fixed by a Google search, you have a habit of suddenly ending the conversation. The Nicodene (talk) 02:41, 22 June 2021 (UTC)
@The Nicodene: Yes, and "darker" is the synonym of "velarizeder" and "darkest" of "velarizedest". But the volatile narcissist cannot give up now. Never give up! Bite the leg! Bite and don't let go even if they wrench away your jaw together with it! Woof! Bark! Aaa-whoooo! Brutal Russian (talk) 08:38, 23 June 2021 (UTC)
@Brutal Russian: If you do not tone it down with the edgy insults, you are going to get yourself banned from Wiktionary with comments like this. This one might actually have crossed the line already. Perhaps I should link it to an administrator?
In any case, yes, dark l and velarized l are very much synonyms. See the below:
From Jakielski et al. 2017: Phonetic Science for Clinical Practice (pp. 49, 199): "Production of a postvocalic l is called dark-l or velarized-l. [...] Hopefully you remember from Chapter 2 that a velarized-l also is called dark-l."
From Burgess 1992: A Mouthful of Air: Language, Languages-- Especially English (p. 80): "dark l is velarized".
From Robinett et al. 1983: Second Language Learning: Contrastive Analysis, Error Analysis, and Related Aspects (p. 48): "The voiced alveolar velarized lateral consonant [ɫ] and the voiced mid central retroflex vowel [ɚ] are phones which occur in English but not in Greek. The velarized or "dark l" is allophonic, while the retroflex vowel is phonemic in English."
From Rubach 1982: Analysis of Phonological Structures (p. 31): "In British English there are two further variants of /l/ : the so-called 'clear l' and the 'dark l' (velarized)".
From Sawaie 1994: Linguistic Variation and Speakers' Attitudes: A Sociolinguistic Study of Some Arabic Dialects (p. 39): "While Transjordanian dialects tend to use velarized L (i.e. "dark" l) [...]"
From Proceedings of the Second International Hindukush Cultural Conference (p. 168) "...except in the case of < L > , which represents the velarized or 'dark l' sound of Khowar."
From Jannedy et al. 1994: Language Files: Materials for an Introduction to Language & Linguistics (p. 61): "The l in bowl is velarized (or "dark") while the [l] of lobe is "clear"."
From Benware 1986: Phonetics and Phonology of Modern German (p. 29): "...results in a so-called 'dark l' or velarized l which is foreign to standard German, but common in English."
From Jones & Laver 1973: Phonetics in Linguistics: A Book of Readings (p. 175): "Those who know that language will remember that when l terminates a word it has a 'dark' or velarized value."
From Dickey 1997: The Phonology of Liquids (p. 49): "In a 'dark' or velarized [ɫ], the tongue body lowering extremum occurs before tongue tip extremum."
From MacKay 1978: Introducing Practical Phonetics (p. 180): "[ɫ] 'dark' (velarized) /l/".
From Wayland 2018: Phonetics: A Practical Introduction (p. 85): "Velarization is the addition of tongue back raising toward the velum. When preceded by a vowel, the English 'l' sound is often produced with this secondary gesture, and is referred to as a 'dark l'. [...] The 'dark l' also occurs when it precedes another consonant, as in field [fiɫd], film [fɪɫm], false [fɔɫs], etc. Velarization is represented by the diacritic [~] through the symbol for [l]."
(All of that, though, is technically beside the point. Sen indicates that /l/ before long or short /e/ was dark, and dark l is transcribed in IPA as [ɫ], which is exactly what I have done. [ɫ] is used in Linguistics for 'dark l'; it is not restricted to 'maximally dark l'.)
Sen is using the transcriptions [lo] and [lu] to indicate different levels of dark resonance. In a comment on page 23, he specifically equates dark resonance with velarization: "As dark resonance (velarization in articulatory terms) is correlated with backness in vowels..."
In other words, Sen's ad-hoc transcription [lo] does, in fact, represent (a specific grade of) [ɫ]. It is labelled as having dark resonance, which, according to Sen himself, means it is velarized. That is also consistent with what Weiss and Sihler say in the quotes provided earlier. The Nicodene (talk) 22:05, 23 June 2021 (UTC)
Hi again @Brutal Russian:. I decided to e-mail Sen about this and he just got back to me.
In his response he confirms what I said: the allophones he refers to as 'dark' and 'darker' are, in fact, velarized allophones of /l/. The difference between the two, and also between them and the 'darkest' allophone is simply one of degree (of velarization).
If you wish to contact him yourself, his e-mail can be seen in this screenshot or found on this page.
Now, keeping in mind that I was right about this the whole time, I want you to re-read the following comment you made to me:
"Yes, and 'darker' is the synonym of 'velarizeder' and 'darkest' of 'velarizedest'. But the volatile narcissist cannot give up now. Never give up! Bite the leg! Bite and don't let go even if they wrench away your jaw together with it! Woof! Bark! Aaa-whoooo!"
I have nothing more to say to you. The Nicodene (talk) 09:43, 24 June 2021 (UTC)
@The Nicodene: Whoo-hoo, this might be the first time The N's thought process coincided with mine. Luckily I postponed writing my own letter, otherwise this would have been awkward, and the author would have understood that their replies are being used to settle an argument. So I have to ask that The N sends my questions to the author in a follow-up letter as if they're their own, asking
  1. whether their analysis only applies to Archaic Latin or also the time of Pliny;
  2. whether they think that Pliny distinguished his three types of L on phonological intuition according to feature specification, and if so why did he distinguish the exilis L before /i/ when it's also underspecified, and only contextually palatalised - in this case how would they transcribe the underspecified allophones in standard IPA?
  3. or if Pliny distinguished it on perceptual basis, and in this case what would be the optimal way to capture the contrast between all three in standard IPA? Would it make sense to transcribe both the [+back] and the contextually velarized allophones with the same [lˠ] or [ɫ] - if so why would Pliny perceptually distinguish what even the IPA doesn't? Or would it be more appropriate to transcribe some or all of the underspecified allophones with IPA [l]? In other words, how can Pliny's description - and their study's conclusions - be best translated into IPA in order to be reproduced by other people?
Again, it would be highly desirable that the author doesn't learn that stakes are being placed on their answer, both for moral reasons and in order for the answer to be unvitiated and unbiased. Brutal Russian (talk) 23:17, 24 June 2021 (UTC)
Your message should have begun with "Alright, I was wrong about this, and I apologize for ridiculing you". You have his e-mail, so contact him yourself. Pretend this was a civilized conversation, leaving out the fact that you called me a dog. The Nicodene (talk) 03:13, 25 June 2021 (UTC)

Regarding formaticus

quarrel

I did say that I wanted to leave 'at least a few days' between such discussions, but I will contradict this, for the moment.

The user @Brutal Russian posted a rant above in which he alludes to a discussion we recently had about the word formaticus.

In his recent comments, as well as his older ones, he repeatedly claims that using the term France for the late eighth century is against ‘scientific usage’. This claim is simply not true: modern scholarship prefers the term France, as can be seen by the fact that Google results for "Carolingian France" outnumber those for "Carolingian Gaul" by a ratio of over six to one, as of right now (searching both terms with the surrounding quotation marks). I did, perhaps, overstate the case, but the basic point I was making remains true. Nevertheless I decided to go with the phrasing "what is now France" in the end in order to avoid a continuation of this frankly pointless argument over labels. Apparently, though, he cannot bring himself to let go of it.

He insults me, in the rant above, by saying that I show a ‘complete ignorance’ (I have lost track of the number of times he has called me 'ignorant') of the fact that the Latin word Gallia never truly died out, even in the early Middle Ages, in the most polished, classicizing varieties of Latin, and that modern linguists use such terms as Gallo-Romance to describe the language family that includes modern French, Occitan, and Arpitan. I am well aware of both facts: neither contradicts the fact that modern historians prefer to speak of Carolingian France. (Not to mention, of course, contemporaries.)

He mentions that I removed the caseus formaticus conjecture. That is true, I did, but I eventually came around to including it, this time clearly wording it as a conjecture rather than an indisputable fact, since caseus formaticus is not actually attested, nor is the fact that formaticus shows up as a masculine proof of such an ellipsis, cf. the dozens of Merovingian and Carolingian-era coinages ending in -aticus, many examples of which I provide in the discussion, none of which are the products of an attested ellipsis.

Speaking of the latter, he claims that I "conflat[ed] attested evidence and conjectured forms". That is not true: he simply misread what I really said. I quite clearly described coraticus, etc. as the etyma of various Old French words in the original comment. The fact that he apparently understood etyma to mean 'attested written forms in Latin' is his own fault, not mine. I did, by the way, provide a total of nineteen unambiguous attestations of such masculine forms (with -aticus/os/i), and I linked a page on which Du Cange lists several more. I could, if needed, expand the total to, say, sixty with a few more hours of research, and perhaps a hundred given an entire day to do it, but this would be a waste of time on my part: no amount of evidence would ever lead 'brutal russian' to concede the point. Nineteen sourced attestations, plus a link to about a dozen additional attestations by Du Cange, is plenty for any remotely reasonable person.

In response to my pointing out that nobody prior to the ninth century distinguished Latin and Romance as separate languages, he says that I am “appealing to nothing but the linguistic intuition of medieval Franks to define what is and isn't Latin”. This is not the case: it was, as I pointed out in the discussion, not just ‘medieval Franks’: nobody, no matter whether they were a native speaker or not, no matter which century they lived in (as long as it was prior to the ninth) ever referred to Latin and Romance as separate languages. As elsewhere, 'brutal russian' is twisting my words here.

He implies that I “have no familiarity with the modern linguistic side of the question”. This is just more toxic slander on his part: I am perfectly aware of the numerous opinions and arguments both in favour of the old diglossic model (which is very much becoming a minority view in the latest scholarship) as well as arguments against it.

He says that he revealed “information [that I was] ignorant of” in reference to his claim that spoken Latin lacked a nominative case in inanimate nouns. The source he cited, Ledgeway, says absolutely nothing of the sort, nor does any other source, and furthermore the fact that Old French and Old Occitan both had nominative forms of inanimate nouns is in direct contradiction to that claim.

(Here 'brutal russian' thinks that I intentionally deleted his comment. I did not: I simply replied to it, then sighed at the thought of developing yet another branch of an already massive and over-complicated discussion, and decided to delete my own comment. It seems I deleted his comment together with my reply. That was not intentional on my part whatsoever. Nor is it an example of him actually being right about the topic that was being discussed: as I have explained here, and as I explained in more detail on the discussion page for formaticus, he has completely misread Ledgeway, who in no way makes the unusual claim that 'brutal russian' somehow divined from his work.)

As for ‘Romanistics’ it is, at best, an extremely obscure term. Moreover, even when it is used it seems to often refer to the study of Roman culture, law, or other such topics. In the work titled Lesser-used Languages and Romance Linguistics (which is, incidentally, the very first result on Google Books when one searches ‘romanistics’) it is quite plainly stated that ‘the word Romanistics is regrettably not current in English’. Many, perhaps nearly all, of the ‘linguistic’ results seem to just be used in reference to German or Iberian scholars as a literal translation of terms found in their native languages. Neither Merriam-Webster nor the Oxford English Dictionary, moreover, even mention that the word exists in English. If one searches for “romanistics” on Google (with the quotation marks) one is met with fewer than six thousand results, versus two hundred twenty thousand for “Romance Linguistics”, a ratio of more than thirty-six to one. The field is most certainly not called ‘Romanistics’ with any frequency at all by British or American scholars; the term does not appear in any standard reference work written by them, such as the Oxford Guide to the Romance Languages or either volume of the Cambridge History of the Romance Languages.

‘Brutal russian’ accuses me of narcissism, which is an odd claim. I have no problem with admitting when someone else is right about something, and have done so, whenever it was actually the case, in my discussions with him. By contrast he refuses to concede any point, no matter how many sources are arrayed against him. I have never once heard him say anything to the effect of ‘Alright, that was true. I was mistaken about that.’ Refusal to admit any wrong is a major highlight of narcissism; so, incidentally, is wanting attention and pinging half of Wiktionary to witness your attempted ‘epic beatdown’ of somebody. So is being rude and toxic without the slightest provocation, as he was to me the very first time we ran into each other on Wiktionary.

One valid criticism he has made in this latest discussion (note how I, unlike him, admit when what the other person says is actually true) is that I have a habit of hitting ‘submit’ too early and then seeing numerous things that I want to revise in my comment. I will make an attempt to do that less.

'Brutal russian' claims that I am making a strawman when I point out that he believes formaticus existed in the Classical period. I will concede that it is true that he never explicitly said it did. If he truly never believed that it did (I continue to have my doubts), then good: I no longer have to spend energy arguing for the obvious.

I did point out, multiple times, and citing several elementary mistakes he has made, that he is a beginner in Romance Linguistics. It is strange to me that he would complain about this for two reasons:

1) That is objectively true. Nobody informed about Romance Linguistics would fail to know where the rusticam romanam linguam quote is from, nor would they claim that inanimates lacked a nominative in the sixth century, nor would they claim that a complete merger of /b/ and /w/ took place in Pompeii, and so on.

2) The comment in question came a full two days after he claimed that I was ignorant of Latin, during a bizarre rant about Nazis, “peeing on scooters”, and “meme-shitters”. (Yes, he really said all that. In the Tea Room, no less.) The quote in question was “[you can’t] assert your authority in matters of a language you can't even speak”, although even prior to this he had repeatedly insinuated that I do not even know Latin. Needless to say, I can speak Latin: I have studied it for over fifteen years.

Relative validity of the comments aside, one should not complain when they slander someone and the other later returns precisely the same favour. The Nicodene (talk) 02:25, 21 June 2021 (UTC)

@Brutal Russian: When I read over the FEW's 'gallischen latein' I took that to mean what is called Proto-Gallo-Romance today. If the FEW does have a separate term for Proto-Gallo-Romance (not just Gallo-Romance, which refers to later stages), then let me know.
Elcock's work is not cited on the Wiktionary entry, so I did not have that quote to consider.
I should say, though, that I am no longer opposed to saying the term Gaul, and have not been for a while now.
All I am contesting as of late is the notion that saying France is either 'absurd' or 'scientifically inaccurate'.
——————————

In a now-deleted comment you cite "Madeline 2016: 197+" as describing "the medieval Latin or modern English scholarly usage of these terms". I assume you mean this:

"Before moving further, it is necessary to consider the issue of nomenclature. Of course today’s France is not the French kingdom of the Middle Ages, and the term “Francia” does not in every time period refer to the kingdom, which was still in the process of creating a territory and an identity at the end of the Middle Ages. Depending upon the time period, the author, and the genre, the use of “Francia” or “Gallia” has implications that I do not propose to study in detail here. Yet, at the end of the Middle Ages, two tendencies can be observed: the term “Gallia” is very often used in texts of descriptive geography, while in historic texts “Francia” is more often employed. It is therefore hardly possible to trace a simple straight line which would see “Gallia” erased in favor of “Francia,” also because Humanism gave a new impetus to “Gallia” as suitable for designating both antique Gaul as well as new “Gallia,” even in the historic texts."
This quote discusses the usages of medieval, or pre-modern, scholars writing in Latin. It does not mention the usages of modern scholars writing in English.
(If you had some other quote in mind, provide it.)
——————————
I should mention here that the term France is also used in modern English-speaking scholarship for the Merovingian period (c. 450–751). Several publications use the phrase "Merovingian France", and search results indicate that it actually occurs more frequently than "Merovingian Gaul" (by ~10%).
That is simply to say, again, that it is neither 'absurd' nor 'scientifically incorrect' to use France.
——————————
I would also like to reiterate the fact that I, nearly two weeks ago, replaced the phrase with 'what is now France', purely to avoid a resurfacing of this controversy. The Nicodene (talk) 18:43, 25 June 2021 (UTC)

Regarding 'Campanian Latin'

quarrel

Now I would like to address the matter of ‘Campanian Latin’.

Ah, where to begin.

The first draft ‘brutal russian’ made of this ‘Campanian Latin’ (which is apparently meant to reflect the speech of Pompeii, so up to the year 79 A.D.) included such adventurous features as a complete, universal lowering of Latin short /i/ to [e], a complete merger of /w/ and /b/ in all positions into the voiced bilabial fricative, and a nasalization of every vowel before a nasal consonant in syllable coda position (not just word-finally).

The only source he provided for these, or any other feature since (that was not already found in the Classical module) was a single citation of Adams, regarding the single matter of Latin short /i/, in which he apparently missed the point that the phenomenon in question only occurred with any regularity in final syllables, generally in verbs. I think this is about the fourth or fifth time I am explaining this (see Adams 2013: 58-61), but I don’t see any sign that he has actually understood it. In his most recent edit to the pronunciation model, he attempted to resurrect the feature exactly as he had tried to implement it originally.

If he really wants this ‘Campanian Latin’ to happen, he needs to provide sources that actually support all of the non-CL features that he wants to assign to it. So far he has not done so, apart from the one mistaken example I have mentioned.

Not only that, but in this latest rant of his (which can be seen above) he even ridicules the idea of backing up one's arguments with sources. He apparently thinks the fact that every single feature that I have added to my Late/‘Vulgar’ transcriptions is thoroughly cited, according to two or more reliable sources (all the features in question are sourced here and here) is some sort of childish debate tactic rather than a legitimate argument in favour of the transcription.

Moreover, he has not explained why his 'Campanian Latin' is in any way appropriate for reconstructed Proto-Romance lemmas on Wiktionary, which is what the 'vulgar' module is used for. Needless to say, a reconstructed Proto-Romance pronunciation is what is most appropriate for reconstructed Proto-Romance words. The Nicodene (talk) 06:42, 21 June 2021 (UTC)

In fact, let me propose a solution for this, @Brutal Russian.
If you make a separate sub-module for your experiment (leaving alone the 'vulgar' one that is primarily used for reconstructed Proto-Romance lemmas), and if you correctly* cite all the features you want to put in your ‘Campanian Latin’, I promise to never touch it.
*Not like that mis-citation of Adams (2013). The sources need to actually support what you’re saying. The Nicodene (talk) 07:50, 21 June 2021 (UTC)
You're proposing this to me? This was what I immediately proposed to you. We will do this when we figure out a way to add a new transcription. You rejected my proposal by continuing your one-sided military crusade. Then you continued to abuse and insult me in hour-long fits of rage and told me you would rather eat glass than talk to me and called me an ignorant beginner and promised me to abuse me more if I continued trying to do what normal people do, that is talk. We will not lose sight of the fact that I shouldn't be talking to you at all. You will not bully your way around here by edit warring until the other party gives up, and will not appropriate any transcriptions. Everyone will be able to discuss and improve any transcription on this website.
My transcription was not designed for reconstructed pages, but it might as well have appeared there alongside Classical. There is no place for a phonetic transcription for proto-Romance because proto-Romance is not a sociolinguistic variety. It's not a language anyone ever spoke and it cannot have an associated phonology. If you wish to postulate a specific Late Latin pronunciation for a given time and place like I'm doing with Campanian, I'm open to it. It will borrow the phonetic discussions of proto-Romance reconstructions just like Classical and Campanian do.
The Campanian [e] was reflecting the phonology of Oscan. There occurred a synchronic vowel reshuffling: /ae/ in this variety was the length pair to /e/, both spelled AE or E and with the value of [ɛ(:)]. Correspondingly, /i/ started pairing with /ē/ and both could be spelled E or I, with the value of [e(:)]. This exactly parallels the vowel system of Oscan, where /ē~e/ was spelled Í and is evidenced by Pompeiian spellings like veces = vicēs, menus = minus. The final syllable phenomenon is not specific to Pompeii, and consists in a complete merger of /ē, e, i/ in the final syllable. It's present in Sardinian in the verbal inflection (cantades = cantātis) but seemingly not in nouns (sitis = sitis) - can't think of other examples. Oscan had a separate short /ɛ/ that could contrast with /e/, so in Pompeii no merger between them occurred. The merger in the verbal declension probably originates in Roman Latin from the shortening of the verbal -ēt, which coincided in quality with -it because it postdated the raising of /ē/ from Plautine [ɛ:]. Non-urban Latin had no such raising, and might not have had the shortening either. The speech of Pompeii imported this from Roman Latin. Brutal Russian (talk) 01:40, 22 June 2021 (UTC)
Again (I am tired of explaining this), Adams exhaustively analyzes the data not only from Pompeii but also a variety of other sites and concludes (2013: 60) that the phenomenon you describe was limited to final unstressed syllables, isolated spellings like ⟨ueces⟩ notwithstanding. Clackson, on page 7 of the Oxford Guide to the Romance Languages, comments: "there are hardly any good examples of mis-spellings of the expected type within accented syllables".
If you have a (purely personal) problem with the notion of reconstructing allophonic features of Proto-Romance, please take it up with the entire DÉRom team, not to mention the myriad other scholars whose works I have cited on the relevant Wiki pages. The Nicodene (talk) 08:03, 22 June 2021 (UTC)

Marriage counseling

Can we set aside the equivalent of an (alcohol-free) marriage counseling facility?  --Lambiam 11:54, 21 June 2021 (UTC)

I have put the discussions in collapsing boxes. 212.224.224.150 15:36, 21 June 2021 (UTC)

Fandom-made terms

As revealed on current RFVs for FemShep, the current wording of WT:FICTION only covers terms originating in the universe. I propose it be expanded with something like "Terms originating in, or which refer to specific entities within, fictional universes...". Examples of affected entries:

  1. FemShep, BroShep
  2. Supes, Spidey
  3. Ship names

As I've said on the RFV for Charizard, I feel that the purpose of the fiction clause is to avoid terms that only make sense in the context of a work of fiction. To be included, it needs to somehow transcend that setting, which in practice means acquiring a more generic meaning than it had on inception. Much like BRAND and genericized brand names, really. In fact, with modern copyright law, a work of fiction is a brand, now that I think about it. Gamren (talk) 23:41, 21 June 2021 (UTC)

  Support. surjection ⟨??⟩ 17:22, 23 June 2021 (UTC)
Tentatively   Support; I think the proposed wording is good, though hopefully with more input we can be more certain this doesn't accidentally cover anything we don't want it to. All the things I can think of where I'm not sure whether they should actually be excluded are things which are not affected by the change; for example, Whovian was coined outside but also refers to something outside the specific Doctor Who universe (namely, a real-world fan of that universe), so it's unaffected by either the current or the proposed rule (👍), and a transmat#Noun can transmat#Verb things in 3+ unrelated fictional universes (the cites for the verb are Star Trek, Doctor Who and Supergirl), so AFAICT it's also unaffected...? It seems like this change would also affirm our previous deletion of Talk:MissingNo.. - -sche (discuss) 23:54, 23 June 2021 (UTC)
Vote is up. Gamren (talk) 15:39, 30 June 2021 (UTC)

DerbethBot reboot

What's the current state of DerbethBot and audio? LinguaLibre data is currently not used, and as someone has just remarked, it's tedious to add audio files manually. Metaknowledge (talkcontribs) has started User:Metaknowledge/audiowhitelist, but this is not used yet? Reading some recent discussions with Derbeth (talkcontribs), they seem to be reluctant to make modifications to the bot. Perhaps we should consider a fork of it? – Jberkel 13:55, 23 June 2021 (UTC)

I thought the plan was that Derbeth would use the whitelist, but I haven't checked in for a while. A fork might be a good idea, but whatever we do, I think we need to use whitelisting instead of blacklisting to avoid the crap that has accumulated unnoticed in the past. —Μετάknowledgediscuss/deeds 16:30, 23 June 2021 (UTC)
I had some doubts about the whitelist, mainly because some people act as native speakers of multiple languages. I know it's possible if you are raised bilingual, but what about the person speaking Delhi dialect of Hindi and US dialect of English? Are both dialects spoken on the native level? I am not able to judge that, I'm not a native speaker of either. I haven't spoken about my doubts loudly, sorry for this. I became more engaged in linking CJK strokes and feature requests from de.wiktionary. I can apply the whitelist for the most obvious cases (one person-one language) and we can discuss the harder cases. --Derbeth talk 16:54, 23 June 2021 (UTC)
Ideally the bot would read the whitelist configuration from the page, and assume it's correct, otherwise it'll be a lot of back-and-forth. – Jberkel 17:18, 23 June 2021 (UTC)
@Derbeth: Yes, that person is native in both. Why are you troubled by native bilingualism? That whitelist was composed carefully, so you don't need to second-guess it. —Μετάknowledgediscuss/deeds 17:29, 23 June 2021 (UTC)
@Derbeth I am in agreement with the idea of a whitelist. I could potentially fork the code and add a whitelist although it might be easier for me to rewrite it in Python using pywikibot as I haven't done much Perl hacking in a long while. Benwing2 (talk) 03:01, 24 June 2021 (UTC)
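To make the whitelisting idea discussed above concrete, here is a minimal Python sketch. It is not DerbethBot's code and does not use pywikibot; the `Recording` type, the `filter_recordings` function, and the shape of the whitelist (a mapping from uploader name to the set of language codes they are trusted for) are all assumptions made for illustration. The real format of User:Metaknowledge/audiowhitelist is not shown in this thread.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, List, Set


@dataclass(frozen=True)
class Recording:
    """One audio file candidate (hypothetical structure)."""
    uploader: str
    language: str
    filename: str


def filter_recordings(
    recordings: Iterable[Recording],
    whitelist: Dict[str, Set[str]],
) -> List[Recording]:
    """Keep only recordings whose uploader is whitelisted for that language.

    Whitelisting means anything not explicitly listed is dropped, which is
    the opposite default from a blacklist.
    """
    return [
        rec
        for rec in recordings
        if rec.language in whitelist.get(rec.uploader, set())
    ]
```

Because an uploader maps to a set of languages, a natively bilingual speaker is handled by listing both codes under one name, while an unknown uploader is skipped without needing any entry at all.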

Ok, I implemented support for all whitelist entries. Please take a look at Occitan changes, as Occitan has particularly chaotic organization of files on Commons. I hope I matched dialects correctly. --Derbeth talk 06:50, 26 June 2021 (UTC)

Thanks! I checked some Catalan and French entries, all good. – Jberkel 15:07, 28 June 2021 (UTC)

Blacklist

Users whose contributions should not be added.

Username | Language | Location (if relevant) | Reason | Example
Jjackoti | FR | France | Speech defect: lisp | seconde, /sə.ɡɔ̃d/

—⁠This unsigned comment was added by 212.224.226.27 (talk) at 17:24, 23 June 2021 (UTC).

Apostropheless older English genitives like Godes, mans

Should we have sense lines (and, in the event we wouldn't otherwise have an entry, entire entries) at pages like Godes, mans, kings/kinges, etc for their use as genitives/possessives, especially in early modern English works before the use of the apostrophe was standardized? (E.g. "by godes grace, I meane not to infringe his directions, for any mans pleasure" in a 1612 text at Google Books.) By vote, we don't include God's, man's, king's, etc as genitives because they're transparently separable into God+-'s, etc; OTOH, we include gods and kings as plurals although they're fairly transparent. So which side of the line is mans on? (As an interesting case, I can find at least one citation of the Latinate genitive Jesu for Jesus; that kind of thing, irregular genitives à la irregular plurals, I certainly would be inclined to include if I could only find enough citations.) - -sche (discuss) 15:47, 23 June 2021 (UTC)

@-sche: I believe we already discussed this. J3133 (talk) 17:26, 23 June 2021 (UTC)
Thanks for that link! I was trying to find that thread before I posted this, because I recalled discussing this briefly with someone. Since it really was "we" who discussed it (the two of us, and one comment from a single other user), I hope this thread attracts more input. As I said in that thread, AFAICT and AFAIK we don't currently have entries/sense-lines for genitives of this type, but... ever since you brought it up, I've been thinking there should be some kind of community consensus (ideally consisting of more than three users) on whether we should or not... - -sche (discuss) 23:18, 23 June 2021 (UTC)
@-sche: I have mentioned contractions below; e.g., mans (“man is”). Should we also include them? J3133 (talk) 11:31, 26 June 2021 (UTC)
Seems reasonable to me to add such genitives. As @Vox Sciurorum pointed out on the previous discussion, this is already a regular feature of other languages, even if their genitives are transparent, and I see no harm in adding reasonable genitives like that... my only concern being, could this call into question what else we should include from pre-standardized orthography? From your quote: should 'meane' also be entered as an archaic variant of 'mean'? If not, why is it different from 'kings' or 'kinges', effectively just obsolete spellings of now-standardized inflections? At any rate, though, definite support for forms like 'Jesu'; if they're attested, then they're legitimate and should be added. Kiril kovachev (talk) 09:38, 24 June 2021 (UTC)
I am in favor of adding archaic genitives like Godes, mans, etc. They should just say {{obsolete spelling of|en|God's}}, {{obsolete spelling of|en|man's}} etc. And yes, meane should be added as an {{obsolete spelling of}} too. —Mahāgaja · talk 11:47, 24 June 2021 (UTC)
@Mahagaja: [[-'s|'s]], unless also used for contractions, then would need two senses (if we also want those forms). J3133 (talk) 19:19, 24 June 2021 (UTC)
Re Kiril: yeah, we definitely already include (and should include) a lot of entries like meane (e.g., thinke); it's just obsolete genitives/possessives which, as I said in the previous thread, I'm not aware of any examples of us including before now (presumably because we don't include the corresponding modern possessives). - -sche (discuss) 21:30, 24 June 2021 (UTC)

Tone Indicators

Is there a consensus on tone indicators here? I've created a list of the commons ones at User:AntisocialRyan/Tone_indicators. We already have "/s", and others have exploded in online use over the past couple years. See Google Trends for "tone indicators" and "gen meaning". I think the majority of them should be included, or at least on an Appendix page. AntisocialRyan (talk) 20:22, 23 June 2021 (UTC)

I'd say if they meet attestation requirements, include them. Perhaps we could discuss whether /fake and /safe and other regular words like /comment or /wondering would be better handled under /fake, /comment, etc or just by a sense at / (for comparison, we don't have entries for most parenthetical things like (not really) or (joking), only a few like (!)). But the bulk of them, like /s, are as opaque and "idiomatic" as abbreviations like S. and p. which we include, so AFAICT the only hurdle is whether they're attested to the usual standards. - -sche (discuss) 23:30, 23 June 2021 (UTC)
Not sure how many strong sources would include these, but searching something like "genq" on Twitter and going to Latest shows how frequently these are used. Just /genq alone is said multiple times a minute, and that form isn't as common as /gen (harder to search for, though). After studying these a lot I can confirm they're all used often. Is this ok? I will wait for more feedback anyway before creating them all. AntisocialRyan (talk) 23:48, 23 June 2021 (UTC)
  • I'd like to request a different name for these than "tone indicators" -- in a multilingual project such as this, "tone" has too many ambiguous meanings (perhaps ironically, given the subject of this thread :) ). For me personally, I first thought from the title that this thread would be about notation conventions for Chinese or Navajo or Igbo or some other tonal language.
As a proposal, how about "intent indicators"? I believe that this is less ambiguous. ‑‑ Eiríkr Útlendi │Tala við mig 18:27, 24 June 2021 (UTC)
I like it, but do we decide the names for things or do we use what is widely used already? "Tone indicators" is what they're referred to as generally, I put some sources on that page if you'd like to learn more! :) AntisocialRyan (talk) 19:57, 24 June 2021 (UTC)
Ahh, good point, I agree "tone indicator" wouldn't be a great name as it refers to too many other things, so if both kinds of tone indicator exist in some languages (e.g. if a Chinese analogue of /s exists), the contents of the category will be a muddle. We're talking about what to call the category they'd be in, right? Because I assume the part of speech header will just be one of the existing/usual ones, like Interjection (a la "psych!"), or the ol' catch-all header Particle, or perhaps the Symbol header /s uses. Perhaps we could add a qualifier like "discursive tone indicator", although that gets no google hits. I also see the phrase "tone tags" used, which seems a little less ambiguous: a tone marker in the languages Eirikr mentions could be called a tone indicator, but would someone call it a tone tag? If not, maybe that'd work? - -sche (discuss) 21:24, 24 June 2021 (UTC)
1. Ah right, people do refer to them as "tone tags" as well sometimes, although "tone indicator" is more common. That could work.
2. I have yet to see variants for other languages, I feel like I would notice but I probably don't pay as much attention to posts using non-Latin scripts.
3. "/s" uses the Symbol part of speech header and I think that fits well since it's only in written text. AntisocialRyan (talk) 21:54, 24 June 2021 (UTC)

Editing news 2021 #2

14:15, 24 June 2021 (UTC)

  • Hmm. I'm curious about this. I've been working on metrics and quantification in my day job, and struggling with the problem of metrics that don't actually measure what people think they measure.
  • Newer editors who had automatic ("default on") access to the Reply tool were more likely to post a comment on a talk page.
  • The comments that newer editors made with the Reply Tool were also less likely to be reverted than the comments that newer editors made with page editing.
A follow-up question comes immediately to mind. Granted, these comments weren't reverted, so they're probably not just vandalism. But were those comments themselves worthwhile? Did they add to the discussion in constructive ways? ‑‑ Eiríkr Útlendi │Tala við mig 18:31, 24 June 2021 (UTC)
They didn’t measure how many editors were scared away by it. Like I haven’t edited once on Arabic Wikipedia, could mend no obvious mistake, because they have enabled Visual Editing by default (for surely, editing source-code bidirectionally provokes mistakes of newbs), but I did not discover how to get out of the angry fruit salad. Fay Freak (talk) 23:00, 24 June 2021 (UTC)
@Fay Freak: What a wonderful turn of phrase, angry fruit salad. Thank you for expanding my vocabulary!  :) ‑‑ Eiríkr Útlendi │Tala við mig 18:14, 25 June 2021 (UTC)

Something something part of speech

something-something can mean so-and-so, but one of the cites is "You're nothing but a something-something bookworm." in which it's not obvious if it's a noun or an adjective. Another example:

2005, “Hi, Mr. Horned One”, in Two and a Half Men, season 3, episode 6, 5:00 from the start:
Yeah, well... I'm rubber, you're glue. Something, something, something, you.

In Japanese it's supposedly うんたらかんたら or うんたら. It's also in the title of Family Guy: Something, Something, Something, Dark Side. What to do with this? It's a stand-in word with no obvious PoS. Alexis Jazz (talk) 15:27, 25 June 2021 (UTC)

"A something-something bookworm" is attributive use: in English, all nouns are happy to function as adjectives, no need for a separate PoS header. And to me it doesn't feel like something-something is the same as something, something, something. MuDavid 栘𩿠 (talk) 07:28, 3 July 2021 (UTC)

Romagnol language

Is it possible to create a Template:rgn-adj etc. for my term?--BandiniRaffaele2 (talk) 23:28, 25 June 2021 (UTC)

Server switch

SGrabarczuk (WMF) 01:19, 27 June 2021 (UTC)

A Primer on Proper Pinging

Defective pings are by far the most common technical error here in the forums. Every day I see several attempts to correct posts where pings failed due to misspelling of user names or missing signatures. Sad to say, these attempts almost never work. Instead, they make it look after the fact like the pings were done correctly, but without actually pinging anyone.

Here's what you need to do to make a ping (I'm using my own name to avoid hitting anyone with stray pings):

  1. Create a new post that includes a ping, and sign it.
    A ping happens when a link to the user's user page appears in a new message that is signed in the same edit. I don't know the details of what the system requires to see a message as new, but at the very least it can't be an edited version of something that was already there. If you misspelled something so it didn't link properly or even left the ping out entirely, simply correcting the error accomplishes nothing: you're correcting an old edit, not making a new one. I usually format the new message as a reply to the first one, with a note that I made a mistake in the first ping, and including the ping in the new message, making sure it's correct and that I sign it correctly.
  2. Make an edit and include a ping in the edit summary.
    Any link to a user's user page in an edit summary triggers a ping, whether there's a signature or not. The only trick is that templates don't work in edit summaries. Instead you have to link to it the old-fashioned way, in double square brackets: [[User:Chuck Entz]]. If you want to correct your old edit so it looks right, you can redo the failed ping at the same time in the summary for that edit, as in: "[[User:Chuck Entz]]: fixing my ping".
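The two rules above can be sketched as a small Python model. This is only a simplified illustration of the behaviour described in this post, not the actual notification code: the function name `pinged_users` is hypothetical, `added_text` is assumed to contain only text newly added by the edit, the signature is detected as the literal four tildes, and template-based pings in message text are not modelled.

```python
import re

# Matches a plain wikilink to a user page, e.g. [[User:Chuck Entz]] or
# [[User:Chuck Entz|Chuck]], and captures the user name.
USER_LINK = re.compile(r"\[\[\s*User:([^\]|#]+)", re.IGNORECASE)
SIGNATURE = "~~~~"


def pinged_users(added_text: str, edit_summary: str = "") -> set:
    """Return the users this edit would ping under the rules described above."""
    pinged = set()

    # Rule 2: a user-page link in the edit summary pings, signed or not.
    pinged.update(name.strip() for name in USER_LINK.findall(edit_summary))

    # Rule 1: links in newly added text count only if the same edit is signed.
    if SIGNATURE in added_text:
        pinged.update(name.strip() for name in USER_LINK.findall(added_text))

    return pinged
```

In this model, fixing a misspelled link in an old post without adding a new signed message returns an empty set, which is the failure mode the primer warns about; putting the link in the edit summary of that same fix is what makes it succeed.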

A couple of other points:

  1. There's no need to ping anyone on their own talk page. I've never checked whether it actually creates a notification, because the notification that someone has edited one's talk page is of much higher priority than a mere ping. The only thing such an attempted ping accomplishes is that the kind of people who are picky enough to edit dictionaries tend to find it annoying.
  2. It's easy to check your pings in preview: all you have to do is open the link in another tab or another window. This won't affect your edit window. If they don't have a user page yet, you can at least tell from the text on the page you go to whether you're at the non-existent page for an existing user or for one that isn't registered.
    If you have a multi-button mouse, right-click on the link. Otherwise (on a Mac, anyway) shift-click. This gives you a menu from which you can select the option to open the link in a new tab or a new window. It's probably not as easy in the mobile version, but I've never used it.
    When you preview your edit, a preview of the edit summary should also show in the same window. You should be able to follow that link the same as you would the one in the text.

I hope you find this helpful. If anyone knows more, feel free to chime in with an addition or correction. Chuck Entz (talk) 00:16, 28 June 2021 (UTC)

Do pings register from e.g. a page creation, page deletion, move, etc? Or just edit summaries? Ultimateria (talk) 03:34, 28 June 2021 (UTC)
A page creation is an edit that has an edit summary, so pinging from a page creation should work. As for the others: I have no idea, though they do seem different to me. As a test, I just deleted one of my user subpages and linked to your user page in the deletion reason. Did you get a ping? Chuck Entz (talk) 03:48, 28 June 2021 (UTC)
I did not. Ultimateria (talk) 03:53, 28 June 2021 (UTC)
To make it easier: activate beta feature "discussion tools". It adds a reply link, it is autosigned, and it has an icon for pinging participants. Vriullop (talk) 06:54, 30 June 2021 (UTC)
Hello. I mostly don't use this, partly because I doubt the mechanism (maybe merely mentioning a user should "ping" them; cf. subtweet), also because I tend to assume people watch their discussions (but I often ignore my watchlist for weeks too, and then stuff scrolls off the available history). Maybe we should have an info page like WT:PING. It could even redirect to whatever actual help page exists for this feature (on 'pedia or WMF), since that stuff is never easy to find. If you create one here, remember to indicate how to block pings, and warn people that pings may be blocked, and aren't guaranteed to get through. love, Equinox 00:13, 7 July 2021 (UTC)

Consensus on adding |withtext=1 parameters to {{bor}} and {{inh}}

Inqilabi is trying to add |withtext=1 parameters to {{bor}} and {{inh}} instead of these templates. This subsection is for reaching consensus (i.e., support or oppose; as an extension of the vote above), which Inqilabi should have created instead of secretly adding them (the parameters were not mentioned above), as this is clearly controversial (especially “Inherited from”; or writing it manually instead). J3133 (talk) 16:33, 27 June 2021 (UTC)

Inqilabi has removed this discussion, which I moved from WT:RFDO, as I am not fussy about where it is placed. J3133 (talk) 05:31, 28 June 2021 (UTC)
I think it is not worth fighting over. There is no clear notion of what changes need a vote or even an advance public notice. In this edit I "secretly" added a |nodot=1 parameter to {{R:TLFi}}. IMO, changes that do not affect users who are not aware of them do not need a vote.  --Lambiam 12:46, 28 June 2021 (UTC)
I agree with Lambiam. The addition of an optional parameter is harmless and @Inqilābī's request was not at all problematic IMHO.--Tibidibi (talk) 13:51, 28 June 2021 (UTC)
Given what a contentious issue this has been, he really should have let things cool down before trying to push for a |withtext= parameter. I'm against adding "Inherited from" to all inherited derivations carte blanche, and don't think "inherited" needs explaining with a link, nor does "borrowed". I'm pretty tired of arguing about this issue though and would like a temporary cease fire. --{{victar|talk}} 18:41, 28 June 2021 (UTC)
IMO this whole controversy is pointless, and I agree with Lambiam and Tibidibi. Benwing2 (talk) 02:57, 29 June 2021 (UTC)
Your vote expressed that, so no surprise. --{{victar|talk}} 03:17, 29 June 2021 (UTC)
@Victar: What? I requested the parameter just 2 hours after the vote had ended, while the recent contention (which was begun by SodhakSH; and for which he’s still fighting) originated 3 weeks later. And, I will not permit any bot-operation for standardising the etymologies, so what is really your concern? Also, having ‘Inherited from’ would help to distinguish inheritances from the plain ‘from’ as used with {{der}}. ·~ dictátor·mundꟾ 23:19, 29 June 2021 (UTC)
  • Oppose this; I want any short parameter like |wt=1 or |tx=1, that too only if the new templates are deleted. Better to type "from" than these lengthy parameters. 🔥शब्दशोधक🔥 04:55, 29 June 2021 (UTC)
    Lengthy? Sure, you would want to manually link the keywords to the glossary while other editors would benefit from the parameter. The shorter form |wt= was my idea, anyway. ·~ dictátor·mundꟾ 23:19, 29 June 2021 (UTC)
    Mr. Dictātor, when the fuck did I say |wt= was not your suggestion? Even |tx= was somebody else's. These are way easier to type than |withtext=. Asking [w]ould any of you mind if I replace "Inherited" with "{{glossary|Inherited}}" (and same for borrowed) does not mean that I'm going to do that: until the new templates exist, I'll use them only in new entries. Answer this one for me: what is the value of so many Keep votes at RFDO? 🔥शब्दशोधक🔥 09:28, 30 June 2021 (UTC)

Use of borrowed template in etymologies

As per the policy on the use of the borrowed template here https://en.wiktionary.org/wiki/Template:borrowed/documentation#When_to_use we are supposed to only use it for the borrowing languages spoken at the time of the borrowing. For example, English wouldn't use "borrowed" for a borrowing that happened during Middle or Old English. However, this is only good for languages that we have delineated into different stages, like English (Modern, Middle, Old), French, Portuguese, Spanish, German, Irish, etc. But what about the many languages that don't have that breakdown? That makes things inconsistent. Like Italian, Albanian, Romanian, Serbo-Croatian, etc. These terms are meant to cover over 1000 years of history in some cases, while Middle English is a specific period in time of 400 years or less. This is a problem both across languages and within them as well. For example, some etymologies might not necessarily explicitly include the "Old" version of the language. Like in French, Spanish, or Portuguese, something may be borrowed from Latin but they don't include the Old French, Old Spanish, Old Portuguese form necessarily, even if it was borrowed during the time that language was spoken. It's also harder in cases like Welsh where the exact time of a word's introduction is not always clear. It helps to know a detailed etymology and history of the word. But for example, some French terms may be listed as "borrowed" from Latin, while others (where Old French is listed as an intermediate) are simply listed as being "derived" from Latin. I do agree that the ultimate goal of this idea is good, and it will be good once everything is actually done, but until then it can be messy, disorganized, and inconsistent. Guess we just gotta work with it. Word dewd544 (talk) 22:24, 30 June 2021 (UTC)

July 2021

Regional Variations in Pali Inflection Tables

We need to discuss how we are going to handle regional variations in Pali spelling as it affects inflection tables. This affects templates {{pi-decl-noun}} and {{pi-conj-special}}.

One problem is that transliteration cannot easily be tuned on a form-by-form basis. For variations in the stem, I have already proposed a 'subst' parameter. Please discuss that approach there. --RichardW57 (talk) 05:25, 2 July 2021 (UTC)

This problem is actually solved by the now-implemented |subst= in the main Pali inflection templates, provided that all forms of a word spelt the same are to be transliterated the same, and the transliterations correspond to some Pali spelling. --RichardW57 (talk) 11:22, 4 July 2021 (UTC)

There are already some parameters to address the generation of the tables - use of implicit vowels and its concomitants (|impl=), the shape of the ā vowel (|round=), the choice of consonant for -y- (|y=) and the form of the Lao script instrumental/ablative plural in -bh- (|liap=). Parameters |impl= and |y= are passed down into a special transliteration interface (trwo() instead of tr()) in the transliteration module.

We now have some additional features to worry about. Apparently some Mon use SIGN II for -iṃ, and Shan Pali (assuming it meets CFI) works like a different script to the Burmese script. Shan may have two writing systems, but quite a few writing systems seem to have been proposed for Pali in the Shan script. Two Shan writing systems are displayed in the lists of alternative forms. There is also a Lao writing scheme whereby consonant clusters are (mostly) written using the rules for Lao. What parameters would users find usable for controlling these features in the inflections? What mixtures of writing systems in the inflection tables should users find acceptable?

I ask, especially of @Octahedron80, that implementation of these new variations be suspended until we have discussed the matter. --RichardW57 (talk) 05:25, 2 July 2021 (UTC)

Slavey split

@Mahagaja We have forgotten to conclude this discussion, so I'll summarise the proposals:

  1. Change Slavey (den) into a family (Slavey languages, den); put South Slavey (xsl) and North Slavey (see below) as members.
  2. Split North Slavey (scs) into three languages: Bearlake Slavey (den-blk?), Hare Slavey (den-hre?) and Mountain Slavey (den-mnt?)

The code names are open to debate. If nobody's opposing these, I would like them to be implemented. Thadh (talk) 12:39, 2 July 2021 (UTC)

Delete Category:West_Proto-Germanic

Someone might want to redirect the entries listed there to Proto-West Germanic entries. (@Rua). ·~ dictátor·mundꟾ 20:04, 2 July 2021 (UTC)

@Benwing2, you might do a bot-operation for this. ·~ dictátor·mundꟾ 21:58, 4 July 2021 (UTC)

Admin help required

I recently came across Wiktionary:Beer_parlour/2020/July#Update_CFI_to_reflect_decision_on_treatment_of_attributive_forms, which I had forgotten about. I believe that the wording should be uncontroversial, as it merely reflects the result of the vote at Wiktionary:Votes/2019-05/Excluding_self-evident_"attributive_form_of"_definitions_for_hyphenated_compounds, in which case I would be grateful if an administrator would now insert the wording as suggested. Alternatively, if there is something further that I need to do to make this happen, please let me know what it is. Thanks. Mihia (talk) 15:57, 3 July 2021 (UTC)

@Mihia:   Done, sorry for the repeated delay. I'll see about deleting any affected pages. Ultimateria (talk) 01:12, 8 July 2021 (UTC)
@Ultimateria: Thanks very much for doing that. Would it also be possible to make the italicisation of terms in the CFI text the same as in Wiktionary:Beer_parlour/2020/July#Update_CFI_to_reflect_decision_on_treatment_of_attributive_forms? Or the part in quotes could instead be "attributive form of periodic table" if you prefer that. Mihia (talk) 17:58, 8 July 2021 (UTC)

Announcing the confirmed candidates for the 2021 Wikimedia Foundation Board of Trustees election

The 2021 Board of Trustees election opens 4 August 2021. Candidates from the community were asked to submit their candidacy. After a three-week-long call for candidates, there are 20 candidates for the 2021 election.

The Wikimedia movement has the opportunity to vote for the selection of community-and-affiliate trustees. The Board is expected to select the four most voted candidates to serve as trustees. Voting closes 17 August 2021.

The Wikimedia Foundation Board of Trustees oversees the Wikimedia Foundation's operations. The Board wants to improve their competences and diversity as a team. They have shared the areas of expertise that they are currently missing and hope to cover with new trustees.

How can you get involved? Learn more about candidates. Organize campaign activities. Vote.

Read the full announcement.

Best,

The Elections Committee

Announcement posted by User:Xeno (WMF) at 23:26, 3 July 2021 (UTC)

Removing Vulgar Latin Pronunciations from Mainspace

(Notifying Fay Freak, Brutal Russian, JohnC5, Benwing2, Lambiam, Mnemosientje): I notice that @The Nicodene has been systematically removing |vul= from {{la-IPA}} from mainspace Latin entries with the edit summary "Removed Proto-Romance because the word is not reconstructed". This strikes me as fundamentally wrong, for at least a few reasons.

  1. To start with, there's a difference between reconstructing a whole term and merely reconstructing a pronunciation. All historical pronunciations that aren't covered by detailed phonetic descriptions are reconstructed. Following the logic of the rationale given to its conclusion would mean removing pronunciations from all dead languages in mainspace.
  2. I'm not really sure exactly what Vulgar Latin is as we cover it on Wiktionary, but I don't see consensus for redefining it to be exactly synonymous with Proto-Romance. Sure, there's overlap, but the methodology and criteria seem different to me.
  3. Which brings up my main concern: a project to systematically change a basic feature of our Latin entries should not be undertaken without consensus of the Latin community. Yes, this has been touched on in the ad nauseam "debates" between The Nicodene and @Brutal Russian, but it got lost in the sea of ad hominems, nitpicking and arguments about who said what when. Besides which, the matter of removing Vulgar Latin pronunciations was never directly addressed. Chuck Entz (talk) 05:26, 4 July 2021 (UTC)
@Chuck Entz I am glad to see that you are interested in this. Pinging also @Ser be etre shi, who has also expressed an interest.
Vulgar Latin is, as we all know, an unfortunately nebulous term. Before either I or BR ever touched the 'vulgar' pronunciation module, it was being used haphazardly for not only reconstructed Proto-Romance entries, but also for random Latin words attested from Plautus (e.g. eccille) all the way to the fifth century C.E. (e.g. Justinianus) and even well beyond that (e.g. Iraquia, which is Neo-Latin for Iraq).
BR attempted one solution, namely narrowing 'Vulgar Latin' down to a sort of reconstructed Pompeiian pronunciation. I am not opposed to the general idea here, but no sources have yet been provided that support the assigned features. (To date I still do not understand where the idea of a word-initial /b/-/w/ merger came from, for instance.) A properly-sourced version of that would be appropriate for Latin words attested by the late first century, perhaps with a grace period of an additional 50-100 years after that.
As for reconstructed Proto-Romance words, the reconstructed Proto-Romance pronunciation seems most appropriate. Sources for the reconstructed features can be reviewed here and here.
But what to do with attested Latin words from, say, circa 200–600?
I previously proposed using the Proto-Romance pronunciations as the basis for a pronunciation of Late Latin, and indeed most of the features assigned to it are supported by inscriptional evidence (which the scholars reconstructing Proto-Romance of course had in mind as they did their work). The members of the DÉRom project themselves see their reconstruction as a means of discerning what Late Latin was like via the comparative method, rather than some sort of separate and purely hypothetical language.
BR, however, made his contempt for that proposal so clear that I decided, in the end, to withdraw it in an attempt to avert further controversy on that matter. That is also why I began to remove the reconstructed pronunciation from attested Late Latin entries. (I would have removed it anyway from Plautine or Neo-Latin entries, of course.)
Unfortunately I do not think there are any sources out there that attempt to lay out a synchronic phonology of Late Latin at a particular point in time without working from comparative Romance data. (I would be delighted to learn otherwise.)
If anyone has thoughts on ways to fill the 200–600 C.E. gap, please share. The Nicodene (talk) 06:43, 4 July 2021 (UTC)

The few previous discussions about Vulgar Latin that I'm aware of are collected here. It's clear to me that people are interested in having a non-standard, more phonetically advanced pronunciation in addition to the conservative Classical. A consensus was emerging in these discussions that the Latin of Pompeii and the wider Campania is the best way to give it to people - when Classical-age Vulgar Latin isn't used as synonymous with the speech of Plautus, or in the sense of a diglossic "language that the plebs spoke and the patricians didn't", it's used in this sense. Accordingly, I had replaced the former indeterminable and unrealistic Vulgar transcription with "Campanian" by introducing the most striking features attested in the region to what is otherwise Classical Latin. The name is after Adams (eg. 2007 & 2013): he sees it as having been characterised by at least one "long-standing feature of the speech of this Campanian region, distinguishing Campanian Latin still at this date from the speech of the city" (talking about the monophthongisation of /ae/). This variety basically anticipates many of the Late Latin developments and can be used as a generic, even urban post-Silver Latin pronunciation with minimal modifications. Most of its features have their ultimate origins in the Middle to Late Republican "rustic Latin" of Latium, part of the well-known Roman dichotomy with the other part being "urban"; but that is best represented as Praenestine and/or Faliscan.

I believe all the features I included can be found in any discussion of Pompeiian inscriptions, from Väänänen's Le latin vulgaire des inscriptions pompéiennes and Introduction au latin vulgaire to Wallace's Introduction to Wall Inscriptions from Pompeii and Herculaneum. I'm also referencing Rohlf's 66 Grammatica storica... The word-initial /b/-/w/ merger came from a combination of inscriptions (first confused in that position in Pompeii) and from the fact that this merger in favour of /v/ is characteristic of modern Southern Italian (Neapolitan), where /b/ only occurs after consonants or when double. A parallel thing happened to its /d-/ (now /ð/ and widely /r/, as in Neapolitan riece (ten), Sicilian reci), and with a lot of variation to /g-/. It might or might not have characterised Campanian Latin at that stage (2nd century AD), and I tend to think there was free variation, so I included it for variety. Apart from that, the transcription is highly conservative, and so I was intending for it to be given by default alongside Classical for all the lemmas because it simply represents a variant pronunciation of the same language that people are interested in seeing. The above user's complaints about no sources are disingenuous and simply serve to retroactively justify unilaterally removing my transcription without any prior discussion or request for sources.

This user above has unilaterally replaced this conservative and discussed transcription of Latin with the controversial transcription of the purely comparatively reconstructed proto-Romance whose status in relation to Latin and whose need to be reconstructed at all are still contentious issues. They have pointedly done so with no prior discussion and in violation of the principles of this website. Moreover, they're "requesting administrative help" to stop me from stopping them from appropriating the module in this manner. They have their fringe ideas and they're here to implement them, and they have an article with a pile of references to gaslight you in case you disagree. It's your ideas that are fringe, you see. In particular, postulating a pan-Romance, supra-regional phonology not situated either in any time or space on the basis of "why couldn't it have existed?" is indefensible. DÉRom uses no attested Latin evidence (what they dub 'le code écrit') whatsoever in its reconstructions as a basic premise (a basic fact that the user above repeatedly denies), and its goal of reconciling proto-Romance and Late Latin is nothing but a goal. Currently their reconciliation consists in the 'Le corrélat du latin écrit..' section, some footnotes and dictionary references at the bottom of most but not all articles.

What I think should be done:

  1. restore the pronunciation of an unambiguously attested, known variety of Latin that I term Campanian that every work on Pompeiian Latin describes;
  2. stop user The Nicodene from unilaterally overwriting the pronunciation module and shamelessly telling people to re-add what they've overwritten in a separate sub-module. Stop them from trying to appropriate parts or the whole of the module by proposing "deals" where other people don't edit their appropriated part and vice versa.
  3. make the Campanian pronunciation default alongside Classical as has been done with Ecclesiastical, and figure out how to add at least a third one (Plautine) in order to complete the range of pronunciation that coexisted in the Late Republic, to reflect speech variation much better than currently, to illustrate the language's development and to give people an ability to choose. People are on the right track when they think that perhaps Plautus, Cicero and Zosimus from Pompeii didn't pronounce things the same. I requested assistance with this here, but so far nobody has responded. This would make the removals of vul=1 irrelevant.
  4. stop user The Nicodene from removing default pronunciations based on date of attestation. It's a basic insight of historical linguistics that date of attestation in the written record is not the date of appearance in speech, and neither is date of last appearance the date of disappearance from speech. This user makes no consideration of language as a complex sociolinguistic system that we have only a very rough outline of in written attestations. They demonstrate this in the fōrmāticus debacle, insisting we base everything on the date of attestation, from pronunciation to morphosyntax (ellipsis and gender) to even how to call the geographic region, freely disregarding the evidence for a much earlier date of the word's appearance.
  5. stop user The Nicodene from trying to conflate Late Latin with proto-Romance. If a separate proto-Romance transcription is to be made, it should be phonemic like with all other reconstructed languages on the website, and in accordance with common sense. It should basically mirror what DÉRom gives. It seems reasonable that it be limited to the reconstructed namespace, as people will already have the "Vulgar" that they want. Ideally pagenames should look like that too, and not as current Latinisations, but this will probably entail making proto-Romance into a whole separate language, with the ensuing problems of having to derive Romance languages both from Latin and proto-Romance simultaneously. This is currently the only way I see to reconcile proto-Romance and Latin.
  6. if and when we introduce an actual Late Latin phonetic transcription for a well-documented time and place, such as 6th century Ravenna, it will necessarily borrow from the fruits of DÉRom's labour. The problem with this is that the language of this particular time and place differed from Classical Latin in numerous other ways and might deserve its own morphology as well, for instance. Which again leads into the above issue. Brutal Russian (talk) 12:22, 4 July 2021 (UTC)
I want to participate in this discussion but I can't if this is going to degenerate into another ad hominem quarrel between the two of you... I found your medicalization of The Nicodene particularly horrid. However, I do have some questions:
For @Brutal Russian: first, regarding point 4, in terms of showing this Campanian Latin by default, do you see nothing wrong with showing it for words significantly post-dating the 2nd century AD like Iraquia (mentioned above) or tēlephōnum? It would strike me as very odd. (It'd be great if people other than us three (Brutal Russian, The Nicodene, me) could also chime in on this...)
Second (and secondarily), do you think you could manage to cite the exact page(s) in either Väänänen book or Wallace/Rohlf where initial v- is said to be confused with b-? I just want to confirm The Nicodene is wrong about them not stating such a thing (I can't remember this detail myself).
For @The Nicodene: while I'm aware a few linguists have been doing God's work trying to reconcile attested written Late Latin with pronunciation reconstructions off Romance data, via statistical methods involving misspellings (e.g. Politzer, Adams, Leppänen), plus interpretations of grammarians of the time, is Brutal Russian really wrong about DÉRom largely omitting this in favour of pure reconstructive work?
Besides, I have the feeling that as reasonable as the Proto-Romance reconstruction in your head might be (or not), there are philosophical issues in pushing this reconstruction via Wikipedia articles you have largely written yourself... Call it an unfortunate consequence of too few people seriously reading all that material in question and participating on Wikipedia/Wiktionary, if you will. It would be good to pay particular attention to the scholars' consensus (well, more like schelling point), or to simply stick to the exact representation the DÉRom people prefer (which seems to be solely phonemic as BR said, phonetic allophony being more like territory under current exploration).
I'm well aware that scholars have been working out PRom allophony (just yesterday I was reading Loporcaro's Vowel Length from Latin to Romance (2015), whose "PRom OSL" (the author's term, open-syllable lengthening) is obviously allophonic), and while I understand the desire to represent such things, phonetically in square brackets, I have to admit PRom allophony often leaves me wondering... E.g. I recall one time talking to a PhD student of Romance linguistics and receiving pushback over OSL, partly because it's only some more geographically central languages that show its effects (Old French, Friulian, Dalmatian, ?Tuscan...), being absent from Portuguese, Spanish, Romanian and Sardinian. I couldn't help but agree that it smelled like a later wave sound change... You could say all more peripheral Romance simply happened to drop OSL, but why not say the more central Romance developed it and spread it instead? (Yes, I'm aware Augustine in De Musica alludes to canō having a long ā... add his African Latin dialect to the list if you want.)
The allophony of /ll/ being more retroflex (which you alluded to elsewhere) seems to be on better ground, being more widespread, but here it's you and me making the call, not the scholars at large (in the WP article you cited a paper by Xavier Gouvert after all... it's this kind of thing why I call it territory under exploration).--Ser be être 是talk/stalk 14:17, 4 July 2021 (UTC)
Agree with Brutal Russian. The gist is point 4. From it all “The Nicodene’s” paralogisms hail. It could be easy: If man believes it was present in vulgar speech then it gets a Vulgar Latin pronunciation. This has nothing to do with whether a term is in the reconstructed or main namespace. There are no “reconstructed Proto-Romance entries” insofar as the entries are imagined Latin. Likely pronunciations get added. But they are for Latin and not some third “Proto-Romance”. You see, Proto-Romance didn’t exist, for our purposes: Whenever it is attested it is Latin, and whenever a reconstruction is situated in reality it is Latin. Fay Freak (talk) 14:27, 4 July 2021 (UTC)
@Ser be etre shi: Yes, it would strike me as odd, but people have added vul=1 to stranger things showing that there's an interest in learning how to pronounce any word the way "the Roman people" might have done. If for instance a person adopts Campanian as their default pronunciation (and what's to stop them? Look at Luke with his quantitative Ecclesiastical :-), why should we assume they won't try to pronounce modern vocabulary the same way? Reciting Praenestine inscriptions using, say, German Ecclesiastical would strike me as more peculiar, which is why I tend to remove Ecclesiastical from dialectal/variant forms, but in principle even that could be left in. I doubt there's a big audience for this and those who do that probably use that pronunciation for lack of worry about pronunciation and don't need pronunciation tips anyway.
On DÉRom's methodology, specifically as envisioned by Buchi, one can consult this and this in French (sacrebleu) and this in English. Admittedly I haven't read through any of the introductory articles of any of the three dictionary issues so far (I'm afraid I'd overload Google Translate), but I think I know when I see a foundational principle of doing Romance etymology differently, and I've used the dictionary itself and found that it follows the principle as stated, only adducing written Latin correlates at the end to "confirm" their findings. One admittedly questions that it's possible to completely unsee the Latin data even when professing to do so. Awareness of the bias doesn't automatically exclude it.
In addition, nothing of what I've seen tells me they're as blind to sociolinguistics as The Nicodene wants to portray them. In fact shedding better and modern light on Latin sociolinguistics seems to be a fundamental aim of their approach (Conception du projet 5.2.4, first link): "la variation est omniprésente", listing the /phonological/, semantic, morphosyntactic and lexical levels. The [phonetic] level that The Nicodene is trying to make pan-Romance is conspicuously absent, because quite clearly the dictionary doesn't even attempt to reconstruct that level.
On OSL, Spanish and Rumanian diphthongisation make postulating it necessary for these languages as well. Portuguese still has OSL, in Portugal with strong reduction of everything unlengthened. It's currently present in Sardinian, though definitely not Neapolitan/Molise/Lombard-level present (perhaps that's why no diphthongs in Sardinia). But the diphthongs alone will probably necessitate as many transcriptions as there are Romance sub-branches, with vaguely understood waves of developments staggered by vast time spans, close to two millennia in the case of Sardinian. That is even if any agreement at all is possible (I have ideas).
Allophones of /l/ is one case where not just geographic, but even sociolinguistic (Greeks) differences in allophony are explicitly attested, and no sweeping pan-Romance generalisation is possible.
On b/v confusions, it's attested eg. in Pompeian baccvleivs, baliat, bervs, and Väänänen 1981: 61 mentions Romance continuations such as Romanian bătrân, Portuguese bodo, Tuscan boce, not to mention the situation in Sardinia where only a couple of villages maintain the distinction. Also Väänänen 1966: 50. It seems to me the South of Italy was the hotspot for this particular phenomenon, but I can easily drop it, and I did before getting reverted again. Brutal Russian (talk) 16:28, 4 July 2021 (UTC)
While it is true that the DÉRom’s lexical entries give only phonemic transcriptions–which is true of perhaps most dictionaries–that is simply because the focus of these entries is lexical, not phonological.
The DÉRom does in fact cover allophonic phenomena in great detail in volumes I and II, and that is what is cited on the Wiki page, with supporting citations from other sources like Zampaulo or Ferguson, who also do the same thing.
Moreover the reconstructed (allophonic) features are supported by Late Latin inscriptional evidence, as examined extensively by e.g. Grandgent.
The citation pointing to Xavier Gouvert, regarding the realization of /ll/, is not from a separate paper: that is simply the name of a section of the DÉRom, volume I. Incidentally I did not implement a retroflex realization for /ll/ on the module because the other sources that I consulted about this, e.g. Zampaulo, do not seem to agree with Xavier Gouvert's attribution of the retroflex realization to Proto-Romance, suggesting that it is not widely accepted. I have made a point of refraining from adding features that multiple reliable sources do not agree upon.
Variation certainly is present in any sufficiently widespread language, and of course the DÉRom acknowledges that, and many of its lexical entries or 'mini-articles' do, in fact, provide more than one form for this very reason. That does not prevent modern scholarship from reconstructing a general Proto-Romance phonology, complete with allophonic features, as they have done.
The matter of diphthongization in Romance is certainly tricky, and Ferguson (1976: §7), at least, posits a sort of metaphonically-conditioned proto-diphthongization of stressed (and lengthened) lax e and o, for Proto-Romance. He sees the resulting *[eɛ] and *[oɔ] as underlying all future Romance developments, even including Sardinian (thereby explaining metaphonic raising to [e] and [o] in the tonic vowels of words like tempus or oru).
––––––––––––––
Looking through Väänänen (1966), I do see that the source supports at least some of the features that you assigned. Still, I would like you to document citations (including page and section numbers) for all of the features that you have assigned. This is not too much to ask for a major module.
I have reservations about making such a 'Pompeiian' pronunciation apply to all Latin entries, since that would include words attested 1,000+ years after Pompeii was buried by ash. The fact that some modern speaker could, conceivably, use such a pronunciation privately does not mean it is well-established enough to make it default on Wiktionary.
P.S.: Re chronology.
Just as there is no a priori reason to assign, on Wiktionary, an Ecclesiastical pronunciation to unattested forms reconstructed from Romance data, there is also no reason to assign a Classical one either.
An exception can be made if at least one scholar in the field has stated that the form in question existed, or at least likely existed, in Classical times but was simply never written. The Nicodene (talk) 06:30, 5 July 2021 (UTC)
@Fay Freak I have never thought that Proto-Romance reflects a language distinct from Latin. I have rather argued–both on this page and elsewhere–that it does not. (The supposed distinction is upheld by the same person that you say you agree with.) When I say “reconstructed Proto-Romance entries”, understand that as shorthand for “entries for unattested Latin words that have been reconstructed from comparative Romance data”.
The problem here is that there is no single "Vulgar Latin pronunciation” that Wiktionary could possibly give: the term is wildly imprecise, referring to anything non-literary in approximately the period 200 B.C.E. to 600 C.E., with some applying the term to even later centuries than that. Needless to say, sound changes were operative in Latin–as in any living language–throughout the period. The Nicodene (talk) 06:33, 8 July 2021 (UTC)

@Brutal Russian: I have taken the time to work out a fair compromise. Please consider the following proposal:

There should be two sub-modules, one labelled ‘Proto-Romance’, the other labelled ‘Pompeiian’ or ‘Campanian, late first century C.E.’ or similar. Neither should be labelled as simply 'Vulgar Latin', an imprecise term with an infamously wide range of possible meanings.

  • The Proto-Romance one, complete with well-cited allophonic features, is to be used for unattested words reconstructed from Romance data.
    • I have already agreed, per your request, not to apply it to attested Late Latin words.
  • The Pompeiian one is to be used for Latin words that are either attested by 300 C.E. or claimed by at least one source to have probably existed, unwritten, by about the time of Pompeii.
    • That provides a ‘grace period’ of more than two hundred years after Pompeii. There needs to be some cut-off, and this seems reasonable. (I do not think anyone wants to see Wiktionary apply a Pompeiian pronunciation to a word attested from 1180 to 1246 in the southwestern corner of Poland.)

If we can agree on that, it is now simply a matter of asking someone knowledgeable about Lua to make two such sub-modules. Neither needs to even use the code ‘vul’: they can simply be ‘pr’ and ‘pom’ or similar.
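As a rough illustration only, the selection rule in the proposal above can be sketched in Python (not the Lua of the actual module). The function name, the encoding of years as signed integers and the handling of a reconstructed word that a source also dates to Pompeiian times are all assumptions made for this example; the codes 'pr' and 'pom' and the 300 C.E. cut-off are the ones suggested above.

```python
from typing import List, Optional

CUTOFF_CE = 300  # cut-off year suggested in the proposal above


def applicable_submodules(first_attested: Optional[int],
                          claimed_early: bool = False) -> List[str]:
    """Return the sub-module codes that would apply to an entry.

    first_attested: year of first attestation (negative for B.C.E.),
        or None for an unattested word reconstructed from Romance data.
    claimed_early: True if at least one source claims the word probably
        existed, unwritten, by about the time of Pompeii.
    """
    codes = []
    # Proto-Romance is reserved for unattested, reconstructed words.
    if first_attested is None:
        codes.append("pr")
    # Pompeiian applies to words attested by the cut-off, or claimed early.
    # Assumption: a reconstructed word that is also claimed early gets both.
    if claimed_early or (first_attested is not None
                         and first_attested <= CUTOFF_CE):
        codes.append("pom")
    return codes
```

Under these assumptions a word first attested in 1180 and not claimed for antiquity receives neither code, which matches the intent of the cut-off.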

A compromise will allow everyone to move on with their work of improving Wiktionary. The Nicodene (talk) 21:27, 5 July 2021 (UTC)

@The Nicodene: One obvious reason to give Classical for reconstructed entries is to illustrate phono evolution. Another is that Classical is the standard pronunciation for all lemmas. A third reason is that there's no way to disprove that this pronunciation was ever used for these items since date of appearance is unknown and dates of individual phono features disappearing entirely is too. Given the starting point and the ending point, a person can make an educated guess at the intermediate points, and the vast majority of people cannot triangulate it without explicit transcriptions.
DÉRom has some discussions of allophony, but their goal is to arrive at phonemic reconstruction. The question is: given the starting point (proto-Italic) and a range of Romance outcomes, is there any continuity and what would be the optimal phonemic representation? No claim is made as to what combinations of these allophones existed in what speech varieties. The closest thing to such a discussion that I've seen is found at the end of Gouvert 2014, p. 48. A belief in the necessity to settle on a universal proto-Romance narrow phonetic transcription to me is inexplicable even from the point of view of the philosophy of language and historical linguistics, and doubly so in that it repeats the mistakes of Vulgar Latin ("that's how all the plebs in the Empire spoke as opposed to all the patricians"). No valid reason has so far been presented in its favour.
Here's a word from 14th century Poland where a Classical pronunciation can be argued to be inappropriate: granicia. It can also be argued to be appropriate as a guide for people who want to incorporate the word in their speech any way - I've participated in more than one discussion on "how would the Latins adapt /t͡ʃ/ in words like chilēnsis or Czechia." The only possible Classical-age native adaptation seems to have been /s/, and later perhaps /t͡s/ judging by spellings with zo- for t(h)eo-, probably as a Greek-borrowed (bilingual) phoneme.
Campanian Latin hasn't disappeared with the destruction of Pompeii, it's continued by the modern dialects of Campania and southern Lazio. Given that it anticipates many Late Latin developments, it will be an appropriate transcription at least to the end of antiquity, certainly where Classical is appropriate. Same considerations to tracing phonetic evolution apply. I don't know of a language on this website where such fuss exists over pronunciations. I haven't observed anything of the sort with Ancient Greek. I don't see what's the practical point. We don't even have consistent attestation dates/authors, a much more pressing problem. If there already exists a similar practice with some other ancient language, we can see about borrowing from it. Otherwise it's an exercise in whose arbitrary position wins.
There are people willing to engage with my thoughts, know what I know, how I know it and how I interpret it, to offer criticisms for both to consider. For these people I'm ready to write pages of text and provide the links and references they need. References typically require interpretation, and I will happily discuss the interpretation of these references with those able to discuss it. What I'm not ready to do is to engage with those who, for being unable to engage with my thoughts, instead engage in reference warring; who, due to their own lack of information, project onto me their ignorance of what knowledge and interpretations I based an edit on, and assume that I based it on no knowledge, no references and/or on an inability to correctly interpret them; and that gives them justification to discard my edits as trash and edit war me. I will not enable the practice of taking a module hostage by paying ransom in references.
Those interested will consult the references provided and discuss what features they think deserve to be incorporated in the transcription whose aim is to portray sociolinguistic variation by stereotypising a local variety and contrasting it to the standard. In short, Classical and Campanian are both reconstructions; in addition to being individually valid, the pairing can be used to represent two poles of a continuum that likely was both diatopic (Campania-Rome) and diaphasic. It's true that Classical has a special status of also supposedly being taught, giving it a status beyond just "reconstructed", but what's being taught in reality is often widely different. I would argue that Campanian has the same validity as a classical-age reconstructed pronunciation as what we currently have and if it was taught precisely, it would have been miles closer to the target. In short I don't believe there's any significant epistemological difference between the two. Brutal Russian (talk) 21:16, 7 July 2021 (UTC)
@Brutal Russian: While it is true that one cannot prove a negative, there does need to be evidence (or at the very least one scholar's speculation) that shows that the form in question existed in Classical Latin for there to be a chronological or historical justification for assigning it such a pronunciation on Wiktionary. The starting point for such words is Proto-Romance, not Classical Latin, unless demonstrated otherwise. Moreover, while providing a dubious Classical pronunciation could be useful for illustrating diachronic sound changes, the same reasoning would justify assigning Proto-Romance pronunciations for any Classical or Late Latin word that survived into Romance, which you are clearly against. (Although, if you change your mind, that could actually be a fruitful avenue of discussion.)
Nowhere in the DÉRom does it say that it is necessary to limit oneself to the phonemic level. The very first volume of the work contains, in part two, an extensive section titled ‘Reconstruction Phonologique’, which lays out a reconstructed phonology of Proto-Romance, complete with allophony. (The last part is particularly interesting.) If you would like more information on why they undertook this project, you are free to read it in their own words. The fact remains, however, that they did, and other sources as well reconstruct allophonic Proto-Romance features.
We are not discussing whether a Classical pronunciation is appropriate, on Wiktionary, for a fourteenth-century Polish word, but rather whether a reconstructed Pompeiian one is. A revived Classical pronunciation is in widespread use among modern Latinists; a revived Pompeiian one is not.
It seems that you agree with the principle of a cut-off, but not where exactly it should be. That, however, can be worked out in time. Wiktionary does not, by the way, assign reconstructed ancient Greek pronunciations to non-ancient words, as far as I can tell.
Per Wikipedia:BURDEN it is your responsibility to provide citations for Pompeiian Latin, no matter who asks you to.
Should a revived Pompeiian pronunciation come into widespread use among Latinists one day, we can revisit the topic of making it a default pronunciation for all periods. The Nicodene (talk) 22:30, 7 July 2021 (UTC)

@Brutal Russian: I have now read through the sources that you mentioned.

It occurs to me that it would be a good idea to lay out here, in detail, the issues I have with the Pompeiian module. They can be summarized as a series of questions:

1) Why is there a complete merger of the phonemes /b/ and /w/ in every environment?

Väänänen’s own conclusion (1966: 128) is that, in Pompeii, “the cases of b-u confusion are very few and doubtful”. He does mention the ‘sandhi theory’ on page 52: “At the beginning of a word, b […] would have remained an occlusive when preceded by a consonant […] at the beginning of a word, the confusion of b and [w] was doubtlessly decisive when a vowel preceded”.
Parodi, cited elsewhere by Väänänen, mentions that as well on page 195: “the general tendency was to consistently continue the route that word-internal b had followed, between vowels, and to reduce it to v even when it was initial, if a vowel preceded it.”
Lloyd (1987: 239) mentions that as well in From Latin to Spanish: “If we turn back to the situation in Late Latin when variation was still a living process, we can see that the initial /b-/ would have had two realizations: [b-] after pause or consonants, and [β] after words ending in a vowel [...]"
Hualde (2011: 2228–2229), in The Blackwell Companion to Phonology, vol. 1, provides a detailed summary: “Again, the merger would be expected to affect word-initial B- and V- when intervocalic; that is, ILLA BUCCA, for instance, should have undergone lenition to [ilːaβukːa] […] At the last stage represented in (11), the phonemes /b/ and /v/ are in contrast in word-initial position only if not intervocalic (e.g. after pause or consonant) […] The frequent cases of confusion between initial b- and v- […] provide quite strong evidence for the hypothesis that the phonemic contrast was indeed analogically re-established in word-initial postvocalic position, after a period where the two phonemes were contextually neutralized, as proposed by Weinrich (1958).”
In other words, word-initial /b/ tended to become a fricative when preceded by a vowel, but this contextual (i.e. limited to a specific environment) merger was later reversed, in many regions, by substituting the occlusive [b], which would have survived all along in initial position when not preceded by a vowel.
I would, for the purposes of the Pompeiian module, limit word-initial [b] > [β] to cases where there is a preceding word ending in a vowel. Even so, note that there is not a single example of ⟨u⟩ for initial [b] in Pompeii, whatever the preceding sound.
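A minimal sketch of that contextual rule, in Python for illustration rather than the module's Lua: the function name is hypothetical, the input is plain spelling rather than a phonemic transcription, and a preceding word ending in -m is simply treated as consonant-final, which is a simplification.

```python
from typing import Optional

VOWELS = set("aeiouyāēīōūȳ")


def realize_initial_b(word: str, previous_word: Optional[str] = None) -> str:
    """Render word-initial b as β only when the preceding word ends in a vowel.

    previous_word is None after a pause (start of an utterance).
    """
    if not word.startswith("b"):
        return word
    if previous_word and previous_word[-1].lower() in VOWELS:
        return "β" + word[1:]
    # After a pause or a consonant the occlusive is kept.
    return word
```

With Hualde's example, realize_initial_b("bucca", "illa") gives "βucca", while the same word after a pause or after "et" keeps its initial b.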

2) Why is short /i/ rendered as [e] in all environments?

Adams (2013: 60) shows that the phenomenon only occurs with regularity in word-final atonic syllables, generally verb endings.
Väänänen (1966: 128) says that “e for i only appears to a notable degree in the endings -is, -it > -es, -et”. Out of the only three stressed examples where the readings are not doubtful, he immediately casts doubt on two of them as explainable via assimilation to the e of a following syllable (p. 21).
Wallace (2005: xxvii) says only that: “in graffiti the short vowel e was used with considerable frequency for original short i in word-final syllables”, which he follows by providing numerous examples of verbs. Notice that the section is titled Short i in Final Syllables.
Clarkson, on p. 7 of the Oxford Guide to the Romance Languages, cautions that “there are hardly any good examples" of the phenomenon in stressed syllables and, on the next page says: “The conclusion from these spellings would appear to be that, in the first century AD, the confusion between ē and i was not yet at a stage where it permeated into speakers' writing habits (so also Eska 1987; Adams 2013:58f) […] Vowels in final syllables of polysyllabic words (which were never under the stress accent in speech) do show alternations in writing between e and i, but here the merger affects short e and short i, and need not be related to the changes which will later affect Romance vowels.”
I would, for the purposes of the module, limit the phenomenon to final unstressed syllables. Note also Clarkson's comment that this is a contextual merger of short i and short e, which your module distinguished.

3) What source claims that intervocalic /d/ and /g/ were fricatives in Pompeii and not in Classical Latin?

4) Why is there not, for instance, simplification of geminates after long vowels, or numerous other features mentioned by Väänänen?

I would also like to ask why you claim that the formation of the glides [w] and [j] is “syncope”, but that did not affect the module, at least. The Nicodene (talk) 21:57, 8 July 2021 (UTC)


@Brutal Russian: Notice how Wiktionary deliberately avoids assigning a 5th century B.C. Attic pronunciation to these terms, which entered Greek in later eras:

δούξ, Κωνστᾰντῑνούπολῐς, Σκλάβος, βίρρος, φραγέλλιον, ἱεράρχης, πάσχα, πίτα
παροικία, Τοῦρκος, κυριακή, σάββατον, στήκω, Ἀλεξανδρέττα, σεβαστοκράτωρ

Periodization matters. The Nicodene (talk) 21:51, 9 July 2021 (UTC)

Concerning Kuching Hakka

@RcAlex36, Justinrleung Kuching Hakka, Kuching (Hopoh) Hakka, or Kuching (Hepo) Hakka is descended from the Hakka variant as used in Hepo, Jiexi, Jieyang, Guangdong, China, and therefore is more similar to Cantonese than the Hakka variants as used in Taiwan, in terms of written characters. I believe that the Taiwanese Hakka standard should not be used to write the Kuching Hakka variant, as some words have different etymology. Wiikipedian (talk) 10:10, 4 July 2021 (UTC)

@Wiikipedian: The word is probably 到 even though it's in 上聲 instead of the expected 去聲. 《客家社會生活對話》中「到₃」、「到₅」功能的重疊及其與臺灣客語的比較 writes it as 到₃. I would like to hear Justinrleung's opinion on this. RcAlex36 (talk) 13:29, 4 July 2021 (UTC)
@Wiikipedian, RcAlex36: In general, if the Cantonese and Hakka words are etymologically the same, we should write them in the same way. There are certain cases where this is difficult, like for the word for tired, where most Hakka sources use 𤸁 but 攰/癐 is used in Cantonese. (This example is kind of irrelevant for Kuching (Hepo) Hakka since another word 𤺪 is used.) We usually follow the Taiwanese standard if the 本字 is not clear and various sources do not agree. We often also consult sources from Guangdong, Fujian and Jiangxi. The written tradition for Hepo Hakka may be similar to Cantonese, but Hepo Hakka should be generally more similar to other varieties of Hakka rather than Cantonese. Most varieties of Hakka spoken in Taiwan come from Guangdong as well, so I don't see why there would be a problem with following the Taiwanese standard. If we're talking about 著 (tó) in the Taiwanese standard specifically, I would say we should not follow it because it is not quite the etymological character and varieties that would have the /au/ vowel for this series use /au/ (like Meixian, for example). I would support writing it as 到. — justin(r)leung (t...) | c=› } 20:09, 6 July 2021 (UTC)

Hebrew transliteration - time to clear the mess.

Hello everybody!

(@Malku H₂n̥rés, Metaknowledge, Fenakhay, Thadh, Lingo Bingo Dingo, Erutuon, Gnosandes, Pinnerup, Fay Freak)

As I'm sure many of you did already, I've noticed the lack of standardisation when it comes to Hebrew transliterations. Put simply: It's a mess. All other Semitic languages seem to deal with it in a much neater way. I've generally noticed the following scenarios:

  1. Simplified/Modern Hebrew transcription only: אֵל‎, בַּיִת‎ (see derived terms, too), קַרְקַע‎, אָחוֹת‎, etc.
  2. Accurate transliteration only:ראש‎ (see transliteration of derived terms), *ḳarḳar-, etc.
  3. Both of the above in random order: עָפָר‎, שיבולת‎, *halak-, *śamš- (note how the order in which they appear is also irregular), etc.
  4. Mistakes of any sort: עֹשֶׁר‎ (ע neither transliterated nor transcribed), *ʕaśar- (ע transliterated as /ʔ/ instead of /'/), פַּרְעוֹשׁ‎ (missing accent), etc.

Some of us have recently been discussing this on Discord, listing different opinions, potential problems and possible solutions. One thing pretty much all of us seemed to agree on was that something needs to be done about it. I'm aware that there have been similar discussions in the past, and also that an attempt to automatise the process through a transliteration template was made (see Module:he-translit). I think it's time to resume the discussion and make some decisions. Just to kick off the discussion, here is what I would like to see on a Hebrew entry:

This seems to me the most Wiktionary-like solution, using the parameter |tr= for an actual transliteration and giving a Modern Hebrew transcription via |ts=. For the indication of the accent when not on the final syllable (that might be necessary to automatise the transliteration/transcription template, from what I understood), I would use the oleh ( ֫ ), as it is apparently already widely used in textbooks and so easily recognisable by many already. Let me know what you think, issues you might see, alternative solutions. Thank you! Sartma (talk) 14:55, 4 July 2021 (UTC)

@Sartma: Hello. I don't know why I was mentioned. I will only support that you really need to use the sign oleh ( ֫ ). For this sign is indeed found in publications on Hebrew grammar. That's all. Gnosandes (talk) 15:05, 4 July 2021 (UTC)
I only recognize the oleh from a Christian who was interested in Biblical Hebrew using it and me saying, "What is that?" and looking it up and finding out it was a cantillation mark from three books of the Tanakh that was otherwise unused, and that it was not relevant to the example of Biblical Hebrew that they were presenting, so I don't support it; I don't think it's as universally recognized as you're claiming it is, and in fact it has a very separate usage traditionally that is not compatible with its usage as a stress marker. פֿינצטערניש (Fintsternish), she/her (talk) 18:36, 5 July 2021 (UTC)
No Modern Hebrew transcription—which would also use |ts= in an unseen manner—is needed as the scholarly transcription already includes everything. That link to WT:HE TR beside each transcription at a header could just tell manaman in an additional column how the IPA values are (it seems that to write IPA there one was too inert like one was too inert to use full characters instead of this cripple-keyboard transcription); additionally I arread that automatic generation of the pronunciations as on עַזָּה(ʿazzā́) is also possible. In a few cases though I have seen Modern Israeli Hebrew being claimed to have retracted stress against the other pronunciations, as on עַזָּה(ʿazzā́). Fay Freak (talk) 15:23, 4 July 2021 (UTC)
Comment: To what extent is the Modern Hebrew transcription not deducible from the Biblical Hebrew transliteration? Because looking at the examples given above, it seems that it is (with the exception of bb > b, but as I understand it, modern Hebrew doesn't have gemination): p̄ = f, ḵ = kh, ṣ = ts, ḇ = v, ṯ = t, š = sh, ʿ = ', long vowels are ignored in Modern Hebrew. If the modern variants are just respellings of the Biblical ones, why even bother giving them? Thadh (talk) 16:26, 4 July 2021 (UTC)
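As a rough illustration of the point above, the correspondences listed there can be applied mechanically. What follows is only a sketch in Python: the name `simplify_translit` and the degemination rule are assumptions made for this example, it is not how Module:he-translit works, and it does nothing about words whose stress position differs in Modern Hebrew.

```python
import re
import unicodedata

# Correspondences listed in the comment above (scholarly -> simplified).
SIMPLIFY = {
    "p\u0304": "f",   # p̄ (p + combining macron)
    "\u1e35": "kh",   # ḵ
    "\u1e63": "ts",   # ṣ
    "\u1e07": "v",    # ḇ
    "\u1e6f": "t",    # ṯ
    "\u0161": "sh",   # š
    "\u02bf": "'",    # ʿ
    "\u0101": "a", "\u0113": "e", "\u012b": "i",  # ā ē ī
    "\u014d": "o", "\u016b": "u",                 # ō ū
}


def simplify_translit(text: str) -> str:
    """Hypothetical helper: reduce a scholarly romanization to a simplified one."""
    text = unicodedata.normalize("NFC", text)
    # Assumption: geminates are written as doubled consonants (bb > b).
    text = re.sub(r"([^aeiou\W])\1", r"\1", text)
    for scholarly, simple in SIMPLIFY.items():
        text = text.replace(scholarly, simple)
    # Recompose a stress accent left behind on a now-plain vowel.
    return unicodedata.normalize("NFC", text)


print(simplify_translit("\u02bf\u0101p\u0304\u0101\u0301r"))  # input: ʿāp̄ā́r
```

With the input ʿāp̄ā́r this prints 'afár. The table covers only the letters named in the comment, so anything else (ḥ, ṭ, q and so on) passes through unchanged.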
One could argue that distinguishing e.g. long vowels from short vowels or w and b in transcriptions for words only used in Modern Hebrew (or other stages of Hebrew in which certain mergers took place) is awkward. Overall I strongly prefer a shift to a more scholarly system, though. ←₰-→ Lingo Bingo Dingo (talk) 16:52, 4 July 2021 (UTC)
@Ruakh, Enoshd, Mnemosientje, פֿינצטערניש I hope we can get some input from people who are competent in Modern Hebrew. ←₰-→ Lingo Bingo Dingo (talk) 16:52, 4 July 2021 (UTC)
No time rn for a long writeup but for starters one thing that is often different is the stress; Fay Freak has already noted the example of עַזָּה, which is pronounced áza in Modern Hebrew with initial stress. It's a bit of an annoying situation; MH transliteration is inadequate for readers interested in Biblical Hebrew, but scholarly transliterations will be confusing as hell to readers interested in Modern Hebrew. For example, there is no difference in pronunciation between kuf and kaf in MH, idem between khet and khaf, vet and vav, tet and tav, etc. and indeed even alef and ayin (except for Mizrahi speakers), but scholarly transliterations show differences nonetheless (e.g. q for kuf and k for kaf, which looks weird to me as a speaker of MH). In transliterations seen in daily life in Israel, q and ḥ and so forth are basically not used as a result, not to mention stuff like ṣ. While the spelling is the same, so to speak, the pronunciations of BH and MH are about as different as those of ancient and modern Greek.
Bottom line is Biblical transliteration, with geminates and long vowels and all these specific characters, will throw off most modern Hebrew speakers and learners, but modern Hebrew transliteration leaves out important info for Biblical Hebrew learners/people interested in comparative linguistics. I think a good compromise would be to include a scholarly transliteration optionally, and have the modern transliteration as a default (i.e. to have the option of two transliterations), so as to not have Biblical transliterations on MH words which would look woefully out of place. So I guess I mostly agree with Sartma's proposal. (Unfortunately the Hebrew headword line is already cluttered AF due to status constructus forms etc. - adding a second translit would certainly not help declutter that situation..) — Mnemosientje (t · c) 18:37, 4 July 2021 (UTC)
Worth noting that the Modern Hebrew translit is the de facto default atm: Wiktionary:About Hebrew#Romanizations. — Mnemosientje (t · c) 18:54, 4 July 2021 (UTC)
@Mnemosientje: I understand that if we only choose Biblical transliteration we risk alienating Modern Hebrew speakers/learners, but it's also true that native speakers do write different letters for kuf and kaf, khet and khaf, and so on, and even in Modern Hebrew if you want to vocalise a word you would use Biblical niqqūds, so part of me thinks that it's just a question of habit: when Romanising Modern Hebrew the norm is to use the "simplified" version, so a proper transliteration would look weird to a native speaker. On the other hand, the simplified version is completely unsatisfactory to whoever studies Biblical Hebrew. Another idea that was brought up on Discord was having independent entries for Modern Hebrew and Biblical Hebrew, the same way that Arabic is separated from its national "spoken" varieties. Could that be an option? Sartma (talk) 22:07, 4 July 2021 (UTC)
Splitting Hebrews would complicate things, and doing this for transcription would be out of proportion.
I do assess though that in so far as one writes different or the same letters, thus one does not confuse Modern Hebrew learners if our transcriptions reflect them. On the contrary, maybe mingling qūp̄ and kāp̄ does not do them justice and promotes spelling mistakes. But as a middle ground you can use the Latin letter ⟨k⟩ with a dot below, ⟨ḳ⟩, which we use anyway for Ethio-Semitic, and Proto-Semitic: *ḳarib- – because @Rhemmiel was a fan of it and I didn’t care one way or the other.
So it is also commendable to transcribe כ‎ as either ⟨k⟩ or ⟨ḵ⟩ because it reflects well that is the same letter. The same goes for all other begedkefet letters. Scholarly transcription is straightforward.
I don’t see any argument why we shouldn’t transcribe שׁ‎ with ⟨š⟩. The háček is well-known. שׂ‎ can be ⟨ś⟩ because it’s rare any way and if anyone doesn’t understand it he correctly reads it without the acute just like ס‎ with ⟨s⟩, which is also like in modern Ethio-Semitic languages (the descendants of *šurš- all read /s/ for conservatively spelt (śä)). Rare ז׳‎‎ runs as ⟨ž⟩ well of course.
צ‎ in turn should not be transcribed as ⟨ts⟩ because it does not count as two consonants but one root consonant so let’s stick to ⟨ṣ⟩, nothing fancy like ⟨c⟩.
And so I got rid of all the digraphs already, in addition to shewing why the macron letters are actually easier. I suspect this “chat romanization” is not actually there to make it easier for readers but for editors, who should know how to type in characters with diacritics – that can be seen by the avoidance of the half rings ʾ ʿ which Modern Hebrew learners can’t take issue with as they take only little notice. Daily life in Israel is hardly the yardstick of what is desired, otherwise we would use Arabic chat alphabet because it is daily life in Egypt – very ugly.
But we can tone it down with the vowels a bit. For Šwā there are less intrusive alternatives, such as U+1D4A MODIFIER LETTER SMALL SCHWA ⟨ᵊ⟩ and U+1D4A MODIFIER LETTER SMALL SCHWA ⟨ᵊ⟩. Macra over ⟨u⟩ and ⟨o⟩ and ⟨i⟩ seem to be of little significance for Hebrew, though theoretically-comparativistically they distinguish root patterns. I have to note that the transcriptions are not made for the purpose of being reverse-engineerable to the Hebrew script: Like in Ottoman Turkish we don’t transliterate the Arabic alphabet by circumflexes because we have the original spelling right next so we employ |tr= to approximate how one actually understood the sounds. Fay Freak (talk) 02:31, 5 July 2021 (UTC)
@Fay Freak: I guess that once we have a template, it will be easy to decide what characters to use, and even change them in case preferences change. To be honest, I would prefer using ⟨q⟩ for ק‎, mainly because pretty much all my Hebrew textbooks (even the Assimil one for Modern Hebrew) transliterate like that, but I'm not against ⟨ḳ⟩ either if that's what the majority wants. As long as we have a consistent, standard transcription, I'll be happy! Sartma (talk) 09:31, 5 July 2021 (UTC)
I agree 100% with @Sartma — scholarly and Modern should both be included, as they are separate things that a reader might be interested in when looking up Hebrew entries. Modern Hebrew transcription is very easy to read for a student of Modern Hebrew, but the scholarly transcription is necessary for people with an interest in Ancient Hebrew and/or the historical development of Hebrew reading traditions. It makes the most sense also to put these under tr and ts parameters, but the main hangup is that a random new editor coming along to edit an entry might not know the system. פֿינצטערניש (Fintsternish), she/her (talk) 10:22, 5 July 2021 (UTC)
@פֿינצטערניש: The idea is that this will all be automatised, so the only thing an editor should know is how to spell the Hebrew correctly with niqqūds, and the transliteration + transcription would appear automatically. Sartma (talk) 11:16, 5 July 2021 (UTC)
Even better. The only thing is that there are cases where spoken Hebrew diverges from what is expected based on the Nikkud, and it should be possible to manually input when that is the case. פֿינצטערניש (Fintsternish), she/her (talk) 13:22, 5 July 2021 (UTC)
Just because it bothers me, I will repeat what I said on Discord. Neither the scholarly Biblical Hebrew transcription nor the Modern Hebrew transcription that we currently use are strict transliterations, even ignoring stress. Both do not have a one-to-one relationship between the Hebrew graphemes and the letters of the romanization, and are not completely reversible. For instance qamats has two transcriptions, ā and o, and matres lectionis aren't transcribed. (For instance bēṯ could be בֵּית‎ or בֵּת‎ or בֵּאת‎.) Thus Module:he-translit was hard to write. (User:Wikitiki89 figured out lots of the edge cases that I had given up on.) The Biblical Hebrew transcription is closer to a transliteration given that it distinguishes more Hebrew graphemes (fricative ב and consonantal ו for instance), but it's not there. Thus strictly speaking it's a kludge to put the scholarly transcription in |tr= and the modern transcription in |ts=. I'm not opposed to the idea, because there isn't a better simple way to do it, just being picky about terminology because we were being picky on Discord. We could try to make an actual transliteration, but that would be unpleasant to read and not very useful for newbies.
There is only one other option I can think of, but it is very difficult and it's perhaps impossible to agree on the details: adding a second transliteration parameter (called who knows what, |tr2=?) and putting the Modern Hebrew transcription in that. Perhaps this could be made to also allow a proper way to display Japanese kana and romaji (currently I think they're shoved in |tr= with a comma). But even if we could agree on doing this, it involves changes to Module:links, which is used by almost everything, so it would be painful. The |tr= and |ts= for Biblical and Modern is the easiest option that we have, even though it's not technically correct. — Eru·tuon 19:35, 5 July 2021 (UTC)
@Erutuon: We can already use |tr2= within {{head}}: עָפָר (ʿāp̄ā́r or 'afár). I wouldn't mind this layout either, it's honest, in a way. Sartma (talk) 00:02, 7 July 2021 (UTC)
@Sartma: Right, just ignore |tr2=; I'm not seriously proposing that as a name because it would conflict with existing parameters in commonly used templates, like {{affix}}. The real parameter name would have to be something else. — Eru·tuon 01:00, 7 July 2021 (UTC)
@Erutuon: I think בֵּית‎ or בֵּת‎ or בֵּאת‎ should be transliterated as bêt̠, bēt̠ and bēʾt̠ respectively. That's how they appear in textbooks where a proper transliteration is used. Sartma (talk) 08:50, 7 July 2021 (UTC)
@Sartma: Well, the exact details of the transcription aren't relevant to my point, but perhaps see Module talk:he-translit § Vowel distinctions for why WT:HE TR and Module:he-translit don't indicate matres lectionis in this way. Wiktionary doesn't have to do exactly what textbooks do. For my part, the circumflexed vowels and silent letters tend to just confuse me about what the phonemes actually are. — Eru·tuon 21:47, 7 July 2021 (UTC)
@Sartma, Erutuon: I learnt with a system where circumflex indicates that the vowel doesn't get shortened because of shifts in accent. However, it would be good to indicate what the matres lectionis are in the transliteration, e.g. by superscript letters. --RichardW57 (talk) 22:46, 7 July 2021 (UTC)
@RichardW57, Sartma: Superscript letters sound a bit clearer to me than circumflexes. There are superscript characters for w, y, h, but I don't know of a superscript ʾ (and it might be hard to distinguish from a regular one). We could use HTML tags, but then the transliteration wouldn't copy and paste cleanly as plain text. I think my book at least sometimes uses parentheses around silent matres lectionis, which has its own problems stylistically because transliterations are usually inside parentheses to start with. — Eru·tuon 21:00, 13 July 2021 (UTC)
@Erutuon, Sartma: There's a whole range of letters suitable for transliterating aleph:
  • ʔ U+0294 LATIN LETTER GLOTTAL STOP (or a casing pair if you really prefer)
  • ʾ U+02BE MODIFIER LETTER RIGHT HALF RING
  • ˀ U+02C0 MODIFIER LETTER GLOTTAL STOP
  • ʼ U+02BC MODIFIER LETTER APOSTROPHE
One might even be able to bring oneself to use U+02BE for anciently sounded and U+02C0 for quiescent. --RichardW57 (talk) 18:39, 14 July 2021 (UTC)
I think ʔ U+0294 LATIN LETTER GLOTTAL STOP for non-mater-lectionis aleph and ˀ U+02C0 MODIFIER LETTER GLOTTAL STOP for mater lectionis aleph would be the clearest, but it's probably a big departure from tradition to not write aleph with the right half ring. — Eru·tuon 19:29, 14 July 2021 (UTC)
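For anyone who wants to double-check the code points and names quoted in this sub-thread, a small Python sketch using the standard `unicodedata` module can list them. The helper name `describe` is made up for this example.

```python
import unicodedata

# Code points proposed above for transliterating aleph.
CANDIDATES = ["\u0294", "\u02be", "\u02c0", "\u02bc"]


def describe(chars):
    """Return (code point, official Unicode name) for each character."""
    return [(f"U+{ord(c):04X}", unicodedata.name(c)) for c in chars]


for code, name in describe(CANDIDATES):
    print(code, name)
```

This only confirms that the four candidates are distinct characters with the names given above. It says nothing about how easy they are to tell apart in a given font.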

@Wikitiki89, Qehath, who might be interested in this. PUC – 11:10, 5 July 2021 (UTC)

Let's make a synthesis of everything that has been said on Discord and here. I spent my day doing it so don't ignore it. Here is not my personal opinion unless there's "I", but the arguments of everyone. It's a big message, though I tried to make it as synthetic as possible, but complete, comprehensive, exhaustive.

Introduction

Hebrew romanization is currently messy because several systems are used simultaneously across Wiktionary, with each user deciding which one to use. A unified romanization throughout Wiktionary's coverage of Hebrew, i.e. a standardization, is necessary. The best way to do so is to use a module, which ensures that the standard is respected. This makes things easier for contributors, who no longer need to write the romanization by hand because it is automated, and for readers, who can trust that the given romanization is free of mistakes and consistent because it is standardized. Module:he-translit already exists thanks to Erutuon, but it is not finished, and we need to agree on a solution in order to finish it. The question isn't "should we?" but "how will we?".

I-Unification: Modern and Biblical Hebrew

Unified also means that the given romanization should be convenient for any stage (chronolect) of Hebrew: Biblical Hebrew (BH), Modern (Israeli) Hebrew (MH) as well as other medieval dialects. In a nutshell, MH has plenty of loanwords and neologisms and can freely pick up vocabulary from BH and later stages, which inherit part of their lexicon from BH, whose lemmata are limited. Therefore, any BH lemma is automatically an MH one too, and the best solution is to label (with {{lb}}, {{tlb}}, {{defdate}} or another) the period since which a term is attested, insofar as it can be used in any later stage, leaving MH loanwords and neologisms without a label. Splitting Hebrew would mean massive duplication, which would be useless in this context. On Discord there was consensus about keeping a single Hebrew header (actually the discussion started with this), which means the romanization will be common to MH and BH, and the automated standard should end the use of two kinds of romanization: one chiefly for MH, somewhat tentative (use of digraphs, several romanizations for one consonant and several consonants romanized the same way), and a scholarly one, initially for BH, but which can also work for MH thanks to its precision. "Work" here means that it's a coherent system, not that MH speakers will be totally familiar with it. Indeed, MH transcription is fully deducible from the BH one, just by ignoring features such as gemination and vowel length. On the other hand, it would be stupid to use BH romanization for MH loanwords and neologisms... MH being the living (thus growing) Hebrew language, MH romanization has to be a default one.

II-Romanization: transliteration and transcription

Romanization means the use of Latin script for a transliteration |tr= (stricto sensu, conversion of the graphemes and diacritics, should be reversible) and/or a transcription |ts= (phonological, therefore proper to each dialect, appears between "//", not necessarily reversible). The module shall serve for romanization, on the head for the entry and when a Hebrew term is mentioned, as well as for a future Hebrew pronunciation module Module:he-IPA, whose starting point to generate pronunciation for several dialects will be the romanization. Remember that the aim of the romanization is to help readers: if we were all Hebrew speakers we wouldn't need this; likewise, it's not meant to be a pronunciation since there's a section for this.

  1. In a pure transliteration, no data is lost, i.e. alef (ʾ<') is distinguished from ayin (ʿ<'), as well as bet (ḇ<v) and vav (v, or w<v), chet (ḥ<kh) and khaf (ḵ<kh), kaf (k) and qof (ḳ(or q)<k), tet (ṭ<t) and tav (ṯ<t), samekh (s) and sin (ś<s). It's precise, it's the scholarly romanization. Stress is not marked since graphemes do not say where it falls, unless using oleh, which would resolve everything as I understand it. Glottal stops are always written, and the value of one letter is independent from the others, i.e. a vav has the same transliteration whether it's /v/, /w/ or /o/. Vowel length can be marked. Lastly, a transliteration isn't IPA-like, so it's better to use <ḇ, ḵ, p̄> than <v, kh, f> since <ḡ, ḏ, ṯ> are also used, and not *<ɣ, ð, θ>. This is due to MH phonology, whereas BH is more parsimonious for a proper transliteration. <ṣ, š> are preferable to digraphs <ts, sh>.
  2. A pure transcription is relative to MH or BH (Sartma wants it for MH) and therefore merges the consonants above and loses vowel length according to MH phonology. However stress is necessarily marked, and some adaptations can be made, for instance due to matres lectionis (such as vav, which can be transcribed as /o/ when it stands for a vowel), for which "it would be customary to use the ^" according to Sartma. Apparently, qamats can be unpredictably /ā/ or /o/; a pure transcription would require the distinction, which is problematic if unpredictable (unless it's /ā/ in an open syllable and /o/ in a closed one, with more or less exceptions). Though it appears between slashes //, IPA symbols are not mandatory.
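To make the contrast between the two items concrete, here is a small Python sketch that groups the letters named in item 1 by their simplified value and reports where a distinction is lost. The table is copied from the pairs listed above (taking w for vav and ḳ for qof). The variable names are only for illustration and this is not part of any existing module.

```python
from collections import defaultdict

# (letter, scholarly symbol, simplified symbol), from the pairs in item 1.
PAIRS = [
    ("alef", "ʾ", "'"), ("ayin", "ʿ", "'"),
    ("bet", "ḇ", "v"), ("vav", "w", "v"),
    ("chet", "ḥ", "kh"), ("khaf", "ḵ", "kh"),
    ("kaf", "k", "k"), ("qof", "ḳ", "k"),
    ("tet", "ṭ", "t"), ("tav", "ṯ", "t"),
    ("samekh", "s", "s"), ("sin", "ś", "s"),
]

# Group letters by their simplified symbol.
merged = defaultdict(list)
for letter, _scholarly, simple in PAIRS:
    merged[simple].append(letter)

# Any group with more than one letter cannot be reversed to the Hebrew script.
for simple, letters in merged.items():
    if len(letters) > 1:
        print(f"{simple!r} is ambiguous between {' and '.join(letters)}")
```

Each of the six simplified symbols turns out to cover two letters, which is the sense in which the simplified scheme is not reversible while the scholarly one keeps the letters apart.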

The question is whether to have both, or only one that has the advantages of both, with some flexibility towards the definitions but still uniform (because automated). Note that both are used simultaneously for languages written in cuneiform or in pure abjads, in which what is written (given in transliteration) is quite different from the phonemic form (given in transcription). IMO it doesn't make a lot of sense to use the former for vowel length and scholarly romanization and at the same time the latter for stress and phonemic transcription. I don't see the point of using both, particularly when a proper transcription would be specific to MH or BH, rather than a single romanization using |tr= and displaying the distinction between each letter, stress, vowel length, all the consonants (even those not pronounced) and matres lectionis, i.e. romanizing vav as <o> when it's <o>. As a result, everybody would be happy, since all the information is in this merged romanization; however, it may be overloaded and therefore put off people interested in MH.

III-Stress: position and implication

The main problem is the stress position:

  1. It is unpredictable. We don't know whether it's absolutely unpredictable for BH or just hard to predict, and whether predicting it would require knowing the part of speech (POS) of the term. But even if it were predictable, there are the MH neologisms and loanwords whose stress position is truly unpredictable.
  2. The best option, because it is by far the easiest for the user and the shortest for the module, which makes it very elegant, is to indicate the stress position by means of a diacritic called oleh (Alt+6 on Windows), which is consistently and widely used in dictionaries and scholarly works (see Sartma's message above), or with another diacritic (e.g. meteg). Perhaps there should be a default stressed syllable when unmarked, say the final one, so that the mark need not be added every time.
  3. It should not be indicated in a transliteration, but only in a transcription, since the graphemes do not say where the stress falls, unless oleh is used (or rather shown). It was suggested to write oleh for this purpose but to remove it when displayed. I think this would be overcomplicated, all the more so for removing it.
  4. Given that diacritics are already added, adding one more won't be a bother. For a transliteration, either it's exclusively for consonants, using the pagename, but nobody would be satisfied with that, or it encompasses the added diacritics, including the one for stress. The romanization will anyway be generated from what's put in the template (with diacritics), not from the pagename (without any), so there's no technical problem to add stress with oleh.
  5. It may change over time, but seldom; I don't know how frequent it is.
  6. Apparently we can't do without the stress position, as stress changes things for vowels, like length or position, which also modifies the transcription, not only the pronunciation. Stress is a phonemic feature, compare /'ál/ and /el/. According to Metaknowledge, "the sticking issue has always been stress. If we mark that in the Hebrew, everything else can be overcome or specified." Thus there are implications of stress for transcription that are totally relevant for the pronunciation section, separating the dialects, but ignored for a transliteration. Those have not been explicitly stated, and I don't know why stress is so important beyond itself.
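The combination discussed in items 2 and 3 (mark stress with oleh in the template input, fall back to final stress when unmarked, and optionally strip the mark for display) could look roughly like the Python sketch below. The function name `split_stress_mark` and its return values are invented for this illustration. The only fact relied on is that oleh is U+05AB HEBREW ACCENT OLE.

```python
OLEH = "\u05ab"  # HEBREW ACCENT OLE


def split_stress_mark(pointed):
    """Hypothetical helper: return (display form, stress rule) for a pointed word.

    Assumption: a word without oleh is stressed on its final syllable.
    """
    if OLEH in pointed:
        # Stress is marked explicitly; drop the mark from the displayed form.
        return pointed.replace(OLEH, ""), "marked by oleh"
    return pointed, "final syllable (default)"


# mem, segol, oleh, lamed, segol, final kaf, sheva
word = "\u05de\u05b6\u05ab\u05dc\u05b6\u05da\u05b0"
display, rule = split_stress_mark(word)
print(display, "|", rule)
```

Locating the stressed syllable itself would need syllabification of the pointed text, which this sketch deliberately leaves out.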

Conclusion

To conclude, there is an observation (Hebrew romanization is messy), a necessity (agreeing on a standard) and a solution (a module) whose exact workings remain to be clarified. I kept things coherent where possible, so that differences of opinion appear clearly to you all. I'll let you discuss below what still needs to be discussed. Then we will create a vote for the main proposal. Malku H₂n̥rés (talk) 17:32, 5 July 2021 (UTC)

Glad to see some initiative here! Firstly, we should be clear that this is not fundamentally an issue of transliteration, but rather an issue of how a language at two distinct stages in its history might be treated as a single unified object. Suffice it to say, the handling of Hebrew is unlike that of other languages on Wiktionary. I agree with @Fay Freak that, as long as Biblical and Modern Hebrew are treated here as a single language, a Biblical transcription scheme should be standardized, as these are the forms from which those of Modern Hebrew and the various reading traditions ultimately derive. Romanization, of course, will be of little interest to Modern Hebrew speakers anyways. Now, we must also be clear that when we speak of “Biblical Hebrew transcription,” we are not referring to a strict transliteration of Tiberian Biblical Hebrew (ie. the Hebrew text and vowel diacritics used in Wiktionary headers). Tiberian ◌ָ ambiguously indicates /ɔ/ or /ɔː/, since vowel length is left unwritten. A transcription which differentiates between o and ā for Tiberian ◌ָ is incorporating information from other reading traditions in which these vowels do not merge in quality. I’ll be able to expand on such issues and their implications later, but for now I’d suggest that anyone interested in familiarizing themselves with the Hebrew vowel system read Benjamin Suchard’s The Development of the Hebrew Vowels. I believe it contains all the information necessary to solve the problem of BH transcription, at least. Rhemmiel (talk) 11:06, 7 July 2021 (UTC)
@Rhemmiel: I didn't read all the 300 pages of that essay, but I think we might want to stick to something more mainstream and in use already in textbooks/dictionaries. I would want Wiktionary to remain "friendly", while still being exact. People who are interested in essays like that certainly won't be using Wiktionary. Sartma (talk) 19:41, 7 July 2021 (UTC)
@Sartma: I'm not sure if you'll be among those working on the module, but those who will be might find the material in this paper helpful when they run into problems determining the right output for a given vowel sign. I suggested it because it contains information relevant to module design, not for anything on the user-end of things 👍 Rhemmiel (talk) 22:55, 7 July 2021 (UTC)
Support final syllable as default stressed syllable and indicating the accent on the Hebrew (with oleh) in all other cases (segolates and unpredictable MH words). Sartma (talk) 08:43, 7 July 2021 (UTC)

Update - New proposal

It looks like the majority of people who commented here agrees on the need to have both transliterations (Modern and Biblical) and on the desirability of an automated template. At this point, we just need to decide a format and start working on the template. My preference would be for something like this:

I would use a full scholarly transliteration for Masoretic BH as found in most textbooks (with ^ for matres lectionis, ā/o for qamets/qamets hatuph, etc.), since it's the most widespread and the only one that allows us not to care about pronunciation issues too much (but I won't oppose alternative proposals if they make sense). What do you think? Sartma (talk) 13:11, 13 July 2021 (UTC)

I think you have essentially two proposals here: showing two Hebrew transliterations in link templates and changing the Biblical Hebrew transcription system to show matres lectionis with circumflexes. Circumflexes currently aren't recommended in WT:HE TR, and Module:he-translit has a complete or near complete implementation without circumflexes.
Showing two transliterations might have the most support, but it requires coming up with a plan and modifying Module:links, which is a big job because the module is so widely used. And it might also require changes to Module:he-translit somehow, though perhaps fewer changes than adding circumflexes would require. Based on my reading of Module talk:he-translit, maybe the module has stalled because we haven't chosen a stress symbol, and because we don't have separate representations in the Hebrew script for the two pronunciations of qamets (אָ) and the pronounced and unpronounced shva (אְ), which are not completely predictable from context? User:Wikitiki89 probably can correct me if he's around. Not sure how easy those problems are to fix, and even if they are fixed, I'm not sure exactly how to proceed in Module:links to enable a second transliteration.
About circumflexes and matres lectionis, I haven't seen clear agreement. I think User:Wikitiki89 is against circumflexes because they're non-phonemic (he did most of the work to figure out the odd cases in Module:he-translit), I haven't been convinced that they're useful because they confuse me, and User:RichardW57 proposed representing matres lectionis with superscripts rather than circumflex (which seems clearer to me if it's possible). — Eru·tuon 19:22, 14 July 2021 (UTC)

Sardinian varieties

Hello. Some time ago, I added a few entries and templates (mainly related to verb conjugations) related to the Sardinian language. In the interest of representing the major varieties, these entries were about Logudorese, Campidanese, Nuorese, Sassarese and Gallurese. Looking at the page for Regional Sardinian, it seems that only the first two (Logudorese and Campidanese) are actually taken into consideration here on Wiktionary. Since there isn't a Wiktionary:About Sardinian page, I was wondering if this was official Wiktionary policy. I'm not sure if there is someone in particular I should be pinging, but the three users related to Sardinian seem to not have been active for quite some time. — GianWiki (talk) 17:50, 4 July 2021 (UTC)

@GianWiki: I don't know much about Sardinian, but Gallurese is a Corsican dialect treated as a separate language on Wiktionary (sdn). Sassarese is equally treated separately (sdc). Thadh (talk) 18:14, 4 July 2021 (UTC)
@Thadh: Thank you very much for your input. I was not aware of that. – GianWiki (talk) 18:18, 4 July 2021 (UTC)
Wiktionary:Language treatment documents what's currently treated as a language vs a dialect, with links to the discussions which led to the treatment, which in this case were both rather short. Gallurese and Sassarese are dialects of Corsican, aren't they, and so are treated separately from Sardinian. Whereas, as best I could tell, Campidanese and Logudorese differed only slightly (and overlapped with 'standard' Sardinian, all in a manner that seemed similar to the dialects of Irish) and so I agreed with the 2014 proposal to merge them under ==Sardinian==. You may know more about it than anyone involved in the earlier discussions did; are you looking to keep Campidanese and Logudorese merged under Sardinian (with labels where necessary) or to split them into separate languages? - -sche (discuss) 18:32, 4 July 2021 (UTC)
As an aside, I wonder if we should link to WT:LT from the Module:languages/data3/a etc pages, to make it more findable. Like "Please check Wiktionary:Language treatment before adding a code to see if it has been intentionally subsumed into something else" or something. - -sche (discuss) 18:36, 4 July 2021 (UTC)
@-sche: Only admins can add new codes, so I'm not sure that wording is appropriate/needed, maybe a simple {{also}} template at Module:languages will do. Thadh (talk) 18:40, 4 July 2021 (UTC)
FYI Nuorese is normally considered a subdialect of Logudorese, notable for its exceptional conservatism. Benwing2 (talk) 20:49, 4 July 2021 (UTC)

Translations by language

Hello, I think this might be a common question, but I couldn't find any answer.

In the same way the "t-needed" template adds the English headword in "Request for translation into XXX" categories, are there categories that contain all English terms for which translations into a target language were already provided (via templates "t", "t+",...), something like "English terms with translations into XXX"?

Sitaron (talk) 19:50, 4 July 2021 (UTC)

@Sitaron: No, else one could see them at the bottom of the page, which I don’t even with “hidden categories” on; it would also use too much Lua memory, so I know a priori it can’t exist. What may exist is that some users create lists for entries containing translations in certain languages in their userspaces by bot. You may be interested in categories of the format Category:Requests for review of Khmer translations as containing translations needing attention. Fay Freak (talk) 22:38, 4 July 2021 (UTC)

Oriya vs. Odia

For reasons unclear to me, it's become hip in India to pass laws renaming cities and states. In this vein, "Oriya" got renamed to "Odia" by law in 2011. It's not obvious to me that "Odia" does any better at representing the native pronunciation [oˈɽia] than "Oriya", but given that this change was made at Wikipedia and that even Wiktionary's definition of Oriya labels it as "historical", should we rename the categories here? The placename categories e.g. Category:Odisha already use the spelling "Odisha" in place of "Orissa". Benwing2 (talk) 20:45, 4 July 2021 (UTC)

  • I prefer the spellings Oriya and Orissa, as they are well-established English spellings. The government may use ‘d’ for ṛ, ‘aa’ etc. for long vowels, and ‘sh’, ‘ch’ etc. for ś, c, but on Wiktionary we follow the good transliteration albeit sans diacritics: therefor r, a, s, c. (For older languages such as the Prakrit varieties, I would prefer using diacritics, though.) The government’s passing laws to change toponyms should ill encourage us to follow suit, in our categories. ·~ dictátor·mundꟾ 21:43, 4 July 2021 (UTC)
  • “Odia” is ominous for it seems odious, literally the plural of odium (hate). The insight didn’t reach the political will of India but here in the West we must ward them from losing their faces. More, I am skeptical of any language name that is shorter than five characters—there often pop up homonyms, so it is better to avoid them. Similar are those programming languages and environments which you cannot well search about because of their names hailing from common tools or animals. So here users are better served with “Oriya” when they search since “Odia” has known homonyms even if they aren’t language names. Good reason? Fay Freak (talk) 02:43, 5 July 2021 (UTC)
Wow, here in the West we must protect the Indians from naming their cities as they like? Just, wow.
If the largest English speaking nation in the world uses a name for something in their purview, it's probably the best English name for that thing.--Prosfilaes (talk) 19:55, 5 July 2021 (UTC)
‘largest English speaking nation’: What? It’s not an Anglophone country, but it’s just that English is an official language. Other than the tiny Anglo-Indian community, no one speaks English as a native language: so whatever steps the policymakers of such a country take, well-established spellings must not be changed by us, because we should use spellings that are already in use by native English speakers. ·~ dictátor·mundꟾ 21:09, 5 July 2021 (UTC)
It's the largest nation where English is an official language, and it has more English speakers than any nation besides the US. At a certain point, there are second-language communities that are deserving of note, that create works among themselves and for themselves, like the last millennium of Latin usage, and India has more than hit that point.--Prosfilaes (talk) 23:00, 5 July 2021 (UTC)
Yes indeed! Students from India and other South Asian countries, despite these being formerly an integral part of the British Empire, have to sit English language tests to shew their English qualification before being allowed admission at a college abroad. You are seeing the sheer strength of the population being somehow taught the language at school, but in effect very few have a good knowledge of English, and hardly any with good accent. (And in this particular case of the language name, the policymakers favoured a bad transliteration‡ by rejecting the English spelling, this is not a renaming— there are other instances of renaming, though.) [‡ Our own transliteration: ଓଡ଼ିଆ (oṛia)] ·~ dictátor·mundꟾ 23:08, 6 July 2021 (UTC)
While not super well-versed in the matter personally, I agree that for consistency's sake, it should probably be renamed, especially with the cited references already making the switch. Additionally, Glottolog, SIL, Oxford Dictionaries Online, Collins Dictionary, Wikidata, Wikimedia, and the first instance of "Odia" or "Odiya" on the language's own Wiktionary seen at the top here all have made the change to and use "Odia" as the (primary) spelling as well. Also, see this discussion on the English Wikipedia about the matter, with the eventual decision being to move to "Odia" based on the rapidly increasing usage at the time 6 years ago (which is bound to have increased by now). Having the English Wiktionary be the odd one out, especially when the region itself has made the official change to "Odia" seems a bit strange to me, when even linguistic regulators like SIL & Glottolog that are often cited here have made the change themselves. I'd   Support a renaming. AG202 (talk) 22:30, 7 July 2021 (UTC)
More sources: Cambridge Dictionary, the Oxford English-Odia Dictionary (ISBN: 9780199474554), Microsoft, Google (another Google source), the Concise Oxford Dictionary of Linguistics, and multiple Wikimedia blog posts made by Odia natives which label Odia wiki projects using "Odia". AG202 (talk) 05:57, 11 July 2021 (UTC)
Support renaming per AG202.--Tibidibi (talk) 23:26, 7 July 2021 (UTC) Changed my mind.--Tibidibi (talk) 00:20, 9 July 2021 (UTC)
  No, never, per my earlier comments. I oppose a renaming as someone who works in the language. ·~ dictátor·mundꟾ 17:10, 8 July 2021 (UTC)
I'd take a never comment more seriously if you oppoſed it as ſomeone who works in þe language, instead of opposing one random change no matter how frequently or universally it is used in English. Tilt against windmills, not hand fans.--Prosfilaes (talk) 22:11, 10 July 2021 (UTC)
I don’t see SIL & Glottolog often cited here. And they often create false senses or impressions of what things are viewed as. They contain a lot of ghost languages – so that Wikipedia even contains entries for “languages” whose names even are hardly attestable, recently under WT:RFM. That is if you took a closer look, which for specialized topics are few people, who are then not found on Wikipedia, and could not effect anything there anyway because Wikipedians prefer trashy generalist references by the slant that they are accessible and similar to them in scope, the same way people try to add shoddy Indo-European reconstructions on Wiktionary because they can reference it with the American Heritage dictionary.
Those other dictionaries must be suspect. Like Collins now putting it as the lemma form of “British English” though the laws of India do not apply to Britain. We see that those dictionaries take political decisions before being usage-based. Gestures of obedience to vague overlords.
It always was to astonish how a whole country could obsequiously follow the confusing German orthography reform of 1996 and its later versions, which was nowhere required by law except for government workers themselves. Very similar to when 1933 everyone switched from guten Tag to Heil Hitler, one went with the bellwethers, instead of minding the maxim of Bakunin: “To revolt is a natural tendency of life. Even a worm turns against the foot that crushes it. In general, the vitality and relative dignity of an animal can be measured by the intensity of its instinct to revolt.” (Yes, the conservative rises.)
So the interesting question is: Was the spelling change in India grass roots or grass tops like latest summer’s fashion? (For the references do not indicate without doubt that they are on the former side and not the latter.) Fay Freak (talk) 23:16, 10 July 2021 (UTC)
To be completely honest, the language in your reply was a bit confusing to read and there's a lot that I feel isn't really relevant, but I'll still reply to the relevant points. SIL sets the standard for the ISO 639-3 language codes that are used here, with some changes made here and there, but overall it is cited for new language proposals, deletions, and renamings. Glottolog is also a well-respected linguistic resource and was actually recently cited for the Koreanic family name change that will be happening soon. With Collins, I don't know why they marked it as British English, but if you look at other words like "test," there are specifically British English entries there as well. Oxford is one of the most respected dictionaries and is quoted and cited a ton here throughout many entries. Even the official email for the language for its wikis (see here at the bottom), the titles of its wikis in English, and multiple Wikimedia blog posts made by Odia natives use "Odia". And then to add on, Microsoft & Google (another Google source), Cambridge Dictionary, the Oxford English-Odia Dictionary (ISBN: 9780199474554), and the Concise Oxford Dictionary of Linguistics all use the "Odia" spelling. Thus, if it's an argument of which one is used and established more in English, the answer should be clear. Additionally, there are more sources in the Wikipedia talk page that I sent that show the comparisons of "Odia" vs "Oriya" up to that time, illustrating the trend of "Odia" having more usage in English over time, along with opinions from native speakers. AG202 (talk) 05:40, 11 July 2021 (UTC)
I know exactly what SIL and Glottolog are. The ISO 639-3 language codes we of course reuse here so one doesn’t have to learn twice. I lay emphasis though on Wiktionary as a secondary source. Hence the acceptance of the moves of those dictionaries and encyclopediae was not automatic.
The “major English newspapers in India”, counted in usage at the Wikipedia move discussion, though just with Google, may be more of an argument. But against this one must caution because journalists are a very special kind of people, in the habit of copying from each other. It is a fact that their languages contain elements not found or not found in the same frequency in other types of texts. So we see The Guardian using the striking spelling bandoe for bando, as if they have just learned the word, but if one of these guys starts to spell it that way the others follow suit, no matter how striking it is (because they don’t know or don’t care, the important thing is that they are the spearheads), and there is nothing organical about that.
A table of uses in social media could be more convincing, weren’t it that these were full of bots and ad-men and other paid shills close to government.
I have yet to see expansion on the claim that this is a naturally used spelling and not an orchestrated one. If someone wants a spelling-change and diffuses it top-down it should perhaps have the opposite effect. Fay Freak (talk) 12:06, 11 July 2021 (UTC)
Maybe they are revolting, against some people from the West who think they can dictate names of Indian languages based on Latin. You've yet to see any evidence that this is a naturally used spelling because you've excluded all sources of recent quotes. I don't honestly know what sources would interest you; I suspect none that didn't support you. Journalists are distinctive, partially in that they publish more text than just about anyone else; the Washington Post alone offers 300,000 words a day, not including wire articles. There is nothing "organical" about using words like "organical" in modern English texts, but languages exist, evolve and grow because humans adapt their way of speaking and writing to better communicate with those around them.
As a note of interest, Amazon has a bunch of books labeling themselves as in Odia that Amazon labels as being Oryia.--Prosfilaes (talk) 22:43, 11 July 2021 (UTC)

Once more on hyphenation for Korean suffixes and particles

It's been four months since the latest discussion on this, and I've become once more convinced of the advantages of hyphenation of 1) all suffixes and particles in the verbal paradigm, 2) all postpositions/nominal particles, and 3) native affixes without unbound counterpart. There are the following benefits:

  • This 2011 discussion led to the abolition of hyphens in Korean, but Korean was not in fact discussed; it was all about Japanese. Japanese is kind of a red herring in this discussion because of a fundamental difference in orthography: Japanese does not use orthographic spaces, whereas modern Korean does. Hyphenation makes sense in such an orthographic context.
  • Better readability. More-or-less "complete" entries like (i) end up being very cluttered due to the mix of bound morphemes and full-fledged words, with nearly two dozen independent etymologies. Such long pages would be difficult for learners to navigate.
  • Consistency with other languages. With the exception of Japanese and languages written in the Arabic script, all languages with significant morphology use hyphens to mark bound morphemes, e.g. Sanskrit (Category:Sanskrit suffixes).
  • Consistency with actual practice. Monolingual dictionaries and linguistic works lemmatize verbal suffixes with hyphens, e.g. -더라 in 표준국어대사전. While case markers and the like are not hyphenated in dictionaries, it is not difficult to find various academic sources that do so, e.g. 주격조사 ‘-가’의 발달.
  • When the functionality of hyphenation in Korean romanization is restored (this is one of the stated parameters of Template:ko-usex but has apparently been defunct for several years), hyphenization of particles might be useful in usex formatting. For example:
    {{ux|ko|-이 오길래 -을 감고 얼굴-을 -에 묻었다.}}
If possible, we could tell Module:ko to keep the link to the hyphenated forms but remove the hyphens in the actual display, while the hyphens are retained in the romanization. This would have two benefits. First, it would produce
Nun-eul kkwak gamgo eolgur-eul du son-e mudeotda.
Instead of the current suboptimal
Nuneul kkwak gamgo eolgureul du sone mudeotda.
Second, the hyphenation would allow linking directly to the particles, e.g. -이 (i) instead of (i) in general. Currently, you need to use {{anchor}} or {{senseid}} to do this, and it is annoying to 1) type out the IDs and 2) remember/look up the ID the particle is assigned.
I might be way off on the technical feasibility of this, though. @Suzukaze-c

On the other hand, there are problems with the proposal:

  • The treatment of certain affixes:
    • In my opinion, we should keep all Sino-Korean affixes where they are, at non-hyphenated forms.
    • Native affixes with unbound etymons like (su, male), (gae, fucking (vulgar)), etc.: I am ambivalent about these but I think they can be kept where they are, at non-hyphenated forms.
    • 하다 (hada) and 되다 (doeda) should obviously not be hyphenated because these are not actual suffixes, even though the 표준국어대사전 categorizes them as such.

So only native morphemes that exist only in bound form would be affected.

  • The technical feasibility of a move:
    • There are currently only 214 entries in Category:Korean suffixes, of which about two-thirds on a cursory glance would qualify for hyphenation. There are 74 entries in Category:Korean particles. These are few enough to be moved in a single day.
    • A much bigger issue is links in usexes and quotations, which would have to be manually retargeted. If the proposal passes, I would consider a hard redirect for words like (reul) or 습니까 (seumnikka) just to preserve the functionality of usexes and quotations. There are 342 words with quotations (Category:Korean terms with quotations), all of which would be affected, and 2,486 in Category:Korean terms with usage examples, only some of which would be affected since many of the usexes are poorly formatted and lacking links. The retargeting can probably be done gradually, like the phasing out of {{etyl}}.
    • {{af}} has not been used by editors of Korean until last year, so Category:Korean words by suffix is rather sparse. There seem to be less than 200 pages which would currently be affected by hyphenation of verbal suffixes.

I would note that the technical issues will only grow as the quality of Korean entries improves, so if hyphenization is ever to be done, it should be as soon as possible.--Tibidibi (talk) 08:23, 5 July 2021 (UTC)

Pinging participants in the previous discussion: @Solarkoid, Eirikr, Omgtw15, Atitarev.--Tibidibi (talk) 08:24, 5 July 2021 (UTC)
Support. — Omgtw15 (talk) 10:02, 5 July 2021 (UTC)
I support the hyphenation on entries, but I would never use them in usex, so I oppose that. Hyphens don't belong in a Korean text. Sartma (talk) 10:35, 5 July 2021 (UTC)
@Sartma I think you misunderstand; the hyphens would be stripped by code, leaving their trace only in the links and the Romanization (but not visible in the display of Korean text). I'm not sure if this is technically feasible, however. This used to be possible in {{ko-usex}} until around 2017, but then the relevant module was rewritten and it became defunct.--Tibidibi (talk) 10:44, 5 July 2021 (UTC)
@Tibidibi: Argh! Yes, I did misunderstand! I'm terribly sorry for that, I didn't read your text properly. I just had a moment of freak-out/panic when I saw the hyphens in the Korean. (^m^;;;; Sartma (talk) 11:00, 5 July 2021 (UTC)
@Tibidibi I'm not following everything above but I'm pretty sure the technical issues are solvable and I can help with them. I reimplemented Module:compound a couple of years ago (which handles {{affix}} among other things) and cleaned up the support for hyphens; changing the handling of hyphens on a language-specific basis is pretty easy. I'm sure Module:usex can be fixed to support whatever hyphen-related functionality you want (for that matter, anything in {{ko-usex}} should be foldable into Module:usex so we don't need a special Korean-specific version). Benwing2 (talk) 19:45, 5 July 2021 (UTC)
@Benwing2 What would need to be done is that {{usex|ko|[[사람]][[-이]]}} should produce:
사람
Saram-i
That is, the hyphens are preserved in the wikilink and romanization but stripped in the display of the text. Would this be possible?
If getting them to show up in the romanization is too hard (I believe hyphens are currently automatically suppressed by the Korean transliteration module), that can be delayed to some later time.--Tibidibi (talk) 01:00, 6 July 2021 (UTC)
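A minimal sketch of the display/link split requested above, written in Python rather than in the Lua that the real modules use. The function name `split_usex` and the rule of stripping every hyphen from the display text are assumptions made for illustration only; this is not how Module:usex or {{ko-usex}} is actually implemented.

```python
import re

# Matches [[target]] or [[target|label]] wikilinks.
WIKILINK = re.compile(r"\[\[([^\[\]|]+)(?:\|([^\[\]]+))?\]\]")

def split_usex(text):
    """Return (display, targets) for a usex string.

    display: wikilinks unwrapped and all hyphens removed (naive assumption).
    targets: every link target exactly as written, hyphens included.
    """
    targets = [m.group(1) for m in WIKILINK.finditer(text)]
    unwrapped = WIKILINK.sub(lambda m: m.group(2) or m.group(1), text)
    display = unwrapped.replace("-", "")
    return display, targets

display, targets = split_usex("[[사람]][[-이]]")
print(display)   # 사람이
print(targets)   # ['사람', '-이']
```

The link targets keep the hyphen, so a link could still point at the hyphenated particle entry, while the displayed Korean text contains no hyphen. The romanization step would work from the hyphenated string and is not shown here.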
@Tibidibi This can definitely be done, although it would be easier and cleaner to implement if hyphens weren't suppressed by the translit. Is there a reason they are stripped? If they are stripped in most places in the translit but not in usexes, it effectively means we have two different transliteration schemes. Benwing2 (talk) 01:17, 6 July 2021 (UTC)
@Benwing2 Until a few years ago, hyphens were not suppressed by the translit and were in fact used to separate particles from the noun, as I'm suggesting we return to (you can still see hyphenated usexes in some of the oldest usage examples). This was removed after a redesign of Module:ko-translit, apparently because the hyphens interfered with the transliteration. {{m|ko|얼굴-에}} is supposed to be transliterated as eolgur-e, because /l/ is realized as [ɾ] in intervocalic position within a word, but the hyphen made it think these were two separate words and transliterated eolgul-e. The hyphens were then suppressed in the transliteration to prevent this.
Do you think you could fix this?--Tibidibi (talk) 01:36, 6 July 2021 (UTC)
@Benwing2 OTOH this account might not be totally correct since I wasn't around for it, and there isn't any official statement I could find about why the hyphen functionality was removed from the translit. I think the user who changed it was Wyang, who is now gone.--Tibidibi (talk) 02:05, 6 July 2021 (UTC)
@Tibidibi This would be easy to fix as long as there aren't genuine cases where hyphens are used to separate two words and the written l needs to be transliterated as l in those cases. Benwing2 (talk) 02:10, 6 July 2021 (UTC)
@Benwing2 There are no such cases.--Tibidibi (talk) 02:14, 6 July 2021 (UTC)
@Tibidibi OK, I should be able to get to this within a day or so. I see where in the code it's removing the hyphens, and it's just a case of figuring out where it converts l to r before a vowel and making it ignore an in-between hyphen. In the meantime can you construct some test examples for me with hyphens in them that might be tricky for the module to get right, along with what you expect to be generated? 얼굴-에 (eolgure) is one such example. Benwing2 (talk) 02:23, 6 July 2021 (UTC)
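The rule described here (convert l to r before a vowel, ignoring an in-between hyphen) can be sketched as follows. This is a hedged Python illustration, not the code of Module:ko-translit: it assumes a hypothetical input that has already been romanized syllable by syllable with every ㄹ written as "l", and the name `flap_l` is invented for the example.

```python
import re

# Letters that can begin a vowel or glide in the romanization.
VOWEL_START = "aeiouwy"

# An "l" that is not part of a doubled "ll" (with or without a hyphen
# between the two), optionally followed by a hyphen, then a vowel.
L_TO_R = re.compile(r"(?<!l)(?<!l-)l(-?)(?=[%s])" % VOWEL_START)

def flap_l(naive):
    """Turn l into r before a vowel, keeping any hyphen in between."""
    return L_TO_R.sub(r"r\1", naive)

print(flap_l("eolgul-e"))  # eolgur-e
print(flap_l("eolgule"))   # eolgure
print(flap_l("mul-lo"))    # mul-lo
```

The optional hyphen is captured and written back unchanged, so the hyphenated and unhyphenated inputs get the same consonant. The two lookbehinds leave doubled "ll" alone, which is one of the tricky cases worth adding to the test examples.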
@Benwing2 Supporting bold formatting would also be nice. Module:ko-pron/testcases. —Suzukaze-c (talk) 06:43, 7 July 2021 (UTC)
@Benwing2 Sorry to bother you, but would it be possible for you to get to this by the end of this weekend?--Tibidibi (talk) 00:21, 9 July 2021 (UTC)
@Tibidibi Yes. My apologies, I will try to get to this tomorrow (Sunday). Benwing2 (talk) 05:59, 11 July 2021 (UTC)
  Support. The page for is getting very unwieldy, and I agree that the hyphenated entries would lead to less confusion and would be easier to find for the everyday user. I would also suggest that Jeju and its entries be added to this proposal as well, just for consistency's sake. AG202 (talk) 22:02, 5 July 2021 (UTC)
  Support. The various points made above all make sense to me, and I have no concerns with this proposal moving forward. ‑‑ Eiríkr Útlendi │Tala við mig 00:15, 7 July 2021 (UTC)
  Support. kwami (talk) 23:16, 10 July 2021 (UTC)

FYI, this is what McCune–Reischauer says with regard to this very issue.

"The nouns, likewise, should be written together with their postpositions, including those called case endings, not separately as in Japanese, because phonetically the two are so merged that it would often be difficult and misleading to attempt to divide them."

For example, should 낮에 be romanized as "naj-e" or "nat-e" or "na-je"? All of these are unsatisfactory and misleading. This is why McCune–Reischauer says that it should be simply "naje". --2607:FB90:5AEA:E837:B427:4261:38B5:2C21 04:00, 12 July 2021 (UTC)

낮에 is /nat͡ɕe/ phonemically and 낮-에 morphologically, so naj-e should be the preferred hyphenation and transliteration. Transliterating 낮에 as naje does not respect the morphophonemic nature of contemporary Hangul orthography, so if anything the lack of hyphenation is what is unsatisfactory. There is a clear orthographic distinction between 낮에 and 나제; how else would you mark this in the transliteration?
The only issue (which is honestly more an issue with Revised Romanization than with the principle of hyphenation in itself) is the clusters involving /h/ that surface as single aspirate consonants, but I don't see it as that big of a problem.--Tibidibi (talk) 04:18, 12 July 2021 (UTC)
Well, this is what McCune–Reischauer (and I) mean.
"naj-e" is misleading because it suggests that 낮 is pronounced /nat͡ɕ/ in isolation.
"nat-e" is also misleading because 낮 is not pronounced /nat/ when followed by a particle (postposition).
"na-je" is also misleading because it doesn't match morpheme boundary.
So it should be simply "naje".
낮에 and 나제 don't have to be distinguished in romanization. Rather, due to hangul's syllabary-like feature (모아쓰기), those two were "made" to be distinguished in Korean spelling. If Korean didn't use a writing system with syllabary-like feature, then the orthographic distinction between those two would not exist from the beginning (i.e., there would only be ㄴㅏㅈㅔ from the beginning). --2607:FB90:5AEA:E837:B427:4261:38B5:2C21 05:18, 12 July 2021 (UTC)

Splitting WT:RFVN

I split off the CJK-related stuff in WT:RFVN into WT:RFVCJK. I did this by checking for either the language codes 'zh', 'ja' or 'ko' in the entry title or an East Asian character in the entry title. As a result, a couple of entries not in Chinese/Japanese/Korean got moved: I noticed one in Vietnamese and one in Ainu. Not sure this is desired; if not, I or someone else can move those two entries back to WT:RFVN. I fixed {{rfv}} and {{rfv-sense}} so they will automatically add to WT:RFVCJK instead of WT:RFVN if the language is Chinese, Japanese or Korean. The result of this is that WT:RFVN is now about 2/3 its previous size, which should help somewhat, although IMO it's still unwieldy. I'm thinking of also splitting off the Romance languages, as discussed prior; this will require a bit more work as there are more Romance languages than CJK languages and you can't identify pages to split off by character set. Benwing2 (talk) 21:08, 5 July 2021 (UTC)
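The selection rule described above (language code 'zh', 'ja' or 'ko', or an East Asian character in the entry title) can be sketched as below. This is only an illustration in Python: the function names and the exact list of Unicode ranges are assumptions, not the script that was actually run.

```python
CJK_LANGS = {"zh", "ja", "ko"}

# Assumed set of "East Asian" code point ranges for this sketch.
EAST_ASIAN_RANGES = [
    (0x3040, 0x309F),    # Hiragana
    (0x30A0, 0x30FF),    # Katakana
    (0x3400, 0x4DBF),    # CJK Unified Ideographs Extension A
    (0x4E00, 0x9FFF),    # CJK Unified Ideographs
    (0xAC00, 0xD7A3),    # Hangul Syllables
    (0x20000, 0x2A6DF),  # CJK Unified Ideographs Extension B
]

def has_east_asian_char(title):
    """True if any character of the title falls in one of the ranges."""
    return any(lo <= ord(ch) <= hi
               for ch in title
               for lo, hi in EAST_ASIAN_RANGES)

def goes_to_rfvcjk(lang_code, title):
    """Apply the rule: CJK language code OR East Asian character in title."""
    return lang_code in CJK_LANGS or has_east_asian_char(title)

print(goes_to_rfvcjk("ja", "sushi"))   # True (language code)
print(goes_to_rfvcjk("ain", "アイヌ"))  # True (katakana title)
print(goes_to_rfvcjk("vi", "chữ"))     # False
```

Because the second test looks only at the script of the title, an Ainu entry written in katakana or a Vietnamese entry written in Han characters matches even though its language code does not, which is consistent with the two stray entries mentioned above.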

I think Ainu is OK, ain should be in Template:rfv. For Vietnamese, four contributors: Ekirahardian, PhanAnh123, Bula Hailan, ColePeltier93 can all write some Chinese Characters. EdwardAlexanderCrowley (talk) 04:04, 7 July 2021 (UTC)

Deleting Italian reflexive participles

(Notifying GianWiki, Metaknowledge, SemperBlotto, Ultimateria, Jberkel, Imetsia, Sartma): User:SemperBlotto bot-created thousands of Italian reflexive participles some time back. Examples:

AFAIK, all of these forms are obsolete and the vast majority are both unattested and unattestable because they are not part of the modern language and no one is composing texts in obsolete Italian any more (unlike e.g. Latin). I can't actually find a single one of these terms that has any non-bot edits.

Yes, "all words in all languages" but Wiktionary also has attestation criteria (WT:CFI) and the vast majority of these terms would fail them. I propose to delete all of them and require at least one actual attestation before adding any of them back (which means no adding by bot). Benwing2 (talk) 22:11, 5 July 2021 (UTC)

I searched for a few at random and they didn't even give Google hits. I support deleting them and putting the burden of proof on anyone who wants to add them by hand to try to cite them. Ultimateria (talk) 01:56, 9 July 2021 (UTC)

Anagrams

Some languages (e.g. English, Italian) have anagram sections at the bottom. These haven't been updated in over 10 years; User:Conrad.Bot was doing it but is no longer active. Should we (a) leave them alone, (b) delete them, (c) update them (somehow)? Benwing2 (talk) 05:25, 7 July 2021 (UTC)

Isn't User:NadandoBot doing this already? I would support updating them in any case though. Thadh (talk) 09:19, 7 July 2021 (UTC)
Yes, in the recent past I updated them (for English, Finnish and Danish only). I mainly find it impractical due to the large number of pages that may have to be edited (of course the program does it, but still). Maybe there is a solution that doesn't require editing each individual page, and crucially does not have a large memory footprint. DTLHS (talk) 02:00, 8 July 2021 (UTC)
@DTLHS Given the current way of handling anagrams, the only way I see of updating them is to go through the dump file, construct a map from alphagrams to pages containing those alphagrams, check each page in the dump file to see if its anagrams need updating, and edit those pages needing updating. I assume your code does essentially this. This will take a good amount of memory esp. for English. If you don't have enough memory to hold the whole map, it should be possible to hack up a map-reduce type of solution to process the dump in chunks, using the disk for intermediate storage. I have done that in the past to do things like sort the Wikipedia dump file on a 16G memory Macbook Pro.
BTW I assume you used the {{anagrams}} template and inserted an alphagram at the beginning? I found an example from late 2018 where your bot formatted things this way (angriest). I can't find any Danish examples though. The Italian anagrams use a random combination of formatting with {{anagrams}}, with multiple calls to {{l}} on a single line, and with each anagram on its own line formatted with {{l}}. Benwing2 (talk) 05:00, 11 July 2021 (UTC)
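The alphagram map described above can be sketched in a few lines of Python. This is a minimal in-memory illustration, not DTLHS's bot code: the function names are invented, the titles are a toy list rather than a dump file, and the normalization (case-folding, letters only) is an assumption that may differ from what {{anagrams}} expects.

```python
from collections import defaultdict

def alphagram(title):
    """Sorted, case-folded letters of a title (assumed normalization)."""
    return "".join(sorted(ch for ch in title.casefold() if ch.isalpha()))

def build_anagram_map(titles):
    """Map each alphagram to the list of page titles that share it."""
    groups = defaultdict(list)
    for title in titles:
        groups[alphagram(title)].append(title)
    return groups

def anagrams_of(title, groups):
    """All other titles with the same alphagram as the given title."""
    return [t for t in groups.get(alphagram(title), []) if t != title]

titles = ["ate", "eat", "ETA", "tea", "dog"]
groups = build_anagram_map(titles)
print(alphagram("ETA"))            # aet
print(anagrams_of("eat", groups))  # ['ate', 'ETA', 'tea']
```

Checking whether a page needs updating then amounts to comparing its existing anagram list with `anagrams_of(title, groups)`. For a dump too large to hold in memory, the same grouping could be done in chunks with intermediate results written to disk, as suggested above.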
I should have said Swedish. And that was my method. The memory wasn't a problem, it was more of a problem keeping up with English which has lots of new pages added every month. DTLHS (talk) 00:34, 12 July 2021 (UTC)
Update them if you can. Otherwise leave them alone. It's not your business to delete good content. Jesus I've had enough of you people hating on anagrams. Equinox 09:29, 7 July 2021 (UTC)
@Equinox Really, "you people"? Have you seen the usage note on that term? Benwing2 (talk) 01:02, 8 July 2021 (UTC)
>Telling me to learn English
I had no idea about that and I speak British English. It's very rude of you to expect that your American rules apply across the world (but typical). Equinox 05:52, 10 July 2021 (UTC)
You must have a stick up your ass or something; I really don't know what your issue is but you seem to relish insulting people and picking fights. Benwing2 (talk) 19:05, 10 July 2021 (UTC)
But leave them alone or update them. I see no reason for discarding them. Like nobody would come about and censure us as a sketchy website by reason that we haven’t even updated anagrams. Fay Freak (talk) 01:37, 8 July 2021 (UTC)
We could do anagrams the way we historically did rhymes, i.e. have a page like "Anagram:aet" that lists ate, eat, ETA, etc, and then all of those pages just link to Anagram:aet. Then, whenever a new anagram is added, only the page for the new anagram and the central Anagram: page have to be updated, not every single other word spelled with those letters. Or we could do this in the way that it's more recently been proposed to do rhymes, i.e. with categories, which could be automatic and perhaps even easier to maintain. - -sche (discuss) 18:02, 8 July 2021 (UTC)
@-sche Can you explain the more recent proposal for rhymes, and how it would apply to anagrams? Benwing2 (talk) 04:38, 11 July 2021 (UTC)
The way I imagine anagram categories working would be that {{head}} would add a category with the sorted characters as an alphagram. This seems dubious to me as there would be a very large number of alphagram categories most of which would have only a single entry. DTLHS (talk) 00:26, 12 July 2021 (UTC)
We could do it the hard way and require an entry to provide an anagram (not itself) for it to be included in the category. However, what do we do about head's, which I presume is an anagram of shade, but is not eligible for an entry. What languages do we have definitions of anagrams for? We currently record English face and café as anagrams of one another, but we can't treat a and ä as the same for Danish. For Danish, do we get unusual rules such as aa and å matching? For Swedish, I presume ö is a different letter to o. What about Welsh anagrams? @Octahedron80 pushed the rule that two words are anagrams of one another if they have the equal bags of Unicode letters (I'm guessing in form NFD), but that gives what seem to be some odd results in Thai, such as spacing vowel symbols counting but not non-spacing vowel symbols. --RichardW57 (talk) 17:15, 12 July 2021 (UTC)
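The language-dependence raised above can be made concrete with a small Python sketch. The rule table is purely hypothetical (it is not what {{anagrams}} or any existing module does): each language lists the precomposed letters that count as letters in their own right, and every other character is decomposed with NFD and stripped of combining marks.

```python
import unicodedata

# Hypothetical per-language rules: precomposed letters that are NOT folded
# onto their base letter. Anything not listed here loses its diacritics.
KEEP_AS_LETTERS = {
    "en": set(),
    "da": set("æøå"),
    "sv": set("åäö"),
}


def alphagram(word, lang):
    """Return the sorted bag of letters of `word` under the rules for `lang`."""
    keep = KEEP_AS_LETTERS.get(lang, set())
    letters = []
    for ch in unicodedata.normalize("NFC", word.lower()):
        if ch in keep:
            letters.append(ch)
            continue
        for part in unicodedata.normalize("NFD", ch):
            # str.isalpha() is False for combining marks and punctuation,
            # so accents and apostrophes are dropped here.
            if part.isalpha():
                letters.append(part)
    return "".join(sorted(letters))
```

Under these assumed rules English face and café share a key, while Swedish ö stays distinct from o. Digraph equivalences such as Danish aa and å are not handled and would need an extra rule. Because non-spacing marks fail the `isalpha()` test, a filter like this would presumably also reproduce the Thai behaviour mentioned above, where spacing vowel symbols count and non-spacing ones do not.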
Ignoring non-spacing symbols follows the same thinking as Thai Scrabble "คำคม" or a general crossword, which put ะ า ำ เ แ โ ใ ไ into separate cells. Not ignoring the symbols will lead to far fewer results. The rules for anagrams differ per language. (I made {{th-anagram}} for Thai.)--Octahedron80 (talk) 23:30, 12 July 2021 (UTC)
Whereas in Thai crosswords some cells have non-square shapes to accommodate marks above and below. I would have expected the elements for natural anagrams to correspond to primary sort keys for collation; you're saying technological limitations have prevailed. (The elements you list are also the elements for intra-word spacing.) Do Thais see any sort of significance in words or phrases being anagrams of one another? I know spoonerisms are significant. --RichardW57m (talk) 08:35, 13 July 2021 (UTC)
I don't know well what you mean. To not ignore non-spacing marks, it was possible, but I chose not to apply. Thai has too many letters to match; not like Latin just A-Z. Compare Module:th-anagram/processed data resulting 1000+ sets and User:Octahedron80/sandbox resulting only 200+ sets. (Intra-space is ignored as same as punctuation marks.) Do you really want the latter? Please confirm and I will replace. About spoonerisms, most of them are SOP's or meaningless; you won't find so many words in Wiktionary. --Octahedron80 (talk) 08:53, 13 July 2021 (UTC)
Do Thais lack a naturalised concept of 'anagram'? If they have one, that is what we would want. If the only Thai concept is one of rearrangements of letters on tiles for a board game, then I am not sure we want the Thai concept of anagrams. The 1000+ sets strike me as wrong, but that reflects my (English) culture and attempt to understand others. It's possible that we might want an English-speaker's notion of anagrams of Thai words, though Los Angeles and Delhi might not have a common concept. What do others think? --RichardW57m (talk) 12:34, 13 July 2021 (UTC)
Anagram is really invented to play puzzles and board games, anyway. No others' comment for 7 days. I will change to match all letters then. --Octahedron80 (talk) 00:05, 21 July 2021 (UTC)
At least option a, if somebody is up for implementing option c or -sche's proposal, that's fine. ←₰-→ Lingo Bingo Dingo (talk) 09:21, 10 July 2021 (UTC)
Per - -sche, the way it's heading, we absolutely should just go ahead and create an Anagrams: namespace comparable to what we do with the Rhymes: namespace. I would prefer namespace pages to categories, as we still have no system (so far as I am aware) of watchlisting category contents to be alerted when a new categorization is made. bd2412 T 05:12, 11 July 2021 (UTC)

Some tips here if somebody wants to make an anagram module. (I hope you can understand.)

How to find anagrams

  1. Get a list of every word in a language. Both lemmas and non-lemmas are expected.
  2. Make an index for every entry of the list by binary-sorting its letters, with or without diacritics depending on the language. (See alphagram)
  3. Collect entries whose indices match each other, except blank and 1-character indices. The blank index will occur for sure.
  4. Drop unmatched, isolated entries. The anagrams remain.

--Octahedron80 (talk) 02:14, 13 July 2021 (UTC)
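The four steps above could be sketched in Python roughly as follows. This is an illustration only, not an existing module: the function name `find_anagrams` and the default key (plain code-point sorting, with no diacritic handling) are assumptions, and a language-specific key function can be passed in instead.

```python
from collections import defaultdict


def find_anagrams(words, key=lambda w: "".join(sorted(w))):
    """Group words by alphagram and keep only groups with two or more members."""
    index = defaultdict(list)
    # Step 1: `words` is the full word list; dict.fromkeys removes duplicates
    # while keeping the original order.
    for word in dict.fromkeys(words):
        k = key(word)  # Step 2: index each entry by its sorted letters
        if len(k) <= 1:
            continue  # Step 3: skip blank and 1-character indices
        index[k].append(word)
    # Step 4: drop isolated entries; what remains are the anagram sets.
    return {k: group for k, group in index.items() if len(group) > 1}
```

For example, `find_anagrams(["ate", "eat", "tea", "zebra"])` keeps the three words sharing the index "aet" and drops "zebra".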

PS - Japanese anagrams would be a catastrophe, since one would have to extract all kana from kanji.

@Benwing2 Sorry I missed your question earlier, but yes, like DTLHS says, the way to do anagrams via categories would be to have categories for the "alphagrams" ("Category:English anagrams of aet", for example, listing "ate", "eat", etc). These could either be added automatically by {{head}} iff that wouldn't use too much memory (but it would result in many categories with only one entry, as DTLHS warns), or added manually or by bot (using less memory, and allowing addition to be restricted to cases where the category would have multiple entries). Even manual or bot categorization would be somewhat less tedious than the current setup, because when an entry is deleted, no other pages would have to be updated, unlike at present. (With an "Anagrams:" or "Anagram:" namespace, only the entry and the one central anagram page would have to be updated whenever an entry was created or deleted, also reducing how many pages have to be updated.) - -sche (discuss) 01:13, 15 July 2021 (UTC)
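To make the difference in edit counts concrete, here is a toy Python model of the central-page idea. Everything in it is hypothetical (the class, the "Anagrams:" page titles, the lowercase code-point key); it only counts which pages would need an edit under each scheme.

```python
from collections import defaultdict


class AnagramIndex:
    """Toy model of one central 'Anagrams:' page per alphagram (hypothetical design)."""

    def __init__(self):
        self.pages = defaultdict(set)  # alphagram -> titles listed on the central page

    @staticmethod
    def key(title):
        return "".join(sorted(title.lower()))

    def edits_under_current_scheme(self, title):
        """Pages to edit today: the new entry plus every existing anagram of it."""
        return {title} | self.pages.get(self.key(title), set())

    def add(self, title):
        """Record a new entry; return the pages edited under the central-page scheme."""
        k = self.key(title)
        self.pages[k].add(title)
        return {title, f"Anagrams:{k}"}

    def remove(self, title):
        """Record a deletion; only the central page needs updating."""
        k = self.key(title)
        self.pages[k].discard(title)
        return {f"Anagrams:{k}"}
```

With "ate", "eat" and "tea" already indexed, creating "eta" would touch four pages under the present per-entry lists but only two with a central page, and the gap grows with the size of the anagram set.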

Taxonomic names

'The question of inclusion of taxonomic names is a matter for, first, the Beer Parlor, then a vote' (DCDuring). My opinion is to include both the generic epithet and the full binomial name (if I got my terminology right) as entries: really I just want their etymologies in Wiktionary. I'm not so convinced about including specific epithets by themselves; in particular, I don't think having a specific epithet entry would be enough to not include the binomial entry, since there could be more than one way in which the attribute described by the specific epithet is used in binomial names. Kritixilithos (talk) 14:15, 7 July 2021 (UTC)

We should not be including binomials (Pan troglodytes, Peromyscus maniculatus). As analogies, consider the rules for full names of people and chemical formulae. But more important to me, the internet is already overloaded with automatically created lists of species names. Most of these are harmful because they hide the few legitimate sources of information. ceb.wikipedia is a notorious offender and should be nuked from orbit. Wikispecies is the place for these names. We should also not be including specific epithets (nevadensis, slossonae) that exist only in taxonomy but I feel less strongly about this. Hit it with a bunker buster instead of a nuke. Vox Sciurorum (talk) 18:47, 18 July 2021 (UTC)

References vs. Further reading

Currently, the Italian entries randomly put links to the same term in additional dictionaries either under "References" or "Further reading". I would like to clean this up. I assumed that these links ought to go under "Further reading" on the assumption that References is reserved for footnotes, but WT:ELE says this under "Further reading":

The “Further reading” section contains simple recommendations of further places to look.
This section may be used to link to external dictionaries and encyclopedias, (for example, Wikipedia, or 1911 Encyclopædia Britannica) which may be available online or in print.
This section is not meant to prove the validity of what is being stated on the Wiktionary entries (the “References” section serves that purpose).

After reading this, especially the final sentence, I'm thoroughly confused. These links do consist of pointers to external dictionaries (hence the second-to-last sentence makes sense), but the links are not provided just for random edification but usually to help verify the correctness of the Wiktionary entry. For example, when I insert a link to {{R:it:DiPI}} (a dictionary of Italian pronunciation), it's often because there's something in the pronunciation that isn't obvious (e.g. the presence of secondary stress or a hiatus), and the link is what I used to source this pronunciation. Many existing entries contain links to {{R:it:Treccani}}, which was clearly the source for the definition(s) in the entry, since this is the most authoritative Italian dictionary. So ... do these links go under "Further reading" or "References"? It should not come down to the purpose of the links (whether they are for verification or "simple recommendations of further places to look"); that is entirely subjective and an impossible standard to make use of. Benwing2 (talk) 03:24, 8 July 2021 (UTC)

One could make it more objective by adding a comment (perhaps an HTML comment) to the references to say why the reference is being cited as a reference. I'm not happy with the rule that we don't use in-line references, but I appreciate that they can look ugly and sometimes be confusing. An HTML comment should protect against an editor excusably moving the reference to 'further reading'. --RichardW57 (talk) 05:05, 8 July 2021 (UTC)
I feel like the distinction between these which was intended when they were separated is clearly so commonly not maintained in practice that it (together with the questions above about whether a distinction is even maintainable) calls into question whether we should just combine the headers (again?). Even I often just put all the references I add under References, or under whichever header is already present. - -sche (discuss) 18:09, 8 July 2021 (UTC)
I have recently added "References" sections to entries without specifying what I was referencing, as a kind of general "reference". I did so because there was very little to read on those dictionary entries beyond "this is a variant form of another word x". If I need to change over to "Further Reading", let me know. --Geographyinitiative (talk) 18:26, 8 July 2021 (UTC)
My thinking is that "Further reading" = "works consulted" and "References" = "works cited". So at corretja, I got a date on the term from one source, but the definitions are based on a combination of these sources, so I treated them differently. Definitions themselves are technically only sourced through citations—except in LDLs, so I'm not sure how that affects things. (Presumably a reference on an LDL page sources all the info on it.) I'd be fine with merging the headers mainly to avoid inconsistency. Ultimateria (talk) 21:59, 8 July 2021 (UTC)
For me, ===References=== holds <references/> and nothing else, since inline citations are what connects a specific fact to the reference that confirms it. ====Further reading==== holds anything else relevant to a specific term, that is not tied to any facts stated in the entry. In other words, the former verifies information, the latter does not. —Rua (mew) 08:08, 12 July 2021 (UTC)
When I was new here, I was reproved for using in-line references - 'We don't do in-line references here.' --RichardW57 (talk) 17:38, 12 July 2021 (UTC)
"anything else relevant to a specific term, that is not tied to any facts stated in the entry." What about the fact of the existence of the word? --Geographyinitiative (talk) 21:37, 12 July 2021 (UTC)
All of this has arisen from the banning of the heading "External links" and the placement of such links under the heading "See also".
I find "Further reading" to be an unsatisfactory heading in many of its applications at Wiktionary. For a normal user, interested in definitions, usage examples, and translations, "further reading" seems time-consuming and irrelevant. "References" is more suggestive of terse information and confirmation of the information Wiktionary presents, or alternatives to the information Wiktionary presents or the way it is presented. Images linked under such headings are clearly not "reading" in the most common sense. Links to databases that contain semantic information not presented as sentences do not reward the reader with material to be "read" in the normal sense. "Further reading" makes me think of multi-page articles and scholarly notes about the etymology, usage, semantics, and syntax of words, not a verbal definition or translation, database entry, or picture. "Reference" seems to be much more inclusive.
Appropriating the term "Reference" solely for footnoted references would be fine if we had a single suitable term to characterize non-footnote type references that was not misleading as "Further reading" is. As it is, we once more are acting as if we are preparing a dictionary for only scholarly users of the linguistics variety, rather than a general population of users, whether scholars of any discipline or more ordinary users. DCDuring (talk) 20:13, 12 July 2021 (UTC)
I question the need for references to not be inline/footnotes. What use is a reference if you have no idea what information is referenced from it? —Rua (mew) 18:07, 13 July 2021 (UTC)
It just refers the reader to something, without further specification. The idea what is referenced the reader has himself, is prejudiced towards, i.e. he gets referenced what he wants referenced. This is really somewhat individual, depending on the term—the disputability of its definitions, forms and etymologies—, and the expected readers, and the working habits of editors, what gets referenced and how precise the pointers are, and whether it is sufficient. A usual way is just adding everything one knows and mentioning the resources that one had open.
I agree with DCDuring’s reasonings about “References” being more inclusive. I mostly use “Further Reading” to show that there is relatively a lot that one can read, if one wants to read more.
The distinction is basically not needed. Sometimes useful and abused for layout reasons: That direct references with <references/> go under the references header and the synthesized rest under the further reading header, because it is ugly when <references/> directly follows unspecific references, or perhaps the other way round. Else, I see it only as remnants from a time when there were much more headers. Fay Freak (talk) 19:22, 13 July 2021 (UTC)
Yeah, my instinct is simply to merge footnotes and non-footnote references under the ==References== header for Italian, as currently there's no consistency at all as to what goes in ==References== vs. ==Further reading==. The only way this loses info is if a consistent distinction is made in non-footnote references between those that go in ==References== vs. ==Further reading==, and from everything I have seen, there's no such consistent distinction. If we later decide to follow User:Rua's suggestion of putting footnotes under ==References== and everything else under ==Further reading==, this can be done by bot. Benwing2 (talk) 06:50, 14 July 2021 (UTC)
I think, going farther back, it might be the fault of MediaWiki choosing <references /> as the name of the footnotes tag. It should have been <footnotes />. Because of the tag, the header for footnotes is References. I recalled a vote saying References should only include footnotes, and this is probably it: Wiktionary:Votes/2016-12/"References" and "External sources". — Eru·tuon 03:48, 15 July 2021 (UTC)
Exactly, this is what I've been going by ever since. —Rua (mew) 19:16, 15 July 2021 (UTC)
I have just read Wiktionary:Entry_layout#References. To my formerly untrained eye, I recently seem to have been doing something right on the line between "verify[ing] the information available on our entries" and "simple recommendations of further places to look". However, I'm now leaning toward viewing my external dictionary links as more akin to 'Further reading'. Please forgive my confusion: I have never added dictionary links until recently and hadn't seen this part of entry layout. --Geographyinitiative (talk) 19:31, 15 July 2021 (UTC)
I usually use references. I add a separate further reading section when some references use <ref> and others do not. This is to make the page look nicer. I use further reading if I am linking to something like a Language Log post. This may not match anybody else's style or the officially prescribed style. Vox Sciurorum (talk) 18:57, 18 July 2021 (UTC)
FYI some Wikipedia pages have subsections called "Citations" and "Sources" under ==References==; see Newcastle upon Tyne for an example. Benwing2 (talk) 03:32, 21 July 2021 (UTC)

Earning the right to trial by RfV

@Inqilābī has proposed entries with neither attestation nor Google hits may be subjected to speedy deletion even though there is reason to believe that they exist. As googling for a word in a heavily inflected language may easily fail if one just searches for the citation form, it seems that for words that 'obviously exist', one may need to provide google hits in the entry as evidence of its right to trial by RfV. In the use case I have in mind, these would exhibit inflected forms. How should one present these google hits? Would it be appropriate to exhibit them as 'usage examples', albeit perhaps lacking translations? --RichardW57m (talk) 13:18, 9 July 2021 (UTC)

(Actually the wording implied that both attestation and Google hits were required, but I'm assuming that this was just sloppy wording.)

  • I never said that. ·~ dictátor·mundꟾ 05:16, 10 July 2021 (UTC)
    Sorry, that was @SodhakSH. You two really shouldn't hide your names in posts. It should be easy to see who said what, without using tricks to find the user name. --RichardW57 (talk) 08:53, 10 July 2021 (UTC)

Synonym Chart for Danish, Norwegian Bokmal, Norwegian Nynorsk, and Swedish

Hi! Owing to the generally acknowledged fact that the three North Germanic languages (Danish, Norwegian, and Swedish) have high mutual intelligibility, I was wondering if we can make a synonyms chart similar to what we have for Persian (check دوچرخه where it shows the word for bicycle in Iranian Persian, Dari Persian, and Tajik) and similar to what we have for Chinese (check 自行車 where it shows the word for bicycle in the various Chinese languages). A synonyms chart like this, which would theoretically show four entries (since we show one for both Bokmal and Nynorsk), would be helpful for language learners and language enthusiasts in comparing the word usage for these North Germanic languages. For example, a Scandinavian languages synonym chart for "frog" would have frø for Danish, frosk for both Bokmal and Nynorsk, and groda in Swedish, and a Scandinavian languages synonym chart for "breakfast" would have morgenmad for Danish, frokost for Norwegian Bokmal, frukost for both Norwegian Nynorsk and Swedish. Theoretically, it could also be expanded to the various Norwegian dialects if applicable. What do you guys think? --Mar vin kaiser (talk) 13:29, 11 July 2021 (UTC)

Tagging some Scandinavian language editors I've found, @Enkyklios, @Donnanz, and @Gamren. --Mar vin kaiser (talk) 13:38, 11 July 2021 (UTC)
Suzukaze made {{dialect synonyms}} which can be used. If we are going ahead with this, I'd be in favour of including all their dialects too and not just the 'standard' versions. Kritixilithos (talk) 14:50, 11 July 2021 (UTC)
@Mar vin kaiser I'm in favor of this in general. The Kurdish languages have a similar template {{ku-regional}} that I expanded; see moz for an example. BTW I also think we should consider figuring out a way to merge Norwegian Bokmål and Nynorsk. It seems silly to have so many entries like Abkhasia and abkhasisk that have both Bokmål and Nynorsk entries. Benwing2 (talk) 18:15, 11 July 2021 (UTC)
@Benwing2: I also think we should merge them, but there was a vote about it, and it failed, unfortunately. PUC – 22:12, 11 July 2021 (UTC)
@Benwing2 @PUC Bokmål and Nynorsk are different languages with entirely different origins. Bokmål originates in Danish (regardless of Wiktionary saying it is descended from Middle Norwegian), which has gradually had spelling reforms made to reflect the local (mis)pronunciation of the Norwegian urbanites. Nynorsk on the other hand originates in Ivar Aasen's studies on Norwegian dialects. The two have borrowed a lot of words from each other, but they are still separate languages and should not be merged. Mårtensås (talk) 12:29, 13 July 2021 (UTC)
@Benwing2: Can you help me make a template draft for this? At least for the four I identified, to get started. I'm not familiar with the code, you see. --Mar vin kaiser (talk) 11:23, 12 July 2021 (UTC)
@Kritixilithos, Suzukaze-c The problem with {{dialect synonyms}} is it requires that separate data pages be created for each lemma. This makes sense when you want to cover a zillion dialects but if there are just a few, it can be painful to have to create all those pages. In addition there's the issue of what to name the data pages. The solution of having separate data pages was tried for descendants (the former {{etymtree}}) and didn't work well; the solution actually adopted in {{desctree}} was to fetch the data directly from one of the pages, which could be done here as well. Benwing2 (talk) 18:22, 11 July 2021 (UTC)
@Benwing2 It is a design innovated by {{zh-dial-syn}} ({{dialect synonyms}} is a language-agnostic remake) and has worked so far. —Suzukaze-c (talk) 19:27, 13 July 2021 (UTC)
You're describing a Swadesh list. We already have lists for Danish, Swedish, Faroese, Icelandic and both Norwegians that you can merge. WP has another one just with the Scandi languages.__Gamren (talk) 19:05, 11 July 2021 (UTC)
@Gamren: But Swadesh lists only cover basic words, and WP only covers a sample of words in the language. This proposal would cover all words of the language in all aspects. For example, what would happen is when you open the entry for frø, for example, you would see a table there for the terms in the other relatively mutually intelligible Scandinavian languages. --Mar vin kaiser (talk) 21:55, 11 July 2021 (UTC)
What? I don't want that. Cognates go in etymology section. Non-cognates go nowhere.__Gamren (talk) 23:02, 11 July 2021 (UTC)
@Gamren: The same thing is being done in Wiktionary across other languages and dialect continuums, like Kurdish and Persian. For example, check دوچرخه where you can see Iranian, Dari, and Tajik use different terms for "bicycle", and moz where you can see the word for "hornet" across Kurmanji, Sorani, and Southern Kurdish. This format is very useful for language learners and users to find out word differences and spelling differences across super-related languages, especially those that have high mutual intelligibility, which the Scandinavian languages have. So it would be convenient for learners to know in the Danish entry frø, that it's a totally different word in Norwegian and Swedish, frosk for both Bokmal and Nynorsk, and groda in Swedish. --Mar vin kaiser (talk) 07:25, 12 July 2021 (UTC)
Okay. I think it's a stupid idea in and of itself. Besides, how do you deal with synonymy? Do you link ei with ej or ikke?__Gamren (talk) 08:03, 12 July 2021 (UTC)
@Gamren: Well, based on my experience with other languages, more than one entry can appear within one language or dialect. If you look at the dialectal synonyms chart in 自行車 in Chinese, one dialect can have several words/synonyms for "bicycle", and sometimes if one word is already obsolete in that dialect, or very literary in that dialect, there's an option to label that word for that dialect as obsolete, literary, or any other label. --Mar vin kaiser (talk) 10:26, 12 July 2021 (UTC)

Merging categories for 'informal' and 'colloquial' terms

See WT:RFM. I have redirected the 'colloquial' label to point to 'LANG informal terms' e.g. Category:English informal terms (instead of e.g. Category:English colloquialisms). For the moment the two labels still display differently. BTW there are a few weird edge cases, e.g. the colloquial-um and colloquial-un labels, used only for Persian (Category:Persian colloquialisms containing sequence um and Category:Persian colloquialisms containing sequence un). I don't speak Persian so I have no idea what's going on here but having special-purpose labels like this sitting in the general-purpose code strikes me as wrong. Benwing2 (talk) 04:24, 12 July 2021 (UTC)

 @Benwing2: Well you see, in transcriptions as of بادام(bādām) and words like سلام(salām) you fast find pronounced, the Classical Persian vowel ā is pronounced in Modern Iran more rounded and raised than the transcription suggests. In substandard language of some cities this goes even, in words auslauting ān and ām, so far as [uː], and thence is written, as “eye-dialect”, with و‎. But apparently not all words and so editors find it necessary to collect these forms. That’s all that’s behind it. It’s also mentioned at w:Persian_phonology#Colloquial Iranian Persian. Fay Freak (talk) 05:14, 12 July 2021 (UTC)

@Benwing2 I've noticed some other problems with Category:English terms by usage, but the only one I'll address now is the sparsely populated Category:English familiar terms. Should this be folded into Category:English informal terms? The distinction is very subtle, and it's certainly not being used the way that the category description intends in the rest of Category:Familiar terms by language. I can tell you that in the Romance languages at least, "familiar" is just the term many dictionaries used instead of "colloquial" or "informal". Ultimateria (talk) 17:02, 16 July 2021 (UTC)

@Ultimateria I'd be fine with combining these categories with 'informal terms'. We already have e.g. Category:English endearing terms, and it looks like the 6 terms in Category:English familiar terms can go either in Category:English endearing terms or Category:English informal terms. Benwing2 (talk) 03:10, 17 July 2021 (UTC)
Anyone object to merging 'familiar' with 'informal'? Most languages that have any such terms categorized as 'familiar' have only 1 or 2. I checked some of the languages with more such terms; many are Romance languages and it's clear in this case that it's merely used for informal terms just as User:Ultimateria suggested, maybe sometimes with relatively strong informality, but that's it. The use in Japanese and English appears to represent mostly informal terms referring to family or friends, which is probably the original intention of the category, but we also have the better-populated and more well-defined Category:Endearing terms by language and Category:Derogatory terms by language. Both endearing and derogatory terms are almost always informal as well, and I'm almost positive these two categories along with Category:Informal terms by language will suffice for all use cases. Benwing2 (talk) 16:52, 18 July 2021 (UTC)
Isn’t “endearing” the antonym of “pejorative” or “derogatory”, or what is it? (May also be the antonym of “pejorative” and comprise the idea of familiarity at the same time.) I know “meliorative”, but this is rarely used, and more often in German (→ de:meliorativ, w:de:Pejorativum about “Melioration”). Fay Freak (talk) 23:29, 23 July 2021 (UTC)

Proposal for Category:Misnomers by language

This category would be used for terms that are named, or perceived to be named, after something that they aren't actually related to. For example, turkey isn't actually from Turkey, Dutch filet americain isn't from America, Russisch ei isn't from Russia, and so on. —Rua (mew) 08:20, 12 July 2021 (UTC)

Does Dutch baby fit in there? And what about bastard operator from hell? There are credible rumours these miscreants aren't actually from hell. Regardless of the potential merits of such a category, I have a problem with the proposed name, which IMO is too generic. The label “misnomer” can be applied to anything not appropriately named; one can call Oktoberfest a misnomer for a festival that most of the time takes place for most of its duration in the September month, and centipede a misnomer for a critter that never has exactly 100 legs.  --Lambiam 23:02, 12 July 2021 (UTC)
Same. I reckon it useful to have terms with misapplied origin qualifiers collected, but what else would fall under this name? We might start with a category restricted to demonymic terms – there are a lot in the plant lexicon, Armenian cucumber, granoturco, مصر بوغدایی(mısır buğdayı) (but also mısır?) – which later can be a subcategory of a more general concept. Fay Freak (talk) 23:38, 12 July 2021 (UTC)
Yes, good idea. Another example: cochon d'Inde.
Also, what about montagnes russes vs. америка́нские го́рки (amerikánskije górki)? Or take French leave vs. filer à l'anglaise? PUC – 23:55, 12 July 2021 (UTC)
Most (almost all?) noun phrase idioms are well-attested misnomers, aren't they? Aren't metaphors misnomers sensu stricto? DCDuring (talk) 01:39, 13 July 2021 (UTC)
Alternative name proposals are welcome of course. —Rua (mew) 11:33, 13 July 2021 (UTC)
Maybe the category could be introduced as "Category for terms named after an incorrect country of origin". This would rule out words like "centipede" and metaphors and leave only what I believe was meant by the proposal. Thadh (talk) 11:51, 13 July 2021 (UTC)
I think that that which is (mis)named is a concept, not a term (in this context synonymous with name).  --Lambiam 15:57, 13 July 2021 (UTC)
Also, it doesn't have to be countries. —Rua (mew) 17:42, 13 July 2021 (UTC)
In that case, what are the requirements, if I may ask? Because I would definitely support the country-of-origin misnomers, since that's very specific, but I'm afraid a more abstract definition would quickly become unmanageable, like the others said above: any metaphor could be called a misnomer if an unspecific enough definition is used. Thadh (talk) 22:13, 13 July 2021 (UTC)
Anything else that's a place name? —Rua (mew) 08:15, 14 July 2021 (UTC)
Oh, okay, makes sense. In that case "Category for concepts named after an incorrect location of origin"? Thadh (talk) 08:36, 14 July 2021 (UTC)
I agree this would be useful to categorize if we can think of what to call it. What is the general term for "thing from place" words which aren't misnomers, i.e. what are terms like Italian sausage, Belgian chocolate which, pace our entry, seems to almost always mean chocolate from Belgium or American cheese called? (And do we want to group those?) I'm wondering because then turkey and cochon d'Inde would just be "misnamed [whatever the term for the general thing is]s". - -sche (discuss) 01:31, 15 July 2021 (UTC)
I don't know what such terms are called, but I agree with this idea. —Rua (mew) 19:14, 15 July 2021 (UTC)
@-sche: The second sense at toponym reads as follows: "(less common) A word derived from the name of a place." But I don't know if it quite fits, and even if it does, using it might lead to confusion, given it's not the most common meaning of the word... PUC – 09:47, 16 July 2021 (UTC)
What about Category:X terms misleading as to the place of origin of the things they designate? Super wordy though... PUC – 09:52, 16 July 2021 (UTC)
To shorten it: "the things they designate" are referents. Equinox 10:05, 16 July 2021 (UTC)
Ah yes, thanks, that's already a bit better. While we wait for someone to come up with a good category name, I've created User:PUC/Terms misleading as to the place of origin of their referents, so that we can start gathering such terms now. PUC – 10:12, 16 July 2021 (UTC)

Appropriation of Module:la-pronunc

I've asked for help stopping that user from several admins, but to no avail. To recapitulate, they've unilaterally replaced a pre-existing Vulgar Latin transcription with something entirely different, namely proto-Romance, an attempt to reconstruct Latin purely using the comparative method. What's worse, they're inventing a narrow phonetic transcription for that theoretical construct that isn't an actual language and has no identifiable sociolinguistic aspect (some postulate a "high" and a "low" variety), no sub-varieties with specific allophonies. That person is jumbling together whatever allophones they can find described in disparate sources. To justify this, they're throwing about Russell's-teapot whataboutisms like "if Classical Latin had a prestige standard variety, then why not Proto-Romance?" - I kid you not. They've pointedly done so while forgoing any discussion because they knew they wouldn't get the green light, and because their purpose here is to provoke and get their way by treating this website as their personal playground for reconstruction games; this is just one instance in an entire spree of the same. I've asked them to desist. I've tried naïvely treating them as a normal human being only to get a bucket of deranged abuse thrown in my face (complete with classic bully slander like "you made me do it" and "your medicine"), threatening further abuse if I continue trying to communicate (I've attempted to explain the reasons for this abusive behaviour elsewhere). They've even tried to turn the tables on me, saying that by reverting their one-sided removal it's me who's overwriting their work; and they tell me to start a sub-module in order to restore what they've unilaterally removed - obscenely enough, this is what I myself suggested they do from the get-go on the module's talk page, which they ignored.

Currently they're blithely edit-warring the page as they have done with capus, fōrmāticus and will continue to do in the future. Communicating with that is not an option for me - I had observed them behave that way elsewhere for several years, brushing it away (I'm numb to the Internet); but after that last interaction, stepping on the same rake would require a level of stupidity from me I simply don't possess. Please don't ignore and help stop this whatever from ruining the website. Brutal Russian (talk) 01:27, 17 July 2021 (UTC)

The Proto-Romance module that I have implemented, which is intended for reconstructed words, is thoroughly cited with reliable modern sources; I have even added documentation on the module page itself explaining the rationale for each and every feature. At least two other users (Ain92 and ser be etre she) support the general idea, and I have gotten 'thank yous' for making the relevant edits from another two users (Sigehelmus and Tibidibi).
Meanwhile, you have yet to cite your implementations of Pompeiian features. You were first asked to do so well over a month ago, and several times thereafter, and you still have not done so. Needless to say, per Wiki policy it is the responsibility of the one who wants to restore removed content to actually support it with citations. To illustrate some of the issues here, multiple reliable sources contradict your assigned features, and at least one feature simply is not mentioned as existing in Pompeii by any of them, as far as I can tell. I will bring up one more example: why do you keep changing Classical Latin l before long and short e to non-velarized, when Weiss, Sihler, and Sen state that it was velarized?
Incidentally, you had no qualms about 'appropriating' the Vulgar Latin module without contacting the original editor and asking them to cite their content.
While we are on the subject of 'deranged slander', care to explain some of the absolutely stunning comments that you have made, such as example 1 and example 2? The Nicodene (talk) 01:57, 17 July 2021 (UTC)
They call out and explain the deranged and abusive behaviour as volatile narcissistic rage for those who have less of an experience dealing with enraged narcissists than I do. Trust me, I see through it. Brutal Russian (talk) 04:47, 17 July 2021 (UTC)
@Chuck Entz as can be seen above, this user continues to call me an enraged narcissist, even after multiple people, including yourself, have called him out for it. Clearly just asking him to stop is not working. Is there anything you can do? The Nicodene (talk) 07:43, 17 July 2021 (UTC)
@The Nicodene, Brutal Russian I strongly believe that neither 'Vulgar Latin' nor 'Proto-Romance' should be listed as pronunciations. The new Proto-Romance is IMO better than the old "Vulgar Latin" in that the latter contained random features of only certain Vulgar dialects, such as lenition of intervocalic stops, whereas the former tries at least to represent the proto-language; but there are too many uncertainties in most reconstructed systems to warrant giving reconstructed pronunciations. As a general rule we don't include pronunciations of other reconstructed languages, so IMO we shouldn't do so here. Benwing2 (talk) 03:19, 17 July 2021 (UTC)
@Benwing2 So that would mean simply excising 'vul' out of existence and leaving reconstructed Proto-Romance words without any pronunciation? Unfortunate to see it all go down the drain, but ah well. The Nicodene (talk) 03:35, 17 July 2021 (UTC)
@Benwing2: The Vulgar Latin that user continues to remove represents the specific variety of Campania. I replaced the older one with it after some encouraging discussion - that one did indeed include random features. As I explain elsewhere, when people don't mean a different language of the plebs when using the term, they're generally thinking of the language of Pompeian inscriptions. This variety is Classical-age, concurrent with our Classical, it seems to continue an even older Mid-Republican variety (what the Romans indifferently thought of as the rustic Latin of Latium and the environs), and therefore serves to represent variation inside the language, or described differently it serves as a repository for common non-standard features that we currently don't represent, but which other reconstructions posit as general, and consider our Classical reconstruction as conservative at best for the Augustan period. The voiced intervocalic stops are lenited (fricative) in all the modern varieties in the same area (even word-initially), and it's likely that they simply never hardened as they did in urban Latin (they were fricative in Pre-Latin). This is probably what underlies the b~v confusion in the first place. I gave some more thoughts on it here a few days ago. It's on absolutely different ground from the what-if-we-didn't-have-attested-Latin Latin that's being shoved in our face. Brutal Russian (talk) 04:35, 17 July 2021 (UTC)
It is not at all the case that most people mean Pompeiian Latin of the late first century C.E. when they say 'Vulgar Latin'. Per Lloyd (1979), the term has more than a dozen definitions/interpretations, many of them conflicting, and they span wide geographic spaces, and also potentially the better part of a millennium, chronologically. If you want to have such a module, it should specifically be labelled 'Pompeiian Latin, late first century' or words to that effect, not 'Vulgar Latin'.
If you believe that the reconstructed Classical pronunciation, such as we have implemented it, was contemporaneous with the Pompeiian pronunciation, such as you have implemented it, you will have to show, for instance, that /w/ had become a bilabial fricative in Pompeii but, simultaneously, it had not done so in Rome. Multiply that across other features as well, such as /i/ becoming [j] in hiatus after consonants, and you will begin to see the problem.
If you believe that pre-Latin intervocalic fricative realizations of /d/, /b/, and /g/ persisted in Pompeii, but had changed to occlusives in the 'urban Latin [of Rome]', then you will have to demonstrate that–or at the very least cite a single source that posits this as a regional difference for the year 79 C.E. Phonetic features from the Neapolitan Language are not proof of the existence of features well over a millennium prior. (Neapolitan does not, as far as I am aware, even have a fricative outcome like [ð] of word-initial /d/; rather it has an alveolar tap. I wonder about this supposed word-initial [ɣ] as well.) The Nicodene (talk) 05:25, 17 July 2021 (UTC)
(Sorry to ping again) @Benwing2 "As a general rule we don't include pronunciations of other reconstructed languages, so IMO we shouldn't do so here."
I am not sure that is the case. For instance Wiktionary has several pages of Proto-Italic, Proto-Celtic, Proto-Germanic etc. terms with IPA pronunciation. There are even four PIE entries given transcriptions, interestingly.
We understand a great deal more about Proto-Romance than we do about any of these other proto-languages due to the abundance of comparative data we have from different branches of Romance languages, not to mention that comparison with the attested language (Latin) is possible. The scholars comprising the Dictionnaire Étymologique Roman team, among various other specialists, have solid reasons for reconstructing the features that they posit. It seems to me that putting their work on entries for reconstructed Proto-Romance words enriches Wiktionary and provides useful information for those interested in the diachronic evolution of the Romance Languages. The Nicodene (talk) 06:17, 17 July 2021 (UTC)
It's possible to supply DÉRom's phonemic shapes in the pronunciation section like this: (DÉRom) IPA(key): /aˈɡʊst‑u/.
Whoever is reading, I continue to implore that we stop this abusive and volatile user from ruining the website and poisoning its community. Brutal Russian (talk) 21:09, 17 July 2021 (UTC)
The abusive user is the one who screams "narcissist" dozens of times at their opponent, calls them a dog, etc. Not to mention completely violates Wiki policy by pushing unsourced content via edit-wars, even after being asked to cite it several times, and even after specific problems are pointed out. The Nicodene (talk) 21:35, 17 July 2021 (UTC)
@The Nicodene See Wiktionary:Beer_parlour/2020/February#Inclusion of reconstructed pronunciations of Proto-Indo-European_words. Most people opposed reconstructed pronunciations for PIE, and I think the same applies here. I also see some cases where there are apparent mistakes, e.g. you listed [dɔn:na] for domna which can't be right given French dame and similarly French sommeil from *somniculum. Benwing2 (talk) 00:27, 18 July 2021 (UTC)
@Benwing2 Perhaps it would be best to put the issue to a vote?
Domna is supposed to output [ˈdɔmna]; it seems there was a bug in the code somewhere. The Nicodene (talk) 19:24, 18 July 2021 (UTC)
@The Nicodene Voting is not ideal until we get input from multiple people that makes it clear whether there's a consensus and what the concerns are. Some recent votes themselves have been controversial and led to unhappiness after the vote. So far hardly anyone has expressed their opinion about this (not surprising IMO given the bitter back-and-forth between you and User:Brutal Russian). It's not even clear to me what the vote should read, and given the current supermajority rule, the outcome would likely depend on how the question was phrased. There are multiple issues: (1) Should we allow reconstructed pronunciations in general? (2) Should we allow reconstructed Latin pronunciations? (3) If so, what sort of pronunciation? Reconstructed Proto-Romance, some specific reconstructed dialect, both, neither? Furthermore, any time you talk about reconstructed pronunciations, you have the question of whose reconstructions. For example, for many Indo-European families, you have the highly productive but eccentric Leiden school vs. one or more other more mainstream competitors. These sorts of questions are best hashed out *before* a vote, not *during*. Benwing2 (talk) 19:47, 18 July 2021 (UTC)
@Benwing2 Perhaps it would be best to start a new discussion about this somewhere else with a wide variety of users participating. It would no doubt help to have some moderation, whereby personal attacks or toxic comments are removed and/or met with disciplinary action. The Nicodene (talk) 20:26, 18 July 2021 (UTC)
@Benwing2: But the current Classical pronunciation is likewise a reconstruction, and we'd have to remove that too, only leaving Ecclesiastical; and even that is not altogether descriptive, but based on the codification in the Liber Usualis, with actual practice varying. In particular the school pronunciation that the Italians use is closer to neostandard Italian than that. So I think it bears repeating that the Classical and the Pompeian pronunciations are both reconstructions on the same footing; they belong to the same attested language and to the same period, they're backed up by vast amounts of research and metalinguistic comments by the speakers to warrant an exception to the general guidelines about reconstructed pronunciations. In contrast, Proto-Romance is a reconstruction itself, not an attested language, and is on a completely different footing from the attested varieties of Latin. Brutal Russian (talk) 02:16, 22 July 2021 (UTC)

This edit warring in Module:la-pronunc by User:The Nicodene and User:Brutal Russian is unacceptable, so I've protected the module. It's bad behavior and it makes the server have to constantly update all the pages that use the module. I'm not registering my approval of the current revision. If you would like, please decide on a less controversial revision and have an admin roll back to it. Please do your editing in Module:la-pronunc/sandbox or user sandbox modules. The main module is currently protected indefinitely, but it should be unprotected when the participants in the edit war have agreed on a ceasefire. — Eru·tuon 02:48, 22 July 2021 (UTC)

Reducing the size of Module:languages/data* by moving varieties/aliases/other names (and possibly Wikidata links) elsewhere

It's known that the size of the language modules is a significant cause of memory errors. Some months ago (see Wiktionary:Grease pit/2020/November#splitting language data) I proposed splitting the data modules to be indexed by the first two letters instead of the first letter. Some people had concerns that this approach would be unwieldy to work with, and so it was proposed to have the modules still in the current structure but have a postprocessing step after saving a module that would copy them to the more split-up structure. This requires some JavaScript hacking, which I'm not very familiar with; User:Erutuon made some good suggestions and did some investigations, which weren't totally promising. I'm now suggesting a less radical but probably effective solution of moving the varieties, aliases, other names and maybe Wikidata link fields to different files. I think User:Rua also suggested this at one point. I'm almost positive the varieties, aliases and other names aren't used on most pages, and for some languages they can take up a lot of space (cf. French, which has one alias and 27 varieties listed). Moving the Wikidata links may be less effective; they are used at least when generating Wikipedia links in etymology templates. (On the other hand, they occur in pretty much *every* language, so the following solution might be very effective: (1) move the Wikidata links into a separate file; (2) create a list of high-memory pages where Wikipedia links are not to be generated by etymology templates, similar to what's in Module:links/data, but more centralized, more up-to-date and more effective; (3) implement this in Module:etymology.) If we do this, I would argue we should create a parallel set of modules under something like Module:languages/extradata* that has the same structure as Module:languages/data* but holds the secondary data not typically needed. Benwing2 (talk) 19:36, 18 July 2021 (UTC)
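As a rough illustration of the split proposed above, the following Python sketch separates the rarely needed fields from each language record. It is only a model of the idea, not the actual Lua data modules: the field names `otherNames`, `aliases` and `varieties` come from this discussion, while the function name, the dictionary layout and the sample values are assumptions made for the example.

```python
# Illustrative model only: the real data lives in Lua modules, and the
# record layout and sample values below are hypothetical placeholders.
SECONDARY_FIELDS = ("otherNames", "aliases", "varieties")


def split_language_data(data):
    """Split {code: record} into (core, extra) dicts keyed by language code.

    core keeps everything except the secondary fields; extra holds only the
    secondary fields, and only for languages that actually have some.
    """
    core, extra = {}, {}
    for code, record in data.items():
        core[code] = {k: v for k, v in record.items() if k not in SECONDARY_FIELDS}
        secondary = {k: v for k, v in record.items() if k in SECONDARY_FIELDS}
        if secondary:
            extra[code] = secondary
    return core, extra


sample = {
    "aa": {"name": "Language A", "family": "fam",
           "aliases": ["Alias A"], "varieties": ["Variety 1", "Variety 2"]},
    "bb": {"name": "Language B", "family": "fam"},
}
core, extra = split_language_data(sample)
```

Under this model, a page that never asks for aliases or varieties would only need `core`, while `extra` mirrors the same keys and could be loaded on demand.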

Sounds all good. I wasn’t even aware that varieties, aliases and other names could use memory while not being used. So the whole pages are loaded and therefore indexing by the first two letters instead of the first one helps, I understand. (I thought the other names are for easing the identification of languages, as e.g. it is annoying if you want to cite a cognate but do not find the language code and name which the language has on Wiktionary (thou probably hast not done such remote etymologies and reconstructions). It is bizarre of course that for that to work Lua memory is used …) Fay Freak (talk) 20:27, 19 July 2021 (UTC)
Thinking out loud about the effect of removing the Wikidata items from the main language data modules. The Wikidata items are under the key 2 in the tables and the family codes are under key 3. Key 1 (the canonical name) is always filled, and keys 2 and 3 are almost always filled (7270 languages have Wikidata item and family, 516 only Wikidata item, 356 only family, 24 neither). Because these three keys use the Lua array syntax (for instance, { "English", "Q1860", "gmw", ... }) they are placed in the array part of the Lua table when all three are present, and because Lua array sizes are in powers of 2, there will be 4 array slots and the fourth slot is unused. To reduce the size of the array to 2 slots, we would have to get rid of either the Wikidata items or the family, and move whichever one was left to key 2. Apparently array slots themselves take 16 bytes, so that would be at least (7,270 + 516) array slots × (16 × 2) bytes per array slot = 249,152 bytes out of a limit of 52,428,800 bytes saved on pages that do not load Wikidata items. The total size of the Wikidata item strings themselves seems to be 59,000 bytes, and much of that might be saved as well. Based on that napkin math, removing Wikidata items would have relatively insignificant savings, but because napkin math is not completely accurate, maybe it's worth trying if I make a script to do it automatically, roughly as I did for splitting language data into more modules.
Removing otherNames, aliases, and varieties would probably have a greater impact, but it's a bit harder to calculate the likely savings because nested tables are involved. As you say, they're almost never used in entries — probably never. I can't think of a way to verify this without editing Module:languages to transclude a tracking template inside getOtherNames, getAliases, and getVarieties and then wait for the server to update all the entries and count the transclusions. — Eru·tuon 21:33, 20 July 2021 (UTC)
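The napkin math above can be reproduced with a few lines of Python. All figures are the ones quoted in the comment (language counts, 16 bytes per array slot, the 52,428,800-byte limit); the variable names are made up for the example, and the 16-byte figure is the poster's estimate rather than a measured value.

```python
# Figures quoted in the discussion; variable names are illustrative.
with_item_and_family = 7270
with_item_only = 516
BYTES_PER_ARRAY_SLOT = 16      # estimate quoted in the thread
SLOTS_SAVED_PER_TABLE = 2      # array part shrinks from 4 slots to 2
MEMORY_LIMIT = 52_428_800      # Lua memory limit per page, in bytes

tables_with_item = with_item_and_family + with_item_only
bytes_saved = tables_with_item * BYTES_PER_ARRAY_SLOT * SLOTS_SAVED_PER_TABLE
share_of_limit = bytes_saved / MEMORY_LIMIT

print(bytes_saved)               # 249152
print(f"{share_of_limit:.2%}")   # 0.48%
```

On these numbers the array-slot saving is under half a percent of the limit, which matches the conclusion that this part of the saving is small on its own.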
If the slots themselves take up so much space an option could be to cram more data into one slot? e.g. comma or colon separated. "Aranadan", "Q3507928:dra" – Jberkel 22:02, 20 July 2021 (UTC)
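A minimal sketch of the packing idea, in Python for illustration. The colon separator and the sample value `"Q3507928:dra"` come from the comment above; the helper names `pack` and `unpack` are hypothetical, and the sketch assumes that neither Wikidata items nor family codes ever contain a colon.

```python
# Hypothetical helpers; assumes ":" never occurs inside either field.
def pack(wikidata_item, family):
    """Join the two fields into one string; a missing field becomes empty."""
    return f"{wikidata_item or ''}:{family or ''}"


def unpack(packed):
    """Split a packed string back into (wikidata_item, family)."""
    wikidata_item, _, family = packed.partition(":")
    return (wikidata_item or None, family or None)


item, family = unpack("Q3507928:dra")
```

The trade-off is that every reader of the field now pays for a string split, so this exchanges a little CPU time for fewer slots and fewer string objects.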
@Erutuon Thanks for your analysis. I suspect the burden/savings may be greater because of the weird stuff that MediaWiki does when mw.loadData() is called. I'm not quite sure how that works but I think it adds an extra memory burden. Also, when you calculated the 59,000 bytes of the object strings, it sounds like you didn't take into account the memory used for the actual string object. Per the link to wowpedia.fandom.com, it says the average memory consumption is around 24 + string_length. The average length of the Wikidata items looks to be around 8 bytes, meaning (7270 + 516) * 32 = around 249,152 bytes for the actual strings. I suspect MediaWiki wraps each object loaded via mw.loadData() in a table, which means at least another (7270 + 516) * 40 = 311,440 bytes for the Wikidata strings, probably significantly more. I suspect we're talking at least 1 MB all told for the Wikidata strings given that the total size of the language data appears to be > 12 MB (based on a statement from User:Rua awhile ago). Benwing2 (talk) 03:29, 21 July 2021 (UTC)
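The estimate in this comment works out as follows. The 24-byte string header, the 8-byte average length and the 40-byte wrapper cost are the figures given in the comment; the wrapper cost in particular is a guess by the poster, and the variable names are invented for the example.

```python
# Figures as quoted in the comment; the 40-byte wrapper cost is a guess.
strings = 7270 + 516           # languages that have a Wikidata item
STRING_HEADER_BYTES = 24       # per-string overhead quoted from the linked page
AVERAGE_ITEM_LENGTH = 8        # approximate length of an item such as "Q3507928"
WRAPPER_BYTES = 40             # assumed extra cost per wrapped object

bytes_for_strings = strings * (STRING_HEADER_BYTES + AVERAGE_ITEM_LENGTH)
bytes_for_wrappers = strings * WRAPPER_BYTES

print(bytes_for_strings)                        # 249152
print(bytes_for_wrappers)                       # 311440
print(bytes_for_strings + bytes_for_wrappers)   # 560592
```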
@Benwing2: Yeah, I was just counting the byte length of the Wikidata items and had neglected the 24 bytes, so your calculations are about right. The padding takes up more than the actual string data!
I think mw.loadData doesn't affect the memory usage of string fields. It wraps tables (not strings or booleans or numbers or nil) and only when you access each of them for the first time. Access happens by calling mw.loadData (for the top-level table in the module) and by indexing the table or any of its subtables, directly or indirectly by iterating over them with pairs or ipairs, when indexing yields a not-yet-accessed table.
For example, local data2 = mw.loadData("Module:languages/data2") wraps one table, indexing local english_data = data2["en"] wraps another, indexing local aliases = data2["en"]["aliases"] wraps a third. The previously wrapped data2["en"] was cached and returned the second time it was indexed. local wikidata_item = english_data[2] doesn't wrap anything because the yielded value is a string, not a table. for k, v in pairs(english_data) do end will then wrap the 5 remaining subtables in english_data. When local data2 = mw.loadData("Module:languages/data2") is called again, the process repeats, because the return value of mw.loadData is not cached. (The internal table that is accessed through the wrapper is cached for an entire page, however.)
It looks like the number of fields affects how long it takes for mw.loadData to iterate over the whole table recursively when it validates the data module (once per each page that loads a data module with mw.loadData), but I can't see a way for it to affect the memory used in wrapping it. I hadn't considered the validation part, and it could contribute to memory usage unpredictably, because it uses a temporary seen table to track the tables that have already been validated, with as many hash fields as there are tables in the return value of the data module, and that table will be garbage-collected at some time in the future. — Eru·tuon 06:03, 21 July 2021 (UTC)
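The wrapping behaviour described above can be modelled with a small Python class. This is a toy model of the description in this comment, not of the actual MediaWiki implementation: `LazyWrapper` and `load_data` are invented names, and the sample record reuses the `"English", "Q1860", "gmw"` example from earlier in the thread with a placeholder alias.

```python
class LazyWrapper:
    """Toy model: wraps a nested table, wrapping subtables on first access."""

    wrap_count = 0  # how many wrappers have been created so far

    def __init__(self, table):
        LazyWrapper.wrap_count += 1
        self._table = table
        self._cache = {}  # subtable wrappers already handed out

    def __getitem__(self, key):
        if key in self._cache:
            return self._cache[key]          # reuse an existing wrapper
        value = self._table[key]
        if isinstance(value, dict):          # only tables get wrapped
            value = LazyWrapper(value)
            self._cache[key] = value
        return value                         # strings and numbers pass through


def load_data(module):
    """Each call returns a fresh top-level wrapper (the result is not cached)."""
    return LazyWrapper(module)


module = {"en": {1: "English", 2: "Q1860", 3: "gmw",
                 "aliases": {1: "placeholder alias"}}}

data2 = load_data(module)            # wrapper 1: the top-level table
english_data = data2["en"]           # wrapper 2: the "en" subtable
aliases = data2["en"]["aliases"]     # "en" comes from the cache; wrapper 3: aliases
wikidata_item = english_data[2]      # a string, so nothing new is wrapped
```

In this model, calling `load_data(module)` a second time creates a new top-level wrapper and starts the process over, which mirrors the point that the return value is not cached between calls.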
@Erutuon Thanks again for the analysis. I think we should assume that no garbage collection happens when generating a page. Benwing2 (talk) 02:47, 22 July 2021 (UTC)
@Benwing2: It looks like that's true, because inserting an object with a garbage collection metamethod (setmetatable({}, { __gc = function() mw.log("Garbage is collected!") end })) into Module:links or Module:languages and previewing water doesn't cause anything to be printed to the Lua log. I'd never tried this before, but just assumed garbage collection would have to run sometime. Making a loop that increases memory usage until garbage collection happens doesn't manage to force garbage collection: the first template invocation to use the module just runs out of memory or time. So garbage collection seems to be disabled entirely, or maybe there's no way to trigger it under normal conditions. Then the unpredictability of memory usage is because of something other than garbage collection. — Eru·tuon 03:07, 22 July 2021 (UTC)
"The padding takes up more than the actual string data!" Encoding wikidata ids numerically would be another option to save space. They seem to use 64-bit floating point in the version of Lua used. – Jberkel 12:16, 22 July 2021 (UTC)
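A sketch of the numeric encoding suggested here, in Python for illustration. The helper names are hypothetical. The only fact relied on is that a 64-bit floating-point number represents every integer up to 2^53 exactly, which is far larger than any Wikidata item number mentioned in this thread.

```python
# Hypothetical helpers for storing "Q1860" as the number 1860.
MAX_EXACT_INTEGER = 2 ** 53  # integers up to this value are exact in a 64-bit float


def encode_item(item):
    """Turn a Wikidata item such as "Q1860" into the integer 1860."""
    digits = item[1:]
    if not (item.startswith("Q") and digits.isascii() and digits.isdigit()):
        raise ValueError(f"not a Wikidata item: {item!r}")
    number = int(digits)
    if number > MAX_EXACT_INTEGER:
        raise ValueError("too large to store exactly as a 64-bit float")
    return number


def decode_item(number):
    """Turn the stored number back into the "Q..." form."""
    return f"Q{int(number)}"


stored = float(encode_item("Q3507928"))   # what a float-only runtime would hold
restored = decode_item(stored)
```

A number stored this way needs no separate string object, at the cost of rebuilding the `"Q..."` form whenever a link is generated.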

Change to WT:WDL?

The criteria for inclusion (CFI) require three usages for well documented languages (WDLs).

Hapax legomena (entry, appendix, category) are terms which are attested only once.

So there can be no sub-category of Category:Hapax legomena by language for WDLs.

Arabic and Chinese hapax legomena can be there as WT:WDL only lists "Modern Standard Arabic" and "Chinese (Standard Written Chinese)" (= written vernacular Chinese). This doesn't include Classical Arabic, Old Chinese, Classical Chinese.

However, in the list it's only "Hebrew" and not Ivrit or Neo-Hebraic. Thus all terms in Category:Hebrew hapax legomena, often hapax legomena only found in the Bible and thus Biblical Hebrew and one term being Paleo-Hebrew, would have to be deleted.

To keep Palaeo-Hebrew and Biblical Hebrew hapax legomena I propose the following change to WT:WDL:

old: 2. Armenian, Azerbaijani, Georgian, Hebrew and Turkish;
new: 2. Armenian, Azerbaijani, Georgian, Hebrew (Ivrit or Neo-Hebraic) and Turkish;

Furthermore to align the wording and style, I propose to change:

old: 3. Modern Standard Arabic
new: 3. Arabic (Modern Standard Arabic)

Then it's first the language name as it's used in entries (Arabic, Chinese, Hebrew) followed by a qualification which variety is considered a WDL. -Macopre (talk) 14:17, 19 July 2021 (UTC)

I always imagined that given the wording "languages well documented on the Internet", this always referred specifically to the forms that are actually represented online, unlike e.g. Biblical Hebrew. To insist on the three-use criteria for all forms of Korean would be quite damaging to Wiktionary's coverage of the dialects.--Tibidibi (talk) 14:49, 19 July 2021 (UTC)
Well, on the one hand, Biblical Hebrew is represented online (e.g. digitalised BHS, and there surely are versions on Google Books). And on the other, the list doesn't cover what is well documented, but what Wiktionary treats as well documented. Scots for example isn't well documented (BP, AS).
As for Korean, there are for example: Old Korean, Middle Korean, (New) Korean, Jeju. That is, Jeju, Old and Middle Korean are other languages than (New) Korean, and only the latter is a WDL. For Hebrew it's different as Hebrew contains Paleo-Hebrew, Biblical Hebrew, Medieval Hebrew and Ivrit. --Macopre (talk) 15:28, 19 July 2021 (UTC)
@Macopre The ISO 639 code "ko" or "kor" for Korean (and hence Wiktionary Korean entries) includes everything between 1600 and 1900, which was still a very different language from Contemporary Korean. Many of the most conservative, and hence most linguistically interesting, dialectal words might also fail because of the rapidity of ongoing dialect leveling. I don't think it makes sense to insist that an interesting Korean hapax is grandfathered in because the text is from 1599, only to be deleted when a new analysis discovers that the text was actually from 1601; or to accept an interesting word collected by Ogura Shinpei on Jeju Island but to reject a similar word when collected in rural North Korea (whose dialects are worse attested than that of Jeju).
The same general ideas would apply to many other languages on the list. Follow the spirit of the law rather than the letter, so to speak.--Tibidibi (talk) 15:51, 19 July 2021 (UTC)
Well, that's a general and different topic regarding WT:CFI and all WDLs. Here are some cases where English terms didn't have 3 usages and hence were removed:
--Macopre (talk) 19:30, 19 July 2021 (UTC)
Firstly it would not be a “change to WT:WDL” because as already said by systematic interpretation we get the desired results (“follow the spirit of the law rather than letter”, that is the intent of the statute maker); there is only the danger that you botch the text change and we only have a new statute maker intent to interpret, because obviously you do not think before editing and assume that everyone is just a machine that executes statutes instead of finding purpose.
So I don’t even see what the intended change with “Modern Standard Arabic” → “Arabic (Modern Standard Arabic)” is.
Then your wording is poor. “Neo-Hebraic” is ugly, it should be Neo-Hebrew, and Ivrit can just be understood as any Hebrew, because it is just the Hebrew form of “Hebrew”, irrespectively of whether English speakers pretend it to mean just Modern Hebrew. It is as poor as the claim that Persian or only Iranian Persian should be called Farsi.
Then the strict criteria for Armenian, Azerbaijani, and Georgian should start from a similar time as Modern Hebrew. Probably with their Sovietization, which is when Azerbaijani switched from the Arabic alphabet (before that not even 10% could read and write in Azerbaijan) and about when the dissolution of Armenian communities in Turkey happened: Surely we should have Western Armenian dialect terms which are mentioned for specific places. The corpora are also bad for these three languages. Can’t attest normal words for Azerbaijani even though it is in Latin script (or before, various Cyrillic scripts) that should scan well.
As the examples of dialect words show, we are in need of implementing the idea that a word needs only as much to be attested as it is specifically claimed to be used: If a word was allegedly used in regional dialects of Western Armenian or Northern England at one time then as we expect it to be lacking in the written sources then so be it. The important thing is that it should not look sham. Other attestation-based dictionaries also include that and it is all fine as they are explicit about how it is attested. And if a word is used on the internet then it may be attested from the internet. An extreme example is emojis, man, the CFI even precedes them. It was seen that if a word appears often non-durably then it does not matter any more that it is not durable because non-durable occurrences get replaced by others: I formulated that a word should be “consistently appearing on the internet”. And a dialect word should be attested as its kind of word uses to be attested. It is different for literary inventions again as we want to exclude protologisms. Fay Freak (talk) 20:18, 19 July 2021 (UTC)

Oirat language code

@Metaknowledge I would like to revive the discussion at Wiktionary:Beer_parlour/2021/April#Oirat_language_code.

@Victar, what do you mean by "Kalmyk should probably be a etymology-only code Oirat xal"?

@LibCae, in your opinion, should the code xal be used for Oirat instead of Kalmyk? Can we create a separate code for Kalmyk (e.g. something like xal-kal), which is a variety of Oirat? RcAlex36 (talk) 16:52, 19 July 2021 (UTC)

Xinjiang Oirat has developed pretty differently from Kalmyk, both phonetically and lexically. I’d like to keep xal for Russian Kalmyk. An independent subcode for Modern Xinjiang Oirat may be suggested, placed under xwo or xal after discussion. LibCae (talk) 08:59, 21 July 2021 (UTC)

Vietnamese Han character entries without headword-line templates

Wiktionary:missing headword-line templates has a lot of instances of Vietnamese Han character entries. They can be found by clicking the "language section" header once (or clicking until the entries sort alphabetically) and then scrolling down near the bottom. It would be nice to clear these out because they make up about 170 out of the 600 something missing-headword pages.

I posted about this on Discord, but it looks like the most knowledgeable Vietnamese editors might not be active on there. Pinging KevinUp as suggested by Suzukaze-c.

Would it be a good idea to clean these up by just putting all the readings into {{vi-readings}}, or do they need more information, like radical-stroke sortkey, categories of reading, and references? The top few search results here seem to all have the sortkey: Special:Search/insource:"vietnamese han character". We have Module:zh-sortkey if that would usually give the right answer. — Eru·tuon 20:21, 19 July 2021 (UTC)

(and @Kevinup Suzukaze-c (talk) 20:25, 19 July 2021 (UTC))

Proto-Norse romanisations

Good day fellow Wiktionarians! I have recently been adding romanisations of Proto-Norse entries in runes, in order to make them easier to find and search for; this is consistent with another ancient Germanic language only attested in an extinct alphabet, Gothic. In order to complete this process I want to enable the instant linking of transliterations, like in Gothic. To demonstrate:

As we can see here, the Gothic transliteration becomes a blue link, while the Proto-Norse stays as unclickable text. In order to change this, the line `link_tr = true` should be added to Proto-Norse (gmq-pro) in Module:languages/datax. This requires an admin to do, so I'm posting here to make sure there is some level of agreement on doing this. ᛙᛆᚱᛐᛁᚿᛌᛆᛌWiktionary's most active Proto-Norse editorAsk me anything 22:26, 20 July 2021 (UTC)
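
For readers unfamiliar with the language data modules, the requested change might look roughly like the sketch below. It is only an illustration: the table layout, the `canonical_name` field and the `format_translit` helper are hypothetical stand-ins, not the actual contents of Module:languages/datax. Only the code gmq-pro and the `link_tr = true` flag are taken from the post above.

```lua
-- Hypothetical, simplified stand-in for a language data table.
-- The real module's structure may differ; this only illustrates the flag.
local m = {}

m["got"] = {
    canonical_name = "Gothic",      -- assumed field name, for illustration only
    link_tr = true,                 -- transliterations are rendered as links
}

m["gmq-pro"] = {
    canonical_name = "Proto-Norse", -- assumed field name, for illustration only
    link_tr = true,                 -- the one-line addition requested above
}

-- Hypothetical helper: wrap a transliteration in a wikilink
-- only when the language's link_tr flag is set.
local function format_translit(lang_code, translit)
    local data = m[lang_code]
    if data and data.link_tr then
        return "[[" .. translit .. "]]"
    end
    return translit
end

print(format_translit("gmq-pro", "horna"))
```

Under these assumptions, a language entry without the flag would get its transliteration back as plain text, which matches the unclickable behaviour described for Proto-Norse.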

Actually, I would rather we delete the link to Gothic romanizations, because I don't see the point. The romanization entries don't say anything but "Romanization of" the native script; why bother linking to that? As far as I can tell, of all the Indo-European languages we have romanization entries for – Gothic, Hittite, Konkani, Mozarabic, Oscan, Pictish, Primitive Irish, Proto-Norse, Sauraseni Prakrit, South Picene, Umbrian – Gothic is the only one where the romanization is linked to automatically. I'd rather bring Gothic into line with all the rest, rather than bring Proto-Norse into line with Gothic. —Mahāgaja · talk 15:45, 22 July 2021 (UTC)
I agree with Mahagaja. IIRC the romanization entries are to assist in getting users to the Gothic-script entries; if they contain no content other than a pointer to the Gothic-script entry, linking to them from the Gothic-script entry seems non-helpful. (Then again, we do link to e.g. plurals from plural even though it is nothing but a pointer to the lemma form, so, shrug. Yes, I know, plurals is supposed to contain a pronunciation section, but how often does that happen?) - -sche (discuss) 06:48, 23 July 2021 (UTC)
@-sche: It happens whenever I notice a Pronunciation section is missing. —Mahāgaja · talk 21:21, 23 July 2021 (UTC)
You know what, I actually agree with you. ᛙᛆᚱᛐᛁᚿᛌᛆᛌWiktionary's most active Proto-Norse editorAsk me anything 21:08, 23 July 2021 (UTC)
@Mnemosientje, any opposition to removing the link for Gothic? —Μετάknowledgediscuss/deeds 21:59, 23 July 2021 (UTC)

Universal Code of Conduct News – Issue 2

Universal Code of Conduct News
Issue 2, July 2021. Read the full newsletter.


Welcome to the second issue of Universal Code of Conduct News! This newsletter will help Wikimedians stay involved with the development of the new code and will distribute relevant news, research, and upcoming events related to the UCoC.

If you haven’t already, please remember to subscribe here if you would like to be notified about future editions of the newsletter, and also leave your username here if you’d like to be contacted to help with translations in the future.

  • Enforcement Draft Guidelines Review - Initial meetings of the drafting committee have helped to connect and align key topics on enforcement, while highlighting prior research around existing processes and gaps within our movement. (continue reading)
  • Targets of Harassment Research - To support the drafting committee, the Wikimedia Foundation has conducted a research project focused on experiences of harassment on Wikimedia projects. (continue reading)
  • Functionaries’ Consultation - Since June, Functionaries from across the various wikis have been meeting to discuss what the future will look like in a global context with the UCoC. (continue reading)
  • Roundtable Discussions - The UCoC facilitation team once again hosted a roundtable discussion, this time for Korean-speaking community members and participants of other ESEAP projects to discuss the enforcement of the UCoC. (continue reading)
  • Early Adoption of UCoC by Communities - Since its ratification by the Board in February 2021, situations whereby UCoC is being adopted and applied within the Wikimedia community have grown. (continue reading)
  • New Timeline for the Interim Trust & Safety Case Review Committee - The CRC was originally expected to conclude by July 1. However, with the UCoC now expected to be in development until December, the timeline for the CRC has also changed. (continue reading)
  • Wikimania - The UCoC team is planning to hold a moderated discussion featuring representatives across the movement during Wikimania 2021. It also plans to have a presence at the conference’s Community Village. (continue reading)
  • Diff blogs - Check out the most recent publications about the UCoC on Wikimedia Diff blog. (continue reading)

Thanks for reading - we welcome feedback about this newsletter. Xeno (WMF) (talk) 02:52, 21 July 2021 (UTC)

Anyone forking yet? Equinox 03:12, 21 July 2021 (UTC)

Vocalisation and diacritics: contrast in Persian and Arabic entries

Hi Wiktionary folks, I have a question to ask out of curiosity: I noticed that Arabic entries in Wiktionary provide a vocalised spelling of the headword (as well as an IPA transcription), but Persian entries contain no such version with vocalisation or other diacritics like tašdid (although they do contain an IPA transcription). Why is this so? I am just starting out learning Persian so I'm not familiar with dictionary formats and spelling conventions. Thanks! —This unsigned comment was added by DDR44818 (talk · contribs) at 09:22, 21 July 2021 (UTC).

@DDR44818 This is I think due to a combination of conventions of the languages in question and choices of the editors involved in working on those languages. Vocalization of Arabic script was invented specifically for Arabic and I think (not sure though) that it's much more common to see it used for Arabic than for Persian. The choice the Persian editors seem to have made is to use the transliteration to indicate the vowels (not always consistently, I must say). Benwing2 (talk) 02:52, 22 July 2021 (UTC)

Deleting the Index

A new vote to do exactly what it says on the tin: delete an outdated, unneeded namespace from the wiki. Input is welcome. —Μετάknowledgediscuss/deeds 23:47, 21 July 2021 (UTC)

Nuke nuke nuke. Possibly some of the Chinese input method stuff is savable. Benwing2 (talk) 05:00, 22 July 2021 (UTC)
Support in principle, but I think it would be good to keep the red links somewhere. PUC – 11:45, 22 July 2021 (UTC)
I didn't know there was an index. Support then, I guess. But interesting to see the data presented that way. – Jberkel 12:04, 22 July 2021 (UTC)
Support. Even if this fails, at the very least the index links on the front page should be changed to point to our lemma categories, which I suspect many of our casual users never manage to find. — Vorziblix (talk · contribs) 18:56, 23 July 2021 (UTC)

Continued abuse of Module:la-pronunc

Dear fellow editors, I've repeatedly asked for help with this here and personally from the admins, but to no avail. I believe that user The Nicodene is on a personal crusade against me that's driven by hatred and revenge and a desperate desire to personally humiliate me and exact admissions of being wrong about something, anything, and to portray me as an "ignorant beginner". They've replaced part of the module with a transcription of an entirely different, non-attested entity, and they've done so blatantly, one-sidedly and with no prior discussion. They've proceeded to wage an outrageous edit war. I've made every attempt to get outside help, but again, unsuccessfully; therefore I'm being forced to revert their edits to the pre-existing consensus. I hate doing this - my longest disagreement prior to this was with an IP and consisted in two reverts, and I've even started a discussion with them in BP. But I don't know what other choice I have!

After @Benwing2 expressed an opinion (here above) that reconstructions shouldn't be given - which in any case would necessitate removing the Reconstructed Classical pronunciation as well! - user The Nicodene changed their tactics and are now repeatedly deleting the whole sub-module. They've been speciously citing WP:BURDEN and "no original research" to justify that outrage, but this is not Wikipedia and we don't operate this way! We constantly do original research! Surely one cannot be justified to up and delete anything they don't know anything about and hold the entry ransom, demanding thorough citations! And this is a module, not just an entry! How can there be any doubt in anyone's mind that the purpose of these actions is to exact personal revenge by disrupting the parts of the website that their target has been involved in?? In other words, this is a personal attack on me by the proxy of my edits - if my edits are undone, I'm being portrayed as wrong, an ignorant beginner, and they as right, a distinguished Romanist. They're even trying to turn the module itself into a battlefield and to give legitimacy to their blatant abuse of the website and its users by padding their reverts with "constructive edits", namely references inside the module, whether in support or against the transcription! It's not enough that the discussion pages have been turned into disgusting squabble, they want to have that inside the module!!

Dear fellow editors, I implore your help! I understand that it's a man's world and everyone fends for themselves, but I don't understand why our website is being allowed to become a proxy battleground for disturbed haters to unleash abuse on long-time contributors like me who've been doing their best to seek feedback from others at every point, constantly seeking feedback in the Beer Parlour on policies and practices before making any changes, and who haven't had a conflict with anyone here throughout the six years of participation?

Even forgetting the human aspect of this, I don't understand how such a blatant violation of the website's guidelines and practices is being allowed to happen. Am I wrong about this? Can anyone hold any page or module hostage, delete anything they like under a feigned "until the discussion is over" while continuing to edit it in whatever direction they desire? How is this compatible with {{rfv}} and so on?!

edit: now that @Erutuon has frozen the module in the state as vandalized by The Nicodene, it's my belief that by doing this the approach of The Nicodene is being validated, and a message is being sent that one can achieve their goals by abusing the website. I believe this is precisely what they were after when they started this debacle, and they've gotten what they wanted. The goal was to revert edits as fast as possible so that your revision wins out when the module eventually gets protected. I kindly ask that the module be restored to this revision, which is the same as the pre-conflict state with a few fixes applied. Brutal Russian (talk) 02:51, 22 July 2021 (UTC)

@Brutal Russian: I specifically said in my message on the page protection that I don't state my approval of the current revision by happening to have inaugurated page protection on that revision. You and other involved parties are welcome to agree on a more acceptable revision to roll back to. The essential thing is that edit warring should stop. — Eru·tuon 03:15, 22 July 2021 (UTC)
@Erutuon: I think the link you've provided is not the right one. PUC – 11:36, 22 July 2021 (UTC)
@PUC: Thanks, corrected. — Eru·tuon 18:16, 22 July 2021 (UTC)
For anyone wondering, I have provided a write-up of some of the abusive behaviour on Brutal Russian's part here. Suffice it to say that he said far worse than I could have ever dreamt of.
–––––––
I have already provided detailed write-ups of at least two issues with his edits: 1 and 2. I will now provide three more: 3, 4, and 5.
The central issue is not only that Brutal Russian does not cite sources (in fact, to judge by the quote "I will not enable the practice of taking a module hostage by paying ransom in references", he pointedly refuses to), but also that he adds examples of wholesale Original Research to the module, namely features that are not directly supported in any source.
All of my edits to the module, which include providing thorough citation of features in comments placed directly below their code (complete with actual quotes from the sources, as well as commentary), have been aimed at addressing the above issues and bringing the module up to Wikipedia's standards.
To claim that that is "vandalism" is simply flat-out wrong. The Nicodene (talk) 03:15, 22 July 2021 (UTC)
@Erutuon: But notice that I'm bringing up the fact that as far as I understand the principles of this website's operation, user The Nicodene's behaviour cannot be described as anything other than abuse and vandalism. Could you comment on what I write in the top comment? Is it justified to arbitrarily delete something without discussing it? Is the person who cites WP:BURDEN, "no original research" and "I have my references" as justification of one-sided appropriation of a module in the right? I want to stress that the details of the transcription are irrelevant to this. I want to establish whether the behaviour that seems to me to violate everything this website stands for is admissible. If it's not - if the arbitrary, undiscussed replacement and further deletion of parts of the module violate Wiktionary policy, I believe that the module needs to be reverted to the state it was in prior to the violation, and not to the state that resulted from its violation. That would be the neutral stance in my opinion. I would be thankful if you could address this. Brutal Russian (talk) 03:29, 22 July 2021 (UTC)
The neutral stance would be reverting to a version prior to the introduction of disputed features by either party. That would mean going back to January of this year, prior to your own appropriation of the Vulgar module and introduction of a series of undiscussed and uncited features (leaving aside, for the moment, the question of their validity).
The proper citation of sources is not some sort of cheap tactic ("I have my references"); it is rather a desirable course of action on Wikipedia, Wiktionary, and academic contexts in general. The Nicodene (talk) 03:43, 22 July 2021 (UTC)
I call attention to this user's vile hypocricy and mendacious accusations. I have discussed what to do with the Vulgar Latin transcription on several occasions - including with that very user under the alias Excelsius!! I proposed converting it into an attested variety dubbed Campanian, which received some approbation and no objections. I acted on this after a long period of deliberation, open as I always am to any discussion and improvements. I haven't violated, abused or appropriated anything in any way. This user has arbitrarily, single-handedly and with no discussion obliterated that transcription, and they're employing all the whataboutisms in their book to make their blatant vandalism seem justified. Here they say that they're writing a separate PRmc module - and here they say "my request to start a separate one was rejected". Partly for this reason and partly as part of their personal crusade against me, they decided to blatantly appropriate the existing module!! I stress that this user's aim is to self-affirm their ego by humiliating and proving me wrong, and their battleground is Wiktionary. And from my point of view not enough is being done to stop them. Please, dear fellow editors, I continue to implore your help. Brutal Russian (talk) 04:01, 22 July 2021 (UTC)
What I said was that the features in the Pompeiian module were undiscussed and uncited, which is objectively true. You did mention, back in the day, the possibility of adding a Pompeiian pronunciation in general (and got one user's "weak support" for it), but that is not the same thing as discussing, for instance, whether short Latin i should be rendered, in all environments, as [e] (or at least supporting that with citations), let alone other features.
I also mentioned, incidentally, adding Proto-Romance pronunciations back in the day, which also received no objections, and also met with some approval. Even you approved of the general idea before I ever mentioned it.
The above comment is full of vehement hostile language and "diagnoses", which you have already been blocked once for. To wit: "vile hypocricy [sic]", "mendacious accusations", "blatant vandalism", "blatantly appropriate", "self-affirm their ego", and so on. I ask you to abide by Wiki rules for user behaviour. The Nicodene (talk) 04:26, 22 July 2021 (UTC)
@Brutal Russian: I've read as many of your accusations and psychoanalyses in this and previous posts as I can stand. You have used extreme dehumanizing language towards The Nicodene that is painful to read and makes it difficult to determine what the facts really are, and for all I know would get you banned on Wikipedia where they have more bureaucracy to enforce rules. I don't have the knowledge or daring to join your disputes about Latin pronunciation, though I have read some parts of them. It's unclear to me which of you has behaved worse in edit-warring over Module:la-pronunc, and I haven't tried to dig through your discussions and edits to figure that out. But you are both guilty of editing a main module when you probably should be editing a sandbox instead, because the edits are so controversial at least to the other person that you are both willing to edit war over them intensively for a week! (Though editing and reverting started a month ago.) So it seems very reasonable to lock down the module.
If it's locked on the wrong revision, I'm not really inclined to change it unless you both come up with another revision that you can agree on. I just locked it because I got fed up when The Nicodene last edited. If another admin wants to unilaterally decide on another revision, they're welcome to, because the only part that I really wanted to take in this was locking down the module because edit warring drives me crazy. — Eru·tuon 06:37, 22 July 2021 (UTC)
@Erutuon: Please, I'm not asking that you read through the verbal wars. Is there no way I can get an opinion from you on the arguments that user is adducing to justify their edit warring? They're claiming that they're justified in deleting or replacing part of the module without discussion because it has no references, and that it's "original research". My reverts were based on the belief that this is a violation of the mode of operation of this website. My belief is that changes need to be discussed, that edits shouldn't be destructive of other people's work, and that personal research is allowed. Is my understanding utterly mistaken?? Can I just go and rewrite and delete modules and entries, demanding references, dismissing the contents as personal research and edit warring the entry or module until it gets protected? I believe all I was doing is reverting the module to the consensus because it fell on me to do so while everybody else was ignoring it! I want to understand whether my idea of how this website operates is completely mistaken, and if I've been wasting my time being polite to everybody, reaching out to people and trying to find consensus at every turn. I'm sincerely hoping to receive clarification on this.
(If you want to understand my "psychoanalyses", imagine suffering this type of abuse in return for extending good faith to somebody after they start a discussion by repeatedly covering you in slander while admitting a-priori animosity and bad faith in reply to your repeated calls for civility! Try to imagine that some embittered person with a hurt ego uses Wiktionary as a personal platform to wage war upon you!!! Try to imagine that your attempts to call public attention to this get repeatedly ignored! Not just that, but when trying to bring the issue to public attention I was rewarded with this user harassing me further by creating millions of additional discussions that I was trying to escape, and using bewildering intellectual dishonesty including denying plain truth, as in the screenshot they adduce where I get outraged and call them a liar. My "psychoanalyses" were my best attempt to explain what's going on and I still have no better explanation! Nothing like this had ever happened to me before!) Brutal Russian (talk) 08:06, 22 July 2021 (UTC)
What you have discounted here is that I had good reasons, as I have laid out above, to doubt the accuracy of the content that was removed.
There was no consensus on the Pompeiian features, nor was there really much of a 'discussion' about the overall idea of a Pompeiian pronunciation beforehand, considering that nobody other than you actually discussed its merits or drawbacks, as far as I can see. At most you can say that it was not objected to and that it met with some approval–both of which were also the case for the idea of adding Proto-Romance.
Speaking of animosity, you have used a total of twenty-three exclamation marks in this thread, by my count, and your diagnoses/accusations/insults continue ("bewildering intellectual dishonesty", "embittered", "hurt ego", "denying plain truth"...)
Incidentally, I have never experienced a discussion online where I was being friendly and the other person suddenly accused me of "milking them for knowledge/conviction points" and being clueless, as you did. That was quite a first. The Nicodene (talk) 08:25, 22 July 2021 (UTC)
@Brutal Russian: It looks like User:The Nicodene reverted out of the conviction that original research was completely out of bounds, and that misunderstanding may be being cleared up. Then it's a matter of discussing rationally and deciding what original research is acceptable. Our Wiktionary is often a bit of a wild west in the area of module and template editing, so I don't know how to answer your question about The Nicodene's behavior with reference to any actively enforced policies, but I guess that, as you describe it, it's bad from a personal moral standpoint. Not that that stops anyone! There's plenty of bad behavior on Wiktionary. Unfortunately I don't see that we have robust processes for dealing with it. But I give The Nicodene the benefit of the doubt about the deletion of module content or whatever it was for having been convinced that original research is completely unacceptable everywhere (which it isn't, depending on consensus). Ideally you should come to some agreement between the two of you (and with anyone else who wants to get involved) before asking to have the main module unprotected again. Not sure if that's possible because of all the things said on both sides, but it's all I've got at this moment as an unpaid admin who dragged himself into this against his better judgement. — Eru·tuon 21:39, 22 July 2021 (UTC)
@Erutuon I had reasons to doubt the veracity of the removed content, and I provided write-ups above to explain. I see now that Wiktionary allows some degree of Original Research, but the above user refuses to engage in an evaluation of the disputed features. For instance I explained this issue a while back, only to have it ignored ever since. The Nicodene (talk) 22:48, 22 July 2021 (UTC)

@Erutuon (I'd also like @Benwing2 to read this) Thank you for the reply. I don't think it's possible to give them such a benefit of the doubt because right here they're doing original research and advocating a transcription with a "tense-lax diacritic solution", which is unheard of in any discussion of Classical Latin. At least they were engaging in a discussion, even if it was with the same ultimate goal to prove me wrong and "recant" my transcription.

What I write next is not dependent on any technical details, but again concerns only the basic principles of Wiktionary, so please bear with me if you at all can. I understand what you're saying about coming to an agreement, but the thing is that this situation is not a disagreement about the features of the Campanian transcription. This is what I'm trying to highlight. This did not start with any disagreement about anything; what happened is that that user, arbitrarily and without raising any concerns or discussions, replaced an existing transcription of an entity that was discussed and encouraged, with a transcription of an entirely different entity, that goes far beyond any original research in that it's a theoretical reconstruction exercise and not a language, and it cannot possess a unified allophony in principle. They intended to have a separate PRmc module, but it was rejected. Their solution was to single-handedly appropriate the module. I don't think it can be reasonably doubted that they're simply trying to justify their actions by appealing to all those things, and they're now trying to justify them by portraying this as a disagreement about details. Again, they expressed no disagreement about anything, and they have foregone all discussion. It was I who tried engaging in a discussion - struggling to maintain a civil and polite attitude, despite the abuse that was already being perpetrated against me (19 May) -, and asked them to stop their edits until a consensus could be reached. I was ignored, and they continued to edit the module in an entirely arbitrary direction, and most importantly with a clear "I win you lose" conviction. To add offense to injury, they told me to re-add the transcription that they deleted as a separate sub-module - when that was what they needed to do, and that I specifically suggested and even sought help for.

To recapitulate: the module was hijacked by a single user. My requests to stop and discuss were ignored. Simultaneously I had been abused in another discussion with the same user, which I deserved only by engaging with them in a discussion. Evidently I was so disappointed that I didn't even think to write to any admin and request help. But given that my public abuse at the hands of the same user, in BP no less, had been ignored by everyone, and given that my grievances have been ignored by almost every admin subsequently, I'm not convinced that would have made a difference. I understand well how little you wish to have to do with this, and I understand the laissez-faire attitude here because I have always stuck to myself, even with respect to this very user at the other venue where they repeatedly attempted to abuse me. I simply walked away from the abuse. But here I have been cornered by the lack of interest from other users, including administrators, and by the apparent obligation to engage and seek consensus with anybody.

My contention is that even if they believed themselves to be justified - which I find incredible - this does not legitimize their actions nor their outcome. Eventually they latched onto Benwing2's opinion that reconstructed pronunciations in general are undesirable (expressed a few discussions above, which would however necessitate removing the Reconstructed Classical) in order to remove the prior-existing transcription instead of trying to replace it, citing disagreement over details. Firstly, as I have already explained, this is simply an "If I can't win I'll make everybody lose" move. Secondly, this cannot be justified either since it violates our mode of operation. We do not delete modules or their parts when there's disagreement; the disagreement over details, even if not a specious attempt to assault my person by way of my edits, certainly doesn't necessitate the complete obliteration of the transcription.

The pronunciation that they unjustifiably deleted had been discussed and largely approved. First there was a drift towards totally removing Vulgar Latin, but I proposed salvaging it as Campanian and this aroused no objections. What has resulted from user The Nicodene's actions is in violation of that consensus. Our readers are interested in such a pronunciation existing, and its absence, as solidified by the latest freeze of the module, serves nobody. This is why I believe that reverting to the status quo revision that I propose is not taking my side, but the side of justice, consensus and civility. Otherwise, everybody has lost. Please, I ask that you consider this - I believe I'm making sense here. Brutal Russian (talk) 02:34, 23 July 2021 (UTC)

Taking the Latin exilis to mean 'tense' is hardly Original Research, considering that I was following a scholarly source in doing so. Notice, by the way, that that was just an idea I considered, not something I implemented.
You speak of "grievances", yet you have done far worse.
The idea of adding a Proto-Romance pronunciation was discussed beforehand in the same manner that adding a specifically Pompeiian one was. Namely, both were mentioned, neither met with objections, and there was a positive reception.
If you feel that the very concept of reconstructing Proto-Romance allophonic features is inherently wrong, even though the Dictionnaire Étymologique Roman–a reliable scholarly work–does exactly that (not to mention multiple other sources), then write them an e-mail and explain it to them.
I have multiple reasons to doubt the accuracy of the removed content, and I have written out, in detail, several specific examples. Removing dubious uncited content is an unremarkable, everyday occurrence, both on Wiktionary and Wikipedia.
Could I set aside the time to dig through all of the remaining features you added and provide detailed input, or at least a stamp of approval? Yes, I could. Then, at least, the Pompeiian features really would be the product of some kind of consensus.
If you would settle down and stop posting walls of text dripping with hostility, personal attacks, and weaponized "diagnoses"–which have sent 95% of interested parties running to the hills and completely exhausted the remaining 5%–perhaps a productive dialogue could begin. The Nicodene (talk) 04:04, 23 July 2021 (UTC)
@Brutal Russian, The Nicodene I'm going to be very short on this: You both are clogging up the discussion pages and both should cool down, as suggested by PUC. I'm confident enough to say that at this point nobody has enough power left to read through all these words, read up on the background info, come up with a solution and action. So please, just take a week or two/three to let this thing rest, find some other project for the time being, [I know the descendants sections are a mess], and let the others gather their spirits. Missing information is better than disputed information, so for the time being the revision is perfect for a short break. Thadh (talk) 05:24, 23 July 2021 (UTC)
I feel like it would be helpful for all the other Latin editors to have a discussion excluding the two editors whose long and bitter comments have swamped other discussions, about what the module should look like and do. - -sche (discuss) 07:27, 23 July 2021 (UTC)

Edit warring on fōrmāticus

Just another example that the user in question thinks that edit summaries are battlefields, and after ending my genuine attempt to discuss by abusing and publicly humiliating me they are again trying to taunt me into another squabble. The website cannot function like this. It doesn't even bear mentioning how absurdly trifling their objections are - because nothing is not worth fighting over, and nothing cannot be used to portray me as an ignorant beginner who is wrong. Literally the whole purpose of this user's presence here is to edit war you by reverting within 10 seconds. Please, if you're inclined to ascribe it to some personal squabble that doesn't concern you, I urge you to consider what the website will turn into if this continues unabated; and if you're inclined to blame me as much as that person, I welcome you to have a quick look through my previous BP and TR (and other community) discussions to see what attitude to people and editing I've had all along. This didn't start with me - my beliefs and practices are entirely the opposite of this; it will not end until the person responsible for starting it is dealt with. Brutal Russian (talk) 09:53, 22 July 2021 (UTC)

You typed all-caps edit summaries in which you screamed at me instead of actually addressing the points that I calmly brought up. It is surprising to me that you insist on challenging the use of an asterisk to mark unattested words or phrases, since that is standard usage in Historical Linguistics. The Nicodene (talk) 10:00, 22 July 2021 (UTC)
@Brutal Russian: Yes, I remember you as a pleasant - and very knowledgeable - editor. But diagnosing The Nicodene was a very bad move on your part; I fear you've turned off many people and damaged your credibility by doing that. I read your piece a few days ago. I've got no idea whether you're right - you might be - but it certainly didn't leave a good impression on me.
Also the volume of the exchanges has made it very difficult to assess what is what. And it has induced reader's fatigue in me; I am at a point where whenever I see something related to your disagreement, I'm thinking "oh no, not again; let's get away from here".
I wonder if it wouldn't be best for the both of you to give the issue a complete rest for a few weeks. It would give you the time to cool down, and others the time to recharge their batteries. They might be more willing to listen to you and reread some of the stuff then. PUC – 11:24, 22 July 2021 (UTC)
My sentiments broadly match PUC's; I've been more familiar with Brutal Russian (as a good and knowledgeable editor) than with The Nicodene, and I think The Nicodene is wrong on at least some things (I disagree with the removal of the pronunciation from formaticus and would like to restore that unless there is actually consensus to remove pronunciation information from such entries), but by now the repetitive diatribes between the two have made discussions tiring to wade through and TL;DR. I am aware that in a situation where one person is acting in bad faith, and provokes a reaction, it'd be bad to act like both are being bad, but my impression is that in this situation both editors hold their positions in good faith / sincerity, and it's just that the positions are irreconcilable. This leads me to think it might be helpful if editors who edit Latin, other than these two editors, could decide what to do about formaticus and about the pronunciation module. - -sche (discuss) 08:03, 23 July 2021 (UTC)
Hi @-sche.
Note that Wiktionary avoids assigning a 5th century B.C. Attic pronunciation to the following words, which entered Greek in later eras:
δούξ, Κωνστᾰντῑνούπολῐς, Σκλάβος, βίρρος, φραγέλλιον, ἱεράρχης, πάσχα, πίτα
παροικία, Τοῦρκος, κυριακή, σάββατον, στήκω, Ἀλεξανδρέττα, σεβαστοκράτωρ
Formaticus post-dates the Classical pronunciation we have on Wiktionary (which is from about the first century B.C.) by the better part of a millennium.
The term is not used in Modern Latin, nor was it in Renaissance Latin. It occurs a few times around the ninth century in a limited region, clearly a local 'vulgarism'. It seems to me that it would be most appropriate to assign the word a pronunciation from that time and place, or perhaps a couple of centuries earlier. For that purpose one could use Mario Pei's analysis of the pronunciation of Latin in Northern France in the eighth century. The Nicodene (talk) 18:17, 23 July 2021 (UTC)
A pronunciation is not just how the word was spoken back in the day; it's also how the word is spoken now. A Shakespearean word may have dropped out of the language, but actors still have to pronounce it in the same accent as the rest of their speech. If you're reading a text out loud, or talking in Latin, you shouldn't drop from Classical pronunciation to Eighth Century Northern French pronunciation for one word. --Prosfilaes (talk) 04:12, 25 July 2021 (UTC)