Wiktionary:Beer parlour/2022/October

Removing spaced phrases from German compounds: no open compounds in German

I started to remove the likes of schwarzes Loch from the category of German compounds but was reverted. I provided two German sources saying German has no open compounds, Duden and German Wikipedia, but both were dismissed. The discussion was part of Wiktionary:Beer parlour/2022/August § Rename 'LANG words suffixed with SUFFIX' -> 'LANG terms suffixed with SUFFIX' and the arguments are there.

Some people think there are open compounds in German despite the sources. I need your support to continue the removal. There are about 100 items to correct; except for them Category:German compound terms with its 20,995 items is correct.

Compounds are defined as words; schwarzes Loch is not a word but a phrase. The situation in English is different: it has open compounds. Open compounds in English have no clear detection criteria to contrast them with non-compound phrases: some sources define them using phonology, others using idiomaticity.

Some sources and analysis are at Talk:open compound.

--Dan Polansky (talk) 17:45, 1 October 2022 (UTC)[reply]

Sources:

  • Duden: Komposition: Zusammenschreibung, Getrenntschreibung, Bindestrich. This is a classic German source.
  • W:de:Leerzeichen in Komposita: "Komposita werden gemäß der geltenden deutschen Rechtschreibung grundsätzlich zusammengeschrieben. In vielen anderen Sprachen, wie z. B. Englisch oder Türkisch, ist dies anders, insbesondere in nicht-germanischen Sprachen." The sentences trace to no source.
  • W:Compound (linguistics): "As a member of the Germanic family of languages, English is unusual in that compounds are normally written in separate parts. This would be an error in other Germanic languages such as Norwegian, Swedish, Danish, German and Dutch." Tracing to no source.

--Dan Polansky (talk) 18:06, 1 October 2022 (UTC)[reply]

The status quo is obvious from the category: we don't categorize spaced multi-word terms as German compounds, or else Category:German compound terms would be full of them. Proper conduct would be for those who want to change the status quo to produce consensus, not the other way around.

The category description now says "German terms composed of two or more stems." That is wrong: compounds are words, not just any terms. The renaming from "compound words" to "compound terms" was misleading at best, or just wrong. It was never approved by an explicit consensus. Most people apparently did not notice it was wrong: they did not notice compounds are defined by sources as "words" or "lexemes" and that without that definition the notion of "compound" becomes meaningless. --Dan Polansky (talk) 19:07, 1 October 2022 (UTC)[reply]

Also pinging @Benwing2. I propose 1 of 2 solutions:
Either we treat "compound" to mean specifically "closed compound" and the category "multiword noun/verb/etc-phrase" as "open compound".
Then there's the issue of terminology: Either we leave it as it is or we rename the categories "closed compounds" and "open compounds". Vininn126 (talk) 10:14, 2 October 2022 (UTC)[reply]
The solution is to let the category "compounds" contain what linguists consider to be compounds. For English, that includes open compounds, which are not coextensive with "multiword noun/verb/etc-phrase"; that is crucial. For instance, proverbs are not open compounds. Compounds are words, even if not typographically.
If you do not want to go by sources (for definitions and for German treatment) and want to do your own independent analysis, what definition of "compound" are you using? It cannot be "term made of multiple words" (that would include proverbs), so which is it? --Dan Polansky (talk) 10:50, 2 October 2022 (UTC)[reply]
That is why I said noun/verb phrases. Those are within your rules. I'm trying to cooperate with you. Please see that. Vininn126 (talk) 10:52, 2 October 2022 (UTC)[reply]
I don't understand your proposal and I don't know which definition of "compound" you are using. Your proposal above does not cover hyphenated compounds in any way. --Dan Polansky (talk) 10:57, 2 October 2022 (UTC)[reply]
A compound is a non-clause term composed of multiple words, be it written with a space or hyphen (open) or not (closed). This is not in disagreement with your sources or anything. The way it currently works is that closed compounds go into the category "compound" and that open compounds go into "multiword terms". That is our current system on Wiktionary. My proposal is to keep a distinction between open and closed compounds and we can determine terminology here, either by calling them closed/open or keep it as it is. Vininn126 (talk) 11:00, 2 October 2022 (UTC)[reply]
Thank you. This is in disagreement with my sources in so far as they require a compound to be a word or lexeme, which school bus traffic stop laws arguably isn't, but it meets your definition. Even longer phrases like that would meet your definition. All my sources either require the compound to act as a unit of meaning or be demarcated phonologically; your definition does none of that. What is the source of your definition? Furthermore, in fact, our current practice for English is that open compounds often go to "compounds"; the category for "multi-word terms" is independent of that. Category for multi-word terms can be kept, is independent and unproblematic; the only problem is "compounds". --Dan Polansky (talk) 11:10, 2 October 2022 (UTC)[reply]
https://en.wikipedia.org/wiki/Compound_(linguistics)
If you would like a source. They provide plenty of other sources that back up the definition provided. Vininn126 (talk) 11:13, 2 October 2022 (UTC)[reply]
WP definition is "compound is a lexeme (less precisely, a word or sign) that consists of more than one stem", not traced to a source. That differs from yours since "lexeme" is a fancy synonym for word as a set of inflected forms, not a term. Your previous definition classifies separately inflected adj-noun phrases as compounds, contrary to independent linguistic sources. --Dan Polansky (talk) 11:17, 2 October 2022 (UTC)[reply]
https://www.grammarly.com/blog/open-and-closed-compound-words/
https://www.merriam-webster.com/dictionary/compound
https://dictionary.cambridge.org/grammar/british-grammar/compounds
https://www.dictionary.com/browse/compound
Satisfied? Most of these even include all three kinds, closed, open, and hyphenated. Perhaps in relation to your proposal to include hyphenated forms we could have a separate category for hyphenated compounds. Vininn126 (talk) 11:21, 2 October 2022 (UTC)[reply]
M-W: "a word consisting of components that are words (such as rowboat, high school, devil-may-care)". Cambridge: "A compound word is two or more words linked together to produce a word with a new meaning:". Again, word. They consider "high school" to be a word. --Dan Polansky (talk) 11:23, 2 October 2022 (UTC)[reply]
That is honestly still within the confines of the original idea. The naming system proposed would apply specifically to these types of terms. Vininn126 (talk) 11:25, 2 October 2022 (UTC)[reply]
Your definition is "non-clause term composed of multiple words", but that is very different from "word" or "lexeme". Very long strings of nouns meet your definition, but are unlikely to meet the criteria of these sources. I still don't know where that definition is coming from; the sources provided do not have it. And I found no sources that give, say, a 5-noun phrase as an example of an open compound. --Dan Polansky (talk) 11:28, 2 October 2022 (UTC)[reply]
Look at what I mean, Dan. I'm trying to find a solution and cooperate. If you have a problem with the wording provide your own, instead of nitpicking mine! You are smart enough to do so. We are ultimately here to find a solution, not bicker about nothing. I ask kindly for your cooperation :) Vininn126 (talk) 11:31, 2 October 2022 (UTC)[reply]
The biggest problem with your definition is that it breaks completely for inflected languages such as German, Czech and Polish. It ranks two-word adj-noun phrases as compounds, contrary to all treatment I can find in sources. My definition is the same as in sources: a compound is a word composed of multiple words, or more precisely, stems. The only problem becomes how to recognize a word in English; for German and Czech, there is no problem. I do not have syntactic criteria for how to recognize a word in English in the sense of these sources: there are none. One has to either look at meaning or at phonology, none of which is syntactic. There are sources linked from compound word and also from Talk:open compound; the talk page has some of the best material. --Dan Polansky (talk) 11:38, 2 October 2022 (UTC)[reply]
An adjective-noun phrase is still considered a noun phrase by syntacticians! Thus that would make it a noun composed of multiple stems! Vininn126 (talk) 11:38, 2 October 2022 (UTC)[reply]
A noun phrase is not a word in Czech, German and Polish. It is a noun phrase composed of multiple stems, but not a word composed of multiple stems. English does not inflect adjectives so it has no clarity there. --Dan Polansky (talk) 11:40, 2 October 2022 (UTC)[reply]
Asserting something lots of times doesn't make it true. Theknightwho (talk) 15:26, 2 October 2022 (UTC)[reply]
I agree with Dan about noun phrase being a different concept from open compound, in the sense that, as I understand it, some things that are noun phrases, like "the house that I used to live in" (a noun modified by a relative clause) and "the house in my hometown" (a noun modified by a prepositional phrase) are clearly not open compounds.
Clearly there are gray areas between the categories, though, because unlike Dan I would instinctively classify school bus traffic stop laws as a compound, because I don't know any syntactic names for the components school bus, traffic stop and laws (beyond "noun"), and the intermediate structures they make, such that I could draw a syntax tree for the whole thing. There's certainly a semantic tree-like relationship at least. It's possible Dan's definition of compound is better than mine. — Eru·tuon 04:02, 3 October 2022 (UTC)[reply]
@Erutuon: "compound" is not a grammatical category in a syntactic tree: that would be "compound noun" or something, given that compounds can be nouns, adjectives and verbs. And that category is unnecessary: NP does all that is required. The long phrase is syntactically analyzed using the rule NP --> NP N, applied recursively. Furthermore, if that long phrase were a compound, why would one say that German has a tendency to form long compounds, and not say that English has such a tendency? --Dan Polansky (talk) 07:35, 6 October 2022 (UTC)[reply]
@Vininn126 I largely agree with your proposal, but with a slight modification. When you write "written with a space or hyphen (open) or not (closed)", this delineates an open/closed distinction that I don't believe is useful in the context of German; it would potentially have the same word classified in both sections. For example, Mobilfunk-Zubehör is a variant spelling of Mobilfunkzubehör, but both of these are treated as one word in German. Other than certain proper nouns (such as Baden-Württemberg), every single German word with a hyphen can be written without it; the hyphen, if used, exists purely to improve readability. Should someone write *Mobilfunk Zubehör, this would be considered a spelling error, in the same way that writing "toomorrow" instead of "tomorrow" is a spelling error.
The difference between Mobilfunkzubehör and schwarzes Loch is that Mobilfunkzubehör is composed entirely of nouns, Mobil + Funk + Zubehör, and thus must be written together, whereas schwarzes Loch is a phrase comprised of an adjective + noun, and may not be written together. It is possible for an adjective to become a noun through the process of nominalization. An example of this is Rotwein, or red wine. By itself, rot is an adjective and declines, for example eine rote Vase, mit einer roten Vase, etc. However, when an adjective undergoes the process of nominalization, it ceases to be an adjective. This is the case with Rotwein, where rot has been nominalized into a noun, and thus Rotwein is composed of two nouns, Rot + Wein; the spelling *Rot Wein is false. This is not the case with schwarzes Loch, where nominalization has not occurred, nor with any other phrase consisting of adjective + noun.
So, to sum up, I would propose having words which are written without a space, including any alternative forms with a hyphen (since there is no difference in German), put into one group, perhaps named "compound words" to avoid ambiguity. For a misspelled word such as *Mobilfunk Zubehör, if a hypothetical entry for this were created, it should still belong to the "compound words" category, because when spelled correctly, that is where it goes and is ultimately how it functions in the language. Then, everything else (guten Morgen, schwarzes Loch, etc.) would go into the second group, "multiword terms". Megathon7 (talk) 20:13, 7 October 2022 (UTC)[reply]
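The grouping proposed above can be sketched in a few lines of Python. This is only an illustration, not existing Wiktionary tooling: the function name `proposed_group`, the `STANDARD_SPELLING` table and its single entry are hypothetical, and the sketch assumes the term is already known to be built from more than one word or stem (a simplex word such as Haus would otherwise be misfiled).

```python
# Hypothetical table mapping a misspelling to its standard spelling.
# The single entry is taken from the example in the comment above.
STANDARD_SPELLING = {"Mobilfunk Zubehör": "Mobilfunkzubehör"}


def proposed_group(term: str) -> str:
    """Sort a multi-stem German term into one of the two proposed groups.

    Assumption: the caller has already established that the term consists
    of two or more words or stems.
    """
    # A misspelled entry is filed according to its standard spelling.
    standard = STANDARD_SPELLING.get(term, term)
    # Hyphenated variants count as one word; only a space marks a phrase.
    if " " in standard:
        return "multiword terms"
    return "compound words"


for example in ["Mobilfunkzubehör", "Mobilfunk-Zubehör", "Mobilfunk Zubehör",
                "schwarzes Loch", "guten Morgen"]:
    print(example, "->", proposed_group(example))
```

Under these assumptions the three Mobilfunkzubehör spellings land in "compound words", while schwarzes Loch and guten Morgen land in "multiword terms".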

By way of academic analysis: how do we know schwarzes Loch is not a word? It is because of separate inflection. Both parts inflect separately, so they must be two words. The same is true for Czech černá díra, inflected as without černé díry. A true German compound is Abendblatt, for which "Abend" does not inflect, just "blatt". For German and Czech, a word is recognized syntactically and whether something is idiomatic or not plays no role. In English, which is largely uninflected, this detection criterion does not work: for black hole, "black" does not get inflected anyway, so other criteria must be used. That's as far as "arguments" since sources are not enough. --Dan Polansky (talk) 19:19, 1 October 2022 (UTC)[reply]
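The separate-inflection test described above can be sketched as follows. This is a hedged illustration only: the (modifier, head) segmentation is supplied by hand as an assumption, the name `modifier_inflects` is hypothetical, and the test is meant for languages such as German and Czech where adjectives inflect; it says nothing useful for English.

```python
from typing import List, Tuple

# Each inflected form is given as a hand-segmented (modifier, head) pair.
# The segmentation is an assumption; this sketch does not compute it.
Paradigm = List[Tuple[str, str]]


def modifier_inflects(paradigm: Paradigm) -> bool:
    """Return True if the modifier part changes anywhere in the paradigm."""
    modifiers = {modifier for modifier, _head in paradigm}
    return len(modifiers) > 1


# Nominative singular, genitive singular, nominative plural (strong forms).
schwarzes_loch = [("schwarzes", "Loch"), ("schwarzen", "Lochs"), ("schwarze", "Löcher")]
abendblatt = [("Abend", "blatt"), ("Abend", "blatts"), ("Abend", "blätter")]

print(modifier_inflects(schwarzes_loch))  # True: both parts inflect, so two words
print(modifier_inflects(abendblatt))      # False: only the head inflects
```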

people with a grudge yelling at each other in circles — This unsigned comment was added by Vininn126 (talkcontribs) at 10:15, 2 October 2022 (UTC).[reply]
"The status quo is not how things are now, but how I think things should be". Theknightwho (talk) 19:31, 1 October 2022 (UTC)[reply]
Quite obviously false, isn't it. Googling the phrase finds nothing. So now Knight has decided with the help of Benwing2 that they own the status quo and that I must produce evidence of consensus. And yet they give us no idea of what compounds are and what they are not, and how to distinguish compounds from proverbs and from arbitrary adjective-noun phrases. diff shows misplaced confidence in a language Knight does not know. They dismiss sources and replace them with their uninformed opinion and faulty analysis. ---Dan Polansky (talk) 19:39, 1 October 2022 (UTC)[reply]
If your view was the status quo, you wouldn't need to be advocating for change. How about you WT:AGF for once, instead of intentionally misrepresenting what I've said and talking about me in the third person? You seem to go out of your way to take the least charitable position, and it just makes you seem dishonest. Theknightwho (talk) 19:44, 1 October 2022 (UTC)[reply]
I only need to advocate the change since Knight reverted my changes instead of accepting them as being in accord with status quo. They did so based on misplaced confidence about "compound", as shown in the diff I linked. Knight has shown poor understanding of what compound and compound word are. So far he failed to provide his definition upon a request. He dares to venture into a language he does not know and dismisses a classic reliable source about that language. In this thread, his substantive contribution so far is zero. I find this whole drama quite absurd. Perhaps someone will be amused. --Dan Polansky (talk) 19:54, 1 October 2022 (UTC)[reply]
More lies, Dan. I gave you a clear argument, and you simply ignored it. Rather than forum-shopping, just take the L. Theknightwho (talk) 19:57, 1 October 2022 (UTC)[reply]
What is Knight's definition of "compound"? --Dan Polansky (talk) 19:58, 1 October 2022 (UTC)[reply]
How about you ask me, instead of being an ass? Theknightwho (talk) 19:59, 1 October 2022 (UTC)[reply]
What is the definition? --Dan Polansky (talk) 20:01, 1 October 2022 (UTC)[reply]
Surely the fact that there is a normative decree by Duden that compounds should be written closed suggests that, in fact, they aren't always? Given we care about language as actually used, and not language as Duden dictates it should be. Theknightwho (talk) 20:03, 1 October 2022 (UTC)[reply]
I see no definition of "compound" in that quote. --Dan Polansky (talk) 20:06, 1 October 2022 (UTC)[reply]
That just sounds like an excuse to ignore a major flaw in your viewpoint, which would explain why you've refused to answer it for the third time. Theknightwho (talk) 20:09, 1 October 2022 (UTC)[reply]
I answered that argument in the previous discussion. And above, I presented Duden-independent analysis and argument why schwarzes Loch is not a word and therefore not a compound. Still no definition. --Dan Polansky (talk) 20:13, 1 October 2022 (UTC)[reply]
You did not provide an answer - you simply asserted that open compounds don't exist. Stop lying. Theknightwho (talk) 20:14, 1 October 2022 (UTC)[reply]
Definition? --Dan Polansky (talk) 20:16, 1 October 2022 (UTC)[reply]
The current position is that we don't have any adequate definition, which means that you are in no position to make the changes you want to make. If you're just going to be obstructive by demanding I provide all the answers, then it's clear there's no point engaging with you. You wouldn't know good faith discussion if it hit you in the face. Theknightwho (talk) 20:19, 1 October 2022 (UTC)[reply]
Good. I do have a definition, you don't. My sourced definition requires a compound to be a word. Now, how do you know that schwarzes Loch is a compound given you have no definition? --Dan Polansky (talk) 20:22, 1 October 2022 (UTC)[reply]
Your definition is plainly inadequate, as I have explained, and as you keep ignoring. We cannot get anywhere until you actually acknowledge that. Theknightwho (talk) 20:24, 1 October 2022 (UTC)[reply]
My def is fine and sourced from multiple sources. How do you know schwarzes Loch is a compound? --Dan Polansky (talk) 20:27, 1 October 2022 (UTC)[reply]
Except it isn't: Surely the fact that there is a normative decree by Duden that compounds should be written closed suggests that, in fact, they aren't always? Given we care about language as actually used, and not language as Duden dictates it should be. Stop being stubborn, for once. I'm not the one advocating for change; you are. Theknightwho (talk) 20:28, 1 October 2022 (UTC)[reply]
The linked German WP article gives "Uni Halle" as an error for "Uni-Halle". It is a spelling error; "Uni" is not inflected in the phrase, so compoundhood is not harmed. How do you know schwarzes Loch is a compound? --Dan Polansky (talk) 20:32, 1 October 2022 (UTC)[reply]
Have we suddenly become a prescriptive dictionary, or did you just forget to read the second sentence of that short paragraph? Theknightwho (talk) 20:37, 1 October 2022 (UTC)[reply]
Why is schwarzes Loch a compound? Is there an answer? --Dan Polansky (talk) 20:41, 1 October 2022 (UTC)[reply]
I'll take that as a tacit admission that open compounds do, in fact, exist in German. Theknightwho (talk) 20:43, 1 October 2022 (UTC)[reply]
Repeated inquiries were unanswered. I call this a fail. --Dan Polansky (talk) 20:44, 1 October 2022 (UTC)[reply]
Because they're irrelevant, Dan. In case you've forgotten, this thread is about whether open compounds exist in German, and not about schwarzes Loch specifically, which you only brought up because you lack the intellectual integrity to admit when you're wrong. A fail indeed. Theknightwho (talk) 20:47, 1 October 2022 (UTC)[reply]
Open compounds such as "Uni Halle" only exist as rare errors. schwarzes Loch is standard German, not an error. And diff: "Obviously a compound. Stop being obtuse." So according to Knight, schwarzes Loch is obviously a compound, but when I ask them what makes it a compound, no answer is forthcoming. Does Knight actually still maintain that schwarzes Loch is a compound, given they have no definition and have provided no detection criteria at all? We don't know. --Dan Polansky (talk) 20:53, 1 October 2022 (UTC)[reply]
As I have already explained, we aren't a prescriptive dictionary and this is a thread about open compounds in German. If you want to have a discussion about a specific term, then please take that to WT:TR, where we can discuss it appropriately. As it is, it is very clear you are not willing to engage in honest discussion and are displaying obvious bad faith. Theknightwho (talk) 21:01, 1 October 2022 (UTC)[reply]
Yes, we are a descriptive dictionary. And what makes schwarzes Loch and similar phrases, not just this one, compounds? Knight made an edit that changed the entry, so the natural question is what knowledge is that edit based on. If Knight does not know that the term is a compound, I would like to ask Knight to revert their edit, and to acknowledge they do not know. schwarzes Loch is a good prototype for analysis; the analysis extends to other items Knight reverted. --Dan Polansky (talk) 21:05, 1 October 2022 (UTC)[reply]
As has been explained to you numerous times, you lacked consensus for the changes and have failed to establish your reasoning that there are no open compounds in German is correct. You are clearly suffering memory issues, and seem to have forgotten that the person you're talking about (me) is also the person you're actually responding to, so I think I'll call it quits. Theknightwho (talk) 21:09, 1 October 2022 (UTC)[reply]
What is Knight's reasoning? I see no reasoning to justify that the contested entries are compounds. I asked about the reasons multiple times, to no avail. --Dan Polansky (talk) 21:15, 1 October 2022 (UTC)[reply]
As has been explained to you numerous times, you lacked consensus for the changes and have failed to establish your reasoning that there are no open compounds in German is correct. Goodbye, Dan. Believe what you want. Theknightwho (talk) 21:17, 1 October 2022 (UTC)[reply]
Ok. Knight claims no knowledge. They are just not convinced by my sources and by my above analysis, so they object, and that's it. Since they rejected my German sources, my English-sourced definition and provided no definition themselves, there is nothing to do here: without a definition, no source-independent analysis can succeed. Notwithstanding the fact that the quoted diff did claim knowledge. Pure obstructionism. I think this is explicitly discouraged on Wikipedia. The form is: "I don't know, I have no sources, but I object => you don't have consensus". Nothing can be done here. --Dan Polansky (talk) 21:25, 1 October 2022 (UTC)[reply]
Delusional. I love the sheer absurdity of the idea that I need to have all the answers to know that yours is wrong, too. Theknightwho (talk) 21:28, 1 October 2022 (UTC)[reply]
@Dan Polansky I should add that your test involving whether the parts of the open compound inflect is not very good for a number of reasons: (1) English adjectives don't inflect in any case, so we have no idea whether compounds like high school would inflect the adjective. (2) In languages that inflect adjectives, sometimes even closed compounds inflect, cf. quartusdecimus and respublica in Latin. (3) Some languages don't write spaces between words (Chinese, Japanese, Thai) or write spaces between morphemes (Vietnamese); in them, the distinction doesn't even make sense. Your phonological test in English is also flawed, since non-compounds like 'Spanish teacher' (= "teacher of Spanish", not "teacher who is Spanish") have the same stress pattern as compounds. Benwing2 (talk) 01:38, 3 October 2022 (UTC)[reply]
@Benwing2: (2) is a serious objection; sources indicate Icelandic has compound-internal inflection as well. The test only works for some languages. (1) and (3) are not really a problem: the test just does not do anything useful for them, but that does not invalidate the test. The phonological test from sources (not so much mine) may well work: I don't see what makes Spanish teacher a non-compound; surely whether it is a sum of parts should not matter given Tanzschule and bookshop.
We have to note: "The first, very simple observation is that all languages examined here have morphological compounds. However, it turned out that the compounds in these languages do not all share the same defining properties. While lexical (compound) stress, headedness (either right or left), inseparability and debarment of word-internal inflection, recursiveness, and linking elements are generally considered essential criteria for the definition of compound, in particular from a German(ic) perspective, all of them also emerged as problematic in at least one language, or as non-existent. Thus, it seems that there is no universal definition of compound."[1]
And further from the same: "In English, on the other hand, the formal distinction between (nominal) compounds and phrases is notoriously difficult."
And[2]: "In direct comparison with parallel phrases [in German], compounds can best be characterized by the following properties:
(i) Stress, which is on the left (modifier) constituent in compounds but on the head in phrases (Fríschluft – frische Lúft ‘fresh air’).
(ii) The stem form of the modifier, i.e. the absence of inflection (FrischØluft – frische Luft).
..."
If one believes the above (and I do), there is no cross-linguistic set of tests to distinguish compounds from multi-word expressions, and one has to look at the tests on a per-language level. For French, a source indicates different linguists classify different items as compounds or not, and some even say there is no true compounding in French.
My proposal, now as before, is to determine compoundhood on a per-language level with the use of literature available for that language. When the literature disagrees, we have to decide what to do, but as long as there is no disagreement in literature, we should follow it.
What we cannot do is assume that there is a simple cross-linguistic conceptual definition of "compound" that acts as a practical decision procedure for all languages. That's not what conceptual definitions in dictionaries do; that's what operational definitions do.
You have classified "schwarzes Loch" as a compound. This raises the questions: 1) What is your definition of "compound"? 2) What are the cross-linguistic tests of compoundhood that you have identified? 3) How do the tests apply to schwarzes Loch, and how do they make it a compound? --Dan Polansky (talk) 07:35, 6 October 2022 (UTC)[reply]

Schwarzes Loch is classified as a Wortverbindung or Mehrwortausdruck, which translates to multi-word expression. These terms are used differently in German than in English. If we ignore that, and instead try to classify it based on some sort of English-centric approach, or a hypothetical "universal" approach that applies to all languages, well, then I don't have an answer for that. As far as German grammar is concerned, this is unambiguously a multi-word expression. Megathon7 (talk) 07:45, 7 October 2022 (UTC)[reply]

Let me qualify the proposal a little, to address one objection raised: the point is not to remove e.g. Uni Halle (if attested) in the meaning of Uni-Halle (university hall) from compounds; that is just a non-standard spelling. The point is to remove the likes of schwarzes Loch and guten Morgen, syntactically composed phrases, not morphologically composed words. The thread title is therefore slightly oversimplified; it is an approximation. --Dan Polansky (talk) 10:03, 7 October 2022 (UTC)[reply]
How exactly are we defining "compound"? This seems to be the problem. Guten Morgen is in the exact same grammatical category as the phrase/expression bis zum letzten Atemzug. If we are defining "compound" broadly as simply being composed of 2+ words, then it fits. But if compound is referring to compound word, then including these phrases is false. Megathon7 (talk) 15:18, 7 October 2022 (UTC)[reply]
@Megathon7 see also the heavily sourced Appendix:Compounds. I define compound as a word, not an orthographic word, but a morphological word. And this is what our category did just recently. Now Category:German compound terms says "German terms composed of two or more stems", which is broken beyond repair as including proverbs and who knows what else. My definition is in keeping with sources traced, and my analysis is in keeping with them. "schwarzer Tee" is not a morphological word, while "Schwarztee" is and so is "high school". In English, the syntax vs. morphology distinction is not so clear-cut as in German, hence "white house" (two morph.words) vs. "White House" (one morph.word). I also created Appendix:Wordhood, but that is much less interesting than the compound one; it at least mentions different kinds of "words" and traces them to sources for further reading. The opposition did not produce any workable definition despite repeated requests, which you will learn if you read the thread above. (But the thread is kind of nasty and not very productive.) --Dan Polansky (talk) 15:29, 7 October 2022 (UTC)[reply]
I have read through this thread. It seems like there's some confusion from non-German speakers on what a German word is, and what a German word is not. When it comes to "compound", the definition "German terms composed of two or more stems" includes, by that definition, all multiword terms, and thus Category:German multiword terms should be listed as a subcategory under Category:German compound terms. Then, when it comes to words such as schwarzes Loch and guten Morgen, they would be listed as a member of both. Sidenote: Why is guten Tag not listed as a compound term, but guten Morgen is? This is not logically consistent.
If "compound" actually wants to be defined as "German words composed of two or more stems", then these adjective + noun combinations cannot be included. Megathon7 (talk) 16:49, 7 October 2022 (UTC)[reply]
I should clarify that guten Morgen is composed of separate words and is not a compound word (similar to how guten Morgen and hätte, hätte, Fahrradkette are not compound words), at least as far as how compound word is defined in German. Megathon7 (talk) 16:58, 7 October 2022 (UTC)[reply]
I created Wiktionary:Compounds to track the state of policy and discussion. --Dan Polansky (talk) 06:32, 8 October 2022 (UTC)[reply]
I haven't read the entire discussion but I still felt like sharing my two self-important cents: German compounds are largely written without spaces. Writing with spaces what should prescriptively be written without spaces does occur commonly on the internet but I think we care about that just as little as we do about prescriptively incorrect capitalization (e.g. haus as a noun is easily attested on Usenet). Apart from that, there are a few true German open compound entries (Deppen Leer Zeichen) and perhaps even more if we count words that were borrowed with their spaces (e.g. Big Mäc). At any rate, schwarzes Loch is definitely an adjective+noun multi-word expression, not a compound noun. This is demonstrated by the fact that schwarzes Loch can be modified by an adverbial (komplett schwarzes Loch) which is not possible for (compound) nouns (*“komplett Big Mäc”). — Fytcha T | L | C 11:32, 25 October 2022 (UTC)[reply]

Hebrew entries are not conforming to our formatting norms (WT:EL)

Hi everybody.

I just realised that each and every Modern Hebrew entry that has a double headword (i.e. a headword followed by an alternative form) is actually going against our formatting norms (WT:EL). (See for instance: חוכמה (khokhmá)).

Alternative forms should be given under the "Alternative form" section, not next to the headword with an ad hoc template. I don't know how this was allowed to happen for Hebrew. Did we create the "Alternative forms" sections after Hebrew headword templates were implemented? And even if that was the case, why was Hebrew not edited to adapt to the new rule?

Either way, the status quo is against our rules and should be fixed.

The problem is not really just a matter of following the rules for the rules' sake, it's a structural matter and one of clarity: Hebrew entries are really messy trying to cope with all alternative forms in the headword line. Sometimes there are 3 different forms, but with the current template one can only add 2, and the 3rd one would still need to go under "Alternative spellings". See for instance ציפורן (tsipóren), that can be spelt ציפורן, צִפּוֹרֶן or צִפֹּרֶן, but there is space for only 2 alternative forms in the headword template. Another case is that of words spelled with the same consonants but different vowels, like להפך (lehéfekh), קמץ (kamáts, kaméts, kámets) or רטוב (ratóv, ratúv).

It would be a lot of work to fix this manually, is this something that can be fixed by a Bot? — Sartma 𒁾𒁉𒊭 𒌑𒊑𒀉𒁲 09:12, 2 October 2022 (UTC)[reply]

I agree with this. The Hebrew entry formatting is even older than the {{alter}} template, and maybe even language-specific formatting in any kind of links or other templates, I have seen some Persian entries using {{lang}} instead of {{head}} even, this is probably most of the reason why one has resorted to an ad hoc template specific to Hebrew, to present it anyhow acceptable in lack of generalized templates. For Arabic it was only around 2015 that Benwing made most of the templates so that editors could add the mass of entries we now have and the templates stay fresh, while the Hebrew ones are specifically tailored but comparatively wayward and with the problems which you have mentioned. I have used {{alter}} nonetheless for plene/defective and even vocalization (e.g. אביונה) variants but one can still present the forms in headwords if only with more intuitive parameter names and not the ugly slashes which no dictionary publishing house would consider fleek typography. Fay Freak (talk) 11:47, 2 October 2022 (UTC)[reply]
@Fay Freak, Sartma I agree with the general principle here. As Fay Freak notes, these templates and entries are really old for the most part and are in desperate need of a refresh. (Notifying Wikitiki89, ZxxZxxZ, Ruakh, Qehath, Mnemosientje, Isaacmayer9, Metaknowledge): Pinging the listed Hebrew editors, although many of them are no longer active. It should be noted that languages that can be written in multiple scripts are inconsistent in how they handle this. See for example дати, where you can see that Old Church Slavonic places the Glagolitic equivalent in ==Alternative forms== but Serbo-Croatian places the Latin equivalent in the headword. Hindi-Urdu also places the Devanagari/Perso-Arabic equivalent in the headword, while Pali puts the multitude of other-script equivalents in ==Alternative forms==. Hebrew could potentially place the plene forms in the headword, but using a slash to separate them is strange and definitely shouldn't be done. I think the best way to proceed is for someone to make a proposal that addresses the major issues; once that's worked out I may be able to help implement the consensus. Benwing2 (talk) 01:30, 3 October 2022 (UTC)[reply]
@Fay Freak, Benwing2: Thank you for your comments!
@Benwing2: If it was just a question of alternative scripts for the same word it would be less of an issue, I think, but here we're talking about different spellings... I don't see any reason why we should give them at the headword level, it makes all very cluttered and not really user friendly, especially since we give all inflected forms too. See for instance חוכמה. We don't write: color/colour (pl colors/colours) in English, there's no reason why we should do that with Hebrew (especially since most nouns and adjectives have at least 4x2 forms...!)... Anyway, I'll come up with a proposal to discuss and then we can see what can be done! Thank you! — Sartma 𒁾𒁉𒊭 𒌑𒊑𒀉𒁲 12:47, 3 October 2022 (UTC)[reply]
For the benefit of those who don't know Hebrew, let me try to explain the situation:
  • Hebrew spelling doesn't indicate all vowels (and doesn't distinguish certain consonants); for example, the equivalent of 'SPR' (ספר) spells several words with different pronunciations: /'se.fer/, /si'per/, /sa'far/, /sa'par/, etc.
  • That said, the spelling does indicate some vowels, using certain letters that also have consonantal values; for example, the equivalent of 'LW' (לו) spells a word pronounced /lo/ and a word pronounced /lu/. This use of vowel letters has increased significantly over the millennia; as a general rule, later texts use these vowel letters in many more places than earlier ones. (Note: I should mention that en.wikt treats all forms of Hebrew as a single language, so all of these texts count as ==Hebrew==, just as weird Elizabethan spellings count as ==English==. But in Hebrew, as in English, en.wikt generally lists words under spellings that are in current use.)
  • Eventually, a system of diacritics was developed that indicates all the vowels, and makes almost all distinctions; hence we can distinguish the various 'SPR' words as סֵפֶר, סִפֵּר, סָפַר, סָפַּר, etc., and the two 'LW' words as לוֹ and לוּ.
  • These diacritics aren't part of the spelling of the word, which is still just ספר or לו or whatnot; rather, they're additional information on top of the actual spelling. An English analogue is the way that some (old?) dictionaries write the headwords "loot" and "look" as "lo͞ot" and "lo͝ok" to simultaneously indicate spelling and pronunciation, with no suggestion that the diacritics are part of the spelling.
  • But unlike "lo͞ot" and "lo͝ok", these diacritics are actually used in some sorts of texts, such as Bibles, prayerbooks, poetry, children's books, and dictionaries. And even though most texts don't use diacritics in general, they'll nonetheless use a diacritic here and there, when useful.
  • I mentioned above that the use of vowel letters has increased over time. In texts that don't use diacritics (the norm), this increase has continued even since the development of diacritics; but in texts that do use diacritics, this increase has now mostly stopped. So, many words now have two standard spellings: a spelling used in writing with diacritics, and a spelling used in regular writing. For example, in a text with diacritics, the word /si'per/ is written סִפֵּר ('SPR' plus diacritics), while in a more typical text, it's written סיפר ('SYPR').
  • What some (most?) Hebrew-English dictionaries do is, they list words under the regular spelling (the one used in writing without diacritics), but they include diacritics anyway, to indicate the pronunciation. So they'll give /si'per/ as סִיפֵּר ('SYPR' plus diacritics). This isn't something you'll find in "real" texts; rather, it's just a way that these dictionaries can show the common spelling and the pronunciation without wasting space, just like an English dictionary writing "lo͞ot".
  • I think that should be enough background to make sense of the situation on en.wikt. If you're interested in the subject and want to know more, see w:Mater lectionis (about the use of vowel letters, especially before the diacritics were developed), w:Niqqud (about the system of diacritics), and w:Ktiv hasar niqqud (about the further modern development of vowel letters in texts that still don't use diacritics).
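To make the relationship between the two spellings concrete, here is a minimal Python sketch (not anything used on en.wikt; the function name `strip_niqqud` is made up for illustration). It assumes that the diacritics are encoded as Unicode nonspacing marks, which is true for the niqqud points, and simply drops them. Note that it would also drop cantillation marks and any other nonspacing mark, so it is a rough illustration only.

```python
import unicodedata

def strip_niqqud(text: str) -> str:
    """Drop every nonspacing combining mark (category Mn), leaving the bare letters.

    Hypothetical helper for illustration; it removes niqqud, but also
    cantillation marks and any other nonspacing mark.
    """
    decomposed = unicodedata.normalize("NFD", text)
    return "".join(ch for ch in decomposed if unicodedata.category(ch) != "Mn")

# /si'per/ as written with diacritics: samekh + hiriq, pe + dagesh + tsere, resh
dotted = "\u05e1\u05b4\u05e4\u05bc\u05b5\u05e8"
# The bare 'SPR' spelling that remains once the diacritics are removed
bare = "\u05e1\u05e4\u05e8"
# The usual undotted spelling 'SYPR', with the extra vowel letter yod
plene = "\u05e1\u05d9\u05e4\u05e8"

print(strip_niqqud(dotted) == bare)   # the diacritics sit on top of the 'SPR' spelling
print(strip_niqqud(dotted) == plene)  # removing them does not produce the 'SYPR' spelling
```

The point of the second comparison is that the two standard spellings cannot be derived from each other just by adding or removing diacritics, which is why both end up needing to be recorded somewhere.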
FWIW, I agree that it's very messy to list both spellings of a bunch of inflected forms in the headword line, but I'm loath to give up the usual spelling, loath to give up the diacritics, and loath to put the diacritics on the spelling that "in real life" never has diacritics. :-P   So this feels "less bad" to me than the obvious alternatives, but if someone has a non-obvious alternative to suggest, or wants to make the case for one of the obvious alternatives, I'm all ears.
RuakhTALK 17:50, 4 October 2022 (UTC)[reply]
@Ruakh I think the biggest issue is the nonstandard use of slash to separate the variant spellings. The usual alternative is to either list the variant spelling in Alternative Forms or on the headword. As for inflected variants, one way is to have two entries, one spelled with the matres lectionis and containing inflections with mater lectionis, and the other using the "Biblical" spelling (or whatever you call it) with "Biblical" inflections. One of them points to the other for the definition. This is similar to how we handle Serbo-Croatian, where the Cyrillic entry contains Cyrillic inflections and the Latin entry contains Latin inflections. As for the niqqud, we have to compromise somewhere and I personally think it's fine to include them in the full-vowel forms (or not). Benwing2 (talk) 03:30, 5 October 2022 (UTC)[reply]
@Ruakh: As Benwing2 said, the issue is the formatting of Hebrew entries, which goes against our formatting rules. We are not suggesting to get rid of any alternative forms, just to show them under "Alternative forms", where they belong. We need to re-format all Hebrew entries and this matter is not open to discussion, it's just something that needs to happen and we just need to understand how to fix it with the least amount of work.
I agree with you that words with plene spelling shouldn't have diacritics, since the addition of vowel letters is de facto perceived and used as a substitution of the dotting system. The two systems are complementary, so it's either one or the other, but not together. I believe that main entries should be given in the plene spelling, the one generally used to write New-Hebrew, while the dotted forms can go under "Alternative spellings".
@Benwing2: I couldn't agree more with what you say, and that would have also been in my proposal (just didn't have time these days to post it here). That's the Wiktionary way. If an alternative form has alternative inflections, those are shown in the alternative form's page. So each "form" has its own page. — Sartma 𒁾𒁉𒊭 𒌑𒊑𒀉𒁲 06:42, 5 October 2022 (UTC)[reply]
So basically what Benwing suggested and Sartma agreed with, we should treat Hebrew like Russian with her pre-1918 spellings: An entry in the latter gets an extra table for pre-1918 inflection either in the main entry or if the citation form differs in the alternative form, given under the usual header. However in Hebrew entries we could combine both older and newer forms in one table since Hebrew inflection is not exactly comparable to Russian one and the interest to see older forms is obviously for Hebrew, her speakers having revived the language and thus ever deferring to old models (unlike Russian where no one but experts and enthusiasts ever read a book in the traditional spelling). Fay Freak (talk) 15:44, 7 October 2022 (UTC)[reply]
@Fay Freak: To be honest, Biblical/Classical Hebrew's conjugations are not the same as Neo-Hebrew. The entire tense system is quite different. NH has a very "European" past/present/future system, but Classical Hebrew didn't work that way. Classical Hebrew had a perfect/imperfect conjugation system and both perfect and imperfect forms could be used to refer to past, present and future. Biblical Hebrew also had a waw-consecutive inflection that doesn't exist in Neo-Hebrew. Biblical Hebrew deserves its own status as a language and its own inflection templates (maybe not for nouns, but definitely for verbs). — Sartma 𒁾𒁉𒊭 𒌑𒊑𒀉𒁲 12:02, 8 October 2022 (UTC)[reply]
I don't think this is the right place to have that discussion; but FWIW, if you do start that discussion, please let me know. (I disagree with some of your claims/terms/assumptions, but I don't want to derail this conversation by rebutting them here.) —RuakhTALK 06:50, 3 November 2022 (UTC)[reply]
Just a process note: WT:EL is a guideline, not a set of rigid rules. It follows from its flexibility section. Editors are not forced to implement EL for Hebrew if they don't want to. The decision lies with editors. It definitely is for discussion. --Dan Polansky (talk) 15:38, 14 October 2022 (UTC)[reply]
@Dan Polansky: Just a topic note: We intend to have it closer to WT:EL not because we feel forced by their magic, but in order to make it more comfortable and conforming in particular for editors wont to the formatting in other languages, which should decrease the learning curve, to be specific, as well as to remove aesthetical deficiencies and perhaps maintenance costs particular to Hebrew, although of course it appears that Sartma argued mainly “it’s the rules” – but the rules have this specific goal. Fay Freak (talk) 17:58, 14 October 2022 (UTC)[reply]
I'm glad you feel that way. Sartma's comments, especially the one that put "goes against our formatting rules" in boldface, were starting to make me worry that general rules were being taken to supersede per-language judgment. —RuakhTALK 06:50, 3 November 2022 (UTC)[reply]

Today, the Citations:Gandhigiri page was blanked by @Sgconlaw on the basis that the on-entry cites and the cites on the Citations page were the same cites. I have added a new cite to the Citations:Gandhigiri page which does not appear on-entry at Gandhigiri. It is my view that, in the course of time, be it decades or a hundred or hundreds of years (2122 and beyond), it seems inevitable to me that all the cites that were blanked from that Citations page today will be restored, with the new cites from the ensuing epochs. I just wanted to see if you all had any reactions or comments on the blanking or my view of the Citations pages. --Geographyinitiative (talk) 18:51, 2 October 2022 (UTC)[reply]

I think we haven't really figured out what we want to do with these and the language at Wiktionary:Quotations seems very much like a series of compromises. One nice thing about the citation namespace is that it's used for making graphical representations with timelines on some entries and this kind of information would definitely be bloat on the entry itself, but can be extremely useful. I'm inclined to say that we should limit the citations in the entries themselves to some reasonable, manageable amount (e.g. 10 or a dozen) and then allow users to basically go wild on the citations namespace to show sustained and varied usage over time (or even a dip in usage if there's a legitimate dry spell), but that proposal is a little outside the scope of this discussion. —Justin (koavf)TCM 22:12, 2 October 2022 (UTC)[reply]
I very much agree with this theory of how the Citations pages should work. The entry proper should have a couple good examples, and then the Citations page should be a wild wasteland of any shit that may help you understand the word from various angles. --Geographyinitiative (talk) 23:10, 2 October 2022 (UTC)[reply]
This is my feeling as well, entries should have usage examples, citations pages should have evidence of usage. - TheDaveRoss 15:03, 5 October 2022 (UTC)[reply]

Categories like Category:English 7-syllable words contain many multiword terms (as opposed to individual words), such as absolute immunity. Using a database dump from 2022-09-20 I have found 1,765 such terms across all such categories (for English). Some of these terms (absolute immunity, algorithmic randomness) have been manually added and some (quid pro quo, sub rosa) are categorized by {{IPA}} as if they were single words because their pronunciations contain no spaces (indicating they are pronounced as single words). I see this as incorrect because I think of quid pro quo as three words, but there's probably a debate to be had there. I see three reasonable solutions, and welcome any I've overlooked.

Option 1: Remove all manually added multiword terms from the categories. Change {{IPA}} to only categorize a term in these categories if the term itself (as opposed to its pronunciation) does not contain any spaces.

Option 2: Remove all manually added multiword terms from the categories. Consider terms pronounced as single words (like quid pro quo) to be single words, regardless of their spelling.

Option 3: Rename these categories, replacing "words" with "terms". Change {{IPA}} to categorize all terms in these categories, regardless of whether they or their pronunciations contain spaces. This would overrule the previous RFM.

What say you?

- excarnateSojourner (talk | contrib) 22:25, 2 October 2022 (UTC)[reply]

I think Option 1 makes the most sense. - excarnateSojourner (talk | contrib) 22:25, 2 October 2022 (UTC)[reply]
Option 1 with a caveat about hyphenated words which are still a single word and not a multiword term. I personally think of something like will-o'-the-wisp as being a multi-word term just like how I would think of will o' the wisp as being a multiword term for containing spaces between semantically meaningful elements, but (e.g.) pro-life is a single word, as pro- does not stand on its own as a word. There could be other instances where hyphenation is tricky as a character that combines two things into one thing or a character that stands between distinct elements in a phrase. —Justin (koavf)TCM 22:31, 2 October 2022 (UTC)[reply]
@Justin Good point. I intended all of the options I listed to leave hyphenated, spaceless terms unaffected (i.e. considered to be single words). - excarnateSojourner (talk | contrib) 22:46, 2 October 2022 (UTC)[reply]
@ExcarnateSojourner I agree with you in all respects. There's no point in having multiword terms in any of these categories; their purpose appears to be to help identify long words of a specific number of syllables, and having multiword terms in them defeats that purpose. I also agree with leaving hyphenated terms alone for now, since it's unfortunately not possible to programmatically make the distinction mentioned by User:Koavf. In the slightly longer term, we should consider keeping or removing hyphenated terms from these categories according to whether hyphenated terms are categorized into 'LANG multiword terms' for a given language (this is controlled on a per-language basis by hyphen_not_multiword_sep in Module:headword/data). Benwing2 (talk) 01:16, 3 October 2022 (UTC)[reply]
  • Oppose option 3: looking at long words by the number of syllables is useful. By contrast, there is nothing peculiar about syntactic composition producing long multi-word phrases. I am undecided between Option 1 and Option 2. Option 1 is easier to manage using automated tools. pro-life and non-European are single words. Terminologically, we have to distinguish "orthographic word" from "phonological word", and also "morphological word" and "grammatical word". None of the notions is the "correct" meaning of "word"; it is for us to decide which of the notions we pick. --Dan Polansky (talk) 09:16, 6 October 2022 (UTC)[reply]
I have created a Grease Pit discussion to find someone with a bot to perform all of the recategorization. I might consider attempting this myself if no one responds.
In seeking to implement Option 1, I have realized that changing {{IPA}} for all languages so that a space in spelling prevents a term from being considered a word is probably a bad idea. So I have instead requested a bot to add nocount=1 to automatically categorized terms containing spaces (like quid pro quo). If any multilingual editors believe a space or equivalent separator in the spelling of a term generally implies a slight pause in pronunciation in all languages, then I can revise the GP request to modify {{IPA}} instead. - excarnateSojourner (talk | contrib) 01:35, 25 October 2022 (UTC)[reply]
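For whoever picks up the bot request, the per-page step could look roughly like the Python sketch below. It is only an illustration under stated assumptions: the helper names (`is_multiword`, `add_nocount`) and the regular expression are made up here, the sketch treats any whitespace in the page title as a word separator and leaves hyphenated, spaceless titles alone, and it only handles plain `{{IPA|...}}` calls without nested templates. Fetching and saving pages is left out entirely.

```python
import re

# Hypothetical pattern: matches {{IPA|...}} calls that contain no nested braces.
IPA_TEMPLATE = re.compile(r"\{\{IPA\|([^{}]*)\}\}")

def is_multiword(title: str) -> bool:
    """Treat a title as multiword if it contains whitespace; hyphens do not count."""
    return any(ch.isspace() for ch in title.strip())

def add_nocount(title: str, wikitext: str) -> str:
    """Append |nocount=1 to each {{IPA}} call on a multiword page, unless already present."""
    if not is_multiword(title):
        return wikitext

    def patch(match: re.Match) -> str:
        params = match.group(1)
        # Skip calls that already carry a nocount parameter.
        if re.search(r"(^|\|)\s*nocount\s*=", params):
            return match.group(0)
        return "{{IPA|" + params + "|nocount=1}}"

    return IPA_TEMPLATE.sub(patch, wikitext)

print(add_nocount("quid pro quo", "{{IPA|en|/kwid/}}"))  # gains |nocount=1
print(add_nocount("pro-life", "{{IPA|en|/x/}}"))         # left unchanged
```

Running the function twice on the same text makes no further change, which matters if a bot revisits pages.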

Linking to Disallowed Entries

Is it acceptable to link to terms that are not permitted? The common case is of links to words for which we have no form of attestation in their script. --RichardW57 (talk) 06:25, 5 October 2022 (UTC)[reply]

Do you mean should we have permanent redlinks? Or redlinks waiting for the eventual erosion of limitations on inclusion in WT:CFI? DCDuring (talk) 13:27, 5 October 2022 (UTC)[reply]
It's the blue redlinks that bother me the most. (One can't turn them orange without logging in.) For the type I have most in mind the evidence is likely to eventually turn up, but such searches are currently failing. I have entered terms with blue redlinks, but @Svartava has summarily deleted or moved them without public reproof. In some cases, one may actually be waiting for the evidence to be honestly created. There are, however, obsolete writing systems for which there may no longer be any instances, and so never will be. My feeling is that if a term is deleted via RfD or RfV, the links to it should also be expunged or blackened, and that terms should only be deleted for lack of attestation via due process; due process sometimes includes special cases of summary deletion. I also feel that restoring terms deleted without due process should not be considered reprehensible edit warring. --RichardW57m (talk) 16:12, 5 October 2022 (UTC)[reply]
If the link will stay red (since the target is not a permitted entry) then of course such links would be useless and should be avoided. Equinox 13:32, 5 October 2022 (UTC)[reply]
Especially since inexperienced users go around systematically creating entries from redlinks in assembly-line fashion. Chuck Entz (talk) 15:22, 5 October 2022 (UTC)[reply]
This most commonly happens with SOP translations or protologisms/anachronistic translations. I believe this sort of thing should be avoided. Vininn126 (talk) 16:59, 5 October 2022 (UTC)[reply]
Among the problems, as RichardW57 observes, there might be an entry (hence a blue link for those not logged in), but not in the right L2 (hence an orange link for those who are logged in). I would argue that waiting for attestation longer than a day or week is not satisfactory: the attestation might never arrive, but the ugly (red or orange) or misleading (blue) colored link remains. Let it be a black link or enclose it in a nul template, say {{noL2sect}} (one that leaves the display black and does nothing else, except, possibly categorizing into a [new?] maintenance category), so that the entry can be found via Cirrus search ('has template:"{{noL2sect}}"') and the item via browser page search. DCDuring (talk) 17:32, 5 October 2022 (UTC)[reply]
The template {{no entry}} at the destination looks like a good solution. --RichardW57m (talk) 14:27, 19 October 2022 (UTC)[reply]
When a link is to a "term" which I don't think should or will exist, I instead link to either Wikipedia (when appropriate), the component parts (when helpful), or de-link. - TheDaveRoss 15:01, 5 October 2022 (UTC)[reply]
That seems to be our best current practice. DCDuring (talk) 17:33, 5 October 2022 (UTC)[reply]

Biblical Hebrew (hbo) not set as ancestor of Hebrew (he)

I tried using {{inh}} on a Hebrew etymology entry (this: מיץ) and apparently it can't be used because Biblical Hebrew (hbo) is not set as an ancestor of Hebrew (he). I believe I don't have to convince anybody on the absurdity of this. Would it be possible to modify the settings so that {{inh}} can be used in Hebrew etymologies? Thank you! — Sartma 𒁾𒁉𒊭 𒌑𒊑𒀉𒁲 08:58, 5 October 2022 (UTC)[reply]

  Done. —Mahāgaja · talk 10:19, 5 October 2022 (UTC)[reply]
@Mahagaja: Thank you! — Sartma 𒁾𒁉𒊭 𒌑𒊑𒀉𒁲 14:20, 5 October 2022 (UTC)[reply]
@Sartma: It may be awkward, but it's not absurd. Biblical Hebrew is treated as part of Hebrew and is not defined as a separate language. As Biblical Hebrew is not included as a separate language, it cannot be inherited from. --RichardW57m (talk) 10:15, 5 October 2022 (UTC)[reply]
@RichardW57m: I also find it absurd that Biblical Hebrew is treated by Wiktionary (only, I hasten to add) as part of a general "Hebrew" (whatever that means). We should have two separate languages: Biblical Hebrew (or "Classical Hebrew", which would possibly include Post-Biblical Hebrew, maybe up to Medieval Hebrew, just like we do for Ancient Greek and Latin) and Neo-Hebrew. To keep talking about just "Hebrew", as we do here on Wiktionary, is linguistically and lexicographically absurd. But more to come on this in a future post. — Sartma 𒁾𒁉𒊭 𒌑𒊑𒀉𒁲 14:28, 5 October 2022 (UTC)[reply]
@Sartma: I could, but that whole etymology is circular. Biblical Hebrew is handled as a historical variety of Hebrew, so essentially the entry now says that Hebrew is inherited from Hebrew, which needless to say holds true for any language; If the form were different I would suggest the wording From earlier X, but in this case the whole etymology section seems useless. Thadh (talk) 10:17, 5 October 2022 (UTC)[reply]
Not entirely. It could easily state that the noun derives from the verb. --RichardW57m (talk) 10:24, 5 October 2022 (UTC)[reply]
@RichardW57m, Thadh: Actually, we can now. By adding ancestral_to_parent = true, at Module:etymology languages/data, we can say that an etymology-only variant of a language is ancestral to its parent language. Thus it is now possible to have CAT:Italian terms inherited from Old Italian and CAT:Latin terms inherited from Old Latin (even though Old Italian and Old Latin are etymology-only variants of Italian and Latin), and CAT:Hebrew terms inherited from Biblical Hebrew should now be possible. —Mahāgaja · talk 10:26, 5 October 2022 (UTC)[reply]
Yeah, that's why I said I could; But in this case, this doesn't seem useful, unless Modern Hebrew and Biblical Hebrew are often spelled differently, which I believe isn't the case. Thadh (talk) 10:33, 5 October 2022 (UTC)[reply]
@Thadh: They are spelt the same, but there was a semantic shift. מיץ (miṣ) doesn't mean "juice" in Biblical Hebrew. We have similar etymologies for Arabic dialects, for instance Moroccan بيت.
Anyway, one of these days I'll propose to separate Biblical Hebrew from Neo-Hebrew again. I believe it needs to be done and I'll bring all the arguments to support my claim. But this is a topic for a future post. — Sartma 𒁾𒁉𒊭 𒌑𒊑𒀉𒁲 13:55, 5 October 2022 (UTC)[reply]
Where would Mishnaic Hebrew and Medieval Hebrew fit? 98.170.164.88 15:09, 5 October 2022 (UTC)[reply]
I would personally fit them under "Biblical Hebrew" (that we could also call "Classical Hebrew"), like we do for Medieval Latin and Greek, since they are "literary" (only written) forms. I'd basically put all the "dead" together, separated from the "alive", but that can be discussed. — Sartma 𒁾𒁉𒊭 𒌑𒊑𒀉𒁲 09:05, 6 October 2022 (UTC)[reply]
@RichardW57m: Not that easily, since the verb is not attested in Biblical Hebrew. — Sartma 𒁾𒁉𒊭 𒌑𒊑𒀉𒁲 14:19, 5 October 2022 (UTC)[reply]
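For readers unfamiliar with the ancestral_to_parent setting mentioned above, the following Python sketch models the idea as a toy rule. It is not the actual module code (which lives in Lua at Module:etymology languages/data); the `Lang` class, its field names other than ancestral_to_parent, and the function `can_inherit_from` are assumptions made up for illustration, and ancestor chains between full languages are deliberately left out.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass(frozen=True)
class Lang:
    """Toy language record; only ancestral_to_parent mirrors a name from the discussion."""
    code: str
    name: str
    parent: Optional["Lang"] = None      # set only for etymology-only variants
    ancestral_to_parent: bool = False    # variant counts as an ancestor of its parent

def can_inherit_from(term_lang: Lang, source: Lang) -> bool:
    """Allow inheritance from an etymology-only variant of term_lang only if it is flagged.

    Everything else returns False here, because ordinary ancestor chains
    between full languages are outside the scope of this sketch.
    """
    if source.parent is not None and source.parent == term_lang:
        return source.ancestral_to_parent
    return False

hebrew = Lang("he", "Hebrew")
unflagged = Lang("hbo", "Biblical Hebrew", parent=hebrew)
flagged = Lang("hbo", "Biblical Hebrew", parent=hebrew, ancestral_to_parent=True)

print(can_inherit_from(hebrew, unflagged))  # variant is just part of Hebrew
print(can_inherit_from(hebrew, flagged))    # variant is treated as an ancestor
```

The two records correspond to the situation before and after the change reported in this thread, under the assumptions stated above.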

Old Novgorodian

@Rua, Victar, Mahagaja, ZomBear, Gnosandes, Useigor, Silmethule, Atitarev, Tetromino, Benwing2, -sche, Tropylium, Surjection, Maas555 (hope I haven't forgotten anyone, if I did, I apologise)

Hi everyone. Currently, we consider Old Novgorodian (ON) and Old East Slavic (OES) to be two separate languages of the East Slavic branch. Both encompass the period from the split of East Slavic, West Slavic and South Slavic (so, ca. 6th-7th century CE) up to the split of Old Ruthenian in the west and Middle Russian in the east (ca. 1500 AD).

ON contains quite interesting features linguistically, like the absence of Slavic second and third palatalisation, tsokanye, and others - which is definitely an argument for keeping the two languages separate. There are however a few problems that we currently experience, especially in our etymological coverage:

  1. Currently, we handle any East Slavic borrowing up to 1500 AD. as OES borrowings - in a two-language model this would however be historically inaccurate. For instance, any Finnic-speaking population in the Middle Ages would live in the territory of the Novgorod Republic, which was predominately ON-speaking. The same goes for Komi and Samic peoples. Latvians lived pretty much on the border, which makes borrowings into Latvian and Latgalian difficult to analyse, but you get the point.
  2. The modern Russian language is actually a complete merger of Old Novgorodian on one hand and Old East Slavic on the other. This also makes it difficult to analyse Russian inherited vocabulary, since it doesn't necessarily come from what we understand as OES, but could rather be a northern influence. It gets even more difficult when you get into the modern Novgorod and Pskov dialects, which have an even greater gradient of ON.
  3. Finally, we haven't worked out the lemmatisation of ON yet, which makes it quite difficult to reference it in etymologies. As in OES, the spelling was quite variable, so we'd need to come up with some normalisation scheme, but unlike OES, yuses aren't generally written at all and yers consistently disappear once they disappear in the spoken language, so an etymological spelling like what we do with OES right now will probably not work.

Would anyone prefer merging ON and OES (how?) and if not, does anyone have any idea how to address the points above? Thadh (talk) 10:09, 5 October 2022 (UTC)[reply]

@Thadh: Given all the features you mentioned and also stuff like keeping separate o-stem nom. in (хлѣбе (xlěbe)) vs acc. in (хлѣбъ (xlěbŭ)) (acc. to Olander due to different development of early *-a(ː)(R)s in Novgorod than in the rest of Slavic), I’d rather keep it separate. I don’t have answers to all the issues you raised though (and am not active in ON, I wasn’t even aware it is treated as a separate language currently!).
Anyway, keeping it separate makes it easier to elaborate in the etymology sections though – eg. in хлѣбе (xlěbe) we could write something like From early Proto-Slavic *xlěbas (> *xlěbъ) (or From early *xlaibas (…) or something) to indicate our PSl. reconstructions aren’t true direct ON pre-forms – even if we don’t do that now.
If ON and OES were merged (I don’t think I’d prefer that though), I guess we could list ON lemmas as alternative forms linking to main OES entries. // Silmeth @talk 11:28, 5 October 2022 (UTC)[reply]
On point 1: if you mean the older (1st millennium CE) Slavic loans in Finnic like lusikka, läävä, most of them are practically from Proto-Slavic and not really in any distinct way I know of from Old East Slavic. A few have been on the contrary argued to be specifically from Old Novgorodian. Nothing of the sort further east or north, however, where contacts are later and probably start with trade contacts with Pomors (who IIUC are already speaking just pre-North Russian with all palatalizations where should be, etc.)
Also: what linguistic reasons are there to group ONov. with surviving East Slavic at all (when it has e.g. no polnoglasie)? I've been under the impression that, what with all the archaisms, it is maybe better considered its own fourth branch — a "North Slavic". --Tropylium (talk) 12:17, 5 October 2022 (UTC)[reply]
@Tropylium: On the Finnic part: AFAIK, any modern theories on the Slavic urheimat would prevent Proto-Finnic and Proto-Slavic from being in direct contact, so if I don't misunderstand anything Proto-Slavic "proper" (for lack of a better name; so before the main migrations) would not come in contact with Proto-Finnic at all. This means that any (Proto-)Finnic loans after the split of the various Slavic branches - and according to Zaliznyak this would mean ca. 6th century CE, since that's when second palatalisation came into being - would be classified as Old Novgorodian here.
Now on ON being East Slavic or not, that's where most experts disagree - it's also quite difficult to discuss because that's when science turns to the wave theory, and we can't really handle it here; I'm honestly not read up enough to readily give you all the features that make ON closer to OES than other branches, but from what I can tell there are just a handful of (early) phonological features that distinguish it from OES, although I guess that is true for any early Slavic language. Thadh (talk) 15:39, 5 October 2022 (UTC)[reply]
Turns out I missed the section in Zaliznyak's book discussing this, he gives the following features that are shared by both ON and OES:
  1. *je- > o- in many words
  2. *-Bj- > -Bl'- (where B is any labial consonant)
  3. *-TElT- > -TOlT- (where T is any stop, E is e, ь and O is o, ъ respectively)
  4. *-oRT- > -RoT- ~ -RaT- (where R is r, l and T is any stop)
Of these, only point 3 is exclusive to OES and ON. Of course there's also numerous similarities in vocab, and shared further developments, but I guess those can be attributed to convergence. Thadh (talk) 14:29, 7 October 2022 (UTC)[reply]
@Thadh: Hi. Unfortunately, I can't participate in this discussion in any way (you know). Gnosandes ❀ (talk) 16:51, 9 October 2022 (UTC)[reply]

Whether various internet culture sites are to be regarded as durably archived sources

To add to this, here are some other potential online sources.

Imageboards (4chan, 7chan + minor/foreign "-chan's")

https://4chan.org/ https://7chan.org/

main problem is the inherent lack of an official archive, remedied via 3rd party archives, most of which provide backups to IA. i am personally unaware of the status of archives of minor/nonenglish imageboards.

Support. Binarystep (talk) 02:46, 18 October 2022 (UTC)[reply]
Oppose. I can't imagine OED, M-W, or any other professional English dictionary quoting posts from 4chan on a regular basis. The site is ephemeral in nature, even if external archives exist, and I would question how durable those archives are. Posters are often completely anonymous, which makes it impossible to tell if a term has widespread adoption or is just used by one prolific poster. Moreover, adding 4chan as an acceptable source would be opening the floodgates for highly offensive and derogatory terms, quite plausibly to a greater degree than Usenet currently allows. Many of these would be like the Usenet-attested term Darky Cuntinent in that they have little currency outside the confines of 4chan. 98.170.164.88 17:02, 21 October 2022 (UTC)[reply]
I would be more open to allowing it if there were heavy restrictions placed on its usage, as an example (not meant as a specific suggestion): must quote at least 15 posts spanning 5 years, all of which must be acceptably archived on Wayback, etc., and possibly even requiring at least one other source/site for attestation and not just 4chan alone. 98.170.164.88 17:15, 21 October 2022 (UTC)[reply]

Imageboorus (Danbooru/Gelbooru mostly)

https://danbooru.donmai.us/ http://gelbooru.com/

gold standard of image tagging (they have their own wiki describing the vast majority of valid tags), been around for over a decade with no signs of stopping, and the tags themselves are unlikely to be deleted, assuming they've been determined to be "canonical".

Oppose First of all, definitely could've used a NSFW warning for the first link. Anyway, to the point, I don't think "tags" are the kind of content we ought to focus on, especially when most of them (judging by the front page of the first site) are just names of media franchises, normal words, or SOP phrases like "huge breasts". Terms that are actually used in sentences/running text are more worthy of inclusion. 98.170.164.88 02:29, 7 October 2022 (UTC)[reply]
Support, but only if we don't accept tags as "uses". At best, they're definitions. Binarystep (talk) 02:46, 18 October 2022 (UTC)[reply]
i think that's a bit short sighted, depending what you mean. for example, various tags, such as red_eyes, have combined various tags from native japanese sites such as 赤眼, which has over 2000 entries alone, despite not having an entry on our own page 赤眼. I think a minimum amount of entries (50? 100? 500?) for a given tag should be sufficient as a valid attestation of its use, though that might ultimately need to be a case by case thing as religion has less than 50 entries while nun and habit have legitimately over 9000 entries each, which i should think speak to the fundamentally different use case Akaibu1 (talk) 03:54, 18 October 2022 (UTC)[reply]
How is that not SOP? The only reason we have an English entry for red eye is because it has special meanings that you wouldn't gather from the components alone. 98.170.164.88 03:59, 18 October 2022 (UTC)[reply]
i'm not really sure why SOP is being brought up or why it seems you think i'm arguing against SOP? the comment is rather confusing.
In any case a good example of a SOP in the hypothetical entry attestation might be detached sleeves, with over a quarter of a million entries
i see no reason why a quarter million uses would not be good enough Akaibu1 (talk) 05:19, 18 October 2022 (UTC)[reply]
I don't see how those count as uses rather than mentions. We don't create entries for words that only appear in dictionaries, regardless of how many dictionaries they appear in, and this seems to be a comparable situation. In any case, it shouldn't be too hard to cite detached sleeves using actual posts/comments/tweets/etc. instead of tags. Binarystep (talk) 06:35, 18 October 2022 (UTC)[reply]
"i'm not really sure why SOP is being brought up" You seemed to be suggesting that you would want to create the entry "red eyes" (or the Japanese equivalent) based on a tag containing pictures of characters with red eyes. That seems entirely SOP, just like "huge breasts" as I mentioned above. I think "detached sleeves" has a greater case for being idiomatic, since "detached" has a specific meaning, although I'm sure some would argue the opposite. Regardless, even in that case, I agree with Binarystep that citing running text is better than citing a tag on an image. 98.170.164.88 08:06, 18 October 2022 (UTC)[reply]

Know Your Meme

https://knowyourmeme.com/

third party documenter of internet meme culture, though i believe it has a problem with people trying to "force memes" (protologisms) by adding them to the site. that shouldn't be a problem for us if we use the attestation standard (lolcat would be valid, "the mist challenge" would not) Akaibu1 (talk) 20:23, 5 October 2022 (UTC)[reply]

Making lists of sites is not helpful. Sites come and go, and change name and ownership and content. The HUGE sites of a decade ago may not exist now, and this pattern will repeat. We need a policy that can be applied to any site at any time. Equinox 20:41, 5 October 2022 (UTC)[reply]
a basic set of requirements might be:
- the site must be at least x (5? 10?) years old in its current form (current meaning likely "this domain has been used for this purpose for this length of time", just to prevent people from pointing at the IA history and going "hey look, sky.com has been around since 1996" when the modern rendition has only been around for 7 months)
- the site must be independently and reliably backed up (in most cases that would probably be IA) Akaibu1 (talk) 21:52, 5 October 2022 (UTC)[reply]
I like these criteria. For reference, 4chan has existed for 19 years, 7chan for at least 16 years (because that's when the Wikipedia page was first deleted, lol), and Know Your Meme for 14 years. But I've searched and have been unable to find a decent source for how old Danbooru and Gelbooru are. - excarnateSojourner (talk | contrib) 22:31, 5 October 2022 (UTC)[reply]
Danbooru is a little over 17 years old, almost old enough to vote in a lot of countries, while gelbooru is over 15 years old, enough to get a driver's permit in most Akaibu1 (talk) 22:39, 5 October 2022 (UTC)[reply]
looking further, 7chan seems to be only a month or two younger than 4chan itself Akaibu1 (talk) 22:44, 5 October 2022 (UTC)[reply]
Strong oppose on all of these, also UrbanDictionary and YouTube comments. I also vote we don't allow the ramblings of madmen or the text output of cats walking across keyboards. - TheDaveRoss 12:41, 6 October 2022 (UTC)[reply]
"I also vote we don't allow the ramblings of madmen": We already count any ramblings as long as they've been published in a book, magazine, or newspaper; even self-publishing via Lulu.com counts. What's considered a "madman" is highly subjective. If one is not religious, one might describe the Book of Revelation (or many other religious texts from many faith traditions) as "the ramblings of madmen", but obviously such works should count. The same could potentially be said for many other works that are wrong (e.g., alchemy), controversial (e.g., cold fusion), avant-garde/experimental/artistic/satirical (e.g., 'pataphysics), or postmodernist (e.g., the stuff Sokal criticizes in Fashionable Nonsense), but which are still valid uses of language. I know this is just a throw-away comment, but IMO this proposal isn't workable. Maybe we shouldn't go out of our way to cite these sources when we can use more "normal" sources instead, but I wouldn't exclude them entirely. 98.170.164.88 01:32, 7 October 2022 (UTC)[reply]
You have convinced me, I now vote that we exclusively allow the ramblings of madmen. - TheDaveRoss 02:29, 7 October 2022 (UTC)[reply]
I agree with equinox. But while I am at it, I oppose these for similar reasons given in the previous discussion. Vininn126 (talk) 14:40, 6 October 2022 (UTC)[reply]
Mixed: We shouldn't cite the articles themselves, given that they can be edited by anyone at any time. The parts of the website that are fixed (such as the News section [3]) are fair game in my view.
Ioaxxere (talk) 04:04, 8 October 2022 (UTC)[reply]
Support. The fact that the opposition boils down to "these people are stupid, we shouldn't document their stupid words" troubles me, given that we're purportedly a descriptivist dictionary. "All words in all languages" includes words we don't like. Binarystep (talk) 02:46, 18 October 2022 (UTC)[reply]
Not what's being said. Stop putting words in people's mouths. Vininn126 (talk) 07:52, 18 October 2022 (UTC)[reply]
How else am I supposed to interpret TheDaveRoss's comparison to "the ramblings of madmen" and "the text input of cats walking across keyboards"? Even during the previous discussion, a number of votes were more focused on the quality of the words themselves than their actual attestability. Binarystep (talk) 09:43, 18 October 2022 (UTC)[reply]
It has to do with the fact you get these literal actual occurrences, or as mentioned AI writings, and what has been proposed doesn't currently filter them out in an effective way. Taking "ravings of madmen and cat keyboard smashing" and interpreting it as "internet words bad" is a stretch and assigning meaning somewhere where it's not. Vininn126 (talk) 09:47, 18 October 2022 (UTC)[reply]
My actual rationale is that I don't support venues which are largely or exclusively user-generated content, especially when that content is skewed heavily towards invention and fandom. This is a recipe for people gaming the very brittle system to get words which they are passionate about included, even if they have little or no actual usage. Beyond gaming the system, this dives into the deepest niches of jargon, (I am sure we could cite millions of character nicknames if we allowed fandom wikis, for instance but I don't believe that having millions of character nicknames is actually a good thing for the project). I do think there should be reasonable restrictions on where we find CFI-establishing citations, because there are many editors here who take the CFI's criteria to mean that 3 uses of a term ever mean that it is now worth including, while I do not. This has nothing to do with what words I like or dislike (we have lots of words I dislike), and everything to do with not wanting to include a bunch of terms which are not actually used by any reasonably sized group of people.- TheDaveRoss 12:26, 18 October 2022 (UTC)[reply]
Support Know Your Meme for attesting facts about memes that otherwise meet CFI. For example, galaxy-brain as an idiomatic term has been well-attested, but KYM could be used to add clarity on the dating and origin of the source meme. KYM has a review process whereby memes are either "confirmed" or "Deadpooled" (rejected/deprecated). The entries for confirmed memes tend to be well-written and sourced. WordyAndNerdy (talk) 07:53, 21 October 2022 (UTC)[reply]
I think there is some fundamental confusion here, we don't need to allow these sites for attestation in order to use them for reference. The voting is specifically as to whether we consider citations from these sites sufficient to meet the criteria for attestation in the CFI. As I mentioned in the Twitter poll, if, for example, a Tweet was the origin of a particular term, it would be folly not to cite that Tweet. However, for the purposes of the CFI, we should find three other venues to cite to confirm that it has been adopted outside of Twitter. Since KYM is a third-party, user-edited site I have to imagine there is less chance that it will provide citations, but perhaps some meta information can be found there. - TheDaveRoss 12:42, 21 October 2022 (UTC)[reply]
KYM is sort of akin to a dictionary in that it defines terms/memes instead of simply using them. It very often shows examples of usage, which we could potentially use for attestation. For example, I don't think the main text of the KYM entry "That escalated quickly" demonstrates usage in a way we could quote. The meme images contained on the page potentially would be examples, however. (Ignore that this particular meme is SOP, it's just the first example I found.) In line with what WordyAndNerdy said above, I'm supportive of using KYM as a reference, but using it for quotations seems iffy. 98.170.164.88 16:38, 21 October 2022 (UTC)[reply]
  •   Support any website including this one. We need more ability to add words, not more restriction. Let the words of rambling madmen just fill the site up, idc TDR. PseudoSkull (talk) 18:42, 21 October 2022 (UTC)[reply]
    If you don't care there are other websites you could not care on... It is clear there are plenty of people willing to add utter nonsense here, why make it easier for them? It is extremely easy to include English words already; opening the door further isn't likely to increase the number of actual terms included. - TheDaveRoss 19:19, 21 October 2022 (UTC)[reply]
Oppose all these, obviously, for the same reasons as Dave. I would also echo his comment that the users saying we need to cite these as references for the meaning or etymology of terms seem to miss the point; we don't need to consider them durably archived and count them towards WT:ATTEST if all we want is to cite them as references. We've always cited things as references that we wouldn't allow as WT:ATTEST citations (for example, other English dictionaries! including online ones which we cited long before the votes on allowing online citations). - -sche (discuss) 02:39, 23 October 2022 (UTC)[reply]
  •   Weak support only for usage examples in the definitions that they provide. (Also this format of voting is very confusing.) AG202 (talk) 05:01, 1 November 2022 (UTC)[reply]
  •   Oppose. We should not quote directly from 4chan under any circumstances. A newspaper or scholarly article that quotes 4chan can be treated like a similar source that quotes Twitter. We can use knowyourmeme as a corroborating reference (not a sole source) for meaning and etymology but not for "durable" citations. Vox Sciurorum (talk) 18:11, 14 December 2022 (UTC)[reply]

Hebrew roots template under Etymology sections

Hebrew roots template ({{HE root}}) should go under the language header (==Hebrew==), but a humongous number of them are under ===Etymology===. In some pages with more than one etymology section, the root is repeated, even if it's the same root in both etymologies (pages with two different roots are fine the way they are). Would it be possible to fix this with a Bot? I guess it could be done in two steps:

  1. move first found root template under ==Hebrew==
  2. delete any duplicate template in the page

Can that be done? — Sartma 𒁾𒁉𒊭 𒌑𒊑𒀉𒁲 08:25, 8 October 2022 (UTC)[reply]
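The two steps above can be sketched roughly as follows. This is only an illustration that works on a page's wikitext as a plain string, not an actual bot: the function name `hoist_root_template`, the regular expressions, and the assumption that the template always appears as `{{HE root|…}}` with no nested templates are all hypothetical. It leaves the page alone when the Hebrew section contains two different roots, since those pages are described above as fine the way they are.

```python
import re

# Assumption: the template is written as {{HE root|...}} with no nested braces.
ROOT_RE = re.compile(r"\{\{HE root\|[^{}]*\}\}\n?")
# The level-2 language header for Hebrew.
HEBREW_HEADER_RE = re.compile(r"^==Hebrew==[ \t]*\n", re.MULTILINE)
# Any other level-2 header (exactly two '=' signs), marking the end of the section.
NEXT_L2_RE = re.compile(r"^==[^=\n][^\n]*==[ \t]*$", re.MULTILINE)


def hoist_root_template(wikitext: str) -> str:
    """Move a single repeated root template directly under ==Hebrew==."""
    header = HEBREW_HEADER_RE.search(wikitext)
    if header is None:
        return wikitext  # no Hebrew section on this page

    start = header.end()
    next_header = NEXT_L2_RE.search(wikitext, start)
    end = next_header.start() if next_header else len(wikitext)
    section = wikitext[start:end]

    roots = [m.group(0).strip() for m in ROOT_RE.finditer(section)]
    if not roots or len(set(roots)) != 1:
        # No root template, or two different roots: leave the page untouched.
        return wikitext

    # Step 2: delete every occurrence; step 1: re-insert one under the header.
    without_roots = ROOT_RE.sub("", section)
    return wikitext[:start] + roots[0] + "\n" + without_roots + wikitext[end:]
```

A real run would still need a human check of the diff, because templates with differing parameters for the same root would be treated here as different roots and skipped.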

Normalization of Umbrian

Note: I'll be using here N meaning that the following orthography is normalized.

I start by making clear that I think that Umbrian is definitely a language of marginal importance (in comparison to other languages which are not getting the attention they should), which is why I hesitated to start this discussion. After discussing it with others on the Discord, though, I was convinced that this is a lexicographic matter rather than an "Umbrian" one, and that is the point I want this discussion to focus on.

Nevertheless, a brief introduction of the language (Italic branch, 7th-1st century BC):

  • very poorly attested (see what's pretty much the entire Umbrian corpus)
  • written half in the w:Old Italic script and half in the Latin script (with both scripts having pros and cons), which means that some words are only attested in one script, some only in the other
  • written by superstitious shepherds who surely had better things to do than come up with a consistent orthography, so a word is spelled differently almost every time it is written
  • the contents of the inscriptions are liturgical or administrative, which implies that
    • nominatives are uncommon (in instructions the subject is always an implied "you")
    • first person singular present is very rare (attested only in subocau(u) (attested 17 times) “I invoke” and 𐌔𐌄𐌔𐌕𐌖 (hapax) “I exhibit”, and once more in a minor inscription)

All this makes it basically impossible to neatly have 'lemmas' and 'non-lemmas' like we do for other languages. Example: Navis (“bird”) is only attested 9 times: (acc.pl.: Navif) aueif (2), auif, 𐌀𐌅𐌉𐌚, 𐌀𐌅𐌄𐌚, auuei; (abl.pl.: Navis) 𐌀𐌅𐌄𐌔, 𐌀𐌅𐌉𐌔, aueis. None of these forms (which except aueif are all practically hapaxes) seems the right choice for a lemma. The solution in line with our current practices for attested and unattested forms would be to have all those pages as nonlemmas soft-redirecting to a Reconstruction:Umbrian/avis page. One of the disadvantages of having the lemma in the Reconstructed namespace, though, is that it is misleading: it might suggest that the entire word is unattested, while it is only the lemma-form that is. Another problem would be that, while in the Reconstruction namespace I would have a bit of freedom and be able to normalize the spelling, the (few) words that actually have an attested lemma-form would be a complete mess: Naiom (“what is said”) would be under [𐌀𐌉𐌖], Npaker (“favourable”) under [pacer], Nkonigaz (“kneeling”) under [conegos]/[𐌊𐌖𐌍𐌉𐌊𐌀𐌆], etc. since those are the only forms they are attested as.

The other solution, which I'm highly supportive of, would be to prioritize words over attested forms and thus put Navis under [avis], Naiom under [aiom], etc. True, this may mislead readers into thinking that the ancient Umbrians wrote this way (and to fix that problem, there's {{normalized}}), but it brings consistency in page titles and easier page navigation. Would I also need to use a sort of {{unattested lemma}} that would say "this lemma form is not attested, but the word is", for cases like Navis?

Catonif (talk) 12:45, 8 October 2022 (UTC)[reply]

This kind of thing is important for any language with unattested lemma forms - I believe a certain level of normalization to the lemma will be needed. Vininn126 (talk) 13:00, 8 October 2022 (UTC)[reply]
Lemma forms don't need to exist as words - many Pali lemma forms are impossible (consonant stems) or very rare (inanimate a-stems) as words. RichardW57 (talk) 01:39, 16 October 2022 (UTC)[reply]
Even in well attested languages like Greek some forms are missing, which may be interpolated and marked up with footnotes. I do however forego conjugation tables altogether. Gothic has more examples of interpolation. English has some historically undocumented headwords, e.g. snore from *fnore (snore; snort, noun), but often etymology only. I do prefer the example of sub-pages as in English or the Vulgar Latin reconstructed entries but I'm not sure if those are orthogonal to the question, because spelling variants are optional whereas inflection is obligatory. Nevertheless, I see no pressing need to make 1p headword. Even if it may be the citation form you could do as needed in Umbrian, i.e. in spirit of a glossary, as long as it can be documented in Wiktionary:About Umbrian. A Trümmersprachen-Banner would be nice, but should not clutter the page. From the fact that there is none in Phoenician e.g. I guess that's a separate issue. 141.20.6.62 15:02, 12 October 2022 (UTC)[reply]
WT:About Umbrian is currently a work-in-progress, it will contain all the (attested and unattested) spelling rules. Catonif (talk) 19:57, 14 October 2022 (UTC)[reply]
I would compare what is being done at Category:Proto-Norse lemmas by User:Mårtensås, who may have thoughts on this subject as well. — Mnemosientje (t · c) 07:51, 13 October 2022 (UTC)[reply]
I've had a short exchange with him on the matter. He showed me how he solved some of the problems without using normalization, but concluded with "[The] most important thing is to be consistent", which I thoroughly agree with, and think that for this particular case, the best way to achieve that is through normalization. Catonif (talk) 19:57, 14 October 2022 (UTC)[reply]
How does one decide what should be normal for a marginally-attested language with highly variable spelling?
We could do with Umbrian what we have done with Oscan, namely provide attested forms as they are and explain what case or conjugation they reflect. Nicodene (talk) 20:37, 14 October 2022 (UTC)[reply]
Yes, that was the original plan, but how would you solve the above mentioned problems? I mean the lemma of Navis, the 𐌀𐌉𐌖-pacer-conegos/𐌊𐌖𐌍𐌉𐌊𐌀𐌆 thing.
Oscan (bad example) also needs to be developed, now it's just some stub pages. I'll hopefully work on it after Umbrian. Catonif (talk) 15:37, 15 October 2022 (UTC)[reply]
I personally would prefer a Normalized template as opposed to a reconstruction page Vininn126 (talk) 15:59, 15 October 2022 (UTC)[reply]
Arbitrarily choose a lemma from the attested forms? Probably with a preference for frequency of attestation. Nicodene (talk) 23:20, 30 October 2022 (UTC)[reply]
In my opinion, the normalized entries should at least include what the attested forms are. 98.170.164.88 16:35, 20 October 2022 (UTC)[reply]
100%. This is already something I do with Old Polish and Middle Polish. Vininn126 (talk) 16:51, 20 October 2022 (UTC)[reply]
Is there a standard normalization of Umbrian that scholars writing about it use? We systematically normalize Old Norse (entries for the attested manuscript spellings are afterthoughts, although one day we should ideally enter them all and link to them from the lemmas), but Old Norse is also systematically normalized in general, in most(?) reference works. If there's a standard normalization of Umbrian, I see no problem with just using that, and if we (or scholars) are reasonably confident of what a lemma form would have been, I don't see a problem with lemmatizing that (in mainspace, not in the Reconstruction: namespace) even if only inflected forms are attested: overall, words being attested only in inflected forms and not in the "lemma" form is not that rare, in all the languages we cover. But also, if e.g. the first-person singular present is almost nonexistent, is there another more attested form it would make more sense to lemmatize, e.g. a third-person form? - -sche (discuss) 02:58, 23 October 2022 (UTC)[reply]
Sadly no, there's no normalization used by scholars, it's solely my creation. I swear it isn't some crazy thing I made up though, it's just a compromise between the two scripts:
  • OIt. script has
    • 𐌔, 𐌆 (Ns, Nz) for L. script S
    • 𐌔, 𐌜 (Ns, Nç) for L. script S. is sometimes used for Nç, but it's not as common
    • 𐌓𐌔, 𐌛 (Nrs, Nř) for L. script RS
    • Note: the romanization of the Italic script for Umbrian is widely agreed on by scholars, so ⟨z ç ř⟩ are not my choice
  • on the other hand L. script has
    • O, U (generally No, Nu) for OIt. script 𐌖
    • C, G (Nk, Ng) for OIt. script 𐌊
    • T, D (Nt, Nd) for OIt. script 𐌕
This and the fact that N-m and N-f are often (but not always) dropped, even though they carry important grammatical information.
Note: I know that orthographies are rarely phonemic and that this problem alone would have not needed normalization. The choice of normalizing arises from the more complex situation, that is better described in the first message.
About lemma forms, in the end I opted for the infinitive for verbs (the most common verb form is likely the third person imperative, but no way that's a lemma) and realized that for nouns and adjectives accusative is best (as we already do in Old French, for example).
Catonif (talk) 15:48, 24 October 2022 (UTC)[reply]

"Bag" edit

The Wiktionary article on "bag" (English) gives the following senses for "bag" in the sense of container:

  1. A flexible container made of cloth, paper, plastic, etc. Synonyms: (obsolete) poke, sack, tote
  2. (informal) A handbag Synonyms: handbag, (US) purse
  3. A suitcase.
  4. A schoolbag, especially a backpack.

I find this rather unsatisfactory, as it gives only a choice between the sense "sack", and three very specific kinds of "bag" (handbag, suitcase, schoolbag).

The Cambridge dictionary definition makes a distinction between a sack-like object (paper bag / plastic bag / hessian fertiliser bag) and a container for carrying personal items, which makes far more intuitive sense to me:

  1. a soft container made out of paper or thin plastic, and open at the top, used to hold foods and other goods:
    • a paper/plastic bag
    • a bag of apples/nuts
    • Don't eat that whole bag of (= the amount the bag contains) sweets at once.
  2. a container made of leather, plastic, or other material, usually with a handle or handles, in which you carry personal things, or clothes or other things that you need for travelling:
    • She pulled a pen and notepad out of her bag and started jotting down information.
    • I hadn't even packed my bags (= put the things I need in suitcases/bags).
  3. a shopping bag

Looking at the Translations section, the current definition appears to have given rise to some confusion, with some giving translations corresponding to Cambridge sense 2. (a container for carrying personal things, etc.) at the current sense 1., which clearly refers to a sack-like object.

Changing the definition would obviously be a major change and shouldn't be undertaken lightly. I don't know how this sort of thing should be handled at Wiktionary and would be interested in what editors think.

Bathrobe (talk) 06:41, 10 October 2022 (UTC)[reply]

I find your reworking of the definitions a more accurate reflection of actual usage. Andrew Sheedy (talk) 22:28, 16 October 2022 (UTC)[reply]

Preferred Encoding

When a word has two different encodings whose rendering differs only by the presence of dotted circles, how do we choose the encoding for the primary lemma? (By Wikimedia diktat, only NFC forms will be considered.) Is there some general rule (I haven't seen one) or do we normally decide about it language by language, or script by script? I haven't seen any such local rules either.

The issue came up with Mon ရဲု, which follows the Unicode Modern Burmese (sensu stricto) rule (at TUS Table 16-4) of vowel above (sensu lato) before vowel below, while Microsoft's Creating and Supporting OpenType Fonts for Myanmar Script requires vowel below before anusvara and VOWEL SIGN AI (so ရုဲ). The Unicode (Burmese) encoding outnumbers the Microsoft encoding by 5:1. I feel bound to notify @咽頭べさ as he might offer some helpful comments. --RichardW57m (talk) 13:41, 10 October 2022 (UTC)[reply]
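To make the two encodings concrete, here is a small sketch using Python's standard `unicodedata` module. The variable names and the helpers `describe` and `is_nfc` are made up for illustration. It builds both code point orders, lists their character names, and checks that NFC leaves each order unchanged, which is why restricting ourselves to NFC forms does not settle the choice between them.

```python
import unicodedata

RA = "\u101B"       # MYANMAR LETTER RA
SIGN_U = "\u102F"   # MYANMAR VOWEL SIGN U (written below)
SIGN_AI = "\u1032"  # MYANMAR VOWEL SIGN AI (written above)

# Vowel above before vowel below (the Unicode Modern Burmese order).
above_first = RA + SIGN_AI + SIGN_U
# Vowel below before VOWEL SIGN AI (the order the Microsoft document requires).
below_first = RA + SIGN_U + SIGN_AI


def describe(text: str) -> list[str]:
    """List each code point with its Unicode character name."""
    return [f"U+{ord(ch):04X} {unicodedata.name(ch)}" for ch in text]


def is_nfc(text: str) -> bool:
    """True if NFC normalization leaves the string unchanged."""
    return unicodedata.normalize("NFC", text) == text


for label, text in (("above first", above_first), ("below first", below_first)):
    print(label, describe(text), "NFC-stable:", is_nfc(text))

# Both signs have canonical combining class 0, so normalization never reorders them.
print(unicodedata.combining(SIGN_U), unicodedata.combining(SIGN_AI))
```

Since the two strings stay distinct after normalization, a lookup pasted from another site only finds the entry if that exact order exists as a page or redirect.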

Hi @RichardW57m, User Hintha is not able to understand Mon language and copying Mon vocabulary from Sealang website is very disappointing. the Sealang website is not useful for Mon vocabulary in any way because the Sealang website has many mistakes in Mon vocabulary, so Polski Wiktionary has explained this in detail in my user talk, check out the truth and falsehoods of ရုဲ terminology below.
  • ရ+ု+ဲ=ရုဲ ✔
  • ရ+ဲ+ု=ရဲု❌
Thanks.--𝓓𝓻.𝓘𝓷𝓽𝓸𝓫𝓮𝓼𝓪|𝒯𝒶𝓁𝓀 16:06, 10 October 2022 (UTC)[reply]
Your inability to format a reply is annoying.
Having now seen the rendering on MS Word, MS Edge (Chromium-based) and Firefox (Linux), it looks in this particular case as though non-standard is the way to go. However, I'd like to know the scope such decisions should have. --RichardW57 (talk) 18:11, 10 October 2022 (UTC)[reply]
The way it is used now is only the consensus standards of the Mon scholars from the Myanmar unicode organization. 𝓓𝓻.𝓘𝓷𝓽𝓸𝓫𝓮𝓼𝓪|𝒯𝒶𝓁𝓀 19:06, 10 October 2022 (UTC)[reply]
What is the "Myanmar Unicode organization"? Is it anything to do with the Unicode Consortium? The way it is used now should still be documented by Wiktionary. Someone may want to look up words by cutting and pasting from Twitter to Wiktionary. Be comforted that the Unicode standard only says that ရ+ဲ+ု=ရဲု should be used for modern Burmese, and note that not everyone (in fact, no supplier of renderers to the masses) agrees with them. However, it does seem to be the commonest spelling around. --RichardW57 (talk) 20:12, 10 October 2022 (UTC)[reply]
I forgot one rather important rendering system - that of iPhone! The rendering system on iPhone (at least, iOS 12.5.6, the latest version for mine), is the other way round - it insists on SIGN AI before SIGN U. That explains why <RA, SIGN AI, SIGN U> is the most popular Mon encoding on the Internet, especially Twitter. I was also perhaps unfair to dismiss Mac users. We may have to switch the content of ရုဲ and ရဲု round, making ရဲု the main entry again! --RichardW57m (talk) 09:11, 11 October 2022 (UTC)[reply]
Hi@RichardW57m,I will pretend that I don't know about all this, so as I am a scholar who wants to protect my mother tongue, I understand that people on this English Wiktionary are jealous of me for personalizing their vices, so I'll pretend I don't know. if you want to learn about Myanmar Unicode typing, you can contact the Burmese language team of the ministry of education's Myanmar ethnic Language department as well as read the Font page, so no matter how much I do my best on this English Wiktionary, people won't recognize my dedication, so I'm going to leave this English Wiktionary forever.--𝓓𝓻.𝓘𝓷𝓽𝓸𝓫𝓮𝓼𝓪|𝒯𝒶𝓁𝓀 08:31, 13 October 2022 (UTC)[reply]

Why are all the entries (though they are only few) except one uncapitalised? Am I missing a special rule for Middle Dutch here? --Florian Blaschke (talk) 21:09, 12 October 2022 (UTC)[reply]

Some citations seem to support this lower case: https://gtb.ivdnt.org/iWDB/search?wdb=VMNW&actie=article&id=ID95213 Not sure why/how this happened in Dutch and I'm not very versed in that language, but it's at least consistent with this source. —Justin (koavf)TCM 21:46, 12 October 2022 (UTC)[reply]
(Notifying Rua, Mnemosientje, Lingo Bingo Dingo, Azertus, Alexis Jazz, DrJos, Rua, Wikitiki89, Benwing2, Mnemosientje, The Editor's Apprentice, Hazarasp): Apologies if you get double-pinged, trying to hit two groups. AG202 (talk) 03:57, 13 October 2022 (UTC)[reply]
User:Rua is most likely to know as she created these entries, but she isn't very active any more. Benwing2 (talk) 04:20, 13 October 2022 (UTC)[reply]
Florian Blaschke, Middle Dutch is barely readable for me. My German is probably better than my Middle Dutch. And my German isn't that great. @MarcoSwart, if anyone knows, I guess maybe you? — Alexis Jazz (talk) 04:31, 13 October 2022 (UTC)[reply]
@Florian Blaschke: I believe the reasoning behind this by Rua was that capital letters are an anachronism for medieval languages, as the Gothic minuscule script used to write medieval West-European languages did not have such a system of capitalization (capitalization for proper nouns is I believe an early modern thing, before that European scripts were consistently either majuscule or minuscule). The same goes for Latin, Ancient Greek, Middle English, Old Norse, etc. though, so this does not represent a special rule for Middle Dutch. Perhaps Rua is here following scholarly convention, but I'm not sure. (I think diplomatic editions of a manuscript will typically forgo capitalization.) — Mnemosientje (t · c) 07:23, 13 October 2022 (UTC)[reply]
@Florian Blaschke: To the information given by @Mnemosientje I could just add that both Middelnederlands Woordenboek and the more recent Vroegmiddelnederlands Woordenboek (see this example at the site of IvdNT) do spell medieval names without capitals. MarcoSwart (talk) 10:03, 13 October 2022 (UTC)[reply]
But why would we do that only for Middle Dutch, when we don't do it for any other ancient or medieval language, where capitals are just as anachronistic? --Florian Blaschke (talk) 11:28, 13 October 2022 (UTC)[reply]
I agree, proper names in other old languages shouldn't be capitalised either. —Rua (mew) 18:42, 13 October 2022 (UTC)[reply]
I don't really care either way (though it's far more common to just capitalise names anyway, and outside diplomatic editions, not doing so comes across as needlessly pedantic to me), I just want consistency. --Florian Blaschke (talk) 21:20, 13 October 2022 (UTC)[reply]
IMO if we want to be pedantically correct for Latin, we'd spell it ALL IN VPPERCASE and avoid U, since that's how the Romans wrote it. But IMO that would be silly because it's contrary to the way that every modern Latin dictionary and scholarly edition does things. For Middle Dutch we should follow whatever the majority scholarly convention is for that language in modern texts. I suspect (but don't know for certain) that this includes capital letters for proper names. Benwing2 (talk) 04:28, 14 October 2022 (UTC)[reply]
My point exactly. Can we remove the pedantically uncapitalised entries now? --Florian Blaschke (talk) 00:41, 23 October 2022 (UTC)[reply]
Both reference works mentioned above don’t follow that convention, though. We should follow the convention for each language, and not shoehorn them just because Latin does things one way. Plus, Latin has been used with capitalised proper nouns for a very long time; the Romans were not the only speakers. I doubt we can say the same for Middle Dutch. Theknightwho (talk) 13:05, 28 October 2022 (UTC)[reply]

Wu romanization on Wiktionary

We should definitely implement reforms to the way Wu is currently handled on Wiktionary. User:ND381/Wu Expansion seems like a great place to start, but it currently appears to be incomplete in many areas. What do you say? Dennis Dartman (talk) 18:54, 13 October 2022 (UTC)[reply]

Hi there hello we are once again discussing Wugniu
We kind of had a conversation going over at Wiktionary:Beer parlour/2022/August, where we got to the consensus that something should be done about it eventually. I proposed we first show Wugniu as default and Wikt in the expanded menu, then switch to Wugniu input. Currently we still have to figure out how to write tones down in Wugniu to show tone sandhi chains and right prominent sandhi, but after that, it should be smooth-ish sailing — 義順 (talk) 11:46, 16 October 2022 (UTC)[reply]

Category(s?) for Hands and Hand related stuff

on the discord i've been talking about adding a hand category for about a week or so, but every time i go to just make it myself i realize there are complications, so i'll bring it up here to decide if, and how many, hand related categories we might need. for example, mitten, glove and gauntlet might go under a hypothetical Category:Handwear, while something like fist and carpal tunnel syndrome might go into a Category:Hands

though i know there's at least half a dozen other nuanced things to consider (like how it seems we've deemed that gestures, and hand gestures by extension, don't warrant a category? though perhaps i'm mistaken)


also yes i know the first word of the subject doesn't have proper plural spelling but i can't think how to make it look good as "Category(ies?)" doesn't look any better Akaibu1 (talk) 19:31, 13 October 2022 (UTC)[reply]

My perspective is that it's okay to start off with Category:Hands and populate Category:en:Hands and then see if you're reaching a lot of entries that could logically go in a subcategory. Note that a subcategory like Category:Handwear would not only diffuse Category:Hands, but also Category:Clothing, which could be even more valuable. Also, these categories ultimately apply across the entire dictionary and all its languages, so maybe Dutch, Igbo, and Russian actually have tons of terms for handwear that can justify a distinct category scheme for that. All that is to say, I think both Category:Hands and Category:Handwear are totally valid and you can include Category:Fingers under Category:Hands to have a pretty robust scheme of terms and subcategories. —Justin (koavf)TCM 19:42, 13 October 2022 (UTC)[reply]
relatedly it occurs to me that actions related to hands, such as toss, lob, throw, hand, clap, applause, chuck (and probably a dozen more) could warrant their own category, but not sure what such a category would be named... Akaibu1 (talk) 20:05, 13 October 2022 (UTC)[reply]
Those are all hand-related actions, as are scratch, sketch, point, grip, etc. or even things like twist, which applies elsewhere. I'm not 100% sure about all these action verbs, but certainly the nouns seem to make a single germane topic. Justin (koavf)TCM 20:08, 13 October 2022 (UTC)[reply]

Template shortcuts for Template:IPAchar

Having to write {{IPAchar|/ɔ/}} is IMO far too many characters to type for a single IPA symbol. As a result, in documentation pages and such, many people are lazy and just write /ɔ/ or the like. I'd like to propose a shortcut or two to avoid this issue. There are various possibilities:

  1. Create a shortcut for {{IPAchar}}. Examples: {{ip}}, {{ipa}}, {{ic}}, {{ch}}.
  2. Create two shortcuts, one for slash-surrounded text and one for bracket-surrounded text. Then you'd write e.g. {{is|ɔ}} or {{isl|ɔ}} without the slashes in place of {{IPAchar|/ɔ/}}, and {{ib|ɔ}} or {{ibr|ɔ}} without the brackets in place of {{IPAchar|[ɔ]}}. We could even make them more mnemonic by calling the one that auto-adds slashes {{i/}} and the one that auto-adds brackets {{i(}} or something (unfortunately you can't include bracket characters in template names).

What do people think? Probably the simple {{IPAchar}} aliases are the best, but I don't know which one is most mnemonic. Benwing2 (talk) 04:42, 14 October 2022 (UTC)[reply]
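To make the two options concrete, here is a minimal Python sketch of how the proposed shortcuts would map onto {{IPAchar}}. The shortcut names ic, is and ib come from the proposal above; the function `expand_shortcuts` and its regular expression are purely hypothetical illustrations and are not how MediaWiki actually resolves templates or redirects.

```python
import re

# Hypothetical mapping: shortcut name -> (opening delimiter, closing delimiter).
# "ic" is a plain alias; "is" auto-adds slashes; "ib" auto-adds square brackets.
SHORTCUTS = {
    "ic": ("", ""),
    "is": ("/", "/"),
    "ib": ("[", "]"),
}

# Matches a simple one-argument call such as {{is|ɔ}}.
_CALL = re.compile(r"\{\{(ic|is|ib)\|([^{}|]*)\}\}")


def expand_shortcuts(wikitext):
    """Rewrite each shortcut call as the equivalent full {{IPAchar|...}} call."""
    def repl(match):
        opening, closing = SHORTCUTS[match.group(1)]
        return "{{IPAchar|" + opening + match.group(2) + closing + "}}"
    return _CALL.sub(repl, wikitext)


print(expand_shortcuts("{{is|ɔ}} and {{ib|ɔ}} and {{ic|/ɔ/}}"))
```

With option 1 the editor still types the slashes or brackets; with option 2 the shortcut supplies them, at the cost of having two names to remember.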

I'd agree with having {{ic}} or something of the like as an alias of {{IPAchar}}. Option 2 wouldn't hurt but I believe it's best to keep it straightforward. Whatever the result is, i/ and i( are hideous names though. Catonif (talk) 08:43, 14 October 2022 (UTC)[reply]
I think having a shortcut for IPAchar is a good idea. Having shortcuts for slashes or not is an interesting idea, but I think the exact shortcuts could have a better form. Vininn126 (talk) 10:02, 14 October 2022 (UTC)[reply]
I created {{ic}} for the moment as a shortcut for {{IPAchar}}. So far I'm using it only on the documentation page of Template:pt-IPA. I can always remove it if people end up not liking it. Benwing2 (talk) 01:12, 17 October 2022 (UTC)[reply]
I think {{ipc}} would make more sense, but that's purely subjective. Chuck Entz (talk) 02:38, 17 October 2022 (UTC)[reply]
  Support. I recently added about 80 instances of the full name to Rhymes:English, and a shortcut would have been nice. - excarnateSojourner (talk | contrib) 05:19, 17 October 2022 (UTC)[reply]

Conlang's "official" words

If a conlang is notable (appendix or not), there should be some equivalent to WP:ABOUTSELF so that words that are said to be part of the language by its original/official documentation do not need to be proven attested. Unofficial words would still need to be attested. GTbot2007 (talk) 12:22, 14 October 2022 (UTC)[reply]

Thai Mon

Please expunge this newly created language. Mon is Mon, Intobesa is gone. --RichardW57m (talk) 13:20, 14 October 2022 (UTC)[reply]

Change the ancestor of Norwegian Bokmål to Danish

There is no doubt that Bokmål is historically descended from Danish through a series of spelling reforms, though with heavy borrowing from both Nynorsk and the Oslo dialect. The current policy of having Bokmål as a descendant of Middle Norwegian is then completely inaccurate, and should be changed. Must there be a vote for this? @Eiliv ᛙᛆᚱᛐᛁᚿᛌᛆᛌProto-NorsingAsk me anything 15:01, 15 October 2022 (UTC)[reply]

1, 2 - we all have some reading to do, again. Thadh (talk) 15:10, 15 October 2022 (UTC)[reply]
This again … I remember the first discussion, where, even though I wrote lengthy replies, I felt like there was a lack of interest and will to do anything. There was no thorough, fact-based discussion. My replies were largely ignored and the whole discussion just silently died off and was archived. I gave up, it was like talking to a wall.
I was not aware of the second discussion and that it actually reached the voting stage. However, the outcome does not surprise me. Votes rooted in facts and reason lost to a couple of votes that essentially were "quit nagging". Again truth was sacrificed and the convenient option was chosen.
It is convenient to ask people to go read those two previous discussions, but until the original question actually is addressed (and not brushed off as unimportant, bothersome or whatever), it will be brought up again and again. Those who insist that Bokmål is fully Norwegian and not descended from Danish must provide some convincing evidence if they want to keep the status quo and settle this issue once and for all.
Nobody would accept it if we defined English as a descendant of Middle French because it has words like question and voice in its lexicon. Likewise, it is equally stupid (yes, stupid) to claim that Bokmål is not descended from Danish because it has words like røyk. It is absolutely impossible to explain how Bokmål got words like øgle, uke, velge, frem, hun and lege without acknowledging its Danish roots. They have been there ever since Danish was Norway’s sole written language (a historical fact) and were not borrowed into written Norwegian from Danish. They were inherited from Danish and stayed as the language evolved into modern Dano-Norwegian, i.e. Bokmål. All non-Danish elements in the language can be dated and traced back to specific spelling reforms through which the government introduced Norwegian elements by force as part of official Norwegianisation policy.
While I am not personally convinced, I agree that since Bokmål has been so heavily Norwegianised, it can be argued that it is a creole language that in part is descended from Middle Norwegian, just as one could argue coherently that Middle English is a creole language descended from Old English and Old Norse. This is also why my suggestion in the first place was to add Danish as one of Bokmål’s ancestors rather than replacing Middle Norwegian as the sole ancestor. Bokmål’s relation to Middle Norwegian is another question that can be discussed later. Denying Bokmål’s Danish origins, however, is outright disinformation. Hått (talk) 17:49, 16 October 2022 (UTC)[reply]
I don't think anyone disputes that there's a significant stratum of Danish in Bokmål, and also some Norwegian. The question is rather the relationship of the two to the modern language and what exactly it means. With English, it's easier to tease things apart because Old French is a Romance language and Old English is a Germanic language. Yes, there were Latin borrowings in Old English and a Frankish element in Old French and an Old Norse element in Norman Old French, but the basic vocabulary and grammatical structure were easily recognized as different. Even the Old Norse element in Old English is easy to spot. Although the Latin and Romance vocabulary is numerically by far the largest element in modern English, there's no doubt whatsoever that English is a Germanic language. As a result, we don't consider Old French to be an ancestor of Middle English or modern English.
With Norwegian, as with English, there was an official standard that was intended to supersede the native language. For centuries the only languages allowed in the English legal system were Old French and Latin. There are all kinds of socioeconomically based diglosses fossilized in English: farmers raised cows and sheep and pigs (Germanic), but the meat they sold was beef and mutton and pork (French). I would expect to see similar phenomena with Norwegian.
So, I suppose the question is: what is the core of Bokmål? is it Danish that's absorbed a bit of Norwegian, or is it Norwegian with a Danish overlay in semantic fields where communication with the ruling classes would be required, or is it some complicated mixture that's hard to analyze? I don't know enough about Danish or Norwegian to be able to even guess. Chuck Entz (talk) 20:39, 16 October 2022 (UTC)[reply]
@Hått Tricky issues like this are by nature hard to resolve, and even harder in a consensus-based system. I think it's unreasonable to say things like "Again truth was sacrificed and the convenient option was chosen" and to use words like "outright disinformation". This presumes that your opinion is the one true and correct view, and that everyone who disagrees is simply wrong. I am certain in this case there are differences of opinion in the scholarly literature, so why should the Wiktionary community be any different? It's better to assume good faith on everyone's part.
BTW my personal view is that Bokmal and Nynorsk should be grouped as one Wiktionary language similar to how all Chinese languages are grouped under "Chinese". This obviously does not imply that they are the same language, any more than the editors of "Chinese" think that Mandarin, Shanghainese, Cantonese, Min Nan etc. are the same language. It simply helps reduce duplication. As it is the situation with Bokmal and Nynorsk is an utter mess of duplication, with a lot of entries de-facto unified already under "Norwegian" but others not. I'm pretty sure that my view is the majority view here, but there are a few extremely vocal members of the Norwegian community who disagree and are effectively blocking any consideration of this option. Benwing2 (talk) 00:29, 17 October 2022 (UTC)[reply]

This term feels a little too specific, being applied for a specific mechanic used in only two Super Mario Bros games. There are hundreds of thousands of video games, and allowing the jargon for every single one feels a little excessive. Should it be deleted? Ioaxxere (talk) 16:11, 17 October 2022 (UTC)[reply]

No. Not too familiar with this specific term, but I doubt this is only used in two Super Mario Bros. games. Even if it was, the clear distinction for me is that it's jargon that was not invented by the creators of the games as a trademark, or something similar to that, but was a case of clear outsider invention. It wouldn't be the same as "Goomba" or "fire flower", for example, in that way. We should keep framerule for the same reasons we keep fandom slang. PseudoSkull (talk) 16:32, 17 October 2022 (UTC)[reply]
Also, we allow jargon for lots of things, such as scouting, so why video games should be any different is just because...it's media? that's newer? PseudoSkull (talk) 16:33, 17 October 2022 (UTC)[reply]
No, he criticised it as one specific franchise (Mario), not as "video games = bad". Equinox 16:36, 17 October 2022 (UTC)[reply]
We have loads of entries for jargon pertaining to specific organizations. Those organizations of course tend to have dated much further back—we're talking 50-100 years old at least, for example, Scouting subgroups and especially religious institution jargon. Those terms don't tend to get any pushback at all. We tend to, however, on the other hand get really pissed off when it's jargon pertaining to a specific piece of media or a franchise that's more modern. I was calling out a bias that does exist on this site and that I believe was applied here. PseudoSkull (talk) 16:45, 17 October 2022 (UTC)[reply]
Those terms are not at all comparable. There are millions of people involved in the Scouts, versus 1732 Super Mario Bros speedrunners (at least according to [4]). Ioaxxere (talk) 16:47, 17 October 2022 (UTC)[reply]
That's not counting all the runs that are done and were never submitted to that site, which I'd estimate throughout Internet history (and that's considering the fact that speedrun.com is relatively new actually, founded in 2014, and I'd been speedrunning even before then), have probably been done by at least tens of thousands of people, especially considering the popularity of that game. PseudoSkull (talk) 16:51, 17 October 2022 (UTC)[reply]
Furthermore: Page views for the Wikipedia article on Super Mario Bros. this month was 96,278, and page views for the Wikipedia article on Boy Scouts of America this month was less than half that...21,936! So we can see evidently that these pieces of media, even individually, in fact are very culturally significant. PseudoSkull (talk) 16:55, 17 October 2022 (UTC)[reply]
Also used for non-Mario games so IMO keep. Equinox 16:36, 17 October 2022 (UTC)[reply]
Which ones? Ioaxxere (talk) 16:39, 17 October 2022 (UTC)[reply]
https://www.google.com/search?q=%22framerule%22+-super+-mario+-smb1+-smb2j
I can't find anything related to what you're referring to Ioaxxere (talk) 16:51, 17 October 2022 (UTC)[reply]
Including the word "speedrunning" gives results like [5], [6], [7], [8]. —The Editor's Apprentice (talk) 18:53, 17 October 2022 (UTC)[reply]
@Ioaxxere Wouldn't this be better suited to WT:RFDE? It should probably be moved there. - excarnateSojourner (talk | contrib) 20:18, 18 October 2022 (UTC)[reply]
I mainly wrote this to gauge people's responses to this specific word. I never requested for the page to be deleted, I only asked if it should. Ioaxxere (talk) 20:32, 18 October 2022 (UTC)[reply]

Partly standardising categorisation

So, there was always a debate on whether words should be assigned to categories named after a word: Does "milk" belong in CAT:Milk? Does "dog" belong in CAT:Dogs? So, we recently got to discussing this on the Discord server and with some substantial help of @Surjection we came up with a great compromise, where "milk" will be categorised into the parent categories of Milk (CAT:Beverages, CAT:Dairy products, CAT:Bodily fluids) BUT it will also be categorised into CAT:Milk under a separate bullet point, which would precede all other members of the category and as such have a visibly different status within this category, while being findable through the category.

You can see this proposal in action on CAT:fi:Dogs, where koira is the basic Finnish word for "dog". So, what does everyone think about this? Is it satisfactory to everyone? Thadh (talk) 19:34, 18 October 2022 (UTC)[reply]

If we do implement this, perhaps we should have a separate template like {{topics}} except that it uses the space character as the sort key, instead of how koira does it now with a |sortN= parameter. — SURJECTION / T / C / L / 19:37, 18 October 2022 (UTC)[reply]
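As a rough illustration of why a space sort key puts the head term first, here is a minimal Python sketch. It assumes a plain code-point comparison of case-folded sort keys, which is a simplification of the collation the wiki software really uses; the function `category_order` and the sample titles are hypothetical and not taken from the actual category.

```python
def category_order(members):
    """Order category members by sort key.

    members: list of (page_title, sort_key) pairs; sort_key may be None,
    in which case the page sorts under its own title.
    """
    def key(member):
        title, sort_key = member
        return (sort_key if sort_key is not None else title).casefold()
    return [title for title, _ in sorted(members, key=key)]


# Illustrative data only: "koira" is given a single space as its sort key.
members = [("susi", None), ("koira", " "), ("hauva", None), ("ajokoira", None)]
print(category_order(members))
```

Because the space character compares lower than any letter, the entry keyed with a space lands ahead of every alphabetically sorted member, which is what produces the separate leading position described above.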
@Thadh I've heard there is a distinction between set and topic categories. Are we talking about both here? - excarnateSojourner (talk | contrib) 20:14, 18 October 2022 (UTC)[reply]
@ExcarnateSojourner: Huh, I'm not aware of any difference, what is it? Thadh (talk) 20:25, 18 October 2022 (UTC)[reply]
@Thadh I don't fully understand the difference myself, but I have seen it mentioned in a few old (but unresolved) RFM discussions, such as this one, this one, and this one. My impression is a topic category contains terms related to a subject (hypothetically cat:Animals could include migration or nocturnal), and a set category is for a particular type of thing, and only contains subcats and terms for more specific subtypes of that type (hypothetically cat:Animals could include cat:Birds and cat:Fish, but not migration, because migration is not a type of animal). - excarnateSojourner (talk | contrib) 22:47, 18 October 2022 (UTC)[reply]
@ExcarnateSojourner: From the last RFM I got the impression that sets and topics are the same thing but different from POS categories (and similar). In any case, it doesn't seem like the current editors distinguish between topics and sets, so let's do this for both. Thadh (talk) 22:56, 18 October 2022 (UTC)[reply]
How do you find out whether a category is for a "set" or a "topic"? 98.170.164.88 04:31, 19 October 2022 (UTC)[reply]
Look at the breadcrumbs at the top of the category page. Topical categories have the language-specific subcategory of Category:All topics as the root and set categories have the language-specific subcategory of Category:All sets. Although the implementation overlaps, topical categories are used for terms about something, while set categories are used for names of various examples of something. For example, man's best friend should go in a topical category, while schnauzer should go in a set category. In practice, though, there usually aren't separate topical and set categories for a lot of things. I've spent an inordinate amount of time over the years fleshing out the set categories under Category:Lifeforms, but taxonomy provides an inherent structure that the topical equivalents lack. Just to see how it would work, I created a group of related categories around the theme of maize: Category:Maize (plant) is a set category, while Category:Maize (food) and Category:Maize (crop) are topical categories. Maize is unusual in having a lot of terms unique to it, so this is more the exception than the rule. More commonly, you have things like Category:Dogs, which is part of the set category tree, but is full of topical terms, and Category:Diseases, which is in the topical category tree but is really a set category.
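The breadcrumb check described above can be sketched in Python as a walk up the parent chain until one of the two roots is reached. This is only an illustration under stated assumptions: real categories can have several parents, whereas the sketch follows a single breadcrumb chain, and the `parent_of` table, including its intermediate categories, is invented sample data rather than the actual category tree.

```python
def category_kind(category, parent_of):
    """Follow one breadcrumb chain upward and report which root it ends in."""
    seen = set()
    current = category
    while current is not None and current not in seen:
        seen.add(current)  # guard against accidental loops in the table
        if current.endswith(":All topics"):
            return "topic"
        if current.endswith(":All sets"):
            return "set"
        current = parent_of.get(current)
    return "unknown"


# Hypothetical child -> parent links, for illustration only.
parent_of = {
    "Category:en:Maize (plant)": "Category:en:Plants",
    "Category:en:Plants": "Category:en:All sets",
    "Category:en:Maize (food)": "Category:en:Foods",
    "Category:en:Foods": "Category:en:All topics",
}
print(category_kind("Category:en:Maize (plant)", parent_of))
print(category_kind("Category:en:Maize (food)", parent_of))
```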
@Thadh: there are indeed serious problems with our category structure, but this doesn't really address those. Chuck Entz (talk)
@Chuck Entz: I wasn't really trying to clean up the whole topic/set mess (I didn't even know the difference myself), rather trying to come to an agreement of whether to include the main term for the category in the category itself: Whether CAT:Dogs is about terms related to dogs or terms that are subtypes of dogs, the word "dog" is neither. This is just a way to stop the constant edit wars about including/excluding "dog" into CAT:Dogs, for instance. Thadh (talk) 14:42, 19 October 2022 (UTC)[reply]
Categories as they stand seem to essentially be lists of hyponyms - but I think there is a want to include the term itself. I think the proposed change of separating it out at the top with a bullet is the solution. The term is included but not as a hyponym. Vininn126 (talk) 15:03, 19 October 2022 (UTC)[reply]
  Support - excarnateSojourner (talk | contrib) 23:04, 18 October 2022 (UTC)[reply]
I agree with having "milk" in the milk category, it might not make sense for a native speaker of the language, but for everyone else it is very valuable information. But most importantly, who's to say what's the "basic word"? Words are full of nuances and synonyms: the basic Italian word for "cow" could be vacca or mucca depending on who you're asking, same thing to "pig" porco/maiale. Catonif (talk) 17:02, 24 October 2022 (UTC)[reply]
@Catonif: I'd say any language's community should decide on their own which words are basic and which are more specialised - there's no way of deciding this for all languages. It's more about the idea, really. Thadh (talk) 17:34, 24 October 2022 (UTC)[reply]

Medaphobic

Medaphobic : A fear of medications, physicians or other health care providers based on real or imagined assumptions. Eehillmd (talk) 05:56, 19 October 2022 (UTC)[reply]

We're a descriptive dictionary. We don't introduce new words people have made up. See our Criteria for inclusion. Chuck Entz (talk) 11:38, 19 October 2022 (UTC)[reply]
The real word is medicophobic. Equinox 11:39, 19 October 2022 (UTC)[reply]

A Modern Greek etymology template needed

by ‑‑Sarri.greek  I Subject: Asking permission & help for etymology template especially for Modern Greek.
Although I do not write etymologies, I often see lots of existing ones with etym, with {{uder}}, lots with a wrong {{inh}}. Dictionaries for Mod.Greek always make the distinction of

  • bor (loanword) and lbor (learned loanword), strictly for foreign words
  • requested template: el-dlbor or el-d-lbor or dlbor with param el = diachronic learned loanword/borrowed word (from previous centuries: from gkm, grc-koi, grc). Relevant discussion at GreasePit2022
    • A huge subcategory for Modern Greek. The term is marked in Greek dictionaries such as {{R:DSMG}} and discussed extensively in its Introduction as an exceptional category for Greek. Their other term internal loanword describes synchronic loanwords from dialectal words which enter the Standard (very rare). Other etymologists use the adjective internal for all cases. At el.wikt, we use the DSMG term "diachronic learned loanword".
    • Examples of words: δημοκρατία (dimokratía), which is not inherited, as it might be assumed, but a dlbor (there was a gap in its use through centuries, and it was reused). The English word sibling is an example of such a dlbor, a revival.
    • Examples of categories: Cat.dlbors from ancient@el.wikt versus Cat.inh@el.wikt while the Cat.inh@en.wikt has too many wrong ones. There is also Cat.undefined

The requested template e.g. {{dlbor|el|grc|δημοκρατία}} would:

If permission is granted, could a template or an addition to Module:etymology/templates be added? (with help from @Benwing2?) I then, would be able to gradually correct existing etymologies while doing my regular work for pronunciations. Updating my mentor and administrator, @Saltmarsh.
PS. The term Modern Greek might be a more helpful and precise term for code el. Thank you. ‑‑Sarri.greek  I 12:29, 20 October 2022 (UTC)[reply]

This is not unique to Greek. All languages with a Classical ancestor do it (Armenian, Indo-Aryan, Neo-Aramaic etc.). Why can't you use {{lbor}}? Vahag (talk) 20:16, 20 October 2022 (UTC)[reply]
@Vahagn Petrosyan, I thought about it; occasionally I used lbor. But I would prefer a language-specific template because (a) el-lbor is for foreign words only (bor and lbor are under Category:Foreign derivation templates), (b) the text should be different, and (c) it is too big a subcategory. It would be nice to mark it as special. I do not know if in other languages this 'ancestral' lbor has e.g. 10,000-20,000 words at least. ‑‑Sarri.greek  I 20:39, 20 October 2022 (UTC)[reply]
The past is a foreign country. --RichardW57m (talk) 08:55, 21 October 2022 (UTC)[reply]
How would the new category be any smaller? How would you categorise the new template? --RichardW57m (talk) 09:03, 21 October 2022 (UTC)[reply]
A! yes @Vahagn Petrosyan, I see now Category:Armenian learned borrowings from Middle Armenian and Category:Armenian learned borrowings from Old Armenian. Yes, these too! ‑‑Sarri.greek  I 20:58, 20 October 2022 (UTC)[reply]
In my opinion, {{learned borrowing}} should be dedicated exclusively to borrowings from a classical literary language, which is usually the ancestor language but can also be a classical literary cousin or uncle (as in Turkish borrowings from Chaghatay). Anyhow, that is how I always understood the meaning of the English expression learned borrowing. Vahag (talk) 21:03, 20 October 2022 (UTC)[reply]
That definition, @Vahagn Petrosyan, implies classical terms from Latin or Greek, e.g. for European languages. All Neo-Latin terms are lbors anyway. But it could be from other languages of any period. Perhaps you are right. Never mind then. I give up. I do not do etymologies anyway. ‑‑Sarri.greek  I 21:15, 20 October 2022 (UTC)[reply]
@Vahagn Petrosyan: In my understanding, learned borrowings are borrowings from any - almost always classical - language that didn't happen through language contact; this doesn't need to be a related language, Burmese borrowings from Sanskrit would also count. So would Greek loans from Hebrew. Thadh (talk) 22:22, 20 October 2022 (UTC)[reply]
What is "language contact" in that context? @Nicodene considers the literary form მონტაჟი (monṭaži) a learned borrowing from Russian because it reflects the spelling of the Russian term, as opposed to the colloquial form მანტაჟი (manṭaži) which reflects the Russian pronunciation. Is there no language contact in the first case? Vahag (talk) 09:17, 21 October 2022 (UTC)[reply]
I'm not particularly attached to the usage of 'learned' in this context and would be happy to opt for some alternative, so long as it fits. Perhaps 'orthographic borrowing'. Nicodene (talk) 11:52, 21 October 2022 (UTC)[reply]
I disagree that it should be limited only to classical languages - Mongolian imported vocabulary en masse from Russian in the early 20th c. in order to fill gaps in the vocab (mostly in relation to technology or Western culture) - these weren’t inadvertent loans from ordinary language contact. Further to this, it may be a good idea to be able to specify natural borrowings (i.e. those that did occur through language contact). For some languages, learned borrowings constitute the large majority - to the point where it is the default assumption. Although different languages might have very different ratios when it comes to the two, simply calling natural borrowings “borrowings” obfuscates things. Theknightwho (talk) 12:58, 28 October 2022 (UTC)[reply]

So, I have to rephrase and alter my proposal. What I understand from this discussion is that we need a term, a template and a Category for [-learned], [+oral, speaker-to-speaker] loanwords (versus lbors). I do not know what the proper linguistic term is; in Greek it is λαϊκός (laïkós, of the people), "λαϊκό loanword", or "oral loanword", and just "loanword" is always assumed to be a non-learned loanword. Nor do I know why English dictionaries never make the distinction. Thank you all for helping me rethink this subject. ‑‑Sarri.greek  I 19:09, 28 October 2022 (UTC)[reply]

Word Boundaries in Transliterations of Scriptio Continua Quotations

As I understand it, a recommendation for quotations in non-Roman scripts is that the transliteration be automatically generated from the actual text. Now this may need varying degrees of manual support - for Hindi there is |subst=, while the solution for Thai is to break the text up into words so that they may be transcribed independently.

In languages where word boundaries are not marked visually, the text is more legible if the word boundaries are marked in the transliteration. This does suffer the problem that there may not be agreement on where the word boundaries are - this is a notorious problem in Thai. Now, in some writing systems, it is preferred not to split words between lines, and for this Unicode provides a solution. Line-breaking opportunities between words may be marked by the character U+200B ZERO WIDTH SPACE, while line-breaking opportunities within words may be marked by U+00AD SOFT HYPHEN. (Thai actually has cause to have both.) It may be noted that we seem not to use U+00AD in terms. Do we have a policy on using U+200B in terms? Some people, notably Octahedron80, regard it as an error in terms, and prefer to transliterate it as '!!' so as to detect its occurrence.
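For readers unfamiliar with the two invisible characters, here is a minimal Python sketch that locates them in a string and makes U+200B visible by rendering it as '!!', in the spirit of the detection convention mentioned above. The function name `flag_invisible_breaks` is hypothetical, and this is not the code of any existing transliteration module.

```python
ZWSP = "\u200b"  # U+200B ZERO WIDTH SPACE: line-break opportunity between words
SHY = "\u00ad"   # U+00AD SOFT HYPHEN: line-break opportunity within a word


def flag_invisible_breaks(term):
    """Return (flagged_text, positions).

    positions lists (index, code point label) for every ZWSP or soft hyphen;
    flagged_text shows each ZWSP as '!!' so that it cannot go unnoticed.
    """
    positions = [
        (i, "U+%04X" % ord(ch))
        for i, ch in enumerate(term)
        if ch in (ZWSP, SHY)
    ]
    return term.replace(ZWSP, "!!"), positions


flagged, found = flag_invisible_breaks("ab\u200bcd\u00adef")
print(flagged.replace(SHY, "<SHY>"), found)
```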

The issue has come up in Mon, for which I do not believe we have an agreed transliteration, and I therefore wish to transliterate example text dynamically. However, some of the transliterations have some word boundaries marked by spaces by an ethnic Mon, and I am wondering whether I should mark these, and only these, boundaries by U+200B in the Burmese script master. As it seems necessary to mark the text up anyway because of phonetic input to the usual transliteration schemes, I have implemented an alternative quotation template {{mnw-quote}} to allow mark-up in the original text. If U+200B is widely opposed, I could instead use a mark-up code.

What do people think? (Notifying RichardW57, Alifshinobi, Octahedron80, YURi, Judexvivorum, หมวดซาโต้, Atitarev, GinGlaep, RichardW57, Atitarev, Octahedron80, Mahagaja, Atitarev, Hintha) --RichardW57m (talk) 14:51, 20 October 2022 (UTC)[reply]

Actually, there's a simple, general solution. I just enter the expressed word boundaries with the HTML mark up <wbr>, which has the advantage of being visible to a simple editor. Then, when I transliterate, I can widen sequences of spaces and convert '<wbr>' to single spaces. Implemented in Module:mnw-translit. The <wbr> can remain unchanged in the original text. --RichardW57 (talk) 18:16, 20 October 2022 (UTC)[reply]
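The pre-processing step described above can be sketched roughly as follows. This is a Python illustration only, not the Lua code in Module:mnw-translit; the function name is hypothetical, and "widening" is assumed here to mean doubling each run of ordinary spaces so that original spaces stay distinguishable from the single spaces that replace <wbr>.

```python
import re


def prepare_for_transliteration(text: str) -> str:
    """Widen runs of ordinary spaces, then turn each <wbr> into one space.

    Assumption: "widening" doubles every run of spaces. The real module
    may widen differently.
    """
    # Double every run of ordinary spaces in the source text.
    widened = re.sub(r" +", lambda m: m.group(0) * 2, text)
    # Each explicit word-break mark becomes a single space.
    return widened.replace("<wbr>", " ")
```

The original text keeps its <wbr> marks; only the copy handed to the transliterator is changed.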

A quite pleasant thesaurus

https://carefulwords.com/ Justin (koavf)TCM 04:07, 21 October 2022 (UTC)[reply]

I like the design. 98.170.164.88 06:26, 21 October 2022 (UTC)[reply]
Pretty, but doesn't address polysemy in the first case I tried (straight). DCDuring (talk) 00:12, 23 October 2022 (UTC)[reply]

Can I assume that 'alternate spellings' always allow exactly the same set of pronunciations?

I see on co-operation that it is an 'alternate spelling'. This surprises me, because I know that the "co-" is not pronounced separately when the word doesn't have a hyphen. But if the word exists, and there is a hyphen, I would expect the pronunciation to allow pronouncing the "co-" prefix separately. Is this entry wrong, or am I misunderstanding what 'alternate spelling' means? --189.217.85.126 20:48, 22 October 2022 (UTC)[reply]

In what way would separating the prefix lead to a pronunciation different from the one given for cooperation? Nicodene (talk) 21:02, 22 October 2022 (UTC)[reply]
Oops, I confused it with corporation. 189.217.85.126 21:07, 22 October 2022 (UTC)[reply]
(e/c) Hm? Contrary to your statement, "co-" is pronounced 'separately' even when the word doesn't have a hyphen, in this and other similar cases like cooperate : cooperation is /koʊˌɒpəˈɹeɪʃən/, not /kupəˈɹeɪʃən/. (BTW, although I know co-operation is sometimes just another way of spelling /koʊˌɒpəˈɹeɪʃən/, and this is all our entries currently reflect, it wouldn't surprise me to find it sometimes being used to mean joint operation, stressed more like /ˈkoʊ ˌɒpəˈɹeɪʃən/...)
In theory, "alternative spelling" is to be used when two terms are pronounced the same, and "alternative form" when they are not. However, because both are grouped as "alternative forms" in the lemma entries, you have no way of knowing, without clicking through to each form, which ones might have different pronunciations. And because people are not particularly concerned with maintaining the distinction between "alternative forms" and "alternative spellings", even if you click through to another spelling/form, if it doesn't have its own pronunciation section you can't be sure if that means it's pronounced the same as the lemma or someone just hasn't added a pronunciation section (for example, I reckon mezail is probably pronounced with /z/, different from the lemma mesail's /s/, whereas mamellière is probably pronounced the same as mamelière). - -sche (discuss) 21:07, 22 October 2022 (UTC)[reply]
@-sche Hey, I deal with a lot of alternative forms and spellings, so I would like to know if you would say that Taizhong (in Taichung's 'Alternative forms'-header section) should use alternative spelling|en|. (I made the change.) I'm thinking yeah, based on what you've said above (as far as I know, these two words are supposed to be pronounced identically). I am neutral on the issue; I didn't realize there was any distinction between alternative forms and alternative spellings. BTW: Is what you're saying about this distinction written in Wiktionary policy somewhere? Thanks for any help. --Geographyinitiative (talk) 21:17, 22 October 2022 (UTC) (Modified)[reply]
@-sche This distinction is not mentioned in the documentation of {{alternative form of}} or {{alternative spelling of}}, which is where I would look for it if I was wondering when to use one instead of the other. - excarnateSojourner (talk | contrib) 19:23, 28 October 2022 (UTC)[reply]
@ExcarnateSojourner, Geographyinitiative well, the distinction was advocated by Thadh here, but from your comments it looks like there's not as much acceptance/awareness of it as he assumed. (I've also seen a different distinction, that an alternative spelling is a mere difference in spelling whereas an alternative form uses, well, a different form, e.g. different morphemes, even if these end up pronounced the same, as with words that differ in using Germanic -er vs Latinate -or.) IMO the distinction is not clear enough (if you hadn't heard of it) or maintained enough that we can rely on it to tell readers whether a form is pronounced the same or differently from its lemma, hence I've advocated creating a template or agreed-upon verbiage to put in the pronunciation section when an alt form has the exact same range of pronunciations as the lemmatized spelling and we don't want to tediously repeat them (and risk them falling out of sync if we list nonrhotic UK and Aus pronunciations in both foobar-er and foobar-or and then someone later adds the rhotic US pronunciation to only one...). I already sometimes manually spell out "as {{l|en|foobar}}" in pronunciation sections; in regie-book I gave the pronunciation of the obscure word and then just linked to book.
If Taizhong and Taichung are pronounced the same(?), and ultimately originate from (different representations of) the same Chinese word, I think they're fine being listed as alternative spellings whether we're making Thadh's distinction or not. But I wonder if they're really pronounced the same; I think Zhong usually has /ʒ/ or /dʒ/ whereas Chong has /tʃ/...? - -sche (discuss) 20:16, 28 October 2022 (UTC)[reply]
@-sche, thanks so much for your comments and linking to the old conversation. The pronunciations from the 20th century gazetteers for English 'Taichung' say: 1952 tīʹjo͝ongʹ [9]; 1979 tīʹjo͞ongʹ [10] (Appendix:English pronunciation). As I understand it, these pronunciations do not indicate /tʃ/ and specifically do indicate /dʒ/. Therefore, I would say that the intended pronunciation for English language word 'Taichung' really does seem identical to what's intended for the English language word 'Taizhong'. IN PRACTICE, I agree that the average American will look at a "zho" versus a "chu" in these words and come to different sounds. Given these new factors/info, does the Taichung-Taizhong relationship fall into your ideal category for alternative spellings? I have never had a strong theoretical grounding for calling words 'alternative forms', 'alternative spellings' or 'synonyms' or etc, and this is pretty interesting to me. Now I'm seeing 'alternative spellings' as a subcategory of 'alternative forms'. Alternative forms is way broader than alternative spellings (as can be seen at six ways to Sunday). Thanks for any help!! I understand all this may seem obvious to you, but I never considered this before. No need to respond on this. --Geographyinitiative (talk) 21:51, 28 October 2022 (UTC) (Modified)[reply]

Using the template RQ:Paul.Fest.

I am adding a quotation from Paulus Diaconus's epitome of Festus to the entry arculata using the template {{RQ:Paul.Fest.}}. However, I can't find a way to make the displayed page number differ from the one used in the link to the archive.org version. This is a problem because the standard page number would be 16, but the archive.org URL needs n67 to link to the correct place. Should I use another template, can this one be edited to add another parameter, or is there a way to do what I want with this template after all? Urszag (talk) 05:23, 24 October 2022 (UTC)[reply]

@Urszag: I added an optional urlpage parameter to it to do what you want. —Al-Muqanna المقنع (talk) 11:14, 24 October 2022 (UTC)[reply]
It is possible to do arithmetic on parameters; I've done it at {{R:pi:Childers}}. Possibly the users of Module:Quotations have also done it and could chip in. It can get painfully fiddly with templates, and screams for a bit of Lua, which Module:Quotations allows. Possibly we need a utility module aimed at archive.org. --RichardW57 (talk) 18:03, 24 October 2022 (UTC)[reply]
@Urszag: This has been done hundreds of times; see many Arabic reference templates, or specifically {{R:sem-eth:Littmann}} for some of the most complicated I have done, and most recently {{R:ar:Casanova:1924}} for a journal piece. In |pageurl= you give the URL up to, but not including, the number to be calculated, and at its end you invoke {{#invoke:ugly hacks|match|{{{page|{{{pages}}}}}}|[0-9]+|}} to normalize the numeric input, so that template users can type page ranges and other things, which are clipped to the leading number. To this number you usually only have to add or subtract a constant, by wrapping the function in {{#expr:+your needed difference}}. In normal books, i.e. ones like yours rather than my complicated lifework template for books as well as journal pieces, |page= and |pages= are merely |page={{{page|}}} and pages={{{pages|}}}, as in {{R:ar:Freytag-Einl.}}; the more complicated code in {{R:ar:Casanova:1924}}, pageurl= … {{#invoke:ugly hacks|match|{{{page|{{{pages}}}}}}|[0-9]+|}}{{#if:{{{page|{{{pages|}}}}}}||356}}+2}} and pages={{#if:{{{page|}}}||{{#if:{{{pages|}}}||356–360}}}}{{{pages|}}}, is there to create a default page link for a journal piece. I also have a formula for books numbered by columns from {{R:ota:Meninski}}, used with fewer complications in {{R:ota:Kanun-name:1830}}, which you may need later …
Please note that this is a matter for the Grease Pit, not the Beer Parlour.
I also note that this is technically not the way literature from antiquity is cited, but doing it the traditional way plus linking the pages easily is way more cumbersome, so we take an easy standard for the internet. Fay Freak (talk) 23:03, 25 October 2022 (UTC)[reply]
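The page-offset arithmetic described in this thread (take the leading number of the page input, then add a fixed difference) can be sketched outside template syntax as follows. This is a Python illustration under stated assumptions: the function name and the base URL are hypothetical, and the offset of 51 is simply the difference between the printed page 16 and the scan index n67 from the question above.

```python
import re


def scan_page_url(base_url: str, page: str, offset: int) -> str:
    """Build a link to a scanned page from a printed page number.

    Only the first run of digits in `page` is used, so a range such as
    "16–17" is clipped to 16, mirroring the [0-9]+ match in the templates.
    """
    match = re.search(r"[0-9]+", page)
    if match is None:
        raise ValueError(f"no page number found in {page!r}")
    # Printed page number plus a constant offset gives the scan index.
    return f"{base_url}n{int(match.group(0)) + offset}"
```

For instance, `scan_page_url("https://example.org/page/", "16", 51)` yields a link ending in `n67`, while the displayed page number stays 16.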

Echo words in Indian subcontinent languages

How are w:echo words handled on Wiktionary? Dennis Dartman (talk) 04:30, 25 October 2022 (UTC)[reply]

@Dennis Dartman: we don't handle them. An entry for the echo-forming morpheme like schm- is the only thing we do. Vahag (talk) 10:04, 26 October 2022 (UTC)[reply]
I’m not sure if I fully agree with that approach, but certainly they’re more predictable in some languages over others. Theknightwho (talk) 12:06, 26 October 2022 (UTC)[reply]
We definitely should be handling these; the only problem is the limitation that Wiktionary brings in terms of entry titles. I ran into this issue when trying to write an entry for the gerund-creating prefix in Yorùbá which is Cí- where C is the consonant that the verb starts with. Ex: mímọ̀ (the act of knowing) from mọ̀ (to know), but there's no easy way to do it. AG202 (talk) 12:33, 26 October 2022 (UTC)[reply]
@AG202: For languages with vowel harmony, this is solved by having as many entries as there are possible forms (-li, -lı, -lu, -lü). — Fytcha T | L | C 13:54, 26 October 2022 (UTC)[reply]
Each language should probably approach this differently. One option could be listing it in the headword perhaps unlinked. Vininn126 (talk) 14:00, 26 October 2022 (UTC)[reply]
If it's a common feature of the language (which schm- is not), then I think it's fine to have entries even if they're pretty predictable. In Mongolian, for example, they're about as regular as English plurals, but common enough that it's worth having them. Theknightwho (talk) 22:23, 26 October 2022 (UTC)[reply]
I dunno, maybe we could take an approach similar to that of the Maltese definite article (il-#Maltese) and just create one separate article for every possible consonant? Dennis Dartman (talk) 18:41, 26 October 2022 (UTC)[reply]
@Fytcha, @Dennis Dartman, @Vininn126 replying here to try and keep things in one thread: my initial thought was the Maltese/vowel harmony solution where I'd just list every consonant, but that's obviously not ideal, and I don't even know where the main lemma would be and then which ones would be alternative forms. Listing it in the headword unlinked could also be an idea, though categorization comes to mind. AG202 (talk) 21:26, 26 October 2022 (UTC)[reply]
I know nothing about the Yoruba language, but are there certain gerund forms created using the usual echo-word process that have taken life of their own? And how do we decide where to draw the line when it comes to determining which gerunds deserve an independent entry and which don't? For instance, compare:
running
driving
amusing
tranquilizing
...to:
conflating
contributing
excising
barbequing
...in English.
Contrast Latin, wherein participles formed using the suffixes "-tus" (perfect passive) or "-ns" (present active) are universally treated as "participles" as opposed to adjectives, even common ones like factus. Dennis Dartman (talk) 22:02, 26 October 2022 (UTC)[reply]
Regarding where to put anything like Yoruba's Cí- prefix that we decide to include (without expressing an opinion on whether we should include echo words in general), a few Finnish entries use placeholder V (-hVn, -Vn, -V), so I suppose we could consider using placeholder C in entry titles, too. If the entries were careful to spell out that C is just a placeholder and you don't actually write mọ̀Címọ̀, that'd work as far as linking in our own etymologies, the main(?) problem would be that it'd be unintuitive for someone to search for, but that could be addressed by having soft redirects at all the possible/attested forms. But I don't know if this is actually any better than just lemmatizing one of the attested forms, e.g. the alphabetically first one (bí-?). Where do dictionaries of Yoruba cover the above-mentioned prefix? - -sche (discuss) 23:51, 26 October 2022 (UTC)[reply]
I did not know that Finnish did that, thanks! In terms of Yorùbá dictionaries, this prefix is not acknowledged outside of grammar sections. Neither "Cí-" nor the individual prefixes "bí-, dí-, fí-, etc." are listed. Words that are derived from it, like kíkọ́ (the act of teaching) in kíkọ́-èdè (language acquisition), are just listed as "kí- + kọ́" in the etymology sections. AG202 (talk) 02:30, 27 October 2022 (UTC)[reply]
FWIW, I definitely think it's nonstandard that Finnish does that (from a rest-of-Wiktionary point of view, although the only reply I got when I brought it up at RFM was that it's standard in Finnish sources). But I guess it's not that much weirder than the entries we have at -∅. - -sche (discuss) 04:51, 27 October 2022 (UTC)[reply]
I wonder if the Yoruba prefix is really a prefix. It looks to me more like reduplication of the initial consonant with a vowel in between that's lexically specified. In the same way, the echo words could be viewed as reduplication of the coda with a lexically specified consonant in between. What do we do for languages that use reduplication to indicate the plural? Chuck Entz (talk) 06:11, 27 October 2022 (UTC)[reply]
Oh, yes; it was lax of me to call it a prefix; it sure looks like reduplication instead, albeit we sometimes shunt such things into our infrastructure for affixes out of convenience (like with -x- that was discussed in the Tea Room a while ago). - -sche (discuss) 20:17, 28 October 2022 (UTC)[reply]

User:Dan Polansky has now restored the RFD-deleted entry non-French under the pretense that it is protected by WT:THUB. However, the very first sentence of WT:THUB reads: "A translation hub (translation target) is a common English multi-word term or collocation that is useful for hosting translations." (emphasis mine) This doesn't seem to pertain to non-French, but even if we set this aside, the two translations he added are literally non- + French translated separately, which we explicitly prohibit in the case of multi-word expressions: "a closed compound that is a word-for-word translation of the English term: German Autoschlüssel does not qualify to support the English "car key"; or [] " So because this is not a multi-word expression (which bars it from being a THUB in the first place), he thinks it's suddenly fine to do the part-by-part translation thing. Any thoughts? — Fytcha T | L | C 15:01, 26 October 2022 (UTC)[reply]

Keep non-French. Per WT:CFI, "including a term if it is attested and, when that is met, if it is a single word or it is idiomatic". Thus, CFI does not require idiomaticity for single words, which this is. I accept that THUB as written, literally interpreted, does not protect the entry. It is only the spirit of THUB that protects the entry. The translation nefrancouzský is not a compound, in the narrow sense of the word. There is a long treatment at Wiktionary:Beer parlour/2022/September § Including hyphenated prefixed words as single words, which shows our practice was largely to keep hyphenated single words, and argues the case at length. Start with deleting non-standard: it is less valuable since there is nonstandard; by contrast, nonfrench is a misspelling or miscapitalization, regardless of claims recently made in RFD to the contrary. --Dan Polansky (talk) 15:48, 26 October 2022 (UTC)[reply]
The "spirit of THUB" protects quite literally anything, by that logic. We should not be adding translation hubs in the hope that they may become useful at some point in the future; we should only be adding them if they contain at least one translation that would pass CFI. Theknightwho (talk) 22:01, 26 October 2022 (UTC)[reply]
I disagree that that is the spirit of THUB. The letter of THUB states that word-by-word translations of compounds don't count and the logical extension of that is that part-by-part translations of single-word SOPs don't count either.
As to "including a term if it is attested and, when that is met, if it is a single word or it is idiomatic", I don't think this is to be interpreted as us including all attested single words (I agree though that this sentence is worded poorly and should be updated). For instance, we exclude typos, rare misspellings and many proper nouns so I don't see why you deem it unthinkable that SOPness is another barrier for attested single words, especially considering that WT:SOP is written in language that makes it applicable to both single-word as well as multi-word expressions. — Fytcha T | L | C 00:37, 27 October 2022 (UTC)[reply]
About the spirit of THUB, I see no reason why THUB should protect phrases and not words, as long as words are being targeted for deletion contrary to CFI. Before 2019, single hyphenated words were not being successfully deleted as sum of parts, and instead, Talk:multi-word is a 2013 example of near-unanimous keeping in RFD of a hyphenated single word. That's why the pre-2019 drafting of THUB did not use wording to protect single words.
What "if it is a single word or it is idiomatic" means is that single words do not need to be idiomatic. It cannot mean anything else. A phrase of the form "if A or B" cannot be reasonably interpreted as having the same effect as "if B". The exclusion rules for proper names are a different story; they override the general rule and the general rule has no wording to prevent that override. Needless to say, the CFI drafting is not ideal; the general rule should ideally advertise it has exceptions specified below in the CFI. --Dan Polansky (talk) 10:20, 28 October 2022 (UTC)[reply]
I've deleted it. This is a blatant attempt to subvert the earlier RFD consensus. — SURJECTION / T / C / L / 06:21, 31 October 2022 (UTC)[reply]
I restored the entry only because the RFD did not consider the translation target argument, and its participants willfully ignored the near-unanimously supported COALMINE. Furthermore, we now know that the 5–3 = 62.5% is not consensus: Wiktionary:Votes/pl-2022-09/Meaning of consensus for discussions other than formal votes created at Wiktionary:Votes. The RFD closure would have to clarify the overriding concerns, but that did not happen. As I just pointed out in this BP thread, the deletion is in violation of CFI anyway. Whatever happens to non-French, I hope the subject of hyphenated single words will be handled in a saner manner in future, to help Wiktionary work much better as a spelling guide and to document productivity of affixes. We have over 10,000 low-value nonX entries; we can have non-French (properly so spelled, not nonfrench and not the likes of antijapanese and antimuslim) as well. --Dan Polansky (talk) 08:20, 31 October 2022 (UTC)[reply]
Keep non-French per Dan Polansky (though I suppose this should be moved to RFD). Binarystep (talk) 13:30, 2 November 2022 (UTC)[reply]

Syllabification in spelling

Is there any algorithm to automatically syllabify English headwords orthographically in every entry? Backinstadiums (talk) 10:07, 28 October 2022 (UTC)[reply]

It's not that simple. See, for instance, Category:English terms with pseudo-digraphs. English spelling just isn't consistent enough to do that kind of thing. Chuck Entz (talk) 17:25, 28 October 2022 (UTC)[reply]
Really? up•hold ; co•op•er•a•tive or co-op•er•a•tive. So let's do it boyz Backinstadiums (talk) 17:40, 28 October 2022 (UTC)[reply]
It's easy enough to find lists of syllabifications, but not to create them by algorithms without consulting a list. How does an algorithm figure out that nother isn't a compound of not + her? Chuck Entz (talk) 17:55, 28 October 2022 (UTC)[reply]
More exceptions? And no less a word than nother? So far yours are not convincing arguments... Anyway, lists are accessible, and so is reverse-engineering an algorithm from them. I look forward to it. Backinstadiums (talk) 18:31, 28 October 2022 (UTC)[reply]
I fear both English spelling and English syllabification may be too irregular for this to be practicable. I wonder how much value there is to listing syllabification, anyway, for English. German has official syllabification rules, but in English word-breaks can be found all over the place (e.g. in books I can find En- glish, Eng- lish, and Engl- ish, albeit not E- nglish or Engli- sh). Indeed, while dictionaries agree that nother and another shouldn't split the th, I can nonetheless find a few books that split it across a line break as anot- her... - -sche (discuss) 20:27, 28 October 2022 (UTC)[reply]
Eng·lish(ed), Eng·lish·ing, Eng·lish·es Backinstadiums (talk) 20:44, 28 October 2022 (UTC)[reply]
What is your point? AHD is correct that between the g and the l is one place the word can be syllabified. It is not the only place the word gets syllabified, and it's not clear how a person would write an algorithm that would predict the places it gets syllabified. - -sche (discuss) 02:01, 29 October 2022 (UTC)[reply]
No, currently that is the only one in its entry https://en.wiktionary.org/wiki/English#Pronunciation Backinstadiums (talk) 11:50, 29 October 2022 (UTC)[reply]
an·e·thole and cath·ode, but cat·hole. re·al·lege but real·ly. u.nite but un.in.ten.tio.nal. rat.race but re.trace. And so on and so forth.  --Lambiam 19:30, 31 October 2022 (UTC)[reply]
But of course, taking into account morphology http://ingles-americano.blogspot.com/2012/08/syllabication-rules.html Backinstadiums (talk) 19:51, 31 October 2022 (UTC)[reply]
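The list-based approach mentioned in this thread (consult known syllabifications rather than derive them by rule) can be sketched as below. This is a minimal Python illustration, not an existing Wiktionary module: the table holds only splits quoted in this discussion, the function names are hypothetical, and words missing from the table deliberately get no guess.

```python
# Hand-checked splits copied from the examples in this thread.
KNOWN_SPLITS = {
    "anethole": ["an", "e", "thole"],
    "cathode": ["cath", "ode"],
    "cathole": ["cat", "hole"],
    "reallege": ["re", "al", "lege"],
    "really": ["real", "ly"],
    "uphold": ["up", "hold"],
}


def syllabify(word):
    """Return the listed split for a word, or None if it is not listed."""
    parts = KNOWN_SPLITS.get(word.lower())
    if parts is None:
        return None  # no rule-based fallback: unknown words are left alone
    return list(parts)


def show(word):
    """Format a word with interpuncts, leaving unlisted words unchanged."""
    parts = syllabify(word)
    return "·".join(parts) if parts else word
```

Pairs such as cathode and cathole share the same opening letters yet split differently, which is why the sketch looks words up instead of applying a spelling rule.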

Label + cat for organizations / companies / institutions etc.

What do others think of labeling and categorizing senses designating organizations / companies / institutions etc. as such? See for instance MSC: I would have found it useful had Military Sealift Command been labeled an organization because, prior to clicking on the Wikipedia link, I wasn't sure whether it referred to an organization or whether it just was some military command (mind you it was under the noun header before my edits). I think it could help readers find the desired senses more quickly. — Fytcha T | L | C 22:23, 29 October 2022 (UTC)[reply]

I edited the MSC entry like this: "Initialism of Military Sealift Command: an organization that controls the replenishment and military transport ships of the United States Navy." I don't see why not. Is this something that you like or find useful? --Dan Polansky (talk) 15:45, 30 October 2022 (UTC)[reply]
Seems reasonable. Fytcha, I don't know if you're saying "labelling" just to mean we should indicate it in some way or if you specifically mean adding a {{label}}, but I think adding it to the gloss is better. Maybe we should devise a system for "topic" qualifiers separate from our usage-restriction labels (e.g. for terms related to the topic of biology but not restricted to the jargon of biologists), perhaps using a different type of brackets, since people are wont to misuse labels to indicate topics, but {{lb|en|organization}} isn't even really a "topic", it's just stating the nature of the referent, which is what the definition/gloss is for. I'm not sure I see the utility to categorizing e.g. "English acronyms denoting organizations" but I guess it's not harmful either... - -sche (discuss) 23:44, 30 October 2022 (UTC)[reply]
Category:English initialisms for organizations is in RFD and failing. I find most of the arguments against it unconvincing and some outright wrong. I would be happy to have Category:Organizations named by initialism or acronym so that the user does not need to figure out whether the term in question is an initialism or an acronym. Category:en:Organizations is not under attack and can be used if wished. --Dan Polansky (talk) 09:01, 31 October 2022 (UTC)[reply]
@Dan Polansky: Thanks for linking these, I wasn't aware of them. I agree with the nomination that we shouldn't have intersections of lingual and topical categories as separate categories. Intersections can be computed using CirrusSearch. Category:en:Organizations seems fine for these but OTOH, maybe we need to finally tackle the "is an X" vs. "is related to X" problem of topical categories. — Fytcha T | L | C 17:59, 31 October 2022 (UTC)[reply]
@-sche: You are completely right. I myself fought against this kind of label abuse (using them to indicate a topic rather than a usage restriction) but I'm more than sure that I am guilty of misusing them myself at times. This is a rather deep-running problem on Wiktionary and if you plan to propose or implement changes to fix it you would have my support. From my experience, finding a clear, unobtrusive and pleasant-to-look-at presentation is always the biggest roadblock for these kinds of changes. Maybe it would be a start if we got people to use {{C}} in front of the senses (duplicated even if the category pertains to multiple senses) because that way the categories can easily be made visible if we agree to do so. Currently, the categories are applied on a per-entry basis, which is also bad for other reasons (e.g. because it makes it impossible to find, say, all proper nouns pertaining to some topical category, because there may be a stray proper noun in the article alongside a different sense belonging to that topic). As for "it's just stating the nature of the referent", we are already doing that with Category:Female people and Category:Male people so I'm not wholly convinced that we should refrain from this for organizations pending any site-wide changes to tackle the "is an X" vs. "is related to X" problem of topical categories. — Fytcha T | L | C 17:59, 31 October 2022 (UTC)[reply]

User adding multiple incorrect English pronunciations?

Wrong stress, wrong vowel, etc. not matching the IPA given (or at least heavily accented without noting this on the audio titles). See User talk:Qiu Ennan. Please weigh in. Equinox 15:24, 30 October 2022 (UTC)[reply]

Hmm, could you please be specific? As in which part does not match the IPA given/is stressed wrong? - Ennan (I'm new to Wiktionary and don't know how to sign) — This unsigned comment was added by Qiu Ennan (talk • contribs) at 15:27, 30 October 2022 (UTC).[reply]
@Qiu Ennan: You sign posts with four tildes at the end: ~~~~. —Justin (koavf)TCM 21:37, 30 October 2022 (UTC)[reply]
@Qiu Ennan: we're a descriptive dictionary based on usage, so we're more interested in how things are actually pronounced than in how some reference work says they should be pronounced. Your sound files make it obvious that you aren't a native speaker of the variety of English covered by the Oxford dictionary you cite, so they are bound to be inaccurate in spite of your best efforts. Replacing or demoting sound files by someone who is from that area just because they don't match a published dictionary's pronunciation is the exact opposite of what we should be doing.
Besides: pronunciation of standard English in England is far too complex for any single pronunciation to give a good picture in many cases. Chuck Entz (talk) 22:06, 30 October 2022 (UTC)[reply]
The reasoning you've provided for e.g. media seems backwards: you should pronounce the word as you naturally do, provided that you're a native speaker, rather than trying to artificially replicate the vowels found in some transcription. The audio pronunciation that you've removed from that entry without a doubt represents the predominant one (in my experience; cf. also YouGlish). The transcription with /ɪ/ appears to be inaccurate or outdated. Both the Oxford and Cambridge dictionaries provide /i/ instead, though one could argue for /iː/, as 'schwee' is a questionable stopgap measure. Nicodene (talk) 22:13, 30 October 2022 (UTC)[reply]

As a native British English speaker, I have reverted all of these. Theknightwho (talk) 23:08, 30 October 2022 (UTC)[reply]

Straw poll: Requiring 6 instead of 3 attesting quotations from Usenet

In RFD, there is some discussion about how many quotations from Usenet should be required for attestation. There was a February BP discussion about it, but it was chaotic in that it seemed to combine the proposal to increase the number of quotations with the proposal to ban Usenet altogether. To bring some clarity, I am creating this poll solely for increasing the minimum number of quotations from Usenet from 3 to 6. Thus, two Usenet quotations would count as one print quotation. The proposal is emphatically not to ban Usenet quotations. If you support requiring an even higher number than 6, please indicate so in the poll. Please keep the poll-like structure for ease of counting, but as always, comments beyond "support" and "oppose" and discussion are the gold standard.

Discussion:

  1. Wiktionary:Beer parlour/2022/February § Increasing the number of citations required for Usenet and updating CFI
  2. WT:RFDE#Christcuck

--Dan Polansky (talk) 07:41, 31 October 2022 (UTC)[reply]
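The arithmetic of the proposal can be illustrated with a short sketch. This is a Python illustration only: the function and parameter names are hypothetical, and the way mixed sets of print and Usenet quotations combine is an assumption, since the proposal states only that two Usenet quotations would count as one print quotation.

```python
def meets_attestation(print_quotes: int, usenet_quotes: int,
                      usenet_per_print: int = 2, required: int = 3) -> bool:
    """Check whether a set of quotations reaches the attestation threshold.

    Assumption: print and Usenet quotations may be mixed, with each print
    quotation worth `usenet_per_print` Usenet quotations.
    """
    # Count everything in Usenet-quotation units to avoid fractions.
    total_units = print_quotes * usenet_per_print + usenet_quotes
    return total_units >= required * usenet_per_print
```

With the default 1:2 weighting, six Usenet quotations or three print quotations suffice. Passing `usenet_per_print=3` models the stricter 1:3 variant, under which nine Usenet quotations would be needed.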

Support requiring 6 Usenet quotations

  1.   Support. 3 quotations from an unedited corpus spanning a mere year is an extremely lenient standard, very easy to game. Posting to Usenet is so much easier than getting things into print. Even 6 quotations are in fact an extremely lenient standard. Fandom slang that sees significant use should be easy to attest using 6 quotations. It is not the purpose of a dictionary to document proto-words that have not yet gained anything like wider acceptance. All unedited corpora to be eventually accepted should be subject to a higher standard than that required for print corpora; starting with Usenet would be the first step in that direction. In general, counting a quotation from Usenet as having the same weight as a quotation from print media seems odd. I think we could require 9 quotations (1:3) or even 15 (1:5); I created a proposal for 6 (1:2) as a very moderate proposal to see whether we can get at least that. --Dan Polansky (talk) 15:47, 31 October 2022 (UTC)[reply]
    Fandom slang that sees significant use should be easy to attest using 6 quotations. This seems to be patently wrong and uninformed. The extremely widespread touch grass is only found twice on Usenet (I know it's not fandom slang but I don't think there's any reason to believe that Usenet is better at attesting fandom than it is at general internet slang). — Fytcha T | L | C 18:13, 31 October 2022 (UTC)[reply]
    That's a pretty recent phrase. According to Know Your Meme it has been used since at least 2015, but Google Trends indicates it only became popular to a level greater than background noise around 2020. I wouldn't necessarily expect to find it much on Usenet, whose popularity probably peaked around twenty years ago (excluding spam and warez). 98.170.164.88 18:24, 31 October 2022 (UTC)[reply]
    I know but I don't think this makes my counterargument apply any less. Dan said that it should be easy to find 6 quotations on Usenet for fandom slang (which I take to be generalizable to internet slang because this proposal impacts our coverage of internet slang as well) provided that the terms in question see significant use (outside Usenet), which appears to be easily refutable. — Fytcha T | L | C 18:50, 31 October 2022 (UTC)[reply]
    I stand corrected: slang that sees significant and wide use in Internet conversations is not guaranteed to spread into Usenet, which sees greatly diminished use. However, I take the recent BP poll on Twitter to mean that Twitter quotations are allowed provided they meet a much more stringent standard than 3 quotations; there is evidence of consensus for that. Thus, truly significant use outside of Usenet should be covered by Twitter. If that interpretation of the BP poll is rejected, then increasing the number from 3 to 6 for Usenet could have some adverse effects. But these effects seem tolerable: people who search for very recent Internet fads are covered by Know Your Meme and Urban Dictionary. In general, requiring non-print quotations to meet a much more stringent standard than 3 quotations is the way to go, and Usenet is no exception. --Dan Polansky (talk) 08:00, 1 November 2022 (UTC)[reply]
  2.   Support (though it doesn't seem like a particularly high barrier to pass). — Sgconlaw (talk) 18:31, 1 November 2022 (UTC)[reply]

Oppose requiring 6 Usenet quotations

  1.   Oppose. This is the wrong approach to solve the issue brought up by TheDaveRoss. Words with 6 Usenet citations but zero non-Usenet use would be kept while words with 3 (but not 6) Usenet citations but very prolific internet use would be deleted (@WordyAndNerdy). I may reconsider once (if?) we start caring more about internet slang but for now this seems to clearly do a lot more harm than good. Furthermore, I still hold that it is better to err on the side of inclusion with these kinds of things because it's generally better to have too many words than it is to have too few. — Fytcha T | L | C 14:08, 31 October 2022 (UTC)[reply]
    Assuming that Twitter is allowed but needs to meet much more stringent quotation requirements than 3 (more stringent but unspecified), very prolific Internet use is covered by Twitter. --Dan Polansky (talk) 16:29, 31 October 2022 (UTC)[reply]
  2.   Oppose per Fytcha —Al-Muqanna المقنع (talk) 14:30, 31 October 2022 (UTC)[reply]
  3.   Oppose. Usenet seems superior to other online text repositories in that it is more durably archived and less susceptible to the censorship that has become rampant. Usenet itself is becoming more difficult to use because of the changes in how Google allows access, i.e., soft censorship. DCDuring (talk) 17:44, 31 October 2022 (UTC)
    I don’t see how censorship (or lack of) is remotely relevant to this discussion. Theknightwho (talk) 18:22, 31 October 2022 (UTC)[reply]
    Because we care about permanence. Our readers want to be able to verify whether the quotes we're providing for a given term actually exist or whether we just made them up. — Fytcha T | L | C 18:44, 31 October 2022 (UTC)[reply]
    I understand that, but I don't see any evidence for meaningful levels of censorship in the other sizeable internet archives that exist, or for Usenet being particularly immune from any that does occur. Plus, what evidence do we have for this being censorship in the first place, as opposed to archives simply becoming unavailable for commercial reasons? It distorts the meaning of the word beyond any recognition. Theknightwho (talk) 18:53, 31 October 2022 (UTC)[reply]
    I don't want to engage in the discussion about the exact label we attach to the behavior we all know is meant (label discussions are almost always pointless), but the underlying point remains that relatively more tweets are removed off of Twitter than Usenet posts are removed by whatever governing entity. I don't think we disagree on that, do we? And, given that, it seems obvious that "more durable website + internet archive" is superior to "less durable website + internet archive" because redundancy is obviously better if permanence is our goal. — Fytcha T | L | C 19:30, 31 October 2022 (UTC)[reply]
    I think it matters from the perspective of whether we use it as a deciding factor. Given that DCDuring also criticises Google for "soft censorship" in the same comment, it really stretches the bounds of credibility that Usenet alone is the only acceptable online source. Rather than attaching undue weight to hypothetical malign actors, I'd rather we looked at things in a more grounded way.
    I don't think we disagree on the larger issues here, fundamentally. Theknightwho (talk) 19:36, 31 October 2022 (UTC)[reply]
    Sure, and what does it tell us about 3 vs. 6 quotations? The proposal is not to ban Usenet or something. Does it not matter at all that Usenet is a conversational source rather than an edited one? Would 3 quotations from Twitter be okay if Twitter were not censored and were very well archived up to today? --Dan Polansky (talk) 09:21, 1 November 2022 (UTC)
  4.   Oppose. I don't like retrospectively raising the requirement. Should we not have some form of grandfathering, e.g. allowing quotes before a certain date to count as quotes from books, or perhaps quotes recorded here before a certain date? --RichardW57m (talk) 11:29, 1 November 2022 (UTC)
    I don't think grandfathering is a good idea. That only makes sense in situations where things will naturally fall out of use anyway, which doesn't apply here. Theknightwho (talk) 18:37, 2 November 2022 (UTC)[reply]
  5.   Oppose. This would only worsen our coverage, and it doesn't feel right to allow obscure words from print media while excluding equally uncommon words from Usenet posts. Overall, I agree with Fytcha's point that it's better to have more words than fewer. Binarystep (talk) 13:26, 2 November 2022 (UTC)
  6.   Oppose. I was pinged above. I probably wouldn't have voted otherwise. The need to constantly re-litigate settled matters like this is a large part of why I'm retired. (As a courtesy I'd prefer not to be pinged in future discussions.) WordyAndNerdy (talk) 10:21, 3 November 2022 (UTC)
    You can manage notifications in your preferences. - TheDaveRoss 13:57, 3 November 2022 (UTC)[reply]

Other discussion

Usenet is not representative of modern Internet culture and has not been for a very long time. At best, it should be treated the same as other media platforms like Twitter, which has a much wider reach, and yet still failed to attain favored status in a recent vote. I abstain because I understand this vote is about a specific and clearly defined proposal, not a place to bring up other proposals. Soap 15:43, 31 October 2022 (UTC)[reply]

Discussing other specific proposals is welcome, although this poll intends to see whether a specific proposal could pass. For Twitter, I would also require at least 6 quotations (1:2) as a very minimum, which is in fact insufficient given the Twitter volume. Since 3-quotation Twitter did not pass, we will need to require more quotations from Twitter for Twitter to pass. --Dan Polansky (talk) 15:51, 31 October 2022 (UTC)
While I appreciate that this discussion is taking place, I think that any n-cites policy will be a problem.
Part of the problem is that of accessible volume. According to our main source for book cites (books.google.com), they have digitized about 40 million books. At an average of 90,000 words (about 300 pages), that equates to 3.6 trillion scanned words. Twitter averages about 6,000 Tweets per second at an average of 28 words each, which comes to about 5.3 trillion words per year. There are probably going to be more words Tweeted this decade than have ever been published in books and periodicals in all of recorded history. And we have easy access to every last word of it. The CFI was never intended to mean that "words which have been used at least three times are permitted," but since that is how it is now interpreted, the actual usage threshold for a word is absurdly low.
Part of the problem is that social media echoes. There are direct echoes (e.g. re-Tweets), and indirect echoes (e.g. copypasta). How do we account for that in some meaningful way in the CFI?
There is the problem of astroturfing, there is the problem of very small but loud populations, there is the problem of sampling bias (as we see with racism and fancruft being very easy to cite, while common LDL terms are nearly impossible to cite), and there is recency bias (older platforms are now defunct, so the jargon of the day is less accessible). There are plenty of other problems as well.
Rather than try and tweak the CFI to attempt to balance non-edited media with edited media, I think we need to figure out a better metric or set of metrics for social media and its ilk. I do think it is important to be able to document words which have been adopted into the language even if the only evidence is on social media, but I don't think we need to document extremely rare, extremely niche terms which have likely only ever been used by a handful of individuals. - TheDaveRoss 16:27, 31 October 2022 (UTC)[reply]
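The volume comparison above can be re-run as a quick back-of-the-envelope check. This is only a sketch: every constant is one of the rough estimates quoted in the comment (40 million books, 90,000 words per book, 6,000 Tweets per second, 28 words per Tweet), and the function names are made up for illustration.

```python
# Rough estimates quoted in the comment above; none of these are measured values.
BOOKS_DIGITIZED = 40_000_000   # books reportedly scanned by Google Books
WORDS_PER_BOOK = 90_000        # assumed average (about 300 pages)
TWEETS_PER_SECOND = 6_000      # assumed average posting rate
WORDS_PER_TWEET = 28           # assumed average length
SECONDS_PER_YEAR = 365 * 24 * 60 * 60  # non-leap year


def book_corpus_words() -> int:
    """Estimated total words across all digitized books."""
    return BOOKS_DIGITIZED * WORDS_PER_BOOK


def twitter_words_per_year() -> int:
    """Estimated words posted to Twitter in one year."""
    return TWEETS_PER_SECOND * WORDS_PER_TWEET * SECONDS_PER_YEAR


if __name__ == "__main__":
    books = book_corpus_words()
    tweets = twitter_words_per_year()
    print(f"Scanned books: {books / 1e12:.1f} trillion words")
    print(f"Twitter, one year: {tweets / 1e12:.1f} trillion words")
    print(f"Months of Tweets to match the book corpus: {12 * books / tweets:.1f}")
```

Under these assumptions, roughly eight months of Tweets would equal the whole scanned-book corpus in word count, which is the scale mismatch the comment is pointing at.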
CFI is not being misinterpreted. CFI says very clearly that 3 quotations are enough and that Usenet is one of the accepted sources for these quotations. What CFI was intended to do is unknown since the requirement was never voted in: someone edited it in back in 2005-2007 without opposition, and that's it. If the sheer volume is a problem (and I think it is), increasing the number to 6 is not really a true solution but rather an acknowledgement that 3 quotes from an unedited corpus is a ridiculously low standard. If such a ridiculously low standard as 6 quotations still cannot be accepted, then we are stuck, I guess, as far as Usenet is concerned. As far as Twitter is concerned, the opposition still holds some cards since 3-quotation Twitter did not pass. --Dan Polansky (talk) 16:39, 31 October 2022 (UTC)
I remember plenty of IRC debates about the CFI from the years 2005-2007, and I feel comfortable saying that the sentiment of the time did not match the current interpretation. That doesn't matter, though: what happened in 2005 has little bearing on how we are managing the project today. - TheDaveRoss 16:52, 31 October 2022 (UTC)
My point is that the WT:ATTEST text is clear, unambiguous, its interpretation over a decade is stable and almost never disputed, and the original intent is unknown. Whoever wrote the current text into CFI must have known what "3 quotations from Usenet" means in terms of practical consequences. Those who did not dispute the current text for over a decade must have known that as well. RFDs like yours that imply CFI requires more than 3 quotations are almost never seen; I don't remember any such RFDs. Either we want to tighten the standard for Usenet now or we don't, and that's it. --Dan Polansky (talk) 17:08, 31 October 2022 (UTC)[reply]

Adding Classical Guarani

Hello! As you may see, I'm (relatively) new to Wiktionary. I've been wanting to add Classical Guarani to Wiktionary but I need some help, as I don't know much about this. I have a lot to contribute and I would really appreciate if this language could be added to Wiktionary. Thanks in advance! ~𝔪𝔢𝔪𝔬 (talk) 10:07, 31 October 2022 (UTC)[reply]

"Please review how the alphabet works; pretty sure D comes before S."

(Since user doesn't allow criticism on their talk page.)
@Nicodene, comments like "Please review how the alphabet works; pretty sure D comes before S." are extremely toxic and are wholly unwelcome in the community. -- Skiulinamo (talk) 18:53, 31 October 2022 (UTC)[reply]

@Skiulinamo Wrong, it's not that I 'don't allow criticism'; a look at my talk page immediately proves otherwise. It's that you, in particular, have been hounding me across multiple Wiktionary pages, and I'm tired of it. Nicodene (talk) 19:19, 31 October 2022 (UTC)
@Nicodene: However you may think I am "hounding" you, it doesn't justify your unbecoming behavior. --Skiulinamo (talk) 19:29, 31 October 2022 (UTC)[reply]
@Skiulinamo Oh we're going to bring up interactions with other people, then? Have anything to say about this or this? Both about ten days ago, mind. I also see that you're (partially) blocked for making disruptive edits, which I can't say is too unexpected. Nicodene (talk) 19:57, 31 October 2022 (UTC)[reply]
You're deflecting. I was not rude nor condescending in any of those interactions. Do you honestly feel as though your comment was warranted and above scrutiny? --Skiulinamo (talk) 20:07, 31 October 2022 (UTC)[reply]
Some might say that frivolous reverts are a form of rudeness. Nicodene (talk) 01:37, 1 November 2022 (UTC)[reply]
I'm not going to play the game of your tu quoque fallacy. My point in posting this is done. --Skiulinamo (talk) 01:43, 1 November 2022 (UTC)[reply]
Oh don't mind me, I'm just documenting your behaviour. Nicodene (talk) 02:01, 1 November 2022 (UTC)[reply]
It's a bit sarky but doesn't seem like anything to get worked up over. —Al-Muqanna المقنع (talk) 21:56, 31 October 2022 (UTC)[reply]
@Al-Muqanna: It's chronic behavior that, unless publicly documented, will go unaddressed. This is me doing so. --Skiulinamo (talk) 22:37, 31 October 2022 (UTC)
Chronic behaviour, you say. I suppose that explains the new account. Nicodene (talk) 01:21, 1 November 2022 (UTC)[reply]
@Nicodene, please be more polite in the future. @Skiulinamo (seriously, your former name was much easier), BP is hardly a platform for this type of minor rudeness. In the future, consider writing a short email to an active admin instead. We can all be a little grumpy sometimes, especially you. Thadh (talk) 23:23, 31 October 2022 (UTC)
You're correct, this would have been much better left on a talk page, but again, this isn't about a resolution, but documentation. --Skiulinamo (talk) 00:04, 1 November 2022 (UTC)[reply]
@Nicodene, @Skiulinamo, @Thadh: Why are pluit and pluo separate entries?!? The verb is one, pluere, with an impersonal and a personal use. It's exactly the same in Italian and we don't have two entries in the dictionary... How weird... — Sartma 𒁾𒁉𒊭 𒌑𒊑𒀉𒁲 08:29, 11 November 2022 (UTC)[reply]
@Nicodene, @Skiulinamo, @Thadh: and what does it mean that pluo is an "alternative form" of pluit?? It's literally the same verb! — Sartma 𒁾𒁉𒊭 𒌑𒊑𒀉𒁲 08:38, 11 November 2022 (UTC)
I'm not a Latin editor, I wouldn't know. It seems to be a difference between impersonal and personal use, but I don't know if that's a difference worth documenting, I'll leave that to people who know anything about Latin. Also, what does it have to do with this discussion? Thadh (talk) 12:22, 11 November 2022 (UTC)[reply]
@Thadh: Nothing. Sometimes discussions branch. — Sartma 𒁾𒁉𒊭 𒌑𒊑𒀉𒁲 17:15, 11 November 2022 (UTC)[reply]

@Skiulinamo, please mention that you're a sock of @Victar on your talk page or user page. PUC 20:01, 31 October 2022 (UTC)

Styling of the list of discussed online-only sources

I am preparing an upcoming vote and I would like to know which style of listing discussed sources people like better.

On the one hand, there is the table-based approach as in User:Fytcha/Sources. The advantages are that it is sortable and compact.

On the other hand, there is the section-based approach as in Wiktionary talk:Votes/pl-2022-06/Streamlining the approval process of online sources § Big suggestion. The advantages are that it is easier to edit, incurs less cognitive load and that it can more easily accommodate bigger chunks of text. — Fytcha T | L | C 22:10, 31 October 2022 (UTC)[reply]

I like the table better

  1.   Despite what was mentioned, I actually find the table much easier to process. Perhaps that is just me. I recognize there are upsides to the sections. Vininn126 (talk) 23:19, 31 October 2022 (UTC)
  2.   Support Per Vininn126. Can we make the colours softer, though? That red is burning my retinas. Theknightwho (talk) 23:26, 31 October 2022 (UTC)[reply]
    @Theknightwho: I tweaked them both a little. — Fytcha T | L | C 23:38, 31 October 2022 (UTC)[reply]
    @Fytcha Thanks. I hope you don't mind, but I've softened them again in line with WP:Reliable sources/Perennial sources (which wasn't why I cared, but which I remembered being easy on the eyes). Maybe my screen is just too bright or something, so feel free to revert. Theknightwho (talk) 23:43, 31 October 2022 (UTC)
    @Theknightwho: No, I actually like it better now. Thanks. — Fytcha T | L | C 23:55, 31 October 2022 (UTC)[reply]
  3.   Support —Al-Muqanna المقنع (talk) 23:51, 31 October 2022 (UTC)
  4.   Support. Easier to read for me. Though I wonder how votes like those for melanoheliophobia will be taken into account. AG202 (talk) 05:05, 1 November 2022 (UTC)[reply]
  5.   Support: the table seems fine to me. — Sgconlaw (talk) 18:34, 1 November 2022 (UTC)[reply]

I like the sections better

Comments

I think one thing we really need to consider with online sources is not making it all-or-nothing for a website - we need a little nuance when approving something. i.e. increased diversity of quotes from many sites. Vininn126 (talk) 23:19, 31 October 2022 (UTC)[reply]

@Vininn126: If I understand you correctly, you're saying we should make it possible for terms to only pass if they occur on different websites as well? I.e. some websites should only count if the term occurs on other websites as well? — Fytcha T | L | C 23:38, 31 October 2022 (UTC)[reply]
What I mean to say is that we should allow certain websites under certain criteria, depending on the website. This is well covered by the "additional constraints" sections but has been little dealt with in the actual votes! Vininn126 (talk) 23:41, 31 October 2022 (UTC)
@Vininn126: Gotcha, makes sense. Yeah, there's both the "Parts" field as well as the "Additional constraints" field. I hope that allows for enough granularity and I also hope proposers will actually make use of them. The four BP votes that I've added to the table unfortunately didn't deal with these details so I was more or less forced to just make something up that I thought was reasonable. I can change it if people disagree with my choices. — Fytcha T | L | C 23:55, 31 October 2022 (UTC)[reply]
@Vininn126: during the vote which led to WT:DEROGATORY (Wiktionary:Votes/pl-2022-06/Attestation criteria for derogatory terms#Option 2), option 2 included the following line: “In addition, where applicable, the quotations must be from two or more different sources. For this purpose, a particular website (for example, Reddit, Twitter, or Usenet) is considered as one source.” This option was not supported, so it seems unlikely that there will now be a consensus on requiring quotations to be from at least two different websites. — Sgconlaw (talk) 19:24, 2 November 2022 (UTC)[reply]