
Wiktionary:Beer parlour/2019/May



Using an arrow in synonyms to mean see

For some time, I've been using an arrow (→) in the synonyms template to mean "see". I think it's pretty neat and can be picked by the readers reasonably fast. Today, an admin prevented me from using the arrow in buřt, so let's discuss.

An example appearance is in this revision of buřt, approximately:

  1. fatso, butterball
    Synonym: → tlusťoch

An alternative is this:

  1. fatso, butterball
    Synonym: see tlusťoch

I think the arrow is neater.

Thoughts? --Dan Polansky (talk)

Using arrows in synonyms

User:Dan Polansky invented some new notation for synonyms, using an arrow to indicate to the user that there are more synonyms on another page. In a sense, this is analogous to our current use of "see Thesaurus page", which is officially supported by code in Module:nyms. The new notation can be seen in diff. I have removed it for the moment, for reasons that are both technical and merit based:

  1. The arrow is included in the Czech language tag, which means that the arrow is considered part of the Czech term, which is obviously not the case. Some screen readers may handle this by reading out the Czech word for "arrow", and we definitely do not want that. Punctuation and other symbols that are not part of the term should not be included in the language tag.
  2. This is not a standard practice, and is basically a creative misuse of {{syn}} that's not officially supported by Module:nyms.
  3. There is no explanation at all for what the arrow means. In the other place where we regularly use arrows, descendants sections, there is mouseover text. That's not great either, though. When we direct users to a Thesaurus page, we include the word "see", which is much clearer. If we allow this practice, then we should definitely use "see", of course placed outside the language tagging so that it's not interpreted as Czech.
  4. Given that we have Thesaurus pages to hold collections of synonyms, redirecting the user to another normal entry is a bad practice. The other entry may have multiple senses each with their own synonyms, and the user then has to hunt for the sense that the first entry was trying to refer to.

What do others think of this? —Rua (mew) 11:03, 1 May 2019 (UTC)

I already opened #Using an arrow in synonyms to mean see. ---Dan Polansky (talk) 11:05, 1 May 2019 (UTC)
Let me just note that the use of "see", "see also" or the like is a very useful practice that I have used for multiple years (I think) in the traditional Synonyms sections. It works reasonably well; it allows picking one page as a central collecting location of synonyms without a need of a separate Thesaurus page. The downsides mentioned above are minor, from my experience, and the upside is huge.
As for what this discussion is about, the title of the thread points to the use of the arrow rather than raising the question of whether the "see" technique should now be prohibited. Where there is a thesaurus page, the "see" technique can be used with the thesaurus page instead of another mainspace page. --Dan Polansky (talk) 11:12, 1 May 2019 (UTC)
Please don't. The synonyms on the individual "arrow" page might change, get moved to the thesaurus or whatnot. It's confusing (duplicate concepts: where am I supposed to look? Thesaurus? Entry? Both?) and requires more bookkeeping. – Jberkel 11:26, 1 May 2019 (UTC)
Czech entries have almost no thesaurus entries, so the problems do not arise. It is a scheme alternative to the Thesaurus, admittedly, but that is not necessarily a bad thing, I think. Using that technique for English would be confusing, but not for Czech. For Czech, there is no bookkeeping overhead. --Dan Polansky (talk) 11:38, 1 May 2019 (UTC)
An example for ease of reference: tlustoprd is an entry created by me in June of 2014 that uses the technique and points the reader to tlusťoch. --Dan Polansky (talk) 12:00, 1 May 2019 (UTC)
"Czech entries have almost no thesaurus entries so the problems do not arise." Why not? Sounds like this could be a solution. – Jberkel 18:03, 1 May 2019 (UTC)
Sure thesaurus would be a solution. I already have a solution, one that works, is widely deployed in Czech entries, and is simpler than the thesaurus. I have no desire to switch the deployed solution to the thesaurus: the solution is simple, neat, and works well, as far as my experience shows me. The solution does not prevent transition to the thesaurus later. The reader can be happy. Who is not happy is CodeCat/Rua. I am not happy with CodeCat/Rua interfering with my expanding the useful content for our readers. I do not appreciate having to deal with what to me look like pseudo-problems. --Dan Polansky (talk) 14:29, 6 May 2019 (UTC)

PIE Tʰ > T /_s#

In our PIE reconstructions, we recognize Szemerényi's law and Stang's law and have incorporated them into our PIE inflection templates. Does anyone have any objections to including the other word-final rule, Tʰ > T /_s#?[R0 1][R0 2][R0 3][R0 4] --{{victar|talk}} 12:16, 1 May 2019 (UTC)

We should probably stick to more mainstream scholarship on Wiktionary, and not stray too far into radical new ideas. Is this a widely recognised rule, and not one limited to one particular university or school of thought? —Rua (mew) 17:14, 1 May 2019 (UTC)
It's not a fringe theory and is required for explaining the lack of Bartholomae's law and voicing of word-final consonant clusters, but most of my sources are Irano-centric, so it would be helpful if those familiar with other branches could comment and give counter-examples, if they exist. --{{victar|talk}} 18:25, 1 May 2019 (UTC)
Well, Celtic, Germanic and Latin are of no help at all here because they merge all three stop series before a voiceless obstruent. I don't know enough about Balto-Slavic to be sure, because Winter's law is the only direct reflex of the voiced-aspirate distinction. And that's about all the branches I know anything about. —Rua (mew) 19:56, 1 May 2019 (UTC)
And of course, helpfully, the primary source of obstruent + s sequences is the aorist, which has a lengthened vowel of PIE origin and thus is ineligible for a Winter's law distinction in Balto-Slavic... —Rua (mew) 20:01, 1 May 2019 (UTC)
@JohnC5 can confirm this exists in Ancient Greek as well. I've asked him for an example when he has a moment. --{{victar|talk}} 20:02, 1 May 2019 (UTC)
@Rua: Lithuanian vapsvà < *wobʰseh₂ (English wasp) seems good for BSl, but we also have Old Prussian wobse, so I dunno. AG certainly had a productive deaspiration process before *s found throughout the verbal system (see the fut. and aor. stems of γράφω (gráphō)). I think that this sound change may be too common to necessitate reconstruction. —*i̯óh₁n̥C[5] 06:34, 2 May 2019 (UTC)
@JohnC5, Rua: PII retains Tʰs both initially and medially, as seen in *wóbʰsos (wasp) > *wábʰsas > *wábzʰas > PII *wábžʰas > PIr *wábžah > YAv. 𐬬𐬀ß𐬰𐬀𐬐𐬀 (vaßzaka). Looks like AG isn't going to be of any help to us if it deaspirates stops before sibilants everywhere. --{{victar|talk}} 07:17, 2 May 2019 (UTC)
Or maybe it's exactly the evidence we need. On the other hand, it appears to be a synchronic rule of Greek, given that deaspirated stops become voiceless and devoicing of aspirates is a Greek-specific change. If it were a PIE rule, then you'd expect voiced stops to result. I don't know what Greek does with voiced stop + s sequences though. —Rua (mew) 10:40, 2 May 2019 (UTC)

@Rua, JohnC5: I wanted to revisit this after creating *(s)rā́ps (turnip). Reconstructing it as *(s)rā́bʰ-s ~ *(s)rabʰ-és would be problematic because there are several derivatives from the nominative, such as *(s)rāp-éh₂, that require *(s)rā́p-s. Any thoughts? --{{victar|talk}} 08:17, 20 November 2019 (UTC)

I don't think there is enough evidence and support in linguistic sources for such a rule. Your page moves are premature. —Rua (mew) 09:52, 20 November 2019 (UTC)
@Rua: *(s)rā́ps (turnip) is a newly created entry. Do you have any opinions on this example? I believe the evidence for Tʰ > T /_s# is supported by scholars in the field, as evidenced in the sources below, but it's more of a matter of style, and I think not incorporating this change into entries is confusing to the reader, as seen in *(s)rā́ps. --{{victar|talk}} 21:13, 20 November 2019 (UTC)
  1. ^ Matasović, Ranko (2010), “The etymology of Latin focus and the devoicing of final stops before *s in Proto-Indo-European”, in Historische Sprachforschung, volume 123, issue 1, JSTOR 41219150
  2. ^ Lipp, Reiner (2009) Die indogermanischen und einzelsprachlichen Palatale im Indoiranischen: Neurekonstruktion, Nuristan-Sprachen, Genese der indoarischen Retroflexe, Indoarisch von Mitanni (Indogermanische Bibliothek; 3) (in German), volume 1, Heidelberg: Winter, page 212
  3. ^ Kobayashi, Masato (2017–2018), “Chapter V: Indic”, in Klein, Jared S.; Joseph, Brian D.; Fritz, Matthias, editor, Handbook of Comparative and Historical Indo-European Linguistics: An International Handbook (Handbücher zur Sprach- und Kommunikationswissenschaft [Handbooks of Linguistics and Communication Science]; 41.2), Berlin; Boston: De Gruyter Mouton, →ISBN, § The phonology of Indic, page 332
  4. ^ Kapović, Mate (2017), “Proto-Indo-European morphology”, in The Indo-European Languages (Routledge Language Family Series), 2nd edition, London, New York: Routledge, page 359

Reflex and orthographic changes to Proto-(Indo-)Iranian

There are two reflex and orthographic changes to Proto-Indo-Iranian and Proto-Iranian I'd like to discuss:

  1. When I started cleaning up and adding Proto-(Indo-)Iranian entries a while back, I went with the orthographic choices of *ĉ and *ĵ, which mirrored the entry for Proto-Indo-Iranian. Since then, however, that orthography has fallen out of favor in published academic works, which mostly prefer *ć and *ȷ́, respectively, for both Proto-Indo-Iranian and Proto-Iranian.[R 1][R 2][R 3][R 4] I'm really not troubled either way, but it might look better for us to use academic standards, which, incidentally, also more closely echo the orthography the project uses for Proto-Indo-European (ḱ and ǵ).
  2. In Iranian, the spirantization of aspirated stops *pʰ, *tʰ, and *kʰ to *f, *θ, and *x, respectively, is not seen universally, as evidenced in Sakan, Balochi, Parachi, and some dialects of Kurdish. It has been suggested that Proto-Iranian retained said aspirated stops.[R 4] Alternatively, all these languages experienced a fortition of fricatives, which led to a back-mutation. Any thoughts either way?

Are there any objections to either of these recommendations? If we went through with either, using a bot to complete the task would be ideal. Pinging: @JohnC5, Tropylium, AryamanA, Vahagn Petrosyan, Bhagadatta, Calak, Kwékwlos. --{{victar|talk}} 16:21, 1 May 2019 (UTC)

My preference is definitely for ć and j́. We use ś for Sanskrit, which is a reflex of ć, so using the same diacritic shows that continuity. We also use the same symbol for their Balto-Slavic cognates ś and ź, and as you mentioned their PIE ancestors ḱ and ǵ. So that fits much better. —Rua (mew) 17:19, 1 May 2019 (UTC)
I also prefer *ć and *ȷ́. Standard Kurdish orthography ignores aspirated consonants, but usually use character to indicate aspiration.--Calak (talk) 17:24, 1 May 2019 (UTC)
I agree, that would be a better option. But for PIA we should write *c instead of *ć. Kwékwlos (talk) 17:40, 1 May 2019 (UTC)
Academic works on PIA prefer *ć and *ȷ́ over *c and *j, which serves to inform the reader that they have different phonetic values from those of Sanskrit, so no, I would disagree with that change. --{{victar|talk}} 18:30, 1 May 2019 (UTC)
I'm fine with these changes. —*i̯óh₁n̥C[5] 06:22, 2 May 2019 (UTC)
@Victar, JohnC5, Rua: I too consent to the proposed changes. I would also like to make an additional proposal: to show the Proto-Indo-Aryan descendant of Indo-Iranian *ȷ́ as *ź (the fricative) and not as *ȷ́ (the affricate) as we currently do. The rationale is that, just as Indo-Iranian *ć produced Indo-Aryan *ś, the voiced counterpart of *ć (i.e., *ȷ́) produced the voiced counterpart of *ś, which is *ź (representing the /ʑ/ sound) and distinct from the *ȷ́ (/d͡ʑ/) in PIA, which comes from IIR *ǰ. Although both IIR *ȷ́ and *ǰ ended up as Sanskrit (ja), Mr Kobayashi believes that the distinction was preserved in the intermediary stage and that Old Indo-Aryan passed through some kind of "affricate filter" that merged both the fricative and the affricate into the affricate. -- Bhagadatta (talk) 05:37, 3 May 2019 (UTC)
  • @DTLHS: Would you possibly be able to run a bot to replace *ĉ and *ĵ with *ć and *ȷ́ in all iir-pro and ira-pro entries and links to them? --{{victar|talk}} 07:05, 5 May 2019 (UTC)
    I'll see what I can do. DTLHS (talk) 05:10, 6 May 2019 (UTC)
    @DTLHS: That would be much appreciated, thanks. --{{victar|talk}} 00:10, 7 May 2019 (UTC)
    Done, please watch for double redirects. DTLHS (talk) 05:09, 8 May 2019 (UTC)
    Thank you so much, @DTLHS! I hope it wasn't too much of a pain. --{{victar|talk}} 16:09, 8 May 2019 (UTC)
    @DTLHS, would it be possible to run your bot to include {{desc}} entries as well? --{{victar|talk}} 13:01, 14 May 2019 (UTC)
    I don't know what you mean. DTLHS (talk) 15:37, 14 May 2019 (UTC)
    @DTLHS: Well I just noticed this and I thought maybe {{desc}} wasn't included. --{{victar|talk}} 16:36, 14 May 2019 (UTC)
    I was only modifying links to pages that actually exist. DTLHS (talk) 18:06, 14 May 2019 (UTC)
    I see. --{{victar|talk}} 19:34, 14 May 2019 (UTC)
  • @DTLHS, would it be possible to run a ĉ -> ć, ĵ -> ȷ́ replace on this list that @Erutuon was kind enough to generate? --{{victar|talk}} 20:52, 4 June 2019 (UTC)
    Also, I was discussing it with others, and I think plain *c and *j are better suited for Proto-Iranian [ira-pro] (note: still *ć and *ȷ́ for [iir-pro]). @Erutuon generated a list for that as well. You would have to run your script for moving pages; it would be nice if they could be moved without leaving redirects this time around. --{{victar|talk}} 21:02, 4 June 2019 (UTC)
    Remind me in a week. DTLHS (talk) 15:21, 5 June 2019 (UTC)
    @DTLHS --{{victar|talk}} 17:55, 14 June 2019 (UTC)
    @DTLHS Is this something you might now have time for? --{{victar|talk}} 18:05, 13 October 2019 (UTC)
    Done. DTLHS (talk) 05:53, 16 October 2019 (UTC)
  1. ^ Lipp, Reiner (2009) Die indogermanischen und einzelsprachlichen Palatale im Indoiranischen: Neurekonstruktion, Nuristan-Sprachen, Genese der indoarischen Retroflexe, Indoarisch von Mitanni (Indogermanische Bibliothek; 3) (in German), volume 1, Heidelberg: Winter
  2. ^ Martínez García, Javier; de Vaan, Michiel (2014) Introduction to Avestan (Brill Introductions to Indo-European Languages; 1), Brill, →ISBN
  3. ^ Skjærvø, Prods Oktor (2017), “Avestan and Old Persian Morphology”, in Kaye, Alan S., editor, Morphologies of Asia and Africa[1], Winona Lake, IN: Eisenbrauns
  4. 4.0 4.1 Kümmel, Martin Joachim (2014), “The development of laryngeals in Indo-Iranian”, in The Sound of Indo-European[2], volume 3, Opava

Japanese soft-redirection necessitates kana-centric approach for wago

{{ja-see}}, the Japanese soft redirection template, is meant to be put under ==Japanese== or ===Etymology x===. This means that soft-redirection entries do not have POS headers, which affects the table of contents of those pages.

For example, please take a look at the three etymology sections of した. In the table of contents, only Etymology 3 has "Pronunciation" and "Verb" subheaders, while Etymologies 1 and 2 are "bare", having no POS subheaders. This makes navigating by POS inconvenient. (I frequently look up English words in Wiktionary, and I rely heavily on the POS links in the table of contents.)

Moreover, there is no reason why the three etymology sections should look different. All are wago words, but just because the first two have kanji spellings, does not mean they should be "second-class citizens" meriting a single line in the table of contents, while the verb form remains a full-fledged entry. By contrast, please take a look at わっぱ, which looks much better.

There is no way for {{ja-see}} to automatically generate the POS headers at the proper level (L3 or L4) because it doesn't know its position in the page. Therefore if we want to have soft-redirection at all, the "inequality" mentioned above is inevitable. This adds one more argument for the kana-centric approach for wago. By choosing the kana spelling as the lemma entry,

  1. Imagine how clean would look and how easy it would be to maintain if it were put in the format of !
  2. We emphasize that the kanji is an encoding of the spoken word, rather than that the spoken word is the decoding of the kanji. I'm not sure which direction western learners of Japanese take, but linguistics suggests the former direction. (The latter direction, if needed, could be built in the ===Kanji=== section.)
  3. In addition, we reduce the chance of {{ja-spellings}} and {{ja-kanjitab}} appearing together for a kanji lemma entry, which would take up too much space. We also solve the symmetry problem that <かえる: non-lemma, 帰る: full-fledged entry, 還る: non-lemma, 返る: non-lemma, 反る: non-lemma> looks imbalanced while <かえる: full-fledged entry, 帰る: non-lemma, 還る: non-lemma, 返る: non-lemma, 反る: non-lemma> looks better.

In any case, the Japanese soft-redirection system works best with the kana-centric approach for wago. There is at least one issue we haven't discussed before: whether to use kana for compounds like 繰り返す. But at least I think we have agreed on using kana for the "core wago vocabulary" – roughly, those appearing as kun readings of kanji.

I propose that we create a vote to update Wiktionary:About Japanese#Lemma entries. If it passes, we can update the core wago entries to use the new format. If not, we can remove the soft-redirection system (at least for wago entries) and revert to the plain old {{alternative spelling of}}, which would restore the proper POS headers.

(Notifying Eirikr, TAKASUGI Shinji, Nibiko, Atitarev, Suzukaze-c, Poketalker, Cnilep, Britannic124, Nardog, Marlin Setia1, AstroVulpes, Tsukuyone, Aogaeru4, Huhu9001, 荒巻モロゾフ, Mellohi!): --Dine2016 (talk) 11:37, 2 May 2019 (UTC)

  Support. —Suzukaze-c 06:37, 4 May 2019 (UTC)
  Support redirection of "core wago vocabulary" which have numerous kanji forms to their kana forms. However, I think redirection of wago 複合語 entries such as 山登り, 切り倒す, 繰り返す is not necessary because their kana forms やまのぼり, きりたおす, くりかえす are not that recognizable. KevinUp (talk) 00:47, 5 May 2019 (UTC)
  Oppose: I prefer using most common spellings. The problem of English Wiktionary is rather existence of too many rare or archaic readings. It’s good to move nonstandard readings to each hiragana page. — TAKASUGI Shinji (talk) 15:35, 5 May 2019 (UTC)
@TAKASUGI Shinji I find this a good compromise. Can you put it into a guideline for use on WT:AJA? I'm not good at writing in English.
By the way, I think the default entry layout of the English Wiktionary also contributes to the problem. WT:EL requires that etymology and pronunciation be put before the headword, even for languages like Japanese where the reader must first locate the headword and then get information on the word, which is a great distraction. By contrast, the entry layout of the Japanese Wiktionary is much more suitable: etymology and pronunciation are subordinate to the headword, so the reader can easily navigate in an ocean of homographs. If the English Wiktionary allowed such a format, then the table-of-contents problem would no longer exist:



{{ja-verb form}}

# {{inflection of|する||perfective|lang=ja}}

--Dine2016 (talk) 06:14, 6 May 2019 (UTC)
@Dine2016, the sample structure in the expansion section is problematic -- has pitch accent 0, while and have pitch accent 2. This suggests that Pronunciation would have to come before the POS. ‑‑ Eiríkr Útlendi │Tala við mig 18:32, 8 May 2019 (UTC)
@Eirikr: I don't think so. The full format would be



so it's still organized by word, just with the words beginning with POS headers and definitions. The problem is rather what to do with words having multiple POS sharing the same etymology and pronunciation, and single kanji which are usually POS-fluid. The Oxford English Dictionary solves this problem by having headwords in the form “xxx, adj. and n.”, but on Wiktionary ===Adjective and noun=== doesn't look good. --Dine2016 (talk) 06:21, 9 May 2019 (UTC)

Hello everyone, I created the vote at Wiktionary:Votes/pl-2019-05/Lemmatize Japanese wago words at kana spellings. What do you think about the wording?

@Atitarev, Cnilep, Eirikr, KevinUp, Korn, Suzukaze-c, TAKASUGI Shinji --Dine2016 (talk) 14:56, 5 May 2019 (UTC)

@Dine2016 If possible, can you reword the "general rule" in the proposed vote to the following categories to make things clearer?
  1. terms commonly written with kanji: 漢語 (kango), i.e. Sino-Japanese terms that have kana readings based on 音読み (on'yomi)
  2. terms without kanji spellings: 外来語 (gairaigo), i.e. Japanese loanwords that are usually written using katakana.
  3. 和語 (wago) words or 大和言葉 (Yamato-kotoba), i.e. native Japanese words that have kana readings based on 訓読み (kun'yomi). These usually have multiple kanji spellings.
  4. Derivatives or compound words such as お巡りさん (omawarisan), 山登り (yamanobori), 繰り返す (kurikaesu).
  5. Special consideration:
    1. Words that have reading patterns based on 重箱読み (jūbakoyomi) or 湯桶読み (yutōyomi) (mixture of on'yomi and kun'yomi readings)
    2. Words with irregular readings such as 当て字 (ateji). Examples include common words such as  (とう)さん (otōsan, father) and  (とも) (だち) (tomodachi, friend).
If I'm not mistaken, categories 1,2,4 are mostly not affected by this vote while category 3 (native Japanese words) will be the one that is greatly affected. I'm not sure about category 5 but for me, I would prefer for the most common spelling to be used. I think the entry at 寿 () () (sushi) can stay while the entry at  () (sushi, sushi) can be moved to its kana form すし (sushi). Note that both are wago terms with the same kana reading and the only difference is in its reading pattern. KevinUp (talk) 02:41, 6 May 2019 (UTC)
@KevinUp: Thanks for your reply. I think 寿 () () (sushi) should also be moved to the kana form because (1) it is wago and (2) its word formation is not clear from the kanji. The word has three kanji spellings (, and 寿司) so lemmatizing at kana would be better. Lemmatizing at kana also makes its relationship with  () (sushi) clear. The word formation of  (とう)さん (otōsan, father) and  (とも) (だち) (tomodachi, friend) is clear from their kanji spellings, so they are better lemmatized at the most common kanji spellings. --Dine2016 (talk) 05:47, 6 May 2019 (UTC)
Thanks for the explanation. So if the word formation of a wago compound can be determined from its kanji spelling, then the main lemma form shall be the kanji spelling. I get it now. KevinUp (talk) 09:39, 6 May 2019 (UTC)
@KevinUp: This is off topic but absolutely important in a topic related to the Japanese language. Could you please stop linking romaji as if they are words? Thanks. —Anatoli T. (обсудить/вклад) 08:48, 6 May 2019 (UTC)
Okay, I've corrected the links. It seems that kango and ateji have their own English entry. I added the links only for this discussion and of course I don't do this when dealing with actual entries. KevinUp (talk) 09:39, 6 May 2019 (UTC)

I have a question. Can {{ja-see}} exactly determine, out of several definitions in another page, which is intended for the current page? In case I had not made it clear, suppose ははは means "1. apple; 2. peach". It can be written 派派派 when meaning "peach". Can a {{ja-see}} in page 派派派 state it clearly that 派派派 means "peach" but not "apple"? -- Huhu9001 (talk) 06:00, 6 May 2019 (UTC)

@Huhu9001: Yes, please see 暗い and 貴方 for examples. --Dine2016 (talk) 06:36, 6 May 2019 (UTC)
@Dine2016: It seems the code differentiates them by etymology. Does that still work when the two senses share the same etymology? -- Huhu9001 (talk) 07:35, 6 May 2019 (UTC)
@Huhu9001: Yes, that's what {{ja-def}} is for. --Dine2016 (talk) 08:59, 6 May 2019 (UTC)

Only lemmatize rare or archaic readings at kana spellings

Hello everyone, I have added an option to the proposed vote to lemmatize only rare or archaic readings at kana spellings, per Shinji's comment above. Under this option, would retain only みず (mizu) (and the on'yomi  (すい) (sui)) as full-fledged entries, with the (mi) and もい (moi) readings moved to the hiragana entries even though they're frequently spelled in compounds. Similarly, would retain only わらべ (warabe), and would retain only わたし (watashi) and わたくし (watakushi). What do you think about this approach?

(Notifying Eirikr, TAKASUGI Shinji, Nibiko, Atitarev, Suzukaze-c, Poketalker, Cnilep, Britannic124, Nardog, Marlin Setia1, AstroVulpes, Tsukuyone, Aogaeru4, Huhu9001, 荒巻モロゾフ, Mellohi!): also pinging @KevinUp --Dine2016 (talk) 06:19, 7 May 2019 (UTC)

is an example of what the new approach would look like. --Dine2016 (talk) 15:39, 7 May 2019 (UTC)
I think the entry unfortunately looks like a bit of a dog's breakfast, and the apparently arbitrary inclusion or exclusion of material impairs the usability. If we lemmatize wago at the kana spellings, it would be much more consistent and clear to users if we do that across the board. The ware reading may be spelled more commonly as in modern texts, but the spelling is also in evidence, and it's listed in modern dictionaries. It's unclear to me why we wouldn't lemmatize at われ. Likewise for the wa reading, I'd expect to find the entry at instead of .
Ultimately, what we're doing in all of this is attempting to overcome the limitations imposed by the back-end structure on the one hand, and our usage practices on the other. Electronic Japanese dictionaries appear to use a biaxial indexing system, where a user can enter either the reading or the spelling, and be presented with a list of hits showing the intersection between that reading and all matching spellings, or that spelling and all matching readings. The MediaWiki back-end can't handle that as-is.
Technically speaking, it might be possible to make clever use of transclusion to reproduce the behavior of electronic Japanese dictionaries, allowing users to go to われ and see full information for all terms that have that reading, or go to and see full information for all terms that have that spelling -- without having to jump through the hoops of clicking through soft-redirect links. However, if this is even possible, any implementation of this approach would require changes in how we create and edit entries. ‑‑ Eiríkr Útlendi │Tala við mig 18:28, 8 May 2019 (UTC)
@Eirikr: Yes, you're right. Take a look at and you'll see the problem caused by redirecting kanji to kanji. Etymology 1 is really two words, but splitting them into two etymology sections would require an extra mechanism to specify "which word in the lemma entry is intended" for each {{ja-see}}. This could be avoided by having kana as the main hubs for wago. More importantly, having kana as the main hubs for wago paves the way for unified Japanese (cf. User:荒巻モロゾフ/draft). --Dine2016 (talk) 06:21, 9 May 2019 (UTC)
I think that if option 2 was implemented, it would be much harder to implement a unified Japanese approach for Japanese lemmas. Since Wiktionary is an etymological dictionary, I would prefer to see native Japanese words being lemmatized at their kana forms and Sino-Japanese terms lemmatized at their kanji forms. Korean has a category for Category:Native Korean words. Does such a category exist for Japanese words? KevinUp (talk) 06:36, 10 May 2019 (UTC)

@Eirikr I find my original proposal (Option 1) a failed attempt to limit the number of terms affected. For example, the word-formation of 青い is clear from its kanji spelling, yet it should obviously be lemmatized at あおい. I have changed Option 1 to lemmatizing all wago at kana; please take a look at the wording in the proposed vote and provide feedback. (For example, should proper nouns use the kana spelling?) --Dine2016 (talk) 15:47, 9 May 2019 (UTC)

Also pinging @KevinUp, Suzukaze-c. --Dine2016 (talk) 15:52, 9 May 2019 (UTC)

Yes, the current wording is much better now. With regards to proper nouns, I think the same rule can be applied, so  () (ほん) (Nihon) would be lemmatized at its kanji form while  () () (Fuji) would be lemmatized at ふじ instead.
However, for Japanese names, particularly given names, I would like to see all given names lemmatized using hiragana only (see also Wiktionary:Beer parlour/2019/February#Kanji compounds for Japanese given names). KevinUp (talk) 06:36, 10 May 2019 (UTC)
@KevinUp Why are you now suggesting to lemmatise  () (ほん) (Nihon) and  () () (Fuji) at kana, if they are not wago?! --Anatoli T. (обсудить/вклад) 01:46, 11 May 2019 (UTC)
@Atitarev: If you look at the etymology for  () () (Fuji), the term is actually derived from Old Japanese. As for  () (ほん) (Nihon), I mentioned that it would be lemmatized at its kanji form, not at its kana form. KevinUp (talk) 08:59, 11 May 2019 (UTC)

かとう is an example of how inflected forms would look like under option 1. --Dine2016 (talk) 11:30, 10 May 2019 (UTC)

Broadly speaking, I think that structure looks good.
At a finer-grained level, I don't agree with the "infinitive" POS label. For adjectives in particular, the -く and -う forms function as adverbs; there's nothing particularly infinitive about them. ‑‑ Eiríkr Útlendi │Tala við mig 23:16, 10 May 2019 (UTC)
@Eirikr: The use of the label "infinitive" for what traditional Japanese grammar calls the ren'yōkei is pretty common in linguistics. See, for example, Samuel E. Martin's A Reference Grammar of Japanese, John R. Bentley's A Descriptive Grammar of Early Old Japanese Prose, Bjarke Frellesvig's A History of the Japanese Language, [3], [4], [5]. However, it seems that western linguists have not agreed on some other labels. For example, Samuel Martin calls the dictionary form the imperfect while Bjarke Frellesvig calls it the nonpast. So it's better to stick to traditional labels such as shūshikei or self-evident labels such as "-(r)u form". --Dine2016 (talk) 07:57, 14 August 2019 (UTC)
@Dine2016: Two thoughts. 1) In the sources listed, the English term infinitive is consistently used in reference to the 連用形 (ren'yōkei) of only verbs, not adjectives (for which, as noted above, the 連用形 (ren'yōkei) would be more properly classed as an adverb). 2) There are various ways in which we (the Wiktionary editing community in general) eschew academic linguistic labels, in favor of more commonly understood terminology. (See also the recent discussion of the ergative label for English verbs at Wiktionary:Beer_parlour/2019/August#Flowers_wavered_in_the_breeze.) I recommend that we adopt any English-language terms for 連用形 (ren'yōkei) in a similar fashion -- i.e. not necessarily matching academic usage. ‑‑ Eiríkr Útlendi │Tala við mig 22:41, 14 August 2019 (UTC)
@Eirikr: Actually, the sources do use the label "infinitive" to refer to the ren'yōkei of adjectives:
Samuel E. Martin's A Reference Grammar of Japanese, section 9.1 "The Infinitive":
The infinitive has the shape -í for consonant verbs, abbreviated to zero for vowel verbs (with sí for suru and kí for kúru—the imperfects have assimilated the first vowel to the second); adjectives take the shape -kú [after removing the -i which is the imperfect ending of our nuclear sentence corresponding to the verbal -(r)ú]; […]
John R. Bentley's A Descriptive Grammar of Early Old Japanese Prose, section "Infinitive -ku"
The infinitive of stative verbs is formed by attaching -ku.
Bjarke Frellesvig's A History of the Japanese Language, Table 3.8 OJ adjectival copula forms
Infinitive-1 ku
I agree that the "adverb" label (for adjectives) is more easily understood, but I think "continuative" isn't. The "continuative" label describes ~て, ~つつ, 連用中止法, etc., but it does not describe ~ます, ~た, ~たい - there's nothing particularly continuative about them. --Dine2016 (talk) 02:42, 15 August 2019 (UTC)
@Dine2016: Hmm, thank you for the additional examples. Frankly, ugh. The "infinitive" as defined by these linguists bears little apparent relationship to the infinitive as I learned it with regard to European languages, and indeed, I see little in the Wikipedia articles at [[w:Infinitive]] or [[w:Nonfinite verb]] that makes me think of the Japanese 連用形, and instead much to make me think that the 連用形 does not equate at all to the infinitive... Also, pretty much all other materials I've read about "infinitive" only apply this to verbs; if these linguists are making the case that Japanese -i adjectives or 形容詞 are stative verbs, they are unnecessarily muddying the waters by then clumping them together with regular 動詞 in other descriptions, like the above.
Re: how to label the 連用形, I ran across the "continuative" label some while back, wherein (if memory serves) the author made the case that the meaning of the term conjugated in this form "continues" in a compounding sense into the next verb or other inflecting term, as a rough translation of 連用 (as in, 用言に連れる, and possibly also taking a cue from the verbal sense of 連用する "to use something continuously"). For ~て, I've more commonly seen "conjunctive", as this is similar to the "[VERB] and ..." conjunctive construction in English. For 連用中止法, it has the 連用 right there, and it points to this same 連用形 verb / adjective form, so an English label that includes "continuative" seems appropriate. For ~つつ, yes, I can see an argument that "continuative" could be appropriate for this too, and thus ambiguous. However, it's often translated and explained in English in terms of "to be [VERB]-ing", as essentially a "progressive" form, so we have other options for terms we can use to label ~つつ.
About ~ます, ~た, or ~たい, I don't recall ever using, or seeing others using, the term "continuative" to refer to these, so that part of your comment confuses me. These attach to the 連用形 or "continuative", with potential sound changes depending on the verb (such as かき + た → かいた, or やり + た → やった, etc.), but these endings themselves are not the 連用形 or "continuative" forms of anything.
I admit to liking the term "continuative" for its almost-translation of 連用, but I'm open to suggestion -- so long as that suggestion isn't "infinitive". :)   In functional terms, the 連用形 attaches to another inflecting word, which is similar to how English adverbs work -- attaching to verbs or adjectives. Would you prefer "adverbial" as a label?
I'm in favor of continuing to use the Japanese term 連用形 in term etymologies, etc. That said, whichever way we finally go for the English term used as a gloss for this, we should probably update the 連用形 page and related pages like WT:AJA, to clarify what this means here on the EN WT.
Cheers, ‑‑ Eiríkr Útlendi │Tala við mig 19:10, 15 August 2019 (UTC)
@Eirikr: Thanks for your explanation. It seems that English labels are a chaos. For example, one author may call 未然形 the imperfective while another may call it the irrealis, one author may call 終止形 the terminal form while another may call it the conclusive, one author may call the ~(よ)う form the volitional form while another may call it the conjectural form, etc. So, whichever label we choose it might look strange to some eyes. In light of this, it may be better to just refer to the forms by their endings, e.g. -(a)- stem, -(i) form, -(y)ō form.
One difficulty with this approach is that different descriptions of Japanese grammar have different ways to divide verbs into a stem and an ending. In traditional Japanese grammar (学校文法), godan verbs are よ・む and ichidan verbs are た・べる. In the western analysis of Japanese grammar, godan verbs are yom-u and ichidan verbs are tabe-ru. In 日本語教育文法, the analysis is underlyingly よみ→よむ and たべ→たべる. That is, the stem of よむ is よみ, which makes sense more syntactically than morphologically. --Dine2016 (talk) 02:46, 16 August 2019 (UTC)
  • Agreed that things are a bit of a dog's breakfast, both in English for the variance in terminology, and in Japanese for the disconnect between conjugated stem forms and underlying roots (i.e. the よむ→よみ view imposed by kana and the inseparability of consonants and vowels, versus the actual yom- root morphology).
One challenge in discussing forms by their endings is that, for instance, the 未然形 for 五段 verbs ends in -a, but for 上一段 verbs it ends in -i, and for 下一段 verbs it ends in -e. The Japanese terms are reasonably consistent, at least, so my practice so far has been to say things like, "derived from the 連用形 (ren'yōkei, continuative or stem form) of verb XYZ...", putting the 学校文法 terminology in the forefront. Then at the 連用形 entry, I've tried to list some of the more common variants found as the English translations. What's your thought on that? ‑‑ Eiríkr Útlendi │Tala við mig 03:59, 16 August 2019 (UTC)
The -i and -e of ichidan verbs actually belong to their stems, as can be clearly seen from the paradigm: tabe, tabe, taberu, taberu, tabere, tabeyo/tabero. The stem is clearly tabe-, so the 未然形 suffix is -a for godan but zero, not -i or -e, for ichidan.
I think the reason traditional Japanese grammar posits お・きる and あ・ける (instead of おき・る and あけ・る) is that traditional Japanese grammar is designed for Classical Japanese, where 二段 verbs are more common. But even 二段 verbs can be analysed as having stems like oki- and ake-, with phonological rules like oki- + -u = oku, etc. This analysis is adopted in John Bentley's and Bjarke Frellesvig's books and has the advantage that the rough shape of verb stems remains the same from Old Japanese to Modern Japanese: /oki2-/ → /oki-/, as opposed to /oku/ → /okuru/ → /okiru/.
I think using 学校文法 terms and linking is a good idea. I don't like 学校文法 itself, but I think it's difficult for the community to agree on another description of Japanese grammar, so let's temporarily stick to 学校文法 terms. --Dine2016 (talk) 07:25, 16 August 2019 (UTC)
  • Ah, I think we may have missed each other somewhat -- you mentioned "the 未然形 suffix -a", which only makes sense from a non-学校文法 analysis of verb conjugations, whereas I was talking purely about 学校文法 and the final vowel in the conjugated verb stem forms. That said, that's more of a minor terminology mismatch, and it looks to me like we're both agreeing on the substance. :)
I believe your analysis of the reasons for Japanese dictionary stem organization (excluding the final kana of 上・下一段 verbs from the "root" by putting them to the right of the 中黒) is probably correct. I haven't read any specific explanation of why dictionaries do this, but they're pretty consistent, and you're right in that it only makes sense if we look at the etymologies and historical derivations, wherein the 上・下 verb forms (be they modern 一段 or historical 二段) all seem to derive as essentially defective paradigms from underlying 四段 roots (even if sometimes those roots fell out of use even before the OJP stage).
I've found Frellesvig's work pretty good so far from what I've read of his stuff directly, but I'm not 100% sold yet on his analysis of stem changes -- I recall reading somewhere that he posits that the -(a)- that appears in 四段 verbs as the 未然形 linking vowel may have arisen initially as the start of the negation suffix, which works well enough as a hypothesis, but then that runs afoul of the -i ending he also posits as part of the verb root: i + a as a fusion pretty consistently produces ⟨e1⟩, not a. The "-a- as start of suffix" theory makes much more sense if the verb root ends in a consonant. A root can't end in a consonant in any historical (i.e. written-down) flavor of Japanese, and I think this phonetic restriction has led to linguists jumping through weird hoops to account for improbable vowel shifts. However, if we raise the possibility that prehistoric (i.e. before writing) Japonic may have had coda consonants, things get somewhat easier to parse...
But I digress. :) As you say, 学校文法 has its warts, but it does seem like it might be the least-confusing set of labels at present, so let's go with that for now. ‑‑ Eiríkr Útlendi │Tala við mig 16:51, 16 August 2019 (UTC)
I think it's good. —Suzukaze-c 07:34, 12 May 2019 (UTC)
As for inflections, I have thought that omitting kanji spellings would be easier to manage (as with いった). いった is an inflection of いう・いく・いる, and 云った is an inflection of 云う. But perhaps this isn't applicable anymore if we use {{ja-see}}. —Suzukaze-c 07:36, 12 May 2019 (UTC)

The vote has started. I would appreciate it if you could clarify your positions at your earliest convenience, so that I could know what steps to take next. @KevinUp, TAKASUGI Shinji, Eirikr, Atitarev, Huhu9001 --Dine2016 (talk) 04:57, 24 May 2019 (UTC)

Wikidata Lab XV: Lexicographic Data

The fifteenth Wikidata Lab will take place on May 23, 2019. This time the event will address lexicographic data within Wikidata, also known as lexemes. The event is free and open to all, but as places are limited, prior registration is required. The event will be recorded and made available on the NeuroMat YouTube channel.

This is the fifteenth activity in a series of trainings for the integration of the projects Wikidata and Wikipedia (and now Wiktionary!). The presentations, photographs and impact reports of the first fourteen activities are available at Wikidata Lab I, II, III, IV, V, VI, VII, VIII, IX, X, XI, XII, XIII and XIV, respectively.

The "Wikidata Lab XV: Lexicographic Data" will take place at CEPID NeuroMat, at the Universidade de São Paulo, on Thursday, May 23, from 12:30 to 19:30 UTC.

The activities will be conducted by Wikimedian Léa Lacroix, in English. The event is offered by the group Wiki Movement Brazil and by CEPID NeuroMat, with support from the Foundation for Research Support of the State of São Paulo (FAPESP).

More information: visit the event page and sign up. Ederporto (talk) 15:18, 2 May 2019 (UTC)

The event will take place in São Paulo, Brazil, a bit far for me. Also, the page announced as the “event page” characterizes its aim as “training for the integration of Wikidata with the Portuguese Wikipedia”, so it is not clear that this is of interest to this project.  --Lambiam 18:45, 2 May 2019 (UTC)
Hi, @Lambiam, as stated before, the event will be recorded (and maybe streamed!), so you will have full access to the presentation. As for the event page on ptwiki, our main target is our local community, but since the presentation will be in English and involves lexicography, we believe this community could make good use of it; that's why I announced it here. Ederporto (talk) 02:03, 3 May 2019 (UTC)


What's the difference between:

For example, where does prefecture belong? Ultimateria (talk) 20:03, 2 May 2019 (UTC)

All three are categories of categories only, so the entry prefecture belongs in none of them. But we also have Category:en:Administrative divisions for “English terms related to administrative divisions” and Category:en:Political subdivisions for “English terms for political subdivisions, such as provinces, states or regions”. Some guidance or a mini-tutorial on when to apply which seems in order. On Wikipedia, Political subdivision redirects to Administrative division, so that is no help. The intersection of Category:en:Administrative divisions and Category:en:Political subdivisions contains county, dependency, département (is that English?), oblast, prefecture, province, region, and state. The terms categorized as administrative divisions are types of (sub)divisions, whereas the terms categorized as political subdivisions tend to be the names of specific entities that are instances of such (sub)divisions. But, as the overlap shows, the distinction has not been rigorously maintained.  --Lambiam 16:53, 3 May 2019 (UTC)
Somewhat surprisingly, the terms municipality and voivodeship are in neither of these categories. I also see we have no entries at all for crown dependency and special administrative region. --Lambiam 17:08, 3 May 2019 (UTC)
I meant which English category does prefecture belong in. I would like to populate these categories in other languages, but as you point out, some guidance is needed. Could we merge them into a single category to contain instances of subdivisions (in the form of categories) and types of subdivisions (in the form of entries)? There's nothing intuitive about the current division. Ultimateria (talk) 00:37, 4 May 2019 (UTC)
Rather than merging them, it may be more useful and helpful to pick more evocative names for these categories, like Terms for administrative divisions and Names of political subdivisions. Renaming will also involve a bit of separating the sheep from the goats. There are currently twelve languages (Arabic, Chinese, Czech, German, English, Middle English, Esperanto, Ottoman Turkish, Polish, Serbo-Croatian, and Zhuang) for which we have a category “L2:Administrative divisions”. Each of those also has a category “L2:Political subdivisions”, but there are in total 144 of such categories. For languages using an alphabet and orthography that distinguishes minuscules (for common nouns) from Majuscules (for proper nouns), I think it is a safe bet that the minuscule minority are terms, not names.  --Lambiam 13:54, 4 May 2019 (UTC)
These are already set categories, which follows from their name, so they are already known to contain "terms for" things. The confusion arises because we need to differentiate types of administrative divisions from names for specific administrative divisions. However, this kind of confusion isn't specific to this case; there is also Category:en:Celestial bodies, which could conceivably contain both planet and Jupiter, and really any other case where there are names for specific instances of something. More confusing is that we have both Category:English names and Category:en:Names. By our own category naming scheme, the first is a word-type category, in which the category name relates to the word itself (it contains words that are names), while the second is a set-type category, in which the category name relates to the referent (the words refer to names but are not themselves names). It appears, though, that these two categories are used indiscriminately without any rhyme or reason. That should probably be sorted out. —Rua (mew) 20:31, 4 May 2019 (UTC)
So if we have two categories, one for types of boats and one for names for specific boats, wouldn’t it be a good idea to name these categories such that the names are suggestive of which is intended for which category, rather than using, say, Category:Ships and Category:Boats?  --Lambiam 07:33, 5 May 2019 (UTC)
We already have Category:en:Named roads, perhaps there is a case for a Category:en:Named boats? —Rua (mew) 10:25, 5 May 2019 (UTC)
And Category:en:Named administrative divisions to end the confusion?  --Lambiam 11:42, 5 May 2019 (UTC)
That seems like a good idea. Although we may not want to have Category:en:Named administrative divisions next to Category:en:Political subdivisions. We should use the same term, one with "named" and one without. —Rua (mew) 11:54, 5 May 2019 (UTC)

Lack of editors from certain countries

Just pondering: why do we have so few editors from certain ("First World") countries that we might expect: e.g. almost no Germans or Swedes (correct me if I'm wrong)? Does this suggest that their local Wiktionaries (de.wikt, sv.wikt...) are very good, or that these nationalities have less interest in working in English, or what? We seem to get much more of the Romance-language contingent, even in some cases from South America. Equinox 02:48, 5 May 2019 (UTC)

On what are you basing this assertion? I think we have several editors from Germany, they just aren't actively working on the German language. DTLHS (talk) 02:50, 5 May 2019 (UTC)
Yeah, you're right, I meant "lack of editors doing certain languages". I won't change the heading now because it would be rewriting history, but if you're not bothered, feel free to fix the question. Equinox 03:06, 5 May 2019 (UTC)
We have a number of very active IPs working on German. We did actually lose at least one good German editor over our refusal to treat any transparent multi-multi-whole-word German compounds as SOP, though their main focus was on other languages. As for Swedish, there are some familiar names at Category:User sv-N- both old and new. Chuck Entz (talk) 06:25, 5 May 2019 (UTC)
It would be interesting to get some stats on this. Reported languages vs actual edits, changes over time, IP contributions etc. I know that on the English Wikipedia some research has been done, analyzing contribution patterns of L2 English speakers. Can't find the link right now (using a search engine to find content about Wikipedia and not on Wikipedia is tricky…) – Jberkel 08:05, 5 May 2019 (UTC)

Bopomofo or Zhuyin?

Hey all. I was considering whether or not Template:zh-pron should call the 注音符號 system 'Zhuyin' or 'Bopomofo'. Here's the vitriolic discussion about this that happened on Wikipedia in 2008: w:Talk:Bopomofo#Requested move (2008). Here's the recent discussion I had at Talk:邊. I would say change it to Bopomofo, but that's just me. I feel the term 'Zhuyin' has just not reached the level of acceptance that 'Pinyin' has. No need for rudeness or anything. It's just my opinion. See that talk page for more details on some of the back and forth. --Geographyinitiative (talk) 08:41, 5 May 2019 (UTC)

I agree with several of Justinrleung's points and I think we should keep the name as is. I disagree with you about its currency or level of usage. 注音符號/注音符号 (zhùyīn fúhào) or 註音/注音 (zhùyīn), in English Zhuyin, is the formal term, known in China as well. Bopomofo (ㄅㄆㄇㄈ) is an informal Taiwanese (only) name, for which no phonetic transcription even exists in Mandarin. --Anatoli T. (обсудить/вклад) 09:48, 5 May 2019 (UTC)
Thank you for your reply. Let us be clear about what is English and what is Mandarin Chinese. Why would you say "Zhuyin" is the formal term for 注音符號 in English? Only because it is the formal term in Mandarin Chinese? Use of "Zhuyin" in English works is probably prominent in situations where English is a second language. The system has not been promoted in Mainland China for decades, and some might assume that 'Zhuyin' or 'Zhuyin Fuhao' is just the best way to render it for English. I do not accept this as a valid reason to change the English term to 'Zhuyin' from what it is, 'Bopomofo'. (Isn't the term 'Bopomofo' influenced by Hanyu Pinyin? If it were based on the romanizations used in Taiwan, wouldn't it be written something like "Pop'omofo"?) It is 100% irrelevant to this discussion whether or not "a phonetic transcription exists in Mandarin" for the term 'Bopomofo': this is English I'm talking about, not Mandarin Chinese. Also, ISO and Unicode call it 'Bopomofo'. The Taiwanese Phonetic Symbols are called 'Bopomofo Extended'. Also, see all the usages I mentioned on the other page. I definitely used 'Bopomofo' when talking about these symbols in English in the USA in 2006 or so. My goal in promoting this change is to help ENGLISH speakers who use Wiktionary to look up stuff about Chinese more readily understand what the symbols they are looking at are. They are probably only passingly familiar with the word 'Bopomofo', and will likely be completely unaware of the term 'Zhuyin'. All I'm asking is to conform to English language common usage. Mencius not Mengzi. Yangtze not Chang Jiang. Bopomofo not Zhuyin or Zhuyin Fuhao. (Also, if you are going to find examples against my argument, a use of 'Zhuyin Fuhao' doesn't count as a use of 'Zhuyin' to me.) --Geographyinitiative (talk) 10:44, 5 May 2019 (UTC)
Since Zhuyin/Bopomofo is not inherently a concept known to most native English speakers, it is hard to judge whether Zhuyin or Bopomofo is more formal in English; the formality would naturally be borrowed from Chinese, and there is some evidence for this (all emphasis mine):
  • Unicode Demystified: A Practical Programmer's Guide to the Encoding Standard, p. 377: "This alphabet is nicknamed 'Bopomofo'"
  • Learning Chinese: Linguistic, Sociocultural, and Narrative Perspectives, p. 293, footnote for bopomofo: "The colloquial term for 注音符號 (zhùyīn fúhào)"
While it is true that Zhuyin is probably a less common term than Bopomofo, this does not dismiss the validity of Zhuyin as an English word. Even though the use of Zhuyin by Chinese/Taiwanese users of English should not be dismissed, there is evidence of its use by non-Chinese/non-Taiwanese, as demonstrated by the quotes I have placed at Zhuyin. We're not forcing people to change Bopomofo to Zhuyin; it would simply be a choice of our dictionary. I would still choose Zhuyin over Bopomofo because it is the more formal option of the two.
(As a tangent, Bopomofo Extended refers to the Unicode block containing some extra symbols, including symbols only used in the Taiwanese Phonetic Symbols; the name does not refer to Taiwanese Phonetic Symbols, which also use symbols in the main block. The Bopomofo Extended block also contains symbols used in Hmu and Ge, which have nothing to do with Taiwanese.) — justin(r)leung (t...) | c=› } 07:33, 6 May 2019 (UTC)
I am still thinking about this issue. I recommend calling 'Zhuyin' 'Bopomofo' because 'Bopomofo' is the English language term for this system. --Geographyinitiative (talk) 01:16, 7 July 2019 (UTC)
@Geographyinitiative: As I've shown above, Bopomofo is not the only English term for this system. Zhuyin is also equally as valid an English term for the system. You have not quite countered the colloquialness of the term "Bopomofo", so Zhuyin seems to be a better choice at this point. — justin(r)leung (t...) | c=› } 01:43, 7 July 2019 (UTC)
"Zhuyin" and "Zhuyin Fuhao" are not only more formal but they are also better known terms in Greater China or Chinese speaking countries/areas outside Taiwan. It seems like we keep revisiting norms just because of Geographyinitiative's personal wishes. --Anatoli T. (обсудить/вклад) 11:10, 7 July 2019 (UTC)

Template:de-noun, Template:feminine noun of and confusion of sex and gender

The user Rua changed Template:de-noun to display "male .." and "female .." instead of "masculine .." and "feminine .." and changed Template:feminine noun of into a template displaying "female equivalent of ..". While doing so the user confused sex (or: natural gender, biological gender) and gender (or: grammatical gender), or the user lacks knowledge of the German language. The user's edits are incorrect, led to incorrect entries and the old versions of the templates need to be restored:
Ankläger m, Herausgeber m, Hersteller m for example are masculine but not necessarily male; Anklägerin f, Gegnerin f, Herausgeberin, Herstellerin f, Sammlerin f for example are feminine but not necessarily female. Examples can be seen in the entries. Rua's edit led to wrong statements like "male Ankläger", "female Anklägerin" and "female equivalent of Hersteller", which is even kind of ridiculous when the entry clearly shows that it is incorrect, like in Herstellerin: "female equivalent of Hersteller ... von der Herstellerin, der Firma Karl Zeiß ...". In it, the word Herstellerin f is feminine but the referent is a sexless and not female Firma f. In general, words with -in f are always feminine but the referent isn't necessarily female (it can be, but doesn't have to be), and similarly words with -er m are masculine but the referent isn't necessarily male. --Majbef (talk) 14:36, 5 May 2019 (UTC)

Just because Anklägerin is not necessarily female doesn't mean it's not the noun used for a female referent. The template was renamed because there are languages that have pairs of nouns for male-female referents without having grammatical gender, languages that have gender but do not distinguish male and female in their gender system, and specific words where grammatical gender disagrees with natural gender. English is an example of the first, Dutch and Scandinavian examples of the second, and Old English wīf an example of the third. For such languages, the terms "masculine" and "feminine" are inappropriate. See also WT:RFDO#Male and Female categories. —Rua (mew) 14:46, 5 May 2019 (UTC)
It obviously does mean that your edits are misleading, as they carry wrong implications. They are also wrong because Herstellerin, for example, isn't only female, as it now states ("female equivalent of Hersteller") because of your edit. And they are also wrong because the -in terms aren't the only terms used for female referents: the -er terms can refer to females too, especially as m pl referring to both sexes, beiderlei Geschlechts referring to both sexes, männliche und weibliche referring to both sexes, and weibliche referring to the female sex only.
For genderless languages or differences as in case of wer m and wīf n other parameters/templates would be needed instead of changing templates and making entries wrong. --Majbef (talk) 15:09, 5 May 2019 (UTC)
If Herstellerin is wrong, then how would you improve it without messing with the templates? —Rua (mew) 15:18, 5 May 2019 (UTC)
It is wrong, as can be seen from the definition (clearly stating a sex) and the example (referring to a sexless entity), so it's not "If ..., then how ..." but "As ..., how ...". You messed with the templates, so you should answer that and correct that entry and all other entries you made wrong. The simplest way I see would be to distinguish between "feminine equivalent of .." of the original template, which fits here, and a new "female equivalent of .." which could for example be used in stewardess pointing to steward. --Majbef (talk) 15:42, 5 May 2019 (UTC)
I asked you how you would improve it without messing with the templates, can you give an answer to that? If you only keep hammering on the template being "wrong" we're not going to get anywhere. Renaming the old template back would reintroduce the old problems which led to me renaming it in the first place. You need to explain better why the grammatical gender of these words matters when they are referring to things with no natural gender. —Rua (mew) 16:19, 5 May 2019 (UTC)
Personally I don't see an issue with using "feminine" in languages like English without gender, and would prefer "feminine" over "female". To me, "feminine" is ambiguous between natural and grammatical gender, while "female" can only imply natural gender. But if the terminology is an issue, I think the correct thing is to create separate templates {{female equivalent of}} and {{feminine equivalent of}}. Benwing2 (talk) 16:34, 5 May 2019 (UTC)
The question that begs answering, though, is whether the cases of grammatical equivalent genders require the existence of equivalent neuter forms as well in languages that have a neuter gender. German is one such language, and it was exactly this objection I raised in the RFDO discussion I linked. If a word like Herstellerin is required to be used in reference to something with no natural gender but feminine grammatical gender, then what happens when that noun is neuter instead? Or cases where the two types of genders mismatch? Is Weib a Hersteller or Herstellerin? Would a neuter noun similar to Firma be a Hersteller or Herstellerin? —Rua (mew) 16:43, 5 May 2019 (UTC)
"feminine equivalent of" was correct, and you made it and entries wrong instead of correcting it. So changing it back would fix the errors you introduced.
As Klägerin f shows, the term can also be used in reference to a neuter noun (Unternehmen n) or a masculine noun (Verlag m) (Maybe because of some kind of constructio ad sensum, connecting it with Firma f or Unternehmung f). But it doesn't matter: In both cases (Unternehmen + Hersteller and Unternehmen + Herstellerin), "Herstellerin: feminine equivalent of Hersteller" is correct, while "Herstellerin: female equivalent of Hersteller" is obviously incorrect as a Herstellerin f isn't necessarily female. --Majbef (talk) 22:01, 5 May 2019 (UTC)
The example you just gave shows otherwise, that grammatical gender is irrelevant for what Klägerin can refer to. I think I'll wait for a German native speaker with more Wiktionary experience to give input, as I have trouble making any sense of your arguments. —Rua (mew) 22:09, 5 May 2019 (UTC)
The example does show that "Herstellerin: female equivalent of Hersteller" is incorrect and that "Herstellerin: feminine equivalent of Hersteller" is correct. Herstellerin f is always feminine, regardless of whether it refers to a masculine, feminine or neuter noun. Hence it's a feminine equivalent to (the masculine) Hersteller m. The referent, however, isn't always female; it can also be sexless. Hence it's not a female equivalent to (the sexless, male and/or female) Hersteller m. Emphasising the sex, "Herstellerin: sexless or female equivalent of Hersteller" could be correct, but that's not what the template displays and what the template was for. --Majbef (talk) 22:23, 5 May 2019 (UTC)
That just shows that the current entry is incomplete and is missing senses. But you didn't give any suggestion for how to improve the entry otherwise, when I asked for it, so I don't know what you're after. Again, I'm going to wait for someone more experienced with German on Wiktionary who can explain the situation better. —Rua (mew) 22:36, 5 May 2019 (UTC)
The entries were correct and didn't miss any senses; you made them wrong or "incomplete". So it would be your task to fix them. Nonetheless, I made two suggestions for how to fix them: firstly, change the template back to when it was correct, and secondly, optionally create a new "female equivalent of" template which can for example be used for English. --Majbef (talk) 22:49, 5 May 2019 (UTC)
We're just going in circles here. @Fay Freak can you offer anything? —Rua (mew) 22:52, 5 May 2019 (UTC)
Rua confused nothing. She solved a technical problem. The templates are there to show pairs in a typified fashion. As rightly noted, “Herstellerin” is the female equivalent of “Hersteller”. Adding “female Hersteller” to Herstellerin is confused and at best says nothing to the reader, possibly even confuses him, and it’s wrong since it is the male equivalent, according to our system of displaying information. If a reader does not know that in German a GmbH is optionally treated like a female, taking the -in suffix and other gendered forms of nouns (“die Beklagte” and “der Beklagte”), this is not the place to tell him this. Fay Freak (talk) 23:13, 5 May 2019 (UTC)
Duden/de.wikt just say "weibliche Form zu Hersteller". In German weiblich can mean both female and feminine, so the problem doesn't arise there. Technically "feminine of" seems to be correct in all cases, but I don't think "female of" would be massively confusing. Perhaps the template could be changed so that it outputs a different text based on the language? And why do we have {{masculine noun of}}? – Jberkel 00:00, 7 May 2019 (UTC)
Seems to demand a change too. Fay Freak (talk) 13:32, 7 May 2019 (UTC)
"“Herstellerin” is the female equivalent of “Hersteller”" isn't correct, or is "incomplete" as Rua called it: a Herstellerin f is feminine (gender) but not necessarily female (sex). And the example in the entry clearly shows that the definition is now(!) wrong, at least if the reader has enough knowledge of German to understand the example and knows that a company is a sexless thing. "“Herstellerin” is the feminine equivalent of “Hersteller”", on the other hand, is correct and complete; it doesn't lack any senses.
As for the adding part: It was a typo for m= (masculine), which would be correct. In case of f= (female) it would be Hersteller or Herstellerin and not only Hersteller.
As for the GmbH part: The entry GmbH is not the place to tell the reader that a GmbH f is sexless but can be referred to with feminine (not female!) words like Beklagte f. However, the entry Beklagte f is a place to tell the reader that the noun is feminine (not female!), or that the referent can be sexless or female (and not only female!). To tell the reader that a Beklagte f is only female (sex) is wrong (or "incomplete"). --Majbef (talk) 07:51, 11 May 2019 (UTC)
It didn’t say this, though. It’s your exaggerated interpretation. This is not what “female” means here: nowhere did the wording exclude that the word can be used for a GmbH. As I said, the information is presented in a typified way. And if a reader does not know that such pairs can be employed by conjecturing a natural gender from the grammatical gender of a company, there is no place, no entry, to tell him. It is a general phenomenon that is to be read about in grammars, but not here. However, the distinction at issue is basically a sex or natural-gender distinction and not a grammatical-gender distinction, as can be seen from there not being a “neuter equivalent”. All these words are used primarily for the roles of humans, hence the distinction follows natural gender. The distinction applied to corporations is secondary and optional. It might be incomplete to present the distinction as a role split, but it is a distortion to mark it as a grammatical-gender distinction.
The entry did not claim that, however. You read it into the entry; it is an exaggerated interpretation. That is not what “female” means here. Nowhere did the wording exclude that the word can also be used for a GmbH. As I have already said, the information is presented in a typified way. If a reader does not know that such pairs can be applied depending on a natural gender conjectured from grammatical gender, there is no place, no dictionary entry, to tell him about it. It is a general phenomenon that can be looked up in grammars, but not here. The distinction, of course, is fundamentally one of natural gender, not of grammatical gender, as can be seen from the fact that there is no “neuter equivalent”. All these words are used primarily for the roles of humans, hence the distinction follows natural gender. The distinction for companies is secondary and optional. It may be incomplete to present the distinction as such a role split, but it is a distortion to label it as one of grammatical gender. Fay Freak (talk) 01:46, 12 May 2019 (UTC)

Japanese verb: both -suru verbs and irregular verbs are put into Category "type 3"

Why not separate them? -- Huhu9001 (talk) 05:02, 6 May 2019 (UTC)

  Support. Suzukaze-c 22:50, 6 May 2019 (UTC)
  • @Huhu9001, if memory serves, this is because English-language materials for teaching Japanese often classify する (suru) and 来る (kuru) as "type 3" or "group 3" or something similar. Example.
What would you suggest? And what verbs besides する (suru) and 来る (kuru) would be affected? ‑‑ Eiríkr Útlendi │Tala við mig 17:14, 7 May 2019 (UTC)
@Eirikr: This affects する and all suru verbs like 散歩. くる is not affected. -- Huhu9001 (talk) 03:13, 8 May 2019 (UTC)
Concerns that arise:
  • This deviates from English-language teaching materials, and is thus likely to confuse (some of) our readership.
  • This would result in 来る (kuru, to come) being our only expected "type 3" verb.
  • However, it appears that our "type 3" categorization has itself been ... irregular, as our "type 3" category contains many things typically excluded in common English-language teaching materials. I also see a few entries that are Classical, not modern Japanese (居り (ori), (su), そうず (sōzu), (sōrou)). See also w:Japanese irregular verbs.
My growing sense is that the current approach, which only removes する verbs from "type 3", is mistaken and insufficient. する is indeed a truly irregular verb, as is 来る, and these two should be treated as such. This would mean that both are "type 3", according to common English-language materials for teaching Japanese. Creating a sub-category under Category:Japanese type 3 verbs for just the する verbs (of which there are many) would strike me as a better approach than removing する verbs from this category altogether.
Widening the scope, there are various other words that could be treated as "irregular verbs", some of which are already in our Category:Japanese type 3 verbs despite not being included in "type 3". Some are classed as 助動詞 (jodōshi, literally helper verb) in common Japanese grammars, although this is historically a grab-bag of grammatical oddments, including things like べし (beshi) that inflect as adjectives, and things like だ (da) that is essentially a modern invention cobbled together from disparate pieces. We also have the honorific verbs おっしゃる (ossharu), ござる (gozaru), なさる (nasaru), and the like that are mostly regular, only evincing some minor irregularity in one conjugation stem, where the expected -ri ending instead becomes -i.
Considering all of this, I really think that we need to have a broader conversation about how we (Wiktionary) want to treat the less-regular verb classes as a whole, and whether this categorization system should apply just to the modern language, or to Classical and OJP as well. ‑‑ Eiríkr Útlendi │Tala við mig 17:56, 8 May 2019 (UTC)
@Eirikr: One solution to the current problem I suppose is to modify Module:ja-headword to remove all irregular verbs except くる from Category:Japanese_type_3_verbs, and then Category:Japanese_suru_verbs can be made a subcategory of it. Also it is better to have the label of type 3 verbs changed from "irregular" to "type 3".
I agree that it is better to separate "classical" or older conjugations from modern ones. But there seems to be no consensus on this. -- Huhu9001 (talk) 03:50, 10 May 2019 (UTC)
@Huhu9001: <nods/> A follow-on suggestion, then:
  • Do as you suggest and reduce the current "type 3" category to 来る and compounded verbs at the top level of the category, with the する verbs included as a sub-category within "type 3".
  • Also create a category for irregular verbs more broadly, under which "type 3" would itself be a sub-category. This broader category could include the honorific verbs with the -i endings where -ri would be expected, which verbs are used in the modern language, and the oddball copular verb だ (da), among others.
  • Talk further about how to handle Classical forms. (I see now that multiple editors appear to be treating Old Japanese as a separate language, for entry-structure purposes, which is great by me.)
いかがでしょうか (Ikaga deshō ka?, How would that be?) ‑‑ Eiríkr Útlendi │Tala við mig 23:13, 10 May 2019 (UTC)
@Eirikr: So any idea on the categories of old conjugations? -- Huhu9001 (talk) 10:41, 17 May 2019 (UTC)

Remove all Unihan definitions and adjust reference header of CJKV translingual section from L4 to L3

By now, it is well established that there are many errors in the Unihan database such as non-existent readings and inaccurate definitions. The source of these definitions is not stated by Unihan and editors working on CJKV entries would occasionally send some of these definitions to RFV. I think it would be better to remove them altogether or at least hide them using comments: <!-- Unihan definition -->.

Note that CJKV translingual entries should not have definition lines, so editors would usually move these definitions to the appropriate language section. However, Unihan definitions tend to conflate Chinese and Japanese meanings, especially those of flora and fauna, so I don't think it is a good idea to use these definitions.

On the other hand, the CJKV translingual section has references listed at L4 level under "Han character". I find this a little odd, since most other languages have references at L3 level. I would like to request that these references be moved to L3 rather than L4 for uniformity.

Of course, the following tasks: (1) hiding or deleting translingual definitions from Unihan and (2) adjusting reference headers from L4 to L3 would involve the use of bots. KevinUp (talk) 10:03, 6 May 2019 (UTC)
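The two bot tasks described above could look roughly like the minimal Python sketch below. It is only an illustration under stated assumptions: it operates on the wikitext of the Translingual section alone (extracting that section from the page is left out), it assumes that every line starting with `#` is a Unihan definition line, and the comment wording `<!-- Unihan definition: ... -->` is a hypothetical choice, not an agreed bot specification.

```python
import re

# A level-4 "References" header. [ \t]* is used instead of \s* so the
# pattern never swallows the newline (or a following blank line).
L4_REFERENCES = re.compile(r"^====[ \t]*References[ \t]*====[ \t]*$", re.MULTILINE)


def hide_definitions(section: str) -> str:
    """Wrap each definition line (assumed: lines starting with '#') in an HTML comment."""
    lines = []
    for line in section.split("\n"):
        if line.startswith("#"):
            lines.append(f"<!-- Unihan definition: {line} -->")
        else:
            lines.append(line)
    return "\n".join(lines)


def promote_references(section: str) -> str:
    """Turn every level-4 References header into a level-3 one."""
    return L4_REFERENCES.sub("===References===", section)


def fix_translingual_section(section: str) -> str:
    """Apply both (hypothetical) bot tasks to one Translingual section."""
    return promote_references(hide_definitions(section))
```

Hiding rather than deleting keeps the text visible in the source for editors who want to verify and move it to the Chinese section, which matches the comment-based option raised in the discussion.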

  Support Support Support. Suzukaze-c 22:47, 6 May 2019 (UTC)
Pinging @Bumm13, Dokurrat, Geographyinitiative, Justinrleung for comment: KevinUp (talk) 10:59, 10 May 2019 (UTC)
Support tentatively. I don't think it's a good idea to remove the definitions completely. They need to be moved with {{attention|zh|moved from Translingual, please verify}} to the Chinese section. The conflation happens, I think, because some editors just copied definitions from a Japanese dictionary; I'm not sure these errors happen a lot in the Unihan database. {{ping}} doesn't work without the signature in the same edit. --Anatoli T. (обсудить/вклад) 09:09, 10 May 2019 (UTC)
Yes, we could hide the translingual definitions using comments such as <!-- Unihan definition --> or using {{attention|zh|Unihan definition}} so that editors working on converting the old Mandarin/Cantonese sections to unified Chinese would still be able to see them. KevinUp (talk) 10:58, 10 May 2019 (UTC)
Responding to the first point, it sounds like a good idea to have the 'references' all on level 3, but I think a thorough understanding of the original reason for putting them on level 4 should be gained before the change is made. Responding to the second point: here's what I do. When I see definitions in the translingual section of a CJKV character and I am interested, I check a dictionary to see if they are similar to the Chinese definitions. If they are, then I move them to the Chinese section. It is definitely a mistake to move them blindly into the Chinese section. But I don't know if hiding them is the answer either. I think leaving them there alerts me that the entry is still in a primitive stage of development. If the Unihan definitions are hidden, then I have no base from which to start understanding the characters in the way that almost every other Chinese-English dictionary understands them. Not really vehement for or against these proposals. --Geographyinitiative (talk) 11:35, 10 May 2019 (UTC)
We do have several maintenance categories such as (1) Category:Mandarin Han characters, (2) Category:Cantonese Han characters, (3) Category:Requests for definitions in Mandarin entries, (4) Category:Requests for definitions in Cantonese entries, (5) Category:Requests for definitions in Chinese entries to keep track of Han character entries that are still in a primitive stage of development. I think the Unihan definitions are not that reliable when it comes to rare or archaic Han characters, so it would be better to hide them for the time being. KevinUp (talk) 12:29, 10 May 2019 (UTC)
I'd support hiding the definitions in translingual and putting an attention template. For the references section, I would also support the move to L3; I don't see a reason for it to be in L4. While we're at it, I would probably also go as far as changing it from "References" to "Further reading" because we don't actually have much in the translingual section that refers to the listed sources (probably except the Unihan database page). — justin(r)leung (t...) | c=› } 14:54, 10 May 2019 (UTC)
I think having straightforward definitions in the translingual section is a mistake, more often than not: no one ever speaks or writes in Translingual, just Chinese, Japanese, etc. There are definitely common semantic threads, but IMO the definitions should be more like those in root entries for Semitic languages (see ש־ל־ם‎ for an example). Chuck Entz (talk) 18:33, 10 May 2019 (UTC)

Change Proto-Slavic notation to consistently use haček for all cases of iotation

Right now, we mix two different notations for Proto-Slavic consonants which result from the process known as iotation (triggered by a following j). We use the haček for the letters č, ď, š, ť, ž, but a following j for the cases of lj, nj and rj. All of these were single phonemes in Proto-Slavic, and remain so in most of the modern languages. Derksen's notation uses č, š, ž with haček, ļ, ņ, ŗ with a comma below, and dj, tj with a following j, which is even less consistent. Czech and Slovak orthography, on which the use of hačeks for Proto-Slavic is based in the first place, use the haček consistently for all of these cases, thus č, ď, ľ, ň, ř, š, ť, ž (although their ď and ť are not reflexes of the Proto-Slavic equivalents). I propose that we consistently use the haček for all consonants that result from iotation, thus renaming our existing lj, nj and rj to match the ľ, ň and ř of Czech and Slovak orthography. —Rua (mew) 17:46, 6 May 2019 (UTC)
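As a rough illustration of what the proposed renaming would involve mechanically (for example when a bot moves entries), here is a minimal Python sketch. It assumes that every lj, nj and rj in the current notation stands for an iotated consonant, so a plain string substitution is enough; the function name `to_hacek_notation` is hypothetical.

```python
# Digraphs currently used for iotated sonorants, mapped to the proposed haček letters.
IOTATED = {"lj": "ľ", "nj": "ň", "rj": "ř"}


def to_hacek_notation(form: str) -> str:
    """Rewrite a Proto-Slavic form from the digraph notation to the haček notation.

    Assumes every lj/nj/rj denotes a single iotated phoneme. Replacing one
    digraph can never create another, so the order of replacements does not matter.
    """
    for digraph, letter in IOTATED.items():
        form = form.replace(digraph, letter)
    return form


print(to_hacek_notation("konjь"))
```

Forms that already use hačeks (č, ď, š, ť, ž) pass through unchanged, which is why the change only touches three letters.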

  Support – Háčky are easy to type and easier to read than digraphs. Digraphs should be avoided when creating orthographies. If one has digraphs now this is carried over from the past when one was not creatively or technically capable, that is for Proto-Slavic one used what one found easy to print or to type. Fay Freak (talk) 13:29, 7 May 2019 (UTC)
Whether they are easy to type depends on the platform one uses. For me, the easiest way (via the Latin/Roman edit panel) is awkward.  --Lambiam 17:45, 7 May 2019 (UTC)
You have to use that panel anyway for the yers, nasal vowels and other hačecked letters, so adding three more is not going to make a difference. —Rua (mew) 18:06, 7 May 2019 (UTC)

Pinging editors who have recently edited Slavic entries: @Benwing2, Useigor, Bezimenen, Kwékwlos, Greenismean2016. —Rua (mew) 20:08, 9 May 2019 (UTC)

  Support – More convenient. Kwékwlos (talk) 20:09, 9 May 2019 (UTC)
  Support – Looks better to me. Although the "hard to type" argument holds some weight; haceks are easy to type on the Mac keyboard using U.S. Extended aka ABC Extended, but I'm not sure about the PC. Benwing2 (talk) 00:18, 10 May 2019 (UTC)
But again, Slavic has so many other special characters, adding three more isn't going to matter. I can type all the hačeks and ogoneks on my keyboard (Linux US international) but for the yers I still have to use the special characters panel. —Rua (mew) 09:52, 10 May 2019 (UTC)

Lacking háček before soft vowels

@Rua, Vorziblix: Am I correct to observe that there are a lot of entries with a consonant letter incorrectly not using a kvačica, and that all letters {t d l n r} have to use it in accordance with Wiktionary:About Proto-Slavic if ⟨ь⟩ follows? It appears that as other resources write those palatalized consonants with a -j digraph, they also do not mark them before soft vowels, and we need to get rid of this variation. I compared entries like *koňь (which ESSJa writes *konь) and *ogňь with, for example, *strьžьnь, and see that in the declension table the genitive *strьžьna doesn’t even look like a soft stem. So *borьba → *bořьba, *svatьba → *svaťьba, *dvьrь → *dvьřь, *svinьcь → *sviňьcь, *kъ̏nędzь → *kъ̏ňędzь etc.? Fay Freak (talk) 19:30, 30 September 2019 (UTC)

I think you are confusing jo-stems with i-stems. In jo-stems, the j disappears and iotates the preceding consonant, while at the same time fronting the following vowel. But in i-stems, there is no j, and consequently neither fronting nor iotation. Only the first palatalization (č š ž) occurs here, which is triggered by a following front vowel. The case of *strьžьnь is thus indeed wrong; it must have ň if it's a jo-stem, but n if it's an i-stem. Slovene evidence suggests it's the latter. The other words you cited have no issues however, the forms they are currently at are correct. —Rua (mew) 20:40, 30 September 2019 (UTC)
@Rua I was rather thinking that those háčky do not only stand for jotation and that there are other sources of palatalization (ʲ). I imagine the system might rather be three different degrees of palatalization, 1. none 2. middle 3. strong (/j/-type). For example 1. *čьrta gives Russian черта́ (čertá) 2. *svatьba gives Russian сва́дьба (svádʹba) with palatalized /tʲ/, *kotьlъ gives Russian котёл (kotjól) with /tʲ/ but 3. *dobyťa gives добы́ча (dobýča) and *nъťьvy gives Russian но́чвы (nóčvy), *mȍťь gives Russian мочь (močʹ).
South-Slavic then only reflects the palatalization in case three while the weak or middle one gets lost and case two and one sound the same (now). But we do not distinguish case one and case two in the notation of Proto-Slavic. If we did, it looked like:
*koťьlъ but *dobyt́a. But I see it cannot be supported to mark that second case.
It seems like this cannot be proven and /tʲ/ /dʲ/ /lʲ/ /nʲ/ /rʲ/ presumably did not even exist in Proto-Slavic unless from formerly preceding /j/ or in the clusters *gt and *kt for *ť, right, all only developments in individual Slavic languages? I imagined Proto-Slavic more palatalized than it actually was, it seems. On the other hand, what speaks against it other than the notion that three degrees are too much to fit into the organs of speech? Maybe *gt and *kt engendering *ť point towards a more palatal pronunciation of this *ť, for example /t͡ɕ/ while there was room for a /tʲ/.
This goes not only for /tʲ/ /dʲ/ /lʲ/ /nʲ/ /rʲ/; I wonder if our Proto-Slavic had /gʲ/ /kʲ/, /vʲ/, /mʲ/. But that would not be phonemic, so our notation hides it. So it is correct that we do not note what I have imagined, because the háček does not stand for any palatal or palatalized consonant per se but only for that third case, and it is not said how ⟨t d l n r⟩ sound before ⟨ь⟩ etc. The third case, however, must be marked; I wonder why many resources fail to do so (or do so only for a subset of ⟨t d l n r⟩?). Fay Freak (talk) 22:27, 30 September 2019 (UTC)
In principle, iotated consonants could be denoted with a following j instead of a haček, and perhaps for a time that is indeed what their phonemic status was. But the problem is that the first palatalization produces č, š and ž, the same three sounds that result from iotation, but without a j originally following. It's possible that the change k > č progressed through an intermediate kj, but we can't really tell this, because all Slavic languages show a postalveolar or retroflex affricate. In the case of iotation, too, all Slavic languages show a single phoneme rather than two, and the same applies to kt > ť. For that reason they are treated as single phonemes, based on the comparative principle.
As for palatalisation not caused by iotation, it appears to have been an allophonic distinction in Proto-Slavic, triggered automatically by a front vowel. In South Slavic, this process remained allophonic, and this was probably helped along by the merger of the front and back yers to a schwa in western South Slavic. Only in East and West Slavic did the process become phonemic, when front yers were lost but left their palatalising effect behind. There, too, the process was disrupted in various ways. In West Slavic, the merger of the back yer with e meant that non-palatalised consonants could now be followed by a front vowel. In Czech and Slovak, the merger of y with i had the same effect as well. —Rua (mew) 08:38, 1 October 2019 (UTC)

<s(s)> pronounced /z/

Joseph, deserve, dessert... since it is a lexical issue, it would be quite useful to create a category for <s(s)> pronounced as /z/, even if as a variant --Backinstadiums (talk) 03:19, 11 May 2019 (UTC)

Is it lexical? Why this particular situation, and not "X pronounced Y" for any other X-Y pair? English spelling and pronunciation allow all sorts of odd combinations of these; see ghoti. Equinox 03:20, 11 May 2019 (UTC)
@Equinox: This one clearly exceeds the rest by far and is not a digraph, except for <ss> --Backinstadiums (talk) 15:40, 12 May 2019 (UTC)


User:Rua recently replaced the etymology section of housen with Template:nonlemma, and later informed me in an edit summary that it's "the standard practice". If so, I think it's a remarkably poor practice. The link in the template leads not to the lemma entry, but rather to Wiktionary:Lemmas. I think in general we should avoid linking from the mainspace to "behind-the-scenes" Wiktionary namespace pages, and this template in particular is quite confusing as a reader would expect the link to take them to the page that actually has the etymology. What do others think? —Granger (talk · contribs) 14:08, 11 May 2019 (UTC)

We shouldn't give, and in fact in general never have given, etymologies for every inflection of a lemma. The template {{nonlemma}} reflects that. Instead, the etymology is to be found on the lemma form, which is given in the definition and therefore does not need to be repeated in the etymology. —Rua (mew) 14:13, 11 May 2019 (UTC)
I agree that we don't and shouldn't give etymologies for every inflection. But where appropriate, it's fine to give etymologies for nonstandard, irregular, or otherwise interesting inflections. The edit that started this replaced a perfectly good etymology with a confusing and (IMO) unhelpful template. —Granger (talk · contribs) 14:25, 11 May 2019 (UTC)
You seem to think that I'm objecting to giving an etymology. What I'm objecting to is giving an etymology on a non-lemma. Note that the lemma house contains the exact same etymology, making it redundant to list it again on the non-lemma. —Rua (mew) 15:58, 11 May 2019 (UTC)
Please read my comments more carefully. I understand that you object to giving an etymology on a non-lemma. But you haven't given any reason for that objection other than the straw man of giving "etymologies for every inflection of a lemma".
What I'm objecting to is the confusing and non-user-friendly link in the template. In my opinion leaving out the etymology section in the housen entry would be preferable to adding such a confusing template. But I think the best options are (a) keeping the etymology as it was before this disagreement began, or (b) a short sentence that links to house (something like, "See house."). —Granger (talk · contribs) 23:16, 11 May 2019 (UTC)
This should be obvious from the definition line, which already links to house. We should not repeat the lemma in the etymology. —Rua (mew) 12:06, 13 May 2019 (UTC)
Of course it's obvious to you. I don't think it's obvious to a casual dictionary user, and what makes matters much worse is that the link doesn't go where one would expect it to. Even just removing the link from the template would be an improvement. —Granger (talk · contribs) 14:01, 13 May 2019 (UTC)
As a general rule, I leave nonlemma forms without an etymology. I don't see how the {{nonlemma}} template adds any useful information, regardless of whether it links to the lemma or to Wiktionary:Lemmas. The lemma is already found in the definition. Having to redundantly specify the lemma in the call to {{nonlemma}} will be a bit painful (esp. since nonlemma entries link to multiple lemmas) and IMO not terribly useful. Benwing2 (talk) 02:20, 12 May 2019 (UTC)
I use it so that the etymology section isn't empty. Leaving the section empty is ugly; it says to the user "this is where the etymology goes, but now we're not giving you any". Again, this is because our entry layout is broken, as I mentioned at the end of last month. The template is one of the kludges I use to deal with the broken layout, and was introduced in Wiktionary:Beer parlour/2016/May#Template:nonlemma, because another proposal to make the entry layout more sensible failed. The proper fix is to eliminate etymology sections where there is no etymology, but until then {{nonlemma}} is a good stopgap. It also signals to bots which etymology section to add inflections to. —Rua (mew) 11:18, 12 May 2019 (UTC)
I think I've only encountered this at non-lemma verb forms like, say, "flying", where it seems annoyingly redundant to repeat the material from "fly". It doesn't seem to add much, as Benwing2 remarks. Equinox 11:07, 12 May 2019 (UTC)
Yes, in entries like those I would suggest not including an etymology section at all, which seems to be what others in this discussion are advocating. In rare cases like housen I think it can be useful to give an etymology at the nonlemma (or at least a link to the lemma's Etymology section), but no etymology section at all would be better than such a confusing template. —Granger (talk · contribs) 12:51, 12 May 2019 (UTC)

There seems to be wide agreement that the template in its current form is not good. What's the next step? RFDO? Rewriting the template to be useful? Orphaning it by removing the Etymology section from articles that use the template? —Granger (talk · contribs) 12:03, 23 May 2019 (UTC)

  • I would not agree with the statement that no etymology should be given for a non-lemma form. It is fine to say that the etymology for "eating" is "eat" + "-ing". It would not be good to repeat the etymology of "eat" at "eating" though. Ƿidsiþ 12:39, 23 May 2019 (UTC)
    • @Widsith What would you propose we do with cases where the lemma has an ending but the nonlemma doesn't, like sluit or open? —Rua (mew) 13:25, 23 May 2019 (UTC)
      • I'm not proposing anything for such cases. Ƿidsiþ 13:54, 23 May 2019 (UTC)
    • If you have etymologies for inflected forms, you're basically turning the categories for the inflectional morphemes into categories for the inflections. I don't want all the present participles and gerunds to swamp out interesting cases like the -ing in building. Chuck Entz (talk) 13:38, 23 May 2019 (UTC)
      • Well, I mean that's the problem with categories in general, isn't it? You can hardly restrict their use to only those examples that you personally find interesting. Ƿidsiþ 13:54, 23 May 2019 (UTC)
        • In this case, you can. I think we should avoid etymologies in non-lemmas that only state the obvious, morphologically speaking. If you know that a form is the present participle of an English verb and you know anything about English grammar, you already know that it's got an -ing ending tacked on. Etymologically, it's SOP. I have no problem with providing etymologies in non-lemmas when there's something like suppletion that isn't predictable from straightforward application of well-known morphosyntactic rules, but I don't see the point in having a list of every English word that ends in "-ing". There's no useful information; it's just clutter. It would be like having a Derived terms section in the have entry with the compound-tense forms of every English verb. Chuck Entz (talk) 02:40, 24 May 2019 (UTC)
          • OK. Well, I disagree. If someone wants to specify these etymologies, they are clearly correct and I can't see any justification to remove them. Ƿidsiþ 10:48, 25 May 2019 (UTC)
  • I think that the etymology sections should be removed from any straightforward, obvious non-lemma entries. If altered to redirect to the most relevant entry, however, this template could be useful for non-lemmas in entries (such as abak in Polish) that contain both lemma and non-lemma (with obvious etymologies) definitions, to avoid blank etymology sections. Another application for the template could be to direct the reader to the non-lemma entry with the etymology for non-obvious non-lemmas that are obviously related to another non-lemma, like tygodnie, tygodni, et cetera to tygodnia, whose lemma is tydzień. Maybe format it like this: {{nonlemma|lang|entry}}. By the way, I think the etymology of housen should stay, as it is unusual in English and it would not be obvious whether it was inherited from an inflection of its predecessors or arose from some alternative dialectal construction of plural forms. İʟᴀᴡᴀ–Kᴀᴛᴀᴋᴀ (talk) (edits) 01:23, 1 June 2019 (UTC)
  • There seems to be consensus that this template is not useful in its current form, and no one has offered to rewrite it to make it useful. Therefore, I suggest going through and removing the "Etymology" section from all entries where the section only contains this template. —Granger (talk · contribs) 00:20, 6 June 2019 (UTC)
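The clean-up suggested in the last comment could be done with a simple text substitution. The Python sketch below is a minimal illustration, not an existing bot: it assumes an unnumbered Etymology header whose body consists only of {{nonlemma}} (with or without parameters) and blank lines, and it only removes the section when another header follows it, so a section at the very end of a page, or a numbered "Etymology 1" section with nested headers, is deliberately left alone for manual review.

```python
import re

# An unnumbered Etymology header whose body is only {{nonlemma}} (optionally
# with parameters) plus blank lines, up to the next header of any level.
NONLEMMA_ETYM = re.compile(
    r"^(=+)[ \t]*Etymology[ \t]*\1[ \t]*\n"   # header, e.g. ===Etymology===
    r"(?:[ \t]*\n)*"                          # optional blank lines
    r"\{\{nonlemma(?:\|[^{}]*)?\}\}[ \t]*\n"  # the template itself
    r"(?:[ \t]*\n)*"                          # trailing blank lines
    r"(?=^=)",                                # must be followed by another header
    re.MULTILINE,
)


def drop_nonlemma_etymology(wikitext: str) -> str:
    """Remove Etymology sections that contain nothing but {{nonlemma}}."""
    return NONLEMMA_ETYM.sub("", wikitext)
```

Sections with a real etymology (such as the original one at housen) do not match, because the template must be the only non-blank line under the header.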

Changing {{doublet}} to take multiple terms

User:Metaknowledge recently suggested to me in an edit summary that {{doublet}} should allow multiple terms to be specified, so you can easily say "doublet of foo and bar" or "doublet of foo, bar and baz". I'm wondering what people think of this. Currently, to specify this, you have to say something like {{doublet|en|foo}} and {{doublet|en|bar|notext=1}}, which is ugly and gets uglier if you need to specify 3 or more terms. The most logical way to make the change is to use the structure {{doublet|LANG|TERM1|t1=TERM1-DEFN|alt1=TERM1-ALTTEXT|TERM2|t2=TERM2-DEFN|alt2=TERM2-ALTTEXT}}, which is a radical departure from the current structure, which looks like {{doublet|LANG|TERM|ALTTEXT|DEFN}}. (In truth, however, more than 95% of calls to {{doublet}} look like {{doublet|LANG|TERM}}, which wouldn't change.) What do people think? Benwing2 (talk) 02:30, 12 May 2019 (UTC)
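To make the proposed parameter layout concrete, here is a minimal Python sketch of how a module could read it. The real template is implemented in Lua; this `format_doublet` function is hypothetical, the link markup is omitted, and the gloss style `(“…”)` is an assumption. It follows the layout above: positional 1 is the language, positional 2, 3, … are TERM1, TERM2, …, and `tN`/`altN` hold the gloss and display text of the N-th term.

```python
def format_doublet(args: dict) -> str:
    """Render 'doublet of A, B and C' from template-style arguments.

    args maps parameter names (as strings, e.g. "1", "2", "t1", "alt2")
    to values, the way a template call passes them to a module.
    """
    terms = []
    i = 2  # positional 1 is the language code, so terms start at 2
    while str(i) in args:
        n = i - 1  # TERM1 is positional 2, so its named params are t1/alt1
        text = args.get(f"alt{n}") or args[str(i)]
        gloss = args.get(f"t{n}")
        terms.append(f"{text} (“{gloss}”)" if gloss else text)
        i += 1
    if not terms:
        raise ValueError("at least one term is required")
    if len(terms) == 1:
        listed = terms[0]
    else:
        listed = ", ".join(terms[:-1]) + " and " + terms[-1]
    return "doublet of " + listed


print(format_doublet({"1": "en", "2": "foo", "3": "bar", "4": "baz"}))
```

A call with a single term, like the 95% of existing uses of the form {{doublet|LANG|TERM}}, produces the same output as before, which is why the change is backward-compatible for the common case.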

I support this idea because I've added {{doublet}} a lot and encountered quite a few entries where this could be used. In Appendix:English doublets, there are quite a few lines in the various tables with triplets and quadruplets, and even some quintuplets and sextuplets. These have to be linked with separate templates at the moment; I usually use {{doublet}} and then {{m}}. It would be nice to be able to link them in the same template. — Eru·tuon 02:38, 12 May 2019 (UTC)
Good. I don’t know, though, when one should say “triplet”, whether templates should be able to do that, or whether one should categorize triplets and quadruplets. It seems that for Arabic I have so far used the word “triplet” only once, on عُقَار(ʿuqār). Or maybe triplets being unlikely is even more of a reason to collect them. Fay Freak (talk) 12:19, 12 May 2019 (UTC)
This is implemented, and I've switched the format of all uses. Benwing2 (talk) 02:05, 13 May 2019 (UTC)
Created a list of probable doublet lists that can be merged into a single instance of {{doublet}}. — Eru·tuon 03:21, 13 May 2019 (UTC)
@Erutuon Thanks. The list looks good, I'll get to work on it. Benwing2 (talk) 13:49, 13 May 2019 (UTC)
@Erutuon Done. Benwing2 (talk) 00:24, 14 May 2019 (UTC)

will be needed to replace

Is the following construction grammatically correct (meaning "will need to be replaced")? Each such device will be needed to replace sometime in the near future. If so, is such structure specified in the entry of need? --Backinstadiums (talk) 16:42, 12 May 2019 (UTC)

It's wrong. Equinox 16:44, 12 May 2019 (UTC)
I'd say wrong. DCDuring (talk) 17:54, 12 May 2019 (UTC)
Ugly syntax; it sounds as if it were written by a non-native speaker. SemperBlotto (talk) 05:33, 13 May 2019 (UTC)

hour meaning time outside poetic contexts

For example in a sentence such as "She couldn't avoid calling him to share the news, despite the hour". Yet hour specifies that meaning as poetic --Backinstadiums (talk) 18:52, 12 May 2019 (UTC)

In your sentence, it means "despite how late it was" (i.e. she called even though it's rude to call at night, when people are sleeping). That's not the same as the poetic sense. Equinox 19:04, 12 May 2019 (UTC)
Well, I suppose that's what you're saying, but really it's "despite the clock time", the actual hour, sense 1. Equinox 19:04, 12 May 2019 (UTC)
In this, your hour of need, you should have recourse to WT:TR or WT:ID or, yea, Google "the hour of" (Books, Groups, Scholar). "I call on the colectivos; the hour of resistance has arrived, active resistance in the community,"
Hour ("time") seems to me a bit rhetorical, literary, but it can be found in newspapers and in religious works ("now and at the hour of our death. Amen"). It is very common in book titles. DCDuring (talk) 19:47, 12 May 2019 (UTC)

Thesaurus - translations to other languages or not?

There is a bit of a disagreement going on at Thesaurus:juoppo - it is a Finnish word and has Finnish synonyms, but recently User:Paradoctor added German words there as well, despite the fact that juoppo is not a German word. The policy does not seem to, based on a quick look, clearly indicate whether this is allowed or not. — surjection?〉 20:23, 12 May 2019 (UTC)

For the record, my personal opinion is that the German words should be under a Thesaurus page for a German word. — surjection?〉 20:24, 12 May 2019 (UTC)
Wiktionary:Thesaurus#Multilingualism: "English Wiktionary Thesaurus is multilingual"
In light of that, I don't see how one could reasonably exclude synonyms in other languages, when they exist in them.
Mainspace lists meanings in other languages, so why should the thesauraus not list synonyms in other languages? Paradoctor (talk) 20:28, 12 May 2019 (UTC)
English Wiktionary itself is multilingual too. That doesn't mean you see German synonyms in Finnish entries. They are in German entries. —Rua (mew) 20:31, 12 May 2019 (UTC)
We don't put German synonyms in Finnish dictionary entries, so the same treatment applies to Thesaurus pages as well. Doing otherwise would ultimately result in a huge criss-cross of redundant synonyms with every language pointing to synonyms in every other language. We limited translation to English entries for the same reason, so English is to be the "synonym hub": all terms have translations into English, and all English terms have translations into all other languages. —Rua (mew) 20:30, 12 May 2019 (UTC)
Yeah, German words go under their own headers and not under Finnish. Fay Freak (talk) 20:37, 12 May 2019 (UTC)
That being said, it might become a thing to link to Thesaurus pages of other languages with related meaning, provided that the format is neat. This is from a marketing standpoint, for leading the user further within the website, and it isn't too farcical if we aim at polyglots. If someone is interested in Ukrainian synonyms he might be interested in Russian synonyms, if someone is interested in Catalan words he might be interested in Spanish words, if someone is interested in Ottoman words he might be interested in Persian and Arabic words. We lack the human resources to maintain thesauri, though, so this will remain a thought for the next few years. We have more problems attracting users than keeping them. Fay Freak (talk) 02:26, 13 May 2019 (UTC)
You have to consider what it looks like when every language links to the equivalent Thesaurus page of every other language. Then someone adds a new Thesaurus page to the list of one of the pages, but not the others. It will all go horribly out of sync after a while and become an unmaintainable mess. We disallow Translation sections for all languages other than English for this reason, too. Allowing them would create that same inconsistent unmaintainable many-to-many relation. —Rua (mew) 12:03, 13 May 2019 (UTC)
So we gonna link to Thesaurus pages in other languages only on English thesaurus pages? 🤔 Fay Freak (talk) 00:41, 14 May 2019 (UTC)
Only English entries have Translations sections. Translations of synonyms are translations too. —Rua (mew) 08:53, 14 May 2019 (UTC)

Pali Verb Roots

Do Pali verb roots merit entries in different scripts? The usual lemma form of Pali verbs is the 3s of the present active. I can't find any evidence of native traditions of *Pali* verb roots, and the PTS dictionary attempted to work in terms of equivalent Vedic Sanskrit roots. If Pali roots do actually merit entries, I think they are an abstraction for the benefit of Wiktionary users, and should therefore only be in the Latin script. RichardW57 (talk) 21:18, 12 May 2019 (UTC)

Literary Chinese or Korean

Following an extremely lengthy conversation at User talk:B2V22BHARAT#囧, I feel compelled to ask, what does the community feel about the following quotations. Are they considered literary Chinese or Korean? Should they be placed under the Korean section or under the Chinese section? Also, what is our criteria for hanja? Is hanja confined only to those used in the Korean language (한국말 (han-gungmal))? Are characters used in literary Chinese texts written by Korean scholars considered hanja as well?

(1) Literary Chinese quotations written by Korean scholars added by me to Korean hanja entries:

  1. 1818, Jeong Yakyong, 정약용(丁若鏞), “율기(律己) (yulgi)”, in 목민심서(牧民心書) (mongminsimseo):
    . . . . (Gan-gichuryul. On-giansaek. Isunibang. Jeungminmuburyeorui.)
    (please add an English translation of this quote)
  2. 1567, 선조소경대왕실록 (宣祖昭敬大王實錄), 즉위년 십월 (卽位年 十月):
    , ,
    , , . (Gyeongjihageo, jeokjaewang-wangmanggeukjijung, migeupchalji.)
    The noble official descended, heads towards solemnness that is boundless in the middle, not able to investigate it.
  3. 1412, “十二年 春正月, 3月 30日”, in 태종공정대왕실록 (太宗恭定大王實錄) [Veritable Records of King Taejong Gongjeong], published 1431:
    , 一處訊問
    , . (Myeongjip Gugyeong、Gwangmi、Yeong-u、Choebaek deung, ilcheo sinmun.)
    He ordered for the arrest of Gugyeong, Gwangmi, Yeong-u, Choebaek and others, for them to be interrogated together.

(2) Literary Chinese quotations added to 박#Korean: (Note the usage of {{zh-x}} in a Korean entry):


Etymology 3


Sino-Korean word from

Proper noun

박 • (Bak) (hanja )

  1. A surname​.
    c. 1280 AD, Il-yeon, 일연(一然) (iryeon)), “기이 권제1(紀異卷第一), 신라시조 혁거세왕(新羅始祖 赫居世王) (gii gwonje1, sillasijo hyeokgeosewang(新羅始祖 赫居世王))”, in Samguk yusa, 삼국유사(三國遺事) (samgugyusa):
    사내는 박〔瓠〕과 같이 생긴 알에서 나왔고, 향인들은 박을 박(朴)이라 이르므로, 그 성을 박으로 하였다.(The boy hatched from an egg that looked like a gourd, and countrymen called gourd as Bak, so his last name became Bak.)
    Sanaeneun bak〔瓠〕gwa gachi saenggin areseo nawatgo, hyang-indeureun bageul bagira ireumeuro, geu seong-eul bageuro hayeotda.(The boy hatched from an egg that looked like a gourd, and countrymen called gourd as Bak, so his last name became Bak.)
    (please add an English translation of this quote)
  2. bark of a tree
    , [Classical Chinese, trad.]
    , [Classical Chinese, simp.]
    From: Shuowen Jiezi, circa 2nd century CE
    Pǔ. Mù pí yě. Cóng mù, bǔ shēng [Pinyin]
    (please add an English translation of this example)
  3. A historical surname used by the Yi () people.
    九月西 [Classical Chinese, trad.]
    九月西 [Classical Chinese, simp.]
    From: Chen Shou, Records of the Three Kingdoms, circa 3rd century CE
    jiǔyuè, bā qī xìng yí wáng Pú hú, cóng yì hóu Dù huò jǔ bā yí, cóng mín lái fù, yú shì fēn bā jùn, yǐ Hú wéi bā dōng tài shǒu, Huò wèi bā xī tài shǒu, [Pinyin]
    (please add an English translation of this example)
    [MSC, trad.]
    [MSC, simp.]
    From: 通志畧
    Yí dí dà xìng. Yǒu pǔ shì yǔ pǔ tóng [Pinyin]
    (please add an English translation of this example)
  4. big, suddenly, beautiful
    [MSC, trad.]
    [MSC, simp.]
    From: 博雅
    Pǔ. Dà yě cù yě lí yě [Pinyin]
    (please add an English translation of this example)
  5. combined notes of and , the sound is . The meaning is
    [MSC, trad. and simp.]
    From: 玉篇
    Pǔ mù qiè, yīn pū. Běn yě [Pinyin]
    (please add an English translation of this example)
  6. there were many Kim and Park in the country, but they did not married with other last names.
    · [MSC, trad.]
    · [MSC, simp.]
    From: 舊唐書
    guó rén duō jīn pǔ liǎng xìng yì xìng bù wéi hūn [Pinyin]
    (please add an English translation of this example)
  7. the king's last name was Kim, the noble man's last name was Park, the people had no last name, but only a name.
    ,, [MSC, trad.]
    ,, [MSC, simp.]
    From: 新唐書
    wáng xìng jīn, guì rén xìng pǔ, mín wú shì yǒu míng [Pinyin]
    (please add an English translation of this example)

Pinging Chinese editors @Atitarev, Bumm13, Dine2016, Dokurrat, Geographyinitiative, Justinrleung, Suzukaze-c, Tooironic for comment: KevinUp (talk) 11:07, 13 May 2019 (UTC)

I think they are Chinese quotes: they have no bearing on the Korean language "as she is spoke". Perhaps they could go in the Etymology section if they have a heavy Korean 'flavor.'
It reminds me that 舎利子#Japanese quotes the Heart Sutra, but I feel that is an exception. I don't think I would like to see quotations of the original Analects in a Japanese entry. —Suzukaze-c 02:43, 14 May 2019 (UTC)
@KevinUp: I have no idea about Korean, but 日本国語大辞典 has Classical Chinese citations, shown in the original Chinese (with shinjitai) and annotated with 返り点. --Dine2016 (talk) 04:53, 24 May 2019 (UTC)
There are also several Japanese works compiled by Japanese scholars in the 8th century AD using Literary Chinese such as Kojiki (古事記) and Nihon Shoki (日本書紀). I'm not sure how these works are annotated in Japan but Wikisource Chinese has the original text [1] [2] and it is written using the Literary Chinese language.
Vietnam also has Literary Chinese works (mainly poetry) by writers such as Lê Thánh Tông (1442-1497) and Nguyễn Khuyến (1835–1909) who wrote literary Chinese poems (thơ chữ Hán) as well as Vietnamese poems in Nôm script (thơ Nôm). KevinUp (talk) 12:30, 24 May 2019 (UTC)
An interesting topic to discuss would be whether Korean, Japanese and Vietnamese scholars were bilingual, i.e. were they fluent in spoken Chinese (any of its varieties) as well as their native language? I don't think this is the case because Korean, Japanese and Vietnamese scholars were borrowing the Han script mainly for writing purposes such as recording their own history. Furthermore, the characters would have been pronounced in a way that conforms to the phonology of their native language, e.g. the Japanese language didn't have consonant endings so some monosyllabic words were split into two syllables. KevinUp (talk) 12:30, 24 May 2019 (UTC)
In Lion-Eating Poet in the Stone Den (施氏食獅史), the modern Chinese linguist Yuen Ren Chao (趙元任) pointed out that Classical Chinese texts can be read and understood in written form but do not make sense when recited in Mandarin, because the pronunciation of written Chinese has changed a lot over the centuries. KevinUp (talk) 12:30, 24 May 2019 (UTC)
For now, the solution that I can think of is to use {{quote-book|lzh}} (lzh is the language code for Category:Literary Chinese language) to quote Literary Chinese works written by Korean, Japanese and Vietnamese scholars until the collapse of the Qing dynasty around the early 20th century. (Korean and Vietnamese scholars continued to produce works in Literary Chinese until this time, unlike Japan, which stopped writing texts in Literary Chinese centuries before.) Quotations using {{quote-book|lzh}} would then be transliterated using Sino-Xenic readings of the respective languages. KevinUp (talk) 12:30, 24 May 2019 (UTC)
@Justinrleung Would you mind checking this quote regarding the origins of the Korean surname (, gim)? It's written in Literary Chinese by Yi Ik (李瀷), a Korean scholar from the 18th century.
I would prefer not to see {{zh-x}} being used in a Korean entry because it's a bit awkward to see Simplified Chinese characters and Pinyin appearing in a Korean entry, but I would like to seek a second opinion. KevinUp (talk) 12:30, 24 May 2019 (UTC)

Cognate vs related

I strictly distinguish cognates from related terms. To me, cognate implies common descent: a word that is inherited from the same ancestral term as another. This is what the word "cognate" means when you analyse it in Latin: co-gnate, born together, from the same parent word. Two terms that merely share a root or are derived from a common term aren't cognates, they are related. Yet I see countless instances of people using "cognate" for terms that are not cognate, but related. This is a misuse of terminology in my view, similar to using {{inh}} for a Proto-Indo-European root: nothing can be inherited from a root, because it's not even a word, therefore words with the same root aren't necessarily cognates. I would like to ask if others are willing to pay attention to the cognate-related distinction when editing entries, and fix it if needed. —Rua (mew) 19:59, 13 May 2019 (UTC)

AFAIK, the correct use of "cognate" does include all terms derived from the same root. The term normally used to indicate descent from a given ancestral term is "reflex". I would not support trying to narrow the meaning of "cognate" to mean "reflex". Benwing2 (talk) 23:48, 13 May 2019 (UTC)
In my understanding, parallel formations can also be cognate; though sharing a root is not enough. If Turkish and Azerbaijani have formed the same word, a word using sensu strictissimo cognate parts, around the same time for a technological innovation, this is a cognate. So to say SOP cognate. It is the regular outcome of the tie of inheritance. It would be sesquipedalian to constantly point out that they are parallel formations.
So X is inherited into languages A and B, and Y is too, X and Y are cognates, and when both languages form XY by the same motivation, it will be a cognate. The whole is not more than the sum of its parts.
Or “cognate” has multiple meanings, why not. Long not used the word polysemy.
Even if it were not correct to say “cognate” in my case, there wouldn’t be any other template (apart from {{noncog}} which is identical). Do we need {{parallel formation}} now? Lol. Fay Freak (talk) 00:38, 14 May 2019 (UTC)
I use {{cog}} even when the forms are not cognate. We abuse the term "cognate" so much that there's no point in using the template correctly. I disagree that parallel formations are cognates. They are not "born together". It's like parallel evolution: not everything with wings is a bird. —Rua (mew) 08:51, 14 May 2019 (UTC)
@Rua Do you use {{cog}} even with words that are completely unrelated etymologically? Please don't do that; that's what {{noncog}} is for. Benwing2 (talk) 14:58, 14 May 2019 (UTC)

Questions on eliminating {{yi-inflected form of}} and {{lb-inflected form of}}

I am currently using code like the following to replace calls to {{lb-inflected form of}} for Luxembourgish, and {{yi-inflected form of}} for Yiddish.



{{head|lb|adjective form|head=groussen}}

# {{inflection of|lb|grouss||str//wk|nom//acc|m|s|;|wk|dat|m//n|s|;|str//wk|dat|p}}

{{head|lb|adjective form|head=grousser}}

# {{inflection of|lb|grouss||str//wk|dat|f|s}}

{{head|lb|adjective form|head=grousst}}

# {{inflection of|lb|grouss||str//wk|nom//acc|n|s}}

{{head|lb|adjective form|head=groussem}}

# {{inflection of|lb|grouss||str|dat|m//n|s}}



{{head|yi|adjective form|head=שאָטנדיקן‎}}

# {{inflection of|yi|שאָטנדיק||acc//dat|m|s|;|def//postpositive|dat|n|s}}

{{head|yi|adjective form|head=שאָטנדיקער‎}}

# {{inflection of|yi|שאָטנדיק||nom|m|s|;|dat|f|s}}

{{head|yi|adjective form|head=שאָטנדיקע‎}}

# {{inflection of|yi|שאָטנדיק||def|nom//acc|n|s|;|nom//acc|f|s|;|all-case|p}}

{{head|yi|adjective form|head=שאָטנדיקס‎}}

# {{inflection of|yi|שאָטנדיק||postpositive|nom//acc|n|s}}


  1. The inflection tables for Yiddish adjectives say "postpositive or nominalized". Is it enough to say just "postpositive" in the call to {{inflection of}}, or should I say "postpositive/nominalized" or "postpositive and nominalized"?
  2. The inflection tables for Luxembourgish adjectives say "without article" and "with article", which I have rendered as "strong" and "weak" respectively, as in German. Is this correct?

Benwing2 (talk) 00:37, 14 May 2019 (UTC)

(Notifying Metaknowledge, Wikitiki89): @Qehath @BigDom Benwing2 (talk) 00:39, 14 May 2019 (UTC)
I'm still not all that happy about replacing something that points the reader toward the more informative main entry anyway with something that I won't just be able to type out easily if I want to create an entry. (There's also the lingering problem that it only applies to Standard Yiddish, rather than the dialects.) But I don't see anything incorrect with how you lined it out above. —Μετάknowledgediscuss/deeds 17:37, 14 May 2019 (UTC)
@Metaknowledge I am planning on adding accelerators to make it easy to generate pages like these. Benwing2 (talk) 00:52, 15 May 2019 (UTC)
Alright, guess I can live with it. It would also be nice if you could bot-create them for all the existing adjectives with tables, which would obviate much of that work. —Μετάknowledgediscuss/deeds 00:57, 15 May 2019 (UTC)
@Metaknowledge OK, I'll see about doing that. BTW is פֿארגאַנגענער misspelled? The base form פֿאַרגאַנגען has an extra patakh under the first aleph. Benwing2 (talk) 01:28, 15 May 2019 (UTC)
Moved, good catch. —Μετάknowledgediscuss/deeds 02:34, 15 May 2019 (UTC)

The following user names are in the block list. Is it applicable to Wiktionary?

This is a list of page titles which are blocked from creation/editing on Wikimedia wikis. Is it applicable to Wiktionary?

Please see the following link:

My question is: an admin blocked the user name "Cruzir", so is my user name affected by this on Wiktionary?

If my user name is allowed, I will continue; otherwise, I will change my user name.

As far as I can see the user name “Cruzir” has not been blocked and is available, unlike the name “Cruizir”, which is already registered but has been blocked because of persistent abuse.  --Lambiam 12:53, 14 May 2019 (UTC)
You are correct, User:Lambiam. Are the user names listed in the above "Title_black" list prohibited on Wiktionary?
Yes, this is the block list for Wikimedia Foundation wikis, including Wikipedia and Wiktionary.

In the following link, an admin blocked Cruizir and Bonadea. Are these two users, who are in the block list, prohibited on all wikis?


I just got an email from the EU project ELEXIS announcing all sorts of free access to their lexicography tools for those from EU institutions. That would seem not to include en.wikt but may be interesting for those who also contribute to fr.wikt, de.wikt, it.wikt, etc. DCDuring (talk) 15:49, 14 May 2019 (UTC)

What kind of tools do they have that we don't? --I learned some phrases (talk) 07:31, 23 May 2019 (UTC)

The apostrophe in Zealandic

Zealandic (Zeêuws) is part of a wider Low Franconian dialect group which has historically lost the consonant /h/, the beginnings of which can already be found in Old Dutch. Modern Zealandic has no standardised spelling, but guides often suggest the use of an apostrophe in places where Zealandic has no h, but where it is found in standard Dutch. This is done purely for the sake of recognition, so that people who know standard Dutch can more easily guess the corresponding Dutch word. For Zealandic itself, the apostrophe has no significance; it is a vowel-initial word like any other. The suggestion only applies to words that have recognisable counterparts in Dutch; otherwise there's just an initial vowel and no apostrophe.

I'm unsure if we should lemmatise words with this apostrophe or without it. From a phonological point of view, the apostrophe makes no sense, it's only written because of comparison with Dutch and only where such a comparison can be made, which is subjective and inconsistent. Even the Wikipedia article w:zea:Zeêuws seems to be inconsistent with it, with a mishmash of spellings. I see both Ollands without it and Neger'ollands with it, both and and 'and, both oôg and 'oôg. My preference would be for the apostrophe-less forms to be standard, with the apostrophe forms to be treated as alternative forms. —Rua (mew) 12:22, 15 May 2019 (UTC)

My preference would be to lemmatise with h, as one does with any Romance language that has lost /h/. Now that is recognizable. Fay Freak (talk) 13:27, 15 May 2019 (UTC)
But nobody writes that way. —Rua (mew) 13:59, 15 May 2019 (UTC)
What's more common? Is there any community besides you working on it? If you can find both apostrophe and non-apostrophe forms for one word, it seems reasonable to lemmatize the non-apostrophe form.--Prosfilaes (talk) 14:54, 15 May 2019 (UTC)
I haven't been able to find out which is more common, but the fact that both variants of several words are used in a single Wikipedia article suggests that it's likely about even. —Rua (mew) 16:22, 15 May 2019 (UTC)
Technical approach: lemmatize at both, backed by a single entry page. Rough proof-of-concept at [[User:Eirikr/'and]], [[User:Eirikr/and]] for the "front-end" pages, and [[User:Eirikr/'and-and]] for the "back-end" where the data would live. On the "front-end" pages, clicking any of the "edit" links next to the headers takes the user to the "back-end" page, similar to how it works for our sectionalized forum pages like the Tea Room. ‑‑ Eiríkr Útlendi │Tala við mig 19:02, 15 May 2019 (UTC)
That's way too complicated. —Rua (mew) 08:47, 16 May 2019 (UTC)
These sound very much like apologetic apostrophes in Scots. Based on what has been said above, I would suggest lemmatizing the form without apostrophes. In theory, one benefit to lemmatizing the forms with apostrophes would be that anyone who wants to know how the word would be spelled without them would find removing them straightforward (right?), whereas knowing where to add them if an entry doesn't use them is not straightforward—but if the apostrophized forms are listed as alt forms in the un-apostrophized entries, then that issue is eliminated. - -sche (discuss) 06:33, 16 May 2019 (UTC)

National politics categories

It seems odd to claim that Brexiteer is limited to British English or that Jamaika-Koalition is limited to the German of Germany; the terms are used by any speaker of that language, but only pertain to the particulars of one nation's politics. We would certainly have enough entries for Category:en:UK politics and Category:de:German politics, as well as many others. What do you all think about this scheme? —Μετάknowledgediscuss/deeds 20:51, 16 May 2019 (UTC)

We've never correctly addressed the distinction between topic and register/context/language-variety. I hope you can come up with something, which might require a parallel category structure and careful cleanup of membership in the existing structure. DCDuring (talk) 23:37, 16 May 2019 (UTC)
I have a similar issue with the label (African American Vernacular), which is applied to terms that are found predominantly in speech by African Americans, but that are not specific to the AAVE sociolect – they may as well be used informally by black people with a high socioeconomic status, such as professors. I feel the distinction between context (in this case, a restricted class of speakers) and variety (nevertheless basically standard English) should also be carefully maintained.  --Lambiam 00:18, 17 May 2019 (UTC)
Categorizing these seems fine. Not sure on the name; Category:en:Stars is for names of stars whereas Category:en:UK politics wouldn't be a list of names of politics (would it?), but we also have Category:en:Politics, so maybe it's fine. It won't affect people trying to use {{lb|en|UK}} to indicate that a word refers to a British topic, though. (We even had "Category:German English" because people tagged {{l|en|GDR}} with {{lb|en|Germany}}...) I don't know how to stop that besides periodic checks of all dialect categories. (Some users have proposed adding a second, topical type of labels, to list "topics": e.g. "eye" might be tagged "anatomy", and "pronoun" tagged "grammar".* I don't know if that's a good idea.)
Regarding "AA-but-not-AAVE", probably it should have its own label. It still at least seems to be in the same category of phenomenon as the label "AAVE" (and "Appalachian", "Cockney", and "UK", etc), i.e. it's indicating who uses the term, which seems different from labelling "Brexit" as "British" just because it pertains to Britain. (Comparable to having a label for "AA speech of any level of formality", we have labels like "LGBT" that encompass not only a diversity of levels of formality but also aren't even limited to particular nations...)
- -sche (discuss) 06:56, 17 May 2019 (UTC)
Would (African-American community) be a good label?  --Lambiam 16:34, 17 May 2019 (UTC)
*I see it already is, but that seems weird, since it's not limited to discussions about grammar unless one argues the very fact of using it makes the discussion become about grammar, but in that case is saying "there's an airplane!" making a discussion (and the word) about aviation? - -sche (discuss) 07:02, 17 May 2019 (UTC)
If someone applies the label entomology or lepidopterology to a definition of brown that refers to a butterfly, is that a topical or a context label? I'd argue that the term can be used 'correctly' by almost any person referring to a lay observation of nature, so it shouldn't be considered a context label. I don't recall that there was any consensus, let alone a vote, not to use topical labels. I do not advocate topical labeling and would prefer that they be banned, but there were reasonable arguments in favor. DCDuring (talk) 12:13, 17 May 2019 (UTC)

I've had this same question with foods. See pozole; it's not, as the entry currently suggests, a "Mexican Spanish" word. It's just a dish eaten in Mexico. The English definition labeled as "American English" makes even less sense. It's just a geographic coincidence, it's not as if the Australians have a different name for the same soup. In these cases I usually remove the label and write the country where it's eaten in the definition. Ultimateria (talk) 15:32, 17 May 2019 (UTC)

FWIW, that (removing the erroneous label and writing the country into the definition) is the correct approach. Is the use of the word to mean a drink limited to Honduras, or is the drink merely Honduran? - -sche (discuss) 15:46, 17 May 2019 (UTC)
I think the name of the drink is actually pozol. It is not confined to Honduras (see Pozol on Wikipedia), but Hondurans consider it an authentic Honduran tradition ([6],[7]).  --Lambiam 17:54, 17 May 2019 (UTC)
  • Seeing no opposition and a great deal of unrelated discussion, I have gone ahead and added some national politics categories to get us started. —Μετάknowledgediscuss/deeds 06:18, 19 May 2019 (UTC)


Would it not make sense to cite specific sources for definitions, as well as cite quotations of sources? -ApexUnderground (talk) 06:05, 17 May 2019 (UTC)

That depends on the language. But generally for English we are not attempting to be a tertiary source. The source is the editor who created the definition. DTLHS (talk) 06:06, 17 May 2019 (UTC)
The sources are the citations. Definitions are based here on usage, not on other authorities. Ƿidsiþ 12:36, 23 May 2019 (UTC)

Talk pages consultation: Phase 2

  • We have generally discouraged reliance on talk pages to initiate discussions. Do we want to change that practice, make it official, continue the practice unofficially, or ??? Is there some boilerplate that we could have, perhaps as a filter warning, to inform someone starting a new talk page to go elsewhere? If there is any interest in this we should participate in the process, I suppose. DCDuring (talk) 20:35, 17 May 2019 (UTC)
  • When a user goes to a not-yet-existing talk page of an article, they are shown an info box:
      Talk pages of individual entries are not usually monitored by editors,
          and messages posted here may not be noticed or responded to.
          You may want to post your message to the Tea Room or Information desk instead.
    We could add that to every article talk page.  --Lambiam 13:37, 18 May 2019 (UTC)
    How do I forget these things?
I suppose that a user could post to an existing talk page, expecting it to be monitored. The notice for new talk pages would be useful for such users. But it is easy to miss such messages. I wonder whether such a message can and should appear at the edit window whenever one is opened on a talk page. DCDuring (talk) 15:52, 18 May 2019 (UTC)
The message MediaWiki:Editnotice-1 appears at the edit window of new and existing article talk pages. --Vriullop (talk) 09:48, 19 May 2019 (UTC)
Thanks. That I don't remember having read it shows how easy it is to filter such notices out. I must have noticed it at one time, either when I was new or when it was new. DCDuring (talk) 16:32, 19 May 2019 (UTC)
Uh oh, I hope this won't be the revenge of the spurned Liquid Threads upon us all. Equinox 16:10, 18 May 2019 (UTC)
How many are still using Liquid Threads? DCDuring (talk) 17:32, 18 May 2019 (UTC)
On the occasions where I used them (because I had no choice) I became even more confused than I already usually am.  --Lambiam 00:14, 19 May 2019 (UTC)
I would happily vote to ban LT from Wiktionary. —Μετάknowledgediscuss/deeds 00:17, 19 May 2019 (UTC)
I was using LT for a while. TBH, the only reason (a good one, IMO) was to annoy my fellow Wiktionarians. --I learned some phrases (talk) 17:25, 19 May 2019 (UTC)
Same here, I'd vote for a ban too. Canonicalization (talk) 16:34, 20 May 2019 (UTC)

Clown world

My friend's six-year-old was given this test: [8]. I don't think I knew "digraph" was a word until I was an undergraduate. Equinox 01:44, 19 May 2019 (UTC)

There's no reason six-year-olds can't learn it, though. It's no harder than a lot of words that six-year-olds learn. —Mahāgaja · talk 15:47, 19 May 2019 (UTC)
But I don't see that digraph has any purpose in the test question, except to make the answer slightly less obvious. I suppose the six-year-olds need to know not to waste time on a word that doesn't contribute useful information. DCDuring (talk) 16:28, 19 May 2019 (UTC)
I still very vaguely remember being given "church" and "photograph" (which both conveniently begin and end with the digraph) as examples of ch and ph. I think there's a lot to be said for teaching fragments of Latin and Greek, as it really helps with spelling (you don't expect an f in an Ancient Greek word) and with decoding medical terminology and generally making sense of "hard words". Anyhow... knowing the jargon isn't the important bit. Hey! if you know the Greek bits then you can work out what a digraph is anyway. Equinox 21:01, 22 May 2019 (UTC)

Bad definitions in functional words

Functional words are usually defined by a synonym, yet the latter remains unspecified regarding which of its meaning(s) is intended (concessive, adversative, etc.). For example, the third definition of as long as is just "while, since", but a starting learner wouldn't be able to infer the common causal meaning both terms share. --Backinstadiums (talk) 15:57, 20 May 2019 (UTC)

We could improve our definitions, no question. But Wiktionary is not aimed at a starting learner of English. A starting learner will not benefit from an English-language resource anyway. —Μετάknowledgediscuss/deeds 17:03, 20 May 2019 (UTC)
Still, if a definition has multiple senses, only one of which applies, we should aim at disambiguating it – for example by giving a sequence of definitions whose intersection is no longer ambiguous. While I’m writing this, sense 3 has been simplified to “Since”, which in turn has two current senses as a conjunction: “From the time that” and “Because”. Neither is a particularly good substitute in the example sentence As long as you're here, you may as well help me with the garden. The first is a clear misfit, but Because you're here, you may as well help me with the garden also doesn’t sound right. I’d like to define it as “Since, seeing as”, but we don’t have the latter.  --Lambiam 15:13, 21 May 2019 (UTC)
I think I understand your concern, but OneLook dictionaries don't have seeing as or seeing as how as entries. They are highly informal. MWOnline has 1 "provided that" and 2 "inasmuch as, since". Usage examples help. DCDuring (talk) 18:56, 21 May 2019 (UTC)
I believe Cambridge and Merriam–Webster are OneLook dictionaries.  --Lambiam 21:57, 21 May 2019 (UTC)
Apparently the OED also has it, as a colloquial variant of seeing that.  --Lambiam 22:02, 21 May 2019 (UTC)
Thanks. I must have mistyped into their search box. I think of seeing as as informal, but MWOnline doesn't think so, though Cambridge thinks of it as nonstandard. DCDuring (talk) 21:20, 22 May 2019 (UTC)
Cambridge applies the nonstandard label only to the variant seeing as how. Just seeing as is labelled “informal”. Seeing as how sense 3 of as long as is also rather informal, using an informal synonym in its definition is reasonable. Cambridge and OED appear to agree that seeing that is more formal. BTW, the meaning of these collocations is not covered in the entries see or seeing, nor anywhere else that I can see.  --Lambiam 23:56, 22 May 2019 (UTC)
Also, you could try creating categories or, better, appendices that group the conjunctions into categories with implications for the definitions, and use them to achieve some kind of consistency and correctness in the definitions. The modern grammars might help in that noble endeavor. DCDuring (talk) 19:00, 21 May 2019 (UTC)

"RFD failed"

On the RFD pages, people often close discussions with the note "RFD failed", which I understand from context to mean that the entry should be deleted. However, this seems the wrong way round. If the RFD, i.e. request for deletion, has failed, does not that mean that the entry should be kept? Or is this just me? Mihia (talk) 20:56, 22 May 2019 (UTC)

GASP! Way to rock the boat. I suppose you're right but it's nice to have parity in how we talk about RFVs and RFDs. Perhaps we should say "entry passed" or "entry failed" in both cases? Maybe we need more goddamn templates. Equinox 20:59, 22 May 2019 (UTC)
I just say "deleted" and "kept". —Rua (mew) 21:39, 22 May 2019 (UTC)
That is much better. On Wikipedia the closer usually writes: “The result was keep/delete.”  --Lambiam 00:05, 23 May 2019 (UTC)
I have always been confused about what "failed" meant on RFD pages. "Deleted" and "kept" are much easier to understand. — Eru·tuon 00:40, 23 May 2019 (UTC)
I have a vague memory that when I used to close RFVs and RFDs (which I haven't done for years) I used to put things like "deleted" and was scolded because this didn't indicate whether it was unilateral or processual. Equinox 00:43, 23 May 2019 (UTC)
The “The result was keep/delete” wording on Wikipedia is clearer in that respect. — Eru·tuon 02:00, 23 May 2019 (UTC)
The idea was traditionally that the entry itself failed or passed, rather than the request. I don't care what people say as long as they help close RFDs and RFVs, which very few people seem to do nowadays. —Μετάknowledgediscuss/deeds 15:40, 25 May 2019 (UTC)
Can anyone do that, or is it only admins? Mihia (talk) 00:22, 28 May 2019 (UTC)
Generally only admins, because failing things may require deleting pages. —Μετάknowledgediscuss/deeds 03:18, 28 May 2019 (UTC)

Nonconcatenative morphemes

I think they deserve entries. They're morphemes. They're tricky to deal with, though; we can't just give "palatalize final consonant" a main namespace entry, but they shouldn't be hidden in appendices. A past discussion on Semitic transfixes was pretty inconclusive. Is anyone else on board with incorporating nonconcatenative morphemes into Wiktionary? If so, how do we do it? Julia 01:50, 23 May 2019 (UTC)

I'm disinclined to include them. They're morphemes, but they aren't things that readers are going to want to look up in a dictionary. The same goes for the zero morpheme that "marks" the plural of words like sheep or the past tense of words like put. Readers aren't going to want to look up a suffix -∅ for those forms, or a suffix for certain Irish genitive singulars and nominative plurals (e.g. báid), or a prefix ʀᴇᴅ- for Malay plurals, Ancient Greek perfects, or anything else languages use reduplication for. —Mahāgaja · talk 10:25, 23 May 2019 (UTC)
I'm in favour of including nonconcatenative morphemes, but I'm not sure how to include them either. We should keep in mind that the basic goal of Wiktionary, as stated by WT:CFI, is to include things that someone comes across and wants to know what it means. Someone might come across a suffix as part of a word, then realise that it's a suffix, and look that up. But with nonconcatenative morphology it's much less clear what the user would be looking for and what the most intuitive place is that they'd find it. If the user doesn't know where to find it, they might be inclined to look for a word that has it instead and see what the etymology contains. Thus, I think that should be our focus: presenting nonconcatenative morphology in the etymology section in a way that lets the user find out more on that particular mode of derivation. At that point, it doesn't really matter where the information is located, so an Appendix seems workable. —Rua (mew) 13:50, 23 May 2019 (UTC)
Yeah, an appendix would definitely be better than nothing. With good etymology and linking templates I think it would be easy to implement. Julia 21:03, 25 May 2019 (UTC)

Google have broken/retired their Usenet search

Searches in Google Groups no longer find anything from Usenet by default. You have to search for a specific desired Usenet group first, by group name, and then search within that group. Naturally this is utterly crippling for Usenet searching and makes it quite impractical, since you won't know which groups to look in, and doing it one at a time is impossible anyway. Equinox 15:33, 25 May 2019 (UTC)

You can still search Usenet using regular Google. Just search "cromulent" to find cites for cromulent. That said, it is certainly inconvenient. —Μετάknowledgediscuss/deeds 15:38, 25 May 2019 (UTC)

Cites in different languages

At the English entry for black pill, we have a citation in Swedish! And people defended this at its RfV. Cites of English words in other languages can only be mentions. Do we really need to have a vote to disallow this, or is it obvious enough that I can just delete the cite? Julia 20:33, 25 May 2019 (UTC)

It depends. A quote in another language is inherently a mention, so for English and other WDLs it's invalid as a cite for CFI. For LDLs, it's not that common to have decent lexical resources in the languages themselves, so a reliable source discussing the term would be welcome. In this case it's just someone discussing it on some website, and the term is English, so it's utterly useless. Chuck Entz (talk) 20:57, 25 May 2019 (UTC)
Yes, I forgot about LDLs. Per CFI: “For terms in extinct languages, one use in a contemporaneous source is the minimum, or one mention is adequate...For all other spoken languages that are living, only one use or mention is adequate.” It doesn't explicitly say that you can use a mention from different-language sources, which we do allow (cf. tons of Classical Nahuatl entries). Do you think that should be included, or can it be inferred well enough? Julia 21:33, 25 May 2019 (UTC)
Discussing a term, I must warn, is not generally a “mention”. It is often a report of use. The use-mention distinction was incorporated into the CFI to shed ghost words, those “dictionary mentions” that a dictionary editor could have just copied over mindlessly, and protologisms, which barely live outside of word lists and are only intended to be used. An instance of witnessing a word’s use does not fall out of our use-mention axis; here the rule must be teleologically reduced. For example, the quote “Eine kleine Minderheit schnalzt beim Begriff Kaviar mit der Zunge und denkt an eine viel billigere, viel leichter zu beschaffende Delikatesse – an Kot.” (“A small minority smack their lips at the term Kaviar and think of a much cheaper, much more easily obtained delicacy: excrement.”) at Kaviar is the typical journalistic style of reporting a use, and meseems this is “use enough”; enough use shines through. This is not where the danger of ghost words or protologisms lies. This is how slang terms that a great part of society avoids to use often shine through. In fact, avoiding to use a term is a use. Fay Freak (talk) 22:35, 25 May 2019 (UTC)
A use in another language can be a use just as it can be a mention (ha! I have called it a use just now). Nowadays it is easier than ever to pick up foreign discourse. Also, I suggested implementing code-switching templates. Of course somebody who consults an English dictionary does not want Swedish quotes presented as if they were English. That’s not my ideal. But kept separate, this is interesting extra information. The real problem here is not votes but the uncertain criteria by which a word passes between languages, and the lack of technical refinement to represent uncertainty. The question is rather how it should be displayed, not whether it should be displayed, since these quotes are not bad in themselves; it depends on whether the technicians have written good code so one can place them inoffensively. I mean, quotes always come at a premium: any corpus is limited, and particularly any editor’s access to sources, so editors take what they have. Remember the disputes about APP? Much hate broke out over its being alleged to be Chinese. I suggested keeping the Chinese quotes at the English entry, and this was a good compromise, without settling whether it is Chinese. Similarly, I wouldn’t ever create that word as Swedish. Much more must come on top of its being used in Swedish for it to be Swedish. These material criteria are currently hard for editors to grasp, let alone formulate. Somebody put much effort into quoting English arrastão, and now none of it can be seen because the deficiency of our moulds let it slip through. That is sad; I think we can have English quotes without an English section. Fay Freak (talk) 22:35, 25 May 2019 (UTC)
I did a quick Google Translate of the Swedish quote; it seems like the first instance is a mention, and then two uses in Swedish. The quote doesn't say anything about it being an English word. I don't see how this proves English usage. Julia 22:46, 25 May 2019 (UTC)
It does, because what else could it be? How likely is it that the term was coined for that article, or picked up only within Sweden? Nobody would try to prove baby-foot and Handy this way; this is a strawman, and the counter-examples are far-fetched. Unless we have reason to believe that the term was invented in this language alone, the presumption is that it wasn’t, except perhaps where a foreign language is constantly used to coin new words (like Latin in Europe today, or Arabic in Ottoman Turkish, but even then it depends on the concrete use; Swedish does not use English that way, and “fake anglicisms” must be a genuine outlier). But here everything points to a foreign context, so all indications are against its being Swedish. All this talk about incels, redpills and blackpills belongs to the English internet, not the Swedish one. I have not yet heard “redpill” or “incel” in German, for example, though I do encounter “rotgepillt”. Just reckon the likelihoods. That’s what one constantly has to do in investigating language. Fay Freak (talk) 23:02, 25 May 2019 (UTC)
A quote in Swedish does not count as a use in English. Handy is a good comparison. —Granger (talk · contribs) 00:18, 26 May 2019 (UTC)
It’s not about “counting”. It’s about what is a likely witness. “Three quotes” is just a procedural rule of thumb. Even when you have a quote in the target language it might still not be good: it might be a more or less probable mistake or a manuscript corruption, or one may not be sure “if this is still English” (Pidgin etc.). One still has to discern what is what. So one does not know whether one has three or two or four, because the quotes vary in quality. Usually such a quote does provide proof: it is a text in a different language containing a word with the shape of the target language. Again, no: Handy is a strawman. One has to learn to make use of what is verisimilar. I am averse to this black-and-white thinking.
WT:CFI literally speaks of a guideline. A guideline for what? For our cognizance of the terms. Just as a judge does not merely count the number of witnesses but assesses their quality, the CFI only wants one to verify terms by a convincing procedure. The records that suffice might be of varying permanence. We exclude Facebook, Instagram, and the like because people there write unreliably and one needs a filter against protologisms, raids, and all the garbage of the unedited wilderness, apart from the fact that such posts often disappear and hence cause link rot. The CFI does not strictly delimit what English is, and it doesn’t say that we can’t prove a word to the community with a bunch of web quotes; in fact such an assertion is very contrary to the CFI. The idea is: include words that exist. Then only enough measures need to be taken to make it evident that a word has been around. A web quote, unlike an undigitized book, is particularly suitable for verification, because many people can see at the appropriate time that it is there and not made up.
WT:CFI speaks of a “more formal guideline”, implying that there are still other guides aside from the formal one. Even viewing WT:CFI#Attestation strictly, the CFI does not say that a term is only to be included if it is attested in that very sense. Under the general rule it might still pass by being verified otherwise.
Why all this for “well-documented languages”, one might ask? It is because the more well-documented a language is (the more space it takes up in the world), the more peripheral areas there are that call for inclusion. The “good documentation” of a language grows linearly with the bad documentation of the same language, that is. The more documented it is, the more used it is, hence the more badly documented but equally real fringes it has, just like any “limited documentation language”. We have a lot missing from Africa or even Multicultural London English.
This interpretation of the CFI being, at least as expressed, novel for most, I apologize for the pretension, but you might see that there is greater satisfaction to be had than in suffering the deletion of words that you see existing in the world.
I think we can also mark senses as dubious more often. If you object that a sense is not found except on the internet, but there it is found, the right thing to do might be to give it a cautionary background colour or some other visual separation that CSS offers, to mark words that stand on the threshold, neither in nor out.
So we can also “include” words because they are likely, even though our current readings are uncertain. There was the case of tracer. The meaning “The act or state of tracking or investigating something” is still dubious. We have quotes, but hm. Irrespective of there not being enough evidence to count, I think this is a sense that should be included because of its possibility, because “it's likely that someone would run across it and want to know what it means” (as the CFI page says), and if somebody runs across the gloss, he can possibly add a quote that we could not find.
What do we do with reconstructed senses anyway? I mean if the word is attested but a sense is reconstructed. For example Aramaic זַיָּירָא (zayyārā) is attested as a pressing-tub for olives (or wine?) but Arabic زِيَار (ziyār, barnacle, twitch, pincers to fix a horse) is borrowed from it. So one can reconstruct a “possible meaning” “barnacle, twitch, pincers to fix a horse” for the Aramaic. (Good example?) I cannot rule out, of course, that this needs to be done for English, though rarely compared with other languages.
I have discovered whole semantic groups that are unfit for being “attested by three quotes”.
1. For instance, I found it a bit ridiculous how someone wanted to “quote” that pomelo actually meant “grapefruit”: quotes from old times won’t make this possible because grapefruits and pomelos are used in the same contexts; only non-philological evidence can do it. Apparently editors settled for such reasoning instead. So in general: life forms that are hard to distinguish.
2. Then there is the field of diseases. For instance, the entry bejel, for a bacterial disease, lists a lot of synonyms and translations. Of course, from 19th-century sources we couldn’t find out that these names all have the same meaning. This is a study of its own. Hence one needs medical treatises that look at the symptoms of historical diseases and decide whether something is the same, and I have cited such a treatise as a reference.
3. It would be weird to use quotes for units of measure. Even for English-language historical measures, one might need to measure their capacity physically. You won’t find a quote from the 17th century telling you how much that weight is in kilograms. If you are lucky there are modern studies of that.
These are examples of how the whole “three quotes” thing needs to be seen flexibly according to the topic. Maybe I have talked specifically about a fourth group, “the internet’s language”. Different fields need different paths to be caught up with. Fay Freak (talk) 03:18, 26 May 2019 (UTC)
(edit conflict) A mention is circumstantial evidence, not direct evidence. It has zero standing under CFI. A discussion in English might fill out some of the details of the definition and context, and thus be marginally useful, although useless for verification. An untranslated chunk of Swedish in an entry for an English term might as well be lorem ipsum: something used to keep it from looking empty until actual content (a translation, at the very least) is added. Chuck Entz (talk) 03:37, 26 May 2019 (UTC)
So? Circumstantial evidence is also evidence. But this distinction is artificial, and does not address my point: avoiding to use a term is a use. It also ignores the code-switching problem. The assertion “that is a mention” is dogmatic, not to say a swearword: “Anything I don’t count is a mention”.
An interpretation of this distinction after the CFI based on its telos of excluding ghost words and protologisms leads me to conclude that what you call mention is actually use.
To speak more clearly, I think “use” and “mention”, as used in WT:CFI do not have the meaning as usual outside of Wiktionary, hence the confusion. They are private language. So well, what you call a mention is a mention, but not according to the CFI definition of a mention, because its meaning is restricted according to the context and telos of the CFI. Because there are good mentions. Mentions that alone let you know the meaning. In fact we know many meanings that we consider language knowledge only through a chain of mentions. Mentions cannot be ousted.
And yet, how are details useless but useful? Either you recognize that one uses information apart from that conferred by uses, or not. Fay Freak (talk) 04:00, 26 May 2019 (UTC)
I don't have time to read a long comment this evening, but the basic point here is that a quotation in Swedish may count as a use in Swedish, but not as a use in English. —Granger (talk · contribs) 11:18, 26 May 2019 (UTC)
I agree with Chuck's first comment. A Swedish or Chinese (etc) text with a seemingly-English word embedded in it obviously does not attest the English term and is not suitable for illustrating its use. The fact that pseudo-loans (and pseudo-anglicisms, etc) exist demonstrates this quite clearly: a language can make up something that looks like a word in another language, but isn't. If a term exists in English, find people using it in English. - -sche (discuss) 08:45, 28 May 2019 (UTC)
  • One of my favorite such examples of "English" in a non-English context is No Smorking.
Context is key. ‑‑ Eiríkr Útlendi │Tala við mig 17:00, 28 May 2019 (UTC)

Proposal: Switch to tonal orthography for Slovene

@Benwing2 Slovene has two diacritic orthographies, the stress-based one and the tonal one. The tonal one can be converted automatically to the stress-based one, but the reverse is not true. The tonal orthography thus gives more information. At the moment, following WT:ASL and Appendix:Slovene pronunciation (which I originally wrote), the stress orthography is used in Wiktionary for all headwords, inflection tables and links. At the time, my argument was that the stress orthography is more common in dictionaries, and since not all Slovene dialects distinguish tones, the stress orthography applies to more of them. On the other hand, the standard practice for Dutch on Wiktionary is to include three genders, even though a majority of speakers only distinguishes two. The argument for Dutch is, again, that 3 genders gives more information, and the conversion to the two-gender system is automatic.

I'd like to propose that we change this practice for Slovene, by using the tonal orthography exclusively. As mentioned, it includes more information that would otherwise not be visible, especially tonal alternations in inflection paradigms. It is also crucial for etymologies, to the point that a separate {{desc/sl-tonal}} template was created, just to be able to show the tonal orthography in descendants (and not get people confused about the notation). Converting existing entries will take a while, and since the two notations overlap it's not always clear which of the two is currently present in the entry. —Rua (mew) 13:17, 28 May 2019 (UTC)

@Rua 100% in agreement with this change. Benwing2 (talk) 14:31, 28 May 2019 (UTC)
@Benwing2 Any idea how we can do it? The acute ´ and grave ` accents appear in both schemes, which makes it difficult to determine whether a particular word has been converted already. Perhaps a bot could list all forms that are currently ambiguous, and then we can work through that list. —Rua (mew) 17:28, 28 May 2019 (UTC)
I've created a bunch of tracking categories:
We should presumably start with the last category, which can be converted unambiguously to the tonal scheme. —Rua (mew) 18:26, 28 May 2019 (UTC)
Also, should we include the two additions Wiktionary made to the tonemic scheme, namely the letters ə and ł? They aren't used when writing full words with tonemic diacritics in other sources, and may confuse the reader into thinking they're real Slovene letters. On the other hand, while not related to tone and accent, they are a useful pronunciation guide. —Rua (mew) 17:32, 28 May 2019 (UTC)
Another problem: it's actually surprisingly hard to find thorough information on the tonemic system. I've found only one source that consistently uses the tonemic diacritics throughout, and it's a bit lacking in other areas. It seems that schools and such almost only teach the stress system and ignore the tonemes, as do grammars in general. I think the best we can do is provide the headwords with tonemes; providing them in inflection tables is going to be extremely difficult. —Rua (mew) 20:20, 28 May 2019 (UTC)
It seems like the right thing to do. How do you make sure the notations are not mixed and also, is there a definite source for the tonal notations? I have just searched for soprog in [9] and found three different notations from various sources: sopróg, soprọ̑g and sǫ́prog. soprọ̑g matches *sǫprǫgъ, so it must be right. --Anatoli T. (обсудить/вклад) 03:28, 29 May 2019 (UTC)
@Atitarev If you read those entries correctly, some of the notational variance disappears. The first entry, which says sopróg for SSKJ, is the non-tonal orthography; the tonal equivalent (ọ̑) for the stressed vowel is given in parens afterwards on the same line. Pravopis does the same thing. The feminine equivalent sopróga given in Pravopis has the tonal notation (ọ́; ọ̑) after it, which I think means that the nominative has tonal ọ́ but the genitive has tonal ọ̑, although I can't be sure about this. Pletersnik uses his own notation which can mostly be converted to the standard tonal notation but expresses additional distinctions found in dialects; I'm not sure what sǫ́prog is doing here, that must be some sort of dialectal form. Benwing2 (talk) 04:21, 29 May 2019 (UTC)
Thanks, Benwing2. I guess, the important questions are:
  1. Is there enough information online to make the switch completely to tonal notation? At first glance, I find Fran a little confusing because of many different results. Are there more sources?
  2. It seems tones change in inflected forms. Some entries show these changes, e.g. brlòg. Is there enough info for that?
  3. It seems it's easy to mix up both notations in some instances. Are (any of) you ready to commit to completing and supporting this for some time? If it's half done, we could have a mess instead. Correct me if I'm wrong.
  4. I guess casual Slovene users are less familiar or not familiar with this notation, or they will use the stress-based one. We need to make sure this is addressed.
Also @Rua. --Anatoli T. (обсудить/вклад) 04:44, 29 May 2019 (UTC)
Point two is what I was concerned about. With a lot of searching I found Jože Toporišič's "Slovenska slovnica", which gives quite a few more details about tone in inflectional patterns. The problem of course is that it's in Slovene, so it's taking me a bit to interpret. I'm planning to update w:Slovene declension based on it, when I figure it out. We can then work from that. —Rua (mew) 09:22, 29 May 2019 (UTC)
I've done some work, which can be seen at w:User:Rua/sandbox. It's still a work in progress, and there are most likely still some gaps. —Rua (mew) 16:51, 29 May 2019 (UTC)
@Benwing2, Atitarev I'm currently working on adding {{sl-tonal}} to all entries that don't have it yet. It will no longer have any use once the headword displays the tonal orthography, but I intend to turn it into an automatic IPA generating template instead, and rename it to {{sl-IPA}}. This will also be the way to track progress: as part of converting entries to the tonal orthography, we also change {{sl-tonal}} into {{sl-IPA}}. That way, we can see by the transclusions of the former which entries still need updating.
I am thinking that a bot can handle updating the headwords automatically in many cases. If the number of parameters on {{sl-tonal}} equals the number of headwords in the headword template, and if they match (if the tonal orthography converted to stress orthography equals the headword), then change the headwords to use the tonal orthography and also update the template name to {{sl-IPA}} to track progress. This will not update inflections, but that's ok because inflections need manual attention anyway. I am going to change the inflection tables to display some forms in the collapsed state, a format that some languages already use, e.g. kopen, eallit, kala. This keeps the inflection information in one place and means we don't have to fix the inflections in the headword, we can just remove them if an inflection template is in place. —Rua (mew) 10:09, 31 May 2019 (UTC)
I've now created {{sl-IPA}}, and it works nicely. :) From now on, when you convert entries to the tonal format, please change {{sl-tonal}} to this, or add it if it's not present. That way it's easy to keep track of what has been done. —Rua (mew) 13:26, 31 May 2019 (UTC)
I've run the bot and orphaned {{sl-tonal}} in favour of {{sl-IPA}}. Now, every entry that contains the latter template should have tonal orthography in the headword. There are still a fair number of pages that don't have the IPA template, and which don't have tonal orthography yet. I'll work on those. —Rua (mew) 15:11, 1 June 2019 (UTC)
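A minimal Python sketch of the headword-update check Rua describes above, assuming the bot already has the {{sl-tonal}} parameters and the headword-template values as lists of strings. The names `toy_tonal_to_stress` and `plan_headword_update` are hypothetical, and the converter is a deliberately partial stand-in: it only covers the closed-o correspondences mentioned in this thread (tonal ọ̑ and ọ́ both written ó in the stress orthography), not the full tonal-to-stress mapping. That many-to-one mapping also shows why conversion only works from tonal to stress and not the other way round.

```python
import unicodedata
from typing import Callable, List, Optional

# Combining marks, as they appear after NFD decomposition.
DOT_BELOW = "\u0323"       # closed e/o in the tonal orthography
INVERTED_BREVE = "\u0311"  # tonal accent as in soprọ̑g
ACUTE = "\u0301"           # acute accent


def toy_tonal_to_stress(word: str) -> str:
    """Hypothetical, deliberately partial tonal-to-stress converter.

    Assumption: only maps tonal ọ̑ and ọ́ to stress ó, the correspondences
    cited in the discussion. A real converter needs every vowel and accent.
    """
    nfd = unicodedata.normalize("NFD", word)
    # NFD orders the dot below (combining class 220) before the accent (230).
    nfd = nfd.replace("o" + DOT_BELOW + INVERTED_BREVE, "o" + ACUTE)
    nfd = nfd.replace("o" + DOT_BELOW + ACUTE, "o" + ACUTE)
    return unicodedata.normalize("NFC", nfd)


def plan_headword_update(
    tonal_params: List[str],
    headwords: List[str],
    to_stress: Callable[[str], str],
) -> Optional[List[str]]:
    """Return the tonal headwords to write, or None if a human should check.

    The counts must match, and each tonal form, converted to the stress
    orthography, must equal the existing headword.
    """
    if len(tonal_params) != len(headwords):
        return None
    for tonal, head in zip(tonal_params, headwords):
        if to_stress(tonal) != unicodedata.normalize("NFC", head):
            return None
    # Safe to switch; the bot would also rename {{sl-tonal}} to {{sl-IPA}}.
    return list(tonal_params)


if __name__ == "__main__":
    # Tonal soprọ̑g against the existing stress-orthography headword sopróg.
    print(plan_headword_update(["sopr\u1ecd\u0311g"], ["sopr\u00f3g"], toy_tonal_to_stress))
    # Two different tonal forms collapse to the same stress form.
    print(toy_tonal_to_stress("\u1ecd\u0311") == toy_tonal_to_stress("\u1ecd\u0301"))
```

Anything for which the function returns None would be left for manual attention, just as inflections are.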


Excuse me, what does "WTNoD" stand for? I accidentally got tripped by an abuse filter with a good-faith edit, and I saw this phrase in the filter description. 01:08, 29 May 2019 (UTC)

We've had problems with anonymous editors randomly removing big chunks of text from discussion pages like this one and various information pages in the Wiktionary namespace. This filter stops that. Unfortunately, it has no way to know when the text removed was originally added by the same person. The filter has already stopped some very destructive vandalism in the month it's been in place, so I'd really rather not disable it if I can avoid it. I've taken care of the text you wanted to remove, so the immediate problem is solved. If this happens again, one workaround would be to split your deletion into two edits so you don't go over the size threshold, or you can ask someone with an account to do it for you (I'd be happy to). Either that, or sign up for an account: you may think that editing as an IP is protecting your privacy, but in reality, anyone who wants to can tell in less than a minute what Internet Service Provider you're using and your approximate physical location to within a couple dozen miles or so. If you were logged in to an account, the only people with access to that information would be checkusers, and we would have to have a very good reason to look. At any rate, sorry for the inconvenience. Chuck Entz (talk) 03:38, 29 May 2019 (UTC)
@Chuck Entz: I am aware of the above. I do have an account, but I have recently decided to stop using it so that I could slowly depart from this community, partially out of boredom and partially out of real-life circumstances. 19:59, 30 May 2019 (UTC)

Which display is preferable - split or solid?

See also Wiktionary:Grease_pit#Tibetan_་_shouldn't_separate_words_in_headword_lines

I want to gauge people's preference on how to display words like these two, in Khmer and Burmese: ខួរក្បាល (khuə kbaal) and ဦးနှောက် (u:hnauk). This question also applies to Thai and other scripts.

Should the headword display them as follows, regardless of whether an etymology section exists?

  1. As solid: ខួរក្បាល (khuə kbaal) and ဦးနှောက် (u:hnauk). this revision and this revision.
  2. Split into components, with links to individual parts: ខួរក្បាល (khuə kbaal) and ဦးနှောက် (u:hnauk). this revision and this revision.

Calling @Mahagaja, Octahedron80 to participate. --Anatoli T. (обсудить/вклад) 06:11, 29 May 2019 (UTC)

  • I vote for the unlinked display in the headword line, just as English compounds written without a space are written without the component parts linked. —Mahāgaja · talk 06:18, 29 May 2019 (UTC)
    @Mahagaja: Thanks for the reply. Can you be more specific about why? I have no trouble reading and deciphering the English compound word e.g. airwave (well, I know English, Roman letters and I know these components) but even acetylmannosaminyltransferase becomes a bit problematic. I would support displaying Hungarian agyhártyagyulladás split in two as agyhártyagyulladás. --Anatoli T. (обсудить/вклад) 06:28, 29 May 2019 (UTC)
    First, because that's what the etymology section is for, and the etymology section can give more detail, such as which sense of the word is relevant to the compound, whether one element is obsolete, and so on. Secondly, because sometimes simply linking the visible parts of the word leads to wrong results. For example, Burmese ကျောင်းသား (kyaung:sa:) isn't ကျောင်း (kyaung:) plus သား (sa:), it's ကျောင်း (kyaung:) plus -သား (-sa:); and ပန်းသီး (pan:si:) isn't ပန်း (pan:) plus သီး (si:), it's ပန်း (pan:) plus အသီး. And of course it is possible to write |head=[[ကျောင်း]][[-သား|သား]] or |head=[[ပန်း]][[အသီး|သီး]], but why go to that trouble, especially when all we're doing is copying what the Etymology section says, and in a less informative manner? —Mahāgaja · talk 06:47, 29 May 2019 (UTC)

I always unlink in the headword line, because the components can be described in the etymology instead. Besides, some compounds change in spelling when they join, which cannot be explained in the headword line, and sometimes a wikilink cannot be added at all. --Octahedron80 (talk) 08:26, 29 May 2019 (UTC)

@Mahagaja, Octahedron80: OK, thanks. I have already switched to not linking on new entries. Yeah, I should have mentioned the spelling changes. That would make it all the more important to show the (visible) combining forms in the headword for Thai and Khmer while linking to the actual, non-combining entries, but, anyway, I'll follow the decision. --Anatoli T. (обсудить/вклад) 11:43, 29 May 2019 (UTC)

Done working on etymologies for a while

I've gone through much of the Romance inherited lexicon for the big languages. Although there is still considerable work to be done on many of them outside of Romanian, with probably a thousand or two more inherited words to add to the lists, I plan on taking a break from this stuff for a while now that Wiktionary has been expanded a lot more in this regard. I only wish there was a way to prevent users who aren't well-versed in etymology from making uninformed (but I'm sure well-intentioned) edits, but I guess that's the nature of wikis. The good thing is you can always change them, but someone has to go back and monitor. It's a bit annoying having to comb over the categories of inherited words in some languages and seeing some users add blatantly learned/borrowed words to them, probably with the assumption that the vast majority of the language's lexicon is inherited, without taking into account the specific evolution of the word or lack thereof. I guess all I can do is come back every now and then to clean up some things. Of course, others are free to continue working on terms for other Romance languages, which still need a lot of work. I'll check back from time to time if anyone has questions about things. Thanks. Word dewd544 (talk) 22:52, 23 June 2019 (UTC)