Wiktionary:Beer parlour/2022/December

Japanese kyujitai

Previous discussion: Wiktionary:Beer_parlour/2020/November#Move_kyujitai_to_t:ja-kanjitab

Currently in Japanese entries, both t:ja-kanjitab and the headword template are capable of displaying kyujitai. But obviously we want only one of them. So I suppose the community should decide which should stay and which should go.

To abolish headword line kyujitai, some 6000 pages (Special:WhatLinksHere/Template:tracking/ja-headword/kyu) need cleanup. We need bots to do this.
To abolish t:ja-kanjitab kyujitai, perhaps no more than 100 pages need cleanup. Much easier. -- Huhu9001 (talk) 10:22, 2 December 2022 (UTC)[reply]

Considering that kyujitai is specific to the details of how a word is spelled in kanji, and that is the entire purpose of {{ja-kanjitab}}, it makes more sense to me to have all of the spelling, script, and reading-type information consolidated into that template. The headword is already crowded with other information, which was (I think) a big part of the impetus in the creation of {{ja-kanjitab}} in the first place. ‑‑ Eiríkr Útlendi │^{Tala við mig} 22:58, 2 December 2022 (UTC)[reply]

To me 旧字体 (kyūjítai) and 歴史的仮名遣い (rekishiteki-kanazúkai) should both be at the end of the "Alternative spellings" box, with their own "Historical spellings" caption and kanji/kana labels. No need to add either to the headword, it's just visual noise. — Sartma ^{【𒁾𒁉 ● 𒊭 𒌑𒊑𒀉𒁲】} 12:42, 3 December 2022 (UTC)[reply]

Adding a bit more: separating "normal" alternative spellings from historical spelling would allow us to deal with cases like 掴む (tsukámu) and 我が儘 (wagamáma) in a clearer way.

I would put kyūjítai and rekishiteki-kanazúkai together under "historic" with kanji and kana labels since they were both the standard before the post-war reforms that gave us Modern Japanese orthography, so they do belong together like modern kanji and kana belong together. — Sartma ^{【𒁾𒁉 ● 𒊭 𒌑𒊑𒀉𒁲】} 12:52, 3 December 2022 (UTC)[reply]

The automated display of kyūjitai is good but I have been providing the manual |kyu= because the automation requires maintenance and incorrect values do happen, as is the case (currently) with 優勝者(ゆうしょうしゃ) (yūshōsha) where the "Alternative spelling" box shows the same entry name: "優勝者". --Anatoli T. ^{(обсудить}/^вклад) 05:20, 19 December 2022 (UTC)[reply]

@Atitarev: No, the automation is correct. The kanji in your example are actually different. 者 U+8005 and 者 U+FA5B. -- Huhu9001 (talk) 14:32, 21 December 2022 (UTC)[reply]

Why is the hyperlink missing then? The alt form is in bold. Anatoli T. ^{(обсудить}/^вклад) 20:28, 21 December 2022 (UTC)[reply]

@Atitarev: The Wikimedia software does not support CJK Compatibility Ideographs titles (者, forcibly redirected to 者). -- Huhu9001 (talk) 03:10, 22 December 2022 (UTC)[reply]
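The code-point difference and the forced redirect described above can be checked with a short Python sketch. It assumes (this is not stated in the thread) that the redirect comes from page titles being normalized to Unicode NFC; the helper name `survives_nfc` is hypothetical.

```python
import unicodedata

unified = "\u8005"   # the unified ideograph used in the entry name
compat = "\uFA5B"    # the CJK compatibility ideograph shown as kyujitai

# The two characters are distinct code points, even if a font draws them alike.
print(unified == compat)                                  # False

# The compatibility ideograph has a canonical decomposition to the unified one.
print(unicodedata.decomposition(compat))                  # 8005

# NFC normalization therefore replaces U+FA5B with U+8005.
print(unicodedata.normalize("NFC", compat) == unified)    # True


def survives_nfc(title: str) -> bool:
    """Return True if the title is unchanged by NFC normalization.

    Assumption: a title that does not survive NFC cannot exist as its own page.
    """
    return unicodedata.normalize("NFC", title) == title
```

Under that assumption, a kyujitai spelling containing U+FA5B fails `survives_nfc`, which would explain why the alternative form is shown in bold instead of as a link.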

@Huhu9001: Thanks. I am missing my tools to check this code on this laptop and there is something I may miss technically. If I search for "優勝者" (Ctrl+F) on the entry I get both the entry and the alt form. Perhaps there is a way to notify users of this as it is in 者? Anatoli T. ^{(обсудить}/^вклад) 04:32, 22 December 2022 (UTC)[reply]

(Notifying Eirikr, TAKASUGI Shinji, Atitarev, Fish bowl, Poketalker, Cnilep, Marlin Setia1, 荒巻モロゾフ, 片割れ靴下, Onionbar, Shen233, Alves9, Cpt.Guapo, Sartma, Lugria): I think this is worth a vote if no conclusion can be drawn here. -- Huhu9001 (talk) 08:59, 3 December 2022 (UTC)[reply]

@Huhu9001 As a bot owner but not a Japanese editor, I think we should do what's right irrespective of how many pages need to be changed. Changing 6000 pages by bot is really not a big deal (I did one change a few years ago that hit about 1.4 million pages ...); the only question is how much can be automated vs. how much needs to be done manually. Any idea about that? For example, if 100 pages can't be handled automatically, that's fairly easy to do manually; doing 1,000 pages manually is more difficult and would best be handled by the "push-manual-changes" method I use for such situations (where you load all the pages into a text file, do all the edits there and then push the results using a bot). Benwing2 (talk) 19:44, 3 December 2022 (UTC)[reply]
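The "push-manual-changes" workflow described above (dump pages to one text file, edit there, push only what changed) might be sketched as follows. This is a minimal illustration, not Benwing2's actual tooling: the `MARKER` delimiter and all function names are hypothetical, and fetching and saving pages through a bot framework is deliberately left out.

```python
from typing import Dict

# Hypothetical delimiter line; assumes no page text contains a line starting with it.
MARKER = "=====PAGE: "


def dump_pages(pages: Dict[str, str]) -> str:
    """Serialize {title: wikitext} into one editable text blob."""
    chunks = []
    for title, text in pages.items():
        chunks.append(f"{MARKER}{title}\n{text.rstrip()}\n")
    return "".join(chunks)


def load_pages(blob: str) -> Dict[str, str]:
    """Parse the (possibly hand-edited) blob back into {title: wikitext}."""
    pages: Dict[str, str] = {}
    title = None
    lines = []
    for line in blob.splitlines():
        if line.startswith(MARKER):
            if title is not None:
                pages[title] = "\n".join(lines)
            title = line[len(MARKER):]
            lines = []
        else:
            lines.append(line)
    if title is not None:
        pages[title] = "\n".join(lines)
    return pages


def changed_pages(before: Dict[str, str], after: Dict[str, str]) -> Dict[str, str]:
    """Return only the pages whose text differs, i.e. the ones a bot would save."""
    return {t: txt for t, txt in after.items() if before.get(t) != txt}
```

Comparing the reloaded file against the original dump means pages the editor left untouched are never written back, so a run over 1,000 loaded pages produces only as many edits as were actually made by hand.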

I agree with Eirikr and Sartma that ja-kanjitab is the more sensible place for kyujitai, as it is a matter of written form. I would have no objection to a 'historical' section that is somehow separated from other alternative written forms. Cnilep (talk) 01:20, 6 December 2022 (UTC)[reply]

I am trying my best to work out a bot script to do this. It takes some time. -- Huhu9001 (talk) 02:09, 15 January 2023 (UTC)[reply]

I have made a bot to push this change. Wiktionary:Votes/bt-2023-02/User:Huhu9001Bot for bot status. (Notifying Eirikr, TAKASUGI Shinji, Atitarev, Fish bowl, Poketalker, Cnilep, Marlin Setia1, 荒巻モロゾフ, 片割れ靴下, Onionbar, Shen233, Alves9, Cpt.Guapo, Sartma, Lugria): . -- Huhu9001 (talk) 06:24, 27 February 2023 (UTC)[reply]

Position of box templates

While working on Umbrian, it was pointed out to me that my usage of {{normalized}} was going against what's written in its documentation, that is, to place it at the end of the entry, whereas I place it at the beginning (see avif, persklom, etc.).

This is also common practice with {{LDL}}, and I find it odd, since all the other box templates I can think of are placed at the topmost of the entry: most notably {{reconstructed}} (which actually comes before the L2 header) and {{phrasebook}}. {{hot word}}, although not a box, might also be worth mentioning.

With our current positioning (1) the box is theoretically inside the last header, usually References or Further reading, (2) important information is at the bottom of the page, which is not ideal since due to the bright green everyone is going to look there first thing anyways, so placing it at the top would make the reader follow the normal top-to-bottom order, and (3) it looks worse: I mean... look at the spacing (eg: nyelingur, суъптаъ), it looks like it ended up there by mistake. Cast your votes.

Catonif (talk) 20:15, 3 December 2022 (UTC)[reply]

Top. Vininn126 (talk) 21:21, 3 December 2022 (UTC)[reply]

Top, after L2 header for this template and all other box templates that apply to the entire L2 entry. JeffDoozan (talk) 17:25, 4 December 2022 (UTC)[reply]

At the very Top of L2 section. DCDuring (talk) 17:49, 4 December 2022 (UTC)[reply]

Does that mean above or below the L2 itself? Vininn126 (talk) 17:50, 4 December 2022 (UTC)[reply]

Below the L2. Theknightwho (talk) 19:27, 4 December 2022 (UTC)[reply]

Top makes more sense IMO. —Al-Muqanna المقنع (talk) 21:08, 4 December 2022 (UTC)[reply]

Bottom, as these boxes do not contain any important information at all. MuDavid 栘𩿠 (talk) 01:30, 6 December 2022 (UTC)[reply]

Does this imply that box templates should not be used if they do not apply to all the homographs in the language section? --RichardW57 (talk) 01:51, 6 December 2022 (UTC)[reply]

As appropriate. I think nyelingur, which uses {{LDL}}, looks better as it is. Seeming to be in a References section is actually appropriate for that template. The box for that template will be disruptively thick if it appears at the start of the language section. By contrast, boxes for {{rfv}} indicate something that needs attention; if one is looking up a word found in a durable medium, the user may be able to help. --RichardW57 (talk) 02:04, 6 December 2022 (UTC)[reply]

I suppose I could see moving the LDL template up a la {{hot word}}. Moving {{normalized}} up just seems like clutter, if we're comprehensively normalizing all the words from a reference work in one script/orthography to another script, so my personal aesthetic preference would be to leave that box at the bottom. - -sche (discuss) 02:27, 6 December 2022 (UTC)[reply]

I'm happy to see this matter is gaining so much input.

I would like to underline that we should not have this floating homelessly in the entry, that is, not being technically under any of our headers, defying the structure. If we really want it to be in the References (even though it's not a reference), so be it, as long as in EL we clearly state "the References L3 is for references... and green boxes", and that in the presence of (e.g.) Anagrams, the latter will be placed below. Now this doesn't sound so great, but it's the only way to make the boxes not defy our tree-like structure while still staying at the bottom. Either this, or top.

@MuDavid: while I agree that it is subjective whether the information is important or not, the point is that the bright green is going to attract the eye nonetheless, and in that case, better reading top-to-bottom than top-jump-to-bottom-go-to-top-and-read-to-bottom. @RichardW57: not sure how homographs should be dealt with, but that sounds like an exceptionally good reason to have the box at the top (right under the ===Etymology N=== header). For an example of it at the bottom, see saman#Azerbaijani. @-sche: are you suggesting we move {{LDL}} to the top, but not {{normalized}}? They work and look very similar, so it would be weird to position them differently. Imagine ката#Udi. About the clutter, I need to point out that (in Umbrian, which I presume is what you're talking about) not all words are normalized, some being lemmatized in the same spelling in which they are actually attested.

Catonif (talk) 13:37, 6 December 2022 (UTC)[reply]

(Currently it's 4-2, so it'll likely be top.) Vininn126 (talk) 13:45, 6 December 2022 (UTC)[reply]

Ok, I waited so as not to make a premature decision. Seeing that the discussion resulted in top (5-2, counting myself), I can change the documentation, but I can't manually move all instances of the templates. Can this be automated by a bot? Catonif (talk) 15:29, 9 December 2022 (UTC)[reply]

I can have my bot enforce this. Which templates should always be at the top of the language entry after the L2 header? Just {{normalized}}, {{reconstructed}} and {{hot word}}, or are there others? JeffDoozan (talk) 01:54, 10 December 2022 (UTC)[reply]

Thank you! They should be {{normalized}} and {{LDL}}. {{hot word}} should technically already be there, and {{reconstructed}} is actually before the L2, and I think everyone's fine with that. Catonif (talk) 07:08, 10 December 2022 (UTC)[reply]

Thanks Jeff! Vininn126 (talk) 11:47, 10 December 2022 (UTC)[reply]

The 'top' position is not immediately after the L2 header. It is after the L2 header or Etymology N header. --RichardW57 (talk) 10:37, 11 December 2022 (UTC)[reply]
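The move agreed on above might be sketched roughly as below. This is an illustrative assumption about how such a bot pass could work, not JeffDoozan's actual code: it handles only {{LDL}} and {{normalized}}, expects the wikitext of a single language section, and deliberately leaves sections with numbered etymologies untouched, since (as noted) the right position there is after the relevant Etymology N header.

```python
import re

BOX_TEMPLATES = ("LDL", "normalized")
# A line consisting solely of one of the box templates, with optional parameters.
BOX_LINE = re.compile(r"^\{\{(?:%s)(?:\|[^{}]*)?\}\}\s*$" % "|".join(BOX_TEMPLATES))
L2_HEADER = re.compile(r"^==[^=]+==\s*$")
NUMBERED_ETYMOLOGY = re.compile(r"^===Etymology \d+===\s*$")


def move_boxes_to_top(section: str) -> str:
    """Move box-template lines to directly below the L2 header."""
    lines = section.split("\n")
    if not L2_HEADER.match(lines[0]):
        raise ValueError("expected the section to start with an L2 header")
    # Multiple etymologies need a human decision about which one the box belongs to.
    if any(NUMBERED_ETYMOLOGY.match(ln) for ln in lines):
        return section
    boxes = [ln.strip() for ln in lines[1:] if BOX_LINE.match(ln)]
    rest = [ln for ln in lines[1:] if not BOX_LINE.match(ln)]
    # Drop blank lines left dangling at the end after the box was removed.
    while rest and rest[-1].strip() == "":
        rest.pop()
    return "\n".join([lines[0]] + boxes + rest)
```

Returning multi-etymology sections unchanged keeps the automated part conservative; those pages could be collected into a list for manual review.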

That's true. @JeffDoozan: could you provide a list of the entries your bot moved the boxes from that have multiple etymologies?

On another note, we could consider having some sort of |lite= parameter to be enabled in such cases, to make the templates less cluttery, since right under L3 headers the box doesn't look very good. Catonif (talk) 19:02, 14 December 2022 (UTC)[reply]

`{{defdate}}` vs `{{etydate}}`

Would anyone mind if I changed etydate to be placed in the etymology line? Vininn126 (talk) 09:23, 5 December 2022 (UTC)[reply]

Support. Hopefully also a bot to do the cleanup. Catonif (talk) 19:03, 5 December 2022 (UTC)[reply]

@Vininn126: I object to your proposal on the grounds of unintelligibility. What change are you proposing? --RichardW57 (talk) 23:42, 5 December 2022 (UTC)[reply]

Object RichardW57 (talk) 23:43, 5 December 2022 (UTC)[reply]

That is... an odd reason to object? Currently etydate is supposed to be on the definition line like defdate. It is overlapping with defdate in that area if you were to put both. Plus it's ETYdate. Vininn126 (talk) 07:49, 6 December 2022 (UTC)[reply]

You mean in the etymology section like ampersand#Polish, or on the definition line like {{defdate}}? When used on the definition line it does seem to overlap with defdate, though I concede it's not redundant because it automates "first attested in" and some other things. If we're putting it in the etymology section, IMO it should be reformatted, because I see no reason for it to be in brackets and at a small font size if it's in the etymology section, though it could still be helpful as a time-/keystroke-saving templatization of our current 'handwritten' etymologies like "First attested in 1644; engineering sense first attested in 1793". - -sche (discuss) 01:58, 6 December 2022 (UTC)[reply]

I could really get behind that. If we increased the font we'd want to increase the reference size as well. Vininn126 (talk) 07:49, 6 December 2022 (UTC)[reply]

I agree that it would need reformatting to be moved into the etymology section. Graham11 (talk) 08:03, 6 December 2022 (UTC)[reply]

We can also discuss if it should be at the beginning or the end. Vininn126 (talk) 08:21, 6 December 2022 (UTC)[reply]

Does that need to be determined? I don't think it really matters if an etymology section says "first attested in 1900, from X + Y" or "from X + Y, first attested in 1900", though my personal preference is for the latter. Agree with -sche about removing the brackets and size formatting in any case. —Al-Muqanna المقنع (talk) 13:06, 6 December 2022 (UTC)[reply]

It's been discussed on the discord, plus there's an argument to be made about consistency and uniformity of entries making them easier to read. Vininn126 (talk) 13:13, 6 December 2022 (UTC)[reply]

Discord discussions don't replace discussions in BP. DCDuring (talk) 00:58, 16 December 2022 (UTC)[reply]

That is why I brought it up here! When I say discussed, I mean raised. No decision like that would be made in that way; again, why I brought it up here. Vininn126 (talk) 01:06, 16 December 2022 (UTC)[reply]

Regardless of position, should we move forward with the reformatting of {{etydate}}? I'd like to use this template more but the current formatting is appropriate for glosses and not the etymology section IMO. I've also noticed that there are quite a few Hungarian entries with a raw {{defdate}} in the etymology section, which might need looking into (e.g. aréna). —Al-Muqanna المقنع (talk) 20:18, 14 December 2022 (UTC)[reply]

I was going to wait about 2 weeks but it seems the conversation has died down since the last comment. I would like to make the changes, and then also look for any etydates in the deflines and defdates in the etylines. Vininn126 (talk) 20:19, 14 December 2022 (UTC)[reply]
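The cleanup pass mentioned above (finding etydates on definition lines and defdates in etymology sections) could be approximated with a scan like the following. It is only a sketch under simple assumptions: it looks for the template names as plain substrings, treats any line starting with `#` as a definition line, and the function name is made up for illustration.

```python
import re

# Matches wikitext headers such as ==English== or ===Etymology 1===.
HEADER = re.compile(r"^(=+)\s*(.*?)\s*\1\s*$")


def misplaced_date_templates(wikitext: str):
    """Return (line number, problem) pairs for date templates in the wrong place."""
    problems = []
    in_etymology = False
    for number, line in enumerate(wikitext.split("\n"), start=1):
        header = HEADER.match(line)
        if header:
            # Track whether the current section is an Etymology section.
            in_etymology = header.group(2).startswith("Etymology")
            continue
        if line.startswith("#") and "{{etydate" in line:
            problems.append((number, "etydate on definition line"))
        elif in_etymology and "{{defdate" in line:
            problems.append((number, "defdate in etymology section"))
    return problems
```

A report of line numbers rather than an automatic rewrite seems the safer choice here, since turning a {{defdate}} into an {{etydate}} (or the reverse) may need a human to judge which date the editor meant.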

@-sche @Al-Muqanna @Graham11 I have made the following changes: the template no longer prints small text, and I removed the []'s. I am currently converting the appropriate templates. Vininn126 (talk) 20:28, 15 December 2022 (UTC)[reply]

Wugniu tone notation

We’ve generally come to a conclusion as to how the Shanghainese Wugniu rollout will work. For a refresher on the romanisation scheme, see User:ND381/Wu Expansion, which also has notes on what will be done for the Wugniu display integration. However, as you can see, we do not yet have a consensus as to how to display tones. Here are a few ideas for you, please leave a comment as to what you prefer. (I’m working on the assumption that we can all agree that left-prominent sandhi is to be notated with a dash, but if you disagree, let me know)

1. Diacritics

Wugniu, as the website displays it, does not have diacritics. However, due to the “two phonemic tones” analysis of Shanghainese, many have opted to simply notate the dark level (陰平) tone with a diacritic - usually acute or grave accent.

non	tsén	mo-ve
儂	真	麻煩

An important advantage of this is that it makes the transcription a lot cleaner. Note, though, that this will not be possible for lects such as Suzhounese, where no analysis has all non-first-syllable tones lose phonemic tone.

2. Numbers

What we currently do reflects the practice of many romanisations. However, Wugniu prioritises historical tone distribution, and thus tones 2-5 will be renumbered 5-8. This is, frankly, all that people who use number notation can agree on. Whether to use superscript or subscript numbers, and whether to place them before or after the syllable, are both points of contention. Unfortunately, due to how the old module is programmed, there is no way to re-implement tones for syllables after the first.

a. all behind syllable

non⁶	tsen¹	mo⁶-ve
儂	真	麻煩

b. all behind syllable, except for sandhi chains

non⁶	tsen¹	⁶mo-ve
儂	真	麻煩

c. all in front of syllable

⁶non	¹tsen	⁶mo-ve
儂	真	麻煩
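For comparison, the renumbering and the two consistent placements (options a and c) can be generated mechanically. The sketch below is not the wuu-pron module; it assumes that the old tone 1 stays 1 and that old tones 2-5 map in order onto 5-8, and the function names are invented for illustration.

```python
# Digits 0-9 mapped to their Unicode superscript forms.
SUPERSCRIPT = str.maketrans("0123456789", "⁰¹²³⁴⁵⁶⁷⁸⁹")

# Assumed mapping: tone 1 is unchanged, old tones 2-5 become Wugniu 5-8.
OLD_TO_WUGNIU = {1: 1, 2: 5, 3: 6, 4: 7, 5: 8}


def renumber(old_tone: int) -> int:
    """Convert an old-style tone number to its Wugniu number."""
    return OLD_TO_WUGNIU[old_tone]


def render(chain: str, tone: int, style: str) -> str:
    """Attach a superscript tone to a sandhi chain such as 'mo-ve'.

    style 'a': after the first syllable; style 'c': before the first syllable.
    """
    mark = str(tone).translate(SUPERSCRIPT)
    if style == "a":
        first, dash, rest = chain.partition("-")
        return first + mark + dash + rest
    if style == "c":
        return mark + chain
    raise ValueError("style must be 'a' or 'c'")


for style in ("a", "c"):
    words = [("non", 3), ("tsen", 1), ("mo-ve", 3)]  # old-style tone numbers
    print(" ".join(render(w, renumber(t), style) for w, t in words))
# non⁶ tsen¹ mo⁶-ve
# ⁶non ¹tsen ⁶mo-ve
```

Only the tone of the first syllable in each chain is written, in line with the assumption that left-prominent sandhi is shown with a dash.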

3. Right prominent sandhi

It is also of note that many people don't actually notate the right-prominent sandhi in Shanghainese. However, this can lead to changes of tone. I'm not sure whether we should notate it as well (the current module already forces use of +), and if we do decide to, how we ought to do it.

If there are any further thoughts, let me know as well. (yoinking from justin's message from last time: @Atitarev, Thedarkknightli, ChromeGames, Mteechan) — 義順 (talk) 19:15, 5 December 2022 (UTC)[reply]

My two cents is that "non⁶ tsen¹ ⁶mo-ve" is confusing; I would think that the tone a syllable has should be notated either consistently after or consistently before that syllable, but not in different places. (If the issue is that in this case tsen itself is pronounced with a tone that goes from 1 to 6, I would still think that the indication of this should be attached to tsen, not half to tsen and half to mo.) Of those options (after vs before a syllable), it seems like languages in general and Chinese languages in particular usually notate tones after the relevant syllable (tsen¹), rather than before (¹tsen), so notating tone after the syllable here too would be consistent. - -sche (discuss) 02:24, 6 December 2022 (UTC)[reply]

Having asked several Shanghainese people (most of whom have an understanding of Wugniu and/or linguistics), the general consensus seems to be that right-prominent sandhi is too variable to be practical to include (i.e. the 1 + 6 should not be written). The overwhelming majority agree that 2c looks the best (including those that know of systems such as Jyutping), with one supporting sticking to 1. I personally also agree that 2c looks the best, but we may want more Wiktionarians to reply first. — 義順 (talk) 23:06, 7 December 2022 (UTC)[reply]

2b should not be used, IMO, because of the ambiguity that -sche mentioned. a and c both seem fine to me. —Al-Muqanna المقنع (talk) 11:46, 8 December 2022 (UTC)[reply]

I generally agree with what's been said above; using diacritics might lose potentially useful information, and (b) is a bit visually confusing without experience. At first glance (a) looks more natural to me, but given the notation of left-prominent sandhi with a dash, I do think (c) is better and I would be partial to it. I'm a bit surprised that right-prominent sandhi would be too variable to include, and I wonder if there would be a way to accommodate it less strictly, but I don't think I'm well versed enough in the language or romanization to say. ChromeGames (talk) 12:43, 22 December 2022 (UTC)[reply]

@ChromeGames to quote someone I've contacted outside of wikt - "[Right prominent sandhi] would probably be p messy to even attempt to standardise but u run the risk of ending up with like, a pretty artificial representation of tone especially with certain phrases".

One solution is to leave RPS unmarked in display while possible to type in the module, ie. leave it as is. This seems to be the one most supported by people I've asked outside of wikt, and hopefully if we can agree on this, we can start implementing the new module. — 義順 (talk) 04:05, 29 December 2022 (UTC)[reply]

Forgot to mention - Module:wuu-pron/sandbox/documentation#Usage exists and also has a scheme for how the input would work. If there are any comments, leave them somewhere to see — 義順 (talk) 23:35, 19 December 2022 (UTC)[reply]

Location of Footnotes for Etymologies

The problematic text is in Wiktionary:Etymology#References. What does "Etymologies should be referenced if possible, ideally by footnotes within the “Etymology” section" mean? Web pages don't naturally do literal footnotes. When talking of Wikimedia pages, I suggest that 'footnote' should normally mean the display of the content enclosed by a <ref> tag; this is typically displayed as the expansion of a <references/> tag. --RichardW57 (talk) 01:17, 6 December 2022 (UTC)[reply]

If it means what I think it means, I propose that "footnotes within the "Etymology" section" be replaced by "inline references", and that "inline references" be added to the glossary. Otherwise, I will defend adherence to the currently proposed policy. Silence is consent. --RichardW57 (talk) 01:17, 6 December 2022 (UTC)[reply]

Should they be below or above further reading? Vininn126 (talk) 14:15, 6 December 2022 (UTC)[reply]

I would expect them to be in a 'References' section. --RichardW57 (talk) 23:52, 6 December 2022 (UTC)[reply]

Yes, but I mean the references section itself. Vininn126 (talk) 08:14, 7 December 2022 (UTC)[reply]

@Vininn126: If they be within the Etymology section, then the order is unspecified, but I feel they would be better outside the etymology section even if we have the bodies of the references within the etymology section. As sisters within the same section of another type, I would expect 'References' to come before 'Further Reading'. RichardW57m (talk) 14:39, 7 December 2022 (UTC)[reply]

I ask because on many pages, particularly Proto Slavic pages, References is under, but I would expect it to be above Further Reading as well. Vininn126 (talk) 09:38, 8 December 2022 (UTC)[reply]

This has relevance to the layout of the etymology section of พริก (prík). Potential edit warrer: @This, that and the other. --RichardW57 (talk) 01:17, 6 December 2022 (UTC)[reply]

Footnotes should go at the bottom. As you can read in WT:EL, references go below. MuDavid 栘𩿠 (talk) 01:26, 6 December 2022 (UTC)[reply]

I'll jump in here since I think my bot edits provoked this. I would read that as meaning that Etymologies are simply encouraged to use the <ref> tag and that consequently those references would be displayed in the References section at the end of the entry. I see no good reason for any entry to have more than one References section and certainly not one stuck inside the Etymology. JeffDoozan (talk) 01:46, 6 December 2022 (UTC)[reply]

I agree with Jeff, I'd take this to mean etymologies should have <ref>s, not that the <references/> also needs to be directly in the Etymology section (above the POS section and definitions); I would support rewording the guidance to be clearer. On the last point: sometimes, if an entry has two different etymology sections, it may have ====References==== sections at the end of each overall Etymology division, i.e. after the POS, etc. That seems OK. But yeah, don't put <references/> directly inside the ===Etymology=== section i.e. above the POS and definitions. In the exceptional circumstance that it's needed for the exact quote from a reference to be directly adjacent to the etymology, just quote the reference... - -sche (discuss) 02:10, 6 December 2022 (UTC)[reply]

What do you mean by 'entry'? Do you perhaps mean 'language section' or 'language section or numbered etymology section'? You can't mean 'lemma' or 'form', because there may be multiple lemmas for a single etymology, especially in languages where Europeans readily confound verbs with prepositions, or absolute neuter adjectives with abstract nouns. RichardW57 (talk) 14:07, 6 December 2022 (UTC)[reply]

Apologies for the ambiguity, by entry I mean the entire language section. JeffDoozan (talk) 14:20, 6 December 2022 (UTC)[reply]

I'm in complete concurrence with the three users above. If nobody objects, I'll make the change to EL as suggested by Richard, on the basis of this consensus. This, that and the other (talk) 11:28, 6 December 2022 (UTC)[reply]

Do you mean "WT:Etymology"? --RichardW57 (talk) 14:11, 6 December 2022 (UTC)[reply]

@RichardW57 Yes, I do. I thought you were referring to EL, which is a protected policy page, but as this is WT:E, which is not protected, feel free to make the change yourself. This, that and the other (talk) 09:36, 8 December 2022 (UTC)[reply]

Done. --RichardW57 (talk) 12:38, 11 December 2022 (UTC)[reply]

Syllable breaks in English pronunciations

User:Kwamikagami seems to be on a one-person crusade to expunge syllable-break markings from English pronunciation transcriptions (e.g., here, here, here, here, here, here), claiming that English syllabification is theory-dependent, when, in fact, English words naturally fall apart cleanly into separate syllables, something that's inconsistent with syllabification being theory-dependent (as this would require the actual pronunciation of the word to change depending on which theory one subscribes to, which is obviously ludicrous). Other people's thoughts? Whoop whoop pull up ^{Bitching Betty ⚧️ Averted crashes} 08:04, 6 December 2022 (UTC)[reply]

Whoop thinks it's "obvious" that Vashti is syllabified /ˈvæ.ʃti/. To me, it's obviously /ˈvæʃ.ti/. But to Wells it's clearly /ˈvæʃt.i/. If we're going to mark syllable boundaries by default, then we need consensus on an algorithm as to where they are. For example, do we agree that in GA girl is disyllabic? And then there's the question of how to handle ambisyllabicity.

~~[Or maybe Ladefoged. I forget: who is it that analyses nitrate as /ˈnaɪtr.eɪt/?]~~ kwami (talk) 08:07, 6 December 2022 (UTC)[reply]

User:Kwamikagami Personally I think you should avoid unilaterally removing syllable boundaries until this has been discussed here and there is consensus to make these changes. Benwing2 (talk) 08:24, 6 December 2022 (UTC)[reply]

@Benwing2: Should we go and undo Kwami's syllable-break purges until there's a consensus here one way or the other? Whoop whoop pull up ^{Bitching Betty ⚧️ Averted crashes} 08:46, 6 December 2022 (UTC)[reply]

(ec) Well, Wells-or-Ladefoged-or-whoever apparently has some... interesting ideas about what kinds of consonant clusters can serve as an English syllable coda, ideas that seem to not always correspond perfectly with reality (at least if "/ˈnaɪtɹ.eɪt/" is anything to go by). girl can be either disyllabic (/ˈɡɚ.əl/) or monosyllabic (/ɡɚl/) in GA; this isn't a notational difference, but an actual variation in pronunciation (the GA dialects haven't developed a consensus as to the number of syllables in girl). As for Vashti, I strongly suspect that the difference in what various speakers consider the "obvious" syllabification might well reflect actual differences in pronunciation among GA speakers, similarly to the situation with the number of syllables in girl - in which case this isn't a question of what syllabification theory one subscribes to, but, rather, a question of multiple actual coexisting pronunciations that each need to be included. Whoop whoop pull up ^{Bitching Betty ⚧️ Averted crashes} 08:31, 6 December 2022 (UTC)[reply]

What is there to stop /ɡɚəl/ being monosyllabic, like British English /bɪəd/ beard? --RichardW57 (talk) 14:38, 6 December 2022 (UTC)[reply]

The R-coloring of the first schwa. Whoop whoop pull up ^{Bitching Betty ⚧️ Averted crashes} 01:00, 7 December 2022 (UTC)[reply]

How is that a problem? It is quite possible for only the second half of a vowel to be rhotacised. --RichardW57m (talk) 14:48, 7 December 2022 (UTC)[reply]

As regards ambisyllabicity, the natural way to notate that seems to be to include the consonant in question twice, first as the coda of the first syllable and then as the onset of the second syllable, which also seems to correlate the best with how the words in question actually sound. Whoop whoop pull up ^{Bitching Betty ⚧️ Averted crashes} 08:35, 6 December 2022 (UTC)[reply]

No, that's not natural, because it implies that the consonant is held longer than others in the word, which is generally not true. Consonants are not geminate simply by virtue of landing on syllable boundaries. Andrew Sheedy (talk) 08:37, 6 December 2022 (UTC)[reply]

It's not geminate; the first syllable has a half-length coda and the second has a half-length onset, with the syllable break coming in the middle of the sound lying across the syllable boundary. How would you go about notating ambisyllabic pronunciations (and don't try to avoid the problem by omitting syllable boundaries altogether, since that wouldn't help in cases where the presence of stress on at least the second syllable requires the syllable break to be marked)? Whoop whoop pull up ^{Bitching Betty ⚧️ Averted crashes} 08:44, 6 December 2022 (UTC)[reply]

The fact that you don't understand something doesn't mean that it doesn't correspond to reality. If you think you know better than internationally recognized experts, then it could be that you understand less than you think you do. kwami (talk) 10:03, 6 December 2022 (UTC)[reply]

Reconfirmed, it is Wells, author of the English Pronouncing Dictionary and Longman Pronunciation Dictionary. Whoop, I'm curious how you would syllabify the following words, compared to one of the main RS's for English pronunciation. (Just add periods or hyphens if you like:)

petrol, selfish, feature, dolphin, hamper, brandish, carpeting, crisis, banker, attestation, apex, freedom, mattress, squadron, paltry.

Without concordance, and considering that dictionaries contradict each other, I'm wondering how we would be able to decide on syllabification. kwami (talk) 10:31, 6 December 2022 (UTC)[reply]

@Kwamikagami: /ˈpɛ.tɹəl/, /ˈsɛl.fɪʃ/, /ˈfi.t͡ʃɚ/, /ˈdɔl.fɪn/, /ˈhæm.pɚ/, /ˈbɹæn.dɪʃ/, /ˈkɑɹ.p(ɪ/ə).ɾɪŋ/, /ˈkɹɑi.sɪs/, /ˈbæiŋ.kɚ/, /ˌæ.ɾəˈsɾɛi.ʃ(ɪ/ə)n/, /ˈɛi.pɛks/, /ˈfɹi.dəm/, /ˈmæ.tɹ(ɪ/ə)s/, /ˈskwɔ.dɹ(ɪ/ə)n/, /ˈpɔl.tɹi/. Whoop whoop pull up ^{Bitching Betty ⚧️ Averted crashes} 00:58, 7 December 2022 (UTC)[reply]

Okay, Wells disagrees with you on every one of those. E.g. for 'selfish', Wells argues it's self.ish, forming a near-minimal pair with 'shellfish', which is syllabified shell.fish. Other dictionaries agree with some of yours but not others. E.g., you have short/lax vowels in open syllables in pe.trol and ma.ttress, which most treatments argue is not allowed in English. So it's not obvious how we should approach this. kwami (talk) 01:06, 7 December 2022 (UTC)[reply]

That's confusing to me, because I interpret the aspiration of a stop in petrol, mattress, paltry as an indication that it comes at the beginning of a syllable, so they would have a syllable onset with /tɹ/ or /t͡ʃɹ/. Similarly with Wisconsin, some people pronounce the c as aspirated [kʰ] and others don't; that means to me that consonant cluster is either split across the syllable boundary /s.k/ or not /sk/. But I don't know how to harmonize this with the lax vowel rule. — Eru·tuon 14:02, 8 December 2022 (UTC)[reply]

Wells argues that /tr/ acts like an affricate in the syllabification of words like /ˈmætr.əs/, but it is not a popular solution. (In accents that don't affricate /tr/, is there any more aspiration here than in words like happy, apple or heckle?) Other alternatives are ambisyllabicity (not as unpopular, but there's far from a consensus in its favor) or concluding that English allows word-medial syllables to end in ways that word-final syllables cannot (this is not so implausible if we view the ban on words like */ˈmæ/ or */səˈmæ/ as having to do with minimal length requirements for feet, rather than restrictions on syllables).--Urszag (talk) 00:16, 9 December 2022 (UTC)[reply]

I often don't affricate the t in mattress and the r is still usually devoiced. But even when it's an affricate I think I'd still aspirate it. — Eru·tuon 03:45, 9 December 2022 (UTC)[reply]

Another possibility would be that the lax-vowel rule isn't an actual rule of English phonology, but merely a coincidental lack of words that violate the so-called rule; a point in favor of this theory would be that some (mostly-onomatopoeically-derived) words do exist which end in lax vowels, like eh and baa. Whoop whoop pull up ^{Bitching Betty ⚧️ Averted crashes} 06:32, 9 December 2022 (UTC)[reply]

That's not an accidental gap. Interjections frequently have their own phonotactics. E.g. you wouldn't say English is a click language because of tsk! tsk! or tchick! So yes, in lexical vocabulary, English words (and perhaps syllables) do not end in 'lax' vowels. kwami (talk) 08:39, 9 December 2022 (UTC)[reply]

Re "include the consonant in question twice", I'd say that'd be bad for a different reason than Andrew: if I'm understanding correctly, you're suggesting to write something like /-d.d-/, for a case where the word has a /-d-/ sound which is hard to pin down to one syllable or the other? But it's still one /d/; /-d.d-/ would wrongly say there are two consonants, the way there actually are in some words like bookkeeper, or wholely and solely /-l.l-/ when contrasted with holy, soul-y /-l-/. - -sche (discuss) 10:34, 6 December 2022 (UTC)[reply]

Umm, wholly and solely aren't contrasted with holy and souly; they're homophonous with the two latter words. Whoop whoop pull up ^{Bitching Betty ⚧️ Averted crashes} 01:04, 7 December 2022 (UTC)[reply]

They're homophones for me as well, but contrastive according to Longman. That may be an RP/GA difference, I don't know. kwami (talk) 01:07, 7 December 2022 (UTC)[reply]

Markedly different for me as a BrE-speaker, both in terms of vowel sound (cf. goat split) and gemination, so probably. —Al-Muqanna المقنع (talk) 01:09, 7 December 2022 (UTC)[reply]

Likewise - different for me. The 'l' is clearly lengthened in wholly and solely, but not in holy and souly. Theknightwho (talk) 02:13, 7 December 2022 (UTC)[reply]

It isn't even an RP/GA difference, as American dictionaries also acknowledge the double l in solely, vs single l in holy. Some speakers don't distinguish them is about as much as can be said, and merger seems to be more common for some words (like wholly, where the original morphemic division whole+-ly has become obscured) than others (like solely and bookkeeper where the fact that they're composed of different parts, one of which ends with /l/ or /k/ and the other of which begins with it, is transparent). Checking Cambridge, the old Century, Collins, Dictionary.com, Longman, MacMillan, Merriam-Webster, the old OED, and Oxford Learner's, all of them have double k as the only option for bookkeeper or bookkeeping (none allow single /k/), and all of them have double l as the only option for solely except MW which allows either double or single /l/. For wholly, Cambridge, Collins, Longman and Oxford Learner's have only double l, Dictionary.com and MW and the OED allow either double or single /l/, Century allows only single /l/ and MacMillan has single /l/ for the US and double for the UK. - -sche (discuss) 06:32, 7 December 2022 (UTC)[reply]

The problem of consonants being ambisyllabic / hard to pin down to one syllable or another is a known/longstanding problem, but it's a problem we face regardless (when we have to insert stress markers), and I don't think we should start removing syllable breaks as a result. (I also don't think "notate syllable breaks like other dictionaries generally do" requires "also list every alternative syllable-breaking scheme any phonologist anywhere has devised".) I will say, in the specific case at hand, /ˈvæʃ.ti/ seems to be a better analysis than /ˈvæ.ʃti/; it's my understanding that English speakers prefer to avoid ending syllables with checked vowels like /æ/ whenever possible (as is readily possible here by ending the syllable in /æʃ/ instead), and Dictionary.com also breaks it as /ˈvæʃ.ti/, and although Collins just has /ˈvæʃti/ without a syllable break marked, they list an alternate pronunciation /ˈvæʃˌtaɪ/ where they do mark the break. - -sche (discuss) 10:34, 6 December 2022 (UTC)[reply]

Can you give an example of where stress marking would require us to decide on ambisyllabicity?

It's not just ambisyllabicity, but that Whoop's idea of "obvious" contradicts mine, that Wells contradicts what is obvious to all three of us (e.g. he has /ˈvæʃt.aɪ/), and that respected dictionaries contradict each other. Given that, how are we to decide how to syllabify words consistently? kwami (talk) 10:48, 6 December 2022 (UTC)[reply]

Also, it's important to remember that we're not transcribing pronunciations here, but rather the phonemic abstractions that underlie the pronunciations. Phonemic analysis may produce something different than what we'd see in a spectrogram. E.g. ambisyllabic consonants might be necessarily codas or onsets phonemically, regardless of how they're realized phonetically. kwami (talk) 11:04, 6 December 2022 (UTC)[reply]

Re "Can you give an example of where stress marking would require us to decide on ambisyllabicity?": well, since you're arguing syllable divisions are inherently or widely ambiguous and hard to decide on, the answer is that any word where the stress isn't on the first syllable will require deciding where, exactly, relative to the word's various consonants and vowels, to insert the stress marker, just as we decide where to insert the /./. - -sche (discuss) 06:32, 7 December 2022 (UTC)[reply]

I don't think it's that bad. Most analyses agree pretty unanimously on syllabifying a consonant that comes between a reduced fully unstressed vowel and a stressed unreduced vowel with the following vowel, e.g. I think hardly anyone would argue for ambisyllabicity of /l/ in a word like political /pəˈlɪtɪkəl/ or of [m] in a word like information /ˌɪn.fɚˈmeɪ.ʃən/. The only type of word where I can imagine it being argued that the consonant at the start of the stressed syllable is ambisyllabic are certain words with an unreduced vowel before the stressed syllable, especially if it is a short/"lax" vowel, such as tattoo, elasticity, plasticity.--Urszag (talk) 08:06, 7 December 2022 (UTC)[reply]

"English words naturally fall apart cleanly into separate syllables" is certainly false. If this were true, there would be no disagreement among theoreticians about how to syllabify English words, but there is, as kwami observes. The perception of syllables by lay speakers is also variable in a number of cases and can be influenced by spelling (see e.g. David Eddington, Rebecca Treiman & Dirk Elzinga (2013) Syllabification of American English: Evidence from a Large-scale Experiment. Part I, Journal of Quantitative Linguistics, 20:1, 45-67, DOI: 10.1080/09296174.2012.754601). Many aspects of syllabification are entirely predictable, and so not that helpful to display; however, there are some small contrasts in pronunciation that in some systems like that of Wells constitute examples of contrastive syllabification, which we ideally would be able to display somehow (either by means of marking syllable boundaries, or in some other way). These contrasts usually involve one of the items having an "unpredictable" syllable division due to an intervening morpheme boundary: hopefully, the placement of those will not be controversial, since the position of morpheme boundaries is generally clear. Of the linked examples, cupola, Vashti, Monty Python, vindaloo don't seem to benefit from showing syllable divisions. But for t-girl and understudy, I think the transcriptions /ˈtiɡɝl/ and /ˈʌndɚstʌdi/ leave some useful information out: they would benefit from either showing a syllable division marker as /ˈti.ɡɝl/ and /ˈʌndɚ.stʌdi/ or a secondary/tertiary stress marker as /ˈtiˌɡɝl/ and /ˈʌndɚˌstʌdi/. I think I would perceive a slight difference between the rhymes found in these and in hypothetical words "league-earl" and "underce-tuddy" or "underst-uddy". These examples show that transcription of a secondary/tertiary stress after the main stressed syllable in a word is often an alternative possibility to the hypothesis of contrastive syllable divisions.
Another example, from Wells, where the distinction that Wells reports making could be explained either in terms of contrastive syllable division or contrastive secondary/tertiary stress is "selfish" (with default syllable division, whatever you think that is, and definitely no stress on the second syllable) vs. "shellfish" (per Wells, /ˈʃɛl.fɪʃ/; per our current transcription, /ˈʃɛlˌfɪʃ/).--Urszag (talk) 13:33, 6 December 2022 (UTC)[reply]

It will indeed be useful to mark syllable boundaries in some cases. Another possibility might be to write compounds with a space between the elements in the IPA. We shouldn't use the stress marker as a syllable marker, though: that should only be for stress, and none of your examples have secondary stress. We might want to have a guideline something like "the syllable break should only be used to separate vowels and at morpheme boundaries." Currently we say that it needs to be used where a vowel sequence would otherwise be ambiguous. kwami (talk) 01:13, 7 December 2022 (UTC)[reply]

With "We shouldn't use the stress marker as a syllable marker" do you mean we should use both the syllable break and stress marker before a non-initial stressed syllable? We try not to do that on Wiktionary; that's regarded as an error and tracked in Category:IPA for English using .ˈ or .ˌ. I get the impression it's avoided on Wikipedia as well. — Eru·tuon 14:08, 8 December 2022 (UTC)[reply]

<.ˈ> and <.ˌ> are correct IPA, but no, that's not what I meant. I meant that we should not use <ˌ> as a substitute for <.> on a non-stressed syllable. kwami (talk) 04:28, 9 December 2022 (UTC)[reply]

Good, I think everybody would agree with that. — Eru·tuon 14:29, 9 December 2022 (UTC)[reply]
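
For anyone wanting to check transcriptions locally for the `.ˈ` / `.ˌ` pattern that the tracking category above looks for, here is a minimal Python sketch. The function names are hypothetical and this is not the code behind the category; it only assumes that a syllable break immediately followed by a stress mark is the thing to flag.

```python
import re

# A syllable break (.) immediately followed by a primary (ˈ) or
# secondary (ˌ) stress mark. The lookahead keeps the stress mark
# out of the match so that only the dot is removed.
REDUNDANT_BREAK = re.compile(r"\.(?=[ˈˌ])")


def has_redundant_break(ipa: str) -> bool:
    """Return True if the transcription contains .ˈ or .ˌ."""
    return REDUNDANT_BREAK.search(ipa) is not None


def drop_redundant_breaks(ipa: str) -> str:
    """Remove each syllable break that directly precedes a stress mark."""
    return REDUNDANT_BREAK.sub("", ipa)
```

Applied to a variant of the "information" transcription quoted earlier with an extra dot before the primary stress, `drop_redundant_breaks` gives back the form without that dot and leaves the other syllable breaks alone.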

`{{la-epithet}}`

At some point recently this was changed to be self-contradictory, but as far as I can tell the note-to-the-note is redundant to the usually=1 option. I guess another question is, if a Latin word is exclusively used as a taxonomic epithet and never inflected, shouldn't it just be listed as Translingual? —Al-Muqanna المقنع (talk) 13:00, 6 December 2022 (UTC)[reply]

Right, the current note looks pretty bad. The issue as I think of it so far is that these words are hypothetically supposed to have certain forms built according to Latin rules, but that doesn't mean that they are ever used in any other Latin context, and in practice I'm not sure taxonomic nomenclature should even be categorized as Latin anymore, given how little most coiners of new names actually are involved in a community of Latin speakers or writers. (I'm not sure whether the idea that these names are in Latin has been officially abandoned, or whether that varies depending on the codes according to which different types of organisms are named.) Listing as Translingual is OK; but as the note points out, it's an overgeneralization to say that taxonomic epithets are "not inflected except in the nominative singular"; plural forms are sometimes found and the genitive singular is not infrequently found in the formation of parasite names. So it is useful to provide some further information about inflected forms (if that information is available). There is no fixed pronunciation of these names, but if coined from Latin or Greek roots, the original vowel lengths may also be helpful information as in theory the stress should probably follow the Latin stress rule.--Urszag (talk) 14:16, 6 December 2022 (UTC)[reply]

The note-to-the-note isn't redundant to that parameter. Even if an epithet wasn't used in Latin, there can still be inflection as can be seen by ruderalis (German example, taxonomics, inflected in Dat./Abl. Sg.) and Homo neanderthalensis together with Citations:Homines neanderthalenses (various examples, with Pl.).

Indeed, the note without parameter (i.e. with the text: "Used exclusively as a taxonomic epithet and thus not inflected except in the nominative singular") makes no sense in Latin entries. It's not that taxonomic terms stay uninflected in Latin (like Gen./Dat./Acc./Abl. Sg. Homo sapiens). They are inflected the Latin way in Latin (as ruderalis shows). But some terms simply aren't (attested in) Latin (for which maybe see also Category:Pseudo-loans from Latin by language). --14:29, 6 December 2022 (UTC)

Yes, I see your point, taxonomic epithets can be inflected. In that case I think the template should be reworded—as it stands it just looks like two different editors arguing. —Al-Muqanna المقنع (talk) 14:33, 6 December 2022 (UTC)[reply]

Well, there are some similar issues even with non-taxonomic Latin names being displayed as "singular only". The proper name of an individual person is by its nature not pluralizable as such, but there are semi-productive ways to semantically coerce the meaning of plural proper names by giving them a meaning like "someone named X", "a person like X" or "a version/account of X", and in that case there is often no greater obstacle in Latin than in English to using a morphologically predictable plural form. E.g. consider the form Oedipōrum (currently marked with an RFV since we display Oedipus as "singular only"); I would say this is in reality simply no more or less possible in Latin than "Oedipuses" is in English (found in various contexts, e.g. "the Oedipuses of Harold Bloom and Gilles Deleuze"). Perhaps one could argue that we should have explicit sub-senses for names that have attested uses of that kind (and only for names with attested uses), but that seems a bit impractical and also not that valuable.--Urszag (talk) 14:46, 6 December 2022 (UTC)[reply]

I think another case worth considering is that taxonomic epithets were originally coined and discussed in Latin prose, and continued to be at least into the late 19th century. Epithets would naturally be used and declined in Latin in that context, e.g. here ("in sched. foliis ut in G. Burmauni Cass. et G. natalensi et abyssinica opacis"), where G[erbera] natalensis and G[erbera] abyssinica are in the ablative. —Al-Muqanna المقنع (talk) 15:08, 6 December 2022 (UTC)[reply]

Proper nouns: That's (IMHO) another topic. Proper nouns can be set in plural as pointed out above. For Hercules and Oedipus a plural is also mentioned in dictionaries. Though sometimes it's not the plural of a proper noun (with a meaning like multiple persons named X), but instead the proper noun turned into a common noun (person like X, with characteristics of X) and then for the common noun there is a plural. Example: There're Krösus (proper noun, a certain rich king) and Krösus (common noun, rich person, has a plural). --22:05, 6 December 2022 (UTC)

Per the above I've adjusted the template to remove the note-to-the-note and change the wording in the template's relevant forms to indicate that other inflections may be theoretical/rarely found as appropriate, rather than that they are theoretical. —Al-Muqanna المقنع (talk) 18:15, 10 December 2022 (UTC)[reply]

Let's add Jisho.org to the abuse filter

There have been edits that are based on copying from jisho.org. The problem with Jisho.org is that it is a tertiary source. Here is an example and its reversion. Here is another example. The content and tone of these edits tend to be a bit more informal than usual, and these have become slightly more frequent lately (the remaining ones that weren't reverted).

If this isn't WP:COPYVIO, this has a potential risk of being WP:CIRCULAR. Jisho.org is a site that aggregates information from different dictionaries to present a user-friendly display. It's like trying to cite Google.com. Perhaps one day it might even source information from Wiktionary itself, which would have us copying from our own mirror. Don't get me wrong, I use Jisho.org a lot to help me study Japanese. The thing is that it, like any tertiary non-expert source, needs to be cross-referenced and I do that with AnkiWeb and Google Translate. Before submitting to Wiktionary I go further and cross-reference against Yahoo Chiebukuro, DeepL, HiNative, and the underlying dictionaries that Jisho.org displays from. This at least resolves the copyright concerns especially with the restrictive EDRDG license.

If one does not want to look through all those sources, then one could at least cite the dictionaries that Jisho.org uses. That site states that it uses "the JMdict, Kanjidic2, JMnedict and Radkfile dictionary files". Those appear professional and are primary/secondary expert sources which are acceptable. Nippon Jisho is a different source that's probably fine, but Jisho.org isn't citable. The users adding these seem to be well-intentioned although beginner or intermediate Japanese students. Advanced students know how to consult a wide variety of sources like Japanese-Japanese dictionaries. If there is a warning before entering Jisho.org in the article bodies or edit summary, sources will be more critically examined and the quality of edits should improve. Therefore, I propose adding Jisho.org to the abuse filter. Daniel.z.tg (talk) 12:00, 10 December 2022 (UTC)[reply]

@Daniel.z.tg: Thank you for the post. I wholly agree that Jisho.org should not be usable as a reference. I am not sure how an abuse filter would prevent this, however, and I defer to the other editors who maintain the filters. ‑‑ Eiríkr Útlendi │^{Tala við mig} 00:20, 13 December 2022 (UTC)[reply]

Pinging @Fish bowl, Eirikr: Daniel.z.tg (talk) 19:37, 18 December 2022 (UTC)[reply]

Narrow IPA norms for English

Let's try to make a list/table of how things should be represented in narrow IPA for GenAm (and British if possible). Appendix:English pronunciation already has a few notes, e.g. that word-initial /p t tʃ k/ are aspirated [pʰ tʰ tʃʰ kʰ], but we should try to cover as much as possible: "narrow IPA for morpheme-final /e/ (day, gayly) should be [___] while narrow IPA for /e/ before same-morpheme /l/ (gale-y) should be [___], [___]", etc, etc; then we could make an effort to add (consistent) narrow IPA to entries more routinely.
I figure, if we routinely have narrow IPA covering flapping, aspiration, dark L, vowel allophones, etc, it'll address some of the concern that we make it seem like certain things have the same vowels or consonants when they actually (allophonically) differ, while the broad IPA stays phonemic. But I figure we should establish agreed-on notations, not just encourage everyone to add whatever narrow IPA seems right to them, because recent discussions amply demonstrate that people are often both confident and mistaken in their assessments of what the typical GenAm (etc) pronunciation of something is. So, what norms can you think of for narrow IPA notations of GenAm, British, etc; e.g., in what situations is /u/ one thing and when is it another? - -sche (discuss) 20:57, 10 December 2022 (UTC)[reply]
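
To make the idea of agreed-on notations concrete, here is a rough Python sketch of the one rule already cited above from Appendix:English pronunciation (word-initial /p t tʃ k/ become [pʰ tʰ tʃʰ kʰ]). Everything else in it is an assumption for illustration: the function name is made up, only the first segment of a single word is handled, and none of the other allophonic rules (flapping, dark L, vowel allophones) is attempted.

```python
# Onsets that the appendix says are aspirated word-initially.
# The affricate spellings come first so that "tʃ" is not matched as "t".
ASPIRATED_ONSETS = ("t͡ʃ", "tʃ", "p", "t", "k")
STRESS_MARKS = "ˈˌ"


def aspirate_word_initial(broad: str) -> str:
    """Turn a broad /.../ transcription of one word into a narrow [...]
    one, applying only the word-initial aspiration rule."""
    body = broad.strip("/")
    # Keep any leading stress mark in front of the first segment.
    prefix = ""
    while body and body[0] in STRESS_MARKS:
        prefix += body[0]
        body = body[1:]
    for onset in ASPIRATED_ONSETS:
        if body.startswith(onset):
            body = onset + "ʰ" + body[len(onset):]
            break
    return "[" + prefix + body + "]"
```

A word starting with /sp/, such as /ˈspɪn/, comes out unchanged apart from the brackets, because the stop is not word-initial there.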

Template:hcol, Template:hrow, Template:zcol, Template:zcol+, Template:acol, Template:topx, Template:topx+, , Template:exp-topx, etc.

What a mess. @Useigor Why have you created such a profusion of row/column templates when we already have {{col}} and variants such as {{col2}}, {{col3}}, etc. as well as {{top2}}, {{top3}}, etc.? Can you please explain what they accomplish that the existing templates don't? We need to clean this up, and I am going to undo all your changes unless there is a good reason for them and a clear plan to clean them up. Thanks. Benwing2 (talk) 01:59, 12 December 2022 (UTC)[reply]

Happy birthday Wiktionary!

Apparently it's our 20th birthday today. We should rename ourselves "Wikintionary" for the day, or week... This, that and the other (talk) 06:48, 12 December 2022 (UTC)[reply]

Congrats! That's a major milestone in a lifetime! And that joke deserves a round of applause too!

Noé 08:43, 12 December 2022 (UTC)[reply]

Here's to another 20 years! Vininn126 (talk) 10:23, 12 December 2022 (UTC)[reply]

This makes me feel old. - TheDaveRoss 13:23, 13 December 2022 (UTC)[reply]

Reminder to provide feedback on the Movement Charter content

Hi all,

We are in the middle of the community consultation period on the three draft sections of the Movement Charter: Preamble, Values & Principles, and Roles & Responsibilities (statement of intent). The community consultation period will last until December 18, 2022. The Movement Charter Drafting Committee (MCDC) encourages everyone who is interested in the governance of the Wikimedia movement to share their thoughts and opinions on the draft content of the Charter.

How to share your feedback?

Interested people can share their feedback via different channels provided below:

Fill out a survey (optional and anonymous, accessible in different languages)
Share your thoughts and feedback on the Meta Talk pages:
- Preamble
- Values & Principles
- Roles & Responsibilities (statement of intent)
Share your thoughts and feedback on the MS Forum:
- Preamble
- Values & Principles
- Roles & Responsibilities (statement of intent)
Send an email to: movementcharter@wikimedia.org, if you have other feedback to the MCDC.

If you want to help include your community in the consultation period, you are encouraged to become a Movement Charter Ambassador. Please find out more about it here.

Thank you for your participation!

On behalf of the Movement Charter Drafting Committee Mervat (WMF) (talk) 13:00, 12 December 2022 (UTC)[reply]

Links to reflexive Polish verbs

What prompts my enquiry is that the 3rd person plural verb form "podobają" page shows its "inflection of" link as "podobać się" rather than "podobać", in that case resulting in a red link. (The same thing applies to spodoba/spodobać się/spodobać).

Looking through the reflexive verbs category for some (apparently rare) similar examples, I notice that the synonyms listed on the "pojawiać" page are shown on the page as "ukazywać się" and "zjawiać się", but are linked within the "inflection of" template to the "ukazywać" and "zjawiać" pages. On the other hand, within the 3rd person singular form "pojawia" page, the "inflection of" link "pojawiać się" links to the pojawiać page via a redirect (presumably because that page was originally titled "pojawiać się"?).

My question is - are those "verb+się" links now optional (just used occasionally to specifically point out the reflexive part of a verb)? Or are there any specific/changed rules? For example, is it best to leave that "podobać się" link visible and link it to the "podobać" page within the "inflection of" template? Or is it now best to completely remove "-się" from the link? Thanks. DaveyLiverpool (talk) 14:46, 12 December 2022 (UTC)[reply]

At one point the pagename included się, but instead we opted to have it be a label, considering the Polish reflexive word is a mobile particle (as opposed to anchored to the word, like Russian). Most pages were completely switched - but I guess nonlemmas were skipped. Perhaps a bot owner would be willing to help...

On occasion hard redirect pages are made so that interwiki linking can work. Vininn126 (talk) 15:01, 12 December 2022 (UTC)[reply]

IMO (a) we should eliminate the hard redirects, (b) if the choice was made to lemmatize reflexive-only verbs at their non-reflexive equivalent, the non-lemma forms should point to the non-reflexive equivalent. podobają is not the third person plural present of podobać się; that would presumably be 'podobają się'. Benwing2 (talk) 07:51, 13 December 2022 (UTC)[reply]

I agree to b, but I don't get a. What about interwiki linking? Vininn126 (talk) 09:51, 13 December 2022 (UTC)[reply]

@Vininn126 Hmm. Are you saying that Polish Wiktionary lemmatizes reflexive-only verbs at their reflexive version? (IMO this is actually the correct thing to do, and it's how Spanish, Portuguese and I think Bulgarian currently work. If the verb is reflexive-only, it's lemmatized at the reflexive version, otherwise at the non-reflexive version with a 'reflexive' tag.) In that case we should keep redirects only for the reflexive-only terms and make them soft redirects using {{reflexive of}}. Benwing2 (talk) 03:22, 14 December 2022 (UTC)[reply]

What I'm saying is en.wiktionary will take a reflexive verb like bać się and set the page name to bać. Nonlemmas such as "bałaś się" should be set to "bałaś", but the lemma should at least have a redirect set because on pl.wikt they have bać się. If there is a verb that is both reflexive and non-reflexive we set up no such redirect. Vininn126 (talk) 10:21, 14 December 2022 (UTC)[reply]
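
The page-name convention described above (drop a trailing się, so bać się lands at bać and bałaś się at bałaś) could be sketched as below for a bot doing the non-lemma cleanup. This is only an illustration: `target_page` is a hypothetical helper, and it assumes the particle is always written last, as in the dictionary headwords mentioned in this thread.

```python
REFLEXIVE_PARTICLE = "się"


def target_page(form: str) -> str:
    """Return the page name for a Polish verb form, dropping a trailing
    reflexive particle if one is present."""
    words = form.split()
    # Only strip the particle when something is left over as a page name.
    if len(words) > 1 and words[-1] == REFLEXIVE_PARTICLE:
        words = words[:-1]
    return " ".join(words)
```

Forms without the particle are returned unchanged, so the same helper can be run over every link without first checking whether the verb is reflexive.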

@Vininn126 Right, that is fine with me although we should use soft rather than hard redirects; hard redirects are generally dispreferred in Wiktionary. Again I'd prefer to lemmatize reflexive-only verbs with the reflexive particle in the pagename but the consensus of the Polish editors should take precedence. Benwing2 (talk) 01:35, 15 December 2022 (UTC)[reply]

@Hythonia @BigDom @KamiruPL Any thoughts as to whether reflexive only verbs should be lemmatized with się in the pagename? Where was the previous discussion on this? Vininn126 (talk) 13:47, 15 December 2022 (UTC)[reply]

I'd support that, yeah. I'm a fan of Polish Wiktionary's solution. I don't recall taking part in a discussion about it here, we definitely talked about it on the Wiktionary discord and the idea of lemmatizing reflexive-only verbs with the particle się was emphatically put in the "we should talk about this someday" box. Which I'm glad someone's finally opened. Hythonia (talk) 17:01, 15 December 2022 (UTC)[reply]

I'd be fine with this, I suppose. I'd also like to make it more clear on pages with another reflexive meaning, because as it stands now it's just a label which someone can very easily miss. What if on pages where there are transitive and reflexive meanings we have a second headword with a head printed |head=verb się? Or should it be in the label, like I've seen before? Vininn126 (talk) 17:27, 15 December 2022 (UTC)[reply]

@Vininn126 That's an interesting solution. My only potential concern would be if it is common for a given reflexive meaning of a verb to also exist in non-reflexive form (e.g. in Portuguese, for many verbs the reflexive particle is optional). If so, this would potentially lead to duplication of information between the reflexive and non-reflexive headers. Otherwise it seems like a good idea. Benwing2 (talk) 07:18, 16 December 2022 (UTC)[reply]

Also is the reflexive particle uniformly a separate word 'się' placed after the verb in all inflections of the verb? If not, and it varies depending on the specific inflected form, it would IMO make sense to have a separate reflexive conjugation table (we do this in Portuguese, for example, where depending on the particular inflection the reflexive particle is 'me', 'te', 'se', 'nos' or 'vos' and variously goes before, after or in the middle of the verb; see lembrar for an example). Benwing2 (talk) 07:21, 16 December 2022 (UTC)[reply]

@Benwing2 The reflexive particle is basically always required - the only time it isn't is when there's another reflexive verb in the clause and the particle is already there. The particle is mobile but dictionaries already always print it after the verb in their headwords. Vininn126 (talk) 10:22, 16 December 2022 (UTC)[reply]

I think it might be a better idea to have a parameter in the headword for reflexive that does what I am suggesting? Vininn126 (talk) 10:23, 16 December 2022 (UTC)[reply]

@Benwing2: "My only potential concern would be if it is common for a given reflexive meaning of a verb to also exist in non-reflexive form." For clarity, this happens for a couple of words in Polish (see przyzostać), but it's not at all a common occurrence. As Vininn said, the particle is mobile, but generally when considering a form separately, outside of the context of a sentence it's always placed after the verb. Hythonia (talk) 11:45, 17 December 2022 (UTC)[reply]

In that case would two labels be better, like here, or to modify the head? Vininn126 (talk) 12:16, 17 December 2022 (UTC)[reply]

For non-lemma pages, then, (until any decisions are made around possible changes to lemma pages) to keep it simple would it be appropriate to just add się in brackets (or something similar) where there is at least one such reflexive form at the lemma page? - See podoba#Polish. That way it doesn't matter if the verb is reflexive-only or there is a mix of reflexive and non-reflexive forms at the lemma page - it still shows the user at a glance that there is at least one such reflexive use. DaveyLiverpool (talk) 21:20, 18 December 2022 (UTC)[reply]

Community Wishlist Survey 2023 opens in January

Please help translate to your language

(There is a translatable version of this message on MetaWiki)

Hello

The Community Wishlist Survey (CWS) 2023, which lets contributors propose and vote for tools and improvements, starts next month on Monday, 23 January 2023, at 18:00 UTC and will continue annually.

We are inviting you to share your ideas for technical improvements to our tools and platforms. Long experience in editing or technical skills is not required. If you have ever used our software and thought of an idea to improve it, this is the place to come share those ideas!

The dates for the phases of the Survey will be as follows:

Phase 1: Submit, discuss, and revise proposals – Monday, Jan 23, 2023 to Sunday, Feb 6, 2023
Phase 2: WMF/Community Tech reviews and organizes proposals – Monday, Jan 30, 2023 to Friday, Feb 10, 2023
Phase 3: Vote on proposals – Friday, Feb 10, 2023 to Friday, Feb 24, 2023
Phase 4: Results posted – Tuesday, Feb 28, 2023

If you want to start writing out your ideas ahead of the Survey, you can start thinking about your proposals and draft them in the CWS sandbox.

We are grateful to all who participated last year. See you in January 2023!

Thank you! Community Tech, STei (WMF) 16:44, 15 December 2022 (UTC)[reply]

Oghuz language

We should add Oghuz for the Oghuz language of Kashgarî's Divanü Lügatit Türk. (See Middle Turkic languages). We have qwm "Kipchak", but we don't have Oghuz. DLT really has huge Oghuz data that should be improved in Wiktionary. trk-ogz or ogz are the possible codes for it. BurakD53 (talk) 17:28, 15 December 2022 (UTC)[reply]

How is it different from Category:Proto-Oghuz? Vahag (talk) 17:37, 15 December 2022 (UTC)[reply]

I believe Proto-Oghuz must be a reconstructed language. BurakD53 (talk) 18:39, 15 December 2022 (UTC)[reply]

But what is the relation between the family of all Oghuz languages, their common ancestor Proto-Oghuz, and this language you say we should add? Does it have an ISO language code? --Lambiam 06:11, 16 December 2022 (UTC)[reply]

If Proto-Oghuz is okay, I will enter Proto-Oghuz lemmas. Both Proto-Oghuz and Oghuz don't have an ISO language code. BurakD53 (talk) 06:32, 16 December 2022 (UTC)[reply]

Possibly this user is using "Oghuz language" to refer to Old Anatolian Turkish. If so, this already has a code in Wiktionary. Benwing2 (talk) 07:14, 16 December 2022 (UTC)[reply]

OK. I'll count it in OAT. Thanks. BurakD53 (talk) 08:44, 16 December 2022 (UTC)[reply]

Note that the language code is trk-oat, and that we use the Ottoman Turkish variety of the Perso-Arabic script, so if we had an entry for the proper noun Oghuz, it should be found at Old Anatolian Turkish اغوز. --Lambiam 12:10, 17 December 2022 (UTC)[reply]

@Lambiam According to Wikipedia, short vowel diacritics were used, so it might actually have been Old Anatolian Turkish اُغُوز or similar; you'd have to go by what the actual sources said. Benwing2 (talk) 19:41, 17 December 2022 (UTC)[reply]

Or perhaps ٱغُز; see Oghuz Turks. But indeed, we must go by the sources. --Lambiam 21:45, 17 December 2022 (UTC)[reply]

@Benwing2, Lambiam: Wikipedia’s presentation is misleading there. The Ottoman literary language was radically developing across that timespan to which we ascribe Old Anatolian Turkish — while at the beginning it had this diacritic and inplene vowel writing, although this may be reflective of the available texts regularly consisting of poetry, at the end, and this means the whole of the 15th century (when Azerbaijani was not a distinct language and when the Ottomans became prominent on the scene of the world), it went out the Ottoman spelling. (At the same time some orthography habits in Persian and Arabic that we know had developed, such as even the death of rasm; hand-writing during various centuries in the Middle Ages exposes lots of difference (that in the case of Persian and Arabic text editions, undiplomatic as they be, we like to even out in favour of most modern distinctions, while for (Old) Ottoman we ask the same weird questions as for editions of Middle Dutch).) Fay Freak (talk) 08:40, 20 December 2022 (UTC)[reply]

Bragging edit

According to Category:English lemmas, we're closing in on 700,000 English words. Obviously we need to boast about this. Time to get our Twitter, Facebook, Instagram and Tinder pages updated. And perhaps the world's major publications would like to run front-page stories about this too... Are we actually the biggest dictionary in the universe? Flackofnubs (talk) 16:01, 16 December 2022 (UTC)[reply]

Don't tell the press about us, we'll get cancelled. Equinox ◑ 16:13, 16 December 2022 (UTC)[reply]

Quoting a passage with multiple lines edit

What's the best way to quote a multi-line passage like the one found on 100s & 1000s using {{quote-book}}? JeffDoozan (talk) 02:00, 17 December 2022 (UTC)[reply]

I think Wiktionary:Quotations#Line breaks explains it pretty well: use a <br> tag if you want to preserve the line break, or a / or ¶ character as appropriate if you don't. The paragraphs in the quotation in the entry were breaking the layout so I've changed it to use <br>. —Al-Muqanna المقنع (talk) 02:11, 17 December 2022 (UTC)[reply]
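For illustration only, the two options described above (keep the breaks with a <br> tag, or mark them inline with a slash) could be sketched as a small helper. The function name and its parameter are hypothetical and not part of any Wiktionary tool; the output would still be pasted into {{quote-book}} by hand.

```python
def join_quote_lines(passage: str, keep_breaks: bool = True) -> str:
    """Join the lines of a multi-line passage into one string for a quotation.

    keep_breaks=True  -> lines are separated by a <br> tag (line breaks shown)
    keep_breaks=False -> lines are separated by " / " (line breaks only marked)
    """
    # Drop empty lines and surrounding whitespace so blank paragraphs
    # in the source do not end up breaking the layout.
    lines = [line.strip() for line in passage.splitlines() if line.strip()]
    separator = "<br>" if keep_breaks else " / "
    return separator.join(lines)
```

As noted further down in the thread, the slash variant is ambiguous when the quoted text itself contains a slash, so the <br> variant may be the safer default in such cases.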

I greatly prefer using a character (usually "/" or, actually, / ) to indicate the line breaks without having the passage risk pushing other content off the screen too much of the time. DCDuring (talk) 02:21, 17 December 2022 (UTC)[reply]

I think that's fair generally, I figured in the case of a recipe like at this entry it would seem a bit odd though I wouldn't protest either way. —Al-Muqanna المقنع (talk) 02:23, 17 December 2022 (UTC)[reply]

I personally prefer newlines, although since seeing a discussion where someone complained about them I've switched to using slashes (in most cases). One problem is that it's not ideal in situations where the text itself contains a slash, e.g. see the 2020 citation on Citations:tiki torcher I added earlier today. 98.170.164.88 02:25, 17 December 2022 (UTC)[reply]

</p><p> also works (see, e.g., the quotation on Appendix:Gestures/breasts). Whoop whoop pull up ^{Bitching Betty ⚧️ Averted crashes} 03:18, 17 December 2022 (UTC)[reply]

Wikimedia Sound Logo Voting: Final days! edit

Hello (:

The Wikimedia Sound Logo contest has presented the 10 finalists, out of 3,000 submissions from 135 countries.
Play a part and help us decide what the Sum of All Human Knowledge sounds like!

The voting is open until 19 December 2022, 23:59 UTC.
Check the info on how to vote on Wikimedia Commons; or about the contest on the project's page on Meta-Wiki.

Best,
CalliandraDysantha-WMF (talk) 02:04, 18 December 2022 (UTC)[reply]

Macro-English languages, Macro-French languages, etc. edit

@Theknightwho I'm sorry but this absolutely needs to be discussed before implementing. You can't just create a bunch of new, questionable macro-families in Module:families/data with no prior discussion. Please revert pending consensus. Benwing2 (talk) 05:23, 19 December 2022 (UTC)[reply]

@Benwing2 I assumed these would be uncontroversial, as they're almost exactly what we already had: collections of languages with English, French etc as a common ancestor. The reason for doing it this way is because there was no way to create subfamilies within the descendants without having a main family that encompassed all of them. Theknightwho (talk) 05:29, 19 December 2022 (UTC)[reply]

Maybe the naming was inspired by Glottolog, which has Macro-English and Macro-French as families. I can't say I'm familiar with these terms, though, and I couldn't find any scholarly works that use them (except in a critique of Glottolog's hierarchization). 70.172.194.25 05:30, 19 December 2022 (UTC)[reply]

@Theknightwho I think the IP hits the nail on the head; these sorts of families aren't accepted practice and this is a big change in the way that families are structured (which is definitely a potentially controversial issue, and our current scheme has evolved gradually through lots of discussions). I also don't quite understand what "there was no way to create subfamilies within the descendants without having a main family that encompassed all of them" means; is this a technical limitation? Benwing2 (talk) 05:38, 19 December 2022 (UTC)[reply]

I should also add, this introduces hundreds or thousands of new categories "Foo terms derived from Gallo-Romance languages", "Foo terms derived from Macro-Portuguese languages", etc. Benwing2 (talk) 05:40, 19 December 2022 (UTC)[reply]

@Benwing2 Yes - it does seem to be a technical limitation, and yes, the IP is correct as to where the naming scheme came from. There are three: Macro-English, Macro-French and Macro-Portuguese.

The main issue is that it isn't possible to set ancestors of families - you can only give them proto-languages. If a proto-language is attached to multiple families, it doesn't seem to work properly, as it groups every descendant language under one of those families (including those we don't want to be in any subfamily) while leaving the other subfamilies empty. Theknightwho (talk) 05:41, 19 December 2022 (UTC)[reply]

@Theknightwho The way to deal with a technical limitation is never to work around it like this, but to solve it properly; first make a proposal and circulate it (you agreed to do this with future module changes, I think) and then implement it. Benwing2 (talk) 06:10, 19 December 2022 (UTC)[reply]

Please don't make any more changes to Module:families/data while this discussion is happening. I have already asked you to revert. Benwing2 (talk) 06:30, 19 December 2022 (UTC)[reply]

@Theknightwho ^^^^ Benwing2 (talk) 06:31, 19 December 2022 (UTC)[reply]

@Benwing2 I needed to stabilise the changes I was in the middle of, because otherwise modules become incompatible with each other. The Indo-Aryan languages that I was just sorting out had a straight-up broken structure, so it needed doing.

On the macro-languages, I am fine with getting rid of them at the earliest opportunity, but undoing that will break the subfamily structure that I've created in the creole descendants. Would it not be preferable to find a technical solution first, and to then implement that, rather than just getting rid of everything? Theknightwho (talk) 06:40, 19 December 2022 (UTC)[reply]

@Theknightwho Once again it feels to me like you are trying to claim your changes made without consensus are a fait accompli and that it would be too much trouble to back the changes out. After the last time this happened, you agreed to shop around any significant module changes before making them, but then went ahead and did this without any prior discussion. So no I don't think it's preferable to keep these changes in place while we figure out what the right thing to do is, because (a) I don't agree with all these newly-added intermediate families, and I suspect many others don't like them either, because many of them don't reflect any sort of linguistic consensus; (b) the right solution is not likely to involve such intermediate families in any case; (c) the creole descendants can be made descendants of the acrolect (I think that's how they have been done up till now in any case), or you can just wait for the technical solution to be worked out. In sum I really think you should back out your changes *before* we work out the proper solution. In this case I'll wait for you to do that but if a situation like this recurs, it's likely I will back them out myself, no questions asked. Benwing2 (talk) 09:22, 19 December 2022 (UTC)[reply]

@Benwing2 You're raising separate issues here:

If the problem is just the macro-language families mentioned in the title, then I simply disagree that they add anything meaningful to the structure. None of us want to keep them, in any event. Of course the creoles can be made descendants of the acrolect, but that also entails removing all of the sub-structure.
If the issue is the fact that there are additional language families at all, then that's a separate issue entirely, and obviously it's fine for us to have a discussion about them. However, I just don't agree with you that it's appropriate to list 30 creole languages under English without any further organisational structure.
What we agreed in relation to modules related to potential technical issues, not linguistic content.

Theknightwho (talk) 09:46, 19 December 2022 (UTC)[reply]

@Benwing2 I've reverted these, but we're already seeing problems at pages such as जस, caused by the fact that Ardhamagadhi Prakrit is both a full language (pka) and an etymology-only language (inc-pka), but the full language is (wrongly) not set as ancestral. In my changes, I made sure it was, and corrected about 100 of these pages because ~~we obviously want to deprecate the etym-only code~~. Apparently we want to deprecate the language codes, so I'll switch them. In any event, the current situation is a problem. There are seven of these duplicate codes (that I know of):

Ardhamagadhi Prakrit: pka / inc-pka
Helu: elu-prk / inc-elu
Khasa Prakrit: inc-kha / inc-khs
Magadhi Prakrit: inc-mgd / inc-pmg
Maharastri Prakrit: pmh / inc-pmh
Paisaci Prakrit: inc-psc / inc-psi
Sauraseni Prakrit: psu / inc-pse

Theknightwho (talk) 10:30, 19 December 2022 (UTC)[reply]
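As a rough aid for anyone cleaning these up, the seven pairs listed above can be put into a lookup table so that entries using both codes of a pair can be flagged. This is only a sketch: the function names are made up for illustration, and it deliberately does not say which code of each pair should survive, since that is what is being decided here.

```python
# Pairs of duplicate codes, copied from the list in the post above.
DUPLICATE_CODES = {
    "Ardhamagadhi Prakrit": ("pka", "inc-pka"),
    "Helu": ("elu-prk", "inc-elu"),
    "Khasa Prakrit": ("inc-kha", "inc-khs"),
    "Magadhi Prakrit": ("inc-mgd", "inc-pmg"),
    "Maharastri Prakrit": ("pmh", "inc-pmh"),
    "Paisaci Prakrit": ("inc-psc", "inc-psi"),
    "Sauraseni Prakrit": ("psu", "inc-pse"),
}


def build_counterparts(pairs):
    """Map every code to the other code of its pair (in both directions)."""
    counterparts = {}
    for first, second in pairs.values():
        counterparts[first] = second
        counterparts[second] = first
    return counterparts


COUNTERPART = build_counterparts(DUPLICATE_CODES)


def find_duplicate_usages(codes_used):
    """Return the varieties for which BOTH codes occur in codes_used."""
    used = set(codes_used)
    return [
        name
        for name, (first, second) in DUPLICATE_CODES.items()
        if first in used and second in used
    ]
```

For example, feeding it the set of language codes found on one page would show whether that page mixes the two codes of the same variety.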

@Theknightwho Thank you very much for reverting. Apologies for not responding yesterday, some RL issues came up. I'm not at all opposed to solving the Prakrit issues, it's just that they were mixed in with a bunch of other changes. For those issues in particular, the Indo-Aryan language community made a decision to switch from having separate language codes for the various Prakrit varieties to having a single language "Prakrit" with etymology-only variants. I had no part in this and I don't know enough about the linguistic situation with the various Prakrits to judge whether this was the right decision, but I know the change was made piecemeal once it was decided on, and I'm not surprised this left a mess in some places. From my perspective please feel free to fix it, but do ping the Indo-Aryan editors e.g. (Notifying AryamanA, Kutchkutch, Bhagadatta, Inqilābī, Msasag, Svartava, RichardW57): . Let me read the rest of what you wrote and I will respond to it. Benwing2 (talk) 01:59, 21 December 2022 (UTC)[reply]

@Theknightwho So here are my thoughts:

(1) Some of the changes you made seem uncontroversial, such as adding a Gallo-Romance node; not sure why this wasn't already there.
(2) Some of them seem problematic, like the Macro-{English,French,Portuguese} families.
(3) Some of them I'm not sure of in terms of how well they are supported, such as West Scandinavian.

Overall I'm not sure what all changes were made, and I think my main concern is that you made a whole bunch of changes with types (1), (2) and (3) just mentioned all lumped together. I would suggest the following: Make a summary here of the changes involved. Some users have been heavily involved in adding new languages and families and ensuring they're properly classified, e.g. I think User:-sche, and they should be able to comment more. If no one objects after a week or so, go ahead and make the type (1) and type (3) changes. Meanwhile we can try to figure out how to handle the various creole languages. Overall I don't see a problem listing them directly under English/French/Portuguese (after all they are in fact descendants of those languages, maybe of versions from 200-300 years ago rather than present-day versions, but those still count as English etc.; similar to how Afrikaans is a descendant of Dutch). As for adding structure to the creole languages, I don't object to that but we should tread carefully as there are probably scholarly disagreements as to how to classify them. Benwing2 (talk) 06:44, 22 December 2022 (UTC)[reply]

Arabic transliterations: let's use ʔ and ʕ instead of ʾ and ʿ. edit

(Notifying Atitarev, Benwing2, Mahmudmasri, Metaknowledge, Wikitiki89, Erutuon, ZxxZxxZ, عربي-٣١, Fay Freak, AdrianAbdulBaha, Assem Khidhr, Fenakhay, Fixmaster, M. I. Wright, Roger.M.Williams, Zhnka): — Sartma ^{【𒁾𒁉 ● 𒊭 𒌑𒊑𒀉𒁲】} 23:35, 19 December 2022 (UTC)[reply]

It looks like this suggestion may pass. I will, of course, respect the decision but it still seems wrong. No transliteration is perfect and liked by everybody and we may kiss goodbye to being found by standard searches when someone tries to find عِلْم (ʕilm) while using "ʿilm" or عَرَبِيّ (ʕarabiyy) by using ʿarabiyy. Anatoli T. ^{(обсудить}/^вклад) 22:47, 20 December 2022 (UTC)[reply]

@Atitarev: I would still argue that searches by transliterations are not the main reason for having transliterations on Wiktionary anyway, just a side effect. If that was the case, we should at least give "chat Arabic" too, and possibly other common transliteration systems. If we wanted to have searchable Arabic romanization, we should do what we do for Chinese, Japanese and any other language here that has Romanization entries. — Sartma ^{【𒁾𒁉 ● 𒊭 𒌑𒊑𒀉𒁲】} 00:17, 21 December 2022 (UTC)[reply]

There are various diacritics and symbols that may be disliked and seem difficult to see. I don't quite believe that ʾ and ʿ are that hard to see or distinguish. If I, for example, have trouble seeing any symbols, I use glasses or make the display larger. Besides, options to make symbols larger or clearer have been suggested in this thread. I haven't seen any prior discussions to discard some symbols based on their visibility.

If we abandon these two symbols, there is no point to sticking to the current standard after that. What's the point of keeping those ṣ, ṯ, ḵ, ḥ, ā, ḡ, etc. if the resulting strings are no longer standard, anyway?

Multiple transliteration systems are added to the languages you mentioned in pronunciation sections. Arabic has a very complex inflection and transliterations appear in the inflection tables, and many Arabic entries lack pronunciation sections, so adding a single place with alt. transliterations won't make a lot of difference, IMO. Anatoli T. ^{(обсудить}/^вклад) 00:43, 21 December 2022 (UTC)[reply]

@Atitarev: the point is: ء (ʾ) and ع (ʿ) are not just "diacritics" or "symbols". They are full-fat proper letters. You can't just downplay everything saying "put on glasses or make everything bigger if your eyes aren't good enough!". My eyes are good enough to read all other letters in an Arabic transliteration, BUT THOSE TWO, which for some reasons that are inexplicable to me, have been transliterated with diacritics, instead of full-on grown-ass proper letters. I'm glad you kept calling them "diacritics" or "symbols", because that means it's clear to you too that those two signs are not letters. We're asking for something completely reasonable here. I understand it might be uneasy to move away from the status quō, and it might have its technical difficulties, but the reasons to do so are extremely solid and worth the uneasiness. We're not proposing this on a whim. Most people who didn't think twice to support this proposal are editors who work with Semitic languages every day and have had enough of those two weird apostrophes, and for a very long time now. And I'm sure our readers/users would thank us immensely, too.

As for abandoning the standard, I'm actually more than happy to discuss other letters too, especially in the direction of a Common Semitic Transliteration System. But. We're just proposing to modify the standard for two letters here, and for extremely very good reasons. We're tweaking the standard (to improve it), we are not abandoning it. There's not just two poles: the standard and chaos. Modifying the standard doesn't make us suddenly fall into chaos. — Sartma ^{【𒁾𒁉 ● 𒊭 𒌑𒊑𒀉𒁲】} 10:25, 21 December 2022 (UTC)[reply]

(Notifying Atitarev, Benwing2, Mahmudmasri, Metaknowledge, Wikitiki89, Erutuon, ZxxZxxZ, عربي-٣١, Fay Freak, AdrianAbdulBaha, Assem Khidhr, Fenakhay, Fixmaster, M. I. Wright, Roger.M.Williams, Zhnka):

Let's just do it. Whoever thought it was a good idea to use ʾ and ʿ to transliterate ء and ع definitely had very good eyes, but for the rest of us normal people, those two signs are just hell to tell apart. Plus, they give the impression those are not "real" consonants, but just little more than an apostrophe. Let's give consonantal dignity back to 2alif and 3ayn! Let's give a break to our users and their poor eyes. Let's vote for what makes sense! Let's vote for ʔ and ʕ in Arabic transliterations! — Sartma ^{【𒁾𒁉 ● 𒊭 𒌑𒊑𒀉𒁲】} 21:27, 19 December 2022 (UTC)[reply]

Support. Andrew Sheedy (talk) 23:04, 19 December 2022 (UTC)[reply]

I'm striking my support, because I think Anatoli's point about people potentially searching using standard transliteration is a good one. I won't oppose, however, because I think there are still good reasons for making this change. Andrew Sheedy (talk) 23:25, 20 December 2022 (UTC)[reply]

Oppose. Why not use numbers, 2's and 3's like in your post and switch to chat Arabic, LOL? The current system was not made by people who wanted to make life harder; it's based on the Hans Wehr dictionary. Romanization of Arabic shows other systems; none of them, apart from IPA, uses ʔ and ʕ.

BTW, no pings were sent, since pings and signature were in different edits. I read this topic accidentally. --Anatoli T. ^{(обсудить}/^вклад) 23:17, 19 December 2022 (UTC)[reply]

@Atitarev: I wouldn't use numbers because they are not linguistically "neutral", they're heavily marked as "popular" and are basically just an expedient one has to resort to when typing in Arabic is not possible. ʔ and ʕ are a totally different question. They are a legit substitute for ʾ and ʿ, and as a matter of fact we use them already on Wiktionary for all other Arabic dialects. Incidentally, they are pretty much the same symbols as ʾ and ʿ, they're just bigger and look more like a proper letter (I suppose the vertical bit is there to avoid confusion with /ɔ/ and /c/?). I can't imagine anyone who studies Arabic being confused or disoriented by them. — Sartma ^{【𒁾𒁉 ● 𒊭 𒌑𒊑𒀉𒁲】} 23:51, 19 December 2022 (UTC)[reply]

Comment I agree with Sartma that the current accessibility is terrible. I personally find these symbols extremely hard to distinguish. But like Atitarev, I'd prefer to stick to something standard, in part because it will make things easier for people who want to compare our content with other sources, and in part because Google Search doesn't treat them as equivalent, which reduces the chance that people will find us if they want to look up a transliteration in quotation marks. The ideal solution would be to make the half-rings larger. This is technically feasible: all it would require is making a font consisting of two characters, magnified versions of the ayin and alif symbols, and setting that as the font for the CSS selector .ar-Latn or whatever.

Also worth noting that these characters are used in transliterations of various other Semitic languages too, not just Arabic and its dialects. 70.172.194.25 23:27, 19 December 2022 (UTC)[reply]

I do understand that "tradition" is difficult to abandon, but this really is a case of "bad" tradition. We shouldn't be too afraid of improving things. I'm sure more people will thank us than criticise us for that. As for googling transliterations with ʾ or ʿ, I think it's such a remote eventuality, we can quite easily dismiss it. If anyone can type ʾ or ʿ from their keyboard, they will surely be able to type the word in the Arabic alphabet. — Sartma ^{【𒁾𒁉 ● 𒊭 𒌑𒊑𒀉𒁲】} 00:35, 20 December 2022 (UTC)[reply]

I have personally many times encountered transliterated Arabic-language text online (e.g., Google Books OCR, which picks up the half-rings, but lots of other sources too), and searched it, and found the correct Arabic script representation. I don't usually enter those two characters myself, though. So this isn't a remote possibility, but to be fair maybe my workflow isn't typical. 70.172.194.25 00:45, 20 December 2022 (UTC)[reply]

Also compare searching on regular Google for "Muqannaʕ" (3 results), vs. "Muqannaʿ" (5,340 results). And that's for a term that has been covered extensively, for rarer terms you might not find anything by searching the IPA character. 70.172.194.25 00:57, 20 December 2022 (UTC)[reply]

I support your solution on this one - amending the CSS to make these two characters larger is preferable. Theknightwho (talk) 01:04, 20 December 2022 (UTC)[reply]

And Muqanna has 177.000 results... Following your line of reasoning, one would think that we should just get rid of the final ʕayn. The thing is that we don't give transliterations on Wiktionary so that people can find an Arabic word. That is nothing but a side effect. If this really was a feature so critically important that we can't ignore it when deciding how to transliterate Arabic, then we should give all possible transliterations, including, and much more pertinently, things like 3alam, with its 1.390.000 results (compare ʿalam, with just 13.600 results)... — Sartma ^{【𒁾𒁉 ● 𒊭 𒌑𒊑𒀉𒁲】} 01:46, 20 December 2022 (UTC)[reply]

It's still the case that larger half-rings would solve the accessibility problem while being more standard than the IPA characters. Could you please humor me and install this font, and then put in your CSS [lang="ar-Latn"] { font-family: LegibleHalfRings, sans-serif; }? I see this as a compromise that allows us to stay in line with almost every other scholarly source on Semitic languages, while also being accessible to readers. @Theknightwho may also want to test this. The font can be tweaked, of course. 70.172.194.25 03:04, 20 December 2022 (UTC)[reply]

The problem with depending on a particular font is that it's not immediate, a user will have to download it, and that's not something we can expect. ʔ and ʕ don't have that issue. They are already visible to anyone whatever font is used. I wouldn't make this a question of font, but one of letters vs diacritics. ʔ and ʕ are proper letters, ʾ and ʿ are just diacritics. — Sartma ^{【𒁾𒁉 ● 𒊭 𒌑𒊑𒀉𒁲】} 09:28, 20 December 2022 (UTC)[reply]

@Sartma Actually it would be trivial to hardcode such a font as a base64 data URI in a CSS file, and as the TTF is only 2 KB it would not take long to load at all. I haven't checked how large it would be as a WOFF, but usually they're even smaller. 70.172.194.25 03:29, 22 December 2022 (UTC)[reply]

Edit: as a WOFF2, it's only 668 bytes. 70.172.194.25 03:31, 22 December 2022 (UTC)[reply]
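To make the data-URI idea concrete, here is a minimal sketch of how the CSS could be generated from the font file. The family name LegibleHalfRings and the [lang="ar-Latn"] selector are taken from the comments above; the function name, the unicode-range restriction to the two half-ring characters (U+02BE and U+02BF) and the choice of WOFF2 are assumptions for illustration, not what any site stylesheet currently does.

```python
import base64


def font_face_css(font_bytes: bytes,
                  family: str = "LegibleHalfRings",
                  selector: str = '[lang="ar-Latn"]') -> str:
    """Build an @font-face rule that embeds a WOFF2 font as a base64 data URI."""
    encoded = base64.b64encode(font_bytes).decode("ascii")
    return (
        "@font-face {\n"
        f"  font-family: {family};\n"
        f'  src: url("data:font/woff2;base64,{encoded}") format("woff2");\n'
        # Assumption: limit the font to the two half-ring characters so that
        # every other character keeps using the reader's normal font.
        "  unicode-range: U+02BE-02BF;\n"
        "}\n"
        f"{selector} {{ font-family: {family}, sans-serif; }}\n"
    )
```

Calling it with the bytes of the WOFF2 file, e.g. font_face_css(open("font.woff2", "rb").read()) where the file name is a placeholder, would give a block that could be pasted into a user or site stylesheet for testing.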

I tried that user's font on my custom CSS and it worked. So far, I'm testing it. I'd say it's overall better than how I see the ring symbols without it, however, it lacks italic versions which makes it less than optimal when the font is made oblique, which is common in romanization. --Mahmudmasri (talk) 06:21, 22 December 2022 (UTC)[reply]

To be clear, that issue could be remedied. I've actually never made a font before, so I just Googled a tutorial and messed around in Inkscape until I got a workable proof of concept. With a bit more effort it could be perfected. 70.172.194.25 06:45, 22 December 2022 (UTC)[reply]

Are you saying that it's possible to deliver web fonts on-the-fly over Wiktionary? I'm aware of the technology, but I didn't realize MediaWiki was capable of using it. —Soap— 23:47, 24 December 2022 (UTC)[reply]

Yep, it is. And it's even the case that they won't be loaded if they aren't used on a particular page. 70.172.194.25 23:56, 24 December 2022 (UTC)[reply]

We should do that for Hittite, so we can delete that "please download bla-bla font from bla bla" message we have for each and every Hittite entry... And we could also do it for Hebrew, Ugaritic and Classical Greek, to make sure they're always visualised correctly. — Sartma ^{【𒁾𒁉 ● 𒊭 𒌑𒊑𒀉𒁲】} 00:14, 25 December 2022 (UTC)[reply]

That's a good idea. Let's start a new thread about this, maybe on WT:GP? 70.172.194.25 00:20, 25 December 2022 (UTC)[reply]

I'm not confident starting a thread about that, I'm not at all versed in the technicalities. I can contribute suggesting fonts, but I'm afraid that would be pretty much it. Whoever starts the new thread, ping me! — Sartma ^{【𒁾𒁉 ● 𒊭 𒌑𒊑𒀉𒁲】} 22:07, 25 December 2022 (UTC)[reply]

Support. The current transliteration is as bad as using 2 and 3, by not treating them as real CONSONANTS. In Semitic linguistics, both ʔ and ʕ are used. — Fenakhay ^{(حيطي · مساهماتي)} 00:05, 20 December 2022 (UTC)[reply]

(Notifying Atitarev, Benwing2, Mahmudmasri, Metaknowledge, Wikitiki89, Erutuon, ZxxZxxZ, عربي-٣١, Fay Freak, AdrianAbdulBaha, Assem Khidhr, Fixmaster, M. I. Wright, Roger.M.Williams, Zhnka, Sartma): : Repinging as the first one didn't go through. — Fenakhay ^{(حيطي · مساهماتي)} 00:57, 20 December 2022 (UTC)[reply]

Support: It's easier to discern, and most of us, I suppose, would be familiar with the IPA treatment. Fixmaster (talk) 01:15, 20 December 2022 (UTC)[reply]

Since user:Sartma was the first to open the topic, I'll just address him primarily:

The ring characters are used because they are used in the standard schemes we adhered to.
They or similar symbols were obviously chosen by some schemes, because romanized Arabic is supposed to be read by speakers who mostly lack both consonants and very likely approximate them to a glottal stop if they ever utter them.
The scheme on Wiktionary is already a mix, so I see no problem adding to the mix the IPA-based symbols, ⟨ʔ ʕ⟩, which can be typed on smartphones, whereas the ring characters cannot; the Minerva editor lacks any insertion options I know of, anyway.

--Mahmudmasri (talk) 01:42, 20 December 2022 (UTC)[reply]

OP's argument sums it up well, so beyond status quo bias, I support. Fay Freak (talk) 02:02, 20 December 2022 (UTC)[reply]

Comment Don't the Aramaic, Hebrew, and Syriac entries also use the same two symbols (ʾ and ʿ)? I have looked at a few entries in these languages to verify this: they do use ʾ and ʿ, but at ܥܠܡܐ (“world”), the word is transcribed as "ˁālmā", which is curious.

I myself do not mind how these two sounds are transcribed (though, esthetically, I do like the ˁ from that Syriac entry). Nonetheless, I think that changing the transcription symbols for Arabic only would be strange; it would be like changing the representation of the dental consonants only for Arabic. Roger.M.Williams (talk) 04:28, 20 December 2022 (UTC)[reply]

@Roger.M.Williams: The topic of consistency of transliteration between Semitic languages has been discussed in the past, and people always said that "there is no need for all languages to have the same rules". I would agree with you that it would be nice to have inter-linguistic consistency, but apparently only intra-linguistic consistency is a thing on Wiktionary. That's why it's ok for all Arabic dialects to use ʔ and ʕ, while Standard Arabic still has ʾ and ʿ. — Sartma ^{【𒁾𒁉 ● 𒊭 𒌑𒊑𒀉𒁲】} 08:58, 20 December 2022 (UTC)[reply]

@Roger.M.Williams: I still prefer the full ʕ to the superscript ˁ. I have an issue with this constant effort to reduce full consonants to a diacritic or to something "different" from other consonants. ʔ and ʕ are proper consonants, why should they be written as superscripts (ˀ, ˤ) or as diacritics (ʾ, ʿ)? Even from a linguistic point of view it makes no sense. The problem is that we're so used to seeing these two consonants abused and belittled that we think this must be the way. But a different world is indeed possible. — Sartma ^{【𒁾𒁉 ● 𒊭 𒌑𒊑𒀉𒁲】} 09:04, 20 December 2022 (UTC)[reply]

Comment/support Template:ayn and Template:hamza would perhaps be more visible, but they suffer the problem of looking like punctuation marks. I agree though that if we use the IPA letters for Arabic, we should probably also use them for Hebrew and Aramaic. It would certainly be easier to read the languages that way, apart from dyslexia (which I imagine the current convention suffers from too). kwami (talk) 05:47, 20 December 2022 (UTC)[reply]

In order to avoid problems with dyslexia, perhaps the two shouldn't be symmetric. Maybe for glottal stop we could use the lower-case form <ɂ>:

ʔ vs ʕ
ɂ vs ʕ

kwami (talk) 06:02, 20 December 2022 (UTC)[reply]

I think this may start another discussion (or wrangle) about the esthetic qualities of ˀ, ɂ, and ʔ.

In any case, I prefer the first two. Roger.M.Williams (talk) 06:44, 20 December 2022 (UTC)[reply]

@Kwamikagami: I think that can be discussed once the decision to go for something other than ʾ and ʿ has been made. One thing is sure: people with dyslexia would definitely be better off with ʔ and ʕ, if anything because they are bigger and clearer. The symmetry issue wouldn't be any bigger than with existing letters like b/d, p/q, u/n, etc. — Sartma ^{【𒁾𒁉 ● 𒊭 𒌑𒊑𒀉𒁲】} 10:04, 20 December 2022 (UTC)[reply]

Yeah, no biggie either way. I just wanted to bring it up. Full-size ʔ vs ʕ would probably be the most straight-forward to implement. kwami (talk) 18:33, 20 December 2022 (UTC)[reply]

ˁ in Syriac/Aramaic entries is most probably lazily copied over from CAL who themselves may use it for its being conservatively close to the traditional rings while more distinctive. But I suspect its being a “modifier letter” makes it technically illegal for expressing a whole consonant.

ɂ looks nice, and also amenable to European thinking where the voiced pharyngeal fricative may be perceived as more an actual sound than the glottal stop, as well as reflecting the linguistic reality with the drop of this sound in many dialects unlike the voiced pharyngeal fricative.

We forgot that Ethiopic script transcription uses the rings, but luckily I have argued in a previous discussion about the same matter that it can have special status. Fay Freak (talk) 08:18, 20 December 2022 (UTC)[reply]

@Kwamikagami: I proposed unifying transliterations of ʔ and ʕ for all Semitic languages, but back then I was the only one thinking that would have been a good idea... See: Unifying the transliteration of ʾalef and ʿayin in Semitic languages. — Sartma ^{【𒁾𒁉 ● 𒊭 𒌑𒊑𒀉𒁲】} 09:54, 20 December 2022 (UTC)[reply]

Support. No reason to prioritise adhering to what happens to be the most common scientific transcription over overall usability, and the point about being a full consonant is a good one. —Al-Muqannaʕ المقنع (talk) 09:43, 20 December 2022 (UTC)[reply]

Support. We would have to use the IPA letter ʔ, not lowercase ɂ (whose uppercase is Ɂ, different from the IPA letter), because there is no lowercase version of ʕ, unfortunately because I like the idea of using only lowercase letters in transliteration. Apparently no language uses uppercase and lowercase versions of ʕ yet so Unicode hasn't added them. — Eru·tuon 15:29, 20 December 2022 (UTC)[reply]

The Arabic transliteration is strictly lower-case, with no exceptions at Wiktionary, so that's not a problem; it's only Wikipedians who suffer from the notion that they need to capitalise foreign transliterations where it's inappropriate. Anatoli T. ^{(обсудить}/^вклад) 22:34, 20 December 2022 (UTC)[reply]

@Atitarev I am not opposed to this, primarily because the existing ʾ and ʿ are nearly impossible to tell apart at a normal distance (several times I've had to zoom in on the text in order to see what was going on). In addition, in the edit window the two are displayed reversed for some reason, which makes things incredibly difficult. However I do understand the concern about people searching for the transliteration using the ring symbols. I think there's a technical solution to this: insert the old transliteration into the output code inside of an HTML comment or non-displaying span. This should make the correct entry pop up when you search using the "traditional" transliteration.

If we decide to implement this, we should not "just do it", but think it through; lots of entries and other places use manual transliteration, and they will all have to be corrected by bot. Benwing2 (talk) 02:19, 21 December 2022 (UTC)[reply]
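As a very rough idea of what the bot pass over manual transliterations might involve, the sketch below swaps the half rings (U+02BE, U+02BF) for the IPA letters (U+0294, U+0295), but only inside a manual transliteration parameter. The parameter name tr= and the regular expression are assumptions for illustration; a real bot would need proper template parsing and a per-language check, which this does not attempt.

```python
import re

# Half rings -> IPA letters: U+02BE -> U+0294 (hamza), U+02BF -> U+0295 (ayn).
RING_TO_IPA = str.maketrans({"\u02be": "\u0294", "\u02bf": "\u0295"})

# Assumption: manual transliterations sit in a "tr=" template parameter.
TR_PARAM = re.compile(r"(\|\s*tr\s*=)([^|}]*)")


def to_ipa_letters(translit: str) -> str:
    """Replace the two half-ring characters in a transliteration string."""
    return translit.translate(RING_TO_IPA)


def convert_wikitext(wikitext: str) -> str:
    """Convert half rings only inside tr= parameter values, nowhere else."""
    return TR_PARAM.sub(
        lambda m: m.group(1) + to_ipa_letters(m.group(2)), wikitext
    )
```

Restricting the replacement to the parameter value is meant to avoid touching the rings in running text or in other languages' transliterations on the same page.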

@Benwing2: It sounds interesting but I'd like to know more when it comes to it (implementation). Thank you.

As for the display of ʾ and ʿ, there are some suggestions (not sure how good and feasible) about the visual improvements in this topic. Anatoli T. ^{(обсудить}/^вклад) 03:21, 21 December 2022 (UTC)[reply]

This reminds me of some of the ideas floated in Wiktionary:Grease pit/2012/April#How_the_search_feature_works_(and_doesn't_work). I like the idea of outputting some invisible text, especially if we could do it in a way that'd be picked up not only by our internal search function but by Google (would an HTML comment, or an undisplayed {{misspelling}}-style template parameter, be picked up by Google?). Then we could include any common transliteration system, even chat Arabic. - -sche (discuss) 06:33, 21 December 2022 (UTC)[reply]

@-sche: I do like this, too. It would open a world of possibilities, even getting rid of "Romanization" entries for some languages. And adding chat Arabic search would be great too! Let's explore this option! — Sartma ^{【𒁾𒁉 ● 𒊭 𒌑𒊑𒀉𒁲】} 09:58, 21 December 2022 (UTC)[reply]

@Benwing2: Just for the record, when I wrote "let's just do it", I wasn't implying "without thinking it through". I'm very well aware that we need to consider the technicalities of it, but the point was: "it's nothing impossible, so let's not shy away from it and let's make it happen". — Sartma ^{【𒁾𒁉 ● 𒊭 𒌑𒊑𒀉𒁲】} 10:33, 21 December 2022 (UTC)[reply]

@Benwing2: I had the display issue in the edit window on my Mac, too (not on my Windows). It's caused by the fixed-width font set on your browser. Try a different font. On my Mac I use "Menlo" (better than other fixed-width fonts with Greek accents, too), and both ʾ and ʿ are displayed correctly. — Sartma ^{【𒁾𒁉 ● 𒊭 𒌑𒊑𒀉𒁲】} 10:48, 21 December 2022 (UTC)[reply]

For future reference, the Wiktionary search engine picks up text with the display: none; CSS property, but not HTML comments. Tested on my userpage. I don't know what Google does though. — Eru·tuon 21:20, 21 December 2022 (UTC)[reply]

@Sartma Thanks for the info, yes I'm on a Mac in Google Chrome. My fixed width font is Courier. The thing is, though, that many people will run into this issue unless Chrome has changed the fixed width font default in more recent versions, or unless there's a way of overriding the default edit window fixed width font using Wiktionary CSS settings. @Erutuon Thanks for the test; if display-none text is picked up by Google this is probably a good solution. Benwing2 (talk) 03:07, 22 December 2022 (UTC)[reply]

I can't get User:Erutuon/sandbox to show up in a Google search even if I search for the visible text, restrict the search to en.wiktionary.org, etc, so I can't check whether the invisible text is being noticed or not. (This isn't the first time I've searched for a unique string I knew was on a page which I knew Google had indexed [because I could find the page via other searches], but it wouldn't show up when searching for the unique string; I think Google may be being too smart for its own good when hiding results it deems too unimportant and telling you it simply has no results.) So, I've temporarily put some test text in test; if it gets noticed by Google, that'll be great. If not, maybe we can think of other ways we could get alternative transliterations noticed (or just accept that at least having them findable via our internal search is enough). - -sche (discuss) 23:26, 25 December 2022 (UTC)[reply]

@-sche: I think Google might not index user pages so it's not surprising. I should've said something to save you the trouble. However, I do see the test text showing up in Google, so it's confirmed that Google indexes display-none text. — Eru·tuon 17:13, 26 December 2022 (UTC)[reply]

I too see the test text showing up in a google search; that's exciting, this method would work! We could even consider using it for other things, like invisibly mentioning at the end of declension tables various (procedurally generable) Turkish verb forms that someone unfamiliar with Turkish might look up because they didn't know how to decompose them, but that we don't include.
On a technical note, Google does index User:-space (sub)pages, it just seems to classify them — and e.g. little-viewed blogs — as unworthwhile, such that if I search for a string which occurs only on one user-space page or obscure blog, Google will lie and say 'no results', whereas if I search for a string that's on several pages, Google does include the userpage or blog towards the end of the results. E.g., although most of the results for user sche site:en.wiktionary.org are User talk: pages, categories, etc, some of my User:-space (sub)pages do show up ... but if I search for a specific string, even a specific string that Google found in the other search and displayed on the results page like "5th century inscription on a scabbard-mount", Google claims 'no results'. - -sche (discuss) 19:17, 26 December 2022 (UTC)[reply]

@-sche, @Erutuon: This is indeed quite cool. It's a minor thing, but I've been using the apostrophe (') instead of the alif half-circle (ʾ) in Akkadian headwords to make Akkadian normalization searchable on Wiktionary, but with this little trick I should be able to keep the ʾ on headwords, right? For example a'īlu: I could create aʾīlu and hide "a'īlu" in the page to make it searchable, right? — Sartma ^{【𒁾𒁉 ● 𒊭 𒌑𒊑𒀉𒁲】} 22:26, 26 December 2022 (UTC)[reply]

I mean, we should probably have the templates/modules generate these automatically (for every template that is receiving a certain transliteration as tr= input, or that is generating one based on original-script input, also output invisible alternative transliterations using the other major systems), rather than manually adding it to entries in most cases, but yes.
(Will Google demote us for SEO trickery, if we add lots of invisible text? Do we care?) - -sche (discuss) 15:11, 27 December 2022 (UTC)[reply]

Now, could those who have the skills allow, or at least please consider allowing, searches for standard transliterations like "ʾiʿlān" to bring up the term إِعْلَان? It's now transliterated as "ʔiʕlān", so any search for "ʾiʿlān" won't bring any results in Wiktionary, but it is an attestable transliteration in Google Books. @Erutuon, -sche, Sartma, Benwing2, Fenakhay, Mahmudmasri, Fay Freak. It's a long discussion, not sure where to post but this idea was mentioned, no idea how feasible it is. --Anatoli T. ^{(обсудить}/^вклад) 01:27, 1 March 2023 (UTC)[reply]

Presumably the best(?) way to accomplish this systematically is to have the templates/modules that generate Wiktionary-standard transliteration also generate the other elsewhere-common transliteration(s) but hide those in display: none spans; I don't know if this would cause issues if e.g. some other module expects the only output to be a transliteration (and would break in some way upon receiving a span tag), so unfortunately I think I have to leave this to someone else to implement. An alternative would be to create a template like {{display none}} and put needed transliterations into that, at the bottom of each entry or after every use of any template that uses tr=. - -sche (discuss) 07:49, 1 March 2023 (UTC)[reply]
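As a rough illustration of the hidden-span idea discussed above, here is a minimal Python sketch (not the Lua that the actual modules would use). The function name `with_hidden_alternative` and the exact span markup are assumptions made for this example only; the thread itself only establishes that text with `display: none` is picked up by the internal search and by Google.

```python
import html

# Map the Wiktionary letters back to the traditional half rings.
NEW_TO_TRADITIONAL = str.maketrans({"ʔ": "ʾ", "ʕ": "ʿ"})


def with_hidden_alternative(translit: str) -> str:
    """Return the visible transliteration, followed by a hidden span that
    carries the traditional spelling (hypothetical helper, for illustration)."""
    traditional = translit.translate(NEW_TO_TRADITIONAL)
    visible = html.escape(translit)
    if traditional == translit:
        # Nothing to add when the two systems agree.
        return visible
    hidden = html.escape(traditional)
    return f'{visible}<span style="display: none;">{hidden}</span>'


print(with_hidden_alternative("ʔiʕlān"))
```

Whether a calling module tolerates a span tag in the transliteration output is exactly the open question raised above, so a sketch like this would need testing against the real templates.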

Implementation of ʔ and ʕ in Arabic languages

Hello all! A while has passed since this proposal. I count 6 support and 1 oppose, plus various comments. Shall we proceed to implementing it? — Sartma ^{【𒁾𒁉 ● 𒊭 𒌑𒊑𒀉𒁲】} 10:02, 1 February 2023 (UTC)[reply]

It seems I didn't originally offer my input given that I only work with South Levantine Arabic, which already uses ʔ & ʕ. For what it's worth, I strongly support this proposal for all the reasons already mentioned in addition to — if nothing else — consistency with other Semitic languages & varieties that already use ʔ & ʕ. If that makes seven votes, I'd say go ahead & start implementing it. 213.6.250.158 10:15, 1 February 2023 (UTC)[reply]

Oops, this was me. I wasn't logged in. AdrianAbdulBaha (talk) 10:16, 1 February 2023 (UTC)[reply]

I believe Fenakhay made Module:ar-translit produce ʔ and ʕ. There are still lots of manual transliterations with ʾ and ʿ. I made a list of pages with those characters in the wikitext: User:Erutuon/lists/ʾ and ʿ. That's not quite usable by a bot. I guess the parameters that need to be changed are 1. any parameter whose name starts with tr and 2. any term with <tr: where the value of the parameter or the value of <tr: contains ʾ or ʿ and where the language of the term is an Arabic dialect. (On Discord Fenakhay made a list of language codes.) I've written a program that finds the candidate parameters but doesn't identify the language. I used to have a list of templates categorized by where the language code can be found, but it's pretty out of date. — Eru·tuon 06:17, 27 February 2023 (UTC)[reply]
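The two kinds of parameter described above can be sketched with regular expressions. This is only a simplified Python illustration, not the program mentioned in the post: the patterns `TR_PARAM` and `TR_INLINE` and the function `replace_half_rings` are hypothetical, they do not handle nested templates, and they do not check the language of the term.

```python
import re

# Traditional half rings mapped to the letters now produced by the module.
HALF_RINGS = str.maketrans({"ʾ": "ʔ", "ʿ": "ʕ"})

# A named parameter whose name starts with "tr" (tr=, tr2=, ...), up to the
# next pipe or brace. Simplification: nested templates are not handled.
TR_PARAM = re.compile(r"(\|\s*tr\w*\s*=)([^|{}]*)")

# An inline modifier of the form <tr:...>.
TR_INLINE = re.compile(r"(<tr:)([^<>]*)(>)")


def replace_half_rings(wikitext: str) -> str:
    """Replace ʾ and ʿ inside transliteration parameters only."""
    wikitext = TR_PARAM.sub(
        lambda m: m.group(1) + m.group(2).translate(HALF_RINGS), wikitext
    )
    return TR_INLINE.sub(
        lambda m: m.group(1) + m.group(2).translate(HALF_RINGS) + m.group(3),
        wikitext,
    )


print(replace_half_rings("{{m|ar|إعلان|tr=ʾiʿlān}}"))
```

Text outside the matched parameters is left alone, which is why a language check has to happen separately before any such replacement is applied.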

Shall we proceed with this? (Notifying Atitarev, Benwing2, Mahmudmasri, Metaknowledge, Wikitiki89, Erutuon, ZxxZxxZ, عربي-٣١, Fay Freak, AdrianAbdulBaha, Assem Khidhr, Fenakhay, Fixmaster, M. I. Wright, Roger.M.Williams, Zhnka): — Sartma ^{【𒁾𒁉 ● 𒊭 𒌑𒊑𒀉𒁲】} 00:30, 25 February 2023 (UTC)[reply]

I also strongly support this proposal. It's uncomfortable to read these small signs (ʾ and ʿ), these two consonants should be as big as other consonants are and it's time to change it. I support these signs (ʔ and ʕ). Zhnka (talk) 05:56, 25 February 2023 (UTC)[reply]

Sure! Roger.M.Williams (talk) 18:21, 26 February 2023 (UTC)[reply]

Now it has been changed I already got used to it. Feels good. If I am not annoyed by it, due to habituation, nobody else will be either. Fay Freak (talk) 00:33, 1 March 2023 (UTC)[reply]

Okay, I've got a several-step process to find template parameters containing ʾ and ʿ that need to be replaced with ʔ and ʕ. I think what a bot would need is the title, the name of the template, the key and value of the parameter that contains the language code (just to verify that the language code hasn't changed), and the key of the parameter that contains the transliteration. The bot can probably run the replacement on the whole transliteration parameter value even if the transliteration is inside <tr:...> because there are probably never any attributes (if that's the word for it) of the term, like translation or literal translation, that would have ʾ or ʿ, unless someone nested a term from another Semitic language that still uses ʾ and ʿ inside a non-transliteration attribute.

The process involves extracting the link templates from the dump, parsing out the relevant information from the link templates, and then filtering the links by language code and whether the transliteration contains ʾ or ʿ. I save the intermediate bits as CBOR so that they're fast to parse. It only catches link templates that are in the surface syntax and the ones that I happen to have rules for, so it's probably going to miss some. I found 3205 Arabic transliterations that need changing. This counts transliterations and not templates, and there might be multiple transliterations in a single template. This process only handles transliterations, but I might be able to work it into something that lets us search for other things inside link templates. — Eru·tuon 21:26, 2 April 2023 (UTC)[reply]
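The filtering step described above might look roughly like this in Python. The record fields follow the list of things a bot would need (title, template name, key and value of the language parameter, key of the transliteration parameter), but the class `LinkRecord`, the function names and the three language codes are assumptions for illustration; the real code list is not reproduced in this discussion.

```python
from dataclasses import dataclass

# Placeholder codes only; the actual list of Arabic lects was compiled elsewhere.
ARABIC_CODES = frozenset({"ar", "arz", "ajp"})
HALF_RINGS = "ʾʿ"


@dataclass(frozen=True)
class LinkRecord:
    title: str      # page the template appears on
    template: str   # template name, e.g. "m" or "l"
    lang_key: str   # parameter holding the language code
    lang: str       # value of that parameter
    tr_key: str     # parameter holding the transliteration
    translit: str   # transliteration value


def needs_replacement(record: LinkRecord, codes=ARABIC_CODES) -> bool:
    """True if the record is in a listed language and still uses a half ring."""
    return record.lang in codes and any(ch in record.translit for ch in HALF_RINGS)


def candidates(records, codes=ARABIC_CODES):
    """Keep only the records a bot would have to edit."""
    return [r for r in records if needs_replacement(r, codes)]
```

Keeping the language key and value in each record lets a bot verify, at edit time, that the template still has the language code it had in the dump.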

Published the results at User:Erutuon/lists/Arabic ʾ and ʿ.json. — Eru·tuon 23:34, 2 April 2023 (UTC)[reply]

@Erutuon: Thanks. Could you also please comment on allowing users to search for terms spelled in standard transliteration like ʾiʿrāb to find إِعْرَاب (ʔiʕrāb).

The spelling "ʾiʿrāb" is heavily used by grammarians. I.e. convert a search for "ʾiʿrāb" into "ʔiʕrāb"?

Pronunciation section, hidden transliteration, anything else? Is it even possible? Anatoli T. ^{(обсудить}/^вклад) 02:49, 3 April 2023 (UTC)[reply]

To have ʾ and ʿ treated as equivalent to ʔ and ʕ in regular Wiktionary searches would require changes to Extension:CirrusSearch. I don't think it would cause any problems because probably no writing system uses both of them. Maybe alternative transliterations (transcriptions) could be put in the pronunciation section too. That wouldn't let you find transliterations in other places containing ʔ and ʕ with a search query containing ʾ and ʿ though, and it wouldn't work if the language doesn't have this alternative transliteration in its pronunciation section (thinking of modern Arabic "dialects" here). The search equivalency thing would be great, but I don't know if the developers could or would implement it.

I've been thinking off and on about making a search engine on Toolforge (kind of like toolforge:enwikt-translations) that would let you search various link template stuff, including transliterations, but I haven't begun it and I don't know if I will do that anytime soon. If I did, it could certainly merge those equivalent letters. — Eru·tuon 07:37, 3 April 2023 (UTC)[reply]
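To make the "merge those equivalent letters" idea concrete, here is a toy Python sketch of folding both spellings to one form before comparing. It says nothing about how CirrusSearch works internally; the functions `fold` and `search` and the dictionary-based index are assumptions for this example.

```python
# Fold the traditional half rings onto the letters used in entries.
FOLD = str.maketrans({"ʾ": "ʔ", "ʿ": "ʕ"})


def fold(text: str) -> str:
    """Normalise a string so that ʾ/ʔ and ʿ/ʕ compare as equal."""
    return text.translate(FOLD)


def search(query: str, index: dict) -> list:
    """Return titles whose searchable text contains the folded query.

    `index` maps a page title to its searchable text (toy stand-in for a
    real search index)."""
    folded_query = fold(query)
    return [title for title, text in index.items() if folded_query in fold(text)]


index = {"إعراب": "ʔiʕrāb", "كتاب": "kitāb"}
print(search("ʾiʿrāb", index))
```

Because both the query and the indexed text are folded, a search typed in either system finds the same entries.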

Finished replacing the characters. See User:ToilBot/edit logs/2023-04/replacing Arabic half rings. I'll double check my work after the next dump comes out on May 1st. — Eru·tuon 19:17, 27 April 2023 (UTC)[reply]

Oppose: Yikes, I did not see this discussion. This was such a bad move and 100%, this should have gone to a vote. Anatoli T.'s argument is absolutely on point -- nobody in academia uses ʔ or ʕ, and now we just made all those transcriptions unsearchable. Do we need to start a vote to revert this? -- Sokkjō 07:32, 28 April 2023 (UTC)[reply]

I suggest a bot task to remove unnecessary tr= arguments from standard Arabic, adding vowels to the Arabic text if necessary to generate the correct transliteration. That would leave only irregular pronunciations with explicit tr= arguments. The next change will be easier to implement if only a module needs to change. Vox Sciurorum (talk) 10:38, 28 April 2023 (UTC)[reply]

Names for some Turkish verb forms

In a definition line for a Turkish verb form, what should be the name for

the participle ending in -en
the form ending in -dik
the verbal noun ending in -iş
the gerund ending in -erek
the gerund ending in -ip

?

I have been defining -iş forms as verbal noun of, the same as -me, which may be misleading.

I added placeholder names for -en, -erek, and -ip in Module:tr-verb form of. If you have a strong opinion, go ahead and edit that module.

Which of these forms should be defined as {{head|tr|verb form}} instead of as lemmas? Vox Sciurorum (talk) 18:14, 20 December 2022 (UTC)[reply]

Some authors classify some as converbs, while I have the impression that the use of the term gerund is controversial. Off the top of my head, I miss the future participle AcAk and the expeditive(?) converb IncA.

An: present participle is just fine.
DIk: perhaps "real participle"? ("Real" in contrast to the irrealis of the future participle.)
[added 14:08, 23 December 2022 (UTC)] The template {{tr-conj-head}} names this a “non-prospective personal participle”.
Iş: instantial verbal noun? (Whereas <verb>mA is the general act of <verb>ing, <verb>Iş is an instantiation of that act.)
[added 13:51, 23 December 2022 (UTC)] It seems better to me not to list terms formed with -iş as verb forms, but only as common nouns.
ArAk: instrumental converb?
Ip: conjunctive converb?

--Lambiam 23:27, 21 December 2022 (UTC)[reply]

Perhaps not totally related but I can't find the name for this suffix either: Pekiştirme (dilbilgisi)

Is it missing in {{inflection of}}? "Excessive degree" doesn't link to anything, perhaps that's what it is? Can we add it if it's missing? And better yet, a new parameter for {{tr-adj}} to input these forms. --Whitekiko (talk) 08:03, 30 December 2022 (UTC)[reply]

Adding script codes for the Clear Script, Manchu and Xibe.

For context, the Clear Script (also known as Todo) was used for Written Oirat and (at least historically) its descendant Kalmyk, Manchu for Manchu, and Xibe for Xibe. All three are descended from the Mongolian script proper: the Clear Script was an overt attempt to remove the ambiguities present in Mongolian, while the latter two represent adaptions to account for the different phonology of the Tungusic languages. Certainly in the case of the Clear Script and Manchu, they also allow for the transcription of Tibetan and Sanskrit for religious purposes. However, they should not be confused with the Galik alphabet, which was an augmentation to Mongolian proper for the same purpose.

From an encoding perspective, the overlap between the four scripts is surprisingly small, despite the fact that they can often appear orthographically similar. This is due to the fact that each have several equivalent characters that appear orthographically identical in many forms, but which exhibit different behaviour under certain circumstances. Compare Mongolian and Manchu "i" (encoded separately), which have identical isolated (ᠢ (i) ᡳ (i)) and initial (ᠢ᠊ (i-) ᡳ᠊ (i-)) forms, and are usually identical when medial (᠊ᠢ᠊ (-i-) ᠊ᡳ᠊ (-i-)) and final (᠊ᠢ (-i) ᠊ᡳ (-i)), too. However, Manchu has additional variant medial (᠊ᡳ᠌᠊ (-i-)) and final (᠊ᡳ᠋ (-i)) forms. What's more, even when two of the scripts use the same encoded character, they may be stylistically different: compare medial "n" between Mongolian (᠊ᠨ᠊ (-n-)) and Clear Script (᠊ᠨ᠋᠊ (-n᠋-)), where the latter includes a dot.

Usually, these differences are achieved by the use of one of the Mongolian variation selector characters. Unfortunately, there is no agreed upon standard when it comes to these, and so different fonts might exhibit opposite behaviour depending on whether or not a variation selector has been included. As such, it's preferable for us to simply exclude their use from entry names entirely, and to implement alternate display forms based on which script is in use. This is only possible if we have different script codes for each script. It also wouldn't be appropriate to do this based on the language code, due to the fact that the Sanskrit language has historically been written in Mongolian, Clear Script and Manchu, and therefore all three could (at least theoretically) be added to the list of scripts displayed by {{sa-alt}}. In future, we may want to create a similar template for Tibetan, too. Not to mention the fact that this also introduces the possibility of a user using different fonts for each script.

As such, I suggest we add the following script codes: Todo for the Clear Script, Manc for Manchu and Xibe for Xibe. Theknightwho (talk) 20:10, 20 December 2022 (UTC)[reply]
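For illustration, excluding variation selectors from entry names could be sketched as follows in Python. The function name `strip_variation_selectors` is hypothetical, and the sketch assumes that "Mongolian variation selector characters" means the Mongolian free variation selectors U+180B to U+180D plus U+180F; it is not how the site's modules are necessarily written.

```python
# Mongolian free variation selectors: FVS1-FVS3 (U+180B..U+180D) and
# FVS4 (U+180F). U+180E (Mongolian vowel separator) is deliberately not
# included, since it is not a variation selector.
MONGOLIAN_FVS = frozenset("\u180B\u180C\u180D\u180F")


def strip_variation_selectors(display_form: str) -> str:
    """Derive an entry name from a display form by dropping the selectors."""
    return "".join(ch for ch in display_form if ch not in MONGOLIAN_FVS)


# Manchu "i" (U+1873) followed by FVS1 reduces to the bare letter.
entry_name = strip_variation_selectors("\u1873\u180B")
print(len(entry_name))
```

The display form, with selectors chosen according to the script code, would then be generated separately from the bare entry name.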

@Theknightwho You might try seeing if you can find and ping the editors who work in Mongolian and Manchu (there can't be very many ...). The following might be able to comment: (Notifying Crom daba, LibCae): . If you don't get a response in a few days I would just add those scripts. Benwing2 (talk) 07:26, 22 December 2022 (UTC)[reply]

Maybe also User:RichardW57m who knows a lot about Sanskrit and Pali. Benwing2 (talk) 07:27, 22 December 2022 (UTC)[reply]

Oops @RichardW57. Benwing2 (talk) 07:32, 22 December 2022 (UTC)[reply]

I thought we had the racist rule that only Indic scripts could be used for Sanskrit. Has that rule been repealed, so that now we only have the even more racist ban on the Roman script? --RichardW57 (talk) 09:42, 22 December 2022 (UTC)[reply]

I don’t know about any ban on the Roman script, but when I mooted the idea of Mongolian transcription a few months ago (on Discord, I think) it was received positively. It’s proving a bit of a challenge as it’s so different, but I should be able to get it up and running in a test instance today.

Interestingly, the sources tend to be Chinese, using 4 or 5 scripts in the following order: Devanagari (sometimes), Tibetan, Manchu, Mongolian and Chinese. This raises the possibility of Chinese transcriptions as well, as the system they use is completely standardised. Theknightwho (talk) 12:20, 22 December 2022 (UTC)[reply]

This is an interesting idea. To my knowledge we haven't normally had occasion to add new non-language-specific script codes, unlike how often we add new language codes, because if the characters of a script have been encoded, they've generally also been assigned to a (ISO-coded) script already (here, Mong), but AFAICT it should work, just like pjt-Latn works for allowing specific fonts to be set. We should probably avoid having our invented codes look like (and potentially clash with some future assignment of) ISO codes, though. Since the private use range of ISO 15924 is pitifully limited (Qaaa-Qabx), the more intelligible way to do that is probably to follow the model of Latinx, polytonic, etc and make the new codes longer than four letters, e.g. Todob or Clear, Manch or Manchu (if it wouldn't cause problems for the same string to appear as a script code and a language name), etc. - -sche (discuss) 08:16, 22 December 2022 (UTC)[reply]
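The clash concern can be made concrete with a small Python check. This is only a sketch: the function names are made up for the example, and the "shape" test is a simplification (four ASCII letters, first one capitalised) of how ISO 15924 codes look.

```python
def looks_like_iso15924(code: str) -> bool:
    """True if the code has the shape of an ISO 15924 code, e.g. "Mong"."""
    return (
        len(code) == 4
        and code.isascii()
        and code.isalpha()
        and code == code.capitalize()
    )


def is_private_use(code: str) -> bool:
    """True if the code falls in the private-use range Qaaa-Qabx."""
    return looks_like_iso15924(code) and "Qaaa" <= code <= "Qabx"


for code in ("Todo", "Manc", "Xibe", "Latinx", "polytonic", "Qaaa"):
    print(code, looks_like_iso15924(code), is_private_use(code))
```

Under this check the three proposed four-letter codes have the ISO shape, while longer codes in the style of Latinx or polytonic do not, which is the point of the suggestion above.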

Can we please have a specific use-case where we actually need to select different fonts. (Accompanying statements of the encodings would help.) I don't see how font-specific encoding helps. --RichardW57 (talk) 10:06, 22 December 2022 (UTC)[reply]

@RichardW57 To give a use case, Clear Script and traditional Mongolian differ stylistically in several ways (even if we exclude where they use different code points) - most obviously in their form of the final “a” in Sanskrit transcription. However, they use the same code point, and no variation selector distinguishes them. Bobrovnikov’s comparative grammar between Mongolian and Kalmyk ( Грамматика монгольско-калмыцкого языка) is excellent (if dated), and shows this well at p. 374:

Naturally they both come from Middle Mongolian, which had many variant styles. However, its descendants Written Oirat and Classical Mongolian are both classical languages, and therefore it would be good, in an ideal world, to display them in ways that are most fitting for that language. Their descendants, Kalmyk (for Written Oirat) and Mongolian and Buryat (for CM) naturally adopted the relevant style.

An obvious comparison is Arabic, where Perso-Arabic uses fa-Arab, and as such displays quite differently. On second thoughts, we could do something similar, with maybe four codes in the style mn-Mong etc. I suspect this would be better for browser support, too. Theknightwho (talk) 12:01, 22 December 2022 (UTC)[reply]

In light of no objections, I've gone ahead and added these. Theknightwho (talk) 20:54, 27 December 2022 (UTC)[reply]

Browsers don't recognize classes like class="mn-Mong", but they do recognize language code and script code combinations lang="mn-Mong" (which we don't use except lang="mn-Latn" for transliterations) because they are valid IETF language tags, though they don't necessarily assign different styles to all combinations. — Eru·tuon 21:01, 27 December 2022 (UTC)[reply]

@Erutuon: The problem here is that lang=sa-Mong is inadequate. What we would allegedly need is something like:

sa-Mong-MN (Galik)

sa-Mong-CN (Manchu)

sa-Mong-RU (Todo)

The correspondences have a degree of arbitrariness - all three styles could be refined by 'as used in China'. RichardW57 (talk) 12:40, 28 December 2022 (UTC)[reply]
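As a sketch of what tagging by language tag rather than by class could look like, the Python below wraps text in a span with a `lang` attribute and looks up the style that the three tags listed above would stand for. The helper names are hypothetical, and the tag-to-style table simply restates the (admittedly arbitrary) correspondences given in the list; nothing here reflects actual browser or site behaviour.

```python
import html

# Correspondences as listed above; they are acknowledged to be arbitrary.
STYLE_BY_TAG = {
    "sa-Mong-MN": "Galik",
    "sa-Mong-CN": "Manchu",
    "sa-Mong-RU": "Todo",
}


def tagged_span(text: str, tag: str) -> str:
    """Wrap text in a span whose lang attribute carries the language tag."""
    return f'<span lang="{html.escape(tag, quote=True)}">{html.escape(text)}</span>'


def style_for(tag: str):
    """Return the intended style for a tag, or None if it is not listed."""
    return STYLE_BY_TAG.get(tag)


print(tagged_span("x", "sa-Mong-RU"), style_for("sa-Mong-RU"))
```

A stylesheet could then select on the `lang` attribute, though as noted earlier browsers do not necessarily assign different styles to every combination.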

TKW, which fonts should be set for the different varieties, in MediaWiki:Common.css? - -sche (discuss)

I'll get back to you on this, as it would make sense to define a few for each in priority order, as most people won't have them. Theknightwho (talk) 02:46, 31 December 2022 (UTC)[reply]

I'm just thinking in terms of: it doesn't accomplish much to have different script codes if they're all getting displayed the same way! - -sche (discuss) 08:01, 1 March 2023 (UTC)[reply]

Should "translation hubs" have inflections?

...as just added at commit adultery? I would think not. Equinox ◑ 22:00, 21 December 2022 (UTC)[reply]

In my opinion, multiword phrases should never have inflections but just refer to the word that’s actually inflected (unless of course inflection is for some reason different in the multiword phrase). MuDavid 栘𩿠 (talk) 01:47, 22 December 2022 (UTC)[reply]

Well, it's possible a user could type one of the non-lemma forms (especially the present participle) into the search box, but I don't know how common that would be. 70.172.194.25 06:50, 22 December 2022 (UTC)[reply]

I would say no. I have no problem with them being listed in the entry itself, with no links, but we shouldn't have actual pages for commits adultery, committing adultery, etc. The only reason we have the lemma is for the translations, so that means there is absolutely zero reason for having the inflected form entries. Andrew Sheedy (talk) 07:02, 22 December 2022 (UTC)[reply]

Worth mentioning that in this case it was added by bot. I agree translation hubs shouldn't, but I do create inflections for multiword phrases because of what 70.x mentioned. —Al-Muqanna المقنع (talk) 12:09, 22 December 2022 (UTC)[reply]

Setting Old Church Slavonic as an ancestor of Bulgarian and Macedonian

Pinging interested parties: @Sławobóg, Skiulinamo, Atitarev, Горец, Gnosandes, Andrew012p, Fay Freak, Огньметъка, Rua, Bezimenen, Useigor, Jurischroeer, Vorziblix, Bogorm, Mahagaja, Vininn126.

Presently, our handling of Bulgarian and Macedonian in terms of etymology is quite inconsistent: Many Old Church Slavonic (OCS) entries feature Bulgarian and Macedonian under the Descendants section. We also classify Bulgarian as a descendant of OCS. On the other hand, we don't classify Macedonian this way and we don't feature OCS in either Bulgarian or Macedonian etymology sections.

A bit on the background of this issue: Somewhere in the ninth century CE, Cyril and Methodius came up with OCS as a literary, liturgical standard for the Slavic languages based on their native language, which is commonly accepted to be a certain Eastern South Slavic (ESS) dialect. Over time, this standard diverged greatly from the various ESS dialects that continued to be spoken in the area, and that eventually became the modern Bulgarian and Macedonian dialects. Somewhere around the 18th century, new literary standards arose based on these spoken varieties - Bulgarian and Macedonian.

Now, we have precedent for classifying languages derived from dialects as being descendants of the language those dialects are attributed to: So, Afrikaans (which is derived from various Dutch dialects) is a descendant of Dutch (lemmatised at Standard Dutch) and Old French (which is derived from Vulgar Latin) is a descendant of Latin (lemmatised at Classical Latin). We also have precedent to do this when the written language diverged from the spoken language long before the spoken language's codification, and continued to be used liturgically even after their daughter language's emergence: This is why Greek is a descendant of Ancient Greek, Corsican is a descendant of Latin, and numerous Arabic dialects are descendants of Standard Arabic.

Since ESS dialects served as a basis for the OCS language, I believe it's fair to handle ESS dialects as being "dialectal" OCS: unwritten, yet attributed to this written language. This would in turn entail that both Bulgarian and Macedonian be handled as full-fledged descendants of OCS, i.e. etymologies should feature an intermediate between Proto-Slavic and the languages (if need be as a reconstruction), and the two languages should be given under descendant sections of OCS.

I know some have very strong feelings on this issue, which is why I'm bringing it up on this forum. I hope we can come to an agreement together, but if that's not possible it seems the next step would be a formal vote. Thadh (talk) 12:40, 22 December 2022 (UTC)[reply]

Summoning @Siedmiogrodzianin. Sławobóg (talk) 14:05, 22 December 2022 (UTC)[reply]

I'm not a Slavicist but I've encountered the problem in Hungarian etymology, e.g. Pest, which Hungarian dictionaries note is derived from Old Bulgarian but I've denoted as Old Church Slavonic for lack of a distinct Old Bulgarian tag. Old Bulgarian is currently glossed as a synonym of OCS, and vice versa at Old Church Slavonic. If OCS is not set as an ancestor of Bulgarian then Old Bulgarian probably needs tooling as a separate language. —Al-Muqanna المقنع (talk) 14:53, 22 December 2022 (UTC)[reply]

@Al-Muqanna: If we decide that OCS shouldn't be an ancestor of Bulgarian, Old Bulgarian would still be an OCS label; its name is a bit misleading, but Old Bulgarian is essentially the continuation of OCS in Bulgaria. Thadh (talk) 16:07, 22 December 2022 (UTC)[reply]

OK, so the non-inherited option would just be to treat Proto-Slavic > OCS/"Old Bulgarian" and Proto-Slavic > Bulgarian as parallel. I was just confused because the dictionary I got Pest from uses both "OCS" and "Old Bulgarian" without giving an indication of the difference, and my impression was that the intended distinction was between literary and vernacular languages rather than a chronological one. From what I understand I think your solution in the OP makes sense, but I'll defer to those more knowledgeable. —Al-Muqanna المقنع (talk) 19:56, 22 December 2022 (UTC)[reply]

I don't know about the taxonomic policies of Wiktionary, but technically only contemporary renditions of Church Slavonic can be counted as descendants of Old Church Slavonic. Modern Bulgarian and Macedonian + a few other micro-languages (Pomak, Banat Bulgarian, Aegean Slavomacedonian, arguably Torlak) are ausbau cousins of OCS, but none of them strictly descend from OCS. The relation between them is like that between Hindi, Bengali, Punjabi + other Prakrit varieties and ecclesiastical Sanskrit. Безименен (talk) 20:06, 22 December 2022 (UTC)[reply]

Yes. As I pointed out above, there is quite a lot of precedent to simplify this type of relation to mother/daughter (I should note that Sanskrit is also set as the ancestor of Hindi). We could choose to ignore these precedents, but I personally don't see any reason to. Thadh (talk) 20:35, 22 December 2022 (UTC)[reply]

Just FYI, there's no such thing as a 'microlanguage' - that's a Soviet pseudo-scientific term coined with the aim of fragmenting real existing languages. In the USSR, it was used to denote local dialects such as the Polissya dialect of Ukrainian in Northern Ukraine and Southern Belarus. The Communists went as far as coining a 'West Polissyan microlanguage'. Same shit with the Banat, Pomak speech - they're just dialects of Bulgarian. The Banat Bulgarians explicitly call themselves Banat Bulgarians, they don't have a separate 'Banat' ethnic identity and call their speech a variant of Bulgarian. They are known by all their neighbours as Bulgarians. The 'microlanguage' term is part of the glossary of (malevolent) linguistic engineering. Ентусиастъ (talk) 08:22, 23 December 2022 (UTC)[reply]

Putting my two cents in

0. I want to start off by saying that I'm Bulgarian; now you can more easily diagnose my biases.

>Somewhere in the ninth century CE, Cyril and Methodius came up with OCS as a literary, liturgical standard for the Slavic languages based on their native language

1. This way of phrasing it appears a little off to me. Yes, they turned this already existing spoken language into a literary form, but it appears to me that what's unsaid here tries to paint a different picture. I know this is not what you mean, but I've seen certain people claim that OCS had simple past tenses and OES did not because OCS, somehow, has more of a "constructed" essence.

2. Calling the literary form "OCS" is quite anachronistic given that all the way back in the 11th century it was referred to as "the Bulgarian language" by Theophylact of Ohrid. (Noteː the Medieval notion of bulgarianness was different to the modern one.)

3. Depending on how far deep down we go, we can reach some peculiar conclusions. Even though, yes—OCS is based on ESS dialects, if we zoom further, we'll find that it's based on the predecessor to the Rup dialectal group. Neither the Neobulgarian, nor the Macedonian literary standards are based on that dialectal group.

4. If anything, I believe that Russian should be most justified in claiming OCS as its antecedent. Whereas modern Russian is based on the marriage between OES dialects and Neomuscovite Church Slavonic (itself a clear descendant of OCS), both Neobulgarian and Macedonian standardizers—clouded in post–French Revolution Romanticism—set it as their goal to oppose the acceptance of Church Slavonic elements into the aforesaid literary languages.

5. Furthermore, "Middle Bulgarian" adds even more mist to the discussion. Even though yes—both Ohrid and Pliska-Prĕslav(-Trĭnovo) already had some differences before the fall of the First Bulgarian Tsardom, works from the Second Bulgarian Tsardom from both groups are counted as being written in "Middle Bulgarian". Following that train of thought, even if we assume that Neobulgarian and Macedonian both stem from OCS, wouldn't it be better to say that instead they stem from Middle Bulgarian, which in turn stems from OCS?

Concludingː Honestly, I'm undecided on this topic, so as long as we reach some kind of conclusion—a standardǃ—I'd be happy. Have a nice dayǃ Ѻгн҄еметꙑн҄и/Ogňemetyňi (talk) 13:57, 23 December 2022 (UTC)[reply]

@Огньметъка: Thank you for your response. I certainly didn't want to make it sound as if OCS is a constructed language or anything of the sort, I'm sorry if I accidentally did.

I'm also using "OCS" here as a shorthand for "Old Church Slavonic or any subsequent Church Slavonic varieties", since that's what our L2 "Old Church Slavonic" denotes. (whether it's a good idea to rename it is another issue). It's also important to note that Middle Bulgarian (or Middle Macedonian, whatever one may call it) is currently handled as Church Slavonic, and as such as part of the L2 Old Church Slavonic - of course everything is changeable if we want to, but that's not the topic of this discussion, which is purely focussing on whether or not Bulgarian and Macedonian should have Church Slavonic lects set as an ancestor or not.

I think the fundamental question is - should we use OCS as a model representation of all ESS dialects spoken at the time of its codification (basing a literary language on one dialect is very common after all), or should we instead handle it as only being the representation of the Rup dialects, with the other dialects being left unwritten for multiple centuries.

I personally prefer the first solution, simply because of the numerous common developments that ESS dialects have undergone in the ninth century already, but there is no right answer here. Thadh (talk) 14:16, 23 December 2022 (UTC)[reply]

@Thadhː Thank you for your swift response as wellǃ

>I certainly didn't want to make it sound as if OCS is a constructed language or anything of the sort, I'm sorry if I accidentally did.

Haha, I knew that's not what you meant but thank you very much for clarifying as well.

>I'm also using "OCS" here as a shorthand for "Old Church Slavonic or any subsequent Church Slavonic varieties", since that's what our L2 "Old Church Slavonic" denotes. (whether it's a good idea to rename it is another issue). It's also important to note that Middle Bulgarian (or Middle Macedonian, whatever one may call it) is currently handled as Church Slavonic, and as such as part of the L2 Old Church Slavonic - of course everything is changeable if we want to, but that's not the topic of this discussion, which is purely focussing on whether or not Bulgarian and Macedonian should have Church Slavonic lects set as an ancestor or not.

Ah yeah... I remember seeing some other discussion on this topic too. The reason why I brought it up is because if we are to mangle with the descendants of OCS, that would mean that we'd also have to properly align entries whenever a daughter language has inherited that term. If we were to add more L2's in the future—that would create more work.

I know this is incredibly unpopular but I personally like to look at all pre–national Slavic languages stemming from the Cyrillian tradition as the same evolving organism in a Ship of Theseus–esque way. I'm quite positive that is how Patriarch Euthymius also saw it and that it would be quite anachronistic to look at it otherwise.

>I think the fundamental question is - should we use OCS as a model representation of all ESS dialects spoken at the time of its codification (basing a literary language on one dialect is very common after all), or should we instead handle it as only being the representation of the Rup dialects, with the other dialects being left unwritten for multiple centuries.

Well, even though Cyril was brought up with a proto–Rup dialectal range tongue, in books contained in the OCS canon we can see it being "contaminated" by other ESS speeches. Codices created in Prĕslav show signs of contamination from northeastern ESS dialects, while ones created in Ohrid—of southeastern ESS dialects. The only ESS subdialect group missing is the northwest one. However, it can be quite confidently argued that Codex Marianus shows signs of Serbian contamination as well, not to mention the Kiev Missal which shows clear signs of West Slavic influence.

In a tree model maybe that can be easily disregarded as—OCS just stems from the proto-Rup tradition and everything else is "contamination", but when we are talking about the dialects of Middle~Late Common Slavic that are all so similar to each other, it seems to get quite murky.

>I personally prefer the first solution, simply because of the numerous common developments that ESS dialects have undergone in the ninth century already, but there is no right answer here.

Mhm, that seems sound; I do not have any strong objections to this. Ѻгн҄еметꙑн҄и/Ogňemetyňi (talk) 16:13, 23 December 2022 (UTC)[reply]

My two cents thus far:

I believe it does make sense to categorize ESS dialects into OCS with a caveat: using labels and clear explanations of labels might be necessary, so that people know EXACTLY what we mean when we are doing this. Vininn126 (talk) 15:16, 23 December 2022 (UTC)[reply]

It is most closely related to modern Bulgarian and Macedonian, although the literary forms of these languages are based on other dialects, rather than on the Solun dialect, which, apart from OCS, has not become the basis of any literary language.
- Czesław Bartula

Oppose per quote^. Sławobóg (talk) 16:53, 20 June 2023 (UTC)[reply]

@Sławobóg: Could you also respond to the points raised above about how other languages are treated? Why should Bulgarian and Macedonian be treated differently from Romance and Indic languages in your opinion? Thadh (talk) 16:56, 20 June 2023 (UTC)[reply]

Use of idem in etymologies

I stumbled upon rayon's French entry, and I noticed that the definition of the etymon 'ree' is listed as "id.", presumably to mean that it means 'honeycomb' like Old French raie and Frankish *hratu. I figure that at least the definition "id." wasn't intended to be in quotations, but is it even acceptable to use idem in an etymology? Wouldn't it be better to just repeat "honeycomb" three times? Qwed117 (talk) 17:54, 22 December 2022 (UTC)[reply]

Why gloss both altforms? I'd just write

raie, ree ("honeycomb")

Or I wouldn't gloss either of them and leave the reader to draw the obvious conclusion from 'Frankish *hrātu (“honeycomb”)', the following sentence, and the senses provided below.

Nicodene (talk) 20:49, 22 December 2022 (UTC)[reply]

Agree with Nicodene: I would just add that putting id. or the like inside quotation marks is potentially confusing, since it implies that "id." is the actual gloss. If it's really necessary you could just write "(same meaning)" manually afterwards, or (ab)use the pos= parameter. —Al-Muqanna المقنع (talk) 20:54, 22 December 2022 (UTC)[reply]

Agreed with the above.

I'll also point out that Wiktionary is not paper -- we have no need for the kind of aggressive abbreviation so prevalent in print dictionaries. I would prefer that we never use "id." anywhere at all, as this is not just unneeded for space reasons, but also unnecessarily confusing to readers. ‑‑ Eiríkr Útlendi │^{Tala við mig} 00:52, 23 December 2022 (UTC)[reply]

I agree with all of User:Eirikr's points. Benwing2 (talk) 08:20, 25 December 2022 (UTC)[reply]

Can alternative forms have different etymologies?

Should something be able to be listed as an alternative form if it has a different but cognate etymology? The pair of terms I am considering are Linnaean and Linnean. I've seen some instances where an English term for a Slavic concept had a number of different spellings since it has been borrowed a number of different times from different Slavic languages which have slightly different words for the concept. I can't find the example I am thinking of, but a similar case exists with Berdychiv and Berdichev. In the end, my question is should terms like Linnaean and Linnean or Berdychiv and Berdichev be able to be listed as alternative forms of each other? If so Wiktionary:Entry layout#Alternative forms should probably be updated to explicitly include such instances. —The Editor's Apprentice (talk) 22:49, 23 December 2022 (UTC)[reply]

The way we use "alt form" is not terribly consistent. I agree with the response you got on the Discord chat, i.e. "alt form" is probably fine if you include an etymology in each of the entries. Equinox ◑ 22:50, 23 December 2022 (UTC)[reply]

Well, it has been a little less than a week since my first post and the response to my proposal has been positive, so I'll move ahead with editing Linnaean and Linnean. For the record, the others who responded positively on the Discord server were Catonif, Soap, and Theknightwho (apologies for the pings). I may later start a formal vote on modifying Wiktionary:Entry layout#Alternative forms. —The Editor's Apprentice (talk) 06:22, 30 December 2022 (UTC)[reply]

"derogatory" -> "pejorative" in categories and labels

The label pejorative displays as derogatory and categorizes into e.g. Category:Italian derogatory terms. However, pejorative is the standard linguistic term. To add to the confusion, we have a template {{pejorative of}} that displays Pejorative of FOO but categorizes into LANG derogatory terms rather than LANG pejorative terms. To add even more confusion, we also have a template {{derogatory}} that is used for a totally different purpose, which is to tag terms that denigrate particular groups (e.g. racist or sexist terms). I propose fixing this by renaming the LANG derogatory terms to read LANG pejorative terms and make the pejorative label display as pejorative. I have no strong opinion about whether to map the derogatory label to pejorative (currently pejorative and derogatory both map to derogatory) but I tentatively feel they should be separated as it appears from the {{derogatory}} template that the term derogatory has a different, stronger and more specific meaning than pejorative. Thoughts? Benwing2 (talk) 08:19, 25 December 2022 (UTC)[reply]
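To make the mapping described above concrete, here is a minimal sketch in Python of how label aliases could resolve to a display text and a category. It is only an illustration: the `Label` class, the two tables and the `resolve` function are hypothetical names, and this is not the actual label data or module code. The "proposed" table shows just one of the options under discussion (both labels kept merged, but displayed as pejorative); whether derogatory should be split off is left open in the proposal.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Label:
    display: str   # text shown next to the definition
    category: str  # category suffix, appended to the language name


# Behaviour as described in the proposal: both labels currently
# display "derogatory" and categorize into "LANG derogatory terms".
CURRENT = {
    "derogatory": Label("derogatory", "derogatory terms"),
    "pejorative": Label("derogatory", "derogatory terms"),
}

# One possible outcome (an assumption, not a decision): keep the labels
# merged, but display "pejorative" and rename the category.
PROPOSED_MERGED = {
    "derogatory": Label("pejorative", "pejorative terms"),
    "pejorative": Label("pejorative", "pejorative terms"),
}


def resolve(table, lang, label):
    """Return (display text, full category name) for a label in a language."""
    entry = table[label]
    return entry.display, f"Category:{lang} {entry.category}"


if __name__ == "__main__":
    print(resolve(CURRENT, "Italian", "pejorative"))
    print(resolve(PROPOSED_MERGED, "Italian", "pejorative"))
```

Under the current table, asking for the pejorative label in Italian yields the display text "derogatory" and Category:Italian derogatory terms, which is the mismatch the proposal is about.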

While they often coincide, I thought that "derogative" implied some sort of belittling or enmity, while pejorative was simply stating the lack of quality of something. For example, "bad situation", or "bad moment" are pejoratives, but I wouldn't call them derogative. Italian -accio forms pejoratives, and not always derogative terms. I don't know how categories should be handled, but I'd agree with keeping the label "derogative", the speedy red template "derogative", and the "pejorative of" template as they stand. Catonif (talk) 20:09, 25 December 2022 (UTC)[reply]

I think the two labels should stay merged: they are treated as synonyms at Appendix:Glossary. Independent of use on Wiktionary my sense of "derogatory" vs. "pejorative" is actually the opposite, that pejorative is the stronger term. I can find various people online who share my opinion, but reference works don't seem to make this distinction: Merriam-Webster's Dictionary of Synonyms claims that "pejorative" is equivalent to "derogatory" but mainly used when words also have an earlier, non-pejorative sense, or for pejorative derivations from neutral words. If we follow that distinction, then it makes sense as a matter of wording to have {{pejorative of}} and not something like "derogatory form of", but to retain a single category. —Al-Muqanna المقنع (talk) 20:27, 25 December 2022 (UTC)[reply]

I used to think derogatory was stronger / more negative, but based on what other people were saying in the discussion about merging these, and in other dictionaries, I'm not sure other people have that distinction (some people have the opposite distinction!), and I know overly-fine distinctions don't get maintained by users in any event, e.g. swine has (since before the labels were merged) had "A contemptible person" as pejorative but "A police officer" and "Something difficult or awkward; a pain" as derogatory, if you look at the wikicode, which is surely a distinction without a difference. Back then, I supported a merge to "derogatory" because it was more common and older so I figured it'd be more intelligible ("pejorative" feels like a more jargon-y term to me), and since "pejorative" sounds weaker to me, I didn't want {{derogatory}}-derogatory terms labelled "pejorative". Back then, we didn't define either word in our glossary; these days the glossary 'redirects' derogatory to pejorative. I'm not opposed to unmerging them if we think we can maintain a distinction and clean up misuses, but I'm sceptical that we can. - -sche (discuss) 20:37, 25 December 2022 (UTC)[reply]

FWIW, here's how other dictionaries define the terms:

	derogatory	pejorative	notes
American Heritage	"Disparaging; belittling"	"Disparaging; belittling"	identical
Cambridge	"showing strong disapproval and not showing respect"	"expressing disapproval, or suggesting that something is not good or is of no importance"	difference of "strong disapproval" vs "disapproval"
Century	"Detracting or tending to lessen by taking something away; that lessens extent, effect, estimation, etc. [...] Syn. Depreciative, discreditable, disgraceful."	"Tending or intending to depreciate or deteriorate, as the sense of a word; giving a low or bad sense to."	(Century has an obsolete derogatory#Noun btw)
Collins	"If you make a derogatory remark or comment about someone or something, you express your low opinion of them" (synonyms: "disparaging, damaging, offensive, slighting")	"A pejorative word or expression is one that expresses criticism of someone or something" (synonyms: "derogatory, ...")	is "low opinion" vs "criticism" a meaningful distinction?
Dictionary.com	"tending to lessen the merit or reputation of a person or thing; disparaging; depreciatory" (synonyms: "belittling, uncomplimentary, denigrating")	"having a disparaging, derogatory, or belittling effect or force" (synonym: "deprecatory")	doesn't seem to make a distinction (defines both as "disparaging", and defines pejorative as "derogatory")
MacMillan	"showing that you have a bad opinion of something or someone, usually in an insulting way"	"a pejorative word, phrase, etc. expresses criticism or a bad opinion of someone or something"	difference of "bad opinion ... insulting" vs "bad opinion"
M-W	"expressive of a low opinion : disparaging" (synonyms include pejorative)	"having negative connotations (see connotation sense 1), especially : tending to disparage or belittle : depreciatory" (synonyms include derogatory)	not sure this is much of a distinction, "low opinion : disparaging" vs "negative connotations [...] disparag[ing]"
Oxford Learner's	"showing a critical attitude and lack of respect for somebody" (synonym: insulting)	"express[ing] disapproval or criticism" (synonym: derogatory)	is this a meaningful distinction?

(It seems like they either treat the terms as synonymous, sometimes exactly or explicitly synonymous, or else treat derogatory as a little stronger.) - -sche (discuss) 20:37, 25 December 2022 (UTC)[reply]

To add to the pile (and without copying the whole thing), the OED has "a derogatory word or form" (noun), "depreciatory, contemptuous; (linguistics) giving or acquiring a less favourable meaning or connotation" (adj) for pejorative, and "disparaging, disrespectful, lowering" for the relevant sense of derogatory. —Al-Muqanna المقنع (talk) 20:55, 25 December 2022 (UTC)[reply]

@Catonif, -sche, Al-Muqanna Thanks for the comments. It sounds like the two labels should stay merged. The question then, getting back to the original thread of this post, is should the categories and display text read "derogatory" or "pejorative"? I still maintain we should use "pejorative" as this is the more common term in a linguistic sense these days. Wikipedia uses pejorative suffix to describe things like -astre and -accio; derogatory suffix doesn't even exist as a redirect. Google Ngrams agrees, where 'pejorative suffix' is 5x as common as 'derogatory suffix'. derogatory term meanwhile redirects to pejorative. Benwing2 (talk) 22:26, 25 December 2022 (UTC)[reply]

For {{label}}ing words, "derogatory" struck me as the clearer (and more common) label, though I'll defer if consensus is otherwise. For affixes (at least in most languages), where the categorization is different and manual anyway, I see no problem with manually writing "pejorative" in the definition and {{cln}} instead of derogatory, and it does seem like the difference in display and categorization could be accomplished that way i.e. by spelling it out in the definition rather than using either word in {{lb|foo|...}}. E.g. in -uccio, the only entry in Category:Italian derogatory suffixes, that category has been manually added and the {{lb|it|derogatory}} seems redundant to the {{non-gloss definition|Diminutive suffix with patronizing or pejorative connotations when attached to specific words.}}, so I see no problem with dropping the label and renaming the category. Likewise, I agree with Al-Muqanna that {{pejorative of}} seems OK as it is, particularly for languages other than English that indeed have standard methods of deriving pejorative forms from base forms. (Whereas, I can see how we might prefer to not use {{pejorative of}} at all but instead use a {{label}} + an actual definition for an English entry like Eurofag... it isn't really a "pejorative form" of *Euro (“a European person”) anyway.)
On a procedural note, if we keep these merged at derogatory, we should make derogatory the lemma in the glossary rather than pejorative, since if typing {{lb|en|pejorative}} results in the display (derogatory), the displayed word is the one we should define, no? And if we distinguish them, e.g. using pejorative for affixes, or maybe even if we don't distinguish them, I'm inclined to try to expand the glossary (...and maybe derogatory and pejorative) with a little summary of what the refs above say about whether and how they're different... - -sche (discuss) 17:22, 26 December 2022 (UTC)[reply]

@-sche Can you clarify your preferences? Are you suggesting we rename Category:Italian derogatory suffixes to Category:Italian pejorative suffixes (manually added when necessary) but keep Category:Italian derogatory terms (generated by {{lb|it|pejorative}} or {{lb|it|derogatory}}) as-is? If so, we should maybe create a separate category for pejorative forms (generated by {{pejorative of}}), e.g. Category:Italian pejoratives or something. For example, Russian золоти́шко (zolotíško) is the pejorative variant of зо́лото (zóloto, “gold”); it's hard to say it's derogatory in any way, more like meaning "worthless gold", "ill-gotten gold", "that damn gold", etc. depending on the context. Sometimes it's best translated as simply "gold", with the pejorative sense clear from context. e.g.:

Поэтому местное золотишко добывают дети на нелегальных шахтах, работая, по сути, за еду.

Poetomu mestnoje zolotiško dobyvajut deti na nelegalʹnyx šaxtax, rabotaja, po suti, za jedu.

That's why the local gold is mined by children from illegal mines, working, in fact, for food.

(from context.reverso.net)

Benwing2 (talk) 03:51, 28 December 2022 (UTC)[reply]

Yeah, renaming Category:Foobar derogatory suffixes to Category:Foobar pejorative suffixes seems fine, and leaving золоти́шко defined (like it currently is) as "Pejorative of зо́лото". I'm not sure what the category for зо́лото etc should be. I wondered if it should be "Foo pejorative forms", since they're like derived forms(?) of the words, but I see that even T:superlative of just puts things in "Foo superlative adjectives" (and that "Category:Foo superlative adjective forms" is actually for inflected forms of those), so IDK. My only hesitation about "Foo pejoratives" is whether people will confuse it vs "Foo derogatory terms" and miscategorize things, but I suppose if both are added by templates and there's no reason for someone to manually add them, there's less risk of confusion/mixup. - -sche (discuss) 21:06, 1 January 2023 (UTC)[reply]

@-sche I looked into renaming 'derogatory suffixes' to 'pejorative suffixes'. There are only three languages represented currently (English, Spanish and Italian) and the English suffixes given (-tard; -fag; -ee mocking Chinese people; etc.) seem fundamentally different and nastier than the Italian and Spanish suffixes. I wonder if we shouldn't just remove the English suffixes from the renamed category. Benwing2 (talk) 21:45, 2 January 2023 (UTC)[reply]

I agree. I think we should just categorize English -fag et al as suffixes ({{en-suffix}}) and as derogatory ({{lb|en|derogatory}}), and stop manually adding "pejorative suffixes". I commented above that I don't think it makes sense to view an English word like Eurofag as a {{pejorative of}} *"Euro", it seems different from a situation like Russian where "pejorative suffixes" and regular derivation of pejorative forms from other words are a thing. - -sche (discuss) 23:26, 2 January 2023 (UTC)[reply]

@-sche OK, I renamed the suffix category. I also changed the language of 'derogatory terms'. Formerly, both 'derogatory terms' and 'derogatory suffixes' said LANGNAME terms/suffixes that belittle (lessen in value). I kept this wording for 'pejorative suffixes' but changed 'derogatory terms' to LANGNAME terms that are intended to disparage, demean, insult or offend. This seems more accurate or at least specific, and it mirrors the language used for 'ethnic slurs', which says LANGNAME terms that are intended to offend certain ethnic groups. Benwing2 (talk) 23:51, 2 January 2023 (UTC)[reply]

"Category:Uncountable nouns by language" vs "Category:Singularia tantum by language"

Category:Uncountable nouns by language vs Category:Singularia tantum by language seems to be the very same thing. It works particularly badly for Swedish: Category:Swedish uncountable nouns vs Category:Swedish singularia tantum. Propose to merge those 2 groups. Taylor 49 (talk) 19:10, 25 December 2022 (UTC)[reply]

I agree they're very similar, but they aren't the same. In English plural-only nouns (pluralia tantum) can also be uncountable, e.g. outskirts. As far as Swedish in particular goes a native speaker will have to weigh in. —Al-Muqanna المقنع (talk) 19:28, 25 December 2022 (UTC)[reply]

Agree with Al-Muqanna. They are largely the same but not exactly. There's a slight nuance and difference between them. Vininn126 (talk) 19:32, 25 December 2022 (UTC)[reply]

OK, outskirts is uncountable and pluralia tantum instead ... but are there pluralia tantum that are countable? Let me guess that over 90% of nouns in those 2 groups are assigned randomly, rather than on the basis of those slight nuances. If there is no universal definition of those nuances that could govern the use of those 2 categories consistently across languages, then they maybe should get merged nevertheless. Taylor 49 (talk) 20:34, 25 December 2022 (UTC)[reply]

There is a fair amount of inconsistency in how it's applied, yeah. For Latin marking a noun as singular-only in the head will categorise it into Category:Latin singularia tantum and not Category:Latin uncountable nouns, whereas for English the main setting on {{en-noun}} is whether a noun is countable/uncountable, and {{en-plural noun}} is (for some reason) a separate template. For the same reason, loads of Latin proper nouns get categorised into Category:Latin singularia tantum whereas English ones generally aren't found in Category:English singularia tantum. I don't think they can be merged as such though: they mean distinct things, and there are languages that don't generally mark for singular/plural at all but still have countability as a feature, e.g. we have Category:Chinese countable nouns. —Al-Muqanna المقنع (talk) 21:17, 25 December 2022 (UTC)[reply]

@Taylor 49 sunglasses, scissors, jeans, etc. are countable pluralia tantum. English typically uses "pair" to count such nouns but colloquially you can sometimes just say "two scissors" (but not so much #"two jeans"). It's true that uncountable pluralia tantum are relatively rare but the example of outskirts shows they do exist; English poetic language contains many such terms, like heavens, waters, rains, etc. Languages like Russian and Latin have special sets of numbers for counting pluralia tantum. I'm not sure there are such things as countable singularia tantum; it seems to me that singulare tantum implies uncountable, but I may be wrong. You are right however that these distinctions aren't always made consistently in the template implementations. Benwing2 (talk) 22:44, 25 December 2022 (UTC)[reply]

Let's write up, in the glossary or somewhere else, what distinction we aspire to maintain between these, with as many examples as possible of words that are at different intersections (plural-only and uncountable; plural-only and countable?; singular-only and countable?; etc). That would help with enforcing consistency on wayward entries and templates. I know Equinox (and inspired by him, I) has raised questions in some previous discussions about whether it's technically correct that templates like {{en-noun}} use "uncountable" to mean "does not have a plural form", i.e. treat those things as 100% synonymous and interchangeable, or whether there are cases where the two are distinct and we should tweak {{en-noun}}. - -sche (discuss) 04:52, 26 December 2022 (UTC)[reply]
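As a starting point for such a write-up, the English examples already given in this thread can be grouped by intersection. The sketch below is in Python and is purely illustrative: the `EXAMPLES` table and the function names are hypothetical, the classifications are the ones asserted by participants above rather than checked against the entries, and the empty cells are simply the combinations for which nobody here has offered an example yet.

```python
from collections import defaultdict
from itertools import product

# (number restriction, countability) as claimed in the discussion above.
EXAMPLES = {
    "outskirts": ("plural-only", "uncountable"),
    "heavens": ("plural-only", "uncountable"),
    "waters": ("plural-only", "uncountable"),
    "rains": ("plural-only", "uncountable"),
    "scissors": ("plural-only", "countable"),
    "sunglasses": ("plural-only", "countable"),
    "jeans": ("plural-only", "countable"),
}


def by_intersection(examples):
    """Group words by their (number restriction, countability) pair."""
    groups = defaultdict(list)
    for word, cell in examples.items():
        groups[cell].append(word)
    return dict(groups)


def empty_cells(examples):
    """List the intersections that have no example yet."""
    filled = set(examples.values())
    cells = product(("plural-only", "singular-only"), ("countable", "uncountable"))
    return [cell for cell in cells if cell not in filled]


if __name__ == "__main__":
    for cell, words in by_intersection(EXAMPLES).items():
        print(cell, words)
    print("no examples yet:", empty_cells(EXAMPLES))
```

Both singular-only cells come out empty here, which matches the open question above about whether countable singularia tantum exist at all.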

proposing to clean up column templates

FYI I am proposing to clean up column templates, removing the redundant ones. (Column templates include things like {{top2}}, {{rel-top}}, {{col}}, {{der3}}, etc. See Category:Column templates.) See WT:RFDO#remove lesser-used column templates. I am posting here as well in case people don't look at WT:RFDO; please add comments under WT:RFDO rather than here. Benwing2 (talk) 22:34, 25 December 2022 (UTC)[reply]

How is this different from my proposal to use {{col-auto}} and the related ones to merge it? Vininn126 (talk) 22:44, 25 December 2022 (UTC)[reply]

@Vininn126 Not the same. AFAIK your proposal was to replace templates like {{col4}} that manually specify the number of columns with the auto-specifying template {{col-auto}}, and there were several objections to this based on the non-optimal functionality of {{col-auto}}. My proposal aims in general to not make functional changes but to eliminate redundancy where the same functionality is implemented in multiple templates. Benwing2 (talk) 22:56, 25 December 2022 (UTC)[reply]

@Benwing2: I use {{top2}} as a starter template which can be used with {{bottom}} to templatise anything that needs it, any strung-out list with more than four entries. It can be revised to another template later if necessary. DonnanZ (talk) 12:52, 2 January 2023 (UTC)[reply]

Two RFV discussions about social media attestation for emoji

I'm listing these here for broader exposure because the RfV threads are several months old, and so could easily be overlooked. Please comment so we can settle the issues and reduce the non-English RfV backlog.

70.172.194.25 22:52, 27 December 2022 (UTC)[reply]

Well, it's the last day to comment on both of these according to the arbitrary predetermined time limit. It looks like "crab" will pass and "OK sign" will fail. 70.172.194.25 05:44, 11 January 2023 (UTC)[reply]

term for generalization of verb-noun compounds?

Verb-noun compounds are extremely common in many languages, e.g. the Romance languages. In English these are often (but not always) rendered in the form 'NOUN-VERBer', cf. Spanish lavaplatos (“dishwasher”, literally “wash-plates”), tocadiscos (“record player”, literally “play-records”), rascacielos (“skyscraper”, literally “scrape-skies”). Current etymologies of these terms are handled in all sorts of inconsistent ways. I am going to create a template, tentatively {{it-verb-noun}}, to standardize etymologies of such terms in Italian, using consistent wording and auto-categorizing into Category:Italian verb-noun compounds; it can easily be ported to other Romance languages. However, in the process of doing this I've discovered that in several such compounds, the second element is not a noun; cf. Italian ammazzasette (“braggart”, literally “kill-seven”); asciugatutto (“paper towels”, literally “dry-everything”); buttafuori (“bouncer”, literally “throw-out”). How should we handle such cases? Should we ignore the fact that the second element isn't a noun, or avoid categorizing, or rename the category to something else (e.g. Category:Italian verb-object compounds)? User:-sche, thoughts? Benwing2 (talk) 03:40, 28 December 2022 (UTC)[reply]

More specifically, it appears the verb is conjugated in the third person singular, so Spanish lavaplatos (“dishwasher”) literally translates as ("washes-plates"), or "[one who] washes plates"; likewise, Italian ammazzasette (“braggart”, literally “[one who] kills-seven”); asciugatutto (“paper towels”, literally “[that which] dries-everything”); buttafuori (“bouncer”, literally “[one who] throws-out”). Leasnam (talk) 17:54, 28 December 2022 (UTC)[reply]

In the case of Italian ammazzasette and asciugatutto, I think it's accurate enough to categorize them as verb-noun compounds. There is a definition of tutto as a pronoun, and in some part-of-speech categorization schemes pronouns are treated as a subcategory of nouns. Likewise, numerals are sometimes given definitions as nouns. In both cases, the second word seems to have the role of a direct object of the verb, like with the other examples with words that are uncontroversially nouns. So I would agree with calling the category "verb-object compounds". The case of buttafuori (literally “throw-out”) seems different, since fuori does not serve the role of a direct object of the verb. It looks like a few verb-noun compounds also exist that are not verb + direct object: for example, I saw trombamico (“fuck buddy”, literally “fuck-friend”). I'm also unsure about tornaconto: Treccani says "dalla locuz. tornare conto «essere utile, vantaggioso»" but I'm confused about why it has a plural form tornaconti, since I thought this kind of exocentric compound was invariable in Italian. It looks like tornaconto and some other odd cases are discussed in the paper "VN COMPOUNDS IN ITALIAN AND SOME OTHER ROMANCE LANGUAGES, PHRASAL SPELL-OUT AND REBOOTING" (L Franco)—that might be worth a read.--Urszag (talk) 03:51, 28 December 2022 (UTC)[reply]

@Urszag Other examples might be batticuore (“palpitation”, literally “beat-heart”) ("heart" is the subject), battimazza (“assistant blacksmith”, literally “beat-mallet”) ("mallet" is used instrumentally) and battifiacca (“lazybones”, literally “beat-laziness”) (unclear relationship). Benwing2 (talk) 04:04, 28 December 2022 (UTC)[reply]

Curious, but these appear to be formed using the second person verb conjugation (or imperative). In the case of battifiacca, perhaps this is using the sense of "pound, insist" as in "[one who] insists on being lazy" or "[one who] drives home [their] laziness" (?) Leasnam (talk) 18:02, 28 December 2022 (UTC)[reply]

I don't have any particular insight into this topic, I'm sorry. Do Italian sources consider ammazzacaffè, ammazzasette, batticuore, battimazza, and buttafuori to all be one type of compound (distinct from other types like noun-noun) that we should be seeking one umbrella term for, or might these represent multiple different categories? If we put verb-pronoun, verb-adverb, etc. compounds into Category:Italian verb-noun compounds, random users will probably try to 'helpfully correct' things by removing them; I don't know if it'd be better to put them into some "verb-object" category or just leave them in the main/catchall "compounds" category. What kind of compound is English blowout, or catchall? That might help with determining what buttafuori is. We have such an array of compound-subtype categories, it's hard to quickly determine which ones apply, and the categories themselves seem to be inconsistently categorized: Category:Italian verb-noun compounds is given as a subcategory of Category:Italian exocentric compounds, but this seems wrong, because as Category:Dutch endocentric verb-noun compounds shows, "verb-noun" is not inherently a subset of "exocentric"... so we may need to check the current setup...
- -sche (discuss) 09:28, 28 December 2022 (UTC)[reply]

Do we need categories "Category:<lang> e*ocentric verb-noun compounds" at all? Aren't they always just the intersection of "Category:<lang> e*ocentric compounds" and "Category:<lang> verb-noun compounds"? --Lambiam 10:49, 28 December 2022 (UTC)[reply]

Such categories can be useful for those not versed in Cirrus Search. I would argue that anyone who needs the list of members of this particular category should become versed in Cirrus Search to generate it when needed. I would think we would want to follow a policy of eliminating categories that are simple intersections of other categories unless a normal or casual user would find them very useful. An additional exception might be for an intersection with many members that was very frequently in use. DCDuring (talk) 15:59, 28 December 2022 (UTC)[reply]

Perhaps we should add some info about the use of Cirrus Search, with a pointer to m:Help:CirrusSearch, at our page Help:Searching. --Lambiam 14:04, 30 December 2022 (UTC)[reply]
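
For anyone following the category-intersection point above, here is a minimal sketch of how such a list could be generated on demand. It assumes CirrusSearch's `incategory:` keyword (several of them in one query are combined with AND) and the standard `index.php?search=` entry point. The helper names `category_intersection_query` and `search_url` are hypothetical and used only for illustration.

```python
from urllib.parse import urlencode


def category_intersection_query(*categories: str) -> str:
    """Build a CirrusSearch query for pages that are in all given categories.

    Each category becomes one incategory:"..." clause; clauses separated by
    spaces are ANDed together by the search engine.
    """
    return " ".join(f'incategory:"{name}"' for name in categories)


def search_url(query: str, base: str = "https://en.wiktionary.org/w/index.php") -> str:
    """Turn a query string into a search URL (base URL is an assumption)."""
    return f"{base}?{urlencode({'search': query})}"


query = category_intersection_query(
    "Italian exocentric compounds",
    "Italian verb-noun compounds",
)
print(query)
print(search_url(query))
```

Note that `incategory:` only matches pages placed directly in the named category, so entries that sit only in a subcategory would be missed by this sketch.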

Wiktionary:Forms and spellings - leetspeak?

Is there a reason for the explicit leetspeak "exception" that's given pride of place in the lede here? It's been in the page since it was created in 2007 but comes off as bizarrely dated now; if it's an actual policy exception it can probably be moved somewhere else on the page. —Al-Muqanna المقنع (talk) 11:20, 29 December 2022 (UTC)[reply]

I'd move it to its own heading. DCDuring (talk) 16:24, 29 December 2022 (UTC)[reply]

That reminds me: I see no way to cover it on Wiktionary, but in early BBS/Internet days, kids would wRiTe LiKe tHiS in alternating caps to show that they were kewl and underground. I thought of this because I've recently seen the same style used in generic mockery, e.g. i LoVe DoNaLd TrUmP!!! Equinox ◑ 16:28, 29 December 2022 (UTC)[reply]

I guess it's spelled out because the border between "this is obviously inclusion-worthy" (e.g. flavour vs flavor) and "this is obviously not includable" (e.g., uppercasing THIS word for 𝘦𝘮𝘱𝘩𝘢𝘴𝘪𝘴) is kinda indistinct, so if we accept some leet, it's helpful to make that clear...? (As DCDuring says, we could move it further down the page / under its own heading, though.) I'm guessing Equinox's CaMeLcAsE falls on the "systematic/regular and so not includable" side of the line, since you can do it to any word...? - -sche (discuss) 19:59, 11 January 2023 (UTC)[reply]

@-sche: That's true, I suppose, but the current note also doesn't really clarify anything IMO, in particular about where the line is between leetspeak that deserves an entry vs. rote substitution similar to the other stuff you mention. Do we need an explicit exception from the existing guidelines? There are only 71 entries in Category:English leet, and a lot of them clearly aren't rote substitutions, e.g. -0r and its derivatives. Some of them on the other hand seem miscategorised to me (the ph- substitution in phun is an earlier hacker thing as per the citation, originally based on phreaking I think, and not "leetspeak"), and others are pretty dubious per the rote substitution rule (ch3ap, which isn't defined or cited as leetspeak anyway, passw0rd). —Al-Muqanna المقنع (talk) 20:17, 11 January 2023 (UTC)[reply]

On banning dogwhistles

The political sense of dog whistle is in my experience always derogatory. I have just now labeled it as such. As a derogatory term it does not belong in definitions. I propose to ban dogwhistle from definitions, including usage notes and citations intended to get the word in through the back door. Of course, Citations:dogwhistle welcomes all comers.

We would benefit from a standardized set of labels that express the right/left divide common in American English, in cases where it is not otherwise clear. You don't see many liberals saying China virus and you don't see many conservatives talking about gender affirmation. If you say either I have a good idea what you are. We have a usage note at Democrat saying "Adjectival use carries a strong connotation of political conservatism." That's all that really needs to be said, and there is some benefit to saying it because attributive use of Democrat was polarized by an accident of history. Vox Sciurorum (talk) 20:52, 29 December 2022 (UTC)[reply]

On an unrelated note, the current definition at dog whistle ("[...] that only a certain audience is intended to note and recognize its significance.") sounds ungrammatical. P U C – 21:02, 29 December 2022 (UTC)[reply]

These edits broke it. P U C – 21:06, 29 December 2022 (UTC)[reply]

Where has it been used? It certainly should not be used in glosses on sense lines. Equinox ◑ 21:09, 29 December 2022 (UTC)[reply]

One use that I think would be difficult to explain any other way (unless we delete the definition) is at 👌 as mentioned up above at Wiktionary:Requests_for_verification/Non-English#👌. —Soap— 22:13, 29 December 2022 (UTC)[reply]

I don't think this proposal makes sense at all. The term dog whistle describes a concept that there is no other obvious word for. I also don't see how the word is derogatory, either: the connotation might be negative, but by the same logic lie would be a derogatory term as well. I also don't think there are any other words for which we have a ban as broad as the one proposed here. Theknightwho (talk) 22:25, 29 December 2022 (UTC)[reply]

We could make a case for excluding any use of derogatory or offensive senses of terms in definitions. I would prefer such a broad, principled exclusion to an itemized list of senses to be excluded. It may not be practical. DCDuring (talk) 21:17, 29 December 2022 (UTC)[reply]

Examples would help show the need for this policy. "China virus" is not a "dogwhistle" by any sense of the term because it's obvious to all what it means—people just disagree about whether it's an appropriate name. The concept of a "dogwhistle" is a bit complicated and disputable, but there are some limited situations where I think it may be helpful: the best case that I know of is the word fren, where it is both a) extremely clear to anyone who looks into it that it was being used by some speakers as a kind of "coded" language and b) sufficiently unobvious that someone unfamiliar with that usage might plausibly not know about the coded meaning when seeing a less blatant example of the word being used this way. I don't think the usage of "dogwhistle" in the usage note there is inappropriate; that said, the current definition and usage note there could still be improved (I think it's usually inaccurate to say that this use means "white nationalist; fascist; far-right supporter", my sense is that it was more often used by such speakers with a meaning something like "co-ethnic" or "white", as part of a worldview that stereotypes non-white inhabitants of Western countries as predominantly criminal and violent hence "nonfrens": some examples: "It's a shame to frens had to flee their homes. Don't let the nonfrens destroy their homes! We should all be able to live freely with our own frens." and "For every 1,000,000 quran non fren, there is one fren"). Comparing this to comrade, I see that word simply labels some senses with "communism"; would this strategy be recommended for other politically charged terminology (a label with the name of the relevant political ideology)?--Urszag (talk) 22:45, 29 December 2022 (UTC)[reply]

This is an interesting question. Several entries that currently contain the phrase do seem like they'd be better described in other ways, because they don't seem to be dog whistles (regardless of whether dog whistle is derogatory); e.g. Russian Z as "a pro-war symbol" doesn't seem to be "aimed at particular groups [to] only be fully understood by them" (to use MacMillan's definition of dog whistle) or "targeting [...] potentially controversial messages to specific voters while avoiding offending those voters with whom the message will not be popular" (Collins) or "only intended for and heard by a particular group of people" (Oxford Learner's), since it seems to be (intended to be?) widely heard and understood also by non-supporters. OTOH, as Urszag and TKW say, some things are dog whistles.
I hadn't interpreted it as a derogatory for those things, but I see Cambridge and MacMillan do label it "disapproving" and "showing disapproval", respectively. Collins, Dictionary.com and Merriam-Webster attach no such label. Hmm... as TKW says, a speaker may think using dog whistles is negative, like making e.g. false or inflammatory statements is negative, but I think to be {{lb|en|derogatory}} a term must be more pejorative than some neutral description of the thing (hence false is not a {{lb|en|derogatory}} word), so what do you suggest would be a neutral description for dog whistles? Coded language? Hmm; I can see how someone would view dog whistle as more pejorative than coded language (and the latter is also slightly more common). OTOH, are they really interchangeable?
Dog whistle seems to be a specific concept: as the definitions above say, the whistle is intended to be picked up by many people (the audience it's intended to signal to) while its offensiveness is not explicit, whereas coded language seems to put emphasis on being intended to be obscure and not understood by many people (who are uninitiated), only by initiates.
So, I think there are some entries that could stop using the phrase dog whistle simply because they may not be dog whistles in the first place, e.g. maybe Russian своих не бросаем (svoix ne brosajem) is just a "slogan" or "rallying cry"? And maybe stunning and brave is just a memetic set of words (see RFV)? But some things are dog whistles, and for those, it seems like the right term to use, especially in usage notes — as are being discussed for groomer at RFV, for example. - -sche (discuss) 19:25, 30 December 2022 (UTC)[reply]

I'm not sure about there being a real distinction between 'dog whistle' and any 'secret' term, in early use not well known outside a group, used to signal membership in, identification with, or insider status with said group. DCDuring (talk) 19:39, 30 December 2022 (UTC)[reply]

True dogwhistles are deceptive: those who use them are intentionally hiding the message they're sending to the target group by using language that they know the others won't interpret the same way. As for @DCDuring's point, dogwhistles deliver hidden messages, usually more than just secretly identifying themselves with the target group. Chuck Entz (talk) 20:52, 30 December 2022 (UTC)[reply]

Maybe I am reading too much into the metaphor, but a literal dogwhistle gets a response from a dog, not others, ie, it is cryptic to outsiders. Isn't that a defining characteristic of lots of in-group language? (BTW, isn't dog-/dog used to demean whatever follows? Ie, isn't it as offensive, suggesting that the audience for the cryptic term are "dogs"?) Is the problem with dogwhistles that they are cryptic slurs? If they are such, do they only become offensive when they cease being cryptic, ie, no longer truly dogwhistles in the sense of the metaphor? Would the "dogwhistle" phenomenon, perhaps under another, non-offensive name belong in the etymology? DCDuring (talk) 22:03, 30 December 2022 (UTC)[reply]

Don't words like diversity/diverse, inclusive, self-identify, gender politics, LGBTQ+ serve as dog whistles carrying meanings (connotations?) for some far beyond what most normal definitions a dictionary would have? DCDuring (talk) 22:21, 30 December 2022 (UTC)[reply]

No. If "diversity" means more to someone than what is intended, it's not because it's a cypher for bigotry like "welfare queen" is. —Justin (koavf)❤T☮C☺M☯ 22:24, 30 December 2022 (UTC)[reply]

But welfare queen is not in any way cryptic, at least no more than many figurative uses of terms like queen.

I'm trying to get to an explicit set of differentiae for dog whistle because I am not really in tune with those to whom it seems natural and obvious. It seems that a dogwhistle term merely has to be cryptic (deceptive to the non-initiated), cryptic by intent of the speaker/author, and political. It does not have to be offensive, eg, fren. Does the nature of the intent matter? What if the intent is to evade content filters? Does the term have to be used by a group constituting a small minority (bolsheviks!), possibly in the nature of a conspiracy? The uses of the term dogwhistle here seem almost entirely directed at terms used in right-of-center use of social media. Is that an essential differentia? DCDuring (talk) 23:40, 30 December 2022 (UTC)[reply]

Judging from current use, a dog-whistle need not be in any way cryptic, either in intent or effect, certainly not when it becomes readily attestable. Nor need it be from a minority or a conspiracy group. Is a dogwhistle anything more than "that which inflames political discussion, rallying each side to its characteristic attitudes and beliefs"? (It seems that it could be a lyric, a tune, a genre, a flag, a person (eg, Winston Churchill), a statue, a holiday, a book, a play, a movie, a TV show, a social media company, etc.) DCDuring (talk) 00:35, 31 December 2022 (UTC)[reply]

Yes, the meaning of the term has evolved over time. I held out against the change for a long time, saying everyone else had it wrong, and was pleased to see that our current definition sticks to the earlier tradition and defines it specifically along the analogy with the physical dog whistle .... something only a few people can hear. And other people listening to a speech will not hear an ordinary offensive word in its place; they'll hear nothing at all. That's the meaning I learned, and why I don't want to just switch to another term.

But we're a descriptivist dictionary, as we so often say, and we may be against the tide on this term, as I've heard many more people use it with much broader meanings, such that any word or deed (but usually a word) that is offensive but has at least some derived non-literal meaning can be called a dogwhistle. —Soap— 00:45, 31 December 2022 (UTC)[reply]

It appears that the word was sometimes used only about racist-associated terms. Not so much lately. Socialism is referred to as a "dog whistle". My concern is that dog whistle has morphed so much as to be worse than useless, instead misleading, as a defining term or a label. I think that the use in journalism came to refer to terms that were clearly not obscure to anyone on either side or the middle of the audience political spectrum. As many (most?) readers were not dog-knowledgeable, they were not familiar with, or focused on, the idea that it was something that dogs heard and not others, ie, cryptic. Thus it seems that it now means something that summons the dog-whistlers' allies, their dogs. That may be what leaves it with a derogatory flavor, potentially offensive. DCDuring (talk) 01:41, 31 December 2022 (UTC)[reply]

You're assuming that you're always able to perceive them, and that therefore if you do not, the term must be meaningless. I see it a different way: the fact that you find some of them ridiculous is precisely why they get used as dog whistles in the first place: they present plausible deniability. Theknightwho (talk) 02:43, 31 December 2022 (UTC)[reply]

@DCDuring: we need to keep straight what is meant by "derogatory". Dogwhistles themselves aren't necessarily derogatory- they may be used to say neutral or positive things that the mainstream would disagree with. But we weren't discussing that. We were discussing whether to avoid describing things as dogwhistles because the term "dogwhistle" is derogatory towards the terms so described (and especially toward the people who use them). To use an analogy: a lie can be used to make something look better than it is, so it's not inherently derogatory. I would avoid describing something as "a lie" in an entry, however, because that wouldn't be NPOV. Chuck Entz (talk) 03:36, 31 December 2022 (UTC)[reply]

Indeed. The virtually complete disappearance in current usage of the cryptic nature of terms called dog whistles, a phenomenon noted in commentary, has made our "political" sense seem a bit dated, as Soap acknowledged above. The citations I have added don't fit that definition very well. The steady drift of meaning from the metaphor, to "a euphemistic inflammatory rhetorical device", to "call to arms" makes the term likely to be misleading in use as a defining term or a label. That the term itself is offensive, demeaning to the audience for such terms, is just icing on the cake, though many seem to thoroughly enjoy its flavor. DCDuring (talk) 05:00, 31 December 2022 (UTC)[reply]

Having read through this thread twice just to make sure I wasn't missing anything, I am still at a loss as to how you reached this conclusion at all. I'm also not at all convinced that lie is a POV term either. Some things are very plainly just lies, and we shouldn't shy away from that. Theknightwho (talk) 22:25, 31 December 2022 (UTC)[reply]

I don't see what lie has to do with dog whistle. At no stage in the evolution of the meaning of dog whistle was truth or falsity inherent in the content of the purported dog whistle. Furthermore, I don't think it is our role to opine on the truth or falsity of anything other than the meaning of words, which still seems to get us into PoV difficulties.

Which part of the conclusion bothers you?

Is it that the term dog whistle is per se offensive, whether what is being dog-whistled be true or false?

Is it that the terms referred to as dog whistles now are hardly ever 'inaudible' (aka 'cryptic') to anyone?

DCDuring (talk) 03:12, 1 January 2023 (UTC)[reply]

(AFAICT, Theknightwho was mentioning lie here because Chuck did.) The term dog whistle is obviously not "offensive"; two of five dictionaries I checked say it "expresses disapproval", a majority have it as a neutral term, none of them say it's "offensive" or "derogatory", and all of them say it's for something intended to be understood only by a target audience while concealed from others, so if you're proposing to radically redefine it — probably your definition should be a separate sense entirely — and to label it "offensive", it'd be helpful to see evidence supporting either of those changes. - -sche (discuss) 03:27, 1 January 2023 (UTC)[reply]

Correct. And the part of the conclusion that bothers me is that (a) DCDuring's argument for calling it offensive is extremely weak, as the metaphor is not using "dog" in a negative way; and (b) he has failed to address the point that he's assuming dog-whistles are all "obvious" nowadays because he's forgotten/has chosen to ignore the existence of the ones he doesn't spot. Theknightwho (talk) 05:18, 1 January 2023 (UTC)[reply]

Here is a thought experiment to tell if a word is gaining a derogatory meaning. Would you use it to describe yourself? In the third person one says dog whistle. In the first person, perhaps in joke or coded message? "I wear a hankie in my left vest pocket to signal to other men that I like to watch ballet instead of football. He wears a hankie in his left vest pocket as a dog whistle to other perverts." (Allegedly there was a handkerchief code but my example is invented.) Every use of dog whistle I have seen in the past few years has been a put down, a call to arms against the whistler who is not part of the ingroup. It may be a politically polarized term, used by liberals to put down conservatives. That impression of mine could be a sampling bias. If I went to Breitbart would I find it used as a call to hate liberals? Vox Sciurorum (talk) 20:47, 1 January 2023 (UTC)[reply]

I think this falters in two places. One, when a thing is (viewed as) negative, a person is less likely to describe themself as doing it, regardless of whether a word for the thing is neutral or negative: you might say "as you all know, my opponent has made a number of false and inflammatory statements about me", but you probably wouldn't tell a crowd of supporters "thank you all for coming out and supporting my campaign! I am going to make a number of false and inflammatory statements now", but the reason for that is not that false or inflammatory is derogatory. And two, a person is particularly unlikely to go around saying "next, I'm going to say something intended to be understood only by certain people, while its offensiveness is concealed from others", but that's because it'd be contrary to the point of being "intended to be understood only by certain people, while its offensiveness is concealed from others", not because any of the words which make up that sentence are derogatory. - -sche (discuss) 21:26, 1 January 2023 (UTC)[reply]

But again, I'm not saying "this word is fine so let's add it to everything!"; as I said, I think several things we currently describe as dog whistles are just "slogans" or rallying cries (Z, своих не бросаем (svoix ne brosajem), GAWA). Even some of the things which I thought might be good examples of dog whistles seem to just as often be described — in references / sources outside Wiktionary, I mean — as slurs, e.g. groomer and globalist, so maybe even there we can get by discussing the words' "connotations", their use by certain factions to convey certain things, their use as "slurs", etc, without needing the term "dog whistle", IDK. But if sources describe something as a dog whistle, I don't see the word as inherently problematic. - -sche (discuss) 01:34, 2 January 2023 (UTC)[reply]

I don't see the necessary connection between a dog-whistle term being false and its being a dog whistle. Most dog whistles seem not even to be propositions. (Prototypical examples are nouns that remind the audience of narratives, which are also not dog whistles, eg "Willie Horton", "neighborhood schools", "busing", "bail reform", "set-em-loose Bruce", "border crisis".) I thought truth and falsity only apply to propositions. I certainly see that dog-whistle terms are inflammatory (aka, mobilizing). I suppose that most users of the term dog whistle believe that the thoughts, attitudes and beliefs of the other side are intrinsically evil and, therefore (a non sequitur, BTW) false, so any element of the opposing ideology must be false, a lie. DCDuring (talk) 22:37, 1 January 2023 (UTC)[reply]

I agree there's no inherent connection between something being a dog whistle and it being true or false, since as you say, many dog whistles are not really propositions at all. I don't think anyone has said there is such a connection...? Several people have brought up the words lie or false or inflammatory as examples of other words which, like dog whistle, refer to things/qualities which are commonly disapproved-of (but where the word for the thing is not itself derogatory). - -sche (discuss) 00:43, 2 January 2023 (UTC)[reply]

The word as currently used (See entry.) is polysemic, potentially misleadingly. A dog-whistle term is very often not obscure to anyone; it just has different valence and different mobilizing/inflaming effect. If it sometimes means cryptic/coded and sometimes not, then it is confusing or misleading in use either as definiens or label.

Furthermore, it is offensive and demeaning to those being called to action/vote/donation/arms by the dog-whistle terms: they would be entitled to feel that they are being compared to dogs. Please note that it is highly likely that many don't know the original function of a dog whistle. Some seem to believe that it is a device used to summon a dog, a sub-human animal in the opinion of many. Our own definition of dog includes "(derogatory) Someone who is cowardly, worthless, or morally reprehensible." Compound terms including dog used attributively often indicate something that is inferior. I believe that we generally shun words that are offensive to groups. DCDuring (talk) 04:12, 2 January 2023 (UTC)[reply]

Can you supply any evidence in support of your argument that it's offensive? As far as I can tell, you are the only person who has claimed this. Theknightwho (talk) 05:10, 2 January 2023 (UTC)[reply]

I'm not sure that I know how. Looking at terms like ))) (((, a pox on, abortionist, autism, it's clear that we often have no support for what seem to me to be disputable claims of offensiveness. Most terms deemed offensive are not challenged because they have only one sense, which sense is clearly derogatory, or because the offensiveness seems obvious. I suppose we would have to go back, perhaps more than ten years, to instances of offensiveness labels or usage notes being first applied to our entries to see how it was done. I don't have any specific recollection of where, when, how, by whom, and to what entry/sense it was done. Maybe disputable cases included the word microaggression in discussion. DCDuring (talk) 18:44, 2 January 2023 (UTC)[reply]

Well, you seem to have formulated this view during this discussion (BTW, isn't dog-/dog used to demean whatever follows? Ie, isn't it as offensive, suggesting that the audience for the cryptic term are "dogs"?), and you've provided your reasoning for how you believe it's offensive. What I'm asking is whether you actually have evidence that it's actually viewed that way by anyone other than you, which doesn't really have anything to do with any other entries that may or may not be mislabelled. Theknightwho (talk) 18:49, 2 January 2023 (UTC)[reply]

How I formulate my view and what is acceptable here as evidence are distinct matters. I am not sure that we have any systematic way of supporting an assertion that a definition is offensive. In practice we seem to say that a term with some definition or usage pattern is offensive if:

it has been asserted multiple times in print to be offensive (though we rarely, if ever, include citations supporting the claim),
it is seemingly pejorative and directed against an identifiable group, especially a disadvantaged one (at least one not disapproved by us), or
it annoys enough of us.

As to dog whistle, I have suggested that dog renders it disparaging to the audience supposed to be differentially affected by a given dog-whistle term. Judging by usage the group is apparently US voters and others who are concerned about matters that have a differential effect on racial or ethnic groups, such as neighborhood schooling, busing, location of various kinds of housing and public facilities, quotas, affirmative action. This seems to coincide with those famously called deplorables and includes a large portion of US Republican voters. Much of the usage of the term is clearly about the use of dog-whistle terms to mobilize such persons to do things like vote, make political contributions, volunteer for political campaigns, talk up their cause and candidates, and show up at political events such as rallies.

At the very least the term is used in a partisan way. Even if no participants in this discussion see it as offensive, the partisan nature of the usage is clear. I only found a single recent use of the term that was not partisan.

I think the term could be compared to virtue signalling (or bleeding heart), which is similarly partisan and, thereby, offensive. DCDuring (talk) 19:40, 5 January 2023 (UTC)[reply]

This is an even more ridiculous argument. It's got nothing to do with Republicans, in the same way the word lie doesn't. At this point, it's starting to feel as though you object because you just don't like acknowledging that this phenomenon exists, which is supported by the fact that you actively deny that it has any current relevance throughout this discussion (despite the evidence to the contrary). Theknightwho (talk) 21:01, 5 January 2023 (UTC)[reply]

Yeah, I come back to the thing you (TKW) and I both touched on at the start of this discussion, which is: what's a better term for this phenomenon, then? If someone thinks dog whistle is derogatory because it contains dog, what's a neutral way of describing "this word is used to signal X to Y, without other people (who might find that offensive) realizing"? DCDuring, you've said you just don't see "a real distinction between 'dog whistle' and any 'secret' term" (at which point Chuck and others explained the distinction). Based on that, and the lack of suggestion of what term or phrase would be better for describing this phenomenon, I (like TKW) get the sense that the underlying objection is acknowledging that the phenomenon exists, which is not an actionable objection, since the phenomenon is well-documented. - -sche (discuss) 21:58, 5 January 2023 (UTC)[reply]

TKW: Have you actually looked at who uses the term and about what? It is no less or more partisan than virtue signalling, bleeding heart and similar terms.

-sche & TKW: Current uses of dog whistle no longer refer to anything secret/cryptic. The term has connotations derived from its literal meaning and may have sometimes been used in accordance with that meaning, but it clearly no longer does so.

-sche: But what is the actual phenomenon if the cryptic/secret element is gone? It looks like labeling a phenomenon in a way that works well for one's own side in a partisan battle. The terms called dog whistles by their detractors are motivating terms for issues for the supporters of those who use the terms, ie, rallying cries, and motivate opposing partisans in the opposite direction. But anyone using political rhetoric will label things they don't like using terms that make clear what they are talking about without using terms that would bring opprobrium to their side. Rallying cries and indirect references to them also serve to identify which side the speaker is on.

Calling something a dog whistle is an attempt to produce a disparaging label for such ordinary and possibly effective political speech. The disparagement has the usually intended effect of marginalizing the issue as framed by the dog whistler, so as to prevent anyone whose mind is not yet made up from taking the issue referred to seriously in that frame. That disparagement too is ordinary political speech. But political speech is not a legitimate function of a purportedly neutral, objective, NPOV entity like Wiktionary.

An interesting neologism of some relevance is algospeak, which might apply to social media use of some terms. It embodies an alternative to the dog whistle hypothesis about motivation for use on social media of some of the terms labeled dog whistle. The use on social media might lead to use in other contexts, where the term would have a signaling/identifying function. DCDuring (talk) 23:55, 5 January 2023 (UTC)[reply]

But the cryptic/secret element hasn't gone - which is the point. The fact that many people can see them doesn't change the fact that they are used because they give plausible deniability. I find it disingenuous in the extreme to compare dog-whistles to ordinary political discourse, to be honest. Particularly given that you seem to be playing a rhetorical trick here: you're using arguments against the term dog whistle as a way to argue that the concept itself should not be referred to in glosses. Those are two very different issues: we don't pretend propaganda doesn't exist, just because we don't use newspeak in glosses. Theknightwho (talk) 01:57, 7 January 2023 (UTC)[reply]

Try https://lawandcrime.com/high-profile/twitter-explodes-after-homeland-security-headline-appears-to-mimic-14-words-neo-nazi-slogan/ . Yes, it's the 21st century and the power of people who see such things to mention them has exploded. But it's still a dog whistle.--Prosfilaes (talk) 19:04, 8 January 2023 (UTC)[reply]

Nothing in the discussion has solved the question. The first relevant result in my search is white pride, which does not match either of our definitions of dogwhistle. This is itself blowing the whistle to hypersensitive and poorly informed readers by stating the obvious, actually. 62.155.150.198 19:41, 7 January 2023 (UTC)[reply]

I think the term's meaning has evolved ... right now our entry sticks to the traditional meaning of a message only a few people can hear, but many people these days use it to mean any expression of racism, even something whose meaning is obvious to all. The debate about how to define the new sense (if we accept it at all) might outlast this discussion, though so far it seems not to have attracted attention. —Soap— 16:24, 11 January 2023 (UTC)[reply]

Another evolution of the term is that it means "inflammatory term" or even the more completely bleached sense of "a figurative whistle", "a call to action" ('action' being defined loosely). What's more, the usage in these figurative senses occurs in the same context. With the term carrying at least four figurative meanings I don't see how it is suitable for use as a definiens or label, however colorful it may be. Like many terms it has lost its ability to designate something specific. DCDuring (talk) 17:03, 11 January 2023 (UTC)[reply]

Relevant result in what search?--Prosfilaes (talk) 19:04, 8 January 2023 (UTC)[reply]

IMHO, this discussion should appear on Talk:dogwhistle, or its archived location should be referenced there. DCDuring (talk) 17:06, 11 January 2023 (UTC)[reply]
good idea; done: [1], [2], [3]. - -sche (discuss) 19:28, 11 January 2023 (UTC)[reply]

Template editor right request

Hello, I would like to request the template editor user right. I think I have a good editing background here, and by having that right I could make corrections easier, like at Module:zh-glyph/phonetic. Thanks, ChromeGames (talk) 05:49, 31 December 2022 (UTC)[reply]

@ChromeGames The template editor right gives the ability to change a lot of fundamental modules and templates, which can easily mess up the system if done wrong. I don't see any edits you've done that involve templates or modules, and if your interest is only in specific Chinese modules, it might be easier to downgrade their protection. (See, though, Wiktionary:Grease_pit/2020/January#Requesting protection for Middle Chinese and Old Chinese pronunciation modules, where the protection was originally requested on the grounds that these rarely need to be changed. Are there specific changes you want to make?) Benwing2 (talk) 21:30, 1 January 2023 (UTC)[reply]

@Benwing2: Thanks for the response. You're right that I am not the most experienced template editor, but I think I have enough edits here and on Wikipedia to show that I wouldn't break things I shouldn't. That said, it might indeed be easier to reduce the page protections. Currently I want to replace 鱽 with 魛 to have the correct character in the phonetic series, but I mainly wanted permissions so that I don't have to keep requesting changes, such as here and here, where at the latter @Huhu9001 suggested I could consider requesting this permission. Hope you're having a good new year, ChromeGames (talk) 18:09, 6 January 2023 (UTC)[reply]

@ChromeGames I downgraded protection on Module:zh-glyph/phonetic (that page is hard to load, needs to be split up ...). Let me know if there are any other pages needing protection changes. Benwing2 (talk) 19:41, 6 January 2023 (UTC)[reply]

@Benwing2: Thanks, I was able to edit the page successfully, and I really appreciate the help. The other pages that contain the Middle and Old Chinese pronunciations/reconstructions seem more direct and less prone to simple transcription errors, so I think it's fine to keep them as they are, with the current instructions to propose changes at the Tea room. Thanks, ChromeGames (talk) 20:40, 6 January 2023 (UTC)[reply]