Wiktionary:Grease pit


Welcome to the Grease pit!

This is an area to complement the Beer parlour and Tea room. Its purpose is specifically for discussing the future development of the English Wiktionary, both as a dictionary and thesaurus and as a website.

The Grease pit is a place to discuss technical issues such as templates, Lua modules, CSS, JavaScript, the MediaWiki software, extensions to it, Toolforge, etc. It is also a place to think in non-technical ways about how to make the best free and open online dictionary of “all words in all languages”.

Others have understood this page to explain the “how” of things, while the Beer parlour addresses the “why”.

Permanent notice

  • Tips and tricks about customization or personalization of CSS and JS files are listed at WT:CUSTOM.
  • Other tips and tricks are at WT:TAT.
  • Find information and helpful links about modules, Lua in general, and the Scribunto extension at WT:LUA.
  • Everyone is encouraged to expand both pages, or to come up with more such stuff. Other known pages with “tips-n-tricks” are to be listed here as well.

Grease pit archives
2006
2007
2008
2009
2010
2011
2012
2013
2014
2015
2016
2017
2018
2019
2020
2021
2022


May 2022

Bot for auto-creating conjugations from a verb

I'd love a bot which would create all conjugations from a verb, especially for German, as it can be quite tedious (even with acceleration), since often lots of the forms are not shown, like the zu-infinitive and the preterite tense. ADDSamuels (talk) 09:59, 1 May 2022 (UTC)

Unicode Private Use Area

I tried to create the entry for (U+E864), but was unable to create it.

Specifically, I tried to post the following content:

{{character info}}
==Chinese==
===GB18030===
''For pronunciation and definitions of <span style="font-family : 'SimSun', 'MingLiU', 'Dotum', 'Gulim', 'Gungsuh'">{{PAGENAME}}</span> – see [[龻]]''

See w:GB 18030. The entry would be useful as a soft redirect. --Charidri (talk) 09:22, 3 May 2022 (UTC)


Chinese
GB18030

For pronunciation and definitions of – see  ← This is how I would like it to look. --Charidri (talk) 09:33, 3 May 2022 (UTC)

@Charidri: we don't create entries for Private Use Area characters. The character you see on your system and the character I see on my system may be completely different. Anyone can use those codepoints for anything they want. Chuck Entz (talk) 13:40, 3 May 2022 (UTC)
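I understand that. However, for code points such as U+E815 through U+E864, where the correspondence between Unicode and GB18030 (the Chinese standard) is clear, entries are worth creating. Even where the glyph looks the same, an entry can state clearly that the character is a different one. Anyone can verify this with standard fonts such as SimSun. It is a de facto standard. --Charidri (talk) 04:54, 4 May 2022 (UTC)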
@Charidri: I just created Appendix:Unicode/Private Use Area/GB 18030. This would be sufficient. --172.58.36.21 01:14, 7 May 2022 (UTC)
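Thank you for creating the list. However, I think it would be better to have an individual entry for each code point. A single entry would let readers check across standards: for example, U+E864 is 龻 in GB 18030, 藮 in HKSCS, ᄫᅡᇹ in Hanyang, and so on. Readers could check the main assignments, and it would be clear that the character is a PUA character. --Charidri (talk) 06:11, 7 May 2022 (UTC)
For now, I have specified GB-compatible fonts at Appendix:Unicode/Private Use Area/GB 18030. However, the value of creating PUA entries is unchanged. The advantage of a PUA entry is that a character can be identified as PUA even when it looks identical to another character. It should be possible to create individual entries for PUA code points, including (U+E864). --Charidri (talk) 10:16, 25 May 2022 (UTC)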

Module:el-translit

There is currently a module error at Greek ψεύτρα (pséftra) due to interaction between the respelling used in a transliteration parameter:

wikitext: {{m|el|ψεύτης|g=m|t=liar|ψεύ(της)}}

and a mw.ustring.gsub capture on line 43 of the module. @Sarri.greek pinged @Benwing2 in her edit summary, but he hasn't been active for over a month.

Her edit summary was:

ety -τρα // bug: if i write {{l or {{af or {{m|el|ψεύτη|ψεύτ(ης)}} it works, but gives error if τ is moved: {{m|el|ψεύτη|ψεύ(της)}} Attn. Benwing2

Any help would be appreciated, as I have no clue about the finer points of ustring functions. Thanks! Chuck Entz (talk) 15:07, 5 May 2022 (UTC)

Thank you @Chuck Entz. I did not realize it was a transliteration problem. It is caused by the parenthesis, which Module:el-translit does not expect: in the line beginning
text = gsub(text, "([αεηΑΕΗ])([υύ])(.?)",
the 'following' letters do not include a parenthesis symbol.
This intrusion of a parenthesis mark is not common and occurs only in etymology (for composition). A simple solution is |tr=- or tr=manual. ‑‑Sarri.greek  I 15:22, 5 May 2022 (UTC)
The module does not allow manual correction |tr=pséf(tis). ‑‑Sarri.greek  I 15:54, 5 May 2022 (UTC)
Fixed the module error at least. — Eru·tuon 20:23, 6 May 2022 (UTC)
Also made consonantal υ be more often transliterated correctly. — Eru·tuon 21:09, 6 May 2022 (UTC)
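
Editorial sketch of the capture behaviour discussed in this thread, written in Python rather than the module's Lua. It only shows what the third capture group of the quoted pattern picks up for the two respellings; it does not reproduce the module's actual error or the fix that was applied. The helper name following_char is hypothetical, and the Greek letters are written as escapes so the code points are unambiguous.

```python
import re

# Same three-capture pattern as quoted in the thread.
# Group 1: one of α ε η Α Ε Η; group 2: υ or ύ; group 3: optional next character.
PATTERN = re.compile(
    "([\u03b1\u03b5\u03b7\u0391\u0395\u0397])([\u03c5\u03cd])(.?)"
)


def following_char(text: str) -> str:
    """Return what the third capture group picks up, or '' if nothing matches."""
    match = PATTERN.search(text)
    return match.group(3) if match else ""


working_respelling = "\u03c8\u03b5\u03cd\u03c4(\u03b7\u03c2)"  # ψεύτ(ης)
failing_respelling = "\u03c8\u03b5\u03cd(\u03c4\u03b7\u03c2)"  # ψεύ(της)

# In the first case the character after ύ is a Greek letter;
# in the second it is the opening parenthesis.
print(repr(following_char(working_respelling)))
print(repr(following_char(failing_respelling)))
```

Any code that then looks up the captured character in a table of Greek letters would have to cope with "(" in the second case, which is consistent with the explanation given above.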

Caught in the vandalism filter

Tried to make an entry for sexx0r, only to be told that I can't because it contains "xx". What should I do? Binarystep (talk) 12:21, 6 May 2022 (UTC)

@Chuck Entz. Binarystep (talk) 12:39, 6 May 2022 (UTC)
I created the entry. The filter should probably be adjusted to be less restrictive about which users it applies to. - TheDaveRoss 13:12, 6 May 2022 (UTC)
I added a check for edit count. The problem that led me to create the filter was the proliferation of new and clueless smartphone users who thought they could pull up porn by typing in "xx". Those are all IPs and brand-new accounts (we still get them all the time, years later). The filter could still use some tightening up to allow editing of things like Roman-numeral entries. Chuck Entz (talk) 21:24, 8 May 2022 (UTC)
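
Editorial sketch of the filter logic as described in this thread, in Python rather than abuse-filter syntax. The threshold MIN_EDIT_COUNT, the case-insensitive match, and the function name are all assumptions; the real filter's conditions are not quoted here.

```python
# Hypothetical threshold; the real filter's value is not stated in the thread.
MIN_EDIT_COUNT = 50


def filter_blocks(page_title: str, user_edit_count: int) -> bool:
    """Return True if this sketch filter would disallow the page creation.

    IP editors and brand-new accounts are modelled as having an edit count of 0.
    """
    has_xx = "xx" in page_title.lower()  # case-insensitive match is an assumption
    is_new_user = user_edit_count < MIN_EDIT_COUNT
    return has_xx and is_new_user


print(filter_blocks("sexx0r", 0))     # new user, title contains "xx"
print(filter_blocks("sexx0r", 5000))  # experienced user
print(filter_blocks("XXIV", 0))       # Roman numeral, still caught for new users
```

The last call shows the limitation mentioned above: without a further exemption, Roman-numeral titles are still caught for new users.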

fixing title display

Hello. What is the template to correct the display of an article title? For example, if a character displays as an emoji in some fonts, and we wish to suppress that, or if we want the title to display as italic or with subscripts. French WK has tl:titre incorrect, but that doesn't connect to anything here. kwami (talk) 20:32, 8 May 2022 (UTC)

As far as I can tell, there isn't one. The MediaWiki magic word is DISPLAYTITLE: but searching for it in the template namespace turns up only a few very specialized templates. Like DEFAULTSORT, this can be problematic: for instance, some Han characters display differently for different languages, or even for simplified vs. traditional Chinese. A template like you're describing might interfere with that kind of thing and perhaps even lead to disputes. Chuck Entz (talk) 21:12, 8 May 2022 (UTC)
So, how should I handle something like M (solar mass), where the M is italic and the ☉ subscript? The actual characters are M☉, but that's not how the symbol is rendered.
I see for H2O we get around this by using the hack H₂O, even though those are the wrong characters. And there is a Unicode italic 𝑀 we could use with a redirect, but no subscript ☉. kwami (talk) 21:27, 8 May 2022 (UTC)
Well, even problematic things like redirects have exceptions where they're appropriate, and that looks like one of them. Whenever something new is proposed, I always ask myself "what could go wrong", but that doesn't mean I'm against everything. If other people think it's a good idea to create such a template, I'm not going to object. Chuck Entz (talk) 21:42, 8 May 2022 (UTC)
A template like that could indeed be used to force a simplified or traditional display for CJK characters, but someone would have to go to some effort to find the proper variation selector to achieve that. The question would be why they did so, and if that's an appropriate reason.
I added a redirect from H2O to H₂O, but with this template, the entry could be moved to H2O where it would be easier for readers to find, and there would be no need for a redirect. The same for H2SO4 and similar chemical formulae where we currently use hacks.
Another use would be with the signs of the zodiac and the astronomical symbol for a comet, ☄. We generally don't want entries for emojis, and these aren't. But they have an emoji option, and some popular fonts display them as emojis by default. Readers using such a font in their browser may wonder why I get to create Wikt articles for emojis but they don't. Forcing ☄ &c. to display in their non-emoji form would address that issue, while having no effect on most readers (since most fonts display them as text by default).
What could go wrong there: a reader might not have any font that supports text display of ☄, so if we forced it, they'd only see a box. But again that requires a variation selector, and it wouldn't be a problem for simple HTML formatting such as italics and subscripts. kwami (talk) 21:55, 8 May 2022 (UTC)
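
Editorial sketch of the simple HTML-formatting case (subscripts in chemical formulae), in Python. It only generates the wikitext that a title-formatting template might emit through the DISPLAYTITLE magic word; the function name is hypothetical, and whether the wiki's configuration accepts the result for a given page is an assumption.

```python
import re


def displaytitle_for_formula(title: str) -> str:
    """Wrap each run of digits in <sub> tags and emit a DISPLAYTITLE call."""
    formatted = re.sub(r"(\d+)", r"<sub>\1</sub>", title)
    return "{{DISPLAYTITLE:" + formatted + "}}"


print(displaytitle_for_formula("H2O"))
print(displaytitle_for_formula("H2SO4"))
```

Because only markup is added and the underlying characters stay the same, the page itself would still live at the plain-ASCII title.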

Another example is e.g. xkuMS in Chatino. That should be displayed with the final MS as superscript letters, but there is not yet Unicode support for Chatino superscript S. Chatino C and F were just added to Unicode last year, but don't have much font support yet, and so will be impractical for probably a few years. So for the time being, the best approach would probably be to write them as we do now and use a title-formatting template to fix the display. kwami (talk) 16:32, 9 May 2022 (UTC)

I guess there is not a template for this yet(?), just the magic word. If there is some advantage to having a template and not just using the magic word, and you're able to write a template, go ahead, I guess. Fixing up entries' titles would be useful; the issue, as Chuck says, is just that there'd be certain entries it couldn't be used on (if there were multiple language sections that should display differently), but that's no reason not to let most of the entries get fixed up. If there are particular sets of terms which use particular templates and which are unlikely to be homographic to terms in another language that would need to display differently, we could even consider using the magic word / template inside those templates (or their underlying modules), like we already do to verticalize Mongolian-script entries' titles. E.g., maybe we could/should make the Chatino headword templates always superscript terminal capitals with only a parameter to turn it off in the apparently-uncommon situation of conflict (e.g. if there were a Chatino word *aE it'd have to be turned off there because there's also German aE), or having the {{taxon|genus always italicize the page title except in the apparently statistically-rare case of homography with something that shouldn't be italicized, like Homo. (I feel like the second one must've been discussed before already somewhere...) - -sche (discuss) 16:06, 18 May 2022 (UTC)

Lua memory usage on bar

The page bar was exceeding the Lua memory quota. I tried changing a bunch of the {{IPA}} and {{head}} templates to their -lite equivalents, and this only helped a little. But then I noticed there was a recent change to the German declension templates and modules, and when I undid this, it fixed the issue. Can someone look into whether those templates are doing something unreasonably costly? It may also be that they aren't super costly, but that they just happened to be the straw that broke the camel's back on a rather long page. 70.172.194.25 05:07, 10 May 2022 (UTC)

@Benwing2 Chuck Entz (talk) 05:11, 10 May 2022 (UTC)

Synonym collapser thing looks wrong

See:

  1. A definition
    Synonym: hi

For me, the definition line looks like:

  1. A definition  synonym ▲

When it should look more like:

  1. A definition  [synonym ▲]

Just started happening today, and it only affects nyms, not quotations. This, that and the other (talk) 12:01, 11 May 2022 (UTC)

@This, that and the other: See MediaWiki talk:Gadget-defaultVisibilityToggles.js § CSS class. J3133 (talk) 12:19, 11 May 2022 (UTC)

Bot request: redundant Italian rhymes

This search shows 634 pages that use {{it-pr}}, which generates rhymes automatically, but also {{rhymes|it|}}, which is added by edits in the Rhymes namespace. Can someone remove the rhyme templates from these pages? I can't think of a case where the pronunciation template doesn't take precedence over the one for rhymes. Ultimateria (talk) 00:27, 12 May 2022 (UTC)

  Done. —Svārtava (t/u) • 11:30, 15 May 2022 (UTC)
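
Editorial sketch of the kind of cleanup requested in this thread, in Python. It is not the bot code that was actually run. It assumes the rhymes call sits alone on its own bulleted line, and it treats the whole page as one unit, so a page with several Italian pronunciation sections would need per-section handling. The function name is hypothetical.

```python
import re

# Matches the start of an {{it-pr}} call, with or without parameters.
IT_PR = re.compile(r"\{\{it-pr[|}]")

# A bulleted line consisting only of an Italian {{rhymes}} call,
# for example "* {{rhymes|it|ato|s=3}}".
RHYMES_LINE = re.compile(
    r"^\*+[ \t]*\{\{rhymes\|it\|[^{}]*\}\}[ \t]*\n?", re.MULTILINE
)


def strip_redundant_rhymes(wikitext: str) -> str:
    """Drop Italian rhymes lines, but only when {{it-pr}} is also present."""
    if not IT_PR.search(wikitext):
        return wikitext
    return RHYMES_LINE.sub("", wikitext)


page = (
    "==Italian==\n"
    "===Pronunciation===\n"
    "{{it-pr}}\n"
    "* {{rhymes|it|ato|s=3}}\n"
    "\n"
    "===Verb===\n"
)
print(strip_redundant_rhymes(page))
```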

Template for Onkelos quotations?

Hi! Lately I've been adding quotations from the Targum Onkelos to Aramaic terms, such as דתאה. For this purpose I used the RQ:Tanach template, since the Targum is a translation of the Bible in Aramaic. But I was wondering - is there a template specific for Targumic citations? And if there is no such template - could it be created? (such a template may be identical to RQ:Tanach, with an additional note that the citation is from the Targum, and not the original Hebrew text). Thanks! Cymelo (talk) 07:41, 11 May 2022 (UTC)

https://en.wiktionary.org/wiki/Template:quote-book Vininn126 (talk) 11:48, 16 May 2022 (UTC)
quote-book is poorly suited for classical texts; {{Q}} is the recommended alternative. However, in Cymelo's case, you could also add a parameter to {{RQ:Tanach}}, or ask for someone to help you if you aren't confident to edit that hot mess of template syntax! This, that and the other (talk) 12:36, 16 May 2022 (UTC)
Why has nobody seen {{RQ:Onkelos}}? Fay Freak (talk) 14:37, 16 May 2022 (UTC)
If this template, which I have not made, is lame, you are free to borrow the code from one of my reference templates and even link various books and editions of the Targum and Talmud or even also Mishna, e.g. on the model of {{R:ar:GdQ}} or {{RQ:Ibn Batoutah}}, and even provide links to the respective sections or pages of editions, e.g. from Sefaria as their URLs do not look too unreasonable. I don’t know what you use exactly. (I have little experience with adding such functionality to {{Q}}.) Fay Freak (talk) 14:48, 16 May 2022 (UTC)
Thanks, but I'm still not 100% sure about how to create or change templates (I'm quite new to Wiktionary). It would be better if somebody else could create it (I don't want to make a mess). Cymelo (talk) 12:10, 17 May 2022 (UTC)
I'm really not sure how to use this template, it seems pretty different from {{RQ:Tanach}}, and not suited for citations... Cymelo (talk) 12:04, 17 May 2022 (UTC)
Yes, I think that this would be the best option, since the Onkelos is an Aramaic translation of the Pentateuch, so apart from the language everything would be the same. An additional parameter such as Targum:Onkelus is basically what we need. Do you know whom I could ask for help? Cymelo (talk) 12:03, 17 May 2022 (UTC)

Category:Jandavra_language

Can someone edit the relevant module(s) to include the missing data for this language please? Info can be easily found in the linked WP article. Acolyte of Ice (talk) 13:10, 18 May 2022 (UTC)

Requesting the same for Category:Kalkoti language. Acolyte of Ice (talk) 13:13, 18 May 2022 (UTC)

Quiet Quinton Further Development

Is it possible to get QQ to not only backend G-books, but also WikiSource? Vininn126 (talk) 14:44, 18 May 2022 (UTC)

Big support for this idea from me. Theknightwho (talk) 11:58, 23 May 2022 (UTC)

Proto-Meso-Melanesian and Meso-Melanesian languages

@Kwékwlos has used these terms in the new entry for poke- so I'm wondering should we have these in our system here at Wiktionary? Researching this kind of stuff isn't really my thing so while I've glanced at Wikipedia I said I'd post here and see what people think and hopefully someone who knows the language template/module system can add data on this stuff if need be. Acolyte of Ice (talk) 12:12, 19 May 2022 (UTC)

@Acolyte of Ice Proto-Meso-Melanesian was first described by Ross (1988) as the ancestral language to the Meso-Melanesian linkage. But as a strict proto-language, it probably doesn't exist, being only a Western Oceanic residue of mutually intelligible dialects. Currently I am focusing on Bali (Uneapa) which is the most conservative language of Oceanic in phonology, but lacks a dictionary that could be used for comparative purposes. Besides I have to deal with areal words (shared by Willaumez and the Bariai languages). Kwékwlos (talk) 12:18, 19 May 2022 (UTC)

zh-pron Issue

@Fish bowl, Justinrleung, RcAlex36, Theknightwho Several years ago (I don't remember when anymore), either I or someone else was able to add Tongyong Pinyin to Template:zh-pron for both one-syllable and multi-syllable Chinese character entries. However, there was something stopping us from unlocking Wade-Giles for the multi-syllable entries. Keep in mind: all the syllables are already inputted into zh-pron; they get displayed in the zh-pron box for all one-syllable Chinese characters with no problem! It feels like Wiktionary is one small step away from having Wade-Giles on the multi-syllable Chinese character entries. I don't know what that step was exactly; it feels like it was a technical issue and not a linguistic theory issue. Here is a book of multi-syllabic Wade-Giles forms for reference: [1]. I think that everything related to linguistics should already be inputted into zh-pron; it's just that there's some key element missing that's preventing Wade-Giles from being displayed in zh-pron for the multi-syllable entries. Can anyone help me identify what that small remaining problem is so we can determine how to overcome it? Here: Category:English terms derived from Wade-Giles is a category filled with over 400 English language loan words derived from the Wade-Giles transliteration scheme, most of which are multi-syllable terms; after nearly twenty years of being ignored they cry out to you for your help. --Geographyinitiative (talk) 19:49, 20 May 2022 (UTC)

Does this change work correctly? You can test it by previewing with {{zh-pron/sandbox}} instead of {{zh-pron}} on Chinese entries. 98.170.164.88 19:56, 20 May 2022 (UTC)
God bless you IP. If this can be implemented on the mainspace, please do it. I think technically there needs to be a "dash" between the syllables, but if it can't be done, a space is okay; this is still an important step forward on this issue. Love you 98 IP. --Geographyinitiative (talk) 21:01, 20 May 2022 (UTC)
I've added the code created by 98 IP into the real deal. It's not perfect yet, but neither is this dictionary website which for twenty years just ignored Wade-Giles for multi-syllable Chinese character entries.
7 “Ask and it will be given to you; seek and you will find; knock and the door will be opened to you. 8 For everyone who asks receives; the one who seeks finds; and to the one who knocks, the door will be opened.
--Geographyinitiative (talk) 22:12, 20 May 2022 (UTC)
User:Geographyinitiative: what about this? [2] 98.170.164.88 00:48, 21 May 2022 (UTC)
Hey 98.170.164.88, this really works and accomplishes the exact purpose I was intending. Thanks. I have one final problem though: now, on the single-syllable Chinese character entries like , there are TWO Wade-Giles spots under zh-pron. Can you help me delete that duplicated one? Thanks for your work. --Geographyinitiative (talk) 13:32, 21 May 2022 (UTC)
@Geographyinitiative You can remove the entire if block at Module:cmn-pron#L-1167 to 1171 to prevent this redundant behavior.
By the way, you can move the code that is currently on lines 1186–1192 to wherever you feel is appropriate in the ordering of romanizations. Currently it is before sinological IPA, but I just put it there arbitrarily. 98.170.164.88 15:32, 21 May 2022 (UTC)

Account deletion

Can I please have my account deleted and my edits reattributed. – Ilovemydoodle (talk) 03:54, 22 May 2022 (UTC)

@Ilovemydoodle: Is there some reason why you need your account deleted? You can just abandon the account if you don't want it anymore. - TheDaveRoss 16:49, 24 May 2022 (UTC)
Also, what do you mean by “reattributed”? To whom should the credit or blame for your edits be given?  --Lambiam

Disallowing page creations as well as edits with abuse filters

There's a certain very persistent Greek IP editor who is convinced that their allegedly superior knowledge of advanced physics and philosophy makes their version of English much better and more important than that of the mere mortals that actually speak the language. After years of getting their protologisms deleted and cleaning up their incomprehensible definitions in entries, I finally came up with an abuse filter(#128) that prevents any of their IP ranges from editing any entries or entry talk pages that aren't Greek. The last part is to allow some functionality to innocent third parties who have the misfortune of using the same IP ranges.

So far, this has worked quite well. They do have a tendency to follow the link to the Grease pit in the abuse filter message and post explanations in their usual unreadable private language, but those are so out of place that they're easy to spot and revert.

Recently, though, they seem to have discovered a loophole: the filter won't stop them from creating and saving the entry or talk page the first time, even if it stops them from editing it once it exists. I'm not sure if it's because something in the variables I check isn't available for page creations, or there's just an error in my code. I've taught myself abuse-filter syntax by browsing the manuals and trial-and-error, so I certainly could have missed something.

The first page they created (that I know of) is physicsism, with the definition "Overestimation of the descriptive ability of a future and ideal physics; The belief that physics is evolvable into a general descriptor." @Surjection recognized this for the quasi-gibberish it was and replaced that with {{rfdef}}. I'm sure the IP has a good idea of what they think the word means, but A) there's no guarantee that it matches what anyone else means when they use the word, and B) they're unable to explain it so that anyone else can understand it. I have yet to see any but the most trivial of their edits that was an improvement. I would appreciate it if anyone with access to the abuse filter would fix it or let me know how to fix it myself. Thanks! Chuck Entz (talk) 21:49, 22 May 2022 (UTC)

The filter is working perfectly fine and is catching page creations as well. The reason it didn't catch that particular edit is entirely different, and I have already updated the filter to address it. — SURJECTION / T / C / L / 05:37, 23 May 2022 (UTC)

Template:Han compound Issue

At 广 (guǎng), we see: "+ phonetic 黃 (OC *ɡʷaːŋ)" in the Glyph Origin section. The exact same content should appear at the exact same spot on the (huáng) entry, but instead we see: "+ phonetic 黄 ()". I assume this must be a tech issue so I send it to you all to look at. --Geographyinitiative (talk) 15:53, 24 May 2022 (UTC)

Wiktionary:Statistics

It hasn't been updated since the April wiki dump. @Ungoliant MMDCCLXIV Could you update it? — Fenakhay (حيطي · مساهماتي) 15:55, 24 May 2022 (UTC)

Janus page unviewable – reported to be a phishing site

Please check this out: Wiktionary:Tea room/2022/May § Janus.  --Lambiam 16:49, 24 May 2022 (UTC)

Old Occitan link normalization

On laüt#Occitan, the link to Old Occitan takes you to laut (no diaeresis) when it should go to laüt#Old Occitan. Or maybe the entry should be moved to laut? 70.172.194.25 00:04, 25 May 2022 (UTC)

{{quote-av}} transcript URL

IMO, it would be great if {{quote-av}} supported up to two URL parameters, one to view the audiovisual content and another to see a transcript. When only one of these options is available, of course, you could just supply that one. For comparison, the English Wikipedia's equivalent, Template:Cite AV media, has a transcripturl parameter. I think this would improve accessibility and searchability. Sometimes transcripts are not identical to what actually gets said, but often they're close enough. 70.172.194.25 04:14, 25 May 2022 (UTC)

Transliteration Systems in Etymologies

@Inqilābī, Justinrleung, Theknightwho & all: I would like to add two transliteration systems to Template:borrowed which would "fall under" Mandarin (cmn): one called "wg" (Wade-Giles) and one called "hp" (Hanyu Pinyin). See the last three posts in Talk:Kuomintang for discussion of this issue. See the first half of the Etymology section of 'Xizhi' for a potential example of what this might look like if implemented: "From the Hanyu Pinyin romanization of Mandarin [] ". On the Xizhi page, you would hypothetically write "From the {{bor|en|hp|-}}" and produce that text (or similar), and all the attendant categorization, etc that cmn would normally produce. ***Note: this issue can become extremely complex- I want to keep it narrowly focused right on the request in the first sentence so something can actually get DONE rather than endless debate. Please don't discuss new categories, different transliteration schemes, etc. yet.*** Thanks for any help here! --Geographyinitiative (talk) 18:59, 25 May 2022 (UTC) (modified)

This is a very reasonable idea. These could be set up as etymology-only languages; maybe cmn-hp and cmn-wg would be more appropriate codes. The existing Lua infrastructure for language codes would need some extensions, such as allowing for multiple Wikipedia links in the language name, but that is probably easy work for a Lua expert.
Another option, with slightly different output, would be to add a parameter to {{bor+}}, like {{bor+|en|cmn|-|rom=hanyu}} = "Borrowed from the Hanyu Pinyin romanization of Mandarin".
Still a third option would be to have separate templates which also integrate the functionality of {{zh-l}}, like {{bor-cmn-hanyu|en|汐止|tr=Xìzhǐ}}. = "From the Hanyu Pinyin romanization of Mandarin 汐止 (Xìzhǐ)." This seems like the most flexible option, and there is precedent in {{zh-l}}, the Chinese-specific variant of {{l}}. However, it doesn't scale very well if this approach gets expanded to more languages and romanization systems. This, that and the other (talk) 04:41, 26 May 2022 (UTC)
I'm supportive of this idea, and I think that etymology-only languages are the best way to refer to these. I would suggest that we slightly modify the ISO standard (for consistency going forward) and use cmn-pny for Pinyin, but with Wade-Giles I agree that cmn-wg is the best option. The main reasons I'd oppose your other two suggestions are:
  • Added complexity, which becomes relevant when you may have editors adding etymologies for more distant descendants who may not be familiar with Chinese-specific templates (e.g. a German term borrowed from English, which was itself a Wade-Giles Romanisation). Etymology-only languages are a system that editors are already familiar with.
  • I don't envision a Romanisation applying to more than one parent language (e.g. you aren't going to have Pinyin of any language other than Mandarin), so there doesn't seem to be a reason to allow the Romanisation to be specified in a separate field to the parent language. It's just a "flavour" of that language, in the same way Medieval Latin is a "flavour" of Latin.
  • Incorporating the functionality of {{zh-l}} is a good idea, but I think that's a wider point that we should be getting the etymology templates to do in general; let's not compound the divergence, but deal with that issue properly (and separately).
Theknightwho (talk) 21:54, 26 May 2022 (UTC)
Very reasonable points. Some code would need to be added to Module:etymology languages, which would need additional code added by someone with appropriate powers. I would suggest, though, that co-opting pny in the code cmn-pny isn't the greatest idea, since pny is the code for an African language called Pinyin that has nothing to do with Chinese, and as Geographyinitiative reminds us from time to time, other Pinyin systems exist besides Hanyu Pinyin (see Tongyong Pinyin). This, that and the other (talk) 02:57, 27 May 2022 (UTC)
My mistake! Let's use cmn-hp then. Theknightwho (talk) 04:00, 27 May 2022 (UTC)
Why not use the BCP 47 variant tags? As far as I can tell, admissible codes would be cmn-pinyin and cmn-wadegile, though there might be a pedantic argument that one SHOULD use zh-cmn-pinyin and zh-cmn-wadegile. I'll ask about that tonight on the BCP 47 forum. --RichardW57m (talk) 12:53, 27 May 2022 (UTC)
@RichardW57m I think one of the main issues is the point that Pinyin isn't limited to Hanyu Pinyin, which is the system we usually associate with the term. Theknightwho (talk) 14:13, 27 May 2022 (UTC)
IANA has assigned pinyin to Hanyu Pinyin and tongyong to Tongyong Pinyin, so there is justification behind the use of those variant tags. This, that and the other (talk) 14:29, 27 May 2022 (UTC)
Fourth option: modifying {{transliteration}} to have a "system" parameter? —Fish bowl (talk) 22:02, 26 May 2022 (UTC)
Is that not just option 2? Theknightwho (talk) 22:06, 26 May 2022 (UTC)
Geographyinitiative has argued that we should avoid using the word "transliteration" to refer to romanizations of Chinese. See Talk:Kongmoon. So implementing it in {{bor}} may be less controversial. 70.172.194.25 01:03, 27 May 2022 (UTC)
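
Editorial sketch of the etymology-only-language option discussed in this thread, in Python. The data shape, the class and function names, and the display strings are hypothetical illustrations; they are not the actual format of Module:etymology languages. The codes cmn-hp and cmn-wg are the ones proposed above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class EtymologyOnlyLanguage:
    code: str         # code an editor would type, e.g. in {{bor|en|cmn-hp|-}}
    description: str  # text shown in the etymology
    parent: str       # full language whose categorization is reused


# Hypothetical entries for the two codes proposed in the thread.
ETYMOLOGY_ONLY = {
    "cmn-hp": EtymologyOnlyLanguage(
        "cmn-hp", "the Hanyu Pinyin romanization of Mandarin", "cmn"
    ),
    "cmn-wg": EtymologyOnlyLanguage(
        "cmn-wg", "the Wade-Giles romanization of Mandarin", "cmn"
    ),
}


def categorization_code(code: str) -> str:
    """Return the full-language code that categories should be filed under."""
    entry = ETYMOLOGY_ONLY.get(code)
    return entry.parent if entry else code


def borrowing_text(code: str) -> str:
    """Return the etymology wording for a borrowing from the given code."""
    entry = ETYMOLOGY_ONLY.get(code)
    return "From " + (entry.description if entry else code)


print(borrowing_text("cmn-hp"))
print(categorization_code("cmn-wg"))
```

The point of the design is that each romanization is stored as a "flavour" of exactly one parent language, so no separate romanization parameter is needed in the borrowing template.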

Lua memory usage on Han character pages

Currently, CAT:E is full of Han character pages that are exceeding Lua memory limits. Does anyone know which modules might be the culprits? 70.172.194.25 17:45, 26 May 2022 (UTC)

The CJKV modules in general are a bit of a mess and could probably be optimized a lot in terms of memory usage. — SURJECTION / T / C / L / 20:13, 26 May 2022 (UTC)

Why is Supratiṣṭhitacāritra auto-categorised in Category:Long English words?

The category is intended for "English words that are 25 letters long or more". Equinox 03:28, 29 May 2022 (UTC)

Because the code that populates this category counts bytes, not Unicode characters. In Lua, ("Supratiṣṭhitacāritra"):len(), or equivalently string.len("Supratiṣṭhitacāritra"), evaluates to 25. On the other hand, mw.ustring.len("Supratiṣṭhitacāritra") evaluates to the expected 20. The extra bytes come from the diacritics. 70.172.194.25 03:43, 29 May 2022 (UTC)
That is a bug then. Equinox 03:57, 29 May 2022 (UTC)
@Equinox: I have done 70.172.194.25’s fix. J3133 (talk) 08:01, 29 May 2022 (UTC)
On a wider note, we should never be using the string functions in Lua - they should always be mw.ustring. Theknightwho (talk) 17:57, 30 May 2022 (UTC)
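
Editorial sketch of the byte-versus-character difference described in this thread, in Python instead of Lua. Encoding to UTF-8 and counting bytes plays the role of Lua's string.len, and Python's len on a str plays the role of mw.ustring.len. The 25-letter threshold comes from the category description; the function name is hypothetical, and the word is written with escapes to fix it to precomposed characters.

```python
# "Supratiṣṭhitacāritra" with precomposed ṣ (U+1E63), ṭ (U+1E6D) and ā (U+0101).
word = "Suprati\u1e63\u1e6dhitac\u0101ritra"

byte_length = len(word.encode("utf-8"))  # what a byte-counting length sees
char_length = len(word)                  # number of Unicode code points

LONG_WORD_THRESHOLD = 25


def is_long_word(text: str) -> bool:
    """Decide category membership by code points, not by bytes."""
    return len(text) >= LONG_WORD_THRESHOLD


print(byte_length, char_length, is_long_word(word))
```

Counting code points still treats a base letter plus a combining diacritic as two characters, so text that is not in precomposed form could be over-counted in the same way, only less often.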

Script for Pali in Eastern Nagari Script

If anyone rushes in to make changes, please note that there are related changes to be done listed in Tweaking Eastern Nagari Script Definitions.

We currently have two versions of the Eastern Nagari script - Bengali (code 'Beng') which uses U+09B0 র RA for 'r' and Assamese (code 'as-Beng') which uses U+09F0 ৰ RA WITH MIDDLE DIAGONAL for 'r'. Template {{sa-sc}} uses the difference to determine whether a Sanskrit word is in the Bengali script or the Assamese script. Inconveniently, Pali in the Eastern Nagari script nowadays uses both letters - the first for 'r' and the second for 'v'. (See Template_talk:pi-alt for the elucidation of evidence.) Pali is currently declared to use the Bengali script as its Eastern Nagari script.

Unfortunately, this prevents the script detection and thus automatic transliteration of the indeclinable particle ৰ (va). This appears to be the only word affected. What is the proper solution? Is it to manually specify the script and transliterations, including replacing {{pi-particle}} in the entry with {{head|pi|particle}}, or should I create a third Eastern Nagari script for Pali? Today I modified the page for the particle to work around the problem. -- Unsigned comment added by RichardW57 (talk • contribs) at 14:12, 29 May 2022 (UTC).

If this issue only affects one word, then it is not unreasonable to handle it using manual overrides on that one entry. If other entries with this character are being misclassified, then you could consider removing this detection rule from the module that detects scripts and requiring the script to be manually specified in such cases. I'm not sure what you mean by creating a third Eastern Nagari script for Pali; would it only include these exceptional words? 70.172.194.25 15:06, 29 May 2022 (UTC)
No, the third script would be pi-Beng, would (probably) only be used for Pali, and would include both the letters above as well as the present repertoire of Beng. It might be possible to remove some unused characters, but it is unlikely to be worth the effort, which could backfire. I will leave a note at Module:pi-headword to say that it should be modified to handle ৰ; at the moment I am simply bypassing it and going direct to Module:headword. --RichardW57m (talk) 12:26, 30 May 2022 (UTC)
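
Editorial sketch of why a two-way detection rule misfires for Pali, in Python. It is a simplified stand-in, not the actual logic of {{sa-sc}} or of the script-detection module. The function names are hypothetical, and pi-Beng is the third script code floated in the reply above.

```python
BENGALI_RA = "\u09b0"   # র, 'r' in Bengali and in Pali
ASSAMESE_RA = "\u09f0"  # ৰ, 'r' in Assamese but 'v' in Pali


def detect_two_way(text: str) -> str:
    """Simplified rule: the Assamese letter decides between the two scripts."""
    return "as-Beng" if ASSAMESE_RA in text else "Beng"


def detect_with_language(text: str, lang: str) -> str:
    """Variant that knows Pali uses both letters (hypothetical pi-Beng code)."""
    if lang == "pi":
        return "pi-Beng"
    return detect_two_way(text)


pali_particle_va = ASSAMESE_RA  # the single-letter particle discussed above

print(detect_two_way(pali_particle_va))              # classified as Assamese script
print(detect_with_language(pali_particle_va, "pi"))  # classified as Pali's own script
```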

RQ:Byron Childe Harold

At face value, RQ:Byron Childe Harold makes Lord Byron out to be a contemporary of the Han dynasty by saying his works were written in the year 181. --Geographyinitiative (talk) 00:04, 31 May 2022 (UTC)

Getting the HTML of the Flexion namespace in German Wiktionary

I am trying to parse the German Wiktionary. I have been using the HTML dumps and they have been working great, however for some pages the inflections are not on the page itself, but instead on a subpage in the form of https://de.wiktionary.org/wiki/Flexion:spole%C4%8Dn%C3%BD . And these pages are not included in the HTML dumps, unfortunately. Does anyone have any idea what the best way to solve the problem would be? One could use the XML dump and then some technique similar to this project to turn the XML into HTML, however this would be very difficult to implement. After thinking about it, the simplest technique would be to simply scrape all the desired pages' HTML. Pretty ugly, but would probably work. Does anyone have a better idea? --MrBeef12 (talk) 11:36, 31 May 2022 (UTC)

I think that the best solution is to get the Wikimedia developers to include the Flexion namespace in the German Wiktionary's HTML dump. Using the XML dump involves extra steps because you have to process the templates in the Flexion pages to get the inflected forms. jberkel posted a Phabricator task for including Appendix, Thesaurus, Reconstruction, and Citations namespaces in the English Wiktionary HTML dump and maybe other Wiktionaries' namespaces that include entries or dictionary-related information could be mentioned in the same task. Not sure how soon that will be addressed. I'm surprised that addressing the English Wiktionary task wasn't as simple as just adding more namespaces to a list, but I don't really know how the HTML dumping works. — Eru·tuon 18:57, 31 May 2022 (UTC)
Thank you, this is probably the best option. Let's see when they are going to be available. MrBeef12 (talk) 11:16, 2 June 2022 (UTC)
If you want, leave a comment on the phabricator ticket indicating your interest in this, maybe this helps. – Jberkel 11:31, 2 June 2022 (UTC)
Good idea, I did so. Let's hope it'll get fixed. MrBeef12 (talk) 12:22, 3 June 2022 (UTC)
Yes, I wouldn't recommend using HTML dumps at the moment, they are incomplete and unreliable (aka enterprisey). On the other hand, parsing wiki markup with anything other than MediaWiki is usually doomed to fail at some point. But it depends on what kind of data you want to extract. – Jberkel 19:21, 31 May 2022 (UTC)
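For the scraping fallback mentioned in the original question, the first step is building the Flexion: URL for each lemma. The following is a minimal sketch, assuming the URL pattern shown in the question holds for every lemma; the function name is invented for illustration and no request is actually sent.

```python
from urllib.parse import quote

BASE = "https://de.wiktionary.org/wiki/"  # article path seen in the question


def flexion_url(lemma):
    """Build the URL of the Flexion: subpage for a lemma."""
    # MediaWiki titles use underscores for spaces; the namespace colon
    # is left unescaped so the URL matches the form quoted above.
    title = "Flexion:" + lemma.replace(" ", "_")
    return BASE + quote(title, safe=":_")
```

A real scraper would also need rate limiting and a descriptive User-Agent, which are left out here.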

June 2022

The site thinks that my edits are harmful

I'm just manually adding inflected forms, I won't harm this site --ConjugationMan (talk) 13:39, 4 June 2022 (UTC)

It stopped complaining, which is useful for me (I only edit in good faith) --ConjugationMan (talk) 14:28, 4 June 2022 (UTC)
The filter that you triggered is a safety valve: vandals tend to create new accounts, then do as much damage as possible before they're discovered and blocked. Most new editors don't edit fast enough to set this off, so it's only rarely a problem. Chuck Entz (talk) 15:23, 4 June 2022 (UTC)

url2 in Template:quote-journal

Hey all- I tried to use url2 in quote-journal, and the parameter journal2= did not display (everything else worked) see: diff. Let me know if I'm doing it wrong. --Geographyinitiative (talk) 13:47, 4 June 2022 (UTC)

@Geographyinitiative As you know, Template:quote-journal takes the parameters journal and title. But under the hood, these are passed to Module:quote as title and chapter, respectively. So Module:quote doesn't know anything about the parameter journal, and as a result it doesn't look for journal2 (the code a("journal") does not appear in the module's source). As a workaround, I changed the citation to use title2 and chapter2 instead: diff. If anyone wants to edit the module code to handle this situation properly, that would be even better. 70.172.194.25 00:05, 7 June 2022 (UTC)
This workaround works; not sure if there was a theory-based rationale for not having journal2. Might be good to have it. The situation in question is the oldest known cites for the word on Wiktionary, so it does seem important/valuable to move beyond work-around level (if this holds as the oldest known usage, this is valuable cultural heritage that can ring out throughout the internet; undignified to have a work-around in it). --Geographyinitiative (talk) 00:15, 7 June 2022 (UTC)
I think the reason is that it was the easiest way to let all the quote templates customize their labels for the individual work (chapter) and containing work (title). For example:
Anyway, the displayed "workaround" output is exactly as it should be; only the parameter names are different. 70.172.194.25 00:39, 7 June 2022 (UTC)
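The renaming described above (journal becomes title, title becomes chapter) could in principle carry numbered variants such as journal2 along with it. This Python sketch only models that idea; it is not how Module:quote is written, and the function and table names are hypothetical.

```python
import re

# Assumed mapping, taken from the explanation above: the template's
# journal/title become the module's title/chapter.
ALIASES = {"journal": "title", "title": "chapter"}


def remap_params(params):
    """Rename template parameters, keeping numeric suffixes such as '2'."""
    out = {}
    for name, value in params.items():
        # Split "journal2" into the base name "journal" and the suffix "2".
        m = re.fullmatch(r"([a-z_]+?)(\d*)", name)
        base, suffix = (m.group(1), m.group(2)) if m else (name, "")
        out[ALIASES.get(base, base) + suffix] = value
    return out
```

Because the output is built in a fresh dictionary, renaming journal to title does not collide with the simultaneous renaming of title to chapter.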

How about a Finnish impersonal verb conjugation template?

I've already made one myself, as a subpage of my userpage. (It's here.) The template has full conjugation for the finite verb forms, and automatically adds transcluding pages into the impersonal verb category.

Here is an example of the template: [Redacted]

Let me know what y'all think! :) --ConjugationMan (talk) 14:45, 4 June 2022 (UTC)

I think this can be automated with a Lua module, but other than that it's a sound idea. I don't know whether we should have it in all entries that have monopersonal meanings, even if they also have personal ones (like pelottaa), or only to entries that solely have monopersonal meanings (like täytyä). — SURJECTION / T / C / L / 20:03, 4 June 2022 (UTC)
Another issue is that sometimes the "subject" is in genitive case (täytyä), other times in partitive (pelottaa), sometimes even in other cases (adessive case for olla (to have))... — SURJECTION / T / C / L / 21:01, 4 June 2022 (UTC)
@ConjugationMan: I commented out your template because its module invocation returns nothing but an error. Feel free to restore it when you have the module working. Chuck Entz (talk) 20:29, 5 June 2022 (UTC)

Ostensibly Lua-Free Declension Tables with Omissions

My preferred way of producing inflection tables is to add manual overrides to a Lua module that generates a regular inflection table. However, the Indian Indic community seems to prefer 'simple' templates that use nothing more(?) complicated than the parser extensions. @AryamanA, Svartava, Kutchkutch, Bhagadatta. To that end I have modified {{pmh-decl-noun-irregular}} to {{psu-decl-noun-irregular}} to have the following features:

  1. When the output table is displayed when viewing the template page, the contents of each cell is displayed in a form such as {{{7}}}.
  2. When a value is specified for a cell, it is displayed using {{l-self}}, as is common practice in inflection tables.
  3. If no value is specified for a cell, it is displayed as an em dash. The nature of Prakrit (e.g. the absence of dative plurals) and the incompleteness(!) of the attestation may lead to gaps in the table.

I'm wondering if I have missed some tricks in my coding. If so, could someone please advise me what they are? I do want the code to be maintainable by others.

I have lengthy coding for a single cell such as:

|{{#if:{{{7|<noinclude>f</noinclude>}}}|{{l-self|inc-pra|{{{7}}}}}|—}}

This may make it difficult to elaborate the display, e.g. to better display alternative forms. Currently I would use for a parameter something like |7=पुत्तण ''or '' पुत्तणं, which yields:

पुत्तेण or  पुत्तणं (putteṇa or  puttaṇaṃ)

That's not the best of formats, neither for display nor for input. (I feel a kinder input would be |inss1=पुत्तण|inss2=पुत्तणं.)

Is there some way of abbreviating <noinclude>f</noinclude>? --RichardW57 (talk) 20:44, 5 June 2022 (UTC)
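To make the cell logic above concrete, here is a small Python model of it, assuming the "kinder input" idea where alternatives arrive as separate values (such as |inss1= and |inss2=) rather than as one string with an embedded ''or''. It only produces the wikitext a cell would contain; the function name is made up, and it says nothing about whether parser functions alone can express the same thing.

```python
EM_DASH = "\u2014"  # shown in cells with no attested form


def cell(forms, lang="inc-pra"):
    """Render one table cell from a list of alternative forms."""
    # Drop empty or whitespace-only values, as an unset parameter would be.
    forms = [f for f in forms if f and f.strip()]
    if not forms:
        return EM_DASH
    links = ["{{l-self|%s|%s}}" % (lang, f.strip()) for f in forms]
    return " ''or'' ".join(links)
```

Each alternative gets its own {{l-self}} call, so linking and transliteration apply per form rather than to the combined string.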

Lua memory errors again

@Surjection, Chuck Entz, Erutuon Suddenly we are back to the old situation with 26 pages with memory errors. What happened? Something must have changed around May 30 or May 31, when I first saw this. It was not a change I made. Maybe someone added a bunch more languages? I was able to eliminate the memory errors on maybe 8 pages using {{multitrans}}, but the remainder can't be fixed this way. Maybe we should resurrect the old ideas of splitting some of the Module:languages/data3 submodules, and/or implementing a generalization of {{multitrans}} that handles {{l}}, {{m}}, {{head}} and other common templates. Benwing2 (talk) 23:12, 5 June 2022 (UTC)

@Benwing2 This really started earlier: I first noticed it May 23, when there were suddenly 17 entries in CAT:E. @Surjection managed to clear those, but now they're back- with a few friends. When I looked at the transclusion list for bar on the 23rd, there were only a few modules with changes in the previous few days. We can ignore your edit to a French module because it's not transcluded in the Chinese-character entries, and @Fish bowl also edited one of the CJK modules, but it didn't seem like the kind of thing that would radically change memory usage. There were substantial additions to Module:zh/data/st and Module:zh/data/ts- substantial, percentagewise, but those modules are still only about 90k in file size. I don't know if that would make a difference if multiplied by the number of transclusions in the Chinese-character entries (every link in the Chinese entry seems to use them). I asked @Justinrleung and he saw nothing wrong with them.
I can't use the "what changed" technique anymore because Surjection went through a lot of the Chinese modules on May 27 and worked on memory-related aspects of the code. It wouldn't hurt to check whether there were any problems with those edits, just to be safe, but I have no reason to believe they did anything but help.
I should mention that right now we're going through a minor wave of false positives that clear with a null edit, so I think someone made and quickly corrected an error in some widely-transcluded module. There's still a core of 28 memory errors once those are cleared.
As for your suggestions: such techniques are hard to apply to the Chinese-character entries due to the sheer number of data modules transcluded, and the other entries have dozens of language sections that link to almost everything at least once. Beyond that, this is all over my head. Chuck Entz (talk) 14:33, 6 June 2022 (UTC)

I think that copying over Module:cmn-pron/sandbox to Module:cmn-pron would help as a stopgap measure, saving about 1,500,000 bytes per invocation of {{zh-pron}}. (I divided Module:zh/data/cmn-hom into /1, ... /4 subpages. I figure nobody minds this much, probably, as long as it works.) 70.172.194.25 01:57, 7 June 2022 (UTC)

@Chuck Entz I made a suggestion awhile ago to compress some of the Chinese data modules into one big string rather than a bunch of tables, and index into the string appropriately. I think this would make a big difference for the bigger data modules but requires some effort. Let me try the IP's suggestion and see what happens. Benwing2 (talk) 02:29, 7 June 2022 (UTC)
OK, I went ahead and did that. It seems to have helped on "rì" but not on "xīn" or "wǒ". Benwing2 (talk) 02:37, 7 June 2022 (UTC)
After some null edits and assorted minor tinkering (also reverting to fix one caused by a rather strange misuse of {{ja-pos}}), we're down to 4 entries as opposed to 28 as of my last post. I would say that's a definite improvement. Even those 4 now get all the way to the bottom of the page before they run out of memory. Chuck Entz (talk) 04:07, 7 June 2022 (UTC)
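The "one big string" suggestion above can be illustrated as follows. This is a Python sketch of the lookup technique only, assuming keys and values never contain tabs or newlines; the memory benefit being discussed is specific to how Lua stores tables versus strings and is not demonstrated by this code.

```python
def pack(table):
    """Pack a {key: value} mapping into one newline-delimited string."""
    # Every record is preceded by "\n" and has the form "key\tvalue\n".
    return "\n" + "".join("%s\t%s\n" % (k, v) for k, v in sorted(table.items()))


def lookup(packed, key):
    """Find the value for key by substring search; None if absent."""
    needle = "\n" + key + "\t"
    i = packed.find(needle)
    if i < 0:
        return None
    start = i + len(needle)
    return packed[start:packed.index("\n", start)]
```

The delimiters on both sides of the key keep a short key from matching inside a longer one.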

Translations of staticity

https://www.greek-language.gr/greekLang/modern_greek/tools/corpora/corpora/search.html?lq=%CF%83%CF%84%CE%B1%CF%84%CE%B9%CE%BA%CF%8C%CF%84%CE%B7%CF%84%CE%B1

Quotations from Anthologies and Samplers

When I started adding Pali quotations to Wiktionary, I thought the details in {{quote-web}} were to help others find the quotation. Consequently, when quoting what, for my purposes, are anthologies, it looked as though the author of the anthology were the author of the quotation. I am now trying to fix this problem. I have a few problems, though.

1) Is there an alternative to the triggering of |location2=? That doesn't really feel suitable for web documents, though it doesn't look too bad in ᨷᩤᨭᩥᨾᩮᩣᨠ᩠ᨡ (pāṭimokkha), which uses quotation template {{RQ:pi:N3207}}. What is a problem is the wording 'republished as'. If the new work is an anthology, it may neither republish the whole work nor essentially consist of it. The simplest catch-all replacement I can think of is '(partly) republished in'.

2) The author keywords for the anthology only allow for one author, though they do allow for an author link.

3) Is the date of a work the date it was composed, first written down, or when the spelling was decided? A concrete example I have in mind is a single sutta composed in India, first written down in Sri Lanka a few generations later, and typeset in Burma (or just possibly Thailand) over a millennium and a half later, when glyph choices (subsequently confirmed as character choices) that didn't exist a thousand years earlier had to be made, and a picture of the printed text then published in a book, possibly a few decades later. --RichardW57m (talk) 11:57, 10 June 2022 (UTC)

HotCat Wiktionary fork testing

There have long been plans to fork HotCat to better support English Wiktionary templates (such as {{C}}). I've now started work on such a fork and would like more testers to check whether the changes are working correctly. Automatic saving is disabled to let users review markup and such changes which is important for ensuring the gadget works correctly. Once sufficiently tested, the en.wiktionary gadget will be migrated over.

To test this version of HotCat, first go to Special:Preferences, Gadgets and disable HotCat. Then go to your common.js and add importScript("User:Surjection/HotCat.js"); into your list of imports. If you do not have a common.js, it should be enough to place that line on the page and nothing else. (To revert, remove the line from your common.js, possibly leaving it empty, and then re-enable HotCat from Gadgets under your preferences.) — SURJECTION / T / C / L / 12:44, 11 June 2022 (UTC)

Having tested it on many different kinds of pages (with many/singular L2's) and adding either singular or multiple categories, it works as expected. Vininn126 (talk) 11:53, 12 June 2022 (UTC)
@Surjection Thank you for this. It looks like your version uses {{C}} and {{cln}}, which is also what my templatize_categories.py script standardizes on; this is good. You should add a couple more aliases to the list of topical category templates; the full list I currently have is {{C}}, {{c}}, {{top}}, {{topic}}, {{topics}} and {{catlangcode}}. It's unfortunate we have so many aliases; I would eliminate some of them but they all seem to be used on more than 1000 pages, which is usually my threshold for when a template should be kept and deprecated vs. just deleted. Maybe though we should still consider deprecating some of them, esp. {{catlangcode}}, which is long and has a non-obvious name. BTW there's a third set of category templates, which is {{categorize}}/{{cat}}; these are for categories not preceded by a language name or code, e.g. Category:Verlan; it still can be useful to write e.g. {{cat|fr|Verlan}} instead of just a raw category link so that the sort code gets generated correctly. Benwing2 (talk) 21:22, 12 June 2022 (UTC)
D'oh. I tried getting all of the aliases, but I suppose I missed some. I do know about {{categorize}}, but the issue is that it's harder to check what language those categories are for (in case one wants to add them). I think the gadget would have to fetch the category data from a Lua module, or at least in the case of Category:Verlan, fetch the category page and check which categories it belongs to. It might be doable, but I don't know if it's possible to make the logic entirely foolproof. — SURJECTION / T / C / L / 05:37, 13 June 2022 (UTC)
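As an illustration of the alias problem raised above, this Python sketch rewrites the listed topical-category aliases to {{C}}. It is not Benwing2's templatize_categories.py and not the gadget's code; the regular expression and function name are assumptions, and it deliberately leaves {{cln}} and {{cat}} untouched.

```python
import re

# Aliases listed in the comment above; {{C}} is treated as the canonical name.
TOPIC_ALIASES = ["c", "top", "topic", "topics", "catlangcode"]
_ALIAS_RE = re.compile(r"\{\{\s*(?:%s)\s*\|" % "|".join(TOPIC_ALIASES))


def normalize_topic_templates(wikitext):
    """Rewrite alias invocations such as {{top|...}} to {{C|...}}."""
    # The trailing "|" in the pattern ensures only whole template names
    # match, so {{cln|...}} is not mistaken for {{c|...}}.
    return _ALIAS_RE.sub("{{C|", wikitext)
```

A gadget or bot that recognises every alias on input but writes only one form on output would gradually reduce the number of aliases in use.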

Category:Melbourne, Category:zh:Melbourne

If we really want these, can someone update the appropriate template(s)/module(s) to get {{auto cat}} to work with this? User: The Ice Mage talk to meh 11:12, 12 June 2022 (UTC)

Whitelist the Twitter search page

It's currently not possible to link to a Twitter search (only to individual tweets) because the twitter search page is globally spam-blacklisted. Can we MediaWiki:Spam-whitelist (cf w:MediaWiki:Spam-whitelist) it? I've run into the block a few times, and it came up again at Wiktionary:Requests for verification/English#waffle_stomp. (It's possible to whitelist things for use only on specific pages, but whitelisting it for use only on WT:RFVE and WT:RFVN would mean the threads would fail to be archivable; if we could whitelist it for use on any Wiktionary: or Talk: namespaces, that should cover anywhere it'd be needed. Alternatively, generally whitelist it and use an edit filter to block new users from adding links to it if it actually becomes a problem that people spam links to it.) - -sche (discuss) 04:27, 15 June 2022 (UTC)

I don't even see how the Twitter search page is any more abusable by spammers than linking to Twitter accounts or individual Tweets. There must be something I'm missing. 98.170.164.88 04:39, 15 June 2022 (UTC)
It does seem weird that the search page is blocked, though I suppose if we were blocking particular accounts or tweets it could be used to link to them anyway via search query. - TheDaveRoss 12:12, 15 June 2022 (UTC)
I don't want to give anyone BEANS-y ideas but, my guess is that the rationale is partly, as you say, the ability to evade tweet-/account-specific blocks, and partly that if a spammer tweets a link to their malware / phishing site / whatever, either the specific site or the specific tweet can be taken down, but if they link to a twitter search for some unique string, they can create tweets linking to their sites and using that string, and whenever one tweet or site gets taken down, create another tweet. (Ah, I see it's being discussed here, prior discussion here.) - -sche (discuss) 19:29, 15 June 2022 (UTC)
(But I think we should try whitelisting it, and see if we actually get spammed or not — I doubt it, we don't seem to be getting spammed with any other links to Twitter — at which point we could add an edit filter or un-whitelist it if necessary.) - -sche (discuss) 00:09, 16 June 2022 (UTC)
I went ahead and whitelisted it. If this unexpectedly causes a flood of twitter search spam, we can evaluate whether to block new users from adding such links with an edit filter and blocks (if it's just spambots from new accounts) or whether to un-whitelist it. - -sche (discuss) 15:44, 24 June 2022 (UTC)

Hinduphobia

I wanted to edit some details on the blank page for Hinduphobia, but the automatic word recognition system of Wikipedia finds some words inappropriate, e.g. racist, killing. Please review it and help. Uksinghrana (talk) 00:26, 17 June 2022 (UTC)

(Looking at the AbuseLog, I see that this user added a long text to Hindu Phobia which was appropriately deleted, then got some automatic warnings for bad edits to Hinduphobia, which Fytcha appropriately reverted. I'm not seeing any edits which were actually stopped by edit filters, let alone incorrectly stopped. No objections if someone wants to just roll this whole section back.) - -sche (discuss) 20:48, 17 June 2022 (UTC)

Archives: recent on top?

Would you consider reversing the sequence of years at archives (the right hand contents) placing the more recent on top? Thank you ‑‑Sarri.greek  I 20:49, 17 June 2022 (UTC)

Templates movedto and movedfrom

I don't know why I've never noticed this, but the parameters are reversed, with |1= as the display parameter when there are two parameters. I'm not sure how many of the 528 (combined) transclusions use the second parameter, but it would be a good idea to think this through before fixing things. Chuck Entz (talk) 21:12, 17 June 2022 (UTC)

Singulatives

Should {{singulative of}} add the page to a relevant category, such as Category:English singulatives (which currently holds but one entry)? This is already the case with {{clipping of}}, {{misspelling of}} and probably some other templates too. brittletheories (talk) 11:05, 19 June 2022 (UTC)

Probably. I don't see why not. - -sche (discuss) 15:39, 24 June 2022 (UTC)

Μάγια

The entry for μάγια seems to be a bit confused; it gives only one definition, which it classes as a feminine noun meaning "spell (magic and witchcraft)". As the Greek Wiktionary makes clear, though, of the three definitions given there, the feminine singular noun is the name of a type of dance; the two definitions concerning magic, witchcraft, etc., involve μάγια as a neuter noun always used in the plural. I don't have the expertise (yet) to make the requisite changes. Could someone either make them, or help me to learn? Thanks. --Bibliosporias (talk) 16:38, 19 June 2022 (UTC)

Thank you @Bibliosporias for spotting the mistake: it is neuter (n). I can add the extra definitions as in el:μάγια: the language, and the leotard worn by dancers. ‑‑Sarri.greek  I 09:24, 20 June 2022 (UTC)
Many thanks! --Bibliosporias (talk) 09:52, 20 June 2022 (UTC)

Armenian entries that lack a pronunciation

On English Wiktionary, there are around 16K Armenian lemmas. Most of them have a pronunciation section, like գրել. It seems there are about 15K such entries with a pronunciation, so at least 1K entries lack one, like ակն ընդ ական. Is there a way to retrieve a list of Armenian entries that lack a pronunciation? If so, I can then manually add pronunciations to the leftovers. Hovsepig (talk) 09:54, 20 June 2022 (UTC)

Here you go! Vininn126 (talk) 10:16, 20 June 2022 (UTC)
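For anyone wanting to reproduce such a list offline, the check can be approximated on raw wikitext, for instance from a dump. The sketch below is an assumption about one way to do it in Python, not the method behind the link above; it looks for a Pronunciation heading of any level inside the Armenian level-2 section.

```python
import re

# Level-2 headings look like ==Language==; deeper headings have more "=".
L2_HEADING = re.compile(r"^==([^=].*?)==\s*$", re.MULTILINE)


def language_section(wikitext, language):
    """Return the text of one level-2 language section, or None."""
    matches = list(L2_HEADING.finditer(wikitext))
    for i, m in enumerate(matches):
        if m.group(1).strip() == language:
            end = matches[i + 1].start() if i + 1 < len(matches) else len(wikitext)
            return wikitext[m.end():end]
    return None


def lacks_pronunciation(wikitext, language="Armenian"):
    """True if the language section exists but has no Pronunciation heading."""
    section = language_section(wikitext, language)
    if section is None:
        return False
    return re.search(r"^=+\s*Pronunciation\s*=+\s*$", section, re.MULTILINE) is None
```

Restricting the search to the language section matters because a page can have a Pronunciation heading under a different language.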

Transliteration Systems in Etymologies 2

@Fish bowl, RichardW57m, Theknightwho, This, that and the other, 70.172.194.25 & all: I would like to add two transliteration systems to Template:borrowed (or similar) which would "fall under" Mandarin (cmn): one called "wg" (or similar) (Wade-Giles) and one called "hp" (or similar) (Hanyu Pinyin). See the last three posts in Talk:Kuomintang for discussion of this issue. See the first half of the Etymology section of 'Xizhi' for a potential example of what this might look like if implemented: "From the Hanyu Pinyin romanization of Mandarin [] ". On the Xizhi page, you would hypothetically write "From the {{bor|en|hp|-}}" and produce that text (or similar), and all the attendant categorization, etc that cmn would normally produce.
(NOTE: As can be seen from the discussion at Wiktionary:Grease_pit/2022/May#Transliteration_Systems_in_Etymologies, this is a complex issue that leads into many fun questions that can be talked about endlessly around the campfire, which is EXACTLY how NOTHING got done the first time. There are complex issues that absolutely cannot be solved ahead of time. The only way forward is for "something" to be created. Once "something" is created (regardless of how ugly, stupid, wrong, un-academic, whatever), I will implement it immediately. That will get the attention of the big boys, who will want to comment. At that point, the long, meaningful discussions can lead to wonderful tweaking and modulation of the final form. But for now, please don't discuss new categories, different transliteration schemes, etc. yet. All I'd like to see is a functional system where "Hanyu Pinyin" and/or "Wade-Giles" appears automatically. God bless.) --Geographyinitiative (talk) 15:48, 20 June 2022 (UTC)

Just to add that I support this.
What needs adding based on the previous discussion
In Module:etymology languages/data:
m["pinyin"] = {
    canonicalName = "Hanyu Pinyin",
    aliases = {"Pinyin"},
    parent = "cmn",
    wikidata_item = 42222,
}

m["wadegile"] = {
    canonicalName = "Wade–Giles",
    aliases = {"Wade-Giles", "Wade Giles"},
    parent = "cmn",
    wikidata_item = 208442,
}
Theknightwho (talk) 16:21, 20 June 2022 (UTC)
Use the standard names for the written lects, cmn-pinyin and cmn-wadegile. --RichardW57m (talk) 12:13, 22 June 2022 (UTC)
I've updated to simply pinyin and wadegile so as to match IANA. Theknightwho (talk) 14:00, 22 June 2022 (UTC)
Also suggesting we add Tongyong Pinyin, because Category:English terms derived from Tongyong Pinyin has 35 terms in it.
m["tongyong"] = {
    canonicalName = "Tongyong Pinyin",
    parent = "cmn",
    wikidata_item = 700739,
}
Theknightwho (talk) 15:09, 22 June 2022 (UTC)
@Theknightwho: Dropping the 'cmn-' is in violation of IANA. The relevant format is <language>-<variant>. The feedback I got was that the 'standard names' were acceptable, but some systems might not properly understand it if we didn't lay it on with a trowel by prefixing 'zh-', though 'cmn' is supposed to be preferred to 'zh-cmn', and suffixing '-Latn', which we don't do, because we pass our (amplified) script identification separately. --RichardW57m (talk) 16:11, 22 June 2022 (UTC)
Checking IANA's list and what they list as the prefix, the correct codes would be zh-Latn-pinyin, zh-Latn-wadegile and zh-Latn-tongyong. A Google search confirms that that is what is in general use. Theknightwho (talk) 16:28, 22 June 2022 (UTC)
@TheKnightWho: Sorry, yes, the '-Latn' would be an insert rather than a suffix. I asked about the need for 'zh-' at https://mailarchive.ietf.org/arch/msg/ietf-languages/q0Kw4JINn7lWCjsbNajh82nztAI/ and was told it was not necessary. --RichardW57 (talk) 00:12, 23 June 2022 (UTC)
This is a purely Wiktionary-internal thing, it doesn't matter if "some systems might not properly understand it". I'd prefer something with a language code at the beginning (cmn-pinyin, cmn-wadegile; "zh" is wrong as these are for Mandarin only, and "-Latn" is superfluous for our purposes). But that's neither here nor there. Let's hope an admin will see this discussion this time! This, that and the other (talk) 02:23, 23 June 2022 (UTC)

Template:ja-romanization of fails to recognize archaic/retrospective kana

Currently, Template:ja-romanization of recognizes only the kana in the main Hiragana and Katakana Unicode blocks; the archaic or retrospective kana 𛀀, 𛀁, 𛄡, 𛄠, 𛀆, 𛄢, and 𛄟 (in the Kana Supplement and Kana Extended-A blocks), if present in the kana source of a romanization, cause the template to throw up a big ugly red "(link to non-kana entry)" error message, as can (for instance) currently be seen at ye#Japanese. Could someone please add these seven kana to the characters that Template:ja-romanization of accepts without throwing an error message? Whoop whoop pull up Bitching Betty ⚧️ Averted crashes 20:22, 22 June 2022 (UTC)

@Whoop whoop pull up: I just added those specific glyphs to the sanity check. Does it work now? ‑‑ Eiríkr Útlendi │Tala við mig 20:53, 23 June 2022 (UTC)
@Eirikr Looking at ye#Japanese, it does indeed seem that that did the trick. Thanx! Whoop whoop pull up Bitching Betty ⚧️ Averted crashes 21:31, 23 June 2022 (UTC)
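The kind of sanity check discussed here can be sketched as below. This is a Python illustration only, not the code behind Template:ja-romanization of; it assumes the check accepts the main Hiragana and Katakana blocks plus exactly the seven extra characters named in the request, and the function names are hypothetical.

```python
# The seven archaic kana named in the request above.
ARCHAIC_KANA = set("𛀀𛀁𛄡𛄠𛀆𛄢𛄟")


def is_kana(ch):
    """True for the Hiragana/Katakana blocks or one of the archaic kana."""
    # U+3040..U+309F is the Hiragana block, U+30A0..U+30FF the Katakana block.
    return "\u3040" <= ch <= "\u30FF" or ch in ARCHAIC_KANA


def is_all_kana(text):
    """Sanity check: non-empty and every character passes is_kana."""
    return bool(text) and all(is_kana(ch) for ch in text)
```

Listing individual characters, as was done in the fix, is stricter than admitting the whole Kana Supplement and Kana Extended-A blocks, which also contain hentaigana.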

contextuality

Translations

2A02:2149:8B65:2C00:5CCF:6450:CD0E:4A2B 02:31, 23 June 2022 (UTC)

Have you considered adding this content directly to the entry itself? This, that and the other (talk) 02:55, 23 June 2022 (UTC)
@This, that and the other: They have, but fortunately there's an abuse filter that stops them. This is the person I was referring to here. You have no idea how much utter bilge they've had deleted in rfv over the years. Chuck Entz (talk) 06:03, 23 June 2022 (UTC)

Lua access to Wikidata Lexeme has been enabled on all Wikimedia projects from June 21

Check out d:Wikidata_talk:Lexicographical_data#You_can_now_reuse_Wikidata_Lexemes_on_all_wikis. So, now we can call lexemes, forms, senses, etc., from Wikidata Lexemes using Lua. Thanks. Vis M (talk) 11:04, 23 June 2022 (UTC)

Oddities in behavior of Template:rfe

I just added an {{rfe}} over at Hungarian szerszám:

  • {{rfe|hu|Could someone explain the sense development? Unclear how ''tool'' + ''number'' = ''tool''.}}

The expected behavior is for argument 1 to be the lang code, and for argument 2 to be a description or note, which should display. The lang code appears to be correctly handled, and the page is added to the language-appropriate category. However, the note is no longer appearing for me.

Playing around, I noticed that any string added as argument 3 is treated as the description. But argument 3 isn't accounted for anywhere in the wikicode...

This template references a couple others, but none of these use Lua. None have been edited all that recently (stable for a few weeks at any rate). Can anyone tell what's going on here? Was there a tweak to the underlying MW infrastructure that has gone a bit funny? ‑‑ Eiríkr Útlendi │Tala við mig 20:46, 23 June 2022 (UTC)

@Eirikr: The equals sign is disruptive because MediaWiki is treating it as the thing used to define parameters, so you have to manually write the unnumbered parameter as N=, or use {{=}}. —Fish bowl (talk) 20:49, 23 June 2022 (UTC)
Oh, FFS. Thank you! I (mis-)remembered that that only happened if there was a single space-less string right before the equals sign. <sigh.> Cheers! ‑‑ Eiríkr Útlendi │Tala við mig 20:55, 23 June 2022 (UTC)
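The behaviour explained above can be modelled in a few lines. This Python sketch is a simplification of how MediaWiki assigns template arguments: it ignores equals signs nested inside links or other templates, and the function name is invented for illustration.

```python
def parse_args(raw_args):
    """Model how pipe-separated template arguments become parameters.

    An argument containing '=' is read as name=value; only arguments
    without '=' consume the next positional number.
    """
    params, position = {}, 0
    for arg in raw_args:
        name, sep, value = arg.partition("=")
        if sep:
            # Named parameter: everything before the first "=" is the name.
            params[name.strip()] = value.strip()
        else:
            position += 1
            params[str(position)] = arg
    return params
```

With the {{rfe}} call quoted at the top of this section, the note ends up under a parameter named "Unclear how ''tool'' + ''number''" and parameter 2 is never set, which is also why a further unnamed argument was picked up as the description. Writing the note as 2=... puts it back in parameter 2.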

Borrowings from ancestor

Is it possible to make a filter that is triggered when you try adding a {{borrowed}} template with a mother and daughter in it? I can only think of a handful of languages (Sanskrit, Latin, Greek) where this would be annoying, and a whole lot of languages where this'd be useful. It's probably not a good idea to make an error out of this because of things like Algiz or Dyeus, but a filter might be nice to at least diminish the amount of mistakes. Thadh (talk) 11:58, 24 June 2022 (UTC)

Shouldn't those languages be using {{learned borrowing}} anyway? Vininn126 (talk) 12:46, 24 June 2022 (UTC)
Not always. Old French bulle (bull) looks like a word naturally acquired from Latin speakers, rather than one deliberately introduced. That's on top of the fact that using 'learned borrowing' contrary to normal usage is confusing. I think some people are using it as a semantic loan of tatsama.
What would be more useful, though much harder, would be to check the first parameter against the L2 heading. (Do we already have a bot that does this in slow time?) I fear this immediate check might be a phabrication request. --RichardW57m (talk) 15:26, 24 June 2022 (UTC)
The "slow time" approach (i.e. WT:TODO) is the right way to address this problem. We already have WT:Todo/Incorrect derivation templates and WT:Todo/Template language code doesn't match header for the "first parameter doesn't match L2 heading" issue. I could look at coding up a report that displays instances of borrowing from parents to children (obviously excluding languages like Latin as mentioned already). This, that and the other (talk) 02:42, 25 June 2022 (UTC)
@Thadh, RichardW57m User:This, that and the other/terms borrowed from ancestor This, that and the other (talk) 05:20, 25 June 2022 (UTC)
No. Abuse filters cannot access our language data. The best we can do is hardcode particular pairs. — SURJECTION / T / C / L / 16:14, 24 June 2022 (UTC)
In that case, what about triggering a module error except for specific languages? I'm guessing learned borrowings from Proto-Yukagir or Proto-Kuki-Chin aren't a thing anyway, so we could make a list of parent languages (or, perhaps, extinct parent languages) that would be ignored (PIE, PG, Latin, Dutch, French, English, Ancient Greek, Sanskrit, Malay... Are there many more of these?). Or is that what you meant with "hardcode particular pairs"? Thadh (talk) 18:26, 24 June 2022 (UTC)
I don't think that's a good approach either. What I meant by "hardcoding particular pairs" is that we can have it block certain languages or language pairs from being used, but not all languages that have a parent-child relationship according to our language data. — SURJECTION / T / C / L / 08:03, 25 June 2022 (UTC)
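The report-style check proposed in this thread comes down to walking the ancestry chain of the target language. The Python sketch below is an assumption about how such a report could test one {{bor}} call; the parent table and the exemption list are made-up sample data, not Wiktionary's language data.

```python
# Hypothetical parent links; real data would come from the language modules.
PARENT = {"fro": "la", "fr": "frm", "frm": "fro", "en": "enm", "enm": "ang"}
# Assumed exemption list of sources commonly borrowed from by descendants.
EXEMPT_SOURCES = {"la", "grc", "sa"}


def is_ancestor(source, target, parent=PARENT):
    """True if source is reached by following parent links up from target."""
    seen = set()
    current = parent.get(target)
    # The seen set guards against accidental cycles in the data.
    while current is not None and current not in seen:
        if current == source:
            return True
        seen.add(current)
        current = parent.get(current)
    return False


def suspicious_borrowing(target, source):
    """Flag {{bor|target|source}} when source is a non-exempt ancestor."""
    return source not in EXEMPT_SOURCES and is_ancestor(source, target)
```

Running this over a dump and listing the hits on a to-do page fits the "slow time" approach, since abuse filters cannot consult the language data.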