Wiktionary:Grease pit/2023/April

Suggested change to Block & Delete reasons

When dealing with spammers, the Delete Page dropdown list of reasons has "promotional material", but the Block User dropdown list of reasons has "spamming/advertising". I always get mixed up when trying to do this quickly on the keyboard, and forget where to press P and where to press S. Can we make these reasons start with the same letter, e.g. both "spam" or both "promoting"? Equinox ◑ 06:56, 1 April 2023 (UTC)[reply]

Support. I’d prefer them both to have “slang”. Theknightwho (talk) 11:19, 2 April 2023 (UTC)[reply]

Spam? Vininn126 (talk) 11:23, 2 April 2023 (UTC)[reply]

Spam, yeah. My brain sometimes mixes up similar-sounding words, even though I knew exactly what I meant lol. Theknightwho (talk) 11:41, 2 April 2023 (UTC)[reply]

I like "promotional material"; it's less finger-pointy and more general than "spam". The block reason could be changed to "Posting promotional material" or "Posting spam/advertising material" so they both start with "P". This, that and the other (talk) 12:11, 2 April 2023 (UTC)[reply]

I think I agree with you. It's also clearer to me. Vininn126 (talk) 12:12, 2 April 2023 (UTC)[reply]

Done Equinox ◑ 09:07, 10 April 2023 (UTC)[reply]

Making a test environment for Lua code, access through git?

Hi - I'm interested in playing around with the Lua code for Greek declensions: https://en.wiktionary.org/wiki/Module:grc-decl/documentation . Is there some other way to do this other than clicking through to lots of pages in a browser and cutting and pasting the code into an editor? Is there a git repository somewhere? Thanks. -Ben Crowell, 1 April 2023

Hello. There is no standard way of doing this but AFAIK some people (e.g. User:Erutuon I think) have set up custom Lua environments on their own machines so they can test Lua code. (I don't do this; I test in userspace modules on the site itself.) As for retrieving the code, there isn't that much to cut and paste, or you can fetch it from the dump file at dumps.wikimedia.org (it's released twice a month and the new one should come out later today). BTW you should create an account, it will make it easier for you and others. Benwing2 (talk) 17:29, 1 April 2023 (UTC)[reply]
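As an alternative to cutting and pasting, a page's unrendered source can be requested with MediaWiki's action=raw URL parameter. The following is a minimal sketch, assuming Python 3 and network access; the helper names raw_url and fetch_module and the User-Agent string are made up for illustration and are not part of any existing tool.

```python
from urllib.parse import urlencode
from urllib.request import Request, urlopen

BASE = "https://en.wiktionary.org/w/index.php"


def raw_url(title: str) -> str:
    """Build the URL that returns a page's unrendered source (action=raw)."""
    return BASE + "?" + urlencode({"title": title, "action": "raw"})


def fetch_module(title: str) -> str:
    """Download the source of one page as text; requires network access."""
    # The User-Agent value is a placeholder; use one that identifies you.
    request = Request(raw_url(title), headers={"User-Agent": "module-fetch-sketch/0.1"})
    with urlopen(request) as response:
        return response.read().decode("utf-8")


# Example (not run here, since it needs the network):
# source = fetch_module("Module:grc-decl")
```

The downloaded text could then be saved to a local file and edited or run in whatever Lua setup is being used; submodules would need to be fetched one title at a time.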

Thanks for the reply, much appreciated. -Ben Crowell

I have only used the environment to do very particular tasks like generating transliteration. Another user announced a more complete Lua environment at Wiktionary:Beer parlour/2021/September#Announcing ilscripto 0.0.1: pure Lua Scribunto engine. Accessible on GitHub at Crowley999/ilscripto. I haven't used it myself, apart from maybe taking some of the code. — Eru·tuon 19:27, 9 April 2023 (UTC)[reply]

Line break with Template:hyphenation

Hi, for some reason a line break is inserted after the first {{hyphenation}} in entries with a second template using the nocaption= parameter. (See e.g. insource:"or hyphenation"). Einstein2 (talk) 18:29, 1 April 2023 (UTC)[reply]

This is still an issue. I think it must be caused by Module:hyphenation's format_hyphenations, but I am unable to edit it. — excarnateSojourner (talk · contrib) 00:03, 20 April 2023 (UTC)[reply]

It's because of (caption .. ": "). That could be deleted, but I don't know what the point of it is; it might be useful in some situation. I don't do hyphenations so I have no idea. — Eru·tuon 04:53, 20 April 2023 (UTC)[reply]

Never mind, it was an unintended consequence of this edit by Fenakhay. Fixed, I think. — Eru·tuon 05:03, 20 April 2023 (UTC)[reply]

I was just wondering why I was unable to reproduce the issue now, lol! Thanks! — excarnateSojourner (talk · contrib) 05:33, 20 April 2023 (UTC)[reply]

Alternative forms with the wrong language code

Today I was surprised to find several "Alternative forms" sections in Old Norse entries dating to March of 2021 with links to Westrobothnian entries. Would it be possible to create a list of all the Alternative forms that have language codes that don't match those of their L2 section? For that matter, you could do the same with Derived terms and Related terms while you're at it, since all of them are supposed to only contain terms in the same language. Thanks! Chuck Entz (talk) 23:25, 1 April 2023 (UTC)[reply]
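The kind of check requested here could be sketched roughly as follows. This is only an illustration under stated assumptions: the LANG_CODES table is a tiny hypothetical stand-in for Wiktionary's real language data, the set of link templates is an illustrative subset, and the function works on the wikitext of a single page rather than on a dump.

```python
import re

# Hypothetical, tiny name-to-code table; a real run would need the full language data.
LANG_CODES = {"Old Norse": "non", "Swedish": "sv", "English": "en"}

# Sections that are supposed to contain only terms in the entry's own language.
SAME_LANG_SECTIONS = {"Alternative forms", "Derived terms", "Related terms"}

# A header line such as "==Old Norse==" or "===Alternative forms===".
HEADER_RE = re.compile(r"^(={2,6})\s*(.+?)\s*\1\s*$")
# Link templates whose first positional parameter is a language code (illustrative subset).
TEMPLATE_RE = re.compile(r"\{\{\s*(alt|alter|l|link)\s*\|\s*([^|}\s]+)")


def find_mismatches(wikitext: str):
    """Return (language, section, template, code) tuples where code differs from the L2 language's code."""
    mismatches = []
    lang_name = lang_code = section = None
    for line in wikitext.splitlines():
        header = HEADER_RE.match(line)
        if header:
            level, title = len(header.group(1)), header.group(2)
            if level == 2:
                # New language section: look up its code and reset the subsection.
                lang_name, lang_code, section = title, LANG_CODES.get(title), None
            else:
                section = title
            continue
        if lang_code is None or section not in SAME_LANG_SECTIONS:
            continue
        for template, code in TEMPLATE_RE.findall(line):
            if code != lang_code:
                mismatches.append((lang_name, section, template, code))
    return mismatches
```

Languages missing from the lookup table are skipped rather than reported, so the output of such a sketch would still need manual review for false positives.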

Oh man, so is this the reason for the weird stuff like in the code text on dœld? Must we now merge Westrobothnian into Swedish, as in the case of leka? Tollef Salemann (talk) 07:27, 2 April 2023 (UTC)[reply]

@Tollef Salemann: No, see Wiktionary:Requests_for_deletion/Others#Category:Westrobothnian language. It's a bit more complicated than a simple merger because of uncertainty as to whether the Westrobothnian content could be trusted to be real dialectal Swedish and not some kind of Swedish-based conlang. Chuck Entz (talk) 07:54, 2 April 2023 (UTC)[reply]

I understand that, but what should be done with the dialectal material from books? I've seen many North Swedish dialectal texts on Runeberg and in some other places, as well as in blogs a long time ago. They have four genders, cases, and a lot of other features that are very hard to fit into regular Swedish. So if I find such words, I'm going to treat them as dialectal Swedish and just make a manually created conjugation for them, as I do with Trøndsk. But if I see a Westrobothnian term anywhere now, do I need to report it to you, or just hide it with the <! sign? Or just delete it? Tollef Salemann (talk) 08:08, 2 April 2023 (UTC)[reply]

The code for Westrobothnian was removed from the modules yesterday, so it will throw a module error anywhere it's used on Wiktionary. The bot was moving the entries to a special page, and commenting everything else out to preserve the data without the module errors. From now on, Wiktionary doesn't recognize Westrobothnian at all. The dialectal forms from that area are just dialectal forms like all the others. Eventually, people who know the local dialect(s) will go through the special page and salvage anything usable. My impression from the deletion discussion is that at least some of it is made-up stuff that locals who aren't in on the game wouldn't recognize, but I don't speak Swedish, so I wouldn't know. Chuck Entz (talk) 14:53, 2 April 2023 (UTC)[reply]

@Chuck Entz Should we have a place to store these deprecated codes? They’re occasionally necessary, as four of the old FWOTD pages are now throwing errors, such as Wiktionary:Foreign Word of the Day/2017/June 29. Theknightwho (talk) 15:42, 5 April 2023 (UTC)[reply]

The problem with this is that many northern Swedish dialects do not partake in even the most basal East Norse innovations such as monophthongization ei > é, which was completed in central Sweden and Denmark as early as 1000 CE. That is, they are not descended from Old Swedish. Just like having Elfdalian (which should probably be renamed Dalecarlian, since the traditional dialect is not unique to Älvdalen), Westrobothnian was a useful heading under which these dialectal forms could be collected, although in the way the entries were previously spelled and rendered it was certainly unsalvageable. ᛙᛆᚱᛐᛁᚿᛌᛆᛌ ᛭ Proto-Norsing ᛭ Ask me anything 12:18, 9 April 2023 (UTC)[reply]

AFAIK, diphthongs in Sweden aren't only found in the north, so I propose treating Old Swedish as just a written norm with dialectal variation. If you look at Trøndsk, there is the old Selbu dialect with pre-Old-Norse nasalisation in words like lås (as in Älvdalen). At the same time, there are non-standard word forms in Middle Norwegian texts from this area.

But it can still be considered a dialect, not a separate language (at least if you look at the grammar and lexical variety of old Selbu, it's more or less identical to old Nynorsk). I've seen some long texts in North Swedish (I can't find them again yet; I reckon they were on Runeberg), and the situation is the same there.

The main pain in the neck is finding out which dialectal spellings are actually used in books, on the internet, and in SMS, and how they are pronounced, because Swedes seem far less keen on using dialect than Norwegians.

So here's the solution I would use in this case:

1) If a Swedish dialectal word is not from "normal" Old Swedish, this can be pointed out in the etymology section. If there is a known written dialectal form in some very old book, it is worth mentioning.

2) If the dialectal word has an unusual inflection (like "dalom", used in "Nils Holgerssons resa"), this can easily be pointed out in the inflection section by using a non-standard code.

3) Include the dialectal forms in the main entry (as in Nynorsk sytten).

4) Use quotes (from books, the internet, maybe also some non-Jamtish Triakel songs).

5) Make a category for northern words.

Examples of this are Izhman кыыны or Norwegian bøgd. Tollef Salemann (talk) 06:23, 10 April 2023 (UTC)[reply]

@Chuck Entz see Wiktionary:Todo/Template language code doesn't match header under alt and alter in the first column. As you can see, a lot of people invalidly put Malay terms as alt forms of Indonesian, and vice versa. However, more pertinently to what you are asking, there are no Westrobothnian alt forms showing up here; perhaps they use the {{l}} template under an Alternative forms L3, which my script doesn't capture, or they have been added since the January 1 dump. This, that and the other (talk) 12:16, 2 April 2023 (UTC)[reply]

@This, that and the other: I removed all of them yesterday, anyway. Because of the discontinuation of the code for Westrobothnian, they all had module errors. See gor for one example (it used {{l}}, which explains why you didn't find it). Chuck Entz (talk) 14:53, 2 April 2023 (UTC)[reply]

@This, that and the other: Wow, there are a lot of false positives in that list to filter out - a distinctly non-trivial task to automate, with lots of low yields. Is it worth reporting them as I go through a selected set? --RichardW57m (talk) 16:06, 3 April 2023 (UTC)[reply]

@This, that and the other, RichardW57m: Yes, there are specifically correct template uses, e.g. using {{compound|ota}} with |nocat=1 in Serbo-Croatian iđirot, the same in English magenstrasse, and also a lot of cases where Arabic (macrolanguage) noun sections contain dialect pronunciations marked by the dialect language codes, comparable to Chinese entries, because there would be no additional value in dedicated language sections other than the specific transcriptions. Fay Freak (talk) 16:30, 3 April 2023 (UTC)[reply]

@This, that and the other: The only other one I found when cleaning up the Pali entries was commented out module invocations. Note that the first parameter in such invocations is a function name, not a language code. --RichardW57 (talk) 17:28, 10 April 2023 (UTC)[reply]

@RichardW57 in order to make the list more useful, I'd love to know about further false positives that can be excluded. It is supposed to ignore templates with |nocat=1; I'll need to fix this. I can exclude the Arabic codes, but I think a larger community discussion might be warranted there. For instance, if we treat Arabic as a separate lect from, say, Egyptian Arabic, we shouldn't be using {{syn}} and similar templates to link between them, as a word with the same meaning in a different lect is a translation, not a synonym. This, that and the other (talk) 23:05, 10 April 2023 (UTC)[reply]
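Skipping templates marked |nocat=1 could look something like the sketch below. It is not the actual script behind the Todo list; the function names are hypothetical, and the regular expression only handles templates without nested braces or piped wikilinks inside them.

```python
import re

# Matches innermost templates only (no nested braces); enough for a sketch.
TEMPLATE_RE = re.compile(r"\{\{([^{}]*)\}\}")


def parse_template(body: str):
    """Split 'name|a|b|k=v' into (name, positional list, named dict)."""
    parts = [part.strip() for part in body.split("|")]
    positional, named = [], {}
    for part in parts[1:]:
        if "=" in part:
            key, value = part.split("=", 1)
            named[key.strip()] = value.strip()
        else:
            positional.append(part)
    return parts[0], positional, named


def templates_to_check(wikitext: str):
    """Return (name, positional) pairs for templates that are not marked |nocat=1."""
    result = []
    for body in TEMPLATE_RE.findall(wikitext):
        name, positional, named = parse_template(body)
        if named.get("nocat") == "1":
            continue  # deliberately uncategorised, so not a mismatch to report
        result.append((name, positional))
    return result
```

Only the templates that survive this filter would then have their language code compared with the section's language.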

Another pair which I know do this are Adyghe and Kabardian, as they’re tied up in Circassian nationalism. Theknightwho (talk) 12:42, 2 April 2023 (UTC)[reply]

"Greek > Ancient" in translation tables

A lot of translation tables have "Ancient: ..." under Greek for grc translations instead of "Ancient Greek: ...". The problem with this is it breaks the automatic translation editor, which will add any new grc translations on a new "Ancient Greek" line instead of the existing "Ancient" one. Either works as far as I'm concerned but we should probably settle on one and run a bot job to bring the others in line. —Al-Muqanna المقنع (talk) 19:28, 2 April 2023 (UTC)[reply]

@Al-Muqanna This should be easy to fix (Ancient -> Ancient Greek, I think); can you point me to a few examples? Benwing2 (talk) 20:27, 2 April 2023 (UTC)[reply]
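The core of such a bot job might be a single substitution, sketched below. The assumptions are that the nested line begins with "*:" followed by the label "Ancient:", and that the translation itself uses one of the {{t}}, {{t+}}, {{tt}} or {{tt+}} templates with the code grc; the function name is made up for illustration.

```python
import re

# Matches the start of a nested translation line such as "*: Ancient: {{t|grc|...}}".
# Requiring the grc template right after the label avoids touching other "Ancient" lines.
ANCIENT_LINE_RE = re.compile(
    r"^(\*:[ \t]*)Ancient(:[ \t]*\{\{tt?\+?\|grc\|)",
    re.MULTILINE,
)


def normalize_ancient_greek(wikitext: str) -> str:
    """Rename the nested 'Ancient:' label to 'Ancient Greek:' for grc translations."""
    return ANCIENT_LINE_RE.sub(r"\1Ancient Greek\2", wikitext)
```

Lines already labelled "Ancient Greek:" do not match the pattern, so running the substitution twice leaves the text unchanged.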

@Benwing2: Some at legal, signet ring, blacksmith, crimson, sulfur—seems harder to find examples with "Ancient Greek" if anything, though under is one. —Al-Muqanna المقنع (talk) 20:36, 2 April 2023 (UTC)[reply]

We could also change the nesting in MediaWiki:Gadget-TranslationAdder-Data.js from 'Greek/Ancient Greek' to 'Greek/Ancient' to make the translation adder behave. But it does seem like most languages have 'X/modifier X', not 'X/modifier', so it would be consistent to keep Ancient Greek as you're proposing. — Eru·tuon 20:45, 2 April 2023 (UTC)[reply]

Yeah, it would also be simpler for bot code and such to use the actual language name rather than some lect-specific abbreviation. Benwing2 (talk) 21:00, 2 April 2023 (UTC)[reply]

Personally I've always found it unintuitive that we have languages which, in any other situation, are listed alphabetically (e.g. on the page for φιλία itself "Ancient Greek" comes first, then "Greek"), but which are not listed alphabetically in translation tables (you have to look under "G" to find "Ancient...", whereas "Old English" and "Old High German" are under "O" as expected). But yes, at least use the full language name and not an abbreviation. - -sche (discuss) 01:28, 3 April 2023 (UTC)[reply]

It just seems for whatever reason that Modern Greek is so overshadowed by its history that "Greek" often means the ancient form unless qualified, at least in certain fields such as Bible study and lexicography. The American Heritage Dictionary for example just uses "Greek" for the ancient form of the language and "Modern Greek" for those relatively few words that are borrowed into English from the modern language. Even though we are also a dictionary, we are a multilingual one, and I think it would be a bad idea to follow the AHD's practice here. That said, I think it's good to have them both under the same header rather than splitting them out as we do with English vs Old English ... Greek has changed much less than English over time, especially in its written form, and it also has a lot of borrowings from its older stage .... and, yes, I suspect a lot of readers will be looking under the G header out of instinct just because the ancient form of the language is so often just called Greek. —Soap— 06:41, 5 April 2023 (UTC)[reply]

A lot of the time these are quirks of either very old practice on Wiktionary, or just the idiosyncrasies of whoever added the translation.

That being said, I’m not keen on the idea of nesting languages based on how close or distant they are perceived to be linguistically. Rather, I think we should make a decision based on whether we want to include or ignore “Ancient” in the alphabetical order. Same goes for other common qualifiers, like “Middle”, “Old”, “Early Modern”, “Classical” and so on. Theknightwho (talk) 15:39, 5 April 2023 (UTC)[reply]

Adding the Origin of Ohana

Ohana comes from the root word tied to the taro plant, a vital part of life on the islands. ʻOha refers to the shoot of the plant, which Hawaiians cut off and replant to grow more taro plants the following season. The word ana is related to procreation and regeneration.

I merely wished to include the Google reference. Please. 50.219.140.122 23:20, 2 April 2023 (UTC)[reply]

See etymology of Hawaiian ʻohana. DCDuring (talk) 12:35, 3 April 2023 (UTC)[reply]

Quiet Quentin stopped working?

The tab's still there, but nothing happens when I click it. Is it just me?__Gamren (talk) 00:23, 3 April 2023 (UTC)[reply]

It still works for me. (I get no results if I search for an extremely common string like "kittens"; I have to search for "kittens are" to get results to show up, but I think that's a quirk on Google's end, not ours.) It doesn't look like you changed any of your js recently (so nothing there should be causing a new conflict), and it doesn't look like the Gadget itself was changed recently. I dunno... have you tried clearing your cache, or hard-refreshing the QQ js in the manner described at the very top of this page? - -sche (discuss) 01:23, 3 April 2023 (UTC)[reply]

Citation flagged

New contributor here, trying to add a citation to the word "infoglut," which was flagged for "various specific spammer habits." Maybe because the template I used included "authorlink"? I'm happy to remove that if it's a problem. KMSheldon (talk) 18:36, 3 April 2023 (UTC)[reply]

Yeah, looking at the text of the edit you tried to make (which is stored in a log where admins can view it), if I had to guess, I would guess it's the external link that's being blocked; you're a new user with no edits (and your username is similar to the name of the site you're linking, to boot), so it probably rings a lot of "this looks like drive-by self-promotional spam" alarm bells. Try adding the citation without the link. - -sche (discuss) 21:26, 3 April 2023 (UTC)[reply]

So, I took out the link to my web site, but still got flagged for "various specific spammer habits." Perhaps because I am the author of the article in question? Self-promotional, perhaps, but if you search online you will find no citations for this word "infoglut," which I believe I coined in 1991, unless someone comes up with an earlier citation.

Thank you for posting here. I'm not an administrator, but I think in this case the text you were trying to post is visible to all. Since the website you linked doesn't have the 1991 research paper on it, I agree with the filter that it's not really helpful to readers looking for more information about the history of this word. If the research paper is available online, even if it's behind a subscription paywall, it would be great to link that.

The text of the edit looks very good. I will add it to the citations page, and I'd say it could be added to the entry too, as the citations page is usually meant for when we have too many citations to list all under the main entry, or when we have citations for a word we don't have listed yet. That said, from the context this looks like a fictional account, so I just want to know one more thing .... is Human Filters the name of a short story, or the title of the research paper, or something else? Thank you, —Soap— 06:24, 5 April 2023 (UTC)[reply]

Thanks! The last page of Byte Magazine was a humor/commentary column called Stop Bit. Human Filters was a commentary about the flood of information that personal computers had unleashed. Infoglut was derived from the term "oil glut," when there's too much oil production. I finally found a link to the entire issue with the cited column, albeit in a PDF form. Should I add that link to the citation? KMSheldon (talk) 11:59, 5 April 2023 (UTC)[reply]

That sounds good to me, thank you. As for why the edit filter is so strict even on a citations page, where one would least expect spam, I can't say, but it's possible that it's computationally less expensive to check an edit on all pages than to check an edit on only some. —Soap— 12:08, 5 April 2023 (UTC)[reply]

Productivity categories

I propose we create categories for productivity of affixes and also standardize wording. Vininn126 (talk) 16:45, 5 April 2023 (UTC)[reply]

How would they be populated? Has any other language dictionary tried to do this? Are there scholarly papers on this? How did they measure productivity? Would we test in one language first? DCDuring (talk) 16:55, 5 April 2023 (UTC)[reply]

Good questions. I propose the name be based on our other categories, i.e. X languages with (non)-productive senses. I think we'd only need the two, because if a suffix is not standardly productive, but users sort of recognize it, we'd label it as "archaic" in accordance with our glossary (used but usually for stylized effect). I do not believe other dictionaries do this, but we already mark productivity and link to it in our glossary; also, to be frank, I rarely see dictionaries listing affixes as it is. We could test it; I offer Polish as tribute. Vininn126 (talk) 17:00, 5 April 2023 (UTC)[reply]

Is an English translation available? All I could extract from this reply was, "No-one else does it. Try doing it for Polish." --RichardW57m (talk) 13:32, 6 April 2023 (UTC)[reply]

-th is an example of a suffix that is no longer, or only rarely, productive in English. Did you even read the glossary or look at any affixes with it? There are tons of languages with this label. Vininn126 (talk) 13:40, 6 April 2023 (UTC)[reply]

@RichardW57m I even say it's in tons of entries and the glossary, how did you come to the conclusion no one else does it? Vininn126 (talk) 13:41, 6 April 2023 (UTC)[reply]

@Vininn126: You didn't say which glossary! I automatedly searched in WT:Glossary (rather than reading from the start), and not in Appendix:Glossary.

You wrote, 'I do not believe other dictionaries do this,...', seemingly in response to 'Has any other language dictionary tried to do this?'. "No-one else does it" was meant collectively, i.e. in terms of bodies of editors, not in reference to individual editors. --RichardW57 (talk) 11:03, 9 April 2023 (UTC)[reply]

I believe @DCDuring thought you were proposing that all affixes be categorised according to their productivity, which is a daunting task, especially for dead or moribund languages. Your suggested category name 'X languages with (non)-productive senses' was too garbled for the reader to confidently correct, though subsequent posts now make it intelligible. It seems that you are proposing to add categories with names such as 'English affixes with productive senses' and 'English affixes with non-productive senses'; not all English affixes would be categorised into one of the two categories, and some, such as -th, would be categorised into both. Assessing the productivity of Gothic affixes is likely to be hard work. --RichardW57 (talk) 11:03, 9 April 2023 (UTC)[reply]

In short yes, the proposal is to have the label "nonproductive" categorize. Vininn126 (talk) 22:59, 10 April 2023 (UTC)[reply]

We actually already have Category:English unproductive suffixes (@DCDuring @RichardW57m who were asking if this has been done before), so someone has thought of this before. I think we should change {{lb}} to include nonproductive as a label. Vininn126 (talk) 13:55, 6 April 2023 (UTC)[reply]

Disabling bold and italics for the Khitan scripts

Could we please disable bold and italics for the Khitan scripts by adding the following to MediaWiki:Common.css?

.Kitl, .Kitl * {
	font-style: normal;
	font-weight: normal;
}

.Kits, .Kits * {
	font-style: normal;
	font-weight: normal;
}

The Khitan small script is used in a number of entries, whereas the Khitan large script hasn't been encoded yet (but I suspect will be in the not-too-distant future). In either case, it never makes sense to use bold or italics, as they're both quite closely related to the Han script. Theknightwho (talk) 17:23, 5 April 2023 (UTC)[reply]

I notice we're only setting font-style: normal (disabling italics) but AFAICT not disabling boldness for Han, and indeed boldness looks fine and usefully highlights the relevant term in the usexes (etc) in e.g. 荊, just as much as in entries in other scripts. So disabling italics seems reasonable, but what's the rationale for disabling bold for Khitan? Perhaps we should instead reconsider why we're suppressing bold for e.g. Tangut; here having bold (in the second, manual usex; vs forced nonbold in the first, templated usex) seems to help with identifying the relevant term, like in Han and most other scripts/languages. (Looking at Arabic and Hebrew, where we also suppress bold, I see highlighting is used instead; I can't say I find it very noticeable/legible.) - -sche (discuss) 20:45, 5 April 2023 (UTC)[reply]

@-sche I'd say because it's completely ahistorical, and (certainly for Tangut) degrades legibility. ~~I'd support removing it for Han, too, as the dedicated templates don't use it, which means I suspect it's an oversight.~~ Actually, I'd support removing it for mentions in non-gloss templates, but keeping it for usexes. That way, it's only ever used in places where it's obvious to the reader what it's supposed to say (instead of being some other term). Theknightwho (talk) 21:55, 5 April 2023 (UTC)[reply]

Historicity is not an overly persuasive argument for me; we don't use t h i s kind of emphasis for usexes and quotes on older words in e.g. German even though it'd be historical (and when we bold lemmata in quotations that use them, they were almost never bolded in the original text). We need some way of indicating the relevant term in quotations and usexes, and bolding seems to do the job legibly and consistently across entries in most of the scripts we have entries in. What other method of marking the relevant term would you propose? Faint highlighting isn't legible IMO. Maybe make emphasized words bigger? (Underline them?) - -sche (discuss) 22:17, 5 April 2023 (UTC)[reply]

@-sche I did draw a distinction between the two different places where we bold terms - though I should note that Chinese dictionaries tend to use ～, but wouldn't support that as I feel it's too different to how we do things elsewhere, and it's not actually very intuitive. Underlining might make more sense. Theknightwho (talk) 22:34, 5 April 2023 (UTC)[reply]

@Theknightwho I don't agree with disabling italics and bold in all cases. See also Wiktionary:Grease pit/2022/August#Italics as alt in Greek, where we used to disable italics in all cases in Greek, and User:Sarri.greek pointed out that this caused problems with taxonomic references and other things. On first thought I think it may be reasonable to distinguish bold/italics in mentions from usexes, but I need to consider this more. Benwing2 (talk) 23:32, 5 April 2023 (UTC)[reply]

@Benwing2 That makes sense - though would it please be possible for you to add the CSS for italics? That would bring Khitan in line with what Han has now.

Any changes to Han text handling should probably be the subject of a new thread on the BP, I think. Theknightwho (talk) 20:20, 7 April 2023 (UTC)[reply]

the translation-adder gadget's edit summaries

...include unnecessary s, compare this recent edit summary to this older edit summary (note the edit summary, not the edit itself). (This is only an aesthetic problem AFAICT, so not a high priority to fix.) - -sche (discuss) 03:29, 6 April 2023 (UTC)[reply]

@Theknightwho I am guessing this is related to your changes to {{ll}}; see also the comment below. Can you explain what your change does? Benwing2 (talk) 04:44, 6 April 2023 (UTC)[reply]

@Benwing2 The change makes sure that formatting is present, without adding any annotations - I made the change as I noticed that {{ll}} is frequently used outside of other formatting templates, which just results in bare text. Theknightwho (talk) 04:49, 6 April 2023 (UTC)[reply]

@Theknightwho Can you clarify what you mean by "makes sure that formatting is present, without adding any annotations"? It seems to be adding a ..., which per the documentation it should not. Benwing2 (talk) 04:53, 6 April 2023 (UTC)[reply]

@Benwing2 It tags the text. It should be possible to get rid of the formatting only for the translation-adder, because otherwise it's a problem for any uses of {{ll}} outside other templates - and there are a lot. Theknightwho (talk) 04:57, 6 April 2023 (UTC)[reply]

@Theknightwho It formerly generated a bare link when used outside of other templates; this was intentional. I don't think this behavior should be changed. Benwing2 (talk) 04:58, 6 April 2023 (UTC)[reply]

@Benwing2 Are there any other situations where the use of formatting is a problem, besides the translation adder? Theknightwho (talk) 04:59, 6 April 2023 (UTC)[reply]

@Theknightwho I don't know, but as a general rule you should not change the behavior of highly-used templates without prior discussion, as it can easily lead to unintended consequences (as with the translation adder and possibly in other places). Benwing2 (talk) 05:01, 6 April 2023 (UTC)[reply]

@Benwing2 That's fair. However, I do think this situation needs to be dealt with, because it may need mass replacing on a lot of pages. Theknightwho (talk) 05:04, 6 April 2023 (UTC)[reply]

@Theknightwho Can you give some examples? Benwing2 (talk) 05:20, 6 April 2023 (UTC)[reply]

@Benwing2 If you check what links to the template, you'll see lots of taxonomic pages - it seems to be the case on most of those, e.g. Felis margarita. Sometimes, it's nested within a taxonomic template, but none of those apply formatting as far as I can tell. Theknightwho (talk) 05:26, 6 April 2023 (UTC)[reply]

@Theknightwho In the case of this page, User:DCDuring intentionally changed {{l}} to {{ll}}; I'm not sure why. Benwing2 (talk) 05:29, 6 April 2023 (UTC)[reply]

@Benwing2 There are tons of them - it seems to have been systematic. Theknightwho (talk) 05:41, 6 April 2023 (UTC)[reply]

@Theknightwho In the meantime, can you revert your change? Benwing2 (talk) 05:42, 6 April 2023 (UTC)[reply]

@Benwing2 It no longer tags the text, but I don't think we should keep it that way in the long term if at all possible. Theknightwho (talk) 06:05, 6 April 2023 (UTC)[reply]

For at least a time {{l}} yielded a different font than plain text, but I was compelled to use a link template to avoid misleading orange links (generated by a gadget). {{ll}} seemed to avoid the font problem and seemed generally "lighter". (Is it as "light" as {{l-lite}}?) I got into the habit of using it all the time. DCDuring (talk) 11:15, 6 April 2023 (UTC)[reply]

@DCDuring It isn't as light as {{l-lite}}, no, as it still uses Lua. Theknightwho (talk) 11:53, 6 April 2023 (UTC)[reply]

@User:Theknightwho Would it be possible to redirect {{ll}} to {{l-lite}}? DCDuring (talk) 14:23, 7 April 2023 (UTC)[reply]

@DCDuring No - {{l-lite}} still has the formatting and serves a different purpose. However, I really don’t think there’s much point in {{ll}} as a template. The one use case is the translation adder, and it’s not hard to set something up just for that. Theknightwho (talk) 19:21, 7 April 2023 (UTC)[reply]

I never understood why {{l}} gave a different font for Translingual from what it gave for English. No one offered an explanation. {{ll}} gave the right result when {{l}} did not. DCDuring (talk) 19:31, 7 April 2023 (UTC)[reply]

The uses I make for a "list" template require nothing other than that the item not appear orange when used within a template that expects a different language. Beside the column templates, {{taxoninfl}} and {{head|mul}} might have specific epithets that we do not treat as Translingual. DCDuring (talk) 19:38, 7 April 2023 (UTC)[reply]

@DCDuring So {{l}} (and {{l-lite}}) tag the text as having a given script and language (even if that language is mul). That’s how we make sure text has the correct font, and so on. {{ll}} runs like {{l}}, but doesn’t do that; but it does still link to the correct language section.

In the past, that made sense in some cases when placing links inside other link templates, but now we can handle any difficulties caused by that automatically. That’s why I think it’s pointless. Theknightwho (talk) 19:39, 7 April 2023 (UTC)[reply]
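The difference described above, between a link that is tagged with a language and script and a bare link, can be pictured with a small sketch. The HTML shapes below are an assumption made for illustration, not the exact output of the link modules, and the function names are hypothetical.

```python
from html import escape
from urllib.parse import quote


def bare_link(term: str, lang_name: str) -> str:
    """A link to the term's language section with no tagging (the {{ll}}-style behaviour)."""
    href = "/wiki/" + quote(term.replace(" ", "_")) + "#" + quote(lang_name.replace(" ", "_"))
    return f'<a href="{href}" title="{escape(term)}">{escape(term)}</a>'


def tagged_link(term: str, lang_code: str, lang_name: str, script: str = "Latn") -> str:
    """The same link wrapped in a span carrying script and language (the {{l}}-style behaviour)."""
    return f'<span class="{script}" lang="{lang_code}">{bare_link(term, lang_name)}</span>'
```

In this picture, site CSS keyed on the script class or the lang attribute can only affect the tagged form, which would be one way a different font ends up being applied to it.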

I don't see that anyone has handled my orange-link problem within the column templates automatically or any other way. DCDuring (talk) 19:48, 7 April 2023 (UTC)[reply]

@DCDuring That’s something we’re working on with the column template by making it possible to specify languages for specific terms, which I think @Benwing2 has on his todo list? In any event, I don’t think {{ll}} solves that issue (but can’t check as I’m on mobile right now).

That being said, you should be able to do it right now by using a normal link template within the column template - though I appreciate it’s quite messy to do it that way. It should work as a short-term solution, though.

Theknightwho (talk) 20:06, 7 April 2023 (UTC)[reply]

@DCDuring, Theknightwho Is the request to me to add the ability to specify a language prefix on {{col}} templates? Let me go ahead and add that. Benwing2 (talk) 20:13, 7 April 2023 (UTC)[reply]

@Benwing2 Thank you! Theknightwho (talk) 20:15, 7 April 2023 (UTC)[reply]

That's what I have been doing. It's "only" nine more characters per item for my arthritic fingers. DCDuring (talk) 20:17, 7 April 2023 (UTC)[reply]

@Theknightwho Thank you for the change. Why do you not think it should stay that way? Benwing2 (talk) 06:10, 6 April 2023 (UTC)[reply]

@Benwing2 It's evidently being misused, and (aside from the translation adder) I cannot think of any situations where having formatting is a problem. Theknightwho (talk) 06:12, 6 April 2023 (UTC)[reply]

If {{ll}} formats like {{l}}, what's the point of {{ll}} at all? That implies it should stay without formatting. Benwing2 (talk) 06:14, 6 April 2023 (UTC)[reply]

@Theknightwho Benwing2 (talk) 06:15, 6 April 2023 (UTC)[reply]

@Benwing2 It doesn't add anything like transliterations by default, which is legitimately useful. I don't see how not having formatting is ever useful, apart from this one specific niche. Theknightwho (talk) 06:17, 6 April 2023 (UTC)[reply]

@Theknightwho You can disable transliterations with tr="-". I disagree that having a raw link-generating template isn't useful. It isn't your problem IMO if people are (in your eyes) misusing the template, and if it's a genuine problem, it can be corrected by bot. Benwing2 (talk) 06:19, 6 April 2023 (UTC)[reply]

@Benwing2 Well... when is it useful? As far as I can tell, most of its uses are either pointless or actively detrimental. Theknightwho (talk) 06:21, 6 April 2023 (UTC)[reply]

@Theknightwho For one thing if it's wrapped in a formatting template, it's double-formatting. But you're missing the larger point, which is that you made a significant and incompatible behavioral change to a major template without prior discussion, just because you think it's better that way. This is not the right way to go about such a change. Benwing2 (talk) 06:31, 6 April 2023 (UTC)[reply]

@Benwing2 Double formatting is automatically stripped by the link template. The reason I considered it to be a reasonably minor change is because it is explicitly supposed to be used in situations where formatting is already present, and the template seemed to pre-date many of the current features of the links templates, such as being able to smoothly handle double-formatting. In other words, the only situations where I saw it having any substantive change were those that were explicitly disallowed anyway - and the only change it would make is to put in formatting that should have been there in the first place.

I appreciate that the translation-adder complicates things, but I also don't think we should have a special template just for that, or that it provides a compelling case for forcing mistakes to be fixed by bot, instead of just handling them automatically. Theknightwho (talk) 06:41, 6 April 2023 (UTC)[reply]

Actually - slight correction: it isn't stripped in this particular situation; it's just totally irrelevant, because the new formatting always takes precedence. This happens all the time with column templates. Theknightwho (talk) 06:53, 6 April 2023 (UTC)[reply]

@Theknightwho You are still trying to justify your change. I'm not sure you really heard what I said; the change you made was incompatible, and regardless of whether you think it's "obviously" better, it needs discussion beforehand. This isn't the first time something like this has happened. Benwing2 (talk) 06:58, 6 April 2023 (UTC)[reply]

@Benwing2 I understand your point. However, it was a change in the spirit of WP:BOLD, with a comparatively minor impact in a niche situation. It was also done in line with (what I perceived was) the underlying basis of the original template, and neither of us have suggested any uses that make the template necessary. We can obviously discuss specifics, but it wasn't just some arbitrary, reckless change. Theknightwho (talk) 07:26, 6 April 2023 (UTC)[reply]

BOLD is great for content, especially to resolve long-standing stalemates. I don't think it applies when lots of habits and expectations are changed, which is often the case with template/module/formatting changes. DCDuring (talk) 11:23, 6 April 2023 (UTC)[reply]

For cases where {{ll}} was necessary, see Wiktionary:Beer parlour/2017/March § Template:ll and Wiktionary:Grease pit/2020/December#text formatting oddity with {{link}} and {{mention}}. Other discussions about this template can be seen in this search. Even if these uses are dealt with in other ways now, it's likely that sometime we will need to just language-link without HTML and link annotations. If {{ll}} were changed to be a {{l}} without link annotations, we would probably need another template to replace {{ll}} eventually, so it would be simpler to leave {{ll}} alone and make a new template to be the {{l}} without annotations, if that's wanted. — Eru·tuon 02:24, 7 April 2023 (UTC)[reply]

Could {{ll}} be redirected to {{l-lite}} without bad consequences? DCDuring (talk) 14:25, 7 April 2023 (UTC)[reply]

@Erutuon I think there are better solutions for the problem in that second thread (linking to language A while displaying language B), because it’s still lacking any formatting for language B, and still doesn’t provide a language link to it. It just doesn’t tag the text as being language A. That’s a really suboptimal workaround, imo.

My main concern with {{ll}} is that it has a lot of potential for being misused, which is exactly what seems to have happened; and in fact, I think the example in your thread is one of those misuses. There are almost no situations when we shouldn’t be tagging text - I can’t think of any outside of edit comments, and the only reason for that is because it’s not possible to use CSS in them. Theknightwho (talk) 20:14, 7 April 2023 (UTC)[reply]

Minor issue with the translation adder

Just thought I'd flag up a minor issue which I noticed, so that it can be attended to at some stage. Wordsmith was showing up in the redundant Chinese transliteration category, so I removed the manual transcriptions. However, as can be seen in this version, in the verb translation table "[[Category:|WORDSMITH]]" showed up after the translation. Re-adding the transliteration made the issue go away. — Sgconlaw (talk) 17:33, 6 April 2023 (UTC)[reply]

Template:ll

Template:ll has been broken by special:diff/72606927. It is now just a copy of Template:link with fewer parameters. -- Huhu9001 (talk) 04:28, 6 April 2023 (UTC)[reply]

@Huhu9001 "Broken" is really non-specific. Broken in what way? And no, it isn't just Template:link with fewer parameters, because it's also computationally less expensive. Theknightwho (talk) 04:50, 6 April 2023 (UTC)[reply]

(The causer seems to have reverted the change so it is back to normal for now.) -- Huhu9001 (talk) 06:41, 6 April 2023 (UTC)[reply]

Entries incorrectly categorized in "Category:Pages linking to anchors not found in Appendix:Glossary"

I am beginning to see entries appearing in "Category:Pages linking to anchors not found in Appendix:Glossary", even though they do in fact link to anchors in the Glossary. For example, at Interior Mexican, the use of {{alternative case form of}} causes the alternative form entry to be placed in that category, even though "letter case" is an anchor in the Glossary. — Sgconlaw (talk) 20:46, 6 April 2023 (UTC)[reply]

@Sgconlaw: Fixed with this edit. It might cause other problems. Pinging Surjection to check whether this will break whatever the previous edit was meant to fix. — Eru·tuon 01:54, 7 April 2023 (UTC)[reply]

@Erutuon: great, thanks! — Sgconlaw (talk) 04:59, 7 April 2023 (UTC)[reply]

Template:+obj

This has been experimental for nearly a decade now - perhaps it's time to finish it and write proper documentation? — SURJECTION ^{/ T / C / L /} 19:44, 7 April 2023 (UTC)[reply]

To be honest, this doesn't look that good to begin with - it cannot handle multiple specifications without having multiple templates, which results in a [...] [...]. Maybe that's intended? — SURJECTION ^{/ T / C / L /} 20:10, 7 April 2023 (UTC)[reply]

@Surjection Hi. A year or so ago I totally rewrote this template and added support for multiple specifications and all sorts of other things, but I haven't gotten around to finishing it and converting everything. See User:Benwing2/test-obj. The main issue is I'm not completely happy with the appearance. If you have any thoughts on this, please let me know. Maybe I should just push this live as it's a huge improvement on what we have (and subsumes {{+preo}} and {{+posto}}). Benwing2 (talk) 21:02, 7 April 2023 (UTC)[reply]

I think the bolding and underlining makes it look too busy. Maybe remove underlining entirely and replace bolding with the standard mention formatting? — SURJECTION ^{/ T / C / L /} 08:37, 8 April 2023 (UTC)[reply]

It'd also be nice to have some kind of support for indicating the government of a subject, i.e. that a subject has to be in a specific case. Perhaps that'd be better as a separate template ("+subj"?) — SURJECTION ^{/ T / C / L /} 08:53, 8 April 2023 (UTC)[reply]

@Surjection You are referring to quirky subjects, known especially from Icelandic? Probably we should use the same template unless there are significant conceptual differences from object and preposition governance. As for the busyness, yes, I generally agree. I think I made it that way to make it clearer where the difference lies between prepositions and cases but maybe it can be improved to make it slightly wordier and less formatting-heavy, e.g. something like with auf + accusative, although maybe you'd still want to bold the prepositions. Benwing2 (talk) 04:19, 9 April 2023 (UTC)[reply]

Yes, that is what I am referring to. — SURJECTION ^{/ T / C / L /} 07:11, 9 April 2023 (UTC)[reply]

If you want to test whether your new +obj can handle really complicated cases, tuntua #3 has got you covered. — SURJECTION ^{/ T / C / L /} 17:57, 9 April 2023 (UTC)[reply]

First addition to Wiktionary caught in spam filter

Well, the barrier to entry around here seems high, indeed...

I'll keep it brief: I wanted to add the following to the Quotations page of HIJMS

2019 March 4, Soumyajit Dasgupta, What are Ship Prefixes for Naval and Merchant Vessels?:
It is not a compulsory rule that ship prefixes are to be attached to every vessel, and the Imperial Japanese Navy and the Kriegsmarine from Third Reich are good examples. Thus, the ship-naming process is not followed universally. Few English writers promote ship prefixes like the “IJN”, which stands for “Imperial Japanese Navy” fleet, “HIJMS”, referring to “His Imperial Japanese Majesty’s Ship”, and the “DKM” signifying “Deutsche Kriegsmarine” vessels. Interestingly, the names result from logical coherence and agreement with abbreviations like “HMS” or “USS”. Most of the writers follow the basic norms prescribed by the Navy and leave out the ship prefixes.

This was identified as "various specific spammer habits," and I was directed here. Help? PhotogenicScientist (talk) 21:46, 7 April 2023 (UTC)[reply]

My initial guess would be that it was because you're a new editor adding a link, although KMSheldon said at #Citation_flagged that removing the link didn't help. FWIW, having looked through all the edits the filter caught this month, they were all vandalism / linkspam apart from KMSheldon's and your legitimate attempts to add cites, Special:Contributions/85.193.248.142's attempt to ask if OP meant overpowering in a certain Youtube video he linked, and Special:Contributions/8.219.131.141's attempt to add 八美肉. Perhaps someone else can look at whether it's over-broadly catching some keyword. - -sche (discuss) 23:00, 7 April 2023 (UTC)[reply]

@-sche I removed the link from my citation, and the filter let it through (though without a url, the template broke; so I added "n/a" instead).

Coming from Wikipedia as I have, I'm not sure if there was anything wrong with the url I was trying to add. The source seemed reliable enough. And I would think that having url links to support quotations/citations would be desirable. IMO the spam filter is being overly aggressive. But that's just my 2 cents. PhotogenicScientist (talk) 15:41, 11 April 2023 (UTC)[reply]

Interlinear glossing template

I designed Template:User:Surjection/interlinear as an experiment - the documentation and module code still would need a lot of cleanup, but it has potential. The question is whether we should have a template like this and how its use should be regulated. I certainly don't think every usage example should be glossed like this, but there could be cases where it would be useful. — SURJECTION ^{/ T / C / L /} 08:49, 8 April 2023 (UTC)[reply]

Observation: sometimes (typically for LDLs where the best sources are linguistics works), we quote books which themselves already have this type of interlinear gloss. A template like this, whether or not this exact template, might be able to help format such quotes. :) At the same time, we might need some way of distinguishing when interlinear glosses were present in the original and when they've been added by Wiktionary. - -sche (discuss) 18:44, 8 April 2023 (UTC)[reply]

I think we could distinguish this in the same way as we do with quotes and usage examples. — SURJECTION ^{/ T / C / L /} 07:43, 9 April 2023 (UTC)[reply]

@Surjection Looks good, thanks for writing it! One thing you might want to add support for is a word-for-word translation between the second and third lines, so that e.g. in your example, the extra line would say something like 'I go to-(the)-store'. In languages that have lots of morphology and a word order that's significantly different from English, it can be difficult with just the raw glosses to sort out what's going on. This word-for-word translation is approximate and may miss some of the morphology (e.g. if the language has evidential markers attached to the verb, it may be difficult to express that succinctly in this format), but it can be a good explanatory tool and I've seen it used in some interlinear glosses in linguistics papers. Benwing2 (talk) 07:45, 9 April 2023 (UTC)[reply]

Seems straightforward enough to add. — SURJECTION ^{/ T / C / L /} 07:48, 9 April 2023 (UTC)[reply]

@Surjection, Benwing2: I like the idea, but one technical issue I have is with word boundaries. How would you handle other differences in word boundaries, e.g. French "A-t-il de l'argent?" glossed as "Has he got any money?"? How would you handle

(a) Languages where spaces are more like clause boundaries than word boundaries, e.g. Thai?

(b) Enclitics, and languages where sandhi frequently deletes word-boundaries, e.g. the quotation data[142].bhummanam at Module:RQ:pi:Singthon, laid out as a quotation at สัททะ?

The latter example has potential issues with the explanations at the translation. --RichardW57m (talk) 16:33, 18 April 2023 (UTC)[reply]

If I'm not understanding wrong, isn't this exactly what the brace syntax described in the documentation is meant to solve? — SURJECTION ^{/ T / C / L /} 16:52, 18 April 2023 (UTC)[reply]

I suppose one could gloss the French example as {{interlinear|fr|A-t-il de l'argent|have-PRES.3s-he GEN {the money}|Has he got any money?}} yielding:

A-t-il	de	l'argent
have-PRES.3s-he	GEN	the money
"Has he got any money?"

I had hoped to use 'some' as a gloss of de l'.

I don't see how brace syntax can help with {{interlinear|th|ใช้อะไรครับ|use what POLITE|What do I use?}}, which currently yields

ใช้อะไรครับ
use	what	POLITE
"What do I use?"

I get the same result if I replace '' with '' or the ZWSP character itself. Possibly the answer is to use a spacing tie character to render non-spacing word breaks. Something is broken in 'automatic' transliteration for Thai - not even {{interlinear|th|{ใช้} {อะไร} {ครับ}|use what POLITE|What do I use?}} works for the middle word, but currently yields

ใช้	อะไร	ครับ
chái		kráp
use	what	POLITE
"What do I use?"

with the transliteration of the middle word blank. The transliteration issue looks like clean-up.

For the Pali exam question, an approximate answer, which exploits the brace syntax, is

{{interlinear|pi|ภุมมานัง เทวานัง สัททัง สุตวา // จาตุมมะหาราชิกา เทวา สัททะมะนุสสาเวสุง ฯ|terrestrial-GEN.PL god-GEN.PL word-ACC.S hear-ABS , {in the service of the Four Heavenly Kings-NOM.PL} god-NOM.PL {word-ACC.S proclaim-AOR.3PL} …|Having heard the word of the terrestrial gods, the gods in the service of the Four Heavenly Kings proclaimed the word …|tr=+ + + sutvā + + + + etc.}}

, which yields:

ภุมมานัง	เทวานัง	สัททัง	สุตวา	//	จาตุมมะหาราชิกา	เทวา	สัททะมะนุสสาเวสุง	ฯ
bhummānaṃ	devānaṃ	saddaṃ	sutvā	//	cātummahārājikā	devā	saddamanussāvesuṃ	etc.
terrestrial-GEN.PL	god-GEN.PL	word-ACC.S	hear-ABS	,	in the service of the Four Heavenly Kings-NOM.PL	god-NOM.PL	word-ACC.S proclaim-AOR.3PL	…
"Having heard the word of the terrestrial gods, the gods in the service of the Four Heavenly Kings proclaimed the word …"

It may be desirable to include a parameter for footers, such as an explanation of the ellipsis, in this case "The ellipsis refers to the text of the proclamation.".

The design loses marks for not resolving saddamanussāvesuṃ into two words (which seems sometimes soluble for some Pali writing systems). --RichardW57m (talk) 10:43, 19 April 2023 (UTC)[reply]
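The brace grouping used in the examples above can be sketched as a small tokenizer. This is a minimal illustration in Python, assuming only what the examples show (spaces separate columns, braces keep a multi-word gloss in one column); the function name is hypothetical and this is not the template's actual module code.

```python
def split_gloss_line(line: str) -> list[str]:
    """Split a gloss line into columns on whitespace, except inside braces.

    Braces are dropped from the output; nested braces are tolerated but
    not given any special meaning in this sketch.
    """
    tokens: list[str] = []
    current: list[str] = []
    depth = 0
    for ch in line:
        if ch == "{":
            depth += 1
        elif ch == "}":
            depth = max(depth - 1, 0)
        elif ch.isspace() and depth == 0:
            # A space outside braces ends the current column.
            if current:
                tokens.append("".join(current))
                current = []
        else:
            current.append(ch)
    if current:
        tokens.append("".join(current))
    return tokens


if __name__ == "__main__":
    print(split_gloss_line("A-t-il de l'argent"))
    print(split_gloss_line("have-PRES.3s-he GEN {the money}"))
    # Unspaced Thai text comes out as a single column, so it cannot be
    # aligned with a three-column gloss without explicit grouping.
    print(split_gloss_line("ใช้อะไรครับ"))
```

Under this reading, the French text and its gloss both yield three columns, while the unspaced Thai text yields one column against three gloss columns, which matches the misalignment shown above.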

It would appear that much Thai transliteration does not use Wiktionary's transliteration interface - we may get some discussion at topic Thai Transliteration. --RichardW57m (talk) 14:58, 19 April 2023 (UTC)[reply]

@Surjection: Are there any plans to handle row-breaking? I haven't a clue how one would do it. At present, long text, which need not be very long on a mobile phone, requires horizontal scrolling. --RichardW57m (talk) 14:26, 21 April 2023 (UTC)[reply]

The Wikipedia version manages to wrap rows - example currently at w:User:RichardW57/sandbox. --RichardW57m (talk) 14:49, 21 April 2023 (UTC)[reply]

Not right now, maybe eventually. — SURJECTION ^{/ T / C / L /} 18:44, 21 April 2023 (UTC)[reply]

I would note that Wikipedia already has a feature-rich interlinear glossing template: w:Template:Interlinear in case there is a desire not to reinvent the wheel. This, that and the other (talk) 08:37, 12 April 2023 (UTC)[reply]

Alphabetization of Ü in a Category List

In Category:en:Places_in_China after Z, there is a separate section for Ü containing the lone word Ü-Tsang. In that same category (Category:en:Places_in_China), at U, Ürümqi appears in the list of U words (and not under Ü). Can someone explain what's happening here, technically speaking? Why is Ü-Tsang not among the U words? And why is Ürümqi not a Ü word? Or is the status quo actually correct somehow? Also, if you can fix this, please do. Thanks! --Geographyinitiative (talk) 19:25, 8 April 2023 (UTC)[reply]

I suspect it's because Ürümqi uses {{place}}, which must strip the diacritic, whereas in Ü-Tsang {{place}} is not used and the category is applied manually. I think writing [[Category:en:Places in China|U-Tsang]] would cause Ü-Tsang to also be sorted under U. - -sche (discuss) 19:54, 8 April 2023 (UTC)[reply]

@-sche To be more specific, English sortkeys strip the diacritic. It's language-dependent, of course. Theknightwho (talk) 20:16, 8 April 2023 (UTC)[reply]

@Geographyinitiative: in English Ürümqi, the category is added by a template ({{place}}), with the same code that creates the category also creating a sort key to go with it. In English Ü-Tsang, the category was manually coded in the wikitext as [[Category:en:Places in China]], with no sort key. If you just go by the character codes in Unicode, Ü comes after all the basic Latin letters without diacritics, which is why it alphabetizes the way it does. There are two ways to fix this: you could either add a sort key to the manually-coded category: [[Category:en:Places in China|U-Tsang]], or convert it to a template and let the template do it for you: {{C|en|Ürümqi}} would be the simplest, but {{place}} does other nice things if you know how to use it. For most uses, templates are better than hard-coded categories, though there are exceptions like huge pages that run out of memory if you use too many module-based templates. Chuck Entz (talk) 20:09, 8 April 2023 (UTC)[reply]
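The sorting behaviour described above can be reproduced with a short Python sketch. It assumes, for illustration only, that the English sort key is made by simply stripping combining marks; the real sort key logic lives in Wiktionary's Lua modules and may do more than this. The function name and the sample list are hypothetical.

```python
import unicodedata


def strip_diacritics(title: str) -> str:
    """Decompose the title and drop combining marks, so that Ü becomes U."""
    decomposed = unicodedata.normalize("NFD", title)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


# Normalise to precomposed form so the comparison below really is on
# U+00DC (Ü) rather than on "U" plus a combining diaeresis.
titles = [
    unicodedata.normalize("NFC", t)
    for t in ["Zhuhai", "Ürümqi", "Ü-Tsang", "Shanghai"]
]

# No sort key: plain code-point order, where Ü (U+00DC) follows Z (U+005A).
by_code_point = sorted(titles)

# With a diacritic-stripped sort key, both Ü titles file under U.
by_sort_key = sorted(titles, key=strip_diacritics)

if __name__ == "__main__":
    print(by_code_point)
    print(by_sort_key)
```

The first ordering puts both Ü titles after Zhuhai; the second places them between Shanghai and Zhuhai, which is the effect of adding a sort key such as U-Tsang.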

Done Thanks! --Geographyinitiative (talk) 20:15, 8 April 2023 (UTC)[reply]

Old Norse descendant trees broken after Westrobothnian was commented out

If you look at Proto-Germanic *augô and then the Old Norse desctree, you will see that it ends at Elfdalian. Yet if you go to Old Norse auga, there are many more descendants. The cause of this is that the Westrobothnian line has been commented out, and so the desctree template stops parsing any descendants after it. This must be fixed. ᛙᛆᚱᛐᛁᚿᛌᛆᛌ ᛭ Proto-Norsing ᛭ Ask me anything 12:20, 9 April 2023 (UTC)[reply]

I fixed this by removing a newline. Vox Sciurorum (talk) 12:36, 9 April 2023 (UTC)[reply]

@Mårtensås @Vox Sciurorum I think I've fixed the underlying issue here. Comments like that are now just treated as though they don't exist from the very start.

We were also experiencing a (much more serious) bug where the module wasn't actually parsing the HTML comments properly, which was causing Westrobothnian templates not to be commented out in the tree (which resulted in errors). This change has also fixed that. Theknightwho (talk) 15:17, 9 April 2023 (UTC)[reply]
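The general idea of treating comments "as though they don't exist from the very start" can be sketched in Python as below. This is not the actual module code: the naive parser is only a guess at how a parser might stop early, and the function names and the placeholder terms (term1, term2, term3) are hypothetical.

```python
import re

# Non-greedy match so that each comment ends at its own closing marker;
# DOTALL lets a comment span several lines.
COMMENT_RE = re.compile(r"<!--.*?-->", re.DOTALL)


def strip_html_comments(wikitext: str) -> str:
    """Remove HTML comments before any other parsing happens."""
    return COMMENT_RE.sub("", wikitext)


def naive_descendant_lines(wikitext: str) -> list[str]:
    """A parser that stops at the first line that is not a list item."""
    result = []
    for line in wikitext.splitlines():
        if not line.startswith("*"):
            break
        result.append(line)
    return result


def descendant_lines(wikitext: str) -> list[str]:
    """Strip comments first, then collect every list item."""
    cleaned = strip_html_comments(wikitext)
    return [line for line in cleaned.splitlines() if line.startswith("*")]


sample = (
    "* {{desc|ovd|term1}}\n"
    "<!--* {{desc|gmq-bot|term2}}\n"
    "-->\n"
    "* {{desc|sv|term3}}\n"
)

if __name__ == "__main__":
    print(naive_descendant_lines(sample))  # stops at the commented-out line
    print(descendant_lines(sample))        # keeps the entry after the comment
```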

hiding entry maintenance categories

Shouldn't __HIDDENCAT__ be added to Category:English entry maintenance, its sisters in other languages, and their subcategories? The categories seem to be of negligible use to anyone but editors. I do see that they were hidden at one person's instigation and then unhidden at another's, but wonder what the general consensus is. (Pinging the two people.)—msh210℠ (talk) 12:44, 9 April 2023 (UTC)[reply]

I can't offer any more. The categories I complained about being hidden are no longer hidden. DonnanZ (talk) 13:16, 9 April 2023 (UTC)[reply]

I can't really add anything beyond what is noted on the thread that has been linked to above. Some of the categories labelled as 'maintenance' may also be used for other purposes and that has implications for decisions around visibility. John Cross (talk) 16:25, 9 April 2023 (UTC)[reply]

Word vectors or embeddings

Have there been any attempts to combine Wiktionary with new NLP (natural language processing) techniques such as word vectors or embeddings? I've found a page meta:Research:NLP Tools for Wikimedia Content, but it was written once and nobody added to it. And I found nothing here. --LA2 (talk) 17:17, 9 April 2023 (UTC)[reply]

@LA2 I actually work in this area and I've long thought about the possibility of using neural NLP techniques with Wiktionary. The basic problem is that they're only approximate and make mistakes. The recent work on large language models (where "large" means "exceptionally huge", i.e. billions upon billions of parameters) has made things significantly better but there's still a long tail of cases where mistakes are made, and they still aren't very good at knowing when their outputs are likely to be wrong. This is quite problematic for a dictionary, where it is precisely the less common terms that people often look up. Possibly Wiktionary could have "auxiliary" content that is NLP-generated and marked and such, but I'd be concerned that unless done very carefully and thoughtfully it would end up being a waste of energy and resources; we can never compete in this area with the likes of Google and Microsoft. Maybe a better approach would be to have NLP-assisted content editing, where somehow or other you could use deep learning to identify likely mistakes in content before it gets saved, which would especially benefit novice editors. In this case, however, I'd be concerned that machine learning is overkill, as many of the problems (e.g. mistaken template uses) can be identified more simply and cheaply without the aid of machine learning at all. But I'm totally welcome to suggestions of where and how we can use machine learning in Wiktionary. Benwing2 (talk) 22:09, 9 April 2023 (UTC)[reply]

Wiktionary does have Appendix:Roget's thesaurus classification as an appendix (and sv:Appendix:Bring in Swedish). This classic thesaurus gives near-synonyms under 1000 different classes. And perhaps some good existing language model with embeddings could offer something similar, with a new perspective. I don't suggest we generate 1000 new appendix pages. But it would be interesting to discuss what could be done. --LA2 (talk) 22:59, 9 April 2023 (UTC)[reply]

Suggesting synonyms could be a good use case (or even suggesting translations). It's a shame that right now it's mostly the big players who benefit from the work done here. – Jberkel 20:16, 15 May 2023 (UTC)[reply]
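As a rough illustration of what "suggesting synonyms" from embeddings would involve, here is a minimal nearest-neighbour sketch in Python using cosine similarity. The three-dimensional vectors are invented toy values, not output from any real model, and the function names are hypothetical; a real system would load vectors from a trained model and would still need human review, for the reasons given above.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two vectors of equal length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def nearest(word: str, vectors: dict[str, list[float]], k: int = 2) -> list[str]:
    """Return the k words whose vectors are most similar to the given word's."""
    target = vectors[word]
    scored = [
        (cosine_similarity(target, vec), other)
        for other, vec in vectors.items()
        if other != word
    ]
    scored.sort(reverse=True)
    return [other for _, other in scored[:k]]


# Toy vectors, made up purely for this example.
toy_vectors = {
    "big": [0.9, 0.1, 0.0],
    "large": [0.8, 0.2, 0.1],
    "huge": [0.95, 0.05, 0.0],
    "tiny": [-0.8, 0.1, 0.2],
}

if __name__ == "__main__":
    print(nearest("big", toy_vectors))
```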

`{{col-auto}}`

I think {{q}} is keeping this from linking at e.g. Samotracia. Ultimateria (talk) 18:57, 9 April 2023 (UTC)[reply]

@Ultimateria: I replaced {{q}} with <q:...>. It puts it in a different position (before, not after), but it's probably the only way that's supported. — Eru·tuon 19:21, 9 April 2023 (UTC)[reply]

Thanks! I wonder how many instances of this are out there though. Ultimateria (talk) 19:30, 9 April 2023 (UTC)[reply]

@Ultimateria, Erutuon Module:columns has a check to see if the text "<span" occurs in a link, and outputs the link raw if so. This is to avoid double-linking in the frequent case where {{l}} is used in an argument to {{col}} or similar. This has long been there, but I think the issue with Samotracia is recent due to a bot run by User:JeffDoozan. We need to do another bot run to convert cases where nested link and qualifier templates occur inside of {{col}} etc. to use inline modifiers instead. Note that just yesterday I added support for <qq:...>, which is like <q:...> but places the qualifier to the right of the link instead of the left. Lots of templates (e.g. {{syn}}, {{alt}}, several others) now support both <q:...> and <qq:...> inline modifiers. Benwing2 (talk) 21:56, 9 April 2023 (UTC)[reply]
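The two behaviours described above (passing an item through raw when it already contains "<span", and reading <q:...> and <qq:...> inline modifiers) can be sketched like this. It is a simplified Python illustration, not the Lua in Module:columns; the function name and the plain-text rendering of qualifiers in parentheses are assumptions.

```python
import re

# Matches <q:...> and <qq:...>; group 1 is the modifier name, group 2 its value.
MODIFIER_RE = re.compile(r"<(qq?):([^<>]*)>")


def format_item(item: str) -> str:
    """Format one column item as a link, honouring inline qualifiers."""
    # An item that already contains "<span" was produced by a nested link
    # template, so it is output raw to avoid double-linking.
    if "<span" in item:
        return item

    left = None   # qualifier shown before the link (<q:...>)
    right = None  # qualifier shown after the link (<qq:...>)

    def grab(match: re.Match) -> str:
        nonlocal left, right
        if match.group(1) == "q":
            left = match.group(2)
        else:
            right = match.group(2)
        return ""

    term = MODIFIER_RE.sub(grab, item).strip()
    link = f"[[{term}]]"
    if left:
        link = f"({left}) {link}"
    if right:
        link = f"{link} ({right})"
    return link


if __name__ == "__main__":
    print(format_item("term<q:rare>"))
    print(format_item("term<qq:rare>"))
    print(format_item('<span class="Latn" lang="es">[[term]]</span>'))
```

The last call shows why a nested qualifier template can suppress linking: once the item contains "<span", the whole item is passed through untouched.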

There shouldn't be too many entries like Samotracia and they're probably limited to just Spanish. At the start of the bot run, I didn't realize that Module:columns was checking for "<span" tags and not generating the appropriate links, and I fixed that before running it more widely. I didn't know until just now that {{col-auto}} supported <q:...> and <qq:...> (thanks, Ben!), so I'll do another bot run this week to fix any lingering entries like Samotracia and also convert entries formatted as {{l|entry}} {{q|qualifier}} (my workaround for entries like Samotracia) to be just entry<q:qualifier>. JeffDoozan (talk) 22:20, 9 April 2023 (UTC)[reply]

sloppy templating

I recently fiddled with some templates, as a result of which there's a load of non-English crap at Category:English archaic spellings. Sorry about the mess, and have a nice day. It is probably (talk) 19:42, 9 April 2023 (UTC)[reply]

unprotect request

mod:Jpan-sortkey needs to be unprotected so that changes can be made to correctly handle: 1. Small kana. 2. Incorrect sortkeys resulting from bad entry format. 3. Category sorting vs. non-category sorting. See Wiktionary:Beer_parlour/2023/April#Trouble. -- Huhu9001 (talk) 02:47, 11 April 2023 (UTC)[reply]

No. You kept edit-warring to impose what you wanted, despite having no consensus to do so. Theknightwho (talk) 05:45, 11 April 2023 (UTC)[reply]

@Wpi31 See this. -- Huhu9001 (talk) 05:49, 11 April 2023 (UTC)[reply]

Override for Korean hyphen is required.

Normally, hyphens are removed from the Korean text even when they are part of the headword (suffixes). In some cases, though, "-" is required in the Korean text and should not be suppressed. Example: 오스트리아헝가리 (Oseuteuria-heonggari). It is written in Korean with a hyphen, 오스트리아-헝가리 (currently it can't be linked with a template, since the hyphen is automatically removed).

Please consider this also, if hyphens are used similarly for other languages, e.g. diff. They can be used in Persian, etc., e.g. (otriš-majârestân).

I raised this first in Module_talk:ko-translit#Is_suppressing_hyphens_a_good_idea_in_100%_of_cases?. Anatoli T. (обсудить/вклад) 06:12, 11 April 2023 (UTC)[reply]

Graphical display on the right of prefix/infix/suffix slots

Would it be worth having a visual representation of which slot an affix occupies off to the right in the entries of some languages? It could be particularly useful for languages where there are two or more infix slots, which can't be flipped around ... which is the case in most languages with more than one slot. Some languages that could use this would be Swahili, Georgian, and Inuktitut. I don't know much about those languages. Another example possibly would be Turkish, where there are some suffixes that must always be padded by other suffixes, and therefore occupy distinct slots.

This would occupy the right part of the screen, taking up space to balance the content-heavy left side, but would be small enough that people would see it as a useful visual aid and not a distraction. Thoughts? I'm leaving this idea wide open. —Soap— 09:13, 11 April 2023 (UTC)[reply]

It sounds interesting and possibly quite useful. Is there some standard way linguists do this? Could someone do a mockup of how this would look for a language that would greatly benefit from this? DCDuring (talk) 17:36, 11 April 2023 (UTC)[reply]

My thoughts match DCDuring's: this sounds like it could be useful, but a mockup would be helpful. Could crib formatting from {{examples}}, perhaps (as far as being a box floating on the right). - -sche (discuss) 15:43, 13 April 2023 (UTC)[reply]

I may be slow in getting back to this. All I can add now is ... what got me thinking of this is the March Beer Parlour thread To hyphen or not to hyphen, where I realized that it would be useful to readers if infixes and other affixes were labeled with their assigned slots, in those languages that have them. Abkhaz is one example of a language where the slots can all be assigned specific numbers, but there are many others. By contrast, even in some languages with many slots in their word structure, some elements can move around, so assigning a number to a morpheme would be complicated. That's why I'm going to take a long time with this. But I haven't forgotten. —Soap— 13:12, 2 May 2023 (UTC)[reply]

My own CSS Page

My edit was disallowed. All I want is a dark monobook theme Thespaceface (talk) 16:33, 11 April 2023 (UTC)[reply]

@Surjection, Fytcha, Chuck Entz I'm guessing filter 32 should allow users to edit their own .css and .js? And (independently of where the edit is made), if it's currently flagging links to WMF wikis, it could refrain from that. See also two other legitimate edits this has stopped, at #Citation_flagged and #First_addition_to_Wiktionary_caught_in_spam_filter, although on the whole almost all of what it stops is indeed spam that it was good to stop. - -sche (discuss) 17:54, 11 April 2023 (UTC)[reply]

I think I know what the common problem is with all of these, but it's difficult to solve. I added a note to the header shown when creating new citation pages to check how to correctly format pages, which should hopefully reduce the probability that the issue reoccurs. Exempting .css and .js pages is quite tricky (considerably more so than one might first assume). — SURJECTION ^{/ T / C / L /} 18:33, 11 April 2023 (UTC)[reply]

I think it's as simple as excluding the CSS and JavaScript content models from the filter. These pages can only affect the user making the edit (unless someone has imported another person's personal CSS or JavaScript) so it should not be necessary to protect them from spammers. I've made this change. — Eru·tuon 22:28, 11 April 2023 (UTC)[reply]

Unstated bad interaction between Module Links and a client?

@Theknightwho recently made a change to Module:pi-decl/noun (diff) with the obscure comment 'Bugfix'. What was the problem? Was it that function full_link in Module:links makes undocumented changes to the table pointed to by its first argument? The effect of the change is that the table is no longer reused between calls, resulting in a small, possibly trifling, increase in garbage generation. Do we need to document possible utter destruction of the table? Conceivably extra fields get added to it. --RichardW57m (talk) 13:15, 12 April 2023 (UTC)[reply]

@RichardW57m Could you please explain what the problem is here? Theknightwho (talk) 14:57, 12 April 2023 (UTC)[reply]

@Theknightwho: I'm trying to work out why my code was wrong - I don't know that it was wrong, but I think it likely that making the change cured a problem you had seen. If there is a problem with the service module, then a warning sign should be erected for the benefit of future encoders, so they don't blunder into the same trap. As it happens, I am slowly putting some enhancements into a sandbox version of the Pali module, so I also need to make sure that I don't thereby overlook any fixes. --RichardW57m (talk) 15:55, 12 April 2023 (UTC)[reply]

Having dug into things myself, it looks from this change to Module:links by @Theknightwho, that the bug was in Module:links (arguably in its comments), the change to Module:pi-decl/noun is now redundant, and the pitfall has been removed. No further action seems required. --RichardW57m (talk) 10:29, 13 April 2023 (UTC)[reply]

@RichardW57m You shouldn’t ever have relied on the input object being unmodified. Just because I’ve put a protection in against that doesn’t mean it wasn’t a bug in your code. Theknightwho (talk) 10:45, 13 April 2023 (UTC)[reply]

@Theknightwho, Benwing2: Not putting up a warning notice stinks to me of being a documentation bug, and since documentation was moved to the code, a bug in the comments. Note that a bold warning is put up for similar behaviour in Module:headword. It's also good practice to put up a warning when making changes that will not be part of the purpose of the call. --RichardW57m (talk) 15:30, 13 April 2023 (UTC)[reply]

Is the function still free to go changing its arguments? If it is, then we still need that warning. --RichardW57m (talk) 15:30, 13 April 2023 (UTC)[reply]

If so, in the opinion of @Theknightwho, should I assume that the script and language object designated by data.sc and data.lang, as opposed to their designations, may also be destructively changed? (It would be safe for them to be used to gather statistics or hold caches.) --RichardW57m (talk) 15:30, 13 April 2023 (UTC)[reply]

Am I correct in presuming that it is better to abandon the table designated by data as garbage than to clean it out for re-use? --RichardW57m (talk) 15:30, 13 April 2023 (UTC)[reply]
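The pitfall discussed in this thread can be shown outside Lua. The following is a minimal Python sketch, not the Module:links code: `full_link_like` is a hypothetical stand-in for a formatter that writes extra fields into the table it is given, and the field names are assumptions made for illustration.

```python
def full_link_like(data):
    """Hypothetical formatter that mutates the table passed to it."""
    # Fill in a default transliteration on the caller's own table.
    data.setdefault("tr", "auto:" + data["term"])
    return "{} ({})".format(data["term"], data["tr"])


def format_forms_reusing(forms):
    """Reuses one table between calls, so stale fields leak across forms."""
    data = {}
    out = []
    for form in forms:
        data["term"] = form
        out.append(full_link_like(data))
    return out


def format_forms_fresh(forms):
    """Builds a new table per call, at the cost of a little extra garbage."""
    return [full_link_like({"term": form}) for form in forms]
```

With the reused table, the second form inherits the transliteration that was filled in for the first one. The fresh-table version avoids this, which is the trade-off raised in the last question above.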

Interwiki links are broken?

акка́унт (ru) (akkáunt) is trying to link ru:акка́унт (with the accent), not ru:аккаунт. Anatoli T. (обсудить/вклад) 06:31, 13 April 2023 (UTC)[reply]

Better way to handle descendants

So I'm thinking we should make a desc-col template which can automatically generate columns for descendants by number of families, taking parameters for families, so something like {desc-col|branch1=FOO|branch2=BAR} etc, and depending on the number of branches, it generates a column for each, and we can orphan all the other templates used in these sections. Thoughts? Vininn126 (talk) 10:25, 13 April 2023 (UTC)[reply]

Yuck!

A complex illustration, e.g. of a PIE or Proto-Austronesian word, would help. A sketch of the template call would also help. You seem to be proposing one column per branch, but branching is fractal, and horizontal scrolling is also unpleasant for the reader. Additionally, 'descendants' includes descendants by borrowing, which can be back and forth. For example, English gyaru descends from English girl via Japanese. How would you handle that? --RichardW57m (talk) 12:07, 13 April 2023 (UTC)[reply]

This would mostly be for pages with many, many descendants, i.e. Reconstruction:Proto-Slavic/osoba, not something like ギャル with only one. Currently there are many, many templates for different shapes and such, i.e. {{hrow}}, {{zcol}}, all of which seem unnecessary to me. Vininn126 (talk) 12:27, 13 April 2023 (UTC)[reply]

You should have explained what you meant by 'these sections'. And girl has three descendants - two in English, one in Japanese. So you're only talking about consolidating the rake formats where the word has only a few immediate descendants and no long chains of descendants, and leaving the more traditional text tree layout as is. --RichardW57m (talk) 14:08, 13 April 2023 (UTC)[reply]

Interesting idea – I reckon this would be helpful in unifying the layout and also the phylogeny (ordering of subgroups, names, etc.) of descendants sections, especially those of reconstruction entries. Perhaps we could have a general purpose template that takes in all the arguments and does all the work including formatting and indenting etc., though some care should be taken of the interactions between this template and {{desctree}}. Also, given the complexity of some family trees (notably, the deep subbranching into the Oceanic branch of Austronesian), and the issues of borrowing (e.g. for Sino-Tibetan we have aberrations like {{sit-loan}} and {{Sinoxenic}}), I think there would inevitably be family-specific templates that need to modify the base template. – Wpi31 (talk) 14:07, 13 April 2023 (UTC)[reply]

Page creation disallowed, "recurring vandalism"

Whilst creating & populating the page Wiktionary:Frequency_lists/Spanish/Mixed_730K/170001-180000, I've been disallowed due to activating the abuse rule "recurring vandalism". Much like all the other dozens (perhaps hundreds) of pages I've created in a similar vein up to now without trouble, it will contain approximately 10,000 wikilinked terms. Most of the time this is smooth sailing (see Wiktionary:Frequency_lists/Spanish/Mixed_730K); I've only had one similar (but different) issue come up before, which I described here. Hopefully someone can explain to me what I am doing wrong—besides making such large single edits—and how—indeed whether—I can proceed without causing further issues. Helrasincke (talk) 05:12, 14 April 2023 (UTC)[reply]

In this case, I think the issue is not the size of the page per se, but that one or more of those 10,000 terms happens to be one spammers overuse and which the filter therefore blocks. @Helrasincke I have temporarily deactivated the filter for a little while so you can make your edits; the filter has only stopped one other edit since February so I don't think we'll get a flood of spam. Longer term, @ other people, should we whitelist Helrasincke? - -sche (discuss) 05:44, 14 April 2023 (UTC)[reply]

Navajo tables look bad with Firefox (desktop and mobile)

Entries like -KAL and -YOOD that use {{nv-theme-header}} and {{nv-theme-row}} to build tables display an 8 column table, with the final 4 columns having no header and just empty cells. JeffDoozan (talk) 17:22, 14 April 2023 (UTC)[reply]

Links to reconstruction entries in etymologies

The links to reconstruction entries in certain etymology sections in lo (Lashi, Middle Dutch, Old French) are broken (displaying a line break and a bullet point instead of an asterisk). (All three sections use {{inh-lite}}.) Einstein2 (talk) 19:17, 14 April 2023 (UTC)[reply]

It's treating the asterisk like wiki-formatting, like the asterisk at the start of this sentence, as if something has caused it to consider whatever is supplied to inh-lite to be a new line of content... but the problem seems to arise only if a large page is viewed all at once; when I preview the contents of the Lashi section by itself or copy just that section over to the sandbox, it displays *laj as expected, but when viewing the whole page at lo or in the sandbox, the problem occurs. - -sche (discuss) 03:26, 15 April 2023 (UTC)[reply]
This only happens when the alt text field contains an initial asterisk, because that field doesn’t go through any special processing. I’ll add something to account for this. Theknightwho (talk) 03:37, 15 April 2023 (UTC)[reply]
This is fixed. Theknightwho (talk) 03:53, 15 April 2023 (UTC)[reply]
Now there's a different problem: the etymologies of the other sections display things like "From Middle English [[lo#Middle English|]], loo, from Old English lā (“exclamation of surprise, grief, or joy”). Conflated in Middle English with lo! (interjection), a corruption of lok!, loke! (“look!”) (as in lo we! (look we!)). Cognate with Scots [[lo#Scots|]], [[lu#Scots|]] (“lo”). See also look." (PS, any idea why the asterisk problem would only arise on large pages and not when the same etymology was present on a small page?) - -sche (discuss) 04:26, 15 April 2023 (UTC)[reply]
That was an error from when I condensed the code, so I’ve rolled that back as it wasn’t important and I can’t be bothered to identify what the problem was. The original fix is still working.

Are you sure those smaller pages were using the lite templates? If so, it might have something to do with when * can and can’t act as a list marker, and it’s just coincidental that the smaller pages you looked at weren’t affected. I suspect there’s a newline in there which is sometimes getting transcluded, which normally gets stripped, but if placed before the asterisk would start a list. In any event, it doesn’t matter now, as the new fix makes it impossible for that to happen. Theknightwho (talk) 14:05, 15 April 2023 (UTC)[reply]
At the time I posted, I noticed (as I said) that when I previewed just the Lashi section alone, it displayed fine, and when I copied the contents of just the Lashi section over to the sandbox and posted them, they displayed fine, but if I copied the whole page lo over, the Lashi section was broken again, like it was on the lo page; i.e. the same exact L2 section was functioning correctly or incorrectly based on whether there were other L2 sections around it. lo displays correctly now, but it might be worth looking at whether some template in a language section higher up on lo is opening and not closing a <li> or something (I don't know what) that would be the cause of * being parsed differently. - -sche (discuss) 15:52, 15 April 2023 (UTC)[reply]
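The general idea of keeping a leading asterisk from being read as a list marker can be sketched as follows. This is Python, not the template or module code, and it is only one possible approach: the thread does not say what the actual fix was. The function name is hypothetical; the sketch replaces a leading `*` with its HTML character reference `&#42;`, which displays as an asterisk but cannot start a list.

```python
def escape_leading_asterisk(alt_text):
    """Return alt_text with a leading '*' replaced by an HTML entity.

    A '*' at the start of a line is wikitext list syntax, so a
    reconstructed form such as '*laj' can turn into a bullet if a
    newline ends up in front of it.
    """
    if alt_text.startswith("*"):
        return "&#42;" + alt_text[1:]
    return alt_text
```

Only the first character is touched, so asterisks elsewhere in the display text are left alone.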

SORRY abbreviation

Hello, Can "Some One is Really Remembering You" be abbreviated as SORRY? Yuliadhi (talk) 14:06, 16 April 2023 (UTC)[reply]

That's a backronym. It's not the real origin of the word "sorry", but was invented later. Equinox ◑ 16:13, 16 April 2023 (UTC)[reply]

Strange Warning

At mind, {{head-lite|et|pronoun forms}} displays just fine when previewing the section, but when viewing the whole page, the headword is followed by Warning: Display title "mind" overrides earlier display title "mind".

It seems to be due to conflict between {{head-lite|zh|verbs|sc=Hani}} and {{head-lite|et|pronoun forms}}, with the system freaking out over being given two different values for the display title. Does {{head-lite}} really need to mess with the display title?

I should also mention that a search for "overrides earlier display title" turns up 4 pages with this warning: ᨅᨔ ᨕᨘᨁᨗ, -کو, mind and a, plus a mention of a similar problem at Wiktionary:Grease pit/2017/May. The first two are due to other templates that set the display title. @Theknightwho. Chuck Entz (talk) 04:11, 17 April 2023 (UTC)[reply]

I've removed |sc=Hani in the {{head-lite|zh}} – there is no actual need for doing so, and it only results in weird fonts displaying. – Wpi31 (talk) 04:43, 17 April 2023 (UTC)[reply]

Term not listed alphabetically

See Potter: Potter Valley should be listed before Potterverse but for some reason the template lists it after Potter County. Pinging @Theknightwho. J3133 (talk) 05:47, 17 April 2023 (UTC)[reply]

@J3133 The sorting order is correct, because spaces aren't ignored in English sorting. If you put that list into an Excel table and sort it alphabetically, it will give you the same result. Theknightwho (talk) 09:06, 17 April 2023 (UTC)[reply]

@Theknightwho: Was it always this way or am I misremembering? J3133 (talk) 09:09, 17 April 2023 (UTC)[reply]

@J3133 Yes.

By the way, you've pinged me 5 times in 24 hours about various issues - could you please possibly put them on the Grease Pit instead? Someone will pick them up. Theknightwho (talk) 09:16, 17 April 2023 (UTC)[reply]

@Theknightwho: Actually both spaces and punctuation were ignored in column template sorting (though not in category sorting) till your edit switching to get_plaintext in Module:utilities, which does more sophisticated processing. I had made Module:columns stop ignoring spaces on August 3, 2018, after a comment on the talk page, and DTLHS undid that on September 8, 2018, so ignoring of spaces and punctuation was carried over into Module:collation. On January 4, 2023, punctuation stopped being ignored in languages that use Module:Tibt-sortkey with this edit. — Eru·tuon 18:37, 17 April 2023 (UTC)[reply]

@Erutuon Ah - thanks! I had forgotten about that change, and you're absolutely right. I'm not sure that I'd be keen on restoring it, as it's the kind of thing that should be done on a per-language basis. I'm not hugely bothered if we do that for English (I guess I'd count myself as a weak oppose), but it's too blunt an instrument to apply universally, I think. After all, it's why I had to manually exclude Tibetan from it in the first place. Theknightwho (talk) 18:51, 17 April 2023 (UTC)[reply]
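The difference between the two behaviours described in this thread can be seen with a small sketch. It is written in Python rather than taken from Module:collation or Module:columns, the function names are hypothetical, and the word list is an illustrative assumption rather than the actual list at Potter.

```python
import re


def key_keeping_spaces(term):
    """Sort key that keeps spaces, so a space sorts before any letter."""
    return term.casefold()


def key_ignoring_spaces(term):
    """Sort key that drops spaces and punctuation before comparing."""
    return re.sub(r"[\W_]+", "", term.casefold())


# Illustrative list only; not the real contents of the entry.
terms = ["Potterverse", "Potter Valley", "Potterian", "Potter County"]

with_spaces = sorted(terms, key=key_keeping_spaces)
without_spaces = sorted(terms, key=key_ignoring_spaces)
```

Keeping spaces groups "Potter County" and "Potter Valley" together ahead of the one-word terms, whereas ignoring them places "Potter Valley" next to "Potterverse".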

Lack of support for parameters in Template:&lit

It lacks some very basic parameters such as tr, t. Both the tr1=, tr2=, … and the term<tr:transliteration> syntaxes should be supported. – Wpi31 (talk) 06:03, 17 April 2023 (UTC)[reply]

Why would |t= be needed in &lit? Surely the whole point is the word is used literally, in any possible sense - or if a specific sense is meant, it should be clear from context.

Obviously |tr= needs adding, though. This, that and the other (talk) 09:49, 18 April 2023 (UTC)[reply]

@User:This, that and the other, @User:Wpi31 I don't think "literally" means that only one sense of a term can possibly be the meaning for which {{&lit}} applies. That template is used for multi-word expressions and is intended to remind users that usages of the expression are SoP. {{&lit|en|grain|of|salt}} does not use grain ("cereal"), though that is probably the most common meaning, rather than grain ("small particle"). Language learners might benefit even more from a hint about the meaning of of in this expression. DCDuring (talk) 14:39, 18 April 2023 (UTC)[reply]

Problems with sorting in categories

(Copied from Module talk:category tree/topic cat.)

1. In the manual list at Category:English terms derived from fiction, The Simpsons should be listed before Star Trek (as it is in the automatic subcategory list) instead of first.
2. At Category:American fiction, the categories “The Hunger Games”, “The Matrix”, “The Simpsons”, “A Song of Ice and Fire”, “The Walking Dead”, “The Wizard of Oz”, and “The X-Files” have correct sorting (under "H", "M", "S", "S", "W", "W", and "X", respectively) but need a space (“ ”) at the beginning to be listed together with the other categories. J3133 (talk) 15:03, 15 April 2023 (UTC)[reply]
3. At Category:Star Wars, Category:Terms derived from Star Wars by language should be listed separately, under the space character, instead of between languages.

J3133 (talk) 09:19, 17 April 2023 (UTC)[reply]

I have fixed issue 2 at Module:category tree. J3133 (talk) 14:43, 9 June 2023 (UTC)[reply]

@J3133 I have reverted this change; this is not the correct way of doing it. The space should be added by each individual category as necessary. Benwing2 (talk) 22:54, 27 September 2023 (UTC)[reply]

@Benwing2: Do you know how to fix the issue I mentioned? J3133 (talk) 03:42, 28 September 2023 (UTC)[reply]

@J3133 I see you have done a bunch of hacking on the category code without really understanding it. For example, you added a getSort() method which is interfering with the proper sort key operation. The sort keys are supposed to be specified in the parents themselves, not using `sort` at the top level. If you had done it that way, the space would have been added automatically. I am going to delete most of your added code since it's not generally helpful. Please let me know what you were trying to do and I will add any missing functionality. Thanks. Benwing2 (talk) 04:07, 28 September 2023 (UTC)[reply]

BTW why did you create both getDisplay() and getDisplay2()? At most one is needed. Benwing2 (talk) 04:09, 28 September 2023 (UTC)[reply]

@J3133 I have backed out all of your additions. From what I can tell you were trying to add support for italicized titles in topic categories. Note that support for customized sort keys is already present in topic cats and poscatboiler cats (see the documentation, esp. for the latter), and support for customized display is already present in poscatboiler. Also you added getTopicParents(); I don't know what this is for but support for customized parents is already present in both poscatboiler and topic categories. I will add back support for customized display in topic cats and make the sort keys for topic cats smarter so they automatically truncate "the" and "a" at the beginning of categories. Benwing2 (talk) 05:17, 28 September 2023 (UTC)[reply]
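Dropping a leading "the" or "a" from a sort key, as proposed in the comment above, might look like this. It is a Python sketch with a hypothetical function name, not the category tree code, and it assumes only that an article followed by a space is removed.

```python
def topic_sort_key(title):
    """Drop a leading 'the ' or 'a ' (case-insensitive) from a title."""
    lowered = title.casefold()
    for article in ("the ", "a "):
        if lowered.startswith(article):
            return title[len(article):]
    return title


titles = ["The Simpsons", "Star Trek", "A Song of Ice and Fire"]
ordered = sorted(titles, key=topic_sort_key)
```

Because the check includes the trailing space, a title such as "Avatar" is left unchanged.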

@Benwing2: The italics are missing in the navigation line for derivation categories; compare Category:en:Star Wars (“American fiction » Star Wars”) and Category:English terms derived from Star Wars (“Fiction » Star Wars”); and also in the list at Category:English terms derived from fiction. I made this edit to Module:category tree/poscatboiler/data/terms by etymology in August 2022 (it worked before). Also, issue 1 is still present (Category:English terms derived from The Simpsons is listed first at Category:English terms derived from fiction). J3133 (talk) 06:09, 4 October 2023 (UTC)[reply]

@J3133 All right, I'll take a look. Please note, I have purposely not added the displaytitle italicizing to the umbrella msg displayed in topic umbrella categories because those are specifying sample categories to be typed in and it seems strange to italicize those categories when they're not typed in italicized. Benwing2 (talk) 06:18, 4 October 2023 (UTC)[reply]

@Benwing2 Category:English terms derived from Star Wars does not have the italics and shows this warning: “Display title "Category:terms derived from Star Wars" was ignored since it is not equivalent to the page's actual title.” J3133 (talk) 17:12, 8 October 2023 (UTC)[reply]

{{grc-IPA}} weirdness

At ἕως (héōs) and other Ancient Greek entries the following weirdness is happening. When the following templates are used in pronunciation sections:

{{grc-IPA}}
*{{hyphenation|grc|ἕ|ως}}

the hyphenation line ends up extending into the left margin as shown:

(5th BCE Attic) IPA(key): //
(1st CE Egyptian) IPA(key): //
(4th CE Koine) IPA(key): //
(10th CE Byzantine) IPA(key): //
(15th CE Constantinopolitan) IPA(key): //

Hyphenation: ἕ‧ως

Adding a blank line between the two templates makes the issue go away, but it seems like {{grc-IPA}} needs to be fixed in some way. — Sgconlaw (talk) 21:46, 17 April 2023 (UTC)[reply]

@Erutuon: maybe you know something about this, since you edited the template and module? — Sgconlaw (talk) 22:21, 19 April 2023 (UTC)[reply]

It's a case of the MediaWiki parser not knowing how to merge the HTML generated by {{grc-IPA}} properly with the following list item. There is a div with class="vsSwitcher" and two divs with class="vsShow" and class="vsHide" containing separate lists (ul tags) of pronunciations. And then the list item is appended as a li tag after the divs and MediaWiki doesn't give it a parent ul tag, so it is displayed too far to the left by the browser.

<div class="vsSwitcher">
<div class="vsShow">
* summary, <code>ul</code> with one <code>li</code>
</div>
<div class="vsHide">
* individual labeled pronunciations, another <code>ul</code> with several <code>li</code>s
</div>
</div>
* hyphenation, <code>li</code> with no <code>ul</code> around it

I don't see a solution where there are hideable list items and the hyphenation is in the same HTML list. I was thinking maybe class="vsShow" and class="vsHide" could be applied to individual li tags so that all the pronunciations are in one HTML list, which requires {{grc-IPA}} to use literal HTML tags rather than wikitext syntax for the list. But that seems to require the hyphenation to be in a different HTML list. — Eru·tuon 23:03, 19 April 2023 (UTC)[reply]
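The structural problem described above, a list item with no enclosing list, can be detected mechanically. The sketch below uses Python's standard `html.parser`; it is not part of MediaWiki or of {{grc-IPA}}, the class and function names are hypothetical, and it assumes well-nested input without void elements.

```python
from html.parser import HTMLParser


class OrphanListItemFinder(HTMLParser):
    """Counts <li> elements whose direct parent is not <ul> or <ol>."""

    def __init__(self):
        super().__init__()
        self.stack = []   # currently open tags, innermost last
        self.orphans = 0

    def handle_starttag(self, tag, attrs):
        if tag == "li" and (not self.stack or self.stack[-1] not in ("ul", "ol")):
            self.orphans += 1
        self.stack.append(tag)

    def handle_endtag(self, tag):
        # Pop back to the matching open tag, if there is one.
        if tag in self.stack:
            while self.stack.pop() != tag:
                pass


def count_orphan_items(html):
    finder = OrphanListItemFinder()
    finder.feed(html)
    finder.close()
    return finder.orphans
```

Fed the shape of markup described in this thread, with the hyphenation item following the closing divs, the function reports one orphan. Wrapping that item in its own `ul` brings the count to zero.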

@Erutuon, Sgconlaw I ran into this same issue with Spanish and Portuguese and found a (hacky but working) solution. See "major hack to get bullets working on the next line" near the bottom of Module:pt-pronunc and/or "major hack to get bullets working on the next line after a div box" around line 946 of Module:es-pronunc (this latter one is a bit more sophisticated). This should probably work for Ancient Greek because I modeled the collapsed multiple outputs of both modules after the Ancient Greek one. Benwing2 (talk) 23:45, 19 April 2023 (UTC)[reply]

Or embed a carriage return at the end of {{grc-IPA}}?? — Sgconlaw (talk) 04:26, 20 April 2023 (UTC)[reply]

Warning of Headword Transliteration Issues

I'm not sure whether this is a Grease Pit or Beer Parlour question.

If a transliteration was supplied in the wikicode for the headword line (at least, if it used Module:headword), the page would be added to Category:Terms with redundant transliterations or Category:Terms with redundant transliterations and the language-specific subcategory. This has recently stopped, at least for Pali. Why? Example: Pali ທຸດໂຖ (duttho, “bad”). The categorisation is still functioning for simple links using {{link}} and {{mention}}. --RichardW57m (talk) 12:25, 18 April 2023 (UTC)[reply]

@RichardW57m Can you check to see if the brokenness was caused by my changes to Module:headword? (Preview the relevant Pali page using the Mar 6 2023 version, just prior to my Mar 8 rewrite.) Benwing2 (talk) 23:38, 19 April 2023 (UTC)[reply]

Correction: The text should have said Category:Terms with redundant transliterations or Category:Terms with manual transliterations different from the automated ones.
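The choice between the two maintenance categories named in the correction above can be sketched as a simple comparison. This is Python, not Module:headword; the function name is hypothetical, and the sketch assumes the manual and automatic transliterations are compared as plain strings, whereas the real module may normalise them first.

```python
def translit_category(manual_tr, automatic_tr):
    """Pick a maintenance category for a manually supplied transliteration.

    Returns None when there is nothing to compare.
    """
    if manual_tr is None or automatic_tr is None:
        return None
    if manual_tr == automatic_tr:
        return "Terms with redundant transliterations"
    return "Terms with manual transliterations different from the automated ones"
```

In this sketch a headword with no manual transliteration, or a language with no automatic transliterator, is never categorised.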

@Benwing2: I can't check easily. Do you know where the instructions on how to do this are? The module is protected against my changing it, so I can't just preview an edit. --RichardW57 (talk) 00:59, 20 April 2023 (UTC)[reply]

I've found the relevant instruction in the January 2023 volume - use Special:TemplateSandbox. --RichardW57m (talk) 09:50, 20 April 2023 (UTC)[reply]

After modifying the 6 March version to remove the use of deleted method getType for language objects, the change had no effect - the categorisation had already gone. The modified module is at Module:User:RichardW57/20230306/Module:headword. --RichardW57m (talk) 11:43, 20 April 2023 (UTC)[reply]

@RichardW57m Thanks for looking into this and apologies for not responding earlier (I am in Spain now on vacation). I didn't actually know about Special:TemplateSandbox. I'll look into what is going wrong. Benwing2 (talk) 15:26, 20 April 2023 (UTC)[reply]

Actually, this may not be an adequate test to time the change, and indeed it looks wrong. On 16 March Pali had 8 'redundant' transliterations and 19 'different' transliterations, and I haven't made enough edits to eliminate most of them. When I started this topic, the numbers had dropped to 1 and 2, neither instance connected to the headword template. The relevant pages have probably been regenerated frequently, for non-Roman script nouns, verbs and adjectives and terms that are inflections all use Module:links. I need to see what happens when also using Module:links as at 6 March. --RichardW57m (talk) 12:49, 21 April 2023 (UTC)[reply]

@Benwing2, Theknightwho I can't work out what's going on. My attempts to examine Module:headword/data suggest that the likes of {{head|pi|noun form}} should never have generated a transliteration, for the data module was setting notranslit["pi"] to false, but I very much remember having to usually add |tr=- to suppress redundant transliterations. However, @Theknightwho has kindly re-enabled the checking behaviour for issues in the transliteration of headwords.

However, this isn't working for Pali because of the setting of notranslit. Could someone therefore please undisable, at the level of Module:headword, automatic transliteration for Pali. Module:pi-headword disables transliteration by default. I thought I had found a way to enable default transliteration for invokers of Module:pi-headword, but once automatic transliteration has been enabled at the level of Module:headword, I will have to retest it. --RichardW57m (talk) 11:11, 10 May 2023 (UTC)[reply]

(I've tested the change to Module:headword/data at Module:RichardW57/enable/Module:headword/data.) --RichardW57m (talk) 11:25, 10 May 2023 (UTC)[reply]

Part of the problem, the lack of transliterations, can be traced to the change Special:Diff/71569780 on 4 March 2023, which has change comment "Fix issue with "notranslit" languages not having the transliteration suppressed". This changed the interpretation of notranslit from 'Don't record a problem if transliteration fails' to 'Don't use automatic transliteration'.
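The two readings of notranslit described above can be contrasted in a short sketch. It is Python, not the Module:headword logic; the function, its parameters and the `suppress_auto` switch are all hypothetical, and the handling of unlisted languages under the new reading is an assumption.

```python
def resolve_translit(lang, manual_tr, auto_tr, notranslit, suppress_auto):
    """Return (transliteration, needs_attention) for a headword.

    auto_tr is None when automatic transliteration fails.
    suppress_auto=False models the old reading of notranslit and
    suppress_auto=True the new one.
    """
    listed = lang in notranslit
    if manual_tr is not None:
        return manual_tr, False
    if suppress_auto and listed:
        # New reading: listed languages never get automatic transliteration.
        return None, False
    if auto_tr is not None:
        return auto_tr, False
    # Old reading: a failure is only reported for unlisted languages.
    return None, not listed
```

Under the old reading a listed language such as Pali still receives its automatic transliteration; under the new reading the same call returns none, which matches the disappearance of transliterations reported in this thread.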

Several other members of this list with 2-letter codes look odd for the new interpretation:

az (Azerbaijani - Cyrillic Azerbaijani should not be a problem)
oj (Ojibwe)
ro (Romanian)
sh (Serbo-Croatian)

and I'm not sure that Arabic script languages (ms, Malay) and (az, Azerbaijani) need be on it. I suspect that they were originally added because they didn't have automatic transliterators, and that Pali is on the list because

@Octahedron80 hadn't been advised that languages should be removed when a transliterator becomes available; or
@Benwing2 didn't see why a Pali transliteration was needed, as there would be an invocation of {{pi-sc}} on the definition lines.

Alerting others who may well have something to say about the list - I notice that CJK languages are on the list - @Huhu9001, Atitarev. --RichardW57m (talk) 14:52, 10 May 2023 (UTC)[reply]

@RichardW57m: There is a small problem. If I have understood you correctly, "undisable, at the level of Module:headword, automatic transliteration for Pali" would also cause unwanted automatic headword transliteration for Chinese. (This was not a problem before because Chinese did not have a transliteration module then, but it has one now.) -- Huhu9001 (talk) 16:18, 10 May 2023 (UTC)[reply]

Or did you mean just dump Module:RichardW57/enable/Module:headword/data into Module:headword/data? -- Huhu9001 (talk) 16:34, 10 May 2023 (UTC)[reply]

Rather the latter, though obviously whoever takes responsibility needs to review the change. However, in the longer term, we may need to adjust the interface for some specific languages so that:

By default, transliteration is not done.
Default transliteration can be commanded.
Manual transliteration can be done, but is treated as untrustworthy and is therefore flagged up by inclusion in a maintenance category.

I suspect that true transliteration of Moldavian does not always yield correct Romanian, which would be similar to Lao-repertoire Pali, though it might be more like the Burmese and Lanna-script differences in Pali. RichardW57m (talk) 16:54, 10 May 2023 (UTC)[reply]

The Pali-specific change to the data module has now been done (by @Huhu9001). Its effects are slowly filtering through to the two maintenance categories - the test word ທຸດໂຖ took over 16 hours, at which point I purged the page to get it added to the category. A new word to monitor to check on propagation to the categories is ພຸດໂທ. We seem now to have restored, for Pali, the February 2023 behaviour, including, where documented (e.g. {{pi-verb form}}), the selection of the default transliteration (as opposed to the default transliteration behaviour) for Pali headwords. --RichardW57m (talk) 09:23, 11 May 2023 (UTC)[reply]

Thai Transliteration

@Octahedron80, Benwing2, Theknightwho, Surjection Today I've just noticed that {{xlit}} and direct access via method transliterate() on language objects are failing for Thai on words for which default transliteration by {{link}} works. What gives? I don't know how long things have been like this - it might be an ancient discrepancy. An example word is อะไร (à-rai). Most or all monosyllabic words still have transliterations returned. One could make a work around from what works, but that would be highly undesirable. --RichardW57m (talk) 14:31, 19 April 2023 (UTC)[reply]

@RichardW57m The only thing I can think of is that the link module is passing in a script and the Thai transliteration module requires this for some reason although it doesn't look like it from examining the code. Benwing2 (talk) 17:51, 19 April 2023 (UTC)[reply]

The short answer is that the link module formally does not use the Thai transliteration module! Instead, it uses 'phonetic extraction', as tabulated in Module:links/data, and reads a hyphen-joined list of Thai-script syllables for the term in question from its page's source. Matters are not totally separate, as the code to interpret this list is also used by the transliteration module. However, formally transliteratable runs of 'alphabetic' Thai characters cannot exceed one syllable.

It may be possible to extend the Thai transliteration module to also transliterate terms for which we have pages. I think we should prioritise terms over arbitrary plausible syllables. --RichardW57 (talk) 22:06, 19 April 2023 (UTC)[reply]

@RichardW57 @Benwing2 This is a quirk of Thai and Khmer which it would be good to implement properly, as it works by extracting the phonetic respelling from the linked page, and then feeds that through the transliteration module. It’s been partially integrated into Module:links, but is currently treated as a special case. I don’t know all the details, but I did have a brief look into it a while back.

As a side point, I suspect other languages which use phonetic respelling regularly would also benefit, such as Russian. Even if transliterations or respellings have to be done manually, they should only need to be given in one place (leaving aside ambiguous cases for now). Theknightwho (talk) 22:20, 19 April 2023 (UTC)[reply]
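The discrepancy discussed above, where the link path succeeds and direct transliteration fails, follows from the two-step 'phonetic extraction' flow. The sketch below is Python, not the Module:links or Thai transliteration code. The function names are hypothetical, and the two small tables are stand-in data: one represents the respelling read from the entry's page source, the other a single-syllable transliterator.

```python
# Stand-in for respellings read from each entry's page source.
RESPELLINGS = {"อะไร": "อะ-ไร"}

# Stand-in for a transliterator that only handles one syllable at a time.
SYLLABLES = {"อะ": "à", "ไร": "rai"}


def direct_transliterate(text):
    """Direct path: succeeds only for a single known syllable."""
    return SYLLABLES.get(text)


def transliterate_syllables(respelling):
    """Transliterate a hyphen-joined list of syllables, or fail with None."""
    parts = []
    for syllable in respelling.split("-"):
        roman = SYLLABLES.get(syllable)
        if roman is None:
            return None
        parts.append(roman)
    return "-".join(parts)


def link_transliterate(term):
    """Link path: look up the page's respelling first, then transliterate it."""
    respelling = RESPELLINGS.get(term)
    if respelling is None:
        return direct_transliterate(term)
    return transliterate_syllables(respelling)
```

In this sketch the direct path returns nothing for the two-syllable example word, while the link path yields its hyphenated romanisation.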

@Theknightwho You are proposing examining the page of the term being transliterated to find manual respellings? This is an interesting idea but I worry about it being "expensive" and hitting limits on certain pages e.g. frequency lists. Can you point me to the code in Module:links that special-cases Thai and Khmer? Benwing2 (talk) 23:33, 19 April 2023 (UTC)[reply]

@Benwing2 Yes - I think the most vulnerable pages would be those with large translation lists, as there are very few Cyrillic pages close to any page limits. The same goes for other languages which would benefit from headword parsing, such as Arabic or Persian.

It's referred to at Module:links#L-660 and Module:links#L-782. Theknightwho (talk) 23:37, 19 April 2023 (UTC)[reply]

@Theknightwho: Would Arabic benefit from headword parsing? My impression was that Arabic often had two terms with the same spelling but different transliterations. By contrast, such pairs are rare in Thai and I think even more so in Khmer. --RichardW57 (talk) 00:36, 20 April 2023 (UTC)[reply]

@RichardW57 It could work like Chinese does now: do headword parsing, but fail the transliteration if there are ambiguities. Theknightwho (talk) 00:37, 20 April 2023 (UTC)[reply]
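The 'fail on ambiguity' rule suggested above could be sketched as follows. This is Python, not the Chinese or Arabic module code; the function name is hypothetical, and the readings table is illustrative stand-in data for whatever headword parsing would collect from an entry.

```python
def transliterate_from_headwords(term, readings):
    """Return the transliteration only if the entry offers exactly one."""
    candidates = set(readings.get(term, ()))
    if len(candidates) == 1:
        return candidates.pop()
    # No entry, or several competing readings: decline to guess.
    return None


# Illustrative data: one spelling with two readings, one with a single reading.
READINGS = {
    "كتب": ["kataba", "kutub"],
    "بيت": ["bayt"],
}
```

A spelling with two readings is left untransliterated, so a manual value would still be needed there, while an unambiguous spelling is resolved automatically.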

@Theknightwho: Support Thai and Khmer auto-transliteration, especially multi-word. Currently, single-word transliteration works OK, if words are already defined and use corresponding pronunciation templates (with or without respellings). An invisible space (in Thai/Khmer spelling) should be viewed as a word separator for translit.

Works on {{th-usex}} and {{km-usex}}. BTW, undefined words or words with multiple readings can be respelled using native means. Anatoli T. ^{(обсудить}/^вклад) 02:28, 20 April 2023 (UTC)[reply]

@Atitarev Could this be the way to handle usexes for Chinese (etc), too? Seems like a good way to determine word boundaries, as they're crucial for scraping terms. Theknightwho (talk) 02:35, 20 April 2023 (UTC)[reply]

@Theknightwho: It's already used by {{zh-usex}}, which caters not just for Mandarin. A few other special tricks there.

Don't forget {{ja-usex}}, it's based on the kana input but methods similar to {{ja-r}}. Invisible spaces in kana are also used to separate translit. Anatoli T. ^{(обсудить}/^вклад) 02:40, 20 April 2023 (UTC)[reply]

@Atitarev Perfect. Sounds like an ideal model. Theknightwho (talk) 02:41, 20 April 2023 (UTC)[reply]

@Atitarev: {{th-usex}} knows nothing about zero-width spaces (ZWSP). It uses single spaces to separate words and double spaces to represent spaces. It might be worth using a Thai script string for |tr= to give a phonetic spelling to be automatically transcribed as a correction for transliteration. We should perhaps consider using |id= to select transcription by sense ID. --RichardW57 (talk) 08:09, 20 April 2023 (UTC)[reply]
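The spacing convention described here (a single space separates words, a double space stands for a real space) can be sketched roughly as follows. This is an illustrative Python sketch, not the template's actual code; `split_usex_input` is a hypothetical name.

```python
import re


def split_usex_input(text: str):
    """Split template-style input into display text and word groups.

    Assumed convention: one space is a hidden word separator, and a run of
    two or more spaces is one real (displayed) space.
    """
    # Chunks are separated by runs of two or more spaces (real spaces).
    chunks = re.split(r" {2,}", text.strip())
    groups = [chunk.split(" ") for chunk in chunks]
    # Display: hidden separators vanish, real spaces survive as one space.
    display = " ".join("".join(words) for words in groups)
    return display, groups
```

The word groups are what a transliterator would look up one by one, while the display string is what the reader sees.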

@RichardW57: I support the opinion that ZWSP should be avoided and removed from any Thai and Khmer entries, translations and usage examples altogether.

All current entries with {{th-pron}} and {{km-IPA}} are already rid of ZWSP. The same is true for Burmese entries.

The invisible space (hidden in Thai and Khmer entries) is the best way to represent word or component divisions. Anatoli T. ^{(обсудить}/^вклад) 08:15, 20 April 2023 (UTC)[reply]

Actually, I think that outside the terms themselves, is probably better than the ZWSP character. I've been using it to semi-automate transliteration for Mon quotations - see {{mnw-quote}} and Module:mnw-translit. --RichardW57m (talk) 09:25, 20 April 2023 (UTC)[reply]

@Atitarev: I don't understand you. What do you mean by 'invisible space'? I thought the term was being used to mean the ZWSP, i.e. U+200B ZERO WIDTH SPACE, which identifies a line-breaking opportunity and is usually taken as a strong hint of a word boundary. --RichardW57m (talk) 09:39, 20 April 2023 (UTC)[reply]

@RichardW57: No, just a space, as {{th-x|ภาษา ไทย|the Thai language}} resulting in ภาษา ไทย

paa-sǎa tai

the Thai language

Which doesn’t show the space in the Thai spelling but is used to separate words and translit. Anatoli T. ^{(обсудить}/^вклад) 11:56, 20 April 2023 (UTC)[reply]

The phonetic respelling for |tr= is good. I support that idea as well. I thought that would be easier to achieve, since {{th-usex}} and {{km-usex}} already do that (they can work with undefined terms or terms with multiple readings). Anatoli T. ^{(обсудить}/^вклад) 08:18, 20 April 2023 (UTC)[reply]

@Atitarev, RichardW57, Benwing2 I like the idea of being able to manually specify respellings, but I'd prefer that we used a different parameter, as otherwise it'll be really confusing; plus, there may be instances where we want to be able to give manual transliterations as well. I'll also point out that this would benefit languages like Middle and Old Chinese, where {{ltc-l}} and {{och-l}} use numbers to specify the reading when characters have more than one, and this is currently a hurdle that needs to be cleared before we can deprecate them. They could use the same parameter, as it's ultimately just a way to modify the input being sent to the transliteration module. Because of that, I suggest that we use r=, which can stand for "respelling" or "reading". Also pinging @Wpi31, Justinrleung re the MC and OC point. Theknightwho (talk) 08:27, 20 April 2023 (UTC)[reply]

@Theknightwho: Cool. |r= could be used for Japanese as well for such cases.

@Huhu9001 is currently seeking support to implement automated transliterations similar to how you did for Mandarin and Cantonese in Wiktionary:Beer_parlour/2023/March#Does_Japanese_need_automatic_transliterations? Anatoli T. ^{(обсудить}/^вклад) 08:33, 20 April 2023 (UTC)[reply]

(You were part of that discussion, anyway). Anatoli T. ^{(обсудить}/^вклад) 08:35, 20 April 2023 (UTC)[reply]

@Atitarev And most of the other lects ;)

@Vininn126 also suggested on Discord that we could use a similar technique of page scraping to grab things like stress markers for Russian. My only concern with that is that if we start using this for a very large number of languages, it would probably start to cause performance problems. Theknightwho (talk) 08:38, 20 April 2023 (UTC)[reply]

It could also be useful for Sanskrit, where it's not clear whether the pitch accent should be shown in the transliteration - I've never seen it in the Devanagari, but where known is shown in the transliteration in the headwords. --RichardW57m (talk) 09:04, 20 April 2023 (UTC)[reply]

@Theknightwho: What other lects are working? I tried eg Min Nan: 現錢／现钱. Anatoli T. ^{(обсудить}/^вклад) 12:54, 20 April 2023 (UTC)[reply]

@Atitarev All except Min Nan, Hakka and Teochew: the first two because they probably need to be subdivided, and the third because it's not clear which romanisation scheme would make most sense to show. Theknightwho (talk) 12:58, 20 April 2023 (UTC)[reply]

Support the idea for the r= parameter. It would also be useful for the other Chinese lects to allow automatic superscript. – Wpi31 (talk) 08:37, 20 April 2023 (UTC)[reply]

@Theknightwho: Manual transliteration and respelling to express transliteration are serving the same aim, so I don't see why you would want both. Picking a reading from a list is a different concept, largely if not entirely duplicating |id=. Where one would want both transliteration and respelling for a different reason, I think the standard mechanism is already |tr= with |ts=. If one uses |r= for both respelling and reading, as in the underdocumented {{ltc-l}} and {{och-l}}, one then has to decide which meaning one is using. --RichardW57m (talk) 12:54, 20 April 2023 (UTC)[reply]

@RichardW57 It is far less confusing to keep tr= doing what it already does, instead of having a selection of languages where it suddenly does something else. ts= is for transcriptions, so doesn't fit this purpose, and id= doesn't fit either, because a reading may cover multiple senses. It doesn't seem sensible to overload any of these parameters, as you're proposing. Plus, with Middle and Old Chinese, it is very conceivable that one might want to use manual transliteration anyway under certain circumstances, which means that we still need a third approach.

Ultimately, it is far less confusing to have two ways of achieving the same thing than it is to take a parameter with widely understood behaviour and drastically change it. I'm not really sure I understand where your objection is coming from, as there doesn't seem to be any obvious downside. It's a parameter which changes the input to the transliteration module, which is something we can't currently do, and whether that's reading numbers or respellings is immaterial. It was merely convenient to pick a letter that fit both.

Plus, I am certain that if we take your approach, it'll just result in users finding bodges to work around it, which is really not what we want. Theknightwho (talk) 13:26, 20 April 2023 (UTC)[reply]

@Theknightwho::

|tr=Roman script text is current behaviour

|tr=Non-Roman script text would be an alternative behaviour.

This is an extension of meaning, not a change of meaning. In both cases, the parameter would prescribe the Roman script transliteration to be displayed; it just needs further processing in the second case. The second method would only be available for some languages.

As far as I understand, |ts= represents the transliteration that would appear had a good spelling reform occurred.

|id= can identify a senseid or an etymid, and obviously an etymology can cover multiple senses. --RichardW57m (talk) 14:28, 20 April 2023 (UTC)[reply]

@Theknightwho: Of course, if we read |r= as 'rules', we could pass in a description of the writing system! That would actually be useful for Pali, which uses an alternative function trwo() for transliteration in inflection tables, where the writing system often can't be identified from the word to be transliterated. --RichardW57m (talk) 14:43, 20 April 2023 (UTC)[reply]

@RichardW57 I don't mind how you read it. I simply object to you saying we should overload another parameter, because it's confusing and messy, and forces other editors to work in ways that might not make sense in some situations. There's nothing wrong with r= and tr= being mutually exclusive, because they work in different ways. You also seem to be arguing that Thai should use tr= but Old Chinese should use id=, but the underlying process is exactly the same, and in both cases it would attach strings to this new method that don't seem to be necessary. Theknightwho (talk) 14:48, 20 April 2023 (UTC)[reply]

@Theknightwho: What I envisaged is the following, in order of precedence:

|tr=- suppresses as at present
|tr=other non-blank takes precedence
For a scraper system, |id= if present determines what to scrape. This might be difficult in some cases - Thai has the data in a pronunciation section, which may precede or follow the definition or etymology section, and so far as I am aware, for Thai the association of pronunciation and sense depends on the order and hierarchy of subsections. In the absence of |id=, some simple rule applies, such as first headword line for the language, with language as defined by the name of the template.

Note that |id= will have precedence in determining what to link to, as at present.

The scraper system will only be available for some languages and only for terms - it would not be available for general purpose transliteration functions. However, some of the discussion below suggests that we may end up with a general purpose paragraph transliterator or two, where paragraphs have to take mark-up. (And I expect we'll wind up with legacy transliterators, e.g. as invoked by {{th-usex}}.)

Is there a new clash in this system?

Fun fact: The combination of language and script does not determine how, if at all, words are separated. And, of course, spaces are not the only visible word separators. And don't forget French typography!

Now, if we instead have an extra information parameter, say |r=, the information will in general be interpreted by language (and script) sensitive rules. At best, the transliterator will need an extra function to decide whether to scrape; alternatively, the transliterator will control the scraping, which I think you will find a disturbing prospect. --RichardW57m (talk) 17:16, 20 April 2023 (UTC)[reply]
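The order of precedence listed in the comment above could be sketched like this. It is a hedged Python illustration only: `resolve_translit`, the `scrape` callback and `fake_scrape` are hypothetical names, and the real scraper would have to read entry pages.

```python
from typing import Callable, Optional


def resolve_translit(
    tr: Optional[str],
    scrape: Callable[[Optional[str]], Optional[str]],
    sense_id: Optional[str] = None,
) -> Optional[str]:
    """Pick a transliteration following the proposed precedence."""
    if tr == "-":
        return None  # |tr=- suppresses transliteration, as at present
    if tr:
        return tr  # any other non-blank |tr= takes precedence
    # Otherwise scrape; sense_id=None means "apply the simple default rule".
    return scrape(sense_id)


def fake_scrape(sense_id: Optional[str]) -> Optional[str]:
    """Toy scraper: one reading keyed by an assumed sense ID, else the default."""
    return {"axle": "plao"}.get(sense_id, "pee-laa")
```

With this toy scraper, `resolve_translit(None, fake_scrape, "axle")` picks the reading tied to the ID, while omitting the ID falls back to the first reading.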

@Benwing2, Theknightwho: It would be good to start vocalising Persian headwords and implement some tricks to avoid manual transliterations for both Arabic and Persian. If vocalisations are used, Persian can be transliterated like Arabic (same with Urdu and some other languages). The unused Persian transliteration module works in 90% of cases and can be improved further.

There is some weak resistance to vocalising Persian, mainly because of some differences between classical and modern Iranian pronunciations/transliterations. Also about the notion that Persian vocalisation is not used. Anatoli T. ^{(обсудить}/^вклад) 02:35, 20 April 2023 (UTC)[reply]

I can't really weigh in on this, as I don't know enough about Persian to comment, but it's also possible to take manual transliteration directly. So either way, we can at least make sure that links are consistent with the corresponding headwords. Theknightwho (talk) 08:44, 20 April 2023 (UTC)[reply]

The problem with manual transliteration is that it can be difficult to type, be prone to changes in fashion, or be vomitous (e.g. the monger Thai transliteration we use, which is also difficult to type). --RichardW57m (talk) 09:11, 20 April 2023 (UTC)[reply]

@RichardW57 The current Thai transliteration is not supposed to be entered manually, it is fully reliant on Thai respelling. It is based on Paiboon publisher’s method with one single modification for non-phonemic short vowels. If you really need to enter it, you can use edit tools. Any missing symbol can be added. Anatoli T. ^{(обсудить}/^вклад) 13:06, 20 April 2023 (UTC)[reply]

@Atitarev: I said difficult to enter, not impossible to enter, though the copying of invisible characters on Windows is fraught. Writing {{m|th|เพลา|t=axle}} and seeing เพลา (pee-laa, “axle”) (with 'transliteration' pee-laa) would be nonsense; one currently needs to write {{m|th|เพลา|t=axle|tr=plao}} to be shown the coherent เพลา (plao, “axle”). Other than by suppressing transliteration entirely, how would you currently achieve the latter mentioning without entering a manual transliteration? --RichardW57m (talk) 13:42, 20 April 2023 (UTC)[reply]

@RichardW57m, Atitarev, Theknightwho I haven't had a chance to read this whole conversation in detail. A separate respelling parameter is a possibility but it would be a lot of work to add it everywhere. Another alternative is to have a prefix on transliteration parameters to indicate a respelling, such as r:.... More generally we need a way of having partially-specified translit parameters such as an equivalent of the |subst= param for {{ux}}; it's not really feasible to have to enter the entire translit or respelling when you have a long word or multiword expression and the automatic translit is almost right but not quite. E.g. it would be great to be able to write just [тэ] to indicate that the Cyrillic е following т should be rendered in translit as if written э in a word like интернационали́зм (intɛrnacionalízm, “internationalism”). Benwing2 (talk) 15:39, 20 April 2023 (UTC)[reply]

The idea is that we need a way of passing information from the translit param into the translit module. Benwing2 (talk) 15:40, 20 April 2023 (UTC)[reply]
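One way to pass such a hint to the transliterator is to rewrite part of the input before transliterating, in the spirit of the `from//to` value that |subst= takes. The Python sketch below is an assumption-laden illustration: `apply_subst` is a hypothetical name, the comma-separated list of pairs is assumed, and it uses plain string replacement rather than patterns.

```python
def apply_subst(text: str, subst: str) -> str:
    """Apply comma-separated 'from//to' pairs to text before transliteration."""
    for pair in subst.split(","):
        if not pair:
            continue  # tolerate an empty value or a trailing comma
        source, sep, target = pair.partition("//")
        if not sep:
            raise ValueError(f"expected 'from//to', got {pair!r}")
        text = text.replace(source, target)
    return text
```

The result is only fed to the transliterator; the displayed and linked text would keep the original spelling.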

@Benwing2 {{zh-usex}} has a way of doing this, which we might want to use. If we use spaces to delimit separate terms, it also makes sense to be able to put <tr:xxx> or <r:xxx> after each term. The current Chinese module uses curly brackets, but I really, really want to avoid putting single curly brackets in templates unless we have to, as they're an obvious triphazard. Theknightwho (talk) 15:57, 20 April 2023 (UTC)[reply]
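A rough sketch of how space-delimited terms with trailing `<tr:xxx>` or `<r:xxx>` modifiers might be parsed. This is illustrative Python, not existing module code; `parse_terms` is a hypothetical name, and it assumes modifier values contain no spaces or angle brackets.

```python
import re

# Matches an inline modifier such as <tr:plao> or <r:เพ-ลา>.
_MODIFIER = re.compile(r"<(tr|r):([^<>]*)>")


def parse_terms(text: str):
    """Parse 'term<r:...> term<tr:...>' into a list of dicts."""
    parsed = []
    for token in text.split(" "):
        if not token:
            continue  # skip the empty tokens produced by repeated spaces
        modifiers = dict(_MODIFIER.findall(token))
        term = _MODIFIER.sub("", token)
        parsed.append({"term": term, **modifiers})
    return parsed
```

Angle brackets avoid the clash with template braces, at the cost of needing an escape if a term ever contains a literal `<`.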

@Theknightwho Yes, I never use single curly brackets for any purpose in any templates I've designed for exactly that reason. Can you explain what you mean by separating terms with spaces? What about multiword expressions? Maybe commas that aren't followed by a space? Benwing2 (talk) 17:27, 20 April 2023 (UTC)[reply]

@Benwing2: I used a Thai example before in response to @Theknightwho but here's a Mandarin Chinese: 他是我的朋友 ― tā shì wǒ de péngyǒu ― He is my friend. The spaces (in the template input) separate words but they are not visible in the Chinese writing. The same principle is used for Thai, Khmer and Japanese equivalent templates. Anatoli T. ^{(обсудить}/^вклад) 22:37, 20 April 2023 (UTC)[reply]

Here's a Thai equivalent: เขา เป็น เพื่อน ของ ผม

kǎo bpen pʉ̂ʉan kɔ̌ɔng pǒm

He is my friend

Anatoli T. ^{(обсудить}/^вклад) 22:41, 20 April 2023 (UTC)[reply]

@RichardW57: Using native Thai and Thai specific templates, it would be เพลา (pee-laa) vs เพลา (plao) with supplied respellings via |p=เพ-ลา or |p=เพฺลา.

The generic {{usex}} template allows |subst= but this template doesn't transliterate multiword examples for Thai, the way {{th-x}} does. If you don't want to retransliterate, you could use |subst=เพลา//เพฺลา to get the desired "plao".

The Chinese template uses curly brackets but @Benwing2 and @Theknightwho oppose this use.

I'd like to be able to write the same usage example with {{usex}} but this is what happens, if you use the same input on both:

เขา เป็น เพื่อน ของ ผม

kǎo bpen pʉ̂ʉan kɔ̌ɔng pǒm

He is my friend

เขา เป็น เพื่อน ของ ผม

kǎo bpen pʉ̂ʉan kɔ̌ɔng pǒm

He is my friend

Anatoli T. ^{(обсудить}/^вклад) 22:56, 20 April 2023 (UTC)[reply]

@Atitarev, RichardW57, Theknightwho This is similar to the hyphen-in-Korean thing; hyphens in Korean text show up in the translit but not the link or display text. This should definitely be possible (I think) now that Theknightwho added general support for this. I remember you noted there is a problem with hyphens that do need to show up in the Korean link; we need an escape char for these cases. This would be similar to spaces that need to be in the Thai or Chinese. Benwing2 (talk) 23:24, 20 April 2023 (UTC)[reply]

@Benwing2: Thanks. Could you please also say why curly brackets are undesirable?

เพลา ― pee-laa ― occasion (archaic)

เพลา ― plao ― axle

With curly brackets:

เพลา และ เพลา(plao lɛ́ pee-laa) Anatoli T. ^{(обсудить}/^вклад) 23:37, 20 April 2023 (UTC)[reply]

@Atitarev Single braces can easily be misparsed if there's an adjacent template call (which uses double braces) or a parameter reference (which uses triple braces). You have to remember to put an extra space next to the single brace if it occurs next to another brace, as in the example you gave. I'd prefer to use some other delimiter, e.g. <...>, <<...>>, (...), ((...)), [...] or even /.../. Benwing2 (talk) 00:07, 21 April 2023 (UTC)[reply]

I'd completely forgotten {{th-l}}. However, it can't italicise the same as {{mention}} and it doesn't do glosses (|t=). For quotations, one could use {{usex|th|เขาเป็นเพื่อนของผม|They are my friends}} getting

เขาเป็นเพื่อนของผม

kǎobpenpʉ̂ʉankɔ̌ɔngpǒm

They are my friends

We need something, probably applied optionally, in the system to transliterate ZWSP or equivalents to modern Roman script word separators. The Mon template {{mnw-quote}} did this until @Theknightwho broke it. I'll raise a new topic about fixing it today - it should be fairly straightforward. Additionally, at present the Thai transliteration module simply fails on polysyllabic words. --RichardW57m (talk) 09:53, 21 April 2023 (UTC)[reply]

@RichardW57 I am way less inclined to help you if you keep making these kinds of accusations instead of being more understanding about why I made many of the changes I did, as they’re needlessly confrontational and rarely get us anywhere. This thread already contains several examples of the ordinary space fulfilling this function for Thai and Chinese, so I have no idea why you insist on using a much more awkward method. Just shouting that I broke things helps no-one, because (a) your method is totally nonstandard and not used by any other language, (b) it was still in testing, and (c) there’s a straightforward solution. Theknightwho (talk) 15:35, 25 April 2023 (UTC)[reply]

Digression moved hither from Wiktionary:Beer parlour/2023/April#Thai Coalmines: --RichardW57 (talk) 11:13, 29 April 2023 (UTC)[reply]

On the issue of transliteration of undefined words, I think when @Theknightwho gets around to it, it will be possible to transliterate multi-words, hopefully with {{m|th|วัน จันทร์}}, the way {{th-x}} can do it, e.g.

วัน จันทร์

wan jan

Monday

The display is a bit ugly, since the template currently doesn't support inline quotations. Anatoli T. ^{(обсудить}/^вклад) 00:51, 27 April 2023 (UTC)[reply]

@Atitarev, Theknightwho How will one decide whether to suppress the space in the invocation? The combination of script and language will not always be enough to determine whether the writing system is scriptio continua. I recently encountered printed scriptio continua Sinhalese script Pali, even though recent publications are aerated. --RichardW57m (talk) 09:34, 27 April 2023 (UTC)[reply]

@RichardW57m: It will be language-specific, of course. You can talk on WT:GP about Sinhalese Pali when it comes to it. If you ask specifically about Thai, two spaces make one real space (in Thai script as well); note the space between ไหม and ไม่:

อยู่เมือง ไทย นาน ไหม ไม่ค่อย นาน เท่าไร ครับ

yùu mʉʉang tai naan mǎi · mâi kɔ̂i naan tâo-rai kráp

How long have you been in Thailand? Not very long.

If you observe how th-usex is used, only defined words with {{th-pron}} or some monosyllabic words are automatically transliterated. So if a word is undefined, like วันจันทร์, it won't transliterate as a solid word or as part of a sentence. Anatoli T. ^{(обсудить}/^вклад) 10:54, 27 April 2023 (UTC)[reply]

@Atitarev @RichardW57m It’s script specific, as certain scripts are already flagged as not having spaces. Theknightwho (talk) 11:51, 27 April 2023 (UTC)[reply]

That looks dodgy, as I've never seen scriptio continua Thai script Pali. What do you mean by "it's" - the mood/tense looks wrong, and doesn't agree with Pali ปะระทารัง คัจฉะติ (paradāraṃ gacchati, “to commit adultery”) (correctly) showing words separated by spaces. There are also minor non-Tai Thai languages that are written in non-scriptio continua Thai script. I suspect some can be found in both styles - and there are even missionary works in non-scriptio continua Thai script Thai. --RichardW57m (talk) 12:12, 27 April 2023 (UTC)[reply]

@RichardW57m: I don't think I understand now. If you need a space, you add a space but I was talking about Thai (also Khmer), not Pali or Pali in Sinhalese or Pali in Thai script. Thai and Pali written in Thai work on different transliteration modules (Module:th-translit and Module:pi-translit) with different language codes |th= and |pi=, even if they may share scripts, they work independently. You asked about Thai originally, so I answered about Thai. What seems to be dodgy? Anatoli T. ^{(обсудить}/^вклад) 12:23, 27 April 2023 (UTC)[reply]

@Atitarev: Template {{m}} is a general purpose template. If one is going to change the rules on how it works, then one should ensure that what worked before will still work. This is difficult in general - most testcases that achieve existence are inadequate, and preconditions are generally not documented (possibly not even known), which is why @Theknightwho breaks so much. The preconditions of {{th-x}} are close to acceptable because one can, at the cost of more labour, always use the language-independent templates. I wasn't happy at having to effectively have 1 (/⁠สิบ⁠/, “10”) in the quotation for โควิด-19 because {{th-x}} wouldn't accept '19' as a word. --RichardW57m (talk) 13:20, 27 April 2023 (UTC)[reply]

On a technical note, transliteration for Thai in links doesn't use the transliteration module Module:th-translit - it uses the scraper in Module:th-pron. Thus {{xlit|th|เพลา}} yields monosyllabic plao whereas {{m|th|เพลา}} yields disyllabic เพลา (pee-laa). There is thus a challenge with {{m|th|เก้า}} yielding เก้า (gâao) with a long vowel. At present, {{th-l|เก้า|p=เก้า}} yields the hypothetical short-vowelled homograph เก้า (gâo). --RichardW57m (talk) 13:20, 27 April 2023 (UTC)[reply]

@RichardW57m: I see that you're complaining about the quality of Thai transliteration. If the same method is taken as it is for Chinese, the numbers will be transliterated. But you can't expect the module to reliably syllabify Thai texts and guess the readings by context; it's not possible, so spaces will be necessary. It's not comparable to Cyrillic or even Devanagari. Even the more phonetic Lao script fails to transliterate in 10% of cases when it's more like Thai, with clusters. If you pay attention to how Module:th-pron works, the input is full of hyphens and respellings, including specific symbols to mark consonant clusters as opposed to inherent vowels. Anatoli T. ^{(обсудить}/^вклад) 14:26, 27 April 2023 (UTC)[reply]

Also, {{m}} will work with Thai if it’s implemented the way it is for some Chinese varieties. It will fail on undefined multisyllabic words and words with multiple readings. We’re not there yet but it may happen. Anatoli T. ^{(обсудить}/^вклад) 15:02, 27 April 2023 (UTC)[reply]

Inserting a space in วันจันทร์ only works properly if one thinks it is two words. So far as I am aware, ทร์ is not a possible Thai syllable, so it so happens that this word can be broken into syllables in only one way if one properly restricts the syllables ending in mai hanakat. With regard to เพลา and เก้า, I was noting issues with incorporating the scraper into the transliterator.--RichardW57m (talk) 15:18, 27 April 2023 (UTC)[reply]

@RichardW57m: In จันทร์ (jan), the last letter ร (rɔɔ) is killed with the diacritic ◌์ and ท (tɔɔ) is just silent. The module knows how to pronounce the word, since it's respelled as จัน (jan) in {{th-pron|จัน}}. Without the re-spelling in the {{th-pron}}, it won't guess the reading correctly, even for a single word, let alone long sentences without spaces. With regards to เพลา, how do you expect the modules to guess the correct reading of the word? Is it "เพ-ลา" (pee-laa) or "เพฺลา" (plao)? The automatic syllabification of Thai will never be 100% accurate; Thai has tons of irregular readings of Sanskrit, Pali and modern European loanwords.

You can try (if you haven't) https://www.thai2english.com/, which has its own transliteration and it's only about 80% accurate.

My understanding is, if @Theknightwho implements the transliteration using regular means, such as {{m}} or {{t}} for sentences or collocations, it will require spaces between words and occasional re-spellings. It will fail when the input doesn't have enough information (undefined words, no {{th-pron}} in the entry or multiple readings), just like transliteration of undefined Chinese words or sentences fails without word separations. Anatoli T. ^{(обсудить}/^вклад) 22:41, 27 April 2023 (UTC)[reply]

It is a major effort to work out an unambiguous pronunciation for จันทร์ (jan), and the Wiktionary's Thai transliterator doesn't make it. Why do you think I expect the transliterator to work out the pronunciation of เพลา (pee-laa)? I do think though that the unaided scraper gets it wrong more often than not when working on sentences. Getting it right needs an analysis of the context. That word and แหน (hɛ̌ɛn) are the standard examples that 100% accurate phonetically-related automatic Thai transliteration is impossible. However, |tr= is part of the regular means for {{m}} and {{t}}, and that will always work. --RichardW57 (talk) 08:00, 28 April 2023 (UTC)[reply]

One of the things I dislike about the Thai transliterator is that it does, or did not, tolerate Roman script within Thai text. I see quite a bit of Roman script in Thai sentences. However, my main point was that there is an alternative to using {{th-x}}, so it does not have to cover all cases. {{l}} and {{m}} need to cover all cases. --RichardW57m (talk) 15:18, 27 April 2023 (UTC)[reply]

So the issue was not coalmines, but transliteration? Or are you derailing to complain? Vininn126 (talk) 09:50, 28 April 2023 (UTC)[reply]

@Vininn126: @Atitarev raised the issue of transliteration at the paragraph starting "On the issue of transliteration of undefined words" in his post of 00:51, 27 April 2023 (UTC). That paragraph and the consequent replies could reasonably be appended to the discussion at Wiktionary:Grease_pit/2023/April#Thai Transliteration. Does anyone (e.g. @Theknightwho) object? --RichardW57m (talk) 10:27, 28 April 2023 (UTC)[reply]

@RichardW57m If you need a visible space, put two spaces. I don’t really understand your concerns here. Theknightwho (talk) 13:52, 27 April 2023 (UTC)[reply]

My concerns are consequent complexity and breakages. Within the Thai system, there would be problems with Thai terms with existing visible spaces, such as ก กา (gɔɔ-gaa) (breaks Module:th-anagram and friends), ดูช้างให้ดูหาง ดูนางให้ดูแม่ (duu-cháang-hâi-duu-hǎang duu-naang-hâi-duu-mɛ̂ɛ) (breaks {{th-der}}), and quite possibly from reduplicated terms such as กล้วย ๆ (glûai-glûai) and งู ๆ ปลา ๆ (nguu-nguu-bplaa-bplaa). --RichardW57m (talk) 10:13, 28 April 2023 (UTC)[reply]

For complexity, an issue would come with automatic generation of links, e.g. invoking a template to link to both a Thai script Pali phrase and its transliteration (to the Roman script) as derived within the template. Under this proposal, the link for the Thai script form would need double spaces but the transliterator would need to start from a string with single spaces. Such a template is {{pi-nr-inflection of}}, for which no replacement has been proposed. --RichardW57m (talk) 10:13, 28 April 2023 (UTC)[reply]

@RichardW57 Why would the transliterator need to start from a string with single spaces? Theknightwho (talk) 12:20, 28 April 2023 (UTC)[reply]

@Theknightwho: My error there; I didn't realise that multiple spaces got folded to single spaces in page names. --RichardW57m (talk) 13:25, 28 April 2023 (UTC)[reply]

Of course, transliterators from Latin or Devanagari would have to be tweaked to double spaces in some cases, potentially a source of subtle bugs. Would the quote-xx family of templates be modified to affect the display of text in scripts deemed not to separate words with spaces (or whatever the script spaces property means). --RichardW57m (talk) 13:25, 28 April 2023 (UTC)[reply]

@RichardW57m It is potentially a source of subtle bugs, yes, and it may be worth automatically converting single spaces into ZWSP and double spaces into single spaces before things reach the transliterator. I have far less of an issue dealing with ZWSP in modules; I just want to avoid requiring them to be input by an ordinary user, as it's time-consuming or even inaccessible for people who are less technically savvy. Theknightwho (talk) 13:38, 28 April 2023 (UTC)[reply]
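The conversion suggested above (single spaces to ZWSP, double spaces to single spaces, before the text reaches the transliterator) might look like this. It is a Python sketch under stated assumptions: `normalise_spaces` is a hypothetical name, and the input is assumed never to contain a NUL character, which is used as a temporary placeholder.

```python
ZWSP = "\u200b"  # U+200B ZERO WIDTH SPACE


def normalise_spaces(text: str) -> str:
    """Turn single spaces into ZWSP and double spaces into one ordinary space."""
    # Protect double spaces first so the single-space pass cannot touch them.
    placeholder = "\x00"
    text = text.replace("  ", placeholder)
    text = text.replace(" ", ZWSP)
    return text.replace(placeholder, " ")
```

Editors would keep typing ordinary spaces; only the module would ever see or handle ZWSP.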

@Theknightwho: There is a highly accessible way of entering ZWSP - just type ''! If one does that systematically, one then gets the intuitive method of representing single spaces by single spaces. --RichardW57m (talk) 08:52, 2 May 2023 (UTC)[reply]

@RichardW57m It’s 5 characters instead of 1, and makes the wikitext much less readable. Is there a reason why you object to using the space character, or two spaces if a visible space is needed? This has worked very well with other languages for years. Theknightwho (talk) 10:48, 2 May 2023 (UTC)[reply]

I've found that the contrasts of no v. one space and one v. two spaces aren't always easy to see. Additionally, '' is clearly mark-up. Converting single space to double space is tweaking, and from there it's a slippery slope to falsifying digits when using {{th-x}}, which I have found being done. --RichardW57m (talk) 12:22, 2 May 2023 (UTC)[reply]

Language specific isn't good enough - consider Vietnamese and Zhuang. --RichardW57m (talk) 11:53, 27 April 2023 (UTC)[reply]

Vietnamese or Zhuang don't require transliterations. Anatoli T. ^{(обсудить}/^вклад) 12:26, 27 April 2023 (UTC)[reply]

Wiktionary has both Roman and CJK lemmas for both of them. --RichardW57m (talk) 13:24, 27 April 2023 (UTC)[reply]

@RichardW57m: Reading back from CJKV characters would require the same volumes of data as with Chinese but I don’t see the need personally. Anatoli T. ^{(обсудить}/^вклад) 15:04, 27 April 2023 (UTC)[reply]

Baht Symbol

How should the Baht symbol (฿ (bàat)) be transliterated? (Alerting @Octahedron80). Does it instead need to be removed from the Thai script as defined in Module:scripts/data? @Theknightwho has decreed at Module:languages#L-901 that no character of a Wiktionary script shall have a character of the script in its standard Wiktionary transliteration. (Should I take this question to the Beer Parlour?) --RichardW57m (talk) 14:10, 25 April 2023 (UTC)[reply]

The baht symbol is listed as "common" script in Unicode, which means it has no script of its own, like other symbols. As a symbol, it does not need to be transliterated to anything. It can be put before or after an amount. It is surely not a letter of the Thai script, but it has been there to support local encoding.

The baht symbol is not much used today, perhaps because it is hard to type. In Thai text, we can use บ. e.g. 100 บ. or just write the full word บาท. --Octahedron80 (talk) 15:19, 25 April 2023 (UTC)[reply]

I’ve excluded the baht symbol from the Thai script. We need to overhaul many of the script character lists, as they’re too simplistic at the moment. Theknightwho (talk) 15:30, 25 April 2023 (UTC)[reply]

Being simplistic was generally good until you added the test for script characters surviving transliteration. It also has made the definitions readable. When I look at the new code for the Thai script, I now see square box, hyphen, box with question mark in it, legible character, hyphen, square box. Three of the four Thai block characters you've used are unassigned! Something like u(0x0e00) .. "-" .. u(0x0e3e) .. u(0x0e40) .. "-" .. u(0x0e7f) would have been far more readable. --RichardW57m (talk) 16:29, 25 April 2023 (UTC)[reply]
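To illustrate the readability point, here is a minimal sketch in Python (not the Lua of Module:scripts/data) that builds the character class from code points. It uses the two ranges quoted above, U+0E00 to U+0E3E and U+0E40 to U+0E7F, so that U+0E3F (the baht sign) is the only code point left out. The helper name thai_char_class is hypothetical, and the sketch still includes the unassigned code points at the ends of the block.

```python
import re

def thai_char_class() -> str:
    """Build a regex character class for the Thai block minus U+0E3F."""
    # Spell the ranges out as code points so they stay legible even when
    # the editor's font cannot render unassigned characters.
    ranges = [(0x0E00, 0x0E3E), (0x0E40, 0x0E7F)]
    return "[" + "".join(f"{chr(lo)}-{chr(hi)}" for lo, hi in ranges) + "]"

THAI = re.compile(thai_char_class())

print(bool(THAI.fullmatch("\u0e01")))  # U+0E01 THAI CHARACTER KO KAI
print(bool(THAI.fullmatch("\u0e3f")))  # U+0E3F THAI CURRENCY SYMBOL BAHT
```

Written this way, the definition can be checked by eye against the Unicode code chart without depending on how the characters render.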

I fixed it. The reserved ranges shouldn't be included. In Unicode, the Thai set will not be changed or extended for decades, as no further need arises (whereas Lao has already been extended three times). --Octahedron80 (talk) 02:03, 26 April 2023 (UTC)[reply]

I was simply following the convention in the data module, as other scripts do include the whole block. Theknightwho (talk) 02:39, 26 April 2023 (UTC)[reply]

It depends on what's done with Thai Noi. There's already been an attempt to add some Thai Noi glyphs to a transliteration standard for the Thai script, and there've been rumblings about some of the diacritics for Thai script Patani Malay - general acceptance of a proposal not to encode the latter doesn't prevent a subsequent decision to encode them as part of the Thai script. At least the Thai script is unlikely to have any swastikas added - four got added to the Tibetan script. Fongman with rat's teeth is also a possibility. --RichardW57m (talk) 11:50, 26 April 2023 (UTC)[reply]

And @Octahedron80 has raised the relevant point that the Baht symbol seems to be falling out of use. When I looked for Thai price tags we had at home, I found that instead some of them had a Roman script abbreviation written sideways - and I don't think the books they were on were targeted at foreigners. Back in the 1990s, I can remember feeling that the baht symbol after the digits looked very like an '8'. So perhaps the symbol does need to be transliterated given that it's falling out of use, though taking it out of the script definition does feel right to me. (I trust we won't have any problems with automatic font substitution not handling it.) --RichardW57m (talk) 16:29, 25 April 2023 (UTC)[reply]

Transliterator String Decomposition Bug

@Theknightwho: Module:languages records the (new?) bug that transliteration makes the assumption tr(s1) .. tr(s2) == tr(s1 .. s2) and that problems may arise with Wiki mark-up such as ''' if it is violated. While this seems true, and can be a problem for majority Thai- and Lao-script Pali, the problem materialises much less often than the bald statement suggests. I believe it is only a problem when such text pairs s1, s2 are separated by Wiki mark-up. Thus, whereas many non-Indian Indic scripts have multi-part vowels in form NFC, and thus don't fulfil the assumption, mark-up will normally not be inserted between them, and thus the problem will not arise. We should therefore tone down the reported problem - it only applies if mark-up can be inserted between some strings.
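A toy illustration of the assumption, written in Python rather than Lua: the "script" here is invented, with a vowel sign E that is written before its consonant but read after it, loosely modelled on preposed vowels in Indic-derived scripts. The function name tr and the sample strings are assumptions for the sketch only.

```python
def tr(text: str) -> str:
    """Toy transliterator: a preposed vowel 'E' is read after the next letter."""
    out = []
    i = 0
    while i < len(text):
        if text[i] == "E" and i + 1 < len(text):
            # Swap the written order: vowel sign first, consonant second.
            out.append(text[i + 1].lower() + "e")
            i += 2
        else:
            out.append(text[i].lower())
            i += 1
    return "".join(out)

s1, s2 = "E", "K"
whole = tr(s1 + s2)        # transliterate the full string
pieces = tr(s1) + tr(s2)   # transliterate the chunks separately
print(whole, pieces, whole == pieces)  # ke ek False
```

The mismatch only appears when the string is actually split between the vowel sign and its consonant, which is the situation mark-up inserted at that point would create.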

The problem arises for some Pali writing systems because the transliterator needs to identify the writing system, a job done much better with a long string. Chopping it up can defeat the attempt at identification. Text in a quotation or usage example will have the word being documented emboldened, so there will generally be three chunks independently presented to the transliterator, rather than a whole sentence. Because of the effects of sandhi and the fact that in Indic scripts one can usually only embolden whole aksharas, I often (or indeed, because I use texts greedily, usually), end up manually transliterating the texts anyway and so the bug rarely affects my contributions. (The general phenomenon did strike my Pali challenge for the Interlinear glossing template, in a severer environment in which individual words were presented.)

@Sgconlaw, Benwing2: Now, back when I started adding quotations, I was invited to add a Pali transliterator so that I wouldn't need to transliterate quotations manually. As others may have responded to those urgings and used non-Roman examples and quotations without manual transliterations, a warning about this bug should be added to the documentation of the general quotation and usage example templates. Does the bug actually affect any languages other than Pali? --RichardW57m (talk) 16:13, 19 April 2023 (UTC)[reply]

@RichardW57m You are correct that this is indeed an issue, and it’s something I would very much like to do away with. Strictly speaking it isn’t a bug, as it’s intentional, but it’s certainly a limitation.

The reason for introducing it is that the module converts various formatting characters (based on Lua patterns) into PUA characters, and the originals are stored in a table. The string is then subdivided, and the portions between those characters are fed through in turn. At the end, the original formatting is reinserted. This is a way to avoid sending any formatting through things like transliteration modules, which are often not equipped to handle it well, and it means that complex formatting survives intact.
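The steps just described can be sketched roughly as follows. This is a Python illustration under stated assumptions, not the code in Module:languages: the only formatting it protects is runs of wiki apostrophes, the function name transliterate_with_formatting is hypothetical, and the input is assumed to contain no Private Use Area characters of its own.

```python
import re

PUA_START = 0xE000  # first code point of the Unicode Private Use Area
FORMATTING = re.compile(r"'{2,}")  # assumption: only wiki bold/italic quotes

def transliterate_with_formatting(text, tr):
    """Protect formatting, transliterate the chunks between it, then restore it."""
    originals = []

    def protect(match):
        # Store the original mark-up and leave a single PUA character behind.
        originals.append(match.group(0))
        return chr(PUA_START + len(originals) - 1)

    protected = FORMATTING.sub(protect, text)
    # Split on the placeholders, keeping them so they can be mapped back.
    parts = re.split(r"([\uE000-\uF8FF])", protected)
    out = []
    for part in parts:
        if len(part) == 1 and 0xE000 <= ord(part) <= 0xF8FF:
            out.append(originals[ord(part) - PUA_START])
        else:
            out.append(tr(part))  # each chunk is transliterated on its own
    return "".join(out)

result = transliterate_with_formatting("abc '''def''' ghi", str.upper)
print(result)  # ABC '''DEF''' GHI
```

Because tr only ever sees the chunks between placeholders, any transliterator whose output depends on context across a chunk boundary will give a different result from a whole-string call.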

The downside, of course, is that it’s possible to place formatting in positions that cause the output to be incorrect. In practice, that’s rare, but it has been known to happen (e.g. Korean had to be excluded, as a handful of entries were impacted by this in a way that was impossible to get around).

The (better) alternative would be to fix up all the transliteration modules to ensure they handle these PUA characters okay, so that we know it’s safe to give them the full string. I had to do that for Korean, and it was nontrivial. It also probably involves setting up a complex test suite, which certainly isn’t straightforward, either. Theknightwho (talk) 16:29, 19 April 2023 (UTC)[reply]

This is technically beyond my competence, so I don't have anything useful to add to the conversation. — Sgconlaw (talk) 20:32, 19 April 2023 (UTC)[reply]

@Sgconlaw: Possibly on where the warnings to users should be posted, primarily which templates should hold warnings. --RichardW57 (talk) 21:10, 19 April 2023 (UTC)[reply]

@RichardW57: I'm afraid I don't even understand your initial post. I may have to read it again (maybe a few times). — Sgconlaw (talk) 21:50, 19 April 2023 (UTC)[reply]

The formal problem is that the management of transliteration assumes that the transliteration of a concatenation is the concatenation of the transliterations. For scripts of Farther India, that isn't necessarily so, so the warning in the documentation of Module:links (and not in Module:links/documentation) is alarming. However:

The conditions for this to be a problem are more circumscribed, and for example seem unlikely to arise for the Burmese script.
The warning is addressed to developers, but is also relevant to editors using automatic transliteration in quotations and the like.
The action needed of editors is to check transliterations.
The problem might not be around for ever.

I started this topic with the aim of dispelling undue alarm but encouraging necessary caution. --RichardW57 (talk) 22:21, 19 April 2023 (UTC)[reply]

Graph extension disabled

Yesterday the Wikimedia Foundation noted that in the interests of the security of our users, the Graph extension was disabled. This means that pages that were formerly displaying graphs will now display a small blank area. To help readers understand this situation, communities can now define a brief message that can be displayed to readers in place of each graph until this is resolved. That message can be defined on each wiki at MediaWiki:Graph-disabled. Wikimedia Foundation staff are looking at options available and expected timelines. For updates, follow the public Phabricator task for this issue: T334940

--MediaWiki message delivery (talk) 17:36, 19 April 2023 (UTC)[reply]

Maybe the new graphs could look nice, when the issue is resolved. --Apisite (talk) 06:15, 25 April 2023 (UTC)[reply]

"Not that my opinion holds any weight, but In that case, my opinion is to sunset Vega, upgrade EasyTimeline to datadraw (made by the same author as ploticus using similar syntax) (or keep porting Ploticus to new debian releases), and close T137291 as invalid." - Snævar, "Restrict editing of Vega spec to a small set of users" --Apisite (talk) 04:46, 5 June 2023 (UTC)[reply]

Wiktionary:Per-browser preferences

Do we still need Wiktionary:Per-browser preferences? I've been using Wiktionary for years and I didn't even notice it lurking in the left sidebar (on Vector) until now, but it seems it is a much less user-friendly version of the Special:Preferences function that most of us probably click purely on instinct. But since both simply appear with the label Preferences, some people might click the legacy left-sidebar version instead. Are there still some functions that depend on the legacy version being accessible? Thanks, —Soap— 21:23, 19 April 2023 (UTC)[reply]

@Soap compare my other reply to you just now... I suspect that part of the issue here is that we have too few interface-admins who can work on converting these preferences to gadgets (or removing unnecessary ones). This, that and the other (talk) 13:04, 4 May 2023 (UTC)[reply]

Special:AbuseFilter/87

According to the log of edits this filter caught, it caught 35 edits, all in 2018. In the Notes, Eq mentions that the vandal it's aimed at prolifically added a certain thing, and @Surjection mentions reactivating it in 2022 because the vandal was back, but the log doesn't show it catching any edits in 2022, and the edits it did catch in 2018 were edits like this and the creation of Zeilensprung and Tropico del Capricorno with normal content, which (without giving away the private workings of the filter) are entirely unrelated to what the text of the filter or the notes suggest it's aimed at. Is the log messed up and showing the hits for a different filter, or did this filter exclusively catch a few unrelated edits years ago, in which case do we still need it to be active and consuming 4.6 conditions of the condition limit now? Other filters to consider deactivating: filter 139 (the "3033" filter) caught just four edits, all a year ago and to a single page, which seems like it could be better handled by page protection; and filter 147 (the "78564565" filter) caught just two edits. - -sche (discuss) 09:26, 21 April 2023 (UTC)[reply]

You can see in the filter's history that it was previously targeting a completely unrelated kind of vandalism. Since Equinox repurposed the filter it doesn't appear to have had a single hit. Although I don't know what Surjection saw last year, there is evidence that the vandal in question is surprisingly persistent, so the filter may be worth repairing and keeping.

I'm supportive of turning off those other two filters though. This, that and the other (talk) 11:33, 21 April 2023 (UTC)[reply]

Persistent is an understatement. I created Special:AbuseFilter/46 to deal with them 7 years ago (I suspect they were doing the same thing 5 years before that), and I still had to block them again and clean up more of their edits (see Special:Contributions/63.135.183.128) 9 hours or so ago. I suspect there's something neurological involved, because they keep trying to add the same kind of edits that bear only superficial resemblance to anything that would actually work. It took them years and hundreds of disallowed edits before they stopped doing the specific things that Filter 46 was designed to stop. There are enough of their bad edits that are too close to what competent editors do for an abuse filter to stop them entirely. Chuck Entz (talk) 03:39, 6 May 2023 (UTC)[reply]

Ugh, yeah, this was probably the only "recurrent problem edit" more annoying than the XXXXXX ones (and that strange person who occasionally comes around to tell us about Bee Movie and other film credits). I haven't seen it happen for a while though. Equinox ◑ 02:25, 8 May 2023 (UTC)[reply]

Well, if either of you or anyone else can rewrite the filter so it actually catches those edits (AFAICT it hasn't caught anything since 2018, and what it caught then wasn't this), by all means please turn it back on. - -sche (discuss) 03:26, 8 May 2023 (UTC)[reply]

Transliteration of wbr tag

For HTML5, the <wbr> tag is the same as the ZWSP, but has the advantage of being visible in general purpose editors. In many languages unburdened* by substantial software support, ZWSP is the invisible indicator of a boundary between words. For these languages, the transliteration of the word boundary is the space character. However, ZWSP is now also seeing use in languages which use spaces to separate words, so the best transliteration of a ZWSP is not always a space.

* Thai is burdened thus, and so changing the size of the display easily results in misbroken words.

--RichardW57m (talk) 13:57, 21 April 2023 (UTC)[reply]

The Mon transliteration module, which is still under development, was transliterating '<wbr>' as a space, while deliberately leaving other encodings of the tag (e.g. '') unchanged. However, the code in Module:languages obscurely* no longer passes this tag to the transliteration module; this change has broken {{mnw-quote}}. To repair this template and the documentation of Module:mnw-translit, I would like to know how the tag is processed.

* Well, I couldn't find where the tags are converted.

--RichardW57m (talk) 13:57, 21 April 2023 (UTC)[reply]

Is the tag passed through unmodified?
Or is it normalised, e.g. to '' or to ''?
Should we rather use ,  or ? --RichardW57m (talk) 13:57, 21 April 2023 (UTC)[reply]

Is there actually some Wiktionary convention to distinguish ZWSP as a word separator and ZWSP as merely a line-breaking opportunity? (For example, one might be encoded as a hexadecimal numerical entity and the other as a decimal numerical entity, rather than say as '' v. ''.) We might also want to have the option of selectively transliterating it as a soft-hyphen, though one ought to be careful with languages that traditionally didn't use soft hyphens but now do, such as Thai. --RichardW57m (talk) 13:57, 21 April 2023 (UTC)[reply]

Should we abandon the hope of not having to significantly edit HTML text from unburdened languages without visible word separation? --RichardW57m (talk) 13:57, 21 April 2023 (UTC)[reply]

Module:languages converts formatting characters to ensure they don't get messed around with by the transliteration modules (and others), because we have hundreds of them and many are designed quite crudely. The only way to re-enable <wbr> would be to convert it into a ZWSP before that happens, or to make a special exclusion for <wbr>. The latter would be difficult, and would partially defeat the point of the way things are set up anyway.

The former is possible, but I'm not keen on it anyway because I don't think it's the right approach. Instead, I propose that Mon uses the same approach taken by Chinese, Thai and Khmer, which is to say that it should use the space character for word boundaries. Before displaying Mon text on the page, these spaces can merely be removed - making sure to do this after the transliteration's been generated. Perhaps the template could insert <wbr> tags automatically, too. Theknightwho (talk) 16:35, 25 April 2023 (UTC)[reply]
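The ordering proposed here (transliterate first, strip the spaces afterwards) can be sketched as below. It is a Python illustration only: the lookup table stands in for a real transliteration module, the function name render is hypothetical, and the five words with their romanisations are borrowed from the Thai {{th-x}} example in this thread rather than from Mon.

```python
# Toy lookup table standing in for a transliteration module; the entries
# come from the {{th-x}} example in this thread.
TOY_TRANSLIT = {
    "เขา": "kǎo",
    "เป็น": "bpen",
    "เพื่อน": "pʉ̂ʉan",
    "ของ": "kɔ̌ɔng",
    "ผม": "pǒm",
}

def render(spaced_text):
    """Return (display text, transliteration) for space-delimited input."""
    words = spaced_text.split(" ")
    # Transliterate first, while the word boundaries are still visible...
    translit = " ".join(TOY_TRANSLIT.get(w, w) for w in words)
    # ...and only then drop the spaces for display.
    display = "".join(words)
    return display, translit

display, translit = render("เขา เป็น เพื่อน ของ ผม")
print(display)
print(translit)
```

If the spaces were removed before transliteration, the word boundaries in the romanisation would have to be recovered some other way.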

Yep. I will use my examples again to demonstrate the effective use of spaces in language-specific templates (+ Japanese and Khmer):

Input:

{{zh-x|他是我的朋友|He is my friend|in_notes=y}}

{{th-x|เขา เป็น เพื่อน ของ ผม|He is my friend}}

{{ja-x|あの人は私の友達です|あのひとはわたしのともだちです|He is my friend}}

{{km-x|គាត់ គឺជា មិត្ត{មិត} របស់ ខ្ញុំ|He is my friend}} ({មិត} is a respelling for មិត្ត (mɨt))

Output:

他是我的朋友 ― tā shì wǒ de péngyǒu ― He is my friend

เขา เป็น เพื่อน ของ ผม

kǎo bpen pʉ̂ʉan kɔ̌ɔng pǒm

He is my friend

あの人(ひと)は私(わたし)の友達(ともだち)です

anohito wa watashi no tomodachi desu

He is my friend

គាត់គឺជា មិត្ត របស់ខ្ញុំ ― kŏət kɨɨ ciə mɨt rɔbɑh khñom ― He is my friend

Anatoli T. (обсудить/вклад) 04:32, 26 April 2023 (UTC)[reply]

I counter-propose that Mon quotations be minimally modified from normal Mon text in HTML, and that one be allowed to use the standard quotation templates ({{quote-book}} etc.) in the normal fashion. At present, this isn't really available until we have properly agreed Mon transliteration, though note that the principal tendency is to use transliteration rather than transcription, though we can use separate templates for metadata and text. Until then, it's worth leaving the manual transliteration in headwords as a way of gauging opinions. The biggest problem is that it is common to transliterate U+1036 MYANMAR SIGN ANUSVARA according to its pronunciation, which can be /m/, /ʔ/, /h/ (not all systems distinguish) or /ɔ/. Once the transliteration module is incorporated, we can look at what to do about the residual cases. Note that the current mark-up is done by adding a single letter, rather than the full word respelling used elsewhere. --RichardW57m (talk) 13:03, 26 April 2023 (UTC)[reply]

There are people who object to turning every word into a link; at present this would have the problem of creating a sea of red and orange. It also creates an issue where transliterators choose not to separate what appear to be words - more potential cases like Thai วันจันทร์, which we seem to have rejected as a word. --RichardW57m (talk) 13:03, 26 April 2023 (UTC)[reply]

Finally, we don't have a generic template merging the characteristics of {{th-x}} and {{ja-x}}; one would have to be written. The present hope is that {{mnw-quote}} is temporary, and can ultimately be deprecated (for readability of old pages) or deleted. I expect matters to move at a glacial pace. --RichardW57m (talk) 13:03, 26 April 2023 (UTC)[reply]

@Theknightwho: I seem to have forgotten to announce that I had fixed the problem with {{mnw-quote}}. I did the conversion of ZWJ to space in the template, rather than deferring it to the transliterator. Unfortunately, it seems that the template does not become redundant when the Mon transliterator is released. There's also the problem that anusvaras need to be tagged, but there is hacking preprocessing in the language modules that can fix the anusvara issue - assuming it gets invoked by the quotation templates. --RichardW57m (talk) 15:51, 19 July 2023 (UTC)[reply]

Multilevel list enumeration

Is there a good way to format multilevel lists to provide numbering of subsenses as e.g. 1. (a), rather than 1. 1.? See discussion at Wiktionary_talk:Style_guide#Subsense_formatting/syntax. —DIV (49.180.206.58 01:43, 22 April 2023 (UTC))[reply]

Not a good way. The Wikimedia software uses HTML numbered lists, so you would have to monkey with things at the HTML level to change from numbers to letters, which would probably be rather messy and high-maintenance. Chuck Entz (talk) 03:23, 22 April 2023 (UTC)[reply]

If you make an account, you can change the formatting of lists using CSS in Special:MyPage/common.css. For instance, I have 1., a., i. as the three list levels because of this CSS:

ol {
    list-style-type: decimal;
}
ol ol {
    list-style-type: lower-latin;
}
ol ol ol {
    list-style-type: lower-roman;
}

If you don't want to make an account, there are various browser extensions that let you add CSS to particular websites, such as Greasemonkey. — Eru·tuon 05:18, 22 April 2023 (UTC)[reply]

Thanks for some helpful & insightful suggestions, Eru.

Chuck Entz, given that my suggestion would improve readability, it seems that the question is how difficult it is to implement that technically. Leave aside how I implemented the example: I'm not recommending that these things be hardcoded; I'm making a recommendation as to the resulting style (technology-neutral as to the implementation).

It seems that Eru has already demonstrated how this is possible in HTML. Rather than only having reader-friendly multilevel numbering for 'special' users who (i) have an account, and (ii) know how to code in HTML, and (iii) are aware of the CSS template ...I'm essentially proposing to change the default behaviour. Prima facie could it be as simple as implementing Eru's CSS code in the global/default template?

In the meanwhile, I also happened upon an example of what you get with the current default. See the entry at stroke#English, from which I reproduce an abridged extract below.

1. An act of hitting; a blow, a hit.

[...]

1. a stroke on the chin

[...]

4. (ball games) An act of hitting or trying to hit a ball; also, the manner in which this is done.

1. (cricket) The action of hitting the ball with the bat; a shot.

2. (golf) A single act of striking at the ball with a club; also, at matchplay, a shot deducted from a player's score at a hole as a result of a handicapping system.

[...]

2. A movement similar to that of hitting.

1. One of a series of beats or movements against a resisting medium, by means of which movement through or upon it is accomplished.

[...]

1. (rowing)

1. The movement of an oar or paddle through water, either the pull which actually propels the boat, or a single entire cycle of movement including the pull; also, the manner in which such movements are made; a rowing style.

2. (by extension) The rower who is nearest the stern of the boat, the movement of whose oar sets the rowing rhythm for the other rowers; also, the position in the boat occupied by this rower.

2. (swimming) [...]

2. A beat or throb, as of the heart or pulse.

There are four levels, all numbered indistinguishably. I think this looks awful.

Let's not start out with what we think will be easy to do with the technology and then try to justify the lacklustre outcome. Let's start with what the best formatting would be, and then figure out whether there's a suitable way to achieve that outcome.

—DIV (1.145.0.140 05:28, 20 May 2023 (UTC))[reply]

Surface analysis template

It is no longer showing affix categories. Can anyone fix this? 2A01:CB06:A021:48AD:FCF5:2153:668:B44C 06:37, 22 April 2023 (UTC)[reply]

They should be showing now. Theknightwho (talk) 07:05, 22 April 2023 (UTC)[reply]

chess

“[…] from Middle Persian 𐭬𐭫𐭪𐭠 (mlkʾ /šāh/)[[Category:|CHESS]], from Old Persian 𐏋 (XŠ /xšāyaθiya/).” The category text is not in the source. J3133 (talk) 15:46, 22 April 2023 (UTC)[reply]

Note that both the function "template_categorize" in Module:utilities/templates, which is in the transclusion list at chess and the function "tr" in Module:Phli-translit have local variables named "categories", and that the latter function only assigns a value to "categories" when certain letters are present, but always returns it. Is it possible that the variable is being shared somehow by the two functions? Chuck Entz (talk) 18:03, 22 April 2023 (UTC)[reply]

After looking at Special:WhatLinksHere/Module:Phli-translit, it looks like the part after [[Category:| is always the pagename in uppercase, so there's something else going on. @Theknightwho. Chuck Entz (talk) 18:19, 22 April 2023 (UTC)[reply]

At lwc, {{alter|pal|𐭩𐭥𐭬}} puts the page in Category:Automatic Inscriptional Pahlavi transliterations containing ambiguous characters and displays 𐭩𐭥𐭬 (yʿm)[[Category:|LWC]] even if I comment out everything else on the page. Chuck Entz (talk) 18:35, 22 April 2023 (UTC)[reply]

@Chuck Entz It seems to be related to transcriptions interfering with categories added by transliteration modules. These are both pretty niche features, so I'll take a look to see what's going on. Theknightwho (talk) 18:55, 22 April 2023 (UTC)[reply]

@Erutuon has fixed this. This exact issue previously happened with Module:zh-translit, actually - I just forgot this module was affected as well. Theknightwho (talk) 19:34, 22 April 2023 (UTC)[reply]

Replace Tagalog Reference into Template

Hello, I would like to request that, in all Tagalog entries containing "* Maugnaying Talasalitaang Pang-agham Ingles-Pilipino" (they can be found with insource:/* Maugnaying Talasalitaang Pang-agham Ingles-Pilipino/), that text be replaced with "* {{R:Agham}}". Thanks! Ysrael214 (talk) 17:37, 22 April 2023 (UTC)[reply]

@Ysrael214:

Done JeffDoozan (talk) 19:54, 23 April 2023 (UTC)[reply]

Oh wow, I was about to do that manually. Thank you! — 🍕 Yivan000 ^view_talk 23:56, 23 April 2023 (UTC)[reply]

Documentation of Transliteration Interface

We need to update this. Some hasty, irritated documentation at Wiktionary:Etymology_scriptorium/2023/April#Ancient_Greek_ϕᾶρος_at_entry_pharate may need to be updated when we're done. The update may belong at {{translit_module_documentation}}. --RichardW57m (talk) 13:15, 24 April 2023 (UTC)[reply]

@RichardW57m: I did some investigation and this is what I found.

The 2nd return value "fails" seems to have no effect on how transliteration is displayed on normal entries. The only thing it does is to make mod:languages skip some of the code called "Second round of temporary substitutions" in the comment.
The 3rd return value "cats"'s effect is uncertain. Different templates seem to treat it differently. It does generate categories for link templates, but headword-line templates simply discard it.

I don't know how to update the doc with this. -- Huhu9001 (talk) 04:53, 25 April 2023 (UTC)[reply]

I think we need @Theknightwho to explain how fails being true and text being nil are intended to differ, beyond fails being nil means that the old one-value convention for the return value was used. --RichardW57m (talk) 13:19, 25 April 2023 (UTC)[reply]

The current usage of fails is the sane logic that failing to transliterate any reasonable part of the string constitutes a failure to transliterate the string. --RichardW57m (talk) 13:19, 25 April 2023 (UTC)[reply]

The frequent ignoring of cats just means that not all transliteration clients have been updated. The jobs should be straightforward to complete for Module:headword. {{xlit}} is more awkward. For each case, there will be some of three reasonable ways to proceed:

Add the category links to the complete string. This is suitable in most cases.
Convert the category links to module errors. (This rather presumes that categorisation is entirely in terms of errors!)
Suppress the categories, which is what is currently happening.

Perhaps this is why the integration of the new interface has not been completed. --RichardW57m (talk) 13:19, 25 April 2023 (UTC)[reply]

The reason the transliteration method returns an explicit fail value is that there needs to be a definitive way to say "this transliteration failed", rather than merely assuming it has because you've got nil. After all, the transliteration method will return nil if you input nil, but it doesn't register that as a transliteration fail, as it's just a no-op. I considered using false, but that might break some other module, as there are lots that rely on the transliteration method. Theknightwho (talk) 16:42, 25 April 2023 (UTC)[reply]

I think a better example is where transliteration for a language is switched off for some scripts, but one doesn't want to flag the omission as an error. A relevant transient example would be Lao script Pali, which took longer to get working well than other scripts. Does any transliterator yet return anything but nil for the second return value? One possibility of outputting an error message from the transliteration module is to return it as a string value of fails, though the only pleasant way I see of exposing it to invoking wikicode is to process it as though it were the transliteration. --RichardW57m (talk) 10:49, 26 April 2023 (UTC)[reply]

In short, I think the point is that nil can indicate 'no service' as opposed to indicating 'something went wrong'. --RichardW57m (talk) 10:49, 26 April 2023 (UTC)[reply]
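The distinction under discussion can be sketched as follows. This is a Python illustration, not the Lua interface itself: the three-part return value mirrors the (text, fails, cats) convention described above, while toy_tr, describe, the enabled flag, the ASCII-only rule and the category name are all invented for the example.

```python
from typing import List, Optional, Tuple

TrResult = Tuple[Optional[str], bool, List[str]]

def toy_tr(text: Optional[str], enabled: bool = True) -> TrResult:
    """Hypothetical transliterator returning (text, fails, cats)."""
    if text is None or not enabled:
        # Nothing to do, or service switched off: nil text, but not a failure.
        return None, False, []
    if not text.isascii():
        # Input the toy cannot handle: report an explicit failure.
        return None, True, ["Example transliteration failures"]
    return text.lower(), False, []

def describe(result: TrResult) -> str:
    """Show how a client can tell the three outcomes apart."""
    text, fails, cats = result
    if fails:
        return "failed"
    if text is None:
        return "no service"
    return "ok: " + text

print(describe(toy_tr("ABC")))                 # ok: abc
print(describe(toy_tr(None)))                  # no service
print(describe(toy_tr("abc", enabled=False)))  # no service
print(describe(toy_tr("\u0e01")))              # failed
```

A client that looked only at the first value could not separate the last three cases, which is the argument for a separate fails value.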

Tatweel and suffixes

The Ottoman Turkish word طاتسز is automatically assigned to Category:Ottoman Turkish terms suffixed with سز. That page was deleted recently because it was empty. Now it would not be empty if created. If I try to create it I am warned

Warning: Display title "Category:Ottoman Turkish terms suffixed with ـسز" was ignored since it is not equivalent to the page's actual title.

The correct page is Category:Ottoman Turkish terms suffixed with ـسز, which is empty. The convention for Ottoman Turkish at least, per a discussion some years ago, is to write the suffix as ـسز (“without”) distinct from the etymologically unrelated سز (“you”). Vox Sciurorum (talk) 17:11, 25 April 2023 (UTC)[reply]

@Vox Sciurorum This was due to a change made by User:Fenakhay, which I have reverted because of the breakage it introduces. Benwing2 (talk) 04:21, 26 April 2023 (UTC)[reply]

Temporarily remove inactive users from Babel categories?

Someone suggested this years ago, and some people supported it and others—including me—were sceptical/opposed, but I've come to think it's a good idea: could we write a bot to go through Babel categories, and for any user who hasn't edited in N months, HTML comment-out their Babel template or add a category-suppressing parameter to it? This would make it easier to use the categories to find recently-active users to contact when there are questions about particular languages. Bonus points if the bot, performing the check every few months, could restore categories to users who become active again (if not, the bot's edit summary could tell the user to unsuppress themselves once they returned to activity and noticed its edit to their page). (Optional bonus points if the bot could check users' global activity, since in theory someone inactive here but active on ko.Wikt would still get a notification and come here if pinged with a question about some Korean word.) Or even if we don't write a bot to do this, should we allow people to manually comment-out inactive users in whatever language categories they regularly check? The one downside I can see is that if there aren't any active editors in a given language, decategorizing them means there are no leads if a question about the language comes up; as it is, if I go to a category and the only user is inactive, I might at least be able to get a response if I e-mailed them. An alternative idea is to not remove inactive users from the categories but move them to the end by sorting them under a string like "⏳ inactive"+their username (or something). - -sche (discuss) 00:33, 27 April 2023 (UTC)[reply]
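As a rough sketch of the "move them to the end" alternative: the Python below computes a sort key that prefixes inactive users with the "⏳ inactive" string suggested above. The function name, the 12-month default, the 30-day approximation of a month and the sample usernames are all assumptions, and plain code-point sorting here only approximates whatever collation the category pages actually use.

```python
from datetime import datetime, timedelta

def babel_sort_key(username, last_edit, now, months=12):
    """Sort inactive users to the end of a Babel category (hypothetical helper)."""
    # Approximate N months as N * 30 days; a real bot would pick its own rule.
    cutoff = now - timedelta(days=30 * months)
    if last_edit < cutoff:
        return "⏳ inactive " + username
    return username

now = datetime(2023, 4, 27)
users = {
    "Aardvark": datetime(2019, 1, 1),  # long inactive
    "Zebra": datetime(2023, 4, 1),     # recently active
}
ordered = sorted(users, key=lambda u: babel_sort_key(u, users[u], now))
print(ordered)  # ['Zebra', 'Aardvark']
```

A check of global activity, as suggested in the post, would only change where last_edit comes from.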

Any of these would be good. I'd prefer that we don't make it too hard to find inactive editors of a given language. Activity in other wikis should count. The special icon and moving to the bottom idea appeal to me. DCDuring (talk) 19:17, 27 April 2023 (UTC)[reply]

`{{wikipedia}}` and `{{hot word}}` broken on mobile

On narrow screens, these two templates interfere with headers and images, turning the page into a mess: example

The issue appears to be that the HTML elements are floated, which means they don't get their own line on the page. This works fine on wide screens, where there's plenty of room for multiple elements to share the same line. On narrow screens, everything gets squashed.

To fix this issue, I was looking into editing MediaWiki:Mobile.css, but it seems like I need to be an interface administrator to make edits. If someone could look into this or just give me the permissions, it would be much appreciated. Ioaxxere (talk) 04:25, 27 April 2023 (UTC)[reply]

Thanks, I've been complaining about {{wikipedia}} for a long time, but people seem to like it that way. Do you want to disable floating? Maybe the box can be shrunk a bit as well. Unfortunately I don't have permissions either. Jberkel 21:50, 1 May 2023 (UTC)[reply]

Spaces attribute for Scripts

@Benwing2, Theknightwho Where is the discussion of the addition (on 4 March 2023) of the unreliable, undefined attribute spaces to Module:scripts/data? --RichardW57m (talk) 16:16, 27 April 2023 (UTC)[reply]

Words used in definitions?

Can we make a list of words we don't have English or Translingual entries for, but which are used in "gloss" definitions, or definitions of English entries, whatever best avoids getting results like avertir (non-English lemma) due to its inclusion in the definition of avertissent (its own inflected form, which defines itself as an inflected form of the French word avertir)? To find typos and misspellings to correct as well as entries to add. A similar project was undertaken by User:Beland (who may be able to do/assist with this), see his talk page, but it searched everything including quotations, so was swamped by obsolete spellings in quotes. - -sche (discuss) 12:09, 29 April 2023 (UTC)[reply]

Also would reveal how many words in our definitions are poor choices for non-specialist entries in a dictionary for normal users. The software needed might also be a useful component of more elaborate software to detect definition circularity, though we might want to track rather than suppress avertir/avertissement/avertissent/ […] for that purpose. DCDuring (talk) 14:25, 29 April 2023 (UTC)[reply]

@-sche, DCDuring: I'm not sure exactly what's being requested. Would you like me to do a version of Wiktionary:Spell check/likely misspellings that excludes quotations? If so, how can I tell what's a quotation from the wikitext; do I just ignore all lines starting with "#*"?

avertir would not be included both because that page exists, and because avertissent only uses it inside a template (templates are ignored).

I was also under the impression that archaic spellings are still included in Wiktionary; seems like entries for those should be easy to make because they just point at the modern spelling? -- Beland (talk) 03:29, 1 May 2023 (UTC)[reply]

Well, it was easy to do a quick run excluding lines beginning with "#*", and the results do seem to be more highly concentrated with misspellings. Posted at: Wiktionary:Spell check/likely misspellings#T1+ASCII from 2022-04-20 dump excluding quotations.
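The quotation filter described here can be sketched in a few lines of Python. This is not the actual spell-check code; the function name is hypothetical, and extending the rule from "#*" to nested "##*" lines is an assumption.

```python
import re

# One or more "#" followed by "*" at the start of a line (assumed to mark a quotation).
QUOTATION_LINE = re.compile(r"#+\*")


def strip_quotation_lines(wikitext):
    """Return the wikitext without quotation lines.

    A quotation line is taken to be any line starting with one or more "#"
    followed by "*", which covers "#*", "#*:" and nested "##*".
    """
    kept = [line for line in wikitext.splitlines() if not QUOTATION_LINE.match(line)]
    return "\n".join(kept)
```

Definition lines ("#") and usage-example lines ("#:") pass through unchanged, so only wikivoice text and examples reach the spell check.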

DCDuring, I'm not sure if you mean circular definitions in the sense of the term itself being repeated in the definition, or definitions of multiple terms that are defined only by each other. These spell checks treat different pages highly independently, which is great for parallelization, and so far I haven't bothered distinguishing English from non-English words. Tracking circularity of definitions across multiple pages would be a rather different calculation that would require very different access patterns. Some low-level moss code might be helpful in parsing Wiktionary dumps, but after that it would probably require building something from scratch. Detecting single-page circularity would require less infrastructure, but I'm not sure it's easy for an algorithm to do compared to how easy it is for a human to know they are confused. Not sure we have enough traffic to crowdsource identification of circular definitions, but perhaps I could come up with some heuristics if you had a few examples of existing circular definitions. -- Beland (talk) 09:28, 1 May 2023 (UTC)[reply]

I realized after I posted that there would be some giant steps to get to what I seek. I was hitching a ride on the discussion because I thought that the same urge to improve definition quality that motivated the initial request might draw folks to more challenging tasks. I am interested here only in the problems of English definitions. Non-English L2 section definitions seem most to suffer from disambiguation problems due to use of highly polysemic words in glosses.

One class of circularity occurs in definitions taken from MW1913. Common definitions from there of common words include a list of synonyms, usually also common and polysemic. See keen Adj. defs. 1-6 and sharp. Some only have multiple synonyms as definitions. I suppose that an initial filter would be the occurrence in a definition of three or more (wikilinked?) words, separated by commas or semicolons (and "and" or "or").

As for other types, hypothetically like sharp being defined as "keen" and keen as "sharp", limiting the start of searches for such circularity to terms with a single definition might be practical. DCDuring (talk) 11:58, 1 May 2023 (UTC)[reply]
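Both heuristics proposed above can be sketched in Python under some loud assumptions: definitions arrive as plain text with markup already stripped, terms are lowercase, and a "short item" means at most two words. The function names and that two-word limit are illustrative choices, not anything agreed in this thread.

```python
import re

# Commas, semicolons, and the standalone words "and" / "or" all count as separators.
SEPARATORS = re.compile(r"\s*(?:[,;]|\band\b|\bor\b)\s*")


def looks_like_synonym_list(definition, minimum=3):
    """True if the definition is `minimum` or more short items in a row."""
    items = [item for item in SEPARATORS.split(definition.strip(" .")) if item]
    return len(items) >= minimum and all(len(item.split()) <= 2 for item in items)


def mutual_pairs(single_definitions):
    """Find pairs (a, b) where a is defined only as b and b only as a.

    `single_definitions` maps each term that has exactly one definition
    to the text of that definition.
    """
    pairs = set()
    for term, definition in single_definitions.items():
        other = definition.strip(" .").lower()
        back = single_definitions.get(other, "").strip(" .").lower()
        if other != term and back == term:
            pairs.add(tuple(sorted((term, other))))
    return pairs
```

Starting only from single-definition terms keeps the search to one dictionary lookup per term, which is why that restriction makes the mutual-definition check cheap.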

See also {{rfeq}} Vininn126 (talk) 10:07, 1 May 2023 (UTC)[reply]

Thanks for the list excluding quotations, Beland! :) To answer your earlier question / explain what I was looking for and why: basically, I was looking for a list of higher-priority issues. Wiktionary:Spell check/likely misspellings (including words in quotations) is very useful towards the ultimate goal of making Wiktionary complete, and I'm thankful for it, but a (separate) list of priority cases would differ from it in several ways. Wiktionary does include archaic spellings if they're attested, but they're not always (sufficiently) attested: there may be three different quotations that all use a word like narrative, and hence those quotations will be present in the entry narrative and potentially the entries of every other attested word the quotations use (e.g. the same Dickens quote Citations: "and touched them with its freshest tints" has been pasted into the entries or citations pages of fourteen of the words it uses; any words in that quote which lacked entries would, AFAICT, show up near the top of Wiktionary:Spell check/likely misspellings)... but some words in the quotation, e.g. camaraderial (amingos, etc), may not be (otherwise/sufficiently) attested. Many of the words in the quotation-inclusive section of Wiktionary:Spell check/likely misspellings are spellings from old works, often not in English, which therefore have to be checked (individually, slowly) by someone who knows enough about each language, e.g. 13th century Galician or 15th century French, to determine whether the hits for e.g. google books:enjames are the same/relevant sense (making the word attestable), whether amingos is real or a misspelling (making it excludable), etc. In the long term process of making Wiktionary complete, determining which 13th century Galician words found in a quote are attested in enough other quotes to be included is indeed work that will need to be done. 
But in the short term, a list excluding quotations / one restricted to words used "in wikivoice" — where (normally) only English (or Translingual) words should be used — that don't exist as entries with ==English== or ==Translingual== sections, would be useful... as you've now found. :) Btw, if all templates are ignored, that seems suboptimal: ideally anything used as (at least) the second parameter (if not just any parameter other than the 1st parameter) of the very common linking templates {{m|en|2ndPARAMETER}}, {{l|en|2ndPARAMETER}} or as the first parameter of {{1|1stPARAMETER}} that doesn't have an English entry should also be found (that, with "en", is if we're looking for things that should have English entries and don't). - -sche (discuss) 21:11, 1 May 2023 (UTC)[reply]
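As a rough Python illustration of the template point, the second positional parameter of {{m|en|...}} and {{l|en|...}} can be pulled out with a regular expression. This is only a sketch: the names are hypothetical, and it does not handle {{1|...}}, nested templates, or named parameters placed in the second slot.

```python
import re

# {{m|en|term}} or {{l|en|term|...}}: capture the 2nd positional parameter.
LINK_TEMPLATE = re.compile(
    r"\{\{\s*(?:m|l)\s*\|\s*en\s*\|\s*([^|{}]+?)\s*(?:\||\}\})"
)


def english_link_targets(wikitext):
    """Return the terms linked as English via {{m}} or {{l}}, in order of appearance."""
    return [match.group(1) for match in LINK_TEMPLATE.finditer(wikitext)]
```

Each returned term could then be checked against the set of pages that have an English or Translingual section, the same way plain wikilinked words are checked.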

It should be possible to get this from the HTML dumps, and ignoring languages other than English is easy: they are marked with the HTML lang= attribute (if wrapped in a template). Jberkel 21:43, 1 May 2023 (UTC)[reply]

We probably want to lump Translingual into English for the purposes of this, since any species name etc. used in wikivoice will have a Translingual rather than an English entry, so a list of things used in wikivoice and lacking an English entry will be swamped with taxonomic names unless we also count things with Translingual entries. (OTOH, IMO we'd want to find out if, say, "foobarisers" is used in wikivoice, and is a bluelink / exists as an entry, but only because it's a French entry, with no English entry.) - -sche (discuss) 22:20, 1 May 2023 (UTC)[reply]

Ok, I'll take a look once the stats and wanted pages are generated. However, the "enterprise" dumps are broken, again :( phab:T335761. Jberkel 16:17, 2 May 2023 (UTC)[reply]

Thanks for the very detailed elucidations! It will take me a bit to find time to tweak my code for Wiktionary-specific templates and things and implement the proposed heuristics to detect circular definitions. In the meantime, it looks like several editors have jumped in (yay!) and fixed all the obvious misspellings my first no-quotation run detected! This is kind of amazing, because over on English Wikipedia we have always had at least a year of backlogged typos. I checked the other categories the system sorts typos into, and found that some of them had many additional misspellings and easy-to-fix missing English entries. I posted updated lists at Wiktionary:Spell check/likely misspellings#More likely typos from 2023-04-20 dump so people can keep fixing high-priority actual misspellings. I also re-ran the with-quotations algorithm across more typo categories, and created a separate page for the slower work of filling in missing entries for all languages. The top 50 are now posted at Wiktionary:Spell check/most wanted if anyone wants to peck at those. I'll probably find more after Wiktionary-specific tweaks to the parser, but the current results already look useful. -- Beland (talk) 23:31, 2 May 2023 (UTC)[reply]

Many of the wanted terms are typographic variations of ſ/s (theſe etc.), perhaps they should be filtered? Or created with {{alt typ}}? I don't remember what we agreed on. – Jberkel 20:27, 15 May 2023 (UTC)[reply]

ſo what? It's a legitimate, even an important, spelling variation, isn't it? Otherwise we wouldn't make our quotations hard to read. DCDuring (talk) 00:30, 16 May 2023 (UTC)[reply]

As on Wikipedia, long s is automatically redirected (without even an explicit "hard redirect") to s; forms with long s are never entered as entries. The list should convert long s to s (and hence not call for adding theſe, which is an invalid entry title, if these exists; but it would be fine to call for, say, "sodaizations" if there are quotes about "ſodaizations" and we have no entry "sodaizations"). - -sche (discuss) 04:40, 16 May 2023 (UTC)[reply]
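The conversion asked for here could look something like the following Python sketch. The function names are hypothetical, and `existing_titles` stands in for whatever set of entry titles the list generator already has.

```python
def normalize_long_s(title):
    """Replace every long s (ſ) with a plain s."""
    return title.replace("ſ", "s")


def wanted_titles(candidates, existing_titles):
    """Normalize candidates, then keep those with no existing entry.

    Duplicates that collapse to the same title after normalization are
    reported once, in first-seen order.
    """
    wanted = []
    seen = set()
    for candidate in candidates:
        title = normalize_long_s(candidate)
        if title not in existing_titles and title not in seen:
            seen.add(title)
            wanted.append(title)
    return wanted
```

With these as an existing entry, theſe drops out of the list, while ſodaizations is reported as the missing title sodaizations.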

I guess we have long s in quotations for the same kind of reason that we have publisher's addresses and multiline subtitles in citations. I wish I knew why that kind of reason is appropriate for Wiktionary. DCDuring (talk) 12:10, 16 May 2023 (UTC)[reply]

It's probably a combination of nerdery and data fidelity: "academically" transcribed sources preserve even more typographic detail (such as hyphenation). The data can always be converted/normalized/reduced later, if needed. – Jberkel 12:24, 16 May 2023 (UTC)[reply]

We often have things like this memorialized in RQ templates. DCDuring (talk) 14:11, 16 May 2023 (UTC)[reply]

(Happy to register my own opposition to long s in quotations and wildly belaboured citation info while we're at it. —Al-Muqanna المقنع (talk) 17:10, 19 July 2023 (UTC))[reply]

Bot suggestions

Hey, I have two ideas that could be performed by a bot, respectively regarding Dutch and French entries.

Dutch entries often lack specification of whether they are used in the Netherlands or Belgium. On the Dutch Wiktionary, one can find extensive prevalence categories of what % of Dutch/Flemish people know a certain word. A large discrepancy in the knowledge % very likely indicates that one word is only common in one of the two regions. I don't know what threshold can be used, but there is certainly bot potential in this, I can just feel it.
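To make the threshold idea concrete, here is a small Python sketch. The 30-point gap is a placeholder assumption (no threshold has been settled above), and the function name and the percentage inputs are hypothetical stand-ins for the prevalence data on the Dutch Wiktionary.

```python
def region_label(nl_percent, be_percent, threshold=30):
    """Suggest a regional label from two prevalence percentages.

    Returns "Netherlands" or "Belgium" when one group knows the word much
    better than the other, and None when the gap is below the threshold.
    """
    gap = nl_percent - be_percent
    if gap >= threshold:
        return "Netherlands"
    if -gap >= threshold:
        return "Belgium"
    return None
```

A bot would more safely list its suggestions for human review than add labels directly, since a gap in recognition does not by itself prove regional usage.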

Moreover, French entries very often lack an Etymology section. On the French Wiktionary, loanwords and so on are much more extensively documented, compare for example this and this. And many of the categorized French words are already present here; their entries are just not as detailed. Surely there is a way to code a bot to parse this information and indicate it. Synotia (talk) 20:09, 29 April 2023 (UTC)[reply]

A new template for Old Tupi

Hi, I assume I can ask here for help creating a template for Old Tupi nouns. It's about relational nouns. But I'm not an expert in Wiktionary template creation and I have never programmed in Lua or JavaScript before. NoKiAthami (talk) 21:08, 30 April 2023 (UTC)[reply]
