Wiktionary:Grease pit/2020/November

prōde as indeclinable adjective

Hi. I needed to identify this word, Latin prōde, as an indeclinable adjective but I couldn't figure out how. I looked at Template:la-adecl, linked to from Template:la-adj, and it seemed to suggest I could use "la-adj|prōde<+>" to do what I wanted to do (I quote: "A bare lemma is equivalent to the lemma followed by <+>."), but it doesn't seem to work at all. I'd appreciate it if you fixed this for me.--Ser be etre shi (talk) 07:17, 1 November 2020 (UTC)[reply]

The default method is just {{head|adjective|head=prōde}} {{q|indeclinable}}. You should never leave an entry without a headword template. Chuck Entz (talk) 07:44, 1 November 2020 (UTC)[reply]

@Ser be etre shi I implemented |indecl=1 a while ago for this purpose; unfortunately it wasn't documented. It now is. Benwing2 (talk) 16:51, 1 November 2020 (UTC)[reply]

Morphology tables for Kunwok verbs

Dear editors, my colleagues are studying the Kunwok dialect of this group w:Bininj Gun-Wok. They have information about the various forms of the verb, see the page.

Please, could you advise, which morphology template in Wiktionary we can take as an example to create a morphology template for Kunwok verbs? --Andrew Krizhanovsky (talk) 11:47, 1 November 2020 (UTC)[reply]

Help for a primitive module

Trying to make for el.wiktionary, probably the most primitive 'auto cat' ever seen. All was going well. But a problem occurred with one Category. Its title consists of two keywords taken from the language-data module. I cannot split the title and match its parts. I describe the problem at el:Module talk:yy. If it is not possible, we could just change this Category's name completely to a more manageable one. I was just wondering if it were possible -sorry to bother you with such questions- Thank you ‑‑Sarri.greek ^♫ | 12:38, 1 November 2020 (UTC)[reply]

@Sarri.greek Hi Sarri ... I think you're trying to pull out the two language names from a category like Νεοελληνικές λέξεις αγγλικής προέλευσης? Then you'd want something like this:

local receiver_lang, donor_lang = mw.ustring.match(category, "^(.+) λέξεις (.+) προέλευσης$")

After this, if category has the value "Νεοελληνικές λέξεις αγγλικής προέλευσης", then receiver_lang will have the value "Νεοελληνικές" and donor_lang will have the value "αγγλικής". Benwing2 (talk) 17:34, 1 November 2020 (UTC)[reply]
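
A minimal sketch of the same capture with a guard for titles that do not fit the pattern. It uses plain string.match so that it runs in a standalone Lua interpreter; inside a Scribunto module one would call mw.ustring.match as above. The function name split_origin_category is made up for illustration.

```lua
-- Plain-Lua sketch; split_origin_category is a hypothetical helper name.
-- string.match works bytewise, which is enough here because the pattern
-- only contains literal text and ".+".
local function split_origin_category(category)
	-- Returns receiver_lang, donor_lang, or nil when the title has another shape.
	return string.match(category, "^(.+) λέξεις (.+) προέλευσης$")
end

local receiver_lang, donor_lang =
	split_origin_category("Νεοελληνικές λέξεις αγγλικής προέλευσης")
print(receiver_lang, donor_lang)

-- A title that does not fit the pattern yields nil, so test before using it.
local no_match = split_origin_category("Δάνεια από τα αγγλικά")
if no_match == nil then
	print("title does not follow the 'X λέξεις Y προέλευσης' scheme")
end
```

Checking for nil before using the captures avoids errors when the module is placed on a category whose name follows a different scheme.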

Thank you, thank you @Benwing2. I need help, review and examples like that for Ref#Captures because the expressions fluctuate (genitive, accusative etc.) and I need to match them with some data. The manual is difficult to apply, and it is my only lesson-source. I try to write down easy copy-paste applications in my notes, for example, notes for titles.

It would be so nice if real Luaers reviewed such notes. All lesson pages make an abrupt leap from 'Hello world' to very complicated things, without any intermediate examples. Describing does not help, because even a comma makes a difference. Erutuon recommended help from Wikipedia as well, and indeed Johnuniq and Trappist the monk gave me lots of tips. The only helpful course I found is this one @wikiversity, by prof. Dave Braunschweig (I asked for help for small wikis). The aim is not to teach Lua, but to offer copy-paste easy-to-apply modules for small wiktionaries. ‑‑Sarri.greek ^♫ | 00:07, 2 November 2020 (UTC)[reply]

@Sarri.greek Feel free to ask me questions. I agree that more examples would help; the manual basically assumes you have experience with another programming language, and it can be rough otherwise. Benwing2 (talk) 00:25, 2 November 2020 (UTC)[reply]

Thank you @Benwing2. It works fine with the above expression. I have a problem with el:Κατηγορία:Νεοελληνικές λέξεις προέλευσης από τη μέση άνω γερμανική where I need to subtract προέλευσης από τη μέση άνω γερμανική. ‑‑Sarri.greek ^♫ | 00:31, 2 November 2020 (UTC)[reply]

Never mind. I shall try a different combination. ‑‑Sarri.greek ^♫ | 00:34, 2 November 2020 (UTC)[reply]

The same problem at e.g. trial el:Module:yy line500 is that I cannot match the second part of e.g. el:Κατηγορία:Δάνεια από τα αγγλικά από τα αγγλικά to the el:Module:Languages.apota keyword. I did it manually, by subtracting the first words (Borrowings, Calques, etc) and assuming that what is left is the apota keyword. ‑‑Sarri.greek ^♫ | 01:38, 2 November 2020 (UTC)[reply]

@Sarri.greek I assume από τα αγγλικά means "from English"? I see that each language has an apota entry expressing how to say "from LANG" for that language. Are you trying to go from the από τα ... text to the language code, i.e. get the language code for a specific language given the category? In that case, you first have to build a table that maps from apota text to the language code. What you have in el:Module:Languages is the opposite, it's a map from language code to apota text. If this is what you're trying to do, then you first need to build a table like this:

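-- Reverse map: apota text ("from LANG") -> language code.
local apota_to_language_code = {}
for langcode, data in pairs(Languages) do
    -- Skip languages that have no apota entry.
    if data.apota then
        apota_to_language_code[data.apota] = langcode
    end
end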

Then you need to extract the apota text and look up the language code, something like this:

local apota_text = mw.ustring.match(category, " (από .*)$")
local apota_langcode = apota_to_language_code[apota_text]

This assumes that all apota text variants begin with από surrounded by spaces and that the apota text is always at the end of a category. If this isn't the case, then it gets harder, and we can figure out how to handle it depending on what if anything is always true about the apota text. Benwing2 (talk) 04:46, 2 November 2020 (UTC)[reply]

You are great, @Benwing2. I will study and try out all this. I have already tried some at el:Module:lang. There are so many expressions in inflectional languages. I can at least break them into even smaller parts. If needed we would have to change all keys. I will let you know what happened! PS -I do not know how to write captures, I write them wrong all the time-. ‑‑Sarri.greek ^♫ | 04:53, 2 November 2020 (UTC)[reply]

@Benwing, apota was not a problem. Already works fine at el:Module:yyline500+el:Module:langp.langapota_to_langiso. I will try some of your thoughts at titles with two keys, plus 2 variations per key with your style for captures. Thank you so much. ‑‑Sarri.greek ^♫ | 07:20, 2 November 2020 (UTC)[reply]

The problem is the keyword «from»: mw.ustring.match(cat_title, "^(.+) ???where does it stop? (.+) προέλευσης$") It does not know where to put boundaries unless it finds it somewhere, in order to match it. The expressions preceding it have the same problem: they are other keywords, also varying. I guess the whole thing is impossible. It is the fault of such a varying style of naming. I will propose to my bureaucrat a change of all these keywords. ‑‑Sarri.greek ^♫ | 08:09, 2 November 2020 (UTC)[reply]

@Sarri.greek It might be possible with the existing naming scheme but you might have to loop over all languages and check each one in turn, which isn't efficient if you have a lot of languages. Definitely better if the naming scheme is as consistent as possible. Benwing2 (talk) 15:19, 2 November 2020 (UTC)[reply]
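
A rough sketch of the "loop over all languages" approach mentioned above. The Languages table, its field names receiver and origin, and the helper names are all hypothetical stand-ins, not the actual layout of el:Module:Languages; the Greek keywords are taken from the category titles quoted in this thread.

```lua
-- Hypothetical data: 'receiver' is the adjective that opens the title,
-- 'origin' is the phrase that closes it. Real field names will differ.
local Languages = {
	el  = { receiver = "Νεοελληνικές" },
	en  = { origin = "αγγλικής προέλευσης" },
	gmh = { origin = "προέλευσης από τη μέση άνω γερμανική" },
}

local function starts_with(s, prefix)
	return s:sub(1, #prefix) == prefix
end

local function ends_with(s, suffix)
	return s:sub(-#suffix) == suffix
end

-- Try every language in turn; cost grows with the number of languages.
local function find_codes(title)
	local receiver_code, donor_code
	for code, data in pairs(Languages) do
		if data.receiver and starts_with(title, data.receiver .. " λέξεις ") then
			receiver_code = code
		end
		if data.origin and ends_with(title, " λέξεις " .. data.origin) then
			donor_code = code
		end
	end
	return receiver_code, donor_code
end

print(find_codes("Νεοελληνικές λέξεις αγγλικής προέλευσης"))
print(find_codes("Νεοελληνικές λέξεις προέλευσης από τη μέση άνω γερμανική"))
```

Because each keyword is compared as a whole, no capture has to guess where one keyword ends and the next begins. The price is one pass over the whole language table per category page.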

Yes yes Sir! @Benwing2, I am now trying to persuade them to change... :) We all thank you for your help ‑‑Sarri.greek ^♫ | 15:25, 2 November 2020 (UTC)[reply]

Module:ckb-pron

Could someone who understands pronunciation modules please take a look at Module:ckb-pron and figure out why it's suddenly started creating links to the pronunciations? For example, at مازوو (mazû), instead of outputting plain /maːzuː/, it's outputting /maːzuː/ as a link. The weird thing is that neither Module:ckb-pron nor {{ckb-IPA}} has been edited recently, but this behavior has only started within the last two weeks. —Mahāgaja · talk 13:20, 1 November 2020 (UTC)[reply]

@Mahagaja It's because of a change made by User:Fenakhay to Module:ckb-translit on Oct 21. I think the intent was to have translits linked similarly e.g. to what happens with Gothic, but this isn't the right way to do this. I undid the change and I'll poke around and see if I can figure out how the Gothic translit linking is happening. Benwing2 (talk) 17:02, 1 November 2020 (UTC)[reply]

There's a link_tr = true setting for Gothic in Module:languages/data3/g. This is implemented in Module:links. We can potentially add this to Module:languages/data3/c for Central Kurdish. Benwing2 (talk) 17:07, 1 November 2020 (UTC)[reply]

Automatic transclusion of English pronunciation symbols

One area where Wiktionary is still severely lacking is consistent and user-convenient inclusion of pronunciation guide material. While pronunciation files are much more difficult to produce in the first place, let alone to incorporate into entries and to align with standard phonology, phonetic alphabets can revolutionize the enrichment of Wiktionary with pronunciation data. To date, however, they can only be entered by individual contributors using the on-screen IPA keyboard, which I think is the largest barrier to widespread participation in creating pronunciation sections, especially for laypersons. At the same time, almost every entry on Merriam-Webster is accompanied by the site's specialized transcription, which can be converted, with full conservation, to IPA by a simple conversion table. Luckily, Merriam-Webster has a free API, too. Based on the above, I propose the creation of:

template:MerriamIPA to be used by individual contributors to automatically transclude US pronunciation characters and convert them into IPA. These can include both symbolic representations and audio files. Though, to incorporate the files would require further technical and legal compatibility with Commons to perform the extra steps to create an entry there.
MerriamIPA bot to search for pages lacking US pronunciation data and add it as appropriate. As a next proofreading step, to supplement regional pronunciations added as perceived by editors, it can compare already present IPA symbols for US pronunciation with standards from Merriam-Webster and add the latter if not found.

Unfortunately, I have neither the time nor the expertise required for such a project, but I do think that such a preliminary conception can be easily translated into a concrete piece of code by volunteers on the site. Please let me know your feedback. Assem Khidhr (talk) 15:38, 2 November 2020 (UTC)[reply]

Excellent idea. Let's hear what the bot guys have to say. -- Dentonius (my politics | talk) 15:54, 2 November 2020 (UTC)[reply]

@Assem Khidr, Dentonius I don't think we can legally do this. The API says this:

The Merriam-Webster Dictionary API is free as long as it is for non-commercial use, usage does not exceed 1000 queries per day per API key, and use is limited to two reference APIs.

I'm not quite sure what this means as I haven't looked into what "API key" means in this context, but I seriously doubt it would fly to start copying stuff into Wiktionary. Their goal is clearly to entice people into building their API into games and such so that the successful ones end up having to pay them for continued use, not to allow a free site like Wiktionary to leverage their work. Benwing2 (talk) 03:11, 4 November 2020 (UTC)[reply]

Anything restricted to noncommercial use isn't free enough for a Wikimedia project. —Mahāgaja · talk 14:43, 4 November 2020 (UTC)[reply]

@Benwing2, Mahagaja: I was thinking about this when I read the noncommercial condition on the API site, but as far as I know, content available under fair use terms is sometimes allowed in Wikimedia projects. Whether this would apply to a Wiktionary template, a Wiktionary bot, and pronunciation data, however, is beyond my knowledge at the moment. I'll try to consult the appropriate sources. Legalities aside, though, I was wondering about the technical feasibility of such a project. Once the vision is adopted and the technical readiness achieved, even if with significantly less comprehensive free equivalents, data availability might follow suit in the future. Assem Khidhr (talk) 15:42, 4 November 2020 (UTC)[reply]

template lb produces a wrong link

(Märkisch)

linking to w:Brandenburgisch dialect.

German Low German Buəter & Schæper for example are attested in Friedrich Woeste, "Eine Zwergsage. Mündlich .. in märk. [= märkischer] Mundart" and are from a Westphalian dialect, for which WP only has w:Westphalian language. --Der Zeitmeister (talk) 19:14, 2 November 2020 (UTC)[reply]

I guess Woeste is using a different definition of Märkisch than we are. If you're sure these words are Westphalian, then just label them Westphalian, rather than Märkisch. Does he say where in the Westphalian Sprachraum his "Märkisch" is/was spoken? —Mahāgaja · talk 16:43, 4 November 2020 (UTC)[reply]

@Mahagaja:

It's relating to Grafschaft Mark (en.wp) and Märkischer Kreis (en.wp); also compare märkisch-sauerländisch.
Westphalian is less specific, hence it's not good, especially as there are many quite different Westphalian dialects, such as: Westmünsterländisch, Münsterländisch, Lippisch, Paderbornisch, Soestisch, ...
Using an ambiguous term when there are common and less ambiguous or even unambiguous terms is a bad idea. For Brandenburgisch there are common and less ambiguous terms:
- Brandenburgisch
- Märkisch-Brandenburgisch (*Mark-Brandenburgisch might be better but AFAIK doesn't exist)
- Brandenburgish (English and not German; e.g. The Dialects of Modern German, The Oxford Handbook of Comparative Syntax, poor e-book edition).
- More specific terms like Nordmärkisch, Mittelmärkisch, Nordbrandenburgisch.

--Der Zeitmeister (talk) 10:25, 2 January 2021 (UTC)[reply]

@Der Zeitmeister: It seems both dialects get called "Märkisch", so we're probably best off avoiding that term altogether. I've no objection to calling the East Low German lect "Brandenburgisch". As for the Westphalian lect, maybe we could call it "Märkisch Westphalian"? Also, w:de:Westfälische Dialekte suggests that Westphalian can be divided roughly into four dialect groups or more narrowly into nine dialects, including Märkisch. Would it be good enough for us to use the four-group scheme instead, and assign Märkisch words to whichever group Märkisch belongs to (I assume it's South Westphalian but I'm not sure)? —Mahāgaja · talk 11:05, 2 January 2021 (UTC)[reply]

@Mahagaja: South Westphalian too is too unspecific. Hence Märkisch, Westfälisch-Märkisch or Märkisch-Sauerländisch should be used, with Brandenburgisch (or Brandenburgish, Brandenburgian, if any is a common name?) for the other.

BTW:

What can be found:

Märkisch-Westphalian gets only one or two Google Books results, but not related to languages ([1]).
märkisch-westfälisch gets a few results relating to languages ([2], [3]; borderline: [4]).
[5] gives some titles with westfälisch-märkisch, and there might be some titles with märkisch-sauerländisch (märkisch-sauerländ. is an abbreviation, märkisch-sauerländem probably a mistake lacking an -isch).
Westfälisch-Märkisch gets a few results ([6]; with lack of capital: [7], [8]).
Märkisch-Sauerländisch too ([9], [10], [11], [12]).

Possibilities:

Use an existing term, like Märkisch or Westfälisch-Märkisch.
Make up a translation of an existing term like Märkish or Markish, Westphalian-Markish. Not a good idea, and in English often the German terms are used, e.g. English Münsterländisch from German Münsterländisch and not Münsterlandish (or with Munster, -ic or -ian).
Make up a term, like *Märkisch-Westfälisch or *Markish-Westphalian. A bad idea, at least if there are (common) existing terms.

--Der Zeitmeister (talk) 19:00, 30 January 2021 (UTC)[reply]

@Der Zeitmeister: I do think using a German name is probably best, because there is certainly far too little English-language literature about this dialect for an English name to be established. And I reiterate that Märkisch alone is ambiguous, because Brandenburgisch is also called Märkisch. I could support "Westfälisch-Märkisch" or "Märkisch-Sauerländisch", though. —Mahāgaja · talk 19:12, 30 January 2021 (UTC)[reply]

Bot task: change hyphens to en dashes in defdate argument

There is a page Special:WhatLinksHere/Template:tracking/defdate/hyphen listing uses of {{defdate}} with hyphens. One is supposed to use an en dash instead of a hyphen to separate centuries. Most of these can be mechanically converted by looking for the pattern digit,digit,t,h,hyphen,digit,digit in the argument. Alternatively, the template itself could do the work. See next post. Vox Sciurorum (talk) 19:26, 2 November 2020 (UTC)[reply]
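
The mechanical conversion described above could be sketched roughly as follows. The function name fix_century_hyphen is made up for illustration, and the pattern is exactly the one named in the post (two digits, "th", hyphen, two digits), so ranges such as "1st-3rd" are deliberately left alone.

```lua
-- U+2013 EN DASH written as its UTF-8 bytes, so the source stays ASCII-safe.
local EN_DASH = "\226\128\147"

-- Hypothetical helper: replace the hyphen in "NNth-NN..." with an en dash.
local function fix_century_hyphen(text)
	-- The extra parentheses drop gsub's second return value (the match count).
	return (text:gsub("(%d%dth)%-(%d%d)", "%1" .. EN_DASH .. "%2"))
end

print(fix_century_hyphen("from 17th-19th c."))
print(fix_century_hyphen("well-known since the 17th c."))  -- unchanged
```

Anchoring the replacement on the digits and "th" keeps ordinary hyphens elsewhere in the argument untouched.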

Improving defdate

On French Wiktionary there is a template named siècle that takes a century number (in Roman numerals) as argument and adds the appropriate text around it, e.g. {{siècle|XX}} yields (XXe siècle). We could improve the most common use of defdate by adding a template century with one or two arguments. With one argument it generates the equivalent of [from 17th c.]. With two arguments it generates a range using the en dash that nobody likes to type, [17th–19th c.]. Thoughts? Vox Sciurorum (talk) 19:31, 2 November 2020 (UTC)[reply]

Good idea, it has been brought up before, but nothing came out of it: Wiktionary:Beer parlour/2018/November § Making Template:defdate more machine-readable. – Jberkel 21:25, 2 November 2020 (UTC)[reply]

I created {{century}}. {{century|3|21}} = [3rd–21st c.]. I will attempt to document it. Vox Sciurorum (talk) 23:57, 2 November 2020 (UTC)[reply]
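
For readers who want to see the intended output spelled out, here is a small Lua sketch that produces the two formats described in this thread. It is not the implementation of {{century}}, which may well be plain wikitext; the function names ordinal and century are illustrative only.

```lua
-- English ordinal for a positive whole number: 1st, 2nd, 3rd, 4th, 11th, 21st...
local function ordinal(n)
	local suffix = "th"
	local last_two = n % 100
	if last_two < 11 or last_two > 13 then
		local last = n % 10
		if last == 1 then
			suffix = "st"
		elseif last == 2 then
			suffix = "nd"
		elseif last == 3 then
			suffix = "rd"
		end
	end
	return tostring(n) .. suffix
end

-- One argument: "[from 17th c.]"; two arguments: a range with an en dash.
local function century(from, to)
	if to then
		return "[" .. ordinal(from) .. "–" .. ordinal(to) .. " c.]"
	end
	return "[from " .. ordinal(from) .. " c.]"
end

print(century(17))
print(century(3, 21))
```

Generating the dash inside the helper means editors only ever type plain numbers.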

Seeing who linked to my just-created template turned up another discussion from 2009: Wiktionary:Beer_parlour/2009/September#Dating_and_subsenses. I like the idea suggested there that the obsolete label should say when the term became obsolete. And yet, the best is the enemy of the good. Vox Sciurorum (talk) 00:07, 3 November 2020 (UTC)[reply]

@Vox Sciurorum An alternative is to make {{defdate}} smarter, so that e.g. it autoconverts hyphens next to numbers to en-dashes, and automatically converts e.g. 17c -> 17th c., and autoconverts 17-19c -> 17th–19th c.. If you think this is the right approach, I can probably implement it. Benwing2 (talk) 05:36, 5 November 2020 (UTC)[reply]

I figured if it hadn't happened in two years it wasn't going to happen. Making {{defdate}} smarter is fine, if it doesn't use too much memory. Vox Sciurorum (talk) 14:35, 10 November 2020 (UTC)[reply]

Strip soft hyphens from links

I unknowingly cut and pasted a word containing U+00AD, a soft hyphen. The character is invisible so I didn't know I had it, but it caused what should have been a blue link to turn red: ursprung instead of ursprung. Soft hyphens should be stripped like diacritics are in Old English link (bisċoprīċe = biscoprice), and they should not be allowed in page names. Vox Sciurorum (talk) 19:05, 3 November 2020 (UTC)[reply]

@Vox Sciurorum: I've prohibited the soft hyphen in titles with a title blacklist entry. I get that it'd be convenient if it were automatically stripped from links formatted by Module:links, but I'm uncertain whether to do that because we mostly strip things that are supposed to be in the displayed text and I'm not sure we want soft hyphens. If we stripped soft hyphens, I think it would be done for all languages in makeEntryName in Module:languages. (Would be nice if the MediaWiki software would do it for us in wikilinks so that plain links would also automatically link to the soft-hyphen-less title.) — Eru·tuon 01:03, 4 November 2020 (UTC)[reply]
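
A minimal sketch of the stripping step itself, independent of where it would be hooked in. The function name strip_soft_hyphens is hypothetical; nothing here is taken from Module:links or Module:languages.

```lua
-- U+00AD SOFT HYPHEN is encoded in UTF-8 as the two bytes 0xC2 0xAD.
local SOFT_HYPHEN = "\194\173"

-- Hypothetical helper: remove every soft hyphen from a link target.
local function strip_soft_hyphens(text)
	-- Neither byte is a Lua pattern magic character, so the constant
	-- can be used directly as the pattern.
	return (text:gsub(SOFT_HYPHEN, ""))
end

local pasted = "ur" .. SOFT_HYPHEN .. "sprung"  -- looks like "ursprung" on screen
print(#pasted, #strip_soft_hyphens(pasted))
```

In valid UTF-8 the byte 0xC2 only ever appears as a lead byte, so this bytewise replacement should not damage other characters.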

Buggy Buginese Template

The template bug-noun isn't displaying Lontara spellings properly. --Apisite (talk) 09:10, 6 November 2020 (UTC)[reply]

memory errors

Anyone know any changes made recently resulting in more memory errors? Formerly using {{multitrans}} would reduce memory by around 13M on average, whereas recently it's not reducing as much memory. I didn't make any changes to the {{multitrans}} mechanism that would result in this. land and bat are now running over when recently, with {{multitrans}}, they were well within range. Benwing2 (talk) 07:32, 7 November 2020 (UTC)[reply]

It's sad to see how little support there is from the WMF regarding our memory problems. Phabricator tickets regularly get closed deferring to "on-wiki" solutions, a ticket regarding better memory profiling support (so we can fix on-site) doesn't move. Plus we're stuck with an ancient version of Lua which has some inherent memory management weaknesses. – Jberkel 10:46, 11 November 2020 (UTC)[reply]

Do all the things that are done in real time have to be done in real time? Some content may never change if done correctly the first time. Some templated content could be substed. We could decide not to have all translations downloaded with the rest of the entry. We could offload content such as translations to Wikidata and only load the translations on demand. Do we really need {{l}} rather than {{ll}}? DCDuring (talk) 22:53, 11 November 2020 (UTC)[reply]

@DCDuring The reason for doing some of these in real time and not substing is to avoid things getting messed up or out of sync if the underlying code that the subst is based on is changed. I would rather first see if we can get more creative with memory usage. For example, I was able to cut around 6MB on average out of pages with large translation tables by changing the way the language data is loaded, and below I propose splitting the language data further for more memory savings. Benwing2 (talk) 02:20, 16 November 2020 (UTC)[reply]

The "substing" should be done on a different layer, and already is to some degree: the transcluded pages are cached and don't have to be re-rendered, unless dependent modules change. I like the idea of moving some data to Wikidata; it's better suited for the job than the huge blobs of JSON/mediawiki markup and can be queried as needed. – Jberkel 08:25, 16 November 2020 (UTC)[reply]

It's a high price we will be paying for the advantage. Shifting the cost to WM by increasing the memory limit to, say, 75 MB is nice for us, but isn't that a cost for Wikimedia that is proportional to the number of Wiktionary users online at a particular time? I assume that is on SSDs. Do we know how many users and how many Wiktionary pages they have open on average? Is it just windows that have focus that matter? And how many items of relevant content get changed in a day? How much would relying on Wikidata slow things down compared to our current Lua-module-dependent approach? DCDuring (talk) 21:43, 16 November 2020 (UTC)[reply]

The limit is on RAM used to regenerate a page (generally in response to an edit or template/module change). The generated page which consumes SSD space is smaller. Vox Sciurorum (talk) 22:18, 16 November 2020 (UTC)[reply]

In any event any cleverness in working to operate within the current limit is greatly appreciated. DCDuring (talk) 21:43, 16 November 2020 (UTC)[reply]

strange result from template:zh-see on 库

Saw something leak out of using {{zh-see|庫}} at bottom of page 库. It is showing (bold added)

For pronunciation and definitions of 库 – see 庫 (“warehouse; storehouse; {{zh-short|庫侖|coulomb; etc.”).

whereas for another use like 车 we see a more benign

For pronunciation and definitions of 车 – see 車 (“chariot; cart; land wheeled vehicle; car; etc.”)

Not having looked at this before, apparently the Module:zh-see is simply sucking up and repeating the initial definitions from the destination page? At page 庫 we see:

# [[warehouse]]; [[storehouse]]

# {{lb|zh|physics}} {{zh-short|庫侖|[[coulomb]] {{gloss|unit of electrical charge}}}}

# {{lb|zh|programming}} [[library]]

So is this module not understanding the use of templates within the initial definition strings? But wait, at 車 we see the early use of templates in initial definitions ( {{lb}} ):

# {{†}} [[chariot]]; [[cart]]

# land [[wheeled]] [[vehicle]]; {{lb|zh|specifically}} [[car]] {{zh-mw|輛|部|臺|c:架|mn:頂}}

So... is it that Module:zh-see is only looking at the first NN characters, and not noticing an unterminated template reference? The second example 車 has just about 100 chars to get past the included "[[car]]". Checking only 100 chars in the first example 庫 would leave us in the middle of the zh-short template. Leaving us pretty ugly looking, yep. Shenme (talk) 22:06, 7 November 2020 (UTC)[reply]
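
To make the suspected failure mode concrete, here is a small sketch of a check for an unterminated template in an already-truncated string. It does not show what Module:zh-see actually does, and the 100-character cutoff remains a guess from the post above; the helper names are invented.

```lua
-- Count non-overlapping occurrences of a literal substring.
local function count_plain(text, needle)
	local n, pos = 0, 1
	while true do
		local s, e = text:find(needle, pos, true)  -- true = plain search
		if not s then
			return n
		end
		n = n + 1
		pos = e + 1
	end
end

-- True when more templates are opened than closed in the fragment.
local function has_unclosed_template(text)
	return count_plain(text, "{{") > count_plain(text, "}}")
end

-- A fragment cut off inside {{zh-short|...}}, as in the 库 example.
print(has_unclosed_template("[[warehouse]]; {{zh-short|庫侖|[[coulomb]]"))
-- A complete definition line.
print(has_unclosed_template("{{lb|zh|programming}} [[library]]"))
```

A module that truncates definition text could use a check of this kind to back up to the last point where the braces balance, rather than emitting half a template.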

Issue with link to Latin -entia

If you go to the English entry at -ence and click on the Etymology link to Latin -entia, it takes you to Latin -ia, not to -entia. It seems there is a redirect from -entia to -ia, but why? Leasnam (talk) 23:03, 8 November 2020 (UTC)[reply]

This is not a Grease Pit issue. See Talk:-antia. Chuck Entz (talk) 23:22, 8 November 2020 (UTC)[reply]

en.wiktionary costs CPU cycles

Hi there,

my browser's task manager has lately been showing the English Wiktionary tab as gobbling up a lot of resources on my computer, regardless of what page I'm on (like this simple edit page). So either it's my browser, or Wikt. The French Wikt tab does not seem to require as much, although it's not far behind. —Jerome Potts (talk) 12:31, 9 November 2020 (UTC)[reply]

Pronunciation templates

Looking at the out of memory error on i, I found that removing the various generators of pronunciation, e.g. {{hu-IPA}} which calls Module:hu-pron, allowed several more languages to generate without error. I don't understand what causes the Lua evaluator to use memory, but perhaps there is an optimization to be done on the various IPA modules. Otherwise, what do people think about manually copying the auto-generated pronunciation to reduce module invocations on this large page? (I have a lot of experience debugging memory use in various languages with automatic and manual memory management, but I don't know Lua in particular.) Vox Sciurorum (talk) 14:34, 10 November 2020 (UTC)[reply]

Bug with multiword terms category for unsupported titles?

Unsupported_titles/Colon_slash automatically went into Category:Translingual multiword terms. This seems like a mistake. Equinox ◑ 22:36, 10 November 2020 (UTC)[reply]

I think it's because Module:links/data (which is fully protected) needs to be updated to turn the title from "Colon_slash" to ":/" – Nixinova [‌T|C] 22:52, 10 November 2020 (UTC)[reply]

I've added it to Module:links/data, but that doesn't remove it from the category. — Eru·tuon 23:15, 10 November 2020 (UTC)[reply]

Oh, it happens to all unsupported titles as well -- Category:Translingual multiword terms. Definitely an issue with the categorisation template(s). Thought it was just an issue with this new page. – Nixinova [‌T|C] 23:20, 10 November 2020 (UTC)[reply]

@Nixinova, Erutuon, Equinox I'll look into this and fix it. Benwing2 (talk) 01:31, 12 November 2020 (UTC)[reply]

I disabled multiword terms in Translingual; too many false positives of various sorts. Benwing2 (talk) 03:48, 12 November 2020 (UTC)[reply]

English multiword terms?

And speaking of the new "multiword term" categories, why are there no CAT:English multiword terms? —Mahāgaja · talk 22:46, 10 November 2020 (UTC)[reply]

@Mahagaja This is because I added English to the no_multiword_cat list in Module:headword/data. My logic was that there are so many such entries that it's not clear the category is useful; but if you think the category will be useful, I can remove English from the list. Benwing2 (talk) 01:34, 12 November 2020 (UTC)[reply]

I took English out of the list. Benwing2 (talk) 03:48, 12 November 2020 (UTC)[reply]

Special:BookSources obsolete link

I assume one needs some sort of elevated privileges to fix this... clicking on the U.S. Library of Congress link for an ISBN number results in the error "LC Catalog - Legacy Interface Retired"; the correct current URL appears to be

https://catalog.loc.gov/vwebv/search?searchCode=STNO&searchArg=【ISBN】&searchType=1&limitTo=none&fromYear=&toYear=&limitTo=LOCA%3Dall&limitTo=PLAC%3Dall&limitTo=TYPE%3Dall&limitTo=LANG%3Dall&recCount=25&page.search.search.button=Search

...with "【ISBN】" replaced by the ISBN number. --Struthious Bandersnatch (talk) 19:11, 11 November 2020 (UTC)[reply]
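
As an illustration of the substitution only, the URL above can be treated as a template string with the placeholder replaced. The function name loc_url and the clean-up of hyphens and spaces are assumptions of this sketch, not something the catalog is known to require, and the ISBN used is just a sample value.

```lua
-- The URL quoted above, with the 【ISBN】 placeholder left in place.
local LOC_TEMPLATE = "https://catalog.loc.gov/vwebv/search?searchCode=STNO"
	.. "&searchArg=【ISBN】&searchType=1&limitTo=none&fromYear=&toYear="
	.. "&limitTo=LOCA%3Dall&limitTo=PLAC%3Dall&limitTo=TYPE%3Dall"
	.. "&limitTo=LANG%3Dall&recCount=25&page.search.search.button=Search"

-- Hypothetical helper: build the search URL for one ISBN.
local function loc_url(isbn)
	-- Assumption: keep only digits and the ISBN-10 check character X.
	local cleaned = isbn:gsub("[^%dXx]", "")
	-- A function replacement avoids any special treatment of "%" in gsub.
	return (LOC_TEMPLATE:gsub("【ISBN】", function() return cleaned end))
end

print(loc_url("0-306-40615-2"))
```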

mod:ja and lua memory

It seems that a copy of mod:ja/data is made every time mod:ja is loaded:

local data = mw.loadData("Module:ja/data")

--[[...]]

export.data = {
	joyo_kanji = data.joyo_kanji,
	jinmeiyo_kanji = data.jinmeiyo_kanji,
	grade1 = data.grade1,
	grade2 = data.grade2,
	grade3 = data.grade3,
	grade4 = data.grade4,
	grade5 = data.grade5,
	grade6 = data.grade6
}

--[[...]]

local grade1_pattern = ('[' .. data.grade1 .. ']')
local grade2_pattern = ('[' .. data.grade2 .. ']')
local grade3_pattern = ('[' .. data.grade3 .. ']')
local grade4_pattern = ('[' .. data.grade4 .. ']')
local grade5_pattern = ('[' .. data.grade5 .. ']')
local grade6_pattern = ('[' .. data.grade6 .. ']')
local secondary_pattern = ('[' .. data.secondary .. ']')
local jinmeiyo_kanji_pattern = ('[' .. data.jinmeiyo_kanji .. ']')
local hyogaiji_pattern = ('[^' .. data.joyo_kanji .. data.jinmeiyo_kanji .. ']')

Is it possible to alleviate the memory problem by improving the code above? -- Huhu9001 (talk) 06:50, 12 November 2020 (UTC)[reply]

@Huhu9001: Unlike require, mw.loadData only generates the module table the first time it is called on a page, while every time it is called it generates a special table that allows the code to access the cached module table. The special table takes up memory every time mw.loadData is called, so we can save memory if any templates use Module:ja but don't call a function that accesses Module:ja/data. It can be tested by replacing local data = mw.loadData("Module:ja/data") in Module:ja with

local function load_only_if_needed(module_title)
	local module
	return function()
		if module == nil then
			module = mw.loadData(module_title)
		end
		return module
	end
end

local load_data = load_only_if_needed("Module:ja/data")

and then replacing data with load_data(). This will only load the module if needed, so it will save whatever memory is used for mw.loadData. This technique can be used for other mw.loadData-loaded modules too. — Eru·tuon 00:13, 15 November 2020 (UTC)[reply]
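
A standalone variant of the same idea, for trying it outside Scribunto. Here the loader is passed in as a parameter and fake_load_data stands in for mw.loadData; both of those choices, and the three-kanji sample string, are assumptions of this sketch rather than anything in Module:ja.

```lua
-- Same memoising wrapper, but the loader is injected so it can be stubbed.
local function load_only_if_needed(loader, module_title)
	local module
	return function()
		if module == nil then
			module = loader(module_title)
		end
		return module
	end
end

-- Stand-in for mw.loadData that counts how often it is really called.
local calls = 0
local function fake_load_data(title)
	calls = calls + 1
	return { grade1 = "一右雨" }  -- tiny sample, not the real data
end

local load_data = load_only_if_needed(fake_load_data, "Module:ja/data")
assert(calls == 0)  -- nothing is loaded when the wrapper is created

-- The first real use triggers the load; later uses reuse the cached table.
local grade1_pattern = "[" .. load_data().grade1 .. "]"
load_data()
print(grade1_pattern, calls)
```

One consequence of this design is that code running at the top level of a module, such as the pattern definitions quoted earlier in this thread, would still force the load as soon as the module is required. The deferral only helps if those uses are moved inside functions.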

I made the change to Module:ja described above and previewed 水 with the template sandbox extension (see the diff), but there was no difference in where the out-of-memory error showed up. Oh well. — Eru·tuon 00:18, 15 November 2020 (UTC)[reply]

splitting language data

Currently language data is split by the first letter. However, I think this is not fine enough. I suggest we split further, either by first+second letter or go all-out and put each language in its own module. The latter solution is used by various Asian-language modules and seems to work OK. Benwing2 (talk) 22:59, 14 November 2020 (UTC)[reply]
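
To give a feel for what a finer split means, here is a sketch that buckets a handful of language codes by prefix length. The helper name bucket_by_prefix and the sample list are illustrative only; no module naming scheme for the finer split is implied.

```lua
-- Group language codes by their first prefix_len characters.
local function bucket_by_prefix(codes, prefix_len)
	local buckets = {}
	for _, code in ipairs(codes) do
		local key = code:sub(1, prefix_len)
		buckets[key] = buckets[key] or {}
		table.insert(buckets[key], code)
	end
	return buckets
end

-- Small sample of three-letter codes.
local codes = { "kaa", "kab", "kbd", "kca", "got", "grc" }

local by_one = bucket_by_prefix(codes, 1)  -- current style: one bucket per letter
local by_two = bucket_by_prefix(codes, 2)  -- proposed: one bucket per two letters

print(#by_one.k, #by_one.g)
print(#by_two.ka, #by_two.kb, #by_two.kc, #by_two.go, #by_two.gr)
```

A page that needs only "kab" would then pull in the "ka" bucket rather than every code starting with "k", which is where the memory saving would come from.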

BTW for reference, I made a recent change to {{multitrans}} to only load language modules as needed and use mw.loadData() instead of require(), and it reduced land by 6 MB, which was enough to get it below the 50MB threshold. (It's not yet enabled on other pages that use {{multitrans}}.) So there is still a lot of room for optimization in the language modules. Benwing2 (talk) 23:05, 14 November 2020 (UTC)[reply]

Also, the {{multitrans}} solution can be adopted anywhere we have a large number of similar templates called. Benwing2 (talk) 23:06, 14 November 2020 (UTC)[reply]

@Erutuon, Rua Any thoughts about this? Benwing2 (talk) 03:34, 17 November 2020 (UTC)[reply]

I'm not completely opposed to splitting up the language data modules, but I feel like they'd be tedious to edit because there would be many more of them (using the two-letter solution, 602 according to this script, up from 28). We have variables like ACUTE, BREVE, PUNCTUATION that are used in multiple language data records, and they'd have to be copied to some of the child modules, or moved to fields in a helper module, because actual literal combining diacritics are hard to read and it's hard to ensure that the punctuation remains the same. Having a lot of modules also makes it more work to update multiple languages at once: more pages to edit.

At least, I have already felt a bit overwhelmed even with just 28 modules. At least the modules would be smaller, with no more than 112 languages in each module (currently the maximum is 624 in Module:languages/data3/k according to my "language stuff" page).

I'd prefer to edit the larger modules, and automatically create the smaller ones. A bot could dump the language data as Lua using the equivalent of require "Module:debug".dump. Trouble is some bot operators would have to maintain the bot and make sure it updates the modules quickly enough that people don't get annoyed that their changes aren't being reflected in pages. A benefit would be that we could split the language data modules however we like, while template editors wouldn't have to learn a new system.

In any case, to test this out, a bot would be the easiest method.

Historical note: Rua mentioned the idea of splitting the language data in Wiktionary:Grease pit/2019/March § Another optimization. Earlier, TheDaveRoss suggested it in Wiktionary:Grease pit/2018/February § light and Lua error: not enough memory, where I mentioned using a bot and DTLHS said that I should volunteer to maintain it. I hadn't written bot scripts then, but now I think I could at least write a script to create the modules; I don't yet know how to make a bot respond to certain pages being edited.

I was also thinking the bot could dump the data as JSON so that it's easier for gadgets and bots to use — if they often retrieve data for one or more languages. Then the smaller data modules could just be something like:

-- Data found in [[Module:languages/data/an.json]]
return require("Module:languages/load json")("an")

Just throwing this out there; I'm not sure if it's a good idea. But it'd be easy to do with a bot. — Eru·tuon 09:14, 17 November 2020 (UTC)[reply]
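For illustration, the two-letter split discussed above can be sketched in a few lines of Python. The record layout here (a dict with only a `name` field) is a simplified assumption, not the real structure of the language data modules:

```python
from collections import defaultdict

def split_by_prefix(language_data: dict, prefix_len: int = 2) -> dict:
    """Group language records by the first `prefix_len` characters of their code."""
    groups = defaultdict(dict)
    for code, record in language_data.items():
        groups[code[:prefix_len]][code] = record
    return dict(groups)

# Tiny hypothetical sample; real records carry many more fields.
sample = {
    "ang": {"name": "Old English"},
    "anp": {"name": "Angika"},
    "akk": {"name": "Akkadian"},
    "hu": {"name": "Hungarian"},
}

modules = split_by_prefix(sample)
# Size of each would-be child module, keyed by prefix.
sizes = {prefix: len(records) for prefix, records in modules.items()}
```

A bot could then write each group out as its own data module and use `max(sizes.values())` to check how large the biggest child module would be.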

Splitting into smaller modules by two letters would not work in the long run, if translation tables get larger and have more languages in them. Eventually, every language will be covered. On the other hand, splitting off the data points that translation tables need will reduce that problem. —Rua (mew) 09:38, 17 November 2020 (UTC)[reply]

@Erutuon Thanks for your comments. I get your point about it being hard to edit 600+ modules if you need to do it by hand. An alternative to having a bot script that autosplits the modules is to do it using Javascript, similar to how Module:languages/canonical names works. I don't know exactly how this is set up, but I imagine it isn't too hard for someone like you who knows Javascript. The Javascript would have to be smart enough to only make changes when needed so it doesn't take too long. If you can set up the basic structure, I can help you improve it; I've worked a bit with Javascript and can figure it out if the infrastructure is already in place. As for things like ACUTE, this can be handled automatically by the script that does the splitting; it doesn't actually need to create variables like ACUTE, but can just use something like u(0x0301) directly wherever needed (or even just inline the character directly), since people shouldn't be manually editing the auto-generated files. Benwing2 (talk) 01:58, 18 November 2020 (UTC)[reply]

@Rua The translation tables are actually not an issue currently, due to {{multitrans}} (based on a suggestion you made). The problem is more with pages like a, e, i, etc. that have a lot of entries for different languages on them, and here I think splitting the language tables by two characters would really help. I think your alternative suggestion is to split on the other dimension, e.g. split out certain fields like 'otherNames/varieties/aliases' and Wikimedia codes and such. This is definitely possible but it would introduce some of the same issues that splitting by two characters would introduce in terms of making it harder to do edits across modules, and would have an additional IMO negative effect in that the information on individual languages would be split across multiple files. Benwing2 (talk) 02:02, 18 November 2020 (UTC)[reply]

If we do all of this automatically, it'd be easy to create another set of modules with only some of the language data fields and see how that affects memory usage. I'm doubtful that it would improve things, but it doesn't hurt to try.

If we want to use both Python and JavaScript, it would be easiest to have a module that outputs the list of data modules that need to be saved. Then the script only has to call the module function and save the data modules. The module can decide which data modules need to be updated, which is easiest to do from inside the Wiktionary module system.

I already tried writing a sandbox module that would bundle the source code for the two-letter-prefix language data modules in JSON format, but it ran into the template include limit, and this limit also applies to the ExpandTemplates API, so the module has to break the response into chunks that are each under 2 MiB. Just recreated the test module at Module:User:Erutuon/split language data modules, but without chunking logic yet.

I guess chunking could lead to an inconsistent state in the split-up modules if someone edits another language module between the time one chunk is generated and the next is. The script could retrieve the latest revision timestamp of all the user-edited language data modules before and after the chunks are retrieved. If the before and after timestamps are not the same, it could keep retrieving the chunks, until a certain number of retries has been exceeded. — Eru·tuon 04:08, 19 November 2020 (UTC)[reply]
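The before-and-after timestamp check described above could be sketched roughly as follows in Python. Both callables are hypothetical placeholders: `get_timestamp` would return the latest revision timestamp across the hand-edited data modules, and `get_chunks` would retrieve all chunks from the generator module.

```python
from typing import Callable, List, Optional

def fetch_consistent_chunks(
    get_timestamp: Callable[[], str],
    get_chunks: Callable[[], List[str]],
    max_retries: int = 3,
) -> Optional[List[str]]:
    """Return chunks only if no source module changed while they were fetched."""
    for _ in range(max_retries):
        before = get_timestamp()
        chunks = get_chunks()
        after = get_timestamp()
        if before == after:
            # Nothing was edited during retrieval, so the chunks belong together.
            return chunks
    # Still inconsistent after every attempt; the caller decides what to do next.
    return None
```

Returning `None` rather than raising is just one possible design; a real bot might prefer to log the failure and try again on its next scheduled run.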

If the module just outputs the language data for each two-letter-prefix module as JSON, it comes well under 2 MiB. (There are a lot of \n and \t in pretty-printed Lua.) So either I need a Lua table pretty-printer for JavaScript or Python, or to just create Lua modules that parse a JSON string (return mw.text.jsonDecode [[ json here ]]) or JSON pages that are loaded by Lua modules. The latter currently requires changing the content model, because .json pages in the Module namespace are not automatically set to JSON, but that could be done with a script if an admin runs it every time a new JSON page needs to be created. The JSON string method seems the easiest because it doesn't involve Lua pretty-printing or content model changing. — Eru·tuon 10:28, 19 November 2020 (UTC)[reply]
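If the pretty-printing route were chosen after all, a small Lua table pretty-printer in Python is not much code. The following is only a sketch under stated assumptions: it handles `None`, booleans, numbers, strings, lists and dicts, writes every key in bracketed form, and makes no attempt to reproduce the hand-written style of the existing modules.

```python
def lua_string(text: str) -> str:
    """Quote a string as a double-quoted Lua string literal."""
    escapes = {"\\": "\\\\", '"': '\\"', "\n": "\\n", "\t": "\\t", "\r": "\\r"}
    parts = []
    for ch in text:
        if ch in escapes:
            parts.append(escapes[ch])
        elif ord(ch) < 32:
            # Other control characters use Lua's three-digit decimal escape.
            parts.append("\\%03d" % ord(ch))
        else:
            parts.append(ch)
    return '"' + "".join(parts) + '"'

def to_lua(value, indent: int = 0) -> str:
    """Serialize a JSON-like Python value as a tab-indented Lua table literal."""
    pad = "\t" * indent
    inner = "\t" * (indent + 1)
    if value is None:
        return "nil"
    if isinstance(value, bool):  # must come before the int check
        return "true" if value else "false"
    if isinstance(value, (int, float)):
        return repr(value)
    if isinstance(value, str):
        return lua_string(value)
    if isinstance(value, (list, tuple)):
        lines = [f"{inner}{to_lua(item, indent + 1)}," for item in value]
    elif isinstance(value, dict):
        lines = [
            f"{inner}[{to_lua(key)}] = {to_lua(item, indent + 1)},"
            for key, item in value.items()
        ]
    else:
        raise TypeError(f"cannot serialize {type(value).__name__}")
    if not lines:
        return "{}"
    return "{\n" + "\n".join(lines) + "\n" + pad + "}"
```

The per-line tabs and newlines in this output are exactly the overhead mentioned above, which is one reason the compact JSON string is smaller.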

I split the language data under a sandbox module using the JSON string method. I tried out using the alternative data modules by adding

if true then
	return "User:Erutuon/split language data modules/data/" .. code:sub(1, 2)
end

at the beginning of export.getDataModuleName in Module:languages and entering a title in the "Preview page with this template" below the edit box and clicking "Show preview". I think this makes {{multitrans}} use the split-up modules even though it doesn't load language data with Module:languages. This caused land and white to run out of memory, and fire and black and water to increase their memory usage somewhat. Not what I expected. — Eru·tuon 11:35, 24 November 2020 (UTC)[reply]

apihighlimits question

May I request apihighlimits to be enabled for my account? Once in a few months I'd like to use API:categorymembers on subcategories of Hungarian lemmas (or this category proper). Getting results by 200 or 500 and then having to combine them manually can get tedious when the total is closer to the order of 10,000s than to the thousands. Or I wonder if I could get a researcher bit (which includes apihighlimits) if this way is more convenient for you. Or shall I request it on behalf of my bot account instead? (This account hasn't been active on the English-language Wiktionary, but I made quite a few edits on the Hungarian Wikipedia around a decade ago.) Thank you in advance. Adam78 (talk) 18:44, 16 November 2020 (UTC)[reply]

The only user groups with the apihighlimits right in Special:ListGroupRights are administrator or bot, so you'd have to be one of those. If your bot account were botified here, you could download subcategories using that account.

But there might be a tool on ToolForge (see the directory) that does what you want, for instance PetScan.

You can also generate it from the dump (specifically from the files page.sql, categorylinks.sql, siteinfo-namespaces.json). I have a program that does this; it creates a JSON map from page title to array of categories that start with a certain prefix. Filtering by the prefix Hungarian_ should get all the pages in the subcategories of Category:Hungarian lemmas as well as some other categories. (The result looks like {"-a":["Hungarian_lemmas","Hungarian_suffixes",...],...}.) If that sounds useful for your purposes, I can figure out a way to get the JSON to you (it's 8.9 MB) or explain how to run the program if you want to be able to generate the file yourself. — Eru·tuon 01:01, 17 November 2020 (UTC)[reply]
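The shape of that title-to-categories map can be shown with a short Python sketch. This is not the program mentioned above; it assumes the (title, category) pairs have already been extracted from the dump files, and the sample rows are made up apart from the "-a" example:

```python
import json
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

def categories_by_title(
    rows: Iterable[Tuple[str, str]], prefix: str
) -> Dict[str, List[str]]:
    """Map each page title to its categories whose names start with `prefix`."""
    result = defaultdict(list)
    for title, category in rows:
        if category.startswith(prefix):
            result[title].append(category)
    return dict(result)

# Hypothetical rows, as if joined from page.sql and categorylinks.sql.
rows = [
    ("-a", "Hungarian_lemmas"),
    ("-a", "Hungarian_suffixes"),
    ("water", "English_lemmas"),
    ("víz", "Hungarian_lemmas"),
]

hungarian = categories_by_title(rows, "Hungarian_")
as_json = json.dumps(hungarian, ensure_ascii=False)
```

Pages with no matching category, such as the English-only row here, simply do not appear in the result.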

@Erutuon, thank you very much for your reply! PetScan seems to be perfect for all my purposes. – It's a bit odd though that when I tried looking up Hungarian lemmas, it produced one fewer result (26,634) than the number of the articles listed there (26,635) – and not due to the exact time of the request, because PetScan did have the 10 most recent lemmas given here under "Recent additions to the category", so the difference must lie somewhere else. Non-lemma forms resulted in the same total in PetScan as here, even though the number is higher (38,089). Adam78 (talk) 19:49, 17 November 2020 (UTC)[reply]
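For anyone who still wants to go through the API without apihighlimits, the manual combining of batches mentioned at the start of this thread can be automated by following the API's continuation values. The sketch below is hedged: `fetch` is a hypothetical callable that sends the given parameters to the wiki's api.php and returns the decoded JSON, so that the paging logic stays independent of any particular HTTP library.

```python
from typing import Callable, Dict, List

def all_category_members(
    fetch: Callable[[Dict[str, str]], dict],
    category: str,
    limit: int = 500,
) -> List[str]:
    """Collect every page title in `category`, one batch at a time."""
    params = {
        "action": "query",
        "list": "categorymembers",
        "cmtitle": category,
        "cmlimit": str(limit),
        "format": "json",
    }
    titles: List[str] = []
    while True:
        response = fetch(params)
        titles.extend(m["title"] for m in response["query"]["categorymembers"])
        if "continue" not in response:
            # No continuation block means this was the last batch.
            return titles
        # Feed the continuation values back in to request the next batch.
        params = {**params, **response["continue"]}
```

At 500 titles per request, a category of roughly 26,000 entries would take a few dozen requests, which a script handles without the tedium of combining results by hand.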

Missing babel box template

The template Phnx-1 for Phoenician doesn't exist for some reason. Dngweh2s (talk) 20:22, 16 November 2020 (UTC)[reply]

Adding labels to comparative/superlative forms

In the adjective "head" line, e.g. as generated by "en-adj" template, I want to list both "more/most" and "-er/-est" forms, but label the latter as less common, or rare, or whatever. I sort of remember seeing this done somewhere, but I can't remember where, and I could be wrong anyway. Does anyone know a recommended way to achieve this, either using "en-adj" or some other way? Mihia (talk) 11:51, 18 November 2020 (UTC)[reply]

@Mihia I added support for this in {{en-adj}}; use |comp_qual=, |comp2_qual=, ... to specify comparative qualifier(s), and |sup_qual=, |sup2_qual= to specify superlative qualifiers.

@Mihia Oops. Benwing2 (talk) 01:37, 19 November 2020 (UTC)[reply]

@Benwing2: Great, thanks very much for doing that. Mihia (talk) 21:59, 19 November 2020 (UTC)[reply]

Suggestion for Module:la-adj/table: use `{{diagonal split header}}`

For greater clarity and a better appearance, use {{diagonal split header}} instead of plain Case / Gender. I don’t know how to make it work in Lua, otherwise I’d implement the change myself. —Born2bgratis (talk) 02:13, 19 November 2020 (UTC)[reply]

template:de-conj-strong & template:de-conj-weak

It lacks forms, which are quite regular:

1st ps. sg. indicative present active: e.g. bind' and bind besides binde (from binden)
2nd ps. sg. imperative for words with stem in d: e.g. bind' and bind besides binde (from binden)

Some examples where these forms do exist and are lacking in the table: lieben, sagen, laufen, binden, finden (at least for the indicative). --Schläsinger X (talk) 02:45, 20 November 2020 (UTC)[reply]

@Schläsinger X I'm not sure we want to include forms like bind' and bind. These are informal pronunciations, for sure, but as spellings they may be nonstandard. Benwing2 (talk) 03:10, 20 November 2020 (UTC)[reply]

Even as spellings they are quite common. And as for the imperative, forms like bind are already included when the stem doesn't end in d or t (e.g. laufen has "lauf (du) | laufe (du)"; but binden and retten only have the form with -e). --Schläsinger X (talk) 11:40, 20 November 2020 (UTC)[reply]

Because in the imperative the apocopic forms are standard and not considered informal. Apart from that, a final schwa can be clipped virtually anywhere in poetry and informal or dialect-close language, so there is no point in including these forms in the tables. They should be left out as noise, so that the tables show readers what the expected standard forms are. And indeed it is useless to create such apostrophic spellings, and I reckon that such verb forms should be deleted on sight, so nonstandard and irregular are these spellings – the expected spelling is bind. Fay Freak (talk) 23:03, 20 November 2020 (UTC)[reply]

For verbs with stems in d or t, the endingless imperative is missing - albeit even Duden has them: binden, retten. --Schläsinger X (talk) 10:04, 21 November 2020 (UTC)[reply]

"Watchlist" facility

When I create a new page, I leave the "watch this page" box ticked. What is this supposed to do? I mean, what is supposed to happen once I have "watched" a page? Should I get notifications or messages relating to that page or what? I have looked at the "Watchlist" page but the list is dominated by MY edits (that I can see on my "Contributions" page, and don't need to see again here). How do I turn those off? The presentation of that "filters" list is also thoroughly confusing because many are potentially subsets of others, and it is unclear how they interact. For example, the default settings leave all types of editors unchecked in the list, so wouldn't that exclude all edits of all types? Then there are the notifications of "A link was made from X to Y" that I get, I think when Y is a page I created. This seems to be controlled by the "notification preferences" settings. Is this separate from "Watchlist" or are the two connected? Should I get any notifications as a result of "watchlist" settings? And why are there two separate places to configure watchlists (one under "Preferences" and one under "Watchlist" itself). I am just totally confused by all this. Can anyone help, or is there anywhere it is clearly explained? Mihia (talk) 20:55, 20 November 2020 (UTC)[reply]

Whenever my Watchlist kept getting crowded, I just got a new account. I recommend it Darren X. Thorsson (talk) 21:05, 20 November 2020 (UTC)[reply]

Yes, unfortunately the watchlist feature doesn't seem to scale well, it takes several seconds to load when your list reaches a certain size. – Jberkel 13:02, 21 November 2020 (UTC)[reply]

Notifications for links are separate from the watchlist. You can turn them off in Special:Preferences#mw-prefsection-echo under "Page link". When "watch this page" is ticked, the page is added to your watchlist. You shouldn't get notifications for activity related to pages on your watchlist because that's not an option in preferences. I think with no filters on a particular aspect of an edit it just shows you everything, but as soon as you choose one or more filters that pertains to some aspect, like the user who performed an action, it removes edits not agreeing with any of the filters. For instance, you can select two filters to show only actions by newcomers and unregistered users. For more information you might try mw:Manual:Watchlist and mw:Help:Watchlist and w:Help:Watchlist. — Eru·tuon 21:48, 20 November 2020 (UTC)[reply]

Thanks, so, just to confirm, the "Watchlist" is just that list of edits on the "Watchlist" page, where it says "Live updates", and is unconnected with the notification system, right? Do you (or anyone) know a way to turn off my own edits in that list? I thought that the main idea of "watchlist" was that you wanted to see other activity on an article, not your own. Mihia (talk) 21:57, 20 November 2020 (UTC)[reply]

There's a row of checkboxes: Hide: [] registered users [] anonymous users [] my edits [] bots [] minor edits [] page categorization [] Wikidata. Check my edits. Vox Sciurorum (talk) 21:59, 20 November 2020 (UTC)[reply]

Vox Sciurorum's directions work if you've got the earlier version of the watchlist. If you have the newer fancier version, check in "filter changes". There's a filter for "changes by others". You can select it, and if you want to always use it, save it using the bookmark icon inside the "active filters" box. — Eru·tuon 22:04, 20 November 2020 (UTC)[reply]

Aha, thank you very much. IMO the design of that filter list is HIGHLY confusing. By default "Changes by You" is UNCHECKED, and yet "Changes by You" are by default DISPLAYED. "Go figure", as I believe they say. Mihia (talk) 22:28, 20 November 2020 (UTC)[reply]

I find I have to experiment on the occasions when I am adjusting the watchlist because the meaning of the filters is not clear. I can never remember whether the filter controls include or exclude. The result is that I rarely adjust the watchlist and the controls to do so are just a waste of screen space. DCDuring (talk) 13:46, 21 November 2020 (UTC)[reply]

template:gsw-decl-adj

The template is incorrect, or incomplete and misleading. There are other forms as added in lieb and guet. Possibly, there are even more as seen in: w:Low Alemannic German#Adjectives (unsourced), w:Walser German#Nouns, w:de:Walliserdeutsch#Deklination der Adjektive.

To which dialect(s) do the forms in the template belong?
How should other forms be added? All in one template (which becomes ugly, messy, confusing, and is incorrect for adjectives which don't exist in all dialects), or with different templates for different sub-dialects?

@Widsith as creator of the template. --Schläsinger X (talk) 10:09, 21 November 2020 (UTC)[reply]

It's based on the "standard" Swiss German, which is a sort of codified mixture based mainly on the dialect of Zurich. I have no idea how other dialects should be treated; I'm just adding what is important and necessary for me (I live in Switzerland), and in practice every valley here does things slightly differently, without exaggeration. In my opinion the best way to illustrate these things is with lots of examples – though this again poses unique problems for Swiss German, which is mainly unwritten. Ƿidsiþ 07:20, 23 November 2020 (UTC)[reply]

Font of the headword at "root"

The headword at root is appearing in a sans serif font, unlike ordinary entries. I suspect this is because there is a Chinese section (which uses {{zh-noun}} and places the entry into "Category:Chinese terms written in foreign scripts"). I think the change of font is fine if the entry consists of only a Chinese term written in a Latin script, but it doesn't seem like a good idea when there is an English section and other language sections as well. Should this be fixed? — SGconlaw (talk) 12:50, 21 November 2020 (UTC)[reply]

(Additional discussion: #Chinese_section_on_out_(etc?)_causing_top_of_page_to_change_font.) - -sche (discuss) 00:56, 25 November 2020 (UTC)[reply]

Resolved: see "#Weird styling applied to the entry title at check" below. — SGconlaw (talk) 17:16, 29 November 2020 (UTC)[reply]

Template:R:Mindat date problem

Creating emmerichite, I added a MinDat reference and noticed that it said "accessed 29 August 2016", which is wrong, because I accessed it just now, today, and perhaps this entry didn't exist there in 2016. I see that this date is hard-coded into the template. Really I think it should embed "today's date" at the point when a new entry is saved. Or what can we do about this? Equinox ◑ 15:40, 21 November 2020 (UTC)[reply]

That date seems to have been in the template from the beginning; I'm not sure why. I couldn't figure out how to have the template indicate the date at the point when a new entry is saved, so as a temporary (semi-permanent?) fix I replaced it with the year when the database was launched and the current year. — SGconlaw (talk) 16:25, 21 November 2020 (UTC)[reply]

Can we start categorizing archived rfv's, rfd's etc. by language?

It occurred to me that I wanted to look through words that have previously failed rfv to see if I can attest them, but the language isn't currently recorded. Adding a lang parameter to {{archive-top}} is easy, and the header that is generated when you follow the "+" link on {{rfv}} and {{rfv-sense}} already contains language data; however, that data is simply discarded when you archive it. I don't have permission to edit MediaWiki:Gadget-aWa.js for some reason (I edited MediaWiki:Gadget-QQ.js in the past, but I seem to have lost that permission at some point?).__Gamren (talk) 20:46, 21 November 2020 (UTC)[reply]

Sure. @Chuck Entz, Gamren should be made an interface admin. —Μετάknowledge^{discuss/deeds} 21:01, 21 November 2020 (UTC)[reply]

Received; thanks Chuck.__Gamren (talk) 14:22, 22 November 2020 (UTC)[reply]

Chinese section on out (etc?) causing top of page to change font

Go to "outside" and look at the appearance of the word "outside" at the very top of the page, above the TOC: it appears, for me, in a relatively smaller and 'fancier' serif font. Now look at "out": it appears in a different, relatively larger and simpler / sans-serif font. Previewing individual language sections by themselves, I have found that this is due to something in the Chinese L2 section. What should be changed so that the Chinese modules/templates do not change the display of top-of-page Latin-script text? - -sche (discuss) 21:54, 24 November 2020 (UTC)[reply]

This is the same issue as brought up at #Font of the headword at "root" above. —Mahāgaja · talk 22:07, 24 November 2020 (UTC)[reply]

I see. Until a better fix can be had, I've fixed it by removing the headword-line template and replacing it with bare text + manual categorization. - -sche (discuss) 00:52, 25 November 2020 (UTC)[reply]

Update: judging by all right, and then tested on out and root, simply using {{head}} instead of Chinese-specific templates also seems to produce correct results; I've switched the two aforementioned entries to that format. - -sche (discuss) 00:55, 25 November 2020 (UTC)[reply]

Great, thanks. This may need to be documented at the Chinese-specific templates (that is, don’t use these templates if there are other language sections in the entry?). — SGconlaw (talk) 04:37, 25 November 2020 (UTC)[reply]

Resolved: see "#Weird styling applied to the entry title at check" below. — SGconlaw (talk) 17:16, 29 November 2020 (UTC)[reply]

No italics for ikt mentions

On my Mac (OS 10.15.7, Firefox 83.0) links and mentions for Inuvialuktun (code ikt) are formatted identically and use a font that doesn't match anything else. Compare Greenlandic napaartoq (“rowan”), napaartoq (“rowan”), Inuvialuktun napaaqtuq (“tree”), napaaqtuq (“tree”). I see italics, roman, roman, roman with the last two in a different font than the first two. Is there anything special about Inuvialuktun that calls for a different font? If so, it needs to provide both roman and italic forms. Vox Sciurorum (talk) 15:17, 26 November 2020 (UTC)[reply]

The reason is that Inuvialuktun is listed at Module:languages/data3/i as being written only in Canadian Syllabics, not in the Latin alphabet. Should I add Latin as a script for it? —Mahāgaja · talk 17:32, 26 November 2020 (UTC)[reply]

There is a translation dictionary[13][14] cited in an RFV thread using Latin script. According to internet searches both scripts are in use. Vox Sciurorum (talk) 18:51, 26 November 2020 (UTC)[reply]

@Vox Sciurorum: OK, I've added Latin script for Inuvialuktun, and just look at the words in your original post now! —Mahāgaja · talk 19:24, 26 November 2020 (UTC)[reply]

It's a miracle! It also pains me to see. I used to work for a company that sold a cloud service where we kept all your documents (3d models) on our server. One of our basic rules was that old versions of a document should render the same forever even if the generating functions (equivalent to templates) changed later. We did this by linking each document to the specific version of a template being used. There was a task that would generate new versions of documents with links updated, but the old version was always available. We never had the problem seen on Wiktionary where you go to an old version of a page and see a sea of red due to an incompatible template change. Vox Sciurorum (talk) 19:29, 26 November 2020 (UTC)[reply]

Now I'm seeing a funny font for Inuktitut, code iu, which does have "Latn" (not Latn) in the script list. {{cog|iu|napaaqtuq|t=tree}} = Inuktitut napaaqtuq (“tree”). Vox Sciurorum (talk) 19:35, 26 November 2020 (UTC)[reply]

Yeah, that happens for several languages. I don't know how to fix it; it doesn't seem to have anything to do with the language's entry in Module:languages. —Mahāgaja · talk 20:03, 26 November 2020 (UTC)[reply]

Looks like it's a combination of (1) using a Mac, (2) the directive "html,body{font-family:sans-serif}" in the first stylesheet loaded by the head section of HTML, (3) lang="iu" in the <i/> surrounding the mentioned word. I wonder if this is a problem with Apple's font handling. It happens with both Firefox and Safari on Mac, not with Chrome on Chrome OS. Does Firefox use WebKit? Vox Sciurorum (talk) 20:34, 26 November 2020 (UTC)[reply]

I don't know, but the fonts are different for me too, and I use Windows, not Mac. —Mahāgaja · talk 21:06, 26 November 2020 (UTC)[reply]

Bug report (WikiHiero)

Just noticed this bug on the Egyptian nḫt-nb.f page. The word's WikiHiero code reads as follows: n:xt-x*t:D40-nb:f which appears correct. The hieroglyphs, however, render as [hieroglyph image] instead of [hieroglyph image]. Looks to me like WikiHiero automatically added in a couple extra letters when it ligatured the first cluster. If anyone here has access to the back end, you might want to check it out. (While I could just edit the nḫt-nb.f page, I thought I'd bring the issue up since it might be affecting other pages too. Not sure why the n:xt was even ligatured to begin with, since it's not n&xt.) 2601:49:C301:D810:B163:5C7E:AE44:4CEE 00:50, 27 November 2020 (UTC)[reply]

@Vorziblix —Μετάknowledge^{discuss/deeds} 03:10, 27 November 2020 (UTC)[reply]

WikiHiero unfortunately tries to ligate glyphs separated by : and not just & as you’d expect; you have to use :* or *: instead to force a non-ligature. This is a product of bad logic in the main WikiHiero code here, where it converts all separators into & when testing for available ligature images. Rarely, as with n:xt, it inappropriately expands ligatures if it has an image with the right name available. In this case the cause of the issue is that the image [hieroglyph image] is named hiero_n&xt.png in the WikiHiero code repository instead of being named hiero_n&xt&x&t.png as it should be. Unfortunately we don’t have control over that at Wiktionary; you’d have to file a bug report at Phabricator to get it fixed. It’s annoying, but it’s a known bug, and most of the uses of [hieroglyph image] on Wiktionary are intentional, done with the bug in mind. You’re right that the one at nḫt-nb.f isn’t, though; it should be [hieroglyph image]. I’ve gone ahead and fixed it in the entry. — Vorziblix (talk · contribs) 03:58, 27 November 2020 (UTC)[reply]
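To make the failure mode concrete, here is a toy Python model of the lookup described above. It is not the actual WikiHiero code; the function name and the set of available images are assumptions chosen only to mirror the explanation.

```python
import re
from typing import Optional, Set

def find_ligature_image(code: str, available: Set[str]) -> Optional[str]:
    """Toy model: normalize every separator to '&', then look for an image."""
    # Both ':' and '*' are treated like '&' for the purpose of the lookup,
    # which is the step that makes n:xt collide with the n&xt image.
    normalized = re.sub(r"[:*]", "&", code)
    name = f"hiero_{normalized}.png"
    return name if name in available else None

# Hypothetical image inventory containing only the file named in this thread.
available_images = {"hiero_n&xt.png"}
```

In this model `n:xt` finds `hiero_n&xt.png` even though no ligature was requested, while a doubled separator such as `n:*xt` normalizes to a name with no matching image and so is left alone.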

Ok, thanks! 2601:49:C301:D810:9108:9228:94E:3E1F 12:26, 27 November 2020 (UTC)[reply]

Weird styling applied to the entry title at check

In the inspect element it is easy to verify that "Hani" class is the culprit. I think we should take an action right now or else we might have a "Hani" Wikidemic soon. Dixtosa (talk) 12:11, 28 November 2020 (UTC)[reply]

I've changed the Chinese section to use {{head|zh|verb}} instead of {{zh-verb}} as a temporary workaround. Vox Sciurorum (talk) 13:52, 28 November 2020 (UTC)[reply]

This issue has now been mentioned on this page three times. Should we document your workaround on the Chinese template pages (i.e., use {{head}} instead of this template if there are other language sections in the entry)? — SGconlaw (talk) 17:00, 28 November 2020 (UTC)[reply]

Rather, "use {{head}} instead of this template if the word is written in the Latin alphabet". There's no reason not to use {{zh-verb}} on pages written in Hanzi that also have Korean, Japanese, or Vietnamese entries. —Mahāgaja · talk 17:21, 28 November 2020 (UTC)[reply]

Ha ha, good point. — SGconlaw (talk) 18:26, 28 November 2020 (UTC)[reply]

Or better yet, change the module that adds "Hani" so it doesn't do so when the headword isn't in Han characters. Chuck Entz (talk) 17:37, 28 November 2020 (UTC)[reply]

@Erutuon: can what Chuck suggested be implemented? — SGconlaw (talk) 18:26, 28 November 2020 (UTC)[reply]

@Sgconlaw, Chuck Entz: I've implemented a similar fix: Module:headword will only add a display title if the title is not all-ASCII. That fixes the cases mentioned so far, but wouldn't prevent a non-completely-ASCII Latin title from being formatted as Hani. I'll see if I can find any such cases and edit the module if I do. — Eru·tuon 21:15, 28 November 2020 (UTC)[reply]

Okay, so here are all the script-tagged display titles where the title contains Latin characters (from a SQL query). A few contain non-ASCII Latin characters (as defined by Unicode, not by Module:scripts/data):

atŏw: Cham
krăwng: Cham
limœû: Cham
mɨta: Cham
nhiệm_kì: Cham
nưng: Tavt
pa-rá-su-um: Xsux
pét: Tavt
thău: Cham
tức_là: Cham
điều_hành: Cham
Ȣ: polytonic
ȣ: polytonic
ȣ̈: polytonic
ᴕ: polytonic
輔酶Ⅰ: Hani
輔酶Ⅱ: Hani

Cham, Tavt, and Xsux are errors of script recognition; Eastern Cham, Tai Dam, and Akkadian are defined as only using one script each, so Cham, Tavt, and Xsux respectively are assigned with no checking. This can be fixed by adding Latn to the list of scripts for those languages (as all of them use Latin script fairly often: search User:Erutuon/scripts in link templates for their language codes, cjm, blt, and akk). polytonic is manually assigned and the Hani is not an error. — Eru·tuon 22:49, 28 November 2020 (UTC)[reply]

Fixed Eastern Cham and Tai Dam by adding Latin to their scripts, and added Latin to the scripts of Akkadian as well because Akkadian in link templates does sometimes use Latin script according to User:Erutuon/scripts in link templates, but it doesn't fix the display title in pa-rá-su-um because |head= in that entry uses |head=𒉺𒁺𒋢𒌝 and the display title logic in Module:headword doesn't check whether the script of the |head= parameter matches the script of the title. I could just change the ASCII check above to a check for Latin script or punctuation or whitespace, but I'm not sure if that's the best solution when the actual problem is that script in the display title might match the |head= parameter rather than the title. — Eru·tuon 23:07, 28 November 2020 (UTC)[reply]
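The difference between the all-ASCII check and the broader "Latin script or punctuation or whitespace" check can be sketched in Python. This is an approximation, not the Module:headword logic: it uses Unicode character names as a stand-in for real script data, and it additionally lets combining marks through so that decomposed diacritics do not cause a false negative.

```python
import unicodedata

def is_all_ascii(title: str) -> bool:
    """The narrow check: every character is in the ASCII range."""
    return title.isascii()

def is_latin_punct_or_space(title: str) -> bool:
    """The broader check: Latin letters, punctuation, whitespace, combining marks."""
    for ch in title:
        category = unicodedata.category(ch)
        if category[0] in "PZM" or ch.isspace():
            # Punctuation, separators and combining marks are always allowed.
            continue
        if category[0] == "L" and "LATIN" in unicodedata.name(ch, ""):
            # Heuristic: a letter whose Unicode name mentions LATIN.
            continue
        return False
    return True
```

With these definitions, a title such as atŏw fails the ASCII check but passes the broader one, while 輔酶Ⅰ fails both, which matches the cases listed above.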

Iff there's only one entry where the pagetitle is Latin script but the head= is cuneiform, that suggests that that entry is doing something nonstandard and needs to be changed, rather than that we need a bigger change to "handle" it. Looking at the edit history, I see the page had been at the cuneiform title before Tom 144 moved it, and I recall that Wiktionary:Votes/2019-05/Lemmatizing Akkadian words in their transliteration passed, but I see it has mostly not been implemented, due to various problems with it. Tom or Victar might know what should be done with this entry, which might include moving it back to cuneiform. - -sche (discuss) 23:28, 28 November 2020 (UTC)[reply]

There were also a few Akkadian links with Latin script, which can be seen here. I moved them to |tr= if they have hyphens, otherwise to |ts=, and reverted my edit to the language data. That was the right way to handle this. Someone else will have to deal with the Latin-script Akkadian title. — Eru·tuon 05:05, 29 November 2020 (UTC)[reply]

Great! Thanks, @Erutuon. — SGconlaw (talk) 17:16, 29 November 2020 (UTC)[reply]

@Dixtosa: I can't believe you didn't go with "Handemic". 😂 - -sche (discuss) 20:19, 28 November 2020 (UTC)[reply]

Additional blank line at swack

What causes the extra blank line before Etymology 3? Is {{R:Partridge 1984}} inserting a spurious blank line? Equinox ◑ 16:32, 29 November 2020 (UTC)[reply]

I fixed it by removing a newline from {{R:Partridge 1984}}. Vox Sciurorum (talk) 16:41, 29 November 2020 (UTC)[reply]

Template:it-prep phrase

Can someone create this please? It's just a headword-line template, but I don't have much knowledge of how to code these. Imetsia (talk) 01:28, 30 November 2020 (UTC)[reply]

Would it make anything easier that would be more complicated with {{head|it|prepositional phrase}}? If not, there's no particular need for a language-specific template. —Mahāgaja · talk 08:17, 30 November 2020 (UTC)[reply]

Many of the major languages have the template (Template:en-prep phrase, Template:fr-prep phrase, etc.), so it seems fitting that Italian have it too. It does nothing more than save a few characters and templatize the headline. It's not strictly necessary, but nice to have. Imetsia (talk) 17:26, 30 November 2020 (UTC)[reply]

@Mahagaja, Imetsia: The French headword templates invoke Module:fr-headword, which, thanks to @Benwing2, automatically splits terms containing an apostrophe or a hyphen. This allows me to do edits like this.

If Module:it-headword was able to do the same, Italian-specific headword templates invoking it would be an improvement over the generic templates. P U C – 12:09, 2 August 2021 (UTC)[reply]
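As a rough illustration of the splitting behaviour described above, here is a minimal Python sketch. It is not the code in Module:fr-headword (which is Lua and may handle many more cases). The function name link_parts and the exact splitting rules are assumptions made for the example.

```python
import re


def link_parts(term):
    """Wrap each component of a multiword term in [[...]] links.

    Splits on spaces and hyphens (keeping them as separators) and
    splits after an apostrophe, so the elided word keeps its apostrophe.
    """
    pieces = re.split(r"([ \-])", term)
    out = []
    for piece in pieces:
        if piece in ("", " ", "-"):
            out.append(piece)  # separators are kept unlinked
            continue
        # "l'heure" -> ["l'", "heure"]; both straight and curly apostrophes
        for part in re.findall(r"[^'’]+['’]?|['’]", piece):
            out.append("[[" + part + "]]")
    return "".join(out)


if __name__ == "__main__":
    print(link_parts("all'improvviso"))
    print(link_parts("peut-être"))
```

With rules like these, an Italian headword template would not need each component linked by hand in |head=.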

Ban variation selectors in page names

User:Justinrleung recently nominated for deletion the page 次󠄁, which is 次 plus the Unicode variation selector U+FE01 (URL shows as /次%F3%A0%84%81 in my browser, using the UTF-8 encoding of U+FE01). U+FE00 to U+FE0F should be prohibited in page names. According to Wikipedia, they should only be used when they do not change the meaning. Vox Sciurorum (talk) 10:15, 30 November 2020 (UTC)[reply]

For reference, titles with variation selectors as of the 2020-11-20 dump are listed at User:Erutuon/lists/variation selectors. At the moment, mainspace pages only use the emoji-related variation selectors (VS15, VS16: U+FE0E, U+FE0F), apart from the pages for the variation selectors themselves, which just redirect to Appendix:Control characters. And most are just emoji (marked by VS16) or non-emoji (marked by VS15) versions of characters that redirect to the entry for the character. — Eru·tuon 11:06, 30 November 2020 (UTC)[reply]

次+VS2 exists (linked above) but is not on your list. Vox Sciurorum (talk) 11:16, 30 November 2020 (UTC)[reply]

Right, that's a recent addition and my list reflects the state of affairs on November 20th. — Eru·tuon 11:19, 30 November 2020 (UTC)[reply]

Hmm, apparently I wasn't checking for titles with code points in the Variation Selectors Supplement block, but there weren't any before November 30th when 次 was created. — Eru·tuon 20:23, 3 December 2020 (UTC)[reply]
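A quick way to see which block a selector in a title belongs to is to decode the percent-escaped bytes from the URL and look at the resulting code points. The Python sketch below does that for the bytes quoted earlier in this thread (%F3%A0%84%81), with 次 itself escaped as %E6%AC%A1. The function name and the shape of the result are illustrative choices, not part of any existing tool.

```python
from urllib.parse import unquote

# The two Unicode blocks that contain variation selectors.
VS_BLOCKS = (
    (0xFE00, 0xFE0F, "Variation Selectors"),
    (0xE0100, 0xE01EF, "Variation Selectors Supplement"),
)


def variation_selectors(title):
    """Return (code point, block name) for each variation selector in title."""
    found = []
    for ch in title:
        cp = ord(ch)
        for low, high, block in VS_BLOCKS:
            if low <= cp <= high:
                found.append(("U+%04X" % cp, block))
    return found


if __name__ == "__main__":
    title = unquote("%E6%AC%A1%F3%A0%84%81")  # percent-decoding assumes UTF-8
    print(variation_selectors(title))
    # [('U+E0101', 'Variation Selectors Supplement')]
```

Those four bytes decode to U+E0101, which lies in the Variation Selectors Supplement block, so a scan limited to U+FE00 through U+FE0F would not catch this title.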

As the employment of variation selectors, however uncommon in practice, is legitimate under various circumstances (obviously they exist for linguistic content), one should be averse to a ban. That “they should only be used when they do not change the meaning” is doubtful. ك (k) and (k) are not different in meaning either, yet the distinction exists. Editors are expected to exhaust the possibilities of Unicode to represent writing as it appears on paper. It’s disheartening if entries are deleted or banned beforehand because they are “too correct” and in general one cannot use Unicode correctly because it exceeds the vulgar’s computer education. Fay Freak (talk) 11:48, 30 November 2020 (UTC)[reply]

Chiming in late as I do some maintenance work on RFDs.

One technical approach that might work as a partial ban would be to ban the variation selectors when used in combination with certain codepoint ranges. So far as I'm aware, these selectors should never be used in conjunction with any CJK string, so that's a sizable chunk that could be implemented right there. ‑‑ Eiríkr Útlendi │^{Tala við mig} 17:57, 10 February 2021 (UTC)[reply]
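The partial ban suggested above could be sketched as a pairwise check: flag a title only when a variation selector directly follows a character from a chosen set of ranges. This is a hypothetical Python illustration, not an existing title filter. The ranges listed (CJK Unified Ideographs and Extensions A and B) are an assumption about what "any CJK string" would cover.

```python
# Assumed base-character ranges for the partial ban (illustrative only).
CJK_RANGES = (
    (0x3400, 0x4DBF),    # CJK Unified Ideographs Extension A
    (0x4E00, 0x9FFF),    # CJK Unified Ideographs
    (0x20000, 0x2A6DF),  # CJK Unified Ideographs Extension B
)
VS_RANGES = (
    (0xFE00, 0xFE0F),    # Variation Selectors
    (0xE0100, 0xE01EF),  # Variation Selectors Supplement
)


def in_ranges(ch, ranges):
    cp = ord(ch)
    return any(low <= cp <= high for low, high in ranges)


def violates_partial_ban(title):
    """True if a variation selector immediately follows a CJK character."""
    return any(
        in_ranges(base, CJK_RANGES) and in_ranges(selector, VS_RANGES)
        for base, selector in zip(title, title[1:])
    )


if __name__ == "__main__":
    print(violates_partial_ban("\u6b21\U000E0101"))  # CJK + selector
    print(violates_partial_ban("\u263a\ufe0f"))      # emoji + VS16
```

Emoji titles with VS15 or VS16 would pass a check like this, since their base characters are outside the listed ranges.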

Definition API

I am trying to figure out how to use the Wiktionary API to get the definition markup for a word. I found the API documentation and know how to get lists of words and information about words, just not the actual definition. I have also done a web search and could find no answer other than screen scraping. I even know how to get Wikipedia content; frustratingly, everything other than what I want.

My real goal, if it matters, is to find the year of first use of a word, which I would get by looking at the years of all the illustrative quotations. I know that there is also the Citations: namespace. Do I need to look there as well, or is it fairly reliable that the earliest known use will always be in the main entry?

Matchups (talk) 13:54, 30 November 2020 (UTC)[reply]

P.S. Should API and help desk have hatnote references to the appropriate project pages, as help does?

There isn't a real API for Wiktionary; there's only a somewhat unmaintained endpoint used by the Wikipedia app: /page/definition/{term}. It does not return any quotations, however. I'm not sure if the quotations on Wiktionary are really useful for what you're trying to do. A word might have been used in a lot of other contexts before the earliest date listed here. – Jberkel 14:08, 30 November 2020 (UTC)[reply]
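For anyone who wants to try the endpoint mentioned above, a minimal Python sketch follows. The host and the /api/rest_v1 prefix are assumptions (the comment above gives only the path /page/definition/{term}), the User-Agent string is a placeholder, and the structure of the returned JSON is not relied on here because it may change.

```python
import json
from urllib.parse import quote
from urllib.request import Request, urlopen

# Assumed base URL; only the "/page/definition/{term}" path comes from the thread.
BASE = "https://en.wiktionary.org/api/rest_v1"


def definition_url(term):
    """Build the request URL, percent-encoding the term as UTF-8."""
    return BASE + "/page/definition/" + quote(term, safe="")


def fetch_definitions(term):
    """Fetch and decode the JSON response (makes a network request)."""
    request = Request(
        definition_url(term),
        headers={"User-Agent": "definition-sketch/0.1 (example)"},
    )
    with urlopen(request, timeout=10) as response:
        return json.load(response)


if __name__ == "__main__":
    print(definition_url("swack"))
```

Since the endpoint returns no quotations, dates would still have to come from the page wikitext.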

As Jberkel pointed out, the quotations in our entries are insufficient for the purpose you intend. Many entries have no quotations at all, and even for those that do, we are in most cases unable to guarantee that the earliest quotations shown are the first known published occurrences of particular entries, simply because editors have neither the time nor resources to conduct extensive research. (Occasionally when we do know that a particular quotation is the first known occurrence, we indicate this in a note.) — SGconlaw (talk) 18:48, 30 November 2020 (UTC)[reply]

Looking for the {{defdate}} text at the end of a definition would be better, but still not very good. Vox Sciurorum (talk) 18:53, 30 November 2020 (UTC)[reply]

Indeed, because we don’t consistently add {{defdate}} for all senses of all entries. — SGconlaw (talk) 19:18, 30 November 2020 (UTC)[reply]
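The {{defdate}} approach could be sketched as follows, assuming the entry's wikitext has already been obtained and that the first parameter holds free text such as "from 1825" or "from 16th c." (an assumption about typical usage, not a guarantee). The sample wikitext is made up for the example, and a century is reduced to its first year as a lower bound.

```python
import re

DEFDATE = re.compile(r"\{\{defdate\|([^{}|]*)")
YEAR = re.compile(r"\b(1[0-9]{3}|20[0-9]{2})\b")
CENTURY = re.compile(r"\b(\d{1,2})(?:st|nd|rd|th) c(?:\.|entury)")


def defdate_texts(wikitext):
    """Return the first parameter of every {{defdate}} call."""
    return [text.strip() for text in DEFDATE.findall(wikitext)]


def earliest_year(wikitext):
    """Earliest year mentioned in any {{defdate}}, or None if there is none.

    A century such as "16th c." is counted as its first year (1500).
    """
    candidates = []
    for text in defdate_texts(wikitext):
        candidates.extend(int(year) for year in YEAR.findall(text))
        candidates.extend((int(n) - 1) * 100 for n in CENTURY.findall(text))
    return min(candidates) if candidates else None


if __name__ == "__main__":
    sample = (
        "# First sense. {{defdate|from 1825}}\n"
        "# Second sense. {{defdate|from 16th c.}}\n"
    )
    print(defdate_texts(sample))
    print(earliest_year(sample))
```

As noted above, many senses carry no {{defdate}} at all, so a None result says nothing about the word's actual age.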

Thanks, all, for the info. I will do the best I can with what's there, for now. The intent is to use it as one of the inputs to a machine learning process, so even dirty data is better than nothing at all. And perhaps I will also see what I can do to improve some words going forward. Matchups (talk) 01:10, 1 December 2020 (UTC)[reply]