Benwing2

Catalan inflections


Hi Ben, any chance we could have automatic Catalan inflections? There's User:DTLHS/catalan bot requests, but it doesn't seem to be running very often, and it's tedious to add manually to a list. Jberkel 18:12, 11 December 2023 (UTC)

@Jberkel Yeah I have looked into this. The thing is that I'd probably have to rewrite Module:ca-verb to work like Module:es-verb or Module:pt-verb. The Spanish, Portuguese and Galician modules were all written mostly by me and implement JSON fetching of the inflections as well as {{es-verb form of}} and similar to automatically fetch the correct inflections for a given verb form. The former wouldn't be too hard to add to the existing module but the latter would be painful, and it would probably be better to rewrite the module instead. I have looked into doing this but I don't have that good a handle on Catalan verbs, esp. those in -er/-re. Do you have any good references that explain how Catalan verbs work, especially focusing on the -er/-re verbs, which is where the irregularities seem to be? The current module seems to push a lot of the complexity down into the template call, e.g. veure's invocation looks like this, which is a mess:

{{ca-conj-ure-ia2|v|e<!--
-->|past_part=vist<!--
-->|past_part_mpl=vistos<!--

-->|pres_ind_1_sg=veig<!--

-->|pret_ind_stem=vei<!--
-->|pres_sub_stem=veg<!--
-->|impf_sub_stem=vei<!--

-->|pret_ind_1_sg=viu<!--
-->|pret_ind_2_sg2=veres<!--
-->|pret_ind_3_sg2=véu<!--
-->|pret_ind_1_pl2=vérem<!--
-->|pret_ind_2_pl2=véreu<!--
-->|pret_ind_3_pl2=veren<!--

-->|impr_2_sg=veges|impr_2_sg2=ves<!--
-->|impr_2_pl2=veieu<!--
-->}}

I'd want to have this stuff all in the module itself, similarly to what's being done for Spanish, Portuguese, Italian, French, etc. Benwing2 (talk) 23:15, 11 December 2023 (UTC)

Ok, thanks for looking into it, I sent you some reference material via email. Jberkel 09:01, 12 December 2023 (UTC)

@Jberkel Thanks, I received it. Benwing2 (talk) 21:12, 12 December 2023 (UTC)

@Jberkel I have a question, not sure if you know the answer. In -ar verbs whose root vowel is e or o, is that vowel pronounced è or é (or ò or ó for roots in o) in root-stressed forms (e.g. first-singular present indicative), or does it vary from verb to verb? In Proto-Romance it varied from verb to verb, and this is still the case in modern Italian. Spanish has a reflex of that in verbs that unexpectedly have ie or ue in root-stressed forms, but Portuguese has regularized the vowel quality (for example, using low-mid vowels in -ar verbs). I think in conservative varieties of Occitan at least, it varies from verb to verb, and this is reflected in the spelling. Benwing2 (talk) 08:41, 13 December 2023 (UTC)

Pinging @Vriullop from ca.wikt. Ultimateria (talk) 23:28, 13 December 2023 (UTC)

@Vriullop @Ultimateria It appears that it varies from verb to verb in Catalan, at least based on the two verbs pegar, which ca.wikt says has /ɛ/ in Central Catalan (consistent with its origin from Latin short ĭ), and membrar, which ca.wikt says has /e/ in Central Catalan (again, consistent with its origin from Latin short ĕ). But the situation is complicated by the dialects, where many dialects have /e/ for both verbs. I'm interested in finding a dictionary that indicates these vowel qualities so that maybe we can include them in the conjugation table, similarly to how the French and Italian conjugation tables give pronunciation; this would only be for Central Catalan for now (maybe forever), since the dialects are complicated. Benwing2 (talk) 00:19, 14 December 2023 (UTC)

BTW if what I've said is correct, where can I find in Catalan dictionaries the indication of how the stressed vowel is pronounced for a given verb? Benwing2 (talk) 05:32, 14 December 2023 (UTC)

For variation in dialects see the notation used with {{ca-IPA}}: ê for /ɛ/ in Central, /e/ in Valencian and /ə/ in Balearic. Similarly with ô, while è, é, ò, ó have no variation. This is fairly consistent, with few exceptions.

It is etymological, ê from Latin ĭ or ē, but with some exceptions.

The only dictionary that indicates the rhizotonic stress is the DNV, for example membrar says é, but it is only for Valencian and it could be either ê or é. It is only helpful for è and ò. I have not found any other source that systematically indicates the rhizotonic stress; even the dictionary of pronunciation on my bookshelf only includes some paradigmatic verbs. Frankly, there are some verbs whose pronunciation I don't know beyond my personal perception, which is not a good sample. The only clues are a noun related to the verb and, for inherited verbs, the etymology. On ca.wikt I include a rhizotonic parameter verb by verb with ca-IPA notation. Vriullop (talk) 09:25, 14 December 2023 (UTC)

@Vriullop Thank you! I wonder why Catalan dictionaries are so bad at including the rhizotonic vowel quality patterns. Pretty much all monolingual Italian dictionaries list the rhizotonic quality (and position) for all verbs. What about the pronunciation of other forms, such as verbs with pres 3s in -ou or -eu? Are there any dictionaries indicating the vowel quality of these and other endings? Thanks for any help you can give. Benwing2 (talk) 09:57, 14 December 2023 (UTC)

I'm not sure what you mean; 'mou' from 'moure' and 'veu' from 'veure' have the same stress as the infinitive.

Endings that may be ambiguous, without any graphic accent:

-em, -eu, as in cantem, canteu, cantarem, cantareu: ê
-essis, -essin, as in cantessis, cantessin: é
-eres, -eren, as in temeres, temeren: é
infix -eix- (-eixo, -eixes, -eix, -eixen, -eixi, -eixis, -eixin): ê, but not used in Valencian, which changes it to -ix-

This is a summary from different sources, consistent with the etymology. Vriullop (talk) 12:38, 14 December 2023 (UTC)
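The ending rules listed in this reply can be read as a longest-suffix lookup. What follows is only a rough Python sketch under that assumption: the table and the function name `ending_quality` are hypothetical and are not taken from Module:ca-verb or Module:ca-IPA. It ignores forms that carry a written accent and cannot tell a real ending from a form that merely ends in the same letters (e.g. 'veu' from 'veure').

```python
# Hypothetical table: unaccented verb endings -> ca-IPA mid-vowel notation,
# transcribed from the list in the comment above.
ENDING_QUALITY = {
    "em": "ê", "eu": "ê",            # cantem, canteu, cantarem, cantareu
    "essis": "é", "essin": "é",      # cantessis, cantessin
    "eres": "é", "eren": "é",        # temeres, temeren
    # forms built on the infix -eix- (not used in Valencian, which has -ix-)
    "eixo": "ê", "eixes": "ê", "eix": "ê", "eixen": "ê",
    "eixi": "ê", "eixis": "ê", "eixin": "ê",
}


def ending_quality(form):
    """Return the notation for the stressed e of the ending, or None."""
    # Try longer endings first so that e.g. -eixen wins over shorter matches.
    for ending in sorted(ENDING_QUALITY, key=len, reverse=True):
        if form.endswith(ending):
            return ENDING_QUALITY[ending]
    return None


if __name__ == "__main__":
    for form in ("cantem", "cantessis", "temeren", "serveixen", "canto"):
        print(form, ending_quality(form))
```

A real implementation would presumably know which slot of the paradigm it is generating and would not need to guess from the spelling.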

@Vriullop OK thanks, I suppose that the DCVB dictionary gives the infinitive pronunciation of words like moure. This is very helpful; if I have other questions I'll let you know. Benwing2 (talk) 19:55, 14 December 2023 (UTC)

DCVB is fine for pronunciation, but in some cases it is incomplete or confusing. If necessary, you can compare it with the GDLC in the link "francès" that includes translation ca-fr and also pronunciation in Central Catalan, and the DNV for Valencian. Vriullop (talk) 20:57, 14 December 2023 (UTC)

@Vriullop Thanks! Benwing2 (talk) 21:41, 14 December 2023 (UTC)

@Jberkel I wrote a preliminary Catalan conjugation module; see User:Benwing2/test-ca-conj for examples. It has a few bugs in it that I'm working out, but it's close. Benwing2 (talk) 22:13, 17 December 2023 (UTC)

Already looking good, thanks for working on this! Jberkel 22:26, 17 December 2023 (UTC)

Pronunciation of feu is correct, 2nd pl. regular with -eu, and the irregular past was spelled féu in pre-2016 orthography, which is more helpful.

The pattern /e/ in Central and /ɛ/ in Valencian is possible, but rare. It can appear for different reasons:

Pronunciation of stressed e is not as uniform in Central Catalan as in other dialects. For example, a given word can be /e/ in Barcelona and /ɛ/ in Girona or vice versa. In general, one of the two is considered formal and the other local or dialectal. The formal one is usually the expected one or the same as in Valencian and Balearic.
Recent loanwords may have hesitations in their adaptation. They are usually adapted with è, but with é for the Spanish ones.

The DCVB indicates these local details. In this case I trust the GDLC more. The DCVB comes from fieldwork in the 1920s. Some of the pronunciations have not been registered in other late 20th c. fieldwork. The GDLC compiles the pronunciation of the main reference work used for radio and TV speakers in Central formal speech. In short, this pattern is rare in formal pronunciation. As far as I can remember, it doesn't happen with verb forms, and it can be treated like other irregular cases that do not follow an expected pattern. --Vriullop (talk) 18:00, 19 December 2023 (UTC)

Although the /e/-/ɛ/ pattern above is rare, the other way is more common: /ɛ/ in Central and Balearic, /e/ in Valencian. This is noted on cawikt as ë (double e), a variant of ê (triple e). Stressed schwa in Balearic is used in inherited words and inflections. In cultisms or loanwords (e.g. cafè), or just words perceived as literary (e.g. mestre), instead of schwa it is /ɛ/ as in Central. There are indeed verb forms with rhizotonic vowel ë. There is no equivalent with stressed o, but for consistency it could be noted ö (double o) instead of ô. Vriullop (talk) 08:02, 21 December 2023 (UTC)
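To keep the notation straight, the values described so far can be laid out as a small lookup table. This is only an illustrative Python sketch: the names `MID_VOWEL` and `stressed_vowel` are hypothetical and do not reflect how Module:ca-IPA stores its data. The row for ô is an inference from the remark that stressed o has no schwa counterpart in Balearic, so treat it as an assumption.

```python
# Hypothetical table: ca-IPA notation -> stressed vowel per dialect,
# as described in this thread.
MID_VOWEL = {
    "è": {"central": "ɛ", "balearic": "ɛ", "valencian": "ɛ"},
    "é": {"central": "e", "balearic": "e", "valencian": "e"},
    # "triple e": three different outcomes
    "ê": {"central": "ɛ", "balearic": "ə", "valencian": "e"},
    # "double e": two outcomes, no stressed schwa in Balearic
    "ë": {"central": "ɛ", "balearic": "ɛ", "valencian": "e"},
    "ò": {"central": "ɔ", "balearic": "ɔ", "valencian": "ɔ"},
    "ó": {"central": "o", "balearic": "o", "valencian": "o"},
    # Assumed from the discussion: open in Central and Balearic, close in Valencian
    "ô": {"central": "ɔ", "balearic": "ɔ", "valencian": "o"},
}


def stressed_vowel(notation, dialect):
    """Look up the stressed vowel for a notation symbol in a given dialect."""
    return MID_VOWEL[notation][dialect]


if __name__ == "__main__":
    for symbol in ("ê", "ë", "ô"):
        print(symbol, MID_VOWEL[symbol])
```

Individual words can still deviate from these rows, which is why several answers in this thread have the form "ô, but Balearic ó".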

@Vriullop Thanks for all your help. I have implemented ë in Module:ca-IPA. Can you help me by fixing the default rules in the module that currently default to ê to instead default to ë when it's correct? For example, cens defaults to cêns when it should be cëns. This is in the mid_vowel_e() function of Module:ca-IPA. I don't know Catalan well enough to fix it myself, and the corresponding cawikt module in ca:Module:ca-pron/AFI seems to have the same rules we currently have. Benwing2 (talk) 20:49, 21 December 2023 (UTC)

As stressed schwa depends on inherited v. cultism, there is too much variation with -ens, -ena, -enes endings to be able to redefine the rule. I have added tracking and I have checked where it was being applied by default. After adding hint ê or ë, I think it is safer to remove this rule: Special:WhatLinksHere/Template:tracking/ca-IPA/ens-ena-enes. Later, I'll look at other rules with default ê. Vriullop (talk) 09:19, 22 December 2023 (UTC)

@Vriullop Thank you. I agree about removing the rule. In general I'm not much in favor of rules like this that are wrong a significant fraction of the time, and prefer to be explicit except when it's nearly completely predictable. Benwing2 (talk) 11:06, 22 December 2023 (UTC)

@Vriullop I just discovered that cerndre is irregularly missing the first r in pronunciation. Does this carry through to inflected forms like cerno, cerns or are they pronounced regularly with /r/? Benwing2 (talk) 03:05, 24 December 2023 (UTC)

BTW there is a bug in cawikt's handling of Balearic pronunciation with ê; hard /k/ shows up as /c/ in the first of two alternants. See ca:cerca for an example. Benwing2 (talk) 03:08, 24 December 2023 (UTC)

@Vriullop OK, I have several more questions. I'll try to list them all here and avoid pinging you individually.

cors "privateering campaign" and cors "Corsican" are given without the /r/ in Eastern Catalan pronunciation both here and in cawikt. However, GDLC says /kórs/ for the former and /kɔ́rs/ for the latter. Which is correct, and if the /r/ is correct, do we need to update Module:ca-IPA?
I am going through mid-vowel verbs trying to update the inflected forms to have the correct vowels. I am probably going to implement something soon in {{ca-conj}} and/or {{ca-verb}} to let you specify the mid-vowel quality and display it, similar to what cawikt does. I cannot determine the vowel quality of the following verbs so far: cessar, conrar, copar, copsar, crepar, dopar, drenar, gestar. Can you help?
I am going to update Module:ca-IPA so you can individually specify the pronunciation of different dialects, as I have found some need for this. Apropos of this, I notice that the cawikt version of {{ca-pron}} supports ẽ; do you think we should support this, or just use the per-dialect support I am going to add?
Also, I'm more and more convinced that we should have few default rules for mid-vowel quality, and require it to be given explicitly in all cases that don't involve a well-known affix.
fossa "pit, grave, etc.": does it have /o/ [per GDLC] or /ɔ/ [per DNV, DCVB and cawikt]?
llei "law": does it have /e/ or /ɛ/ in Eastern Catalan, or some complex mixture? cawikt says /ɛ/, GDLC says /e/, DCVB says a complex mixture.

Thanks for your help, Benwing2 (talk) 06:41, 24 December 2023 (UTC)

Lot of stuff here, but I'm happy to help.

'Cerndre' loses its first r when followed by the sequence -ndr-. That is, in the infinitive, future and conditional. All other forms have regular pronunciation. This also happens with prendre and derived verbs. See ca:Categoria:Rimes en català -ɛndɾe including 14 verbs ending with -prendre. The sequence -rndr- only occurs in 'cerndre' and there is no other term with the sequence -rendr- apart from these 14 verbs.
/c/ in Mallorcan is an allophone of /k/, i.e. local pronunciation [məˈʎɔ̞ɾ.ca̟]. You're right, this is phonetic and not phonemic. Catalan works often include some phonetic symbols in phonemic representations for dialectal contrast, but this is not the case of [c] with restricted use. I plan to remove it for being misleading.
'Cors' fixed on cawikt. This r is really retained, respelled 'corrs'. The module should not assume the loss of -r(s) in final coda for monosyllables. While most polysyllables lose it, most monosyllables don't. The problem is how to manage that.
My guess on rhizotonic vowels:
- cessar: é; inherited from Latin ĕ not followed by an opening context, and DNV é.
- conrar: ó; from unstressed o, reduction of conrear, DNV ó.
- copar: ó; from French /u/ and analogous to noun copa, DNV ó.
- copsar: ó; inherited from Latin ŭ, DNV ó.
- crepar etym 1: ë; as noun crep from the same French root, neologism not attested in Balearic, DNV é.
- crepar etym 2: é; from Latin ĕ, only used in Balearic.
- dopar: ó; neologism as in Spanish, close to the English original, DNV ó.
- drenar: é; idem.
- gestar: é; from Latin ĕ, as the noun gesta from the same root, DNV é.
Notation ẽ is hardly used. It is better to fix that with parameters per-dialect: ca:Special:Diff/2245937. I'll remove it on cawikt.
Some rules for mid-vowels are theoretically justified. I have this pending to review the unwanted side effects. I agree that it shouldn't lead to erroneous results.
Fossa should be ò from Latin ŏ, but there have been some modern changes during the 20th c. that I am still unable to explain. The DCVB shows the situation in the first third of the 20th c. in accordance with etymology. Pronunciation in Central today is probably hesitant. In this case, I would say ó in Central and ò in Balearic and Valencian, two more conservative dialects.
Llei fixed on cawikt. From Latin ē it should be ê, but the diphthong has changed it: é in most Central, retained è in northern Central, /ə/ in Balearic, é in Valencian.

Vriullop (talk) 18:27, 26 December 2023 (UTC)
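The statement about cerndre and the -prendre verbs amounts to a respelling rule on the written form: an r is silent when it stands directly before -ndr- or -endr-. A minimal Python sketch of that reading follows; the function name `drop_r_before_ndr` is hypothetical, the rule works on spelling rather than on a phonemic representation, and it relies on the claim above that no other Catalan term contains -rndr- or -rendr-.

```python
import re


def drop_r_before_ndr(form):
    """Respell a verb form, deleting an r that precedes -ndr- or -endr-.

    Affects infinitive, future and conditional forms such as 'cerndre',
    'prendre', 'aprendré'; other forms are returned unchanged.
    """
    return re.sub(r"r(e?ndr)", r"\1", form)


if __name__ == "__main__":
    for form in ("cerndre", "prendre", "aprendré", "cerno", "prenc"):
        print(form, "->", drop_r_before_ndr(form))
```

Forms such as cerno or prenc contain no -ndr- sequence, so they keep their r, which matches the answer that all other forms are regular.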

@Vriullop Thank you! I have applied the changes offline to the specific verbs and other words mentioned above, and I will push them soon. Still working on Module:ca-IPA. A few more questions:

More verbs where I'm not sure of the rhizotonic vowel quality: menar "to lead" (is this ê?), menjar "to eat" (apparently it uses now-deprecated ẽ?), mentir "to lie" (?), molar "to mock" (from Spanish; ó?).
mesa "altar, mense, table": cawikt says /e/ for both East and West, which agrees with DCVB, but GDLC says /ɛ/. Mistake?
messes "harvest time": again, cawikt says /e/ for both East and West, which agrees with DCVB, but GDLC says /ɛ/.

Benwing2 (talk) 05:46, 27 December 2023 (UTC)

'menar': ê.
'menjar': é but Balearic ə. I'll modify the rizo parameter to accept an explicit /e/, /ə/, only used here.
'mentir': é in forms without -eix-.
'molar', to rock, from Spanish: ó.
'mesa' as a noun has two etyms with different pronunciations, but GDLC only shows one in translations. Here DCVB is correct.
'messes', I would say é but irregular è in Central.

Vriullop (talk) 09:47, 27 December 2023 (UTC)

@Vriullop Thanks for your quick response! I have made the offline updates. Some more questions (for N and O) ...

noble: I already pinged you about this. DNV says /o/ for Valencian but DCVB says /ɔ/.
nombre: cawikt and DCVB say /o/ for Eastern Catalan but GDLC says /ɔ/. /o/ is etymologically expected.
odre: Same. cawikt and DCVB say /o/ for Eastern Catalan but GDLC says /ɔ/. /o/ is etymologically expected.
ofi "office": Vowel quality? Maybe /o/ since the o is unstressed in oficina?
oi: DCVB splits the interjection into /ɔj/ "yes" from Latin hoc and /oj/ (expression of pain or surprise). GDLC and DNV group these two meanings and say the pronun for both is /ɔj/. Who is right?
orla "border, fringe": DCVB and cawikt say /ɔ/ for Valencian. DNV says /o/.
oro "suit in a Spanish deck of cards": Same as previous: DCVB and cawikt say /ɔ/ for Valencian. DNV says /o/. (Not in GDLC.)

Benwing2 (talk) 01:03, 28 December 2023 (UTC)

For P:

peli "film" (clipping of pel·lícula): cawikt says pel·li has ê, so I assume this is the same, but it seems strange to have ê for a recent coinage.
perca "perch (fish)": cawikt says /ɛ/ for Valencian but DNV says /e/. DCVB doesn't give a pronunciation.
pesta "plague": cawikt and DCVB say /ɛ/ for Central but GDLC says /e/ (mistake?).
pleca "vertical bar": Balearic vowel? Is it ê?
poblar "to populate": DNV says stressed vowel is /o/ despite poble having /ɔ/. Mistake?
porro "leek; spliff": cawikt and DCVB say /ɔ/ but both GDLC and DNV say /o/.
posa "pose" (not in cawikt): GDLC says /o/ despite this being derived from posar, which has /ɔ/. (Are there two different pronuns/etyms here?)
postres "dessert": cawikt and DCVB say /ɔ/ for Valencian but DNV says /o/.
pregar "to pray": Presumably /e/ (same as prec)?

Benwing2 (talk) 05:23, 28 December 2023 (UTC)

For P:

'peli' is an informal spelling of 'pel·li'. The latter is used in the press and has been consolidated, unlike other clippings. I spontaneously pronounce it è just like any word beginning with consonant + stressed e + l, including inherited ones from Latin both ĕ and ē. Being of general use and not exclusively colloquial, I would say ê, fully adapted in Central and the same value as unstressed in Balearic and Valencian.
'perca': ë. Expected é but è per context C+ĕ+r, not fully changed in learned borrowings.
'pesta' is weird, expected é but with some irregular è not sufficiently explained in context C+ĕ+s. From the sources, è but irregular é in Central, although the irregularity is the other way around.
'pleca': ë, as a technical word, schwa is improbable in Balearic.
'poblar': I can't find any explanation for the difference between 'poble' and 'pobla'. Without any confirmation, for now I would say ò.
'porro': ó. Expected ò but usually changes to ó before -rr-.
'posa': noun ó and verb ò. Expected ò both from 'pausa' and 'pausare', but most current senses of the noun are calques of French or Spanish, both ó.
'pregar': é.

Vriullop (talk) 13:30, 29 December 2023 (UTC)

On cawikt the pronunciation was first added according to DCVB. Revision with GDLC is partial, not completed. Inclusion of pronunciation on DNV is recent, not yet checked. Your guesses are usually correct.

For N and O:

'noble': ô. Expected ó, on first syllable changed to ò per consonant context, except on areas with Mozarabic influence as in Valencian.
'nombre': ô. The same case, but I trust DCVB for Balearic with irregular ó.
'odre': ô, but Balearic ó.
'ofi', I've never heard it in Catalan. My guess is ó either from an unstressed vowel or from Spanish.
'oi' both ò and ó. I trust DCVB with three groups, the last one used especially in Balearic. The two authors of the DCVB were Balearic, and both 'oi las' (surprise) and 'ois' (moans) sound familiar to me from hearing Balearic people. Probably outside the Balearic Islands people don't care about the difference with barely used senses.
'orla': ô. Again, an expected ó changed to ò except in Valencian, confirmed in descriptive works.
'oro': ô, hesitant by analogy with inherited 'or'.

Vriullop (talk) 15:32, 28 December 2023 (UTC)

For R:

reble "rubble": cawikt and DCVB say ê, but GDLC says /e/ for Central Catalan.
recar "to regret": DNV says /e/; DCVB suggests /e/ everywhere, is that right?
regar "to water": Etymologically should be ê, is that right? (OTOH reg has /e/ everywhere per GDLC and DNV)
regna, regne, regnar: These seem to have [ŋn]. Do all words in -gn- have this? If so we should fix Module:ca-IPA to do this automatically. (Is this Eastern Catalan only? Valencian seems to have [gn].)
reptar: In the meaning "to reprimand; to challenge" it seems to have rhizotonic /e/. In the meaning "to crawl" I am not sure.
resar "to pray": Since this is a Spanish borrowing, does it have /e/? res "prayer" seems to have /e/.
retre "to give back, to return": cawikt and DCVB say /e/ in Eastern Catalan but GDLC says /ɛ/.
rosca "screw thread": cawikt and DCVB say /ɔ/ for Valencian but DNV says /o/.
rosta "fried bacon, fried bread": cawikt says /ɔ/ for both Eastern and Western; DNV says /ɔ/ for Valencian but GDLC says /o/. DCVB has /ɔ/ and /o/ dialectally.
rosta (feminine of rost "steep"): Same. cawikt says /ɔ/ for all, DNV says /ɔ/ but GDLC says /o/. Here, DCVB has only /ɔ/.
rotar: Two etyms: (1) "to belch": Does it have /o/ like rot "belch"? (2) "to rotate": Does it have /o/ because it's borrowed from Spanish?
rotllo: "roll; annoyance": DNV says it has /o/ but rotlle has /ɔ/. Mistake? cawikt and DCVB say forms have /ɔ/ everywhere, and GDLC agrees that both forms have /ɔ/ in Central Catalan. Note also rotlo, where again DNV has /o/; here again, DCVB says /ɔ/ everywhere but in this case cawikt uses ô to get /o/ in Valencian.

Benwing2 (talk) 08:37, 28 December 2023 (UTC)

@Vriullop Thanks again for your detailed responses, I really appreciate the work you're putting into the responses. Issues I found involving terms with S:

seca "mint": GDLC says /ɛ/, DNV says /e/ and cawikt says ê, which are all compatible, but DCVB says /ɛ/ everywhere. In this case I wonder if DCVB is actually correct while both DNV and cawikt are mistaken.
sedar "to sedate": DNV says /e/ for root vowel but unknown in Central Catalan.
sense "without": cawikt and DCVB say ê, DNV says /e/ but GDLC says /e/ rather than expected /ɛ/.
sentir "to feel": DNV says /e/ root vowel. No dictionary attests the Central Catalan root quality, although /e/ is expected.
serva "serviceberry": cawikt and DCVB say ê, DNV says /e/ but GDLC says /e/ rather than expected /ɛ/.
setge "siege; figwort": cawikt and DCVB say é, DNV says /e/ but GDLC says /ɛ/.
soga "rope": DNV and GDLC both say /ɔ/ but DCVB says variously /o/ or /ɔ/ for a bunch of obscure places that I'm not familiar with but seem mostly Northwest Catalonian. I assume Balearic must have /ɔ/ but not sure.
sonso "clumsy, gauche": cawikt and DCVB say /o/ for both East and West; DNV agrees with /o/, but GDLC says /ɔ/ for Central Catalan. Maybe this is a case of changing over the last century?
sorna "sarcasm": cawikt says ô, but both DNV and GDLC say /o/. DCVB doesn't give pronun.
sosa "saltwort, soda ash": cawikt and DCVB say ô, but both DNV and GDLC say /o/.
sostre "ceiling": cawikt says ó and DNV says /o/, but GDLC says /ɔ/. DCVB maybe has the real story: /ɔ/ in Barcelona, /o/ elsewhere. I'm going with the idea that Western Catalan (Northwestern and Valencian) have /o/, while Central has /ɔ/ and Balearic has /o/. Correct?
sotjar "to spy on": DNV says /o/ root vowel. No dictionary attests the Central Catalan root quality, but I am guessing /o/ based on the proposed etymologies. Correct?

Note that I'm now 87% through the set of 2,722 terms that I identified for auditing the mid-vowel quality, and have finished with S. T represents about 7% of the total, V represents 4-5%, and the remaining letters around 1%. So I'm quite close to finishing, with lots of help from you :) ... Benwing2 (talk) 09:20, 30 December 2023 (UTC)

For S:

seca: I think the correct one is ë, although I'm not sure about its evolution from Arabic.
sedar: expected ê from Latin sēdō.
sense and sens: expected ê, but such words often used as proclitics tend to become closed. So é but schwa in Balearic.
sentir: é as expected.
serva: ê is correct. As in other similar cases, the GDLC does not distinguish properly different pronunciations from different etyms.
setge: expected é, but è in Central per context subject to openness.
soga: ò in general. It was identified by Coromines among a handful of about 40 words that have changed an etymological ó to ò except in some specific areas. It is known as the Coromines law, and it is still unknown why it includes certain words and not others.
sonso: ó but ò in Central, for a reason unknown to me.
sorna: ó in general.
sosa: ó in general.
sostre: it is one of the Coromines law words, expected ó changed to ò. This law may have various degrees of extension. Probably the most conservative areas, Balearic and Valencian, maintain the old ó, while most of Central has changed to ò. Usually Northwestern also changes by Central attraction, to be confirmed.
sotjar: not sure, but ó is the best guess.

Vriullop (talk) 08:31, 5 January 2024 (UTC)

For R:

reble: expected é. The DCVB with ê seems by analogy with other words. I would say é but with an irregular ə in Balearic.
recar: é as expected from an earlier 'a'.
regar: ê as expected. Nouns 'rec' and 'reg' are interrelated and are not a good indicator for the verb.
All -gn- between vowels are pronounced [ŋn]. Also -n- followed by /k/ or /ɡ/, but this one was reverted as non-phonemic.
reptar: é from Latin rĕp(u)tō and ê from rēptō.
resar: é as noun 'res'.
retre: I really don't know which process applies here. For now I'd say ë, pending confirmation.
rosca: ô.
rosta, as a slice of bacon usually fried with bread, is a typical dish of the Pyrenees. Although it is the feminine form of 'rost', from the old sense "roasted", in the Pyrenees this ò usually changes to ó. In the DCVB, I read that the northernmost localities say ó, and ò is found quite far from the Pyrenees. In short, as a noun ó in Central, ò in Valencian and Balearic. As an adjective form: ò, although the GDLC does not separate it properly.
rotar: ó for both etyms.
rotllo, what a mess! It is not attested in Valencian until recent times, probably from Spanish rollo. This ó is archaic, not accepted in other areas where it is used from Old Catalan. 'Rotlle' is the inherited form, hardly used in Valencian, where the spelling 'rotle' is preferred, both ò. 'Rotlo' is only used in Balearic; for me, how outsiders try to pronounce it, with a range of alternative spellings, is anecdotal.

Vriullop (talk) 11:29, 4 January 2024 (UTC)
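If the -gn- rule stated in this reply were automated, as asked about for regna, regne and regnar, it could look roughly like the Python sketch below. The function name `nasalize_gn` and the vowel inventory are assumptions for illustration only, the rule is applied to spelling rather than to the module's internal representation, and the open question about Valencian is not addressed.

```python
import re

# Assumed set of vowel letters, including the ca-IPA hint letters ê, ë, ô.
VOWELS = "aeiouàèéêëíïòóôúü"


def nasalize_gn(word):
    """Replace intervocalic -gn- with ŋn, leaving other -gn- untouched."""
    return re.sub(f"(?<=[{VOWELS}])gn(?=[{VOWELS}])", "ŋn", word)


if __name__ == "__main__":
    for word in ("regna", "regne", "regnar", "gnom"):
        print(word, "->", nasalize_gn(word))
```

Word-initial gn, as in gnom, is left alone because the rule as stated only covers -gn- between vowels.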

@Vriullop Thank you again! BTW I have gone through and added (offline) stressed root vowels to all enwikt Catalan verbs with e or o where I could determine it, using some combination of cawikt, DNV, GDLC and DCVB. (It looks like I was able to figure out the vowel for 1,174 verbs in -ar, 33 verbs in -ir and all relevant verbs in -re and -er, and only couldn't figure out the vowel for 72 verbs in -ar and 2 verbs in -ir.) I am mostly done coding the changes I want to make to Module:ca-IPA and I'll use the new code to support displaying the root vowel info. I'll post the list of undetermined verbs soon. Benwing2 (talk) 19:55, 4 January 2024 (UTC)

BTW I have finished the changes to Module:ca-IPA and Module:ca-headword and pushed all the root vowel additions. You can see them in action e.g. in flirtejar, besar, adreçar, annexar and several others. Benwing2 (talk) 07:45, 5 January 2024 (UTC)

Also, I added tracking for all terms with defaulted mid vowel quality, with the plan of removing some of the defaults. The first word I looked at, for example, is amulet, a recent borrowing that claims to have ê, which seems unlikely. Benwing2 (talk) 08:07, 5 January 2024 (UTC)

Here is the list of now 68 -ar verbs where I couldn't identify the Central Catalan root vowel (sometimes only in one etymology out of several): afogar, agregar, al·legar, alterar, amonestar, ancorar, atemptar, celebrar, col·laborar, commemorar, compensar, condensar, confessar, congregar, conrear, contemplar, crebar, delegar, denegar, depredar, desagregar, desintegrar, deteriorar, devorar, discrepar, dreçar, dropar, edulcorar, elaborar, elevar, encetar, engegar, enllumenar, ennuegar, ensopegar, entaforar, entollar, entrenar, esborrar, esbotzar, esmicolar, espitregar, esverar, evaporar, exacerbar, expectorar, explorar, gofrar, impetrar, increpar, integrar, interpretar, isolar, laborar, negar, perforar, prolongar, rememorar, retolar, rosegar, secretar, segregar, somorgollar, temptar, tomar, trafegar, trepar, trepollar. Benwing2 (talk) 08:12, 5 January 2024 (UTC)

In some cases I can't be completely sure, these are my best guesses: afogar ó, agregar é, al·legar ê, alterar é, amonestar é, ancorar ó, atemptar é, celebrar é, col·laborar ó, commemorar ô, compensar ê, condensar ê, confessar é, congregar é, conrear ë, contemplar é, crebar é, delegar é, denegar é, depredar é, desagregar é, desintegrar é, deteriorar ó, devorar ô, discrepar é, dreçar ë, dropar ó, edulcorar ô, elaborar ó, elevar é, encetar é, engegar é, enllumenar ê, ennuegar ë, ensopegar ê, entaforar ó, entollar ò (both), entrenar é, esborrar ó, esbotzar ó, esmicolar ô, espitregar ë, esverar é, evaporar ó, exacerbar é, expectorar ó, explorar ó, gofrar ó, impetrar é, increpar é, integrar é, interpretar é, isolar ô, laborar ó, negar é (both), perforar ó, prolongar ó, rememorar ó, retolar ó, rosegar ê, secretar ë, segregar é, somorgollar ó, temptar é, tomar ó, trafegar ê, trepar é, trepollar ó. Vriullop (talk) 08:23, 10 January 2024 (UTC)

Reviewing mid-vowel defaults tracked:

e/u: doesn't make any sense, probably it was intended for a diphthong -eu-.
o/u: also nonsense.
e/ct-cts-cts-ctes: too many variations è with cases of é only in Central.
e/dre-dres: mostly ë instead of é.
e/final-l: it is stable but needs to exclude -ell(s).
e/l-ls-ll: it's ok, I haven't found any problem.
e/ma-mes: too many variations
e/ens-ena-enes: too many variations ê/ë
e/nse-nses: it isn't worth it for a few words
e/nt-nts: mostly é with few exceptions, widely used
e/r-rs-ra-res: too many variations é/ê
e/rC: it's ok
e/sos-sa-ses: it's ok
e/t-ts-ta-tes: too many variations
è/s-blank: FIXME only in last syllable stressed, currently includes tèbia, època, ...
o/r-rs-ra-res: too many variations

Vriullop (talk) 09:20, 8 January 2024 (UTC)

@Vriullop I have finished everything up through T and pushed the offline changes to Wiktionary. Issues I found with T:

teca three etyms: (1) "food"; (2) "teak"; (3) "theca". All three have /e/ per DNV, and (1) and (3) have /ɛ/ per GDLC. (1) has /ɛ/ per DCVB, otherwise not indicated. I am guessing then that (1) and (3) have ë, and (2) must have either é or ë.
temprar: Exactly parallel to emprar. cawikt says ê but DNV says /e/ for tempre. Is /e/ more recent for Central?
temptar "to try": /e/ per DNV, I'm guessing é per etymology.
tesla "tesla": /e/ per DNV, I'm guessing é.
testar "to witness": /e/ per DNV, I'm guessing é per etymology.
teu "your": /ɛ/ per GDLC for Central Catalan but /e/ per cawikt. GDLC says /e/ for meu "my" so I wonder if this isn't a mistake in GDLC.
text "text": /tekst/ per GDLC, /tɛkst/ per DNV. Correct? DCVB says /test/ for everywhere, which may be antiquated.
tomar (1) "to catch"; (2) "to knock down". Root vowel?
tondre "to shear": /o/ for Central in cawikt and DCVB, but /ɔ/ in GDLC (DNV says /o/). However, note that tosa has /o/ in GDLC. What's going on here?
tora "aconite": GDLC and DNV both say /o/ but DCVB says /ɔ/ for both Western and Eastern. Is /ɔ/ antiquated?
torbar "to disturb", torba "disturbance" and torba "peat": GDLC and DNV both say /o/ but cawikt says /ɔ/ for Central Catalan (/o/ for Valencian). Is /ɔ/ wrong or antiquated?
tors "torso": cawikt says /o/ (dialect not indicated), but GDLC says /ɔ/ for Central (and DNV says /o/ for Valencian). I am assuming GDLC is correct.
trempa, trempar: cawikt says ê everywhere, in agreement with DCVB for tremp and trempa, but GDLC gives /e/ for both tremp and trempa; maybe /e/ is more modern as DCVB's fieldwork is ~ 100 years old.
trenca "duffel coat": A borrowing from Spanish trenca. The other meaning of the noun "breakage; lesser grey shrike" has ê but this seems unlikely for a Spanish borrowing. I'm guessing ë.
trepa "trimming; stencil" also "mob, riffraff, rabble" also a form of trepar "to drill, to perforate". DNV says /e/ for all three etyms; GDLC says /e/ for the first two, but DCVB says /ɛ/ for the meaning "mob, rabble". I am not sure whether all three are etymologically related.
tropa "troop; crowd": cawikt says /ɔ/ everywhere (and DNV says /ɔ/) but GDLC says /o/. DCVB says /ɔ/ for Eastern but /o/ for Girona; maybe /o/ for Central is more recent.
trotllo "medusafish": cawikt says /ɔ/ everywhere but DNV says /o/, so I'm assuming ô.

Also a few other issues:

alliberar: cawikt says /ɛ/ everywhere but DNV says /e/.
beca and derived becar "to give a scholarship to": cawikt says ë but DNV says è.
clon: cawikt says ò but DNV says /o/. I am guessing ô then.
emprar: cawikt says ê but GDLC says /e/ for empre. Is /e/ more recent for Central?
perseverar: cawikt says ê, are we sure? sever has é.

Benwing2 (talk) 01:53, 6 January 2024 (UTC)Reply

One more question (sorry for the barrage of questions): Currently the module section for Central Catalan unilaterally removes final single -r, whether absolutely word-final or followed by an -s. I'm thinking of making this less absolute, as follows:

Don't remove final -r(s) in monosyllables.
In non-monosyllables, remove final -r(s) in -ar, -er, -ir and in -[dtsç]or, but not otherwise. This is based on the fact that most words in -[dtsç]or are agent nouns and seem to fairly consistently remove the -r, while the remaining words in -or often (but not always) preserve the -r per GDLC. Here is a long list of such words: amor, humor, anterior, vapor, rumor, labor, major, tenor, tumor, terror, inferior, superior, clamor, posterior, furor, ulterior, tricolor, temor, rigor, vigor, menor, decor, olor, llavor, suor, licor, rubor, petricor, negror, remor, millor, albor, cremor, claror, grogor, blavor, maror, pitjor, frescor, senyor, finor, incolor, rojor, vermellor, blancor, lletjor, amargor, primor, favor, picor, escalfor, tremolor, esgarrifor, llacor, raor, xafogor. The idea is that to force the preservation of -r, write 'rr', and to force the non-preservation, write '-ó' (although if all these words preserve the -r in Valencian, we'd want some other signal, e.g. '-(r)'). Thoughts? Benwing2 (talk) 09:50, 6 January 2024 (UTC)Reply
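The proposed default can be sketched roughly as follows. This is only an illustration of the two rules as stated above, not the code in the module: the function names are hypothetical, syllables are approximated by counting vowel groups, and stress is ignored entirely.

```python
import re

# Crude syllable estimate: one syllable per run of vowel letters (an assumption).
VOWEL_GROUP = re.compile(r"[aeiouàèéíïòóúü]+")
# Endings whose final -r(s) would be dropped by default under the proposal:
# -ar, -er, -ir, and -or preceded by d, t, s or ç.
DROPPING_ENDING = re.compile(r"(?:[aei]|[dtsç]o)rs?$")


def count_syllables(word: str) -> int:
    """Approximate the number of syllables by counting vowel groups."""
    return len(VOWEL_GROUP.findall(word.lower()))


def drops_final_r_by_default(word: str) -> bool:
    """Return True if final -r(s) would be removed by default in Central Catalan."""
    word = word.lower()
    if count_syllables(word) <= 1:
        return False  # monosyllables keep their -r(s)
    return DROPPING_ENDING.search(word) is not None


for sample in ("cantar", "mar", "amor", "pintor", "senyor"):
    print(sample, drops_final_r_by_default(sample))
```

Words such as amor and senyor fall outside the default here, which is where an explicit respelling would be needed.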

This plan sounds fine, assuming:

The non-preservation happens when the final syllable is stressed. When it is unstressed, it only affects some words, like créixer, càntir.
In Valencian it is always preserved. To force the non-preservation in Central and Balearic, writing '-(r)' or '-(r)s' is intuitive. In fact, this is similar to the rhymes, i.e. Rhymes:Catalan/a(ɾ).
In Balearic there are more losses of final -r than in Central Catalan. See amor: although the result is correct, it is not consistent when the preservation is forced by writing 'rr', and it is not reasonable to assume that in Balearic no final -r is ever pronounced. Maybe it should be fixed with a per-dialect parameter.

There are many pending things above that require more time. Vriullop (talk) 11:57, 6 January 2024 (UTC)Reply

@Vriullop Thanks for your comments. I'm thinking that writing rr should force the pronunciation of final -r everywhere, while writing something like rh should cause it to be pronounced in Central Catalan but not Balearic. This is based on looking through the DCVB with a sample of the above nouns, some of which appear to have pronounced -r in the Balearics, some not, and for some it depends on where in the Balearics. More complex scenarios can be handled using dialect-specific params (which are now implemented; see llei for an example). Benwing2 (talk) 21:23, 6 January 2024 (UTC)Reply

Another possibility in place of rh for "pronounced everywhere but Balearics" is (rr). This sets up a hierarchy of pronunciation: rr > (rr) > (r) > nothing. Benwing2 (talk) 01:13, 7 January 2024 (UTC)Reply
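The hierarchy rr > (rr) > (r) > nothing can be pictured as nested sets of dialects. The table below is a sketch built from the descriptions in this thread; the names are hypothetical, and treating an omitted r as "pronounced nowhere" is an assumption.

```python
# Dialects in which final -r is pronounced for each respelling notation.
FINAL_R_NOTATION = {
    "rr": {"Central", "Balearic", "Valencian"},  # pronounced everywhere
    "(rr)": {"Central", "Valencian"},            # everywhere but Balearic
    "(r)": {"Valencian"},                        # dropped in Central and Balearic
    "": set(),                                   # r omitted (assumed: pronounced nowhere)
}


def pronounces_final_r(notation: str, dialect: str) -> bool:
    """Return True if the given dialect pronounces final -r under this notation."""
    return dialect in FINAL_R_NOTATION[notation]


print(pronounces_final_r("(rr)", "Balearic"))   # False
print(pronounces_final_r("(rr)", "Central"))    # True
```

Each notation covers a superset of the dialects covered by the next one, which is what makes it a hierarchy rather than four unrelated flags.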

BTW I am planning on making it required to specify the way final -r is pronounced, using one of rr, (r), (rr) [or maybe rh if we decide on that] or omitting it, except in the circumstances where it defaults to (r), which are multisyllabic words ending in stressed -ar, -er, -ir or -[dtsç]or. In all other circumstances, the pronunciation seems far too irregular to provide a default.

Note that I have already removed the majority of defaults for mid vowel o and added the vowel explicitly, and I'm planning on doing the same for mid vowel e. For the defaults I removed, either there were few places that made use of the defaults or there were many but with lots of errors, e.g. o and e in the penultimate syllable with -i or -is in the last syllable were defaulting to ò and è respectively, which makes sense for adjectives of this form but doesn't work for subjunctive verb forms, and there were lots of places where this default was being used for subjunctives, producing incorrect results.

One other thing: the pronunciation given in GDLC for meteor is [məteɔ́ɾ], with unstressed [e]. Is this correct? If so I'll need to add a special symbol to allow for unstressed unreduced vowels. However, maybe it's a mistake; I found a pronunciation on Forvo here [1], which sounds more like [mətəɔ́ɾ] (BTW cawikt says [mətəóɾ] with /o/, which may be wrong as well for Central Catalan). Benwing2 (talk) 05:08, 7 January 2024 (UTC)Reply

I forgot to add, I'm implementing a shortcut notation to make it easier to specify things like the pronunciation of final -r without having to repeat the entire word. If you write [FROM:TO] where FROM is part of the spelling and TO is the corresponding respelling, it will make that substitution in the respelling as long as it's unambiguous. So you can write [or:ôrr] for meteor. To make it even shorter, in cases where the spelling and respelling are similar enough, you can just write the respelling, hence [ôrr], and the code knows that ô should match either o or ô in the original spelling and rr should match either r or rr. Another common example is [ks], which is equivalent to [x:ks] and can be used to respell x as ks in words like boxejador. This will all be documented in {{ca-IPA}} as soon as I push the code. Benwing2 (talk) 05:16, 7 January 2024 (UTC)Reply
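A minimal sketch of the explicit [FROM:TO] form described above. The function name and error handling are hypothetical, and "unambiguous" is taken to mean that FROM occurs exactly once in the spelling. The shorter forms such as [ôrr] or [ks], where FROM is inferred, are not handled here.

```python
import re

# Matches the explicit form "[FROM:TO]" only.
SPEC = re.compile(r"^\[([^:\[\]]+):([^:\[\]]+)\]$")


def apply_substitution(spelling: str, spec: str) -> str:
    """Replace FROM with TO in the spelling if FROM occurs exactly once."""
    match = SPEC.match(spec)
    if match is None:
        raise ValueError(f"not an explicit [FROM:TO] spec: {spec!r}")
    source, target = match.groups()
    occurrences = spelling.count(source)
    if occurrences != 1:
        raise ValueError(
            f"{source!r} occurs {occurrences} times in {spelling!r}; "
            "substitution is ambiguous or impossible"
        )
    return spelling.replace(source, target)


print(apply_substitution("meteor", "[or:ôrr]"))
print(apply_substitution("boxejador", "[x:ks]"))
```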

Great.

For final -r, I like the hierarchy rr > (rr) > (r)
'meteor' with unstressed [e] is correct. No need to do anything in the module, function reduction_ae does not apply any reduction in groups 'eà' and 'eò'.
A shortcut for respelling is useful.

Vriullop (talk) 10:27, 8 January 2024 (UTC)Reply

@Vriullop I have implemented everything described above and fixed up all terms in final -r(s) appropriately. The use of the respellings for -r is documented in {{ca-IPA}}. The substitution notation like [ó(r)] is still being documented. Benwing2 (talk) 02:29, 10 January 2024 (UTC)Reply

@Vriullop Thanks for your comments! I have added the root vowels you specified and am going through the defaulted mid vowel conditions and fixing them up. One thing I notice is that written bl pronounced /b.bl/ and similarly written gl pronounced /g.gl/ aren't correctly handled. For bl at least it seems not all occurrences of bl result in this doubling, e.g. doblar does but sublim doesn't, yet they have the same structure in terms of number of syllables, word shape, position of the accent, etc. What do you recommend? I tried manually adding written g.gl to segle, writing it as seg.gle, but then Valencian also gets the doubling, which is wrong. I see two approaches: (1) Manually require all doubled bl and gl to be written as bbl and ggl except maybe in certain suffixes (e.g. -able(s), -ible(s)), and have the Valencian-specific code remove the doubling and convert it back to single stops; (2) Double bl and gl by default. This would mean we'd need some method of indicating the non-doubled occurrences, maybe by writing sub.lim or something (although this might be problematic when we start providing phonetic output with fricative [βɣð], which I'd like to do soon; not actually sure though if there will be an issue). Thoughts? Benwing2 (talk) 07:02, 12 January 2024 (UTC)Reply

The groups -bl- and -gl- are geminate in Central and Balearic in post-stressed position: poble /ˈpɔb.blə/, regla /ˈreɡ.ɡlə/, including endings -able, -ible. That can be coded in the module. It doesn't happen in Valencian, nor in pre-stressed position, as in sublim. But all its derivatives are also geminate even if in pre-stressed position: poblar, població, reglar, reglament, ... That needs to be respelled pobblar, pobblació, regglar, regglament, and then undone in Valencian. Vriullop (talk) 08:22, 12 January 2024 (UTC)Reply
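The rule just described could look roughly like this. It is a sketch under stated assumptions, not the module's implementation: the function name is hypothetical, and the input is assumed to be a respelling in which the stressed vowel carries an accent mark.

```python
import re

# Accented vowels are assumed to mark the stressed syllable in the respelling.
STRESSED_VOWELS = "àèéêëíòóôú"
POST_STRESS_CLUSTER = re.compile(rf"([{STRESSED_VOWELS}])([bg])l")
GEMINATE_CLUSTER = re.compile(r"([bg])\1l")


def adjust_bl_gl(respelling: str, dialect: str) -> str:
    """Geminate post-stressed bl/gl in Central and Balearic; undo bbl/ggl in Valencian."""
    if dialect == "Valencian":
        # Manually respelled geminates (pobblar, regglament) revert to single stops.
        return GEMINATE_CLUSTER.sub(r"\1l", respelling)
    # Central and Balearic: double the stop directly after the stressed vowel.
    return POST_STRESS_CLUSTER.sub(r"\1\2\2l", respelling)


print(adjust_bl_gl("pòble", "Central"))
print(adjust_bl_gl("sublím", "Central"))
print(adjust_bl_gl("pobblàr", "Valencian"))
```

Derivatives with pre-stressed gemination are not detected automatically in this sketch; they rely on the manual bbl/ggl respelling, as suggested above.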

@Vriullop Got it, thanks. I'll implement this. What do you think of just providing phonetic output and changing the /.../ to [...]? This seems consistent with what the various dictionaries do; or at least, they explicitly show the fricative allophones [βɣð]. This would mean, for example, that the issue of whether to display [ŋ] goes away: we just display it whenever it's pronounced as such. Benwing2 (talk) 20:04, 12 January 2024 (UTC)Reply

I have implemented what you said for -bl- and -gl-. I am currently working on auto-adding secondary stress to adverbs in -ment. (In the process I'm adding a quick shorthand to indicate a part of speech for a given term, e.g. n/RESPELLING or just n/ for a noun, a/RESPELLING or just a/ for an adjective, etc. The idea here is that terms in -ment default to adverbs, which means they get secondary stress by default, but you can override this by specifying n/ for a noun like desembarcament or a/ for an adjective like vehement. Some terms need both a part of speech and respelling, e.g. desdoblament needs n/[bbl] to indicate that it's a noun and the -bl- is pronounced /bbl/.) I have a question though about this. Adverbs in the DNV are indicated with *primary* stress on the preceding component and no stress on -ment, e.g. see [2] for feliçment. This seems rather strange to me and it's contrary to what the Wikipedia article on Catalan phonology says. Is this really true or is it just something weird in the DNV? Benwing2 (talk) 23:40, 12 January 2024 (UTC)Reply

BTW I found an exception to the rule that post-stressed -bl- is geminate: bíblic (and Bíblia). Are there others? If so and given how many exceptions there are in the other direction, I wonder if we shouldn't just make all -bl- and -gl- geminate by default in Central Catalan and Balearic, and require that all cases where this doesn't happen get rewritten using [b.l] or [g.l]. Benwing2 (talk) 04:01, 13 January 2024 (UTC)Reply

I implemented the auto-adding of secondary stress to adverbs in -ment, along with the part of speech hints described above, and fixed up all nouns and adverbs in -ment appropriately. (I actually added pronunciations to all or almost all nouns and adverbs in -ment that were missing them; this took several hours for adverbs because there are around 800 of them in -ment, and many of them have secondarily stressed e or o, which needed looking up.) The mid vowel hint now applies to the part preceding the adverbial -ment, not to the -ment itself (which is always pronounced /men(t)/ with /e/). Note also that in the future, these part of speech hints can also help with things like terms in -ar, where adjectives in Central Catalan pronounce the final -r but nouns and verbs generally don't. Benwing2 (talk) 07:33, 13 January 2024 (UTC)Reply
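As an illustration of the part of speech hints, here is a small sketch of how a spec such as n/[bbl] might be split and how the -ment default could be decided. The function names are hypothetical, and only the n/ and a/ hints mentioned above are treated as overrides.

```python
import re

# A hint is a lowercase part-of-speech code followed by "/", then an optional respelling.
HINT = re.compile(r"^([a-z]+)/(.*)$")


def parse_hint(spec: str):
    """Split 'POS/RESPELLING' into (pos, respelling); pos is None when no hint is given."""
    match = HINT.match(spec)
    if match:
        return match.group(1), match.group(2)
    return None, spec


def gets_secondary_stress(term: str, spec: str = "") -> bool:
    """Terms in -ment default to adverbs (secondary stress) unless marked n/ or a/."""
    if not term.endswith("ment"):
        return False
    pos, _ = parse_hint(spec)
    return pos not in {"n", "a"}


print(parse_hint("n/[bbl]"))
print(gets_secondary_stress("feliçment"))
print(gets_secondary_stress("desembarcament", "n/"))
```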

OK, from the GDLC it looks like there are actually three ways that -bl- can be pronounced: obligar has [βl], doblar has [bbl], and obliterar has [bl]. Is that correct? If so I'll need to come up with some notation to distinguish these three. Maybe we should write o-bliterar to get [bl]; this is consistent with words like hipoglucèmia, which have hard single [gl] following a prefix with secondary stress [ìpuglusɛ́miə]. This would suggest a respelling hípo-glusèmia. Then if we need post-stressed [βl], we write e.g. Bíb.lia, and if we need post-stressed [bl] for some reason we'd write e.g. Bí-blia or something, and to get post-stressed [bbl] we'd write e.g. Bíbblia (or rely on the default). Make sense? Sorry to dump so much text on you. Benwing2 (talk) 09:25, 13 January 2024 (UTC)Reply

Great work here.

The inclusion of allophones βɣðŋɱ does not imply changing the transcription to brackets [...]. In fact, /β/ is not a voiced bilabial fricative but a simplification without diacritic of an approximant [β̞]. Catalan works follow a convention of "broad transcription" with the inclusion of what is considered relevant and without any claim about phonemic values. A purely phonemic transcription is a theoretical discussion. According to different authors, between 25 and 31 phonemes can be considered in Catalan. For example, the schwa is a predictable dialectal allophone, but it is relevant in contrast with other Romance languages. If it were necessary to mark that it is not strictly phonemic, frwikt uses \backslashes\. They are also used by Merriam-Webster as a notation for its own IPA transcription. The criteria followed in enwikt do not seem consistent enough to me.
The DNV does not show primary and secondary stress, nor does it in compound words. It is more noticeable in Eastern dialects without schwa in secondary stress. The stress shown in adverbs with -ment is misleading.
'Bíblic' and 'Bíblia' are the only exceptions to geminate bl.
I have not found any explanation for 'obliterar' and 'hipoglucèmia'. See https://giec.iec.cat/textgramatica/codi/4.4.3.3. Maybe as a cultism in very formal speech, but I don't think it is worth making exceptions here. On the contrary, note that /β/ does not happen in Balearic and formal Valencian after a vowel, that is, in dialects that distinguish /b/-/v/.

Vriullop (talk) 09:17, 15 January 2024 (UTC)Reply

@Vriullop Thanks for your response, this is very helpful. I am currently working on fixing up terms with written x (there are a lot of mistakes) but I'm almost done with the offline portion and I think next I'll focus on adding the fricative allophones and correctly handling multiple words. For handling multiple words I need to know the following:

What are the unstressed words? I assume they are all the proclitic object pronouns em, et, es, el, la, els, les, li, ens, us, ho, hi, en; plus the enclitic ones -me, -te, -se, -lo, -la, -los, -les, -li, -nos, -vos/-us, -ho, -hi, -ne (which might already be handled correctly); the contracted ones with apostrophe (which may already be handled correctly); maybe the unstressed possessives mon, ma, mos, mes, ton, ta, tos, tes, son, sa, sos, ses; the prepositions a, de, per, amb (and obsolete ab?), en (what about cap, des?); the prepositional contractions al, als, del, dels, pel, pels; articles el, la, els, les (already handled as proclitic pronouns), personal articles en, na (what about indefinite articles un, u, uns?); maybe salat articles es/ets, sa, ses, so, sos; the conjunctions i, o (what about si?). Any others?
Which assimilation rules apply across words? The Wikipedia article Catalan phonology says that final -s voices before a vowel, which seems to cause a preceding consonant to voice as well, hence tots els has /dz/ in the middle. I assume that lenition of written b d g occurs across word boundaries as well. What about final omitted -r? Does it reappear before a vowel in the next word, e.g. in a phrase like vaig amar una dona? (And for that matter, does the -ig in vaig become voiced in this phrase?) Do you have any references on this?

Thanks again. Benwing2 (talk) 09:57, 15 January 2024 (UTC)Reply

The list is correct: proclitic and enclitic pronouns, unstressed possessives, prepositions but not 'cap', 'des', contractions, articles including personal ones and salats, indefinite articles but not 'u', conjunctions including 'si' and 'ni', and also que as a pronoun and conjunction.

In general, contact between words triggers the same processes of assimilation, voicing, or devoicing as inside words. Typical examples are els avis /əlz/ versus els savis /əls/; tots els is really /ˈtodz.əls/, and vaig amar is /ˌbad͡ʒ.əˈma/. The final -t reappears when followed by a vowel (sant Antoni /ˌsan.tənˈtɔ.ni/). The final -r of infinitives only reappears when followed by a pronoun (anar-hi /əˈna.ɾi/). You can find a lot of examples from chapter 4.4 onwards of the IEC grammar. Vriullop (talk) 12:37, 15 January 2024 (UTC)Reply
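The els avis / els savis / tots els examples can be reproduced with a very small sketch. It only covers word-final /s/ before a vowel-initial word, works on simplified phonemic strings, and uses hypothetical names throughout; the real module would need the full set of sandhi rules.

```python
VOWELS = set("aeiouəɛɔ")
# Voiceless stops that voice together with a following voiced /z/ (simplified).
VOICED_COUNTERPART = {"t": "d", "p": "b", "k": "g"}


def link_final_s(first: str, second: str) -> str:
    """Voice word-final /s/ (and a preceding stop) when the next word starts with a vowel."""
    initial = second.lstrip("ˈˌ")[:1]  # ignore leading stress marks
    if not first.endswith("s") or initial not in VOWELS:
        return first
    stem = first[:-1]
    if stem and stem[-1] in VOICED_COUNTERPART:
        stem = stem[:-1] + VOICED_COUNTERPART[stem[-1]]
    return stem + "z"


print(link_final_s("əls", "ˈabis"))   # els avis
print(link_final_s("əls", "ˈsabis"))  # els savis
print(link_final_s("ˈtots", "əls"))   # tots els
```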

@Vriullop Thanks again for your help. I finally finished most of the work on multiword support. Still to go is approximant allophones of b/d/g, correct handling of apostrophes (represented with ‿), and ‿ as an indicator of liaison in respelling for cases like Sant Antoni respelled Sànt‿Antòni (which should produce /ˌsan.tənˈtɔ.ni/). I (more or less) read chapter 4.4 in the IEC grammar and I notice it also talks about certain cases of total assimilation where maybe cap de is pronounced /kad də/ or something, but I'm not sure we should implement that. I have some questions though:

Brunsvic (as in e.g. Nova Brunsvic) given as [bɾunzvík] in GDLC, is the v correct?
For drets humans, the module currently generates /ˈdɾɛdz uˈmans/, is that correct?
fer cas, fer acte de presència: Is the <r> pronounced in Central Catalan?
Sant Llorenç de la Salanca: the module currently generates /ˈsaɲ ʎuˈɾɛnz də lə səˈlaŋ.kə/ for Central and /ˈsand ʎoˈɾɛnz de la saˈlaŋ.ka/ for Valencia; correct? In general, does final -ç voice when the next word begins with a vowel?
The IEC grammar is equivocal about whether b/d/g become fricatives after /r/, /ɾ/ and /z/, what should we do in this case?
It appears double schwa /əə/ is often compressed to single schwa /ə/ in Central and maybe Balearic, but not in Valencian. This is indicated in GDLC and seems to operate fairly consistently if the second schwa is in a closed syllable (sobreescalfament, contraescarpa), but only sometimes in an open syllable (centreafricà, contraatac). Can you comment here? Likewise, /i/ or /u/ followed by schwa seems to elide the schwa in aeroespacial, autoescola, antiespasmòdic, but only sometimes if the schwa is in an open syllable (hence not in autoerotisme, antiemètic but yes in fotoelèctric, fotoelectricitat, macroeconomia). Likewise /uu/ seems to compress to /u/ if the second /u/ is in a closed syllable (microorganisme), but only sometimes in an open syllable. How do you think we should handle these cases?
I am trying to figure out what to do for written <tn>, <tm>, <tl>, <tll>. It seems that these tend to be pronounced as geminates in native words (e.g. cotna, setmana) but with [d] in cultisms/learned words. I'm thinking maybe we should make the cultism behavior the default and require respelling for the remainder, at least for <tm> where there are more terms like ritme, aritmètic, atmosfera than terms like setmana. But maybe this should differ depending on the different spellings, e.g. <tl> even in a cultism like atlàntic seems to have a geminate in it in Central Catalan but not in Valencian. Can you comment on what you think should be done?

Benwing2 (talk) 22:45, 26 January 2024 (UTC)Reply

Note, I also revamped the testcases, see Module:ca-IPA/testcases (which demonstrate there's still a lot to fix). Benwing2 (talk) 23:26, 26 January 2024 (UTC)Reply

Brunsvic is strange. The GDLC is supposed to include pronunciations from the Diccionari ortogràfic i de pronúncia (DOP), but it turns out that the DOP does not include proper names. For non-Catalan place names I check ésAdir, a website for radio and TV journalists, and it shows /'bɾunz.βik/ as I expected.
'Drets humans' is correct.
'Fer cas', 'fer acte', are correct. The r of infinitives only reappears when followed by pronouns: fer-se /ˈfer.sə/, fer-hi /ˈfe.ɾi/, fer-t'ho /ˈfer.tu/...
'Sant Llorenç de la Salanca' is correct. Final /s/ of Llorenç is voiced /z/ followed by a voiced consonant or by a vowel.
The IEC grammar is too descriptive about approximants and about when they may or may not appear. Considering that /β/ is rare in dialects with the contrast /v/-/b/, that is, Balearic and Valencian, and trying to be consistent with GDLC and DNV:
- No approximants r/s + b/d/g in Central.
- No approximants r/s + b in Balearic and Valencian.
- Approximants r/s + d/g in Balearic and Valencian.
In general, the concurrence of two identical vowels /əə/ (or /aa/, /ee/), /uu/ (or /oo/) is reduced to a single vowel. Variations may depend on formal v. informal, or common use v. cultism, or emphasis of some prefixes. It is hard to define any exception.
Written <tm> and <tn> are geminated in a handful of inherited words: cotna, reguitnar, setmana and its derivatives. But 'setmana' has a single /m/ in Valencian. 'Vietnamita' and 'sotmetre' are hesitant. Others like 'ritme', 'ètnic', 'algoritme' are cultisms with /dm/, /dn/.
Written <tl> is always /ll/ in Central and Balearic. In Valencian it is /ll/ in inherited words and /dl/ otherwise. Valencian inherited words include those with alternative spelling <tll>: ametla > ametlla, butla > butlla...
Written <tll> as alternative spelling of inherited <tl> is pronounced /ʎʎ/ in Central and /ll/ in Balearic and Valencian. Although the DNV includes 'ametlla', 'butlla'... it is not really used, and if written it is still pronounced as <tl>. As a cultism, like 'ratlla', 'bitllet' or 'butlletí', it is pronounced /ʎʎ/ in Central and /ʎ/ in Balearic and Valencian.

Vriullop (talk) 10:54, 29 January 2024 (UTC)Reply
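The three dialect rules given above for r/s + b/d/g amount to a small lookup table. A sketch, with hypothetical names, of how that decision could be encoded:

```python
# Stops that surface as approximants after r/s, per dialect, following the list above.
APPROXIMANT_AFTER_R_OR_S = {
    "Central": set(),         # no approximants after r/s
    "Balearic": {"d", "g"},   # b stays a stop; d and g become approximants
    "Valencian": {"d", "g"},
}


def is_approximant_after_r_or_s(stop: str, dialect: str) -> bool:
    """Return True if b/d/g is realised as an approximant after r or s in this dialect."""
    if stop not in {"b", "d", "g"}:
        raise ValueError(f"expected one of b, d, g; got {stop!r}")
    return stop in APPROXIMANT_AFTER_R_OR_S[dialect]


for dialect in APPROXIMANT_AFTER_R_OR_S:
    print(dialect, [s for s in "bdg" if is_approximant_after_r_or_s(s, dialect)])
```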

@Vriullop Thanks. I have (already) implemented most of the above things. I haven't yet implemented reduction of adjacent unstressed vowels or redone the implementation of <tl> and <tll>. As for Sant Llorenç de la Salanca, the module formerly generated [ˈsand ʎoˈɾɛnz ðe la saˈlaŋ.ka] for Valencia (note the [d] in /sand/) but I am guessing this is wrong, so I changed it so it now generates [ˈsaɲ ʎoˈɾɛnz ðe la saˈlaŋ.ka]. Basically I am guessing that elision of stops after nasals happens in Valencia before a consonant but not a vowel or utterance-finally. Is this correct? Benwing2 (talk) 01:53, 30 January 2024 (UTC)Reply

I didn't notice 'sant'. It is correct, elision of t and assimilation of the nasal before a consonant, not before a vowel or isolated.--Vriullop (talk) 08:00, 30 January 2024 (UTC)Reply
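The behaviour confirmed here for 'sant' can be sketched as follows. The code only handles final /nt/ and the single place assimilation seen in the example (n becoming ɲ before ʎ); the function name and the simplified phonemic input are assumptions.

```python
VOWELS = set("aeiouəɛɔ")
# Place assimilation of the remaining nasal; only the case from the example is listed.
NASAL_BEFORE = {"ʎ": "ɲ"}


def reduce_final_nt(word: str, next_word: str = "") -> str:
    """Drop final /t/ of /nt/ before a consonant; keep it before a vowel or in isolation."""
    initial = next_word.lstrip("ˈˌ")[:1]  # ignore leading stress marks
    if not word.endswith("nt") or initial == "" or initial in VOWELS:
        return word
    reduced = word[:-1]  # elide the stop
    if initial in NASAL_BEFORE:
        reduced = reduced[:-1] + NASAL_BEFORE[initial]
    return reduced


print(reduce_final_nt("ˈsant", "ʎoˈɾɛns"))  # before a consonant
print(reduce_final_nt("ˈsant", "anˈtɔni"))  # before a vowel
print(reduce_final_nt("ˈsant"))             # isolated
```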

Your bot is removing valid categories

e.g. {{C|de|Western Sahara}} at Westsahara. —Justin (koavf)❤T☮C☺M☯ 00:55, 1 January 2024 (UTC)Reply

@Koavf This is unavoidable. When you add a page to a category, sometimes it takes a little while for the category to register having the page in it, and in the meantime it shows up in CAT:Empty categories, which is what I use periodically to delete empty categories. I check that category before deleting the empty categories referenced, but I can't notice everything. Any non-empty categories so deleted will get re-created in a few days in any case. Benwing2 (talk) 01:06, 1 January 2024 (UTC)Reply

What are you talking about? That category was on that page for 5.5 years and your bot removed it for no reason. How is that unavoidable? Are you telling me that your bot is going to re-add all of these categories and undelete them as well? —Justin (koavf)❤T☮C☺M☯ 01:09, 1 January 2024 (UTC)Reply

Dude, fuck off. Seriously. Yelling at me is not going to get me to help you any quicker than writing nicely.

As for my response, I thought you were referring to my recent deletion of empty categories (as of a few hours ago) rather than a bot change from a month and a half ago. In the future I'd recommend you link to the specific diff. My removal of the category at that time was a by-hand change, not a script change, even though the bot pushed the change; that's what "manually assisted" means (and I have a strong feeling I've already explained this to you). The reason for the removal is that Module:place normally auto-adds categories of this nature, and I thought it would in this case; the reason it didn't is apparently because Western Sahara is listed in Module:place/shared-data as a country, but its definition identifies it (correctly) as a territory rather than a country. I'll fix this so it gets correctly auto-added. Benwing2 (talk) 01:30, 1 January 2024 (UTC)Reply

I was much nicer than you were just now and was in no sense "yelling". There was no reason for that language. I didn't realize that what I wrote was ambiguous and I thought that referring you to the entry would be sufficiently clear where you can see what your bot (or script or by-hand you) did. Thanks for agreeing to fix this and undelete all of these categories. When will this happen? —Justin (koavf)❤T☮C☺M☯ 22:18, 1 January 2024 (UTC)Reply

When will you or your bot undo these category removals? —Justin (koavf)❤T☮C☺M☯ 22:42, 15 January 2024 (UTC)Reply

@Koavf Which removals are you referring to? Specifically to do with Western Sahara, or are there any others? Benwing2 (talk) 22:44, 15 January 2024 (UTC)Reply

The only ones I am aware of are removals of the sort {{C|CODE|Western Sahara}} which emptied several categories that were then deleted. I'm not familiar with any others. —Justin (koavf)❤T☮C☺M☯ 22:46, 15 January 2024 (UTC)Reply

When will you or your bot undo these category removals? —Justin (koavf)❤T☮C☺M☯ 01:37, 21 January 2024 (UTC)Reply

@Koavf Did you not get my ping? I did this days ago. Benwing2 (talk) 02:37, 21 January 2024 (UTC)Reply

I see that it did and no, I didn't. For some weird reason, I also did not get updates for this thread even after subscribing. :/ Thanks a lot. —Justin (koavf)❤T☮C☺M☯ 10:16, 21 January 2024 (UTC)Reply

Twice-borrowed terms

I looked up παλάβρα, which is from παραβολή after passing through Ladino, and found out that, after moving all the "twice-borrowed terms" categories to "terms borrowed back into", there are still lots more Greek twice-borrowed terms than Greek terms borrowed back into Greek. This may also be true of other languages. Can you look into it? PierreAbbat (talk) 16:43, 1 January 2024 (UTC)Reply

@PierreAbbat It’s because they were added manually due to the origin being Ancient Greek, which is a misuse of the category imo. Theknightwho (talk) 19:17, 1 January 2024 (UTC)Reply

Yeah @Pierre, if I may expand on what Theknightwho said, it is indeed because of Ancient Greek being considered a separate language, and this is discussed at Wiktionary:Beer_parlour/2023/November#Does_'terms_borrowed_back_into_LANG'_include_cases_where_the_borrowing_was_from_an_ancestor? (and actually quite a few other places over the years, e.g. Wiktionary:Etymology_scriptorium/2016/June#Twice-borrowed_term_or_term_derived_from_an_older_stage_of_the_same_language?, Wiktionary:Beer_parlour/2011/October#Twice-borrowed_terms), and ... it's tricky. Because ... while I'm sympathetic to the potential complaint that it's somewhat arbitrary that a word used in the modern form of Hebrew or Latin (or Chinese) and derived from the variety spoken two thousand years ago can be automatically categorized as "borrowed back" while a word in modern Greek or English can't be, just because we decided it was most convenient to handle the changes those languages underwent as still being ==Hebrew==, ==Latin== (or ==Chinese==) but decided to split the changes Greek underwent between two languages ... we do have to draw a line somewhere or else we get into absurdities (e.g. a term from Proto-Indo-European, which went into French, and was borrowed into English, is twice-borrowed/borrowed-back?), and if we draw the line anywhere other than "whatever we've decided to consider a separate full language", it gets fuzzy and messy fast. But please comment in the November BP discussion linked above if you have suggestions. - -sche (discuss) 19:45, 1 January 2024 (UTC)Reply

New :toBcp47Code() method

If I interpret this recent change to Scribunto correctly, it provides a way to convert from MediaWiki langcodes to proper langcodes directly. Might be worth incorporating, as I imagine it’ll simplify some of our code, and I think you’re more familiar with that side of things than me. Theknightwho (talk) 15:50, 2 January 2024 (UTC)Reply

@Theknightwho Unfortunately I'm not sure this is useful for our purposes. Wiktionary language codes aren't always the same as MediaWiki language codes and I don't think we ever need to convert MediaWiki -> BCP47; instead if anything we'd need to convert MediaWiki <-> Wiktionary and Wiktionary -> BCP47. Benwing2 (talk) 22:47, 15 January 2024 (UTC)Reply

Addition to quotation-template documentation

I just fixed a module error caused by WF converting a quote to {{quote-book}} without checking what goes where. The template documentation is thoroughly organized, voluminous, and useless for figuring out how to fix parameter values in the wrong slots. I was going to add a little index of positional parameters, but that would have required reverse-engineering your documentation module. Instead, I'm just going to dump a mockup here, and let you deal with it:

Positional parameters

Position:	1	2	3	4	5	6	7	8
Description:	Language code(s)	Year	Author	Title	URL	Page	Quote	Translation
Equivalents:		|year=	|author=	|title=	|url=	|page=, |pages=	|text=, |passage=	|t=, |translation=
See group:	Quoted text	Date	Author	Title	Title	Page and line	Text	Text

An alphabetical index of parameter names might also be nice.

And, no, I don't want fries with that...

Thanks! Chuck Entz (talk) 06:14, 5 January 2024 (UTC)Reply

@Chuck Entz Yeah there are so many params that organizing them properly is a very challenging task. For this reason I tried to do away entirely with positional params but some people squawked loudly enough that they are kept for {{quote-book}} and {{quote-journal}}, and disallowed for the rest. I think your mockup is a good idea. Benwing2 (talk) 06:17, 5 January 2024 (UTC)Reply

Using the Old French conjugation table as an inspiration

I was trying to create a more complex conjugation table for the Old Spanish language. Then I started viewing other templates and learned that the one used for the Old French language is perfect. I might be able to make some basic edits to adapt it to the Old Spanish conjugation system. However, I couldn't get a sample of that template to edit, as there are so many links together. So would you please share with me a simple, editable sample of the Old French template so I can apply it to this page: Cantar? Besides, it'd be helpful to better standardize Wiktionary. Thalyson2019 (talk) 05:42, 6 January 2024 (UTC)Reply

@Thalyson2019 The Old French conjugation tables aren't implemented using templates but rather using a module: Module:fro-verb. I agree that it's a good base to start with when designing a conjugation system for a language that wasn't really standardized. I'm not sure if you are comfortable working in Lua, because the module is written in Lua and it's not really possible to do what it does just using template syntax. Benwing2 (talk) 05:57, 6 January 2024 (UTC)Reply

Is there any solution for that? I already have the verbs and their positions in mind. I'm not familiar with Lua, even though I create basic templates. Thalyson2019 (talk) 06:08, 6 January 2024 (UTC)Reply

@Thalyson2019 You'd have to get someone to create the Lua module for you. I can't commit to something like this right now as I have already committed to several other projects. However if you create some mockups and link them here, then if/when I or someone else is able to contribute, the mockups can be a good starting point. Benwing2 (talk) 06:10, 6 January 2024 (UTC)Reply

Should such mockups be in the form of code or pictures? Thalyson2019 (talk) 06:14, 6 January 2024 (UTC)Reply

@Thalyson2019 Maybe some sample template calls for some simple verbs like cantar and some complicated verbs as well (tener? ir?). I or anyone working on this would in addition need some good resources on Old Spanish verb conjugation. Benwing2 (talk) 06:18, 6 January 2024 (UTC)Reply

Finnish inflections

Latest comment: 8 months ago · 6 comments · 4 people in discussion

Hey Benwing, I know that WingerBot is used to mass-create the inflection pages for Romance verbs. Is there any way that it could do similar work with Finnish noun forms? According to Jberkel's last data dump there are literally millions of Finnish redlinks, most of which appear to be nouns, so bot help is probably necessary to make a real dent. Thanks for your time! Vergencescattered (talk) 20:01, 6 January 2024 (UTC)Reply

@Vergencescattered: have you talked to @Surjection about this? As a native speaker with a bot, they would be a more logical choice, and more likely to be aware of potential problems. Chuck Entz (talk) 20:35, 6 January 2024 (UTC)Reply

@Vergencescattered I agree with Chuck. Also pinging @Hekaheka. E.g. there may be a reason these forms aren't created (too many of them?). Benwing2 (talk) 21:20, 6 January 2024 (UTC)Reply

There are probably somewhere around 200,000 nouns in Finnish and each has 30 inflected forms (15 cases in singular and plural) without taking into account any suffixes. This is the rough number found in Nykysuomen sanakirja. Adding dialects and slang one gets roughly to half a million or more. That would give 6 to 15 million entries. If we add the six (third person possessive suffixes are the same for plural and singular but to compensate this potential simplification there are two of them) possessive suffixes, the number of potential entries increases to 40 to 100 million. Some of the forms might be unattestable as abessive, comitative and instructive are quite rarely used but that does not cut more than 20% of the total. On top of this each verb has close to one hundred inflected forms if we take into account the possessive forms of some infinitives and participles.

This leads me to think that we might need a new approach to inflected forms in general. Perhaps they should have an entry of their own only in such rare cases in which the inflected form has a meaning or meanings that cannot be readily derived from the lemma form. In most cases the system would work so that a search for an inflected form would redirect to the article of the lemma form. --Hekaheka (talk) 23:33, 8 January 2024 (UTC)Reply
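The estimates above can be checked with a few lines of arithmetic. This is only a rough sketch: the noun counts and the 30 forms per noun come from the comment, while the multiplier of 7 (the bare form plus six possessive-suffixed variants) is an assumption chosen because it lands close to the quoted "40 to 100 million".

```python
# Rough estimate of potential Finnish noun-form entries, using the figures quoted above.
FORMS_PER_NOUN = 30  # 15 cases, each in singular and plural


def potential_entries(noun_count, possessive_variants=1):
    """Return the number of inflected-form entries for noun_count nouns."""
    return noun_count * FORMS_PER_NOUN * possessive_variants


low = potential_entries(200_000)   # standard-language estimate
high = potential_entries(500_000)  # including dialects and slang

# Assumption: each form occurs bare or with one of six possessive suffixes.
low_poss = potential_entries(200_000, possessive_variants=7)
high_poss = potential_entries(500_000, possessive_variants=7)

print(low, high)            # 6000000 15000000
print(low_poss, high_poss)  # 42000000 105000000
```

Cutting 20% for rarely attested cases would still leave tens of millions of potential entries at the upper end.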

@Hekaheka It would be great if MediaWiki could autogenerate the text of an inflected form, but in its current state it can't do either that or redirect from an inflected form to a lemma form. IMO the most useful thing about having inflected forms entered as such is when you have homophones or homographs between different inflected forms. This occurs fairly often in the Romance languages, for example, between noun and verb forms or between adjective and verb forms. It also occurs fairly often in Russian between noun and verb forms but rarely for adjectives except for short forms of adjectives; for this reason I have never done a bot run to create Russian adjective forms (besides the fact that there are a lot of them). If Finnish grammar is largely regular and doesn't have a lot of homonyms, I would think it's not useful to have inflected forms generated. I suppose for the moment we need to use our judgment as to whether it's worth it to create such forms. Benwing2 (talk) 23:38, 8 January 2024 (UTC)Reply

I would definitely appreciate their input! I didn't know about Surjection or their bot before you mentioned them, so I apologize for bothering you about it. Thank you! Vergencescattered (talk) 23:27, 6 January 2024 (UTC)Reply

Request to deploy `{{szy-pron}}`

Latest comment: 5 months ago · 8 comments · 2 people in discussion

I've created a Sakizaya pronunciation template, and I need help deploying it to all Sakizaya language entries on Wiktionary. Could you assist with this using your bot account? --TongcyDai (talk) 17:29, 7 January 2024 (UTC)Reply

@TongcyDai What needs to be done here? Are there any cases where manual respelling or other help for the template is needed? Benwing2 (talk) 22:54, 7 January 2024 (UTC)Reply

To add the template, simply insert {{szy-pron}} into each Sakizaya entry; no parameters or respellings are needed. TongcyDai (talk) 10:16, 8 January 2024 (UTC)Reply

Please let me know if there's anything else you need from me to deploy the template. --TongcyDai (talk) 18:38, 1 March 2024 (UTC)Reply

@Benwing2 Is there anything I can help with? --TongcyDai (talk) 07:06, 17 April 2024 (UTC)Reply

@TongcyDai Apologies for the delay, I am working on this now. However, the template should be called {{szy-IPA}} for consistency with other pronunciation templates. Do you mind if I rename it? Benwing2 (talk) 23:26, 17 April 2024 (UTC)Reply

Thank you for the update. I appreciate your help and have no objections to renaming the template, please go ahead. --TongcyDai (talk) 07:33, 18 April 2024 (UTC)Reply

@TongcyDai Done. Benwing2 (talk) 23:18, 18 April 2024 (UTC)Reply

Relational -> demonym

Latest comment: 8 months ago · 2 comments · 2 people in discussion

Could you clean up Spanish demonyms like diff? It makes more sense than categorizing 900+ demonyms as relational adjectives just because they don't have a one-word translation in English. Ultimateria (talk) 19:23, 7 January 2024 (UTC)Reply

@Ultimateria Hi, I actually wrote a script a while ago to do exactly this but never ran it. I don't remember why; maybe it needed a few fixes. I'll go ahead and finish this. Benwing2 (talk) 22:52, 7 January 2024 (UTC)Reply

Revert adding acceleration forms to `{{pl-conj-ai}}`

Latest comment: 8 months ago · 11 comments · 3 people in discussion

Hi @Benwing2. You just reverted the changes to the template {{pl-conj-ai}}. Could you please elaborate on what was broken, so I can see how it could be fixed while preserving the benefit of the acceleration forms? Incidentally, similar changes have been made to other templates, so the same error could arise for other verbs. You are referring to active adverbial participles, for which only a single form was used before, even though those adverbs have different forms depending on plural/singular and gender. Maybe the breaking tool needs to be updated to cater for those other forms. @Vininn126 JuChelou (talk) 14:04, 25 January 2024 (UTC)Reply

@JuChelou For one thing, the specific value of 'active adjectival participle' (along with various other specific values) is processed specially in Module:accel/pl and causes the inflection to be set to 'actv|adj|part'. By changing this you broke this support, and caused it to use an invalid inflection tag set 'm|s|active adjectival participle'. The other inflections of the participle were similar. The correct thing to do is to leave the masc sing participial forms unchanged and if you want to add acceleration to the other forms, they should cause the form to be created as e.g. {{feminine singular of|pl|PARTICIPLE}} rather than as an inflection of the verb. You can see an example of how to handle this correctly by looking at the lines starting at Module:accel/pt#L-21. Benwing2 (talk) 22:50, 25 January 2024 (UTC)Reply
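The behaviour described above can be pictured with a small sketch. It is written in Python for illustration only and is not the code of Module:accel/pl (which is Lua); the function name, the "f|s" form value and the returned tuples are all hypothetical. It shows the two points made in the comment: a recognised special value maps to a proper tag set on the verb, and other participle forms should point at the participle rather than the verb.

```python
# Hypothetical sketch; not the actual code of Module:accel/pl.
# Special accelerator values that are translated into inflection tag sets.
SPECIAL_VALUES = {
    "active adjectival participle": "actv|adj|part",
}


def definition_line(accel_form, verb, participle):
    """Describe the definition a newly created entry would receive.

    Returns a (template name, lemma, tag set) tuple.
    """
    if accel_form in SPECIAL_VALUES:
        # Masculine singular participle: defined as an inflection of the verb.
        return ("inflection of", verb, SPECIAL_VALUES[accel_form])
    if accel_form == "f|s":  # assumed form value, for illustration
        # Other participle forms are defined relative to the participle itself.
        return ("feminine singular of", participle, None)
    # Anything else (e.g. "m|s|active adjectival participle") is not recognised.
    raise ValueError("unrecognised accel form: " + accel_form)


print(definition_line("active adjectival participle", "wyrzucać", "wyrzucający"))
print(definition_line("f|s", "wyrzucać", "wyrzucający"))
```

Changing the special value in the template means the first branch no longer matches, which is the breakage the revert addressed.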

Thank you @Benwing2 for your reply. @Vininn126

I tried something in Module:accel/pl and {{pl-conj-ai}} to add proper accel form support for the adjectival participles.

However, I am not fully satisfied with the result because:

1/ on the masculine singular form, it could add 2 forms, for example for wyrzucający wyrzucać

2/ the result would not be similar if the new wiki page is triggered from the conjugation chart or from the adjective declension chart (which I also added recently). For example, for wyrzucające, the new wiki page triggered from the verb link would "miss" the fact that it is also the form for accusative neuter and accusative non virile.

Any advice? Or should I just ditch the extra accel forms for the participle and contributors would use the new accel links from the adjective declension module? JuChelou (talk) 16:18, 26 January 2024 (UTC)Reply

In theory you could generate wyrzucający and from there generate the others, but it's less than ideal. Vininn126 (talk) 16:40, 26 January 2024 (UTC)Reply

@JuChelou Hmmm, I'm not quite sure how to handle #2; either you'd have to add all the non-nominative forms of the participles to the verb table so that the accelerator code knows about them automatically, or you'd have to hack the code in MOD:accel/pl somehow to add the remaining inflections in. (This latter thing is possible, as I think I added a hook that you can define in the accelerator module that operates at the end after all the inflections have been combined.) As for #1, the general principle I've followed is not to include definitions for non-lemma forms that are identical in spelling to the lemma. I followed this principle, for example, when I created a bot script to add Russian noun inflections. This also happens in Portuguese verbs (where the 1st and 3rd singular future subjunctive usually looks the same as the infinitive), and for Latin feminine nouns (where the ablative singular is spelled the same as the nominative singular, although the pronunciation is different as the ablative ends in long -ā while the nominative ends in short -a). I actually removed the cases where Portuguese verbs were defined normally but had an additional definition as the 1st/3rd singular future subjunctive, but I may have left alone the Latin ablative cases because of the different pronunciation. In the Polish case, the pronunciation is the same and so you could fix this by just not having an accelerator defined on the forms that look like the lemma.

In general, I would actually argue that instead of including only the nominative case forms, it's best not to include anything but the masculine nominative singular of the various participles in the verb table, and require that the remaining forms be defined using accelerators on the participle table, even though User:Vininn126 thinks this is non-ideal. This is how we handle participles in Russian, for example, which is similar in many ways to Polish. I think the main benefit to having non-lemma participle forms defined in the verb table is if there are irregularities in their formation, but I don't think this is the case in Polish. Benwing2 (talk) 23:20, 26 January 2024 (UTC)Reply
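The principle for #1 (no separate definition for a non-lemma form spelled like the lemma) amounts to a simple filter. A minimal Python sketch, with a hypothetical function name and slot labels, using the Portuguese future subjunctive mentioned above as the example:

```python
def forms_to_create(lemma, forms):
    """Return only the slots whose spelling differs from the lemma.

    forms maps a slot label to the spelling of that inflected form.
    """
    return {slot: form for slot, form in forms.items() if form != lemma}


# Future subjunctive of Portuguese "cantar"; the slot labels are made up.
future_subjunctive = {
    "1|s|fut|sub": "cantar",
    "2|s|fut|sub": "cantares",
    "3|s|fut|sub": "cantar",
    "1|p|fut|sub": "cantarmos",
    "2|p|fut|sub": "cantardes",
    "3|p|fut|sub": "cantarem",
}

print(forms_to_create("cantar", future_subjunctive))
```

Only the four forms that differ from the infinitive survive the filter. In the Polish participle table the same check would simply leave the accelerator off the cell that matches the lemma.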

An additional thought is maybe we shouldn't be defining non-lemma forms of participles at all, since AFAIK they're quite regular and there are a lot of them. See the discussion above about #Finnish inflections. This is the policy we follow for Russian, for example. Benwing2 (talk) 23:22, 26 January 2024 (UTC)Reply

Where do we define non-lemma participles? Vininn126 (talk) 10:17, 27 January 2024 (UTC)Reply

@Vininn126 Sorry, can you clarify what you mean? Benwing2 (talk) 10:37, 27 January 2024 (UTC)Reply

I simply didn't understand your last message Vininn126 (talk) 10:59, 27 January 2024 (UTC)Reply

Thank you @Benwing2 for your very detailed answer.

Basically, regarding your recommendation for #1, it would be easy to remove the accel form for the version identical to the lemma form.

#2, however, would be trickier, as it would require duplicating the generation of all the forms, leaving room for discrepancies between the pl-adj module and the Polish accel module.

If I understand correctly, your overall recommendation is to remove all the other forms of the participles in conjugation templates. Basically, we would just have "active adjectival participle: masculine singular nominative form".

It would be similar to what is done for the verbal noun, where there is only the masculine singular nominative form, even though other forms exist.

@Vininn126 what would be your opinion on removing the additional forms of the adjectival participles from the conjugation templates? JuChelou (talk) 17:02, 28 January 2024 (UTC)Reply

Sounds fine to me; it's not typical to have them. Vininn126 (talk) 18:03, 28 January 2024 (UTC)Reply

On the `{{quote-book}}` template

Latest comment: 8 months ago · 4 comments · 3 people in discussion

Hi,
I was wondering what exactly the combined use of the parameters |start_year= and |year= is supposed to communicate.
It's supposed to mean a range of dates, but, taking 1390–1400 as an example, is the range meant:

in the sense of "the composition of this work started in 1390, and ended in 1400"?

or in the sense of "this work was probably completed (or brought to its current state, if unfinished) somewhere between 1390 and 1400"?

Thanks in advance for any clarification. I've recently discovered these parameters, and I'm not sure I've been using them properly. —— GianWiki (talk) 15:24, 25 January 2024 (UTC)Reply

@GianWiki These parameters were there before I started to clean the template up, so you might ask User:Sgconlaw, but I'm thinking it's used for works that took several years to create. Benwing2 (talk) 23:45, 25 January 2024 (UTC)Reply

I see, I hadn't noticed that. I'll try asking them just to make sure.

Thank you for your time. —— GianWiki (talk) 08:18, 26 January 2024 (UTC)Reply

@GianWiki: I don't think the parameters were clearly defined at the time when I first tidied up the {{quote-}} templates. Personally, I use them to mean a range of publication dates (for example, if a novel is originally published in parts in a magazine over many months), and if I intend a range of dates to mean anything else I add a qualification in parentheses for clarity like this: |year=c. '''1597–1600''' (date written). — Sgconlaw (talk) 10:54, 26 January 2024 (UTC)Reply

WingerBot and Welsh animal genders

Latest comment: 8 months ago · 2 comments · 2 people in discussion

Hi, your bot edited garan ("crane") and petris ("partridge") so they would be “m or f by sense”, which isn’t correct. I've corrected them, but can you amend the bot so it doesn’t edit other animals like this please?

Garan is usually a masculine noun, that can be feminine due to dialect, rather than the sex of the animal (e.g. in Iolo Williams’s Llyfr Adar and the Geiriadur yr Academi) and petris is feminine.

I’ve consulted a bit with other Welsh speakers and the only source I can see for petris ever being masculine is the Geiriadur Prifysgol Cymru, which could easily be due to one or two examples from centuries ago. “A small cock partridge” would be ceiliog petris bach – where bach modifies ceiliog, not petris.

Cheers, Arafsymudwr (talk) 15:54, 30 January 2024 (UTC)Reply

@Arafsymudwr This was a one-off run where I manually made the changes in question in a text editor and only used the bot to push the changes (that's what "manually assisted" means in the changelog message). So there's no script to amend but I'll make sure not to change the genders of animal terms in Welsh (or generally in any language, I think) in the future. Benwing2 (talk) 06:11, 31 January 2024 (UTC)Reply

Links to English possessives in inflection-line templates

Latest comment: 7 months ago · 13 comments · 2 people in discussion

I wish I had included this in my request about links to components of hyphenated terms in English inflection templates. (How's that coming, BTW?) Many vernacular names of organisms are like Gundlach's hawk (See Gundlach's hawk). It would be better, especially for me, if the link were to Gundlach rather than the possessive. I can't think of any instances for which the possessive would be a better link target and believe that any such instances are relatively rare exceptions. DCDuring (talk) 16:29, 31 January 2024 (UTC)Reply

@DCDuring Yes, in fact my concerns over how to handle apostrophes are why this hasn't already gotten done. I'm thinking that we should split any term with a trailing 's except for one's and someone's (with exceptions also maybe for he's, she's, it's), but not split other terms with apostrophes (e.g. I'm, don't, haven't). BTW I notice that we've split apostrophe-s into two terms, 's for the contraction and -'s for the possessive. Personally I think this is confusing and probably they should be merged into 's (without the hyphen). It also makes auto-linking more difficult; probably we should link all occurrences of 's into -'s since this is the more common case. Benwing2 (talk) 22:07, 31 January 2024 (UTC)Reply
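The proposed apostrophe rule can be sketched as follows. This is a hedged illustration in Python, not the headword module's code: the function name is hypothetical, the exception list includes the "maybe" items (he's, she's, it's), and the split-off ending is labelled -'s as suggested in the comment.

```python
# Words ending in 's that should stay whole (the last three are the "maybe" cases).
NO_SPLIT = {"one's", "someone's", "he's", "she's", "it's"}


def split_possessive(word):
    """Split a trailing 's off a word so each part can be linked separately.

    Other apostrophe words (I'm, don't, haven't) are returned unchanged.
    """
    if len(word) > 2 and word.endswith("'s") and word.lower() not in NO_SPLIT:
        return [word[:-2], "-'s"]
    return [word]


print(split_possessive("Gundlach's"))  # ['Gundlach', "-'s"]
print(split_possessive("someone's"))   # ["someone's"]
print(split_possessive("don't"))       # ["don't"]
```

Under this rule Gundlach's hawk would link to Gundlach, -'s and hawk, which is the outcome requested at the top of the thread.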

This 's/-'s distinction gets to how to indicate the distinction between an inflectional ending and a contraction, doesn't it? On one level one needs a linguistics or philosophy degree to be qualified and/or motivated to argue this, but I don't hold the right degrees. On another level, how to help users, it would seem both should be on the same page, almost certainly 's. It probably should go to BP, but you may be able to go ahead with what is convenient to implement and rely on links between [['s]] and -'s to help users in the meanwhile. DCDuring (talk) 22:22, 31 January 2024 (UTC)Reply

@DCDuring Please see User:Benwing2/test-en-multiword for some examples of the new headword link handling system that I'm testing. It includes the ability to change the link of one (or several) of the words of a multiword expression without having to write out the entire expression; see the examples that specify |head=~.... (This functionality was already implemented for Italian and later extended to other Romance languages.) Note that if there are both hyphens and spaces, the default behavior is to link the space-separated components but not break up hyphen-separated components, although this can be changed using |splithyph=1. Possibly the default should be reversed and hyphen-separated components broken up by default unless |nosplithyph=1 is given; what do you think? Benwing2 (talk) 00:01, 2 February 2024 (UTC)Reply

I will look at it in about 16 hours. DCDuring (talk) 00:04, 2 February 2024 (UTC)Reply

@DCDuring: OK, thanks. BTW I'm thinking we should indeed change the default when there are both hyphens and spaces, and maybe make an argument to convert hyphenated terms to space-separated terms, e.g. for cases like civil-rights movement and claw-hammer coat that should be linked as [[civil rights|civil-rights]] [[movement]] and [[claw hammer|claw-hammer]] [[coat]] (likewise closed-circuit television, clock-face timetable, coffee-table book, etc.), although there are also examples like close-up lens, coin-operated laundry, context-free grammar, co-occurrence network, etc. where we do want to link the hyphenated component as such. Benwing2 (talk) 00:58, 2 February 2024 (UTC)Reply

I really like the more hyphenated forms because they reduce certain kinds of possible misreading of MWEs, but contemporary relative frequency may indicate that hyphenated forms are already much less frequent. For three-part English vernacular names of organisms, I often find that the hyphen is in the wrong place or is not useful. But black billed amazon is not a good substitute for black-billed amazon. DCDuring (talk) 01:10, 2 February 2024 (UTC)Reply

@DCDuring I have redone the handling of terms with both hyphens and spaces so that it now looks up the hyphenated term to see whether it exists in order to determine how to link it. Specifically:

1. If the term exists as a space-separated compound, link to that. (We prefer space-separated compounds because the hyphen-separated form often exists as a soft redirect.)
2. Otherwise, if the term exists as a hyphen-separated compound, link to that.
3. Otherwise, link the hyphenated terms separately.

This handles most cases properly, although there are occasional situations where it fails; for example, close up and close-up both exist and are different, and by default close-up lens links (wrongly) to the former. For this reason I've provided params to override the default handling: |hyphspace=1 forces case (1) above, |nosplithyph=1 forces case (2) above, and |splithyph=1 forces case (3) above.

Benwing2 (talk) 05:27, 2 February 2024 (UTC)Reply
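The three-way lookup and its overrides, as described in the comment above, can be sketched like this. It is an illustrative Python sketch, not the code of Module:en-headword: the function name, the `exists` callback and the exact link text emitted for case (3) are assumptions, while the parameter names mirror |hyphspace=1, |nosplithyph=1 and |splithyph=1.

```python
def link_hyphenated(component, exists, hyphspace=False, nosplithyph=False,
                    splithyph=False):
    """Return wikilink text for one hyphenated component of a multiword term.

    exists(title) reports whether a page with that title exists.
    """
    spaced = component.replace("-", " ")
    if hyphspace:
        case = 1
    elif nosplithyph:
        case = 2
    elif splithyph:
        case = 3
    elif exists(spaced):     # (1) prefer the space-separated compound
        case = 1
    elif exists(component):  # (2) else the hyphen-separated compound
        case = 2
    else:                    # (3) else link each part separately
        case = 3

    if case == 1:
        return "[[%s|%s]]" % (spaced, component)
    if case == 2:
        return "[[%s]]" % component
    return "-".join("[[%s]]" % part for part in component.split("-"))


pages = {"civil rights", "close up", "close-up", "context-free"}
print(link_hyphenated("civil-rights", pages.__contains__))
print(link_hyphenated("close-up", pages.__contains__))
print(link_hyphenated("close-up", pages.__contains__, nosplithyph=True))
print(link_hyphenated("scaly-headed", pages.__contains__))
```

With this toy page set, close-up resolves to the space-separated page by default, which is the failure mode the comment mentions, and nosplithyph=True corrects it.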

I hope we will never have entries for terms like scaly-headed. So I'll have to use nosplithyph=1 for a vast number of vernacular names. I may as well not have asked for this favor. I suppose I could create a new template to wrap {{en-noun}} or {{head}}, specifying the parameter, to save keystrokes for these vernacular name entries. DCDuring (talk) 13:41, 2 February 2024 (UTC)Reply

@DCDuring If you need to use |nosplithyph=1 for a large number of vernacular names, that is defeating the purpose of things. Can you explain why you think you need to use this for so many? Things like scaly-headed are SOP so should be split, IMO. Benwing2 (talk) 20:37, 2 February 2024 (UTC)Reply

I misread in haste, I think. DCDuring (talk) 22:43, 2 February 2024 (UTC)Reply

@DCDuring I have implemented the various changes to the linking behavior of Module:en-headword. They are documented on the module documentation page Module:en-headword/documentation (although the section on link modifiers is still to be written). There is text in the documentation of {{en-noun}}, {{en-verb}} and {{en-adj}} pointing to the module documentation page for the specifics about multiword linking and suffix handling. Let me know if there's anything else needed documentation-wise. Benwing2 (talk) 00:10, 7 February 2024 (UTC)Reply

The section on link modifications (renamed from link modifiers for clarity) is written. Benwing2 (talk) 00:46, 7 February 2024 (UTC)Reply

devil's own

Latest comment: 7 months ago · 4 comments · 3 people in discussion

I reverted WingerBot's edit to this entry not just because of the module error (I think you added |def= to the noun and proper noun code, but not to the adjective), but because it looks to me like the syntax is more along the lines of "[the devil's] own" rather than "the [devil's own]". Not that I would get into an edit war over this; I just wanted to make sure you were aware of that dimension before deciding how to fix things. Chuck Entz (talk) 04:23, 4 February 2024 (UTC)Reply

@Chuck Entz Thanks. Yeah I forgot about handling adjectives with the in them. As for the syntax issue, all that |def=1 does is add the before the head; it doesn't assert any particular way of parsing the constituents. I suppose it could be interpreted as asserting an analysis like the [devil's own] but that wasn't my intention (and I'm not quite sure how we'd indicate such an analysis in the head). Benwing2 (talk) 04:47, 4 February 2024 (UTC)Reply

But adjectives don't have the in them. We should review the entries that so claim and determine whether there is good reason to ever have the inside the headword template for adjectives. DCDuring (talk) 14:14, 4 February 2024 (UTC)Reply

Never mind. I was thinking of leading the. We have numerous entries of purported adjectives with the embedded. Some of them seem like attributive use of a noun, but not all. DCDuring (talk) 14:23, 4 February 2024 (UTC)Reply

Category:LANG nouns with other-gender equivalents

Latest comment: 7 months ago · 3 comments · 2 people in discussion

Hello Benwing. I hope that this does not take too much of your time. How should CAT:Telugu nouns with gendered forms be added to MOD:te-headword? I tried looking at MOD:hi-pa-headword, but could not figure out what and where to add the equivalent of:

table.insert(data.categories, data.langname .. " " .. plpos .. " with other-gender equivalents")

to MOD:te-headword. I noticed that this feature was missing for Telugu when I saw

Synonym: (female) రచయిత్రి

at the entry for రచయిత (racayita). ~~The Lua-fication of {{te-noun}} means adding features such as this is not as easy as adding~~

~~{{#if:{{{m|}}}{{{f|}}}{{{n|}}}|{{cln|te|nouns with other-gender equivalents}}}}~~

~~to {{te-noun}}~~. Kutchkutch (talk) 00:46, 5 February 2024 (UTC)Reply

Adding

{{#if:{{{m|}}}{{{f|}}}{{{n|}}}|{{cln|te|nouns with other-gender equivalents}}}}

at the end of {{te-noun}} seems to work for categorisation but not for the headword line. Kutchkutch (talk) 00:59, 5 February 2024 (UTC)Reply
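The Lua line and the {{#if:...}} wrapper quoted above express the same condition: add the category when any of |m=, |f= or |n= is given. A minimal Python rendering of that logic, for illustration only (the real module is Lua, and the function and argument names here are hypothetical):

```python
def add_gender_equivalent_category(categories, langname, plpos, args):
    """Append the category when any other-gender equivalent is supplied.

    args maps parameter names to their values, as passed to the template.
    """
    if any(args.get(key) for key in ("m", "f", "n")):
        categories.append(langname + " " + plpos + " with other-gender equivalents")
    return categories


print(add_gender_equivalent_category([], "Telugu", "nouns", {"f": "రచయిత్రి"}))
print(add_gender_equivalent_category([], "Telugu", "nouns", {}))
```

Doing this in the module rather than in the template wrapper would also allow the headword line to show the equivalent forms, which the #if workaround does not.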

@Kutchkutch Glad you figured it out. IMO Module:te-headword needs to be rewritten; it wasn't written by me and doesn't really follow the standard structure for such modules, which is probably why you had difficulty figuring out how to add the appropriate code. Benwing2 (talk) 22:33, 8 February 2024 (UTC)Reply

Email

Latest comment: 7 months ago · 2 comments · 2 people in discussion

Btw, idk if you have notifications turned on for emails, but I sent you one. Vininn126 (talk) 22:24, 8 February 2024 (UTC)Reply

Thanks, I responded. For some reason I didn't get an email notification here on Wiktionary even though I do have email notifications turned on. Benwing2 (talk) 22:32, 8 February 2024 (UTC)Reply

bùzháodiào

Latest comment: 7 months ago · 3 comments · 2 people in discussion

Hello. Could you help me fix the Traditional Chinese conversion here? Thanks. ---> Tooironic (talk) 00:31, 11 February 2024 (UTC)Reply

@Tooironic What's the exact issue? BTW in general I am not too familiar with how the Trad <-> Simp conversion works; User:Theknightwho knows more. Benwing2 (talk) 00:32, 11 February 2024 (UTC)Reply

Thank you User:Theknightwho! ---> Tooironic (talk) 00:39, 11 February 2024 (UTC)Reply

hmm

Latest comment: 7 months ago · 3 comments · 2 people in discussion

How much longer is it going to take you to finally finish making this new pronunciation module for Polish? You've been doing it for several months now, hurry up, or someone might think you're getting a little lazyyy :) Gugugagasraniewbanie (talk) 08:30, 13 February 2024 (UTC)Reply

@Gugugagasraniewbanie Yeah it will happen soon. Benwing2 (talk) 08:32, 13 February 2024 (UTC)Reply

OK, then you have my forgiveness Gugugagasraniewbanie (talk) 08:35, 13 February 2024 (UTC)Reply

Mon-Burmese script

Latest comment: 7 months ago · 8 comments · 3 people in discussion

I changed some letters defined for specific languages (e.g. "X is a letter of the Shan alphabet") to that language (i.e. Shan), then added a request for definition to the translingual entry. If this is somehow considered vandalism, I'll revert myself, but I'm assuming obvious fixes like this are acceptable, and it parallels other entries that only have definitions for specific languages. (A definition might be as simple as stating that it's a letter of the Mon-Burmese script corresponding to a certain letter in Sanskrit, but I didn't do that myself as I thought I might be accused of vandalism.)

I also removed a couple pronunciations that were for the wrong entry. kwami (talk) 04:25, 14 February 2024 (UTC)Reply

@Kwamikagami "Vandalism" doesn't seem like the right word for changes that are in good faith. As to whether they are wrong or counterproductive I don't know but they seem generally fine to me. User:RichardW57 do you have any comments? Benwing2 (talk) 04:45, 14 February 2024 (UTC)Reply

Okay, "blockable offense" then. kwami (talk) 04:47, 14 February 2024 (UTC)Reply

Yeah I understand. BTW I think blocking is only likely if you edit-war or keep making changes of a specific nature after people have objected to them. (Also editors who don't know what they're doing but think they do; editors of this nature can do a lot of damage.) Wikipedia seems generally more tolerant of edit-warring, maybe because of the number of editors relative to how many articles there are. Benwing2 (talk) 04:57, 14 February 2024 (UTC)Reply

@Benwing2: Which Shan alphabet? There are several Shan languages, which often makes the letters translingual because shared by several Shan languages! The change seems backwards - I would have said that the thing to do was to waste space by adding the Shan entry. As Burmese-script words easily consist of a single letter, cloning letters to each language using them makes Wiktionary more difficult to find by eye, in accordance with the apparent aim of difficulty of use. --RichardW57 (talk) 08:49, 14 February 2024 (UTC)Reply

If there are other Shan languages besides [shn], and they use the same letter, then they should be listed. But as it was, they were not listed -- only [shn] was.

And yes, I know you want to lump all languages together, but that's not the consensus for Wikt. kwami (talk) 18:49, 14 February 2024 (UTC)Reply

We have Shan (shn), Khamti Shan (kht), Aiton (aio), Phake (phk) and Tai Laing (tle) that use the Burmese script. The Tai Nuea (tdd) (= Tai Le /Tai Dehong / Chinese Shan) (not to be confused with Northern Tai or Northern Thai) and Tai Khuen (kkh) (though their speech is more akin to Northern Thai, but they identify as Shan) use different scripts. There's also Khamyang (ksu or nrr). Tai Ahom should arguably be included, but again it has its own script. --RichardW57 (talk) 23:32, 14 February 2024 (UTC)Reply

And when we say a letter is used by [shn], do we necessarily know that it's also used by the others? E.g. in Lik-Tai for Khamti? The label "Shan" may cover multiple languages in some usage, but when Wikt has an entry for Shan [shn], we mean specifically that language. When we mean Khamti, we say Khamti. Etc. But sure -- if we can demonstrate that a letter is used by multiple languages, we can say that it's used for multiple languages. Though when giving the pronunciation and orthographic rules, we need to be careful not to present [shn] as representative if it isn't. kwami (talk) 01:23, 15 February 2024 (UTC)Reply

Seeking template help

Latest comment: 7 months ago · 2 comments · 2 people in discussion

Hi, we find your Hindi language templates very helpful. Could you assist us with essential Sylheti templates (language code: syl) on English Wiktionary? We could contribute with translations, although we are still familiarizing ourselves with Wiktionary policies. -- ꠢꠣꠍꠘ ꠞꠣꠎꠣ (talk) 07:52, 16 February 2024 (UTC)Reply

@ꠢꠣꠍꠘ ꠞꠣꠎꠣ Hi, I'm up to my ears in requests so I won't be able to get to this soon, although if someone else wants to work on it using the Hindi modules as a starting point, I can provide guidance. Benwing2 (talk) 09:55, 16 February 2024 (UTC)Reply

Category:Romance terms inherited from Latin nominatives

Latest comment: 7 months ago · 7 comments · 2 people in discussion

Hi. Sorry, I think I was a bit too 'bristly' with how I responded earlier. I really do support removing these categories and sticking the relevant content into 'Appendix: Romance terms plausibly inherited from Latin nominatives'. Nicodene (talk) 17:21, 18 February 2024 (UTC)Reply

@Nicodene This sounds good to me and "plausibly" sounds like a good term to use, and I apologize if I also was a bit in-your-face. If you can write the appendix and put the terms there in a list, I can remove the categories from the terms by bot. Benwing2 (talk) 19:56, 18 February 2024 (UTC)Reply

Done. This should actually make it easier for me to reorganise/restructure it all, which I've been meaning to do. Nicodene (talk) 20:39, 18 February 2024 (UTC)Reply

@Nicodene Thanks! Benwing2 (talk) 00:45, 19 February 2024 (UTC)Reply

@Nicodene I am going to remove the pages listed in the appendix from the '... inherited from Latin nominatives' categories. Just checking that this is OK with you. Benwing2 (talk) 04:56, 19 February 2024 (UTC)Reply

Yes, go for it please. Nicodene (talk) 05:01, 19 February 2024 (UTC)Reply

@Nicodene OK it's done. BTW the appendix is looking good and I'm glad you have included detailed notes. Benwing2 (talk) 05:26, 19 February 2024 (UTC)Reply

Macrolanguages

Latest comment: 7 months ago · 7 comments · 4 people in discussion

Hi - do you have any ideas for how we could handle macrolanguages in the data (Chinese being the most obvious example, given how we handle Chinese L2s)? I’m not keen to create a whole new type of object, since this situation comes up in loads of places, as we don’t have a coherent distinction between “is a type of” and “is a descendant of”, leading to the issues I mentioned in WT:RFM#Converting Min Nan into a family, where Teochew and Leizhou Min are “descended from” Min Nan, whereas they’re actually types of Min Nan.

I suspect you’ve noticed similar things with how Persian and Latin are handled. One common situation which stands out are language periods: we list Old Latin as ancestral to Latin, but as it’s an etym-only language of Latin that technically means we’re saying it’s ancestral to itself. Same for Early Modern English and English, and so on. We get round it by adding an explicit check to Module:languages to prevent a language being ancestral to itself, but that’s a kludge which is symptomatic of our poorly defined language model.

Also see the Japonic family tree at Category:Proto-Japonic language, where the periodisation of Japanese is all messed up because they’re all treated as etym-only languages part of Japanese, even though Early/Late Middle Japanese have Middle Japanese as their immediate parent. (They currently display in the wrong order, since Middle Japanese should not be listed before Early Middle Japanese if we were to follow the same system as Latin; the data is correct but Module:family tree is bugged.) A much bigger issue is that we imply Middle Japanese is split into three periods, and that the central period is somehow representative. This is confusing at best, and outright misleading to anyone who isn’t familiar with the nuances of our data modules. Theknightwho (talk) 18:29, 18 February 2024 (UTC)Reply

@Theknightwho Since you have merged etym-only and full languages to the point that both are more or less just types of Language objects, can we not just have a "type" field identifying something as a macrolanguage? That way it will still work as a language for most purposes. IMO we do need to properly distinguish is-a-X and is-a-descendant-of-X, and it seems you've provided a way with the ancestors field. As for the issue of Old Latin vs. Latin, we do have a "Classical Latin" etym language and ultimately we need to push more in this direction, although it will require some thinking. These are just the thoughts off the top of my head. Benwing2 (talk) 19:54, 18 February 2024 (UTC)Reply

@Benwing2 Thanks - that's helpful to think about.

I'd rather not have a specific macrolanguage field, since it's superfluous to whether or not something is set as being a "type of" that language. I think the handling of Chinese, Latin, Persian, English and (one I missed above) Norwegian should probably all be done in the same way. At the most extreme end, the Sinitic family and Chinese are in fact the same thing, so I'm more inclined towards having a way to set one language as a type of another (as we do with etym-only languages), fully merging etym-only languages into languages, and then having a flag which sets whether it should be treated as a full language. That way, we also get rid of the weird half-and-half situation going on with Classical Persian and the arbitrary distribution of Chinese lects between language and etym-only language, while making it more straightforward to switch something from one to the other (e.g. the Prakrits). It may also be worth doing the same with families, since (as Chinese shows) macrolanguages and families are basically the same thing in most situations.

I think we probably need some kind of periodisation mechanism. In the case of Latin, if we're treating Old Latin as a "type of" Latin, then strictly speaking Latin's ancestor should be Proto-Italic. However, within that we could have the various periods, including Classical Latin, and there should be a way to set a default period for situations when only the generic language code is provided. For most languages that would be the standard language; in the case of Latin, it would be Classical. This would also potentially address the issue of cross-overs between regional lects and periods: e.g. Northern Early Modern English, and should also help avoid the silly Japanese situation, since periods should be possible to nest inside each other. Theknightwho (talk) 20:10, 18 February 2024 (UTC)Reply
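As a rough illustration of the proposal above, here is a minimal Python sketch (not the actual Lua data format) in which "is a type of" and "is a descendant of" are separate fields, a flag marks full languages, and a bare code can resolve to a default period. The class name, field names and the sample entries are all assumptions made up for the example.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Lect:
    code: str
    name: str
    variety_of: Optional[str] = None   # "is a type of" (hypothetical field)
    ancestors: List[str] = field(default_factory=list)  # "is a descendant of"
    full_language: bool = True         # False roughly matches an etym-only language
    default_period: Optional[str] = None  # lect meant when only the bare code is given


def resolve(code: str, lects: Dict[str, Lect]) -> str:
    """Follow default_period links from a generic code to a specific lect."""
    seen = set()
    while lects[code].default_period and code not in seen:
        seen.add(code)
        code = lects[code].default_period
    return code


def ancestors_of(code: str, lects: Dict[str, Lect]) -> List[str]:
    """Climb "type of" links until some lect states its ancestors explicitly."""
    current: Optional[str] = code
    while current is not None:
        lect = lects[current]
        if lect.ancestors:
            return lect.ancestors
        current = lect.variety_of
    return []


# Illustrative data only; the codes are used here as plain labels.
LECTS = {
    "itc-pro": Lect("itc-pro", "Proto-Italic"),
    "la": Lect("la", "Latin", ancestors=["itc-pro"], default_period="la-cla"),
    "la-cla": Lect("la-cla", "Classical Latin", variety_of="la", full_language=False),
    "itc-ola": Lect("itc-ola", "Old Latin", variety_of="la", full_language=False),
}
```

With this shape, Old Latin is a type of Latin rather than its ancestor, so no "ancestral to itself" check is needed: `ancestors_of("itc-ola", LECTS)` climbs to Latin and reports Proto-Italic, while `resolve("la", LECTS)` picks Classical Latin as the default period.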

@Theknightwho All this sounds good to me in general although it would be helpful if you could write out your proposals in more detail as it's sometimes a bit hard for me to work out what your thoughts are when presented abstractly. Benwing2 (talk) 20:31, 18 February 2024 (UTC)Reply

@Benwing2 Will do. I’ll also have a think about how we should handle this in the family tree display, since a lot of the confusion stems from that displaying descendants and variants/types in exactly the same way. Theknightwho (talk) 20:52, 18 February 2024 (UTC)Reply

One problem that needs to be addressed is that language change doesn't always follow a tidy tree model. Macrolanguages are messy. A macrolanguage always has a standard lect that the other lects identify with- but there can be more than one, and which lect is the standard can change over time. Even some of the more complex ordinary languages have similar phenomena. This can end up being reflected in the history of languages both within and deriving from the (macro)language.

With English, you have the same language changing its prestige/standard dialect several times in Old English due to the rise and fall from prominence of specific kingdoms: Anglia, Mercia, Northumbria, and finally Wessex (this is off the top of my head- I'm sure I missed something). With the transition to Middle English it all moved to London. Middle English borrowed heavily from Old Northern French, but since then the source has been Parisian French. Scots split off from the northern dialects that descended primarily from Northumbrian. I'm sure there were changes in the Old Norse dialects that Old English and Middle English borrowed from, and then there's the matter of Brythonic Pictish and Goidelic Gaelic in Scotland and their influence on Scots and northern English.

China had several changes in which were the prestige lects, and these are reflected in the various named yomi in Japanese, as well as the borrowings into other neighboring languages. Then there's Mycenaean Greek, which is different from whatever became Ancient Greek, and the fact that older Latin borrowings didn't come from the Attic dialect that became modern Greek, and Tsakonian that came from Doric, etc.

If you look at a regional lect, you can find things descended directly from the same region in the ancestral language, and things that came in from the standard lects of the different historical stages, and other things that were borrowed from various external languages. Sometimes separate languages split off from these regional lects, so they have more in common with the regional varieties of the main language than with the standard lects of any historical period.

To stretch the tree analogy a bit: sometimes a limb that's touching the ground sets root and becomes a tree in its own right, and other times branches or roots from separate trees graft together after prolonged contact.

I seem to have written a book here, but I hope you can see what I'm getting at. It would be a good idea to think about some way of representing the internal structure of macrolanguages and even regular languages, and the way that different descendants can come from different parts of the same language. There's a complex interchange between region and historical period, so the Wessex dialect of today has a completely different status from the Wessex dialect of a thousand years ago, and the geographical identification of what's mainstream and what's dialectal changes over time. It's all secondary to the main concept of parent and daughter language, but it might help us with some exceptional cases like Chinese. Chuck Entz (talk) 23:15, 18 February 2024 (UTC)Reply

Agreed. Even Anglo-Norman, the main vehicle of 'Gallicisms' in Middle English, began as a chaotic hodge-podge of Old French dialects, certainly in many respects 'northern-flavoured', but not only, and increasingly slanting towards (but never quite attaining) Central French norms as the centuries went by. In this case as well there is no question of a precise dialectal ancestry. Nicodene (talk) 14:34, 19 February 2024 (UTC)Reply

Italicising synonyms for taxonomic names

Latest comment: 7 months ago, 23 comments, 5 people in discussion

Hi Benwing. Could you edit Module:form of, Module:form of/templates, and/or T:synonym of to add the ability to italicise the linked-to term in transclusions of {{synonym of}} (preferably by calling |i=), please? Such functionality is needed for taxonomic synonyms. ATM, work-arounds like those seen in Asclepias filiformis var. buchenaviana, Bulbophyllum buchenavianum, Gomphocarpus filiformis var. buchenavianus, Megaclinium buchenavianum, and Tropaeolum buchenavianum are necessary. 0DF (talk) 00:38, 19 February 2024 (UTC)Reply

@DCDuring who would know how this is handled in other taxonomic entries. Chuck Entz (talk) 01:08, 19 February 2024 (UTC)Reply

Now, {{syn of}} (and {{alt form of}}, possibly others) suppresses italics formatting that {{taxlink}} provides or direct or piped wikitext formatting. All we would need is for templates like {{syn of}} and {{alt form of}} to handle embedded wikitext for italics, as is now possible in other templates that incorporate links. Alternatively, something like {{syn of}}, say {{taxsyn}} (also {{taxalt}}), would have all the formatting capabilities of {{taxlink}}, which include not italicizing terms like "var.", "section" ("sect.", "subsect"), "subg.", and "subsp." in taxonomic names. This would probably not involve too much renaming of templates at this point. DCDuring (talk) 13:58, 19 February 2024 (UTC)Reply

And it would be nice to allow † to appear without requiring pipes. DCDuring (talk) 14:37, 19 February 2024 (UTC)Reply

@DCDuring: I assume it would be possible to include the non-italicising functionality of {{taxlink}} in {{synonym of}} by making it contingent upon both |1=mul and |i=1 being true. I can't imagine a case in which one would want to define a term as a synonym of something translingual that contains any of the strings sect., subg., subsect., subsp., or var.; italicise it; and for that term not to be a taxonomic name. 0DF (talk) 14:38, 19 February 2024 (UTC)Reply

The italicization rules of the various taxonomic bodies include that all taxonomic names (ie, any rank) of viruses, bacteria, and archaebacteria be italicized. It is probably simpler to use passed-through wikitext italics than to duplicate {{taxlink}} functionality. DCDuring (talk) 14:47, 19 February 2024 (UTC)Reply

@DCDuring: I only meant {{taxlink}}'s functionality of automatically de-italicising those few abbreviations. Italicising dependent on parsing the taxon (as a species, genus, phylum, or whatever) seems superfluous and unnecessarily complicated for {{synonym of}}; |i=1 should be all that's necessary. 0DF (talk) 14:59, 19 February 2024 (UTC)Reply

It seems too complicated to me too, but I've often been surprised with what our techno-mavens are willing to do, for reasons that remain mysterious to me. Simply passing through wikiformatting (and, possibly, "†") would be fine with me. It would be easy enough to find the relatively few instances we would have of improper handling of those not-to-be-italicized terms in {{syn of}}, {{alt of}}, and the various etymology templates, too. DCDuring (talk) 19:06, 19 February 2024 (UTC)Reply

@DCDuring: How would you want the obelus to be treated? 0DF (talk) 22:42, 19 February 2024 (UTC)Reply

Directly in front of taxon, ignored for linking, but displayed without being italicized. DCDuring (talk) 12:42, 20 February 2024 (UTC)Reply

@DCDuring: De-italicising would be handled in the same way as it's handled for sect., subg., subsect., subsp., and var., I expect. Stripping † from the link text would be easy (handled in the same way Latin ā, ē, ī, ō, ū, ȳ link to Latin a, e, i, o, u, y), but it may end up being enacted in undesirable circumstances. Do we need a new (mul-tax?) language code for taxonomic names, perhaps? 0DF (talk) 18:06, 20 February 2024 (UTC)Reply
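To make the behaviour discussed above concrete, here is a small Python sketch (the real implementation would live in the Lua modules). It strips a leading † from the link target, shows it unitalicised in the display text, and leaves the rank abbreviations listed in the thread out of the italics. The function names are hypothetical.

```python
from itertools import groupby

# Abbreviations named in the discussion that should not be italicised.
RANK_ABBREVIATIONS = {"sect.", "subg.", "subsect.", "subsp.", "var."}


def link_target(name: str) -> str:
    """Page title to link to: the name without any leading obelus."""
    return name.lstrip("†").strip()


def display_wikitext(name: str) -> str:
    """Wikitext display form: italic name parts, roman rank abbreviations."""
    dagger = "†" if name.startswith("†") else ""
    chunks = []
    words = link_target(name).split()
    # Group consecutive words so each italic run gets one pair of '' markers.
    for is_rank, group in groupby(words, key=lambda w: w in RANK_ABBREVIATIONS):
        text = " ".join(group)
        chunks.append(text if is_rank else "''" + text + "''")
    return dagger + " ".join(chunks)
```

For example, `display_wikitext("Asclepias filiformis var. buchenaviana")` gives `''Asclepias filiformis'' var. ''buchenaviana''`, and a name written as `†Tyrannosaurus rex` links to `Tyrannosaurus rex` while displaying the obelus in roman type.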

I'd prefer a shorter one, of course, like 'mult' or 'mul-t'. DCDuring (talk) 18:27, 20 February 2024 (UTC)Reply

@Mahagaja: How much freedom do we have in devising language codes? 0DF (talk) 18:30, 20 February 2024 (UTC)Reply

@0DF: You'd have to get consensus at WT:RFM for it. I wouldn't hold my breath. —Mahāgaja · talk 18:48, 20 February 2024 (UTC)Reply

@Mahagaja: Thanks for the response. I mean, rather, what restrictions are there on the form that language codes take? I know we use ISO 639-3 codes where they're available, but what about custom, in-house codes? 0DF (talk) 20:17, 20 February 2024 (UTC)Reply

@0DF @Mahagaja @DCDuring We actually already have mul-tax as a variant of Translingual (no idea when it got added, but see Module:etymology languages/data). I don't think it's used for anything at the moment, but it would make sense to use it for this. Theknightwho (talk) 20:25, 20 February 2024 (UTC)Reply

@Theknightwho: Thank you.
@DCDuring: How 'bout it?
0DF (talk) 20:29, 20 February 2024 (UTC)Reply

I always fear that the cure will turn out worse than the disease. Can it all be done automagically or will there be a few hundred exceptions? It is true that mul in Latin script is hard to confuse with mul in CJKV. DCDuring (talk) 20:37, 20 February 2024 (UTC)Reply

Daniel Carrero added Tax. "for test purposes" back in November 2016; -sche then standardized it to mul-tax. I don't know what he was testing, but the code is there for anyone who wants to use it. —Mahāgaja · talk 20:38, 20 February 2024 (UTC)Reply

BTW, why are discussions like this conducted on a userpage rather than, say, BP or GP? Does that just reflect where the power is? DCDuring (talk) 20:42, 20 February 2024 (UTC)Reply

@DCDuring: I looked at the histories of Module:form of, Module:form of/templates, and {{synonym of}}. They showed me that Benwing had done a lot of editing on all three, so I figured he/she would be sufficiently familiar with those pages to make the changes I requested. There's nothing suspicious about that and I hardly see how I can be said to have "power" here. 0DF (talk) 00:33, 21 February 2024 (UTC)Reply

It's a habit of exclusion, not an intent of exclusion. Specific folks can always be pinged. DCDuring (talk) 14:31, 21 February 2024 (UTC)Reply

@DCDuring: I guess so. Not that I intended the request to turn into a prolonged discussion. 0DF (talk) 15:12, 21 February 2024 (UTC)Reply

Error handling with Module:parameters and Module:languages

Latest comment: 7 months ago, 18 comments, 3 people in discussion

Hiya - just a heads up (and you've probably noticed already), but I've recently updated Module:parameters to allow languages, scripts, families (etc) as data types, as well as a few other things. This means that the argument table which is returned contains the relevant object(s), and invalid codes will throw an error (which automatically highlights the incorrect parameter). This avoids having to manually handle invalid codes, since the only way to do proper error-handling previously was to pass the ready-baked parameter into Module:languages using getByCode's paramForError parameter, which was tricky when dealing with lists etc. Having converted a number of template modules, it's also cut down on code length by quite a bit, too.

Ideally, we should be able to remove error handling from Module:languages and Module:scripts altogether at some point, since it doesn't really belong there, and it's annoying having to work around it when requesting etymology langs and families, too. Theknightwho (talk) 15:21, 27 February 2024 (UTC)Reply

@Theknightwho Yup I did notice it, thanks. I haven't had a chance to use the new functionality but it sounds good to me. BTW if you haven't already done this you might consider adding support for comma-separated lists of lang codes and for a term with a preceding language code (see parse_term_with_lang in Module:parse utilities, which implements this latter functionality currently). Benwing2 (talk) 20:01, 27 February 2024 (UTC)Reply

@Benwing2 I've already done the comma-separated list actually, but haven't updated the documentation since I want to make sure the implementation is stable/won't need further expansion. The solution I opted for was sublist=, where sublist=true splits the list using %s*,%s*, but using a string value allows for other splits. The other thing which isn't yet documented is set=, which is for parameters that take an (ideally small) closed set of values, where inputs with other values would be nonoperative anyway.

I'll have a think about how to handle preceding langcodes. Theknightwho (talk) 20:07, 27 February 2024 (UTC)Reply
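The sublist= behaviour described above can be sketched in Python as follows. The Lua pattern %s*,%s* corresponds to the regular expression \s*,\s*; treating a string value as an alternative split pattern is an assumption about how "other splits" work, and the function name is made up for illustration.

```python
import re


def split_sublist(value: str, sublist=True) -> list:
    """Split a parameter value into a list.

    sublist=True  -> split on commas, swallowing surrounding whitespace
    sublist="..." -> split on the given pattern instead (assumed behaviour)
    """
    pattern = r"\s*,\s*" if sublist is True else sublist
    return re.split(pattern, value)
```

So `split_sublist("en, fr ,de")` yields three codes, each of which could then be validated separately.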

@Theknightwho The |set= support is definitely useful. Note that the corresponding flag in Python's argparse module is called |choices=, which might possibly be a clearer name (although I can see the argument for using set as well). Benwing2 (talk) 20:16, 27 February 2024 (UTC)Reply
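For comparison, this is how the argparse equivalent looks in practice. The keyword argument is spelled `choices` in the standard library; the option name and values below are made up for the example.

```python
import argparse

parser = argparse.ArgumentParser(prog="demo")
# Only the listed values are accepted; anything else is rejected at parse time.
parser.add_argument("--gender", choices=["m", "f", "n"])

args = parser.parse_args(["--gender", "f"])
# Parsing ["--gender", "x"] instead would print an "invalid choice" message
# and exit via SystemExit.
```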

@Benwing2 That makes sense. The reason I opted for set= is because it uses the {a = true, b = true, c = true} format, since that makes lookup much faster/simpler. Theknightwho (talk) 20:26, 27 February 2024 (UTC)Reply

@Theknightwho Hmm, I wonder if that isn't false economy since it requires more typing, and I imagine a lot of people will call listToSet on a list to handle this format. Benwing2 (talk) 20:28, 27 February 2024 (UTC)Reply

@Benwing2 That's a good point, but checking a list is the same amount of work as doing listToSet, so changing Module:parameters to accept a list would simply guarantee the worst-case scenario, instead of leaving it up to the calling module. Theknightwho (talk) 20:34, 27 February 2024 (UTC)Reply

@Theknightwho I suppose but the actual difference in memory and speed is completely negligible, so IMO you might as well make it easier for the callers. Benwing2 (talk) 20:54, 27 February 2024 (UTC)Reply

And also you don't have the overhead of loading a new module. Benwing2 (talk) 20:54, 27 February 2024 (UTC)Reply

@Benwing2 If I have time, I might do some profiling on Module:parameters, since I have a feeling it's contributing a significant chunk to page loading time. e.g. a loads about a second faster since I made the changes, and there are still quite a few other optimisations that could be made. Theknightwho (talk) 21:02, 27 February 2024 (UTC)Reply

@Theknightwho OK but I still think requiring the use of a set rather than (also) allowing a list is a micro-optimization since the number of items should be small. Benwing2 (talk) 21:10, 27 February 2024 (UTC)Reply

@Benwing2 Alright - I can change it. Theknightwho (talk) 21:16, 27 February 2024 (UTC)Reply
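A sketch of what accepting either form might look like, written in Python rather than Lua. It takes a plain list or a {value: True} mapping and normalises both to a set before checking; the function names are hypothetical and this is not the actual Module:parameters code.

```python
def normalize_choices(choices) -> set:
    """Accept a list of values or a {value: True} mapping; return a set."""
    if isinstance(choices, dict):
        return {key for key, allowed in choices.items() if allowed}
    return set(choices)


def check_choice(value, choices):
    """Return value if it is permitted, otherwise raise ValueError."""
    allowed = normalize_choices(choices)
    if value not in allowed:
        raise ValueError(f"{value!r} is not one of {sorted(allowed)}")
    return value
```

The conversion cost is proportional to the number of choices, which stays negligible as long as the set of values is small, as the thread assumes.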

@Theknightwho & Ben: pardon the partial threadjacking, but I've been wanting to ask you two about the practicality of adding parameter checking to existing, non-Lua templates, and this seems like an opportune moment while you're both already thinking about Module:parameters. I'm envisioning something like an unobtrusive template {{allowparams|1,2,3,foo,bar,baz}} that could be added to existing templates to generate errors/warnings when the template is invoked with any params besides those listed. On the backend, it could just call Module:parameters.process() with the list of supplied params and then do nothing with the result. Ignoring the difficulty of identifying the valid parameters and cleaning up all the existing calls with invalid parameters, would adding param checking to every template add an unacceptable overhead to page processing? JeffDoozan (talk) 01:45, 28 February 2024 (UTC)Reply
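The core of the proposed check is small. Here is a Python sketch of the logic only, assuming the allowed names arrive as the comma-separated string from the example and the supplied arguments as a mapping; the function name is hypothetical and the real thing would be a Lua call into Module:parameters.

```python
def unknown_params(allowed_spec: str, supplied: dict) -> list:
    """Return the supplied parameter names that are not in the allowed list."""
    allowed = {name.strip() for name in allowed_spec.split(",") if name.strip()}
    return sorted(name for name in supplied if name not in allowed)
```

An empty result means the call is clean; anything else would become the error or warning shown to the editor.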

@JeffDoozan I think User:Theknightwho can best answer the question about efficiency as he's done a lot more investigations of this sort. Benwing2 (talk) 01:48, 28 February 2024 (UTC)Reply

@JeffDoozan That's certainly doable, but it would add an extra Lua burden to those templates, and in many cases it would be more straightforward to do the whole thing in Lua anyway.

The reason why it concerns me is that a lot of these mixed templates already make multiple calls into Lua to retrieve things like language names, and there is an inherent cost every time a module is invoked; this is the reason why {{multitrans}} is so effective, because it removes that inherent cost from each template. Aside from memory costs, each invocation is quite time-consuming (relatively speaking), since a ton of things are done by the back-end to create each new Lua environment. Theknightwho (talk) 01:48, 28 February 2024 (UTC)Reply

@Theknightwho: Thank you for the explanation. I had naively assumed that if a page calls Lua once, then subsequent calls would be relatively cheap. I'm still assuming that most pages include few enough templates that the benefit of having parameter checking outweighs the cost of invoking the checks, but as pages get bigger and closer to memory/speed limits, the calculus may change. Do you have any guess where that tipping point might be? (100 additional calls? 1,000? 10,000?) For pages that exceed that threshold, maybe {{allowparams}} could check the pagename against a fixed "denylist" of problematic pages before invoking Lua. I'm assuming the denylist would be < 100 pages and could be programmatically generated from an XML dump by counting the number of templates that would call {{allowparams}}. What do you think? JeffDoozan (talk) 17:39, 28 February 2024 (UTC)Reply

@JeffDoozan So conventional wikicode would probably preclude that being workable, because there's the post-expand include size limit of 2MB, which is calculated by adding up the size of every page accessed, multiplied by the number of times it's accessed, and on top of that, parser functions like {{#if:}} actually apply a multiplier to anything that goes through them (which compounds, though I think it's capped at something like x12). This was a big problem we ran into with the lite templates, where the bottom 10% of a simply wasn't loading templates anymore. Even now, it's using about 1.8MB of the limit. Obviously I'm being really pessimistic when I say these things, but the irony of it is that adding these kinds of checks to aid large pages can end up having the opposite of the intended effect!

The things that help are:

Reducing the number of calls into Lua. If it can be done in one invoke that's ideal, but really it should be no more than 5. This includes uses of any templates which themselves are Lua based (like {{l}}), since they each result in independent calls into Lua. The Coptic conjugation templates are a great example of why this matters, since they're way slower than water/translations despite having nowhere near as many links.
Not creating complex wikicode logic with the parser functions (like we do with the citation templates, for example). They're really slow, a pain in the neck to maintain, and inevitably result in lots of separate Lua invocations for basic information like language names.

In terms of the parameter checking, let me know if there are any templates which are on your priority list, because it may be that we can score some quick-wins by converting some of them into pure-Lua, whereas with others the manual parameter checking may be workable. Theknightwho (talk) 17:51, 28 February 2024 (UTC)Reply

@Theknightwho That kind of deep information is exactly why I wanted to run this by you. Since I'm hoping to do this programmatically and en masse, it would be limited to templates where I can parse the code to find all of the parameters used, which eliminates anything already calling #invoke since the invoked module can make its own use of the parameters and I'm not sure how practical it is to try to determine the parameters used by a Module. I think this means that every modified template would mean 1 additional call to Lua for every use and also that there's likely little or no benefit to converting them to Lua. How many total Lua calls on a page is too many?

I would probably start with the templates that don't already have calls with bad parameters, which probably means the lesser used templates that might not even be included on our biggest pages. I can check which templates are used on pages with more than X template calls and exclude those templates from the mass conversion, to ensure we're not adding additional stress to our biggest pages. I understand that not all template calls are equal, but is there some reasonable number of template calls I could use for detecting "big" pages? 100? 500? 1000? JeffDoozan (talk) 20:34, 28 February 2024 (UTC)Reply
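The exclusion step described above could be sketched like this in Python, assuming the template calls per page have already been extracted from a dump into a mapping. The function name and the threshold are placeholders, since the thread leaves the cut-off for a "big" page open.

```python
def templates_to_exclude(calls_by_page: dict, threshold: int) -> set:
    """Collect every template used on a page with more than `threshold` calls.

    calls_by_page maps a page title to the list of template names it invokes
    (one entry per call, so repeated templates appear repeatedly).
    """
    excluded = set()
    for page, calls in calls_by_page.items():
        if len(calls) > threshold:
            excluded.update(calls)
    return excluded
```

Templates in the returned set would be skipped in the mass conversion, so that no extra Lua call lands on a page that is already near its limits.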

"terms spelled with"

Latest comment: 5 months ago, 4 comments, 2 people in discussion

Hi, I would like to bring your attention to categories such as Category:Hindi terms spelled with ॉ. We seem to have decided that ◌ (U+25CC) should not be used for the Hindi combining characters, but Translingual doesn't seem to know about that, which is why Category:Translingual terms spelled with ◌ॉ exists. What should we do about that? --kc_kennylau (talk) 16:47, 28 February 2024 (UTC)Reply

@Kc kennylau Can you explain further about U+25CC? What is its replacement? As for the "terms spelled with" categories, AFAIK these categories are suppressed for one-character entries but this entry seems to involve two Unicode chars. Maybe User:Theknightwho can comment more as he reworked the code to generate these categories. Benwing2 (talk) 02:40, 29 February 2024 (UTC)Reply

U+25CC is usually used with combining characters (see Category:Translingual terms spelled with ◌̺, which is U+25CC followed by U+033A) in order to display the character. However, due to some unknown reasons, at least in my browser the Hindi combining characters in "isolation" already come with a dotted circle when they are rendered, so using U+25CC would create two dotted circles when displayed. I tried to look at The Unicode Standard, but so far it seems to me that this is not really specified one way or another, at least not specifically for Devanagari. This is why I don't really know if we should include U+25CC or not. --kc_kennylau (talk) 02:48, 29 February 2024 (UTC)Reply

(moved to Wiktionary:Beer parlour/2024/April#"terms spelled with") --kc_kennylau (talk) 00:59, 4 April 2024 (UTC)Reply

Latin macronization change: veho, vē̆xī, vectum

Latest comment: 6 months ago, 7 comments, 2 people in discussion

Hello, I was just looking into the vowel length of Latin vē̆xī (perfect of vehō) and it looks like most recent sources think there's a good chance that it had a long vowel like Sanskrit ávākṣam (although there is some uncertainty). I edited the entry for vehō with notes on this and to mark the vowel in the perfect stem as ē̆, but of course, that doesn't affect all the inflected forms and derived compounds (e.g. advehō, convehō, invehō, prōvehō, subvehō, trānsvehō, ēvehō). Could you have Wingerbot update those? (The long vowel seems to only be reconstructed for the perfect stem vē̆x-, not the supine stem vect-). I hope it's not too much trouble. I have also been wondering how I might set up a bot account of my own to make changes like this after editing the length of a vowel in Latin entries; if that's feasible for me to do, any tips would be welcome! Urszag (talk) 20:46, 1 March 2024 (UTC)Reply

@Urszag Hi. I'll go ahead and fix these. As for setting up a bot account, in order to do that (a) you need to be able to write Python scripts, (b) you do some small test runs using your own account and verify that everything works, (c) you set up a vote to create an account for your bot using the link in WT:Votes. I recommend using a combination of pywikibot to interface to Wiktionary and mwparserfromhell to parse the template invocations on a given page. Note that there's also AutoWikiBrowser which lets you make semi-automated changes based on regular expressions and takes less work to set up than a bot account; I used this several years ago before I set up a bot account. (It is only supported on Windows but it seems to work OK through Wine on MacOS, and there's also a JavaScript browser variant called JWB.)

BTW are there any other macron changes you need done? I think there's an outstanding request somewhere in my archives that I never got to, possibly it was from you. Benwing2 (talk) 01:49, 2 March 2024 (UTC)Reply

Done. Benwing2 (talk) 05:18, 2 March 2024 (UTC)Reply

OK, I found the previous request. It was from you in April 2023: User talk:Benwing2/2023#More Latin vowel length changes. You mentioned hirtus, hirsutus, luxus, luctor. The relevant part of the input to my script has this:

###
### hīrtus
### 
a1 hīrtus
pn2 Hīrtius
a1 hīrsūtus
a1 hīrtellus
a3 hīrtipēs hīrtiped
###
### lūctor
###
v1+ lūctor
n1 lūcta
n3 lūctātiō
n3 lūctātor
v1+ adlūctor
v1+ allūctor
v1+ collūctor
n3 collūctātiō
v1+ conlūctor
n3 conlūctātiō
v1+ ēlūctor
a3 ēlūctābilis
a3 inēlūctābilis
v1+ relūctor
n3 relūctātiō
###
### lūxus "dislocated"
###
a1 lūxus
n4 lūxus
v1+ lūxō

Do all these need to change to ī̆ ū̆? Are there any words missed here? Also can you give me the appropriate changelog comment(s) to have the bot add when making the changes? The default is "if before two cons, per Bennett corrected by Allen and Michelson" but that's obviously wrong for these cases. Benwing2 (talk) 05:30, 2 March 2024 (UTC)Reply
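For readers unfamiliar with the excerpt above, here is a guess at how such input could be read, sketched in Python. It assumes that lines starting with ### are comments, and that every other non-blank line is a class code, a lemma, and optionally further stems. That reading is inferred from the excerpt alone, not from the actual script, and the function name is made up.

```python
def parse_entries(text: str) -> list:
    """Parse lines like 'a3 hīrtipēs hīrtiped' into (code, lemma, extra) tuples."""
    entries = []
    for raw in text.splitlines():
        line = raw.strip()
        # Skip blank lines and '###' comment/section lines.
        if not line or line.startswith("###"):
            continue
        # A line with fewer than two fields raises ValueError here.
        code, lemma, *extra = line.split()
        entries.append((code, lemma, extra))
    return entries
```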

Thanks! Those all look correct with ī̆ ū̆. I would add lūxuria, lūxuriō, lūxuriōsus, lūxuriēs, obluctor.

In addition, it looks like I missed some inflected forms of derivatives of nūbō, nūpsī, nū̆ptum when I made that change (e.g. nūptum, nūptiāle). Specifically, there's innūbō, inflected forms of innū̆ptus, nū̆ptia, nū̆ptiae, nū̆ptiālis, nū̆ptus (It seems I just edited the main entry for these), and connūbium and its inflected forms.

I just made a new change to the perfects of alliciō, allē̆xī (formerly marked as just long) and illicio, illexī and pellicio, pellexī (formerly marked as just short) to mark them as uncertain (it seems likely all three had the same quality, probably short). These just need the inflected verb forms updated.

The references I'm basing these on are cited at the pages for hī̆rtus, lū̆xus, lū̆ctor, alliciō, nūbō, cōnū̆bium, so I think one option is to add notes of the format "Vowel length marked as uncertain based on references cited at hī̆rtus", and so on. Or the specific references could be listed as follows. Hirt- and lux-: uncertain based on Bennett (long) vs. De Vaan (short). Luct-: uncertain based on Bennett (long) vs. De Vaan, Wartburg, Buchi and Schweickard (short, with complications). Allex-: uncertain based on Bennett, Buck and Allen. Nupt-: uncertain based on Lewis and Bennett (long) vs. De Vaan, Ernout and Meillet, Wartburg and Bienvenu (short). -nubium: uncertain per Kennedy. -licio, -lē̆xī: uncertain per Bennett and Buck, "probably short" per Allen.--Urszag (talk) 15:13, 2 March 2024 (UTC)Reply

@Urszag Done. Note that there also exists conubialis, which is currently indicated with long ū. Not sure if this needs ū̆. Benwing2 (talk) 06:18, 4 March 2024 (UTC)Reply

Thank you! Yes, conubialis seems to be like conubium.--Urszag (talk) 06:36, 4 March 2024 (UTC)Reply

Category:Hijazi Arabic terms with IPA pronunciation - Alphabet order

Latest comment: 6 months ago, 6 comments, 2 people in discussion

how can you change the alphabet order of the Hijazi Arabic letters from

آ أ إ ا ب پ ت ث ج ح خ د ذ ر ز س ش ص
ض ط ظ ع غ ف ڤ ق ك ل م ن ه و ي

to

آ أ إ ا ب ت ث ج ح خ د ذ ر ز س ش ص
ض ط ظ ع غ ف ق ك ل م ن ه و ي . پ ڤ

since پ and ڤ are additional letters and not part of the Alphabetical order عربي-٣١ (talk) 12:39, 2 March 2024 (UTC)Reply

@عربي-٣١ Are you referring to the sort order as it appears on category pages? The thing is, those additional letters are letters even if they aren't part of the standard Hijazi alphabet, and they need to be sorted *somewhere*. The "to chart" you gave doesn't include them anywhere. Benwing2 (talk) 22:47, 3 March 2024 (UTC)Reply

Oh NVM, you want them placed at the end. Benwing2 (talk) 22:48, 3 March 2024 (UTC)Reply

@Theknightwho @Fenakhay What do you think about this? It looks to me like there is no explicit sort key currently specified for Hijazi Arabic (nor for Egyptian and Gulf Arabic). Standard Arabic has a sort key but only for one Judeo-Arabic character, and Moroccan Arabic has a sort key of some sort that has no comments so I'm not sure what it's doing. IMO we should strive to treat all varieties of Arabic the same as much as possible, e.g. in using the same sort order everywhere as much as feasible; the additional letters correspond to /p/ and /v/, which are marginal phonemes in most varieties of Arabic (with the possible exception of /p/ in Iraqi varieties?). (Also per Wikipedia's Varieties of Arabic article, there are two different ways of writing /v/ in the Arabic script, corresponding to an East-West split.) Benwing2 (talk) 03:06, 4 March 2024 (UTC)Reply

@Benwing2 Well, they are additional variants of letters (foreign letters) and should be included at the end of this list, since they already appear last on the sorting pages of any of the Arabic dialects. Also, the Arabic sort key should run from right to left, as with the rest of the Arabic dialects (not from left to right as it does now in Category:Arabic terms). عربي-٣١ (talk) 16:01, 13 March 2024 (UTC)Reply

@عربي-٣١ Sounds good to me but can you post about this in the Beer parlour (WT:BP) to make sure no one objects? Benwing2 (talk) 18:05, 21 March 2024 (UTC)Reply
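For illustration, the ordering requested in this thread (the standard letters first, with پ and ڤ moved to the end) can be sketched as a lookup-table sort key. This is a minimal Python sketch, not the on-wiki sort key mechanism; the names `ORDER`, `RANK` and `sort_key` are made up for the example, and diacritics and other characters outside the table are simply ranked after all listed letters.

```python
# Letter order taken from the "to" chart above, with the two
# additional letters appended at the very end.
ORDER = (
    "آ أ إ ا ب ت ث ج ح خ د ذ ر ز س ش ص "
    "ض ط ظ ع غ ف ق ك ل م ن ه و ي "
    "پ ڤ"
).split()

# Map each letter to its position in the desired order.
RANK = {letter: index for index, letter in enumerate(ORDER)}


def sort_key(word):
    """Return a list of ranks, one per character of the word.

    Characters missing from the table (an assumption of this sketch)
    are ranked after every listed letter, ordered by code point.
    """
    return [RANK.get(ch, len(ORDER) + ord(ch)) for ch in word]


words = ["ڤيديو", "باب", "پيتزا", "يوم", "فيل"]
print(sorted(words, key=sort_key))
```

Words beginning with پ or ڤ end up after those beginning with ي, which is the placement asked for above.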

Replacement of quotation templates

Latest comment: 6 months ago, 6 comments, 2 people in discussion

Hi, when you have time could you please do the following quotation template replacements?

{{RQ:Ayliffe PJCA}} → {{RQ:Ayliffe Juris Canonici}}
{{RQ:Bancroft USA}} → {{RQ:Bancroft United States}}
{{RQ:Fairfax Godfrey of Bulloigne}} → {{RQ:Tasso Fairfax Godfrey of Bulloigne}}
{{RQ:Milton Eikonoklastes|1|Chapter Name}} → {{RQ:Milton Eikonoklastes|chapter=Chapter Name|page=1}}

Thank you! — Sgconlaw (talk) 13:45, 3 March 2024 (UTC)Reply

@Sgconlaw Done. Benwing2 (talk) 22:47, 3 March 2024 (UTC)Reply

Thanks! — Sgconlaw (talk) 11:28, 4 March 2024 (UTC)Reply

By the way, was the {{RQ:Milton Eikonoklastes}} replacement also done? I couldn’t tell; maybe none of the entries it’s used in are on my watchlist. If so I’m changing the template to swap around the |1= and |2= parameters so that the template is in line with other templates. — Sgconlaw (talk) 11:37, 4 March 2024 (UTC)Reply

@Sgconlaw Yes. There were only a few pages using those params though. Benwing2 (talk) 21:08, 4 March 2024 (UTC)Reply

OK, great. — Sgconlaw (talk) 22:04, 4 March 2024 (UTC)Reply
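As a rough illustration of the {{RQ:Milton Eikonoklastes}} conversion requested above (positional page and chapter turned into named |page= and |chapter=), here is a minimal Python sketch. It is not the script that was actually run; it uses a plain regular expression and assumes the template call contains no nested templates or links with pipes, which a real bot run would handle with a proper wikitext parser. The names `PATTERN` and `convert` are made up for the example.

```python
import re

# Matches {{RQ:Milton Eikonoklastes|<page>|<chapter>[|other params]}}
# where the first two parameters are positional (contain no "=").
PATTERN = re.compile(
    r"\{\{RQ:Milton Eikonoklastes"
    r"\|([^|{}=]*)"      # positional 1: page
    r"\|([^|{}=]*)"      # positional 2: chapter
    r"((?:\|[^{}]*)?)"   # any remaining parameters, kept verbatim
    r"\}\}"
)


def convert(text):
    """Rewrite positional page/chapter into named parameters."""
    def repl(match):
        page = match.group(1).strip()
        chapter = match.group(2).strip()
        rest = match.group(3)
        return "{{RQ:Milton Eikonoklastes|chapter=%s|page=%s%s}}" % (
            chapter, page, rest)
    return PATTERN.sub(repl, text)


print(convert("{{RQ:Milton Eikonoklastes|1|Chapter Name}}"))
```

Calls that already use named parameters do not match the pattern and are left unchanged.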

Bugs in ar-conj/module:ar-verb

Latest comment: 6 months ago, 1 comment, 1 person in discussion

Hi. I want to inform you about a couple of problems in ar-conj/module:ar-verb. I already informed Fenakhay about them; I'll also inform you just in case you can sort it out. I'm sorry in advance for my post being this long:

When I was looking at the entries for حَيَّ/حَيِيَ (root ح ي و), I saw that only the long present tense (يَحْيَا) is generated for the short form; the short one (يَحَيُّ), which exists per Lisan al-Arab [3], isn't generated. This needs to be fixed so that the short present tense is generated.

Also a related problem is for عَيَّ/عَيِيَ (root ع ي ي), while the conjugation table for long form عَيِيَ will be generated with specified paradigm i/a with long present يَعْيَا, unlike with حَيَّ, conjugation table for عَيَّ won't be generated at all. Btw, it also has short version of present: يَعْيُّ: [4]

Also notice how participles aren't generated at all for حَيَّ/حَيِيَ (should be short and long versions: حَيّ and حَيِيّ). Fixmaster (talk) 20:45, 5 March 2024 (UTC)Reply

Bugs in ar-conj/module:ar-verb (part 2)

Latest comment: 6 months ago, 1 comment, 1 person in discussion

Also notice how participles aren't generated at all in conjugation tables for حَيَّ/حَيِيَ (should be short and long versions of active participles: حَيّ and حَيِيّ). Same goes for عَيَّ/عَيِيَ (should be عَيّ/عَيِيّ per dictionaries).

And if you generate the conjugation table with عَيِيَ (don't forget, the table for عيَّ won't generate at all), there will be participles, but with wrong form: عَايٍ for active and مَعْيُوّ for passive.

Btw, speaking of passive participles, what should they be? In the almaany online dictionary, I found مَحْيىّ and مَعِيّ correspondingly. Notice how the patterns don't match? In any case, they could probably be ignored; those passives are mostly theoretical and impersonal anyway. Just thought it was worth mentioning.

What matters is the ability to generate the conjugation table at all for the short version of the verb عَيَّ like we have for حَيَّ, the short present tense for these two (يَحَيُّ and يَعَيُّ), which currently isn't generated, and the generation of short/long active participles (حَيّ/حَيِيّ and عَيّ/عَيِيّ).

Just as a side note: maybe there should be parameters in the template to forcefully override active/passive participles (like we have the parameter for verbal nouns)? Just an idea. Fixmaster (talk) 20:41, 5 March 2024 (UTC)Reply

About categories

Latest comment: 6 months ago, 4 comments, 2 people in discussion

Feedback on categories from a not-so-clever reader, if you allow me. I find Categories at en.wikt very complex and unpatrolled (many were started by someone, and then were left untouched). Some of them are broken into such specialised subcategories that one cannot find a wanted word, e.g. dog in Cat:en:Animals. Is there an index=1 kind of Category-Index (all members a...z)? We have done this at @el.wikt.Animals, plants, medicine with a different colour. Just 3 or 4 Cats. The little ««« links to the overall Cat for all languages. Also! The code-indicator for topics makes alphabetisation and comprehension impossible: why should a reader know the codes? If a first word is to be avoided, why not the style: Cat:Animals (English)? Thank you for listening. ‑‑Sarri.greek ^♫ I 03:31, 6 March 2024 (UTC)Reply

@Sarri.greek You've brought up several points and this is a big topic. Can you bring this up in the Beer Parlour? Most of the basic decisions concerning category structure predate me and we'd need consensus to institute any significant changes. Benwing2 (talk) 03:35, 6 March 2024 (UTC)Reply

@Benwing2, Here, I am not an admin, it is not my place to bring such things for discussion -my understanding of en.wikt structure and modules is not adequate-. Sir, I have been thinking at el.wikt (from where my admin colleagues, mostly wikipedians, demanded that I stop, for being too autocratic... True: I cannot stand sloppiness, lack of refs, loose CFI etc. :) But the same is valid for all wiktionaries perhaps: 20 years have passed. Basics (plus details too) are covered. What now? I think, a general workpage for a. Feedback on the current state. b. The future plans for formation of crews on each subject. Cleanup, reviewing, and unifying: cats, params, templates. Leadership: vote plans by Xadmin, by Zadmin., people responsible to do the plan and supervise the crews. If you organise a room /wikt.Future or something... and subpages for Cats, for Temps etc... we could all bring ideas? Plus: a very important thing. en.wikt is now the leader of all wiktionaries, where every little wikt. copies from. IF you had to design a wiktionary from scratch, how would you go about it? Because now, it is a patchwork procedure: adding, correcting in a maze of things... Hhhhh I talk too much too! Sorry ‑‑Sarri.greek ^♫ I 04:01, 6 March 2024 (UTC)Reply

@Sarri.greek I think in a wiki it's impossible to do everything top-down. It has to be done through consensus. Also I don't think we need a separate wikt.Future discussion forum or anything; that's what the Beer Parlour is for. There's no need to be an admin to initiate a discussion for change, just go ahead. Benwing2 (talk) 04:32, 6 March 2024 (UTC)Reply

Adding a category with multiple subcategories

Latest comment: 6 months ago, 3 comments, 2 people in discussion

Hi, I'd like to add categories to track calls to templates with bad parameters, but I haven't touched categories before, so I wanted to double check that this is a reasonable idea and that I'm going about it the right way. I think I need to create a parent category and then use a handler for the per-template categories. Since these would be maintenance categories, I would edit Module:category tree/poscatboiler/data/wiktionary maintenance and insert:

-- add the variable handlers at the top of the page (the file doesn't currently use any handlers)
local handlers = {}

--- snip ---

raw_categories["Pages using bad params when calling a template"] = {
	description = "Pages that use unrecognized parameters when calling a template.",
	additional = "These template calls should be reviewed and corrected or removed",
	breadcrumb = "Bad template params",
	parents = {"Wiktionary maintenance"},
	can_be_empty = true,
	umbrella = false,
	hidden = true,
}

table.insert(handlers, function(data)
	local template = data.label:match("^Pages using bad params when calling (.+)$")
	if template then
		return {
			description = "Pages that use unrecognized parameters when calling " .. template .. ".",
			additional = "These template calls should be reviewed and corrected or removed",
			breadcrumb = template,
			umbrella = false,
			parents = {{
				name = "Pages using bad params when calling a template",
				sort = template,
			}},
		}
	end
end)

-- add HANDLERS to the existing return table
return {RAW_CATEGORIES = raw_categories, HANDLERS = handlers}

I know I can do something similar using template tracking, but I'm trying to make this a little more "user friendly" with the hope that it won't just be me cleaning up these categories. Is there an overhead cost to using categories like this or anything else I should take into consideration? Thanks! JeffDoozan (talk) 21:04, 8 March 2024 (UTC)Reply

@JeffDoozan Yup, this approach will work, although you need a few changes: (1) use a raw handler instead of a regular handler (because the category in question doesn't begin with a language name), and the first line of the handler should use `data.category` instead of `data.label`; (2) you don't need the `umbrella` settings because raw categories don't have corresponding umbrella categories. Other than that everything looks good. Benwing2 (talk) 21:18, 8 March 2024 (UTC)Reply

After adding categorization to ~300 templates that are used less than 5 times and called at least once with invalid parameters, I think it would be easier for cleanup if the templates were categorized into "language" templates and "general use" templates, like this:

Category:Pages using bad params when calling a template
- Category:Pages using bad params when calling Finnish templates
  - Category:Pages using bad params when calling Template:fi-decl-hame-dot
- Category:Pages using bad params when calling general use templates
  - Category:Pages using bad params when calling Template:cite-av‎

To do that, I came up with the following code:

raw_categories["Pages using bad params when calling a template"] = {
	description = "Pages that use unrecognized parameters when calling a template.",
	breadcrumb = "Bad template params",
	parents = {"Wiktionary maintenance"},
	can_be_empty = true,
	hidden = true,
}

table.insert(raw_handlers, function(data)
	local template_type = data.category:match("^Pages using bad params when calling (.+) templates$")
	if template_type then
		return {
			description = "Pages that use unrecognized parameters when calling " .. template_type .. " templates.",
			breadcrumb = template_type,
			parents = {{
				name = "Pages using bad params when calling a template",
			}},
			hidden = true,
		}
	end
end)

table.insert(raw_handlers, function(data)
	local template = data.category:match("^Pages using bad params when calling (.+)$")

	if template then
		local template_name_without_namespace = template:gsub("^Template:", "")

		-- Check if the template name starts with a hyphenated language code
		local lang
		local possible_language_code = template_name_without_namespace:match("^([a-z][a-z][a-z]?-[a-z][a-z][a-z])-")
		if possible_language_code ~= nil then
			lang = require("Module:languages").getByCode(possible_language_code)
		end

		-- Check if the template name starts with a two or three character language code
		if lang == nil then
			possible_language_code = template_name_without_namespace:match("^([a-z][a-z][a-z]?)-")
			-- Only look up the code if the pattern actually matched
			if possible_language_code ~= nil then
				lang = require("Module:languages").getByCode(possible_language_code)
			end
		end

		local template_type
		if lang == nil then
			template_type = "general use"
		else
			template_type = lang:getCanonicalName()
		end

		return {
			description = "Pages that use unrecognized parameters when calling " .. template .. ".",
			additional = "These template calls should be reviewed and the bad parameter should be corrected or removed.",
			breadcrumb = template,
			parents = {{
				name = "Pages using bad params when calling " .. template_type .. " templates",
				sort = template_name_without_namespace,
			}},
			hidden = true,
		}
	end
end)

Am I just re-inventing umbrella categories? Is there a better way to do this? Would this add unnecessary overhead to the categorization system? JeffDoozan (talk) 22:28, 15 March 2024 (UTC)Reply
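For anyone who wants to try the prefix matching from the handler above outside the wiki, here is a small Python sketch of the same classification logic. It only mirrors the two patterns; `KNOWN_LANGUAGES` is a made-up stand-in for the Module:languages lookup (the "aaa-bbb" entry is invented sample data), and `template_type` is a name chosen for the example.

```python
import re

# Hypothetical stand-in for Module:languages; real data lives on-wiki.
KNOWN_LANGUAGES = {
    "fi": "Finnish",
    "aaa-bbb": "Example lect",  # invented code, for illustration only
}

# Same two checks as the handler: hyphenated code first, then a plain
# two- or three-letter code, each followed by a hyphen.
HYPHENATED_CODE = re.compile(r"^([a-z]{2,3}-[a-z]{3})-")
SIMPLE_CODE = re.compile(r"^([a-z]{2,3})-")


def template_type(template):
    """Return the language name for a template, or "general use"."""
    name = re.sub(r"^Template:", "", template)
    for pattern in (HYPHENATED_CODE, SIMPLE_CODE):
        match = pattern.match(name)
        if match and match.group(1) in KNOWN_LANGUAGES:
            return KNOWN_LANGUAGES[match.group(1)]
    return "general use"


for name in ("Template:fi-decl-hame-dot", "Template:cite-av"):
    print(name, "->", template_type(name))
```

Templates whose prefix is not a known code, or that have no short prefix at all, fall through to "general use", as in the handler.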

A couple of code replacements

Latest comment: 6 months ago, 23 comments, 3 people in discussion

Hi, as part of the Min Nan split, would it please be possible for you to bot replace a couple of the codes which are being deprecated? The only places these are now used should be links, which should make the switch straightforward.

Hokkien: nan-hok → nan-hbl (etym-only to full language conversion)
Teochew: zhx-teo → nan-tws (code standardisation within the nan family)

Thanks. Theknightwho (talk) 01:12, 11 March 2024 (UTC)Reply

@Theknightwho Sure, will do. Benwing2 (talk) 01:30, 11 March 2024 (UTC)Reply

Thanks. Theknightwho (talk) 01:34, 11 March 2024 (UTC)Reply

@Theknightwho Does the code zhx-teo still exist? I can't find any references to it in the language data. Benwing2 (talk) 01:34, 11 March 2024 (UTC)Reply

@Benwing2 It's currently set up as an alias, but that's just a temporary thing. I recently changed the way aliases are handled so that they're no longer directly integrated into the data, because (a) that added overhead we don't need most of the time, (b) it makes keeping track of aliases easier by collating them all in one place, (c) it means we can use them for situations like this, where a code is being changed for whatever reason, and (d) we can now use them for full languages without having to complicate the language data (see point c). They're now stored in Module:languages/data at the bottom. Theknightwho (talk) 01:37, 11 March 2024 (UTC)Reply

@Theknightwho Ahh, thanks. Benwing2 (talk) 01:41, 11 March 2024 (UTC)Reply

@Benwing2 Btw, it does mean the integration isn't quite as smooth as before, since you now can't use aliases for anything that accesses the language data directly as the alias is only looked up during the creation of a language object. In practical terms, that just means they can't be used anywhere in the language data itself (e.g. the ancestors field). That was semi-intentional, though, since we don't really want aliases in the first place. Theknightwho (talk) 01:45, 11 March 2024 (UTC)Reply

@Theknightwho Yeah that is fine. I agree we should eliminate aliases as much as possible, and in fact I did that previously with a bunch of random etym-only aliases. Benwing2 (talk) 01:47, 11 March 2024 (UTC)Reply

@Benwing2 I've just added a check to Module:data consistency check for alias codes, which covers the data for languages, etym-only languages, families and scripts: all it does is check that none of the subtables has multiple keys (e.g. due to someone adding m["abc"] = m["xyz"], which is the old way aliases were handled).

The only ones it's found at the moment are for various Arabic script codes, where I consolidated all the ones that had identical tables a while back. Working out what to do with them will need a proper discussion, though. Theknightwho (talk) 02:43, 11 March 2024 (UTC)Reply
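To illustrate the kind of check described above (finding data tables reachable under more than one key, as happens after m["abc"] = m["xyz"]), here is a minimal Python sketch. It is not the code in Module:data consistency check; it just groups keys by object identity, and the name `find_shared_tables` is made up for the example.

```python
from collections import defaultdict


def find_shared_tables(data):
    """Return groups of keys that point at the very same table object.

    Two keys are grouped only if they share one object (identity), not
    merely equal contents, which is what old-style aliasing produces.
    """
    keys_by_identity = defaultdict(list)
    for code, table in data.items():
        keys_by_identity[id(table)].append(code)
    return sorted(
        sorted(codes)
        for codes in keys_by_identity.values()
        if len(codes) > 1
    )


shared = {"name": "Example"}
m = {"abc": shared, "def": {"name": "Example"}}
m["xyz"] = m["abc"]  # the old way of adding an alias
print(find_shared_tables(m))
```

Here "def" has identical contents but is a separate table, so it is not reported.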

@Theknightwho Yeah I've never been very happy with having a bunch of language-specific script codes for Arabic and certain other scripts. However, I'm not sure whether it's possible to eliminate them (or some of them) using things like language selectors in CSS. Maybe User:This, that and the other and/or User:Erutuon can comment more. Benwing2 (talk) 02:48, 11 March 2024 (UTC)Reply

@Theknightwho I did a replacement run for both codes but as the tracking categories were only added yesterday, it will take longer to flush out all the old usages (indeed I now see 8 new pages in the nan-hok category and 3 in the zhx-teo category). Benwing2 (talk) 03:22, 11 March 2024 (UTC)Reply

Thanks. Theknightwho (talk) 05:38, 11 March 2024 (UTC)Reply

@Theknightwho I'll do another run tomorrow. Benwing2 (talk) 05:41, 11 March 2024 (UTC)Reply

@Theknightwho Did another run. Going to bed now but will do another one tomorrow evening; hopefully that will catch any stragglers. Benwing2 (talk) 08:42, 11 March 2024 (UTC)Reply

Sounds good - thanks. Theknightwho (talk) 08:43, 11 March 2024 (UTC)Reply

@Theknightwho I did two runs, one just now and one about 10 hours ago, and already more have appeared, so it may be a few days before everything catches up and there are no more additions to the tracking categories. Benwing2 (talk) 07:50, 12 March 2024 (UTC)Reply

@Theknightwho I went through CAT:Terms derived from Hokkien and CAT:Terms derived from Teochew recursively and changed all the terms in them as well as remaining tracked terms (including uses in {{rfp}} and {{cog}} and such). I *THINK* this is done now; probably close enough that you can delete the old codes and handle any remaining errors as they occur. Benwing2 (talk) 22:40, 12 March 2024 (UTC)Reply

@Benwing2 Thanks - I caught one, but that looks to be it. Theknightwho (talk) 18:49, 13 March 2024 (UTC)Reply

I have also wondered why we use those special lang+script codes for the Arab and Beng scripts. Perhaps they date from a time when no other solution was well-supported enough to deliver different fonts for different languages. I note that Syrc and Xsux specify different fonts for different languages with CSS alone, so it is clearly possible to do it that way. (Not too sure what is going on with Mong...) This, that and the other (talk) 03:50, 11 March 2024 (UTC)Reply

@-sche, Surjection Maybe either of you could comment. If we can replace things like fa-Arab with just the appropriate language selectors in MediaWiki:Gadget-LanguagesAndScripts.css I would rather do it that way and not expose what is essentially an implementation detail into the wikicode. Benwing2 (talk) 03:57, 11 March 2024 (UTC)Reply

@This, that and the other, Benwing2 In the case of Mong, it's been split because the code actually covers four closely related scripts: Mongolian (proper), [Oirat] Clear Script, Manchu and Xibe. It's a situation where the split exists to get more accurate language data, rather than because we need different CSS classes (though that may be something we want in the future; Manchu and Oirat-specific fonts exist, and I suspect Xibe as well). In each case, the character ranges only cover the characters used by those scripts; there's some overlap, but most are only used in a subset of the four. See [5] for a breakdown (note: Todo = Clear Script; Sibe = Xibe). (Edit: this distinction does matter in some cases, e.g. Sanskrit, which has Mong, mnc-Mong and xwo-Mong.) Theknightwho (talk) 05:38, 11 March 2024 (UTC)Reply

@Theknightwho could you update the Chinese entry at WT:LT, such as it is? This, that and the other (talk) 03:51, 11 March 2024 (UTC)Reply

Done. Theknightwho (talk) 05:38, 11 March 2024 (UTC)Reply

Module editing tutorials

Latest comment: 6 months ago, 2 comments, 2 people in discussion

Hi, would you be able to point me to some places where I can learn more about module creation and editing?

I'm self-taught in HTML which has served me fine for entries and templates, but there are quite a lot of things I would like to see done at the module level in Welsh (ways of presenting collective-singulative nouns, accounting for literary and colloquial forms in adjectives, a template for phrasal verbs, a template for generating IPA transcriptions) that at the moment are well beyond my abilities.

I'd also prefer not to bother other users by constantly asking them to do tasks for me when I could just learn to do it myself. Cheers, Arafsymudwr (talk) 16:45, 13 March 2024 (UTC)Reply

@Arafsymudwr Sorry for the very belated response! The documentation on how modules work, as well as links to tutorials (under the "Getting started" section), is found in WT:LUA. The first thing you will need to do is learn something about Lua. If you are at all familiar with JavaScript, you will find Lua rather similar. When you make a change to a module, you should always test it before saving. The way to do that is to use the "Preview page with this template" functionality (a box near the bottom left) to preview one of the Welsh pages that uses the module. Start by making a small change and gradually make more extensive changes as you get more comfortable. Let me know if I can be of more assistance. Benwing2 (talk) 23:59, 19 March 2024 (UTC)Reply

Min translations

Latest comment: 6 months ago, 55 comments, 3 people in discussion

Hi - following the renaming of various Min lects, could you please do the following name replacements in translation sections?

Min Bei → Northern Min (mnp)
Min Dong → Eastern Min (cdo)
Min Zhong → Central Min (czo)
Puxian → Puxian Min (cpx)

They should all be nested under Chinese.

I'm not including Min Nan, since all the translations have to be converted manually due to the split anyway, so changing them to Southern Min would just create confusion. Thanks. Theknightwho (talk) 21:26, 13 March 2024 (UTC)Reply

@Theknightwho OK, I have an existing script to sort translations that I was able to modify to handle this. I will run it shortly. As for Min Nan in translation sections, I checked and there are 2,637 pages with Min Nan translations in them, so it will take a while to do this totally manually. I had hoped they would have a qualifier by them indicating the particular Min Nan lect, but usually that doesn't seem to be the case. The first two examples, from dictionary and rain cats and dogs, are typical:

*: Min Nan: {{tt+|nan|字典|tr=lī-tián / jī-tián}}, {{tt|nan|詞典|tr=sû-tián}}, {{tt|nan|辭林|tr=sû-lîm}}
*: Min Nan: {{t+|nan|㴙㴙落|tr=tsa̍p-tsa̍p-lo̍h}}

I know little about Min Nan but from what I've heard, I suspect the vast majority of them are Hokkien. It may be possible in any case to speed this up by looking up the terms in question to see whether the lect can be identified. For example, the four terms given above all have Pronunciation sections indicating that the transliterations in question are Hokkien (and some of them also have Taiwanese Hokkien qualifiers). Some translations don't have transliterations given, but in that case as long as there is a Hokkien pronunciation given, I think it's fine to tag it as Hokkien. (Also I looked for Teochew translations and several of them are tagged as nan or even mn, presumably because someone thought mn stood for Min Nan.) Benwing2 (talk) 23:53, 13 March 2024 (UTC)Reply

@Benwing2 Thanks - I've spent a couple of hours going over them so far, and I've already dealt with all the ones that were marked Teochew (including the one labelled mn, yeah). Out of the ones simply marked "Min Nan", I've only found one which was definitely Teochew, with the others all being Hokkien.

In terms of automating it, the safest thing to do would be to convert any which don't have numbered tones to Hokkien, leaving the rest for manual review (which will probably be <20).

There could plausibly be a handful which are in fact Teochew but have POJ-style (i.e. Hokkien-style) transliterations, but I don't think it's feasible to determine those, since it would be way too time-consuming to convert it to the correct romanisation and check against the entry for every single translation.

Theknightwho (talk) 00:02, 14 March 2024 (UTC)Reply

@Theknightwho: Sounds good. For reference here is the complete list of Min Nan translations as of the Mar 1 dump that have numbered tones in them:

Page 872 four: Found match for regex: *: Min Nan: {{qualifier|Xiamen}} {{tt+|nan|四|tr=sì, sù}}, {{qualifier|Teochew}} {{tt+|nan|四|tr=si3}}
Page 873 five: Found match for regex: *: Min Nan: {{qualifier|Xiamen}} {{tt+|nan|五|tr=go, ngò}}, {{qualifier|Teochew}} {{tt+|nan|五|tr=ngou6}}
Page 1054 eight: Found match for regex: *: Min Nan: {{qualifier|Xiamen}} {{tt+|nan|八|tr=peh, poeh, pat}}, {{qualifier|Teochew}} {{tt+|nan|八|tr=boih4}}
Page 2107 percent: Found match for regex: *: Min Nan: {{t|nan|百分之|tr=pah-hun-chi...|alt=百分之……}} {{qualifier|the number follows it, e.g. 30%: 百分之三十 pah-hun-chi saⁿ-cha̍p}}
Page 2462 cousin: Found match for regex: *: Min Nan: {{t|nan|叔伯兄|tr=chek-peh-hiaⁿ}} {{qualifier|{{tooltip|older, father’s brother’s son|[[oFBS]]|und=1}}}}, {{t|nan|叔伯阿兄|tr=chek-peh-a-hiaⁿ}} {{qualifier|{{tooltip|older, father’s brother’s son|[[oFBS]]|und=1}}}}, {{t|nan|叔伯小弟|tr=chek-peh-sió-tī}} {{qualifier|{{tooltip|younger, father’s brother’s son|[[yFBS]]|und=1}}}}, {{t|nan|叔伯阿姊|tr=chek-peh-a-chí}} {{qualifier|{{tooltip|older, father’s brother’s daughter|[[oFBD]]|und=1}}}}, {{t|nan|叔伯小妹|tr=chek-peh-sió-mōe, chek-peh-sió-bē}} {{qualifier|{{tooltip|younger, father’s brother’s daughter|[[yFBD]]|und=1}}}}, {{t|nan|表兄|tr=piáu-hiaⁿ}} {{qualifier|{{tooltip|older, mother’s sibling’s or father’s sister’s son|o[[MSiS]] or [[oFZS]]|und=1}}}}, {{t|nan|表小弟|tr=piáu-sió-tī}} {{qualifier|{{tooltip|younger, mother’s sibling’s or father’s sister’s son|y[[MSiS]] or [[yFZS]]|und=1}}}}, {{t|nan|表姊|tr=piáu-ché, piáu-chí}} {{qualifier|{{tooltip|older, mother’s sibling’s or father’s sister’s daughter|o[[MSiD]] or [[oFZD]]|und=1}}}}, {{t|nan|表小妹|tr=piáu-sió-mōe, piáu-sió-bē}} {{qualifier|{{tooltip|younger, mother’s sibling’s or father’s sister’s daughter|y[[MSiD]] or [[yFZD]]|und=1}}}}
Page 2809 handmaid: Found match for regex: *: Min Nan: {{t|nan|女婢|tr=lu2-pi7}}, {{t|nan|tsa1-boo2-kan2}}
Page 4233 eyelash: Found match for regex: *: Min Nan: {{t|nan|目睭毛//目珠毛|tr=ba̍k-chiu-mn̂g / ba̍k-chiu-mô͘}}, {{t+|nan|目睫毛|tr=ba̍k-chiah-mn̂g / ba̍k-cheeh-mô͘ / ba̍k-chiah-mô͘ / ba̍k-chia̍p-mn̂g / ba̍k-chiap-mn̂g}}, {{t|nan|目毛|tr=ba̍k-mn̂g / ba̍k-mô͘}}; {{t|nan|目眥毛|tr=mag8 ci3 mo5}} {{q|Teochew}}
Page 4352 flesh: Found match for regex: *: Min Nan: {{t+|nan|肉|tr=bah4}}
Page 5089 stiff: Found match for regex: *: Min Nan: {{t|nan|liau1}}
Page 16449 aircraft: Found match for regex: *: Min Nan: {{t|nan|飞行器|tr=hui1-hing5-khi3}}
Page 30166 gnash: Found match for regex: *: Min Nan: {{t|nan|咬牙切齒|tr=ga6 ghê5 ciag4 ki2}}, {{t|nan|咬牙|tr=kā-gê}}, {{t|nan|切齒|tr=chhiat-khí / chhiat-chhí}}
Page 31973 farmer: Found match for regex: *: Min Nan: {{t+|nan|農民|tr=lông-bîn}}, {{t+|nan|作穡人|tr=chò-sit-lâng}}, {{t+|nan|作田人|tr=chó-chhân-lâng}}, {{t|nan|农夫|tr=long5-hu1}}
Page 35994 cabbage: Found match for regex: *: Min Nan: {{t|nan|植物人|tr=sêg4 muêh8 nang5}}
Page 38088 arsehole: Found match for regex: *: Min Nan: {{t|nan|lan7-tsiau2-bin7}}, {{t|nan|臭面人|tr=tshau2-bin7-lang5}}
Page 43201 glove: Found match for regex: *: Min Nan: {{t+|nan|手囊|tr=tshiu2-long5}}, {{t+|nan|手套|tr=tshiu2-tho3}}
Page 45493 reunion: Found match for regex: *: Min Nan: {{t|nan|ui5-loo5 围炉}}
Page 45800 dung beetle: Found match for regex: *: Min Nan: {{t|nan|蜣螂|tr=khiong-lông}}, {{qualifier|Quanzhou Hokkien}} {{t|nan|屎龜|tr=sái-ku}}, {{t+|nan|牛屎龜|tr=gû-sái-ku}}, {{qualifier|Teochew}} {{t|nan|牛屎核|tr=ghu5 sai2 hug8}}
Page 48510 loess: Found match for regex: *: Min Nan: {{t+|nan|黃色}}, {{t|nan|黄砂|tr=hong2 sê1}}
Page 50690 troublesome: Found match for regex: *: Min Nan: {{t|nan|lo1so1}}, {{t|nan|lui1-lui1-tui1-tui1}}, {{t|nan|啰嗦|tr=lo1-so1}}
Page 50799 feud: Found match for regex: *: Min Nan: {{t|nan|se3-siu5}}
Page 54507 sashimi: Found match for regex: *: Min Nan: {{t|nan|刺身|tr=chhiah-sin; sa33 si55 mih3}}
Page 64034 shove: Found match for regex: *: Min Nan: {{t|nan|long1}}, {{t|nan|lang1}}, {{t|nan|nng1}}
Page 67068 vulva: Found match for regex: *: Min Nan: {{t|nan|陰門|tr=im-mn̂g}}, {{t|nan|外阴|tr=gue7-im1}}
Page 76097 shirk: Found match for regex: *: Min Nan: {{t|nan|liu1-kiang1}}
Page 104634 halfway: Found match for regex: *: Min Nan: {{t|nan|半路|tr=puann3-loo7}}
Page 106660 thimble: Found match for regex: *: Min Nan: {{t|nan|鍼黹}}, {{t+|nan|針黹|tr=cham-chí, chiam-chí}} {{qualifier|Mainland China}}, {{t|nan|指套|tr=chí-thò}} {{qualifier|Quanzhou and Xiamen}}, {{t|nan|頂針|tr=dêng2 zam1}}, {{t|nan|銅指|tr=tâng-cháiⁿ}}
Page 106811 spacious: Found match for regex: *: Min Nan: {{t|nan|阔|tr=khuah4}}, {{t|nan|khuann3-long1-long1}}
Page 125580 K2: Found match for regex: *: Min Nan: {{t|nan|K2 Hong}}
Page 179602 disadvantageous: Found match for regex: *: Min Nan: {{t|nan|put4-li7}}
Page 335793 Wiktionary:Beer parlour/2007/April: Found match for regex: :::*:Min Nan: (''Amoy'') [[囡仔]] ([[gín-á]]); (''Teochew'') [[孥囝]] ([[nou5gian2]])
Page 1357199 Wiktionary:Beer parlour/2009/May: Found match for regex: :: That's '''1''' then. The [[child]] has 3 levels. Is it really necessary? Can we keep to 2 levels? For example, ** Min Nan: 囡仔 (gín-á), 孥囝 (nou5gian2) (''Teochew'')? [[User:Atitarev|Anatoli]] 22:39, 10 May 2009 (UTC)

Benwing2 (talk) 00:58, 14 March 2024 (UTC)Reply
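The split discussed above (convert "Min Nan" translation lines to Hokkien unless they contain numbered tones, and leave those for manual review) could be sketched roughly as follows. This is only an illustrative Python sketch under stated assumptions, not the script that was run: it treats any ASCII digit on a "Min Nan:" line as a sign of numbered tones, which deliberately over-matches, and the names `needs_manual_review` and `convert_line` are made up for the example.

```python
import re

# A "Min Nan:" translation line containing any ASCII digit is held
# back for manual review (coarse on purpose: better to over-flag).
NUMBERED = re.compile(r"^\*:? *Min Nan:.*[0-9]")

# Translation templates {{t}}, {{t+}}, {{tt}}, {{tt+}} tagged as nan.
NAN_TEMPLATE = re.compile(r"\{\{(tt?\+?)\|nan\|")


def needs_manual_review(line):
    """True if the line looks like it has numbered tones."""
    return NUMBERED.match(line) is not None


def convert_line(line):
    """Relabel a Min Nan translation line as Hokkien (nan-hbl).

    Lines flagged for review, and lines that are not Min Nan
    translation lines at all, are returned unchanged.
    """
    if needs_manual_review(line) or "Min Nan:" not in line:
        return line
    line = line.replace("Min Nan:", "Hokkien:", 1)
    return NAN_TEMPLATE.sub(r"{{\1|nan-hbl|", line)


print(convert_line("*: Min Nan: {{tt|nan|詞典|tr=sû-tián}}"))
print(convert_line("*: Min Nan: {{t+|nan|肉|tr=bah4}}"))
```

The second line is returned untouched because of the digit in bah4.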

Thanks. Theknightwho (talk) 00:59, 14 March 2024 (UTC)Reply

@Theknightwho I am running my script now to change Min Dong -> Eastern Min and Min Bei -> Northern Min and re-sort appropriately (there were no translations involving Min Zhong or Puxian). A couple of questions:

Are you finished fixing up the pages with numbered tones in them that I mentioned above? If so once the script finishes I'll do a run to change Min Nan -> Hokkien in translations along with nan -> nan-hbl, and re-sort.
What about occurrences of Min Dong etc. in {{lb}}, {{tlb}}, {{zh-forms}}, {{q}} (occurring mostly in Synonyms sections), etc.? Do these need to be renamed? On rough count, there are 1,318 occurrences of Min Dong in {{lb}}, 279 in {{q}}, 48 in {{zh-forms}} and 20 in {{tlb}}. Counts for Min Bei are roughly similar, while there are only a few instances of Min Zhong and Puxian (without "Min").

Benwing2 (talk) 02:50, 14 March 2024 (UTC)Reply

@Benwing2 Thanks.

Yes.
Yes. For things like labels etc., "Min Nan" should be changed to "Southern Min".

Theknightwho (talk) 04:12, 14 March 2024 (UTC)Reply

@Theknightwho OK sounds good. #1 is running now. Benwing2 (talk) 04:14, 14 March 2024 (UTC)Reply

@Theknightwho What about things like "[Cc]oastal Min" as occurs in {{zh-forms}} in 唐人 and in {{lb}} in 牛母? (I guess these need manual editing, as it appears Coastal Min can be any of Eastern, Southern or Puxian.) Benwing2 (talk) 04:19, 14 March 2024 (UTC)Reply

See also coastal|_|Min in 儂. Benwing2 (talk) 04:20, 14 March 2024 (UTC)Reply

@Theknightwho: Not sure if this is useful but there are 203 occurrences of from=Min in the Mar 1 dump, which generally occur in {{surname}}:

Page 27803 Cu: Found match for regex: # {{surname|tl|from=Min Nan}} of Hokkien Chinese origin
Page 31307 Lao: Found match for regex: # {{surname|tl|from=Min Nan}} of Chinese origin
Page 54700 Dee: Found match for regex: # {{surname|tl|Chinese Filipino|from=Min Nan}}, most notably borne by:
Page 68861 Kong: Found match for regex: # {{surname|tl|from=Min Nan}} of Chinese origin
Page 71443 Juan: Found match for regex: # {{surname|tl|from=Min Nan}} of Chinese origin
Page 80226 Chan: Found match for regex: # {{surname|tl|from=Min Nan}} (Hokkien) of Chinese origin
Page 80245 Chi: Found match for regex: # {{surname|tl|from=Min Nan}} of Chinese origin
Page 80288 Co: Found match for regex: # {{surname|tl|from=Min Nan}} of Chinese origin
Page 80532 Du: Found match for regex: # {{surname|tl|from=Min Nan}} of Chinese origin, mostly around [[Cebu]]
Page 80539 Dy: Found match for regex: # {{surname|tl|Chinese Filipino|from=Min Nan}}
Page 80915 Go: Found match for regex: # {{surname|tl|from=Min Nan}} of Chinese origin
Page 81022 Haw: Found match for regex: # {{surname|tl|from=Min Nan}} of Chinese origin
Page 81061 Ho: Found match for regex: # {{surname|tl|Filipino-Chinese|from=Min Nan}} of Chinese origin, most notably borne by:
Page 81318 King: Found match for regex: # {{surname|tl|from=Min Nan}}
Page 81334 Ko: Found match for regex: # {{surname|tl|from=Min Nan}} of Chinese origin
Page 81420 Lee: Found match for regex: # {{surname|tl|Chinese Filipino|from=Min Nan}}
Page 81515 Lu: Found match for regex: # {{surname|tl|from=Min Nan}} of Chinese origin
Page 81516 Lua: Found match for regex: # {{surname|tl|from=Min Nan}} of Hokkien Chinese origin
Page 81890 Ng: Found match for regex: # {{surname|tl|{{w|Chinese Filipino}}|from=Min Nan}}
Page 82214 Po: Found match for regex: # {{surname|tl|from=Min Nan}} of Chinese origin
Page 82353 Que: Found match for regex: # {{surname|tl|from=Min Nan}} of Chinese origin
Page 82618 See: Found match for regex: # {{surname|tl|from=Min Nan}} of Chinese origin
Page 82665 Shaw: Found match for regex: # {{surname|tl|from=Min Nan}} of Chinese origin
Page 82674 Sia: Found match for regex: # {{surname|tl|from=Min Nan}} of Chinese origin
Page 82690 Sin: Found match for regex: # {{surname|tl|from=Min Nan}}, most associated with former Archbishop of Manila, {{w|Jaime Sin}}
Page 82735 So: Found match for regex: # {{surname|tl|from=Min Nan}} of Chinese origin, most notably borne by:
Page 82750 Son: Found match for regex: # {{surname|tl|from=Min Nan|Filipino-Chinese}} of Hokkien origin
Page 82890 Sy: Found match for regex: # {{surname|tl|from=Min Nan}} of Chinese origin
Page 82930 Tan: Found match for regex: # {{surname|tl|from=Min Nan}} of Chinese origin
Page 82931 Tang: Found match for regex: # {{surname|en|Chinese|from=Min Nan}}.
Page 82949 Te: Found match for regex: # {{surname|tl|from=Min Nan}} of Chinese origin
Page 82956 Tee: Found match for regex: # {{surname|tl|from=Min Nan}} of Chinese origin
Page 83037 To: Found match for regex: # {{surname|tl|from=Min Nan}} of Chinese origin
Page 83141 Ty: Found match for regex: # {{surname|ceb|from=Min Nan}} of Chinese origin
Page 83141 Ty: Found match for regex: # {{surname|tl|from=Min Nan}} of Chinese origin
Page 83396 Yap: Found match for regex: # {{surname|tl|from=Min Nan}} of Chinese origin
Page 83409 Young: Found match for regex: # {{surname|ceb|from=Min Nan}} of Chinese origin
Page 83409 Young: Found match for regex: # {{surname|tl|from=Min Nan}} of Chinese origin
Page 121853 Tiu: Found match for regex: # {{surname|tl|Filipino-Chinese|from=Min Nan}}, most notably borne by:
Page 196098 Samson: Found match for regex: # {{surname|tl|from=Min Nan}} common on Filipinos of Chinese ancestry
Page 766971 Lew: Found match for regex: # {{surname|tl|from=Min Nan}} of Chinese origin
Page 825754 Anson: Found match for regex: # {{given name|tl|male|from=Min Nan}} common on Filipinos of Chinese ancestry
Page 825754 Anson: Found match for regex: # {{surname|tl|from=Min Nan}} common on Filipinos of Chinese ancestry
Page 1066196 Chu: Found match for regex: # {{surname|tl|from=Min Nan}} of Chinese origin
Page 1178407 Yao: Found match for regex: # {{surname|tl|from=Min Nan}} of Chinese origin
Page 1265062 Lim: Found match for regex: # {{surname|ilo|from=Min Nan}}
Page 1265062 Lim: Found match for regex: # {{surname|tl|from=Min Nan}} of Chinese origin
Page 1265654 Cheng: Found match for regex: # {{surname|tl|from=Min Nan}} of Chinese origin
Page 1265730 Ang: Found match for regex: # {{surname|tl|from=Min Nan}} of Chinese origin
Page 1265732 Ong: Found match for regex: # {{surname|tl|from=Min Nan}} of Chinese origin
Page 1265733 Suan: Found match for regex: # {{surname|tl|from=Min Nan}} of Chinese origin
Page 1265734 Cua: Found match for regex: # {{surname|tl|from=Min Nan}} of Chinese origin
Page 1266900 Pua: Found match for regex: # {{surname|tl|from=Min Nan}} of Chinese origin
Page 1266901 Uy: Found match for regex: # {{surname|tl|from=Min Nan}} of Chinese origin
Page 1266918 Chua: Found match for regex: # {{surname|tl|from=Min Nan}} of Chinese origin
Page 1266924 Khoo: Found match for regex: # {{surname|tl|from=Min Nan}} of Hokkien Chinese origin
Page 1266970 Ching: Found match for regex: # {{surname|tl|from=Min Nan}} of Hokkien Chinese origin or {{surname|tl|from=Cantonese}} of Cantonese Chinese origin, notably borne by:
Page 1277675 Gan: Found match for regex: # {{surname|tl|from=Min Nan}} of Chinese origin
Page 1284142 Koa: Found match for regex: # {{surname|tl|from=Min Nan}} of Chinese origin
Page 1443955 Nga: Found match for regex: # {{surname|en|from=Min Dong}}.
Page 1579807 Kang: Found match for regex: # {{surname|tl|from=Min Nan}} of Chinese origin
Page 2178085 Deang: Found match for regex: # {{surname|tl|from=Min Nan}} of Chinese origin
Page 2625641 Wee: Found match for regex: # {{surname|tl|from=Min Nan}} of Chinese origin
Page 2700666 Tin: Found match for regex: # {{surname|tl|from=Min Nan}} of Chinese origin
Page 2845428 Henson: Found match for regex: # {{surname|tl|from=Min Nan}} common among Filipinos of Chinese ancestry
Page 3305014 Yang: Found match for regex: # {{surname|tl|from=Min Nan}} of Chinese origin
Page 3750292 Lo: Found match for regex: # {{surname|tl|from=Min Nan|Filipino-Chinese}} of Hokkien origin
Page 4170429 Chung: Found match for regex: # {{surname|tl|from=Cantonese}}, or {{surname|tl|from=Min Nan}} (Hokkien) of Chinese origin.
Page 4713793 Coo: Found match for regex: # {{surname|tl|from=Min Nan}} of Hokkien Chinese origin, or {{surname|tl|from=Cantonese}} of Cantonese Chinese origin.
Page 5112069 Sanson: Found match for regex: # {{surname|tl|from=Min Nan}} common on Filipinos of Chinese ancestry
Page 5152613 Kho: Found match for regex: # {{surname|tl|Malaysia, Singapore, Indonesia, Philippines, Thailand, Vietnam-Chinese|from=Min Nan}}, most notably borne by:
Page 5152613 Kho: Found match for regex: # {{surname|tl|Filipino-Chinese|from=Min Nan}}, most notably borne by:
Page 5159150 Kua: Found match for regex: # {{surname|tl|from=Min Nan}} of Chinese origin
Page 5171997 Yee: Found match for regex: # {{surname|ceb|from=Min Nan}} of Chinese origin
Page 5375208 Yu: Found match for regex: # {{surname|ceb|Filipino-Chinese|from=Min Nan}}, the 26th most common in the Philippines
Page 5375208 Yu: Found match for regex: # {{surname|tl|Filipino-Chinese|from=Min Nan}}, the 26th most common in the Philippines
Page 5404772 Ngo: Found match for regex: # {{surname|tl|from=Min Nan}} of Chinese origin
Page 5406204 Chong: Found match for regex: # {{surname|tl|from=Cantonese}} of Cantonese Chinese origin, or {{surname|tl|from=Min Nan}} of Hokkien Chinese origin.
Page 5406528 Tong: Found match for regex: # {{surname|tl|from=Min Nan}} of Chinese origin
Page 5406530 Chiu: Found match for regex: # {{surname|tl|Filipino-Chinese|from=Min Nan}}, most notably borne by:
Page 5410833 Leong: Found match for regex: # {{surname|tl|from=Cantonese}} of Cantonese Chinese origin or {{surname|tl|from=Min Nan}} of Hokkien Chinese origin.
Page 5411779 Pang: Found match for regex: # {{surname|tl|from=Min Nan}} of Chinese origin
Page 5413143 Ison: Found match for regex: # {{surname|tl|from=Min Nan}} common on Filipinos of Chinese ancestry
Page 5415076 Dizon: Found match for regex: # {{surname|pam|from=Min Nan}} of Chinese origin, notably borne by:
Page 5415076 Dizon: Found match for regex: # {{surname|tl|from=Min Nan}} of Chinese origin, notably borne by:
Page 5435565 Yung: Found match for regex: # {{surname|tl|from=Min Nan}} of Chinese origin
Page 5437599 Shao: Found match for regex: # {{surname|tl|from=Min Nan}} of Chinese origin
Page 5437924 Loo: Found match for regex: # {{surname|tl|from=Min Nan}} of Chinese origin
Page 5438022 Sison: Found match for regex: # {{surname|en|from=Min Nan}}.
Page 5438022 Sison: Found match for regex: # {{surname|tl|from=Min Nan}} common on Filipinos of Chinese ancestry, notably borne by:
Page 5438104 Hau: Found match for regex: # {{surname|tl|from=Min Nan}} of Chinese origin
Page 5438288 Tian: Found match for regex: # {{surname|tl|from=Min Nan}} of Chinese origin
Page 5439278 Teng: Found match for regex: # {{surname|tl|Filipino-Chinese|from=Min Nan}} of [[Hokkien]] origin
Page 5442404 Ting: Found match for regex: # {{surname|tl|from=Min Nan}} of Chinese origin
Page 5453194 Tien: Found match for regex: # {{surname|tl|from=Min Nan}} of Chinese origin
Page 5512124 Tuazon: Found match for regex: # {{surname|en|from=Min Nan}}.
Page 5512124 Tuazon: Found match for regex: # {{surname|pam|from=Min Nan}}
Page 5512124 Tuazon: Found match for regex: # {{surname|tl|from=Min Nan}} common on Filipinos of Chinese ancestry
Page 5514761 Goh: Found match for regex: # {{cln|en|surnames from Chinese}} {{surname|en|Chinese|from=Min Nan}}.
Page 5514761 Goh: Found match for regex: # {{surname|tl|from=Min Nan}} of Chinese origin
Page 5538352 Niu: Found match for regex: # {{surname|en|from=Min Nan}}.
Page 5538352 Niu: Found match for regex: # {{surname|tl|from=Min Nan}}
Page 5543775 Quiambao: Found match for regex: # {{surname|en|from=Min Nan}}.
Page 5543775 Quiambao: Found match for regex: # {{surname|tl|from=Min Nan}} of Hokkien origin, most notably borne by:
Page 5558677 Lacson: Found match for regex: # {{surname|en|from=Min Nan}}.
Page 5558677 Lacson: Found match for regex: # {{surname|tl|from=Min Nan}} common on Filipinos of Chinese ancestry, most notably borne by:
Page 5582383 Tecson: Found match for regex: # {{surname|en|from=Min Nan}} ''[[Hokkien]] Chinese'', common among Filipinos of Chinese descent.
Page 5582383 Tecson: Found match for regex: # {{surname|tl|from=Min Nan}} of Hokkien Chinese origin, most notably descendants of ‘Tek Sun’ brothers from Guangzhou (Canton), China
Page 5584134 Layson: Found match for regex: # {{surname|tl|from=Min Nan}} common among Filipinos of Chinese ancestry
Page 5586737 Cinco: Found match for regex: # {{surname|tl|from=Min Nan}} common on Filipinos of Chinese ancestry
Page 5586737 Cinco: Found match for regex: # {{surname|tl|from=Min Nan}} common on Filipinos of Chinese ancestry
Page 5614689 Soon: Found match for regex: # {{surname|tl|from=Min Nan|Filipino-Chinese}} of Hokkien origin
Page 5618852 Singson: Found match for regex: # {{surname|tl|from=Min Nan}} common among Filipinos of Chinese ancestry
Page 5636472 Gozon: Found match for regex: # {{surname|tl|from=Min Nan}} common on Filipinos of Chinese ancestry
Page 5646811 Gotamco: Found match for regex: # {{surname|tl|from=Min Nan}}
Page 5652715 Cayco: Found match for regex: # {{surname|tl|from=Min Nan|Filipino-Chinese}}
Page 5652718 Syson: Found match for regex: # {{surname|tl|Filipino-Chinese|from=Min Nan}}
Page 5652722 Layco: Found match for regex: # {{surname|tl|Tagalog|from=Min Nan}}
Page 5653661 Tengco: Found match for regex: # {{surname|tl|Filipino-Chinese|from=Min Nan}} of Hokkien origin
Page 5655949 Yuzon: Found match for regex: # {{surname|ceb|Filipino-Chinese|from=Min Nan}}
Page 5655949 Yuzon: Found match for regex: # {{surname|tl|Filipino-Chinese|from=Min Nan}}
Page 5656631 Tiongson: Found match for regex: # {{surname|tl|from=Min Nan}} common on Filipinos of Chinese ancestry
Page 5656647 Cojuangco: Found match for regex: # {{surname|tl|Filipino-Chinese|from=Min Nan}}, borne by a known political and business clan in the Philippines
Page 5671242 Jocson: Found match for regex: # {{surname|tl|Filipino-Chinese|from=Min Nan}}
Page 5673469 Tiangco: Found match for regex: # {{surname|tl|Filipino-Chinese|from=Min Nan}}
Page 5674047 Quisumbing: Found match for regex: # {{surname|tl|from=Min Nan}} of Chinese origin
Page 5674054 Lichauco: Found match for regex: # {{surname|tl|Filipino-Chinese|from=Min Nan}}
Page 5676213 Locsin: Found match for regex: # {{surname|tl|from=Min Nan}} of Chinese origin, most notably borne by:
Page 5677430 Quizon: Found match for regex: # {{surname|pam|from=Min Nan}}
Page 5677430 Quizon: Found match for regex: # {{surname|tl|from=Min Nan}} of Chinese origin, most associated with [[w:Dolphy|Dolphy]], which bears the real name of Rudolf Quizon
Page 5677431 Quimpo: Found match for regex: # {{surname|ceb|from=Min Nan}} of Chinese origin
Page 5677431 Quimpo: Found match for regex: # {{surname|tl|from=Min Nan}} of Chinese origin
Page 5678951 Tangco: Found match for regex: # {{surname|tl|from=Min Nan}} or Hokkien origin
Page 5678980 Tiongco: Found match for regex: # {{surname|tl|from=Min Nan}} of Hokkien origin
Page 5678984 Guanzon: Found match for regex: # {{surname|tl|from=Min Nan}} common among Filipinos of Chinese ancestry
Page 5678991 Hizon: Found match for regex: # {{surname|tl|from=Min Nan}} common among Filipinos of Chinese ancestry, most notably descendants of migrants from [[Macau]] to {{w|Parián}}, {{w|Mexico, Pampanga|Mexico}}, {{w|Pampanga}}
Page 5684485 Tiamson: Found match for regex: # {{surname|tl|from=Min Nan}} or Hokkien origin
Page 5686268 Tuason: Found match for regex: # {{surname|tl|from=Min Nan}} of Hokkien Chinese origin, {{alt form|tl|Tuazon|nocap=1}}
Page 5686671 Tio: Found match for regex: # {{surname|tl|from=Min Nan}} of Chinese origin
Page 5687329 Ganzon: Found match for regex: # {{surname|tl|from=Min Nan}} or Hokkien origin
Page 5689830 Pecson: Found match for regex: # {{surname|pam|from=Min Nan}}
Page 5689830 Pecson: Found match for regex: # {{surname|tl|from=Min Nan}} of Hokkien origin
Page 5690622 Siason: Found match for regex: # {{surname|tl|from=Min Nan}} of Hokkien origin
Page 5690623 Tiozon: Found match for regex: # {{surname|tl|from=Min Nan}} of Hokkien origin
Page 5691453 Unson: Found match for regex: # {{surname|tl|from=Min Nan}} of Hokkien origin common among Filipinos of Chinese ancestry
Page 5692143 Cuizon: Found match for regex: # {{surname|tl|from=Min Nan}} of Hokkien origin
Page 5692145 Suico: Found match for regex: # {{surname|tl|from=Min Nan}} of Hokkien origin
Page 5693840 Quimson: Found match for regex: # {{surname|tl|from=Min Nan}} of Hokkien origin
Page 5694341 Tancinco: Found match for regex: # {{surname|tl|from=Min Nan}} of Hokkien origin
Page 5696938 Ongkiko: Found match for regex: # {{surname|tl|from=Min Nan}} of Hokkien origin
Page 5696941 Sioson: Found match for regex: # {{surname|tl|from=Min Nan}} of Hokkien origin
Page 5700562 Bauzon: Found match for regex: # {{surname|tl|from=Min Nan}} of Hokkien origin
Page 5700580 Yatco: Found match for regex: # {{surname|tl|from=Min Nan}} of Hokkien origin
Page 5700589 Gancayco: Found match for regex: # {{surname|tl|from=Min Nan}} of Hokkien origin
Page 5700604 Limjoco: Found match for regex: # {{surname|tl|from=Min Nan}} of Hokkien origin
Page 5700656 Coquia: Found match for regex: # {{surname|tl|from=Min Nan}} of Hokkien origin
Page 5700659 Dijamco: Found match for regex: # {{surname|tl|from=Min Nan}} of Hokkien origin
Page 5700712 Ticzon: Found match for regex: # {{surname|tl|from=Min Nan}} of Hokkien origin
Page 5700939 Cosico: Found match for regex: # {{surname|tl|from=Min Nan}} of Hokkien origin
Page 5701342 Yuvienco: Found match for regex: # {{surname|tl|from=Min Nan}} of Hokkien origin
Page 5701354 Sangco: Found match for regex: # {{surname|tl|from=Min Nan}} of Hokkien origin
Page 5738755 Ayson: Found match for regex: # {{surname|tl|from=Min Nan}} of Hokkien origin
Page 5740882 Songco: Found match for regex: # {{surname|tl|from=Min Nan}}
Page 5764989 Leyson: Found match for regex: # {{surname|tl|from=Min Nan}} common among Filipinos of Chinese ancestry
Page 5769732 Kiamzon: Found match for regex: # {{surname|tl|from=Min Nan}}
Page 5769773 Sayson: Found match for regex: # {{surname|tl|from=Min Nan}} common among Filipinos of Chinese ancestry
Page 5773490 Sanciangko: Found match for regex: # {{surname|ceb|Filipino-Chinese|from=Min Nan}}
Page 5773649 Guico: Found match for regex: # {{surname|tl|from=Min Nan}}
Page 5773673 Tanchoco: Found match for regex: # {{surname|tl|from=Min Nan}}
Page 5773685 Siongco: Found match for regex: # {{surname|tl|from=Min Nan}}
Page 5788737 Tayson: Found match for regex: # {{surname|tl|from=Min Nan}}
Page 5788738 Limcaoco: Found match for regex: # {{surname|tl|from=Min Nan}}
Page 5885208 Joson: Found match for regex: # {{surname|tl|from=Min Nan}}
Page 5889986 Tanseco: Found match for regex: # {{surname|tl|from=Min Nan}}
Page 5906982 Siao: Found match for regex: # {{surname|ceb|from=Min Nan}} of Chinese origin
Page 5906982 Siao: Found match for regex: # {{surname|tl|from=Min Nan}} of Chinese origin
Page 5983082 Yongco: Found match for regex: # {{surname|ceb|from=Min Nan}} of Chinese origin
Page 5983082 Yongco: Found match for regex: # {{surname|tl|from=Min Nan}} of Chinese origin
Page 6060762 Pacquiao: Found match for regex: # {{surname|ceb|from=Min Nan|xlit=Pacquiao}}
Page 6601914 Caw: Found match for regex: # {{surname|tl|from=Min Nan}} of Chinese origin
Page 6601919 Pueson: Found match for regex: # {{surname|tl|from=Min Nan}} common on Filipinos of Chinese ancestry
Page 6601923 Causon: Found match for regex: # {{surname|tl|from=Min Nan}} common with Filipinos with Chinese ancestry
Page 6601938 Quitson: Found match for regex: # {{surname|tl|from=Min Nan}} common on Filipinos of Chinese ancestry
Page 6601988 Auyong: Found match for regex: # {{surname|tl|from=Min Nan}} of Chinese origin
Page 6601989 Awyoung: Found match for regex: # {{surname|tl|from=Min Nan}} of Chinese origin
Page 6603830 Syaw: Found match for regex: # {{surname|tl|from=Min Nan}} of Chinese origin
Page 6603831 Shau: Found match for regex: # {{surname|tl|from=Min Nan}} of Chinese origin
Page 6603884 Hwan: Found match for regex: # {{surname|tl|from=Min Nan}} of Chinese origin
Page 6603960 Liong: Found match for regex: # {{surname|tl|from=Cantonese}} of Cantonese Chinese origin, or {{surname|tl|from=Min Nan}} of Hokkien Chinese origin.
Page 6603976 Mapua: Found match for regex: # {{surname|tl|from=Min Nan}} of Hokkien Chinese origin, notably borne by:
Page 6638858 Banzon: Found match for regex: # {{surname|tl|from=Min Nan}} of Hokkien Chinese origin
Page 7439359 Teh: Found match for regex: # {{surname|en|from=Min Nan}}.
Page 7782052 Sitchon: Found match for regex: # {{surname|tl|from=Min Nan}} common on Filipinos of Chinese ancestry
Page 7782063 Itchon: Found match for regex: # {{surname|tl|from=Min Nan}} common on Filipinos of Chinese ancestry
Page 7849686 Tiong: Found match for regex: # {{surname|en|from=Min Dong}}.
Page 7849688 Diong: Found match for regex: # {{surname|en|from=Min Dong}}.
Page 7924413 Ngeh: Found match for regex: # {{surname|en|from=Min Dong}}.
Page 8003694 Canoy: Found match for regex: # {{surname|tl|from=Min Nan}} common among Filipinos of Chinese ancestry
Page 8060607 Gueco: Found match for regex: # {{surname|pam|from=Min Nan 慧哥}}
Page 8343774 Siocson: Found match for regex: # {{surname|tl|from=Min Nan}} of Hokkien origin
Page 8343781 Bengzon: Found match for regex: # {{surname|tl|from=Min Nan}} of Hokkien origin
Page 9058034 Quiason: Found match for regex: # {{surname|tl|from=Min Nan}} common on Filipinos of Chinese ancestry
Page 9058035 Quiazon: Found match for regex: # {{surname|tl|from=Min Nan}} common on Filipinos of Chinese ancestry

As can be seen, these are almost all Min Nan, almost all Tagalog and some of them explicitly say "of Hokkien origin". Are these all Hokkien? If so I'll change them accordingly. Benwing2 (talk) 04:29, 14 March 2024 (UTC)Reply

@Benwing2 Thanks for this re the surnames. The whole "of X origin" thing is totally superfluous imo, so should be deleted. If it explicitly says Hokkien somewhere then change it to that; it might also be possible to infer it from the etymology section, too. Any remaining ones should be left to manual review. Theknightwho (talk) 04:33, 14 March 2024 (UTC)Reply

@Theknightwho All right, I'll do this. BTW some of them are already fixed; I randomly picked Siocson and User:Mlgc1998 fixed it 3 days ago. Benwing2 (talk) 04:36, 14 March 2024 (UTC)Reply

@Benwing2 It's probably fine to keep Coastal Min in {{zh-forms}}. We should probably have proper categories set up for it, which categories like Category:Southern Min Chinese would be part of.

There's a whole issue with labels in Chinese entries causing a ton of duplication between the label categories and the lemma categories, but we've not come up with a satisfactory solution to it yet. Theknightwho (talk) 04:30, 14 March 2024 (UTC)Reply

@Theknightwho Yeah, IMO things like Category:Hokkien Chinese should go away in favor of Category:Hokkien lemmas now that we have the latter. {{lb}} could be made to generate the latter category in place of the former but it doesn't seem like such a good idea as it wouldn't categorize correctly into the other categories. Benwing2 (talk) 04:34, 14 March 2024 (UTC)Reply

Also IMO all label categories that refer to specific lects should have corresponding lang codes, either full or etym-only, and the etym-only categories should probably be added by the Pronunciation section instead of by {{lb}}. Note also that User:-sche proposed a while ago renaming "etym-only language" to something else, which IMO is a good idea; they have gone far beyond being used only for etymologies. Benwing2 (talk) 04:39, 14 March 2024 (UTC)Reply

Yeah, agreed. It's probably worth starting a thread on the BP about renaming etym-only languages, as the current name is really misleading. Theknightwho (talk) 04:50, 14 March 2024 (UTC)Reply

Done. BTW it looks like "Min Nan" was already removed from all Tagalog etc. surnames; the only remaining instances of "from=Min" occurred in a few English surnames of Min Dong origin. I cleaned them up and removed the text "of Chinese origin" etc. following various {{surname}} invocations. The script to implement #2 above (correct "Min Dong", "Min Bei" etc. in labels/qualifiers/etc.) is running. Benwing2 (talk) 07:05, 14 March 2024 (UTC)Reply

Task #2 is close to done; going to sleep now. There are still 6,406 occurrences of "Min Nan" in qualifiers, which my script didn't touch. The occurrences can be found here: User:Benwing2/qualifier-min-nan-1 and User:Benwing2/qualifier-min-nan-2 (split over two files because otherwise the files supposedly exceed the 2MB size; in fact the total file size is 1.2MB but there's that stupid doubler effect). Some of the qualifiers occur in Reference sections but the vast majority seem to occur in Synonyms and Antonyms sections. I am guessing again that the majority are Hokkien but I'm not sure, and generally the transliterations aren't attached. Here we might have to fall back on looking up the terms in question to see which lects they are listed as occurring in (which should be bottable, if you provide appropriate instructions). Benwing2 (talk) 08:31, 14 March 2024 (UTC)Reply

@Theknightwho Let me know if you need help with any other renaming tasks that can be done or sped up by bot. I notice you're going through and renaming instances of "Min *" in comments, {{rfp}} params and other random places but there may be too many to do by hand. There were 17,750 pages satisfying the regex (Min Bei|Min Dong|Min Zhong|Puxian|Min Nan) as of the Mar 1 dump, and 12,222 remaining when I re-downloaded the same pages last night before running task #2. Task #2 changed 6,245 pages, meaning there might be on the order of 6,000 pages left, although I can check for sure by re-downloading the same pages. As I mentioned above, most of the occurrences are probably Min Nan occurring in qualifiers because my script didn't change them. Benwing2 (talk) 22:51, 14 March 2024 (UTC)Reply

@Benwing2 Thanks. Yeah, I was just going through and renaming the various "Min Bei" and "Min Dong" labels, but noticed that "Min Nan" is used on thousands of pages. It's annoying, as it's the one where "Hokkien" is sometimes a more appropriate label. That being said, it's not wrong to put "Southern Min", so it would probably be helpful to change those automatically. Theknightwho (talk) 23:04, 14 March 2024 (UTC)Reply

@Theknightwho See my comment above from last night. It's probably possible to figure out how to change Min Nan automatically to the right label by looking up the page in question to see what lects are listed on the page. If you want me to work on that I can although I'd need some instructions as to what lects to look out for. Benwing2 (talk) 23:10, 14 March 2024 (UTC)Reply

@Benwing2 Yes please - @Justinrleung might be able to give better pointers than me. Theknightwho (talk) 23:12, 14 March 2024 (UTC)Reply

@Theknightwho OK, I re-downloaded the relevant pages. There are 7,396 pages remaining satisfying the regex (Min Bei|Min Dong|Min Zhong|Puxian|Min Nan). Of these, 7,128 mention Min Nan; 39 mention Min Bei; 59 mention Min Dong; 22 mention Min Zhong; and 211 mention Puxian but only 15 of those mention Puxian using the regex Puxian($|.$|[^ ]| [^M]), which excludes "Puxian Min". There are 8,195 total lines mentioning Min Nan (since some pages mention Min Nan more than once). Of these lines, 6,761 contain a qualifier and 6,593 specifically satisfy the regex {q.*{zh-l, i.e. a qualifier followed by a Chinese-style link. Of the 1,605 lines not satisfying {q.*{zh-l, 45 match {q.*{l (a qualifier with a generic link); 111 contain {{thcwd}} or a variant ({{thcwda}}, {{thcwdq}}), almost all preceded by a Min Nan qualifier; 227 contain Min Nan inside of {{zh-forms}}; 21 contain Min Nan inside of {{zh-see}}; 105 contain Min Nan inside of {{zh-der}}, {{col3}} or a variant; and 24 contain an occurrence of {{desc}}. Excluding all of these leaves 1,063 occurrences over 412 pages, of which 260 are outside of mainspace. So I think it should be possible to create a script to handle the {q.*{zh-l occurrences, and handle the remainder type-by-type in a semi-manual fashion. Benwing2 (talk) 00:02, 15 March 2024 (UTC)Reply
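As a rough illustration of the line-by-line bucketing described in the comment above, here is a minimal Python sketch. The regexes are the ones quoted in the comment (with the braces escaped); the `classify` function, the bucket names and the sample lines in the usage note are hypothetical and are not taken from the actual script.

```python
import re

# Regexes quoted in the comment above; braces are escaped to be safe.
BARE_PUXIAN = re.compile(r"Puxian($|.$|[^ ]| [^M])")  # skips "Puxian Min"
QUALIFIER_ZH_L = re.compile(r"\{q.*\{zh-l")           # qualifier, then {{zh-l}}
QUALIFIER_L = re.compile(r"\{q.*\{l")                 # qualifier, then a generic link

# Template names mentioned in the comment ("thcwd" also covers thcwda/thcwdq).
TEMPLATES = ("thcwd", "zh-forms", "zh-see", "zh-der", "col3", "desc")


def classify(line):
    """Return a hypothetical bucket name for one line of wikitext."""
    if "Min Nan" not in line:
        return "no Min Nan"
    if QUALIFIER_ZH_L.search(line):
        return "qualifier + zh-l"
    if QUALIFIER_L.search(line):
        return "qualifier + generic link"
    for name in TEMPLATES:
        if "{{" + name in line:
            return name
    return "other"
```

For example, `classify("* {{q|Min Nan}} {{zh-l|天邊海角}}")` lands in the `"qualifier + zh-l"` bucket, and `BARE_PUXIAN.search("Puxian Min")` finds nothing while `BARE_PUXIAN.search("Puxian dialect")` does.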

@Benwing2 Sounds like a good plan. Thanks for doing this. Theknightwho (talk) 00:03, 15 March 2024 (UTC)Reply

@Theknightwho FYI I also did a download run of those same pages checking for those now containing "Southern Min". There are 5,119 lines over 4,377 pages mentioning Southern Min, mostly in labels (as expected) but occasionally in other places that could stand to be reviewed. Benwing2 (talk) 00:05, 15 March 2024 (UTC)Reply

@Theknightwho OK. Can you help me sketch out a general idea of what the qualifiers should be transformed into? For example, I randomly picked page 4445 天涯海角, which contains a synonym 天邊海角 labeled "Min Nan". This latter page has a label Hokkien and it also has {{zh-pron|mn=ml,jj,tw:thiⁿ-piⁿ-hái-kak|cat=cy}}. According to the documentation of {{zh-pron}}, mn means Hokkien and the codes inside mean ml="Mainland China (Xiamen, Quanzhou, Zhangzhou)", jj="Jinjiang", tw="mainstream Taiwan", for which a pronunciation is given. How much info do we want in the qualifiers? Is just "Hokkien" enough in this situation? In general, what lects should be specified in the qualifiers? Maybe just Hokkien, Teochew, Leizhou? Possibly also Quanzhou and/or Zhangzhou dialect if pronun is given for these dialects? This is where I need a bit of guidance from someone like you who knows the languages in question. Benwing2 (talk) 00:24, 15 March 2024 (UTC)Reply
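To make the {{zh-pron}} example above concrete, here is a minimal Python sketch that pulls the dialect codes and the romanization out of the mn= parameter. It assumes the simple shape shown above (one `codes:pronunciation` value, no nested templates); `parse_mn` is a hypothetical helper, and the code descriptions are the ones quoted from the template documentation in the comment.

```python
# Dialect codes as described in the comment above.
MN_CODES = {
    "ml": "Mainland China (Xiamen, Quanzhou, Zhangzhou)",
    "jj": "Jinjiang",
    "tw": "mainstream Taiwan",
}


def parse_mn(template):
    """Return (codes, pronunciation) from the mn= param, or None if absent.

    Naive split on "|": this would break on nested templates or links.
    """
    inner = template.strip()[2:-2]          # drop the outer {{ and }}
    for part in inner.split("|")[1:]:       # skip the template name
        key, sep, value = part.partition("=")
        if sep and key.strip() == "mn":
            codes, _, pron = value.partition(":")
            return [c.strip() for c in codes.split(",")], pron
    return None
```

Calling `parse_mn("{{zh-pron|mn=ml,jj,tw:thiⁿ-piⁿ-hái-kak|cat=cy}}")` gives the three codes `ml`, `jj`, `tw` plus the romanization, which is enough to tell that the target page has a Hokkien pronunciation.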

@Benwing2 I'd wait for Justin to comment, as I think you're really overestimating my knowledge. I've got a very broad understanding of what needs to be done, but my understanding of Module:nan-pron is relatively low, so I won't be much help in interpreting the input. Theknightwho (talk) 00:30, 15 March 2024 (UTC)Reply

@Theknightwho OK. I had assumed you know the languages because you seem able to correctly split the lects; maybe you're just a fast learner ;) ... Benwing2 (talk) 00:40, 15 March 2024 (UTC)Reply

@Benwing2: I think for qualifiers of synonyms, etc., it can just be

"Hokkien" when there's only a Hokkien pronunciation, "Teochew" when there's only a Teochew pronunciation, etc., and we don't need to worry about the finer distinctions, which we will get with {{lb}} at the entry. If it's more than one Southern Min variety, we could either use the Southern Min label or list all the relevant Southern Min languages; I don't have a strong feeling about either way. — justin(r)leung _{{ (t...) | c=› }} 01:38, 15 March 2024 (UTC)Reply

@Justinrleung All right. What is the complete list of Southern Min varieties? Benwing2 (talk) 01:39, 15 March 2024 (UTC)Reply

The currently supported varieties in {{zh-pron}} are Hokkien, Teochew and Leizhou Min. Other than these, there's Hainanese as well as other varieties that haven't been dealt with (WT:RFM#Additional Southern Min languages). — justin(r)leung _{{ (t...) | c=› }} 01:46, 15 March 2024 (UTC)Reply
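The rule suggested above for synonym/antonym qualifiers could be sketched roughly as follows in Python. This is only an illustration under stated assumptions: `qualifier_for` and its `list_all` flag are hypothetical, the input is taken to be the set of Southern Min varieties that have a pronunciation on the target page, and joining multiple names with "|" (as arguments to {{q}}) is one possible choice, not a settled one.

```python
# Varieties named above as currently supported in {{zh-pron}}.
SUPPORTED = ["Hokkien", "Teochew", "Leizhou"]


def qualifier_for(lects, list_all=True):
    """Pick qualifier text from the varieties found on the target page.

    One variety: use its name. Several: either list them all
    (list_all=True) or fall back to the generic "Southern Min".
    Returns None when no supported variety is present.
    """
    present = [lect for lect in SUPPORTED if lect in lects]
    if not present:
        return None
    if len(present) == 1 or list_all:
        return "|".join(present)
    return "Southern Min"
```

With `list_all=True`, a page with both Hokkien and Teochew pronunciations yields `Hokkien|Teochew`; with `list_all=False` it yields `Southern Min`.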

@Justinrleung, Theknightwho: I finished the script to convert Min Nan and Southern Min in qualifiers in Synonym/Antonym sections (and the like; whenever followed by a {{zh-l}} link). Out of 6,283 pages where it tried to do something, it was able to process 5,938, which is a pretty good record (94.5%). The breakdown of lects generated is as follows:

5418 Hokkien
 485 Hokkien|Teochew
  16 Hokkien|Teochew|Leizhou
  10 Hokkien|Leizhou
   9 Teochew

The script issued 663 warnings. They are here: User:Benwing2/min-nan-qualifier-conversion-warnings. One of you two might want to go through them. Note that 268 "may be ignorable" (meaning that the script was able to continue on and ultimately do something, despite the warning). Of the remaining 395, 276 are due to the link referring to a nonexistent page; you'd need domain knowledge to know which lect(s) are appropriate. This leaves 119, of which 50 are "Couldn't parse" errors (the line wasn't formatted in a standard fashion); 35 are "Couldn't find 'Min Nan' or 'Southern Min' qualifier" errors (the qualifier template says something like {{q|literary or Min Nan, Hakka}} or {{q|Cantonese, Min Nan}} rather than just "Min Nan"); 22 are "Saw multiple Etymology/Pronunciation sections" (in such a case, the code tries hard to figure out the correct lects, including using the gloss in the {{zh-l}} link and making sure there is more than one Etymology/Pronunciation section that refers to Min Nan and that the two sections have different lects in them); 5 are "Can't find Chinese section"; and 7 are some random misc stuff. I am going to run the script in save mode either tonight or tomorrow. Benwing2 (talk) 07:52, 15 March 2024 (UTC)Reply

@Theknightwho This is running; maybe 1 to 1.5 hours and it will finish. Benwing2 (talk) 20:43, 15 March 2024 (UTC)Reply

Cool - thanks. Theknightwho (talk) 20:47, 15 March 2024 (UTC)Reply

BTW can {{zh-l}} be replaced by {{l|zh}}? I'm not sure any more what the Chinese-specific behavior in {{zh-l}} is. Maybe it's just automatic handling of traditional vs. simplified forms? Benwing2 (talk) 20:47, 15 March 2024 (UTC)Reply

@Theknightwho Also maybe we can have the lect be specified using a lang code prefix instead of having it a separate qualifier. Benwing2 (talk) 20:48, 15 March 2024 (UTC)Reply

@Benwing2 On that point, would it be possible to do a similar analysis for all uses of the nan code used in the Thesaurus namespace? There are 483 uses at the moment, but conversion is slow as it requires a bunch of manual analysis. Some of them also have "Min Nan" in qualifiers, which will need revising as well. Theknightwho (talk) 20:52, 15 March 2024 (UTC)Reply

@Theknightwho OK, I'll take a look. Benwing2 (talk) 20:54, 15 March 2024 (UTC)Reply

@Theknightwho @Justinrleung For this purpose I think we (a) need to add the missing etym-only codes for Min Nan lects, and (b) we should include the specific lect and not just "Hokkien" in the lang prefix or qualifier. For example, I took a look at Thesaurus:打耳光 meaning "to slap someone in the face"; there are three synonyms labeled nan as well as two more explicitly labeled Zhangzhou Hokkien and Tainan Hokkien respectively. Of the three labeled nan, one is a red link, one is labeled Xiamen Hokkien and one is labeled "Quanzhou, Zhangzhou and Taiwanese Hokkien". Labeling the latter two just "Hokkien" would seem incomplete. Benwing2 (talk) 21:09, 15 March 2024 (UTC)Reply

@Benwing2 The principle I've followed so far has been to use the most specific label which adequately covers everything at the target, where that's possible. So anything that's labelled (e.g.) "Xiamen Hokkien" would get the langcode nan-xmn, but something labelled "Quanzhou, Zhangzhou and Taiwanese Hokkien" would just get nan-hbl. I agree with Justin that the labels for links aren't as important as those on the entries themselves, so incompleteness isn't the end of the world. When multiple lects are mentioned (e.g. Hokkien and Teochew), I've ditched the langcode altogether and put (e.g.) "Southern Min" as a qualifier. Theknightwho (talk) 21:12, 15 March 2024 (UTC)Reply

Also, as an aside, we don't currently have an etym-only langcode for Taiwanese Hokkien, because it's not a well-defined lect in the way varieties like Xiamen, Zhangzhou and Quanzhou are; all three are spoken on Taiwan, but (for historical reasons) the Hokkien-speaking communities on Taiwan have undergone a lot more influence from Japanese and English than their equivalents on the mainland, so it makes sense to use that label sometimes. In those cases, just labelling them "Hokkien" isn't really a problem if it's just in the thesaurus entry. Theknightwho (talk) 21:20, 15 March 2024 (UTC)Reply

@Theknightwho All right, let me look at a few more examples. While we're at it, what do you think of replacing the etym-only codes for the Hokkien varieties with ones conforming to the principles I laid out in WT:RFM? Since these codes are newly added I suspect they're barely used. This would mean nan-jnj -> nan-jin (Jinjiang Hokkien), nan-qzh -> nan-qua (Quanzhou Hokkien), nan-xmn -> nan-xia (Xiamen Hokkien), nan-zzh -> nan-zha (Zhangzhou Hokkien), nan-plp -> nan-qua-PH (probably) or nan-PH (possibly) or nan-phi (perhaps) (Philippine Hokkien). Benwing2 (talk) 21:44, 15 March 2024 (UTC)Reply

@Benwing2 I don't mind too much. I have a small preference for doing it syllabically rather than by the first letters of the name, but I don't mind if you want to use a standardised format for them.

There are sometimes instances where we won't be able to follow it, though (e.g. Category:South Dravidian I languages and Category:South Dravidian II languages, where I opted for dra-sdo and dra-sdt, respectively). Theknightwho (talk) 21:48, 15 March 2024 (UTC)Reply

@Theknightwho Yes, understood. BTW I wouldn't have an issue with something more syllabic than using the first three letters, it's just that it's not so easy to guess automatically what the right set of letters to use is in that case. (Actually the principle you followed for South Dravidian I/II *is* consistent with the principles I laid out, which call for using the initials of the lect when using the first three letters isn't practical.) Benwing2 (talk) 21:53, 15 March 2024 (UTC)Reply

@Theknightwho I changed the language codes. I used nan-hbl-PH for Philippine Hokkien. I think we can go ahead and use nan-hbl-TW for Taiwanese Hokkien, and create subvariety codes for the specific dialects that are derived respectively from Xiamen, Zhangzhou and Quanzhou (e.g. nan-xia-TW etc.). I also modified Module:columns so that it can take a comma-separated list of prefixed lang codes, e.g. nan-hbl,hak:[[毋]][[知]] and handle them appropriately (i.e. using the first one to create the term link but displaying all of them as qualifiers). I'm going to work on fixing up the Thesaurus entries now. Benwing2 (talk) 23:29, 16 March 2024 (UTC)Reply
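
For readers following along, the comma-separated prefix handling described above can be sketched roughly as follows. This is a Python model of the described behavior, not the Lua code in Module:columns; the function name `split_lang_prefix`, the returned keys and the prefix pattern are all assumptions made for illustration.

```python
import re

# Assumed shape of a prefix: one or more comma-separated lang codes, a colon, then the term.
# (Hypothetical pattern; the real module may accept a different set of characters.)
PREFIX_RE = re.compile(
    r"^([A-Za-z][A-Za-z0-9-]*(?:,[A-Za-z][A-Za-z0-9-]*)*):(.+)$", re.DOTALL
)

def split_lang_prefix(item, default_lang):
    """Split 'code1,code2:term' into the linking code, the displayed codes and the term."""
    m = PREFIX_RE.match(item)
    if not m:
        # No prefix: link using the list's own language and show no lect qualifiers.
        return {"link_lang": default_lang, "qualifier_langs": [], "term": item}
    codes = m.group(1).split(",")
    return {
        "link_lang": codes[0],     # the first code is used to create the term link
        "qualifier_langs": codes,  # all codes are displayed as qualifiers
        "term": m.group(2),
    }
```

Under these assumptions, `split_lang_prefix("nan-hbl,hak:[[毋]][[知]]", "zh")` links the term as nan-hbl and lists both nan-hbl and hak as qualifiers, while an unprefixed item such as `想不到` falls back to the list language.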

@Benwing2: I think in most cases, specific dialects of Taiwanese Hokkien should not be tied back to the source varieties of Quanzhou and Zhangzhou (and maybe Xiamen, which is itself generally thought of as a Quanzhou-Zhangzhou mixed variety). These kinds of labels are generally not helpful lexicographically; they are only well-defined phonologically and have small bearing on vocabulary, where much more convergence has occurred in Taiwan due to dialect levelling. The locales in Taiwan (e.g., Lukang, Yilan, etc.) for subdialects of Taiwanese that are less mixed may be more helpful in cases where we want to highlight them. — justin(r)leung _{{ (t...) | c=› }} 02:55, 17 March 2024 (UTC)Reply

@Justinrleung OK, this is fine and it jibes well with the nan-hbl-TW label. I was just responding to User:Theknightwho's assertion that Taiwanese Hokkien isn't a well-defined lect. Benwing2 (talk) 03:21, 17 March 2024 (UTC)Reply

@Theknightwho Code is written to process Thesaurus entries and convert nan as appropriate. I will finish the analysis tomorrow and run it. Benwing2 (talk) 09:32, 17 March 2024 (UTC)Reply

@Theknightwho I expanded the script I wrote so it also attempts to convert lects mentioned in <qq:...> qualifiers into lect code prefixes. (This is the origin of that "part 1" section in WT:RFM.) These should not change the qualifier output much (possibly in some cases rearranging the order, that's it) but will help with transliteration and such. Some stats on what I have so far:

I ran it on the 2,013 pages in CAT:Chinese thesaurus entries. It would change 620 pages.
It issues 328 warnings. Of these:
1. 255 of these are due to unrecognized lects in qualifiers. All of these are already discussed in the "part 1" WT:RFM section.
2. Of the remaining 73, 40 are due to looking up a page tagged as nan: and finding it doesn't exist.
3. Of the remaining 33, 14 are "informational" warnings that can be ignored.
4. Of the remaining 19, 15 are due to finding multiple etymologies with different sets of Southern Min varieties in the different etymologies.

Benwing2 (talk) 05:04, 18 March 2024 (UTC)Reply

@Theknightwho Scratch the above stats. My script needs some changes to not overgenerate in the presence of multiple definitions (it already handles multiple etymology/pronunciation sections but needs to be extended for multiple definitions, because sometimes specific labels apply only to specific definitions). Benwing2 (talk) 05:24, 18 March 2024 (UTC)Reply

OK, I rewrote the script to take into account the presence of multiple definitions and try to use the glosses present in Thesaurus pages to whittle down the set of possible definitions to use. The first pass doing that increased warnings from 328 to 1,344 (!) and reduced the number of pages changed from 620 to 490, but I think I can do a whole lot better than that. Stay tuned. Benwing2 (talk) 07:07, 18 March 2024 (UTC)Reply

@Theknightwho With some changes I brought the warnings down to 498 and increased the pages changed to 624. I just ran the script. There are now only 52 pages remaining in the Thesaurus namespace with nan links. The warnings generated are here: User:Benwing2/zh-thesaurus-conversion-warnings, minus the warnings about "Saw unhandled lect qualifier", which aren't very important. (For reference, the first four such warnings are as follows:

Page 4 Thesaurus:一會: WARNING: Saw unhandled lect qualifier Anxi Hokkien (term [[一孔久]]): <qq:Anxi Hokkien>
Page 46 Thesaurus:中飽: WARNING: Saw unhandled lect qualifier Taiwan (term [[歪哥]]): <qq:Taiwan>
Page 56 Thesaurus:亂說: WARNING: Saw unhandled lect qualifier Internet slang (term [[口胡]]): <qq:neologism, Internet slang>
Page 61 Thesaurus:互聯網: WARNING: Saw unhandled lect qualifier Mainland China (term [[網絡]]): <qq:Mainland China, Hong Kong, Macau>

) Of the 245 warnings in that file (covering 144 pages), only 67 of them actually concern being unable to convert the nan code (or occasionally the Min Nan qualifier) to something more specific. I'd focus on those. A couple of such warnings are given here for reference:

Page 26 Thesaurus:不料: WARNING: Unable to convert 'nan' to correct lang code (reason: Found synonym/antonym [[無疑悟]] (template {{col3|zh|不料|孰料|詎料|豈料|yue:點知|想不到|不意<qq:formal>|不虞<qq:literary>|nan:無疑|怎知|哪知|未料|怎料|nan:無疑悟|沒想到|不謂<qq:literary>|不圖<qq:literary>}}, glossed as 'unexpectedly') but page doesn't exist)
Page 53 Thesaurus:乞討: WARNING: Unable to convert 'nan' to correct lang code (reason: Saw multiple definitions with different Southern Min types for synonym/antonym [[分]] (template {{col3|zh|乞討|討乞|行乞|zhx-tai:乞米|nan:討食<t:to beg for food>|nan:分}}, glossed as 'to beg (ask for food or money as charity)'): defn '# to [[divide]]; to [[separate]]' has Hokkien,Teochew while defn '# {{lb|zh|Hakka|Teochew|Hainanese}} to [[give]]' has Hokkien,Teochew,Hainanese; skipping)

Benwing2 (talk) 04:36, 19 March 2024 (UTC)Reply

Generally, {{zh-l}} should be replaced (especially if it's giving a Hokkien pronunciation), but that's probably something to do en masse at another time, as there are tens of thousands of uses so we'll probably want to hash out a proper conversion method. Theknightwho (talk) 20:55, 15 March 2024 (UTC)Reply

@Theknightwho Yes, agreed; just something to keep in mind. Benwing2 (talk) 20:56, 15 March 2024 (UTC)Reply

Module:columns and Module:sa-verb, Module:sa-verb/data

There are 3 Sanskrit entries in CAT:E because of an error in {{sa-conj}}, and I checked the entire transclusion list for ततान: your edit to Module:columns is the only recent change to executable code for anything in the list. Indeed, there are comments in Module:sa-verb saying that code was copied from Module:columns and would need to be updated if that were changed. Chuck Entz (talk) 00:19, 17 March 2024 (UTC)Reply

@Chuck Entz Thank you, I'll fix. I looked for modules using Module:columns but I forgot about the display_from entry point. Benwing2 (talk) 00:21, 17 March 2024 (UTC)Reply

@Chuck Entz I don't think my change to Module:columns has anything to do with this error. User:Exarchus is actively working on Module:sa-verb/data and made the last change only an hour ago. User:Exarchus, can you take a look at these errors? They are due to a buggy Lua pattern. Benwing2 (talk) 00:35, 17 March 2024 (UTC)Reply

I somehow read the dates wrong on those edits; I could have sworn they were from the same date as the ones to Module:sa-verb. You're no doubt right. Sorry! Chuck Entz (talk) 00:47, 17 March 2024 (UTC)Reply

Replacement of quotation templates

Hi, I'd appreciate it if you could do the following bot replacements:

{{RQ:Fuller Bertram Cope}} → {{RQ:H. B. Fuller Bertram Cope}}
{{RQ:Fuller On the Stairs}} → {{RQ:H. B. Fuller On the Stairs}}
{{RQ:Livy Holland Romane Historie}} → {{RQ:Livy Holland Romane Historie|year=1659}} (the template has been updated to add the 1st edition, which is now the default).

Thank you. — Sgconlaw (talk) 16:00, 27 March 2024 (UTC)Reply

@Sgconlaw Done. Benwing2 (talk) 02:08, 28 March 2024 (UTC)Reply

Thanks! — Sgconlaw (talk) 04:18, 28 March 2024 (UTC)Reply

{{quote-song}}

Subsequent to our discussion at Wiktionary:Grease_pit/2024/February regarding {{quote-song}}, would you mind making the appropriate edits to the module? RcAlex36 (talk) 07:05, 28 March 2024 (UTC)Reply

Wu information origin

The recent update to Module:labels/data/lang/zh is very much appreciated. However, a lot of the information included seems poorly researched, with a lot of unnecessary or false information seemingly lifted straight from some enwiki entries, which may be problematic. What sources did you consult? If you need pointers regarding reading/an explanation of the zh primary sources feel free to let me know. Thanks — nd381 (talk) 12:31, 29 March 2024 (UTC)Reply

@ND381 Apologies, I was using a combination of Chinese and English Wikipedia entries and Glottolog. I had assumed they were reliable since they generally agree with each other. I can't read primary sources in Chinese, though, except using Google Translate. Let me know what specifically seems wrong and feel free to correct and/or delete stuff. Benwing2 (talk) 18:46, 29 March 2024 (UTC)Reply

I should add, I generally only created a label when there is a page in the Chinese or English Wikipedia (and hence a Wikidata item) for the particular lect, except in some cases of higher-level groupings. Benwing2 (talk) 19:23, 29 March 2024 (UTC)Reply

Glottolog's family tree is known to have some mistakes and Wikipedia is, well Wikipedia. A few notes from a cursory look at Glottolog's family tree:

It uses Li Rong (1987)'s classifications, which are famously unreliable not just in Wu but nation-wide
It makes some pretty unorthodox naming choices and forgot to put Jinhua under Wuzhou
Its Northern (Taihu) Wu is a pretty big mess and although I agree with some of their choices it is important to note that not everything there is accepted
- In particular, "Northern Zhejiang" (which I've seen more as "Southern N Wu", tautological as it may be) is one that is highly likely to be a valid branch, which contrasts with Glottolog's "Northwestern", "Su-Hu-Jia", and "Tiaoxi" branches, themselves also forming a Northern N Wu branch
- The "Northwestern Wu" branch is completely disproven as Piling has Southern Mandarinic (ie. Huai) influence whereas Hangzhounese has Northern Mandarinic influence

I (and wpi for Yue) have notified/fixed a lot of the mistakes already present, however, please consult us next time before making large scale, academically controversial changes to Chinese templates. Do you have a Discord or other means of instant messaging? I can send some English-language sources about Wu diachronics and classification so that you can make a more informed decision next time. Thanks — nd381 (talk) 00:17, 30 March 2024 (UTC)Reply

@ND381 I am not on Discord currently; feel free to post the sources here. I did notice that Jinhua was not under Wuzhou in Glottolog, and in general I followed the names used in Wikipedia (English and/or Chinese) except for the reclassifying of Wuzhou Wu as Jinqu Wu, which seems not well-accepted. In terms of intermediate branches, if there is controversy about them, one fairly easy way to handle that is to flatten the trees, so that e.g. the Tiaoxi etc. branches go away. Overall what I have been trying to do for all primary branches is fill out the main missing labels, esp. those corresponding to labels already present in various entries, although I did add more labels for Wu than other branches. Before my changes, things seemed to be in a pretty haphazard state. The idea is that from labels we can create categories and then add the more important ones as etymology-only languages. Keep in mind that in general, information in a place like Module:labels/data/lang/zh can easily be changed as it's in a single location and not propagated across several entries; but indeed I will try to consult you guys in the future. Note also that because of these label changes there are now some uncreated categories in Special:WantedCategories, such as Category:Jinhua Wu, Category:Chuqu Wu, Category:Quzhou Wu and Category:Shaoxing Wu (and possibly more; the data in Special:WantedCategories is from a couple of days ago). I will be creating these categories using {{auto cat|dialect=1}} if that's OK with you; again, the information here is easily fixable if needed. Benwing2 (talk) 00:36, 30 March 2024 (UTC)Reply

Please also note that I just made a change to the code that handles the language variety categories using {{auto cat|dialect=1}} so that the Wikipedia articles are automatically pulled out of the labels in places like Module:labels/data/lang/zh if not explicitly specified. See Category:Jiaoliao Mandarin for an example of where this does its thing. In general I am trying to consolidate the information on lects in fewer places, as it's currently scattered in at least five locations (labels data modules; language data modules such as Module:etymology languages/data; "dialect data" modules used for {{alt}}; category pages that use {{auto cat|dialect=1}}; dialect synonyms data modules such as Module:dialect_synonyms/zh). Benwing2 (talk) 00:47, 30 March 2024 (UTC)Reply

I think what I'm also going to do is add a parent field to the label data so that the tree of lects can be indicated properly; this info is already specifiable in Module:etymology languages/data and {{auto cat|dialect=1}}. Benwing2 (talk) 01:02, 30 March 2024 (UTC)Reply

I see, thank you. I am not against the creation of the category pages.

What we have in the labels page for the most part works now, though I would like to make the following changes:

Lishui and Pucheng (Fujian) Wu are to be separated; the original Fujian ∈ Lishui notation was only done because of the lack of Chuzhou.
"Northern Zhejiang Wu" and "Northwestern Wu" are to be removed as very few sources even mention let alone include them
Jinhuanese is to be included as a Jinqu variety
Taizhouic is to be renamed to Taizhou; "Taizhounese" in itself doesn't really exist as the urban centre of Taizhou is home to several varieties. There is nothing that is conflated with Taizhou in reality other than maybe Tàizhounese (Mandarin > Huai) in Jiangsu
Not implemented yet, but if necessary, Urban Shanghainese should be a subcategory of Shanghainese, which should in turn be a subcategory of Sujiahu

If I have any additional thoughts later I can inform you or edit the labels page when the time comes.

My sources document is here [6] (a lot of the books are pirated) and is maintained and handled by many other users, including several here on Wiktionary. The Wu section is Section 1.6. Thanks — nd381 (talk) 03:12, 30 March 2024 (UTC)Reply

@ND381 Wow that is a lot of information in that link! Thank you. I will make the suggested changes. A couple of questions, though:

If we rename "Taizhouic Wu" -> "Taizhou Wu", what should happen to the current "Taizhou Wu"? I chose "Taizhouic" as the name because both the Taizhou Wu and Taizhou dialect articles exist, by analogy with the name "Beijingic Mandarin" (which is found in Glottolog), which corresponds to the primary Mandarin branch Beijing Mandarin (division of Mandarin), as opposed to "Beijing Mandarin", which corresponds to the dialect of Beijing itself, i.e. Beijing dialect. Should we use something like "Urban Taizhou Wu" corresponding to Wikipedia's Taizhou dialect?
"Jinqu" seems to be the more recent name of Wuzhou Wu. Should we get rid of Wuzhou Wu in favor of Jinqu Wu?

Benwing2 (talk) 03:25, 30 March 2024 (UTC)Reply

Also do you have knowledge of Northern Min? According to Chinese Wikipedia, there are two primary branches called Dongxi (東溪) and Xixi (西溪) (although Glottolog does not have them, but groups all Northern Min varieties other than Shaojiang, which seems to not be Northern Min at all, under "Northwestern Min Bei"). If they are real, I am thinking maybe it would be better to call them something like "East Northern Min" and "West Northern Min". Does this make sense? I have similar notes in Module:labels/data/lang/zh about Eastern Min, where the primary branches Funing and Houguan should maybe instead be called North Eastern Min and South Eastern Min. Benwing2 (talk) 03:43, 30 March 2024 (UTC)Reply

Regarding this, I do not study Min. I do not know. Ask one of the Minguists here instead. I would direct you to one but I'm not sure which of the people I know are here and which are on Discord, sorry — nd381 (talk) 09:01, 30 March 2024 (UTC)Reply

1. Delete it. I saw your comment regarding "Beijingic Mandarin" already. The main problem here is that "Taizhounese" doesn't actually refer to something that "corresponds to the dialect of [urban] Taizhou itself", as that would usually be called Linhainese (臨海話), Huangyanese (黃巖話), Jiaojiangese (椒江話), etc., cf. Wugniu. What I have as "Urban Taizhounese" in the dump™ is just a helpful label for searching and is not meant to be used as an authoritative source in classification. I would recommend removing (Urban) Taizhou Wu and renaming Taizhouic Wu to just Taizhou Wu. If an "urban Taizhou" label is desired, use Jiaojiang.

2. It's not so much that it's more recent as that the revised edition of Li's atlas (Li 2012; the one in the dump™ called 中國語言地圖集), which is still filled with blatantly false information, uses it. I personally use a Wuzhou-Chuzhou-Xinqu (Xin referring to Shangrao) split but you can really do whatever you want or leave it be, since classifying these lects is still pretty contentious

3. Change "Fujian Wu" to Pucheng Wu. No conflict with Pucheng Min since Wu is specified; lects in Ningde and the Jinxiang isolate are Auish (ie. Wenzhou-related) and Northern respectively and would lead to more ambiguity.

Thanks — nd381 (talk) 08:59, 30 March 2024 (UTC)Reply

@ND381 Hi. You caught me right as I'm going to bed but please take a look at the current state of the module. I already renamed Taizhouic to Taizhou and Taizhou to Urban Taizhou; I'll delete the latter. I removed Northern Zhejiang Wu and Northwestern Wu but left Wuzhou as-is, and added a Pucheng Wu node, removing Fujian Wu. BTW I am now going through and adding textual descriptions and parent label properties, which will be useful in centralizing the information currently found in the individual category pages; but none of this is in the production module yet, just on my own machine. Also see the Grease Pit post I made about centralizing/consolidating lect info. Benwing2 (talk) 09:06, 30 March 2024 (UTC)Reply

Thank you. 谷拿脫 — nd381 (talk) 09:35, 30 March 2024 (UTC)Reply

@ND381 I have added support for including descriptions and parent labels in the label data in Module:labels/data/lang/zh. I have converted the Lua comments to parent labels in most cases and added descriptions (using the |region= or sometimes |def= fields) for Mandarin and Northern Wu lects and some others. I am working on Southern Wu now but I may need a bit of help. In particular, I added labels for all the subgroups called 小片 (xiǎopiàn) in Chinese, which we define as "cluster" (see the box near the bottom of the w:zh:吴语 page for a diagram of all these clusters), but increasingly I think they shouldn't be defined. Enwiki generally doesn't include such intermediate divisions in its descriptions of individual dialects, and most of these "clusters" are red links, redirects or stubs in zhwiki. Tentatively I'm thinking of keeping the ones for Northern Wu (Piling, Tiaoxi, Sujiahu, etc.) and probably the ones for Chuqu Wu (Longqu, Chuzhou, Shangshan), and discarding the remainder. Thoughts?

Also, on a related subject, why is it that there is such extraordinary diversity in the Wu lects (esp. the southern ones) in such a small area, when Mandarin lects seem to vary only a little over vast areas? Is the terrain in Zhejiang such that movement is very difficult? Or was there some sort of recent calamity in Northern China that caused migration all over the place (and resulting dialect mixing)? Benwing2 (talk) 06:06, 1 April 2024 (UTC)Reply

mountain = dividing + no wide-scale areal effects (Mandarinic is not a phylogenetic group)

as for xiaopian, honestly if you want you don't need to add any since they're very contentious — nd381 (talk) 10:51, 1 April 2024 (UTC)Reply

@ND381 OK thanks. I have left the xiaopians that I mentioned above (for Northern Wu and Chuqu Wu) and removed the others. I finished adding parents and region descriptions for Southern and Northern Wu lects to Module:labels/data/lang/zh, added all Wu categories that had at least one entry in them and fixed the existing Wu categories to read just {{auto cat|dialect=1}} (instead of having additional parameters to specify the parent, region, etc.), so that the parent and region description get picked up from the label data. There shouldn't be anything very controversial that I added; the descriptions are mostly just listing the area(s) where each lect is spoken per English and Chinese Wikipedia, although in some cases (e.g. Fuyang Wu, Hangzhounese, Jinxiang Wu, Old Guangde Wu, Sujiahu Wu, Urban Shanghainese Wu, Baizhang Wu, Changbei Wu, Jujiang Wu, Old Xuanzhou Wu, Pucheng Ou Wu, Pucheng Wu, and in general all the primary branches) I added text under the |addl= field describing the notable characteristics of the lect.

It occurs to me we will eventually probably need to split Wu into different languages, at least on the primary branch lines (Northern Wu, plus some number of Southern Wu branches); but I think we probably should wait to tackle that until we finish the Southern Min and Yue splits. Benwing2 (talk) 05:13, 2 April 2024 (UTC)Reply

Gender-neutral adjectives in Module:es-headword

I noticed you added the option gneut for gender-neutral nouns in Spanish. Could you add the same option for adjectives?

For example, the headword-line for the adjective latine currently displays as "m or f", which is wrong, it should look the same as the headword-line for the noun. 26agcp (talk) 19:20, 30 March 2024 (UTC)Reply

@26agcp I added this. You can use |gneut=1 on an adjective to indicate that it's gender-neutral, which I have done for latine. I'm not sure whether this will work correctly on adjectives not ending in -e, such as latinx or latin@ (if it's possible to use these as adjectives). If these are adjectives and |gneut=1 doesn't work right, let me know and I'll fix it. Benwing2 (talk) 06:15, 1 April 2024 (UTC)Reply

Category:Chinese terms written in foreign scripts

Hi, I noticed that you've added functionality in Module:zh-pron to automatically add pages that do not contain any Chinese characters to the category. However this has caused the category to be flooded with POJ entries (which are romanisation entries and therefore shouldn't be there) and Zhuyin entries (which are not "foreign"). Can you see how this can be fixed? Or perhaps revert the changes for the time being. Much thanks.

PS: The POJ entries are there because of Module:zh-see which tries to call Module:zh-pron with |only_cat=. It's a total mess there which I don't want to talk about.

– wpi (talk) 13:56, 2 April 2024 (UTC)Reply

@Wpi Thanks for letting me know. The Zhuyin entries should be fixable by changing the regex to exclude Zhuyin/Bopomofo characters. If the POJ entries are only there because they are calling Module:zh-pron with a specific flag, I can check for that flag. Let me see what I can do. Benwing2 (talk) 19:41, 2 April 2024 (UTC)Reply

OK, this should be fixed. Benwing2 (talk) 20:07, 2 April 2024 (UTC)Reply
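
The two checks discussed in this thread (excluding Zhuyin/Bopomofo characters and skipping calls that pass the |only_cat= flag) could look roughly like the sketch below. It is a Python model only; the actual change to Module:zh-pron is not shown in this discussion, the function name is hypothetical, and the Han ranges listed are deliberately incomplete (they cover the main CJK Unified Ideographs block and Extensions A and B only).

```python
import re

# Han characters: CJK Extension A, the main CJK Unified Ideographs block, and Extension B.
# This list is an assumption for illustration and omits later extensions.
HAN_RE = re.compile(r"[\u3400-\u4DBF\u4E00-\u9FFF\U00020000-\U0002A6DF]")
# Bopomofo (U+3100-U+312F) and Bopomofo Extended (U+31A0-U+31BF).
BOPOMOFO_RE = re.compile(r"[\u3100-\u312F\u31A0-\u31BF]")

def is_foreign_script_title(title, only_cat=False):
    """Return True if the page should go into the 'foreign scripts' category."""
    if only_cat:
        # Calls made with the only_cat flag (as from romanisation entries) are skipped.
        return False
    has_han = HAN_RE.search(title) is not None
    has_bopomofo = BOPOMOFO_RE.search(title) is not None
    # Only titles with neither Han nor Bopomofo characters count as foreign-script.
    return not has_han and not has_bopomofo
```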

Replacement of quotation template

Hi, when you are free could you please do the following bot runs?

{{RQ:Dryden Iliad}} → {{RQ:Homer Dryden Iliad}}
{{RQ:Mandela Long Walk to Freedom}} → {{RQ:Mandela Long Walk to Freedom|year=2010}} (the quotation template has been updated to add the 1st edition (1994), so current uses need to be updated)
{{RQ:Pope Iliad}} → {{RQ:Homer Pope Iliad}}
{{RQ:Selver RUR}} → {{RQ:Capek Selver RUR}}

Thank you. — Sgconlaw (talk) 22:52, 2 April 2024 (UTC)Reply

@Sgconlaw Done. Benwing2 (talk) 04:13, 5 April 2024 (UTC)Reply

Thanks! — Sgconlaw (talk) 04:36, 5 April 2024 (UTC)Reply

Time-outs from change to Module:headword

Hi - I think your latest change to Module:headword is causing time-outs at some Written Oirat entries, like ᠨᡇᡇ᠍ᠷ. Theknightwho (talk) 03:38, 5 April 2024 (UTC)Reply

@Theknightwho Yup, I just added a check to make sure this doesn't happen. I don't quite know why it's happening, something weird about the script being returned, but I limit the iterations to 10 now no matter what. Benwing2 (talk) 03:41, 5 April 2024 (UTC)Reply

By the way

It seems that one thing that really slows things down is declaring functions inside other functions. Sometimes it's unavoidable, but there are plenty of instances where it's straightforward to move them out of the parent function; sometimes with extra parameters, if they needed to access any upvalues. This happens a lot with anonymous functions declared inside gsub, but Module:languages is currently a big offender, since all the methods get redeclared every time a language object is requested. Theknightwho (talk) 11:52, 5 April 2024 (UTC)Reply

@Theknightwho Hmmm, interesting. Do you know if it's related to the size of the function (in which case we could move the contents of the larger functions in Module:languages outside of the object) or just the presence of the function? Benwing2 (talk) 18:29, 5 April 2024 (UTC)Reply

@Benwing2 There seems to be an inherent cost for each closure, just like objects. The literal length of the function in bytes is a (very small) factor, but since it's only parsed once it makes no difference whether it's inside another function or not. Theknightwho (talk) 18:35, 5 April 2024 (UTC)Reply

This is basically the memory-speed trade-off with Lua. Local objects/functions are cleared very quickly by the garbage collector (especially anonymous ones), but you need to spend extra time generating each one, and often it's just not worth it. Theknightwho (talk) 18:39, 5 April 2024 (UTC)Reply

@Theknightwho Got it, in which case the only way to speed up Module:languages is to redo it without the use of an object, which would entail (AFAIK) a huge amount of rewriting of code that uses it. Possibly there is an in-between way, e.g. create an object-less version of Module:languages and then create an object that wraps it, and rewrite only the core modules (Module:links, Module:headword, Module:translations, ...?) to use the object-less version. But this is still a fair amount of work. Benwing2 (talk) 19:26, 5 April 2024 (UTC)Reply

@Benwing2 I don’t think it’s as bad as that - it should be possible to move the function declarations out of the language-generating function, since they’re inherited via metamethods anyway. If I remember correctly, the reason for the current set-up is because I wanted to make it possible to grab language objects that use require instead of mw.loadData in contexts where speed is more important than memory-use (since mw.loadData adds a lot of overhead to data access times). I think the only module which uses that option is Module:family tree, since everything’s done in a single invocation via {{auto cat}}. Theknightwho (talk) 15:12, 6 April 2024 (UTC)Reply
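
As a rough illustration of the point about redeclared methods, here is a Python analogue (not the Lua in Module:languages; all names are hypothetical). The first constructor builds fresh function objects for every language object, while the second defines the methods once and lets every instance share them, with the instance passed in as the extra parameter. Python classes stand in here for the metatable-based inheritance mentioned above, and relative timings in Scribunto may well differ.

```python
def make_lang_redeclared(code, name):
    """Every call creates two brand-new closures that capture `code` and `name`."""
    def get_code():
        return code

    def get_name():
        return name

    return {"get_code": get_code, "get_name": get_name}


class Lang:
    """Methods are defined once on the class and shared by all instances."""
    __slots__ = ("_code", "_name")

    def __init__(self, code, name):
        self._code = code
        self._name = name

    # What used to be captured upvalues is now reached through the `self` parameter.
    def get_code(self):
        return self._code

    def get_name(self):
        return self._name
```

Both versions answer the same questions about a language, but in the second one requesting a new object allocates only the data, not a new set of functions.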

@Benwing2 I just implemented this, and this change alone sped things up by about 5% on very large pages. useRequire is now specified using a key and a dedicated lang:loadData(modname) method, but in all honesty it might not be necessary anymore: it was only ever implemented because mw.loadData makes memory usage worse with {{auto cat}}, because everything's contained within one invoke, and some proto-language pages were pushing the old 50MB limit due to the descendants trees. Theknightwho (talk) 23:16, 21 April 2024 (UTC)Reply

@Theknightwho Hmm, are you saying the useRequire functionality might not be needed because it's only used in Module:family tree, and the 50MB limit no longer applies? If so, it might be reasonable to consider removing it at some point, but I would say leave it for the moment because it might be needed elsewhere. I notice for example that some pages that use {{auto cat|dialect=1}} are hitting 65MB or so of memory, and maybe could benefit from this. I think the high memory usage is because of the implementation that searches through all labels to find those that categorize into a particular category, since I've been consolidating the lect info into the labels modules instead of having them scattered in the {{auto cat}} calls themselves and duplicated in several other places. I added memoization of the calls to dialect_handler (which ends up being called multiple times due to the way that the poscatboiler code retrieves information on all parent categories in order to determine the breadcrumbs and the parents' parents etc.), which reduced the memory a bit and sped things up a lot. Maybe further memoization of the fetched labels data, and/or using mw.loadData, would reduce memory usage significantly (the labels data itself is already loaded using mw.loadData but it is then converted into containing structures by Module:labels/utilities, which maybe could be cached since it's all happening in a single {{auto cat}} invocation). Benwing2 (talk) 23:31, 21 April 2024 (UTC)Reply
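
The memoization idea mentioned above can be modeled in a few lines. This is a generic Python sketch, not the real dialect_handler or poscatboiler code: `find_labels_for_category` is a hypothetical stand-in for an expensive lookup that scans all labels, and `stats` exists only to make the number of real lookups visible.

```python
# Counts how often the expensive lookup actually runs (illustration only).
stats = {"calls": 0}

def find_labels_for_category(category):
    """Hypothetical expensive lookup; imagine it scans every label in the data module."""
    stats["calls"] += 1
    return {"category": category, "labels": [category.lower()]}

# Results keyed by category name, kept for the lifetime of one invocation.
_cache = {}

def memoized_find_labels(category):
    """Return the cached result if this category was already looked up."""
    if category not in _cache:
        _cache[category] = find_labels_for_category(category)
    return _cache[category]
```

Because breadcrumb and parent-category resolution can ask about the same category repeatedly, each repeated request after the first is served from `_cache` without rerunning the scan.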

@Theknightwho BTW thanks for all the profiling work you're doing. This sort of work isn't really my strong point and something I don't really enjoy doing that much, so I am glad you are putting the time into doing it as it's quite necessary. Benwing2 (talk) 23:33, 21 April 2024 (UTC)Reply

pcall and accessing nonexistent pages

I think I've worked out the reason why pcall(require, ...) is so slow when used with nonexistent modules: it's because nothing gets cached in package.loaded, so every time the module's requested it's forced to run the full loader, whereas retrieved modules simply use the cache on subsequent accesses. We could get around the issue by adding false to package.loaded after the first failure, which should speed things up. After doing various profiling tests, I'm pretty sure the issue isn't down to pcall itself. Theknightwho (talk) 06:23, 8 April 2024 (UTC)Reply

@Theknightwho Interesting. This does make sense, and I wondered why I was seeing such slow pcalls (loading nonexistent modules) when you reported no issues with them. Benwing2 (talk) 06:36, 8 April 2024 (UTC)Reply

@Benwing2 I've added safe_require to Module:utilities, which (1) checks if there's a cached value for the module in package.loaded and returns it if so, (2) runs pcall(require, modname), and (3) if the module doesn't exist, caches it as false. Two things of interest:

It's still about twice as fast as require even when handling already-cached modules, since it doesn't bother with all the assert safety checks and so on (close to 1 million iterations per second).
Nonexistent modules still don't work with require even after they've been cached, since require checks if p then instead of if p ~= nil then. I'll put in a Phabricator ticket about that, but they'll probably ignore it.

Theknightwho (talk) 07:43, 8 April 2024 (UTC)Reply
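The three steps above can be sketched in Python as an analogy. This is not the Lua in Module:utilities: `_cache` stands in for package.loaded, and the `loader` parameter is an assumption added so that the behaviour can be observed. The membership test plays the role of `if p ~= nil then`; a plain truthiness test (`if p then`) would miss a cached False and rerun the loader every time.

```python
import importlib

_cache = {}  # stands in for Lua's package.loaded


def safe_require(modname, loader=importlib.import_module):
    """Return the loaded module, or False if it does not exist."""
    # (1) Return the cached value if there is one, including a cached False.
    if modname in _cache:
        return _cache[modname]
    # (2) Protected load, analogous to pcall(require, modname).
    try:
        module = loader(modname)
    except ImportError:
        # (3) Cache the failure so the full loader is not run again.
        module = False
    _cache[modname] = module
    return module
```

With this shape, a second request for a nonexistent module costs one dictionary lookup instead of another full loader run.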

Related to this, I've discovered that if you require a module with a return value of false, you get true haha. require seems to use true as a placeholder so that modules with no return value get cached, but the falsy existence check causes this bug. Theknightwho (talk) 08:58, 8 April 2024 (UTC)Reply

Consolidating into Module:string utilities

Hiya - I've done a total rewrite of most of Module:string utilities, which I'll be introducing over the next few days (so that I don't run into issues changing everything at once). I've decided to reverse course on splitting out functions into their own modules, as I'm not convinced that it's actually very helpful, and it makes organising everything much more confusing.

At the same time, we've got a bunch of duplicate functions floating around (I think there are 4 versions of pattern_escape), so it makes sense to consolidate everything. To that end, I think it makes sense to merge in most of the single-function modules, as well as some of the smaller satellite modules which are integral to string manipulation, like Module:pattern utilities, because so many of them are dependent on each other anyway. Theknightwho (talk) 18:22, 8 April 2024 (UTC)Reply

@Theknightwho OK, makes sense! Benwing2 (talk) 18:25, 8 April 2024 (UTC)Reply

@Benwing2 I've rolled out pretty much everything new - there are still a few single-function modules I want to merge in, but at least the new code seems to be holding up well. By the way - I've renamed capturing_split to split, since it's faster than mw.text.split for everything except the default charset (since that's the only time mw.text.split uses the standard string library), so it makes sense to use with and without capturing groups. Theknightwho (talk) 21:04, 8 April 2024 (UTC)Reply

@Theknightwho Interesting ... I wrote capturing_split() long ago with no particular intention of making it fast; the capturing functionality was just needed by Module:ru-common. Benwing2 (talk) 21:10, 8 April 2024 (UTC)Reply

@Benwing2 I've reworked it pretty heavily, but it essentially still works in the same way. The big thing is finding fast ways to detect whether you can use the string library, since anything involving magic characters in the ustring functions is completely hopeless. Theknightwho (talk) 21:17, 8 April 2024 (UTC)Reply

@Theknightwho Cool, thanks for all this work. I do think after you finish this you should revisit the pattern change in Module:headword made by User:Erutuon that seems to have slowed down the average time of big pages; maybe there's a way to preserve the functionality while avoiding the double Kleene star operators. Benwing2 (talk) 21:21, 8 April 2024 (UTC)Reply

@Benwing2 Yeah, it should be possible to do it in multiple stages. Theknightwho (talk) 21:24, 8 April 2024 (UTC)Reply

Also, just to illustrate the point about speed: the revised split function is over 10 times faster than mw.text.split with the input split("abc", ""), and the gap increases as the string gets longer. Theknightwho (talk) 23:12, 8 April 2024 (UTC)Reply
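The "detect whether the cheap path is usable" idea can be sketched in Python as an analogy. It uses Python regular expressions rather than Lua patterns, and the `_MAGIC` character set and dispatch order are assumptions for illustration, not the logic in Module:string utilities. Separators with no special characters go to the plain string method, and only genuine patterns pay for the pattern engine.

```python
import re

# Characters that are special in Python regular expressions.
_MAGIC = re.compile(r"[.^$*+?{}\[\]\\|()]")


def split(text, sep):
    """Split text on sep, taking a cheap path when sep is a plain string."""
    if sep == "":
        # An empty separator splits into individual characters.
        return list(text)
    if not _MAGIC.search(sep):
        # No magic characters: the plain string method is enough.
        return text.split(sep)
    # Real pattern: captured groups are included in the result.
    return re.split(sep, text)
```

For example, `split("a1b22c", r"(\d+)")` keeps the captured separators in the output, which is the capturing behaviour mentioned above.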

I'll say again that changing and even removing that pattern altogether and previewing a page didn't seem to significantly up Lua execution, but I gave up pretty quickly so maybe it's worth further testing. — Eru·tuon 00:20, 12 April 2024 (UTC)Reply

@Erutuon Yeah there seems to be a whole lot of variability in times, but something definitely caused an average-time slowdown, just don't know what. Benwing2 (talk) 00:22, 12 April 2024 (UTC)Reply

Module:grc:Dialects

CAT:E is being swamped with Greek pages complaining that this module doesn't exist. I think it may have been deleted prematurely... Ioaxxere (talk) 01:41, 10 April 2024 (UTC)Reply

@Ioaxxere Ahh, fuck me. Thanks to whoever undeleted it. Benwing2 (talk) 01:43, 10 April 2024 (UTC)Reply

Nonfunctional newversion in `{{quote-journal}}`

See zacusi. Neither |journal2= nor |work2= is accepted. ―⁠Biolongvistul (talk) 13:42, 11 April 2024 (UTC)Reply

Template:tracking/defdate/hyphen

{{defdate}} still seems to be using this, and is now displaying a redlink to it; see e.g. sirrah or bḥ. (Searching mainspace for "template:tracking" finds about 600 instances of this, but AFAICT no instances of any other template besides defdate doing anything like this, so it seems to be an issue with only this one template, not a more widespread issue.) - -sche (discuss) 18:20, 11 April 2024 (UTC)Reply

@-sche Should be fixed. Benwing2 (talk) 22:23, 11 April 2024 (UTC)Reply

Bot-addition of templates

You've recently told Theknightwho multiple times to not introduce changes to core modules without discussing it with the community beforehand. Then why are you doing the exact same thing with templates? I have not heard anything about this, it's from before my time, I am now finding it all over the place added by a bot, and I have many objections! You can't just go off an old discussion and start mass-adding templates with a bot without making sure the current editors are still fine with it, especially when the discussion that this template was based on had just five users comment on, and, I repeat, is eight years old. Thadh (talk) 21:42, 11 April 2024 (UTC)Reply

@Thadh I did not do that. User:-sche posted in the Beer Parlour about this template, see WT:Beer parlour/2024/April#T:antsense, to finally clarify T:sense on antonyms, and I posted and said I would do a bot run to introduce this if no one objected. I waited about a week; next time I'll wait a month if that would help. What are your specific objections? This can be undone if necessary but I want to make sure your objections can't be met in some other way. Benwing2 (talk) 21:46, 11 April 2024 (UTC)Reply

Yes, waiting a bit longer would be a good idea for next time. I'll post my objections in the BP thread, but thanks for pointing me towards it. Thadh (talk) 21:48, 11 April 2024 (UTC)Reply

Why has WingerBot been made to "canonicalize Sicilian phonemic pronun"?

Can I ask why every Sicilian pronunciation I encounter is being wrongly changed into phonological expressions? Hyblaeorum (talk) 09:59, 19 April 2024 (UTC)Reply

@Hyblaeorum What is wrong? User:Nicodene asked me to convert Sicilian pronunciations into their phonemic form, that's all. Benwing2 (talk) 19:24, 19 April 2024 (UTC)Reply

Transcriptions using // are phonemic, and Sicilian only has five vowel phonemes: /i ɛ a ɔ u/. Nicodene (talk) 19:47, 19 April 2024 (UTC)Reply

Actually, the Sicilian language has 5 stressed vowels and 3 unstressed ones. English has 28.

So if a language like Sicilian has a given set of unstressed vowels in its system are they going to be put out of the slashes?

Just to be clear:

u lupu is not pronounced /uˈlu.pu/;

it's unavoidably /ʊˈlu.pʊ/

I would like to allow people to know how to speak my language, not to spread misinformation about it. Hyblaeorum (talk) 08:42, 20 April 2024 (UTC)Reply

[ʊ] is an unstressed allophone of the phoneme /u/ in Sicilian. Unless you can show an example where [u] versus [ʊ] can distinguish word A versus word B (a minimal pair) I don’t see any basis for giving a phoneme */ʊ/. It is simply a misuse of basic linguistic notation. Phonetic and phonemic transcription are not the same thing.

Speaking of misinformation, according to The Oxford Guide to the Romance Languages (pages 250–1) Sicilian /i/, /u/ in word-final position are phonetically [i], [u] and not [ɪ], [ʊ]. I’m not sure why you keep doing that. Nicodene (talk) 09:38, 20 April 2024 (UTC)Reply

"Cannot handle template `{{synonym of}}`."

I thought I would bring this up here rather than throwing red meat to the wolves at a certain other discussion. If we're going to be having Module:transclude pulling from a wide variety of entries, we need to make it robust enough so it doesn't get the vapors at the first sight of a template someone didn't think to program into it. I'm really surprised I haven't seen this error before. Chuck Entz (talk) 02:22, 21 April 2024 (UTC)Reply

@Chuck Entz Yeah this is why I generally only use it for toponyms. Handling things like {{synonym of}}, {{alternative form of}}, etc. is tricky because when you switch to another language, the form-of template no longer becomes valid. In this case, admiral was changed to say it's a synonym of flagship, but naturally that relationship doesn't apply in Middle Polish or any other language. I think in some cases like this one, this can be fixed by just listing the other term without any form-of qualifiers (hence "synonym of flagship; a ship of the line [etc.]" becomes just "flagship; a ship of the line [etc.]"), but that may not work in all cases. The only other alternative I can think of is to just ignore the form-of template in the transclusion. Thoughts? Benwing2 (talk) 02:44, 21 April 2024 (UTC)Reply

My first thought was to ignore, but flag. That way someone could follow up to look for things that could be fixed or that would need to be addressed. Chuck Entz (talk) 02:55, 21 April 2024 (UTC)Reply

@Chuck Entz OK, I will implement something like this: (a) handle all the form-of templates I can think of in some sensible way, (b) handle unrecognized templates by ignoring them but issuing a warning during Preview, and also add template tracking and/or a tracking category, and also maybe logging using mw.log(). We could also insert some text into the output itself saying essentially "implement handling for this template"; I don't know if you think this is a good idea. Benwing2 (talk) 03:27, 21 April 2024 (UTC)Reply
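The "ignore, but flag" plan in (a) and (b) can be sketched roughly as follows. This is a Python illustration only: `HANDLERS`, `render_template`, the argument layout and the tracking-key format are all hypothetical and are not taken from Module:transclude.

```python
import logging

log = logging.getLogger("transclude_sketch")


def _bare_term(args):
    """Drop the form-of wording and keep only the linked term.

    Assumes args look like ["en", "flagship"] (language code, then term).
    """
    return args[1] if len(args) > 1 else ""


# (a) Form-of templates handled in some sensible way.
HANDLERS = {
    "synonym of": _bare_term,
    "alternative form of": _bare_term,
}


def render_template(name, args, tracking):
    """Render a template for transclusion; flag anything unrecognized."""
    handler = HANDLERS.get(name)
    if handler is None:
        # (b) Ignore the template, but warn and record it for follow-up.
        log.warning("transclude: no handler for {{%s}}", name)
        tracking.add("transclude/unhandled/" + name)
        return ""
    return handler(args)
```

Under these assumptions, "synonym of flagship; a ship of the line" would come out as just "flagship; a ship of the line", and an unknown template would vanish from the output while leaving a tracking entry behind.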

transliteration of Greek to Latin characters

Hi Benwing2, I am usually in el-wikt and only occasionally here. We are looking for a tool in el-wikt to transliterate names in greek characters to latin characters according to en:w:ISO 843. I see that here Template:t does something similar (we would only have to change the table of equivalent characters), but my knowledge is not enough to locate and copy the relevant part (I am looking at Module:translations, but as I told you, I cannot see which module is invoked). Are you the right person to ask for help? If not, who could possibly give us a hand? FocalPoint (talk) 16:45, 22 April 2024 (UTC)Reply

Forget it, we found it ! No need to invest time. Have a nice day. FocalPoint (talk) 05:32, 23 April 2024 (UTC)Reply

@FocalPoint Glad you found it, and sorry for the delay in responding. Benwing2 (talk) 05:50, 23 April 2024 (UTC)Reply

Duplicate categories

Hi, I'm from ckbwiktionary. While mass importing subcategories of Category:Languages by country on ckbwiktionary, I noticed that Category:Languages of Republic of the Congo and Category:Languages of the Republic of the Congo are the same category. Which one should stay? Thanks! Aram (talk) 21:56, 26 April 2024 (UTC)Reply

@Aram Thanks for pointing this out. The latter category should stay; this one is consistent with our naming policies (which include the word "the" when appropriate), and contains all but one of the languages. Benwing2 (talk) 22:08, 26 April 2024 (UTC)Reply

toxic hellstew

The last CFI-meeting quotation I found was from 2015 (this 2019 Wired article is quoting the 2015 blog). I'm not sure whether non-CFI-meeting quotations should have any bearing on a label like ephemeral, assuming we did start using it. Ioaxxere (talk) 06:32, 28 April 2024 (UTC)Reply

@Ioaxxere The larger point is that I don't think you can determine "ephemerality" until well after the fact; 2015 is not far enough in the past. Even if a well-known blog like medium.com doesn't count for CFI, the fact that it is still in use by them (and not in any way quoting the 2015 blog) means it's likely not "ephemeral". Benwing2 (talk) 07:00, 28 April 2024 (UTC)Reply

Yes, I guess we'll have to wait and see although I think it's fairly unlikely that this particular term will get revived. FYI, Medium isn't a blog: it's user-generated content on par with Twitter and others. Ioaxxere (talk) 07:05, 28 April 2024 (UTC)Reply

@Ioaxxere OK, my mistake about Medium but I think the point still stands. Benwing2 (talk) 07:19, 28 April 2024 (UTC)Reply

BTW I would label this term as a neologism; IMO that fairly portrays the facts that it is (relatively) new and not yet (ever?) clearly entered the lexicon. Benwing2 (talk) 07:22, 28 April 2024 (UTC)Reply

Your reverts

Why are you reverting? — Fenakhay ^{(حيطي · مساهماتي)} 22:32, 28 April 2024 (UTC)Reply

Wiktionary:Requests_for_deletion/Others#Category:Noun_plural_forms_by_language seems to show consensus for removal of noun plural form as a headword. Benwing2 (talk) 22:34, 28 April 2024 (UTC)Reply

IMO it accomplishes nothing over noun form. What other sorts of noun forms are there? Benwing2 (talk) 22:35, 28 April 2024 (UTC)Reply

What consensus?? You need a vote in WT:BP if you want to implement this drastic change. There are dual, paucal and plural forms in Chadian Arabic. — Fenakhay ^{(حيطي · مساهماتي)} 22:39, 28 April 2024 (UTC)Reply

A vote for this kind of thing seems appropriate to me. DCDuring (talk) 11:43, 29 April 2024 (UTC)Reply

Replacement of quotation templates (April 2024)

Hello, when you are free, please carry out the following replacements:

{{RQ:Homer Chapman Iliads}} → {{RQ:Homer Chapman Iliads|year=1843}}
{{RQ:Homer Chapman Odysseys}} → {{RQ:Homer Chapman Odysseys|year=1857}}
{{RQ:Thomson Autumn}} → {{RQ:Thomson Autumn|year=1768}} (except gazetteer and mellow)
{{RQ:Thomson Seasons}} → {{RQ:Thomson Seasons|year=1768}}
{{RQ:Thomson Spring}} → {{RQ:Thomson Spring|year=1768}}
{{RQ:Thomson Summer}} → {{RQ:Thomson Summer|year=1768}}
{{RQ:Thomson Winter}} → {{RQ:Thomson Winter|year=1768}}

(I have updated the templates with the 1st editions of the work as the default, so all current uses which were based on later versions need to have those versions specified).
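For illustration, a replacement of this kind could be sketched in Python as below. This is a minimal sketch under stated assumptions, not the bot's actual code: `add_year` is a hypothetical helper, it only touches invocations whose parameters contain no nested templates, and it leaves alone any invocation that already specifies |year=. Page-level exceptions (such as gazetteer and mellow above) would need to be handled by the caller.

```python
import re


def add_year(wikitext, template, year):
    """Add |year=YEAR to invocations of `template` that lack a year."""
    # Match {{template}} or {{template|params}} where params has no braces,
    # so invocations containing nested templates are skipped, not mangled.
    pattern = re.compile(r"\{\{" + re.escape(template) + r"(\|[^{}]*)?\}\}")

    def repl(match):
        params = match.group(1) or ""
        if re.search(r"\|\s*year\s*=", params):
            return match.group(0)  # an explicit year is already given
        return "{{" + template + "|year=" + year + params + "}}"

    return pattern.sub(repl, wikitext)
```

Because the template name must be followed directly by `|` or `}}`, a template whose name merely starts with the same text is not affected.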

Thank you. (I may add a few more requests over the next few days.) — Sgconlaw (talk) 17:21, 30 April 2024 (UTC)Reply

@Sgconlaw: A friendly suggestion: vary your discussion-section titles, rather than always naming them “Replacement of quotation templates”. Identical names mean that section links all point to the first such-named section of a given page (e.g. #Replacement of quotation templates). 0DF (talk) 17:47, 30 April 2024 (UTC)Reply

@0DF: OK. I used to give them numbers, but it was hard to keep count … — Sgconlaw (talk) 17:50, 30 April 2024 (UTC)Reply

@Sgconlaw: I can quite imagine. What about titles that specify what you're quoting? E.g., in this case, “Replacement of [Homer|Chapman|Odysseys] quotation templates”? 0DF (talk) 21:05, 30 April 2024 (UTC)Reply

@0DF: I'll put the date. But, frankly, does it really matter at all? — Sgconlaw (talk) 22:50, 1 May 2024 (UTC)Reply

@Sgconlaw: On talk pages, not all that much, but it must be inconvenient for Benwing, though I don't know whether he/she cares. Alternatively, you can just append new requests under old requests in the same section. Feel free also to ignore my suggestion; I only made it because I'd noticed it happening several times, but this isn't my talk page, so it hardly affects me. 0DF (talk) 23:06, 1 May 2024 (UTC)Reply

@0DF @Sgconlaw Honestly I didn't even notice that all the titles are the same. Benwing2 (talk) 23:13, 1 May 2024 (UTC)Reply

@Benwing2: À fortiori you didn't care. 0DF (talk) 14:48, 2 May 2024 (UTC)Reply

@0DF: For those of us who read the diffs from the revision history, it's completely irrelevant. A completely unrelated issue, though, is the years of redlinks to the deleted/moved templates that are cluttering Special:WantedTemplates. Now that most of the tracking template links are gone, redlinks to deleted templates are the next obvious target. Chuck Entz (talk) 03:19, 2 May 2024 (UTC)Reply

@Chuck Entz: Perhaps, but for those who don't, it is relevant. 0DF (talk) 14:48, 2 May 2024 (UTC)Reply

@Chuck Entz Yup, I noticed that although I'm not sure how to solve it. I suppose we could go through the user, talk and Wiktionary pages that link to these templates and comment them out using <nowiki> or whatever, although it would be nice if MediaWiki provided a way of suppressing links by namespace when generating these lists. Benwing2 (talk) 03:49, 2 May 2024 (UTC)Reply

@Sgconlaw Are you expecting to add more requests? If not I will go ahead and run the ones you've listed. Benwing2 (talk) 04:40, 6 May 2024 (UTC)Reply

OK, I've added the other requests. Please go ahead! — Sgconlaw (talk) 15:31, 6 May 2024 (UTC)Reply

@Sgconlaw Should be done. Note that it also ran on the template pages themselves as well as Wiktionary:Quotations/Templates/English T–Z and Wiktionary:Quotations/Templates/English C; if this is wrong, please undo. Thanks! Benwing2 (talk) 20:35, 6 May 2024 (UTC)Reply

Thanks! — Sgconlaw (talk) 20:39, 6 May 2024 (UTC)Reply

Template:eu-verb form of/new

Hello: If possible, could you run your bot to change all instances of {{eu-verb form of/new}} to {{eu-verb form of}}? Both templates are identical so no other changes are required. Thank you in advance. Santi2222 (talk) 18:03, 6 May 2024 (UTC)Reply

@Santi2222 Done. Benwing2 (talk) 20:43, 6 May 2024 (UTC)Reply

Perfect, thanks! Santi2222 (talk) 11:10, 7 May 2024 (UTC)Reply

WingerBot converting "gl" to "q"

Hi! I noticed WingerBot converting "gl" to "q" in Tagalog. However, I see it changing glosses of definitions into "q". It says here that to provide context in definitions, we use "gl" instead of "q". So it seems to me that the bot edits done to Tagalog lemmas are wrong. Could you check? Thanks! Mar vin kaiser (talk) 07:24, 7 May 2024 (UTC)Reply

@Mar vin kaiser Hi Mar (can I call you that?). Can you give me some examples? I am following standard practice, which is AFAIK: glosses are used when giving full definitions of terms; labels are used to label registers and grammar and usage characteristics before the definition; raw parens are used to indicate typical direct objects that are not part of the meaning of the verb when standing by itself; and qualifiers are used for everything else. The documentation you are pointing to is in no way standard practice; it was added by User:Fytcha in November 2021 based, I assume, on a misunderstanding of what {{gloss}} is for. More specifically:

direct object arguments to verbs go in raw parens, e.g. to hand (something to someone) NOT to hand {{gl|something to someone}}.
full definitions for concepts, either to clarify polysemy or for unfamiliar concepts, use {{gl}}. e.g. [[advocacy]] {{gl|public support of a particular cause}}.
all other clarifications and qualifications use {{q}}, e.g. [[develop|developed]]; [[progressive]] {{q|of a people, nation, etc.}}.

Note the difference between case 2, which *defines* ("glosses") the term advocacy and case 3, which does *NOT* define the terms developed or progressive but *qualifies* them by giving the context in which the terms might be appropriately used.

Note also that I went through manually looking for uses of {{gl}} and corrected them by hand to use the correct syntax. The only purpose of the bot was to push changes I made manually offline, in a text editor. If you think I've gotten standard practice wrong, please post in the BP about this and we can have a discussion about what standard practice in definitions really is when it comes to {{q}} vs. {{lb}} vs. {{gl}}. Benwing2 (talk) 08:25, 7 May 2024 (UTC)Reply

@Benwing2: Thanks for the reply. I thought the documentation I shared was the standard practice. Where can I find a guide where the difference is explained as part of standard practice? --Mar vin kaiser (talk) 08:29, 7 May 2024 (UTC)Reply

@Mar vin kaiser The closest I could find was Wiktionary:Style_guide#Styling templates, where it says the usage of {{gl}} is "Used to gloss a definition by redefining it in different words, especially to disambiguate an English defining word with many definitions, goes after the definition." and the usage of {{q}} is "Miscellaneous explanatory text (register, variety etc.)". Meanwhile under Wiktionary:Style_guide#Parentheses it says "Parentheses should be used in definitions only for the purpose of identifying the selectional restrictions of the headword in the current sense:" which is consistent with my #1 above. Benwing2 (talk) 19:30, 7 May 2024 (UTC)Reply

@Benwing2: Sorry for the late reply. To explain where I was coming from when I made that edit: I thought that providing full definitions to disambiguate polysemy (as in your given example [[advocacy]] {{gl|public support of a particular cause}}) also falls under "[providing] context for a definition". If you think this is confusing or wrong, please revert my edit. Thanks. — Fytcha〈 T | L | C 〉 13:46, 27 August 2024 (UTC)Reply

@Fytcha Hello! Glad to see you're back. I'll fix the documentation to be clearer. Benwing2 (talk) 18:49, 27 August 2024 (UTC)Reply

"a" is for module errors

Adding the new {{tl-pr}} template seems to have pushed this over the edge into permanent rather than intermittent Lua timeout errors. I'm not sure what can be done about it. Chuck Entz (talk) 14:05, 12 May 2024 (UTC)Reply

Yeah, this has been concerning me as well, and there are no obvious causes which stand out. Theknightwho (talk) 16:39, 12 May 2024 (UTC)Reply

@Chuck Entz @Theknightwho We can revert the changes on this page back to where they were before as a temporary fix, but I think for a permanent fix we might have to either split up letter pages or request some sort of per-page exception to the Lua limits. Letter pages in general are problematic because there are potentially thousands of languages that could be on the page. Benwing2 (talk) 19:25, 12 May 2024 (UTC)Reply

BTW I tried switching {{tl-pr}} to use "raw" notation like this:

{{tl-pr
|raw:/ˈʔej/ [ˈʔɛɪ̯]<qq:letter name, Filipino alphabet>
|raw:/ˈʔa/ [ˈʔa]<qq:letter name, Abakada alphabet, Abecedario>
|raw:/ˈa/ [ˈa]<qq:phoneme, stressed>
|raw:/a/ [ɐ]<qq:phoneme, unstressed>
|syll=a
}}

Here we specify the IPA directly to avoid going through the code to generate the IPA, but it seems to make no difference. Benwing2 (talk) 19:47, 12 May 2024 (UTC)Reply

@Benwing2 Yeah, a lot of it is down to the inherent costs of Scribunto and the underlying core modules. I submitted this patch for review today which should help a little bit: it speeds up mw.clone by about 15%, and cloning _G for each invoke is a big contributor to the inherent cost. Theknightwho (talk) 20:55, 12 May 2024 (UTC)Reply

The problem with asking for more Lua time is that the page already takes too long to load overall, independent of any system resources it uses. One time I got a server timeout while doing a null edit. Chuck Entz (talk) 21:28, 12 May 2024 (UTC)Reply

@Chuck Entz Yeah that is true. I personally think MediaWiki should not try to recompute pages on the fly if they took more than a certain time to be generated last time they were generated, but that would require some discussion I'm sure by the MediaWiki developers. I think in the long run we'll have to split the letter pages somehow, although I'm not sure how. Benwing2 (talk) 21:34, 12 May 2024 (UTC)Reply

@Benwing2 We could try doing a general version of {{multitrans}}, which was what the template parser was originally written to do, but we'd need to come up with a way to solve the section edit link issue, because they don't normally appear for headings enclosed in templates. Theknightwho (talk) 21:41, 12 May 2024 (UTC)Reply

@Ioaxxere just substed the entire page, which caused a module error in {{desctree}} at Reconstruction:Proto-Slavic/a. I fixed a problem with derivation categories caused by the Breton etymology using Welsh language codes, but I won't mind if you wipe out my work by reverting to the pre-subst version. Chuck Entz (talk) 23:26, 12 May 2024 (UTC)Reply

Oops, I admit that I neglected the possibility of other pages relying on the templates at a. Maybe the best solution is to only subst certain templates which are both Lua-hungry and relatively unlikely to be modified. Alternatively, these other templates could be modified to work with HTML. Ioaxxere (talk) 23:34, 12 May 2024 (UTC)Reply

@Theknightwho I think there is a way to solve the section link issue. User:This, that and the other might have ideas.

@Ioaxxere @Chuck Entz If we subst it like this, we need to write JavaScript to allow the page to be updated easily from the source. In general though this is a sub-optimal solution, as it's difficult to prevent people from manually hacking the substed version rather than the source. Benwing2 (talk) 23:50, 12 May 2024 (UTC)Reply

I think the best solution is to spin the letter entries off to subpages, since they're a completely different sort of lexicography: no syntax or semantics. So "X" as the 24th letter of the English alphabet would go there, but the "adult-content" symbol would stay. Likewise, all the prepositions, particles, etc. at "a" would stay, and probably the abbreviations on all the pages. It would at least temporarily solve our system problems and also address the issues that Kwamikagami has been hacking and ad-hoc-ing over. We might even think about setting up a namespace for the character entries. Chuck Entz (talk) 01:23, 13 May 2024 (UTC)Reply

@Chuck Entz I would be in favor of that. What about names of letters, since often the name of the letter "a" is also "a"? Do we leave those or move them? Benwing2 (talk) 01:44, 13 May 2024 (UTC)Reply

Those seem like entry-page material. After all, there would still be entries like aitch, so why would we treat the ones that happen to be the same as letters any differently? Chuck Entz (talk) 02:09, 13 May 2024 (UTC)Reply

@Chuck Entz Makes sense. I think we should bring this up in the Beer parlour. Benwing2 (talk) 02:16, 13 May 2024 (UTC)Reply

@Benwing2 @Chuck Entz I've discovered a major cause of the recent uptick: Module:headword/data, which is used by Module:headword, calls the process_page function in Module:headword/page right at the end, which means the data table contains a load of computationally expensive data that only needs to be generated once for the whole page.

Normally this isn't an issue, but if the pagename= parameter is given to {{head}}, then process_page will be run again with that new pagename instead. This is usually only needed for testing, but a bunch of headword modules have been using the actual pagename as a default value for pagename=, which has meant process_page was being called about 15 times on a, adding a ton of unnecessary work.

I've amended Module:headword so that it only does that if pagename= is something other than the default value, but this really should be fixed in the calling modules, because they'll be wrongly overriding the proper default pagename calculated by Module:headword/data, which is non-trivial to determine for things like unsupported titles. Theknightwho (talk) 17:23, 13 May 2024 (UTC)Reply
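The shape of that fix can be sketched in Python. This is an illustration only, not the Lua in Module:headword: `HeadwordData` and `page_data_for` are hypothetical names, and `process_page` here is a trivial stand-in for the expensive function. The expensive work runs once for the real page and is rerun only when a genuinely different pagename is requested.

```python
stats = {"runs": 0}  # counts how often the expensive step actually runs


def process_page(pagename):
    """Stand-in for the computationally expensive per-page processing."""
    stats["runs"] += 1
    return {"pagename": pagename, "length": len(pagename)}


class HeadwordData:
    """Holds the data computed once for the actual page."""

    def __init__(self, actual_pagename):
        self.default_pagename = actual_pagename
        self.page = process_page(actual_pagename)


def page_data_for(data, pagename=None):
    """Reuse the shared page data unless pagename really differs."""
    if pagename is None or pagename == data.default_pagename:
        return data.page
    return process_page(pagename)
```

With 15 headword calls on a page that all pass the actual pagename, `process_page` still runs only once; a test call with some other pagename triggers one extra run.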

@Theknightwho Oops, that was my doing. I have meant to clean up the handling of pagename in the various modules I've worked on but I didn't realize it was causing this problem. I'll make it a priority to fix these. Can you point me to some of the modules doing this? Benwing2 (talk) 19:43, 13 May 2024 (UTC)Reply

@Benwing2 I didn't look in much detail, but it was a bunch of Romance ones plus Tagalog iirc. Theknightwho (talk) 19:47, 13 May 2024 (UTC)Reply

@Theknightwho OK thanks! Benwing2 (talk) 19:55, 13 May 2024 (UTC)Reply

@Theknightwho The easiest way to clean this up is to keep your fix in place that avoids rerunning process_page (which you can view as not a hack but a sort of caching fix), and change the modules to use mw.loadData("Module:headword/data").pagename rather than mw.title.getCurrentTitle().subpageText as the default pagename when |pagename= isn't explicitly given by the user. (This is already done by the Spanish headword module in fact.) Lots of the modules unilaterally set pagename in the Module:headword data structure passed from the main entry point to the POS-specific functions, and some of them need to be able to have access to the pagename (whether the actual pagename or user-specified proxy) to do POS-specific processing. The alternative is to have two different fields in the Module:headword data structure, one that is always set with the pagename and one which is set only when the pagename needs to be overridden. To me this seems an unnecessary complication given your recent fix, but if you think this is the right approach, I would recommend renaming the pagename field that is recognized by Module:headword to be overriding_pagename. That makes it clear that this field should only be set when overriding the default pagename; otherwise the various headword modules would need some other field name to hold the always-available pagename, which would IMO be less than clear. What do you think? Benwing2 (talk) 05:52, 14 May 2024 (UTC)Reply

@Benwing2 I think the first solution is preferable, to be honest.

A third thing that could help would be to separate out the stuff in Module:headword/page that relies on the pagename from the stuff that doesn't, but that would be pretty faffy. Theknightwho (talk) 16:44, 14 May 2024 (UTC)Reply

привыкнул

Would you mind undeleting привыкнул? It is an archaic form of привык. Source: Ushakov Dictionary, online: [7] [8], quote: ПРИВЫКНУТЬ, привыкну, привыкнешь, прош. привык, привыкла, и (устар.) привыкнул, сов. (к привыкать). For context, "устар." is short for "устаревшее" (neuter form of устаревший) (see ru:Викисловарь:Условные_сокращения). This archaic form can be encountered in older books. For example, it shows up five times in the Russian National Corpus: corpus search results. —andrybak (talk) 10:56, 15 May 2024 (UTC)Reply

It's not even archaic, I mainly use this form. Thadh (talk) 13:01, 15 May 2024 (UTC)Reply

@Thadh, same, but this would be original research ;-) —andrybak (talk) 13:13, 15 May 2024 (UTC)Reply

@Andrybak I undeleted it and marked it as archaic. Benwing2 (talk) 23:21, 15 May 2024 (UTC)Reply

K

K 194.71.19.145 08:42, 20 May 2024 (UTC)Reply

? Benwing2 (talk) 09:02, 20 May 2024 (UTC)Reply

Phrasal verbs with forward

Your bot deleted these, but we really should include them. Any fix? Denazz (talk) 09:55, 20 May 2024 (UTC)Reply

There may be other particles missing too, as there used to be over 5000 entries in Category:English phrasal verbs. Would be a bummer to add them all again manually. Denazz (talk) 10:02, 20 May 2024 (UTC)Reply

I'm referring to the fact that Category:English phrasal verbs formed with "forward" is empty. Denazz (talk) 10:04, 20 May 2024 (UTC)Reply

@Denazz This is easy to fix without adding all the manual categorization; there is a list of recognized particles in Module:en-headword. Please check out the list and let me know if anything is missing. Benwing2 (talk) 00:26, 21 May 2024 (UTC)Reply

Easy to fix for the programmers like you! I don't understand modules, Ben, and see no "list of recognized particles". You'll need to guide me to the list, I'm afraid. Assume I'm a moron, and you won't be far off! Denazz (talk) 14:52, 21 May 2024 (UTC)Reply

OK, the list is easy to find on that page, but it is protected. I suggest adding "asunder" to the list. Denazz (talk) 14:56, 21 May 2024 (UTC)Reply

I think asunder, forward, forwards, adrift, aground, backwards should be added. Denazz (talk) 14:58, 21 May 2024 (UTC)Reply

A minor issue.

𑀪𑀸𑀅 (bhāa) is the altform of 𑀪𑀸𑀕 (bhāga) and has two descendants. 𑀪𑀸𑀅 (bhāa) and 𑀪𑀸𑀕 (bhāga) are descendants of भाग (bhāga), so under that Sanskrit entry I put {{desctree|pra|𑀪𑀸𑀕|𑀪𑀸𑀅}}, which should have shown 𑀪𑀸𑀕 (bhāga), 𑀪𑀸𑀅 (bhāa) and the two descendants of 𑀪𑀸𑀅 (bhāa). It is doing exactly that, except for one mistake: 𑀪𑀸𑀅 (bhāa) is now appearing twice. Could you please look into the issue when you have time? -- 𝘗𝘶𝘭𝘪 𝘮𝘢𝘪𝘺𝘪 (𝘵𝘢𝘭𝘬) 01:52, 22 May 2024 (UTC)Reply

IPA|a=

Hi, now that {{IPA}} takes |a= for accents, please edit {{x2IPA}} to take it as well so that inputs like {{subst:x2IPA|en|/dOg/|a=GA}} work. Thanks! —Mahāgaja · talk 12:47, 23 May 2024 (UTC)Reply

@Mahagaja Should be fixed. Benwing2 (talk) 18:51, 23 May 2024 (UTC)Reply

Replacement for {{RQ:Milton Paradise Lost}}

Hi, I have overhauled {{RQ:Milton Paradise Lost}}. Could you please carry out the following bot replacement?

1. If a use includes |edition=2nd, don't make any changes.
2. Otherwise, add |year=1873 to the use.
3. If a use has an Arabic or Roman numeral in the |1= position, add |book= to it.
4. If a use includes |url=, remove it as that parameter is no longer in the template.

Thank you. (@Chuck Entz: for your information.) — Sgconlaw (talk) 15:22, 27 May 2024 (UTC)Reply

@Sgconlaw I implemented (4) for the time being to reduce the errors in CAT:E. The others will come shortly. Benwing2 (talk) 23:42, 27 May 2024 (UTC)Reply

Just a gentle reminder that there are still 91 entries in CAT:PFE because of this. Chuck Entz (talk) 14:48, 29 May 2024 (UTC)Reply

@Chuck Entz OK, I'll take a look. Benwing2 (talk) 18:34, 29 May 2024 (UTC)Reply

Spot-checking CAT:PFE, they seem to all be due to Roman numerals in |1=. Of course, anything with Arabic numerals for the book number in |1= won't show up in CAT:PFE, so I wouldn't see those. I don't know if there are any that have page numbers already. Perhaps just flag those with |1= ≤12 and skip |1= >12, since there are no more than 12 books in any edition.

Parameter 1	Action	Parameter 2	Action
none	none	none	none
none	none	text	none
Roman numeral	add "book="	none	none
Roman numeral	add "book="	text	add "passage="
Arabic numeral	not sure: add "book="?	none	none
Arabic numeral	not sure: add "book="?	text	if "book=" added to Parameter 1, add "passage="

Chuck Entz (talk) 01:10, 1 June 2024 (UTC)Reply

Thanks, I've been punting on this because (a) I need to verify and enumerate all the cases, which you seem to have done, (b) I need to write a script (I have a script to handle most requests from User:Sgconlaw, but this one has special logic in it). Thanks for writing out the steps. Benwing2 (talk) 01:13, 1 June 2024 (UTC)Reply

The template can deal with either an Arabic or Roman numeral as a value for |book=. — Sgconlaw (talk) 04:57, 1 June 2024 (UTC)Reply
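
The rules requested above, combined with the table, could be sketched roughly as follows. This is only an illustrative Python sketch, not the script that was actually run. It treats one template call as a plain dict of parameters; `migrate_paradise_lost` and `book_number` are hypothetical names; and it assumes that |edition=2nd only suppresses the addition of |year=1873 while the other rules still apply. Values of |1= above 12 are left untouched, following the suggestion to skip them.

```python
# Roman numerals for the twelve books; anything else is not treated as a book number.
ROMAN_BOOKS = {
    "I": 1, "II": 2, "III": 3, "IV": 4, "V": 5, "VI": 6,
    "VII": 7, "VIII": 8, "IX": 9, "X": 10, "XI": 11, "XII": 12,
}


def book_number(value):
    """Return the book number (1-12) for an Arabic or Roman numeral, else None."""
    v = value.strip()
    if v.isascii() and v.isdigit():
        n = int(v)
    else:
        n = ROMAN_BOOKS.get(v.upper())
    return n if n is not None and 1 <= n <= 12 else None


def migrate_paradise_lost(params):
    """Apply the requested changes to one call's parameters (hypothetical helper)."""
    new = dict(params)
    # The |url= parameter is no longer in the template.
    new.pop("url", None)
    # Assumption: |edition=2nd only blocks the addition of |year=1873.
    if new.get("edition") != "2nd" and "year" not in new:
        new["year"] = "1873"
    # A numeral in |1= becomes |book=; text in |2= then becomes |passage=.
    first = new.get("1")
    if first is not None and book_number(first) is not None:
        new["book"] = new.pop("1")
        if "2" in new:
            new["passage"] = new.pop("2")
    return new
```

Keeping the numeral check in its own function makes it easy to flag rather than rewrite the uncertain cases, for example by logging every call where |1= is numeric but out of range.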

Wonder if you've had a chance to work on this? — Sgconlaw (talk) 21:40, 17 July 2024 (UTC)Reply

Template:label

We have to come up with a better way to show the labels. Currently the template page is periodically popping up in CAT:E, and just now I got a server timeout while doing a null edit with previewing enabled. It's only a matter of time until the module error becomes permanent. Perhaps the list could be split up into separate subpages linked to from the documentation page. Chuck Entz (talk) 14:46, 29 May 2024 (UTC)Reply

@Chuck Entz Yes, I agree. I just previewed it and it took 3 tries to not get it to time out. From looking at the list, it looks like maybe we could split it three ways: (a) language independent, (b) language-dependent A-G, (c) language-dependent H-Z. Benwing2 (talk) 18:38, 29 May 2024 (UTC)Reply

@Chuck Entz I turned off label tracking while processing the large table of labels, and now the CPU usage is around 6 secs, the memory usage is about 64MB and the post-expand size is around 1.4MB; all well below the limits. So it may not be necessary to split the labels for a while. Benwing2 (talk) 04:44, 30 May 2024 (UTC)Reply

Replacement of quotation templates (May–July 2024)

Hi, could you please carry out the following replacements:

{{RQ:Butler Hudibras}} → {{RQ:Butler Hudibras|year=1905}}. The parameter |year=1678 is no longer used and should be removed.
{{RQ:Falkner Moonfleet}} → {{RQ:Falkner Moonfleet|year=1934}} (if |year= is not currently used). Also, remove all instances of the |chapter= parameter as it is not used by the template any more.
{{RQ:Marlowe Edward 2}} → {{RQ:Marlowe Edward 2|year=1622}}
{{RQ:Tusser Good Husbandrie}}:
- If no |year=, add {{RQ:Tusser Good Husbandrie|year=1810}}.
- If |year=1580 appears, change to {{RQ:Tusser Good Husbandrie|year=1878}}.
{{RQ:Wollstonecraft Vindication Women}} – remove the |chapter= parameter which is no longer used.

Thank you. (@Chuck Entz, for your information.) — Sgconlaw (talk) 18:12, 29 May 2024 (UTC)Reply
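
For reference, the per-template rules listed above can be written as one small dispatch function. This is a hedged Python sketch over a dict of parameters, not an actual bot script; `fix_quote_params` is a hypothetical name. It assumes that for {{RQ:Butler Hudibras}} the new year is set when |year= is absent or equal to 1678, and that for {{RQ:Marlowe Edward 2}} the year is only added when none is present.

```python
def fix_quote_params(name, params):
    """Return updated parameters for one quotation-template call (hypothetical helper)."""
    p = dict(params)
    if name == "RQ:Butler Hudibras":
        # Assumption: replace a missing year or the obsolete |year=1678.
        if p.get("year") in (None, "1678"):
            p["year"] = "1905"
    elif name == "RQ:Falkner Moonfleet":
        p.setdefault("year", "1934")   # only if |year= is not currently used
        p.pop("chapter", None)         # |chapter= is no longer used
    elif name == "RQ:Marlowe Edward 2":
        p.setdefault("year", "1622")   # assumption: keep an existing |year=
    elif name == "RQ:Tusser Good Husbandrie":
        if "year" not in p:
            p["year"] = "1810"
        elif p["year"] == "1580":
            p["year"] = "1878"
    elif name == "RQ:Wollstonecraft Vindication Women":
        p.pop("chapter", None)         # |chapter= is no longer used
    return p
```

Templates not named in the request fall through unchanged, so the function can be applied to every template on a page without a separate filter.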

Wondering if you've had a chance to work on these? I added one more. — Sgconlaw (talk) 21:39, 17 July 2024 (UTC)Reply

@Sgconlaw Oops, I totally overlooked this. Let me do them now. Benwing2 (talk) 22:21, 17 July 2024 (UTC)Reply

Much obliged! — Sgconlaw (talk) 22:32, 17 July 2024 (UTC)Reply

@Sgconlaw You mentioned above that year=1678 for {{RQ:Butler Hudibras}} should be removed, but in windore the year is 1663. Benwing2 (talk) 23:11, 17 July 2024 (UTC)Reply

That was a typo, as |year=1663 doesn’t do anything in the template as far as I’m aware. — Sgconlaw (talk) 05:06, 18 July 2024 (UTC)Reply

@Sgconlaw OK, everything else should be done. Let me know if you see any errors or issues. Benwing2 (talk) 05:25, 18 July 2024 (UTC)Reply

Thanks! By the way, I also asked about {{RQ:Milton Paradise Lost}} above. — Sgconlaw (talk) 16:30, 18 July 2024 (UTC)Reply

Oops, I missed that too :( ... Benwing2 (talk) 20:44, 18 July 2024 (UTC)Reply

Coding help

Hello,

I created some categories (Theodore Roosevelt, Wars) and I need help integrating them into the module/cat tree (because Wiktionary uses a module and not just categorization like everywhere else 😡😡😡). Do you do that sort of thing? I filed a couple requests on Grease Pit but nobody replied Purplebackpack89 13:54, 30 May 2024 (UTC)Reply

Your request there reads like something an executive would leave on the voicemail of an assistant. It would help if you made a case for it. Otherwise the unspoken gut response is going to be "did you want fries with that?". Chuck Entz (talk) 15:09, 30 May 2024 (UTC)Reply

I'm going to interpret your comment as, "make a case for the category's existence". The Theodore Roosevelt category should exist because there are more than a dozen words about, created by, or popularized by him. The Wars category should exist because there are quite a few proper nouns that are Wars and contain the word "War" Purplebackpack89 16:20, 30 May 2024 (UTC)Reply

Yeak Laom

I am a newbie at using the {{place}} template- could you please improve its formatting in this entry to enable better categorization? Thank you! Inqilābī 17:00, 2 June 2024 (UTC)Reply

@Inqilābī It generally looks good to me. I made a minor tweak to add the words municipality and province for clarity. Communes aren't currently categorized separately. If you want lakes categorized, you could write it like this (it's not categorizing currently because it doesn't recognize the words volcanic or crater as qualifiers):

{{place|en|A volcanic crater <<lake>> in <<c/Cambodia>>}}.

Benwing2 (talk) 00:33, 3 June 2024 (UTC)Reply

dont tread on me RfD...suggest immediate withdrawal

I think you really need to withdraw that RfD. First off, why do you start the RfD with "an entry created by Purplebackpack89"? What the hell's that got to do with anything? It smacks of treating entries I create in bad faith. Please remember to assume good faith with all editors. Also, what research did you do on historical use of the term? Because I DID do research on the term and rendering it that way IS commonly used, it's not some random misspelling as you seem to allege. Purplebackpack89 23:31, 2 June 2024 (UTC)Reply

@Purplebackpack89 If you disagree with an RfD, the procedure for doing so is not to demand that the creator withdraw the RfD. Instead you should post a response justifying the term. As for tagging you, I routinely do that in RfD's as a courtesy to the creator of the page that their term is being submitted to RfD or RfV. No bad faith is intended. Benwing2 (talk) 23:51, 2 June 2024 (UTC)Reply

I've also provided rationale at the RfD. Please read it. Purplebackpack89 23:54, 2 June 2024 (UTC)Reply

You're down 5-1 on this one...sure you don't want to withdraw? Purplebackpack89 21:11, 3 June 2024 (UTC)Reply

Dude. Chill. For real. Benwing2 (talk) 00:00, 4 June 2024 (UTC)Reply

or...you could admit that maybe that RfD was a bad idea? Purplebackpack89 05:12, 5 June 2024 (UTC)Reply

Weird WingerBot bug with translations

Hi,

I just wanted to let you know that your bot seems to have made a strange formatting change on the entry left-wing (I reverted it) causing all the translations to show up in the gloss field. I'm not sure if this has happened in any other pages, but you might want to take a look. LOOKSQUARE (talk) 20:37, 4 June 2024 (UTC)Reply

@LOOKSQUARE I think this was a one-off issue related to the strange formatting of the {{trans-top}} line. Looks like User:Fenakhay corrected the formatting. Benwing2 (talk) 03:05, 5 June 2024 (UTC)Reply

Persian Audio labels

Hi, is it possible to add the parameter |a=IR to all audios recorded by User:Darafsh? And honestly, to all audios imported from Persian Wiktionary as well: Afghanistan is not as connected to the internet as Iran is, so it's pretty safe to assume that all audios from Persian Wiktionary are in the Iranian dialect. The bot that is mass-importing from Persian Wiktionary is not labeling any of the recordings (I don't believe it is your bot, though).

But if you want to be safe: I suspect that almost all Persian audios were recorded by only a handful of people. If that's the case, could the bot make a list of them? Then I can listen to the audios recorded by each user to confirm whether they are Iranian. — SAMEER (؂・؄・؏) 03:47, 5 June 2024 (UTC)Reply

@Sameerhameedy Sure. Are you looking for a list of all Persian audios, or a list of all users that created Persian audios (which would have to be based on the file names), or both? Benwing2 (talk) 03:50, 5 June 2024 (UTC)Reply

I was hoping it would be possible to compile a list of all users who created audio files. Then I could confirm that all of them speak the Iranian dialect before the bot begins labeling the dialect spoken in each audio.

If that's not possible, I'm not sure what the best course of action is to reduce the amount of audios that need review... but the large number of unlabeled audios is a bit problematic — SAMEER (؂・؄・؏) 04:21, 5 June 2024 (UTC)Reply

@Sameerhameedy I recently made a dump of all occurrences of {{audio}} (661,680 of them) for cleaning up their captions. There are 1,028 Persian audio files. 701 of them come from Lingua Libre, from only four authors: Afsham23 (not all are labeled but the labeled ones all say "Iran" or "Iranian" in some variant, except for میخ that says Dari); Darafsh (most aren't labeled; the labeled ones say "Iran" or "Iranian"); Mazanin (same labeling issues as with Darafsh); and a single audio معماری from Soroor (Opsylac), labeled "Iran". The remaining 327 are anonymous, and mostly have the format Fa-ستاره.ogg; about half are labeled "Iran" or "Iranian" and the others are unlabeled, except for Fa-جلو.ogg and Fa-دوشنبه.ogg labeled Tehran. If you want I can get you a sorted list of all the audio files so you can listen to a sample of them and verify that they're Iranian. Benwing2 (talk) 04:47, 5 June 2024 (UTC)Reply
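
A breakdown like the one above can be derived from file names alone. The following Python sketch is only illustrative: it assumes Lingua Libre files are named in the pattern `LL-Q<id> (<lang>)-<author>-<word>.<ext>`, the Darafsh and Soroor file names in the usage example are hypothetical, and `author_of` and `count_by_author` are made-up helper names. Files that do not match the pattern (such as Fa-ستاره.ogg) are counted as anonymous.

```python
import re
from collections import Counter

# Assumed Lingua Libre naming pattern: "LL-Q<id> (<lang>)-<author>-<word>.<ext>".
# The author is taken up to the first hyphen, so author names containing
# a hyphen would be split incorrectly by this sketch.
LL_NAME = re.compile(r"^LL-Q\d+ \([a-z]+\)-(?P<author>.+?)-.+\.\w+$")


def author_of(filename):
    """Return the author encoded in a Lingua Libre file name, or None."""
    m = LL_NAME.match(filename)
    return m.group("author") if m else None


def count_by_author(filenames):
    """Count audio files per author; unmatched names are grouped as anonymous."""
    return Counter(author_of(f) or "(anonymous)" for f in filenames)
```

For example, `count_by_author(["LL-Q9168 (fas)-Darafsh-آب.wav", "Fa-ستاره.ogg"])` puts one file under `Darafsh` and one under `(anonymous)`.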

After listening to their audio recordings, I can confirm that all 4 people you mentioned had Iranian accents and all audio recordings by them can be safely labeled as "Iranian Persian".

Yes, if you could make a list I can give them a listen and try to pick out any that are not Iranian. Admittedly there might be some ambiguous cases if I don't have multiple audio samples per person, but I should be able to determine most. — SAMEER (؂・؄・؏) 05:31, 5 June 2024 (UTC)Reply

See User:Benwing2/fa-audio. Benwing2 (talk) 05:43, 5 June 2024 (UTC)Reply

lol I'm really sorry, but my browser keeps crashing when I open the link. It looks like all the audios are there though, is it possible to remove all audios that are already identified? If not, it's fine. I'll try to open it on my computer or something, it should be able to handle it. — SAMEER (؂・؄・؏) 06:05, 5 June 2024 (UTC)Reply

Sure, I'll remove those that are already identified. Benwing2 (talk) 06:08, 5 June 2024 (UTC)Reply

@Sameerhameedy See User:Benwing2/fa-audio-no-iran-1 and User:Benwing2/fa-audio-no-iran-2. I removed the audios identified as Iran or Iranian, which cuts out about 25%, and split the remainder into two. Let me know if you're able to open them. Benwing2 (talk) 06:12, 5 June 2024 (UTC)Reply

yes thank you that works perfectly. Since we're having a bot add the labels, I'll just delete all the Iranian audios as I go through the list. I'll let you know when I am done. — SAMEER (؂・؄・؏) 06:45, 5 June 2024 (UTC)Reply

@Sameerhameedy Sounds good. Benwing2 (talk) 08:45, 5 June 2024 (UTC)Reply

@Benwing2, I've finished listening to all the audios. They are all Iranian speakers. I did upload some audios (of a family member speaking) a few days ago with dialect labels. You intentionally removed those, right? Or did I miss them?

Anyway, we can safely label all those audios as Iranian. (though could the bot also make sure they are below fa-IPA and not above it?) — SAMEER (؂・؄・؏) 23:41, 5 June 2024 (UTC)Reply

@Sameerhameedy The list of audio files I generated finished on June 1, so if you uploaded anything after that, it will have been missed. I will label all the Persian audio files I found as Iranian (and make sure they are below fa-IPA). Benwing2 (talk) 03:01, 6 June 2024 (UTC)Reply
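
The two steps agreed on here (adding |a=IR and keeping the audio below fa-IPA) could look roughly like this for the lines of a single Pronunciation section. This is a simplified Python sketch, not the bot's actual code: `label_and_reorder` is a hypothetical name, and it assumes one template per line with no nested templates inside the {{audio}} call. Audio lines that already carry an |a= label are left as they are.

```python
def label_and_reorder(lines, accent="IR"):
    """Add |a=<accent> to unlabeled Persian audio lines and move any that
    precede the last {{fa-IPA}} line to just after it (hypothetical helper)."""
    labeled = []
    for line in lines:
        if "{{audio|fa|" in line and "|a=" not in line:
            start = line.index("{{audio|fa|")
            end = line.index("}}", start)  # assumes no nested templates
            line = line[:end] + "|a=" + accent + line[end:]
        labeled.append(line)

    ipa_positions = [i for i, l in enumerate(labeled) if "{{fa-IPA" in l]
    if not ipa_positions:
        return labeled  # nothing to reorder against

    last_ipa = max(ipa_positions)
    before = labeled[:last_ipa]
    early_audio = [l for l in before if "{{audio|fa|" in l]
    kept = [l for l in before if "{{audio|fa|" not in l]
    return kept + [labeled[last_ipa]] + early_audio + labeled[last_ipa + 1:]
```

Only audio lines found above fa-IPA are moved, so the relative order of everything already below it is preserved.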

Category:Pages using duplicate arguments in template calls

From {{audio}}. —Fish bowl (talk) 04:53, 5 June 2024 (UTC)Reply

@Fish bowl Thanks. I just encountered these myself, will fix. Benwing2 (talk) 04:54, 5 June 2024 (UTC)Reply

editing again

Hi Benwing,

RichardW57m commented in one of the discussions that "And adding {{rfc|mul|Need meaning rather than graphical description}} to an existing entry is a valid process."

Is that the way to go? Or if there's a pattern of errors, should it be a consolidated mass request? kwami (talk) 00:47, 10 June 2024 (UTC)Reply

@Kwamikagami It would be best not to mass-add the same {{rfc}} tag to a bunch of entries. Instead, make a list of all the entries you have issues with and bring that list to the Beer Parlour with specific proposals for how to fix the issues. Benwing2 (talk) 01:01, 10 June 2024 (UTC)Reply

Okay, noted on my page. kwami (talk) 01:13, 10 June 2024 (UTC)Reply

Split Yue and Split Wu

Hi, please have a look at these two threads. Are the splits ready? --kc_kennylau (talk) 13:24, 10 June 2024 (UTC)Reply

Getting rid of the old category boilers

I'm going to do some reworking of Module:auto cat so that it no longer relies on keeping the old category boilerplate templates around, since none of them are used directly anymore so there's no reason not to just do everything via direct function calls. This affects {{poscatboiler}}, {{topic cat}} and {{ws topic cat}}. Theknightwho (talk) 14:45, 22 June 2024 (UTC)Reply

@Theknightwho Agreed. I didn't even know about {{ws topic cat}}. Benwing2 (talk) 18:15, 22 June 2024 (UTC)Reply

A favour

Hi Benwing,

I've been wanting to set up an auto-conjugator for Franco-Provençal. Unfortunately I don't quite understand how the ‘architecture’ you've set up for French works as it is a bit elaborate and spread across multiple modules/templates.

I was wondering, would you mind perhaps setting up a basic architecture for Franco-Provençal by recycling some of the code used for French? The French verb-endings (etc) can be kept as placeholders for me to replace/modify.

Best,

- Nicodene (talk) 06:18, 23 June 2024 (UTC)Reply

@Nicodene The conjugator for French is in any case non-ideal; it was inherited from older code written by User:Kc kennylau and hacked on by many people. It is really messy and IMO it needs a complete rewrite along the lines of Module:es-verb, Module:pt-verb, Module:gl-verb, Module:ca-verb or the like (which are not split across multiple modules and templates). Unfortunately I don't know the first thing about Franco-Provençal and creating such modules is not as easy as merely copying another module and changing the endings; usually rather more fundamental changes are needed. On top of that, I don't even know if Franco-Provençal is standardized in its spelling; if not, that's a whole nother can of worms. If you point me to detailed (and ideally complete) information on how Franco-Provençal verb endings and irregular verbs work, I will take a look and will have a better idea how to create such a module, but I'm not guaranteeing it will get done on any specific time frame. Benwing2 (talk) 07:00, 23 June 2024 (UTC)Reply

@Benwing2 I see.

There is a standard ‘supra-dialectal’ spelling system that is widely enough accepted to have been used e.g. for an article in the JIPA. A fairly comprehensive description of the spelling system is available here. The same source also provides, among other things, a description of Franco-Provençal morphology. I'll go ahead and write up a summary, in English, of how the verbs work. This will probably take a day or two. Nicodene (talk) 07:23, 23 June 2024 (UTC)Reply

@Nicodene Great, that will be very helpful. Benwing2 (talk) 07:31, 23 June 2024 (UTC)Reply

It's done. Conjugations 1 and 2 can be automated (apart from the handful of exceptions mentioned). For conjugation 3 the verbs will require their own individualized tables.

I've also included a description of noun and adjective morphology, as well as of the orthographic consonant changes caused by changes in following vowels. The latter is already more or less handled by Module:frp-headword, which I cloned from the French counterpart and modified a bit, but perhaps it can be done more ‘cleanly’ or efficiently. Nicodene (talk) 22:33, 1 July 2024 (UTC)Reply

@Nicodene Awesome, thanks! The way that the equivalent of conjugation 3 verbs are handled in French (i.e. irregular verbs in -re/-ir/-oir) is through principal parts, along with rules that default certain principal parts to others if not explicitly given. (Same goes for the Spanish, Portuguese, Galician and Catalan verb modules.) Probably the same can be done here. Benwing2 (talk) 22:46, 1 July 2024 (UTC)Reply

@Benwing2 I see. I've yet to (definitively) complete the third-conjugation verb inflections but there does seem to be a fair amount of predictability.

You'll notice there is quite a bit of allomorphy. For instance ‘I would finish’ would be any of the following: fenirê, fenirên, fenitrê, fenitrên. In principle this could be condensed to feni(t)rê(n), at the cost of looking a bit unpleasant and also messing up any links. Alternatively we could perhaps shove some of the allomorphy into footnotes.

For the tables we already have Category:Franco-Provençal verb inflection-table templates as a starting point. All of these are incorrect in one way or another but the layout seems usable. Nicodene (talk) 23:02, 1 July 2024 (UTC)Reply

Yeah the table layout looks fine, and similar to the layout of other Romance languages. As for allomorphy, that also happens e.g. in the reintegrationist Galician norm. See amar for a typical example, or fazer for an example with up to five possible forms (in the 2nd plural pluperfect). Since the source dictionary for reintegrationist Galician identified some forms as more or less recommended, we followed suit in putting the less recommended ones in italics with a footnote. If such a distinction is made here, we could do that too, or just list the possible forms as long as there aren't too many. Benwing2 (talk) 23:41, 1 July 2024 (UTC)Reply

I suppose it's not too bad. We can go ahead and list the forms consecutively, in whatever order.

For the forms in question, the source doesn't seem to explicitly recommend one over another. My copy of Stich 2003 (a paperback dictionary) should arrive in a few weeks - if that has anything to say on the matter we can adjust accordingly.

Nicodene (talk) 03:44, 2 July 2024 (UTC)Reply

WingerBot is changing language codes to the wrong code

Hi, as you can see in this edit, while cleaning up entries to use the {{tcl}} template, your bot incorrectly changed two language codes (Central Nahuatl, nhn; and Czech, cs) to ncn, which corresponds to the Nauna language. Please correct your bot so it refrains from making such errors in the future. This edit happened a little over 7 months ago, so I don't know if any other pages were affected, but it seems probable.

Thank you. Brusquedandelion (talk) 06:40, 25 June 2024 (UTC)Reply

@Brusquedandelion Thanks for the bug report. This was probably a one-off error; anything that says 'manually assisted' by it was done by me manually editing the respective pages offline in a text editor. In these cases there isn't a bot script responsible. Benwing2 (talk) 06:47, 25 June 2024 (UTC)Reply

Gotcha. Thanks! Brusquedandelion (talk) 20:01, 25 June 2024 (UTC)Reply

caricature form

Thinking about debbil, ebery, etc, it occurs to me that finding wording that will look grammatical in a range of situations (templates) is non-trivial. This would work in {{lb}}, but in {{pronunciation spelling of}}, it'd display "a caricature of Black speech English": perhaps you know how to make it instead display "a caricature of Black speech in" in that environment, to produce "a caricature of Black speech in English", or else I suppose suppress "speech" so it produces "...Black English"? Such a label also doesn't look good in {{altform}}'s from=, but perhaps we just have to not use it in {{altform}}, unless you can think of a wording that would work there (either for use in that particular family of templates, or a better general-use wording); it does seem like most Remus-esque spellings are currently handled as {{pronunciation spelling of}}s, not {{altform}}s(?). (I also considered whether to say "a caricature..." or "caricatures...".) - -sche (discuss) 20:47, 25 June 2024 (UTC)Reply

There is support in Module:labels for changing the display form, Wikipedia link, categorization etc. depending on the "mode", where "mode" means how it's being called (whether from {{lb}}, accent qualifiers or form-of templates). You can see an example here Module:labels/data/lang/la#L-52 which changes the display and Wikipedia link for the label Ecclesiastical Latin when called from {{a}} or from the various |a= and |aa= parameters for specifying accent qualifiers. In this case, if you add a key form_of_display, it overrides the value of display for form-of templates, so maybe you can add form_of_display = "a caricature of Black speech in" or form_of_display = "a caricature of Black" or something. Benwing2 (talk) 20:55, 25 June 2024 (UTC)Reply
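
The override idea can be illustrated with a small lookup. Note that Module:labels itself is written in Lua; the following is only a Python analogue of the mechanism as described, and the label name, the mode string `form_of`, and the function `display_for` are assumptions made for the example. Only the keys `display` and `form_of_display` come from the description above.

```python
# Hypothetical label data, mirroring the shape described for Module:labels.
LABELS = {
    "caricature of Black speech": {
        "display": "a caricature of Black speech",
        "form_of_display": "a caricature of Black",
    },
}


def display_for(label, mode=None):
    """Return the display text for a label, preferring a mode-specific override.

    Assumption: an override key is formed as "<mode>_display", so the mode
    "form_of" looks up "form_of_display" and falls back to "display".
    """
    data = LABELS[label]
    if mode is not None:
        override = data.get(mode + "_display")
        if override is not None:
            return override
    return data["display"]
```

With this data, `display_for("caricature of Black speech", "form_of")` yields the shorter wording for form-of templates, while a call without a mode (as from {{lb}}) yields the full `display` text.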

Thanks for explaining how display-changing works. I can't think of a way to make both T:altform/T:altspell and T:pronunciation spelling of display sensibly, unless there is a way to send a different value to the former vs the latter:

if I set form_of_display = "a caricature of Black" then T:pronunciation spelling of can display "Pronunciation spelling of every, representing a caricature of Black English.", but T:altform displays "A caricature of Black form of every",
if I set it to "a caricature of a Black pronunciation" instead, then T:altspell and to a lesser extent T:altform display decently, "A caricature of a Black pronunciation spelling of every.", but then T:pronspell displays badly

I suppose this is not necessarily a problem: I'm sure we have other "labels" which are only intended for use in one place and not another (e.g. it'd make little sense to stick "RP" in T:lb). And on balance, I guess we should prioritize having T:pronunciation spelling of display well, and consider that to be the correct template to use for these (and consider using T:altform to be substandard)...? I was given pause for a moment by the fact that these don't represent anyone's actual pronunciation, but I see we use T:pronunciation spelling of for other potentially irreal pronunciations, like representations of Canadian or German accents that don't always reflect how a Canadian or German accent actually sounds.

You can check how it looks now in ebery, and in T:lb in debbil. I think maybe putting it in from= is better than putting it in a label; what do you think? - -sche (discuss) 23:24, 25 June 2024 (UTC)Reply

I think I agree with you that using |from= is better. However it's always possible (and not hard) to add another "mode" to distinguish {{pronunciation spelling of}} (and any other templates behaving similarly?) from {{altform}}. I don't think there are any labels currently using form_of_* so there are no compatibility issues in any case to worry about. (In general I prefer having one label to represent a single semantic concept rather than multiple labels for different templates, where you need to remember which label goes with which template.) Benwing2 (talk) 23:40, 25 June 2024 (UTC)Reply

OK. As far as I'm concerned, we can hold off on adding another "mode" for now, since I'm not sure it's needed here: my idea above for how to make T:altspell+from= display something grammatical basically entailed using the fact that the template generates "Foo spelling of bar" + supplying "pronunciation" as part of the from=, to make it display "...pronunciation spelling of...", which is . . . just making it an ersatz version of {{pronunciation spelling of}}, ha; maybe we can just consider using this label in T:altspell substandard, for now, and let the awkward way it'd display be a sign of that. - -sche (discuss) 02:58, 26 June 2024 (UTC)Reply

OK that's fine. Benwing2 (talk) 03:19, 26 June 2024 (UTC)Reply

Prakrit module errors

Hi! I found that non-mainspace pages like talkpages, user subpages, etc. using old codes like psu, pmh, etc. have module errors and have become unreadable due to their deprecation in favour of the new etymology-only codes of these Prakrit lects. Would it be possible for you to change these by bot like it was done for the mainspace pages? —Svārtava · 01:44, 26 June 2024 (UTC)Reply

I can do this although it will take some time before I get to it. Can you make a list of the old and new codes? I think there may have been two sets of old codes because I rationalized the etym-only codes at a certain point. Benwing2 (talk) 01:54, 26 June 2024 (UTC)Reply

Yes, you're right. Here's the list:

Ardhamagadhi Prakrit: pka / inc-pka → pra-ard
Helu Prakrit: elu-prk / inc-elu → pra-hel
Khasa Prakrit: inc-kha / inc-khs → pra-kha
Magadhi Prakrit: inc-mgd / inc-pmg → pra-mag
Maharastri Prakrit: pmh / inc-pmh → pra-mah
Paisaci Prakrit: inc-psc / inc-psi → pra-pai
Sauraseni Prakrit: psu / inc-pse → pra-sau

It's certainly not very urgent, you can take your time. Thanks in advance. —Svārtava · 03:25, 26 June 2024 (UTC)Reply
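
The list above translates directly into a lookup table. Below is a minimal Python sketch of how a replacement pass over wikitext might use it; `update_codes` is a hypothetical helper, not the bot's actual code. It assumes a code only counts when it fills a whole positional template parameter (directly after `|` and directly before `|` or `}}`), so named parameters such as `|2=psu` and codes in running text are not touched.

```python
import re

# Old language codes (both generations) mapped to the new etymology-only codes.
OLD_TO_NEW = {
    "pka": "pra-ard", "inc-pka": "pra-ard",
    "elu-prk": "pra-hel", "inc-elu": "pra-hel",
    "inc-kha": "pra-kha", "inc-khs": "pra-kha",
    "inc-mgd": "pra-mag", "inc-pmg": "pra-mag",
    "pmh": "pra-mah", "inc-pmh": "pra-mah",
    "inc-psc": "pra-pai", "inc-psi": "pra-pai",
    "psu": "pra-sau", "inc-pse": "pra-sau",
}

# Match a code only when it is an entire positional parameter.
CODE_RE = re.compile(
    r"(?<=\|)(" + "|".join(re.escape(c) for c in OLD_TO_NEW) + r")(?=[|}])"
)


def update_codes(wikitext):
    """Replace old Prakrit codes in template parameters with the new codes."""
    return CODE_RE.sub(lambda m: OLD_TO_NEW[m.group(1)], wikitext)
```

One caveat of this sketch: a parameter whose value is a term that happens to be spelled like an old code would also be rewritten, so the output would still need a manual review.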

T:hi-ndecl

Hello! Is it currently possible to have both the phonetic respelling of the lemma and the Perso-Arabic broken plural in the Hindi declension template {{hi-ndecl}}? It seems like the template only supports one or the other but not both.

For example, at ज़लज़ला (zalazlā) the phonetic respelling is ज़ल्ज़ला (zalzalā) and the Perso-Arabic broken plural is ज़लाज़िल (zalāzil). Although Perso-Arabic broken plurals are usually rare in Devanagari-script Hindustani, it still might be useful to display them on the lemma entry.

In |1= the syntax for phonetic respelling is

//ज़ल्ज़ला<M>

and the syntax for broken plurals in |1= is

((<M>,<M.plstem:ज़लाज़िल.dirpl:ज़लाज़िल>))

When I try to combine the two as

((//ज़ल्ज़ला<M>,<M.plstem:ज़लाज़िल.dirpl:ज़लाज़िल>))

I see both the phonetic respelling and the automated transliteration of the page title, separated by a comma in the singular column, as

ज़लज़ला, ज़लज़ला

zalzalā, zalazalā

Am I not using the syntax correctly, or does something need to be fixed in the template {{hi-ndecl}} or the underlying module MOD:hi-noun? Kutchkutch (talk) 09:53, 26 June 2024 (UTC)Reply

@Kutchkutch Can you try this?

((//ज़ल्ज़ला<M>,//ज़ल्ज़ला<M.plstem:ज़लाज़िल.dirpl:ज़लाज़िल>))

Benwing2 (talk) 18:08, 26 June 2024 (UTC)Reply

Thank you! That expression produces the output that I was expecting. It seems like you did not have to edit anything. That is good, since this has probably not taken too much of your time, and the template and module already had this capability.

Although only a few users have added Perso-Arabic broken plurals to {{hi-ndecl}} and CAT:Hindi nouns with irregular plural stem, it might be helpful to mention this example at T:hi-ndecl/documentation for future reference. Kutchkutch (talk) 05:18, 27 June 2024 (UTC)Reply

Yup, I need to document the overrides. I'll use the above example as one; do you have any other examples of individual overrides? Benwing2 (talk) 05:20, 27 June 2024 (UTC)Reply

Regarding manual overrides in general:

The only category that I am aware of to find existing manual overrides is:

CAT:Hindi nouns with irregular plural stem

अहं (ahã) and ॐ (om) are the only Sanskrit learned borrowings in that category. The rest of the terms in that category are Perso-Arabic borrowings. For अहं (ahã) and ॐ (om), the spelling of the lemma does not end in म (ma), but म (ma) appears in the spelling of the oblique plural stem.

Determining which declension patterns need examples at

T:hi-ndecl/documentation

may need some detailed consideration. If there are any interested Hindi editors, perhaps they could add important examples on that documentation page.

There are two phenomena that I can think of at the moment for which I do not know what the syntax should be to obtain the expected output.

Correct lemma auto translit, incorrect irregular plural auto translit:

For बिरादर (birādar) and अफ़सर (afsar), the automated transliterations of the lemma forms are correct. However, the automated transliteration of their irregular plurals

बिरादरान (birādrān) and

अफ़सरान (afasrān)

need to be respelled as

बिराद*रान (birādarān) and

अफ़स*रान (afsarān)

What would be the syntax for this? I tried

((<M>,//बिराद*रान<M.plstem:बिरादरान.dirpl:बिरादरान>))

and ((<M>,//अफस*रान<M.plstem: अफसरान.dirpl:अफसरान>))

but they do not produce the expected output.

See Persian_grammar#Plural for additional context about -आन (-ān) / ـَان (-ān):

In the literary language, animate nouns generally use the suffix ـان -ân

Multiple irregular plurals:

For नतीजा (natījā) & जौहर (jauhar), the comma & space are included in the link. For example, the direct plurals of

नतीजा (natījā) and

जौहर (jauhar)

have a single link for the entire list of forms:

[[नताइज, नताएज]] → नताइज, नताएज

[[जवाहिरात, जवाहरात, जवाहर]] → जवाहिरात, जवाहरात, जवाहर

instead of a separate link for each form:

[[नताइज]], [[नताएज]] → नताइज, नताएज

[[जवाहिरात]], [[जवाहरात]], [[जवाहर]] → जवाहिरात, जवाहरात, जवाहर

जवाहरात (javāhrāt) needs to be respelled as जवाह*रात (javāharāt).

For some context on these forms:

नताइज (natāij) & नताएज (natāej) are just alternative forms of the Perso-Arabic broken plural (natāyij). However,

जवाह*रात (javāharāt) & जवाहिरात (javāhirāt)

from Classical Persian (jawāhirāt) are actually double broken plurals derived from the single broken plural

जवाहर (javāhar) & जवाहिर (javāhir)

from Classical Persian (jawāhir) where it says:

Classical Persian (jawāhir) has become singular in Persian and so has its own plural forms despite its etymology as a plural form.

Kutchkutch (talk) 13:59, 27 June 2024 (UTC)Reply

After your edit to बिरादर (birādar) and figuring out how to add multiple irregular plurals, the two issues mentioned above appear to be resolved. Kutchkutch (talk) 06:53, 28 June 2024 (UTC)Reply

Great! I added an example of using .plstem:... but I will add some more. Benwing2 (talk) 06:56, 28 June 2024 (UTC)Reply

Your undoing of my close was in error

Latest comment: 2 months ago10 comments3 people in discussion

There is no reason for the RfD at hawk tuah to continue. The IP who RfDed it hasn't even participated and no one has provided any rationale at all for deletion. Furthermore, no rationale exists to have it deleted via RfD at all; IDONTLIKEIT or ITHINKITSGROSS aren't rationales for deletion here. I ask you: what is the point of that RfD continuing?

Also, let's be perfectly honest here: if anybody but me had closed that, you wouldn't have undone it. You've somehow decided that I shouldn't be on this project and are attempting to harass me off it by undoing anything questionable. Please stop. Purplebackpack89 13:03, 7 July 2024 (UTC)Reply

@Purplebackpack89 Dude. You need to Assume Good Faith or I *will* start blocking you. I have told you this before. No one but you speedy-closes RfD's after 4 days. I would recommend you not close any RfD's since you seem to believe all of them are in error and don't seem able to follow normal policy. Benwing2 (talk) 19:09, 7 July 2024 (UTC)Reply

@Benwing2 I blocked them for 3 days for this, as they do this kind of thing in almost every single discussion they participate in, and it’s getting really disruptive. Check their contribution history, too: they make these kinds of threads on tons of users’ talk pages.

See their talkpage for more details on the block. Theknightwho (talk) 19:14, 7 July 2024 (UTC)Reply

Bad and POINTY block by Theknightwho. "Almost every single discussion" is an exaggeration at best. Blocking somebody for criticizing a close, or an undo of a close, is highly inappropriate. Blocking somebody for feeling victimized is 1984 territory. Criticism is not disruption and criticizing an admin should NOT be a blockable offense. If we blocked Theknightwho every time he criticized somebody or started a fight, he'd be indeffed.

As for, "No one but you speedy-closes RfD's after 4 days.", you almost did that yourself...until you had to walk it back once you realized how hypocritical it was. Purplebackpack89 14:22, 10 July 2024 (UTC)Reply

@Purplebackpack89 I blocked you for precisely the kind of behaviour you have immediately started engaging in again, which has now escalated to false accusations of hypocrisy: Ben did not speedy close any thread, and didn't even say they had considered doing so. If you carry on causing drama like this, I will block you for a week, because it is disruptive. Theknightwho (talk) 22:17, 10 July 2024 (UTC)Reply

You will do no such thing. You will instead stop getting your jollies by blocking me and find something else to do.

FWIW, the edit I am referring to is right here Purplebackpack89 23:57, 10 July 2024 (UTC)Reply

Can you not read? Wonderfool speedy-closed the term, not me. Benwing2 (talk) 00:02, 11 July 2024 (UTC)Reply

My mistake. But if Wonderfool did it, shouldn't he have also been blocked 3 days? Or blocked indef because he's a sock?

Anyway: the main thing is that both of you need to stop blocking me, or considering blocking me, over essentially nothing. You and I disagree? Whatever! That's not grounds for a block! Purplebackpack89 00:15, 11 July 2024 (UTC)Reply

@Purplebackpack89 That isn't why you were blocked. Theknightwho (talk) 00:24, 11 July 2024 (UTC)Reply

@Purplebackpack89 The problem is not that you and I disagree. The problem is that your constant assumptions of bad faith on the part of everyone but you are extremely disruptive. Please read WP:NOTHERE and WP:Disruptive editing, as a lot of the points there apply to you. I'm giving you a last warning: next time you assume bad faith towards someone, you will be blocked for a week. Benwing2 (talk) 01:02, 11 July 2024 (UTC)Reply

Optional aliases for "[language] foos" categories

Latest comment: 2 months ago7 comments2 people in discussion

I was thinking about Category:Four-character idioms by language, since it's not a very intuitive name, but I know we struggled to come up with something language-neutral that was better. However, I don't think we need to constrain ourselves to that name when it comes to the language-specific categories themselves, where it would make more sense to use "chengyu", "yojijukugo" or whatever. Would it be feasible for categories like this to have aliases, so that languages can use a more appropriate custom name where applicable? As another example, having Category:Japanese Han characters instead of "Japanese kanji" is just silly. Theknightwho (talk) 21:43, 8 July 2024 (UTC)Reply

@Theknightwho Well, there is support for language-specific categories currently. You could use that to simulate aliases of sorts. For true aliases we might need a bit of coding; how do you envision them working? Would e.g. someone need to specify the category name as 'chengyu' or are you expecting that if they say 'four-character idioms' it auto-redirects to 'chengyu'? Benwing2 (talk) 22:08, 8 July 2024 (UTC)Reply

@Benwing2 So if you had category "foo" with alias "bar", you could create "Category:langname foos" or "Category:langname bars", but the latter would be treated as though it were the former for the purposes of subcategorisation and so on. I don't think it would be feasible to maintain some kind of special whitelist, so we'd just have to trust that people wouldn't create "Japanese hanja" or whatever. We could probably enforce this by doing existence checks against other possible aliases with {{auto cat}}. Theknightwho (talk) 22:23, 8 July 2024 (UTC)Reply

@Theknightwho What do you mean "for the purposes of subcategorization"? BTW I'm not sure why whitelists or lists of the language-specific equivalents of given categories aren't feasible. We do something similar with {{pseudo-loan}}, which displays as pseudo-anglicism if the source is English, pseudo-Latinism if the source is Latin, etc. falling back to pseudo-loan from SOURCE for non-special-cased source languages. It even displays wasei eigo (appropriately linked) if the source is English and destination is Japanese. Benwing2 (talk) 22:42, 8 July 2024 (UTC)Reply

@Benwing2 As in, "Category:Chinese chengyu" would be categorised as though it were "Category:Chinese four-character idioms", "Category:Japanese kanji" as though it were "Category:Japanese Han characters" etc. etc.

The reason I'm not keen on a whitelist is that terms like "chengyu" and "kanji" potentially apply to whole language families (Sinitic and Japonic), so it's just annoying to maintain, and I don't think there's much harm in someone technically being able to create "Chinese kanji", since if they're messing about like that then they're unlikely to care about details like this. I doubt we will need aliases in more than a handful of situations, but the number of languages which will need to use them is potentially in the dozens. Theknightwho (talk) 22:48, 8 July 2024 (UTC)Reply

Why not do the whitelist at the family level then? Benwing2 (talk) 23:16, 8 July 2024 (UTC)Reply
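To make the family-level idea concrete, here is a minimal sketch of how an alias lookup keyed on families rather than individual languages might work. The tables, family keys and function name are all hypothetical illustrations and do not reflect how {{auto cat}} or the category modules are actually structured.

```python
# Hypothetical data: canonical label -> {family key: alias used in that family}.
ALIASES = {
    "four-character idioms": {"sinitic": "chengyu", "japonic": "yojijukugo"},
    "Han characters": {"japonic": "kanji"},
}

# Hypothetical mapping from language name to family key.
LANG_FAMILY = {"Chinese": "sinitic", "Japanese": "japonic"}


def canonical_category(langname, label):
    """Return the canonical category name that 'langname label' stands for.

    A canonical label is always accepted. An alias is accepted only when it is
    whitelisted for the family of the language. Anything else returns None.
    """
    if label in ALIASES:
        return f"{langname} {label}"
    family = LANG_FAMILY.get(langname)
    for canonical, per_family in ALIASES.items():
        if per_family.get(family) == label:
            return f"{langname} {canonical}"
    return None


print(canonical_category("Chinese", "chengyu"))
print(canonical_category("Chinese", "kanji"))
```

With a table like this, an alias such as "kanji" only has to be listed once per family, and a stray combination like "Chinese kanji" simply fails to resolve.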

I guess. I've got some bugfixes to do elsewhere, but once those are done I'll have a look at it today.

On a semi-related note, there's preliminary consensus for having a Chinese equivalent to Category:Japanese kanji by reading, which will probably need to be covered in a similar way. It's slightly complicated by a couple of edge-cases (hanzi with polysyllabic readings (e.g. ⿱𡩧⿺進⿰貝招 (zhāocáijìnbǎo)) and non-hanzi with monosyllabic readings (e.g. 屄養／屄养 (biǎng), man (mê̄n), Q (kiù)), so I'm not certain on the exact name yet. Theknightwho (talk) 12:07, 9 July 2024 (UTC)Reply

A little guidance

Latest comment: 2 months ago4 comments2 people in discussion

Hi, if you have time, could you explain the easiest way to set up a conjugation module?? I made Module:User:Babr/fa-conj (though I haven't started actual conjugations yet), but I can't get the table to display. I was hoping to make a module that can create conjugations for Dari, Iranian, Kabuli & Tehrani with one input.

I think I can probably do it myself, I was just hoping you could explain how to set it up. If you have time, that is. — BABR・talk 00:30, 9 July 2024 (UTC)Reply

@Babr I can try to help you. How familiar are you with Lua, and coding in general? Benwing2 (talk) 00:36, 9 July 2024 (UTC)Reply

Hmm, well I understand some very basic elements of Lua like string functions. That is to say, I'm definitely a beginner, lol. But, even with my very basic ability, I find replacing characters contextually is a bit easier to do in Lua than it is to do with dozens of templates, which is basically why I was hoping to make a luacized conjugation table. — BABR・talk 01:19, 9 July 2024 (UTC)Reply

@Babr Well, you might start with MOD:hi-verb, which is about the most straightforward of the verb modules I've written. But even then there is significant work to do in creating a conjugation module and it goes well beyond just find and replace, because each language has its own system for how conjugations work and how to deal with irregularities, etc. Benwing2 (talk) 01:40, 9 July 2024 (UTC)Reply

quick bot change

Latest comment: 2 months ago1 comment1 person in discussion

Hey. Could you change all the pages linking at the top of User:Jberkel/lists/wanted/20240701/es linking to lloviznarse to link to lloviznar? Like this edit pls. Denazz (talk) 19:27, 21 July 2024 (UTC)Reply

something's wrong with your bot

Latest comment: 2 months ago2 comments2 people in discussion

The edit summary doesn't match the edit at all. The Great Redirector (talk) 20:14, 23 July 2024 (UTC)Reply

@The Great Redirector That falls under "misc cleanups". Since I tend to do edits offline and push them in bulk, it's impossible to get the edit summary to match in every single particular. Instead I put the major changes into the summary and one-off miscellaneous changes get listed as "misc cleanups". Benwing2 (talk) 21:52, 23 July 2024 (UTC)Reply

New Serbo-Croatian declension table

Latest comment: 2 months ago21 comments3 people in discussion

Hello, Benwing,

Hope your day is going well.

As a speaker of Serbo-Croatian, I've noticed how daunting our adjective tables appear to be. Thus, I've made a simplified design based on Czech and Slovene:

New Serbo-Croatian design

Let me know what you think.

Regards,

Kostović (talk) 12:59, 28 July 2024 (UTC)Reply

Adding to my previous comment. With the new design, the underlying system might as well be overhauled.

I've created an sh-adjective module based on cs-adjective.

Serbo-Croatian differs from Czech in that we have indefinite and definite adjective forms, e.g. dobar/dobri. The indefinite, which is used in dictionaries, can have a host of endings:

-ak, -an, -ao, -ar (the letter A disappears outside of the masculine nominative, this is called nepostojano A (eng. a/ø alternation, fleeting a, movable a));
-ag, -av, -en, -ev, -ov, -in (the final vowel stays regardless of case).
-ji (soft ending).

In Czech and other Slavic languages, the closest thing to the Serbo-Croatian indefinite form is the short form (ex. šťasten, from šťastný). However, the Serbo-Croatian definite stretches out to the nominative, genitive, dative, accusative and locative. That is why, on the design, multiple cases are marked with ind. (indefinite) or def. (definite).

If you have spare time and will, I would appreciate any assistance with making the module closer to the design.

Regards,

Kostović (talk) 17:10, 28 July 2024 (UTC)Reply

@Kostović I'm not good enough at either Serbo-Croatian morphology or Lua to talk specifics, but:

It's standard practice to set up the template page with either fake parameters/input or code that knows what to do when called from the template page itself (or at least wrap the executable part in <includeonly></includeonly>). Having a new template pop up immediately in CAT:E is not a good sign. If nothing else, you should have created the template first in your userspace where the errors don't show up in CAT:E, worked out the bugs, then copied it to template space.
The error it's throwing: Short forms can only be specified for lemmas ending in -i, but saw 'křepký', seems to show that your code is generating nonsense. At the very least, one part of your code is generating a type of output that another part of your code finds categorically wrong. Also not a good sign.

Chuck Entz (talk) 22:14, 28 July 2024 (UTC)Reply

@Chuck Entz 'křepký' is a leftover from the Czech module. Definitely this should have been put in userspace until it was ready. BTW User:Kostović I have a partly complete Serbo-Croatian adjective module that I was working on but never finished. What is lacking in your module, and what I was in the process of implementing, is support for accent diacritics. We definitely need this, and as a native speaker I imagine you can help with this. Benwing2 (talk) 22:23, 28 July 2024 (UTC)Reply

@Benwing2 @Chuck Entz

Grateful for both replies; I've remade the template in my userspace. Feel free to delete or disable the official module until testing is done.

As for accent diacritics, I would be more than glad to help. @Benwing2, what was your original idea of implementing them? I see three options:

for the base adjective only (dȍbar);
for the forms where the base accent changes, like the neuter and feminine (dȍbar, dòbro, dòbra);
all forms (dȍbar, dȍbrōg(a), dȍbrī etc.) — my preferred option, the most informative

I'm not the best at Lua, but I will explain everything the best I can. What's important is narrowing down all patterns. It should be possible.

Kostović (talk) 04:14, 29 July 2024 (UTC)Reply

@Kostović Yes, we want all the accent diacritics on all forms. If you could help me identify the different classes/patterns of accent diacritics, I can write the code to implement them. Benwing2 (talk) 06:39, 29 July 2024 (UTC)Reply

@Benwing2 Then I will first remake the design. Is it possible to make a single table that toggles between indefinite and definite forms with the press of a button? Would that require JavaScript?

If that is not an option, we can keep two tables as it is now, but streamlined as per the design. The comparative and superlative can be moved to separate pages with their own declension, just like Czech.

Kostović (talk) 07:10, 29 July 2024 (UTC)Reply

@Kostović I *think* it's possible to have a single table that toggles between indefinite and definite without JavaScript but for now I'd just go with two tables. Keep in mind that in Bosnian, even the forms that are spelled the same between the definite and indefinite differ in vowel length, with the definite having a long vowel at the end and the indefinite having a short vowel at the end, so it might make sense to have two full tables. Benwing2 (talk) 07:58, 29 July 2024 (UTC)Reply

I've made an interactive proof of concept. Feel free to give your opinion. Some points of highlight:

It has links to all declension forms, including ones such as dòbrīm(a), where the a in parentheses links to the longer form.
Comparative and superlative forms are on their own pages, with their own declensions (1) (2).
Each base form (dȍbar, bȍljī, nȃjboljī) has an accompanying adverb (dòbro, bȍlje, nȃjbolje). The latter two differ from the neuter by a short final vowel (compare the adjective bòsanskī and adverb bòsanski).

I'm not a hundred percent sure about the declension of dȍbar, specifically the feminine genitive (why does the indefinite feminine genitive dòbrē have a long ē when the accusative dòbru does not?).

Besides the standard Hrvatski jezični portal, my sources are the following:

Akcenatski savjetnik (Montenegrin)
Hrvatski mrežni rječnik (Croatian)
Srpski jezik (Serbian, requires registration)

I will consult with Croatian and Serbian linguists and see what they have to say about the table.

Kostović (talk) 05:55, 30 July 2024 (UTC)Reply

@Kostović Great, thank you! This looks good. I will take a look at the sources you've mentioned. Do you have any sources specifically on the different accent patterns in Serbo-Croatian? I don't know much at all about Serbo-Croatian accent patterns but based on the analogy of Russian I can guess that there may be two types of definite adjectives (stem-stressed and end-stressed) and three types of indefinite adjectives (stem-stressed, end-stressed and mobile), and the correspondence between the definite and indefinite patterns may not be always predictable. What I'd like to see is some rules for how to form the different accent patterns; then I can properly code them up. Note that when I say "stem-stressed", "end-stressed" and "mobile" I'm referring to the *underlying* pattern, where rising accents are considered equivalent to falling accents one syllable to the right. IMO the only sensible way to implement the accent patterns is to (a) separate length from stress, (b) do all operations in the code on the underlying accents and convert to surface accents at the very end. (This is a bit similar to how I handle vowel alternations, i.e. о-а, е-я, э-а, in Belarusian; the code always operates on the underlying vowels, which don't change as the stress moves around, and converts to surface vowels at the end.) Benwing2 (talk) 06:04, 30 July 2024 (UTC)Reply
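As a rough illustration of the approach described above (keep length separate from stress, operate on the underlying stress position, and convert to surface accents only at the end), here is a minimal sketch. It assumes the analysis stated in the comment, namely that a surface rising accent corresponds to underlying stress one syllable to the right, and it assumes the caller supplies the syllable division, vowel lengths and underlying stress position. The function names and data layout are hypothetical and are not taken from any existing module.

```python
import unicodedata

VOWELS = "aeiou"
SHORT_FALLING = "\u030f"  # combining double grave, as in dȍbar
LONG_FALLING = "\u0311"   # combining inverted breve
SHORT_RISING = "\u0300"   # combining grave, as in dòbro
LONG_RISING = "\u0301"    # combining acute
LENGTH = "\u0304"         # combining macron for unaccented long vowels


def add_mark(syllable, mark):
    """Attach a combining mark to the first vowel (or syllabic r) of a syllable."""
    if not mark:
        return syllable
    for pos, ch in enumerate(syllable):
        if ch in VOWELS:
            return syllable[:pos + 1] + mark + syllable[pos + 1:]
    pos = syllable.find("r")
    if pos < 0:
        raise ValueError(f"no vowel or syllabic r in {syllable!r}")
    return syllable[:pos + 1] + mark + syllable[pos + 1:]


def surface_form(syllables, long, stress):
    """Convert underlying stress to surface accent marks.

    syllables: list of syllable strings, e.g. ["do", "bro"].
    long: list of booleans, one per syllable (True = long vowel).
    stress: index of the underlying stressed syllable.
    Underlying stress on the first syllable surfaces as a falling accent there;
    underlying stress elsewhere surfaces as a rising accent one syllable earlier.
    """
    if stress == 0:
        accented, rising = 0, False
    else:
        accented, rising = stress - 1, True
    parts = []
    for i, syllable in enumerate(syllables):
        if i == accented and rising:
            mark = LONG_RISING if long[i] else SHORT_RISING
        elif i == accented:
            mark = LONG_FALLING if long[i] else SHORT_FALLING
        else:
            mark = LENGTH if long[i] else ""
        parts.append(add_mark(syllable, mark))
    return unicodedata.normalize("NFC", "".join(parts))


print(surface_form(["do", "bar"], [False, False], 0))  # underlying stress on do-
print(surface_form(["do", "bro"], [False, False], 1))  # underlying stress on -bro
print(surface_form(["do", "bri"], [False, True], 0))   # long final vowel
```

Because endings are added before the conversion step, the inflection code would only ever need to move an index and flip length flags; the diacritics fall out at the end. Real data will need more than this, for example the -ije- sequences and any restrictions on where length can occur.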

Srpski jezički atelje has a list of accentology literature. I'm not sure whether it's available digitally, and none of it is in English.

I've made a topic asking for additional help with dobar in particular, and I've also asked for literature that dives into adjectives.

To tell you the truth, without literature, I will need some time to comprehend how adjectives are accented—weeks or months, depending on how much free time I have. Therefore, I would at least implement the new design based on Czech, then take up accenting. Since you know the wiki better and have access to a bot, would you like to do the technical part?

Kostović (talk) 07:15, 30 July 2024 (UTC)Reply

So the thing about doing an adjective module without accents is that it would need significant rewriting to support accents, so I think it's better to put off doing this until we have at least a basic handle on how accents work. If you examine various common adjectives I imagine you'll be able to work out the basic system, esp. if you start with the assumptions I mentioned above (which are probably true). I can also look into them if you give me a list of common adjectives of various accent patterns. Benwing2 (talk) 08:03, 30 July 2024 (UTC)Reply

@Benwing2 Understood. I'll look at accents this week. Many people are on vacation, so we'll likely have to wait for expert help. Kostović (talk) 14:23, 30 July 2024 (UTC)Reply

@Kostović OK sounds good, thanks! Benwing2 (talk) 20:33, 30 July 2024 (UTC)Reply

@Benwing2 Regarding accents: once they are implemented, what will the average editor have to do when adding a new adjective? Will they need to provide the base form (dȍbar, srȅtan, lijȇp) or can the module somehow detect that too?

Kostović (talk) 05:43, 31 July 2024 (UTC)Reply

@Kostović It depends on how the accent system works exactly. However, if it is similar to Russian, probably they will have to supply the base form with accents plus some indication of what the accent class is. Depending on how the system works, we can probably supply appropriate defaults so that e.g. if they leave off the accent class, it picks the most common one that goes with the base form. This wouldn't necessarily be advisable if there are several common accent classes for a given base form, but I get the feeling most adjectives follow standard patterns and only a small number do weird things. BTW this is based off of Bosnian, Serbian, Croatian — A Grammar by Ronelle Alexander, which on pages 362-363 mentions the adjectives dobar, crn, zelen and junak and says "There is only one significant accentual alternation in adjectives, and it only occurs in a few adjectives." If this is true, probably only the base form with accents will be required. Also I looked in Hrvatski mrežni rječnik and searched for other adjectives and although I found crn, I couldn't find zelen, mlad or star, which are all very basic adjectives. Are these simply missing or am I searching wrong? Benwing2 (talk) 06:02, 31 July 2024 (UTC)Reply

You are not missing anything, the site just lacks examples for those terms. Try Školski rječnik hrvatskoga jezika instead.

I am glad you found an English source. I will look at it too. BTW: junak is a noun, the corresponding adjective is junački.

Kostović (talk) 06:35, 31 July 2024 (UTC)Reply

@Kostović Thanks! That latest site looks to have a lot of good info. Benwing2 (talk) 06:50, 31 July 2024 (UTC)Reply

I will note that some accents vary based on Croatian and Serbian sources, particularly the sequence -ije-, which stems from the Slavic yat:

Croatian: lijȇp, rijȇč, stijéna–stijȇne (nominative singular and plural)
Serbian Ijekavian: lȉjep, rȉječ, stijèna–stȉjene

As I see it, these are the only standards worth considering; there are no published dictionaries with accents for Bosnian and Montenegrin. If we have two sets of declension tables (Croatian and Serbian), it's likely that the accents for a single term will be different in all genders and cases depending on the standard.

This is why nobody has done any comprehensive tables of accents for Serbo-Croatian before. And dictionaries can be wrong. But it beats having nothing, right? Perhaps we could do something similar to Eastern and Western Armenian.

Kostović (talk) 08:14, 31 July 2024 (UTC)Reply

@Kostović Interesting, I know things get tricky when you consider ekavian vs. ijekavian but I didn't realize there were differences between Croatian and Serbian Ijekavian. I imagine most adjectives don't have this issue? If so, we can just handle this by calling the adjective declension template twice, labeling one Serbian and the other Croatian; or alternatively, use a single combined table with footnotes to distinguish the two standards (the inflection library I wrote to handle inflection tables, Module:inflection utilities, can handle this situation more or less automatically). Do you think it's worth considering ekavian accents, since AFAIK ekavian is more common in Serbia than ijekavian? I think it should be possible to derive the ekavian pronunciation from the ijekavian one in most circumstances (but not the other way around). Benwing2 (talk) 09:31, 31 July 2024 (UTC)Reply

I would keep Croatian and Serbian separate. Just like Eastern and Western Armenian, users should be able to provide info for the dialect they are fluent in and leave the other to those who are fluent in that.

Overall, I believe the most convenient option would be one table per dialect, and, in both tables, implement a toggle button for the indefinite/definite forms.

You are right about Ekavian, it has a majority in Serbia and should not be left out. I will mainly rely on Ijekavian for consistency's sake.

Kostović (talk) 12:09, 31 July 2024 (UTC)Reply

Replacement of quotation templates (August 2024)

Latest comment: 2 months ago1 comment1 person in discussion

Hi, could you please carry out the following replacements?

If |page= is used: {{RQ:Bacon Sylva Sylvarum}} → {{RQ:Bacon Sylva Sylvarum|edition=3rd}}. (Note to self: accretion is already using the 1st edition.)
If |page= is used: {{RQ:Mortimer Husbandry}} → {{RQ:Mortimer Husbandry|edition=2nd}}. (Note to self: ciderist and mend are already using the 1st edition.)
{{RQ:Pynchon Crying Lot}} → {{RQ:Pynchon Crying of Lot 49}}. If |page= is used, add |year=1976.
If |year=1594 is not used and |page= is used: {{RQ:Shakespeare Venus and Adonis}} (or {{RQ:Shakespeare Venus}}) → {{RQ:Shakespeare Venus and Adonis|year=1896}}.

Please replace only mainspace uses—uses in other spaces like Template: and Wiktionary: can be ignored. Thank you. — Sgconlaw (talk) 19:51, 1 August 2024 (UTC)Reply
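For clarity, the four rules above can be sketched as follows. This is only an illustration of the intended logic, not the bot's actual code: it assumes a flat template call with no nested templates or piped links inside the parameters, the function names are hypothetical, and skipping calls that already carry the parameter being added is an extra safety assumption rather than part of the request.

```python
def parse_template(text):
    """Split a flat '{{name|a=b|c}}' call into (name, list of raw params)."""
    name, *params = text[2:-2].split("|")
    return name.strip(), params


def get_param(params, key):
    """Return the value of a named parameter, or None if it is absent."""
    for param in params:
        if "=" in param:
            k, v = param.split("=", 1)
            if k.strip() == key:
                return v.strip()
    return None


def rewrite(text):
    """Apply the requested replacements to one template call."""
    name, params = parse_template(text)
    uses_page = get_param(params, "page") is not None

    def add(key, value):
        # Assumption: never add a parameter that is already present.
        if get_param(params, key) is None:
            params.insert(0, f"{key}={value}")

    if name == "RQ:Bacon Sylva Sylvarum" and uses_page:
        add("edition", "3rd")
    elif name == "RQ:Mortimer Husbandry" and uses_page:
        add("edition", "2nd")
    elif name == "RQ:Pynchon Crying Lot":
        name = "RQ:Pynchon Crying of Lot 49"
        if uses_page:
            add("year", "1976")
    elif name in ("RQ:Shakespeare Venus and Adonis", "RQ:Shakespeare Venus"):
        if uses_page and get_param(params, "year") != "1594":
            name = "RQ:Shakespeare Venus and Adonis"
            add("year", "1896")
    return "{{" + "|".join([name, *params]) + "}}"


print(rewrite("{{RQ:Bacon Sylva Sylvarum|page=12}}"))
print(rewrite("{{RQ:Pynchon Crying Lot|chapter=1}}"))
print(rewrite("{{RQ:Shakespeare Venus|year=1594|page=3}}"))
```

Restricting the run to mainspace would be handled by whatever selects the pages, not by this per-template function.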

Breaking change

Latest comment: 1 month ago2 comments2 people in discussion

Just a heads up that whatever edit you just made to Module:pl-adjective seems to have broken it. · • SUM1 • · (talk) 09:35, 11 August 2024 (UTC)Reply

@SUM1 Thank you, it was a bug, it is fixed now. Benwing2 (talk) 09:45, 11 August 2024 (UTC)Reply

WT:VOTES for my bot

Latest comment: 27 days ago10 comments2 people in discussion

Hi Benwing2 - thanks so much for lifting my block almost a week ago, it was really appreciated

I went ahead and created Wiktionary:Votes/bt-2024-08/User:ColumbaBushBot_for_bot_status but nobody has responded yet

What's the best way I can pursue consensus-building before the vote ends? ColumbaBush (talk) 14:43, 22 August 2024 (UTC)Reply

@ColumbaBush IMO you should post in the Beer parlour mentioning the vote and outlining what exactly you are planning on doing with the bot, as well as pointing to some existing bot-changes you've made as examples. Benwing2 (talk) 19:58, 22 August 2024 (UTC)Reply

will do - thanks (as always) ColumbaBush (talk) 20:05, 22 August 2024 (UTC)Reply

done! Wiktionary:Beer_parlour/2024/August#c-ColumbaBush-20240823070300-User:ColumbaBushBot ColumbaBush (talk) 07:07, 23 August 2024 (UTC)Reply

Thanks again for the feedback, some more people responded in support of the bot. Since the vote end date is looming, are there enough votes at this point or should I do more to rally the troops? ColumbaBush (talk) 02:57, 30 August 2024 (UTC)Reply

@ColumbaBush I gave you a supporting vote. 6-0 should be enough even if no one else responds, and in general I don't think there's a specific quorum for votes. Benwing2 (talk) 03:03, 30 August 2024 (UTC)Reply

you're fast! thanks so much ColumbaBush (talk) 03:13, 30 August 2024 (UTC)Reply

my last ask (i promise) 🙂

now that the voting period has ended, is there a process i need to follow to request any of the Wiktionary:Bureaucrats (yourself included) to set the bot flag? ColumbaBush (talk) 01:56, 5 September 2024 (UTC)Reply

Someone should close the vote as passed and then request that a bureaucrat (e.g. me) add the bot flag. The vote closing should happen shortly. Benwing2 (talk) 02:01, 5 September 2024 (UTC)Reply

thanks for the turbo reply, i appreciate it ColumbaBush (talk) 02:10, 5 September 2024 (UTC)Reply

Apabhramsa lects

Latest comment: 1 month ago20 comments5 people in discussion

Hi, Benwing2. I have a question regarding the newly created Apabhramsa language, in particular about its lects. Their codes were named as apa-[FIRST 3 LETTERS OF LECT NAME] according to the convention set by Prakrit ones (pra-mah, pra-sau, etc.) but it is problematic because apa already stands for Category:Apachean languages. My next thought is to rename them to inc-apa-[FIRST 3 LETTERS OF LECT NAME] but that would be taking the naming into 9-letter codes. As you have much experience with such stuff, what do you think would be the best code for these? (A short-lasting thought was to rename Apabhramsa as inc-apb and rename the lects as apb-[...] but I realized that while that is justified in the case of pra- which is a well known ISO code, this is not true for apb-.) Svartava (talk) 11:17, 24 August 2024 (UTC)Reply

apb is the CAT:Sa'a language.

aph is the CAT:Athpare language.

apr is the CAT:Arop-Lokep language.

apm is the CAT:Chiricahua language.

Kutchkutch (talk) 13:41, 24 August 2024 (UTC)Reply

@Kutchkutch The normal convention would be inc-[CODE], where there would be no specific indication of Apabhramsa (unless you embed it into the code itself). Benwing2 (talk) 19:02, 24 August 2024 (UTC)Reply

@Benwing2: Unfortunately, that's not really preferable because of the existence of similarly named Prakrits like "Maharastri Prakrit" and "Maharastri Apabhramsa", so it is desirable to have some indication of Apabhramsa in the code. Would inc-apa-[CODE] be fine in such a case? Svartava (talk) 19:22, 24 August 2024 (UTC)Reply

@Svartava Hmm, normally we only use long codes like that with proto-languages, although looking through Module:etymology languages/data we do have xme-ttc-nor (Northern Tati), xme-ttc-eas (Eastern Tati), etc. An alternative as I mentioned is to encode some indication of Apabhramsa into the three-letter code after inc, hence inc-ama for Maharastri Apabhramsa, inc-agu for Gurjara Apabhramsa, inc-avr for Vracada Apabhramsa, etc. Benwing2 (talk) 19:46, 24 August 2024 (UTC)Reply

@Benwing2 Well, but the codes like those are quite similar to the ones like the Prakrit ones e.g., inc-pka which were deprecated for the reason that they were sort of just 3 characters selected according to some random and uncommon pattern. What about the apb- code series, would that be acceptable since it doesn't clash with a language family, but is not a well known or established code like pra-? (At this point I'm feeling conflicted between the most precise option vs the one comfortable to type, also @Kutchkutch.) Svartava (talk) 19:56, 24 August 2024 (UTC)Reply

@Svartava We can't use apb because it's a language, as noted above. In general we should not make up new 3 letter family codes because there's a high probability they will be used for languages in the future even if they aren't currently in use. So our only options that I can think of are to use the long codes like inc-apa-mah or shorter ones like inc-ama (or inc-maa, with the a for Apabhramsa at the end; the latter is at least consistent with how we've handled Wu lects, where e.g. if the lect is called Yongxiang we would use wuu-yox, taking two letters from the first part and one from the second part, since most Chinese lects are two-part). Asking @Kutchkutch, @Theknightwho, @-sche for comment. Benwing2 (talk) 20:09, 24 August 2024 (UTC)Reply

Other than the options Benwing has mentioned, the only (other) thing I can think to suggest is to create a prefix that's in the Private Use / Reserved range qaa ... qtz, like "qap-", but to me some kind of systematic distinction like "inc-ama" or "inc-maa" for Maharastri Apabhramsa (and maybe, for even more systematic distinction, correspondingly "inc-pma" or "inc-map" for Maharastri Prakrit) seems more intuitive than removing them from inc. - -sche (discuss) 21:56, 24 August 2024 (UTC)Reply

@-sche I agree with this. I wouldn’t support using a q** code, because the first part of the code is a 3 letter family code that tells you about the language. There’s no clear reason why Apabhramsa needs to have distinctive codes, so I don’t think it’s a good idea to carve out an exception for it. My strong preference is for inc-xxx, as 9 letter codes are annoyingly long. Theknightwho (talk) 22:04, 24 August 2024 (UTC)Reply

@Svartava Without commenting on anything else, we definitely shouldn’t be making novel 3-letter codes for families or languages. apb is already used by Sa'a, so it would be a problem to use it for this as well. Theknightwho (talk) 21:59, 24 August 2024 (UTC)Reply

So it seems we are better off using six-letter codes rather than very long distinctive codes, which means we prefer reverting to slightly ambiguous codes (which we got rid of for Prakrit lects, but it doesn't seem possible to avoid them here). This would be my code preference, taking one starting letter, one letter from in between (chosen based on the pronunciation) and a for Apabhramsa:
Avahattha: inc-avh

Gurjara Apabhramsa: inc-gja

Kasmiri Apabhramsa: inc-kma

Maharastri Apabhramsa: inc-mra

Sauraseni Apabhramsa: inc-ssa

Takka Apabhramsa: inc-tka

Vracada Apabhramsa: inc-vca

I have chosen such naming because in future we might have Ardhamagadhi Apabhramsa so it can be named inc-ama. Avahattha is sometimes called "Magadhi Apabhramsa" so a hypothetical code for it is inc-mga which doesn't clash with any other lect, but it would clash with Maharastri if we chose inc-maa. In general, I think this naming would be more likely to work in future and less likely to create clashes. @Kutchkutch, Benwing2: What do you think of this set? --Svartava (talk) 03:37, 25 August 2024 (UTC)Reply

@Svartava I would prefer you just use the first two letters rather than choosing the first letter and a random letter in the middle. It will be easier to remember that way and also consistent with the way we do other codes. I don't see an issue with inc-maa for Maharastri Apabhramsa; once it's known that Avahatta is Avahatta rather than "Magadhi Apabhramsa", inc-maa would necessarily be for Maharastri. Benwing2 (talk) 03:45, 25 August 2024 (UTC)Reply

@Theknightwho Svartava Since 9 letter codes are annoyingly long, there is a comparison that can be made with Early New Indo-Aryan languages. In codes for these historical chronolects, the ‘o’ prefix is for ‘old’. The codes are:

In all of these codes, the format is inc-oxx, with the xx being the corresponding ISO two-letter code for the modern language. However, associating each of these historical chronolects with the modern chronolect does not imply

that the only descendant of the historical chronolect is the corresponding modern chronolect
and that Old xx is the only name for it

For example, Old Gujarati is

also called “Old Rajasthani”
and not just the ancestor of Gujarati since it is also the ancestor of the CAT:Rajasthani languages.

omr Old Marathi doesn’t have the inc- prefix since the ISO assigned it a three-letter code.
In the relatively recently created code inc-oaw for Old Awadhi, the three-letter code awa for Awadhi was truncated to just aw.

With the possible exception of CAT:Kamarupi Prakrit language inc-kam (which seems to be an Apabhramsa rather than a Prakrit), the Apabhramsa lects encompassed by the language code inc-apa are the immediate predecessor of all these ‘old’ Early New Indo-Aryan languages. They have been merged by the same reasoning as Prakrit.

With six-letter codes, the naming scheme would either be

inc-axx according to the usual convention

or inc-xxa according to the Wu Chinese convention

inc-axx has the advantage of the ‘a’ being more easily identifiable at the front rather than being appended at the end. However, in both cases, there are only two lect-specific letters, which doesn’t give much room for specificity.

Using the corresponding ISO two-letter code has been a highly effective way to utilise the two-letters that are available to specify the Early New Indo-Aryan chronolects. The corresponding Apabhramsa codes would be:

(ks is the CAT:Kashmiri language)

inc-amr CAT:Maharastri Apabhramsa
inc-asd CAT:Vracada Apabhramsa

(sd is the CAT:Sindhi language)

In the case of

inc-apa CAT:Takka Apabhramsa

that would be the same code as the Apabhramsa language itself. So, since the second letter of the word ‘Punjabi’ is ‘u’ rather than ‘a’

it could be

inc-apu CAT:Takka Apabhramsa

Kutchkutch (talk) 03:44, 25 August 2024 (UTC)Reply

@Kutchkutch I think this approach is too confusing esp. inc-ahi for Sauraseni Apabhramsa. In general we always try to use the lect name in the code rather than some other name; doing it that way keeps it easier to remember and maintains consistency in code naming across different language families. Benwing2 (talk) 03:48, 25 August 2024 (UTC)Reply

Taking

we always try to use the lect name in the code rather than some other name

into consideration, my proposal would then be modified to:

Kutchkutch (talk) 04:26, 25 August 2024 (UTC)Reply

@Kutchkutch These are fine with me and easy to remember as they consistently use a followed by the first two letters of the lect. @Svartava any objections? Benwing2 (talk) 04:51, 25 August 2024 (UTC)Reply

@Benwing2, Kutchkutch: I'm not trying to be nitpicky or obstructive but while this naming is consistent and works for now, it is not far-fetched to think that the first two letters can easily be common when we add more Apabhramsa lects. For example, if we merge Kamarupi, its hypothetical name would also be inc-aka but according to the middle letter approach it would be inc-kra (or inc-akr). As mentioned above that Yongxiang is truncated to yox, I think this can work as well (even if initially hard to memorize the codes, but it is pronunciation based so sort of comes naturally). --Svartava (talk) 04:58, 25 August 2024 (UTC)Reply

@Svartava Please let's only worry about that if/when it comes up in the future. Any approach has flaws but using random middle letters is IMO much worse; the Chinese approach only works because it's consistent and most Chinese lects are clearly formed of two parts, which isn't the case here. What you're proposing is inconsistent and is no better (and IMO worse) than introducing a small inconsistency in the future if/when we have to handle something like Kamarupi. Benwing2 (talk) 05:21, 25 August 2024 (UTC)Reply

I agree with Benwing. Until the status of Kamarupi Prakrit can be sufficiently clarified, the more important priority at the present is to disassociate the current Apabhramsa lects from the code apa.
The primary contributor of Kamarupi Prakrit, User:Msasag, is still somewhat active. We still need to ask him about whether merging it into Apabhramsa is appropriate. It is true that if Kamarupi were merged into Apabhramsa, its hypothetical name would also be inc-aka using the ‘first two letters’ approach just like Kasmiri Apabhramsa. However, this can be discussed later. Kutchkutch (talk) 05:30, 25 August 2024 (UTC)Reply
@Kutchkutch, Benwing2: OK, no issues. I guess, if we ever merge Kamarupi into Apabhramsa we can name that separately inc-akr as an exception to the convention (alternatively change Kasmiri to inc-aks and name Kamarupi as inc-aka, but this could be thought of if the situation arises). I agree these ones do look better than the middle letter set. We can go ahead with the change now, perhaps. Svartava (talk) 05:57, 25 August 2024 (UTC)Reply
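
The "a followed by the first two letters of the lect" rule agreed on above, and the Kasmiri/Kamarupi clash it can produce, can be sketched as follows. This is only an illustration in Python: the function names are hypothetical, and the codes it prints are simply what the rule yields, not a record of the codes that were actually assigned.

```python
def apabhramsa_code(lect_name: str) -> str:
    """Return 'inc-a' plus the first two letters of the lect name, lower-cased.

    Hypothetical helper: it only mirrors the rule described in the discussion.
    """
    first_word = lect_name.split()[0].lower()
    return "inc-a" + first_word[:2]


def find_clashes(lect_names):
    """Group lect names by generated code and keep only codes used more than once."""
    by_code = {}
    for name in lect_names:
        by_code.setdefault(apabhramsa_code(name), []).append(name)
    return {code: names for code, names in by_code.items() if len(names) > 1}


if __name__ == "__main__":
    lects = ["Kasmiri Apabhramsa", "Maharastri Apabhramsa", "Kamarupi Prakrit"]
    for lect in lects:
        print(lect, "->", apabhramsa_code(lect))
    # Kasmiri and Kamarupi both map to inc-aka, the case discussed above.
    print(find_clashes(lects))
```

A check like `find_clashes` would flag the moment a newly added lect needs an exception such as inc-akr.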

Prakrit

Latest comment: 1 month ago · 8 comments · 2 people in discussion

@Benwing2: Hi again. Since the language family code pra has been deprecated, it would be good if we change the code of CAT:Prakrit language from inc-pra to pra (which is also an ISO code). This was originally intended at Wiktionary talk:Votes/2021-03/Merging Prakrit lects into one but it couldn't happen because of it being a family code. As per here, Theknightwho has recently brought this to attention for templates like {{pra-sc}}, etc. since it is undesirable to chop off parts of language codes like this.

Will it be possible for you to convert all instances of inc-pra to pra (in mainspace and reconstruction) anytime in the coming days? I will myself deal with the edits required in various modules to reduce the workload to just mechanical conversions. Svartava (talk) 07:50, 31 August 2024 (UTC)Reply

Shouldn't we first deal with the Apabhramsa lang code changes (if not already done)? Once these are done I can help you with this. Benwing2 (talk) 07:52, 31 August 2024 (UTC)Reply

Yes, those are already done. Svartava (talk) 07:56, 31 August 2024 (UTC)Reply

All template/module edits regarding changing inc-pra to pra have been done. For now, Theknightwho has set inc-pra as alias of pra to prevent module errors.
So only bot-conversions of inc-pra to pra have to be done now on mainspace and reconstruction pages.

Svartava (talk) 15:43, 31 August 2024 (UTC)Reply
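
The mechanical part of the conversion described above (inc-pra to pra on mainspace and reconstruction pages) could look roughly like the following. This is a sketch in Python under stated assumptions, not the script that was actually run: the function name is hypothetical, and it only rewrites the code when it stands alone as a complete template argument, so longer codes or prose mentions are left untouched.

```python
import re

# Match "inc-pra" only as a whole template argument:
# preceded by "|" or "=" and followed by "|" or "}}" (optionally after spaces).
OLD_CODE = re.compile(r"(?<=[|=])inc-pra(?=\s*(?:\||\}\}))")


def convert_lang_code(wikitext: str) -> str:
    """Replace the old language code with 'pra' inside template calls."""
    return OLD_CODE.sub("pra", wikitext)


if __name__ == "__main__":
    sample = "From {{inh|hi|inc-pra|agga}}; compare {{cog|inc-pra|x}}."
    print(convert_lang_code(sample))
```

Because the pattern no longer matches after conversion, running it twice on the same page changes nothing the second time.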

OK sounds good. Benwing2 (talk) 18:39, 31 August 2024 (UTC)Reply

BTW I'll use the tracking set up by the aliases, but it won't be complete at this point. The new dump file should be released tomorrow evening maybe, and at that point I can find all the remaining cases. Benwing2 (talk) 21:03, 31 August 2024 (UTC)Reply

@Svartava I converted all the ones in Special:WhatLinksHere/Wiktionary:Tracking/languages/inc-pra. Since then, 4 more have appeared. I'll see if more appear tomorrow, and also pull out all occurrences in the dump file once it's available and see how many of these haven't been converted. Benwing2 (talk) 06:43, 1 September 2024 (UTC)Reply

@Benwing2: Thanks! I didn't know aliases tracked like this.

[9] are a few left ones Svartava (talk) 07:59, 1 September 2024 (UTC)Reply

"Benwing" Pronunciation

Latest comment: 25 days ago · 3 comments · 2 people in discussion

Hello,

How is your username pronounced? Is there an audio or IPA?

Audio (US):


If the audio above is inaccurate, then I can redo it.

Thank you Flame, not lame (talk) 11:06, 6 September 2024 (UTC)Reply

@Flame, not lame Your guess is as good as mine :), and your pronunciation sounds fine to me. Benwing2 (talk) 03:15, 7 September 2024 (UTC)Reply

that is awesome. Flame, not lame (talk) 03:50, 7 September 2024 (UTC)Reply

Adding onlyg parameter to Latin adjective headword lines

Latest comment: 25 days ago · 2 comments · 2 people in discussion

Following up on this Beer Parlour discussion on Latin adjectives used only in some genders such as victrix. The goal is to add functionality so that calls such as {{la-adj|victrīx|onlyg=f, n-pl|m=victor}} can be used to generate headword lines like " victrīx f (genitive victrīcis, masculine victor); third-declension defective adjective (feminine- and neuter-only)" and automatically place victrix in Category:Latin third declension feminine- and neuter-only adjectives. As far as I can tell, it looks like adding any parameters to Template:la-adj will require edits to Module:la-headword (which is locked) as well as Module:la-nominal. I just created User:Urszag/Sandbox/trix, Module:la-headword/sandbox and Template:la-adj-onlyg to try to test edits to suggest to the main modules, but I'm having trouble knowing where to start because of how large the modules are. Am I right in thinking that an "onlyg" parameter would need to be inserted in the local params list at do_generate_adj_forms (line 2621 in Module:la-nominal)? If so, can that be done now without it causing any negative side effects to current functionality? Urszag (talk) 02:28, 7 September 2024 (UTC)Reply
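
To make the intended mapping concrete, here is a small Python sketch of how an onlyg value such as "f, n-pl" could be turned into the headword label and category name quoted above. It illustrates the logic only: the real change would be Lua inside Module:la-headword and Module:la-nominal, and the function names and the handling of the "-pl" suffix here are assumptions.

```python
GENDER_NAMES = {"m": "masculine", "f": "feminine", "n": "neuter"}


def describe_onlyg(onlyg: str) -> str:
    """Turn e.g. 'f, n-pl' into 'feminine- and neuter-only'.

    Assumption: a number suffix such as '-pl' is dropped from the label,
    as in the example headword line quoted in the request.
    """
    genders = []
    for spec in onlyg.split(","):
        letter = spec.strip().split("-")[0]  # "n-pl" -> "n"
        name = GENDER_NAMES[letter]
        if name not in genders:
            genders.append(name)
    return "- and ".join(genders) + "-only"


def onlyg_category(declension: str, onlyg: str) -> str:
    """Build the category name for a defective adjective (hypothetical helper)."""
    return f"Latin {declension} declension {describe_onlyg(onlyg)} adjectives"


if __name__ == "__main__":
    print(describe_onlyg("f, n-pl"))
    print(onlyg_category("third", "f, n-pl"))
```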

@Urszag Let me get back to you on this shortly. I think ideally these params should not be actual template params but should be indicators inside of <...> notation but I need to review the Latin code to see how it works; it was one of the earliest inflection modules I wrote so it doesn't make standard use of Module:inflection utilities. Benwing2 (talk) 03:13, 7 September 2024 (UTC)Reply

Dogra Script font

Latest comment: 19 days ago · 5 comments · 2 people in discussion

@Benwing I have been adding links for terms in the CAT:Dogri language in the CAT:Dogra script.

Would it be reasonable to add the following to MediaWiki:Gadget-LanguagesAndScripts.css just like User:Svartava’s request at Special:Diff/80987280?

/* Dogra */
.Dogr {
    font-family: 'Noto Serif Dogra', sans-serif;
}

If such a change is appropriate, there are a lot of other Noto fonts that I would be interested in adding to MediaWiki:Gadget-LanguagesAndScripts.css for other scripts. For example, this would include Takri, Mahajani, Khojki, Multani, Tirhuta, Modi, Khudawadi, etc just to name a few.

Would it be appropriate to give me the Interface administrator right as per WT:Interface_administrators#Applying_for_this_right, or would you prefer that I make an edit request for every instance just to be safe? Kutchkutch (talk) 07:21, 10 September 2024 (UTC)Reply

I think this is reasonable but I don't know CSS that well. Maybe run it by @Surjection and/or @Ioaxxere. As for interface admin, since this gives you the ability to make changes to very fundamental parts of the interface, it would probably be better for now if you just make a list of all the changes you want. Benwing2 (talk) 20:50, 10 September 2024 (UTC)Reply

I think this is reasonable but I don't know CSS that well.

Thanks a lot for the feedback.

Maybe run it by @Surjection and/or @Ioaxxere

Okay, we’ll see if they respond in the next few days.

interface admin … gives you the ability to make changes to very fundamental parts of the interface

Thanks for this explanation of what the interface admin right means.

it would probably be better for now if you just make a list of all the changes you want

If and when I decide to add to the proposed change above, then I can let you (or the appropriate user) know.
Perhaps the advantage of asking other users (if they respond) is that they can provide input on whether the proposed change is helpful or not. This would be no different from other users that have requested changes at MediaWiki talk:Gadget-LanguagesAndScripts.css.
Just to be clear (if it wasn’t obvious already), I’m not interested in getting the right just for the sake of having it. If it’s unclear whether I need it, and there’s a risk of things going horribly wrong even with the slightest error, then that’s fine. I just mentioned it as a remote possibility if that avoids bugging other users.
Although there’s a difference in the technical skills needed for the general admin right versus the interface admin right, it was not entirely clear to me whether I needed the general admin right before I was given it. But, now I better understand what the general admin right entails after having used it the past few years.
Although I don’t claim to have any advanced competency of how to edit modules or scripts, the Lua-0 that I put on my user page is a bit of an understatement. In real life, I have taken intermediate courses in coding/programming/scripting. The Lua-0 just means that I’m often reluctant to test my ability to do that on Wiktionary.
The reason for bringing all this up is that for all these scripts and languages, I might be the only user or one of the few users who add(s) links and entries for them. So, it would just be nice to have some control or input over how they get displayed. Kutchkutch (talk) 10:45, 11 September 2024 (UTC)Reply

At the moment, I’m not interested in making edit requests to anything else that requires the Interface Admin right.

Since there is a section for scripts that don’t support bolding or italics, the scripts that I am interested in would have to be checked for that (including the most recent addition Siddham)

For the CAT:Mahajani script used for the CAT:Marwari language, the addition would perhaps be something like this

/* Mahajani */
.Mahj {
    font-family: 'Noto Sans Mahajani', sans-serif;
    font-size: 125%;
}

Kutchkutch (talk) 07:20, 12 September 2024 (UTC)Reply

Once again, thanks for implementing the two suggested changes above. And, I appreciate that you’re taking the time to read what I have to say.

I could definitely notice the difference for the Mahajani script since I put the font-size as 125% in the suggested code snippet. Previously, the Mahajani script seemed a bit too small. As a result of this change, I felt encouraged to create a few pages with the title in the Mahajani script.

Since I may be the only user who is using the Mahajani script, it might be justified to have this sitewide rather than just on my own user common.css as mentioned at WT:Fonts. This is the reason why I wouldn’t propose changes for scripts that several other users also edit in.

However, there seems to be no noticeable difference for the Dogri and Siddham scripts. This lack of any discernible difference must be because there is no such customisation, and there are probably no other fonts available for these scripts other than Noto. So these inclusions must simply serve as placeholders until such customisations are added.

I’ll let you know if and when I’d like to request any customisation for these scripts such as font-size:125%.

For now, my subsequent requests are for

I hope this isn’t too many at once, and that there aren’t any errors.

/* Kaithi */
.Kthi {
    font-family: 'Noto Sans Kaithi', sans-serif;
    font-size: 125%;
}

/* Khojki */
.Khoj {
    font-family: 'Noto Sans Khojki', sans-serif;
    font-size: 125%;
}

/* Khudawadi */
.Sind {
    font-family: 'Noto Sans Khudawadi', sans-serif;
    font-size: 125%;
}

/* Modi */
.Modi {
    font-family: 'Noto Sans Modi', sans-serif;
    font-size: 125%;
}

/* Multani */
.Mult {
    font-family: 'Noto Sans Multani', sans-serif;
    font-size: 125%;
}

/* Takri */
.Takr {
    font-family: 'Noto Sans Takri', sans-serif;
    font-size: 125%;
}

/* Tirhuta */
.Tirh {
    font-family: 'Noto Sans Tirhuta', sans-serif;
    font-size: 125%;
}

Kutchkutch (talk) 05:36, 13 September 2024 (UTC)Reply
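
Since the requested rules all follow one pattern, a list like the one above could also be generated from a small table, which makes typos in the script codes or font names easier to spot. The following Python sketch is only a convenience for preparing such a request: the helper name is hypothetical, and the 125% default is simply the value used in the snippets above.

```python
# (script name, script class used in the CSS, font family), taken from the request above.
SCRIPT_FONTS = [
    ("Kaithi", "Kthi", "Noto Sans Kaithi"),
    ("Khojki", "Khoj", "Noto Sans Khojki"),
    ("Khudawadi", "Sind", "Noto Sans Khudawadi"),
    ("Modi", "Modi", "Noto Sans Modi"),
    ("Multani", "Mult", "Noto Sans Multani"),
    ("Takri", "Takr", "Noto Sans Takri"),
    ("Tirhuta", "Tirh", "Noto Sans Tirhuta"),
]


def css_rule(name: str, code: str, font: str, size: str = "125%") -> str:
    """Format one rule in the same layout as the snippets in this thread."""
    return (
        f"/* {name} */\n"
        f".{code} {{\n"
        f"    font-family: '{font}', sans-serif;\n"
        f"    font-size: {size};\n"
        f"}}\n"
    )


if __name__ == "__main__":
    print("\n".join(css_rule(*row) for row in SCRIPT_FONTS))
```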

A request for template replacement by bot: `{{egy-alt}}` → `{{egy-alt tr}}`

Latest comment: 13 days ago · 3 comments · 2 people in discussion

Hi! Forgive my impertinence in asking for a favor, but I’m not sure who else to come to with this request. {{egy-alt}} is a very old template shortcut to {{egy-alternative transliteration of}}, a definition-line form-of template for Egyptian. Because the shortcut was created long before modern naming schemes for alt-form and form-of templates were in place, its name does not conform with the practice in other languages, where {{alt}} lists alternative forms under a lemma entry, and form-of templates have names like {{alt sp}} and {{alt typ}}. I’d like to bring Egyptian into conformity with other languages, and so rename the {{egy-alt}} shortcut to {{egy-alt tr}}, freeing up {{egy-alt}} itself as a name for a future rewrite of {{egy-hieroforms}} (which is the Egyptian template that actually does what {{alt}} does for other languages). Do you think you could use a bot to change all invocations of {{egy-alt}} to {{egy-alt tr}} across the few hundred pages that use it? It would really help with making the manual labor manageable. Many thanks in advance! — Vorziblix (talk · contribs) 23:44, 18 September 2024 (UTC)Reply

@Vorziblix Should be done. Benwing2 (talk) 04:09, 19 September 2024 (UTC)Reply
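
For reference, a rename of this kind comes down to a single guarded substitution. The Python sketch below is an illustration under assumptions, not the bot code that was used: the function name is hypothetical, and the pattern is written so that existing {{egy-alt tr}} calls and the full name {{egy-alternative transliteration of}} are not touched.

```python
import re

# "{{egy-alt" must be followed directly by "|" or "}}" (optionally after spaces),
# so "{{egy-alt tr|...}}" and "{{egy-alternative transliteration of|...}}" do not match.
EGY_ALT = re.compile(r"\{\{\s*egy-alt\s*(?=\||\}\})")


def rename_shortcut(wikitext: str) -> str:
    """Rewrite {{egy-alt|...}} calls to {{egy-alt tr|...}}."""
    return EGY_ALT.sub("{{egy-alt tr", wikitext)


if __name__ == "__main__":
    print(rename_shortcut("# {{egy-alt|nfr}}"))
```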

Perfect, many thanks again! — Vorziblix (talk · contribs) 04:11, 19 September 2024 (UTC)Reply

Quote parameter title flexibility

Latest comment: 9 days ago · 5 comments · 2 people in discussion

The fact that {{quote-book}} titles are bound to an italicised <cite> format is limiting and corresponds neither to the variety of use cases nor to the flexibility that quote templates otherwise allow for. Search for title=<span style="font-style:normal;"> to get a picture of how often I have been forced to resort to this hack. In my opinion, two measures are in order:

make |title= optional if |chapter= is present, to allow for quoting short stories or poems in cases when they are unbound to an edition;
allow for an unitalicised |title= for works such as the Bible or miscellanea lacking proper titles.

―⁠Biolongvistul (talk) 08:57, 23 September 2024 (UTC)Reply

OK, I will try to add this soon, maybe in the next couple of days. Can you give me some examples of places where you had to resort to the above hack to get an unitalicized title? Benwing2 (talk) 08:59, 23 September 2024 (UTC)Reply

@Biolongvistul Also for #1, can you give me examples of where you would like to use this proposed functionality? @Sgconlaw what do you think of this? Benwing2 (talk) 09:00, 23 September 2024 (UTC)Reply

~~I admittedly can’t remember any particular case for #1.~~ See reședea. ―⁠Biolongvistul (talk) 09:14, 23 September 2024 (UTC)Reply

See abia. ―⁠Biolongvistul (talk) 09:04, 23 September 2024 (UTC)Reply

Ancient Greek cities

Latest comment: 6 days ago · 3 comments · 2 people in discussion

Hi, could WingerBot please stop converting [[Category:grc:Cities]] to {{C|grc|Cities}}? I've been going through Category:Ancient Greek entries with topic categories using raw markup and converting the place names in it to {{place}} in the definition line so that they get categorized into more specific categories like CAT:grc:Cities in Greece etc. Thanks! —Mahāgaja · talk 10:32, 24 September 2024 (UTC)Reply

@Mahagaja Very sorry! This was running all night while I was asleep and it's done now. However, you should be able to find these by looking at the category itself; {{place}} won't normally categorize directly into Category:grc:Cities, so pretty much all of the 881 or so entries currently in that category have the category added directly. If you find there are a significant number of entries in that category that in fact are using {{place}}, I can run a bot script to compile a list of the cities that get added directly using {{C}}. I can also give you a list of the contents of Category:Ancient Greek entries with topic categories using raw markup prior to the changes my bot made. Benwing2 (talk) 21:28, 24 September 2024 (UTC)Reply

OK, thanks —Mahāgaja · talk 19:21, 25 September 2024 (UTC)Reply

Bot mistake

Latest comment: 7 days ago · 7 comments · 2 people in discussion

Looks like the robot undid its own change. Oops. -BRAINULATOR9 (TALK) 00:30, 25 September 2024 (UTC)Reply

Another example. -BRAINULATOR9 (TALK) 01:43, 25 September 2024 (UTC)Reply

@Brainulator9 Thanks. That seems to have happened because I was doing several runs at once and forgot that this would happen when they clashed. Will fix. Benwing2 (talk) 03:08, 25 September 2024 (UTC)Reply

You're welcome. Also, I noticed that instances of [[Category:langcode:Topic|{{SUBPAGENAME}}]] were not caught. This seems to have mostly been done on Reconstruction entries. -BRAINULATOR9 (TALK) 03:39, 25 September 2024 (UTC)Reply

Yeah the code excludes sort keys that have brackets or braces in them. I'm not quite sure why I added that exclusion; it's probably not needed. At the same time, the {{SUBPAGENAME}} thing isn't needed once converted to use {{C}}; I'll add a check for this. Benwing2 (talk) 03:56, 25 September 2024 (UTC)Reply
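
The behaviour discussed here (convert raw topic categories to {{C}}, and drop a {{SUBPAGENAME}} sort key because it is not needed after conversion) can be sketched in Python as below. This is not the actual WingerBot code: the function name is hypothetical, and as a simplifying assumption any other explicit sort key is left as raw markup, since the thread does not say how those are carried over.

```python
import re

# [[Category:langcode:Topic]] with an optional "|sort key" part.
RAW_TOPIC_CAT = re.compile(
    r"\[\[Category:([A-Za-z-]+):([^|\[\]{}]+)(?:\|([^\]]*))?\]\]"
)


def convert_raw_category(wikitext: str) -> str:
    """Convert raw topic categories to {{C|code|Topic}} where it is safe to do so."""

    def repl(match):
        code, topic, sort_key = match.group(1), match.group(2), match.group(3)
        # Assumption: only a missing sort key or {{SUBPAGENAME}} is handled here.
        if sort_key is not None and sort_key.strip() != "{{SUBPAGENAME}}":
            return match.group(0)
        return "{{C|%s|%s}}" % (code, topic.strip())

    return RAW_TOPIC_CAT.sub(repl, wikitext)


if __name__ == "__main__":
    print(convert_raw_category("[[Category:grc:Cities]]"))
    print(convert_raw_category("[[Category:cel-pro:Numbers|{{SUBPAGENAME}}]]"))
```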

(sees reverted edit notification) Well, if you need another example, Reconstruction:Proto-Celtic/trīs also does this with cln-type categories. -BRAINULATOR9 (TALK) 05:35, 25 September 2024 (UTC)Reply

OK thanks. I fixed my script to correctly handle uses of {{SUBPAGENAME}} after running it on language-name categories. I think you must have fixed all the instances that didn't get correctly handled by my bot (except maybe the Proto-Celtic one you just pointed out), because it didn't find any raw language-name categories needing fixing, and the categories under Category:Entries with topic categories using raw markup by language are almost entirely empty. I am deleting the empty categories as I write this, and then I'll rerun my script on the remainder of the language-name categories to catch cases like the Proto-Celtic one above. Benwing2 (talk) 05:42, 25 September 2024 (UTC)Reply

Quechua labels

Latest comment: 6 days ago · 5 comments · 2 people in discussion

Hi, first off thanks for the help with the labels. I'm kind of learning as I go with editing Wiktionary, so a lot of the formal stuff I don't know how to do. I think I have created labels for the Quechua varieties that I needed, but I don't know if I did it correctly. I edited the page Module:labels/data/lang/qu. All of these are for Southern Quechua as that's what Quechua as a stand-alone term normally means. So, I've added the terms Ayacucho (quy), Santiagueño (aka Argentinian) (qus) and Cuzco-Collao (no code), this latter one split into actual dialectal/regional differences under the terms Puno (qxp), Cuzco (quz) and (South) Bolivian (quh) and North Bolivian (qul). I've left South Bolivian as simply Bolivian because that's where the majority of Quechua speakers are and what the majority of people mean when they say Bolivian; North Bolivian is spoken but to a lesser extent, not as much as South Bolivian, ergo just Bolivian. I've also added Proto-Quechua(n) (I don't see why use the -n but alas) (qwe-pro) and Classical Quechua (qwc). Now, these two should have their own separate language section as well imo, just like Proto-Indo-European and Latin aren't subsumed under Spanish. (I also created a module section for Classical Quechua as I wanted to add labels to some entries Module:labels/data/lang/qwc–idk if it's right because I don't think there is a language module for it yet, but it confused me so much that I just went ahead and created that page and figured I could delete it later.) Anyways, as to the regions where the varieties are spoken, it's kinda in the name with a few extra regions, but if you want more specificity I can make a list with the exact provinces/districts where they're spoken.

Regarding the terms of other varieties within "Quechua", there are very very few that have been added, also a reason why I wanted to separate them. I've only seen the suffixes -chaw and -yku being put under "Ancash", "Junín" (aka Wanka) and "San Martín" (aka Huánuco). I don't think they've been properly categorised since there were no labels in Module:labels/data/lang/qu for them, so they were just comments using the {{qualifier:}} code. So my guess is it's not systematic, but we can delete them as we edit entries in "Quechua" since there are so few of them—I could do a couple rn. SantiChau23 (talk) 05:42, 25 September 2024 (UTC)Reply

Thanks. Yes, Module:labels/data/lang/qu is the right place to add these things for now until we split Quechua. I agree we will need Proto-Quechua(n) as a separate language once we split, regardless of how the split is done. I don't know enough about Classical Quechua to indicate whether it should be a separate language or a language variety (and in the latter case, a variety of what?) and in general we shouldn't be creating lang-specific label modules for nonexistent codes so I would move the stuff in Module:labels/data/lang/qwc to Module:labels/data/lang/qu for now. Hopefully we'll resolve the Quechua split issue in a few days. It will help a lot to have someone like you around who is knowledgeable in the specifics of the Quechuan varieties and seems to be able to pick up the current ways things are done fairly quickly. As for the regions where each variety is spoken, the point of doing that is so that the category page correctly displays those regions. You don't necessarily need to indicate the exact provinces or districts if there are a bunch of them, but IMO we should do so if a given variety is confined to one or a small number of such administrative divisions. If you look for example in Module:labels/data/lang/zh you can see an example of a highly ramified set of varieties, grouped into a tree of higher-level and lower-level varieties, each of which includes specific region information on where the variety is spoken (from regions as small as particular neighborhoods in a city to as large as entire provinces, or to particular counties in a given province or in general whatever level of specificity makes the most sense). Benwing2 (talk) 05:55, 25 September 2024 (UTC)Reply

Yeah, I'll see exactly how the varieties are consigned in the literature to see where they say a variety is spoken. For example, there is Chanca which is normally equivalent to Ayacucho, but it's also spoken in half of the neighbouring region of Apurímac, and then the other half speaks Cuzco Quechua basically, but I've seen it subdivided into "Apurímac" or even "Eastern Apurímac" Quechua. Now, I don't think there's enough distinction in a linguistic or pragmatic sense to divide it and it's basically just Ayacucho-Chanca spoken on one side and Cuzco spoken in the other.

As for Classical Quechua, I think it should be included but not a hill I'm willing to die on. I think more people are gonna agree to have it as a variety of Southern Quechua (if in the future we actually get to create it; if not, then just of Quechua). Its relationship with modern Quechua is akin to Early Modern English aka Shakespeare's English and actual modern English: there's a lot of mutual intelligibility, but a lot of other times there isn't (think of schoolchildren trying to read Shakespeare and having an aneurysm with spellings and weird phrases). But also the relationship is more like Ecclesiastical Latin with Romance languages, the latter did not descend from the former, but they're like "sister" varieties. Also, for Quechua I don't think the problem is so much the grammatical constructions themselves, which seem to be quite similar, but more so the spellings, which are still kinda pervasive to this day. So, I'd say we create a kind of "etymological only language" or however it's termed in Wiktionary and have meanings under that label (maybe instead of "historical", be more exact with "Classical"?).

I say it should be separate and treated kind of as Middle English since it has a different way of being organised (cf. Runa Simi Taqi by Szemiński) where there's a standardised way of consigning the forms. Also, some meanings are different since they were acquired in the colonial era and even more so transfigured by how the Spanish were using them. If you want an example of how it might look as an independent language, look at Quechua inka and inqa vs. Classical Quechua inka and inqa, noting the discrepancies in spelling. SantiChau23 (talk) 17:25, 25 September 2024 (UTC)Reply

@SantiChau23 OK thanks. I don't completely understand the ins and outs of the different Quechua varieties but we can treat Classical Quechua either as its own language or as an etymology-only variety of Southern Quechua or whatever (I agree with using a Classical label in that case; this is how we do it with other Classical varieties). Let's see how the splitting discussion proceeds. Benwing2 (talk) 21:41, 25 September 2024 (UTC)Reply

Yeah, I was rambling a bit. I think it can go either way, and maybe it'd be better to treat it like Early Modern English, i.e. as an etymology-only variety. SantiChau23 (talk) 21:50, 25 September 2024 (UTC)Reply

-aceous

Latest comment: 5 days ago, 1 comment, 1 person in discussion

Hey. Could you do some mass changes like this? There are some 483 entries on Wiktionary:Todo/compounds not linked to from components/2024-01/page 8 that have the same error (found by searching for "aceous"), like bolbitiaceous. Denazz (talk) 13:11, 27 September 2024 (UTC)Reply

rsk-IPA issues

Latest comment: 3 days ago, 2 comments, 2 people in discussion

You might recall being asked a while back to fix multiword entries for the rsk-IPA template. Well, that's fixed now, but could you now fix monosyllabic words? For some reason there are stress marks on monosyllabic words (e.g. дзень (dzenʹ)), and I don't know why. In addition, in words that start with two consonants in a row, the stress marker is for some reason placed on the second consonant, as with фляша (fljaša). If you could fix that, it would be greatly appreciated. Thank you. Insaneguy1083 (talk) 10:50, 28 September 2024 (UTC)Reply

Stress on monosyllabic words is not an error; it's done on purpose. You simply have stress there. Thadh (talk) 13:52, 28 September 2024 (UTC)Reply

