Archive

Catalan inflections

Hi Ben, any chance we could have automatic Catalan inflections? There's User:DTLHS/catalan bot requests, but it doesn't seem to be running very often, and it's tedious to add manually to a list. Jberkel 18:12, 11 December 2023 (UTC)

@Jberkel Yeah I have looked into this. The thing is that I'd probably have to rewrite Module:ca-verb to work like Module:es-verb or Module:pt-verb. The Spanish, Portuguese and Galician modules were all written mostly by me and implement JSON fetching of the inflections as well as {{es-verb form of}} and similar to automatically fetch the correct inflections for a given verb form. The former wouldn't be too hard to add to the existing module but the latter would be painful, and it would probably be better to rewrite the module instead. I have looked into doing this but I don't have that good a handle on Catalan verbs, esp. those in -er/-re. Do you have any good references that explain how Catalan verbs work, especially focusing on the -er/-re verbs, which is where the irregularities seem to be? The current module seems to push a lot of the complexity down into the template call, e.g. veure's invocation looks like this, which is a mess:





I'd want to have this stuff all in the module itself, similarly to what's being done for Spanish, Portuguese, Italian, French, etc. Benwing2 (talk) 23:15, 11 December 2023 (UTC)

Ok, thanks for looking into it, I sent you some reference material via email. Jberkel 09:01, 12 December 2023 (UTC)
@Jberkel Thanks, I received it. Benwing2 (talk) 21:12, 12 December 2023 (UTC)
@Jberkel I have a question, not sure if you know the answer. In -ar verbs whose root vowel is e or o, is that vowel pronounced è or é (or ò or ó for roots in o) in root-stressed forms (e.g. first-singular present indicative), or does it vary from verb to verb? In Proto-Romance it varied from verb to verb, and this is still the case in modern Italian. Spanish has a reflex of that in verbs that unexpectedly have ie or ue in root-stressed forms, but Portuguese has regularized the vowel quality (for example, using low-mid vowels in -ar verbs). I think in conservative varieties of Occitan at least, it varies from verb to verb, and this is reflected in the spelling. Benwing2 (talk) 08:41, 13 December 2023 (UTC)
Pinging @Vriullop from ca.wikt. Ultimateria (talk) 23:28, 13 December 2023 (UTC)
@Vriullop @Ultimateria It appears that it varies from verb to verb in Catalan, at least based on the two verbs pegar, which ca.wikt says has /ɛ/ in Central Catalan (consistent with its origin from Latin short ĭ), and membrar, which ca.wikt says has /e/ in Central Catalan (again, consistent with its origin from Latin short ĕ). But the situation is complicated by the dialects, where many dialects have /e/ for both verbs. I'm interested in finding a dictionary that indicates these vowel qualities so that maybe we can include them in the conjugation table, similarly to how the French and Italian conjugation tables give pronunciation; this would only be for Central Catalan for now (maybe forever), since the dialects are complicated. Benwing2 (talk) 00:19, 14 December 2023 (UTC)
BTW if what I've said is correct, where can I find in Catalan dictionaries the indication of how the stressed vowel is pronounced for a given verb? Benwing2 (talk) 05:32, 14 December 2023 (UTC)
For variation in dialects see the notation used with {{ca-IPA}}: ê for /ɛ/ in Central, /e/ in Valencian and /ə/ in Balearic. Similarly with ô; è, é, ò and ó have no variation. This is fairly consistent, with few exceptions.
It is etymological, ê from Latin ĭ or ē, but with some exceptions.
The only dictionary that indicates the rhizotonic stress is the DNV, for example membrar says é, but it is only for Valencian and it could be either ê or é. It is only helpful for è and ò. I have not found any other source that systematically indicates the rhizotonic stress; even the dictionary of pronunciation I have on my bookshelf only includes some paradigmatic verbs. Frankly, there are some verbs whose pronunciation I don't know, apart from my personal perception, which is not a good sample. The only clue is a noun related to the verb, and the etymology of inherited ones. On ca.wikt I include a rhizotonic parameter verb by verb with ca-IPA notation. Vriullop (talk) 09:25, 14 December 2023 (UTC)
@Vriullop Thank you! I wonder why Catalan dictionaries are so bad at including the rhizotonic vowel quality patterns. Pretty much all monolingual Italian dictionaries list the rhizotonic quality (and position) for all verbs. What about the pronunciation of other forms, such as verbs with pres 3s in -ou or -eu? Are there any dictionaries indicating the vowel quality of these and other endings? Thanks for any help you can give. Benwing2 (talk) 09:57, 14 December 2023 (UTC)
I'm not sure what you mean; 'mou' from 'moure' and 'veu' from 'veure' have the same stress as the infinitive.
Endings that may be ambiguous, without any graphic accent:
  • -em, -eu, as in cantem, canteu, cantarem, cantareu: ê
  • -essis, -essin, as in cantessis, cantessin: é
  • -eres, -eren, as in temeres, temeren: é
  • infix -eix- (-eixo, -eixes, -eix, -eixen, -eixi, -eixis, -eixin): ê, but not used in Valencian, which changes it to -ix-
This is a summary from different sources, coherent with the etymology. Vriullop (talk) 12:38, 14 December 2023 (UTC)
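The ending defaults summarized above could be captured in a small lookup. What follows is only a minimal Python sketch, not code from Module:ca-verb or Module:ca-IPA; the name `ending_vowel_hint` and the table layout are hypothetical, and it assumes the input is an unaccented verb form that is stressed on one of the listed endings.

```python
# Hypothetical sketch: default quality of the stressed "e" in verb endings
# that carry no graphic accent, following the summary in this thread
# (ca-IPA notation). Longer, more specific endings are listed first.
ENDING_HINTS = [
    # -eix- infix forms: ê (Valencian uses -ix- instead)
    (("eixo", "eixes", "eixen", "eixi", "eixis", "eixin", "eix"), "ê"),
    # -essis, -essin: é
    (("essis", "essin"), "é"),
    # -eres, -eren: é
    (("eres", "eren"), "é"),
    # -em, -eu: ê
    (("em", "eu"), "ê"),
]


def ending_vowel_hint(form):
    """Return the notated vowel for a matching ending, or None."""
    for endings, vowel in ENDING_HINTS:
        if form.endswith(endings):  # str.endswith accepts a tuple
            return vowel
    return None
```

For example, `ending_vowel_hint("cantem")` gives `"ê"` and `ending_vowel_hint("cantessin")` gives `"é"`. The lookup is purely suffix-based, so root-stressed forms such as mou or veu, which follow the infinitive, would need to be handled before it is consulted.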
@Vriullop OK thanks, I suppose that the DCVB dictionary gives the infinitive pronunciation of words like moure. This is very helpful; if I have other questions I'll let you know. Benwing2 (talk) 19:55, 14 December 2023 (UTC)
DCVB is fine for pronunciation, but in some cases it is incomplete or confusing. If necessary, you can compare it with the GDLC in the link "francès" that includes translation ca-fr and also pronunciation in Central Catalan, and the DNV for Valencian. Vriullop (talk) 20:57, 14 December 2023 (UTC)
@Vriullop Thanks! Benwing2 (talk) 21:41, 14 December 2023 (UTC)
@Jberkel I wrote a preliminary Catalan conjugation module; see User:Benwing2/test-ca-conj for examples. It has a few bugs in it that I'm working out, but it's close. Benwing2 (talk) 22:13, 17 December 2023 (UTC)
Already looking good, thanks for working on this! Jberkel 22:26, 17 December 2023 (UTC)

The pronunciation of feu is correct: the 2nd pl. is regular with -eu, and the irregular past was spelled féu in the pre-2016 orthography, which is more helpful.

The pattern /e/ in Central and /ɛ/ in Valencian is possible, but rare. It can appear for different reasons:

  • Pronunciation of stressed e is not as uniform in Central Catalan as in other dialects. For example, a given word can be /e/ in Barcelona and /ɛ/ in Girona or vice versa. In general, one of the two is considered formal and the other local or dialectal. The formal one is usually the expected one or the same as in Valencian and Balearic.
  • Recent loanwords may have hesitations in their adaptation. They are usually adapted with è, but with é for the Spanish ones.

The DCVB indicates these local details. In this case I trust the GDLC more. The DCVB comes from fieldwork in the 1920s. Some of the pronunciations have not been registered in other late 20th c. fieldwork. The GDLC compiles the pronunciation of the main reference work used for radio and TV speakers in Central formal speech. In short, this pattern is rare in formal pronunciation. As far as I can remember, it doesn't happen with verb forms, and it can be treated like other irregular cases that do not follow an expected pattern. --Vriullop (talk) 18:00, 19 December 2023 (UTC)

Although the /e/-/ɛ/ pattern above is rare, the reverse is more common: /ɛ/ in Central and Balearic, /e/ in Valencian. This is noted on cawikt as ë (double e), a variant of ê (triple e). Stressed schwa in Balearic is used in inherited words and inflections. In cultisms or loanwords (e.g. cafè), or just words perceived as literary (e.g. mestre), it is /ɛ/ as in Central instead of schwa. There are indeed verb forms with rhizotonic vowel ë. There is no equivalent with stressed o, but for consistency it could be noted ö (double o) instead of ô. Vriullop (talk) 08:02, 21 December 2023 (UTC)
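To make the e-notation discussed so far easier to compare, here is a small Python table. It is a sketch only and does not reproduce the data structures of Module:ca-IPA or ca:Module:ca-pron/AFI; the names `E_NOTATION`, `stressed_e` and `differing_dialects` are hypothetical. The values are taken from the descriptions in this thread: è and é do not vary, ê is /ɛ/ in Central, /e/ in Valencian and /ə/ in Balearic, and ë is /ɛ/ in Central and Balearic and /e/ in Valencian.

```python
# Hypothetical table: notation for stressed "e" -> phoneme per dialect,
# as described in this thread (not the actual module data).
E_NOTATION = {
    "è": {"Central": "ɛ", "Balearic": "ɛ", "Valencian": "ɛ"},
    "é": {"Central": "e", "Balearic": "e", "Valencian": "e"},
    "ê": {"Central": "ɛ", "Balearic": "ə", "Valencian": "e"},
    "ë": {"Central": "ɛ", "Balearic": "ɛ", "Valencian": "e"},
}


def stressed_e(notation, dialect):
    """Look up the phoneme that a notated vowel stands for in a dialect."""
    return E_NOTATION[notation][dialect]


def differing_dialects(a, b):
    """List the dialects in which two notations give different phonemes."""
    return [d for d in E_NOTATION[a] if E_NOTATION[a][d] != E_NOTATION[b][d]]
```

Here `differing_dialects("ê", "ë")` returns `["Balearic"]`, which reflects the point made above: the two notations only part ways where Balearic has stressed schwa in inherited words but /ɛ/ in cultisms and loanwords.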
@Vriullop Thanks for all your help. I have implemented ë in Module:ca-IPA. Can you help me by fixing the default rules in the module that currently default to ê to instead default to ë when it's correct? For example, cens defaults to cêns when it should be cëns. This is in the mid_vowel_e() function of Module:ca-IPA. I don't know Catalan well enough to fix it myself, and the corresponding cawikt module in ca:Module:ca-pron/AFI seems to have the same rules we currently have. Benwing2 (talk) 20:49, 21 December 2023 (UTC)
As stressed schwa depends on inherited vs. cultism, there is too much variation with -ens, -ena, -enes endings to be able to redefine the rule. I have added tracking and I have checked where it was being applied by default. After adding the hint ê or ë, I think it is safer to remove this rule: Special:WhatLinksHere/Template:tracking/ca-IPA/ens-ena-enes. Later, I'll look at other rules with default ê. Vriullop (talk) 09:19, 22 December 2023 (UTC)
@Vriullop Thank you. I agree about removing the rule. In general I'm not much in favor of rules like this that are wrong a significant fraction of the time, and prefer to be explicit except when it's nearly completely predictable. Benwing2 (talk) 11:06, 22 December 2023 (UTC)
@Vriullop I just discovered that cerndre is irregularly missing the first r in pronunciation. Does this carry through to inflected forms like cerno, cerns or are they pronounced regularly with /r/? Benwing2 (talk) 03:05, 24 December 2023 (UTC)
BTW there is a bug in cawikt's handling of Balearic pronunciation with ê; hard /k/ shows up as /c/ in the first of two alternants. See ca:cerca for an example. Benwing2 (talk) 03:08, 24 December 2023 (UTC)
@Vriullop OK, I have several more questions. I'll try to list them all here and avoid pinging you individually.
  1. cors "privateering campaign" and cors "Corsican" are given without the /r/ in Eastern Catalan pronunciation both here and in cawikt. However, GDLC says /kórs/ for the former and /kɔ́rs/ for the latter. Which is correct, and if the /r/ is correct, do we need to update Module:ca-IPA?
  2. I am going through mid-vowel verbs trying to update the inflected forms to have the correct vowels. I am probably going to implement something soon in {{ca-conj}} and/or {{ca-verb}} to let you specify the mid-vowel quality and display it, similar to what cawikt does. I cannot determine the vowel quality of the following verbs so far: cessar, conrar, copar, copsar, crepar, dopar, drenar, gestar. Can you help?
  3. I am going to update Module:ca-IPA so you can individually specify the pronunciation of different dialects, as I have found some need for this. Apropos of this, I notice that the cawikt version of {{ca-pron}} supports ẽ; do you think we should support this, or just use the per-dialect support I am going to add?
  4. Also, I'm more and more convinced that we should have few default rules for mid-vowel quality, and require it to be given explicitly in all cases that don't involve a well-known affix.
  5. fossa "pit, grave, etc.": does it have /o/ [per GDLC] or /ɔ/ [per DNV, DCVB and cawikt]?
  6. llei "law": does it have /e/ or /ɛ/ in Eastern Catalan, or some complex mixture? cawikt says /ɛ/, GDLC says /e/, DCVB says a complex mixture.
Thanks for your help, Benwing2 (talk) 06:41, 24 December 2023 (UTC)
Lot of stuff here, but I'm happy to help.
  • 'Cerndre' loses its first r when followed by the sequence -ndr-, that is, in the infinitive, future and conditional. All other forms have regular pronunciation. This also happens with prendre and derived verbs. See ca:Categoria:Rimes en català -ɛndɾe, which includes 14 verbs ending in -prendre. The sequence -rndr- only occurs in 'cerndre', and there is no other term with the sequence -rendr- apart from these 14 verbs.
  • /c/ in Mallorcan is an allophone of /k/, i.e. local pronunciation [məˈʎɔ̞ɾ.ca̟]. You're right, this is phonetic and not phonemic. Catalan works often include some phonetic symbols in phonemic representations for dialectal contrast, but this is not the case for [c], which has restricted use. I plan to remove it as misleading.
  • 'Cors' fixed on cawikt. This r is really retained, respelled 'corrs'. The module should not assume the loss of -r(s) in final coda for monosyllables. While most polysyllables lose it, most monosyllables don't. The problem is how to manage that.
  • My guess on rhizotonic vowels:
    • cessar: é; inherited from Latin ĕ not followed by an opening context, and DNV é.
    • conrar: ó; from unstressed o, reduction of conrear, DNV ó.
    • copar: ó; from French /u/ and analogous to noun copa, DNV ó.
    • copsar: ó; inherited from Latin ŭ, DNV ó.
    • crepar etym 1: ë; like the noun crep from the same French root, neologism not attested in Balearic, DNV é.
    • crepar etym 2: é; from Latin ĕ, only used in Balearic.
    • dopar: ó; neologism as in Spanish, close to the English original, DNV ó.
    • drenar: é; idem.
    • gestar: é; from Latin ĕ, like the noun gesta from the same root, DNV é.
  • Notation ẽ is hardly used. It is better to fix that with per-dialect parameters: ca:Special:Diff/2245937. I'll remove it on cawikt.
  • Some rules for mid-vowels are theoretically justified. I still need to review the unwanted side effects. I agree that it shouldn't lead to erroneous results.
  • Fossa should be ò from Latin ŏ, but there have been some modern changes during the 20th c. that I am still unable to explain. The DCVB shows the situation in the first third of the 20th c. in accordance with etymology. Probably it is hesitant in Central today. In this case, I would say ó in Central and ò in Balearic and Valencian, two more conservative dialects.
  • Llei fixed on cawikt. From Latin ē it should be ê, but the diphthong has changed it: é in most Central, retained è in northern Central, /ə/ in Balearic, é in Valencian.
Vriullop (talk) 18:27, 26 December 2023 (UTC)
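The two r-related points above (the r lost before -ndr- in cerndre and prendre, and final -r(s) kept in monosyllables such as cors) can be sketched as follows. This is a hedged Python illustration, not the logic of Module:ca-IPA: the function names are hypothetical, the functions operate on plain spellings rather than on the module's internal representation, and the syllable count is approximated by counting vowel groups, which is an assumption that will miscount words with hiatus.

```python
import re

# Rough syllable proxy (assumption): one syllable per run of vowel letters.
VOWEL_GROUP = re.compile(r"[aeiouàèéíïòóúü]+")


def drop_r_before_ndr(word):
    """Drop the r of -rndr- (cerndre) and of -rendr- (prendre and derivatives)."""
    return re.sub(r"r(?=e?ndr)", "", word)


def drop_final_r(word):
    """Drop final -r(s) by default, but only in words with 2+ vowel groups."""
    if len(VOWEL_GROUP.findall(word)) < 2:
        return word  # monosyllable: keep the r, as in 'cors'
    return re.sub(r"r(s?)$", r"\1", word)
```

With these definitions, `drop_r_before_ndr("cerndre")` yields `"cendre"` while `"cerno"` is left alone, and `drop_final_r("cantar")` yields `"canta"` while `"cors"` keeps its r. Since the thread notes that "most" polysyllables lose the r and "most" monosyllables keep it, any such default would still need a per-word override.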
@Vriullop Thank you! I have applied the changes offline to the specific verbs and other words mentioned above, and I will push them soon. Still working on Module:ca-IPA. A few more questions:
  1. More verbs where I'm not sure of the rhizotonic vowel quality: menar "to lead" (is this ê?), menjar "to eat" (apparently it uses now-deprecated ẽ?), mentir "to lie" (?), molar "to mock" (from Spanish; ó?).
  2. mesa "altar, mense, table": cawikt says /e/ for both East and West, which agrees with DCVB, but GDLC says /ɛ/. Mistake?
  3. messes "harvest time": again, cawikt says /e/ for both East and West, which agrees with DCVB, but GDLC says /ɛ/.
Benwing2 (talk) 05:46, 27 December 2023 (UTC)
  • 'menar': ê.
  • 'menjar': é but Balearic ə. I'll modify the rizo parameter to accept an explicit /e/, /ə/, only used here.
  • 'mentir': é in forms without -eix-.
  • 'molar', to rock, from Spanish: ó.
  • 'mesa' as a noun has two etyms with different pronunciations, but GDLC only shows one in translations. Here DCVB is correct.
  • 'messes', I would say é but irregular è in Central.
Vriullop (talk) 09:47, 27 December 2023 (UTC)
@Vriullop Thanks for your quick response! I have made the offline updates. Some more questions (for N and O) ...
  1. noble: I already pinged you about this. DNV says /o/ for Valencian but DCVB says /ɔ/.
  2. nombre: cawikt and DCVB say /o/ for Eastern Catalan but GDLC says /ɔ/. /o/ is etymologically expected.
  3. odre: Same. cawikt and DCVB say /o/ for Eastern Catalan but GDLC says /ɔ/. /o/ is etymologically expected.
  4. ofi "office": Vowel quality? Maybe /o/ since the o is unstressed in oficina?
  5. oi: DCVB splits the interjection into /ɔj/ "yes" from Latin hoc and /oj/ (expression of pain or surprise). GDLC and DNV group these two meanings and say the pronun for both is /ɔj/. Who is right?
  6. orla "border, fringe": DCVB and cawikt say /ɔ/ for Valencian. DNV says /o/.
  7. oro "suit in a Spanish deck of cards": Same as previous: DCVB and cawikt say /ɔ/ for Valencian. DNV says /o/. (Not in GDLC.)
Benwing2 (talk) 01:03, 28 December 2023 (UTC)
For P:
  1. peli "film" (clipping of pel·lícula): cawikt says pel·li has ê, so I assume this is the same, but it seems strange to have ê for a recent coinage.
  2. perca "perch (fish)": cawikt says /ɛ/ for Valencian but DNV says /e/. DCVB doesn't give a pronunciation.
  3. pesta "plague": cawikt and DCVB say /ɛ/ for Central but GDLC says /e/ (mistake?).
  4. pleca "vertical bar": Balearic vowel? Is it ê?
  5. poblar "to populate": DNV says stressed vowel is /o/ despite poble having /ɔ/. Mistake?
  6. porro "leek; spliff": cawikt and DCVB say /ɔ/ but both GDLC and DNV say /o/.
  7. posa "pose" (not in cawikt): GDLC says /o/ despite this being derived from posar, which has /ɔ/. (Are there two different pronuns/etyms here?)
  8. postres "dessert": cawikt and DCVB say /ɔ/ for Valencian but DNV says /o/.
  9. pregar "to pray": Presumably /e/ (same as prec)?
Benwing2 (talk) 05:23, 28 December 2023 (UTC)
For P:
  • 'peli' is an informal spelling of 'pel·li'. The latter is used in the press and has been consolidated, unlike other clippings. I spontaneously pronounce it è just like any word beginning with consonant + stressed e + l, including inherited ones from Latin both ĕ and ē. Being of general use and not exclusively colloquial, I would say ê, fully adapted in Central and the same value as unstressed in Balearic and Valencian.
  • 'perca': ë. Expected é but è per context C+ĕ+r, not fully changed in learned borrowings.
  • 'pesta' is weird, expected é but with some irregular è not sufficiently explained in context C+ĕ+s. From the sources, è but irregular é in Central, although the irregularity is the other way around.
  • 'pleca': ë, as a technical word, schwa is improbable in Balearic.
  • 'poblar': I can't find any explanation for the difference between 'poble' and 'pobla'. Without any confirmation, for now I would say ò.
  • 'porro': ó. Expected ò but usually changes to ó before -rr-.
  • 'posa': noun ó and verb ò. Expected ò both from 'pausa' and 'pausare', but most current senses of the noun are calques of French or Spanish, both ó.
  • 'pregar': é.
Vriullop (talk) 13:30, 29 December 2023 (UTC)
On cawikt the pronunciation was first added according to DCVB. Revision with GDLC is partial, not completed. Inclusion of pronunciation on DNV is recent, not yet checked. Your guesses are usually correct.
For N and O:
  • 'noble': ô. Expected ó, on first syllable changed to ò per consonant context, except in areas with Mozarabic influence as in Valencian.
  • 'nombre': ô. The same case, but I trust DCVB for Balearic with irregular ó.
  • 'odre': ô, but Balearic ó.
  • 'ofi', I've never heard it in Catalan. My guess is ó either from an unstressed vowel or from Spanish.
  • 'oi' both ò and ó. I trust DCVB with three groups, the last one used especially in Balearic. The two authors of the DCVB were Balearic, and both 'oi las' (surprise) and 'ois' (moans) sound familiar to me from hearing Balearic people. Probably outside the Balearic Islands people don't care about the difference, with barely used senses.
  • 'orla': ô. Again, an expected ó changed to ò except in Valencian, confirmed in descriptive works.
  • 'oro': ô, hesitant by analogy with inherited 'or'.
Vriullop (talk) 15:32, 28 December 2023 (UTC)

For R:

  1. reble "rubble": cawikt and DCVB say ê, but GDLC says /e/ for Central Catalan.
  2. recar "to regret": DNV says /e/; DCVB suggests /e/ everywhere, is that right?
  3. regar "to water": Etymologically should be ê, is that right? (OTOH reg has /e/ everywhere per GDLC and DNV)
  4. regna, regne, regnar: These seem to have [ŋn]. Do all words in -gn- have this? If so we should fix Module:ca-IPA to do this automatically. (Is this Eastern Catalan only? Valencian seems to have [gn].)
  5. reptar: In the meaning "to reprimand; to challenge" it seems to have rhizotonic /e/. In the meaning "to crawl" I am not sure.
  6. resar "to pray": Since this is a Spanish borrowing, does it have /e/? res "prayer" seems to have /e/.
  7. retre "to give back, to return": cawikt and DCVB say /e/ in Eastern Catalan but GDLC says /ɛ/.
  8. rosca "screw thread": cawikt and DCVB say /ɔ/ for Valencian but DNV says /o/.
  9. rosta "fried bacon, fried bread": cawikt says /ɔ/ for both Eastern and Western; DNV says /ɔ/ for Valencian but GDLC says /o/. DCVB has /ɔ/ and /o/ dialectally.
  10. rosta (feminine of rost "steep"): Same. cawikt says /ɔ/ for all, DNV says /ɔ/ but GDLC says /o/. Here, DCVB has only /ɔ/.
  11. rotar: Two etyms: (1) "to belch": Does it have /o/ like rot "belch"? (2) "to rotate": Does it have /o/ because it's borrowed from Spanish?
  12. rotllo: "roll; annoyance": DNV says it has /o/ but rotlle has /ɔ/. Mistake? cawikt and DCVB say forms have /ɔ/ everywhere, and GDLC agrees that both forms have /ɔ/ in Central Catalan. Note also rotlo, where again DNV has /o/; here again, DCVB says /ɔ/ everywhere but in this case cawikt uses ô to get /o/ in Valencian.

Benwing2 (talk) 08:37, 28 December 2023 (UTC)

@Vriullop Thanks again for your detailed responses, I really appreciate the work you're putting into the responses. Issues I found involving terms with S:
  1. seca "mint": GDLC says /ɛ/, DNV says /e/ and cawikt says ê, which are all compatible, but DCVB says /ɛ/ everywhere. In this case I wonder if DCVB is actually correct while both DNV and cawikt are mistaken.
  2. sedar "to sedate": DNV says /e/ for root vowel but unknown in Central Catalan.
  3. sense "without": cawikt and DCVB say ê, DNV says /e/ but GDLC says /e/ rather than expected #/ɛ/.
  4. sentir "to feel": DNV says /e/ root vowel. No dictionary attests the Central Catalan root quality, although /e/ is expected.
  5. serva "serviceberry": cawikt and DCVB say ê, DNV says /e/ but GDLC says /e/ rather than expected #/ɛ/.
  6. setge "siege; figwort": cawikt and DCVB say é, DNV says /e/ but GDLC says /ɛ/.
  7. soga "rope": DNV and GDLC both say /ɔ/ but DCVB says variously /o/ or /ɔ/ for a bunch of obscure places that I'm not familiar with but seem mostly Northwest Catalonian. I assume Balearic must have /ɔ/ but not sure.
  8. sonso "clumsy, gauche": cawikt and DCVB say /o/ for both East and West; DNV agrees with /o/, but GDLC says /ɔ/ for Central Catalan. Maybe this is a case of changing over the last century?
  9. sorna "sarcasm": cawikt says ô, but both DNV and GDLC say /o/. DCVB doesn't give pronun.
  10. sosa "saltwort, soda ash": cawikt and DCVB say ô, but both DNV and GDLC say /o/.
  11. sostre "ceiling": cawikt says ó and DNV says /o/, but GDLC says /ɔ/. DCVB maybe has the real story: /ɔ/ in Barcelona, /o/ elsewhere. I'm going with the idea that Western Catalan (Northwestern and Valencian) have /o/, while Central has /ɔ/ and Balearic has /o/. Correct?
  12. sotjar "to spy on": DNV says /o/ root vowel. No dictionary attests the Central Catalan root quality, but I am guessing /o/ based on the proposed etymologies. Correct?
Note that I'm now 87% through the set of 2,722 terms that I identified for auditing the mid-vowel quality, and have finished with S. T represents about 7% of the total, V represents 4-5%, and the remaining letters around 1%. So I'm quite close to finishing, with lots of help from you :) ... Benwing2 (talk) 09:20, 30 December 2023 (UTC)
For S:
  • seca: I think the correct one is ë, although I'm not sure about its evolution from Arabic.
  • sedar: expected ê from Latin sēdō.
  • sense and sens: expected ê, but such words often used as proclitics tend to become closed. So é but schwa in Balearic.
  • sentir: é as expected.
  • serva: ê is correct. As in other similar cases, the GDLC does not distinguish properly different pronunciations from different etyms.
  • setge: expected é, but è in Central per context subject to openness.
  • soga: ò in general. It was identified by Coromines in a handful of about 40 words that have changed an etymological ó to ò except in some specific areas. It is known as the Coromines law, and it is still unknown why it includes certain words and not others.
  • sonso: ó but ò in Central, for a reason unknown to me.
  • sorna: ó in general.
  • sosa: ó in general.
  • sostre: it is one of the Coromines law words, expected ó changed to ò. This law may have various degrees of extension. Probably the most conservative areas, Balearic and Valencian, maintain the old ó, while most of Central has changed to ò. Usually Northwestern also changes by Central attraction, to be confirmed.
  • sotjar: not sure, but ó is the best guess.
Vriullop (talk) 08:31, 5 January 2024 (UTC)
For R:
  • reble: expected é. The DCVB with ê seems by analogy with other words. I would say é but with an irregular ə in Balearic.
  • recar: é as expected from an earlier 'a'.
  • regar: ê as expected. Nouns 'rec' and 'reg' are interrelated and are not a good indicator for the verb.
  • All -gn- between vowels are pronounced [ŋn]. Also -n- followed by /k/ or /ɡ/, but this one was reverted as not phonemic.
  • reptar: é from Latin rĕp(u)tō and ê from rēptō.
  • resar: é as noun 'res'.
  • retre: I really don't know which process applies here. For now I'd say ë, pending confirmation.
  • rosca: ô.
  • rosta, as a slice of bacon usually fried with bread, is a typical dish of the Pyrenees. Although it is the feminine form of 'rost', from the old sense "roasted", in the Pyrenees this ò usually changes to ó. In the DCVB, I read that the northernmost localities say ó, and ò is found quite far from the Pyrenees. In short, as a noun ó in Central, ò in Valencian and Balearic. As an adjective form: ò, although the GDLC does not separate it properly.
  • rotar: ó for both etyms.
  • rotllo, what a mess! It is not attested in Valencian until recent times, probably from Spanish rollo. This ó is archaic, not accepted in other areas where it is used from Old Catalan. 'Rotlle' is the inherited form, hardly used in Valencian, where the spelling 'rotle' is preferred, both ò. 'Rotlo' is only used in Balearic; for me it is anecdotal how outsiders try to pronounce it, with a range of alternative spellings.
Vriullop (talk) 11:29, 4 January 2024 (UTC)
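The -gn- answer above (intervocalic -gn- pronounced [ŋn]) is the kind of rule that could be applied automatically, as asked in the question about regna, regne and regnar. A minimal Python sketch follows; it is not taken from Module:ca-IPA, the name `velarize_gn` is hypothetical, and it works on spellings with a simple vowel-letter set, which is an assumption.

```python
import re

# Vowel letters used to detect the intervocalic context (assumed set).
VOWELS = "aeiouàèéíïòóúü"
_GN_BETWEEN_VOWELS = re.compile(rf"(?<=[{VOWELS}])gn(?=[{VOWELS}])")


def velarize_gn(word):
    """Replace -gn- between vowels with ŋn; leave other positions untouched."""
    return _GN_BETWEEN_VOWELS.sub("ŋn", word)
```

For instance, `velarize_gn("regnar")` gives `"reŋnar"`, while a word-initial gn- as in `"gnom"` is unchanged because it is not between vowels.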
@Vriullop Thank you again! BTW I have gone through and added (offline) stressed root vowels to all enwikt Catalan verbs with e or o where I could determine it, using some combination of cawikt, DNV, GDLC and DCVB. (It looks like I was able to figure out the vowel for 1,174 verbs in -ar, 33 verbs in -ir and all relevant verbs in -re and -er, and only couldn't figure out the vowel for 72 verbs in -ar and 2 verbs in -ir.) I am mostly done coding the changes I want to make to Module:ca-IPA and I'll use the new code to support displaying the root vowel info. I'll post the list of undetermined verbs soon. Benwing2 (talk) 19:55, 4 January 2024 (UTC)
BTW I have finished the changes to Module:ca-IPA and Module:ca-headword and pushed all the root vowel additions. You can see them in action e.g. in flirtejar, besar, adreçar, annexar and several others. Benwing2 (talk) 07:45, 5 January 2024 (UTC)
Also, I added tracking for all terms with defaulted mid vowel quality, with the plan of removing some of the defaults. The first word I looked at, for example, is amulet, a recent borrowing that claims to have ê, which seems unlikely. Benwing2 (talk) 08:07, 5 January 2024 (UTC)
Here is the list of now 68 -ar verbs where I couldn't identify the Central Catalan root vowel (sometimes only in one etymology out of several): afogar, agregar, al·legar, alterar, amonestar, ancorar, atemptar, celebrar, col·laborar, commemorar, compensar, condensar, confessar, congregar, conrear, contemplar, crebar, delegar, denegar, depredar, desagregar, desintegrar, deteriorar, devorar, discrepar, dreçar, dropar, edulcorar, elaborar, elevar, encetar, engegar, enllumenar, ennuegar, ensopegar, entaforar, entollar, entrenar, esborrar, esbotzar, esmicolar, espitregar, esverar, evaporar, exacerbar, expectorar, explorar, gofrar, impetrar, increpar, integrar, interpretar, isolar, laborar, negar, perforar, prolongar, rememorar, retolar, rosegar, secretar, segregar, somorgollar, temptar, tomar, trafegar, trepar, trepollar. Benwing2 (talk) 08:12, 5 January 2024 (UTC)
In some cases I can't be completely sure, these are my best guesses: afogar ó, agregar é, al·legar ê, alterar é, amonestar é, ancorar ó, atemptar é, celebrar é, col·laborar ó, commemorar ô, compensar ê, condensar ê, confessar é, congregar é, conrear ë, contemplar é, crebar é, delegar é, denegar é, depredar é, desagregar é, desintegrar é, deteriorar ó, devorar ô, discrepar é, dreçar ë, dropar ó, edulcorar ô, elaborar ó, elevar é, encetar é, engegar é, enllumenar ê, ennuegar ë, ensopegar ê, entaforar ó, entollar ò (both), entrenar é, esborrar ó, esbotzar ó, esmicolar ô, espitregar ë, esverar é, evaporar ó, exacerbar é, expectorar ó, explorar ó, gofrar ó, impetrar é, increpar é, integrar é, interpretar é, isolar ô, laborar ó, negar é (both), perforar ó, prolongar ó, rememorar ó, retolar ó, rosegar ê, secretar ë, segregar é, somorgollar ó, temptar é, tomar ó, trafegar ê, trepar é, trepollar ó. Vriullop (talk) 08:23, 10 January 2024 (UTC)
Reviewing mid-vowel defaults tracked:
  • e/u: doesn't make any sense, probably it was intended for a diphthong -eu-.
  • o/u: also nonsense.
  • e/ct-cts-cts-ctes: too many variations è with cases of é only in Central.
  • e/dre-dres: mostly ë instead of é.
  • e/final-l: it is stable but needs to exclude -ell(s).
  • e/l-ls-ll: it's ok, I haven't found any problem.
  • e/ma-mes: too many variations
  • e/ens-ena-enes: too many variations ê/ë
  • e/nse-nses: not worth it for a few words
  • e/nt-nts: mostly é with few exceptions, widely used
  • e/r-rs-ra-res: too many variations é/ê
  • e/rC: it's ok
  • e/sos-sa-ses: it's ok
  • e/t-ts-ta-tes: too many variations
  • è/s-blank: FIXME only in last syllable stressed, currently includes tèbia, època, ...
  • o/r-rs-ra-res: too many variations
Vriullop (talk) 09:20, 8 January 2024 (UTC)

@Vriullop I have finished everything up through T and pushed the offline changes to Wiktionary. Issues I found with T:

  1. teca three etyms: (1) "food"; (2) "teak"; (3) "theca". All three have /e/ per DNV, and (1) and (3) have /ɛ/ per GDLC. (1) has /ɛ/ per DCVB, otherwise not indicated. I am guessing then that (1) and (3) have ë, and (2) must have either é or ë.
  2. temprar: Exactly parallel to emprar. cawikt says ê but DNV says /e/ for tempre. Is /e/ more recent for Central?
  3. temptar "to try": /e/ per DNV, I'm guessing é per etymology.
  4. tesla "tesla": /e/ per DNV, I'm guessing é.
  5. testar "to witness": /e/ per DNV, I'm guessing é per etymology.
  6. teu "your": /ɛ/ per GDLC for Central Catalan but /e/ per cawikt. GDLC says /e/ for meu "my" so I wonder if this isn't a mistake in GDLC.
  7. text "text": /tekst/ per GDLC, /tɛkst/ per DNV. Correct? DCVB says /test/ for everywhere, which may be antiquated.
  8. tomar (1) "to catch"; (2) "to knock down". Root vowel?
  9. tondre "to shear": /o/ for Central in cawikt and DCVB, but /ɔ/ in GDLC (DNV says /o/). However, note that tosa has /o/ in GDLC. What's going on here?
  10. tora "aconite": GDLC and DNV both say /o/ but DCVB says /ɔ/ for both Western and Eastern. Is /ɔ/ antiquated?
  11. torbar "to disturb", torba "disturbance" and torba "peat": GDLC and DNV both say /o/ but cawikt says /ɔ/ for Central Catalan (/o/ for Valencian). Is /ɔ/ wrong or antiquated?
  12. tors "torso": cawikt says /o/ (dialect not indicated), but GDLC says /ɔ/ for Central (and DNV says /o/ for Valencian). I am assuming GDLC is correct.
  13. trempa, trempar: cawikt says ê everywhere, in agreement with DCVB for tremp and trempa, but GDLC gives /e/ for both tremp and trempa; maybe /e/ is more modern as DCVB's fieldwork is ~ 100 years old.
  14. trenca "duffel coat": A borrowing from Spanish trenca. The other meaning of the noun "breakage; lesser grey shrike" has ê but this seems unlikely for a Spanish borrowing. I'm guessing ë.
  15. trepa "trimming; stencil" also "mob, riffraff, rabble" also a form of trepar "to drill, to perforate". DNV says /e/ for all three etyms; GDLC says /e/ for the first two, but DCVB says /ɛ/ for the meaning "mob, rabble". I am not sure whether all three are etymologically related.
  16. tropa "troop; crowd": cawikt says /ɔ/ everywhere (and DNV says /ɔ/) but GDLC says /o/. DCVB says /ɔ/ for Eastern but /o/ for Girona; maybe /o/ for Central is more recent.
  17. trotllo "medusafish": cawikt says /ɔ/ everywhere but DNV says /o/, so I'm assuming ô.

Also a few other issues:

  1. alliberar: cawikt says /ɛ/ everywhere but DNV says /e/.
  2. beca and derived becar "to give a scholarship to": cawikt says ë but DNV says è.
  3. clon: cawikt says ò but DNV says /o/. I am guessing ô then.
  4. emprar: cawikt says ê but GDLC says /e/ for empre. Is /e/ more recent for Central?
  5. perseverar: cawikt says ê, are we sure? sever has é.

Benwing2 (talk) 01:53, 6 January 2024 (UTC)

One more question (sorry for the barrage of questions): Currently the module section for Central Catalan unilaterally removes final single -r, whether absolutely word-final or followed by an -s. I'm thinking of making this less absolute, as follows:
  1. Don't remove final -r(s) in monosyllables.
  2. In non-monosyllables, remove final -r(s) in -ar, -er, -ir and in -[dtsç]or, but not otherwise. This is based on the fact that most words in -[dtsç]or are agent nouns and seem to fairly consistently remove the -r, while the remaining words in -or often (but not always) preserve the -r per GDLC. Here is a long list of such words: amor, humor, anterior, vapor, rumor, labor, major, tenor, tumor, terror, inferior, superior, clamor, posterior, furor, ulterior, tricolor, temor, rigor, vigor, menor, decor, olor, llavor, suor, licor, rubor, petricor, negror, remor, millor, albor, cremor, claror, grogor, blavor, maror, pitjor, frescor, senyor, finor, incolor, rojor, vermellor, blancor, lletjor, amargor, primor, favor, picor, escalfor, tremolor, esgarrifor, llacor, raor, xafogor. The idea is that to force the preservation of -r, write 'rr', and to force the non-preservation, write '-ó' (although if all these words preserve the -r in Valencian, we'd want some other signal, e.g. '-(r)'). Thoughts? Benwing2 (talk) 09:50, 6 January 2024 (UTC)Reply[reply]
This plan sounds fine, assuming:
  • The non-preservation happens when the final syllable is stressed. When it is unstressed, it only affects some words, like créixer, càntir.
  • In Valencian it is always preserved. To force non-preservation in Central and Balearic, writing '-(r)' or '-(r)s' is intuitive. In fact, this is similar to the rhymes, i.e. Rhymes:Catalan/a(ɾ).
  • In Balearic there are more losses of final -r than in Central Catalan. See amor: although the result is correct, it is not consistent when preservation is forced by writing 'rr', and it is not reasonable to assume that in Balearic no final -r is ever pronounced. Maybe it should be fixed with a per-dialect parameter.
There are many pending things above that require more time. Vriullop (talk) 11:57, 6 January 2024 (UTC)
@Vriullop Thanks for your comments. I'm thinking that writing rr should force the pronunciation of final -r everywhere, while writing something like rh should cause it to be pronounced in Central Catalan but not Balearic. This is based on looking through the DCVB with a sample of the above nouns, some of which appear to have pronounced -r in the Balearics, some not, and for some it depends on where in the Balearics. More complex scenarios can be handled using dialect-specific params (which are now implemented; see llei for an example). Benwing2 (talk) 21:23, 6 January 2024 (UTC)Reply[reply]
Another possibility in place of rh for "pronounced everywhere but Balearics" is (rr). This sets up a hierarchy of pronunciation: rr > (rr) > (r) > nothing. Benwing2 (talk) 01:13, 7 January 2024 (UTC)
BTW I am planning on making it required to specify the way final -r is pronounced, using one of rr, (r), (rr) [or maybe rh if we decide on that] or omitting it, except in the circumstances where it defaults to (r), which are multisyllabic words ending in stressed -ar, -er, -ir or -[dtsç]or. In all other circumstances, the pronunciation seems far too irregular to provide a default.
Note that I have already removed the majority of defaults for mid vowel o and added the vowel explicitly, and I'm planning on doing the same for mid vowel e. For the defaults I removed, either there were few places that made use of the defaults or there were many but with lots of errors, e.g. o and e in the penultimate syllable with -i or -is in the last syllable were defaulting to ò and è respectively, which makes sense for adjectives of this form but doesn't work for subjunctive verb forms, and there were lots of places where this default was being used for subjunctives, producing incorrect results.
One other thing: the pronunciation given in GDLC for meteor is [məteɔ́ɾ], with unstressed [e]. Is this correct? If so I'll need to add a special symbol to allow for unstressed unreduced vowels. However, maybe it's a mistake; I found a pronunciation on Forvo here [1], which sounds more like [mətəɔ́ɾ] (BTW cawikt says [mətəóɾ] with /o/, which may be wrong as well for Central Catalan). Benwing2 (talk) 05:08, 7 January 2024 (UTC)Reply[reply]
I forgot to add, I'm implementing a shortcut notation to make it easier to specify things like the pronunciation of final -r without having to repeat the entire word. If you write [FROM:TO] where FROM is part of the spelling and TO is the corresponding respelling, it will make that substitution in the respelling as long as it's unambiguous. So you can write [or:ôrr] for meteor. To make it even shorter, in cases where the spelling and respelling are similar enough, you can just write the respelling, hence [ôrr], and the code knows that ô should match either o or ô in the original spelling and rr should match either r or rr. Another common example is [ks], which is equivalent to [x:ks] and can be used to respell x as ks in words like boxejador. This will all be documented in {{ca-IPA}} as soon as I push the code. Benwing2 (talk) 05:16, 7 January 2024 (UTC)Reply[reply]
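For illustration, the explicit [FROM:TO] form described above could be sketched roughly like this. This is a minimal Python sketch, not the module's actual code: the function name apply_substitution is hypothetical, "unambiguous" is assumed to mean "FROM occurs exactly once", and the shorter forms such as [ôrr] or [ks], where FROM is inferred, are not covered.

```python
def apply_substitution(spelling, spec):
    """Apply one '[FROM:TO]' substitution to a spelling (hypothetical helper).

    Raises ValueError unless FROM occurs exactly once in the spelling,
    which is this sketch's reading of "as long as it's unambiguous".
    """
    if not (spec.startswith("[") and spec.endswith("]")):
        raise ValueError(f"not a substitution spec: {spec!r}")
    source, sep, target = spec[1:-1].partition(":")
    if not sep or not source:
        raise ValueError(f"expected [FROM:TO], got {spec!r}")
    occurrences = spelling.count(source)
    if occurrences != 1:
        raise ValueError(
            f"{source!r} occurs {occurrences} times in {spelling!r}"
        )
    return spelling.replace(source, target)
```

With this sketch, apply_substitution("meteor", "[or:ôrr]") gives the respelling meteôrr and apply_substitution("boxejador", "[x:ks]") gives boksejador, while an ambiguous request such as [r:rr] on a word with several r's is rejected instead of guessed.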
  • For final -r, I like the hierarchy rr > (rr) > (r)
  • 'meteor' with unstressed [e] is correct. No need to do anything in the module, function reduction_ae does not apply any reduction in groups 'eà' and 'eò'.
  • A shortcut for respelling is useful.
Vriullop (talk) 10:27, 8 January 2024 (UTC)
@Vriullop I have implemented everything described above and fixed up all terms in final -r(s) appropriately. The use of the respellings for -r is documented in {{ca-IPA}}. The substitution notation like [ó(r)] is still being documented. Benwing2 (talk) 02:29, 10 January 2024 (UTC)
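As a rough summary of the final -r conventions discussed in this thread (rr > (rr) > (r) > nothing, plus the default for polysyllables in stressed -ar, -er, -ir and -[dtsç]or), here is a minimal Python sketch. It is not the module's code; the function names are hypothetical, the caller is assumed to supply the syllable count and stress position, and reading an omitted r as "pronounced in no dialect" is my interpretation of the hierarchy.

```python
import re

ALL_DIALECTS = frozenset({"Central", "Balearic", "Valencian"})

# Endings where a bare final -r(s) defaults to "(r)" in polysyllables
# with final stress: -ar, -er, -ir and -[dtsç]or.
DEFAULT_OPTIONAL_R = re.compile(r"(?:[aei]r|[dtsç]or)s?$")


def explicit_final_r(respelling):
    """Return the dialects that pronounce the final -r of a respelling.

    Returns None for a bare final r, which carries no explicit marking.
    """
    stem = respelling[:-1] if respelling.endswith("s") else respelling
    if stem.endswith("(rr)"):
        return ALL_DIALECTS - {"Balearic"}   # everywhere but Balearic
    if stem.endswith("(r)"):
        return frozenset({"Valencian"})      # Valencian only
    if stem.endswith("rr"):
        return ALL_DIALECTS                  # pronounced everywhere
    if stem.endswith("r"):
        return None                          # bare r: needs the default rule
    return frozenset()                       # no r written at all


def bare_r_defaults_to_optional(word, syllables, final_stressed):
    """True if a bare final -r(s) would default to '(r)' under the rule above."""
    return (
        syllables > 1
        and final_stressed
        and bool(DEFAULT_OPTIONAL_R.search(word))
    )
```

Under this sketch cantar and cantador fall under the default, while amor, senyor and monosyllables such as mar do not and would need an explicit rr, (rr) or (r).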
@Vriullop Thanks for your comments! I have added the root vowels you specified and am going through the defaulted mid vowel conditions and fixing them up. One thing I notice is that written bl pronounced /bbl/ and similarly written gl pronounced /ɡɡl/ aren't correctly handled. For bl at least it seems not all occurrences of bl result in this doubling, e.g. doblar does but sublim doesn't, yet they have the same structure in terms of number of syllables, word shape, position of the accent, etc. What do you recommend? I tried manually adding the doubling to the respelling of segle, but then Valencian also gets the doubling, which is wrong. I see two approaches: (1) Manually require all doubled bl and gl to be written as bbl and ggl except maybe in certain suffixes (e.g. -able(s), -ible(s)), and have the Valencian-specific code remove the doubling and convert it back to single stops; (2) Double bl and gl by default. This would mean we'd need some method of indicating the non-doubled occurrences, maybe by writing sub.lim or something (although this might be problematic when we start providing phonetic output with fricative [βɣð], which I'd like to do soon; not actually sure though if there will be an issue). Thoughts? Benwing2 (talk) 07:02, 12 January 2024 (UTC)
The groups -bl- and -gl- are geminate in Central and Balearic in post-stressed position: poble /ˈpɔb.blə/, regla /ˈreɡ.ɡlə/, including endings -able, -ible. That can be coded in the module. It doesn't happen in Valencian, nor in pre-stressed position, as in sublim. But all its derivatives are also geminate even if in pre-stressed position: poblar, població, reglar, reglament, ... That needs to be respelled pobblar, pobblació, regglar, regglament, and then undone in Valencian. Vriullop (talk) 08:22, 12 January 2024 (UTC)
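The rule just described could be sketched along these lines. This is a simplified Python illustration, not the module's code: the function name is hypothetical, the stressed vowel is assumed to always carry a written accent in the respelling, and lexical exceptions are ignored.

```python
import re

# Accented vowels assumed to mark stress in a respelling (an assumption
# of this sketch, including the ê/ë/ô qualities used in this discussion).
STRESSED_VOWELS = "àèéêëíòóôú"


def geminate_bl_gl(respelling, dialect):
    """Double post-stressed bl/gl in Central and Balearic; undo it in Valencian."""
    if dialect == "Valencian":
        # Valencian never geminates: collapse explicit bbl/ggl respellings.
        return respelling.replace("bbl", "bl").replace("ggl", "gl")
    stress_positions = [
        i for i, ch in enumerate(respelling) if ch in STRESSED_VOWELS
    ]
    if not stress_positions:
        return respelling
    last = stress_positions[-1]
    head, tail = respelling[: last + 1], respelling[last + 1 :]
    # Double a single b/g before l, but leave explicit bbl/ggl alone.
    tail = re.sub(r"(?<![bg])([bg])l", r"\1\1l", tail)
    return head + tail
```

Pre-stressed derivatives such as poblar are not touched by the automatic rule here, so they still rely on an explicit respelling like pobblàr, which the Valencian branch then reduces back to a single stop.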
@Vriullop Got it, thanks. I'll implement this. What do you think of just providing phonetic output and changing the /.../ to [...]? This seems consistent with what the various dictionaries do; or at least, they explicitly show the fricative allophones [βɣð]. This would mean, for example, that the issue of whether to display [ŋ] goes away: we just display it whenever it's pronounced as such. Benwing2 (talk) 20:04, 12 January 2024 (UTC)Reply[reply]
I have implemented what you said for -bl- and -gl-. I am currently working on auto-adding secondary stress to adverbs in -ment. (In the process I'm adding a quick shorthand to indicate a part of speech for a given term, e.g. n/RESPELLING or just n/ for a noun, a/RESPELLING or just a/ for an adjective, etc. The idea here is that terms in -ment default to adverbs, which means they get secondary stress by default, but you can override this by specifying n/ for a noun like desembarcament or a/ for an adjective like vehement. Some terms need both a part of speech and respelling, e.g. desdoblament needs n/[bbl] to indicate that it's a noun and the -bl- is pronounced /bbl/.) I have a question though about this. Adverbs in the DNV are indicated with *primary* stress on the preceding component and no stress on -ment, e.g. see [2] for feliçment. This seems rather strange to me and it's contrary to what the Wikipedia article on Catalan phonology says. Is this really true or is it just something weird in the DNV? Benwing2 (talk) 23:40, 12 January 2024 (UTC)Reply[reply]
BTW I found an exception to the rule that post-stressed -bl- is geminate: bíblic (and Bíblia). Are there others? If so and given how many exceptions there are in the other direction, I wonder if we shouldn't just make all -bl- and -gl- geminate by default in Central Catalan and Balearic, and require that all cases where this doesn't happen get rewritten using [b.l] or [g.l]. Benwing2 (talk) 04:01, 13 January 2024 (UTC)Reply[reply]
I implemented the auto-adding of secondary stress to adverbs in -ment, along with the part of speech hints described above, and fixed up all nouns and adverbs in -ment appropriately. (I actually added pronunciations to all or almost all nouns and adverbs in -ment that were missing them; this took several hours for adverbs because there are around 800 of them in -ment, and many of them have secondarily stressed e or o, which needed looking up.) The mid vowel hint now applies to the part preceding the adverbial -ment, not to the -ment itself (which is always pronounced /men(t)/ with /e/). Note also that in the future, these part of speech hints can also help with things like terms in -ar, where adjectives in Central Catalan pronounce the final -r but nouns and verbs generally don't. Benwing2 (talk) 07:33, 13 January 2024 (UTC)
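A rough sketch of how the part-of-speech hints interact with the -ment default, in Python rather than the module's own language. The function names are hypothetical, and only the n/ and a/ hints mentioned in this thread are recognized; any other hint codes are outside this sketch.

```python
# Hint codes mentioned in the discussion: n/ for nouns, a/ for adjectives.
KNOWN_HINTS = {"n", "a"}


def parse_pos_hint(spec):
    """Split 'n/[bbl]' into ('n', '[bbl]'); without a hint return (None, spec)."""
    pos, sep, rest = spec.partition("/")
    if sep and pos in KNOWN_HINTS:
        return pos, rest
    return None, spec


def ment_gets_secondary_stress(term, spec=""):
    """Terms in -ment default to adverbs (secondary stress) unless hinted otherwise."""
    pos, _respelling = parse_pos_hint(spec)
    return term.endswith("ment") and pos is None
```

So feliçment with no hint is treated as an adverb, while desembarcament with n/ and vehement with a/ are not, and desdoblament with n/[bbl] yields both the noun hint and the respelling [bbl].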
OK, from the GDLC it looks like there are actually three ways that -bl- can be pronounced: obligar has [βl], doblar has [bbl], and obliterar has [bl]. Is that correct? If so I'll need to come up with some notation to distinguish these three. Maybe we should write o-bliterar to get [bl]; this is consistent with words like hipoglucèmia, which have hard single [gl] following a prefix with secondary stress [ìpuglusɛ́miə]. This would suggest a respelling hípo-glusèmia. Then if we need post-stressed [βl], we write e.g. Bíb.lia, and if we need post-stressed [bl] for some reason we'd write e.g. Bí-blia or something, and to get post-stressed [bbl] we'd write e.g. Bíbblia (or rely on the default). Make sense? Sorry to dump so much text on you. Benwing2 (talk) 09:25, 13 January 2024 (UTC)Reply[reply]
Great work here.
  • Including the allophones βɣðŋɱ does not require changing the transcription to brackets [...]. In fact, /β/ is not a voiced bilabial fricative but a simplification, without diacritic, of an approximant [β̞]. Catalan works follow a convention of "broad transcription" with the inclusion of what is considered relevant and without any claim about phonemic values. A purely phonemic transcription is a theoretical discussion. According to different authors, between 25 and 31 phonemes can be considered in Catalan. For example, the schwa is a predictable dialectal allophone, but it is relevant in contrast with other Romance languages. If it were necessary to mark that it is not strictly phonemic, frwikt uses \backslashes\. They are also used by Merriam-Webster as a notation for its own IPA transcription. The criteria followed in enwikt do not seem consistent enough to me.
  • The DNV does not show primary and secondary stress, nor does it in compound words. It is more noticeable in Eastern dialects without schwa in secondary stress. The stress shown in adverbs with -ment is misleading.
  • 'Bíblic' and 'Bíblia' are the only exceptions to geminate bl.
  • I have not found any explanation for 'obliterar' and 'hipoglucèmia'. See Maybe as a cultism in very formal speech, but I don't think it is worth making exceptions here. On the contrary, note that /β/ does not happen in Balearic and formal Valencian after a vowel, that is, in dialects that distinguish /b/-/v/.
Vriullop (talk) 09:17, 15 January 2024 (UTC)
@Vriullop Thanks for your response, this is very helpful. I am currently working on fixing up terms with written x (there are a lot of mistakes) but I'm almost done with the offline portion and I think next I'll focus on adding the fricative allophones and correctly handling multiple words. For handling multiple words I need to know the following:
  1. What are the unstressed words? I assume they are all the proclitic object pronouns em, et, es, el, la, els, les, li, ens, us, ho, hi, en; plus the enclitic ones -me, -te, -se, -lo, -la, -los, -les, -li, -nos, -vos/-us, -ho, -hi, -ne (which might already be handled correctly); the contracted ones with apostrophe (which may already be handled correctly); maybe the unstressed possessives mon, ma, mos, mes, ton, ta, tos, tes, son, sa, sos, ses; the prepositions a, de, per, amb (and obsolete ab?), en (what about cap, des?); the prepositional contractions al, als, del, dels, pel, pels; articles el, la, els, les (already handled as proclitic pronouns), personal articles en, na (what about indefinite articles un, u, uns?); maybe salat articles es/ets, sa, ses, so, sos; the conjunctions i, o (what about si?). Any others?
  2. Which assimilation rules apply across words? The Wikipedia article Catalan phonology says that final -s voices before a vowel, which seems to cause a preceding consonant to voice as well, hence tots els has /dz/ in the middle. I assume that lenition of written b d g occurs across word boundaries as well. What about final omitted -r? Does it reappear before a vowel in the next word, e.g. in a phrase like vaig amar una dona? (And for that matter, does the -ig in vaig become voiced in this phrase?) Do you have any references on this?
Thanks again. Benwing2 (talk) 09:57, 15 January 2024 (UTC)
The list is correct: proclitic and enclitic pronouns, unstressed possessives, prepositions but not 'cap', 'des', contractions, articles including personal ones and salats, indefinite articles but not 'u', conjunctions including 'si' and 'ni', and also que as a pronoun and conjunction.
In general, contact between words triggers the same processes of assimilation, voicing, or devoicing as inside words. A typical example is els avis /əlz/, els savis /əls/, and tots els is really /ˈtodz.əls/, and vaig amar /ˌbad͡ʒ.əˈma/. The final -t reappears when followed by a vowel (sant Antoni /ˌsan.tənˈtɔ.ni/). The final -r of infinitives only reappears when followed by a pronoun (anar-hi /əˈna.ɾi/). From chapter 4.4 onwards of the IEC grammar you can find a lot of examples. Vriullop (talk) 12:37, 15 January 2024 (UTC)
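The els avis / els savis / tots els contrast can be sketched as a small cross-word voicing step. This is a minimal Python illustration over IPA strings, not the module's code: the function name and the segment inventories are assumptions of the sketch, and it only handles a word-final /s/ (with a preceding voiceless stop) before a vowel or voiced consonant.

```python
VOWELS = set("aeiouəɛɔ")
VOICED_CONSONANTS = set("bdɡvzʒmnɲŋlʎɾrjwβðɣ")
VOICED_COUNTERPART = {"p": "b", "t": "d", "k": "ɡ"}
STRESS_MARKS = "ˈˌ"


def voice_final_s(word, next_word):
    """Voice word-final /s/ (and a preceding stop) before a voiced onset."""
    # Ignore stress marks when looking at the first segment of the next word.
    onset = next_word.lstrip(STRESS_MARKS)[:1]
    if not word.endswith("s") or onset not in VOWELS | VOICED_CONSONANTS:
        return word
    stem = word[:-1]
    if stem and stem[-1] in VOICED_COUNTERPART:
        stem = stem[:-1] + VOICED_COUNTERPART[stem[-1]]
    return stem + "z"
```

Here voice_final_s("əls", "ˈa.βis") gives əlz, voice_final_s("əls", "ˈsa.βis") leaves əls unchanged, and voice_final_s("ˈtots", "əls") gives ˈtodz, matching the three examples above.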
@Vriullop Thanks again for your help. I finally finished most of the work on multiword support. Still to go is approximant allophones of b/d/g, correct handling of apostrophes (represented with ‿), and ‿ as an indicator of liaison in respelling for cases like Sant Antoni respelled Sànt‿Antòni (which should produce /ˌsan.tənˈtɔ.ni/). I (more or less) read chapter 4.4 in the IEC grammar and I notice it also talks about certain cases of total assimilation where maybe cap de is pronounced /kad də/ or something, but I'm not sure we should implement that. I have some questions though:
  1. Brunsvic (as in e.g. Nova Brunsvic) given as [bɾunzvík] in GDLC, is the v correct?
  2. For drets humans, the module currently generates /ˈdɾɛdz uˈmans/, is that correct?
  3. fer cas, fer acte de presència: Is the <r> pronounced in Central Catalan?
  4. Sant Llorenç de la Salanca: the module currently generates /ˈsaɲ ʎuˈɾɛnz də lə səˈlaŋ.kə/ for Central and /ˈsand ʎoˈɾɛnz de la saˈlaŋ.ka/ for Valencia; correct? In general, does final -ç voice when the next word begins with a vowel?
  5. The IEC grammar is equivocal about whether b/d/g become fricatives after /r/, /ɾ/ and /z/, what should we do in this case?
  6. It appears double schwa /əə/ is often compressed to single schwa /ə/ in Central and maybe Balearic, but not in Valencian. This is indicated in GDLC and seems to operate fairly consistently if the second schwa is in a closed syllable (sobreescalfament, contraescarpa), but only sometimes in an open syllable (centreafricà, contraatac). Can you comment here? Likewise, /i/ or /u/ followed by schwa seems to elide the schwa in aeroespacial, autoescola, antiespasmòdic, but only sometimes if the schwa is in an open syllable (hence not in autoerotisme, antiemètic but yes in fotoelèctric, fotoelectricitat, macroeconomia). Likewise /uu/ seems to compress to /u/ if the second /u/ is in a closed syllable (microorganisme), but only sometimes in an open syllable. How do you think we should handle these cases?
  7. I am trying to figure out what to do for written <tn>, <tm>, <tl>, <tll>. It seems that these tend to be pronounced as geminates in native words (e.g. cotna, setmana) but with [d] in cultisms/learned words. I'm thinking maybe we should make the cultism behavior the default and require respelling for the remainder, at least for <tm> where there are more terms like ritme, aritmètic, atmosfera than terms like setmana. But maybe this should differ depending on the different spellings, e.g. <tl> even in a cultism like atlàntic seems to have a geminate in it in Central Catalan but not in Valencian. Can you comment on what you think should be done?
Benwing2 (talk) 22:45, 26 January 2024 (UTC)
Note, I also revamped the testcases, see Module:ca-IPA/testcases (which demonstrate there's still a lot to fix). Benwing2 (talk) 23:26, 26 January 2024 (UTC)
  • Brunsvic is strange. It is supposed the GDLC includes pronunciation from the Diccionari ortogràfic i de pronúncia (DOP), but it turns out that the DOP does not include proper names. For non-Catalan place names I check ésAdir, a website for radio and tv journalists, and it shows /'bɾunz.βik/ as I expected.
  • 'Drets humans' is correct.
  • 'Fer cas', 'fer acte', are correct. The r of infinitives only reappears when followed by pronouns: fer-se /ˈfer.sə/, fer-hi /ˈfe.ɾi/, fer-t'ho /ˈfer.tu/...
  • 'Sant Llorenç de la Salanca' is correct. Final /s/ of Llorenç is voiced /z/ followed by a voiced consonant or by a vowel.
  • The IEC grammar is too descriptive about approximants, when they may or may not appear. Considering that /β/ is rare in dialects with contrast /v/-/b/, that is Balearic and Valencian, and trying to be consistent with GDLC and DNV:
    • No approximants r/s + b/d/g in Central.
    • No approximants r/s + b in Balearic and Valencian.
    • Approximants r/s + d/g in Balearic and Valencian.
  • In general, the concurrence of two identical vowels /əə/ (or /aa/, /ee/), /uu/ (or /oo/) is reduced to a single vowel. Variations may depend on formal v. informal, or common use v. cultism, or emphasis of some prefixes. It is hard to define any exception.
  • Written <tm> and <tn> are geminated in a handful of inherited words: cotna, reguitnar, setmana and its derivatives. But 'setmana' with a single /m/ in Valencian. 'Vietnamita' and 'sotmetre' are hesitant. Others like 'ritme', 'ètnic', 'algoritme' are cultisms /dm/.
  • Written <tl> is always /ll/ in Central and Balearic. In Valencian it is /ll/ in inherited words and /dl/ otherwise. Valencian inherited words include those with alternative spelling <tll>: ametla > ametlla, butla > butlla...
  • Written <tll> as alternative spelling of inherited <tl> is pronounced /ʎʎ/ in Central and /ll/ in Balearic and Valencian. Although the DNV includes 'ametlla', 'butlla'... it is not really used, and if written it is still pronounced as <tl>. As a cultism, like 'ratlla', 'bitllet' or 'butlletí', it is pronounced /ʎʎ/ in Central and /ʎ/ in Balearic and Valencian.
Vriullop (talk) 10:54, 29 January 2024 (UTC)
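The three approximant rules in the list above (for b/d/g after r or s) reduce to a small per-dialect lookup. A minimal Python sketch, with a hypothetical function name and plain ASCII b, d, g standing for the written stops:

```python
def lenites_after_r_or_s(stop, dialect):
    """Whether written b/d/g becomes an approximant after r or s.

    Central: never. Balearic and Valencian: d and g do, b does not.
    """
    if stop not in {"b", "d", "g"}:
        raise ValueError(f"not a voiced stop: {stop!r}")
    if dialect == "Central":
        return False
    if dialect in {"Balearic", "Valencian"}:
        return stop in {"d", "g"}
    raise ValueError(f"unknown dialect: {dialect!r}")
```

Keeping this as a function of (stop, dialect) rather than a global switch leaves room for the other contexts, which the list above does not cover.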
@Vriullop Thanks. I have (already) implemented most of the above things. I haven't yet implemented reduction of adjacent unstressed vowels or redone the implementation of <tl> and <tll>. As for Sant Llorenç de la Salanca, the module formerly generated [ˈsand ʎoˈɾɛnz ðe la saˈlaŋ.ka] for Valencia (note the [d] in /sand/) but I am guessing this is wrong, so I changed it so it now generates [ˈsaɲ ʎoˈɾɛnz ðe la saˈlaŋ.ka]. Basically I am guessing that elision of stops after nasals happens in Valencia before a consonant but not a vowel or utterance-finally. Is this correct? Benwing2 (talk) 01:53, 30 January 2024 (UTC)Reply[reply]
I didn't notice 'sant'. It is correct, elision of t and assimilation of the nasal before a consonant, not before a vowel or isolated.--Vriullop (talk) 08:00, 30 January 2024 (UTC)

Your bot is removing valid categories

e.g. {{C|de|Western Sahara}} at Westsahara. —Justin (koavf)TCM 00:55, 1 January 2024 (UTC)

@Koavf This is unavoidable. When you add a page to a category, sometimes it takes a little while for the category to register having the page in it, and in the meantime it shows up in CAT:Empty categories, which is what I use periodically to delete empty categories. I check that category before deleting the empty categories referenced, but I can't notice everything. Any non-empty categories so deleted will get re-created in a few days in any case. Benwing2 (talk) 01:06, 1 January 2024 (UTC)Reply[reply]
What are you talking about? That category was on that page for 5.5 years and your bot removed it for no reason. How is that unavoidable? Are you telling me that your bot is going to re-add all of these categories and undelete them as well? —Justin (koavf)TCM 01:09, 1 January 2024 (UTC)Reply[reply]
Dude, fuck off. Seriously. Yelling at me is not going to get me to help you any quicker than writing nicely.
As for my response, I thought you were referring to my recent deletion of empty categories (as of a few hours ago) rather than a bot change from a month and a half ago. In the future I'd recommend you link to the specific diff. My removal of the category at that time was a by-hand change, not a script change, even though the bot pushed the change; that's what "manually assisted" means (and I have a strong feeling I've already explained this to you). The reason for the removal is that Module:place normally auto-adds categories of this nature, and I thought it would in this case; the reason it didn't is apparently because Western Sahara is listed in Module:place/shared-data as a country, but its definition identifies it (correctly) as a territory rather than a country. I'll fix this so it gets correctly auto-added. Benwing2 (talk) 01:30, 1 January 2024 (UTC)Reply[reply]
I was much nicer than you were just now and was in no sense "yelling". There was no reason for that language. I didn't realize that what I wrote was ambiguous and I thought that referring you to the entry would be sufficiently clear where you can see what your bot (or script or by-hand you) did. Thanks for agreeing to fix this and undelete all of these categories. When will this happen? —Justin (koavf)TCM 22:18, 1 January 2024 (UTC)Reply[reply]
When will you or your bot undo these category removals? —Justin (koavf)TCM 22:42, 15 January 2024 (UTC)
@Koavf Which removals are you referring to? Specifically to do with Western Sahara, or are there any others? Benwing2 (talk) 22:44, 15 January 2024 (UTC)
The only ones I am aware of are removals of the sort {{C|CODE|Western Sahara}} which emptied several categories that were then deleted. I'm not familiar with any others. —Justin (koavf)TCM 22:46, 15 January 2024 (UTC)
When will you or your bot undo these category removals? —Justin (koavf)TCM 01:37, 21 January 2024 (UTC)
@Koavf Did you not get my ping? I did this days ago. Benwing2 (talk) 02:37, 21 January 2024 (UTC)
I see that it did and no, I didn't. For some weird reason, I also did not get updates for this thread even after subscribing. :/ Thanks a lot. —Justin (koavf)TCM 10:16, 21 January 2024 (UTC)

Twice-borrowed terms

I looked up παλάβρα, which is from παραβολή after passing through Ladino, and found out that, after moving all the "twice-borrowed terms" categories to "terms borrowed back into", there are still lots more Greek twice-borrowed terms than Greek terms borrowed back into Greek. This may also be true of other languages. Can you look into it? PierreAbbat (talk) 16:43, 1 January 2024 (UTC)Reply[reply]

@PierreAbbat It’s because they were added manually due to the origin being Ancient Greek, which is a misuse of the category imo. Theknightwho (talk) 19:17, 1 January 2024 (UTC)
Yeah @Pierre, if I may expand on what Theknightwho said, it is indeed because of Ancient Greek being considered a separate language, and this is discussed at Wiktionary:Beer_parlour/2023/November#Does_'terms_borrowed_back_into_LANG'_include_cases_where_the_borrowing_was_from_an_ancestor? (and actually quite a few other places over the years, e.g. Wiktionary:Etymology_scriptorium/2016/June#Twice-borrowed_term_or_term_derived_from_an_older_stage_of_the_same_language?, Wiktionary:Beer_parlour/2011/October#Twice-borrowed_terms), and ... it's tricky. Because ... while I'm sympathetic to the potential complaint that it's somewhat arbitrary that a word used in the modern form of Hebrew or Latin (or Chinese) and derived from the variety spoken two thousand years ago can be automatically categorized as "borrowed back" while a word in modern Greek or English can't be, just because we decided it was most convenient to handle the changes those languages underwent as still being ==Hebrew==, ==Latin== (or ==Chinese==) but decided to split the changes Greek underwent between two languages ... we do have to draw a line somewhere or else we get into absurdities (e.g. a term from Proto-Indo-European, which went into French, and was borrowed into English, is twice-borrowed/borrowed-back?), and if we draw the line anywhere other than "whatever we've decided to consider a separate full language", it gets fuzzy and messy fast. But please comment in the November BP discussion linked above if you have suggestions. - -sche (discuss) 19:45, 1 January 2024 (UTC)Reply[reply]

New :toBcp47Code() method

If I interpret this recent change to Scribunto correctly, it provides a way to convert from MediaWiki langcodes to proper langcodes directly. Might be worth incorporating, as I imagine it’ll simplify some of our code, and I think you’re more familiar with that side of things than me. Theknightwho (talk) 15:50, 2 January 2024 (UTC)Reply[reply]

@Theknightwho Unfortunately I'm not sure this is useful for our purposes. Wiktionary language codes aren't always the same as MediaWiki language codes and I don't think we ever need to convert MediaWiki -> BCP47; instead if anything we'd need to convert MediaWiki <-> Wiktionary and Wiktionary -> BCP47. Benwing2 (talk) 22:47, 15 January 2024 (UTC)Reply[reply]

Addition to quotation-template documentation

I just fixed a module error caused by WF converting a quote to {{quote-book}} without checking what goes where. The template documentation is thoroughly organized, voluminous, and useless for figuring out how to fix parameter values in the wrong slots. I was going to add a little index of positional parameters, but that would have required reverse-engineering your documentation module. Instead, I'm just going to dump a mockup here, and let you deal with it:

Positional parameters
Position  Description       Equivalent  See group
1         Language code(s)  -           Quoted text
2         Year              |year=      Date
3         Author            |author=    Author
4         Title             |title=     Title
5         URL               |url=       Title
6         Page              |page=      Page and line
7         Quote             -           Text
8         Translation       -           Text

An alphabetical index of parameter names might also be nice.

And, no, I don't want fries with that...

Thanks! Chuck Entz (talk) 06:14, 5 January 2024 (UTC)

@Chuck Entz Yeah there are so many params that organizing them properly is a very challenging task. For this reason I tried to do away entirely with positional params but some people squawked loudly enough that they are kept for {{quote-book}} and {{quote-journal}}, and disallowed for the rest. I think your mockup is a good idea. Benwing2 (talk) 06:17, 5 January 2024 (UTC)Reply[reply]

Using the Old French conjugation table as an inspiration

I was trying to create a more complex conjugation table for the Old Spanish language. Then I started viewing other templates and learned that the one used for the Old French language is perfect. I might be able to perform some basic edits to adapt it to the Old Spanish conjugation system. However, I couldn't get a sample of that template to edit, as so many templates are linked together. So would you please share with me a simple, editable sample of the Old French template so I can apply it to this page: Cantar? Besides, it would help to better standardize Wiktionary. Thalyson2019 (talk) 05:42, 6 January 2024 (UTC)

@Thalyson2019 The Old French conjugation tables aren't implemented using templates but rather using a module: Module:fro-verb. I agree that it's a good base to start with when designing a conjugation system for a language that wasn't really standardized. I'm not sure if you are comfortable working in Lua, because the module is written in Lua and it's not really possible to do what it does just using template syntax. Benwing2 (talk) 05:57, 6 January 2024 (UTC)Reply[reply]
Is there any solution for that? I already have the verbs and their positions in mind. I'm not familiar with Lua, even though I create basic templates. Thalyson2019 (talk) 06:08, 6 January 2024 (UTC)Reply[reply]
@Thalyson2019 You'd have to get someone to create the Lua module for you. I can't commit to something like this right now as I have already committed to several other projects. However if you create some mockups and link them here, then if/when I or someone else is able to contribute, the mockups can be a good starting point. Benwing2 (talk) 06:10, 6 January 2024 (UTC)Reply[reply]
Should such mockups be in the form of code or pictures? Thalyson2019 (talk) 06:14, 6 January 2024 (UTC)
@Thalyson2019 Maybe some sample template calls for some simple verbs like cantar and some complicated verbs as well (tener? ir?). I or anyone working on this would in addition need some good resources on Old Spanish verb conjugation. Benwing2 (talk) 06:18, 6 January 2024 (UTC)Reply[reply]

Finnish inflections

Hey Benwing, I know that WingerBot is used to mass-create the inflection pages for Romance verbs. Is there any way that it could do similar work with Finnish noun forms? According to Jberkel's last data dump there are literally millions of Finnish redlinks, most of which appear to be nouns, so bot help is probably necessary to make a real dent. Thanks for your time! Vergencescattered (talk) 20:01, 6 January 2024 (UTC)Reply[reply]

@Vergencescattered: have you talked to @Surjection about this? As a native speaker with a bot, they would be a more logical choice, and more likely to be aware of potential problems. Chuck Entz (talk) 20:35, 6 January 2024 (UTC)Reply[reply]
@Vergencescattered I agree with Chuck. Also pinging @Hekaheka. E.g. there may be a reason these forms aren't created (too many of them?). Benwing2 (talk) 21:20, 6 January 2024 (UTC)Reply[reply]
There are probably somewhere around 200,000 nouns in Finnish and each has 30 inflected forms (15 cases in singular and plural) without taking into account any suffixes. This is the rough number found in Nykysuomen sanakirja. Adding dialects and slang brings the total to roughly half a million or more. That would give 6 to 15 million entries. If we add the six possessive suffixes (third-person possessive suffixes are the same for plural and singular, but to compensate for this potential simplification there are two of them), the number of potential entries increases to 40 to 100 million. Some of the forms might be unattestable, as abessive, comitative and instructive are quite rarely used, but that does not cut more than 20% of the total. On top of this, each verb has close to one hundred inflected forms if we take into account the possessive forms of some infinitives and participles.
This leads me to think that we might need a new approach to inflected forms in general. Perhaps they should have an entry of their own only in such rare cases in which the inflected form has a meaning or meanings that cannot be readily derived from the lemma form. In most cases the system would work so that a search for an inflected form would redirect to the article of the lemma form. --Hekaheka (talk) 23:33, 8 January 2024 (UTC)Reply[reply]
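As a rough check of the arithmetic in the estimate above, here is a minimal Python sketch. The noun counts, the 30 forms per noun and the six possessive suffixes come from the comment; treating each form as occurring either bare or with one of the six suffixes (a factor of 7) is an assumption made here, chosen because it lands close to the quoted "40 to 100 million".

```python
# Figures taken from the comment above; the factor of 1 + 6 is an assumption.
FORMS_PER_NOUN = 15 * 2      # 15 cases, each in singular and plural
POSSESSIVE_SUFFIXES = 6


def noun_form_estimate(noun_count: int, with_possessives: bool = False) -> int:
    """Return the number of potential noun-form entries for noun_count lemmas."""
    forms = noun_count * FORMS_PER_NOUN
    if with_possessives:
        # Assumed: every form exists bare and with each possessive suffix.
        forms *= 1 + POSSESSIVE_SUFFIXES
    return forms


if __name__ == "__main__":
    for nouns in (200_000, 500_000):
        print(nouns, noun_form_estimate(nouns), noun_form_estimate(nouns, True))
```

The possible 20% reduction for unattestable forms is left out, since the comment gives it only as an upper bound.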
@Hekaheka It would be great if MediaWiki could autogenerate the text of an inflected form, but in its current state it can't do either that or redirect from an inflected form to a lemma form. IMO the most useful thing about having inflected forms entered as such is when you have homophones or homographs between different inflected forms. This occurs fairly often in the Romance languages, for example, between noun and verb forms or between adjective and verb forms. It also occurs fairly often in Russian between noun and verb forms but rarely for adjectives except for short forms of adjectives; for this reason I have never done a bot run to create Russian adjective forms (besides the fact that there are a lot of them). If Finnish grammar is largely regular and doesn't have a lot of homonyms, I would think it's not useful to have inflected forms generated. I suppose for the moment we need to use our judgment as to whether it's worth it to create such forms. Benwing2 (talk) 23:38, 8 January 2024 (UTC)Reply[reply]
I would definitely appreciate their input! I didn't know about Surjection or their bot before you mentioned them, so I apologize for bothering you about it. Thank you! Vergencescattered (talk) 23:27, 6 January 2024 (UTC)Reply[reply]

Request to deploy {{szy-pron}}

I've created a Sakizaya pronunciation template, and I need help deploying it to all Sakizaya language entries on Wiktionary. Could you assist with this using your bot account? --TongcyDai (talk) 17:29, 7 January 2024 (UTC)Reply[reply]

@TongcyDai What needs to be done here? Are there any cases where manual respelling or other help for the template is needed? Benwing2 (talk) 22:54, 7 January 2024 (UTC)Reply[reply]
When adding the template, simply insert {{szy-pron}} into each Sakizaya entry; no parameters or respelling are needed. TongcyDai (talk) 10:16, 8 January 2024 (UTC)
Please let me know if there's anything else you need from me to deploy the template. --TongcyDai (talk) 18:38, 1 March 2024 (UTC)Reply[reply]

Relational -> demonym

Could you clean up Spanish demonyms like diff? It makes more sense than categorizing 900+ demonyms as relational adjectives just because they don't have a one-word translation in English. Ultimateria (talk) 19:23, 7 January 2024 (UTC)Reply[reply]

@Ultimateria Hi, I actually wrote a script a while ago to do exactly this but never ran it. I don't remember why; maybe it needed a few fixes. I'll go ahead and finish this. Benwing2 (talk) 22:52, 7 January 2024 (UTC)

Revert adding acceleration forms to {{pl-conj-ai}}

Hi @Benwing2. You just reverted the changes to the template {{pl-conj-ai}}. Could you please elaborate on what was broken, so I can see how it could be fixed while preserving the benefit of the acceleration forms? Incidentally, similar changes have been made to other templates, so the same error could arise for other verbs. You are referring to active adverbial participles, for which only a single form was used before, even though those participles have different forms depending on number and gender. Maybe the tool that broke needs to be updated to cater for those other forms. @Vininn126 JuChelou (talk) 14:04, 25 January 2024 (UTC)

@JuChelou For one thing, the specific value of 'active adjectival participle' (along with various other specific values) is processed specially in Module:accel/pl and causes the inflection to be set to 'actv|adj|part'. By changing this you broke this support, and caused it to use an invalid inflection tag set 'm|s|active adjectival participle'. The other inflections of the participle were similar. The correct thing to do is to leave the masc sing participial forms unchanged and if you want to add acceleration to the other forms, they should cause the form to be created as e.g. {{feminine singular of|pl|PARTICIPLE}} rather than as an inflection of the verb. You can see an example of how to handle this correctly by looking at the lines starting at Module:accel/pt#L-21. Benwing2 (talk) 22:50, 25 January 2024 (UTC)Reply[reply]
Thank you @Benwing2 for your reply. @Vininn126
I tried something in Module:accel/pl and {{pl-conj-ai}} to add proper accel form support for the adjectival participles.
However, I am not fully satisfied with the result because:
1/ on the masculine singular form, it could add 2 forms, for example for wyrzucający wyrzucać
2/ the result would not be similar if the new wiki page is triggered from the conjugation chart or from the adjective declension chart (which I also added recently). For example, for wyrzucające, the new wiki page triggered from the verb link would "miss" the fact that it is also the form for accusative neuter and accusative non virile.
Any advice? Or should I just ditch the extra accel forms for the participle and contributors would use the new accel links from the adjective declension module? JuChelou (talk) 16:18, 26 January 2024 (UTC)Reply[reply]
In theory you could generate wyrzucający and from there generate the others, but it's less than ideal. Vininn126 (talk) 16:40, 26 January 2024 (UTC)Reply[reply]
@JuChelou Hmmm, I'm not quite sure how to handle #2; either you'd have to add all the non-nominative forms of the participles to the verb table so that the accelerator code knows about them automatically, or you'd have to hack the code in MOD:accel/pl somehow to add the remaining inflections in. (This latter thing is possible, as I think I added a hook that you can define in the accelerator module that operates at the end after all the inflections have been combined.) As for #1, the general principle I've followed is not to include definitions for non-lemma forms that are identical in spelling to the lemma. I followed this principle, for example, when I created a bot script to add Russian noun inflections. This also happens in Portuguese verbs (where the 1st and 3rd singular future subjunctive usually looks the same as the infinitive), and for Latin feminine nouns (where the ablative singular is spelled the same as the nominative singular, although the pronunciation is different as the ablative ends in long -ā while the nominative ends in short -a). I actually removed the cases where Portuguese verbs were defined normally but had an additional definition as the 1st/3rd singular future subjunctive, but I may have left alone the Latin ablative cases because of the different pronunciation. In the Polish case, the pronunciation is the same and so you could fix this by just not having an accelerator defined on the forms that look like the lemma.
In general, I would actually argue that instead of including only the nominative case forms, it's best not to include anything but the masculine nominative singular of the various participles in the verb table, and require that the remaining forms be defined using accelerators on the participle table, even though User:Vininn126 thinks this is non-ideal. This is how we handle participles in Russian, for example, which is similar in many ways to Polish. I think the main benefit to having non-lemma participle forms defined in the verb table is if there are irregularities in their formation, but I don't think this is the case in Polish. Benwing2 (talk) 23:20, 26 January 2024 (UTC)Reply[reply]
An additional thought is maybe we shouldn't be defining non-lemma forms of participles at all, since AFAIK they're quite regular and there are a lot of them. See the discussion above about #Finnish inflections. This is the policy we follow for Russian, for example. Benwing2 (talk) 23:22, 26 January 2024 (UTC)Reply[reply]
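The principle described above for #1 (no separate definition for a non-lemma form spelled identically to the lemma) can be sketched in a few lines of Python. This is only an illustration and not the logic of Module:accel/pl; the function name and the slot labels are hypothetical.

```python
from typing import Dict


def forms_needing_entries(lemma: str, forms: Dict[str, str]) -> Dict[str, str]:
    """Keep only the inflected forms whose spelling differs from the lemma.

    `forms` maps a slot label (hypothetical labels here) to a spelled form.
    """
    return {slot: form for slot, form in forms.items() if form != lemma}


# Nominative singular forms of a Polish active adjectival participle.
participle_forms = {
    "m|s|nom": "wyrzucający",   # identical to the participle lemma, so skipped
    "f|s|nom": "wyrzucająca",
    "n|s|nom": "wyrzucające",
}

if __name__ == "__main__":
    print(forms_needing_entries("wyrzucający", participle_forms))
```

A filter like this only settles which forms get an accelerator at all; it does not address #2, where one spelling covers several case and gender combinations.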
Where do we define non-lemma participles? Vininn126 (talk) 10:17, 27 January 2024 (UTC)Reply[reply]
@Vininn126 Sorry, can you clarify what you mean? Benwing2 (talk) 10:37, 27 January 2024 (UTC)Reply[reply]
I simply didn't understand your last message Vininn126 (talk) 10:59, 27 January 2024 (UTC)Reply[reply]
Thank you @Benwing2 for your very detailed answer.
Regarding your recommendation for #1, it would be easy to remove the accel form for the version identical to the lemma form.
#2, however, would be trickier, as it would require duplicating the generation of all the forms, leaving room for discrepancies between the pl-adj module and the Polish accel module.
If I understand correctly, your overall recommendation is to remove all the other forms of the participles from the conjugation templates. We would then just have "active adjectival participle: masculine singular nominative form".
It would be similar to what is done for the verbal noun, where there is only the masculine singular nominative form, even though other forms exist.
@Vininn126 what would be your opinion on removing the additional forms of the adjectival participles from the conjugation templates? JuChelou (talk) 17:02, 28 January 2024 (UTC)Reply[reply]
Sounds fine to me; it's not typical to have them. Vininn126 (talk) 18:03, 28 January 2024 (UTC)Reply[reply]

On the {{quote-book}} template

I was wondering what exactly the combined use of the parameters |start_year= and |year= is supposed to communicate.
It's supposed to mean a range of dates, but, taking 1390–1400 as an example, is the range meant:

in the sense of "the composition of this work started in 1390, and ended in 1400"?
or in the sense of "this work was probably completed (or brought to its current state, if unfinished) somewhere between 1390 and 1400"?

Thanks in advance for any clarification. I've recently discovered these parameters, and I'm not sure I've been using them properly. —— GianWiki (talk) 15:24, 25 January 2024 (UTC)Reply[reply]

@GianWiki These parameters were there before I started to clean the template up, so you might ask User:Sgconlaw, but I'm thinking it's used for works that took several years to create. Benwing2 (talk) 23:45, 25 January 2024 (UTC)Reply[reply]
I see, I hadn't noticed that. I'll try asking them just to make sure.
Thank you for your time. —— GianWiki (talk) 08:18, 26 January 2024 (UTC)Reply[reply]
@GianWiki: I don't think the parameters were clearly defined at the time when I first tidied up the {{quote-}} templates. Personally, I use them to mean a range of publication dates (for example, if a novel is originally published in parts in a magazine over many months), and if I intend a range of dates to mean anything else I add a qualification in parentheses for clarity like this: |year=c. '''1597–1600''' (date written). — Sgconlaw (talk) 10:54, 26 January 2024 (UTC)Reply[reply]

WingerBot and Welsh animal genders

Hi, your bot edited garan ("crane") and petris ("partridge") so they would be “m or f by sense”, which isn’t correct. I've corrected them, but can you amend the bot so it doesn’t edit other animals like this please?

Garan is usually a masculine noun, that can be feminine due to dialect, rather than the sex of the animal (e.g. in Iolo Williams’s Llyfr Adar and the Geiriadur yr Academi) and petris is feminine.

I’ve consulted a bit with other Welsh speakers and the only source I can see for petris ever being masculine is the Geiriadur Prifysgol Cymru, which could easily be due to one or two examples from centuries ago. “A small cock partridge” would be ceiliog petris bach – where bach modifies ceiliog, not petris.

Cheers, Arafsymudwr (talk) 15:54, 30 January 2024 (UTC)Reply[reply]

@Arafsymudwr This was a one-off run where I manually made the changes in question in a text editor and only used the bot to push the changes (that's what "manually assisted" means in the changelog message). So there's no script to amend but I'll make sure not to change the genders of animal terms in Welsh (or generally in any language, I think) in the future. Benwing2 (talk) 06:11, 31 January 2024 (UTC)Reply[reply]

Links to English possessives in inflection-line templates

I wish I had included this in my request about links to components of hyphenated terms in English inflection templates. (How's that coming, BTW?) Many vernacular names of organisms are like Gundlach's hawk (See Gundlach's hawk). It would be better, especially for me, if the link were to Gundlach rather than the possessive. I can't think of any instances for which the possessive would be a better link target and believe that any such instances are relatively rare exceptions. DCDuring (talk) 16:29, 31 January 2024 (UTC)Reply[reply]

@DCDuring Yes, in fact my concerns over how to handle apostrophes are why this hasn't already gotten done. I'm thinking that we should split any term with a trailing 's except for one's and someone's (with exceptions also maybe for he's, she's, it's), but not split other terms with apostrophes (e.g. I'm, don't, haven't). BTW I notice that we've split apostrophe-s into two terms, 's for the contraction and -'s for the possessive. Personally I think this is confusing and probably they should be merged into 's (without the hyphen). It also makes auto-linking more difficult; probably we should link all occurrences of 's into -'s since this is the more common case. Benwing2 (talk) 22:07, 31 January 2024 (UTC)Reply[reply]
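The splitting rule proposed above could be sketched as follows in Python. This is a hedged illustration of the proposal, not code from Module:en-headword; the function name is hypothetical, the he's/she's/it's exceptions are the ones the comment marks as "maybe", and only the straight apostrophe is handled.

```python
from typing import List

# Terms that keep their 's attached; the last three are only tentative.
NO_SPLIT = {"one's", "someone's", "he's", "she's", "it's"}


def split_possessive(word: str) -> List[str]:
    """Return the link targets for one word of a multiword headword.

    A trailing 's is split off and linked to -'s (the possessive entry),
    except for the listed exceptions. Other apostrophe forms are untouched.
    """
    if word.lower() in NO_SPLIT:
        return [word]
    if len(word) > 2 and word.endswith("'s"):
        return [word[:-2], "-'s"]
    return [word]


if __name__ == "__main__":
    for w in ("Gundlach's", "one's", "don't", "hawk"):
        print(w, split_possessive(w))
```

With this rule, Gundlach's hawk would link Gundlach, -'s and hawk separately, which is the behavior requested at the top of this thread.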
This 's/-'s distinction gets to how to indicate the distinction between an inflectional ending and a contraction, doesn't it? On one level one needs a linguistics or philosophy degree to be qualified and/or motivated to argue this, but I don't hold the right degrees. On another level, how to help users, it would seem both should be on the same page, almost certainly 's. It probably should go to BP, but you may be able to go ahead with what is convenient to implement and rely on links between 's and -'s to help users in the meanwhile. DCDuring (talk) 22:22, 31 January 2024 (UTC)
@DCDuring Please see User:Benwing2/test-en-multiword for some examples of the new headword link handling system that I'm testing. It includes the ability to change the link of one (or several) of the words of a multiword expression without having to write out the entire expression; see the examples that specify |head=~.... (This functionality was already implemented for Italian and later extended to other Romance languages.) Note that if there are both hyphens and spaces, the default behavior is to link the space-separated components but not break up hyphen-separated components, although this can be changed using |splithyph=1. Possibly the default should be reversed and hyphen-separated components broken up by default unless |nosplithyph=1 is given; what do you think? Benwing2 (talk) 00:01, 2 February 2024 (UTC)Reply[reply]
I will look at it in about 16 hours. DCDuring (talk) 00:04, 2 February 2024 (UTC)Reply[reply]
@DCDuring: OK, thanks. BTW I'm thinking we should indeed change the default when there are both hyphens and spaces, and maybe make an argument to convert hyphenated terms to space-separated terms, e.g. for cases like civil-rights movement and claw-hammer coat that should be linked as [[civil rights|civil-rights]] [[movement]] and [[claw hammer|claw-hammer]] [[coat]] (likewise closed-circuit television, clock-face timetable, coffee-table book, etc.), although there are also examples like close-up lens, coin-operated laundry, context-free grammar, co-occurrence network, etc. where we do want to link the hyphenated component as such. Benwing2 (talk) 00:58, 2 February 2024 (UTC)Reply[reply]
I really like the more hyphenated forms because they reduce certain kinds of possible misreading of MWEs, but contemporary relative frequency may indicate that hyphenated forms are already much less frequent. For three-part English vernacular names of organisms, I often find that the hyphen is in the wrong place or is not useful. But black billed amazon is not a good substitute for black-billed amazon. DCDuring (talk) 01:10, 2 February 2024 (UTC)Reply[reply]
@DCDuring I have redone the handling of terms with both hyphens and spaces so that it now looks up the hyphenated term to see whether it exists in order to determine how to link it. Specifically:
  1. If the term exists as a space-separated compound, link to that. (We prefer space-separated compounds because the hyphen-separated form often exists as a soft redirect.)
  2. Otherwise, if the term exists as a hyphen-separated compound, link to that.
  3. Otherwise, link the hyphenated terms separately.
This handles most cases properly, although there are occasional situations where it fails; for example, close up and close-up both exist and are different, and by default close-up lens links (wrongly) to the former. For this reason I've provided params to override the default handling: |hyphspace=1 forces case (1) above, |nosplithyph=1 forces case (2) above, and |splithyph=1 forces case (3) above.
Benwing2 (talk) 05:27, 2 February 2024 (UTC)Reply[reply]
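The three-step lookup and its overrides, as described above, can be sketched like this in Python. It is a minimal illustration under stated assumptions, not the code in Module:en-headword: `page_exists` stands in for whatever existence check the module performs, and the override is passed as a string named after the corresponding parameter.

```python
from typing import Callable, List, Optional


def link_hyphenated(component: str,
                    page_exists: Callable[[str], bool],
                    override: Optional[str] = None) -> List[str]:
    """Return link targets for a hyphenated component of a multiword term.

    override may be "hyphspace" (force case 1), "nosplithyph" (force case 2),
    "splithyph" (force case 3) or None (use the existence lookup).
    """
    spaced = component.replace("-", " ")
    parts = component.split("-")
    if override == "hyphspace":
        return [spaced]
    if override == "nosplithyph":
        return [component]
    if override == "splithyph":
        return parts
    if page_exists(spaced):        # case 1: space-separated compound exists
        return [spaced]
    if page_exists(component):     # case 2: hyphen-separated compound exists
        return [component]
    return parts                   # case 3: link the pieces separately


# A toy set of existing pages, for illustration only.
PAGES = {"civil rights", "close up", "close-up", "context-free"}

if __name__ == "__main__":
    for term in ("civil-rights", "close-up", "context-free", "scaly-headed"):
        print(term, link_hyphenated(term, PAGES.__contains__))
```

In this toy data close-up resolves to close up by default, which is the failure case mentioned above; passing "nosplithyph" as the override restores the hyphenated link.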
I hope we will never have entries for terms like scaly-headed. So I'll have to use nosplithyph=1 for a vast number of vernacular names. I may as well not have asked for this favor. I suppose I could create a new template to wrap {{en-noun}} or {{head}}, specifying the parameter, to save keystrokes for these vernacular name entries. DCDuring (talk) 13:41, 2 February 2024 (UTC)
@DCDuring If you need to use |nosplithyph=1 for a large number of vernacular names, that is defeating the purpose of things. Can you explain why you think you need to use this for so many? Things like scaly-headed are SOP so should be split, IMO. Benwing2 (talk) 20:37, 2 February 2024 (UTC)Reply[reply]
I misread in haste, I think. DCDuring (talk) 22:43, 2 February 2024 (UTC)Reply[reply]
@DCDuring I have implemented the various changes to the linking behavior of Module:en-headword. They are documented on the module documentation page Module:en-headword/documentation (although the section on link modifiers is still to be written). There is text in the documentation of {{en-noun}}, {{en-verb}} and {{en-adj}} pointing to the module documentation page for the specifics about multiword linking and suffix handling. Let me know if there's anything else needed documentation-wise. Benwing2 (talk) 00:10, 7 February 2024 (UTC)Reply[reply]
The section on link modifications (renamed from link modifiers for clarity) is written. Benwing2 (talk) 00:46, 7 February 2024 (UTC)Reply[reply]

devil's own

I reverted WingerBot's edit to this entry not just because of the module error (I think you added |def= to the noun and proper noun code, but not to the adjective), but because it looks to me like the syntax is more along the lines of "[the devil's] own" rather than "the [devil's own]". Not that I would get into an edit war over this; I just wanted to make sure you were aware of that dimension before deciding how to fix things. Chuck Entz (talk) 04:23, 4 February 2024 (UTC)

@Chuck Entz Thanks. Yeah I forgot about handling adjectives with the in them. As for the syntax issue, all that |def=1 does is add the before the head; it doesn't assert any particular way of parsing the constituents. I suppose it could be interpreted as asserting an analysis like the [devil's own] but that wasn't my intention (and I'm not quite sure how we'd indicate such an analysis in the head). Benwing2 (talk) 04:47, 4 February 2024 (UTC)Reply[reply]
But adjectives don't have the in them. We should review the entries that so claim and determine whether there is good reason to ever have the inside the headword template for adjectives. DCDuring (talk) 14:14, 4 February 2024 (UTC)Reply[reply]
Never mind. I was thinking of leading the. We have numerous entries of purported adjectives with the embedded. Some of them seem like attributive use of a noun, but not all. DCDuring (talk) 14:23, 4 February 2024 (UTC)Reply[reply]

Category:LANG nouns with other-gender equivalents

Hello Benwing. I hope that this does not take too much of your time. How should CAT:Telugu nouns with gendered forms be added to MOD:te-headword? I tried looking at MOD:hi-pa-headword, but could not figure out what and where to add the equivalent of:

table.insert(data.categories, data.langname .. " " .. plpos .. " with other-gender equivalents")

to MOD:te-headword. I noticed that this feature was missing for Telugu when I saw

Synonym: (female) రచయిత్రి

at the entry for రచయిత (racayita). The Lua-fication of {{te-noun}} means adding features such as this is not as easy as adding

{{#if:{{{m|}}}{{{f|}}}{{{n|}}}|{{cln|te|nouns with other-gender equivalents}}}}

to {{te-noun}}. Kutchkutch (talk) 00:46, 5 February 2024 (UTC)Reply[reply]

{{#if:{{{m|}}}{{{f|}}}{{{n|}}}|{{cln|te|nouns with other-gender equivalents}}}}
at the end of {{te-noun}} seems to work for categorisation but not for the headword line. Kutchkutch (talk) 00:59, 5 February 2024 (UTC)Reply[reply]
@Kutchkutch Glad you figured it out. IMO Module:te-headword needs to be rewritten; it wasn't written by me and doesn't really follow the standard structure for such modules, which is probably why you had difficulty figuring out how to add the appropriate code. Benwing2 (talk) 22:33, 8 February 2024 (UTC)Reply[reply]

Email

Btw, idk if you have notifications turned on for emails, but I sent you one. Vininn126 (talk) 22:24, 8 February 2024 (UTC)Reply[reply]

Thanks, I responded. For some reason I didn't get an email notification here on Wiktionary even though I do have email notifications turned on. Benwing2 (talk) 22:32, 8 February 2024 (UTC)Reply[reply]

bùzháodiào

Hello. Could you help me fix the Traditional Chinese conversion here? Thanks. ---> Tooironic (talk) 00:31, 11 February 2024 (UTC)Reply[reply]

@Tooironic What's the exact issue? BTW in general I am not too familiar with how the Trad <-> Simp conversion works; User:Theknightwho knows more. Benwing2 (talk) 00:32, 11 February 2024 (UTC)Reply[reply]
Thank you User:Theknightwho! ---> Tooironic (talk) 00:39, 11 February 2024 (UTC)Reply[reply]

hmm

How much longer is it going to take you to finally finish making this new pronunciation module for Polish? You've been doing it for several months now, hurry up, or someone might think you're getting a little lazyyy :) Gugugagasraniewbanie (talk) 08:30, 13 February 2024 (UTC)Reply[reply]

@Gugugagasraniewbanie Yeah it will happen soon. Benwing2 (talk) 08:32, 13 February 2024 (UTC)Reply[reply]
OK, then you have my forgiveness Gugugagasraniewbanie (talk) 08:35, 13 February 2024 (UTC)Reply[reply]

Mon-Burmese script

I changed some letters defined for specific languages (e.g. "X is a letter of the Shan alphabet") to that language (i.e. Shan), then added a request for definition to the translingual entry. If this is somehow considered vandalism, I'll revert myself, but I'm assuming obvious fixes like this are acceptable, and it parallels other entries that only have definitions for specific languages. (A definition might be as simple as stating that it's a letter of the Mon-Burmese script corresponding to a certain letter in Sanskrit, but I didn't do that myself as I thought I might be accused of vandalism.)

I also removed a couple pronunciations that were for the wrong entry. kwami (talk) 04:25, 14 February 2024 (UTC)Reply[reply]

@Kwamikagami "Vandalism" doesn't seem like the right word for changes that are in good faith. As to whether they are wrong or counterproductive I don't know but they seem generally fine to me. User:RichardW57 do you have any comments? Benwing2 (talk) 04:45, 14 February 2024 (UTC)Reply[reply]
Okay, "blockable offense" then. kwami (talk) 04:47, 14 February 2024 (UTC)Reply[reply]
Yeah I understand. BTW I think blocking is only likely if you edit-war or keep making changes of a specific nature after people have objected to them. (Also editors who don't know what they're doing but think they do; editors of this nature can do a lot of damage.) Wikipedia seems generally more tolerant of edit-warring, maybe because of the number of editors relative to how many articles there are. Benwing2 (talk) 04:57, 14 February 2024 (UTC)Reply[reply]
@Benwing2: Which Shan alphabet? There are several Shan languages, which often makes the letters translingual because shared by several Shan languages! The change seems backwards - I would have said that the thing to do was to waste space by adding the Shan entry. As Burmese-script words easily consist of a single letter, cloning letters to each language using them makes Wiktionary more difficult to find by eye, in accordance with the apparent aim of difficulty of use. --RichardW57 (talk) 08:49, 14 February 2024 (UTC)Reply[reply]
If there are other Shan languages besides [shn], and they use the same letter, then they should be listed. But as it was, they were not listed -- only [shn] was.
And yes, I know you want to lump all languages together, but that's not the consensus for Wikt. kwami (talk) 18:49, 14 February 2024 (UTC)Reply[reply]
We have Shan (shn), Khamti Shan (kht), Aiton (aio), Phake (phk) and Tai Laing (tle) that use the Burmese script. The Tai Nuea (tdd) (= Tai Le /Tai Dehong / Chinese Shan) (not to be confused with Northern Tai or Northern Thai) and Tai Khuen (kkh) (though their speech is more akin to Northern Thai, but they identify as Shan) use different scripts. There's also Khamyang (ksu or nrr). Tai Ahom should arguably be included, but again it has its own script. --RichardW57 (talk) 23:32, 14 February 2024 (UTC)Reply[reply]
And when we say a letter is used by [shn], do we necessarily know that it's also used by the others? E.g. in Lik-Tai for Khamti? The label "Shan" may cover multiple languages in some usage, but when Wikt has an entry for Shan [shn], we mean specifically that language. When we mean Khamti, we say Khamti. Etc. But sure -- if we can demonstrate that a letter is used by multiple languages, we can say that it's used for multiple languages. Though when giving the pronunciation and orthographic rules, we need to be careful not to present [shn] as representative if it isn't. kwami (talk) 01:23, 15 February 2024 (UTC)Reply[reply]

Seeking template help

Hi, we find your Hindi language templates very helpful. Could you assist us with essential Sylheti templates (language code: syl) on English Wiktionary? We could contribute with translations, although we are still familiarizing ourselves with Wiktionary policies. -- ꠢꠣꠍꠘ ꠞꠣꠎꠣ (talk) 07:52, 16 February 2024 (UTC)Reply[reply]

@ꠢꠣꠍꠘ ꠞꠣꠎꠣ Hi, I'm up to my ears in requests so I won't be able to get to this soon, although if someone else wants to work on it using the Hindi modules as a starting point, I can provide guidance. Benwing2 (talk) 09:55, 16 February 2024 (UTC)

Category:Romance terms inherited from Latin nominatives

Hi. Sorry, I think I was a bit too 'bristly' with how I responded earlier. I really do support removing these categories and sticking the relevant content into 'Appendix: Romance terms plausibly inherited from Latin nominatives'. Nicodene (talk) 17:21, 18 February 2024 (UTC)Reply[reply]

@Nicodene This sounds good to me and "plausibly" sounds like a good term to use, and I apologize if I also was a bit in-your-face. If you can write the appendix and put the terms there in a list, I can remove the categories from the terms by bot. Benwing2 (talk) 19:56, 18 February 2024 (UTC)Reply[reply]
Done. This should actually make it easier for me to reorganise/restructure it all, which I've been meaning to do. Nicodene (talk) 20:39, 18 February 2024 (UTC)Reply[reply]
@Nicodene Thanks! Benwing2 (talk) 00:45, 19 February 2024 (UTC)Reply[reply]
@Nicodene I am going to remove the pages listed in the appendix from the '... inherited from Latin nominatives' categories. Just checking that this is OK with you. Benwing2 (talk) 04:56, 19 February 2024 (UTC)Reply[reply]
Yes, go for it please. Nicodene (talk) 05:01, 19 February 2024 (UTC)Reply[reply]
@Nicodene OK it's done. BTW the appendix is looking good and I'm glad you have included detailed notes. Benwing2 (talk) 05:26, 19 February 2024 (UTC)Reply[reply]

Macrolanguages

Hi - do you have any ideas for how we could handle macrolanguages in the data (Chinese being the most obvious example, given how we handle Chinese L2s)? I’m not keen to create a whole new type of object, since this situation comes up in loads of places, as we don’t have a coherent distinction between “is a type of” and “is a descendant of”, leading to the issues I mentioned in WT:RFM#Converting Min Nan into a family, where Teochew and Leizhou Min are “descended from” Min Nan, whereas they’re actually types of Min Nan.

I suspect you’ve noticed similar things with how Persian and Latin are handled. One common situation which stands out are language periods: we list Old Latin as ancestral to Latin, but as it’s an etym-only language of Latin that technically means we’re saying it’s ancestral to itself. Same for Early Modern English and English, and so on. We get round it by adding an explicit check to Module:languages to prevent a language being ancestral to itself, but that’s a kludge which is symptomatic of our poorly defined language model.

Also see the Japonic family tree at Category:Proto-Japonic language, where the periodisation of Japanese is all messed up because they’re all treated as etym-only languages part of Japanese, even though Early/Late Middle Japanese have Middle Japanese as their immediate parent. (They currently display in the wrong order, since Middle Japanese should not be listed before Early Middle Japanese if we were to follow the same system as Latin; the data is correct but Module:family tree is bugged.) A much bigger issue is that we imply Middle Japanese is split into three periods, and that the central period is somehow representative. This is confusing at best, and outright misleading to anyone who isn’t familiar with the nuances of our data modules. Theknightwho (talk) 18:29, 18 February 2024 (UTC)Reply[reply]

@Theknightwho Since you have merged etym-only and full languages to the point that both are more or less just types of Language objects, can we not just have a "type" field identifying something as a macrolanguage? That way it will still work as a language for most purposes. IMO we do need to properly distinguish is-a-X and is-a-descendant-of-X, and it seems you've provided a way with the ancestors field. As for the issue of Old Latin vs. Latin, we do have a "Classical Latin" etym language and ultimately we need to push more in this direction, although it will require some thinking. These are just the thoughts off the top of my head. Benwing2 (talk) 19:54, 18 February 2024 (UTC)
@Benwing2 Thanks - that's helpful to think about.
I'd rather not have a specific macrolanguage field, since it's superfluous to whether or not something is set as being a "type of" that language. I think the handling of Chinese, Latin, Persian, English and (one I missed above) Norwegian should probably all be done in the same way. At the most extreme end, the Sinitic family and Chinese are in fact the same thing, so I'm more inclined towards having a way to set one language as a type of another (as we do with etym-only languages), fully merging etym-only languages into languages, and then having a flag which sets whether it should be treated as a full language. That way, we also get rid of the weird half-and-half situation going on with Classical Persian and the arbitrary distribution of Chinese lects between language and etym-only language, while making it more straightforward to switch something from one to the other (e.g. the Prakrits). It may also be worth doing the same with families, since (as Chinese shows) macrolanguages and families are basically the same thing in most situations.
I think we probably need some kind of periodisation mechanism. In the case of Latin, if we're treating Old Latin as a "type of" Latin, then strictly speaking Latin's ancestor should be Proto-Italic. However, within that we could have the various periods, including Classical Latin, and there should be a way to set a default period for situations when only the generic language code is provided. For most languages that would be the standard language; in the case of Latin, it would be Classical. This would also potentially address the issue of cross-overs between regional lects and periods (e.g. Northern Early Modern English), and should also help avoid the silly Japanese situation, since periods should be possible to nest inside each other. Theknightwho (talk) 20:10, 18 February 2024 (UTC)
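To make the proposal above a little more concrete, here is a minimal Python sketch of a data model that keeps "is a type of" separate from "is descended from" and supports a default period. It is only an illustration: the field names (variety_of, default_period, full_language) and the helper functions are hypothetical, the language codes are used purely as examples, and nothing here reflects how Module:languages is actually implemented.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Lect:
    code: str
    variety_of: Optional[str] = None       # "is a type of" link (hypothetical field)
    ancestors: List[str] = field(default_factory=list)  # "is descended from" links
    full_language: bool = True             # False for what is now an etym-only language
    default_period: Optional[str] = None   # period meant when only the bare code is given


def resolve(code: str, lects: Dict[str, Lect]) -> Lect:
    """Return the default period of a lect if one is set, else the lect itself."""
    lect = lects[code]
    if lect.default_period is not None:
        return lects[lect.default_period]
    return lect


def top_level(code: str, lects: Dict[str, Lect]) -> str:
    """Follow variety_of links up to the outermost containing language."""
    while lects[code].variety_of is not None:
        code = lects[code].variety_of
    return code


def ancestors_of(code: str, lects: Dict[str, Lect]) -> List[str]:
    """Ancestors of a lect, inherited from the nearest container that lists any.

    A variety never counts its own container as an ancestor, so no explicit
    "is this language ancestral to itself?" check is needed.
    """
    current: Optional[str] = code
    while current is not None:
        lect = lects[current]
        if lect.ancestors:
            return list(lect.ancestors)
        current = lect.variety_of
    return []


# Illustrative data only: Old and Classical Latin as periods inside Latin.
LECTS = {
    "itc-pro": Lect("itc-pro"),
    "la": Lect("la", ancestors=["itc-pro"], default_period="la-cla"),
    "la-cla": Lect("la-cla", variety_of="la", full_language=False),
    "itc-ola": Lect("itc-ola", variety_of="la", full_language=False),
}
```

With this data, ancestors_of("itc-ola", LECTS) gives Proto-Italic rather than Latin, and resolve("la", LECTS) picks the Classical period. Nested periods (as in the Japanese case) would simply be longer variety_of chains.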
@Theknightwho All this sounds good to me in general although it would be helpful if you could write out your proposals in more detail as it's sometimes a bit hard for me to work out what your thoughts are when presented abstractly. Benwing2 (talk) 20:31, 18 February 2024 (UTC)Reply[reply]
@Benwing2 Will do. I’ll also have a think about how we should handle this in the family tree display, since a lot of the confusion stems from that displaying descendants and variants/types in exactly the same way. Theknightwho (talk) 20:52, 18 February 2024 (UTC)Reply[reply]
One problem that needs to be addressed is that language change doesn't always follow a tidy tree model. Macrolanguages are messy. A macrolanguage always has a standard lect that the other lects identify with- but there can be more than one, and which lect is the standard can change over time. Even some of the more complex ordinary languages have similar phenomena. This can end up being reflected in the history of languages both within and deriving from the (macro)language.
With English, you have the same language changing its prestige/standard dialect several times in Old English due to the rise and fall from prominence of specific kingdoms: Anglia, Mercia, Northumbria, and finally Wessex (this is off the top of my head- I'm sure I missed something). With the transition to Middle English it all moved to London. Middle English borrowed heavily from Old Northern French, but since then the source has been Parisian French. Scots split off from the northern dialects that descended primarily from Northumbrian. I'm sure there were changes in the Old Norse dialects that Old English and Middle English borrowed from, and then there's the matter of Brythonic Pictish and Goidelic Gaelic in Scotland and their influence on Scots and northern English.
China had several changes in which were the prestige lects, and these are reflected in the various named yomi in Japanese, as well as the borrowings into other neighboring languages. Then there's Mycenaean Greek, which is different from whatever became Ancient Greek, and the fact that older Latin borrowings didn't come from the Attic dialect that became modern Greek, and Tsakonian that came from Doric, etc.
If you look at a regional lect, you can find things descended directly from the same region in the ancestral language, and things that came in from the standard lects of the different historical stages, and other things that were borrowed from various external languages. Sometimes separate languages split off from these regional lects, so they have more in common with the regional varieties of the main language than with the standard lects of any historical period.
To stretch the tree analogy a bit: sometimes a limb that's touching the ground sets root and becomes a tree in its own right, and other times branches or roots from separate trees graft together after prolonged contact.
I seem to have written a book here, but I hope you can see what I'm getting at. It would be a good idea to think about some way of representing the internal structure of macrolanguages and even regular languages, and the way that different descendants can come from different parts of the same language. There's a complex interchange between region and historical period, so the Wessex dialect of today has a completely different status from the Wessex dialect of a thousand years ago, and the geographical identification of what's mainstream and what's dialectal changes over time. It's all secondary to the main concept of parent and daughter language, but it might help us with some exceptional cases like Chinese. Chuck Entz (talk) 23:15, 18 February 2024 (UTC)Reply[reply]
Agreed. Even Anglo-Norman, the main vehicle of 'Gallicisms' in Middle English, began as a chaotic hodge-podge of Old French dialects, certainly in many respects 'northern-flavoured', but not only, and increasingly slanting towards (but never quite attaining) Central French norms as the centuries went by. In this case as well there is no question of a precise dialectal ancestry. Nicodene (talk) 14:34, 19 February 2024 (UTC)Reply[reply]

Italicising synonyms for taxonomic names

Hi Benwing. Could you edit Module:form of, Module:form of/templates, and/or T:synonym of to add the ability to italicise the linked-to term in transclusions of {{synonym of}} (preferably by calling |i=), please? Such functionality is needed for taxonomic synonyms. ATM, work-arounds like those seen in Asclepias filiformis var. buchenaviana, Bulbophyllum buchenavianum, Gomphocarpus filiformis var. buchenavianus, Megaclinium buchenavianum, and Tropaeolum buchenavianum are necessary. 0DF (talk) 00:38, 19 February 2024 (UTC)Reply[reply]

@DCDuring who would know how this is handled in other taxonomic entries. Chuck Entz (talk) 01:08, 19 February 2024 (UTC)Reply[reply]
Now, {{syn of}} (and {{alt form of}}, possibly others) suppresses italics formatting that {{taxlink}} provides or direct or piped wikitext formatting. All we would need is for templates like {{syn of}} and {{alt form of}} to handle embedded wikitext for italics, as is now possible in other templates that incorporate links. Alternatively, something like {{syn of}}, say {{taxsyn}} (also {{taxalt}}), would have all the formatting capabilities of {{taxlink}}, which include not italicizing terms like "var.", "section" ("sect.", "subsect"), "subg.", and "subsp." in taxonomic names. This would probably not involve too much renaming of templates at this point. DCDuring (talk) 13:58, 19 February 2024 (UTC)
And it would be nice to allow † to appear without requiring pipes. DCDuring (talk) 14:37, 19 February 2024 (UTC)
@DCDuring: I assume it would be possible to include the non-italicising functionality of {{taxlink}} in {{synonym of}} by making it contingent upon both |1=mul and |i=1 being true. I can't imagine a case in which one would want to define a term as a synonym of something translingual that contains any of the strings sect., subg., subsect., subsp., or var.; italicise it; and for that term not to be a taxonomic name. 0DF (talk) 14:38, 19 February 2024 (UTC)Reply[reply]
The italicization rules of the various taxonomic bodies include that all taxonomic names (ie, any rank) of viruses, bacteria, and archaebacteria be italicized. It is probably simpler to use passed-through wikitext italics than to duplicate {{taxlink}} functionality. DCDuring (talk) 14:47, 19 February 2024 (UTC)Reply[reply]
@DCDuring: I only meant {{taxlink}}'s functionality of automatically de-italicising those few abbreviations. Italicising dependent on parsing the taxon (as a species, genus, phylum, or whatever) seems superfluous and unnecessarily complicated for {{synonym of}}; |i=1 should be all that's necessary. 0DF (talk) 14:59, 19 February 2024 (UTC)
It seems too complicated to me too, but I've often been surprised with what our techno-mavens are willing to do, for reasons that remain mysterious to me. Simply passing through wikiformatting (and, possibly, "†") would be fine with me. It would be easy enough to find the relatively few instances we would have of improper handling of those not-to-be-italicized terms in {{syn of}}, {{alt of}}, and the various etymology templates, too. DCDuring (talk) 19:06, 19 February 2024 (UTC)
@DCDuring: How would you want the obelus to be treated? 0DF (talk) 22:42, 19 February 2024 (UTC)Reply[reply]
Directly in front of taxon, ignored for linking, but displayed without being italicized. DCDuring (talk) 12:42, 20 February 2024 (UTC)Reply[reply]
@DCDuring: De-italicising would be handled in the same way as it's handled for sect., subg., subsect., subsp., and var., I expect. Stripping from the link text would be easy (handled in the same way Latin ā, ē, ī, ō, ū, ȳ link to Latin a, e, i, o, u, y), but it may end up being enacted in undesirable circumstances. Do we need a new (mul-tax?) language code for taxonomic names, perhaps? 0DF (talk) 18:06, 20 February 2024 (UTC)Reply[reply]
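As a rough illustration of the behaviour discussed in this thread (italicise the name, leave the rank abbreviations upright, and keep a leading obelus in the display but out of the link target), here is a small Python sketch. The function name format_taxon and the exact abbreviation list are assumptions made for the example; this is not how {{taxlink}} or Module:form of is written.

```python
# Rank abbreviations that stay upright inside an italicised taxonomic name.
RANK_ABBREVIATIONS = {"sect.", "subg.", "subsect.", "subsp.", "var."}
OBELUS = "\u2020"  # the dagger sign


def format_taxon(name):
    """Return (link_target, display_wikitext) for a taxonomic name.

    The display uses wikitext italics ('' ... '') around every run of words
    that is not a rank abbreviation. A leading obelus is shown unitalicised
    and is dropped from the link target.
    """
    prefix = ""
    if name.startswith(OBELUS):
        prefix = OBELUS
        name = name[len(OBELUS):]
    words = name.split()

    parts = []
    run = []  # consecutive words to italicise together
    for word in words:
        if word in RANK_ABBREVIATIONS:
            if run:
                parts.append("''" + " ".join(run) + "''")
                run = []
            parts.append(word)
        else:
            run.append(word)
    if run:
        parts.append("''" + " ".join(run) + "''")

    target = " ".join(words)
    display = prefix + " ".join(parts)
    return target, display
```

For example, format_taxon("Asclepias filiformis var. buchenaviana") keeps "var." outside the italics, which is the one piece of {{taxlink}} behaviour that 0DF suggests carrying over.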
I'd prefer a shorter one, of course, like 'mult' or 'mul-t'. DCDuring (talk) 18:27, 20 February 2024 (UTC)Reply[reply]
@Mahagaja: How much freedom do we have in devising language codes? 0DF (talk) 18:30, 20 February 2024 (UTC)Reply[reply]
@0DF: You'd have to get consensus at WT:RFM for it. I wouldn't hold my breath. —Mahāgaja · talk 18:48, 20 February 2024 (UTC)Reply[reply]
@Mahagaja: Thanks for the response. I mean, rather, what restrictions are there on the form that language codes take? I know we use ISO 639-3 codes where they're available, but what about custom, in-house codes? 0DF (talk) 20:17, 20 February 2024 (UTC)Reply[reply]
@0DF @Mahagaja @DCDuring We actually already have mul-tax as a variant of Translingual (no idea when it got added, but see Module:etymology languages/data). I don't think it's used for anything at the moment, but it would make sense to use it for this. Theknightwho (talk) 20:25, 20 February 2024 (UTC)Reply[reply]
@Theknightwho: Thank you.
@DCDuring: How 'bout it?
0DF (talk) 20:29, 20 February 2024 (UTC)Reply[reply]
I always fear that the cure will turn out worse than the disease. Can it all be done automagically or will there be a few hundred exceptions? It is true that mul in Latin script is hard to confuse with mul in CJKV. DCDuring (talk) 20:37, 20 February 2024 (UTC)Reply[reply]
Daniel Carrero added Tax. "for test purposes" back in November 2016; -sche then standardized it to mul-tax. I don't know what he was testing, but the code is there for anyone who wants to use it. —Mahāgaja · talk 20:38, 20 February 2024 (UTC)Reply[reply]
@DCDuring: I looked at the histories of Module:form of, Module:form of/templates, and {{synonym of}}. They showed me that Benwing had done a lot of editing on all three, so I figured he/she would be sufficiently familiar with those pages to make the changes I requested. There's nothing suspicious about that and I hardly see how I can be said to have "power" here. 0DF (talk) 00:33, 21 February 2024 (UTC)Reply[reply]
It's a habit of exclusion, not an intent of exclusion. Specific folks can always be pinged. DCDuring (talk) 14:31, 21 February 2024 (UTC)Reply[reply]
@DCDuring: I guess so. Not that I intended the request to turn into a prolonged discussion. 0DF (talk) 15:12, 21 February 2024 (UTC)Reply[reply]

Error handling with Module:parameters and Module:languages

Hiya - just a heads up (and you've probably noticed already), but I've recently updated Module:parameters to allow languages, scripts, families (etc) as data types, as well as a few other things. This means that the argument table which is returned contains the relevant object(s), and invalid codes will throw an error (which automatically highlights the incorrect parameter). This avoids having to manually handle invalid codes, since the only way to do proper error-handling previously was to pass the ready-baked parameter into Module:languages using getByCode's paramForError parameter, which was tricky when dealing with lists etc. Having converted a number of template modules, it's also cut down on code length by quite a bit, too.

Ideally, we should be able to remove error handling from Module:languages and Module:scripts altogether at some point, since it doesn't really belong there, and it's annoying having to work around it when requesting etymology langs and families, too. Theknightwho (talk) 15:21, 27 February 2024 (UTC)Reply[reply]

@Theknightwho Yup I did notice it, thanks. I haven't had a chance to use the new functionality but it sounds good to me. BTW if you haven't already done this you might consider adding support for comma-separated lists of lang codes and for a term with a preceding language code (see parse_term_with_lang in Module:parse utilities, which implements this latter functionality currently). Benwing2 (talk) 20:01, 27 February 2024 (UTC)Reply[reply]
@Benwing2 I've already done the comma-separated list actually, but haven't updated the documentation since I want to make sure the implementation is stable/won't need further expansion. The solution I opted for was sublist=, where sublist=true splits the list using %s*,%s*, but using a string value allows for other splits. The other thing which isn't yet documented is set=, which is for parameters that take an (ideally small) closed set of values, where inputs with other values would be nonoperative anyway.
I'll have a think about how to handle preceding langcodes. Theknightwho (talk) 20:07, 27 February 2024 (UTC)Reply[reply]
@Theknightwho The |set= support is definitely useful. Note that the corresponding argument in Python's argparse module is called choices=, which might possibly be a clearer name (although I can see the argument for using set as well). Benwing2 (talk) 20:16, 27 February 2024 (UTC)
@Benwing2 That makes sense. The reason I opted for set= is because it uses the {a = true, b = true, c = true} format, since that makes lookup much faster/simpler. Theknightwho (talk) 20:26, 27 February 2024 (UTC)Reply[reply]
@Theknightwho Hmm, I wonder if that isn't false economy since it requires more typing, and I imagine a lot of people will call listToSet on a list to handle this format. Benwing2 (talk) 20:28, 27 February 2024 (UTC)Reply[reply]
@Benwing2 That's a good point, but checking a list is the same amount of work as doing listToSet, so changing Module:parameters to accept a list would simply guarantee the worst-case scenario, instead of leaving it up to the calling module. Theknightwho (talk) 20:34, 27 February 2024 (UTC)Reply[reply]
@Theknightwho I suppose but the actual difference in memory and speed is completely negligible, so IMO you might as well make it easier for the callers. Benwing2 (talk) 20:54, 27 February 2024 (UTC)Reply[reply]
And also you don't have the overhead of loading a new module. Benwing2 (talk) 20:54, 27 February 2024 (UTC)Reply[reply]
@Benwing2 If I have time, I might do some profiling on Module:parameters, since I have a feeling it's contributing a significant chunk to page loading time. e.g. a loads about a second faster since I made the changes, and there are still quite a few other optimisations that could be made. Theknightwho (talk) 21:02, 27 February 2024 (UTC)Reply[reply]
@Theknightwho OK but I still think requiring the use of a set rather than (also) allowing a list is a micro-optimization since the number of items should be small. Benwing2 (talk) 21:10, 27 February 2024 (UTC)Reply[reply]
@Benwing2 Alright - I can change it. Theknightwho (talk) 21:16, 27 February 2024 (UTC)Reply[reply]
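For readers following the sublist=/set= exchange, the behaviour being discussed can be sketched in Python roughly as follows. This is an analogy only: process_param and its keyword names are invented for the example, the regex \s*,\s* stands in for the Lua pattern %s*,%s*, and the real Module:parameters API may differ. The sketch accepts either a list or a set-style table for the allowed values, which is the compromise reached above.

```python
import re


def process_param(value, sublist=None, allowed=None):
    """Validate one raw parameter value (hypothetical helper).

    sublist: True splits on commas with optional surrounding whitespace;
             a string is used as a custom regex to split on.
    allowed: a closed set of permitted values, given as a list, a set,
             or a dict in the {value: True} style.
    """
    if sublist is True:
        items = re.split(r"\s*,\s*", value.strip())
    elif isinstance(sublist, str):
        items = re.split(sublist, value.strip())
    else:
        items = [value]

    if allowed is not None:
        # set() works for lists, sets, and dict keys alike, so callers
        # never need to convert a list to a set themselves.
        allowed_set = set(allowed)
        for item in items:
            if item not in allowed_set:
                raise ValueError(f"value {item!r} is not one of {sorted(allowed_set)}")

    return items if sublist else items[0]
```

Converting the list once per call is cheap for the small closed sets this is meant for, which is the point Benwing2 makes about micro-optimisation.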
@Theknightwho & Ben: pardon the partial threadjacking, but I've been wanting to ask you two about the practicality of adding parameter checking to existing, non-Lua templates, and this seems like an opportune moment while you're both already thinking about Module:parameters. I'm envisioning something like an unobtrusive template {{allowparams|1,2,3,foo,bar,baz}} that could be added to existing templates to generate errors/warnings when the template is invoked with any params besides those listed. On the backend, it could just call Module:parameters.process() with the list of supplied params and then do nothing with the result. Ignoring the difficulty of identifying the valid parameters and cleaning up all the existing calls with invalid parameters, would adding param checking to every template add an unacceptable overhead to page processing? JeffDoozan (talk) 01:45, 28 February 2024 (UTC)
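The check that the proposed {{allowparams}} would perform is simple to state. A minimal Python sketch, assuming the allowed names arrive as one comma-separated string and the supplied arguments as a mapping, could look like this (unexpected_params is a hypothetical name, and this is not what Module:parameters.process() actually does internally):

```python
def unexpected_params(allowed_spec, supplied):
    """Return the sorted names of supplied parameters that are not allowed.

    allowed_spec: comma-separated parameter names, e.g. "1,2,3,foo,bar,baz"
    supplied:     mapping of parameter name -> value from the template call
    """
    allowed = {name.strip() for name in allowed_spec.split(",") if name.strip()}
    return sorted(name for name in supplied if name not in allowed)
```

An empty result means the call is clean; anything else is what the template would report as an error or warning.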
@JeffDoozan I think User:Theknightwho can best answer the question about efficiency as he's done a lot more investigations of this sort. Benwing2 (talk) 01:48, 28 February 2024 (UTC)Reply[reply]
@JeffDoozan That's certainly doable, but it would add an extra Lua burden to those templates, and in many cases it would be more straightforward to do the whole thing in Lua anyway.
The reason why it concerns me is that a lot of these mixed templates already make multiple calls into Lua to retrieve things like language names, and there is an inherent cost every time a module is invoked; this is the reason why {{multitrans}} is so effective, because it removes that inherent cost from each template. Aside from memory costs, each invocation is quite time-consuming (relatively speaking), since a ton of things are done by the back-end to create each new Lua environment. Theknightwho (talk) 01:48, 28 February 2024 (UTC)Reply[reply]
@Theknightwho: Thank you for the explanation. I had naively assumed that if a page calls Lua once, then subsequent calls would be relatively cheap. I'm still assuming that most pages include few enough templates that the benefit of having parameter checking outweighs the cost of invoking the checks, but as pages get bigger and closer to memory/speed limits, the calculus may change. Do you have any guess where that tipping point might be? (100 additional calls? 1,000? 10,000?) For pages that exceed that threshold, maybe {{allowparams}} could check the pagename against a fixed "denylist" of problematic pages before invoking Lua. I'm assuming the denylist would be < 100 pages and could be programmatically generated from an XML dump by counting the number of templates that would call {{allowparams}}. What do you think? JeffDoozan (talk) 17:39, 28 February 2024 (UTC)
@JeffDoozan So conventional wikicode would probably preclude that being workable, because there's the post-expand include size limit of 2MB, which is calculated by adding up the size of every page accessed, multiplied by the number of times it's accessed, and on top of that, parser functions like {{#if:}} actually apply a multiplier to anything that goes through them (which compounds, though I think it's capped at something like x12). This was a big problem we ran into with the lite templates, where the bottom 10% of a simply wasn't loading templates anymore. Even now, it's using about 1.8MB of the limit. Obviously I'm being really pessimistic when I say these things, but the irony of it is that adding these kinds of checks to aid large pages can end up having the opposite of the intended effect!
The things that help are:
  • Reducing the number of calls into Lua. If it can be done in one invoke that's ideal, but really it should be no more than 5. This includes uses of any templates which themselves are Lua based (like {{l}}), since they each result in independent calls into Lua. The Coptic conjugation templates are a great example of why this matters, since they're way slower than water/translations despite having nowhere near as many links.
  • Not creating complex wikicode logic with the parser functions (like we do with the citation templates, for example). They're really slow, a pain in the neck to maintain, and inevitably result in lots of separate Lua invocations for basic information like language names.
In terms of the parameter checking, let me know if there are any templates which are on your priority list, because it may be that we can score some quick-wins by converting some of them into pure-Lua, whereas with others the manual parameter checking may be workable. Theknightwho (talk) 17:51, 28 February 2024 (UTC)Reply[reply]
@Theknightwho That kind of deep information is exactly why I wanted to run this by you. Since I'm hoping to do this programmatically and en masse, it would be limited to templates where I can parse the code to find all of the parameters used, which eliminates anything already calling #invoke since the invoked module can make its own use of the parameters and I'm not sure how practical it is to try to determine the parameters used by a Module. I think this means that every modified template would mean 1 additional call to Lua for every use and also that there's likely little or no benefit to converting them to Lua. How many total Lua calls on a page is too many?
I would probably start with the templates that don't already have calls with bad parameters, which probably means the lesser used templates that might not even be included on our biggest pages. I can check which templates are used on pages with more than X template calls and exclude those templates from the mass conversion, to ensure we're not adding additional stress to our biggest pages. I understand that not all template calls are equal, but is there some reasonable number of template calls I could use for detecting "big" pages? 100? 500? 1000? JeffDoozan (talk) 20:34, 28 February 2024 (UTC)
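The "pages with more than X template calls" filter could be approximated along the following lines. This is a deliberately crude Python sketch: it counts opening {{ pairs with a regular expression, skips parser functions ({{#...) and triple-brace parameters ({{{...}}}), and does not expand templates, so nested calls produced by other templates are not seen. The function names and the threshold are placeholders.

```python
import re

# "{{" that is not part of "{{{" and does not start a parser function.
TEMPLATE_OPEN = re.compile(r"(?<!\{)\{\{(?![{#])")


def count_template_calls(wikitext):
    """Rough count of direct template calls in raw wikitext."""
    return len(TEMPLATE_OPEN.findall(wikitext))


def big_pages(pages, threshold):
    """Titles of pages whose direct template-call count exceeds threshold.

    pages: mapping of page title -> raw wikitext (e.g. read from a dump)
    """
    return sorted(
        title for title, text in pages.items()
        if count_template_calls(text) > threshold
    )
```

A proper parser such as mwparserfromhell would give more reliable counts; the regex version is only meant to show the shape of the computation.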

"terms spelled with"

Hi, I would like to bring your attention to categories such as Category:Hindi terms spelled with ॉ. We seem to have decided that ◌ (U+25CC) should not be used for the Hindi combining characters, but Translingual doesn't seem to know about that, which is why Category:Translingual terms spelled with ◌ॉ exists. What should we do about that? --kc_kennylau (talk) 16:47, 28 February 2024 (UTC)Reply[reply]

@Kc kennylau Can you explain further about U+25CC? What is its replacement? As for the "terms spelled with" categories, AFAIK these categories are suppressed for one-character entries but this entry seems to involve two Unicode chars. Maybe User:Theknightwho can comment more as he reworked the code to generate these categories. Benwing2 (talk) 02:40, 29 February 2024 (UTC)Reply[reply]
U+25CC is usually used with combining characters (see Category:Translingual terms spelled with ◌̺, which is U+25CC followed by U+033A) in order to display the character. However, due to some unknown reasons, at least in my browser the Hindi combining characters in "isolation" already come with a dotted circle when they are rendered, so using U+25CC would create two dotted circles when displayed. I tried to look at The Unicode Standard, but so far it seems to me that this is not really specified one way or another, at least not specifically for Devanagari. This is why I don't really know if we should include U+25CC or not. --kc_kennylau (talk) 02:48, 29 February 2024 (UTC)Reply[reply]
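The two conventions described here (with or without the U+25CC placeholder) can be compared with a short Python sketch using the standard unicodedata module. The helper names are made up for illustration, and whether a given script should get the placeholder is exactly the open question in this thread, so it is left as a flag.

```python
import unicodedata

DOTTED_CIRCLE = "\u25cc"


def is_combining(ch):
    """True if the character is a mark (general category Mn, Mc, or Me)."""
    return unicodedata.category(ch).startswith("M")


def display_form(ch, add_placeholder=True):
    """Prefix a combining character with U+25CC if requested."""
    if add_placeholder and is_combining(ch):
        return DOTTED_CIRCLE + ch
    return ch


def strip_placeholder(label):
    """Remove a leading U+25CC from a label such as a category name suffix."""
    if len(label) > 1 and label.startswith(DOTTED_CIRCLE):
        return label[1:]
    return label
```

Under this sketch, the Translingual convention corresponds to display_form(ch) and the Hindi one to display_form(ch, add_placeholder=False); strip_placeholder would map one category label onto the other.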

Latin macronization change: veho, vē̆xī, vectum

Hello, I was just looking into the vowel length of Latin vē̆xī (perfect of vehō) and it looks like most recent sources think there's a good chance that it had a long vowel like Sanskrit ávākṣam (although there is some uncertainty). I edited the entry for vehō with notes on this and to mark the vowel in the perfect stem as ē̆, but of course, that doesn't affect all the inflected forms and derived compounds (e.g. advehō, convehō, invehō, prōvehō, subvehō, trānsvehō, ēvehō). Could you have Wingerbot update those? (The long vowel seems to only be reconstructed for the perfect stem vē̆x-, not the supine stem vect-). I hope it's not too much trouble. I have also been wondering how I might set up a bot account of my own to make changes like this after editing the length of a vowel in Latin entries; if that's feasible for me to do, any tips would be welcome! Urszag (talk) 20:46, 1 March 2024 (UTC)Reply[reply]

@Urszag Hi. I'll go ahead and fix these. As for setting up a bot account, in order to do that (a) you need to be able to write Python scripts, (b) you do some small test runs using your own account and verify that everything works, (c) you set up a vote to create an account for your bot using the link in WT:Votes. I recommend using a combination of pywikibot to interface to Wiktionary and mwparserfromhell to parse the template invocations on a given page. Note that there's also AutoWikiBrowser which lets you make semi-automated changes based on regular expressions and takes less work to set up than a bot account; I used this several years ago before I set up a bot account. (It is only supported on Windows but it seems to work OK through Wine on MacOS, and there's also a JavaScript browser variant called JWB.)
BTW are there any other macron changes you need done? I think there's an outstanding request somewhere in my archives that I never got to, possibly it was from you. Benwing2 (talk) 01:49, 2 March 2024 (UTC)
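For anyone attempting the kind of script described above, the core string operation for a change like vēx- to vē̆x- is small. The Python sketch below shows only that step, using the standard library; fetching and saving pages (with pywikibot) and locating the template arguments to rewrite (with mwparserfromhell) are left out, and the function name is an invention for the example, not part of the actual bot scripts.

```python
import unicodedata

BREVE = "\u0306"  # combining breve, placed after a macronised vowel

OLD_STEM = "v\u0113x"          # vēx-
NEW_STEM = "v\u0113" + BREVE + "x"  # vē̆x-


def update_form(form, old_stem=OLD_STEM, new_stem=NEW_STEM):
    """Replace old_stem with new_stem in a macronised Latin form.

    Input is normalised to NFC first so that a decomposed "e + macron"
    matches the precomposed letter. Forms that already contain the breve
    are left alone, because the breve sits between the vowel and the x
    and so the old stem no longer occurs in them.
    """
    form = unicodedata.normalize("NFC", form)
    return form.replace(old_stem, new_stem)
```

Because only the perfect stem is targeted, supine-stem forms such as vectum pass through unchanged.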
Done. Benwing2 (talk) 05:18, 2 March 2024 (UTC)Reply[reply]
OK, I found the previous request. It was from you in April 2023: User talk:Benwing2/2023#More Latin vowel length changes. You mentioned hirtus, hirsutus, luxus, luctor. The relevant part of the input to my script has this:
### hīrtus
a1 hīrtus
pn2 Hīrtius
a1 hīrsūtus
a1 hīrtellus
a3 hīrtipēs hīrtiped
### lūctor
v1+ lūctor
n1 lūcta
n3 lūctātiō
n3 lūctātor
v1+ adlūctor
v1+ allūctor
v1+ collūctor
n3 collūctātiō
v1+ conlūctor
n3 conlūctātiō
v1+ ēlūctor
a3 ēlūctābilis
a3 inēlūctābilis
v1+ relūctor
n3 relūctātiō
### lūxus "dislocated"
a1 lūxus
n4 lūxus
v1+ lūxō
Do all these need to change to ī̆ ū̆? Are there any words missed here? Also can you give me the appropriate changelog comment(s) to have the bot add when making the changes? The default is "if before two cons, per Bennett corrected by Allen and Michelson" but that's obviously wrong for these cases. Benwing2 (talk) 05:30, 2 March 2024 (UTC)Reply[reply]
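The listing above appears to follow a simple layout: a "###" line opens a group, and each following line has a short code (a1, n3, v1+ and so on) followed by one or more macronised forms. Assuming that reading is right, which is a guess from the excerpt rather than anything documented here, a Python parser for it might look like this:

```python
def parse_groups(text):
    """Parse the '### header' / 'code form [form ...]' listing into groups.

    Returns a list of dicts: {"header": str or None, "entries": [(code, [forms])]}.
    Lines before the first header go into a group whose header is None.
    """
    groups = []
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        if line.startswith("###"):
            groups.append({"header": line[3:].strip(), "entries": []})
            continue
        if not groups:
            groups.append({"header": None, "entries": []})
        code, *forms = line.split()
        groups[-1]["entries"].append((code, forms))
    return groups
```

The meaning of the codes themselves (part of speech plus declension or conjugation class, by the look of it) is not interpreted here.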
Thanks! Those all look correct with ī̆ ū̆. I would add lūxuria, lūxuriō, lūxuriōsus, lūxuriēs, obluctor.
In addition, it looks like I missed some inflected forms of derivatives of nūbō, nūpsī, nū̆ptum when I made that change (e.g. nūptum, nūptiāle). Specifically, there's innūbō, inflected forms of innū̆ptus, nū̆ptia, nū̆ptiae, nū̆ptiālis, nū̆ptus (It seems I just edited the main entry for these), and connūbium and its inflected forms.
I just made a new change to the perfects of alliciō, allē̆xī (formerly marked as just long) and illicio, illexī and pellicio, pellexī (formerly marked as just short) to mark them as uncertain (it seems likely all three had the same quality, probably short). These just need the inflected verb forms updated.
The references I'm basing these on are cited at the pages for hī̆rtus, lū̆xus, lū̆ctor, alliciō, nūbō, cōnū̆bium, so I think one option is to add notes of the format "Vowel length marked as uncertain based on references cited at hī̆rtus", and so on. Or the specific references could be listed as follows. Hirt- and lux-: uncertain based on Bennett (long) vs. De Vaan (short). Luct-: uncertain based on Bennett (long) vs. De Vaan, Wartburg, Buchi and Schweickard (short, with complications). Allex-: uncertain based on Bennett, Buck and Allen. Nupt-: uncertain based on Lewis and Bennett (long) vs. De Vaan, Ernout and Meillet, Wartburg and Bienvenu (short). -nubium: uncertain per Kennedy. -licio, -lē̆xī: uncertain per Bennett and Buck, "probably short" per Allen.--Urszag (talk) 15:13, 2 March 2024 (UTC)Reply[reply]

Category:Hijazi Arabic terms with IPA pronunciation - Alphabet order

how can you change the alphabet order of the Hijazi Arabic letters from

آ أ إ ا ب پ ت ث ج ح خ د ذ ر ز س ش ص
ض ط ظ ع غ ف ڤ ق ك ل م ن ه و ي

to


آ أ إ ا ب ت ث ج ح خ د ذ ر ز س ش ص
ض ط ظ ع غ ف ق ك ل م ن ه و ي . پ ڤ

since پ and ڤ are additional letters and not part of the Alphabetical order عربي-٣١ (talk) 12:39, 2 March 2024 (UTC)Reply[reply]

@عربي-٣١ Are you referring to the sort order as it appears on category pages? The thing is, those additional letters are letters even if they aren't part of the standard Hijazi alphabet, and they need to be sorted *somewhere*. The "to chart" you gave doesn't include them anywhere. Benwing2 (talk) 22:47, 3 March 2024 (UTC)Reply[reply]
Oh NVM, you want them placed at the end. Benwing2 (talk) 22:48, 3 March 2024 (UTC)Reply[reply]
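The requested ordering amounts to a custom collation in which the two additional letters sort after the standard alphabet. A minimal Python sketch of such a sort key, using the letter order given in the request, is shown below; the names are illustrative, and the real change would be made in the language's sort-key data rather than in Python.

```python
# Standard letters in the order requested, followed by the additional letters.
STANDARD = "آ أ إ ا ب ت ث ج ح خ د ذ ر ز س ش ص ض ط ظ ع غ ف ق ك ل م ن ه و ي".split()
ADDITIONAL = ["\u067e", "\u06a4"]  # peh and veh, placed at the end

ORDER = {letter: index for index, letter in enumerate(STANDARD + ADDITIONAL)}


def sort_key(term):
    """Sort key that ranks each character by its position in ORDER.

    Characters not listed (diacritics, other letters) sort after everything.
    """
    return [ORDER.get(ch, len(ORDER)) for ch in term]
```

Used as sorted(terms, key=sort_key), this puts words beginning with the additional letters after those beginning with any standard letter.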

Replacement of quotation templates

Hi, when you have time could you please do the following quotation template replacements?

Thank you! — Sgconlaw (talk) 13:45, 3 March 2024 (UTC)Reply[reply]

@Sgconlaw Done. Benwing2 (talk) 22:47, 3 March 2024 (UTC)Reply[reply]