Wiktionary:Requests for moves, mergers and splits

This page is designed to discuss moves (renaming pages), mergers and splits. Its aim is to take the burden away from the Beer Parlour and Requests for Deletion where these issues were previously listed. Please note that uncontroversial page moves to correct typos, missing characters etc. should not be listed here, but moved directly using the move function.

  • Appropriate: Renaming categories, templates, Wiktionary pages, appendices, rhymes and occasionally entries. Merging or splitting categories, templates, Wiktionary pages, appendices, rhymes.
  • Out of scope: Merging entries which are alternative forms or spellings or synonyms such as color/colour or traveled/travelled. Unlike Wikipedia, we don’t redirect in these sorts of situations. Each spelling gets its own page, often employing the templates {{alternative spelling of}} or {{alternative form of}}.
  • Tagging pages: To tag a page, you can use the general template {{rfm}}, as well as one of the more specific templates {{move}}, {{merge}} and {{split}}.

Note that discussions for splitting, merging, and renaming languages are often also held here, and should be archived to WT:LTD when closed.

2015

West African Pidgin English varieties

Ethnologue has assigned codes to some but not all of the varieties of West African Pidgin English, and we in turn have incorporated some (e.g. pcm) but not all (e.g. not gpe) of those codes. As WP notes, the "contemporary English-based pidgin and creole languages are so similar that they are sometimes grouped together under the name 'West African Pidgin English'" (a name which also denotes their predecessor which developed in the 1700s). WP's examples are illustrative, particularly in that its Ghanaian and Nigerian Pidgin English examples are identical. I propose to merge at least the following three varieties into wes, renaming it "West African Pidgin English":

  1. Ghanaian Pidgin English (gpe)
  2. Nigerian Pidgin English (pcm)
  3. Cameroonian Pidgin English (wes)

We could also discuss whether or not to merge Sierra Leone Krio (kri, which WP notes is often mistaken for English slang due to its similarity to English, but which has a somewhat distinct alphabet), Pichinglis / Fernando Po Creole (fpe), and Liberian Kreyol / Liberian Pidgin English (lir). - -sche (discuss) 21:11, 11 August 2015 (UTC)

The question is a very complex one. Firstly (but of least importance), scholars are divided on which lects have creolised and which have not, but it is generally agreed upon that at least some of the languages you mentioned are not pidgins, which would make the name "West African Pidgin English" somewhat of a misnomer (the more neutral name "Wes-Kos" has been suggested as an alternative, but even linguists haven't fully adopted it). Secondly, all these lects are remarkably similar on a lexical level, but that's unsurprising; after all, they resulted from separate but very similar language contact events, and then probably modified each other (one scholar posits that Krio and Cameroonian Pidgin English relexified each other to some degree after pidginisation). The similarities are also obscured by the fact that there is nothing close to an agreed orthography for most of these, and pronunciation does differ a bit across West Africa. Linguistically, I'd probably merge them all, but practically that may not be the best decision. I know we have entries in pcm, but probably next to nothing for the rest, and if somebody wants to add them, given how each lect is very neatly assigned to a certain West African country, at least it won't be confusing for them to do so. Conclusion: the literature is schizophrenic, the lects mutually intelligible, and the existing situation remarkably unproblematic. Therefore I abstain. —Μετάknowledgediscuss/deeds 21:19, 16 August 2015 (UTC)
  Input needed
This discussion needs further input in order to be successfully closed. Please take a look!

Per Wiktionary:Votes/2011-04/Lexical categories, move:

Rationale: This makes these categories nominally consistent with all other categories that describe the words ("Category:English blablabla") rather than their meanings ("Category:en:blablabla"), such as all categories listed in Category:English terms by etymology.

In fact, I believe Category:English exonyms should be a subcategory of Category:English terms by etymology.

It's interesting to note that Category:English terms by etymology was once called Category:en:Etymology before it was moved multiple times. --Daniel Carrero (talk) 23:22, 11 October 2015 (UTC)

Being an exonym is not a matter of how a word was created. In fact, terms often don't start off as exonyms, but become exonyms as the languages diverge and evolve. So it's not appropriate to put it under etymology. —CodeCat 00:11, 12 October 2015 (UTC)
  • Oppose: Exonyms should remain as a category and English exonyms should be a subcategory of it. Purplebackpack89 20:15, 12 October 2015 (UTC)
I nominated specifically "Category:en:Exonyms -> Category:English exonyms", you mentioned "English exonyms should be [...]", so I don't see how this would work as an oppose vote to my nomination. I don't suppose you wanted the category to remain named "Category:en:Exonyms", right?
In any event, the format that other umbrella categories use according to Wiktionary:Votes/2011-04/Lexical categories is "Category:Exonyms by language" -> "Category:English exonyms". Like "Category:Nouns by language" -> "Category:English nouns". --Daniel Carrero (talk) 00:16, 13 October 2015 (UTC)
Oh, sorry, I missed the "en" in there. Retracting my vote. Purplebackpack89 00:22, 13 October 2015 (UTC)
No problem, thank you. --Daniel Carrero (talk) 00:26, 13 October 2015 (UTC)
This should not be controversial, but it's wise to check. DCDuring TALK 23:32, 14 October 2015 (UTC)
The situation of all of our names categories is complicated, compounded by the unclear scope of some, e.g. the exonyms category seems to only contain place exonyms, not other exonyms like German or Xerxes. And suppose someone attests a foreign exonym of an English-speaking place [e.g. Japanese-derived "Rondon" for "London"] in English, the way e.g. Deutsch#English or google books:"speak Eigo" are attested in English: would that go in the "English exonyms"/"en:Exonyms" category?
It's been suggested that we need to revamp the system more widely, also doing something about e.g. transliterations of foreign names (Pyotr, Putin, Kaifeng, etc); even the question of whether and how Placenames should be a subset of Names has come up before, though I'm having trouble finding the discussion (I think there's more than just the discussion in the section immediately below this one, and Category talk:en:Place names and Category talk:en:Names and WT:Info desk/2013/July, but I can't find it offhand). On balance, names are a lot more like a "POS" category than a "topic" category. I agree they aren't per se terms by etymology, since as noted above, they only sometimes originate as exonyms; sometimes they originate as endonyms and then the speakers of the language get forcibly relocated, or the language evolves into two. (Is Icelandic Rín an exonym for the Rhine? Icelanders do not live near the Rhine, but the name goes back to when their ancestors did...) - -sche (discuss) 16:36, 27 December 2022 (UTC)

Recategorize into Category:Names by language

Pinging some editors from the discussion above: @User:Daniel Carrero, @User:Rua, @User:Purplebackpack89, @User:-sche

As I explained above, it seems infeasible to rename cat:Exonyms (and its subcategories) without also changing what its parent category is. So I propose we remove cat:Exonyms from cat:Places, add it to cat:Names by language, rename it to cat:Exonyms by language, and rename its subcategories to e.g. cat:English exonyms. Exonyms are not places; they are names. I realize this would extend the breadth of cat:English names and its siblings. I think this makes sense, but I would also accept cat:Exonyms by language being under cat:Terms by semantic function by language. — excarnateSojourner (talk · contrib) 03:48, 25 February 2023 (UTC)

Reviving the earlier discussion, I'm still bothered by the fact that we have two different categories for names. But the previous discussion also made it clear that it's not as easy as just merging them.

CodeCat 00:45, 10 November 2015 (UTC)

FWIW, what I am going to say is somewhat off-topic and maybe I'm in the minority on that, but I would not mind using the naming system "Category:English xxxx" for all topical categories: Category:en:Chess -> English terms related to chess. (or any better name along those lines) --Daniel Carrero (talk) 00:59, 10 November 2015 (UTC)
"Category:en:Transliteration of personal names" could be renamed to "Category:English names transliterated from other languages", I suppose. What's the matter with the demonyms category? It contains demonyms, as expected. Would it be better titled "English demonyms", on the model of "English phrases"? - -sche (discuss) 06:02, 10 November 2015 (UTC)
"Category:en:Transliteration of personal names" would be better named "English transliterations of (foreigners') personal names". Notice the existence of e.g. Category:Latvian transliterations of English names. Names of non-English speakers are not English names. I agree with CodeCat that place names belong to topic categories.--Makaokalani (talk) 14:32, 10 November 2015 (UTC)
Here's the old discussion if anyone wants to read it. - excarnateSojourner (talk | contrib) 15:58, 12 April 2022 (UTC)
Category:en:Place names was deleted by Equinox in 2017-05 because it was empty. Category:Transliteration of personal names (and its language-specific subcategories) were moved to Category:Foreign personal names in 2021-09 with the help of WingerBot. - excarnateSojourner (talk | contrib) 16:14, 12 April 2022 (UTC)
@ExcarnateSojourner There being no opposition here, only support (albeit mostly old support), and no opposition or interest when I brought this up in the BP, let's revise whatever needs to be revised to put (at a minimum) all given names and surnames into subcategories of Category:Names by language, instead of some of them being in subcategories of Category:Names. The split is haphazard and arbitrary; I see the intention — put a name that was given within English in one top-level category and a name transliterating a foreign name in a different top-level category — but in practice that's not maintained, since e.g. Alexandra in the context of discussing ancient Greek is transliterating the Ancient Greek name, Sergei has been given to babies born in the Anglosphere (and to characters in English fiction), and we don't maintain such a split with place names. - -sche (discuss) 16:01, 24 April 2023 (UTC)
It making no sense to have Alexandra (in works about ancient Greece where it's romanizing a Greek name), Alexandra (in fiction about ancient Greece where it's a given name), Alexandra (as borne by British or American people today), Sonya, Vadim and Vladimir divided haphazardly into two different top-level categories, "Names" vs "Names by language", I'm now (attempting) editing the modules to consolidate them into "Names by language" subcategories. - -sche (discuss) 14:37, 5 May 2023 (UTC)
(Assistance solicited at Module talk:names#en:Russian_male_given_names,_etc.) - -sche (discuss) 14:48, 5 May 2023 (UTC)

Recategorize Category:Demonyms and Category:Ethnonyms

Pinging some editors from the discussion above: @User:Rua, @User:Daniel Carrero

As I explained in the discussion about exonyms above, renaming the language-specific subcategories of cat:Demonyms properly will require removing it from the topic category tree and adding it to the set category tree. We should similarly recategorize cat:Ethnonyms, another child of cat:Names that did not yet exist when this discussion started. I propose recategorizing them into Category:Terms by semantic function subcategories by language, unless someone can find a better place, and renaming them cat:Demonyms by language and cat:Ethnonyms by language. — excarnateSojourner (talk · contrib) 06:55, 25 February 2023 (UTC)

@ExcarnateSojourner @-sche I am going to take a stab at implementing this. Can you help with what the renames should be? I understand the separation between poscat categories and topic categories should be "lexical" vs. "semantic" but I sometimes have trouble putting this into practice. A tentative list based on what's already been proposed:
  1. 'DESTLANGCODE:SOURCELANG male given names' -> 'DESTLANG male given names transliterated from SOURCELANG'; same for 'female given names', 'surnames', etc. This doesn't work; these are not DESTLANG names but SOURCELANG names rendered into DESTLANG. So I propose 'DESTLANG renderings of SOURCELANG male given names' or similar. ("Transliteration" isn't quite right; sometimes these are transliterations, sometimes respellings, sometimes mere borrowings (cf. Italian Clinton).)
  2. 'LANGCODE:Foreign personal names' (a grouping category) -> 'LANG foreign personal names'
  3. 'LANGCODE:Demonyms' -> 'LANG demonyms'
  4. 'LANGCODE:Ethnonyms' -> 'LANG ethnonyms'
  5. 'LANGCODE:Exonyms' -> 'LANG exonyms'
  6. 'LANGCODE:Letter names' -> 'LANG letter names'
  7. 'LANGCODE:Couple nicknames' -> 'LANG couple nicknames'
  8. 'LANGCODE:Named roads' -> 'LANGCODE:Names of roads' and remove from 'LANGCODE:Names'
  9. 'LANGCODE:Named prayers' -> 'LANGCODE:Names of prayers' and remove from 'LANGCODE:Names'
What about the following:
  1. Subcategories of 'LANGCODE:Demonyms':
    1. 'LANGCODE:Armenian demonyms'?
    2. 'LANGCODE:Celestial inhabitants'?
      1. 'LANGCODE:Ufology' -> stays as a topic category.
    3. 'LANGCODE:Latvian demonyms'?
    4. 'LANGCODE:Nationalities'
    5. 'LANGCODE:Tribes'
      1. 'LANGCODE:Celtic tribes'
      2. 'LANGCODE:Germanic tribes'
      3. 'LANGCODE:Native American tribes'
      • See also 'LANGCODE:Mongolian tribes' under 'LANGCODE:Ethnonyms'.
  2. Subcategories of 'LANGCODE:Ethnonyms':
    1. 'LANGCODE:Mongolian tribes' -> Goes wherever 'LANGCODE:Celtic tribes', 'LANGCODE:Germanic tribes' and 'LANGCODE:Native American tribes' go.
  3. 'LANGCODE:Place names' -> Delete and reclassify the terms under them using {{place}} so they end up in 'Places in FOO'.
  4. 'LANGCODE:Places' -> Leave as a topic category but remove 'LANGCODE:Names' as a parent?
  5. Script-specific variants of 'LANGCODE:Letter names': 'LANGCODE:Arabic letter names', 'LANGCODE:Devanagari letter names', 'LANGCODE:Imperial Aramaic letter names', 'LANGCODE:Korean letter names', 'LANGCODE:Latin letter names'?
  6. Subcategories of 'LANGCODE:Nicknames':
    1. 'LANGCODE:Nicknames' itself? This is a grouping category.
    2. 'LANGCODE:Nicknames of individuals'?
    3. 'LANGCODE:City nicknames'?
    4. 'LANGCODE:Country nicknames'?
      1. 'LANGCODE:Racist names for countries' -> Terminate with extreme prejudice, see WT:BP.
    5. 'LANGCODE:Sports nicknames' -> either 'LANGCODE:Sports team nicknames', 'LANGCODE:Nicknames of sports teams', 'LANG sports team nicknames', 'LANG nicknames of sports teams'
      • See also 'LANGCODE:Couple nicknames' above.
  7. 'LANGCODE:Onomastics' -> stays as topic category but should not have 'LANGCODE:Names' as one of its parents.
  8. 'LANGCODE:Language families'? Regardless, it should not have 'LANGCODE:Names' as one of its parents.
  9. 'LANGCODE:Languages'? Regardless, it should not have 'LANGCODE:Names' as one of its parents.
  10. 'LANGCODE:Taxonomic names' and subcategories:
    1. 'LANGCODE:Taxonomic names' itself?
    2. 'Taxonomic eponyms by language': Already a pos category.
    3. 'Specific epithets' -> 'Translingual specific epithets'?
Other topic categories not directly reachable through 'LANGCODE:Names' but needing consideration:
  1. 'LANGCODE:Ships (fandom)' and numerous subcategories ('LANGCODE:F/F ships (fandom)', 'LANGCODE:M/M ships (fandom)', 'LANGCODE:Heterosexual ships (fandom)', 'LANGCODE:Homosexual ships (fandom)', 'LANGCODE:Polyamorous ships (fandom)', 'LANGCODE:RPF ships (fandom)')
  2. 'LANGCODE:Horse given names'
Benwing2 (talk) 07:14, 31 October 2023 (UTC)
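As a rough illustration of the 'LANGCODE:Label' -> 'LANG label' pattern in items 3 to 7 of the tentative list above, here is a minimal Python sketch. It is not the script used for any actual moves; `LANG_NAMES`, `LEXICAL_LABELS` and `lexical_name` are hypothetical names invented for this example, and the language table is deliberately tiny.

```python
from typing import Optional

# Hypothetical, deliberately tiny code-to-name table (an assumption for this
# sketch; a real run would need the full language data).
LANG_NAMES = {"en": "English", "lv": "Latvian", "is": "Icelandic"}

# Labels taken from items 3 to 7 of the tentative list.
LEXICAL_LABELS = {
    "Demonyms",
    "Ethnonyms",
    "Exonyms",
    "Letter names",
    "Couple nicknames",
}

PREFIX = "Category:"


def lexical_name(topic_category: str) -> Optional[str]:
    """Map 'Category:CODE:Label' to 'Category:LANGNAME label'.

    Returns None when the title is not a topic category covered by the
    tentative list, so that other categories are left alone.
    """
    if not topic_category.startswith(PREFIX):
        return None
    code, sep, label = topic_category[len(PREFIX):].partition(":")
    if not sep or code not in LANG_NAMES or label not in LEXICAL_LABELS:
        return None
    # Lexical categories put the language name first and lowercase the label.
    return f"{PREFIX}{LANG_NAMES[code]} {label[0].lower()}{label[1:]}"


if __name__ == "__main__":
    for title in ("Category:en:Exonyms", "Category:lv:Letter names", "Category:en:Ufology"):
        print(title, "->", lexical_name(title))
```

Titles outside the listed labels, such as 'Category:en:Ufology', map to None here, matching the suggestion that it stays a topic category.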
@-sche Wondering if you missed my ping. I know my post is long, so take your time in responding. Benwing2 (talk) 06:36, 4 November 2023 (UTC)
Sorry, didn't mean to ignore your ping, but got distracted by life after seeing it. As far as the categories for "English renderings of Ukrainian names" (or whatever), I have no strong preference for any particular name at this time. My immediate concern was just with addressing the odd point of bifurcation where "native English placename like Warwick or Alberta; English rendering of an Armenian placename like Stepanakert; English rendering of a personal name someone gave a baby born in Ukraine like Volodymyr" are in one top-level category system ("LANGCODE:Names", named like 'set' categories), and "personal name someone gave a baby born in Canada" is in a different top-level category system ("LANGNAME names", treated like a quasi-part of speech). It's hard to decide where exactly to split the spectrum of categories we're dealing with here, if we're wanting to keep e.g. "John" in "Category:English male given names" at that (part-of-speech-esque) category name, but wanting to consider some things like Category:en:Native American tribes to be clearly a set/list category (a set/list of tribes); my immediate point was just that I don't see a sound basis for considering "John, Jane" a POS-type (LANGNAME) category but "Volodymyr, Sergei" a LANGCODE:-set-type category — surely they're both one or both the other, and the greater momentum seems to be towards considering "names" a POS-type/LANGNAME category. But maybe we should think about that more carefully and consider them all to be "sets"? (But then, "Category:English verbs" is also just a category containing the set of English verbs. Hmm... should we perhaps allow only things that are truly "parts of speech" to have "Category:LANGNAME foobars" names, and make all the "names" categories that contain John and Volodymyr into set categories? Should that be the direction in which we eliminate the bifurcation of the 'John' vs 'Volodymyr' categories?)
I do think even keeping names in two subcategories like "English given names" vs "English renderings of Ukrainian names"/"English renderings of Chinese names"/etc [whatever we call those categories] based on, in effect, whether they were born in Ukraine vs to a Ukrainian family in Canada (or in China vs to a Chinese family in America) may be less than ideal; e.g. what do we do if a transliterated Ukrainian or Chinese name is common in English-language fiction? What about if it's a German name; does the fact that those names are "natively" Latin script make the threshold for considering them to have become "English names" lower? Does it make a difference if the fiction is set in lightly-fictionalized Germany or Ukraine or China, vs in a space future or a generic medievalesque Middle Earth / Westeros? But I don't have time to think through and suggest any proposal for any better approach to that yet.
"LANG foreign personal names" (e.g. "English foreign personal names") sounds a bit odd; would "LANG renderings of foreign personal names" (aligning with your proposed "DESTLANG renderings of SOURCELANG male given names") be better, iff we're sticking with moving "Names" categories to LANGNAME names and not LANGCODE names?
I will try to respond more, and to the rest, later. - -sche (discuss) 17:54, 4 November 2023 (UTC)
@-sche Thanks for your comments. I have no issue with "LANG renderings of foreign personal names". I see your point about the line between nativized foreign-origin names and renderings of actual foreign names being fuzzy, but there does feel to me like a distinction, esp. in languages like Latvian that tend to respell foreign names according to Latvian spelling conventions, and the distinction is fairly clearly made in reality between e.g. the large number of Russian names respelled according to Latvian conventions (and used e.g. by the large population of Russians in Latvia) vs. the smaller number of Russian-origin names that have become nativized for naming of ethnic Latvians. In a multi-ethnic society like the US or Canada where nationality and ethnicity aren't always clearly distinguished, things get a lot fuzzier, although it still feels like there's some sort of distinction between names like Volodymyr or Volha that are unlikely to be borne by anyone other than someone who is Ukrainian (resp. Belarusian) or whose parents or grandparents are Ukrainian (resp. Belarusian), vs. a name like Vladimir or Olga that might be given to someone with no particular connection to Russia. As for whether these should use LANGNAME-type or LANGCODE-type naming, I'm not sure although I gather the distinction is supposed to be lexical vs. semantic, if that helps at all. Benwing2 (talk) 23:57, 4 November 2023 (UTC)
I guess we should stick with LANGNAME naming for given names / surnames, then, at least for now. (Switching gears for a moment to address a different aspect:) Regarding "horse given names", we also have (but apparently don't currently categorize) dog given names like Scruffy, Fido, and Spot, and we have Polly as a name for a parrot, and Mittens, Kitty, Socks for cats (also e.g. Miming in Cebuano). Perhaps we should merge all the different animals into one category for "animal given names". To me, at least, it seems intuitive to then handle this category in whatever way we handle the human given name categories—so, if we're naming the category that contains 'John' "English male given names", then 'Fido' goes in "English animal given names", or if we're using language codes, then use codes for both. (Back to the first gear:) We also have names that belong to specific individual people (Confucius, Cicero) or animals (Laika, and mythically Cerberus, Garm); we seem to put these in LANGCODE-set categories; I suppose the rationale is that the category that contains "Confucius, Cicero" contains a set of individuals, whereas "John" and "Jane" are 'less restricted'... in practice, people have undoubtedly also named babies 'Confucius' and 'Cicero', but if we demonstrate that, then we add a {{given name}} sense, so I guess we're fine leaving the individuals in LANGCODE-set categories and the {{given name}}s in LANGNAME categories... I guess this also explains the difference between nicknames (LANGNAME nicknames) and relationship names (the category contains a set of specific ships)...? never mind, "Category:Nicknames" doesn't contain what I would've expected ("Bob, Jim, Tom" for Robert, James, Thomas) - -sche (discuss) 18:45, 5 November 2023 (UTC)
@-sche This all sounds good to me. I think I'll start on the renames in a couple of days depending on how the comments go. Benwing2 (talk) 21:45, 5 November 2023 (UTC)
Just checking, when your "list based on what's already been proposed" includes "'LANGCODE:Demonyms' -> 'LANG demonyms'" but then your follow-up proposal is for Subcategories of 'LANGCODE:Demonyms': like 'LANGCODE:Armenian demonyms'?, you're proposing to not actually rename "'LANGCODE:Demonyms' -> 'LANG demonyms'", right? I'm just checking that we're going to handle "Demonyms" and the subcategories like "Armenian demonyms" the same way, either all using LANGCODEs or all using LANGNAME. I could see handling the categories that actually have the word "demonyms" in their name either way, but since some of the other subcategories like "LANGCODE:Native American tribes" do seem more like set categories, maybe it's best to consider the whole batch to be set categories and stick with LANGCODE names like they have at present? (But maybe move them out of the "Names" category?)
"Couple nicknames" is an interesting case, because intuitively it seems like those and (relation)ship names should be handled the same way, since they seem like the exact same thing: "Lumity" is the portmanteau name for the two specific individuals Luz Noceda and Amity Blight, and Billary is the portmanteau name for the two specific individuals Bill Clinton and Hillary Clinton... maybe LANGCODE:Couple nicknames should be renamed "LANGCODE:Couples" to be more clearly a set category? and moved out from under the "names" category, since we don't categorize ship names as "names"? - -sche (discuss) 02:34, 6 November 2023 (UTC)
@-sche Thanks for pointing out that inconsistency. Rua's point awhile ago was that 'Native American tribes' is named correctly as a set category because the contents are "names of Native American tribes" but 'Armenian demonyms' isn't named correctly as the contents aren't "names of Armenian demonyms". Rua suggested renaming 'Demonyms' -> 'Peoples' although that seems a bit strange to me as the term 'demonym' is fairly well established, and furthermore a distinction could be made between nominal demonyms and adjectival demonyms (note, we have {{demonym-noun}} and {{demonym-adj}} for these two, respectively), which is clearly a lexical distinction. That suggests maybe they should all be considered lexical categories, esp. since I think something like Category:en:Exonyms doesn't make sense as a set category (being an exonym is completely a lexical property). If we are to make Category:en:Armenian demonyms a lexical category, IMO it should be Category:English demonyms for Armenians as Category:English Armenian demonyms doesn't make much sense. As for CAT:en:Couples, that seems ambiguous so maybe it should be CAT:en:Nicknames of couples or something (which would be in keeping with future names like CAT:Types of stars and such). Benwing2 (talk) 02:54, 6 November 2023 (UTC)
"CAT:en:Nicknames of couples" works. Or should it even be "Nicknames of pairs", since it currently contains a few things like Bushbama; or should we remove those? (We don't categorize e.g. Republicrat as anything but "US politics".)
Good point about exonyms. "Demonyms", or at least the things currently in the "Demonyms" categories, seem to straddle the line between being a set category like "Occupations", vs being lexical like "Exonyms"... ugh, as you said earlier, it's hard to pin down and "put into practice" the difference, since so many of these categories exist in a grey area with characteristics of both. Like: it would not technically be wrong AFAICT to say "Category:English male given names and Category:English nouns are set categories containing the set of all English male given names or nouns respectively" (it would just be madness, heh). And in the other direction, isn't being a placename as much a lexical property as being a given name? But should they go into the same top-level "LANGNAME names" category, or is that madness?
Thinking aloud for a moment, I guess one difference is whether a term refers to one specific entity, or to an open-ended cast, which would rationalize why "John" and "Bob"—as names that can be given to an open-ended variety of people, new babies every day—are in (or belong in, in the case of "Volodymyr") "LANGNAME names" categories, whereas "Baghdad Bob" (individual's nickname), "Billary" and "Lumity" (real and fictional couples' nicknames) and e.g. "Saskatchewan" and "Yerevan" (placenames) refer to specific entities, and so are LANGCODE set categories...? So then, since demonyms like "Saskatchewanian" and "Yerevanian" also refer to an open-ended set of people (new babies born in Saskatchewan every day), and as you say, 'being a demonym' can be argued to be a lexical property like 'being an exonym', that justifies them being "LANGNAME demonyms" categories...? (Then the "type of"-set categories, like the category for "the set of all types of stars" or "the set of Native American tribes", are LANGCODE-set categories for a different reason.) - -sche (discuss) 19:04, 6 November 2023 (UTC)
@-sche Yes, that seems to make a lot of sense. BTW I have written the script to move topic (langcode) categories to lexical (langname) categories and I'm probably going to run it on exonyms first. Benwing2 (talk) 19:59, 6 November 2023 (UTC)
@-sche I have moved the exonyms and foreign-personal-names categories. Benwing2 (talk) 03:30, 7 November 2023 (UTC)
  • @Benwing2 Sorry for being absent here. I'm glad to see discussion happening and generally support your proposals. A few specific comments:
8. 'LANGCODE:Named roads': Why not 'LANGCODE:Roads' (and remove from 'LANGCODE:Names')?
9. 'LANGCODE:Named prayers': Why not 'LANGCODE:Prayers' (and remove from 'LANGCODE:Names')?
5. Regarding letter names, see also cat:Letters.
— excarnateSojourner (talk · contrib) 21:18, 22 November 2023 (UTC)
@ExcarnateSojourner The main reason for including the word "named" is that otherwise it might not be clear whether the categories are set-type or related-to categories. Benwing2 (talk) 00:57, 23 November 2023 (UTC)
Relevant to the discussion above about creating a general animal given names category, this discussion points out "Ralph" for a raven, as well as "Rover" as another dog name. Whenever the situation with human names is sorted out, I suggest moving "LANGCODE:Horse given names" ("is:Horse given names") to "LANGNAME animal given names" ("Icelandic animal given names"), unless anyone has objections... (or we could add a general "animal given names" category and retain subcategories for specific animals if one or more languages had a lot of names for them, as might be the case for dogs and horses...) - -sche (discuss) 17:24, 11 November 2023 (UTC)

2016

I see no evidence that this exists as a separate language, and move that it be merged with tr. The literature which references it seems to describe the dialect of Turkish which may be spoken by Gagauz people in the Balkan Peninsula. —Μετάknowledgediscuss/deeds 20:17, 3 July 2016 (UTC)

Wikipedia, citing Ethnologue, insists that Balkan Gagauz Turkish, Gagauz, and Turkish are all separate, and a few sources do seem to take that view, e.g. Cem Keskin, Subject agreement-dependency of accusative case in Turkish, or, Jump-starting grammatical machinery (2009) speaks of "Balkan Gagauz Turkish, Gagauz, Turkish, Iraqi Turkmen, North and South Azerbaijani, Salchuq, Aynallu, Qashqay, Khorasan Turkic, Turkmen, Oghuz Uzbek, Afshar, and possibly Crimean Tatar". Other references speak of Balkan Gagauz Turkish as a variety of Gagauz, e.g. James Minahan's Encyclopedia of the Stateless Nations says "The Gagauz speak a Turkic language [...] also called Balkan Gagauz or Balkan Turkic, [which] is spoken in two major dialects, Central and Southern, with the former the basis of the literary language. Other dialects [include] Maritime Gagauz" (which comports with w:Gagauz's list of its dialects). Matthias Brenzinger's Language Diversity Endangered also treats Balkan Gagauz "or slightly misleading, Balkan Turkic" in his entry on Gagauz, but says that the Balkan "varieties might deserve the status of outlying languages but very little information is available about them." (A few generalist references seem to subsume all gag into tr.) I would leave them all separate, pending more conclusive evidence that they should be merged. - -sche (discuss) 23:58, 3 July 2016 (UTC)
I think there's some confusion about what exactly we're talking about, and whether it's Gagauz or Turkish. Just because they use the term "Balkan Gagauz Turkish" doesn't mean that they're referring to the language with ISO 639-3 code bgx. When I look at who's citing the references listed for bgx at Glottolog, Manević (the reference for its classification) is cited in papers clearly talking about the dialects of tr. These are the only actual words attributed to this lect that I can find. —Μετάknowledgediscuss/deeds 00:33, 4 July 2016 (UTC)Reply[reply]
@Tropylium, on the subject of Turkic languages spoken in Europe, do you know anything about this one, and about its differences or similarity to Gagauz and standard Turkish? - -sche (discuss) 01:08, 11 May 2017 (UTC)
I'm not previously familiar with this dispute, but here are a few handbooks on the topic:
  • Menges in The Turkic Languages and Peoples has the following slightly complicated quote (p. 11): "The Turkic languages spoken farthest west are the Balkanic dialects of Osman and Gagauz in Bosnia, Bulgaria and Macedonia. These seem to form two groups, one of possibly pre-Osman origin, and a later Osman one. To the former belong the Gaǯaly in Deli-Orman (Eastern Bulgaria), who, according to V. A. Moškov, are descended from the Päčänäg, Uz, and Torci (?), the Surguč, numbering about 7000 people in the district (vilājät) of Edirnä, who call themselves Gagauz. In Moškov's opinion, they, too, go back to the Päčänägs (?) and the Macedonian Gagauz; they number ca. 4000 people in southeastern Macedonia." — It seems clear that some group(s) corresponding to "Balkan Gagauz" is being identified here, but I am not even sure how to parse the sentence structure; e.g. are "Uz" and "Torci" some of the pre-Osman Turkic groups, or some of the alleged ancestors of the Gaǯaly? ("Osman" is, of course, Turkish.)
  • Hendrik Boeschoten in a classificatory chapter in Routledge's The Turkic Languages mentions that "a few speakers [of Gagauz] in northern Bulgaria, Romania and Greece, adhere to the Orthodox faith, and have their own history." This again seems to refer to "Balkan Gagauz", but with no indication of being its own language.
So far I would gather from this that "Balkan Gagauz" is at most a sister language of "non-Balkan Gagauz", and perhaps indeed just a different dialect group (perhaps one whose features are not reflected in written standard Gagauz). But the Manević 1954 paper would be more informative on this topic, if anyone wants to hunt it down. --Tropylium (talk) 11:55, 11 May 2017 (UTC)Reply[reply]
I think Balkan Gagauz should be merged with gag, especially since it contains no entries. The few terms that would be specific to Gagauz spoken outside of the traditional Gagauz area in Moldova/Romania/Bulgaria can be dealt with within gag entries. The only thing is that some etymologies of other Turkic languages sometimes refer to Balkan Gagauz instead of Gagauz, because editors didn't know the difference between the two. Otherwise I don't see any problems with merging the two.
On the other hand, Gagauz should definitely NOT be merged with Turkish, that is pretty obvious to me. Allahverdi Verdizade (talk) 05:09, 9 September 2018 (UTC)
@Metaknowledge This is a hard question, I can offer only guesswork.
I can't find any good maps for the distribution of Gagauz and (Muslim) Turks proper in the Balkans, most don't show Balkan Gagauz at all although we know they exist at least in Bulgaria and Macedonia.
It seems that they are not easily separated geographically from Muslim Turks although they presumably live in different localities. I'm guessing this means that their languages ("Balkan Gagauz Turkish" and "Rumelian Turkish") could be the same, although maybe only the latter call their language "Turkish", so I guess that they (would?) use Standard Turkish in education and administration.
This would be a good argument to merge Balkan Gagauz into Turkish, except that this paper shows that Balkan Turkic (if this really is a single language) is quite distinct from Anatolian Turkish and perhaps worth considering a different language. Baskakov also considers Balkan Turkish and (Moldovan) Gagauz to form a clade within Oghuz and Anatolian Turkish and Azerbaijani to form another. Crom daba (talk) 21:35, 30 September 2018 (UTC)Reply[reply]
@Anylai, can you find anything in Turkish on the possible differences between Balkan Gagauz and Rumelian Turkish? Crom daba (talk) 21:38, 30 September 2018 (UTC)
Merge / delete it. The distribution of the name, the way it is “mentioned”, points towards it being a ghost language. The name is not attestable as used by anyone having particular information about it; nobody can add anything under it either in such a situation where it is a content-filled concept for nobody. Its alleged synonyms “Balkan Turkish” and “Rumelian Turkish” show it is just an SOP term for Turkish as spoken on the Balkans respectively Rumelia, i.e. remnant speakers of the Ottoman rule. German Balkantürkisch, distinguished from Türkeitürkisch as a regiolect. Fay Freak (talk) 13:38, 2 December 2020 (UTC)Reply[reply]
  Input needed
This discussion needs further input in order to be successfully closed. Please take a look!

Even more languages without ISO codes, part 6

This next batch is of languages from lists other than Ethnologue and LinguistList. As before, I've tried to vet them all beforehand, but I will have doubtlessly made some mistakes. NB if you want to find more: I've avoided dealing with most of the Loloish languages, because all the literature seems to be in Chinese. —Μετάknowledgediscuss/deeds 04:54, 6 July 2016 (UTC)

Australian languages

Tasmanian and other

Northeastern Tasmanian:
  • Northeastern, Pyemmairre language (aus-pye)   Done
    alt names/varieties: Plangermaireener, Plangamerina, Cape Portland, Ben Lomond, Pipers River
  • North Midlands, Tyerrernotepanner language (aus-tye) — Bowern considers this a dialect; perhaps we should just trust her
    now has an ISO code which should be added instead, see BP shortly - -sche (discuss) 04:27, 14 October 2020 (UTC)
  • Lhotsky/Blackhouse Tasmanian language (aus-lbt) — the worst name in Bowern's set!
    I'm not sure... the very language is "reconstructed" by Bowern on the assumption that three wordlists (of which only two make it into the name) attest the same language, although apparently none of the three bothered to name the language. The chance of someone "would run across [a word in] it and want to know what it means" seems nonexistent. If we wanted to host the wordlists, we could do that in an appendix or on Wikisource. - -sche (discuss) 16:09, 9 August 2016 (UTC)Reply[reply]
Bowern's methods are scientific; but I would feel better if more than one scholar was saying there was one language in this set of wordlists, the way that for e.g. Port Sorrell, Dixon & Crowley and Glottolog agree that there is a unit/lect there. - -sche (discuss) 16:55, 4 June 2017 (UTC)Reply[reply]
and what of "Norman Tasmanian"? - -sche (discuss)
  • Here is another language we might need a code for: Ma(') Pnaan (poz-map?), also known by the exonyms Punan Malinau and Punan Segah, a language of Borneo / East Kalimantan, summarized by Antonia Soriente here and elsewhere. Compare the other things listed at Punan language. - -sche (discuss) 05:21, 29 August 2016 (UTC)Reply[reply]

Marrithiyel

Maridan [zmd], Maridjabin [zmj], Marimanindji [zmm], Maringarr [zmt], Marithiel [mfr], Mariyedi [zmy], Marti Ke [zmg]: should these be merged? References speak of a singular Marrithiyel language. - -sche (discuss) 21:30, 20 July 2016 (UTC)

Some more missing American languages

Here are a few more North American languages for which we could add codes:

  • Akokisa (nai-ako). WP says it is attested certainly in two words in Spanish records (Yegsa "Spaniard[s]", which Swanton suggests is similar to Atakapa yik "trade" + ica[k] "people"; and the female name Quiselpoo), and possibly in more words in a wordlist by Jean Béranger in 1721 (if the wordlist is not some other language).
  • Algonquian–Basque pidgin (crp-abp). Wikipedia has a sample. The Atlas of Languages of Intercultural Communication, citing Bakker, says it was spoken from at least 1580 (and perhaps as early as the 1530s) through 1635, and "only a few phrases and less than 30 words attributable to Basque were written down" (though apparently more words, attributable to other sources, were also recorded).
  • Guachichil (Cuauchichil, Quauhchichitl, Chichimeca) (nai-gch or, if Guachí is added as sai-gch, perhaps nai-gcl to prevent the two similarly-named lects from being mixed up by only typoing the initial n vs s), apparently sparsely attested.
  • Concho (nai-cnc). The Handbook of North American Indians, volume 10, says "three words of Concho [...] were recorded in 1581 [and] look like they may be [...] Uto-Aztecan".
  • Jumano (Humano, Jumana, Xumana, Chouman, Zumana, Zuma, Suma, and Yuma) (nai-jmn). The Handbook says "It has been established that the Jumano and Suma spoke the same language. Three words have been recorded" of it.

and from South America:

  • Peba / Peva (sai-peb), said by Erben to be more properly called Nijamvo, Nixamvo. Spoken in "the department of Loreto" in Peru. Attested in wordlists by Erben and Castelnau, which Loukotka provides, and which disagree with each other substantially: munyo (Erben) / money (Castelnau) "canoe, small boat"; nero (E) / yuna (C) "demon"; nebi (E) / nemey (C) "jaguar"; teki (E) / tomen-lay (C) "one"; manaxo (E) / nomoira (C) "two"; etc. I would even consider that one might not be the same language as the other... what's with these languages that survive in disparate wordlists? lol.
  • possibly Saynáwa: fr.Wikt grants a code to this variety of Yaminawá language, described here (see also [1]).

- -sche (discuss) 04:04, 16 August 2016 (UTC)
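
On the concern raised for Guachichil, that nai-gch and a possible sai-gch could be mixed up by mistyping only the initial n vs s: a minimal sketch, in Python, of how one might flag pairs of proposed codes that sit one character apart. The function names are hypothetical, the list contains only the codes proposed in this section, and nothing here reflects an existing Wiktionary module or bot.

```python
from itertools import combinations

def hamming(a, b):
    """Count the positions at which two equal-length codes differ."""
    if len(a) != len(b):
        raise ValueError("codes must have the same length")
    return sum(x != y for x, y in zip(a, b))

def near_collisions(codes, max_distance=1):
    """Return sorted pairs of same-length codes that differ in at most
    max_distance characters (hypothetical helper, for illustration)."""
    return [
        (a, b)
        for a, b in combinations(sorted(set(codes)), 2)
        if len(a) == len(b) and hamming(a, b) <= max_distance
    ]

# Codes proposed in this section, plus the possible sai-gch for Guachí.
proposed = ["nai-ako", "crp-abp", "nai-gch", "nai-cnc",
            "nai-jmn", "sai-peb", "sai-gch"]

for pair in near_collisions(proposed):
    print(pair)
```

Under this check, swapping in nai-gcl for Guachichil would no longer be flagged against sai-gch, since the two codes would then differ in two positions.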

Support all except possibly Akokisa. I think it's a dialect of Atakapa, and that the wordlist is very likely not being linked correctly. That said, it's so few words, that there's no real reason not to accept it as a separate language, just to be conservative about it. —Μετάknowledgediscuss/deeds 04:08, 16 August 2016 (UTC)
Good point about Akokisa. (I am reminded that you had mentioned its dialectness earlier; sorry I forgot!) The wordlist, labelled only with a tribal name per WP, is possibly plain Atakapa, but Yegsa is supposedly recorded as specifically Akokisa; OTOH that doesn't rule out that Akokisa is a dialect. Indeed, M. Mithun's Languages of Native North America treats as dialects Akokisa, Eastern ("the most divergent, [...] known from a list of 287 entries") and Western ("the best documented. Gatschet recorded around 2000 words and sentences, as well as texts [...] Swanton recorded a few Western forms", all published in 1932 in a dictionary). I suppose the benefit to treating it as a dialect would be that we could context-label Yegsa and Quiselpoo as {{lb|aqp|Akokisa}} and then Béranger's forms as {{lb|aqp|possibly|Akokisa}} without needing to agonize over which header to put them under. - -sche (discuss) 15:31, 16 August 2016 (UTC)Reply[reply]

Nkore-Kiga

As can be seen at w:Nkore-Kiga language, Kiga [cgg] should definitely be merged into Nyankore [nyn]. Unfortunately, this might require a rename to something that is both hyphenated and considerably less common than just plain "Nyankore" (though that is, strictly speaking, merely the name of the main dialect). —Μετάknowledgediscuss/deeds 05:21, 18 September 2016 (UTC)

I'm not sure. WP suggests the merger was politically motivated, but many reference works do follow it. Ethnologue says there is "Lexical similarity [of] 78%–96% between Nyankore, Nyoro [nyo], and their dialects; 84%–94% with Chiga [cgg], [...and] 81% with Zinza [zin]" (Kiga, meanwhile, is said to be "77% [similar] with Nyoro [nyo]"), as if to suggest nyn is about as similar to cgg as to nyo, and indeed many early references treat Nkore-Nyoro like one language, where later references instead prefer to group Nkore with Kiga. Ethnologue mentions that some authorities merge all three into a "Standardized form of the western varieties (Nyankore-Chiga and Nyoro-Tooro) [...] called Runyakitara [...] taught at the University and used in internet browsing, but [it] is a hybrid language." (For comparison, Ethnologue says English has 60% lexical similarity to German.) - -sche (discuss) 00:16, 2 June 2017 (UTC)

Itneg lects

See w:Itneg language. All the dialects have different codes, but we really should give them a single code and unify them. I came across this problem with the entry balaua, which means "spirit house" (but I can't tell in which specific dialect). It's also known as Tinggian (with various different spellings), and this may be a better name for it than Itneg. —Μετάknowledgediscuss/deeds 02:09, 23 September 2016 (UTC)

What distinguishes these two? —suzukaze (tc) 03:31, 9 October 2016 (UTC)

If there is no meaningful difference between these, I propose keeping Category:Chinese Han characters as it is managed by {{poscatboiler}} and merging Category:Chinese hanzi into it. —suzukaze (tc) 04:17, 9 October 2016 (UTC)

@Wyang, Atitarev, is there a difference between Category:Chinese hanzi and Category:Chinese Han characters, or can Category:Chinese hanzi be merged into Category:Chinese Han characters as suzukaze proposes? - -sche (discuss) 00:27, 28 March 2017 (UTC)
They can be merged, IMO. --Anatoli T. (обсудить/вклад) 00:52, 28 March 2017 (UTC)
(reviving this discussion after almost three years) Merge per Suzukaze-c's proposal above. — justin(r)leung (t...) | c=› } 03:30, 19 January 2020 (UTC)

There seems to be no notable difference between the two categories so they should be merged I guess. Ffffrr (talk) 21:40, 10 December 2021 (UTC)

Update? For reference, it looks like the "Chinese hanzi" category is populated by this code in Module:zh-pron. 00:21, 27 May 2022 (UTC)

Paraguayan Guaraní [gug]

I just noticed that we have this for some reason. Guaraní is a dialect continuum that is quite extensive, both in inter-dialect differences and in geography, and certain varieties have been heavily influenced by Spanish or Portuguese. That said, our Guaraní [gn] content is, as far as I can tell, pretty much entirely on Paraguayan Guaraní, which for some reason has a different code, [gug]. My attention was brought to this by User:Guillermo2149 changing L2 headers (I have not reverted his edits, but they do cause header-code mismatch). We could try splitting up the Guaraní dialects, but it would be hard to choose cutoffs and would definitely confuse potential editors, of which we have had more since Duolingo released a Guaraní course. I think the best choice is to merge [gug] into [gn] and mark words extensively for which dialects or countries they are used in. @-sche —Μετάknowledgediscuss/deeds 01:29, 1 November 2016 (UTC)

  •   Support. [gn] and [grn] are the codes of the macrolanguage; [gug] is the code for the specific dialect spoken in Paraguay. Also, until now, I haven't found any [gn] lemma that falls outside [gug]. --Guillermo2149 (talk) 01:52, 1 November 2016 (UTC)
  •   Support. — Ungoliant (falai) 11:00, 1 November 2016 (UTC)
  Support merging gn and gug. - -sche (discuss) 14:33, 1 November 2016 (UTC)
  • @Guillermo2149, Ungoliant MMDCCLXIV, -sche, Angr: I see now that there are three more Guaraní dialect codes that we have: Mbyá Guaraní [gun], Chiripá [nhd], and Western Bolivian Guaraní [gnw]. I presume that we should merge these into [gn] as well, but the case is arguably less clear given that in our current state, all our [gn] lemmas are really [gug]. What do you all think? —Μετάknowledgediscuss/deeds 22:51, 14 November 2016 (UTC)Reply[reply]
    I stick by my motto, "When in doubt, merge". —Aɴɢʀ (talk) 09:53, 15 November 2016 (UTC)
    I think we should actually merge [gn] into [gug] and not vice versa. By the way, [gn] is the only one that should be merged: [gun] has similar and some identical words but the language is very different, and [nhd] is very close to [gug] but slightly different, and is always confused with [gug]. --Guillermo2149 (talk) 00:37, 7 December 2016 (UTC)
Don't forget there's also [gui] and apparently also [tpj]. - -sche (discuss) 04:28, 16 May 2017 (UTC)

2017

Merger into Scandoromani

I propose that the Para-Romani lects Traveller Norwegian, Traveller Danish and Tavringer Swedish (rmg, rmd and rmu) be merged into Scandoromani. TN, TD and TS are almost identical, mostly differing in spelling (e.g. tjuro (Sweden) vs. kjuro (Norway) meaning 'knife', gräj vs. grei 'horse' etc.). WP treats them as variants of Scandoromani. My langcode proposal could be rom-sca, or maybe we could just use rmg, which already has a category. -- 20:19, 25 January 2017 (UTC)

I'm supporting it. Traveller Norwegian is sometimes referred to as Tavring, and, to be honest, I've never heard anybody use the term Traveller Norwegian as a language name. People rather call it Taterspråk or Fantemål, even though books describe those as derogatory terms. The other problem is that we've in fact got two different Norwegian Traveller languages (the Romani-based and the Månsing-based). So it looks like a total mess right now. Tollef Salemann (talk) 07:55, 2 April 2023 (UTC)
I don't think this makes sense if the orthographies are consistently different, which seems to be the case. Otherwise, we could use the same logic to merge quite a few of the Slavic languages, which obviously doesn't make sense. Theknightwho (talk) 13:43, 2 April 2023 (UTC)
OK, but Traveller Norwegian is not quite the right term, because the Romani-based TN has two or more branches, which are quite different from each other, while the main one is almost the same as the Swedish one and often had the same name(s). Meanwhile, there is also a Germanic TN version, unrelated to the Romani-based TN varieties. I mean, we need at least two more L2s in this case, even if we are going to merge TN and Swedish Tavring.
PS there is also Swedish stuff like Knoparmoj and Loffarspråk and more, and they still have remnants in some rare Swedish/Norwegian sociolects. Maybe they also need their own L2? Or can we treat them as sociolects? Tollef Salemann (talk) 13:59, 2 April 2023 (UTC)

Chinese Pidgin English (cpi)

This is not a separate language at all, it's just English with different grammar and some loanwords, but other than that it's completely intelligible with standard English. As such, it should be moved to Category:Chinese English. -- Pedrianaplant (talk) 15:19, 8 February 2017 (UTC)

That's not at all the impression I get from Chinese Pidgin English. It seems to be a distinct language to me, as much as any other English-based pidgin. —Aɴɢʀ (talk) 16:45, 8 February 2017 (UTC)
We did delete Hawaiian Pidgin English in the past though (see Template talk:hwc). I don't see how this case is any different. -- Pedrianaplant (talk)
I know we did, but I didn't participate in that discussion (only 3 people did), and I disagree with it too, probably even more strongly than I disagree with merging cpi. —Aɴɢʀ (talk) 17:02, 8 February 2017 (UTC)
  • Basically, this is a terminological problem. There may have been a true pidgin in each of these cases, but it has not been recorded. What is called a pidgin in many descriptive works is instead a dialect of English that is very easy to understand, nothing like the real English-based pidgins and creoles that I have studied. If you look at the actual quotations used to support lemmas in Chinese Pidgin English, you find that it is Chinese English. Support merge, but leave [cpi] as an etymology-only code. —Μετάknowledgediscuss/deeds 23:16, 8 February 2017 (UTC)Reply[reply]
  • At least some texts seem very distinct, to the point of unintelligibility; consider "Joss pidgin man chop chop begin" (Whedon's translator begins chopping things? or "god's businessman begins right away"?). On the other hand, other sentences given by Wikipedia are quite intelligible...and possibly not attestable under the stricter CFI to which English is subject. I'm not sure what to do. (Our short previous discussion also didn't reach a firm resolution.) - -sche (discuss) 17:46, 8 March 2017 (UTC)Reply[reply]
    I mean, I use joss and chop chop in English normally (having grown up in a fairly Chinese environment likely has something to do with that)... and I think that was chosen as an especially extreme example. —Μετάknowledgediscuss/deeds 03:32, 25 March 2017 (UTC)Reply[reply]

More unattested languages

The following languages have ISO codes, but those codes should be removed, as there is no linguistic material that can be added to Wiktionary. This list is taken from Wikipedia's list of unattested languages, but I have excluded languages which are not definitively extinct (and thus which may have material become available). If there was any reliable source I could find corroborating the WP article's claim of lack of attestation, it is given after the language. —Μετάknowledgediscuss/deeds 04:15, 4 April 2017 (UTC)Reply[reply]

  • Aguano language [aga]
    Unclear if it even existed per The Indigenous Languages of South America: A Comprehensive Guide (Campbell and Grondona).
  • Barbacoas language [bpb] (the Wikipedia article has a discussion of the conflation of this unattested language with Pasto, which needs a code; for clarity, I think this [bpb] should be retired and an exceptional code made explicitly for Pasto)
    Retired, following the ISO, see Wiktionary:Beer parlour/2020/October#2019-2020_ISO_code_changes. Content, if needed for migration to a Pasto code, was m["bpb"] = { "Barbacoas", "Q2669202", "sai-bar", otherNames = {"Pasto"}, scripts = Latn, } - -sche (discuss) 06:23, 14 October 2020 (UTC)Reply[reply]
  • Giyug language [giy]
    AIATSIS has the following to say: "According to Ian Green (2007 p.c.), this language probably died before the 1920's and neighbouring groups in the Daly claim it was the language of Peron Island which was linguistically and perhaps culturally distinctive from the nearby mainland societies. Black & Walsh (1989) say that this may or may not have been a dialect of Wadiginy N31." —Μετάknowledge
    The 1992 International Encyclopedia of Linguistics, v. 1, p. 337, says "Giyug: 2 speakers reported in 1981, in the Peron Islands in Anson Bay, southwest of Darwin." The 2003 edition repeats the claim that "2 speakers remain". Wikipedia says it's extinct and unattested, but Glottolog, although having no resources on it, suggests it's not extinct. Might be best to leave it alone for now. - -sche (discuss) 01:13, 6 August 2020 (UTC)Reply[reply]
  • Mawa language (Nigeria) [wma] (We call this "Mawa"; if it is removed, [mcw] Mahwa (Mawa language (Chad)) can be renamed to the evidently more common spelling "Mawa".)
    Removed, and mcw renamed. Glottolog had only one reference to support the existence of Mawa, Temple (1922), which does not even include a section under that header. There may be confusion with the section on the "Marawa", but that does not even mention what language those people speak. (Temple also knows very little about linguistics; while skimming through, I found that Margi (a Chadic language) was said to be similar to the languages of South Africa.) —Μετάknowledgediscuss/deeds 01:39, 6 August 2020 (UTC)
  • Nagarchal language [nbg]
    Appendix I in The Indo-Aryan Languages records this language as being a subdialect of Dhundari [dhd] and the 1901 Indian Census concurs; this is at odds with its description as an unattested Dravidian language, but the geographical specifications seem to match up.
  • Ngurmbur language [nrx]
    AIATSIS says: "Harvey (PMS 5822) treats Ngomburr as a dialect of Umbukarla N43, but in Harvey (ASEDA 802), it is listed as a separate language." Nicholas Evans confirms in The Non-Pama-Nyungan Languages of Northern Australia that it is unattested.
  • Wasu language [was]
    Unclassified due to its absence of data per The Indigenous Languages of South America: A Comprehensive Guide (Campbell and Grondona).

Yenish

The Yenish "language" (which we call Yeniche) was given the ISO code yec, despite being clearly not a separate language from German. Instead, it is a jargon which Wikipedia compares to Cockney (which has never had a code) and Polari (which had a code that we deleted in a mostly off-topic discussion). The case of Gayle, which is similar, is still under deliberation at RFM as of now. Most tellingly, German Wiktionary considers this to be German, and once we delete the code, we should make a dialect label for it and add the contents of de:Kategorie:Jenisch to English Wiktionary. @-sche —Μετάknowledgediscuss/deeds 00:49, 7 April 2017 (UTC)

I don't see how that's most tellingly; I don't know about the German Wiktionary, but major language works frequently treat things as dialects of their language that outsiders consider separate languages.--Prosfilaes (talk) 03:01, 10 April 2017 (UTC)Reply[reply]
The (linked) English Wikipedia article even says "It is a jargon rather than an actual language; meaning, it consists of a significant number of unique specialized words, but does not have its own grammar or its own basic vocabulary." Despite the "citation needed" tag that follows, that sentence is about accurate, and as such this should be deleted. -- Pedrianaplant (talk) 10:53, 30 April 2017 (UTC)
(If kept, it should be renamed.)
There are those who argue that Yenish should have recognition (which it indeed gets, in Switzerland) as a separate language. And it can be quite divergent from Standard German, with forms that are as different as those of some of the regiolects we consider distinct. Many examples from Alemannic or Bavarian-speaking areas are better considered Alemannic or Bavarian than Standard German. But then, that's a sign that it is, as some put it, a cant overlaid onto the local grammar, rather than a language per se. Ehh... - -sche (discuss) 03:22, 9 July 2017 (UTC)Reply[reply]

What's the difference? --Barytonesis (talk) 20:19, 17 April 2017 (UTC)

Apparently (Google n-grams) the term could be used with or without an object. The definition should be somewhat different. An example of use without a direct object is "to rake over the coals of failure". I don't know how to word this in a substitutable way. It seems to mean something like "to belabor (something negative (result, process), obvious from context) as if in reprimand". DCDuring (talk) 15:14, 3 January 2018 (UTC)Reply[reply]
  • The first page of Google Books results for rake over the coals is nearly all dictionaries, which I would guess are purposefully omitting the pronoun / proper noun from the phrase. There are one or two proper uses, but they are both in transcripts of speech and may use the phrase to mean "rehash an old issue that has been thoroughly discussed" rather than "scold". I'm not convinced this form is used with a consistent meaning, so I favour a merge into rake someone over the coals. — excarnateSojourner (talk · contrib) 04:41, 8 September 2023 (UTC)Reply[reply]

Move entries in CAT:Khitan lemmas to a Khitan script

The Khitan wrote using a Siniform script. Are these Chinese transcriptions of Khitan? —suzukaze (tc) 02:22, 13 August 2016 (UTC)

I'm a little confused about what's going on here. Are you RFV-ing every entry in this category? Or are you just looking for evidence that Khitan was written using this script? —Mr. Granger (talk • contribs) 12:45, 13 August 2016 (UTC)
The Khitans had their own script. These entries use the Chinese script. —suzukaze (tc) 17:30, 13 September 2016 (UTC)
I understand that, but I don't understand what your goal is with this discussion. If you want to RFV every entry in the category, then I'd like to add {{rfv}} tags to alert anyone watching the entries. If you want to discuss what writing systems Khitan used, maybe with the goal of moving all of these entries to different titles, then I'm not sure RFV is the right place for the discussion. (Likewise with the Buyeo section below.) —Mr. Granger (talk • contribs) 17:55, 13 September 2016 (UTC)
Moved to RFM. - -sche (discuss) 21:04, 30 April 2017 (UTC)

Some spurious languages to merge or remove, 2

remove Adabe [adb]

Geoffrey Hull, director of research for the Instituto Nacional de Linguística in East Timor, notes (in a 2004 Tetum Reference Grammar, page 228) that "the alleged Atauran Papuan language called 'Adabe' is a case of the mistaken identity of Raklungu," a dialect (along with Rahesuk and Resuk) of Wetarese. He notes (in The Languages of East Timor, Some Basic Facts) that only Wetarese is spoken on the island, and Studies in Languages and Cultures of East Timor likewise says "The three Atauran dialects—with the northernmost of which the dialect of nearby Lirar is mutually intelligible—are unquestionably Wetarese, and not dialects of Galoli, as Fox and Wurm suggest for two of them (n. 32). The same authors refer (ibidem) to a supposedly Papuan language of Atauro, the existence of which appears to be entirely illusory." (The error appears to have originated not with Fox and Wurm but with Antonio de Almeida in 1966.) - -sche (discuss) 01:45, 31 May 2017 (UTC)Reply[reply]

We could repurpose the code into one for those three Atauran varieties of Malayo-Polynesian Wetarese, Rahesuk, Resuk, and Raklu Un / Raklungu (the last of which Ethnologue does list as an alt name of adb, despite their erroneous family assignment of it), perhaps under the name "Atauran Wetarese" for clarity. - -sche (discuss) 01:52, 31 May 2017 (UTC)Reply[reply]
remove Agaria [agi]

Glottolog makes the case that this is spurious. - -sche (discuss) 07:57, 31 May 2017 (UTC)Reply[reply]


Arma (aoh) is also said to be "a possible but unattested extinct language"; I am trying to see if that means it is entirely unattested, or if there are personal/ethnic/place names, etc. - -sche (discuss) 09:45, 3 June 2017 (UTC)Reply[reply]

Removed, see Wiktionary:Beer_parlour/2020/October#2019-2020_ISO_code_changes. - -sche (discuss) 06:18, 14 October 2020 (UTC)
Aghu language

The VU Amsterdam report linked to here seems to indicate that one lect has been given multiple codes, and that "Jair" at least is spurious. Further research wouldn't hurt. —Μετάknowledge discuss/deeds 00:24, 3 October 2019 (UTC)

Can we come up with more descriptive names than Category:Aa please? —CodeCat 22:37, 14 May 2017 (UTC)Reply[reply]

IMO they are fine as they are. We could use "Letter Aa", etc, I guess. - excarnateSojourner (talk | contrib) 04:51, 29 April 2022 (UTC)Reply[reply]

This should be handled with {{liushu}}, since jiajie is one of the six categories (liushu). — justin(r)leung (t...) | c=› } 18:36, 17 May 2017 (UTC)Reply[reply]

Can both of these templates be renamed to include a language code? —CodeCat 19:01, 17 May 2017 (UTC)Reply[reply]
{{jiajie}} should be merged with {{liushu}}, which could be renamed as {{Han liushu}}, following {{Han compound}} and {{Han etym}}. It might not be a good idea to use a particular language code because these templates are intended for use in multiple languages now. They used to be used under Translingual, but we have decided to move the glyph origin to their respective languages. — justin(r)leung (t...) | c=› } 20:22, 17 May 2017 (UTC)Reply[reply]
You can use script codes as prefixes too. We have Template:Latn-def, Module:Cans-translit and such. —CodeCat 20:26, 17 May 2017 (UTC)Reply[reply]

Entries in CAT:Taos lemmas with curly apostrophes

Many Taos entries use curly apostrophes to represent glottal stops. They should either use the easy-to-type straight apostrophe ' that many other languages use, or the apostrophe letter ʼ that Navajo and a few other languages use. - -sche (discuss) 21:36, 20 May 2017 (UTC)Reply[reply]

I agree. The headword template interprets the curly apostrophe as a punctuation mark (because it is), and automatically links words such as adùbi’íne as adùbiíne. (Personally, I think the apostrophe letter looks better, but there may be other considerations.) — Eru·tuon 21:45, 20 May 2017 (UTC)Reply[reply]
Oh, and I just learned of the Unicode character for the saltillo. But no entries use it, and I am averse to introducing yet another visually-almost-identical symbol to represent the glottal stop, next to the three (counting the curly apostrophe) mentioned above that are already in use, plus the ˀ that some entries use. - -sche (discuss) 02:23, 21 May 2017 (UTC)Reply[reply]
I'm in favor of standardizing on U+02BC MODIFIER LETTER APOSTROPHE for any language that uses an apostrophe-looking thing as a letter. —Aɴɢʀ (talk) 13:52, 21 May 2017 (UTC)Reply[reply]
Probably reasonable for glottalizationy apostrophes. At least Skolt Sami uses ʹ U+02B9 MODIFIER LETTER PRIME for suprasegmental palatalization though, which should likely be kept separate. --Tropylium (talk) 16:55, 21 May 2017 (UTC)Reply[reply]
I've moved quite a few of these; about 140 remain to be moved. - -sche (discuss) 04:49, 24 July 2018 (UTC)Reply[reply]
  • @-sche Is there an easier way to tell which entries still need to be moved than opening each of them individually? I've tried using Ctrl+F on the category page, but apparently Mozilla thought it would be helpful for all these apostrophe-like characters to match each other. — excarnateSojourner (talk · contrib) 04:58, 8 September 2023 (UTC)
    @ExcarnateSojourner If you Ctrl+F and then select "match diacritics" it matches only the character you want. Another option: if you download AWB you can use it to pull the contents of a category and then to filter and keep only titles containing the curly quote, which yields this list. The AWB software enforces the rule that you can't actually change pages with AWB unless you're approved, but you can use it to search database dumps and filter categories and generate lists like that regardless of whether you're approved or not. I pulled that list from the "Taos lemmas" category; the few non-lemma forms we have seem to use the modifier letter apostrophe already, so unless there are forms which are neither categorized as lemmas nor as non-lemmas, that should be all the entries ... but note that there may be occurrences in translations tables, links in one Taos entry to another, etc, which also need changing. - -sche (discuss) 19:13, 8 September 2023 (UTC)Reply[reply]
  •   Done: I (as ExcarnateSojournerBot) have used a Python script to replace curly apostrophes (U+2019) with modifier letter apostrophes (U+02BC) in the titles and text of all entries in cat:Taos lemmas, cat:Taos non-lemma forms, and the sole subcategory of the latter. — excarnateSojourner (talk · contrib) 23:45, 9 September 2023 (UTC)
    Thanks. Are you able to run a script to clean up translations? E.g. river/translations and stream still have the form with the curly single quote mark. Plausibly there could also be mentions of Taos words in etymology sections. I think searching a database dump for instances of {{t|twf|...}}, or the equivalent with {{t+}}, {{t-check}}, {{t+check}}, {{tt}}, {{tt+}}, {{tt-check}}, {{tt+check}}, {{m}}, {{m-lite}}, {{l}}, and {{l-lite}}, or any etymology template ({{bor|FOO|twf|...}}, etc) where FOO is any language code (or any string of 2-11 a-z or - characters) and the ... is any number of characters other than | or }} that includes one or more curly apostrophes, would find relevant instances. If I could work out how to write that as a regex string, I could search the database dump myself with AWB and provide the list. - -sche (discuss) 06:24, 11 September 2023 (UTC)Reply[reply]
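
The search described above could be sketched roughly as follows, in Python rather than in AWB's own regex dialect. The template names and the twf code are taken from the comment; the pattern itself, the helper name find_curly_taos and the sample wikitext are assumptions made for illustration. The sketch only inspects the first parameter after the language code, so it would miss curly apostrophes in later parameters such as alt= or tr=.

```python
import re

CURLY = "\u2019"  # RIGHT SINGLE QUOTATION MARK, the "curly apostrophe"

# Translation and link templates listed in the comment: {{NAME|twf|TERM...}}
LINK_TEMPLATES = [
    "t", "t+", "t-check", "t+check",
    "tt", "tt+", "tt-check", "tt+check",
    "m", "m-lite", "l", "l-lite",
]
link_names = "|".join(re.escape(name) for name in LINK_TEMPLATES)

PATTERN = re.compile(
    r"\{\{(?:"
    r"(?:" + link_names + r")\|twf"      # {{t|twf|...
    r"|[^|{}]+\|[a-z-]{2,11}\|twf"       # {{bor|FOO|twf|... (any template name)
    r")\|"
    r"[^|{}]*" + CURLY + r"[^|{}]*"      # term containing a curly apostrophe
)


def find_curly_taos(wikitext):
    """Return the matched template fragments (hypothetical helper)."""
    return [m.group(0) for m in PATTERN.finditer(wikitext)]


# Made-up wikitext reusing the word quoted earlier in this thread.
sample = (
    "{{t|twf|adùbi\u2019íne}} {{bor|en|twf|adùbi\u2019íne}} "
    "{{t|twf|adùbi\u02bcíne}} {{t|es|río}}"
)
for hit in find_curly_taos(sample):
    print(hit)
```

Only the first two templates in the sample are reported; the third already uses U+02BC and the fourth is not Taos. Whether AWB accepts the same syntax unchanged is not something this sketch verifies.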

Should perhaps be moved to long story? W3ird N3rd (talk) 06:42, 9 August 2017 (UTC)

In contrast to long story short, neither seems entryworthy to me. They are quite transparent. Checking “long story” in OneLook Dictionary Search, one notes that none of those references finds it inclusionworthy, whereas “long story short” in OneLook Dictionary Search shows some coverage. DCDuring (talk) 11:01, 9 August 2017 (UTC)

sense: Noun: "(aviation) A large multi-engined aircraft. The term heavy normally follows the call-sign when used by air traffic controllers."

In the aviation usage AA21 heavy ("American Airline flight 21 heavy") the head of the NP is AA21, heavy being a qualifying adjective indicating a "wide-bodied", ergo "heavy", aircraft.

Move to noun with any adjustments required. DCDuring (talk) 13:19, 24 August 2017 (UTC)Reply[reply]

@DCDuring You're proposing we move from noun to noun? Did you mean from noun to adjective? - excarnateSojourner (talk | contrib) 05:57, 18 October 2022 (UTC)Reply[reply]
I don't know what I meant 5 years ago, but that's what I mean now: move it to adjective. Though it would be good to confirm that there is not sufficient attestation of heavies and/or [DET] heavy. DCDuring (talk) 12:48, 18 October 2022 (UTC)Reply[reply]
I can find the plural in reference to large (sometimes restricted to widebody) commercial aircraft and heavy bombers (sometimes 2-engine, always at least 4-). Also "heavy" motor vehicles (eg. large trucks, esp semis). I'm not entirely sure what heavy refers to when used by the pilot of a Cessna. DCDuring (talk) 12:57, 18 October 2022 (UTC)Reply[reply]
  • Keep everything where it is. We now have an appropriate adjective sense, and the plural of the noun has been cited. — excarnateSojourner (talk · contrib) 02:13, 10 September 2023 (UTC)Reply[reply]

Renaming mey

We currently have it as "Hassaniya" (which we used to spell as Hassānīya; those macra were removed along the way, presumably by Liliana, although I don't see any discussion; MG deleted the old category once it was empty). To match the other colloquial Arabic languages, it should be "Hassaniya Arabic". (Note: if Arabic is merged, this will become moot.) —Μετάknowledge discuss/deeds 07:07, 16 September 2017 (UTC)

This seems a bit different from most of the other forms of Arabic which are "[Adjective referring to a place] Arabic", where just calling the lect "Libyan" (etc) would be more awkward. Still, I have no objection to a rename, though I don't have time to rename all the categories right now. I also notice that, while Hassaniya is probably still the most common spelling overall, it seems like Hassaniyya started to become more common around 2003. - -sche (discuss) 04:03, 29 December 2017 (UTC)Reply[reply]

Categories about country subdivisions to include the country name

This will include at least the following:

Categories for certain things that are located within these subdivisions will also be renamed, e.g. Category:Cities in Aomori (Prefecture) → Category:Cities in Aomori Prefecture, Japan. —Rua (mew) 13:07, 16 October 2017 (UTC)

Support. I oppose the existence of categories with language code like "en:" in the first place, but what is proposed here seems to be an improvement over the status quo. --Daniel Carrero (talk) 20:27, 20 October 2017 (UTC)Reply[reply]
I would have opposed a lot of these, but I was too late on the scene. DonnanZ (talk) 15:51, 12 November 2017 (UTC)Reply[reply]
Support all except Category:Abkhazia, Georgia (for which I abstain as I do not properly understand the political situation explained by User:Palaestrator verborum). - excarnateSojourner (talk|contrib) 03:34, 29 October 2021 (UTC)Reply[reply]
US states were moved by MewBot (talk • contribs) in 2017. - excarnateSojourner (talk | contrib) 22:00, 27 April 2022 (UTC)

The rename has been put on hold until there is a clear consensus either way. Please vote! —Rua (mew) 15:11, 14 November 2017 (UTC)Reply[reply]

@Rua It looks sane to me if politics are left out. But why is Abkhazia in Georgia though it is an independent state, statehood only depending on factual prerequisites and not on diplomatic recognition which has nothing to do with it? Where does the Crimea belong to? (article Sevastopol is only in Category:en:Ukraine because it has not really been edited since 2014.) I can think of two solutions: First possibility: We focus on geographical and cultural constants. Second possibility: We focus on the actual political power. I disprefer the second slightly because it can mean much work in cases of war (i.e. how much the Islamic state holds etc., or say the current factions in Libya). But in neither case is Abkhazia in Georgia. But the first possibility does not even answer what the Crimea belongs to, i.e. I am not sure if it is historically correct to speak of the Crimea as Ukraine. And geographical terms are often fuzzy and subject to editorial decisions. All seems so easy if you start your concepts from the United States, which do not even have a name for the region they are situated in. And even for the USA your idea is questionable because the constituent states of the United States are states in their own right (Teilstaat, Gliedstaat in German), as is also the case for the Federal Republic of Germany and the Russian Federation partially (according to the Russian constitution only those of the 85 subjects are states which are called Republic, not the Oblasti etc.). Is Tatarstan Russia? Not even Russians can agree with such a sentence, as in Russia one sharply distinguishes русские and россияне, Россия and Российская федерация. Technically Ceuta and Melilla are in Morocco because Spain is not in Africa. Also, Kosovo je Srbija, and it would become just a coincidence if a place important in Serbian history is listed as X, Kosovo or X, Serbia. Palaestrator verborum (loquier) 16:06, 14 November 2017 (UTC)

@Rua: Most of these categories like Category:en:Special wards in Tokyo are back on the {{delete}} list. I think these should be removed again for the time being. DonnanZ (talk) 18:02, 14 November 2017 (UTC)Reply[reply]

  • Starting with the above, I don't know how the Tokyo ward system works, but I imagine it's a subdivision of the city. In England wards are subdivisions in cities, boroughs, local government districts, and possibly counties. "Wards in" is the natural usage.
Municipalities similarly. For example in Norway there are hundreds of municipalities (kommuner) which are subdivisions within counties (fylker). Some of these can be large, especially in the north, but so are the counties in the north. To me "municipalities in" is the natural wording.
States and provinces in the USA and Canada: In nearly all cases it is unnecessary to add the country name as the names are unambiguous. The only exception I can think of is Georgia, USA. This could also apply to prefectures in Japan and states in India (is there a Punjab in Pakistan?). DonnanZ (talk) 18:52, 14 November 2017 (UTC)Reply[reply]
Yes, there is, like there is in India. Maybe categorisations should be abundant? Cities can belong to Punjab as well as to Punjab, India, and the Crimea is part of administration of both the Russian Federation and the Republic Ukraine at least for some purposes in the Republic Ukraine. We can make the least thing wrong by adding Sheikh Zuweid (presuming it exists) as well to the Islamic State as to the Arab Republic of Egypt, because we do not want to judge morally and formally states and terror organizations are indistinguishable. On the other hand of course we need sufficient data to relate towns to administrative divisions and ISIS presumably does not publish organigrams. Palaestrator verborum (loquier) 19:44, 14 November 2017 (UTC)Reply[reply]
Discussion moved to Wiktionary:Requests for deletion/Others#Category:en:Directives.

2018 — February

Why is this in the singular? It just looks weird in the case of a title like this. (Somewhat irrelevant, extra issue: the page needs a lede to explain what a shortcut is.) PseudoSkull (talk) 05:23, 21 February 2018 (UTC)Reply[reply]

Support on both counts. —Μετάknowledgediscuss/deeds 19:23, 20 March 2018 (UTC)Reply[reply]
Support per nom. - excarnateSojourner (talk|contrib) 03:42, 29 October 2021 (UTC)Reply[reply]
@PseudoSkull There has been a section explaining what shortcuts are this whole time. It's just not right at the top, which might have been done intentionally to make the table of common shortcuts as quickly accessible as possible. - excarnateSojourner (talk | contrib) 06:10, 18 October 2022 (UTC)Reply[reply]

2018 — March

This is extremely trivial, not to mention something that could be found even if it were not categorised. I think that it suits an appendix much better, so I propose that its contents be moved to Appendix:English words ending in -gry. —Μετάknowledgediscuss/deeds 03:23, 15 March 2018 (UTC)Reply[reply]

A benefit to having it as a category is that theoretically it ought to be addable by the headword templates examining the pagename (like "English terms spelled with Œ"), which, if implemented (...if it could be implemented without excessive memory costs), would allow it to be kept up to date automatically. - -sche (discuss) 17:16, 15 March 2018 (UTC)Reply[reply]
That is true, but I don't really think we should be using headword templates to collate trivia. —Μετάknowledgediscuss/deeds 17:47, 15 March 2018 (UTC)Reply[reply]
Delete per proponent. --Per utramque cavernam 18:09, 31 May 2018 (UTC)Reply[reply]
Is there something like Category:English lemmas but sorted from the end, like anger, ranger, hunger, angry, hungry? --幽霊四 (talk) 19:40, 6 February 2021 (UTC)Reply[reply]
At http://tools.wmflabs.org/dixtosa/ you can get a list of all entries in any category that end with any string you like. —Mahāgaja · talk 20:58, 6 February 2021 (UTC)Reply[reply]
Support the proposed move per nom. - excarnateSojourner (talk|contrib) 05:00, 29 October 2021 (UTC)Reply[reply]
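
For what it's worth, the kind of list asked about above (lemmas sorted from the end) is easy to produce offline from any list of page titles. A minimal sketch, assuming the titles are already available as a plain Python list; the function names are hypothetical and this is not how the tool linked above works internally:

```python
def sort_from_end(words):
    """Sort words by their reversed spelling, so shared endings group together."""
    return sorted(words, key=lambda w: w[::-1])


def ending_in(words, suffix):
    """Keep only the words that end with the given string."""
    return [w for w in words if w.endswith(suffix)]


# The five words used as an example in the question above.
lemmas = ["hungry", "ranger", "angry", "anger", "hunger"]

print(sort_from_end(lemmas))     # ['anger', 'ranger', 'hunger', 'angry', 'hungry']
print(ending_in(lemmas, "gry"))  # ['hungry', 'angry']
```

A list generated this way would cover the "-gry" case without needing a maintained category or appendix, although it would have to be regenerated whenever entries are added.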

2018 — April

Entries for Japanese prefecture names that end in 県 (ken, prefecture)

I would like to request the move of the content of entries like 茨城県 (Ibaraki-ken, literally Ibaraki prefecture) to simply 茨城 (Ibaraki, Ibaraki), cf. Daijisen. 県 is not an essential part of the name.

(Notifying Eirikr, Wyang, TAKASUGI Shinji, Nibiko, Atitarev, Dine2016, Poketalker, Cnilep, Britannic124, Fumiko Take, Dine2016): Suzukaze-c 03:19, 19 April 2018 (UTC)Reply[reply]

As a counterargument, Shogakukan's 国語大辞典 entry for 茨城 (Ibaraki) has one sense listed as 「いばらきけん(茨城県)」の略 ("Ibaraki-ken" no ryaku, "short for Ibaraki-ken"), and the 茨城 page on the JA Wikipedia is a disambig pointing to 茨城県 as one possible more-specific entry. ‑‑ Eiríkr Útlendi │Tala við mig 03:52, 19 April 2018 (UTC)Reply[reply]
(edit conflict) It seems like a two-word phrase to me. I am not a native speaker, but I think that if someone asked "水戸市は何県?" ((in) What prefecture is Mito?) then "茨城です。" (It's Ibaraki) would be a correct answer. Entries such as 奈良 and 広島 should have both the city and the prefecture. (I see that 奈良 currently does.) Cnilep (talk) 04:01, 19 April 2018 (UTC)Reply[reply]
茨城県です would also be correct and probably more common. At least 東京 and 東京都 are clearly distinguished. No one in Izu Ōshima would say he/she is from 東京. — TAKASUGI Shinji (talk) 04:04, 19 April 2018 (UTC)Reply[reply]
Yes, 茨城県 is also correct. And if someone asked どこの出身? (Where are you from?) the answer would probably be 奈良県 rather than 奈良, or else expect a follow-up question. But I don't think that is necessarily a matter of word boundaries. Compare Pittsburgh, Pennsylvania and Pittsburgh, Kansas; the fact that it is usually necessary, and always acceptable to specify the latter doesn't mean that Pittsburgh on its own is not a proper noun. By same token, I think that 茨城 (et alia) is a word. That's the point I had in mind. I will say nothing about what is more common. I don't even have good intuitions about frequency in my native language. Cnilep (talk) 04:54, 19 April 2018 (UTC)Reply[reply]
I fully agree that 茨城 is a term worthy of inclusion. I also think that 茨城県 is a term worthy of inclusion. We have entries for both New York and New York City, and even New York State. Similarly, I think we should have entries for [PREFECTURE NAME], and also for [PREFECTURE NAME] and [PREFECTURE NAME] and [PREFECTURE NAME], etc., as appropriate. ‑‑ Eiríkr Útlendi │Tala við mig 05:03, 19 April 2018 (UTC)Reply[reply]
I believe New York is a special case because there is both the state and the city. We have Washington State, but we don't have City of Chicago or State of Oregon. —Suzukaze-c 18:40, 19 April 2018 (UTC)Reply[reply]
A lot (maybe all?) of the prefecture names minus the 県 (-ken) suffix are polysemous. Listing a few from the north to the south, limiting just to geographical senses, and just in the same regions at that:
  • 青森 (Aomori): a prefecture and a city
  • 岩手 (Iwate): a prefecture, a city, and a township
  • 秋田 (Akita): a prefecture and a city
  • 山形 (Yamagata): a prefecture, a city, and a village
  • 宮城 (Miyagi): a prefecture, a county, a township, a rural area (ancient Japan), a village, an island, and a mountain
  • 福島 (Fukushima): a prefecture, a city, and a township
  • 新潟 (Nīgata): a prefecture, a city, a park, and a village
  • 栃木 (Tochigi): a prefecture and a city
  • 茨城 (Ibaraki): a prefecture, a county, and a township
Jumping south a bit to touch on Anatoli's example further below:
  • 奈良 (Nara): a prefecture, a city, a township, and a village
I am consequently in support of including both the bare name, and the qualified name(s), much as we already do for similar situations with English terms. ‑‑ Eiríkr Útlendi │Tala við mig 21:35, 19 April 2018 (UTC)Reply[reply]
They are polysemic because most prefectures were named after their capital city during the abolition of the han system. Exceptions include 埼玉 and 沖縄, where cities are named after their prefecture. — TAKASUGI Shinji (talk) 12:23, 23 April 2018 (UTC)Reply[reply]
Generally support. Less duplication is good, and it is not much different from Chinese etc. for which we generally delemmatise, if not completely hard-redirect, these forms. Wyang (talk) 04:49, 19 April 2018 (UTC)Reply[reply]
Support. For a dictionary, I think we don't need to keep entries with both prefecture name and prefecture, despite the usage but it's always helpful to provide usage notes (e.g. normally used with 県: ~県) and usage examples, e.g. 奈良県(ならけん) (Nara ken, Nara (prefecture)). --Anatoli T. (обсудить/вклад) 05:45, 19 April 2018 (UTC)Reply[reply]

Same suffix as in быль (bylʹ), убыль (ubylʹ), прибыль (pribylʹ), отрасль (otraslʹ), поросль (poroslʹ). а belongs to the stem. Guldrelokk (talk) 23:27, 20 April 2018 (UTC)Reply[reply]

@Atitarev, Benwing2, Chignon: Please voice an opinion; if you agree, the couple of entries using this suffix need to be modified. —Μετάknowledgediscuss/deeds 01:52, 16 April 2019 (UTC)Reply[reply]
Agreed. The two entries need a change. --Anatoli T. (обсудить/вклад) 01:57, 16 April 2019 (UTC)Reply[reply]
ruwikt: Категория:Русские слова с суффиксом -ль (Category:Russian words suffixed with -ль). --Anatoli T. (обсудить/вклад) 02:08, 16 April 2019 (UTC)Reply[reply]
@Guldrelokk, Benwing2, Chignon: I have modified entries, the category is orphaned, -ль (-lʹ) still needs to be defined. --Anatoli T. (обсудить/вклад) 03:30, 16 April 2019 (UTC)Reply[reply]
@Atitarev, can you please resolve this? —Μετάknowledgediscuss/deeds 07:41, 6 March 2021 (UTC)Reply[reply]

2018 — July

After some discussion on Category talk:Baybayin script (that went a bit off-topic), some of the Indian language editors (@Bhagadatta, Msasag and myself) have agreed that this category should be renamed to Category:Eastern Nagari script, the reasons being (1) several languages other than Bengali use this script, and (2) the Bengali alphabet is just a subset of this script and lacks some of the glyphs used by other Bengali-script languages (most prominently Assamese which has a separate r-glyph). I want to make sure that there are no objections to this by editors who were not in the discussion. —AryamanA (मुझसे बात करें • योगदान) 02:06, 20 July 2018 (UTC)

google:assamese+site:unicode.org —Suzukaze-c 02:16, 20 July 2018 (UTC)

@Asm sultan, Dubomanab Kutchkutch (talk) 05:35, 21 July 2018 (UTC)Reply[reply]

  Support -- Bhagadatta (talk) 08:38, 21 July 2018 (UTC)Reply[reply]

The two verb senses are bad IMHO. The first should be at busy oneself, I think, since it is always reflexive AFAIK. The second one doesn't sound right at all -- "He busied her" isn't something I've heard. Is that real at all? 02:36, 29 July 2018 (UTC)Reply[reply]

Support the move of verb sense 1 to busy oneself. Send verb sense 2 to RFV. - excarnateSojourner (talk|contrib) 05:46, 29 October 2021 (UTC)
It's not purely reflexive, so I oppose the move for sense 1. Examples: "I will […] busy him with my affairs till he forgets his own" [2]; "And what has been busying you?" [3]; "[…] he busied you with other chores" [4]. Rarer than I thought, since I've heard e.g. "sorry for busying you" in real life, but it's a thing. Sense 2 I'm unfamiliar with. —Al-Muqanna المقنع (talk) 23:51, 2 December 2022 (UTC)
[[busy oneself]] might be a good hard redirect to the appropriate sense of busy, which would benefit from {{lb|en|usually reflexive}} and corresponding usage examples. DCDuring (talk) 14:46, 3 December 2022 (UTC)Reply[reply]
The sense at [[busy]] should remain, whether or not there is a separate lemma entry for busy oneself. DCDuring (talk) 14:48, 3 December 2022 (UTC)Reply[reply]
Redirecting busy oneself and a label makes sense, agreed. —Al-Muqanna المقنع (talk) 16:10, 3 December 2022 (UTC)Reply[reply]

2018 — August

Nahuatl is sometimes treated as a language, and sometimes as a family of languages. Right now, Wiktionary is treating it as both simultaneously, which doesn't make sense. "Nahuatl" should be removed as a language. --Lvovmauro (talk) 11:55, 30 August 2018 (UTC)Reply[reply]

I agree the current arrangement doesn't make sense; it is a relic of very early days on Wiktionary, and has persisted mostly because it's not entirely clear how intelligible the varieties are and hence whether it's better to lump them all into nah, or retire nah and separate everything. But enough varieties are not intelligible that I agree with retiring nah (or perhaps finally converting it to a family code). - -sche (discuss) 20:34, 31 August 2018 (UTC)Reply[reply]
I think a family code for Nahuan languages is really needed since there are many cases where we don't know specifically which variety a word was borrowed from. --Lvovmauro (talk) 09:55, 9 September 2018 (UTC)Reply[reply]
@Lvovmauro: OK, thanks to you and a few other editors, all words with ==Nahuatl== sections have been given more specific headers. However, as many as a thousand translations remain to be dealt with before the code can be made a family code and Category:Nahuatl language moved on over to Category:Nahuan languages. - -sche (discuss) 06:48, 19 September 2018 (UTC)Reply[reply]
A disturbingly large number of these translations are neologisms with no actual usage. Some of them don't even obey the rules of Nahuatl word formation. --Lvovmauro (talk) 11:03, 19 September 2018 (UTC)Reply[reply]
@Lvovmauro: Feel free to remove obvious errors / unattested neologisms. If a high proportion of the translations are bad, it might even be reasonable to start presuming they're bad and just removing them, since they already suffer from the problem of using an overbroad code. - -sche (discuss) 00:28, 21 October 2018 (UTC)Reply[reply]
Someone with more time on their hands than me at the moment will need to delete all the subcategories of Category:Nahuatl language, and then the category itself, in preparation for moving 'nah' from the language-code module to the family-code module so the categories won't be recreated by careless misuse of 'nah' in the labels etc of 'nci' entries. - -sche (discuss) 00:24, 21 October 2018 (UTC)Reply[reply]
Five years on, I've reviewed the situation here. There are no Nahuatl entries anymore, which is good progress. However, two pressing issues are stopping us from fully retiring this language code:
  • There are still about 450 "Nahuatl" (nah) translations in English entries. I suppose these need manual review. This should not be too difficult if one can find word lists for some of the best-attested Nahuatls.
  • Many languages have at least one word said to be derived from Nahuatl (presumably this is the word for "chocolate" in most cases). This could be solved by making Nahuatl an etymology-only language, or by changing these etymologies to refer generically to "a Nahuan language".
This, that and the other (talk) 09:25, 1 November 2023 (UTC)Reply[reply]

Mecayapan Nahuatl saltillos

A number of Mecayapan Nahuatl words are currently written with U+0027 APOSTROPHE, which is a punctuation mark and not a letter. And a couple are using U+02BC MODIFIER LETTER APOSTROPHE, which is the wrong shape for this language. They should all be written with U+A78C LATIN SMALL LETTER SALTILLO instead.

--Lvovmauro (talk) 09:48, 31 August 2018 (UTC)Reply[reply]

Or perhaps they should just be moved to use the Modifier Letter Apostrophe, cf WT:RFM#Entries_in_CAT:Taos_lemmas_with_curly_apostrophes, to avoid over-proliferation of different apostrophe-ish letters. I think we should try to be consistent within the Nahuatl languages, at least, in which codepoint we use. - -sche (discuss) 20:26, 31 August 2018 (UTC)Reply[reply]
Most Nahuan languages don't use any sort of apostrophe. Mecayapan is unusual. --Lvovmauro (talk) 01:54, 1 September 2018 (UTC)Reply[reply]
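
To make the punctuation-versus-letter distinction in this thread concrete, here is a small sketch using Python's standard unicodedata module to print the name and general category of each apostrophe-like character mentioned here and in the Taos discussion. The helper names are hypothetical; the names and categories come from the Unicode data bundled with Python.

```python
import unicodedata

# Apostrophe-like characters mentioned in these discussions.
CANDIDATES = ["\u0027", "\u2019", "\u02bc", "\ua78c"]


def describe(ch):
    """Return (code point, Unicode name, general category) for one character."""
    return (f"U+{ord(ch):04X}", unicodedata.name(ch), unicodedata.category(ch))


def is_letter(ch):
    """True if the general category is a letter category (Lu, Ll, Lm, Lo, Lt)."""
    return unicodedata.category(ch).startswith("L")


for ch in CANDIDATES:
    print(*describe(ch), "letter" if is_letter(ch) else "punctuation")
```

U+0027 and U+2019 come out as punctuation (categories Po and Pf), while U+02BC and U+A78C are letters (Lm and Ll), which is why only the latter two behave as part of a word. This says nothing about which of the two letters is the better choice for Mecayapan Nahuatl; that remains an editorial question.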

2018 — September

Arawak and Island Carib

Any objections to me renaming Arawak arw (4 entries) and Island Carib crb (0 entries) to Lokono and Kalhiphona, respectively? Arawak is easily confused with the Arawak/Arawakan proto language and family, and Carib is one of two often confounded languages, the Carib language and the Island Carib language. --Victar (talk) 04:03, 6 September 2018 (UTC)Reply[reply]

No objection to renaming Arawak, but I'm not sure about Kalhiphona, which seems to be quite rare even on a Google web search, and which seems to invite as much possible confusion (in its various spellings) with the various spellings of Garifuna as it avoids with other "Carib"s. - -sche (discuss) 06:56, 19 September 2018 (UTC)Reply[reply]

Template:superlative attributive of to Template:da-superlative attributive of

Only used for Danish. —Rua (mew) 17:15, 9 September 2018 (UTC)Reply[reply]

I don't envisage using them in Norwegian. DonnanZ (talk) 13:53, 11 September 2018 (UTC)Reply[reply]

It’s not about goon but go-on. Most books on Japanese seem to use kan-on and go-on with a hyphen rather than the correctly Romanized kan’on and goon. — TAKASUGI Shinji (talk) 15:42, 22 September 2018 (UTC)Reply[reply]

2018 — October

I propose to rename Category:Korean determiners to Category:Korean adnominals, just like Category:Japanese adnominals. Korean gwanhyeongsa are grammatically almost identical to Japanese rentaishi or adnominals, which may or may not be determiners. Gwanhyeongsa are generally divided into three classes: demonstrative gwanhyeongsa, numeral gwanhyeongsa, and qualifying gwanhyeongsa ([5]). The last ones are not determiners. (pinging @Atitarev, Eirikr, Garam, HappyMidnight, KoreanQuoter) — TAKASUGI Shinji (talk) 23:31, 10 October 2018 (UTC)Reply[reply]

Support. --Garam (talk) 08:21, 12 October 2018 (UTC)Reply[reply]
Tentatively Support. Let's check with User:Wyang who was also involved and had an opinion in a related discussion on the group of words ending in (, jeok). --Anatoli T. (обсудить/вклад) 02:42, 13 October 2018 (UTC)Reply[reply]
I feel determiner is the more common name for this in English; the different definitions of these terms across languages should not be a concern - e.g. we also use adjective differently for Korean. adnominal may be confused with the -eun, -neun, -eul, -deon forms of Korean verbs and adjectives. Wyang (talk) 03:57, 13 October 2018 (UTC)Reply[reply]
@Wyang: The problem is that Category:Korean determiners contains words other than determiners. It will be all right to have both Category:Korean adnominals and Category:Korean determiners without renaming if you want, just like Category:Japanese adnominals and Category:Japanese determiners. — TAKASUGI Shinji (talk) 10:31, 13 October 2018 (UTC)Reply[reply]

@Tibidibi, AG202 —Fish bowl (talk) 11:32, 7 February 2022 (UTC)

ichthyosaur vs. ichthyosaurus, and other terms like these

I'm in a dispute with an editor over the exact meaning and differences between these two terms - are they the same or must we tell apart the order from the genus? Is there a standard to follow? Дрейгорич (talk) 15:55, 27 October 2018 (UTC)

The standard is making a survey of contemporary and past usages and using that to inform the definitions. DTLHS (talk) 16:15, 27 October 2018 (UTC)Reply[reply]
I've gone ahead and cleaned up the definitions, and linked to the scientific genus in the entry in case anyone wants that. Дрейгорич (talk) 16:23, 27 October 2018 (UTC)Reply[reply]

2018 — November

Language request: Old Cahita

Mayo and Yaqui are mutually intelligible and sometimes considered to be a single language called Cahita. But their speakers apparently consider them to be distinct languages, and they have distinct ISO codes (mfy and yaq) and are currently treated distinctly by Wiktionary.

I'm not requesting that they be merged, but separating them is a problem because an important early source, the Arte de la lengua cahita conforme à las reglas de muchos peritos en ella (published 1737 but written earlier) treats them as a single language, and also includes an extinct dialect called Tehueco. I'd like to add words from the Arte but I can't list them specifically as either Mayo or Yaqui.

One solution would be to treat the language of the Arte as a distinct historical language, "Old Cahita", which would then be the ancestor of Mayo and Yaqui. The downside is there only seems to be one linguist currently using this name. --Lvovmauro (talk) 11:32, 4 November 2018 (UTC)

On linguistic grounds, it seems like we should merge Yaqui and Mayo. Jacqueline Lindenfeld's 1974 Yaqui Syntax says "Yaqui and Mayo are sufficiently similar to be mutually intelligible", the Handbook of Middle American Indians says "the modern known representatives of Cahitan—Yaqui and Mayo—are mutually intelligible", and various more general references say "Yaqui and Mayo are mutually intelligible dialects of the Cahitan language", "The Yaqui and Mayo speak mutually intelligible dialects of Cahita". (There are political considerations behind the split, which a merger might upset, so adding Old Cahita would also work, but we have tended to be lumpers...) - -sche (discuss) 23:03, 18 November 2018 (UTC)Reply[reply]
I wouldn't object to merging them. --Lvovmauro (talk) 08:58, 19 November 2018 (UTC)

Cleanup suggestions for some badly attested Semitic languages, needing admin action

Discussion moved from Wiktionary:Grease_pit/2018/November#Cleanup suggestions for some badly attested Semitic languages, needing admin action.
  1. Pray somebody add |scripts = {"Narb"} to Module:languages/data3/x after line 1026 for xna. (Otherwise mentions of words in it are shown in slanted letters.)
    Added. DTLHS (talk) 03:17, 14 November 2018 (UTC)
    It seems that even MediaWiki:Common.css needs a new class for Narb added, to get font-style: normal; Sarb is there and has it, Narb is not there. If the mention of a North Arabian word in عَنْكَبُوت(ʕankabūt) works then it is complete. Also I see that in Module:scripts/data Narb does not have direction = "rtl" while Sarb has. Fay Freak (talk) 14:43, 15 November 2018 (UTC)Reply[reply]
    Good catch. I've updated Common.css and Mobile.css and set it to display rtl. Sadly, it seems there are no fonts that display it. If you or I could find a good image of what the letters are supposed to look like, I might have time to make a basic font iff the letters don't have to be joined the way they do in Arabic. - -sche (discuss) 22:08, 18 November 2018 (UTC)
    I as an Archfag recently had a great update three weeks ago that adds displaying support for Old North Arabian, amongst other things like which improved Arabic and Syriac script rendering everywhere. gucharmap calls the name of the font by “Noto Sans Old North Arabian”, which I find in the filelist of the noto-fonts package. @-sche Fay Freak (talk) 22:29, 18 November 2018 (UTC)Reply[reply]
  2. I think everything under Category:Old North Arabian script languages should be “Ancient North Arabian” (xna), it is to wonder that Dadanitic (sem-dad), Hismaic (sem-his), Safaitic (sem-saf), Taymanitic (sem-tay), Dumaitic (sem-dum), Hasaitic (sem-has), Thamudic (sem-tha) are separate languages on Wiktionary (some also with no script assigned). (Prolly someone went through some lects and added all he found.) Those lects are at a level of attestation or study where it does not even matter whether they are dialects or languages, and “Thamudic” is even a collective term for any of the Ancient North Arabian lects not further classified. Many inscriptions cannot be classified unto more specific lects anyway (you know, people also were nomads and wrote graffiti here and there) and they can only be entered as “Ancient North Arabian”. With words being found randomly and in concise consonantal writing I don’t see why one would pursue separation other than by stating the find spot.
  3. Also, “Qatabanian” (xqt), “Sabaean” (xsa), “Minaean” (inm), “Harami” (xha, redirects to “Minaean” on Wikipedia), Hadrami (xhd) – likewise otiose distinctions, regarding form and amount of attestation of Epigraphic South Arabian, as the name says only epigraphically attested, without any vowels –, have been unpopular in use already, entries and etymologies use the header “Old South Arabian” (sem-srb). I suggest crossing those out. Etymology-only is possible so one can use those in {{cog}} when in an individual case a word is known to be attested as of one of the dialects. North Arabian epigraphy categorization is more complex and it is better anyway to mention in each etymology where a lexeme has been encountered.
    1. Himyaritic (sem-him), as an attested language, is rather mythical because the Ḥimyarites wrote Sabaean. Wikipedia mentions “three Himyaritic texts”, at the same time in the Encyclopedia of Arabic Language and Linguistics s.v. we read about two: “It is not even possible to establish whether they were written in the same language. The first text dates from around 100 C.E. and the second from around 300 C.E.” And about the secondary material from Early Medieval Arabs: “It is easy to see that quotations from Himyaritic offer very different readings according to the manuscripts.” Or according to others, mentioned in the EALL, Ḥimyarite is the same as Arabic, only with peculiar features (which might as well derive from Arabicized transmission, or later language fusion or whatever, much that could fool us). It could be grouped with those spurious languages if this category held languages from Antiquity.
  4. Gurage is, according to Wolf Leslau, its most eminent scholar, one language with twelve dialects; others share this view. The material for this language, particularly by Leslau across his works, only lists words as “Gurage”, without qualifying if they are “Inor”, “Mesqan” or some other Gurage, so on Wiktionary one cannot simply give “Gurage” words (which has recently been done in Semitic comparisons by abusing the code of the largest dialect Sebat Bet Gurage, in spite of the source saying “Gurage”). The following dialects I find on en.Wiktionary as languages: Kistane/Soddo (gru), Mesqan (mvz), Sebat Bet Gurage (sgw), Silt'e (stv), Inor (ior), Muher (sem-mhr), Mesmes (mys), Chaha (sem-cha), Wolane (wle), Zay (zwa); some of these are considered subdialects of Sebat Bet Gurage. There are more I don’t find on Wiktionary. It’s perhaps like with the Aramaic dialects of yore or the Low German dialects today. People publish Westphalian dictionaries but it’s still Low German and so treated by Wiktionary. I suspect that instead of holding controversial subdivisions deriving from Ethnologue we should, holding to the sources, keep the Wiktionary-language level higher. The source for a certain word can be further qualified by labels as with Coptic. I mean that with language, unlike with biological taxonomy, one cannot simply assume that distinctiveness of a taxon is ascertained by experiments and then authoritatively published in some reference. As the individual forms are described in this dictionary, one must weigh if the data allows distinction at all. Currently it looks to me that hence Gurage must be lumped; I don’t know if, with new data or emerging different literary standards, separating the lects with separate codes will later be convenient (the increase in language material will be disappointing and unlikely someone will come and add Gurage in thousands of entries anyway, let’s be realistic), but I doubt that it would be comfortable.
See also Why is Old Novgorodian a separate language in Wiktionary? This is the question: Is the difference in data enough to justify separation? The actual language-dialect distinction does not matter, it must be seen functionally, for dictionary purposes. And if linguists publish material as “Gurage” the distinction is probably not good for Wiktionary headers. Isn’t it out of scope of Wiktionary to distinguish lect clusters when they are generally unwritten and chiefly written by and variously lumped and split by linguists? That’s a difficult question. Also I fear that such distinctions might be precisely the cause why nobody comes and pours out his rich Gurage knowledge. An adept would not be sure to distinguish, pendulating between two extremes, not witting if he should split as much as he can by all kinds of criteria or if to standardize and to abstract. To help though first all mentioned codes need the Ge'ez and Latin script both assigned, and the macrolanguage created. Maybe there will be late order from early ambiguity. Though I would perhaps do the order by lumping and labelling by location, were I that certain aficionado.
The obese Wiktionary:List of languages currently comprising 8055 lects needs cuts however. Fay Freak (talk)
This discussion really belongs at rfm, because that's where we normally discuss changes to whether or how we recognize a language. The Grease pit is for discussing how to implement something along those lines- not whether it should be implemented. The other option would be at the Beer parlour, but this seems like something that would benefit from the more specialized focus of rfm. Chuck Entz (talk) 03:39, 14 November 2018 (UTC)Reply[reply]
Good distinction. I hesitated at 4:13 AM where to put it because of the mixed content. Moved. Fay Freak (talk) 14:16, 14 November 2018 (UTC)
Some prior discussion of Thamudic et al is on Category talk:Hismaic language; IIRC they were separated because literature does mention them as distinct entities, but if they were very similar or often treated as one language, and especially if there's difficulty in assigning specific texts to specific ones due to similarity, that would be an argument for reversing that decision and going back to the conservative approach of treating them all as one language with 'dialect'/'region' labels where appropriate.
(As to the venue, yes, these discussions tend to happen on RFM for quirky historical reasons — originally the discussions entailed actually merging or splitting language templates — although some have proposed the Beer Parlour as a more logical venue. There are minor benefits and drawbacks to either venue; this venue does have the advantage that discussions stay on the page until resolved.) - -sche (discuss) 17:20, 14 November 2018 (UTC)Reply[reply]
I avoided Beer Parlour because I thought it is only for matters already affecting people, but it would not affect anyone we know now. Fay Freak (talk) 14:43, 15 November 2018 (UTC)
Who is likely to have access to resources on Africa's Semitic languages that could help judge what to do with Gurage? User:Metaknowledge, User:Wikitiki89? Wikipedia insists "The Gurage languages do not constitute a coherent linguistic grouping", which seems incompatible with merging them. William A. Shack, in his book on The Gurage, writes that "each Gurage dialect is usually understood only by its own speakers, and there is a rough correlation between the contiguity of dialect groups and the extent to which their dialects are mutually intelligible." (Steven Danver, in his (general-focus) encyclopedia, says "the languages of the different groups of Ethiopian Gurage are seldom mutually intelligible.") Marvin Lionel Bender, in his 1976 Language in Ethiopia, says "Although seventeen varieties of Gurage dialects are listed, mutual intelligibility reduces this to four languages and three dialect clusters as follows (Hetzron classification):
  Gogot, Misqan, Muxir, Soddo
  East Gurage (Inneqor, Silti, Urbareg, Weleni, Zway)
  Central West Gurage (Chaha, Gumer, Gura Izha)
  Peripheral West Gurage (Ener, Geto, Indegegn, Innemor)"
However, his very next sentence is: "Gogot, Muxir, Soddo comprise a geographical (non-genetic) grouping of non-mutually-intelligible languages known as 'North Gurage'", all of which seems to suggest that merging all of the Gurages would not be sound.
- -sche (discuss) 17:28, 14 November 2018 (UTC)
The cited grouping of course adds to the confusion. Three languages, but four dialects clusters, not mentioning their intersections? Well, we will not find out how one should see them without deep-diving. But the question is which direction Wiktionary should go: likely the current division is not correct. Should Wiktionary just add all possible splits so they can be cleaned up later when someone would commit himself to add the whole Gurage and judge about which distinctions are most convenient or should we have one macro-code because distinction is hopeless? The reason why I have even mentioned Gurage is that for example Leslau’s Etymological Dictionary of Geʿez which I like to use just gives words as “Gurage”, which sounds like there is a common vocabulary. Fay Freak (talk) 14:43, 15 November 2018 (UTC)Reply[reply]
Perhaps you can deduce from Leslau's literature list which Gurage language he gets his data from? He seems to have written an etymological dictionary of Gurage as well, presumably its foreword could clear things up.
His own field studies. I had linked his Etymological Dictionary of Gurage (“according to Wolf Leslau” etc.). Fay Freak (talk) 15:23, 17 November 2018 (UTC)
As a volunteer project (run on fancy), we really have no other choice than to wait for someone to investigate the matter deeply and order the languages in a manner that facilitates their lexicographical work.
Maybe we need non-genetic language group categories and ways to give forms in unidentified languages belonging to language groups. Crom daba (talk) 15:49, 15 November 2018 (UTC)
  • @Fay Freak, -sche: A bit late, but here are my responses to the three outstanding problems (your #2–4):
    1. It is fairly evident that Ancient North Arabian is not a single language, and I advocate that sem-xna be abolished rather than the specific language codes; read Al-Jallad (2018), "What is Ancient North Arabian?". He sees Safaitic (which he has written a grammar of) and Hismaic as being of the same continuum as Old Arabic, but they are obviously too distinct from Classical Arabic for lexicographical purposes. He supports the distinctness of the others as languages, and of the various "Thamudic" lects. Based on Al-Jallad, I would prefer we split Thamudic B, C, D, etc as necessary; each language will have a very small corpus, but it seems like the most honest way to do it, and if more inscriptions are found, the lettered Thamudic wastebaskets will probably get their own names as the others did.
    2. Old South Arabian is also not a single language, though Sabaean was the standard that the other lects imitated, and I advocated that sem-srb be abolished as well. Multhoff (2019) in The Semitic Languages makes the case for four distinct languages: Sabaean, Minaean, Qatabanian, and Hadrami. She makes no mention, however, of Harami. Macdonald (2000), "Reflections on the linguistic map of pre-Islamic Arabia" explains that "Harami" is a name given to a few Sabaean texts that seem to have been contaminated by other Semitic languages, which is not at all an unusual feature and not unique to that site, so I suggest we remove that code.
      1. As for Himyaritic, I now think I was wrong to include it. There are three texts often attributed to it, but see Stein (2008), "The ‘Himyaritic’ Language in Pre-Islamic Yemen", which makes a strong argument to consider these as simply very late examples of Sabaean, which is indisputably the language of the other texts of the region in that script.
    3. Finally, for Gurage, the chief problem is that some scholars follow Hetzron in saying that Gurage is polyphyletic, in which case lumping would be committing a grave error (and the same charge has been levelled for Aramaic, with perhaps more evidence). Meyer (2011) in the International Handbook does seem to support the unity of Gurage, and treats the lects together, which gives me hope for lumping, but he is unwilling to commit to whether they should be considered dialects or languages. I think your Gurage-adding genius is mythical, so we have to choose which is least bad: many languages with scanty coverage, because their forms may be similar to forms entered under a different L2 header; or one Gurage language with decent coverage, but many forms that are not marked for what dialect they belong to and therefore a poor resource. I hesitantly support merger, given those choices. —Μετάknowledgediscuss/deeds 03:13, 10 August 2020 (UTC)Reply[reply]
    4. An addendum: "Hadrami" is a terrible name for xhd, and invites confusion with Hadrami Arabic. Wikipedia uses "Hadramautic", but N-grams and a quick literature review suggests that "Hadramitic" is more common. @Fay Freak, -sche again (yes, I know I'm pestering, but I don't want to move forward on all this alone, both because I am fallible and because some of these, particularly splitting OSA, would require a bit of work, although in that case there is an online corpus that will help immensely). —Μετάknowledgediscuss/deeds 02:40, 17 August 2020 (UTC)Reply[reply]
Re North Arabian: Many works I browsed through speak of Old North Arabian as a unit with dialects, but also carefully specify what lects (including Thamudic B vs C, etc) words are attested in. Some imply, in their presentations, that a large number of words are identical between dialects, at least in the sample of vocabulary that they're treating (e.g., the pronouns treated in Roger D. Woodard, The Ancient Languages of Syria-Palestine and Arabia (2008), pages 197-198), though this seems to be because the authors are presenting 'normal', normalized and romanized forms, given Al-Jallad's evidence that words (even the supposedly distinctive definite article) varied not just among dialects but even within the writings of individual speakers. The native script also loses many possible differences in pronunciation, but then, we are a written, writing-based dictionary. I find slightly more works speaking of "Ancient North Arabian dialects" than "Ancient North Arabian languages", and the fact that some authors have argued the varieties are the same language not only as each other but even as Arabic itself does suggest a high degree of similarity (or that the scholars in question are lumpers). As we're dealing with small, extinct and apparently clearly delineated corpora, it seems like the conservative approach of treating each under its own L2 could be better, and we could retire xna ... unless we need it as a wastebasket for unsorted things, which Al-Jallad (and Fay Freak, above) suggests we would. (Bah, It's messy business, deciding what's a language and what's a dialect...) I will try to dig into the rest later. - -sche (discuss) 04:10, 19 August 2020 (UTC)Reply[reply]

Well myself I have added Sabaean, Minaean, Qatabanian entries meanwhile, understanding and quoting a few inscriptions, although apart from some occasional features I noticed little how such an inscription can be classified as either, other than by provenience or rulers or gods mentioned—but that must be due to my blasé comparative approach that also makes me read Romance without recognizing the individual language. So somehow the volition to a merge is gone, though the lumping codes “Old South Arabian” and “Old North Arabian” must be kept for inscriptions no one has classified. Both are useful.

For Himyaritic, however, nothing is left. As here said already, the three alleged Himyaritic inscriptions don’t even need to be in the same language, and they aren’t even from anything to be called Ḥimyar (there are “Lesser Himyarites” and “Greater Himyarites” and the ethnic identity is fragmentary, too, by the way). In the “Critical Reevaluation” of the Ḥimyaritic language – cited by Wikipedia on Himyaritic language one does not know what for: their “undeciphered-k language” header recently introduced is surely a made-up term, oddly suggesting that these inscriptions are yet another language when those “k-language” inscriptions are exactly those otherwise claimed for Himyaritic, so we see Wikipedia editors had no clue and phantasize together languages due to their disdain for primary sources – helpfully includes a map, also coming to the conclusion “we have no reason to assume the existence of some “non-Ṣayhadic” language in pre-Islamic Yemen that was spoken besides the (Late) Sabaic idiom known from the inscriptions.” That from the fact that “Himyaritic” words typically given from Arabic sources are all also found in Sabaic, and the grammar found in the three inscriptions, including the prefixed instead of postfixed article which is only found in two of them, is too either found in Sabaean or can well be ascribed to their being poetry, which is also the reason for their being poorly understood. Many Arabic poems are also hard to understand and mostly helped by the copious material for the language which is not the case for languages with so limited a corpus, like Old South Arabian. Even in the Digests, Latin prose, not all passages are of discoverable meaning.

What would hinder man though to add understood words with quotes from the ominous inscriptions as Sabaean? Or anything from Arabic sources transmitted as Himyaritic instead of Arabic as Sabaean? For there is no evidence for it being a particular language. You see, from the corpus-based standpoint Wiktionary takes Himyaritic must go. Nothing can get the header “Himyaritic”, it can only be mentioned at Sabaean or Old South Arabian entries that Himyaritic nature is suggested by those who have come to believe in this extraordinary claim for which extraordinary evidence is not provided. Fay Freak (talk) 04:18, 6 August 2021 (UTC)Reply[reply]

I went on and moved our only “Ḥimyaritic” entry after that famous sentence to Yemeni Arabic in which the word طَيِّب(ṭayyib) for “gold” turns out otherwise known, and to be nothing else than Classical Arabic طَيِّب(ṭayyib, good) meaning “refined” and therefore gold, while Old South Arabian could not have developed such sense, so it is clear the famous quote one has been so inept to classify is at best only macaronic Sabaean-Yemenite Arabic. It is well put by Marijn van Putten:
The Arab grammarians were interested in describing correct usage of language of Classical Arabic. It is quite clear that Himyaritic (and by extension Yemeni Arabic) did not fall in the category of 'correct usage'. Within this context, it is of course not surprising that anything that is "wrong" and from Yemen might be denoted as Himyaritic. This would then include both varieties of Yemeni Arabic and some surviving vestiges of Ancient South Arabian. Fay Freak (talk) 04:59, 13 September 2021 (UTC)Reply[reply]
Now also in a new article by Koutchoukali like communis opinio, though his blogs transpire by him stalking Wiktionary: later Muslim historians would refer to anything related to South Arabia’s pre-Islamic history as “Himyaritic,” all memory of its other states having passed away. Fay Freak (talk) 01:01, 18 September 2021 (UTC)Reply[reply]

Merging Classical Mongolian into Mongolian

"Classical Mongolian" refers to the literary language of Mongolia used from 17th to 19th century created through a language reform associated with increased Buddhist cultural production (this started in the 16th century, but language standardization took place later). In the 20th century, (outer) Mongolia became independent from China and later adopted a Cyrillic orthography based on the spoken language, while Inner Mongolia kept her Uyghur script.

The literary language of Inner Mongolia continues Classical Mongolian in terms of its orthography as well as most of its grammar (to an extent that Janhunen (?) calls the situation bilingual). Modern varieties, in both Outer and Inner Mongolia, have greatly expanded their lexicons through borrowing of modern terms, but they also both consider all of Classical Mongolian lexicon to be a part of their language, and will put it in their dictionaries, even transcribed into Cyrillic.

The actual problem I have with this division is that when it comes to borrowings from (Classical) Mongolian, we sometimes cannot ascertain whether they precede the 20th century or not, or more common still, we know they precede the 19th century (and post-date the 16th), but they obviously come from a spoken variety and not "Classical Mongolian" as a literary language. Crom daba (talk) 17:14, 15 November 2018 (UTC)Reply[reply]

Yes. I find it also strange that Wiktionary distinguishes Ottoman Turkish from Turkish, it’s like distinguishing pre-1918 Russian from “Russian”, or like one reads about “Ottoman Turks” instead of “Turks”. Also Kazakh and the other Turkic language do not get extra codes for Arabic spelling, this situation is even more comparable, innit. Kazakhs in China write in Arabic script, Mongols in China in Mongolian script, but the languages are two and not four. Or also it sounds as with Pali. Am I correct to assume that Classical Mongolian texts get reedited in Cyrillic script? Then you could base all on Cyrillic and make Mongolian script soft redirects, because even words died out before the introduction of Cyrillic can be found in Cyrillic. Fay Freak (talk) 15:23, 17 November 2018 (UTC)Reply[reply]
@Fay Freak, the situation is similar to Turkish, but it creates fewer problems there since the Arabic script Turkish is obsolete and most relevant loans are pre-Republican.
In principle it could be possible to collapse all of Mongolian into Cyrillic, but this would be extremely politically incorrect.
Collapsing everything (potentially even Buryat, Daur and Middle Mongolian) into Uyghur script, like we do with Chinese, would perhaps make more sense, but 1) it's a pain to enter 2) Cyrillic is generally more accessible and useful to our users and (Outer) Mongolians 3) most of my materials are in Cyrillic 4) it corresponds poorly to the spoken forms 5) its Unicode encoding corresponds poorly to its actual form 6) the encoding doesn't correspond that well to the spoken form either. Crom daba (talk) 16:50, 18 November 2018 (UTC)Reply[reply]
This is tricky, because as far as language headers and having entries for terms in the language, it seems like we could often resolve which language a word is in(?) by knowing the date of the texts it's attested in. It is, as you say, etymologies where it's hardest to ascertain dates. (Still, if we merged the lects, we could retain an "etymology only" code for borrowings that were clearly from Classical Mongolian, like is done for Classical Persian, etc.) I'm having a hard time finding any references on the mutual intelligibility of the two stages; most references are concerned with the intelligibility or non-intelligibility of modern Khalkha, Kalmyk, etc. If we kept the stages separate, etymologies could always say something like "from Mongolian foo, or a Classical Mongolian forerunner". - -sche (discuss) 22:50, 18 November 2018 (UTC)Reply[reply]
@-sche, yes, the Persian model would be desirable.
It doesn't make much sense to speak of intelligibility between Classical and Modern Mongolian, Classical Mongolian is exclusively a written language, its spelling reflects the phonology of 13th-century Mongolian (early Middle Mongolian). The same spelling is used in Modern Mongolian as written in Uyghur script.
The biggest problem with Classical Mongolian is how redundant it is. For any word that is shared between modern and classical periods, and that is probably most of the lexicon, we would need to make two identical entries in Uyghur script for modern and classical Mongolian. Crom daba (talk) 11:18, 19 November 2018 (UTC)Reply[reply]
That seems not unlike how we handle Serbo-Croatian and Hindi-Urdu. — [ זכריה קהת ] Zack. — 14:25, 30 November 2018 (UTC)Reply[reply]
Indeed. The way we handle them sucks. Crom daba (talk) 12:52, 1 December 2018 (UTC)
I agree. All this duplication is a huge waste of resources. Per utramque cavernam 13:22, 1 December 2018 (UTC)
Not exactly; Serbo-Croatian and Hindi-Urdu have redundant entries in different scripts on different pages, while I understand Crom daba's point to be that we would need to have redundant ==Mongolian== and ==Classical Mongolian== entries on the same pages for most Mongolian/Uyghur script words, which would be more like having duplicate Bosnian and Croatian entries on the same pages, not our current system. And Serbo-Croats are testier about their language(s) being lumped than speakers of Classical Mongolian... ;) - -sche (discuss) 17:29, 3 December 2018 (UTC)Reply[reply]
OK, does anyone object to the merge? If not, I can try to do it with AutoWikiBrowser later, or Crom or others could start reheadering our small number of Classical Mongolian entries, fixing any wayward translations, etc. For etymologies of terms that are known to derive from Classical Mongolian, we should be able to just move cmg over to Module:etymology languages/data. - -sche (discuss) 17:29, 3 December 2018 (UTC)Reply[reply]
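A minimal sketch of what such a move amounts to, written in Python rather than the Lua of the actual data modules. The two dictionaries and the `name` and `parent` fields are hypothetical stand-ins for illustration only, not the real layout of Module:languages or Module:etymology languages/data:

```python
def demote_to_etymology_only(code, parent, languages, etymology_languages):
    """Move `code` out of the full-language table and register it as an
    etymology-only variety of `parent`.

    Both tables are plain dicts keyed by language code (hypothetical model).
    """
    if code == parent:
        raise ValueError("a code cannot be its own parent")
    if parent not in languages:
        raise KeyError(f"parent language {parent!r} is not defined")
    # pop() raises KeyError if the code is not a full language to begin with
    entry = languages.pop(code)
    etymology_languages[code] = {"name": entry["name"], "parent": parent}
    return etymology_languages[code]


# Hypothetical starting state: both codes are full languages.
example_languages = {
    "mn": {"name": "Mongolian"},
    "cmg": {"name": "Classical Mongolian"},
}
example_etymology_languages = {}
```

Calling `demote_to_etymology_only("cmg", "mn", example_languages, example_etymology_languages)` would leave only `mn` as a full language, with `cmg` still usable as an etymology source under it. Entries still headed ==Classical Mongolian== would need reheadering first, since the header would no longer resolve to a full language.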
@Crom daba, Fay Freak I made the few ==Classical Mongolian== entries we had into ==Mongolian== entries (labelled "Classical Mongolian" unless there was already a modern Mongolian section on the same page), but many of the categories still need to be deleted, and one needs to check whether anything else is left that would break before "cmg" is moved from being a language code to being an etymology-only code. - -sche (discuss) 02:46, 27 September 2020 (UTC)
There's no full correspondence between different Mongolian scripts and none of the scripts is totally phonetic. It's not just the spelling, the phonologies are different but sometimes one script represents the true or historical pronunciation and it's not necessarily Cyrillic, which is strange. There are words that only exist on one or the other, which is quite understandable, cf. modern ᠱᠠᠹᠠ (šafa, sofa) in Inner Mongolia (from 沙發 / 沙发 (shāfā)) and софа (sofa, sofa) in outer Mongolia (from софа́ (sofá)). I support the merge, though I am curious if classical Mongolian terms are equally representable in Cyrillic and Arabic scripts. In other words, are there terms in classical Mongolian, which are different from modern and there's no Cyrillic form for them? I think I saw them.
Duplication of entries is a waste. You may think I am biased but I think Mongolian should be presented/lemmatised in Cyrillic (Uyghurjin should also be available in all entries where it can be found) - for which resources are much more accessible. (Serbo-Croatian should be lemmatised on the Roman alphabet, on the other hand, let's finish the senseless duplications of entries)
Also supporting the Ottoman Turkish/Turkish merge. --Anatoli T. (обсудить/вклад) 03:25, 27 September 2020 (UTC)
@Atitarev In Mongol khelnii ikh tailbar toli we see the term уйгуржин бичиг is described as ‘монгол бичгийн дундад эртний үеийн хэлбэр’ (‘early form of the Mongolian/Khudam script’). Middle Mongolian in uigurjin with its own rules shall not be equated with the later ‘Classical’-Modern script and orthography. I maintain uigurjin (with its specific glyph forms and spelling rules) shall be treated as a term only for Middle Mongolian.
Similarly I also object to treating Northern Yuan – Qing (‘Classical’) Mongolian and Modern Mongolian-script Mongolian as one literary language standard. In fact orthographic standardisations and modifications make written Modern Mongolian so different from Classical. Personally I’d like to display a historical feature of this language collectively under ‘Classical Mongolian’, as only this term directly interlinks with an Inner Asian historical and linguistic tradition. LibCae (talk) 16:40, 7 May 2021 (UTC)

2018 — December

Renaming agu

We currently call this "Aguacateca", but "Aguacateco" is much more common. (Wikipedia opts for "Awakatek", which is rapidly becoming more common but is probably not there yet — not that we can't be crystal-ballsy if we want to when it comes to names rather than entries.) —Μετάknowledgediscuss/deeds 05:42, 19 December 2018 (UTC)Reply[reply]

You're right that several modern (and a few older) sources seem to use Awakatek. In turn, historically Aguacatec has been used in the titles of many reference works on it, and seems like it may be the most common name (ngrams), although it's also the name of the people-group. (Others: Awakateko, Awaketec, Qa'yol, Kayol, and various spellings of Chalchitec sometimes considered a distinct lect.) - -sche (discuss) 04:31, 19 August 2020 (UTC)

2019 — January

"comparative adjectives" > "adjective comparative forms"

Apparently there was a recent vote to remove the ambiguity of comparative and superlative categories. What I don't understand is why the name "comparative adjectives" was chosen, which suggests a lemma category, yet it's now being subcategorised under non-lemmas. Lemma subcategories are named "xxx POSs", as can be seen in Module:category tree/poscatboiler/data/lemmas. Non-lemma subcategories are named "POS xxx forms", visible in Module:category tree/poscatboiler/data/non-lemma forms. Therefore, the obvious place for comparative forms of adjectives is the "adjective comparative forms" category we used to have. The new name, although voted on, stands out as an exception among all of our existing categories and is inconsistent. It should therefore either be renamed back to reflect its non-lemma status, or it should be moved back under its original lemma parent category. —Rua (mew) 23:57, 10 January 2019 (UTC)

@Surjection, Erutuon —Rua (mew) 00:09, 11 January 2019 (UTC)

The vote was here: Wiktionary:Votes/2018-07/Restructure comparative and superlative categories. — Eru·tuon 00:13, 11 January 2019 (UTC)
Participles are not lemmas yet they are called "(language) participles", so it's not as if the comparatives/superlatives would exactly be exceptions of some kind. They even have their own "participle forms" categories! The former also applies to gerunds. — surjection?⟩ 09:13, 11 January 2019 (UTC)
And to make it clear, "adjective/adverb comparative/superlative forms" categories are to be made obsolete as a direct result of the vote. — surjection?⟩ 09:16, 11 January 2019 (UTC)
Yes, and that should be undone, because as I said, the name "comparative adjectives" suggests that they are lemmas because of our existing naming scheme. Participles are non-lemmas by virtue of being participles, but adjectives are lemmas, so "comparative adjectives" are also lemmas. Are you implicitly proposing to rename all non-lemma categories to this new scheme, e.g. "dual adjectives", "plural nouns", "possessive nouns", "feminine adjectives"? If the vote is upheld then I will propose this change to make things consistent again. —Rua (mew) 12:00, 11 January 2019 (UTC)
I certainly would not assume "comparative adjectives" refers to lemmas, any more than "participles" does. If we go back to "adjective comparative forms", what do you suggest for the name of the category with inflected forms of such? And don't just say "put them in 'Adjective forms'", because that at the very least isn't consistent as I stated below. In the old system, there was no consistency at all - inflected forms of comparatives and superlatives went to either the same category as them or Adjective forms without any sort of rule. — surjection?⟩ 12:17, 11 January 2019 (UTC)
I would not even categorise inflected forms of comparatives in a special way. They are just adjective forms. I don't even think comparatives should be categorised separately at all; there is no obvious need to do so. The example of possessive forms is perhaps the best parallel, since they have inflection tables of their own in Northern Sami and many other languages. Do you propose renaming them to "possessive nouns" so that there can be a separate "possessive noun forms" category? —Rua (mew) 12:28, 11 January 2019 (UTC)
If you feel comparatives don't need a special category either, I'm personally fine with bunching all of them under "adjective forms", but that too will need wider consensus to implement. When it comes to those possessive nouns, I would argue comparatives and superlatives are closer to participles than to those possessive forms, which is why I believe they're not a good parallel and should be considered separately. — surjection?⟩ 12:40, 11 January 2019 (UTC)
Why? —Rua (mew) 12:46, 11 January 2019 (UTC)
Many participle forms develop into adjectives in their own right, and some comparatives/superlatives too have developed into their own forms. Possessive forms by comparison basically never have, showing that they are fundamentally different in some way. — surjection?⟩ 12:49, 11 January 2019 (UTC)
In fact, unlike this new system which has parallels, I'm fairly sure the old system of having "adjective comparative forms" but then the forms of comparatives under "adjective forms" is more of an exception. — surjection?⟩ 09:32, 11 January 2019 (UTC)
Not really. We don't have separate non-lemma categories for everything in Module:category tree/poscatboiler/data/lemmas and in fact we don't need to. Under the old system, all comparative forms could be categorised under "adjective comparative forms", so that includes all case forms of comparatives. There was never any need to separately categorise forms of comparatives. In fact I'm generally opposed to subcategorising non-lemmas, so that's why I moved everything in Dutch to just "adjective forms". We don't need a subcategory for every possible type of non-lemma form. However, if we do have them, then they should be named consistently. —Rua (mew) 12:00, 11 January 2019 (UTC)
We don't have separate non-lemma categories for the reason that many of them are simply not inflectable in and of themselves. Again, participles have separate categories for the main participle and inflected forms of such - why should this not apply to comparative and superlative adjectives? — surjection?⟩ 12:17, 11 January 2019 (UTC)
What I get out of your argument is that you think "POS xxx forms" should become "xxx POSs" when the form has its own inflections. But then what about cases like English, where comparatives don't have their own forms and are simply adjective forms? Or cases like Dutch or Swedish, where there are multiple superlative forms but their inflections are shown on the lemma? How is an editor supposed to know what the name of the category for any particular adjective form is, when some of them are named differently from others? —Rua (mew) 12:28, 11 January 2019 (UTC)
That is indeed my argument for comparatives and superlatives due to their so far horridly inconsistent handling. In the case of English and all other languages, they will only have "comparative adjectives", no "comparative adjective forms", much like English would have "participles" that too aren't lemmas but would not have "participle forms". In cases like Dutch, Swedish and such where comparative/superlative forms are more numerous, those need to be handled on a language-by-language basis, ideally by choosing one of the forms as the most lemma-esque (such as the form dictionaries primarily use to describe the comparative/superlative of an adjective); if no single form can be decided on, it is more of a tricky situation (possibly all into "comparative/superlative adjective forms"?). Editors in turn can rely on other existing entries and eventually remember these names, much as the existing ones are remembered, or use language-specific headword templates. Yes, the new system is by no means perfect, but I would argue it is miles better than what we had before. — surjection?⟩ 12:38, 11 January 2019 (UTC)
But again, how is an editor of these languages supposed to know that, while adjective forms normally go in "adjective xxx forms", it is somehow different for comparative and superlative forms? You still haven't answered this. Your argument is based on sublemma-ness, but this differs per language; not all languages treat comparatives and superlatives as sublemmas. The categorisation should allow for both treatments depending on the needs of the individual language, not force a particular treatment on all languages. The fact that you think it makes sense for Finnish doesn't mean it makes sense for English. Now we have Category:English comparative adjectives for an adjective form, but Category:English noun plural forms for a noun form. How is that consistent? —Rua (mew) 12:45, 11 January 2019 (UTC)
I did already answer that question - read the latter part of my previous response. Many a time has an editor checked an existing entry to see how something is formatted, and I doubt there is a single editor who has never done that. Many of the languages with comparatives and superlatives set up have language-specific headword templates, and many of those too have ACCEL which can too give the correct headword category autom- oh wait, it can't anymore since someone removed that capability. — surjection?⟩ 12:49, 11 January 2019 (UTC)
You have not answered the question. An editor cannot, based on the rule that non-lemma categories are named "adjective xxx forms", guess the correct name of the category for comparative forms, whereas they could before. Instead, there is now a single exception that comparatives are named "comparative adjectives". Where are all the other "xxx POSs" categories for non-lemmas? Again, are you proposing that all non-lemmas be renamed to match this new scheme? If not, what justifies this single exception? —Rua (mew) 12:54, 11 January 2019 (UTC)

Which question exactly have I not answered? The question was "how would an editor of these languages know the correct name for the categories?", which I have now answered no fewer than two times in my two previous responses. Instead, what it seems you are arguing is that the new scheme creates inconsistency in terms of the category names for non-lemma forms. Indeed, if other derivations are shown to be just like participles or comparatives/superlatives, I'm happy to agree to move them under a similar scheme as well, but the possessive forms you brought up above are not an example of such. — surjection?⟩ 12:58, 11 January 2019 (UTC)

Since it seems that this is the new norm for naming categories, I have proposed to rename all existing categories to match the new naming scheme at WT:BP. —Rua (mew) 13:16, 11 January 2019 (UTC)

@Rua Given that the edits you have made to the templates and modules are still in place, are you willing to revert those yourself or are you asserting that you are overriding the consensus established by the vote? — surjection?⟩ 21:10, 11 January 2019 (UTC)

See also Category talk:Terms making reference to character shapes by language.

Perhaps they could be merged, or perhaps both could be kept (Japanese: characters; letters?), but the naming should be consistent, at the least. —Suzukaze-c 11:08, 20 January 2019 (UTC)

Merge, perhaps into Category:Terms derived from character shapes by language (a bit shorter, and inclusive of non-letter characters). - excarnateSojourner (talk | contrib) 04:50, 28 April 2022 (UTC)

2019 — February