Wiktionary:Beer parlour

Welcome to the Beer Parlour! This is the place where many a historic decision has been made, and where important discussions are being held daily. If you have a question about fundamental aspects of Wiktionary—that is, about policies, proposals and other community-wide features—please place it at the bottom of the list below (click on Start a new discussion), and it will be considered. Please keep in mind the rules of discussion: remain civil, don’t make personal attacks, don’t change other people’s posts, and sign your comments with four tildes (~~~~), which produces your name with timestamp. Also keep in mind the purpose of this page and consider before posting here whether one of our other discussion rooms may be a more appropriate venue for your questions or concerns.

Sometimes discussions started here are moved to other pages for further development. In particular, changes to a major policy or guideline may be discussed on the corresponding talk page and “simple votes” (as opposed to drawn-out discussions) can be conducted on our votes page.

Questions and answers typically remain visible on this page for one to two months, but they can always be found in the appropriate monthly archive (based on the date discussion was initiated). While we make a point to preserve all discussions that were started here, talk that is clearly not appropriate for this page may be deleted. Enjoy the Beer parlour!

November 2023

Vector2022 letter to el.wiktionary - Discussion

A letter was sent to us at el.wiktionary saying that by November 11th a new skin, Vector2022 (like this, or this example), will be applied as the default desktop view.
Discussion in English is ongoing here for everyone to join. Regardless of aesthetics, I am worried about the loss of __TOC__ (placed on all our Appendices and similar pages) and very sad to have interwiki links, which we click constantly, hidden in a dropdown. We are now trying to substitute these, manually or with some Templates. If this skin is intended for wiktionaries too, not only wikipedias, could we ask for wiktionary‑specific modifications? We would be interested in your opinion and support. Thank you. ‑‑Sarri.greek  I 23:53, 1 November 2023 (UTC)

Parsing policy

Why is Wiktionary:Parsing categorized as a Wiktionary policy?  --Lambiam 08:39, 2 November 2023 (UTC)

You'll have to ask @Koavf, who added the category. PUC – 08:49, 2 November 2023 (UTC)
Because I couldn't think of anything better. —Justin (koavf)TCM 08:59, 2 November 2023 (UTC)
  Resolved: now categorized as just Wiktionary.  --Lambiam 17:58, 2 November 2023 (UTC)

Splitting Ancient Greek

@Mahagaja, Sarri.greek, Saltmarsh (please ping any other interested users)

Currently, Ancient Greek is handled as one (macro)language. This means that while Attic and Homeric Greek have very good coverage, other lects like Aeolic and Doric are mostly an afterthought. For instance, the inflection tables note that the "dialectal" inflections are discussed in the appendix.

AFAIK, until Koine Greek there was no standardised Hellenic variety at all. Homeric Greek had a lot of influence on the various lects, but everyone mostly wrote in their own vernacular. As such it makes sense to me to split the Ancient Greek lects into major dialect groups, also considering the fact that the various lects differ quite strongly. I imagine two scenarios:

  • A very rough division (Arcado-Cypriot, Ionic (incl. Attic), Aeolic, West Greek (incl. Doric)).
  • A more detailed division (Arcadian, Cypriot, Attic, Western Ionic, Eastern Ionic, Thessalian, Boeotian, Lesbian, Doric, Northwestern Greek).

This would help on the following fronts: Most importantly, it would increase the possibilities in covering the various dialects from inscriptions, as well as (Lesbian) Aeolic in Sappho's work or (Eastern) Ionic in Herodotus. It will also make etymological coverage for Tsakonian historically accurate. As a bonus it would also finally give Proto-Hellenic more credibility, and make it much easier to provide descendants in the form of various languages, rather than various dialects of one single language.

I'm eager to hear your thoughts on this. Thadh (talk) 11:57, 2 November 2023 (UTC)

I don't think splitting grc into multiple languages is necessary to achieve any of those goals. All of your desiderata are achievable with the status quo of having the dialects be etymology-only varieties of Ancient Greek. Splitting grc up would simply unlink the less well covered dialects from the very useful infrastructure (templates and modules) we have in place. If there are ways in which the existing templates and modules are inadequate for the less popular dialects, I think it makes more sense to improve the templates and modules to accommodate them. —Mahāgaja · talk 12:31, 2 November 2023 (UTC)
The only Ancient Greek dialect which was certainly NOT mutually intelligible with the others is Arcado-Cypriot, which, like Scots compared to English, is very conservative in nature. I think it is the only one which needs to be split.
As for the other four, Attic, Doric, Aeolic and Ionic, they were probably no more different than modern English dialects, with the exception that English orthography is practically the same everywhere. Ελίας (talk) 14:20, 2 November 2023 (UTC)
Splitting the language up would just make things more complicated: more language codes to keep track of and more knowledge of Ancient Greek dialectology required to do simple things like adding a quote. Chuck Entz (talk) 15:08, 2 November 2023 (UTC)
Splitting would be a nightmare for Greek borrowings in other languages. We would not know which dialect code to use. Vahag (talk) 19:17, 2 November 2023 (UTC)
I agree with User:Mahagaja. We already have the infrastructure in place for handling several dialects of Ancient Greek; the focus on Attic and Homeric Greek is simply due to the fact that there are a whole lot more sources for these dialects than for the others. Look at the current situation with Scots, which is almost completely neglected; that's what would happen if we split off various of the Ancient Greek dialects. IMO if there's any split that makes sense, it's splitting the later stages of Greek (e.g. Medieval Greek) into a different L2 language, and I know there has already been a discussion about this initiated by User:Sarri.greek, although it didn't end up anywhere. (Note, I'm not expressing a specific opinion on whether this split is the best thing as I don't know enough about Medieval Greek.) Benwing2 (talk) 20:46, 2 November 2023 (UTC)
@Benwing2. It did not (Medieval Greek, March 2023), and I intend to renew the petition once a year, so that I can resume my work (now mainly on Koine and Med. Greek) at en.wiktionary. I hope that en.wiktionary handles 'languages', period phases as well as dialects (which it calls 'languages' too), according to bibliography, not because of the personal interests of editors. The love of wiktionarians for Homer and dialectal Greek is commended, but may I remind you that an Athenian of the 5th century would comfortably listen to Doric at theatre plays -amidst the Peloponnesian War- (a label marking dialects, I think, suffices). Speaking of phases, the label Koine (grc-koi) also needs some care, because it covers many centuries. Although en.wikt/arians dislike Koine and Medieval Greek, they did exist, and all bibliography accepts it, the variety of opinions regarding only termini. Thank you Sir, for bringing this issue up, it really was a blow to me, the neglect with which it has been suppressed. ‑‑Sarri.greek  I 00:16, 3 November 2023 (UTC)
@Sarri.greek It is unfortunately common for Wiktionary discussions to peter out with no action taken. Feel free to create another Beer Parlour discussion, and make it a simple request to split Medieval Greek (with an appropriately defined time period) from Ancient Greek. The last discussion was long and I am not sure exactly what the objections were. You might want to state the prior objections and give rebuttals, but the fewer words used, the better, otherwise people are likely to not read it. Benwing2 (talk) 00:24, 3 November 2023 (UTC)
@Benwing2, sorry: Why is a discussion needed for the obvious? Does en.wiktionary need discussions to handle well referenced linguistic issues, be that kinds of borrowings, languages, dialects, etc.? My first paragraph at Medieval Greek, March 2023 is quite short, very clear, mentions the reference-support, and I was amazed that the blah blah had to drag that far. The sysops of a wiktionary or of a wikipedia need to just take a brief look at the bibliography to get the picture; one does not need to be a specialist on the language. If the sysops of enwikt abstain from taking a look on the grounds that they are not specialists, it will never, ever be implemented. If there were non-anonymous, professional consultants for wikiprojects, discussions would deal only with tech matters and the details of implementing things. ‑‑Sarri.greek  I 00:38, 3 November 2023 (UTC)

We should choose a Word of the Year

Choosing a Word of the Year seems to be a popular dictionary tradition. The problem is that most of the picks are godawful, either being some neologism that no one has heard of (goblin mode, lol) or having no apparent relationship to what actually happened in the last year. I think the dictionary world needs some people who can take this job seriously.

My proposal is generative, reflecting the sophisticated generative AI models released throughout late 2022 and early 2023, such as: ChatGPT (November 2022), GPT-4 (March 2023), and DALL-E 3 (August 2023). Google Trends data shows a significant increase in searches for "generative" and other AI-related terms throughout 2023 [1].

Do you guys think this would be a good idea? Pinging @Sgconlaw, Lingo Bingo Dingo. Ioaxxere (talk) 20:13, 3 November 2023 (UTC)

Agreed. It can be based on some actual data like increased percentage of views. Maybe a top 10 or unranked list of five? —Justin (koavf)TCM 20:31, 3 November 2023 (UTC)
Interesting. I have a few questions:
  • Who chooses the word? Is there to be a panel, or is the word to be voted on? Or is it to be based on actual data, as @Koavf suggests?
  • Presumably the word has to have gained currency in the preceding year? Or, to put it another way, should Word of the Year 2023 be featured in 2024?
  • On what date does the word get featured?
Sgconlaw (talk) 20:34, 3 November 2023 (UTC)
@Sgconlaw, Koavf I think the word should be chosen by a WT:VOTE in which anyone can nominate candidates. The winner can be featured on the main page around late December to early January. Ioaxxere (talk) 19:11, 4 November 2023 (UTC)
Good thinking. —Justin (koavf)TCM 19:12, 4 November 2023 (UTC)
@Ioaxxere: I suppose the Word of the Year will be featured somewhere on the Home Page? We'll need to think about the layout of the page and where the WOTY will appear—above the WOTD box, or elsewhere? Will it stay up for a whole year? — Sgconlaw (talk) 20:09, 4 November 2023 (UTC)
No, I meant that it would only be featured around late December to early January. Ioaxxere (talk) 20:31, 4 November 2023 (UTC)
It seems a fun idea and I encourage you to pursue it, but I am not going to take part in setting it up or choosing words. I do have an alternative proposal for a WotY: jailbreak, which is in my opinion a lexically more interesting word than generative. ←₰-→ Lingo Bingo Dingo (talk) 20:36, 4 November 2023 (UTC)
We could ask ChatGPT for suggestions, weren't it for the 2021 cutoff date :) Jberkel 20:56, 4 November 2023 (UTC)
Ok, I've created Wiktionary:Votes/2023-11/Word of the Year. Ioaxxere (talk) 22:50, 5 November 2023 (UTC)
@Ioaxxere: it probably doesn’t need to be that formal a vote. An ordinary vote here at the Beer Parlour is sufficient. — Sgconlaw (talk) 23:06, 5 November 2023 (UTC)
I agree that it doesn't have to be that formal, but I think it's fine to keep it as a formal vote. It will give it more prominence, since it will show up on everyone's Watchlists. Andrew Sheedy (talk) 23:13, 5 November 2023 (UTC)

incipient edit war on Module:ar-headword

Module:ar-headword has long had the ability to mark a personal/non-personal distinction on nouns, since it affects the agreement and pluralization patterns (non-personal nouns take feminine singular agreement in the plural and often use different plural forms). User:Fenakhay removed this functionality without explanation, and when I asked them why, they gave no justification other than "doesn't make sense". I undid this as a contentious change made without consensus, and Fenakhay reverted my undo claiming that the onus is on me to find consensus to undo his change. AFAIK this isn't at all how Wiktionary consensus works; the onus is on the person making the change to seek consensus if the change is controversial. In my view, this information is useful and important, and similar to the animacy marking in Slavic languages (compare also Romanian, which has a class of gender-changing nouns that are marked as "neuter" on the lemma). Fenakhay thinks this info is not useful mainly based on the fact that it's typically not marked in Arabic dictionaries (but from what I've seen, Arabic dictionaries are deficient in many respects compared with the best dictionaries of other major inflected languages, and leave out lots of info useful for non-native speakers). Benwing2 (talk) 01:26, 4 November 2023 (UTC)

No Arabic dictionary, be it monolingual or bilingual, marks “animacy” on words. The addition is unjustifiable. Non-natives making stuff up and reinventing how Arabic gender is listed because they read it in a grammar book... Typical. — Fenakhay (حيطي · مساهماتي) 01:31, 4 November 2023 (UTC)
Furthermore, it is not about “animacy” but about being sentient or not. So for anything that's not sentient, including animals, the adjective is inflected in the feminine singular. For example: تِلْكَ ٱلْكِلَابُ ٱلْحَمْرَاءُ تَنْبَحُ ― tilka l-kilābu l-ḥamrāʔu tanbaḥu ― those red dogs bark. As you can see, the adjective أَحْمَر (ʔaḥmar) is inflected in the feminine singular, as are the determiner تِلْكَ (tilka) and the verb itself.
If a learner wants to know if a word refers to a sentient or a non-sentient, they only need to ask themselves if the referent is a person or not. — Fenakhay (حيطي · مساهماتي) 01:48, 4 November 2023 (UTC)
"Sentient" is another word for "person". "Animacy" may not be the right word, but there are different levels: e.g. Polish, Ukrainian, etc. have "inanimate/animate/person" (a three-way distinction), as opposed to only "inanimate/animate" in Russian, etc. We can discuss terminology. Anatoli T. (обсудить/вклад) 02:13, 4 November 2023 (UTC)
Notifying Arabic editors: (Notifying Alarichall, Atitarev, Benwing2, Mahmudmasri, Erutuon, عربي-٣١, Fay Freak, Assem Khidhr, Fixmaster, Roger.M.Williams, Zhnka, Sartma): Fenakhay (حيطي · مساهماتي) 01:34, 4 November 2023 (UTC)
Adding grammatical information is a plus, especially if it helps to determine how words are used in a sentence. Native speakers may find it intuitive but I don't know if the person/non-person agreement is never taught at school in Arabic speaking countries.
Let's look at these examples (yes, from a grammar book in English):
  1. الْمُعَلِّمُونَ مُجْتَهِدُونَ (persons) ― al-muʕallimūna mujtahidūna ― the teachers (m-p) are diligent, personal pronoun: هُمْ (hum)
  2. الْمُعَلِّمَاتُ مُجْتَهِدَاتٌ (persons) ― al-muʕallimātu mujtahidātun ― the teachers (f-p) are diligent, personal pronoun: هُنَّ (hunna)
  1. الْأَقْلَامُ جَدِيدَة (non-persons) ― al-ʔaqlāmu jadīda ― the pens (m-p) are new, personal pronoun: هِيَ (hiya)
  2. الطَّاوِلَاتُ كَبِيرَة (non-persons) ― aṭ-ṭāwilātu kabīra ― the tables (f-p) are big, personal pronoun: هِيَ (hiya)
The adjectives for non-persons in the plural are in the feminine forms.
Another example:
السُّودُ(as-sūdu, the blacks), الْبِيضُ(al-bīḍu, the whites) - these can only refer to humans
I don't quite know what the Arabic gender structure was and is now at Wiktionary but I think we need to distinguish persons from non-persons.
(By the time I've typed my answer, I see new edits appeared) Anatoli T. (обсудить/вклад) 02:06, 4 November 2023 (UTC)
It is a simple equation:
  • if WORD1 (in the plural) refers to a human being, then the adjectives are inflected according to the gender/number of the word.
  • if WORD2 (in the plural) refers to a non-human, be it an object, a concept or an animal, then the adjectives are inflected in the feminine singular.
Sorry but this is a grammar rule and doesn't add any information to the word itself. It is not rocket science. — Fenakhay (حيطي · مساهماتي) 02:12, 4 November 2023 (UTC)
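For illustration only, the two-branch rule stated above can be sketched in Python. The function name and the feature labels are hypothetical and are not taken from Module:ar-headword; the sketch encodes only the plural rule as described in this thread and nothing else about Arabic agreement.

```python
def adjective_agreement(noun_gender: str, noun_number: str, is_person: bool) -> tuple[str, str]:
    """Return the (gender, number) an adjective takes with the given noun.

    Hypothetical helper: it encodes only the rule described in the thread.
    """
    # Plural nouns that do not refer to persons take feminine singular agreement.
    if noun_number == "plural" and not is_person:
        return ("feminine", "singular")
    # Otherwise the adjective follows the noun's own gender and number.
    return (noun_gender, noun_number)


# "teachers" (masculine plural, persons) versus "pens" (plural, non-persons)
print(adjective_agreement("masculine", "plural", True))
print(adjective_agreement("masculine", "plural", False))
```

The only input the rule needs beyond gender and number is the person/non-person flag, which is the piece of information the thread is debating whether to record on the headword.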
@Fenakhay. No rocket science, true, but I find it useful to know what agreement to use, depending on the sense. As @Thadh also mentioned below, it depends on the sense. The same applies to Slavic languages for many words (not trying to make Slavic and Semitic similar to each other but I find similarities) Anatoli T. (обсудить/вклад) 02:15, 4 November 2023 (UTC)
We are taught about عَاقِل (ʕāqil, sentient) and غَيْر عَاقِل (ḡayr ʕāqil, non-sentient) in school. — Fenakhay (حيطي · مساهماتي) 02:13, 4 November 2023 (UTC)
@Fenakhay: Thanks. Do you think labelling "sentient/non-sentient" is inappropriate in the Arabic headword (more than one if that's the case for specific words)? "person/non-person" is just another way of expressing the same thing, which is also used in grammar books. Anatoli T. (обсудить/вклад) 02:19, 4 November 2023 (UTC)
I'm told that this personal/non-personal distinction in verbal/adjectival agreement is always evident from the noun's meaning, and not inherent to a lemma by itself; in that case, it seems like something we might put as a note in inflection tables, but I don't think it needs to be added to headwords. As for plurality patterns, that doesn't seem like a strong enough argument by itself. Thadh (talk) 01:42, 4 November 2023 (UTC)
It can be left out in the plural, marking all, even the animates, as pl, but editors we know not will feel a need to be more explicit, I know the edit patterns of casual site visitors. And mark inanimate plurals as feminines for example. To avoid inconsistencies and have clear models, it is sensible that we have specific gender markers at plural POS. While singular entries typically have enough noise, I wouldn’t want to add on every page “hey, did you know that plural forms of inanimate nouns agree with feminine singular forms in Arabic?” Fay Freak (talk) 02:34, 4 November 2023 (UTC)
The declension table shows "sound masculine plural"/"sound feminine plural" for sentient (person) nouns. The gender labels can only be applied to sentient (person) nouns to avoid too much "noise".
It can be compared to Czech nouns where only masculine nouns differ by animate/inanimate. The animacy for feminines/neuters is unimportant (no grammatical changes). Anatoli T. (обсудить/вклад) 02:42, 4 November 2023 (UTC)
@Benwing2: You have not answered my question implied on Fenakhay’s talk page whether after the removal the site uses less Lua memory or processing time. Since I am not invested in essentialist dogmatic distinctions, the consideration that there were only about ten pages, out of myriads, an amount of pages that for Wiki pages necessarily constitutes an “error margin” conditioned by the negligence resulting from project participation being voluntary, that were actually using the removed genders, in combination with the computing principle of toning down complexity, favours the removal. I am much less concerned with what would “make sense” abstractly than you might expect: though comparative conceptualization is attractive, the implementation’s predictable effect upon and explainability to occasional readers and editors is of concern. If there are some instructions that can be chosen for a template then one needs to understand what one would try to achieve on pages with it, what to signify to readers, otherwise it is not “useful”. If it were “useful info”, man would have marked it, isn’t it? The entries do not appear to be adrift of accurate, exhaustive grammatical information. Didn’t feel a need nor note and suddenly Benwing opened our eyes that without marking the genders by the theoretically envisioned method we were missing out on something all the time? Here I am concerned with not making entries overfraught with information few uninvited in our particular circle would understand—such as claiming other genders than feminine and masculine in the singular. The point to make, which eventual editors also attempt to make at those pages no matter our choice, is most appropriately noted at plural entries: the being a plural of something but agreeing with feminine singular vs. the being masculine plural and the being feminine plural. Fay Freak (talk) 02:18, 4 November 2023 (UTC)
I support the view that sentient/non-sentient (person/non-person) should be added as an option to the headword/tables or usage notes, even if it hasn't been regularly done. We could say that Slavic word animacy is not important either. Come on, it's common sense, right? (:sarcasm:)
I will comply with whatever is decided, though. It also bothers me that we are marking non-sentient plural nouns as "plural", which is kind of misleading. To me, it seems we have to distinguish three types of plurals, which govern different adjectives, verbs and pronouns. Anatoli T. (обсудить/вклад) 02:29, 4 November 2023 (UTC)
Honestly I don’t really know where the marking non-sentient plural nouns as “plural” comes from, somewhen I recognized it as the correct thing. (Many old entries mark inanimates falsely as m-p or f-p after this rule.) Fay Freak (talk) 02:37, 4 November 2023 (UTC)
I would use "m-p", "f-p" and something like "np" for non-sentient nouns. (Was it used before?) Anatoli T. (обсудить/вклад) 02:45, 4 November 2023 (UTC)
@Fay Freak I am trying to understand your comment, but the difference in processing speed and memory between having the extra gender distinctions and not having them is negligible. Benwing2 (talk) 02:56, 4 November 2023 (UTC)
Actually, @Fay Freak, how should we mark non-sentient plural nouns in your opinion? Should the gender be marked? This has been raised several times. Also, knowing what the original gender (in the singular) of those nouns was seems irrelevant grammatically. In fact, for some words I came across, it may be impossible or difficult to determine if they are feminine singular or (non-sentient) plural. Anatoli T. (обсудить/вклад) 02:55, 4 November 2023 (UTC)
Probably plural-inanimate (with some abbreviation), this would make learners more aware of the agreement, and fewer editors would make the mistake of marking as feminine singular, as—due to the morphologic relations which a language user is aware of, and to avoid the claim of masculine inanimates switching their gender in the plural—I would prefer to say it is not feminine singular; technically it means that the verbs and adjectives used with the inanimate plurals are also not feminine singular but of the same inanimate plural gender having form syncretism with feminine singular but we won’t muddle the tables with this observation. I know those cases where one is unsure whether something is plural of something or just an alternative form and/or by itself a singular, this is no specific problem, since in such cases there are also masculine singular inanimates. Fay Freak (talk) 03:09, 4 November 2023 (UTC)
Animacy is culture-specific, and the Slavic languages do not allow for personal beliefs. ("Я съел вкусный зайца" would be ungrammatical regardless of whether you think a hare is animate). From what I understand, in Arabic this isn't the case, and the speaker does decide whether to assign animacy (/sentiency) to a noun or not. If I misunderstand, do tell me, because in that case I will change my opinion above. Thadh (talk) 03:32, 4 November 2023 (UTC)
@Thadh: In Arabic, the usage is also quite grammatical. Please see the simplest Arabic examples I used in my post above. "sound masculine plural" would be inappropriate for e.g. non-sentient nouns (non-humans). The distinction is not between animate/inanimate but between persons and non-persons (animals fall into the same category as things). Anatoli T. (обсудить/вклад) 03:38, 4 November 2023 (UTC)
I don't doubt non-persons cannot be agreed to with personal markers, but is the other way around also true? If a word is not clear to be a person or not (e.g. mythical creatures)? Thadh (talk) 04:06, 4 November 2023 (UTC)
@Thadh The situation in the Slavic languages is not quite so clear-cut, AFAIK. The Russian terms for things like "bacteria" and "virus" may or may not be animate, depending on the speaker (e.g. scientists tend to view the terms as animate, others mostly not) and Czech is known to have a large number of "facultative animates" (things like mushrooms that may or may not be considered animate, depending on the speaker, and things like salami that are clearly inanimate but nonetheless treated as animate by some speakers). Benwing2 (talk) 04:56, 4 November 2023 (UTC)
@Thadh: The other way around is also true.
  1. In هٰؤُلَاءِ أَوْلادٌ ― hāʔulāʔi ʔawlādun ― these are boys, هٰؤُلَاءِ (hāʔulāʔi, these) (m. pl) can only refer to sentient (rational) nouns.
  2. In هٰذِهِ كُتُبٌ ― hāḏihi kutubun ― these are books, هٰذِهِ (hāḏihi, this) can refer to any feminine singular or non-sentient (irrational) plural nouns.
The plurals of non-sentient nouns are treated as feminine singular. They use the pronoun هِيَ (hiya, she), which also means "they" for non-sentient plurals.
Native speakers may shed more light on how mythical creatures are declined but will it make a difference for this discussion? Slavic languages also have corner cases. Anatoli T. (обсудить/вклад) 05:34, 4 November 2023 (UTC)
Just adding a separate perspective as an Anglophone student of Arabic. I don't have great expertise in the language and I use Wiktionary a lot when reading Arabic because it is more informative and more easily navigable than traditional dictionaries. (I mostly edit when I try to look up a word in Wiktionary and realise that a word or a sense is missing.) I really appreciate having as much grammatical information as possible: the transliterations into Latin script, full inflection tables, information about gender, etc. Wiktionary is unlike traditional dictionaries in providing all these. It was especially useful to me at the beginning of my Arabic-studying journey: Wiktionary helped me stick with learning Arabic, and this in turn has encouraged me to keep contributing to Wiktionary. So although I don't have particularly well informed opinions about marking sentience, I would generally encourage including and keeping information that may seem obvious to native-speakers but is not obvious to students—including and perhaps especially total beginners. Alarichall (talk) 08:01, 4 November 2023 (UTC)

How consensus works

An important point is being missed in this discussion. I'd like to get clarity on this. If a module, template or other practice has been stable for a long time, then any contentious change needs consensus before the change is made, and should be left in the status quo until consensus is achieved to change it. User:Fenakhay seems to disagree with this principle, based on their consistent attempts to force through the change being discussed above, and their serial reversions of my undos. Fenakhay claims as justification for this change that "there was no vote when the functionality was originally added", which seems quite spurious, as there rarely is such a vote. As an example, there was certainly no vote that led to the current state of Latin verbs using "I" forms, but the practice has long been stable, hence I am seeking consensus in the BP to change this. Similarly there was no vote that led to Ancient Greek being treated as a single L2 rather than several dialect-specific L2's, and User:Thadh rightly created a BP discussion instead of unilaterally introducing changes and then demanding that anyone wanting to undo the change needs consensus to do so. Benwing2 (talk) 03:09, 4 November 2023 (UTC)

I agree with the principle that any such changes (provided there is either an active community for the language, the language has a large number of readers, or the editor in question isn't an editor of the language), in 'core' matters, including headwords and language treatment, should be discussed first. Thadh (talk) 03:26, 4 November 2023 (UTC)
He thought it is not contentious. It is inflammatory to claim he seems to disagree with the principle. Since practical use in the future was not demonstrated, on the contrary. The discussion has become theoretical by and large now since no one is realistically hindered in expanding upon our Arabic entries. Accurate though that the reasoning formulated was spurious. Accurate also that, for this but theoretical effect of the particular present state of the module, there is a negligible status quo bias in favour of the previous module’s state, which would of course be changed anyway if you turn out to have the better view, after the discussion which no one has been prevented from kicking off if he care, this is surely a consideration when someone is WT:BOLD. I understand you cherished your own work and intellectual input that went into the module; if you actually planned to use the contested features within the next days it would be a different matter, but this is not the case, hence we are rightly apathetic to whether your version or Fenakhay’s edits stay in the near future: We are making consensus now, and whether or not one of you two gets the provisory last word—you could edit war on it and nothing would change in the world, futile! Again, we try to think what people would realistically use in the entries. Fay Freak (talk) 03:32, 4 November 2023 (UTC)

Does 'terms borrowed back into LANG' include cases where the borrowing was from an ancestor?

I am cleaning up remaining cases where 'twice-borrowed terms' occurs, since the category has been renamed. There are, for example, 57 cases in CAT:Greek twice-borrowed terms, all of which appear to have the category added manually and where the chain of borrowing was typically Greek <- Ottoman Turkish <- Ancient Greek. Do these count as "borrowed back into Greek" terms? Similarly, there are several French terms borrowed from English which ultimately were borrowed from Old French. Do these count as "borrowed back into French" cases? Yet another example is wasei kango terms (Japanese coinages made from Chinese words) that are borrowed back into Chinese (we have around 100 of them). Most Japanese borrowings of Chinese words occurred during Middle Chinese, yet the {{wasei kango}} template considers them 'borrowed back into Chinese' terms and adds the category manually. If we do consider these to be "borrowed back into" terms, this should be handled automatically, and either way, we should remove the manually added categories (ignoring cases similar to fakaleitī, where the etymology is incomplete so the category wouldn't get added automatically). Benwing2 (talk) 05:57, 4 November 2023 (UTC)

Hmm... on one hand, if we say these don't count, then it's kind of arbitrary that terms from ancient Hebrew or ancient zh borrowed via another language back into modern Hebrew or Chinese can be categorized, whereas terms from ancient Greek borrowed back into modern Greek can't, just because we previously and unrelatedly decided it was most practical to handle Ancient and modern Greek under separate L2s, but ancient and modern Hebrew and Chinese under (mostly) one L2 apiece. And a term that went from early Middle English to e.g. (middle) French to late Middle English can be categorized, but a term that went from late Middle English to (middle) French to Early Modern English can't be (which is, again, arbitrary). It means decisions about whether it makes sense to handle two different languages under one L2 will start being influenced by whether people want to be able to consider the language(s) to have twice-borrowed terms, which seems undesirable.
On the other hand, if we say these do count, do we have a "cutoff mechanism", so that we're not considering a term that went "PIE → Latin → English" to have been "borrowed back into English"? (That's not a rhetorical question; do we already have some module in which we record that "Old English, Middle English, modern English" count as stages of 'a language' in a way that "Proto-Indo-European, Proto-Germanic, English" don't? It seems plausible we might.) - -sche (discuss) 06:50, 4 November 2023 (UTC)
@-sche That's a very good question that I didn't think of. AFAIK we don't have a built-in way currently of specifying that e.g. Old English is an earlier stage of English from this perspective whereas the ancestor of Old English (Proto-West-Germanic) is not. We do have a distinction between object inheritance (which represents an "is-a" relationship, e.g. US English is a kind of English, Mandarin Chinese is a kind of Chinese) and ancestrality (Middle English is an ancestor of English, and Old Italian is an ancestor of Italian even though it's also an etym-language variant of Italian). However, the ancestrality chain for English goes all the way back to PIE. I do think this can be determined automatically in most cases by looking for shared words at the end of the language name, and this accords with most people's sense of "early stage of a language": English and Old English share a word at the end, and Western Neo-Aramaic and its ancestor Aramaic share a word at the end assuming hyphens separate words, whereas English and Proto-West-Germanic don't. Benwing2 (talk) 07:00, 4 November 2023 (UTC)
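The shared-final-word heuristic described above can be sketched roughly as follows. This is only an illustration in Python, not the code of any existing Wiktionary module; the function names are hypothetical, and it assumes (as the post does) that hyphens separate words.

```python
import re


def name_words(language_name: str) -> list[str]:
    """Split a language name into words, treating hyphens like spaces."""
    return [w for w in re.split(r"[\s\-]+", language_name.strip()) if w]


def looks_like_earlier_stage(ancestor_name: str, descendant_name: str) -> bool:
    """Heuristic only: True if both names end in the same word.

    The caller is assumed to have already checked that `ancestor_name`
    really is an ancestor of `descendant_name` in the language data.
    """
    a = name_words(ancestor_name)
    d = name_words(descendant_name)
    return bool(a) and bool(d) and a[-1].lower() == d[-1].lower()
```

With the examples from the post, "Old English" / "English" and "Aramaic" / "Western Neo-Aramaic" pass the check, while "Proto-West-Germanic" / "English" does not.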
@Benwing2 I'm not a fan of that approach, because it's still totally arbitrary: Buryat and Mongolian are both descendants of Classical Mongolian, but your approach would only apply between CM and Mongolian, not CM and Buryat. The only reason we consider one to be Mongolian and the other not is for historical and political reasons, and if we renamed Mongolian to Khalkha (which we very plausibly could) then suddenly it would change the status of all these terms. You could make the same argument for all the Langues d'oïl other than French with respect to Old French, for example. One of the strengths of the current set-up is that it gets around the issue of which language is the "true" main descendant, and I'd oppose adding it in. Theknightwho (talk) 07:06, 4 November 2023 (UTC)
@Theknightwho There's also the practical issue that there's no way of distinguishing "back-borrowings" between A and B and regular borrowings using templates such as {{der}} or {{bor}}. Either we'd need to create an explicit {{bbor}} = "back-borrowing" or similar, or we'd have to make a new version of {{der}} that can have multiple levels of the chain inside its parameters. For example, replacing the following:
From {{inh|en|enm|orenge}}, {{m|enm|orange}}, from {{der|en|fro|pome orenge|t=fruit orange}}, influenced by the place name {{m|en|Orange}} (which is from Gaulish and unrelated to the word for the fruit and color) and by {{der|en|pro|auranja}} and calqued from {{der|en|roa-oit|melarancio}}, {{m|it|melarancia}}, compound of {{m|it|mela|t=apple}} and {{m|it|[[un]]'[[arancia]]|t=an orange}}, from {{der|en|ar|نَارَنْج}}, from Early {{der|en|fa-cls|نارنگ|tr=nārang}}, from {{der|en|sa|नारङ्ग|t=orange tree}},<ref name="OnlineED">{{R:Online Etymology Dictionary|entry=orange}}</ref> from {{der|en|dra-pro|*nār-}} (compare {{cog|ta|நார்த்தங்காய்}}, compound of {{m|ta|நரந்தம்|t=fragrance}} and {{m|ta|காய்|t=fruit}}; also {{cog|te|నారంగము}}, {{cog|ml|നാരങ്ങ}}, {{cog|kn|ನಾರಂಗಿ}}).
We'd have something like this:
From {{der|en|<<inh:enm:orenge>>, {{m|enm|orange}}, from <<ibor:fro:pome orenge<t:fruit orange>>>, influenced by the place name {{m|en|Orange}} (which is from Gaulish and unrelated to the word for the fruit and color) and by <<der:pro:auranja>> and <<ical:roa-oit:melarancio>>, {{m|it|melarancia}}, compound of {{m|it|mela|t=apple}} and {{m|it|[[un]]'[[arancia]]|t=an orange}}, from <<ibor:ar:نَارَنْج>>, from Early <<ibor:fa-cls|نارنگ<tr:nārang>>>, from <<ibor:sa:नारङ्ग<t:orange tree>>>,<ref name="OnlineED">{{R:Online Etymology Dictionary|entry=orange}}</ref> from <<ibor:dra-pro:*nār->> (compare {{cog|ta|நார்த்தங்காய்}}, compound of {{m|ta|நரந்தம்|t=fragrance}} and {{m|ta|காய்|t=fruit}}; also {{cog|te|నారంగము}}, {{cog|ml|നാരങ്ങ}}, {{cog|kn|ನಾರಂಗಿ}}>>.
The basic idea is that you can stuff an entire sentence into the second parameter of {{der}} (or whatever), and inheritance/borrowing/calque/etc. relationships are placed inside of <<...>>, similar to {{place}}. The variants ibor:, iinh:, ical:, etc. stand for "indirect borrowing", "indirect inheritance", etc. and indicate that the term in question is borrowed/inherited from the preceding-specified term; this lets the code have access to the full etymology tree, meaning it can do things like automatically find back-borrowings and other interesting phenomena. Benwing2 (talk) 07:45, 4 November 2023 (UTC)
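To make the proposed <<...>> notation a bit more concrete, here is a rough Python sketch of how such specs might be pulled out of a sentence. Everything in it is an assumption for illustration: the names `Link` and `parse_links` are hypothetical, the set of relation keywords is guessed from the example, and it assumes colon-separated fields of the form <<kind:lang:term<mod:value>>>. It is not how {{der}} or {{place}} actually work.

```python
import re
from dataclasses import dataclass, field

# Relation keywords assumed for this sketch; an "i" prefix marks the indirect variant.
DIRECT = {"inh", "bor", "der", "cal"}

LINK_RE = re.compile(r"<<(\w+):([\w-]+):([^<>]*)((?:<[^<>]*>)*)>>")
MOD_RE = re.compile(r"<(\w+):([^<>]*)>")


@dataclass
class Link:
    relation: str   # "inh", "bor", "der" or "cal"
    indirect: bool  # True if the link hangs off the preceding link
    lang: str       # language code, e.g. "fro"
    term: str
    mods: dict = field(default_factory=dict)  # inline modifiers such as t: or tr:


def parse_links(text: str) -> list[Link]:
    """Return the <<...>> specs found in `text`, in order of appearance."""
    links = []
    for kind, lang, term, mods in LINK_RE.findall(text):
        if kind in DIRECT:
            relation, indirect = kind, False
        elif kind.startswith("i") and kind[1:] in DIRECT:
            relation, indirect = kind[1:], True
        else:
            continue  # unknown keyword: skipped in this sketch
        links.append(Link(relation, indirect, lang, term.strip(),
                          dict(MOD_RE.findall(mods))))
    return links
```

Once the links are available as an ordered list, code could walk the indirect ones to rebuild the chain and check, for instance, whether the entry's own language (or one of its stages) shows up again further along.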
@Theknightwho If we do include back-borrowings of this sort, I would rephrase it not as "what is the (single) true descendant of a given language" but "how far up the chain do earlier stages go"? That means that e.g. Scots and English (since we treat them as separate L2's) could both have Middle English and Old English as earlier stages, but not Proto-West-Germanic, and similarly the various modern Oïl languages would all have Old French as an earlier stage but not Proto-Gallo-Romance. Benwing2 (talk) 07:55, 4 November 2023 (UTC)
@Benwing2 I feel like the natural cut-off is to only include attested languages, but that may be too broad. Theknightwho (talk) 08:45, 4 November 2023 (UTC)
@Theknightwho Does that mean Latin counts as an earlier stage of French? Benwing2 (talk) 08:58, 4 November 2023 (UTC)
Well I suppose it is, and I suppose it’s somewhat interesting to see borrowings back and forth between language families: compare Old/Middle Chinese terms borrowed into Old/Middle Japanese, where the Japanese descendant has been borrowed into Mandarin. Intuitively, those seem notable to me. Theknightwho (talk) 09:15, 4 November 2023 (UTC)
Any attempt at automation runs into the problem of determining when a given language started. This can’t reliably be determined by their conventional names, as mentioned in the above discussion. It may be best to simply let the status quo stand, leaving users free to decide this on a case-by-case basis. Nicodene (talk) 11:21, 4 November 2023 (UTC)
Here is a suggestion for a cutoff mechanism excluding “PIE → Latin → English”. Consider a borrowing pattern A → ... → B → C, in which A is an ancestor of C. If A begat another descendant D before the term completed the leg B → C of its interlingual trip, where D is considered a genuinely different language from C, not a kind of dialect of C, this is a cutoff for the notion the term was borrowed “back” (Toto, I’ve a feeling we're not in A-land anymore). There will remain cases that are on the fence (what is “genuinely different”?), but this excludes most of the obvious cases (PIE *mel- → ... → French mal → English mal ) while allowing Ancient Greek κουκκούμιον (koukkoúmion) → Ottoman Turkish گوگم‎ → Greek γκιούμι (gkioúmi) and Middle Dutch bolwerc → French boulevard → Dutch boulevard  --Lambiam 16:37, 7 November 2023 (UTC)Reply[reply]
Spitballing: have an extra parameter for each language like isStageOf so e.g. ang would be set to enm, and enm to en and sco. Alternatively, store this in a separate module* only {{bor}} et al. access, so it doesn't inflate the size of the module that {{l}}, {{lb}}, {{head}} et al. access. What to consider a stage of what is subjective in places, but I don't think avoiding automating it avoids the problem, since we still need to know whether it's right if an editor manually categorizes a term, so people don't (intentionally, or even unawarely) edit-war over it.
For my part, I'm not sure I would consider Latin to be just an earlier "stage" of French, because Latin split into so many languages and French is not considered the "Modern Latin" (actual la-Latin is). So it'd be useful for us to decide that, regardless of whether we're categorizing manually or by module.
The question also extends to descendants of French, English, etc: if a term in Middle English was borrowed into (middle) French, then borrowed from modern French by Jamaican Creole, was it "borrowed back into Jamaican Creole"? I'm inclined to say no. OTOH an edge case like "term used in colonial-era English texts from Jamaica, borrowed into another unrelated language there, and then borrowed by Jamaican Creole" is the sort of thing I'd suggest allowing manual categorization of.
*In a separate module, each chain could also be separate, if other people actually do want to categorically allow any English term borrowed into another language and then into Jamaican Creole to count as twice-borrowed, and of course allow an Old English term borrowed into [stages of] French and then back into English to count as twice-borrowed, but don't want to consider an Old English term borrowed into French and then into Jamaican Creole to be twice-borrowed. Just have one chain "ang, enm, en" and another "en, jam", and {{der|jam|ang}} would see that no chain contained both "jam" and "ang" and so not count it as 'borrowed back'. - -sche (discuss) 15:16, 4 November 2023 (UTC)
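The separate-chains idea could look roughly like this in Python. It is a sketch under stated assumptions only: `STAGE_CHAINS` and `counts_as_borrowed_back` are hypothetical names, the two chains are the ones given in the post, and real data would presumably live in a Lua data module rather than a Python list.

```python
# Each chain lists language codes that count as stages of "one language"
# for the purpose of back-borrowing (assumed data, taken from the example).
STAGE_CHAINS = [
    ["ang", "enm", "en"],
    ["en", "jam"],
]


def counts_as_borrowed_back(origin: str, via: str, dest: str,
                            chains=STAGE_CHAINS) -> bool:
    """True if a term going origin -> via -> dest was 'borrowed back'.

    The origin and destination must be the same language or sit in one
    chain together, and the intermediate language must lie outside it.
    """
    if origin == dest:
        return via != origin
    return any(origin in chain and dest in chain and via not in chain
               for chain in chains)
```

Under these chains, Old English via French into English counts, English via French into Jamaican Creole counts, and Old English via French into Jamaican Creole does not, because no single chain contains both "ang" and "jam".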
I've been doing cleanup of {{bor}} vs. {{der}}, and the same issue comes up there: {{bor}} should only be used for borrowing into the language of the entry, but people tend to see the word "borrowed" in an etymology and use {{bor}}, regardless of the steps in between. This is easy to sort out when an English entry uses {{bor}} for the borrowing of an Ancient Greek word into Latin, but there are lots of cases such as English entries where the borrowing occurred in Middle English or Old English, or Indonesian entries where the borrowing was into Classical Malay. I can see how it could get really sticky in cases like borrowings between Scots and English, since they're both descended from Middle English but English speakers tend to think of English as the "real" continuation of Middle English. Then there are the Norwegian lects and their relationship with Danish.
Another thing I see a lot of is the use of {{inh}} for ancestors of terms that were borrowed from a related language, so someone might use {{inh|nb|gem-pro}} for a term that was borrowed from Middle Low German, but that's a separate issue. Chuck Entz (talk) 16:02, 4 November 2023 (UTC)
@Chuck Entz it's good to hear you clarify the point about {{bor}} only being used for "borrowing into the language of the entry", as opposed to borrowings earlier in the chain of derivations. I suspected this to be the case, but it is not actually spelled out at Template:borrowed/documentation (which only has a note about language stages) or WT:ETYM (which contains the confusing, vague wording "If any step of a word’s history is a borrowing, this step should be flagged as such") - I wonder if you could add it to our documentation as appropriate? This, that and the other (talk) 23:39, 7 November 2023 (UTC)

@Benwing2 about your "cleaning up remaining cases where 'twice-borrowed terms' […] for example, 57 cases in CAT:Greek twice-borrowed terms, all of which appear to have the category added manually and where the chain of borrowing was typically Greek <- Ottoman Turkish <- Ancient Greek. Do these count as "borrowed back into Greek" terms?", I would add: also Ancient Greek > Latin > some European languages > Modern Greek. The Greek case (relation of ancient-modern) is special. Greek dictionaries use two different terms: αντιδάνειο (antidáneio) (literally counter-loanword, marked in the strict sense as "to borrow back", Rückwanderer), and αναδανεισμός (anadaneismós) ανά (aná)+δανεισμός (daneismós) (re-borrowing, to borrow again the same word, like your doublets). May I please ask for a clarification of the definitions of the linguistic terms (also at Glossary) twice-borrowed and reborrowing, and of their difference? I translated αντιδάνειο as twice-borrowed for the above 57 Greek words. Probably I should have used 'reborrowed'. Thank you. ‑‑Sarri.greek  I 00:00, 8 November 2023 (UTC)

@Sarri.greek "Borrowed back into the same language" means the chain of borrowing was X -> non-descendant Y -> X. (The above discussion is whether the two X's can be different languages in a parent-child relationship.) "Doublet" means one of two terms that originated in the same term but arrived in the destination language by two different paths. "Twice-borrowed" is being phased out in favor of "Borrowed back into the same language", and "reborrowed" doesn't have a formal definition here. Can you clarify what the difference between αντιδάνειο and αναδανεισμός is? Benwing2 (talk) 00:13, 8 November 2023 (UTC)
@Benwing2 I have difficulty understanding X - nondescendant... w:en:Reborrowing also puzzles me, because it mixes up 'to borrow again = doublet' and 'to borrow back = Rückwanderer'
αντιδάνειο (antidáneio) = thewordXX in language A > thewordXX (perhaps altered) in some OTHER language > thesamewordXX (perhaps altered) back in A2, a later phase of language A. Example: αψέντι (absinthe). For Greek, we mark it only when dictionaries say so; we do not make up markings based on our own judgement. The other term, 'αναδανεισμός', is for doublets: the same word borrowed two times, resulting in two different forms (like your fire, pyre), but Greek dictionaries do not comment on it. ‑‑Sarri.greek  I 00:31, 8 November 2023 (UTC)

Appendix cruft in Citations

e.g. Citations:spectre. I don't think these citations for fancruft appendices should be in "real" citations space, mixing with the useful stuff that meets WT:CFI. Thoughts? Equinox 16:28, 5 November 2023 (UTC)

Yes, get them out of there. — SURJECTION / T / C / L / 17:30, 5 November 2023 (UTC)
Why? —Justin (koavf)TCM 01:00, 6 November 2023 (UTC)
Because the sense they are for is never going to meet CFI. — SURJECTION / T / C / L / 06:58, 6 November 2023 (UTC)
Below, I will try to explore the idea that "the sense they are for is never going to meet CFI". (P.S.: If the issue is with Template:item, please see diff below.)
On WT:CFI at WT:FICTION it is written:
"Terms originating in fictional universes which have three citations in separate works, but which do not have three citations which are independent of reference to that universe may be included only in appendices of words from that universe, and not in the main dictionary space."
At Wiktionary:Criteria for inclusion/Fictional universes, linked at WT:FICTION, it is further written:
"This is a Wiktionary policy, guideline or common practices page. []
"These are examples of the criteria for inclusion as applied to terms originating in fictional universes such as Star Wars, Star Trek, Lord of the Rings, Harry Potter, and Dungeons and Dragons. Examples below include lightsaber, protocol droid, Darth Vader, and Vulcan.
"Such terms which have three citations in separate works, but which do not have three citations that are independent of reference to that universe, may be included only in appendices of words from that universe, and not in the main dictionary space."
I don't care if you delete every quote and every bs fiction fancruft appendix. That's cool, I can 100% understand it. However, the claim that "the sense they are for is never going to meet CFI", seen above in this diff, seems unusual when viewed in light of the above-quoted portions of WT:CFI and the ancillary policy page. I'm only reading the plain language and I have no awareness of other policies or practices that may nullify these passages. --Geographyinitiative (talk) 07:56, 6 November 2023 (UTC) (Modified)
I think the Citations namespace can be used as a place to show that a term is on its way to meeting CFI. I support having the Mass Effect cites there (although there should be a cite that refers to this sense without a mention of the video game). CitationsFreak (talk) 19:25, 5 November 2023 (UTC)
I agree with this stance. The citations in the Citations namespace should either count towards meeting CFI or, for particularly rare terms, help clarify the meaning when context alone is insufficient. The namespace should not be used for senses that are not CFI-compliant to begin with. Andrew Sheedy (talk) 19:48, 5 November 2023 (UTC)
@Andrew Sheedy, CitationsFreak, Daniel Carrero, Equinox, Surjection See Appendix talk:Mass Effect for context. You guys work out what you want to do; this is totally experimental for me and I don't really care. I would encourage you not to judge the Mass Effect cites by Citations:spectre (MOVED TO Citations:Spectre), but instead by one of the better ones: Citations:Ardat-Yakshi. Bro that Citations page kicks ass, as I believe you will agree. Anyway, it's all theoretically identical to Citations:protocol droid which has been around for decades with no problem. lol lmao &c. --Geographyinitiative (talk) 23:43, 5 November 2023 (UTC) (Modified)
You are correct, it is a very good Citations page. Not ready for a main entry, but something worth being there. CitationsFreak (talk) 23:47, 5 November 2023 (UTC)
Yes, but the entry at protocol droid was deleted. Jberkel 00:01, 6 November 2023 (UTC)
@Jberkel Please see Appendix:Star Wars, where 'protocol droid' is listed in an in-universe fancruft appendix containment zone with a link to the ancient page Citations:protocol droid. My goal with the recent Citations pages for Mass Effect in-universe words was to do something similar in most respects. --Geographyinitiative (talk) 00:25, 6 November 2023 (UTC)
Yes, but different rules apply for entries that have been deleted, the citations page is usually kept for archival purposes etc., and sometimes it is created from the citations of the deleted entry. Ardat-Yakshi has not been deleted, and it'll probably never get created. Jberkel 07:44, 6 November 2023 (UTC)
@Jberkel Thank you for your response. Please see diff, which explains the basis in WT:CFI for creating Citations:Ardat-Yakshi (1) without any intent to create Ardat-Yakshi, and (2) solely for the purpose of upholding Ardat-Yakshi's appearance on Appendix:Mass Effect. --Geographyinitiative (talk) 07:59, 6 November 2023 (UTC) (Modified)
Then why not add the citations to the Appendix namespace, something like Appendix:Mass Effect/Citations, or Appendix:Mass Effect/Ardat-Yakshi/Citations? Adding lots of entries to Citations: without the intent of creation makes it kind of pointless. It's just noise. Jberkel 08:25, 6 November 2023 (UTC)
@Jberkel Thanks for your reply. This is an interesting proposal and would be a unique page as far as I am presently aware. When I put citations on the Citations:Ardat-Yakshi page, I was merely following the link automatically generated at Appendix:Mass Effect when using Template:item (which is used for 'protocol droid' at Appendix:Star Wars), where it is written:
"In appendices that contain lists of headwords and definitions, this template returns a headword.
"Additionally, this template also always generates a link to the respectives talk and citations page." --Geographyinitiative (talk) 08:44, 6 November 2023 (UTC)
Created by one single editor without much consensus, it seems. "For an appendix that uses this template extensively, see Appendix:The Legend of Zelda". The fact that the mentioned Appendix is no more is a giveaway. Jberkel 08:52, 6 November 2023 (UTC)

This was really a thought-provoking discussion for me, and I'd like to let you see what I'm coming out of this with.
My conclusion is to proceed with putting fiction-universe word cites on the normal Citations pages unless/until I see tangible examples made by other people of a Citations page like Jberkel mentions. Whenever I see that, I'll change over to that. But in the interim, I think that it just makes sense to put them on the normal Citations pages, because you never know how these words might "break out" at some point, like hobbit, lightsaber, etc. Because I think anyone would agree the fiction-universe words do still have to meet normal three cites/independent/spanning a year rules somehow, on some page somewhere. Where else would they go than in the normal Citations page? When I see something different, I'll do that, but until then, yeah, I think it's commanded by Wiktionary's CFI guidelines that I continue, unless I assert that the words are in "clearly widespread use". Which I don't, and it would be foolish to assert. SEE ALSO: Wiktionary talk:Votes/pl-2008-01/Appendices for fictional terms. --Geographyinitiative (talk) 13:50, 13 November 2023 (UTC) (Modified)

what makes an exonym and can it be automated?

We have CAT:Exonyms by language and its subcats. But how different do the source and target language renderings need to be for it to be counted as an exonym? There are obvious cases like Germany vs. Deutschland, Finland vs. Suomi, Egypt vs. Arabic مِصْر(miṣr). But CAT:English exonyms also includes Rome vs. Italian Roma, Seville vs. Spanish Sevilla, Milan vs. Italian Milano and even Hesse vs. German Hesse (??) and Tierra del Fuego vs. Spanish Tierra del Fuego (????), as well as close renderings of names written in other scripts like Kyiv vs. Ukrainian Ки́їв (Kýjiv) and literal translations like Sugarloaf Mountain vs. Portuguese Pão de Açúcar (literally sugar loaf). I'm asking because I wonder if it's possible to automate this in {{place}}; otherwise the categories are forever doomed to be incomplete. The case of transliteration may be impossible to automate, but just considering cases of borrowing from the same script, does any change at all in the spelling count? What about simple dropping of accents, like Peru vs. Spanish Perú? Benwing2 (talk) 03:53, 7 November 2023 (UTC)
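For the same-script case, the question of whether a difference is "any change at all" or "only dropped accents" can at least be separated mechanically. The following Python sketch is just an illustration of that comparison, not a proposal for what {{place}} does or should do; the function names and the three labels are assumptions made for the example.

```python
import unicodedata


def strip_diacritics(text: str) -> str:
    """Decompose to NFD and drop combining marks (e.g. "Perú" -> "Peru")."""
    decomposed = unicodedata.normalize("NFD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def spelling_relation(native: str, foreign: str) -> str:
    """Compare two same-script spellings of a place name."""
    if native == foreign:
        return "identical"
    if strip_diacritics(native).casefold() == strip_diacritics(foreign).casefold():
        return "diacritics-only"
    return "different"
```

Whether "diacritics-only" pairs such as Perú / Peru should be categorized as exonyms would still be a policy decision; the code only makes the three cases distinguishable.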

Maybe someone was thinking of the occasional difference in pronunciation with Hesse (that it's only sometimes pronounced like in German, and sometimes loses a syllable)?? though I'm not sure it's sensible to be that persnickety. Calling Tierra del Fuego an exonym (added in diff) seems wrong no matter how you look at it. Maybe someone just figured that calling it anything in ==English== was an exonym since the native name is ==Spanish==??? Calling Kyiv an exonym also seems wrong (added in diff), but maybe someone thought the fact that English speakers don't write "a building in Ки́їв was bombed last night" means they're using an exonym?? I can follow the train of thought that if the locals call something "Pão de Açúcar", then for English to call it "Sugarloaf Mountain" rather than e.g. "Pao de Acucar mountain" is an exonym, like if Germans called Philadelphia "Bruderliebe" it would be an exonym. But yes, we clearly need to establish some guidelines for what counts as an exonym, if people are categorizing everything up to and including "calling it Tierra del Fuego instead of Tierra del Fuego"! :o Personally, I would not consider adapting a name to another language's orthography and phonology to make it an exonym (so I would not count Kyiv, Peru or Tierra del Fuego as exonyms), or at least, I would suggest that if we were defining exonym as "any alteration whatsoever, even to approximate a phoneme a language doesn't have by one it does have, or to render it into a language's script", then it's too broad to be worth categorizing. - -sche (discuss) 19:16, 7 November 2023 (UTC)
The definition of exonym agreed upon by the United Nations Group of Experts on Geographical Names is:
Name used in a specific language for a geographical feature situated outside the area where that language is spoken, and differing in its form from the name used in an official or well-established language of that area where the geographical feature is located. [2]
"Differing in its form" is pretty vague and all-encompassing, and seems to cover every example mentioned except for Tierra del Fuego, which has the same spelling and same pronunciation in the native language and in English.
A Polish research article [3] asks the same questions, and comes up with a whopping eleven different categories of exonym. I don't think we necessarily need to be that specific, but I agree that the term "exonym" is too broad for our purposes of categorizing them. I would propose the following divisions:
  • Endophones - the paper lists Turkish İtalya as an endophone of Italian Italia, as the pronunciation is fully identical but the spelling is different.
  • Endographs - France, Paris, and Argentina are examples of English names that are spelled identically to the endonym, but pronounced with notable differences. I'm unsure whether this should include mere diacritic differences like Perú, or whether those should count as entirely different characters.
  • Exographs - 北京 to Beijing or Peking, Ки́їв to Kyiv or Kiev, Brasil to Brazil. The exonym is not spelled with the same characters as the native name, and the pronunciation is not identical, but there is a systematic effort to adapt the endonym directly into the phonemes of the language.
  • Cognate exonyms - München to Munich, Napoli to Naples, 日本 to Japan. There is a significant change in both pronunciation and in spelling, but it is still derived from an endonym. The research paper divides this further based on the specific nature of the change, but that is likely overkill for us.
  • Calqued exonyms - Pão de Açúcar to Sugarloaf Mountain, Dutch Nederland to French Pays-Bas or Irish An Ísiltír. Perhaps German Niederlande and English Netherlands. The endonym is translated instead of directly borrowed.
  • True exonyms - Germany, Holland (in reference to the Netherlands), and Egypt are all examples of names that aren't derived from a modern endonym. They could be further classified into whether or not there is a historical root endonym (as in Egypt) or a partial name has become representative of a whole (as in Holland) but again, that might be overly specific.
Qwertygiy (talk) 23:14, 7 November 2023 (UTC)
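The six proposed divisions can be read as a small decision procedure. The Python sketch below is one possible reading, offered only as an illustration: the function name, the boolean inputs, and especially the order in which the checks are applied are assumptions, and the inputs themselves (e.g. whether a pronunciation is "identical") would still have to be judged by an editor.

```python
def classify_name(same_spelling: bool, same_pronunciation: bool,
                  derived_from_endonym: bool, translated: bool,
                  systematic_adaptation: bool) -> str:
    """Place a foreign name for a location into one of the proposed divisions."""
    if same_spelling and same_pronunciation:
        return "not an exonym"      # e.g. Tierra del Fuego
    if translated:
        return "calqued exonym"     # e.g. Sugarloaf Mountain
    if not derived_from_endonym:
        return "true exonym"        # e.g. Germany
    if same_pronunciation:
        return "endophone"          # e.g. Turkish İtalya
    if same_spelling:
        return "endograph"          # e.g. English Paris
    if systematic_adaptation:
        return "exograph"           # e.g. Kyiv
    return "cognate exonym"         # e.g. Munich
```

Each example in the comments is taken from the list above; a name that fits none of the earlier tests falls through to "cognate exonym".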
"-graphs" categories seem pointless to me, because any time two languages use different scripts, every placename would go in the category (since we've previously decided that when a term is used in e.g. English in its native e.g. Cyrillic script and not adapted to English letters—like when Москва is used in English—it's code-switching). Only the last three ("Munich", "Pays-Bas", "Germany") seem interesting to me, so for my part my suggestion is to categorize either the three of those, or even just the last one ("true exonyms"). I suppose we should also consider whether it is likely that anyone not involved in the discussion (here?) that decides on standards will understand or maintain any standards (since users are currently putting even things like Tierra del Fuego and Kyiv in), and hence whether it's worth categorizing exonyms at all. (Another thorny question is what to do when speakers of language A lived in a place, developed a name for it, and then got forcibly relocated by speakers of language B. Is "Bakhmut" now an exonym for what the Russians call Артёмовск [Artyomovsk], now that the latter occupy it?) - -sche (discuss) 17:59, 8 November 2023 (UTC)Reply[reply]
According to both sources I found, yes. An exonym can become an endonym, and vice versa, for various reasons. America and Wales, for example -- they were names first applied by people who had never even seen the land in question, but nobody could reasonably argue that English is not the dominant tongue, let alone officially recognized language, of these areas, and that these words are officially used by the English-speaking natives. By the UN definition, that makes them endonyms for at least the last two centuries.
This, one could easily argue, makes the distinction between endonym and exonym almost pointless from an etymological standpoint. Qwertygiy (talk) 21:23, 8 November 2023 (UTC)

Reconstructed senses

There exists a problem with some terms that exist half-way between a reconstruction and not. An example of this would be krok#Old Polish, which has an attestation in krokiem, but that is a lexicalized case form. Currently I have krok set as a reconstructed sense. I know @Thadh has had a similar issue and can provide some other examples where a categorizing label would be helpful. More input overall is needed. Vininn126 (talk) 14:11, 7 November 2023 (UTC)

I don't remember any specific examples, but this is a widely occurring feature, especially in parent languages with a relatively small or specialised corpus. Thadh (talk) 16:25, 7 November 2023 (UTC)
IMO, if we considered krokiem a mere inflected {{form of}} krok (with the same POS, and a definition just saying it's an inflected form), then it'd be fine to lemmatize the lemma form krok rather than the inflected form, because the word is attested, just not the nominative inflected form. I'm certain we've already done that for various Latin words. But if krokiem is a distinct word with an entirely different part of speech(!) and its own definition, not a mere inflected form of krok, then krok shouldn't be in mainspace, it should be a Reconstruction: page like Reconstruction:Old Norse/bljúgr or Reconstruction:Old Norse/blettr, because the word krok isn't attested, only the separate word krokiem is attested (like Equinox's recent remark that one can't use the existence of nothingize to add nothingizer, even though the derivation is obvious). - -sche (discuss) 18:54, 7 November 2023 (UTC)
Perhaps Old Polish krok would be better as a true reconstruction, but there are still instances where a lemma is attested but a sense is not, except e.g. in derived terms and children. Vininn126 (talk) 18:57, 7 November 2023 (UTC)
I think I see what you mean. AFAIK the correct thing to do even then is to put the unattested sense in the Reconstruction namespace. I agree it's not great that users have to go between two places to find all the content about the word, but putting senses that we know aren't attested into mainspace also seems bad. Old English docce (for example) is a reconstruction because it's not attested on its own, even though it existed in the ancestor language and the child language and is attested in Old English compounds. (And for Bär (boar), I had to demonstrate that it was actually attested on its own and not only in compounds, otherwise Brown*Toad was arguing for deleting it.) - -sche (discuss) 19:53, 7 November 2023 (UTC)
These are still very different things to the issue at hand.
Imagine language Foo, with the descendant Bar and the reconstructed ancestor Proto-Foo. We have the word foo, which in Bar means "tongue, language", and the reconstructed meanings of Proto-Foo are also "tongue, language". Now, consider a scenario where the only attested sense in the Foo language is "language". Yet, based on both the ancestor and the descendant we can be quite certain that it also meant "tongue".
This is essentially what we're dealing with, it doesn't have anything to do with derived terms, just basic descent. Thadh (talk) 21:38, 7 November 2023 (UTC)Reply[reply]
Yeah, that's the same situation as *docce and e.g. *picga which existed in the ancestor and descendant. (Indeed, docce and picga may have a stronger case for having existed—as each has attested derived terms which use the sense, in addition to having something it descended from and into—than something that "doesn't have anything to do with derived terms, just basic descent".) If the sense isn't attested, my understanding is that (at present) it's supposed to go in the Reconstruction namespace.
But I'm just saying that's my understanding of what the current norm is; I don't mean to come across as defending it; I'm not opposed to changing it, though I'm not convinced that putting senses we know are unattested into mainspace is appropriate. If we do put them in mainspace, the "(reconstructed)" label looks good.
I have wondered for a long time whether we should start using the presence or absence of colour to signal differences in our entries more often (e.g. to distinguish labels indicating restriction to a particular jargon, vs labels that merely indicate topic without any restriction, like "anatomy" on elbow), and here one idea would be, if we want to put reconstructed senses in mainspace, could we colour the sense's background grey like {{Webster 1913}} (and kind of like {{LDL}})? - -sche (discuss) 01:10, 8 November 2023 (UTC)Reply[reply]
@-sche IMO not a bad idea. Accessibility UI guidelines call for being careful with colors due to colorblind users, but this can be handled by looking up the color pairs to avoid (esp. for those with red-green colorblindness), or simply using color vs. no color, as you suggest. Benwing2 (talk) 23:01, 8 November 2023 (UTC)Reply[reply]
@-sche: It's not the same situation - *docce and *picga aren't attested in any sense, rather than just in some of them - but I think we're on the same page about there not being any current solutions other than dumping a word into the reconstruction mainspace. I'm not sure colours are a good idea, but {{lb|LANG|reconstructed}} or {{lb|LANG|unattested}} should, in my opinion, be satisfactory.
To illustrate what I mean I found an example of a reconstructible sense in an attested word: Old East Slavic лѣсъ (lěsŭ) has multiple senses, including "timber". This sense is shared in both Russian and Ukrainian, and I'm pretty sure Belarusian has it, too. However, in Old Ruthenian, only the sense "forest" is attested, whereas "timber" isn't. Thadh (talk) 14:50, 10 November 2023 (UTC)Reply[reply]

Picture dictionary image sizes

The 'picture dictionary' maps on e.g. Abkhazia and Tbilisi are very large, perhaps because Georgia is a horizontally wide, vertically short country, since I notice that the maps of tall-and-thin countries like Palestine are not as wide, but are correspondingly comically tall (I can't view the top and bottom at the same time unless I zoom way out). On mobile, and on my computer at the level of zoom I normally use, the map on Tbilisi crowds the definition entirely off the screen (although on my computer, if I zoom one step out, I see the definition on the left and image on the right coexisting in what I imagine is the expected harmonious way). Should we make the maps a bit smaller, at least a little closer to the size of regular images? - -sche (discuss) 18:27, 7 November 2023 (UTC)Reply[reply]

@-sche It would be great if the image could resize with the screen width. User:This, that and the other, User:Erutuon or User:Sokkjo as our resident CSS experts, is that possible? For me, the map at Tbilisi is totally fine and occupies only the rightmost 20% or so of the width, but I have a very wide monitor and the Chrome window takes up most of the monitor's width. Benwing2 (talk) 00:08, 8 November 2023 (UTC)Reply[reply]
The problem is WT:PICDIC is a dumpster fire. It should've been rebuilt to use mw:Extension:ImageMap to allow for rescaling. Instead, WT:Picture dictionary/en:Georgia-map has a set width and can't be resized. --{{victar|talk}} 08:01, 8 November 2023 (UTC)Reply[reply]
But maybe the better question is, do we need such an interactive map on a wikt entry? File:Administrative Divisions of Georgia (country) - en.svg would probably suffice and is scalable. --{{victar|talk}} 17:30, 8 November 2023 (UTC)Reply[reply]
I wonder why the map of Georgia is shown at all? Tbilisi is a huge capital city in Georgia, while Abkhazia doesn't even consider itself as a part of Georgia. It would be better and more informative to have a map of Tbilisi and a map of Abkhazia instead. Tollef Salemann (talk) 10:38, 10 November 2023 (UTC)Reply[reply]

Splitting Serbo-Croatian, or at a minimum supporting standardized lects alongside it

@Anarhistička Maca, Vorziblix, Benwing2 for visibility.


This thread is to propose splitting Serbo-Croatian into Serbian, Croatian, Bosnian and Montenegrin, or at least supporting those as separate L2 languages alongside it. This is far from the first BP thread on that subject, so I won't rehash all the details and controversy, but rather focus my argument on current realities and precedent from other Wiktionaries:

  • "Serbo-Croatian" is a polarizing term in the countries of former Yugoslavia. Croatian linguistics and society, in particular, reject it soundly and vocally. As a result, we're likely making it harder to recruit and retain Croatian editors. This probably extends to the other 3 affected countries. As this is fundamentally a volunteer-driven project, I find it self-defeating to argue for Serbo-Croatian "unity" from an abstract linguistic viewpoint that's disconnected from the reality on the ground.
  • Four of the other Wiktionaries with over a million entries - German, French, Greek and Russian - all support, at a minimum, Serbian, Croatian and Bosnian. Some of them also have "Serbo-Croatian", which predictably lags behind the other lects in coverage. That's not too dissimilar from e.g. Croatian vs. Serbo-Croatian Wikipedia.
  • While the standard languages are mutually intelligible, changes in orthography, accentuation and diachronic development can result in entries that are more complex than they should be, e.g. kći. This also affects things that should be simple - like {{ux}} and {{uxi}} - but which in reality require judgment about which standard to pick.
  • The actual vote back in 2009 to unify the lects under the "Serbo-Croatian" L2 header ended up as "no consensus". I lack the historical background on how it became the norm anyway.

It's not lost on me even a little bit what a gargantuan task it would be to properly divvy up the 50k+ existing Serbo-Croatian entries. I see that as a gradual, multi-year process, partly dependent on us being able to recruit more BCMS-speaking volunteers (which some of us are trying to do). But I believe we should start somewhere. To that end, I propose:

  • keeping Serbo-Croatian as an L2 through the potentially lengthy period until a proper "split" is achieved. That would ensure we don't immediately break or have to redo the etymologies of borrowings in other languages.
  • adding Serbian, Croatian, Bosnian and Montenegrin as L2s with their respective ISO codes.
  • investigating options to bootstrap Croatian, Bosnian and possibly Montenegrin by noting that their entries would be Latin-alphabet-only, and have certain labels that distinguish them as such. The bootstrap process should include criteria for the "safe deletion" of the corresponding SCr entry, e.g. when it's not linked to from entries in other languages.
  • considering promoting Kajkavian and Chakavian to L2s as "Kajkavian Croatian" and "Chakavian Croatian", for the following reasons:
    • mutual intelligibility between those dialects and Shtokavian (the basis for BCMS) is limited, esp. in the case of Kajkavian
    • they have independent literary traditions going back centuries. That's part of their successful respective bids to get their own ISO language codes.

These are, of course, just a handful of initial steps - I'd be happy to discuss any sub-projects under the overall "split" umbrella, and we'll likely have a number of those between automation and manual work. My best-case outcome for this project looks like this:

  • smaller, simpler individual entries
  • better coverage of the living, modern state of each variety
  • happy contributors and an easier time in attracting more of them

As always, I'm looking forward to your thoughts!


Chernorizets (talk) 04:14, 8 November 2023 (UTC)Reply[reply]

@Chernorizets Ugh, I am strongly opposed to this. All standardized Serbo-Croatian lects are strongly mutually intelligible and we would be doing a big disservice to our readers to duplicate the information four times over across 50k entries. Whether Chakavian and Kajkavian should be considered separate L2's is a completely separate matter, but in terms of standard Bosnian, Croatian, Serbian and Montenegrin, definitely not. If the main issue is the term "Serbo-Croatian" itself, that can potentially be renamed if we can find a suitable replacement term. Benwing2 (talk) 04:20, 8 November 2023 (UTC)
BTW I think a better use of resources would be to figure out how to reduce the duplication between Latin and Cyrillic equivalent entries, using transclusion or similar. Benwing2 (talk) 04:22, 8 November 2023 (UTC)Reply[reply]
@Benwing2 who in your mind are the readers to whom the split would be a disservice? Is it language learners? Because there are plenty of e.g. Croatian-specific or Serbian-specific textbooks, apps and educational media. Is it people looking up a word they found somewhere? Because they'd either be looking up rijeka or reka or река (which are separate articles today anyway), not some "amalgam" of the three, and it might be useful to know that e.g. the first is the standard Bosnian and Croatian spelling, while the latter two are the standard Serbian spellings. Chernorizets (talk) 04:32, 8 November 2023 (UTC)Reply[reply]
If your concern is indicating the usage as standard Croatian, Serbian, Bosnian, etc., that is easy to do without a massive splitting effort. We currently indicate this particular difference as Ijekavian vs. Ekavian in the "Alternative forms" section, because the split between the two doesn't exactly correspond with the split between countries. The terms "Ijekavian" and "Ekavian" link to a Wikipedia section explaining what these terms mean. We could easily add the terms "Ijekavian", "Ekavian" etc. to the headword. In general, splitting into different L2's would result in a huge additional and unnecessary barrier to entry for editors, who would have to duplicate most info across something like 7 entries (Croatian Latin, Serbian Latin, Serbian Cyrillic, Bosnian Latin, Montenegrin Latin, Serbo-Croatian Latin, Serbo-Croatian Cyrillic, maybe also Montenegrin Cyrillic), which would inevitably result in steady divergences as some entries get updated when others don't. This would cause a ton of confusion for readers who would wonder why the Serbian entry lists definitions A, B, D, E while the Croatian entry lists definitions B, C, E, F, when in reality all definitions apply to both. The quality of the resulting coverage of Serbo-Croatian terms would decline (probably quite significantly within a few years).
I have been told that the differences between standard Serbo-Croatian varieties are less than the differences between American, British and Australian English, and certainly less than the differences between European and Brazilian Portuguese. Would you support a split of English and Portuguese for similar reasons to what you proposed above? Benwing2 (talk) 05:35, 8 November 2023 (UTC)Reply[reply]
@Benwing2 I'm not sure how this is different from the situation with any other close group of lects. The amount of coverage will always depend on the amount of investment by volunteer editors. I don't see it as a problem that e.g. Croatian may end up having more/different entries compared to Serbian, or the other way around, just like I don't see it as a requirement that every article on Croatian Wikipedia needs to have a corresponding article on Serbian Wikipedia.
I definitely wouldn't expect an editor to have to create N entries, and that was never implied in my proposal. As for the possible divergence of senses, that's a valid concern, but consider that sometimes that's actually what we'd want, e.g. zrak. Right now, we're using per-country labels to reflect the fact that this is the common word for "air" in Croatia and Bosnia, whereas in Serbia it's vazduh. I'd argue that something like this is more confusing to a user rather than less. The languages also sometimes differ as to how loanwords and neologisms are incorporated into the lexicon - e.g. sebić is a trendy Croatian word for "selfie", rather than the direct loan selfi. We could continue doing what we do today and remember to tag things with the right country label, but I don't see how that's any less exhausting than if we could just create the entry as a Croatian entry.
As to English and Portuguese - if the people who speak these languages one day decided that "English" and "Portuguese" are divisive, offensive or otherwise polarizing terms, and changed their constitutions to call their official languages something else, then at that point I'd support a split. To the best of my understanding, while we're not there with English and Portuguese, we're there with Serbo-Croatian. Chernorizets (talk) 05:59, 8 November 2023 (UTC)Reply[reply]
@Chernorizets You seem to be confusing the term "Serbo-Croatian" with the linguistic reality that there's only one language involved. As I said above, we can easily use a different term; this is what the ICC did, for example, using Bosnian/Croatian/Serbian or something similar. Benwing2 (talk) 06:08, 8 November 2023 (UTC)Reply[reply]
@Benwing2 Wiktionary L2 names are used in many ways, and it's hard for a casual observer (or even a less casual one) to discern that something is a "term" vs a prescriptive statement or something else. I can get behind just doing something about the "term" (name?), although it's not obvious to me what we'd choose.
This is informative: https://en.m.wikipedia.org/wiki/Declaration_on_the_Common_Language. The text of the declaration doesn't give the common language a name, despite arguing that it's a common, pluricentric language. Take a look at the Croatian version of this article. How does that inform your opinion?
I suggested L2s because of precedent elsewhere and because I don't otoh have a better alternative. I'm open to other ideas. Chernorizets (talk) 06:44, 8 November 2023 (UTC)Reply[reply]
@Chernorizets Yes, they are finessing the issue of choosing a name. Wikipedia uses "Serbo-Croatian". I have a book on Serbo-Croatian grammar called "Bosnian, Croatian, Serbian, a Grammar" by Ronelle Alexander subtitled "with Sociolinguistic Commentary" that has a long section on the sociolinguistic issues. By its title the book is implicitly endorsing the BCS naming convention (and even has the letters B C S highlighted in yellow on the cover and grouped together vertically). I know some linguists use terms like "South Slavic Dialect Continuum", although that often includes Slovenian as well. I am not attached to any particular name. Benwing2 (talk) 06:53, 8 November 2023 (UTC)Reply[reply]
@Benwing2 I still think it's telling how the EN version of this article lacks a "Criticism" section, whereas:
I gather from these articles that the official Serbian position is that the common language is most correctly called "Serbian", with three additional codified varieties, whereas the Croatian position is that the declaration is as DOA as the notion of "Serbo-Croatian" (paraphrasing). I suppose any choice of name is going to leave someone unhappy, but the choice of English WP and Wikt to stick with "Serbo-Croatian" is likely appealing to the least number of people.
I think it might be worthwhile reaching out to some of the admins or bureaucrats of DE, FR, EL and RU Wiktionary to ask how they decided to support both "Serbo-Croatian" and the individual lects as L2s. Admittedly, besides RU Wikt, the number of lemmas is relatively small, but I'd assume it was still a conscious decision. Thoughts? Chernorizets (talk) 08:46, 8 November 2023 (UTC)Reply[reply]
@Chernorizets It may have been the "course of least resistance" and stemmed from individual Serbian and/or Croatian contributors. I notice there are many more Serbian lemmas than Croatian lemmas in ruwikt. Benwing2 (talk) 09:04, 8 November 2023 (UTC)Reply[reply]
@Chernorizets: My thought is that they didn’t make a conscious decision but followed the flock of Wikipedia, just like you attempt to do emphatically. en.Wiktionary imported their WT:List of languages thence and hence Ethnologue. Lots of them had to be deleted because even their names are unattested outside, like Saʽidi Arabic, and their separate treatment is unnatural to editors and readers, and this is not becoming a concerning thought for Wikipedia editors, like terminology or linguistic concepts in general. Bright shiny object. According to Wikipedia, Rāziḥīy “may be a surviving Old South Arabian language”—because one single author promoted his publication career strategically with this spectacular claim. Because reflection on attention or source criticism would be OR/TF. On Wikipedia the illogical published claim wins against balanced treatment, unless there are forces to also fear to be not mainstream. “It’s in a source!!!” And not fringe unless demonstrated otherwise. Some Dan Polansky can always come up with a skewed statistical argument that his view is the majority. (The majority is actually silent.) Which disregards practical benefit of the reader, who is supposed to become smarter, get his thoughts in order, and not politically correct first. I always prefer a superficial slant if the information density is high. The practical benefit here is one will tell the reader exactly if and in so far as one thinks if something is regional, but there is generally no data on this. If we split then information has to be verified in additional iterations to make sure whether something is Croatian, Serbian, Bosnian, Montenegrin. Fay Freak (talk) 09:31, 8 November 2023 (UTC)
@Fay Freak I'm not "emphatically" attempting to do anything. I'm seeking opinions on the consensus-based forum that is Beer Parlour, and I have no special powers or privileges to "get my way" on this. Your colorfully phrased assumption that either Wikipedia got it wrong w.r.t. BCMS, or that other Wiktionaries just blindly followed Wikipedia's example, is just that - an assumption. You might be right, you might be wrong, but the thing that's actually troubling to me is that there's no appetite to even reach out to those other projects and find out. It smacks of the "we know better" attitude of EN Wiktionary that I've observed on more than one occasion. Chernorizets (talk) 02:06, 9 November 2023 (UTC)Reply[reply]
@Benwing2 I like this disclaimer on the Talk page of the "Croatian language" article on WP:
Croatian is a standardized register of a language which is also spoken by Serbs, Bosniaks, and Montenegrins. In English, this language is generally called "Serbo-Croat(ian)". Use of that term in English, which dates back at least to 1864 and was modeled on both Croatian and Serbian nationalists of the time, is not a political endorsement of Yugoslavia, but is simply a label. As long as it remains the common name of the language in English, it will continue to be used here on Wikipedia.
I think we'd be well-advised to put a version of this paragraph at the top of Category:Serbo-Croatian language, and maybe also on Wiktionary:About Serbo-Croatian. If nothing else, at least this would give a more principled reason for our choice of the name, rather than ease-of-use arguments or individual editors' interpretation of linguistics (however well-informed and good-intentioned they might be). Chernorizets (talk) 02:29, 9 November 2023 (UTC)Reply[reply]
@Chernorizets: No objections to this. Benwing2 (talk) 03:02, 9 November 2023 (UTC)Reply[reply]
@Benwing2 how would we make it happen? I'm not super familiar with language cat configuration. Chernorizets (talk) 03:19, 9 November 2023 (UTC)Reply[reply]
@Chernorizets You might just put the text on the category page itself, above the call to {{auto cat}}; if we add it to the modules, it will require some work as there's currently a special handler for 'Foo language' categories that doesn't have any provision for customized category text. Benwing2 (talk) 03:26, 9 November 2023 (UTC)Reply[reply]
  Oppose furiously. All argument rests on social proof, i.e. conjectural contributor attitudes in place of linguistic realities, and even it does not work, as there will be contributors less attracted to editing this language, either because they like Yugoslavia or because they are historically conscious or diachronically oriented or because the individual languages lose relevancy – I don’t speak Croatian, Serbian or Bosnian, I speak an exotic like Serbo-Croatian! Which curiously has a consequence that I can well speak with people in Germany, hence pick up words, without knowing whether it was Croatian, Serbian or Bosnian or Montenegrin or what. Don’t ask tribes, that’s racist and given the vivid history, inflammatory; speakers are just currently in the process of forgetting the difference. Hence also why would one see from a text on the internet whether it is one or the other? For most native speakers here on Wiktionary I am not sure either even if I have chatted with them in Serbo-Croatian. If it isn’t long enough or just somewhat idiosyncratic and playing upon another regiolect, or just old, one might not know or be sure. Language categorization cannot depend on the place of publication. Fay Freak (talk) 04:34, 8 November 2023 (UTC)Reply[reply]
@Fay Freak both Wikipedia, and the four large Wiktionaries I mentioned, give editors the choice between Serbo-Croatian, which is not the name of an official language in any country today (AFAIK), and the four independent standards. You raise the good point that some editors might prefer to work in Serbo-Croatian, and I'm fine with that. I just don't understand what's so different about English Wiktionary in particular that we'd want to deny the opportunity to create Serbian, Croatian, Bosnian or Montenegrin entries. Chernorizets (talk) 04:41, 8 November 2023 (UTC)Reply[reply]
@Chernorizets: Perhaps it would even be being different for difference’s sake, so someone has a choice with this project as opposed to others :) But seriously, if I can talk to people and be understood, as employing their language essentially, and essentially correctly and not some pidgin, it is the same language and they don’t have separate ones. This means other projects purposefully fail at logic or economic allocation of resources. If you want to be supported by Croatian institutions to make a dictionary then you might limit your scope to Croatian for simplicity’s sake or the like, as getting the countries in hasn’t worked out well before 🫨. Humans doing business is nasty. And you are from the US, it is easy to ignore particularist politics and imagine languages inside of this country without identification by the users and there appraise the differences, as experimental science works; it is a bit like medical diagnosis or something: People needn’t have the very same thing they claim, wrong beliefs about the body are widespread, so about the tongue. Fay Freak (talk) 05:07, 8 November 2023 (UTC)
@Fay Freak for what it's worth, I'm from Bulgaria. More importantly though, my thoughts have been shaped by talking to Serbian and Croatian speakers, as well as by comparing the way Serbian and Croatian Wikipedia cover the same contentious topics around language and identity. Of course, you could make the argument that I'm working with a limited sample of people - and therefore opinions - but that would be true for any Wiktionarian.
As to the criterion of mutual intelligibility, even on English Wiktionary we rely more on convention than purely linguistic reasoning. E.g. Catalan and Occitan are largely mutually intelligible, esp when you consider dialects, but we keep with established custom and treat them separately. I could probably dig up a bunch more examples like this, even with smaller language distances involved, and I will if you ask me to, but my point is that EN Wiktionary doesn't have a uniform treatment of "nearby" lects, perhaps because the world doesn't have one either. Croatian is an official language of the European Union. Serbo-Croatian is not. How we handle this is up for debate, but the reality of it isn't. Chernorizets (talk) 05:39, 8 November 2023 (UTC)Reply[reply]
What is it about kći that would be improved by splitting the lects? Sure, the alt-forms are complex, but only one form is labelled dialectally. In fact, I find the labels quite useless; "by analogy with oblique stem forms" and "apheretic variant" are etymological information, not the stuff of context labels or qualifiers. This, that and the other (talk) 07:09, 8 November 2023 (UTC)Reply[reply]
@This, that and the other it was just the most recent example I'd seen of a word with multiple variants, some of which - when you click on them - say they're regional. So the Chakavian stuff would go in a Chakavian entry, and the regional versions would go under the correct region. It may not be the best example - see zrak for some country label soup. Chernorizets (talk) 08:58, 8 November 2023 (UTC)Reply[reply]
@Chernorizets Honestly that doesn't look so bad to me. Benwing2 (talk) 09:04, 8 November 2023 (UTC)Reply[reply]
  Strongly oppose splitting Bosnian/Serbian/Croatian/Montenegrin etc etc. I'm all for splitting languages, but this would be splitting based exclusively on politics. We can't please everyone. I don't however have the background knowledge to say anything about splitting Kajkavian/Chakavian/Shtokavian. Thadh (talk) 08:58, 8 November 2023 (UTC)
I have mixed feelings. I don't think it's purely a political split as stated above, that is a gross exaggeration, however politics do play a major role in this. I think a split based on nation would be a bad idea, but I'd be curious if splitting by various lects would make more sense. I think we'd have to see some examples of what splitting would potentially do to really understand if it makes sense lexically. Vininn126 (talk) 10:17, 8 November 2023 (UTC)Reply[reply]
This split would be a waste of time and resources. One can underline that a certain feature is specific to Serbian, Croatian, Bosnian, or Montenegrin even as of now. There is no need to overcomplicate the matter. Furthermore, the actual linguistic differences in the SCr dialect continuum lie within the former dialectal subgroups, which had been (mostly) supplanted by the standard. One would learn almost nothing in regard to them from the split of the standards. 11:07, 8 November 2023 (UTC)
Unfortunately, I'd have to weakly oppose this. I'm one of the biggest proponents of splitting languages, but it looks like in this case, there's already a strong linguistic consensus that these lects are standardized varieties of the same language. Maybe the name of the current L2 could change if it's not as clear? Also, for the record, while we're at it, our 3 Norwegian lects should really really be combined into 1. AG202 (talk) 13:36, 8 November 2023 (UTC)
@AG202 I think there is general consensus outside of Norwegian (and maybe other Scandinavian) editors to merge Norwegian, but last time this discussion came up, the Norwegian editors were strongly opposed. I agree the current situation is non-ideal, to say the least. Benwing2 (talk) 21:33, 8 November 2023 (UTC)Reply[reply]
Honestly, I feel like we’d need to bite the bullet someway or another. The split seems to be just based on non-linguistic reasons. It’d be best to try and convince everyone else in that case, even though I hate to say it. AG202 (talk) 01:59, 9 November 2023 (UTC)
I don't see any good reason for comparison between Norwegian and Serbo-Croatian. They are in two very different situations. Ignoring the politics and history, there are at least two opposite tendencies which become clear in Norwegian (since the 1920s) and Serbo-Croatian (since the 1990s). While the Norwegian mess is over time being more and more united, as Bokmål and Nynorsk are slowly becoming closer to each other, mainly because they are used in the same country, the Serbo-Croatian dialects are going their separate ways because they are no longer in the same country. Funny enough, those languages which are considered a part of Serbo-Croatian are divided into cross-border dialects. So if you ask me, Serbo-Croatian (as well as Norwegian Nynorsk) is just a standard spelling of those dialect continua, and I don't see any reason for splitting. Norwegian Bokmål, on the other hand, is just Danish which became Norwegianized over time, and its grammar, lexicon and pronunciation are hard to merge together with the rural Norwegian dialects, which have more similarities to Swedish and Faroese. Anyway, I'm not so deep into the Serbo-Croatian stuff, so I should abstain from opposing/supporting its splitting. Tollef Salemann (talk) 17:34, 20 November 2023 (UTC)
@AG202 Yeah, although I do actually support merging Norwegian, I should point out that the Danish ancestry of Bokmål is a genuine linguistic point. Anyway, this is all a bit off-topic. Theknightwho (talk) 17:50, 20 November 2023 (UTC)Reply[reply]
  Oppose This basically boils down to a conflict between a prescriptive and a descriptive approach. The separate-language approach is strongly, vehemently prescribed due to politics and to reaction against abusive language policies of the past. It's perfectly understandable, but Wiktionary is a descriptive dictionary. Chuck Entz (talk) 15:29, 8 November 2023 (UTC)Reply[reply]
@Chuck Entz it looks like we have very few - if any - active BCMS contributors, but this wasn't always the case since we have 50k+ BCMS lemmas (which is on the high end for Slavic languages on EN Wikt). I've been trying to "root cause" this, and I wonder if the term we've adopted (as well as the 2009 vote, which didn't pass btw) have something to do with it. I participate in a large online community (> 55k members) of Slavic-language speakers and enthusiasts, and I'm trying to encourage some of them to become Wiktionary editors. Having seen more than one heated discussion around how this language ought to be named, I just fear that even if we could attract volunteers, we might not be able to retain them, particularly if they're from Croatia. I'm searching for the right answer - I admit I don't have it. This proposal was based on precedent in other large Wiktionaries, as well as Wikipedia. Chernorizets (talk) 01:57, 9 November 2023 (UTC)Reply[reply]
@Chernorizets Maybe, maybe not. You should look at who contributed them; I suspect a lot of them come from User:Ivan Štambuk, who identifies himself as a native Serbo-Croatian speaker but who has not been active since 2019 at the latest. For many languages, there are relatively few contributors, but they are often prolific, and so when they go inactive, the language stops getting new lemmas. This seems to have happened with Latvian, for example, where most terms (I think) were added by User:Pereru, who has been inactive since 2015. His entries are characterized by extremely thorough etymologies, BTW. It would be interesting to look at the number of lemmas over time; you can presumably access this by looking at the page history of Wiktionary:Statistics/generated, which is updated fairly often going back to 2010. You could also write a script to parse the Nov 1 dump with complete edit history; see this link. Benwing2 (talk) 03:01, 9 November 2023 (UTC)Reply[reply]
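For anyone who wants to try the dump-parsing route suggested above, here is a minimal sketch of what such a script might look like. It is only an illustration under several assumptions: the function names are made up for this example, the dump is assumed to be a standard MediaWiki XML export with revisions in chronological order, and "first revision containing a ==Serbo-Croatian== header" is used as a rough proxy for when an entry was added. It counts pages rather than lemmas, so non-lemma forms are included.

```python
import re
import xml.etree.ElementTree as ET
from collections import Counter


def local(tag):
    """Strip the XML namespace, e.g. '{http://...}page' -> 'page'."""
    return tag.rsplit("}", 1)[-1]


def first_appearance_by_year(dump, language="Serbo-Croatian"):
    """Count mainspace pages by the year an L2 header first appears.

    `dump` is a path or binary file object holding a MediaWiki XML export
    with full history (hypothetical input; revisions assumed chronological).
    """
    # Matches a level-2 header line such as "==Serbo-Croatian==".
    header = re.compile(
        r"^==\s*" + re.escape(language) + r"\s*==\s*$", re.MULTILINE
    )
    counts = Counter()
    for _event, elem in ET.iterparse(dump, events=("end",)):
        if local(elem.tag) != "page":
            continue
        # Only look at the main namespace (ns 0), where entries live.
        ns = next((c.text for c in elem if local(c.tag) == "ns"), "0")
        if ns == "0":
            for rev in elem:
                if local(rev.tag) != "revision":
                    continue
                fields = {local(c.tag): (c.text or "") for c in rev}
                if header.search(fields.get("text", "")):
                    # Record the year of the earliest matching revision.
                    counts[fields.get("timestamp", "")[:4]] += 1
                    break
        elem.clear()  # release the page's revisions; history dumps are large
    return dict(counts)
```

Calling `first_appearance_by_year("dump.xml")` (the file name is a placeholder) would return a mapping from year to the number of pages that first gained the header in that year. A real run would probably also want to filter on the headword templates to separate lemmas from non-lemma forms, which this sketch does not attempt.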
  Weak oppose I'd support separate L2s for Kajkavian and Chakavian (the latter might need its own Ekavian/Ikavian split with its intricacies), and a rename of S-C to "Bosnian, Croatian, Serbian, Montenegrin," but IMO this should be abbreviated in heads or templates to BCMS; if a shorter term is suggested which doesn't ruffle any feathers I'm open to it as well. In order to not totally isolate Kaj and Cha where there isn't a common etymon with S-C, or when such a page is simply missing, I'd like etym sections to state "(Un)related to BCMS/Kaj/Cha," because I really do like the fact that there are links to the other lects' terms on a given entry (due to closeness and the marginal nature of the non-prestige lects on Wikt), and the fact that listing all forms interdialectally can add confusion or clutter, especially for learners, when terms might be given as e.g. syns but only be synonymous in a certain lect (for example, ščap is Kaj but its synonym is given as the prestige štap, unmarked as standard, and not a different Kaj variant or the Croatia-only šćap (which you might expect a Kajkavian speaker to use more due to proximity)).
I think we also need a unified policy for writing alt spelling and jat reflexes. For example, on the Cyrillic page дрво I wrote the ux using the standard Serbian spelling седети, and on Latin drvo the ijekavian spelling sjediti. However, there is also an alternate ijekavian spelling, sjedjeti, which goes unmentioned. This could be solved with using slashes to show alternate spellings on the same level, and showing dialectally/regionally differing spellings/words on the entries of their specific (regional) lect.
Also, the issue of Montenegrin-specific letters needs to be addressed. Anarhistička Maca (talk) 22:52, 8 November 2023 (UTC)Reply[reply]
I think the issue with any unified group of lects is finding a way of more easily marking which lect a word belongs to. It's very easy to mark a word as being specific to a lect, and an unmarked term is supposedly universal, but if it's in all but a few areas it's clunky to write everything out. I think we need a better system for that, but I'm not sure what. Vininn126 (talk) 23:03, 8 November 2023 (UTC)
@Anarhistička Maca regarding Kajkavian and Chakavian, things may not be as clear-cut as I had thought. Each of those is actually a dialect group rather than a single variety and, for instance, it turns out that some Chakavian subdialects are closer to Shtokavian than others. I've started a thread in a large online group asking native speakers of Shtokavian to rate their amount of mutual intelligibility with Chakavian and Kajkavian - comments have begun coming in, and I hope more people participate so that I can get a diverse set of perspectives. Chernorizets (talk) 02:15, 9 November 2023 (UTC)Reply[reply]
@Chernorizets Yes, Chakavian is known for this. If you look at various Proto-Slavic pages, you'll see that all listed Chakavian descendants (or at least the ones I added) include the particular town where the term is used; this is consistent with usage in Derksen's Etymological Dictionary of the Slavic Inherited Lexicon (Leiden). Benwing2 (talk) 02:51, 9 November 2023 (UTC)Reply[reply]
  Oppose The merger was made on common sense. --Anatoli T. (обсудить/вклад) 22:54, 8 November 2023 (UTC)
  Oppose. — Fenakhay (حيطي · مساهماتي) 02:12, 9 November 2023 (UTC)
  Oppose due to seemingly clear mutual intelligibility and other points as raised in previous discussions. MedK1 (talk) 11:11, 9 November 2023 (UTC)
  Abstain, because I don't speak any BSC, but I would like to comment. Several people here contrast politics with linguistic reality, but I don't think it's that easy, because the former has a strong effect on the latter. I'm reminded of the following quote from The Slavic Languages (Sussex & Cubberly, 2006, p. 74):
Naylor, writing in 1980, observed that ‘‘the linguistic differences between the two variants are no greater than those between British and American English and would not justify separating them into two separate languages’’ (1980: 68). This linguistic judgment has been overtaken by history, and it is difficult to conceive of a set of circumstances which would reunite Serbian, Croatian and Bosnian.
The majority of the political and cultural elite in former Yugoslavia are determined to have separate national languages and their determination makes it so. I think @Chernorizets makes good points and is sensibly cognizant of the downsides of their proposal. It seems to me that even if this is rejected now, it's only a matter of time before something similar is implemented. —Caoimhin ceallach (talk) 17:08, 22 November 2023 (UTC)

Interviews: Tell us about your experiences using Wikidata in the Wikimedia sister projects

Hello, the Wikidata for Wikimedia Projects team at Wikimedia Deutschland is investigating the different ways Wikidata is being used in the Wikimedia projects. If you would like to speak with us about your experiences with integrating Wikidata in Wikimedia wikis, please sign up for an interview in this registration form. Please note that currently, we are only able to conduct interviews in English.

For more information, visit our project page. Feedback is always welcome here. Thank you. Danny Benjafield (WMDE) (talk) 13:36, 8 November 2023 (UTC)

English Wiktionary policies related to pronunciation audio

Does updating the Help:Audio_pronunciations page require prior discussion and consensus in Beer parlour? In my opinion, the instructions are rather obsolete and misleading in their current form. More details can be found in the following Grease pit discussion [4] and I can repost my suggestions for improvement here if anyone is interested. —Ssvb (talk) 11:20, 9 November 2023 (UTC)Reply[reply]

@Ssvb I'd say no; this sounds like the kind of non-contentious change that you can just go ahead and make. Benwing2 (talk) 20:30, 9 November 2023 (UTC)
@Benwing2: Thanks! I have updated Help:Audio_pronunciations with the hopefully non-contentious information about Lingua Libre, essentially describing the status quo.
Do you happen to know what's the status of Lingua Libre Bot? I see that its page says "Wikis in tests or needing approval: English Wiktionary ". Is having a bot automatically editing English Wiktionary articles to add references to Lingua Libre pronunciation files even desirable? —Ssvb (talk) 00:02, 12 November 2023 (UTC)Reply[reply]
@Ssvb I don't know what this bot is. There is a bot User:DerbethBot operated by User:Derbeth that auto-adds pronunciation files from various sources but I think Lingua Libre may be on its deny list because of the uneven quality of its audio files. Benwing2 (talk) 04:17, 12 November 2023 (UTC)Reply[reply]

Pronunciations for Minor Geography

I would like to let you all know that I now want to use some old gazetteers- see Chuxiong and Akesu for examples- to do pronunciations for all the words in Category:en:Places in China and etc. words. Here's some I've done just now: Chengkou, Chenghai, Chenggu. I plan to do this on all of them, and make it nice. I will try to fill in the gaps for words that aren't listed in any gazetteers if I can feel certain that I'm giving a reasonable pronunciation (like Chengdong; for some words like Qingjin, idk if English speakers have spoken stuff like this aloud in English language conversation). This process will probably bring to light some confusing dilemmas and errors and what have you, which I plan to bring up at the tea room. This process will hopefully include IPA pronunciations as well when I can kind of match syllables one-to-one. Let me know if you see any problems with this plan. I hope this will help increase the value of the entries to the readers, attract more editors, and serve as an example for other areas of geography. --Geographyinitiative (talk) 13:00, 9 November 2023 (UTC)Reply[reply]

@Geographyinitiative My main concern is that the pronunciation of lesser-known places may not be very stable in English. I see you dealt with this in the Chenggu article but I'd expect the same to happen everywhere. Also, old gazetteers may have outdated pronunciations -- the spelling is likely to significantly influence the pronunciation of foreign toponyms, and so places written in Wade-Giles or Postal Romanization will likely have different pronunciations from the same place written in Pinyin. Benwing2 (talk) 20:34, 9 November 2023 (UTC)Reply[reply]
"spelling is likely to significantly influence the pronunciation" for sure! I've marveled for a few years now, as I've added pronunciations to Chinese placenames myself, checking whenever possible both modern dictionaries and actual spoken examples on youtube, that spelling pronunciations seem to be the only pronunciations people use for Chinese placenames. Even when the Mandarin pronunciation uses only phonemes that English also has, even educated speakers with knowledge of Chinese use spelling pronunciations—consistently pronouncing -wu- as /wu/ (and not like spoken Chinese /u/), -yu- as /ju/ (even though Chinese has no /j/), pronouncing the other consonants as they are written in pinyin rather than as they are pronounced in Mandarin, and the vowel letters in the same consistent ways... (This is also true for German and French people pronouncing Chinese placenames!) I've actually considered writing, and I suggest we do just write, a table or module that just converts pinyin to (English) IPA.
As you noticed about Cheng-, a few pinyin vowel letters (consistently) get pronounced in a (consistent) few different ways, and for any sufficiently common placename containing that vowel letter all of the ways can be found on Youglish, but the module just needs to output multiple options for those letters. I've yet to find any Chinese placename where a syllable is pronounced some particular way that it isn't also pronounced in other placenames that have that syllable, except when a name is too uncommon to find the full complement of possibilities—but I don't think that means the full complement doesn't exist, any more than we would think a pinyin placename was {{lb|British}} if it was so rare that the only three books it appeared in happened to all be British. Even stress is predictable (most commonly it's either on the last syllable or equal on all syllables). - -sche (discuss) 04:09, 10 November 2023 (UTC)
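To make the "table or module" idea concrete, here is a minimal Python sketch. Everything in it is an assumption for illustration: the table name, the function name, and especially the readings themselves, which are placeholders rather than attested values (only the /wu/ and /ju/ readings come from the comment above). Stress is left out entirely.

```python
from itertools import product

# Hypothetical, hand-filled table: pinyin syllable -> possible English readings.
# The readings are placeholders and would need checking against real usage.
SYLLABLE_OPTIONS = {
    "cheng": ["tʃɛŋ", "tʃʌŋ"],  # a syllable with more than one reading
    "gu": ["ɡu"],
    "wu": ["wu"],  # spelling pronunciation, as described above
    "yu": ["ju"],  # spelling pronunciation, as described above
}


def english_ipa(syllables):
    """Return every combination of the per-syllable readings."""
    options = [SYLLABLE_OPTIONS[s] for s in syllables]
    return ["/" + ".".join(combo) + "/" for combo in product(*options)]
```

Called as english_ipa(["cheng", "gu"]), this returns one transcription per reading of the first syllable, which is the "output multiple options" behaviour described above. An unknown syllable raises KeyError, so gaps in the table show up instead of being guessed at.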
When I recently added "chǔngʹko͞oʹ" to Chenggu, I looked back at my edit and thought to myself that there is no WAY that an English speaker, no matter their background, is going to read 'Chenggu' with a friggin "k" sound like in cat or tack. N O P E. But then again- perhaps. Perhaps in some hyperspecialized circumstances, it could happen- and indeed, the word would only be spoken in English in very special circumstances anyway. But to me, the "chǔngʹko͞oʹ" pronunciation is probably exclusive to someone thinking of the Ch'eng-ku/Chengku etc alternative forms of Chenggu.
When you look out over the inter-generational and inter-civilizational CHAOS of Citations:Hebei/Citations:Hopeh/Citations:Hopei, Citations:He'nan, Citations:Shanxi, Citations:Guangzhou, Citations:Yunan, Talk:Kuomingtang, and Citations:Xingjiang, you realize how ephemeral even the names of vast regions of China really are.
I will be thinking about your comments above and just correct me if you see anything that looks wrong. I'm open to whatever you have to say. --Geographyinitiative (talk) 10:18, 10 November 2023 (UTC) (Modified)Reply[reply]

@-sche, Benwing2 holy shit guys I'm so excited. Have you seen my pronunciation work recently on the geography terms?? It's going really well I think. I have hoped to do this for a long time. Feel free to add IPA, I just haven't learned IPA yet. But I am planning to get to it. Geographyinitiative (talk) 14:57, 15 November 2023 (UTC)Reply[reply]

Belarusian IPA notation

I see that the current Module:be-pronunciation produces IPA transcription [ˈzɫod͡zʲej] for the word зло́дзей (zlódzjej). But this somewhat differs from the sounds listed in the Belarusian phonology article (ɫ vs. l, o vs. ɔ, e vs. ɛ). Should it be [ˈzlɔd͡zʲɛj] instead? Or maybe something else? I'm not really familiar with IPA and would appreciate any help. —Ssvb (talk) 14:12, 9 November 2023 (UTC)Reply[reply]

@Atitarev, @Benwing2: There exists Арфаэпічны слоўнік беларускай мовы (Orthoepic dictionary of the Belarusian language) published in 2017. It provides standard Belarusian literary pronunciation transcription for 117K words (albeit in Cyrillic notation) and can be potentially used as a source and reference for the pronunciation information of the Belarusian words in the English Wiktionary.
There was also an interview (in Belarusian) with the creators of this dictionary and an additional article about it. Basically, they developed an automatic converter program (similar to the be-pronunciation Lua module) with several hundred rules encoded in it. The results of this automatic conversion had been verified by the linguists, who analysed speech of theatre performers, voice actors and other professional Belarusian language users. Their conclusion was that the automatic converter was 98% accurate (made mistakes in roughly 2 words out of 100). For the paper edition of the dictionary, the automatic converter's mistakes had been corrected by humans, plus alternative variants of pronunciation had been added where appropriate. The authors of the dictionary also have their own website, which can be used to get non-perfect automatically generated transcriptions (also in IPA format) or query information from paper dictionaries.
Now I wonder. How should words like чэ́шскі (čéšski) be handled in Wiktionary articles? The above-mentioned dictionary lists two pronunciation variants in Cyrillic notation: "[чэ́шск'і] // [чэ́ск'і]". If I add manual overrides [ˈʧɛʂskʲi] and [ˈʧɛskʲi] via the {{IPA|be}} template, then the correct symbols for the Belarusian IPA notation still need to be clarified first (ɫ vs. l, o vs. ɔ, e vs. ɛ). Alternatively, maybe the Module:be-pronunciation could gain an extra feature to allow accepting transcription overrides in Cyrillic notation (directly copied from Арфаэпічны слоўнік беларускай мовы) and automatically convert them to IPA?
Ssvb (talk) 14:31, 9 November 2023 (UTC)Reply[reply]
@Ssvb The way this is generally handled in various pronunciation modules is to use a respelling using the rules of the language in question, hence you could use чэ́скі as a respelling (although if the reduction of шс -> с is systematic, the module could be made to generate it automatically). As for ɫ vs. l, o vs. ɔ, e vs. ɛ, I can't answer that well enough as I don't know Belarusian; but User:Atitarev may be able to answer. Benwing2 (talk) 20:39, 9 November 2023 (UTC)Reply[reply]
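A small Python sketch of what "generate it automatically" could look like at the respelling level. It is not the code of Module:be-pronunciation; the function name and defaults are hypothetical, and whether the reduced cluster should be ск or something else is exactly the open question in this thread, so it is left as a parameter.

```python
def respelling_variants(word, cluster="шск", reduced="ск"):
    """Return the word as spelled, plus a reduced respelling if it applies.

    Assumption: the reduction is systematic wherever `cluster` occurs.
    """
    variants = [word]
    if cluster in word:
        variants.append(word.replace(cluster, reduced))
    return variants
```

For чэ́шскі this gives the spelled form followed by the respelling чэ́скі, matching the two variants quoted from the dictionary. Each respelling would then go through the normal Cyrillic-to-IPA rules. Words without the cluster come back unchanged as a one-item list.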
@Benwing2, @Ssvb:
"э" should normally produce [ɛ]. Since this is straightforward (?), I have just changed it.
I think "о" is just [o] and we use [ɫ] for a hard, unpalatalised "л".
@Benwing2, I think @Ssvb is asking to allow [ˈt͡ʂɛʂskʲi] (respelled [чэ́шск'і]) as one of the allowed pronunciations. This is the same as in Russian (the "шск" part), where че́шский (čéšskij) is pronounced [ˈt͡ɕeʂskʲɪj], but in Belarusian [ˈt͡ʂɛskʲi] (respelled [чэ́ск'і]) is also correct (two variants).
There may be some small imperfections, which may be harder to iron out, since resources are rather poor. There is an unfinished discussion re [e] vs [ɛ] for the Russian module, there may be some similarities with Belarusian, e.g. where "э" should be [e] or where "е" should be [e]?
Module_talk:ru-pron#Stressed_е_not_followed_by_consonant_+_front_vowel_or_by_palatalised_consonant_should_be_ɛ? Anatoli T. (обсудить/вклад) 22:32, 9 November 2023 (UTC)Reply[reply]
@Atitarev OK, I have no idea whether [e] or [ɛ] is more correct for Russian or Belarusian; as a native speaker of the former you'd know better. We can fix чэ́шскі to generate [шск] instead of or in addition to [сːк] (which variants are correct?). Benwing2 (talk) 22:38, 9 November 2023 (UTC)Reply[reply]
@Benwing2, according to @Ssvb, [ˈt͡ʂɛʂskʲi] is correct but I think we should display both [ˈt͡ʂɛʂskʲi] and [ˈt͡ʂɛsːkʲi].
In the linked discussion, started by @User:SUM1, "этот" vs "эти" differ slightly: the former has [ɛ], the latter [e], but I would find it a bit hard to define this rule. More here: w:Russian_phonology#Front_vowels. Anatoli T. (обсудить/вклад) 22:55, 9 November 2023 (UTC)
@Atitarev There is a lot of detail in that article. It indicates the different allophones and their contexts but I'm not sure we want to go into that much detail; I think it would just overwhelm the language learner. Benwing2 (talk) 23:08, 9 November 2023 (UTC)Reply[reply]
@Benwing2: I agree, the Russian module is fine. Thank you for all the efforts! Anatoli T. (обсудить/вклад) 23:11, 9 November 2023 (UTC)
@Atitarev I mean, right now there seem to be at least three different IPA notation flavours for Belarusian and this doesn't feel right:
It would be great if at least Wiktionary and Wikipedia could agree with each other on the right symbols choice for their IPA notations. The talk page of the Wikipedia article had some discussions. And the dictionary web interface shows some contact information for feedback too. —Ssvb (talk) 00:10, 10 November 2023 (UTC)Reply[reply]
@Ssvb: They don't have to match 100%. It depends on the level of precision, which is always a point of discussion here for various languages. Compare the Russian Wiktionary [ˈfondəvɨɪ̯] with our: [ˈfondəvɨj] for the adjective: фо́ндовый (fóndovyj) or [ɛlʲɪkˈtronʲɪkə] vs our [ɪlʲɪkˈtronʲɪkə] for the noun электро́ника (elektrónika). It has to be consistent and clear, ideally decisions documented.
Do you agree, @Benwing2? Anatoli T. (обсудить/вклад) 00:52, 10 November 2023 (UTC)Reply[reply]
@Atitarev Yes, absolutely. There is a lot of wiggle room in the IPA as well as "conventional" usages of IPA symbols that may not any longer reflect current reality (e.g. the use of [ʌ] for the vowel of cut, which at least in American English is actually [ɐ], and the values of most French nasal vowels; the symbols [ɑ̃ ɛ̃ ɔ̃] reflect the pronunciation of maybe 150 years ago, when the current pronunciations are more like [ɒ̃ æ̃ õ]). Benwing2 (talk) 01:56, 10 November 2023 (UTC)Reply[reply]
@Atitarev: Regarding your examples of Russian. Wouldn't it make sense to list both [ɛlʲɪkˈtronʲɪkə] and [ɪlʲɪkˈtronʲɪkə] as alternative pronunciation variants for электро́ника (elektrónika)? And maybe have an audio sample for each variant if they actually differ. Here are Russian pronunciation audio files recorded in Lingua Libre by different users for various words starting with "элект-": . —Ssvb (talk) 06:46, 10 November 2023 (UTC)Reply[reply]
@Ssvb, @Benwing2: The pronunciation module is based on defined rules. Unstressed "э" is normally reduced in natural pronunciation, just like "е". Some people's pronunciation is affected by the spelling in a slow speech. It is equally applicable to "э" and "е", often to "я" but almost never to "о". It would be burdensome to provide spelling pronunciation to each entry. The most natural and relaxed pronunciation was chosen. Anatoli T. (обсудить/вклад) 21:53, 12 November 2023 (UTC)Reply[reply]
@Ssvb, @Benwing2: I think [ˈzɫod͡zʲej] is correct. Anatoli T. (обсудить/вклад) 22:36, 9 November 2023 (UTC)Reply[reply]

"geographic region" vs. "administrative region"

I am trying to fix up the handling in {{place}} of entities identified as "regions". The problem is that in some countries, "region" has a specific administrative sense, often as a top-level subpolity underneath the country, whereas in others, "region" is merely a geographic and cultural term for an area with some sort of cohesion. This was leading to problems e.g. in France, which has political regions such as Normandy, Hauts-de-France and Provence-Alpes-Côte d'Azur (quite a mouthful) as well as geographic/cultural regions such as the Loire Valley. My solution for France was to use the term "administrative region" to refer to the political kind of region and just "region" to refer to the geographic/cultural kind. For France at least, this is confirmed by Wikipedia, which terms the political kind of region an "administrative region" e.g. in the page on Provence-Alpes-Côte d'Azur. OTOH, this doesn't seem to apply to all countries, e.g. Regions of Turkmenistan just refers to the 5 political regions as "regions". Nonetheless I'm thinking of requiring that political regions be declared as "administrative regions" in order to be categorized as political entities. The idea is that the category system would recognize "Regions of COUNTRY" categories for all countries just like all countries can currently have "Cities of COUNTRY" and "Rivers of COUNTRY" categories, but only for countries with political regions would "Administrative regions of COUNTRY" be recognized. An alternative is to use the term "political region", but I'm not sure that term has much currency. Note for example that the Wikipedia article Regions of Turkmenistan is categorized under Category:First-level administrative divisions by country.

Thoughts? Benwing2 (talk) 08:30, 10 November 2023 (UTC)

I think that cultural/geographic entities can be "regions", like America's Midwest and top-level administrative political boundaries can be "administrative divisions", e.g. Indiana is generically an "administrative division of the United States" and specifically a "state of the United States". This avoids problems where there are specifically-defined government "regions", such as Regions of Czechia. "Administrative region" also works as far as I'm concerned. —Justin (koavf)TCM 08:40, 10 November 2023 (UTC)Reply[reply]
I am in favor of "administrative division" over "administrative region" and especially over "political region". For one, as mentioned, it's already standard on Wikipedia, and consistency is a strong argument. Additionally, the term "region" typically implies some level of geographic cohesion, which political borders don't necessarily display. Enclaves, exclaves, gerrymandering, and "wastebasket divisions" of leftover territory can all create somewhat-arbitrary collections of distant pieces of land that are lumped together under one government roof. Using the term "division" in this sense avoids any ambiguity.
"Political region" could be misconstrued even more strongly, implying areas where certain factions (Conservative, Democratic, Taliban, abortion stances, whatever) have the strongest support. (I don't think it should be used that way, but I'm sure someone with strong opinions would if that were the chosen label for this category.) Qwertygiy (talk) 17:31, 10 November 2023 (UTC)Reply[reply]
My interpretation of Benwing's proposal was that "administrative region" was only to be used in cases where the administrative divisions are actually, officially called "regions". The local term for the administrative division (e.g. states, provinces, ...) will continue to be used in all other cases. If this is what is being proposed, it does not make sense to use the term "administrative division" in this context. @Benwing2 correct me if I am wrong here.
In any case I support the proposal as written. This, that and the other (talk) 00:13, 11 November 2023 (UTC)
@This, that and the other Yes, that's exactly right. Benwing2 (talk) 00:30, 11 November 2023 (UTC)
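The categorization rule as clarified in this thread can be sketched in a few lines of Python. This is only an illustration, not the logic of {{place}} or its module: the function name is invented, and the set of countries is a hypothetical stand-in for whatever list would actually be maintained.

```python
# Hypothetical stand-in: countries whose political subdivisions are
# officially called "regions" (both are mentioned in the proposal above).
COUNTRIES_WITH_ADMIN_REGIONS = {"France", "Turkmenistan"}


def region_category(placetype, country):
    """Return the category name for a region-type place, or None.

    "region" (geographic/cultural) is recognized for every country;
    "administrative region" only where such political regions exist.
    """
    if placetype == "region":
        return f"Regions of {country}"
    if placetype == "administrative region":
        if country in COUNTRIES_WITH_ADMIN_REGIONS:
            return f"Administrative regions of {country}"
        return None  # not recognized as a political entity here
    raise ValueError(f"not a region placetype: {placetype!r}")
```

Under this sketch the Loire Valley, declared as a "region", would land in "Regions of France", while Normandy, declared as an "administrative region", would land in "Administrative regions of France". A country outside the list gets no administrative-region category, and its states or provinces keep their local term.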

WT:PREFS show of hands - is anybody using it?

This page, which has pride-of-place linkage from the third position of our sidebar, has become very dusty. I just went through it and removed all the preferences that were non-functional or had been superseded by a gadget. I ended up removing around half the options. A number of those that remain are quite trivial visual changes, like relocating {{was WOTD}} to a different position on screen, indenting {{also}} more than usual, and changing the shape of bullets in the Monobook skin.

If WT:PREFS is as little-used as I believe it is, I'm inclined to decommission it entirely after creating gadgets for any preferences which are actually considered useful. Note that logged-out users can use gadgets via WT:Preferences/V2, which is linked via a "Preferences" link in the top-right of the page for these users. On the other hand, if a large number of the WT:PREFS preferences are in use by various people, it would be possible to un-phase-out the page and improve its visual appearance.

What I'm asking is:

  1. Does anybody use any of the features at WT:PREFS? If you're unsure, the answer is most likely "no". Anyone who uses these features would periodically have to re-enable them when switching computers or browsers - as the title implies, these preferences are per-browser and are not saved in your Wiktionary account.
  2. If yes to (1), which features do you currently have switched on? Do these features matter enough to you that you would like them to remain available?

This, that and the other (talk) 11:44, 10 November 2023 (UTC)Reply[reply]

I do have settings turned on from there, but they really should be moved. Vininn126 (talk) 12:18, 10 November 2023 (UTC)
Having checked, however, none are turned on. I do remember in the past I had some turned on, but I don't remember which. Vininn126 (talk) 12:46, 10 November 2023 (UTC)
Nope. Equinox 12:34, 10 November 2023 (UTC)
No, but if I'd noticed they were there I'd have tried some of them out, and will do so now, specifically:
Edit sections without going to the edit screen. (Same as Ædit?)
Enable audio recording tool. (doesn't work AFAICT. See User:Yair rand/AddAudio.js.)
Filter watchlist and recent changes to only show changes for certain languages. (not compatible with enhanced watchlist and recent changes)
Add a button next to the search box to simplify inputting special characters. (now only works for he, eo, and ru)
I didn't yet look to see whether there are gadgets for these. DCDuring (talk) 17:40, 10 November 2023 (UTC)
Above are superseded or non-functional. (See comments in parentheses for examples.) DCDuring (talk) 20:05, 10 November 2023 (UTC)
Good point about AjaxEdit. I removed that from WT:PREFS. This, that and the other (talk) 00:02, 11 November 2023 (UTC)
I'm not using it. Of the various prefs listed there, "Hide the copyright warning in the edit window" seems like something we should not be allowing people to do. For "Disable the javascript redirect between pages that differ only in case", plausibly useful to people adding e.g. German nouns, maybe we should (or already do?) have a gadget that allows people to turn off auto-redirection in general. - -sche (discuss) 17:57, 10 November 2023 (UTC)Reply[reply]
Ha, never realized it existed until now. — Sgconlaw (talk) 18:40, 10 November 2023 (UTC)
Strongly agree re the copyright warning - why the hell was that even added in the first place? Theknightwho (talk) 19:23, 10 November 2023 (UTC)
There's a copyright warning on Wikt? CitationsFreak (talk) 20:15, 10 November 2023 (UTC)
MediaWiki:Copyrightwarning Justin (koavf)TCM 20:19, 10 November 2023 (UTC)
I have used it before, but some don't work and it doesn't persist. Since it's a failing kludge, it needs to be replaced with proper gadgets, where available. That old record audio in the browser one was great, but no one seems able or motivated to fix it. :/ —Justin (koavf)TCM 19:29, 10 November 2023 (UTC)
Yes, I've used it, though most of the ones I had selected became gadgets and I switched to using those. I currently have the following selected, though neither is that important to me or works properly:
  • Enable audio recording tool
  • Filter watchlist and recent changes to only show changes for certain languages
Andrew Sheedy (talk) 20:11, 10 November 2023 (UTC)

This is great input, thanks all. The feedback is more or less as I expected. I've proposed an action (convert to gadget or remove) for each WT:PREFS feature at User:This, that and the other/WT:PREFS dispositions - please take a look and speak up if you have strong feelings (or edit that page directly if you like). This, that and the other (talk) 00:14, 11 November 2023 (UTC)Reply[reply]

This is too small a sample for action. I have added comments on the user page, which probably should be a temporary?/permanent? project page. DCDuring (talk) 14:11, 11 November 2023 (UTC)Reply[reply]
My intent is to leave this discussion open for a month or so, noting that it is linked from a prominent box on WT:PREFS. So I invite further comment from anyone.
Also, thanks for your input on the user page. This is intended to be a time-limited project, so we may as well leave it where it is, I feel. This, that and the other (talk) 21:32, 11 November 2023 (UTC)
I think that some of the wishes embodied in these 'Preferences' items are pretty good. I don't have any idea how difficult it would be to implement them durably, nor whether they will catch the interest of those with the capabilities to implement. DCDuring (talk) 00:21, 12 November 2023 (UTC)Reply[reply]

Stenoscript

We have around 100 entries for abbreviations used in Stenoscript, an English shorthand system. They are placed in English sections but often lack a valid PoS header due to their flexibility (e.g. ndv, rsp). How should we approach these terms? I am not sure how many works are published in Stenoscript (apart from manuals) that count towards attestation. If they are kept, they should all have some kind of header and proper categorization. (Pinging @Kwamikagami.) Einstein2 (talk) 18:33, 11 November 2023 (UTC)Reply[reply]

@Einstein2 We have had problems before with the quality of Kwami's entries and I think this very discussion came up previously. We either need to use proper headers and categories or delete the entries. Benwing2 (talk) 04:20, 12 November 2023 (UTC)Reply[reply]
We have several POS headers that do not correspond to parts of speech as the term is commonly understood, such as Ligature and Symbol. Some assignments are artificial; for example, calling the Japanese character a "syllable" is an act of desperation. A few are language-specific, such as Kanja. The list of allowed headers is not frozen; new additions can be proposed. (Determinative was added in 2022.) The main issue with Shorthand is the scarcity of applicable entries. While it could theoretically also do duty for other languages, non-alphabetic shorthand notations are practically inaddible. One possible approach is to classify, say, ak under the POS headers of Noun (act) and Verb (acknowledge) and add as Usage notes to e.g. the verb entry, “This shorthand can also be used for related words (acknowledges, acknowledged, acknowledging, acknowledgement, etc.)”.  --Lambiam 21:11, 12 November 2023 (UTC)

I've seen stenoscript used to abbreviate scattered words in handwriting, though not in printed works. I think it's worth listing them, though agree that we need some formal solution to the POS header problem. kwami (talk) 21:54, 13 November 2023 (UTC)Reply[reply]

Somali Orthography

The Somali Latin alphabet is remarkably phonemic, with the exception of pitch accent and the front/back vowel distinction.

In my opinion, an umlaut should be used to distinguish /æ/, /ɛ/, /ɪ/, /ɞ/ and /ʉ/ from their "tense" counterparts /ɑ/, /e/, /i/, /o/, /u/. The same should obviously apply to long vowels, which are simply written as two vowels in a row, as in Finnish.

Pitch accent can be phonemically described with an acute diacritic, although it has three different phonetic realizations: high, low, and falling.

Just like in Latin and Ancient Greek, these diacritical marks should not be used in page names, but within the pages themselves.

This is obviously very difficult due to the surprising lack of written somali sources. But I do believe it will have to be done eventually.

Ελίας (talk) 22:36, 11 November 2023 (UTC)Reply[reply]

Somali has a standard orthography. It never distinguishes the tense and lax vowels. Pitch accent is marked only sometimes (I believe the grave is always used, but maybe that's only for some pitches, with acutes for others).
We shouldn't be creating a new orthography for a language that already has one. Thadh (talk) 22:59, 11 November 2023 (UTC)Reply[reply]
IMO what User:Ελίας is asking for is not a new orthography but diacritics to identify important vowel distinctions. I would be surprised if there are no dictionaries that contain such diacritics. Benwing2 (talk) 04:21, 12 November 2023 (UTC)Reply[reply]
@Benwing2: I'm yet to find one, which is very sad, because that was the main thing stopping me from editing the language. Thadh (talk) 09:11, 12 November 2023 (UTC)Reply[reply]
My proposal is not to create a new orthography; it is simply to use diacritics within the pages themselves and not in the page name.
Compare an Ancient Greek entry such as "θυμός", which you can look up without using the macron that indicates vowel length, but which has the form θῡμός within the page itself.
This is reinforced by the fact that, as far as I know, the Ancient Greeks didn't indicate vowel length, with the exception of eta and omega.
Ελίας (talk) 09:41, 12 November 2023 (UTC)
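The split between the form shown on the page and the page name can be sketched in a few lines. This is only an illustration in Python under stated assumptions, not a description of how Wiktionary's own modules work: the function name `page_name` and the two mark sets are hypothetical, invented for this example.

```python
import unicodedata

# Hypothetical mark sets, chosen for illustration only.
GREEK_LENGTH_MARKS = {"\u0304", "\u0306"}      # combining macron, combining breve
SOMALI_MARKS = {"\u0300", "\u0301", "\u0308"}  # grave, acute, diaeresis (umlaut)


def page_name(display_form: str, marks: set[str]) -> str:
    """Drop the given combining marks from a display form to get a page name."""
    # Decompose so that each diacritic becomes a separate combining character.
    decomposed = unicodedata.normalize("NFD", display_form)
    # Keep everything except the marks that are not part of the page name.
    kept = "".join(ch for ch in decomposed if ch not in marks)
    # Recompose so that remaining diacritics (e.g. the Greek accent) stay intact.
    return unicodedata.normalize("NFC", kept)
```

Under these assumptions, `page_name("θῡμός", GREEK_LENGTH_MARKS)` gives θυμός: the macron is removed while the accent is kept. A Somali form carrying the proposed diacritics would map back to its standard spelling in the same way when passed with `SOMALI_MARKS`.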
I agree with your proposal, FWIW. Benwing2 (talk) 09:47, 12 November 2023 (UTC)Reply[reply]
Ancient Greeks didn't, but later scholars did, long before Wiktionary came along. If you can find any diacritics already in use (by anyone) outside of Wiktionary, you would absolutely get my support, but otherwise - no.
And @Benwing2: This situation is equivalent to adding diacritics to vowels in English to indicate what phoneme they represent. Thadh (talk) 11:01, 12 November 2023 (UTC)Reply[reply]
@Thadh Well, in fact, there are several dictionaries that do just that for English. Benwing2 (talk) 11:27, 12 November 2023 (UTC)Reply[reply]
Well, there aren't for Somali, not that I know of. I would be pleasantly surprised to be proven wrong. Thadh (talk) 11:30, 12 November 2023 (UTC)Reply[reply]
@Thadh How about scholarly papers? I would think some of them would use notation like this to indicate the pronunciation. How do Somali dictionaries indicate pronunciation? Benwing2 (talk) 12:23, 12 November 2023 (UTC)Reply[reply]
@Ελίας: Okay, I have found one grammar discussing ways to distinguish the two vowel series (Nilsson 2020); he gives the possibility of notating the tense vowels as Ä and Ą (so /e/ is <ë> or <ę>, and so on). Honestly surprised to find any mention of diacritics used for this, but both solutions seem fine to me. Thadh (talk) 12:31, 12 November 2023 (UTC)
(By the way, I must note that he doesn't use either of these systems in the entirety of his grammar, he only mentions them) Thadh (talk) 12:40, 12 November 2023 (UTC)Reply[reply]
I think it would be better to use only one of these, the umlaut. The reason for this is that, according to Nilsson, the "heavy" vowels are less common than "ordinary vowels", and also because the ogonek is less common on keyboards. (I must note, however, that according to Nilsson, /ɪ/ and /ɛ/ are ordinary vowels while /i/ and /e/ are heavy vowels) Ελίας (talk) 12:57, 12 November 2023 (UTC)Reply[reply]
@Ελίας: Yes, so <ä>, <ë>, <ï>, <ö>, <ü> for /æ/, /e/, /i/, /ɞ/, /ʉ/ respectively. Thadh (talk) 18:23, 12 November 2023 (UTC)Reply[reply]
@Thadh You're right, I got mixed up with Turkish, where ï is a back vowel. The umlaut should indicate the front series /æ/, /e/, /i/, /ɞ/, /ʉ/. Thank you. Ελίας (talk) 19:59, 12 November 2023 (UTC)
And on pitch, I think we should follow what John Saeed mentions in his grammar: <á> for high, <a> for low, <à> for falling. This seems to be the accepted standard for pitch notation. Thadh (talk) 12:46, 12 November 2023 (UTC)Reply[reply]
If you mean "Central Somali - A grammatical outline" by John I Saeed, then I couldn't find any mention of diacritics other than the acute accent. Also, it seems more parsimonious to use the fewest diacritics possible, as Somali apparently uses a mora-based pitch accent and not tonemes. The falling tone is phonemically a high-low sequence, so a long falling /ɑ/ vowel may be represented as "áa" instead of "àa". But if the consensus is, in fact, to use diacritics for the falling tone, then I agree that it should be followed, even if it is more complicated than it should be. Ελίας (talk) 13:17, 12 November 2023 (UTC)
@Ελίας: No, I mean Somali by John Saeed (1999) →ISBN. You should see what is used there (note that he mentions using the circumflex (â) for the phonological section only, so just skip over that, he then proceeds using the established practice). Thadh (talk) 18:16, 12 November 2023 (UTC)Reply[reply]
@Thadh "The first observation is that the perceived three tone system can be simplified to two units by treating falling tone (FG) as a sequence of High (H) and low (L) tone." (p. 18-19) He also makes use of this "phonemic orthography" in page 19: *góol > gôol, *náyl > nâyl. I think it would be thriftier to treat the falling tone as high-low. Ελίας (talk) 20:15, 12 November 2023 (UTC)Reply[reply]
@Ελίας: As I said, ignore his phonological section. Later in the book, grave is used throughout for where he uses a circumflex - gòol, nàyl. Thadh (talk) 20:58, 12 November 2023 (UTC)Reply[reply]
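For reference, the notation discussed in this thread can be written down as a small Python sketch. It assumes the pitch marking attributed above to Saeed (1999), namely acute for high, no mark for low and grave for falling, plus the proposed umlaut for the front vowel series. The names `PITCH_MARKS`, `FRONT_MARK` and `mark_vowel` are hypothetical and exist only for this example.

```python
import unicodedata

# Pitch marking as described in this thread: acute = high, none = low, grave = falling.
PITCH_MARKS = {"high": "\u0301", "low": "", "falling": "\u0300"}
# Proposed (not standard) mark for the front vowel series: combining diaeresis.
FRONT_MARK = "\u0308"


def mark_vowel(vowel: str, pitch: str = "low", front: bool = False) -> str:
    """Return the vowel letter with the umlaut and/or pitch diacritic applied."""
    if pitch not in PITCH_MARKS:
        raise ValueError(f"unknown pitch: {pitch!r}")
    # Umlaut first, then the pitch mark; NFC composes them where possible.
    marked = vowel + (FRONT_MARK if front else "") + PITCH_MARKS[pitch]
    return unicodedata.normalize("NFC", marked)
```

With this sketch, `"g" + mark_vowel("o", pitch="falling") + "ol"` yields gòol, the form cited from the book, with the mark placed on the first letter of the long vowel.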
@Thadh Then I agree, we should use the established practice. (As long as it is accurate, it doesn't make much of a difference to me.) If I am not mistaken, we have reached consensus. I am new to Wiktionary, and Wikimedia as a whole, so I am not quite sure of what needs to be done for this to become an official policy, but I would guess that a vote needs to be held? Ελίας (talk) 21:16, 12 November 2023 (UTC)Reply[reply]
@Ελίας I don't think a vote is necessary for these sorts of things. Usually votes are for policies that apply to Wiktionary as a whole or for giving someone new privileges e.g. admin/bureaucrat/bot owner. Changes for individual languages just need consensus from the relevant editors. Benwing2 (talk) 21:20, 12 November 2023 (UTC)Reply[reply]
@Ελίας: No vote needed. You should write WT:About Somali including a part about vowel marking in headwords (compare WT:AAA and other "WT:About ..." pages). Thadh (talk) 21:20, 12 November 2023 (UTC)Reply[reply]
@Ελίας no need for a vote. Unless there's some conflict with site-wide rules, the community of editors for a given language have quite a bit of autonomy. The main question is whether there are other people who work with Somali that haven't had a chance to weigh in. Chuck Entz (talk) 21:23, 12 November 2023 (UTC)Reply[reply]
I generally agree with Thadh. I would heavily caution against adding phonemic distinctions in the orthography that aren't used by natives and/or grammars in the language. We don't change our orthography in English here to add accent marks or to distinguish /θ/ & /ð/ or the many other inconsistencies in English. Same with a bunch of other languages. The headword line should match the orthography, while the Pronunciation section should show the actual pronunciation. There are languages that do have optional accent/tone/vowel markers like Hausa or Igbo, but those are also well-documented and can be found in dictionaries and such (and can be found occasionally with native writings), but if that's not the case with Somali, I would oppose any sort of additional change. AG202 (talk) 21:23, 12 November 2023 (UTC)
@AG202 English is a good example, actually. Wiktionary uses enPR in many entries. That's very similar to what I'm proposing. Ελίας (talk) 21:43, 12 November 2023 (UTC)Reply[reply]
@AG202: Not sure English is a good counterexample; there are dictionaries in English that use diacritics to indicate pronunciation, and the Book of Mormon does too. Also English orthography is heavily irregular while I gather Somali orthography is quite regular other than not marking tone or vowel "heaviness". Benwing2 (talk) 22:59, 12 November 2023 (UTC)Reply[reply]
@Benwing2: But my point is that even though some English dictionaries do that, we do not (nor do other major dictionaries afaik), and I haven’t seen anyone seriously propose that we include them on the headword line (and doubt that it’d have much support).
@Ελίας: If it’s just in the pronunciation section, then I’d support it, but in the headword line, no without other evidence. AG202 (talk) 23:10, 12 November 2023 (UTC)Reply[reply]
@AG202 Could you clarify what you mean by evidence? Ελίας (talk) 23:13, 12 November 2023 (UTC)
@AG202 I guess I don't see the relevance here. Some languages like Russian and other Slavic languages do include such markings, some don't. English and Somali are nothing alike. Benwing2 (talk) 23:27, 12 November 2023 (UTC)Reply[reply]
The analogy was to make an emphasis on what orthographies are actually used. In English, accent marks aren’t commonly used even in English dictionaries so we don’t put them in headword lines. We should apply the same standard and consistency to Somali and not invent things that aren’t used. That’s why I ask for sources to see if that distinction is actually made in multiple sources using Somali. AG202 (talk) 05:16, 13 November 2023 (UTC)Reply[reply]
@AG202 IMO this is not a good analogy because English lexicography has well-established practices while I doubt the same can be said of Somali. I think a better comparison is to other African languages and their treatment in Wiktionary. From looking through Module:languages/data/2, quite a number of African languages have extra diacritics, although I don't know what the diacritics stand for. Benwing2 (talk) 05:28, 13 November 2023 (UTC)Reply[reply]
The other African languages that I can speak of like Igbo and Hausa (as mentioned in my initial comment) have explicitly optional diacritics and characters that are cited in their standard orthographies and can be seen in multiple dictionaries and vocabularies (with input from natives). This also can be seen in materials for non-natives. I don’t know if the same can be said for Somali from what I’ve seen. AG202 (talk) 05:51, 13 November 2023 (UTC)Reply[reply]
@Benwing2: Somali has tons of dictionaries. Here are some examples of how they handle these vowel diacritics.
  • De Larajasse (1897): pitch diacritic (acute for any) if not penultimate
  • Nakano (1976): no diacritics
  • Farah (1992): no diacritics
  • Farah (1995): no diacritics
  • Awde (1999): no diacritics
  • Adam (1999): no diacritics
  • ESL dictionary (2001): no diacritics
  • Puglielli (2012): no diacritics
I think this is pretty clear, no? I personally am partial to pitch marking, which is used by many grammars, but I am yet to see these tenseness marks in use. Thadh (talk) 12:42, 13 November 2023 (UTC)
@Ελίας The above is what I meant by evidence. Checking some other resources as well:
  • A Somali Newspaper Reader (1984): No diacritics
  • Colloquial Somali (1995): Pitch diacritics, acute
  • Somali (Saeed, 1999): Pitch diacritics (acute for high/stressed, grave for falling, and no accent for low), stating the following on the vowel distinction:

The relationship between the sets of front and back vowels is interesting. Firstly they are not simply phonetically conditioned variants and thus are not allophones in classical phonemic terms. Individual members of the major lexical categories, for example nouns, verbs and adjectives, must occur with a specific vowel quality and there are a number of minimal pairs […] However such minimal pairs are very few and for the most part the back/front distinction is important for correct pronunciation but not for distinguishing lexical meaning.

Looking at some orthographic examples here and especially the grammar from the language committee, there's no mention of umlauts or different orthographies for vowel qualities. I could also take a look at Af Soomaali Aan Ku Hadalno (hadallo), but also I'm not going to purchase it just for this, and I expect that it'd have the same orthography. Thus far, there's only been one source that's suggested umlauts but doesn't even use them itself, which is very telling. There are plenty of Somali writings to look at; you just have to be willing to take the effort to look for it since it's not as accessible online for various unfortunate reasons. It's not lacking literature at all.
Overall, again, I do not think that we should impose an orthography that's used very rarely (if at all). Some languages do not show phonemic distinctions in their orthography and that's fine; English doesn't either. That does not mean that we should add distinctions that aren't there in even optional spelling. It should only be added to the pronunciation section. I would also remove this addition to WT:About Somali as there hasn't been a consensus to add it. AG202 (talk) 16:29, 13 November 2023 (UTC)Reply[reply]
@AG202 Once again, I would like to clarify: I am not proposing an orthographic reform for Somali. I am only proposing a form of phonemic notation, just like enPR, to be used on Somali entries. I named the discussion "Somali Orthography" because I wanted to emphasize the shortcomings of Somali orthography, and also because I wasn't exactly sure of what my project was at the time. Ελίας (talk) 17:15, 13 November 2023 (UTC)
Then yes, I’d support a change in the pronunciation section, but what we have currently at WT:About Somali about changing the headword line cannot stay. AG202 (talk) 17:39, 13 November 2023 (UTC)Reply[reply]

Gay slang vs. LGBT slang

We currently have a valid {{lb|en|gay slang}} label with the corresponding category tree. On the other hand, a relatively large number of definitions are tagged as {{lb|en|LGBT|_|slang}} or {{lb|en|LGBT|slang}} as there is no valid "LGBT slang" label. Wikipedia considers the two terms synonymous. However, Category:English gay slang states: "English slang terms whose usage is typically restricted to homosexual people."

I suggest renaming the label and category to "LGBT slang" and revising the category description as I don't think there is a distinct slang vocabulary "restricted to homosexual people" (as opposed to other LGBT communities). This would also enable the proper categorization of the entries mentioned above. There is also Category:Transgender slang by language, which could potentially be a subcategory. Einstein2 (talk) 23:26, 12 November 2023 (UTC)Reply[reply]

Hmm, I like your proposal for a new category, but why not leave the gay slang category intact, and make both that and transgender slang children of the new LGBT slang category?
There is definitely some specifically gay slang .... note the easily missed Category:Polari subcategory at the top there. Most of those words, such as eek, are opaque to modern readers, but we categorize old words just as much as new. Additionally I would say that there are some words that, while they might be known to other communities, are still gay slang because they refer specifically to gay people .... more of those terms are for men right now than for women.
Thanks, Soap 09:38, 13 November 2023 (UTC)Reply[reply]
I support the notion of a new Category:LGBT slang (which we should define as accommodating all of 2SLGBTQQIA+) with more specific subcategories, both language-wise and regarding sexual and/or gender identity.  --Lambiam 15:41, 13 November 2023 (UTC)Reply[reply]
Support - makes sense to have this. Theknightwho (talk) 20:45, 13 November 2023 (UTC)Reply[reply]
  Support but what do we do with the category Category:English gay slang? Most of the terms seem to be either LGBT or gay-male-specific, which suggests renaming the latter to Category:English gay male slang; but then someone knowledgeable will have to go through and recategorize the existing terms appropriately. Benwing2 (talk) 21:45, 13 November 2023 (UTC)Reply[reply]

Proper definition of when Old Galician-Portuguese ends and Galician/Portuguese starts.

This is User:MedK1 and I feel like it's about time we define this once and for all. Paging @Stríðsdrengur as seemingly the only user who has edited an OGP page recently.

At some point, we need to draw a firm line. The Galician pages with quotes from the 13th century aren't okay. Portuguese obsolete forms like muyto are subject to the CFI like everything else in the language (it's a WT:WDL, after all), and if somebody RFVs it (which I might soon), I don't doubt it'd fail.

That word and some others were most definitely much more used back in OGP's time, and so, properly adding them to OGP would greatly improve its word-count (currently depressing) and would prevent any interesting information from being removed out of Wiktionary due to limited attestation.

I believe these are the places we could draw the line:

This was supposed to be an exhaustive list. I'm partial to 1516 and 1536 myself; I feel like their reasons are the most strong, and Pero's writing in 1500 has a distinct lack of Renaissance (read: fancy) spellings compared to what we imagine as the start of Modern Portuguese; it's distinct from the 1516 and 1536 spellings as linked above, and worlds apart from the 1789 and 1890 dictionaries that I left... somewhere here in Wiktionary. Thoughts? 2804:18:7B:CB71:1:0:5BBF:CAD5 13:16, 13 November 2023 (UTC)Reply[reply]

@MedK1 I am in agreement that we should not use quotes from Old Galician-Portuguese to illustrate modern Galician terms. User:Nicodene and/or User:Ultimateria might have thoughts as general Romance contributors, and there are various contributors to modern Portuguese who may want to weigh in; otherwise I think whatever you think is best is fine. Benwing2 (talk) 20:34, 13 November 2023 (UTC)Reply[reply]
This was also discussed a few months ago. I defer to the views expressed by @Froaringus and @Sarilho1 on the matter. Nicodene (talk) 21:44, 13 November 2023 (UTC)Reply[reply]
Adding some comments from linguistic sources:
'Most historians have considered Galician and Portuguese as varieties of the same language until around the 14th century [...] Towards the beginning of the 15th century, Galician and Portuguese already show some noticeable phonological differences [...]' - Martínez-Gil 1997, 'Word-final epenthesis in Galician', p. 332 in Issues in the phonology and morphology of the major Iberian languages.
'[...] from the 15th century, when the increasingly impermeable political frontier went up between Galicia and Portugal, Galician lost contact with its sister tongue, Portuguese.' - Hermida 2001, 'The Galician speech community', p. 115 in Multilingualism in Spain: Sociolinguistic and psycholinguistic aspects of linguistic minority groups.
'We shall use the term Galician-Portuguese (GP) for the medieval varieties spoken in Galicia and Portugal until roughly the Renaissance, although some consistent differences already existed during the late Middle Ages (Maia 1997). Following this period, we shall speak of Galician (Glc.) and Portuguese (Pt.) as different languages...' - Dubert & Galves 2016 'Galician and Portuguese', p. 412 in The Oxford guide to the Romance languages.
Nicodene (talk) 22:37, 13 November 2023 (UTC)Reply[reply]
Very informative! Thanks for the link to the other topic; I see now that the man that was called the last remnant of Old GP is Gil Vicente, not Garcia de Resende. I find it pretty interesting that both of them died in 1536 though.
I did some digging through a few of the sources you've presented, and I wasn't able to figure out which differences they're talking about exactly. I was aware of slight differences in spelling trends such as -m/-n, but surely that can't be all, can it? When I think of the modern-day languages, the major, consistent phonological differences I can think of are the devoicing of G/J to X and the pronunciation of C/Zs as a dental fricative. I don't think it's a coincidence that Castilian Spanish shares the same features though; the simplest explanation is that they developed the sound changes at relatively the same time: the middle of the 16th-17th centuries[6]. However, that's past the 'beginning of the 15th century' deadline brought up by the scholars above. Since they can't be talking about these relatively big changes (that still don't even really affect comprehension), I'm not certain about what they might be alluding to.
Replying to points raised in the previous topic, I'm not at all happy with drawing the line at any point before 1400, and especially not at 1300. The amount of obscure lemmas that would have to be created for the modern languages and the amount of terms that wouldn't be able to be represented because they don't pass CFI (pre-Renaissance spellings were an anything-goes situation) would be ludicrous. I agree with Froaringus when he says "That was perhaps the last political opportunity for [Old] Galician-Portuguese to maintain its unity", but the part I agree the most with is his wording: "political opportunity". Ferdinand I's relinquishment of Galicia was a political move, and while it obviously had consequences to the language used in both territories, they can't and couldn't have been immediate. To draw the line over there is to draw the line with a political rather than linguistic base, which is exactly what I, Benwing and other people were voting against just now concerning Serbo-Croatian.
I mentioned consonant devoicing and C/Zs as some of the biggest phonological differences between the two languages. Both features were being 'developed' starting in the middle of the 16th century. I couldn't help but notice that the 1536 figure is pretty close to that timeline-wise. It seems to be a pretty consistent beat when it comes to differentiating Portuguese and Galician linguistically; what with the death of two notable (Old) Portuguese cultural figures, the confection of a Renaissance-styled dictionary and the beginnings of notable phonological changes in Galicia. MedK1 (talk) 00:40, 14 November 2023 (UTC)Reply[reply]
@MedK1 I agree that the big phonological differences between Galician and Portuguese that are shared with Spanish are unlikely to be coincidental; Occam's Razor would dictate that they are due to mutual influence. Benwing2 (talk) 02:14, 14 November 2023 (UTC)Reply[reply]
Hi. I'll insert myself here.
First: Most Galician philologists don't consider OGP a different language, but a different phase of the language. Galician medieval written production (tens of thousands of loose parchments, hundreds of books of inventories and the like; and then the general prose, to the exclusion of the lyrical production, which is a common endeavour) is roughly equivalent in size and quality to the Portuguese one, and is usually studied as an integral part of the curriculum of the language, not as, say, Latin, which is an ancestor language. Again, both literatures and written traditions have had their own autonomous life at least since the middle of the 14th century, and you shouldn't forcibly strip a language of its literature.
Second: Main early spelling differences (since the 13th century): Galician <nn>, <ñ>, <i>, <y> vs Portuguese <nh> for the palatal nasal; Galician <ll>, <l> vs Pt. <lh> for the palatal lateral; Galician <i>, <y> for the vowel vs. Pt <i>, <y>, <h>. It is important to note that early Portuguese is much more homogeneous than early Galician, both in spelling and in the admission of dialectal features, because of the Portuguese royal chancellery. Also, some differences between the two varieties are already perceptible from the 13th century, both in vocabulary and, most notably, in verb conjugation, e.g., Galician disso/disse (MG dixo) vs. Pt disse 'he/she said', Galician quisiste/quisische (MG quixeches) vs Pt. quisiste 'you wanted'...
Third: current MAIN phonetic differences between both languages with an old origin (off the top of my head!):
- Galician loss of the phonemic opposition b / v (also affects northern Portuguese): attested since the 13th century, notable since 1400 (baca instead of vaca, "cow", since at least 1406).
- Galician devoicing and collapse of fricatives, etc, so there is no /ʃ/ vs. /ʒ/ opposition, and /s/ vs /z/ collapsed in the west but /z/ > /θ/ in the east (notable since 1400: sexa instead of seja, "it may be", attested since 1270; marso instead of março or marzo, "March", since 1314).
- Galician plurals of -l ended words: animal > animaas > animás (in the East and the standard norm animais) vs. Pt. animais: since the late 14th century (rayaas "royals (a coinage)" 1391, oficiaas "officials" 1394).
- Galician loss of phonemic nasal vowels (ã > a in the East, ã > /aŋ/, /aN/ in the West). Since the 13th century, but most notable since 1400 and responsible for a good deal of divergent nouns and verbs between Pt. and Gz.: G umha Pt uma < OGP ûa, G engadir < êadir "to add", G sandar Pt sarar < OGP sãar "to cure", Gz servidume Pt servidão < OGP servidûe < Lat servitūdinem...
- Galician result of -ano > OGP -ão: MG irmão, irmãos > WG (irmaan >) irmán, irmáns CG/EG irmao, irmaos (vs. PT irmão) "brother, brothers": Most notable since 1400, but, for example yrmaan "brother" is already attested in 1338.
- Portuguese confusion of -ão / -am / -om > -ão/-am (since the 15th century?) vs. its absence in Galician: Pt. eles comeram 'they ate' vs Galician eles comeron 'they ate'.
- Portuguese <ch> /tʃ/ > /ʃ/, but Galician still /tʃ/ (also residually in N Portugal: 18th century?)
- Galician "gheada", /g/ produced as an aspirate (or regionally as /k/ after a nasal): 18th century.
- Pär Larson (2018) La lingua delle cantigas: grammatica del galego-portoghese. 9788843093953.
(I'll add more at home) Froaringus (talk) 14:29, 14 November 2023 (UTC)Reply[reply]
- Clarinda de Azevedo Maia (1986) Historia do Galego Português
- Fernando Venâncio (2019) Assim nasceu uma língua. 978-989-702-510-5.
- Ramón Mariño Paz (2017) Fonética e fonoloxía históricas da lingua galega. 978-84-9121-187-7.
- Xosé Manuel Sanchez Rei (2021) O Portugués esquecido. O galego e os dialectos portugueses setentrionais. 978-84-8487-537-6.
I should add here that Portuguese grammarians since the 16th century have treated Portuguese and Galician as different languages, not because Galician was influenced by Spanish, but because Galician felt distinctly rural, unsophisticated, and archaic to them. Froaringus (talk) 17:06, 14 November 2023 (UTC)
- Rübecamp, Rudolf (1932), “A linguagem das Cantigas de Santa Maria, de Afonso X o Sábio”, in Boletim de Filologia, volume I, pages 273–356
-Vaz Leão, Ângela (2000), “Questões de linguagem nas Cantigas de Santa Maria, de Afonso X”, in Scripta[7], volume 4, issue 7, →DOI, retrieved 16 November 2017, pages 11–24
This latter author wrote (page 15, my translation) in reference to the language used in the Galician-Portuguese lyric tradition: "13th century literary Galician-Portuguese still constituted a unity, albeit an unstable one. Certainly the common spoken use was showing the future bifurcation into Galician and Portuguese. The same was happening to the literary language: inside that artificial unity the presence of advanced notices of separation can be found". Froaringus (talk) 17:33, 14 November 2023 (UTC)
Sorry guys. Two other references:
- Rudolf Rübecamp (1930) Die Sprache der altgalizischen Cantigas de Santa Maria von Alfonso el Sabio
- Clarinda de Azevedo Maia (1996) "O Galego-Português medieval"
In the latter article the author defends the existence of a Medieval Galician-Portuguese language that goes beyond the lyrical Galician-Portuguese tradition (this existence is disputed), although she recognises that there are growing differences, appreciable since the XIII century, that grew during the XIV and XV centuries: "beyond a large common base, Galician documents show some specific evolutions that eventually will constitute true Galician innovations". Later "Since this common Galician-Portuguese phase the variants will follow different "historical pathways": of this separation of Galician and Portuguese are responsible some alterations of historical and political origin happening in both territories, some of them belonging to prior times but which let feel its linguistic consequences most notably from the middle of the XIV century and beginnings of the next century." Froaringus (talk) 18:29, 14 November 2023 (UTC)
Summarizing some historical facts:
- Galician-Portuguese evolved from Vulgar Latin in what Joseph Piel called Magna Galicia, that is, Galicia and northern Portugal. Pre-Latin Western Indo-European languages acted as substrate. The Germanic languages of the Sueves and Goths acted as early adstrata.
- The Arab invasion of the Iberian Peninsula produced a partial depopulation of the Douro river valley. Many bishops of what is today Portugal fled to Galicia, among them those of Coimbra, Lamego, Dume and Braga. The Arab presence in Galicia was ephemeral, if any at all. During the next century Galicians "reconquered" and repopulated much of northern Portugal: [ https://revistas.ucm.es/index.php/RFRM/article/view/61690 "Ad populandum": toponímia e repovoamento no sul da Galiza alto-medieval]
- During the 9th and 10th centuries Galicia and N Portugal constituted a unity, governed sometimes by kings of their own; the foundation of Santiago de Compostela as a pilgrimage centre brought people and culture from north of the Pyrenees.
- In the late 11th century Galicia was awarded to Count Raymond of Burgundy as a personal fiefdom. He aspired to succeed the king, but died before him. In any case, his son, the future Alfonso VII, was given the Kingdom of Galicia with the title of king. At the same time, his cousin Henry was crowned as first king of Portugal. Alfonso was supported by Galician noblemen and the archbishop of Santiago de Compostela. Henry was supported by the Portuguese nobility and by the recently reinstituted archbishopric of Braga. Galicia was divided in two: Portugal expanded south while Galicia, united with León, managed to maintain their independence from Castile. Both kings of that century lie in the Royal Pantheon of the Cathedral of Santiago de Compostela.
- Probably under the growing French influence, Galicia and Portugal developed a lyrical tradition similar to that of southern or northern France, or Sicily.
- By 1230 Galicia, León and Castile were united under a single king, each country maintaining the title of kingdom. Alfonso X became the most important patron of this Galician-Portuguese tradition. In 1290 Galician-Portuguese is first mentioned as a language of culture, by the Catalan Jofre de Foixà, courtier of the king of Sicily. The language is called, simply, gallego, Galician.
- During the 14th century, Galicia fought, unsuccessfully, for alternative kings to those who finally reigned in Castile. By then, the language was already known internally as galego and divergence with Portuguese was more evident. Galician noblemen paid for the translation of books based on the Roman of Troy, king Arthur, etc, and production of works of history.
- During the 15th century Galicia became impermeable to the Royal power, and rivalry between noble families led to, for example, one knight taking prisoner a bishop, for months, at least at two different times with different protagonists. A series of revolts ended circa 1470 with a true revolution that destroyed most castles along the country. Sadly the revolution was defeated by the lords who, anyway, would be also eventually defeated by the Catholic Monarchs, who implanted a Royal Audience, as body of government and justice of the kingdom of Galicia, under the authority of a governor, also president of the Audience and General Captain with vice-royal powers (which is actually the structure used by the Spanish Empire later in the Americas and the Philippines). The interlocutor of the Governor was the Junta del Reino de Galicia, a representative assembly whose deputies were nominated by the cities. As a result, the most important noblemen were forced to go to Castile to work for the kings, and the economic powers of the many Galician monasteries were put under Castilian rule. So, most nobles and monasteries stopped issuing documents in Galician, and by 1530 Galician was seldom used in legal documents (beyond personal and place names, and concepts with poor translation into Spanish) but just in private letters, songs, theatre... At that time, the first Portuguese grammarians wrote their works, acknowledging Galician and Portuguese as two different languages.
So, 1500 could be OK after more than two centuries of cumulative divergence, but keeping in mind that:
- at least for Galician studies, Old Galician or Galician-Portuguese is a period rather than a different language.
- whenever a big fish and a small fish are put together, the bigger one tends to eat the little one or make it disappear.
Froaringus (talk) 21:12, 14 November 2023 (UTC)
Wow, that's a lot of lines to read. Thank you for this and for all the references!
I don't have a lot to add, but I'd like to make some comments regarding your concerns at the very bottom of your post.
  • Portuguese philologists, just like Galician ones, see Medieval Portuguese as a period of one single language as well[8], just like "Classical Portuguese" right afterwards and the current "Modern Portuguese" period.
  • I see what you mean about big and small fish, but I believe you can rest assured none of that would happen here. As you mentioned, "Galician medieval written production is roughly equivalent in size and quality to the Portuguese one". They're both big fish in a big pond.
    • I actually think that it's more likely for the 'small fish' to 'disappear' if we keep the status quo, because medieval forms under the "Portuguese" L2 are subject to CFI. They're "safer" under the OGP L2.
Qwerty below me mentioned English being in a very similar situation to this, and some leniency being allowed for texts that fall "on the 'wrong side' of the line". With that and the fact Galician was actually already somewhat limited in usage by 1530 in mind (I didn't know that!), I too am perfectly alright with 1500 as the date. I think we've reached a consensus here! MedK1 (talk) 18:48, 15 November 2023 (UTC)
In Galician studies 1500 is, give or take, the limit most frequently used to define the end of Medieval Galician. In fact, even (less formal) late 15th century texts already sound and feel more like Middle Galician, but 1500 is certainly the most accepted date. Froaringus (talk) 08:49, 16 November 2023 (UTC)Reply[reply]
While I am unfamiliar with the particular details of this divide, I do think it's prudent to draw parallels with the boundary where Middle English becomes Modern English and Middle Scots -- coincidentally, it's exactly the same time frame. There are solid arguments for 1476 (the first printing press in England) and 1535 (the first complete Bible printed in English), so rather than weigh the merits of both, the OED simply splits the difference by giving a date of 1500 and allowing for some leniency if context indicates a text falls on the "wrong side" of the line. To my knowledge, we follow the same approach. Qwertygiy (talk) 02:52, 14 November 2023 (UTC)Reply[reply]
Any line between different historical periods of a language is arbitrary, and more of a smeared boundary. It's not like people were speaking OGP one year and Galician or Portuguese the next year. I've seen scholars in different languages use either historical events (e.g. the boundary between Old English and Middle English being around the Norman Conquest), or the time of publication of notable literary artifacts (per @Qwertygiy's examples for Middle vs Modern English).
Personally, I prefer the latter, because it's a concrete example of a coherent text that belongs to a particular language stage per scholarly consensus. It seems that your personal favorites are along that line too, so it seems like 1500 is a decent choice. From what I can tell (without having read the entirety of this thread), Froaringus is on board with that as well. Chernorizets (talk) 02:09, 16 November 2023 (UTC)Reply[reply]
Yep. Sorry for the wall of text. Given more time I could have come with something more compact and palatable, but I decided to act on the spot. And yes, I'm OK with circa 1500 :-) Froaringus (talk) 09:38, 16 November 2023 (UTC)Reply[reply]
@Froaringus based on what I read after posting here, it sounds like the period from roughly 1400-1500 was effectively transitional, where the divergence between Galician and Portuguese was smaller towards the beginning and more pronounced towards the end. I'd anyway expect a transitional period rather than a sharp boundary. Put another way, 1500 sounds like the approximate time past which it would be hard to justify talking about a single Portuguese-Galician language. If my understanding is correct, then 1500 still makes sense, but editor discretion will probably still be needed if one is quoting from a document written in, say, 1485. Chernorizets (talk) 02:44, 18 November 2023 (UTC)Reply[reply]
@Chernorizets There is a parallel in English; the end of Middle English is variously dated 1500-1550 AD, so a text from say 1525 could be assigned to either, I suppose. (For that matter, if we take the end of Early Middle English as when the case and gender system collapsed, it can be dated anywhere from c. 1200-1340 depending on the region ...). Benwing2 (talk) 08:08, 18 November 2023 (UTC)Reply[reply]
Yep, I agree. Still I'll add a pair of things, for completeness:
- Usually 1500 is the year Galician philologists would give as the end of the Medieval period, but legal documents from, say, 1520 would still be cited as Medieval for strictly historical reasons (essentially, Galician being displaced by Spanish as the language of law and administration since circa 1480). But the linguistic features of those documents usually show them to be Middle Galician.
- When you compare some documents, or even books, issued or published in northern Portugal, near the boundary of both countries, around 1500, these tend to bridge the gap. I mean, by then Galician and standard Portuguese had already been going their own separate ways for quite some time, but they existed in a dialect continuum.
So, yes, in my opinion some editor discretion will be needed around that year (both before and after). So, I also agree with Benwing2. Froaringus (talk) 10:47, 18 November 2023 (UTC)
By the way: can someone look into the flag attributed to Old Galician-Portuguese? It's a minor issue, but it should represent not just the county or kingdom of Portugal, but also the kingdom of Galicia. There're plenty of flags and coats of arms on wikimedia: see Kingdom of Galicia. Froaringus (talk) 17:55, 18 November 2023 (UTC)Reply[reply]

Turkish etmek verbs

Pinging Afb2011, Anlztrk, Flāvidus, Itidal, Johanna-Hypatia, Justthatboredguy, Lagrium, Moonpulsar, Newgrass 82, Orexan, PinkPanthress, Rd1978, Sabri76, Sedataltundal, Trimpulot, Whitekiko.

In Japanese, non-compound verbs are a practically closed class; very many verbs are a compound of a noun + する (“do”), for example 管理する (kanri suru, manage), literally “do management”. We have entries for some 7000 of such suru verbs. Per Wiktionary:About Japanese § Verb forms of nouns, these verbs do not have their own independent entries, but are accommodated together with the entry of the noun. (管理する is a hard redirect to 管理#Japanese.)

Turkish verbs do not form a closed class, but Turkish has an analogous construction of verbs that are a compound of a noun + etmek (“do”), for example idare etmek (manage), literally “do management”. We currently have entries for about 143 such etmek verbs, but the official Güncel Türkçe Sözlük of the Turkish Language Association has well over 1000 of these, from abandone etmek to zuhur etmek. Might it be an idea to follow the Japanese example? A user who, not knowing the verb tehcir etmek, encounters the phrase Türkiye’ye tehcir edilen Bulgaristan Türkleri will almost certainly begin by looking up the term tehcir. A further advantage would be that it is much easier to create entries for these etmek verbs by adding them after an existing entry for the noun than by creating a new page.  --Lambiam 12:20, 14 November 2023 (UTC)Reply[reply]

I don't think that would be a bad idea in principle, but how would we deal with verbs derived in such a way which change the spelling of the noun, like zannetmek from zan, or keşfetmek from keşif?
Trimpulot (talk) 13:49, 14 November 2023 (UTC)Reply[reply]
I don't think such verbs are relevant here. Their pages can stay just the way they are. Newgrass 82 (talk) 15:05, 14 November 2023 (UTC)Reply[reply]
Indeed. The suggestion made here applies solely to multi-word verbs in which the last word is etmek, separated in the orthography from the (unchanged) noun by a space.  --Lambiam 21:27, 14 November 2023 (UTC)Reply[reply]
The same applies for Korean 하다 (hada), Hindi करना (karnā), Persian کردن‎(kardan), but for some reason only Japanese gets the (IMO most sensible) treatment.--Saranamd (talk) 14:44, 14 November 2023 (UTC)Reply[reply]
Eh for Korean, as a Korean learner and having interacted with other learners, we're much more likely to look up the 하다 forms directly. It might also get weird in terms of clutter and then also the verbs/adjectives where the "noun" isn't used outside of the stem. Ex: 은은 (euneun). Korean dictionaries, as you know, also have separate 하다 entries. AG202 (talk) 20:22, 14 November 2023 (UTC)Reply[reply]
@AG202 There is a distinction between actually inseparable verbs/adjectives like 착하다 (chakhada) and ones like 사랑하다 (saranghada). 사랑하다 (saranghada) is clearly not actually one word because it can be split by particles and even entire NPs and adverbial phrases, e.g.
우리는 사랑을 참 아름답게 했다. (uri-neun sarang-eul cham areumdapge haetda., We loved very beautifully.)
Japanese has the same distinction, and the equivalents to 착하다 (chakhada) are given their own entries while the equivalents to 사랑하다 (saranghada) are grouped with the noun.
The reason the way Japanese does it is best is that when a noun gets updated (in terms of definitions, usage notes, etc.), the verb section is much more likely to get updated along with it if it's actually on the same page. Institutional dictionaries don't have this concern because they aren't reliant on volunteers and the people actually get paid to maintain consistency.--Saranamd (talk) 20:11, 15 November 2023 (UTC)
For Japanese, at least, the monolingual resources I'm familiar with all list definitions under the noun, indicating whether the noun can be used with suru for verb senses. Basically what we do here.
An example in the bilingual section of Weblio, for the noun 重視 (jūshi, “serious consideration; important regard”, literally “heavy + view”, noun, usable with する (suru) for verb senses): https://ejje.weblio.jp/content/%E9%87%8D%E8%A6%96 ‑‑ Eiríkr Útlendi │Tala við mig 20:24, 15 November 2023 (UTC)
I think Japanese gets this treatment because noun + する verbs are much more prominent in Japanese than noun + etmek verbs are in Turkish.
You're right that a person who sees "tehcir edilen" will probably go to the page tehcir if they're unfamiliar with Turkish, but tehcir etmek is already linked as a "derived term" in that page. Newgrass 82 (talk) 15:15, 14 November 2023 (UTC)Reply[reply]
This is true for tehcir, but not in general. The page for alay does not link to alay etmek, the page for ameliyat does not link to ameliyat etmek, the page for dans does not link to dans etmek, and so on. The noun fark lists ten derived terms and three related terms, but fark etmek is not among them. The transitive senses of this verb are not easily guessed from the common meaning of the noun.  --Lambiam 22:14, 14 November 2023 (UTC)Reply[reply]
Uplifting. Turkish pages getting a special treatment.
There is a special headword template in Japanese for those verbs: {{ja-verb-suru}}. See 監督
Needed here too?
Flāvidus (talk) 20:26, 14 November 2023 (UTC)
If we adopt this approach, we’ll create an analogous headword-line template for Turkish, perhaps named {{tr-verb-etmek}}.  --Lambiam 21:23, 14 November 2023 (UTC)Reply[reply]
I think this will do the job:
{{head|tr|verb|head={{PAGENAME}} etmek|third-person singular simple present|{{PAGENAME}} eder}}
 --Lambiam 21:44, 14 November 2023 (UTC)Reply[reply]
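For illustration only, here is a minimal Python sketch of the two strings that headword line would display for a given noun page. It assumes the page name is the bare, unchanged noun (so not fused verbs like zannetmek); the function name `etmek_headword` is hypothetical and not part of any Wiktionary module.

```python
def etmek_headword(pagename: str) -> tuple[str, str]:
    """Return (citation form, third-person singular simple present).

    Mirrors `head={{PAGENAME}} etmek` and `{{PAGENAME}} eder` from the
    template call above. Hypothetical helper, for illustration only.
    """
    noun = pagename.strip()
    return f"{noun} etmek", f"{noun} eder"


if __name__ == "__main__":
    # Example with a noun mentioned earlier in this thread.
    print(etmek_headword("idare"))
```

This only covers the space-separated case the proposal is about; fused forms such as keşfetmek would keep their own pages.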
Totally fine by me. I can start working on adding more "etmek" verbs in my free time. That template can make our job easier + maybe we can create a category for these verbs. Moonpulsar (talk) 23:44, 14 November 2023 (UTC)
No part of this proposal makes any sense. Regarding Japanese entries, I can't wrap my head around why they would decide to do something so strange as this. Structures like this exist in many languages. No amount of productivity can justify this abomination. Completely unnecessary for Turkish and has no academic basis. This part about a user looking up "tehcir" boggles my mind. So? That's like saying if someone didn't know the meaning of do business, they'd look up the word business first. Yeah, that's kinda how the human brain works. Should we just put "do business" under the page for "business" then? I don't see how that constitutes valid grounds for this proposal.
Plus, if we were to do this then surely we'd have to do something similar for derivative suffixes? bilgi (knowledge) and bilgili (who has knowledge, knowledgeable) for instance have precisely the same amount of semantic and pragmatic difference as "management" and "to manage" after all. And it doesn't even take a full word like etmek, just a couple letters extra. Most Turkish suffixes are highly productive too, derived entries are just taking up too much space. What about compound verbs formed with yapmak, like alışveriş yapmak, hata yapmak, yol yapmak, açıklama yapmak, egzersiz yapmak etc. We could just chop them all down and stuff them into the entries for their root words and simplify the whole thing. Complete nonsense. Orexan (talk) 07:20, 15 November 2023 (UTC)
It is clear that you are against the proposal, but I do not quite see what your objection is. It does make sense to me.  --Lambiam 19:22, 15 November 2023 (UTC)Reply[reply]
I agree with @Lambiam -- your opposition is plain, but your reasoning is opaque.
This proposal is specifically about multi-word compounds, where etmek is a separate word. This is not about suffixes. This is not about yapmak (although, in my ignorance of Turkish, I do not see why it couldn't be extended to that construction as well, so long as such constructions are lexically significant).
We did this for Japanese entries precisely because it makes sense -- Japanese can use all-purpose verb する (suru, to do) after almost any noun. There is no real value in creating separate entries for the nouns and then the verb forms using suru. There are exceptions, such as 愛する (aisuru), where the suru portion is analyzed instead as an integral part of the word, rather than as a separate addition, and for these we have full entries. This appears to be analogous to those cases in Turkish where the root noun has fused with the etmek, such as the zannetmek and keşfetmek examples above. For all other Japanese suru verbs, where the suru is deemed a separate word, we have entries just at the noun headword, and include a "Verb" section that describes how this noun works with suru to express verb senses.
If we instead have separate entries for the noun, and the noun + suru, we are forced to duplicate a lot of information, and we must ensure that all of the noun entries correctly point to the suru entries as well, for no appreciable gain in usability. ‑‑ Eiríkr Útlendi │Tala við mig 19:57, 15 November 2023 (UTC)Reply[reply]
@Orexan I don't understand your point about yapmak. As far as I've seen, verbs formed with it are almost exclusively SoPs: in your example of hata yapmak, hata is pretty clearly just the direct object of the verb, while in a construction such as idare etmek, the noun is just providing semantic meaning to the verb, which can still take a direct object. Moreover, wouldn't it be a good thing to move all the verbs derived with etmek or even olmak, when possible, under their noun component, since, as you said, derived entries are taking up too much space? The modality of doing that (whether by following the example of Japanese suru verbs, or by following the example of some Ottoman Turkish entries, such as قادر‎, which show these verbs as collocations, or by doing something else entirely) can be discussed further, but I don't see a reason not to embrace the general idea. Trimpulot (talk) 18:34, 16 November 2023 (UTC)Reply[reply]
This proposal is about how the Wiktionary data is structured, and does not have much to do with Turkish in particular, since, as someone said, there are a number of other languages with the same or similar syntax.
Being a native speaker doesn't entitle anyone to a free ticket to a privileged opinion, since the person who proposed this, and others here endorsing it, are knowledgeable and/or linguists.
As for the exceptions where etmek behaves differently, they only prove the rule; such exceptions exist in Japanese as well.
QUOTE https://en.wiktionary.org/wiki/Wiktionary:About_Japanese#Verb_forms_of_nouns "Note however that some verbs ending in する behave differently, such as 愛する and other verbs with one kanji plus する. See 愛する."

Maybe we could ping some active Japanese contributors and admins to ask why they did something as strange as this. :)
Maybe you could move this discussion to an admin level, if one exists.
As for me: Not that I study Japanese or any other language, yet I visit Japanese entries a lot, and I remember being glad that the data was structured as it is.
Flāvidus (talk) 12:55, 15 November 2023 (UTC)

@Lambiam Just FYI, there was an RFD proposal a while ago to delete all the Hindi करना (karnā) compound verbs as SOP but I objected and pointed out that many of them are not transparently derived from the base word. I don't think the proposal was to move them to the base word, but just to delete them. I am of two minds about whether to lemmatize them under the base term; for English we normally lemmatize phrasal verbs as such e.g. get up, take on rather than putting them under get and take, as some dictionaries do. At least for English this makes sense because common phrasal verbs often have a multitude of different meanings and there are a lot of phrasal verbs derived from common verbs like get and take, and putting them all under the base verb would get extremely unwieldy (as well as the fact that there's no clear structure for doing this in Wiktionary). This may work for Japanese because there appears to be only one light verb that most such verbal compounds are made from i.e. suru, but in Hindi there are several besides करना (karnā). Not sure about Turkish; are there others besides etmek? Benwing2 (talk) 09:26, 16 November 2023 (UTC)
Orexan mentioned compound verbs formed with yapmak, like hata yapmak. We currently list 19 such verbs, while the Turkish Wiktionary has 115 entries, but the only ones the two Wiktionaries have in common are ağda yapmak, banyo yapmak, sörf yapmak and şaka yapmak. They are unlike the etmek verbs in that, as far as I see, they are all intransitive; the object slot of the transitive verb yapmak is already taken up by the first component, like in English do battle, do business, make amends and make conversation. Also, some of those listed are in my opinion just a sum of parts; for example, araştırma yapmak is as transparent as English do research.  --Lambiam 11:02, 16 November 2023 (UTC)Reply[reply]
PS. I just realized one can also use bir şaka yapmak, bir hata yapmak or hatalar yapmak, which sows some doubt in my mind we are dealing with honest verbs here, rather than idiomatic collocations. Compare pull rank, where one can say “he pulled his rank”.  --Lambiam 11:26, 16 November 2023 (UTC)Reply[reply]
I don't see moving etmek phrasal verbs under the noun entry as a big improvement. If the meaning is predictable and SOP then it should not have an entry altogether and may be added as a collocation in the noun entry, as I have done in a number of OsmT. entries like لاف‎, قربان‎, قادر‎, etc., while if it isn't SOP, as in fark etmek, then it rightly deserves its entry. In this regard Turkish is not unlike most languages I know of, albeit accentuated. The Japanese situation is somewhat different: suru verbs have been analysed since time immemorial as a distinct class of verbs, so treating them as a separate POS under the noun entry is a neat practical compromise. In Turkish on the other hand this usage doesn't seem necessarily restricted to etmek, the same treatment should be given at least for olmak, possibly even for yapmak, and at that point one wonders where to stop. The point "one would look under the noun entry" makes sense, although this is not an isolated case, as has been pointed out. There are many situations, Turkish aside, where I struggled to realise I had to look under a multiword term to grasp the semantic evolution of the words together. This can be any type of multiword term, adjective + noun, preposition + adverb, verb + verb, etc. and not restricted to this kind of phrasal verbs. To conclude, I think SOP etmek (and yapmak, etc.) verbs should be made into collocations and deleted, while non-SOP ones should be kept as derived terms. Catonif (talk) 10:26, 17 November 2023 (UTC)Reply[reply]

maltho thi afrio lito

What are the attributes of these words from the Salic law around 500AD? I'm reading it as:

speak-1.SG.PRES.IND PRON-2.SG.ACC liberate-1.SG.PRES.IND villein-NOM.SG
‘I declare: Thee I liberate, villein.’

Is this correct, and under which language should they be classed? 500AD seems rather too early for Old Dutch, but we have no Frankish on WT anymore. ᛙᛆᚱᛐᛁᚿᛌᛆᛌProto-NorsingAsk me anything 17:49, 16 November 2023 (UTC) Minor formatting changes ᛙᛆᚱᛐᛁᚿᛌᛆᛌProto-NorsingAsk me anything 17:55, 16 November 2023 (UTC) Reply[reply]

Semitic verb lemma glosses

Hello. What do you think about adding a lemma gloss third-singular masculine past (or similar) to Semitic verbs to indicate what the citation form is? This was requested by a number of users for various languages (cf. Wiktionary:Beer parlour/2023/October#Changing Latin verb definitions to use "to ..." instead of "I ..."). Based on this, I added it to Arabic and Maltese, and was reverted by User:Fenakhay. He claims this information is unnecessary as language learners of Maltese and Arabic will know Semitic dictionary conventions. Thoughts? Benwing2 (talk) 06:53, 17 November 2023 (UTC)

Those decisions should be taken by each language community. And posting it in WT:BP as a way to attract unconcerned parties in an effort to enforce a decision on all Semitic languages is a really pathetic tactic to be honest. If you want to discuss those changes for Arabic, post them in Wiktionary Talk:About Arabic. You have been using this tactic many times to impose your rather unilaterally driven decisions and it is kinda getting annoying. — Fenakhay (حيطي · مساهماتي) 06:57, 17 November 2023 (UTC)
@Fenakhay: With all due respect, your criticism toward @Benwing2 is unfair, in my opinion. His contribution to the Arabic inflection modules is huge. No-one else had enough patience and stamina to get these modules to the current level. (There's always room for fixes and improvement, of course). Also, unconcerned parties won't contribute in discussions.
Rather than fighting, can we look at the topic at hand? :) If you look at various conjugation tables, e.g. Russian иска́ть (iskátʹ), the lemma "infinitive" appears on the first line, which makes also clear that it's the lemma, even if it's not said in words. Compare with the French chercher, German suchen.
On the other hand, it's not clear why the Bulgarian тъ́рся (tǎ́rsja) is the lemma. Same with the Macedonian бара (bara). I think it would be beneficial to include بَحَثَ(baḥaṯa) on a top line, since this is the lemma. It's the word users look for. I am not suggesting any particular design change, but it can be discussed if it's agreed on and the discussion continues in a positive way. Anatoli T. (обсудить/вклад) 08:50, 20 November 2023 (UTC)
And the fact that you haven’t pinged the concerned parties (Arabic and Hebrew editors) shows the motives of this thread. — Fenakhay (حيطي · مساهماتي) 07:02, 17 November 2023 (UTC)Reply[reply]
Best discussed within the editor community of each language. I only have the tip of my big toe involved in Ge'ez (I hope to keep learning more Ge'ez to be more involved in the future), and as far as Fenakhay's claim goes, I think it's very reasonable to say that Ge'ez learners already know the citation form is the 3rd-person singular perfect while using "to X" glosses in the English, because that's what every textbook in fact does (Lambdin, Wright, Prochazka, the draft of Butts). I don't know about Maltese but you guys could check textbooks if there's an argument going on about this. When Fenakhay says there's a tradition in "Semitic languages" to do this, I don't think he means generalist Semitic linguistics literature, but actually does mean the textbook and dictionary tradition of every individual Semitic language (and it's possibly true).--Ser be être 是talk/stalk 07:20, 17 November 2023 (UTC)Reply[reply]
In any case I wouldn't prefer that for Amharic either, for the same reason SBES mentions. Thadh (talk) 08:00, 17 November 2023 (UTC)Reply[reply]
@Fenakhay Please don't attribute spurious motives to me; calling me "pathetic" is a rather strong term to use and reflects more on you than me. I have posted here because I prefer consistency across languages and would like to get a wide set of opinions, and I don't think that's an unreasonable position to take. On your request I am pinging (Notifying Alarichall, Atitarev, Esperfulmo, Erutuon, عربي-٣١, Fay Freak, Assem Khidhr, Fenakhay, Fixmaster, Roger.M.Williams, Zhnka, Sartma): the Arabic language editors. There is no workgroup for Hebrew so I'm not sure who the relevant editors are and I'm not even sure how active the Hebrew community is at this point. Benwing2 (talk) 09:40, 17 November 2023 (UTC)Reply[reply]
@Benwing2: But that's not a gloss of the lemma! It's the non-gloss of identification of a form. It's an explanation of the headword of the entry, and should be de-emphasised. If you want support for a means of presenting that information, show the form to us. It's not documented, and what I saw for Latin (formerly under pluit) was bad, and is now gone. RichardW57m (talk) 11:13, 17 November 2023 (UTC)Reply[reply]
What RichardW57m says. Already discreetly done even, in so far as we link verb morphology appendices, that say “the citation forms, which in Arabic means the 3rd-person masculine singular perfect” – perfect. Everything more is noisy. Fay Freak (talk) 14:02, 17 November 2023 (UTC)
@Benwing2: Could you please give me an example? As I remember, we used to learn Literary Arabic verbs in school by their masculine third person form in the past. Not sure about Modern Hebrew, though, perhaps Classical Hebrew is taught similarly to Classical Arabic. --Esperfulmo (talk) 14:25, 17 November 2023 (UTC)Reply[reply]
Usage for Hebrew varies. Many use the 3sm qal perfective or simply the root (and that's not a simple rule for פ״ו verbs), but there are some dictionaries that use the present participle. I've a feeling there's a Hebrew dictionary out there that actually uses the infinitive - I've face-palmed on seeing the infinitive used for numerical algorithms for grouping languages by similarity. --RichardW57 (talk) 23:57, 25 November 2023 (UTC)

Proto-Ta-Ne Omotic

Request to add code otn-pro, which would include Bench, Gonga/Kefoid, Ometo, and Yemsa/Janjero (https://glottolog.org/resource/languoid/id/gong1255, see also Blažek 2008 in In Hot Pursuit of Language in Prehistory, as there is a basis for grouping on lexical similarity) Saph668 (talk) 12:15, 17 November 2023 (UTC)Reply[reply]

I think @Saph668 has misunderstood my suggestion on WT:TR, which was to ask for the addition of the Ta-Ne Omotic language family (as given above) and its proto-language. I don't think 'otn' is a suitable code element, as it is assigned to an Otomi language. I think we may have to go for something starting omv-, perhaps omv-ggi for Gonga-Gimojan, Bender's name for the family. Or is there a problem with omv-ggi-pro for the Proto-language? --RichardW57m (talk) 13:45, 17 November 2023 (UTC)
@Saph668: Is it more frequently used than "North Omotic"? Also, otn is already used for Tenango Otomi, so we need to call it something else, like omv-otn. Furthermore, if there are no reconstructions yet done for the branch, it may be best to just keep it as a family without a reconstructed ancestor. Thadh (talk) 14:11, 17 November 2023 (UTC)Reply[reply]
@Thadh: There are quite a few reconstructions around labelled as "North Omotic" - I'm not sure what their quality is. (I suspect there may be a lot of refinement to come.) The problem with nomenclature is that the sense doesn't seem stable - does it include the Dizoid and Mao groups? On the other hand, the "Ta-Ne Omotic" group does seem to be a stable concept, and is recognised by sceptical Glottolog; it is highly plausible that Proto-Ta-Ne Omotic actually existed. The same cannot be said of Proto-North Omotic - Glottolog does not accept the existence of the "North Omotic" group (or even Omotic). --RichardW57m (talk) 14:28, 17 November 2023 (UTC)Reply[reply]
@RichardW57m: We decide which languages to include in a branch ourselves (of course on the basis of other sources), so we don't need to use a lesser-used term just to make it clear to others - we have family trees for that; But if it is truly more used/preferred by other sources, we can adopt it. And as for reconstructions, if there are at least solid sound correspondences, we can opt for a solution where we do add the proto-language, but agree not to create or link to any reconstructions, rather just putting {{inh|LANG|omv-otn-pro|-}} in the etymology for categorisation. This was also the plan with Proto-Cushitic (although I believe at some point people still started adding some shaky forms to the etymologies). Thadh (talk) 14:35, 17 November 2023 (UTC)Reply[reply]
Apologies, I didn't put much thought into the language code... Regardless, RichardW57m is right, Ta-Ne Omotic is a more stable concept. Saph668 (talk) 14:37, 17 November 2023 (UTC)Reply[reply]
The advantage of having a proto-language is that one can list the cognates under the reconstruction, rather than having n-1 cognates listed for each of n cognates, as opposed to the current clutter where inherited Tai words are slowly acquiring umpteen cognate Zhuang forms in the etymologies of the cognates. --RichardW57m (talk) 14:43, 17 November 2023 (UTC)

Maltese "words" ending with hyphen

(copying from Wiktionary talk:Beer parlour, where it was accidentally placed)

(Notifying Alarichall, Atitarev, Esperfulmo, Erutuon, عربي-٣١, Fay Freak, Assem Khidhr, Fenakhay, Fixmaster, Roger.M.Williams, Zhnka, Sartma): We now include Maltese words such as il- and tal- whose entry name contains a hyphen because it is never used without one (e.g. il-mara, tal-kelb), but this practice seems unprecedented in Wiktionary outside Maltese. Usually the hyphen indicates that the given entry is an affix, e.g. dés-, which combines to give words such as désordre without the hyphen.

I am bringing this up here to decide once and for all whether such Maltese entries should be included in Wiktionary with or without a hyphen. (So please avoid bringing up the argument that the current modus operandi for Maltese is to include the hyphen.)

Personally I can see both benefits and drawbacks, and a potential argument for the hyphens is that it seems a bit ridiculous to suggest that "x" is itself a word (on the basis of the current x-), since it has only one consonant and no vowels. I can also see another potential argument for the hyphens, namely that currently we accept apostrophes in entry titles (e.g. m’).

(P.S. lill- was deleted in 2017 by @Qehath.)

--kc_kennylau (talk) 02:36, 18 November 2023 (UTC)Reply[reply]

@Kc kennylau Personally I don't see why a single consonant can't be a word; but regardless of that, il- is properly speaking a clitic, so if you're looking for analogies outside of Maltese, you should look for how clitics are handled in other languages. Russian has several clitics, for example (which can be found mixed into CAT:Russian particles, but maybe should be moved into CAT:Russian clitics), and the ones that attach to preceding words with a hyphen are written with a hyphen, e.g. -либо (-libo), -то (-to), -ка (-ka). This is somewhat analogous to English's apostrophe-s clitic ('s), where we include the apostrophe that joins the clitic to the preceding word. So this suggests that the current use of a hyphen is correct. (And I should add, Russian has single-consonant words written without a hyphen: б (b), ж (ž), ль (), etc. These are also clitics but are normally written as separate words, hence no hyphen.) Benwing2 (talk) 08:21, 18 November 2023 (UTC)Reply[reply]
I see no reason why it shouldn't include a hyphen, given that it's spelled that way.--Urszag (talk) 11:25, 18 November 2023 (UTC)Reply[reply]

nsteraq etc.

(Notifying Alarichall, Atitarev, Benwing2, Esperfulmo, Erutuon, عربي-٣١, Fay Freak, Assem Khidhr, Fenakhay, Fixmaster, Roger.M.Williams, Zhnka, Sartma): There has been a mini edit-war on what is currently Template:mt-conj/VII+VIII (link to history), so I decided that it would be better to settle things here.

We have both stated our reasonings; I stated that Ġabra, Aquilina, and verb.mt all classify nsteraq etc. as Form VII, while @Fenakhay stated that Aquilina actually said that they belong to both Forms VII and VIII; so I have decided to produce what Aquilina has said:

"Another variant of Pattern 7 is obtained by prefixing in as for the seventh form and infixing t after the first radical as for the eighth form."

Here the keyword is "another variant of pattern 7", meaning that such verbs belong to Form VII; Aquilina mentioned the eighth form as a comparison, i.e. this form is like Form VIII, but it is actually not.

I hereby request @Fenakhay to produce what Aquilina has said that convinced him that they belong to Form VIII as well; and if he does so, I would also like to discuss what we should do in this situation.

I have also come here with another problem: Ġabra conjugates the verb in the first person singular as "nsteraqt", which contradicts the form "instraqt" given by verb.mt and Aquilina, since the former has an extra "e". What should we do in this situation? Which form is the "correct" one? Or maybe they are variants? Or maybe they are underattested theoretical forms (so it doesn't really matter)?

--kc_kennylau (talk) 20:58, 18 November 2023 (UTC)Reply[reply]

First of all, Ġabra and verb.mt are not a reliable source for verb classification as they contain many errors.
Second of all, Aquilina, in his own dictionary, says that e.g. nxtegħel is VII+VIII, meaning it is a mix of both forms, which makes sense. And it is the same approach taken by other Arabic linguists for mixed forms in dialectal Arabic. Where is your quote from, as you haven't provided a source?
For your last point, it should be nstraqt as the cluster /str/ is permissible and I've confirmed it with a native speaker.
N.B. Could you not spam the Arabic workgroup because this doesn't concern Arabic itself. You can either ping the Maltese editors or create a dedicated workgroup for Maltese. — Fenakhay (حيطي · مساهماتي) 21:23, 18 November 2023 (UTC)Reply[reply]
@Fenakhay: My quote is from P.159 in "Teach Yourself Maltese" by the same Joseph Aquilina.
I'll edit the templates to reflect the nstraqt forms etc.
Since you seem to be more familiar with the Maltese side of en.wikt, could you provide me with a list of editors that should be in said "dedicated workgroup for Maltese"?
--kc_kennylau (talk) 22:39, 18 November 2023 (UTC)Reply[reply]
(cc @Fenakhay) Update: I have temporarily included a parameter named "keep4" to specify when the first root vowel should not be deleted in certain forms; I think the template T:mt-conj/VIII also has the same problem, e.g. steraq also has the wrong form steraqt.
--kc_kennylau (talk) 23:37, 18 November 2023 (UTC)Reply[reply]
The necessity of the "keep4" parameter seems to be corroborated by verb.mt, which assigns two models to the form VIII verbs, called ltemaħ and ftakar. --kc_kennylau (talk) 14:20, 19 November 2023 (UTC)Reply[reply]

Surnames with many different origins

Do we have a preference for whether/when to group various origins of surnames? E.g. Jiang has a lot of different etymology sections for the surname from Mandarin Jiāng as opposed to the one from Mandarin Jiāng as opposed to from Jiǎng, etc.; likewise Li.
In contrast, Wang, Wong, He and Hu have only one section covering all the origins; likewise other names I can find like Campbell, Meyer, Steen, Johnson, Ng, Bear and Doe.
On the face of it, the Wang/Johnson approach seems better to me, because separate ety sections would seem to require we go outside of lexicography and into genealogy in order to satisfy ATTEST/RFV, to show that not only did books mention people named Jiang, but that 3+ traced their ancestry specifically to family 姜 as opposed to 江, and that 3 books' Does took their name from deer as opposed to water, etc. (We also cover e.g. the two origins which led to the modern verb settle under one ety section for a similar reason, that in many cases they can't be teased apart.) What do you think? - -sche (discuss) 19:24, 19 November 2023 (UTC)Reply[reply]

@-sche Seems OK to me to group etymologies like this for the reason you mention: it may not be easy to separate the origins (esp. for Chinese surnames). Benwing2 (talk) 00:07, 20 November 2023 (UTC)Reply[reply]
See also this relevant BP discussion on surnames alt forms/doublets.
I would prefer that each etymology section includes only surnames from one language, since they sometimes have varying pronunciation/alternative forms/doublets of the surname in English, e.g. Hui is /hweɪ/ for the Mandarin-derived ones and /hu.i/ (or /hɵy̯/ in Hong Kong) for the Cantonese-derived one; or Lee and Choi, which have different doublets for the Chinese and Korean ones. It would be somewhat cluttered if they were merged into one section with the same qualifiers repeated multiple times throughout. The exhaustive listing at Li is for sure silly though, and I agree that those 100% need to be merged. An approach like Wang is still OK to me, but the etymologies should be formatted like the Chinese and Danish ones on Wang, which use proper templates and feel less wordy than the other ones that repeat "As a …".
I'm also curious as to what should be done for entries where some place names/other proper nouns share an etymon with a surname, e.g. Wu, which would be extremely messy if the surnames are merged into one section (in either case: keeping the other proper nouns in a section separate from the surname, or just lumping everything into one massive section). – wpi (talk) 06:20, 21 November 2023 (UTC)
I don't know how Wiktionary should appropriately cover this situation. However, to shed light on the issue, I have just now demonstrated that all five etymologies could very likely meet WT:ATTEST on their own. Do with that what you will. --Geographyinitiative (talk) 07:15, 21 November 2023 (UTC) (Modified)Reply[reply]

Getting rid of "see also"

A passing thought: what do people think about the idea of (at some point in the future) deprecating "see also" sections? They would be superseded by the existing section types that express the actual nature of the connection between the words, like "hyponym", "meronym", (etymologically) "related terms", etc. The problem with "see also" is that it has no semantics. Equinox 12:28, 22 November 2023 (UTC)Reply[reply]

Good luck with it... There's nothing inherently wrong with no semantic connection.Jewle V (talk) 12:32, 22 November 2023 (UTC)Reply[reply]
(Oh, to clarify, I mean that "see also" doesn't express the nature of the connection — I'm not saying that the meanings of the see-also words have to be similar.) How else will our AI overlords learn how things fit together? hmmm. Equinox 12:34, 22 November 2023 (UTC)Reply[reply]
Sometimes the fact there's no semantic connection is important. I'd like to think linking dim sum from dim and sum would benefit a curious passer-by. Jewle V (talk) 12:53, 22 November 2023 (UTC)Reply[reply]
Hmmm, well, it seems to be a useful miscellaneous section, for example for terms which are semantically but not etymologically related to the entry and are of a different part of speech. That’s what the section seems to be currently used for. — Sgconlaw (talk) 13:15, 22 November 2023 (UTC)Reply[reply]
We need an "other" heading for items that have some kind of connection with the headword, but not one of those that we have a specified home for. Sgconlaw's example is a very good one, but whatever homes we decree or add for specific relationships, there will always be more, especially in the minds of contributors. In some cases, there are terms that are readily confused with the headword, but aren't homophones, let alone homographs. In an entry that contains "not to be confused with X", I wish we would move 'X' to "See also". OTOH, I may have taken a step too far when I used the heading for disease vectors for an entry for a species of bacteria. DCDuring (talk) 18:54, 22 November 2023 (UTC)Reply[reply]
@Equinox while I understand your concern that "See also" can be a bit of a wildcard/kitchen sink, I think it's a losing game to try to give a name to every kind of relationship that can exist - now and in the indefinite future - between two articles on Wiktionary. If you look at WT:EL, "See also" can furthermore point to "other pages on Wiktionary, including appendices and categories." I don't know how often we take advantage of that, but it could be a great way to increase the visibility of some of our appendices.
Another idea might be to expand the "See also" section of WT:EL with guidance on when not to put things in there, in favor of a more fitting section. It would also be nice to have a brief statement of the perceived or anticipated benefit of "See also" for Wiktionary users. Chernorizets (talk) 12:06, 23 November 2023 (UTC)
An example of a highly relevant semantic connection that is not necessarily syn or cot is s.v. interrogatee at See also > arrestee, detainee, suspect. Such a subclass can be described as "especially relevant Venn overlap". Strictly hierarchical relations such as hyponymy and hypernymy are also very important, but their limit is the strictly hierarchical, and not all relationships are solely hierarchical. Quercus solaris (talk) 20:01, 23 November 2023 (UTC)Reply[reply]
PS: Granted that such things could be moved to a Thesaurus entry and then the See also section need not say anything except "See Thesaurus:blah". A question that follows, though, is, will people object to having Thesaurus entries that are rather sparse? The one for interrogatee could have an ant section and a small see also section and nothing else. If a consensus exists for allowing such sparse Thesaurus entries, then I could adhere to that method. Quercus solaris (talk) 20:20, 23 November 2023 (UTC)Reply[reply]
It's a grab bag because our entries need grab bags for
  1. items a contributor knows belong in the entry but doesn't know where
  2. items whose semantic relation has a name that is unknown to any users other than some semanticists (which includes more than half of those listed in WT:ELE/
  3. particular examples are sgconlaw's and often-confused-with-the-headword (even excluding disease vectors!)
Putting these items in Thesaurus namespace means that users who are unaware that there should be things beside synonyms and antonyms there will be even less likely to see them and pursue them than now.
If someone wants to put in the effort to clean up a sample of, say, 100 See also sections and report findings, there might be something worthwhile to talk about. DCDuring (talk) 03:39, 24 November 2023 (UTC)Reply[reply]
Agree on both of those two latter points. Quercus solaris (talk) 04:45, 24 November 2023 (UTC)Reply[reply]
Would you rather that contributors put items under wrong headings or failed to put them in at all? We don't pay much attention to talk pages. DCDuring (talk) 15:36, 24 November 2023 (UTC)Reply[reply]
My own opinion (and I suspect a commonly held opinion among Wiktionarians) is that the former is much better than the latter. (I grant that the question might have been asked rhetorically, but it's worth answering, as part of working through answers to the driving question in this thread.) Just let contributors enter logically connected things under "See also" and let someone else refine the placement later if they care to and are able. For example, I often find things at "See also" (from others) that I move up to "Synonyms" or "Coordinate terms" when that's in fact what they are. But when they don't fit there, they have to remain at "See also". Quercus solaris (talk) 17:22, 24 November 2023 (UTC)Reply[reply]
Broadly agree with Sgconlaw, DCDuring and WF above, the semantic categories aren't exhaustive and there are cases where they might fit but pedantry over it doesn't help readers. —Al-Muqanna المقنع (talk) 18:44, 24 November 2023 (UTC)Reply[reply]

feng

@Seoovslfmo Where would feng shui, feng-huang, Hai-feng go on the feng page if 'See also' were eliminated? Also, should these appear under feng under current rules? ([9]) See Wiktionary:Tea_room/2023/October#What_is_the_relationship_between_tai_and_tai_chi? and elsewhere. -Geographyinitiative (talk) 11:00, 27 November 2023 (UTC)Reply[reply]

That's not for me to say, GI. I'm not a massive believer in always following the rules, as evidenced by my 1000+ (and counting!) blocks on the site! Seoovslfmo (talk) 11:08, 27 November 2023 (UTC)Reply[reply]

Semantics and surface etymology

We define surface etymology, as of writing this post at least, as:

The apparent etymology of a term based on components occurring in the modern form of the language, such as earth + -en for earthen, which actually occurred in Old English as eorthene.

One question that arose on the Discord server is whether the meaning of the word needs to be accounted for when adding surface analysis to entries.
Consider the apple of discord (pun not intended), the Polish średni ("average, intermediate"):

  • We hold that Proto-Slavic *serda meant "the middle," but also "Wednesday." Its Old Polish descendant śrzeda meant only "Wednesday," and so does Polish środa now.
  • Proto-Slavic *serda gave rise to the adjective *serdьnъ, which meant "middle." Its Old Polish descendant śrzedni meant "middle, average" and so does Polish średni, roughly.

The crux is that the "middle" meaning of śrzeda/środa was lost in Old Polish already, yet we hold that średni is, by surface analysis, środa + -ni. However, surface analysis (again, going by our definition) requires that the components occur in the modern form of the language, and that the etymology is apparent. It's pretty clearly not apparent that "Wednesday + [adjectivizing suffix]" would yield a word meaning "mediocre." So is surface etymology more about the meaning of its components, or of its components' etymons? Hythonia (talk) 14:21, 22 November 2023 (UTC) Pinging @PUC, Vininn126 as they were involved in the discussion initially. Hythonia (talk) 14:22, 22 November 2023 (UTC)

  Support PUC – 14:48, 22 November 2023 (UTC)Reply[reply]
I have no strong feelings. I would lean towards accepting etymons, but if everyone agrees that it should also be lexemes then that seems fine to me. I'd also like to take this time to say we really should change the wording/name of the template... Vininn126 (talk) 14:57, 22 November 2023 (UTC)
@PUC What do you support? It was an or-type question.
@Hythonia My understanding of surface etymology is basically that it's the answer a native speaker without a linguistic education would give. I think their answer would usually involve both form and meaning. Applying this as a test to your example, it's quite plausible that someone might derive "middle" from "Wednesday-ish", since Wednesday is after all the middle of the work week. —Caoimhin ceallach (talk) 17:39, 22 November 2023 (UTC)Reply[reply]
I'm not sure that "without linguistic education" is exactly the right criterion - a lot of Polish speakers are not consciously aware of deverbals, but they're part of the modern language. (And yes, you can have surface analysis deverbals). Vininn126 (talk) 18:01, 22 November 2023 (UTC)
You're right, I was more thinking of knowledge of the historical development of their language and of historical linguistics in general. I don't know if grammatical knowledge has a strong interference effect on judgement. At any rate I meant a speaker should ideally decide purely based on linguistic intuition. —Caoimhin ceallach (talk) 18:22, 22 November 2023 (UTC)Reply[reply]
This raises an interesting point - a lot of people don't realize that "responsible" comes from "response". I suppose the lack of a similar lexeme is the reason, but the point is that gauging intuition might not be so easy. Others might find that example obvious. Vininn126 (talk) 18:57, 22 November 2023 (UTC)
True, absent a systematic study of surface etymology you'd have to intuit other people's intuition, or restrict yourself to clear-cut cases, which the above case is not. —Caoimhin ceallach (talk) 12:58, 23 November 2023 (UTC)Reply[reply]
@Hythonia if earthen is a representative example, then IMO średni is not like it. The English adjective closely tracks the semantics of its noun component, while the Polish one doesn't. It's also not necessarily relevant that other Slavic cognates, e.g. Bulgarian среден (sreden), do support the surface etymology a la Proto-Slavic. It's still the case that średni didn't just appear out of thin air, so if it were me, I'd say in the etymology section that the underlying morphology is that of Proto-Slavic (using {{affix}} with |nocat=1), but the semantic relationship was lost even in Old Polish. So, no {{surf}}. Chernorizets (talk) 11:52, 23 November 2023 (UTC)Reply[reply]
So let's say we'd like to show etymological formations anyway, what would be the best approach for that? Perhaps the lexemes are lost but the etymons are not. Vininn126 (talk) 11:59, 23 November 2023 (UTC)Reply[reply]
@Vininn126 {{surf}} is not the only way to express the morphology of a term - {{affix}} can do that too, and since it doesn't come with a magical incantation like "by surface etymology", you can write something more descriptive of the actual situation yourself. {{affix}} supports qualifiers and other params you can use in some combination to give more context on each morphological constituent. Was that your question, or something else? Chernorizets (talk) 12:12, 23 November 2023 (UTC)Reply[reply]
This is why I'd be for changing the wording of surface analysis to allow for historical developments as well. It reduces the amount of ways to write this information. Why have two ways when we can have one? Vininn126 (talk) 12:15, 23 November 2023 (UTC)Reply[reply]
@Vininn126 productive suffixes don't necessarily need an etymology template. E.g. Bulgarian компютърен (kompjutǎren, computer (rel. adj)) is a simple example of slapping the highly-productive -ен (-en) suffix onto компютър (kompjutǎr, computer). I'd express that with {{affix}} rather than an etymology template. The point I'm trying to make is that one size will not fit all, and I'd rather we just clarify the cases where {{surf}} makes sense.
AFAICT "surface analysis" is a bit of a Wiktionary term, and right now it means "synchronic" + "in the present". If we were to change that, we'd need to change Appendix:Glossary and possibly a few other places. Doable, but maybe the way to go is to have a "synchronic analysis" template with a time referent, since synchronic can refer to a point of time in the past. It all depends on how common this is beyond the one Polish example that started the discussion. Chernorizets (talk) 21:01, 23 November 2023 (UTC)Reply[reply]
I still don't see why we can't make a one-size-fits all. Maybe the current use doesn't allow for it, but why can't we modify it? Something to, for example, "morphologically". Vininn126 (talk) 21:08, 23 November 2023 (UTC)Reply[reply]
@Vininn126 just a casual, quick look at the pages linking to {{surf}} indicates that it's being used by many languages, including a bunch of non-IE ones. At this time, I don't think a strong enough case has been made to change the template for everyone. If the issue with średni is particularly common in Polish, then maybe there should be a Polish template to account for it. If the issue is common across languages, then that would make a case for modifying {{surf}}, but this thread by itself doesn't demonstrate that. Just my POV. Chernorizets (talk) 04:18, 24 November 2023 (UTC)Reply[reply]
@Chernorizets It's perfectly reasonable to want to make that kind of change. We should look at how most people are actually using it to inform us whether they are using it for etymons or lexemes. If they are using it for etymons, there is a clear indication that people naturally look at it that way; if not, then well, no. Just because it's used by many people doesn't mean it can't be changed. And creating a particular template for just Polish is a bad idea - lots of terms have lexically obscured etymologies. Vininn126 (talk) 09:10, 24 November 2023 (UTC)

FYI: The Low Down on the Greatest Dictionary Collection in the World

https://www.atlasobscura.com/articles/biggest-dictionary-collection Justin (koavf)TCM 19:15, 22 November 2023 (UTC)

@Koavf: fascinating! Thanks. — Sgconlaw (talk) 21:34, 22 November 2023 (UTC)Reply[reply]

The full set of blog posts can be found via https://blogs.libraries.indiana.edu/tag/kripke-collection/ - Andy Mabbett (Pigsonthewing); Talk to Andy; Andy's edits 15:59, 26 November 2023 (UTC)Reply[reply]

Besides the headword, where else should word stress be marked for Bulgarian terms?

(Notifying Atitarev, Benwing2, Bogorm, Bezimenen, Kiril kovachev):

Word stress is not marked in Bulgarian writing, except in rare cases of homographs like ѝ (ì, (to) her) vs и (i, and). Since stress is free and unpredictable, we indicate it on the headword line, and in the input to our pronunciation module in order to get correct IPA. For consistency with other Slavic languages, we use the acute accent to mark stressed vowels.
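To make the convention above concrete, here is a minimal Python sketch of how a stress-marked display form could be mapped back to the form as actually written. This is only an illustration under stated assumptions: the function name strip_stress is hypothetical, and it is not a description of how any Wiktionary module actually does this. It assumes the stress mark is always the combining acute accent (U+0301) and leaves every other diacritic alone, so the grave on ѝ and the breve on й survive.

```python
import unicodedata

# Assumption: stress is marked only with the combining acute accent.
COMBINING_ACUTE = "\u0301"


def strip_stress(term: str) -> str:
    """Return `term` without stress marks (hypothetical helper).

    The text is decomposed first so that any precomposed letter carrying an
    acute accent is split into base letter + combining mark, then the acute
    is dropped and the rest is recomposed.
    """
    decomposed = unicodedata.normalize("NFD", term)
    without_acute = decomposed.replace(COMBINING_ACUTE, "")
    return unicodedata.normalize("NFC", without_acute)


if __name__ == "__main__":
    # Stress-marked headword form -> form as written in Bulgarian text.
    print(strip_stress("компю\u0301тър"))
    # The grave on "ѝ" is orthographic, not a stress mark, so it is kept.
    print(strip_stress("ѝ"))
```

The design choice worth noting is that only the acute is removed: stripping all combining marks would wrongly turn ѝ into и and й into и.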

Where else (e.g. English translation sections) should word stress be indicated for Bulgarian entries? Currently, the tendency is "everywhere", but that's inconsistently applied, and I'm not sure if it's always the right thing to do. So here's a likely non-exhaustive list of situations, and my thoughts on them - I could use some more perspectives.

  • English translation sections
    I'm not sure about this one. Given that some languages use the acute accent as a diacritic, a user looking at a translation box might reasonably conclude that Bulgarian words are written with stress marks. It would take them an extra click, and possibly a scroll to the top of the page to realize that's not the case. One could argue that it gives a clue to the pronunciation, but I'm skeptical - in a translation box, Bulgarian words get transliterated, but the transliteration is not a good proxy for the actual pronunciation. For that, a user would still have to click on the term and navigate to its entry. In short, the stress doesn't add much, but can be distracting/confusing.
  • Descendant lists of terms in other languages from which Bulgarian borrowed/derived
    For pretty much the same reasons as for translation boxes, I'm skeptical it's of value to mark stress on Bulgarian terms when they are listed as one of several "Descendants" of another entry.
  • Etymologies of terms in other languages that are borrowed from Bulgarian
    This seems useful, because the Bulgarian term's stress might inform the phonology of the corresponding term in the borrowing language.