Arabic and Hebrew transliteration

Wiktionary currently transliterates the glottal stop in both Arabic and Hebrew as ʔ and the voiced pharyngeal fricative in both languages as ʕ. Would it be possible to change these so that the glottal stop is transliterated as ʾ and the voiced pharyngeal fricative as ʿ, in line with Wiktionary's transliteration of other Semitic languages, which all use ʾ and ʿ?

Wiktionary also currently transliterates the Arabic voiceless velar fricative as ḵ. However, an alternative transliteration, ḫ, is also used for this sound. Since ḫ is used to transliterate the voiceless velar fricative in most Semitic languages other than Hebrew and Aramaic, I would like to request that Wiktionary's transliteration of the Arabic voiceless velar fricative be changed from ḵ to ḫ as well. Antiquistik (talk) 13:08, 1 May 2024 (UTC)

@Antiquistik: No, we only switched to ʔ and ʕ the other day. As for ḵ to ḫ, I don't know; perhaps ḫ is better if you want to make the etymological statement that ḵ is a fricativized k, which we keep in languages affected by begadkefat, while ḫ is organic. Fay Freak (talk) 18:08, 1 May 2024 (UTC)
@Fay Freak In that case, I will add my opposition to the discussion regarding that change.
Concerning ḵ to ḫ, should I make a separate request, or add it to this one? Antiquistik (talk) 18:55, 1 May 2024 (UTC)
@Antiquistik IMO the opposite change should happen and other Semitic languages should use ʔ and ʕ. The problem with the forward and backward quotes is that they're too small and too easily confused in many fonts. I also think ḵ is better than ḫ; ḫ is easily confused with ḥ, the pharyngeal fricative. Benwing2 (talk) 23:49, 1 May 2024 (UTC)
Personally, I agree with Benwing, although I am sympathetic to the idea that we should use whatever is most widely used, and I am also sensitive to the issue of words being findable by people who search for them using other transliteration systems. I would like us to have the templates/modules produce other common transliterations (potentially hidden with display:none by default) so the entries can be found if people use our site search or Google and search for ʾiʿlān etc., as raised in the 2022 discussion, unless that would cause problems. Then we could probably also set different CSS classes for the different transliterations so people could select whether they see ʾiʿlān or ʔiʕlān, similar to the way people can choose to see or not see {{,}} (and we could debate which one would be most helpful to have on by default for the average lay reader). - -sche (discuss) 02:33, 2 May 2024 (UTC)
@-sche I think this is a good idea. AFAICT it would require some changes to Module:languages (which handles transliteration) so that a given transliteration method can return multiple transliterations rather than just one, each associated with properties such as a CSS class, with one of them identified as "canonical" (meaning it is displayed while the others aren't). The only tricky thing here is manual transliterations; ideally, there would be a method to convert a manual transliteration in the canonical system into each of the other systems, so that users have to specify only one transliteration rather than several. In the examples here, that conversion isn't hard, but sometimes it may not be possible (e.g. the current Hebrew transliterations are based on modern Hebrew pronunciation, which has several mergers compared with Biblical Hebrew, so we couldn't convert modern to Biblical Hebrew transliterations). Benwing2 (talk) 02:45, 2 May 2024 (UTC)
@Benwing2: I believe that some of the existing manual transliterations may need to be reviewed to see whether they were actually justified in the first place. Some of them are there only to work around technical issues that no longer exist. For example, this manually added transliteration for a Belarusian quotation became unnecessary after this fix. And I definitely support the idea of having multiple transliteration schemes, because this would allow introducing Belarusian Łacinka in addition to the current WT:BE TR scholarly transliteration. As @-sche mentioned, the primary motivation is that words should preferably be searchable via Google or via the search box on the Wiktionary front page. Belarusian entries currently solve the searchability problem with manually added "Alternative forms" sections containing red links, but this isn't ideal. So the proposed improvement has uses even beyond Arabic and Hebrew. --Ssvb (talk) 16:41, 2 May 2024 (UTC)
Yes, I'm also in favour of having multiple transliteration schemes for this reason. Theknightwho (talk) 11:44, 8 May 2024 (UTC)
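A minimal sketch of the multiple-transliteration idea, written in Python rather than the Lua that Wiktionary modules actually use. Everything here is hypothetical: the function names, the CSS class names and the mapping table are illustrative assumptions, not part of Module:languages. It derives every scheme from one canonical transliteration (so a manual transliteration only has to be entered once) and marks only the canonical one as visible. The mapping covers just the ʔ/ʕ/ḵ versus ʾ/ʿ/ḫ pairs discussed in this thread; as Benwing2 notes, some conversions (e.g. modern to Biblical Hebrew) cannot be done mechanically at all.

```python
import unicodedata
from dataclasses import dataclass

# Hypothetical one-to-one mapping from the canonical (IPA-letter) scheme to the
# half-ring scheme; only the three characters discussed in this thread are covered.
HALF_RING_SCHEME = str.maketrans({
    "\u0294": "\u02be",  # ʔ (glottal stop) -> ʾ
    "\u0295": "\u02bf",  # ʕ (voiced pharyngeal fricative) -> ʿ
    "\u1e35": "\u1e2b",  # ḵ (voiceless velar fricative) -> ḫ
})


@dataclass
class Transliteration:
    text: str
    css_class: str   # hypothetical class that a reader preference could show or hide
    canonical: bool  # only the canonical transliteration is visible by default


def all_transliterations(canonical_tr: str) -> list[Transliteration]:
    """Derive every scheme from one canonical (possibly manual) transliteration."""
    # NFC first, so a decomposed k + U+0331 COMBINING MACRON BELOW becomes the
    # precomposed ḵ (U+1E35) that the translation table matches.
    tr = unicodedata.normalize("NFC", canonical_tr)
    return [
        Transliteration(tr, "tr-ipa-letters", canonical=True),
        Transliteration(tr.translate(HALF_RING_SCHEME), "tr-half-rings", canonical=False),
    ]


def render(trs: list[Transliteration]) -> str:
    """Emit all transliterations, hiding the non-canonical ones with display:none."""
    spans = []
    for t in trs:
        style = "" if t.canonical else ' style="display:none"'
        spans.append(f'<span class="{t.css_class}"{style}>{t.text}</span>')
    return "".join(spans)


print(render(all_transliterations("ʔiʕlān")))
```

Keeping the conversion as a single translation table makes the "enter once, derive the rest" approach cheap for lossless mappings like these; schemes that need more than character substitution would need their own conversion function.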
@-sche This is a good proposal.
@Benwing2 I understand that ʔ and ʕ are more visible than the small half-rings, but I question how useful they would be for the average reader, since they are barely used in current transliteration schemes. If they hinder readers' ability to find these entries, we should avoid using them. Additionally, when is ḫ confused with the pharyngeal fricative? Antiquistik (talk) 05:42, 2 May 2024 (UTC)
@Antiquistik I'm not sure what you mean by "barely used in current transliteration schemes". Are you referring to transliteration schemes outside of Wiktionary? If so, why do you think the average reader will be familiar with them, but won't be familiar with IPA? As for using ḫ, my point is that this is easily confused with ḥ (the transliteration for the pharyngeal fricative), and having all three of h ḫ ḥ is going to make for endless confusion. Benwing2 (talk) 05:47, 2 May 2024 (UTC)
@Benwing2 While I don't think that the average reader is more familiar with the IPA signs, I doubt that they will search for Arabic terms using IPA signs, which are rarely used for Arabic transliteration, in place of the signs from the current standard transliteration schemes.
And, as pointed out by @Ssvb, the entries need to be searchable. Using the more widely employed transliteration is the better option for this.
As for the transliteration of /x/, I strongly disagree with your position. The transliterations for other Afroasiatic languages like Old South Arabian, Ugaritic and Ancient Egyptian use both ḫ and ḥ without any problem, and I don't see why the organic /x/ in Arabic should be represented by a character used for sounds affected by begadkefat. Antiquistik (talk) 11:19, 3 May 2024 (UTC)
@Antiquistik: Your premise that these signs were used only in IPA transcriptions before Wiktionary adopted them is wrong. We found that many linguistic books, more or less traditionally Semitist, use them as their editorial choice for transcription. I have doomsurfed the philologies enough over the last decade and a half to know that this is far from so rare as to hamper anyone's use of the dictionary. I also want to draw your attention to the pertinent languages without a native writing system, which can only be entered in an academic transcription: the Modern South Arabian languages. Their transcription styles have varied over the decades and with the researchers' home countries, but I think they work well as written at أَيْدَع (ʔaydaʕ), whereas with all their diacritics the rings would strain readers' patience. Fay Freak (talk) 11:37, 3 May 2024 (UTC)
@Fay Freak How prevalent are Arabic transliterations using the IPA signs compared to the half-rings? Antiquistik (talk) 13:06, 3 May 2024 (UTC)
@Antiquistik: No one, or at least not I, can compile statistics on such a thing. There is also a qualitative difference in the kinds of resources that use them: in purely Semitist sources the rings hold their ground due to tradition. I have clicked around in my Semitics folder for you. I wanted to say that Leonid Kogan uses MODIFIER LETTER GLOTTAL STOP ˀ a lot, which is a bit more conspicuous and sits between the two extremes, but the second work of his I opened ({{R:tig:Kogan:2011}}, after {{R:sem-pro:GC}}) goes the whole hog and uses ʔ for Arabic and the other Semitic languages. {{R:sqt:CSOL}} and {{R:sem-pro:SED}} use ˀ, and anything published in the Journal of Semitic Studies, such as doi: 10.1093/jss/fgt038, uses the rings; we may see that as a publisher's decision, since in more relaxed journal pieces he seems to prefer the IPA letters. In the old and long series Perspectives on Arabic Linguistics you get the IPA letters all around. There is a lot of socialization behind letter choices; you just need to get used to them, without losing your aesthetic sense. University docents may teach something specific, but there is a point where one shouldn't believe other people. Younglings learn, and adults function, by imitation, but science works by organized skepticism: a dilemma.
The complicated part: I could give you a lecture on how it has to do with spatial-temporal memory (again, the first chapter of the handbook of memory, ASD and the law that I mentioned). Everything is normal in the head: you guys react tribally to relations previously experienced with and from other people, even though meatspace produces the worst selection bias, contrary to the universalism of science. You underestimate the psychological background behind all this. I hardly ever responded positively to what teachers required or expected of me in organizing a treatise, following internal logics that aren't strictly rationally evident (writing the points of a paper in this or that order, not leaving out some super-influential fashionable nonsense in the field), which is detrimental in exams and in self-portrayal in job applications, however exquisitely able one is to judge the merits of the matter in isolation; so I am now very aware of how strong feelings about signs come about, without sustaining them myself. We don't just count votes to let the loudest party win; that is not how good work gets created, only a working hypothesis. Fay Freak (talk) 14:09, 3 May 2024 (UTC)

Descendant tree design

Here's my idea for a horizontal tree style that could be generated by {{etymon}}. I've switched up the colour scheme, since this is a descendants tree rather than an etymology tree. We can also include question marks or labels just as in the etymology tree. Let me know what you think! @Vininn126, Equinox, Sławobóg, -sche, 0DF Ioaxxere (talk) 21:24, 1 May 2024 (UTC)

How would you represent borrowings and morphological reshaping in this format? Also I think I prefer Design 2, because in Design 1 the single right-branching node might be interpreted as somehow different from the below-branching nodes (and in addition, in Design 1 someone might e.g. interpret the juncture where Proto-Italic branches off as its own node, a daughter of PIE rather than just an artifact of the design). However, even better than either IMO would be one where the parent is centered vertically among all of its children rather than being at the top. Benwing2 (talk) 02:55, 2 May 2024 (UTC)
@Benwing2: Probably with the same label system that {{etymon}} already uses. I like your idea for centering the node, although for trees with a huge number of lines it might lead to the ultimate ancestor being far down the page. Possibly the ultimate ancestor could be given some kind of special status where it always goes at the top left of the page. Ioaxxere (talk) 05:31, 2 May 2024 (UTC)
I think Design 2 is also my preference, at least on desktop. Vininn126 (talk) 13:25, 3 May 2024 (UTC)
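Benwing2's suggestion of centering each parent vertically among its children, combined with Ioaxxere's idea of pinning the ultimate ancestor to the top, can be sketched as a simple layout pass. This is an illustrative Python sketch under stated assumptions, not how {{etymon}} works: the Node class, the layout function and the pin_root option are all hypothetical, labels are assumed to be unique, and the example tree is only a small subset of the mockup's nodes.

```python
from dataclasses import dataclass, field


# Hypothetical tree node for a descendants tree (not {{etymon}}'s data model).
@dataclass
class Node:
    label: str
    children: list["Node"] = field(default_factory=list)


def layout(root: Node, pin_root: bool = False) -> dict[str, tuple[int, float]]:
    """Return {label: (column, row)} for a left-to-right descendants tree.

    Leaves occupy consecutive rows; every parent sits at the midpoint between
    its first and last child. With pin_root=True the ultimate ancestor is drawn
    on row 0 instead. Labels are assumed to be unique.
    """
    positions: dict[str, tuple[int, float]] = {}
    next_leaf_row = 0

    def place(node: Node, depth: int) -> float:
        nonlocal next_leaf_row
        if not node.children:
            row = float(next_leaf_row)
            next_leaf_row += 1
        else:
            child_rows = [place(child, depth + 1) for child in node.children]
            row = (child_rows[0] + child_rows[-1]) / 2
        positions[node.label] = (depth, row)
        return row

    place(root, 0)
    if pin_root:
        positions[root.label] = (0, 0.0)
    return positions


tree = Node("Proto-Indo-European *ph₂tḗr", [
    Node("Proto-Germanic *fadēr", [Node("Proto-West Germanic *fader")]),
    Node("Proto-Italic *patēr", [Node("Latin pater")]),
    Node("Proto-Celtic *ɸatīr", [Node("Old Irish athair")]),
])
for label, (col, row) in layout(tree).items():
    print(col, row, label)
```

Here the root ends up on the middle row (row 1.0 of rows 0 to 2). In a tall tree that midpoint drifts far down the page, which is the concern Ioaxxere raises; pin_root is one simple way to address it.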

Designs 1 to 5 (the rendered mockups are not preserved in this text; each design shows the same nodes, in this order):

Proto-Indo-European *ph₂tḗr; Proto-Germanic *fadēr; Proto-West Germanic *fader; Old English fæder; Middle English fader; English father; Scots faither; English faeder; Proto-Italic *patēr; Latin pater; Old French pere; Middle French pere; French père; English père; English pater; Tok Pisin pater; Proto-Celtic *ɸatīr; Old Irish athair; Manx ayr; English ayr

At the risk of stating the obvious, only a small fraction of the descendants are being shown here. Is this focussed on English? Nicodene (talk) 21:49, 1 May 2024 (UTC)
@Nicodene: This is just a mockup. I created all the HTML by hand, but the full (automatically-generated) tree will have all the descendants. Ioaxxere (talk) 22:11, 1 May 2024 (UTC)
How would they all fit? Some of the ‘nodes’ have dozens of direct descendants. Nicodene (talk) 22:16, 1 May 2024 (UTC)
@Nicodene: The tree would be extremely tall in that case. Either way, it would still be significantly more readable than something like what we currently have at Reconstruction:Proto-Sino-Tibetan/s-la#Descendants. Ioaxxere (talk) 22:19, 1 May 2024 (UTC)
I have to agree with Nicodene. With etymology trees and the vertical format, it makes more sense to me because the tree will be much more compressed, but for descendants, I can't really see it working as well. It'll get really unwieldy, and fast. The list you've pointed to isn't good either, but I don't like replacing one problem with another. Looking at the link you've sent, how would this interact with etymology-only languages or the situation with Chinese? AG202 (talk) 03:06, 2 May 2024 (UTC)
Etymology-only languages shouldn't be too difficult to handle in general. For Chinese, I feel like including dozens of dialectal pronunciations in Reconstruction:Proto-Sino-Tibetan/s-la is excessive and we should reduce that to only those forms which were borrowed into other languages. It's also possible that descendants trees will end up having less automation than etymology trees in general. Ioaxxere (talk) 05:31, 2 May 2024 (UTC)
One thing that needs to be addressed is alternative forms. In Middle English, there are loads of them for everything. They can't always be ignored, because there are enough cases like catch and chase from Old French: chacier, chacer; cachier, flour and flower from Middle English: flour, fflour, fflowr, fleur, flor, floure, flower, flowr, flowre, flowyr, flur or even morrow and morn from Middle English: morwe, morewe, morowe, morow, morrou, morue, morw, morȝe, morewen, morowen, morȝen, morwen, morwyn, morwhen, morwoun, morun, moron, moryn, morn; morgen, marhen, mareȝen, morghen, moruwe, where different alternative forms have different descendants. Chuck Entz (talk) 18:31, 3 May 2024 (UTC)
Love it. After a quick glance at the HTML, is the only difference alignment? Since this could appear early on in a number of entries that have right-floating tables of contents, I think left-alignment makes the most sense to avoid some of the inevitable bunching. —Justin (koavf)TCM 22:14, 1 May 2024 (UTC)
@Koavf: No, the difference is whether there are connectors on the bottom of the boxes. I have no idea why the alignment is different, actually... Ioaxxere (talk) 22:16, 1 May 2024 (UTC)
Ah, I see that now. —Justin (koavf)TCM 22:17, 1 May 2024 (UTC)

get rid of noun and adjective plural form categories once and for all

There appears to be consensus established here, here and here, as well as in this diff, to not categorize noun and adjective non-lemma forms in separate 'noun plural forms' and 'adjective plural forms' categories. Yet when I made such a change for newly added Chadian Arabic terms, my favorite editor User:Fenakhay went on a revert spree. By longstanding consensus, we do not in general categorize non-lemma forms as e.g. Category:Russian noun prepositional case forms etc., so I don't see why an exception needs to be made for noun plural forms. However, I'd like to get clear consensus here to remove all such categories and delete the entries from Module:category tree/poscatboiler/data/non-lemma forms that allow such categories to be recognized. We have already done this for some languages; for example, there is intentionally no Category:English noun plural forms, and that page is protected against re-creation by bots or non-admins.

The alternative is to outline a clear rationale for why we need such categories and a rule for which situations they are allowed and which situations they aren't allowed. Either way, the current haphazard situation, where some languages have such categories and some don't, and the categories are incomplete, is unmaintainable.

Benwing2 (talk) 23:45, 1 May 2024 (UTC)

And a stronger consensus at Wiktionary:Requests for deletion/Others#Category:Adjective plural forms by language. It seems that Fenakhay is the only editor who supports the retention of these categories. Consensus is against them. This, that and the other (talk) 02:55, 2 May 2024 (UTC)
I support getting rid - trivial category intersections like this are a waste of time. Theknightwho (talk) 03:24, 2 May 2024 (UTC)
I don't see any rationale for this kind of category either and so am in favour of deleting them. Nicodene (talk) 14:13, 2 May 2024 (UTC)
I agree as well. Ioaxxere (talk) 17:57, 2 May 2024 (UTC)
Support deleting these. Ultimateria (talk) 17:21, 6 May 2024 (UTC)
If we have this kind of thing, it should be with a clear rationale for when/where and why (as Benwing says) and it should be added automatically, probably by whatever headword- or definition-line templates we're using to declare something as a noun plural form, paucal form, etc in the first place — I say this because as far as I saw in the prior RFDs, the categories were populated haphazardly and manually with handfuls of entries, which is not useful. The usefulness of categorizing non-lemma forms by their specific non-lemma-ness seems small (though not nonexistent) to me; I suppose if I wanted to know what kinds of endings Foobarian noun plural forms had, a category would be useful, but the array of endings which Foobarian noun plural forms have could alternatively be mentioned on the About Foobar page, or on the Foobarian equivalent of Appendix:English grammar. Can anyone articulate something these categories would be useful for? (Absent that, I have no objection to deleting them, and indeed voted to do so in some of the prior RFDs.) - -sche (discuss) 19:21, 2 May 2024 (UTC)
Personally, I find these categories very useful from a navigational standpoint, so I'd like to see them kept. That said, they should be added automatically as part of templates like {{infl of}} and {{plural of}}, not added manually by users. Binarystep (talk) 11:26, 5 May 2024 (UTC)
@Binarystep Do you realize this is simply an intersection category? In general we don't usually include intersection categories because you can search for any combination using the Search feature. In this case, e.g. to do the equivalent of CAT:Chadian Arabic noun plural forms, you can search for the combination of category CAT:Chadian Arabic noun forms and template Template:plural of. Adding them automatically using templates like {{infl of}} and {{plural of}} has already been tried, but it turns out to be difficult from a programmatic standpoint in some cases and a maintenance headache, which is the reason I want them removed. Benwing2 (talk) 20:02, 5 May 2024 (UTC)
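For reference, the category-plus-template intersection search Benwing2 describes can be written with the CirrusSearch keywords incategory: and hastemplate:. The small Python sketch below only builds the query string and a search URL; the helper names are hypothetical, and the use of the standard index.php search endpoint with fulltext=1 is an assumption about how one would link to the results.

```python
from urllib.parse import urlencode


# Hypothetical helpers; incategory: and hastemplate: are CirrusSearch keywords.
def intersection_query(category: str, template: str) -> str:
    """Query for pages in `category` that transclude `template`."""
    return f'incategory:"{category}" hastemplate:"{template}"'


def search_url(query: str, wiki: str = "https://en.wiktionary.org") -> str:
    # fulltext=1 asks for a list of results rather than jumping to an exact title match.
    return f"{wiki}/w/index.php?" + urlencode({"search": query, "fulltext": "1"})


q = intersection_query("Chadian Arabic noun forms", "plural of")
print(q)
print(search_url(q))
```

Pasting the query string into the site's search box directly gives the same result as following the URL.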

template similar to Template:alt or Template:desc for Derived terms, Related terms, etc.?

Hi. User:Fay Freak and I have been having a discussion about using {{alt}} or {{desc}}, or creating a similar template, for Derived terms and the like. This came up because Fay Freak has been using {{desc|nolb=1}} in Derived terms sections. (Note: |nolb=1 disables the language name at the beginning. FF proposes renaming |nolb= to |nolang= to avoid confusion with |lb= for labels and because what's being suppressed is a language name, not a label.) Both {{alt}} and {{desc}} let you specify a series of terms along with per-term properties plus overall labels for the whole set of terms, although the syntax of the two templates is different and {{desc}} has some extra features specific to descendants. Note that we also have {{syn}}, {{ant}}, etc. for inline synonyms/antonyms/etc., which likewise have support for specifying a series of terms with both per-term properties and overall labels. The current syntax for Derived terms, Related terms and such involves manually listing each term with {{l}} and using {{q}} to add qualifiers as needed, but compared with {{alt}} and {{desc}} this is both more cumbersome and less standardized, meaning that different people format things differently. I think we ought to have some way for Derived terms sections and the like of specifying a list of terms plus labels, similar to {{alt}} and {{desc}}. The question is, should we just reuse e.g. {{alt}} for this purpose, or create another template? (If the latter, I'd maybe call it {{terms}}.) Potentially we could rename {{alt}} to {{terms}} or something similarly generic and keep {{alt}} as an alias, since there isn't really anything about {{alt}} that is specific to Alternative forms.

I'm omitting mention of {{col3}} and the like; while these are useful especially for long lists of similar terms, they don't provide the ability to specify a set of labels at the end of the list of terms, as {{alt}} and {{desc}} do.

Benwing2 (talk) 05:22, 2 May 2024 (UTC)

That'd be quite nice. All I have to add is that it'd help to have the option to split derived terms into columns or put them in collapsible boxes, as people have been doing with a variety of other templates (cf. cado). Nicodene (talk) 14:01, 2 May 2024 (UTC)
I think we'd be able to scrape this to be honest. All it'd need is an etymology section for most terms... Vininn126 (talk) 16:20, 5 May 2024 (UTC)
@Vininn126 I don't quite understand what you mean, can you clarify? Benwing2 (talk) 20:03, 5 May 2024 (UTC)
Sorry, misinterpreted. Not sure I have a strong opinion. Vininn126 (talk) 07:17, 6 May 2024 (UTC)
I'm having trouble understanding the need for such a template beyond stringing multiple {{l}}s together. Can you give an example? I'm also confused by the association being made between Derived terms and Alternative forms. They're pretty distinct in my mind. -- Sokkjō 03:41, 11 May 2024 (UTC)
@Sokkjo User:Fay Freak gave the example in Sittenstrolch of using {{desc|de|Sittich|lb=prison slang|nolb=1}} under Derived terms in order to get the label functionality; it displays as
You can get a somewhat similar effect using {{alt|de|Sittich||prison slang}}:
Here, only one term is listed but you can easily imagine listing multiple terms and multiple labels, which are supported in both syntaxes. Note that you couldn't so easily just use a qualifier because the labels autolink like {{lb}} labels, but don't categorize. I suppose you could write
* {{l|de|foo}}, {{l|de|bar}}, {{l|de|baz}} {{lb|de|prison slang|Austria|nocat=1}}
which displays as
much like writing
* {{alt|de|foo|bar|baz||prison slang|Austria}}
but as you can see, the former is much more awkward.
The reason I brought this up is that there's not a lot of functionality (and arguably no functionality) that's specific to {{alt}}; that's why I mentioned generalizing (or simply renaming) {{alt}} so it can be used outside of Alternative forms sections. Benwing2 (talk) 07:10, 11 May 2024 (UTC)
In the example Sittenstrolch, there is no reason a usage label would belong there -- that should be left to the entry page. If I saw a user add that, I would delete it. -- Sokkjō 07:27, 11 May 2024 (UTC)
Obviously not everyone agrees with you, because qualifiers and labels are extremely common in derived terms, synonyms and the like. I would tread lightly and think twice before deleting such a label. Benwing2 (talk) 08:39, 11 May 2024 (UTC)
What other users are putting usage labels in the derived terms section?! -- Sokkjō 05:07, 12 May 2024 (UTC)
Being able to string together multiple {{l}}’s is all I ever wanted for Christmas. Nicodene (talk) 05:56, 12 May 2024 (UTC)
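The positional syntax Benwing2 uses in {{alt|de|foo|bar|baz||prison slang|Austria}}, where an empty parameter separates the list of terms from the labels that apply to the whole list, can be sketched as follows. This is a simplified Python illustration, not the actual Lua code behind {{alt}}: the function names are hypothetical, per-term properties are ignored, and the output is a plain-text approximation rather than the template's real HTML.

```python
# Simplified illustration of the terms/labels split; not the real Lua behind {{alt}}.
def split_terms_and_labels(params: list[str]) -> tuple[list[str], list[str]]:
    """Split the positional parameters that follow the language code.

    Everything before the first empty parameter is a term; everything after it
    is a label applying to the whole set of terms.
    """
    if "" in params:
        sep = params.index("")
        return params[:sep], params[sep + 1:]
    return params, []


def render_plain(params: list[str]) -> str:
    """Plain-text approximation: comma-separated terms, labels in parentheses."""
    terms, labels = split_terms_and_labels(params)
    text = ", ".join(terms)
    if labels:
        text += " (" + ", ".join(labels) + ")"
    return text


# {{alt|de|foo|bar|baz||prison slang|Austria}} without the language code:
print(render_plain(["foo", "bar", "baz", "", "prison slang", "Austria"]))
```

The same split works unchanged for a single term, as in {{alt|de|Sittich||prison slang}}, which is part of the argument that nothing in this syntax is specific to Alternative forms sections.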

Plurals on head lines and declension tables

Is there any point in having both plurals on the head line and a declension table showing the plural for a noun lemma? I would be inclined to omit the plural(s) when there is a declension table. --RichardW57m (talk) 16:36, 2 May 2024 (UTC)

@RichardW57m it would perhaps help to specify which language you're thinking of and give an example. This, that and the other (talk) 03:07, 6 May 2024 (UTC)
The specific language where this has come up is Lithuanian; avìdė, for example, currently only displays the plural through the declension table. A similar issue arises with the Lithuanian adjective headword template, where until recently the neuter form of many ordinals was wrong and contradicted the following declension table. --RichardW57m (talk) 11:27, 7 May 2024 (UTC)
IMO it depends on how regular the inflections in question are. If they serve as something like principal parts, I think it's useful to put them on the headword line as well as in the declension table, because then someone with some familiarity with the language will know how to inflect the term without needing to look through the whole declension table to figure out what the most important parts are. This is similar to how we list the past historic and past participle for Italian verbs. OTOH if they are largely predictable, putting them in the headword line is less useful. Benwing2 (talk) 23:27, 8 May 2024 (UTC)
As Benwing suggested, I would say the answer is language-specific. For example, in German, plurals seem to be the most unpredictable declined form of a noun, so it makes some sense to give the plural in the head line.--Urszag (talk) 22:38, 9 May 2024 (UTC)

A way to more easily connect with readers: a follow-up

Following Wiktionary:Beer_parlour/2024/March#A_way_to_more_easily_connect_with_readers, I wrote to WMF in an attempt to figure out how to best resolve this issue. @Johan Jönsson replied and has given us an option, I think. He suggests we create a new mailing list for admins and for us to put enwiktionary in the name somehow. What do people think of this solution? Vininn126 (talk) 16:03, 3 May 2024 (UTC)

  Support Ioaxxere (talk) 16:56, 3 May 2024 (UTC)
  Support This, that and the other (talk) 08:30, 5 May 2024 (UTC)
  Support Binarystep (talk) 12:45, 5 May 2024 (UTC)
  Support Thadh (talk) 11:09, 6 May 2024 (UTC)

Volga Türki language

Greetings, I'd like to propose giving Volga Türki an L2.

It is a significant member of the Middle Turkic literary languages, and is as important as Ottoman Turkish, Chagatai and Karakhanid, all of which already have their own Wiktionary categories: Category:Ottoman Turkish language, Category:Chagatai language, Category:Karakhanid language. Like Chagatai, Volga Türki is considered a descendant of Karakhanid, although they are all roughly contemporary.

It was widely used in the Volga-Ural region from the 15th century (or from the 12th century, if the Qissa-i Yosof poem by Qul Ghali is included) until the adoption of the Cyrillic and Latin scripts for the Tatar and Bashkir languages under Soviet rule. Even though the written Tatar and Bashkir languages started to diverge slightly from Volga Türki in the late 18th and early 19th centuries, before Soviet rule, Volga Türki remained a common standard for international affairs, especially with other Turkic groups.

Its addition would not only help with etymology sections but also help connect cognates in other Turkic languages, as the sections of the other Middle Turkic literary languages do.

As for Unicode characters, numerals and readings, I have already prepared all of this, and will start adding them as soon as the category is created. The lemmas will be taken from books, dictionaries and other written sources from that period. I will try to list a source for each lemma whenever possible.

The only issue, however, is that the language does not yet have its own ISO 639-2 code. I propose using one of the following codes for the language: iut (for İdil-Ural Turkic) or tui (Turkic of İdil-Ural). I advise against codes like vut (Volga-Ural Turkic) and ott (Old Tatar): first, because the name Volga was not used by the locals, especially during the era of Volga Türki; and second, because the name Volga/İdil/İdel Türki is neutral, whereas Old Tatar primarily refers to the diverged variant of Volga Türki that was used specifically for Tatar. Bababashqort (talk) 16:06, 3 May 2024 (UTC)

What is the Volga Turki corpus and how accessible is it? Qissa-i Yosof poem by Qul Ghali should definitely not be included, as it is covered by Khorezmian Turkic [1]. Allahverdi Verdizade (talk) 20:40, 3 May 2024 (UTC)Reply
  Support BurakD53 (talk) 17:57, 4 May 2024 (UTC)Reply
Its corpus mostly isn't digitalised, but practically all Bashkir and Tatar literature from at least 16th century until late 19th century is written in Volga Türki. The books, manuscripts and magazines are still preserved in a lot of libraries in Tatarstan and Bashkortostan. As for Qissa-i Yusuf, that is somewhat debatable, but given the timeframe it probably suits Khorezmian, as one of the ancestors of Volga Türki. Bababashqort (talk) 07:24, 5 May 2024 (UTC)Reply
@Bababashqort: for the last issue, we generally make up our own codes using the code for the group it belongs to (probably "trk") followed by a hyphen ("-") followed by some sequence of letters that's not already in use by us. That way there's no chance of our code conflicting with an ISO code. Since this is strictly for internal use and our modules and css/jss code convert everything for browsers, we don't have to use existing ISO codes. Chuck Entz (talk) 18:24, 4 May 2024 (UTC)Reply
Yes, I've been told that wiki uses a placeholder, but didn't exactly know how it worked. Thank you for explaining!
In this case I'd suggest trk-iut Bababashqort (talk) 07:25, 5 May 2024 (UTC)Reply
@Bababashqort We try to use the first three letters of the lect in the second part of names like this. What do you think of trk-idi or trk-vol? Benwing2 (talk) 08:04, 5 May 2024 (UTC)Reply
trk-idi includes only the Volga part, as well as trk-vol. The name itself, however, is taken from the most widespread naming of the language, which unfortunately is shortened to Volga Türki, omitting Ural. And speaking of İdil, it is actually spelled as İdel in Tatar itself, İdil is just more Common Turkic. Therefore the only solution seems to be trk-iut, it's not that hard to deduce I think. Bababashqort (talk) 11:54, 5 May 2024 (UTC)Reply
@Allahverdi Verdizade suggested to make a Turki category instead, which I'd very much prefer. It would remove the need to add more distinct subvariants of it, such as North Caucasian Turki, Nogay Turki and others. This would also allow to use derivation template for all languages that used it: Crimean Tatar, Kumyk, Nogay, Bashkir and others. Bababashqort (talk) 13:21, 5 May 2024 (UTC)Reply
@Bababashqort Sure, that works. What language is this a category of? Benwing2 (talk) 19:56, 5 May 2024 (UTC)
I think he meant he wants Türki as a language code, not specifically Volga Türki. Bortkastningskonto (talk) 07:01, 6 May 2024 (UTC)
@Bortkastningskonto @Bababashqort OK, I need more information then. Is "Türki" supposed to be an L2 language? This is an awfully generic name for a language, and I would likely oppose it for this reason. And I will repeat my assertion that the code for Volga Türki should be 'trk-vol', in keeping with the name. The code should reflect the first three letters of the lect name barring extraordinary circumstances (usually ambiguity, when multiple lects share the same first three letters, which is not an issue here). @Allahverdi Verdizade can you weigh in here? I am not qualified to tell whether this should be an L2 language, an etym-only language or just a label of some other language (the last two being rather similar). Benwing2 (talk) 07:11, 6 May 2024 (UTC)
I didn't actually suggest making Türki an L2; rather, I wondered whether it wouldn't be better to do so, depending on how different Volga Türki is from, say, North Caucasian Türki. I can't answer that question myself, and I think, in general, very few people can give a well-informed opinion on it. Reading this book on North Caucasian Turki (in Russian) might help a little. Considering that Bababashqort is likely only going to work with sources written in the Volga variety, maybe it is safest to create a Volga Turki L2, in which case you would circumvent the problem of the "awfully generic name". Documents in North Caucasian Turki are terribly inaccessible (not digitized or normalized), so I don't think anyone is going to work with them.
In any case, there is also the problem of classifying "literary languages" and fitting them into genealogical tree schemes. It is often said that this or that language "is mostly X, but also incorporates elements of Y", while at the same time it "continues the literary tradition of Z". I can't exactly tell you what it means that "Volga Turki continues the tradition of Khorezmian Turki", which in turn "continues the tradition of Karakhanid", as it is often put in Russian books on the matter. Too much arbitrariness for my taste. So my opinion is that these "literary languages" maybe should not have ancestors and descendants. Allahverdi Verdizade (talk) 17:35, 7 May 2024 (UTC)
  Support Yorınçga573 (talk) 20:23, 9 May 2024 (UTC)
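Chuck Entz and Benwing2 describe the default convention for new internal codes: the family code, a hyphen, then (normally) the first three letters of the lect's name, provided the result isn't already in use. A minimal Python sketch of that default, assuming a simple diacritic-folding step, is below. It is hypothetical: the name `propose_code`, the folding and the `taken` set are illustrative assumptions, not Wiktionary's actual tooling. It returns None when a collision means a human has to choose another suffix.

```python
import unicodedata
from typing import Optional


def propose_code(family: str, lect_name: str, taken: set) -> Optional[str]:
    """Hypothetical helper: build '<family>-<first three letters>' for a lect.

    Returns None when the candidate is already in use, i.e. when an editor
    has to pick a different suffix by hand.
    """
    # Fold diacritics (e.g. 'ü' -> 'u' + combining mark) and keep only ASCII letters.
    folded = unicodedata.normalize("NFKD", lect_name.lower())
    letters = [c for c in folded if "a" <= c <= "z"]
    candidate = f"{family}-{''.join(letters[:3])}"
    return None if candidate in taken else candidate


print(propose_code("trk", "Volga Türki", set()))  # trk-vol
```

Bababashqort's proposed trk-iut follows a different pattern, so under this rule it would have to be chosen by hand, which Benwing2 allows only in extraordinary circumstances.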

Request for a new language

Yet again, I request that Old Lombard be listed separately; at present Old Lombard is listed as a dialect, not a language. That Northern Irish Historian (talk) 17:30, 4 May 2024 (UTC)

I notice that Old Italian is currently an etym-only variant of Italian. Why can't Old Lombard be the same? How different are Lombard and Old Lombard? Benwing2 (talk) 18:59, 4 May 2024 (UTC)
Old Lombard:
  • Faremo preg a Deo a Questi cominzament
  • et a la soa mather ke preg l’omnipotent.
  • Ke n’des a dir et a far tute l so placiment
  • Ço ked is la scritura si se conven a dir
  • De la pasin de Christ a ki ne plas hodir
  • La qual per nu katif je plase sostegnir
  • Bene questi paroli de panzer e da stremir
  • Qui longa fis e dis del pasio del fy de la rayna.
  • La qual si m’dia gratia et a mi sia vesina
  • Ke parlo dritament de la pasion divina
  • St’apreso si me scampo da la infernal pena.
Modern Lombard:
  • Ambiaróm con ‘na preghiéra a Dio
  • e a sò madèr che la préghes l’Onipotent
  • Che nómes a dì e a fa töt de so gradimènt
  • E per bontà sò el vègnes a compimènt
  • Chèl che la dis la Scritüra isé come l’è giöst a dìl
  • De la pasiù de Cristo a chi che öl sintìl
  • Pasiù che per notèr pecadùr la sèrf a soportà
  • Con rasegnasiù chèste parole de pianzer e dè dulùr
  • Ché se parla e se dìs del fiöl de la regina
  • Che la me dàghes gràsia e la me stàghes vizìna
  • ‘Ntat che parle drit de la pasiù divina
  • Semài che scamparó de la pena infernal.
That Northern Irish Historian (talk) 22:35, 8 May 2024 (UTC)
@That Northern Irish Historian That's not what I was looking for; you have pasted in two different translations, which will naturally differ. If you try to match up the corresponding words, they are IMO just different enough that they might be considered different L2s (although they differ less than e.g. the current Occitan dialects do). I notice, however, that there are currently no lemmas listed as Old Lombard; are you actually planning on adding some? Benwing2 (talk) 23:16, 8 May 2024 (UTC)
Yes, but see zinqui, Jesu, and other pages. It is not working. That Northern Irish Historian (talk) 23:24, 8 May 2024 (UTC)

That's how we enter these words. If you have any objections, please write here. BurakD53 (talk) 14:29, 5 May 2024 (UTC)

lol. Yes, I have objections. Allahverdi Verdizade (talk) 16:11, 5 May 2024 (UTC)
As I said before, I want the {{trk-ogz-pro}} code to be removed and replaced with {{trk-ogz}}. Since we have already reconstructed them all on the {{trk-pro}} pages, Proto-Oghuz is quite unnecessary. If anyone still wants to reconstruct Proto-Oghuz forms, they can do so using the * sign on the Oghuz page (though that is quite unnecessary). Likewise, {{trk-klj}} can also refer to the Arghu language, but the data for this language consists of only a few words. {{trk-ogz}} is the direct ancestor of all Oghuz languages; in short, it is the same as Proto-Oghuz {{trk-ogz-pro}}. However, we cannot add the Oghuz or Proto-Oghuz words recorded in the Diwan to the site as entries; they would have to be reconstructions to be accepted. Yet these Proto-Oghuz words, and likewise the Proto-Khalaj words, are not reconstructions. I think both should be entered as entries on the site, the biggest reason being that these languages cannot be assumed to be dialects of other languages. But since the Arghu language consists of only a few words, it can be entered under a Proto- name. The Oghuz language is mentioned many times in the Diwan, and information about its grammar is even given. A few Proto-Khalaj, i.e. Arghu, words may be added as exceptions. But since this is the case for Oghuz, there is no need to create a language code called Proto-Oghuz. This is my opinion. I firmly reject the addition of these Oghuz words to Old Anatolian Turkish. Not every word mentioned in the Diwan is attested in Old Anatolian Turkish, and the place where Kashgarî shows the Oghuz on the map in the period he describes is not Iran but Central Asia. Also, the words here are more archaic than the forms in which they are found in Old Anatolian Turkish. BurakD53 (talk) 18:22, 5 May 2024 (UTC)
  Support Yorınçga573 (talk) 20:10, 9 May 2024 (UTC)

Lemma categories

Discussion moved from WT:Beer parlour/2024/April#Lemma categories.

I've been cleaning up Special:UncategorizedPages, and I've run across a number where @Nicodene has disabled categorization for alternative forms. My understanding is that all mainspace entries should be in either Category:[Language] lemmas or Category:[Language] non-lemma forms. While an alternative form is supposed to be a stub that links to the main form, as far as the categories are concerned, it's a lemma. It's certainly not a non-lemma form, because it has its own non-lemma forms. Leaving it out of both categories raises the question of why we have the entry at all, if we feel we need to hide it: if we don't link to it in the main entry, there's no way to navigate to it.

This has come up before over the years, and we've more than once decided to do it this way. As far as I can tell, Nicodene is the only editor who's doing otherwise. Has anything changed? Chuck Entz (talk) 03:13, 6 May 2024 (UTC)

Why should Category:Franco-Provençal lemmas be clogged with twelve different renditions of ôtro, seventeen of ôtrament, and ten of solament? Why should Category:Old French lemmas (not to mention Category:Old French adverbs) be clogged with two hundred and seventy-one renditions of iluec? The whole point of a lemma is to provide a citation form that covers the variants. That is how alternative forms and spellings are handled by the vast majority of dictionaries. Nicodene (talk) 03:23, 6 May 2024 (UTC)
I'm of two minds here. Yes, we generally include alternative spellings and forms as lemmas; otherwise, for example, we'd end up including only one of oxidi{s,z}e as a lemma, and the other would go nowhere. At the same time, however, including 171 alt variants of iluec seems like serious overkill. Maybe we need a separate policy for non-standardized languages vs. standardized ones. Benwing2 (talk) 07:16, 6 May 2024 (UTC)
At a minimum, every entry should be in some category. As far as how that's been accomplished up to now, my understanding matches Chuck's, that every entry is supposed to be categorized as either a lemma or a nonlemma (or both) and that alternatively-spelled nouns are still nouns (and lemmas, from the category / grammatical perspective). We could change that, e.g. add a parameter which, instead of turning categorization off, moves the entries from "Category:Foobarian nouns" to at least a POS-agnostic catchall "Category:Foobarian alternative forms and spellings", or something more specific like "Category:Foobarian alternative forms and spellings of nouns", "Category:Foobarian alternative forms and spellings of lemmas", but I do think we should continue to regard a completely uncategorized entry—an entry that cannot be accessed from any part of our category tree—as a problem.
There was support for not putting just any alternative spelling into topical categories in this 2022 discussion, but that didn't leave the entries categoryless.
FWIW, the issue of terms having tons of spellings isn't strictly limited to overall-nonstandardized languages, e.g. English has lots of spellings of kinnikinnick, Muhammad, voivode... but I think Benwing's suggestion of handling this on a per-language basis (and just accepting that the English categories will have a few cases like Muhammad where there are a bunch of spellings) is probably more workable than e.g. trying to decide (in a way that can be maintained over time with any consistency) on a per-spelling basis what counts, in a mostly-standardized (but standards-body-less "ungoverned") language like English, as a "standard" spelling. (E.g., several of the alternative spellings of Muhammad are used mainly in scholarly works, so dismissing them as nonstandard seems hard; and in the other direction, for a largely dialectal word, determining why any one spelling should be considered more standard than another seems hard.) - -sche (discuss) 13:54, 6 May 2024 (UTC)
We could change that, e.g. add a parameter which, instead of turning categorization off, moves the entries from "Category:Foobarian nouns" to at least a POS-agnostic catchall "Category:Foobarian alternative forms and spellings"
I would be quite happy to use that if it were available as an option.
My main concern is keeping the categories clear and usable. When I look up 'Foobarian feminine nouns', for instance, I'd rather not have to wade through 5–10 (+) duplicates for every distinct noun. That is a serious headache with languages like Franco-Provençal or Romansch. Nicodene (talk) 07:54, 7 May 2024 (UTC)
@-sche: I would like this to be implemented for English as well. Having full-fledged entries for minor spelling variants was a bad idea. Ioaxxere (talk) 03:08, 10 May 2024 (UTC)
I disagree. All words should be given equal status, at least when it comes to categorization. I don't think Wiktionary should be treating variant spellings as inferior forms of the main entries. For starters, every spelling is (or was) the "default" spelling to someone. Using the example of Muhammad, for instance, there are plenty of people named Mohamad, Mohamed, Mohammad, Muhamad, Muhammet, etc., and it seems weird to claim that their names are merely lesser variants of the single "canonical" spelling. There's also the fact that some spellings carry unique etymological information, have slightly different pronunciations, or are used primarily by certain groups (regional spellings, for instance, or spellings used primarily by non-native speakers). Frankly, I find it troubling that there have been so many attempts lately to get us to reduce our coverage rather than expand it. At this rate, I won't be surprised if someone starts a proposal to convert alternative spellings into hard redirects. Binarystep (talk) 19:20, 11 May 2024 (UTC)

Edit with "username removed"

This edit has the username removed. How can one see (if not who the user is) which user removed it, and why? [2] Equinox 09:33, 6 May 2024 (UTC)

I removed it because it was an accidental IP/logged-out edit by an editor (the same one who made a similar change to unrapable). — SURJECTION / T / C / L / 10:17, 6 May 2024 (UTC)
I'm officially saying: don't do that. You can revert or delete, but do not wipe content unless it's really serious stuff like child porn. Thank you. Equinox 23:01, 7 May 2024 (UTC)
Re how to see which admin performed the revdel: it's technically in the "View logs for this page" link on the edit history page, [3]. If there were a lot of revdels and they did not follow so closely after the edits themselves were made (e.g. if I now went to the page and hid a revision from two months ago, and then Surjection hid a revision from one month ago as well as your edit just now), it might be hard for non-admins [who don't have "diff" links] to discern from that log who hid which thing. I guess in that case they'd just have to ask "hey, who revdel'd X?" and admins could check. - -sche (discuss) 14:28, 6 May 2024 (UTC)
@Surjection: I want you to understand how it looked to me: I saw that someone had made an edit, but they had no name; I couldn't see them, or talk to them, or discuss it; it was like a GHOST DID IT. And I couldn't see who removed their name either. If you ever spent time on WP:OFFICE then... well. Equinox 22:50, 7 May 2024 (UTC)
I would, personally, be happy to see text like "edit made by a user whose name is hidden by this admin: Surjection". What I think is wrong and bad, and goes against our free openness, is just that MYSTERY NO-NAME. Equinox 22:51, 7 May 2024 (UTC)
Side point: I know Chuck Entz (for example) likes to "clean the graffiti wall" so that vandals can't see their names. But I don't like that. The wiki should be a public space, and we should only hide the history in really serious situations like "doxxing" (real names and addresses) or... am I wrong? @-sche @Chuck Entz @Surjection (And even worse, are there Wikipedia rules we are supposed to obey as children?) Equinox 22:55, 7 May 2024 (UTC)
AFAIK it's global WMF policy to suppress this kind of thing (the IP addresses of users who've accidentally edited logged out), and indeed to suppress it far harder than a mere revision-deletion like Surjection did: "oversighters" have (or had?) database access to delete the information so thoroughly that not even admins can see it. (But it also takes time to contact them, so it's fine for admins to revdel it in the meantime, as here.) This is precisely because of doxxing concerns, because many IP addresses identify the person's real address. (Other IP addresses, of course, merely send you to that one farm in Kansas.) If you ever see an edit where you think the content of the edit is wrong, just undo the edit... as you saw in this case, the username being suppressed doesn't prevent you from undoing the edit. - -sche (discuss) 01:39, 8 May 2024 (UTC)
Would there be an issue if contributors were to hide their IP address with their screen names after, say, a week? CitationsFreak (talk) 03:46, 8 May 2024 (UTC)
I should clarify that AFAIK such hiding only happens when someone requests it: usually the person who made the edit, though plausibly someone else who simply noticed what was going on. Last I heard, WMF folks were trying to roll out something that automatically obfuscates all IP addresses by making them show up in edit histories as e.g. incrementing numbers that change periodically or on request (so anytime someone thinks their current [non-]IP is getting too much attention from admins, they can hit "refresh" and start doing vandalism under a new identity, just like logged-in users can by creating multiple accounts). If it gets implemented, that will probably remove the need to do this in the future. - -sche (discuss) 05:12, 8 May 2024 (UTC)
Would there be an issue if contributors could request that their IP address be hidden by their screen name? CitationsFreak (talk) 05:42, 8 May 2024 (UTC)

The issue of Old Kashubian (Old Pomeranian?)

I recently came to a realization about the {{R:zlw-opl:SPJSP|Old Polish dictionary}}: it contains texts from Pomerania with Pomeranian features, as it was made at a time when Kashubian was considered a dialect of Polish. However, typologically, this is very, very wrong. Pomeranian is considered North Lechitic, and anything "Polish" (Masovian, Upper Polish, Lower Polish, and Silesian) is considered East Lechitic; therefore anything Old Kashubian should not be considered Old Polish. I propose a split. I intend to add the location of creation for all Old Polish documents anyway, for a future dialectal project (for Old Polish this means somehow categorizing the location of attestation by dialect), and I propose separating any texts from Pomerania into "Old Pomeranian" with the code zlw-opm, or perhaps "Old Kashubian" with the code zlw-ocb, with Kashubian and Slovincian as its children. These codes seem clunky to me and I am open to others. I have also corroborated this by emailing the editors of the Old Polish dictionary, who told me that it is indeed "Old Kashubian", which they accept within their framework of Old Polish. Gorazd also holds the same view. @Thadh @Sławobóg @Rakso43243 @Benwing2 @Mahagaja @Silmethule. Vininn126 (talk) 10:50, 6 May 2024 (UTC)

Alternatively, if we accept Kashubian and Slovincian as the descendants of Old Pomeranian, we could set them both to be descendants of Old Polish. The argument for this is that one could accept "Old Kashubian" as a constituent of Old Polish: not a dialect, but a constituent. This is what the editor of the Old Polish dictionary told me, quote: "I didn't write that it's a dialect. I wrote that it's a constituent element of the Old Polish language. That's a big difference. The Old Kashubian language is a constituent element of the Old Polish language." The other alternative is that we ignore this, which seems wrong to me as well. Vininn126 (talk) 11:42, 6 May 2024 (UTC)
Another solution: give Old Kashubian an etym-only code and make it a variety of Old Polish; then, if a term is attested in Pomerania, we could set the Kashubian and Slovincian reflexes as inherited from that, and otherwise directly from Proto-Slavic? Vininn126 (talk) 14:36, 6 May 2024 (UTC)
@Vininn126 I think this last solution is maybe the best. It is similar to what is done with Old Northern French, which is considered an etym-only variety of Old French even though Old French as normally construed refers to the Old French of the Paris area, whereas Old Northern French refers to the Old French of Normandy, and neither is an ancestor or descendant of the other. The two differ significantly in phonology, e.g. Old French chacier /tʃatsiɛr/ -> English "chase" vs. Old Northern French cachier /katʃiɛr/ -> English "catch". Anglo-Norman and modern Norman are both descendants of Old Northern French (although we currently list Norman as a descendant of Middle French, which is wrong), and modern French is a descendant of Old French per se. Benwing2 (talk) 18:38, 6 May 2024 (UTC)
I know @Silmethule also mentioned similar situations with Ancient and Mycenaean Greek, and with Old Norse and Swedish/Icelandic. See also my question on WT:About Old Polish. Related to that, I'm unsure how to handle labels for all of this. I think we'd want to list Kashubian/Slovincian in the Old Polish entries if and only if a text from Pomerania has an attestation. And any Kashubian/Slovincian words should still have "inherited from Old Kashubian/Pomeranian". Vininn126 (talk) 18:49, 6 May 2024 (UTC)
@Nicodene As our resident Romance expert, do you agree with changing the ancestor of Norman to be Old Northern French instead of Middle French? I think this will cause the 5 terms in CAT:Norman terms inherited from Middle French to throw errors. Can you fix up those 5 terms? Also, I notice there are 30 terms in CAT:Norman terms inherited from Medieval Latin, which seems impossible; they probably need to be cleaned up. Benwing2 (talk) 19:54, 6 May 2024 (UTC)
I've just cleared out the categories in question. Agreed on removing Middle French as an ancestor of Norman. As for its further ancestor, I would leave it as just Old French, which includes ONF as-is. I think the two are best treated as one overall language.
I've been meaning to eliminate '[Romance] terms inherited from Medieval Latin' in general, reassigning them to '...inherited from Early Medieval Latin' or '...borrowed from [later] Medieval Latin'. That will take some time. When it's done, perhaps we can make {{inh|romance language|ML.|...}} throw an error message and a brief comment. Nicodene (talk) 00:50, 7 May 2024 (UTC)
@Nicodene Thanks! I think the basic advantage of setting the ancestor of Norman to be Old Northern French is that it shows the ancestry more clearly (when you go to CAT:Norman language and look at the Ancestors panel) than just setting it to Old French. Since Old Northern French is an etym-only variant of Old French, I don't think it will make any difference in terms of what Norman terms are allowed to inherit from. What do you think? Benwing2 (talk) 01:44, 7 May 2024 (UTC)
Oh, so setting it to ONF won't disallow inheritance from Old French. In that case it sounds fine to me. Nicodene (talk) 01:50, 7 May 2024 (UTC)
Yeah, that's right. Benwing2 (talk) 01:56, 7 May 2024 (UTC)
@Nicodene @Benwing2 Here's how it works. If you set a variety (etym-only language) as an ancestor, the descendant can inherit from:
  • That ancestor and any (sub)varieties of that ancestor (in this case, Old Northern French, and any varieties it might have).
  • The parent (in this case, Old French) unless the ancestral variety is also explicitly ancestral to its parent (read: the thing it's a variety of), which doesn't apply here. This is for situations like Tajik having Classical Persian as an ancestor: Classical Persian's set as a variety of Persian, but is also set as its ancestor. Since Tajik's ancestor is also Classical Persian, it's only possible for it to inherit from Classical Persian (and any varieties thereof), not Persian in general.
It can't inherit from:
  • Any other varieties of the parent which aren't in the direct lineage of its ancestor (i.e. it wouldn't be able to inherit from other varieties of Old French, unless they're ancestral to/descended from/a subvariety of Old Northern French). To use an Italic example: if we set the proto-language of Romance to be Vulgar Latin, instead of simply Latin, the Romance languages could also inherit from Classical Latin (its ancestor), Latin (the general parent) and Old Latin (set as the ancestor of Latin), but they wouldn't be able to inherit from varieties like Medieval Latin or New Latin, since they aren't in the direct lineage.
It sounds complicated, but it seems to line up pretty neatly with most people's intuitions in practice. Theknightwho (talk) 15:40, 10 May 2024 (UTC)
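The inheritance rules Theknightwho lists can be modelled, very roughly, as a walk over a small registry of languages. The sketch below is a hypothetical Python illustration, not the actual Lua in Wiktionary's language modules: the `Lang` record, the `allowed_sources` function and the toy registry are all assumptions. It only encodes the cases spelled out in his comment: the ancestor and its subvarieties; the ancestor's parent, unless the ancestor is also declared ancestral to that parent; and the explicit ancestor chain, as in the Vulgar Latin example.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Set


@dataclass
class Lang:
    # Set only for etym-only varieties: the full language it is a variety of.
    parent: Optional[str] = None
    # Explicitly declared ancestors.
    ancestors: List[str] = field(default_factory=list)


def subvarieties(reg: Dict[str, Lang], name: str) -> Set[str]:
    """All varieties nested (at any depth) under `name`."""
    found: Set[str] = set()
    frontier = [name]
    while frontier:
        cur = frontier.pop()
        for other, lang in reg.items():
            if lang.parent == cur and other not in found:
                found.add(other)
                frontier.append(other)
    return found


def allowed_sources(reg: Dict[str, Lang], ancestor: str) -> Set[str]:
    """Languages that a descendant whose ancestor is `ancestor` may inherit from."""
    # Rule 1: the ancestor itself and any of its (sub)varieties.
    allowed = {ancestor} | subvarieties(reg, ancestor)
    frontier = [ancestor]
    seen: Set[str] = set()
    while frontier:
        cur = frontier.pop()
        if cur in seen:
            continue
        seen.add(cur)
        allowed.add(cur)
        lang = reg[cur]
        # Walk up the explicit ancestor chain (Vulgar Latin -> Classical Latin).
        frontier.extend(lang.ancestors)
        # Rule 2: a variety's parent is allowed too, unless the variety is
        # itself declared ancestral to that parent (the Classical Persian case).
        if lang.parent and cur not in reg[lang.parent].ancestors:
            frontier.append(lang.parent)
    # Varieties of a parent that are not on this lineage (e.g. Medieval Latin
    # when the ancestor is Vulgar Latin) are never visited by the walk.
    return allowed


# Toy registry reflecting the examples in this thread (hypothetical data).
REG = {
    "Latin": Lang(ancestors=["Old Latin"]),
    "Old Latin": Lang(parent="Latin"),
    "Classical Latin": Lang(parent="Latin"),
    "Vulgar Latin": Lang(parent="Latin", ancestors=["Classical Latin"]),
    "Medieval Latin": Lang(parent="Latin"),
    "New Latin": Lang(parent="Latin"),
    "Old French": Lang(),
    "Old Northern French": Lang(parent="Old French"),
    "Persian": Lang(ancestors=["Classical Persian"]),
    "Classical Persian": Lang(parent="Persian"),
}

print(sorted(allowed_sources(REG, "Vulgar Latin")))
```

In this toy model, Vulgar Latin yields `['Classical Latin', 'Latin', 'Old Latin', 'Vulgar Latin']`, leaving out Medieval and New Latin. Old Northern French yields itself plus Old French, matching the Norman discussion, and Classical Persian yields only itself, matching the Tajik case.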
So it would be possible to set Old Kashubian as an etym-only variant of Old Polish and then set Kashubian and Slovincian as the children of Old Kashubian but not Old Polish? Vininn126 (talk) 15:43, 10 May 2024 (UTC)
@Vininn126 Per the rules just outlined, we could definitely make Old Kashubian an etym-only variety of Old Polish and set the ancestor of Kashubian and Slovincian to Old Kashubian, but people would still be able to "inherit" Kashubian and Slovincian terms from Old Polish. It'd be like the situation with Old French. If you wanted to avoid that, either we'd need a new flag or rule of some sort, or we'd need to rename Old Polish to e.g. "Old Lechitic" and make Old Polish an etym-only variety of Old Lechitic. @Theknightwho Here's a thought, though. If we explicitly set the ancestor of Old Kashubian to Proto-Slavic, would that make it impossible to inherit Kashubian terms from Old Polish? That would be a slight generalization of the special-case rule for ancestral-to-parent etym languages. Benwing2 (talk) 21:18, 10 May 2024 (UTC)
I've dreamed of "Old Lechitic", but it doesn't encompass Polabian. Vininn126 (talk) 22:08, 10 May 2024 (UTC)
@Vininn126 Sorry, why does Polabian matter here? It can just be excluded from Old Lechitic, just as it would be excluded from Old Polish. Benwing2 (talk) 08:41, 11 May 2024 (UTC)
@Benwing2 I have actually tossed the idea of "Old Lechitic" around before with @Sławobóg and @Silmethule. I suppose that since it contains Old Kashubian as well, there is more precedent for the name. Vininn126 (talk) 08:45, 11 May 2024 (UTC)
Also pinging @KamiruPL as the other main Old Polish editor so he can be aware of the goings-on and give his opinion. Vininn126 (talk) 08:34, 11 May 2024 (UTC)
I wouldn't like this. This is almost akin to handling Old East Slavic as an Old Church Slavonic variety. Pomeranian and Polish are two distinct branches, and the fact that, at an earlier stage, the literary variety of one was heavily influenced by the other doesn't make them one and the same. Thadh (talk) 20:43, 6 May 2024 (UTC)
There's actually a similar issue with texts from Pomerania in {{R:pl:SXVI}} and {{R:pl:SXVII}}, but I think we can safely nest these under modern Kashubian with a label, as I have done with Middle Polish. Vininn126 (talk) 19:43, 6 May 2024 (UTC)

Old Polish regional categorization

As a sort of continuation of Wiktionary:Beer_parlour/2024/May#The issue of Old Kashubian (Old Pomeranian?) and Wiktionary talk:About Old Polish#Regional Old Polish, I'm trying to figure out the best way to handle regional information for Old Polish. I have a document explaining the origin of most Old Polish texts, so it should be easy to figure out which of the 5 lects currently considered Old Polish (Masovian, Greater Polish, Lesser Polish, Silesian, and Pomeranian/Kashubian) each text belongs to. I think it would be useful for readers to know in which region a definition/term has been attested, since Old Polish wasn't a single entity and is ultimately the source of today's dialects; this would let us see regional features and the like more clearly. My concern about using labels is that they would imply that a term might have been limited to a given lect, which we can't know for sure. What do others think? Vininn126 (talk) 19:17, 6 May 2024 (UTC)

One solution could be to use {{lb}} but print the text "attested in", e.g. {{lb|zlw-opl|attested in|Masovia|Lesser Poland}}. @Benwing2, would this be technically bad? Vininn126 (talk) 15:56, 8 May 2024 (UTC)
@Vininn126 No, I don't see why that would be an issue. attested in isn't currently a recognized label, but it could easily be made one, so that it suppresses the following comma. Benwing2 (talk) 23:21, 8 May 2024 (UTC)
@Benwing2 Alright, that would be fine, and I think that's a good solution. Vininn126 (talk) 07:32, 9 May 2024 (UTC)
@Benwing2 Another solution would be to have the quotation templates categorize by dialect when added to a page. Would this be a bad idea? Vininn126 (talk) 07:44, 9 May 2024 (UTC)
@Vininn126 Yeah, the quotation templates do take a label, but I feel uncomfortable categorizing based on that label. You could, for example, imagine someone illustrating a general-use term with a sentence written in a dialect, and labeling the quotation with the dialect in question; in that case, that doesn't mean the term belongs to the dialect. Benwing2 (talk) 08:24, 9 May 2024 (UTC)
@Benwing2 Alright, so for now I'm going to add the location of creation of the documents and a note saying which label the quotation template should count toward (see {{RQ:zlw-opl:AcCas}}), and I'll add the labels and regions manually from there, unless it would be possible to do a bot job afterwards. Vininn126 (talk) 08:26, 9 May 2024 (UTC)
@Vininn126 It might be possible; it depends on how regular everything is and on you making a list of all the quotation templates and their associated lects/labels. Benwing2 (talk) 08:36, 9 May 2024 (UTC)

Continental Celtic

We have Continental Celtic as a family, but my understanding is that the consensus among Celticists is that CC isn't a clade but just a term of convenience for Celtic languages other than the Insular Celtic ones. Isn't it our custom at Wiktionary to have only actual genetic families, not convenient groupings? —Mahāgaja · talk 11:28, 7 May 2024 (UTC)

@Mahagaja Yeah, we should get rid of this. BTW the Wikipedia article on Continental Celtic was in a terrible state due to a bunch of crap added a month ago, which I reverted. Benwing2 (talk) 22:06, 7 May 2024 (UTC)
Yeah, agreed. Theknightwho (talk) 04:09, 10 May 2024 (UTC)

Ban one-descendant Proto-Italic and Proto-Hellenic redlinks

There are already far too many one-descendant Proto-Italic and Proto-Hellenic entries, and adding one-descendant redlinks to, for example, a descendant tree or an etymology section is only going to encourage more of these entries to be created. These redlinks should be banned. -saph 🍏 13:31, 7 May 2024 (UTC)

Right, there should be an above-average incentive required to create such a page, so unless it has already been decided to have one, bots should neutralize these links. Fay Freak (talk) 13:51, 7 May 2024 (UTC)
In practice, what does a 'ban' on making certain kinds of redlinks mean, and what is the alternative it is supposed to incentivize? I guess mentioning the same form but not linking it would be slightly better, as it doesn't encourage creating an entry, but I'm not totally happy with that either in some cases. E.g. if the reconstructed form is itself doubtful, I wouldn't want it to be mentioned anywhere. --Urszag (talk) 15:44, 7 May 2024 (UTC)
For example:
From Proto-Italic *fworom, from Proto-Indo-European *dʰwor-om (enclosure, courtyard, i.e. something enclosed by the door, or the place outside, i.e. through the door), from *dʰwer- (door, gate).
That is, with the Proto-Italic word displayed as just plain text, rather than what we currently have (forum). As for the reconstructed form being doubtful, we should just list the hypothesised PIE form, e.g.:
As opposed to the current etymology given at serius. -saph 🍏 15:58, 7 May 2024 (UTC)
The alternative it is supposed to incentivize is not creating such entries. You would need a more serious motive than ticking off a red link, since removed red links aren't visible in the first place. Fay Freak (talk) 16:03, 7 May 2024 (UTC)
Agreed. Down the line it may also be worth discussing a general ban on reconstructions (and their associated redlinks) that have only one descendant and no derived terms. Nicodene (talk) 22:49, 9 May 2024 (UTC)
Could someone run a bot to do this? -saph 🍏 19:50, 10 May 2024 (UTC)

Add "Muslim", "Hindu" etc. labels?

Proposal to add labels for lemmas used by people of specific faiths (which are not necessarily religious terms; rather, they're only used by certain groups). Case in point: মিঞা (mĩa), which has a Muslim gloss, but the Muslim label is an alias for 'Islam', even though it's not an 'Islamic' term, just one used by Muslims. Urdu dictionaries, which I concern myself with, have used these labels for centuries without prejudice. I know this would be useful for languages of the Indian subcontinent, as well as European languages (especially English). نعم البدل (talk) 20:55, 7 May 2024 (UTC)

@نعم البدل There are (at least) two possibilities here. One is to disentangle the labels 'Muslim' and 'Islam' in a language-independent fashion, and the other is to do it for specific languages. I suspect the aliasing of 'Muslim' and 'Islam' was done with English entries in mind, where on the surface it makes a certain amount of sense (e.g. we have 'Muslim finance' as an alias of 'Islamic finance' and 'Christian' as an alias of 'Christianity'). A third possibility is to create a separate label, something like 'Muslim usage' or 'Muslim speakers', which makes it clear that the term is used by particular speech communities. Note that the advantage of doing it in a language-specific fashion is we can create associated categories, such as Category:Muslim Bengali, to categorize such terms, which wouldn't make so much sense if done language-independently. Finally, the adjective-noun issue you're bringing up isn't limited to this case; there is for example the issue of 'British India' (English terms formerly used in British India) vs. 'British Indian' (English terms currently used by Brits of Indian background).
BTW if you think the terms should be disentangled language-independently, you can see all current uses of the label 'Muslim' here: Special:WhatLinksHere/Wiktionary:Tracking/labels/label/Muslim (there are only 9 of them). Benwing2 (talk) 21:58, 7 May 2024 (UTC)Reply
@Benwing2: I think the 'Muslim' (etc.) tag should be detached from the 'Islam' label, made into an independent label and placed under Module:labels/data/topical so that, as you say, it can generate associated categories, something like Category:Bengali Muslim speech (similar to Category:English women's speech terms; this differs slightly from 'Muslim Bengali', since the label I'm proposing should be shed of its religious connotations as much as possible).
  • you can see all current uses of the label 'Muslim' here – Thank you for this! As far as I can see, apart from marabout, all of the other terms should be placed under my proposed label, as that's what was probably implied. Note how the 'Muslim' tag in মিঞা (mĩa) was encapsulated with Template:a (added by an IP), not the 'Muslim' label – likely because the 'Muslim' label appends the lemma to Category:Islam which doesn't fit. نعم البدل (talk) 02:21, 8 May 2024 (UTC)Reply
@نعم البدل OK, let's see if there are any objections/comments, and if not I'll make this change in a few days. Benwing2 (talk) 03:04, 8 May 2024 (UTC)Reply
Yeah no worries! نعم البدل (talk) 17:34, 8 May 2024 (UTC)Reply

@Benwing2, نعم البدل For a while there was a category named CAT:Musalman Gujarati, which is now empty. The handful of terms that were in it were moved to CAT:Gujarati dialectal terms. It would be helpful if there were a category named something like CAT:Gujarati Muslim speech as a replacement for CAT:Musalman Gujarati.

There is a phenomenon known as being a Cultural Muslim, but not a practising Muslim, who might use the terms in a category such as CAT:Muslim speech but not necessarily identify with the terms in CAT:Islam. The same would probably be applicable to other faiths.

Would greetings such as salaam alaikum that are associated with a Muslim context but may or may not be intended to be Islamic be in the proposed CAT:Muslim speech alongside CAT:Islam? For this particular term, it says on Wikipedia that it is ‘common among Arabic speakers of other religions (such as Arab Christians and Mizrahi Jews)’. The usage notes section of नमस्ते says ‘it is often considered gracious to greet someone in their religion’s greeting’ [even if that differs from their own religion]. Kutchkutch (talk) 03:44, 10 May 2024 (UTC)Reply

@Kutchkutch: I might be drifting away from the subject a little, since I'm a little interested in this :) The case of salaam alaikum is slightly complex, though. In Arabic, it's a common greeting used by people of the Abrahamic faiths. I'm not really sure about the exact perception of that phrase in Arabic, but in Urdu it's sometimes the same: people who speak Urdu, regardless of their faith, might use that term, but some hardliners might be of the opinion that it's forbidden even to say 'Salam' to a non-Muslim, while other Muslims might not bat an eye at the other person's faith, and a label might not even be considered. Generally, I would say it belongs in CAT:Muslim speech (but not Category:Islam), because alternatives like آداب (ādāb) are considered more 'neutral'. Is नमस्ते (namaste) considered to be inherently a Hindu phrase, as is generally the perception of Urdu speakers, even when it comes to Hindi, or is it somewhat neutral? نعم البدل (talk) 01:44, 11 May 2024 (UTC)Reply
@نعم البدل: Thanks for the clarification about سَلام عَلَیکُم (salām 'alaikum).
  • Is नमस्ते considered to be inherently an Hindu phrase, …when it comes it to Hindi, or is it somewhat neutral?
  • With respect to this proposal, नमस्ते and नमस्कार could go in CAT:Hindi Hindu speech. However, there is nothing inherently Hindu about the words नमस्ते and नमस्कार in and of themselves, other than Sanskrit being the liturgical language of Hinduism (similar to how Arabic is the liturgical language of Islam). What may be considered inherently Hindu/Buddhist/Jain/Sikh about नमस्ते and नमस्कार is when the salutation (and the related hand gesture 🙏) is directed toward a deity rather than an actual person.
  • Although the words नमस्ते and नमस्कार are found in Vedic literature in the context of worshipping Hindu deities, the words themselves are formations derived from नमस्, which is cognate with نماز. نماز was probably associated with Zoroastrianism rather than Islam before the Islamic conquest of Persia, which indicates that the term was not inherently bound to a particular religion.
  • Even though नमस्ते and नमस्कार are considered Hindu greetings, they seem to be neutral when speaking Hindi, because it may only be inappropriate to use them if both the speaker and the listener belong to a community that has its own community-specific greeting, such as सलाम अलैकुम (salām alaikum) among Muslims, जय जिनेंद्र (jay jinendra) among Jains and सत श्री अकाल (sat śrī akāl) among Sikhs. The reason for this may be that India is 79.8% Hindu (according to the 2011 census). If there are no overt indicators of the other person’s religion when talking to strangers, using the Hindu greetings (alongside the English greetings) may be considered neutral, since there is roughly an 80% probability that the other person is a Hindu.
  • Under this proposal, would the title of CAT:Judeo-Urdu remain the same, or would it be renamed to CAT:Urdu Jewish speech? Kutchkutch (talk) 11:23, 11 May 2024 (UTC)Reply
    @Kutchkutch That is a very good question; maybe User:-sche would have some thoughts about this. Often the Judeo-Foo varieties are their own dialects rather than just consisting of extra terms added to the language and writing the language in the Hebrew script (most famously "Judeo-German" aka Yiddish and "Judeo-Spanish" aka Ladino, each of which has its own L2). In other cases however, it is more comparable to the distinction between Hindi and Urdu, and in some situations there is even less of a difference; I'm not sure about Judeo-Urdu. Benwing2 (talk) 18:34, 11 May 2024 (UTC)Reply

Englishman picture

So User:Shoshin000 (among other trollish activities) has been insisting on adding a picture of an angry football hooligan as the picture for "Englishman". I reverted it once; he restored it. I mention this because I know the modus operandi, and soon I'll be accused of being a badmin. Check out the entry and you'll see the previous picture was nicer. Equinox 22:47, 7 May 2024 (UTC)Reply

I personally think your picture is better (although I wonder, do we need a picture to illustrate this?). Benwing2 (talk) 00:00, 8 May 2024 (UTC)Reply
Honestly, I like Shoshin's pic, as it's more stereotypical.[1] There's nothing inherently Englishman-y about Eq's pic, besides the depicted person being English.
[1] Then again, that's a good argument against the pic. CitationsFreak (talk) 03:25, 8 May 2024 (UTC)Reply
It could be argued that pictures of nationalities, if they exist at all, should show someone of that nationality in characteristic clothing (although that is probably more appropriate for nationalities that actually have characteristic clothing that most people wear on a day-to-day basis). OTOH it's in general very hard to capture a nationality in single picture (for this reason, Wikipedia usually supplies a whole collection of pictures to illustrate a nationality), and in any case this is more encyclopedic than dictionaric (a real but rare word). Benwing2 (talk) 04:06, 8 May 2024 (UTC)Reply
Yeah, I was thinking that a collage would be best. I'm not sure what a recognizable British outfit would be, and having one person stand in for Britain could imply that all British people are X. Highly unlikely, but possible. CitationsFreak (talk) 04:12, 8 May 2024 (UTC)Reply
I don't think nationalities should have photos at all, but I also disagree that File:ENG-BEL (6).jpg is "a picture of an angry football hooligan". The person in that photo doesn't look angry, nor is he doing anything hooliganish. His Englishness is clearly shown by the St George's Cross painted on his face. He arguably does illustrate [[Englishman]] better than the photo of Greg Rutherford, since Rutherford is representing the entire UK (not just England) in his photo. All that said, however, it is probably better to leave such entries unillustrated to avoid stereotyping. —Mahāgaja · talk 08:07, 8 May 2024 (UTC)Reply
I agree, this is not an entry that requires an image. Vininn126 (talk) 08:09, 8 May 2024 (UTC)Reply
Aren't photos appropriate where there is an attestable, probably dated and often derogatory or demeaning, definition of a stereotype? Eg, Bavarians with lederhosen, Prussians with spiked helmets, Mexicans with sombreros and/or serapes.
There is no such definition here, nor would I expect us to attest any such definition. DCDuring (talk) 17:34, 8 May 2024 (UTC)Reply
I don't really see this picture as a problem, really, even though I wouldn't pick it myself. It'd probably be fine as part of a collage. Theknightwho (talk) 17:51, 8 May 2024 (UTC)Reply

Fixing Telugu rhymes

For years now, User:Rajasekhar1961 has been adding Telugu rhymes written in Telugu script instead of IPA. There is a special hack in Module:rhymes to deal with this, but IMO Telugu should (obviously) use IPA for rhymes, just like all other languages. Does anyone object to this? Can anyone out there read Telugu script well enough to tell me whether the rhymes listed under Rhymes:Telugu (e.g. Rhymes:Telugu/రం) are even salvageable, or whether they should just be nuked? I don't know much about Telugu, but scripts generally don't map 1-to-1 onto IPA, so I don't know what it means to have a rhyme listed using Telugu script. Benwing2 (talk) 00:40, 8 May 2024 (UTC)Reply

Strongly agree. Theknightwho (talk) 11:40, 8 May 2024 (UTC)Reply
@Benwing2, Theknightwho Rajasekhar1961 has certainly put effort into creating CAT:Telugu rhymes. However, unless the definition of a rhyme in a Telugu or Dravidian context differs from
‘the second part of a syllable, from the vowel on, as opposed to the onset’
you are correct in pointing out that these do not appear to have been done correctly. From an orthographic perspective, the final consonant (or consonant cluster) followed by the final diacritic (or the inherent schwa) of a word written in Telugu script (which is a Southern Brahmic abugida) does not constitute a rhyme. The entries in CAT:Telugu rhymes categorise words by word-final syllables rather than rhymes because the onset is included.
A Telugu editor could probably rectify the words mentioned on the entries in CAT:Telugu rhymes. However, even if there is a user with the appropriate background to do so, it would be a lot of work, and it would be the equivalent of deleting the entries currently in CAT:Telugu rhymes and starting over again. Kutchkutch (talk) 11:13, 10 May 2024 (UTC)Reply
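To make the distinction above concrete, here is a minimal Python sketch that applies the quoted definition to a syllabified IPA transcription: take the final syllable and drop its onset, keeping everything from the vowel on. The vowel set, the treatment of "." and stress marks as syllable boundaries, and the example transcriptions are all assumptions for illustration; the function is hypothetical and not taken from Module:rhymes. Multisyllabic rhymes (for example, counting from the last stressed vowel) would need a different rule.

```python
# Assumed, simplified vowel inventory for illustration only.
# Length marks (ː) are kept as part of the rhyme.
VOWELS = set("aeiouɛɔəæ")


def rhyme_of(ipa: str) -> str:
    """Return the part of the final syllable from its vowel onward.

    Assumes syllables are separated by "." and that stress marks also
    begin a new syllable. Slashes or brackets around the transcription
    are ignored.
    """
    text = ipa.strip("/[]")
    for mark in "ˈˌ":
        text = text.replace(mark, ".")
    last_syllable = text.split(".")[-1]
    for index, char in enumerate(last_syllable):
        if char in VOWELS:
            # Drop the onset consonants; keep the nucleus and coda.
            return last_syllable[index:]
    # No vowel found (e.g. a syllabic consonant): return the syllable as-is.
    return last_syllable
```

Under this rule, a hypothetical /pa.ɾam/ and /ka.lam/ share the rhyme -am, whereas grouping by the Telugu-script final syllable (such as రం) keeps the onset and would file them on different pages.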
@Kutchkutch Thanks. User:Rajasekhar1961 can you comment on why you did this? If I don't hear from you in a few days I will go ahead and delete all the Telugu rhymes. Benwing2 (talk) 14:50, 10 May 2024 (UTC)Reply

Kwami is messing with translingual entries, again

Just want to make sure there are some eyes on Kwami, as they've been making mass edits to Translingual entries that seem... worrying. After being reverted by @Theknightwho and @Benwing2 for deleting the translingual section, Kwami has recently begun deleting all the definitions from the translingual section instead.

I reverted all (but one) of the single character edits they've made today. However, they've been editing hundreds of TL entries and I have no idea how many entries are affected, as I've been very busy recently and can't check.

I'm not sure how bad the situation is so I don't want to "call out" Kwami. Just want to make sure people are aware before it becomes out of hand, like the last time this was discussed on here. — Sameer مشارکت‌هابحث﴿ 23:54, 8 May 2024 (UTC)Reply

@Sameerhameedy Thank you. I have blocked him for a month this time; I am getting seriously sick of this. I think he has used up all his lives; next time we should consider a permablock. Benwing2 (talk) 00:38, 9 May 2024 (UTC)Reply
Thank you, I'm also a bit annoyed since Kwami has gotten so many warnings and continues to do the same action. Now, Kwami has indicated that they will actually start a discussion on this issue before acting. There's no way to know if Kwami will actually follow through on that statement, but hopefully they do, so we don't have to do this every month. — Sameer مشارکت‌هابحث﴿ 00:51, 9 May 2024 (UTC)Reply
Just to clarify, these weren't random articles. I went through the whole Latin Extended Additional block and replaced physical descriptions (e.g. "the letter N with a line below") with requests for definition. I didn't delete actual definitions that would tell the reader what the letter meant or what it was used for.
Sameer, the discussion is the next thread. kwami (talk) 06:16, 10 May 2024 (UTC)Reply
@Kwamikagami That is exactly the issue. You are continuing to fail to see that there is no consensus for doing what you did, after 10+ times that you've been asked to get consensus *BEFORE* doing mass changes. If you're not seeing this now, I doubt you will ever see it, and if you're not willing to defer to and respect consensus, you're in for a permablock. Benwing2 (talk) 08:01, 10 May 2024 (UTC)Reply
@Benwing2, Sameer was concerned that there may be many more such edits, so I clarified what edits I had made. That included the category of articles I had edited, and the kind of edits I had made on them. I thought they might find that helpful.
As to your point, I wonder how possible it is to get consensus to do anything here. Hopefully the discussion below will produce consensus. My hopes aren't high, given that previous discussions got nowhere, but you never know. kwami (talk) 09:06, 10 May 2024 (UTC)Reply
I've been at Wiktionary for almost 20 years and have never yet seen a Beer parlour discussion result in consensus, so my hopes aren't high either. —Mahāgaja · talk 09:16, 10 May 2024 (UTC)Reply
You can't have read many Beer Parlour discussions, then. Kwami is simply trying to convince themselves that what they're being asked to do is impossible, because they can't ever accept they're wrong about anything, ever. It's not complicated. Theknightwho (talk) 10:57, 10 May 2024 (UTC)Reply
@Mahagaja I have seen plenty of Beer Parlour discussions that result in consensus; not sure what you're referring to. Benwing2 (talk) 14:53, 10 May 2024 (UTC)Reply
I guess, ironically, there's no consensus on whether there's consensus? And while I think we Wiktionarians like to bicker and often disagree over certain details, I do think there's enough cooperation, compromise, and agreement to say that plenty of threads end in consensus. Vininn126 (talk) 15:10, 10 May 2024 (UTC)Reply
If I decide that "q" is not a proper English letter unless followed by "u" and I want to get rid of all the English entries with a "q" not followed by "u", there is no way that I can get consensus for that via any process. That doesn't mean that I can go ahead and remove the English entries for words like Qatar and Iraq or even BBQ (it's not a proper abbreviation) because the usual process doesn't work. It means I should find something else to do. The unwritten question underlying all of this is "how can I get my way when I'm right and I can't get people to say they agree with me". Yes, the process isn't perfect, and sometimes doesn't work, but rejecting it entirely won't fix it. Chuck Entz (talk) 16:16, 10 May 2024 (UTC)Reply
That's why I'm here. The question is straightforward: do we have standards for what counts as a definition? If so, what are they? Where can I find them?
In this case, does a graphic description count as a definition? Quite a few editors have said they do not, but there seems to be difficulty in implementing that.
Also, should we have a translingual section without providing evidence of translingual use? Especially when there is no definition in that translingual section?
Do we have consensus that such things should be tagged with RFDef or RFD, and how should I respond if I tag them and someone goes through and deletes the tags without discussion because they don't like the extra work?
It's fine to say 'go to RFD', but why spend months doing that if it should be obvious from the outset that they're not going to pass? That's a waste of everyone's time. That's why I'd like some concrete standards to follow. I assume Wikt must have standards; if you could just show me where they are (I don't see anything in the help pages), I could add a link to my user page and refer to them when making edits. Then instead of arguing over every edit, I could point to the standards and show that I've been following them, or they could point to them and show that I've been violating them. I don't mean about the RFD process, but about the content of our articles. kwami (talk) 19:37, 10 May 2024 (UTC)Reply
In favor of this. Vininn126 (talk) 09:41, 10 May 2024 (UTC)Reply

Do descriptions count as "definitions"?

I'm not being facetious here. This is a serious question for something I haven't understood for a long time.

For instance, in the article á, would "the letter a with an acute accent" be a valid definition? If so, should such descriptions be added to all letters? If not, should they be removed (perhaps placed under a "Description" heading instead)? And if not, and the only material for an article is such a non-definition, should the entry be tagged as needing a definition, or the article tagged for deletion for having no content?

I suspect that if I were to add a definition to cat as "the word spelled C-A-T", I would be blocked for vandalism. I don't see any meaningful difference between that and defining á as "the letter a with an acute accent". I've been told this is a straw-man argument, but I really don't understand what's appropriate in our entries if graphical descriptions are allowed as actual definitions.

The same applies to emojis, of course. Should an emoji of a face with tears be defined as "a face with tears", or should the definition be what it means and what it's used for? kwami (talk) 00:29, 9 May 2024 (UTC)Reply

@Kwamikagami I agree that "the letter a with an acute accent" is not a good definition. If á had a translingual section it should at least explain how the letter is typically used across languages. I assume it usually represents some kind of /a/? Ioaxxere (talk) 01:43, 9 May 2024 (UTC)Reply
A definition that depended on users understanding IPA, however, would be unsatisfactory. DCDuring (talk) 01:56, 9 May 2024 (UTC)Reply
Personally, I don't see the point of a translingual section, except for things like the IPA or IAST transliteration, and this particular case would only be sum-of-parts in such cases.
But my question is what should be done with articles that have such non-definitions. Since starting this discussion, I was blocked for pasting [rfdef] tags yesterday on a bunch of articles in place of such descriptions.
So,
  1. since it is not a good definition, can it be removed?
  2. should it be replaced with a request for definition, or should the empty section be deleted?
kwami (talk) 01:56, 9 May 2024 (UTC)Reply
@Kwamikagami: I think it's better to improve the definition rather than adding a ton of {{rfdef}}s as this creates lots of work for other editors. Ioaxxere (talk) 02:09, 9 May 2024 (UTC)Reply
I agree, and I've been doing that where I can. But how do I improve the definition when there is no definition? What is the translingual definition of a letter that does not have translingual use? What is the definition of a letter that has no evidence of any kind of use? What I've been tagging are cases where I can't find any definition to provide.
The reason I've been adding rfdef tags is that I'm not allowed to delete empty entries.
So, if there's an empty article or section, one that has no content except a character box, and no definition of what's supposedly being defined, what's the solution? Do we leave it as a joke, or do we try to improve it? If we want to improve it, how do we do that, when there's no available data to improve it with?
If someone added a bunch of articles, all with the definition being "it's a word", shouldn't we at the very least tag them as needing actual definitions, even if that creates work for people? kwami (talk) 02:17, 9 May 2024 (UTC)Reply
I've gone through and added definitions to hundreds of these articles. The ones I've tagged are ones where I can find no definition to give. It's a choice between adding a tag and leaving Wikt looking like a joke. kwami (talk) 02:24, 9 May 2024 (UTC)Reply
Clearly, most definitions are simple descriptions, prototypically, for nouns, having a hypernym and differentia. The descriptions are also supposed to be useful to users. "The word spelled C-A-T" is not useful, being redundant to the graphic representation of the headword itself, and is thus a straw-man. For a Latin letter with a diacritical mark it might be useful for some if the definition explained how the so-marked letter differs from the Latin character without the mark, or from those with other marks, in each relevant character set.
A definition of a word naming an emoji might include a description as well as what the emoji is understood to mean, like a good definition of green light. The entry might also have the appropriate image, too, constituting an ostensive definition, redundant to the headword in the case of Latin characters diacritically marked. DCDuring (talk) 01:56, 9 May 2024 (UTC)Reply
> "The word spelled C-A-T" is not useful being redundant to the graphic representation of the headword itself"
But "the letter a with an acute accent" is equally redundant to the graphic representation of the headword itself. So no, it's not a straw-man, it's reductio ad absurdum.
Because our users include many people who are not familiar with diacritical marks on Latin letters we give them an explanation of what they are looking at. Those descriptions also help those who, like me, don't have great vision and can't necessarily discriminate among the various diacritical marks and don't necessarily know the names of those marks. DCDuring (talk) 12:29, 9 May 2024 (UTC)Reply
I have no problem with that. That's what the 'description' section is for. But it's not a definition.
We also have a pronunciation section for people who don't know how to pronounce a word. But again, the pronunciation of a word is not its definition.
In many cases I've moved the description to a description section, and tagged the definition section as needing a definition. But then people get annoyed that I'm creating work for them, because now they're expected to treat Wikt as an actual dictionary.
Much of the opposition to improving articles seems to center around it being more important for Wikt to be correctly formatted and to look good, than for it to actually contain any content or be useful as a dictionary. kwami (talk) 21:58, 9 May 2024 (UTC)Reply
Definitions of nouns are not descriptions of the word, but of the meaning of the word. Orthography and pronunciation belong in other sections: they are not the definition itself. Why should graphemes (incl. emojis) be different? Many have (graphical) description, etymology and pronunciation sections. Those cover the description of the letter as a mark on paper or as said. A definition concerns itself with meaning. If no meaning is provided, so the reader can't tell what the symbol is for, then we're not providing a functional definition. kwami (talk) 02:00, 9 May 2024 (UTC)Reply
BTW, we also have cases of 'translingual' sections with zero evidence of translingual use. Sometimes letters are specific to a particular language, yet they have a translingual section with no definition. All that does is push the actual definition down lower on the page, where you can't see it without scrolling. That is a minority situation, but we do have hundreds of articles with "the letter a with an acute accent"-type descriptions as their 'definition'. kwami (talk) 02:10, 9 May 2024 (UTC)Reply
I guess my question is, if it's not acceptable to add these fake definitions, and it's not acceptable to tag them for improvement, why is it not acceptable to delete them? kwami (talk) 02:29, 9 May 2024 (UTC)Reply
@Kwamikagami What's not acceptable is deleting them without going through RFV or RFD. Come on, we've said this many times by now. Theknightwho (talk) 02:36, 9 May 2024 (UTC)Reply

English anagrams

English anagrams haven't been updated in a while. Could someone run a bot to update them? Maybe @Kiril kovachev, Benwing2 Ioaxxere (talk) 01:43, 9 May 2024 (UTC)Reply

@Ioaxxere I can try, but I'm not sure I trust myself to do it properly. Specifically, the question of which characters (like punctuation) should be removed when comparing two words. Kiril kovachev (talkcontribs) 12:41, 9 May 2024 (UTC)Reply
@Kiril kovachev: It doesn't matter too much, since the vast majority of English terms don't have any special characters. Punctuation (periods, commas, etc.) as well as different casing should definitely be ignored, but I have no preference with respect to diacritics. Ioaxxere (talk) 15:45, 9 May 2024 (UTC)Reply
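As a rough illustration of the comparison rules suggested here (ignore case and punctuation, optionally ignore diacritics), a minimal Python sketch of an anagram key and a grouping step might look like the following. The function names and the `strip_diacritics` switch are hypothetical; this is not the code of any existing Wiktionary bot.

```python
import unicodedata
from collections import defaultdict


def anagram_key(term: str, strip_diacritics: bool = False) -> str:
    """Sorted letters of a term, ignoring case, spaces and punctuation."""
    # Case-fold and normalize to NFC so "Listen" and "silent" compare equal
    # and precomposed/decomposed accents are treated the same way.
    text = unicodedata.normalize("NFC", term.casefold())
    if strip_diacritics:
        # Decompose (é -> e + U+0301) and drop the combining marks.
        text = "".join(
            char for char in unicodedata.normalize("NFD", text)
            if not unicodedata.combining(char)
        )
    # Keep only letters and digits: spaces, periods, hyphens and apostrophes are ignored.
    return "".join(sorted(char for char in text if char.isalnum()))


def group_anagrams(terms, strip_diacritics: bool = False) -> dict:
    """Map each anagram key to the terms sharing it, keeping only real groups."""
    groups = defaultdict(list)
    for term in terms:
        groups[anagram_key(term, strip_diacritics)].append(term)
    # A key shared by only one term has no anagrams to list.
    return {key: members for key, members in groups.items() if len(members) > 1}
```

With diacritics kept, "café" and "face" are not anagrams; with `strip_diacritics=True` they are. Which default is better is exactly the open question above.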

becocked, and whether we want Trivia sections

Recently, Trump used the word becocked, which attracted some attention because it's an unusual word, and quite a lot of people thought he'd made it up (even though he didn't). Is this the kind of thing we want to note in trivia sections? To me, it seems like the kind of thing no-one will care about in a month, and that it adds pointless clutter. Pinging @Ioaxxere, who originally added it as a usage note, but later changed it to the little-used Trivia heading. Theknightwho (talk) 02:12, 9 May 2024 (UTC)Reply

His use of the term attracted some media coverage ([4] [5]) making it probably the most notable event in the history of becocked. Does a single sentence about this really add so much clutter? Ioaxxere (talk) 02:18, 9 May 2024 (UTC)Reply
Ask yourself this: in two years' time, if someone came across this in an entry, would they feel like this addition was the cringeworthy result of terminally online recency bias? Almost certainly yes. It's basically just celebrity gossip. Theknightwho (talk) 02:21, 9 May 2024 (UTC)Reply
"X said this word" should never go in Trivia/Useful Notes. It can go as a quote, however. CitationsFreak (talk) 03:08, 10 May 2024 (UTC)Reply
I agree completely. I think trivia sections should chiefly be used for things like noting that a word is thought to be the longest in a particular language, has no vowels, doesn’t rhyme with any other word—that sort of thing. — Sgconlaw (talk) 11:35, 10 May 2024 (UTC)Reply
I agree too. PUC19:24, 11 May 2024 (UTC)Reply

Manipuri vs Meitei language (moved to RFM)

I propose we change it to Meitei as the language is predominantly spoken by the Meitei people. Meitei is not the only language indigenous to Manipur. There are other ethnic groups in Manipur who speak different languages. So there are many Manipuri languages, Meitei is only one of them. 178.120.0.250 10:40, 9 May 2024 (UTC)Reply

FWIW; this is about renaming what we call Manipuri to Meitei. I told the IP to come here, but in hindsight, perhaps WT:RFM would be a better venue.
At least the English Wikipedia seems to use Meitei as the primary name for the language. — SURJECTION / T / C / L / 11:10, 9 May 2024 (UTC)Reply
Sure, btw you can call me 178 if you want. It's a bit more specific. 178.120.0.250 11:31, 9 May 2024 (UTC)Reply
Yes, WT:RFM is the usual place for discussions about renaming languages. —Mahāgaja · talk 13:34, 9 May 2024 (UTC)Reply
I oppose the proposal as it is unneeded: the rename neither adds nor removes anything valuable. There aren't any active editors in the language, and if such a user comes along and finds a problem with the name, he will point that out naturally and the discussion will be fruitful. Discussing it now will only waste time, given that in this case the current name is obviously not obstructive. Word0151 (talk) 14:42, 9 May 2024 (UTC)Reply
  Support seems like Wikipedia already changed the name. Not that we need to match Wikipedia, but if they changed it and the only interested editors here wanna change it too... why not? — Sameer مشارکت‌هابحث﴿ 15:52, 9 May 2024 (UTC)Reply

Performing bulk edits for Bengali/Bangla

Discussion moved from Wiktionary talk:Beer parlour/2024/May.

I'm a NLP researcher who uses Wiktionary to collect pronunciation data. As part of this effort we have noticed various inconsistencies in phonemic transcription. For example,

1. According to various sources (Khan, 2010; Dasgupta, 2003; Ferguson & Chowdhuri, 1960; Chatterji, 1970), Bengali has only one glottal fricative, the voiceless /h/, so /ɦ/ > /h/. E.g.: অকৃতোদ্বাহ 'bachelor' /ɔ.kri.t̪od̪.ba.ɦo/ > /ɔ.kri.t̪od̪.ba.ho/. This symbol is not correctly represented in the Wiktionary Bengali transliteration guide, so I propose to edit the guide.

2. The correct phonemic transcription (ref. Dasgupta, 2003; Ferguson & Chowdhuri, 1960; Chatterji, 1970) for affricates should include the tie-bar, so /tʃ, tʃʰ, dʒ, dʒʱ/ > /t͡ʃ, t͡ʃʰ, d͡ʒ, d͡ʒʱ/. E.g.: চরম 'extreme' /tʃɔɾom/ > /t͡ʃɔɾom/, ছায়াছবি 'film' /tʃʰae̯atʃʰbi/ > /t͡ʃʰae̯at͡ʃʰbi/, জল 'water' /dʒɔl/ > /d͡ʒɔl/, ঝিনুক 'sea shells' /dʒʱinuk/ > /d͡ʒʱinuk/. This tie-bar is not included in the Wiktionary Bengali transliteration guide. I propose to include the tie-bar in the affricate symbols.

3. According to various sources (Khan, 2010; Dasgupta, 2003; Ferguson & Chowdhuri, 1960; Chatterji, 1970), Bengali doesn't have the palatal plosives /c/ and /ɟ/. Instead it has postalveolar affricates (ref. https://en.wiktionary.org/wiki/Wiktionary:Bengali_transliteration). Therefore, /c/ > /t͡ʃ/ and /ɟ/ > /d͡ʒ/. E.g.: অগোচর 'beyond one's knowledge' /ɔɡocɔr/ > /ɔɡot͡ʃɔr/, অগ্নিযুগ '(figurative) the age of revolution' /oɡniɟuɡ/ > /oɡnid͡ʒuɡ/.

Does there exist any tool or API that could allow us to apply bulk edits? If this sounds right, I will start to make corrections. Arundhatisgupta (talk) 16:06, 9 May 2024 (UTC)Reply
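For the mechanical part of points 1, 2 and 3, a bulk edit would amount to a few character substitutions per transcription. The minimal Python sketch below is a hypothetical helper, not an existing Wiktionary tool or API, and it only transforms a single IPA string; fetching and saving pages is out of scope. It assumes that a bare "c" in Bengali IPA is always the palatal plosive, and that any t+ʃ belonging to different syllables has already been marked with a syllable break.

```python
import unicodedata

TIE = "\u0361"  # COMBINING DOUBLE INVERTED BREVE, the IPA tie bar above

# Hypothetical one-to-one substitutions for points 1 and 3 of the proposal.
SYMBOL_MAP = {
    "\u0266": "h",                    # ɦ -> h (the breathy mark ʱ, U+02B1, is a different character)
    "c": "t" + TIE + "\u0283",        # palatal plosive c -> t͡ʃ
    "\u025f": "d" + TIE + "\u0292",   # palatal plosive ɟ -> d͡ʒ
}


def normalize_bengali_ipa(ipa: str) -> str:
    """Apply the proposed /ɦ/, /c/, /ɟ/ and tie-bar changes to one transcription."""
    # NFC keeps precomposed letters such as ç intact, so only a bare "c" is replaced.
    text = unicodedata.normalize("NFC", ipa)
    text = "".join(SYMBOL_MAP.get(char, char) for char in text)
    # Point 2: tie untied t+ʃ and d+ʒ. A tied pair has U+0361 between the
    # letters, so it does not match here and is left alone.
    # Caveat: a heterosyllabic t+ʃ (e.g. in কুৎসা) must already be written
    # with a syllable break ("kut.ʃa"), otherwise it will be wrongly tied.
    text = text.replace("t\u0283", "t" + TIE + "\u0283")
    text = text.replace("d\u0292", "d" + TIE + "\u0292")
    return text
```

Because an already-tied pair never matches, running the function twice gives the same result as running it once, which matters for a bot that may revisit pages.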

I relocated this post because it was in the wrong place. — Sgconlaw (talk) 16:20, 9 May 2024 (UTC)Reply
The IPA has long held that the tie bar is not necessary when transcribing languages that don't distinguish affricates from stop-fricative sequences. If Bengali doesn't distinguish /t͡ʃʰ/ from ?/tʃʰ/, then our current transcription convention is fine.
In describing the phonetics of a language, you want to be as precise as possible, so the ties are a good thing. But with a key like we have, they're not necessary.
The tie bars clutter a transcription and can make it more difficult to read. If we did implement them, it would probably be better to use the under-tie, ⟨t͜ʃʰ⟩. That's generally more legible because our eyes pick up details better at the top of a symbol, so the under-tie is less distracting. kwami (talk) 05:12, 10 May 2024 (UTC)
While "the tie bar is not necessary", it is good practice to include it and most languages on Wiktionary do. I don't see why Bengali would be an exception. Thadh (talk) 11:36, 10 May 2024 (UTC)
I agree with @Thadh.
@kwami It is not necessary for English either. Why did you include it in English? Also, there is no consistency. If you think it is not necessary, then make sure that you apply that consistently. E.g.: অগচ্ছিত 'not entrusted to anyone' /ɔɡot͡ʃt͡ʃʰit̪o/ has the tie bar but চরম 'extreme' /tʃɔɾom/ doesn't. What do you think about that? Arundhatisgupta (talk) 16:37, 10 May 2024 (UTC)
Whichever convention is chosen, it should be consistent, and should match the key. kwami (talk) 19:19, 10 May 2024 (UTC)
There will be confusion when /t/ and /ʃ/ occur together but do not form an affricate. E.g. কুৎসা 'slander' /kutʃa/ and বচসা 'contention' /bɔtʃoʃa/. Without a tie-bar, the /tʃ/ looks the same in both words, but the correct pronunciations are /kutʃa/ and /bɔt͡ʃoʃa/. Arundhatisgupta (talk) 16:51, 10 May 2024 (UTC)
That can be handled as ⟨kut.ʃa⟩ and ⟨bɔtʃoʃa⟩ or as ⟨kutʃa⟩ and ⟨bɔt͜ʃoʃa⟩ -- or, for maximal clarity, as ⟨kut.ʃa⟩ and ⟨bɔt͜ʃoʃa⟩. Just as long as we're consistent, or people will get really confused. kwami (talk) 19:22, 10 May 2024 (UTC)
I personally think there should be a tie bar and it should go above, which is the more common practice. Benwing2 (talk) 21:06, 10 May 2024 (UTC)
@Kwamikagami 1. If you are introducing a syllable break (indicated with a dot), then it should be applied consistently to all words in Wiktionary.
2. According to Wikipedia, the undertie is used to represent linking (absence of a break) in the International Phonetic Alphabet. E.g.: /vuz‿ave/ (Ref. https://en.wikipedia.org/wiki/Tie_(typography)#cite_note-6) Arundhatisgupta (talk) 21:34, 10 May 2024 (UTC)
Linking is used to override the orthographic spaces we insert between words in transcription. In that example, the words are //vuz ave// but the pronunciation is /vu.za.ve/. The /za/ forms a single syllable. That tie is not the same thing as the 'slur' tie used for affricates, which comes from musical notation (slurred notes). kwami (talk) 21:56, 10 May 2024 (UTC)
I think there should be a tie bar and it should go above, which is the more common and established practice for phonemic/phonetic transcription. It is important to maintain consistency within a language and across Wiktionary.
Are there any objections regarding the other inconsistencies mentioned in the proposal? Arundhatisgupta (talk) 09:52, 11 May 2024 (UTC)
@Arundhatisgupta No objections from me although I don't know enough about Bengali phonology to say whether e.g. the use of palatal plosives or affricates is correct. IMO the best way to go about making these changes is either manually or through AWB or JWB, which let you quickly do semi-manual changes based on regexes. Benwing2 (talk) 18:15, 11 May 2024 (UTC)
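The substitutions that proposals 1 to 3 imply could be sketched roughly as follows, in Python purely as an illustration. This is not an AWB/JWB configuration or any existing Wiktionary module; the rule list is a hypothetical assumption that only mirrors this thread. Because it ties every contiguous tʃ and dʒ, it would also wrongly turn a stop-fricative sequence such as কুৎসা /kutʃa/ into /kut͡ʃa/ unless such words are first marked with a syllable dot (/kut.ʃa/), so its output would still need the semi-manual review that AWB or JWB allows.

```python
# Hypothetical sketch: NOT a Wiktionary module. The rules only mirror the
# three proposals in this thread, which have not been adopted.
TIE = "\u0361"  # combining double inverted breve: the over-tie in t͡ʃ

# Ordered (old, new) substring replacements.
RULES = [
    ("ɦ", "h"),               # proposal 1: /ɦ/ > /h/
    ("c", "t" + TIE + "ʃ"),   # proposal 3: /c/ > /t͡ʃ/ (cʰ becomes t͡ʃʰ)
    ("ɟ", "d" + TIE + "ʒ"),   # proposal 3: /ɟ/ > /d͡ʒ/ (ɟʱ becomes d͡ʒʱ)
    ("tʃ", "t" + TIE + "ʃ"),  # proposal 2: tie an untied t+ʃ
    ("dʒ", "d" + TIE + "ʒ"),  # proposal 2: tie an untied d+ʒ
]


def normalize_bengali_ipa(ipa: str) -> str:
    """Apply RULES in order.

    An already-tied t͡ʃ, or a sequence split by a syllable dot (t.ʃ),
    contains no contiguous 'tʃ', so it is left unchanged. An unmarked
    sequence like kutʃa, however, is (wrongly) tied as well.
    """
    for old, new in RULES:
        ipa = ipa.replace(old, new)
    return ipa


if __name__ == "__main__":
    for word in ["ɔ.kri.t̪od̪.ba.ɦo", "tʃɔɾom", "ɔɡocɔr", "oɡniɟuɡ", "kut.ʃa", "kutʃa"]:
        print(word, "->", normalize_bengali_ipa(word))
```

Running it shows the dotted kut.ʃa surviving unchanged while the undotted kutʃa comes out as kut͡ʃa, which is exactly the ambiguity raised above.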

How should we present Latin adjectives that inflect like nouns (or that are really appositive nouns?)

A few times now, I've been puzzled about how to handle showing the inflected forms of certain Latin third-declension adjectives that don't fit well into any of the usual adjective inflection patterns, because they show the endings typical for a noun instead. Currently, these seem to mostly be treated in our entries as third declension adjectives of "one ending", but I think there are some issues with the accuracy of this in terms of showing forms and usage.

A particularly clear case is certain rare words that are attested with adjectival function but that have the form of feminine nouns, such as silvicultrīx, -trīcis and Nīlōtis, -tidis (which have the forms respectively of Latin and Greek feminine agent nouns). The masculine counterparts would presumably be *silvicultor and *Nīlōtēs, but these do not to my knowledge occur, and in any case, we normally treat agent nouns as noun lemmas (distinct for masculine and feminine) rather than combining the masculine and feminine versions under one adjective lemma. Should we lemmatize such words as nouns and include a usage note saying that they're used appositively? Or should we put them as adjectives (as many dictionaries do) but include some kind of special headword and declension table coding to avoid showing masculine or neuter forms, which I think aren't accurate in this case? For example, Gaffiot marks silvicultrix as "adj. f".

Not quite as clearcut are cases like senex, iuvenis, mās that generally have the form of nouns, commonly function as either nouns or as adjectival or appositional modifiers of masculine or feminine nouns, but are extremely rare or unattested in the neuter. (I've found some neuter forms attested in some cases in New Latin.) Functionally, I think there isn't much difference between how mās and fēmina are used, but we treat mās as a noun or adjective and fēmina as only a noun. Urszag (talk) 02:49, 11 May 2024 (UTC)

@Urszag We have a whole category Category:Latin first declension adjectives for words like amnicola and indigena that don't seem so different from the words you've cited, and there are unquestionably third-declension non-i-stem adjectives (e.g. vetus, concolor) that "show the endings typical for a noun", so I don't see an issue treating these as adjectives. Benwing2 (talk) 05:33, 11 May 2024 (UTC)
Yes, we have that category for first-declension adjectives. The full inflection of those words as adjectives is actually a bit questionable also (there was an RFV that I closed based on New Latin examples, but I added some notes in Appendix:Latin_first_declension discussing how the neuter plural nominative/accusative/vocative forms in -a and dative/ablative forms in -īs are rather hypothetical and ambiguous, as they've often (at least since Priscian's time) been interpreted as belonging to a second-declension paradigm instead, e.g. that of indigenus).
Third-declension non-i-stem adjectives such as vetus exist, but are rare (aside from comparative forms). When there are attested neuter forms distinct from the masculine/feminine forms (such as vetus in the accusative singular, or vetera in the nominative/accusative plural), this establishes that a word is formally distinct from an appositive noun (and also establishes whether the neuter plural ends in -ia or -a, which can't always be predicted from the forms of the ablative singular or genitive plural). But I think that such neuter forms are often unattested (except sometimes in very late periods of the language) and in that case it's arguably misleading to just present a single full declension table. E.g. I found iuvenia once in Medieval Latin and occasionally in New Latin, and a couple of New Latin cases of iuvena (both from the same author), but I think it's more misleading than not to present either of these as established or standard Latin forms: a late imperial-era grammarian says that this word simply lacks neuter plural forms. In cases like this, there's an existing parameter to mark an adjective as lacking neuter forms, so I ended up using that and mentioning other forms in usage notes. But in cases like silvicultrīx and Nīlōtis, I don't know how to best present the fact that they occur only as adjectival modifiers of feminine nouns: for now I've just removed the declension table from the second word (since several forms are unattested, or only attested in post-Classical texts, and the Greek origin makes it tricky to actually infer what missing forms would be), but for silvicultrīx it seems fairly clear that it would simply inflect like victrīx. If we continued to categorize these as adjectives, does it sound reasonable to establish a parametric way to mark them as feminine-only?--Urszag (talk) 06:33, 11 May 2024 (UTC)
@Urszag Yes, I think so. We have done that for some other languages, e.g. French adjective headwords have an |onlyg= parameter that you can set to a gender (m or f), a number (s or p) or a gender-number combination (e.g. m-s, f-p). Now mind you, some of the terms that make use of this (e.g. enceinte) would IMO be better treated as conventional adjectives that are simply rare in other genders or numbers; there's even a usage note for enceinte that says
The masculine form enceint is occasionally used with regard to transgender men, for species with male pregnancy such as seahorses, as well as in metaphorical, jocular, or fantastic contexts.
And indeed you will find that in Spanish, the corresponding words like encinto, embarazado and preñado are given in the masculine, with quotes establishing that such usage does exist. But if the term is indeed unattested in some genders, I would definitely support adding a flag to suppress those genders in the declension table and make sure that the title next to the declension table reflects this. Benwing2 (talk) 06:53, 11 May 2024 (UTC)
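The kind of gender/number restriction described here could be modeled roughly as follows. This is a hypothetical Python sketch, not the Lua code behind the French |onlyg= parameter; it only assumes the values listed above (a gender m/f, a number s/p, or a pair such as m-s or f-p) and drops every table cell outside them. A feminine-only flag for Latin words like silvicultrīx would amount to the same filter with the value f.

```python
from typing import Dict, Optional, Tuple

# Hypothetical model of an |onlyg=-style restriction; the names here are
# illustrative and do not correspond to any real Wiktionary module.
Cell = Tuple[str, str]  # (gender, number), e.g. ("f", "s")


def parse_onlyg(spec: str) -> Tuple[Optional[str], Optional[str]]:
    """Return (gender, number); None means that axis is unrestricted."""
    if spec in ("m", "f"):
        return spec, None
    if spec in ("s", "p"):
        return None, spec
    gender, number = spec.split("-")  # raises ValueError unless exactly one '-'
    if gender not in ("m", "f") or number not in ("s", "p"):
        raise ValueError(f"unrecognized onlyg value: {spec!r}")
    return gender, number


def restrict_table(table: Dict[Cell, str], spec: Optional[str]) -> Dict[Cell, str]:
    """Keep only the cells whose gender and number match the restriction."""
    if not spec:
        return dict(table)
    gender, number = parse_onlyg(spec)
    return {
        (g, n): form
        for (g, n), form in table.items()
        if (gender is None or g == gender) and (number is None or n == number)
    }


enceinte = {
    ("m", "s"): "enceint", ("f", "s"): "enceinte",
    ("m", "p"): "enceints", ("f", "p"): "enceintes",
}
print(restrict_table(enceinte, "f"))
# {('f', 's'): 'enceinte', ('f', 'p'): 'enceintes'}
```

Filtering at the table level, rather than simply not generating the forms, keeps the full paradigm available for cases like enceint where rare forms are later attested and the flag can just be removed.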