Wiktionary:Beer parlour/2022/November


We have Category:Adjective feminine forms by language and Category:Adjective plural forms by language for certain languages, esp. Romance languages. Do we really need these categories? Do they add anything useful? In general we don't categorize non-lemma forms according to their inflectional properties, so I'm not sure why we're doing it here. Benwing2 (talk) 05:32, 1 November 2022 (UTC)

Do we really need categories by etymology?
The information can be useful for the collection of oddities. For example, cat:Welsh adjective plural forms collects plurals that are distinct from the masculine singular, very much a minority of Welsh adjectives. Now, the current method of collection leaves a great deal to be desired. One needs to know that plural forms should be categorised as 'adjective plural form' rather than 'adjective form' via the PoS headline, which is not mentioned in WT:About Welsh. Consequently, the category is much shorter than it should be, omitting for instance gwynion.
In this case, better coverage would be obtained by generating a category 'Welsh adjectives with distinct plural', though there may be some awkward corner cases. A specific 'form of' template would also work, though there is the problem of training editors to choose the right template.
Category:Hebrew adjective feminine forms could likewise be useful, if one can restrict the display to feminines ending in taw. RichardW57m (talk) 10:59, 1 November 2022 (UTC)
If we want to categorize irregular / unexpected forms, it would be better to add something like "irregular" to the category names (and update the contents); as it is, Category:Portuguese adjective feminine forms, with its combination of regular singular and plural form-of soft redirects, which swamp any irregular forms that may be in there, seems kinda useless. Probably we should also rename the Welsh category something like "...irregular plural forms" or "...distinct plural forms" instead of just "...plural forms" for consistency, although if regular plural forms are identical to the singular and wouldn't be categorized at all (since we seem to in general not put "inflected form of itself" sense lines on pages), the need is less pressing. - -sche (discuss) 16:35, 1 November 2022 (UTC)
The Portuguese case is more difficult, but for the Hebrew case one can use a search such as:
incategory:"Hebrew adjective feminine forms" intitle:/...*ת/
Unfortunately, regular expressions seem not to support anchors at all. Let us not make the best the enemy of the good. --RichardW57m (talk) 10:14, 3 November 2022 (UTC)
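For illustration only: the missing end-of-title anchor can be applied offline, once the titles in the category have been obtained by some other means. The following is a minimal Python sketch under that assumption; the function name and the sample titles are hypothetical and are not taken from the actual category.

```python
import re

TAW = "\u05EA"  # Hebrew letter taw

def titles_ending_in(titles, final_letter=TAW):
    """Return the titles whose last character is final_letter.

    The \\Z anchor does locally what the on-wiki search reportedly cannot:
    it pins the match to the very end of the title.
    """
    pattern = re.compile(re.escape(final_letter) + r"\Z")
    return [title for title in titles if pattern.search(title)]

# Hypothetical sample titles, for illustration only.
SAMPLE_TITLES = ["יהודית", "טובה", "צרפתית", "גדולה"]

if __name__ == "__main__":
    for title in titles_ending_in(SAMPLE_TITLES):
        print(title)
```

For a single fixed letter, `str.endswith` would do the same job; the regular expression is used here only to mirror the search syntax quoted above.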
I say delete them. @Embryomystic may want to weigh in. Ultimateria (talk) 02:35, 2 November 2022 (UTC)
While we're at it, why do we split lemmas by part of speech? --RichardW57m (talk) 10:28, 3 November 2022 (UTC)
You mean categories like "English nouns"? I find those very useful for filtering searches. I regularly include or exclude results by part of speech category. Ultimateria (talk) 03:34, 5 November 2022 (UTC)
I agree with User:-sche here; categories like this are only useful if they track only irregular forms (and have the appropriate name). Tracking all forms (the vast majority of which will be regular) isn't terribly helpful. Benwing2 (talk) 02:39, 2 November 2022 (UTC)
I don't think I have any arguments to offer in favour of retaining them, but I agree that there are situations in Welsh and Hebrew (and probably others) where subcategories of adjective forms might be a good idea even if the general concept is discarded. embryomystic (talk) 01:42, 3 November 2022 (UTC)
It would be good to hear from the creators such as @Ruakh, LlywelynII before we trash their work on templates and modules. --RichardW57m (talk) 10:27, 3 November 2022 (UTC)
Speaking only for myself: I don't have strong feelings either way. I don't actually remember doing work on templates and modules to support these categories, but whatever it was that I did, I imagine that most of it would have been needed anyway in order to show the right display text. —RuakhTALK 19:14, 11 November 2022 (UTC)
You are I take it aware that the concepts of a regular Welsh plural noun and of a regular masculine Arabic plural are dubious, just like the concept of the regular perfect of a Latin 3rd conjugation verb. --RichardW57m (talk) 10:27, 3 November 2022 (UTC)
@RichardW57m I am not proposing to "trash" Hebrew and Welsh template or module work. It's not even strictly necessary to eliminate all categories named "adjective feminine forms" and "adjective plural forms" etc. But I see no benefit at all to keeping these categories for Romance languages; do you? BTW there do exist regular Arabic masculine plurals (aka "sound masculine plurals"). The irregular ones you're thinking of are broken plurals, and IMO the categories should be named as such, i.e. in a language-specific manner. In fact, we do have such categories; take a look for example at Category:Arabic nouns by inflection type and you'll see a lot of them. Benwing2 (talk) 04:17, 5 November 2022 (UTC)
These categories are populated by templates and modules. If the categories are deleted, an implicit invitation to recreate (namely, a red link) will be sent to everyone who is shown the categories of a page being placed in them. The only way to permanently remove the categories requires changing the templates and modules. Now, it may be simple to orphan the categories by adjusting the code invoked by {{auto cat}}, but that strikes me as a retrograde step if the categories still exist. Once objects are no longer placed in them, these categories will be caught up in the regular slaughter of empty categories.
Eliminating these categories for Romance languages is extra work - and I'm not sure that French adjective plurals in -x are not of interest. (Unfortunately, anchors are currently missing from regular expressions in searches - someone should raise a Phabricator ticket to add them.) --RichardW57m (talk) 10:09, 7 November 2022 (UTC)
OT: Arabic sound masculine plurals are just one, circumscribed option - it's hard to describe them as the 'regular' form, except when the singular fits certain patterns of derivation, and there are also predictable broken plurals, e.g. for diminutives. --RichardW57m (talk) 11:57, 7 November 2022 (UTC)
@RichardW57m I absolutely do not understand your objection concerning eliminating the Romance categories given that no one else is in favor of keeping them. "It's extra work to get rid of them" is a pretty questionable reason for keeping them (and in any case the actual work is trivial). Benwing2 (talk) 02:15, 8 November 2022 (UTC)
I suspect you're uttering an untruth. Not all users read the Beer Parlour every week. Should you even expect non-editors to read the Beer Parlour at all? You haven't even announced the threat to delete them on the category pages themselves! And how do you propose to eliminate these categories properly? How to restore them isn't obvious to everyone - your knowledge of the systems employed is excellent, but the systems are not well documented. Indeed, different languages do the same thing differently. --RichardW57m (talk) 11:21, 8 November 2022 (UTC)
Now, some of their functionality should be addressed better. But one should put the alternative functionality in place before deleting the old. --RichardW57m (talk) 11:21, 8 November 2022 (UTC)
@RichardW57m Damn you are blustery. I want to at least eliminate the Portuguese adjective categories, which are populated only partially and only when you use {{adj form of}}, and you are acting like the gatekeeper of all category changes. You haven't given a single reason why these categories are actually useful and worth the maintenance burden (which falls on people like me, not you). Please let me know why you are so desperate to keep them -- do *you* actually use them? Or is this just a sort of "nothing should ever be removed because someone might possibly find them useful"? Benwing2 (talk) 02:13, 14 November 2022 (UTC)
It is partly that someone went to the trouble of creating them; I'd be a lot happier if on reconsideration they accepted that it was not useful work. I'd be a lot happier if you put notices on the categories you want to get rid of alerting any who actually use them of the categories' imminent removal. In general, these adjective categories are being generated by two routes - the {{inflection_of}}-type route, as in {{adj form of}}, and by direct invocations of {{head}}. I don't see any saving in eliminating the former for Portuguese; you eliminate one occurrence of "romance_adjective_categorization," in Module:form_of/cats. Now, there would be a saving of maintenance effort if you eliminated the categorisation of entries by inflection for gender and number, but if that is what you are proposing, say so. --RichardW57m (talk) 13:24, 14 November 2022 (UTC)
Having looked into this mechanism, I am now wondering if it could actually be useful when revising inflection tables and reducing the number of forms. Several cases spring to mind for the dative singular of Pali a-stems:
  1. The ending in -tthaṃ does not actually seem to be a case-ending, and one day we may be able to get rid of it. (There was no mechanism to formally challenge it.) By that time, there may be senses of words ending in -tthaṃ claiming them as dative singulars. We would then need to eliminate or redescribe them. However, for this one, it might actually be quicker to search for noun and adjective forms in -tthaṃ, and rely on such forms having entries in the Roman script.
  2. We may be overstating the number of masculine and neuter dative singulars in -āya. This is not a rare form for feminines, being used for several cases. We may therefore need to revise such case forms when entered as terms.
  3. At present, we distinguish Pali datives from genitives by their meaning. In Prakrit, the criterion is form, and therefore many words lack datives entirely. If we switched Pali to the same treatment as Prakrit, what are currently described as dative/genitive case forms will have to be redescribed.
It would seem a shame for categorisation by inflection to have to be re-implemented for such filtering. --RichardW57m (talk) 13:24, 14 November 2022 (UTC)
I am already using some Pali verb forms as the basis of categorisation, but categorising the lemma rather than the inflected form, and classifying the categories as maintenance categories. The problem is that the textbooks help one recognise a form, rather than tell one if it does not exist. --RichardW57m (talk) 13:24, 14 November 2022 (UTC)

Duplicated words in Category borrowed

Notifying @Benwing2, Erutuon. It is me again, about borrowed terms. Example: under Category:Greek terms borrowed from French, the members of the subcategories Category:Greek learned borrowings from French, the Cat:unadapted & the Cat:obor, etc. are duplicated. They appear twice, in the 2 categories. So we cannot tell which ones have the template {{bor}}. The terms under calques and semantic loans are OK; they do not appear twice. The same happens at e.g. Category:English terms borrowed from French, and so on. For Greek languages, perhaps others too, the {{bor}} template is very significant and distinct from the other templates. It would be great if this duplication could be avoided. Thank you. ‑‑Sarri.greek  I 07:58, 1 November 2022 (UTC)

@Sarri.greek I guess you're requesting that template {{lbor}} does not categorize into 'DEST terms borrowed from SOURCE' but only 'DEST learned borrowings from SOURCE'? My original logic for categorizing into both is that a learned borrowing is still a borrowing, and if you remove them from the parent category, it would be easy for a new Wiktionary user to miss the fact that they also have to look in all child subcategories to find all borrowings. Also there was a vote in favor of including 'DEST terms borrowed from SOURCE' also in 'DEST terms derived from SOURCE', and this is in the spirit of that vote. OTOH I suppose this same argument could potentially be made for including all terms in all subcategories in all their parent categories, which might be undesirable. Benwing2 (talk) 02:48, 2 November 2022 (UTC)
No, this time I do not request any template (I changed my mind, since en.wikt thinks differently), @Benwing2. As I have seen here and there, there are 2 kinds of categories:
1) The index-like categories (all the members of all subcategories can be viewed there). (Probably they should have a different name too: Index:C....)
2) and the 'non-index' ones which are
  • 2a) either empty, and only subcategories can be seen
  • or 2b) subcategories have their hyponyms, +we view in the general Cat the words which have no characteristic of a hyponym.
The above, e.g. Category:Greek learned borrowings from French and similar, are a bit sloppy in the sense that there is no way to spot the {bor} ones, i.e. the ones that are NOT hyponyms. (I have understood that in English dictionaries, {bor} is a general word and means no specific kind of borrowing.) So, structure 1 or 2b. (I would love to have 2b, because it serves other languages too, which need to separate {bor} from {lbor}, {ubor} ...) I am sorry that I cannot express myself a bit better from the linguistics side of things. Thank you for your attention. ‑‑Sarri.greek  I 03:00, 2 November 2022 (UTC)
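As a rough illustration of the request: with the current duplication, the terms that carry only the plain {{bor}} template can still be recovered as a set difference, parent category members minus everything listed in a subcategory. The Python sketch below assumes the member lists have already been fetched somehow; the function name, category labels and term names are all hypothetical placeholders.

```python
def plain_borrowings(parent_members, subcategory_members):
    """Return parent-category members that appear in no subcategory.

    parent_members: iterable of page titles in the parent category.
    subcategory_members: dict mapping a subcategory name to its page titles.
    """
    in_some_subcategory = set()
    for titles in subcategory_members.values():
        in_some_subcategory.update(titles)
    return sorted(set(parent_members) - in_some_subcategory)

# Hypothetical placeholder data, not real category contents.
PARENT = ["term_a", "term_b", "term_c", "term_d"]
SUBCATEGORIES = {
    "learned borrowings": ["term_b"],
    "unadapted borrowings": ["term_d"],
}

if __name__ == "__main__":
    print(plain_borrowings(PARENT, SUBCATEGORIES))
```

Under structure 2b, the parent category would show this difference directly, with no post-processing needed.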

Ecclesiastical Latin vs. Medieval and New Latin

For purposes of classification what's the difference between them meant to be exactly on WT? The definitions currently on the category pages are (Ecclesiastical Latin) "a form of Latin initially developed to discuss Christian thought and later used as a lingua franca by the Medieval and Early Modern upper class of Europe"; (New Latin) "a revival in the use of Latin in original, scholarly, and scientific works since c. 1375/1500"; (Medieval Latin) "a primarily written form of Latin used across Europe in the Middle Ages". The definition of Ecclesiastical Latin is the sticking point here since it makes it synonymous with, or a collective term for, Medieval and New Latin, or weirdly implies that the latter are basilects (not "upper class").

My own thought, which seems to better reflect the terms that are actually in the category and how I've used it as a label myself, is that Ecclesiastical Latin should be limited to terms with a specifically liturgical or theological bearing, especially ones that have been current in the Catholic Church up to the contemporary era (apart from the liturgy, many Catholic specialist journals were still written in Latin up to the mid-20th century). The "lingua franca" stuff should be dropped from the description: Ecclesiastical Latin is Latin used by the Church, not just "the upper class" and not specifically in medieval or early modern times. —Al-Muqanna المقنع (talk) 12:03, 1 November 2022 (UTC)

Do we need a category for Ecclesiastical at all? As you mention, it spans multiple periods in history. It almost amounts to a topic label, such as 'food' or 'types of potato'. Nicodene (talk) 14:07, 1 November 2022 (UTC)
I tend to agree actually, it would make more sense to just have straightforward chronological categories and use Category:la:Theology, Category:la:Bible, Category:la:Christianity etc. where appropriate, and maybe treat existing "Ecclesiastical Latin" labels as meaning "post-Classical". I was thinking about this when I made dēcrētum horribile, which is very much a theological term but a Protestant one (the term is Calvin's and both of my Latin citations are from Lutherans). Is there "Protestant Ecclesiastical Latin", or should it just be listed as New Latin? Might be easier to avoid the question and just use Medieval/Renaissance/New with topics as appropriate. —Al-Muqanna المقنع (talk) 14:14, 1 November 2022 (UTC)
How much does it cost us to maintain these labels and categories? If all we get is a bit of tidiness, it doesn't seem worthwhile to suppress the information reflected in the labels and categories. Not all of our category groups are mutually exclusive and collectively exhaustive, nor should they have to be. DCDuring (talk) 15:05, 1 November 2022 (UTC)
@DCDuring: The problem isn't tidiness, it's that it isn't clear what the label is actually intended to mean, and the description of the category (which is also the intro of the Wikipedia page the label links to) contradicts how it's used in practice. I don't mind if it's kept with an explanation, e.g. along the lines of my suggestion above (Latin as used by the Church, up to the contemporary age). But I am sympathetic to Nicodene's point to the extent that getting rid of the term would not actually suppress any information, since as actually used it doesn't seem to contribute anything that wouldn't be covered by a chronological + topical combination like "New Latin, theology" and the like. —Al-Muqanna المقنع (talk) 16:44, 1 November 2022 (UTC)
Exactly. The meaning isn't tidy.
It certainly doesn't contribute anything to someone not interested in what it might mean. Is it really true that all Ecclesiastical Latin is about academic theology, rather than, say, maintenance of churches, canon law, or the conduct of rituals? Has anyone knowledgeable taken a good look at how the labels are actually used? What was the source of the labels? How did the source apply them? Is "Ecclesiastical Latin" actually used only for terms used in theological discourse? Do we have anyone who respects the subject(s) enough to make an improvement on the current labels and categories? Ecclesiastical Latin seems to have had more uniformity than, say, scientific, literary, legal or medical Latins. Doesn't that add to the value of the existing label? DCDuring (talk) 22:55, 1 November 2022 (UTC)
A lot of it relates to law, and it's not entirely appropriate to put the word "ecclesiastical" on that. Yes, much of it obviously was used in that way by the church, but certainly not exclusively. Theknightwho (talk) 22:57, 1 November 2022 (UTC)
@DCDuring: I think I get your point a little better, but I'm not concerned about e.g. the use of "Ecclesiastical Latin" in etymology sections and the like, imported from dictionaries, although those could be more precise in some cases. I am myself a specialist and I add terms that I come across in primary sources. It isn't clear to me when "Ecclesiastical Latin" should be applied to a term that is being added, or, conversely, what it means when someone else adds one, because our definition of the term is poor. I imagine for a non-specialist it would be even less helpful. So, I think it would probably be good to clarify how we are using it. If you're asking for someone knowledgeable to take a look, well, I am here and taking a look at it, hence this thread. "Ecclesiastical Latin" of course does not only apply to academic theology, hence my point above about theological or liturgical bearing and my suggestion to describe it expansively as language used in relation to Church matters and especially terms that are not obviously circumscribed by era.
I do disagree, as a point of fact, that "Ecclesiastical Latin seems to have had more uniformity than, say, scientific, literary, legal or medical Latins": I think FWIW that in practice precisely the opposite is true. Law Latin developed over a much shorter period, is entirely technical and constituted more of a pan-European argot because the study of law was dominated by a small number of institutions (Orléans, Bologna) from the time of the reintroduction of the Corpus Iuris Civilis. By contrast, liturgy in the Middle Ages was not developed by technicians and, before the advent of printing, Trent, and Quo primum, the language of clergy reflected a much more diverse set of local practices, often developed diocese by diocese. Anyway, all this is just to say that I think we should decide on an in-house definition of Ecclesiastical Latin that can be applied with reasonable consistency and can be explained to non-specialist readers, rather than just point to or copy what's on the Wikipedia page, which is fine as it is but wasn't written with a dictionary in mind. —Al-Muqanna المقنع (talk) 00:30, 2 November 2022 (UTC)
It may be relevant to this discussion to note that there is an official Vatican body responsible for (among other things) creating a dictionary of neologisms for modern concepts, which are likely often not used, but are probably incorporated into the official Latin translations of Vatican documents. Most of these are not ecclesiastical terms per se, but I would think they are primarily used in ecclesiastical contexts (papal encyclicals and the like). Andrew Sheedy (talk) 04:20, 2 November 2022 (UTC)
That's worth noting, for sure. Our current definition of EL focuses on medieval and early modern usage, and sometimes dictionaries use it just to mean "Medieval Latin": but that's a very different beast from Latin as used by the Vatican now. I think my "era-independent" suggestion would encompass that better. —Al-Muqanna المقنع (talk) 19:46, 3 November 2022 (UTC)


Modern literature on Mongolic languages tends to make a distinction between Proto-Mongolic (the direct ancestor to Middle Mongolian, spoken between the 10th/11th and 13th centuries) and Pre-Proto-Mongolic, the ancestor to that language, tracing back to approximately the 5th century. Although Proto-Mongolic and Pre-Proto-Mongolic are both unattested, the distinction does still matter, as they're reconstructed by very different means: Proto-Mongolic is primarily reconstructed from extant (and attested) languages within the Mongolic family (though obviously with Turkic, Tungusic and Sino-Tibetan influence where appropriate). On the other hand, Pre-Proto-Mongolic is only possible to reconstruct externally (i.e. indirectly), from what we can infer from known/suspected contact with other language families at the time, and then cross-comparing to what we know about Proto-Mongolic + later developments.

Obviously the number of Pre-Proto-Mongolic lemmas is inevitably going to be quite small for a very long time, but I think the difference between the two is significant enough that it warrants creating a separate L2. For comparison, Pre-Proto-Mongolic would be (near-)contemporary with Old Turkic. Theknightwho (talk) 16:53, 1 November 2022 (UTC)

@Theknightwho: Unless there are descendants of Pre-Proto-Mongolic other than Proto-Mongolic, it seems quite shaky to reconstruct it at all. You can always give the reconstructed older forms (with appropriate references) in the etymology sections of Proto-Mongolic, there's no need to make separate lemmas for them. Thadh (talk) 17:05, 2 November 2022 (UTC)
@Thadh We do know that there was some influence of Pre-Proto-Mongolic during that period, which is how we are able to do any reconstructions. I would also feel uncomfortable adding reconstructions under a name not used for them outside of Wiktionary. Theknightwho (talk) 17:09, 2 November 2022 (UTC)
@Theknightwho: Even so, reconstructing purely on the basis of (supposed) loanwords is... eh. And I'm not saying you should add PPM lemmas under the name of PM, I'm rather referring to things like Proto-Finnic *hüvä, where the earlier stage (early Proto-Finnic, or pre-Proto-Finnic, if you wish) is given in the etymology section. Same thing is also widely done for Pre-Germanic. No need to make links out of them. Thadh (talk) 17:13, 2 November 2022 (UTC)
@Thadh I should probably have mentioned that much of this comes from the attempted reconstruction of Khitan, which is a para-Mongolic language, of which Proto-Mongolic is only one (or is its sister family, depending on which academic you talk to). Although this is tentative (and I'm unsure quite how many actual pages we can be confident enough to create), there are certainly a small handful. Theknightwho (talk) 18:33, 4 November 2022 (UTC)
@Theknightwho: Usually, creating full-fledged codes for proto-languages that contain just one more descendant than another code has not provided fantastic results here on Wiktionary - take Proto-Polynesian (compared to Proto-Nuclear Polynesian) and Proto-Semitic (compared to Proto-West Semitic) - usually the former is identical to the latter and people just link the older language making the whole categorisation and lemmatisation a mess. Thadh (talk) 20:40, 4 November 2022 (UTC)
I'm not sure that would happen here. There aren't many PPM reconstructions, compared to the large number of reconstructions for PM. Theknightwho (talk) 21:19, 4 November 2022 (UTC)
Support. AG202 (talk) 05:39, 4 November 2022 (UTC)
I don't think it's terribly necessary if we are talking about loans from let's say Proto-Turkic into pre-Proto-Mongolic. @Thadh's example of Proto-Finnic *hüvä shows how to illustrate the etymology elegantly without an additional entry for pre-Proto-Finnic. In such a case, we can include the Proto-Mongolic form among the descendants of the Proto-Turkic reconstruction, thus reciprocally linking the two forms to each other.
The opposite case is more interesting if let's say again Proto-Turkic borrowed from pre-Proto-Mongolic, i.e. if the Proto-Turkic form cannot be derived from Proto-Mongolic but definitely reflects an earlier form preceding the Proto-Mongolic stage (similar to pre-Grimm's law Germanic borrowings into Finnic). In the etymology of the Proto-Turkic form, we could mention the putative pre-Proto-Mongolic form and link to the Proto-Mongolic reconstruction derived from the latter, but we cannot include the Proto-Turkic reconstruction among the descendants of the Proto-Mongolic entry. In such a case (to ensure reciprocal linking), pre-Proto-Mongolic entries make sense. –Austronesier (talk) 18:36, 4 November 2022 (UTC)
@Austronesier: I think we could bend the rules a little and give the Proto-Turkic descendant on the Proto-Mongolian entry with a necessary qualifier From earlier *PPM_form: in the descendants section, something like on Proto-Finnic *omena (there are much better examples but I can't come up with one off the top of my head and I think the premise is quite clear here). Thadh (talk) 20:34, 4 November 2022 (UTC)
@Thadh I'm not sure I understand why that should be necessary, instead of doing it properly. Theknightwho (talk) 21:16, 4 November 2022 (UTC)
@Theknightwho: In practice, how many entries will we get for pre-Proto-Mongolic as donor? –Austronesier (talk) 21:20, 4 November 2022 (UTC)
@Austronesier I wouldn't say very many - at least not at this stage. We're probably looking at 20 reconstructions which are possible at all, which theoretically could be used on the pages for about 10 languages each. Theknightwho (talk) 21:32, 4 November 2022 (UTC)
@Theknightwho: That is doing it properly. Reconstructions of languages based on borrowings are very speculative, and we don't host terms that would normally have two (**) or even three (***) asterisks.
If we're aiming at a language with under thirty terms that can be (relatively) safely reconstructed, while having a solid reconstruction of a descendant that is also the ancestor of all its other descendants, then just adding this note to thirty lemmas out of hundreds potential pages isn't a problem and saves space and a lot of headache.
If we're talking about an actual solidly reconstructed language with a lot of reconstructions, then that means that pretty much any modern Mongolic term will need to have one more code added to its etymology, and that's becoming bothersome on that end. Thadh (talk) 21:25, 4 November 2022 (UTC)
So your argument is that if there aren't many there's no point, and if there are lots then it's too much work? Hmm. Forgive me if I'm misunderstanding you there. Theknightwho (talk) 21:29, 4 November 2022 (UTC)
@Theknightwho: I'm saying if there's few then there's no point, and if there are lots it may be better to just switch to generally giving the older form instead of the newer in the reconstructions. Thadh (talk) 16:33, 6 November 2022 (UTC)
  Oppose - Even the reconstruction of Proto-Mongolic is tentative and based upon a handful of works. There is no consensus on the reconstruction of Pre-PM and indeed the reconstruction of the Khitan sound system itself is still in its early phases. The needs of linking Turkic and Tungusic cognates and Khitan entries can be well served by the PM pages themselves. Hromi duabh (talk) 14:28, 25 November 2022 (UTC)

Apply for Funding through the Movement Strategy Community Engagement Package to Support Your Community

The Wikimedia Movement Strategy implementation is a collaborative effort for all Wikimedians. Movement Strategy Implementation Grants support projects that take the current state of a Movement Strategy Initiative and push it one step forward. If you are looking for an example or some guide on how to engage your community further on Movement Strategy and the Movement Strategy Implementation Grants specifically, you may find this community engagement package helpful.

The goal of this community engagement package is to support more people to access the funding they might need for the implementation work. By becoming a recipient of this grant, you will be able to support other community members to develop further grant applications that fit with your local contexts to benefit your own communities. With this package, the hope is to break down language barriers and to ensure community members have needed information on Movement Strategy to connect with each other. Movement Strategy is a two-way exchange; we can always learn more from the experiences and knowledge of Wikimedians everywhere. We can train and support our peers by using this package, so more people can make use of this great funding opportunity.

If this information interests you or if you have any further thoughts or questions, please do not hesitate to reach out to us as your regional facilitators to discuss further. We will be more than happy to support you. When you are ready, follow the steps on this page to apply. We look forward to receiving your application.

Best regards,
Movement Strategy and Governance Team
Wikimedia Foundation Mervat (WMF) (talk) 13:49, 2 November 2022 (UTC)


I propose we move Braille from Translingual to Alt Forms of the appropriate L2s and create something like {{braille form of}}. Braille entries as they stand are a mess. @Binarystep @AG202 @Thadh, and anyone else interested. Vininn126 (talk) 14:44, 2 November 2022 (UTC)

That seems sensible for many entries. Can it be automated? —Justin (koavf)TCM 14:49, 2 November 2022 (UTC)
Isn't some Braille translingual? Maybe numeric digits, music notation, etc.? Equinox 14:57, 2 November 2022 (UTC)
This definitely seems true, so some translingual braille will have to stay. Vininn126 (talk) 15:30, 2 November 2022 (UTC)
There is already {{Brai-def}}, but that seems to be only ever used for Japanese. – Wpi31 (talk) 16:00, 2 November 2022 (UTC)
I'm inclined to oppose: Braille is essentially an alternative orthography never used in print media nor on the web; we don't include Morse code, attested encoding mechanisms or shorthand either, and for good reason: it takes five minutes to look up the braille alphabet and you'll be able to read any braille text with the table, assuming you even manage to find a braille text that doesn't have a regular text next to it. And why on earth Unicode decided to add braille is beyond me. Thadh (talk) 17:01, 2 November 2022 (UTC)
Braille books exist? Vininn126 (talk) 17:28, 2 November 2022 (UTC)
Okay, I guess that wasn't a perfect wording, I rather meant "print media intended for visual consumption" - braille books are still intended for a very specific group of people that would probably prefer using regular text types if they could. Thadh (talk) 17:31, 2 November 2022 (UTC)Reply[reply]
Of course they aren't for visual consumption, the vast majority of people reading these books can't see. I don't think I'm understanding the difference you are making. Is your argument based on the fact we should be recording printed letters as opposed to cues for other senses because these alternative "alphabets" are usually based on a visual alphabet? Somewhat relatedly, do you think what we have at is what we should be doing? Vininn126 (talk) 17:36, 2 November 2022 (UTC)Reply[reply]
The point I'm making is that braille, along with morse, shorthand etc., are specialised respellings of the regular (in English's case, Latin) orthographies. So they don't have any place in a dictionary, plain and simple: If someone seriously wants to see what a braille text says, they should use a converter, or a chart, but not a dictionary. To give some more examples of specialised respellings: binary code, hexadecimal code, UTF-codings... So no, I don't think is something we should be doing, I'm fine with keeping the translingual entry for consistency's sake, but making language-specific entries makes no sense to me. Thadh (talk) 17:42, 2 November 2022 (UTC)
This is essentially the discussion from a while ago trying to determine if we should collapse a lot of languages' letter content into translingual; ultimately the consensus from that was that we should keep them separate. I think it's rather inconsistent to have separate letter information for a in each L2 but not for various symbols such as this. Vininn126 (talk) 17:46, 2 November 2022 (UTC)
@Thadh Braille can be radically different from language to language and country to country though… it’s not the same as Morse code at all. You can’t look up a Braille converter for Braille in Japan for example and expect it to be the same. Also there are shorthand words made from Braille that don’t align with the letters. It feels oddly similar to the arguments made against including Sign Languages. Looking at ⠁⠉ for example, in English Braille it means “according” from the shorthand of “ac” but in w:Korean Braille it means 그러나 (geureona, but, however), which you wouldn’t even be able to easily guess from the Korean Braille alphabet. Another example is which differs from language to language significantly. Who knows what other shorthand Braille there are? This is actually one of the better things that Unicode has added, along with SignWriting as it can increase access significantly (who knows how Braille can interact with screen readers?) To quote w:English Braille: “Braille is frequently portrayed as a re-encoding of the English orthography by sighted people. However, braille is a separate writing system, not a variant of the printed English alphabet”. To label it as a respelling of a regular orthography is inaccurate. This is lexical information that’s important to users and increases accessibility and awareness of how Braille works. I support this proposal wholeheartedly. CC: @Vininn126 AG202 (talk) 21:26, 2 November 2022 (UTC)Reply[reply]
See also: w:English Braille#Contractions & w:American Braille. You can’t pull out a dictionary and read everything out automatically. And that’s only three Braille systems that I’ve looked into, let alone the many many more. AG202 (talk) 21:40, 2 November 2022 (UTC)Reply[reply]
I hate to use the "as someone [relative clause]" formation but as someone whose mother frequently uses Braille and teaches it, this "code" stance is fairly wrong.
"A few shorthands" (note: this was wording used on the English Wiktionary Discord, not here) does not come close to covering the amount of contractions, multisymbol contractions, symbols, and deprecated usages in Modern English Braille. There are sixty-four (64) possible Braille cells and the amount of distinct symbols and indicators in modern English Braille far exceeds that.
Braille, as we know, is not a language, but it is a specialized orthography deserving of demarcation from translingual lemmata. This discussion must acknowledge that not only is there of course multilingual Braille, but there is Braille specifically designed for technical purposes, e.g. Nemeth Braille Code (used for encoding mathematical + physical notation). These technical codes (which exist in tongues beyond English) are extremely complex and cannot likely be explained away in a translingual section.
Again, in a Braille cell, there are sixty-four possible individual characters. Multiple Braille cells are used to represent completely different letters, contractions, and symbols in different languages. N is not exclusively a translingual page. Why should be so?
I am aware, Thadh, that you yourself don't like letter pages anyway. But there is a precedent. Jodi1729 (talk) 17:10, 3 November 2022 (UTC)Reply[reply]
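To make the "sixty-four cells" figure concrete, here is a minimal Python sketch. It relies only on the layout of the Unicode Braille Patterns block (base U+2800, with dot n setting bit n-1). The helper name `cell` and the small table of readings are illustrative assumptions; the readings are just the two reported for ⠁⠉ earlier in this thread, not an authoritative source.

```python
# Unicode Braille Patterns start at U+2800; raised dot n (1-8) sets bit n-1.
BRAILLE_BASE = 0x2800


def cell(*dots):
    """Return the six-dot Braille character with the given raised dots (1-6)."""
    code = BRAILLE_BASE
    for d in dots:
        if not 1 <= d <= 6:
            raise ValueError(f"six-dot cells only use dots 1-6, got {d}")
        code |= 1 << (d - 1)
    return chr(code)


# Every combination of six dots, including the dotless (blank) cell.
six_dot_cells = [chr(BRAILLE_BASE + i) for i in range(2 ** 6)]
print(len(six_dot_cells))

# The same two-cell sequence is read differently per language
# (readings as reported in this discussion).
sequence = cell(1) + cell(1, 4)
readings = {
    "English Braille": "according",
    "Korean Braille": "그러나",
}
for system, meaning in readings.items():
    print(sequence, system, meaning)
```

The point of the sketch is only that the inventory of cells is tiny and fixed, while the mapping from cells to meanings has to be stored per system.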
I would like to add a clarification - when I say split, I mean just split the existing letters by language. I do not wish to imply things like transliterations of each word. If there are interesting, non-predictable attestable forms of words and such then we can discuss that. Vininn126 (talk) 23:50, 2 November 2022 (UTC)
Support. Binarystep (talk) 07:59, 3 November 2022 (UTC)Reply[reply]
Largely oppose. For the one-cell characters, they are mostly better not split by natural language. is nice and compact - it would be disastrous to split the Bharati Braille usage by language, and I wouldn't like to split the lemma by script. Abbreviations and logograms are possible exceptions - I wonder what multilingual Braille systems do for the word-like abbreviations. In this case, perhaps Wiktionary should act like a reference manual and list transliterators from Braille. --RichardW57m (talk) 12:48, 4 November 2022 (UTC)Reply[reply]
I concede that there may be a case for L2 Braille-system headers, such as 'Unified'. --RichardW57m (talk) 12:48, 4 November 2022 (UTC)Reply[reply]
Why would it be any more disastrous than splitting for any other script? Theknightwho (talk) 13:06, 4 November 2022 (UTC)Reply[reply]
@Theknightwho: The letter 'a' only has entries for languages written in the Roman alphabet, the corresponding Braille letter would have entries for every language written in Braille - you would add most of the languages of mainland south and southeast Asia. --RichardW57m (talk) 10:42, 7 November 2022 (UTC)Reply[reply]
Why is that a problem, though? Theknightwho (talk) 15:05, 7 November 2022 (UTC)Reply[reply]
@Theknightwho: do you really think umpteen entries for is better than what we currently have? --RichardW57m (talk) 10:20, 9 November 2022 (UTC)
@RichardW57m: If they’re generally semantically different, then yes. Theknightwho (talk) 16:07, 9 November 2022 (UTC)Reply[reply]
They will usually map in the first instance to "1" or the "character used to represent /a/ or the approximation thereto". --RichardW57m (talk) 17:26, 9 November 2022 (UTC)Reply[reply]
So do normal letters. I take it you would support merging those into translingual as once proposed? Vininn126 (talk) 17:31, 9 November 2022 (UTC)Reply[reply]
Yes, and I note a lot of letters have false precision, as exemplified by definitions like "the nineteenth letter of the Welsh alphabet". The word 'nineteenth' is false precision - it depends on whether 'j' is in the Welsh alphabet, and some such definitions have been inconsistent. (It wasn't when I was a boy.)
Sheer aesthetics argue for the collapse to a single lemma in the case of Braille. --RichardW57m (talk) 11:12, 10 November 2022 (UTC)Reply[reply]
That's an argument for improving the quality of those entries; not removing them. Theknightwho (talk) 16:52, 10 November 2022 (UTC)Reply[reply]
Is there even multilingual braille? AG202 (talk) 13:24, 4 November 2022 (UTC)Reply[reply]
w:Bharati Braille. RichardW57m (talk) 10:45, 7 November 2022 (UTC)Reply[reply]
Thank you, I missed that in your original comment. I do wonder though, seeing how Braille systems often have contractions and shorthand, if those could differ for the languages that implement the Bharati Braille system, as you mentioned. Though I disagree with the implementation that Wiktionary should only act as a reference manual. Maybe an L2 Braille system like "Bharati Braille" would be useful, because as is, "Translingual" is not clear and has become a catch-all which is a problem. AG202 (talk) 15:23, 7 November 2022 (UTC)Reply[reply]
I'm not saying that Wiktionary should act only as a reference manual. If someone is trying to decipher some Braille text, I think it is too much to hope that we will have found attestations for the Braille spelling of every English word in Braille, let alone Welsh. What we can do is point to a transliteration service. We might even supply them ourselves - if we list abbreviations, let alone words, we should probably offer transliterations, just as we do for other scripts, though notably on a language by language basis. (Hmm - non-Roman targeted Brailles need two levels of transliteration - target script and Latin script. And Bharati Braille is script-agnostic - it even supports basic Latin!) --RichardW57m (talk) 13:04, 10 November 2022 (UTC)Reply[reply]
The idea of an L2 heading "Bharati Braille" has some appeal, especially if we must break translingual up. As far as I can tell, Bharati Braille has no contractions - its 'level 1' equivalent employs all codes for simply written words. There must be some subtleties in the writing - I need to draw up a cell to letter etc. coding table. --RichardW57m (talk) 13:04, 10 November 2022 (UTC)
I believe that lack of contractions in one language isn't an argument to not separate other languages. Lack of a word for "bombard" in one language is not evidence to not add it in another language. Also I want to emphasize the point of the thread is not to provide transliterations, just change the presentation of the current entries to be more consistent with other letters. There was an attempt to merge them into translingual before, ultimately leading to no change. Vininn126 (talk) 13:13, 10 November 2022 (UTC)
Ah, so you are just talking about Braille letters, and not other Braille characters? Note that we haven't split ÷, whose Scandinavian meaning ("minus sign") is different to its English meaning. I will remark here that Unicode considers the Braille characters to be symbols, not letters! Unfortunately, consistency is overrated. --RichardW57m (talk) 18:15, 10 November 2022 (UTC)
Look at my second comment to myself above. Also, disagree on the consistency! It makes a huge difference for readers. Vininn126 (talk) 18:38, 10 November 2022 (UTC)Reply[reply]
So does Unified English Braille have 26, about 51 or how many letters? Is there anywhere a Wiktionary taxonomy for entities in Braille script? It's the 26 that are amongst the most translingual! --RichardW57m (talk) 12:52, 11 November 2022 (UTC)Reply[reply]
I'm not following. Could you please elaborate? Vininn126 (talk) 12:54, 11 November 2022 (UTC)Reply[reply]
Not all 64 6-dot Braille cells are letters. I think I've seen ligature and logogram used, and, irrelevantly, of course there are the ten numerals which double as letters. Decade 5 is mostly punctuation, and the right-shifted cells are mostly 'format' or similar characters. The dotless cell does not function as a letter. --RichardW57m (talk) 14:51, 11 November 2022 (UTC)
(honestly, it should be split) AG202 (talk) 19:00, 10 November 2022 (UTC)Reply[reply]
There are five characters in Bharati Braille (and more for Indian Urdu) whose writing includes format characters. There are also a couple of ambiguous characters - or at least, that's implicit in the documentation I can find. --RichardW57m (talk) 12:14, 11 November 2022 (UTC)Reply[reply]

Frequency information in usage notes

A user removed frequency information that I added to the entry supermajority:

"The term supermajority is much more common in the American corpus while qualified majority is much more common in the British corpus."

It was traced to {{R:GNV}}.

The user said it belongs in a context label but did not add any context label himself. This kind of procedure seems very unwiki to me.

I don't think we can fairly describe this in a context label: "Chiefly British" or "Chiefly American" do not seem to be appropriate context labels. It is not so clear what the prevalence in the corpora means; all we can do is state the prevalence and let the reader follow the GNV link to see for themselves. All it can mean is that Americans use "supermajority" to refer to their political supermajorities while the EU uses "qualified majority" to refer to what they do.

What do you think? Does the usage note do more harm than good? I find it very useful, especially when paired with a link to follow.

--Dan Polansky (talk) 13:07, 3 November 2022 (UTC)Reply[reply]

If that isn’t what the word “chiefly” means, then it’s not at all clear what it is ever supposed to mean. It’s also clearly escaped your attention that I did add a context label, but it would have been helpful if you could have bothered to do it yourself.
Rather than putting this information in a usage note that uses 5-10 times as many words as necessary, it is much better to simply use a context label - something that we do almost everywhere else. I also wasn’t aware that “British English” and “EU English” are synonymous. Theknightwho (talk) 13:12, 3 November 2022 (UTC)Reply[reply]
Per WT:EL: These notes should not take the place of context labels when those are adequate for the job. Case closed. Theknightwho (talk) 14:59, 3 November 2022 (UTC)Reply[reply]
It would, however, be helpful to indicate in the entry the more common British equivalent. Andrew Sheedy (talk) 15:20, 3 November 2022 (UTC)Reply[reply]
@Andrew Sheedy It’s right under the definition. Theknightwho (talk) 15:21, 3 November 2022 (UTC)Reply[reply]
supermajority,(qualified majority*4) at Google Ngram Viewer shows supermajority to be about 4 times as common as the other term in the American corpus. Does it make qualified majority "chiefly British"? Not to me: the term still sees very significant use in the American corpus. To me, "chiefly British" would require much smaller use in the American corpus. A problem is that we do not define anywhere what "chiefly" means numerically, something a professional dictionary would have to do. We have too many things uncodified. --Dan Polansky (talk) 18:27, 3 November 2022 (UTC)Reply[reply]
If your concern is the precise meaning of the adverb "chiefly", then that is solvable by using a different adverb. I strongly suspect you're just nitpicking, though. Theknightwho (talk) 18:42, 3 November 2022 (UTC)Reply[reply]
So which adverb? As I explained, my understanding of "chiefly" is different from what the data shows. The sentence I used does not suffer from that problem. I do not recall ever tagging entries as "chiefly US" or "chiefly UK" and I do not know what our guideline is for that tagging. My suspicion is that it is based on whim. The problem with crude labels is apparent in the color entry, which says "color (countable and uncountable, plural colors) (American spelling) (Canadian spelling, rare)". By contrast, OED says "colour | color" and data shows "color" to be fairly common in the British corpus as of late[1]. Crude labels do not do justice to facts and OED does a better job than we do in its "color" entry. --Dan Polansky (talk) 19:10, 3 November 2022 (UTC)
So you're arguing that a word can be "much more common" in one corpus without it being "chiefly" used in that corpus?
Your color example is actually a great demonstration of why we need to take these NGram numbers with a heavy dose of salt anyway. Just because a variant is used in a corpus doesn't mean that it is actually accepted as being part of a particular variety of English by the speakers of that variety. There are other reasons why it might occur instead: spellcheckers, for instance. Theknightwho (talk) 19:22, 3 November 2022 (UTC)Reply[reply]
'a word can be "much more common" in one corpus without it being "chiefly" used in that corpus?' Yes. Kind of obvious to me.
The color example shows real data, not guesses and unsubstantiated opinions. OED seems to think so as well given they say "colour | color". Given the data and the OED entry, it seems that "color" is now widely accepted in British English. Supplementary evidence could challenge that idea, but mere unsubstantiated opinions won't. --Dan Polansky (talk) 14:50, 4 November 2022 (UTC)
The color example shows that you don't understand how to interpret raw data, and that you don't understand that the OED isn't limited to British English; you've failed to address both of these points. As a native speaker of British English, I can tell you pretty definitively that color is not "widely accepted" in British English. Other corpora do not support your point, either, given that BASE and British English 2006 contain almost 0 instances of color, and UkWac Complete and GloWbE show very low usage compared to colour. As someone who is not a native speaker of English and who does not even live in a country where English is the dominant language, you are not in a position to make the claim that you are; especially when you've stonewalled the obvious explanation that I've already given you.
Stop embarrassing yourself. You seem to have absolutely no idea what you're doing, and seem to be completely incapable of accepting that the conclusions you've hastily jumped to might be flawed; often fatally so. Theknightwho (talk) 20:56, 5 November 2022 (UTC)Reply[reply]
As an American, qualified majority would confuse me; I would have assumed it's the normal use of qualified + majority, which could mean anything given the context. I'd think that the use of qualified majority in US English would either be in that broader sense, or specifically talking about the EU procedures and using the language they use to describe them.--Prosfilaes (talk) 18:00, 16 November 2022 (UTC)Reply[reply]
I googled for US uses of qualified majority, and after pages of British or European uses, and a few US pages talking about the EU, I found a George Washington University article that used the phrase "qualified majority": "Along with Costa Rica, Argentina, Ecuador, and Nicaragua have adopted qualified majority-runoff rules; i.e., to win outright, the leading candidate must reach a threshold, but the threshold is lower than 50 percent of the vote."[2] That is, this US source uses "qualified majority" for almost the exact opposite of "supermajority".--Prosfilaes (talk) 18:07, 16 November 2022 (UTC)Reply[reply]

Albanian proper noun lemmas - indefinite vs definite

I think lemmas for Albanian proper nouns should be the indefinite forms, like with common nouns, even if definite forms are more commonly used. There is no consistency and there are many duplications. So I have already created or changed indefinite forms to be lemmas and definite forms to be an inflected form only (focusing on country names for now).

Examples of pairs indefinite - definite (already checked and edited by me)

  1. Indi - India
  2. Francë - Franca
  3. Afganistan - Afganistani

Inflection and headword templates (incomplete) currently support the indefinite form to be the lemma.

Question: not sure if indefinite forms are always easily found but they probably exist. Do all proper nouns have both forms?

Please note Albanian Wikipedia uses definite forms in article names, e.g. "India", not "Indi".

Please comment if you have preferences or knowledge on the subject, as I have been making changes, so that less rework would be required. Anatoli T. (обсудить/вклад) 23:35, 3 November 2022 (UTC)

How come Korean verb conjugation templates split between different degrees of politeness, but Japanese don't?

Compare Category:Japanese verb inflection-table templates with Category:Korean verb inflection-table templates.

I just feel like Japanese verb templates would really benefit from the addition of ます forms, etc. These wouldn't be immediately intuitive to new language learners from the Japanese verb templates as they currently stand, when they're just as essential in the appropriate contexts in Japanese as they are in Korean.

So, what do you think? Dennis Dartman (talk) 00:57, 4 November 2022 (UTC)Reply[reply]

@Dennis Dartman: The Japanese inflection of the formal ます (masu) is consistent and simple. It doesn't change depending on conjugation type, unlike Korean (even if there are commonalities in Korean). Types 1, 2 or 3 have the same formal endings - -ました, -ませ, -ません, etc. Perhaps a link or a note in the conjugation table will suffice.
BTW, unfortunately, the Korean templates don't handle conjugations of irregular verbs with 100% accuracy when the formal forms or the informal forms are lacking. Anatoli T. (обсудить/вклад) 01:11, 4 November 2022 (UTC)
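The regularity described above can be illustrated with a minimal Python sketch. It assumes the ます-stem (連用形) is already known and simply appends the same polite endings whatever the verb class. The function name `polite_forms` and the choice of endings are illustrative only and are not how any existing Wiktionary template or module works.

```python
# Polite endings are attached to the masu-stem and are identical for
# godan (type 1), ichidan (type 2) and irregular (type 3) verbs.
MASU_ENDINGS = {
    "non-past": "ます",
    "past": "ました",
    "negative": "ません",
    "negative past": "ませんでした",
    "volitional": "ましょう",
}


def polite_forms(masu_stem):
    """Build polite forms from a masu-stem supplied by the caller."""
    return {label: masu_stem + ending for label, ending in MASU_ENDINGS.items()}


# Stems given by hand: 書き (from 書く), 食べ (from 食べる), し (from する).
for stem in ("書き", "食べ", "し"):
    print(polite_forms(stem))
```

Deriving the stem itself is the part that depends on the conjugation class, which is why the sketch takes it as input.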
Could you give an example for the wrong conjugations? @Atitarev AG202 (talk) 05:36, 4 November 2022 (UTC)Reply[reply]
@AG202: There are a few. The latest issues are in Module_talk:ko-conj#Issues_with_the_module but you can see me in the same talk page. Both good knowledge of Korean grammar and module writing skills are required but this could be a combined effort with building cases. Suppression and manual overrides (or a different type for the copulas, special verbs) would be required. Anatoli T. (обсудить/вклад) 07:30, 4 November 2022 (UTC)
@Dennis Dartman: Because the current Japanese inflection-table templates are actually not "tables", but rather "lists". As you can notice, these templates in a single line have kanji, kana and romaji, the three items essentially constituting just one single inflected form. Thus they are 1-dimensional, which I would call "lists", and can only give a few forms before becoming almost unreadable. We would need 2-dimensional tables to have enough space for those additional ます forms.
While it is possible to convert these templates to 2-D table-like structures, this may increase Lua memory usage which I am not sure would be a good idea to everyone. -- Huhu9001 (talk) 03:02, 4 November 2022 (UTC)Reply[reply]
Well, considering Wiktionary is apparently okay with the likes of Template:sw-conj... Dennis Dartman (talk) 03:47, 4 November 2022 (UTC)Reply[reply]
Re: memory issues. The Japanese conjugation template already uses Lua memory due to invoking Module:ja repeatedly, even though the main glue of the template is not written in Lua. In my testing, removing the conjugation template from 愛玩 saved a little over 2 MB, out of the 52 MB limit. For comparison, both Russian conjugation templates on минимизировать (minimizirovatʹ) combined, fully implemented using Lua, take less than 1 MB. My guess is that if the list/table were extended to have twice as many forms using its current implementation, that would require about twice as much memory. If the whole table were implemented using Lua (like the Russian ones are), the addition of more forms might incur a smaller additional cost because it wouldn't require loading the modules over and over. I haven't tested this though.
Anyway, memory would mainly be a concern on verb entries whose title is a single kanji character, since single-character pages tend to be the worst offenders for Lua memory overuse (due to having many language sections, each of which can be long). 04:44, 4 November 2022 (UTC)Reply[reply]
I don’t think the additional workload is likely to be majorly intensive, and there are economies of scale in Lua if done right. The new Mongolian inflection template uses about 3MB (which is an inherent issue due to how the forms have to be generated, though there are about 3 times as many as Russian). Splitting off the independent genitives so that they have their own tables on the relevant pages didn’t save that much memory, despite cutting the number of forms from 117 to 53. Theknightwho (talk) 12:28, 4 November 2022 (UTC)Reply[reply]

The Swahili conjugation template: is it unnecessarily convoluted? Could it use trimming?

Template:sw-conj is massive. Gargantuan.

And are we okay with this?

Feedback from a Swahili speaker preferred. Dennis Dartman (talk) 03:48, 4 November 2022 (UTC)Reply[reply]

I recommend that you explain why this is bad, and what ideas you have got to fix it. (Feedback from someone who wants to solve problems preferred.) Equinox 03:50, 4 November 2022 (UTC)Reply[reply]
I agree it seems rather unwieldy, but as long as it's collapsed by default how much of an issue is it? One benefit of having a comprehensive table is that if you come across a conjugated form and search for it, you'll find the entry for the stem.
I guess one drawback is that big pages take longer to load. For comparison, the table on ruka takes up about 309 kB of HTML, while the whole page (excluding JS, images, etc.) is 439 kB. The arm photograph in the Lower Sorbian section is 36.3 kB. 04:20, 4 November 2022 (UTC)Reply[reply]
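For a sense of proportion, the figures quoted above can be turned into shares with a few lines of Python. This is only arithmetic on the numbers given in the comment; the variable and function names are made up for the illustration.

```python
def share(part_kb, whole_kb):
    """Return part_kb as a percentage of whole_kb."""
    return 100 * part_kb / whole_kb


PAGE_HTML_KB = 439    # whole "ruka" page, excluding JS, images, etc.
TABLE_HTML_KB = 309   # HTML of the Swahili conjugation table alone
PHOTO_KB = 36.3       # arm photograph in the Lower Sorbian section

print(f"table: {share(TABLE_HTML_KB, PAGE_HTML_KB):.1f}% of the page HTML")
print(f"photo: {share(PHOTO_KB, PAGE_HTML_KB):.1f}% of the size of the page HTML")
```

On these numbers the table accounts for about 70% of the page's HTML.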
I already mentioned this at several places, without receiving any response. Another issue is that the table is …drumroll… woefully incomplete; there’s many more relative forms than that. (For example, on the very first page of Duniani kuna watu you’ll find the form linalompeleka. We do have the entry -peleka, but this form doesn’t show in search and I certainly can’t find it anywhere in the interminable collapsible boxes in the table.)
Furthermore the template often generates wrong forms.
Anyway, most of the forms (including the lacking ones) are 100% predictable. Trying to include all forms is like including “I wouldn’t have been robbed” in the conjugation table at rob. What’s the point? It drowns the useful information in a sea of cruft.
Finally, looking up every single word in a language you don’t know the basics of is a completely pointless exercise. Try and look up all the words in “I’ll make it up to you.” This won’t tell you the meaning of the sentence at all, because you should look up make up instead of individual words. Someone who doesn’t know the basics of a language should learn the basics of the language rather than looking up every single string surrounded by spaces they come across. Otherwise you just get Bud Carry Without Being in Love. MuDavid 栘𩿠 (talk) 02:06, 5 November 2022 (UTC)Reply[reply]
There does come a point when agglutinative languages get completely out of hand, yes. Are any of the suffixes involved derivational? Certainly for Mongolian and Turkish, voices like the causative are treated as deverbal, and participles are given their own tables. Otherwise, their verbs would also end up with hundreds/thousands of forms in their conjugation tables. My Mongolian spellcheck dictionary boasts that it can handle 1.8 billion possible inflections, and there's no genuine grammatical reason for it to stop there; just a practical one. We should take a similar view, but probably want to draw the line in a different place (e.g. "with one who is not without one that has a horse" is one for the spellchecker that we probably don't need, and that's just looking at nouns). Theknightwho (talk) 03:04, 5 November 2022 (UTC)Reply[reply]
I also complained about this a few months ago, but it seems I somehow did not have the tables collapsed by default the way nearly everyone else does. If other people complain, I just want to make sure that they are not having the same problem I once had. Soap 11:16, 5 November 2022 (UTC)Reply[reply]

pinging @Dennis Dartman, Equinox, Theknightwho, Soap, Jodi1729, MartinMichlmayr, Metaknowledge, JohnC5, Habst, anybody else? I’ve been working on a proposal for new, trimmed, tables: here. There’s still work to do (see the issues I mention at the bottom of the page), but I feel I’ve progressed enough that feedback is welcome. Let me know what y’all think. MuDavid 栘𩿠 (talk) 03:38, 14 December 2022 (UTC)Reply[reply]

Replacing bare lists of adjectives & nouns in usage notes

We currently have a number of entries which have usage notes containing (sometimes very lengthy) lists of nouns and adjectives that the main term is commonly used with. For example: at argument, majority & practice. In my opinion, these are extremely low-effort and of little-to-no use to a reader, given they provide zero contextual information, can't be used as signposts, and are laid out in a format that is much too dense for those who would use the info that they're trying to convey anyway. They're just unlinked blocks of text (which is particularly bad on mobile), with absolutely no information about how any of the listed terms are used with the word in question.

The largest problem, though, is that this is a major misuse of usage notes, which makes it harder to pick out any genuine usage information which is buried underneath. Usage notes are frequently one of the most important sections in an entry, given they usually contain (sometimes critically) important contextual information, which someone unfamiliar with the term needs to know in order to understand the term properly. We do not want to train our readers to skip over them, as these lists invariably will do.

Fortunately, we already have ways of displaying this kind of information: collocations and derived terms. These have several advantages, not least of which are that they're segmented off from other info (and therefore easier to parse), as well as the fact that they show how the two terms are used together. Not every adjective is in the attributive position, after all.

Given this only affects a relatively small number of entries (<50) at present, I suggest we nip this in the bud by converting all of these sections into collocations (or whatever else is appropriate in the context), and we disallow the addition of these bare lists going forward. @Dan Polansky has claimed to me that this is the "traditional" way of doing things, but if there was ever truth in that, it's obviously not how things are generally done now. Theknightwho (talk) 13:49, 4 November 2022 (UTC)Reply[reply]

I used this format in the English Wiktionary for over a decade. It is more compact and the information conveyed is the same as the space-wasting format for collocations. The lists are as useful as collocations; the difference is that, instead of writing "A X, "B X" and "C X", I write A, B, and C and leave it to the reader to fill in X. This format is used by some collocation dictionaries. The format is very compact, ensuring that even a fairly long list of items takes little screen space.
I don't oppose anyone wanting to convert this to the space-wasteful collocation format, but it is not worth my effort and I find the more compact format preferable.
I find the lists very useful. They sometimes reveal deficiencies in our definitions. They are often more useful than badly chosen quotations of use, of which we have many.
The information content is nearly the same as with collocations, just more compact.
If I were a reader, I would be glad someone is actually willing to do this kind of menial work.
I would therefore appreciate if I were allowed to continue using that format, in part as impetus for and a recognition of the work being done, even if uninspiring menial work. It will be easy to convert the information to a collocation format using a bot later in volume if desired. --Dan Polansky (talk) 14:46, 4 November 2022 (UTC)Reply[reply]
We use "Adjectives often used with" on 28 entries, "Nouns often used with" on 16, "Verbs often used with" on 6 and "Adverbs often used with" on 2. Several entries have multiple lists. If you have been using this format for over a decade, you clearly haven't been using it very often. It is certainly not a "traditional" format.
You also fail to account for the fact (which I have already mentioned) that we cannot use a bot to convert into collocations, because it cannot accurately predict how each of the collocated terms will fit together. In fact, you've addressed none of the issues. Please just do the work properly in the first place, instead of lying to everyone that your niche way of doing things is the de facto standard. I would also appreciate if you did not misrepresent the issue: the problem is obviously how the information is being presented, and not the fact that it is there at all. Theknightwho (talk) 14:59, 4 November 2022 (UTC)Reply[reply]
Agreed with User:Theknightwho on all counts; please use the collocation templates and section. See for instance how nice and tidy broken looks, much better than those plain lists that visually interfere with the actual usage notes. Furthermore, I hope that the new translation table improvements also affect collocation tables because that would allow us to display them even more concisely while not sacrificing readability. By the way, {{co-top}} has already been deployed 262 times, {{coi}} 4,902 times. If there's any standard, it's this. — Fytcha T | L | C 〉 15:15, 4 November 2022 (UTC)
I don't like how broken looks; the repetition of the adjective feels unnecessary and a relatively short list takes so much space, using only two columns. German Wiktionary uses a compact format with plain comma-separated lists. And it is of course more typing. More space taken in the wiki code. The adjectives are not linked either in broken. The lists usually do not interfere with anything since most entries do not have usage notes. And collocations are in fact usage.
If forced, I will probably resort to using this unseemly co-top business, but it is really annoying. --Dan Polansky (talk) 15:23, 4 November 2022 (UTC)
In revision history of hopeless, I noticed there used to be my list of collocating nouns and someone has converted this to the new collocation format later. We did use to have many more of my lists before the new collocation format vote. --Dan Polansky (talk) 15:30, 4 November 2022 (UTC)
We used to live in huts and shit in the woods. It is equally irrelevant. Theknightwho (talk) 15:36, 4 November 2022 (UTC)
It supports my claim about traditional practice. The new practice is unlikely to be objectively better: German Wiktionary does not use it and Polish Wiktionary does not either, from what I remember. Some people happen to prefer this wasteful format and so do some collocation dictionaries; other collocation dictionaries don't. To liken other professional collocation dictionaries to "shitting in the woods" is outlandish. I always recommend paying attention to objective verifiable facts and contrast them to subjective preferences, whims and value statements. It would be more respectful and true to facts to recognize that people and their preferences differ and start from there. --Dan Polansky (talk) 15:41, 4 November 2022 (UTC)Reply[reply]
The issue is that you've given no actual argument other than calling it wasteful, which is trivially disproven by the fact that we can put it in a collapsible box, and doesn't address the fact that not all collocations are formulaic. What other Wiktionaries do is not of any relevance, given they frequently mimic our practices. Theknightwho (talk) 15:47, 4 November 2022 (UTC)Reply[reply]
"wasteful, which is trivially disproven by the fact that we can put it in a collapsible box": nonsense. It is visually wasteful once uncollapsed. Should not need to be said.
What other Wiktionaries and other collocation dictionaries do confirms that there is no "objectively best" way of doing it, and in fact, there's often no arguing about taste. I rest my case that we are dealing with subjective preferences, not objective facts of value. --Dan Polansky (talk)
Why does that matter when they can collapse and uncollapse it at will? These objections are absolutely surreal. You've not addressed a single concern raised here; you've just had a self-absorbed tantrum about having to do things a bit differently. Theknightwho (talk) 16:11, 4 November 2022 (UTC)Reply[reply]
An interesting question: how many of these collocations we now have were added anew by people willing to do the menial work and how many of them are just converted collocations entered by me. This would help show how much editors take the value of collocations seriously beyond talking about them and regulating them. --Dan Polansky (talk) 15:46, 4 November 2022 (UTC)Reply[reply]
Tagging @Vininn126, who is a big fan of collocations. Theknightwho (talk) 15:49, 4 November 2022 (UTC)
All of the collocations I add are taken from a Polish National Corpus. Sometimes it takes a very long time to do them. Vininn126 (talk) 16:00, 4 November 2022 (UTC)
  • I would think we would want these hidden by default, so that they wasted less space. Text in a show-hide bar could explain or hint at what lurked beneath. DCDuring (talk) 17:53, 4 November 2022 (UTC)Reply[reply]
    It really depends on the amount of collocations. When it gets close to 7 for a definition I move them from inline to the collapsible box. Sometimes I even set up multiple boxes with senses or even senseid's. Vininn126 (talk) 19:16, 4 November 2022 (UTC)Reply[reply]
    IMHO as soon as they take up more space than the show/hide bar, they should appear under it, collapsed. I'd use the number of columns that led to the smallest amount of vertical screen space occupied by these lists when expanded. Everyone wants to get a lot of space for their favorite content: etymology, pronunciation, citations, usage examples, etc.; now, collocations too. I still have the strong suspicion that users want definitions first and foremost. Their other interests vary. Registered users get to make appear what they want, whatever the default. DCDuring (talk) 22:29, 4 November 2022 (UTC)Reply[reply]
  • Agree with Theknightwho and DCDuring, including when Theknightwho is being rude. MuDavid 栘𩿠 (talk) 01:36, 5 November 2022 (UTC)Reply[reply]

The minds seem set, but for the benefit of the reader, let's consider 4 major collocation dictionaries:

Very interesting. The only one that does anything like Wiktionary is Cambridge. By my subjective taste, the format chosen by Wiktionary is greatly inferior, requiring visual parsing of the same repetitive element again and again. 3 of 4 collocation dictionaries agree. Oh well. I will add that whether an adjective collocates attributively or predicatively is largely irrelevant: it is still a collocation. A "rule" can be "rigid" in the predicative position, and that is also interesting. --Dan Polansky (talk) 14:32, 5 November 2022 (UTC)Reply[reply]

Stop being so fragile. You have already made it very clear that you have contempt for the concerns of other users over the way you want to do things. No need to repeat yourself. Theknightwho (talk) 14:38, 5 November 2022 (UTC)Reply[reply]
The current Wiktionary format is superior because it is also flexible in its presentation. You can just change your user JS to display collocations as text lists again if you insist. It's not possible the other way around: a script can't reliably parse your unstandardized plain text lists and convert them to bulleted lists. — Fytcha T | L | C 〉 14:45, 5 November 2022 (UTC)Reply[reply]
And how do I customize it so that the repetitive element gets hidden, to match the presentation in the collocation dictionaries? If it at least used tilde instead of the repetitive element, that would be quite an improvement. --Dan Polansky (talk) 14:54, 5 November 2022 (UTC)Reply[reply]
You could make your own template and apply it. If there were enough use or interest someone might module-ize it. DCDuring (talk) 15:07, 5 November 2022 (UTC)
What is that supposed to mean? That is not JavaScript. I would have to edit mainspace, wouldn't I? That's not personal customization. --Dan Polansky (talk) 15:18, 5 November 2022 (UTC)
@Dan Polansky: You can do that by adding this line to User:Dan Polansky/common.js: document.querySelectorAll('.collocation .e-example b').forEach(e => e.textContent = '~') This only works if the proper template ({{co}}/{{coi}}) is used and if the term is bolded (which should be done anyway). — Fytcha T | L | C 〉 15:33, 5 November 2022 (UTC)
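The one-liner above can be wrapped in a small function, which makes it easier to test and to switch off. This is only a minimal sketch: it assumes that the `.collocation .e-example b` selector quoted above matches the markup produced by {{co}}/{{coi}}, and the function name `replaceHeadwordWithTilde` is hypothetical, not part of any existing Wiktionary script.

```javascript
// Sketch of a user-script helper. The selector comes from the comment above;
// the function name and the environment guard are assumptions for illustration.
function replaceHeadwordWithTilde(root, selector = '.collocation .e-example b') {
  // Collect every bolded headword inside a collocation example.
  const nodes = root.querySelectorAll(selector);
  // Swap the repeated headword for a tilde, as print collocation dictionaries do.
  nodes.forEach((node) => {
    node.textContent = '~';
  });
  // Report how many headwords were replaced.
  return nodes.length;
}

// Run automatically only where a DOM is available (i.e. in the browser).
if (typeof document !== 'undefined') {
  replaceHeadwordWithTilde(document);
}
```

As noted above, nothing is replaced on pages where the headword is not bolded or where the collocation templates are not used, in which case the function simply returns 0.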
Thank you; fair enough. The boldface is another bad idea: the repetitive items are obtrusive enough even without boldface. If the collocating items that vary were in boldface, that would make a little bit more sense. A problem with the customization idea is that we should provide best defaults possible. We will have to assume that the choice made is the best default from usability standpoint. I don't believe that at all and 3 dictionaries agree with me, but the minds are set, so it's what it is. --Dan Polansky (talk) 15:38, 5 November 2022 (UTC)Reply[reply]
With your mindset, no-one would ever innovate. Theknightwho (talk) 15:52, 5 November 2022 (UTC)

Change "Middle Mongolian" to "Middle Mongol"

Discussion moved to WT:RFM.

Comments in quotations

If I need to clarify something minor in usage examples or quotations (for instance things for which English lacks a distinction), I abuse {{abbr}} to add my comment (see obe for a recent example). However, I'm wondering what the best approach would be for quotations where every other word merits a comment. This mainly happens when a language that is correctly written with (many) diacritics is informally written without them (see mierli for a recent example). I want to add the correct forms with diacritics so that learners know what to look up and how it is pronounced but I don't know how to best present this information. {{abbr}} seems inadequate because it is impossible to copy from the hover text (well, apart from editing the page) while {{sic}} after every other word looks woefully ugly and disrupts the reading flow. Ideas? — Fytcha T | L | C 〉 20:03, 5 November 2022 (UTC)Reply[reply]

@Fytcha: I always use block brackets [ ] - cf. the quote at hää. Thadh (talk) 21:53, 5 November 2022 (UTC)
Brackets or even another line if a whole quote needs normalizing (analogous to how translations are given on a second line) seems like the best approach. The latter might require updating the template if anyone wanted to have the template do it rather than formatting the cite "manually". If multiple words need normalizing or {{sic}}ing, it seems advisable to move the brackets to the end, i.e. not knot [not] sick [sic] everi [every] wurd [word], but instead knot sick everi wurd [not sic every word]. Related issue: in the 2003 quote on that page,
  • 2003 April 13, Spaima Limbricilor, “Ploua, ploua, Bombonel se oua! [It rains, it rains, Bombonel lays eggs!]”, in soc.culture.romanian, Usenet [It rains, it rains, Bombonel lays eggs!][3]:
it's weird that the template puts the trans-title= redundantly in two places (and not in the best place either time; I'd think it would ideally be placed outside of, but directly next to, the quoted title). - -sche (discuss) 01:20, 7 November 2022 (UTC)Reply[reply]
What about using |tr= or |ts= for the normalized spelling? 01:24, 7 November 2022 (UTC)
What about languages that do need transcription or transliteration? I don't think that's desirable. Thadh (talk) 07:14, 7 November 2022 (UTC)
What I've done, e.g. at Pali ປິ (pi), is to split the transliteration line into transliteration of text as is and as normalised/corrected, separated by
"<br><span style='font-style:normal;'>With ambiguities resolved:</span><br>"
or similar using angle brackets in the actual text. The path is a bit complicated - the text is stored in Module:RQ:pi:Anisongfree in variable resolve and is formatted by {{quote-web}}. This technique is useful for Lao script, where the writing is usually ambiguous, and for older Tai Tham texts, where the spelling is idiosyncratic or simply atrocious. --RichardW57m (talk) 12:55, 7 November 2022 (UTC)Reply[reply]
Interesting. Maybe this tells us that we could do with an additional parameter in our templates, something along the lines of |copyedited=. As for [], I think a new template should be created that attaches a separate CSS class to these comments. This has the advantages that their appearance is customizable and even toggleable and also that it is always clear which [] are from the source and which were inserted by us. — Fytcha T | L | C 〉 13:05, 7 November 2022 (UTC)
@Fytcha: Good ideas! We need two versions of |copyedited=, one for the original script, and one for the transliteration, as it may sometimes be appropriate to edit the original script. While one could copy-edit my Tai Tham instances, typical Lao-script Pali writing systems cannot make the distinctions, which is why there was pressure to encode the Buddhist Institute's additions/restorations. In some cases, we might even want to correct the original and then resolve sandhi in the transliteration! That makes me think we need explanatory lines saying what the change is. --RichardW57m (talk) 13:26, 7 November 2022 (UTC)Reply[reply]
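To illustrate what "customizable and even toggleable" could mean in practice for the CSS class proposed above, here is a rough sketch of a user-script toggle. Everything named in it is an assumption: no such template exists yet, and the class name `editorial-comment` and the function `setEditorialCommentsVisible` are hypothetical placeholders.

```javascript
// Hypothetical class that the proposed template would attach to inserted [ ] comments.
const EDITORIAL_CLASS = 'editorial-comment';

// Show or hide every editorial comment under `root`; returns how many were touched.
function setEditorialCommentsVisible(root, visible) {
  const nodes = root.querySelectorAll('.' + EDITORIAL_CLASS);
  nodes.forEach((node) => {
    // An empty string restores the element's default display value.
    node.style.display = visible ? '' : 'none';
  });
  return nodes.length;
}
```

Because brackets that come from the source itself would not carry the class, hiding the editorial ones would leave the original text of the quotation untouched.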

Was this change a 'substantial or contested change' that should have required a formal vote?

In late 2014 and early 2015 a vote was held to decide whether to "[make] it official policy to delete entries which do not meet WT:CFI [...] even if there is a consensus to keep". That change was not enacted and the vote was closed "no consensus" with 7 supporting votes and 9 opposing votes (44% support).

Shortly after the vote was closed, Kephir, who entered an "abstain" in the vote, removed the template marking Wiktionary:Criteria for inclusion as a "policy, guideline or common practices page" and instead marked it as obsolete and "not intended to be used ever again" saying "how else can you interpret [the vote]?". Shortly afterward BD2412, who did not participate in the vote, undid the change saying that "CFI can still be a guideline even if it is not mandatory where there is consensus for an exception". After that BD2412 added the following passage to Wiktionary:Criteria for inclusion:

In rare cases, a phrase that is arguably unidiomatic may be included by the consensus of the community, based on the determination editors that inclusion of the term is likely to be useful to readers.

In their edit summary for the change BD2412 wrote that "[t]his is what the vote really means."

More recently, PUC removed the passage added by BD2412 saying "[it] was never approved by vote" and also explained "so that it can no longer be invoked by the likes of Dan Polansky".

For context, during the time that the passage added by BD2412 was part of Wiktionary:Criteria for inclusion, I am aware of seven instances where it was referenced as part of a discussion (1, 2, 3, 4, 5, 6, 7)

Per a 2012 vote, "[a]ny substantial or contested changes [to CFI] require a VOTE". My question for now is "was the removal of the passage originally added by BD2412 a 'substantial or contested change' that should have required a formal vote?" If the consensus is "no", then this discussion can resolve with no further action. If the consensus is "yes", I will start a vote about whether the passage should be removed from Wiktionary:Criteria for inclusion. I appreciate hearing everyone's thoughts and hope we can approach this question narrowly. Take care. —The Editor's Apprentice (talk) 23:33, 5 November 2022 (UTC)

The removal by PUC is both substantial and contested, and per policy requires a vote. I understand PUC's action: the sentence arrived into CFI without a vote about that sentence. But the sentence was in CFI for over 7 years (diff) and has become entrenched; any admin who opposed its addition could have removed it in the days, weeks and even months that followed. Plus of the sentence: it documents our widespread practice of policy overrides, supported in some cases by nearly everyone, e.g. for hot words, for which no one played the stick-to-the-rules game. Without a sentence like this, CFI would be less honest. I don't really need the sentence anyway: I can always invoke User:Dan Polansky/IA § Policy override. Without policy overrides, editors must not invoke LEMMING ever again, nor "set phrase", nor "term of art", etc. We don't have Wikipedia's W:Wikipedia:IAR, and the sentence does this job in a nuanced and fairly weak manner, not so aggressive as the "Ignore all rules" phrase. And if we want to remove from CFI things that arrived without a vote and are not supported by consensus, fine, let's remove the irrational WT:COMPANY. If people want to improve the sentence and place it to other location of CFI, fine, let's do that, but as a replacement, not a removal. --Dan Polansky (talk) 07:38, 6 November 2022 (UTC)
It has not become entrenched, and was in fact already contested one year ago, as can be seen from the discussion I linked to above. Maybe no admin cared enough to remove it until now, and maybe some did not even notice it.
It's the addition of this sentence that should be submitted to a vote and approved by a 2/3 (or 60 percent, I don't particularly care) majority, not its removal. Submitting its removal to a vote means a 1/3 minority could have its say in what appears in the CFI. PUC – 10:10, 6 November 2022 (UTC)
Should WT:COMPANY be removed from CFI as not arriving into CFI via a vote? --Dan Polansky (talk) 11:03, 6 November 2022 (UTC)
For others, do you wish the controversial phrasebook provision in CFI to be removed from it as not arriving into CFI via a vote? --Dan Polansky (talk) 11:04, 6 November 2022 (UTC)
None of this is relevant to the discussion. It's just whataboutism. Theknightwho (talk) 11:48, 6 November 2022 (UTC)
This kind of invocation of the concept of whataboutism is largely nonsense. The idea is: if one proposes to act in a way that can be questioned, one should identify the principle behind the action and examine the acceptability of the principle on a variety of specific examples. The idea is Kantian and Popperian. Here, the principle seems to be: "If part of CFI is controversial and was added to CFI without a vote, it should be removed without a vote even if it was in CFI for several years". I propose to investigate whether we want to accept the principle by applying it to a variety of cases. To apply the principle to some cases but not to others is to reject the principle.
PUC, are you acting on this principle or on another principle? Your answers to questions would be appreciated; I have no desire to interact with the Knight. --Dan Polansky (talk) 12:37, 6 November 2022 (UTC)Reply[reply]
This is obstructionism, and an obvious attempt to muddy the water by making the discussion about something that it is not. Please stop. OP even requested that we approach this question narrowly. Theknightwho (talk) 12:43, 6 November 2022 (UTC)Reply[reply]
If we are to approach the question "narrowly", then the only question is whether the change is a) substantial or b) contested. It is in fact both a) and b). Everyone should agree on that (which is glaringly obvious), make the agreement on record and move on. The core of this pickle is that a "narrow" approach seems unfair to PUC. It seems to me that there could be a meaningful interaction with PUC in which he would himself realize the approach he has taken is unworkable, and would undo his edit. --Dan Polansky (talk) 13:05, 6 November 2022 (UTC)Reply[reply]
No, that isn't what it means to approach a question narrowly, and you're just trying to find ways to keep talking about things that are not relevant. You do love your false dichotomies, though. Theknightwho (talk) 13:20, 6 November 2022 (UTC)Reply[reply]
I forbid myself to respond to the Knight in this thread. I allow myself to respond to PUC. --Dan Polansky (talk) 13:27, 6 November 2022 (UTC)
I believe this was a substantial change as well, as I have always operated under the assumption that this rule applied. I would think many editors did, since few of us were here back in 2015. Thadh (talk) 13:59, 6 November 2022 (UTC)Reply[reply]
To quote what I said on Discord: "Agree it should not have been removed [as that] clause is not being evoked in every discussion anyways and plenty of entries get deleted [anyways]". If the issue is with Dan Polansky, then talk to him directly. As seen in the prior discussion there was no consensus as to whether or not that line should be deleted, and it was inappropriate to delete it, especially after having participated in the aforementioned discussion. And it was even more inappropriate to make such a sweeping change with no discussion in this forum and to target a specific user while doing so. I maintain that this, along with prior instances, is unbecoming of someone that has recently become an admin. I hope that another admin will revert this change while this discussion takes place. AG202 (talk) 15:11, 6 November 2022 (UTC)Reply[reply]
See also: User_talk:PUC#call_the_fire_department, pages shouldn't be deleted like this while ignoring CFI. This combined with other behaviors related to "inclusionists" is very concerning, and it's not the first time that this has been brought up either. AG202 (talk) 15:18, 6 November 2022 (UTC)Reply[reply]
I see two possible types of concern:
  • genuine concern: you simply care about my (not) following proper procedure, and would have objected just as strongly if I had summarily removed a sentence of CFI you don't agree with, or summarily deleted an otherwise valid entry you personally disliked (on account of offensiveness, for example);
  • ideologically motivated concern: you don't care as much about my (not) following proper procedure as about my challenging things that support your views.
Which one is it here?
As for your quote:
  • "[that] clause is not being evoked in every discussion anyways": what's the logic there? That clause still is a bad argument. It's a good thing when a virus isn't spreading everywhere; that doesn't mean we shouldn't try to get rid of it completely.
  • "plenty of entries get deleted [anyways]": so we should start grasping at straws?
PUC – 20:53, 7 November 2022 (UTC)
I’ve called out “inclusionists” and “deletionists” alike for not following procedure and for many other reasons. Deleting a clause in CFI without discussion, however, is unheard of for me in my time here, and the fact that the edit summary was targeted towards a specific user, the other example I cited, and prior encounters together raise it to a level of concern that is untenable for me at the moment. If you had edited CFI in the other “direction”, I still would not have been pleased. Though I don’t agree with everything done here, I have never gone as far as to unilaterally change something major on my own without discussion and consensus, even when I’m the only one with expertise, let alone something as powerful as CFI.
As for the quotes you’ve cited, the first one was in response to your edit summary where you removed the clause. When you said “Dan Polansky and the likes”, it makes it seem like it’s something that’s seeping in every discussion, when it’s not, and even if it was, you should’ve still had discussion about it here first. The second quote was less of an argument and more of a personal observation. In my time here, I haven’t really seen that clause itself save an entry enough times to warrant any sort of backlash at this rate (“a virus,” really?). I understand that you don’t like it, but that’s not how things work here as I’ve seen myself many times. There should be consensus. I’m just overall concerned that you took it upon yourself to make that change with the admin power that you recently got, almost in retaliation against described “inclusionists” when, to me, that’s not really what admin power should be for. AG202 (talk) 02:55, 8 November 2022 (UTC)Reply[reply]

I've reverted myself. @The Editor's Apprentice: I do hope this will be put to the vote. As you can see, the sentence is controversial, and should certainly not get a free pass imo. It's unfortunate that it's been sitting there unchallenged for so many years. PUC – 19:29, 6 November 2022 (UTC)Reply[reply]

Since there seems to be consensus that the removal was "substantial" and/or should have followed a formal vote, I have started a formal vote, which is currently in the premature stage, to answer the question of whether the passage originally added by BD2412 should remain. Please give it a look and discuss any possible improvements or fixes on the vote's talk page. Take care. —The Editor's Apprentice (talk) 06:42, 8 November 2022 (UTC)

Let's deprecate the Thesaurus namespace

To be clear, I think there's a lot of value in giving synonyms, but I think there are some serious flaws in how we do it at the moment. I don't want to set out a detailed proposal for this without getting a sense for what the consensus is, but my overall impression is that we probably want to integrate it into the mainspace:

  1. Badly neglected and inconsistent. I think it's fair to say that not very many editors maintain thesaurus pages. They're inconsistently categorised (Category:English thesaurus entries is extremely incomplete), and have no standardised layout, which creates confusion for the reader. There are also wide inconsistencies as to whether we should be including the language code in the page name. None of this is desirable, and it certainly does not aid the reader. It's also obvious that the various clean-up jobs which have been done over the years on Wiktionary have bypassed the Thesaurus: the template still uses the acronym "ws" (despite the Wikisaurus name being deprecated back in 2017), there are still a bunch of interlanguage links (which have been removed everywhere else), and the templates still follow schemes that have been deprecated (e.g. they still use lang). These are all obviously fixable, but it is highly indicative of how much attention is actually paid to these pages by the majority of the editor-base (read: not much). The lists even still require manual alphabetizing, which is absurd.
  2. Potential clutter is easily avoidable. As with other sections which often contain lengthy lists (e.g. derived terms), there are ways of including these that don't clutter the page. The most obvious solution being to ensure that the section is collapsible.
  3. Better to be consistent with everything else. I can't think of a compelling reason for treating synonyms differently to everything else; particularly given that we only do this when lists of synonyms become longer. It's far more accessible (and better meets reader expectations) to treat synonyms consistently across all pages, whether the list has 3 entries or 300; just as we do with derived terms et al.
  4. A better model is already in use. The pages for Chinese already make use of an extensive system of modularised thesaurus templates, which can be placed on each page as necessary, and update automatically as new synonyms are added (i.e. bypassing the reason for having thesaurus pages in the first place). In the case of Chinese, these are primarily used to show dialectal distribution, but there is no reason why a similar system can't be used in a more general purpose way. You can see a bunch of these in use at 條#Chinese. Note: I am not saying we should use this layout; just that the underlying system can obviously be utilised by other languages. It's also not the only possible solution, but simply an example of how we could do things better. A less radical model would be doing what we do with translations, which is to point the user to the translation section on the primary entry.

What are people's thoughts? Theknightwho (talk) 22:47, 7 November 2022 (UTC)

One use case for a thesaurus is to try to gradually navigate to the mot juste (or a word you have forgotten). I've also used this e.g. when composing cryptic crossword clues and trying to create a convincing "surface reading". In such cases it's very useful to navigate in a thesaurus-only mode: when I see a candidate, I click it, and jump to the thesaurus page for that word, and thus get closer and closer. You see the same interface in e.g. Microsoft Word thesaurus. (We don't really have enough thesaurus root words to be able to do this, yet.) Equinox 23:03, 7 November 2022 (UTC)
That is a good point. I think the main problem that we have is that our thesaurus at the moment essentially just acts like an overflow, and I don't think it's likely to change anytime soon. I also suspect that any modularized implementation would allow both formats, and for greatly expanded coverage in thesaurus-only mode as well (as any input to page sections would also benefit that). Theknightwho (talk) 23:14, 7 November 2022 (UTC)Reply[reply]
I agree that there is a lot of improvement to be made with respect to synonyms, and a dropdown template with a dozen words or so automatically added in seems like a great idea. But in my opinion, many thesaurus entries are so impractically long that they deserve their own pages. Do we really want the full list of synonyms in Thesaurus:drunk to display in the main entry?
Ioaxxere (talk) 05:39, 8 November 2022 (UTC)
We already include pages with very large numbers of derived terms (comparable to Thesaurus:drunk); it doesn't seem like too much of an issue to me. Just have a look at the derived terms on neuro-, where they're not too difficult to parse (remembering that most thesaurus entries won't have lots of similar-looking terms like that, either). Theknightwho (talk) 07:01, 8 November 2022 (UTC)
Don't forget about the Lua memory limits. Having more content on main will likely push some entries over the cliff. Could we have a mixed model, where most of the content is in Thesaurus:, but some of it could be pulled into the main space? Perhaps the most salient synonyms? – Jberkel 08:22, 8 November 2022 (UTC)Reply[reply]
Good point. I almost never remove synonyms from the mainspace. In my view, the mainspace should list some of the most common synonyms and then link to the thesaurus. By contrast, I have seen some editors remove synonyms from the mainspace, which I find not so good. --Dan Polansky (talk) 08:44, 8 November 2022 (UTC)Reply[reply]
Lua memory limits are a concern only on a comparatively tiny number of pages. While the issue is obviously there, it's important to remember that the back-end for labels alone is considerably more burdensome than synonyms are ever likely to be, given the size of the module tables involved. Theknightwho (talk) 09:46, 8 November 2022 (UTC)Reply[reply]
Yes, concerns are now only on a small number of pages, but only because content was moved *out* of pages. I think this should be the general direction to follow, moving non-essential content out of main, either to namespaces or to Wikidata. Editing large pages is already very slow right now. – Jberkel 10:29, 8 November 2022 (UTC)Reply[reply]
Including a massive list of synonyms within the entry, many of them slang or obscure, would seriously hinder writers who are just looking for a single decent word. We should provide 10-20 of the most common and useful synonyms (and a handful of antonyms) as part of a template, provided with a link to the full thesaurus page. See the layout of Google's dictionary for what I basically mean. Ioaxxere (talk) 14:28, 8 November 2022 (UTC)Reply[reply]
There are plenty of options for how we could lay things out. There is no obligation to do a massive list with no additional context. Theknightwho (talk) 15:26, 8 November 2022 (UTC)Reply[reply]
I agree. Your example with neuro- gave the impression that you would like a list of synonyms to be laid out as such, but that would of course be less than ideal. Ioaxxere (talk) 15:33, 8 November 2022 (UTC)Reply[reply]
I guess my point was just that there's plenty of precedent for having large lists in mainspace. I'd certainly prefer them to be subdivided sensibly, though. Theknightwho (talk) 15:48, 8 November 2022 (UTC)Reply[reply]
Benefits are described here:
In brief, the thesaurus helps ease maintenance of semantic lists by centralization, allows focus on a single sense or place in the semantic space, allows focus on semantic relations to the exclusion of etymology, pronunciation, etc., and provides hints where to navigate next to find other semantic lists via the "=> Thesaurus" links next to items.
The problems raised do not seem intractable or very serious. The greatest problem is the lack of interest of editors, but I don't expect using the mainspace would improve that very much. The work on the thesaurus involves hard and unique challenges that most editors are not interested in. The bulk of the English thesaurus was made by two people with serious interest in it; AdamBMorgan did a lot of work there. An entry to consider is Thesaurus:number with all its structure and rich content, not constrained to synonyms, hyponyms and meronyms. To form a better idea of what's involved and what the mentioned benefits mean in practice, one has to look at some of the more interesting complex non-synonymic entries. (As an aside, the voters in Wiktionary:Votes/pl-2017-11/Restricting Thesaurus to English thought having a separate thesaurus is a good idea.) --Dan Polansky (talk) 07:25, 8 November 2022 (UTC)Reply[reply]
Just compare how much attention derived terms get compared to the thesaurus. The difference is enormous. You haven't really explained why it needs to be in a separate namespace or presented any solutions to the (numerous) issues outlined, which are well-proven to be a problem judging by just how neglected and problematic the Thesaurus namespace currently is.
By the way, I'm going to nip in the bud any attempt to misrepresent this as being about the work involved or whether synonyms are valuable, because quite obviously I want to improve access to that, not remove it. Theknightwho (talk) 07:43, 8 November 2022 (UTC)Reply[reply]
Derived terms are entirely trivial to add and figure out, requiring no skill to talk of at all; semantic relations, which emphatically are not just true synonymy, which is relatively boring and uninspiring, are a whole different beast. I recommend the readers to read the page with the benefits articulated, and if anyone has any questions for me, please ask, and I will try to do my best. --Dan Polansky (talk) 08:01, 8 November 2022 (UTC)Reply[reply]
What relevance does any of that have to how we present information? You're also wrong, but it simply doesn't have anything to do with the topic at hand. The thesaurus pages are in a sorry state, whichever way you slice things. Theknightwho (talk) 08:17, 8 November 2022 (UTC)Reply[reply]
Let's try something different: where is the thesaurus data for Thesaurus:number to be stored? Directly in the mainspace, in number? What about Thesaurus:drunk, in drunk? Will there be templates and modules to extract the content from the mainspace entry drunk and show it in the synonym entries? --Dan Polansky (talk) 12:09, 10 November 2022 (UTC)Reply[reply]
From reading the top again, the answer seems to be templates, like in some Chinese entries. So all the people who could not figure out the thesaurus will be able and willing to do essentially the same kind of information filtering, selecting, taxonomizing and sequential ordering (e.g. Thesaurus:number), just using the template namespace and template technology? Is using templates and modules really easier for non-technical mortals, perhaps semanticists, ontologists and philosophers in general, than using the markup in use in the thesaurus? And why could the same proposed templating and module technology not be used in the thesaurus namespace? Could we thus retain the namespace but use the proposed technological change, provided the change really brings more pros than cons? --Dan Polansky (talk) 12:29, 10 November 2022 (UTC)
People seem to have no problem doing so with everything else that's done through modules. It's not that we're all too thick to work out how to use the thesaurus, if that's what you're implying. Theknightwho (talk) 12:52, 10 November 2022 (UTC)Reply[reply]
Okay, let us assume (I don't) that editing modules and templates is generally as easy and general-editor-friendly as editing the current setup of the thesaurus. How is the semantic focus to the exclusion of everything else going to be achieved, given the semantic relationships are going to be transcluded into the mainspace in some way? --Dan Polansky (talk) 13:13, 10 November 2022 (UTC)
It's possible to use module data for more than one purpose. This is trivially obvious. Theknightwho (talk) 16:34, 11 November 2022 (UTC)Reply[reply]
@Dan Polansky: I think it would have been better had you disclosed that you are the sole editor of the benefits page. By linking to a Wiktionary namespace entry where arguments are collected, people who forget to check the history are tempted to think that there is more support for your personal views than there actually is. Please don't try to make it look like something that is not the case (e.g. by writing Benefits are described here instead of "I've described the benefits here"). — Fytcha T | L | C 〉 11:47, 8 November 2022 (UTC)Reply[reply]
Fair point; I could have made it explicit that I am the sole author of the argumentation. However, the benefits are an exercise in argumentation and do not necessarily have objective factual validity, as is all too often the case with "benefits". Everyone has to form their own judgment. While there are some purely factually valid claims such as the listing of thesauri in other dictionaries, to what extent the factual claims are relevant or convincing is for the reader to determine. --Dan Polansky (talk) 11:54, 8 November 2022 (UTC)Reply[reply]
Perhaps your views would be better-suited to your personal userspace. Theknightwho (talk) 15:25, 8 November 2022 (UTC)Reply[reply]

Keep, but rename it to DanThoughts. --Vahag (talk) 11:07, 10 November 2022 (UTC)Reply[reply]

new Vector 2022: wasted space

Looks like they've just added a banner prompting people to switch to Vector 2022. I tried it and the first thing I notice is that there's a lot of wasted whitespace on the right, which seems to serve no purpose at all; in addition, the left rail got wider, which combined with the wasted space on the right means the contents in the middle are a lot narrower. I gather many people will switch skins, but all the wasted space will mean we potentially need to make things significantly more vertical and less horizontal (maybe necessary anyway for mobile devices, but otherwise non-ideal). Is there a way to recover the space with some customization settings while not switching entirely back to Vector 2010? Benwing2 (talk) 04:04, 8 November 2022 (UTC)Reply[reply]

Honestly it looks like a mobile version to me, but I'm just being grumpy. Vininn126 (talk) 10:11, 8 November 2022 (UTC)Reply[reply]
They also somehow managed to make the title non-copyable in edit mode (again)  :/ (phab:T322725) – Jberkel 10:37, 8 November 2022 (UTC)Reply[reply]
The only hope I have left after seeing that they didn't respond to well-founded criticism is that Vector 2022 is never rolled out as a default skin on en.wikt (do we have control over that?). It's patently clear that this skin was created with only Wikipedia in mind. — Fytcha T | L | C 〉 11:53, 8 November 2022 (UTC)Reply[reply]
Unchecking "Enable limited width mode" under "Skin preferences" recovers the space on the right, but the left side remains well padded. JeffDoozan (talk) 20:23, 8 November 2022 (UTC)Reply[reply]
It might be possible to change the sidebar width. The DIV class seems to be .mw-panel, but I was trying to figure it out a few days ago and for some reason it didn't work on the site, even though it did work in my HTML editor on my computer. There might be some CSS that's loading externally that's interfering with it. What I do know is that I've hidden the sidebar entirely on a private wiki where it just doesn't serve much purpose. I wouldn't want to hide the sidebar on Wiktionary, but I'd hope it is at least possible to compress it and make the font smaller, since most high-volume editors won't need it very often. Soap 19:34, 11 November 2022 (UTC)

Including lists of notable people in a field

@Dan Polansky wants to include a list of notable philosophers in our thesaurus page (history) Thesaurus:philosopher. What do other people think about this? — Fytcha T | L | C 〉 12:18, 8 November 2022 (UTC)

We do list instances in various entries in the thesaurus, and it makes sense, e.g. Thesaurus:country and Thesaurus:political party. The "instance of" relationship is well established in the thesaurus. In the discussed entry, it follows the example of Moby II and WordNet. Mainspace entries should ideally be covered in the thesaurus: e.g. the sense for Aristotle should be covered somewhere, and it naturally belongs in Thesaurus:philosopher. The choice of the notable philosophers is driven by criteria that, while arbitrary, are bound to two specific external lists, providing for maintainability. --Dan Polansky (talk) 12:26, 8 November 2022 (UTC)
I really don't understand the point of this proposal, at least as a proposal vs. an opportunity to troll. Most dictionaries do not include such lists. In particular, at least one dictionary with an affiliated encyclopedia, Merriam-Webster, does not. Why should we be duplicating content available from a sister project? At best we would have the same content WP has in a mere listing page or category. DCDuring (talk) 16:09, 10 November 2022 (UTC)
@DCDuring To understand the objections better, I have posed a list of analytical questions below. If you would be so inclined and answered some of them, that would be great. The lead question is whether all instance-of relationships are a problem or something else is a problem. As for background, I do recall your objections to our having geographic names, so I am not surprised by your opposition. I am surprised that you spend most of your time here making a vastly incomplete replica of Wikispecies, but that's your choice, not for me to judge. --Dan Polansky (talk) 16:19, 10 November 2022 (UTC)Reply[reply]
Not in the least interested in encouraging any more of this. DCDuring (talk) 18:21, 10 November 2022 (UTC)
@DCDuring We do have geographic entries as per voted policy supported by a 2/3-supermajority and we do have planets. Should Mars be removed from Thesaurus:planet since it is in instance-of relationship? And should biological taxa be removed from the mainspace since they are generally considered names of specific entities and they do not show attributive use in widely understood meaning? There is in fact no policy protection for names of taxa. Where are your principles, if any? --Dan Polansky (talk) 19:06, 10 November 2022 (UTC)Reply[reply]
I can see this as a type of hyponym, and I think it makes a certain amount of sense. I wonder if a link to a category page or something would be better. Vininn126 (talk) 12:27, 8 November 2022 (UTC)Reply[reply]
One thing is certain: listing all merely "notable" philosophers rather than "very notable" would become unwieldy. One can try to figure out where to draw the line, and include fewer notable instances. The choice I made was based on two reasonably short external lists, one a thesaurus, one a semantic network that we picked the semantic relationships from. If there is a shorter canonical list, we can consider using that one. A comprehensive list of notable philosophers should indeed be delegated to a category. However, including senses for specific philosophers in the mainspace is still a controversial issue, with no policy regulating the subject, so filling the category would be controversial. --Dan Polansky (talk) 12:39, 8 November 2022 (UTC)Reply[reply]
I see this as a can of worms: It opens us up to potentially endless content disputes over who is or isn't a philosopher, whether someone's pet philosopher is noteworthy enough etc. with no good mechanism to determine who's in the right. We could theoretically do the heavy legwork of meticulously defining a razor-sharp demarcation such that there is no dispute possible, but OTOH we could also just not include these. Providing a list of luminaries in a field is an encyclopedia's job. I also want to remind everyone that there is currently majority (though not supermajority) support in an ongoing RFD to delete such a "surname-person-sense": Wiktionary:Requests_for_deletion/English#Dickens — Fytcha T | L | C 〉 12:44, 8 November 2022 (UTC)
The thesaurus as a whole provides for potentially endless content disputes since so much of it cannot be algorithmically and deterministically regulated. Arbitrary external lists can be picked and there does not need to be any dispute. I picked two lists that do not in any way cater to my preferences but rather are "natural" picks. While Dickens is perhaps more vulnerable to poorly argued deletionist whims, Aristotle could be less so. I will also note that -ian/-ist nouns describing adherents (Platonist, Aristotelian, Marxist) are natural hyponyms of Thesaurus:philosopher, and will be probably listed anyway, and the selection problem will be the same or similar. As for the "covered by encyclopedia" argument, that alone has almost no force since a lot of dictionary content is necessarily covered by encyclopedias, and better so, e.g. names of laws, theorems and principles. Many dictionaries/networks do think this kind of content is inclusion worthy. --Dan Polansky (talk) 12:58, 8 November 2022 (UTC)Reply[reply]
I strongly advise that you read WT:What Wiktionary is not. It is becoming extremely tiresome having to relitigate all the minutiae of Wiktionary because of your constant attempts at rules lawyering. There is an obvious and material difference between terms like Marxism and the philosopher Karl Marx. We focus on the former, and leave the details of the things those terms describe to Wikipedia. Theknightwho (talk) 15:35, 8 November 2022 (UTC)Reply[reply]
My point is that the problem of selection criteria for surnames and -ian/-ist items is the same. If all -ian/-ist items are listed, they will be too many. And the list is fairly selective and interesting. Such a list is in fact not found in Wikipedia; you can try. --Dan Polansky (talk) 15:53, 8 November 2022 (UTC)Reply[reply]
Why do we need lists for surnames or -ian/-ist terms? Both of those are already covered by categories. Theknightwho (talk) 15:57, 8 November 2022 (UTC)Reply[reply]
To make the thesaurus more complete as for hyponymy. The categories do not list hyponyms of "philosopher"; they list all derivations from -ian/-ist, which will be not only such hyponyms. At the very least, one should list a few examples to remind the reader that such hyponyms exist. --Dan Polansky (talk) 16:00, 8 November 2022 (UTC)Reply[reply]
What you seem to be doing is manually creating lists that could be trivially generated from categories. Theknightwho (talk) 16:02, 8 November 2022 (UTC)Reply[reply]
A selection of very notable instances cannot be generated from categories. That holds true for all instances for which we have names in Wiktionary, whether people, countries, cities, rivers, mountains, etc. In my view, listing at least some instances adds value. WordNet agrees. --Dan Polansky (talk) 16:13, 8 November 2022 (UTC)Reply[reply]
Notable by whose judgment? Why do we care about a list of philosophers that you personally consider noteworthy? How does any of this relate to terms? Theknightwho (talk) 16:18, 8 November 2022 (UTC)
By the determination of an external list, not mine. Popper is missing on the list, a scandal. The point is that exemplification is better than no exemplification. I am fine discussing how many should be there, whether 50, 100 or 200. For rivers, I picked the longest ones, that's easier and more easily measurable than notability of philosophers. But notability of philosophers is also a fact; some are much more notable than others. --Dan Polansky (talk) 16:25, 8 November 2022 (UTC)Reply[reply]
So you've just copied a list you found somewhere else? Theknightwho (talk) 16:26, 8 November 2022 (UTC)Reply[reply]
(outdent) The list is a union of Moby Thesaurus II and WordNet for "philosopher", and precisely the union. Nothing personal. We may choose a different standard if we wish. --Dan Polansky (talk) 16:41, 8 November 2022 (UTC)Reply[reply]
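For readers who want to see what "precisely the union" of two source lists amounts to, here is a minimal sketch. The function name `union_of_lists` is hypothetical, and the two input lists are placeholders built from names mentioned in this thread; they are not the actual contents of Moby Thesaurus II or WordNet.

```python
def union_of_lists(*sources):
    """Return the sorted union of several name lists, with duplicates dropped."""
    merged = set()
    for source in sources:
        # Strip stray whitespace so "Aristotle " and "Aristotle" count once,
        # and skip entries that are empty after stripping.
        merged.update(name.strip() for name in source if name.strip())
    return sorted(merged)


# Placeholder inputs for illustration only (assumption: NOT the real
# Moby II or WordNet entries for "philosopher").
moby_like = ["Aristotle", "Socrates", "Russell"]
wordnet_like = ["Socrates", "Aristotle", "Marx"]

frozen_list = union_of_lists(moby_like, wordnet_like)
print(frozen_list)
```

Because the result depends only on the two inputs, the list stays fixed for as long as the source lists are treated as fixed.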
Oppose. Equinox 15:27, 8 November 2022 (UTC)Reply[reply]
What is the substantive argument? Consensus should be based on a combination of voting and reasoning. --Dan Polansky (talk) 15:53, 8 November 2022 (UTC)Reply[reply]
Maybe you better start another vote saying "everyone must explain their votes to Dan's satisfaction". I'm not an idiot. Equinox 09:07, 25 November 2022 (UTC)Reply[reply]
Oppose. Obviously not dictionary material. Theknightwho (talk) 15:29, 8 November 2022 (UTC)Reply[reply]
What is the substantive argument? Consensus should be based on a combination of voting and reasoning. --Dan Polansky (talk) 15:53, 8 November 2022 (UTC)Reply[reply]
Nothing "obvious" about it given WordNet disagrees and so do multiple other dictionaries that do contain biographical entries. From the point of view of an external observer with no bias, it is dictionary material in so far as it is found in dictionaries. --Dan Polansky (talk) 15:56, 8 November 2022 (UTC)Reply[reply]
It's explained in point 1 of WT:What Wiktionary is not: Wiktionary is not an encyclopedia, a genealogy database, or an atlas; that is, it is not an in-depth collection of factual information, or of data about places and people. Encyclopedic information should be placed in our sister project, Wikipedia. Wiktionary entries are about words. A Wiktionary entry should focus on matters of language and wordsmithing: spelling, pronunciation, etymology, translation, concept, usage, quotations and links to related words. Theknightwho (talk) 16:01, 8 November 2022 (UTC)Reply[reply]
That's fine; it's just the "instance-of" relationship, not "in-depth collection of factual information" and not "data about places and people". And the quoted passage is flawed in that it does not even recognize semantic relationships as valid content. --Dan Polansky (talk) 16:09, 8 November 2022 (UTC)Reply[reply]
Oppose. We are not a short-attention-span version of WP. DCDuring (talk) 16:46, 8 November 2022 (UTC)Reply[reply]
More substance please. Neither is WordNet. Exemplification is a great principle. --Dan Polansky (talk) 16:48, 8 November 2022 (UTC)Reply[reply]
You have thousands of words here. Stop demanding that people give you lengthy explanations; especially when you make absolutely no effort to come to a common understanding with other users anyway. Theknightwho (talk) 18:47, 8 November 2022 (UTC)Reply[reply]
Clarifying my previous comment - ultimately oppose. At most, list a category or something; it's not lexical. Vininn126 (talk) 15:36, 8 November 2022 (UTC)
Oppose as well, it's not lexical as Vininn stated and amounts to pure taxonomising of things rather than words. —Al-Muqanna المقنع (talk) 16:04, 8 November 2022 (UTC)Reply[reply]
What does it mean "it's not lexical"? Is the "instance of" relationship a problem, or just notable people? Can countries be listed in Thesaurus:country? --Dan Polansky (talk) 16:06, 8 November 2022 (UTC)Reply[reply]
Individuals aren't lexical, that's an axiom. Vininn126 (talk) 16:11, 8 November 2022 (UTC)Reply[reply]
Names of individual entities (people, rivers, etc.) are words, unless they are multi-word names and thus are "lexical" and even multi-word names are as lexical as phrases. "Instance of" is a semantic relationship, as per WordNet and common sense. --Dan Polansky (talk) 16:16, 8 November 2022 (UTC)Reply[reply]
And categories of specific rivers are also not "lexical"? Should they therefore be deleted as encyclopedic? --Dan Polansky (talk) 16:17, 8 November 2022 (UTC)Reply[reply]
Again: why are you duplicating the function of categories, while also removing a load of info? It just means either more maintenance work, or yet another thing that will become neglected; like much of the thesaurus already is. Theknightwho (talk) 16:25, 8 November 2022 (UTC)Reply[reply]
What am I "removing"? I don't recall removing anything. --Dan Polansky (talk) 16:34, 8 November 2022 (UTC)Reply[reply]
Re-read what I wrote. Theknightwho (talk) 16:42, 8 November 2022 (UTC)Reply[reply]
Nothing to see there. I am not removing anything; I am doing exemplification on the model of Moby II and WordNet. WordNet is an amazing role model, absolutely astounding, regardless of the flaws that it necessarily has. There is no maintenance problem: the list is frozen as a union of Moby II and WordNet. --Dan Polansky (talk) 16:46, 8 November 2022 (UTC)Reply[reply]
You're unbelievable. I obviously meant that you are including a cut-down version of a list that we already have. Stop finding every excuse to miss the point. Theknightwho (talk) 18:48, 8 November 2022 (UTC)Reply[reply]
Yes, exemplification means that not the complete list is included. What you are saying is "exemplification is bad", without saying why it is bad. To my mind, complete lists are uninteresting: I have no interest to look at a comprehensive list of 10,000 philosophers, most of which will not ring any bell in my mind. The same for rivers: I would rather be reminded of some notable instances than of the first 200 items of a comprehensive list where the only claim the items make for themselves is that they lead the alphabet. If I wanted a complete list of rivers, I could go to Wikipedia anyway, or make a Wikidata query; I don't need a dictionary for that. --Dan Polansky (talk) 18:58, 8 November 2022 (UTC)Reply[reply]
There have been arguments for that! Proper nouns are inherently different. Keeping them for etymologies and other information is one reason to keep them, but they are inherently different from common nouns. Vininn126 (talk) 16:28, 8 November 2022 (UTC)Reply[reply]
Are you now saying that "Amazon" is not a word? I placed a source to Appendix:Wordhood claiming otherwise, although I find the claim that it is not a word an absurdity. --Dan Polansky (talk) 16:34, 8 November 2022 (UTC)Reply[reply]
I did not say that! I said they are different. Please stick to the words that I use! Vininn126 (talk) 16:37, 8 November 2022 (UTC)Reply[reply]
Fine, just answer "No" and we move on. We now have that "Amazon" is a word. Now, is the "instance of" relationship between "Amazon" and "river" a relationship that is "lexical"? --Dan Polansky (talk) 16:39, 8 November 2022 (UTC)Reply[reply]
Amazon IS a river specifically, but it's ONE river, which is a different relationship than a TYPE of river which can refer to many instances of it. THAT is lexical. Vininn126 (talk) 16:46, 8 November 2022 (UTC)Reply[reply]
But surely "instance of" is the relationship between the meaning of words "Amazon" and "River" and therefore is "lexical" (of or pertaining to words)? Why would hyponymy be lexical and "instance of" not given both are relationships between word meanings? --Dan Polansky (talk) 16:51, 8 November 2022 (UTC)Reply[reply]
These are inherently different kinds of instances. This is a singular instance, one-of-a-kind, inherently by definition. Other instances, countable or uncountable, still refer to something that can be or be shared with multiple entities - which is why proper nouns are different from non-proper nouns and why the relationship is not lexical. If the Amazon belonged to a category of rivers that behaved differently than other categories of rivers, whatever word we used to describe that category would be lexical. Vininn126 (talk) 16:57, 8 November 2022 (UTC)
(outdent) There is no doubt "hyponymy" and "instance of" are different relationships, as recognized by Wikidata, although WordNet confuses the two a bit. But what does it have to do with the word "lexical"? What is the definition of the word "lexical" other than "of or relating to words"? And what is the business with the word "lexical" anyway? The words "Amazon" and "river" are semantically connected, and the thesaurus relationships are semantic relationships; the word "lexical" is not used for the purpose. And why are categories allowed to do something that the thesaurus is not? --Dan Polansky (talk) 17:02, 8 November 2022 (UTC)Reply[reply]
Inclined to oppose. The example of political parties is shaky as I'm sure that many of those could be subject to RFD themselves (some have already been deleted), but the example of countries isn't relevant because those are explicitly allowed by CFI, whereas philosophers are not. A lot of the entries linked don't even have an entry for the philosopher mentioned, and a few of the ones that do don't feel super notable and could be subject to RFV/RFD. AG202 (talk) 16:51, 8 November 2022 (UTC)Reply[reply]
Now that's a different line of reasoning. It would mean listing countries in Thesaurus:country would be okay because the names themselves as countries are guaranteed to be included. For philosophers, I would argue that their names are going to be included in some form, e.g. as Aristotle or as Russell, so they will mostly be bluelinks, and where they would be redlinks, {{ws}} enables saying link= to disable linking.
I believe among the large set of all Wikipedia-notable philosophers, those relatively few listed are likely to be very notable, given the selection made by the authors of WordNet and Moby II. A different list of notable philosophers could be chosen; I am fine with that. --Dan Polansky (talk) 16:57, 8 November 2022 (UTC)Reply[reply]
No. Since when are dictionaries to present the instance level? There may be a “philosophy dictionary” doing this, but only inasmuch as it is not a dictionary but misnamed. You also have a link to a list of philosophers on Wikipedia, which has better personnel for the same job.
Your attempts to deduce arguments from assumptions that you put in our mouths but we have not mentioned are all beside the point and a waste. You are the only one dropping the term “word” in this thread. WT:CFI saying “all words in all languages” is not specific enough to demand inclusion of anything that you deem a word. And still we do not ascribe value to “notability” in an absolute sense—the instance may be as notable as it can be, in the context of this project it won’t be as much. Fay Freak (talk) 17:05, 8 November 2022 (UTC)Reply[reply]
Ever heard of WordNet? And Moby II thesaurus? Both lexicographical works? I made neither of them. Ever heard of Wiktionary topical categories for specific rivers? Not dictionary content? As for "word", I did not introduce the word "lexical" into the discussion and "lexical" means "of or relating to words". The point is not really notability but exemplification, and to achieve exemplification, one needs to make some arbitrary cut off or choice, which for rivers may be length and for philosophers may be notability. --Dan Polansky (talk) 17:13, 8 November 2022 (UTC)Reply[reply]
It seems like you are the only one who esteems it necessary to cut off and around the dictionary arbitrarily. The other editors here work on some kind of system, which you seek every opportunity to deny by introducing anything to estrange em, though its foreignness to this place be immediately discernible, supported by the observation that few have much ambition to formulate rules—but this is your personal guiding theme, others just want to write a dictionary and not a philosophy of dictionaries, which they aren’t at a loss about. Fay Freak (talk) 17:27, 8 November 2022 (UTC)Reply[reply]
Well, it is not reasonable to list all rivers as instances in Thesaurus:river, so if examples are to be given on the model of WordNet and Moby II, some selection has to take place, some arbitrary cut off. I don't understand what the above is all about. How does that relate to anything that I have said above? How does my editing of the thesaurus interfere with anything that others are doing? How does it impact "work on some kind of system"? --Dan Polansky (talk) 17:48, 8 November 2022 (UTC)Reply[reply]
Why do you want to manually duplicate what we can already do with categories? And why do you want to do so in a way that requires an "arbitrary cut off"? These are arguments against your approach, because they make it very clear that there is no underlying principle here other than what you've decided to hyperfocus on today. Theknightwho (talk) 18:51, 8 November 2022 (UTC)Reply[reply]
To exemplify, as I write above, not just philosophers but rivers, countries, mountains, mountain ranges, etc. I want to follow WordNet's wisdom. All I hear is "exemplification is bad", with no argument to support that notion. Exemplification is not duplication of a comprehensive list, by definition. --Dan Polansky (talk) 19:04, 8 November 2022 (UTC)Reply[reply]

This is a lost cause, but let me make the point that the thesaurus is a word finder. To find the name of a very notable philosopher is to find a word, by starting with another related word, here "philosopher". Why make the word finding function less rich? Sure, other sources such as WordNet already do the job and are one click away, but why make the "word finder" less rich in its "word finding" capacity? It has enough space on the page. --Dan Polansky (talk) 20:40, 8 November 2022 (UTC)Reply[reply]

The list provided a tool to navigate from philosophers to the derived adjectives: you click on the name to get to the mainspace and there you see the derived adjective. For that, the philosopher does not necessarily need to have a sense in the mainspace, only an entry for the name. Thus, one can answer the question: what notable philosophers have an adjective derived from them? Without the list, there is no way to do that in Wiktionary. A category of philosophers would only be there to serve the purpose if they all had senses in the mainspace, which is controversial; the thesaurus can work without that. --Dan Polansky (talk) 21:05, 8 November 2022 (UTC)Reply[reply]

DuckDuckGo and Yandex and Twitter’s and Reddit’s search functions are also word finders, most useful to find usage and discussions of terms; doesn’t mean Wiktionarians should build a search engine to be accessed by the Thesaurus namespace. It’s a word finder but not for all kinds of words and only to find these limited kinds of “words” in a specific fashion. You are not making a point but a petitio principii the whole time, defining things as what you want them to be.
The list was a kludge, like an improvised explosive device. But you don’t have any mission to take on this site, repurpose the tools offered here to achieve your objectives, as you can just enter other sites and employ their devices. You are acting as though there were a frontline that, to extend your influence sphere, you would have to break by any argument imaginable, not being able to desert from your position, but actually we have to cooperatively restrict ourselves for a concentrated and coordinated effort to allocate scanty manhours, which are diluted if there is no prospect of contours in the resulting work. Fay Freak (talk) 21:41, 8 November 2022 (UTC)Reply[reply]
I'm about 85% sure that I agree with you, but I have to admit that I did get lost in your second paragraph. Theknightwho (talk) 21:47, 8 November 2022 (UTC)Reply[reply]
What nonsense. The semantic relations employed by Wiktionary and the thesaurus are modeled on WordNet, and I am merely following WordNet's lead, making my own thoughts along the way and finding that I like the result, which is still in the revision history. There is no "repurposing" of the tool: there is use of the tool as designed by the tool maker WordNet. The above is pure rhetoric full of buzzwords and figures of speech while making no substantive argument. The claim that I am doing it "my" way is absurd since I am doing it the WordNet and Moby II way, and without them I would perhaps not even have come up with the idea that we should list a considerable number of instances. I tried to do what they are doing, in geographic entries already before the philosopher entry, and I find it cool. By contrast, it is the opposition that is doing it "their" way by disregarding practice in external sources that serve as inspiration. It is all the more curious given the opposition does not spend any resources on the thesaurus and has made derogatory remarks about it. --Dan Polansky (talk) 22:09, 8 November 2022 (UTC)
This is an obvious no; this is what w:Category:Philosophers is for and does. - -sche (discuss) 09:52, 9 November 2022 (UTC)

In the spirit of Sisyphus, I will try to understand the problems raised or implied. Questions:

  • Is the problem with instance-of relationship? If so, planets have to go from Thesaurus:planet and countries have to go from Thesaurus:country.
  • Is the problem with cut-off on the number of instances covered? If so, specific rivers have to go from Thesaurus:river: it is not practical to list all the rivers and only a sample can be given.
  • Is the problem specifically with humans? If so, why? Why are specific humans more encyclopedic than specific rivers?
  • Is the problem with poor measurability of notability? If so, rivers could be kept in Thesaurus:river, but something would have to be done about individuals in Thesaurus:philosopher. Could we perhaps include philosophers whose names are used figuratively, as in "he is no Socrates"? We could thus exemplify without relying on notability.
  • Is the problem with including items that have no sense in the mainspace? If so, I could modify the list to include only such items: "items from Moby II and WordNet that are covered by mainspace" or "only items covered by mainspace".
  • Is the problem with duplicating Wikipedia? If so, why should we have Category:en:Rivers and why should its category structure involve the encyclopedic CAT:en:Rivers in the United States and CAT:en:Rivers in Alabama, USA?

I pledge to avoid responding to individuals who have shown to produce unproductive arguments to prevent derailing the discussion. There are some individuals who have produced interesting and relevant thought and I would like to hear from them. --Dan Polansky (talk) 09:42, 10 November 2022 (UTC)Reply[reply]

I would personally be fine with, and encourage, removing the existing instances from rivers, philosophers, countries, and planets. When I say it is non-lexical, I mean that it is taxonomising referents rather than words. That is unavoidable to some extent when mapping semantic relationships, but instance-of relationships are entirely about the referents and shed virtually no light on words. Thesaurus:country, for example, would be much more useful to my mind if it had a more detailed list of terms related to countries, rather than the vast majority of the entry being a mechanical list of existing sovereign states. —Al-Muqanna المقنع (talk) 11:23, 10 November 2022 (UTC)Reply[reply]
If you remove planets, the thesaurus will show no connection between Thesaurus:planet and Thesaurus:Earth and no connection between Thesaurus:country and Thesaurus:United States of America. I don't see how that disconnection can be desirable. Thesaurus:country listing countries does not prevent it from listing other terms. Granted, the country entry lists quite many instances, but if it is to connect thesaurus entries that are semantically related, it has to do it, or have a separate thesaurus entry just for the purpose, e.g. Thesaurus:country/instances. --Dan Polansky (talk) 11:52, 10 November 2022 (UTC)Reply[reply]
I think our readers are smart enough to understand that absence on a Thesaurus page does not mean absence of any connection whatsoever. It can be replaced by a see also link to a category, which also prevents having to manually edit the information in multiple places. —Al-Muqanna المقنع (talk) 11:55, 10 November 2022 (UTC)Reply[reply]
"any connection whatsoever" is a red herring and not under discussion, e.g. phonological connections. As for "referents rather than words", semantic relations are done via relationships between referents; for instance, hyponymy is for subset relationship on referents and meronymy is on part-of relationship of referents. Thus, we have meronymy in Thesaurus:Brazil that connects the referent of Brazil to the referent of Mato Grosso. --Dan Polansky (talk) 12:01, 10 November 2022 (UTC)Reply[reply]
I agree that phonological connections are a red herring and not under discussion, and if people intuitively understand that exclusion from a thesaurus page doesn't exclude phonological connections, I'm sure they can also be trusted to understand that it doesn't exclude non-lexical instance-of relationships. These various examples are not particularly impressive to me; I don't think we gain anything from using the Thesaurus namespace to detail everything that's e.g. located within a country and if that is all it's being used for we could probably do without it entirely. —Al-Muqanna المقنع (talk) 12:32, 10 November 2022 (UTC)Reply[reply]
Okay, hyponymy is a subset relationship on referents, how about that? --Dan Polansky (talk) 12:41, 10 November 2022 (UTC)Reply[reply]

Collapsing the table of contents to only show language names

For many entries, it is quite difficult to find the language one is interested in due to the table of contents being excessively long. Take for example this page and compare it to the same page on the Spanish Wiktionary.

The Spanish Wiktionary has a nice solution: The table of contents is collapsed for all languages except the dictionary's main language (Spanish). I want to propose that we do the same.

This proposal is different from the last discussion in that section names are only collapsed, not removed, and the section names of the English entry would be shown.

(On a related note: Does it not annoy others that the sections on the mobile version aren't collapsed, or that we don't have a table of contents there? It takes forever to scroll down to the section one is interested in.)

--Hvergi (talk) 09:51, 9 November 2022 (UTC)Reply[reply]

This seems pretty reasonable, with the caveat that we should keep the table of contents floating to the right instead of making a big block that forces all of the actual definitions in the entry way down the page. —Justin (koavf)TCM 10:04, 9 November 2022 (UTC)Reply[reply]
  •   Support — excarnateSojourner (talk · contrib) 15:45, 17 November 2022 (UTC)Reply[reply]
  • It would be useful to have some kind of count of the number of entries with multiple L2 sections, possibly differentiating lemma from non-lemma L2s.
I don't see benefit to this if there is only one language in the entry, or even two or three. The benefit seems to arise only for the relatively small proportion of entries that have a large number of L2 sections. I also don't see the benefit for Translingual items, whether they be for symbols, CJKV or other characters, or taxonomic names.
Is there a way to address the problem in the case of entries with large numbers of L2 sections without diminishing the value of the ToC in cases where it poses no problem? DCDuring (talk) 16:18, 17 November 2022 (UTC)Reply[reply]
Honestly I wonder if mobile UI designers in general have been having a laugh at us for fifteen years, as the lack of scrollbars on any mobile browser that I'm aware of has been forcing us to flick, flick, flick our way through pages all this time, and it seems like such an easily solvable problem since there are quite often scrollbars in other areas of the mobile interface such as (on Android at least) the list of installed apps. The problem you mention would be more annoying to me if I wasn't dealing with the same thing on every other site already. Thanks for bringing it up, though. Soap 00:19, 25 November 2022 (UTC)

I was confused. You mean Vector legacy 2010 without Tabbed Languages enabled. With new Vector 2022, the TOC in the sidebar is collapsed for pages with more than 20 sections. --Vriullop (talk) 13:57, 21 November 2022 (UTC)Reply[reply]

20 sections?? I have the misfortune of studying two languages that frequently coincide with other languages in their linguistic family and are last or near-last in the alphabetic list for that family. Namely, Portuguese—which comes after Catalan, Galician, Ligurian, Old (Catalan, Galician, Ligurian), Old French, and Old Portuguese (boy, makes me feel for students of Spanish!)—and Ukrainian, which comes last in the Slavic family and almost dead-last in Cyrillic languages in general.
It’s excruciating when there are four fully-fleshed-out entries (including etymologies, declension tables, quotes, and related words). (I get annoyed when entries in Japanese are buried beneath “Translingual” and “Chinese”—but that’s just being curmudgeonly.) Why was the number 20 chosen, and can it be made a user-modifiable variable?
(I’ve found, btw, that currently it is much faster to collapse the earlier sections than it is to scroll them.) TreyHarris (talk) 19:55, 30 November 2022 (UTC)

Plato and whether concrete persons are subsenses of name senses

I converted the "Greek philosopher" sense of Plato to a subsense of the given name sense twice and was reverted by @Dan Polansky both times. I pointed to Trump, Clinton and Hitler for analogous cases, though there are admittedly also entries like Stalin where the person sense is on the same line as the name sense and entries like Plato, Aristotle and Socrates (before I changed them) where the persons and names are on different lines entirely. Dan then demanded that I provide evidence up to an impossible-to-meet standard (evidence that this is the "dominant" practice which is apparently more stringent than "many but far from all entries"), which is why I elected to instead bring attention to it in the BP (again).

I would be in favor of disallowing all person senses on pages where the set of page title words is a (potentially improper) subset of the set of name words of that person. As an example, Donald Trump should not be a permissible sense for either of the pages Donald, Trump or Donald Trump but it is okay for something entirely different such as Cheetolini. I don't think there's currently super-majority support for this so if we must include these person senses, we should at least include them in some way subordinate to the name senses (i.e. either as a subsense (which I prefer) or as is done in Stalin but certainly not as a separate sense) because that's what they are: The set of referents of the name sense of Plato is any person called Plato, which thus makes the philosopher sense merely a restriction of that, a subset of the set of referents, hence a subsense. — Fytcha T | L | C 〉 21:27, 10 November 2022 (UTC)Reply[reply]

I favour doing exactly what you did in the first place.
On a related note, I think we should introduce something similar to WP:POINT (if we don't have it already). I think we are all getting sick of having Wiktionary held hostage at this point. Theknightwho (talk) 22:03, 10 November 2022 (UTC)Reply[reply]
  • The reason I reverted is in the edit summary: "restore the philosopher as the main sense following a long-term tradition: this is the primary activated semantic node under the symbol out of context; Plato the philosopher is extraordinarily notorious". I did so because I found the new format ugly and stupid, which of course is subjective. The nesting indentation helps nothing from usability perspective. Some editors started to change that practice, so it is now inconsistent. The objective of your edit seems to be to doubly demote the primary semantic node by changing it to the 2nd place and indenting it at the same time; and yet, the only translation table in Plato is for the philosopher. If there is consensus for a change, fine, let's find what the consensus is and make it a policy, issue closed. And since it seems to be a matter of preference and not of factually correct or incorrect, I think 60% should be a pass in this case; we should not be deadlocked on such issues only because we require the high standard of 2/3-supermajority and then let people fight the issues by back-and-forth in the mainspace. As an aside, having a dedicated sense in Trump for the president is user-friendly: if the user asks "what are the nicknames for Donald Trump", it is most straightforward to search for them in Trump entry, and there they are. It would be better if the president sense were not nested and indented, though; now it looks ugly and stupid. --Dan Polansky (talk) 07:21, 11 November 2022 (UTC)Reply[reply]
    The issue is, I'm providing rigorous arguments for why something is a subsense, whereas you're just talking about your feelings and completely irrelevant things like translation tables. From your reply I take it that you have nothing to object to the actual logic of my argument which reduces your objection to "I acknowledge that subsenses are used correctly here but I object to their correct usage anyhow because I dislike them for subjective reasons." Is this an accurate characterization of your position? Also, judging off of WT:Subsenses and the linked discussion WT:Beer_parlour/2015/May#ELE:_explicitly_ban_nested_subdefinitions/subsenses?_Or_allow_in_rare_cases?, it seems like there is good consensus to not only keep them but to employ them more often. And while I personally don't care about WT:LEMMING much, I know that you do and I want to point out the fact that the majority of monolingual dictionaries (that I use) make frequent use of subsenses. — Fytcha T | L | C 〉 12:09, 11 November 2022 (UTC)Reply[reply]
    The contrast is between "we should be using subsensing more often" vs. "whenever there is arguably a subsense relationship, we should indicate it by indenting and nesting even if there are only two or three sense lines and even if the subsense has priority in the sense activation list over the broader sense." Maybe there is consensus for the latter as well, I don't know. As you see from the thread title, it asks whether we should "allow in rare cases", whereas some people seem to think it should be done nearly always when possible in principle. As for lemmings, I know of no lemming that has a Plato entry done the way you propose. As for subjectivity, there is an element of subjectivity but also objectivity: I believe the notion that the philosopher sense leads the activation list is very likely to be correct. What is subjective is the assessment of what takes priority, whether semantic relations (hyponymy and the like) or activation frequency relations and usability. --Dan Polansky (talk) 13:37, 11 November 2022 (UTC)
    On another related note, is anyone else getting sick of Dan asserting (without evidence) that whatever he prefers is always the status quo, and that it’s up to other people to overturn it? How about he accepts the burden of proof for once, given he has provided absolutely no evidence for that. As far as I can tell, it’s just a rhetorical tactic to stack the deck in his favour in every discussion. Theknightwho (talk) 15:09, 11 November 2022 (UTC)Reply[reply]
His extremely long filibusters make discussions hard to follow. Equinox 15:29, 11 November 2022 (UTC)
I don't think Plato-person is a subsense of Plato-name, because a person is not a name. Rather, Plato-person has an instance of Plato-name; or his name (but not he himself) is an example of the name. In object-oriented programming you would never derive Person from Name. (My preference with specific people like Plato and Einstein is to put their Wikipedia links in the "See also" section, and only include them at all if they are the overwhelmingly commonest known person of that name.) Equinox 15:30, 11 November 2022 (UTC)Reply[reply]
This gets into some quite tedious (literal) semantics but it's worth noting that our name senses don't (generally) define the word as referring to a name, they tend to be non-gloss definitions explaining that the word is used as a name (for people). In that case I believe it's fair to talk about instances being subsenses, though it's ultimately really a presentational issue and I don't really mind either way. —Al-Muqanna المقنع (talk) 17:54, 11 November 2022 (UTC)Reply[reply]
I don't care how tedious I am, if I'm right. ("Bureaucrat Conrad, you are technically correct. The best kind of correct!") It's definitely a strange question, and actually opens up whole cans of worms: e.g. if Smith is a name, but we have a plural Smiths, then what is it a plural of? Two Smiths are two people, not two names, but we wouldn't define Smith as a person (unless it was Einstein, haha!), and then even if we did define it as a person, then the plural would usually be two of any people with the name, and not two of the defined person. I can see how this seems like boring semantic dancing, but I think it's actually a strong indicator of why a dictionary, defining words, should not get into questions of individual personalities. Equinox 05:04, 12 November 2022 (UTC)Reply[reply]
On the specific issue, like Al-Muqanna, I'm not really bothered by either presentation.
On the broader issue, Dan has been an obstructionist for as long as he's been here, also years before his hiatus. I'm loath to block a long-time editor who does also do some good work, but I do think we have a "one disruptive editor" problem more than a "we need new rules about POINTing" problem — no shade to TKW, rules can help in general or future cases, but rules can also be gamed and part of this user's MO is rules-lawyering, so at some point a community has to exercise discretion and block people who are not participating in the collaboratively-building-a-dictionary part of working together to build a dictionary. Seeing how many other people are also fed up with his obstructionism and filibustering independent of their feelings on the specific issues like this, I have bitten the bullet and blocked him, and repeat my block summary here in case the length gets cut off in the block log: "persistent, years-long history of disruptive editing and obstructionism; in particular, I highlight as w:WP:DE does that disruptive editing need not be "intentional. Editors may be accidentally disruptive because they [...] lack the social skills or competence necessary to work collaboratively. That the disruption occurs in good faith does not change that it is harmful"."
- -sche (discuss) 19:12, 11 November 2022 (UTC)Reply[reply]
Further discussion at User talk:-sche#Dan_block. - -sche (discuss) 00:24, 12 November 2022 (UTC)
I want to thank @-sche for their courage in this decision; I don't see myself as ever having the courage to permanently block a long-time editor. It goes without saying that I am saddened that we miss out on the good work Dan could have done in the future for this project but I also agree that the status quo would have been untenable in the long term. I want it to be known that I, being a lover of second chances, would be in favor of unblocking him, provided that he abstains from participating in Wiktionary policy making, edit wars and the like (I don't want to provide a comprehensive list here because I'm sure Dan is smart enough to figure out by himself which kinds of edits are fine and which ones aren't). — Fytcha T | L | C 〉 01:05, 12 November 2022 (UTC)

Is Proto-Norse a dialect of Proto-Germanic

The differences between Proto-Norse and Proto-Germanic are pretty small. Would it perhaps be a good idea to treat Proto-Norse as a late dialect of Proto-Germanic, much like we did for Frankish? -- {{victar|talk}} 02:46, 12 November 2022 (UTC)Reply[reply]

@Mårtensås as the main (only?) editor of Proto-Norse. Thadh (talk) 00:44, 13 November 2022 (UTC)Reply[reply]
Proto-Norse is an attested language; we have several hundred words in it. Now, the earliest Proto-Norse is so close to Proto-Germanic that it might better be classified as Proto-North-West Germanic (the common ancestor of North and West Germanic); Elmer Antonsen argues for this, and he is right in that it does not show any specifically North Germanic innovations, only common ones like *ē > ā, *-ai > ē, *-ō > -u. This would also solve the issue of a word like ᚱᚨᛇᚺᚨᚾ, which as it is is classified as Proto-Norse, even though it might just as well be "Proto-English", the two languages being almost identical at this time.
We further have certain innovations common to Anglo-Frisian and North Germanic, but not shared with the more southern West Germanic languages, such as the 3rd person plural present indicative *eʀun[1] or 2nd person plural pres. ind. *eʀt[2]. Another one would be the collapse of the n-stem oblique conjugation into that of the accusative, as we already see in the genitive raihan above: Old English: -a, -an, -an, -an, Old Norse: -i, -a, -a, -a[3], Old High German: -o, -on, -en, -en. ᛙᛆᚱᛐᛁᚿᛌᛆᛌProto-NorsingAsk me anything 11:19, 13 November 2022 (UTC)Reply[reply]
All the pings: @Mahagaja, Rua, Mnemosientje, Mulder1982 --{{victar|talk}} 08:51, 13 November 2022 (UTC)Reply[reply]
  1. ^ Old English: earon, Old Norse: eru, Old High German: sind
  2. ^ Old English: eart, Old Norse: ert, Old High German: bist
  3. ^ -n has been lost word-finally, but survives in early inscriptions which show that the change is at least from the 400s.

Japanese verb conjugation template discussion Part 2

User:Huhu9001 has created a new conjugation template here. I think it's fine, but what do you think of it? (Memory issues, etc?) Dennis Dartman (talk) 16:14, 12 November 2022 (UTC)Reply[reply]

I really like the fact that forms and polarity are now split by axis but, due to the (IMO useless) furigana and romaji that are now presented vertically instead of horizontally, I still prefer the current table. The table in Module:User:Huhu9001/000/documentation is 1249 pixels high, the one in 泳ぐ#Conjugation only 568. — Fytcha T | L | C 〉 16:36, 12 November 2022 (UTC)Reply[reply]
Remember that on mobile, screen width is at a premium. Theknightwho (talk) 16:42, 12 November 2022 (UTC)
Well, tell that to the creator and supporters of Template:sw-conj! Dennis Dartman (talk) 18:44, 12 November 2022 (UTC)
I'm not sure how that's relevant, really. There are obviously extremes at both ends of the spectrum. Theknightwho (talk) 23:49, 15 November 2022 (UTC)
I like the improved completeness and precision of this new one for sure. It specifies clearly all the "principal parts", but I don't like that they aren't written in Japanese script anymore. It was nice to see hyperlinked 未然形, 終止形, etc., which are now missing. Also, the boxes, as we've established, are quite tall, and the comparatively thinner boxes of the old template were more aesthetically pleasing in my eyes. They wasted less space in conveying the same information.
Nevertheless, a huge bonus of the new one is the extra forms it has that the old one doesn't: distinguished "adverbial" forms, optative and presumptive, all the polite forms, etc.
In comparison, the old template was much more terse, so it wasn't so good a reference for all the possible forms the verb could actually take. In the wild, I'd seen the presumptive form used in writing, but Wiktionary never had anything to say about it in the conjugation template, so I only now got to understand it a little better. Kiril kovachev (talk) 22:52, 12 November 2022 (UTC)Reply[reply]
I love it. I find it a bit odd to have the names of the "principal parts" without -kei (I don't think you ever use those words without -kei in Japanese when referring to the grammatical form). I'm not too bothered by having the actual Japanese words in kanji. I don't think principal parts is the term we want there though. It works with Latin, Greek, etc., but it's not really the same thing in JP. If we were actually giving "principal parts", we would (1) not repeat the identical forms (oyogu x2, oyoge x2), and (2) we would give "oyoi-" and "oyogo-" too. I would stick to something more traditional, like "bases" or "stems". (Well, if it was up to me I wouldn't even include them, since they were created to describe Classical Japanese and they are not really functional/useful when talking about Modern Japanese; on the contrary, they only create confusion, but that's another question.) — Sartma 𒁾𒁉𒊭 𒌑𒊑𒀉𒁲 16:05, 15 November 2022 (UTC)
  • On the whole, I think this represents a number of improvements. This new format fills some important gaps in our older infrastructure, such as including polite forms.
  • I agree with @Kiril kovachev that the new format leaves out the linked Japanese grammatical terms that would go in the header under "Principal Parts", and with @Sartma that the final (-kei) should be added on the end of those terms, but these are minor issues and easy to fix. Perhaps something like "Verb Stem Forms" might be a better label than "Principal Parts". I do see value in including the verb stems under the labels, as these are still relevant in modern Japanese grammar. Where the stem forms are identical, we could merge the table cells -- easy enough, since the identical forms always occur between the adjacent 終止形 (shūshikei, terminal or predicate form) and the 連体形 (rentaikei, attributive form) (both ending in -u), and between the adjacent 已然形 (izenkei, realis form, for Classical) / 仮定形 (kateikei, hypothetical form, for modern) and the 命令形 (meireikei, imperative form) (both ending in -e for so-called quintigrade verbs, but distinct for other verb classes).
  • Minor note: "Reitai" is a typo for "Rentai", cf. 連体形 (rentaikei, attributive form).
  • I think a couple of the English-language labels on the left might be problematic. "Subjunctive" doesn't quite capture how ~たら (-tara) or ~ば (-ba) function, for instance: both can express "if" conditions, with ~ば expressing more of a prerequisite causal relationship than ~たら. I am more accustomed to seeing these described as "Conditional". By way of example, a subjunctive statement such as "it would have been better if I had gone" can be expressed in Japanese without using either ~たら or ~ば: 行ったほうがよかったのに (itta hō ga yokatta no ni). Meanwhile, the ~て (-te) forms are more conjunctives than adverbs: compare 眩しく明るい (mabushiku akarui, blindingly bright) and 眩しくて明るい (mabushikute akarui, blinding and bright).
  • In terms of usability, I am concerned about the use of gray text on a gray background for the Japanese terms in the table -- lower contrast is not ideal for accessibility reasons, and some of our readers will struggle to see this clearly. Black text on a white background would work better.
  • I also agree with @Fytcha that the furigana (the smaller characters above the kanji, provided as a phonetic guide) are not terribly useful here -- anyone who can read the hiragana used for the furigana can already understand how the okurigana break down, so the correlation with the romanized text is obvious. And anyone who cannot read even the hiragana would need the romanized text, and would have no use for the furigana. I would propose to remove the furigana, and thereby save some space in the layout. The furigana features require Module:ja-ruby, so cutting out this dependency might also save a bit on Lua memory.
These minor changes aside, I think we would be well served to try this out with the different verb types (quintigrade, -i and -e monograde; also the different ending morae for the quintigrade, such as ~ぐ -gu vs. ~く -ku, etc.), and ideally use this to replace our existing Japanese verb conjugation table templates. Props to @Huhu9001 for tackling this. ‑‑ Eiríkr Útlendi │Tala við mig 19:38, 15 November 2022 (UTC)Reply[reply]

I insist on my terminology.

Principal parts:
  • This name most accurately describes the role of mizenkei, etc. in Japanese grammar. They were inventions of traditional grammarians, as a mnemonic tool to help remember the whole inflection, despite language change in Modern Japanese rendering them flawed. Traditional dictionaries would give them as clues for other inflected forms, should they give any. They can be ignored if you have better ways to memorize.
  • "verb stem forms" is a particularly uninformative name. It could not be more obvious that this term is a "verb", and that everything in the table is a "form". Meanwhile the word "stem" is overused. In the headline we already have a "stem" which means ren'yokei, and sometimes we talk about "consonant or vowel stems" which is actually something like せ for する. Here we are not to add another six "stems" as this word is becoming almost meaningless.
  • Calling them "conditional" is simply a misunderstanding of what the "conditional mood" means. In an if-sentence, the conditional mood is the one in the main clause. The mood in the if-clause is called "subjunctive". Japanese -ba and -tara are obviously in the if-clause, not the main clause.
  • The example 行ったほうがよかった only shows that -ba/-tara are paraphrasable. That's an unrelated topic.
  • While some -te forms translate into English "(verb) and (verb)", not all of them do. -te has a variety of meanings other than "(verb) and (verb)" which names like "conjunctive" would obscure. In any case -te forms are, even when they do translate into "(verb) and (verb)", syntactically adverbial phrases, making "adverbial" the most proper name.
  • "Conjunctive" is easily conflated with the "conjunctive mood", something totally different. -- Huhu9001 (talk) 03:11, 18 November 2022 (UTC)Reply[reply]
@Huhu9001: Principal parts are something very precise. Despite their name, principal parts are not just "parts", but fully formed/inflected words. That's why I'm saying this definition doesn't apply in the case of the 活用形 (katsuyōkei). The mizenkei of iku (to go) is ika-, and it's not a fully formed/inflected word. The fully formed word would be ikanai or ikō. Actual principal parts for iku would be, for instance, iku, ikimasu, ikanai, ikeba, ikō, itta, since remembering those forms would allow you to derive all other inflections.
So no, "principal parts" is not the name that most accurately describes the role of Japanese katsuyōkei. On the very contrary. If you stick to it, you'll just be misusing it. — Sartma 𒁾𒁉𒊭 𒌑𒊑𒀉𒁲 16:43, 18 November 2022 (UTC)Reply[reply]
@Huhu9001: Actually, for iku you would just need iku and itta as principal parts. That's all one needs to know to conjugate the verb in all the other forms. You see what thinking in terms of katsuyōkei does to one's brain? It stops you from seeing the obvious! Lol. — Sartma 𒁾𒁉𒊭 𒌑𒊑𒀉𒁲 17:40, 18 November 2022 (UTC)Reply[reply]

Correct regional label for Switzerland Alemannic

What is the correct regional label for entries like Chlapf? I added (Switzerland) but, strictly speaking, this is wrong because there are Swiss dialects that use Klapf instead. The reason why I feel like I need to add a label in the first place is because I don't know whether it is used in Alsace, Northern Italy etc. and so if I don't add a label, it implicitly gives off the impression that this word is used in these dialects as well. It's not the first time I've been confronted with this dilemma. Pinging @Widsith, Linshee. — Fytcha T | L | C 〉 18:06, 12 November 2022 (UTC)Reply[reply]

We've discussed this kind of thing a bit before on our respective Talk pages I think. Personally I don't think having a "Switzerland" label on gsw entries makes much sense, since I think it's assumed we're talking about Swiss German in the absence of any other label. Even in such a diverse dialect continuum as Alemannic German, at some point you have to assume that a particular form is a majority form, and then the other variants would be listed under an "Alternative forms" section. At least that is how I would handle it, from a practical point of view. Ƿidsiþ 06:16, 13 November 2022 (UTC)Reply[reply]
@Widsith: Apologies for the late reply. Yeah, I think this is a very reasonable middle-ground solution and I think it should be codified into WT:AGSW (WIP). The only downside I can think of is that it would be impossible to find all Swiss German lemmas (without external processing), i.e. all lemmas that would be understood in Switzerland. It seems at least half-way plausible that somebody would want to look that up, but oh well, let's just keep it simple. — Fytcha T | L | C 〉 16:44, 26 November 2022 (UTC)Reply[reply]
@Fytcha If we knew it wasn't used outside Switzerland, then I'd think "Switzerland" was a fine label, since I don't think any of our labels assert that every single speaker and sub-dialect in the region uses the term. For example, I'm sure there are terms labelled "(UK)" or "(US)" for which some sub-dialects have other terms, but "(UK)" is still a useful label. But if you aren't sure it's limited to Switzerland, then...yeah, that's tricky; I understand the desire not to present something as pan- or non-dialectal if you don't know that it is, but presenting it as Switzerland-only (when you don't know that it is) also seems suboptimal. What we really need, not just for Alemannic but also Low German and probably other languages (Chinese?), is a system for marking "known to be used in at least the following dialects: ...". - -sche (discuss) 10:43, 26 November 2022 (UTC)Reply[reply]
@-sche: A system to distinguish whether a given set of context labels is known to be complete or not would probably be the best long-term solution, but I don't see us implementing that anytime soon. I agree with your contention and I think we should just use no label at all for such entries for the time being. — Fytcha T | L | C 〉 16:44, 26 November 2022 (UTC)

Lingua Libre .ogg Format?

Under Help:Audio pronunciations, it looks like it's been agreed upon that we should use the .ogg format, "because it is a free format". Naturally, I agree with this, but using the Lingua Libre resource mentioned on that same page creates .wav files, rather than .ogg. Is this okay, and is there any way to record in the .ogg format on Lingua Libre? I'm just concerned that the .wav files generated are uncompressed, so might waste a lot of space on the Wikimedia servers when a simple .ogg would do fine. Thanks for any feedback, Kiril kovachev (talk) 22:41, 12 November 2022 (UTC)

@Kiril kovachev I wouldn't worry about wasting space - there's a filesize limit for a reason, but you won't be anywhere near it. Theknightwho (talk) 17:00, 15 November 2022 (UTC)
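For a rough sense of the sizes involved: uncompressed PCM audio takes sample rate × channels × bytes per sample for every second recorded. The sketch below is only illustrative. The 48 kHz, 16-bit, mono settings and the 44-byte header are assumptions about a typical WAV file, not documented Lingua Libre settings, and `wav_size_bytes` is a hypothetical helper.

```python
def wav_size_bytes(seconds: float, sample_rate: int = 48_000,
                   bits_per_sample: int = 16, channels: int = 1,
                   header_bytes: int = 44) -> int:
    """Estimate the size of an uncompressed PCM WAV file.

    All defaults are assumptions chosen for illustration.
    """
    # Bytes of audio data produced per second of recording.
    bytes_per_second = sample_rate * channels * bits_per_sample // 8
    return header_bytes + round(seconds * bytes_per_second)


if __name__ == "__main__":
    # A two-second recording of a single word under the assumed settings.
    print(wav_size_bytes(2.0))
```

Under those assumptions a two-second recording comes to roughly 190 kB, so a single-word file stays small even without compression.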

the vowel of floor, horse, etc in GenAm

Many entries notate the vowel of floor, core, hoarse, horse, etc only as /ɔ/, as if it were the vowel of flaw and caw. Since it's not (in GenAm), and one can even contrive minimal sets like core vs. a poetic caw'r (monosyllabized like o'er...for which we comically give /oʊ/ only in a disyllabic pronunciation), I think /ɔ/ is misleading. Some editors add /o/ or /oʊ/ to entries here and there, but we should approach this categorically. Merriam-Webster says the vowel+r of floor, core, hoarse, and horse, north and force is "[oɚ, ɔɚ]" (they write "ȯr", but clarify that this means IPA [oɚ, ɔɚ], vs. the "ȯ" of flaw which is [ɔ]). Dictionary.com writes the vowel of floor, core, and hoarse (but not horse) as /ɔ, oʊ/ without any suggestion that /oʊ/ is restricted to the few dialects that haven't undergone the hoarse-horse merger. (The 1933 OED, which distinguished hoarse (hōᵊɹs) from horse (hǭɹs, with italics), also distinguished horse's vowel from haw's (hǭ, no italics).)

I think we should categorically include GenAm /o/ or /oɚ/ pronunciations in all these entries, either alongside /ɔ/, or relegating GenAm /ɔ/ to a separate {{a|without the hoarse-horse merger}} line. (/o/ is not limited to the few accents that don't have the hoarse-horse merger, as entries like floor currently claim, because as M-W indicates, it's the pronunciation with and which resulted from the merger.) Agree, disagree, other ideas? Catalyst for this was diff ~ diff and Talk:Florida; pinging Whoop whoop pull up, Tharthan and Soap, plus Mahagaja who's had things to say about this vowel in the past. - -sche (discuss) 21:48, 13 November 2022 (UTC)Reply[reply]

/ˈflɔɹ.ɪ.də/ sounds to me like something from Noo Yawk and points north, but I don't use /ɔ/ except in diphthongs, so I may be an outlier. I don't think it's a coincidence that Tharthan is from an area that uses /ɔ/ a lot. Chuck Entz (talk) 22:21, 13 November 2022 (UTC)
It's an extremely complicated issue. For speakers with the horse/hoarse merger, the starting point of the rhotacized vowel in question is more open than the starting point of the goat vowel but closer than the thought vowel, even for people who distinguish cot and caught and have a real /ɔ/ in the latter. Therefore, the north/force vowel can be thought of either as /ɔɹ/ (with the understanding that the /ɔ/ is closer here than in its nonrhotacized equivalent) or as /oɹ/ (with the understanding that the /o/ is more open here than in its nonrhotacized equivalent). (Incidentally, I wouldn't use /oʊɹ/ since there's really no [ʊ] offglide, and we aren't generally using /ɚ/ to transcribe the ends of rhotacized diphthongs.) For speakers that distinguish horse and hoarse, the transcriptions /ɔɹ/ and /oɹ/ respectively make much more sense (but in the U.S., most speakers who make this distinction are either also nonrhotic speakers, meaning it's actually a matter of /ɔː/ vs. /oə/, or else they've merged north with start rather than with force, meaning it's a matter of /ɑɹ/ vs. /oɹ/). However, it's important to remember that Florida (and foreign, forest, orange, etc.) are neither north words nor force words, but lot words where the stressed vowel precedes an intervocalic /ɹ/ (as shown by the fact that New Yorkers say /ˈflɑɹɪdə/ but not */nɑɹθ/). So however we decide we want to transcribe north and force, it won't have an effect on Florida, which will need a separate decision. I'd also point out that all previous attempts to impose any sort of consistency to our transliteration of GenAm and other U.S. accents have failed miserably, as everyone has their own ideas as to the best system, and that inconsistency has already been enshrined in Appendix:English pronunciation, which has been written to be tolerant of using a wide range of symbols to represent the same sounds. —Mahāgaja · talk 22:28, 13 November 2022 (UTC)Reply[reply]
I'm originally from Central Massachusetts, and /ɔɹ/ in any context sounds completely alien (in fact, whenever I try to pronounce it myself, it comes out as /ɔɚ/), even though I'm from an area that uses /ɔ/ a lot (you can thank the cot/caught and father/bother mergers for that) and where /o/ is itself completely alien except in /oɹ/ (as in bore), /ol/ (as in bowl), /oʊ/ (as in bow), and /oɪ/ (as in boy). Just throwing this out there. (And, for that matter, thinking about some of the contexts where /o/ does occur is leading me to suspect that a lot of our diphthong notations show the wrong vowel for their offglides...) Whoop whoop pull up Bitching Betty ⚧️ Averted crashes 00:55, 14 November 2022 (UTC)Reply[reply]
I would like to point out that we currently transcribe the word Florida with the syllable boundary after the r, not before it, so it's FLOR-i-da, not FLO-ri-da according to us. This is in keeping with words like massive, where a single consonant after a stressed syllable is grouped with the preceding vowel. My theory that I posted on talk:Florida was that the dual pronunciations arise because different speakers, even within the same dialect, might parse the syllables in different ways, with some of us thinking of it as FLOR-i-da and others as FLO-ri-da. Since as you point out there are very few speakers of American English for whom IPA(key): [flɔr] is a possible word, it may be that those of us who group the /r/ with the first syllable have only the "floor" pronunciation to choose from, while those who mentally parse the first syllable as open can pronounce it with any vowel that's valid in an open syllable. Thus, it might be that this word does not follow the usual dialectal patterns. Soap 01:03, 14 November 2022 (UTC)Reply[reply]
Yes IMO the vowel of 'floor' and 'horse' is clearly /o/ not /ɔ/ in GA. Note that GA speakers who merge 'cot' and 'caught' do not merge 'car' and 'core', which suggests rather strongly that the vowel in 'core' is not /ɔ/. It is probably time for me to try creating an English pronunciation module; I've been spending the last couple of months on Portuguese pronunciations but that work is coming to a close. Benwing2 (talk) 02:19, 14 November 2022 (UTC)Reply[reply]
Exactly (re car). Another example that just occurred to me is that "floor eight" or "the floor ate it" (to invent a more creative excuse for not having my homework than "the dog ate it") clearly has a different vowel from "flaw rate" or "the flaw rate it (has is XYZ)" in GenAm. - -sche (discuss) 02:34, 17 November 2022 (UTC)Reply[reply]
I support the /oɹ/ representation for the north and force merged vowel. I think I analyzed it as another instance of the goat vowel /oʊ/, before I learned about these IPA systems that write it with an ɔ symbol. It is generally a bit lower phonetically than the vowel in goat, but I think that's a natural effect of the velarized /ɹ/ after it. It's usually very different in quality from the caught vowel /ɔ/ or /ɑ/ in accents under the General American umbrella, such that I don't think anyone perceives north and force as having the same vowel as caught. Though it's a separate issue as Mahagaja says, for words like Florida I would support two General American transcriptions, one for the Flarrida pronunciation (I guess /ˈflɑɹədə/ with the same vowel as lot), probably influenced by NYC and neighboring accents, and one for the Midwestern one with the same vowel as north and force (/ˈfloɹədə/), because these represent slightly different phonological systems. — Eru·tuon 00:42, 15 November 2022 (UTC)Reply[reply]
I would agree that the o vowel before /l/ and /r/ monophthongizes, at least for many dialects including mine. Vininn126 (talk) 16:53, 15 November 2022 (UTC)

Dictionaries give the following pronunciations for "Florida" in GA:

  • OED: /ˈflɔrᵻdə/
  • Routledge: /ˈflɔrƗdə/
  • Cambridge: /ˈflɔːrɪdə/, /ˈflɑːrɪdə/
  • Longman: /ˈflɔːrɪdə/, /ˈflɑːrɪdə/
  • Merriam-Webster: ˈflȯrədə, ˈflärədə

Hope that helps. Nosferattus (talk) 07:18, 14 November 2022 (UTC)

@Tharthan, Soap, -sche, Mahagaja, Chuck Entz, Benwing2, Vininn126, Nosferattus Have we come up with any sort of consensus re: GA /o/? Whoop whoop pull up Bitching Betty ⚧️ Averted crashes 00:26, 18 November 2022 (UTC)

Well, you, I, Benwing and Erutuon are in favour of /o/, and Vininn seems to be in favour. Chuck doesn't use /ɔ/: Chuck, do you pronounce north/force words with /o/, or what? Mahagaja seems to be saying either could work (and in the past mentioned the notation [o̞] which might be something to consider for the narrow transcription). I can't tell if Soap is taking a stance; I think Tharthan would prefer to stick to /ɔ/ but I'm not sure. I mocked up one possible way of handling these at hoarse and horse, tentatively leaving /ɔ/ alongside /o/. If there's not (much) opposition, I reckon we update Appendix:English pronunciation. But are we merely adding /o/ so these words will have "/ɔ/, /o/", or are we dropping /ɔ/ from the modern pronunciation? (I trust we're still mentioning it as the pre-merger pronunciation in any case, something that'll be easier to do systematically once we have a pronunciation module.) BTW, I'd like us to add the words that make up Wells' lexical sets' names into that appendix, to make it easier to see which line covers force... - -sche (discuss) 23:06, 19 November 2022 (UTC)Reply[reply]
My opinion for words like floor is that they should be listed as having /or/ in GenAm because almost no American dialect still allows [ɔr] in a closed syllable .... basically it'd be people who are horse/hoarse unmerged but still pronounce final /r/. But there are words like Florida that have idiosyncratic pronunciations and should not be forced to show only /or/. Soap 23:16, 19 November 2022 (UTC)Reply[reply]
@-sche: I am basically in agreement with you on this, -sche. I would say that I don't think that we necessarily ought to drop /ɔ/ altogether from the modern pronunciation. Something along the lines of what you have right now at horse looks good, although I am not sure that it makes the most sense for an Early Modern English pronunciation of the word to be placed right next to that (Modern English) pronunciation that you have there near the bottom. The Early Modern English pronunciation ought to be separate, on its own line, not immediately next to that pronunciation.
When it comes to some words, such as Florida, I think that what Soap has brought up regarding syllable boundary differences from speaker to speaker impacting how that word might be pronounced probably warrants some indication of that when such differences lead to a real difference in pronunciation. In the case of that particular word, that would mean including an [-ɔ.ɹɪ-] pronunciation in addition to the other pronunciations. Tharthan (talk) 00:16, 20 November 2022 (UTC)Reply[reply]
I think General American-type accents that pronounce the vowel of Florida differently from north and force typically pronounce it with the lot vowel, not the thought vowel. So then the correct transcription would be /ɑɹ/ (ignoring syllable break considerations), contrasting with /oɹ/ (or /ɔɹ/?) for north, force, and I guess glory. My reasoning is that these accents seem to be influenced by New York City pronunciations (I hear them from TV presenters a lot), and there I think the vowel of Florida is more similar to the low vowel of lot and not the back and sometimes raised and rounded vowel of thought. The syllable break certainly is the condition that would explain why Florida hasn't merged with force but north has, but it isn't enough to describe the current phonemic distinction between Florida and north-force in General American-type accents that I typically hear. I could be wrong because I don't live in an area where these pronunciations are at all common. But I'm curious now if for you, Tharthan, because you have a north and force contrast, would Florida and north have the caught vowel, and glory and force the goat vowel (contrasting with the vowel of starry or lot)? — Eru·tuon 22:20, 23 November 2022 (UTC)Reply[reply]
You are talking about something that is different from what Soap and I were referring to with regard to Florida. The issue at play that we were bringing up about Florida is that, for some speakers, there are syllable boundary differences—as Soap said: different speakers parsing Florida's syllables in different ways. Most saying /ˈfloɹ.ɪ.də/ (the first part like the word floor), but others saying [ˈflɔ.ɹɪ.də], (the first part like flaw).
The /ˈflɑɹ.ɪ.də/ pronunciation is an entirely different subject, one that I think is adequately covered in our entry for Florida. The /ˈflɑɹ.ɪ.də/ pronunciation is listed as New York City, Philadelphian, and non-Bostonian traditional Eastern New England English. A /ˈflɒɹ.ɪ.də/ pronunciation is indicated as the traditional Bostonian pronunciation, as it is in England's Received Pronunciation.
With regard to your last question, I am sorry to say Erutuon but I only have a partial horse-hoarse distinction. And I suspect that the only reason that I have any distinction at all is because I spent a lot of time from a very young age around relatives who had a full distinction in their speech.
A couple of examples: mourning is different from morning for me, and four is different from for for me. But, in contrast, the first word in a hypothetical "more *ning" would not have the GOAT vowel, and fore doesn't have the GOAT vowel either. Nor does boar or bore. And yet hoarse does have that vowel.
I guess the situation with the horse-hoarse distinction for me is not dissimilar to the situation that another New Englander Wiktionarian here briefly mentioned that they have with the father-bother merger. As they said about their own speech in a discussion related to this one: "full father/bother merger except before /ɹ/ plus a handful of other scattered exceptions which remain unmerged." Tharthan (talk) 00:05, 24 November 2022 (UTC)Reply[reply]
@Tharthan As the New Englander Wiktionarian in question, it's "the situation that she has" and "as she said about her own speech", FYI. Whoop whoop pull up Bitching Betty ⚧️ Averted crashes 22:51, 24 November 2022 (UTC)
Does anyone else want to express an opinion on whether to have both /ɔ/ and /o/ as post-merger GenAm pronunciations, or just /o/ (with /ɔ/ only as the pre-merger pronunciation)? Normally when we list multiple pronunciations for the same accent they're contrastive, whereas these are not, they're competing ways of representing the same pronunciation for words in the horse, hoarse sets (setting Florida aside as a different beast that may be pronounced both like floor+ɪdə and like flaw+ɹɪdə). I'm interpreting Benwing, Erutuon and Whoop as preferring just /o/. For now, I've done this. BTW if Canadian English should also use /oɹ/, let's discuss that... - -sche (discuss) 23:43, 22 November 2022 (UTC)Reply[reply]
Erutuon's edit changing the Canadian borrow and forest vowels from /ɔɹ/ to /oɹ/ highlighted that we also needed to update the appendix with regard to which vowel those words have in GenAm. The appendix claimed that borrow and a handful of other words only have /ɑɹ/ while horror and forest only have /ɔɹ/, but this is wrong in both directions: in truth, looking at Youtube and Merriam-Webster, any of these words can have either the horse-hoarse vowel or the start vowel in GenAm, like in the other listed dialects. I boldly made this change; if this needs to be tweaked (or reverted) and discussed further, please tweak/revert/discuss. It is unclear which accents or pronunciations the note "This sequence only occurs before another vowel." is intended to refer to. - -sche (discuss) 00:23, 24 November 2022 (UTC)Reply[reply]
Do we need to update the enPR given for that line? It's just ŏr, which covers the first of the possibilities several of the dialects allow, /ɒɹ/ (although it means we're representing GenAm's /ɑɹ/ with two different enPR transcriptions), but it means we're not indicating the separate existence of the other possible pronunciation which all of the dialects allow, ɔɹ, oɹ (unless we're saying that sound gets represented two different ways, as ŏr (which thus stands for two things) or ôr, based on... what? facts about the etymology which someone looking to add an enPR transcription may not know?). - -sche (discuss) 00:38, 24 November 2022 (UTC)Reply[reply]
The reason for having different lines for borrow and horror is that borrow has /ɑɹ/ in all GA, whereas horror has /ɑɹ/ in New York City-influenced (and similar) GA and /oɹ/ elsewhere, as shown in the chart of mergers of /ɒr/ and /ɔr/ (using their diaphonemic symbols) or Aschmann's table of r-colored vowels. I think Merriam-Webster's /oɹ/-type transcription for borrow must be for Canadian English; I don't know of a US accent that pronounces it that way. — Eru·tuon 02:23, 24 November 2022 (UTC)
@Erutuon - borrow doesn't have /ɑɹ/ in all GA, though; for me, it comes out as /ˈbɔ.ɹou/, with /ɔ.ɹ/ (the phonotactic incompatibility between these two phones being averted by the intervening syllable break). Whoop whoop pull up Bitching Betty ⚧️ Averted crashes 06:15, 24 November 2022 (UTC)Reply[reply]
Just to be clear, are you saying you have an American accent without the cot-caught merger and you pronounce borrow with the THOUGHT vowel, in contrast with both /o/ as in boring and /ɑ/ as in starring? So borrow (or sorrow) would essentially rhyme with "raw row" or "saw row" (setting aside stress) but not with "spa row" or "sore O"?--Urszag (talk) 06:23, 24 November 2022 (UTC)
@Urszag - With the cot/caught merger, but the rest of that is correct. Whoop whoop pull up Bitching Betty ⚧️ Averted crashes 06:41, 24 November 2022 (UTC)
Oh! Then, do you have /ɔ/ in cot/caught/bother/borrow and lack the merger of this with the father/spa/starry vowel /ɑ/? Urszag (talk) 06:47, 24 November 2022 (UTC)
/ɔ/ in cot/caught/father/bother/borrow and /ɑ/ in spa/starry (basically, it's /ɑ/ when immediately before /ɹ/ without an intervening syllable break and also in some foreign loans like spa/bra/Nazi, /ɔ/ otherwise). Whoop whoop pull up Bitching Betty ⚧️ Averted crashes 07:59, 24 November 2022 (UTC)
...You have /ɔ/ in father?
I do not recall having ever encountered anyone who pronounced the word father /ˈfɔðɚ/, and was not aware that there was a North American dialect that pronounced it that way. I guess that if someone spoke a dialect which had first had a form of the father-bother merger take place that resulted in historical /ɑ/ and /ɒ/ merging to /ɒ/ in the dialect, that then afterwards had a cot-caught merger occur with /ɔ/ being the resulting vowel, /ˈfɔðɚ/ could conceivably result from that. With that said, again, I have no recollection of having ever heard /ˈfɔðɚ/ for father in my life. Tharthan (talk) 09:07, 24 November 2022 (UTC)
...yes, I do. Whoop whoop pull up Bitching Betty ⚧️ Averted crashes 10:48, 24 November 2022 (UTC)
@Erutuon: Hmm... M-W seems to specify when they're giving a Canadian or other non-GenAm/US pronunciation (as for schedule, [3]), and they don't call borrow-with-o Canadian. Well, I have no objection to re-splitting the lines, I'd just like to understand why it's considered useful: is there a reason these 4–5 words having a different main pronunciation (in one dialect) than the other members of their broader set (while both sets would still need notes about how they can have the other pronunciation) means they need their own line in the table; like, does there need to be a separate line for every anomalous word, or is there something special about these? Were they historically in a different class than the horror, forest words, or are they separated out by other reference works so often that we'd be remiss not to do likewise? - -sche (discuss) 19:29, 24 November 2022 (UTC)Reply[reply]
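One way to picture the change discussed in this section, as a hedged sketch rather than an existing Wiktionary module: rewrite /ɔɹ/ as /oɹ/ only where the /ɹ/ is not followed by a vowel, so that north/force words such as floor and horse change while intervocalic cases such as Florida are left for a separate decision. The function name, the vowel list, and the rule itself are assumptions made for illustration. Words like glory, which some commenters group with force, would also be left untouched by this conservative rule.

```python
import re

# Illustrative, non-exhaustive set of IPA vowel symbols (an assumption).
VOWELS = "aeiouæɑɒɔəɛɪʊʌɚɝɜɨᵻ"

# Match "ɔɹ" unless the ɹ is followed by a vowel, optionally across
# syllable-break or stress marks (".", "ˈ", "ˌ").
TAUTOSYLLABIC_OR = re.compile(rf"ɔɹ(?![.ˈˌ]*[{VOWELS}])")


def merge_north_force(ipa: str) -> str:
    """Rewrite /ɔɹ/ as /oɹ/ when no vowel follows the ɹ."""
    return TAUTOSYLLABIC_OR.sub("oɹ", ipa)


if __name__ == "__main__":
    for word in ["/hɔɹs/", "/flɔɹ/", "/nɔɹθ/", "/ˈflɔɹ.ɪ.də/", "/flɔ/"]:
        print(word, "->", merge_north_force(word))
```

In this sketch horse, floor and north are rewritten, while Florida and flaw pass through unchanged.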

ʌ in American English pronunciations

There's something of a schism between dictionary editors over whether a stressed schwa should be represented as ʌ or ə in American English. Some dictionaries transcribe stressed schwas as ʌ in IPA so that the American and British transcriptions match. Other dictionaries transcribe stressed schwas as ə in IPA because stressed and unstressed schwas are allophones and should be represented by the same symbol. Dictionaries that follow the first convention include the Longman Pronunciation Dictionary and the Cambridge English Pronouncing Dictionary, while dictionaries that follow the second convention include the Merriam-Webster Collegiate Dictionary, the Oxford English Dictionary, and the Routledge Dictionary of Pronunciation for Current English. Geoff Lindsey convincingly argues that the first convention doesn't make any sense at https://www.youtube.com/watch?v=wt66Je3o0Qg. As far as I can tell, Wiktionary follows the first convention (using ʌ), but maybe we should be using ə instead. See examples at above#Pronunciation, Russia#Pronunciation, and love#Pronunciation. Thoughts? Nosferattus (talk) 06:56, 14 November 2022 (UTC)Reply[reply]

I want to be against this because I have a clear stressed phonetic schwa in words like pull ~ full ~ bull, and it is neither [ʊ] nor a syllabic consonant. Meanwhile there is never an unstressed [ʊ], whereas I would say I do have unstressed [ʌ] in words like above. There is a minimal pair between [ʌ] and [ə] in unstressed position .... Rosa's roses. So if any stressed vowel is to be united with schwa, for me it must be /ʊ/.
However as I intimated above, I live in New England, don't travel much, and don't consume mass media, and it took me well into adulthood to realize that most people even in America don't talk quite the same way as me. So all I can really say is that for some Americans the /ʌ/=/ə/ analysis doesn't work. Soap 10:21, 14 November 2022 (UTC)Reply[reply]
@Soap For me (another New Englander born and raised, albeit one since transplanted to Minnesota), Rosa's roses has [ə] and [ɪ] rather than [ʌ] and [ə] (my personal go-to unstressed [ʌ]/[ə] minimal pair is untangle/entangle), and [ʊ] does exist in unstressed environments (for instance, the second vowel in fishhook). Whoop whoop pull up Bitching Betty ⚧️ Averted crashes 02:27, 16 November 2022 (UTC)Reply[reply]
@Soap: Interesting. Are could and cud homophones for you? How do you pronounce the vowel in strut? For me (also a native GenAm speaker), it's definitely a mid central schwa (with a relaxed mouth), identical to ⟨a⟩ in about. Whereas for pull, I purse my lips on the vowel. Do you think it would make sense for us to list both /əˈbʌv/ and /əˈbəv/ as GenAm pronunciations for above, since for many (most?) GenAm speakers the vowel sounds are identical (other than the stress)? Nosferattus (talk) 17:24, 14 November 2022 (UTC)Reply[reply]
My understanding has always been that the ʌ=ə argument is based on a technicality ... that there are no minimal pairs if you posit secondary stress .... and therefore that the two phonemes can be united even though they sound audibly quite distinct. I've never heard it claimed that any sizable group of people actually pronounces the vowels of cut ~ strut etc as a literal IPA [ə] schwa. If there are people actually saying [ə'bəv] for "above", here in the United States, then perhaps I need to get out more, because I've never heard that at all. That sounds sarcastic, but obviously the people here and the many dictionaries using this scheme can't all be wrong. Still, I just want to make sure this isn't based on a misunderstanding of some similar type to the Florida discussion above where dictionaries have been using "ô" or some other symbol to indicate two audibly distinct vowels. The fact that this seems to come up over and over again not just here but on linguistics forums indicates to me that there are at least two credible sides to the debate.
As for my pronunciations .... no, I don't unify could / cud or any other pair of words in which one member has /ʌ/ and the other has /ʊ/. STRUT also has /ʌ/. Perhaps my dialect is in the minority, although I want to at least add that I wouldn't assume it is just New England that speaks like me, as this has nothing to do with cot/caught or horse/hoarse and may have a completely different distribution. The word above is [ʌˈbʌv] for me, though I'd accept [ə'bʌv] in rapid speech, but never *[ə'bəv]. I don't have a good answer for how to transcribe above but we should be able to find an agreement as a community. Soap 22:14, 14 November 2022 (UTC)
@Soap You should watch the video by Geoff Lindsey linked by the OP. Vininn126 (talk) 22:15, 14 November 2022 (UTC)
I admit I didn't have time to play the video this morning. Having played it, I can't change anything I said above. It's clear to me that he pronounces the words like double and fungus with [ʌ] in the first syllable and a reduced vowel in the second syllable ([ə] in fungus, and either [ə] or just a syllabic consonant in double). He never explains in the video why he considers both sounds to belong to the same vowel phoneme, but I believe I'm familiar with the argument, as I said up above ..... essentially, we can argue that [ə] and [ʌ] are the same phoneme in some (perhaps most) dialects on a technicality, even though they're audibly distinct. But the reduced vowel system is less accurate even in those dialects where the ʌ=ə analysis works, and simply wrong for the dialects where the ʌ=ə analysis breaks down.
As for the schoolbook respelling "uh" for both /ə/ and /ʌ/? I'd say that's a weak argument. There simply aren't enough vowel symbols in the English alphabet to represent every IPA phoneme properly. Schoolbooks often run into the same problem with /æ/, but nobody would take that as evidence that English is losing its /æ/ vowel.
I can accept phonemic stressed /ə/ as a phoneme, even if it is realized as essentially [ʌ], since people seem to want to reduce the vowel inventory as much as possible when transcribing words. But even Lindsey admits that there are some speakers, a minority, who continue to contrast /ə/ and /ʌ/. I think we should therefore continue to mark the contrast, so that we can better cover the whole range of English dialects. Soap 22:40, 14 November 2022 (UTC)
I hate to say it but I wonder how much psychology is playing a role in this. Perhaps your expectations are affecting it. I also believe that east coast US accents are more conservative when it comes to this split, but if you check youglish you'll see most people using schwa. Vininn126 (talk) 22:45, 14 November 2022 (UTC)
Could we at least agree that ['fʌŋ.gəs] and ['dʌb.əl ~ dʌb.l̩] would be the best narrow IPA transcriptions of the two words he spoke in that video at around the 2:50 mark? If you're hearing ['fəŋ.gəs] and ['dəb.əl] then yes we disagree on the core issue at hand. Thanks, Soap 22:55, 14 November 2022 (UTC)
Well first of all in that list I don't think he's necessarily trying to present the phoneme, rather, list words that would contain it. That said, stressed schwa and unstressed schwa will sound different from each other, and both will sound different from /ʌ/. I also wouldn't call that /ʌ/ in those words. Vininn126 (talk) 23:00, 14 November 2022 (UTC)
But there are no minimal pairs between stressed /ə/ and stressed /ʌ/ in any dialect that I'm aware of. So why not just call the supposed stressed schwa /ʌ/? Soap 23:04, 14 November 2022 (UTC)
The lack of minimal pairs usually points to the lack of a phoneme, not the existence of one. Vininn126 (talk) 23:06, 14 November 2022 (UTC)
But analyzing stressed /ʌ~ə/ as a single phoneme doesn't eliminate /ʌ/ as a phoneme because of words like undone, which have an unstressed /ʌ/ that sounds clearly different from the schwa. (At least in the speech I've heard.) Are you saying that the word undone should be transcribed as /ən'dən/? If so, that complicates our transcription of words like embattle, for which we list initial schwa as a valid pronunciation .... analyzing un- as /ən/ would make it seem that at least some Americans merge en- with un-, which I've never heard claimed even by people who believe in the ʌ=ə theory.
We can get around this problem by positing secondary stress for the prefix un-, which would allow us to explain how it's pronounced as [ʌ] without actually being /ʌ/ ..... that analysis works well enough, but it's also more complicated than just keeping /ə/ and /ʌ/ as separate phonemes. Soap 09:09, 15 November 2022 (UTC)
Both vowels in undone are schwas for me. As to en- and em-, I tend to pronounce them with /ɪ/ or the like. As to the notion that this would complicate entries, I fail to see how this is more complicated. Vininn126 (talk) 09:13, 15 November 2022 (UTC)
Neither undone vowel is a schwa for me, and I'm definitely an AmE speaker. Whoop whoop pull up Bitching Betty ⚧️ Averted crashes 02:07, 16 November 2022 (UTC)
Well, embattle only listed schwa as a pronunciation because it was recently added in diff by Whoop whoop, but AFAICT this is mistaken: it may be a regional pronunciation, but it's not GenAm, where (as Vininn says) it's /ɪ/ (or else /ɛ/), and hence en-, em- is distinct from un-.
Whoop whoop, I think we should also discuss /ɾ/ before you add it to /broad/ transcriptions; until now we've only given that in the [narrow] transcriptions. - -sche (discuss) 03:07, 16 November 2022 (UTC)Reply[reply]
@-sche Discussion of /ɾ/ would probably best be split out into a new section, given that it isn't directly related to the current one and how long the current one is. Whoop whoop pull up Bitching Betty ⚧️ Averted crashes 03:47, 16 November 2022 (UTC)
I doubt strut and comma are consistently audibly distinct in General American. They are variable in Standard Southern British as Geoff Lindsey points out in his blog post about the strut vowel. When he has cut a bunch of strut vowels out with audio editing software so that we can play them back in isolation, some of them sound like a mid-central schwa, others like a lower and backer [ʌ], and others like a lower central [ä]. He shows similar variation in the comma vowel (unstressed schwa) and the first element of the diphthong of goat. So all three vowels vary within a sort of triangle in the phonetic vowel space. Lindsey's post is not about American English, but I suspect there's similar variation in the strut and comma vowels in many General American type accents, and that if you were to play a wide selection of instances of each vowel sound back in isolation, you couldn't very reliably tell which was strut and which was comma by the phonetic quality of the isolated vowel sounds. (Though I think not many North Americans would have the extremely open [ä] pronunciation of strut or comma that sounds either Queen Elizabethy or Cockney.)
Double is a bad example because the strut and comma vowels are in different environments (stressed and unstressed; the second is before /l/ so it's affected by velarization and possibly l-vocalization) so you can expect them to sound different. Some better strut and comma minimal pairs can be generated with the un- prefix because it has an unstressed strut: unequal versus an equal for instance. I can contrast those if I'm emphasizing the difference, but it's not consistent. I might have a contrast in schwa-y vowels, but it doesn't correspond to the strut and comma lexical sets. Like I sometimes might distinguish between the final vowels of Rosa's and roses as Soap does, but that isn't the strut and comma distinction because they have the vowels /ə/ (comma) and /ɪ/ in old-fashioned RP.
Strut equaling comma is weird for me for a different reason. I don't consistently distinguish strut from comma, and my /ɪ/ vowel is schwa-like too so using the same symbol for kit and comma makes sense to me. So, if strut equals comma and kit equals comma, then strut equals kit, but that's not true because I pronounce kit with a higher vowel than strut. So in my weird idiolect I'm not sure exactly how to divide up the historical strut, comma, and kit lexical sets into phonemes. — Eru·tuon 00:17, 15 November 2022 (UTC)Reply[reply]
Geoff Lindsey's accent is nowadays significantly more Southern British English influenced, which has the distinction (and also the FOOT/STRUT split). When he talks about his native accent he's talking about the significantly more Scouse accent he had as a child. I agree that when he says the words the phones are different, however when the Americans in the video say the words they are largely the same. --Muzer (talk) 01:12, 16 November 2022 (UTC)Reply[reply]
@Soap, separate question re your earlier comment: where's the /ʌ/ you say you have in Rosa's roses? I thought the traditional analysis was Rosa's = /ə/ and roses = /ɨ/ ~ /ɪ/ (~ /ə/), do you lower and back both vowels (Rosa's to /ʌ/ and roses to /ə/)? - -sche (discuss) 22:58, 14 November 2022 (UTC)Reply[reply]
Thanks for asking. Yes, Rosa's has /ʌ/ and roses has /ə/. I had actually written out that Rose is would have /ɪ/, and then deleted it as I felt it would be a distraction since people could say it's secondary stress, a word boundary, or something else. Which is all fair. I dont know offhand whether I have a three-way minimal pair for /ʌ ə ɪ/ in unstressed position. Soap 23:17, 14 November 2022 (UTC)Reply[reply]
Interesting. I don't want to doubt someone's perception of what they're saying, but (like Vininn) I think psychology is influencing how people interpret the differences in sound that they hear, like the guy who said there was a schwa-vs-/ʌ/ contrast in "undone" because he (correctly!) perceived there was a contrast ... it just isn't a schwa-vs-/ʌ/ contrast, because for speakers who distinguish schwa-vs-/ʌ/ undone's vowels are both /ʌ/. I'm not aware of any historical precedent for Rosa's having /ʌ/, so it might be helpful if we could find (or record) some audio and try to check that the difference is not, in fact, something else. - -sche (discuss) 01:22, 15 November 2022 (UTC)Reply[reply]
@Soap: Could you clarify your reasoning about how the schwa sound you have in pull is a problem for the theory that /ʌ/ is the same phoneme as /ə/? Does it contrast with a [ʌ] sound in a similar environment in words like pulverize (/pʌl/) or mull and how exactly? And how is it phonetically different from the vowel in could so that you wouldn't consider it to belong to the same phoneme? I think I sometimes pronounce pull a similar way, though I'm from the Upper Midwest. I keep hearing that fellow North Americans have a single vowel phoneme in pull (/ʊ/), mull (/ʌ/), bowl (/oʊ/) and maybe even ball (/ɔ/) or pall (/ɑ/), though that is not true for me, so probably some clarification would be helpful. I do contrast all of these, though pull and mull are closest phonetically, and I also have a schwa-like vowel in pill (/ɪ/) which is nevertheless quite distinct from the others, I guess higher than pull and mull. The velarized or vocalized /l/ does weird things to the preceding vowel. — Eru·tuon 23:07, 14 November 2022 (UTC)Reply[reply]
Just chipping in saying that for me pull will have schwa, and bowl will have /oʊ/, and ball has /ɑ/. Western with Wisconsin influence. Vininn126 (talk) 23:10, 14 November 2022 (UTC)Reply[reply]
Yes, pull and mull have different vowels. I would say that could has a different vowel as well. At this point, though, while I thank you for your interest, this might be no more than a distraction from the issue at hand, the ʌ=ə argument, .... up above when I said that I might just as well say that ʊ=ə, i did not expect this thread to grow so fast, and wasnt intending to present it as a counterargument so much as an afterthought. After all, the stressed schwa for me occurs in just three words, all of them very similar. My preferred analysis is that /ʌ ə ʊ/ are all separate phonemes and that /ə/ is confined to unstressed syllables. Soap 23:16, 14 November 2022 (UTC)Reply[reply]
For the vast majority of Americans and even a good number of Brits, Lindsay Ellis is right. If we are updating our GA transcriptions I believe we should be using schwa, not upturned v. As to regional lects such as various East Coast lects, as I remember Soap speaks, we should consider that differently. Vininn126 (talk) 18:07, 14 November 2022 (UTC)Reply[reply]
Support this change. I've always wondered why this was the case, especially comparing it to languages that have a true /ʌ/ which can cause a disconnect while trying to compare sounds. AG202 (talk) 20:33, 14 November 2022 (UTC)
I was sceptical of stressed schwas, but am persuaded it's a fine analysis for modern GenAm; I would just suggest that we should ideally preserve the information about which schwas were /ʌ/ at earlier points in history (compare e.g. the obsolete pronunciations we list on some entries like one) — it sometimes feels like we, the English Wiktionary, cover Ancient Greek phonological history and verb conjugations more comprehensively than English ones! It'd be great if we could get an English pronunciation module going, even if it only works on many but not all entries and some cases still need manual {{IPA|en|...}}, so that we could automatically show developments like older American /ʌ/, /ɝ/ to modern GenAm /ə/, /ɚ/ — perhaps displaying them in reverse order, of course, putting the modern pronunciation first since it's of the most interest to readers, and maybe even collapsing older pronunciations similar to how we do for Ancient Greek. - -sche (discuss) 21:23, 14 November 2022 (UTC)Reply[reply]
To be fair, many British dialects do this, too. However, I agree that marking it as a historical pronunciation is a good idea, but presenting it as modern is just lying to the readers. Vininn126 (talk) 21:25, 14 November 2022 (UTC)
I agree with this point, it would be helpful to show (or have the option of showing) historical development similar to the various stages shown for Ancient Greek entries. —Al-Muqanna المقنع (talk) 12:06, 15 November 2022 (UTC)
Fortunately in this case modern Standard Southern British retains the distinction, so you can always look in that transcription to find the traditional distinction. But I agree that proper historical support would be pretty cool. --Muzer (talk) 01:15, 16 November 2022 (UTC)
I second (fifth?) the proposal to show the stages of historical development in English pronunciation entries. Whoop whoop pull up Bitching Betty ⚧️ Averted crashes 16:29, 16 November 2022 (UTC)
This may be getting off-topic, but the mentions of bull and bowl remind me of when we had a user adding /kl̩/, /bl̩/, etc (!) as GenAm pronunciations of cull, cole, and bull, bowl, etc, which was undone (discussion in 2014, short 2015 follow-up) because even if some people merge those, it's not in General American and it's almost certainly not to /l̩/. - -sche (discuss) 01:22, 15 November 2022 (UTC)Reply[reply]
It is arguable that English sonorants are vowel-like, but with an inserted schwa, so I think our current transcription of including /ə/ in parentheses is best. Vininn126 (talk) 06:43, 15 November 2022 (UTC)Reply[reply]
Support stressed schwas as ə. —Al-Muqanna المقنع (talk) 12:04, 15 November 2022 (UTC)
I’m not sure how relevant this is, because phonemes denoted by the same IPA symbol in different languages may have different phonetic realizations, but while Dutch does not have the phoneme /ʌ/, it does have stressed schwas, such as the first vowel in je van het. To me, it does not sound close to /ʌ/ or in any case much closer to /ʏ/. Apparently the unstressed schwa does so too to Dutch ears, as witnessed by pronunciation spellings like Het leven is vurrukkulluk.  --Lambiam 23:16, 15 November 2022 (UTC)Reply[reply]
I still think this is a bad idea, but my arguments dont seem to be moving anybody, and I dont want to wear out my welcome. I just have one more thing to add ..... in all other Germanic languages that I know of, the schwa is treated like a vowel in its own right, and nobody feels the need to try to unite it with one of the full vowels, even if it would be technically possible to do so. To do what we're doing is against the tradition of all Germanic languages .... simplifying the phonology on a technicality .... and in my mind, it will make things more confusing for our readers. That's all I can add without repeating myself. Thank you, Soap 21:01, 17 November 2022 (UTC)Reply[reply]
FWIW, I feel a lot less strongly about merging /ʌ/ with schwa (don't take me as a !vote for it, I would just be OK with it if it's what other people think is best), than about the floor vowel. If we continue presenting /ʌ/ and schwa as different and yet the best analysis turns out to be that GenAm actually only has schwas that differ by (secondary or primary) stress, we're still only requiring people to learn that multiple* symbols for pretty similar sounds both (always) indicate the same sound, whereas representing floor with /ɔ/ is asking people to realize that the same symbol means two markedly different sounds and it's not 100% guessable which is meant in a particular case.
*I was going to say "two symbols", but the sources that merge vs distinguish /ʌ/ and /ə/ seem to be largely the same as those that merge vs split /ɝ/ and /ɚ/, so presumably if we're going for maximum distinction we'll be retaining the traditional /ɝ/ notation of term or turf, and thus using three symbols where e.g. Merriam-Webster uses just schwa? But some of the users who've been making /ɝ/→/ɚ/ changes are against /ʌ/→/ə/; is there any scholarly work saying term has schwa but undone doesn't, or...? - -sche (discuss) 22:27, 17 November 2022 (UTC)Reply[reply]
Oh, I've just noticed Merriam-Webster is doing something a bit slick (and wishy-washy): they do use ‹ə› in their non-IPA notation of a lot of things in their dictionary, e.g. hurry, under, bird ‹ˈhə-rē / ˈhər-ē, ˈən-dər, ˈbərd›, as Lindsey noted ... but the pronunciation key doesn't commit to all of these being IPA [ə], instead they talk about unstressed syllables' ‹ə› corresponding to IPA [ə], primary- or secondary-stress syllables' ‹ə› corresponding to IPA [ʌ], ‹ər› as in merger, bird being [ɝ, ɚ], ‹ər, ə-r› as in hurry being its own thing, etc. - -sche (discuss) 09:23, 18 November 2022 (UTC)Reply[reply]
@Soap, the argument that we don't unite vowels to schwa in other Germanic languages seems to me to be mistaken or begging the question. For example, we do represent the final vowel of machen and Tage and Apfel as a schwa, although earlier in German history they were different; presumably you think the difference is "well, those vowels really are now schwas, not separate sounds", but then ... that's the argument being made by people who want to unite these, too. - -sche (discuss) 16:53, 18 November 2022 (UTC)Reply[reply]
i would say it's more analogous to analyzing entdecken with all three vowels as schwa .... in essence, picking an arbitrary vowel to unite with schwa purely for the purpose of simplifying the vowel inventory. But again, there's nothing else I can add without repeating what I've already said above. Soap 23:20, 19 November 2022 (UTC)Reply[reply]
Consider the sentence, “Tatooine was a circumbinary planet that had two Suns, which were called the Summer Sun and the Winter Sun.” I think that the second word of the compound “Summer Sun” in this sentence is unstressed. Is this compound proper noun homophonous in American English with the surname Summerson? --Lambiam 21:50, 19 November 2022 (UTC)
No, they would be pronounced differently. This, and words like uppercut, have been sometimes used as evidence that English has a phonemic contrast between /ʌ/ and /ə/ that cannot be analyzed away. However, anyone on the opposite side of the argument can always posit secondary stress for any vowel that gets in the way of the analysis. While this isn't the system I prefer, it is consistent, and therefore I think this question doesn't really get to the core of the /ʌ~ə/ debate, which is really about whether stressed schwa contrasts with /ʌ/. Soap 23:24, 19 November 2022 (UTC)Reply[reply]
In other words, the position of one side in the debate is that American English has an unstressed /ʌ/, contrasting with unstressed /ə/, but no stressed /ʌ/.  --Lambiam 21:43, 20 November 2022 (UTC)Reply[reply]
No, of course not. The idea is that apparent contrasts between unstressed /ʌ/ and /ə/ can actually be explained as contrasts between /ə/ with some degree of stress and /ə/ that is completely unstressed. See the Wikipedia article w:Stress and vowel reduction in English#Degrees_of_lexical_stress, in particular "it is common for tertiary stress to be assigned to those syllables that, while not assigned primary or secondary stress, nonetheless contain full vowels": if the degree of stress rather than the level of vowel reduction is taken to be the underlying contrast, then the degree of vowel reduction can be interpreted as a secondary, allophonic property of a vowel.--Urszag (talk) 22:20, 20 November 2022 (UTC)Reply[reply]
The problem with saying that the debate "is really about whether stressed schwa contrasts with /ʌ/" is that if we go by the traditional definition of schwa where it does not occur in stressed syllables, it obviously does not contrast with any stressed vowel: there is nothing special in terms of patterns of contrast about how it compares to stressed /ʌ/, you could equally transcribe stressed /ɪ/ or /ʊ/ as /ə/ with no ambiguity. Identifying /ə/ with /ʌ/ would have to be based either on a supposed greater phonetic similarity between /ə/ and this particular stressed vowel, or on the phonological phenomenon of certain words like what, was, because, of having strong (stressed) forms with /ʌ/ in some American English accents, where the replacement of the word's original stressed vowel quality with /ʌ/ can be explained as caused by vowel reduction followed by "re-stressing" of /ə/ .--Urszag (talk) 22:20, 20 November 2022 (UTC)Reply[reply]
Okay, thank you. I agree with everything you say in this paragraph. I'd even say that if we must unite schwa with some other vowel, /ʌ/ seems like the best choice. However I still oppose this change because I believe it does nothing useful, and makes analysis more complicated. As I said above, this is against the tradition of all Germanic languages, and makes about as much sense to me as transcribing German entdecken with three schwas. Perhaps English is different somehow. But is it?
What do we gain, exactly, by reducing the English vowel inventory by one? People up above have admitted that the stressed schwa has a clear allophone of [ʌ], and the ʌ=ə analysis would make it the only vowel in the English language with allophones so far apart. This may mislead readers who assume, following the pattern of all the other vowels in our transcriptions, that stressed schwa really is pronounced as [ə] and not as [ʌ].
If we do go ahead with the change, it seems we'll need to start positing secondary stress all over the place, in a pattern that seems arbitrary to me (is it just going to be for [ʌ]?), and in your other paragraph you mention that some scholars also believe in tertiary stress, which I'd never heard of before now. If a phonetic analysis requires that much extra work to keep itself together, I think it's more sensible to just stick with the traditional system in which the schwa is a vowel in its own right, occurring only in unstressed syllables. Just like all the other Germanic languages, along with some Uralic languages and Southeast Asian languages. Soap 23:01, 20 November 2022 (UTC)Reply[reply]
I also support continuing to use /ʌ/ rather than replacing it with /ə/.--Urszag (talk) 23:24, 20 November 2022 (UTC)
While looking for words with both wedge and schwa in their conventional transcriptions (I later found some, like London), I stumbled across this post by a linguistics professor making a case for keeping /ʌ, ɝ/ separate from /ə, ɚ/ in spite of them sounding the same, because they convey stress differences and other information. (As I commented below, this would also support undoing the /ɝ/→/ɚ/ changes that some of the proponents of /ʌ/-not-/ə/ have made, though: I'm not seeing any source or rationale that keeps /ʌ/ that doesn't also keep /ɝ/.) - -sche (discuss) 02:15, 21 November 2022 (UTC)Reply[reply]
@-sche What I don't understand about this argument is why would we want to indicate a difference in stress by using a completely different phoneme/symbol rather than stress markers? And why would we only do this for schwa and not for other vowels? (The professor's argument that it's about the importance of the vowel just seems like a repackaging of the tautological stress argument.) We seem to be bending over backwards to find excuses for why they should remain separate ('some AmEng speakers pronounce them differently', 'we would lose information about the historical pronunciation', 'we would lose information about the importance of the vowel', or just 'it's the traditional transcription'), while ignoring the fact that the entire reason we are providing IPA transcriptions is to tell readers how a word is pronounced, and keeping /ʌ/ and /ə/ separate clearly doesn't help that. At best, it's confusing; at worst, it's teaching non-native speakers how to pronounce GA incorrectly. Unless we're going to put a big warning message next to every /ʌ/ in GA pronunciations, explaining that it's actually just a stressed /ə/ and not the open-mid back unrounded vowel (as described at Wikipedia and elsewhere), I think we're doing our readers a disservice. And it's not like we'd be blazing a new trail. Merriam-Webster, Oxford, and Routledge have all switched to using just /ə/ for GA. It seems that we would be in good company. Nosferattus (talk) 20:05, 22 November 2022 (UTC)Reply[reply]
I think we may have reached an impasse, as at this point some of us (myself included) are simply talking past each other. Just a few paragraphs above I used similar words to you, but arguing for the opposite point ..... I said we'd be misleading our readers if we transcribed words like cut with schwa. My worry was that English language learners might take it literally and believe that they are actually supposed to say [kət] instead of [kʌt]. So we see the same problem, but propose opposite solutions for it. It may be that, as some others have suggested, we are simply hearing the clips differently. Assuming youre as confident of your analysis as I am of mine, it may be that we will never convince each other of anything and might need to work towards some out-of-the-box solution instead of just declaring one viewpoint correct and the other(s) incorrect. Soap 23:22, 24 November 2022 (UTC)Reply[reply]

Pronunciation Layout

Hi, everyone! @Vininn126 and I had this discussion where the Pronunciation section in an entry (regardless of language) should follow the table of contents found in WT:Pronunciation. The reason is because I'm an editor of Tagalog language entries here, and we put the IPA above the hyphenation of the word (which he pointed out should be syllabification, which is technically what we were going for). However, looking at WT:Pronunciation, Audio File is at the bottom, so if we're to follow the Table of Contents in ordering the Pronunciation section, that would mean we put Audio at the bottom, which is contrary to all English entries, afaik, but he said that was an exception. My reaction was, I find that weird that there's an exception, why couldn't we have a template provided similar to what we have in the Entry layout page, if we're all gonna follow a specific order? Because if not, if we lack a template, I kinda feel that it's open to interpretation, and the order of the Pronunciations section we use in Tagalog language entries could be just as valid. Thoughts? Mar vin kaiser (talk) 11:20, 15 November 2022 (UTC)Reply[reply]

User_talk:Mar_vin_kaiser#Pronunciation_order For reference. Vininn126 (talk) 11:45, 15 November 2022 (UTC)
Hyphenation is of less importance and less interest than IPA and audio. The Polish order makes the most sense. Ultimateria (talk) 02:29, 17 November 2022 (UTC)
Frankly, I find the Audio template too bulky, so I usually place it in the end. Unless the template is changed, I would prefer the order to be [IPA, Rhymes, Homophones, Hyphenation, Audio]. I find a template specifically for syllabification quite useless, since it can be represented in the IPA part. Hyphenation, on the other hand, often isn't intuitive and thus should be present somewhere in the entry (e.g.: Portuguese carro is syllabified as ca.rro, but hyphenated as car-ro). - Sarilho1 (talk) 11:30, 17 November 2022 (UTC)Reply[reply]
However, I do think that it would be good for us to agree on a common standard, including regarding the use of hyphenation, rather than having each language or user do as they please. - Sarilho1 (talk) 11:32, 17 November 2022 (UTC)
I believe the standard should be the order of headers on WT:Pronunciation but with audio under IPA. It's what most people want. We should do this by adding a section on the page with the header Order of items and listing out all the possibilities in order. Vininn126 (talk) 17:08, 17 November 2022 (UTC)
Do we have any sources about it being what most people want? - Sarilho1 (talk) 17:13, 17 November 2022 (UTC)
I suppose not, I'm using guesswork. I think the other most logical place for audio is at the top, then IPA, then rhymes, then syllabification, then hyphenation. Vininn126 (talk) 17:28, 17 November 2022 (UTC)
I for one honestly don't care much, as long as IPA and audio are before the rest. And even so, the Tagalog order was also fine, I didn't mind. Thadh (talk) 18:07, 17 November 2022 (UTC)
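For anyone who wants to compare the orders proposed above side by side, here is a minimal sketch of how a fixed item order could be applied to a Pronunciation section. The item names, the `ITEM_ORDER` list and the `order_items` helper are all hypothetical and only for illustration; nothing here is an agreed standard or an existing Wiktionary tool.

```python
# Hypothetical item order, for illustration only; not an agreed Wiktionary standard.
ITEM_ORDER = ["IPA", "Audio", "Rhymes", "Homophones", "Hyphenation"]


def order_items(items, order=ITEM_ORDER):
    """Return the items sorted by their position in `order`.

    Items missing from `order` are placed at the end; because sorted()
    is stable, they keep their original relative order there.
    """
    rank = {name: index for index, name in enumerate(order)}
    return sorted(items, key=lambda name: rank.get(name, len(order)))


# Audio directly under IPA:
print(order_items(["Hyphenation", "Audio", "IPA", "Rhymes"]))

# Audio last, by passing a different order list:
print(order_items(["Hyphenation", "Audio", "IPA", "Rhymes"],
                  ["IPA", "Rhymes", "Homophones", "Hyphenation", "Audio"]))
```

Keeping the order in a single list means switching between the competing proposals is a one-line change, which may make it easier to judge them on real entries.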
Whatever order we decide on, are we grouping things that pertain to one pronunciation vs another, and ordering within each grouping? To me, it makes more sense to present "IPA of non-rhotic UK pronunciation of fiver; audio of non-rhotic UK pronunciation" followed by "IPA of rhotic US pronunciation; audio of rhotic US pronunciation", rather than "non-rhotic IPA; rhotic IPA followed by non-rhotic audio as if that audio goes with that IPA; then rhotic audio", though I can also see how listing only a UK pronunciation and all its trappings (rhymes, etc) then a US pronunciation and its different rhymes (e.g. decal), could push the existence of e.g. a New Zealand pronunciation off the screen, whereas listing all the IPAs up front would keep them all visible at once. I suppose there are benefits and drawbacks to either. I agree that IPA and audio of how to pronounce a word is probably what most people looking at a ===Pronunciation=== section are most interested in, not hyphenation or rhymes. - -sche (discuss) 22:35, 17 November 2022 (UTC)
I believe anything nested would then follow the rules as presented before, so if you have British IPA determining something then British audio would go underneath it, and American audio would introduce a new "L1" as it were. Vininn126 (talk) 00:07, 18 November 2022 (UTC)
BTW: as described in its documentation, the hyphenation template is indeed intended to be for hyphenation (car-ro), not syllabification, which as Sarilho says should already be indicated in the IPA. Many people don't grasp the distinction, so many uses are wrong, but I don't know how we could make it any clearer. I suppose we should wikilink the word "hyphenation" to a page describing what hyphenation is and how it differs from syllabification. Many of our hyphenation listings are incomplete and it's not always clear if references even exist which would support them, as a separate matter. - -sche (discuss) 22:48, 17 November 2022 (UTC)Reply[reply]

Also, I'd like to suggest this format for tabla:

  • (tablá) IPA(key): /tabˈla/, [tɐbˈla]
  • (tabla) IPA(key): /ˈtabla/, [ˈtɐb.lɐ]

or similar, rather than listing all the definitions for those spellings/pronunciations. Ultimateria (talk) 05:44, 18 November 2022 (UTC)

Hiragana in Japanese inflection tables

(Notifying Eirikr, TAKASUGI Shinji, Atitarev, Fish bowl, Poketalker, Cnilep, Marlin Setia1, Huhu9001, 荒巻モロゾフ, 片割れ靴下, Onionbar, Shen233, Alves9, Cpt.Guapo):

Is there any reason why we give a full hiragana transcription between the conjugated form and its romanization in Japanese inflection templates, like in 豪快? It's completely redundant (it just repeats the same information given in the two other columns). I would propose to get rid of that column to increase readability and reduce redundancy. — Sartma 𒁾𒁉𒊭 𒌑𒊑𒀉𒁲 15:47, 15 November 2022 (UTC)Reply[reply]

Romanization is not Japanese, but it serves as a reference for people not familiar with Japanese. A conjugated form may contain kanji that have more than one reading, so having a full hiragana transcription is appropriate to distinguish them. See 行く, which can be pronounced as both いく and ゆく. Shen233 (talk) 15:54, 15 November 2022 (UTC)
@Shen233: I think you missed my point. Even if a word has more possible pronunciations, like 行く, having いく (iku) or ゆく (yuku) doesn't add anything to the information already given by the romanizations "iku" and "yuku". You're just repeating the same information twice. — Sartma 𒁾𒁉𒊭 𒌑𒊑𒀉𒁲 22:22, 15 November 2022 (UTC)Reply[reply]
Fair, one thing I can envision is to get rid of the hiragana column, but put furigana instead on all the kanji. Shen233 (talk) 22:25, 19 November 2022 (UTC)
If we're getting rid of the kana anyway due to being mostly extraneous, how would furigana be useful? ‑‑ Eiríkr Útlendi │Tala við mig 08:11, 20 November 2022 (UTC)
If anything, I'd rather delete the romaji column in favor of kana. Shen233 (talk) 08:22, 20 November 2022 (UTC)
I've got the same feeling about removing kana. All textbooks I used were based on kana inflections with no rōmaji but we don't have to remove kana to reduce the number of columns. Using ruby we could have both in the same column, e.g.
  1. 来(く)る (kuru)
  2. 来(こ)い (koi)
  3. 来(く)れ (kure)
(no hyperlinks are required). Anatoli T. (обсудить/вклад) 08:31, 20 November 2022 (UTC)Reply[reply]
@Shen233: I still think furigana is redundant (it's English Wiktionary, and the hiragana reading is given in the "kanji in this term" box anyway), but it's definitely 100 times better than having the hiragana column, which takes up space for no good reason AT ALL. Though I do feel like we all need to remind ourselves here that this is not Japanese Wiktionary but ENGLISH Wiktionary. Treating every Japanese text as if it were a Japanese book for children or a textbook for foreigners (instead of a DICTIONARY for ENGLISH speakers) should not be the direction we want to follow here.
@Eirikr: I agree with you, too. My preference would be to get rid of ANY hiragana (hiragana column AND furigana), but I feel a lot of editors here love their kana (for reasons that escape me completely), so being realistic I understand it's not going to happen. But yeah, if it was up to me I'd delete all unnecessary hiragana and furigana; it only adds noise and makes every entry unnecessarily heavy.
@Atitarev: That would be the least bad solution for me, so I could compromise to that. — Sartma 𒁾𒁉𒊭 𒌑𒊑𒀉𒁲 12:35, 20 November 2022 (UTC)Reply[reply]
@Sartma: Japanese inflections without kana? Seriously? No way. It's a native way, any Japanese dictionary would use only kana. Besides, there is no 100% correspondence between kana and rōmaji. Nothing is redundant. Keep the way it is. Anatoli T. (обсудить/вклад) 22:15, 15 November 2022 (UTC)Reply[reply]
@Atitarev: Japanese dictionaries don't give inflections in kana like we're doing on Wiktionary, so I'm not sure what you're referring to. Using a column to give a "kana reading" of the inflected forms (which already are mostly in kana anyway) when you have the romanization next to it is 100%, absolutely redundant. — Sartma 𒁾𒁉𒊭 𒌑𒊑𒀉𒁲 22:31, 15 November 2022 (UTC)
I think there are ways to make the table more space-efficient without reducing the amount of information they contain. They're very spread-out at the moment. Theknightwho (talk) 23:41, 15 November 2022 (UTC)Reply[reply]
@Theknightwho: Using furigana should be enough. (even though, again, I don't see the use of them in a dictionary for English speakers that gives romanizations...). But at least everything would be more compact. — Sartma 𒁾𒁉𒊭 𒌑𒊑𒀉𒁲 21:53, 17 November 2022 (UTC)Reply[reply]

────────────────────────────────────────────────────────────────────────────────────────────────────I take your point, that for words like 食べる or 食事する including hiragana on every line of the paradigm is repetitive. (By the way, could you guess that I'm on my lunch break?) There are some words (e.g. (), (), 来る(くる), 来れ(くれ), 来い(こい)) where that is not the case, but in the majority of cases the hiragana is predictable. I would support making the tables more compact or efficient, but would hesitate to get rid of (usually) redundant elements wholesale. Cnilep (talk) 03:33, 17 November 2022 (UTC)Reply[reply]

@Cnilep: How would 来る (kuru) be a different case? As long as you have the romanization, all furigana or kana transcription would be redundant. What's "not enough" with 来ない (konai), 来ます (kimasu), 来れば (kureba), 来い (koi), etc? There's no need for the furigana if you give the romanization. They carry the exact same information. The weird argument that "it's a native way of indicating pronunciation" makes no sense at all in a dictionary that's aimed at English speakers, like the English version of Wiktionary. English speakers are not Japanese native speakers... — Sartma 𒁾𒁉𒊭 𒌑𒊑𒀉𒁲 22:02, 17 November 2022 (UTC)Reply[reply]
Oh, I guess I didn't quite take your point. I thought that your argument was that repeating the furigana is redundant, since the differing okurigana is present on the kanji forms. You actually meant that kana is redundant to romaji, correct? In that case, my comments are off base. Even so, I am not sufficiently convinced that kana is unwarranted, even for an English-speaking readership. My personal opinion, for what it's worth, is unchanged: I support efficiency but not removing information. Cnilep (talk) 02:05, 18 November 2022 (UTC)
@Cnilep: but removing the hiragana-only column would not remove any information. — Sartma 𒁾𒁉𒊭 𒌑𒊑𒀉𒁲 08:18, 18 November 2022 (UTC)
Removing a column would not remove information if the information is repeated elsewhere (e.g. as furigana). Potentially, however, removing hiragana and not including it elsewhere would remove information. In both of my replies efficiency is meant to describe such actions as removing the hiragana column from the chart and including furigana on each kanji form in the chart. Cnilep (talk) 01:24, 21 November 2022 (UTC)Reply[reply]
I think it would not be a good idea to remove the hiragana. While it is not usual in Japanese to spell verbs fully in hiragana, there are nevertheless conventional aspects to hiragana spellings that aren't fully captured by many common styles of romanization, like the use of づ (as in つづか tsuzuka) vs. ず (as in こず kozu).--Urszag (talk) 02:48, 18 November 2022 (UTC)Reply[reply]
@Urszag: I understand that point, but you do have the hiragana spelling (as furigana) in the headword, AND in the "Kanji in this term" box already, so it's again just another instance of redundancy. — Sartma 𒁾𒁉𒊭 𒌑𒊑𒀉𒁲 08:25, 18 November 2022 (UTC)Reply[reply]
  • Some general thoughts.
  • This is the English Wiktionary. We can only assume that our readers have facility in the English language.
  • Removing romanizations completely from Japanese inflection tables removes a large amount of information for 100% of our entries, and renders them effectively useless to a chunk of our target audience -- English readers who may not have any knowledge of Japanese.
  • Removing kana completely from Japanese inflection tables removes a small amount of information for a small percentage of specific cases -- those few cases where Hepburn romanization is a lossy process, such as losing the distinction between ず (zu) and づ (zu), or じ (ji) and ぢ (ji), or お (o) and を (o), or は (ha) and は (wa), etc.
→ That said, does this matter?
Serious question. I see three relevant categories of our readership to consider here:
  • That portion of our readership that doesn't read Japanese at all won't care.
  • That portion of our readership that already reads Japanese will know enough that they also won't care.
  • That portion of our readership that is trying to learn Japanese will care, and will be potentially impacted by not knowing that zu in the romanized spellings of certain words should be spelled as づ in kana, etc.
However, if the kana spellings are given elsewhere in the same entry, does the absence of kana from the conjugation tables matter? I don't think it does.
  • Including a kana column does make our tables larger than necessary. This is a problem for smaller UIs, such as mobile phones.
  • Including kana as ruby / furigana obviates the need for a dedicated column, and could be a viable workaround here.
I do have concerns about the potential for ruby causing confusion among our English-reading audience, as ruby text is not commonly used in the English-language world. Reader unfamiliarity with the conventions of ruby text could cause confusion that things like 出(しゅつ) (shutsu) might represent a single grapheme, rather than the four separate graphemes 出, し, ゅ, and つ, with the latter three providing a phonetic guide to the first -- which guide is wholly useless to a portion of our English-reading audience.
I understand the utility of kana for those Wiktionary users who are trying to learn Japanese. I am worried that we are overusing kana in places that might not be appropriate for a broader audience, much as Sartma points out above.
‑‑ Eiríkr Útlendi │Tala við mig 20:05, 21 November 2022 (UTC)
@Eirikr: Amen! — Sartma 𒁾𒁉𒊭 𒌑𒊑𒀉𒁲 09:30, 22 November 2022 (UTC)
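
As a side note on the point raised above that Hepburn romanization is lossy for a few kana pairs, here is a minimal Python sketch. The mapping table is a tiny hand-picked subset and the helper names `romanize` and `kana_candidates` are hypothetical, made up for illustration; nothing here reflects how Wiktionary's own modules work. It only shows that づ and ず collapse to the same romaji, so the kana cannot be recovered from the romanization alone.

```python
# Illustrative subset only (an assumption for this sketch): a real Hepburn
# converter needs the full kana inventory plus rules for digraphs, long
# vowels, and particles.
HEPBURN = {
    "つ": "tsu",
    "づ": "zu",
    "ず": "zu",
    "か": "ka",
    "こ": "ko",
    "じ": "ji",
    "ぢ": "ji",
    "お": "o",
    "を": "o",
}


def romanize(kana: str) -> str:
    """Romanize a kana string one character at a time (hypothetical helper)."""
    return "".join(HEPBURN[ch] for ch in kana)


def kana_candidates(syllable: str) -> list[str]:
    """Return every kana in the table that romanizes to the given syllable."""
    return sorted(k for k, r in HEPBURN.items() if r == syllable)


print(romanize("つづか"))      # tsuzuka
print(romanize("こず"))        # kozu
print(kana_candidates("zu"))   # ['ず', 'づ']
```

Going from kana to romaji is unambiguous in this sketch, but the reverse lookup for "zu" returns two candidates, which is the information a learner would lose if the kana appeared nowhere in the entry.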

Furigana and search indexing

My immediate reaction upon seeing this: no, no, please, NO! Sorry for the emphatics, but there is a huge wrinkle I think people have missed here. (Not your fault for not noticing, since the discussion has mostly been around the needs of aesthetics and of human readers. And unless you are presently actively involved in studying Japanese, and view English Wiktionary as a priceless Japanese learning tool—as I am and I do.)

Furigana messes with searchability, in several big ways:

  1. Replacing both the current kana and romaji with furigana could make the inflected forms impossible to locate by their reading, unless they have their own non-lemma entries as soft-redirects (and the vast majority of Japanese inflections do not).
  2. Still, adding furigana to supplant one or the other of kana or romaji seems like a no-brainer, then, right? Unfortunately, furigana requires the (human or machine) annotator to decide on segmentation—that is, which kanji goes with what sound(s)/kana/romaji. In non-jukugo compounds, this can be entirely arbitrary. That means that you have to make a choice:
    • Don’t segment—then the kanji will be indexed for search, and so will the reading in kana (or ruby romaji, I suppose), but you will have to find the segmentation elsewhere, which—as mentioned above—is an important pedagogical and etymological tool. Also, from some CodePen experimentation, I believe if you put more furigana than fit over the first kanji, you must segment unless you want empty space between the kanj